Ceph Nautilus Installation Guide

Ceph Version 14 - Nautilus

Installation Guide for CentOS 7.

Release Version 1.4

CHAPTER 1 - WHAT IS CEPH?

Ceph is a license free, open source storage platform that ties together multiple storage servers to provide interfaces for object, block and file-level storage in a single, horizontally scalable storage cluster, with no single point of failure.

Ceph clusters consist of several different types of services which will be explained below :

1.1 - ANSIBLE  ADMINISTRATOR NODE

T his type of node is where ansible will be configured and run from. Any node in the cluster can functi on as the ansible node. This node  provide the following functions:

  • Centralized storage cluster management

  • Ceph configuration files and keys

  • Optionally, local repositories for installing Ceph on nodes that cannot access the Internet

1.2 - MONITOR NODES

Each monitor node runs the monitor daemon ( ceph-mon ), which maintains a master copy of the cluster map. The cluster map includes the cluster topology. A client connecting to the Ceph cluster retrieves the current copy of the cluster map from the monitor which enables the client to read from and write data to the cluster. It’s Important  to note that Ceph can run with one monitor; however, it is highly suggested to have three monitors to ensure high availability.

1.3 - OSD NODES

Each Object Storage Device (OSD) node runs the Ceph OSD daemon ( ceph-osd ), which interacts with  logical disks attached to the node . Simply put, an OSD node  is a server, and an OSD  itself is an HDD or SSD inside the server.  Ceph stores data on these OSDs. Ceph can run with very few OSD nodes, where the minimum is three , but production clusters realize better performance beginning at modest scales, for example 5 OSD nodes in a storage cluster . Ideally, a Ceph cluster has multiple OSD nodes, allowing isolated failure domains by creating the CRUSH map.

1.4 - MANAGER NODES

Each Manager node runs the MGR daemon ( ceph-mgr ), which maintains detailed information about placement groups, process metadata and host metadata in lieu of the Ceph Monitor—​significantly improving performance at scale. The Ceph Manager handles execution of many of the read-only Ceph CLI queries, such as placement group statistics. The Ceph Manager also provides the RESTful monitoring APIs.  The manager node is also responsible for dashboard hosting, giving the user real time metrics, as well as the capability to create new pools, exports, etc.

1.5 - MDS NODES

 Each Metadata Server (MDS) node runs the MDS daemon ( ceph-mds ), which manages metadata related to files stored on the Ceph File System (CephFS). The MDS daemon also coordinates access to the shared cluster. The MDS daemon maintains a cache of CephFS metadata in system memory to accelerate IO performance. This cache size can be grown or shrunk based on workload, allowing linearly scaling of performance as data grows. The service is required for CephFS to function.

1.6 - OBJECT GATEWAY NODES

Ceph Object Gateway node runs the Ceph RADOS Gateway daemon ( ceph-radosgw ), and is an object storage interface built on top of librados to provide applications with a RESTful gateway to Ceph Storage Clusters. The Ceph Object Gateway supports two interfaces:

  • S3  - Provides object storage functionality with an interface that is compatible with a large subset of the Amazon S3 RESTful API.

  • Swift - Provides object storage functionality with an interface that is compatible with a large subset of the OpenStack Swift API.

Below is a diagram showing the architecture of the above services, and how they communicate on the networks.

The cluster network relieves OSD replication and heartbeat traffic from the public network.

CHAPTER 2 - REQUIREMENTS FOR INSTALLING CEPH

Before starting the installation and configuration of your Ceph cluster, there are a few requirements that need to be met.

2.1 - HARDWARE REQUIREMENTS

As mentioned before, there are minimum quantities required for the different types of nodes. Below is a table showing the minimum number required to achieve a highly available Ceph cluster. It is important  to note that the MON’s, MGR’s, MDS’s, FSGW’s , and RGW’s  can be either virtualized or on physical hardware.

Pool Type

OSD

MON

MGR

MDS

FSGW

RGW

2 Rep / 3 Rep

3

3

2

2

2

2

Erasure Coded

3

3

2

2

2

2

2.2 - OPERATING SYSTEM

45Drives requires that Ceph Naut i l us  be deployed on a minimal installation of CentOS 7.6  or newer. Every node in the cluster should be running the same version to ensure uniformity.

2.3 - NETWORK CONFIGURATION

As seen in the Figure in Chapter 1, all Ceph nodes require a public network. It is required to have a network interface card configured to a public network where Ceph clients can reach Ceph monitors and Ceph OSD nodes.

45Drives recommends having a second network interface card configured as a backend private  network so that Ceph can conduct heart-beating, peering, replication, and recovery on a network separate from the public network. It is recommended to configure network   b ond ing  on each network interface card across the cluster. Choice of bonding mode will vary depending on needs. 45Drives recommends, either bonding mode 1 (Active-Backup) and 4 (LACP) for the cluster nodes.

2.4 - CONFIGURING FIREWALLS

By default w hen installing Ceph using these ansible packages , it will open  the required  firewall ports on the appropriate nodes using firew alld .

If using iptables or  requiring manual firewall configuration,  the following  is a table for reference showing the default ports / ranges which are required for each c e ph daemon as well as services used for real time metrics. These must be open before you begin installing the cluster.

Note the cluster role column, that will determine which hosts need the ports opened. It corresponds with the group names in the ansible inventory file.

Ceph Daemon / Service

Firewall Port

Protocol

Firewalld

Service Name

Cluster Role

ceph-osd

6800-7300

TCP

ceph

osds

ceph-mon

6789,3300

TCP

ceph, ceph-mon

mons

ceph-mgr

6800-7300

TCP

ceph

mgrs

ceph-mds

6800

TCP

ceph

mdss

ceph-radosg w [1]

8080

TCP

rgws

Ceph Prometheus Exporter

9283

TCP

mgrs

Node Exporter

9100

TCP

metric

Prometheus Server

9090

TCP

metric

Alertmanage r

9091

TCP

metric

Grafana Server

3000

TCP

metric

nfs

2049

TCP

nfs

nfss

rpcbind

111

TCP/UDP

rpc-bind

nfss

corosync

5404-5406

UDP

nfss

pacemaker

2224

TCP

nfss

samba

137,138

UDP

samba

smbs

samba

139,445

TCP

samba

smbs

CTDB

4379

TCP/UDP

ctdb

smbs

iSCSI Target

3260

TCP

iscsigws

iSCSI API Port

5000

TCP

iscsigws

iSCSI Metric Exporter

9287

TCP

iscsigws

2.5 - CONFIGURING PASSWORDLESS SSH

Generate an SSH key pair on the Ansible administrator node and distribute the public key to all other nodes in the storage cluster so that Ansible can access the nodes without being prompted for a password.

Perform the following steps from the Ansible administrator node, as the root user.

  1. Generate the SSH key pair, accept the default filename and leave the passphrase empty:

[root @cephADMIN  ~] $ ssh-keygen

      2. Copy the public key to all nodes in the storage cluster:

[root@cephADMIN ~]$ ssh- copy -id root@ $HOST_NAME

Replace $HOST_NAME with the host name of the Ceph nodes.

Example

[root @cephADMIN  ~] $ ssh-copy-id root @cephOSD1

2.6 - INSTALL CEPH-ANSIBLE-45D

45Drives provides a slightly  modified ceph-ansible repository on GitHub. From the Ansible administrator node, the first thing to do is to pull down the ceph-ansible archive and run the admin-setup script.

[root @cephADMIN  ~] $ cd /etc/yum.repos.d/ [root @cephADMIN  ~] $ curl -LO http: / /images.45drives.com/ceph/rpm/ceph _45drives.repo [root @cephADMIN  ~] $ yum install ceph-ansible- 45 d

[root @cephADMIN  ~] $ touch /usr/share/ceph-ansible/hosts

[root @cephADMIN ~] $ ln -s /usr/share/ceph-ansible/hosts /etc/ansible

This will set up  the Ansible environment for a 45Drives Ceph cluster.

CHAPTER 3 - DEPLOYING CEPH NAUTILUS

This chapter describes how to use the Ansible application to deploy a Ceph cluster and other components, such as Metadata Servers, File System Gateways, Ceph Object Gateways etc.

3.1 - PREREQUISITES

Prepare the cluster nodes. On each node verify:

  • Passwordless SSH configured from Ansible Node  to all other nodes

  • Network ing configured

  • All nodes must be reachable from each on the public network

  • All OSDS nodes must be reachable on the cluster network as well

3.2 - INSTALLING A CEPH CLUSTER

  1. Navigate to the /usr/share/ceph-ansible/ directory

[root @cephADMIN  ~] # cd /usr/share/ceph-ansible/

  1. Edit the hosts file and place the hostnames under the correct blocks. It is common to collocate the Ceph Manager ( ceph_mgr ) with the Ceph Monitor nodes. If you have a lengthy list with sequential naming you can use a range such as OSD_[1:10] . See hosts.sample for a full list of host groups available.

[mons] MONITOR_1 MONITOR_2 MONITOR_3 [mgrs] MONITOR_1 MONITOR_2 MONITOR_3 [osds] OSD_1 OSD_2 OSD_3

  1. Ensure that Ansible can reach the Ceph hosts. Until this finishes with success, stop here and verify the connectivity to each host in your host file.

[root @cephADMIN  ceph-ansible] # ansible all -m ping

  1. Edit the group_vars/all.yml file:

[root @cephADMIN  ceph-ansible] # vim group_vars/all.yml

Below is a table that includes the minimum parameters that have to be updated.

Option

Value

Required

Notes

monitor_interfa ce

The interface that the Monitor nodes listen to (eth0, bond0, etc)

1 of the 3

monitor_interfa ce , monitor_address , or monitor_address _ block  is required

public_network

The IP address and netmask of the Ceph public network

Yes

In the form of: 192.168.0.0/16

cluster_network

The IP address and netmask of the Ceph cluster network

No, defaults to public_network

In the form of: 10.0.0.0/24

hybrid_cluster

Are there SSD and HDD OSDs in the cluster ?

No, defaults to false

In the form of true or false

  1. Run the ansible-playbook to configure device alias’.

[root @cephADMIN  ceph-ansible] # ansible-playbook device-alias.yml

  1. Run the ansible-playbook to generate-osd-vars.yml to populate device variables. This will use every disk present in the chassis. If you want to exclude certain drives manually remove them from the host_vars/ file.

[root @cephADMIN  ceph-ansible] # ansible-playbook generate-osd-vars.yml

  1. Run the ceph-ansible playbook to build the cluster. When it finishes, the core components of the cluster are deployed and can be verified by running “ceph -s” from one of the monitor nodes.

[root @cephADMIN  ceph-ansible] # ansible-playbook core2.yml

3.3 - INSTALLING METADATA SERVERS (CephFS )

Metadata Server daemons are necessary for deploying a Ceph File System. This section will show you using Ansible, how to install a Ceph Metadata Server (MDS)

  1. Add a new section [mdss]  to the / usr/share/ceph-ansible /hosts file:

[mdss] cephMDS1 cephMDS2 cephMDS3

  1. Run the cephfs.yml  playbook and  install and configure the Ceph Metadata Servers.

[ root @cephADMIN  ceph-ansible] # ansible-playbook cephfs.yml

  1. Verify the file system from one of the cluster nodes

[root @cephMON  ~] # ceph fs status

3.4 - INSTALLING THE CEPH OBJECT GATEWAY

The Ceph Object Gateway, also known as the RADOS gateway, is an object storage interface built on top of the librados API to provide applications with a RESTful gateway to Ceph storage clusters.

  1. Add a new section, [rgws]  to the / usr/share /ceph-ansible/hosts  file . Be sure to define the IP for each rgw as well.

[rgws] cephRGW1 radosgw_address= 192.168.18.53 cephRGW2 radosgw_address= 192.168.18.56

  1. The default port will be 80 , change the following line in group_vars/all.yml if another is desired:

radosgw_civetweb_port:   8080

  1. Below is an example section of the group_vars/all.yml.

## Rados Gateway options # radosgw_frontend_type: beast radosgw_civetweb_port: 8080 radosgw_civetweb_num_threads: 100 radosgw_civetweb_options: “num_threads= {{ radosgw_civetweb_num_threads }} ” # For additional civetweb configuration options available such as SSL, logging, # keepalive, and timeout settings, please see the civetweb docs at # https://github.com/civetweb/civetweb/blob/master/docs/UserManual.md radosgw_frontend_port: ” {{ radosgw_civetweb_port if radosgw_frontend_type == ‘civetweb’ else ‘8080’ }} ” radosgw_frontend_options: ” {{ radosgw_civetweb_options if radosgw_frontend_type == ‘civetweb’ }} “

  1. Run the rgws.yml  playbook to install the RGWs.

[root @cephADMIN  ceph-ansible] # ansible-playbook radosgw.yml

  1. Verify with:

[root@cephADMIN ceph-ansible]# curl -g http://cephRGW1:8080 <? xml version= “1.0”  encoding= “UTF-8” ?> <ListAllMyBucketsResult xmlns= “http://s3.amazonaws.com/doc/2006-03-01/” ><Owner><ID> anonymous </ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllM yBucketsResult>

3.4.1 Configure Haproxy for RGW Load Balancing

Since each object gateway instance has its own IP address, HAProxy and keepalived can be used to balance the load across Ceph Object Gateway servers.

Another use case for HAProxy and keepalived is to terminate HTTPS at the HAProxy server. You can use an HAProxy server to terminate HTTPS at the HAProxy server and use HTTP between the HAProxy server and the RGWs.

  1. Add a new section, [rgwloadbalancers]  to the /usr/share/ceph-ansible/hosts  file. The RGW nodes themselves can be used or other CentOS servers

[rgwloadbalancers] cephRGW1 cephRGW2

  1. Edit group_vars/rgwloadbalancers.yml and specify

  2. Virtual IP(s)

  3. Virtual IP Netmask

  4. Virtual IP interface

haproxy_frontend_port: 80 haproxy_frontend_ssl_port: 443 haproxy_frontend_ssl_certificate: haproxy_ssl_dh_param: 4096 haproxy_ssl_ciphers:  - EECDH+AESGCM  - EDH+AESGCM haproxy_ssl_options:  - no-sslv3  - no-tlsv10  - no-tlsv11  - no-tls-tickets virtual_ips:   - 192.168.18.57 virtual_ip_netmask: 16 virtual_ip_interface: eth0

3.5 - CONFIGURING THE SMB GATEWAYS

There are two cases that will be covered in this section. They are:

  1. CephFS + Samba + Active Directory Integration

  2. CephFS + Samba + Local Users

The SMB Gateways can be physical hardware or virtual machines . CTDB will be configured with a floating IP for access to the Samba share.

The CephFS volume will be mounted on each gateway at /mnt/cephfs/  - and then the gateways share out that directory via SMB.

Edit the hosts  file to include the File System Gateways in the [smbs] block.

[smbs] smb1 smb2

3.5.1 CephFS + Samba + Active Directory Integration Edit the group_vars/smbs.yml file to choose the samba role

# Roles samba_server: true samba_cluster: true domain_member: true

Edit group_vars/smbs.yml to edit active_directory_info

active_directory_info:  workgroup: ‘SAMDOM’  idmap_range: ‘100000 - 999999’  realm: ‘SAMDOM.COM’  winbind_enum_groups: yes  winbind_enum_users: yes  winbind_use_default_domain: yes  domain_join_user: ‘’  domain_join_password: ‘’

Edit group_vars/smbs.yml to edit ctdb_public_addresses

ctdb_public_addresses :  - vip_address : ‘192.168.103.10’     vip_interface : ‘eth0’     subnet_mask : ‘16’  - vip_address : ‘192.168.103.11’     vip_interface : ‘eth0’     subnet_mask : ‘16’

Edit group_vars/smbs.yml to edit samba_shares

samba_shares :  - name : ‘share1’     path : ‘{{ shared_storage_mountpoint }}/fsgw/share1’     writeable : ‘yes’     guest_ok : ‘no’     comment : “comment for share1”  - name : ‘share2’     path : ‘{{ shared_storage_mountpoint }}/fsgw/share2’     writeable : ‘yes’     guest_ok : ‘no’     comment : “comment for share2”

3.5.2 CephFS + Samba + Local Users

If AD is not being used, set domain_member to false.

Edit the group_vars/smbs.yml file to choose the samba role

# Roles samba_server: true samba_cluster: true domain_member: false

Edit group_vars/smbs.yml to edit samba_shares

samba_shares :  - name : ‘share1’     path : ‘{{ shared_storage_mountpoint }}/fsgw/share1’     writeable : ‘yes’     guest_ok : ‘no’     comment : “comment for share1”  - name : ‘share2’     path : ‘{{ shared_storage_mountpoint }}/fsgw/share2’     writeable : ‘yes’     guest_ok : ‘no’     comment : “comment for share2”

Edit group_vars/smbs.yml to edit ctdb_public_addresses

ctdb_public_addresses :  - vip_address : ‘192.168.103.10’     vip_interface : ‘eth0’     subnet_mask : ‘16’  - vip_address : ‘192.168.103.11’     vip_interface : ‘eth0’     subnet_mask : ‘16’

Run the smb.yml  playbook:

[root @cephADMIN  ceph-ansible] # ansible-playbook smb.yml

3.5.2 Samba Overrides

In either case, all smb.conf global config options can be overridden using “/etc/samba/overrides.conf”. This is meant to be used when making user (no ansible) changes or when needing an option that is not defined in the playbooks. The override.conf assumes the same syntax as the main smb.conf.

For example if not using the recommended centralized share management , you could define a share in /etc/samba/overrides.conf

[global]

    log level = 3

[share1]    path = /mnt/cephfs/fsgw/share1    comment = comment for  share1    valid  users = user1    Write list = user1

3.5.3 Centralized Share Management

Samba offers a registry based configuration system to complement the original text-only configuration via smb.conf. The “net conf” command offers a dedicated interface for reading and modifying the registry based configuration.

3.5.3.1 Adding share

To create a new share “share1” using net conf

net  conf addshare share1   $PATH  writable=[ yes | no ] guest_ok=[ yes | no ] “comment”

To add extra parameters like “valid users” and “write list”:

net  conf setparm share1   $PATH  “valid users” “@readonly,@trusted”

net  conf setparm share1   $PATH  “write list” “@trusted”

3.5.3.2 Removing a share

To remove a share called “share1” using net conf

net  conf delshare share1

3.5.3.3 Listing current shares

To show all defined shares

net  conf list

To show a specific share named “share1”

net  conf show share1

3.7 - CONFIGURING CEPHFS WITH NFS GANESHA

3.7.1 - Prerequisites

  • An Ansible deployed ceph cluster

  • Ceph File-System created, called “cephfs” for this example

  • Node(s) to act as NFS Gateway

  • NFS Gateways can be physical hardware or virtual machines

  • Password-less SSH access from ansible node

3.7.2 - Active-Active Configuration

Active-Active NFS

  • No floating IP. Shares accessible from every gateway IP​.

  • HA only possible if application can multipath.

  • Useful for highly concurrent use cases.

First thing to do is edit the / usr/share /ceph-ansible/group_vars/nfss.yml  file. NFS Ganesha can be setup on top of Filesystem or Object, so you need to specify in the file:

nfs_file_gw: true nfs_obj_gw: false

Set the backend driver to “rados_cluster” in /usr/share/ceph-ansible/group_vars/nfss.yml .

# backend mode , either rados_kv,rados_ng, or  rados_cluster # Default (rados_ng) is   for  single gateway/active-passive use.

# rados_kv is  obsoleted by  rados_ng # rados_cluster is   for  active-active nfs cluster .

# Requires ganesha-grace-db to be  initialized ceph_nfs_rados_backend_driver: “rados_cluster”

Now add the NFS Gateway hostnames to the /root/ceph-ansible-45d/hosts  file

[nfss] cephNFS1 cephNFS2

Next run the nfs.yml playbook:

[root @cephADMIN  ceph-ansible] # ansible-playbook nfs.yml

This playbook will install all necessary packages as well as setup the default export.

Setting up all of your exports can be done from the dashboard which will be setup in the next section.

3.7.3 - Active-Passive Configuration

Active-Passive

  • Floating IP. Service only running on 1 of the gateways at a time.​

  • HA possible for all clients

First thing to do is edit the /usr/share/ceph-ansible/group_vars/nfss.yml  file. NFS Ganesha can be setup on top of Filesystem or Object, so you need to specify in the file:

nfs_file_gw: true nfs_obj_gw: false

Set the backend driver to “rados_ng” in /usr/share/ceph-ansible/group_vars/nfss.yml .

# backend mode , either rados_kv,rados_ng, or  rados_cluster # Default (rados_ng) is   for  single gateway/active-passive use.

# rados_kv is  obsoleted by  rados_ng # rados_cluster is   for  active-active nfs cluster .

# Requires ganesha-grace-db to be  initialized ceph_nfs_rados_backend_driver: “rados_ng”

Specify the floating IP the NFS-Ganesha Gateway will be reachable from. in /usr/share/ceph-ansible/group_vars/nfss.yml .

ceph_nfs_floating_ip_address:   ‘192.168.18.73’ ceph_nfs_floating_ip_cidr:   ‘16’

Now add the NFS Gateway hostnames to the /usr/share/ceph-ansible/hosts  file

[nfss] cephNFS1 cephNFS2

Next run the nfs.yml playbook:

[root @cephADMIN  ceph-ansible] # ansible-playbook nfs.yml

This playbook will install all necessary packages as well as setup the default export.

Setting up all of your exports can be done from the Ceph dashboard.

Set the following ceph configuration setting to allow nfs to failover properly.

[root @cephADMIN  ceph-ansible] # ceph config set mds mds_cap_revoke_eviction_timeout 10

3.7 - CONFIGURING RBD + iSCSI

3.7.1 Prerequisites

  • An Ansible deployed ceph cluster

  • Node(s) to act as iSCSI Gateway

  • iSCSI Gateways can be physical hardware or virtual machines.

  • iSCSi Gateways can be co-located on the OSDs nodes.

  • Password-less SSH access from the ansible node.

3.7.2 Ceph iSCSI Installation

See Knowledge Base article for more detail. Ceph iSCSI Configuration

Add iSCSI gateways hostnames to /usr/share/ceph-ansible/hosts

[iscsigws] iscs i1 iscs i2 iscs i3

Next run the iscsi.yml playbook

[root @cephADMIN  ceph-ansible] # ansible-playbook iscsi.yml

3.7.3 Ceph iSCSI Configuration

Configuration on the iSCSI nodes is to be done on the iSCSI gateways and the ceph dashboard.

From one of the iSCSI nodes, create the initial iSCSI gateways with gwcli . Note the first time gwcli is run you will be promoted with the warning below, it can be ignored as gwcli will create an initial preferences file if not present.

[root@iscsi1 ~] # gwcli Warning: Could not  load preferences file /root/ .gwcli/prefs.bin. >

Create iSCSI target of for the cluster

[root@iscsi1 ~]# gwcli >  /> cd  /iscsi-target >  /iscsi-target>  create iqn.2003-01.com.45drives.iscsi-gw:iscsi-igw

Create the first iSCSI gateway. It has to be the node you are running this command on.

[root@iscsi1 ~]# gwcli > cd /iscsi-targets/iqn.2003-01.com.45drives.iscsi-gw:iscsi-igw/gateways /iscsi-target.. .7283 /gateways> create iscsi1 .45 lab.com 192.168 .*.*

The Ceph Administration dashboard is recommended to finish iSCSI configuration. See Section 5 before proceeding here.

CHAPTER 4 - EXPANDING THE CLUSTER

CHAPTER 5 - CONFIGURING THE MANAGEMENT DASHBOARDS

5.1 Installing Ceph Dashboard

Using Ansible, the steps below will be install and configure the metric collection/alert stack.This will also configure and start the ceph management UI.

By default the metric stack will be installed to the first node running the manager service in your cluster. Optionally to specify another server to host this stack use the group label “metrics”

[metrics] metric1

The Ceph Dashboard is hosted by ceph-mgr service. The dashboard playbook will also install haproxy on the metric server, this way the dashboard will be reachable from any of the nodes running the ceph-mgr service as well as the metric server itself.

Enable/disable, set port, protocol, and cert in group_vars/all.yml

dashboard_enabled: true # When true HAProxy will be installed on the server in the metric group dashboard_haproxy: true dashboard_haproxy_port: 80 dashboard_haproxy_protocol: http dashboard_haproxy_cert:

Run the /usr/share/ceph-ansible/dashboard.yml  file:

[root @cephADMIN  ceph-ansible] # ansible-playbook dashboard.yml

Below is a table of the default ports for the dashboards that were configured. These can be modified in /usr/share/ceph-ansible/group_vars/all.yml  file

Name

Default Port

Grafana

3000/tcp

Prometheus

9090/tcp

Alertmanager

9091/tcp

Ceph Dashboard

8234/tcp

CHAPTER 6 - MANAGING STORAGE POOLS VIA CLI

It is recommended to create storage pools in the Ceph Dashboard . The below chapter will go into detail on the specifics of creating pool.

Before creating pools, there are a few things to consider. First thing is the number of placement groups. Second being what type of pool is to be created - replicated or erasure coded.

6 .1 - Placement Groups

Sizing placement groups is very important. You can always increase the size of placement groups later but never decrease it. Increasing the number of placement groups will cause the data to begin migrating to be evenly spread across all placement groups. This will put a strain on cluster performance until the migration is complete.

It is best to use the pg_calculator  to have the proper number of placement groups. If you’re unsure what to choose, start with 64 placement groups per pool, and then increase the number of placement groups at a later date (ideally before you put data on the pool).

A quick rule of thumb:

  • Less than 5 OSDs, set pg_num to 128

  • Between 5 and 10 OSDs, set pg_num to 512

  • Between 10 and 50 OSDs, set pg_num to 1024

  • If you have more than 50 OSDs, understand the tradeoffs and use the calculator

6 .2 - Pool Types

Ceph stores data in pools and there are two types of pools:

  • Replicated

  • Erasure-coded

Ceph uses the replicated pools by default, meaning the Ceph copies every object from a primary OSD node to one or more secondary OSDs. Erasure coding is a method of storing an object where the erasure code algorithm breaks the object into data chunks ( k ) and coding chunks ( m ), and stores those chunks in different OSDs. Erasure coding uses storage capacity more efficiently than replication. The n-replication approach maintains  n  copies of an object (3x by default in Ceph), whereas erasure coding maintains only  k + m  chunks. For example, 3 data and 2 coding chunks use 1.5x the storage space of the original object.

Below is a table showing pool type, required number of OSD nodes, and storage efficiency.

Pool Type

Storage Efficiency

Minimum # of OSD Nodes

Recommended # of OSD Nodes

2-replication

50%

3

3+

3-replication

33%

3

3+

2+1 Erasure Coded

66%

4

5

4+2 Erasure Coded

66%

8

10

8+2 Erasure Coded

80%

12

14

8+4 Erasure Coded

66%

16

20

* NOTE  - Minimum # of OSD  Nodes can withstand ‘m’ number of failures, after that I/O will stop until you recover that OSD node. The Recommended # of OSD  node gives that extra cushion of protection and keeps  I/O going while you recover the failed OSD node.

6 .3 - Creating a Pool

Below is the syntax required for creating a ceph pool:

ceph osd  pool create {pool-name} {pg-num} [{pgp-num}] [replicated] [crush-ruleset-name] ceph osd  pool create {pool-name} {pg-num} {pgp-num} erasure [erasure-code-profile]

A few things to note:

Variable

Description

Type

Required?

Default Value

pool-name

Name of the pool.  Must be unique.

String

Yes

pg-num

Total number of placement groups.

Integer

Yes

8

pgp-num

Total number of placement groups for placement purposes. pg-num=pgp- num

Integer

Yes

8

replicated| erasure

Pool type. Replication level will be set later, erasure-cod e profile needs to be defined at time of pool creation.

String

No

replicated

crush-rules et-name

Name  of a CRUSH ruleset to use for this pool. Ruleset must exist.

String

No

For replicated pools it is the ruleset specified by the osd pool default crush replicated ruleset  config variable. This ruleset must exist. For erasure pools it is erasure-cod e if the default erasure code profile is used or {pool-name} otherwise. This ruleset will be created implicitly if it doesn’t already exist.

erasure-cod e-profile

It must be an existing profile as defined by  osd erasure-cod e-profile set .

String

No

expected-nu m-objects

The expected number of objects for this pool.

Integer

No

0

Example:  This corresponds to a cluster with 120 OSDs, a replicated pool.

ceph osd  pool create tank_data 8192 8192 replicated ceph osd  pool create tank_metadata 256 256 replicated

CHAPTER 7 - UPGRADING A CEPH CLUSTER

There are two  types of updates when it comes to a Ceph cluster.

  1. Minor Updates

  2. Major Updates

Both types can be completed without cluster downtime, but release notes should be reviewed in both cases.

7.1 - MINOR UPDATES

Minor updates are minor bug fixes released every 4-6 months. These are quick updates that can be done safely by simply running a yum update .

If a new kernel is installed, a reboot will be required to take effect. If there is no kernel update you can stop here.

If there is a new kernel, set osd flag noout  and norebalance  to prevent the cluster from trying to heal itself while the nodes reboot one by one.

ceph osd set  flag noout ceph osd set  flag norebalance

Then reboot each node one at a time. Do not reboot the next node until the prior is up and back in the cluster. After each node is rebooted, unset the flags set earlier when you’re all done.

ceph osd unset  flag noout ceph osd unset  flag norebalance

7.2 - MAJOR UPDATES

Major updates are applied with ansible. To upgrade to the next major release edit the group_vars/all.yml  in the ceph-ansible-45d directory. In the INSTALL  heading, find the line ceph_stable_release , replace the existing release with the next stable version. For example:  updating from Mimic (13.2.X) to Nautilus (14.2.X)

[root@cephADMIN ~]# vim /root/ceph-ansible-45d-1.2/group_vars/all.yml >  Change “ceph_stable_release: mimic” >  To     “ceph_stable_release: nautilus”

Now, run the rolling-updates.yml  playbook.

[root @cephADMIN  ceph-ansible -45 d -1.2 ] # ansible-playbook infrastructure-playbooks/rolling-updates.yml

It will take some time, but all data is up and accessible during the update. You can verify all nodes are running the new version by running the following command:

[root @cephADMIN  ~] # ceph versions

[1]  This port is user specified.  It defaults to 8080