Merge "Update documentation on Ceph partitioning"

Zuul 2019-02-18 15:33:21 +00:00 committed by Gerrit Code Review
commit 48eb6ed0a4
1 changed file with 28 additions and 172 deletions


Control Plane Ceph Cluster Notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Configuration variables for the Ceph control plane are located in:

- ``site/${NEW_SITE}/software/charts/ucp/ceph/ceph-osd.yaml``
- ``site/${NEW_SITE}/software/charts/ucp/ceph/ceph-client.yaml``
Setting highlights:

- data/values/conf/storage/osd[\*]/data/location: The block device that
  will be formatted by the Ceph chart and used as a Ceph OSD disk
- data/values/conf/storage/osd[\*]/journal/location: The block device
  backing the ceph journal used by this OSD. Refer to the journal
  paradigm below.
- data/values/conf/pool/target/osd: Number of OSD disks on each node
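
To make these paths concrete, here is a minimal sketch of how they appear
as overrides in ``ceph-osd.yaml``. The device names are placeholders; a
complete, site-specific example is shown later in this section:

::

  data:
    values:
      conf:
        storage:
          osd:
            - data:
                type: block-logical
                location: /dev/sdX    # placeholder OSD data disk
              journal:
                type: block-logical
                location: /dev/sdY1   # placeholder journal partition
        pool:
          target:
            osd: 1                    # number of OSD disks on each node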
Assumptions:

1. Ceph OSD disks are not configured for any type of RAID; they
   are configured as JBOD when connected through a RAID controller.
   If the RAID controller does not support JBOD, put each disk in its
   own RAID-0 and enable RAID cache and write-back cache if the
   RAID controller supports it.
2. Ceph disk mapping, disk layout, journal and OSD setup is the same
   across Ceph nodes, with only their role differing. Out of the 4
   control plane nodes, we expect to have 3 actively participating in
   the Ceph quorum, and the remaining node designated as a standby
   Ceph node which uses a different control plane profile
   (cp\_*-secondary) than the other three (cp\_*-primary).
3. If doing a fresh install, disks are unlabeled or not labeled from a
   previous Ceph install, so that the Ceph chart will not fail disk
   initialization.
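
If a disk does carry labels or partitions from a previous install, it can
be cleared before deployment. The following is only an illustrative (and
destructive) example, assuming ``/dev/sde`` is an OSD disk to be wiped:

::

  # remove all filesystem, RAID and partition-table signatures (destroys data)
  sudo wipefs --all /dev/sde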
It is highly recommended to use SSD devices for Ceph journal partitions.

If you have an operating system available on the target hardware, you
can determine which devices are HDDs and which are SSDs by checking each
disk's rotational flag, where a value of ``0`` indicates a non-spinning
disk (i.e. SSD). (Note: some SSDs still report a value of ``1``, so it is
best to go by your server specifications.)
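
One way to list this flag for every disk (an illustrative command,
assuming the ``lsblk`` utility from util-linux is available on the host):

::

  # ROTA=1 indicates a spinning disk (HDD), ROTA=0 an SSD
  lsblk --nodeps --output NAME,ROTA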
For OSDs, pass in the whole block device (e.g., ``/dev/sdd``), and the
Ceph chart will take care of disk partitioning, formatting, mounting,
etc.

For Ceph journals, you can pass in a specific partition (e.g.,
``/dev/sdb1``). It is not required to pre-create these partitions; the
Ceph chart will create journal partitions automatically if they do not
exist. By default the size of every journal partition is 10G, so make
sure there is enough space available on the journal devices to allocate
all journal partitions.
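
With four OSD disks sharing two journal SSDs, as in the example below,
each SSD must hold two 10G journal partitions, i.e. at least 20G of free
space per SSD. A quick way to confirm the journal device sizes (device
names taken from the example below):

::

  # print the size of the journal SSDs in bytes
  lsblk --nodeps --bytes --output NAME,SIZE /dev/sdb /dev/sdc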
Consider the following example where:

- /dev/sda is an operating system RAID-1 device (SSDs for OS root)
- /dev/sd[bc] are SSDs for ceph journals
- /dev/sd[efgh] are HDDs for OSDs

The data section of this file would look like:
:: ::
  data:
    values:
      conf:
        storage:
          osd:
            - data:
                type: block-logical
                location: /dev/sde
              journal:
                type: block-logical
                location: /dev/sdb1
            - data:
                type: block-logical
                location: /dev/sdf
              journal:
                type: block-logical
                location: /dev/sdb2
            - data:
                type: block-logical
                location: /dev/sdg
              journal:
                type: block-logical
                location: /dev/sdc1
            - data:
                type: block-logical
                location: /dev/sdh
              journal:
                type: block-logical
                location: /dev/sdc2
        pool:
          target:
            osd: 4
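
Once the site has been deployed, the resulting OSD and journal layout can
be sanity-checked with standard Ceph tooling. This is only an illustrative
check and assumes the ``ceph`` CLI and admin keyring are reachable, for
example from inside a ceph-mon pod:

::

  # overall cluster health, OSD count, and pool status
  ceph -s

  # verify that each node reports its expected number of OSDs, up and in
  ceph osd tree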
Update Passphrases
~~~~~~~~~~~~~~~~~~
permission denied errors from apparmor when the MaaS container tries to
leverage libc6 for /bin/sh when MaaS container ntpd is forcefully
disabled.
Promenade bootstrap
~~~~~~~~~~~~~~~~~~~