
Ceph upgrade from Mimic to Nautilus

Storyboard: https://storyboard.openstack.org/#!/story/2009074

This story covers the upgrade of Ceph from Mimic to Nautilus. The upgrade also includes code and configuration changes in StarlingX components that are needed to support Nautilus.

Official instructions on how to migrate from Mimic to Nautilus can be found in [1].

Problem description

Mimic reached end of life on 2020-07-22. We need to move to an actively supported release that provides an automated migration path between versions (i.e., when MON/OSD/MDS daemons start on the new version, on-disk service data is migrated to the new formats as required).

This will require evaluating the historic HA reliability code and removing/retiring unneeded code, enabling newly supported default features (BlueStore, systemd service files, ceph-volume instead of ceph-disk for OSD deployment), and easing future upgrades.
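The upstream procedure finalizes the upgrade only after every daemon reports the new release, which `ceph versions` summarizes. A minimal sketch of such a check, assuming JSON output in the shape that command produces (the sample counts below are made up):

```python
import json

def all_on_release(versions_json: str, release: str) -> bool:
    """Return True if every reported daemon runs the named release."""
    versions = json.loads(versions_json)
    overall = versions.get("overall", {})
    return bool(overall) and all(release in v for v in overall)

# Sample in the shape of `ceph versions` output; counts are made up.
sample = json.dumps({
    "mon": {"ceph version 14.2.22 nautilus (stable)": 3},
    "osd": {"ceph version 14.2.22 nautilus (stable)": 6},
    "overall": {"ceph version 14.2.22 nautilus (stable)": 9},
})

print(all_on_release(sample, "nautilus"))  # True: safe to finalize
```

A mixed `overall` section (some daemons still on Mimic) would return False, signaling the upgrade is not yet complete.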

Use Cases

Users should retain access to all current storage features without noticing a difference between Ceph versions.

Proposed change

First, we should focus on building a StarlingX ISO with Ceph Nautilus.

Other choices such as Octopus or Pacific are ruled out because we want to align with what is currently supported by Debian Bullseye, which is Nautilus. In addition, Pacific only supports migration from Octopus or Nautilus [2].

After the image is built, we can evaluate the downstream changes made to Ceph Mimic and port those that are needed to the downstream Ceph Nautilus. Next, we should be able to successfully install an AIO-SX system with Ceph Nautilus built in.

Integration of the Ceph subsystems with the new Ceph version will need to be verified. The subsystems are:

  • ceph-manager
  • python-cephclient
  • mgr-restful-plugin
  • puppet
  • ansible playbooks
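As one example of the integration work, mgr-restful-plugin exposes Ceph operations over an authenticated REST API. A hedged sketch of building such a request with the standard library, where the host, port, endpoint, and credentials are all illustrative placeholders (real credentials come from the deployed system, e.g. via `ceph restful create-key`):

```python
import base64
import urllib.request

def restful_request(host: str, port: int, endpoint: str,
                    user: str, key: str) -> urllib.request.Request:
    """Build a basic-auth request for the mgr restful API.

    The host, port, endpoint, and credentials are illustrative
    placeholders, not values from an actual StarlingX deployment.
    """
    url = f"https://{host}:{port}{endpoint}"
    token = base64.b64encode(f"{user}:{key}".encode()).decode()
    return urllib.request.Request(url, headers={
        "Authorization": f"Basic {token}",
        "Accept": "application/json",
    })

req = restful_request("controller-0", 8003, "/osd", "admin", "secret")
print(req.full_url)
```

Verifying this plumbing against Nautilus (endpoint paths, response schemas, TLS setup) is part of the subsystem integration check above.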

New features enablement

Having an ISO, we can verify the enablement of some new features such as:

  • Switch OSDs to BlueStore from FileStore
    • BlueStore has been the default backend for OSDs since Luminous and improves performance over the older FileStore backend. More details can be found in [3].
  • Switch from sysvinit services/HA scripts to systemd/HA driven services
    • Ceph upstream uses systemd to control ceph process initialization. This was disabled downstream to maintain the historical (Ceph Hammer and Ceph Jewel) sysvinit script optimizations.
  • Migrate OSD deployment to use ceph-volume due to ceph-disk deprecation
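Before switching backends, it helps to know which backend each OSD currently uses; `ceph osd metadata` reports this in its `osd_objectstore` field. A small sketch of tallying the mix, assuming JSON output in that shape (the sample OSDs below are made up):

```python
import json
from collections import Counter

def objectstore_mix(metadata_json: str) -> Counter:
    """Count OSD backends from `ceph osd metadata --format json` output."""
    return Counter(osd.get("osd_objectstore", "unknown")
                   for osd in json.loads(metadata_json))

# Sample in the shape of `ceph osd metadata`; IDs and mix are made up.
sample = json.dumps([
    {"id": 0, "osd_objectstore": "filestore"},
    {"id": 1, "osd_objectstore": "filestore"},
    {"id": 2, "osd_objectstore": "bluestore"},
])

print(objectstore_mix(sample))
```

A non-empty count for both backends would indicate a cluster mid-migration, which is exactly the coexistence question raised in the work items.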

Investigate differences between ceph versions

Some commands may have been changed or deprecated in the migration from Mimic to Nautilus. We will need to identify those commands, determine what the differences are, and assess their impact on the overall system. The impacts might be felt in the following projects/modules:

  • config:
    • sysinv/cgts-client
    • sysinv/sysinv
  • stx-puppet:
    • puppet-manifests/src/modules/platform/manifests/ceph.pp
  • utilities:
    • ceph/ceph-manager
    • ceph/python-cephclient
  • integ:
    • ceph
    • config/puppet-modules/openstack/puppet-ceph-2.2.0 - upgrade to 3.1.1
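One way the affected modules could cope with command differences during the transition is to branch on the running release, which `ceph version` reports (Mimic is v13.x, Nautilus is v14.x). A sketch of that idea; the helper below is hypothetical, not existing sysinv or python-cephclient code:

```python
import re

# Major version to release name, for the releases this spec cares about.
RELEASES = {13: "mimic", 14: "nautilus"}

def detect_release(version_line: str) -> str:
    """Map `ceph version` output to a release name (or 'unknown')."""
    m = re.search(r"ceph version (\d+)\.", version_line)
    if not m:
        return "unknown"
    return RELEASES.get(int(m.group(1)), "unknown")

print(detect_release("ceph version 14.2.22 (abc) nautilus (stable)"))  # nautilus
```

Such a check would let callers pick the Mimic or Nautilus form of a changed command at runtime, once the investigation above has catalogued the differences.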

Alternatives

Ceph Octopus might be an alternative if it appears in the Debian Bullseye package list and if time permits.

Data model impact

N/A

REST API impact

Impact will depend on the command changes required for Nautilus.

Security impact

N/A

Other end user impact

N/A

Performance Impact

  • Switching OSDs from FileStore to BlueStore should improve performance.
  • Replacing ceph-disk with ceph-volume should increase reliability and improve performance. Details to be verified in [4].

Other deployer impact

N/A

Developer impact

N/A

Upgrade impact

Upgrading to subsequent releases should become simpler. The new features to be enabled should provide a better user experience.

Implementation

Assignee(s)

Primary assignee:

Vinícius Lopes da Silva (viniciuslopesdasilva)

Other contributors:
  • Delfino Gomes Curado Filho (dcuradof)
  • Felipe Sanches Zanoni (fsanches)
  • Mauricio Biasi do Monte Carmelo (mbiasido)
  • Thiago Oliveira Miranda (thiagooliveiramiranda)
  • Alan Kyoshi (akyoshi)
  • Daniel Pinto Barros (dbarros)

Repos Impacted

  • config
  • integ
  • stx-puppet
  • ha
  • ansible-playbook
  • utilities

Work Items

  • Verify compatibility between Nautilus and Mimic. According to the upgrade compatibility notes, some commands have changed between versions, and we should assess their impact on the current implementation.
  • Current OSDs are FileStore based, while Nautilus defaults to BlueStore OSDs. We need to determine the feasibility of migrating from FileStore to BlueStore OSDs, and whether FileStore and BlueStore OSDs can coexist.
  • Ceph's default use of systemd to control ceph process initialization is currently disabled. It should be re-enabled, and the changes required in the init script and pmon should be evaluated.
  • ceph-disk is currently used to deploy OSDs, but it is deprecated and should be replaced by ceph-volume. This will require an investigation of the impacts of the change. In the worst case, ceph-disk can still be used, since it remains available through the Ceph Pacific release (the latest to date).
  • Evaluate the code from the current patch set applied on Mimic and port the relevant patches to the Nautilus branch.
  • Ensure that the integration between Ceph and its subsystems (ceph-manager, python-cephclient, mgr-restful-plugin, Puppet code, ansible-playbooks) works correctly.
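The restart ordering implied by these items follows the upstream procedure [1]: set `noout`, restart MONs, then MGRs, OSDs, and MDSs, and only then finalize with `require-osd-release`. A sketch of building one possible ordered plan, assuming systemd units named `ceph-<role>@<id>` (the hosts and IDs below are made up):

```python
def upgrade_plan(hosts: dict) -> list:
    """Build an ordered list of commands for a rolling daemon restart.

    One possible ordering per the upstream Mimic-to-Nautilus notes;
    unit names and the hosts dict layout are illustrative assumptions.
    """
    plan = [["ceph", "osd", "set", "noout"]]
    for role in ("mon", "mgr", "osd", "mds"):
        for unit in hosts.get(role, []):
            plan.append(["systemctl", "restart", f"ceph-{role}@{unit}"])
    plan += [
        ["ceph", "osd", "require-osd-release", "nautilus"],
        ["ceph", "osd", "unset", "noout"],
    ]
    return plan

steps = upgrade_plan({"mon": ["controller-0"], "osd": ["0", "1"]})
for step in steps:
    print(" ".join(step))
```

Whatever orchestrates this in practice (Puppet, Ansible, or manual procedure) would be decided during the work items above; the point is only that the ordering constraints are explicit.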

Dependencies

N/A

Testing

All validation activities should pass Sanity/Storage regression tests.

Standard configurations scenarios

  • AIO-SX
  • AIO-DX
  • Standard 2C+2W
  • Storage 2C+2S+2W
  • Storage Tiers - Can be done on AIO-SX, should be valid across all installs

Additional scenarios

  • SSD Journal Disks - Use SSD journal disks and validate proper configuration in a storage lab
  • Peer Groups - Provision system with up to 8 (replication 2) and 9 (replication 3) storage hosts
  • OSD disk replacement - Validate OSD disk replacement procedure

Backup and restore scenarios

  • B&R - AIO-SX
  • B&R - AIO-DX
  • B&R - Standard 2C+2W
  • B&R - Storage 2C+2S+2W

Documentation Impact

The changes to be made shouldn't interfere with system usage, so no documentation changes are expected at this time.

References


  1. https://docs.ceph.com/en/latest/releases/nautilus/#upgrading-from-mimic-or-luminous

  2. https://docs.ceph.com/en/latest/releases/pacific/#upgrade-from-pre-nautilus-releases-like-mimic-or-luminous

  3. https://ceph.io/en/news/blog/2017/new-luminous-bluestore/

  4. https://docs.ceph.com/en/latest/ceph-volume/intro/#ceph-disk-replaced