Change-Id: Ic47b19cc59d51d576f6e1e8549b76c8dbbdab19d Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Update pacemaker and corosync infrastructure (Corosync 2.x)
https://blueprints.launchpad.net/fuel/+spec/corosync-2
The next iteration of Corosync and Pacemaker improvements, driven by scaling requirements, the need for better Pacemaker management, and support for new operating systems.
Problem description
The current Pacemaker implementation has several limitations:
- Does not allow deploying a large number of OpenStack controllers
- Operations with the CIB utilize almost 100% of CPU on the controller
- The Corosync shutdown process takes a long time
- No support for newer OSes such as CentOS 7 or Ubuntu 14.04
- The current Fuel architecture is limited to Corosync 1.x and Pacemaker 1.x
- The Pacemaker service can run only as a plugin of the Corosync service; Pacemaker cannot be restarted separately from Corosync, and vice versa
- The Fuel fork of the corosync module contains many tunings for parallel deployment of controllers that cannot be contributed upstream yet because the code bases have diverged too far
Proposed change
- Support Fuel controllers with Corosync 2.3.3 and Pacemaker 1.1.12 packages for CentOS 6.5 and Ubuntu 14.04
- Run the Pacemaker service separately from Corosync (ver: 1)
- Take the puppet corosync module from Puppetlabs and integrate it. That would allow installing and configuring a Corosync cluster with Pacemaker without additional resources for code maintenance.
- Move all custom Fuel changes for the corosync and pacemaker providers to a separate pacemaker module, so that custom changes do not interfere with the upstream code.
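The integration described above could look roughly like the following Puppet sketch. This is illustrative only: the class parameter names follow the Puppetlabs/community corosync module and may differ between module versions, and the bind address fact and member IPs are assumptions, not values from this spec.

```puppet
# Sketch only: parameter names are those of the Puppetlabs corosync
# module and may vary between module versions.
class { 'corosync':
  enable_secauth => true,
  bind_address   => $ipaddress_br_mgmt,   # hypothetical management-network fact
  set_votequorum => true,                 # Corosync 2.x votequorum
  quorum_members => ['10.0.0.2', '10.0.0.3', '10.0.0.4'],  # example IPs
}

# With Corosync 2.x Pacemaker runs as its own service (ver: 1),
# so it is managed separately instead of as a Corosync plugin:
service { 'pacemaker':
  ensure  => running,
  enable  => true,
  require => Service['corosync'],
}
```

Keeping the custom Fuel providers in a separate pacemaker module means this corosync class stays close to upstream and can be updated without carrying local patches.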
Alternatives
- Continue to develop and support the Fuel fork of the corosync module in order to make it compatible with Corosync 2, without help from the Puppet community
- Leave Corosync 1.x infrastructure as is
Data model impact
None
REST API impact
None
Upgrade impact
- Corosync 2.x is NOT compatible with previous versions of Corosync [0]. Make sure to upgrade all nodes at once (full-downtime patching).
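For reference, the Corosync 2.x configuration differs structurally from 1.x: the `service { name: pacemaker; ver: 0 }` plugin stanza goes away, and quorum moves to the votequorum provider. A minimal illustrative fragment (addresses are examples, not values from this spec):

```
# Illustrative /etc/corosync/corosync.conf fragment for Corosync 2.x
totem {
  version: 2
  transport: udpu        # unicast UDP; no multicast required
}

nodelist {
  node {
    ring0_addr: 10.0.0.2
    nodeid: 1
  }
  node {
    ring0_addr: 10.0.0.3
    nodeid: 2
  }
}

quorum {
  provider: corosync_votequorum
}
```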
Security impact
None
Notifications impact
None
Other end user impact
- If the Corosync service is started or restarted, the Pacemaker service should be (re)started next as well; otherwise the inter-service communication layer would be broken.
- The Corosync service cannot be stopped gracefully before the Pacemaker service. When shutting down, the Pacemaker service should be stopped first.
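The required ordering can be summarized as follows (SysV-style service commands as found on CentOS 6.5 and Ubuntu 14.04; shown for illustration, to be run on each controller):

```
# Startup: corosync first, then pacemaker
service corosync start
service pacemaker start

# Shutdown: reverse order — pacemaker first, then corosync
service pacemaker stop
service corosync stop
```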
Performance Impact
- The deployment process will be improved and will require less time, as CIB operations will no longer consume 100% of CPU time
- Corosync 2 has many improvements that allow clusters of up to 100 controllers; Corosync 1.x scales up to only 10-16 nodes
Other deployer impact
None
Developer impact
- All changes to the custom pacemaker providers should go to the separate pacemaker module.
- Any changes not related to the providers should be made in the corosync module and contributed upstream as well.
Implementation
Assignee(s)
Primary assignees:
- sgolovatiuk@mirantis.com
- bdobrelya@mirantis.com
Other contributors:
- dilyin@mirantis.com
Work Items
- Replace the Corosync 1.x infrastructure with Corosync 2.3.3 and Pacemaker 1.1.12 on the staging mirrors
- Adapt the puppet modules for corosync and pacemaker to Corosync 2.x
- Synchronize the corosync manifests with Puppetlabs as well
- Push the staging mirrors to the public ones once the manifests are ready
Dependencies
- Corosync 2.3.3 and Pacemaker 1.1.12 packages along with their dependency libraries
Testing
- Standard swarm testing is required.
- Manual HA testing is required.
- Rally testing is preferred but not mandatory.
Acceptance criteria
- OpenStack clouds deployed by Fuel pass OSTF tests with Corosync 2.
Documentation Impact
- The High Availability guide should be reviewed. For Ubuntu, the crm tool stays as is, but the documentation should also be enhanced with pcs equivalents for CentOS.
- The upgrade/patching impact should be described: upgrading to Corosync 2.x assumes full downtime for the cloud.
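As a starting point for the crm-to-pcs documentation work, a few common equivalents could be listed (illustrative; the resource name `p_haproxy` is an example, and exact pcs command forms vary between pcs versions):

```
crm status                       # pcs status
crm resource restart p_haproxy   # pcs resource restart p_haproxy
crm node standby node-1          # pcs cluster standby node-1
```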