From ea43be77f9c4ccc91d7eb1834878e2740e444d75 Mon Sep 17 00:00:00 2001 From: Sergii Golovatiuk Date: Wed, 24 Sep 2014 11:32:34 -0700 Subject: [PATCH] Specification for pacemaker-improvements Change-Id: Ic3fe2d279e46062951f14ca0be50deb990eb0190 --- specs/6.0/pacemaker-improvements.rst | 240 +++++++++++++++++++++++++++ 1 file changed, 240 insertions(+) create mode 100644 specs/6.0/pacemaker-improvements.rst diff --git a/specs/6.0/pacemaker-improvements.rst b/specs/6.0/pacemaker-improvements.rst new file mode 100644 index 00000000..eecd6d07 --- /dev/null +++ b/specs/6.0/pacemaker-improvements.rst @@ -0,0 +1,240 @@ +.. + This work is licensed under a Creative Commons Attribution 3.0 Unported + License. + + http://creativecommons.org/licenses/by/3.0/legalcode + +========================================== +Improve Corosync and Pacemaker management +========================================== + +https://blueprints.launchpad.net/fuel/+spec/pacemaker-improvements [1]_ + +A next iteration of Corosync & Pacemaker improvements required by scaling +requirements, better Pacemaker management and new OS support. + +Problem description +=================== + +The current Pacemaker implementation has some limitations: + +* Doesn't allow to deploy a large amount of OpenStack Controllers + +* Operations with CIB utilizes almost 100% of CPU on the Controller + +* Corosync shutdown process takes a lot of time + +* No support of new OSes as CentOS 7 or Ubuntu 14.04 + +* Current Fuel Architecture is limited to Corosync 1.x and Pacemaker 1.x + +* Puppet service provider for pacemaker doesn't disable Upstart or SystemV + services by default + +* At current implementation ordering between resources is not specified + +* Diff operations against Corosync CIB require to save data to file rather + than keep all data in memory + +* Debug process of OCF scripts is not unified requires a lot of actions from + Cloud Operator + +* Not granular enough + +* Openstack services are not managed by Pacemaker + +* Compute nodes aren't in Pacemaker cluster, hence, are lacking a viable + control plane for their's compute/nova services. + +Proposed change +=============== + +* Support Fuel Controllers with Corosync 2.0 packages + +* Get the puppet corosync module from puppetlabs and integrate it + +* Rename OCF resources. Remove __old from resource names + +* Refactor service provider and include disabling of the same services under + systemd/upstart/system v + +* Refactor provider and remove diff operation from files + +* Add wrapper handler for OCF scripts or unify debug handling of OCF scripts + +* Move pacemaker & corosync installation to own stage. Create own corosync.pp + to make it more granular + +Permissive change: + +* Add all openstack services to pacemaker and make ordering + +* Use monit as compute nodes' services additional control plane + +Alternatives +------------ + +All changes are not critical and doesn't affect deployment or Cluster +Operation + +Data model impact +----------------- + +None + +REST API impact +--------------- + +None + +Upgrade impact +-------------- + +* Since Resources will be renamed Upgrade process should delete old resources + on upgrade and delete new resource names on roll back. + +* Corosync 2.x is NOT compatible with previous versions of Corosync (1.3/1.4). + Please make sure to upgrade all nodes at once (full-downtime patching) + +Security impact +--------------- + +None + +Notifications impact +-------------------- + +None + +Other end user impact +--------------------- + +None + +Performance Impact +------------------ + +* Deployment process will be improved and will require less time as CIB + operations will not require 100% CPU time + +* Corosync 2.0 has a lot of improvements that allow to have up to 100 + Controllers. Corosync 1.0 scales up to 10-16 node + +Other deployer impact +--------------------- + +None + +Developer impact +---------------- + +* Enchanced pacemaker provider requires some refactoring of puppet manifests + in Fuel Library manifests: + + - Upstream corosync manifests will replace our in-memory diff invention to + standard approach: crm or pcs or cibadmin --patch '' directly. + + - Renaming vip primitives could require additional orchestration refactoring + as well. + +* New Pacemaker/monit control plane for Openstack services would require + appropriate changes in manifests as well. + +Implementation +============== + +Assignee(s) +----------- + +Primary assignee: +* sgolovatiuk@mirantis.com +* bdobrelya@mirantis.com + +Other contributors: +* dilyin@mirantis.com +* vkuklin@mirantis.com +* svasilenko@mirantis.com + +Work Items +---------- + +Mandatory items: + +* Replace Corosync 1.0 with Corosync 2.0 + +* Synchronize corosync manifest with puppetlabs + +* Refactor puppet service core provider. It should: + + - Disable systemd/upstart/system V when corosync system + provider is enabled + +* Redesing puppet manifests to start all OCF scripts via + Wrapper + +Permissive items: + +* Add openstack services to Pacemaker + +* Configure ordering between services in Pacemaker + +* Configure monit for compute nodes' Openstack services + +Dependencies +============ + +* Corosync 2.x packages + +* Monit packages + +Testing +======= + +* Standard swarm testing are required. + +* Manual HA testing is required. + +* Rally testing is preffered but not mandatory. + +* New control plane for Openstack services requires manual testing. + +* New debug wrappers for OCF require manual testing. + +Acceptance criteria +------------------- + +* Openstack clouds deployed by Fuel are passing OSTF tests with + Corosync 2.0 and new Pacemaker/monit control plane for services, + if any. + +* Debug wrappers for OCF do produce enough information but aren't too + verbouse as well. + +* VIP resources do not contain an _old postfix in their names. + +* Upstart/system V control plane is disabled for services managed via + Pacemaker OCF. + +Documentation Impact +==================== + +* High Availability guide should be reviewed. For Ubuntu, crm tool stays + as is, but documentation should be as well enhanced with pcs + equivivalents for Centos + +* Upgrade/Patching impact should be described - corosync 2.0 upgrading + assumes full downtime for cloud + +* Changes to OCF debugging approach with bash wrappers should be described + +* Renaming of VIP resources should be mentioned + +* In case of Openstack services become managed by Pacemaker + monit, related + changes for their new control plane should be described + +References +========== + +.. [0] http://lists.corosync.org/pipermail/discuss/2012-April/001456.html +.. [1] https://blueprints.launchpad.net/fuel/+spec/pacemaker-improvements +