Add spec for scaling with the ansible inventory
Change-Id: I0cbc1620904acb149230cd5f295f2a17abd59146
parent 88b4a4a203
commit 91ccca4058
|
@ -0,0 +1,251 @@
|
|||
..
|
||||
This work is licensed under a Creative Commons Attribution 3.0 Unported
|
||||
License.
|
||||
|
||||
http://creativecommons.org/licenses/by/3.0/legalcode
|
||||
|
||||
==================================
|
||||
Scaling with the Ansible Inventory
|
||||
==================================
|
||||
|
||||
https://blueprints.launchpad.net/tripleo/scaling-with-Ansible-inventory
|
||||
|
||||
Scaling an existing deployment should be possible by adding new host
|
||||
definitions directly to the Ansible inventory, and not having to increase the
|
||||
<Role>Count parameters.
|
||||
|
||||
Problem Description
|
||||
===================
|
||||
|
||||
Currently, scaling a deployment requires a Heat stack update. The stack
update reflects the new desired node count of each role, which is then
represented in the generated Ansible inventory. The config-download process
uses that inventory file when it runs ansible-playbook to perform the
software configuration on each node.
|
||||
|
||||
Updating the Heat stack with the new desired node count has posed some
|
||||
scaling challenges. Heat creates a set of resources associated with each node.
|
||||
As the number of nodes in a deployment increases, Heat has more and more
|
||||
resources to manage.
|
||||
|
||||
As the stack size grows, Heat must either be tuned through configuration or
horizontally scaled with additional engine workers. However, scaling Heat
workers only helps up to a point, because eventually the workers of other
services, such as the database, messaging, or Keystone, need to be scaled as
well. Continually scaling worker processes consumes additional physical
resources.
|
||||
|
||||
Heat performance also degrades as stack size increases. Stack operations take
longer to complete as the node count grows, often many hours, which is usually
outside the range of typical maintenance windows.
|
||||
|
||||
It is also hard to predict what changes Heat will make. Often, no changes are
desired other than to scale out to new nodes. However, unintended template
changes or user error, such as forgetting to pass environment files, pose
additional unnecessary risk to the scaling operation.
|
||||
|
||||
|
||||
Proposed Change
|
||||
===============
|
||||
|
||||
Overview
|
||||
--------
|
||||
|
||||
The proposed change would allow users to add new node definitions directly to
the Ansible inventory by way of a new Heat parameter, so that services can be
scaled onto those new nodes. No change to the <Role>Count parameters would be
required.
|
||||
|
||||
A minimum set of data would be required when adding a new node to the Ansible
|
||||
inventory. Presently, this includes the TripleO role, and an IP address on each
|
||||
network that is used by that role.
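
As a minimal sketch of that requirement, the check below validates one node
definition against the networks used by its role. The dictionary layout
(``role``, ``ips``), the ``role_networks`` mapping, and the function name are
assumptions made for illustration; the spec does not define a schema.

```python
from ipaddress import ip_address


def validate_node(node, role_networks):
    """Return a list of problems found in a single node definition.

    node: hypothetical dict with "role" and "ips" (network name -> IP string).
    role_networks: hypothetical dict mapping each defined role to its networks.
    """
    role = node.get("role")
    if role not in role_networks:
        # Only already defined roles can be scaled with this method.
        return [f"unknown role: {role!r}"]

    problems = []
    ips = node.get("ips", {})
    for network in role_networks[role]:
        if network not in ips:
            problems.append(f"missing IP on network {network!r}")
            continue
        try:
            ip_address(ips[network])
        except ValueError:
            problems.append(f"invalid IP on network {network!r}: {ips[network]!r}")
    return problems


# Example data; role, network names, and addresses are made up.
role_networks = {"Compute": ["ctlplane", "internal_api", "tenant"]}
node = {
    "role": "Compute",
    "ips": {"ctlplane": "192.168.24.50", "internal_api": "172.16.2.50"},
}
print(validate_node(node, role_networks))
```

Running such a check before touching the inventory would catch a node that
lacks an address on one of its role's networks.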
|
||||
|
||||
Only scaling of already defined roles will be possible with this method.
Defining new roles would still require a full Heat stack update that defines
the new role.
|
||||
|
||||
Once the new node(s) are added to the inventory, ansible-playbook could be
|
||||
rerun with the config-download directory to scale the software services out
|
||||
on to the new nodes.
|
||||
|
||||
Because scaling would no longer increase the node count in the Heat stack,
Heat would not provision the new nodes. If baremetal provisioning is required
for them, this work depends on the nova-less-deploy work:
|
||||
|
||||
https://specs.openstack.org/openstack/tripleo-specs/specs/stein/nova-less-deploy.html
|
||||
|
||||
Once baremetal provisioning is migrated out of Heat with the above work, then
|
||||
new nodes can be provisioned with those new workflows before adding them
|
||||
directly to the Ansible inventory.
|
||||
|
||||
Since new nodes added directly to the Ansible inventory would still consume
IP addresses from the subnet ranges defined for the overcloud networks,
Neutron needs to be made aware of those assignments so that there are no
overlapping IP addresses. This could be done with a new interface in
tripleo-heat-templates that allows for specifying the extra node inventory
data. The parameter would be called ``ExtraInventoryData``. The templates would
take care of operating on that input and creating the appropriate Neutron ports
to correspond to the IP addresses specified in the data.
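
The overlap concern can be illustrated with a short sketch. It is not the
proposed Heat template logic and makes no Neutron API calls; it only shows, in
plain Python, how extra node addresses could be checked against subnet ranges
and existing allocations before ports are created. The data layout, the
function name, and the port dictionary keys are hypothetical.

```python
from ipaddress import ip_address, ip_network


def plan_ports(extra_nodes, subnets, allocated):
    """Split the IPs claimed by extra nodes into port definitions and conflicts.

    extra_nodes: hostname -> {network name: IP string} (hypothetical layout)
    subnets: network name -> CIDR string
    allocated: iterable of IP strings already in use on those networks
    """
    seen = {ip_address(a) for a in allocated}
    ports, conflicts = [], []
    for hostname in sorted(extra_nodes):
        for network, ip in sorted(extra_nodes[hostname].items()):
            addr = ip_address(ip)
            if addr not in ip_network(subnets[network]):
                conflicts.append((hostname, network, ip, "outside subnet"))
            elif addr in seen:
                conflicts.append((hostname, network, ip, "already allocated"))
            else:
                seen.add(addr)
                ports.append(
                    {"name": f"{hostname}_{network}", "network": network, "fixed_ip": ip}
                )
    return ports, conflicts


# Example data; all names and addresses are made up.
subnets = {"ctlplane": "192.168.24.0/24", "internal_api": "172.16.2.0/24"}
allocated = ["192.168.24.10", "172.16.2.10"]
extra_nodes = {
    "compute-10": {"ctlplane": "192.168.24.50", "internal_api": "172.16.2.10"},
}
ports, conflicts = plan_ports(extra_nodes, subnets, allocated)
print(ports)
print(conflicts)
```

In this example the ``internal_api`` address collides with an existing
allocation, so only the ``ctlplane`` address yields a port definition.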
|
||||
|
||||
When tripleo-ansible-inventory is used to generate the inventory, it would
|
||||
query Heat as it does today, but also layer in the extra inventory data as
|
||||
specified by ``ExtraInventoryData``. The resulting inventory would be a unified
|
||||
view of all nodes in the deployment.
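
The layering step might look roughly like the following. This is a sketch
under stated assumptions, not the tripleo-ansible-inventory implementation:
the inventory is modeled as a plain dictionary of role groups, and the
``<network>_ip`` host variable naming and the extra node layout are assumed
for the example.

```python
import copy


def merge_inventory(heat_inventory, extra_nodes):
    """Return a new inventory with extra nodes layered onto Heat-derived data.

    heat_inventory: role -> {"hosts": {hostname: hostvars}} (assumed layout)
    extra_nodes: list of {"hostname", "role", "ips"} dicts (assumed layout)
    """
    # Work on a copy so the Heat-derived data is left untouched.
    merged = copy.deepcopy(heat_inventory)
    for node in extra_nodes:
        role = node["role"]
        if role not in merged:
            # New roles still require a full Heat stack update.
            raise ValueError(f"role {role!r} is not defined in the Heat stack")
        hosts = merged[role].setdefault("hosts", {})
        if node["hostname"] in hosts:
            raise ValueError(f"host {node['hostname']!r} already exists")
        hosts[node["hostname"]] = {
            f"{network}_ip": ip for network, ip in node["ips"].items()
        }
    return merged


# Example data; hostnames and addresses are made up.
heat_inventory = {
    "Compute": {"hosts": {"compute-0": {"ctlplane_ip": "192.168.24.20"}}},
}
extra = [
    {"hostname": "compute-1", "role": "Compute", "ips": {"ctlplane": "192.168.24.21"}},
]
merged = merge_inventory(heat_inventory, extra)
print(sorted(merged["Compute"]["hosts"]))
```

Rejecting unknown roles and duplicate hostnames keeps the merged result
consistent with the constraint that only existing roles can be scaled.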
|
||||
|
||||
``ExtraInventoryData`` may be a list of files that are consumed with Heat's
|
||||
get_file function so that the deployer can keep their inventory data organized
|
||||
by file.
|
||||
|
||||
Alternatives
|
||||
------------
|
||||
|
||||
This change is primarily targeted at addressing scaling issues around the
|
||||
Heat stack operation. Alternative methods include using undercloud minions:
|
||||
|
||||
https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/undercloud_minion.html
|
||||
|
||||
Multi-stack/split-controlplane also addresses the issue somewhat by breaking up
|
||||
the deployment into smaller and more manageable stacks:
|
||||
|
||||
https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_compute_node.html
|
||||
|
||||
These alternatives are complementary to the proposed solution here, and all of
these solutions can be used together for the greatest benefit.
|
||||
|
||||
Direct manipulation of inventory data
|
||||
_____________________________________
|
||||
|
||||
Another alternative would be to not make use of any new interface in the
|
||||
templates such as the previously mentioned ``ExtraInventoryData``. Users could just
|
||||
update the inventory file manually, or drop inventory files in a specified
|
||||
location (since Ansible can use a directory as an inventory source).
|
||||
|
||||
The drawback of this approach is that another tool would be necessary to
create the associated ports in Neutron so that there are no overlapping IP
addresses. It could also be a manual step, although that is prone to error.
|
||||
|
||||
The advantage of this approach is that it would completely eliminate the stack
update operation as part of scaling. Not having any stack operation is
appealing in some regards because it removes the potential to forget
environment files or make other user errors (out of date templates, etc.).
|
||||
|
||||
Security Impact
|
||||
---------------
|
||||
|
||||
IP addresses and hostnames would potentially exist in user managed templates
that hold the value for ``ExtraInventoryData``; however, this is no different
from what is present today.
|
||||
|
||||
Upgrade Impact
|
||||
--------------
|
||||
|
||||
The upgrade process will need to be aware that not all nodes are represented in
|
||||
the Heat stack, and some will be represented only in the inventory. This should
|
||||
not be an issue as long as there is a consistent interface to get a single
|
||||
unified inventory as there exists now.
|
||||
|
||||
Any changes around creating the unified view of the inventory should be made
|
||||
within the implementation of that interface (tripleo-ansible-inventory) such
|
||||
that existing tooling continues to use an inventory that contains all nodes for
|
||||
a deployment.
|
||||
|
||||
Other End User Impact
|
||||
---------------------
|
||||
|
||||
Users will potentially have to manage additional environment files for the
|
||||
extra inventory data.
|
||||
|
||||
Performance Impact
|
||||
------------------
|
||||
|
||||
Performance should be improved during scale out operations.
|
||||
|
||||
However, it should be noted that Ansible will face scaling challenges as well.
|
||||
While this change does not directly introduce those new challenges, it may
|
||||
expose them more rapidly as it bypasses the Heat scaling challenges.
|
||||
|
||||
For example, simply adding hundreds or thousands of new nodes directly to the
Ansible inventory is not expected to guarantee that the scaling operation
succeeds. It would likely expose new scaling challenges in other tooling, such
as the playbook and role tasks or Ansible itself.
|
||||
|
||||
Other Deployer Impact
|
||||
---------------------
|
||||
|
||||
Since this proposal is meant to align with the nova-less-deploy work, all nodes
(whether they are known to Heat or not) would be unprovisioned if the
deployment is deleted.
|
||||
|
||||
If using pre-provisioned nodes, deleting the Heat stack does not actually
"undeploy" any software. This proposal does not change that behavior.
|
||||
|
||||
Developer Impact
|
||||
----------------
|
||||
|
||||
Developers could more quickly test scaling by bypassing the Heat stack update
|
||||
completely if desired, or using the ``ExtraInventoryData`` interface.
|
||||
|
||||
|
||||
Implementation
|
||||
==============
|
||||
|
||||
Assignee(s)
|
||||
-----------
|
||||
|
||||
Primary assignee:
|
||||
James Slagle <jslagle@redhat.com>
|
||||
|
||||
Work Items
|
||||
----------
|
||||
|
||||
* Add new parameter ``ExtraInventoryData``
|
||||
|
||||
* Add Heat processing of ``ExtraInventoryData``

  * create Neutron ports

  * add stack outputs
|
||||
|
||||
* Update tripleo-ansible-inventory to consume from added stack outputs
|
||||
|
||||
* Update HostsEntry to be generic
|
||||
|
||||
Dependencies
|
||||
============
|
||||
|
||||
* Depends on the nova-less-deploy work for baremetal provisioning outside of
  Heat. This dependency does not apply when using pre-provisioned nodes.
|
||||
|
||||
* All deployment configurations coming out of Heat need to be generic per role.
  Most of this work was completed in Train; however, this should be reviewed.
  For example, the HostsEntry data is still static and Heat is calculating the
  node list. This data needs to be moved to an Ansible template.
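
To illustrate what deriving the hosts entries from the inventory could mean,
the sketch below renders entries from inventory data at run time instead of
relying on a node list calculated by Heat. It is written in Python only for
illustration, whereas the spec calls for an Ansible template; the inventory
layout, the ``<network>_ip`` variable naming, the entry format, and the default
domain are all assumptions.

```python
def render_hosts_entries(inventory, domain="localdomain"):
    """Render hosts-file style lines for every host in the inventory.

    inventory: role -> {"hosts": {hostname: hostvars}} (assumed layout), where
    hostvars holds one "<network>_ip" key per network (assumed naming).
    """
    suffix = "_ip"
    lines = []
    for role in sorted(inventory):
        hosts = inventory[role].get("hosts", {})
        for hostname in sorted(hosts):
            hostvars = hosts[hostname]
            for key in sorted(hostvars):
                if not key.endswith(suffix):
                    continue
                network = key[: -len(suffix)]
                ip = hostvars[key]
                lines.append(
                    f"{ip} {hostname}.{network}.{domain} {hostname}.{network}"
                )
    return "\n".join(lines)


# Example data; hostnames and addresses are made up.
inventory = {
    "Compute": {
        "hosts": {
            "compute-0": {"ctlplane_ip": "192.168.24.20"},
            "compute-1": {"ctlplane_ip": "192.168.24.21"},
        }
    }
}
print(render_hosts_entries(inventory))
```

Because the entries are computed from whatever hosts the inventory contains,
nodes added outside of Heat would be included without any stack update.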
|
||||
|
||||
|
||||
Testing
|
||||
=======
|
||||
|
||||
Scaling is not currently tested in CI; however, this change may make such
testing feasible.
|
||||
|
||||
Manual test plans and other test automation would need to be updated to also
|
||||
test scaling with ``ExtraInventoryData``.
|
||||
|
||||
|
||||
Documentation Impact
|
||||
====================
|
||||
|
||||
Documentation needs to be added for ``ExtraInventoryData``.
|
||||
|
||||
The feature should also be fully explained, since users and deployers need to
be made aware that some nodes may not be represented in the Heat stack.
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
* https://specs.openstack.org/openstack/tripleo-specs/specs/stein/nova-less-deploy.html
|
||||
* https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/undercloud_minion.html
|
||||
* https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_compute_node.html