Add spec for scaling with the ansible inventory
Change-Id: I0cbc1620904acb149230cd5f295f2a17abd59146
This commit is contained in:
parent
88b4a4a203
commit
91ccca4058
|
@ -0,0 +1,251 @@
|
||||||
|
..
|
||||||
|
This work is licensed under a Creative Commons Attribution 3.0 Unported
|
||||||
|
License.
|
||||||
|
|
||||||
|
http://creativecommons.org/licenses/by/3.0/legalcode
|
||||||
|
|
||||||
|
==================================
|
||||||
|
Scaling with the Ansible Inventory
|
||||||
|
==================================
|
||||||
|
|
||||||
|
https://blueprints.launchpad.net/tripleo/scaling-with-Ansible-inventory
|
||||||
|
|
||||||
|
Scaling an existing deployment should be possible by adding new host
|
||||||
|
definitions directly to the Ansible inventory, and not having to increase the
|
||||||
|
<Role>Count parameters.
|
||||||
|
|
||||||
|
Problem Description
|
||||||
|
===================
|
||||||
|
|
||||||
|
Currently to scale a deployment, a Heat stack update is required. The stack
|
||||||
|
update reflects the new desired node count of each role, which is then
|
||||||
|
represented in the generated Ansible inventory. The inventory file is then used
|
||||||
|
by the config-download process when ansible-playbook is executed to perform the
|
||||||
|
software configuration on each node.
|
||||||
|
|
||||||
|
Updating the Heat stack with the new desired node count has posed some
|
||||||
|
scaling challenges. Heat creates a set of resources associated with each node.
|
||||||
|
As the number of nodes in a deployment increases, Heat has more and more
|
||||||
|
resources to manage.
|
||||||
|
|
||||||
|
As the stack size grows, Heat must be tuned with software configurations or
|
||||||
|
horizontally scaled with additional engine workers. However, horizontal scaling
|
||||||
|
of Heat workers will only help so much as eventually other service workers
|
||||||
|
would need to be scaled as well, such as database, messaging, or Keystone
|
||||||
|
worker process. Having to increasingly scale worker processes results in
|
||||||
|
additional physical resource consumption.
|
||||||
|
|
||||||
|
Heat performance also begins to degrade as stack size increases. It takes
|
||||||
|
longer and longer for stack operations to complete as node count increases. The
|
||||||
|
stack operation time often reaches into taking many hours, which is usually
|
||||||
|
outside the range of typical maintenance windows.
|
||||||
|
|
||||||
|
It is also hard to predict what changes Heat will make. Often, no changes are
|
||||||
|
desired other than to scale out to new nodes. However, unintended template
|
||||||
|
changes or user error around forgetting to pass environment files poses
|
||||||
|
additional unnecessary risk to the scaling operation.
|
||||||
|
|
||||||
|
|
||||||
|
Proposed Change
|
||||||
|
===============
|
||||||
|
|
||||||
|
Overview
|
||||||
|
--------
|
||||||
|
|
||||||
|
The proposed change would allow for users to directly add new node definitions
|
||||||
|
to the Ansible inventory by way of a new Heat parameter to allow for scaling
|
||||||
|
services onto those new nodes. No change in the <Role>Count parameters would be
|
||||||
|
required.
|
||||||
|
|
||||||
|
A minimum set of data would be required when adding a new node to the Ansible
|
||||||
|
inventory. Presently, this includes the TripleO role, and an IP address on each
|
||||||
|
network that is used by that role.
|
||||||
|
|
||||||
|
Only scaling of already defined roles will be possible with this method.
|
||||||
|
Defining new roles would still require a full Heat stack update which defined
|
||||||
|
the new role.
|
||||||
|
|
||||||
|
Once the new node(s) are added to the inventory, ansible-playbook could be
|
||||||
|
rerun with the config-download directory to scale the software services out
|
||||||
|
on to the new nodes.
|
||||||
|
|
||||||
|
As increasing the node count in the Heat stack operation won't be necessary
|
||||||
|
when scaling, if baremetal provisioning is required for the new nodes, then
|
||||||
|
this work depends on the nova-less-deploy work:
|
||||||
|
|
||||||
|
https://specs.openstack.org/openstack/tripleo-specs/specs/stein/nova-less-deploy.html
|
||||||
|
|
||||||
|
Once baremetal provisioning is migrated out of Heat with the above work, then
|
||||||
|
new nodes can be provisioned with those new workflows before adding them
|
||||||
|
directly to the Ansible inventory.
|
||||||
|
|
||||||
|
Since new nodes added directly to the Ansible inventory would still be
|
||||||
|
consuming IP's from the subnet ranges defined for the overcloud networks,
|
||||||
|
Neutron needs to be made aware of those assignments so that there are no
|
||||||
|
overlapping IP addresses. This could be done with a new interface in
|
||||||
|
tripleo-heat-templates that allows for specifying the extra node inventory
|
||||||
|
data. The parameter would be called ``ExtraInventoryData``. The templates would
|
||||||
|
take care of operating on that input and creating the appropriate Neutron ports
|
||||||
|
to correspond to the IP addresses specified in the data.
|
||||||
|
|
||||||
|
When tripleo-ansible-inventory is used to generate the inventory, it would
|
||||||
|
query Heat as it does today, but also layer in the extra inventory data as
|
||||||
|
specified by ``ExtraInventoryData``. The resulting inventory would be a unified
|
||||||
|
view of all nodes in the deployment.
|
||||||
|
|
||||||
|
``ExtraInventoryData`` may be a list of files that are consumed with Heat's
|
||||||
|
get_file function so that the deployer can keep their inventory data organized
|
||||||
|
by file.
|
||||||
|
|
||||||
|
Alternatives
|
||||||
|
------------
|
||||||
|
|
||||||
|
This change is primarily targeted at addressing scaling issues around the
|
||||||
|
Heat stack operation. Alternative methods include using undercloud minions:
|
||||||
|
|
||||||
|
https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/undercloud_minion.html
|
||||||
|
|
||||||
|
Multi-stack/split-controlplane also addresses the issue somewhat by breaking up
|
||||||
|
the deployment into smaller and more manageable stacks:
|
||||||
|
|
||||||
|
https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_compute_node.html
|
||||||
|
|
||||||
|
These alternatives are complimentary to the proposed solution here, and all of
|
||||||
|
these solutions can be used together for the greatest benefits.
|
||||||
|
|
||||||
|
Direct manipulation of inventory data
|
||||||
|
_____________________________________
|
||||||
|
|
||||||
|
Another alternative would be to not make use of any new interface in the
|
||||||
|
templates such as the previously mentioned ``ExtraInventoryData``. Users could just
|
||||||
|
update the inventory file manually, or drop inventory files in a specified
|
||||||
|
location (since Ansible can use a directory as an inventory source).
|
||||||
|
|
||||||
|
The drawbacks to this approach are that another tool would be necessary to
|
||||||
|
create associated ports in Neutron so that there are no overlapping IP
|
||||||
|
addresses. It could also be a manual step, although that is prone to error.
|
||||||
|
|
||||||
|
The advantages to this approach is that it would completely eliminate the stack
|
||||||
|
update operation as part of the scaling. Not having any stack operation is
|
||||||
|
appealing in some regards due to the potential to forget environment files or
|
||||||
|
other user error (out of date templates, etc).
|
||||||
|
|
||||||
|
Security Impact
|
||||||
|
---------------
|
||||||
|
|
||||||
|
IP addresses and hostnames would potentially exist in user managed templates
|
||||||
|
that have the value for ``ExtraInventoryData``, however this is no different than
|
||||||
|
what is present today.
|
||||||
|
|
||||||
|
Upgrade Impact
|
||||||
|
--------------
|
||||||
|
|
||||||
|
The upgrade process will need to be aware that not all nodes are represented in
|
||||||
|
the Heat stack, and some will be represented only in the inventory. This should
|
||||||
|
not be an issue as long as there is a consistent interface to get a single
|
||||||
|
unified inventory as there exists now.
|
||||||
|
|
||||||
|
Any changes around creating the unified view of the inventory should be made
|
||||||
|
within the implementation of that interface (tripleo-ansible-inventory) such
|
||||||
|
that existing tooling continues to use an inventory that contains all nodes for
|
||||||
|
a deployment.
|
||||||
|
|
||||||
|
Other End User Impact
|
||||||
|
---------------------
|
||||||
|
|
||||||
|
Users will potentially have to manage additional environment files for the
|
||||||
|
extra inventory data.
|
||||||
|
|
||||||
|
Performance Impact
|
||||||
|
------------------
|
||||||
|
|
||||||
|
Performance should be improved during scale out operations.
|
||||||
|
|
||||||
|
However, it should be noted that Ansible will face scaling challenges as well.
|
||||||
|
While this change does not directly introduce those new challenges, it may
|
||||||
|
expose them more rapidly as it bypasses the Heat scaling challenges.
|
||||||
|
|
||||||
|
For example, it is not expected that simply adding hundreds or thousands of new
|
||||||
|
nodes directly to the Ansible inventory means that scaling operation would
|
||||||
|
succeed. It would likely expose new scaling challenges in other tooling, such
|
||||||
|
as the playbook and role tasks or Ansible itself.
|
||||||
|
|
||||||
|
Other Deployer Impact
|
||||||
|
---------------------
|
||||||
|
|
||||||
|
Since this proposal is meant to align with the nova-less-deploy, all nodes
|
||||||
|
(whether they are known to Heat or not) would be unprovisioned if the
|
||||||
|
deployment is deleted.
|
||||||
|
|
||||||
|
If using pre-provisioned nodes, then there is no change in behavior in that
|
||||||
|
deleting the Heat stack does not actually "undeploy" any software. This
|
||||||
|
proposal does not change that behavior.
|
||||||
|
|
||||||
|
Developer Impact
|
||||||
|
----------------
|
||||||
|
|
||||||
|
Developers could more quickly test scaling by bypassing the Heat stack update
|
||||||
|
completely if desired, or using the ``ExtraInventoryData`` interface.
|
||||||
|
|
||||||
|
|
||||||
|
Implementation
|
||||||
|
==============
|
||||||
|
|
||||||
|
Assignee(s)
|
||||||
|
-----------
|
||||||
|
|
||||||
|
Primary assignee:
|
||||||
|
James Slagle <jslagle@redhat.com>
|
||||||
|
|
||||||
|
Work Items
|
||||||
|
----------
|
||||||
|
|
||||||
|
* Add new parameter ``ExtraInventoryData``
|
||||||
|
|
||||||
|
* Add Heat processing of ``ExtraInventoryData``
|
||||||
|
|
||||||
|
* create Neutron ports
|
||||||
|
|
||||||
|
* add stack outputs
|
||||||
|
|
||||||
|
* Update tripleo-ansible-inventory to consume from added stack outputs
|
||||||
|
|
||||||
|
* Update HostsEntry to be generic
|
||||||
|
|
||||||
|
Dependencies
|
||||||
|
============
|
||||||
|
|
||||||
|
* Depends on nova-less-deploy work for baremetal provisioning outside of Heat.
|
||||||
|
If using pre-provisioned nodes, does not depend on nova-less-deploy.
|
||||||
|
|
||||||
|
* All deployment configurations coming out of Heat need to be generic per role.
|
||||||
|
Most of this work was complete in Train, however this should be reviewed. For
|
||||||
|
example, the HostsEntry data is still static and Heat is calculating the node
|
||||||
|
list. This data needs to be moved to an Ansible template.
|
||||||
|
|
||||||
|
|
||||||
|
Testing
|
||||||
|
=======
|
||||||
|
|
||||||
|
Scaling is not currently tested in CI, however perhaps it could be with this
|
||||||
|
change.
|
||||||
|
|
||||||
|
Manual test plans and other test automation would need to be updated to also
|
||||||
|
test scaling with ``ExtraInventoryData``.
|
||||||
|
|
||||||
|
|
||||||
|
Documentation Impact
|
||||||
|
====================
|
||||||
|
|
||||||
|
Documentation needs to be added for ``ExtraInventoryData``.
|
||||||
|
|
||||||
|
The feature should also be fully explained in that users and deployers need to
|
||||||
|
be made aware of the change of how nodes may or may not be represented in the
|
||||||
|
Heat stack.
|
||||||
|
|
||||||
|
References
|
||||||
|
==========
|
||||||
|
|
||||||
|
* https://specs.openstack.org/openstack/tripleo-specs/specs/stein/nova-less-deploy.html
|
||||||
|
* https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/undercloud_minion.html
|
||||||
|
* https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_compute_node.html
|
Loading…
Reference in New Issue