.. This work is licensed under a Creative Commons Attribution 3.0 Unported License. http://creativecommons.org/licenses/by/3.0/legalcode =========================================== Remove merge.py from TripleO Heat Templates =========================================== https://blueprints.launchpad.net/tripleo/+spec/tripleo-juno-remove-mergepy ``merge.py`` is where we've historically accumulated the technical debt for our Heat templates [0]_ with the intention of migrating away from it when Heat meets our templating needs. Its main functionality includes combining smaller template snippets into a single template describing the full TripleO deployment, merging certain resources together to reduce duplication while keeping the snippets themselves functional as standalone templates and a support for manual scaling of Heat resources. This spec describes the changes necessary to move towards templates that do not depend on ``merge.py``. We will use native Heat features where we can and document the rest, possibly driving new additions to the Heat template format. It is largely based on the April 2014 discussion in openstack-dev [1]_. Problem Description =================== Because of the mostly undocumented nature of ``merge.py`` our templates are difficult to understand or modify by newcomers (even those already familiar with Heat). It has always been considered a short-term measure and Heat can now provide most of what we need in our templates. Proposed Change =============== We will start with making small correctness-preserving changes to our templates and ``merge.py`` that move us onto using more Heat native features. Where we cannot make the change for some reason, we will file a bug with Heat and work with them to unblock the process. Once we get to a point where we have to do large changes to the structure of our templates, we will split them off to new files and enable them in our CI as parallel implementations. Once we are confident that the new templates fulfill the same requirements as the original ones, we will deprecate the old ones, deprecate ``merge.py`` and switch to the new ones as the default. The list of action items necessary for the full transition is below. **1. Remove the custom resource types** TripleO Heat templates and ``merge.py`` carry two custom types that (after the move to software config [8]_, [9]_) are no longer used for anything: * OpenStack::ImageBuilder::Elements * OpenStack::Role We will drop them from the templates and deprecate in the merge tool. **2. Remove combining whitelisted resource types** If we have two ``AWS::AutoScaling::LaunchConfiguration`` resources with the same name, ``merge.py`` will combine their ``Properties`` and ``Metadata``. Our templates are no longer using this after the software-config update. **3. Port TripleO Heat templates to HOT** With most of the non-Heat syntax out of the way, porting our CFN/YAML templates to pure HOT format [2]_ should be straightforward. We will have to update ``merge.py`` as well. We should be able to support both the old format and HOT. We should be able to differentiate between the two by looking for the ``heat_template_version`` top-level section which is mandatory in the HOT syntax. Most of the changes to ``merge.py`` should be around spelling (``Parameters`` -> ``parameters``, ``Resources`` -> ``resources``) and different names for intrinsic functions, etc. (``Fn::GetAtt`` -> ``get_attr``). This task will require syntactic changes to all of our templates and unfortunately, it isn't something different people can update bit by bit. We should be able to update the undercloud and overcloud portions separately, but we can't e.g. just update a part of the overcloud. We are still putting templates together with ``merge.py`` at this point and we would end up with a template that has both CFN and HOT bits. **4. Move to Provider resources** Heat allows passing-in multiple templates when deploying a stack. These templates can map to custom resource types. Each template would represent a role (compute server, controller, block storage, etc.) and its ``parameters`` and ``outputs`` would map to the custom resource's ``properties`` and ``attributes``. These roles will be referenced from a master template (``overcloud.yaml``, ``undercloud.yaml``) and eventually wrapped in a scaling resource (``OS::Heat::ResourceGroup`` [5]_) or whatever scaling mechanism we adopt. .. note:: Provider resources represent fully functional standalone templates. Any provider resource template can be passed to Heat and turned into a stack or treated as a custom resource in a larger deployment. Here's a hypothetical outline of ``compute.yaml``:: parameters: flavor: type: string image: type: string amqp_host: type: string nova_compute_driver: type: string resources: compute_instance: type: OS::Nova::Server properties: flavor: {get_param: flavor} image: {get_param: image} compute_deployment: type: OS::Heat::StructuredDeployment properties: server: {ref: compute_instance} config: {ref: compute_config} input_values: amqp_host: {get_param: amqp_host} nova_compute_driver: {get_param: nova_compute_driver} compute_config: type: OS::Heat::StructuredConfig properties: group: os-apply-config config: amqp: host: {get_input: amqp_host} nova: compute_driver: {get_input: nova_compute_driver} ... We will use a similar structure for all the other roles (``controller.yaml``, ``block-storage.yaml``, ``swift-storage.yaml``, etc.). That is, each role will contain the ``OS::Nova::Server``, the associated deployments and any other resources required (random string generators, security groups, ports, floating IPs, etc.). We can map the roles to custom types using Heat environments [4]_. ``role_map.yaml``: :: resource_registry: OS::TripleO::Compute: compute.yaml OS::TripleO::Controller: controller.yaml OS::TripleO::BlockStorage: block-storage.yaml OS::TripleO::SwiftStorage: swift-storage.yaml Lastly, we'll have a master template that puts it all together. ``overcloud.yaml``:: parameters: compute_flavor: type: string compute_image: type: string compute_amqp_host: type: string compute_driver: type: string ... resources: compute0: # defined in controller.yaml, type mapping in role_map.yaml type: OS::TripleO::Compute parameters: flavor: {get_param: compute_flavor} image: {get_param: compute_image} amqp_host: {get_param: compute_amqp_host} nova_compute_driver: {get_param: compute_driver} controller0: # defined in controller.yaml, type mapping in role_map.yaml type: OS::TripleO::Controller parameters: flavor: {get_param: controller_flavor} image: {get_param: controller_image} ... outputs: keystone_url: description: URL for the Overcloud Keystone service # `keystone_url` is an output defined in the `controller.yaml` template. # We're referencing it here to expose it to the Heat user. value: { get_attr: [controller_0, keystone_url] } and similarly for ``undercloud.yaml``. .. note:: The individual roles (``compute.yaml``, ``controller.yaml``) are structured in such a way that they can be launched as standalone stacks (i.e. in order to test the compute instance, one can type ``heat stack-create -f compute.yaml -P ...``). Indeed, Heat treats provider resources as nested stacks internally. **5. Remove FileInclude from ``merge.py``** The goal of ``FileInclude`` was to keep individual Roles (to borrow a loaded term from TripleO UI) viable as templates that can be launched standalone. The canonical example is ``nova-compute-instance.yaml`` [3]_. With the migration to provider resources, ``FileInclude`` is not necessary. **6. Move the templates to Heat-native scaling** Scaling of resources is currently handled by ``merge.py``. The ``--scale`` command line argument takes a resource name and duplicates it as needed (it's a bit more complicated than that, but that's beside the point). Heat has a native scaling ``OS::Heat::ResourceGroup`` [5]_ resource that does essentially the same thing:: scaled_compute: type: OS::Heat::ResourceGroup properties: count: 42 resource_def: type: OS::TripleO::Compute parameters: flavor: baremetal image: compute-image-rhel7 ... This will create 42 instances of compute hosts. **7. Replace Merge::Map with scaling groups' inner attributes** We are using the custom ``Merge::Map`` helper function for getting values out of scaled-out servers: * `Building a comma-separated list of RabbitMQ nodes`__ __ https://github.com/openstack/tripleo-heat-templates/blob/a7f2a2c928e9c78a18defb68feb40da8c7eb95d6/overcloud-source.yaml#L642 * `Getting the name of the first controller node`__ __ https://github.com/openstack/tripleo-heat-templates/blob/a7f2a2c928e9c78a18defb68feb40da8c7eb95d6/overcloud-source.yaml#L405 * `List of IP addresses of all controllers`__ __ https://github.com/openstack/tripleo-heat-templates/blob/a7f2a2c928e9c78a18defb68feb40da8c7eb95d6/overcloud-source.yaml#L405 * `Building the /etc/hosts file`__ __ https://github.com/openstack/tripleo-heat-templates/blob/a7f2a2c928e9c78a18defb68feb40da8c7eb95d6/overcloud-source.yaml#L585 The ``ResourceGroup`` resource supports selecting an attribute of an inner resource as well as getting the same attribute from all resources and returning them as a list. Example of getting an IP address of the controller node: :: {get_attr: [controller_group, resource.0.networks, ctlplane, 0]} (`controller_group` is the `ResourceGroup` of our controller nodes, `ctlplane` is the name of our control plane network) Example of getting the list of names of all of the controller nodes: :: {get_attr: [controller_group, name]} The more complex uses of ``Merge::Map`` involve formatting the returned data in some way, for example building a list of ``{ip: ..., name: ...}`` dictionaries for haproxy or generating the ``/etc/hosts`` file. Since our ResourceGroups will not be using Nova servers directly, but rather the custom role types using provider resources and environments, we can put this data formatting into the role's ``outputs`` section and then use the same mechanism as above. Example of building out the haproxy node entries:: # overcloud.yaml: resources: controller_group: type: OS::Heat::ResourceGroup properties: count: {get_param: controller_scale} resource_def: type: OS::TripleO::Controller properties: ... controllerConfig: type: OS::Heat::StructuredConfig properties: ... haproxy: nodes: {get_attr: [controller_group, haproxy_node_entry]} # controller.yaml: resources: ... controller: type: OS::Nova::Server properties: ... outputs: haproxy_node_entry: description: A {ip: ..., name: ...} dictionary for configuring the haproxy node value: ip: {get_attr: [controller, networks, ctlplane, 0]} name: {get_attr: [controller, name]} Alternatives ------------ This proposal is very t-h-t and Heat specific. One alternative is to do nothing and keep using and evolving ``merge.py``. That was never the intent, and most members of the core team do not consider this a viable long-term option. Security Impact --------------- This proposal does not affect the overall functionality of TripleO in any way. It just changes the way TripleO Heat templates are stored and written. If anything, this will move us towards more standard and thus more easily auditable templates. Other End User Impact --------------------- There should be no impact for the users of vanilla TripleO. More advanced users may want to customise the existing Heat templates or write their own. That will be made easier when we rely on standard Heat features only. Performance Impact ------------------ This moves some of the template-assembling burden from ``merge.py`` to Heat. It will likely also end up producing more resources and nested stacks on the background. As far as we're aware, no one has tested these features at the scale we are inevitably going to hit. Before we land changes that can affect this (provider config and scaling) we need to have scale tests in Tempest running TripleO to make sure Heat can cope. These tests can be modeled after the `large_ops`_ scenario: a Heat template that creates and destroys a stack of 50 Nova server resources with associated software configs. We should have two tests to asses the before and after performance: 1. A single HOT template with 50 copies of the same server resource and software config/deployment. 2. A template with a single server and its software config/deploys, an environment file with a custom type mapping and an overall template that wraps the new type in a ResourceGroup with the count of 50. .. _large_ops: https://github.com/openstack/tempest/blob/master/tempest/scenario/test_large_ops.py Other Deployer Impact --------------------- Deployers can keep using ``merge.py`` and the existing Heat templates as before -- existing scripts ought not break. With the new templates, Heat will be called directly and will need the resource registry (in a Heat environment file). This will mean a change in the deployment process. Developer Impact ---------------- This should not affect non-Heat and non-TripleO OpenStack developers. There will likely be a slight learning curve for the TripleO developers who want to write and understand our Heat templates. Chances are, we will also encounter bugs or unforeseen complications while swapping ``merge.py`` for Heat features. The impact on Heat developers would involve processing the bugs and feature requests we uncover. This will hopefully not be an avalanche. Implementation ============== Assignee(s) ----------- Primary assignee: Tomas Sedovic Work Items ---------- 1. Remove the custom resource types 2. Remove combining whitelisted resource types 3. Port TripleO Heat templates to HOT 4. Move to Provider resources 5. Remove FileInclude from ``merge.py`` 6. Move the templates to Heat-native scaling 7. Replace Merge::Map with scaling groups' inner attributes Dependencies ============ * The Juno release of Heat * Being able to kill specific nodes in Heat (for scaling down or because they're misbehaving) - Relevant Heat blueprint: `autoscaling-parameters`_ .. _autoscaling-parameters: https://blueprints.launchpad.net/heat/+spec/autoscaling-parameters Testing ======= All of these changes will be made to the tripleo-heat-templates repository and should be testable by our CI just as any other t-h-t change. In addition, we will need to add Tempest scenarios for scale to ensure Heat can handle the load. Documentation Impact ==================== We will need to update the `devtest`_, `Deploying TripleO`_ and `Using TripleO`_ documentation and create a guide for writing TripleO templates. .. _devtest: http://docs.openstack.org/developer/tripleo-incubator/devtest.html .. _Deploying TripleO: http://docs.openstack.org/developer/tripleo-incubator/deploying.html .. _Using TripleO: http://docs.openstack.org/developer/tripleo-incubator/userguide.html References ========== .. [0] https://github.com/openstack/tripleo-heat-templates .. [1] http://lists.openstack.org/pipermail/openstack-dev/2014-April/031915.html .. [2] http://docs.openstack.org/developer/heat/template_guide/hot_guide.html .. [3] https://github.com/openstack/tripleo-heat-templates/blob/master/nova-compute-instance.yaml .. [4] http://docs.openstack.org/developer/heat/template_guide/environment.html .. [5] http://docs.openstack.org/developer/heat/template_guide/openstack.html#OS::Heat::ResourceGroup .. [6] http://docs.openstack.org/developer/heat/template_guide/openstack.html#OS::Heat::RandomString .. [7] http://lists.openstack.org/pipermail/openstack-dev/2014-July/040115.html .. [8] https://review.openstack.org/#/c/81666/ .. [9] https://review.openstack.org/#/c/93319/ .. [10] http://docs.openstack.org/developer/heat/template_guide/hot_spec.html#str-replace