tripleo-specs/975c7280c649e390625700604c7...

{
"comments": [
{
"key": {
"uuid": "31d34c9f_dfd3c0b0",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 201,
"author": {
"id": 8833
},
"writtenOn": "2021-08-09T04:28:43Z",
"side": 1,
"message": "Can we please elaborate on the messaging layer security with either zeromq or any other messaging layer (QDR)? I don\u0027t see anything like CurveZMQ in directord atm.\n\nAs we\u0027ve been assuming that ctlplane and provisioning network is not secure (anyone can hack into the compute nodes it seems), we can\u0027t push the security aspects to the backburner.",
"range": {
"startLine": 201,
"startChar": 0,
"endLine": 201,
"endChar": 15
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "2bbdf081_4997234b",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 201,
"author": {
"id": 7353
},
"writtenOn": "2021-08-09T13:34:08Z",
"side": 1,
"message": "Both token based authentication and CurveZMQ, with Curve25519, has been implemented in Directord; in fact CurveZMQ is the default when running a production based deployment and is tested on every PR.\n\nThis is all documented here: https://directord.com/authentication.html\n\nI totally agree, security can not be an after thought. While I still attest we will need a security assessment, the code provides for messaging encryption and tests are active on every PR to ensure we\u0027re not introducing known CVEs: https://github.com/cloudnull/directord/actions/workflows/codeql-analysis.yml\n\nThat said, there\u0027s nothing to elaborate on here. Messaging transport security is an implementation detail in the messaging driver and not something that is provided by Directord itself; while ZMQ is using Curve25519, other messaging drivers will undoubtedly use something else.",
"parentUuid": "31d34c9f_dfd3c0b0",
"range": {
"startLine": 201,
"startChar": 0,
"endLine": 201,
"endChar": 15
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
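
For context on the authentication and encryption discussed in the comment above, here is a minimal pyzmq sketch of how CurveZMQ (Curve25519) encryption is typically enabled on a ROUTER/DEALER pair. It is illustrative only and does not reflect Directord's actual key handling or socket wiring (see https://directord.com/authentication.html for that).

    import zmq

    # CURVE requires libzmq built with libsodium support.
    server_public, server_secret = zmq.curve_keypair()
    client_public, client_secret = zmq.curve_keypair()

    ctx = zmq.Context.instance()

    # Server side: enable CURVE and act as the CURVE "server".
    server = ctx.socket(zmq.ROUTER)
    server.curve_secretkey = server_secret
    server.curve_publickey = server_public
    server.curve_server = True
    server.bind("tcp://*:5555")

    # Client side: clients need their own keypair plus the server's public key.
    client = ctx.socket(zmq.DEALER)
    client.curve_secretkey = client_secret
    client.curve_publickey = client_public
    client.curve_serverkey = server_public
    client.connect("tcp://127.0.0.1:5555")
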
{
"key": {
"uuid": "63f02212_c8864707",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 246,
"author": {
"id": 8833
},
"writtenOn": "2021-08-09T04:28:43Z",
"side": 1,
"message": "Before we moved to ansible, heat provided a similar \u0027pull\u0027 execution model with heat-engine and software config agents on the nodes, with swift as the communication layer. AFAIK, the reasons we moved away from it are mainly due to.\n\n1. Terrible debugging and troubleshooting experience with the \u0027pull\u0027 model.\n2. Not supporting a good model for partial and fine-grained (successful) deployment (which we achieved with --limit and other features in ansible)\n\nCan we\u0027ve some clarity on how those would be addressed with this new architecture? Link to a document outside this spec should be fine.",
"range": {
"startLine": 246,
"startChar": 0,
"endLine": 246,
"endChar": 21
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "e761e751_274dfeaa",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 246,
"author": {
"id": 7353
},
"writtenOn": "2021-08-09T13:34:08Z",
"side": 1,
"message": "While Directord is using messaging, it is using a \"push\" [0] model, similar to Ansible and will continue to do so, the driver is following the router dealer pattern in ZMQ [1] and we intend to do something similar with QDR; nothing in Directord subscribes to the \"pull\" model. I will spell that out better here.\n\n[0] https://directord.com/overview.html#cluster-messaging\n[1] http://wiki.zeromq.org/tutorials:dealer-and-router",
"parentUuid": "63f02212_c8864707",
"range": {
"startLine": 246,
"startChar": 0,
"endLine": 246,
"endChar": 21
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
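
As a rough illustration of the ROUTER/DEALER "push" pattern referenced above, the following pyzmq sketch shows a server addressing a specific client by identity and the client reporting status back. The framing, identity handling, and job format here are hypothetical and are not Directord's actual protocol.

    import time
    import zmq

    ctx = zmq.Context.instance()

    # Server: ROUTER sockets address connected peers by an identity frame.
    router = ctx.socket(zmq.ROUTER)
    router.bind("tcp://127.0.0.1:5556")

    # Client: DEALER socket with a stable identity (e.g. the node name).
    dealer = ctx.socket(zmq.DEALER)
    dealer.setsockopt_string(zmq.IDENTITY, "compute-0")
    dealer.connect("tcp://127.0.0.1:5556")
    time.sleep(0.2)  # crude wait for the connection; real code handshakes/heartbeats

    # Push a job to a specific node.
    router.send_multipart([b"compute-0", b"RUN", b"echo hello"])

    # Client receives the job (identity frame is stripped) and reports back.
    job = dealer.recv_multipart()
    dealer.send_multipart([b"SUCCESS"] + job)

    # Server sees [identity, status, ...] and can track per-node job state.
    print(router.recv_multipart())
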
{
"key": {
"uuid": "dc88d59c_f2f18ed1",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 246,
"author": {
"id": 14985
},
"writtenOn": "2021-08-09T15:39:00Z",
"side": 1,
"message": "Heat\u0027s problem was using a pull stateless model via HTTP rather than a proper messaging system. The proposed method uses a messaging platform to do work distributed but is additionally more active in understanding state of remote workers. Heat had no concept of which nodes were still active leading to issues understanding where the entire process currently is. Task-core will push work into directord which is active in ensuring the work is completed before responding to task-core for the next tasks. The messaging platform replaces the ssh mesh that ansible handles with something more robust and reliable (e.g. developed specifically for this).",
"parentUuid": "e761e751_274dfeaa",
"range": {
"startLine": 246,
"startChar": 0,
"endLine": 246,
"endChar": 21
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "b40d4e5a_c93c1e70",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 246,
"author": {
"id": 8833
},
"writtenOn": "2021-08-10T03:41:18Z",
"side": 1,
"message": "I don\u0027t know what we mean by stateless model. Heat definitely had deployment/stack state stored in the heat database, but obviously it was not designed as a task engine to keep state/health of remote workers/nodes.\n\nI would like to know more about \u0027robust and reliable\u0027 part of the new toolset which includes recovery from server/client failure, network issues and queue overflow etc and how those are addressed. \n\nThough I don\u0027t see we using zeromq (rather something like QDR maybe) and don\u0027t know a lot about zeromq, here is an example doc snippet from zeromq documentation about heartbeats.\n\n\"When we use a ROUTER socket in an application that tracks peers, as peers disconnect and reconnect, the application will leak memory (resources that the application holds for each peer) and get slower and slower.\"\n\nI\u0027m asking these questions because we\u0027re purpose writing some custom tools for our needs and they should just address specific needs and pain points without any overlap.",
"parentUuid": "dc88d59c_f2f18ed1",
"range": {
"startLine": 246,
"startChar": 0,
"endLine": 246,
"endChar": 21
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "7896e6f1_ed5a2174",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 246,
"author": {
"id": 14985
},
"writtenOn": "2021-08-10T13:23:02Z",
"side": 1,
"message": "The deployment has state. The actual execution status was effectively stateless during processing. There was no way for heat to know if a node didn\u0027t get work because it was client side driven. The proposed architecture with directord include a much more active way of understanding execute status as it relates to the configured nodes. heat didn\u0027t have that when we were using it to orchestrate work on the overcloud nodes during the deployment so thats why we ended up with very long timeouts and no at who lot of status information on the undercloud.",
"parentUuid": "b40d4e5a_c93c1e70",
"range": {
"startLine": 246,
"startChar": 0,
"endLine": 246,
"endChar": 21
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "01565ee6_eba5cbc0",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 246,
"author": {
"id": 7353
},
"writtenOn": "2021-08-10T13:39:17Z",
"side": 1,
"message": "Rabi, re: \"robust and reliable\" and the use of ROUTER sockets. Directord implements an internal heartbeat for sockets and remote clients when running jobs to a given set of targets. These hearbeats monitor the environment and ensure all of the nodes remain active. In the event of a failure, the heatbeat process will cleanup orphaned sockets ensuring that the application remains performant throughout the orchestration or halt an execution in the event required targets are missing. This is a good call out though, the heartbeat process has very little documentation, only being mentioned in the service setup [0] and only referenced once in the data-flow-diagram [1]; I\u0027ll look to improve the documentation surrounding the heartbeat process and how Directord recovers from a failure.\n\n[0] https://directord.com/service-setup.html\n[1] https://directord.com/assets/Directord-Data-flow.png",
"parentUuid": "7896e6f1_ed5a2174",
"range": {
"startLine": 246,
"startChar": 0,
"endLine": 246,
"endChar": 21
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
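
To make the failure-handling point above concrete, here is a generic sketch of a heartbeat registry that expires peers which stop reporting. It is a hypothetical illustration of the idea (the class name and interval values are made up), not Directord's heartbeat implementation.

    import time

    # Assumed values for illustration only.
    HEARTBEAT_INTERVAL = 10   # seconds between expected heartbeats
    HEARTBEAT_LIVENESS = 3    # missed intervals before a peer is declared dead


    class HeartbeatRegistry:
        """Track expiry times for clients and prune the ones that go quiet."""

        def __init__(self):
            self.clients = {}  # identity -> expiry timestamp

        def beat(self, identity):
            # Called whenever a heartbeat (or any message) arrives from a client.
            self.clients[identity] = time.time() + HEARTBEAT_INTERVAL * HEARTBEAT_LIVENESS

        def prune(self):
            # Drop expired clients so per-peer resources are released rather than
            # leaked; the caller can then halt or reschedule work for the dead nodes.
            now = time.time()
            dead = [ident for ident, expiry in self.clients.items() if expiry < now]
            for ident in dead:
                del self.clients[ident]
            return dead
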
{
"key": {
"uuid": "0ca9bf5c_f0b09213",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 327,
"author": {
"id": 11655
},
"writtenOn": "2021-08-06T00:18:05Z",
"side": 1,
"message": "Just a nit, but the above paragraph feels like it should be two paragraphs starting at the sentence starting with \"Test clouds\" as it is a logical transition point, that or the second two thirds of that sentence maybe split apart.",
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "15a1e2a7_39e43386",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 327,
"author": {
"id": 7353
},
"writtenOn": "2021-08-06T15:02:13Z",
"side": 1,
"message": "Ack",
"parentUuid": "0ca9bf5c_f0b09213",
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": false
},
{
"key": {
"uuid": "32582931_4fd39841",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 346,
"author": {
"id": 7294
},
"writtenOn": "2021-08-05T21:21:57Z",
"side": 1,
"message": "developers",
"range": {
"startLine": 346,
"startChar": 50,
"endLine": 346,
"endChar": 60
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "aa69ba39_44868099",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 346,
"author": {
"id": 7353
},
"writtenOn": "2021-08-06T15:02:13Z",
"side": 1,
"message": "Ack",
"parentUuid": "32582931_4fd39841",
"range": {
"startLine": 346,
"startChar": 50,
"endLine": 346,
"endChar": 60
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": false
},
{
"key": {
"uuid": "feab510c_47e7c0f0",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 359,
"author": {
"id": 7294
},
"writtenOn": "2021-08-05T21:21:57Z",
"side": 1,
"message": "~and~",
"range": {
"startLine": 359,
"startChar": 34,
"endLine": 359,
"endChar": 37
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "ae50c809_ac3316c8",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 359,
"author": {
"id": 7353
},
"writtenOn": "2021-08-06T15:02:13Z",
"side": 1,
"message": "Ack",
"parentUuid": "feab510c_47e7c0f0",
"range": {
"startLine": 359,
"startChar": 34,
"endLine": 359,
"endChar": 37
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": false
},
{
"key": {
"uuid": "a07e1031_47879f7c",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 421,
"author": {
"id": 8833
},
"writtenOn": "2021-08-06T03:03:35Z",
"side": 1,
"message": "Can we spell out the changes to config-download somewhere? Are we going to generate a mix of directord orchestrations and ansible playbooks from the existing service templates during the transition and finally it would only be directord orchestrations?\n\nAlso an end-to-end sequence diagram would be useful as the block diagram above does not depict the role of each component in the new architecture and there is lot of ambiguity.",
"range": {
"startLine": 421,
"startChar": 24,
"endLine": 421,
"endChar": 71
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "792e3901_217aee82",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 421,
"author": {
"id": 8833
},
"writtenOn": "2021-08-06T03:49:07Z",
"side": 1,
"message": "Also, there seems overlapping functionalities with these tools. We would have three tools (heat-engine/taskflow engine/directord server) providing orchestration/task management functionality, where we had two (heat, ansible) earlier.",
"parentUuid": "a07e1031_47879f7c",
"range": {
"startLine": 421,
"startChar": 24,
"endLine": 421,
"endChar": 71
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "4e573618_4160cea0",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 421,
"author": {
"id": 7353
},
"writtenOn": "2021-08-06T15:02:13Z",
"side": 1,
"message": "There will be a mix of ansible and directord, at least initially. Through the task-core interface, we\u0027ll be able to support both code paths as we transition.\n\nI\u0027ll look to building a sequence diagram in the next update.\n\nYou are correct that there would be three tools in place of the current two. That said, the two tools we have are the reason why we have horrendous deployment times, where as the three tools we could have would dramatically shrink the install footprint and time to delivery.",
"parentUuid": "792e3901_217aee82",
"range": {
"startLine": 421,
"startChar": 24,
"endLine": 421,
"endChar": 71
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "694c6583_11fabb81",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 421,
"author": {
"id": 7144
},
"writtenOn": "2021-08-06T21:07:36Z",
"side": 1,
"message": "I agree there is overlap between existing and new tools. I\u0027d like to see it more clearly explained how they will work together.\n\nMy previous comment on PS3 pointed this out as well:\nhttps://review.opendev.org/c/openstack/tripleo-specs/+/801630/3/specs/xena/directord-orchestration.rst#36\n\nIt feels a bit like task-core is an alternative to heat-as-a-library.",
"parentUuid": "4e573618_4160cea0",
"range": {
"startLine": 421,
"startChar": 24,
"endLine": 421,
"endChar": 71
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "763b5dc0_b2d29c19",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 421,
"author": {
"id": 7353
},
"writtenOn": "2021-08-09T13:55:01Z",
"side": 1,
"message": "++ Alex to elaborate, but there has been some WIP changes proposed to highlight how everything would work together: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/798747",
"parentUuid": "694c6583_11fabb81",
"range": {
"startLine": 421,
"startChar": 24,
"endLine": 421,
"endChar": 71
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "46569b5a_3fbfc8be",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 421,
"author": {
"id": 14985
},
"writtenOn": "2021-08-09T15:39:00Z",
"side": 1,
"message": "task-core is not heat-as-a-library because it takes the cloud services and figures out execution ordering. Task-core has no knowledge of the data being run and does not put together any of that. It takes nodes/roles/services and figures out what needs to get run in a specific order.\n\nHeat (end user UI) -\u003e task-core (figures out ordering/execution status) -\u003e directord (work runner)\n\nThe example of how heat -\u003e task-core works is available via the proposed THT patches https://review.opendev.org/c/openstack/tripleo-heat-templates/+/798747\n\nEffectively task-core is the ansible playbook parser of the future deployment framework. There is no plan to use it to replace what heat provides us today. Heat understands the networking/host/roles that we define as our \"cloud\" where task-core expect you to provide all this information as inputs. Similar to what we have today with ansible where we provide inventory and playbook files. Task-core becomes the layer between the heat and the actual task executor (directord) similar to what ansible-runner/ansible-playbook provide today.",
"parentUuid": "763b5dc0_b2d29c19",
"range": {
"startLine": 421,
"startChar": 24,
"endLine": 421,
"endChar": 71
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "537352b5_36fe3792",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 421,
"author": {
"id": 7144
},
"writtenOn": "2021-08-09T20:49:24Z",
"side": 1,
"message": "can you check my comment at https://review.opendev.org/c/openstack/tripleo-specs/+/801630/3/specs/xena/directord-orchestration.rst#134\n\ni get that we are adding \"core_services\" to the role_data output in the service template interface. What is the plan on how we migrate existing services?\n\nHow do we move from keystone-container-puppet.yaml to a task-core native definition (e.g., https://github.com/mwhahaha/task-core/blob/main/examples/directord/services/openstack-keystone.yaml).\n\nWhat is the expectation on service owners? What I\u0027d like to see is an example of migrating an existing service template.",
"parentUuid": "46569b5a_3fbfc8be",
"range": {
"startLine": 421,
"startChar": 24,
"endLine": 421,
"endChar": 71
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "9cbdd113_7d5b12ca",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 421,
"author": {
"id": 7144
},
"writtenOn": "2021-08-09T21:05:24Z",
"side": 1,
"message": "\u003cquote\u003e\ntask-core is not heat-as-a-library because it takes the cloud services and figures out execution ordering. Task-core has no knowledge of the data being run and does not put together any of that. It takes nodes/roles/services and figures out what needs to get run in a specific order.\n\u003c/quote\u003e\n\nYes, but we could do that with Heat as well. Heat and taskflow are both dependency graphs. Instead of introducing a new tool, we could rework our service templates (required with either tool), and let Heat figure out the specific order. I do have some concerns with using taskflow, as I think we\u0027d need to anticipate maintaining taskflow within our core team.\n\n\u003cquote\u003e\nThe example of how heat -\u003e task-core works is available via the proposed THT patches https://review.opendev.org/c/openstack/tripleo-heat-templates/+/798747\n\nEffectively task-core is the ansible playbook parser of the future deployment framework. There is no plan to use it to replace what heat provides us today. Heat understands the networking/host/roles that we define as our \"cloud\" where task-core expect you to provide all this information as inputs. Similar to what we have today with ansible where we provide inventory and playbook files. Task-core becomes the layer between the heat and the actual task executor (directord) similar to what ansible-runner/ansible-playbook provide today.\n\u003c/quote\u003e\n\nWhat that patch shows replacing is the parts of tripleoclient that execute ansible-runner (utils.run_ansible_playbook) with inputs to directord, and directord executions.\n\nCan we see an example of what the world looks like the driver is not \"ansible_runner\" in the task-core tasks?",
"parentUuid": "537352b5_36fe3792",
"range": {
"startLine": 421,
"startChar": 24,
"endLine": 421,
"endChar": 71
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "66788876_edee1551",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 421,
"author": {
"id": 14985
},
"writtenOn": "2021-08-09T21:48:27Z",
"side": 1,
"message": "\u003e i get that we are adding \"core_services\" to the role_data output in the service template interface. What is the plan on how we migrate existing services?\n\u003e \n\u003e How do we move from keystone-container-puppet.yaml to a task-core native definition (e.g., https://github.com/mwhahaha/task-core/blob/main/examples/directord/services/openstack-keystone.yaml).\n\u003e \n\u003e What is the expectation on service owners? What I\u0027d like to see is an example of migrating an existing service template.\n\u003e \n\nThe existing service definitions can continue it\u0027s just different data in the output. We can use a mixture of both to handle migration/upgrades/etc. The goal is *not* to write new service definitions but rather adjust the role_data output to populate a \u0027core_services` definition as we work to migrate from ansible -\u003e directord. This allows the migration to occur under the covers and heat params that used to exist continue to function. Additionally not having even more service files to manage.\n\nSince it\u0027s unlikely that it\u0027ll get completed in a single cycle, it\u0027s something that we want to extend in the framework and come through and remove the legacy tasks as they are replaced. For something like keystone, we would remove deploy_step_tasks and populate core_services as necessary. There isn\u0027t a specific service we can point to which is why we\u0027re point to an example of how the framework would change. Since the structure of this is still a WIP and up for debate, I would be hesitant to include concrete examples here.\n\n\n\n\u003e \n\u003e Yes, but we could do that with Heat as well. Heat and taskflow are both dependency graphs. Instead of introducing a new tool, we could rework our service templates (required with either tool), and let Heat figure out the specific order. I do have some concerns with using taskflow, as I think we\u0027d need to anticipate maintaining taskflow within our core team.\n\u003e \n\n\nHeat does not currently handle being able to feed the execution data into something to run it. Heat is good at ordering if it\u0027s known upfront, but since we\u0027ve removed it from the actual framework execution it can no longer handle dynamic ordering. Today ansible is the thing that handles this. It used to be heat when it was os-collect-config. Even if we don\u0027t have task core, we need an overall task processing/execution thing. We\u0027ve seen that heat doesn\u0027t work well for that. So if we don\u0027t use task-core, what do we use? Directord is supposed to be a stupid fast execution platform but doesn\u0027t handle how to deal with success/failure. Taskflow supports things like being able to define roll back tasks for failures or handling tasks linear-ally or parallel. Heat doesn\u0027t do that in this context. Additionally task-core is acting as a layer for being able to leverage different tooling for the execution as necessary (or for a migration path). AFAIK Heat allows you to specify specific ordering like we used to use in terms of SoftwareConfig but I don\u0027t think it lets you dynamically add ordering requirements at a task level. The scope of task-core is at a task level such that a task written in nova-db.yaml could just say \u0027i require database up\u0027 and once that happens then it could be run. 
That\u0027s a runtime dependency that heat cannot handle unless we use heat to perform the deployment orchestration and I do not thing there is desire to go back to that because of other issues mentioned in comments. In order to do deployment we need an Input Engine (heat \u0026 templates), a deployment ordering/management (task-core/ansible) and an executor (ansible/directord/bash/magic). The overall goal is to break these out into very specific pieces of software and not continue to make something handle things it\u0027s not designed for. \n\n\n\u003e \u003cquote\u003e\n\u003e The example of how heat -\u003e task-core works is available via the proposed THT patches https://review.opendev.org/c/openstack/tripleo-heat-templates/+/798747\n\u003e \n\u003e Effectively task-core is the ansible playbook parser of the future deployment framework. There is no plan to use it to replace what heat provides us today. Heat understands the networking/host/roles that we define as our \"cloud\" where task-core expect you to provide all this information as inputs. Similar to what we have today with ansible where we provide inventory and playbook files. Task-core becomes the layer between the heat and the actual task executor (directord) similar to what ansible-runner/ansible-playbook provide today.\n\u003e \u003c/quote\u003e\n\u003e \n\u003e What that patch shows replacing is the parts of tripleoclient that execute ansible-runner (utils.run_ansible_playbook) with inputs to directord, and directord executions.\n\u003e \n\u003e Can we see an example of what the world looks like the driver is not \"ansible_runner\" in the task-core tasks?\n\n\nTask core simply calls the executor on the task type. Depending on the task type used (this would be defined in THT), when the task is \"run\" by task-core it calls execute: https://github.com/mwhahaha/task-core/blob/main/task_core/tasks.py#L93\n\nIf it\u0027s an ansible task, it uses ansible runner. If it\u0027s a directord task it uses directord. The results of the execution are processed by task-core when the execution is run (e.g. if failure, stop, if not move on to the next task).\n\nTask-core really gets lists of tasks as input (structured from heat template processing into inventory/roles/services), figures out runtime order dependencies as defined on these tasks and \"processes\" the tasks. The \"processing\" of a task is handing the task off to a specific executor (ansible/directord/whatever is written). The executor does a thing and returns success/fail and task-core figures out what to do next and continues processing until it runs out of tasks.\n\nToday it\u0027s heat generating ansible playbooks with ansible-runner doing both the task processing and execution. As mentioned elsewhere there are limits with ansible being able to handle this task processing/execution at scale. In the future it would be heat generating deployment task data, task-core processing the tasks and calling out to directord (or ansible or whatever) to do the execution.",
"parentUuid": "9cbdd113_7d5b12ca",
"range": {
"startLine": 421,
"startChar": 24,
"endLine": 421,
"endChar": 71
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
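
The "task type picks the executor" idea described in the comment above can be sketched roughly as follows. The class names and the submit_to_directord() helper are hypothetical stand-ins, not task-core's actual API; the real interface lives in task_core/tasks.py.

    from abc import ABC, abstractmethod


    def submit_to_directord(jobs, targets):
        # Placeholder for a real directord client call; purely hypothetical here.
        raise NotImplementedError("wire this up to the directord API")


    class BaseTask(ABC):
        def __init__(self, name, data):
            self.name = name
            self.data = data

        @abstractmethod
        def execute(self, hosts):
            """Run this task against the given hosts and return True on success."""


    class AnsibleTask(BaseTask):
        def execute(self, hosts):
            import ansible_runner  # the executor detail is owned by the task type
            run = ansible_runner.run(playbook=self.data["playbook"], inventory=hosts)
            return run.rc == 0


    class DirectordTask(BaseTask):
        def execute(self, hosts):
            # Hand the orchestration off to the directord server.
            return submit_to_directord(self.data["jobs"], targets=hosts)


    def process(ordered_tasks, hosts):
        # The graph engine resolves ordering; each resolved task is simply executed
        # and its result decides whether processing continues.
        for task in ordered_tasks:
            if not task.execute(hosts):
                raise RuntimeError("task %s failed; stopping" % task.name)
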
{
"key": {
"uuid": "24585e3c_3ac09e2b",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 421,
"author": {
"id": 7144
},
"writtenOn": "2021-08-11T11:19:48Z",
"side": 1,
"message": "\u003e For something like keystone, we would remove deploy_step_tasks and populate core_services as necessary.\n\nThis is what I\u0027d like to see an example of if that\u0027s the plan. Especially if we expect service owners to eventually do this work. If they\u0027re going to have to be invested enough to do the work, I think we need to show it at least in a pseudo-code manner.\n\n\u003e Heat does not currently handle being able to feed the execution data into something to run it.\n\n+1\n\n\u003e Heat is good at ordering if it\u0027s known upfront, but since we\u0027ve removed it from the actual framework execution it can no longer handle dynamic ordering.\n\nYes, but only because of how we\u0027ve written the service templates. Heat is actually great at ordering. It\u0027s a declarative dependency graph, same as taskflow. Even though we\u0027ve separated the execution from the ordering, we could still use Heat to figure out dependencies and ordering, generate the imperative sequential execution steps, and then pass it off to something like directord to execute. Granted, it would be a significant refactoring in our internal template architecture in how things are laid out. Our usage of ServiceChain/ResourceChain ends up being opaque to Heat, and it has no way to order dependencies between services. That would need to change. I\u0027m not specifically proposing that, Although I would be interested in Rabi\u0027s thoughs around this approach.\n\nI\u0027m just a little hesitant to go all-in on taskflow given it is unproven for our needs. With Heat, we have a fair amount of familiarity there, and I don\u0027t like the overlap between taskflow and Heat.\n\n\u003e AFAIK Heat allows you to specify specific ordering like we used to use in terms of SoftwareConfig but I don\u0027t think it lets you dynamically add ordering requirements at a task level. \n\nIt does, it has depends_on, very similar to taskflow. And it would dynamically figure everything out. Given we\u0027ve separated that from the execution though, \n\nThe structure of t-h-t has backed us into a corner to where we can\u0027t/don\u0027t use what Heat can do. And given that we\u0027ve separated the execution from Heat (necessary, as os-collect-config wasn\u0027t cutting it), I\u0027m not sure to what extent we could actually take advantage of it.",
"parentUuid": "66788876_edee1551",
"range": {
"startLine": 421,
"startChar": 24,
"endLine": 421,
"endChar": 71
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "c8af848d_d71246e8",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 421,
"author": {
"id": 6926
},
"writtenOn": "2021-08-11T11:38:37Z",
"side": 1,
"message": "I would agree with James and I think we should carefully examine the option with using Heat as a dependency graph builder (without actual execution). Adding more components (task-core and taskflow) into tripleo violates the simplification trend.",
"parentUuid": "24585e3c_3ac09e2b",
"range": {
"startLine": 421,
"startChar": 24,
"endLine": 421,
"endChar": 71
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "949f2358_644f2d3e",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 421,
"author": {
"id": 14985
},
"writtenOn": "2021-08-11T13:35:16Z",
"side": 1,
"message": "\u003e This is what I\u0027d like to see an example of if that\u0027s the plan. Especially if we expect service owners to eventually do this work. If they\u0027re going to have to be invested enough to do the work, I think we need to show it at least in a pseudo-code manner.\n\nHonestly we\u0027ve done more with less in spec form. I appreciate this however I think the examples provided are sufficient because they are the tasks that would effectively be included into the existing THT services. \n\n\u003e Yes, but only because of how we\u0027ve written the service templates. Heat is actually great at ordering. It\u0027s a declarative dependency graph, same as taskflow. Even though we\u0027ve separated the execution from the ordering, we could still use Heat to figure out dependencies and ordering, generate the imperative sequential execution steps, and then pass it off to something like directord to execute. Granted, it would be a significant refactoring in our internal template architecture in how things are laid out. Our usage of ServiceChain/ResourceChain ends up being opaque to Heat, and it has no way to order dependencies between services. That would need to change. I\u0027m not specifically proposing that, Although I would be interested in Rabi\u0027s thoughs around this approach.\n\n\nThe goal here is to maintain backwards compatibility so that maybe we could actually backport and leverage this for upgrades. We could leverage both but the issue really comes around run time dynamic execution order which heat won\u0027t solve. We could leverage the service chain for some elements of execution to restrict the scope of the execution ordering but today we already have this concept in ansible. We\u0027re honestly not proposing anything new here but rather simplifying and explicitly using smaller code bases to handle the function. Right now this functionality is spread across heat/ansible. The goal would be to have heat process templates and handle dynamic configuration information. Task-core figures out ordering based on what is inputted. Directord does the execution. \n\n\n\u003e I would agree with James and I think we should carefully examine the option with using Heat as a dependency graph builder (without actual execution). Adding more components (task-core and taskflow) into tripleo violates the simplification trend.\n\nAs just mentioned this is already a thing today in ansible. It\u0027s simplification because the code to handle the functionality is scoped to a specific project and doesn\u0027t include tons of code that we don\u0027t actually leverage (e.g. ansible).",
"parentUuid": "c8af848d_d71246e8",
"range": {
"startLine": 421,
"startChar": 24,
"endLine": 421,
"endChar": 71
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "d1f22753_0be150f5",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 421,
"author": {
"id": 7144
},
"writtenOn": "2021-08-11T13:36:30Z",
"side": 1,
"message": "That\u0027s not quite what I said.\n\nMy concern is more about taskflow, as I\u0027d want to see agreement that the core team is willing to maintain that going forward. Just like we do with Heat and Ansible. I don\u0027t think we know enough now to make that agreement. We\u0027d have to prove it out with a POC that is at least on scale with the complexity of an existing tripleo deployment.\n\nI don\u0027t see a lot of activity in taskflow, and even if there were, activity isn\u0027t a predictable indicator of future maintenance (see Heat, Mistral, Ansible).\n\nI\u0027m more leaning towards one or the other (Heat or taskflow or something else). That is why I proposed the alternative in my comment about clean room, with support for opt-in backwards compatibility.",
"parentUuid": "c8af848d_d71246e8",
"range": {
"startLine": 421,
"startChar": 24,
"endLine": 421,
"endChar": 71
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "f389486d_443831d1",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 421,
"author": {
"id": 14985
},
"writtenOn": "2021-08-11T13:42:47Z",
"side": 1,
"message": "\u003e I don\u0027t see a lot of activity in taskflow, and even if there were, activity isn\u0027t a predictable indicator of future maintenance (see Heat, Mistral, Ansible).\n\u003e \n\u003e I\u0027m more leaning towards one or the other (Heat or taskflow or something else). That is why I proposed the alternative in my comment about clean room, with support for opt-in backwards compatibility.\n\nNo thanks. I don\u0027t think anyone wants to sign on to write a task graph utility from scratch. We already have 3 as you\u0027ve mentioned. Heat, Ansible, Taskflow. All three overlap here but only *1* is specifically for this functionality. Taskflow. Yes it\u0027s low in activity but it\u0027s core to things like VM creation in nova or volume creation in Cinder. I keep pointing out that the goal is to simplify by using tools *specifically* designed for a function to reduce the issues we have. In terms of execution testing, we\u0027ve already tested task-core at a scale larger than tripleo and have tools to show that it can handle the number of tasks with various relationships.\n\nhttps://github.com/mwhahaha/task-core/blob/main/examples/scale/gen_scale_data.py\n\nYour ask here is a bit difficult for me because we didn\u0027t ask for any of this before jumping on the ansible bandwagon and have been dealing with the fallout for years now. In this case we\u0027ve provided scale examples for both task-core and directord at larger scale than we ever tested with ansible or heat. Why is this such a sticking point *now*?",
"parentUuid": "d1f22753_0be150f5",
"range": {
"startLine": 421,
"startChar": 24,
"endLine": 421,
"endChar": 71
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "078735c9_80ca937b",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 421,
"author": {
"id": 8833
},
"writtenOn": "2021-08-11T14:03:31Z",
"side": 1,
"message": "\u003e Yes it\u0027s low in activity but it\u0027s core to things like VM creation in nova or volume creation in Cinder. \n\nAAFAIK, usage of taskflow in openstack projects have very limited scope and is not tested at the scale/scope we want to use it for. For example, cinder volume creation uses a linear flow[1]. There were some discussions long time back for heat to use it, but never looked at it properly. I\u0027m not saying it won\u0027t be good enough for our use-cases, but as I raised earlier it\u0027s like kinda moving to one unsupported tool (read ansible) to another.\n\n[1] https://github.com/openstack/cinder/blob/master/cinder/scheduler/flows/create_volume.py#L167",
"parentUuid": "f389486d_443831d1",
"range": {
"startLine": 421,
"startChar": 24,
"endLine": 421,
"endChar": 71
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "00f9dc6a_e461ca1b",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 421,
"author": {
"id": 14985
},
"writtenOn": "2021-08-11T14:16:03Z",
"side": 1,
"message": "glance, masakari, octavia also use it. We\u0027re not looking to actually do much in terms of complexity around tasks either. Right now the issues with the existing unsupported tool is that it\u0027s effectively being changed out from under us in an incompatible and less efficient fashion. It\u0027s unlikely that taskflow would have the same issue.",
"parentUuid": "078735c9_80ca937b",
"range": {
"startLine": 421,
"startChar": 24,
"endLine": 421,
"endChar": 71
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
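
For reference, the taskflow linear-flow pattern mentioned above (the pattern cinder's create_volume flow uses) looks roughly like this minimal example; the task names here are made up purely for illustration.

    import taskflow.engines
    from taskflow import task
    from taskflow.patterns import linear_flow


    class SetupDatabase(task.Task):
        def execute(self):
            print("database up")

        def revert(self, **kwargs):
            # taskflow calls revert() on completed tasks if a later task fails.
            print("rolling back database")


    class ConfigureKeystone(task.Task):
        def execute(self):
            print("keystone configured")


    # Tasks in a linear flow run strictly in the order they were added.
    flow = linear_flow.Flow("deploy-sketch").add(SetupDatabase(), ConfigureKeystone())
    taskflow.engines.run(flow)
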
{
"key": {
"uuid": "6922dabb_9376b813",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 421,
"author": {
"id": 7144
},
"writtenOn": "2021-08-11T19:03:58Z",
"side": 1,
"message": "\u003e No thanks. I don\u0027t think anyone wants to sign on to write a task graph utility from scratch.\n\nThat is not what I proposed. What I proposed was directord + task-core without Heat. A clean room implementation of a deployment using the proposed framework here without having to worry about Heat or backwards compatibility (and puppet for that matter, outside the scope I know). The keystone example is what devstack does. I\u0027d like to see more of that, but what TripleO does.\n\nTaskflow activity and maintenance is a concern for me, but I\u0027m not dismissing on those grounds alone. I don\u0027t see any agreement from the tripleo core team about maintaining it, and that *would* be grounds to not use it IMO.\n\n\u003e Your ask here is a bit difficult for me because we didn\u0027t ask for any of this before jumping on the ansible bandwagon and have been dealing with the fallout for years now. In this case we\u0027ve provided scale examples for both task-core and directord at larger scale than we ever tested with ansible or heat. Why is this such a sticking point *now*?\n\nI don\u0027t totally agree with your characterization of the past, but it\u0027s valid feedback. If your position is we didn\u0027t carefully consider choices in the past, and now we\u0027re paying for the fallout, then to learn from that we should carefully consider choices now. Assuming we failed at making decisions on tooling choices on the past, let\u0027s not make decisions the same way again. That\u0027s why it\u0027s a sticking point, and why it\u0027s worth all this discussion.\n\nThe idea that we can stand up keystone in 5 minutes is not evidence on which to build a foundational choice. We all know it\u0027s the long tail of deployment that takes the most time to implement.\n\nMy specific ask is that we prove more of that out. Illustrate migrating an actual service template if that\u0027s the route we agree on, or do a clean room implementation using directord+task-core with the goal being the end result is the same overcloud architecture.",
"parentUuid": "00f9dc6a_e461ca1b",
"range": {
"startLine": 421,
"startChar": 24,
"endLine": 421,
"endChar": 71
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "39e21adf_1322ed52",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 421,
"author": {
"id": 14985
},
"writtenOn": "2021-08-11T19:18:16Z",
"side": 1,
"message": "\u003e That is not what I proposed. What I proposed was directord + task-core without Heat. A clean room implementation of a deployment using the proposed framework here without having to worry about Heat or backwards compatibility (and puppet for that matter, outside the scope I know). The keystone example is what devstack does. I\u0027d like to see more of that, but what TripleO does.\n\nPlease provide the scope of the ask in terms of what type of PoC you are looking for. Is it pacemaker+mysql+keystone+nova+cinder on 3ctrl+1compute? What is the scope we\u0027re talking about? We\u0027ve put together a PoC that does a standalone of mysql+rabbit+keystone on a single node and showed how it works conceptually already.\n\n\u003e Taskflow activity and maintenance is a concern for me, but I\u0027m not dismissing on those grounds alone. I don\u0027t see any agreement from the tripleo core team about maintaining it, and that *would* be grounds to not use it IMO.\n\nTask flow is the initial choice of project to provide a graph processor + execution framework. If we find something else, we can use that. I feel that the concern about that is nit picking implementation.\n\n\u003e I don\u0027t totally agree with your characterization of the past, but it\u0027s valid feedback. If your position is we didn\u0027t carefully consider choices in the past, and now we\u0027re paying for the fallout, then to learn from that we should carefully consider choices now. Assuming we failed at making decisions on tooling choices on the past, let\u0027s not make decisions the same way again. That\u0027s why it\u0027s a sticking point, and why it\u0027s worth all this discussion.\n\nAs previously mentioned elsewhere, we\u0027ve done more intrusive changes with less thought than is here. You might not agree but if you look through tripleo-specs vs the major changes we\u0027ve done you\u0027ll see my points about lack of forethought. I don\u0027t have issues with people raising concerns, however they seems to be things that have been addressed in the provided documentation or not necessarily in scope for this spec which seems like we have a disconnect on explaining the scope.\n\n\u003e The idea that we can stand up keystone in 5 minutes is not evidence on which to build a foundational choice. We all know it\u0027s the long tail of deployment that takes the most time to implement.\n\nTrue a simple keystone deployment is not representative an actual environment but it\u0027s more of a conceptual PoC in that we can do X actions on Y time where as the existing method would take significantly longer. We investigated what we actually do in all the services and as Kevin has pointed out it\u0027s somewhere like 50% shell. And doing shell in ansible is slow so speeding up the execution of the work around processing the actual work on the remote nodes is significant. We\u0027ve historically focused on the execution time of the actual work vs the overall time it takes for the deployment process to do that. We know that whatever keystone-db-manage or script to manage containers is a fixed cost that this effort willy likely not improve. But there\u0027s a performance penalty that we\u0027ve payed repeatedly with Heat/Ansible when we go to actually perform the deployment which is where we\u0027re hoping to recover some of the lost performance. We saw that Heat was faster overall, but had issues with reliability in comparison to ansible. 
So we\u0027re proposing essentially combining the two methods we\u0027ve previously used into the next solution via task-core+directord where we get to pick the best of both based on our historical experiances.\n\n\u003e My specific ask is that we prove more of that out. Illustrate migrating an actual service template if that\u0027s the route we agree on, or do a clean room implementation using directord+task-core with the goal being the end result is the same overcloud architecture.\n\nOk, services and architecture do you want for this PoC?",
"parentUuid": "6922dabb_9376b813",
"range": {
"startLine": 421,
"startChar": 24,
"endLine": 421,
"endChar": 71
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "b226ec97_1e66abfa",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 421,
"author": {
"id": 7144
},
"writtenOn": "2021-08-11T19:52:22Z",
"side": 1,
"message": "\u003e Task flow is the initial choice of project to provide a graph processor + execution framework. If we find something else, we can use that. I feel that the concern about that is nit picking implementation.\n\nFair enough, if the task-core consumption of taskflow is abstracted in a way that we can swap implementations, I think that would help. Not asking for that initially, but we should think about the possibility.\n\n\u003e As previously mentioned elsewhere, we\u0027ve done more intrusive changes with less thought than is here. You might not agree but if you look through tripleo-specs vs the major changes we\u0027ve done you\u0027ll see my points about lack of forethought.\n\nI do agree we\u0027ve made bigger changes in the past with less discussion.\n\nI don\u0027t agree with your assessment on how some of the examples you provide have worked out. Regardless though, if it\u0027s the case that we failed at those past decisions, then I at least hope we would do things differently this time around, and not hold up those supposed failed decisions as the model for how we should be moving such decisions forward this time around.\n\n\u003e I don\u0027t have issues with people raising concerns, however they seems to be things that have been addressed in the provided documentation or not necessarily in scope for this spec which seems like we have a disconnect on explaining the scope.\n\nI will say the spec starts with pretty broad language in the introduction and problem description. I did not nitpick that and tried to focus on the details and proposed change. But perhaps if that were simplified, it would help...\n\nWhat is the \"perfect storm\" that we are reacting to?\n\nWhat is the \"course correct\"?\n\n\"bespoke tools\"...yet we are proposing 3 more, and don\u0027t illustrate a path for removing any?\n\nInitially the spec didn\u0027t say anything about Heat. I do think that has caused some confusion. The example review is for a test service and doesn\u0027t show how an existing service template would migrate to use the new interface, nor is it explained how we would get from the ansible-runner driver to something not !ansible. Nor does it address how to remove the other uses of ansible (as dprince has pointed out).\n\nPuppet has also caused confusion. It is included in the same problem description alongside Ansible, so not surprising there is some assumption that it is addressed by the spec as well.\n\nIt\u0027s not push back, I\u0027m in favor of trying these new tools, but I\u0027m asking if we have ideas on how to approach these other points that the spec itself introduces.\n\nAs for the clean room idea, I would say if we just used tripleoclient, roles_data, network_data, task-core, and directord, what would it look like?\n\nIf we just illustrated one role, a simplified AIO, then that role would be easy to parse in tripleoclient, and pull the list of services out, then pass that to task-core. The roles_data processing could even be pseudo code, and use a hardcoded list of services.\n\nThat doesn\u0027t have to be the approach, but what I\u0027d like to see is a minimum AIO (doesn\u0027t have to use pacemaker), but should do network config, and the end result should be what tripleo would deploy.",
"parentUuid": "39e21adf_1322ed52",
"range": {
"startLine": 421,
"startChar": 24,
"endLine": 421,
"endChar": 71
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "c95064e6_cc5780d8",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 421,
"author": {
"id": 7353
},
"writtenOn": "2021-08-11T19:57:25Z",
"side": 1,
"message": "\u003e \u003e The idea that we can stand up keystone in 5 minutes is not evidence on which to build a foundational choice. We all know it\u0027s the long tail of deployment that takes the most time to implement.\n\nWe did run an execution analysis which is something we should be able to build a foundation upon. The proposal is execution in pseudo realtime, while our current solution has a typical roundtrip of around 1.5 seconds per-task. While keystone in 5 minutes is an example of our current POC the analysis reveals it currently takes half an hour and ~30GiB of RAM to concurrently execute 1000 tasks across 8 hosts, while directord can run the same workload in around 20 seconds with negligible impacts on memory; we\u0027ve documented this and provided all of the testing materials to recreate the experiment. It is true the long tail of deployment is time consuming, however, foundationally I assert we\u0027re starting from a better place.",
"parentUuid": "39e21adf_1322ed52",
"range": {
"startLine": 421,
"startChar": 24,
"endLine": 421,
"endChar": 71
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "38a654a8_a8ce2e34",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 421,
"author": {
"id": 14985
},
"writtenOn": "2021-08-11T20:14:56Z",
"side": 1,
"message": "\u003e I don\u0027t agree with your assessment on how some of the examples you provide have worked out. Regardless though, if it\u0027s the case that we failed at those past decisions, then I at least hope we would do things differently this time around, and not hold up those supposed failed decisions as the model for how we should be moving such decisions forward this time around.\n\nWe are doing things differently this time but it doesn\u0027t seem that\u0027s being recognized. This spec actually has technical details around scale, etc as those things were completely glossed over in the past but have bit us hard.\n\n\u003e \"bespoke tools\"...yet we are proposing 3 more, and don\u0027t illustrate a path for removing any?\n\nSo maybe that needs to be clarified a bit more in terms of road-map. I think the first target for removal would be ansible/ansible-runner as a primary path in our deployment process. But it\u0027s still TBD on if we can completely remove it for 3rd parties. I think the description might be adjusted to highlight that the tools choose to perform an action are not necessarily developed with that scope in mind. Lots of scope creep for things like heat/ansible/puppet and what we want is a more unix tool philosophy for smaller tools that do a single thing and do it well rather than larger tools that do many things marginally adequately. \n\n\u003e As for the clean room idea, I would say if we just used tripleoclient, roles_data, network_data, task-core, and directord, what would it look like?\n\nI talked to Kevin an we\u0027re going to try and construct something that deploys openstack from after baremetal \u0026 network provisioning for comparison sake. Using a basic 3ctl+1compute (no pacemaker). network data application is just execution of os-net-config after putting a file in place. I don\u0027t think this is really valuable in a PoC context.\n\n\u003e If we just illustrated one role, a simplified AIO, then that role would be easy to parse in tripleoclient, and pull the list of services out, then pass that to task-core. The roles_data processing could even be pseudo code, and use a hardcoded list of services.\n\nYou need to scope what you consider AIO. We use just a keystone AIO for updates CI. So we\u0027ve already done your PoC ask by one definition of AIO. \n\n\u003e That doesn\u0027t have to be the approach, but what I\u0027d like to see is a minimum AIO (doesn\u0027t have to use pacemaker), but should do network config, and the end result should be what tripleo would deploy.\n\nSo I think the issue is how much of a PoC do you want and what needs to be included in it? We have a PoC to deploy openstack services on a single node. If we need to leverage the existing tripleo services/framework as part of the PoC, there are still issues with trying to integration with THT that perhaps we don\u0027t want to do for this PoC. I proposed one path forward in for the adoption patches but they may not be a correct solution since its still trying to continue backwards compatibility.",
"parentUuid": "b226ec97_1e66abfa",
"range": {
"startLine": 421,
"startChar": 24,
"endLine": 421,
"endChar": 71
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "81e4fa3f_245dab18",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 421,
"author": {
"id": 7144
},
"writtenOn": "2021-08-11T21:12:06Z",
"side": 1,
"message": "\u003e We are doing things differently this time but it doesn\u0027t seem that\u0027s being recognized. This spec actually has technical details around scale, etc as those things were completely glossed over in the past but have bit us hard.\n\nI recognize it. Not sure what type of responses you were looking for.\n\n\u003e So I think the issue is how much of a PoC do you want and what needs to be included in it? We have a PoC to deploy openstack services on a single node. If we need to leverage the existing tripleo services/framework as part of the PoC, there are still issues with trying to integration with THT that perhaps we don\u0027t want to do for this PoC. I proposed one path forward in for the adoption patches but they may not be a correct solution since its still trying to continue backwards compatibility.\n\nI\u0027m not going to get into spelling it out as a box to check, because I\u0027m not blocking on the existence of a POC.\n\nWhat I originally asked for is how someone would migrate a service template to use task-core/directord, as it wasn\u0027t clear to me and not addressed in the spec. What I wanted to see is a description of how we get from the keystone in t-h-t to the example in task-core. \n\nA POC would illustrate that, as would an example or pseudo code, it wouldn\u0027t actually have to execute. What is tripleo asking of service owners?\n\nAs the t-h-t patch shows using ansible-runner, it\u0027s also not clear what we would do to not use ansible. I think there were some earlier comments I have not fully caught up on about migrating ansible roles.\n\nThe task-core keystone example is more of a devstack type deployment of keystone. The t-h-t patch shows a sample service using ansible-runner. What would the first step be for keystone in t-h-t? Add a core_services output that uses ansible-runner to apply the same ansible roles we do today to deploy keystone? The end to end dots are still not connected (at least in the spec).\n\nThat\u0027s what got onto the POC path I believe. I originally asked for pseudo code, or just an explanation of how that would work.\n\nI\u0027m also fine to work that out in the code reviews. It just seemed like there were already some ideas on how this would work and I wanted to see it could be explained or illustrated.",
"parentUuid": "38a654a8_a8ce2e34",
"range": {
"startLine": 421,
"startChar": 24,
"endLine": 421,
"endChar": 71
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "40361e43_ae39ed4f",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 421,
"author": {
"id": 8833
},
"writtenOn": "2021-08-12T03:01:02Z",
"side": 1,
"message": "\u003e Right now the issues with the existing unsupported tool is that it\u0027s effectively being changed out from under us in an incompatible and less efficient fashion.\n\nThere are number of other OpenStack projects and deployment tools that use ansible (openstack-ansible, kolla-ansible etc). As one of the major drivers for this effort is \"future direction of ansible\" and it\u0027s incompatibility, do we\u0027ve an understanding what are the future roadmaps for those projects look like, or this is driven purely from downstream packaging standpoint?",
"parentUuid": "81e4fa3f_245dab18",
"range": {
"startLine": 421,
"startChar": 24,
"endLine": 421,
"endChar": 71
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "b5adcb75_f0d8e2b5",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 421,
"author": {
"id": 8833
},
"writtenOn": "2021-08-12T03:10:41Z",
"side": 1,
"message": "\u003e Please provide the scope of the ask in terms of what type of PoC you are looking for.\n\nLooking at the directord example for keystone, I don\u0027t get much sense of how the the existing components (ex. THT, container tools) would tie-up together, maybe it\u0027s only me.\n\nIt would be better if we\u0027ve something concrete. Why not have a prototype that deploys undercloud using the proposed toolset. This will give us understanding of the integration issues and complexities.\n\n[1] https://github.com/cloudnull/directord/blob/main/orchestrations/openstack-keystone.yaml",
"parentUuid": "40361e43_ae39ed4f",
"range": {
"startLine": 421,
"startChar": 24,
"endLine": 421,
"endChar": 71
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "ae86ac9c_d4204a85",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 421,
"author": {
"id": 7353
},
"writtenOn": "2021-08-12T03:57:56Z",
"side": 1,
"message": "\u003e Looking at the directord example for keystone, I don\u0027t get much sense of how the the existing components (ex. THT, container tools) would tie-up together, maybe it\u0027s only me.\n\nThe example in directord was something we put together to show we could deploy keystone, and it\u0027s supporting services, in a multi-node setup; that example does not highlight any TripleO integrations. The work in progress review from Alex highlights how we\u0027d integrate [0].\n\n\u003e There are number of other OpenStack projects and deployment tools that use ansible (openstack-ansible, kolla-ansible etc). As one of the major drivers for this effort is \"future direction of ansible\" and it\u0027s incompatibility, do we\u0027ve an understanding what are the future roadmaps for those projects look like, or this is driven purely from downstream packaging standpoint?\n\nDownstream packaging is a significant huddle for us. In terms of other OpenStack deployment tooling, projects like OpenStack-Ansible builds everything from source so the issue of packaging will almost never be problem; if memory serves, this is mostly true for Kolla-Ansible as well. While we may be able to keep everything operational upstream, that isn\u0027t true downstream and operational doesn\u0027t mean we\u0027d be inline with the community.\n\n[0] https://review.opendev.org/c/openstack/tripleo-heat-templates/+/798747",
"parentUuid": "b5adcb75_f0d8e2b5",
"range": {
"startLine": 421,
"startChar": 24,
"endLine": 421,
"endChar": 71
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "f0b69c6c_7f11d36e",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 421,
"author": {
"id": 8833
},
"writtenOn": "2021-08-12T04:10:10Z",
"side": 1,
"message": "\u003e The work in progress review from Alex highlights how we\u0027d integrate [0].\n\nI had seen that example and it does not tell how things would integrated, once we drop ansible as it\u0027s mostly using the config-downloaded playbooks with ansible-runner.\n\nAs I mentioned earlier a prototype that deploys undercloud/standalone without ansible would help.",
"parentUuid": "ae86ac9c_d4204a85",
"range": {
"startLine": 421,
"startChar": 24,
"endLine": 421,
"endChar": 71
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "4eefbb21_ba9c57e4",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 421,
"author": {
"id": 7353
},
"writtenOn": "2021-08-12T13:24:32Z",
"side": 1,
"message": "+1 we\u0027ll work on a more complete prototype and get back to this spec with the results / examples.",
"parentUuid": "f0b69c6c_7f11d36e",
"range": {
"startLine": 421,
"startChar": 24,
"endLine": 421,
"endChar": 71
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "975093cb_cbd9f7bf",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 451,
"author": {
"id": 4571
},
"writtenOn": "2021-08-05T23:24:22Z",
"side": 1,
"message": "The problem description implies that puppet will be replaced for config generation. Can this be captured in the work items? I\u0027m assuming the migration path would start with replacing puppet invocations with equivalent ansible?\n\nAlso it might be worth identifying what work items other teams can do to migrate their own services to the new model - this is what we did for the move to containers. This would ease the development workload on the tripleo team and help with knowledge transfer and general buy-in",
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "74f3f18d_cce30f3e",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 451,
"author": {
"id": 11655
},
"writtenOn": "2021-08-06T00:18:05Z",
"side": 1,
"message": "I concur. Even though the work is ultimately dependencies, it is still going to need to be spelled out for teams to be able to estimate for their management chains the effort involved and if they *can* even commit to cross team work in the time window.",
"parentUuid": "975093cb_cbd9f7bf",
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "60decf25_8b3dffec",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 451,
"author": {
"id": 7353
},
"writtenOn": "2021-08-06T15:02:13Z",
"side": 1,
"message": "Puppet is part of the problem statement because it is one of the many layers of abstractions we have. With task-core we\u0027ll support a mixed ansible/directord environment but should we move forward with the spec, development of new ansible based functionality would stop. While puppet will remain at play, due to how deeply embedded it is in some places, we\u0027ll look for opportunities to replace redundant capabilities; just as we did when we began the tripleo-ansible push.",
"parentUuid": "74f3f18d_cce30f3e",
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "ae4593d6_4507eee7",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 451,
"author": {
"id": 14985
},
"writtenOn": "2021-08-09T15:39:00Z",
"side": 1,
"message": "Port ansible roles does not imply anything about puppet (ansible roles don\u0027t handly any of this). Puppet executions are separate action embedded in the framework today. There is no current desire to replace that along with the actual task execution methods at this time. In the future if we decide to break the existing hieradata interface that customers and 3rd parties have relied on for years, that will be a separate effort.",
"parentUuid": "60decf25_8b3dffec",
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "6a9a2a2f_a47bd6d2",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 458,
"author": {
"id": 11655
},
"writtenOn": "2021-08-06T00:18:05Z",
"side": 1,
"message": "If not, it might be time for TripleO to consider renaming it\u0027s project or embracing a new identity because the identity which it started with, as suggested by this specification, is going to systematically removed. The core team identity would no longer be openstack on openstack. Plus side: t-shirts that don\u0027t confuse people outside of our community\u0027s circle. Actually, maybe the project should rename regardless. :)",
"range": {
"startLine": 458,
"startChar": 17,
"endLine": 458,
"endChar": 68
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "e7db6f13_ac71b169",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 458,
"author": {
"id": 7353
},
"writtenOn": "2021-08-06T15:02:13Z",
"side": 1,
"message": "I agree, the original idea of OpenStack on OpenStack is no longer at play; even today, without this spec. \n\nI would love less confusing conference swag 😊",
"parentUuid": "6a9a2a2f_a47bd6d2",
"range": {
"startLine": 458,
"startChar": 17,
"endLine": 458,
"endChar": 68
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "66bbc07c_731d249f",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 458,
"author": {
"id": 7144
},
"writtenOn": "2021-08-06T21:07:36Z",
"side": 1,
"message": "This specific spec doesn\u0027t propose removing any more OpenStack API\u0027s (Heat, Ironic, Neutron, Keystone still used).\n\nRenames have come up in the past, and personally I don\u0027t support it as it feels like wasted energy to me. It\u0027s just a name, it does not have to imply anything or mean anything. I\u0027m rather indifferent in folks what to invest in a rename, I just don\u0027t think it\u0027s worth it.\n\nOriginally, it had a distinct meaning (OpenStack on OpenStack), but even that has little significance with changes that are multiple years old by now (e.g. Big Tent). Kolla, Kayobe, openstack-ansible are also \"OpenStack\". Are they not also OpenStack on OpenStack? I\u0027d argue the original meaning was never well understood anyway, and I don\u0027t see a lot of confusion existing today either.\n\nMission statements, goals, and strategies certainly should be changed and evolved as needed to reflect reality.",
"parentUuid": "e7db6f13_ac71b169",
"range": {
"startLine": 458,
"startChar": 17,
"endLine": 458,
"endChar": 68
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
},
{
"key": {
"uuid": "f4c251f1_9c36131e",
"filename": "specs/xena/directord-orchestration.rst",
"patchSetId": 4
},
"lineNbr": 458,
"author": {
"id": 7144
},
"writtenOn": "2021-08-06T21:09:00Z",
"side": 1,
"message": "edit: *if folks want to invest in a rename*",
"parentUuid": "66bbc07c_731d249f",
"range": {
"startLine": 458,
"startChar": 17,
"endLine": 458,
"endChar": 68
},
"revId": "975c7280c649e390625700604c793d41bf05e32c",
"serverId": "4a232e18-c5a9-48ee-94c0-e04e7cca6543",
"unresolved": true
}
]
}