diff --git a/specs/juno/l3-agent-responsiveness.rst b/specs/juno/l3-agent-responsiveness.rst
new file mode 100644
index 000000000..964b40d52
--- /dev/null
+++ b/specs/juno/l3-agent-responsiveness.rst
@@ -0,0 +1,248 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=======================
L3 Agent Responsiveness
=======================

https://blueprints.launchpad.net/neutron/+spec/l3-agent-responsiveness

:Author: Carl Baldwin
:Copyright: 2014 Hewlett-Packard Development Company, L.P.

On agent restart, the L3 agent loops through all routers to be sure they are
in sync with the database.  This task can take over an hour on a heavily
loaded system because of rootwrap, sudo and other inefficiencies.  The task
locks out RPC processing until it is done.  From a user's perspective, the
system appears completely unresponsive to floating IP and port changes.


Problem description
===================

On agent restart, the L3 agent immediately kicks off a periodic task called
*_sync_routers_task*.  This task grabs a semaphore which locks out the
*_rpc_loop* until it is done, making the L3 agent unresponsive to new work
coming in via RPC.  Floating IPs must wait to become active or inactive,
router gateways do not get plugged or unplugged, and subnet ports cannot be
manipulated.  This gives a poor impression to a user who has just made an API
call to get something done.


Proposed change
===============

Overview
--------

This blueprint proposes unifying *_sync_routers_task* and *_rpc_loop* into a
single processing loop.  This single loop will give priority to RPC messages.
In other words, an RPC message about a given router will bump that router
ahead of all of the routers already queued by the *_sync_routers_task* code
path.

The justification for prioritizing in this way is that *_sync_routers_task*
requests maintenance updates to routers.  It is meant to catch the somewhat
unlikely case that a change was made to a router while the agent was down.
RPC messages generally represent changes to the system that are being
requested through the API in the moment.  With this in mind, it is clear that
RPC messages should be given precedence to improve the user experience.

To be fair, *_sync_routers_task* is also helpful if the system reboots after a
crash.  In that case, these updates are more than just maintenance.  However,
every router on the system is already down at that point, so it is still
prudent to respond to user requests with priority.

Each update will carry a timestamp so that updates with the same priority can
be ordered by time when many are queued at once.

Parallelism
~~~~~~~~~~~

The current L3 implementation allows processing many routers in parallel.  In
fact, there is virtually no bound on the number of routers that can be
processed in parallel except for the limit of 1000 green threads.  In
practice, though, it is not useful to process more than 4-8 routers in
parallel: there are enough contention points preventing proper cooperation
between the threads that most of them end up starved anyway.

This blueprint's implementation will create a *_process_routers_loop* to
process all updates.  This loop will use a green thread pool of fixed size.
The loop will continuously spawn worker threads to ensure that the maximum
number of workers are either processing a router or waiting on the queue for
the next update to come in.
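
The general shape of this loop is sketched below.  This is a minimal sketch
only, assuming eventlet green threads; apart from *_process_routers_loop* and
*_process_router_update*, the names (the agent class, the injected queue and
its *get_next_update* method, and *_process_router*) are illustrative rather
than the final implementation::

    import eventlet


    class L3AgentSketch(object):
        """Illustrative only; not the actual L3 agent class."""

        def __init__(self, queue):
            # Fixed pool size of 8, based on testing done at scale; a
            # follow-on could make this configurable.
            self._pool = eventlet.GreenPool(size=8)
            self._queue = queue  # priority queue of router updates

        def _process_router(self, update):
            # Placeholder for the actual work of bringing one router in
            # sync with the database.
            pass

        def _process_router_update(self):
            # Block (green-thread friendly) until the next update is
            # available, then process that router.
            update = self._queue.get_next_update()
            self._process_router(update)

        def _process_routers_loop(self):
            while True:
                # spawn_n blocks while the pool is full, so this loop keeps
                # the maximum number of workers either processing a router
                # or waiting on the queue for the next update.
                self._pool.spawn_n(self._process_router_update)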
The size of the worker pool can easily be made configurable in a follow-on to
this blueprint if there is enough demand.  However, based on testing done at
scale, the size will initially be set to 8.

ExclusiveRouterProcessor
~~~~~~~~~~~~~~~~~~~~~~~~

A new worker will spawn and immediately call *_process_router_update*.  This
method will immediately look to the queue for the next router update.  It will
block (in a friendly, green-thread style) until one is available.

At this point, there are some timing and coordination issues to consider.  The
ExclusiveRouterProcessor class was designed to take care of these.

First, since there are multiple workers and many routers, we don't want
multiple workers touching the same router at once.  To avoid this, the queue
implementation will return an instance of ExclusiveRouterProcessor that
guarantees that the worker has exclusive access to the router, even if other
update messages come in while it is being processed.  This worker will be
considered the master for this router until its updates are finished.

Second, there is the possibility that a new update for the router will come in
and bubble to the front of the priority queue while the router is being
processed with outdated information.  To handle this case, the worker that
picks up this new update will try to get exclusive access to the router by
creating an instance of ExclusiveRouterProcessor.  This instance will realize
that it is not the master processor and will simply append the update to the
list of updates that the master instance will process.

When the master instance is done processing the router, it will check its
queue of updates to see if the router needs to be processed again.  If another
update is found, it will process the router again, fetching new information
from the DB.  It will loop until there are no more updates.  This covers the
case where a user is actively making updates to a router over a period of
time.  The router simply needs to be processed several times in a row until
the user is finished.

It is important to note here that a new update must have bubbled all the way
to the front of the priority queue, and a worker must have grabbed it off of
the queue, before the router processor will loop on the router and process it
multiple times in a row.  Without this important distinction, the algorithm
would be subject to a denial of service attack where 8 routers could
completely starve all of the other routers on the system.

The complexity in this class is mostly around guaranteeing that there is only
one master processor for any given router.  The rest is around the `Timing`_
issues discussed next.

Timing
~~~~~~

Each update carries a timestamp that is initialized to the time when the
update was received.

The ExclusiveRouterProcessor class carries one timestamp per router that is
set to a time just before the database query that fetched the latest data
about the router.  This timestamp is *not* recorded until the router has been
processed using those data.  It will be recorded by calling the
fetched_and_processed method.  This is very important because the timestamp
records the age of the data that were last used to complete an update to the
router on the system.  This handles the time delta between when the data were
fetched and when the router is finally updated.

In the case of *_sync_routers_task*, the same timestamp is used for the update
and for the age of the data, since the system will immediately run a query to
get all of the router data after the updates are created.  The new
implementation will still update all routers; they will just be updated with a
lower priority than the RPC generated updates.

An update will be processed for a router if and only if the update timestamp
is newer than the most recent router_data_timestamp.
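
Putting the exclusive access and timestamp rules together, a simplified sketch
follows.  Only the ExclusiveRouterProcessor class name and the
fetched_and_processed method come from this blueprint; the internal
dictionaries and the *release* method are illustrative, and the real
implementation must also remain green-thread safe::

    import datetime


    class ExclusiveRouterProcessor(object):
        """Simplified sketch; not the actual implementation."""

        _masters = {}          # router_id -> the current master processor
        _data_timestamps = {}  # router_id -> age of data last used to update

        def __init__(self, router_id):
            self._router_id = router_id
            self._queue = []
            # The first processor created for a router becomes its master
            # and keeps that role until it calls release().
            self._masters.setdefault(router_id, self)
            self._master = self._masters[router_id] is self

        def queue_update(self, update):
            # Every processor, master or not, appends to the master's queue
            # so that updates arriving mid-processing are not lost.
            self._masters[self._router_id]._queue.append(update)

        def updates(self):
            # Only the master processes.  It loops as long as there are
            # updates newer than the data it last used to update the router.
            while self._master and self._queue:
                update = self._queue.pop(0)
                last = self._data_timestamps.get(self._router_id,
                                                 datetime.datetime.min)
                if update.timestamp > last:
                    yield update

        def fetched_and_processed(self, timestamp):
            # Record the age of the data only after the router has actually
            # been processed using those data.
            if self._master:
                self._data_timestamps[self._router_id] = timestamp

        def release(self):
            # When the master is done, the next processor created for this
            # router becomes the new master.
            if self._master:
                del self._masters[self._router_id]

In this sketch, a worker that pulls an update off the queue would create a
processor for that router, queue its update, and drain *updates()*; a
non-master processor yields nothing, leaving the update for the master to pick
up on its next pass.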
Alternatives
------------

Speeding up the L3 agent has been a work item for some time, and progress was
made in this area during the Icehouse time frame.  For example, sudo was found
to have an inefficiency that added 100 milliseconds or more to each
invocation, which affected the L3 agent's ability to plumb routers in a timely
manner.

In the Juno time frame, we will get a new daemon mode for rootwrap which will
speed up the agent a great deal.

The bottom line is that speeding up the agent will not be enough.  On an agent
hosting hundreds of routers, there will still be a significant delay caused by
*_sync_routers_task* which will affect the end user experience.

Data model impact
-----------------

None

REST API impact
---------------

None

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

Improved responsiveness to L3 changes made through the API following an agent
restart.

Other deployer impact
---------------------

This change will allow deployers of large scale clouds using the L3 agent to
breathe easier.  They will be able to deploy updates to the code base and
restart the L3 agents without worrying about the effect on the system's
overall responsiveness.

Developer impact
----------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  carl-baldwin

Other contributors:
  None

Work Items
----------

https://review.openstack.org/#/c/78819

Dependencies
============

None

Testing
=======

No new gate tests will be required as this does not change functionality.  The
implementation will be fully unit tested, including new tests to cover the
functionality of the priority queue and the router processor.


Documentation Impact
====================

None


References
==========

https://review.openstack.org/#/c/78819