da3ce73198
This adds support for deleting OVN controller/metadata agents.
Behavior is undefined if the agents are still actually up as per
the Agent API docs.

As part of this, it is necessary to be able to tell all workers
that the agent is gone. This can't be done by deleting the
Chassis, because ovn-controller deletes the Chassis if it is
stopped gracefully and we need to still display those agents as
down until ovn-controller is restarted. This also means we can't
write a value to the Chassis marking the agent as 'deleted'
because the Chassis may not be there. And of course you can't
use the cache because then other workers won't see that the
agent is deleted.

Due to the hash ring implementation, we also cannot naively just
send some pre-defined event that all workers can listen for to
update their status of the agent. Only one worker would process
the event. So we need some kind of GLOBAL event type that is
processed by all workers.

When the hash ring implementation was done, the agent API
implementation was redesigned to work around moving from having
a single OVN Worker to having distributed events. That
implementation relied on marking the agents 'alive' in the
OVSDB. With large numbers of Chassis entries, this induces
significant load, with 2 DB writes per Chassis per
cfg.CONF.agent_down_time / 2 seconds (37 by default).

This patch reverts that change and goes back to using events to
store agent information in the cache, but adds support for
"GLOBAL" events that are run on each worker that uses a
particular connection.

Change-Id: I4581848ad3e176fa576f80a752f2f062c974c2d1
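
Illustrative sketch (not the code in this patch): the toy model below
shows why a hash-ring-dispatched delete leaves stale cache entries on
the other workers, and how an event delivered to every worker avoids
that. All names here (Worker, Dispatcher, global_event) are
hypothetical, and the ring is approximated with a plain hash-modulo
choice rather than a real consistent-hash ring.

```python
import hashlib


class Worker:
    """Hypothetical stand-in for one worker with its own agent cache."""

    def __init__(self, name):
        self.name = name
        self.agents = {}  # agent_id -> status

    def handle(self, event):
        kind, agent_id = event
        if kind == "update":
            self.agents[agent_id] = "alive"
        elif kind == "delete":
            self.agents.pop(agent_id, None)


class Dispatcher:
    """Delivers an event to one worker (ring) or to all (global)."""

    def __init__(self, workers):
        self.workers = list(workers)

    def _owner(self, key):
        # Assumption: hash-modulo stands in for the hash ring lookup.
        digest = hashlib.sha256(key.encode()).hexdigest()
        return self.workers[int(digest, 16) % len(self.workers)]

    def dispatch(self, event, global_event=False):
        if global_event:
            targets = self.workers
        else:
            targets = [self._owner(event[1])]
        for worker in targets:
            worker.handle(event)
        return len(targets)


def holders(workers, agent_id):
    """Count the workers whose cache still lists the agent."""
    return sum(agent_id in w.agents for w in workers)


workers = [Worker("w0"), Worker("w1"), Worker("w2")]
dispatcher = Dispatcher(workers)

# Every worker learns about the agent through a global update.
dispatcher.dispatch(("update", "chassis-1"), global_event=True)

# A ring-dispatched delete reaches exactly one worker.
ring_targets = dispatcher.dispatch(("delete", "chassis-1"))
stale_after_ring = holders(workers, "chassis-1")

# A global delete reaches all of them.
dispatcher.dispatch(("delete", "chassis-1"), global_event=True)
stale_after_global = holders(workers, "chassis-1")
```

With three workers, the ring-dispatched delete leaves the agent in two
caches; only the global delete clears it everywhere.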