d9913370de
One of the biggest frustrations larger operators have is when they trigger a massive number of concurrent deployments. As one would expect, the memory utilization of the conductor goes up. However, even with the default number of worker threads, if we are asked to convert 80 images at the same time, or to perform the write-out to the remote node at the same time, we will consume a large amount of system RAM. Or, more specifically, qemu-img will consume a large amount of memory.

If the amount of available memory drops too low, the system can trigger the OOMKiller, which will kill processes that are using RAM. Ideally, we do not want this to happen to our conductor process, much less to the work that is being performed, so we need to add some guard rails to keep us from entering situations where we may compromise the conductor by taking on too much work.

Adds a guard in the conductor to prevent multiple parallel deployment operations from running the conductor out of memory.

With the defaults, the conductor will attempt to throttle back automatically and hold worker threads. This slows down the amount of work proceeding through the conductor, because we are in a memory condition where we should be careful about the work we take on.

By default, the conductor waits 15 seconds between re-checks of available RAM, for a total of six retries. The default minimum is 1024 (MB), as this is the amount of memory qemu-img allocates when trying to write images. This quite literally means no additional qemu-img process can spawn until the memory situation has resolved itself.

Change-Id: I69db0169c564c5b22abd0cb1b890f409c13b0ac2
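The throttling described above can be pictured with a minimal Python sketch. It is not the code from this change: the function name `wait_for_memory`, the `InsufficientMemory` exception, and the injected `available_mb` and `sleep` callables are all hypothetical, and the sketch assumes one initial check followed by up to six re-checks spaced 15 seconds apart. Only the default values (1024 MB, 15 seconds, six retries) come from the description above.

```python
import time
from typing import Callable


class InsufficientMemory(Exception):
    """Hypothetical error raised when memory never becomes available."""


def wait_for_memory(
    available_mb: Callable[[], float],
    minimum_required_mb: int = 1024,
    wait_time_s: int = 15,
    wait_retries: int = 6,
    warning_only: bool = False,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Hold the calling worker until enough memory is free.

    Returns True when the check passes, False when the check fails in
    warning-only mode, and raises InsufficientMemory once the retries
    are exhausted.
    """
    # Assumption: one initial check, then up to ``wait_retries`` re-checks.
    for attempt in range(wait_retries + 1):
        if available_mb() >= minimum_required_mb:
            return True
        if warning_only:
            # Report the low-memory condition without holding the worker.
            return False
        if attempt < wait_retries:
            # Holding the worker here is what rate limits concurrency.
            sleep(wait_time_s)
    raise InsufficientMemory(
        f"less than {minimum_required_mb} MB available after "
        f"{wait_retries} retries"
    )
```

In a real service, `available_mb` could be backed by something like `psutil.virtual_memory().available`, converted to megabytes. Injecting it as a callable is only a choice made here so that the sketch can be exercised without a real low-memory condition.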
24 lines
1.0 KiB
YAML
---
features:
  - |
    The ``ironic-conductor`` process now has a concept of an internal
    memory limit. The intent of this is to prevent the conductor from running
    the host out of memory when a large number of deployments have been
    requested.

    These settings can be tuned using
    ``[DEFAULT]minimum_required_memory``,
    ``[DEFAULT]minimum_memory_wait_time``,
    ``[DEFAULT]minimum_memory_wait_retries``, and
    ``[DEFAULT]minimum_memory_warning_only``.

    Where possible, Ironic will attempt to wait out the time window, thus
    consuming the conductor worker thread, which will resume if the memory
    becomes available. This will effectively rate limit concurrency.

    If raw image conversions within the conductor are required, and a
    situation exists where insufficient memory is available and it cannot be
    waited out, the deployment operation will fail. For the ``iscsi``
    deployment interface, which is the other location in ironic that may
    consume large amounts of memory, the conductor will wait until the next
    agent heartbeat.