Guard conductor from consuming all of the RAM

One of the biggest frustrations larger operators have is what happens
when they trigger a massive number of concurrent deployments. As one
would expect, the memory utilization of the conductor rises. Yet even
with the default number of worker threads, if we are asked to convert
80 images at the same time, or to perform the write-out to remote
nodes at the same time, we will consume a large amount of system RAM.
Or, more specifically, qemu-img will consume a large amount of memory.

If the amount of available memory drops too low, the kernel can
trigger the OOMKiller, which kills processes to reclaim RAM. Ideally,
we do not want this to happen to our conductor process, much less to
the work it is performing, so we need to add some guard rails to keep
us from entering situations where taking on too much work could
compromise the conductor.

This change adds a guard in the conductor to prevent multiple parallel
deployment operations from running the conductor out of memory.

With the defaults, the conductor will attempt to throttle back
automatically, holding worker threads, which slows the amount of work
proceeding through the conductor while we are in a memory condition
that calls for caution about taking on new work.

The defaults allow this to occur with 15 seconds between re-checks of
available RAM, for a total of six retries. The default minimum is
1024 MB, as this is the amount of memory qemu-img allocates when
writing out images. This quite literally means no additional qemu-img
process can spawn until the memory situation has resolved itself.
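
For reference, a minimal ironic.conf sketch spelling out these
defaults (values taken from the option definitions in this change)
might look like:

    [DEFAULT]
    # Warning-only mode is the default; set to False to actually hold
    # worker threads when memory is low.
    minimum_memory_warning_only = True
    # MiB that must be available before starting a memory-intensive process.
    minimum_required_memory = 1024
    # Seconds between re-checks, and how many re-checks to attempt.
    minimum_memory_wait_time = 15
    minimum_memory_wait_retries = 6

With these values, a held worker re-checks available RAM every 15
seconds for up to six retries, so a single request can be delayed for
roughly 90 seconds (6 x 15 s) before the conductor gives up.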

Change-Id: I69db0169c564c5b22abd0cb1b890f409c13b0ac2
Julia Kreger 2020-05-08 15:15:06 -07:00
parent fd34d3c437
commit d9913370de
11 changed files with 266 additions and 1 deletions

@@ -1414,6 +1414,9 @@ function configure_ironic {
    iniset $IRONIC_CONF_FILE DEFAULT use_syslog $SYSLOG
    # NOTE(vsaienko) with multinode each conductor should have its own host.
    iniset $IRONIC_CONF_FILE DEFAULT host $LOCAL_HOSTNAME
    # NOTE(TheJulia) Set a minimum amount of memory that is more in-line with
    # OpenStack CI and the images deployed.
    iniset $IRONIC_CONF_FILE DEFAULT minimum_required_memory 256
    # Retrieve deployment logs
    iniset $IRONIC_CONF_FILE agent deploy_logs_collect $IRONIC_DEPLOY_LOGS_COLLECT
    iniset $IRONIC_CONF_FILE agent deploy_logs_storage_backend $IRONIC_DEPLOY_LOGS_STORAGE_BACKEND

@@ -829,3 +829,46 @@ commonly where it is encountered.:
Once you have updated the saved interfaces, you should be able to safely
return the ``ironic.conf`` configuration change in changing what interfaces
are enabled by the conductor.

I'm getting Out of Memory errors
================================

This issue, also known as the "OOMKiller got my conductor" case, is
where your host's system memory reaches a point where the operating
system engages measures to shed active memory consumption in order to
prevent a complete failure of the machine. Unfortunately, this can
cause unpredictable behavior.

How did I get here?
-------------------

One of the major consumers of memory in a host running an
ironic-conductor is the transformation of disk images using the
``qemu-img`` tool. This tool, because the disk images it works with are
both compressed and out of linear block order, requires a considerable
amount of memory to efficiently re-assemble and write out a disk to a
device, or to simply convert the format, such as to a ``raw`` image.

By default, ironic's configuration limits this conversion to 1 GB of
RAM for the process, but each conversion does cause additional buffer
memory to be used, which increases overall system memory pressure.
Generally, memory pressure from buffers alone will not cause an
out-of-memory condition, but multiple conversions or deployments
running at the same time *can* cause extreme memory pressure and risk
the system running out of memory.
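
To see how much memory the kernel could actually grant to a new
process on your conductor host, you can query the same value the
conductor itself checks. A quick sketch using psutil, the library this
change relies on:

    import psutil

    # 'available' is the memory the kernel can hand to a process without
    # pushing the system into swap, reported here in MiB.
    print(psutil.virtual_memory().available // (1024 * 1024), 'MiB available')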

How do I resolve this?
----------------------

This can be addressed a few different ways:

* Use raw images; however, these images can be substantially larger
  and require more data to be transmitted "over the wire".
* Add more physical memory.
* Add swap space.
* Reduce concurrency, possibly via another conductor or by changing the
  nova-compute.conf ``max_concurrent_builds`` parameter.
* Or finally, adjust the ``[DEFAULT]minimum_required_memory`` parameter
  in your ironic.conf file. The default should be considered a "default
  of last resort" and you may need to reserve additional memory. You may
  also wish to adjust the ``[DEFAULT]minimum_memory_wait_retries`` and
  ``[DEFAULT]minimum_memory_wait_time`` parameters.

@@ -808,3 +808,8 @@ class UnknownAttribute(ClientSideError):

class AgentInProgress(IronicException):
    _msg_fmt = _('Node %(node)s command "%(command)s" failed. Agent is '
                 'presently executing a command. Error %(error)s')


class InsufficentMemory(IronicException):
    _msg_fmt = _("Available memory at %(free)s, insufficient as %(required)s "
                 "is required to proceed at this time.")

@@ -402,6 +402,8 @@ def image_to_raw(image_href, path, path_tmp):
    if fmt != "raw":
        staged = "%s.converted" % path

        utils.is_memory_insufficent(raise_if_fail=True)
        LOG.debug("%(image)s was %(format)s, converting to raw",
                  {'image': image_href, 'format': fmt})
        with fileutils.remove_path_on_error(staged):
@@ -27,6 +27,7 @@ import os
import re
import shutil
import tempfile
import time

import jinja2
from oslo_concurrency import processutils
@@ -35,6 +36,7 @@ from oslo_serialization import jsonutils
from oslo_utils import fileutils
from oslo_utils import netutils
from oslo_utils import timeutils
import psutil
import pytz

from ironic.common import exception
@@ -586,3 +588,61 @@ def file_mime_type(path):
    """Gets a mime type of the given file."""
    return execute('file', '--brief', '--mime-type', path,
                   use_standard_locale=True)[0].strip()


def _get_mb_ram_available():
    # NOTE(TheJulia): The .available value is the memory that can be given
    # to a process without this process beginning to swap itself.
    return psutil.virtual_memory().available / 1024 / 1024


def is_memory_insufficent(raise_if_fail=False):
    """Checks available system memory and holds the deployment process.

    Evaluates the current system memory available, meaning what can be
    allocated to a process by the kernel upon allocation request,
    and delays the execution until memory has been freed,
    or until it has timed out.

    This method will issue a sleep, if the amount of available memory is
    insufficient. This is configured using the
    ``[DEFAULT]minimum_memory_wait_time`` and the
    ``[DEFAULT]minimum_memory_wait_retries`` settings.

    :param raise_if_fail: Default False, but if set to True an
                          InsufficentMemory exception is raised
                          upon insufficient memory.
    :returns: True if the check has timed out. Otherwise None is returned.
    :raises: InsufficentMemory if the raise_if_fail parameter is set to
             True.
    """
    required_memory = CONF.minimum_required_memory
    loop_count = 0

    while _get_mb_ram_available() < required_memory:
        log_values = {
            'available': _get_mb_ram_available(),
            'required': required_memory,
        }
        if CONF.minimum_memory_warning_only:
            LOG.warning('Memory is at %(available)s MiB, required is '
                        '%(required)s. Ironic is in warning-only mode '
                        'which can be changed by altering the '
                        '[DEFAULT]minimum_memory_warning_only option.',
                        log_values)
            return False
        if loop_count >= CONF.minimum_memory_wait_retries:
            LOG.error('Memory is at %(available)s MiB, required is '
                      '%(required)s. Notifying caller that we have '
                      'exceeded retries.',
                      log_values)
            if raise_if_fail:
                raise exception.InsufficentMemory(
                    free=_get_mb_ram_available(),
                    required=required_memory)
            return True
        LOG.warning('Memory is at %(available)s MiB, required is '
                    '%(required)s, waiting.', log_values)
        # Sleep so the interpreter can switch threads.
        time.sleep(CONF.minimum_memory_wait_time)
        loop_count = loop_count + 1
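
To make the two calling conventions concrete, here is a hypothetical
caller-side sketch (not part of this diff; it assumes
``[DEFAULT]minimum_memory_warning_only`` has been set to False so the
guard actually holds):

    from ironic.common import exception
    from ironic.common import utils

    # Hold-and-retry mode: sleeps between re-checks and returns True only
    # if the retries were exhausted while memory stayed below the minimum.
    if utils.is_memory_insufficent():
        pass  # e.g. defer the work until the next agent heartbeat

    # Fail-fast mode: raises instead of returning once retries are
    # exhausted, as the image conversion path in this change does.
    try:
        utils.is_memory_insufficent(raise_if_fail=True)
    except exception.InsufficentMemory:
        raise  # e.g. fail the deployment operation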

@@ -352,6 +352,32 @@ service_opts = [
                    ('json-rpc', _('use JSON RPC transport'))],
               help=_('Which RPC transport implementation to use between '
                      'conductor and API services')),
    cfg.BoolOpt('minimum_memory_warning_only',
                mutable=True,
                default=True,
                help=_('Setting to govern if Ironic should only warn instead '
                       'of attempting to hold back the request in order to '
                       'prevent the exhaustion of system memory.')),
    cfg.IntOpt('minimum_required_memory',
               mutable=True,
               default=1024,
               help=_('Minimum memory in MiB for the system to have '
                      'available prior to starting a memory intensive '
                      'process on the conductor.')),
    cfg.IntOpt('minimum_memory_wait_time',
               mutable=True,
               default=15,
               help=_('Seconds to wait between retries for free memory '
                      'before launching the process. This, combined with '
                      '``minimum_memory_wait_retries``, allows the '
                      'conductor to determine how long we should attempt '
                      'to directly retry.')),
    cfg.IntOpt('minimum_memory_wait_retries',
               mutable=True,
               default=6,
               help=_('Number of retries to hold onto the worker before '
                      'failing or returning the thread to the pool if '
                      'the conductor can automatically retry.')),
]

utils_opts = [
utils_opts = [

@@ -699,7 +699,19 @@ class ISCSIDeploy(agent_base.AgentDeployMixin, agent_base.AgentBaseMixin,
        node = task.node
        LOG.debug('Continuing the deployment on node %s', node.uuid)
        if utils.is_memory_insufficent():
            # Insufficient memory, but we can just let the agent heartbeat
            # again in order to initiate deployment when the situation has
            # changed.
            LOG.warning('Insufficient memory to write image for node '
                        '%(node)s. Skipping until next heartbeat.',
                        {'node': node.uuid})
            info = node.driver_internal_info
            info['skip_current_deploy_step'] = False
            node.driver_internal_info = info
            node.last_error = "Deploy delayed due to insufficient memory"
            node.save()
            return states.DEPLOYWAIT
        uuid_dict_returned = do_agent_iscsi_deploy(task, self._client)
        utils.set_node_nested_field(node, 'driver_internal_info',
                                    'deployment_uuids', uuid_dict_returned)

@@ -19,12 +19,14 @@ import os
import os.path
import shutil
import tempfile
import time
from unittest import mock

import jinja2
from oslo_concurrency import processutils
from oslo_config import cfg
from oslo_utils import netutils
import psutil

from ironic.common import exception
from ironic.common import utils
@@ -447,6 +449,49 @@ class TempFilesTestCase(base.TestCase):
                          utils._check_dir_free_space, "/fake/path")
        mock_stat.assert_called_once_with("/fake/path")

    @mock.patch.object(time, 'sleep', autospec=True)
    @mock.patch.object(psutil, 'virtual_memory', autospec=True)
    def test_is_memory_insufficent(self, mock_vm_check, mock_sleep):
        self.config(minimum_memory_warning_only=False)

        class vm_check(object):
            available = 1000000000

        mock_vm_check.return_value = vm_check
        self.assertTrue(utils.is_memory_insufficent())
        self.assertEqual(14, mock_vm_check.call_count)

    @mock.patch.object(time, 'sleep', autospec=True)
    @mock.patch.object(psutil, 'virtual_memory', autospec=True)
    def test_is_memory_insufficent_good(self, mock_vm_check,
                                        mock_sleep):
        self.config(minimum_memory_warning_only=False)

        class vm_check(object):
            available = 3276700000

        mock_vm_check.return_value = vm_check
        self.assertFalse(utils.is_memory_insufficent())
        self.assertEqual(1, mock_vm_check.call_count)

    @mock.patch.object(time, 'sleep', autospec=True)
    @mock.patch.object(psutil, 'virtual_memory', autospec=True)
    def test_is_memory_insufficent_recovers(self, mock_vm_check,
                                            mock_sleep):
        class vm_check_bad(object):
            available = 1023000000

        class vm_check_good(object):
            available = 3276700000

        self.config(minimum_memory_warning_only=False)
        mock_vm_check.side_effect = iter([vm_check_bad,
                                          vm_check_bad,
                                          vm_check_good])
        self.assertFalse(utils.is_memory_insufficent())
        self.assertEqual(3, mock_vm_check.call_count)


class GetUpdatedCapabilitiesTestCase(base.TestCase):

@@ -62,6 +62,24 @@ class TestImageCacheFetch(base.TestCase):
            None, self.uuid, self.dest_path, True)
        self.assertFalse(mock_clean_up.called)

    @mock.patch.object(image_cache, '_fetch', autospec=True)
    @mock.patch.object(image_cache.ImageCache, 'clean_up', autospec=True)
    @mock.patch.object(image_cache.ImageCache, '_download_image',
                       autospec=True)
    def test_fetch_image_no_master_dir_memory_low(self,
                                                  mock_download,
                                                  mock_clean_up,
                                                  mock_fetch):
        mock_fetch.side_effect = exception.InsufficentMemory
        self.cache.master_dir = None
        self.assertRaises(exception.InsufficentMemory,
                          self.cache.fetch_image,
                          self.uuid, self.dest_path)
        self.assertFalse(mock_download.called)
        mock_fetch.assert_called_once_with(
            None, self.uuid, self.dest_path, True)
        self.assertFalse(mock_clean_up.called)

    @mock.patch.object(image_cache.ImageCache, 'clean_up', autospec=True)
    @mock.patch.object(image_cache.ImageCache, '_download_image',
                       autospec=True)
@@ -220,6 +238,14 @@ class TestImageCacheFetch(base.TestCase):
        self.assertEqual(2, mock_link.call_count)
        self.assertTrue(mock_log.error.called)

    @mock.patch.object(image_cache, '_fetch', autospec=True)
    def test__download_image_raises_memory_guard(self, mock_fetch):
        mock_fetch.side_effect = exception.InsufficentMemory
        self.assertRaises(exception.InsufficentMemory,
                          self.cache._download_image,
                          self.uuid, self.master_path,
                          self.dest_path)


@mock.patch.object(os, 'unlink', autospec=True)
class TestUpdateImages(base.TestCase):

@@ -611,6 +611,9 @@ class ISCSIDeployTestCase(db_base.DbTestCase):
        )
        self.node.driver_internal_info['agent_url'] = 'http://1.2.3.4:1234'
        dhcp_factory.DHCPFactory._dhcp_provider = None
        # Memory checks shouldn't apply here as they will lead to
        # false positives for unit testing in CI.
        self.config(minimum_memory_warning_only=True)

    def test_get_properties(self):
        with task_manager.acquire(self.context, self.node.uuid,
@@ -1081,6 +1084,23 @@ class ISCSIDeployTestCase(db_base.DbTestCase):
        set_boot_device_mock.assert_called_once_with(
            mock.ANY, task, device=boot_devices.DISK, persistent=True)

    @mock.patch.object(iscsi_deploy, 'do_agent_iscsi_deploy', autospec=True)
    @mock.patch.object(utils, 'is_memory_insufficent', autospec=True)
    def test_write_image_out_of_memory(self, mock_memory_check,
                                       mock_do_iscsi_deploy):
        mock_memory_check.return_value = True
        self.node.provision_state = states.DEPLOYWAIT
        self.node.target_provision_state = states.ACTIVE
        self.node.save()
        with task_manager.acquire(self.context, self.node.uuid) as task:
            value = task.driver.deploy.write_image(task)
            self.assertEqual(states.DEPLOYWAIT, value)
            self.assertEqual(states.DEPLOYWAIT, task.node.provision_state)
            self.assertIn('skip_current_deploy_step',
                          task.node.driver_internal_info)
        self.assertTrue(mock_memory_check.called)
        self.assertFalse(mock_do_iscsi_deploy.called)

    @mock.patch.object(manager_utils, 'restore_power_state_if_needed',
                       autospec=True)
    @mock.patch.object(manager_utils, 'power_on_node_if_needed', autospec=True)

@@ -0,0 +1,23 @@
---
features:
  - |
    The ``ironic-conductor`` process now has a concept of an internal
    memory limit. The intent of this is to prevent the conductor from
    running the host out of memory when a large number of deployments
    have been requested.

    These settings can be tuned using
    ``[DEFAULT]minimum_required_memory``,
    ``[DEFAULT]minimum_memory_wait_time``,
    ``[DEFAULT]minimum_memory_wait_retries``, and
    ``[DEFAULT]minimum_memory_warning_only``.

    Where possible, Ironic will attempt to wait out the time window,
    consuming the conductor worker thread, which will resume if the
    memory becomes available. This effectively rate limits concurrency.

    If raw image conversions within the conductor are required, and a
    situation exists where memory is insufficient and cannot be waited
    out, the deployment operation will fail. For the ``iscsi`` deployment
    interface, which is the other location in ironic that may consume
    large amounts of memory, the conductor will wait until the next
    agent heartbeat.