From bb30f9945c63487fcc060df6c511e26072aa0b7c Mon Sep 17 00:00:00 2001
From: Julia Kreger
Date: Tue, 7 Jul 2020 07:49:40 -0700
Subject: [PATCH] Add some tuning documentation

Change-Id: I56e3c45bf7ae89b3f96ee826565bf153908d1bf7
---
 doc/source/admin/drivers/ipa.rst |   2 +
 doc/source/admin/index.rst       |   1 +
 doc/source/admin/tuning.rst      | 141 +++++++++++++++++++++++++++++++
 3 files changed, 144 insertions(+)
 create mode 100644 doc/source/admin/tuning.rst

diff --git a/doc/source/admin/drivers/ipa.rst b/doc/source/admin/drivers/ipa.rst
index 3a5aa57c40..2fea3f9c84 100644
--- a/doc/source/admin/drivers/ipa.rst
+++ b/doc/source/admin/drivers/ipa.rst
@@ -44,6 +44,8 @@ Requirements
 
 Using IPA requires it to be present and configured on the deploy ramdisk, see
 :ref:`deploy-ramdisk`
 
+.. _ipa-proxies:
+
 Using proxies for image download
 ================================
diff --git a/doc/source/admin/index.rst b/doc/source/admin/index.rst
index c2bb921905..dd3a17bf67 100644
--- a/doc/source/admin/index.rst
+++ b/doc/source/admin/index.rst
@@ -53,6 +53,7 @@ Advanced Topics
    Agent Token
    Deploying without BMC Credentials
    Layer 3 or DHCP-less Ramdisk Booting
+   Tuning Ironic
 
 .. toctree::
    :hidden:
diff --git a/doc/source/admin/tuning.rst b/doc/source/admin/tuning.rst
new file mode 100644
index 0000000000..02636e6083
--- /dev/null
+++ b/doc/source/admin/tuning.rst
@@ -0,0 +1,141 @@
+=============
+Tuning Ironic
+=============
+
+Memory Utilization
+==================
+
+Memory utilization is difficult to tune in Ironic, largely because API
+consumers may ask us to perform work for which the underlying tools require
+large amounts of memory.
+
+The biggest example of this is image conversion. Images not in a raw format
+need to be written out to disk (local files, or a remote target with the
+``iscsi`` deploy interface), which requires the conversion process to
+generate an in-memory map to re-assemble the image contents into a coherent
+stream of data. This entire process also stresses the kernel buffers and
+cache.
+
+This ultimately comes down to a trade-off of memory versus performance,
+similar to the trade-off of performance versus cost.
+
+On the plus side, an idle Ironic deployment does not need much in the way
+of memory. On the down side, in a highly bursty environment where a large
+number of concurrent deployments may be requested, you should consider two
+aspects:
+
+* How is the ``ironic-api`` service/process set up? Will more
+  processes be launched automatically?
+* Are images prioritized for storage size on disk? Or are they compressed
+  and in need of format conversion?
+
+API
+===
+
+Ironic's API should have a fairly stable memory footprint with activity;
+however, depending on how the webserver runs the API, additional processes
+can be launched.
+
+Under normal conditions, as of Ironic 15.1, the ``ironic-api``
+service/process consumes approximately 270MB of memory per worker. Depending
+on how the process is being launched, the number of workers and the maximum
+number of request threads per worker may differ. Naturally there are
+configuration and performance trade-offs.
+
+* Launched directly as a native Python process, i.e. by executing the
+  ``ironic-api`` program. Each single worker allows multiple requests to be
+  handled and threaded at the same time, which can allow high levels of
+  request concurrency. As of the Victoria cycle, a direct invocation of the
+  ``ironic-api`` program will only launch a maximum of four workers.
+* Launched via a wrapper such as Apache+uWSGI, which may allow for multiple
+  distinct worker processes, but these workers typically limit the number
+  of request processing threads that are permitted to execute. This means
+  requests can stack up in the front-end webserver and be released to the
+  ``ironic-api`` as prior requests complete. In environments with
+  long-running synchronous calls, such as use of the vendor passthru
+  interface, this can be very problematic.
+
+When the webserver is launched by the API process directly, the default
+number of workers is based upon the number of CPU sockets in your machine.
+
+When launching using uWSGI, the worker count depends entirely upon your
+configuration, but balancing workers and threads based upon your load and
+needs is highly advisable. Each worker is a distinct process and consumes
+far more memory than a comparable number of worker threads. At the same
+time, the scheduler will focus on worker processes, as the threads are
+greenthreads.
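+
+For illustration only, a uWSGI configuration that balances a small number
+of worker processes against a larger number of threads might look like the
+sketch below. The ``wsgi-file`` path and the worker and thread counts are
+examples rather than recommendations, and will need to be adapted to your
+installation and load.
+
+.. code-block:: ini
+
+   [uwsgi]
+   # Path to the WSGI entry point; adjust this to wherever your packaging
+   # installs the ironic-api-wsgi script (the path below is an example).
+   wsgi-file = /usr/bin/ironic-api-wsgi
+   master = true
+
+   # Fewer worker processes keep the roughly 270MB per-worker memory cost
+   # down, while several threads per worker preserve request concurrency.
+   processes = 2
+   threads = 8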
+
+.. note::
+   Host operating systems featuring in-memory de-duplication should see
+   an improvement in the overall memory footprint with multiple processes,
+   but this is not something the development team has measured and it will
+   vary based upon multiple factors.
+
+One important item to note: each Ironic API service/process *does* keep a
+copy of the hash ring, as generated from the database, *in memory*. This is
+done to help allocate load across a cluster in line with how individual
+nodes and their responsible conductors are allocated across the cluster.
+In other words, memory usage WILL increase in proportion to the number of
+nodes managed by each ironic conductor. It is important to understand that
+features such as :doc:`conductor groups <conductor-groups>` mean that only
+matching portions of nodes will be considered for the hash ring if needed.
+
+Conductor
+=========
+
+A conductor process will launch a number of other processes, as required,
+in order to complete the requested work. Ultimately this means it can
+quickly consume large amounts of memory if it is asked to complete a
+substantial amount of work all at once.
+
+As of ironic 15.1, the ``ironic-conductor`` service consumes by default
+about 340MB of RAM in an idle configuration. It operates, by default, as a
+single process. Additional processes can be launched, but they must have
+unique, resolvable hostnames and addresses for JSON-RPC, or use a central
+oslo.messaging supported message bus, in order for webserver API to
+conductor communication to be functional.
+
+Typically, the most memory-intensive operation that can be triggered is an
+image conversion for deployment, which is limited to 1GB of RAM per
+conversion process.
+
+Most deployments, by default, do have a concurrency limit that depends on
+their Compute configuration (see the ``nova.conf`` setting
+``max_concurrent_builds``). However, this limit applies per
+``nova-compute`` worker, so naturally the concurrency will scale with
+additional workers.
+
+Stand-alone users can easily request deployments exceeding the Compute
+service's default maximum number of concurrent builds. As such, if your
+environment is used this way, you may wish to carefully consider your
+deployment architecture.
+
+With a single ``nova-compute`` process talking to a single conductor that
+is asked to perform ten concurrent deployments of images requiring
+conversion, the memory needed may exceed 10GB. This does, however, entirely
+depend upon image block structure and layout, and upon which deploy
+interface is being used.
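+
+As one illustrative way to bound that concurrency, the Compute setting
+mentioned above can be lowered on the relevant ``nova-compute`` hosts. The
+value below is an example only, not a recommendation; the right number
+depends upon your conductor memory and the image formats in use.
+
+.. code-block:: ini
+
+   # nova.conf on the nova-compute host(s)
+   [DEFAULT]
+   # Cap how many instances a single nova-compute worker will build at
+   # the same time (example value only).
+   max_concurrent_builds = 5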
+
+What can I do?
+==============
+
+Previously in this document, we've already suggested some architectural
+constraints and limitations, but there are also some things that can be
+done to maximize performance. Again, this will vary greatly depending on
+your use case.
+
+* Use the ``direct`` deploy interface. This offloads any final image
+  conversion to the host running the ``ironic-python-agent``. Additionally,
+  if Swift or other object storage such as RadosGW is used, downloads can
+  be completely separated from the host running the ``ironic-conductor``.
+* Use small/compact "raw" images. Qcow2 files are generally compressed
+  and require substantial amounts of memory to decompress and stream.
+* Tune the internal memory limit for the conductor using the
+  ``[DEFAULT]memory_required_minimum`` setting, as shown in the example at
+  the end of this document. This will help the conductor throttle back
+  memory-intensive operations. The default should prevent out-of-memory
+  conditions, but under extreme memory pressure this may still be
+  sub-optimal. Before changing this setting, it is highly advised to
+  consult with your resident "Unix wizard" or even the Ironic development
+  team in upstream IRC. This feature was added in the Wallaby development
+  cycle.
+* If network bandwidth is the problem you are seeking to solve for, you may
+  wish to explore a mix of the ``direct`` deploy interface and caching
+  proxies. Such a configuration can be highly beneficial in wide area
+  deployments. See :ref:`Using proxies for image download <ipa-proxies>`.
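+
+For illustration only, a sketch of tuning the conductor memory limit named
+above might look like the following. The value shown is purely an example,
+and the exact option name, default and units should be confirmed against
+the configuration reference for your release before changing anything.
+
+.. code-block:: ini
+
+   # ironic.conf on the conductor host(s)
+   [DEFAULT]
+   # Minimum amount of memory the conductor requires to be available
+   # before it starts a memory-intensive operation such as an image
+   # conversion (example value only).
+   memory_required_minimum = 1024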