From 28b97db915b0773e368889f97aa121588b588cda Mon Sep 17 00:00:00 2001
From: Arne Wiebalck
Date: Thu, 19 Aug 2021 10:30:07 +0200
Subject: [PATCH] [doc] Update power sync documentation

Add some notes on potential UDP packet loss during conductor/BMC
power sync with IPMI, the corresponding increase in retries and
how to mitigate.

Change-Id: I4bc9a8f6f7f4da7f719a65f76ae97b1244701ee9
---
 doc/source/admin/index.rst      |  2 +-
 doc/source/admin/power-sync.rst | 26 +++++++++++++++++++++-----
 2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/doc/source/admin/index.rst b/doc/source/admin/index.rst
index 28f55d021c..3e11d35818 100644
--- a/doc/source/admin/index.rst
+++ b/doc/source/admin/index.rst
@@ -26,7 +26,7 @@ the services.
   Upgrade Guide
   Security
   Troubleshooting FAQ
-  Power Sync with the Compute Service
+  Power Synchronization
   Node Multi-Tenancy
   Fast-Track Deployment
   Booting a Ramdisk or an ISO
diff --git a/doc/source/admin/power-sync.rst b/doc/source/admin/power-sync.rst
index f19ff6c1b3..f4d10aa3c9 100644
--- a/doc/source/admin/power-sync.rst
+++ b/doc/source/admin/power-sync.rst
@@ -1,6 +1,6 @@
-===================================
-Power Sync with the Compute Service
-===================================
+=====================
+Power Synchronization
+=====================
 
 Baremetal Power Sync
 ====================
@@ -10,8 +10,24 @@ value of the :oslo.config:option:`conductor.force_power_state_during_sync`
 option is set to ``true`` the power state in the database will be forced on the
 hardware and if it is set to ``false`` the hardware state will be forced on the
 database. If this periodic task is enabled, it runs at an interval
-defined by the :oslo.config:option:`conductor.sync_power_state_interval` config
-option for those nodes which are not in maintenance.
+defined by the :oslo.config:option:`conductor.sync_power_state_interval`
+config option for those nodes which are not in maintenance. The requests sent
+to Baseboard Management Controllers (BMCs) are done with a parallelism
+controlled by :oslo.config:option:`conductor.sync_power_state_workers`.
+The motivation to send out requests to BMCs in parallel is to handle
+misbehaving BMCs which may delay or even block the synchronization otherwise.
+
+.. note::
+   In deployments with many nodes and IPMI as the configured BMC protocol,
+   the default values of a 60-second power sync interval and 8 worker
+   threads may lead to a high rate of required retries due to client-side UDP
+   packet loss (visible via the corresponding warnings in the conductor
+   logs). While Ironic automatically retries to get the power status
+   for the affected nodes, the failure rate may be reduced by increasing
+   the power sync cycle, e.g. to 300 seconds, and/or by reducing the number
+   of power sync workers, e.g. to 2. Please keep in mind, however, that
+   depending on the concrete setup, increasing the power sync interval may
+   have an impact on other components relying on up-to-date power states.
 
 Compute-Baremetal Power Sync
 ============================