Document PXE retries

Change-Id: I5937fa190e780269ffa677aa01efaa1048fa20b0
2019-09-23 16:02:35 +02:00 · 2019-09-23 16:02:35 +02:00 · b164e18f08
commit b164e18f08
parent 0450c4f2fd
1 changed file with 26 additions and 0 deletions
--- a/doc/source/install/configure-pxe.rst
+++ b/doc/source/install/configure-pxe.rst
@@ -490,3 +490,29 @@ nodes will be deployed by 'grubaa64.efi', and ppc64 nodes by 'bootppc64'::
    # configuration per node architecture. For example:
    # aarch64:/opt/share/grubaa64_pxe_config.template (dict value)
    pxe_config_template_by_arch=aarch64:pxe_grubaa64_config.template,ppc64:pxe_ppc64_config.template
+
+PXE timeouts tuning
+-------------------
+
+Because it relies on UDP-based protocols (DHCP and TFTP), PXE is particularly
+vulnerable to random failures during the boot stage. If the deployment ramdisk
+never calls back to the bare metal conductor, the build is aborted and the
+node is moved to the ``deploy failed`` state once the deploy callback timeout
+expires. This timeout can be changed via the
+:oslo.config:option:`conductor.deploy_callback_timeout` configuration option.
+
+Starting with the Train release, the Bare Metal service can retry a PXE boot
+that takes too long. The retry timeout is defined via
+:oslo.config:option:`pxe.boot_retry_timeout` and must be smaller than
+``deploy_callback_timeout``; otherwise it has no effect.
+
+For example, the following configuration sets the overall timeout to 60
+minutes (3600 seconds) and retries PXE boot every 20 minutes (1200 seconds),
+which allows two retries before the deployment fails:
+
+.. code-block:: ini
+
+    [conductor]
+    deploy_callback_timeout = 3600
+
+    [pxe]
+    boot_retry_timeout = 1200
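+
+As a further sketch, the same relationship can be kept with shorter timeouts.
+The values below are illustrative assumptions, not recommendations. Both
+options are expressed in seconds, and the retry timeout is again one third of
+the callback timeout, which leaves room for two retries:
+
+.. code-block:: ini
+
+    [conductor]
+    # Fail the deployment after 30 minutes without a callback.
+    deploy_callback_timeout = 1800
+
+    [pxe]
+    # Retry PXE boot after 10 minutes; must stay below the value above.
+    boot_retry_timeout = 600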