Hardware introspection for OpenStack Bare Metal =============================================== This is an auxiliary service for discovering hardware properties for a node managed by `Ironic`_. Hardware introspection or hardware properties discovery is a process of getting hardware parameters required for scheduling from a bare metal node, given it's power management credentials (e.g. IPMI address, user name and password). A special ramdisk is required to collect the information on a node. The default one can be built using diskimage-builder_ and `ironic-discoverd-ramdisk element`_ (see Configuration_ below). * Free software: Apache license * Source: http://git.openstack.org/cgit/openstack/ironic-inspector * Bugs: http://bugs.launchpad.net/ironic-inspector * Blueprints: https://blueprints.launchpad.net/ironic-inspector * Downloads: https://pypi.python.org/pypi/ironic-inspector * Python client library and CLI tool: `python-ironic-inspector-client `_. Refer to CONTRIBUTING.rst_ for instructions on how to contribute. .. _Ironic: https://wiki.openstack.org/wiki/Ironic .. _PyPI: https://pypi.python.org/pypi/ironic-inspector .. _CONTRIBUTING.rst: https://github.com/openstack/ironic-inspector/blob/master/CONTRIBUTING.rst .. note:: **ironic-inspector** was called *ironic-discoverd* before version 2.0.0. Version Support Matrix ---------------------- **ironic-inspector** currently requires bare metal API version ``1.6`` to be provided by Ironic. This version is available starting with Ironic Kilo release. Here is a mapping between Ironic versions and supported **ironic-inspector** versions. The Standalone column shows which **ironic-inspector** versions can be used in standalone mode with each Ironic version. The Inspection Interface column shows which **ironic-inspector** versions can be used with the Ironic inspection interface in each version of Ironic. +--------------+-------------------------------+ |Ironic Version| Inspector (Discoverd) Version | | +----------+--------------------+ | |Standalone|Inspection Interface| +==============+==========+====================+ |Juno |1.0 |N/A | +--------------+----------+--------------------+ |Kilo |1.0 - 2.2 |1.0 - 1.1 | +--------------+----------+--------------------+ |Liberty |1.1 - 2.X |2.0 - 2.X | +--------------+----------+--------------------+ .. note:: ``2.X`` means we don't have specific plans on deprecating support for this Ironic version. This does not imply that we'll support it forever though. Workflow -------- Usual hardware introspection flow is as follows: * Operator enrolls nodes into Ironic_ e.g. via ironic CLI command. Power management credentials should be provided to Ironic at this step. * Nodes are put in the correct state for introspection as described in `Node States`_. * Operator sends nodes on introspection using **ironic-inspector** API or CLI (see Usage_). * On receiving node UUID **ironic-inspector**: * validates node power credentials, current power and provisioning states, * allows firewall access to PXE boot service for the nodes, * issues reboot command for the nodes, so that they boot the ramdisk. * The ramdisk collects the required information and posts it back to **ironic-inspector**. * On receiving data from the ramdisk, **ironic-inspector**: * validates received data, * finds the node in Ironic database using it's BMC address (MAC address in case of SSH driver), * fills missing node properties with received data and creates missing ports. .. note:: **ironic-inspector** is responsible to create Ironic ports for some or all NIC's found on the node. **ironic-inspector** is also capable of deleting ports that should not be present. There are two important configuration options that affect this behavior: ``add_ports`` and ``keep_ports`` (please refer to ``example.conf`` for detailed explanation). Default values as of **ironic-inspector** 1.1.0 are ``add_ports=pxe``, ``keep_ports=all``, which means that only one port will be added, which is associated with NIC the ramdisk PXE booted from. No ports will be deleted. This setting ensures that deploying on introspected nodes will succeed despite `Ironic bug 1405131 `_. Ironic inspection feature by default requires different settings: ``add_ports=all``, ``keep_ports=present``, which means that ports will be created for all detected NIC's, and all other ports will be deleted. Refer to the `Ironic inspection documentation`_ for details. * Separate API (see Usage_) can be used to query introspection results for a given node. * Nodes are put in the correct state for deploying as described in `Node States`_. Starting DHCP server and configuring PXE boot environment is not part of this package and should be done separately. .. _instack-undercloud: https://www.rdoproject.org/Deploying_an_RDO_Undercloud_with_Instack .. _Ironic inspection documentation: http://docs.openstack.org/developer/ironic/deploy/install-guide.html#hardware-inspection Installation ------------ Install from PyPI_ (you may want to use virtualenv to isolate your environment):: pip install ironic-inspector Also there is a `DevStack `_ plugin for **ironic-inspector** - see CONTRIBUTING.rst_ for the current status. Finally, some distributions (e.g. Fedora) provide **ironic-inspector** packaged, some of them - under its old name *ironic-discoverd*. Configuration ~~~~~~~~~~~~~ Copy ``example.conf`` to some permanent place (e.g. ``/etc/ironic-inspector/inspector.conf``). Fill in at least these configuration values: * ``os_username``, ``os_password``, ``os_tenant_name`` - Keystone credentials to use when accessing other services and check client authentication tokens; * ``os_auth_url``, ``identity_uri`` - Keystone endpoints for validating authentication tokens and checking user roles; * ``database`` - where you want **ironic-inspector** SQLite database to be placed; * ``dnsmasq_interface`` - interface on which ``dnsmasq`` (or another DHCP service) listens for PXE boot requests (defaults to ``br-ctlplane`` which is a sane default for TripleO-based installations but is unlikely to work for other cases). See comments inside `example.conf `_ for the other possible configuration options. .. note:: Configuration file contains a password and thus should be owned by ``root`` and should have access rights like ``0600``. As for PXE boot environment, you'll need: * TFTP server running and accessible (see below for using *dnsmasq*). Ensure ``pxelinux.0`` is present in the TFTP root. * Build and put into your TFTP directory kernel and ramdisk from the diskimage-builder_ `ironic-discoverd-ramdisk element`_:: ramdisk-image-create -o discovery fedora ironic-discoverd-ramdisk You need diskimage-builder_ 0.1.38 or newer to do it (using the latest one is always advised). * You need PXE boot server (e.g. *dnsmasq*) running on **the same** machine as **ironic-inspector**. Don't do any firewall configuration: **ironic-inspector** will handle it for you. In **ironic-inspector** configuration file set ``dnsmasq_interface`` to the interface your PXE boot server listens on. Here is an example *dnsmasq.conf*:: port=0 interface={INTERFACE} bind-interfaces dhcp-range={DHCP IP RANGE, e.g. 192.168.0.50,192.168.0.150} enable-tftp tftp-root={TFTP ROOT, e.g. /tftpboot} dhcp-boot=pxelinux.0 * Configure your ``$TFTPROOT/pxelinux.cfg/default`` with something like:: default discover label discover kernel discovery.kernel append initrd=discovery.initramfs discoverd_callback_url=http://{IP}:5050/v1/continue ipappend 3 Replace ``{IP}`` with IP of the machine (do not use loopback interface, it will be accessed by ramdisk on a booting machine). .. note:: There are some prebuilt images which use obsolete ``ironic_callback_url`` instead of ``discoverd_callback_url``. Modify ``pxelinux.cfg/default`` accordingly if you have one of these. Here is *inspector.conf* you may end up with:: [DEFAULT] debug = false [ironic] identity_uri = http://127.0.0.1:35357 os_auth_url = http://127.0.0.1:5000/v2.0 os_username = admin os_password = password os_tenant_name = admin [firewall] dnsmasq_interface = br-ctlplane .. note:: Set ``debug = true`` if you want to see complete logs. .. _diskimage-builder: https://github.com/openstack/diskimage-builder .. _ironic-discoverd-ramdisk element: https://github.com/openstack/diskimage-builder/tree/master/elements/ironic-discoverd-ramdisk Running ~~~~~~~ Run as ``root``:: ironic-inspector --config-file /etc/ironic-inspector/inspector.conf .. note:: Running as ``root`` is not required if **ironic-inspector** does not manage the firewall (i.e. ``manage_firewall`` is set to ``false`` in the configuration file). A good starting point for writing your own *systemd* unit should be `one used in Fedora `_ (note usage of old name). Usage ----- Refer to HTTP-API.rst_ for information on the HTTP API. Refer to the `client page`_ for information on how to use CLI and Python library. .. _HTTP-API.rst: https://github.com/openstack/ironic-inspector/blob/master/HTTP-API.rst .. _HTTP API: https://github.com/openstack/ironic-inspector/blob/master/HTTP-API.rst .. _client page: https://pypi.python.org/pypi/python-ironic-inspector-client Using from Ironic API ~~~~~~~~~~~~~~~~~~~~~ Ironic Kilo introduced support for hardware introspection under name of "inspection". **ironic-inspector** introspection is supported for some generic drivers, please refer to `Ironic inspection documentation`_ for details. Node States ~~~~~~~~~~~ * As of Ironic Kilo release the nodes should be moved to ``MANAGEABLE`` provision state before introspection (requires *python-ironicclient* of version 0.5.0 or newer):: ironic node-set-provision-state manage With Juno release and/or older *python-ironicclient* it's recommended to set maintenance mode, so that nodes are not taken by Nova for deploying:: ironic node-update replace maintenance=true * After successful introspection and before deploying nodes should be made available to Nova, either by moving them to ``AVAILABLE`` state (Kilo):: ironic node-set-provision-state provide or by removing maintenance mode (Juno and/or older client):: ironic node-update replace maintenance=false .. note:: Due to how Nova interacts with Ironic driver, you should wait 1 minute before Nova becomes aware of available nodes after issuing these commands. Setting IPMI Credentials ~~~~~~~~~~~~~~~~~~~~~~~~ If you have physical access to your nodes, you can use **ironic-inspector** to set IPMI credentials for them without knowing the original ones. The workflow is as follows: * Ensure nodes will PXE boot on the right network by default. * Set ``enable_setting_ipmi_credentials = true`` in the **ironic-inspector** configuration file. * Enroll nodes in Ironic with setting their ``ipmi_address`` only. This step allows **ironic-inspector** to distinguish nodes. * Set maintenance mode on nodes. That's an important step, otherwise Ironic might interfere with introspection process. * Start introspection with providing additional parameters: * ``new_ipmi_password`` IPMI password to set, * ``new_ipmi_username`` IPMI user name to set, defaults to one in node driver_info. * Manually power on the nodes and wait. * After introspection is finished (watch nodes power state or use **ironic-inspector** status API) you can turn maintenance mode off. Note that due to various limitations on password value in different BMC, **ironic-inspector** will only accept passwords with length between 1 and 20 consisting only of letters and numbers. Plugins ~~~~~~~ **ironic-inspector** heavily relies on plugins for data processing. Even the standard functionality is largely based on plugins. Set ``processing_hooks`` option in the configuration file to change the set of plugins to be run on introspection data. Note that order does matter in this option. These are plugins that are enabled by default and should not be disabled, unless you understand what you're doing: ``ramdisk_error`` reports error, if ``error`` field is set by the ramdisk, also optionally stores logs from ``logs`` field, see `HTTP API`_ for details. ``scheduler`` validates and updates basic hardware scheduling properties: CPU number and architecture, memory and disk size. ``validate_interfaces`` validates network interfaces information. Here are some plugins that can be additionally enabled: ``example`` example plugin logging it's input and output. ``raid_device`` (deprecated name ``root_device_hint``) gathers block devices from ramdisk and exposes root device in multiple runs. ``extra_hardware`` stores the value of the 'data' key returned by the ramdisk as a JSON encoded string in a Swift object. Refer to CONTRIBUTING.rst_ for information on how to write your own plugin. Troubleshooting --------------- Errors when starting introspection ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * *Refusing to introspect node with provision state "available" and maintenance mode off* In Kilo release with *python-ironicclient* 0.5.0 or newer Ironic defaults to reporting provision state ``AVAILABLE`` for newly enrolled nodes. **ironic-inspector** will refuse to conduct introspection in this state, as such nodes are supposed to be used by Nova for scheduling. See `Node States`_ for instructions on how to put nodes into the correct state. Introspection times out ~~~~~~~~~~~~~~~~~~~~~~~ There may be 3 reasons why introspection can time out after some time (defaulting to 60 minutes, altered by ``timeout`` configuration option): #. Fatal failure in processing chain before node was found in the local cache. See `Troubleshooting data processing`_ for the hints. #. Failure to load the ramdisk on the target node. See `Troubleshooting PXE boot`_ for the hints. #. Failure during ramdisk run. See `Troubleshooting ramdisk run`_ for the hints. Troubleshooting data processing ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ In this case **ironic-inspector** logs should give a good idea what went wrong. E.g. for RDO or Fedora the following command will output the full log:: sudo journalctl -u openstack-ironic-inspector (use ``openstack-ironic-discoverd`` for version < 2.0.0). .. note:: Service name and specific command might be different for other Linux distributions (and for old version of **ironic-inspector**). If ``ramdisk_error`` plugin is enabled and ``ramdisk_logs_dir`` configuration option is set, **ironic-inspector** will store logs received from the ramdisk to the ``ramdisk_logs_dir`` directory. This depends, however, on the ramdisk implementation. Troubleshooting PXE boot ^^^^^^^^^^^^^^^^^^^^^^^^ PXE booting most often becomes a problem for bare metal environments with several physical networks. If the hardware vendor provides a remote console (e.g. iDRAC for DELL), use it to connect to the machine and see what is going on. You may need to restart introspection. Another source of information is DHCP and TFTP server logs. Their location depends on how the servers were installed and run. For RDO or Fedora use:: $ sudo journalctl -u openstack-ironic-inspector-dnsmasq (use ``openstack-ironic-discoverd-dnsmasq`` for version < 2.0.0). The last resort is ``tcpdump`` utility. Use something like :: $ sudo tcpdump -i any port 67 or port 68 or port 69 to watch both DHCP and TFTP traffic going through your machine. Replace ``any`` with a specific network interface to check that DHCP and TFTP requests really reach it. If you see node not attempting PXE boot or attempting PXE boot on the wrong network, reboot the machine into BIOS settings and make sure that only one relevant NIC is allowed to PXE boot. If you see node attempting PXE boot using the correct NIC but failing, make sure that: #. network switches configuration does not prevent PXE boot requests from propagating, #. there is no additional firewall rules preventing access to port 67 on the machine where *ironic-inspector* and its DHCP server are installed. If you see node receiving DHCP address and then failing to get kernel and/or ramdisk or to boot them, make sure that: #. TFTP server is running and accessible (use ``tftp`` utility to verify), #. no firewall rules prevent access to TFTP port, #. DHCP server is correctly set to point to the TFTP server, #. ``pxelinux.cfg/default`` within TFTP root contains correct reference to the kernel and ramdisk. Troubleshooting ramdisk run ^^^^^^^^^^^^^^^^^^^^^^^^^^^ Connect to the remote console as described in `Troubleshooting PXE boot`_ to see what is going on with the ramdisk. The ramdisk drops into emergency shell on failure, which you can use to look around. There should be file called ``logs`` with the current ramdisk logs.