570 Commits (afc432af4d2342a550536967a7c880145266c362)

Author SHA1 Message Date
Shivanand Tendulker 433b1fd197 Adds rescue_interface to base driver class
This commit adds `rescue` interface to `BaseDriver` and implements
it for `fake-hardware` hardware type. It adds configuration
parameters '[DEFAULT]/enabled_rescue_interfaces' and
'[DEFAULT]/default_rescue_interface'. The default value of
configuration parameter '[DEFAULT]/enabled_rescue_interfaces' is
`no-rescue`.

It adds new rescue states and a new 'rescue' field to the Node
object. It adds objects.node.Node._convert_to_version().
The method handles converting the new rescue_interface field
between different versions of the Node.

Partial-bug: #1526449
Co-Authored-By: Jay Faulkner <jay@jvf.cc>
Co-Authored-By: Josh Gachnang <josh@pcsforeducation.com>
Co-Authored-By: Jesse J. Cook <jesse.j.cook@member.fsf.org>
Co-Authored-By: Mario Villaplana <mario.villaplana@gmail.com>
Co-Authored-By: Aparna <aparnavtce@gmail.com>
Co-Authored-By: Shivanand Tendulker <stendulker@gmail.com>

Change-Id: I1534247bf207a20a7a58534988192aef392eaff2
6 years ago
Sam Betts b642f28be4 Receive and store agent version on heartbeat
This patch enables receiving agent_version as part of heartbeat, and
stores this information on driver_internal_info. This is so that Ironic
can dynamically adjust which features and parameters it uses based on
which version of the agent is being used.
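
A minimal sketch of the idea, assuming a node represented as a plain dict and a hypothetical `heartbeat` helper (neither is Ironic's actual API): the agent version, when supplied, is recorded under `driver_internal_info` so that later code can branch on it.

```python
def heartbeat(node, agent_version=None):
    """Record the agent version reported with a heartbeat (hypothetical helper)."""
    # Copy so the stored dict is replaced rather than mutated in place.
    info = dict(node.get("driver_internal_info", {}))
    if agent_version is not None:
        info["agent_version"] = agent_version
    node["driver_internal_info"] = info
    return node


# An agent that reports its version, and an older one that does not.
new_node = heartbeat({"uuid": "node-1"}, agent_version="3.0.0")
old_node = heartbeat({"uuid": "node-2"})
```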

Change-Id: I400adba5d908b657751a83971811e8586f46c673
Partial-Bug: #1602265
6 years ago
John L. Villalovos 8ceaad42ff ipmitool: reboot: Don't power off node if already off
Commit ee5d4942a1 changed the existing
behavior so that if an ipmitool command fails when attempting to set
the power state it causes a failure.  The problem with that approach
is that on some systems if the system is already in the desired power
state, an error will be generated when ipmitool tries to change it to
the desired power state.

Now when doing a reboot command we check beforehand to see if the node
is already off; if so, we don't attempt to power off the node again.

Also optimize ironic/conductor/utils.py node_power_action() so that it
only checks a node's power status if it might perform an action based
on the node's power status.
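
The reboot behaviour described above might look roughly like this. It is a sketch only: `FakePower`, `reboot`, and the state constants are hypothetical stand-ins, not the ipmitool driver's real code.

```python
POWER_ON = "power on"
POWER_OFF = "power off"


class FakePower:
    """Hypothetical power interface that records the commands it receives."""

    def __init__(self, state):
        self.state = state
        self.calls = []

    def get_power_state(self):
        return self.state

    def set_power_state(self, target):
        self.calls.append(target)
        self.state = target


def reboot(power):
    # Only issue "power off" when the node is not already off, because
    # some systems report an error for a no-op state change.
    if power.get_power_state() != POWER_OFF:
        power.set_power_state(POWER_OFF)
    power.set_power_state(POWER_ON)
```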

Change-Id: If838aae871753ebfbdf359e0bbe3afcc54c4b559
Closes-Bug: #1718794
6 years ago
John L. Villalovos 0d788525df Reduce complexity of node_power_action() function
An attempt to modify the node_power_action() function caused its
complexity to exceed the 'pep8' check limits. This patch reduces the
complexity of node_power_action() by refactoring it, so that future
changes to the function can pass the check.

Added unit tests to test the functions that were pulled out.

Change-Id: I7f84ff23401997c2b92759c306a7bff0589c7f4c
6 years ago
Jenkins 95cc0b110b Merge "Adds more exception handling for ironic-conductor heartbeat" 6 years ago
D G Lee 56b8eae918 Adds more exception handling for ironic-conductor heartbeat
When the heartbeat thread of the ironic-conductor server reports a
heartbeat, any database exception other than 'DBConnectionError'
interrupts it. So catch 'Exception' in
_conductor_service_record_keepalive to handle all possible exceptions
raised from the database and ensure that the heartbeat thread does not
exit, and also log the exception information. When the database
recovers from an exception, the heartbeat thread continues to report
heartbeats.
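
As an illustration of the pattern, here is a simplified keepalive loop. `keepalive_loop`, `touch`, and the locally defined `DBConnectionError` are assumptions made for this sketch and do not reproduce the conductor's actual method.

```python
import logging
import threading

LOG = logging.getLogger(__name__)


class DBConnectionError(Exception):
    """Hypothetical stand-in for the database connection error."""


def keepalive_loop(touch, stop_event, interval=0.0):
    """Call touch() until stop_event is set, surviving any exception."""
    while not stop_event.is_set():
        try:
            touch()
        except DBConnectionError:
            LOG.warning("Could not connect to the database while heartbeating.")
        except Exception:
            # Catch-all: log the error and keep the heartbeat thread alive.
            LOG.exception("Error while heartbeating.")
        stop_event.wait(interval)
```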

Change-Id: I0dc3ada945275811ef7272d500823e0a57011e8f
Closes-Bug: #1696296
6 years ago
Jenkins f16e7cdf41 Merge "Prevent changes of a resource class for an active node" 6 years ago
Dmitry Tantsur 6525a0da29 Prevent changes of a resource class for an active node
Doing so would confuse nova-scheduler, and may result in attempts
to schedule a new instance on such node. The API version is not
updated, as this behavior is broken already, we're just moving
the breakage to the API level.

Change-Id: I758587d36c927c8eed852170728f6267ae18f001
6 years ago
zhufl 7e6ce7e78c Fix missing print format error
A missing print format causes a 'ValueError: unsupported
format character' error; this patch fixes it.
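
For illustration, this class of error can be reproduced with an invented message template (not the one touched by this commit) in which the conversion type after a mapping key is missing:

```python
values = {"name": "port-1", "value": 42}

# The conversion type is missing after %(name), so ':' is parsed as one.
broken = "Invalid value for %(name): %(value)s"
fixed = "Invalid value for %(name)s: %(value)s"


def render(template, mapping):
    """Format the template, returning the error text if formatting fails."""
    try:
        return template % mapping
    except ValueError as exc:
        return "ValueError: %s" % exc
```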

Change-Id: I83bb1c093b6ce20c901554bba6b1913785bc1c0f
6 years ago
Jenkins c8884754bb Merge "Rolling upgrades support for create_port RPCAPI" 6 years ago
Jenkins 719a4ef8f7 Merge "Refactor get_physnets_by_portgroup_id" 6 years ago
Mark Goddard fc1e18e3e4 Rolling upgrades support for create_port RPCAPI
The new field 'physical_network' was added to Port in version
1.7, and port creation was moved from the API to the conductor
service in commit 9e3f412186.

This change adds a method can_send_create_port() to the conductor
RPCAPI that allows the caller to determine whether the conductor is able
to create ports. During a rolling upgrade this may return False, and the
API will need to determine whether it is able to create the port locally
as was done previously.

The create_port RPC method was added to support validation of the
physical_network field of ports in portgroups. A port may therefore be
safely created in the API service if it is not a member of a portgroup.
If the port being created is a member of a portgroup, then it cannot be
safely validated by the API service and the request must be rejected.
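
The decision described above could be sketched as follows. The function signature, the callables, and `InvalidRequest` are hypothetical; only the name can_send_create_port() comes from this change.

```python
class InvalidRequest(Exception):
    """Hypothetical error for requests the API service must reject."""


def create_port(port, can_send_create_port, create_via_conductor, create_locally):
    if can_send_create_port:
        # Normal case: the conductor creates and validates the port.
        return create_via_conductor(port)
    if port.get("portgroup_id") is not None:
        # During a rolling upgrade the API cannot validate the physical
        # network of a portgroup member, so the request is rejected.
        raise InvalidRequest("cannot create a portgroup member during upgrade")
    # Ports outside a portgroup are safe to create in the API service.
    return create_locally(port)
```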

Change-Id: I8c417cba085f61c3d2ffe1f7e97c64fa85a014cb
Related-Bug: #1666009
Related-Bug: #1526283
6 years ago
Jenkins 15db6df2a5 Merge "Allow updating interfaces on a node in available state" 6 years ago
Mark Goddard 43fa6fc294 Refactor get_physnets_by_portgroup_id
There is a lot of common code between the functions
ironic.conductor.utils.validate_port_physnets() and
ironic.drivers.modules.network.common._get_physnets_by_portgroup().

This change refactors the code, adding a new utility method,
get_physnets_by_portgroup_id, which returns the physical networks
associated with a portgroup. There should be at most one such physical
network, and the presence of multiple will cause
PortgroupPhysnetInconsistent to be raised.
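
A rough sketch of such a utility, assuming ports are plain dicts and using a simplified signature that is not necessarily the one in ironic:

```python
class PortgroupPhysnetInconsistent(Exception):
    """Local stand-in for the exception named in this change."""


def get_physnets_by_portgroup_id(ports, portgroup_id):
    """Return the set of physical networks used by a portgroup's ports."""
    physnets = {
        port["physical_network"]
        for port in ports
        if port["portgroup_id"] == portgroup_id
    }
    # At most one physical network is expected per portgroup.
    if len(physnets) > 1:
        raise PortgroupPhysnetInconsistent(
            "portgroup %s has physical networks %s"
            % (portgroup_id, sorted(map(str, physnets)))
        )
    return physnets
```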

Change-Id: I8f01dcc5eaa0c8511ce77622e41db88e27791327
Related-Bug: #1666009
6 years ago
Hironori Shiina 7b8ecaefc4 Allow updating interfaces on a node in available state
Allow setting the interfaces for a node that is in the
provision state 'available'.

Change-Id: I428dd5905e6ab90c2c0b7867ba487482171b9496
Closes-Bug: #1704913
6 years ago
Jenkins d9983f1eec Merge "Add missing parameter descriptions." 6 years ago
Jenkins f641463cfb Merge "Improve graceful shutdown of conductor process" 6 years ago
gaozx 9ad88d014d Add missing parameter descriptions.
Change-Id: If26820665aedd771773075621b15fcad506c0b38
6 years ago
Ruby Loo 578f01678c Follow-up to fix for power action failure
This is a follow-up patch to the patch so that the power status
is not retried if a power action fails:
ee5d4942a1

It addresses the comments as well as adds more clarification
and updates the documentation to refer to the new
[ipmi]command_retry_timeout config option.

Change-Id: Ib21544da260565ae399e2d07b32af9bd8b810280
Related-Bug: #1692895
6 years ago
Jenkins bc5efdf459 Merge "Physical network aware VIF attachment" 6 years ago
Jenkins 0656229b48 Merge "Raise HTTP 400 rather than 500 error" 6 years ago
Galyna Zholtkevych 3bda561e31 Raise HTTP 400 rather than 500 error
Currently the conductor method for inspecting hardware
raises HardwareInspectionFailure with a 500 Internal Error
if driver.power.validate fails (e.g., ``driver_info`` is
not provided or some fields are missing).

Since this is apparently a client error, HTTP error code
400 Bad Request is more appropriate.
The validation method already raises the needed error,
so catching it is no longer necessary.

Change-Id: I080dedeac7ce33135fde8c53494e618ccf07c941
Closes-Bug: #1686457
6 years ago
Mark Goddard b9b820954d Physical network aware VIF attachment
When attaching virtual interfaces to ironic ports and portgroups, we
need to take account of the physical network assigned to those ports and
portgroups. A neutron virtual interface has a set of physical networks
on which it can be attached which is governed by the segments of its
network (of which there may be more than one).

This change makes the ironic VIF attach API physical network-aware using
the physical network information added to the port object.

When selecting a port or portgroup to attach a virtual interface to, the
following ordered criteria are applied:

* Require ports or portgroups to have a physical network that is
  None or one of the VIF's allowed physical networks.
* Prefer ports or portgroups with a physical network field which
  is not None.
* Prefer portgroups to ports.
* Prefer ports with PXE enabled.

The change is backwards compatible, as the old behaviour is maintained
when ports have a physical_network field which is None.
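
One way to express the ordered criteria listed above is a filter followed by a lexicographic sort key. This is a sketch under assumptions: candidates are plain dicts with invented field names, and `choose_attachment` is a hypothetical helper, not the function added by this change.

```python
def choose_attachment(candidates, vif_physnets):
    """Pick the port or portgroup a VIF should attach to, or None."""
    # Requirement: physical network is None or allowed for the VIF.
    eligible = [
        c for c in candidates
        if c["physical_network"] is None
        or c["physical_network"] in vif_physnets
    ]
    if not eligible:
        return None

    def preference(c):
        # Tuples compare element by element, so earlier entries dominate.
        return (
            c["physical_network"] is not None,  # prefer a set physnet
            c["is_portgroup"],                  # prefer portgroups to ports
            c.get("pxe_enabled", False),        # prefer PXE-enabled ports
        )

    return max(eligible, key=preference)
```

When every candidate has a physical network of None, the first preference is equal for all of them and the choice falls through to the remaining criteria.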

Change-Id: I3d13bfacfb5578f570791e3c06e769a9a0140a4c
Partial-Bug: #1666009
6 years ago
Yuriy Zveryanskyy b720359c06 Improve graceful shutdown of conductor process
When the conductor is being stopped, it waits for completion of
all periodic tasks that are already running. If many nodes are
assigned to the conductor this may take a long time, and the oslo
service library can kill the thread on timeout. This patch adds
code that stops iterating over nodes in periodic tasks if the
conductor is being stopped. These changes reduce both the shutdown
time and the probability of leaving nodes locked after shutdown.
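
The approach can be sketched with a shutdown flag that is checked on each iteration. The names `periodic_task`, `shutdown_event`, and `handle` are illustrative assumptions only.

```python
import threading


def periodic_task(nodes, shutdown_event, handle):
    """Process nodes in order, stopping early once shutdown is requested."""
    processed = []
    for node in nodes:
        if shutdown_event.is_set():
            # The conductor is stopping: leave the remaining nodes alone
            # instead of locking and processing them.
            break
        handle(node)
        processed.append(node)
    return processed
```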

Closes-Bug: #1701495
Change-Id: If6ea48d01132817a6f47560d3f6ee1756ebfab39
6 years ago
Hironori Shiina ddcd97714c Add node power state validation to volume resource update/deletion
This patch validates the power state of a node when the following
actions regarding volume resources associated with the node are
requested.
  - update a volume connector
  - delete a volume connector
  - update a volume target
  - delete a volume target

These actions should be allowed only when the node is turned off, as
designed in the spec.

Change-Id: I5d0465c6ac2d2c6ddac03385e6ed0ccb37556306
Partial-Bug: 1526231
6 years ago
Julian Edwards ee5d4942a1 Don't retry power status if power action fails
The old code blindly required power status even if the power action
failed. Now, it will retry the power action only when it detects a
retryable failure, and will only poll for power status if the power
action is successful. This patch also moves the logic for handling
waiting for power status into the conductor so that the logic is
standardised between drivers.
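
A simplified sketch of this control flow, with hypothetical exception classes and callables and without the waits a real implementation would need between attempts:

```python
class RetryableError(Exception):
    """Hypothetical transient failure that is worth retrying."""


class PowerActionFailed(Exception):
    """Hypothetical terminal failure."""


def change_power_state(send_command, get_state, target, retries=3, polls=5):
    for attempt in range(1, retries + 1):
        try:
            send_command(target)
            break  # action accepted; stop retrying
        except RetryableError:
            if attempt == retries:
                raise PowerActionFailed(
                    "power action failed after %d attempts" % retries)
    # Reached only after a successful power action. Any non-retryable
    # exception from send_command propagates above without polling.
    for _ in range(polls):
        if get_state() == target:
            return target
    raise PowerActionFailed("node did not reach %r" % target)
```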

Change-Id: Ib48056e05d359848386ac057b58921f40b7bdd60
Co-Authored-By: Sam Betts <sam@code-smash.net>
Related-Bug: #1675529
Closes-Bug: #1692895
6 years ago
Jenkins 0424e1a075 Merge "Wire in storage interface attach/detach operations" 6 years ago
Mark Goddard 039225610d Validate portgroup physical network consistency
When creating or updating a port that is a member of a portgroup, we
need to validate the consistency of the physical networks of the ports
in the portgroup.

There are 3 cases we are interested in:

- Creating a port which is a member of a portgroup.
- Updating the physical network of a port which is a member of a
  portgroup.
- Updating the portgroup of a port.

All ports in a portgroup should have the same value (which may be None)
for their physical_network field.

During creation or update of a port in a portgroup we apply the
following validation criteria:

- If the portgroup has existing ports with different physical networks,
  we raise PortgroupPhysnetInconsistent. This shouldn't ever happen.
- If the port has a physical network that is inconsistent with other
  ports in the portgroup, we raise exception.Conflict.

If a port's physical network is None, this indicates that ironic's VIF
attachment mapping algorithm should operate in a legacy (physical
network unaware) mode for this port or portgroup. This allows existing
ironic nodes to continue to function after an upgrade to a release
including physical network support.
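
The two validation criteria could be sketched like this, assuming ports are plain dicts. The exception classes are local stand-ins for the ones named above, and `validate_port_physnet` is a hypothetical helper.

```python
class Conflict(Exception):
    """Local stand-in for exception.Conflict."""


class PortgroupPhysnetInconsistent(Exception):
    """Local stand-in for the exception named in this change."""


def validate_port_physnet(port, existing_ports):
    """Check a created or updated port against its portgroup's other ports."""
    portgroup_id = port["portgroup_id"]
    if portgroup_id is None:
        return  # not a portgroup member; nothing to check
    physnets = {
        p["physical_network"]
        for p in existing_ports
        if p["portgroup_id"] == portgroup_id and p["uuid"] != port["uuid"]
    }
    if len(physnets) > 1:
        # The existing ports already disagree with each other.
        raise PortgroupPhysnetInconsistent(portgroup_id)
    if physnets and port["physical_network"] not in physnets:
        # The new value (which may be None) must match the other ports.
        raise Conflict("physical network differs from the portgroup's ports")
```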

Change-Id: I6a6d248155f98109dd36dba5837494f6974846e6
Partial-Bug: #1666009
6 years ago
Joanna Taryma 104000549e Wire in storage interface attach/detach operations
Addition of storage interface attachment and detachment
operations when:

* Node power is turned on/off, with a storage_interface
  configured, and when the node is in ACTIVE state.
* Node deployment and node tear_down operations.

In addition to attachment and detachment, driver_internal_info
is now populated with a boot from volume target uuid, if a
volume is defined for the node.

Additionally, upon tear_down, the drivers now call a helper
to remove storage related dictionaries and destroy
volume target records.

The "cinder" storage interface has been enabled by default,
and further details on the storage interface's use are in
later patchsets for this feature.

Authored-By: Julia Kreger <juliaashleykreger@gmail.com>
Co-Authored-By: Joanna Taryma <joanna.taryma@intel.com>
Co-Authored-By: Michael Turek <mjturek@linux.vnet.ibm.com>
Change-Id: I0e22312e8cebb37b8f025da2baeca8eb635f35b7
Partial-Bug: #1559691
6 years ago
Mark Goddard 9e3f412186 Move port object creation to conductor
Previously, the API service created port objects without hitting the
conductor. This change moves port creation to the conductor service,
adding a create_port method.

Currently this just performs the port object creation but in a future
change this will be used to validate the physical network assignment of
ports in a portgroup when a port is created, ensuring that all ports in
the group have the same physical network.

The conductor RPC API version has been bumped to 1.41.

Change-Id: I7501bf9fedc668629d5b627475bb54fef5c6bf20
Partial-Bug: #1666009
6 years ago
Yuriy Zveryanskyy 98dc0ddb14 Fix docstrings in conductor manager
Remove the NodeCleaningFailure exception from the docstrings of two
methods; they do not raise it.

Change-Id: I7885d359a613c9ef9f82953cfef67176f94ea59e
6 years ago
Jenkins 19ced862a7 Merge "Config drive support for ceph radosgw" 6 years ago
Anup Navare 58b34b0b30 Config drive support for ceph radosgw
Currently config drive can be stored in swift with keystone
authentication. This change allows ironic to store the config drive in
ceph radosgw and use radosgw authentication mechanism that is not
currently supported. It uses swift API compatibility for ceph radosgw.

New options:
    [deploy]/configdrive_use_object_store
    [deploy]/object_store_endpoint_type
Deprecations:
    [conductor]/configdrive_use_swift
        Replaced by: [deploy]/configdrive_use_object_store
    [glance]/temp_url_endpoint_type
        Replaced by: [deploy]/object_store_endpoint_type

Change-Id: I9204c718505376cfb73632b0d0f31cea00d5e4d8
Closes-Bug: #1642719
6 years ago
Steven Hardy 98d2749bca Improve error message for deleting node from error state
Currently the message doesn't make it clear that you can
switch to maintenance mode, so clarify it to avoid confusion
over what state transitions are possible.

Change-Id: I8990c0bbca26e3ca7cf2beb6531be8bbc426a769
Closes-Bug: #1681778
6 years ago
Jenkins c5531772af Merge "Add comments re RPC versions being in sync" 6 years ago
Pavlo Shchelokovskyy d8438df350 Add comments re RPC versions being in sync
place it to remind to keep it in sync there too.

Change-Id: I745b4b14d6217030a463a9a343eee139d8e155b3
Related-Bug: #1526283
6 years ago
yuan liang 588f2e921f add portgroups in the task_manager docstrings
The ironic portgroup property was added to the TaskManager class, but it
is not mentioned in the docstrings.

Change-Id: I228fd3410810b130d14d8c53cf14eade5186153d
6 years ago
Ruby Loo 62c6377adf Node should reflect what was saved
After a node is saved to the database, we weren't updating the
Node object to reflect what was saved. This caused a problem
where the node's update_at field was incorrect. It was fixed
in 065326c0f5 by explicitly
setting node.update_at. However, that doesn't address other
node fields that may be out of sync.

The more correct fix would be to do a similar thing that (most
of) the other Objects do, which is for the node to update itself
via ._from_db_object().

Doing this revealed several incorrect tests and code in the conductor
and agent where changes to the node's dictionaries were incorrectly
being set and thus, not being saved. Those are fixed in this patch.

Change-Id: Ia84cd60c1a4eabcc1ad0a756124c338fa9f644c8
Closes-Bug: #1679297
Related-Bug: #1281638
6 years ago
Jenkins a84c2e0d8e Merge "Add RPC and object version pinning" 6 years ago
Grzegorz Grasza 9bc06783ec Add RPC and object version pinning
To support rolling upgrades, capping of RPC communication and Ironic
objects is required. Old RPC services and objects may still be running
while an upgrade is in progress. This makes sure that these old
services are called and all objects are used in a supported RPC and
objects version.

This patch adds the configuration option "pin_release_version". Setting
this option caps (downgrades) the internal RPC communication to the
specified version to enable communication with older services. When
doing a rolling upgrade from version X to Y, set this to X. It defaults
to using the newest (current) possible RPC behavior and object versions.
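
To illustrate the effect of the option, the sketch below resolves a pin to a set of versions. The mapping, the release names "X" and "Y", and every version number in it are invented for this example; only the option name "pin_release_version" comes from this patch.

```python
# Hypothetical mapping from release name to the versions it supports.
# All release names and version numbers here are made up.
RELEASE_MAPPING = {
    "X": {"rpc": "1.40", "objects": {"Node": "1.20"}},
    "Y": {"rpc": "1.41", "objects": {"Node": "1.21"}},
}
CURRENT_RELEASE = "Y"


def effective_versions(pin_release_version=None):
    """Return the RPC and object versions to use for the given pin."""
    # An unset pin means "use the newest (current) versions".
    release = pin_release_version or CURRENT_RELEASE
    try:
        return RELEASE_MAPPING[release]
    except KeyError:
        raise ValueError("unknown release: %s" % release)
```

During a rolling upgrade from X to Y, calling `effective_versions("X")` yields the older versions, so upgraded services keep talking in terms that the not-yet-upgraded services understand.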

Change-Id: Ie2342d4051f85392a8b10d39ebffc287da57bf2b
Partial-Bug: #1526283
Co-Authored-By: Szymon Borkowski <szymon.borkowski@intel.com>
Co-Authored-By: Ruby Loo <ruby.loo@intel.com>
6 years ago
Ramamani Yeleswarapu d82fb9a9a2 Remove translation of log messages from ironic/conductor
The i18n team has decided not to translate the logs because it seems
like it's not very useful.

This patch removes translation of log messages from ironic/conductor.

Change-Id: I0fabef88f2d1bc588150f02cac0f5e975965fc29
Partial-Bug: #1674374
6 years ago
John L. Villalovos d0a2e13f10 Use flake8-import-order
Use the flake8 plugin flake8-import-order to check import ordering. It
checks this automatically, so reviewers do not need to.

Change-Id: I821fd7467f6c5cc1487149297f26e4ad539cf25d
6 years ago
Ruby Loo d8bdbda030 No node interface settings for classic drivers
Classic drivers have their interfaces (except for
network and storage) pre-determined. Unlike dynamic
drivers, it is not possible to change these interfaces
for nodes with a classic driver. If that is attempted
(when creating or updating a node), an exception
MustBeNone is raised (and HTTP status 400 is returned).

So that we don't break existing ironic clusters that
have nodes with classic drivers and interfaces specified,
a warning is logged (instead of raising an exception)
when a TaskManager is created.

Change-Id: I290b10f735d0da9710d1ee3b50c3252f73956428
Partial-Bug: #1524745
6 years ago
Jenkins 72516959f3 Merge "Validate the network interface before cleaning" 6 years ago
Jenkins 70a6301115 Merge "exception from driver_factory.default_interface()" 6 years ago
Ruby Loo f192abb9ed Validate the network interface before cleaning
In addition to validating the power interface before cleaning
is started, the conductor will also validate the network
interface.

Change-Id: Id8938c4e426243de721ce781f28fd1dbeff6b8ab
Fixes-Bug: #1662372
6 years ago
Ruby Loo 0a7fc6059a exception from driver_factory.default_interface()
This changes driver_factory.default_interface() so that instead
of returning None if there is no calculated default interface,
it raises exception.NoValidDefaultForInterface.

This is a follow up to 6206c47720.

Change-Id: I0c3d5d75b5a37af02f3660968cf3f2c669e52019
Partial-Bug: #1524745
6 years ago
Jenkins c439caf0f9 Merge "Improve enabled_*_interfaces config help and validation" 6 years ago
Jim Rollenhagen 1f25244bf6 Improve enabled_*_interfaces config help and validation
This adds additional constraints to the help messages for the
enabled_*_interfaces config options. It also checks if they are
empty at conductor startup, and if any are empty, errors out
with a better error message than previously provided.
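
A startup check of this kind might be sketched as below. The configuration is modelled as a plain dict, and `ConfigInvalid` and `check_enabled_interfaces` are hypothetical names for this example.

```python
class ConfigInvalid(Exception):
    """Hypothetical error raised for an unusable configuration."""


def check_enabled_interfaces(conf):
    """Fail early if any enabled_*_interfaces option is empty."""
    empty = sorted(
        name for name, value in conf.items()
        if name.startswith("enabled_")
        and name.endswith("_interfaces")
        and not value
    )
    if empty:
        raise ConfigInvalid(
            "The following options must not be empty: %s" % ", ".join(empty))
```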

Change-Id: I97fc318ce00291d5e43b70423930981c2f5a2de0
Partial-Bug: #1524745
6 years ago
Jenkins 2fbc101fc1 Merge "Allow duplicate execution of update node DB api method" 6 years ago