79 Commits

Author SHA1 Message Date
Julia Kreger
4fb8163717 Fix boot mode detection for partition images
Previously, partition images were hard coded to be bios based
as opposed to consulting all of the values AND the node itself
before making the most appropriate determination. Now the agent
utilises the internal helper to properly determine the boot
mode when calling ironic-lib.

Story: 2008070
Task: 41265
Change-Id: Id5eeda69d5b9de2b393af414472d57b0d4380c43
2020-12-19 19:03:16 +00:00
Julia Kreger
246e0cf29e Change default ironic_lib invocation to flag local booting
The partition image support has been telling ironic-lib
that the machine will be local booted. While this is likely
harmless, and doesn't seem to break anything, we should have
it match moving forward just to be on the safe side so we don't
accidentally break things down the road.

Change-Id: I33e5d583964ef8c21aa04d7427bcd3957b89d449
2020-12-19 19:02:58 +00:00
Julia Kreger
cb6c0059b5 Fix default disk label with partition images
Partition images through the agent have the unfortunate
side effect of being executed without full node context
by default. Luckily we've had a similar problem and
cache the node.

This patch changes the lookup from a default of msdos
partitions to use the cached node object.

Change-Id: I002816c9372fdf1cc32f3c67f420073551479fd9
2020-12-14 06:36:18 -08:00
Julia Kreger
d3c3d4dabe Update the cache if we don't have a root device hint
Or at least try to.

Some deployments just don't use root device hints, and this is okay.

However, other deployments need root device hints, and with fast
track mode in ramdisks, we created a situation where the node cache
could be updated by a human or software between the time the agent
was started, and the deployment was requested.

As a result, the agent has been updated to check if we have a hint
and if we don't, update the cache from the node lookup endpoint.

This is not needed when the inband deploy steps are executed, as
the process of updating the steps does force the node cache to be
updated.

Change-Id: I27201319f31cdc01605a3c5ae9ef4b4218e4a3f6
Story: 2008039
Task: 40701
2020-08-25 19:34:48 +00:00
Dmitry Tantsur
00ad03b709 Fixes minor issues in the read() retries patch
Follow-up to commit c5b97eb781cf9851f9abe87a1500b4da55b8bde8.

Two things slipped through the cracks:
* ImageDownloadError was instantiated incorrectly, resulting in a wrong
  error message. This was uncovered by using assertRaisesRegex in tests.
* We allowed calling write(None). This was uncovered by avoiding sleep(4)
  in tests and enabling more failed calls before timeout.

Change-Id: If5e798c5461ea3e474a153574b0db2da96f2dfa8
2020-06-30 10:51:53 +02:00
Zuul
c94fb84497 Merge "Minor clean-up follow-up to timeout on read() fix" 2020-06-25 10:23:18 +00:00
Julia Kreger
7abda4eefe Minor clean-up follow-up to timeout on read() fix
Just some minor cleanup driven from the review process.

Change-Id: I0b3d73c251d6da6d85e11279990dcc36751e27e7
2020-06-24 10:02:28 -07:00
Julia Kreger
159ab9f0ce Add full download retries
Instead of just trying to get the connection and handler
for the download, let's retry the whole action of
downloading.

Change-Id: I9217792d32e6f33c70f146a9b7d3ef58c5644d8a
2020-06-23 20:27:41 +00:00
Julia Kreger
c5b97eb781 Add timeout operations to try and prevent hang on read()
Socket read operations can be blocking and may not timeout as
expected when thinking of timeouts at the beginning of a
socket request. This can occur when streaming file contents
down to the agent and there is a hard connectivity break.

In other words, we could be in a situation like:

- read(fd, len) - Gets data
- Select returns context to the program, we do things with data.
** hard connectivity break for next 90 seconds**
-  read(fd, len) - We drain the in-memory buffer side of the socket.
-  Select returns context, we do things with our remaining data
** Server retransmits **
** Server times out due to no ack **
** Server closes socket and issues a FIN,RST packet to the client **
** Connectivity restored, Client never got FIN,RST **
** Client socket still waiting for more data **
- read(fd, len) - No data returned
- Select returns, yet we have no data to act on as the buffer is
  empty OR the buffered data doesn't meet our required read len value.
  tl;dr noop
- read(fd, len) <-- We continue to try and read until the socket is
                    recognized as dead, which could be a long time.

NOTE: The above read()s are python's read() on the contents being
      streamed. Lower level reads exist, but brains will hurt
      if we try to cover the dynamics at that level.

As such, we need to keep track of the last time we
received a packet, and use that to decide whether we have timed
out. Requests periodically yields back even when no data
has been received, in order to allow the caller to wall
clock the progress/status and take appropriate action.

When we exceed the timeout time value with our wall clock,
we will fail the download.

Change-Id: I7214fc9dbd903789c9e39ee809f05454aeb5a240
2020-06-23 13:25:09 -07:00
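
The wall-clock check described in c5b97eb781 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the agent's actual code: `iter_with_timeout`, `DownloadTimeout`, and the injectable `clock` parameter are hypothetical names, and the chunk source is assumed to yield an empty bytes object whenever it hands control back without new data.

```python
import time


class DownloadTimeout(Exception):
    """Hypothetical error raised when no data arrives for too long."""


def iter_with_timeout(chunks, timeout, clock=time.monotonic):
    """Yield non-empty chunks, failing if none arrive within `timeout` seconds.

    `chunks` is assumed to yield b'' whenever the underlying reader returns
    control without new data, which gives us a chance to check the wall clock.
    """
    last_received = clock()
    for chunk in chunks:
        now = clock()
        if chunk:
            # Real data arrived, so reset the inactivity timer.
            last_received = now
            yield chunk
        elif now - last_received > timeout:
            raise DownloadTimeout(
                'no data received for %.1f seconds' % (now - last_received))
```

Passing the clock in as a parameter is only a convenience for exercising the timeout path without sleeping.
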
Dmitry Tantsur
6d7ec350ff Make get_partition_uuids work with whole disk images
We used to populate the root UUID inside the message formatting function;
move it to the actual prepare_image/cache_image calls.

Change-Id: Ifb22220dfd49633e8623dd76f7a6a128f5874b78
2020-06-17 14:38:58 +02:00
Dmitry Tantsur
6c1545b75b New extension call to return partition UUIDs
Currently we parse the success message from the write_image call.
This is inconvenient and incompatible with the deploy steps split.

Change-Id: I258dc1ff1ad1c9df5cbc26a7825d9e7ef2f3205b
Story: #2006963
2020-06-02 15:05:59 +02:00
Dmitry Tantsur
8adb7e1a04 Add timeout and retries when connection to an image server
If the server is stuck for any reason, the download will hang for
a potentially long time. Provide a timeout (defaults to 60 seconds)
and 2 retries on failure.

Change-Id: Ie53519266edd914fdbfa82fe52b4a55151e5ec5f
2020-04-24 10:34:40 +02:00
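
The timeout-plus-retries behaviour described in 8adb7e1a04 might look something like the sketch below. The defaults (60 seconds, 2 retries) come from the commit message; everything else is an assumption for illustration: `download_with_retries` is a hypothetical name, `fetch` stands in for whatever performs the HTTP request, and failures are assumed to surface as `OSError` subclasses.

```python
def download_with_retries(fetch, timeout=60, retries=2):
    """Call `fetch(timeout=...)` until it succeeds or retries run out.

    `fetch` is a hypothetical callable standing in for the HTTP request;
    it is expected to raise OSError (e.g. a connection error) on failure.
    """
    last_error = None
    # One initial attempt plus `retries` further attempts.
    for _attempt in range(retries + 1):
        try:
            return fetch(timeout=timeout)
        except OSError as exc:
            last_error = exc
    raise last_error
```
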
Riccardo Pittau
a332a19a57 Bump hacking to 3.0.0
Change-Id: I1032ea6a2e9d79aeaecb1458c319cbeb15ac1fff
2020-03-30 12:55:46 +02:00
Julia Kreger
55b011cb1f Fix GPT partition tables after agent writes contents
Fixes errors that were being raised upon restarting the agent
directly written out software raid images as the raidset is
restarted for device consistency and partition updates later
on in the code path of deployment.

Story: 2007455
Task: 39187
Change-Id: I9abf51eb77b262932e70329af5ce1593106a3171
2020-03-29 07:45:25 -07:00
Zuul
5521fa32f6 Merge "Add NTP time sync" 2020-03-11 19:51:24 +00:00
Julia Kreger
cee4bfc4bc Add NTP time sync
Attempt to sync the clock and save it to the hardware clock.

This feature supports use of chrony or ntpdate.

Sem-Ver: feature
Change-Id: I178d7614429d582e742d9cba6d0fa3ae099775e3
Story: 1619054
Task: 11591
2020-03-07 09:16:19 -08:00
Kaifeng Wang
629a19f24b Ignore None md5 checksum field
The current check on the md5 checksum field is a bit strict now that we
have alternate hashing algorithm support from glance. This
patch ignores a None value in the md5 checksum field if it exists.
This doesn't provide any use to end users but may provide
convenience for internal logic.

Change-Id: I89d7ea8ac3464a430141e80be57b743673c3a173
2020-02-22 10:52:44 +08:00
Julia Kreger
ab00904e27 Catch ValueError for FIPS 140-2 mode
In FIPS 140-2 mode, the underlying operating system will
prevent the loading of certain algorithms for hashing and
encryption. Python hashlib returns a ValueError exception
when the type cannot be instantiated.

This change catches the error and returns a relatively
user understandable reason as to why a failure has occurred.

Change-Id: Id1a144b906303caa92ce88793fba8d1b14def738
Story: 2007306
Task: 38788
2020-02-18 10:45:23 -08:00
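
As a rough illustration of the ValueError handling described in ab00904e27: `hashlib.new()` raises `ValueError` when the requested algorithm cannot be instantiated, and the caller can translate that into a clearer message. `HashUnavailable` and `new_hash` are hypothetical names used only for this sketch, and the wording of the error is an assumption.

```python
import hashlib


class HashUnavailable(Exception):
    """Hypothetical error carrying a user-readable explanation."""


def new_hash(algo):
    """Return a hashlib object, translating ValueError into a clear error."""
    try:
        return hashlib.new(algo)
    except ValueError as exc:
        raise HashUnavailable(
            'Hash algorithm %r is unavailable; it may be unsupported or '
            'blocked by the operating system (for example in FIPS mode): %s'
            % (algo, exc))
```
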
Riccardo Pittau
ca7a46b113 Stop using six library
Since we've dropped support for Python 2.7, it's time to look at
the bright future that Python 3.x will bring and stop forcing
compatibility with older versions.
This patch removes the six library from requirements, not
looking back.

Change-Id: I4795417aa649be75ba7162a8cf30eacbb88c7b5e
2019-11-29 10:18:14 +01:00
Kaifeng Wang
6f634c358b Adds bandit template and exclude some of tests
Adds a bandit configuration template and excludes some of the
tests that we don't want to fix for the moment.

Keeping job unvoted so that we can keep an eye on possible
issues while not breaking gate.

Change-Id: I092d686ba38723d7951e8f06415f28cc809ad365
Story: 2005791
Task: 33563
2019-06-20 14:39:36 +08:00
Kaifeng Wang
a9cac52190 Relax checksum fields validation
In stein, ironic added the new os_hash_algo and os_hash_value checksum
fields provided by glance, but the checksum field is still mandatory,
which is inconvenient for standalone use case.

We could relax the checksum checking and proceed as long as there is at
least one checksum mechanism available.

Change-Id: Ia90197416f76ada0422681044a16f1c07d7049a1
Story: 2005773
Task: 33490
2019-05-28 09:38:36 +08:00
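
The relaxed validation from a9cac52190 could be sketched as below, assuming `image_info` is a plain dictionary. The field names `checksum`, `os_hash_algo`, and `os_hash_value` come from the commit messages in this history; the function name `pick_checksum` and the exact error are hypothetical. The preference for the os_hash fields over md5 follows the description in ec2bf8667d.

```python
def pick_checksum(image_info):
    """Return an (algorithm, expected_value) pair from `image_info`.

    Prefers os_hash_algo/os_hash_value and falls back to the md5
    `checksum` field; raises ValueError when neither is usable.
    """
    algo = image_info.get('os_hash_algo')
    value = image_info.get('os_hash_value')
    if algo and value:
        return algo, value
    # A missing or None md5 checksum is simply skipped.
    md5 = image_info.get('checksum')
    if md5:
        return 'md5', md5
    raise ValueError(
        'image_info needs checksum or os_hash_algo/os_hash_value')
```
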
Dmitry Tantsur
f821db3a54 Allow image checksum to be a URL
We allow image_source to be a URL, let us also support URLs for checksums.
This change copies handling of multi-file checksum files from metalsmith.

Change-Id: Ie4d7e5c79b76bdd72d50eeb384cf10519278a80c
Story: #2005061
Task: #29605
2019-02-25 14:28:09 +01:00
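
To illustrate what handling a multi-file checksum file (f821db3a54) can involve, here is a small sketch. It assumes the downloaded file is either a single bare checksum or a list of `<digest> <file name>` lines in the style written by tools such as `sha256sum`; it is not the metalsmith or agent implementation, and `find_checksum` is a hypothetical name.

```python
def find_checksum(checksum_text, image_name):
    """Pick the checksum for `image_name` out of a checksum file's text."""
    lines = [ln.strip() for ln in checksum_text.splitlines() if ln.strip()]
    # A file holding only one bare checksum applies to the image directly.
    if len(lines) == 1 and len(lines[0].split()) == 1:
        return lines[0]
    for line in lines:
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        digest, name = parts
        # sha256sum-style tools prefix the name with '*' in binary mode.
        if name.lstrip('*') == image_name:
            return digest
    raise ValueError('no checksum found for %s' % image_name)
```
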
Sam Betts
fc2dfcee60 Attempt to read the partition table after writing an image
This patch adds code that tries to read the partition table after we've
successfully written an image to make sure the image that we wrote has a
valid partition table so we can more easily guarantee that what we've
written is bootable and not just junk. Without a valid partition table
writing a config drive will fail for whole disk images.

Co-Authored-By: Dmitry Tantsur <dtantsur@redhat.com>
Change-Id: I5cfd8c433a4db3e0d2d5086250e629d16234b7a4
Story: 2001760
Task: 12159
2018-11-19 18:57:23 +01:00
Zuul
f63099ebb6 Merge "Allow streaming raw partition images" 2018-10-26 14:14:55 +00:00
Dmitry Tantsur
29136bf68d Allow streaming raw partition images
Currently we support streaming raw whole disk images, but not
partition ones. This change enables it.

Change-Id: Ie95102aa3f2054a6b429f3d3e0926e90923c5faf
Story: #2003809
Task: #26558
2018-10-17 11:16:04 +02:00
Kaifeng Wang
ec2bf8667d Enhanced checksum support
Adds enhanced checksum support to IPA. When os_hash_algo and os_hash_value
are passed in via image_info, they will be used to calculate and verify
the image checksum.

In other cases, the old md5 checksum is used.

Change-Id: I1d2f33e7059910326b4ac3f7786543b333a93a5a
Story: 2003938
Task: 26846
2018-10-15 17:15:38 +08:00
Michael Turek
b32750f5c4 Install grub to PReP partition when prep_boot_part_uuid is provided
Installs the grub bootloader to the PReP Boot partition when the
prep_boot_partition_uuid is provided. This is required when
booting a partition image locally on ppc64* systems.

This change also passes the cpu_arch along to work_on_disk so
that the PReP partition is created when partitioning disks for
local boot on ppc64* systems.

Change-Id: I70667d43af962b357e6eeccba258f4fa5a91a09e
Depends-On: I2bc9f13ec605de7b7b96d96a1a4edebee0af76dc
Story: #1749057
Task: #22999
2018-07-20 16:07:16 +00:00
Julia Kreger
3164053f08 Fix gate and bump CoreOS version to latest stable.
Increases the amount of ram for CoreOS IPA to 2GB
as the base CoreOS image is now 310MB.

Bumped CPU count for CoreOS runs to 2 CPUs as the
concurrency helps boot times for the CoreOS ramdisk.

Adds netbase, udev, and open-iscsi to debian jessie container
as they are no longer present in the default container.

Explicitly set path variable for execution in the debian
container as udevadm is in /sbin, and we may not have
/sbin on the path that is passed through to the
chroot.

Also fixed new pep8 test failures.

Story: #1600228
Task: #16287
Change-Id: I488445dfd261b7bca322a0be7b4d8ca6105750a3
2018-05-10 15:50:05 -07:00
Julia Kreger
71fda732d2 Catch OSError thrown when hexdump is missing
Change c5bf7b088f1ec776b788a81f2775e1b2577720e8 introduced
a new requirement via a pre-existing ironic-lib method being
called that utilizes hexdump. Hexdump is not always present
and since we did not explicitly call it out as a new
requirement, we should at least somewhat gracefully handle
the exception.

Change-Id: Id0223ef1417f6e419770ceb56b2a3b80c6118a85
Closes-Bug: #1732470
2017-12-11 17:11:52 -05:00
Shivanand Tendulker
c5bf7b088f Fix to return 'root_uuid' as part of command status
IPA does not return 'root_uuid' as part of command status when
provisioning of whole disk image is done using 'agent' deploy
interface from ironic. This commit fixes the issue.
Also updated Dockerfile to include package 'bsdmainutils' related
to 'hexdump' binary.

Change-Id: I89597fe4a704686fe31c064c3443fd8404a300e5
Partial-Bug: #1713916
2017-10-24 05:00:16 -04:00
vmud213
85869a134b Remove unused function _configdrive_location
This function is never used and can be removed safely.

Change-Id: Ied7b4984185ea170d33cb57010de89edeaaaeec5
Closes-Bug: #1690135
2017-05-11 12:23:38 +00:00
John L. Villalovos
e9344077fc flake8: Specify 'ironic_python_agent' as name of app
Specify 'ironic_python_agent' as the name of the application for the
flake8-import-order plugin. That way it knows that imports of
ironic_python_agent should come after external libraries.

Change-Id: Id39d558a51aeb97d96633afea28676634547d0d7
2017-03-16 07:09:07 -07:00
Galyna Zholtkevych
9c2d0cdd85 Correct failure message output when downloading
This fixes unreadable output on download image failure.
Adding new instance variable to exception `ImageDownloadError` class
to avoid redundant logs.

Change-Id: I51782abd572588adfc62745eeab9c559eb8346dd
Closes-Bug: #1657691
2017-03-10 19:16:07 +00:00
John L. Villalovos
949f4f509e Use flake8-import-order
Use the flake8 plugin flake8-import-order to check import ordering. It
can do this automatically, so reviewers do not need to check it.

Change-Id: I946457e9079ce0b54c7fe0ad554d024a1c61dce0
2017-02-16 09:46:21 -08:00
Jenkins
fd7f10b993 Merge "Configure and use SSL-related requests options" 2017-02-07 09:57:49 +00:00
Annie Lezil
20dc04e5e2 Reboot and Poweroff fails with coreos IPA image
The CoreOS IPA images do not support poweroff/reboot due to running in a
chroot. For this case, we fall back to forcing poweroff or reboot via
sysrq commands.

Change-Id: I75d68b6308beba299d043e43a5fa1671b6ef3ada
Closes-Bug: #1628367
2017-01-20 12:55:12 -08:00
Pavlo Shchelokovskyy
fdd11b54a5 Configure and use SSL-related requests options
This patch adds standard SSL options to IPA config and makes use of them
when making HTTP requests.

For now, a single set of certificates is used when needed.
In the future configuration can be expanded to allow per-service
certificates.

Besides, the 'insecure' option (defaults to False) can be overridden
through kernel command line parameter 'ipa-insecure'.
This will allow running IPA in CI-like environments with self-signed SSL
certificates.

Change-Id: I259d9b3caa9ba1dc3d7382f375b8e086a5348d80
Closes-Bug: #1642515
2017-01-13 11:33:44 +02:00
Bharath kumar
9948349b10 Moving Reboot bashscript to python
Currently a reboot bash script file is used to call reboot and
poweroff operation. Deleting this file and moving the code to
python file using utils.execute()

Partial-Bug: #1557542

Change-Id: Iad9cd9d15417e9a954d108d2759e6303452fca27
Author: Bharath kumar <shettybharath4@gmail.com>
Co-Authored-By: Annie Lezil <annie.lezil@gmail.com>
2016-12-15 00:05:03 +00:00
Shivanand Tendulker
7471d4004e Remove duplicated logging in configdrive creation
Ironic-lib logs a message when configdrive is created
successfully. Remove duplicate message from IPA.

Change-Id: I2af81cdfda4cfc004288f44d14a5c127639cc1f1
2016-10-26 02:47:53 -07:00
Shivanand Tendulker
3665306dfb Use ironic-lib to create configdrive
Shell script to create config drive being replaced with python
code in ironic-lib.

Closes-Bug: #1493328

Change-Id: I31108f1173db3fb585386b2949ec880a95305fb6
2016-10-21 03:39:06 +00:00
John L. Villalovos
20d960ff98 Remove Python 2.6 format style
In Python 2.6 it was required to use {0}, {1}...{n} when using the
string format function. In Python 2.7 and Python 3 it is not required.

Change {N} to {} in code.

This brings the code in style alignment with other projects like
ironic and ironic-lib.

Change-Id: I81c4bb67b0974f73905f14b589b3dd0a7131650d
Depends-On: I8f0e5405f3e2d6e35418c73f610ac6b779dd75e5
2016-10-06 09:05:26 -07:00
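
The style change in 20d960ff98 amounts to the following. The variable names and message text are made up for illustration.

```python
name, size = 'disk.img', 42

# Python 2.6 required explicit positional indexes.
old_style = 'Image {0} is {1} MiB'.format(name, size)

# Python 2.7 and Python 3 number the fields automatically.
new_style = 'Image {} is {} MiB'.format(name, size)
```

Both calls produce the same string, so the change is purely stylistic.
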
Galyna Zholtkevych
993149cfb4 Improve error message while download image
Collecting warning logs in the case of download failure
and write them to error logs in the end. This will help
a user to diagnose a problem when warning log was not
enabled.

Change-Id: I4198d7be08fc11b616b3f95c595ff53794436e24
Partial-Bug: 1512186
2016-09-21 10:02:22 -04:00
Josh Gachnang
fd874652e3 Add metrics support to IPA
This utilizes the new metrics support in ironic-lib to allow the agent to
report timing metrics for agent API methods as configured in ironic-lib.

Additionally, this adds developer docs on how to use metrics in IPA,
including some caveats specific to ironic-lib.metrics use in IPA.

Co-Authored-By: Jay Faulkner <jay@jvf.cc>
Co-Authored-By: Alex Weeks <alex.weeks@gmail.com>
Change-Id: Ic08d4ff78b6fb614b474b956a32eac352a14262a
Partial-bug: #1526219
2016-08-03 11:24:54 -07:00
Clif Houck
3cf5369cb6 Add docstrings to all functions in Agent standby extension
Change-Id: Ic8101a6b29dee4b79c2d7f3dc064e4c98a9a0741
Partial-Bug: 1367915
2016-03-31 11:44:50 -05:00
Nisha Agarwal
4ec49be8e2 Add disk_label support for partition images
This commit adds the disk_label support for partition
images. It also fixes the node_uuid info passed to the
ironic_lib.

Partial-Bug: 1560560
Change-Id: I8b8ef20787468c1b8dc6fbc0b8905abd285325e1
2016-03-22 16:59:38 +00:00
Nisha Agarwal
936b2e4c4a Fixes the agent message for uefi netboot for partition image
The agent returns "efi_system_partition_uuid=None" in the
status message for uefi netboot for partition images.
This commit fixes to remove this unwanted message
from the status message as efi partition is created only
for localboot.

Closes-bug: 1526289

Change-Id: I6376406cdde29493619f50b0a6cd8b6ce3784d6e
2016-03-21 18:11:32 +00:00
Jenkins
4f1caf11e9 Merge "Add sync() command to the standby module" 2016-03-21 09:51:44 +00:00
Lucas Alvares Gomes
4b802c47b5 Add sync() command to the standby module
This patch is adding a new command called sync to the standby module of
IPA. The new command runs synchronously and it's responsible for
flushing file system buffers to the disks.

The initial intention for this command is to use it as part of the fix
for the bug #1512492 where some hardware/firmware has problems
coming back online after a soft ACPI power off, therefore we need to call
sync() to make sure all file system buffers have been synced and then
issue a hard power off (e.g via the BMC).

Partial-Bug: #1512492
Change-Id: I5cd1d1b821426e995dc584452494b93ab23917e0
2016-03-18 15:20:48 +00:00
Faizan Barmawer
944595a69d Add support for partition images in agent driver.
It also adds the ironic-lib in the requirements
list of the IPA package.

Partial-bug: 1526289
Depends-On: I22bc29a39bf5c35f3eecb6d4e51cebd6aee0ce19
Change-Id: I37908470484744bb720f741d378106d1cb1227a3
2016-03-18 08:21:01 +00:00
Lucas Alvares Gomes
e320bb8942 Add support for streaming raw images directly onto the disk
This patch adds support for streaming a raw image directly onto the disk,
that means no more time spent writing the image to a tmpfs partition prior
to copying it to the disk. Checksum computation is also done as the image
is being streamed. Streaming raw images is disabled by default, however
this behavior can be enabled by passing a key called "stream_raw_images"
with the value of True to the prepare_image() command of IPA.

For non-raw images this may not be possible, not sure about all image
file formats, but common types such as qcow2 require random access to
the image file in order to be converted to raw.

Closes-Bug: #1505685
Change-Id: Iddf67907bc9b54bbd3065a97064cb5a3602cfe18
2015-11-18 11:19:40 +00:00
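
The idea in e320bb8942 of computing the checksum while the image is streamed can be sketched as below. This is only an illustration: `stream_to_device` and its parameters are hypothetical names, in-memory buffers stand in for the HTTP response and the block device, and the `algo` parameter is an assumption added so the sketch is not tied to md5.

```python
import hashlib
import io


def stream_to_device(source, device, chunk_size=1024 * 1024, algo='md5'):
    """Copy `source` to `device` chunk by chunk, hashing as it goes.

    Both arguments are binary file-like objects. Returns the hex digest of
    everything written, so no second pass over the data is needed.
    """
    digest = hashlib.new(algo)
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        device.write(chunk)
    return digest.hexdigest()


# Usage with in-memory stand-ins for the download and the disk.
source = io.BytesIO(b'raw image bytes')
device = io.BytesIO()
checksum = stream_to_device(source, device, chunk_size=4, algo='sha256')
```

Because each chunk is hashed before it is written, the image never has to be staged in tmpfs just to verify it.
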