Commit Graph

5069 Commits (master)

Author SHA1 Message Date
Zuul f66c05be6d Merge "Update OAM firewall" 2023-12-07 21:54:00 +00:00
Zuul af271844a3 Merge "Prevent mgmt gateway overwriting default route" 2023-12-07 18:41:12 +00:00
Zuul df779828e2 Merge "Set extra rules for the platform firewalls" 2023-12-07 18:31:39 +00:00
Teresa Ho db2c3b7c52 Prevent mgmt gateway overwriting default route
The default route is normally set up using the OAM IP gateway.
For a standalone AIO-SX, if the management gateway is specified in
the address pool creation during management reconfiguration, a
second gateway is set up for the management interface. The ifupdown
package ensures that there is only one gateway based on the configured
gateway(s). For this reason, the default route using the OAM IP gateway
may get overwritten by the management gateway.

For a DC subcloud, the management gateway is used to configure the
system controller gateway IP for communication between the subcloud and
system controller. Therefore, the management gateway must be configured
for a subcloud.

The fix is to add a semantic check to prevent the addition of the
management gateway for a standalone AIO-SX.
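A minimal sketch of this kind of semantic check, assuming a standalone AIO-SX is identified by `system_mode == "simplex"` plus a separate subcloud flag. The names `check_mgmt_gateway` and `SemanticError` are hypothetical and this is not the actual sysinv code.

```python
class SemanticError(Exception):
    """Raised when a requested configuration is rejected (hypothetical)."""


def check_mgmt_gateway(gateway_address, system_mode, is_subcloud):
    """Reject a management gateway on a standalone AIO-SX (illustrative).

    A DC subcloud still needs the management gateway, because it is used
    to reach the system controller, so subclouds are allowed through.
    """
    if gateway_address is None:
        return  # no gateway requested, nothing to check
    if system_mode == "simplex" and not is_subcloud:
        raise SemanticError(
            "Management gateway is not supported on a standalone AIO-SX")
```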

Test Plan:
PASS: AIO-SX management reconfiguration
PASS: AIO-SX subcloud management reconfiguration

Story: 2010722
Task: 49218

Signed-off-by: Teresa Ho <teresa.ho@windriver.com>
Change-Id: Ib50f3942fc7a81acf4db1dfa7fbf86226e5e245a
2023-12-07 09:08:07 -05:00
Andre Kantek 6c1209d4d2 Update OAM firewall
In order to use the OAM network value in the firewall, we match on
the destination instead of the source (which is what the other
platform networks use), since OAM is a special case where outside
access is possible but needs to be limited to this network.
For ICMPv6 the link-local (unicast and multicast) networks
are added.
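A rough sketch of the rule shape described above, assuming a simple dict representation of a firewall rule. `build_oam_rule`, `ICMPV6_LINK_LOCAL` and the dict keys are hypothetical; the real rules are generated into hieradata and applied by puppet.

```python
import ipaddress

# Link-local unicast (fe80::/10) and link-local-scope multicast (ff02::/16).
ICMPV6_LINK_LOCAL = ["fe80::/10", "ff02::/16"]


def build_oam_rule(oam_subnet, protocol):
    """Return an illustrative rule dict for traffic terminating on OAM."""
    net = ipaddress.ip_network(oam_subnet)
    # OAM accepts traffic from outside, so the rule is bounded by the
    # destination (the local OAM network) instead of the source address.
    destinations = [str(net)]
    if protocol == "icmpv6":
        # ICMPv6 neighbor discovery relies on link-local addresses.
        destinations += ICMPV6_LINK_LOCAL
    return {"protocol": protocol, "destination": destinations}
```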

Test Plan:
[PASS] Install an AIO-SX in IPv4 and check OAM traffic termination
[PASS] Install an AIO-DX in IPv4 and check OAM traffic termination
[PASS] Install an AIO-SX in IPv6 and check OAM traffic termination
[PASS] Install an AIO-DX in IPv6 and check OAM traffic termination

Story: 2010591
Task: 49214

Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
Change-Id: Icededa544de12545d1cb8644b47ce941d89d5f56
2023-12-06 16:27:47 -03:00
Andre Kantek e95dac567b Set extra rules for the platform firewalls
There are some conditions where a specialized rule needs to be
added to the firewall. This change adds this information, with the
necessary data, to the hieradata file.

Test Plan
[PASS] Install an AIO-SX IPv4 node and make sure the rules are
      exercised for the corresponding incoming traffic
[PASS] Install an AIO-DX IPv4 cluster and make sure the rules are
      exercised for the corresponding incoming traffic

Story: 2010591
Task: 49215

Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/902805

Change-Id: I6032983ede18d37639851ac2adfc3be0b6789bb5
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
2023-12-06 15:04:25 -03:00
Heitor Matsui dade970c8a Fix missing runtime apply parameter
The parameter config_out_of_date_timeout was removed by
commit [1], however a reference to it remained in the code,
introduced by commit [2].

This commit fixes the reference, now declared as a constant
on sysinv/common/constants.py

[1] https://review.opendev.org/c/starlingx/config/+/894544
[2] https://review.opendev.org/c/starlingx/config/+/896164

Test Plan
PASS: force a scenario where a number of runtime configurations
      are enqueued/deferred, verify no errors in sysinv.log and
      that the message "_ready_to_apply_runtime_config: wait %s secs"
      is shown on the log, indicating that the previous point of
      failure is not failing anymore.

Closes-bug: 2045704

Change-Id: I96d6cbe5087936790a0854a51a464798a87786ef
Signed-off-by: Heitor Matsui <heitorvieira.matsui@windriver.com>
2023-12-05 19:47:35 -03:00
Zuul aff99f4245 Merge "Skip sysinv load update for a USM upgrade" 2023-12-04 21:56:17 +00:00
Bin Qian bad475584f Skip sysinv load update for a USM upgrade
In a USM upgrade, the sysinv database no longer needs to be updated with
the upgrade state. This change adds an option to skip the update of the
sysinv database. The database update will continue to be supported until
the USM upgrade cutoff.
Subsequent task 48981 has been created to remove the code that updates
the sysinv database after the cutoff.

TCs:
     passed: USM upgrade data migration
     passed: legacy upgrade data migration

Story: 2009303
Task: 48980

Change-Id: I3039418f6565adb0199a9ea6408765ddc5db30ce
Signed-off-by: Bin Qian <bin.qian@windriver.com>
2023-12-04 20:07:34 +00:00
Zuul 33cbbaf51f Merge "Query stale runtime config and reapply" 2023-12-01 21:27:59 +00:00
Zuul a57547518a Merge "Improve PTP phc2sys cmdline_opts handling" 2023-12-01 17:20:22 +00:00
Zuul 01e8b85cf9 Merge "Fix FEC pcidp resource generation with expected data" 2023-12-01 14:48:15 +00:00
Zuul 19f467d159 Merge "Resolve kubeadm alpha Check in kube-cert-rotation Script" 2023-11-30 14:26:00 +00:00
Lucas Borges 1e10f4a086 Resolve kubeadm alpha Check in kube-cert-rotation Script
The 'kube-cert-rotation' script evaluates the status
of the 'kubeadm' command to determine if it is in
alpha mode. This modification addresses an issue with
the check, ensuring it conforms to the correct usage.
Since Kubernetes v1.21, 'kubeadm alpha' has been
deprecated.

TEST PLAN:

PASS: Deploy the DC (with dc-libvirt) and verify
      that alarm 250.003 is not triggered.

Closes-bug: 2045271
Change-Id: I05d7aabbb3ac35b59b78e1c4f0cb070ec671b6a7
Signed-off-by: Lucas Borges <lucas.borges@windriver.com>
2023-11-30 13:13:43 +00:00
Andre Mauricio Zelak 4ce5e25665 Improve PTP phc2sys cmdline_opts handling
The handling of the cmdline_opts configuration field was improved to remove
leading and trailing quote characters. As cmdline_opts is a freeform string
field, this enhancement can avoid errors when starting the phc2sys service.
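A minimal sketch of the quote handling, covering the two test-plan cases (a string wrapped in `'` and in `\'`). The function name `normalize_cmdline_opts` is hypothetical; this sketch only strips a quote that both opens and closes the string, which is one reasonable reading of "leading and trailing quotes".

```python
def normalize_cmdline_opts(value):
    """Remove enclosing quotes (plain or backslash-escaped) from an option string."""
    text = value.strip()
    quotes = ("\\'", '\\"', "'", '"')
    stripped = True
    while stripped:
        stripped = False
        for quote in quotes:
            n = len(quote)
            # Only strip when the same quote both opens and closes the string,
            # so a trailing quoted argument such as -O "x" is left intact.
            if len(text) >= 2 * n and text.startswith(quote) and text.endswith(quote):
                text = text[n:-n].strip()
                stripped = True
                break
    return text
```

For example, `normalize_cmdline_opts("'-s ens1f0 -O -37'")` returns `-s ens1f0 -O -37`.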

Test plan:
PASS - Encapsulate cmdline with '.
PASS - Encapsulate cmdline with \'.

Closes-Bug: 2045036

Change-Id: I156673120c28f625a0200b46e74aeacd7f1b1955
Signed-off-by: Andre Mauricio Zelak <andre.zelak@windriver.com>
2023-11-29 13:48:03 -03:00
Steven Webster 305dc493af Fix FEC pcidp resource generation with expected data
An issue was seen after restoring a system making use of the
ACC100 FEC device.

After restore, the values for the sriov_numvs and sriov_vf_driver
in the database were 0/None.

I am unsure whether this is an error in the restore process or
the result of some subsequent database corruption. The issue was
seen on only one system out of many.

In any case, this seems to have been handled in the generation of
the actual ACC100 device config in the past.  That is, we store
the 'expected' value of the sriov_numvfs and sriov_vf_driver in
the extra_info field of the pci_device table.  These values are
preferred over the actual values in the DB.

The issue here is that, when generating the SR-IOV device plugin
resource data for puppet, the 'actual' values are used rather
than the 'expected' values. Under the current logic, this causes
the hieradata generation to skip the device, as its
sriov_vf_driver is NULL.

This commit makes the generation of the SR-IOV device plugin
resource data consistent with the method used for the actual
configuration data of the device, based on a preference for the
'expected' data in the extra_info field of the device.
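A minimal sketch of the "prefer expected over actual" lookup. The `expected_vf_driver` key is named in the test plan below; `expected_numvfs`, the dict shape of the device and the function name `effective_sriov_config` are assumptions for illustration only.

```python
import json


def effective_sriov_config(device):
    """Return (numvfs, vf_driver), preferring the 'expected' values.

    `device` is a dict mirroring a pci_device row; `extra_info` may be a
    JSON string, a dict, or empty (all assumptions for this sketch).
    """
    extra = device.get("extra_info") or {}
    if isinstance(extra, str):
        try:
            extra = json.loads(extra)
        except ValueError:
            extra = {}
    if not isinstance(extra, dict):
        extra = {}
    # Fall back to the actual DB values only when no expected value exists.
    numvfs = extra.get("expected_numvfs", device.get("sriov_numvfs"))
    vf_driver = extra.get("expected_vf_driver", device.get("sriov_vf_driver"))
    return numvfs, vf_driver
```

With `sriov_numvfs=0` and `sriov_vf_driver=None` in the row, the device is no longer skipped as long as `extra_info` carries the expected values.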

Test Plan:

Force the issue seen in the field by setting the sriov_numvfs=0
and sriov_vf_driver=None in the database.

  - Lock/unlock host and ensure that the hieradata is based on
    the expected_vf_driver.
  - The unit test cases should cover all cases of the modified
    function

Closes-Bug: #2045149

Change-Id: Ic7beb4e6a6fd69901db3a012649461fc445380ee
Signed-off-by: Steven Webster <steven.webster@windriver.com>
2023-11-29 10:00:23 -05:00
Zuul 10d74aecf7 Merge "Migrate script to update static hieradata" 2023-11-28 21:50:34 +00:00
Zuul 1ab7375abe Merge "disable image gc when doing k8s upgrade" 2023-11-24 19:58:02 +00:00
Chris Friesen c645ce21d6 disable image gc when doing k8s upgrade
Static pods cannot use image pull secrets, so it's important that the
control plane images are not garbage-collected while we're doing a
Kubernetes upgrade; otherwise the upgrade can fail.

Accordingly we want to disable garbage-collecting the images, then
pre-pull the new images, then do the actual K8s upgrade, then re-enable
image garbage collection.

For duplex systems we can disable garbage collection from the puppet
manifest, but for simplex puppet isn't involved so we have to do it
from sysinv.

The re-enabling of the image garbage collection happens when we
upgrade kubelet to the final desired version.  It's done in the
puppet commit linked below.

Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/901778

TEST-PLAN:

PASS: Perform single-version K8s upgrade on AIO-SX, ensure upgrade
      passes and image garbage collection is disabled when we
      download images and re-enabled when kubelet gets upgraded.

Closes-Bug: 2044493

Change-Id: Ide258768c3b05a01c4e903e52380a348c2fcae65
Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
2023-11-24 13:08:36 -06:00
Zuul a87ed49751 Merge "Add instrumentation for kube_upgrade_control_plane" 2023-11-24 18:43:34 +00:00
Yuxing Jiang bc40879eca Improve kube-rootca-get-id API and error handling
This commit corrects an error in the API reference introduced in:
Ie78121d0c21d2c6033c8b5d4919e251fc4d98050.

This commit also improves the error handling to return an understandable
error message, and avoids printing an exception if the cert is missing
from the file system.

Reduces the info logs from utils to prevent the dc audit from dumping too
many logs into sysinv.log.

Test plan:
Passed - deploy an AIOSX, check the cert id by:
         system kube-rootca-get-cert-id.
Passed - manually remove the kube-rootca cert and key from the system,
         check the output of "system kube-rootca-get-cert-id", verified
         the error message w/o exceptions.
Passed - verify the dc audit doesn't dump logs about the cert id in
         sysinv.log.

Story: 2010852
Task: 49091

Signed-off-by: Yuxing Jiang <Yuxing.Jiang@windriver.com>
Change-Id: I47f1a9ca617bf0daf9c25e7b4552e52d3e9d1811
2023-11-24 09:16:48 -05:00
Jim Gauld 80cd44c22a Add instrumentation for kube_upgrade_control_plane
This adds instrumentation for sysinv kube_upgrade_control_plane
so that we see progress and more error reasons when retrieving
kubernetes control-plane versions. This adds a few more places
to generate exceptions so that a retry is performed.

This enforces that we must be able to get the versions from each
control-plane component (kube-apiserver, kube-controller-manager,
kube-scheduler) by querying pods that match expected pod name and
container image.
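A minimal sketch of the per-component check, assuming kubeadm-style static pod names (`<component>-<hostname>`) and pods given as plain dicts with an `images` list. `get_control_plane_versions` and `ControlPlaneVersionError` are hypothetical names; in the real code, raising is what lets the caller retry.

```python
CONTROL_PLANE_COMPONENTS = (
    "kube-apiserver",
    "kube-controller-manager",
    "kube-scheduler",
)


class ControlPlaneVersionError(Exception):
    """Raised when a component's version cannot be determined (caller retries)."""


def _image_version(image, component):
    """Return the tag of `image` if it is the image for `component`, else None."""
    name_and_tag = image.rsplit("/", 1)[-1]  # e.g. "kube-apiserver:v1.28.4"
    name, sep, tag = name_and_tag.partition(":")
    return tag if sep and name == component else None


def get_control_plane_versions(pods, hostname):
    """Map each control-plane component on `hostname` to its image tag."""
    versions = {}
    for component in CONTROL_PLANE_COMPONENTS:
        expected_name = "%s-%s" % (component, hostname)
        version = None
        for pod in pods:
            if pod["name"] != expected_name:
                continue  # pod name must match exactly
            for image in pod["images"]:
                version = _image_version(image, component)
                if version:
                    break
            if version:
                break
        if not version:
            raise ControlPlaneVersionError(
                "no %s pod with the expected image on %s" % (component, hostname))
        versions[component] = version
    return versions
```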

TEST CASES:
PASS: Run orchestrated kubernetes upgrade: AIO-SX, AIO-DX, STANDARD.
      Verify we see new logs during upgrade control plane.
PASS: Manually modify code to test that likely exception paths cause a retry.

Closes-bug: #2044413

Change-Id: Ic33cdbdf390804c7a0791609a350dd1df6e697e4
Signed-off-by: Jim Gauld <James.Gauld@windriver.com>
2023-11-24 00:34:44 -05:00
Zuul e72aca646a Merge "Upgrade sts-silicom app" 2023-11-23 20:32:23 +00:00
Caio Bruchert 64e1b35522 Upgrade sts-silicom app
Images:
  Tsyncd: quay.io/silicom/tsyncd:2.1.3.6
  Phc2Sys: quay.io/silicom/phc2sys:3.1-00193-g6bac465
  GrpcTsyncd: quay.io/silicom/grpc-tsyncd:2.1.2.18
  Gpsd: quay.io/silicom/gpsd:3.23.1

Test Plan:
PASS: Apply sts-silicom app and check that it gets locked to
      clock class 6
PASS: Delete sts-silicom pod several times and check that it
      always gets locked to clock class 6
PASS: Lock/unlock the controller several times and check that
      the STS app always gets locked to clock class 6

Closes-Bug: 2044178
Depends-On: https://review.opendev.org/c/starlingx/app-sts-silicom/+/901620

Signed-off-by: Caio Bruchert <caio.bruchert@windriver.com>
Change-Id: I7699c41e54fc2247295b6bb526f506c64c81a389
2023-11-21 17:00:12 -03:00
Gabriel de Araújo Cabral a46a4adbd0 Add flag to update ceph-mon ip due to MGMT reconfiguration
During a management network reconfiguration on AIO-SX with Ceph
storage backend, this flag will indicate the need to update the
ceph-monitor to the new network.

The flag is used in the mandatory controller-0 unlock during the
mgmt reconfiguration by the "platform::ceph::monitor" puppet class,
which will proceed with the ip update.

Test Plan:
 PASS: Reconfigure mgmt network on AIO-SX ipv4 with Ceph backend
 PASS: Reconfigure mgmt network on AIO-SX ipv6 with Ceph backend
 PASS: After controller-0 is unlocked, check if Ceph is healthy with
       'ceph -s'
 PASS: Reconfigure mgmt network on AIO-SX ipv4 without Ceph backend

Story: 2010722
Task: 48973

Depends-On: https://review.opendev.org/c/starlingx/config/+/889724

Change-Id: Id3e4cee3b4cf51fa48c160d4ddbed0a3b55cb97a
Signed-off-by: Gabriel de Araújo Cabral <gabriel.cabral@windriver.com>
2023-11-21 18:01:09 +00:00
Zuul 47516972f4 Merge "MGMT address_pool reconfiguration for AIO-SX" 2023-11-21 17:03:03 +00:00
Fabiano Correa Mercer 8503921d55 MGMT address_pool reconfiguration for AIO-SX
This change allows the reconfiguration of the management
address_pool for an AIO-SX installation.
The reconfiguration can be done even when the system was
already configured and unlocked, but the controller must be
locked in order to reconfigure the management network.
Since there are ansible rules using the name "management"
for the address_pool, it is necessary to enforce the use of
the address_pool named "management" in order to create the
mgmt network.
During a management network reconfiguration, the DNSMASQ
changes must not be applied at runtime; a host lock/unlock is
mandatory for all changes to take effect.

Test plan
PASS: AIO-SX IPv4 delete and create another management
      address-pool and create the management network with it
PASS: AIO-SX IPv6 delete and create another management
      address-pool and create the management network with it
PASS: AIO-SX IPv4 fresh install
PASS: AIO-SX IPv6 fresh install
PASS: AIO-DX IPv4 fresh install
PASS: AIO-DX IPv6 fresh install
PASS: STANDARD IPv4 fresh install
PASS: DC with AIO-SX IPv4 fresh install

Story: 2010722
Task: 48469
Change-Id: I2de156162dc83d2c16437d2d8068054de19a3b20
Signed-off-by: Fabiano Correa Mercer <fabiano.correamercer@windriver.com>
Signed-off-by: Teresa Ho <teresa.ho@windriver.com>
2023-11-20 14:59:08 -03:00
Zuul 0825055c7a Merge "Add pod health status to kube rootca check" 2023-11-17 21:11:52 +00:00
Victor Romano f6247569ce Add pod health status to kube rootca check
As part of the kube rootca certificate update, it's recommended to
have all pods in a ready state to avoid problems during the update. This
commit adds an optional flag to the 'system health-query-kube-upgrade'
command that checks pod health and returns a list of pods that are
neither ready nor completed.

Usage: system health-query-kube-upgrade --rootca
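A minimal sketch of the "neither ready nor completed" filter, assuming pods are plain dicts with a Kubernetes `phase` and an aggregated `ready` flag. The function name is hypothetical, and skipping Pending pods is an assumption drawn from test cases 4 and 5, not from the code.

```python
def find_unhealthy_pods(pods):
    """Return "namespace/name" for pods that are neither ready nor completed."""
    unhealthy = []
    for pod in pods:
        if pod["phase"] in ("Succeeded", "Pending"):
            # Completed pods are healthy; pending pods are not reported
            # (assumption based on test cases 4 and 5).
            continue
        # Evicted/Error pods end up Failed; CrashLoopBackOff pods are
        # typically Running but not ready.
        if pod["phase"] == "Failed" or not pod["ready"]:
            unhealthy.append("%s/%s" % (pod["namespace"], pod["name"]))
    return unhealthy
```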

Test Cases:
1) PASS: Run the following commands and verify their output remains
         unchanged.
         - system health-query
         - system health-query-upgrade
         - system health-query-kube-upgrade (without --rootca)
2) PASS: Run "system health-query-kube-upgrade --rootca" without any
         pod in failure state and verify that the correct success
         message was included in the command output.
3) PASS: Repeat test 2 but adding pods in unhealthy state (Error,
         Evicted and CrashLoopBackOff) and verify that the output
         contains the correct error message and a list of the
         unhealthy pods.
4) PASS: Repeat test 2 but adding pods with completed and pending
         status and verify the completed pod was not added to the
         failed pod list and the correct success message was shown.
5) PASS: Repeat test 3 but adding pods with completed and pending
         status and verify these pods weren't added to the failed pod
         list and the correct failure message was shown.
6) PASS: Run 'system kube-rootca-update-start' with pods in unhealthy
         state and verify the update did not start and the correct
         error message was displayed.
7) PASS: Run 'system kube-rootca-update-start' with all pods in
         healthy state and verify the update process started
         successfully.
8) PASS: Create and apply a new sw-manager kube-rootca-update-strategy
         with pods in unhealthy state and verify the apply was aborted
         and the correct error message was displayed.
9) PASS: Create and apply a new sw-manager kube-rootca-update-strategy
         with all pods in healthy state and verify the update was
         applied successfully.

Story: 2010852
Task: 49085

Change-Id: I463ecc8a1107375e4e0997e07581b10ec8d129e2
Signed-off-by: Victor Romano <victor.gluzromano@windriver.com>
2023-11-17 17:19:42 -03:00
Robert Church efc6f54430 Add docker filesystem class to docker runtime execution
To ensure that the docker-lv filesystem is present and mounted before
starting docker, docker.pp was updated to include additional
dependencies on the mount of the filesystem. This was done to avoid a
race condition where docker was started prior to creation of the
filesystem and would generate key files/directories, only to have the
filesystem mounted later, causing those files to "disappear" from the
daemon. The end result was docker pulls failing due to missing files.

This change now includes 'platform::filesystem::docker' to provide
the Mount['docker-lv'] target now needed by the docker configuration
classes that are executed at unlock and at runtime.

Test Plan:

PASS - creation and application of service parameter:
       docker/proxy/http_proxy, docker/proxy/https_proxy,
       docker/proxy/no_proxy. Observed successful runtime manifest
       execution and creation of http-proxy.conf files for containerd and
       docker.

Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/900806
Change-Id: Ia742b17af7492f45b2fbe742a4887a0d2047747a
Partial-Bug: #2043399
Signed-off-by: Robert Church <robert.church@windriver.com>
2023-11-17 01:55:38 -06:00
Zuul 8c82e07cb9 Merge "Block host-unlock till apparmor manifest completes" 2023-11-16 14:32:04 +00:00
Jagatguru Prasad Mishra 0fb91eb62a Block host-unlock till apparmor manifest completes
If the following commands are issued in quick succession,
1. system host-update controller-0 apparmor=enabled
2. system host-unlock controller-0

The puppet runtime manifest, which is executed asynchronously,
will not have enough time to run, and the apparmor module won't get
loaded after unlock.

This change adds reporting of the apparmor runtime manifest
status. The 'in progress' status is persisted in the i_host
table and used to validate host-unlock.
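A minimal sketch of the status tracking and the unlock guard, using a plain dict in place of the i_host row. The field name `apparmor_config_status`, its values other than 'in progress', and all function names are hypothetical; the error text follows the test plan below.

```python
APPARMOR_IN_PROGRESS = "in progress"


class HostUnlockRejected(Exception):
    """Raised when host-unlock must be refused (hypothetical)."""


def on_apparmor_manifest_started(host):
    # Persisted on the host record (the i_host table in the real system).
    host["apparmor_config_status"] = APPARMOR_IN_PROGRESS


def on_apparmor_manifest_reported(host, success):
    # The runtime manifest report clears the in-progress state.
    host["apparmor_config_status"] = "completed" if success else "failed"


def check_unlock_allowed(host):
    if host.get("apparmor_config_status") == APPARMOR_IN_PROGRESS:
        raise HostUnlockRejected(
            "Can not unlock %s apparmor configuration in progress."
            % host["hostname"])
```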

Closes-Bug: 2042926

Test plan:
PASS: AIO-DX: Issue host-unlock command soon after
      'system host-update <host> apparmor=enabled' command.
      Verify that host-unlock fails with message 'Can not unlock
      <hostname> apparmor configuration in progress.'
PASS: AIO-DX: Enable/disable the apparmor module on a host using
      host-update command and verify if it is enabled/disabled
      respectively after reboot
PASS: AIO-SX: Enable/disable the apparmor module on a host using
      host-update command and verify if it is enabled/disabled
      respectively after reboot

Change-Id: I8f13ad4316e4edd4a6c73648ee4b06eb379ebe76
Signed-off-by: Jagatguru Prasad Mishra <jagatguruprasad.mishra@windriver.com>
2023-11-16 02:49:42 -05:00
Zuul ea07f96992 Merge "Fix load-import required patch check" 2023-11-15 23:36:07 +00:00
Zuul 1e272a97d7 Merge "API to get kube rootCA ID" 2023-11-15 20:38:37 +00:00
Bin Qian e781961d33 Fix load-import required patch check
The required_patch xml tag was mistakenly changed to required_patches in
previous commit [1]. This change is to make sure the correct tag is used.

[1] https://review.opendev.org/c/starlingx/config/+/888329
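A minimal sketch showing why the exact tag name matters when reading the load metadata, assuming a hypothetical layout where each required patch ID is the text of its own `<required_patch>` element. The function name and layout are illustrative only.

```python
import xml.etree.ElementTree as ET


def get_required_patches(metadata_xml):
    """Return the patch IDs listed under <required_patch> elements."""
    root = ET.fromstring(metadata_xml)
    # A lookup for the mistaken "required_patches" tag would match no elements.
    return [elem.text.strip() for elem in root.iter("required_patch")
            if elem.text and elem.text.strip()]
```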

Test plan:
  PASS: load-import with and without --inactive, both with and without
        required patches applied on from-release.

Closes-bug: 2043510

Change-Id: I69504396ffda4565ebcce9b759c88f98d2631564
Signed-off-by: Bin Qian <bin.qian@windriver.com>
2023-11-15 16:15:50 +00:00
Yuxing Jiang 8ace6db94c API to get kube rootCA ID
The DC Kubernetes root CA audit was based on the cert expiration alarm.
We would like to switch to cert comparison to verify the Kubernetes
root CA certs. For this purpose, this commit creates a new sub-API to
get the Kubernetes root CA ID from the local system.

Test plan:
1. Passed - Create ISO and deploy AIOSX and distributed cloud.
2. Passed - Run "system --debug kube-rootca-get-cert-id", received the
expected response, and the cert ID was printed.

Story: 2010852
Task: 49091

Signed-off-by: Yuxing Jiang <Yuxing.Jiang@windriver.com>
Change-Id: Ie78121d0c21d2c6033c8b5d4919e251fc4d98050
2023-11-15 16:12:08 +00:00
Zuul 05b24067a3 Merge "Set default error message for forbidden" 2023-11-15 02:00:57 +00:00
Takamasa Takenaka 56a6555315 Set default error message for forbidden
The issue is that some error dialogs show
"the JSON object must be str, bytes or
bytearray, not NoneType" when a non-admin user
tries to modify the configuration.

Functionally, rejecting the configuration
operation as forbidden is the expected
behavior, but this message is not appropriate
for users.

This happens because the error message
extracted from the API response can be empty.
The fix checks for an empty message and shows
a default message in that case.
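The quoted text is what Python's `json.loads(None)` raises, which suggests the empty body was being parsed directly. A minimal sketch of the guard follows; the default message text, the `error_message` key and the function name are hypothetical.

```python
import json

# Hypothetical default text; the real message is not given in the commit.
DEFAULT_FORBIDDEN_MESSAGE = "You are not allowed to perform this operation."


def extract_error_message(body, default=DEFAULT_FORBIDDEN_MESSAGE):
    """Return a user-facing message from an API error body."""
    # json.loads(None) raises "the JSON object must be str, bytes or
    # bytearray, not NoneType", so handle an empty body before parsing.
    if not body:
        return default
    try:
        data = json.loads(body)
    except ValueError:
        return body  # not JSON: show the raw text
    message = data.get("error_message") if isinstance(data, dict) else None
    return message or default
```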

Closes-bug: 2037320

Test Plan:
PASS: Confirm error message dialog contains
      proper text in detail.

Change-Id: I980df3356b60a59b19ec8b552f848e63dec3b621
Signed-off-by: Takamasa Takenaka <takamasa.takenaka@windriver.com>
2023-11-14 21:18:46 -03:00
Zuul 43cc03568a Merge "Get system-local-ca secret's ca.crt (DX sc upgrade)" 2023-11-14 20:23:33 +00:00
fperez c1ed66920c Fix unhashable type: 'slice' error in upgrade
The previous return type of the function get_upgrade_msg() caused
issues in the dcmanager orchestrator during subcloud upgrades.

With this small change, the output value from this function can now
be read directly without the need for any specific parsing.

Test Plan:
PASS: Create an upgrade strategy, force errors, and verify the accessibility of the output from the system controller.

Closes-bug: 2043408

Change-Id: I336014269593c504c566a6a902071c160696992b
Signed-off-by: fperez <fabrizio.perez@windriver.com>
2023-11-13 20:07:55 +00:00
Zuul 936a10d71f Merge "Increase timeout for cordon operation" 2023-11-11 00:36:23 +00:00
Boovan Rajendran 25a42e06df Increase timeout for cordon operation
Based on lab testing we need to increase the overall timeout for the
host cordon operation to give pods more time to shut down cleanly.
In lab testing 150 seconds was sufficient, but it's possible that
some conditions exist which might still cause us to hit the timeout.
If this happens, we still want to treat the cordon operation as a
success.
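A minimal sketch of "wait longer, but treat a timeout as success", with an injectable clock so it can be exercised without sleeping. The 150 s default is taken from the lab figure above and is an assumption about the configured value; `wait_for_cordon` and `pods_stopped` are hypothetical names.

```python
import logging
import time

LOG = logging.getLogger(__name__)

# Assumed value: the commit reports that 150 seconds was sufficient in the lab.
CORDON_TIMEOUT_SECS = 150


def wait_for_cordon(pods_stopped, timeout=CORDON_TIMEOUT_SECS, interval=5,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll `pods_stopped()` until it is True or the timeout expires.

    The operation is reported as a success either way; `timed_out` only
    records whether the pods finished shutting down in time.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        if pods_stopped():
            return {"status": "success", "timed_out": False}
        sleep(interval)
    LOG.warning("Cordon did not complete within %s seconds; treating it as "
                "a success", timeout)
    return {"status": "success", "timed_out": True}
```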

Test Plan:
Pass: Test by running 'system kube-host-cordon controller-0' on
AIO-SX and verify cordon operation completed successfully.
Pass: Perform k8s upgrade using orchestration method on AIO-SX
and verify k8s upgraded successfully.

Closes-Bug: 2042353

Change-Id: I47fad26c98297227f6352c2df666c384be42d252
Signed-off-by: Boovan Rajendran <boovan.rajendran@windriver.com>
2023-11-10 12:56:16 -05:00
Zuul 417433251e Merge "Add firewall rules to allow IPSec ESP traffic in mgmt network" 2023-11-10 17:28:30 +00:00
Lucas Ratusznei Fonseca 53703caab0 Add firewall rules to allow IPSec ESP traffic in mgmt network
This commit adds rules to allow ESP packets through the management
network.

Test plan

System: STANDARD

[PASS] Allow IPSec connections and traffic for IPv4 in VBox
[PASS] Allow IPSec connections and traffic for IPv6 in VBox

Story: 2010940
Task: 49072
Change-Id: Ie279a6e4f761dad83b652522335884334c9ff21e
Signed-off-by: Lucas Ratusznei Fonseca <lucas.ratuszneifonseca@windriver.com>
2023-11-10 12:47:51 -03:00
Zuul 1802a7a094 Merge "Typo fix in system help command output for kube-upgrade-failed" 2023-11-10 03:41:15 +00:00
Zuul 29c2b91446 Merge "Fix: Retain sysinv data during migration" 2023-11-09 21:37:58 +00:00
Zuul 95fc0cde7c Merge "Disable cert-mon audit for subclouds being rehomed" 2023-11-09 21:37:53 +00:00
Joshua Kraitberg d9e12f6c45 Fix: Retain sysinv data during migration
The migration code was deleting /opt/platform/<FROM_RELEASE>/sysinv
before it was migrated to /opt/platform/<TO_RELEASE>/sysinv.
This caused files inside it, such as `sysinv.conf.default`, to be lost
during a simplex upgrade.

Originally, in legacy restore, `sysinv.conf.default` was
individually restored after migration, so the deletion
had no impact.

`sysinv.conf.default` is required on non-SX systems.
It is used so that the sysinv-agent on other hosts can mount it and
have an initial sysinv.conf suitable for RPC to the controller.

The loss of the file is not problematic on an SX system, but it would
prevent a later SX-to-DX migration.
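A minimal sketch of the copy-before-delete ordering, assuming a `platform_root` laid out as `<root>/<release>/sysinv`. `migrate_sysinv_dir` is a hypothetical name; the real migration is done by the upgrade scripts.

```python
import os
import shutil


def migrate_sysinv_dir(platform_root, from_release, to_release):
    """Move <root>/<from>/sysinv to <root>/<to>/sysinv without losing files."""
    src = os.path.join(platform_root, from_release, "sysinv")
    dst = os.path.join(platform_root, to_release, "sysinv")
    # Copy first (dirs_exist_ok needs Python 3.8+), so files such as
    # sysinv.conf.default reach the new release directory...
    shutil.copytree(src, dst, dirs_exist_ok=True)
    # ...and only then remove the old directory.
    shutil.rmtree(src)
    return dst
```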

TEST PLAN
PASS: Optimized upgrade AIO-SX, stx6 to stx8
PASS: Optimized upgrade AIO-SX, stx7 to stx8

Closes-bug: 2042971
Signed-off-by: Joshua Kraitberg <joshua.kraitberg@windriver.com>
Change-Id: I7a22e050f74785b99ea6b7758cf23d3419add1de
2023-11-09 15:51:48 -05:00
Gustavo Herzmann 1d0cda1863 Disable cert-mon audit for subclouds being rehomed
This commit disables cert-mon audit for subclouds with the
'rehome-pending', 'pre-rehome', 'rehoming' and
'rehome-failed' deploy_status by adding them to the list
of invalid deploy statuses.
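A minimal sketch of the skip check. Only the four statuses named above are listed; the real list of invalid deploy statuses also holds earlier entries not shown here, and `should_audit_subcloud` is a hypothetical name.

```python
# Only the statuses added by this change; earlier entries are omitted.
REHOME_DEPLOY_STATUSES = frozenset([
    "rehome-pending",
    "pre-rehome",
    "rehoming",
    "rehome-failed",
])


def should_audit_subcloud(name, deploy_status, log=print):
    """Return False (and log why) when cert-mon should skip the subcloud."""
    if deploy_status in REHOME_DEPLOY_STATUSES:
        log("Skipping cert-mon audit for subcloud %s: deploy status %s"
            % (name, deploy_status))
        return False
    return True
```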

Test Plan:
For each introduced deploy status:
1. PASS - Deploy a subcloud, ensuring it's audited by cert-mon. Change
          its deploy-status to the invalid one and verify that the
          audit is skipped, with the relevant message logged;
2. PASS - Change the deploy-status back to 'complete' and verify that
          the next cert-mon audit runs successfully without being
          skipped.

Depends-on: https://review.opendev.org/c/starlingx/distcloud/+/900288

Story: 2010852
Task: 49062

Change-Id: I59f204923fe3752df5caf6b1ea3ddb4ddc14d49e
Signed-off-by: Gustavo Herzmann <gustavo.herzmann@windriver.com>
2023-11-09 12:58:06 +00:00
Zuul ab82680c1d Merge "Remove 'resourceVersion' from cert-manager-backup.yml file" 2023-11-08 21:33:59 +00:00