When using a pre-patched ISO, the ostree history in the feed only
shows the last commit; the older commits are replaced by the message
"History beyond this commit not fetched".
In these cases, ostree has a bug: if we add a commit to this repo and
remove it later, ostree complains that a tombstone file is missing.
To fix this, after a patch removal, this change checks whether this
file exists and creates it if it does not.
It also sets tombstone-commits=true in the repo config file, so that
if this bug is fixed on the ostree side in the future, ostree will
be aware that a tombstone file exists.
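The check described above can be sketched roughly as follows. This is
an illustration only, not the code from this change: the function name
and the tombstone path argument are hypothetical, since the message
does not give the actual file name.

```python
import configparser
from pathlib import Path


def ensure_tombstone(repo_dir, tombstone_relpath):
    """Create the tombstone file if missing and set tombstone-commits=true.

    tombstone_relpath is a placeholder: the real file name is not given
    in the commit message.
    """
    repo = Path(repo_dir)
    tombstone = repo / tombstone_relpath
    created = not tombstone.exists()
    if created:
        tombstone.parent.mkdir(parents=True, exist_ok=True)
        tombstone.touch()

    # Record the setting in the [core] section of the repo config file.
    config_path = repo / "config"
    config = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    config.optionxform = str  # keep option names exactly as written
    config.read(config_path)
    if not config.has_section("core"):
        config.add_section("core")
    config.set("core", "tombstone-commits", "true")
    with open(config_path, "w") as f:
        config.write(f, space_around_delimiters=False)
    return created
```

The return value tells the caller whether the file had to be created,
which may be useful for logging.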
Test-Plan:
PASS: Remove a patch on a pre-patched system with success
PASS: Apply 2 patches and remove the last one on a pre-patched
system with success
PASS: Check if tombstone-commits=true was added in the config file
Closes-bug: 2098891
Change-Id: Icff3f834b81dd399736f14965e4b90b2130cce2e
Signed-off-by: Lindley Vieira <lindley.vieira@windriver.com>
To save memory, the system deletes old ostree deployments after a
reboot-required update/upgrade. In some cases, however, it was not
deleting them.
This commit fixes that by creating a unique flag that indicates
whether the system was rebooted after an update/upgrade.
Using this flag, the agent can perform operations after the reboot.
The flag is saved under /var/persist, a folder that persists
between deployments.
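A minimal sketch of the flag pattern, assuming the flag is written
before the reboot and checked when the agent starts again. The flag
file name and both function names are hypothetical; only the
/var/persist location comes from this message.

```python
from pathlib import Path

# Hypothetical flag name; only the /var/persist folder is from the message.
DEFAULT_FLAG = Path("/var/persist/.example_reboot_pending")


def mark_reboot_pending(flag=DEFAULT_FLAG):
    """Write the flag before the reboot-required deployment reboots."""
    flag.parent.mkdir(parents=True, exist_ok=True)
    flag.touch()


def run_post_reboot_tasks(cleanup, flag=DEFAULT_FLAG):
    """Run cleanup once if the flag is present, then clear the flag."""
    if not flag.exists():
        return False
    cleanup()
    flag.unlink()
    return True
```

Here cleanup stands in for whatever the agent does after the reboot,
such as deleting old deployments.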
Test-plan:
PASS: Install patches 22.12
PASS: Remove patches 22.12
PASS: Install patches 24.09
PASS: Remove patches 24.09
PASS: Rollback patches 24.09
PASS: Install major release 22.12 to 24.09
PASS: Rollback major release (during an upgrade from 22.12 to 24.09)
PASS: Apply/Remove 24.09 patch before bootstrap
Closes-bug: 2098029
Change-Id: I9e8cbea83e95f7cf8fe2af505766d476fcd3aee7
Signed-off-by: Lindley Vieira <lindley.vieira@windriver.com>
The reapply evaluation process has been causing race conditions during
the platform upgrade process. The issue was initially addressed by
blocking the evaluation process for apps that are in 'updating' status.
However, depending on other triggers and different order of events, it
is still possible that similar problems happen.
In order to fully prevent such race conditions, the reapply evaluation
process was deferred until the upgrade is completed. This is feasible
because no apps will actually be reapplied during the upgrade since
application auditing is already disabled.
This commit implements a trigger that starts the app evaluation process
when the software upgrade completes.
Test plan:
PASS: AIO-SX stx-10 install
PASS: upgrade from stx-10 to master
PASS: check if evaluate_apps_reapply function was called
PASS: AIO-DX stx-10 install
PASS: upgrade from stx-10 to master
PASS: check if evaluate_apps_reapply function was called
Depends-on: https://review.opendev.org/c/starlingx/config/+/940820
Story: 2011242
Task: 51667
Change-Id: Ibdd9d4a8716ca08332b66e9af4a9e6518ac987bc
Signed-off-by: edias <edson.dias@windriver.com>
The ceph.conf configuration file is needed during data migration to
support updating the inventory to accurately reflect the configuration
of the ceph cluster. Known bugs from previous releases may result in an
inaccurate inventory that will cause errors during upgrade activation.
Test Plan:
- PASS: Verify that during the upgrade start the
ceph.conf file was modified.
Partial-Bug: 2095187
Change-Id: I4bf799ddbcb697086444970c076c652369268fb3
Signed-off-by: Gabriel Przybysz Gonçalves Júnior <gabriel.przybyszgoncalvesjunior@windriver.com>
This commit removes the duplicate message for system-local-ca private
key in deploy precheck as it appears twice in the code.
Test Plan:
PASS: check 22.12 -> 24.09 deploy precheck does not show the dup message.
PASS: check 24.09 -> 25.09 deploy precheck does not show the dup message.
Story: 2010676
Task: 51631
Change-Id: I6c2c24c38ce65a729002e0fe5dec8e1284db2ce2
Signed-off-by: Luis Eduardo Bonatti <luizeduardo.bonatti@windriver.com>
Change [1] copies the pxe files to a folder; however, in the rollback
scenario the files don't exist. This commit adds a check that the folder
exists before copying.
Test Plan:
PASS: DX rollback after host-done state.
1: https://review.opendev.org/c/starlingx/update/+/938922
Story: 2010676
Task: 51557
Change-Id: Iae348ed0ee400e7847f5eec58d250a890cd34488
Signed-off-by: Luis Eduardo Bonatti <luizeduardo.bonatti@windriver.com>
When state is "None", it doesn't have a value attribute. This commit
handles null value and logs it accordingly.
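The guard can be illustrated as below. The enum and the function name
are hypothetical stand-ins, not the classes touched by this commit.

```python
import enum
import logging

LOG = logging.getLogger(__name__)


class DeployState(enum.Enum):  # hypothetical enum, for illustration only
    START = "start"
    DONE = "done"


def state_value(state):
    """Return state.value, or None (with a log entry) when state is None."""
    if state is None:
        # None has no .value attribute, so handle it before dereferencing.
        LOG.info("Deploy state is None; no value to report")
        return None
    return state.value
```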
Test Plan:
PASS: Unit test case with state value null
PASS: Verify patch deploy and removal for pre-bootstrap
case
Task: 51543
Story: 2010676
Change-Id: Ib5fce886c5b88dcaecfe506a16b31b179a2e536d
Signed-off-by: sshathee <shunmugam.shatheesh@windriver.com>
During the upgrade of a DX+ subcloud, the pxe files do not exist
under /var/pxeboot/pxelinux.cfg.files, which results in an error when
unlocking controller-1 after deploy host. This commit adds code to
copy these files to the correct location during deploy start.
Test Plan:
PENDING: Finish a System Controller and DX+ subcloud upgrade.
PASS: Unlock controller-1 after deploy start in DX+ subcloud
Story: 2010676
Task: 51541
Change-Id: I078dcb3b1756f3701efef71263ed0a65f79f4a53
Signed-off-by: Luis Eduardo Bonatti <luizeduardo.bonatti@windriver.com>
The "ostree pull" command was pulling from the feed repo to the
"debian:starlingx" ref of the sysroot repo. During "software
deploy host" we check that the feed repo commit and the sysroot
commit are the same after the pull. Previously the check was done
on the "starlingx" branch, which caused issues because the pull
was making changes on "debian:starlingx".
This commit performs the check on the debian:starlingx ref of
sysroot and creates the deployment there too.
Test Plan:
PASS: Patch deployment lifecycle on AIO-SX
(deploying and removing)
PENDING: Patch deployment lifecycle on Standard
PENDING: Patch deployment using prestage on DC
Task: 51527
Story: 2010676
Change-Id: Ic8e3dedbbe04bb09049529a5f6b485ec24520675
Signed-off-by: sshathee <shunmugam.shatheesh@windriver.com>
This commit adds a nodetype check during USM service initialization
to ensure the controller preset exists only on AIO controllers.
It fixes an issue where, in an AIO + Worker system, the
controller preset incorrectly exists on the worker node.
Test Plan:
PASS: build and deploy iso
PASS: ensure the controller preset not present in worker node
in AIO+Worker system
Task: 51528
Story: 2010676
Change-Id: I827fa091ccf10f8e8d1c43f1122cd08ab1b40e78
Signed-off-by: junfeng-li <junfeng.li@windriver.com>
The license_file may appear empty during the license plugin
verification because the write() operation buffers the data
in memory and does not immediately write it to the file.
Calling flush() explicitly pushes the buffered content out to
the file before the verification runs.
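A small sketch of the buffering behaviour, using a temporary file in
place of the real license file (the function name and the ".lic"
suffix are made up for the example):

```python
import os
import tempfile


def write_license_copy(content):
    """Write content to a temp file; return (file, size visible on disk)."""
    license_file = tempfile.NamedTemporaryFile(mode="w", suffix=".lic")
    license_file.write(content)
    # Without this flush the data may still sit in Python's buffer, and
    # a reader opening the same path could find an empty file.
    license_file.flush()
    return license_file, os.path.getsize(license_file.name)
```

Note that flush() hands the data to the operating system; if durability
across a crash were required, os.fsync() on the file descriptor would
be needed as well.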
Test Plan
PASS: verify the license check passes with no empty file error
Closes-bug: 2093343
Signed-off-by: Fabiano Correa Mercer <fabiano.correamercer@windriver.com>
Change-Id: Ibf5e04b1516ce789812e09a50689ec268cd06e4d
software-controller and sw-patch-controller both unconditionally perform
a repo rsync from the active controller. This operation deletes the newly
created to-release repo during a DX major release deploy, as the
to-release is first deployed to the standby controller.
This change applies the same fix to both services: verify that the
active controller is actually running the same sw-version before
performing the rsync.
This change works together with [1].
TCs:
passed: on a DX system, upgrade from 22.12 to 24.09 then patch to
24.09 with success
Story: 2010676
Task: 51516
[1] https://review.opendev.org/c/starlingx/update/+/938409
Change-Id: I952d1633387512ccdc40dcbc8c062cbe8efedcd6
Signed-off-by: Bin Qian <bin.qian@windriver.com>
The current 24.09 UpdateKernelParameters rollback hook only rolls
back the out-of-tree-drivers for nodes that have a worker function.
The out-of-tree-drivers are needed for all personality types.
This update removes the current worker personality check so that the
out-of-tree-drivers kernel command line argument is rolled back for
all node types.
There were 2 additional fixes required:
1. The from_release compare was wrong and needs to be 24.09
rather than 22.12
2. The parameter name passed to the hook was 'oot_drivers' but
needs to match what is passed in from additional_data
as 'out-of-tree-drivers'
Test Plan:
PASS: Verify Rollback AIO SX from 24.09 back to 22.12
PASS: Verify Rollback AIO DX from 24.09 back to 22.12
PASS: Verify Rollback Standard System from 24.09 back to 22.12
PASS: Verify Rollback Standard System Worker from 24.09 back to 22.12
PASS: Verify Rollback Standard System Storage from 24.09 back to 22.12
Closes-Bug: 2092950
Change-Id: Icf2c1598371f05b411ffe82c535cb2b3cb0db9ba
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
This commit fixes the release deletion error on the system controller
when the /opt/software/metadata/committed/ directory doesn't exist.
This directory doesn't exist if the system controller was upgraded from
a previous release, and the release deletion fails when it tries to
delete the files in the directory.
This fix checks whether the directory exists before doing the deletion.
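As an illustration of the guard (the function name and its arguments
are hypothetical, not taken from the code in this change):

```python
import os


def delete_metadata_files(directory, filenames):
    """Delete the named files from directory; do nothing if it is absent."""
    if not os.path.isdir(directory):
        return []  # e.g. the committed/ directory was never created
    removed = []
    for name in filenames:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            os.remove(path)
            removed.append(name)
    return removed
```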
Test Plan:
PASS: build and deploy iso
PASS: upgrade system controller and delete the unavailable release
PASS: upgrade SX and delete the unavailable release
Task: 51524
Story: 2010676
Change-Id: I1ccd4e30e8600a95cf749a15f61ce0f081ae9d21
Signed-off-by: junfeng-li <junfeng.li@windriver.com>
This commit stops the passwd file from being copied
from from-release to to-release during the upgrade
preparation, as this file can differ between the
releases, which can lead to issues later in other
operations, as was observed with the postgres
user.
Test Plan
PASS: AIO-SX stx8 -> stx10 upgrade, verify postgres user
owns /etc/postgresql and /var/lib/postgresql/<release>
and verify command 'sudo -u postgres psql -d sysinv'
runs successfully
PASS: STD stx8 -> stx10 upgrade, verify postgres user
owns /etc/postgresql and /var/lib/postgresql/<release>
and verify command 'sudo -u postgres psql -d sysinv'
runs successfully
Closes-bug: 2093121
Change-Id: I0e2e6f1eecde817fffa47ba02134331bfbcb3218
Signed-off-by: Heitor Matsui <heitorvieira.matsui@windriver.com>
When upgrading from 22.12 to a pre-patched ISO, the
software-controller service is not initialized.
This service is responsible for, among other things, initializing
apt-ostree and syncing some folders. Without it, the system is unable
to apply a patch after the upgrade.
This commit fixes it by starting the service after a major upgrade.
Test-plan:
PASS: Upgrade a system from 22.12 to 24.09.1 and check if the service
was started, apt-ostree initialized and folders synced.
PASS: Apply a patch with success
Story: 2010676
Task: 51516
Change-Id: I8ab8664ab309fc9dc5edf0973c220874b0692bc7
Signed-off-by: Lindley Vieira <lindley.vieira@windriver.com>
This commit pre-populates out-of-tree-drivers kernel parameter
in boot.env according to the service parameter setting.
It will also make content of boot.env and /proc/cmdline
consistent with each other.
Additionally, it will prohibit updating the same kernel parameter,
i.e. out-of-tree-drivers, in /proc/cmdline twice during driver
switching between in-tree and out-of-tree.
TEST PLAN:
PASS: Upgrade the AIO-SX system and observe whether the
kernel parameter is consistent in boot.env and
/proc/cmdline.
PASS: On the upgraded system, using system service-parameter-modify,
switch the out-of-tree-drivers from "none" to "ice,i40e,iavf"
and vice versa. After system unlock, observe whether
the kernel parameter is consistent in boot.env and
/proc/cmdline and there is no duplication.
PASS: Monitored the system for no additional reboot after upgrade
post deploy unlock operation.
Story: 2010676
Task: 51493
Change-Id: I8086164aaeb9ba56c33f9c687777920d1bbb1bfd
Signed-off-by: sshaikh1 <sirin.shaikh@windriver.com>
This commit pre-populates out-of-tree-drivers kernel parameter
in boot.env according to the service parameter setting.
It will also make content of boot.env and /proc/cmdline
consistent with each other.
Additionally, it will prohibit updating the same kernel parameter,
i.e. out-of-tree-drivers, in /proc/cmdline twice during driver
switching between in-tree and out-of-tree.
TEST PLAN:
PASS: Upgrade the AIO-SX system and observe whether
the kernel parameter is consistent in boot.env and
/proc/cmdline.
PASS: On the upgraded system, using system service-parameter-modify,
switch the out-of-tree-drivers from "none" to "ice,i40e,iavf"
and vice versa. After system unlock, observe whether
the kernel parameter is consistent in boot.env and
/proc/cmdline and there is no duplication.
PASS: Monitored the system for no additional reboot after upgrade
post deploy unlock operation.
Story: 2010676
Task: 51493
Change-Id: I130283c93341ec069f2a92a8128ebd9c25032d1b
Signed-off-by: sshaikh1 <sirin.shaikh@windriver.com>
Currently when deploy precheck is executed, the license check
may append extra output to the final output.
This commit suppresses the extra output appended by the
verify-license binary.
Test Plan
PASS: run deploy precheck with an invalid license, verify it fails
and no extra output is appended
PASS: run deploy precheck with a valid license, verify it passes
and no extra output is appended
Closes-bug: 2092402
Change-Id: I967dafdca0db2957310e966c562b5ddbd455623d
Signed-off-by: Heitor Matsui <heitorvieira.matsui@windriver.com>
The ostree config file must have a min-free-space-percent entry in the
core (first) section. On the assumption that only the core section
existed at this point, the code always added this line at the end of
the config file, and thus at the end of core. That assumption no
longer holds.
This commit uses ConfigParser to always put min-free-space-percent
into the core section.
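A rough sketch of the approach with Python's configparser. The function
name, the sample remote section, and the percent value passed in are
assumptions for illustration; the message does not state them.

```python
import configparser
import io


def set_min_free_space(config_text, percent):
    """Return config_text with min-free-space-percent set under [core]."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(config_text)
    if not parser.has_section("core"):
        parser.add_section("core")
    # Setting the option by section puts it in [core] no matter how many
    # other sections follow it in the file.
    parser.set("core", "min-free-space-percent", str(percent))
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()
```

Compared with appending a line to the end of the file, this keeps the
entry in core even when later sections exist.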
Test-Plan:
PASS: Upload a pre-patched 24.09.1 ISO in a 22.12 system and check
if min-free-space-percent was put under core in the 24.09
ostree config file.
Story: 2010676
Task: 51486
Change-Id: Ib180a748b17fbe1079a8b8bd424cc3312b6d0fcb
Signed-off-by: Lindley Vieira <lindley.vieira@windriver.com>
This commit changes the default location of upgrade scripts,
which is currently used by major release deployment.
Test Plan
PASS: AIO-SX: run stx-8 -> stx-10 major release deployment e2e,
verify the new script location is used in migrate/activate
Depends-on: https://review.opendev.org/c/starlingx/config/+/937885
Partial-bug: 2091944
Change-Id: I3791030a2c7827308146bc6cbca91b6bd2014e46
Signed-off-by: Heitor Matsui <heitorvieira.matsui@windriver.com>
This commit adds new additional data to pass to deploy host in
order to support pre-populating kernel parameters in case some
changes occur during a major release deployment. This helps avoid
unnecessary host reboots caused by kernel parameter changes under the
new configuration after switching to the new software release.
This change also includes new out-of-tree-drivers kernel parameter
pre-populated according to the service parameter setting.
Story: 2010676
Task: 51461
TCs:
passed: upgrade from 22.12 on AIO-DX, observed kernel parameter
out-of-tree-drivers is set or removed according to the
corresponding service parameter prior to host unlock after
deploy host
Signed-off-by: Bin Qian <bin.qian@windriver.com>
Change-Id: Ie3b0d9442674f452fc890960138adde745f8087c
The software agent uses the latest_feed_commit value to check if the
correct deployment was installed in the node. The query function is
responsible for updating this value. But for major releases,
sometimes it is not possible to get this value updated, so query has
a major_release parameter to return early if it is a major release.
This commit fixes a query call that was not passing the major_release
parameter, and resets the latest_feed_commit value in case it is a
major release.
Test-Plan:
PASS: Install a patch
PASS: Rollback a patch
PASS: Install a major release
PASS: Rollback a major release
Story: 2010676
Task: 51460
Change-Id: I8c9eeab39484bab1a4e8886deacc8fe32c1cf611
Signed-off-by: Lindley Vieira <lindley.vieira@windriver.com>
Sometimes the "ostree admin deploy" command does not work on the
first try; the root cause is not yet known. When this happens, the
node trying to deploy fails, reboots, and tries to deploy again.
These reboots cost a lot of time.
This commit detects the failure of the "ostree admin deploy" command
earlier and retries without the need to reboot.
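The retry idea can be sketched as below. The helper name, the number of
attempts, and the delay are assumptions made for the example; the
message does not say how many retries the real code performs.

```python
import subprocess
import time


def run_with_retry(cmd, attempts=3, delay=1.0, runner=subprocess.run):
    """Run cmd until it exits 0; return the attempt number that succeeded."""
    for attempt in range(1, attempts + 1):
        result = runner(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            time.sleep(delay)  # brief pause before trying again
    raise RuntimeError(f"{cmd!r} failed after {attempts} attempts")
```

The runner argument exists only so the helper can be exercised without
invoking a real command.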
Test-Plan:
PASS: Install a patch in a SX system
PASS: Install a patch in a DX system
PASS: Install a major release in a DX system
PASS: Install a patch in a DC system
PASS: Install a patch in a system with nodes
failing to install on the first try
Story: 2010676
Task: 51447
Change-Id: I8956dfd058394fc9a5927d08185fe8d26735aaa1
Signed-off-by: Lindley Vieira <lindley.vieira@windriver.com>
To address timeouts during large patch sync operations,
this commit adds a constant `TIMEOUT_SYNC_API_CALL` with a value
of 120 seconds. This constant is used in the `call_api` method for
syncing controllers.
Test Plan:
PASS: Verify no timeout during software sync with a duplex
subcloud
Closes-Bug: 2091149
Change-Id: If453d13f76825e497b70e3ba6574cf216cb3ddd0
Signed-off-by: rummadis <ramu.ummadishetty@windriver.com>
Three health checks are currently being skipped:
- k8s version
- active controller == controller-0
- license validation
This commit prevents these checks from being skipped.
Test Plan
PASS: run deploy precheck and verify the missing health
checks appear in the output
Closes-bug: 2090959
Change-Id: I843263e9d8df964637b36c236b13a6e93b5067ad
Signed-off-by: Heitor Matsui <heitorvieira.matsui@windriver.com>
This commit sets the same output message for a major release upgrade
and a patching operation.
Test Plan:
PASS: Run a software deploy start on a system uploaded with the
latest patch and observe the message. Run the software deploy start
on an N-1 system uploaded with the current version ISO. Verify that
the messages are the same.
Closes-bug: 2089412
Change-Id: I52e81bd588e0b535d358a5e6c624f5feb80341d9
Signed-off-by: Gustavo Pereira <gustavo.lyrapereira@windriver.com>
The sw-patch-agent service causes unwanted reboots and
conflicts with the USM patching strategy:
sw-patch-agent can interfere with software-agent and incorrectly
flag the host as reboot-required after noticing that it's not
patch-current by sw-patch standards.
Logs will typically look like this before each reboot:
sw-patch-agent[2049]: patch_agent.py(390): INFO: Active Sysroot
Commit:650ace717b24afd2e7283cc6ce8b01f13adce84db95e03685610e120424610b9
does not match active controller's Feed Repo Commit:
028e1fa688afaa27aa7a34d4b5ee9eeb8d188b691ba3e558c409bb58e0e83fe2
sw-patch: Node has been patched, with reboot-required flag set.
Rebooting
Since sw-patch-agent is no longer needed, this commit removes it.
Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/935555
Test-Plan:
PASS: AIO-SX upgrade using sw-manager strategy
PASS: AIO-DX System Controller upgrade using strategy
PASS: subcloud upgrade using dcmanager strategy
PASS: DC patch orchestration for n-1 subclouds
Story: 2010676
Task: 51387
Change-Id: I2af7dfab9da89eeba4ffef3fa0d884ae6f2c354f
Signed-off-by: mmachado <mmachado@windriver.com>
When installing the Debian packages, sometimes the package feed path
was wrongly passed, using the default component "updates", instead of
the release version.
This commit fixes it by building and using the correct path for each
patch.
Test-plan:
PASS: Deploy a patch in a SX system
PASS: Deploy a patch in a DX system
PASS: Deploy a patch in an upgraded SX system
PASS: Deploy a patch in an upgraded DX system
Story: 2010676
Task: 51379
Change-Id: I07aabdf49dae8474678a020917aaca5b57661c1d
Signed-off-by: Lindley Vieira <lindley.vieira@windriver.com>
Since VIM starts the patch deployment on one controller and finishes
on the other, the patch update scripts must be synced. If the sync
fails, it must return false so that the sync_from_nbr function
tries to sync again.
This ensures the update scripts are synced between controllers.
Instead of deleting the entire /etc/update.d folder, it deletes only
the scripts inside.
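A sketch of clearing the folder contents while keeping the folder, and
of reporting failure through the return value rather than an exception.
The function name is hypothetical and this is not the code from this
change.

```python
import os
import shutil


def clear_update_scripts(folder):
    """Remove everything inside folder but keep the folder itself."""
    if not os.path.isdir(folder):
        return True  # nothing to clear; a missing folder is not an error
    try:
        for name in os.listdir(folder):
            path = os.path.join(folder, name)
            if os.path.isdir(path) and not os.path.islink(path):
                shutil.rmtree(path)
            else:
                os.remove(path)
    except OSError:
        return False  # lets the caller retry the sync later
    return True
```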
Test-Plan:
PASS: Install a patch in a DX system using VIM with success
PASS: Do not raise an exception if the update scripts
folder does not exist in the first controller
Story: 2010676
Task: 51377
Change-Id: I4ae36499e2a092199c82d5742172f3777c0ddc74
Signed-off-by: Lindley Vieira <lindley.vieira@windriver.com>
During a rollback operation, a hook removes the kubernetes config
symlink, resulting in an error on unlock. This hook should only be
executed in a 24.09 -> 22.12 rollback scenario; this commit adds a
validation to ensure that.
Test Plan:
PASS: Finish full rollback scenario.
Story: 2010676
Task: 51384
Change-Id: I3cec7868bab7fb0ccd6c6b1212cb443edac46fa2
Signed-off-by: Luis Eduardo Bonatti <luizeduardo.bonatti@windriver.com>