The deprecated feature gate "TTLAfterFinished=true" is no longer
supported in K8s 1.25. This feature gate needs to be
removed from the kubeadm config during the K8s upgrade from the
lower version to 1.25.
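The removal described above can be sketched as a small string filter. This is only an illustration, assuming the feature gates are held as a comma-separated "Name=value" string; the helper name remove_feature_gate and the gate "SomeOtherGate" are hypothetical and are not taken from the sysinv code.

```python
def remove_feature_gate(feature_gates, gate_name):
    """Return the feature-gates string without any entry for gate_name.

    Assumes a comma-separated "Name=value" list (hypothetical layout).
    """
    kept = []
    for entry in feature_gates.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Compare only the gate name, ignoring its value.
        name = entry.split("=", 1)[0].strip()
        if name != gate_name:
            kept.append(entry)
    return ",".join(kept)


# "SomeOtherGate" is a made-up gate used only for the demonstration.
print(remove_feature_gate("TTLAfterFinished=true,SomeOtherGate=false",
                          "TTLAfterFinished"))
```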
Test Plan: Debian
PASS: Verified removal of TTLAfterFinished=true with k8s 1.24 to 1.25
upgrade in AIO-SX.
Story: 2010368
Task: 47370
Signed-off-by: Ramesh Kumar Sivanandam <rameshkumar.sivanandam@windriver.com>
Change-Id: Ibc9e43ba69bfb83fbd35c1c0b1c1b95e4ab58539
The platform-nfs-ip service is not necessary for fresh installs
because it is just an alias for the controller IP.
But for old releases like StarlingX rel. 6 or 7 the
platform-nfs-ip uses a specific IP. If for some reason an error
occurs during the upgrade process, the upgrade will be aborted
and the nodes will downgrade to the older release again.
At this moment the nodes will try to communicate with the
previous platform-nfs-ip IP configured in /etc/hosts.
But if the active controller is using the new Release
this IP doesn't exist anymore and the downgrade will fail.
For this reason the platform-nfs-ip service will be available
just for upgrade operations and will be deprovisioned for fresh
installs or at the end of the upgrade process
(upgrade-activate phase).
Test plan
PASS Fresh install on AIO-SX
Fresh install on AIO-DX
PASS Upgrade AIO-DX system from CENTOS Rel 7 to DEBIAN Rel 8
PASS Reboot controller-0 during upgrade of AIO-DX
controller-1 was the active one with the new release ( Rel 8 )
controller-0 using old release.
reboot controller-0 and check if it could connect to
controller-1 using old platform-nfs-ip.
PASS Upgrade-abort during AIO-DX upgrade
controller-1 was the active controller and already upgraded
controller-0 was upgraded but locked.
Abort the upgrade and downgrade to old release ( Rel 7 )
Partial-Bug: #2012387
Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/878122
Signed-off-by: Fabiano Mercer <fabiano.correamercer@windriver.com>
Change-Id: Ia7217544f2c954a83af71d488e0f2d722e17ec64
system-health-query has a malformed output after "Use "fm alarm-list"
for more details.". To fix that, this commit adds a new line after
the phrase.
This bug was introduced by Story 2009303, Task 47478 in:
https://review.opendev.org/c/starlingx/config/+/874097
Test Plan:
PASS: Run "health-query-upgrade" and verify if there is a new line after
"Use "fm alarm-list" for more details.".
Closes-Bug: 2012415
Signed-off-by: Karla Felix <karla.karolinenogueirafelix@windriver.com>
Change-Id: I4cc82d31096c4728875d2bf7e6d28143e4872806
Implement logic to recover from Helm pre-upgrade hook timeout.
In order to successfully recover in this scenario, applications
need to be removed prior to retrying. A new constant was created
to store Helm error messages that can trigger this recovery logic.
More errors can be added to this constant in the future if needed.
The recovery logic leverages the already existing retry mechanism
triggered when an ApplicationApplyFailure exception is raised.
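A minimal sketch of the recovery flow described above, not the actual sysinv implementation. The constant name HELM_RECOVERABLE_ERRORS, the error text it holds, and the helpers is_recoverable and apply_with_recovery are all assumptions made for illustration; only the ApplicationApplyFailure exception name comes from this commit message, and it is redefined here as a stand-in.

```python
class ApplicationApplyFailure(Exception):
    """Stand-in for the sysinv exception named in this commit message."""


# Hypothetical constant: Helm error fragments that trigger recovery.
# The message text is an assumption, not copied from the code.
HELM_RECOVERABLE_ERRORS = (
    "timed out waiting for the condition",
)


def is_recoverable(helm_error):
    """Return True if the error text matches a known recoverable error."""
    return any(fragment in helm_error for fragment in HELM_RECOVERABLE_ERRORS)


def apply_with_recovery(apply_fn, remove_fn, max_attempts=2):
    """Retry apply_fn, removing the application before each retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return apply_fn()
        except ApplicationApplyFailure as exc:
            # Give up on the last attempt or on an unknown error.
            if attempt == max_attempts or not is_recoverable(str(exc)):
                raise
            # The application must be removed prior to retrying.
            remove_fn()
```

Keeping the error fragments in one tuple means more messages can be added later without touching the retry loop.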
Test Plan:
PASS: build-pkgs -p sysinv
PASS: Add pre-upgrade hook to platform-integ-apps running
"sleep 300" and set the Helm release timeout to 3
minutes. Then rebuild package, update app, observe the
hook timeout being triggered, delete flux pods, and
watch the recovery logic successfully finish updating
the app.
PASS: upload/apply/remove/delete unmodified snmp app
Closes-Bug: 2011850
Signed-off-by: Igor Soares <Igor.PiresSoares@windriver.com>
Change-Id: Ib2cf97ea728e8a9bec4559de04d3731f34f35f1b
A deferred config typically occurs shortly after initialization, when
the sysinv agent is ready to handle config requests. In rare cases,
an exception may occur when attempting to generate hieradata, which
depends on external services (e.g. k8s, ceph, etc).
This change makes the puppet hieradata update handle a potential
exception and retry in the case of a deferred runtime config.
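The retry behavior can be sketched roughly as follows. This is an assumption-laden illustration: run_deferred_configs and update_hieradata are hypothetical names, and the real sysinv conductor logic is more involved.

```python
def run_deferred_configs(pending, update_hieradata):
    """Apply each deferred config; return the ones that must be retried.

    pending          -- list of deferred config requests (hypothetical shape)
    update_hieradata -- callable that may raise if an external service
                        (e.g. k8s, ceph) is not yet reachable
    """
    still_deferred = []
    for config in pending:
        try:
            update_hieradata(config)
        except Exception:
            # Keep the config deferred so a later pass can retry it.
            still_deferred.append(config)
    return still_deferred
```

A caller would invoke this periodically and feed the returned list back in as the next pending list.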
TestPlan:
PASSED: bootstrap and host-unlock with these changes
PASSED: duplex system swact with these changes
PASSED: runtime manifest apply and file update apply
PASSED: deferred runtime manifest retries on hieradata update exception
PASSED: deferred runtime file retries on hieradata update exception
Closes-Bug: 2011838
Signed-off-by: John Kung <john.kung@windriver.com>
Change-Id: I77b2fd084165645abb7d68d436696c183a91ec8a
A docstring was missed when the functionality was implemented [1].
Add the missing docstring.
Skip tests because there is no functional impact.
[1]: https://review.opendev.org/c/starlingx/config/+/865327
Related-Bug: 1998499
Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Change-Id: Ia1b5d281ebe67c4f1605cf2d645f896bb1e2b6e6
This adds sysinv upgrades support for Kubernetes 1.25.3 to 1.26.1
Test-plan: Debian
PASS: system kube-version-list shows
v1.26.1 available
Story: 2010368
Task: 47667
Signed-off-by: Saba Touheed Mujawar <sabatouheed.mujawar@windriver.com>
Change-Id: Ia8d7bfab558659efc385433154f44c3c4b0c763b
update_tpm_config_manifests was removed by:
https://review.opendev.org/c/starlingx/config/+/851510
Two methods were still referencing it.
However nothing was calling those methods, and therefore
all of the unused code is being removed.
Story: TBD
Task: TBD
Signed-off-by: Al Bailey <al.bailey@windriver.com>
Change-Id: If29a7f16f0c930823b06c1f496e9126027cf8c84
The ObjectListBase is unused by sysinv, and contains
no-member errors related to some of its unused code.
Removing this class and its unit tests from the codebase.
Note: oslo_versionedobjects contains this class, so
if in the future the data model needs to use this class
type, the oslo component can be used instead.
Test Plan:
PASS: tox
PASS: build / install / unlock AIO-SX
PASS: sysinv CLI and API tests that query lists
Story: 2010642
Task: 47627
Signed-off-by: Al Bailey <al.bailey@windriver.com>
Change-Id: Ia13e0cf86ffabe0abb6425fadcc192267db697b8
The set_quota_gib method was removed by:
There were still calls to that nonexistent method
in the code, but that code was never being invoked
so it has been removed.
This change has no runtime impact. It is removing methods
that are never invoked (and would have failed if called).
These calls to a nonexistent method were detected by pylint no-member,
but that check cannot be un-suppressed until all no-member
issues are addressed.
Test Plan:
PASS: build / install / unlock AIO-SX
Story: 2010642
Task: 47624
Signed-off-by: Al Bailey <al.bailey@windriver.com>
Change-Id: I9295d87ad0bf3e8b3c0855c0ee36d046f457d22b
The command 'system health-query-upgrade' returns the value [Fail]
for the field 'Valid platform-backup partition'.
This happens because the lsblk command is not able to return
a device path when multipath disks are used or when the root
partition is under cgts--vg.
To fix this, fallback commands were added in case lsblk fails to
return a valid path.
Test Plan:
PASS: Upgrade AIO-SX (with and without multipath):
Valid platform-backup partition: [Ok]
on "system health-query-upgrade"
PASS: Upgrade AIO-DX (with and without multipath):
Valid platform-backup partition: [Ok]
on "system health-query-upgrade"
PASS: Upgrade Standard (with and without multipath):
Valid platform-backup partition: [Ok]
on "system health-query-upgrade"
Story: 2010608
Task: 47648
Signed-off-by: Matheus Guilhermino <matheus.machadoguilhermino@windriver.com>
Change-Id: I2655dd4cfe1c91a8bcd2d7d8e10ddb662a58d6ce
After switching the management network to the admin network,
we need to update the cert-mon sysinv endpoint in the
dc token cache to sync the dc certificate.
Test Plan:
PASS: dcmanager subcloud update (using admin network parameters)
1. Endpoints for the subcloud updated with admin ip value
2. subcloud availability = online
PASS: Verify that the subcloud is online shortly after successful
completion of subcloud_update playbook
PASS: Verify that the cert-mon sysinv endpoint is updated
PASS: Manage the subcloud and verify that dc-cert_sync_status is
in-sync as expected
Story: 2010319
Task: 47645
Signed-off-by: Hugo Brito <hugo.brito@windriver.com>
Change-Id: Ia69e81597af2efec4a8e2496ae239fd73c864654
For AIO-SX, allow "system kube-upgrade-start" to target a kube
version that is more than one version newer than the current version.
Test Plan:
Pass: "system kube-upgrade-start v1.24.4" allows a kube upgrade from
the current version v1.22.5.
Story: 2010565
Task: 47269
Signed-off-by: Boovan Rajendran <boovan.rajendran@windriver.com>
Change-Id: I05033fdb909a1caaf4d0c69e8b8b4c73289b81f4
Normally, a platform network's address pool is created at bootstrap
and is unable to be deleted after the initial configuration is
complete.
An exception to this is the admin network / address pool which is
able to be deleted and re-configured by a user at any time.
This facilitates the ability for a subcloud to be updated or
re-homed to adapt to network changes that may occur between
itself and the desired system controller.
Currently, most platform network address pools are created at
bootstrap by only specifying the range of addresses based on
the start-end addresses. When the address pool is associated
with a network, the appropriate floating, controller-0, and
controller-1 addresses are allocated.
In the case of the admin network, a user may decide to delete
the network/address-pool and re-create it with the (valid)
specification of a floating-address, controller-0-address,
and controller-1-address. For example:
system addrpool-add \
--floating-address 192.168.103.2 \
--controller0-address 192.168.103.3 \
--controller1-address 192.168.103.4 \
--gateway-address 192.168.103.1 \
admin 192.168.103.0 24
Currently, this would cause the floating IP to be
allocated as 192.168.103.5, controller-0 IP to be allocated
as 192.168.103.6 and controller-1 IP to be allocated as
192.168.103.7 when the address pool is associated with
a network. This would not be what the user expects.
In addition, it was found that if a user does use the
options for floating-ip, controller0/1 ip, the current
code has some significant bugs:
1. Possible to have mixed IPv4/IPv6 addresses
2. Possible to specify floating/controller addresses
as the network (subnet) address or the broadcast
address
3. Possible to specify the floating/controller addresses
completely outside the specified subnet.
This commit ensures that any user specification for the
floating or controller addresses is preferred over
auto-allocation.
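The three problems listed above can be illustrated with a short check built on Python's standard ipaddress module. This is a sketch only: validate_pool_address is a hypothetical helper, not the sysinv API, and treating the last address as reserved only for IPv4 is an assumption.

```python
import ipaddress


def validate_pool_address(network, prefix, address):
    """Validate a floating/controller address against the pool subnet."""
    subnet = ipaddress.ip_network(f"{network}/{prefix}")
    addr = ipaddress.ip_address(address)
    # 1. Reject mixed IPv4/IPv6 specifications.
    if addr.version != subnet.version:
        raise ValueError(f"{addr} and {subnet} use different IP versions")
    # 3. Reject addresses outside the specified subnet.
    if addr not in subnet:
        raise ValueError(f"{addr} is outside {subnet}")
    # 2. Reject the network (subnet) address and the broadcast address.
    if addr == subnet.network_address:
        raise ValueError(f"{addr} is the network address of {subnet}")
    if subnet.version == 4 and addr == subnet.broadcast_address:
        raise ValueError(f"{addr} is the broadcast address of {subnet}")
    return addr


# Values taken from the addrpool-add example in this commit message.
print(validate_pool_address("192.168.103.0", 24, "192.168.103.2"))
```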
Test Plan:
The addition of the addrpool unit tests provides over 40
new cases across IPv4/IPv6
Pass: Bootstrap a subcloud using the admin network.
Pass: On the subcloud, delete the admin addrpool and
re-create it with various combinations of
subnet/floating/controller-0/1 IP. These
combos are also covered by the unit tests.
Story: 2010319
Task: 46910
Change-Id: Ie59c26ae2a57b9cb570c7b20e3b40aa0f14fd95d
pylint indicated a no-member error in sysinv.common.utils
Unit tests were added to test_utils in order to verify that
an AttributeError was being incorrectly raised rather than
SysinvException. Then the code was corrected so that the
unit test would pass.
Note: The pylint error check cannot be un-suppressed until
other components are updated.
This code change has no runtime effect on existing code,
since there are currently no config files that trigger that
error.
Test Plan:
PASS: Verify the new unit test fails without the code
change, and passes once the pylint error is fixed.
Story: 2010642
Task: 47626
Signed-off-by: Al Bailey <al.bailey@windriver.com>
Change-Id: I88ac6f2f7eba3c0dc77478071d59a5dd99f7936f
Details: This commit improves how lldpcli is executed in the
subprocess.
The existing implementation reads the OS release file during subprocess
execution, which requires an extra file descriptor.
This commit always uses the "json0" value, as used for Debian, since
only Debian is supported going forward.
Test Plan:
PASS: upgrade from 21.12 to 22.12 with this commit
PASS: fresh installation with this commit
Closes-Bug: 2009962
Signed-off-by: Junfeng (Shawn) Li <junfeng.li@windriver.com>
Change-Id: I1b35ab07775bbaee14a8c20513c398efcfa1d6e7
The command host-disk-list was reporting free space incorrectly for
multipath disks that have partitions added to the nova-local
logical volume.
This happens when the device node path has a dash in it like in
'/dev/mapper/mpathc-part1'. A 'grep -w' command was used to verify
if the device node is a physical volume. The command 'grep -w'
searches for whole words, but the dash is considered a word separator.
This causes the device node '/dev/mapper/mpathc' to be matched with
'/dev/mapper/mpathc-part1', since 'grep -w' detects the word
'/dev/mapper/mpathc'.
To fix this, the command is changed to detect the device node as
a string with an extra space at the end, for instance,
'/dev/mapper/mpathc '.
As an example consider device_node='/dev/mapper/mpathc' and
the output of 'pvs' command below:
  PV                       VG      Fmt  Attr PSize    PFree
  /dev/mapper/mpatha-part5 cgts-vg lvm2 a--  <488.41g 251.59g
  /dev/mapper/mpathb       cgts-vg lvm2 a--   <20.00g <20.00g
  /dev/mapper/mpathc-part1 cgts-vg lvm2 a--   <15.00g <15.00g
The 'pvs | grep -w /dev/mapper/mpathc' will match the string
'/dev/mapper/mpathc-part1', but it should not.
The 'pvs | grep "/dev/mapper/mpathc "' will not match any
string in that output.
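The difference between the two matches can be reproduced in Python. This is an approximation for illustration only: whole_word_lines emulates 'grep -w' with a regular expression, and both helper names are hypothetical. The sample text is the 'pvs' output shown above.

```python
import re

PVS_OUTPUT = """\
  PV                       VG      Fmt  Attr PSize    PFree
  /dev/mapper/mpatha-part5 cgts-vg lvm2 a--  <488.41g 251.59g
  /dev/mapper/mpathb       cgts-vg lvm2 a--   <20.00g <20.00g
  /dev/mapper/mpathc-part1 cgts-vg lvm2 a--   <15.00g <15.00g
"""


def whole_word_lines(text, word):
    """Approximate 'grep -w': the match must not touch a word character
    (letter, digit or underscore) on either side."""
    pattern = re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)")
    return [line for line in text.splitlines() if pattern.search(line)]


def trailing_space_lines(text, device_node):
    """Match the device node only when it is followed by a space."""
    needle = device_node + " "
    return [line for line in text.splitlines() if needle in line]


node = "/dev/mapper/mpathc"
# The dash is not a word character, so the whole-word match is a false hit.
print(len(whole_word_lines(PVS_OUTPUT, node)))      # 1
# With the trailing space, 'mpathc-part1' no longer matches.
print(len(trailing_space_lines(PVS_OUTPUT, node)))  # 0
```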
Test Plan:
PASS: Install AIO-SX with multipath disks, add a new partition
to a spare multipath disk using half of its capacity.
Add this partition to nova-local logical volume group and
run 'system host-disk-list controller-0'. The available_gib
column should report the expected free space correctly.
PASS: Install AIO-SX without multipath disks, add a new partition
to a spare disk using half of its capacity.
Add this partition to nova-local logical volume group and
run 'system host-disk-list controller-0'. The available_gib
column should report the expected free space correctly.
Closes-bug: 2007309
Signed-off-by: Felipe Sanches Zanoni <Felipe.SanchesZanoni@windriver.com>
Change-Id: If7636f39c0ff3d0858623d7ada8a391fd783ca32
Older versions of StarlingX created Barbican registry secrets as
"text/plain", which are retrieved as "string" in Python 3. This change
adds support for these secrets, which may be used in upgraded systems.
In non-upgraded newer systems, these secrets are created as
"application/octet-stream" and are retrieved as "bytes", which are
decoded using UTF-8 in Python 3.
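Handling both payload types can be sketched as below. The helper name secret_payload_to_text is hypothetical, and the sample payload value is made up; the sketch only shows the str/bytes distinction, not how the secret is fetched from Barbican.

```python
def secret_payload_to_text(payload):
    """Return the secret payload as text for either content type.

    "application/octet-stream" secrets arrive as bytes and are decoded
    as UTF-8; "text/plain" secrets arrive as str and are returned as-is.
    """
    if isinstance(payload, bytes):
        return payload.decode("utf-8")
    return payload


# Made-up payload values, used only for the demonstration.
print(secret_payload_to_text(b"example-secret"))
print(secret_payload_to_text("example-secret"))
```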
Test Plan:
PASS: Check that the system can retrieve the registry usernames and
passwords when the secrets have the content type "text/plain".
PASS: Check that the system can retrieve the registry usernames and
passwords when the secrets have the content type
"application/octet-stream".
Closes-Bug: 2009631
Signed-off-by: Joao Victor Portal <Joao.VictorPortal@windriver.com>
Change-Id: I5a71239b09ef1124449dc66f86ef790e1f23222c
This commit updates SSSD configuration to support SSSD sudo
capabilities for ldap users and groups.
Remote WAD ldap users as well as local openldap users can be
configured to get "sudo" and "sys_protected" privileges on
the stx platform when connecting using SSH.
Configuration updates were done by adding SSSD sudo service
and supporting parameters in the SSSD configuration file.
Test Plan:
PASS: Verify SSSD configuration in "/etc/sssd/sssd.conf" gets
updated with sudo parameters.
PASS: Create a user with sudo privileges in openldap and verify
that the sudo privileges are available in the stx platform
when the user connects with SSH.
PASS: Create a user with sys_protected privileges in openldap
and verify that the sys_protected privileges are available in
the stx platform when user connects with SSH.
PASS: Configure a sys_protected group in a remote WAD server and
verify it has been cached in the stx platform.
PASS: Add a WAD user to the sys_protected WAD group and verify the
user has sys_protected privileges in the stx platform.
PASS: Configure a sudo rule for a remote WAD user and verify the user
has sudo privileges in the stx platform.
PASS: Verify that a regular WAD user that has no sudo rules defined,
does not have sudo privileges in the stx platform.
Story: 2010589
Task: 47588
Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/876393
Signed-off-by: Carmen Rata <carmen.rata@windriver.com>
Change-Id: Id505d462cca26daad3fd82a49929e41a3d2cc1f4
When an application is deleted, its information remains
cached in all dicts and lists from apps_metadata, which has this
structure:
apps_metadata =
{apps: {},
platform_managed_apps_list: {},
desired_states: {},
ordered_apps: []}
This behavior was causing a bug after an upgrade from
stx 6 to stx 8: an app that has no support in stx 8 is
deleted but remains cached in apps_metadata, and when
k8s_application_audit() runs after the upgrade is complete,
the system uses the cached list to try to upload this app until
the upload-failed state appears in the alarms list.
A new method is now called right after an application is
deleted, so that the app is completely removed from the
system.
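The cleanup can be sketched against the apps_metadata structure shown above. This is an illustration under assumptions: remove_app_metadata is a hypothetical name, the dicts are assumed to be keyed by app name, and ordered_apps is assumed to be a list of app names.

```python
def remove_app_metadata(apps_metadata, app_name):
    """Drop every cached reference to app_name from apps_metadata."""
    # Dict-based collections: assumed to be keyed by application name.
    for key in ("apps", "platform_managed_apps_list", "desired_states"):
        apps_metadata[key].pop(app_name, None)
    # List-based collection: assumed to hold application names.
    apps_metadata["ordered_apps"] = [
        name for name in apps_metadata["ordered_apps"] if name != app_name
    ]


# "old-app" and "kept-app" are made-up names for the demonstration.
metadata = {
    "apps": {"old-app": {}, "kept-app": {}},
    "platform_managed_apps_list": {"old-app": None},
    "desired_states": {"old-app": "uploaded"},
    "ordered_apps": ["old-app", "kept-app"],
}
remove_app_metadata(metadata, "old-app")
print(metadata["ordered_apps"])
```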
Test-Plan:
PASS: Upgrade from stx 6.0 to 8.0 in a standard config lab.
PASS: During the upgrade, an app from the previous version
which has no support in the next version is automatically
deleted.
PASS: Validate that the deleted app is not in any
collection from apps_metadata.
PASS: Validate that the upload attempt for the deleted app no longer
happens.
Closes-Bug: 2009025
Signed-off-by: Gabriel de Araújo Cabral <gabriel.cabral@windriver.com>
Change-Id: I6a54218b398493acc931c5eca34b800383b16cc0
This task will adapt existing implementation to run full certificate
expiration audit in "health-query-upgrade" and return fail in
_check_alarms in case of existence of any cert alarm in the system.
Both "expiring soon" and "expired" alarms will block upgrades, but
can be skipped with the use of the force flag. This change will also
add information about certificate expiration alarms to the line
related to existing alarms in the output of "health-query-upgrade".
Note: Now that 'keystone_opt_group' is used for both cert_alarm and
health.py, the variable 'keystone_authtoken' had to be changed
to 'KEYSTONE_AUTHTOKEN' to match with the key that is used by
the CONF object from health.py which is configured as
uppercase in line 118 of openstack.py.
Test Plan:
PASS: Run "health-query-upgrade" with one or more 'expiring soon' or
'expired' alarms and verify that a message is shown in the
'health-query-upgrade' output saying that there are certificate
expiration alarms.
PASS: Run 'health-query-upgrade' with no active certificate alarm and
verify that no certificate alarms were shown in the output of
'health-query-upgrade'.
PASS: Run 'system upgrade-start' with the --force flag with one or more
certificate alarms and verify that the upgrade can be started
normally.
PASS: Add a new certificate with expiry date of less than 30 days
and run 'health-query-upgrade' before the scheduled full audit
runs and check if the alarm was created and detected by
'health-query-upgrade'.
PASS: Delete secret from a certificate that is monitored by cert-mon
and check if cert-mon was able to reinstall the secret to the
filesystem.
Task: 47478
Story: 2009303
Signed-off-by: Karla Felix <karla.karolinenogueirafelix@windriver.com>
Change-Id: Iaba585b6ecd7f63e0ed186f87c7274c4b9778889
In commit cf94bebd9 we added a routine to edit the saved bootstrap
config file. Originally this was fine since the file was guaranteed
to exist.
However, we now want to be able to do a "skip version" upgrade from
earlier releases where the bootstrap config file won't exist. In this
case we should just ignore the missing file.
Accordingly, catch the specific error case and emit an informational
log rather than failing the K8s upgrade.
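The pattern described above, catching only the missing-file case and logging it, might look like the following sketch. The function name edit_bootstrap_config and its arguments are hypothetical and do not reflect the actual routine added in cf94bebd9.

```python
import logging

LOG = logging.getLogger(__name__)


def edit_bootstrap_config(path, edit_fn):
    """Apply edit_fn to the saved bootstrap config, if the file exists.

    Returns True if the file was edited, False if it was missing.
    """
    try:
        with open(path) as config_file:
            content = config_file.read()
    except FileNotFoundError:
        # Expected on "skip version" upgrades from earlier releases.
        LOG.info("Bootstrap config %s not found, skipping edit", path)
        return False
    with open(path, "w") as config_file:
        config_file.write(edit_fn(content))
    return True
```

Catching only FileNotFoundError keeps other failures, such as permission errors, visible instead of silently ignoring them.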
Closes-Bug: 2009619
Change-Id: I103f9b769a4803f85624a8fa4014ae11fa4cf227
Signed-off-by: Chris Friesen <chris.friesen@windriver.com>