3656 Commits

Author SHA1 Message Date
Zuul
9140826fdd Merge "Create SYSTEM_CONFIG in sysinv.common.platform_firewall" 2023-06-30 19:41:21 +00:00
Andre Kantek
1f4a9dd8fd Create SYSTEM_CONFIG in sysinv.common.platform_firewall
As part of story 2010591 the hard-coded values for L4 ports in
puppet were moved to sysinv.common.constants and exported there to
"system.yaml".

But the bootstrap runs before this file is available, hence the
port definitions are not available, and barbican failed during the
subcloud bootstrap.

To avoid that we are exporting all L4 ports that were exported to
system.yaml to the runtime.yaml file by accessing SYSTEM_CONFIG
during bootstrap.

Test Plan:
[PASS] AIO-DX standalone install/unlock/enable
[PASS] AIO-DX system controller install/unlock/enable
[PASS] AIO-SX subcloud install/unlock/enable

Closes-Bug: 2025361


Change-Id: Ieca1d6dd764622ec575fc29b73cc167cae968a5e
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
2023-06-30 18:59:21 +00:00
Thales Elero Cervi
a5050c48fa Customize sysinv dpdk_elf_file for OVS-DPDK
As part of Debian migration, the sysinv procedure to check DPDK
compatibility for each host interface was also updated in order to make
it customizable in case one would like to use a virtual switch other than
the delivered OVS with DPDK support [1].

For other virtual switches, that might or might not rely on DPDK, the ELF
target that sysinv uses to verify interface compatibility must be
customizable and the query_pci_id script is already able to use custom
values [2].

This change adds the required logic to the sysinv ovs puppet module such
that it is able to customize the hiera data with the correct OVS-DPDK ELF
file. This is not strictly necessary, since it is configuring sysinv
with query_pci_id default ELF value, but would be an example for anyone
that wants to use a different virtual switch in the future and needs to
update the sysinv configuration likewise.

[1] https://review.opendev.org/c/starlingx/config/+/872979
[2] https://opendev.org/starlingx/config/src/branch/master/sysinv/sysinv/sysinv/scripts/query_pci_id#L34

Test Plan:
PASS - Build sysinv packages
PASS - Build a custom stx ISO with the new packages
PASS - Bootstrap AIO-SX virtual system (vswitch_type=none)
       and ensure the hiera data was not modified and
       sysinv.conf was not updated
PASS - Bootstrap AIO-SX virtual system (vswitch_type=ovs-dpdk)*
       and ensure the hiera data was modified correctly and
       sysinv.conf was updated accordingly
* A successful complete installation with ovs-dpdk is still blocked by
a bug that will be solved soon:
https://bugs.launchpad.net/starlingx/+bug/2008124

Story: 2010317
Task: 46389

Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/887102

Signed-off-by: Thales Elero Cervi <thaleselero.cervi@windriver.com>
Change-Id: I4e33fb86199b3a0de7015aab44a66ef84138fca3
2023-06-29 20:16:43 -03:00
Zuul
55f6ca8659 Merge "Fix to clear the alarms on compute/storage nodes" 2023-06-27 15:57:17 +00:00
Zuul
799aeb596e Merge "Sysinv exports OAM firewall L4 ports to system.yaml" 2023-06-27 15:40:52 +00:00
Zuul
9940df5a4a Merge "Fix sysinv for TLS system local CA secret" 2023-06-27 14:22:52 +00:00
Zuul
c868814fac Merge "Update ptp config generator to accept new phc2sys params" 2023-06-27 14:06:53 +00:00
Andre Kantek
954f1cd666 Sysinv exports OAM firewall L4 ports to system.yaml
This change creates the necessary L4 ports constants to be exported
to system.yaml, to be used by the respective puppet classes.

This is necessary for the next change that will use the constants in
the new implementation to create the OAM firewall, inside sysinv.

The test below validates that the correct values are present in the OAM
firewall.

Test Plan:
[PASS] Install, Lock, Unlock AIO-SX
[PASS] Install, Lock, Unlock AIO-DX (as SystemController)

Story: 2010591
Task: 48254

Change-Id: Ib5f314eea457f99d1b9834a37f0cfa00b3b1d01f
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
2023-06-27 07:47:22 -03:00
Marcelo de Castro Loebens
a767d68ae1 Fix sysinv for TLS system local CA secret
The sysinv component of the issue is related to the fact that
subclouds are bootstrapped with a 'system-local-ca' of type Opaque,
but cert-manager migration playbook replaces it by a TLS one. Sysinv
must therefore be able to deal with both of the possible types.

This requires two changes in config:
- In the openldap puppet plugin, making it capable of reading both
types for this secret;
- In conductor, the API called by orchestration while upgrading
subclouds will create the secret or replace the existing one matching
the current type (Opaque or TLS).

These changes match a change in distcloud, where the key will always be
sent alongside the cert. Conductor will decide whether storing
the key is required (TLS) or not (Opaque).

Test plan:
- Deploy SX, DX and DC with both SX and DX subclouds.
- Change the 'system-local-ca' type from Opaque to TLS using
  SystemController's 'system-local-ca' data (*).
- Upgrade SX, DX and DC Systems with SX and DX subclouds from 21.12
  and 22.06 to designer iso 22.12. (**)

P.S.:
(*) This was done in previous releases by cert-manager migration
    playbook and requires fixes moving forward in ansible-playbook
    module:
https://review.opendev.org/c/starlingx/ansible-playbooks/+/878916

(**) Due to the existence of an upgrade start script called in the
     'from' side that will overwrite the secret after this code is
     called, this change will only have effects in upgrades moving
     forward.

Partial-Bug: 2012435
Depends-on: https://review.opendev.org/c/starlingx/distcloud/+/882106

Signed-off-by: Marcelo de Castro Loebens <Marcelo.DeCastroLoebens@windriver.com>
Change-Id: I2f044c20bcf402735deb5128ef4638238ac7441a
2023-06-26 18:13:23 -04:00
Ayyappa Mantri
236a907296 Fix to clear the alarms on compute/storage nodes
When remote ldap service parameters are added, the config out-of-date
alarms raised on compute and storage nodes are not cleared. This fix
addresses the issue by updating the personalities with compute and
storage node types for host update configs with reboot=False.

Test Cases:
PASS: Add remote ldap service parameters on standard lab with
      compute and storage nodes and apply, verify the alarms
      are cleared.

Closes-bug: 2024916

Change-Id: I3c45d89910db0818daf542edae91b3633cd34173
Signed-off-by: Ayyappa Mantri <ayyappa.mantri@windriver.com>
2023-06-26 12:14:01 -04:00
Zuul
8518100966 Merge "Improve error handling when loading app plugins" 2023-06-23 21:14:09 +00:00
Zuul
925955e9cc Merge "Firewall: enable IGMP for mgmt and cluster-host networks" 2023-06-23 20:29:00 +00:00
Igor Soares
9c6d1a77eb Improve error handling when loading app plugins
Improve error handling when loading app plugins in scenarios where DRBD
fails to sync the helm overrides folder.

This commit fixes a behavior that, in such scenarios, triggers the
generic app plugin which, in turn, raises the following cryptic error
message: "Automatic apply is disabled".

That behavior was changed to raising a SysinvException while attempting
to load the plugins. In addition, the exception now contains a more
descriptive message, clearly stating that there was a failure while
loading the application plugins, also mentioning the specific plugin
folders that could not be found. That exception is handled and logged
so that sysinv can keep running. If DRBD succeeds in syncing after failing,
plugins are properly loaded and normal operation can resume.
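
The behavior described above can be sketched as follows. This is a
minimal illustration, not the sysinv code: PluginLoadError stands in for
SysinvException, and the function names and message wording are
hypothetical.

```python
import logging
import os

LOG = logging.getLogger(__name__)


class PluginLoadError(Exception):
    """Hypothetical stand-in for the SysinvException mentioned above."""


def check_plugin_dirs(plugin_dirs):
    """Raise a descriptive error naming every plugin folder that is missing."""
    missing = [d for d in plugin_dirs if not os.path.isdir(d)]
    if missing:
        raise PluginLoadError(
            "Failed to load application plugins; folders not found: %s"
            % ", ".join(missing))


def try_load_plugins(plugin_dirs):
    """Return True when all folders exist; log and return False otherwise."""
    try:
        check_plugin_dirs(plugin_dirs)
    except PluginLoadError as exc:
        # Log instead of propagating, so the calling service keeps running
        # and can retry once the folders show up again.
        LOG.error("%s", exc)
        return False
    return True
```

Calling try_load_plugins() again after the folder reappears returns True,
which mirrors the recovery path once DRBD finishes syncing.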

Test Plan:
PASS: build-pkgs and build-image
PASS: AIO-SX full system deploy
PASS: Simulate the DRBD sync failure by renaming /opt/platform/helm and
      restarting sysinv. Then watch the exception being raised and
      logged referencing the correct plugin folder, rather than
      displaying the "Automatic apply is disabled for
      platform-integ-apps" message. Rename the overrides folder back to
      its original name and check if plugins were correctly loaded.
PASS: Same as above but renaming the folder back and forth while
      sysinv is running.

Closes-Bug: 2024491
Change-Id: Iefa4259fd468a9ae582fc1138b1d1022eba36b0d
Signed-off-by: Igor Soares <Igor.PiresSoares@windriver.com>
2023-06-23 15:15:53 -03:00
Zuul
4bbf8d7dd8 Merge "Add kubernetes endpoint health-check with timeout" 2023-06-23 14:19:25 +00:00
Caio Bruchert
03acc6e166 Firewall: enable IGMP for mgmt and cluster-host networks
Depending on the IGMP switch configuration between the nodes, IGMP
query and notification packets need to be exchanged. If the firewall
drops IGMP packets, the querier will not receive the IGMP report packets
and as a result multicast packets will not be forwarded by the switch.
Since hbs uses multicast, the controllers will not be able to sync
correctly.

This fix adds both ingress and egress rules to allow IGMP packets on
management and cluster-host networks.

Test Plan
[PASS] check globalnetworkpolicies IGMP rules for mgmt and cluster-host
[PASS] on each controller, check with tcpdump that it's receiving IGMP
      replies and sending IGMP reports
[PASS] on each controller, check with tcpdump that it's receiving hbs
       multicast packets from both controllers
[PASS] check with kubectl that controllers are in sync

Story: 2010591
Task: 48271

Signed-off-by: Caio Bruchert <caio.bruchert@windriver.com>
Change-Id: Idaca99a1cf774854fd340cce7f52758f053503e6
2023-06-23 09:07:03 -03:00
Zuul
3e69e2b6d2 Merge "Use fully qualified names for WAD users/groups" 2023-06-22 22:48:45 +00:00
Zuul
c802738bb6 Merge "Fix file descriptor leak on zerorpc Client" 2023-06-22 22:11:49 +00:00
Alyson Deives Pereira
3618b76d94 Fix file descriptor leak on zerorpc Client
Reuse zerorpc client with client_provide.get_client_for_endpoint() to
avoid leaking file descriptors.
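
The general idea of reusing one client per endpoint can be sketched as
below. This is an assumption-laden illustration, not the sysinv
implementation: the ClientProvider class and the factory argument are
hypothetical, and no zerorpc calls are made.

```python
import threading


class ClientProvider:
    """Hypothetical cache that hands out one client object per endpoint."""

    def __init__(self, factory):
        # factory is any callable that builds a client for an endpoint
        # string; in real code this is where a socket would be opened.
        self._factory = factory
        self._clients = {}
        self._lock = threading.Lock()

    def get_client_for_endpoint(self, endpoint):
        """Return the cached client, creating it only on first use."""
        with self._lock:
            client = self._clients.get(endpoint)
            if client is None:
                client = self._factory(endpoint)
                self._clients[endpoint] = client
            return client
```

Because a client is created once per endpoint instead of once per call,
the number of open sockets stays bounded by the number of endpoints.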

Test Plan (AIO-DX):
PASS: Without this change, set controller-0 to not use zeromq and
configure controller-1 to use RPC hybrid_mode. Verify the number of
fd files on controller-1 at /proc/<sysinv-agent-pid>/fd keeps
increasing over time.

PASS: With this change on controller-1 and the same configuration
above, verify that the number of fd files on controller-1 does not
increase over time.

PASS: With this change on controller-1, update controller-0 to use
zeromq instead of rabbitmq, and keep controller-1 on hybrid mode.
Verify that no errors occur in the logs and that the number of fds does
not increase.

Closes-Bug: 2024834
Signed-off-by: Alyson Deives Pereira <alyson.deivespereira@windriver.com>
Change-Id: I85733533b1ff3a2ef869ae2b23730527fc24466e
2023-06-22 17:59:19 -03:00
Carmen Rata
37ec9dcab8 Use fully qualified names for WAD users/groups
WAD users and groups discovered by SSSD and imported in the stx
platform have been configured to get Linux IDs. When there are
users or groups with the same name in multiple WAD domains they
need to be unequivocally identified by stx platform NSS, using
the full name format "user_name@domain_name".
This commit sets the sssd attribute "use_fully_qualified_names" to
"true", the default value being "false". The setting ensures that
the user's full login name, "user_name@domain_name", gets reported to NSS.
So, with this change, SSSD discovered users and groups would get
fully qualified names on stx platform. All requests pertaining to a
WAD domain user or group must use the format "name@domain", for
example "getent passwd user1@wad.domain1.com".
This commit also removes 2 WAD domain attributes that are obsolete.

Test Plan:
PASS: Debian image gets successfully installed in AIO-SX system.
PASS: Configure SSSD to connect to 2 WAD domains, "wad.domain1.com"
and "wad.domain2.com".
PASS: Create 2 users with the same name "test-user", one in
wad.domain1 and the other in wad.domain2. Check using "getent passwd"
command that SSSD has cached the users with the fully qualified
name: "getent passwd test-user@wad.domain1.com" and "getent passwd
test-user@wad.domain2.com".
PASS: Check that "getent passwd test-user" or "getent passwd|grep
test-user" does not find the users.
PASS: Verify ssh works using the fully qualified names for the users.
PASS: Verify that 2 groups with the same name, "test-group", created
one in wad.domain1 and the other in wad.domain2 follow the same rules
as users with the same names.
PASS: Add test-user from wad.domain1 to the test-group in the same
domain and verify membership. Verify that test-user in wad.domain2 does
not belong to "test-group" in wad.domain1.

Story: 2010589
Task: 48270

Signed-off-by: Carmen Rata <carmen.rata@windriver.com>
Change-Id: I34388e2f1389cb39a6d258b126572e6a72308b40
2023-06-22 04:03:22 +00:00
Cole Walker
51bf24e351 Update ptp config generator to accept new phc2sys params
Updated the logic in networking.py which generates the puppet hieradata for
ptp configurations to accept a new category of parameters which will be
used for the Redundant / HA PTP timing clock sources feature.

This change checks for the presence of the "phc2sys_ha=enabled"
parameter supplied via user configuration and then allows additional
phc2sys_ha prefixed parameters to be included in the phc2sys config.

This allows the Redundant timing feature and all related parameters to
be enabled or disabled by setting/removing the above value.
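
A minimal sketch of this gating logic, under stated assumptions: the
function name is hypothetical, parameters are modeled as a plain dict,
and dropping the "phc2sys_ha" flag itself from the output is a choice
made for this example only. The actual networking.py code may differ.

```python
HA_FLAG = "phc2sys_ha"  # parameter named in the description above


def filter_phc2sys_params(params):
    """Keep phc2sys_ha-prefixed parameters only when phc2sys_ha=enabled."""
    ha_enabled = params.get(HA_FLAG) == "enabled"
    result = {}
    for key, value in params.items():
        if key == HA_FLAG:
            # The flag only controls filtering in this sketch.
            continue
        if key.startswith(HA_FLAG) and not ha_enabled:
            # HA parameters are ignored while the feature is disabled.
            continue
        result[key] = value
    return result
```

The parameter names used in any call (other than "phc2sys_ha") are
placeholders; removing the flag is enough to drop every HA parameter.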

Test Plan:
PASS: Verify phc2sys config is generated and applied when phc2sys_ha is
disabled
PASS: Verify phc2sys config is generated and applied when phc2sys_ha is
enabled, additional phc2sys_ha prefixed params are also allowed to
appear in the config as a result
PASS: System lock/unlock succeeds after code changes

Story: 2010723
Task: 48262

Change-Id: I72e0c9e615affe467a83554d3d7f6a732efccaf8
Signed-off-by: Cole Walker <cole.walker@windriver.com>
2023-06-21 12:15:56 -04:00
Zuul
2b9ea72538 Merge "Eliminate unused function in sysinv helm" 2023-06-20 20:13:01 +00:00
Joshua Reed
ef87a1b523 Eliminate unused function in sysinv helm
The file base.py in helm was identified for having a
function that is not used externally from the BaseHelm
class.

The function was verified to be unused in the sysinv
code base outside the helm folder, and unused
anywhere else as well.

The function was eliminated and uses found in helm.py
were refactored.

Test Plan:
PASS: build-pkg -a && build-image
PASS: AIO-SX full install with clean bootup.
PASS: Verified initial system applications installed
      correctly;  system application-list
PASS: Installed a new application to verify a new
      app can be installed cleanly.
      Used the metrics-server app in
      /usr/local/share/applications/helm to do so.

Story: 2010794
Task: 48246
Change-Id: I9325b3bbc3b10a465cb3535d10042461cc713c9d
Signed-off-by: Joshua Reed <joshua.reed@windriver.com>
2023-06-20 11:57:38 -07:00
Zuul
40d0d3f01c Merge "Enable firewall for DC setups" 2023-06-20 17:59:17 +00:00
Zuul
be9a075fb1 Merge "Add patch validation importing a inactive load" 2023-06-20 13:19:45 +00:00
Boovan Rajendran
a4e440f31f Add kubernetes endpoint health-check with timeout
This checks k8s control-plane component health for a specified
endpoint, and waits for that endpoint to be up and running.
This checks the endpoint 'tries' times using an API connection
timeout, and a sleep interval between tries.

The default endpoint is the localhost apiserver readyz URL
if not specified.
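
The retry pattern described above can be sketched as follows. This is
not the sysinv implementation: the function names, default values and
the http_probe helper are assumptions, and the caller must supply the
URL since the exact default readyz URL is not given here.

```python
import time
import urllib.request


def http_probe(url, timeout=5):
    """Build a probe that reports True when url answers with HTTP 200."""
    def _probe():
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    return _probe


def wait_for_endpoint_health(probe, tries=3, interval=1.0, sleep=time.sleep):
    """Call probe up to 'tries' times, sleeping 'interval' between tries."""
    for attempt in range(1, tries + 1):
        try:
            if probe():
                return True
        except OSError:
            # Connection errors and timeouts count as "not healthy yet".
            pass
        if attempt < tries:
            sleep(interval)
    return False
```

Passing the probe and sleep function as arguments keeps the loop easy to
exercise without a running API server.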

Test plan:
Pass: Verified during k8s upgrade abort it waits for all control-plane
endpoints to be healthy.

Story: 2010565
Task: 48215

Change-Id: I9aae478765cf8aa13b7769127a87e18c33b5fe0b
Signed-off-by: Boovan Rajendran <boovan.rajendran@windriver.com>
2023-06-20 03:40:41 -04:00
Zuul
33b6de2a12 Merge "Aborting kubernetes upgrade process for AIO-SX" 2023-06-16 19:27:07 +00:00
Zuul
49d74236f0 Merge "Relocate pxeboot-update script to writable dir" 2023-06-16 19:27:02 +00:00
Andre Kantek
682d17f18f Enable firewall for DC setups
As in the case for non-DC installations, internal cluster traffic for
the platform networks will receive a firewall that allows only packets
within the internal networks, by filtering only with source IP address
and not using L4 ports.

It will restrict traffic between the system-controller and the
subclouds to the L4 ports described in:
https://docs.starlingx.io/dist_cloud/kubernetes/distributed-cloud-ports-reference.html

The rules also restrict the L4 ports to only the networks involved: the
subcloud accepts traffic only from the system controller, and the
system-controller only from the subclouds.

The DC rules are applied in the management network (or the admin
network, if used in a subcloud).

Test Plan
[PASS] Install DC system-controller with firewall active
[PASS] Install DC subcloud with firewall active (using management
       network on both sides)
[PASS] Modify subcloud to use admin network during runtime
[PASS] Validate that only the registered firewall ports are
       accessible from system-controller to subcloud
[PASS] Validate that only the registered firewall ports are
       accessible from subcloud to system-controller
[PASS] Execute a subcloud rehoming

Story: 2010591
Task: 48244

Change-Id: I4d27baa601d7f9b43e6c09e703a548656f8846f4
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
2023-06-16 11:54:52 -03:00
Joshua Reed
50aa301e6d Correct Log bug in sysinv conductor.
In file kube_app.py, the class FluxCDHelper has a function
called make_fluxcd_operation.  Inside there is a LOG.error
with an invalid python string formatter.  The manifest_dir
string is missing, and this change corrects this issue.
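
The class of bug described above can be reproduced with the standard
logging module. The message text and variable values below are
hypothetical; only the shape of the problem (two placeholders, one
argument) is taken from the description.

```python
import logging


def render(msg, *args):
    """Format a log message the same way a logging handler would."""
    record = logging.LogRecord(
        "sketch", logging.ERROR, "sketch.py", 0, msg, args, None)
    return record.getMessage()


operation = "apply"                 # hypothetical value
manifest_dir = "/tmp/manifests"     # hypothetical value

# Broken form: two %s placeholders but only one argument. Formatting
# fails with TypeError, so the intended message never reaches the log.
try:
    render("Failed to %s manifest in %s", operation)
    broken_ok = True
except TypeError:
    broken_ok = False

# Fixed form: every placeholder has a matching argument.
fixed = render("Failed to %s manifest in %s", operation, manifest_dir)
```

With LOG.error() itself the TypeError is caught inside the logging
machinery and reported on stderr, which is why such a bug can go
unnoticed until someone reads the log.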

Test Plan:
PASS: build-pkg -a &&  build-image
PASS: full AIO-SX install
PASS: run system application-upload
          /usr/local/share/applications/helm/security-profiles-operator-22.12-1.tgz
      run system application-list to validate the security
          profile application is uploaded successfully
      run system application-apply security-profiles-operator to deploy
          the application
      Last: observe the corrected log output in /var/log/sysinv.log

Closes-Bug: 2024020
Change-Id: Icfffd04309721193b71654927751b783b9c6ace2
Signed-off-by: Joshua Reed <joshua.reed@windriver.com>
2023-06-15 12:44:10 -07:00
Zuul
d2039c40c0 Merge "Restart vim on admin endpoint re-config" 2023-06-15 16:47:36 +00:00
Zuul
fb2636273f Merge "Update System Inventory semantic checks to permit pci-sriov AE members." 2023-06-15 13:22:45 +00:00
Guilherme Schons
3952cf4d7f Add patch validation importing a inactive load
This commit adds a patch validation to check if the importing load is
patch compatible with the current version.

This validation uses the current metadata against the comps file inside
the patches directory, if it exists.

Test Plan:
PASS: Import a CentOS pre-patched image upgradable to current version.
PASS: Fail to import a CentOS pre-patched image non upgradable to
current version.

Story: 2010611
Task: 48198

Signed-off-by: Guilherme Schons <guilherme.dossantosschons@windriver.com>
Change-Id: I236222fd21de7004ebbfcf585f9e30d7418777c4
2023-06-14 10:09:11 -03:00
Zuul
f0cb88013a Merge "Add static route subnets to the firewall if in mgmt or admin nets" 2023-06-13 17:41:00 +00:00
Zuul
e8224cb2bd Merge "Add firewall update when admin network is updated in subclouds" 2023-06-13 17:40:54 +00:00
Mayank Patel
3aa6294ae3 Update System Inventory semantic checks to permit pci-sriov
AE members.

The current link aggregation (bonding) of platform interfaces is
restricted to ports that have an interface class of none.
The semantic check needs to be removed to permit the configuration
of Aggregated Ethernet (AE) interfaces with member interfaces
that have a class of pci-sriov. This would create a bond
interface of the SRIOV Physical Function (PF) network devices.

Test Plan:
PASS: system host-if-modify -c pci-sriov -N 10 controller-0
      enp179s0f0
PASS: system host-if-modify -c pci-sriov -N 10 controller-0
      enp179s0f1
PASS: system host-if-add -c platform --aemode active_standby
      controller-0 bond0 ae enp179s0f0 enp179s0f1
PASS: system host-if-delete controller-0 bond0
PASS: port sharing

Story: 2010706
Task: 47903
Change-Id: I8d85bbab88e3d55173bc6db012299b45ec091512
Signed-off-by: Mayank Patel <mayank.patel@windriver.com>
2023-06-13 13:20:55 -04:00
Steven Webster
d754e4a261 Restart vim on admin endpoint re-config
A problem can occur when installing a subcloud with the admin
network enabled.

The symptom is the host not enabling VIM services for approximately
one hour after the first unlock.  This may manifest in a user
noticing that the platform-integ-apps application is not enabled.

For context, after ansible bootstrap, sysinv re-configures the
admin endpoint for all services.  One reason is that
https services cannot be enabled at ansible bootstrap time.

The nfv-vim services depend on using the admin endpoints when
communicating with sysinv. This can lead to a situation where
the VIM service starts before the endpoints are re-configured by
sysinv and cannot then deal with the change in endpoint address/port.

This is likely not seen when a subcloud uses the management
network, as the internal management endpoint can also service the
requests from the VIM.

This commit simply calls into the nfv-vim runtime puppet class to
restart the VIM service on endpoint reconfigure.

Testing:

- Install system controller, ensure it comes up with no issues
- Install subcloud, ensure it comes up and that there are no
  errors in the nfv-vim log after endpoint reconfigure. In
  addition, verify that the symptom that first prompted this bug,
  the ~1 hour delay in applying platform-integ-apps, no longer
  occurs. Perform the test with the subcloud
  using both the management and admin network for communication
  with the system controller

Story: 2010319
Task: 46910

Change-Id: I677b87e949cceba77240bc62217af3889a697b40
Signed-off-by: Steven Webster <steven.webster@windriver.com>
2023-06-13 10:19:51 -04:00
Andre Kantek
a4aaf6853e Add static route subnets to the firewall if in mgmt or admin nets
This change adds, for subclouds, the static routes subnets to the
appropriate firewall, mgmt, or admin network, to restrict the set of
L4 ports available in distributed cloud setups to be used only from
the system controller.

It collects the static routes that are using the management or admin
networks and adds them to the respective firewall, in a similar way
to what is done in the system controller.

Test plan:
[PASS] Install subcloud using management network to connect with
       system controller
[PASS] Modify subcloud in runtime to use the admin network instead of
       management to connect with system controller
[PASS] Execute a subcloud rehoming operation.

Story: 2010591
Task: 48185

Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/885457
Change-Id: I2b96c364f0c69d54c08bc5e157f60be335d2b114
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
2023-06-13 11:12:33 -03:00
Boovan Rajendran
b88a2dee7a Aborting kubernetes upgrade process for AIO-SX
This change is to introduce a new system command
"system kube-upgrade-abort" which will abort the k8s upgrade
process for an AIO-SX.

The expected sequence on AIO-SX is:
 - system kube-upgrade-start <target-version>
 - system kube-upgrade-download-images
 - system kube-upgrade-networking
 - system kube-host-cordon controller-0
 - system kube-host-upgrade controller-0 control-plane
 - system kube-host-upgrade controller-0 kubelet
 - system kube-host-uncordon controller-0
 - system kube-upgrade-complete
 - system kube-upgrade-delete

For system kube-upgrade-start and system kube-upgrade-download-images,
aborting with system kube-upgrade-abort just changes the upgrade state
to aborted, since those steps do not affect anything.

For the sequence listed below on AIO-SX, the k8s upgrade can be aborted
at any stage. In that case a puppet class
'platform::kubernetes::upgrade_abort' is called, which will drain the
node, stop the kubelet, containerd, docker and etcd services, restore
the etcd snapshot and static manifest files, start the etcd, docker and
containerd services, update the bindmount, start the kubelet service
and wait for control plane pod health.

 - system kube-upgrade-networking
 - system kube-host-cordon controller-0
 - system kube-host-upgrade controller-0 control-plane
 - system kube-host-upgrade controller-0 kubelet
 - system kube-host-uncordon controller-0

The initial Kubernetes version control plane state is stored in a backup
containing etcd snapshot and static-pod-manifests. This backup is taken
when 'system kube-upgrade-networking' is issued.

The "system kube-upgrade-abort" command will only be available prior to
the "system kube-upgrade-complete".

Test Case:
AIO-SX: Fresh install ISO as AIO-SX. Perform k8s upgrade v1.24.4 -> v1.25.3
PASS: Create a test pod before the etcd backup and delete the pod
after the snapshot is taken. Run the command "system kube-upgrade-abort"
and verify the test pod is running after etcd is restored successfully.
PASS: Verified by performing initial bootstrap and host-unlock prior to
bootstrap.
PASS: Verify kubeadm and kubelet version restored successfully to the
from version after k8s upgrade abort.
PASS: Verify static manifests are restored successfully after k8s
upgrade abort.
PASS: Verify /etc/fstab content updated successfully after k8s upgrade
abort.
AIO-DX:
PASS: "system kube-upgrade-abort" will raise an error
'system kube-upgrade-abort is not supported in duplex'.

Story: 2010565
Task: 47826

Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/880263
Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/883150

Change-Id: Ia18079c5f17f86fb73776bfad124c72a0a3be6ad
Signed-off-by: Boovan Rajendran <boovan.rajendran@windriver.com>
2023-06-13 03:36:03 -04:00
Zuul
45d47fdfaf Merge "Fix rollback for Standard config" 2023-06-09 21:19:49 +00:00
Zuul
c77700fa61 Merge "Playbooks directory cleanup on system load-delete" 2023-06-09 20:12:03 +00:00
Zuul
d4b061b26f Merge "Add unit tests to load-delete workflow" 2023-06-09 20:11:57 +00:00
Luiz Felipe Kina
3093104db0 Fix rollback for Standard config
After upgrading the second controller on a Standard config and rolling
back a software upgrade on 22.12, the current upgrade abort procedure
broke the Ceph monitor quorum resulting in blocked operations to the
Ceph cluster.

During a platform upgrade of a standard controller deployment, two of
three Ceph monitors are required to be unlocked/enabled to maintain a
Ceph quorum and provide uninterrupted access to persistent storage. The
current procedure and semantic checks required the operator to lock the
3rd monitor on a worker while locking and rolling back the software on
one of the active controllers. This left only one monitor active and
the Ceph monitors without an active quorum.

Another problem was that ceph on controller-1 was being
turned off and not turned back on.

The change is to the semantic check, so that the host which has a
monitor no longer needs to be locked and ceph on controller-1 stays
up during the entire process of rolling back controller-0.

Test Plan:
Pass: Install Standard 21.12 Patch 9, upgrade to 22.12
make the changes and rollback the upgrade.
Pass: Install Storage 21.12 Patch 9, upgrade to 22.12
make the changes and rollback the upgrade.

Closes-Bug: 2022964

Signed-off-by: Luiz Felipe Kina <LuizFelipe.EiskeKina@windriver.com>
Change-Id: I607a0b2bbf2fa847e8b76425ea5f940be3a81577
2023-06-09 13:12:30 -04:00
Zuul
20e16db383 Merge "SX host-lock failed by "Timeout while waiting on R"" 2023-06-09 04:51:28 +00:00
Guilherme Schons
38003607e3 Playbooks directory cleanup on system load-delete
Remove playbooks directory from the imported load on the system
controller and its mate as part of system load-delete.

This directory will only exist on a DC system controller

Test Plan:
- PASS: Import an inactive load CentOS and clean up all related
directories and files after deleting.

Story: 2010611
Task: 48159
Signed-off-by: Guilherme Schons <guilherme.dossantosschons@windriver.com>
Change-Id: I191e9b0d13fc9954da8be41df7df717d4f1f5b8c
2023-06-08 20:51:34 +00:00
Zuul
96de678403 Merge "Add cluster-pod network to cluster-host firewall in IPv4" 2023-06-07 18:20:57 +00:00
Zuul
4eb916fb2e Merge "Add functionality for intel qat device plugin" 2023-06-07 14:08:37 +00:00
Andre Kantek
9269aafdbf Add cluster-pod network to cluster-host firewall in IPv4
It was observed that it is not necessarily true that all pod traffic
is tunneled in IPv4 installations. To solve that we are extending the
solution done in IPv6 to IPv4, which consists of adding the
cluster-pod network to the cluster-host firewall.

The problem showed itself when the stx-openstack application was
installed.

Test Plan:
[PASS] observe stx-openstack installation proceed with the correction

Closes-Bug: 2023085

Change-Id: I572cd85e6638d879d8be1d9992ae852a805eca4b
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
2023-06-07 09:50:47 -03:00
Lucas Ratusznei Fonseca
ea383425ed Inclusion of the subcloud networks in the firewall rules
This commit implements the inclusion of the subcloud networks in the
firewall ingress rules. The networks are obtained from the routes
table.

Test plan:

Setup: Distributed Cloud with AIO-DX as system controller.

[PASS] Add subcloud, check that the corresponding network is present in
       the system controller's firewall.
[PASS] Remove subcloud, check that the corresponding network is no
       longer present in the system controller's firewall.

Story: 2010591
Task: 48139
Depends-on: https://review.opendev.org/c/starlingx/stx-puppet/+/885303
Change-Id: Ia83c26c88914413026953fcef97af55fe65bd058
Signed-off-by: Lucas Ratusznei Fonseca <lucas.ratuszneifonseca@windriver.com>
2023-06-05 15:50:06 -03:00
Andre Kantek
2a662a4572 Add firewall update when admin network is updated in subclouds
This change adds the firewall update when the interface-network API
is executed with the admin network.

Test Plan:
[PASS] Add an admin network during runtime in a subcloud

Story: 2010591
Task: 48186

Change-Id: I7b556ec8d95bf879cb9036d654e38fd658da5a61
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
2023-06-05 15:29:20 -03:00
Guilherme Schons
393a839630 Add unit tests to load-delete workflow
This commit adds unit tests to the load-delete workflow, improving test
coverage and validating the flow for future changes.

Test Plan:
- PASS: Tox tests

Story: 2010611
Task: 48159
Signed-off-by: Guilherme Schons <guilherme.dossantosschons@windriver.com>
Change-Id: If6a27af3a76d4aff8e4168b72ad5892046fe9ba6
2023-06-05 10:52:40 -03:00