This commit adds an upgrade-script to enable and configure IPsec on
multi-node systems. IPsec must be enabled only after all other
upgrade-scripts have been executed, to prevent network
instability.
This script prepares the active controller environment and executes the
initial-auth operation on each node that is pending IPsec configuration.
An ansible-playbook is executed to contact the other nodes and trigger
their initial-auth operation request to the IPsec server. As a result of
the playbook execution, IPsec is configured on the nodes. If any node
remains unconfigured, the script exits with an exception.
Note that mtce_heartbeat_failure is updated to its default value
only after IPsec is successfully enabled by the execution of this
ansible-playbook.
The IPsec server port is set to 64764 as 54724 may be used for k8s
services.
Test Plan:
PASS: Deploy AIO-DX system and upgrade software version from stx 8 to
stx 9. Observe that 100-enable-ipsec-on-hosts.py script is
executed successfully and IPsec is enabled/configured on all
nodes. The nodes remain online in the unlocked/enabled/available
state.
PASS: Deploy AIO-DX system on stx 9 version and manually execute
100-enable-ipsec-on-hosts.py script. Observe that IPsec is
already enabled/configured on all nodes, the script is executed
successfully with no additional changes applied to the system, and
the nodes remain online in the unlocked/enabled/available state.
Depends-on: https://review.opendev.org/c/starlingx/ansible-playbooks/+/923294
Story: 2010940
Task: 50720
Change-Id: I3b3fde8f18d6c3f6d9f3ad548ff633aaabf40362
Signed-off-by: Manoel Benedito Neto <Manoel.BeneditoNeto@windriver.com>
The API does not allow a second datanetwork assignment to an interface
that already has a datanetwork assigned. This limitation should apply
only to SR-IOV interfaces, since SR-IOV plugin v3.5.1 no longer allows
multiple assignments.
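The intended rule can be sketched as follows. This is only an
illustration, not the sysinv implementation: the function name, its
arguments and the "pci-sriov" interface class string are assumptions
made for the example. The duplicate check mirrors the "datanetwork that
already exists" cases in the test plan.

```python
# Hypothetical sketch of the semantic check; names are illustrative and
# not taken from the sysinv API code.
SRIOV_CLASS = "pci-sriov"  # assumed interface class string for SR-IOV


def can_assign_datanetwork(ifclass, assigned, new_datanetwork):
    """Return True if new_datanetwork may be assigned to the interface.

    ifclass:  interface class of the target interface
    assigned: datanetwork names already assigned to that interface
    """
    if new_datanetwork in assigned:
        # The same interface/datanetwork pair must not be assigned twice.
        return False
    if ifclass == SRIOV_CLASS:
        # Only SR-IOV interfaces are limited to a single datanetwork.
        return len(assigned) == 0
    return True
```

With this rule a data or pci-passthrough interface can carry several
datanetworks, while a second assignment to an SR-IOV interface is
rejected.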
Testplan
========
[PASS] Assign a SRIOV interface and datanetwork that already exists.
Configuration should not be allowed.
[PASS] Assign multiple datanetworks to same SRIOV interface.
Configuration should not be allowed.
[PASS] Assign multiple datanetworks to one data interface. Configuration
must be allowed.
[PASS] Assign multiple data interfaces to one datanetwork. Configuration
should not be allowed.
[PASS] Assign a data interface to datanetwork that already exists.
Configuration should not be allowed.
[PASS] Assign multiple datanetworks to one pci-passthrough interface.
Configuration must be allowed.
[PASS] Assign a pci-passthrough interface to datanetwork that already
exists. Configuration should not be allowed.
Closes-Bug: 2075961
Change-Id: If7aa6f2c3dc148761b09d0b4d0ee4ea1efccf8bc
Signed-off-by: Ferdinando Terada <ferdinando.godoyterada@windriver.com>
This commit fixes the database connection issue in the
98-update-isystem-data.py script that was introduced in [1].
During the activate stage, the database does not allow peer
authentication, so a username and password are required to connect.
[1] https://review.opendev.org/c/starlingx/config/+/924072
Test Plan:
PASS: build and deploy iso on SX
PASS: run the USM upgrade to next release
Task: 50555
Story: 2010676
Change-Id: I603f1c2adb17c027b6094526e6f010ffdcdefb7d
Signed-off-by: junfeng-li <junfeng.li@windriver.com>
This script did not take into account that default-ipv4-ippool does not
exist on an IPv6 setup; it therefore failed and broke the upgrade
process.
This fix tests for the presence of default-ipv4-ippool before using it.
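A minimal sketch of such a guard is shown below. It assumes the caller
has already obtained the list of existing IPPool names; how the names
are fetched is left out, and the function name is hypothetical.

```python
# Illustrative guard only; the helper name is an assumption and the
# lookup of pool names from the cluster is intentionally omitted.
DEFAULT_IPV4_POOL = "default-ipv4-ippool"


def should_update_ipv4_pool(existing_pool_names):
    """Return True only when the default IPv4 IPPool is present."""
    return DEFAULT_IPV4_POOL in set(existing_pool_names)
```

On an IPv6-only setup the pool is absent, the function returns False,
and the script can skip the IPv4 pool update instead of failing.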
Test Plan:
PASS: execute the script on an IPv4 lab (IPIP disabled)
PASS: execute the script on an IPv6 lab (no errors)
Story: 2011124
Task: 50717
Change-Id: I59ccaad63c388fed67682571ff9f166647d8c27c
Signed-off-by: Caio Bruchert <caio.bruchert@windriver.com>
After running system interface-network-assign for the MGMT interface,
the mgmt-mac is not updated in the database and system host-show shows
it as "00:00:00:00:00:00".
There were two checks preventing the database update in case the
bootstrap flag was not set. Both checks are being removed to fix this
issue.
Test Plan:
PASS: SX: system interface-network-assign and check that mgmt_mac is ok
PASS: DX: install system and check that mgmt_mac is ok
Closes-Bug: 2075519
Change-Id: I07a1af3e661d7572e849e75a90979d872e9c9cc5
Signed-off-by: Caio Bruchert <caio.bruchert@windriver.com>
To improve the HA of Ceph in AIO-DX, a three Ceph monitor setup is
being configured. One new monitor will be running on each controller
using the new Ceph filesystem provided by host-fs command. This
filesystem will be applied automatically on both controllers during
Ceph configuration.
The floating monitor will still be running and will run on the active
controller only.
Test Plan:
PASS: Fresh install AIO-DX.
PASS: Fresh install AIO-SX.
PASS: Install AIO-DX without Ceph configured and configure Ceph in
runtime.
PASS: Fresh install Standard (2+2) and (2+2+2).
PASS: Resize Ceph monitor volume using ceph-mon-modify command with
available space and without available space in cgts-vg. The host-fs
ceph-lv must have the same size updated.
PASS: Apply ceph after already configured host-fs ceph filesystem.
Story: 2011122
Task: 50131
Depends-On: https://review.opendev.org/c/starlingx/config/+/918451
Depends-On: https://review.opendev.org/c/starlingx/integ/+/914913
Change-Id: I1c63d59dad09e8ec5f51748dd97360cce4072ce4
Signed-off-by: Felipe Sanches Zanoni <Felipe.SanchesZanoni@windriver.com>
This commit updates the software_version field in the i_system
table during platform upgrade. The script is executed during
the activate or activate-rollback stage.
The software_version field can only be updated through a direct
database operation, which is the same approach the legacy upgrade uses.
Test Plan:
PASS: build and deploy the iso on SX
PASS: manually run the script
Task: 50555
Story: 2010676
Change-Id: I3e901b780766d7bad3b6b05c00adefb99defbbd5
Signed-off-by: junfeng-li <junfeng.li@windriver.com>
Update the 65-k8s-app-upgrade.sh script to use the new "sysinv-app
query <k8s_target>" command. The output of this command replaces
the hard-coded list of applications that should be checked for
updates. Using the "sysinv-app query" tool, the script is now
generic and updates all applications regardless of the hard-coded
list, which has been removed.
Test plan:
PASS: build-pkgs && build-image
PASS: Install default and non-default apps
PASS: upgrade from stx-9 to master
PASS: 65-k8s-app-upgrade.sh script runs without errors
PASS: All apps have been updated successfully
Story: 2010929
Task: 50604
Change-Id: Id7d28f6c46f3417d2588843cf76be7762c1b03d6
Signed-off-by: David Bastos <david.barbosabastos@windriver.com>
After performing a DX subcloud rehoming, updating the K8S RCA was
failing due to the pods scheduled in c1 not being able to pull
images from the local registry.
Investigation showed that the cause was the secrets containing the
auth data for user 'sysinv' not being updated after the keystone
password is changed during rehoming.
There is an audit task that runs once a day and updates the auth
data inside the secrets. This task is also triggered by a callback
when keystone password changes for the user 'admin'. This review adds
the callback for the user 'sysinv' as well.
Test plan:
PASS: Deploy DC + DX subcloud
Rehome DX subcloud and perform kube-root-ca-update
Closes-bug: 2074386
Change-Id: Ia42e2a33342374d5e2c98b9b95b0713595783aa1
Signed-off-by: Marcelo de Castro Loebens <Marcelo.DeCastroLoebens@windriver.com>
Pods in the default namespace should be affined to the shared pool
(platform + application - application-isolated). In kernel 6.6,
when kube-cpu-mgr-policy is none, all pods in the default namespace
are affined to platform cores.
This change prevents setting the policy to none.
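A check of this kind could look like the sketch below. The function
name and the error text are assumptions for illustration and do not
reproduce the actual sysinv label validation.

```python
# Hypothetical validation helper; not the sysinv implementation.
def validate_cpu_mgr_policy(value):
    """Reject the 'none' policy and return any other value unchanged."""
    if value.strip().lower() == "none":
        raise ValueError(
            "kube-cpu-mgr-policy=none is not allowed on this platform")
    return value
```

Other values, such as "static", pass through unchanged.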
Test Plan:
PASS: Lock the host, try setting kube-cpu-mgr-policy=none, verify
that it prevents setting it.
Closes-Bug: 2073781
Depends-on: https://review.opendev.org/c/starlingx/stx-puppet/+/924655
Change-Id: I6ce5f6e0f97b17147e287010fedab20626e1165e
Signed-off-by: Saba Touheed Mujawar <sabatouheed.mujawar@windriver.com>
Ceph will not work on IPv6 with kernel 6.6 if it also has the
ms_bind_ipv4 option enabled.
By default, the system disables ms_bind_ipv6 for IPv4-only clusters.
The same behavior is expected for IPv6-only clusters: ms_bind_ipv4
should be disabled. Otherwise, each Ceph service tries to bind to IPv4
first, leading to daemon miscommunication and the inability to mount
RBD and CephFS volumes.
Read more: https://www.spinics.net/lists/ceph-users/msg73459.html
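The selection of the two options can be sketched as follows, assuming a
single-stack cluster identified by one address. The function name is
hypothetical; only the ms_bind_ipv4 and ms_bind_ipv6 option names come
from Ceph.

```python
import ipaddress


def ceph_bind_options(cluster_address):
    """Return ms_bind_* settings for a single-stack cluster address.

    Exactly one of the two options is enabled, matching the IP version
    of the address the Ceph daemons are expected to bind to.
    """
    version = ipaddress.ip_address(cluster_address).version
    return {
        "ms_bind_ipv4": version == 4,
        "ms_bind_ipv6": version == 6,
    }
```

For an IPv6 address this yields ms_bind_ipv4 disabled and ms_bind_ipv6
enabled, which is the combination this change sets for IPv6-only
systems.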
Depends-on: https://review.opendev.org/c/starlingx/integ/+/925015
Closes-Bug: 2074226
Test Plan:
PASS: Build all packages and generate a custom ISO
PASS: Testing Read/Write pod access using AIO-SX IPv4
PASS: Testing Read/Write pod access using AIO-DX IPv4
PASS: Testing Read/Write pod access using AIO-SX IPv6
PASS: Testing Read/Write pod access using AIO-DX IPv6
Change-Id: I8024459f150f12961a68ae709e7d72602a1c3d3c
Signed-off-by: Hediberto C Silva <hediberto.cavalcantedasilva@windriver.com>
Signed-off-by: Felipe Sanches Zanoni <Felipe.SanchesZanoni@windriver.com>
This change adds policies to the IPsec configuration to bypass platform
services that are already secured by other means. The bypassed services,
by port number, are: 22, 443, 8443, 9001, 9002, 3300, 6789, 6800:7300
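To make the port list concrete, the sketch below parses it and checks
whether a given port falls under a bypass policy. It assumes that
"6800:7300" denotes an inclusive range, which is consistent with the
7300 and 7301 cases in the test plan; the helper names are hypothetical
and unrelated to how swanctl.conf is generated.

```python
# Port list copied from the commit message above.
BYPASS_SPEC = "22, 443, 8443, 9001, 9002, 3300, 6789, 6800:7300"


def parse_port_spec(spec):
    """Turn 'a, b, lo:hi' into a list of inclusive (lo, hi) ranges."""
    ranges = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        lo, _, hi = item.partition(":")
        # A single port is stored as a one-element range.
        ranges.append((int(lo), int(hi or lo)))
    return ranges


def is_bypassed(port, ranges):
    """Return True if the port is covered by any bypass range."""
    return any(lo <= port <= hi for lo, hi in ranges)
```

Under that assumption, port 7300 is bypassed while port 7301 is still
protected by IPsec.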
Test Plan:
PASS: Multi nodes system deployment. Verify the bypass policies for the
services are defined in /etc/swanctl/swanctl.conf on all nodes.
PASS: On one node run the tcpdump command:
tcpdump -n -i ens785f0 | grep "fdff:10:80:18.* fdff:10:80:18"
| grep fdff:10:80:18::2ea7.22
where "ens785f0" is the node's mgmt network IF, "fdff:10:80:18"
is the mgmt subnet, "fdff:10:80:18::2ea7.22" is the node's mgmt
network address and service port (22 is the sshd service)
Then ssh to the node. With the bypass policy in place, the command
will show some non-ESP traffic between the source and destination nodes.
PASS: On one node run a service included in the bypass policies by
netcat (there should be no existing service listening on that
port):
nc -l 7300
On the same node run the tcpdump command:
tcpdump -n -i ens785f0 | grep "fdff:10:80:18.* fdff:10:80:18"
| grep fdff:10:80:18::2ea7.7300
where "ens785f0" is the node's mgmt network IF, "fdff:10:80:18"
is the mgmt subnet, "fdff:10:80:18::2ea7.7300" is the node's mgmt
network address and service port (7300 is the service by nc).
On another node, access the netcat service by:
nc -v <node where the netcat service running> 7300
the tcpdump command will show some non-ESP traffic between the
source and destination nodes.
PASS: On one node run a service excluded in the bypass policies by
netcat (there should be no existing service listening on that
port):
nc -l 7301
On the same node run the tcpdump command:
tcpdump -n -i ens785f0 | grep "fdff:10:80:18.* fdff:10:80:18"
| grep fdff:10:80:18::2ea7.7301
where "ens785f0" is the node's mgmt network IF, "fdff:10:80:18"
is the mgmt subnet, "fdff:10:80:18::2ea7.7301" is the node's mgmt
network address and service port (7301 is the service by nc).
On another node, access the netcat service by:
nc -v <node where the netcat service running> 7301
the tcpdump command will NOT show any non-ESP traffic between the
source and destination nodes, because the traffic is encapsulated in
ESP.
Story: 2010940
Task: 50661
Change-Id: I220d7eddf6aab2f33d8fd882adfb75a76edc3320
Signed-off-by: Andy Ning <andy.ning@windriver.com>
Currently the default IPv4 IPPool configuration sets ipipMode to always,
which makes Calico use overlay networking, with IPIP encapsulation, for
pod communication between different nodes. This overlay has
caused some problems in the past and, since it's not needed, it will be
removed by this change.
The default IPv6 IPPool already uses the flat networking model, so it
needs no changes.
Besides supporting a fresh install with IPIP disabled, release upgrades
should also support disabling IPIP during data migration. That is
implemented through a new upgrade script.
Note: it was not possible to test a full upgrade using AIO-DX due to USM
bugs. This test will be done again when USM is stable.
Test Plan:
PASS: AIO-DX: fresh install: check pod communicating without encap
PASS: AIO-SX: upgrade: check that IPIP was disabled
PASS: AIO-SX: upgrade rollback: check that IPIP was re-enabled
Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/924496
Depends-On: https://review.opendev.org/c/starlingx/ansible-playbooks/+/924495
Story: 2011124
Task: 50616
Change-Id: I2d3ed96c4b60173ccceb2f55b52d54c528d0abb2
Signed-off-by: Caio Bruchert <caio.bruchert@windriver.com>
During subcloud enrollment, the openstack runtime manifest is
expected to update the service configurations on a non-DC system (not
yet updated as a subcloud). This commit allows the puppet
runtime manifest to be applied upon notification from the
keystone listener that the service users' passwords have been updated
in keystone.
Test plan:
Passed - Build an ISO with this change and bootstrap an AIO-SX and DC
system controllers.
Passed - Verified the keystone users' passwords are updated by puppet
runtime manifest after updating in keystone.
Story: 2011100
Task: 50666
Signed-off-by: Yuxing Jiang <Yuxing.Jiang@windriver.com>
Change-Id: I01373fee403165c1396387e60732f49fe9a76fa2
This commit enables the generation of the hieradata necessary to
configure dcagent on a subcloud.
Test plan (shared with puppet story):
- PASS: Bootstrap and unlock a subcloud. Verify that the endpoints
were correctly configured, the dcagent.conf file has all
necessary information and haproxy.conf includes the
dcagent entry.
- PASS: Launch the service and verify dcmanager can audit the
subcloud with dcagent.
- PASS: Bootstrap and unlock a system controller. Verify the
Keystone user and service for dcagent were created without
creating any endpoints.
- PASS: Run 'sm-restart service dcagent-api' and verify the dcagent
was correctly restarted and service is working as expected.
- PASS: Induce a failure in dcagent code. Verify sm correctly
restarts the service until the failure is corrected.
Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/923696
Story: 2011106
Task: 50563
Change-Id: I94823a8bdd4ad49a1ec67b24604e7123853de081
Signed-off-by: Victor Romano <victor.gluzromano@windriver.com>
To support upgrades with legacy certificate configuration (where the
HTTPS and local Docker registry certificates are not managed by
cert-manager), the upgrade activation script that handles the
creation/update of these certs (#41) is being modified to avoid
overwriting user-provided certificates with auto-generated ones.
If the certificates aren't managed by cert-manager and aren't
self-signed certs generated by the platform, the certificates and
keys are stored in the expected secrets (system-restapi-gui-certificate
and system-registry-local-certificate), without creating the CRDs
for cert-manager to issue new ones from system-local-ca.
This prevents the current user-provided certificates from being
replaced by new ones issued by system-local-ca, and allows the certs to
still show in the certificate list CLI call like other K8S secrets and
to be monitored for expiration. The certificates in this case will not
be auto-renewed.
In other cases, the certificates will still be issued from
system-local-ca and managed by cert-manager.
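The decision described above can be summarized in a small sketch. The
function name, its boolean inputs and the returned labels are
assumptions for illustration; the activation script itself is not
reproduced here.

```python
# Hypothetical summary of the decision; not the activation script code.
def upgrade_action(managed_by_cert_manager, platform_self_signed):
    """Decide how an existing certificate is treated during activation."""
    if not managed_by_cert_manager and not platform_self_signed:
        # User-provided certificate: keep it, store it in the expected
        # secret, and create no cert-manager resources for it.
        return "store-in-secret"
    # Otherwise the certificate is issued from system-local-ca and
    # managed by cert-manager.
    return "issue-from-system-local-ca"
```

Only the combination "not managed by cert-manager" and "not
platform self-signed" keeps the existing certificate.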
Test plan:
PASS: Upgrade AIO-SX. Tested upgrade activation in these cases:
- HTTPS and Docker Registry self-signed and issued by the
platform.
- HTTPS and Docker Registry issued from an ICA, not managed
by cert-manager.
- HTTPS and Docker Registry managed by cert-manager.
P.S.: Activate rollback will be tested as a follow up activity.
Story: 2009811
Task: 50654
Change-Id: I37858dafad1d3087442a7a68afe92c3c834502dd
Signed-off-by: Marcelo de Castro Loebens <Marcelo.DeCastroLoebens@windriver.com>
Fixes the issue where the installation failed due to the error message
"Could not find command 'configure'" in controller-1's puppet log.
The error is caused by a missing join_cmd, which is generated by
`get_host_config()` in `sysinv/puppet/kubernetes.py`.
Test Plan:
[PASS] DC(system controller- 2+1)
[PASS] fresh install/Host lock/unlock/swact
[PASS] no alarm found
Closes-Bug: 2074093
Change-Id: Iaf38ce6d990baec3fc68c4f2cf24c7a50833bbba
Signed-off-by: rummadis <ramu.ummadishetty@windriver.com>
The 'not' was removed by mistake during a rebase or comment fix. The
condition should be true only if all checks fail, so the test condition
needs to be negated.
Testing:
PASS: The build was completed with the creation of a Debian package
PASS: The designer image build has been completed.
PASS: Verify service parameter configuration using valid/invalid values
Story: 2011056
Task: 49898
Change-Id: I435533b17a9c013ebf44c75b0c63cffc1d6bc172
Signed-off-by: AbhishekJ <abhishek.jaiswal@windriver.com>
A new approach to getting the chart namespace was introduced by
review: https://review.opendev.org/c/starlingx/config/+/922932.
This solution consists of getting the helmrelease namespace
from the output of the command "kubectl kustomize <fluxcd-dir>".
However, this caused a side effect. The "kubectl kustomize"
command by default uses the kustomization.yaml file to build
the set of KRM resources.
If the application had disabled charts, the "system
application-delete" command would fail, as the disabled charts are
not listed in kustomization.yaml. In this case, the correct option
is to use the kustomization-orig.yaml file to generate the set of
KRM resources.
The solution is, when include_disabled=true, to copy the
FluxCD folder to a temporary folder, delete the
kustomization.yaml file there, and rename kustomization-orig.yaml to
kustomization.yaml. The "kubectl kustomize <temp-fluxcd-dir>"
command then uses the temporary folder. At the end of the process,
the temporary folder is deleted.
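The copy-and-swap step can be sketched as below. This is a simplified
illustration using only the standard library: the function names are
hypothetical, and the call to kubectl is left to the caller.

```python
import os
import shutil
import tempfile


def prepare_kustomize_dir(fluxcd_dir, include_disabled):
    """Return (path, is_temporary) for the directory given to kustomize."""
    if not include_disabled:
        return fluxcd_dir, False

    # Work on a copy so the application's FluxCD folder is never modified.
    tmp_root = tempfile.mkdtemp(prefix="fluxcd-")
    tmp_dir = os.path.join(tmp_root, "manifests")
    shutil.copytree(fluxcd_dir, tmp_dir)

    orig = os.path.join(tmp_dir, "kustomization-orig.yaml")
    active = os.path.join(tmp_dir, "kustomization.yaml")
    if os.path.exists(orig):
        # os.replace overwrites the destination, which covers both the
        # delete and the rename steps.
        os.replace(orig, active)
    return tmp_dir, True


def cleanup_kustomize_dir(path, is_temporary):
    """Remove the temporary copy; the original folder is never touched."""
    if is_temporary:
        shutil.rmtree(os.path.dirname(path), ignore_errors=True)
```

A caller would run "kubectl kustomize" against the returned path and
then call cleanup_kustomize_dir() in a finally block.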
Test Plan:
PASS: Upload, apply, remove and delete platform-integ-app
PASS: Upload, apply, remove and delete metrics-server
PASS: Upload, apply, remove and delete Kubevirt
PASS: Upload, apply, remove and delete dell-storage
PASS: Upload, apply, remove and delete NFD
PASS: Upload, apply, remove and delete
intel-device-plugins-operator
Closes-bug: 2073888
Change-Id: I3d7f969b9eb18a2e262cccf1ce981818d2576d9e
Signed-off-by: David Bastos <david.barbosabastos@windriver.com>
Added logging instructions to the sysinv-api startup script to measure
the time that SM actions (status, start, stop) take.
Test Plan:
PASS: AIO-DX installation, after installation, verify time for SM
actions are logged in /var/log/daemon-ocf.log
Story: 2010940
Task: 50621
Change-Id: Ia448008c595a022d619e0477a5cd9cb1d9209a03
Signed-off-by: Andy Ning <andy.ning@windriver.com>
This change introduces support for the OPTIONS HTTP method
in the root controller of the sysinv-api.
Additionally, the OCF monitor function is updated to send
an HTTP OPTIONS request using `curl` to check the health
of the sysinv-api service. This ensures that the service
is properly monitored and health checks are performed
reliably.
To facilitate this, the sysinv-api now supports an
unauthenticated endpoint using the OPTIONS method, allowing
for health checks without requiring keystone authentication.
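The health-check idea can be illustrated with the standard library
alone. In this sketch the handler is a stand-in server, not the
sysinv-api root controller, and the client uses http.client instead of
the curl call made by the OCF script; the class name, function name and
the "GET, OPTIONS" Allow value are assumptions.

```python
import http.client
from http.server import BaseHTTPRequestHandler


class RootOptionsHandler(BaseHTTPRequestHandler):
    """Stand-in server that answers OPTIONS without authentication."""

    def do_OPTIONS(self):
        self.send_response(204)                  # No Content
        self.send_header("Allow", "GET, OPTIONS")
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the example quiet


def is_healthy(host, port, timeout=5.0):
    """Return True if an OPTIONS request to '/' answers with HTTP 204."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("OPTIONS", "/")
        response = conn.getresponse()
        response.read()
        return response.status == 204
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout or malformed reply: not healthy.
        return False
    finally:
        conn.close()
```

A monitor built this way reports a failure both when the service does
not answer and when it answers with an unexpected status.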
Test Plan:
[PASS] Verified that the OPTIONS method correctly lists the
allowed methods
[PASS] Confirmed that the OPTIONS method responds with HTTP
204 No Content, indicating successful handling of the request.
[PASS] Verify the failure path by making sysinv-api non-responsive
to OPTIONS requests
[PASS] Verify system host-lock/unlock and system host-swact
[PASS] no failed services (fm alarm-list; sudo sm-dump)
[PASS] Verify sm-restart
[PASS] Verify monitor service works fine in IPV6 lab
Story: 2011106
Task: 50614
Change-Id: Ibe97359de40a60db8d2d84fac8200443c77b7c85
Signed-off-by: rummadis <ramu.ummadishetty@windriver.com>
A change in the approach of how WAD users and groups are mapped
to Linux groups prompted this change in the SSSD configuration.
In the current implementation the mapping of WAD user and group
IDs is based on Posix schema IDs such as uidNumber and gidNumber.
This approach comes with extra work on the WAD server side to
create sudo rules for the WAD users/groups that are going to be
awarded sudo privileges. In addition, WAD users and groups
need to be configured with uidNumber and gidNumber in order to
have their group membership mapped to a Linux group.
A simplified approach is to use Linux pam_group configuration to
map/bind LDAP users and groups to Linux groups in order to award
stx platform Roles and Permissions to LDAP users/groups.
The new approach that uses the Linux pam_group configuration,
requires this commit to reinstate the former default SSSD
configuration, where the WAD users/groups are mapped to Linux
users/groups on stx platform, using Windows Security Identifiers
(objectSid).
Test Plan:
PASS: Successful install in AIO-SX system configuration.
PASS: The WAD users and groups have been mapped successfully
using ActiveDirectory objectSID mapping. No ID setting was required
on the WAD server
PASS: Add pam_group.so configuration in /etc/pam.d/common-auth.
Add a line in /etc/security/group.conf to map users of a WAD group
to a list of stx platform groups, e.g. "sys_protected", "sudo" and
"root". Login with a ldap user in the mapped WAD group and verify it
has membership in all the stx platform groups in the list.
PASS: Verify sudo capabilities of ldap users in the WAD group mapped
to the stx platform "sudo" group.
PASS: Verify WAD users mapped to the stx platform "sys_protected"
group can perform privileged operations like
"source /etc/platform/openrc".
PASS: Verify any WAD user has access to kubectl.
PASS: Add a line in /etc/security/group.conf to map users of the
local OpenLDAP group to a list of stx platform groups, e.g.
"sys_protected", "sudo" and "root". Login with a ldap user of the
mapped local OpenLDAP group and verify it has membership in all the
stx platform groups in the list.
Verify sudo, root and sys_protected privileges of an ldap user
added to those stx platform groups.
Story: 2011180
Task: 50608
Signed-off-by: Carmen Rata <carmen.rata@windriver.com>
Change-Id: Iff03896de2e9e6aae80b4d991839b82458b71b41
This commit adds IPsec support for multi-layer CA certificates by
splitting intermediate CAs into different files. This procedure
is needed because IPsec reads only one certificate per file.
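Splitting a PEM bundle into one certificate per file can be sketched as
follows. The function name is hypothetical, and the "_l1", "_l2" suffix
scheme is an assumption inferred from the file listing in the test
plan; the sketch returns file names and contents without writing to
disk.

```python
import re

# Matches one PEM-encoded certificate block, markers included.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----",
    re.DOTALL,
)


def split_ca_bundle(bundle_text, base_name):
    """Map output file names to single-certificate PEM strings.

    The first certificate keeps the base name; the following ones get
    a _l1, _l2, ... suffix (naming assumed from the test plan).
    """
    files = {}
    for index, match in enumerate(PEM_CERT.finditer(bundle_text)):
        suffix = "" if index == 0 else "_l%d" % index
        files["%s%s.crt" % (base_name, suffix)] = match.group(0) + "\n"
    return files
```

For a bundle with three certificates and the base name
"system-local-ca-0", the result has the keys system-local-ca-0.crt,
system-local-ca-0_l1.crt and system-local-ca-0_l2.crt.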
Test Plan:
PASS: Bootstrap AIO-DX with a multi layer CA certificate.
PASS: After unlocking controller-0 and controller-1, verify
IPsec is enabled and the SAs are established.
PASS: Verify the certificates in the directory /etc/swanctl/x509ca/
and observe that it looks like this:
- system-local-ca-0.crt
- system-local-ca-0_l1.crt
- system-local-ca-0_l2.crt
- system-local-ca-1.crt
- system-local-ca-1_l1.crt
- system-local-ca-1_l2.crt
- system-root-ca-0.crt
- system-root-ca-1.crt
Story: 2010940
Task: 50653
Change-Id: I6859eac29b275ad8c4dcaf8db208ee5a16d83b66
Signed-off-by: Leonardo Mendes <Leonardo.MendesSantana@windriver.com>
This change adds a semantic check to host-add to ensure that the
provided BMC IP has the same IP version as the primary OAM address pool.
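A version comparison of this kind can be written with the ipaddress
module, as in the sketch below. The function name and the choice of
representing the OAM pool by its subnet are assumptions; the actual
host-add code is not shown.

```python
import ipaddress


def bmc_matches_oam_family(bmc_ip, oam_subnet):
    """Return True if the BMC address and OAM subnet share an IP version."""
    bmc = ipaddress.ip_address(bmc_ip)
    # strict=False accepts a subnet written with host bits set.
    oam = ipaddress.ip_network(oam_subnet, strict=False)
    return bmc.version == oam.version
```

An IPv6 BMC address checked against an IPv4 OAM subnet returns False,
which corresponds to the failing host-add cases in the test plan.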
Test plan
[PASS] In an IPv4 setup, add a host with IPv4 BMC address, check that
command succeeds.
[PASS] In an IPv4 setup, add a host with IPv6 BMC address, check that
command fails.
[PASS] In an IPv6 setup, add a host with IPv6 BMC address, check that
command succeeds.
[PASS] In an IPv6 setup, add a host with IPv4 BMC address, check that
command fails.
Story: 2011027
Task: 50648
Change-Id: I5bd7925161b6e5eaa6ce6c4b28d4d92addfdaf0c
Signed-off-by: Lucas Ratusznei Fonseca <lucas.ratuszneifonseca@windriver.com>
91-upgrade-dm.sh needs to run after 95-watch-apps-upgrade.sh.
This change renames 91-upgrade-dm.sh -> 86-upgrade-dm.sh, and
95-watch-apps-upgrade.sh -> 82-watch-apps-upgrade.sh to ensure the
correct execution order.
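The effect of the rename can be checked with a small sketch, assuming
the upgrade scripts are executed in ascending order of their numeric
prefix. The helper name is hypothetical and does not reproduce the
upgrade framework.

```python
import re


def execution_order(script_names):
    """Sort script names by their leading numeric prefix."""
    def prefix(name):
        match = re.match(r"(\d+)-", name)
        if match is None:
            raise ValueError("no numeric prefix: %r" % name)
        return int(match.group(1))

    # The name is a tie-breaker for scripts that share a prefix.
    return sorted(script_names, key=lambda name: (prefix(name), name))
```

With the old names the dm script sorts first; with the new names
82-watch-apps-upgrade.sh sorts ahead of 86-upgrade-dm.sh.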
Closes-bug: 2073931
TCs:
passed: SX USM major release deploy activate.
Change-Id: I4edc264685f2b821fc19b152e100f18801394f79
Signed-off-by: Bin Qian <Bin.Qian@windriver.com>