1813 Commits

Author SHA1 Message Date
Leonardo Fagundes Luz Serrano
277529f9f9 tox/zuul: Set puppetlabs_spec_helper version to 6.0.3
Temporary fix for zuul failing puppet lint gem install

TestPlan:
PASS Zuul

Closes-Bug: 2039880

Change-Id: Id5557933551c226516148ec74559305320e4597f
Signed-off-by: Leonardo Fagundes Luz Serrano <Leonardo.FagundesLuzSerrano@windriver.com>
2023-10-19 18:56:46 +00:00
Zuul
57bd4e0e95 Merge "Update kubelet system overrides on unlock" 2023-10-12 21:42:28 +00:00
Ramesh Kumar Sivanandam
82ca22f5b6 Update correct iptable config values in /etc/sysctl.d/k8s.conf
The /etc/sysctl.d/k8s.conf file is missing the below iptable config
values which causes the error in kubeadm init -
"/proc/sys/net/ipv6/conf/default/forwarding was not set to 1"
during optimized BnR operation.

net.ipv4.ip_forward = 1
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.all.rp_filter = 0
net.ipv6.conf.all.forwarding = 1

Recent changes in the below review modified the way Kubernetes is
restored. It exposes the incorrect kernel parameters in stx-puppet.
https://review.opendev.org/c/starlingx/ansible-playbooks/+/890370

This change updates the correct iptable configuration values in the
file /etc/sysctl.d/k8s.conf during bootstrap which fixes the
optimized BnR operation failure.

These settings are intended to exactly align with the settings
already being configured by the bringup-kubemaster task in the
ansible-playbooks.
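
A minimal sketch of how the resulting file could be checked, assuming the file uses plain 'key = value' lines. The helper names parse_sysctl_conf and missing_settings are hypothetical and are not part of stx-puppet; only the four settings come from this commit message.

```python
# Settings listed in the commit message above.
REQUIRED = {
    "net.ipv4.ip_forward": "1",
    "net.ipv4.conf.default.rp_filter": "0",
    "net.ipv4.conf.all.rp_filter": "0",
    "net.ipv6.conf.all.forwarding": "1",
}


def parse_sysctl_conf(text):
    """Parse 'key = value' lines, skipping blank lines and '#' or ';' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def missing_settings(text, required=REQUIRED):
    """Return the required keys that are absent or set to a different value."""
    found = parse_sysctl_conf(text)
    return sorted(key for key, value in required.items() if found.get(key) != value)
```

Passing the contents of /etc/sysctl.d/k8s.conf to missing_settings would return an empty list when all four values are present.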

Test Plan:
PASS: Fresh install ISO as AIO-SX. Verify that /etc/sysctl.d/k8s.conf
      has the correct configuration values.
PASS: Performed optimized BnR on IPv4 enabled AIO-SX.
PASS: Performed optimized BnR on IPv6 enabled AIO-SX.

Closes-Bug: 2038545

Change-Id: I585117190b2372cfd7c978eff9bd9ff6da61a88f
Signed-off-by: Ramesh Kumar Sivanandam <rameshkumar.sivanandam@windriver.com>
2023-10-11 14:43:33 -04:00
Zuul
e49328c20a Merge "Add k8s cfg file to the OAM firewall script" 2023-10-10 21:06:06 +00:00
Andre Kantek
58581b88e9 Add k8s cfg file to the OAM firewall script
In the change
https://review.opendev.org/c/starlingx/stx-puppet/+/897467 the OAM
firewall was not updated to pass the k8s config file as argument
to calico_firewall_apply_policy.sh. This caused an error that
prevented the global network policy from being created, making the OAM
interface block all traffic except for the failsafe traffic.

This change corrects that.

Test Plan
[PASS] In AIO-DX remove the current OAM GNP and execute lock/unlock
        on one of the controllers, verify the OAM GNP is recreated.
[PASS] In AIO-DX remove the current OAM GNP and force the runtime
        execution by creating the file
        /etc/platform/.platform_firewall_config_required and observe
        the request to recreate the OAM GNP

Closes-Bug: 2038550


Change-Id: Ica03dbf6ffd9f6f592fa53efa40293191203377a
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
2023-10-10 17:07:40 -03:00
Gleb Aronsky
f426f5c67a Update kubelet system overrides on unlock
Add logic to the platform::kubernetes::configuration method
to generate the kubelet's systemd override file. This
change ensures the file is generated every time a host is
unlocked. This facilitates delivery of systemd service changes
via patches to existing installs.

This change is needed by bug 2027810 to ensure that the
orphan volume cleanup script is executed as part of the systemd
ExecStartPre kubelet service override.

This change is an update of this reverted commit:
https://review.opendev.org/c/starlingx/stx-puppet/+/895364

Test Plan:

Pass:  - Update the kube-stx-override.conf.erb file
       - Lock the AIO-SX host
       - Unlock the AIO-SX host
       - Verify that kube-stx-override.conf has been updated
       - Verify AIO-SX fresh install
       - Verify Standard Duplex lock/unlock and
         verify that kube-stx-override.conf has been updated
       - Verify Standard Duplex Install

Partial-Bug: 2027810
Change-Id: I4e47bce634c21396acb2e5f1540cac0be3ed34ec
Signed-off-by: Gleb Aronsky <gleb.aronsky@windriver.com>
2023-10-10 12:57:56 -07:00
Zuul
2a96b5b482 Merge "Update network interfaces during upgrade bootstrap execution" 2023-10-10 18:03:22 +00:00
Gleb Aronsky
3dcd90d6eb Add missing default value for a kubernetes puppet definition
Set default value for 'onlyif' in the puppet definition
'platform::kubernetes::mask_stop_service' if no conditional criteria
are set.

This fixes a regression where the mask_stop_kubelet method calls
mask_stop_service with no value set for the onlyif parameter.

Test Plan:
- PASS: Upgrade Kubelet

Closes-Bug: 2038858
Change-Id: I06e4cf42dbe710a78443dffb83c972c73f9789bf
Signed-off-by: Gleb Aronsky <gleb.aronsky@windriver.com>
2023-10-10 15:25:16 +00:00
Lucas Ratusznei Fonseca
521f9294db Update network interfaces during upgrade bootstrap execution
There are two issues addressed by this change:

1. The change introduced by
https://review.opendev.org/c/starlingx/stx-puppet/+/895726 prevents
the interfaces from being activated during upgrade bootstrap. Such
behaviour causes issues when specific network configurations are needed
for the upgrade to finish. This change reverts it, while making sure that
the sysinv lock issue is properly taken care of.

2. Given an interface that is already activated and configured with ip
addresses via the 'ip' command, if an attempt is made to bring it up
via the 'ifup' command, the command fails and the default route (if
present in the config file) is not properly configured. To prevent this
from happening, this change improves the logic in the function that
brings the interface down, so that the interface is guaranteed to be in
down state and have no IP addresses.

Test plan

Systems
  - AIO-SX IPv4
  - AIO-SX IPv6

Test IP addresses (examples)
+---+-----------------+-----------------+
| # | IPv4 (/24 mask) | IPv6 (/64 mask) |
+---+-----------------+-----------------+
| 1 | 10.20.1.1       | fd00::1:1       |
| 2 | 10.20.1.2       | fd00::1:2       |
| 3 | 10.20.1.3       | fd00::1:3       |
| 4 | 10.20.1.4       | fd00::1:4       |
| 5 | 10.20.2.1       | fd01::1:1       |
+---+-----------------+-----------------+

Test scenarios

The initial setup for each test must follow what is described in each
corresponding scenario below. For it to be valid, the configuration in
the kernel must be in sync with the sysinv database.

1. Standard ethernet interface as OAM
  - oam0
    > Type: regular ethernet
    > Underlying interface: 'if0' for reference
    > Static IP: #2
    > Gateway IP: #1

2. VLAN interface as OAM
  - oam0
    > Type: VLAN ID 100
    > Interface name: 'vlan100' for reference
    > Underlying interface: 'if0' for reference
    > Static IP: #2
    > Gateway IP: #1

Actions

- Edit ifcfg-<interface>:
  > Manually edit /etc/network/interfaces.d/ifcfg-<interface> to change
    MTU value (for example, from 1500 to 1502), this will cause the
    script to detect the difference and trigger an interface update.

- Erase ifcfg-<interface>:
  > Manually remove /etc/network/interfaces.d/ifcfg-<interface> file
    from the filesystem.

- ifdown <interface>:
  > Run command 'ifdown <interface>' to cause the interface to be
    deactivated by ifupdown.

- Set link up <interface>:
  > Run command 'ip link set up dev <interface>' to put interface's
    link to UP state.

- Create VLAN <vlan-name> on <iface>:
  > Create VLAN interface and set its link to UP through the commands
    'ip link add link <iface> name <vlan-name> type vlan id <vlan-id>'
    and 'ip link set up dev <vlan-name>'.

- Add IP <address> to <interface>:
  > Run command 'ip address add <address> dev <interface>' to add an
    IP address to the interface.

- Add route to <interface> via <address>:
  > Run command 'ip route add default via <address> dev <interface>' to
    add a default route to the interface.

- Modify MTU:
  > Modify the MTU of the interface via sysinv (example:
    'system host-if-modify controller-0 oam0 -m 1502')

[ Test Case 1 - Direct script call tests ]

For the tests, changes to the OAM interface will be made to check if
its parameters (link state, IP address, default route) are correctly
restored by the apply_network_config.sh script. The changes here are
made manually to the files and not through sysinv.

Test procedure

1. Apply initial setup.
2. Apply actions.
3. Run /usr/local/bin/apply_network_config.sh as root.
4. Check that interface state, IP address and default route in kernel
   match the ones in the sysinv database.

Tests

For scenario #1

PASS For if0, edit ifcfg
PASS For if0, erase ifcfg
PASS For if0, ifdown, erase ifcfg
PASS For if0, ifdown, edit ifcfg, set link up, add address IP#2
PASS For if0, ifdown, edit ifcfg, set link up, add address IP#5
PASS For if0, ifdown, edit ifcfg, set link up, add address IP#2,
     add route via IP#4

For scenario #2

PASS For vlan100, edit ifcfg
PASS For vlan100, erase ifcfg
PASS For vlan100, ifdown, erase ifcfg
PASS For vlan100, ifdown, edit ifcfg, create VLAN, add address IP#2
PASS For vlan100, ifdown, edit ifcfg, create VLAN, add address IP#5
PASS For vlan100, ifdown, edit ifcfg, create VLAN, add address IP#2,
     add route via IP#4
PASS For if0, edit ifcfg
PASS For if0, erase ifcfg

[ Test Case 2 - Indirect script call on lock/unlock ]

Test Procedure

1. Apply initial setup.
2. Lock host.
3. Apply actions.
4. Unlock host.
5. Check that interface state, IP address and default route in kernel
   match the ones in the sysinv database.

Tests

For scenario #1

PASS For if0, modify MTU
PASS For if0, erase ifcfg (to simulate first unlock when ifcfg-* files
     don't exist)

For scenario #2

PASS For vlan100, modify MTU
PASS For if0, modify MTU
PASS For if0 and vlan100, erase ifcfg

[ Test Case 3 - Indirect script call on upgrade ]

Tests

PASS Perform upgrade on VirtualBox
PASS Perform upgrade on a physical lab

-----------------------------------------------------------------------

Closes-Bug: #2036451
Signed-off-by: Lucas Ratusznei Fonseca <lucas.ratuszneifonseca@windriver.com>
Change-Id: Ibda7c744e9be26b0bbcbd1520ffe15825ad1f60f
2023-10-09 16:41:00 -03:00
Zuul
d9f373a3f9 Merge "Gracefully stop the isolcpu and kubelet service" 2023-10-06 20:08:36 +00:00
Zuul
bad07c4e2f Merge "Remove worker remote firewall scripts" 2023-10-06 17:30:49 +00:00
Andre Kantek
eebf90d20e Remove worker remote firewall scripts
The implementation for worker firewall avoided using local kubectl
commands. This required access to the keyring for remote ansible
ad-hoc commands and leaves the /opt/platform/.config mounted on the
worker.

Use kubectl command with /etc/kubernetes/kubelet.conf instead, so we
can refrain from mounting /opt/platform/.config

Since all firewall data is generated in the host's hieradata file, the
worker node needs to be able to access the calico firewall resources.
To achieve that a ClusterRole and ClusterRoleBinding are added, via
the controller node, allowing access to only the necessary resources.

Test Plan:
[PASS] Install a standard setup and validate the worker node firewall
        configuration
[PASS] Execute a DOR test in the cluster and check if the worker nodes
        install the firewall GNP and HE
[PASS] Execute worker node lock/unlock and check if the worker nodes
        install the firewall GNP and HE

Closes-Bug: 2038550

Change-Id: Icf31b513427120fe81c53be21b8d8a81a8e323f8
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
2023-10-06 08:12:27 -03:00
Andre Mauricio Zelak
179fef6fef Fix task ordering to prevent race condition on ptp config gen
Changed the task ordering for the ptpinstance runtime manifest. The
previous ordering could result in a race condition where ts2phc was
being restarted before clock-conf.conf file is updated. This would
result in the PTP instance being out of tolerance, skewed from the
primary clock.

Changed the order between 'platform::ptpinstance' and
'platform::ptpinstance::nic_clock', and moved the creation of the
"${ptp_conf_dir}/ptpinstance" directory to nic_clock. The nic_clock
class will first use it to store the clock-conf.conf file.

Test Plan:
PASS: Using a multiple PTP instances configuration and run the
manifest with system ptp-instance-apply. Ensure that ts2phc
starts after clock_nic class, the ts2phc is handling all the
interfaces configured and the clock is not skewed.
PASS: Host lock and unlock and check the puppet manifest log

Closes-bug: 2038383

Change-Id: I97e5a6bf536f05720fcaea860a83e53454e83ab6
Signed-off-by: Andre Mauricio Zelak <andre.zelak@windriver.com>
2023-10-05 19:45:41 -03:00
Zuul
f050063889 Merge "Update deprecated K8S API references of v1beta2 to v1beta3" 2023-09-29 22:07:15 +00:00
Zuul
a86f226666 Merge "Revert "Update dnsmasq conf file for host-record support"" 2023-09-29 20:23:41 +00:00
Joseph V
1fbc6cdca4 Revert "Update dnsmasq conf file for host-record support"
This reverts commit 6c418b0441460ed018e76cfda0b13c97e240e000.

Reason for revert: Partial-Bug

LP: https://bugs.launchpad.net/starlingx/+bug/2037734


Closes-Bug: 2037734
Story: 2010835
Task: 48724

Change-Id: Ic9779078349fdbfa6e48a9e90bad8f0794b6ddd9
2023-09-29 19:00:24 +00:00
Zuul
a34c2ff255 Merge "Add DNS host records to /etc/hosts" 2023-09-28 15:57:35 +00:00
Gleb Aronsky
9f2bc83059 Gracefully stop the isolcpu and kubelet service
This change ensures that the isolcpu_plugin service is
stopped and masked prior to masking and stopping the
kubelet service. Additionally, on startup, the kubelet
service is unmasked and started prior to unmasking isolcpu_plugin.
This change is intended to avoid any race conditions that can
occur because of the dependency on kubelet by isolcpu_plugin,
resulting in numerous restarts of both services and
leading to failed node upgrades.
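
The ordering described above can be sketched as two command lists. This is only an illustration: the actual change is made in the Puppet manifests, and the use of plain systemctl calls with these exact unit names is an assumption.

```python
def stop_sequence():
    """Shutdown order: isolcpu_plugin is handled before kubelet."""
    return [
        ["systemctl", "stop", "isolcpu_plugin"],
        ["systemctl", "mask", "isolcpu_plugin"],
        ["systemctl", "mask", "kubelet"],
        ["systemctl", "stop", "kubelet"],
    ]


def start_sequence():
    """Startup order: kubelet is unmasked and started before isolcpu_plugin is unmasked."""
    return [
        ["systemctl", "unmask", "kubelet"],
        ["systemctl", "start", "kubelet"],
        ["systemctl", "unmask", "isolcpu_plugin"],
    ]
```

Keeping isolcpu_plugin out of the way while kubelet changes state is what avoids the repeated restarts caused by its dependency on kubelet.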

Test Plan:

   - PASS: Upgrade kubelet AIO-SX
   - PASS: Upgrade kubelet on a STANDARD installation

Closes-Bug: 2036985

Change-Id: Ifb2b512c3953d2a1f7efdba289a31d5a9315cae4
Signed-off-by: Gleb Aronsky <gleb.aronsky@windriver.com>
2023-09-28 14:23:09 +00:00
Zuul
fbadea9885 Merge "Update dnsmasq conf file for host-record support" 2023-09-27 13:34:36 +00:00
Saba Touheed Mujawar
b65a954562 Update deprecated K8S API references of v1beta2 to v1beta3
Updating just a comment that has apiVersion: kubeadm.k8s.io/v1beta2
to v1beta3.

Test Plan:
PASS: k8s upgrade from 1.22.5 to next all available versions.

Story: 2010878
Task: 48836

Change-Id: Id06a0e73212ddeb5b1817a6f7de286ca37a78538
Signed-off-by: Saba Touheed Mujawar <sabatouheed.mujawar@windriver.com>
2023-09-27 08:50:29 -04:00
Zuul
cd13fc3c4b Merge "Add logic to recreate routes when interfaces are brought down/up" 2023-09-25 15:32:02 +00:00
Joseph Vazhappilly
23ea7423db Add DNS host records to /etc/hosts
The controller needs to add user configured DNS host records to
/etc/hosts file, before initial unlock when
dnsmasq service is not running. Other personalities like
worker hosts and storage hosts do not require this change.

Test Plan:
PASS: Successful build
PASS: Successful bootstrap, initial unlock of controller-0
PASS: Verify after controller-0 unlock, lock and unlock controller-1
PASS: Verify duplex controller host-swact with dns host records
PASS: Verify host records in /etc/hosts when dnsmasq is down
PASS: Verify host records absent in /etc/hosts when dnsmasq is up

Story: 2010835
Task: 48725

Change-Id: I3f11084c7458f89288ba4bef18c78e60ff55b74e
Signed-off-by: Joseph Vazhappilly <joseph.vazhappillypaily@windriver.com>
2023-09-25 05:42:19 -04:00
Lucas Ratusznei Fonseca
99ee3f190e Add logic to recreate routes when interfaces are brought down/up
When the apply_network_config.sh script detects changes in an
interface's config file, it brings the link down and up again. This
causes any associated routes to be automatically deleted from the
kernel and not restored.
This commit adds logic to restore any affected routes when the
associated interface is brought down/up.
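
A minimal sketch of the selection step, assuming routes are available as (network, gateway, interface) records. The Route type and routes_to_restore helper are hypothetical and do not reflect the actual shell logic in apply_network_config.sh; the caller is assumed to list every interface whose link went down, including dependent VLAN interfaces where relevant.

```python
from typing import NamedTuple


class Route(NamedTuple):
    network: str
    gateway: str
    interface: str


def routes_to_restore(configured_routes, bounced_interfaces):
    """Return the configured routes attached to interfaces that were brought down/up."""
    bounced = set(bounced_interfaces)
    return [route for route in configured_routes if route.interface in bounced]
```

With the routes from the test plan below, bouncing data0 alone would select the routes via data0 and leave the ones via data1 untouched.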

Test plan

Setup:

System
- AIO-SX IPv4

Interfaces
- data0: ETH static ip 10.10.10.3/24
- data0.100: VLAN static IP 10.10.11.3/24
- data1: ETH static ip 10.10.20.3/24
- data1.200: VLAN static IP 10.10.21.3/24

Pre-existing routes
- rt01: 10.20.1.0 -> 10.10.10.1 via data0
- rt02: 10.20.2.0 -> 10.10.10.1 via data0
- rt03: 10.20.3.0 -> 10.10.11.1 via data0.100
- rt04: 10.20.4.0 -> 10.10.11.1 via data0.100
- rt05: 10.20.5.0 -> 10.10.20.1 via data1
- rt06: 10.20.6.0 -> 10.10.20.1 via data1
- rt07: 10.20.7.0 -> 10.10.21.1 via data1.200
- rt08: 10.20.8.0 -> 10.10.21.1 via data1.200

Routes to be added
- rt09: 10.20.9.0 -> 10.10.10.1 via data0
- rt10: 10.20.10.0 -> 10.10.11.1 via data0.100
- rt11: 10.20.11.0 -> 10.10.20.1 via data1
- rt12: 10.20.12.0 -> 10.10.21.1 via data1.200

Actions
- Change interface: manually edit /etc/network/interfaces.d/ifcfg-* to
    change MTU value, this will cause the script to detect the
    difference and trigger an interface update.
- Add route: manually edit /var/run/network-scripts.puppet/routes
    to add a new route, this will cause the script to detect the
    difference and trigger an update in the routes.
- Remove route: manually edit /var/run/network-scripts.puppet/routes
    to remove a route.

Procedure for direct tests
1. Perform action
2. Run /usr/local/bin/apply_network_config.sh as root
3. Validate that routes in /var/run/network-scripts.puppet/routes and
   in the kernel match

Direct tests:

PASS Change data0.100
PASS Change data0
PASS Change both data0.100 and data1.200
PASS Change both data0 and data1
PASS Add route rt09
PASS Remove route rt01
PASS Change data0.100 and add route rt10
PASS Change data0.100 and remove route rt03
PASS Change data0 and add routes rt09 and rt10
PASS Change data0 and remove routes rt01 and rt03
PASS Change data0 and data1, add routes rt09, rt10, rt11, rt12
PASS Change data0 and data1, remove routes rt01, rt03, rt05, rt07
PASS Repeat all the previous tests in an equivalent IPv6 setup

Indirect tests:

PASS Lock system, change MTU of data0 via 'system host-if-modify',
     unlock system
PASS Lock system, erase /etc/network/interfaces.d/ifcfg-* and
     /etc/network/routes, unlock system

Closes-Bug: #2036667
Signed-off-by: Lucas Ratusznei Fonseca <lucas.ratuszneifonseca@windriver.com>
Change-Id: If7f301ff797a6644aaa3ff7f73de530d680c6b3b
2023-09-22 11:57:12 -03:00
Zuul
3cdbbcc3b4 Merge "Add crashdump template and parameter handling" 2023-09-21 16:50:33 +00:00
Enzo Candotti
f6b33579d9 Add crashdump template and parameter handling
This commit aims to handle the new platform crashdump service
parameters in order to create a crash-dump-manager configuration
file.

This file will be located in /etc/default/crash-dump-manager, and
will be used as an EnvironmentFile for the crashDumpMgr service, in
order to set the new parameters.

Test Plan:

PASS: TC1
      add new parameters in the service-parameter table
      with valid value
      --> parameter added in service-parameter table
      --> crash-dump-manager updated with new values

PASS: TC2
      modify new parameters in the service-parameter table
      with valid value
      --> service-parameter table updated with new value
      --> crash-dump-manager updated with new value

PASS: TC3
      delete new parameters from the service-parameter table
      --> parameter deleted from service-parameter table
      --> crash-dump-manager updated with default value

PASS: TC4
      add/modify new parameters in the service-parameter table
      with no value
      --> msg: "The service parameter value is mandatory"
      --> parameter not added/modified in service-parameter table
      --> crash-dump-manager not updated

PASS: TC5
      a)add/modify max_files in the service-parameter table
      with wrong format
      --> msg: "Parameter 'max_files' must be an integer value."
      --> parameter not added/modified in service-parameter table
      b)add/modify the other parameters in the service-parameter
      table with wrong format
      --> msg: "Parameter <value> must be written in human readable
      format"
      --> parameter not added/modified in service-parameter table

PASS: TC6
      reboot host with new parameters configured
      --> service-parameter table keeps new parameters value
      --> crash-dump-manager keeps configured values

PASS: ISO installation

Story: 2010893
Task: 48766


Signed-off-by: Enzo Candotti <enzo.candotti@windriver.com>
Change-Id: Ia02e73462802c7831331bed6dec98a6b1cb37020
2023-09-21 16:06:54 +00:00
Zuul
ffd09b26de Merge "When executing upgrade bootstrap for AIO-SX only update /etc/network/" 2023-09-20 15:18:04 +00:00
Andre Kantek
4a816dfa6c When executing upgrade bootstrap for AIO-SX only update /etc/network/
During upgrade bootstrap, the runtime execution of the puppet class
platform::network::runtime created a lock timeout with sysinv-agent
to allow the interface configuration in the kernel. The lock exists
to allow sysinv-agent to collect interface information for the
system inventory.

The optimized upgrade feature created this runtime execution to fill
the contents in /etc/network/interface.d/ and /etc/network/routes
to be available during the network bringup phase that happens after
the system unlock in an earlier step than the regular puppet manifest
execution. But as part of apply_network_config.sh execution it also
brings up the interfaces, accessing the script protected section and
causing lock timeout.

This change uses the file /var/run/.network_upgrade_bootstrap to
indicate an upgrade bootstrap is under way, to just populate the
/etc/network files and to not activate the interfaces.
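
The flag-file check can be sketched as follows. The function name should_activate_interfaces is hypothetical; the real check lives in the network apply script, and only the flag path is taken from this commit message.

```python
import os

# Flag file named in the commit message above.
UPGRADE_BOOTSTRAP_FLAG = "/var/run/.network_upgrade_bootstrap"


def should_activate_interfaces(flag_path=UPGRADE_BOOTSTRAP_FLAG):
    """Interfaces are activated only when the upgrade bootstrap flag is absent."""
    return not os.path.exists(flag_path)
```

When the flag exists, the caller would write the /etc/network files and skip the interface activation step.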

The interface activation is not needed as the bootstrap is still under
way and the minimal network configuration is already provided prior to
the bootstrap. After unlocking the files in /etc/network will provide
a faster network availability as systemd's network service is one of
the first to be executed during boot.

Test Plan:
[PASS] in 21.12 add an assorted network configuration with
       - vlan, bonded, and ethernet interfaces (the vlan interface on
         top of both base interfaces)
       - configure the bond interface with static address
       - add routes on all platform and data interfaces
[PASS] execute lock/unlock in 21.12 to verify config is applied
[PASS] execute AIO-SX upgrade to 22.12, at the bootstrap end check
        files in /etc/network/
[PASS] finish upgrade and after unlock verify the network access and
        address, interfaces, and route creation in the kernel

Closes-Bug: 2036451

Change-Id: Ib04f72298252a52a8a05cf644671106ad6530e5f
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
2023-09-19 18:34:53 +00:00
Zuul
8abb5796bc Merge "Etcd service status: check for certs error" 2023-09-15 21:14:50 +00:00
Zuul
bfbf807a8f Merge "Revert "Update kubelet system overrides on unlock"" 2023-09-15 17:03:06 +00:00
Bruce Jones
22f341e5af Revert "Update kubelet system overrides on unlock"
This reverts commit 7d870177c6c01f5b47a33ef1cd7ce92c3c58694f.

Reason for revert: blocking testing due to lock/unlock failures

Change-Id: Idc746dfc1076ef2f45a9790c7fecf4bb848162e2
2023-09-15 16:40:12 +00:00
Zuul
2fde8b0872 Merge "In DC setup update firewall with the routes" 2023-09-14 23:33:10 +00:00
kaustubh.dhokte
3ffe8b7e1e Etcd service status: check for certs error
The script /etc/init.d/etcd is used by the service manager for
management of the etcd service. The call '/etc/init.d/etcd status'
uses etcdctl health API to determine if the service is running
fine or not. In the event that etcd certs are replaced with new ones
but the service has not yet been restarted to use the new ones, the
status call will fail even though the service is running fine and
the service manager will treat the service as failed.
'sm-audit' (which is run periodically) uses '/etc/init.d/etcd status'
call to determine and maintain the service health. Service manager
receiving false service status may introduce a lot of bugs.

One such scenario is that 'sm' ignores the 'service restart' call
if it thinks service is disabled. This leads to etcd not being
restarted with new certs during upgrade activate and not being
reachable to the kube-apiserver (which may have started using new
client certs).

This change modifies '/etc/init.d/etcd status' call to not just
rely on etcd health api to determine if the etcd service is running
and checks for the existence of etcd runtime information in case
the health api fails with the 'bad certificate' error.
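
The decision described above can be sketched as a small function. This is an illustration under assumptions: the real logic is in the /etc/init.d/etcd shell script, and the parameter names (health_ok, health_error, runtime_info_present) are hypothetical.

```python
def etcd_is_running(health_ok, health_error, runtime_info_present):
    """Decide the reported status from the health check and runtime information.

    health_ok: True when the etcd health API call succeeded.
    health_error: error text from the failed health call ("" if none).
    runtime_info_present: True when etcd runtime information exists.
    """
    if health_ok:
        return True
    # Certs were replaced but the service was not restarted yet: the health
    # call fails with 'bad certificate' although the service is still up.
    if "bad certificate" in health_error and runtime_info_present:
        return True
    return False
```

Any other health failure, or a 'bad certificate' error without runtime information, is still reported as not running.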

Test Plan:
PASS: Replace old certs with new certs at /etc/etcd/ and do not
      restart the service. Check that the '/etc/init.d/etcd status'
      is 'running'.
PASS: Replace old certs with new certs at /etc/etcd/ and restart
      the service. Check that the '/etc/init.d/etcd status' is
      'running'.

Closes-Bug: 2033942

Change-Id: Id30a262ca1bde6d8acb85de10882ca9bd4b59bdd
Signed-off-by: kaustubh.dhokte <kaustubh.dhokte@windriver.com>
2023-09-14 22:59:10 +00:00
Andre Kantek
ed3a1d65a2 In DC setup update firewall with the routes
The tests in DC labs intended to bring up up to 1000 subclouds showed
problems creating all routes when it was necessary to add the
firewall update in the same puppet manifest apply that handled
platform::network::routes::runtime, generating an ever-growing queue
that caused timeouts during subcloud creation by DC manager.

This change adds the firewall classes to be run from inside
platform::network::routes::runtime to allow both the route
configuration and the firewall update

Test Plan:
[PASS] In a SysCtrl, add/remove a single route (via CLI/sysinv-API)
        if in a controller, verify route and firewall update
[PASS] In a SysCtrl, add/remove a single route (via CLI/sysinv-API)
        if in a worker, verify only route update is executed
[PASS] In a SysCtrl, add/remove 50 routes (via CLI/sysinv-API)
        using the parallel bash command:
     seq 1 50|parallel --jobs 25 --eta '\
     system host-route-add 1 mgmt0 51.{}.{}.0 24 192.168.0.1 1 && \
     system host-route-add 2 mgmt0 51.{}.{}.0 24 192.168.0.1 1;'
[PASS] In a SysCtrl, add a subcloud AIO-SX and validate route and
        firewall update
[PASS] Install a subcloud and check the SysCtrl network is installed
        as a route and in the mgmt firewall

Closes-Bug: 2033919

Change-Id: Ic5bf9bc84b8c583a39d8c91b72caef4d84240123
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
2023-09-13 17:10:08 -03:00
Zuul
8722928985 Merge "Set default value for private_dc_ip_address" 2023-09-12 18:06:01 +00:00
Zuul
75b0b669f0 Merge "Remove rabbitmq dependencies from sysinv puppet" 2023-09-12 14:51:47 +00:00
Steven Webster
34431bb0f7 Set default value for private_dc_ip_address
It was found that when applying a patch containing the code for
enabling the admin network on distributed cloud systems, a
problem could occur after unlock because of an undefined value
for 'private_dc_ip_address'.

This is because the value of this puppet parameter is only set
in the system.yaml hieradata, which is not written on normal
host lock/unlock.

The value is generated however, when a user assigns an interface
to the admin network, or performs another action that leads to
the system hieradata being generated.

If the user has not assigned an interface to the admin network,
or if the user has no interest in using the admin network, this
commit ensures that the value of the private_dc_ip_address is
that of the private_ip_address (ie. mgmt).

Setting the private_dc_ip_address to the private_ip_address
as default ensures that every component using the haproxy::params
behaves as it did pre-patch.
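
The fallback can be sketched as below, assuming the hieradata is available as a flat dictionary. The helper resolve_private_dc_ip and the dictionary layout are hypothetical; the actual default is set in the Puppet parameters.

```python
def resolve_private_dc_ip(hieradata):
    """Use private_dc_ip_address when set, otherwise fall back to private_ip_address."""
    return hieradata.get("private_dc_ip_address") or hieradata["private_ip_address"]
```

A system that never assigned an interface to the admin network would therefore resolve to the mgmt address, matching the pre-patch behaviour.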

Testing:

Success path:

- Install, lock / unlock AIO-SX system (stx 8.0 + patch)
- Install, lock / unlock an AIO-DX DC subcloud (stx 8.0 + patch)
- Update a DC subcloud to use the admin network after the system
  has been patched.

Regression:

- Install, lock / unlock AIO-SX system (dev build)
- Install, lock / unlock an AIO-DX DC subcloud (dev build)
- Install a DC subcloud using the mgmt. network (dev build)
- Install a DC subcloud using the admin network (dev build)

Story: 2010319
Task: 46911

Signed-off-by: Steven Webster <steven.webster@windriver.com>
Change-Id: Ice16a4a3a2d9faa461c1c0f98d1ea0b0c7ce751a
2023-09-12 09:36:32 -04:00
Zuul
94f80ab772 Merge "Revert "Revert Patch of puppet-manifest-apply.sh"" 2023-09-12 00:58:16 +00:00
Zuul
545f251d3e Merge "Update haproxy config to include keystone request retry." 2023-09-11 19:30:49 +00:00
Lucas Borges
e44ff4ecfe Revert "Revert Patch of puppet-manifest-apply.sh"
This reverts commit a1784deca9d30848f05d2ca53e66cf832d54b0da.

Reason for revert:
The whitelist was created to ignore the warning only on VMs; however,
the ignored warning was also seen on real servers.
This needs to be tested more extensively on different
types of servers.

Story: 2010757
Task: 48644

Change-Id: I979b4269d0e8f68b5ea0c8471b14e666a437730d
Signed-off-by: Lucas Borges <lucas.borges@windriver.com>
2023-09-11 15:21:03 +00:00
Zuul
68404747e7 Merge "Fix puppet network.pp to load vfio-pci driver with sriov enabled" 2023-09-11 13:06:58 +00:00
Lucas Borges
ded6c7a3e0 Update puppet bucket cache dir
Every puppet apply generates some caches of files
in /var/cache/puppet/clientbucket and the script is
cleaning another directory
/var/lib/puppet/clientbucket. This updates the cache path.

The vardir setting in puppet.conf is removed in
https://review.opendev.org/c/starlingx/integ/+/830542
this configuration sets the cache dir
Without that, the puppet stores the cache under
/var/cache/puppet/clientbucket.

Test Plan:
PASS: Ensure the cache directory is removed
      after the puppet apply (AIO-SX)

Closes-Bug: 2034932
Change-Id: If564d00bc09030d0543669a11a646fcf502bf65b
Signed-off-by: Lucas Borges <lucas.borges@windriver.com>
2023-09-08 17:38:59 +00:00
Bezerra Filho, Moacir
a3ceb3ea34 Fix puppet network.pp to load vfio-pci driver with sriov enabled
- Modified manifests/network.pp to enable sriov when the driver is vfio-pci

Test plan:
01. PASSED - system host-lock controller-0
02. PASSED - Configure any NIC that has status UP
             (check with the command: ip addr)
03. PASSED - system host-if-modify -m 1500 -n sriov0 -c pci-sriov \
                -N 63 --vf-driver=vfio controller-0 <interface_name>
04. PASSED - system host-unlock controller-0

05. PASSED - system application-upload <sriov-fec-operator-version.tgz>
06. PASSED - system application-apply sriov-fec-operator

07. PASSED - cat /sys/module/vfio_pci/parameters/enable_sriov eq. "Y"
08. PASSED - cat /sys/module/vfio_pci/parameters/disable_idle_d3 eq. "Y"

(In this example I'm using ACC200 but it can be ACC100 or N3000)
09. PASSED - kubectl apply -f sriov-fec-config-acc200-vfio-pci.yaml
10. PASSED - kubectl apply -f acc200.yml
11. PASSED - kubectl exec acc200 -it -- bash
12. PASSED - echo $PCIDEVICE_INTEL_COM_INTEL_FEC_ACC200
13. PASSED - Run the script below in the container
    CPU_SET=$(taskset -pc 1 | sed -e 's/pid 1.s current \
                affinity list://g')
    for d in $(lspci -d 8086:57c1 | cut -d' ' -f1) ; do
        /opt/sysroot/usr/local/bin/dpdk-test-bbdev \
        --vfio-vf-token=02bddbbf-bbb0-4d79-886b-91bad3fbb510 \
        -l $CPU_SET -a $d -- -c validation \
        -v /opt/sysroot/usr/local/app/test-bbdev/ldpc_dec_default.data
    done
14. PASSED - DPDK Validation with testpmd was performed


Closes-bug: #2033103
Signed-off-by: Bezerra Filho, Moacir <Moacir.BezerraFilho@windriver.com>
Change-Id: I880551fafbb351c370acedeaf43f9de595fea0af
2023-09-08 16:47:47 +00:00
Bezerra Filho, Moacir
86c4ab043b Update haproxy config to include keystone request retry.
- Add keyword retry_on in haproxy::backend
- Add values retry_on in keystone.pp
- Modified keystone_http_connect_timeout 10 to 15 in api.pp, api_proxy.pp, certalarm.pp and certmon.pp

this workaround solves:
- DC Scale | RR Patch Orchestration fails as it cannot retrieve patches for subcloud after the apply
- DC Patch - Parallel patch orchestration fails to establish connection to MGMT interface of subclouds
- Patch orchestration fail due to transient keystone errors

Test plan:
1. (PASSED) Patch Creation:
    - Construct a "reboot required" RR patch that encompasses the specified changes.
    - Generate an "in-service test" NRR patch.

2. (PASSED) Initial Setup:
    - Commission a DC system with over 500 subclouds.
    - Assert that the patch encompassing the fix is applied successfully on the DC.

3. (PASSED) Strategy Creation and RR Patch Deployment (Max 250 Subclouds):
    - Created an RR patch strategy with max_parallel_subclouds set to 250
    - Checked that the RR patch strategy is applied to all subclouds successfully.
    - Repeated this process on 250 more subclouds
    - Checked that the patch strategy is applied to all subclouds successfully.

4. (PASSED) Strategy Alteration and NRR Patch Deployment (Max 500 Subclouds):
    - Eliminate the existing patch strategy.
    - Initiate an NRR patch strategy, adjusting the max_parallel_subclouds parameter to 500.
    - Checked that the "in-service test" NRR patch is successfully applied across all subclouds and that no linked issues arise.

Closes-Bug: #2025646
Change-Id: I95e9c8f3cd904d7f637da2ea69a83fd7fa5f03a1
Signed-off-by: Bezerra Filho, Moacir <Moacir.BezerraFilho@windriver.com>
2023-09-08 13:13:07 +00:00
Zuul
47eccd0e9f Merge "Update kubelet system overrides on unlock" 2023-09-08 05:11:59 +00:00
Gleb Aronsky
7d870177c6 Update kubelet system overrides on unlock
Move generation of kubelet's systemd override file,
kube-stx-override.conf, from platform::kubernetes::master::init
to platform::kubernetes::configuration so that the file will
be generated on every host unlock. This facilitates delivery
of systemd service changes via patches to existing installs.

This change is needed by bug 2027810 to ensure that the
orphan volume cleanup script is executed as part of the systemd
ExecStartPre kubelet service override.

Test Plan:

Pass:  - Update the kube-stx-override.conf.erb file
       - Lock the host
       - Unlock the host
       - Verify that kube-stx-override.conf has been updated
       - Verify AIO-SX
       - Verify Standard config

Partial-Bug: 2027810
Change-Id: I3b496abc807bf75716d28079c62ef4700dcd3244
Signed-off-by: Gleb Aronsky <gleb.aronsky@windriver.com>
2023-09-06 13:50:27 -07:00
Joseph Vazhappilly
6c418b0441 Update dnsmasq conf file for host-record support
Add option 'conf-file' in dnsmasq conf file to use additional
config file in persistent store. This additional conf file is used
to support host-record option of dnsmasq.
Users can add A, AAAA and PTR records to the DNS with TTL support.
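
As a rough sketch of what an entry in the additional conf file could look
like, the helper below formats a dnsmasq host-record line from a name, an
address and an optional TTL. The function name and the example hostnames and
addresses are hypothetical; the path of the persistent conf file is not
given in this message and is not assumed here.

```shell
#!/bin/sh
# Hypothetical helper: format one dnsmasq host-record line.
# Arguments: hostname, IP address (IPv4 or IPv6), optional TTL in seconds.
host_record() {
    name=$1
    addr=$2
    ttl=${3:-}
    if [ -n "$ttl" ]; then
        printf 'host-record=%s,%s,%s\n' "$name" "$addr" "$ttl"
    else
        printf 'host-record=%s,%s\n' "$name" "$addr"
    fi
}

# Example entries (placeholder names and documentation-range addresses):
host_record registry.example.local 192.0.2.10 300
host_record registry.example.local 2001:db8::10
```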

Test Plan:
PASS: Successful build
PASS: Successful bootstrap and unlock
PASS: Verify ping hostnames by adding host-record in new conf file

Story: 2010835
Task: 48724

Change-Id: I583d1cf783a32a3dc6f2d1c9786287a7f64809a3
Signed-off-by: Joseph Vazhappilly <joseph.vazhappillypaily@windriver.com>
2023-09-05 05:06:40 -04:00
Samuel Toledo
3d5b46834a Remove rabbitmq dependencies from sysinv puppet
Continuing the efforts from [1], this review consists in removing all
dependencies related to amqp classes as well as initializations for
rabbitmq variables. This removal can be done because sysinv does not
use rabbitmq.

Test plan
PASS - Perform fresh install and bootstrap in an AIO-SX successfully
PASS - Perform fresh install and bootstrap in an AIO-DX successfully
PASS - Run any system command successfully (system host-list, system application-list, etc)

Story: 2010802
Task: 48578

[1] - https://storyboard.openstack.org/#!/story/2010802

Change-Id: I5da60b97ac8808d95d5b76ade065ea521e62e251
Signed-off-by: Samuel Toledo <samuel.presatoledo@windriver.com>
2023-08-31 19:43:12 +00:00
Luis Marquitti
a1784deca9 Revert Patch of puppet-manifest-apply.sh
Script puppet-manifest-apply.sh was patched until the warnings in
running the bootstrap, aio unlock and runtime manifests were resolved on
Debian: https://review.opendev.org/c/starlingx/stx-puppet/+/844121
In addition to the patch reversal, a whitelist was created for virtual
environments, where any warnings during the development process can be
added. The whitelist was implemented with an initial warning ("Could not
retrieve fact ipaddress"), that only affects virtual environments.
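
The whitelist idea can be sketched as a filter over the puppet output. This
is a minimal sketch under assumptions: the function name, the one-substring-
per-line whitelist format and the sample log lines are hypothetical, and the
real puppet-manifest-apply.sh may implement the check differently.

```shell
#!/bin/sh
# Hypothetical helper: read log lines on stdin and print only the warning
# lines that are NOT covered by the whitelist file given as $1
# (one fixed substring per line).
unexpected_warnings() {
    grep -i 'warning' | grep -v -F -f "$1"
}

# Example with a temporary whitelist holding the initial entry:
whitelist=$(mktemp)
printf '%s\n' 'Could not retrieve fact ipaddress' > "$whitelist"
printf '%s\n' \
    'Warning: Could not retrieve fact ipaddress' \
    'Warning: some other problem' \
    'Notice: applied catalog' | unexpected_warnings "$whitelist"
rm -f "$whitelist"
```

Any line the filter still prints would be treated as a real failure, while
whitelisted warnings pass silently.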

Test Plan:
PASS: Build & Install
PASS: AIO-SX & AIO-DX Successful Bootstrap
PASS: AIO-SX & AIO-DX Successful Unlock
PASS: Verified that warnings added to the whitelist don't affect
manifests execution in virtual environments.

Story: 2010757
Task: 48644
Change-Id: Id67facc82bed7e069efb5f52e0775a69da355de0
Signed-off-by: Luis Marquitti <luis.eduardoangelinimarquitti@windriver.com>
2023-08-31 09:46:35 -03:00
Zuul
cdefacfea6 Merge "Replace puppet template with shell script" 2023-08-29 17:28:09 +00:00
Heron Vieira
0c43be4cd1 Replace puppet template with shell script
Based on the goal of the story 2010802, this review replaces
the puppet template that calls the manage-partitions script
with a shell script, saving time by not rendering the puppet
template, just executing the shell script.
Tests showed a time reduction of around 10%.

Test plan
PASS: AIO-SX fresh install, bootstrap and initial unlock.
PASS: AIO-DX fresh install, bootstrap and initial unlock
      for all nodes.
PASS: Standard (2+2) fresh install, bootstrap and initial
      unlock for all nodes.
PASS: AIO-SX lock and unlock after install.
PASS: AIO-DX lock and unlock after install.
PASS: Standard (2+2) lock and unlock after install.
PASS: Standard (2+2):Test manage-partitions script with
      modify, delete and create operations on all hosts.
PASS: SX:Test manage-partitions script with modify,
      delete and create operations.

Story: 2010802
Task: 48595

Signed-off-by: Heron Vieira <heron.vieira@windriver.com>
Change-Id: I95847762b08d49d0fe8cf144691321489ea5b2c9
2023-08-29 14:48:44 +00:00