The /etc/sysctl.d/k8s.conf file is missing the kernel parameter
values listed below, which causes kubeadm init to fail with the error
"/proc/sys/net/ipv6/conf/default/forwarding was not set to 1"
during the optimized BnR operation:
net.ipv4.ip_forward = 1
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.all.rp_filter = 0
net.ipv6.conf.all.forwarding = 1
Recent changes in the below review modified the way Kubernetes is
restored. It exposes the incorrect kernel parameters in stx-puppet.
https://review.opendev.org/c/starlingx/ansible-playbooks/+/890370
This change sets the correct kernel parameter values in the
file /etc/sysctl.d/k8s.conf during bootstrap, which fixes the
optimized BnR operation failure.
These settings are intended to exactly align with the settings
already being configured by the bringup-kubemaster task in the
ansible-playbooks.
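A minimal sketch of how the presence of these values could be checked
is shown below. It is illustrative only: the function names are
hypothetical and are not part of stx-puppet or ansible-playbooks, and
the parser assumes plain "key = value" lines with '#' or ';' comments.

```python
# Values listed in this commit message for /etc/sysctl.d/k8s.conf.
REQUIRED = {
    "net.ipv4.ip_forward": "1",
    "net.ipv4.conf.default.rp_filter": "0",
    "net.ipv4.conf.all.rp_filter": "0",
    "net.ipv6.conf.all.forwarding": "1",
}


def parse_sysctl_conf(text):
    """Parse 'key = value' lines, skipping blank lines and comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def missing_settings(text, required=REQUIRED):
    """Return the required settings that are absent or have another value."""
    found = parse_sysctl_conf(text)
    return {k: v for k, v in required.items() if found.get(k) != v}
```

Calling missing_settings() on the file contents returns an empty dict
when all four values are in place.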
Test Plan:
PASS: Fresh install ISO as AIO-SX. Verify that /etc/sysctl.d/k8s.conf
has the correct configuration values.
PASS: Performed optimized BnR on IPv4 enabled AIO-SX.
PASS: Performed optimized BnR on IPv6 enabled AIO-SX.
Closes-Bug: 2038545
Change-Id: I585117190b2372cfd7c978eff9bd9ff6da61a88f
Signed-off-by: Ramesh Kumar Sivanandam <rameshkumar.sivanandam@windriver.com>
In the change
https://review.opendev.org/c/starlingx/stx-puppet/+/897467 the OAM
firewall was not updated to pass the k8s config file as an argument
to calico_firewall_apply_policy.sh. This caused an error that
prevented the global network policy from being created, making the
OAM interface block all traffic except for failsafe traffic.
This change corrects that.
Test Plan
[PASS] In AIO-DX remove the current OAM GNP and execute lock/unlock
on one of the controllers, verify the OAM GNP is recreated.
[PASS] In AIO-DX remove the current OAM GNP and force the runtime
execution by creating the file
/etc/platform/.platform_firewall_config_required and observe
the request to recreate the OAM GNP
Closes-Bug: 2038550
Change-Id: Ica03dbf6ffd9f6f592fa53efa40293191203377a
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
Add logic to the platform::kubernetes::configuration method
to generate the kubelet's systemd override file. This
change ensures the file is generated every time a host is
unlocked. This facilitates delivery of systemd service changes
via patches to existing installs.
This change is needed by bug 2027810 to ensure that the
orphan volume cleanup script is executed as part of the systemd
ExecStartPre kubelet service override.
This change is an update of the following reverted commit:
https://review.opendev.org/c/starlingx/stx-puppet/+/895364
Test Plan:
Pass: - Update the kube-stx-override.conf.erb file
- Lock the AIO-SX host
- Unlock the AIO-SX host
- Verify that kube-stx-override.conf has been updated
- Verify AIO-SX fresh install
- Verify Standard Duplex lock/unlock and
verify that kube-stx-override.conf has been updated
- Verify Standard Duplex Install
Partial-Bug: 2027810
Change-Id: I4e47bce634c21396acb2e5f1540cac0be3ed34ec
Signed-off-by: Gleb Aronsky <gleb.aronsky@windriver.com>
Set a default value for 'onlyif' in the puppet definition
'platform::kubernetes::mask_stop_service' if no conditional
criteria are set.
This fixes a regression where the mask_stop_kubelet method calls
mask_stop_service with no value set for the onlyif parameter.
Test Plan:
- PASS: Upgrade Kubelet
Closes-Bug: 2038858
Change-Id: I06e4cf42dbe710a78443dffb83c972c73f9789bf
Signed-off-by: Gleb Aronsky <gleb.aronsky@windriver.com>
There are two issues addressed by this change:
1. The change introduced by
https://review.opendev.org/c/starlingx/stx-puppet/+/895726 prevents
the interfaces from being activated during upgrade bootstrap. Such
behaviour causes issues when specific network configurations are needed
for the upgrade to finish. This change reverts it, while making sure that
the sysinv lock issue is properly taken care of.
2. Given an interface that is already activated and configured with ip
addresses via the 'ip' command, if an attempt is made to bring it up
via the 'ifup' command, the command fails and the default route (if
present in the config file) is not properly configured. To prevent this
from happening, this change improves the logic in the function that
brings the interface down, so that the interface is guaranteed to be in
down state and have no IP addresses.
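The second point can be sketched as a command sequence. This is only
an illustration in Python: the commit message does not list the exact
commands used by the script, so the sequence below (ifdown, then
'ip link set down', then 'ip addr flush') and the function name are
assumptions.

```python
def bring_down_commands(interface):
    """Return commands meant to leave `interface` down with no IP addresses."""
    return [
        # Let ifupdown deactivate the interface and clear its own state.
        ["ifdown", interface],
        # Force the link down even if ifupdown was not managing the interface.
        ["ip", "link", "set", "down", "dev", interface],
        # Remove any addresses that were added manually via the 'ip' command.
        ["ip", "addr", "flush", "dev", interface],
    ]
```

Each entry is an argument list that could be passed to
subprocess.run(); a later 'ifup' then starts from a clean interface.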
Test plan
Systems
- AIO-SX IPv4
- AIO-SX IPv6
Test IP addresses (examples)
+---+-----------------+-----------------+
| # | IPv4 (/24 mask) | IPv6 (/64 mask) |
+---+-----------------+-----------------+
| 1 | 10.20.1.1 | fd00::1:1 |
| 2 | 10.20.1.2 | fd00::1:2 |
| 3 | 10.20.1.3 | fd00::1:3 |
| 4 | 10.20.1.4 | fd00::1:4 |
| 5 | 10.20.2.1 | fd01::1:1 |
+---+-----------------+-----------------+
Test scenarios
The initial setup for each test must follow what is described in each
corresponding scenario below. For it to be valid, the configuration in
the kernel must be in sync with the sysinv database.
1. Standard ethernet interface as OAM
- oam0
> Type: regular ethernet
> Underlying interface: 'if0' for reference
> Static IP: #2
> Gateway IP: #1
2. VLAN interface as OAM
- oam0
> Type: VLAN ID 100
> Interface name: 'vlan100' for reference
> Underlying interface: 'if0' for reference
> Static IP: #2
> Gateway IP: #1
Actions
- Edit ifcfg-<interface>:
> Manually edit /etc/network/interfaces.d/ifcfg-<interface> to change
MTU value (for example, from 1500 to 1502), this will cause the
script to detect the difference and trigger an interface update.
- Erase ifcfg-<interface>:
> Manually remove /etc/network/interfaces.d/ifcfg-<interface> file
from the filesystem.
- ifdown <interface>:
> Run command 'ifdown <interface>' to cause the interface to be
deactivated by ifupdown.
- Set link up <interface>:
> Run command 'ip link set up dev <interface>' to put interface's
link to UP state.
- Create VLAN <vlan-name> on <iface>:
> Create VLAN interface and set its link to UP through the commands
'ip link add link <iface> name <vlan-name> type vlan id <vlan-id>'
and 'ip link set up dev <vlan-name>'.
- Add IP <address> to <interface>:
> Run command 'ip address add <address> dev <interface>' to add an
IP address to the interface.
- Add route to <interface> via <address>:
> Run command 'ip route add default via <address> dev <interface>' to
add a default route to the interface.
- Modify MTU:
> Modify the MTU of the interface via sysinv (example:
'system host-if-modify controller-0 oam0 -m 1502')
[ Test Case 1 - Direct script call tests ]
For the tests, changes to the OAM interface will be made to check if
its parameters (link state, IP address, default route) are correctly
restored by the apply_network_config.sh script. The changes here are
made manually to the files and not through sysinv.
Test procedure
1. Apply initial setup.
2. Apply actions.
3. Run /usr/local/bin/apply_network_config.sh as root.
4. Check that interface state, IP address and default route in kernel
match the ones in the sysinv database.
Tests
For scenario #1
PASS For if0, edit ifcfg
PASS For if0, erase ifcfg
PASS For if0, ifdown, erase ifcfg
PASS For if0, ifdown, edit ifcfg, set link up, add address IP#2
PASS For if0, ifdown, edit ifcfg, set link up, add address IP#5
PASS For if0, ifdown, edit ifcfg, set link up, add address IP#2,
add route via IP#4
For scenario #2
PASS For vlan100, edit ifcfg
PASS For vlan100, erase ifcfg
PASS For vlan100, ifdown, erase ifcfg
PASS For vlan100, ifdown, edit ifcfg, create VLAN, add address IP#2
PASS For vlan100, ifdown, edit ifcfg, create VLAN, add address IP#5
PASS For vlan100, ifdown, edit ifcfg, create VLAN, add address IP#2,
add route via IP#4
PASS For if0, edit ifcfg
PASS For if0, erase ifcfg
[ Test Case 2 - Indirect script call on lock/unlock ]
Test Procedure
1. Apply initial setup.
2. Lock host.
3. Apply actions.
4. Unlock host.
5. Check that interface state, IP address and default route in kernel
match the ones in the sysinv database.
Tests
For scenario #1
PASS For if0, modify MTU
PASS For if0, erase ifcfg (to simulate first unlock when ifcfg-* files
don't exist)
For scenario #2
PASS For vlan100, modify MTU
PASS For if0, modify MTU
PASS For if0 and vlan100, erase ifcfg
[ Test Case 3 - Indirect script call on upgrade ]
Tests
PASS Perform upgrade on VirtualBox
PASS Perform upgrade on a physical lab
-----------------------------------------------------------------------
Closes-Bug: #2036451
Signed-off-by: Lucas Ratusznei Fonseca <lucas.ratuszneifonseca@windriver.com>
Change-Id: Ibda7c744e9be26b0bbcbd1520ffe15825ad1f60f
The implementation for worker firewall avoided using local kubectl
commands. This required access to the keyring for remote ansible
ad-hoc commands and leaves the /opt/platform/.config mounted on the
worker.
Use the kubectl command with /etc/kubernetes/kubelet.conf instead, so we
can refrain from mounting /opt/platform/.config.
Since all firewall data is generated in the host's hieradata file, the
worker node needs to be able to access the calico firewall resources.
To achieve that, a ClusterRole and ClusterRoleBinding are added via
the controller node, allowing access to only the necessary resources.
Test Plan:
[PASS] Install a standard setup and validate the worker node firewall
configuration
[PASS] Execute a DOR test in the cluster and check if the worker nodes
install the firewall GNP and HE
[PASS] Execute worker node lock/unlock and check if the worker nodes
install the firewall GNP and HE
Closes-Bug: 2038550
Change-Id: Icf31b513427120fe81c53be21b8d8a81a8e323f8
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
Changed the task ordering for the ptpinstance runtime manifest. The
previous ordering could result in a race condition where ts2phc was
restarted before the clock-conf.conf file was updated. This would
leave the PTP instance out of tolerance, skewed from the primary
clock.
Changed the order between 'platform::ptpinstance' and
'platform::ptpinstance::nic_clock', and moved the creation of the
directory "${ptp_conf_dir}/ptpinstance" to nic_clock. The nic_clock
class is the first to use it, to store the clock-conf.conf file.
Test Plan:
PASS: Using a configuration with multiple PTP instances, run the
manifest with system ptp-instance-apply. Ensure that ts2phc
starts after the nic_clock class, that ts2phc is handling all the
configured interfaces and that the clock is not skewed.
PASS: Host lock and unlock and check the puppet manifest log
Closes-bug: 2038383
Change-Id: I97e5a6bf536f05720fcaea860a83e53454e83ab6
Signed-off-by: Andre Mauricio Zelak <andre.zelak@windriver.com>
This change ensures that the isolcpu_plugin service is
stopped and masked prior to masking and stopping the
kubelet service. Additionally, on startup, the kubelet
service is unmasked and started prior to unmasking isolcpu_plugin.
This change is intended to avoid any race conditions that can
occur because of the dependency on kubelet by isolcpu_plugin,
resulting in numerous restarts of both services and
leading to failed node upgrades.
Test Plan:
- PASS: Upgrade kubelet AIO-SX
- PASS: Upgrade kubelet on a STANDARD installation
Closes-Bug: 2036985
Change-Id: Ifb2b512c3953d2a1f7efdba289a31d5a9315cae4
Signed-off-by: Gleb Aronsky <gleb.aronsky@windriver.com>
Updating just a comment that has apiVersion: kubeadm.k8s.io/v1beta2
to v1beta3.
Test Plan:
PASS: k8s upgrade from 1.22.5 through all subsequent available versions.
Story: 2010878
Task: 48836
Change-Id: Id06a0e73212ddeb5b1817a6f7de286ca37a78538
Signed-off-by: Saba Touheed Mujawar <sabatouheed.mujawar@windriver.com>
The controller needs to add user-configured DNS host records to
the /etc/hosts file before the initial unlock, when the
dnsmasq service is not running. Other personalities, such as
worker hosts and storage hosts, do not require this change.
Test Plan:
PASS: Successful build
PASS: Successful bootstrap, initial unlock of controller-0
PASS: Verify after controller-0 unlock, lock and unlock controller-1
PASS: Verify duplex controller host-swact with dns host records
PASS: Verify host records in /etc/hosts when dnsmasq is down
PASS: Verify host records absent in /etc/hosts when dnsmasq is up
Story: 2010835
Task: 48725
Change-Id: I3f11084c7458f89288ba4bef18c78e60ff55b74e
Signed-off-by: Joseph Vazhappilly <joseph.vazhappillypaily@windriver.com>
When the apply_network_config.sh script detects changes in an
interface's config file, it brings the link down and up again. This
causes any associated routes to be automatically deleted from the
kernel without being restored.
This commit adds logic to restore any affected routes when the
associated interface is brought down/up.
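The selection of routes to restore can be sketched as follows. This is
a Python illustration, not the shell logic of the script itself. It
assumes a routes file with one route per line in the layout
"network netmask gateway interface [metric]"; that layout and the
function name are assumptions, not taken from this commit.

```python
def routes_to_restore(routes_text, bounced_interfaces):
    """Return the route lines whose interface was brought down and up."""
    restore = []
    for raw in routes_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Assumed layout: network netmask gateway interface [metric].
        if len(fields) >= 4 and fields[3] in bounced_interfaces:
            restore.append(line)
    return restore
```

The interface name is matched exactly, so a change to data0.100 does
not select routes that belong to data0.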
Test plan
Setup:
System
- AIO-SX IPv4
Interfaces
- data0: ETH static IP 10.10.10.3/24
- data0.100: VLAN static IP 10.10.11.3/24
- data1: ETH static IP 10.10.20.3/24
- data1.200: VLAN static IP 10.10.21.3/24
Pre-existing routes
- rt01: 10.20.1.0 -> 10.10.10.1 via data0
- rt02: 10.20.2.0 -> 10.10.10.1 via data0
- rt03: 10.20.3.0 -> 10.10.11.1 via data0.100
- rt04: 10.20.4.0 -> 10.10.11.1 via data0.100
- rt05: 10.20.5.0 -> 10.10.20.1 via data1
- rt06: 10.20.6.0 -> 10.10.20.1 via data1
- rt07: 10.20.7.0 -> 10.10.21.1 via data1.200
- rt08: 10.20.8.0 -> 10.10.21.1 via data1.200
Routes to be added
- rt09: 10.20.9.0 -> 10.10.10.1 via data0
- rt10: 10.20.10.0 -> 10.10.11.1 via data0.100
- rt11: 10.20.11.0 -> 10.10.20.1 via data1
- rt12: 10.20.12.0 -> 10.10.21.1 via data1.200
Actions
- Change interface: manually edit /etc/network/interfaces.d/ifcfg-* to
change MTU value, this will cause the script to detect the
difference and trigger an interface update.
- Add route: manually edit /var/run/network-scripts.puppet/routes
to add a new route, this will cause the script to detect the
difference and trigger an update in the routes.
- Remove route: manually edit /var/run/network-scripts.puppet/routes
to remove a route.
Procedure for direct tests
1. Perform action
2. Run /usr/local/bin/apply_network_config.sh as root
3. Validate that routes in /var/run/network-scripts.puppet/routes and
in the kernel match
Direct tests:
PASS Change data0.100
PASS Change data0
PASS Change both data0.100 and data1.200
PASS Change both data0 and data1
PASS Add route rt09
PASS Remove route rt01
PASS Change data0.100 and add route rt10
PASS Change data0.100 and remove route rt03
PASS Change data0 and add routes rt09 and rt10
PASS Change data0 and remove routes rt01 and rt03
PASS Change data0 and data1, add routes rt09, rt10, rt11, rt12
PASS Change data0 and data1, remove routes rt01, rt03, rt05, rt07
PASS Repeat all the previous tests in an equivalent IPv6 setup
Indirect tests:
PASS Lock system, change MTU of data0 via 'system host-if-modify',
unlock system
PASS Lock system, erase /etc/network/interfaces.d/ifcfg-* and
/etc/network/routes, unlock system
Closes-Bug: #2036667
Signed-off-by: Lucas Ratusznei Fonseca <lucas.ratuszneifonseca@windriver.com>
Change-Id: If7f301ff797a6644aaa3ff7f73de530d680c6b3b
This commit aims to handle the new platform crashdump service
parameters in order to create a crash-dump-manager configuration
file.
This file will be located in /etc/default/crash-dump-manager, and
will be used as an EnvironmentFile for the crashDumpMgr service, in
order to set the new parameters.
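A rough sketch of how such an EnvironmentFile could be rendered from
the service parameters is shown below, in Python for illustration.
Only 'max_files' is named in this commit message; the 'max_size'
parameter, the default values, the upper-case variable names and the
size pattern are all hypothetical.

```python
import re

# Hypothetical defaults; only 'max_files' is named in the commit message.
DEFAULTS = {"max_files": "4", "max_size": "5G"}
INT_RE = re.compile(r"^[0-9]+$")
SIZE_RE = re.compile(r"^[0-9]+[KMGT]$")  # assumed "human readable" form


def render_env_file(params):
    """Render KEY=value lines, falling back to defaults for unset parameters."""
    merged = {**DEFAULTS, **params}
    if not INT_RE.match(merged["max_files"]):
        raise ValueError("Parameter 'max_files' must be an integer value.")
    for name, value in merged.items():
        if name != "max_files" and not SIZE_RE.match(value):
            raise ValueError(
                f"Parameter '{name}' must be written in human readable format"
            )
    return "".join(f"{name.upper()}={value}\n" for name, value in sorted(merged.items()))
```

Deleting a parameter from the input simply brings its default back,
which matches the behaviour described in TC3 of the test plan.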
Test Plan:
PASS: TC1
add new parameters in the service-parameter table
with valid value
--> parameter added in service-parameter table
--> crash-dump-manager updated with new values
PASS: TC2
modify new parameters in the service-parameter table
with valid value
--> service-parameter table updated with new value
--> crash-dump-manager updated with new value
PASS: TC3
delete new parameters from the service-parameter table
--> parameter deleted from service-parameter table
--> crash-dump-manager updated with default value
PASS: TC4
add/modify new parameters in the service-parameter table
with no value
--> msg: "The service parameter value is mandatory"
--> parameter not added/modified in service-parameter table
--> crash-dump-manager not updated
PASS: TC5
a)add/modify max_files in the service-parameter table
with wrong format
--> msg: "Parameter 'max_files' must be an integer value."
--> parameter not added/modified in service-parameter table
b)add/modify the other parameters in the service-parameter
table with wrong format
--> msg: "Parameter <value> must be written in human readable
format"
--> parameter not added/modified in service-parameter table
PASS: TC6
reboot host with new parameters configured
--> service-parameter table keeps new parameters value
--> crash-dump-manager keeps configured values
PASS: ISO installation
Story: 2010893
Task: 48766
Signed-off-by: Enzo Candotti <enzo.candotti@windriver.com>
Change-Id: Ia02e73462802c7831331bed6dec98a6b1cb37020
During upgrade bootstrap, the runtime execution of the puppet class
platform::network::runtime created a lock timeout with sysinv-agent
to allow the interface configuration in the kernel. The lock exists
to allow sysinv-agent to collect interface information for the
system inventory.
The optimized upgrade feature created this runtime execution to fill
the contents of /etc/network/interfaces.d/ and /etc/network/routes
so they are available during the network bringup phase, which happens
after the system unlock, at an earlier step than the regular puppet
manifest execution. But as part of its execution,
apply_network_config.sh also brings up the interfaces, accessing the
script's protected section and causing the lock timeout.
This change uses the file /var/run/.network_upgrade_bootstrap to
indicate that an upgrade bootstrap is under way, so that the script
only populates the /etc/network files and does not activate the
interfaces.
The interface activation is not needed as the bootstrap is still under
way and the minimal network configuration is already provided prior to
the bootstrap. After the unlock, the files in /etc/network provide
faster network availability, as systemd's network service is one of
the first to be executed during boot.
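The flag-file check can be sketched as follows. This is a Python
illustration of the decision only; the real logic lives in the shell
script, and the function name is hypothetical.

```python
import os

# Flag file named in this commit message.
UPGRADE_FLAG = "/var/run/.network_upgrade_bootstrap"


def should_activate_interfaces(flag_path=UPGRADE_FLAG):
    """Activate interfaces only when no upgrade bootstrap is under way."""
    return not os.path.exists(flag_path)
```

When the flag exists, the caller would write the /etc/network files
and skip the interface activation step.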
Test Plan:
[PASS] in 21.12 add an assorted network configuration with
- vlan, bonded, and ethernet interfaces (the vlan interface on
top of both base interfaces)
- configure the bond interface with static address
- add routes on all platform and data interfaces
[PASS] execute lock/unlock in 21.12 to verify config is applied
[PASS] execute AIO-SX upgrade to 22.12, at the bootstrap end check
files in /etc/network/
[PASS] finish upgrade and after unlock verify the network access and
address, interfaces, and route creation in the kernel
Closes-Bug: 2036451
Change-Id: Ib04f72298252a52a8a05cf644671106ad6530e5f
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
This reverts commit 7d870177c6c01f5b47a33ef1cd7ce92c3c58694f.
Reason for revert: blocking testing due to lock/unlock failures
Change-Id: Idc746dfc1076ef2f45a9790c7fecf4bb848162e2
The script /etc/init.d/etcd is used by the service manager for
management of the etcd service. The call '/etc/init.d/etcd status'
uses the etcdctl health API to determine whether the service is
running properly. If the etcd certs are replaced with new ones
but the service has not yet been restarted to use them, the
status call fails even though the service is running fine, and
the service manager treats the service as failed.
'sm-audit' (which is run periodically) uses the '/etc/init.d/etcd status'
call to determine and maintain the service health. A service manager
that receives a false service status may introduce a lot of bugs.
One such scenario is that 'sm' ignores the 'service restart' call
if it thinks service is disabled. This leads to etcd not being
restarted with new certs during upgrade activate and not being
reachable to the kube-apiserver (which may have started using new
client certs).
This change modifies the '/etc/init.d/etcd status' call so that it
does not rely only on the etcd health API to determine if the etcd
service is running: it also checks for the existence of etcd runtime
information when the health API fails with the 'bad certificate' error.
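The resulting status decision can be sketched as below. This is a
Python illustration of the logic described above, not the init script
itself; the function name, its inputs and the returned strings are
hypothetical.

```python
def etcd_status(health_ok, health_error, runtime_info_present):
    """Decide the reported status from the health check and runtime info."""
    if health_ok:
        return "running"
    # Certs were replaced but etcd was not restarted yet: the health API
    # fails with 'bad certificate' although the process is still alive.
    if "bad certificate" in health_error and runtime_info_present:
        return "running"
    return "not running"
```

Any other health failure, or a certificate error without runtime
information, is still reported as not running.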
Test Plan:
PASS: Replace old certs with new certs at /etc/etcd/ and do not
restart the service. Check that the '/etc/init.d/etcd status'
is 'running'.
PASS: Replace old certs with new certs at /etc/etcd/ and restart
the service. Check that the '/etc/init.d/etcd status' is
'running'.
Closes-Bug: 2033942
Change-Id: Id30a262ca1bde6d8acb85de10882ca9bd4b59bdd
Signed-off-by: kaustubh.dhokte <kaustubh.dhokte@windriver.com>
Tests in DC labs intended to bring up up to 1000 subclouds showed
problems creating all routes when it was necessary to add the
firewall update in the same puppet manifest apply that handled
platform::network::routes::runtime, generating an ever-growing queue
that caused timeouts during subcloud creation by DC manager.
This change adds the firewall classes to be run from inside
platform::network::routes::runtime to allow both the route
configuration and the firewall update.
Test Plan:
[PASS] In a SysCtrl, add/remove a single route (via CLI/sysinv-API)
if in a controller, verify route and firewall update
[PASS] In a SysCtrl, add/remove a single route (via CLI/sysinv-API)
if in a worker, verify only route update is executed
[PASS] In a SysCtrl, add/remove 50 routes (via CLI/sysinv-API)
using the parallel bash command:
seq 1 50|parallel --jobs 25 --eta '\
system host-route-add 1 mgmt0 51.{}.{}.0 24 192.168.0.1 1 && \
system host-route-add 2 mgmt0 51.{}.{}.0 24 192.168.0.1 1;'
[PASS] In a SysCtrl, add a subcloud AIO-SX and validate route and
firewall update
[PASS] Install a subcloud and check the SysCtrl network is installed
as a route and in the mgmt firewall
Closes-Bug: 2033919
Change-Id: Ic5bf9bc84b8c583a39d8c91b72caef4d84240123
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
It was found that when applying a patch containing the code for
enabling the admin network on distributed cloud systems, a
problem could occur after unlock because of an undefined value
for 'private_dc_ip_address'.
This is because the value of this puppet parameter is only set
in the system.yaml hieradata, which is not written on normal
host lock/unlock.
The value is generated however, when a user assigns an interface
to the admin network, or performs another action that leads to
the system hieradata being generated.
If the user has not assigned an interface to the admin network,
or if the user has no interest in using the admin network, this
commit ensures that the value of the private_dc_ip_address is
that of the private_ip_address (ie. mgmt).
Setting the private_dc_ip_address to the private_ip_address
as default ensures that every component using the haproxy::params
behaves as it did pre-patch.
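The default can be sketched as a simple fallback. The snippet is a
Python illustration of the behaviour, not the puppet code; the
function name is hypothetical and a plain dict stands in for the
hieradata lookup.

```python
def effective_private_dc_ip(params):
    """Return the DC address, defaulting to the mgmt address when unset."""
    value = params.get("private_dc_ip_address")
    return value if value else params["private_ip_address"]
```

With no admin network configured, consumers receive the same address
they used before the patch.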
Testing:
Success path:
- Install, lock / unlock AIO-SX system (stx 8.0 + patch)
- Install, lock / unlock an AIO-DX DC subcloud (stx 8.0 + patch)
- Update a DC subcloud to use the admin network after the system
has been patched.
Regression:
- Install, lock / unlock AIO-SX system (dev build)
- Install, lock / unlock an AIO-DX DC subcloud (dev build)
- Install a DC subcloud using the mgmt. network (dev build)
- Install a DC subcloud using the admin network (dev build)
Story: 2010319
Task: 46911
Signed-off-by: Steven Webster <steven.webster@windriver.com>
Change-Id: Ice16a4a3a2d9faa461c1c0f98d1ea0b0c7ce751a
This reverts commit a1784deca9d30848f05d2ca53e66cf832d54b0da.
Reason for revert:
The whitelist was created to ignore the warning only on VMs; however,
the ignored warning was also seen on real servers.
This needs to be tested more extensively on different
types of servers.
Story: 2010757
Task: 48644
Change-Id: I979b4269d0e8f68b5ea0c8471b14e666a437730d
Signed-off-by: Lucas Borges <lucas.borges@windriver.com>
Every puppet apply generates cached files
in /var/cache/puppet/clientbucket, but the script was
cleaning a different directory,
/var/lib/puppet/clientbucket. This change updates the cache path.
The vardir setting in puppet.conf, which set the cache directory,
was removed in
https://review.opendev.org/c/starlingx/integ/+/830542
Without that setting, puppet stores the cache under
/var/cache/puppet/clientbucket.
Test Plan:
PASS: Ensure the cache directory is removed
after the puppet apply (AIO-SX)
Closes-Bug: 2034932
Change-Id: If564d00bc09030d0543669a11a646fcf502bf65b
Signed-off-by: Lucas Borges <lucas.borges@windriver.com>
- Modified manifests/network.pp to enable sriov when the driver is vfio-pci
Test plan:
01. PASSED - system host-lock controller-0
02. PASSED - Configure any NIC that has status UP
(check with the command: ip addr)
03. PASSED - system host-if-modify -m 1500 -n sriov0 -c pci-sriov \
-N 63 --vf-driver=vfio controller-0 <interface_name>
04. PASSED - system host-unlock controller-0
05. PASSED - system application-upload <sriov-fec-operator-version.tgz>
06. PASSED - system application-apply sriov-fec-operator
07. PASSED - cat /sys/module/vfio_pci/parameters/enable_sriov eq. "Y"
08. PASSED - cat /sys/module/vfio_pci/parameters/disable_idle_d3 eq. "Y"
(In this example I'm using ACC200 but it can be ACC100 or N3000)
09. PASSED - kubectl apply -f sriov-fec-config-acc200-vfio-pci.yaml
10. PASSED - kubectl apply -f acc200.yml
11. PASSED - kubectl exec acc200 -it -- bash
12. PASSED - echo $PCIDEVICE_INTEL_COM_INTEL_FEC_ACC200
13. PASSED - Run the script below in the container
CPU_SET=$(taskset -pc 1 | sed -e 's/pid 1.s current \
affinity list://g')
for d in $(lspci -d 8086:57c1 | cut -d' ' -f1) ; do
/opt/sysroot/usr/local/bin/dpdk-test-bbdev \
--vfio-vf-token=02bddbbf-bbb0-4d79-886b-91bad3fbb510 \
-l $CPU_SET -a $d -- -c validation \
-v /opt/sysroot/usr/local/app/test-bbdev/ldpc_dec_default.data
done
14. PASSED - DPDK Validation with testpmd was performed
Closes-bug: #2033103
Signed-off-by: Bezerra Filho, Moacir <Moacir.BezerraFilho@windriver.com>
Change-Id: I880551fafbb351c370acedeaf43f9de595fea0af
- Add keyword retry_on in haproxy::backend
- Add retry_on values in keystone.pp
- Change keystone_http_connect_timeout from 10 to 15 in api.pp, api_proxy.pp, certalarm.pp and certmon.pp
This workaround solves:
- DC Scale | RR Patch Orchestration fails as it cannot retrieve patches for subcloud after the apply
- DC Patch - Parallel patch orchestration fails to establish connection to MGMT interface of subclouds
- Patch orchestration fail due to transient keystone errors
Test plan:
1. (PASSED) Patch Creation:
- Construct a "reboot required" RR patch that encompasses the specified changes.
- Generate an "in-service test" NRR patch.
2. (PASSED) Initial Setup:
- Commission a DC system with over 500 subclouds.
- Assert that the patch encompassing the fix is applied successfully on the DC.
3. (PASSED) Strategy Creation and RR Patch Deployment (Max 250 Subclouds):
- Created a RR patch strategy with max_parallel_subclouds set to 250
- Checked that the RR patch strategy is applied to all subclouds successfully.
- Repeat this process for 250 more subclouds
- Checked that the patch strategy is applied to all subclouds successfully.
4. (PASSED) Strategy Alteration and NRR Patch Deployment (Max 500 Subclouds):
- Eliminate the existing patch strategy.
- Initiate a NRR patch strategy, adjusting the max_parallel_subclouds parameter to 500.
- Checked that the "in-service test" NRR patch is successfully applied across all subclouds and that no linked issues arise.
Closes-Bug: #2025646
Change-Id: I95e9c8f3cd904d7f637da2ea69a83fd7fa5f03a1
Signed-off-by: Bezerra Filho, Moacir <Moacir.BezerraFilho@windriver.com>
Move generation of kubelet's systemd override file,
kube-stx-override.conf, from platform::kubernetes::master::init
to platform::kubernetes::configuration so that the file will
be generated on every host unlock. This facilitates delivery
of systemd service changes via patches to existing installs.
This change is needed by bug 2027810 to ensure that the
orphan volume cleanup script is executed as part of the systemd
ExecStartPre kubelet service override.
Test Plan:
Pass: - Update the kube-stx-override.conf.erb file
- Lock the host
- Unlock the host
- Verify that kube-stx-override.conf has been updated
- Verify AIO-SX
- Verify Standard config
Partial-Bug: 2027810
Change-Id: I3b496abc807bf75716d28079c62ef4700dcd3244
Signed-off-by: Gleb Aronsky <gleb.aronsky@windriver.com>
Add option 'conf-file' in dnsmasq conf file to use additional
config file in persistent store. This additional conf file is used
to support host-record option of dnsmasq.
Users can add A, AAAA and PTR records to the DNS with TTL support.
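As an illustration, a host-record line for the additional conf file
could be built as below. The sketch is in Python and follows the
dnsmasq option syntax host-record=<name>[,<name>...],[<IPv4>],[<IPv6>][,<TTL>];
the helper name and the example host name are hypothetical.

```python
import ipaddress


def host_record_line(names, addresses, ttl=None):
    """Build one dnsmasq 'host-record=' line from names and addresses."""
    # ip_address() raises ValueError for anything that is not a valid address.
    parsed = [ipaddress.ip_address(a) for a in addresses]
    ipv4 = [str(a) for a in parsed if a.version == 4]
    ipv6 = [str(a) for a in parsed if a.version == 6]
    if len(ipv4) > 1 or len(ipv6) > 1:
        raise ValueError("at most one IPv4 and one IPv6 address per record")
    fields = list(names) + ipv4 + ipv6
    if ttl is not None:
        fields.append(str(int(ttl)))
    return "host-record=" + ",".join(fields)
```

For example, host_record_line(["host1.example"], ["10.20.1.2"], ttl=300)
yields a single line that adds an A record (and its PTR) with a
300 second TTL.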
Test Plan:
PASS: Successful build
PASS: Successful bootstrap and unlock
PASS: Verify ping hostnames by adding host-record in new conf file
Story: 2010835
Task: 48724
Change-Id: I583d1cf783a32a3dc6f2d1c9786287a7f64809a3
Signed-off-by: Joseph Vazhappilly <joseph.vazhappillypaily@windriver.com>
Continuing the efforts from [1], this review consists in removing all
dependencies related to amqp classes as well as initializations for
rabbitmq variables. This removal can be done because sysinv does not
use rabbitmq.
Test plan
PASS - Perform fresh install and bootstrap in an AIO-SX successfully
PASS - Perform fresh install and bootstrap in an AIO-DX successfully
PASS - Run any system command successfully (system host-list, system application-list, etc)
Story: 2010802
Task: 48578
[1] - https://storyboard.openstack.org/#!/story/2010802
Change-Id: I5da60b97ac8808d95d5b76ade065ea521e62e251
Signed-off-by: Samuel Toledo <samuel.presatoledo@windriver.com>
Script puppet-manifest-apply.sh was patched until the warnings in
running the bootstrap, aio unlock and runtime manifests were resolved on
Debian: https://review.opendev.org/c/starlingx/stx-puppet/+/844121
In addition to the patch reversal, a whitelist was created for virtual
environments, where any warnings during the development process can be
added. The whitelist was implemented with an initial warning ("Could not
retrieve fact ipaddress"), that only affects virtual environments.
Test Plan:
PASS: Build & Install
PASS: AIO-SX & AIO-DX Successful Bootstrap
PASS: AIO-SX & AIO-DX Successful Unlock
PASS: Verified that warnings added to the whitelist don't affect
manifests execution in virtual environments.
Story: 2010757
Task: 48644
Change-Id: Id67facc82bed7e069efb5f52e0775a69da355de0
Signed-off-by: Luis Marquitti <luis.eduardoangelinimarquitti@windriver.com>
Based on the goal of the story 2010802, this review replaces
the puppet template that calls the manage-partitions script
with a shell script, saving time by not rendering the puppet
template, just executing the shell script.
Tests showed a time reduction of around 10%.
Test plan
PASS: AIO-SX fresh install, bootstrap and initial unlock.
PASS: AIO-DX fresh install, bootstrap and initial unlock
for all nodes.
PASS: Standard (2+2) fresh install, bootstrap and initial
unlock for all nodes.
PASS: AIO-SX lock and unlock after install.
PASS: AIO-DX lock and unlock after install.
PASS: Standard (2+2) lock and unlock after install.
PASS: Standard (2+2):Test manage-partitions script with
modify, delete and create operations on all hosts.
PASS: SX:Test manage-partitions script with modify,
delete and create operations.
Story: 2010802
Task: 48595
Signed-off-by: Heron Vieira <heron.vieira@windriver.com>
Change-Id: I95847762b08d49d0fe8cf144691321489ea5b2c9