1704 Commits

Author SHA1 Message Date
Roger Ferraz
580e839d9e starlingx/stx-puppet README improvement
This story updates the README files of a few of the most used StarlingX
repos.

Test Plan: N/A

Story: 2010814
Task: 48355

Change-Id: If7d4825337a8057d3be540d96885c7956b857730
Signed-off-by: Roger Ferraz <rogerio.ferraz@encora.com>
2023-07-19 12:21:22 -03:00
Zuul
28fd21419f Merge "Fix bootstrap failure for subclouds w/ complex passwords" 2023-07-11 14:08:35 +00:00
Zuul
d7ad7e7a15 Merge "Add parameter to openrc and change permissions" 2023-07-10 21:24:08 +00:00
Zuul
5d6fdaa80c Merge "Avoid "systemctl is-active" calls to prevent process restarts" 2023-07-10 14:14:29 +00:00
Joao Victor Portal
9a3f92eed9 Add parameter to openrc and change permissions
This commit changes the file "/etc/platform/openrc" to allow its usage
by other users. The parameter "--no_credentials" was added for
this purpose. Also, the permissions of this openrc file were changed to
0644 to allow its usage by users with no privileges.
The typical use case is an LDAP user, with or without privileges, who
sources this openrc file and then sets the variables OS_USERNAME,
OS_PASSWORD and PS1 (which uses OS_USERNAME).
The test that checks whether the controller is the active one also
changed. Previously, it only tested whether the retrieved password was
empty. Because an empty password may now also be caused by a user with
insufficient privileges, the test now checks whether the executable file
"keyring_file" exists (it exists only on the active controller, which is
why a standby controller gets an empty password).

Test Plan:

PASS: Successfully deploy an AIO-DX containing this change. Check that
the permissions of "/etc/platform/openrc" are 644, owner root, group
sys_protected.
PASS: In the deployed AIO-DX, create 2 users: user1 is not part of
groups sys_protected and root, user2 is part only of group
sys_protected.
PASS: In the active controller of AIO-DX, using users user1 and user2,
execute the following commands: for "source /etc/platform/openrc
--no_credentials" command, the result for all users is that the file is
sourced without errors; for "source /etc/platform/openrc; system
host-list", user1 gets a message saying it doesn't have privileges to
read keyring password and an error message for system command, while
user2 gets the commands executed without errors.
PASS: Repeat the test above for standby controller: for "source
/etc/platform/openrc --no_credentials" command, all users get a message
saying it should only be loaded from active controller; for "source
/etc/platform/openrc; system host-list", also a message is printed
saying it should only be loaded from active controller and an error
message appears for system command.

Partial-Bug: 2024627
Signed-off-by: Joao Victor Portal <Joao.VictorPortal@windriver.com>
Change-Id: I6ef2ca16a272d1fc7c4a24b9f5b48a9cb860450f
2023-06-30 20:08:08 +00:00
Thales Elero Cervi
6e4f3df557 Handle sysinv dpdk_elf_file configuration
As part of the Debian migration, the sysinv procedure that checks DPDK
compatibility for each host interface was also updated to make it
customizable, in case one would like to use a virtual switch other than
the delivered OVS with DPDK support [1].

For other virtual switches, which might or might not rely on DPDK, the
ELF target that sysinv uses to verify interface compatibility must be
customizable, and the query_pci_id script is already able to use custom
values [2].

This change adds to puppet the system configuration that will write, if
defined, the correct value for the ELF path. This platform parameter can
be overridden on the hiera data so puppet will update sysinv.conf
accordingly.
For now, when deploying StarlingX with vswitch_type=ovs-dpdk we will
override it to the query_pci_id script default value (i.e., the
/usr/sbin/ovs-vswitchd ELF) using the respective sysinv puppet module
and leave it as an example for anyone who later uses a different
vswitch that requires this customization [3].

[1] https://review.opendev.org/c/starlingx/config/+/872979
[2] 2cd0b1e14a/sysinv/sysinv/sysinv/scripts/query_pci_id (L34)
[3] https://review.opendev.org/c/starlingx/config/+/887106

Test Plan:
PASS - Build puppet-manifest package
PASS - Build a custom stx ISO with the new package
PASS - Bootstrap AIO-SX virtual system (vswitch_type=none)
       and ensure the hiera data was not modified and
       sysinv.conf was not updated
PASS - Bootstrap AIO-SX virtual system (vswitch_type=ovs-dpdk)*
       and ensure the hiera data was modified correctly and
       sysinv.conf was updated accordingly
* A successful complete installation with ovs-dpdk is still blocked by
a bug that will be solved soon:
https://bugs.launchpad.net/starlingx/+bug/2008124

Story: 2010317
Task: 46389

Signed-off-by: Thales Elero Cervi <thaleselero.cervi@windriver.com>
Change-Id: Iaf31d3b5e2fc03b4783473e4329a780a516a9d43
2023-06-30 10:03:43 -03:00
Manoel Benedito Neto
23479e7183 Fix bootstrap failure for subclouds w/ complex passwords
This commit adds single quotes around user password parameter value to
ensure that complex passwords are valid when user option setup script
is executed by puppet bootstrap.
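
The effect of the quoting can be illustrated outside puppet. The block
below is a minimal Python sketch, not the code from this change: it uses
the standard-library shlex.quote to single-quote a value so that a POSIX
shell treats special characters literally. The helper name and the
password are made-up examples.

```python
import shlex


def quote_for_shell(value: str) -> str:
    """Return value quoted so a POSIX shell passes it through unchanged."""
    return shlex.quote(value)


# Hypothetical complex password ending in an open parenthesis.
password = "Ab1$%^&*("
quoted = quote_for_shell(password)

# Splitting the quoted form with shell rules yields the original string.
assert shlex.split(f"--password {quoted}") == ["--password", password]
```

Without the quotes, a shell would try to interpret characters such as
"$", "&" and "(" instead of passing them on as part of the password.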

Test Plan:
PASS: Full build, system install, bootstrap and unlock DC system, with
      one subcloud bootstrapped and unlocked with active enabled
      available status.
PASS: Add, bootstrap, manage and unlock a subcloud with a complex
      password containing special characters, numbers, capital letters
      and an open parenthesis at the end of the sentence.

Closes-Bug: 2025292

Change-Id: Ia5430084bf6b16c78594a2483f2b88ec9b18f36a
Signed-off-by: Manoel Benedito Neto <Manoel.BeneditoNeto@windriver.com>
2023-06-28 16:36:25 -03:00
Andre Kantek
0ec63a4667 Use L4 ports used in OAM firewall from system.yaml
In order to unify implementation with the other platform firewalls,
the hard-coded values are set to 'undef' and will be provided by
sysinv in system.yaml

The test below validates the correct values are present in the OAM
firewall

Test Plan:
[PASS] Install, Lock, Unlock AIO-SX
[PASS] Install, Lock, Unlock AIO-DX (as SystemController)

Story: 2010591
Task: 48255

Depends-On: https://review.opendev.org/c/starlingx/config/+/885585
Change-Id: Idc1f71f7ba762dc76529022acf4145db00686ec2
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
2023-06-23 13:26:14 -03:00
Boovan Rajendran
867447b5d7 Wait for control-plane endpoints health during k8s upgrade abort
This script waits for the k8s control-plane component endpoints
(apiserver, scheduler, controller-manager, kubelet) to be up and
running at the end of platform::kubernetes::upgrade_abort.

Retry/timeout parameters are configured to wait up to 3 minutes.
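
A retry loop of this kind can be sketched as follows. This is an
illustration in Python, not the script added by this change; the values
of 36 attempts spaced 5 seconds apart are an assumption chosen only
because they add up to 3 minutes.

```python
import time
from typing import Callable


def wait_until_healthy(check: Callable[[], bool],
                       retries: int = 36,
                       interval: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll check() until it returns True or all retries are used up."""
    for _ in range(retries):
        if check():
            return True
        sleep(interval)  # wait before the next attempt
    return False  # caller would log the timeout message here
```

The sleep function is a parameter so the loop can be exercised without
actually waiting.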

Test plan:
Pass: Verify the abort waits for all control-plane endpoints to be
healthy.
Pass: Verify /var/log/kubernetes/k8s-endpoints-health.log shows
'Timeout: Kubernetes control-plane endpoints not healthy' message
after the timeout is exceeded.

Story: 2010565
Task: 48203

Depends-On: https://review.opendev.org/c/starlingx/config/+/885582

Change-Id: I232b4746a3eb899ba87e706160547e8792489394
Signed-off-by: Boovan Rajendran <boovan.rajendran@windriver.com>
2023-06-22 12:12:44 -04:00
Zuul
52d9a8d58d Merge "Puppet: Fix unacceptable location warnings" 2023-06-20 12:58:03 +00:00
Zuul
3162eb6667 Merge "Add sssd systemd service file override" 2023-06-19 14:27:58 +00:00
Andy Ning
559b79b72e Add sssd systemd service file override
sssd is monitored by pmon. But currently the Restart option in its
systemd service file is set to on-failure. This sometimes causes
systemd and pmon to fight to restart the service when it fails. All
processes monitored by pmon should have Restart set to "no".

This change added a systemd override file to set Restart to "no" for
sssd service.

Test Plan:
PASS: Standard system deployment.
PASS: Check sssd Restart option using "systemctl cat sssd", verify
      Restart option is set to "no", as follows:

      # /etc/systemd/system/sssd.service.d/sssd-stx-override.conf
      [Service]
      # pmond monitors sssd service
      Restart=no
PASS: Kill the sssd process, verify pmon restarts it successfully by
      tailing pmon.log, and verify sssd is running by "systemctl
      status sssd" command.

Closes-Bug: 2023421
Signed-off-by: Andy Ning <andy.ning@windriver.com>
Change-Id: I84521caf3745122492afe9ef4a251e42129b29b0
2023-06-16 10:36:24 -04:00
Carmen Rata
d2b27e7440 Create group for ldap users with denied ssh access
Local OpenLDAP and WAD servers are being used for k8s api and SSH
authentication. We need the ability to disallow SSH authentication
for selective users. As part of the solution, we create a Linux
group where all ldap users with "denied ssh access" will be added.
The group will be set for denied ssh access in the sshd configuration.
The sshd configuration change is part of a separate commit.

Test Plan:
PASS: Debian image gets successfully installed in AIO-SX system.
PASS: Verify the Linux group has been created.
PASS: Create an openldap user and add to the "deny ssh access" group.
Verify that the user cannot ssh.
PASS: Create a WAD group with the same name and gidNumber as the
Linux group for "deny ssh access". Create a WAD user in this group.
Validate that the new WAD user in the "deny ssh group" cannot ssh
to stx platform.
PASS: Remove the WAD user from the WAD "deny ssh access" group.
Validate that now the user can have ssh access to stx platform.
PASS: Remove the openldap user from the Linux "deny ssh access" group.
Validate that now the user can have ssh access to stx platform.

Story: 2010589
Task: 48234

Signed-off-by: Carmen Rata <carmen.rata@windriver.com>
Change-Id: Ib1229f21e207d66d39f8bcdb7acf0533ace527c1
2023-06-15 00:24:29 +00:00
Zuul
c5719c38d1 Merge "Add grub update to restore" 2023-06-14 21:36:35 +00:00
Matheus Guilhermino
665af9b8e8 Puppet: Fix unacceptable location warnings
The names of classes/defines should match the name that's
implied by their file path. Puppet throws an "unacceptable
location" warning whenever this condition is not satisfied.
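
The standard Puppet naming convention behind this rule can be sketched
in Python. The function name is hypothetical, and the mapping shows only
the canonical location for a name; the exact check Puppet performs
before emitting the warning may be more lenient than this.

```python
def expected_manifest_path(class_name: str) -> str:
    """Map a Puppet class or define name to its conventional file path."""
    module, *rest = class_name.split("::")
    if not rest:
        # The class named after the module itself lives in init.pp.
        return f"{module}/manifests/init.pp"
    return f"{module}/manifests/{'/'.join(rest)}.pp"


print(expected_manifest_path("platform::grub::kernel_image"))
```

The path is relative to the directory that holds the modules.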

Test Plan:
PASS: Build & install
PASS: AIO-SX Successful Bootstrap
PASS: AIO-SX Successful Unlock
PASS: Verified that 'unacceptable location' warnings are
      no longer present on puppet.log

Story: 2010757
Task: 48026

Change-Id: I1cd3d09e90bfeb3d206b540717943ea1e6413444
Signed-off-by: Matheus Guilhermino <matheus.machadoguilhermino@windriver.com>
2023-06-14 19:39:28 +00:00
Zuul
a38e81c143 Merge "Config sssd on storage node" 2023-06-14 14:04:25 +00:00
Joshua Kraitberg
16589c4f3d Add grub update to restore
This ensures that the kernel boot args are correct.
When they are not correct, puppet will trigger a reboot
after unlocking to fix them.

TEST PLAN
PASS: AIO-SX backup and restore
  * New backup will include /boot files
  * Non-default kernel boot args will be kept
  * No double reboot
  * /proc/cmdline can be used to verify kernel boot args
PASS: AIO-SX backup and restore
  * Remove new /boot files from backup
  * Restore with modified backup
  * Non-default kernel boot args will be lost
  * No double reboot
  * /proc/cmdline can be used to verify kernel boot args

Partial-Bug: 2023678
Change-Id: I5f0c91c0c8583f4a86148ddf0fadc03b18ff9c1a
Signed-off-by: Joshua Kraitberg <joshua.kraitberg@windriver.com>
2023-06-14 09:47:00 -04:00
Davi Frossard
319622d42d Avoid "systemctl is-active" calls to prevent process restarts
Replaces "systemctl is-active" calls with a "pid file check" approach
for the docker-distribution (docker-registry) and registry-token-server
services. These calls were causing unnecessary process restarts in
cases where systemd was halted due to contention on kernfs_mutex.
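
A pid file check in general terms can be sketched as below. This is a
Python illustration of the technique, not the check implemented by this
change; the function name is hypothetical.

```python
import os


def running_from_pidfile(pidfile: str) -> bool:
    """Return True if pidfile names a process that currently exists."""
    try:
        with open(pidfile) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return False  # missing, unreadable, or malformed pid file
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 tests for existence, sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

Unlike "systemctl is-active", this check does not depend on systemd
answering, so it is unaffected when systemd itself is stalled.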

Test Plan:
PASS: Verify docker-distribution status
PASS: Verify registry-token-server status

Partial-bug: 2016028
Change-Id: I2398d7f397ad14d2ff1ff6d40141ffad4f54f2e3
Signed-off-by: Davi Frossard <dbarrosf@windriver.com>
2023-06-14 13:07:17 +00:00
Andy Ning
cc2e4d086e Config sssd on storage node
Currently sssd is not configured or running on storage nodes, so
ldap users can't log in to storage nodes. This update gets sssd
configured and running on storage nodes (with a follow-up update).

Test Plan:
PASS: System with storage nodes deployment
PASS: In storage nodes, verify that the following config file exists:
      /etc/sssd/sssd.conf

Closes-Bug: 2023399
Signed-off-by: Andy Ning <andy.ning@windriver.com>
Change-Id: I383c101e0f99be93e9da528411c6fa1fd8cde4c6
2023-06-12 09:33:50 -04:00
Andre Kantek
b4d16baa2e Create class to update the admin firewall in runtime
This change creates a class to update the admin firewall during
runtime operations

Test Plan:
[PASS] in subcloud mode, add/remove static routes in the mgmt network
[PASS] in subcloud mode, add/remove static routes in the admin network

Story: 2010591
Task: 48202

Change-Id: I3a4025cb8c6ff8d90ba36b49e2aaa12d0ec7057b
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
2023-06-08 13:14:52 -03:00
Zuul
baaadc0d05 Merge "Create class to update the mgmt firewall in runtime" 2023-06-06 16:49:25 +00:00
Jorge Saffe
64aea5ba6d Fix k8s custom configuration configmap init flag
After system bootstrap, when service-parameter-apply is executed
for the first time, it verifies that the k8s configmaps linked to
service-parameters extra-volumes exist, then creates a flag
(configmap_initialization_flag) to skip this step on subsequent
runs. If the flag is not generated, the k8s custom configuration
script checks the k8s configmaps each time it is run.
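
The flag-file pattern described above can be sketched in Python. This
is an illustration only: the function name, the flag path passed in,
and the success condition are assumptions, not taken from the script.

```python
from pathlib import Path
from typing import Callable


def run_once(flag: Path, step: Callable[[], bool]) -> bool:
    """Run step() unless flag exists; create flag only if step() succeeds.

    Returns True if step() was executed on this call.
    """
    if flag.exists():
        return False  # already initialized, skip the check
    if step():
        flag.touch()  # later runs will skip the step
    return True
```

If the flag is never written, every run repeats the step, which is the
behaviour this commit fixes.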

Test Plan:
  PASS: Fresh Install STD/DX
  PASS: Apply K8s service-parameter.
  PASS: Verify configmap initialization flag has been created.

Closes-bug: 2022983

Signed-off-by: Jorge Saffe <jorge.saffe@windriver.com>
Change-Id: Ie28247fd62945f90a9018a7ebb7942245ea5aeb4
2023-06-05 22:37:01 -04:00
Lucas Ratusznei Fonseca
e13c3988b1 Create class to update the mgmt firewall in runtime
This change creates the new class to update the management network
firewall in runtime. The class is meant to be applied by
sysinv-conductor when the route config is updated in system controller
hosts.

Test plan:

Setup: Distributed Cloud with AIO-DX as system controller.

[PASS] Add route in a management interface, check that the
       corresponding network is present in the system controller's
       firewall.
[PASS] Remove previously created route, check that the corresponding
       network is no longer present in the system controller's
       firewall.

Story: 2010591
Task: 48174
Signed-off-by: Lucas Ratusznei Fonseca <lucas.ratuszneifonseca@windriver.com>
Change-Id: I08fa9e2807f0c734c716c28c1996588167ee9d58
2023-06-05 15:17:26 -03:00
Zuul
bad31f6be6 Merge "Introduce new log file in /var/log/rss-memory.log" 2023-06-01 18:11:19 +00:00
Zuul
e05dc19857 Merge "Add support for kube-upgrade-abort" 2023-06-01 14:06:31 +00:00
Zuul
41d80011d3 Merge "Fix ntpd losing sync after some days" 2023-06-01 12:29:29 +00:00
Caio Bruchert
e2b54be8e4 Fix ntpd losing sync after some days
The default ntpd configuration enables network interface scanning and
this is causing ntpd to lose sync after about 2 days and 9 to 10 hours.

This fix disables ntpd interface scanning by adding the -U 0 option.

Note: this was detected on CentOS; both CentOS and Debian will have
the same option added to maintain consistency.

Test Plan:
PASSED: Debian: check that ntpd -U 0 configuration is applied
PASSED: Debian: wait for more than 5 days and check that ntp sync is still working

Closes-Bug: 2017697

Change-Id: I1c2727b71d71bf03966c834c470bd225e2a95c81
Signed-off-by: Caio Bruchert <caio.bruchert@windriver.com>
2023-05-31 14:03:22 +00:00
Zuul
e70ffd8c21 Merge "Disable postgres huge page usage" 2023-05-30 20:12:22 +00:00
Cesar Bombonate
62e2c8b8e5 Introduce new log file in /var/log/rss-memory.log
This change adds a new log /var/log/rss-memory.log for
memory growth debugging

The following entry into crontab will output daily at 01:00:
0 1 * * * /usr/bin/date >> /var/log/rss-memory.log;
 /usr/bin/ps -e -o ppid,pid,nlwp,rss:10,vsz:10,
 comm,cmd --sort=-rss >> /var/log/rss-memory.log

Test Plan:
- PASS: Build an image, install and bootstrap successfully
- PASS: Apply monitor pods so addon logs would be installed.
- PASS: Check that log entries are correctly displayed.
- PASS: Tested on controller, AIO, worker and storage hosts.


Closes-Bug: 2019007
Change-Id: I6f8e6208d203bcc77320ced3766af04dab977829
Signed-off-by: Cesar Bombonate <Cesar.PompeudeBarrosBombonate@windriver.com>
2023-05-30 15:13:00 +00:00
Boovan Rajendran
093ffbe27c Add support for kube-upgrade-abort
This change is to restore the etcd snapshot during k8s
upgrade abort.

During k8s upgrade abort we need to drain the node, remove the
static pod manifest files, stop the kubelet, containerd, docker
and etcd services, restore the etcd snapshot, restore the static pod
manifests, start the etcd, docker and containerd services, update the
bindmount and start the kubelet service.

The helper script 'kube-wait-control-plane-terminated.sh' is used to
wait with a timeout for the control plane pod processes to exit after
removing the static pod manifest files, and to forcibly kill the
processes if the timeout expires.
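
The wait-then-kill pattern can be sketched in Python as follows. This
is an illustration on a child process started by the caller, which is
an assumption made to keep the example self-contained; the helper
script itself deals with the control plane processes and is not
reproduced here.

```python
import subprocess


def wait_or_kill(proc: subprocess.Popen, timeout: float) -> bool:
    """Wait for proc to exit; kill it if the timeout expires.

    Returns True if the process exited on its own, False if it was killed.
    """
    try:
        proc.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()  # reap the killed process
        return False
```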

Test Plan:
AIO-SX: Perform k8s upgrade v1.24.4 -> v1.25.3
PASS: Create a test pod before the etcd backup and delete the pod
after taking the snapshot. Run the command "system kube-upgrade-abort"
and verify the test pod is running after etcd is restored successfully.
PASS: Verify kubeadm and kubelet versions are restored successfully to
the from-version after k8s upgrade abort.
PASS: Verify static manifests are restored successfully after k8s
upgrade abort.
PASS: Verify all the pods are restored and running successfully.
PASS: Verify pod networking is still working.

Story: 2010565
Task: 48070

Change-Id: I2efda2c9f84346933a9b1277e95d95cd8d21c50f
Signed-off-by: Boovan Rajendran <boovan.rajendran@windriver.com>
2023-05-29 12:03:51 -04:00
Zuul
8698a89a3e Merge "Configure the firewall in the worker nodes." 2023-05-29 13:21:48 +00:00
Andre Kantek
c251d37495 Configure the firewall in the worker nodes.
This change adds the capacity to install the worker node required
firewall into the calico configuration since kubectl isn't available
there. It uses ansible ad-hoc commands to access the controller from
the worker and execute the command.

In all test cases below the iptables/ip6tables content in the worker
node was verified

Test Plan:
[PASS] Install worker node.
[PASS] Execute lock/unlock in the worker node.
[PASS] Reinstall worker node.

Story: 2010591
Task: 48067

Change-Id: I613b4ea710172c2bc7c6408bfa36430cbfe33fa2
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
2023-05-25 18:23:41 -03:00
Nidhi Shivashankara Belur
94ee511348 Fix the n3000 config file
- Remove "-a" flag from the command as it has been deprecated in the
  original pf-bb-config source.

Test Status:
- PASS: Configure device using host-device-modify with 1 VF and unlock
  the host.
- PASS: Create a test pod requesting 1 VF.
- PASS: Run dpdk test-bbdev application on 1 VF inside the pod.

Closes-Bug: 2020128

Story: 2010698
Task: 48063

Change-Id: I229b505755138495c79513e926f674c28797b79b
Signed-off-by: Nidhi Shivashankara Belur <nidhi.shivashankara.belur@intel.com>
2023-05-23 08:56:41 -07:00
Zuul
bb546aef2b Merge "lowlatency runtime manifest updates" 2023-05-19 17:02:37 +00:00
Zuul
95e8f50715 Merge "New service parameter for intel_pstate" 2023-05-16 15:44:29 +00:00
Kyale, eliud
b68ece65cc lowlatency runtime manifest updates
updated existing personality manifests with
- include platform::grub::kernel_image (grub.pp)
- include platform::config::file::subfunctions::lowlatency (config.pp)

also new runtime manifests with the same functionality

- platform::grub::kernel_image::runtime
  updates the kernel image in kernel.env

- platform::config::file::subfunctions::lowlatency
  updates /etc/platform/platform.conf 'subfunction=' line
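
An update of the 'subfunction=' line that never leaves duplicates can
be sketched in Python. This illustrates the idea only and is not the
manifest's implementation; the function name and the sample values are
assumptions.

```python
from typing import List


def set_subfunction(lines: List[str], value: str) -> List[str]:
    """Return lines with exactly one 'subfunction=' entry, set to value."""
    kept = [line for line in lines if not line.startswith("subfunction=")]
    kept.append(f"subfunction={value}")
    return kept


# Illustrative platform.conf content; real files contain more entries.
conf = ["nodetype=controller", "subfunction=controller,worker"]
print(set_subfunction(conf, "controller,worker,lowlatency"))
```

Applying the function twice gives the same result as applying it once.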

new args for 'puppet-update-grub-env.py' script
'--set-kernel-lowlatency' and '--set-kernel-standard'

Test plan:

Using:
    /opt/platform/puppet/<version>/hieradata/<ipaddress>.yaml
puppet config data will be edited manually 'true' <-> 'false'
    platform::sysctl::params::low_latency

-----------------------------------
Manual puppet runtime manifest test
-----------------------------------

cat << EOF > /tmp/test_runtime.yaml
classes:
- platform::grub::kernel_image::runtime
- platform::config::file::subfunctions::lowlatency::runtime
EOF

/usr/local/bin/puppet-manifest-apply.sh \
 /opt/platform/puppet/<version>/hieradata/ \
 <ipaddress> \
 runtime \
 /tmp/test_runtime.yaml > /tmp/test_runtime_manifest.log

Check
 - cat /etc/platform/platform.conf
 - cat /boot/1/kernel.env
 - /usr/local/bin/puppet-update-grub-env.py --list-kernels
 - sudo -u postgres psql -d sysinv
    -> select id,personality,uuid,subfunctions from i_host

PASS - AIO-SX: iso install and bootstrap successfully
               aio.pp and ansible_bootstrap.pp puppet manifests
               are run successfully
PASS - AIO-SX: change puppet config data in <ip>.yaml
               run puppet runtime manifests manually
               and observe platform.conf and kernel.env updates
PASS - AIO-SX: repeat setting lowlatency and confirm no duplicate
               entries in platform.conf
-----------------------------------

Task: 47942
Story: 2010731

Change-Id: I8e6ccb73829dc315fd6f8955f28fb6f22b57b137
Signed-off-by: Kyale, eliud <Eliud.Kyale@windriver.com>
2023-05-16 10:48:14 -04:00
Zuul
2b1eaace8b Merge "Add L3 firewall support for platform networks" 2023-05-11 17:32:55 +00:00
Zuul
8550cbedfd Merge "Add support for kube-upgrade-abort" 2023-05-09 15:27:50 +00:00
Boovan Rajendran
d2f221eabb Add support for kube-upgrade-abort
This change allows us to call a puppet class to update
the bindmounts, restore the saved static manifest files, restart
kubelet and restart etcd during k8s upgrade abort.

This change is also to solve the warning message
"Unrecognized escape sequence" which comes during kubelet upgrade.

Test plan:
Pass: Abort the k8s upgrade by 'system kube-upgrade-abort' command
and verify static manifest files are restored, bindmounts are updated,
kubelet and etcd restarted successfully.
Pass: Verify /etc/fstab content updated successfully after k8s upgrade
abort.

Story: 2010565
Task: 47822

Change-Id: If1b1bda88a898bc6360403a839e174fbc0d62008
Signed-off-by: Boovan Rajendran <boovan.rajendran@windriver.com>
2023-05-09 09:31:31 -04:00
Zuul
5e31e8af52 Merge "Fix for subcloud admin network addition after initial install" 2023-05-08 22:55:08 +00:00
Steven Webster
0a56493a89 Fix for subcloud admin network addition after initial install
This commit addresses a bug fix for the following scenario:

1. A user installs a subcloud with the communication between
   subcloud and system controller assigned to the
   management network.

2. The user decides they want to transition to the admin network,
   which allows changes to the subnet information after install.

3. The user locks a host, creates a platform interface for the
   admin network, then unlocks.

4. The user (after unlock) creates an address pool and admin
   network, and assigns the network to the previously created
   interface.

Because there is a requirement in StarlingX for the admin network
to be able to apply subnet changes (address pool, network) at
runtime, this scenario causes an issue because the admin-services
SM service-domain-member and service group are only actually
present in the SM database after an unlock.  In the above
scenario, we logically create an admin interface but only assign
it to an 'admin' network after unlock.

This commit handles the above by ensuring the admin-services
service-domain-member and service-group are enabled in the case
that the system is a subcloud.

Test Plan:

1. Install a subcloud using the management network for communication
   with a system controller. Ensure no alarms and that the
   admin-services service group is active, with no admin-ip service
   created.
   Lock, create an 'admin' interface and
   unlock.  After unlock create and apply the admin address pool
   and network. Ensure the subcloud can be updated to use the admin
   network via dcmanager subcloud update. Ensure that the admin-ip
   service is enabled-active.

2. Install a subcloud using the management network for communication
   with a system controller.  Lock, create an 'admin' interface,
   create an 'admin' address pool and network, then unlock. Ensure
   the subcloud can be updated to use the admin network via dcmanager
   subcloud update.

3. Install a subcloud using the admin network for communication with
   a system controller.  Ensure the subcloud can become managed,
   online, and in-sync.

4. Perform the steps 1-3 for both AIO-SX and AIO-DX.

Story: 2010319
Task: 46911

Signed-off-by: Steven Webster <steven.webster@windriver.com>
Change-Id: I692dcf4f7e8c280236d63984ffd02afbed0a3e1d
2023-05-08 14:51:30 +00:00
Andre Kantek
3fa5584cf3 Add L3 firewall support for platform networks
Add puppet classes to install the L3 firewall on cluster nodes that
can run kubernetes (controllers and workers). It uses the hash2yaml
function from the package puppet-hash2stuff; that change is marked as
a dependency for this task.

The story 2010591 is still under development and for now we are only
applying the platform firewalls to the controller nodes.

With the change https://review.opendev.org/c/starlingx/config/+/881495
the new classes' config info is provided. In this first delivery the
firewall will not contain restrictive rules; the focus is on getting
the necessary GlobalNetworkPolicy and HostEndpoints correctly
installed on the nodes.

Test Plan:
[PASS] install AIO-DX
[PASS] install Standard with DX+worker+storage nodes

Story: 2010591
Task: 47954

Depends-On: https://review.opendev.org/c/starlingx/integ/+/881497
Change-Id: I1d35abde612cdaf3ccb54a858618037382ff2636
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
2023-05-08 09:56:46 -03:00
Matheus Guilhermino
248c258bc9 Disable postgres huge page usage
In order to use the total available 1G hugepages space when
vswitch_type parameter is set to 'none', the value huge_pages=off
needs to be included on /etc/postgresql/postgresql.conf since, by
default, postgres uses hugepages if available.

The postgresql.pp is a manifest called on unlock

Test Plan
PASS: AIO-SX: Successfully bootstrapped and unlocked
PASS: Verified that app_hp_avail_1G == app_hp_total_1G after
      increasing huge page memory to the amount indicated by
      app_hp_total_1G (total and available values match when
      no applications are using huge pages).
PASS: Output of 'cat /proc/meminfo' matches output of
      'system host-memory-list controller-0'
      (HugePages_Free == app_hp_avail_1G).

Closes-bug: 2018324

Change-Id: Iab7b7518fdcfccd2761778ed6a875a42cd35c34c
Signed-off-by: Matheus Guilhermino <matheus.machadoguilhermino@windriver.com>
2023-05-04 08:00:22 -03:00
Chris Friesen
b9ff315ff7 remove reference to deleted exec resource
In commit 77e0c7c1 we removed an exec resource that called out to an
obsolete script.  However, we neglected to remove a "require"
metaparameter which referenced the removed script.  It's unclear how
this was missed since the previous change was tested in VirtualBox.

This causes a puppet error when trying to upgrade K8s:

Could not find resource 'Exec[update kubeadm-config]' in parameter
'require' (file:
/usr/share/puppet/modules/platform/manifests/kubernetes.pp, line: 813)

The fix is to remove the metaparameter.

TEST PLAN:
PASS: While running the dev branch on AIO-DX, upgrade K8s from 1.21
      to 1.22.
      (note, a workaround was required to deal with
       https://bugs.launchpad.net/starlingx/+bug/2018247)

Partial-Bug: 2017696
Change-Id: I66c0e88f0f0a3acc3326391263123e60667561cc
Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
2023-05-01 12:19:38 -06:00
Davlet Panech
a5489c0b62 Fix github mirroring for this repo
Updating the rsa ssh host key based on:
https://github.blog/2023-03-23-we-updated-our-rsa-ssh-host-key/

Note: In the future, StarlingX should have a zuul job and
secret setup for all repos so we do not need to do this
for every repo.

Needed to rename the secret, because zuul fails if like-named
secrets have different values in different branches of the same
repo.

Partial-Bug: #2015246
Change-Id: I94d27934bbfafb174f8e8d48491e6089f47e6408
Signed-off-by: Davlet Panech <davlet.panech@windriver.com>
2023-04-28 12:38:53 -04:00
Gleb Aronsky
77e0c7c14f Remove references to deprecated script
upgrade_k8s_config.sh has been deprecated and
removed due to lack of support for "flow" style YAML.
Deprecated functionality has been superseded
by better YAML-aware handling in sysinv.

Updating how we invoke kubeadm, we will now use an explicit
version of kubeadm when calling it.  The version called
will now match the version we are upgrading to in order to handle
the format unsupported by previous versions of kubeadm.

Test Plan
PASS:
- Manually update scripts on controllers and worker nodes based on
  https://review.opendev.org/c/starlingx/integ/+/880390
- Perform manual upgrade from k8s v1.21.8 to v1.22.5
- Verify kubernetes successfully upgraded to v1.22.5

Test was performed in the lab with local changes
to verify the code.
Patch was not tested.

Closes-Bug: 2017696
Change-Id: I840eb566057be495fe0da3cae7604bf8055c0d4f
Signed-off-by: Gleb Aronsky <gleb.aronsky@windriver.com>
2023-04-26 16:22:51 -07:00
Zuul
8d5b19c952 Merge "Fix restore failure during puppet-manifest-apply" 2023-04-25 13:52:00 +00:00
Zuul
50823022a6 Merge "kubelet-config custom parameters are missing after k8s upgrade" 2023-04-25 13:51:34 +00:00
Jorge Saffe
2ceceb29f6 Fix restore failure during puppet-manifest-apply
When the K8s custom config puppet script is executed during the restore
playbook, K8s updates fail when trying to validate cluster network
data. This happens whenever the OAM IP address is reconfigured (after
reinstall) with a different protocol version than the one used for the
K8s cluster host subnet.

The issue is related to "advertise-address" parameter. It is not
predefined in the api-server extra-args during bootstrap, so k8s gets
the host's default interface as default value. In this case, the host’s
default value is an IPv4 (IPv6) address while all the other K8s cluster
subnets are configured with IPv6 (IPv4) addresses.

K8s validation fails because STX defaults to a SingleStack mode. Only
dual-stack networks allow the assignment of IPv4 and IPv6 addresses to
pods and services.
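
The mismatch can be illustrated with the Python standard library. This
is a sketch of the family comparison only, not the validation that K8s
runs; the function name and the sample addresses are made up.

```python
import ipaddress


def same_ip_family(address: str, subnet: str) -> bool:
    """Return True if address and subnet use the same IP version."""
    addr = ipaddress.ip_address(address)
    net = ipaddress.ip_network(subnet, strict=False)
    return addr.version == net.version


# Hypothetical values: IPv4 advertise address, IPv6 cluster subnet.
print(same_ip_family("10.10.10.2", "fd04::/64"))
```

In single-stack mode a False result here corresponds to the failing
case described above.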

Test Plan:
  PASS: Fresh Install AIO-SX.
  PASS: Create a backup and reinstall server.
  PASS: Reconfigure network OAM IF with a different IP family.
  PASS: Restore system.
  PASS: Verify advertise-address parameter.
  PASS: Modify and Apply K8s service-parameter.
  PASS: Fresh Install STD/DX
  PASS: Modify and Apply K8s service-parameter.
  PASS: Verify advertise-address parameter in both controllers.

Closes-bug: 2001715

Signed-off-by: Jorge Saffe <jorge.saffe@windriver.com>
Change-Id: I6f75f171d0a45abe2d5e047a31308dc97ce19eed
2023-04-25 12:44:34 +00:00
Jim Gauld
de57231375 kubelet-config custom parameters are missing after k8s upgrade
The kubernetes.pp class platform::kubernetes::upgrade_first_control_plane
which does 'kubeadm upgrade apply' resulted in a versioned kubelet-config
ConfigMap. The pre-upgrade ConfigMap was left behind.

Having multiple ConfigMaps causes 'system kube-config-kubelet' to fail,
so reconfiguration was broken.

In historical releases, we had specified '--config
/etc/kubernetes/kubelet_override.yaml', so the kubelet garbage
collection eviction parameters became incorrect post k8s upgrade,
without a way to reconfigure.

This update will purge all kubelet-config ConfigMaps except the most
recent. This occurs immediately following 'kubeadm upgrade apply' step.
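
The selection of ConfigMaps to purge can be sketched in Python. Two
assumptions are made here and are not confirmed by this commit: that
the names follow the pattern "kubelet-config-<major>.<minor>", and that
"most recent" means the highest version number. The function name is
hypothetical.

```python
from typing import List, Tuple


def configmaps_to_purge(names: List[str],
                        prefix: str = "kubelet-config-") -> List[str]:
    """Return every versioned name except the one with the highest version."""
    def version(name: str) -> Tuple[int, ...]:
        # "kubelet-config-1.22" -> (1, 22), so 1.9 sorts before 1.21
        return tuple(int(part) for part in name[len(prefix):].split("."))

    versioned = [n for n in names if n.startswith(prefix)]
    if not versioned:
        return []
    newest = max(versioned, key=version)
    return sorted(n for n in versioned if n != newest)
```

Comparing versions as integer tuples avoids the string-ordering trap in
which "1.9" would sort after "1.21".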

Testplan:
PASS: AIO-SX perform k8s upgrade, run 'system kube-config-kubelet'.
      Verify only current version kubelet-config ConfigMap exists.

Closes-Bug: 2012975
Change-Id: I5e34299616690628267c07a744dc9923144e606d
Signed-off-by: Jim Gauld <James.Gauld@windriver.com>
2023-04-24 16:21:22 -04:00