stx-puppet

Author	SHA1	Message	Date
Roger Ferraz	580e839d9e	starlingx/stx-puppet README improvement This story shall update the README file of a few most used StarlingX repos. Test Plan: N/A Story: 2010814 Task: 48355 Change-Id: If7d4825337a8057d3be540d96885c7956b857730 Signed-off-by: Roger Ferraz <rogerio.ferraz@encora.com>	2023-07-19 12:21:22 -03:00
Zuul	28fd21419f	Merge "Fix bootstrap failure for subclouds w/ complex passwords"	2023-07-11 14:08:35 +00:00
Zuul	d7ad7e7a15	Merge "Add parameter to openrc and change permissions"	2023-07-10 21:24:08 +00:00
Zuul	5d6fdaa80c	Merge "Avoid "systemctl is-active" calls to prevent process restarts"	2023-07-10 14:14:29 +00:00
Joao Victor Portal	9a3f92eed9	Add parameter to openrc and change permissions This commit changes the file "/etc/platform/openrc" to allow its usage by other users. The parameter "--no_credentials" was added for this purpose. Also, the permissions of this openrc file was changed to 0644 to allow its usage by users with no privileges. The typical use case is an LDAP user with or without privileges sources this openrc file and then sets the variables OS_USERNAME, OS_PASSWORD and PS1 (that uses OS_USERNAME). Also, the test to check if the controller is the active one, changed: previously, it was tested just if the password gotten was empty, but as the reason now to get an empty password may be a user with insufficient privileges, the test changed to check whether the executable file "keyring_file" exists (it exists only in the active controller and that is the reason why a standby controller gets an empty password). Test Plan: PASS: Successfully deploy an AIO-DX containing this change. Check that the permissions of "/etc/platform/openrc" are 644, owner root, group sys_protected. PASS: In the deployed AIO-DX, create 2 users: user1 is not part of groups sys_protected and root, user2 is part only of group sys_protected. PASS: In the active controller of AIO-DX, using users user1 and user2, execute the following commands: for "source /etc/platform/openrc --no_credentials" command, the result for all users is that the file is sourced without errors; for "source /etc/platform/openrc; system host-list", user1 gets a message saying it doesn't have privileges to read keyring password and an error message for system command, while user2 gets the commands executed without errors. PASS: Repeat the test above for standby controller: for "source /etc/platform/openrc --no_credentials" command, all users get a message saying it should only be loaded from active controller; for "source /etc/platform/openrc; system host-list", also a message is printed saying it should only be loaded from active controller and an error message appears for system command. Partial-Bug: 2024627 Signed-off-by: Joao Victor Portal <Joao.VictorPortal@windriver.com> Change-Id: I6ef2ca16a272d1fc7c4a24b9f5b48a9cb860450f	2023-06-30 20:08:08 +00:00
Thales Elero Cervi	6e4f3df557	Handle sysinv dpdk_elf_file configuration As part of Debian migration, the sysinv procedure to check DPDK compatibility for each host interface was also updated in order to make it customizable in case one would like to use other virtual switch than the delivered OVS with DPDK support [1]. For other virtual switches, that might or not rely on DPDK, the ELF target that sysinv uses to verify interfaces compatibility must be customizable and the query_pci_id script is already able to use custom values [2]. This change adds to puppet the system configuration that will write, if defined, the correct value for the ELF path. This platform parameter can be overridden on the hiera data so puppet will update sysinv.conf accordingly. For now, when deploying StarlingX with vswitch_type=ovs-dpdk we will override it to the query_pci_id script default value (i.e., the /usr/sbin/ovs-vswitchd ELF) using the respective sysinv puppet module and let it as an example for anyone that is later using a different vswitch which requires this customization [3]. [1] https://review.opendev.org/c/starlingx/config/+/872979 [2] `2cd0b1e14a/sysinv/sysinv/sysinv/scripts/query_pci_id (L34)` [3] https://review.opendev.org/c/starlingx/config/+/887106 Test Plan: PASS - Build puppet-manifest package PASS - Build a custom stx ISO with the new package PASS - Bootstrap AIO-SX virtual system (vswitch_type=none) and ensure the hiera data was not modified neither sysinv.conf was updated PASS - Bootstrap AIO-SX virtual system (vswitch_type=ovs-dpdk)* and ensure the hiera data was modified correctly and sysinv.conf was updated accordingly * A successful complete installation with ovs-dpdk is still blocked by a bug that will be solved soon: https://bugs.launchpad.net/starlingx/+bug/2008124 Story: 2010317 Task: 46389 Signed-off-by: Thales Elero Cervi <thaleselero.cervi@windriver.com> Change-Id: Iaf31d3b5e2fc03b4783473e4329a780a516a9d43	2023-06-30 10:03:43 -03:00
Manoel Benedito Neto	23479e7183	Fix bootstrap failure for subclouds w/ complex passwords This commit adds single quotes around user password parameter value to ensure that complex passwords are valid when user option setup script is executed by puppet bootstrap. Test Plan: PASS: Full build, system install, bootstrap and unlock DC system, with one subcloud bootstrapped and unlocked with active enabled available status. PASS: Add, bootstrap, manage and unlock a subcloud with a complex password containing special characters, numbers, capital letters and an open parenthesis at the end of the sentence. Closes-Bug: 2025292 Change-Id: Ia5430084bf6b16c78594a2483f2b88ec9b18f36a Signed-off-by: Manoel Benedito Neto <Manoel.BeneditoNeto@windriver.com>	2023-06-28 16:36:25 -03:00
Andre Kantek	0ec63a4667	Use L4 ports used in OAM firewall from system.yaml In order to unify implementation with the other platform firewalls, the hard-coded values are set to 'undef' and will be provided by sysinv in system.yaml The test below validates the correct values are present in the OAM firewall Test Plan: [PASS] Install, Lock, Unlock AIO-SX [PASS] Install, Lock, Unlock AIO-DX (as SystemController) Story: 2010591 Task: 48255 Depends-On: https://review.opendev.org/c/starlingx/config/+/885585 Change-Id: Idc1f71f7ba762dc76529022acf4145db00686ec2 Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>	2023-06-23 13:26:14 -03:00
Boovan Rajendran	867447b5d7	Wait for control-plane endpoints health during k8s upgrade abort This script waits for the k8s control-plane component endpoints (apiserver, scheduler, controller-manager, kubelet) to be up and running at the end of platform::kubernetes::upgrade_abort. Retry/timeout parameters are configured to wait up to 3 minutes. Test plan: Pass: Verify the abort waits for all control-plane endpoints to be healthy. Pass: Verify /var/log/kubernetes/k8s-endpoints-health.log shows 'Timeout: Kubernetes control-plane endpoints not healthy' message after timeout exceed. Story: 2010565 Task: 48203 Depends-On: https://review.opendev.org/c/starlingx/config/+/885582 Change-Id: I232b4746a3eb899ba87e706160547e8792489394 Signed-off-by: Boovan Rajendran <boovan.rajendran@windriver.com>	2023-06-22 12:12:44 -04:00
Zuul	52d9a8d58d	Merge "Puppet: Fix unacceptable location warnings"	2023-06-20 12:58:03 +00:00
Zuul	3162eb6667	Merge "Add sssd systemd service file override"	2023-06-19 14:27:58 +00:00
Andy Ning	559b79b72e	Add sssd systemd service file override sssd is monitored by pmon. But currently the Restart option in its systemd service file is set to on-failure. This sometimes causes systemd and pmon to fight to restart the service when it fails. All processes monitored by pmon should have Restart set to "no". This change added a systemd override file to set Restart to "no" for sssd service. Test Plan: PASS: Standard system deployment. PASS: Check sssd Restart option using "systemctl cat sssd", verify Restart option is set to "no", as following: # /etc/systemd/system/sssd.service.d/sssd-stx-override.conf [Service] # pmond monitors sssd service Restart=no PASS: Kill sssd process, verify pmon restart it successfully by tailing pmon.log, and verify sssd is running by "systemctl status sssd" command. Closes-Bug: 2023421 Signed-off-by: Andy Ning <andy.ning@windriver.com> Change-Id: I84521caf3745122492afe9ef4a251e42129b29b0	2023-06-16 10:36:24 -04:00
Carmen Rata	d2b27e7440	Create group for ldap users with denied ssh access Local OpenLDAP and WAD servers are being used for k8s api and SSH authentication. We need the ability to disallow SSH authentication for selective users. As part of the solution, we create a Linux group where all ldap users with "denied ssh access" will be added. The group will be set for denied ssh access in the sshd configuration. The sshd configuration change is part of a separate commit. Test Plan: PASS: Debian image gets successfully installed in AIO-SX system. PASS: Verify the Linux group has been created. PASS: Create an openldap user and add to the "deny ssh access" group. Verify that the user cannot ssh. PASS: Create a WAD group with the same name and gidNumber as the Linux group for "deny ssh access". Create a WAD user in this group. Validate that the new WAD user in the "deny ssh group" cannot ssh to stx platform. PASS: Remove the WAD user from the WAD "deny ssh access" group. Validate that now the user can have ssh access to stx platform. PASS: Remove the openldap user from the Linux "deny ssh access" group. Validate that now the user can have ssh access to stx platform. Story: 2010589 Task: 48234 Signed-off-by: Carmen Rata <carmen.rata@windriver.com> Change-Id: Ib1229f21e207d66d39f8bcdb7acf0533ace527c1	2023-06-15 00:24:29 +00:00
Zuul	c5719c38d1	Merge "Add grub update to restore"	2023-06-14 21:36:35 +00:00
Matheus Guilhermino	665af9b8e8	Puppet: Fix unacceptable location warnings The names of classes/defines should match the name that's implied by their file path. Puppet throws an "unacceptable location" warning whenever this condition is not satisfied. Test Plan: PASS: Build & install PASS: AIO-SX Successful Bootstrap PASS: AIO-SX Successful Unlock PASS: Verified that 'unnaceptable location' warnings are no longer present on puppet.log Story: 2010757 Task: 48026 Change-Id: I1cd3d09e90bfeb3d206b540717943ea1e6413444 Signed-off-by: Matheus Guilhermino <matheus.machadoguilhermino@windriver.com>	2023-06-14 19:39:28 +00:00
Zuul	a38e81c143	Merge "Config sssd on storage node"	2023-06-14 14:04:25 +00:00
Joshua Kraitberg	16589c4f3d	Add grub update to restore This ensures that the kernel boot args are correct. When they are not correct, puppet will trigger a reboot after unlocking to fix them. TEST PLAN PASS: AIO-SX backup and restore * New backup will include /boot files * Non-default kernel boot args will be kept * No double reboot * /proc/cmdline can be used to verify kernel boot args PASS: AIO-SX backup and restore * Remove new /boot files from backup * Restore with modified backup * Non-default kernel boot args will be lost * No double reboot * /proc/cmdline can be used to verify kernel boot args Partial-Bug: 2023678 Change-Id: I5f0c91c0c8583f4a86148ddf0fadc03b18ff9c1a Signed-off-by: Joshua Kraitberg <joshua.kraitberg@windriver.com>	2023-06-14 09:47:00 -04:00
Davi Frossard	319622d42d	Avoid "systemctl is-active" calls to prevent process restarts Replaces "systemctl is-active" calls by "pid file check" approach for docker-distribution (docker-registry) and registry-token-server services. These calls were causing unnecessary process restarts in cases where systemd was halted due to contention on kernfs_mutex. Test Plan: PASS: Verify docker-distribution status PASS: Verify registry-token-server status Partial-bug: 2016028 Change-Id: I2398d7f397ad14d2ff1ff6d40141ffad4f54f2e3 Signed-off-by: Davi Frossard <dbarrosf@windriver.com>	2023-06-14 13:07:17 +00:00
Andy Ning	cc2e4d086e	Config sssd on storage node Currently sssd is not configured and running on storage nodes so ldap users can't login to storage nodes. This update makes sssd configured, and running on storage nodes (with a followup update). Test Plan: PASS: System with storage nodes deployment PASS: In storage nodes, verify that the following config file exist: /etc/sssd/sssd.conf Closes-Bug: 2023399 Signed-off-by: Andy Ning <andy.ning@windriver.com> Change-Id: I383c101e0f99be93e9da528411c6fa1fd8cde4c6	2023-06-12 09:33:50 -04:00
Andre Kantek	b4d16baa2e	Create class to update the admin firewall in runtime This change creates a class to update the admin firewall during runtime operations Test Plan: [PASS] in subcloud mode, add/remove static routes in the mgmt network [PASS] in subcloud mode, add/remove static routes in the admin network Story: 2010591 Task: 48202 Change-Id: I3a4025cb8c6ff8d90ba36b49e2aaa12d0ec7057b Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>	2023-06-08 13:14:52 -03:00
Zuul	baaadc0d05	Merge "Create class to update the mgmt firewall in runtime"	2023-06-06 16:49:25 +00:00
Jorge Saffe	64aea5ba6d	Fix k8s custom configuration configmap init flag After system bootstrap, when service-parameter-apply is executed for the first time, it verifies that the k8s configmaps linked to service-parameters extra-volumes exist, then creates a flag (configmap_initialization_flag) to skip this step on subsequent runs. If the flag is not generated, the k8s custom configuration script checks the k8s configmaps each time it is run. Test Plan: PASS: Fresh Install STD/DX PASS: Apply K8s service-parameter. PASS: Verify configmap initialization flag has been created. Closes-bug: 2022983 Signed-off-by: Jorge Saffe <jorge.saffe@windriver.com> Change-Id: Ie28247fd62945f90a9018a7ebb7942245ea5aeb4	2023-06-05 22:37:01 -04:00
Lucas Ratusznei Fonseca	e13c3988b1	Create class to update the mgmt firewall in runtime This change creates the new class to update the management network firewall in runtime. The class is meant to be applied by sysinv-conductor when the route config is updated in system controller hosts. Test plan: Setup: Distributed Cloud with AIO-DX as system controller. [PASS] Add route in a management interface, check that the corresponding network is present in the system controller's firewall. [PASS] Remove previously created route, check that the corresponding network is no longer present in the system controller's firewall. Story: 2010591 Task: 48174 Signed-off-by: Lucas Ratusznei Fonseca <lucas.ratuszneifonseca@windriver.com> Change-Id: I08fa9e2807f0c734c716c28c1996588167ee9d58	2023-06-05 15:17:26 -03:00
Zuul	bad31f6be6	Merge "Introduce new log file in /var/log/rss-memory.log"	2023-06-01 18:11:19 +00:00
Zuul	e05dc19857	Merge "Add support for kube-upgrade-abort"	2023-06-01 14:06:31 +00:00
Zuul	41d80011d3	Merge "Fix ntpd losing sync after some days"	2023-06-01 12:29:29 +00:00
Caio Bruchert	e2b54be8e4	Fix ntpd losing sync after some days The default ntpd configuration enables network interfaces scanning and this is causing ntpd to lose sync after about 2 days and 9 to 10 hours. This fix disables ntpd interface scanning by adding the -U 0 option. Note: this was detected on CentOS and both CentOS and Debian will have add the same option to maintain consistency. Test Plan: PASSED: Debian: check that ntpd -U 0 configuration is applied PASSED: Debian: wait for more than 5 days and check that ntp sync is still working Closes-Bug: 2017697 Change-Id: I1c2727b71d71bf03966c834c470bd225e2a95c81 Signed-off-by: Caio Bruchert <caio.bruchert@windriver.com>	2023-05-31 14:03:22 +00:00
Zuul	e70ffd8c21	Merge "Disable postgres huge page usage"	2023-05-30 20:12:22 +00:00
Cesar Bombonate	62e2c8b8e5	Introduce new log file in /var/log/rss-memory.log This change adds a new log /var/log/rss-memory.log for memory growth debuging The following entry into crontab will output daily at 01:00: 0 1 * * * /usr/bin/date >> /var/log/rss-memory.log; /usr/bin/ps -e -o ppid,pid,nlwp,rss:10,vsz:10, comm,cmd --sort=-rss >> /var/log/rss-memory.log Test Plan: - PASS: Build an image, install and bootstrap successfully - PASS: Apply monitor pods so addon logs would be installed. - PASS: Check that log entries are correctly displayed. - PASS: Tested on controller, AIO, worker and storage hosts. Closes-Bug: 2019007 Change-Id: I6f8e6208d203bcc77320ced3766af04dab977829 Signed-off-by: Cesar Bombonate <Cesar.PompeudeBarrosBombonate@windriver.com>	2023-05-30 15:13:00 +00:00
Boovan Rajendran	093ffbe27c	Add support for kube-upgrade-abort This change is to restore the etcd snapshot during k8s upgrade abort. During k8s upgrade abort we need to drain the node, remove the static pod manifests files stop the kubelet, containerd, docker and etcd services, restore the etcd snapshot, restore the static pod manifests, start the etcd, docker and containerd services, update the bindmount and start kubelet service. The helper script 'kube-wait-control-plane-terminated.sh' is used to wait with a timeout for the control plane pods processes to exit after removing static pod manifests files and forcibly kill the process if the timeout expires. Test Plan: AIO-SX: Perform k8s upgrade v1.24.4 -> v1.25.3 PASS: Create a test pod, before the etcd backup and delete the pod after taking snapshot run the command "system kube-upgrade-abort", verify test pod is running after etcd is restored successfully. PASS: Verify kubeadm and kubelet version restored successfully to the from version after k8s upgrade abort. PASS: Verify static manifest are restored successfully after k8s upgrade abort. PASS: Verify all the pods are restored and running successfully. PASS: Verify pod networking are still working. Story: 2010565 Task: 48070 Change-Id: I2efda2c9f84346933a9b1277e95d95cd8d21c50f Signed-off-by: Boovan Rajendran <boovan.rajendran@windriver.com>	2023-05-29 12:03:51 -04:00
Zuul	8698a89a3e	Merge "Configure the firewall in the worker nodes."	2023-05-29 13:21:48 +00:00
Andre Kantek	c251d37495	Configure the firewall in the worker nodes. This change adds the capacity to install the worker node required firewall into the calico configuration since kubectl isn't available there. It uses ansible ad-hoc commands to access the controller from the worker and execute the command. In all test cases below the iptables/ip6tables content in the worker node was verified Test Plan: [PASS] Install worker node. [PASS] Execute lock/unlock in the worker node. [PASS] Reinstall worker node. Story: 2010591 Task: 48067 Change-Id: I613b4ea710172c2bc7c6408bfa36430cbfe33fa2 Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>	2023-05-25 18:23:41 -03:00
Nidhi Shivashankara Belur	94ee511348	Fix the n3000 config file - Remove "-a" flag from the command as it has been deprecated in the original pf-bb-config source. Test Status: - PASS: Configure device using host-device-modify with 1 VF and unlock the host. - PASS: Create a test pod requesting 1 VF. - PASS: Run dpdk test-bbdev application on 1 VF inside the pod. Closes-Bug: 2020128 Story: 2010698 Task: 48063 Change-Id: I229b505755138495c79513e926f674c28797b79b Signed-off-by: Nidhi Shivashankara Belur <nidhi.shivashankara.belur@intel.com>	2023-05-23 08:56:41 -07:00
Zuul	bb546aef2b	Merge "lowlatency runtime manifest updates"	2023-05-19 17:02:37 +00:00
Zuul	95e8f50715	Merge "New service parameter for intel_pstate"	2023-05-16 15:44:29 +00:00
Kyale, eliud	b68ece65cc	lowlatency runtime manifest updates updated existing personality manifests with - include platform::grub::kernel_image (grub.pp) - include platform::config::file::subfunctions::lowlatency (config.pp) also new runtime manifests with the same functionality - platform::grub::kernel_image::runtime updates the kernel image in kernel.env - platform::config::file::subfunctions::lowlatency updates /etc/platform/platform.conf 'subfunction=' line new args for 'puppet-update-grub-env.py' script '--set-kernel-lowlatency' and '--set-kernel-standard' Test plan: Using: /opt/platform/puppet/<version>/hieradata/<ipaddress>.yaml puppet config data will be edited manually 'true' <-> 'false' platform::sysctl::params::low_latency ----------------------------------- Manual puppet runtime manifest test ----------------------------------- cat << EOF > /tmp/test_runtime.yaml classes: - platform::grub::kernel_image::runtime - platform::config::file::subfunctions::lowlatency::runtime EOF /usr/local/bin/puppet-manifest-apply.sh \ /opt/platform/puppet/<version>/hieradata/ \ <ipaddres> \ runtime \ /tmp/test_runtime.yaml > /tmp/test_runtime_manifest.log Check - cat /etc/platform/platform.conf - cat /boot/1/kernel.env - /usr/local/bin/puppet-update-grub-env.py --list-kernels - sudo -u postgres psql -d sysinv -> select id,personality,uuid,subfunctions from i_host PASS - AIO-SX: iso install and bootstrap successfully aio.pp and ansible_bootstrap.pp puppet manifests are run successfully PASS - AIO-SX: change puppet config data in <ip>.yaml run puppet runtime manifests manually and observer platform.conf and kernel.env updates PASS - AIO-SX: repeat setting lowlatency and confirm no duplicate entries in platform.conf ----------------------------------- Task: 47942 Story: 2010731 Change-Id: I8e6ccb73829dc315fd6f8955f28fb6f22b57b137 Signed-off-by: Kyale, eliud <Eliud.Kyale@windriver.com>	2023-05-16 10:48:14 -04:00
Zuul	2b1eaace8b	Merge "Add L3 firewall support for platform networks"	2023-05-11 17:32:55 +00:00
Zuul	8550cbedfd	Merge "Add support for kube-upgrade-abort"	2023-05-09 15:27:50 +00:00
Boovan Rajendran	d2f221eabb	Add support for kube-upgrade-abort This change allow us to call a puppet class to update the bindmounts, restore the saved static manifest files, restart kubelet and restart etcd during k8s upgrade abort. This change is also to solve the warning message "Unrecognized escape sequence" which comes during kubelet upgrade. Test plan: Pass: Abort the k8s upgrade by 'system kube-upgrade-abort' command and verify static manifest files are restored, bindmounts are updated, kubelet and etcd restarted successfully. Pass: Verify /etc/fstab content updated successfully after k8s upgrade abort. Story: 2010565 Task: 47822 Change-Id: If1b1bda88a898bc6360403a839e174fbc0d62008 Signed-off-by: Boovan Rajendran <boovan.rajendran@windriver.com>	2023-05-09 09:31:31 -04:00
Zuul	5e31e8af52	Merge "Fix for subcloud admin network addition after initial install"	2023-05-08 22:55:08 +00:00
Steven Webster	0a56493a89	Fix for subcloud admin network addition after initial install This commit addresses a bug fix for the following scenario: 1. A user installs a subcloud with the communication between subcloud and system controller assigned to the management network. 2. The user decides they want to transition to the admin network, which allows changes to the subnet information after install. 3. The user locks a host, creates a platform interface for the admin network, then unlocks. 4. The user (after unlock) creates an address pool, admin network, and assignes the network to the previously created interface. Because there is a requirement in StarlingX for the admin network to be able to apply subnet changes (address pool, network) at runtime, this scenerio causes an issue because the admin-services SM service-domain-member and service group are only actually present in the SM database after an unlock. In the above scenerio, we logically create an admin interface but only assign it to and 'admin' network after unlock. This commit handles the above by ensuring the admin-services service-domain-member and service-group are enabled in the case that the system is a subcloud. Test Plan: 1. Install a subcloud using the management network for communication with a system controller. Ensure no alarms and that the admin-services service group is active, with no admin-ip service created. Lock, create an 'admin' interface and unlock. After unlock create and apply the admin address pool and network. Ensure the subcloud can be updated to use the admin network via dcmanager subcloud update. Ensure that the admin-ip service is enabled-active. 2. Install a subcloud using the management network for communication with a system controller. Lock, create an 'admin' interface, create an 'admin' address pool and network, then unlock. Ensure the subcloud can be updated to use the admin network via dcmanager subcloud update. 3. Install a subcloud using the admin network for communication with a system controller. Ensure the subcloud can become managed, online, and in-sync. 4. Perform the steps 1-3 for both AIO-SX and AIO-DX. Story: 2010319 Task: 46911 Signed-off-by: Steven Webster <steven.webster@windriver.com> Change-Id: I692dcf4f7e8c280236d63984ffd02afbed0a3e1d	2023-05-08 14:51:30 +00:00
Andre Kantek	3fa5584cf3	Add L3 firewall support for platform networks Adding puppet classes to install L3 firewall in cluster nodes that can run kubernetes (controllers and workers), It uses the hash2yaml function from the package puppet-hash2stuff, the change is marked as a dependency for this task. The story 2010591 is still under development and for now we are only applying the platform firewalls into the controller nodes. With the change https://review.opendev.org/c/starlingx/config/+/881495 the new classes' config info is provided. At this first delivery the firewall will not contain restrictive rules, focusing more in making the necessary GlobalNetworkPolicy and HostEndpoints to be correctly installed among the nodes Test Plan: [PASS] install AIO-DX [PASS] install Standard with DX+worker+storage nodes Story: 2010591 Task: 47954 Depends-On: https://review.opendev.org/c/starlingx/integ/+/881497 Change-Id: I1d35abde612cdaf3ccb54a858618037382ff2636 Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>	2023-05-08 09:56:46 -03:00
Matheus Guilhermino	248c258bc9	Disable postgres huge page usage In order to use the total available 1G hugepages space when vswitch_type parameter is set to 'none', the value huge_pages=off needs to be included on /etc/postgresql/postgresql.conf since, by default, postgres uses hugepages if available. The postgresql.pp is a manifest called on unlock Test Plan PASS: AIO-SX: Successfully bootstrapped and unlocked PASS: Verified that app_hp_avail_1G == app_hp_total_1G after increasing huge page memory to the amount indicated by app_hp_total_1G (total and available values match when no applications are using huge pages). PASS: Output of 'cat /proc/meminfo' matches output of 'system host-memory-list controller-0' (HugePages_Free == app_hp_avail_1G). Closes-bug: 2018324 Change-Id: Iab7b7518fdcfccd2761778ed6a875a42cd35c34c Signed-off-by: Matheus Guilhermino <matheus.machadoguilhermino@windriver.com>	2023-05-04 08:00:22 -03:00
Chris Friesen	b9ff315ff7	remove reference to deleted exec resource In commit 77e0c7c1 we removed an exec resource that called out to an obsolete script. However, we neglected to remove a "require" metaparameter which referenced the removed script. It's unclear how this was missed since the previous change was tested in VirtualBox. This causes a puppet error when trying to upgrade K8s: Could not find resource 'Exec[update kubeadm-config]' in parameter 'require' (file: /usr/share/puppet/modules/platform/manifests/kubernetes.pp, line: 813) The fix is to remove the metaparameter. TEST PLAN: PASS: While running the dev branch on AIO-DX, upgrade K8s from 1.21 to 1.22. (note, a workaround was required to deal with https://bugs.launchpad.net/starlingx/+bug/2018247) Partial-Bug: 2017696 Change-Id: I66c0e88f0f0a3acc3326391263123e60667561cc Signed-off-by: Chris Friesen <chris.friesen@windriver.com>	2023-05-01 12:19:38 -06:00
Davlet Panech	a5489c0b62	Fix github mirroring for this repo Updating the rsa ssh host key based on: https://github.blog/2023-03-23-we-updated-our-rsa-ssh-host-key/ Note: In the future, StarlingX should have a zuul job and secret setup for all repos so we do not need to do this for every repo. Needed to rename the secret, because zuul fails if like-named secrets have diffent values in different branches of the same repo. Partial-Bug: #2015246 Change-Id: I94d27934bbfafb174f8e8d48491e6089f47e6408 Signed-off-by: Davlet Panech <davlet.panech@windriver.com>	2023-04-28 12:38:53 -04:00
Gleb Aronsky	77e0c7c14f	Remove references to deprecated script upgrade_k8s_config.sh has been deprecated and removed due to lack of support for "flow" style YAML. Deprecated functionality has been superseded by better YAML-aware handling in sysinv. Updating how we invoke kubeadm, we will now use an explicit version of kubeadm when calling it. The version called will now match the version we are upgrading to in order to handle the format unsupported by previous versions of kubeadm. Test Plan PASS: - Manually update scripts on controllers and worker nodes based on https://review.opendev.org/c/starlingx/integ/+/880390 - Perform manual upgrade from k8s v1.21.8 to v1.22.5 - Verify kubernetes successfully upgraded to v1.22.5 Test was performed in the lab with local changes to verify the code. Patch was not tested. Closes-Bug: 2017696 Change-Id: I840eb566057be495fe0da3cae7604bf8055c0d4f Signed-off-by: Gleb Aronsky <gleb.aronsky@windriver.com>	2023-04-26 16:22:51 -07:00
Zuul	8d5b19c952	Merge "Fix restore failure during puppet-manifest-apply"	2023-04-25 13:52:00 +00:00
Zuul	50823022a6	Merge "kubelet-config custom parameters are missing after k8s upgrade"	2023-04-25 13:51:34 +00:00
Jorge Saffe	2ceceb29f6	Fix restore failure during puppet-manifest-apply When K8s custom config puppet script is executed during restore playbook, K8s updates fail when trying to validate cluster network data. This happens whenever the OAM IP address is reconfigured (after reinstall) with a different protocol version than the one used for the K8 cluster host subnet. The issue is related to "advertise-address" parameter. It is not predefined in the api-server extra-args during bootstrap, so k8s gets the host's default interface as default value. In this case, the host’s default value is an IPv4 (IPv6) address while all the other K8s cluster subnets are configured with IPv6 (IPv4) addresses. K8s validation fails because STX defaults to a SingleStack mode. Only dual-stack networks allow the assignment of IPv4 and IPv6 addresses to pods and services. Test Plan: PASS: Fresh Install AIO-SX. PASS: Create a backup and reinstall server. PASS: Reconfigure network OAM IF with a different IP family. PASS: Restore system. PASS: Verify advertise-address parameter. PASS: Modify and Apply K8s service-parameter. PASS: Fresh Install STD/DX PASS: Modify and Apply K8s service-parameter. PASS: Verify advertise-address parameter in both controllers. Closes-bug: 2001715 Signed-off-by: Jorge Saffe <jorge.saffe@windriver.com> Change-Id: I6f75f171d0a45abe2d5e047a31308dc97ce19eed	2023-04-25 12:44:34 +00:00
Jim Gauld	de57231375	kubelet-config custom parameters are missing after k8s upgrade The kubernetes.pp class platform::kubernetes::upgrade_first_control_plane which does 'kubeadm upgrade apply' resulted in versioned kubelet-config ConfigMap. The pre-upgrade ConfigMap was left behind. Having multiple ConfigMap causes 'system kube-config-kubelet' to fail, so reconfiguration was broken. In historical releases, we had specified '--config /etc/kubernetes/kubelet_override.yaml', so the the kubelet garbage collection eviction parameters became incorrect post k8s upgrade, without a way to reconfigure. This update will purge all kubelet-config ConfigMap except the most recent. This occurs immediately following 'kubeadm upgrade apply' step. Testplan: PASS: AIO-SX perform k8s upgrade, run 'system kube-config-kubelet'. Verify only current version kubelet-config ConfigMap exists. Closes-Bug: 2012975 Change-Id: I5e34299616690628267c07a744dc9923144e606d Signed-off-by: Jim Gauld <James.Gauld@windriver.com>	2023-04-24 16:21:22 -04:00
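
Commit 319622d42d in the log above replaces "systemctl is-active" calls with a "pid file check" for docker-distribution and registry-token-server. The commit message does not show how the check is written, so the following is only a minimal sketch of the general technique, in Python rather than in the repo's own Puppet. The function name is hypothetical, and the real change may work differently.

```python
import os
from pathlib import Path


def is_running_from_pidfile(pidfile: str) -> bool:
    """Return True if the PID recorded in `pidfile` refers to a live process.

    Hypothetical helper: it illustrates a pid-file liveness check that
    does not involve systemd at all.
    """
    try:
        pid = int(Path(pidfile).read_text().strip())
    except (OSError, ValueError):
        # Missing, unreadable, or malformed pid file: treat as not running.
        return False
    if pid <= 0:
        return False
    try:
        # Signal 0 performs existence and permission checks only;
        # nothing is delivered to the target process.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True
```

A check of this kind reads one file and issues one signal-0 probe, so it does not depend on systemd answering promptly. A stale pid file whose PID has been reused by an unrelated process would still report True, so this remains an approximation of service health.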

1704 Commits