stx-puppet

Author	SHA1	Message	Date
Igor Pires Soares	c2c9a70d97	Revert "Add instrumentation log for kubeadm join command" This reverts commit b1ec6538c6daeb21c01f59b74196b87bf99040ec. Reason for revert: Fix an issue where puppet doesn't stop when a command fails. Also aims to fix kubeadm join failures when unlocking controller-1 after Backup & Restore. This also enforces Zuul to use python 3.9, since it was running tox checks with python 3.12. Closes-bug: 2090224 Change-Id: Iebb76674e45f82563d9901b44cf0afe44a436822	2024-11-29 23:16:49 +00:00
Wallysson Silva	74f90fc674	Escape $ character in dc configuration files This ensures that a password containing a $ symbol can be read by oslo_config. oslo_config supports variable substitution [1]; to avoid substitution, $ must be escaped as $$. [1]: https://docs.openstack.org/oslo.config/latest/configuration/format.html#substitution Test Plan: - PASS: bootstrap a system controller with keystone admin password containing $, dc services should start Closes-Bug: 2089783 Change-Id: Icdbfae04b663bb9373116ff4967d4d78f57625c6 Signed-off-by: Wallysson Silva <wallysson.silva@windriver.com>	2024-11-27 15:54:13 -03:00
Zuul	cf6171a7dd	Merge "Fix CLI behavior change for Helm v > 3.3.1"	2024-11-26 21:41:50 +00:00
Zuul	890719a2d1	Merge "Remove sw-patch-agent.service from manifests"	2024-11-26 18:57:11 +00:00
Marcelo de Castro Loebens	c863d68029	Fix CLI behavior change for Helm v > 3.3.1 Helm introduced changes in the CLI behavior for the command 'helm repo add' in versions later than 3.3.1. For reference: https://github.com/helm/helm/issues/8771 . This caused the code inside platform::helm::repository to return an error when repos are updated, which in turn makes the puppet manifest fail. The previous behavior can be achieved by using the flag 'force-update' introduced by Helm. However, this is not backwards compatible, so its usage is conditioned on the software version to decrease the chance of issues during upgrades from stx 8.0. Test plan: PASS: Bootstrap DC + SX subcloud. PASS: Switch http_port to 80 (default is 8080). Verified that the puppet manifest executed successfully. Verified that helm repos were updated. PASS: Switch https_enabled to false. Verified that Horizon is accessible using the port 80. Story: 2011266 Task: 51407 Change-Id: I5dd0a5ac1914073a6e500cd9dde602aa63b874eb Signed-off-by: Marcelo de Castro Loebens <Marcelo.DeCastroLoebens@windriver.com>	2024-11-26 14:08:45 +00:00
Bin Qian	1b6ead7eba	fix compute node puppet failure after upgrade Fix a small syntax error in puppet manifest that causes the compute node main manifest failure after upgrade to stx-10. Closes-bug: 2089591 TCs: passed: compute node unlock successfully with kubelet service running after upgrade to stx-10. Change-Id: I73eb9b69647a0d8e05a8e4792fcec6deec6531fc Signed-off-by: Bin Qian <bin.qian@windriver.com>	2024-11-25 19:36:05 +00:00
mmachado	020cfa9d63	Remove sw-patch-agent.service from manifests sw-patch-agent service is to be disabled and must be removed from patching and keystone manifests. Depends-On: https://review.opendev.org/c/starlingx/config/+/936143 Test-Plan: PASS: AIO-SX upgrade using sw-manager strategy PASS: AIO-DX System Controller upgrade using strategy PASS: subcloud upgrade using dcmanager strategy Story: 2010676 Task: 51386 Change-Id: I201de8f2f2f4f16ad2d01933881a61f3ad41af7c Signed-off-by: mmachado <mmachado@windriver.com>	2024-11-25 09:11:22 -03:00
Zuul	cd1a3ee190	Merge "Change log in check_grub_config to Facter::warn"	2024-11-22 18:33:42 +00:00
Zuul	52997b98a2	Merge "Configure systemd CPUShares/CPUQuota for Kubernetes services"	2024-11-22 17:30:34 +00:00
Kyale, Eliud	7cdae0548e	Change log in check_grub_config to Facter::warn Switch from Puppet::info to Facter::warn, a different log API that doesn't rely on debug being enabled. Test plan: PASS - AIO-SX: iso install PASS - AIO-DX: iso install PASS - trigger update_grub_config.rb to test logging Example log: 2024-11-22T17:22:52.792 Warning: 2024-11-22 17:22:52 +0000 Facter: nohz_full=disabled is not presented in ... Related-Bug : 2089028 Change-Id: I24598a2b54a6649a0f76b6a4b295eb1254a203dc Signed-off-by: Kyale, Eliud <Eliud.Kyale@windriver.com>	2024-11-22 12:26:34 -05:00
Jim Gauld	602fa7a08a	Configure systemd CPUShares/CPUQuota for Kubernetes services This creates the systemd k8splatform.slice and this is configured with 10x CPUShares. Kubernetes services are latency critical. The following services are members of k8splatform.slice: etcd.service, containerd.service, kubelet.service. This also configures systemd CPUQuota: - 75%PlatformCPUS for kubelet.service - 75%PlatformCPUS for containerd.service In general the process behaviour of containerd and etcd services is auto-regulated by the load from kubelet. Usually these three services are well behaved (highly interactive, wakeup, do little work), mostly request driven. In theory putting a quota on kubelet.service should be sufficient, but there is occasionally a runaway log-flooding problem causing containerd/containerd-shim to use too much. This is the reason to also put a quota on containerd.service. This adds systemd hung behavior mitigations for Kubernetes DropIn files configuration, used after daemon-reload and restarting services. This includes usage of new scripts: - verify-systemd-running.sh This is part of an overall set of adjustments required for systemd cgroups CPUShares, CPUQuota, and AllowedCPUs for key system services. This will improve latency of Kubernetes critical components, and throttle less important services. 
Partial-Bug: 2084714 TEST PLAN: AIO-SX, AIO-DX, Standard, Storage, DC: - PASS: Fresh install - PASS: bootstrap: Verify that K8S services run under k8splatform.slice systemctl status k8splatform.slice - PASS: unlock: Verify that K8S services run under k8splatform.slice - PASS: reboot: Verify that K8S services run under k8splatform.slice AIO-SX: - PASS: reconfigure number platform cpus; unlock, verify updated CPUShares: kubelet, containerd - PASS: ansible-playbook replay - PASS: Platform USM Upgrade; verify systemd CPUShares settings - FAIL: docker-stx-override.conf requires regeneration AIO-SX, AIO-DX: - PASS: BnR - verify CPUShares after restore - PASS: K8S orchestrated Upgrade 1.24 - 1.29 - TODO: Platform USM Upgrade, including pre-activation rollback Change-Id: Ica5821b620453678861656db4efe6d7382bccadb Signed-off-by: Jim Gauld <James.Gauld@windriver.com>	2024-11-22 08:38:58 -05:00
Zuul	005359b81a	Merge "Add instrumentation log for kubeadm join command"	2024-11-21 21:14:04 +00:00
Jim Gauld	b1ec6538c6	Add instrumentation log for kubeadm join command This uses kube_command helper for logging instrumentation of 'kubeadm join' command. This is useful in cases when the join command fails or hits timeout. In the case of timeout, we currently get no indication of progress or the actual failure. The platform::kubernetes::kube_command helper function is updated to have new optional parameter 'unless', and the 'environment' parameter is modified to pass an array instead of a string to handle an empty array as the default. Partial-Bug: 2084714 TEST CASES: PASS: AIO-SX, AIO-DX, Standard, DC: Fresh install ISO. Verify we get file output logs in /var/log/puppet/<dir>/ for kubeadm-join-command.log with verbose output. PASS: AIO-DX: K8S Orchestrated upgrade Change-Id: Id88d07a62d9bd8785227213d5e1b49fca5260084 Signed-off-by: Jim Gauld <James.Gauld@windriver.com>	2024-11-20 19:04:40 -05:00
Zuul	89df24b7e9	Merge "Show detailed grub update logs"	2024-11-20 21:59:11 +00:00
Kyale, Eliud	e01bfe5fed	Show detailed grub update logs Show detailed logs that indicate which kernel arguments have been updated in order to assist in determining reboot cause. Kernel argument changes require a reboot, which affects performance and timing. Test plan: PASS - AIO-SX: iso install PASS - AIO-SX: manually edit kernel parameters and trigger puppet audit observe logs and reboot Closes-Bug : 2089028 Change-Id: I721cadf3dfb725bf3722eacca7a039cf3c4e31d1 Signed-off-by: Kyale, Eliud <Eliud.Kyale@windriver.com>	2024-11-20 02:21:41 -05:00
Zuul	67361c5a53	Merge "Tune postgresql memory and I/O settings for system controllers"	2024-11-19 16:58:45 +00:00
Jim Gauld	f5b83fe391	Configure toprc with additional fields P, NU, CGNAME This configures top using toprc configuration file for sysadmin and root. This enables fields: P, NU, CGNAME, and shows full command arguments by default. This improves the System Engineering debuggability of the system since we can more easily see where a task is running. 'P' (last cpu used) shows which logical cpu a task is currently running on, 'NU' shows the numa node, and 'CGNAME' shows the systemd cgroup name. This also helps diagnose tasks running in pods since they belong to well named cgroups. Partial-Bug: 2084714 TEST PLAN: - PASS: Fresh install AIO-SX, AIO-DX, Standard, Storage, DC - PASS: run 'top' as sysadmin and root user, verify we see 'P', 'NU', 'CGNAME' and command arguments. Change-Id: I50d8f25336a980bcdcba4d94ea727fee9726a527 Signed-off-by: Jim Gauld <James.Gauld@windriver.com>	2024-11-15 01:31:39 -05:00
Gustavo Herzmann	d405647ecb	Tune postgresql memory and I/O settings for system controllers Reduce work_mem from 512MB to 32MB to better handle increased connection counts in scale environments. The previous value was a legacy setting from when Ceilometer's complex queries required higher memory allocation. Increase shared_buffers from 80MB to 256MB to improve database caching performance for large queries. Tune random_page_cost and effective_io_concurrency parameters to optimize I/O performance for solid state drives. Test Plan: 01. PASS - Build an ISO with the commit and install a system controller with it, verify that the install completes successfully and that PostgreSQL starts successfully with the new config. 02. PASS - Run basic load test to confirm there's no performance degradation. 03. PASS - Monitor resource-intensive Distributed Cloud operations in a scale environment (e.g. dcmanager-audit) and verify they complete successfully. 04. PASS - Check logs for any warnings, errors or slow queries. Story: 2011106 Task: 51246 Change-Id: Idb43369b3e11590a50226cfaa0c903a091586de2 Signed-off-by: Gustavo Herzmann <gustavo.herzmann@windriver.com>	2024-11-13 18:15:00 -03:00
Zuul	3e0d07fc35	Merge "Fix .ceph_started flag creation" vf/caracal	2024-11-06 19:10:11 +00:00
Zuul	da3716f602	Merge "Select devices to disable APM"	2024-11-05 19:37:24 +00:00
Zuul	900256b0f2	Merge "Increase HAProxy USM timeout for "slow" Requests"	2024-11-05 19:37:18 +00:00
Fabiano Correa Mercer	ba5bb84892	Increase HAProxy USM timeout for "slow" Requests The software upload command currently fails due to a timeout while awaiting the HTTP response. This issue commonly occurs when uploading larger patches, such as a 1GB file. For example: software upload test.patch This timeout issue may occur on the client-side request, which is addressed in the following fix: https://review.opendev.org/c/starlingx/update/+/934084 Additionally, an adjustment is needed on the HAProxy side to increase the timeout for "slow" USM requests (PUT,POST,DEL + precheck requests) Test Done: PASSED software upload on AIO-SX with a 1GB patch file with timeout 1800s PASSED software upload on AIO-SX with a 1GB patch file with timeout 300s Edited the file: /etc/haproxy/haproxy.cfg Changed the "timeout server" setting to 300 seconds for the backend: alt-usm-restapi-internal Restarted the HAProxy service: sudo sm-restart service haproxy Apply the patch: software upload large.patch Confirmed the command failed due to HAProxy timeout, with the error: Gateway Timeout Depends-On: https://review.opendev.org/c/starlingx/update/+/934084 Story: 2010676 Task: 51265 Change-Id: I3a520455ceb150d753262d1542e5c4e608acec99 Signed-off-by: Fabiano Correa Mercer <fabiano.correamercer@windriver.com>	2024-11-05 16:06:27 -03:00
Hediberto C Silva	d8463f6031	Select devices to disable APM This commit is a part of the solution to mitigate a known issue in which the Advanced Power Management (APM) disk settings impacted read performance. These settings are dynamically set based on the enabled StarlingX tuned service profiles. On some specific hardware configurations (for example, PowerEdge XR11 with an integrated storage controller), degraded read performance was observed where the Tuned Disk Monitor didn't detect high usage, maintaining a limited and low APM level (default 20). For write operations, a delay of about 60 seconds was noticed to achieve the highest disk performance. Each unlocking will ensure the APM is disabled, but it can still be set manually at runtime using: "sudo hdparm -B <apm_level> /dev/sda". To ensure it's disabled for all devices, we need to provide the names, as Tuned retrieves all devices from /sys/block and attempts to apply the apm_level setting for each one. After failing to apply it three times (for example, with DRBD block devices), Tuned will disable the set_apm command for the others. To populate this parameter, the disks' persistent names (by-path values) from the inventory are needed. For example: devices_udev_regex=(ID_PATH=pci-...-ata-1.0)\|(ID_PATH=pci-...-ata-2.0) Test Plan: PASS: All packages built successfully PASS: Fresh Install SX/DX/STD in virtual environments PASS: After unlocking, verify that APM and Tuned Disk Monitor are disabled PASS: After unlocking, verify that /etc/tuned/starlingx/tuned.conf is populated with the selected devices PASS: All previous tests using XR11 lab PASS: After the initial unlock, the virtual host is locked, powered off, a disk is added, powered on, and after a new unlock, the new disk is added to devices_udev_regex. 
Partial-bug: 2086509 Depends-on: https://review.opendev.org/c/starlingx/config-files/+/933897 Change-Id: I7c71ecb05c5a406283af6da7af9bef08df0ded66 Signed-off-by: Hediberto C Silva <hediberto.cavalcantedasilva@windriver.com>	2024-11-04 17:08:25 -03:00
Victor Romano	9240927b58	Adjust certmon parameters for scalability To reduce the time it takes to audit subclouds by certmon, the number of subclouds that can be audited in parallel was increased from 4 to 20 and the timeout to check if it's possible to establish a connection was reduced from 10 to 5. Test plan: - PASS: Lock/unlock a system controller with these changes and verify the config file was correctly updated and certmon is auditing subclouds successfully Partial-bug: 2085540 Change-Id: I01be5a7b50598e6ba97878e71eb84f1472673deb Signed-off-by: Victor Romano <victor.gluzromano@windriver.com>	2024-10-29 10:58:03 -03:00
Zuul	9bc64ca31c	Merge "pxeboot should be provisioned after BnR"	2024-10-24 19:03:01 +00:00
Fabiano Correa Mercer	48d396dfdc	pxeboot should be provisioned after BnR The pxeboot-ip service was not provisioned after an AIO-SX BnR in R8.0, even though pxeboot ip ( 169.254.202.1 ) was installed. This issue does not occur in R9.0. However, the SM.pp can be simplified to ensure provisioning in all cases. Tests done: AIO-SX fresh install AIO-DX fresh install AIO-DX host-swact AIO-SX BnR AIO-DX BnR Closes-Bug: 2085537 Change-Id: I4143a23e75e8e17444364cf6c707722e9e494fd3 Signed-off-by: Fabiano Correa Mercer <fabiano.correamercer@windriver.com>	2024-10-24 12:56:44 +00:00
Zuul	4587bc0fd8	Merge "Disable Kubernetes application audit via config option."	2024-10-22 12:41:22 +00:00
Zuul	372d7509d9	Merge "Fix Puppet NFV Cinder version configuration"	2024-10-21 18:08:02 +00:00
marantes	b48469793e	Fix Puppet NFV Cinder version configuration In addition to the information shared at [1], the default Cinder version configuration has been updated to version 3, replacing the deprecated version 2. [1] https://review.opendev.org/c/starlingx/config/+/932563 TEST PLAN: PASS - build-pkgs -c -p puppet-nfv PASS - build-image PASS - AIO-SX fresh install PASS - Upload/Apply stx-openstack PASS - 'openstack endpoint list' showing cinderv3 Depends-On: https://review.opendev.org/c/starlingx/config/+/932563 Partial-Bug: 2084683 Change-Id: I9f7cb76b763df14af767dce8569aea23c711b391 Signed-off-by: marantes <murillo.arantes@windriver.com>	2024-10-21 12:22:52 -03:00
Zuul	22ab44d300	Merge "Start IPsec daemon before SM during reboot"	2024-10-18 10:28:01 +00:00
Kaustubh Dhokte	f33aa6a578	Use systemctl to restart kubelet This is a temporary change for the puppet class platform::kubernetes::update_kubelet_config::runtime. There is a known issue that pmon-restart kubelet does not actually restart the kubelet post platform upgrades. This is because of the missing kubelet.conf under /etc/pmon.d/. The issue is tracked separately. Until that issue is fixed, systemctl kubelet restart replaces pmon kubelet restart for the runtime kubelet reconfig functionality. Test: PASS: Boot standard (2+2) lab. Run 'system kube-config-kubelet'. Kubelets on all four nodes are restarted. PASS: Boot AIO-SX lab. Run 'system kube-config-kubelet'. Kubelet on the controller is restarted. Kubernetes cluster is healthy. PASS: Remove kubelet.conf from /etc/pmon.d to emulate current problem Run "system kube-config-kubelet". Kubelet is restarted. Kubernetes cluster healthy. Closes-Bug: 2084622 Change-Id: I6ffffb2fd56682dfc5da34aa4b867190c20f27b2 Signed-off-by: Kaustubh Dhokte <kaustubh.dhokte@windriver.com>	2024-10-17 19:08:35 +00:00
Andy Ning	3844a4c513	Start IPsec daemon before SM during reboot Currently the IPsec daemon and SM have no ordering dependency in systemd, so the IPsec daemon usually starts before SM during system booting and stops before SM during system shutdown. This causes an issue that when the active controller shuts down, the IPsec daemon stops earlier and all IPsec connections to the active controller are terminated. Since the mtcAgent (a SM service) is still running and working together with hbsAgent to monitor the standby controller, it sends a reboot request to the standby controller, via pxeboot network, to reboot it when connectivity loss on the mgmt network is detected by heartbeat. The expected behaviour is, when the active controller shuts down, the other controller should become active without rebooting. This change fixes the issue by updating the IPsec starter systemd service unit file so that the IPsec daemon starts before SM, and thus stops after SM (inverse of start order). Test Plan: PASS: Multiple nodes system (such as DX + 1 worker node) deployment, verify deployment is successful with all nodes in unlocked/enabled/available state. PASS: Shut down active controller, verify the other controller becomes active without rebooting. PASS: Power on the shut down controller, after it boots up, verify all nodes are in unlocked/enabled/available state, and there are no alarms. Story: 2010940 Task: 51182 Change-Id: I94166fae927a98b9caaac163bd399533bfe52719 Signed-off-by: Andy Ning <andy.ning@windriver.com>	2024-10-16 12:02:11 -04:00
Edson Dias	cc4ee0f8bd	Disable Kubernetes application audit via config option. This commit aims to facilitate the application debugging process in scenarios where the audit task is not convenient. A configuration option, skip_k8s_application_audit, was added to the application framework section of the sysinv.conf file to allow turning the audit task on or off. If this option is enabled, the system skips the audit task. The default value of skip_k8s_application_audit is False. Test Plan: PASS: build-pkgs && build-image PASS: AIO-SX fresh install PASS: check if _k8s_application_audit is running PASS: set skip_k8s_application_audit as false in sysinv.conf && restart sysinv conductor. PASS: check if _k8s_application_audit stopped working. Depends-on: https://review.opendev.org/c/starlingx/config/+/932329 Story: 2011242 Task: 51176 Change-Id: I77708a7d8be4a9c3254a15e13979277d96f20f33 Signed-off-by: Edson Dias <edson.dias@windriver.com>	2024-10-16 09:01:01 -03:00
Steven Webster	d74a25a7e7	Use IP address over FQDN for dcmanager rabbit/db connections In the past few months, _most_ StarlingX services have moved from static IP addressing to FQDN resolution, in support of the management network reconfig feature. While doing DC scalability testing, a transient domain resolution (controller.internal) issue was found after adding approximately 250 subclouds to the system; it involved the rabbitmq/RPC subsystem. The error message returned was similar to: "OSError: failed to resolve broker hostname" The rabbitmq/amqp library is calling a _connect() function, which in turn calls the python socket getaddrinfo(). Multiple attempts were made to reproduce the scenario in a non-scaled lab by stressing the getaddrinfo(), getting dnsmasq up to ~40 CPU usage, but the same error was not returned. Testing was done on the DC scale lab by manually changing the rabbit and DB config files and this confirmed that using the static floating IP (avoiding domain name resolution altogether) resolved the issue. It was decided to revert the FQDN aspect of the dcmanager and dcorch modules for now, as the management network reconfiguration feature would not even apply to an AIO-DX system controller at this time. This may be re-evaluated in the future at which point a deeper dive into the rabbit/RPC usage should be considered. Testing: - Install an AIO-DX system controller and install a subcloud. Ensure the subcloud is managed and online. - Ensure the dcmanager.conf and dcorch.conf commands use an IP address in their transport_url and database connection parameters. Depends-On: https://review.opendev.org/c/starlingx/config/+/932013 Story: 2010722 Task: 48447 Change-Id: Icd067441dd08321936eb03498ff65241fac0010e	2024-10-09 22:24:03 -04:00
Rei Oliveira	97dde7d666	Fix keystone access log for c1 This commit fixes keystone access logging to /var/log/keystone/keystone.log in controller-1. The INFO log level is set only during bootstrap, so at the moment it gets applied only to controller-0. This adds the logic to puppet as well to ensure that controller-1 has the right settings too. Test plan: PASS: Full build, install and bootstrap PASS: Host-swact to controller-1. Run authenticated commands such as 'system host-list' and verify that it gets logged to /var/log/keystone/keystone.log Story: 2011106 Task: 51139 Signed-off-by: Rei Oliveira <Reinildes.JoseMateusOliveira@windriver.com> Change-Id: I2fa902b09474214bafa268a185474a2df6e7aa97	2024-10-08 14:28:06 +00:00
Zuul	0106e1f454	Merge "kubelet not running after unmask and pmon-start"	2024-10-07 15:28:32 +00:00
Edson Dias	31daec3365	kubelet not running after unmask and pmon-start Start the kubelet service if not running after executing kubelet unmask and pmon-start during Kubernetes upgrades. This also replaces the Kubernetes health check script used during Kubernetes upgrades in favor of the new 'sysinv-k8s-health check' which is faster at evaluating multiple health endpoints and has enhanced logging. Test Plan PASS: Upgrade from previous release Upgrade Kubernetes from 1.24 to 1.25 PASS: Upgrade from previous release Multi-version Kubernetes upgrade from 1.24 to 1.29 PASS: Upgrade from previous release Pause the kubeadm process to trigger a K8S upgrade abort. Upgrade Kubernetes from 1.24 to 1.25 Check if the system successfully aborted the upgrade. Check kubectl command works without errors. Closes-bug: 2083635 Co-Authored-By: Boovan Rajendran <boovan.rajendran@windriver.com> Change-Id: Id735db17a4a398065e82fd392d9b7cfbbc212210 Signed-off-by: Edson Dias <edson.dias@windriver.com>	2024-10-04 17:03:24 -03:00
Zuul	67c78fe160	Merge "Modify QAT default configuration"	2024-10-04 19:59:42 +00:00
Felipe Sanches Zanoni	ba690c5ae3	Fix .ceph_started flag creation The flag .ceph_started was being created in the ceph.pp manifest. This flag enables SM and Pmon monitoring, but the Ceph processes might not be initialized yet. This flag is created in ceph.sh script called by MTC when enabling the host. When configuring ceph backend at runtime, the MTC will not call the ceph.sh script and the processes will not be fully started and the flag will not be created. To solve this, the ceph.sh script will be called at the end of puppet manifest runtime apply. Test-Plan: PASS: On all setups, do a fresh install and check the flag is not being created during the puppet manifest apply. The flag must be created by the MTC. PASS: On all setups, do a fresh install without Ceph backend configured. Configure Ceph at runtime and check if the flag is created after Ceph is configured. Partial-bug: https://bugs.launchpad.net/starlingx/+bug/2083056 Depends-on: https://review.opendev.org/c/starlingx/integ/+/930514 Change-Id: Ibbd7dbb41f00c2b2354eaa9a5bd8d383a3d63ac8 Signed-off-by: Felipe Sanches Zanoni <Felipe.SanchesZanoni@windriver.com>	2024-10-04 08:22:15 -03:00
Jorge Saffe	fcb9dbde24	Update Postgres Auth and Password Encryption Currently, the default authentication and password encryption method for PostgreSQL is 'md5'. However, it is necessary to update this to a more secure method, such as 'scram-sha-256'. The proposed solution addresses these updates using the 'puppetlabs-postgresql' Puppet module. Two new parameters have been added to the hieradata to configure the password encryption and authentication methods. Test Plan: - PASS Fresh Install SX env * Verify system status unlock/available * Login as admin user in psql (psql -U admin -h 127.0.0.1 -d sysinv) * Check postgres authorization configuration (SELECT * from pg_hba_file_rules;) * Check postgres password encryption configuration (SELECT rolname, rolpassword FROM pg_authid WHERE rolpassword IS NOT NULL;). - PASS Fresh Install DX env * Verify system status unlock/available * Login as admin user in psql (psql -U admin -h 127.0.0.1 -d sysinv) * Check postgres authorization configuration (SELECT * from pg_hba_file_rules;) * Check postgres password encryption configuration (SELECT rolname, rolpassword FROM pg_authid WHERE rolpassword IS NOT NULL;). * Host swact to controller-1 * Login as admin user in psql (psql -U admin -h 127.0.0.1 -d sysinv) * Check postgres authorization configuration (SELECT * from pg_hba_file_rules;) * Check postgres password encryption configuration (SELECT rolname, rolpassword FROM pg_authid WHERE rolpassword IS NOT NULL;). * collect logs (collect) * verify '/var/extra/database/' content - PASS Fresh Install DC env * Verify system status unlock/available * Check postgres authorization configuration (SELECT * from pg_hba_file_rules;) * Check postgres password encryption configuration (SELECT rolname, rolpassword FROM pg_authid WHERE rolpassword IS NOT NULL;). 
- PASS Backup and Restore SX - optimized * Verify system status unlock/available * Check postgres authorization configuration (SELECT * from pg_hba_file_rules;) * Check postgres password encryption configuration (SELECT rolname, rolpassword FROM pg_authid WHERE rolpassword IS NOT NULL;). - PASS Upgrade SX - PASS Upgrade SX-rollback - PASS Upgrade DX - PASS Upgrade DX-rollback Closes-bug: 2069842 Depends-On: https://review.opendev.org/c/starlingx/integ/+/930638 Change-Id: I0e93ff924e5448454d7cb6ae356f074befa3dc33 Signed-off-by: Jorge Saffe <jorge.saffe@windriver.com>	2024-10-03 20:55:31 +00:00
Zuul	a2a91b2537	Merge "Remove the database admin role for postgres"	2024-10-02 16:50:28 +00:00
Ramesh Kumar Sivanandam	72e1e0c84d	AIO-DX: Add super-admin.conf file on standby controller for K8s v1.29 A fresh installation of Kubernetes v1.29, as well as an upgrade from v1.28 to v1.29 in a duplex system, creates the super-admin.conf file on the active controller but not on the standby controller. The super-admin.conf file should exist on both controller nodes for redundancy purposes. This change ensures that the super-admin.conf file is generated on the standby controller during both a fresh install of K8s v1.29 and the upgrade from v1.28 to v1.29. Test Plan: PASS: Install ISO with K8s 1.29 on AIO-DX and verify that the super-admin.conf is present on both controllers. PASS: Install ISO with K8s 1.28 on AIO-DX, upgrade to 1.29 and verify that the super-admin.conf is present on both controllers. PASS: Install ISO with K8s 1.28 on AIO-DX, set the controller-1 as active, upgrade to 1.29 and verify that the super-admin.conf is present on both controllers. PASS: Verify that "sudo kubeadm certs check-expiration" command outputs the super-admin-conf details on both controllers. Closes-Bug: 2081769 Change-Id: I58b6a995b37b70e8b3350311ca3c89e4a008f8b7 Signed-off-by: Ramesh Kumar Sivanandam <rameshkumar.sivanandam@windriver.com>	2024-09-26 15:25:40 -04:00
Jorge Saffe	84eaab9ab7	Remove the database admin role for postgres PostgreSQL object-relational database admin role is no longer needed. It can be removed safely. Test Plan: - PASS Fresh Install SX env * Verify system status unlock/available - PASS Fresh Install DX env * Verify system status unlock/available - PASS Upgrade SX - PASS Upgrade SX-rollback - PASS Upgrade DX - PASS Upgrade DX-rollback - PASS Fresh Install DC env * Verify system status unlock/available Partial-bug: 2080971 Change-Id: I0c57f9bcab90ae0f987b828806c0b02e1200c2ca Signed-off-by: Jorge Saffe <jorge.saffe@windriver.com>	2024-09-26 20:29:21 +02:00
Md Irshad Sheikh	da173de3b7	Modify QAT default configuration This commit is to change default "ServicesEnabled" configuration from asym;dc to sym;dc in the PF and VF configuration template files. With the asym;dc configuration symmetric crypto is disabled, so crypto-perf test failed and only the compression-perf test passed. The crypto-perf test requires that the symmetric crypto (sym) service to be enabled, while the compression-perf test requires the dc service to be enabled. For testing details please refer following link: https://github.com/intel/intel-device-plugins-for-kubernetes/blob/release-0.30/cmd/qat_plugin/README.md#demos-and-testing Test Plan: PASSED: build-pkgs & build-image PASSED: Check "systemctl status qat_service.service" Service should be up and running. PASSED: Check the "systemctl is-enabled qat_service.service". Service should be enabled. PASSED: Check the "/etc/init.d/qat_service status". The number of QAT VF endpoints should match to QAT supported sriov numvfs i.e 16. PASSED: Check the number of PF and VF config files (Eg: 4xxx_dev0,4xxxvf_dev0.conf) in /etc directory. It should match the total QAT PFs and number of sriov numvfs. It should also have "ServicesEnabled = sym;dc". PASSED: App apply after enabling the QAT plugin chart. After apply, QAT pod should be running. PASSED: Check the description of the node after applying the app using command "kubectl describe node controller-0". It shows the Capacity: qat.intel.com/sym-dc:32 and Allocatable: qat.intel.com/sym-dc: 32 PASSED: Crypto-perf test PASSED: Compression-perf test Story: 2010604 Task: 51052 Change-Id: Ie3d8da6f7d2fb06e0c90b0ba3f93652482cc1277 Signed-off-by: Md Irshad Sheikh <mdirshad.sheikh@windriver.com>	2024-09-26 11:49:29 -04:00
Zuul	deb5736553	Merge "Fix Ceph being unresponsive on AIO-DX standalone controller"	2024-09-24 14:02:16 +00:00
Zuul	7d7b6f3664	Merge "Revert "Update Postgres Auth and Password Encryption""	2024-09-18 18:58:56 +00:00
Jorge Saffe	db10c755a2	Revert "Update Postgres Auth and Password Encryption" This reverts commit 2594c9a860258280a74e7f7248942cd0f9814c23. Reason for revert: Changes are affecting DC installation/bootstrapping Change-Id: Iface6ecb222f703219d05b04d7798d8f336b502a	2024-09-18 18:44:50 +00:00
Zuul	52cd886904	Merge "Update swanctl AppArmor profile for LUKS fs access"	2024-09-16 16:42:33 +00:00
Zuul	14255078d2	Merge "Prevent mtce runtime manifest from sighup'ing processes not running"	2024-09-16 15:48:19 +00:00
Zuul	145cb9744e	Merge "Update Postgres Auth and Password Encryption"	2024-09-16 15:39:28 +00:00
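
Note on commit 74f90fc674 (Escape $ character in dc configuration files): the escaping rule described in that message can be illustrated with a minimal Python sketch. The actual change lives in the Puppet manifests; the function name and the sample password below are hypothetical and exist only for illustration.

```python
def escape_oslo_dollar(value: str) -> str:
    """Double every '$' so that oslo.config reads it as a literal character.

    oslo.config treats '$name' as a substitution reference; '$$' is the
    escape for a single literal '$'.
    """
    return value.replace("$", "$$")


# Hypothetical password, used only for illustration.
sample_password = "Adm1n$pass"
print(escape_oslo_dollar(sample_password))
```

A value without any `$` passes through unchanged, so the helper is safe to apply to every password written to the configuration file.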

2126 Commits
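
Note on commit d8463f6031 (Select devices to disable APM): the devices_udev_regex value quoted in that message has a simple shape, one (ID_PATH=...) group per disk joined by \|. The Python sketch below only reproduces that shape; the function name is hypothetical, the by-path values are the elided placeholders from the commit message, and the real value is generated by the Puppet manifests from the inventory.

```python
def build_devices_udev_regex(id_paths):
    """Join by-path values into the devices_udev_regex line format.

    Each disk becomes one '(ID_PATH=<by-path>)' group; groups are
    separated by a backslash-escaped pipe, as in the commit message.
    """
    groups = ["(ID_PATH={})".format(path) for path in id_paths]
    return "devices_udev_regex=" + r"\|".join(groups)


# Placeholder by-path values copied from the commit message (elided there too).
print(build_devices_udev_regex(["pci-...-ata-1.0", "pci-...-ata-2.0"]))
```

Listing the disks explicitly keeps Tuned from trying the APM setting on devices such as DRBD block devices, which the commit message names as the cause of set_apm being disabled after three failures.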