3642 Commits

Author SHA1 Message Date
Ayyappa Mantri
236a907296 Fix to clear the alarms on compute/storage nodes
When remote ldap service parameters are added, the config out-of-date
alarms raised on compute/storage nodes are not cleared. This fix
addresses the issue by including the compute and storage node types in
the personalities for host update configs with reboot=False.

Test Cases:
PASS: Add remote ldap service parameters on standard lab with
      compute and storage nodes and apply, verify the alarms
      are cleared.

Closes-bug: 2024916

Change-Id: I3c45d89910db0818daf542edae91b3633cd34173
Signed-off-by: Ayyappa Mantri <ayyappa.mantri@windriver.com>
2023-06-26 12:14:01 -04:00
Zuul
4bbf8d7dd8 Merge "Add kubernetes endpoint health-check with timeout" 2023-06-23 14:19:25 +00:00
Zuul
3e69e2b6d2 Merge "Use fully qualified names for WAD users/groups" 2023-06-22 22:48:45 +00:00
Zuul
c802738bb6 Merge "Fix file descriptor leak on zerorpc Client" 2023-06-22 22:11:49 +00:00
Alyson Deives Pereira
3618b76d94 Fix file descriptor leak on zerorpc Client
Reuse zerorpc client with client_provide.get_client_for_endpoint() to
avoid leaking file descriptors.

Test Plan (AIO-DX):
PASS: Without this change, set controller-0 to not use zeromq and
configure controller-1 to use RPC hybrid_mode. Verify the number of
fd files on controller-1 at /proc/<sysinv-agent-pid>/fd keeps
increasing over time.

PASS: With this change on controller-1 and the same configuration
above, verify that the number of fd files on controller-1 does not
increase over time.

PASS: With this change on controller-1, update controller-0 to use
zeromq instead of rabbitmq, and keep controller-1 on hybrid mode.
Verify that no errors occur in the logs and that the number of fds does
not increase.

Closes-Bug: 2024834
Signed-off-by: Alyson Deives Pereira <alyson.deivespereira@windriver.com>
Change-Id: I85733533b1ff3a2ef869ae2b23730527fc24466e
2023-06-22 17:59:19 -03:00
Carmen Rata
37ec9dcab8 Use fully qualified names for WAD users/groups
WAD users and groups discovered by SSSD and imported in the stx
platform have been configured to get Linux IDs. When there are
users or groups with the same name in multiple WAD domains, they need
to be unambiguously identified by stx platform NSS using the full
name format "user_name@domain_name".
This commit sets the sssd attribute "use_fully_qualified_names" to
"true", the default value being "false". The setting ensures that
the user's full login name, "user_name@domain_name", gets reported to
NSS. So, with this change, SSSD-discovered users and groups get fully
qualified names on the stx platform. All requests pertaining to a WAD
domain user or group must use the format "name@domain", for
example "getent passwd user1@wad.domain1.com".
This commit also removes 2 WAD domain attributes that are obsolete.

Test Plan:
PASS: Debian image gets successfully installed in AIO-SX system.
PASS: Configure SSSD to connect to 2 WAD domains, "wad.domain1.com"
and "wad.domain2.com".
PASS: Create 2 users with the same name "test-user", one in
wad.domain1 and the other in wad.domain2. Check using "getent passwd"
command that SSSD has cached the users with the fully qualified
name: "getent passwd test-user@wad.domain1.com" and "getent passwd
test-user@wad.domain2.com".
PASS: Check that "getent passwd test-user" or "getent passwd|grep
test-user" does not find the users.
PASS: Verify ssh works using the fully qualified names for the users.
PASS: Verify that 2 groups with the same name, "test-group", created
one in wad.domain1 and the other in wad.domain2 follow the same rules
as users with the same names.
PASS: Add test-user from wad.domain1 to the test-group in the same
domain and verify membership. Verify that test-user in wad.domain2 does
not belong to "test-group" in wad.domain1.

Story: 2010589
Task: 48270

Signed-off-by: Carmen Rata <carmen.rata@windriver.com>
Change-Id: I34388e2f1389cb39a6d258b126572e6a72308b40
2023-06-22 04:03:22 +00:00
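In sssd.conf terms, the change described above corresponds to a domain-section option; a minimal sketch, using one of the test-plan domains as an example:

```ini
# /etc/sssd/sssd.conf (fragment) - example domain from the test plan
[domain/wad.domain1.com]
# Report users/groups to NSS as user_name@domain_name so identically
# named accounts in different WAD domains stay distinguishable.
use_fully_qualified_names = true
```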
Zuul
2b9ea72538 Merge "Eliminate unused function in sysinv helm" 2023-06-20 20:13:01 +00:00
Joshua Reed
ef87a1b523 Eliminate unused function in sysinv helm
The file base.py in helm contained a function that is not used
outside the BaseHelm class.

The function was verified to be unused anywhere else in the
sysinv code base, whether inside or outside the helm folder.

The function was removed and its uses in helm.py were refactored.

Test Plan:
PASS: build-pkg -a && build image
PASS: AIO-SX full install with clean bootup.
PASS: Verified initial system applications installed
      correctly;  system application-list
PASS: Installed a new application to verify a new
      app can be installed cleanly.
      Used the metrics-server app in
      /usr/local/share/applications/helm to do so.

Story: 2010794
Task: 48246
Change-Id: I9325b3bbc3b10a465cb3535d10042461cc713c9d
Signed-off-by: Joshua Reed <joshua.reed@windriver.com>
2023-06-20 11:57:38 -07:00
Zuul
40d0d3f01c Merge "Enable firewall for DC setups" 2023-06-20 17:59:17 +00:00
Zuul
be9a075fb1 Merge "Add patch validation importing a inactive load" 2023-06-20 13:19:45 +00:00
Boovan Rajendran
a4e440f31f Add kubernetes endpoint health-check with timeout
This checks k8s control-plane component health for a specified
endpoint, and waits for that endpoint to be up and running.
This checks the endpoint 'tries' times using an API connection
timeout and a sleep interval between tries.

The default endpoint is the localhost apiserver readyz URL
if not specified.

Test plan:
Pass: Verified that during a k8s upgrade abort, it waits for all
control-plane endpoints to be healthy.

Story: 2010565
Task: 48215

Change-Id: I9aae478765cf8aa13b7769127a87e18c33b5fe0b
Signed-off-by: Boovan Rajendran <boovan.rajendran@windriver.com>
2023-06-20 03:40:41 -04:00
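The retry behaviour described above can be sketched as a generic poll loop (hypothetical names, not the sysinv implementation; the probe stands in for an HTTPS GET of the apiserver /readyz URL):

```python
import time

def wait_for_endpoint_health(probe, tries=3, timeout=10, interval=1):
    """Poll `probe` up to `tries` times, sleeping `interval` between tries.

    `probe(timeout)` returns True when the endpoint (e.g. the local
    apiserver readyz URL) answers healthy within `timeout` seconds.
    """
    for attempt in range(1, tries + 1):
        if probe(timeout):
            return True
        if attempt < tries:
            time.sleep(interval)
    return False

# Demo probe that becomes healthy on the second try.
state = {"calls": 0}
def flaky_probe(timeout):
    state["calls"] += 1
    return state["calls"] >= 2

healthy = wait_for_endpoint_health(flaky_probe, tries=3, interval=0)
print(healthy, state["calls"])  # True 2
```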
Zuul
33b6de2a12 Merge "Aborting kubernetes upgrade process for AIO-SX" 2023-06-16 19:27:07 +00:00
Zuul
49d74236f0 Merge "Relocate pxeboot-update script to writable dir" 2023-06-16 19:27:02 +00:00
Andre Kantek
682d17f18f Enable firewall for DC setups
As in the case for non-DC installations, internal cluster traffic for
the platform networks will receive a firewall that allows only packets
within the internal networks, by filtering only with source IP address
and not using L4 ports.

It will restrict traffic between the system-controller and the
subclouds to the L4 ports described in:
https://docs.starlingx.io/dist_cloud/kubernetes/distributed-cloud-ports-reference.html

The rules also restrict the L4 ports to only the networks involved:
the subcloud accepts traffic only from the system controller, and the
system controller only from the subclouds.

The DC rules are applied in the management network (or the admin
network, if used in a subcloud).

Test Plan
[PASS] Install DC system-controller with firewall active
[PASS] Install DC subcloud with firewall active (using management
       network on both sides)
[PASS] Modify subcloud to use admin network during runtime
[PASS] Validate that only the registered firewall ports are
       accessible from system-controller to subcloud
[PASS] Validate that only the registered firewall ports are
       accessible from subcloud to system-controller
[PASS] Execute a subcloud rehoming

Story: 2010591
Task: 48244

Change-Id: I4d27baa601d7f9b43e6c09e703a548656f8846f4
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
2023-06-16 11:54:52 -03:00
Joshua Reed
50aa301e6d Correct Log bug in sysinv conductor.
In file kube_app.py, the class FluxCDHelper has a function
called make_fluxcd_operation. Inside there is a LOG.error
call with an invalid python string formatter: the manifest_dir
argument is missing, and this change corrects the issue.

Test Plan:
PASS: build-pkg -a &&  build-image
PASS: full AIO-SX install
PASS: run system application-upload
          /usr/local/share/applications/helm/security-profiles-operator-22.12-1.tgz
      run system application-list to validate the security
          profile application is uploaded successfully
      run system application-apply security-profiles-operator to deploy
          the application
      Last: observe the corrected log output in /var/log/sysinv.log

Closes-Bug: 2024020
Change-Id: Icfffd04309721193b71654927751b783b9c6ace2
Signed-off-by: Joshua Reed <joshua.reed@windriver.com>
2023-06-15 12:44:10 -07:00
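The class of bug being fixed can be reproduced with a short sketch (hypothetical message text; the real call involves manifest_dir): a %-style LOG.error with more placeholders than arguments makes the logging module report a formatting error instead of the intended message.

```python
import logging

logging.basicConfig(level=logging.ERROR)
LOG = logging.getLogger("kube_app")

op, manifest_dir = "apply", "/opt/manifests/app"

# Buggy: two %s placeholders but only one argument. logging catches the
# TypeError internally and prints "--- Logging error ---" to stderr
# instead of the message.
LOG.error("Operation %s failed in %s", op)

# Fixed: every placeholder has a matching argument.
LOG.error("Operation %s failed in %s", op, manifest_dir)
```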
Zuul
d2039c40c0 Merge "Restart vim on admin endpoint re-config" 2023-06-15 16:47:36 +00:00
Zuul
fb2636273f Merge "Update System Inventory semantic checks to permit pci-sriov AE members." 2023-06-15 13:22:45 +00:00
Guilherme Schons
3952cf4d7f Add patch validation importing a inactive load
This commit adds a patch validation to check whether the imported
load is patch compatible with the current version.

The validation checks the current metadata against the comps file
inside the patches directory, if it exists.

Test Plan:
PASS: Import a CentOS pre-patched image upgradable to current version.
PASS: Fail to import a CentOS pre-patched image non upgradable to
current version.

Story: 2010611
Task: 48198

Signed-off-by: Guilherme Schons <guilherme.dossantosschons@windriver.com>
Change-Id: I236222fd21de7004ebbfcf585f9e30d7418777c4
2023-06-14 10:09:11 -03:00
Zuul
f0cb88013a Merge "Add static route subnets to the firewall if in mgmt or admin nets" 2023-06-13 17:41:00 +00:00
Zuul
e8224cb2bd Merge "Add firewall update when admin network is updated in subclouds" 2023-06-13 17:40:54 +00:00
Mayank Patel
3aa6294ae3 Update System Inventory semantic checks to permit pci-sriov
AE members.

The current link aggregation (bonding) of platform interfaces is
restricted to ports that have an interface class of none.
The semantic check need to removed to permit the configuration
of Aggregated Ethernet (AE) interfaces  with member interfaces
that have a class of pci-sriov. This would create a bond
interface of the SRIOV Physical Function (PF) network devices.

Test Plan:
PASS: system host-if-modify -c pci-sriov -N 10 controller-0
      enp179s0f0
PASS: system host-if-modify -c pci-sriov -N 10 controller-0
      enp179s0f1
PASS: system host-if-add -c platform --aemode active_standby
      controller-0 bond0 ae enp179s0f0 enp179s0f1
PASS: system host-if-delete controller-0 bond0
PASS: port sharing

Story: 2010706
Task: 47903
Change-Id: I8d85bbab88e3d55173bc6db012299b45ec091512
Signed-off-by: Mayank Patel <mayank.patel@windriver.com>
2023-06-13 13:20:55 -04:00
Steven Webster
d754e4a261 Restart vim on admin endpoint re-config
A problem can occur when installing a subcloud with the admin
network enabled.

The symptom is the host not enabling VIM services for approximately
one hour after the first unlock.  This may manifest in a user
noticing that the platform-integ-apps application is not enabled.

For context, after ansible bootstrap, sysinv re-configures the
admin endpoint for all services. One reason is that https services
cannot be enabled at ansible bootstrap time.

The nfv-vim services depend on using the admin endpoints when
communicating with sysinv. This can lead to a situation where
the VIM service starts before the endpoints are re-configured by
sysinv and cannot then deal with the change in endpoint address/port.

This is likely not seen when a subcloud uses the management
network, as the internal management endpoint can also service the
requests from the VIM.

This commit simply calls into the nfv-vim runtime puppet class to
restart the VIM service on endpoint reconfigure.

Testing:

- Install system controller, ensure it comes up with no issues
- Install subcloud, ensure it comes up and that there are no
  errors in the nfv-vim log after endpoint reconfigure. In
  addition, verify the symptom that first prompted this bug is
  gone: the application of platform-integ-apps is no longer
  delayed by ~1 hour. Perform the test with the subcloud using
  both the management and the admin network for communication
  with the system controller

Story: 2010319
Task: 46910

Change-Id: I677b87e949cceba77240bc62217af3889a697b40
Signed-off-by: Steven Webster <steven.webster@windriver.com>
2023-06-13 10:19:51 -04:00
Andre Kantek
a4aaf6853e Add static route subnets to the firewall if in mgmt or admin nets
This change adds, for subclouds, the static route subnets to the
appropriate firewall (mgmt or admin network), to restrict the set of
L4 ports available in distributed cloud setups to be used only from
the system controller.

It collects the static routes that are using the management or admin
networks and adds them to the respective firewall, in a similar way
that is done in the system controller.

Test plan:
[PASS] Install subcloud using management network to connect with
       system controller
[PASS] Modify subcloud in runtime to use the admin network instead of
       management to connect with system controller
[PASS] Execute a subcloud rehoming operation.

Story: 2010591
Task: 48185

Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/885457
Change-Id: I2b96c364f0c69d54c08bc5e157f60be335d2b114
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
2023-06-13 11:12:33 -03:00
Boovan Rajendran
b88a2dee7a Aborting kubernetes upgrade process for AIO-SX
This change is to introduce a new system command
"system kube-upgrade-abort" which will abort the k8s upgrade
process for an AIO-SX.

The expected sequence on AIO-SX is:
 - system kube-upgrade-start <target-version>
 - system kube-upgrade-download-images
 - system kube-upgrade-networking
 - system kube-host-cordon controller-0
 - system kube-host-upgrade controller-0 control-plane
 - system kube-host-upgrade controller-0 kubelet
 - system kube-host-uncordon controller-0
 - system kube-upgrade-complete
 - system kube-upgrade-delete

For system kube-upgrade-start and system kube-upgrade-download-images,
aborting with system kube-upgrade-abort simply changes the upgrade
state to aborted, since nothing has been modified yet.

For the below-mentioned sequence on AIO-SX, the k8s upgrade can be
aborted at any stage. Here we call the puppet class
'platform::kubernetes::upgrade_abort', which drains the node; stops
the kubelet, containerd, docker and etcd services; restores the etcd
snapshot and the static manifest files; starts the etcd, docker and
containerd services; updates the bind mount; starts the kubelet
service; and waits for control-plane pod health.

 - system kube-upgrade-networking
 - system kube-host-cordon controller-0
 - system kube-host-upgrade controller-0 control-plane
 - system kube-host-upgrade controller-0 kubelet
 - system kube-host-uncordon controller-0

The initial Kubernetes version control plane state is stored in a backup
containing etcd snapshot and static-pod-manifests. This backup is taken
when 'system kube-upgrade-networking' is issued.

The "system kube-upgrade-abort" command will only be available prior to
the "system kube-upgrade-complete".

Test Case:
AIO-SX: Fresh install ISO as AIO-SX. Perform k8s upgrade v1.24.4 -> v1.25.3
PASS: Create a test pod before the etcd backup and delete the pod
after taking the snapshot; run the command "system kube-upgrade-abort"
and verify the test pod is running after etcd is restored successfully.
PASS: Verified by performing initial bootstrap and host-unlock prior to
bootstrap.
PASS: Verify the kubeadm and kubelet versions are restored
successfully to the from-version after k8s upgrade abort.
PASS: Verify static manifests are restored successfully after k8s
upgrade abort.
PASS: Verify /etc/fstab content updated successfully after k8s upgrade
abort.
AIO-DX:
PASS: "system kube-upgrade-abort" will raise an error
'system kube-upgrade-abort is not supported in duplex'.

Story: 2010565
Task: 47826

Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/880263
Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/883150

Change-Id: Ia18079c5f17f86fb73776bfad124c72a0a3be6ad
Signed-off-by: Boovan Rajendran <boovan.rajendran@windriver.com>
2023-06-13 03:36:03 -04:00
Zuul
45d47fdfaf Merge "Fix rollback for Standard config" 2023-06-09 21:19:49 +00:00
Zuul
c77700fa61 Merge "Playbooks directory cleanup on system load-delete" 2023-06-09 20:12:03 +00:00
Zuul
d4b061b26f Merge "Add unit tests to load-delete workflow" 2023-06-09 20:11:57 +00:00
Luiz Felipe Kina
3093104db0 Fix rollback for Standard config
After upgrading the second controller on a Standard config and rolling
back a software upgrade on 22.12, the current upgrade abort procedure
broke the Ceph monitor quorum resulting in blocked operations to the
Ceph cluster.

During a platform upgrade of a standard controller deployment, two of
three Ceph monitors are required to be unlocked/enabled to maintain a
Ceph quorum and provide uninterrupted access to persistent storage. The
current procedure and semantic checks required the operator to lock the
3rd monitor on a worker while locking and rolling back the software on
one of the active controllers. This left only one monitor active and
the Ceph monitors without an active quorum.

Another problem was that ceph on controller-1 was being
turned off and not turned back on.

The change is in the semantic check, ensuring the host with the 3rd
monitor no longer needs to be locked and that ceph on controller-1
stays up during the entire rollback process on controller-0.

Test Plan:
Pass: Install Standard 21.12 Patch 9, upgrade to 22.12,
apply the changes, and roll back the upgrade.
Pass: Install Storage 21.12 Patch 9, upgrade to 22.12,
apply the changes, and roll back the upgrade.

Closes-Bug: 2022964

Signed-off-by: Luiz Felipe Kina <LuizFelipe.EiskeKina@windriver.com>
Change-Id: I607a0b2bbf2fa847e8b76425ea5f940be3a81577
2023-06-09 13:12:30 -04:00
Zuul
20e16db383 Merge "SX host-lock failed by "Timeout while waiting on R"" 2023-06-09 04:51:28 +00:00
Guilherme Schons
38003607e3 Playbooks directory cleanup on system load-delete
Remove playbooks directory from the imported load on the system
controller and its mate as part of system load-delete.

This directory will only exist on a DC system controller.

Test Plan:
- PASS: Import an inactive CentOS load and verify all related
directories and files are cleaned up after deleting it.

Story: 2010611
Task: 48159
Signed-off-by: Guilherme Schons <guilherme.dossantosschons@windriver.com>
Change-Id: I191e9b0d13fc9954da8be41df7df717d4f1f5b8c
2023-06-08 20:51:34 +00:00
Zuul
96de678403 Merge "Add cluster-pod network to cluster-host firewall in IPv4" 2023-06-07 18:20:57 +00:00
Zuul
4eb916fb2e Merge "Add functionality for intel qat device plugin" 2023-06-07 14:08:37 +00:00
Andre Kantek
9269aafdbf Add cluster-pod network to cluster-host firewall in IPv4
It was observed that it is not necessarily true that all pod traffic
is tunneled in IPv4 installations. To solve that, we are extending the
solution done for IPv6 to IPv4, which consists of adding the
cluster-pod network to the cluster-host firewall.

The problem showed itself when the stx-openstack application was
installed.

Test Plan:
[PASS] observe stx-openstack installation proceed with the correction

Closes-Bug: 2023085

Change-Id: I572cd85e6638d879d8be1d9992ae852a805eca4b
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
2023-06-07 09:50:47 -03:00
Lucas Ratusznei Fonseca
ea383425ed Inclusion of the subcloud networks in the firewall rules
This commit implements the inclusion of the subcloud networks in the
firewall ingress rules. The networks are obtained from the routes
table.

Test plan:

Setup: Distributed Cloud with AIO-DX as system controller.

[PASS] Add subcloud, check that the corresponding network is present in
       the system controller's firewall.
[PASS] Remove subcloud, check that the corresponding network is no
       longer present in the system controller's firewall.

Story: 2010591
Task: 48139
Depends-on: https://review.opendev.org/c/starlingx/stx-puppet/+/885303
Change-Id: Ia83c26c88914413026953fcef97af55fe65bd058
Signed-off-by: Lucas Ratusznei Fonseca <lucas.ratuszneifonseca@windriver.com>
2023-06-05 15:50:06 -03:00
Andre Kantek
2a662a4572 Add firewall update when admin network is updated in subclouds
This change adds the firewall update when the interface-network API
is executed with the admin network.

Test Plan:
[PASS] add an admin network during runtime in a subcloud

Story: 2010591
Task: 48186

Change-Id: I7b556ec8d95bf879cb9036d654e38fd658da5a61
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
2023-06-05 15:29:20 -03:00
Guilherme Schons
393a839630 Add unit tests to load-delete workflow
This commit adds unit tests to the load-delete workflow, improving
test coverage and validating the flow for future changes.

Test Plan:
- PASS: Tox tests

Story: 2010611
Task: 48159
Signed-off-by: Guilherme Schons <guilherme.dossantosschons@windriver.com>
Change-Id: If6a27af3a76d4aff8e4168b72ad5892046fe9ba6
2023-06-05 10:52:40 -03:00
Zuul
45814e8e48 Merge "New image parsing pattern that supports "registry"" 2023-06-05 13:10:33 +00:00
Md Irshad Sheikh
2f4997cf74 Add functionality for intel qat device plugin
Pods of the intel qat device plugin will only be created on
nodes that support intel qat drivers and carry the label
“intelqat: enabled”.

In this commit, the sysinv agent checks the host QAT device
driver. Once a supported device is detected, the sysinv agent
sends a request to the sysinv conductor, and the conductor sets
the kubernetes label “intelqat: enabled” for the specific node if
the file “/opt/platform/config/22.12/enabled_kube_plugins” exists and
"intelqat" is in the file.

To detect whether host supports QAT devices or not, sysinv code is
added based on following approaches:
1. Output of lspci command is parsed to check whether PFs are listed
   for vendor 8086 and 4940/4942 QAT devices on a Sapphire Rapids lab.
2. Output of lspci command is parsed to check whether VFs are listed
   for other QAT devices, following the old code implementation, as no
   hardware was available to validate this.

TEST PLAN:
PASS: verified “intelqat: enabled” using the
      "kubectl get nodes controller-0 --show-labels" command.
PASS: checked whether daemonset pods are running or not, using
      command "kubectl get ds -A | grep 'qat-plugin'"
NOTE: No QAT hardware is available, so all testing is done using
      commands after bypassing the driver related checks in the code

Story: 2010604
Task: 47853

Signed-off-by: Md Irshad Sheikh <mdirshad.sheikh@windriver.com>
Change-Id: Ib5d4bafbf918f4c0e3ebcb5fa78f90d021e0ef20
2023-06-05 12:17:50 +00:00
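Approach 1 above can be sketched as a small parser over `lspci -nm`-style numeric output (a simplified, hypothetical sketch with a canned sample line; the real sysinv code covers more cases):

```python
import re

QAT_VENDOR = "8086"
QAT_PF_DEVICES = {"4940", "4942"}

def has_qat_pf(lspci_output):
    """Return True if any line of `lspci -nm`-style numeric output
    lists an Intel (8086) 4940/4942 QAT physical function."""
    for line in lspci_output.splitlines():
        # `lspci -nm` prints quoted fields: slot, class, vendor, device, ...
        m = re.search(r'"(%s)"\s+"(\w{4})"' % QAT_VENDOR, line)
        if m and m.group(2) in QAT_PF_DEVICES:
            return True
    return False

# Canned sample resembling `lspci -nm` output on a QAT-capable host.
sample = '6b:00.0 "0b40" "8086" "4940" -ra40 "8086" "0000"\n'
print(has_qat_pf(sample))  # True
```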
Zuul
a2cba775b2 Merge "Enable sysinv no-value-for-parameter check in pylint" 2023-06-02 19:02:34 +00:00
Zuul
3b39dd5b8e Merge "Add joined query to the InterfaceDataNetworks model" 2023-06-01 19:31:24 +00:00
Zuul
8521fbb6ba Merge "Reduce sudo rules refresh time for openldap" 2023-06-01 18:39:43 +00:00
David Barbosa Bastos
12bb149f92 New image parsing pattern that supports "registry"
Added support for new "registry" pattern. Image settings
inside charts can now have the following pattern:

image:
   registry: <str>
   repository: <str>

Test Plan:
PASS: Upload and apply process successfully completed with
tarball changed to new pattern using "registry"
PASS: metrics-server, nginx-ingress-controller, vault and
sts-silicom upload and apply process without "registry"
completed successfully.

Closes-bug: 2019730

Change-Id: Id5cadafedf9b85891700dffcede9b0b09ee64359
Signed-off-by: David Barbosa Bastos <david.barbosabastos@windriver.com>
2023-06-01 15:56:56 +00:00
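A hypothetical helper showing how the two patterns can be normalized into one image reference (names are illustrative, not the actual sysinv code):

```python
def image_reference(image):
    """Build a full image reference from a chart's `image:` values.

    Supports both the new pattern:
        image: {registry: <str>, repository: <str>}
    and the old one, where `repository` already embeds the registry.
    """
    registry = image.get("registry")
    repository = image["repository"]
    if registry:
        return "%s/%s" % (registry.rstrip("/"), repository)
    return repository

new_style = {"registry": "registry.local:9001", "repository": "starlingx/app"}
old_style = {"repository": "registry.local:9001/starlingx/app"}
print(image_reference(new_style))  # registry.local:9001/starlingx/app
print(image_reference(old_style))  # registry.local:9001/starlingx/app
```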
Zuul
a0d420ea57 Merge "Restore openstack/common/context file" 2023-06-01 15:56:08 +00:00
Zuul
f4ec321b9f Merge "Add system health check for psp policy" 2023-06-01 14:54:14 +00:00
David Barbosa Bastos
e2af795583 SX host-lock failed by "Timeout while waiting on R"
When sysinv is restarted and there is an application stuck,
the _abort_operation function was called with a parameter
different from the expected one. The parameter needs to be
an instance of AppOperation.Application.

The function call was changed to pass the correct parameter, and
documentation was added to the _abort_operation function.

Test Plan:
PASS: Restart sysinv successfully
PASS: Restart sysinv with stuck kubervirt app performed
successfully
PASS: Successfully lock and unlock the controller
PASS: Shows the name of the chart that caused the app to abort
PASS: Individually show the image that failed when trying to
apply the app
PASS: Command "system application-abort" executed and output
message "operation aborted by user" displayed in
application-list as expected

Closes-bug: 2022007

Signed-off-by: David Bastos <david.barbosabastos@windriver.com>
Change-Id: I948ec8f9700d188a5f8e099a4992853822735b95
2023-06-01 14:17:08 +00:00
Lucas de Ataides
c64a8f36a4 Add joined query to the InterfaceDataNetworks model
When performing actions on STX-Openstack (apply, re-apply), sysinv logs
get filled with a "DetachedInstanceError" message. This was already
found previously and fixed by [1], but it looks like that fix doesn't
cover the STX-Openstack application: when the app is applied (or
re-applied), Neutron needs to fetch some information about the
interfaces' data networks, which throws this DetachedInstanceError
message.

Checking the SQLAlchemy documentation [2], it looks like lazy load can
be allowed via a relationship parameter on the model, which fits this
case since the VirtualInterfaces (Interface) model needs to get
information of the InterfaceDataNetwork model.

This was also previously done for the InterfaceNetworks model on [3]

This change allows the lazy load of the InterfaceDataNetwork model so
no error messages are logged to sysinv.

[1] https://review.opendev.org/c/starlingx/config/+/826340
[2] https://docs.sqlalchemy.org/en/14/orm/loading_relationships.html
[3] https://review.opendev.org/c/starlingx/config/+/657645

Test Plan:
PASS - Build sysinv package
PASS - Build ISO with new sysinv package
PASS - Install and bootstrap an AIO-SX machine
PASS - Perform a lock/unlock on an AIO-SX machine
PASS - Apply STX-Openstack app
PASS - Re-apply STX-Openstack app
PASS - Visual inspection of sysinv.log shows nothing unusual

Closes-Bug: 1998512

Change-Id: Ic5b168e1a01dc53aa3f8658547c1f4776e681cdc
Signed-off-by: Lucas de Ataides <lucas.deataidesbarreto@windriver.com>
2023-06-01 09:53:04 -03:00
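The relationship-level change can be sketched with SQLAlchemy directly (simplified, hypothetical models; the real ones live in sysinv's DB layer): lazy="joined" eager-loads the related rows in the same SELECT, so accessing them after the session is closed no longer raises DetachedInstanceError.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

class Interface(Base):
    __tablename__ = "interfaces"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    # lazy="joined" loads the related rows in the same SELECT, so the
    # attribute stays usable after the session is gone.
    datanetworks = relationship("InterfaceDataNetwork", lazy="joined")

class InterfaceDataNetwork(Base):
    __tablename__ = "interface_datanetworks"
    id = Column(Integer, primary_key=True)
    interface_id = Column(Integer, ForeignKey("interfaces.id"))
    datanetwork = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
with Session(engine) as session:
    session.add(Interface(
        name="data0",
        datanetworks=[InterfaceDataNetwork(datanetwork="dn0")]))
    session.commit()
with Session(engine) as session:
    iface = session.query(Interface).first()
# The session above is closed; the joined-eager data is still attached.
print(iface.name, iface.datanetworks[0].datanetwork)
```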
Rahul Roshan Kachchap
6d3cfd86ee Add system health check for psp policy
As part of the removal of PSP support in the k8s 1.25/1.26 transition,
this adds a check to system health-query that there are
NO PSP policies present in the cluster. With the release of
Kubernetes v1.25, PodSecurityPolicy has been removed.
More information about the removal of PodSecurityPolicy is in the
Kubernetes blog post here:
https://kubernetes.io/blog/2021/04/06/podsecuritypolicy-deprecation-past-present-and-future/
The check should FAIL the scenario and log the error output in the
sysinv log with the psp resource names existing in the cluster,
asking the user to remove them before the upgrade.

Test Plan
AIO-SX: Perform system health-query
PASS: Iso creation
PASS: bootstrap
PASS: RUN system health-query with no PSP policies
      Output:
      System Health:
      All hosts are provisioned: [OK]
      All hosts are unlocked/enabled: [OK]
      All hosts have current configurations: [OK]
      All hosts are patch current: [OK]
      No alarms: [OK]
      All kubernetes nodes are ready: [OK]
      All kubernetes control plane pods are ready: [OK]
      All PodSecurityPolicies are removed: [OK]
PASS: RUN system health-query with existing PSP policies
      Output:
      System Health:
      All hosts are provisioned: [OK]
      All hosts are unlocked/enabled: [OK]
      All hosts have current configurations: [OK]
      All hosts are patch current: [OK]
      No alarms: [OK]
      All kubernetes nodes are ready: [OK]
      All kubernetes control plane pods are ready: [OK]
      All PodSecurityPolicies are removed: [Fail]
      PSP policies exists, please remove them before
      upgrade: privileged, restricted
PASS: RUN system health-query-kube-upgrade with no PSP policies
PASS: RUN system health-query-kube-upgrade with existing PSP policies
      All hosts are provisioned: [OK]
      All hosts are unlocked/enabled: [OK]
      All hosts have current configurations: [OK]
      All hosts are patch current: [OK]
      No alarms: [OK]
      All kubernetes nodes are ready: [OK]
      All kubernetes control plane pods are ready: [OK]
      All PodSecurityPolicies are removed: [Fail]
      PSP policies exists, please remove them before
      upgrade: privileged, restricted
      All kubernetes applications are in a valid state: [OK]
PASS: RUN system health-query-upgrade with no PSP policies
PASS: RUN system health-query-upgrade with existing PSP policies
      All hosts are provisioned: [OK]
      All hosts are unlocked/enabled: [OK]
      All hosts have current configurations: [OK]
      All hosts are patch current: [OK]
      No alarms: [OK]
      All kubernetes nodes are ready: [OK]
      All kubernetes control plane pods are ready: [OK]
      All PodSecurityPolicies are removed: [Fail]
      PSP policies exists, please remove them before
      upgrade: privileged, restricted
      No imported load found. Unable to test further

Story: 2010590
Task: 48145

Change-Id: I3787bdce505c2d18f5312fc32e95c507d8916b3d
Signed-off-by: Rahul Roshan Kachchap <rahulroshan.kachchap@windriver.com>
2023-06-01 05:56:54 -04:00
Carmen Rata
c647ea70c3 Reduce sudo rules refresh time for openldap
New openldap users with sudo permission are not able to exercise
the sudo capabilities for a maximum of 15min since creation
because the sudo rules refresh interval has the default value
of 15min.
This commit reduces the sudo rules refresh interval to 5min to
improve usability. This change will only be done for local
openldap server.
The interval value was chosen to match the value of
"ldap_enumeration_refresh_timeout" attribute that specifies how
long SSSD has to wait before refreshing its cache of enumerated
records.
Due to performance considerations, it is not advisable to reduce the
sudo rules refresh time for WAD servers because of their large
number of users. Reducing the refresh interval can have negative
performance impact.
This commit also adjusts the sudo rules search criteria.

Test Plan:
PASS: Successful install in AIO-SX system configuration.
PASS: Create a new openldap user with sudo permissions and verify
that sudo capabilities are available for the user in maximum
300sec (5min).
PASS: Verify remote ssh connection for a new openldap user.
PASS: Verify SSSD sudo rules search works as expected for openldap
and WAD servers.

Story: 2010589
Task: 48164

Signed-off-by: Carmen Rata <carmen.rata@windriver.com>
Change-Id: Ieb8e23068b82e09c3feeec4c8317d32d45ff64e6
2023-05-31 23:40:13 +00:00
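In sssd.conf terms, the interval above maps to the sudo smart-refresh setting; a sketch under that assumption (domain name is an example):

```ini
# /etc/sssd/sssd.conf (fragment) - local openldap domain only; WAD
# domains keep the defaults for performance reasons.
[domain/local]
# Default smart refresh is 900s (15min); lower it to 300s (5min) so
# new sudo rules become usable sooner.
ldap_sudo_smart_refresh_interval = 300
```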
Zuul
bc8fb24c90 Merge "Add functionality for intel gpu device plugin" 2023-05-31 20:57:11 +00:00
Al Bailey
1b9cff5fd8 Enable sysinv no-value-for-parameter check in pylint
The recent PyYaml upversion issues would have been seen
in Zuul/Tox if this check had been enabled.

These changes have no impact on the runtime code.
This also disables the large import graph report at the end
of pylint executions; that report made it difficult to see pylint
errors.

Test Plan:
  Pass: tox

Story: 2010642
Task: 48161
Signed-off-by: Al Bailey <al.bailey@windriver.com>
Change-Id: Ib0c27b2dfebea0ef82345da9a4935b79f00daa5a
2023-05-31 19:43:19 +00:00
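The two tweaks roughly map to pylint rcfile settings like the following (a sketch; the exact knobs live in sysinv's tox/pylint configuration):

```ini
# .pylintrc (fragment)
[MESSAGES CONTROL]
# E1120 is no-value-for-parameter: calling a function without a value
# for a required parameter (what the PyYaml upversion broke).
enable=no-value-for-parameter

[REPORTS]
# Suppress the long end-of-run report (including the import graph)
# so actual errors stay visible.
reports=no
```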