config

Author	SHA1	Message	Date
Andre Kantek	4459b82f32	Dual-stack: ceph matches address name and family This change splits the IP service for each platform network into ipv4 and ipv6 to support dual-stack. It still supports single-stack (when there is only ipv4 or ipv6). Each service is instantiated if there is a configuration for it. Ceph was not taking into account the address family to generate the list of IPs using the primary pool. This led to incorrect puppet variable content. Test Plan: [PASS] install, lock, unlock and swact for the following setups - AIO-SX (IPv4 and IPv6) - AIO-DX (IPv4 and IPv6) - Standard (IPv4 and IPv6) - DC (SisCtrl=AIO-DX, subcloud=AIO-SX) [PASS] Add dual-stack configuration and validate services operation with lock, unlock and swact: - AIO-SX (IPv4 and IPv6) - AIO-DX (IPv4 and IPv6) - Standard (IPv4 and IPv6) - DC (SisCtrl=AIO-DX, subcloud=AIO-SX), using the admin network Story: 2011027 Task: 49763 Change-Id: Icda298c51cdd2535146b1e11669f1c6f64c232b7 Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>	2024-04-17 07:24:16 -03:00
Zuul	78b3fe851f	Merge "Create a set_users_options method in openstack endpoint config"	2024-04-15 18:32:23 +00:00
Zuul	2b50ac3e06	Merge "Add pxeboot network hostname resolution for controllers"	2024-04-15 14:52:38 +00:00
Erickson Silva de Oliveira	0c852b54fb	Fix condition for deleting database partition In change [1], a restore in progress check was added, however the flag used for this is removed at the end of the restore playbook that is executed on controller-0, causing possible problems if agents on other nodes send a report with incomplete information due to restore. To resolve this, the _verify_restore_in_progress() function was used, which queries the restore status in the database, and is only modified when executing the "system restore-complete" command. This way we will know that from then on the agent's reports can be considered. Additionally, the “system host-reinstall” command has also been observed to cause similar issues if run on a restored system. To prevent this from happening, another condition was added, which checks if inv_state is "reinstalling". [1]: https://review.opendev.org/c/starlingx/config/+/899510 Test-Plan: PASS: AIO-SX fresh install PASS: Standard fresh install PASS: create/modify/delete a partition in the controller-0/controller-1/compute-0 followed by a reboot and check the status with 'system host-disk-partition-list'. PASS: Restart of sysinv-conductor and/or sysinv-agent services during puppet manifest applying. PASS: AIO-SX Backup and Restore PASS: Standard Backup and Restore Closes-Bug: 2061170 Change-Id: I6c142439c9f13dcdeb493892a5a9283f6a1e2d00 Signed-off-by: Erickson Silva de Oliveira <Erickson.SilvadeOliveira@windriver.com>	2024-04-12 14:31:49 -03:00
Zuul	a1aa5b93fb	Merge "Add Intermediate CA support to IPsec configuration"	2024-04-12 14:39:31 +00:00
Zuul	019eeb5016	Merge "Fix IPsec certificates renewal script"	2024-04-12 14:31:29 +00:00
Zuul	82c95f934e	Merge "Make minimum Kubernetes version field mandatory"	2024-04-11 13:58:16 +00:00
Leonardo Mendes	2446746b41	Fix IPsec certificates renewal script This commit fixes the IPsec certificates renewal script, which is set up as a cron job to run daily at midnight. Due to a recent change, the name of the system-local-ca certificate was changed to system-local-ca-1 and the function that returns the time left until certificate expiration was not working properly. Test Plan: PASS: Change system date to simulate IPsec cert is about to expire, make sure all pods and services needed to run ipsec-client are working properly and run the script, verify IPsec cert, private key and trusted CA cert are renewed, and IKE SAs and CHILD SAs are re-established. PASS: Change the certificate /etc/swanctl/x509ca/system-local-ca-1.crt to simulate the IPsec trusted CA cert is different from the system-local-ca in k8s secret, run the script, verify the trusted CA and IPsec cert/key are renewed, and IKE SAs and CHILD SAs are re-established. Story: 2010940 Task: 49850 Change-Id: Iea88211221d55df763f3f86853d402fffcb58c68 Signed-off-by: Leonardo Mendes <Leonardo.MendesSantana@windriver.com>	2024-04-11 10:43:51 -03:00
Eric MacDonald	7e34c08e96	Add pxeboot network hostname resolution for controllers Worker and storage nodes currently support pxeboot-N hostname nslookup resolution because the pxeboot network lease address they obtain through DHCP from dnsmasq persists. However, this is not true for the controllers. Although they may initially dhcp for a pxeboot address, that address is overridden by their statically assigned pxeboot network address(es). Adding the controller pxeboot network hostnames and addresses to the dnsmasq.addn_hosts file yields proper pxeboot hostname resolution for controllers. From Controllers: [sysadmin@controller-0 ~$ nslookup pxeboot-2 Server: fdff:10:80:27::2 Address: fdff:10:80:27::2#53 Name: pxeboot-2 Address: 192.168.202.3 [sysadmin@controller-0 ~$ nslookup pxeboot-1 Server: fdff:10:80:27::2 Address: fdff:10:80:27::2#53 Name: pxeboot-1 Address: 192.168.202.2 sysadmin@controller-1:~$ nslookup pxeboot-1 Server: fdff:10:80:27::2 Address: fdff:10:80:27::2#53 Name: pxeboot-1 Address: 192.168.202.2 sysadmin@controller-1:~$ nslookup pxeboot-2 Server: fdff:10:80:27::2 Address: fdff:10:80:27::2#53 Name: pxeboot-2 Address: 192.168.202.3 From Worker: sysadmin@worker-0:~$ nslookup pxeboot-1 Server: 192.168.204.1 Address: 192.168.204.1#53 Name: pxeboot-1 Address: 169.254.202.2 Now all hosts in the system support pxeboot hostname nslookup. Also, this update adds an explicit call to _generate_dnsmasq_hosts_file to the conductor process restart. This handles the case where dnsmasq publishes a new lease to handle while the sysinv conductor is not running or being restarted. Test Plan: PASS: Verify build and install AIO DX Plus system. PASS: Verify format of new additions to dnsmasq.addn_hosts file. PASS: Verify nslookup using controller pxeboot hostnames from either controller or even a worker node. PASS: Verify no new pep8 warnings or errors are added to the conductor manager.py. Story: 2010940 Task: 49829 Change-Id: Ibacdaadd24cf8c73fec98167d4a79fece341b1e6 Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>	2024-04-10 17:16:31 +00:00
Raphael Lima	c7f3d71c5f	Create a set_users_options method in openstack endpoint config This commit creates the set_users_options in openstack_config_endpoints.py, which is required in [1] in order to set the ignore_lockout_failure_attempts for both sysinv and admin users during the sysinv bootstrap process. [1]: https://review.opendev.org/c/starlingx/ansible-playbooks/+/913930 Test plan: Note that all of the test cases were performed with the changes from [1]. 1. PASS: Verify the openstack user, role, service and endpoints configuration for sysinv after bootstrap 2. PASS: Verify that both the admin and sysinv openstack users contain the ignore_lockout_failure_attempts option set to True. Story: 2011035 Task: 49844 Change-Id: I9c11e7305602d24f8170759f5f9363e4a6d012a4 Signed-off-by: Raphael Lima <Raphael.Lima@windriver.com>	2024-04-10 10:01:14 -03:00
Leonardo Mendes	49df34a4f4	Add Intermediate CA support to IPsec configuration The current implementation of IPsec configuration by IPsec server/client supports Root CA only. This commit adds support for Intermediate CA. Now, the IPsec Auth Server sends both certificates to the IPsec Auth client to store. If it's a self-signed certificate, the same certificate is sent as Root CA. Test plan: PASS: In a DX system with available enabled active status with IPsec server being executed from controller-0 and a self-signed CA installed. Run "ipsec-client pxecontroller --opcode 1" in controller-1. Observe that 4 CA certificates are created, but they are the same certificate. Observe that a security association is established between the hosts via "swanctl --list-sas" command. PASS: In a DX system with available enabled active status with IPsec server being executed from controller-0 and a self-signed CA installed. Run "ipsec-client pxecontroller --opcode 2" in controller-1. Observe the previously created CertificateRequest was deleted and a new one was generated for controller-1's node. The new certificate is sent to IPsec Client with Root and Intermediate CA, which is the same, to be stored and the swanctl rekey command executed successfully. PASS: In a DX system with available enabled active status with IPsec server being executed from controller-0 and an intermediate CA installed. Run "ipsec-client pxecontroller --opcode 1" in worker-0. Observe that 4 CA certificates are created, including Root and Intermediate CA. Observe that a security association is established between the hosts via "swanctl --list-sas" command. PASS: In a DX system with available enabled active status with IPsec server being executed from controller-0 and an Intermediate CA installed. Run "ipsec-client pxecontroller --opcode 2" in worker-0. Observe the previously created CertificateRequest was deleted and a new one was generated for worker-0's node. The new certificate is sent to IPsec Client with Root and Intermediate CA to be stored and the swanctl rekey command executed successfully. PASS: In a DX system, simulate the IPsec cert is about to expire, run the script, verify IPsec cert, private key and trusted CA cert are renewed. Story: 2010940 Task: 49825 Change-Id: I25c973350c4f460233a4e6e5ddda8366b948d120 Signed-off-by: Leonardo Mendes <Leonardo.MendesSantana@windriver.com>	2024-04-09 16:01:53 -03:00
Zuul	7299fa6118	Merge "Fix usage of address_get_by_name"	2024-04-05 22:24:24 +00:00
Steven Webster	92f00f80fa	Fix usage of address_get_by_name Recent commit https://opendev.org/starlingx/config/commit/634d4916 introducing changes for dual-stack networking made a change to the DB api's address_get_by_name to return a list of IPv4 and IPv6 addresses rather than a singular address. As such, the list can be empty if there are no addresses associated with a particular name, rather than throwing an AddressNotFoundByName exception. Currently, the interface_network code depends on the AddressNotFoundByName exception to determine whether a new address needs to be allocated for a dynamic network. This can cause an issue for worker and storage nodes when one of their interfaces is associated with certain networks (such as the storage network). The symptom of this may be an interface which is 'DOWN' after unlock, as its interface configuration file is marked for a 'static' address, with no address present (because it wasn't allocated). This commit fixes the issue by simply checking whether the list returned by address_get_by_name is empty. Test Plan: - Fresh install of a Standard system. - Ensure named addresses are present in the DB for all nodes (mgmt, cluster-host, oam for controllers) - Create a new address pool and storage network and assign it to a worker node interface. - Unlock the worker node and ensure the address is present on the interface and it is in 'UP' state. Story: 2011027 Task: 49627 Change-Id: I9763f7c71797d9b321e7bf9e1b6db759378af632 Signed-off-by: Steven Webster <steven.webster@windriver.com>	2024-04-05 10:56:47 -04:00
Igor Soares	3773c65f61	Make minimum Kubernetes version field mandatory Make the supported_k8s_version:minimum metadata field mandatory for StarlingX applications. The minimum supported Kubernetes version must be specified in the application metadata.yaml file. For instance: supported_k8s_version: minimum: 1.24.4 Existing applications were previously updated to include the mandatory field as part of story 2010929. Test plan: PASS: build-pkgs -a && build-image PASS: AIO-SX fresh install PASS: Attempt to upload a modified version of platform-integ-apps without the supported_k8s_version section. Confirm that the upload failed. PASS: Attempt to upload a modified version of platform-integ-apps with the supported_k8s_version section but containing only the maximum supported Kubernetes version. Confirm that the upload failed. PASS: Upload/apply/update/remove/delete a working version of platform-integ-apps. Story: 2010929 Task: 49538 Change-Id: I10160dfcfcc82eb8978b96c87e356db7b6cd227a Signed-off-by: Igor Soares <Igor.PiresSoares@windriver.com>	2024-04-05 11:11:05 -03:00
Zuul	8ea80c4b27	Merge "Filter cert-mon for geo-redundancy in audit and DC_CertWatcher"	2024-04-04 21:54:21 +00:00
Kyle MacLeod	03443ef16c	Filter cert-mon for geo-redundancy in audit and DC_CertWatcher This commit adds a filter for querying all subclouds from dcmanager, to account for secondary subclouds that should not be audited by cert-mon for this system controller. The filter is performed against a list of invalid deploy states that should be considered when querying the list of subcloud from dcmanager. Likewise, the DC_CertWatcher -> DCIntermediateCertRenew flow must ensure that subclouds which are secondary to this system controller are ignored by the kubernetes watch in place for the DC intermediate cert renewal detection. Subclouds are filtered by the watch based on their online state and their deploy-status. A subcloud with invalid deploy state is ignored by this system controller. Test Cases PASS: - Trigger audits on service restart. Verify that offline/secondary subclouds are excluded. - Ensure full daily audit is executed. Verify that all subclouds belonging to this system controller are audited. Secondary subclouds are not audited. - Verify that DC_CertWatcher -> DCIntermediateCertRenew watch fires are ignored for offline and/or invalid deploy state Closes-Bug: 2060068 Change-Id: Iffe3d7c76db8d2f17aed0bfebc792af0f9d75ca2 Signed-off-by: Kyle MacLeod <kyle.macleod@windriver.com>	2024-04-04 15:36:06 -04:00
Rei Oliveira	5d853423ef	Wrap 'classes' parameter as a list in config_dict object This change fixes a type mismatch bug introduced in [1]. A python list is expected but a python str is provided instead. [1] https://review.opendev.org/c/starlingx/config/+/893566 This type mismatch will result in the 'deadlock' prevention logic to never be invoked. In [2] below, the 'if classes' branch is never entered: [2] `85a548ffcc/sysinv/sysinv/sysinv/sysinv/conductor/manager.py (L13481)` Test plan: PASS: Run 'sudo chage -M 999 sysadmin; sudo chage -M 888 sysadmin; sudo chage -M 777 sysadmin'. Notice 'out of config alarm' in 'fm alarm-list'. Verify that it clears up after about 5 min. PASS: Verify in i_user db table and /etc/shadow that it correctly contains the last password age, 777 in this case. Note: In a managed subcloud, the value in /etc/shadow file will be changed again in about 20 min to sync with the sysadmin password and age in the system controller. Closes-Bug: 2034446 Signed-off-by: Rei Oliveira <Reinildes.JoseMateusOliveira@windriver.com> Change-Id: I24d9807e9eb2d94e026be7b8f3448a6cd42fcdd6	2024-04-04 14:45:03 -03:00
Zuul	eff4e08e44	Merge "Prevent multiple datanetworks to same interface"	2024-04-03 16:26:02 +00:00
Caio Bruchert	f6158f5b02	Prevent multiple datanetworks to same interface Since sriov-network-device-plugin upgrade to v3.5.1, assigning multiple datanetworks to the same interface is not possible anymore. This change restricts the system interface-datanetwork-assign command to prevent that from happening. Test Plan: PASS: assign datanetwork1 to sriov0 interface: ok PASS: assign datanetwork2 to same sriov0 interface: fails PASS: create new vf0 interface on top of sriov0: ok PASS: assign datanetwork1 to vf0: ok PASS: assign datanetwork2 to vf0: fails PASS: create new vf1 interface on top of sriov0: ok PASS: assign datanetwork2 to vf1: ok PASS: assign datanetwork1 to vf1: fails Closes-Bug: 2059960 Change-Id: If3ab95594917089f01475f9595c9059edeae85f5 Signed-off-by: Caio Bruchert <caio.bruchert@windriver.com>	2024-04-02 17:25:41 -03:00
Zuul	e4b32b7e16	Merge "Fix charts upload when there are existing ones"	2024-04-02 16:57:39 +00:00
Igor Soares	b1b160f48b	Fix charts upload when there are existing ones This fixes a bug that prevents StarlingX application charts from being uploaded to the helm repository when one or more of them have been uploaded before. The charts upload logic was changed to check if all charts provided by the given application are valid prior to uploading. If a chart is invalid then no charts for that application will be uploaded, since the upload process cannot proceed in that scenario. Test Plan: PASS: build-pkgs -a && build-image PASS: AIO-SX fresh install PASS: Build a platform-integ-apps version containing one existing chart and two nonexistent charts in the local Helm repository. Update platform-integ-apps to the built version. Confirm that the existing chart was not re-uploaded and that the nonexistent ones were correctly uploaded to the Helm repository. PASS: Apply/remove/delete platform-integ-apps Closes-Bug: 2053074 Depends-on: https://review.opendev.org/c/starlingx/integ/+/912305 Change-Id: I155d457f58be1986cc6f25178929aedfbe1d0693 Signed-off-by: Igor Soares <Igor.PiresSoares@windriver.com>	2024-04-02 12:05:28 -03:00
Zuul	2cbdc83b04	Merge "Expose Kubernetes ApiextensionsV1Api"	2024-04-01 19:16:12 +00:00
Zuul	25d58ebcf8	Merge "First check Root CAs on kube-cert-rotation.sh"	2024-03-29 00:06:34 +00:00
Rei Oliveira	01a5ea0843	First check Root CAs on kube-cert-rotation.sh As of now, the script only verifies the validity of leaf certificates and, if expired, will regenerate them based on K8s/etcd Root CAs. It doesn't account for the possibility of Root CAs being expired. It will generate leaf certificates based on Root CAs, even if said Root CAs are expired. This change fixes that behaviour by first checking validity of Root CAs and only allowing leaf certificate renewal if RCAs are valid. Test plan: PASS: Cause Root CAs to expire, run kube-cert-rotation.sh script and verify that it fails with an error saying Root CAs are expired and leaf certificates are not renewed. PASS: Ensure to have valid Root CAs, cause leaf certificates to expire, run kube-cert-rotation.sh and verify that the script executes normally and is able to renew the leaf certificates. Closes-Bug: 2059708 Signed-off-by: Rei Oliveira <Reinildes.JoseMateusOliveira@windriver.com> Change-Id: I98dfd8d1417754f3c723d8ddd52a856785ffc83b	2024-03-28 14:28:34 -03:00
Zuul	de9d380dc9	Merge "Update swanctl.conf cacerts w/ system-local-ca files"	2024-03-28 15:10:34 +00:00
Manoel Benedito Neto	abef79e45f	Update swanctl.conf cacerts w/ system-local-ca files This commit introduces a new configuration for swanctl.conf file where cacerts references two system-local-ca files. The two files represent the last (system-local-ca-0.crt) and the current (system-local-ca-1.crt) certificates associated with system-local-ca. The main goal of this implementation is to maintain SAs in all nodes during the update of system-local-ca certificate. Test plan: PASS: In a DX system with available enabled active status with IPsec server being executed from controller-0. Run "ipsec-client pxecontroller --opcode 1" in worker-0. Observe that certificates, keys and swanctl.conf files are created in worker-0 node. Observe that a security association is established between the hosts via "sudo swanctl --list-sas" command. PASS: In a DX system with available enabled active status with IPsec server being executed from controller-0. Run "ipsec-client pxecontroller --opcode 2" in controller-1. Observe the previously created CertificateRequest was deleted and a new one was generated for controller-1's node. The new certificate is sent to IPsec Client and stored with the swanctl rekey command executed successfully. Story: 2010940 Task: 49777 Change-Id: I638932a602ed9423d20ed448e5aada499ef65d77 Signed-off-by: Manoel Benedito Neto <Manoel.BeneditoNeto@windriver.com>	2024-03-28 13:40:10 +00:00
Zuul	160aed20ba	Merge "Handle FM user during endpoint config"	2024-03-26 18:01:25 +00:00
Zuul	5c1569362b	Merge "Report port and device inventory after the worker manifest"	2024-03-26 16:15:05 +00:00
Tara Subedi	933d3a3a73	Report port and device inventory after the worker manifest This is an incremental fix for bug 2053149. Upon network boot (first boot) of a worker node, the agent manager is supposed to report ports/devices without waiting for the worker manifest, as that would never run on first boot. Without this, after system restore, it will be unable to unlock the compute node due to the sriov config update. Kickstart records first boot as "/etc/platform/.first_boot". The agent manager deletes this file. If the agent manager crashes, it will start again. This time, the agent manager doesn't see the .first_boot file, doesn't know this is still the first boot, and won't report inventory for the worker node. This commit fixes this issue by creating the volatile file "/var/run/.first_boot" before deleting "/etc/platform/.first_boot", and the agent relies on both files to figure out whether it is first boot or not. This keeps the same logic across multiple crashes/restarts of the agent manager. TEST PLAN: PASS: AIO-DX bootstrap has no issues. lock/unlock has no issues. PASS: Network-boot worker node, before doing unlock, restart agent manager (sysinv-agent), check sysinv.log to see ports are reported. Closes-Bug: 2053149 Change-Id: Iace5576575388a6ed3403590dbeec545c25fc0e0 Signed-off-by: Tara Nath Subedi <tara.subedi@windriver.com>	2024-03-26 10:37:56 -04:00
Zuul	85a548ffcc	Merge "Correct Kubernetes control-plane upgrade robustness skip_update_config"	2024-03-25 20:20:03 +00:00
Zuul	839b9b554d	Merge "Add IPsec certificates renewal cron job"	2024-03-25 15:07:05 +00:00
Jim Gauld	4522150c87	Correct Kubernetes control-plane upgrade robustness skip_update_config This removes the skip_update_config parameter from the _config_apply_runtime_manifest() call when upgrading Kubernetes control-plane. This parameter was unintentionally set to True, so this configuration step did not persist. This caused generation of 250.001 config-out-of-date alarms during kube upgrade. The review that introduced the bug: https://review.opendev.org/c/starlingx/config/+/911100 TEST PLAN: - watch /var/log/nfv-vim.log for each orchestrated upgrade PASS: orchestrated k8s upgrade (no faults) - AIO-SX, AIO-DX, Standard PASS: orchestrated k8s upgrade, with fault insertion during control-plane upgrade first attempt - AIO-SX - AIO-DX (both controller-0, controller-1) - Standard (both controller-0, controller-1) PASS: orchestrated k8s upgrade, with fault insertion during control-plane upgrade first and second attempt, trigger abort - AIO-SX - AIO-DX (first controller) Closes-Bug: 2056326 Change-Id: I629c8133312faa5c95d06960b15d3e516e48e4cb Signed-off-by: Jim Gauld <James.Gauld@windriver.com>	2024-03-23 19:56:04 -04:00
Zuul	6775a04444	Merge "Fix runtime_config_get method to avoid type error"	2024-03-22 19:59:19 +00:00
Zuul	c5b40d42b6	Merge "Prune stale backup in progress alarm 210.001"	2024-03-22 19:38:31 +00:00
rummadis	a3a20fcf59	Prune stale backup in progress alarm 210.001 A user is unable to take a subcloud backup when there is a stale backup in progress alarm. Example: when a user tries to take a subcloud backup in a Distributed Cloud environment and a stale 210.001 alarm is present in the subcloud, the user cannot trigger the subsequent subcloud backup. This fix identifies the 210.001 alarms and clears them if they have been pending for more than 1 hour. TEST PLAN: PASS: DC-libvirt setup with 2 controllers and 2 subclouds PASS: verified stale 210.001 getting removed Closes-Bug: 2058516 Change-Id: Iedcc5e41cd4245c538d331d9aa8c2b6cc445acce Signed-off-by: rummadis <ramu.ummadishetty@windriver.com>	2024-03-22 14:44:47 -04:00
Gustavo Pereira	b356e7ac5a	Add mtce to endpoint reconfiguration script Add mtce user to endpoint reconfiguration script to improve bootstrap execution time. The related puppet class and tasks will be removed in commit: https://review.opendev.org/c/starlingx/stx-puppet/+/912319. Test Plan: PASS: Deploy a subcloud without the changes and record its bootstrap execution time. Deploy another subcloud with the proposed changes. Verify successful subcloud deployment and the bootstrap execution time is 80s faster. PASS: Verify a successful AIO-SX deployment. PASS: Verify a successful AIO-DX controller deployment. PASS: Verify a successful DC environment deployment. Story: 2011035 Task: 49695 Change-Id: I2075026bd378ef3b30978a6d420fbb2253ba290c Signed-off-by: Gustavo Pereira <gustavo.lyrapereira@windriver.com>	2024-03-22 14:48:15 -03:00
Heitor Matsui	fd5d603d86	Fix runtime_config_get method to avoid type error An issue was found when config_applied for a host assumed the default value, which is the string "install" (refer to [1]), returning a type error in runtime_config_get trying to compare string "install" with a column "id" with type int. This commit fixes runtime_config_get method by inverting the logic: if the id passed is an int then compare with id, if it is not then assume it is a string and compare with config_uuid column. [1] `15aefdc468/sysinv/sysinv/sysinv/sysinv/agent/manager.py (L116)` Test Plan PASS: set config_applied="install" for a host, force inventory report and observe no more database errors on sysinv.log PASS: install/bootstrap/unlock AIO-DX Story: 2010676 Task: 49745 Signed-off-by: Heitor Matsui <heitorvieira.matsui@windriver.com> Change-Id: I9c687a1eb67c62291f1d2aa9cef1d6fbe993d0fa	2024-03-21 17:12:17 -03:00
Zuul	1573412c4d	Merge "Modify Host Personality for attribute max_cpu_mhz_configured"	2024-03-21 18:48:43 +00:00
Zuul	a1211d16d4	Merge "Handle Barbican user during endpoint config"	2024-03-21 17:04:07 +00:00
Guilherme Santos	9a564e455e	Expose Kubernetes ApiextensionsV1Api This commit exposes the Kubernetes API extensions in order to allow StarlingX Applications to manage CRDs using API calls. Test Plan: PASS: AIO-SX host-lock and host-unlock run successfully. PASS: Exposed resources have been called from an Application and run accordingly. Story: 2011069 Task: 49756 Change-Id: I7d04d3e779dae9ebf95403c8afb93fe6d048993b Signed-off-by: Guilherme Santos <guilherme.santos@windriver.com>	2024-03-21 10:34:58 +00:00
Poornima Y N	7fc11de9ee	Modify Host Personality for attribute max_cpu_mhz_configured Max_cpu_mhz_personality is the attribute of the host which can be configured in a host where turbo freq is enabled. In the case of a host whose role is both controller and worker, the personality for the attribute did not account for that scenario. Made the changes in the sysinv conductor to update the host personalities based on the function that the node operates, which handles the scenario when the host acts as both controller and worker node. TEST PLAN: PASS: Build and deploy ISO on Simplex PASS: Check whether the max cpu freq is set on a simplex. Below are the commands: system host-show <host_id> | grep is_max_cpu_configurable system service-parameter-list --name cpu_max_freq_min_percentage system service-parameter-modify platform config cpu_max_freq_min_percentage=<> system host-update <host_id> max_cpu_mhz_configured=<value in mhz> After the above commands, check whether the cpu is set using the command below: sudo turbostat Closes-Bug: 2058476 Change-Id: I08a5d1400834afca6a0eeaaa8813ac8d71a9db15 Signed-off-by: Poornima Y N <Poornima.Y.N@windriver.com>	2024-03-21 04:55:02 -04:00
Salman Rana	bdac091e77	Handle FM user during endpoint config Add FM user to endpoint reconfiguration script, following the migration of FM bootstrap from puppet to Ansible: https://review.opendev.org/c/starlingx/ansible-playbooks/+/913251 Openstack related operations (user, service and endpoint configuration) are now handled exclusively by sysinv config_endpoints Test Plan: 1. PASS: Verify full DC system deployment - System Controller + 3 Subclouds install/bootstrap (virtual lab) 2. PASS: Verify Openstack FM user created 3. PASS: Verify Admin role for the FM user set in the services project 4. PASS: Verify Openstack FM service created 5. PASS: Verify admin, internal and public endpoints configured for FM Story: 2011035 Task: 49722 Change-Id: I7d2f1596595ec2613cd5de1ca3d99427ea32d52d Signed-off-by: Salman Rana <salman.rana@windriver.com>	2024-03-20 14:24:59 +00:00
Zuul	15aefdc468	Merge "Add retry robustness for Kubernetes upgrade control plane"	2024-03-19 21:23:41 +00:00
Zuul	c4b7c51ffb	Merge "Update IPsec IKE daemon log config"	2024-03-19 18:44:01 +00:00
Saba Touheed Mujawar	4c42927040	Add retry robustness for Kubernetes upgrade control plane In rare cases, an intermittent failure occurs during the upgrading control plane step, where puppet hits its timeout before the upgrade is completed or kubeadm hits its own Upgrade Manifest timeout (at 5m). This change retries the process by reporting failure to the conductor when the puppet manifest apply fails. Since it is using RPC to send messages with options, we don't get the return code directly and hence cannot use a retry decorator. So we use the sysinv report callback feature to handle the success/failure path. TEST PLAN: PASS: Perform simplex and duplex k8s upgrade successfully. PASS: Install iso successfully. PASS: Manually send STOP signal to pause the process so that the puppet manifest times out and check whether the retry code works and the upgrade completes in the retry attempts. PASS: Manually decrease the puppet timeout to a very low number and verify that the code retries 2 times and updates the failure state PASS: Perform orchestrated k8s upgrade, manually send STOP signal to pause the kubeadm process during step upgrading-first-master and perform system kube-upgrade-abort. Verify that the upgrade aborted successfully and also verify that the code does not try the retry mechanism for k8s upgrade control-plane as it is not in the desired KUBE_UPGRADING_FIRST_MASTER or KUBE_UPGRADING_SECOND_MASTER state PASS: Perform manual k8s upgrade, for k8s upgrade control-plane failure perform manual upgrade-abort successfully. Perform orchestrated k8s upgrade, for k8s upgrade control-plane failure after retries nfv aborts automatically. Closes-Bug: 2056326 Depends-on: https://review.opendev.org/c/starlingx/nfv/+/912806 https://review.opendev.org/c/starlingx/stx-puppet/+/911945 https://review.opendev.org/c/starlingx/integ/+/913422 Change-Id: I5dc3b87530be89d623b40da650b7ff04c69f1cc5 Signed-off-by: Saba Touheed Mujawar <sabatouheed.mujawar@windriver.com>	2024-03-19 08:49:36 -04:00
Zuul	2a072b65c5	Merge "Allow mgmt and admin network reconfig"	2024-03-19 11:47:32 +00:00
Zuul	78d3acbb5d	Merge "Addition of OTS Token activation procedure"	2024-03-18 22:06:07 +00:00
Fabiano Correa Mercer	2fb32cf88d	Allow mgmt and admin network reconfig This change allows the management and admin network reconfig at the same time in an AIO-DX subcloud. Currently, it is necessary to lock and unlock the controller in order to reconfigure the management network from AIO-SX. If the customer changes the management network first, the new mgmt network will be in the database but the changes will only be applied during the unlock / reboot of the system. The admin network changes, however, are applied at runtime. If the admin network is changed after the management network reconfig, the admin network reconfig will apply its changes to the system, and some of them will use the new mgmt network values before the system is updated with the new mgmt ip range. This causes a puppet error and the system will not be correctly configured. Tests done: IPv4 AIO-SX subcloud mgmt network reconfig IPv4 AIO-SX subcloud admin network reconfig IPv4 AIO-SX subcloud admin and mgmt network reconfig IPv4 AIO-SX subcloud mgmt and admin network reconfig Story: 2010722 Task: 49724 Change-Id: I113eab2618f34b305cb7c4ee9bb129597f3898bb Signed-off-by: Fabiano Correa Mercer <fabiano.correamercer@windriver.com>	2024-03-18 15:58:40 -03:00
Hugo Brito	2b07588a8e	Handle Barbican user during endpoint config Add Barbican user to endpoint reconfiguration script. Openstack related operations (user, service and endpoint configuration) are now handled exclusively by sysinv config_endpoints Test Plan: 1. PASS: Verify full DC system deployment - System Controller + 3 Subclouds install/bootstrap (virtual lab) 2. PASS: Verify Openstack Barbican user created 3. PASS: Verify Admin role for the Barbican user set in the services project 4. PASS: Verify Openstack Barbican service created 5. PASS: Verify admin, internal and public endpoints configured for Barbican Story: 2011035 Task: 49738 Change-Id: I8045cb12d3faa20147b0b84bc9e5ce6c2e0cddf2 Signed-off-by: Hugo Brito <hugo.brito@windriver.com>	2024-03-18 14:32:51 -03:00
Andy Ning	441097fd18	Update IPsec IKE daemon log config This change updated IPsec IKE daemon log (charon.log) configuration so more details are logged and in better format. Test Plan: PASS: Run ipsec-client to generate charon-log.conf and restart ipsec, verify charon logs capture new details and in the new expected format. Story: 2010940 Task: 49711 Change-Id: I0c2943ba60e1867dfcebddca175058b62dde4ad7 Signed-off-by: Andy Ning <andy.ning@windriver.com>	2024-03-15 11:59:12 -04:00
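
As a rough illustration of the lookup rule described in the "Fix runtime_config_get method to avoid type error" entry (fd5d603d86) in the log above, here is a minimal sketch. The in-memory table, its rows and the function body are hypothetical stand-ins and not the sysinv implementation; only the rule itself (an integer key is compared with the id column, anything else is treated as a string and compared with config_uuid) comes from the commit message.

```python
# Hypothetical in-memory stand-in for the runtime config table (not sysinv code).
RUNTIME_CONFIGS = [
    {"id": 1, "config_uuid": "aaaa-1111"},
    {"id": 2, "config_uuid": "bbbb-2222"},
]


def runtime_config_get(key):
    """Return the matching row, or None when nothing matches.

    Integer keys are compared with the integer "id" column; any other
    key is assumed to be a string and compared with "config_uuid".
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(key, int) and not isinstance(key, bool):
        column = "id"
    else:
        column = "config_uuid"
    for row in RUNTIME_CONFIGS:
        if row[column] == key:
            return row
    return None
```

With this ordering, a default value such as the string "install" is only ever compared with the string column, so it simply finds no row instead of being compared with an integer column.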

4083 Commits