config

Author	SHA1	Message	Date
Lucas Ratusznei Fonseca	fc8c161df6	Add dual stack support to the platform firewall This change updates the firewall configuration generation to take into account that a network can have more than one address pool associated to it. More tests were added to address dual stack setups. Test plan ========= Online setup tests ------------------ System: - AIO-DX - STANDARD (2 Controllers, 2 Storages, 1 Compute) Acceptance criteria: For all the platform interfaces, incoming ICMP, TCP and UDP traffic is allowed only for networks/ports that are configured in the associated address pools. [PASS] TC1 - Install IPv4, add IPv6 pools to the platform networks [PASS] TC2 - Install IPv6, add IPv4 pools to the platform networks Installation tests ------------------ Systems: AIO-SX, AIO-DX, STANDARD [PASS] TC3 - Regular installation on VirtualBox, IPv4 [PASS] TC4 - Regular installation on VirtualBox, IPv6 Related changes: - https://review.opendev.org/c/starlingx/stx-puppet/+/915509 - https://review.opendev.org/c/starlingx/ansible-playbooks/+/915510 Story: 2011027 Task: 49816 Depends-On: https://review.opendev.org/c/starlingx/config/+/914141 Change-Id: Id05a583e7fd806a6ea448ac5a521902b2c7e96e4 Signed-off-by: Lucas Ratusznei Fonseca <lucas.ratuszneifonseca@windriver.com>	2024-04-17 20:16:19 -03:00
Lucas Ratusznei Fonseca	ff3a5d2341	Update network interface puppet resource gen to support dual-stack This change updates the puppet resource generation logic for network interfaces to support dual-stack. Change summary ============== - Aliases / labels Previously, each alias was associated to a specific network. Now, since more than one address can be associated to the same network, the aliases are also associated to addresses. The label name is now :<network_id>-<address_id>. The network_id is 0 if there's no network associated with the alias, that's the case for the base interface config or for the cases where the address is not associated to a network. The address_id is 0 if there's no address associated with the alias, which is the case for the base config and for when there's no static address associated to the network, i.e. the method is DHCP. - Static addresses Previously, interfaces with more than one static address not associated with pools would be assigned just the first one. Now, an alias config is generated for each address. - CentOS compatibility All the code related to CentOS was removed. - Duplex-direct mode Duplex-direct systems must have DAD disabled for management and cluster-host interfaces. The disable DAD command is now generated only in the base interface config for all types of interfaces. - Address pool names The change assumes a new standard for address pool names, they will be formed by the old names with the suffixes '-ipv4' or '-ipv6'. For example: management-ipv4, management-ipv6. Since other systems that rely on the previous standard are not yet upgraded to dual-stack, the constant DUAL_STACK_COMPATIBILITY_MODE was introduced to control resource generation and validation logic in a way that assures compatibility. The constant and the conditionals will be removed once the other modules are updated. 
The conditionals were implemented more as a way to highlight which parts of the code are affected and make the changes easier in the future. - Tests / DB Base The base class for tests was updated to generate more consistent database states. Mixins for dual-stack cases were also created. - Tests / Interface Most of the test functions in the class InterfaceTestCase caused unnecessary updates to the database and the context. The class was split in two, the first one containing the tests that only need the basic database setup (controller, one interface associated with the mgmt network), and the other one for the tests that need different setups. A new fixture was created to test multiple system configs (IPv4, IPv6, dual-stack), which inspects in detail the generated hieradata. The tests associated with the InterfaceHostV6TestCase were moved to the new fixture, and new ones were introduced. Test plan ========= Online setup tests ------------------ System: STANDARD (2 Controllers, 2 Storages, 1 Worker) Stack setups: - Single stack IPv4 - Single stack IPv6 - Dual stack, primary IPv4 - Dual stack, primary IPv6 [PASS] TC1 - Online setup, regular ethernet mgmt0 (Ethernet) -> PXEBOOT, MGMT, CLUSTER_HOST [PASS] TC2 - Online setup, VLAN over ethernet pxe0 (Ethernet) -> PXEBOOT mgmt0 (VLAN over pxe0) -> MGMT, CLUSTER_HOST [PASS] TC3 - Online setup, bonding mgmt0 (Bond) -> PXEBOOT, MGMT, CLUSTER_HOST [PASS] TC4 - Online setup, VLAN over bonding pxe0 (Bond) -> PXEBOOT mgmt0 (VLAN over pxe0) -> MGMT, CLUSTER_HOST Installation tests ------------------ Systems: - AIO-SX - AIO-DX - Standard (2 Controllers, 2 Storages, 1 Worker) [PASS] TC5 - Regular installation on VirtualBox, IPv4 [PASS] TC6 - Regular installation on VirtualBox, IPv6 Data interface tests -------------------- System: AIO-DX Setup: data0 -> Ethernet, ipv4_mode=static, ipv6_mode=static data1 -> VLAN on top of data0, ipv4_mode=static, ipv6_mode=static For both interfaces, the following was performed: [PASS] TC7 - Add 
static IPv4 address [PASS] TC8 - Add static IPv6 address [PASS] TC9 - Add IPv4 route [PASS] TC10 - Add IPv6 route [PASS] TC11 - Remove IPv4 route [PASS] TC12 - Remove IPv6 route [PASS] TC13 - Remove static IPv4 address [PASS] TC14 - Remove static IPv6 address Story: 2011027 Task: 49815 Change-Id: Ib9603cbd444b21aefbcd417780a12c079f3d0b0f Signed-off-by: Lucas Ratusznei Fonseca <lucas.ratuszneifonseca@windriver.com>	2024-04-16 16:23:15 -03:00
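Note on ff3a5d2341: the alias label rule described in the message above (`:<network_id>-<address_id>`, with 0 standing in for a missing network or address) can be illustrated with a minimal sketch. The function names `alias_label` and `alias_name` are hypothetical and are not taken from the sysinv code; appending the label to the interface name is also an assumption made for illustration.

```python
def alias_label(network_id=None, address_id=None):
    """Build the ':<network_id>-<address_id>' label (hypothetical helper).

    Either id falls back to 0 when there is no associated network
    or no associated (static) address, as the commit message describes.
    """
    return ":{}-{}".format(network_id or 0, address_id or 0)


def alias_name(ifname, network_id=None, address_id=None):
    """Append the label to an interface name (assumed usage)."""
    return ifname + alias_label(network_id, address_id)
```

For example, `alias_name("mgmt0", 2, 5)` yields `mgmt0:2-5`, while a DHCP-configured network with no static address yields `mgmt0:2-0`.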
Zuul	e4b32b7e16	Merge "Fix charts upload when there are existing ones"	2024-04-02 16:57:39 +00:00
Igor Soares	b1b160f48b	Fix charts upload when there are existing ones This fixes a bug that prevents StarlingX application charts from being uploaded to the helm repository when one or more of them have been uploaded before. The charts upload logic was changed to check if all charts provided by the given application are valid prior to uploading. If a chart is invalid then no charts for that application will be uploaded, since the upload process cannot proceed in that scenario. Test Plan: PASS: build-pkgs -a && build-image PASS: AIO-SX fresh install PASS: Build a platform-integ-apps version containing one existing chart and two nonexistent charts in the local Helm repository. Update platform-integ-apps to the built version. Confirm that the existing chart was not re-uploaded and that the nonexistent ones were correctly uploaded to the Helm repository. PASS: Apply/remove/delete platform-integ-apps Closes-Bug: 2053074 Depends-on: https://review.opendev.org/c/starlingx/integ/+/912305 Change-Id: I155d457f58be1986cc6f25178929aedfbe1d0693 Signed-off-by: Igor Soares <Igor.PiresSoares@windriver.com>	2024-04-02 12:05:28 -03:00
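Note on b1b160f48b: the "validate all charts first, then upload only the missing ones" flow described above could look roughly like the sketch below. This is not the sysinv implementation; `upload_charts` and its callback parameters (`is_valid`, `already_uploaded`, `upload`) are hypothetical names used only to show the control flow.

```python
def upload_charts(charts, is_valid, already_uploaded, upload):
    """Validate every chart before uploading any of them (illustrative only).

    Returns the list of charts that were actually uploaded.
    """
    # If any chart is invalid, nothing is uploaded for this application.
    if any(not is_valid(chart) for chart in charts):
        return []

    uploaded = []
    for chart in charts:
        # Charts already present in the Helm repository are skipped
        # instead of aborting the whole upload.
        if already_uploaded(chart):
            continue
        upload(chart)
        uploaded.append(chart)
    return uploaded
```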
Zuul	2cbdc83b04	Merge "Expose Kubernetes ApiextensionsV1Api"	2024-04-01 19:16:12 +00:00
Zuul	25d58ebcf8	Merge "First check Root CAs on kube-cert-rotation.sh"	2024-03-29 00:06:34 +00:00
Rei Oliveira	01a5ea0843	First check Root CAs on kube-cert-rotation.sh As of now, the script only verifies the validity of leaf certificates and, if expired, will regenerate them based on K8s/etcd Root CAs. It doesn't account for the possibility of Root CAs being expired. It will generate leaf certificates based on Root CAs, even if said Root CAs are expired. This change fixes that behaviour by first checking validity of Root CAs and only allowing leaf certificate renewal if RCAs are valid. Test plan: PASS: Cause Root CAs to expire, run kube-cert-rotation.sh script and verify that it fails with an error saying Root CAs are expired and leaf certificates are not renewed. PASS: Ensure to have valid Root CAs, cause leaf certificates to expire, run kube-cert-rotation.sh and verify that the script executes normally and is able to renew the leaf certificates. Closes-Bug: 2059708 Signed-off-by: Rei Oliveira <Reinildes.JoseMateusOliveira@windriver.com> Change-Id: I98dfd8d1417754f3c723d8ddd52a856785ffc83b	2024-03-28 14:28:34 -03:00
Zuul	de9d380dc9	Merge "Update swanctl.conf cacerts w/ system-local-ca files"	2024-03-28 15:10:34 +00:00
Manoel Benedito Neto	abef79e45f	Update swanctl.conf cacerts w/ system-local-ca files This commit introduces a new configuration for swanctl.conf file where cacerts references two system-local-ca files. The two files represent the last (system-local-ca-0.crt) and the current (system-local-ca-1.crt) certificates associated with system-local-ca. The main goal of this implementation is to maintain SAs in all nodes during the update of system-local-ca certificate. Test plan: PASS: In a DX system with available enabled active status with IPsec server being executed from controller-0. Run "ipsec-client pxecontroller --opcode 1" in worker-0. Observe that certificates, keys and swanctl.conf files are created in worker-0 node. Observe that a security association is established between the hosts via "sudo swanctl --list-sas" command. PASS: In a DX system with available enabled active status with IPsec server being executed from controller-0. Run "ipsec-client pxecontroller --opcode 2" in controller-1. Observe the previously created CertificateRequest was deleted and generated a new one for controller-1's node. The new certificate is sent to IPsec Client and stored with the swanctl rekey command executed successfully. Story: 2010940 Task: 49777 Change-Id: I638932a602ed9423d20ed448e5aada499ef65d77 Signed-off-by: Manoel Benedito Neto <Manoel.BeneditoNeto@windriver.com>	2024-03-28 13:40:10 +00:00
Zuul	160aed20ba	Merge "Handle FM user during endpoint config"	2024-03-26 18:01:25 +00:00
Zuul	5c1569362b	Merge "Report port and device inventory after the worker manifest"	2024-03-26 16:15:05 +00:00
Tara Subedi	933d3a3a73	Report port and device inventory after the worker manifest This is an incremental fix for bug 2053149. Upon network boot (first boot) of a worker node, the agent manager is supposed to report ports/devices without waiting for the worker manifest, as that would never run on first boot. Without this, after a system restore, the compute node cannot be unlocked due to the sriov config update. Kickstart records first boot as "/etc/platform/.first_boot". The agent manager deletes this file. If the agent manager crashes, it starts again. This time, the agent manager does not see the .first_boot file, does not know this is still the first boot, and does not report inventory for the worker node. This commit fixes this issue by creating the volatile file "/var/run/.first_boot" before deleting "/etc/platform/.first_boot", and the agent relies on both files to figure out whether it is first boot or not. The same logic holds across multiple crashes/restarts of the agent manager. TEST PLAN: PASS: AIO-DX bootstrap has no issues. lock/unlock has no issues. PASS: Network-boot worker node, before doing unlock, restart agent manager (sysinv-agent), check sysinv.log to see ports are reported. Closes-Bug: 2053149 Change-Id: Iace5576575388a6ed3403590dbeec545c25fc0e0 Signed-off-by: Tara Nath Subedi <tara.subedi@windriver.com>	2024-03-26 10:37:56 -04:00
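Note on 933d3a3a73: the two-marker scheme described above can be sketched as follows. The two paths come from the commit message; the function names and the idea of passing the paths as parameters are assumptions made so the example is self-contained, and the real agent code may be organized differently.

```python
import os

# Paths quoted in the commit message.
PERSISTENT_MARKER = "/etc/platform/.first_boot"
VOLATILE_MARKER = "/var/run/.first_boot"


def is_first_boot(persistent=PERSISTENT_MARKER, volatile=VOLATILE_MARKER):
    """First boot is assumed if either marker file exists."""
    return os.path.exists(persistent) or os.path.exists(volatile)


def consume_first_boot_marker(persistent=PERSISTENT_MARKER,
                              volatile=VOLATILE_MARKER):
    """Create the volatile marker BEFORE deleting the persistent one.

    A restarted agent then still sees the volatile marker until the
    next reboot clears /var/run.
    """
    if os.path.exists(persistent):
        open(volatile, "w").close()
        os.remove(persistent)
```

The ordering matters: if the persistent file were deleted first and the agent crashed before creating the volatile one, the first-boot state would be lost.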
Zuul	85a548ffcc	Merge "Correct Kubernetes control-plane upgrade robustness skip_update_config"	2024-03-25 20:20:03 +00:00
Zuul	839b9b554d	Merge "Add IPsec certificates renewal cron job"	2024-03-25 15:07:05 +00:00
Jim Gauld	4522150c87	Correct Kubernetes control-plane upgrade robustness skip_update_config This removes the skip_update_config parameter from the _config_apply_runtime_manifest() call when upgrading Kubernetes control-plane. This parameter was unintentionally set to True, so this configuration step did not persist. This caused generation of 250.001 config-out-of-date alarms during kube upgrade. The review that introduced the bug: https://review.opendev.org/c/starlingx/config/+/911100 TEST PLAN: - watch /var/log/nfv-vim.log for each orchestrated upgrade PASS: orchestrated k8s upgrade (no faults) - AIO-SX, AIO-DX, Standard PASS: orchestrated k8s upgrade, with fault insertion during control-plane upgrade first attempt - AIO-SX - AIO-DX (both controller-0, controller-1) - Standard (both controller-0, controller-1) PASS: orchestrated k8s upgrade, with fault insertion during control-plane upgrade first and second attempt, trigger abort - AIO-SX - AIO-DX (first controller) Closes-Bug: 2056326 Change-Id: I629c8133312faa5c95d06960b15d3e516e48e4cb Signed-off-by: Jim Gauld <James.Gauld@windriver.com>	2024-03-23 19:56:04 -04:00
Zuul	6775a04444	Merge "Fix runtime_config_get method to avoid type error"	2024-03-22 19:59:19 +00:00
Zuul	c5b40d42b6	Merge "Prune stale backup in progress alarm 210.001"	2024-03-22 19:38:31 +00:00
rummadis	a3a20fcf59	Prune stale backup in progress alarm 210.001 Users are unable to take a subcloud backup when there is a stale backup in progress alarm. Example: when a user tries to take a subcloud backup in a Distributed Cloud environment and a stale 210.001 alarm is present in the subcloud, the user cannot trigger the subsequent subcloud backup. This fix identifies the 210.001 alarms and clears them if they have been pending for more than 1 hour. TEST PLAN: PASS: DC-libvirt setup with 2 controllers and 2 subclouds PASS: verified stale 210.001 getting removed Closes-Bug: 2058516 Change-Id: Iedcc5e41cd4245c538d331d9aa8c2b6cc445acce Signed-off-by: rummadis <ramu.ummadishetty@windriver.com>	2024-03-22 14:44:47 -04:00
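Note on a3a20fcf59: the staleness rule above (a 210.001 alarm pending for more than 1 hour is cleared) can be sketched as a simple filter. The alarm representation here, a dict with `alarm_id` and `timestamp` keys, is an assumption for illustration and is not the FM API's data model.

```python
from datetime import datetime, timedelta, timezone

BACKUP_IN_PROGRESS_ALARM_ID = "210.001"  # alarm id from the commit message
STALE_AFTER = timedelta(hours=1)         # threshold from the commit message


def stale_backup_alarms(alarms, now=None):
    """Return the backup-in-progress alarms older than the threshold.

    `alarms` is assumed to be a list of dicts with 'alarm_id' and a
    timezone-aware 'timestamp' (hypothetical shape).
    """
    now = now or datetime.now(timezone.utc)
    return [
        alarm for alarm in alarms
        if alarm["alarm_id"] == BACKUP_IN_PROGRESS_ALARM_ID
        and now - alarm["timestamp"] > STALE_AFTER
    ]
```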
Gustavo Pereira	b356e7ac5a	Add mtce to endpoint reconfiguration script Add mtce user to endpoint reconfiguration script to improve bootstrap execution time. The related puppet class and tasks will be removed in commit: https://review.opendev.org/c/starlingx/stx-puppet/+/912319. Test Plan: PASS: Deploy a subcloud without the changes and record its bootstrap execution time. Deploy another subcloud with the proposed changes. Verify successful subcloud deployment and the bootstrap execution time is 80s faster. PASS: Verify a successful AIO-SX deployment. PASS: Verify a successful AIO-DX controller deployment. PASS: Verify a successful DC environment deployment. Story: 2011035 Task: 49695 Change-Id: I2075026bd378ef3b30978a6d420fbb2253ba290c Signed-off-by: Gustavo Pereira <gustavo.lyrapereira@windriver.com>	2024-03-22 14:48:15 -03:00
Heitor Matsui	fd5d603d86	Fix runtime_config_get method to avoid type error An issue was found when config_applied for a host assumed the default value, which is the string "install" (refer to [1]), returning a type error in runtime_config_get trying to compare string "install" with a column "id" with type int. This commit fixes runtime_config_get method by inverting the logic: if the id passed is an int then compare with id, if it is not then assume it is a string and compare with config_uuid column. [1] `15aefdc468/sysinv/sysinv/sysinv/sysinv/agent/manager.py (L116)` Test Plan PASS: set config_applied="install" for a host, force inventory report and observe no more database errors on sysinv.log PASS: install/bootstrap/unlock AIO-DX Story: 2010676 Task: 49745 Signed-off-by: Heitor Matsui <heitorvieira.matsui@windriver.com> Change-Id: I9c687a1eb67c62291f1d2aa9cef1d6fbe993d0fa	2024-03-21 17:12:17 -03:00
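Note on fd5d603d86: the inverted lookup logic described above (an int is compared with the `id` column, anything else is treated as a string and compared with `config_uuid`) can be sketched without a database as a function that picks the column. The name `runtime_config_lookup` is hypothetical; the real method builds a database query rather than returning a tuple.

```python
def runtime_config_lookup(config_id):
    """Choose which column to compare against (illustrative only).

    Returns a (column_name, value) pair.
    """
    if isinstance(config_id, int):
        # Integer ids are compared with the integer 'id' column.
        return ("id", config_id)
    # Everything else, including the default string "install",
    # is compared with the 'config_uuid' column, so a string is
    # never compared with the integer column.
    return ("config_uuid", config_id)
```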
Zuul	1573412c4d	Merge "Modify Host Personality for attribute max_cpu_mhz_configured"	2024-03-21 18:48:43 +00:00
Zuul	a1211d16d4	Merge "Handle Barbican user during endpoint config"	2024-03-21 17:04:07 +00:00
Guilherme Santos	9a564e455e	Expose Kubernetes ApiextensionsV1Api This commit exposes the Kubernetes API extensions in order to allow StarlingX Applications to manage CRDs using API calls. Test Plan: PASS: AIO-SX host-lock and host-unlock run successfully. PASS: Exposed resources have been called from an Application and run accordingly. Story: 2011069 Task: 49756 Change-Id: I7d04d3e779dae9ebf95403c8afb93fe6d048993b Signed-off-by: Guilherme Santos <guilherme.santos@windriver.com>	2024-03-21 10:34:58 +00:00
Poornima Y N	7fc11de9ee	Modify Host Personality for attribute max_cpu_mhz_configured Max_cpu_mhz_personality is a host attribute that can be configured on hosts where turbo frequency is enabled. For a host whose role is both controller and worker, the personality handling for the attribute did not cover that scenario. The sysinv conductor was changed to update the host personalities based on the functions the node performs, which handles the scenario where the host acts as both controller and worker node. TEST PLAN: PASS: Build and deploy ISO on Simplex PASS: Check whether the max cpu freq set on a simplex Below are the commands: system host-show <host_id> | grep is_max_cpu_configurable system service-parameter-list --name cpu_max_freq_min_percentage system service-parameter-modify platform config cpu_max_freq_min_percentage=<> system host-update <host_id> max_cpu_mhz_configured=<value in mhz> After above commands check whether cpu is set using below command: sudo turbostat Closes-Bug: 2058476 Change-Id: I08a5d1400834afca6a0eeaaa8813ac8d71a9db15 Signed-off-by: Poornima Y N <Poornima.Y.N@windriver.com>	2024-03-21 04:55:02 -04:00
Salman Rana	bdac091e77	Handle FM user during endpoint config Add FM user to endpoint reconfiguration script, following the migration of FM bootstrap from puppet to Ansible: https://review.opendev.org/c/starlingx/ansible-playbooks/+/913251 Openstack related operations (user, service and endpoint configuration) are now handled exclusively by sysinv config_endpoints Test Plan: 1. PASS: Verify full DC system deployment - System Controller + 3 Subclouds install/bootstrap (virtual lab) 2. PASS: Verify Openstack FM user created 3. PASS: Verify Admin role for the FM user set in the services project 4. PASS: Verify Openstack FM service created 5. PASS: Verify admin, internal and public endpoints configured for FM Story: 2011035 Task: 49722 Change-Id: I7d2f1596595ec2613cd5de1ca3d99427ea32d52d Signed-off-by: Salman Rana <salman.rana@windriver.com>	2024-03-20 14:24:59 +00:00
Zuul	15aefdc468	Merge "Add retry robustness for Kubernetes upgrade control plane"	2024-03-19 21:23:41 +00:00
Zuul	c4b7c51ffb	Merge "Update IPsec IKE daemon log config"	2024-03-19 18:44:01 +00:00
Saba Touheed Mujawar	4c42927040	Add retry robustness for Kubernetes upgrade control plane In the case of a rare intermittent failure behaviour during the upgrading control plane step where puppet hits timeout first before the upgrade is completed or kubeadm hits its own Upgrade Manifest timeout (at 5m). This change will retry running the process by reporting failure to conductor when puppet manifest apply fails. Since it is using RPC to send messages with options, we don't get the return code directly and hence, cannot use a retry decorator. So we use the sysinv report callback feature to handle the success/failure path. TEST PLAN: PASS: Perform simplex and duplex k8s upgrade successfully. PASS: Install iso successfully. PASS: Manually send STOP signal to pause the process so that puppet manifest timeout and check whether retry code works and in retry attempts the upgrade completes. PASS: Manually decrease the puppet timeout to very low number and verify that code retries 2 times and updates failure state PASS: Perform orchestrated k8s upgrade, Manually send STOP signal to pause the kubeadm process during step upgrading-first-master and perform system kube-upgrade-abort. Verify that upgrade-aborted successfully and also verify that code does not try the retry mechanism for k8s upgrade control-plane as it is not in desired KUBE_UPGRADING_FIRST_MASTER or KUBE_UPGRADING_SECOND_MASTER state PASS: Perform manual k8s upgrade, for k8s upgrade control-plane failure perform manual upgrade-abort successfully. Perform Orchestrated k8s upgrade, for k8s upgrade control-plane failure after retries nfv aborts automatically. Closes-Bug: 2056326 Depends-on: https://review.opendev.org/c/starlingx/nfv/+/912806 https://review.opendev.org/c/starlingx/stx-puppet/+/911945 https://review.opendev.org/c/starlingx/integ/+/913422 Change-Id: I5dc3b87530be89d623b40da650b7ff04c69f1cc5 Signed-off-by: Saba Touheed Mujawar <sabatouheed.mujawar@windriver.com>	2024-03-19 08:49:36 -04:00
Zuul	2a072b65c5	Merge "Allow mgmt and admin network reconfig"	2024-03-19 11:47:32 +00:00
Zuul	78d3acbb5d	Merge "Addition of OTS Token activation procedure"	2024-03-18 22:06:07 +00:00
Fabiano Correa Mercer	2fb32cf88d	Allow mgmt and admin network reconfig This change allows the management and admin network reconfig at same time in an AIO-DX subcloud. Currently, it is necessary to lock and unlock the controller in order to reconfigure the management network from AIO-SX. If the customer changes the management network first, the new mgmt network will be in the database but the changes will just be applied during the unlock / reboot of the system. But the admin network changes are applied in runtime, if the admin network is changed after the management network reconfig, the admin will apply the changes on the system and some of them will apply the new mgmt network values before the system is updated with the new mgmt ip range, it will cause a puppet error and the system will not be correctly configured. Tests done: IPv4 AIO-SX subcloud mgmt network reconfig IPv4 AIO-SX subcloud admin network reconfig IPv4 AIO-SX subcloud admin and mgmt network reconfig IPv4 AIO-SX subcloud mgmt and admin network reconfig Story: 2010722 Task: 49724 Change-Id: I113eab2618f34b305cb7c4ee9bb129597f3898bb Signed-off-by: Fabiano Correa Mercer <fabiano.correamercer@windriver.com>	2024-03-18 15:58:40 -03:00
Hugo Brito	2b07588a8e	Handle Barbican user during endpoint config Add Barbican user to endpoint reconfiguration script. Openstack related operations (user, service and endpoint configuration) are now handled exclusively by sysinv config_endpoints Test Plan: 1. PASS: Verify full DC system deployment - System Controller + 3 Subclouds install/bootstrap (virtual lab) 2. PASS: Verify Openstack Barbican user created 3. PASS: Verify Admin role for the Barbican user set in the services project 4. PASS: Verify Openstack Barbican service created 5. PASS: Verify admin, internal and public endpoints configured for Barbican Story: 2011035 Task: 49738 Change-Id: I8045cb12d3faa20147b0b84bc9e5ce6c2e0cddf2 Signed-off-by: Hugo Brito <hugo.brito@windriver.com>	2024-03-18 14:32:51 -03:00
Andy Ning	441097fd18	Update IPsec IKE daemon log config This change updated IPsec IKE daemon log (charon.log) configuration so more details are logged and in better format. Test Plan: PASS: Run ipsec-client to generate charon-log.conf and restart ipsec, verify charon logs capture new details and in the new expected format. Story: 2010940 Task: 49711 Change-Id: I0c2943ba60e1867dfcebddca175058b62dde4ad7 Signed-off-by: Andy Ning <andy.ning@windriver.com>	2024-03-15 11:59:12 -04:00
Zuul	ae8bb0f4d5	Merge "Fix failed pods not being detected by rootca health check"	2024-03-14 19:48:59 +00:00
Victor Romano	d807f868d6	Fix failed pods not being detected by rootca health check On the health check prior to rootca update, there was a bug that prevented CrashLoopBackoff pods being detected as unhealthy. This is because the pods are in phase "Running", but the status of the container itself is "ready: false". This commit adds an additional check to "Running" pods so if any container inside it is not ready, the pod will be deemed unhealthy. Test plan: - PASS: Attempt to perform a rootca update with a pod in CrashloopBackoff state. Verify the update is not possible and the health check fails with the pod being shown as unhealthy in "system health-query-kube-upgrade --rootca" - PASS: Verify the rootca update is possible if no pods are in CrashloopBackoff state. Closes-Bug: 2057779 Change-Id: I115b6621df11516db2279fe6bc96452d27975c50 Signed-off-by: Victor Romano <victor.gluzromano@windriver.com>	2024-03-14 08:58:42 -03:00
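Note on d807f868d6: the extra readiness check described above can be sketched over plain dicts shaped like Kubernetes pod status JSON (`status.phase` and `status.containerStatuses[].ready`). The function names are hypothetical, and the sketch covers only the added check for "Running" pods, not the rest of the health query.

```python
def has_unready_container(pod):
    """True if any container status in the pod reports ready: false."""
    statuses = pod.get("status", {}).get("containerStatuses") or []
    return any(not cs.get("ready", False) for cs in statuses)


def is_unhealthy_running_pod(pod):
    """A pod in phase 'Running' is deemed unhealthy if a container is not ready.

    This is the case a CrashLoopBackOff pod falls into: its phase stays
    'Running' while the container itself is not ready.
    """
    phase = pod.get("status", {}).get("phase")
    return phase == "Running" and has_unready_container(pod)
```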
Manoel Benedito Neto	56e2d1e2cd	Addition of OTS Token activation procedure This commit adds an OTS Token activation procedure to IPsec server implementation. With this implementation, OTS Token is activated when PKI Auth response message is sent from IPsec server to IPsec client. The Token expiry time was increased to 7 seconds due to Kubernetes API dependability that may delay IPsec Auth procedure in a few seconds, affecting OTS Token validation criteria. Test plan: PASS: Full build, system install, bootstrap and unlock DX system w/ unlocked enabled available status. PASS: In a DC system with available enabled active status with IPsec server being executed from controller-0. Run "ipsec-client pxecontroller --opcode 1" in worker-0. Observe that certificates, keys and swanctl.conf files are created in worker-0 node. Observe that a security association is established between the hosts via "sudo swanctl --list-sas" command. PASS: In a DC system with available enabled active status with IPsec server being executed from controller-0. Run "ipsec-client pxecontroller --opcode 2" in controller-1. Observe the previously created CertificateRequest was deleted and generated a new one for controller-1's node. The new certificate is sent to IPsec Client and stored with the swanctl rekey command executed successfully. Story: 2010940 Task: 49712 Change-Id: I1c65edf14fd7ae3f47309b35048a805e0306038d Signed-off-by: Manoel Benedito Neto <Manoel.BeneditoNeto@windriver.com>	2024-03-13 18:32:13 -03:00
Zuul	b5344801fd	Merge "Fix LDAP issue for DC subcloud"	2024-03-13 20:18:24 +00:00
Steven Webster	f8d30588ad	Fix LDAP issue for DC subcloud This commit fixes an LDAP authentication issue seen on worker nodes of a subcloud after a rehoming procedure was performed. There are two main parts: 1. Since every host of a subcloud authenticates with the system controller, we need to reconfigure the LDAP URI across all nodes of the system when the system controller network changes (upon rehome). Currently, it is only being reconfigured on controller nodes. 2. Currently, the system uses an SNAT rule to allow worker/storage nodes to authenticate with the system controller when the admin network is in use. This is because the admin network only exists between controller nodes of a distributed cloud. The SNAT rule is needed to allow traffic from the (private) management network of the subcloud over the admin network to the system controller and back again. If the admin network is _not_ being used, worker/storage nodes of the subcloud can authenticate with the system controller, but routes must be installed on the worker/storage nodes to facilitate this. It becomes tricky to manage in certain circumstances of rehoming/network config. This traffic really should be treated in the same way as that of the admin network. This commit addresses the above by: 1. Reconfiguring the ldap_server config across all nodes upon system controller network changes. 2. Generalizing the current admin network nat implementation to handle the management network as well. Test Plan: IPv4, IPv6 distributed clouds 1. Rehome a subcloud to another system controller and back again (mgmt network) 2. Update the subcloud to use the admin network (mgmt -> admin) 3. Rehome the subcloud to another system controller and back again (admin network) 4. Update the subcloud to use the mgmt network (admin -> mgmt) After each of the numbered steps, the following were performed: a. Ensure the system controller could become managed, online, in-sync b. 
Ensure the iptables SNAT rules were installed or updated appropriately on the subcloud controller nodes. c. Log into a worker node of the subcloud and ensure sudo commands could be issued without LDAP timeout. d. Log into the worker node with LDAP USER X via console and verify the login succeeds In general, tcpdump was also used to ensure the SNAT translation was actually happening. Partial-Bug: #2056560 Change-Id: Ia675a4ff3a2cba93e4ef62b27dba91802811e097 Signed-off-by: Steven Webster <steven.webster@windriver.com>	2024-03-13 14:27:13 -04:00
Zuul	74437d5311	Merge "Revert "Modify Memory Field Names""	2024-03-13 14:50:37 +00:00
Andy Ning	3fbe5f1aa6	Add IPsec certificates renewal cron job This change added the IPsec certificates renewal script, and set it up as a cron job to run daily at midnight. Test Plan: PASS: After a DX system deployed, verify the script is in the correct directory with right permission, and is added in /var/spool/cron/crontabs/root PASS: Simulate the IPsec cert is about to expire, run the script, verify IPsec cert, private key and trusted CA cert are renewed, and IKE SAs and CHILD SAs are re-established. PASS: Simulate a failure condition (eg, ipsec-client return non zero), run the script, verify the IPsec renewal fails, and alarm 250.004 is raised. PASS: Run the script with IPsec cert not being about to expire, verify the script finish successfully and alarm 250.004 is cleared. PASS: Simulate the IPsec trusted CA cert is different from the system-local-ca in k8s secret, run the script, verify the trusted CA and IPsec cert/key are renewed, and IKE SAs and CHILD SAs are re-established. Story: 2010940 Task: 49705 Depends-On: https://review.opendev.org/c/starlingx/fault/+/912598 Change-Id: I69236399b59655dd67ac7b01c4472a4b7ab911e5 Signed-off-by: Andy Ning <andy.ning@windriver.com>	2024-03-13 10:46:24 -04:00
Zuul	1b9bc6ed76	Merge "Introduce Puppet variables for primary and secondary pool addresses."	2024-03-13 13:33:01 +00:00
Zuul	2a621c1bc5	Merge "Use correct hiera file for downgrade"	2024-03-12 16:55:10 +00:00
Andre Kantek	fcebab8ef3	Introduce Puppet variables for primary and secondary pool addresses. Details: This change extracts the addresses from both the primary and secondary address pools and makes them available for use in Puppet manifests. To accommodate the dual stack configuration, the address allocation for non-controller nodes was updated for both management and cluster-host networks. As the data migration functionality for the upgrade is still under development, a temporary solution was implemented. Logic was added to directly access the network's "pool_uuid" field and retrieve addresses through it whenever the "network_addresspools" list is empty, which is expected to occur immediately following an upgrade. This allows for uninterrupted network operation during the upgrade process. Variable Naming: The following naming convention will be used for the variables: $platform::network::[network_type]::[ipv4/ipv6]::params::{var_name} Variable Usage: Primary Pool: Existing variables will be maintained and populated with addresses from the primary pool. This ensures compatibility with applications that currently rely on them. They have the format $platform::network::[network_type]::params::{var_name} The variable platform::network::[network_type]::params::subnet_version indicates the primary pool protocol. Secondary Pool: New variables with the above naming convention will be introduced, allowing applications to utilize addresses from the secondary pool if needed. Benefits: Improved modularity and reusability of network configurations. Clear separation of concerns between primary and secondary pools. Easier implementation of applications requiring addresses from either pool. Notes: [network_type] can be oam, mgmt, cluster_host, ... Replace [ipv4/ipv6] with either "ipv4" or "ipv6" depending on the address family. Replace {var_name} with a descriptive name for the specific variable (e.g., "subnet_version", "interface_address"). Test Plan: [PASS] unit tests implemented [PASS] AIO-SX, Standard installation (IPv4 and IPv6) - using the dependency change the secondary pool was introduced - system was lock/unlocked and no puppet manifests were detected - inspection of system.yaml and controller-0.yaml to verify variables content - no alarms or disabled services were found - in standard added hosts with dual-stack config and verified that addresses were allocated for mgmt and cluster-host and after unlock the interface id was assigned to the respective entries. [PASS] For standard systems during upgrade, simulate node unlock by: - Clearing the "network_addresspools" table after Ansible execution and before DM configuration. - Installing remaining nodes with the table empty. This mimics the post-upgrade scenario. Story: 2011027 Task: 49679 Depends-On: https://review.opendev.org/c/starlingx/config/+/908915 Change-Id: If252fa051b2ba5b5eb3033ff269683af741091d2 Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>	2024-03-12 07:25:46 -03:00
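Note on fcebab8ef3: the variable naming convention described above can be illustrated with a small helper that assembles the names. The helper `puppet_var` is hypothetical and exists only to show how the primary-pool form and the per-family form differ; it is not part of sysinv or stx-puppet.

```python
def puppet_var(network_type, var_name, family=None):
    """Assemble a Puppet variable name following the commit's convention.

    Without `family`, the existing (primary pool) form is produced:
        $platform::network::[network_type]::params::{var_name}
    With `family` set to "ipv4" or "ipv6", the new form is produced:
        $platform::network::[network_type]::[ipv4/ipv6]::params::{var_name}
    """
    parts = ["platform", "network", network_type]
    if family is not None:
        if family not in ("ipv4", "ipv6"):
            raise ValueError("family must be 'ipv4' or 'ipv6'")
        parts.append(family)
    parts.extend(["params", var_name])
    return "$" + "::".join(parts)
```

For instance, `puppet_var("mgmt", "subnet_version")` gives `$platform::network::mgmt::params::subnet_version`, and `puppet_var("mgmt", "interface_address", "ipv6")` gives `$platform::network::mgmt::ipv6::params::interface_address`.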
Zuul	e378036a0d	Merge "Add sysinv upgrades support for Kubernetes 1.29.2"	2024-03-11 23:14:56 +00:00
Zuul	0befaa8ff0	Merge "Implement new network-addrpool CLI"	2024-03-11 20:12:46 +00:00
Zuul	6945a0fd6b	Merge "Implement IPsec Cert-Renewal Operation"	2024-03-11 20:03:25 +00:00
Zuul	a396dff37c	Merge "Prevent configuring the Dell Minerva NIC VFs"	2024-03-11 17:28:01 +00:00
Zuul	6c3df45f05	Merge "Report port and device inventory after the worker manifest"	2024-03-11 16:19:09 +00:00
Zuul	6e54e4437d	Merge "Add CONF option to set default auto_update value"	2024-03-11 13:32:33 +00:00
Fabiano Correa Mercer	8a18249fda	Use correct hiera file for downgrade During an upgrade abort scenario where both controllers are already upgraded to release N+1, a potential issue arises. Release N+1 utilizes a new hieradata file named hostname-X.yaml, while release N uses the older ip.yaml. Controller-0 must be downgraded first, making controller-1 the active node. However, controller-1 attempts to update the hieradata file at /opt/platform/puppet/<Release N>/.../controller-0.yaml This file doesn't exist because release N uses ip.yaml Solution: The system needs to identify this downgrade scenario and update the correct hieradata file for release N: /opt/platform/puppet/<Release N>/hieradata/<ip>.yaml Tests Done: AIO-DX IPv6 fresh install AIO-DX IPv6 upgrade abort Story: 2010722 Task: 49692 Change-Id: I848543e7606ddc5bb24ddadb07a7a74d56126044 Signed-off-by: Fabiano Correa Mercer <fabiano.correamercer@windriver.com>	2024-03-11 13:19:37 +00:00


4066 Commits