3973 Commits

Author SHA1 Message Date
Zuul
2324cf7a6f Merge "Nohz_full configuration hieradata needs to be updated" 2022-07-21 14:45:17 +00:00
Iago Estrela
705077ed56 Nohz_full configuration hieradata needs to be updated
The grub command line parameters contain a nohz_full entry, which needs
to be set to something other than a cpu list for kernel isolation when
the user decides to disable it by assigning the disable-nohz-full label.
In this manner, the disabled string can be used to signal the kernel to
restore this configuration to default.

Closes-Bug: 1981762

Test plan:
PASS: Disable nohz_full configuration in a standard worker by enabling
      disable-nohz-full label.
PASS: Verify that /etc/default/grub doesn't contain the nohz_full
      parameter.
PASS: Verify that /sys/devices/system/cpu/nohz_full was restored to
      default.
PASS: Enable nohz_full configuration in a standard worker by removing
      disable-nohz-full label.
PASS: Verify that /sys/devices/system/cpu/nohz_full has the cpulist.
PASS: Remove label and verify alarms and system overall health.
PASS: Add label and verify alarms and system overall health.

Signed-off-by: Iago Estrela <IagoFilipe.EstrelaBarros@windriver.com>
Change-Id: I00b68faae2612088b8e2fe7aeb3900babe638ca0
2022-07-20 18:11:27 -03:00
Zuul
3449792cbf Merge "Debian - Fix update ca certs command" 2022-07-19 22:13:41 +00:00
Zuul
5cb241e318 Merge "host-device-show doesn't show firmware info" 2022-07-18 13:28:37 +00:00
Li Zhu
03f785e953 Debian - Fix update ca certs command
Correct update ca certs command for Debian.

Test Plan:
Verify: Bootstrap and adding a Subcloud on Debian
Verify: Bootstrap and adding a Subcloud on CentOS

Story: 2010119
Task: 45763

Signed-off-by: Li Zhu <li.zhu@windriver.com>
Change-Id: I4a9d2758ce012557fad4a19b49aa9b5bfe4f1680
2022-07-15 17:49:57 -04:00
Mohammad Issa
86aa93255e host-device-show doesn't show firmware info
When running "system host-device-show controller-0 0000:b2:00.0",
all firmware info is shown as "None".

Turns out some services like "sysinv-fpga-agent" and
"sysinv-conf-watcher" are not initialized properly during initial boot.

Test Plan:

PASS: Modify the rules script that is used to start systemd unit files
      so that it is compatible with the Debian command rules, for
      example by specifying the name of the service with the "--name"
      option: "--name=sysinv-fpga-agent". Then execute
      "ls /usr/lib/systemd/system/sysinv* -l" after initial boot.
      --> Command Response:
          -rw-r--r-- 3 root root  81 Jan  1  1970 /usr/lib/systemd/system/sysinv-conf-watcher.path
          -rw-r--r-- 3 root root 219 Jan  1  1970 /usr/lib/systemd/system/sysinv-conf-watcher.service
          -rw-r--r-- 3 root root 389 Jan  1  1970 /usr/lib/systemd/system/sysinv-fpga-agent.service

Closes-Bug: 1981824

Signed-off-by: Mohammad Issa <mohammad.issa@windriver.com>
Change-Id: I1491e9607688738fedec188c872a815b074e8dfc
2022-07-15 19:29:00 +00:00
Zuul
fb7eb9664c Merge "Disallow without vlan id when creating vlan interface" 2022-07-12 12:53:35 +00:00
Zuul
af52b9709a Merge "Device image upload with bmc type error" 2022-07-11 16:26:43 +00:00
Zuul
9c85f6536c Merge "Fix sysinv-api crash with 250 parallel requests" 2022-07-08 18:23:48 +00:00
Zuul
1d7e377b44 Merge "Ignore JSON decode error in request log." 2022-07-08 15:53:55 +00:00
Joao Victor Portal
8033aae447 Ignore JSON decode error in request log.
For DELETE requests with no request body or "Content-Length" field, the
expected return value for "hasattr(rest_state.request, 'json')" would be
False and for "rest_state.request.json" would be an empty string.
Instead, both "hasattr(rest_state.request, 'json')" and
"rest_state.request.json" throw JSON decode exceptions in this case.
This change adds tolerance for this exception. The exception is
tolerated in general because not all of the request conditions that may
cause it to be thrown are known.
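
A minimal sketch of the tolerance described above, assuming a request object
whose "json" property decodes the body on access. FakeRequest and
safe_request_json are hypothetical names used only for illustration; they are
not the sysinv code.

```python
import json


class FakeRequest:
    """Hypothetical stand-in for a request whose .json property decodes the body."""

    def __init__(self, body):
        self.body = body

    @property
    def json(self):
        # json.loads raises json.JSONDecodeError (a ValueError subclass)
        # when the body is empty or is not valid JSON.
        return json.loads(self.body)


def safe_request_json(request):
    """Return the decoded JSON body, or None if it cannot be decoded."""
    try:
        return request.json
    except ValueError:
        # Tolerate any decode failure so that request logging keeps working.
        return None
```

In Python 3, hasattr() only swallows AttributeError, so a decode error raised
inside the property also escapes from hasattr(); wrapping the access itself in
try/except covers both cases.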

Test Plan:

PASS: Successfully deploy an AIO-SX using a Debian image with this
commit present and, using horizon interface, create and delete a data
network, checking that in log file "/var/log/sysinv-api.log" no
exception was thrown in the delete request.

Closes-Bug: 1980842
Signed-off-by: Joao Victor Portal <Joao.VictorPortal@windriver.com>
Change-Id: Ib9f72f3b2d4d60790fa8011674a013bcd141a61e
2022-07-06 18:26:54 -03:00
Mohammad Issa
109eb6e7f3 Device image upload with bmc type error
device-image-upload with --bmc or --retimer-included fails with the
error: "Expecting value: line 1 column 1 (char 0)"
Note: device-image-upload WITHOUT --bmc or --retimer-included works.

The parameters for 'bmc' and 'retimer_included' (e.g. True or False)
are passed in as strings instead of booleans.

The unit test case passes in the parameters as strings; if the strtobool
conversion works, it will not catch an error.
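
A minimal sketch of converting the string flags to booleans, assuming the
upload data arrives as a dictionary. to_bool and normalize_image_flags are
hypothetical helpers that mirror strtobool-style semantics; the actual fix may
be implemented differently.

```python
_TRUE = {"y", "yes", "t", "true", "on", "1"}
_FALSE = {"n", "no", "f", "false", "off", "0"}


def to_bool(value):
    """Convert a strtobool-style string (or a bool) to a real boolean."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in _TRUE:
        return True
    if text in _FALSE:
        return False
    raise ValueError("invalid boolean value: %r" % (value,))


def normalize_image_flags(data):
    """Return a copy of data with 'bmc' and 'retimer_included' as booleans."""
    result = dict(data)
    for key in ("bmc", "retimer_included"):
        if key in result:
            result[key] = to_bool(result[key])
    return result
```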

Test Plan:

PASS: Intercept the data Dictionary and override values corresponding
      to their keys with the correct values.
      Then use this command to test:
      "system device-image-upload VistaCreekBravoBMCFW_Release_WW16.1.bin
      functional 8086 0b30 --bitstream-id 32 --bmc true
      --retimer-included true"
      --> Command Response:
          +------------------+--------------------------------------+
          | Property         | Value                                |
          +------------------+--------------------------------------+
          | uuid             | d54038e6-4174-48e0-943e-a7bdfa1caf07 |
          | bitstream_type   | functional                           |
          | pci_vendor       | 8086                                 |
          | pci_device       | 0b30                                 |
          | bitstream_id     | 35                                   |
          | key_signature    | None                                 |
          | revoke_key_id    | None                                 |
          | name             | None                                 |
          | description      | None                                 |
          | image_version    | None                                 |
          | applied          | False                                |
          | applied_labels   |                                      |
          | bmc              | True                                 |
          | retimer_included | True                                 |
          +------------------+--------------------------------------+

Closes-Bug: 1980405

Signed-off-by: Mohammad Issa <mohammad.issa@windriver.com>
Change-Id: I79ea7661b1bc383df5daa6f4f93efd4df0e7bb2b
2022-07-06 20:13:18 +00:00
Zuul
8b2e968aff Merge "Populate more specific load import error messages" 2022-07-06 13:45:10 +00:00
Zuul
5faacb6844 Merge "Skip Recover From Armada to FluxCD apps" 2022-07-06 13:42:36 +00:00
Zuul
fbf2987ff4 Merge "cgtsclient handle certificate related options properly" 2022-07-06 13:19:45 +00:00
Junfeng (Shawn) Li
8175c0069e Populate more specific load import error messages
Raise each exception from the _upload_file() method and populate it
into the client-side error messages. This change doesn't add or remove
any behavior; it only updates the error message format in the log.

Closes-Bug: 1979689

Test Plan:
PASS: Patch local duplex system and trigger one exception from _upload_file()
      method to verify the error message is populated
PASS: Run 'system load-import bootimage.iso bootimage.sig' successfully and
      verify in 'system load-list' with sufficient /scratch diskspace

Signed-off-by: Junfeng (Shawn) Li <junfeng.li@windriver.com>
Change-Id: I4cbf0ce3f0f9036e41d65d7c19a84d92f768ae32
2022-07-05 21:03:02 +00:00
Andy Ning
3379be986a cgtsclient handle certificate related options properly
Currently cgtsclient ignores the "-k/--insecure", "--ca-file",
"--cert-file" and "--key-file" options. In order for commands
such as "system host-list" to work over HTTPS, the OS_CACERT env
variable has to be set.

This change updates cgtsclient to accept and properly handle
the ignored options.

Test Plan:
PASS: remote cli docker image build
PASS: from remote cli environment, successfully run the
      "system host-list" commands with the 4 options over
      HTTPS.

Closes-Bug: 1980417
Signed-off-by: Andy Ning <andy.ning@windriver.com>
Change-Id: Iae03ac60188157cb726e6e12ba2209eff6b7e1e1
2022-07-04 09:41:09 -04:00
Joohyun
60c1074f25 Disallow without vlan id when creating vlan interface
When configuring an interface as a vlan, it is possible to create the
interface without entering a vlan id. In that case, the vlan id is set
to NONE. If this interface is assigned to oam and the host is unlocked,
a problem occurs: after unlock, the device name is set to vlanNone and
the oam interface cannot come up normally.
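
A minimal sketch of the failure mode and of the kind of check that rejects it.
vlan_device_name and check_vlan_interface are hypothetical names for
illustration, not the sysinv API.

```python
def vlan_device_name(vlan_id):
    """Build a device name the way a naive string format would."""
    return "vlan%s" % vlan_id


def check_vlan_interface(iftype, vlan_id):
    """Reject a vlan interface that is created without a vlan id."""
    if iftype == "vlan" and vlan_id is None:
        raise ValueError("a vlan id must be specified for a vlan interface")
```

With a missing id, vlan_device_name(None) yields "vlanNone", which is the bad
device name described above; running the check at creation time stops the
request before that name can be generated.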

Closes-Bug: 1979253

Test Plan:
PASS: Create vlan interface with vlan id and succeed in creation
PASS: Create vlan interface without vlan id and fail in creation

Signed-off-by: ohjoohyun <oh011798@gmail.com>
Change-Id: Ifd5527699cb8e8f874a54292f40656fc890f0a1a
2022-07-04 12:24:26 +09:00
Zuul
2914421a12 Merge "Use session object when creating keystone_client" 2022-06-30 20:13:00 +00:00
Zuul
611245e03c Merge "Fix registry credentials retrieval for Debian" 2022-06-30 19:15:35 +00:00
Jerry Sun
1cf6f891bc Fix registry credentials retrieval for Debian
Code that retrieves the registry credentials does not work properly
with Python3. This commit fixes that.

Test Plan:
  - Verify successful bootstrap for both CentOS and Debian
    using authenticated registry
Partial-bug: 1980391

Change-Id: I71cac14d8bdd63501fc804086cb8af429135bd92
Signed-off-by: Jerry Sun <jerry.sun@windriver.com>
2022-06-30 14:31:15 -04:00
Rei Oliveira
b2eab5aaed Use session object when creating keystone_client
The new version of the keystone client we are using in Debian does not
support instantiating a keystone_client without a session object.

This change uses the existing self._get_keystone_session function
to create the session object and uses that session object to
authenticate with the keystone_client. There are also minor
refactorings to avoid using the keystone client at bootstrap time.

Test plan on both CentOS and Debian:

PASS: Run 'openstack user password set' to change the password and
      verified that /opt/platform/puppet/22.06/hieradata/
      secure_system.yaml gets updated with the new password, keyring
      gets updated with new password and /var/log/sm-customer.log
      shows that the vim service gets restarted

PASS: Verify that no keystone error messages are shown in sysinv.log
      during bootstrap and that the keystone listeners are started.

PASS: When the keystone admin user password changes, observe that the
      corresponding entity in keyring is updated, and vim is restarted
      (by openstack::keystone::password::runtime)

Closes-Bug: 1979995

Signed-off-by: Rei Oliveira <Reinildes.JoseMateusOliveira@windriver.com>
Change-Id: Id4d4bf8072a853d4d3e016e9afa2cd984ee85694
2022-06-30 17:03:10 +00:00
Zuul
f8acf758f9 Merge "Move ttys_dcd configuration from agent to puppet runtime manifest" 2022-06-30 16:40:04 +00:00
Iago Estrela
dc7c3c9380 Fix sysinv-api crash with 250 parallel requests
Sysinv WSGI server is crashing after receiving 250 requests in parallel
in route creation endpoint due to limited number of threads to handle
requests.

Closes-Bug: 1974194

Test plan:
PASS: Hit route creation endpoint 250 times in parallel and verify
      WSGI Server didn't restart.

Signed-off-by: Iago Estrela <IagoFilipe.EstrelaBarros@windriver.com>
Change-Id: I89012d7f8c7693cd3dc078d9f67ddffb4308e254
2022-06-30 12:49:14 -03:00
Lucas Cavalcante
5025e1ce71 Skip Recover From Armada to FluxCD apps
If a system application-update is triggered updating an armada app
to a fluxcd app (preceded by a helm release migration) and the
update fails, the application framework will try to perform a recover.

Recover will fail as fluxcd uses helm3 and armada uses helm2. This will
create resources both in helm2 and helm3, leaving the app
in an inconsistent state.

To prevent that from happening, recover is skipped if to_app and
from_app use different chart managers.
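
A minimal sketch of the skip condition, assuming each application record
exposes the chart manager it uses. The App tuple, its chart_manager field and
should_skip_recover are hypothetical names for illustration only.

```python
from collections import namedtuple

# Hypothetical minimal application record; only the chart manager matters here.
App = namedtuple("App", ["name", "chart_manager"])


def should_skip_recover(from_app, to_app):
    """Return True when recover must be skipped after a failed update."""
    # armada relies on helm2 and fluxcd on helm3, so recovering across
    # chart managers would create resources in both.
    return from_app.chart_manager != to_app.chart_manager
```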

TEST PLAN:
PASS: recover skipped after update from armada to fluxcd without
migrating helmrelease

Closes-bug: 1980242
Signed-off-by: Lucas Cavalcante <lucasmedeiros.cavalcante@windriver.com>
Change-Id: I9061b75f443730e973b79cc93e955069951113ff
2022-06-30 11:39:53 -03:00
Iago Estrela
28f72983de Move ttys_dcd configuration from agent to puppet runtime manifest
The ttys_dcd flag is currently being configured by sysinv agent audit.
The API updates the database but it is not instantly configured, it
must wait for an audit iteration. This change modifies this behavior
by running a puppet runtime manifest [1] from the API.

[1] https://review.opendev.org/c/starlingx/stx-puppet/+/845174

Closes-Bug: 1978009
Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/845174

Test plan:
PASS: Bootstrap AIO-SX, host unlock and verify system alarms.
PASS: Set the flag via host-update and check if the serial console
      was really configured.
PASS: Unset the flag and also verify serial console.

Signed-off-by: Iago Estrela <IagoFilipe.EstrelaBarros@windriver.com>
Change-Id: Ia3ef2a5e96905c3cd770601d1e78af368bf54a95
2022-06-29 12:57:30 -03:00
Zuul
2401014697 Merge "sysinv-api: Temporarily increase timeout to let bootstrap pass" 2022-06-29 13:45:38 +00:00
Zuul
146dd74869 Merge "Prevent a new host to be added to AIO-SX" 2022-06-28 17:48:19 +00:00
Zuul
60ad27d514 Merge "update license statements" 2022-06-28 14:34:15 +00:00
Michel Thebeau
40b50bfb11 update license statements
Align the license statements with other files in this repo.  Remove the
proprietary statements which are inconsistent with the use of Apache-2.
Include the SPDX-License-Identifier.

Closes-Bug: 1979242

Change-Id: Ic20b9e896198dee37ccf12ef993e050ff9f53fc2
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2022-06-28 13:56:11 +00:00
Dan Voiculeasa
648e5bcbba sysinv-api: Temporarily increase timeout to let bootstrap pass
On Debian we observe that bootstrap fails intermittently.
From the data so far, this is observed only on VMs.

The error could not be reproduced locally, but the investigation
indicates that it is just a performance degradation.
Investigation notes showing that it is a performance degradation and
not a service crash are uploaded to the LP.
Temporarily increase the retries for waiting for sysinv-api to come up.
Jump from 15 seconds to 60 seconds to be defensive.
This will allow sanities to pass and integration effort to continue.

Tests on AIO-SX on Debian:
PASS: bootstrap

Partial-Bug: 1979717
Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Change-Id: I10b00eab467303771d25cc5c79760005c3966446
2022-06-24 10:57:31 +03:00
Junfeng Li
693a85bad1 Prevent a new host to be added to AIO-SX
Details: 1. Added a condition check to prevent a host from being added
            to an AIO-SX.
         2. Updated host delete() to allow a host to be deleted if it is
            not controller-0 in an AIO-SX.

Closes-Bug: 1978134

Test Plan:
PASS: Unit test cases against non controller-0 AIO-SX host deletion
PASS: Patched local simplex system and ran the reproduction steps
PASS: Migration from SX to DX

Signed-off-by: Junfeng Li <junfeng.li@windriver.com>

Change-Id: I3510bc43f13869ec76673ec50a879b463ee760f4
2022-06-22 13:12:04 -04:00
Kaustubh Dhokte
cc3cdbd647 apply feature-gate update during upgrade-activate
This change adds a method to apply 'update-k8s-feature-gates'
puppet class on both controllers during platform upgrade activate
phase.

Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/845651

Test Plan: (CentOS)
On AIO-SX and AIO-DX:
PASS: Full platform upgrade successful
PASS: Check logs if puppet manifest is successfully
      applied on both the controllers.

Story: 2009789
Task: 44765

Signed-off-by: Kaustubh Dhokte <kaustubh.dhokte@windriver.com>
Change-Id: I9cb494a72b7ad62476378f3512cb55c94596eb1e
2022-06-19 08:02:04 +00:00
Zuul
9bf7cbb488 Merge "Ceph monitor host is able to lock when only 2 monitors are avilable" 2022-06-18 13:33:51 +00:00
Zuul
8e08bfbcfb Merge "Update application namespaces PSA labels" 2022-06-18 01:17:35 +00:00
Carmen Rata
eef577f13d Update application namespaces PSA labels
This commit updates the per-mode version of Pod Security Admission
labels to "latest" for application namespaces such as cert-manager.
Pod Security Admission labels on namespaces are needed for the pod
security admission controller to know how restrictive each
namespace is.
Pinning to a specific Kubernetes version, for example v1.23, allows
the behavior to remain consistent as policy changes happen over
Kubernetes releases. Keeping the version "latest" as the default
allows more flexibility when supporting multiple Kubernetes
versions.
This commit also updates the default level of the application
namespace labels from "baseline" to "privileged". This causes no harm
if users do not wish to use the "beta" PSA feature enabled by default
in Kubernetes v1.23+.
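
A minimal sketch of the namespace labels this describes, using the standard
Kubernetes Pod Security Admission label keys. psa_labels is a hypothetical
helper, and setting all three modes (enforce, audit, warn) is an assumption
made for illustration rather than something stated in this commit.

```python
PSA_PREFIX = "pod-security.kubernetes.io"
PSA_MODES = ("enforce", "audit", "warn")


def psa_labels(level="privileged", version="latest"):
    """Build per-mode level and version labels for a namespace."""
    labels = {}
    for mode in PSA_MODES:
        labels["%s/%s" % (PSA_PREFIX, mode)] = level
        labels["%s/%s-version" % (PSA_PREFIX, mode)] = version
    return labels
```

Passing version="v1.23" instead of "latest" would pin the policy to that
release, which is the trade-off described above.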

Test Plan:
PASS: In an installed system verify that the pod security admission
      labels of the cert-manager namespace have been updated with the
      per-mode version "latest".
PASS: Created namespaces where policies are applied via labels.
      Privileged pods fail to get created in namespaces that are not
      configured with privileged policy level.
PASS: Privileged pods get created in namespaces with no security
      policy labels.

Story: 2009833
Task: 45632

Signed-off-by: Carmen Rata <carmen.rata@windriver.com>
Change-Id: I76d44873ac447bbc0e2d90643fedf38bef8ebd1a
2022-06-17 20:30:18 -04:00
Zuul
60a6d41355 Merge "OIDC upgrade script" 2022-06-17 17:36:33 +00:00
Michel Thebeau
8293b0af2c OIDC upgrade script
These scripts perform helm override check, backup of helm overrides,
conversion of helm overrides and upgrade of the oidc-auth-apps
application.

The backup helm overrides and conversion files will be in named
sub-directories within /opt/oidc-auth-apps. The sub-directory names
reflect the procedure that creates them, but are otherwise arbitrary.

The configuration check ensures the end-user's helm-overrides can be
converted from old Dex to new Dex. The conversion rearranges the helm
overrides to fit into the new Dex values.yaml.

The 50-validate-oidc-auth-apps.py script contains the backup, check and
conversion code and will be run both on the 'from' release as a
pre-upgrade check and on the 'to' release for backup and conversion.

The 70-upgrade-oidc-auth-apps.sh script is run at upgrade activate to
remove the old app and apply the new app with converted user overrides.
This invokes 50-validate-oidc-auth-apps.py on the active controller
during upgrade-activate as well.

Depends-On:
https://review.opendev.org/c/starlingx/oidc-auth-armada-app/+/845380

Test plan:
PASS: tox (python2, python36)
PASS: run upgrade script as postgres user
PASS: sequenced commands: check, backup and convert
PASS: conversion of sample configurations with app verification after
PASS: conversion without user overrides
PASS: backup helm-overrides to /opt/oidc-auth-apps as postgres user
PASS: check and conversion of helm override for documentation examples
PASS: push helm overrides back to database
TODO: upgrade-activate
TODO: end-to-end upgrade SX
In-progress: end-to-end upgrade DX
TODO: python3 Debian

Story: 2009838
Task: 45641

Change-Id: If67ae45826bd9ceae35e50d5536f5054dd3ae9dd
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2022-06-17 11:52:11 -04:00
Zuul
4e46f2ac6a Merge "Remove deprecated upgrade scripts" 2022-06-16 23:47:49 +00:00
Adriano Oliveira
d663a29719 Remove deprecated upgrade scripts
These upgrade scripts were necessary on the stx 5.0 to 6.0 upgrade
path, however, they are not necessary for the next 7.0 upgrade path.

Test Plan:

PASS: Fresh install on AIO-SX
PASS: AIO-SX Upgrade (6.0 -> 7.0)
PASS: AIO-DX Upgrade (6.0 -> 7.0)
PASS: Checked scripts are not present after upgrade completed

Story: 2009754
Task: 45634

Signed-off-by: Adriano Oliveira <adriano.oliveira@windriver.com>
Change-Id: I675e8916b2f661a65622dd1d1bf0d7aa9e9deafe
2022-06-16 23:01:39 +00:00
Zuul
d14dc686d8 Merge "system health-query response on ceph query" 2022-06-15 17:28:27 +00:00
Zuul
bf7438bd0f Merge "Update certs spec to work with version v1" 2022-06-15 16:59:04 +00:00
Zuul
369be99c07 Merge "Handle status_code / status error in HTTP response" 2022-06-15 16:55:26 +00:00
Zuul
8fa8085794 Merge "Delete certificate alarm when secret is deleted" 2022-06-15 15:41:36 +00:00
Rei Oliveira
66ac141a28 Delete certificate alarm when secret is deleted
For certificates stored as kubernetes tls secrets, the alarm should be
cleared when the secret is deleted.

This changes the audit_for_deleted_certificates function to also check
for deleted secrets and subsequently clear the alarm and delete the
certificate snapshot information.

Test plan:

PASS: Verify that when a certificate is deleted the alarm is cleared
      from the system
PASS: Verify that deploying a soon-to-expire certificate results in
      an alarm in 'fm alarm-list'
PASS: Verify that an existing certificate alarm is cleared up by
      renewing the certificate to get a valid certificate

Closes-Bug: 1978730

Signed-off-by: Rei Oliveira <Reinildes.JoseMateusOliveira@windriver.com>
Change-Id: I6ed9248766b2abbbcc616e10d4575b4ae0471c9d
2022-06-15 11:39:59 -03:00
Cole Walker
b50407b5fa [PTP] Update notification namespace to be privileged
The ptp-notification application requires a privileged namespace in
order to deploy and operate.

This change moves the notification namespace from the baseline policy
group to the privileged policy group so that it can continue to operate
as it did prior to the addition of support for the Pod Security
Admission controller introduced in the upversion to k8s 1.23.

The privileged and baseline groups were defined in
https://review.opendev.org/c/starlingx/config/+/833487

Test-plan:

Pass: Update the privileged and baseline groups in common.py, restart
sysinv-conductor and verify that ptp-notification is able to properly
deploy.

Pass: Verify that the notification namespace has the expected
privileged labels.

Closes-Bug: 1978737

Signed-off-by: Cole Walker <cole.walker@windriver.com>
Change-Id: I5d24a8e81b32809f568a5953701cf2e0c474005e
2022-06-14 17:50:21 -04:00
John Kung
9c2501a720 system health-query response on ceph query
In order to assure a response to the system health-query, when Ceph
storage-backend is configured and the ceph-api is unresponsive,
a Timeout is required.

This Timeout does not rely on the underlying ceph-api timeout as
the ceph-api may not timeout as expected.
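
A minimal sketch of a caller-side timeout that does not depend on the wrapped
call ever returning. It uses the standard library concurrent.futures module as
a stand-in technique; the mechanism actually used in sysinv is not shown here,
and all function names below are hypothetical.

```python
import concurrent.futures
import time


def call_with_timeout(func, timeout_seconds):
    """Run func() in a worker thread and stop waiting after timeout_seconds."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func)
    try:
        return future.result(timeout=timeout_seconds)
    finally:
        # Return immediately; the worker thread is abandoned, not killed.
        executor.shutdown(wait=False)


def ceph_health_or_timeout(query, timeout_seconds):
    """Always produce an answer, even if the query does not return in time."""
    try:
        return call_with_timeout(query, timeout_seconds)
    except concurrent.futures.TimeoutError:
        return "timeout"


def slow_query():
    """Hypothetical stand-in for an unresponsive ceph-api call."""
    time.sleep(0.3)
    return "HEALTH_OK"
```

The caller gets a response within the deadline either way, which is the
property the health query needs; the abandoned worker still runs to
completion in the background.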

Test Plan:
PASSED Verify system health-query response when Ceph is unhealthy
PASSED Verify system health-query response when Ceph is healthy

Closes-Bug: 1978726
Signed-off-by: John Kung <john.kung@windriver.com>
Change-Id: I4702c409e8ea45946ba94fab6a0989a90f2f6604
2022-06-14 17:30:51 -04:00
Kyle MacLeod
edd548d8c1 Handle status_code / status error in HTTP response
Checks for the existence of 'status_code', 'status_int', or 'status'
in the HTTP response object, instantiating the HTTPException
instance with the found code, or raises a generic Exception if
none is found.
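
A minimal sketch of the attribute lookup described above. HTTPError is a
hypothetical stand-in for the client's HTTPException class, and
exception_from_response is an illustrative name, not the actual function.

```python
class HTTPError(Exception):
    """Hypothetical stand-in for the client's HTTPException class."""

    def __init__(self, code):
        super().__init__("HTTP error %s" % code)
        self.code = code


def exception_from_response(response):
    """Build an exception from whichever status attribute the response has."""
    for attr in ("status_code", "status_int", "status"):
        if hasattr(response, attr):
            return HTTPError(getattr(response, attr))
    # None of the known attributes exist, so fall back to a generic error.
    raise Exception("HTTP response has no status attribute")
```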

Test Plan:

PASS:
Tested code on system exhibiting the errors
referenced in the bug. The HTTP response is now
correctly caught and the correct HTTPException is
raised.

Closes-Bug: 1978499

Signed-off-by: Kyle MacLeod <kyle.macleod@windriver.com>
Change-Id: I53d1dd6583368de4f6713a838fe3304d362f1756
2022-06-14 14:12:28 -04:00
Kaustubh Dhokte
144f6fc9c5 Update certs spec to work with version v1
The change https://review.opendev.org/c/starlingx/config/+/838594
updated the certificate api-version from cert-manager.io/v1alpha2 to
cert-manager.io/v1, but did not make the necessary changes to the
certificate specs to work with the new version.
This change makes only the required changes to the certificate specs to
work with the new version: cert-manager.io/v1

The spec organization[] should now be subject:organizations[]
See the difference here,
https://cert-manager.io/v0.13-docs/reference/api-docs/#cert-manager.io/v1alpha2.Certificate
 and https://cert-manager.io/docs/reference/api-docs/#cert-manager.io/v1.CertificateSpec
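
A minimal sketch of the field move described above, operating on a certificate
spec held as a plain dictionary. convert_spec_to_v1 is a hypothetical helper
and only handles the organization field; it is not a full v1alpha2 to v1
conversion.

```python
import copy


def convert_spec_to_v1(spec):
    """Move a v1alpha2 'organization' list to v1 'subject.organizations'."""
    new_spec = copy.deepcopy(spec)
    organizations = new_spec.pop("organization", None)
    if organizations:
        new_spec.setdefault("subject", {})["organizations"] = organizations
    return new_spec
```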

The organization 'system:masters' in the admin.conf certificate is
required to authorize the access for kubernetes-admin to cluster objects.
This authorization is specified in the 'cluster-admin'
clusterrolebinding. Without this change, all kubectl commands fail.

In v1, unlike in v1alpha2, CN is ignored by TLS clients during
authorization (https://cert-manager.io/docs/reference/api-docs/#cert-manager.io/v1.CertificateSpec)
if any subject alt name is set. My initial understanding here was that
the CN field value is being ignored due to
subject:organizations:['system:masters'] (in v1), as all the deployment
and daemonset pods were failing after "system kube-rootca-pods-update
--phase=trust-new-ca" (during rootCA update) with an authorization error
for the user 'kube-apiserver-kubelet-client'.
This forces the removal of organizations from the apiserver kubelet
client certificate as all deployments and daemonset pods authenticate
and authorize with the 'kube-apiserver-kubelet-client' user.

Without 'system:nodes' in the kubelet client certificate,
kube-scheduler and kube-controller-manager fail to authorize.
More Info: https://kubernetes.io/docs/reference/access-authn-authz/node/

Test Plan:
On CentOS AIO-SX:
PASS: Manual kubernetes RootCA update successful
PASS: Orchestrated kubernetes RootCA update successful.
PASS: All deployments, daemonsets and pods running as expected after
      RootCA update.

Closes-Bug: 1978365

Signed-off-by: Kaustubh Dhokte <kaustubh.dhokte@windriver.com>
Change-Id: I767a70a07ab540510e4eb734cb4e282c9918840c
2022-06-14 18:02:24 +00:00
Felipe Sanches Zanoni
561a830e7d Ceph monitor host is able to lock when only 2 monitors are avilable
Ceph monitor quorum requires at least 2 monitors up when 3 are
configured in Standard or Storage setups. If 1 host that has ceph
monitor configured is locked, no other ceph monitor host can be
locked.
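
A minimal sketch of the quorum rule described above for setups with 3
configured monitors. can_lock_monitor_host is a hypothetical function, and
treating a force lock as an override is an assumption based on the test plan
below.

```python
MIN_MONITORS_FOR_QUORUM = 2


def can_lock_monitor_host(available_monitors, force=False):
    """Return True if locking one more monitor host keeps quorum."""
    if force:
        # A force lock bypasses the quorum protection.
        return True
    return available_monitors - 1 >= MIN_MONITORS_FOR_QUORUM
```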

Test Plan:
 PASS: AIO-SX CentOS lock/unlock.
 PASS: AIO-DX CentOS lock/unlock standby controller.
 PASS: Storage CentOS lock controller-1. Cannot lock storage-0.
 PASS: Storage CentOS lock controller-1. Force lock storage-0.
 PASS: Standard CentOS lock controller-1. Cannot lock compute-0.
 PASS: Standard CentOS lock controller-1. Force lock compute-0.

Closes-Bug: #1978498

Signed-off-by: Felipe Sanches Zanoni <Felipe.SanchesZanoni@windriver.com>
Change-Id: If32aeea4712646430fdba06709aa3d4b9e05c51c
2022-06-14 11:38:01 +00:00