857 Commits

Author SHA1 Message Date
Takamasa Takenaka
978dea28f2 Remove trap destination from fm.conf
With the host-based SNMP removal,
remove trap_destination entry from fm.conf

Story: 2008132
Task: 41350
Change-Id: I3f0298233beedc3370fa8c4c2dbc65fe678b14a6
Depends-On: https://review.opendev.org/765381
Signed-off-by: Takamasa Takenaka <takamasa.takenaka@windriver.com>
2021-01-20 09:55:36 -03:00
Zuul
adb848ecb9 Merge "Support trap_server_port configurable" 2021-01-19 22:12:27 +00:00
Angie Wang
0f7418e761 Configure SQL as helm storage backend
Configmap is the default helmv2 storage backend to store
release information but its 1MB resource limit prevents
scaling up stx openstack worker nodes, so we want to use
SQL as helm storage backend.

Add a class in the helm puppet manifest to set up the helm database
during ansible bootstrap.

This commit also fixes the IP address in postgres pg_hba.conf.

Currently, we have the following rules for both IPv4 and
IPv6 systems:
Rule Name: allow access to all users with encrypted password
from all IPv4 addresses.
host  all  all         0.0.0.0/0   md5
Rule Name: deny access to postgresql user.
host  all  postgres    0.0.0.0/32 reject

For the IPv6 system, the address of pods is IPv6. The CIDR
address in the rule should be changed to corresponding
IPv6 address (::0/0) to allow tiller running in container
to access helm database.
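
The address selection described above can be sketched in Python. This is only an illustration, not the puppet code from this change: the function name pg_hba_rules and its parameter are hypothetical, and the IPv6 address used for the reject rule (::0/128) is an assumption made by analogy with the IPv4 rule, since the message only names ::0/0 for the allow rule.

```python
import ipaddress
from typing import List


def pg_hba_rules(pod_address: str) -> List[str]:
    """Return pg_hba.conf host rules for the IP family of pod_address."""
    version = ipaddress.ip_address(pod_address).version
    if version == 6:
        allow_cidr = "::0/0"
        # Assumption: IPv6 counterpart of 0.0.0.0/32, not stated in the commit.
        reject_cidr = "::0/128"
    else:
        allow_cidr = "0.0.0.0/0"
        reject_cidr = "0.0.0.0/32"
    return [
        # Allow all users with an encrypted password from any address.
        "host  all  all       {}  md5".format(allow_cidr),
        # Deny access to the postgres user.
        "host  all  postgres  {}  reject".format(reject_cidr),
    ]
```

Calling pg_hba_rules("fd00::1") yields rules that use the IPv6 CIDRs, while an IPv4 address yields the rules quoted above.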

Depends-On: https://review.opendev.org/#/c/761645/
Change-Id: Ifd072000e0680a59d5be0f2f1ef2ce1cbabc1e4f
Partial-Bug: 1887677
Signed-off-by: Angie Wang <angie.wang@windriver.com>
2021-01-07 21:46:08 -05:00
Takamasa Takenaka
4b97414655 Support trap_server_port configurable
Add a trap_server_port parameter so that users can
configure the snmp trap server port number through
a user helm override.

Story: 2008132
Task: 41548
Signed-off-by: Takamasa Takenaka <takamasa.takenaka@windriver.com>
Change-Id: Iac44d813447881591efd7b4a088185f2d59986be
2021-01-07 11:12:20 -03:00
Zuul
8c75eabee4 Merge "Enable etcd with security setting." 2021-01-05 14:12:31 +00:00
Zhipeng Liu
777d5d0de7 Enable etcd with security setting.
Update etcd puppet to support security settings.

Partial-Bug: 1894870

Change-Id: Ifb5bb2506a260186bf4e8caa487bbeaae04df80b
Signed-off-by: Zhipeng Liu <zhipengs.liu@intel.com>
2020-12-23 19:49:27 +08:00
Zuul
a51de30c99 Merge "Add auto-version for remaining stx/stx-puppet packages" 2020-12-17 21:40:34 +00:00
Don Penney
6182d3f949 Add auto-version for remaining stx/stx-puppet packages
Update remaining StarlingX packages with hardcoded TIS_PATCH_VER to
use PKG_GITREVCOUNT where possible, with offsets as needed to ensure
the version is incremented above the hardcoded version.

Change-Id: I110ef3a10c3164f8edb706b9257f33178b4a2517
Story: 2008455
Task: 41456
Signed-off-by: Don Penney <don.penney@windriver.com>
2020-12-17 13:21:50 -05:00
Zuul
b3eb8e53aa Merge "Add mds support in puppet for CephFS." 2020-12-17 07:39:02 +00:00
Zuul
e21cea31db Merge "Add variables for snmp in fm.conf" 2020-12-15 21:12:02 +00:00
Daniel Safta
b1997248da Add mds support in puppet for CephFS.
Mds configuration needs to be present on every node that
has a ceph monitor in order for CephFS to be available.

Change-Id: Ic4270e401b2c3e5123aecfab21af1e874b733830
Story: 2008162
Task: 40908
Signed-off-by: Daniel Safta <daniel.safta@windriver.com>
2020-12-11 09:18:27 +00:00
Zuul
0c70a49588 Merge "Skip platform ceph osds puppet manifest following DOR" 2020-12-09 23:59:55 +00:00
Zuul
2e1469cdcc Merge "Fix directory permissions for /var/log/rabbitmq" 2020-12-07 17:57:47 +00:00
Andy Ning
6f881cc84e Skip platform ceph osds puppet manifest following DOR
ceph::osd puppet manifest will fail during controller puppet
manifests apply following DOR, because as both controllers are
booting up, there is no ceph monitor cluster so puppet is unable
to validate or invalidate the existing configuration.

This change updated platform::ceph::controller class to skip
platform ceph osds in the case of DOR.

Change-Id: I0254ce28869bc87c5e939ea8984d175244ebb65f
Partial-Bug: 1904739
Signed-off-by: Andy Ning <andy.ning@windriver.com>
2020-12-04 09:38:56 -05:00
Carmen Rata
8ba9e81db4 Fix directory permissions for /var/log/rabbitmq
Updated /var/log/rabbitmq directory permissions to 750 from 755
to disallow world access to rabbitmq log files but at the same
time to allow group access.
The changes are made to comply as much as possible with
openscap rules security requirements.
Verified that installation is successful for AIO-SX
and Standard 2+2 system configurations.

Story: 2008037
Task: 40694

Change-Id: I1c0112575033c04983c56298e2131882911333de
Signed-off-by: Carmen Rata <carmen.rata@windriver.com>
2020-12-01 22:22:57 -05:00
Zuul
24181b196d Merge "Retain more puppet log files" 2020-11-26 04:15:45 +00:00
Lu Yao Chen
3b7c55174a Retain more puppet log files
Increased max log directories to retain more
debugging info from puppet.log.

Was tested by looping system host-cpu-modify
commands, /var/log/puppet caps at 50 log directories
instead of 20.
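
A retention cap of this kind can be sketched in Python as follows. This is a hedged illustration, not the logic changed by this commit: the function name dirs_to_prune is hypothetical, and it assumes the log directory names start with a timestamp so that sorting by name orders them from oldest to newest.

```python
from typing import Iterable, List


def dirs_to_prune(dir_names: Iterable[str], max_dirs: int = 50) -> List[str]:
    """Return the oldest directory names that exceed the retention cap."""
    # Assumption: names sort chronologically (timestamp prefix).
    ordered = sorted(dir_names)
    excess = len(ordered) - max_dirs
    return ordered[:excess] if excess > 0 else []
```

With 52 directories and the default cap of 50, the two oldest names are returned for removal.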

Closes-Bug: 1903994

Signed-off-by: Lu Yao Chen <luyao.chen@windriver.com>
Change-Id: Ia8458396867f988d5061d3aa49fa2a21ee6ebac2
2020-11-25 14:37:47 -05:00
Zuul
c04af4dcef Merge "Remove comments in keystone::upgrade class" 2020-11-25 17:00:26 +00:00
Carmen Rata
77d3382d2c Fix permission of puppet saved logs tar file
Changed the permissions of puppet saved logs tar file from
644 to 600 to comply with openscap rules security requirements.
Verified that installation is successful for AIO-SX
and Standard 2+2 system configurations.

Story: 2008037
Task: 40694

Change-Id: I1fe365e808a085999667e898788afacf61fd6612
Signed-off-by: Carmen Rata <carmen.rata@windriver.com>
2020-11-25 11:29:49 -05:00
Andy Ning
e5ff48c2ca Remove comments in keystone::upgrade class
The TODO comments in the keystone::upgrade class no longer apply.
This update removes them.

Change-Id: Id9f7b39c15db1f73428d4f23d93ef3e3b4ad50f5
Partial-Bug: 1886064
Signed-off-by: Andy Ning <andy.ning@windriver.com>
2020-11-25 10:30:24 -05:00
Jerry Sun
c10b5897b9 Update dnsmasq config for slow DNS servers
When a configured DNS server is taking a long time to respond to
unknown domains or hosts, registry interactions like push, pull,
and querying for images through system commands will fail due to
hostname resolution for registry.local. This is because it attempts
to resolve registry.local using the A record first, which times out
since it is hitting the configured external DNS server. This
prevents the process from looking up the AAAA record which would
resolve to the dnsmasq CNAME record. This commit updates the dnsmasq
config to prevent forwarding the local domain to upstream servers.

Change-Id: Ic3cf6aae87f8f2d5c61a24db00a4cb814c20aac6
Closes-Bug: 1904885
Signed-off-by: Jerry Sun <jerry.sun@windriver.com>
2020-11-19 14:31:24 -05:00
Zuul
9d8af1864b Merge "Remove dcdbsync public endpoint from keystone catalog" 2020-11-16 19:50:45 +00:00
Andy Ning
a7449bcb6e Remove dcdbsync public endpoint from keystone catalog
dcdbsync is a private service only used by dcorch in a DC system to
synchronize keystone resources. It is not supposed to have a public
endpoint in the keystone catalog that exposes its service on the OAM IF.

This update removed its public endpoint from keystone catalog.

Change-Id: Idfb95ad26ea99e3ca01d78b974284909f82becc0
Closes-Bug: 1892391
Signed-off-by: Andy Ning <andy.ning@windriver.com>
2020-11-16 13:57:27 -05:00
Zuul
3088301a32 Merge "Fix permission of puppet dated directory logs" 2020-11-16 16:30:20 +00:00
Zuul
4590c986ae Merge "Handle hugepages params in grub audit" 2020-11-16 16:25:17 +00:00
Zuul
7f55597d7d Merge "Enable cert-mon in all deployment configurations" 2020-11-16 14:32:31 +00:00
Don Penney
1f7aa66c1d Handle hugepages params in grub audit
The grub cmdline can have multiple hugepages config arguments, in
pairs of hugepagesz and hugepages parameters. This commit extends the
puppet-update-default-grub.sh audit to handle these parameters in
pairs, allowing for multiple pairs.
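
The pairing of hugepagesz and hugepages arguments can be illustrated with a short Python sketch. It is not the logic of puppet-update-default-grub.sh, which is a shell script; the function name parse_hugepage_pairs is hypothetical, and the sketch assumes each hugepagesz argument is followed later on the cmdline by the hugepages argument it belongs to.

```python
from typing import List, Optional, Tuple


def parse_hugepage_pairs(cmdline: str) -> List[Tuple[str, str]]:
    """Collect (hugepagesz, hugepages) pairs in the order they appear."""
    pairs: List[Tuple[str, str]] = []
    pending_size: Optional[str] = None
    for arg in cmdline.split():
        key, _, value = arg.partition("=")
        if key == "hugepagesz":
            # Remember the page size until its matching count arrives.
            pending_size = value
        elif key == "hugepages" and pending_size is not None:
            pairs.append((pending_size, value))
            pending_size = None
    return pairs
```

For a cmdline such as "ro hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=512 quiet", the sketch returns the pairs ("1G", "4") and ("2M", "512").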

Change-Id: Ia3c431f25ae488987245929de7a451aa4a822f06
Closes-Bug: 1901559
Signed-off-by: Don Penney <don.penney@windriver.com>
2020-11-13 12:07:38 -05:00
Carmen Rata
11790c85eb Fix permission of puppet dated directory logs
Changed the permission of puppet dated directory logs,
"/var/log/puppet/<datetime>_<personality>", from 777
to 700, in order to fix openscap security violation.
Verified that installation is successful for AIO-SX
and Standard 2+2 system configurations.

Story: 2008037
Task: 40694

Change-Id: I260c745dd24c33e3d7bd7be403246f0f63bf0894
Signed-off-by: Carmen Rata <carmen.rata@windriver.com>
2020-11-12 09:25:40 -05:00
Takamasa Takenaka
3ca2387ddb Add variables for snmp in fm.conf
Snmp trap client needs the following three variables
to connect to snmp trap server.
- trap_server_ip
- trap_server_port
- snmp_enabled
Modify puppet to add these variables. trap_server_ip
and trap_server_port are fixed. snmp_enabled is
True or False depending on whether the snmp armada app
is applied (True when applied).

Change-Id: Ibedaf772153f49c6dfefe644044da07b5d32bb20
Story: 2008132
Task: 41207
Depends-On: https://review.opendev.org/761213
Signed-off-by: Takamasa Takenaka <takamasa.takenaka@windriver.com>
2020-11-10 10:36:36 -03:00
Sabeel Ansari
b74550897c Enable cert-mon in all deployment configurations
This enables cert-mon service to run on controllers in all
deployment configurations (AIO, Standard, DC etc)

Depends-on: https://review.opendev.org/#/c/760214/

Change-Id: Ic2f793fe392fca21f0dd8eb55f7b5dee90a9a48b
Story: 2007361
Task: 41163
Signed-off-by: Sabeel Ansari <Sabeel.Ansari@windriver.com>
2020-11-04 13:53:20 -05:00
Zuul
72f52d29e6 Merge "Make sure OSDs are placed on nodes" 2020-11-03 21:01:21 +00:00
Dan Voiculeasa
3722db3e54 Make sure OSDs are placed on nodes
With the recent work that added the possibility to take a backup on
controller-1 and restore on controller-0 the [osd.X] entries are removed
from ceph.conf before controller-0 unlock.

An [osd.X] entry must be present in ceph.conf so that a ceph-osd process
is spawned and the directory structure for that osd is created in
/var/lib/ceph/osd/ceph-X.

Mtc runs an initialization script which expects that directory to be
populated, otherwise it fails and places the node in a degraded state.

For AIO systems the [osd] entry was present and populated, but for
STANDARD it was not. This commit ensures the entry is populated for
both AIO and STANDARD systems.

Tested full install + platform B&R on AIO-SX, AIO-DX, STANDARD with
controller storage, all with ceph.

Closes-Bug: 1901636
Change-Id: I669ad3b22f59136253d42cdf6ac603fba31000be
Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>
2020-11-03 18:33:21 +02:00
Zuul
1ff36f9807 Merge "Quote password in ldap command" 2020-11-02 14:48:31 +00:00
Andy Ning
1f859052c6 Quote password in ldap command
Quote the password to the "-w" option of the ldap commands in puppet
for special characters in the password.
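
Why the quoting matters can be shown with a small Python sketch. This is an illustration only, not the puppet change itself: the function name build_ldapsearch_command and the sample bind DN are hypothetical, and shlex.quote stands in for whatever quoting the manifest applies.

```python
import shlex


def build_ldapsearch_command(bind_dn: str, password: str) -> str:
    """Build a shell command line with the -w password safely quoted."""
    # Without quoting, characters such as spaces, '$' or quotes in the
    # password would be interpreted by the shell.
    return "ldapsearch -x -D {} -w {}".format(
        shlex.quote(bind_dn), shlex.quote(password)
    )
```

Splitting the resulting string with shlex.split gives back the original password as a single argument, even when it contains spaces or shell metacharacters.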

Change-Id: I43275bb2323b8525c5c77fe9a69d386190292223
Closes-Bug: 1901228
Signed-off-by: Andy Ning <andy.ning@windriver.com>
2020-10-29 15:03:23 -04:00
Zuul
ba15e899ca Merge "Ensure sriovdp is deleted after dev bindings" 2020-10-28 20:59:08 +00:00
Cole Walker
03324f173c Ensure sriovdp is deleted after dev bindings
This change replaces the daemonset rollout restart command with a more
specific pod delete command that only runs if there is an
sriov-device-plugin pod present on the node. Using the pod delete
command ensures that an existing device-plugin pod is terminated before
the worker manifest completes. The rollout restart command did not
ensure that the pod was terminated before the manifest completed and
could allow user pods to be assigned incorrect VFs if they started up
before the device-plugin pod terminated.

This addresses an issue where pods restarted by k8s-pod-recovery could
be assigned to incorrect VFs if they were started while the
sriov-device-plugin was shutting down. Waiting for the device-plugin
to completely terminate before proceeding with pod-recovery ensures that
the device-plugin will have an accurate view of all device bindings and
can allocate VFs correctly.

Closes-Bug: 1900736

Change-Id: I30fd602208d14ac887d5417fd87f27f23050f670
Co-Authored-By: Steven Webster <steven.webster@windriver.com>
Signed-off-by: Cole Walker <cole.walker@windriver.com>
2020-10-28 16:09:43 -04:00
Zuul
c19067d8e5 Merge "Update grub defaults from manifests" 2020-10-27 12:58:31 +00:00
Don Penney
4913d25a0f Update grub defaults from manifests
This update adds a utility, puppet-update-default-grub.sh, that is
called from the grub.pp and compute.pp manifests when they call grubby
to update the grub cmdline. This ensures that the changes being made
via grubby will have corresponding changes reflected in the grub
defaults file, so that added options are not lost if the grub cmdline
is updated via calls to grub2-mkconfig.

Change-Id: Ibf74ddf9d0445e92dd8903c67e95901d856ad373
Closes-Bug: 1901559
Signed-off-by: Don Penney <don.penney@windriver.com>
2020-10-27 12:21:27 +00:00
Zuul
4a2b58a5d2 Merge "Remove admin_token from keystone config" 2020-10-27 12:10:53 +00:00
Andy Ning
12cce45b1d Remove admin_token from keystone config
Currently the admin_token is still set with a value in keystone.conf
though it is disabled after bootstrap and no longer in use. This update
removes it during controllers unlocking as a security enhancement.

This update also fixes a ceph issue that would be triggered by the
above change and cause ceph.pp to generate ceph.conf incorrectly
due to out-of-order resource creation.

Change-Id: I4093bca40fad3724e89d902aae36d26f85aebd60
Closes-Bug: 1900726
Signed-off-by: Andy Ning <andy.ning@windriver.com>
2020-10-23 13:38:05 -04:00
Zuul
a36944dbc6 Merge "Update apiserver certificate's SANs when OAM IP change" 2020-10-23 14:04:58 +00:00
Zuul
109edae9d2 Merge "Replace containerd Sysinv credentials with mtce credentials" 2020-10-20 00:56:56 +00:00
Andy Ning
62a5358753 Update apiserver certificate's SANs when OAM IP change
When the bootstrap manifest is applied the system adds any OAM IP
addresses to the apiserver's certificate SAN list. This is used for
remote kubectl access. However when the OAM IP address is changed,
these IP values are not updated. Without the correct values in the
apiserver cert, remote access will fail.

This update introduces a kubernetes certsans runtime puppet manifest
which will be applied during OAM IP change process to update apiserver's
cert SANs list with the new IPs.

Change-Id: Iedf35ddaedef5cae2e81941446fc6a8de39639f6
Closes-Bug: 1878451
Signed-off-by: Andy Ning <andy.ning@windriver.com>
2020-10-19 16:49:55 -04:00
Jerry Sun
0405a5529d Replace containerd Sysinv credentials with mtce credentials
Sysinv credentials in the containerd config allowed kubernetes to
deploy images without pull secrets. We replace the credentials with
"mtce" user's credentials. The "mtce" user is treated as a public
user and is not allowed to deploy non-public images.

Closes-Bug: 1894930
Depends-On: https://review.opendev.org/756557

Change-Id: I4a33c6aba50d98d42ef91c75bfc9c148d4ebd9fd
Signed-off-by: Jerry Sun <jerry.sun@windriver.com>
2020-10-19 11:05:08 -04:00
Tee Ngo
9c25d435a3 Increase max_pool_size for dc audits
Increase max_pool_size for dcorch and dcmanager audits to avoid
database thrashing with connect/disconnect requests resulting in
sharp CPU spike caused by postgres on every dcorch/dcmanager audit
cycle. The CPU spike is magnified when both dcorch and dcmanager
audits happen to run at the same time, which can impact
resource-intensive operations such as batch subcloud deployment. Low
max_pool_size setting makes sense for on-demand services such as
fm, not for services that perform regular audits.

These settings will be re-assessed and adjusted when all DC
scalability related features are complete.

Closes-Bug: 1895605
Change-Id: I138faa640933bd255d7ae90d3388733f35431e4d
Signed-off-by: Tee Ngo <Tee.Ngo@windriver.com>
2020-10-06 21:35:26 -04:00
Zuul
dd34b088d9 Merge "Fix route config handling for DOR" 2020-09-22 16:40:48 +00:00
Zuul
87a2ed8f28 Merge "Move dcmanager orchestration to a separate process" 2020-09-16 12:04:51 +00:00
Don Penney
4d81ca1780 Fix route config handling for DOR
In a DOR (Dead Office Recovery, where all nodes reboot at once), both
controllers are rebooting at the same time. This means that there is
no active controller from which to retrieve puppet data in order to
apply the controller manifests. As such, we have to be careful not to
rely on having the controller manifests run on every controller boot,
for things like launching services or any sort of changes in a
volatile file system like /var/run.

The route configuration optimization changes that were added in
https://review.opendev.org/703034 inadvertently relied on existing
network configuration data being cached in /var/run, however. As a
result, a route configuration change after a DOR of a system with
standard controllers would end up running with no cached network
config data (an AIO system would have this data generated as part of
applying the worker manifest), and the apply_network_config.sh utility
would think that all network interfaces have been removed from the
system. It would then proceed to apply that config, deleting the
interfaces and taking down all networking.

This commit enhances the apply_network_config.sh to introduce a
--routes option to separate the route configuration operations from
the rest of the networking config. When a route is added or deleted,
then, only the route config changes are processed, ignoring network
interfaces.

Additionally, this adds a check in the interface section of the
apply_network_config.sh utility to verify that at least 'lo' exists.
Since the loopback interface should always exist, its absence would
indicate that the interface config data is missing or corrupted, and
is unsafe to apply.
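
The loopback safety check can be sketched in Python. This is a hedged illustration rather than the code in apply_network_config.sh, which is a shell utility: the function name interfaces_to_remove and both parameters are hypothetical.

```python
from typing import Iterable, List


def interfaces_to_remove(current: Iterable[str],
                         configured: Iterable[str]) -> List[str]:
    """Return interfaces present on the system but absent from the config."""
    configured_set = set(configured)
    if "lo" not in configured_set:
        # Loopback should always exist; its absence suggests the config
        # data is missing or corrupted, so refuse to apply it.
        raise ValueError("interface config lacks 'lo'; refusing to apply")
    return sorted(set(current) - configured_set)
```

With empty config data, as in the DOR case described above, the sketch raises an error instead of reporting every interface as removed.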

Change-Id: I5583ec916aee8117e5686cfb10fb18ddda4806b1
Closes-Bug: 1895693
Signed-off-by: Don Penney <don.penney@windriver.com>
2020-09-15 12:28:30 -04:00
Zuul
3d3dde9caf Merge "Periodic message loss between VIM and Openstack pods" 2020-09-10 19:02:08 +00:00
Zuul
11de7bf2f8 Merge "Enable StarlingX in QEMU/KVM VM with small disk" 2020-09-10 15:48:55 +00:00