535 Commits

Author SHA1 Message Date
Al Bailey
e472d61af3 Add a .gitreview file to the new repo
This file is needed in order for people cloning the repo
to be able to initialize it for gerrit by the
"git review -s" command

Change-Id: If0c791896250519def25149dae3e077e689a054d
Story: 2006166
Task: 36530
Signed-off-by: Al Bailey <Al.Bailey@windriver.com>
2019-09-09 09:45:59 -05:00
Scott Little
0823f11af6 Config file changes to add 'puppet-manifests modules/puppet-dcdbsync modules/puppet-dcmanager modules/puppet-dcorch modules/puppet-fm modules/puppet-mtce modules/puppet-nfv modules/puppet-patching modules/puppet-smapi modules/puppet-sshd modules/puppet-sysinv ' after relocation from 'stx-config'
Story: 2006166
Task: 35687
Change-Id: Ic77790c76bb3e4ba0775cf5c1f09c37ae2ca99a3
Signed-off-by: Scott Little <scott.little@windriver.com>
Depends-On: I1c3caaf89793c4fdaf97f3232fe1704cd43f5dfd
2019-09-04 11:07:06 -04:00
Zuul
b3cd00d458 Merge "Enable kubernetes SCTPSupport feature"
Change-Id: I1c3caaf89793c4fdaf97f3232fe1704cd43f5dfd
Signed-off-by: Scott Little <scott.little@windriver.com>
2019-09-04 10:10:55 -04:00
Zuul
5243650a75 Merge "Puppet support for authenticated registries" 2019-09-03 19:37:20 +00:00
Al Bailey
4ee0a2fdc4 Enable kubernetes SCTPSupport feature
The feature gate for sctp support in apiserver was added in
kubernetes 1.12 but is disabled by default.  This commit enables it.

Information about SCTP is here:
https://kubernetes.io/docs/concepts/services-networking/service/#sctp

The centos version of netcat can be used to validate the feature.
A Dockerfile for building a centos netcat is provided.

Tested by:

kubectl run --generator=run-pod/v1 --image netcat:v1.0.0 \
    listen-sctp -it --rm -- --sctp -l -p 9000

(get IP of the listener pod)
kubectl run --generator=run-pod/v1 --image netcat:v1.0.0 \
   test-sctp -it --rm -- --sctp <listener pod IP> 9000

Change-Id: I9642e485cb9c30f6b1272c00ec1046b9c98211ac
Story: 2006472
Task: 36403
Signed-off-by: Al Bailey <Al.Bailey@windriver.com>
2019-09-03 19:23:05 +00:00
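As a rough illustration of the SCTPSupport change above (4ee0a2fdc4): Kubernetes components take feature gates as a comma-separated list of `Name=bool` pairs on the `--feature-gates` flag. The sketch below only renders that value; the helper names are hypothetical, and how the puppet manifests actually pass the gate to the apiserver is not shown in the commit message.

```python
def format_feature_gates(gates):
    """Render {"GateName": bool} as the value of a --feature-gates flag."""
    # Sort for a stable, diff-friendly ordering; booleans become true/false.
    return ",".join(
        f"{name}={str(enabled).lower()}" for name, enabled in sorted(gates.items())
    )


def feature_gates_flag(gates):
    """Hypothetical helper: build the full command-line flag."""
    return f"--feature-gates={format_feature_gates(gates)}"


if __name__ == "__main__":
    print(feature_gates_flag({"SCTPSupport": True}))
```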
Angie Wang
70917e77cf Puppet support for authenticated registries
This commit adds support for pulling images from the alternative
authenticated registries configured at Ansible bootstrap, so that k8s
pods can be brought up at puppet time.

At bootstrap time, barbican secrets are created to store the credentials
for accessing the registries, and the alternative registry info is stored
in service parameters. At puppet time, the barbican secret is retrieved
to get the credentials needed to pre-pull the k8s images that kubeadm
requires to bring up static pods (kube-controller-manager,
kube-apiserver, kube-scheduler, ...).

The images for dynamic pods (kube-multus, kube-sriov-cni, calico, ...) and
tiller do not need to be pre-pulled; imagePullSecrets is added to their
pod specs to pass credentials to kubelet. This is done in Ansible
bootstrap: https://review.opendev.org/#/c/679136/

This commit also pulls the Armada image before creating the Armada
container if the image is not available in the docker cache.

Tests(AIO-SX, AIO-DX, Standard):
 - All types of system are installed successfully
 - Verified all k8s/gcr/docker images are downloaded from
   authenticated registry on controller-1 and worker nodes
 - Verified images from authenticated registries are used
   by k8s static/dynamic pods on controller-1 and worker nodes
 - Swact to controller-1, lock/unlock controller-0. Verified
   that tiller image is downloaded from authenticated registry
   and tiller pod is created on controller-1
 - Swact to controller-1, apply application. Verified that
   Armada image is downloaded from authenticated registry and
   Armada container is created.

Change-Id: Iaabef0f5d8a6a4640dcfde93a8c0449948f4a59f
Depends-On: https://review.opendev.org/679335
Story: 2006274
Task: 36379
Signed-off-by: Angie Wang <angie.wang@windriver.com>
2019-08-30 18:17:02 +00:00
Zuul
09bdac1e95 Merge "Update kubernetes config for 1.15 features." 2019-08-22 15:38:04 +00:00
Al Bailey
319a388fd2 Update kubernetes config for 1.15 features.
Upgrading from kubernetes 1.13.5 to 1.15.0 meant the config
needed to be updated to handle whatever was deprecated or dropped
in 1.14 and 1.15.

1) Removed "ConfigMapAndSecretChangeDetectionStrategy = Watch"
reported by https://github.com/kubernetes/kubernetes/issues/74412
because this was a golang deficiency, and is fixed by the newer
version of golang.

2) Enforced the kubernetes 1.15.3 version

3) Updated v1alpha3 to v1beta2, since alpha3 was dropped in 1.14.
Changed fields for beta1 and beta2 are mentioned in these docs:
https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta1
https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2

4) cgroup validation checking now includes the pids subfolder.

5) Update ceph-config-helper to be compatible with kubernetes v1.15.
This means that the stx-openstack version check needed to be increased.

Change-Id: Ibe3d5960c5dee1d217d01fbb56c785581dd1b42c
Story: 2005860
Task: 35841
Depends-On: https://review.opendev.org/#/c/671150
Signed-off-by: Al Bailey <Al.Bailey@windriver.com>
2019-08-21 10:30:56 -05:00
Zuul
42fad74699 Merge "Support Single huge page size for openstack worker node" 2019-08-20 20:46:59 +00:00
Tao Liu
2265ba2a65 Support Single huge page size for openstack worker node
Kubernetes only supports a single huge page size per worker
node. Prior to kubernetes 1.15, the huge page feature could
be disabled via a feature gate. In kubernetes 1.15, the
feature gate has been removed so huge page support is always
on in k8s.

This update removes the conditional disabling of the hugepage
feature and enforces the provisioning of a single page size
per worker.

When the vswitch type is set to ovs-dpdk or avs, the application
huge page size follows the vswitch huge page size.

This update also changes the auto-provisioning of VM huge
pages to 1G, as there is no auto-provisioning in a virtual
environment.

Story: 2006295
Task: 36006

Change-Id: I84d4959b420584fdcdf8a8664a6f4855c08ec989
Signed-off-by: Tao Liu <tao.liu@windriver.com>
2019-08-20 16:04:25 -04:00
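The single-page-size rule described in 2265ba2a65 can be sketched as a small validation step. This is a minimal sketch under stated assumptions: the function name and the input shape (a mapping of page size in MiB to requested page count for one worker) are hypothetical and are not taken from sysinv.

```python
def validate_single_hugepage_size(pages_by_size_mib):
    """Return the one huge page size (MiB) in use on a worker, or None.

    pages_by_size_mib maps a page size in MiB (e.g. 2 or 1024) to the
    number of pages requested. Raises ValueError if more than one size
    has a non-zero page count, mirroring the one-size-per-worker rule.
    """
    in_use = sorted(size for size, count in pages_by_size_mib.items() if count > 0)
    if len(in_use) > 1:
        raise ValueError(
            f"only one huge page size is allowed per worker, got: {in_use}"
        )
    return in_use[0] if in_use else None


if __name__ == "__main__":
    print(validate_single_hugepage_size({2: 0, 1024: 4}))
```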
Zuul
af1d792ae2 Merge "Rebase Armada to latest master" 2019-08-19 15:12:31 +00:00
Robert Church
c5e453ecd7 Rebase Armada to latest master
Rebasing Armada to use the latest docker image tag
8a1638098f88d92bf799ef4934abe569789b885e-ubuntu_bionic.

Change-Id: Ic48a2e053d0de7dacfd6a07d817947e11dc8d596
Story: 2006347
Task: 36105
Signed-off-by: Robert Church <robert.church@windriver.com>
2019-08-15 16:54:51 -04:00
Zuul
9acef3ca9b Merge "Multus support for IPv6 service endpoint" 2019-08-13 15:58:39 +00:00
Steven Webster
ac0837316b Multus support for IPv6 service endpoint
The K8s service host in the Multus kubeconfig file is currently
not wrapped with [brackets] in the case an IPv6 cluster service
endpoint has been configured.

This causes issues for Multus when it attempts to get (curl) the
address.

This fix ensures the IPv6 address is properly formatted for use
by Multus.

Closes-Bug: 1836972

Change-Id: I803edfb86a70d232d6015a7bb130da0756a56458
Signed-off-by: Steven Webster <steven.webster@windriver.com>
2019-08-12 18:15:37 -05:00
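The formatting rule behind ac0837316b is that a bare IPv6 literal must be wrapped in brackets before a port is appended to it in a URL. A minimal sketch using Python's standard `ipaddress` module follows; the function names are illustrative, and the actual change lives in the puppet template for the Multus kubeconfig, not in Python.

```python
import ipaddress


def format_service_host(host):
    """Wrap a bare IPv6 literal in brackets; leave IPv4 and hostnames alone."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: a hostname, or an address that is already bracketed.
        return host
    return f"[{host}]" if addr.version == 6 else host


def service_url(host, port):
    """Build an https URL that stays valid for IPv6 service endpoints."""
    return f"https://{format_service_host(host)}:{port}"


if __name__ == "__main__":
    print(service_url("fd04::1", 443))
    print(service_url("10.96.0.1", 443))
```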
Zuul
af56daf6c9 Merge "Collapse glance into platform in ControllerFS API" 2019-08-09 17:59:52 +00:00
Zuul
e4fa3a693d Merge "Restart Docker After Changing Proxy Settings" 2019-08-08 15:03:18 +00:00
Jerry Sun
addb118eb4 Restart Docker After Changing Proxy Settings
Restart Docker process after changing proxy settings through service
parameters. There is a potential issue currently where Docker is
started before the changes to proxy settings through service parameter
is applied. This means on lock/unlock, Docker restarts with old
proxy settings. This commit fixes that issue.

Closes-Bug: 1838651

Change-Id: I57e527998fdf50c4be38c32ea8d1ee95bc46d3ff
Signed-off-by: Jerry Sun <jerry.sun@windriver.com>
2019-08-08 09:54:01 -04:00
Kristine Bujold
4def67f768 Collapse glance into platform in ControllerFS API
The existing "platform" filesystem is now resizable and added to
the ControllerFS API. The "glance" filesystem is merged into
"platform" and therefore removed from the ControllerFS API. The
"--force" flag is removed from the controllerfs-modify API as
it was only used for glance fs resizing.

The folder /opt/cgcs is removed and the "helm_charts" and "keystone"
folders now reside under /opt/platform.

  ls /opt/platform/
  armada  config  helm  nfv  puppet  sysinv

  ls /opt/cgcs/
  helm_charts  keystone

Resources related to drbd-cgcs and /opt/cgcs are removed from puppet
or updated to use drbd-platform and /opt/platform.

SM is no longer monitoring resources related to drbd-cgcs.

Tested in AIO-SX, AIO-DX and Standard hardware labs.

Partial-Bug: 1830142

Change-Id: I0a80c95a057e9d6d2acec5f33cc4da31cd20955e
Signed-off-by: Kristine Bujold <kristine.bujold@windriver.com>
2019-08-07 11:08:36 -04:00
Zuul
90d4e92b14 Merge "Configure radosgw and ceph-rgw as optional services" 2019-08-02 16:04:48 +00:00
Robert Church
f33d42f3eb Configure radosgw and ceph-rgw as optional services
radosgw is now an optional platform service which is provisioned via a
system service parameter. To align with this optionality, the ceph-rgw
chart which is used to enable the containerized swift endpoints also
becomes optional.

Changes include:
- Update the stx-openstack application disabled_charts setting in the
  application metadata.yaml to include the ceph-rgw chart. This sets the
  initial chart state to disabled.
- Optimize ceph.pp puppet manifests to provide two runtime classes: one
  for setting up the platform radosgw configuration which will set the
  haproxy configuration and the other for updating the keystone
  information in the ceph configuration based on whether the ceph-rgw
  chart is enabled.
- Update the sm.pp manifest to dynamically provision/deprovision the
  radosgw based on whether it's enabled in the service parameters
- Rename the SWIFT service parameters to RADOSGW as this is the platform
  service being enabled.
- Restructure ceph.py/ceph.pp to generate and use hieradata such that
  _revert_cephrgw_config() and _update_cephrgw_config() can be combined
  into a single function for runtime updates.

Change-Id: Id8d5c6b1159881d44810fc3622990456f1e54e75
Depends-On: If284f622ceac48c4ffd74e7022fdd390971d0fd8
Partial-Bug: #1833738
Signed-off-by: Robert Church <robert.church@windriver.com>
2019-07-31 12:41:41 -04:00
Zuul
eb61b04a0d Merge "Set kubelet certificate rotation to 1 month" 2019-07-30 20:53:19 +00:00
Zuul
a0bbe59d65 Merge "Adding back kvm_advance_timer service" 2019-07-30 20:33:36 +00:00
David Sullivan
96ce7aeedb Set kubelet certificate rotation to 1 month
Use the experimental-cluster-signing-duration parameter to set the
kubelet certificate to expire after 1 month. Kubelet certificate
rotation is enabled by default.

Closes-Bug: 1834685
Change-Id: Ie5b91a86c1a1b536e51719dad99be0cc89d65722
Signed-off-by: David Sullivan <david.sullivan@windriver.com>
2019-07-30 15:11:36 -04:00
Zuul
626833df44 Merge "Platform restore for AIO-DX and Standard no-storage configuration" 2019-07-29 14:20:02 +00:00
Al Bailey
942d4f7e56 Adding back kvm_advance_timer service
On compute nodes with openstack-compute label, the
kvm_timer_advance_setup.service should be enabled.

The puppet service runs before kubelet.

Change-Id: I84d6c6234d4bd1c8c0c52f5735d7520377b2fe80
Partial-Bug: 1823751
Depends-On: https://review.opendev.org/#/c/672124
Signed-off-by: Al Bailey <Al.Bailey@windriver.com>
2019-07-26 15:54:52 -05:00
Zuul
8fbad03ec0 Merge "Add OpenID connect params to kubeadm" 2019-07-26 19:02:52 +00:00
Jerry Sun
9b33c49b49 Add OpenID connect params to kubeadm
This commit modifies the kubeadm template to support OpenID connect
params.

Depends-On: https://review.opendev.org/671259

Story: 2006235
Task: 35836

Change-Id: I38f736aa68f9c0031ed697cdf17cd28ed08cadf6
Signed-off-by: Jerry Sun <jerry.sun@windriver.com>
2019-07-26 10:59:05 -04:00
Wei Zhou
920ba4111d Platform restore for AIO-DX and Standard no-storage configuration
This commit is to support platform restore for AIO-DX and Standard
no-storage configuration using restore_platform playbook:
 - For AIO-DX, the restored ceph crushmap is loaded through puppet
   when controller-0 is unlocked for the first time. OSDs are
   created on controller nodes during controller unlock.
 - For Standard no-storage configuration, the restored ceph crushmap
   is loaded through sysinv when ceph quorum is formed. OSDs are
   created on controller nodes by applying ceph osd runtime manifests.
 - The .restore_in_progress flag file is removed as part of first
   unlock of controller-0.

Change-Id: I65bfc67cf90e894d125eb6c860139b26d17b562e
Story: 2004761
Task: 35965
Signed-off-by: Wei Zhou <wei.zhou@windriver.com>
2019-07-25 22:30:50 -04:00
Zuul
c929ec03b1 Merge "Restart collectd at the end of configuring cpu" 2019-07-25 19:17:01 +00:00
Zuul
40194db138 Merge "Add customer-specified certificates for kubernetes" 2019-07-25 18:12:47 +00:00
Bin Qian
69747791f5 Restart collectd at the end of configuring cpu
Restart collectd after configuring cpu to ensure collectd loads
updated configuration

Closes-Bug: 1837424
Change-Id: I10e0f431dfd01637f38319d506559aa3927f11ff
Signed-off-by: Bin Qian <bin.qian@windriver.com>
2019-07-25 14:03:34 -04:00
Zuul
a25e983219 Merge "Revert "Revert "Changing tiller pod networking settings to improve swact time""" 2019-07-25 16:16:52 +00:00
Bart Wensley
5dba850125 Revert "Revert "Changing tiller pod networking settings to improve swact time""
This reverts commit a5c236dc522c050b036e638955c03074a2963996.

It was thought that setting the TCP timeouts for the cluster
network was enough to address the issues with the helm commands
hanging after a controller swact. This is not the case. In
particular, swacting away from the controller with the
tiller-deploy pod seems to cause the TCP connection from that pod to
the kube-apiserver to hang. Putting the tiller-deploy pod back on
the host network "fixes" the issue.

Change-Id: I8f37530e1f615afcffcf6cb1d629518436c99cb9
Related-Bug: 1817941
Partial-Bug: 1837055
Signed-off-by: Bart Wensley <barton.wensley@windriver.com>
2019-07-25 09:30:00 -05:00
Zuul
e8865e6cfc Merge "Add floating ip for ironic network" 2019-07-24 14:34:42 +00:00
Mingyuan Qi
d3ee3c4dc0 Add floating ip for ironic network
This commit adds floating IP support when an ironic network is
created and an interface is assigned to that network. The ironic
floating IP lets ironic nodes access openstack services. It is an
HA feature for ironic when 2 controllers are deployed.

Story: 2004760
Task: 34740
Depends-On: https://review.opendev.org/669781
Change-Id: I55681abfee700dcf7036503d1490accc413b84c4
Signed-off-by: Mingyuan Qi <mingyuan.qi@intel.com>
2019-07-24 10:22:14 +08:00
David Sullivan
e29cecafb6 Add customer-specified certificates for kubernetes
We need the ability to update the Kubernetes ApiServer RootCA at
ansible-bootstrap-time. This includes the ability to specify the
apiServerCertSANs so that the user can supply additional
DNS:<FQDN> and/or IP records for the auto-generated
apiServerCertificate.

This adds support for storing the apiServerCertSANs in the sysinv
database and modifies the puppet manifest to support user-supplied SAN
records.

Partial-Bug: 1837079
Change-Id: I4d23828b31ced55d55b1c6932d0cfd6b59727288
Signed-off-by: David Sullivan <david.sullivan@windriver.com>
2019-07-22 15:11:52 -04:00
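For the SAN handling described in e29cecafb6, user-supplied entries are either DNS names or IP addresses. The sketch below shows one way such a list could be split and normalised before it is rendered into a certificate request. The function name and the input format (a flat list of strings) are assumptions for illustration, not the sysinv or puppet implementation.

```python
import ipaddress


def classify_sans(entries):
    """Split user-supplied SAN strings into (dns_names, ip_addresses)."""
    dns_names, ip_addrs = [], []
    for entry in entries:
        entry = entry.strip()
        if not entry:
            continue  # ignore blank entries
        try:
            # Valid IP literals are stored in their canonical text form.
            ip_addrs.append(str(ipaddress.ip_address(entry)))
        except ValueError:
            # Anything else is treated as a DNS name; DNS is case-insensitive.
            dns_names.append(entry.lower())
    return dns_names, ip_addrs


if __name__ == "__main__":
    print(classify_sans(["k8s.example.com", "10.10.10.3", " FD00::2 "]))
```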
Zuul
f473815937 Merge "Restore containerized platform using Ansible restore_platform playbook" 2019-07-19 22:27:27 +00:00
Zuul
438ed57a4a Merge "Zero Touch Provisioning changes for subcloud configuration" 2019-07-18 16:10:41 +00:00
Zuul
574dba4853 Merge "Add new kubelet filesystem to host_fs" 2019-07-18 14:41:28 +00:00
Don Penney
6254928d6f Fix barbican-api.log rotation issue
The barbican-api process currently writes directly
to its logfile. As such, the logrotate config file
needs a copytruncate directive to ensure the process
doesn't end up writing to the rotated file instead.

Change-Id: I60c8a08ce612fd7f82e05f69b168919b12ab0017
Partial-Bug: 1836632
Signed-off-by: Don Penney <don.penney@windriver.com>
2019-07-17 18:19:50 -04:00
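The reasoning in 6254928d6f can be reproduced in a few lines: a process that keeps its log file open follows the file when it is renamed, so rename-style rotation leaves it writing to the rotated copy, while copy-then-truncate keeps it on the live file. The sketch below assumes a POSIX system and a writer that opens its log in append mode; the file names are illustrative.

```python
import os
import shutil
import tempfile


def rotate_by_rename(log_path):
    """Default-style rotation: move the file aside and create a fresh one."""
    rotated = log_path + ".1"
    os.rename(log_path, rotated)
    open(log_path, "w").close()
    return rotated


def rotate_by_copytruncate(log_path):
    """copytruncate-style rotation: copy the contents, then truncate in place."""
    rotated = log_path + ".1"
    shutil.copyfile(log_path, rotated)
    os.truncate(log_path, 0)
    return rotated


def demo(rotate):
    """Return (live_log_contents, rotated_log_contents) after one rotation."""
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "api.log")
        writer = open(log_path, "a")  # long-lived handle, as a daemon would hold
        writer.write("before\n")
        writer.flush()
        rotated = rotate(log_path)
        writer.write("after\n")  # written through the handle opened earlier
        writer.flush()
        writer.close()
        with open(log_path) as live, open(rotated) as old:
            return live.read(), old.read()


if __name__ == "__main__":
    print("rename:      ", demo(rotate_by_rename))
    print("copytruncate:", demo(rotate_by_copytruncate))
```

With rename-style rotation the "after" line ends up in the rotated file; with copytruncate it lands in the live log, which is the behaviour the commit is after.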
Wei Zhou
fff6b1a9de Restore containerized platform using Ansible restore_platform playbook
This commit is to support platform restore for AIO-SX using
restore_platform playbook:
 1. During AIO-SX restore, the restored ceph crushmap is loaded through
    puppet.
 2. Bypass vim when unlocking controller-0 for the first time.
 3. When unlocking controller-0 for the first time, app_reapply is
    skipped for stx-openstack application.
 4. After controller-0 is unlocked, ceph backend task is set to None.

Change-Id: I36d27b162334e5a2f0371793243f2301b5fec1eb
Story: 2004761
Task: 33645
Signed-off-by: Wei Zhou <wei.zhou@windriver.com>
2019-07-17 17:10:12 -04:00
Kristine Bujold
886f0a9718 Add new kubelet filesystem to host_fs
Add a new filesystem called "kubelet" to all hosts with a default
size of 10G. This new fs will be managed by the host_fs API.

Also made the scratch filesystem resizable on all hosts.

Tested with install of hardware Standard and AIO-DX labs. Also
tested install of a vbox AIO-SX lab.

Partial-Bug: 1830142
Depends-On: https://review.opendev.org/671120

Change-Id: I968f84b8ba7a069ec3d7027d4eb4a7355a06d9d3
Signed-off-by: Kristine Bujold <kristine.bujold@windriver.com>
2019-07-17 16:18:36 -04:00
Zuul
0f93f3e269 Merge "AIO reaffine tasks and k8s-infra during startup" 2019-07-16 19:04:34 +00:00
Zuul
d096b9a685 Merge "Revert "Changing tiller pod networking settings to improve swact time"" 2019-07-16 17:40:22 +00:00
Jim Gauld
9e7170c9c1 AIO reaffine tasks and k8s-infra during startup
This update reimplements the affine-tasks init script and service to
dynamically reaffine tasks and k8s-infra cgroup cpuset on AIO nodes.
This accommodates CPU intensive phases of work. Tasks are initially
allowed to float across all cores. Once the system is at steady state,
this will ensure that K8S pods are constrained to platform cores and
do not run on cores with VMs/containers.

This will speed up the first stx-application apply, as well as pod
recovery after lock/unlock, reboot, and controller swact.

This script waits forever for sufficient platform readiness criteria
(e.g., system critical pods are recovered, critical openstack pods
are running, nova-compute pod is running) before reaffining back
to platform cores.

This corrects the pod affinity problem seen on AIO, introduced by the
fix for bug 1826592, commit e513baad44181f667085886007632d0ebf79eeb0,
i.e., that fix allowed the AIO to not time out, but left pods floating.

Change-Id: Ic257378eac451904a200a0f2e79f7bc4f8373009
Partial-Bug: 1832781
Signed-off-by: Jim Gauld <james.gauld@windriver.com>
2019-07-16 12:46:30 -04:00
Bart Wensley
061ee1ebc0 Revert "Changing tiller pod networking settings to improve swact time"
This reverts commit 4802f1d96a1217124e39a057fd7a05e22177b81c.
The change made is no longer necessary due to commit 9a4b6b6a.
The playbookconfig code was moved to the ansible-playbooks repo
and will be removed there.

Conflicts:
	playbookconfig/centos/build_srpm.data
	playbookconfig/playbookconfig/playbooks/bootstrap/roles/bringup-essential-services/tasks/bringup_helm.yml
	puppet-manifests/centos/build_srpm.data
	puppet-manifests/src/modules/platform/manifests/helm.pp

Change-Id: I20a38c1ad882bebb6e1208f43d6582bc399e9e87
Related-Bug: 1817941
Signed-off-by: Bart Wensley <barton.wensley@windriver.com>
2019-07-16 10:55:23 -05:00
Zuul
1071020a93 Merge "Fix domain setting for Barbican during bootstrap" 2019-07-16 14:00:09 +00:00
Alex Kozyrev
06bb9b3245 Fix domain setting for Barbican during bootstrap
Barbican returns "503 Service Unavailable" during the bootstrap
phase of StarlingX. This happens because the Keystone auth token
lacks domain details for Barbican. The fix explicitly specifies
project_domain_name and user_domain_name in the Barbican config.

Change-Id: I4bf6b275c1eb271b62a2e7a1bc72c049f193afc4
Closes-bug: 1834670
Signed-off-by: Alex Kozyrev <alex.kozyrev@windriver.com>
2019-07-15 10:53:42 -04:00
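To make the Barbican fix above (06bb9b3245) concrete, the sketch below sets the two domain options in an INI-style service config using Python's standard `configparser`. The section name `keystone_authtoken` and the value `Default` are assumptions for illustration; the commit message names only the two option keys, and the real change is made through puppet rather than a script like this.

```python
import configparser
import io


def set_domain_options(conf_text, section="keystone_authtoken", domain="Default"):
    """Return conf_text with project/user domain names set in `section`.

    The section and domain defaults are assumed values, not taken from
    the commit.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, "project_domain_name", domain)
    parser.set(section, "user_domain_name", domain)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    print(set_domain_options("[keystone_authtoken]\nusername = barbican\n"))
```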
Tyler Smith
cf2d41d0bf Zero Touch Provisioning changes for subcloud configuration
- Cleaning up old RegionOne endpoints during runtime manifest apply
- Configuring dcdbsync endpoints in subclouds

Depends-On: https://review.opendev.org/#/c/670321/
Change-Id: I14729b579646aab9acecc8a953513b87b16363d2
Story: 2004766
Task: 35756
Signed-off-by: Tyler Smith <tyler.smith@windriver.com>
2019-07-11 13:32:23 -04:00
Zuul
dced0b291a Merge "ANSIBLE Bootstrap changes for System Controller" 2019-07-11 17:29:48 +00:00