Updates for Rook 1.6.2 and Ceph 15.2.11

This PS updates the Rook YAML files to version v1.6.2. It also upgrades Ceph to v15.2.11 and Ceph-CSI to v3.3.1.

Rook v1.6 provides several features the storage team wants:

* The operator can upgrade multiple OSDs in parallel
* LVM is no longer used to provision OSDs by default
* Monitor failover can be disabled if needed
* Operator support for Ceph Pacific (v16)
* Ceph 15.2.11 by default
* The CephClient CRD is standardized on the controller-runtime library (kubebuilder)

https://github.com/kubernetes-sigs/controller-runtime

* Pod Disruption Budgets are enabled by default

https://github.com/rook/rook/blob/master/design/ceph/ceph-managed-disruptionbudgets.md

More notes:

* There are many indentation changes in common.yaml
* operator.yaml now has a variable, CSI_ENABLE_HOST_NETWORK, for enabling host networking for the CSI pods. The default is to use the host network.

* CSI image updates:

ROOK_CSI_CEPH_IMAGE: "quay.io/cephcsi/cephcsi:v3.3.1"
ROOK_CSI_SNAPSHOTTER_IMAGE: "k8s.gcr.io/sig-storage/csi-snapshotter:v4.0.0"

* crds.yaml has a very large update, largely due to the switch to controller-runtime.

* Ceph 15.2.11 is required to address CVE-2021-20288

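Because the CSI pods now use the host network by default, opting out would mean setting the new variable to "false" in the operator ConfigMap. A minimal sketch, assuming the ConfigMap is named rook-ceph-operator-config in the rook-ceph namespace as in the upstream example operator.yaml; adjust both names to match the actual deployment:

```yaml
# Assumed name and namespace: rook-ceph-operator-config in rook-ceph.
apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-ceph-operator-config
  namespace: rook-ceph
data:
  # "true" (the default) runs the CSI CephFS and RBD nodeplugins on the host network;
  # "false" keeps them on the pod network instead.
  CSI_ENABLE_HOST_NETWORK: "false"
```
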
Change-Id: I5cf0cf63bfcf4b0ea1d242d6eae2f53adda7be5e
Frank Ritchie 2021-05-11 13:56:37 -04:00 committed by Alexey
parent 2946a13806
commit e7130f4301
12 changed files with 8179 additions and 1824 deletions

@@ -27,7 +27,6 @@ spec:
preserveFilesystemOnDelete: true
# The metadata service (mds) configuration
metadataServer:
# The affinity rules to apply to the mds deployment
placement:
# nodeAffinity:
@@ -72,4 +71,3 @@ spec:
# A key/value list of labels
labels:
# key: value

@@ -58,7 +58,7 @@ spec:
# quota in bytes and/or objects, default value is 0 (unlimited)
# see https://docs.ceph.com/en/latest/rados/operations/pools/#set-pool-quotas
# quotas:
# maxSize: "10Gi" # valid suffixes include K, M, G, T, P, Ki, Mi, Gi, Ti, Pi
# maxSize: "10Gi" # valid suffixes include k, M, G, T, P, E, Ki, Mi, Gi, Ti, Pi, Ei
# maxObjects: 1000000000 # 1 billion objects
# A key/value list of annotations
annotations:

@@ -8,5 +8,5 @@ spec:
replicated:
size: 2
quotas:
maxSize: "10Gi" # valid suffixes include K, M, G, T, P, Ki, Mi, Gi, Ti, Pi
maxSize: "10Gi" # valid suffixes include k, M, G, T, P, E, Ki, Mi, Gi, Ti, Pi, Ei
maxObjects: 1000000000 # 1 billion objects

@@ -8,6 +8,5 @@ spec:
replicated:
size: 3
quotas:
maxSize: "0" # valid suffixes include K, M, G, T, P, Ki, Mi, Gi, Ti, Pi, eg: "10Gi"
# "0" means no quotas. Since rook 1.5.9 you must use string as a value's type
maxSize: "0" # e.g. "10Gi" - valid suffixes include k, M, G, T, P, E, Ki, Mi, Gi, Ti, Pi, Ei
maxObjects: 0 # 1000000000 = billion objects, 0 means no quotas

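For reference, a minimal CephBlockPool with a quota might look like the following sketch. The pool name replicapool and the rook-ceph namespace are illustrative assumptions; the point is that maxSize is written as a quoted string, which the older comment in the hunk above says has been required since Rook 1.5.9:

```yaml
# Illustrative pool only; name and namespace are assumptions.
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  replicated:
    size: 3
  quotas:
    maxSize: "10Gi"          # must be a string; "0" means no size quota
    maxObjects: 1000000000   # 0 means no object quota
```
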

@@ -9,7 +9,6 @@
#
# Most of the sections are prefixed with a 'OLM' keyword which is used to build our CSV for an OLM (Operator Life Cycle manager)
###################################################################################################################
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
@@ -122,6 +121,12 @@ rules:
- create
- update
- delete
- apiGroups:
- batch
resources:
- cronjobs
verbs:
- delete
---
# The cluster role for managing the Rook CRDs
apiVersion: rbac.authorization.k8s.io/v1
@@ -173,6 +178,7 @@ rules:
- batch
resources:
- jobs
- cronjobs
verbs:
- get
- list
@@ -451,6 +457,8 @@ rules:
- get
- list
- watch
- create
- update
- delete
- apiGroups:
- batch
@@ -613,8 +621,8 @@ metadata:
# need to be renamed with a value that will match before others.
name: 00-rook-privileged
annotations:
seccomp.security.alpha.kubernetes.io/allowedProfileNames: 'runtime/default'
seccomp.security.alpha.kubernetes.io/defaultProfileName: 'runtime/default'
seccomp.security.alpha.kubernetes.io/allowedProfileNames: "runtime/default"
seccomp.security.alpha.kubernetes.io/defaultProfileName: "runtime/default"
spec:
privileged: true
allowedCapabilities:
@@ -682,7 +690,7 @@ spec:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: 'psp:rook'
name: "psp:rook"
rules:
- apiGroups:
- policy
@@ -700,7 +708,7 @@ metadata:
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: 'psp:rook'
name: "psp:rook"
subjects:
- kind: ServiceAccount
name: rook-ceph-system
@@ -893,7 +901,7 @@ metadata:
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: 'psp:rook'
name: "psp:rook"
subjects:
- kind: ServiceAccount
name: rook-csi-cephfs-plugin-sa
@@ -906,7 +914,7 @@ metadata:
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: 'psp:rook'
name: "psp:rook"
subjects:
- kind: ServiceAccount
name: rook-csi-cephfs-provisioner-sa
@@ -1066,6 +1074,18 @@ rules:
- apiGroups: [""]
resources: ["configmaps"]
verbs: ["get"]
- apiGroups: ["replication.storage.openshift.io"]
resources: ["volumereplications", "volumereplicationclasses"]
verbs: ["create", "delete", "get", "list", "patch", "update", "watch"]
- apiGroups: ["replication.storage.openshift.io"]
resources: ["volumereplications/finalizers"]
verbs: ["update"]
- apiGroups: ["replication.storage.openshift.io"]
resources: ["volumereplications/status"]
verbs: ["get", "patch", "update"]
- apiGroups: ["replication.storage.openshift.io"]
resources: ["volumereplicationclasses/status"]
verbs: ["get"]
# OLM: END CSI RBD CLUSTER ROLE
# OLM: BEGIN CSI RBD CLUSTER ROLEBINDING
---
@@ -1076,7 +1096,7 @@ metadata:
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: 'psp:rook'
name: "psp:rook"
subjects:
- kind: ServiceAccount
name: rook-csi-rbd-plugin-sa
@@ -1089,7 +1109,7 @@ metadata:
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: 'psp:rook'
name: "psp:rook"
subjects:
- kind: ServiceAccount
name: rook-csi-rbd-provisioner-sa

File diff suppressed because it is too large

@@ -29,6 +29,11 @@ data:
ROOK_CSI_ENABLE_RBD: "true"
ROOK_CSI_ENABLE_GRPC_METRICS: "false"
# Set to true to enable host networking for CSI CephFS and RBD nodeplugins. This may be necessary
# in some network configurations where the SDN does not provide access to an external cluster or
# there is significant drop in read/write performance.
# CSI_ENABLE_HOST_NETWORK: "true"
# Set logging level for csi containers.
# Supported values from 0 to 5. 0 for general useful logs, 5 for trace level verbosity.
# CSI_LOG_LEVEL: "0"
@@ -64,11 +69,11 @@ data:
# The default version of CSI supported by Rook will be started. To change the version
# of the CSI driver to something other than what is officially supported, change
# these images to the desired release of the CSI driver.
ROOK_CSI_CEPH_IMAGE: "quay.io/cephcsi/cephcsi:v3.2.1"
ROOK_CSI_CEPH_IMAGE: "quay.io/cephcsi/cephcsi:v3.3.1"
ROOK_CSI_REGISTRAR_IMAGE: "k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.0.1"
ROOK_CSI_RESIZER_IMAGE: "k8s.gcr.io/sig-storage/csi-resizer:v1.0.1"
ROOK_CSI_PROVISIONER_IMAGE: "k8s.gcr.io/sig-storage/csi-provisioner:v2.0.4"
ROOK_CSI_SNAPSHOTTER_IMAGE: "k8s.gcr.io/sig-storage/csi-snapshotter:v3.0.2"
ROOK_CSI_SNAPSHOTTER_IMAGE: "k8s.gcr.io/sig-storage/csi-snapshotter:v4.0.0"
ROOK_CSI_ATTACHER_IMAGE: "k8s.gcr.io/sig-storage/csi-attacher:v3.0.2"
# (Optional) set user created priorityclassName for csi plugin pods.
@@ -274,6 +279,16 @@ data:
# Whether the OBC provisioner should watch on the operator namespace or not, if not the namespace of the cluster will be used
ROOK_OBC_WATCH_OPERATOR_NAMESPACE: "true"
# Whether to enable the flex driver. By default it is enabled and is fully supported, but will be deprecated in some future release
# in favor of the CSI driver.
ROOK_ENABLE_FLEX_DRIVER: "false"
# Whether to start the discovery daemon to watch for raw storage devices on nodes in the cluster.
# This daemon does not need to run if you are only going to create your OSDs based on StorageClassDeviceSets with PVCs.
ROOK_ENABLE_DISCOVERY_DAEMON: "false"
# Enable volume replication controller
CSI_ENABLE_VOLUME_REPLICATION: "false"
# CSI_VOLUME_REPLICATION_IMAGE: "quay.io/csiaddons/volumereplication-operator:v0.1.0"
# (Optional) Admission controller NodeAffinity.
# ADMISSION_CONTROLLER_NODE_AFFINITY: "role=storage-node; storage=rook, ceph"
# (Optional) Admission controller tolerations list. Put here list of taints you want to tolerate in YAML format.
@@ -308,7 +323,7 @@ spec:
serviceAccountName: rook-ceph-system
containers:
- name: rook-ceph-operator
image: rook/ceph:v1.5.9
image: rook/ceph:v1.6.2
args: ["ceph", "operator"]
volumeMounts:
- mountPath: /var/lib/rook
@@ -386,12 +401,6 @@ spec:
# (Optional) Discover Agent Pod Labels.
# - name: DISCOVER_AGENT_POD_LABELS
# value: "key1=value1,key2=value2"
# Allow rook to create multiple file systems. Note: This is considered
# an experimental feature in Ceph as described at
# http://docs.ceph.com/docs/master/cephfs/experimental-features/#multiple-filesystems-within-a-ceph-cluster
# which might cause mons to crash as seen in https://github.com/rook/rook/issues/1027
- name: ROOK_ALLOW_MULTIPLE_FILESYSTEMS
value: "false"
# The logging level for the operator: INFO | DEBUG
- name: ROOK_LOG_LEVEL
@@ -430,16 +439,6 @@ spec:
- name: DISCOVER_DAEMON_UDEV_BLACKLIST
value: "(?i)dm-[0-9]+,(?i)rbd[0-9]+,(?i)nbd[0-9]+"
# Whether to enable the flex driver. By default it is enabled and is fully supported, but will be deprecated in some future release
# in favor of the CSI driver.
- name: ROOK_ENABLE_FLEX_DRIVER
value: "false"
# Whether to start the discovery daemon to watch for raw storage devices on nodes in the cluster.
# This daemon does not need to run if you are only going to create your OSDs based on StorageClassDeviceSets with PVCs.
- name: ROOK_ENABLE_DISCOVERY_DAEMON
value: "false"
# Time to wait until the node controller will move Rook pods to other
# nodes after detecting an unreachable node.
# Pods affected by this setting are:

@@ -108,12 +108,12 @@ spec:
rook-operator:
rook-ceph-operator:
rook-ceph-operator:
image: rook/ceph:v1.5.9
image: rook/ceph:v1.6.2
rook-ceph-operator-config:
ceph_daemon:
image: ceph/ceph:v15.2.10
image: ceph/ceph:v15.2.11
rook_csi_ceph_image:
image: quay.io/cephcsi/cephcsi:v3.2.1
image: quay.io/cephcsi/cephcsi:v3.3.1
rook_csi_registrar_image:
image: k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.0.1
rook_csi_resizer_image:
@@ -121,15 +121,15 @@ spec:
rook_csi_provisioner_image:
image: k8s.gcr.io/sig-storage/csi-provisioner:v2.0.4
rook_csi_snapshotter_image:
image: k8s.gcr.io/sig-storage/csi-snapshotter:v3.0.2
image: k8s.gcr.io/sig-storage/csi-snapshotter:v4.0.0
rook_csi_attacher_image:
image: k8s.gcr.io/sig-storage/csi-attacher:v3.0.2
storage-rook:
ceph:
ceph-version:
image: ceph/ceph:v15.2.10
image: ceph/ceph:v15.2.11
rook-ceph-tools:
image: rook/ceph:v1.5.9
image: rook/ceph:v1.6.2
image_components:
# image_components are organized by

@@ -16,6 +16,7 @@ data:
mon_warn_on_pool_no_redundancy = true
# # You can add other default configuration sections
# # to create fully customized ceph.conf
# [mon]
[mon]
auth_allow_insecure_global_id_reclaim = false
# [osd]
# [rgw]

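The [mon] setting in the hunk above is the cluster-side mitigation for CVE-2021-20288. A minimal sketch of a complete override ConfigMap, assuming Rook's rook-config-override ConfigMap name and a rook-ceph cluster namespace. Per the Ceph advisory for this CVE, auth_allow_insecure_global_id_reclaim should only be set to false once all clients run a patched release, since unpatched clients may otherwise be locked out:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-config-override   # Rook merges this content into the daemons' ceph.conf
  namespace: rook-ceph         # assumed cluster namespace
data:
  config: |
    [mon]
    # Refuse insecure global_id reclaim by clients (CVE-2021-20288)
    auth_allow_insecure_global_id_reclaim = false
```
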

@@ -6,12 +6,16 @@ metadata:
spec:
dataDirHostPath: /var/lib/rook
cephVersion:
#see: https://tracker.ceph.com/issues/48797
image: ceph/ceph:v15.2.10
image: ceph/ceph:v15.2.11
#allowUnsupported: true
mon:
count: 3
allowMultiplePerNode: false
mgr:
count: 1
modules:
- name: pg_autoscaler
enabled: true
dashboard:
enabled: true
# If you are going to use the dashboard together with ingress-controller,
@@ -57,4 +61,17 @@ spec:
# deviceFilter: "^/dev/sd[c-h]"
# Also you can configure each device and/or each node. Please refer to the official rook
# documentation for the branch 1.5.x
# The section for configuring management of daemon disruptions during upgrade or fencing.
disruptionManagement:
# If true, the operator will create and manage PodDisruptionBudgets for OSD, Mon, RGW, and MDS daemons. OSD PDBs are managed dynamically
# via the strategy outlined in the [design](https://github.com/rook/rook/blob/master/design/ceph/ceph-managed-disruptionbudgets.md). The operator will
# block eviction of OSDs by default and unblock them safely when drains are detected.
managePodBudgets: true
# A duration in minutes that determines how long an entire failureDomain like `region/zone/host` will be held in `noout` (in addition to the
# default DOWN/OUT interval) when it is draining. This is only relevant when `managePodBudgets` is `true`. The default value is `30` minutes.
osdMaintenanceTimeout: 30
# A duration in minutes that the operator will wait for the placement groups to become healthy (active+clean) after a drain was completed and OSDs came back up.
# Operator will continue with the next drain if the timeout exceeds. It only works if `managePodBudgets` is `true`.
# No values or 0 means that the operator will wait until the placement groups are healthy before unblocking the next drain.
pgHealthCheckTimeout: 0
---

@@ -19,7 +19,7 @@ spec:
dnsPolicy: ClusterFirstWithHostNet
containers:
- name: rook-ceph-tools
image: rook/ceph:v1.5.9
image: rook/ceph:v1.6.2
command: ["/tini"]
args: ["-g", "--", "/usr/local/bin/toolbox.sh"]
imagePullPolicy: IfNotPresent