EdgeWorker management phase one

Introduce edgeworker personality

Story: 2008129

Change-Id: If74fb3d3863b05df9875a13e414f02bbfae4842e
Signed-off-by: Mingyuan Qi <mingyuan.qi@intel.com>
Date: 2020-09-14 11:02:26 +08:00

..
   This work is licensed under a Creative Commons Attribution 3.0 Unported
   License. http://creativecommons.org/licenses/by/3.0/legalcode

===============================
EdgeWorker Management Phase One
===============================

Storyboard: https://storyboard.openstack.org/#!/story/2008129

This story introduces a new node personality, 'edgeworker', to StarlingX.
The biggest difference between an 'edgeworker' node and a 'worker' node is
that the OS of an 'edgeworker' node is not installed or configured by the
StarlingX controller and may vary from deployment to deployment, for example
Ubuntu, Debian or Fedora. The basic idea is to deploy the containerd and
kubelet services to the 'edgeworker' nodes, so that the StarlingX Kubernetes
platform is extended to them.

The second difference is that 'edgeworker' nodes are usually deployed close
to edge devices, while 'worker' nodes are usually servers deployed in a
server room. The 'edgeworker' personality is suitable for nodes on which
users may want to install their own customized OS and which may require a
deployment physically close to the data producer or consumer devices.

The way to leverage the advantages of StarlingX functionality is to get most
flock agents containerized and enabled on edgeworker nodes. That is also
aligned with the long-term strategy of flock service containerization.
The whole topic is broken down into four phases, approximately:

* Phase One

  * Add edgeworker personality
  * Add ansible-playbook to join edgeworker nodes to the StarlingX Kubernetes
    cluster
  * Support Ubuntu and CentOS as target OS

* Phase Two

  * Containerize a set of flock agents to get edgeworker nodes inventoried
  * Enhance multiple Ceph cluster operation

* Phase Three

  * Support OpenStack running on edgeworker nodes
  * Support L3/tunnel mgmt. network
  * Containerize the rest of the flock agents

* Phase Four

  * Enable software management on edgeworker nodes
  * Enable optional authentication for new nodes
  * Extend target OS support

This spec focuses on *Phase One*.
Problem description
===================
In a typical IoT or industrial use case, StarlingX is usually used to
facilitate setup and management of the whole edge cluster. But there are
types of nodes in the cluster that are not in the current StarlingX
management scope. Various reasons, from software to hardware, hinder the
administrator from deploying these nodes as 'worker' nodes.

In particular, the common obstacles are:

* The OS of the nodes cannot, or is not intended to, be installed by
  StarlingX.
* The nodes are running a Type I hypervisor.
* The hardware resources do not meet the StarlingX worker node's minimum
  requirements.
* The nodes are connected to StarlingX controllers over an L3 network.
In this story, these nodes are categorized into a new personality to
distinguish them from 'worker' nodes. The new personality is called
'edgeworker' since these nodes are usually deployed close to the edge
devices. An edge device might be an I/O device, a camera, a servo motor or a
sensor.

The first three obstacles will be addressed in phase one, while the network
requirements and manageability enhancements will be addressed in the next
few phases. Separate specs for the later phases will be submitted during the
corresponding releases.
Use Cases
---------
* Administrator wants to have all the 'edgeworker' nodes managed by StarlingX

  * Show 'edgeworker' nodes in the host list (Phase one)
  * Check/Lock/Unlock 'edgeworker' node state (Phase two)
  * Query 'edgeworker' hardware resource info (Phase two)
  * Configure 'edgeworker' resources for specific usage (Phase two and later)
  * Manage alarms generated by 'edgeworker' nodes (Phase three)
  * Update 'edgeworker' packages (Phase four)

* Administrator does not want StarlingX to install the OS on the
  'edgeworker' nodes
* User wants to orchestrate container workloads to 'edgeworker' nodes
* User wants to orchestrate VM workloads to 'edgeworker' nodes as an option
Proposed change
===============
**Edgeworker personality**

Adding a new personality requires changes in the sysinv db, sysinv api and
sysinv conductor, as well as in cgts-client.

#. *sysinv db*

   To get 'edgeworker' nodes into sysinv, the 'edgeworker' value will be
   added to the enum type invPersonalityEnum in the sysinv db. Accordingly,
   adding 'edgeworker' to the db models is required as well. After this
   change, a host in the sysinv db can be assigned the edgeworker
   personality.

#. *sysinv api*

   This mainly focuses on the host api, adding checks during host add for
   'edgeworker' hosts. Possible checks:

   * mgmt ip, if the mgmt network is not dynamic
   * host name validation
   * personality check

#. *sysinv conductor*

   The sysinv conductor is responsible for mgmt ip allocation when the mgmt
   network is of dynamic type.

#. *cgts client*

   Add the 'edgeworker' choice for the 'personality' argument of the
   host-add/host-update commands.
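As a rough illustration of the api-side checks and the conductor-side
dynamic mgmt ip allocation described above, the following Python sketch uses
hypothetical names (``check_host_add``, ``allocate_mgmt_ip``,
``PERSONALITIES``); the actual sysinv identifiers, validation rules and
allocation logic differ.

```python
import ipaddress

# Hypothetical personality set mirroring sysinv's invPersonalityEnum;
# the names here are illustrative, not the actual sysinv identifiers.
PERSONALITIES = {"controller", "worker", "storage", "edgeworker"}


def check_host_add(hostname, personality, mgmt_ip=None, dynamic_mgmt=True):
    """Sketch of the semantic checks host-add might perform."""
    if personality not in PERSONALITIES:
        raise ValueError("unknown personality: %s" % personality)
    # Very loose hostname validation, for illustration only.
    if not hostname or not hostname.replace("-", "").isalnum():
        raise ValueError("invalid hostname: %r" % hostname)
    # When the mgmt network is not dynamic, the caller must supply an IP.
    if not dynamic_mgmt and mgmt_ip is None:
        raise ValueError("mgmt ip required for static mgmt network")
    return True


def allocate_mgmt_ip(subnet, allocated):
    """Sketch of conductor-side dynamic mgmt ip allocation: hand out the
    first free host address in the mgmt subnet."""
    for host in ipaddress.ip_network(subnet).hosts():
        if str(host) not in allocated:
            return str(host)
    raise RuntimeError("mgmt subnet exhausted")
```

For example, ``check_host_add("edge-0", "edgeworker")`` passes, while
adding a host on a static mgmt network without an IP is rejected.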

After the underlying changes are applied, the administrator is able to use::

    # system host-add -n <hostname> -p edgeworker

or::

    # system host-update <id> hostname=<hostname> personality=edgeworker

to add an edgeworker node to the inventory.
When an edgeworker node is added to the inventory, sysinv can provide the
following services:

* DHCP service (Phase one)
* Host lock/unlock (Phase two)
* Host interface modification and assignment (Phase two)
* Host hardware resource query (Phase two)
* Label assignment (Phase two)

The functions that will not be supported on edgeworker nodes are:

* host-upgrade
* bmc integration

An edgeworker node is typically not a server, but a regular PC such as an
industrial PC, NUC or workstation. BMC is not a required feature for those
nodes; their life cycle management is done in-band or by the maintainer
manually. The use cases that employ edgeworker nodes do not expect
out-of-band management for them.

Additional semantic checks will be added for these functions. Other
functions will be described in detail in each phase's spec.
**ansible playbook for provisioning edgeworker nodes**

The main steps for provisioning an edgeworker node are installing the
kubelet, kubeadm and containerd packages on the node (the packages differ
across Linux distributions) and joining the node to the StarlingX Kubernetes
platform. Besides these steps, system configuration such as ntp setup,
interface configuration and dns setup is needed as well.

The first two Linux distributions we propose to support for edgeworker are
*Ubuntu* and *CentOS*.

The version of all the Kubernetes packages on edgeworker nodes must be
exactly the same as the version of the packages on the controllers. If they
do not match, the playbook will reinstall the packages at the proper version.
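The exact-match rule above can be sketched as a small version comparison; a
minimal Python illustration, assuming versions in the common
``[v]major.minor.patch`` form:

```python
def parse_version(v):
    """Turn 'v1.18.1' or '1.18.1' into a comparable tuple (1, 18, 1)."""
    return tuple(int(p) for p in v.lstrip("v").split("."))


def needs_reinstall(node_version, controller_version):
    """The playbook reinstalls kubelet/kubeadm on the node whenever the
    node's package version does not exactly match the controllers'."""
    return parse_version(node_version) != parse_version(controller_version)
```

For example, a node at 1.18.1 against controllers at 1.18.3 would be
reinstalled, while an exact match is left alone.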
The playbook sequence to provision an edgeworker node:

#. Preparations on the controller

   * Send the containerd config and cert to the edgeworker
   * Generate a K8S bootstrap token and calculate the certificate hash

#. Preparations on the edgeworker

   * Configure the network (interface and dns)
   * Set up a proxy if needed
   * Install essential packages
   * Set up ntp

#. Add the edgeworker node to STX Kubernetes

   * Install the containerd, kubelet and kubeadm packages (based on OS)
   * Configure sysctl and swap
   * Join the k8s cluster
   * Download images

There will be one playbook with different roles included.
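The per-OS package installation step can be sketched as a mapping from the
target distribution to the commands a playbook role would run. This is a
hypothetical illustration: the package names, version pinning syntax and
repositories shown are assumptions, not the playbook's actual contents.

```python
# Hypothetical mapping of target OS to package-install command templates;
# '{ver}' is substituted with the Kubernetes version pinned by controllers.
INSTALL_CMDS = {
    "ubuntu": [
        "apt-get update",
        "apt-get install -y containerd kubelet={ver}-00 kubeadm={ver}-00",
    ],
    "centos": [
        "yum install -y containerd.io kubelet-{ver} kubeadm-{ver}",
    ],
}


def install_commands(distro, k8s_version):
    """Return the distro-specific commands to install the Kubernetes
    packages at the version required by the controllers."""
    if distro not in INSTALL_CMDS:
        raise ValueError("unsupported target OS: %s" % distro)
    return [cmd.format(ver=k8s_version) for cmd in INSTALL_CMDS[distro]]
```

An OS outside the phase-one support list (Ubuntu, CentOS) is rejected
up front, mirroring the spec's stated scope.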
Alternatives
------------

There are several open source projects that can provision a Kubernetes node.

* Kubespray

  Kubespray [1]_ is a composition of Ansible playbooks, inventory,
  provisioning tools, and domain knowledge for generic OS/Kubernetes cluster
  configuration management tasks. Kubespray performs generic OS
  configuration as well as Kubernetes cluster bootstrapping.

  Kubespray provides the whole functionality of provisioning a Kubernetes
  node, just like the edgeworker provisioning playbook does. However,
  Kubespray supports multiple container runtimes, multiple CNI plugins and
  control plane bootstrapping, which is more functionality than is needed to
  provision an edgeworker. What edgeworker needs is a playbook for one
  container runtime and one CNI plugin that provisions a Kubernetes node
  only.
* KubeEdge

  KubeEdge [2]_ is an open source system for extending native containerized
  application orchestration capabilities to hosts at the edge. KubeEdge can
  run on top of an existing Kubernetes cluster and deploys a customized
  kubelet service called 'edged' to the edge node. Between the apiserver and
  edged, the EdgeController is the bridge that manages edge node and pod
  metadata so that the data can be targeted to a specific edge node.

  KubeEdge is able to provision edge nodes from the cloud, but its kubelet
  service is customized to fulfill the specific requirement that the
  administrator can manage the pods running on edge nodes from a public
  cloud platform. The customized kubelet (edged) brings compatibility issues
  when Kubernetes is upgraded to a newer release, which leads to extra
  effort to test/upgrade KubeEdge during each Kubernetes upgrade, since
  edgeworker provisioning is a key step to enable these nodes.

  Besides, KubeEdge includes a whole edge device management stack that is
  not in the current StarlingX platform scope.
Data model impact
-----------------
The only data model change is to insert 'edgeworker' to 'invPersonalityEnum'
in sysinv db model.
REST API impact
---------------
None
Security impact
---------------
The potential security threats and mitigations are:

* Malicious node

  The administrator must guarantee that no unauthorized node can physically
  connect to the management network. Authentication of edgeworker node
  onboarding will be introduced in later phases.

* Malicious packages on an edgeworker node

  The administrator must guarantee that the packages running on edgeworker
  nodes are secure, since the OS is managed by the administrator.
Other end user impact
---------------------
None
Performance Impact
------------------
None
Other deployer impact
---------------------
The deployer is required to run the edgeworker provisioning playbook after
adding or updating a node with the edgeworker personality.
Developer impact
----------------
None
Upgrade impact
--------------
The kubelet needs to be upgraded during the Kubernetes upgrade process. The
upgrade process will trigger an additional script/playbook to check the
version of the packages on edgeworker nodes and upgrade them according to
each node's own distribution.

The distribution's repository may not carry the newest version of the
corresponding packages. Due to the Kubernetes version skew support policy
[3]_, a kubelet or kube-proxy up to two minor versions older than the
apiserver is acceptable.

SW patching/updating will be addressed in phase four. It could be either a
3rd party solution or a plugin for the current SW management, because the
current SW management can only patch/update RPM packages, while the OS of
edgeworker nodes may use different package formats.
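The version skew constraint above can be sketched as a small check; a
hypothetical Python illustration of the policy [3]_ that the kubelet may be
up to two minor versions older than the apiserver, and never newer:

```python
def skew_ok(apiserver_version, kubelet_version, max_skew=2):
    """Return True if the kubelet's version is acceptable relative to the
    apiserver under the Kubernetes version skew policy: same major version,
    at most `max_skew` minor versions older, and never newer."""
    api_major, api_minor = (
        int(p) for p in apiserver_version.lstrip("v").split(".")[:2])
    kl_major, kl_minor = (
        int(p) for p in kubelet_version.lstrip("v").split(".")[:2])
    if kl_major != api_major:
        return False
    return 0 <= api_minor - kl_minor <= max_skew
```

So a 1.17 kubelet against a 1.19 apiserver is within the window, while 1.16
is too old and 1.20 (newer than the apiserver) is rejected.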
Implementation
==============
Assignee(s)
-----------
Primary assignee:
  Mingyuan Qi

Repos Impacted
--------------

* config
* ansible-playbook
Work Items
----------
The work items are already introduced in section `Proposed change`_ above.
Dependencies
============
None
Testing
=======
* Sysinv unit test
* Sysinv host operation test
* Adding edgeworker nodes in different deployment modes

  * Simplex
  * Duplex
  * Standard

* Ansible-playbook test for each target OS

  * Host configuration
  * Package installation
  * Edgeworker node joining the Kubernetes cluster
Documentation Impact
====================
* Add a new page to describe the edgeworker nodes' requirements, limitations
  and use cases.
* Add new pages to describe the following deployments:

  * Duplex + edgeworker
  * Standard + edgeworker

* Modify all deployment docs to insert an option to deploy edgeworker nodes
  and link it to the corresponding deployment with edgeworker nodes.
References
==========
.. [1] Kubespray https://github.com/kubernetes-sigs/kubespray
.. [2] KubeEdge https://kubeedge.io
.. [3] Kubernetes version skew policy https://kubernetes.io/docs/setup/release/version-skew-policy/
History
=======
.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - stx.5.0
     - Edgeworker management phase one introduced