EdgeWorker Management Phase One
This story will introduce a new node personality 'edgeworker' to StarlingX.
The biggest difference between an 'edgeworker' node and a 'worker' node is that the OS of an 'edgeworker' node is not installed or configured by the StarlingX controller, so it may vary from case to case, for example Ubuntu, Debian or Fedora. The basic idea is to deploy the containerd and kubelet services to the 'edgeworker' nodes so that the StarlingX Kubernetes platform is extended to them.
The second difference is that 'edgeworker' nodes are usually deployed close to edge devices, while 'worker' nodes are usually servers deployed in a server room. The 'edgeworker' personality suits nodes on which users want to install their own customized OS and which may need to be deployed physically close to the data producer or consumer devices.
To leverage StarlingX functionality on edgeworker nodes, most flock agents need to be containerized and enabled on them. This also aligns with the long-term strategy of flock service containerization.
The whole topic is broken down into roughly four phases:
- Phase One
  - Add edgeworker personality
  - Add ansible-playbook to join edgeworker node to STX K8S cluster
  - Support Ubuntu and CentOS as target OS
- Phase Two
  - Containerize a set of flock agents to get edgeworker node inventoried
  - Enhance multiple Ceph cluster operation
- Phase Three
  - Support OpenStack running on edgeworker nodes
  - Support L3/Tunnel mgmt. network
  - Containerize rest of flock agents
- Phase Four
  - Enable software management on edgeworker nodes
  - Enable optional authentication for new nodes
  - Extend target OS support
This spec focuses on Phase One.
In a typical IoT or industrial use case, StarlingX is used to facilitate setup and management of the whole edge cluster. However, the cluster contains types of nodes that are outside the current StarlingX management scope. Various reasons, from software to hardware, prevent administrators from deploying these nodes as 'worker' nodes. In particular, the common setbacks are:
- The OS of the nodes cannot be installed by StarlingX, or the administrator does not want it to be.
- The nodes are running a Type I hypervisor.
- The hardware resources do not meet the minimum requirements of a StarlingX worker node.
- The nodes are connected to StarlingX controllers over an L3 network.
In this story, these nodes are categorized into a new personality to distinguish them from 'worker' nodes. The new personality is called 'edgeworker' since these nodes are usually deployed close to the edge devices. An edge device could be, for example, an I/O device, a camera, a servo motor or a sensor.
The first three setbacks will be addressed in phase one, while the network requirement and manageability enhancements will be addressed in later phases. Separate specs for the different phases will be submitted in different releases.
- Administrator wants to have all the 'edgeworker' nodes managed by StarlingX
  - Make 'edgeworker' in the host list (Phase one)
  - Check/Lock/Unlock 'edgeworker' node state (Phase two)
  - Query 'edgeworker' hardware resources info (Phase two)
  - Configure 'edgeworker' resources for specific usage (Phase two and later)
  - Manage alarms generated by 'edgeworker' (Phase three)
  - Update 'edgeworker' packages (Phase four)
- Administrator does not want StarlingX to install OS on the 'edgeworker' nodes
- User wants to orchestrate container workloads to 'edgeworker' nodes
- User wants to orchestrate VM workloads to 'edgeworker' nodes as an option
Adding a new personality will require changes in sysinv db, sysinv api and sysinv conductor, as well as cgts-client.
To get 'edgeworker' nodes into sysinv, the 'edgeworker' value will be added to the enum type invPersonalityEnum in the sysinv db. The db models must be updated accordingly. After this change, a host can be assigned the edgeworker personality from the sysinv db perspective.
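The effect of extending the enum can be sketched in plain Python. This is only an illustration, not the sysinv db model: the `Personality` class and `is_valid_personality` helper are hypothetical names, and the pre-existing values shown are a non-exhaustive sample.

```python
from enum import Enum


class Personality(str, Enum):
    """Hypothetical stand-in for the values held by invPersonalityEnum."""

    # Sample of pre-existing values (not exhaustive, for illustration only).
    CONTROLLER = "controller"
    WORKER = "worker"
    STORAGE = "storage"
    # New value proposed by this spec.
    EDGEWORKER = "edgeworker"


def is_valid_personality(value: str) -> bool:
    """Return True if the string names a known personality."""
    return value in {p.value for p in Personality}
```

With the new member present, a host record carrying `personality="edgeworker"` passes the same validation path as the existing personalities.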
The sysinv api changes mainly focus on the host api, adding checks during host add for 'edgeworker' hosts. Possible checks:
- mgmt ip if mgmt network is not dynamic
- host name validation
- personality check
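The checks listed above could look roughly like the following. This is a minimal sketch under stated assumptions, not the sysinv API code: the function name, its parameters, the error strings and the lowercase-only hostname rule are all hypothetical.

```python
import ipaddress
import re

# Assumed hostname rule: 1-63 lowercase letters, digits or hyphens,
# with no leading or trailing hyphen.
_HOSTNAME_RE = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)")


def check_edgeworker_host(hostname, personality, mgmt_ip=None, dynamic_mgmt=True):
    """Return a list of error strings; an empty list means all checks pass."""
    errors = []
    if personality != "edgeworker":
        errors.append("personality must be 'edgeworker'")
    if not _HOSTNAME_RE.fullmatch(hostname or ""):
        errors.append("invalid hostname")
    # A management IP is only required when addresses are not allocated dynamically.
    if not dynamic_mgmt:
        if mgmt_ip is None:
            errors.append("mgmt_ip is required for a static mgmt network")
        else:
            try:
                ipaddress.ip_address(mgmt_ip)
            except ValueError:
                errors.append("mgmt_ip is not a valid IP address")
    return errors
```

Returning a list of errors rather than raising on the first failure lets the caller report every problem with a host-add request at once.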
The sysinv conductor is responsible for mgmt IP allocation when the mgmt network uses dynamic address allocation.
Add an 'edgeworker' choice for the 'personality' argument of the host-add and host-update commands.
After the underlying changes are applied, the administrator is able to use

    # system host-add -n <hostname> -p edgeworker

or

    # system host-update <id> hostname=<hostname> personality=edgeworker

to add an edgeworker node to the inventory.
Once an edgeworker node is added to the inventory, sysinv can provide the following services:
- DHCP service (Phase one)
- Host lock/unlock (Phase two)
- Host interface modification and assignment (Phase two)
- Host hardware resource query (Phase two)
- Label assignment (Phase two)
The function that will not be supported on edgeworker nodes is out-of-band management through a BMC:
An edgeworker node is not a server but an ordinary PC, such as an industrial PC, NUC or workstation. A BMC is not a required feature for these nodes. Node life cycle management is done in-band or manually by the maintainer. Use cases involving edgeworker nodes do not expect out-of-band management of these nodes.
Additional semantic checks will be added for these functions.
Other functions will be described in detail in each phase's spec.
Ansible playbook for provisioning edgeworker nodes
The main steps for provisioning an edgeworker node are installing the kubelet, kubeadm and containerd packages appropriate to the node's Linux distribution and joining the node to the StarlingX Kubernetes platform. Besides these steps, system configuration such as NTP setup, interface configuration and DNS setup is needed as well.
The first two Linux distributions we propose to support for edgeworker are Ubuntu and CentOS.
The version of all the kubernetes packages on edgeworker nodes must be exactly the same as the packages on controllers. If they are not, the playbook will reinstall the packages to the proper version.
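The exact-match rule can be expressed as a small comparison. The sketch below is illustrative only; `packages_to_reinstall` is a hypothetical helper, and how the playbook actually collects the version strings is not specified here.

```python
def packages_to_reinstall(controller_versions, node_versions):
    """Return {package: required_version} for each package on the node
    that is missing or does not exactly match the controller's version."""
    return {
        pkg: required
        for pkg, required in controller_versions.items()
        if node_versions.get(pkg) != required
    }
```

For example, if the controller reports kubelet and kubeadm at the same version and the node has an older kubeadm, only kubeadm appears in the result; an empty result means nothing needs to be reinstalled.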
The playbook sequence to provision an edgeworker node:
- Preparations on controller
  - Send containerd config and cert to edgeworker
  - Generate K8S bootstrap token and calculate certificate hash
- Preparations on edgeworker
  - Config network (interface and dns)
  - Setup proxy if needed
  - Install essential packages
  - Setup ntp
- Add edgeworker node to STX Kubernetes
  - Install containerd, kubelet, kubeadm packages (based on OS)
  - Config sysctl and swap
  - Join k8s cluster
  - Download images
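The "generate token, calculate certificate hash, join cluster" steps map onto the arguments of `kubeadm join`. The following is a rough sketch, not the playbook itself: the helper names are hypothetical, and the caller is assumed to have already extracted the DER-encoded public key (SubjectPublicKeyInfo) of the cluster CA certificate, over which kubeadm's discovery hash is computed.

```python
import hashlib
import re

# kubeadm bootstrap tokens have the form "[a-z0-9]{6}.[a-z0-9]{16}".
_TOKEN_RE = re.compile(r"[a-z0-9]{6}\.[a-z0-9]{16}")
_HASH_RE = re.compile(r"sha256:[0-9a-f]{64}")


def ca_cert_hash(spki_der: bytes) -> str:
    """SHA-256 of the CA public key (DER-encoded SPKI), in kubeadm's format."""
    return "sha256:" + hashlib.sha256(spki_der).hexdigest()


def build_join_command(endpoint: str, token: str, cert_hash: str) -> list:
    """Assemble the argument list for joining a node to the cluster."""
    if not _TOKEN_RE.fullmatch(token):
        raise ValueError("token must look like abcdef.0123456789abcdef")
    if not _HASH_RE.fullmatch(cert_hash):
        raise ValueError("cert_hash must look like sha256:<64 hex digits>")
    return [
        "kubeadm", "join", endpoint,
        "--token", token,
        "--discovery-token-ca-cert-hash", cert_hash,
    ]
```

Validating both values on the controller side before running the join on the edgeworker makes a malformed token fail early rather than partway through provisioning.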
There will be one playbook with different roles included.
There are several open source projects that can provision a Kubernetes node.
Kubespray [1] is a composition of Ansible playbooks, inventory, provisioning tools, and domain knowledge for generic OS/Kubernetes cluster configuration management tasks. Kubespray performs generic OS configuration as well as Kubernetes cluster bootstrapping.
Kubespray provides the whole functionality of provisioning a Kubernetes node, just as the edgeworker provisioning playbook does. However, Kubespray supports multiple container runtimes, multiple CNI plugins and control plane bootstrapping, which is far more functionality than is needed to provision an edgeworker.
What edgeworker nodes need is a playbook that targets a specific container runtime and specific CNI plugins, and that only provisions a Kubernetes node.
KubeEdge [2] is an open source system for extending native containerized application orchestration capabilities to hosts at the edge. KubeEdge can run on top of an existing Kubernetes cluster and deploys a customized kubelet service called 'edged' to the edge node. Between the apiserver and edged, the EdgeController acts as the bridge that manages edge node and pod metadata so that data can be targeted to a specific edge node.
KubeEdge is able to provision edge nodes from the cloud. However, its kubelet service is customized to fulfill a specific requirement: letting the administrator manage the pods running on edge nodes from a public cloud platform. The customized kubelet (edged) brings compatibility issues whenever Kubernetes is upgraded to a newer release. Since edgeworker provisioning is a key step in enabling these nodes, this would require extra effort to test and upgrade KubeEdge during each Kubernetes upgrade.
Besides, KubeEdge has a whole edge device management logic that is not in current StarlingX platform scope.
Data model impact
The only data model change is to add 'edgeworker' to 'invPersonalityEnum' in the sysinv db model.
REST API impact
The potential security threat and mitigation could be:
The administrator must guarantee that no unauthorized node can physically connect to the management network. Authentication of edgeworker nodes during onboarding will be introduced in later phases.
Malicious packages in edgeworker node
The administrator must guarantee that the packages running on edgeworker nodes are secure, since the OS is managed by the administrator.
Other end user impact
Other deployer impact
The deployer is required to run the edgeworker provisioning playbook after adding a node with the edgeworker personality or updating a node to it.
The kubelet needs to be upgraded during the Kubernetes upgrade process. The upgrade process will trigger an additional script/playbook to check the version of the packages on edgeworker nodes, and upgrade them according to their own distribution.
The distribution's repo may not update the corresponding packages to the newest version. This is acceptable within limits: according to the Kubernetes version skew support policy [3], kubelet and kube-proxy may be up to two minor versions older than the apiserver.
SW patching/updating will be addressed in phase four. It could be either a third-party solution or a set of plugins for the current SW management, because the current SW management cannot patch or update packages other than RPMs, while edgeworker nodes may use other package types depending on their OS.
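The skew rule can be checked mechanically. The sketch below is an illustration with hypothetical helper names; the allowed skew is a parameter that defaults to the two minor versions stated above, and it also rejects a kubelet newer than the apiserver, which the skew policy does not permit.

```python
def major_minor(version: str) -> tuple:
    """Parse 'v1.18.1' or '1.18.1' into (major, minor) integers."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def kubelet_within_skew(apiserver: str, kubelet: str, max_minor_skew: int = 2) -> bool:
    """Return True if kubelet is not newer than the apiserver and is at most
    max_minor_skew minor versions older, within the same major version."""
    api_major, api_minor = major_minor(apiserver)
    kl_major, kl_minor = major_minor(kubelet)
    if api_major != kl_major:
        return False
    return 0 <= api_minor - kl_minor <= max_minor_skew
```

An upgrade script could use such a check to decide whether a node whose distribution repo lags behind still needs intervention before the apiserver is upgraded.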
- Primary assignee:
The work items are already introduced in section Proposed change above.
- Sysinv unit test
- Sysinv host operation test
- Adding edgeworker nodes in different deploy mode test
- Ansible-playbook test for each target OS
  - Host configuration
  - Package installation
  - Edgeworker node join to the Kubernetes cluster
- Add a new page to describe the edgeworker node requirements, limitations and use cases.
- Add a new page to describe the following deployments:
  - Duplex + edgeworker
  - Standard + edgeworker
- Modify all deployment docs to insert an option to deploy edgeworker nodes and link it to the underlying deployment with edgeworker nodes.
|stx.5.0||Edgeworker management phase one introduced|