diff --git a/specs/zed/database-synchronization.rst b/specs/zed/database-synchronization.rst
new file mode 100644
index 00000000..94b48318
--- /dev/null
+++ b/specs/zed/database-synchronization.rst
@@ -0,0 +1,360 @@
+==========================================================
+Database Synchronization between TackerDB and KubernetesDB
+==========================================================
+
+https://blueprints.launchpad.net/tacker/+spec/database-synchronization
+
+Problem description
+===================
+
+In the current implementation, TackerDB and the VIM DB (KubernetesDB,
+such as etcd in Kubernetes) are synchronized only when a SOL002 heal is
+executed. However, when Kubernetes auto-heals or auto-scales and Pods are
+restarted, the change is reflected only in KubernetesDB and not in TackerDB.
+As a result, the following problems occur.
+
+* When a user uses a command or API to get information about
+  a Pod resource, they see the information stored in TackerDB
+  and get out-of-date output.
+
+* If a user attempts to scale while the databases are inconsistent, the
+  Pods are scaled to a number that the user does not intend, because Tacker
+  reads and processes the stale information.
+
+.. note:: The existing heal process already supports the ability
+          to synchronize information about a Pod.
+
+This specification proposes the following change:
+
+* Add periodic synchronization of Pod information between
+  TackerDB and KubernetesDB.
+
+Proposed Change
+===============
+
+The following functions are incorporated in this specification.
+
++ Regularly retrieve the Pod information held by TackerDB and
+  KubernetesDB, compare the two,
+  and update TackerDB from KubernetesDB if they differ.
+
+  .. note:: When TackerDB is updated by the synchronization process,
+            the update is written to a log.
+            The log contains the names of the old Pod and the new one.
+
++ Add a configuration option to specify the synchronization interval.
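
The compare-and-update behaviour listed above can be sketched as follows. This is a minimal illustration and not Tacker's implementation: the names ``SyncResult``, ``diff_pods`` and ``synchronize`` are hypothetical, Pods are represented only by their names, and the ``log`` callable stands in for whatever logging Tacker uses.

```python
from dataclasses import dataclass


@dataclass
class SyncResult:
    """Difference between the Pods stored in TackerDB and the running Pods."""
    removed: list  # Pod names stored in TackerDB that are no longer running
    added: list    # Pod names running in Kubernetes but unknown to TackerDB

    @property
    def changed(self):
        return bool(self.removed or self.added)


def diff_pods(tacker_pods, k8s_pods):
    """Compare Pod names held by TackerDB with those reported by Kubernetes."""
    stored, running = set(tacker_pods), set(k8s_pods)
    return SyncResult(removed=sorted(stored - running),
                      added=sorted(running - stored))


def synchronize(tacker_pods, k8s_pods, log=print):
    """Return the Pod list TackerDB should hold after one synchronization run.

    Hypothetical helper: a real implementation would rewrite
    vnfcResourceInfo and scaleLevel instead of a plain list.
    """
    result = diff_pods(tacker_pods, k8s_pods)
    if not result.changed:
        # Nothing differs, so TackerDB is left untouched and nothing is logged.
        return list(tacker_pods)
    # The spec asks for the old and new Pod names to appear in the log.
    log(f"DB sync: old pods {result.removed}, new pods {result.added}")
    return sorted(k8s_pods)
```

Calling ``synchronize(["vdu1-abc", "vdu1-def"], ["vdu1-abc", "vdu1-xyz"])`` would log one line naming ``vdu1-def`` as the old Pod and ``vdu1-xyz`` as the new one, and return the list as Kubernetes reports it. Using set differences keeps the comparison independent of the order in which either side lists its Pods.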
+
+The attributes in TackerDB to be updated are as follows:
+
++ vnf_instantiated_info.scaleStatus.scaleLevel: Calculate the `scaleLevel`
+  from the number of Pods retrieved from KubernetesDB and update it if it
+  differs from TackerDB.
++ vnf_instantiated_info.vnfc_resource_info: If the number of
+  running Pods changes, increase or decrease the number of entries in
+  `vnfcResourceInfo` to match the number of running Pods.
++ vnf_instantiated_info.vnfc_resource_info.computeResource.resourceId:
+  Update `resourceId` with the new resource ID, which is the Pod name.
++ VnfInstanceV2.instantiatedVnfInfo.scaleStatus.scaleLevel: Calculate
+  the `scaleLevel` from the number of Pods retrieved from KubernetesDB
+  and update it if it differs from TackerDB.
++ VnfInstanceV2.instantiatedVnfInfo.vnfcResourceInfo: If the number of
+  running Pods changes, increase or decrease the number of entries in
+  `vnfcResourceInfo` and `VnfInstanceV2.instantiatedVnfInfo.vnfcInfo`
+  to match the number of running Pods.
++ VnfInstanceV2.instantiatedVnfInfo.vnfcResourceInfo.
+  computeResource.resourceId:
+  Update `resourceId` with the new resource ID, which is the Pod name.
+
+.. note:: The following resource information is retrieved from KubernetesDB:
+
+          * Pod
+          * Deployment
+          * ReplicaSet
+          * DaemonSet
+          * StatefulSet
+
+          Pods under the control of a StatefulSet have static names,
+          so `resourceId` does not change even if they are recreated by
+          Kubernetes. However, Tacker must still fetch the current number
+          of Pods to update `scaleStatus` during DB synchronization.
+
+Synchronization between TackerDB and KubernetesDB
+-------------------------------------------------
+
+Periodic database synchronization
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This feature periodically synchronizes information related to
+Pod resources between TackerDB and KubernetesDB.
+
+The diagram below shows the database synchronization.
+
+::
+
+                                                +------------------------------------------------------------+
+                                                |                            VNFM                            |
+                                                |    +-----------------------+                               |
+  +-------------------------------+             |    |   Tacker-conductor    |                               |
+  | Kubernetes                    |             |    |   +--------------+    |                               |
+  |                               |             |    |   | VnflcmDriver |    |                               |
+  | +--------------------------+  |             |    |   |   (v1/v2)    |    |                               |
+  | | Master                   |  |             |    |   +---+----------+    |                               |
+  | |                          |  |             |    |       | 1. Start periodic database synchronization    |
+  | |                          |  |             |    |       |               |                               |
+  | |                          |  |             |    |       |               |                               |
+  | | +------+    +---------+  |  |             |    |   +---v----------+    |                               |
+  | | | etcd |    |         |  |  |             |    |   | Kubernetes   |    |               +-----------+   |
+  | | |      |<-->|kube-    |<---------------------------+ InfraDriver  +------------------->| TackerDB  |   |
+  | | +------+    |apiserver|  |  | 3. Get pod  |    |   |              |    |               |           |   |
+  | |             |         |  |  | information |    |   +--------------+    |               +-----------+   |
+  | |             |         |  |  |             |    |  4. Check differences |    2. Get pod information     |
+  | |             |         |  |  |             |    |                       |                               |
+  | |             |         |  |  |             |    |                       | 5. If there are differences,  |
+  | |             +---------+  |  |             |    |                       |    update pod information     |
+  | |                          |  |             |    |                       |                               |
+  | +--------------------------+  |             |    +-----------------------+                               |
+  |                               |             |                                                            |
+  +-------------------------------+             +------------------------------------------------------------+
+
+Sequence for periodic database synchronization:
+
+.. seqdiag::
+
+  seqdiag {
+    node_width = 80;
+    edge_length = 100;
+
+    "Client"
+    "Tacker-server"
+    "Tacker-conductor"
+    "VnfLcmDriver/VnfLcmDriverV2"
+    "KubernetesInfraDriver"
+    "TackerDB"
+    "Kubernetes client"
+
+    === Start periodic database synchronization. ===
+    "Tacker-conductor" -> "VnfLcmDriver/VnfLcmDriverV2"
+      [label = "1. Start periodic database synchronization."];
+    "VnfLcmDriver/VnfLcmDriverV2" -> "KubernetesInfraDriver"
+      [label = "check pod status"];
+    "KubernetesInfraDriver" -> "TackerDB"
+      [label = "2. Get pod information"];
+    "KubernetesInfraDriver" <-- "TackerDB"
+      [label = "return pod information"];
+    "KubernetesInfraDriver" -> "Kubernetes client"
+      [label = "3. Get pod information"];
+    "KubernetesInfraDriver" <-- "Kubernetes client"
+      [label = "return pod information"];
+    "KubernetesInfraDriver" -> "KubernetesInfraDriver"
+      [label = "4. Check differences"];
+    "KubernetesInfraDriver" -> "TackerDB"
+      [label = "5. If there are differences, update pod information"];
+
+    "Client" -> "Tacker-server"
+      [label = "LCM request", rightnote = "This is a sample of LCM request,\nnot intending to describe specific LCM."];
+    "Tacker-server" -> "Tacker-conductor"
+      [label = "LCM request"];
+    "Tacker-conductor" -> "VnfLcmDriver/VnfLcmDriverV2"
+      [label = "LCM request"];
+    "VnfLcmDriver/VnfLcmDriverV2" -> "KubernetesInfraDriver"
+      [label = "LCM request"];
+    "VnfLcmDriver/VnfLcmDriverV2" <-- "KubernetesInfraDriver"
+      [label = "Return error due to synchronization", rightnote = "LCM requests will be rejected\nduring synchronization."];
+    "Tacker-conductor" <-- "VnfLcmDriver/VnfLcmDriverV2"
+      [label = "Return error due to synchronization"];
+    "Tacker-server" <-- "Tacker-conductor"
+      [label = "Return error due to synchronization"];
+    "Client" <-- "Tacker-server"
+      [label = "409 Conflict"];
+
+    "KubernetesInfraDriver" <-- "TackerDB"
+      [label = ""];
+    "VnfLcmDriver/VnfLcmDriverV2" <-- "KubernetesInfraDriver"
+      [label = ""];
+    "Tacker-conductor" <-- "VnfLcmDriver/VnfLcmDriverV2"
+      [label = ""];
+    "Client" <-- "Tacker-conductor"
+      [style=none];
+
+    === In case of conflicting DB synchronization during LCMs ===
+    "Client" -> "Tacker-server"
+      [label = "LCM request", rightnote = "This is a sample of LCM request,\nnot intending to describe specific LCM."];
+    "Tacker-server" -> "Tacker-conductor"
+      [label = "LCM request"];
+    "Tacker-conductor" -> "VnfLcmDriver/VnfLcmDriverV2"
+      [label = "LCM request"];
+    "VnfLcmDriver/VnfLcmDriverV2" -> "KubernetesInfraDriver"
+      [label = "LCM request"];
+
+    "Tacker-server" -> "Tacker-conductor"
+      [style=none];
+    "Tacker-conductor" -> "VnfLcmDriver/VnfLcmDriverV2"
+      [label = "1'. Start periodic database synchronization."];
+    "VnfLcmDriver/VnfLcmDriverV2" -> "KubernetesInfraDriver"
+      [label = "check pod status"];
+    "KubernetesInfraDriver" -> "TackerDB"
+      [label = "2'. Get pod information"];
+    "KubernetesInfraDriver" <-- "TackerDB"
+      [label = "return pod information"];
+    "KubernetesInfraDriver" -> "Kubernetes client"
+      [label = "3'. Get pod information"];
+    "KubernetesInfraDriver" <-- "Kubernetes client"
+      [label = "return pod information"];
+    "KubernetesInfraDriver" -> "KubernetesInfraDriver"
+      [label = "4'. Check differences"];
+    "KubernetesInfraDriver" -> "KubernetesInfraDriver"
+      [label = "5'. Skip DB update if the target instance is the same as that of the ongoing LCM."];
+    "VnfLcmDriver/VnfLcmDriverV2" <-- "KubernetesInfraDriver"
+      [label = ""];
+    "Tacker-conductor" <-- "VnfLcmDriver/VnfLcmDriverV2"
+      [label = ""];
+    "Tacker-server" <-- "Tacker-conductor"
+      [style=none];
+
+    "VnfLcmDriver/VnfLcmDriverV2" <-- "KubernetesInfraDriver"
+      [label = "return"];
+    "Tacker-conductor" <-- "VnfLcmDriver/VnfLcmDriverV2"
+      [label = "return"];
+    "Tacker-server" <-- "Tacker-conductor"
+      [label = "return"];
+    "Client" <-- "Tacker-server"
+      [label = "return"];
+  }
+
+#. The database synchronization process starts periodically, at the
+   interval specified in the configuration.
+
+#. KubernetesInfraDriver retrieves the Pod information stored in TackerDB.
+
+#. KubernetesInfraDriver retrieves the names and the number of the
+   currently running Pods from KubernetesDB.
+
+#. KubernetesInfraDriver compares these two pieces of information.
+
+   Comparisons include:
+
+   * Pod name
+
+   * Number of running Pods
+
+#. If there are differences, KubernetesInfraDriver updates the
+   Pod information and scaleLevel in TackerDB.
+
+   * Any LCM request for a target instance is rejected with HTTP 409
+     (Conflict) during synchronization.
+
+Steps 1' to 5' describe the case where DB synchronization conflicts with an
+ongoing LCM process. If DB synchronization is needed for an instance that is
+affected by an ongoing LCM procedure, DB synchronization is skipped for that
+instance.
+
+Configuration for Interval timer
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A configuration option sets the interval of the timer that synchronizes
+Pod information between TackerDB and KubernetesDB.
+
+This configuration is common to the v1 and v2 APIs.
+
+The default is 300 seconds.
+
+
+Data model impact
+-----------------
+
+None
+
+REST API impact
+---------------
+
+None
+
+Security impact
+---------------
+
+None
+
+Notifications impact
+--------------------
+
+None
+
+Other end user impact
+---------------------
+
+None
+
+Performance impact
+------------------
+
+Contention may occur when database synchronization and LCM operations are
+performed simultaneously.
+
++ If an LCM operation is requested for a VNF instance while periodic
+  database synchronization is taking place on that VNF instance, Tacker
+  responds with "409 Conflict" and the LCM operation is not performed.
++ If database synchronization is due for a VNF instance while an LCM
+  operation is running on that VNF instance, the database synchronization
+  is skipped.
+
+Other deployer impact
+---------------------
+
+None
+
+Developer impact
+----------------
+
+None
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+  Masaki Ueno
+
+Other contributors:
+  Hideki Matsuda
+
+  Ayumu Ueha
+
+  Yoshiyuki Katada
+
+  Yusuke Niimi
+
+Work Items
+----------
+
++ Support for periodic database synchronization
+
++ Add a new configuration option for the database synchronization interval
+
++ Add new unit and functional tests
+
+Dependencies
+============
+
+None
+
+Testing
+=======
+
+Unit and functional tests will be added to cover the cases required in the
+specification.
+
+Documentation Impact
+====================
+
+A complete configuration guide will be added to explain how to
+specify the interval between synchronizations.
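
As a rough illustration of what such a configuration guide might cover, the sketch below reads a synchronization interval from INI-style text with Python's standard ``configparser`` and falls back to the 300-second default given in this specification. The option name ``db_synchronization_interval`` and its placement under ``[DEFAULT]`` are assumptions made for this example; the specification does not fix them, and Tacker would register the option through its own configuration framework, not through this helper.

```python
import configparser

# Default taken from this specification: 300 seconds.
DEFAULT_SYNC_INTERVAL = 300

# Hypothetical option name; the specification does not define it.
OPTION_NAME = "db_synchronization_interval"


def load_sync_interval(conf_text):
    """Return the synchronization interval in seconds from INI-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # Fall back to the default when the option is not set.
    interval = parser.getint("DEFAULT", OPTION_NAME,
                             fallback=DEFAULT_SYNC_INTERVAL)
    if interval <= 0:
        raise ValueError(f"{OPTION_NAME} must be a positive number of seconds")
    return interval
```

With the text ``[DEFAULT]`` followed by ``db_synchronization_interval = 60`` the helper returns 60; with an empty configuration it returns the default. Rejecting non-positive values is a design choice of this sketch, made so that a misconfigured timer fails loudly.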
+
+References
+==========
+
+None