Added lessons learnt document

The lessons learnt document was created on Etherpad. This patch takes all
the content from that document, edits and reformats it, and adds a few more
items I learnt from my own experience of developing these workload scripts.

Change-Id: I3ab9cacd6369c3cbcfc912707a9e844ae7c4fc2d
Tong Li 2 years ago
parent
commit
bdaa492657
1 changed files with 112 additions and 0 deletions
      doc/source/lessonslearnt.rst

@@ -0,0 +1,112 @@
+Tooling
+-------
+
+For interoperable automated deployment, Ansible together with the Ansible
+OpenStack cloud modules (based on OpenStack Shade) provided the best results.
+
+Terraform and its OpenStack provider were also tried, but several
+issues surfaced. Terraform did not support multiple endpoints for the
+same service, so deployments failed on the many clouds that expose
+multiple endpoints for different Nova or Neutron versions, even though
+multiple endpoints are necessary for different versions of OpenStack
+client applications. Terraform also does not allow the apply (create)
+and destroy (remove) actions to be used in the same run, which is often
+needed. For example, a deployment may need a floating IP at the start
+(the apply action in Terraform) and want to remove it at the end (the
+destroy action in Terraform), so the floating IP, which is a resource
+in Terraform, exists only for a short period of time. Terraform cannot
+really handle this situation. This is probably the most unforgiving
+restriction of Terraform. The Interop Challenge working group could not
+find a workaround to overcome it. It also appears that these issues
+have been identified but have not been actively addressed by the
+Terraform community.
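
One way around the create-then-remove limitation described above is to call Shade directly from Python. The following is a minimal sketch, not the working group's method: `temporary_floating_ip` is a hypothetical helper name, and it assumes `cloud` is a Shade `OpenStackCloud` object (or anything exposing the same `create_floating_ip` and `delete_floating_ip` methods).

```python
from contextlib import contextmanager


@contextmanager
def temporary_floating_ip(cloud, network=None):
    """Allocate a floating IP and release it when the block exits.

    `cloud` is assumed to behave like a Shade OpenStackCloud object.
    `network` is the name or ID of the external network, if one is needed.
    """
    fip = cloud.create_floating_ip(network=network)
    try:
        yield fip
    finally:
        # Runs even if the deployment steps inside the block fail, so the
        # address does not stay allocated after the workload is done.
        cloud.delete_floating_ip(fip["id"])
```

With Shade installed, `cloud` could come from `shade.openstack_cloud(cloud="mycloud")`, where `mycloud` is a placeholder for an entry in your own clouds.yaml. The deployment steps that need the address then go inside a `with temporary_floating_ip(cloud) as fip:` block.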
+
+OpenStack Heat was also discussed, but since Heat adoption is still not
+widespread, it was not used. Other tools such as Murano and Juju were set
+aside for similar reasons.
+
+It is perhaps worth noting that the Ansible OpenStack cloud modules
+are based on OpenStack Shade, a library that was written explicitly
+to work around some interop problems. So we can essentially have some
+degree of interop as long as there is an interop layer between us and
+the cloud (although the aim should be not to need this library).
+Tooling is therefore a very important subject in the Interop
+Challenge.
+
+Shade seems to be missing an availability zone (AZ) parameter for
+create_keypair (Ansible's os_keypair) and other functions, which can cause
+problems on clouds with multiple AZs per region.
+
+
+Networking
+----------
+
+Network virtualization features are where most interoperability issues become
+visible. OpenStack Neutron supports a very large number of plugins, and these
+plugins can behave very differently. For example, private IP and floating IP
+support varies: some clouds report a publicly accessible IP address as a
+private IP address when it is returned from the client library, while others
+report it as a public IP address. The latter seems to be the right behavior,
+but clouds implement it differently. Layer 2 and layer 3 functions can also be
+a challenge: some clouds do not let customers create routers or networks.
+Releasing allocated floating IPs is completely missing from the OpenStack
+cloud modules of tools like Ansible and Terraform, so allocated floating IPs
+are left hanging around. This is especially bad for clouds which have only a
+small public IP address segment.
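
Because clouds label addresses inconsistently, a workload can inspect the address itself rather than trust the public/private label it came with. The following is a minimal sketch using only the Python standard library. `pick_reachable_address` is a hypothetical helper name, and treating "globally routable" as "publicly reachable" is an assumption that will not hold on every cloud.

```python
import ipaddress


def pick_reachable_address(addresses):
    """Prefer a globally routable address; fall back to the first one given.

    `addresses` is a list of IP address strings collected from the server
    record, regardless of whether the cloud called them public or private.
    """
    for address in addresses:
        if ipaddress.ip_address(address).is_global:
            return address
    # Nothing looks globally routable: use whatever the cloud reported first.
    return addresses[0] if addresses else None
```

The fallback keeps the workload usable on clouds that attach instances directly to a routable private range.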
+
+Not all clouds provide tenant networks by default. Be prepared to configure
+your own where a tenant network can be had.
+
+Do not assume the first NIC on the guest is going to be eth0. This is common
+on older guest OSes that predate Predictable Network Interface Names and
+systemd, but likely is not true on newer guest OSes. Instead, allow the user
+to set the interface names as parameters to the workload, or try to detect
+them in the workload when a NIC is needed.
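
The parameter-first, detect-second approach can be sketched as follows. The function names are hypothetical, and taking the first non-loopback interface (assuming the loopback is named `lo`, as on Linux) is a heuristic rather than a guarantee.

```python
import socket


def list_nics():
    """Return the interface names known to the guest OS."""
    return [name for _index, name in socket.if_nameindex()]


def choose_nic(available, requested=None):
    """Pick the NIC to use, honouring a user-supplied name when given."""
    if requested is not None:
        if requested not in available:
            raise ValueError("interface %r not found in %r" % (requested, available))
        return requested
    for name in available:
        if name != "lo":  # assumption: the loopback interface is named "lo"
            return name
    raise ValueError("no non-loopback interface found")
```

A workload would call `choose_nic(list_nics(), requested=user_value)`, where `user_value` is whatever interface parameter the workload exposes (or `None` when the user did not set one).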
+
+Not all clouds support floating IPs or private IPs. You may want to structure
+your workload so that it can adapt, either attaching instances to a routable
+network or using floating IPs, based on the parameters it is given.
+
+A tenant network has its advantages when the communication is server to
+server on the same network. For example, when your deployment scenario
+involves multiple backend servers such as database and application servers,
+the communication between these servers can be placed on the tenant network
+to improve security and performance.
+
+
+Provisioning
+------------
+
+It makes a real difference not only what hardware the cloud is running on,
+but also whether the backend is Ceph or something else, whether it is
+co-located, whether the images have any sort of overhead checks, and so on.
+
+If you don't assume a particular guest OS image, be careful with
+storage and networking. We encountered one example in which a particular
+guest OS/virtual adapter pair needed to rescan the SCSI bus before it would
+recognize a newly attached Cinder volume. Rescanning the bus is generally
+harmless if not needed and ensures that images built with adapter types that
+need it run successfully, so it's an example of something you can do to make
+your workloads more interoperable.
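
On Linux guests, a bus rescan is commonly triggered by writing `- - -` to each adapter's `scan` file under `/sys/class/scsi_host`. The following sketch assumes a Linux guest and root privileges. `rescan_scsi_hosts` is a hypothetical helper name, and the `sysfs_root` parameter exists only so the function can be exercised against a scratch directory.

```python
import glob
import os


def rescan_scsi_hosts(sysfs_root="/sys/class/scsi_host"):
    """Ask every SCSI host adapter to rescan its bus.

    Returns the list of scan files that were written, which is empty on
    guests that have no SCSI host adapters at all.
    """
    written = []
    pattern = os.path.join(sysfs_root, "host*", "scan")
    for scan_file in sorted(glob.glob(pattern)):
        with open(scan_file, "w") as handle:
            # "- - -" is the wildcard for channel, target and LUN.
            handle.write("- - -\n")
        written.append(scan_file)
    return written
```

Calling this right after attaching a volume and before looking for the new device costs little on adapters that do not need it.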
+
+Parameterize things that are likely to change across different cloud/guest
+OS setups. For example, do not assume the first volume attached to a guest
+will always be /dev/vdb (this is common but not guaranteed on libvirt, and
+often untrue on other hypervisors).
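
Instead of hard-coding /dev/vdb, a workload can accept the device as a parameter and otherwise compare the guest's block devices before and after the attach. This is a minimal sketch assuming a Linux guest that lists block devices under `/sys/block`; all function names are hypothetical.

```python
import os


def list_block_devices(sys_block="/sys/block"):
    """Return the set of block device names the guest currently sees."""
    return set(os.listdir(sys_block))


def newly_attached(before, after):
    """Return device paths present in `after` but not in `before`."""
    return sorted("/dev/" + name for name in set(after) - set(before))


def resolve_volume_device(before, after, configured=None):
    """Use the user-configured device if given, else the single new device."""
    if configured:
        return configured
    new_devices = newly_attached(before, after)
    if len(new_devices) != 1:
        raise RuntimeError("expected one new device, found %r" % (new_devices,))
    return new_devices[0]
```

The workload would take one `list_block_devices()` snapshot before attaching the volume and a second one afterwards, then pass both to `resolve_volume_device`.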
+
+
+Metadata
+--------
+
+Not all clouds support cloud-init. Workloads that rely heavily on metadata
+services will fail on clouds without metadata support.
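
A workload can probe for the metadata service and fall back to explicit parameters when it is absent, instead of failing outright. The following is a minimal sketch under some assumptions: the URL is the conventional link-local address of the OpenStack metadata service, `fetch_metadata` is a hypothetical helper name, and the `opener` parameter exists only so the function can be exercised without a network.

```python
import json
import urllib.request

# Conventional OpenStack metadata endpoint; not every cloud serves it.
METADATA_URL = "http://169.254.169.254/openstack/latest/meta_data.json"


def fetch_metadata(url=METADATA_URL, timeout=2.0, opener=urllib.request.urlopen):
    """Return the instance metadata as a dict, or None if it is unavailable."""
    try:
        with opener(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers unreachable hosts and timeouts; ValueError covers
        # responses that are not valid JSON.
        return None
```

When the function returns `None`, the workload can switch to values passed in as ordinary parameters.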
+
+
+Conclusion
+----------
+
+With best practices, it is possible to create enterprise applications (with
+enterprise characteristics such as a load balancer, multiple web application
+servers, a distributed database, security groups, and block storage, to
+provide enterprise-level networking safeguards) that are portable to
+numerous (over 18) private and public OpenStack clouds.
