Use case analysis for Golang addition to Openstack

This resolution provides the use case analysis (step 1)
for the addition of Go (aka golang) as a supported
language under the OpenStack governance model based
on the requirements of the Swift object server and its
data consistency engine.

Change-Id: I001e587b7b04491e748f528b84055ec62364654c
Thiago da Silva 2017-03-29 14:00:40 -04:00 committed by Thierry Carrez
parent 5ddcdce53a
commit 63555d58fe
2 changed files with 118 additions and 0 deletions


@@ -0,0 +1,109 @@
2017-03-29 Use case for the addition of Go as a supported language
===================================================================

In a previous resolution titled :doc:`20150901-programming-languages`, the TC
determined that the supported languages in OpenStack are: bash, JavaScript and
Python. Furthermore, that document also recognized that it was not wise to
limit OpenStack service projects to only those three languages in the future,
but it never went as far as determining how new languages could be supported. A
new document: :doc:`Requirements for language additions to the OpenStack
Ecosystem <../reference/new-language-requirements>`, was recently introduced to
define a process in which new languages could be added as supported languages
in OpenStack; it calls for a two step process in which the first step is to
review and agree on the technical needs for a new language, and the second step
to a meet a minimum number of requirements to support the new language in the
OpenStack ecosystem.
This resolution provides the use case analysis (step 1) for the addition of Go
(aka golang) as a supported language under the OpenStack governance model based
on the requirements of the Swift object server and its data consistency engine.

Technical requirements
----------------------

In an e-mail thread last summer [1]_, Samuel Merritt provided an extensive
explanation of the challenges the Swift team has encountered with disk I/O
performance and of the limitations of previously attempted solutions. What
follows is a summary of what he detailed:

The Swift object server is responsible for handling multiple client connections
concurrently, both reading and writing data to disk. While Eventlet is very
good at reading and writing data on network sockets, reading and writing data
on disk can be very slow, because the calling thread or process is blocked
waiting for the kernel to return.

With Eventlet, when a greenthread tries to read from a socket and the socket is
not readable, the Eventlet hub steps in, un-schedules the greenthread, finds an
un-blocked one, and lets it proceed. When a greenthread tries to read from a
file, the read() call doesn't return until the data is in the process's memory.
If the data isn't in the buffer cache and the kernel has to go fetch it from a
spinning disk, it can take on average around 7 ms of seek time. Eventlet is not
able to un-schedule greenthreads that are reading from disk, so the calling
process blocks until the kernel reads the data from the disk.

For Swift object servers, which may be running on machines with 40, 60 or even
90 disks, all these little waits drive throughput down to unacceptable levels:
with a single process blocking for roughly 7 ms on every cache miss, the whole
server is limited to on the order of 140 such reads per second, no matter how
many disks it has. To make matters worse, if one of the disks starts failing,
the wait on that one disk can go up to dozens or even hundreds of milliseconds,
causing the object server to block service to all disks.

The Swift community has tried for years to solve this problem with Python. One
attempt was to use a threadpool with a couple of I/O threads per disk, which
helped mitigate the problem with slow disks. The issue with this approach was
the threadpool overhead: it helps for system calls that actually go out to
disk, but it slows down access to data that is already in the buffer cache, so
in reality the threadpool came at a great cost to overall throughput; in some
cases users saw a 25% drop in throughput. Another solution was to run a
separate object server process per disk. That also helped with slow disks and
needs no thread pool, but for super-dense servers with many dozens of disks it
meant that memory consumption spiked, limiting the memory available for the
filesystem.

Other solutions have been discussed, but ended up being rejected. Using Linux
AIO (kernel AIO, not POSIX libaio) would let the object server have many
pending I/Os cheaply, but it only works in ``O_DIRECT`` mode, which requires
memory buffers to be aligned, which is not possible in Python. Libuv is a new
and promising, yet unproven, option; however, there are no Python libraries
yet that support async disk I/O calls [2]_ [3]_, and it would still require
the Swift team to rewrite the object server, leaving a full solution years
away.

Proposed solution
-----------------

The solution to this problem is being able to issue I/O calls that do not block
the whole process. The Go runtime would help mitigate the filesystem I/O
problem because it can run blocking system calls in dedicated OS threads in a
lightweight fashion, allowing one Go object server process to perform I/O
calls against many disks without blocking the whole process. It solves the I/O
throughput problem without causing a high spike in memory consumption.

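As a rough illustration of that model (a minimal sketch, not Hummingbird or
Swift code; the paths, the per-disk limit, and the ``readObject`` helper are
made up for the example), the snippet below has one process issue plain
blocking reads against several disks from lightweight goroutines. When a read
blocks in the kernel, only the goroutine that issued it waits; the runtime
keeps scheduling the other goroutines on other OS threads, and a small
buffered channel per disk bounds how many reads a slow drive can tie up at
once.

.. code-block:: go

    package main

    import (
        "fmt"
        "os"
        "path/filepath"
        "sync"
    )

    // perDiskLimit bounds concurrent reads per disk so one slow or failing
    // drive cannot absorb every worker (illustrative value).
    const perDiskLimit = 4

    // readObject does a plain blocking read; only the calling goroutine
    // waits while the kernel services the request.
    func readObject(path string, slots chan struct{}) ([]byte, error) {
        slots <- struct{}{}        // take a slot for this disk
        defer func() { <-slots }() // give it back when the read finishes
        return os.ReadFile(path)
    }

    func main() {
        // Assumed mount points, one per disk.
        disks := []string{"/srv/node/d1", "/srv/node/d2", "/srv/node/d3"}

        var wg sync.WaitGroup
        for _, disk := range disks {
            slots := make(chan struct{}, perDiskLimit)
            for i := 0; i < 8; i++ { // several concurrent requests per disk
                wg.Add(1)
                go func(disk string, i int) {
                    defer wg.Done()
                    path := filepath.Join(disk, fmt.Sprintf("obj-%d", i))
                    if _, err := readObject(path, slots); err != nil {
                        fmt.Println("read failed:", err)
                    }
                }(disk, i)
            }
        }
        wg.Wait()
    }

Goroutines waiting on disk cost only a few kilobytes of stack each, which is
why this pattern avoids the per-process memory overhead of the
one-object-server-per-disk approach described above.
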
The Rackspace Cloud Files team started experimenting with Go, and from that
the Hummingbird project was born. Today, Hummingbird serves as a very good
proof of concept that all the problems mentioned above can be solved in a
timely manner. Yes, the Swift team will still need to rewrite the object
server, but a significant amount of that work has already been done, and it
has already been shown to work in production with excellent performance and
scalability results [4]_.

The Swift community believes the reasons stated above satisfy the first step
of the :doc:`Requirements for language additions to the OpenStack Ecosystem
<../reference/new-language-requirements>` resolution to add golang as a
supported language in the OpenStack ecosystem. Furthermore, we look forward to
working with the rest of the OpenStack community on the second step once this
resolution is approved.

.. [1]
.. [2]
.. [3]
.. [4]


@@ -7,6 +7,15 @@
When a motion does not result in a change in a reference doc, it can
be expressed as a resolution.

.. toctree::
   :maxdepth: 1