Upgrade Apache Kafka client
Currently in all Python Monasca components the copy of `kafka-python` library in version 0.9.5 (released on Feb 16, 2016) is used. This specification describes the process of upgrading the Apache Kafka client to `confluent-kafka-python`. This will improve the performance and reliability. Sticking with the old frozen client version is also unacceptable in terms of security.

Change-Id: I59f3effcdba39199d61d70a201d8e760840d3627
Story: 2003705
Task: 26360
committed by
Doug Szumski

parent
c13558ba09
commit
a4674a5e7f
@@ -27,6 +27,7 @@ Here you can find the specs, and spec template, for each release:

   specs/queens/index
   specs/rocky/index
   specs/stein/index

There are also some approved backlog specifications that are looking for
owners:

1
doc/source/specs/stein/approved
Symbolic link
@@ -0,0 +1 @@
../../../../specs/stein/approved

1
doc/source/specs/stein/implemented
Symbolic link
@@ -0,0 +1 @@
../../../../specs/stein/implemented

26
doc/source/specs/stein/index.rst
Normal file
@@ -0,0 +1,26 @@
=============================
Monasca Stein Specifications
=============================

Template:

.. toctree::
   :maxdepth: 1

   Specification Template (Stein release) <template>

Stein implemented specs:

.. toctree::
   :glob:
   :maxdepth: 1

.. implemented/*

Stein approved (but not implemented) specs:

.. toctree::
   :glob:
   :maxdepth: 1

   approved/*

1
doc/source/specs/stein/template.rst
Symbolic link
@@ -0,0 +1 @@
../../../../specs/stein-template.rst

193
specs/stein/approved/upgrade-kafka-client.rst
Normal file
@@ -0,0 +1,193 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

===========================
Upgrade Apache Kafka client
===========================

Include the URL of your story:

https://storyboard.openstack.org/#!/story/2003705

All Python Monasca components currently use a copy of the `kafka-python`
library in version 0.9.5 (released on Feb 16, 2016) [1]_. This specification
describes the process of upgrading the Apache Kafka client to
`confluent-kafka-python` [2]_. The upgrade is expected to improve performance
and reliability. Staying with the old, frozen client version is also
unacceptable in terms of security.

Problem description
===================

The use of `KeyedProducer` and `SimpleConsumer` in the `kafka-python` library
has been deprecated as of version 1.0.0 [3]_. Continued use of this code poses
a security risk. Additionally, profiling of ``monasca-persister`` has shown
that most of the time is spent consuming Kafka messages [7]_. Thus, there is
great potential for improving overall Monasca performance by upgrading the
Kafka client.

Proposed change
===============

The wiki page hosted by the Apache Software Foundation lists the available
Python clients [4]_. There are currently three actively maintained and
supported clients: `confluent-kafka-python`, `kafka-python` and `pykafka`.
Several benchmarks [5]_, [6]_ have shown that the client maintained by
Confluent is both the fastest and the most complete.

Using an asynchronous producer brings a significant performance improvement
(~50x). Sending messages asynchronously requires more care to avoid
duplicating the persisted data, but the performance gain justifies it.

`confluent-kafka-python` is also the only client that offers support for
Apache Avro serialization, which reduces the size of messages and thus
further speeds up communication.

The proposed change includes using:

* `confluent-kafka-python` library
* in asynchronous mode

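The asynchronous mode proposed above could look roughly like the following
minimal sketch. It is an illustration, not the planned implementation: the
helper names `make_producer` and `publish_async` are hypothetical, while
`produce()`, `poll()` and `flush()` are the producer methods offered by
`confluent-kafka-python`. The library is imported lazily so that the helper
can also be exercised with a stand-in object when no broker is available.

```python
from typing import Any, List, Tuple


def make_producer(bootstrap_servers: str) -> Any:
    """Create a real producer (needs confluent-kafka-python and librdkafka)."""
    from confluent_kafka import Producer  # lazy import, see lead-in

    return Producer({"bootstrap.servers": bootstrap_servers})


def publish_async(producer: Any, topic: str, messages: List[bytes],
                  flush_timeout: float = 10.0) -> List[Tuple[bytes, str]]:
    """Queue all messages without waiting for individual acknowledgements.

    Returns (payload, error) pairs for messages whose delivery report
    carried an error, so the caller can decide how to handle them.
    """
    failed: List[Tuple[bytes, str]] = []

    def on_delivery(err, msg):
        # Called from poll()/flush() once per message.
        if err is not None:
            failed.append((msg.value(), str(err)))

    for payload in messages:
        try:
            producer.produce(topic, value=payload, on_delivery=on_delivery)
        except BufferError:
            # Local queue is full: serve delivery reports, then retry once.
            producer.poll(1.0)
            producer.produce(topic, value=payload, on_delivery=on_delivery)
        producer.poll(0)  # serve any delivery reports that are ready
    producer.flush(flush_timeout)  # wait for outstanding deliveries
    return failed
```

Blindly resending the returned messages gives at-least-once delivery, which
is exactly where duplicates can come from, so the consuming side has to
tolerate them.
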
Code changes will affect the following components:

* monasca-common
* monasca-{log,events}-api
* monasca-persister
* monasca-notification
* monasca-transform

Java components (`monasca-thresh` and `monasca-persister`) are out of scope of
this specification. Upgrading the client in these components should be handled
separately.

`confluent-kafka-python` has an external dependency on `librdkafka`, a finely
tuned C client.

Alternatives
------------

* `pykafka`
* new version of `kafka-python`
* use synchronous mode

Data model impact
-----------------

No data model impact.

REST API impact
---------------

No REST API impact.

Security impact
---------------

This change will improve security by removing deprecated and unmaintained
code.

Other end user impact
---------------------

No end user impact.

Performance Impact
------------------

This change should dramatically improve the performance of the complete
solution. In particular, the performance of `monasca-persister` and
`monasca-api` is expected to improve.

Other deployer impact
---------------------

New libraries should be packaged and deployed:

* `confluent-kafka-python`
* `librdkafka`

Developer impact
----------------

`confluent-kafka-python` has to be used instead of `kafka-python` in all
affected components.

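For developers, the consuming side of such a migration might look like the
sketch below. It assumes a poll-based loop with offsets committed only after
a batch has been processed. The helper names `make_consumer`, `read_batch`
and `process_and_commit` are hypothetical, as are the chosen configuration
values; `Consumer`, `subscribe()`, `poll()` and `commit()` are part of
`confluent-kafka-python`.

```python
from typing import Any, Callable, List


def make_consumer(bootstrap_servers: str, group_id: str, topic: str) -> Any:
    """Create a real consumer (needs confluent-kafka-python and librdkafka)."""
    from confluent_kafka import Consumer  # lazy import

    consumer = Consumer({
        "bootstrap.servers": bootstrap_servers,
        "group.id": group_id,
        "enable.auto.commit": False,      # commit manually after processing
        "auto.offset.reset": "earliest",  # illustrative choice
    })
    consumer.subscribe([topic])
    return consumer


def read_batch(consumer: Any, max_messages: int = 100,
               poll_timeout: float = 1.0) -> List[bytes]:
    """Collect up to max_messages payloads, stopping when poll() times out."""
    batch: List[bytes] = []
    while len(batch) < max_messages:
        msg = consumer.poll(poll_timeout)
        if msg is None:   # nothing arrived within the timeout
            break
        if msg.error():   # error events are simply skipped in this sketch
            continue
        batch.append(msg.value())
    return batch


def process_and_commit(consumer: Any,
                       handle_batch: Callable[[List[bytes]], None],
                       max_messages: int = 100,
                       poll_timeout: float = 1.0) -> int:
    """Process one batch, then commit the consumed offsets synchronously."""
    batch = read_batch(consumer, max_messages, poll_timeout)
    if batch:
        handle_batch(batch)
        consumer.commit(asynchronous=False)
    return len(batch)
```

Committing after processing means a crash between the two steps replays the
batch, so `handle_batch` has to cope with messages it has already seen.
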
Implementation
==============

Assignee(s)
-----------

Primary assignee:
  witek

Other contributors:
  <>

Work Items
----------

* remove code using `pykafka`
* remove `pykafka` from requirements and lower-constraints
* add `confluent-kafka-python` to global-requirements
* implement common routines in `monasca-common`
* use new code in:

  * monasca-{log,events}-api
  * monasca-persister
  * monasca-notification
  * monasca-transform

* delete old deprecated code

Dependencies
============

New packages have to be built for:

* `confluent-kafka-python`
* `librdkafka`

Testing
=======

We should test the implementation using the existing integration tests
(tempest). Additionally, we should test the scenario in which the producer
fails to receive a response from Kafka for some of the messages in a bulk.
Duplicate entries must not be created in the database in this case.

The implementation should be followed by executing the following tests on the
complete stack:

* stress
* endurance
* performance

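The lost-acknowledgement scenario from this section can be explored without
a broker. The sketch below is only an illustration under stated assumptions:
`LossyAckBroker`, `resend_unacknowledged` and `IdempotentStore` are
hypothetical helpers, and the `id` field is an assumed deduplication key
that is not part of any Monasca message format described here. The sketch
shows that resending unacknowledged messages puts duplicates into the log,
and that persisting by a unique key keeps them out of the database.

```python
import json
from typing import Callable, Dict, Iterable, List, Set


class LossyAckBroker:
    """In-memory stand-in for Kafka that stores every message but
    drops the first acknowledgement for the selected message ids."""

    def __init__(self, drop_ack_for: Iterable[int]):
        self.log: List[bytes] = []
        self._drop: Set[int] = set(drop_ack_for)

    def send(self, raw: bytes) -> bool:
        self.log.append(raw)  # the message is written either way
        msg_id = json.loads(raw)["id"]
        if msg_id in self._drop:
            self._drop.discard(msg_id)
            return False      # acknowledgement lost
        return True


def resend_unacknowledged(send: Callable[[bytes], bool],
                          messages: List[bytes],
                          max_attempts: int = 3) -> List[bytes]:
    """Resend messages until acknowledged; return those still pending."""
    pending = list(messages)
    for _ in range(max_attempts):
        pending = [m for m in pending if not send(m)]
        if not pending:
            break
    return pending


class IdempotentStore:
    """Stand-in for the database: one row per unique message id."""

    def __init__(self):
        self.rows: Dict[int, dict] = {}

    def persist(self, raw: bytes) -> bool:
        record = json.loads(raw)
        if record["id"] in self.rows:
            return False      # duplicate, ignored
        self.rows[record["id"]] = record
        return True


messages = [json.dumps({"id": i, "value": i * 10}).encode() for i in range(5)]
broker = LossyAckBroker(drop_ack_for={1, 3})
unsent = resend_unacknowledged(broker.send, messages)

store = IdempotentStore()
for raw in broker.log:
    store.persist(raw)

print(len(broker.log), len(store.rows))  # log holds duplicates, store does not
```

A real test would replace the stand-ins with the actual producer and
database, and would need a deduplication key that fits the real message
format.
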
Documentation Impact
====================

No documentation impact.

References
==========

.. [1] https://github.com/dpkp/kafka-python/releases/tag/v0.9.5
.. [2] https://github.com/confluentinc/confluent-kafka-python
.. [3] https://github.com/dpkp/kafka-python/blob/master/docs/changelog.rst#100-feb-15-2016
.. [4] https://cwiki.apache.org/confluence/display/KAFKA/Clients#Clients-Python
.. [5] https://github.com/monasca/monasca-perf/blob/master/kafka_python_client_perf/monascaInvestigationKafkaPythonAPIs.md
.. [6] http://activisiongamescience.github.io/2016/06/15/Kafka-Client-Benchmarking/
.. [7] http://git.openstack.org/cgit/openstack/monasca-persister/commit/?id=a7112fd30bd545dd850e0e267dcceb9ea27551ad

History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Stein
     - Introduced