deb-python-cassandra-driver/docs/performance.rst
Adam Holmberg f2e9bb36b9 Update performance notes doc: +cython, -numbers
remove outdated, specific numbers

Having numbers for contrived workloads sometimes confuses people or
leads to impressions of false claims.

Unified benchmarks at DataStax will replace these.
2015-07-24 16:26:46 -05:00


Performance Notes

The Python driver for Cassandra offers several methods for executing queries. You can block synchronously until a query completes with Session.execute(), obtain an asynchronous request future from Session.execute_async(), and attach a callback to that future with ResponseFuture.add_callback().
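As a minimal sketch of the three styles named above, the helpers below take an already connected session as an argument. The helper names and the query in demo() are illustrative assumptions, not part of the driver; only execute(), execute_async(), result(), add_callback() and add_errback() are driver calls.

```python
def run_sync(session, query):
    """Block until the query completes and return the rows as a list."""
    return list(session.execute(query))


def run_async(session, query):
    """Start the query, then block on the future only when rows are needed."""
    future = session.execute_async(query)
    # Other work could happen here while the request is in flight.
    return list(future.result())


def run_with_callback(session, query, on_rows, on_error):
    """Start the query and let the driver call back when it finishes."""
    future = session.execute_async(query)
    future.add_callback(on_rows)   # invoked with the result rows on success
    future.add_errback(on_error)   # invoked with the exception on failure
    return future


def demo():
    """Not called automatically. Assumes a node is reachable on 127.0.0.1."""
    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect()
    query = "SELECT release_version FROM system.local"
    print(run_sync(session, query))
    print(run_async(session, query))
    run_with_callback(session, query, print, print).result()
    cluster.shutdown()
```

Callbacks run on the driver's event loop thread, so they should stay short and avoid blocking work.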

Examples of multiple request patterns can be found in the benchmark scripts included in the driver project.

The choice of execution pattern will depend on the application context. For applications dealing with multiple requests in a given context, the recommended pattern is to use concurrent asynchronous requests with callbacks. For many use cases, you don't need to implement this pattern yourself. cassandra.concurrent.execute_concurrent and cassandra.concurrent.execute_concurrent_with_args provide this pattern with a synchronous API and tunable concurrency.
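The following sketch shows one way execute_concurrent_with_args might be used for a batch of inserts. The demo.users table, its columns, and the helper names are hypothetical; the concurrency and raise_on_first_error keyword arguments are the driver's own.

```python
def split_results(results):
    """Separate (success, result_or_exception) pairs into results and errors."""
    successes = [value for ok, value in results if ok]
    failures = [value for ok, value in results if not ok]
    return successes, failures


def insert_users(session, users, concurrency=50):
    """Insert (id, name) tuples with at most `concurrency` requests in flight.

    Assumes a hypothetical table demo.users (id, name) already exists.
    """
    from cassandra.concurrent import execute_concurrent_with_args

    insert = session.prepare("INSERT INTO demo.users (id, name) VALUES (?, ?)")
    results = execute_concurrent_with_args(
        session,
        insert,
        users,
        concurrency=concurrency,
        raise_on_first_error=False,  # collect failures instead of raising
    )
    return split_results(results)
```

With raise_on_first_error=False, each failed request appears in the result list as a (False, exception) pair, so the caller can decide whether to retry or report it.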

Due to the GIL and limited concurrency, the driver can quickly become CPU-bound. The sections below discuss further runtime and design considerations for mitigating this limitation.

PyPy

PyPy is an alternative Python runtime which uses a JIT compiler to reduce CPU consumption. This substantially improves driver performance, more than doubling throughput for many workloads.

Cython Extensions

Cython is an optimizing compiler and language that can be used to compile the driver's core files and optional extensions. Cython is not a strict dependency, but the extensions will be built by default if Cython is present on the Python path when the driver is installed. To include Cython as a requirement, install the driver with the extra name cython:

$ pip install cassandra-driver[cython]
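To check whether an installation actually picked up the compiled build, one rough approach is to ask the import system where a module would be loaded from. The helper below is a generic sketch using only the standard library; the assumption that cassandra.cluster is among the Cython-compiled core files should be verified against your driver version.

```python
import importlib.machinery
import importlib.util


def is_compiled_extension(module_name):
    """Return True if the module would load from a compiled extension file."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ImportError:
        # Raised when a parent package of a dotted name is not installed.
        return False
    if spec is None or not spec.origin:
        return False
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    return spec.origin.endswith(suffixes)


# Assumption: cassandra.cluster is one of the modules Cython compiles.
print(is_compiled_extension("cassandra.cluster"))
```

A result of False means either the driver is not installed or the pure-Python module is in use.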

multiprocessing

All of the patterns discussed above may be used over multiple processes using the multiprocessing module. Multiple processes will scale better than multiple threads, so if high throughput is your goal, consider this option.

Never share Cluster, Session, or ResponseFuture objects across multiple processes. These objects should all be created after forking the process, not before.

For further discussion and simple examples using the driver with multiprocessing, see this blog post.