403 lines
14 KiB
ReStructuredText
403 lines
14 KiB
ReStructuredText
Getting Started
|
|
===============
|
|
|
|
First, make sure you have the driver properly :doc:`installed <installation>`.
|
|
|
|
Connecting to Cassandra
|
|
-----------------------
|
|
Before we can start executing any queries against a Cassandra cluster we need to setup
|
|
an instance of :class:`~.Cluster`. As the name suggests, you will typically have one
|
|
instance of :class:`~.Cluster` for each Cassandra cluster you want to interact
|
|
with.
|
|
|
|
The simplest way to create a :class:`~.Cluster` is like this:
|
|
|
|
.. code-block:: python
|
|
|
|
from cassandra.cluster import Cluster
|
|
|
|
cluster = Cluster()
|
|
|
|
This will attempt to connection to a Cassandra instance on your
|
|
local machine (127.0.0.1). You can also specify a list of IP
|
|
addresses for nodes in your cluster:
|
|
|
|
.. code-block:: python
|
|
|
|
from cassandra.cluster import Cluster
|
|
|
|
cluster = Cluster(['192.168.0.1', '192.168.0.2'])
|
|
|
|
The set of IP addresses we pass to the :class:`~.Cluster` is simply
|
|
an initial set of contact points. After the driver connects to one
|
|
of these nodes it will *automatically discover* the rest of the
|
|
nodes in the cluster and connect to them, so you don't need to list
|
|
every node in your cluster.
|
|
|
|
If you need to use a non-standard port, use SSL, or customize the driver's
|
|
behavior in some other way, this is the place to do it:
|
|
|
|
.. code-block:: python
|
|
|
|
from cassandra.cluster import Cluster
|
|
from cassandra.policies import DCAwareRoundRobinPolicy
|
|
|
|
cluster = Cluster(
|
|
['10.1.1.3', '10.1.1.4', '10.1.1.5'],
|
|
load_balancing_policy=DCAwareRoundRobinPolicy(local_dc='US_EAST'),
|
|
port=9042)
|
|
|
|
|
|
You can find a more complete list of options in the :class:`~.Cluster` documentation.
|
|
|
|
Instantiating a :class:`~.Cluster` does not actually connect us to any nodes.
|
|
To establish connections and begin executing queries we need a
|
|
:class:`~.Session`, which is created by calling :meth:`.Cluster.connect()`:
|
|
|
|
.. code-block:: python
|
|
|
|
cluster = Cluster()
|
|
session = cluster.connect()
|
|
|
|
The :meth:`~.Cluster.connect()` method takes an optional ``keyspace`` argument
|
|
which sets the default keyspace for all queries made through that :class:`~.Session`:
|
|
|
|
.. code-block:: python
|
|
|
|
cluster = Cluster()
|
|
session = cluster.connect('mykeyspace')
|
|
|
|
|
|
You can always change a Session's keyspace using :meth:`~.Session.set_keyspace` or
|
|
by executing a ``USE <keyspace>`` query:
|
|
|
|
.. code-block:: python
|
|
|
|
session.set_keyspace('users')
|
|
# or you can do this instead
|
|
session.execute('USE users')
|
|
|
|
|
|
Executing Queries
|
|
-----------------
|
|
Now that we have a :class:`.Session` we can begin to execute queries. The simplest
|
|
way to execute a query is to use :meth:`~.Session.execute()`:
|
|
|
|
.. code-block:: python
|
|
|
|
rows = session.execute('SELECT name, age, email FROM users')
|
|
for user_row in rows:
|
|
print user_row.name, user_row.age, user_row.email
|
|
|
|
This will transparently pick a Cassandra node to execute the query against
|
|
and handle any retries that are necessary if the operation fails.
|
|
|
|
By default, each row in the result set will be a
|
|
`namedtuple <http://docs.python.org/2/library/collections.html#collections.namedtuple>`_.
|
|
Each row will have a matching attribute for each column defined in the schema,
|
|
such as ``name``, ``age``, and so on. You can also treat them as normal tuples
|
|
by unpacking them or accessing fields by position. The following three
|
|
examples are equivalent:
|
|
|
|
.. code-block:: python
|
|
|
|
rows = session.execute('SELECT name, age, email FROM users')
|
|
for row in rows:
|
|
print row.name, row.age, row.email
|
|
|
|
.. code-block:: python
|
|
|
|
rows = session.execute('SELECT name, age, email FROM users')
|
|
for (name, age, email) in rows:
|
|
print name, age, email
|
|
|
|
.. code-block:: python
|
|
|
|
rows = session.execute('SELECT name, age, email FROM users')
|
|
for row in rows:
|
|
print row[0], row[1], row[2]
|
|
|
|
If you prefer another result format, such as a ``dict`` per row, you
|
|
can change the :attr:`~.Session.row_factory` attribute.
|
|
|
|
For queries that will be run repeatedly, you should use
|
|
`Prepared statements <#prepared-statements>`_.
|
|
|
|
Passing Parameters to CQL Queries
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
When executing non-prepared statements, the driver supports two forms of
|
|
parameter place-holders: positional and named.
|
|
|
|
Positional parameters are used with a ``%s`` placeholder. For example,
|
|
when you execute:
|
|
|
|
.. code-block:: python
|
|
|
|
session.execute(
|
|
"""
|
|
INSERT INTO users (name, credits, user_id)
|
|
VALUES (%s, %s, %s)
|
|
""",
|
|
("John O'Reilly", 42, uuid.uuid1())
|
|
)
|
|
|
|
It is translated to the following CQL query::
|
|
|
|
INSERT INTO users (name, credits, user_id)
|
|
VALUES ('John O''Reilly', 42, 2644bada-852c-11e3-89fb-e0b9a54a6d93)
|
|
|
|
Note that you should use ``%s`` for all types of arguments, not just strings.
|
|
For example, this would be **wrong**:
|
|
|
|
.. code-block:: python
|
|
|
|
session.execute("INSERT INTO USERS (name, age) VALUES (%s, %d)", ("bob", 42)) # wrong
|
|
|
|
Instead, use ``%s`` for the age placeholder.
|
|
|
|
If you need to use a literal ``%`` character, use ``%%``.
|
|
|
|
**Note**: you must always use a sequence for the second argument, even if you are
|
|
only passing in a single variable:
|
|
|
|
.. code-block:: python
|
|
|
|
session.execute("INSERT INTO foo (bar) VALUES (%s)", "blah") # wrong
|
|
session.execute("INSERT INTO foo (bar) VALUES (%s)", ("blah")) # wrong
|
|
session.execute("INSERT INTO foo (bar) VALUES (%s)", ("blah", )) # right
|
|
session.execute("INSERT INTO foo (bar) VALUES (%s)", ["blah"]) # right
|
|
|
|
|
|
Note that the second line is incorrect because in Python, single-element tuples
|
|
require a comma.
|
|
|
|
Named place-holders use the ``%(name)s`` form:
|
|
|
|
.. code-block:: python
|
|
|
|
session.execute(
|
|
"""
|
|
INSERT INTO users (name, credits, user_id, username)
|
|
VALUES (%(name)s, %(credits)s, %(user_id)s, %(name)s)
|
|
""",
|
|
{'name': "John O'Reilly", 'credits': 42, 'user_id': uuid.uuid1()}
|
|
)
|
|
|
|
Note that you can repeat placeholders with the same name, such as ``%(name)s``
|
|
in the above example.
|
|
|
|
Only data values should be supplied this way. Other items, such as keyspaces,
|
|
table names, and column names should be set ahead of time (typically using
|
|
normal string formatting).
|
|
|
|
.. _type-conversions:
|
|
|
|
Type Conversions
|
|
^^^^^^^^^^^^^^^^
|
|
For non-prepared statements, Python types are cast to CQL literals in the
|
|
following way:
|
|
|
|
.. table::
|
|
|
|
+--------------------+-------------------------+
|
|
| Python Type | CQL Literal Type |
|
|
+====================+=========================+
|
|
| ``None`` | ``NULL`` |
|
|
+--------------------+-------------------------+
|
|
| ``bool`` | ``boolean`` |
|
|
+--------------------+-------------------------+
|
|
| ``float`` | | ``float`` |
|
|
| | | ``double`` |
|
|
+--------------------+-------------------------+
|
|
| | ``int`` | | ``int`` |
|
|
| | ``long`` | | ``bigint`` |
|
|
| | | ``varint`` |
|
|
| | | ``smallint`` |
|
|
| | | ``tinyint`` |
|
|
| | | ``counter`` |
|
|
+--------------------+-------------------------+
|
|
| ``decimal.Decimal``| ``decimal`` |
|
|
+--------------------+-------------------------+
|
|
| | ``str`` | | ``ascii`` |
|
|
| | ``unicode`` | | ``varchar`` |
|
|
| | | ``text`` |
|
|
+--------------------+-------------------------+
|
|
| | ``buffer`` | ``blob`` |
|
|
| | ``bytearray`` | |
|
|
+--------------------+-------------------------+
|
|
| ``date`` | ``date`` |
|
|
+--------------------+-------------------------+
|
|
| ``datetime`` | ``timestamp`` |
|
|
+--------------------+-------------------------+
|
|
| ``time`` | ``time`` |
|
|
+--------------------+-------------------------+
|
|
| | ``list`` | ``list`` |
|
|
| | ``tuple`` | |
|
|
| | generator | |
|
|
+--------------------+-------------------------+
|
|
| | ``set`` | ``set`` |
|
|
| | ``frozenset`` | |
|
|
+--------------------+-------------------------+
|
|
| | ``dict`` | ``map`` |
|
|
| | ``OrderedDict`` | |
|
|
+--------------------+-------------------------+
|
|
| ``uuid.UUID`` | | ``timeuuid`` |
|
|
| | | ``uuid`` |
|
|
+--------------------+-------------------------+
|
|
|
|
|
|
Asynchronous Queries
|
|
^^^^^^^^^^^^^^^^^^^^
|
|
The driver supports asynchronous query execution through
|
|
:meth:`~.Session.execute_async()`. Instead of waiting for the query to
|
|
complete and returning rows directly, this method almost immediately
|
|
returns a :class:`~.ResponseFuture` object. There are two ways of
|
|
getting the final result from this object.
|
|
|
|
The first is by calling :meth:`~.ResponseFuture.result()` on it. If
|
|
the query has not yet completed, this will block until it has and
|
|
then return the result or raise an Exception if an error occurred.
|
|
For example:
|
|
|
|
.. code-block:: python
|
|
|
|
from cassandra import ReadTimeout
|
|
|
|
query = "SELECT * FROM users WHERE user_id=%s"
|
|
future = session.execute_async(query, [user_id])
|
|
|
|
# ... do some other work
|
|
|
|
try:
|
|
rows = future.result()
|
|
user = rows[0]
|
|
print user.name, user.age
|
|
except ReadTimeout:
|
|
log.exception("Query timed out:")
|
|
|
|
This works well for executing many queries concurrently:
|
|
|
|
.. code-block:: python
|
|
|
|
# build a list of futures
|
|
futures = []
|
|
query = "SELECT * FROM users WHERE user_id=%s"
|
|
for user_id in ids_to_fetch:
|
|
futures.append(session.execute_async(query, [user_id])
|
|
|
|
# wait for them to complete and use the results
|
|
for future in futures:
|
|
rows = future.result()
|
|
print rows[0].name
|
|
|
|
Alternatively, instead of calling :meth:`~.ResponseFuture.result()`,
|
|
you can attach callback and errback functions through the
|
|
:meth:`~.ResponseFuture.add_callback()`,
|
|
:meth:`~.ResponseFuture.add_errback()`, and
|
|
:meth:`~.ResponseFuture.add_callbacks()`, methods. If you have used
|
|
Twisted Python before, this is designed to be a lightweight version of
|
|
that:
|
|
|
|
.. code-block:: python
|
|
|
|
def handle_success(rows):
|
|
user = rows[0]
|
|
try:
|
|
process_user(user.name, user.age, user.id)
|
|
except Exception:
|
|
log.error("Failed to process user %s", user.id)
|
|
# don't re-raise errors in the callback
|
|
|
|
def handle_error(exception):
|
|
log.error("Failed to fetch user info: %s", exception)
|
|
|
|
|
|
future = session.execute_async(query)
|
|
future.add_callbacks(handle_success, handle_error)
|
|
|
|
There are a few important things to remember when working with callbacks:
|
|
* **Exceptions that are raised inside the callback functions will be logged and then ignored.**
|
|
* Your callback will be run on the event loop thread, so any long-running
|
|
operations will prevent other requests from being handled
|
|
|
|
|
|
Setting a Consistency Level
|
|
---------------------------
|
|
The consistency level used for a query determines how many of the
|
|
replicas of the data you are interacting with need to respond for
|
|
the query to be considered a success.
|
|
|
|
By default, :attr:`.ConsistencyLevel.LOCAL_ONE` will be used for all queries.
|
|
You can specify a different default for the session on :attr:`.Session.default_consistency_level`.
|
|
To specify a different consistency level per request, wrap queries
|
|
in a :class:`~.SimpleStatement`:
|
|
|
|
.. code-block:: python
|
|
|
|
from cassandra import ConsistencyLevel
|
|
from cassandra.query import SimpleStatement
|
|
|
|
query = SimpleStatement(
|
|
"INSERT INTO users (name, age) VALUES (%s, %s)",
|
|
consistency_level=ConsistencyLevel.QUORUM)
|
|
session.execute(query, ('John', 42))
|
|
|
|
Prepared Statements
|
|
-------------------
|
|
Prepared statements are queries that are parsed by Cassandra and then saved
|
|
for later use. When the driver uses a prepared statement, it only needs to
|
|
send the values of parameters to bind. This lowers network traffic
|
|
and CPU utilization within Cassandra because Cassandra does not have to
|
|
re-parse the query each time.
|
|
|
|
To prepare a query, use :meth:`.Session.prepare()`:
|
|
|
|
.. code-block:: python
|
|
|
|
user_lookup_stmt = session.prepare("SELECT * FROM users WHERE user_id=?")
|
|
|
|
users = []
|
|
for user_id in user_ids_to_query:
|
|
user = session.execute(user_lookup_stmt, [user_id])
|
|
users.append(user)
|
|
|
|
:meth:`~.Session.prepare()` returns a :class:`~.PreparedStatement` instance
|
|
which can be used in place of :class:`~.SimpleStatement` instances or literal
|
|
string queries. It is automatically prepared against all nodes, and the driver
|
|
handles re-preparing against new nodes and restarted nodes when necessary.
|
|
|
|
Note that the placeholders for prepared statements are ``?`` characters. This
|
|
is different than for simple, non-prepared statements (although future versions
|
|
of the driver may use the same placeholders for both).
|
|
|
|
Setting a Consistency Level with Prepared Statements
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
To specify a consistency level for prepared statements, you have two options.
|
|
|
|
The first is to set a default consistency level for every execution of the
|
|
prepared statement:
|
|
|
|
.. code-block:: python
|
|
|
|
from cassandra import ConsistencyLevel
|
|
|
|
cluster = Cluster()
|
|
session = cluster.connect("mykeyspace")
|
|
user_lookup_stmt = session.prepare("SELECT * FROM users WHERE user_id=?")
|
|
user_lookup_stmt.consistency_level = ConsistencyLevel.QUORUM
|
|
|
|
# these will both use QUORUM
|
|
user1 = session.execute(user_lookup_stmt, [user_id1])[0]
|
|
user2 = session.execute(user_lookup_stmt, [user_id2])[0]
|
|
|
|
The second option is to create a :class:`~.BoundStatement` from the
|
|
:class:`~.PreparedStatement` and binding parameters and set a consistency
|
|
level on that:
|
|
|
|
.. code-block:: python
|
|
|
|
# override the QUORUM default
|
|
user3_lookup = user_lookup_stmt.bind([user_id3])
|
|
user3_lookup.consistency_level = ConsistencyLevel.ALL
|
|
user3 = session.execute(user3_lookup)
|