Add documentation

2012-05-20 22:28:14 +02:00
parent e7b271b460
commit de4b0ab0b9
9 changed files with 942 additions and 0 deletions
--- a/TODO.rst
+++ b/TODO.rst
@@ -0,0 +1,23 @@
+.. Note: this list is automatically included in the documentation.
+
+***********************************
+To-do list and possible future work
+***********************************
+
+This document lists some ideas that the developers thought of, but have not yet
+implemented. The topics described below may be implemented (or not) in the
+future, depending on time, demand, and technical possibilities.
+
+* Improved error handling instead of just propagating the errors from the
+  Thrift layer. Maybe wrap the errors in a HappyBase.Error?
+
+* Automatic retries for failed operations (but only those that can be retried)
+
+* Connection pooling (maybe based on PyCassa's ConnectionPool?)
+
+* Thread safety. This involves at least coordinating access to the socket
+  connection to HBase's Thrift gateway.
+
+* Port HappyBase over to the (still experimental) HBase Thrift2 API when it
+  becomes mainstream, and expose more of the underlying features nicely in the
+  HappyBase API.
--- a/doc/api.rst
+++ b/doc/api.rst
@@ -0,0 +1,47 @@
+*****************
+API documentation
+*****************
+
+.. py:currentmodule:: happybase
+
+This chapter contains detailed API documentation for HappyBase. We suggest
+reading the :doc:`tutorial <tutorial>` first to get a general idea of how
+HappyBase works.
+
+The HappyBase API is organised as follows:
+
+:py:class:`~happybase.Connection`:
+   The :py:class:`~happybase.Connection` class is the main entry point for
+   application developers. It connects to the HBase Thrift server and provides
+   methods for table management.
+
+:py:class:`~happybase.Table`:
+   The :py:class:`Table` class is the main class for interacting with data in
+   tables. This class offers methods for data retrieval and data manipulation.
+   Instances of this class can be obtained using the
+   :py:meth:`Connection.table()` method.
+
+:py:class:`~happybase.Batch`:
+   The :py:class:`Batch` class implements the batch API for data manipulation,
+   and is available through the :py:meth:`Table.batch()` method.
+
+
+Connection
+==========
+
+.. autoclass:: happybase.Connection
+
+
+Table
+=====
+
+.. autoclass:: happybase.Table
+
+
+Batch
+=====
+
+.. autoclass:: happybase.Batch
+
+
+.. vim: set spell spelllang=en:
--- a/doc/conf.py
+++ b/doc/conf.py
@@ -0,0 +1,244 @@
+# -*- coding: utf-8 -*-
+#
+# HappyBase documentation build configuration file, created by
+# sphinx-quickstart on Tue Mar 20 17:40:16 2012.
+#
+# This file is execfile()d with the current directory set to its containing dir.
+#
+# Note that not all possible configuration values are present in this
+# autogenerated file.
+#
+# All configuration values have a default; values that are commented out
+# serve to show the default.
+
+import sys, os
+
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+#sys.path.insert(0, os.path.abspath('.'))
+
+# -- General configuration -----------------------------------------------------
+
+# If your documentation needs a minimal Sphinx version, state it here.
+#needs_sphinx = '1.0'
+
+# Add any Sphinx extension module names here, as strings. They can be extensions
+# coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
+extensions = ['sphinx.ext.autodoc', 'sphinx.ext.coverage']
+
+# Add any paths that contain templates here, relative to this directory.
+templates_path = ['_templates']
+
+# The suffix of source filenames.
+source_suffix = '.rst'
+
+# The encoding of source files.
+#source_encoding = 'utf-8-sig'
+
+# The master toctree document.
+master_doc = 'index'
+
+# General information about the project.
+project = u'HappyBase'
+copyright = u'2012'
+
+# The version info for the project you're documenting, acts as replacement for
+# |version| and |release|, also used in various other places throughout the
+# built documents.
+#
+# The short X.Y version.
+version = '0.1'
+# The full version, including alpha/beta/rc tags.
+release = '0.1'
+
+# The language for content autogenerated by Sphinx. Refer to documentation
+# for a list of supported languages.
+#language = None
+
+# There are two options for replacing |today|: either, you set today to some
+# non-false value, then it is used:
+#today = ''
+# Else, today_fmt is used as the format for a strftime call.
+#today_fmt = '%B %d, %Y'
+
+# List of patterns, relative to source directory, that match files and
+# directories to ignore when looking for source files.
+exclude_patterns = []
+
+# The reST default role (used for this markup: `text`) to use for all documents.
+#default_role = None
+
+# If true, '()' will be appended to :func: etc. cross-reference text.
+#add_function_parentheses = True
+
+# If true, the current module name will be prepended to all description
+# unit titles (such as .. function::).
+#add_module_names = True
+
+# If true, sectionauthor and moduleauthor directives will be shown in the
+# output. They are ignored by default.
+#show_authors = False
+
+# The name of the Pygments (syntax highlighting) style to use.
+pygments_style = 'sphinx'
+
+# A list of ignored prefixes for module index sorting.
+#modindex_common_prefix = []
+
+autodoc_default_flags = ['members', 'undoc-members']
+autodoc_member_order = 'bysource'
+
+# -- Options for HTML output ---------------------------------------------------
+
+# The theme to use for HTML and HTML Help pages.  See the documentation for
+# a list of builtin themes.
+html_theme = 'default'
+
+# Theme options are theme-specific and customize the look and feel of a theme
+# further.  For a list of options available for each theme, see the
+# documentation.
+#html_theme_options = {}
+
+# Add any paths that contain custom themes here, relative to this directory.
+#html_theme_path = []
+
+# The name for this set of Sphinx documents.  If None, it defaults to
+# "<project> v<release> documentation".
+#html_title = None
+
+# A shorter title for the navigation bar.  Default is the same as html_title.
+#html_short_title = None
+
+# The name of an image file (relative to this directory) to place at the top
+# of the sidebar.
+#html_logo = None
+
+# The name of an image file (within the static path) to use as favicon of the
+# docs.  This file should be a Windows icon file (.ico) being 16x16 or 32x32
+# pixels large.
+#html_favicon = None
+
+# Add any paths that contain custom static files (such as style sheets) here,
+# relative to this directory. They are copied after the builtin static files,
+# so a file named "default.css" will overwrite the builtin "default.css".
+html_static_path = ['_static']
+
+# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
+# using the given strftime format.
+#html_last_updated_fmt = '%b %d, %Y'
+
+# If true, SmartyPants will be used to convert quotes and dashes to
+# typographically correct entities.
+#html_use_smartypants = True
+
+# Custom sidebar templates, maps document names to template names.
+#html_sidebars = {}
+
+# Additional templates that should be rendered to pages, maps page names to
+# template names.
+#html_additional_pages = {}
+
+# If false, no module index is generated.
+#html_domain_indices = True
+
+# If false, no index is generated.
+#html_use_index = True
+
+# If true, the index is split into individual pages for each letter.
+#html_split_index = False
+
+# If true, links to the reST sources are added to the pages.
+#html_show_sourcelink = True
+
+# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
+#html_show_sphinx = True
+
+# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
+#html_show_copyright = True
+
+# If true, an OpenSearch description file will be output, and all pages will
+# contain a <link> tag referring to it.  The value of this option must be the
+# base URL from which the finished HTML is served.
+#html_use_opensearch = ''
+
+# This is the file name suffix for HTML files (e.g. ".xhtml").
+#html_file_suffix = None
+
+# Output file base name for HTML help builder.
+htmlhelp_basename = 'HappyBasedoc'
+
+
+# -- Options for LaTeX output --------------------------------------------------
+
+latex_elements = {
+# The paper size ('letterpaper' or 'a4paper').
+#'papersize': 'letterpaper',
+
+# The font size ('10pt', '11pt' or '12pt').
+#'pointsize': '10pt',
+
+# Additional stuff for the LaTeX preamble.
+#'preamble': '',
+}
+
+# Grouping the document tree into LaTeX files. List of tuples
+# (source start file, target name, title, author, documentclass [howto/manual]).
+latex_documents = [
+  ('index', 'HappyBase.tex', u'HappyBase Documentation',
+   u' ', 'manual'),
+]
+
+# The name of an image file (relative to this directory) to place at the top of
+# the title page.
+#latex_logo = None
+
+# For "manual" documents, if this is true, then toplevel headings are parts,
+# not chapters.
+#latex_use_parts = False
+
+# If true, show page references after internal links.
+#latex_show_pagerefs = False
+
+# If true, show URL addresses after external links.
+#latex_show_urls = False
+
+# Documents to append as an appendix to all manuals.
+#latex_appendices = []
+
+# If false, no module index is generated.
+#latex_domain_indices = True
+
+
+# -- Options for manual page output --------------------------------------------
+
+# One entry per manual page. List of tuples
+# (source start file, name, description, authors, manual section).
+man_pages = [
+    ('index', 'happybase', u'HappyBase Documentation',
+     [u' '], 1)
+]
+
+# If true, show URL addresses after external links.
+#man_show_urls = False
+
+
+# -- Options for Texinfo output ------------------------------------------------
+
+# Grouping the document tree into Texinfo files. List of tuples
+# (source start file, target name, title, author,
+#  dir menu entry, description, category)
+texinfo_documents = [
+  ('index', 'HappyBase', u'HappyBase Documentation',
+   u' ', 'HappyBase', 'One line description of project.',
+   'Miscellaneous'),
+]
+
+# Documents to append as an appendix to all manuals.
+#texinfo_appendices = []
+
+# If false, no module index is generated.
+#texinfo_domain_indices = True
+
+# How to display URL addresses: 'footnote', 'no', or 'inline'.
+#texinfo_show_urls = 'footnote'
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -0,0 +1,27 @@
+*********
+HappyBase
+*********
+
+.. include:: ../README.rst
+
+.. rubric:: Table of contents
+
+.. toctree::
+   :maxdepth: 1
+
+   introduction
+   installation
+   tutorial
+   api
+   todo
+   license
+
+
+.. rubric:: Indices and tables
+
+* :ref:`genindex`
+* :ref:`modindex`
+* :ref:`search`
+
+
+.. vim: set spell spelllang=en:
--- a/doc/installation.rst
+++ b/doc/installation.rst
@@ -0,0 +1,67 @@
+************
+Installation
+************
+
+This guide describes how to install HappyBase.
+
+.. contents:: On this page
+   :local:
+
+
+Setting up a virtual environment
+================================
+
+The recommended way to install HappyBase and Thrift is to use a virtual
+environment created by `virtualenv`. Set up and activate a new virtual
+environment like this:
+
+.. code-block:: sh
+
+   $ virtualenv envname
+   $ source envname/bin/activate
+
+If you use the `virtualenvwrapper` scripts, type this instead:
+
+.. code-block:: sh
+
+   $ mkvirtualenv envname
+
+
+Installing packages
+===================
+
+The next step is to install the Thrift package for Python:
+
+.. code-block:: sh
+
+   (envname) $ pip install thrift
+
+…and the HappyBase package:
+
+.. code-block:: sh
+
+   (envname) $ cd /path/to/happybase/
+   (envname) $ python setup.py install
+
+.. note::
+
+   Generating and installing the HBase Thrift Python modules (using ``thrift
+   --gen py`` on the ``.thrift`` file) is not necessary, since HappyBase
+   bundles pregenerated versions of those modules.
+
+
+Testing the installation
+========================
+
+Verify that the packages are installed correctly by starting a ``python`` shell
+and entering the following statements::
+
+   >>> import thrift
+   >>> import happybase
+
+If you don't see any errors, the installation was successful. Congratulations!
+Now that you have HappyBase installed on your machine, continue with the
+:doc:`tutorial <tutorial>` to learn how to use it.
+
+
+.. vim: set spell spelllang=en:
--- a/doc/introduction.rst
+++ b/doc/introduction.rst
@@ -0,0 +1,114 @@
+************
+Introduction
+************
+
+.. py:currentmodule:: happybase
+
+.. contents:: On this page
+   :local:
+
+
+What is HappyBase?
+==================
+
+.. include:: ../README.rst
+
+HappyBase is designed for use in standard HBase setups, and offers
+application developers a Pythonic API to interact with HBase.
+
+Below the surface, HappyBase uses the `Python Thrift library
+<http://pypi.python.org/pypi/thrift>`_ to connect to HBase's `Thrift
+<http://thrift.apache.org/>`_ gateway, which is included in the standard HBase
+0.9x releases. HappyBase hides most of the details of the underlying RPC
+mechanisms, resulting in application code that is cleaner, more productive to
+write, and more maintainable.
+
+
+What does code using HappyBase look like?
+=========================================
+
+The example below illustrates basic usage of the library::
+
+   import happybase
+
+   connection = happybase.Connection('hostname')
+   table = connection.table('table-name')
+
+   table.put('row-key', {'family:qual1': 'value1',
+                         'family:qual2': 'value2'})
+
+   row = table.row('row-key')
+   print row['family:qual1']  # prints 'value1'
+
+   for key, data in table.rows(['row-key-1', 'row-key-2']):
+       print key, data  # prints row key and data for each row
+
+   for key, data in table.scan(row_prefix='row'):
+       print key, data  # prints row key and data for each matching row
+
+   table.delete('row-key')
+
+Note that the :doc:`tutorial <tutorial>` contains many more examples.
+
+
+Why not use the HBase Thrift API directly?
+==========================================
+
+You may consider using the HBase Thrift API directly instead of adding yet
+another library to your project. After all, :pep:`20` taught us that simple is
+better than complex, and there should be one, and preferably only one, obvious
+way to do it, right? Well, we agree.
+
+While the HBase Thrift API can be used directly from Python using the
+(automatically generated) HBase Thrift service classes, application code using
+this API is verbose, cumbersome, and hence error-prone. The reason for this is
+that the HBase Thrift API is a flat, language-agnostic interface API closely
+tied to the RPC going over the wire-level protocol. This means that
+applications need to deal with many imports, sockets, transports, protocols,
+clients, Thrift types and mutation objects. For instance, look at the code
+required to connect to HBase and store two values::
+
+   from thrift import Thrift
+   from thrift.transport import TSocket, TTransport
+   from thrift.protocol import TBinaryProtocol
+
+   from hbase import ttypes
+   from hbase.Hbase import Client, Mutation
+
+   sock = TSocket.TSocket('hostname', 9090)
+   transport = TTransport.TBufferedTransport(sock)
+   protocol = TBinaryProtocol.TBinaryProtocol(transport)
+   client = Client(protocol)
+   transport.open()
+
+   mutations = [Mutation(column='family:qual1', value='value1'),
+                Mutation(column='family:qual2', value='value2')]
+   client.mutateRow('table-name', 'row-key', mutations)
+
+HappyBase hides all the Thrift cruft below a friendly API, and makes the task
+in the example above look like this::
+
+   import happybase
+   connection = happybase.Connection('hostname')
+   table = connection.table('table-name')
+   table.put('row-key', {'family:qual1': 'value1',
+                         'family:qual2': 'value2'})
+
+Hopefully this example makes it clear that you will be a lot happier using
+HappyBase than using the Thrift API directly. If you still have doubts about
+this, try to accomplish some other common tasks, e.g. retrieving rows and
+scanning over a part of a table, and compare that with the really-easy-to-use
+HappyBase equivalents. If you're still not convinced by then, we're sorry to
+inform you that HappyBase is not the project for you, and we wish you the best
+of luck maintaining your code ‒ or is it Thrift boilerplate? ‒ while your
+application evolves.
+
+
+How do I get started?
+=====================
+
+Follow the :doc:`installation guide <installation>` and read the :doc:`tutorial
+<tutorial>`.
+
+
+.. vim: set spell spelllang=en:
--- a/doc/license.rst
+++ b/doc/license.rst
@@ -0,0 +1,5 @@
+*******
+License
+*******
+
+.. include:: ../LICENSE.rst
--- a/doc/todo.rst
+++ b/doc/todo.rst
@@ -0,0 +1 @@
+.. include:: ../TODO.rst
--- a/doc/tutorial.rst
+++ b/doc/tutorial.rst
@@ -0,0 +1,414 @@
+********
+Tutorial
+********
+
+.. py:currentmodule:: happybase
+
+This tutorial explores the HappyBase API and should provide you with enough
+information to get you started. Note that this tutorial is intended as an
+introduction to HappyBase, not to HBase in general. Readers should already have
+a basic understanding of HBase and its data model.
+
+While the tutorial does cover most features, it is not a complete reference
+guide. More information about the HappyBase API is available from the :doc:`API
+documentation <api>`.
+
+.. contents:: On this page
+   :local:
+
+
+Opening a :py:class:`Connection`
+================================
+
+We'll get started by connecting to HBase::
+
+   import happybase
+
+   connection = happybase.Connection('somehost')
+
+When a :py:class:`Connection` instance is created, it automatically opens a
+socket connection to the HBase Thrift server. This behaviour can be disabled by
+setting the `autoconnect` argument to `False`, and opening the connection
+manually using :py:meth:`Connection.open`::
+
+   connection = happybase.Connection('somehost', autoconnect=False)
+
+   # before first use:
+   connection.open()
+
+The :py:class:`Connection` class provides various methods to interact with the
+HBase instance. For instance, we can ask for the names of the available
+tables using the :py:meth:`Connection.tables` method::
+
+   print connection.tables()
+
+If a single HBase instance is used by multiple applications, table name
+collisions may occur because applications use the same table names. A solution
+is to add a ‘namespace’ prefix to the names of all tables ‘owned’ by a specific
+application. Instead of adding this application-specific prefix each time a
+table name is passed to HappyBase, the `table_prefix` parameter can be used.
+HappyBase will prepend that prefix (and an underscore) to each table name
+handled by the :py:class:`Connection` instance. So, for a project ``myproject``
+that should have table names that look like ``myproject_XYZ``, use this::
+
+   connection = happybase.Connection('somehost', table_prefix='myproject')
+
+:py:meth:`Connection.tables` no longer includes tables in other ‘namespaces’;
+it only returns tables with a ``myproject_`` prefix in HBase, and also
+strips off the prefix::
+
+   print connection.tables()  # Table "myproject_XYZ" in HBase will be
+                              # returned as simply "XYZ"
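+
+How the prefix is applied and stripped can be sketched in plain Python. The
+helper names below are hypothetical and are not part of the HappyBase API;
+they only mirror the behaviour described above, assuming the prefix and the
+table name are joined by a single underscore:
+

```python
def add_prefix(prefix, name):
    """Build the name as stored in HBase (hypothetical helper)."""
    return prefix + '_' + name


def visible_tables(prefix, all_tables):
    """Keep only the tables in our 'namespace' and strip the prefix."""
    marker = prefix + '_'
    return [t[len(marker):] for t in all_tables if t.startswith(marker)]


# Hand-written sample of table names as they might exist in HBase
hbase_tables = ['myproject_XYZ', 'myproject_users', 'otherapp_XYZ']

full_name = add_prefix('myproject', 'XYZ')
names = visible_tables('myproject', hbase_tables)
```

+
+Here ``names`` contains only the two ``myproject`` tables, without their
+prefix, and the table owned by the other application is left out.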
+
+The :py:class:`Connection` class offers various other methods to interact with
+HBase, mostly to perform table management tasks like enabling and disabling
+tables. This tutorial does not cover those; the :doc:`API documentation <api>`
+for the :py:class:`Connection` class contains more information.
+
+
+Obtaining a :py:class:`Table` instance
+======================================
+
+The :py:class:`Table` class provides the main API to retrieve and manipulate
+data in HBase. In the example above, we already asked for the available tables
+using the :py:meth:`Connection.tables` method, so the next step is to obtain a
+:py:class:`.Table` instance. This is done by calling
+:py:meth:`Connection.table` with the name of the table::
+
+   table = connection.table('mytable')
+
+Obtaining a :py:class:`Table` instance does *not* result in a round-trip to the
+Thrift server, which means application code may ask the :py:class:`Connection`
+instance for a new :py:class:`Table` whenever it needs one, without negative
+performance consequences. A side effect is that no check is done to ensure that
+the table exists, since that would involve a round-trip, so expect errors if
+you try to interact with non-existing tables later in your code. For this
+tutorial, we assume the table exists.
+
+.. note::
+
+   The ‘heavy’ `HTable` HBase class from the Java HBase API, which does the
+   real communication with the region servers, is at the other side of the
+   Thrift connection. There is no direct mapping between :py:class:`Table`
+   instances on the Python side and `HTable` instances on the server side.
+
+
+Retrieving data
+===============
+
+The HBase data model is a multidimensional sparse map. A table in HBase
+contains column families with column qualifiers containing a value and a
+timestamp. In most of the HappyBase API, column family and qualifier names are
+specified as a single string, e.g. ``cf1:col1``, and not as two separate
+arguments. While column families and qualifiers are different concepts in the
+HBase data model, they are almost always used together when interacting with
+data, so treating them as a single string makes the API a lot simpler.
+
+Retrieving rows
+---------------
+
+The :py:class:`Table` class offers various methods to retrieve data from a
+table in HBase. The most basic one is :py:meth:`Table.row`, which retrieves a
+single row from the table, and returns it as a dictionary mapping columns to
+values::
+
+   row = table.row('row-key')
+   print row['cf1:col1']   # prints the value of cf1:col1
+
+The :py:meth:`Table.rows` method works just like :py:meth:`Table.row`, but
+takes multiple row keys and returns those as `(key, data)` tuples::
+
+   rows = table.rows(['row-key-1', 'row-key-2'])
+   for key, data in rows:
+       print key, data
+
+If you want the results that :py:meth:`Table.rows` returns as a dictionary or
+ordered dictionary, you will have to do this yourself. This is really easy
+though, since the return value can be passed directly to the dictionary
+constructor. For a normal dictionary, order is lost::
+
+   rows_as_dict = dict(table.rows(['row-key-1', 'row-key-2']))
+
+…whereas for a :py:class:`OrderedDict`, order is preserved::
+
+   from collections import OrderedDict
+   rows_as_ordered_dict = OrderedDict(table.rows(['row-key-1', 'row-key-2']))
+
+
+Making more fine-grained selections
+-----------------------------------
+
+HBase's data model allows for more fine-grained selections of the data to
+retrieve. If you know beforehand which columns are needed, performance can be
+improved by specifying those columns explicitly to :py:meth:`Table.row` and
+:py:meth:`Table.rows`. The `columns` argument takes a list (or tuple) of column
+names::
+
+   row = table.row('row-key', columns=['cf1:col1', 'cf1:col2'])
+   print row['cf1:col1']
+   print row['cf1:col2']
+
+Instead of providing both a column family and a column qualifier, items in the
+`columns` argument may also be just a column family, which means that all
+columns from that column family will be retrieved. For example, to get all
+columns and values in the column family `cf1`, use this::
+
+   row = table.row('row-key', columns=['cf1'])
+
+In HBase, each cell has a timestamp attached to it. In case you don't want to
+work with the latest version of data stored in HBase, the methods that retrieve
+data from the database, e.g. :py:meth:`Table.row`, all accept a `timestamp`
+argument that specifies that the results should be restricted to values with a
+timestamp up to the specified timestamp::
+
+   row = table.row('row-key', timestamp=123456789)
+
+By default, HappyBase does not include timestamps in the results it returns. If
+your application needs access to the timestamps, simply set the
+`include_timestamp` parameter to ``True``. Now, each cell in the result will be
+returned as a `(value, timestamp)` tuple instead of just a value::
+
+   row = table.row('row-key', columns=['cf1:col1'], include_timestamp=True)
+   value, timestamp = row['cf1:col1']
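+
+When HBase assigns the timestamp itself, the value is normally the server's
+system time in milliseconds since the Unix epoch. Under that assumption
+(explicitly supplied timestamps can be arbitrary integers), a returned
+timestamp can be turned into a ``datetime``. The ``row`` dictionary below is
+hand-written sample data shaped like the result above, and ``to_datetime`` is
+a hypothetical helper:
+

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # naive datetime, interpreted as UTC


def to_datetime(timestamp_ms):
    """Convert milliseconds since the epoch to a naive UTC datetime."""
    return EPOCH + timedelta(milliseconds=timestamp_ms)


# Sample data shaped like a row fetched with include_timestamp=True
row = {'cf1:col1': ('value1', 1337545694000)}

value, timestamp = row['cf1:col1']
written_at = to_datetime(timestamp)
```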
+
+HBase supports storing multiple versions of the same cell. This can be
+configured for each column family. To retrieve all versions of a column for a
+given row, :py:meth:`Table.cells` can be used. This method returns an ordered
+list of cells, with the most recent version coming first. The `versions`
+argument specifies the maximum number of versions to return. Just like the
+methods that retrieve rows, the `include_timestamp` argument determines whether
+timestamps are included in the result. Example::
+
+   values = table.cells('row-key', 'cf1:col1', versions=2)
+   for value in values:
+       print "Cell data: %s" % value
+
+   cells = table.cells('row-key', 'cf1:col1', versions=3, include_timestamp=True)
+   for value, timestamp in cells:
+       print "Cell data at %d: %s" % (timestamp, value)
+
+Note that the result may contain fewer cells than requested. The cell may just
+have fewer versions, or you may have requested more versions than HBase keeps
+for the column family.
+
+Scanning over rows in a table
+-----------------------------
+
+In addition to retrieving data for known row keys, rows in HBase can be
+efficiently iterated over using a table scanner, created using
+:py:meth:`Table.scan`. A basic scanner that iterates over all rows in the table
+looks like this::
+
+   for key, data in table.scan():
+       print key, data
+
+Doing full table scans like in the example above is prohibitively expensive in
+practice. Scans can be restricted in several ways to make more selective range
+queries. One way is to specify start or stop keys, or both. To iterate over all
+rows from row `aaa` to the end of the table::
+
+   for key, data in table.scan(row_start='aaa'):
+       print key, data
+
+To iterate over all rows from the start of the table up to row `xyz`, use this::
+
+   for key, data in table.scan(row_stop='xyz'):
+       print key, data
+
+To iterate over all rows between row `aaa` (included) and `xyz` (not included),
+supply both::
+
+   for key, data in table.scan(row_start='aaa', row_stop='xyz'):
+       print key, data
+
+An alternative is to use a key prefix. For example, to iterate over all rows
+starting with `abc`::
+
+   for key, data in table.scan(row_prefix='abc'):
+       print key, data
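+
+Because HBase keeps rows sorted by key, a prefix scan is equivalent to a range
+scan whose stop key is the prefix with its last byte incremented. The sketch
+below illustrates that idea on a plain sorted list; it is not HappyBase's
+implementation, the function names are hypothetical, and it assumes a
+non-empty prefix whose last byte is not ``0xff``:
+

```python
import bisect


def prefix_to_range(prefix):
    """Return (start, stop) so that start <= key < stop matches the prefix."""
    stop = bytearray(prefix)
    stop[-1] += 1  # assumes the last byte is below 0xff
    return prefix, bytes(stop)


def scan_sorted(keys, start, stop):
    """Return the keys in the half-open range [start, stop)."""
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_left(keys, stop)
    return keys[lo:hi]


# Hand-written sample keys, sorted byte-wise like rows in HBase
keys = sorted([b'abb', b'abc', b'abc1', b'abcz', b'abd', b'xyz'])

start, stop = prefix_to_range(b'abc')
matches = scan_sorted(keys, start, stop)
```

+
+The stop key is excluded, just like `row_stop` in the examples above, so
+``abd`` is not part of the result.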
+
+The scanner examples above only limit the results by row key using the
+`row_start`, `row_stop`, and `row_prefix` arguments, but scanners can also
+limit results to certain columns, column families, and timestamps, just like
+:py:meth:`Table.row` and :py:meth:`Table.rows`. For advanced users, a filter
+string can be passed as the `filter` argument. Additionally, the optional
+`limit` argument defines the maximum amount of data to retrieve, and the
+`batch_size` argument specifies how big the transferred chunks should be. The
+:py:meth:`Table.scan` API documentation provides more information on the
+supported scanner options.
+
+
+Manipulating data
+=================
+
+In HBase, all mutations either store data or mark data for deletion; there is
+no such thing as an `update`. HappyBase provides methods to do single inserts
+or deletes, and also a batch API for bulk mutations.
+
+Storing data
+------------
+
+To store data in a row of our table, we can use :py:meth:`Table.put`, which
+takes the row key and the data to store. The data should be a dictionary
+mapping column names to values::
+
+   table.put('row-key', {'cf:col1': 'value1',
+                         'cf:col2': 'value2'})
+
+Use the `timestamp` argument if you want to provide timestamps explicitly::
+
+   table.put('row-key', {'cf:col1': 'value1'}, timestamp=123456789)
+
+If omitted, HBase defaults to the current system time.
+
+Deleting data
+-------------
+
+The :py:meth:`Table.delete` method deletes data from a table. To delete a
+complete row, just specify the row key::
+
+   table.delete('row-key')
+
+To delete one or more columns instead of a complete row, also specify the
+`columns` argument::
+
+   table.delete('row-key', columns=['cf1:col1', 'cf1:col2'])
+
+The optional `timestamp` argument restricts the delete operation to data up to
+the specified timestamp.
+
+Performing batch mutations
+--------------------------
+
+The :py:meth:`Table.put` and :py:meth:`Table.delete` methods both issue a
+command to the HBase Thrift server immediately. This means that using these
+methods is not very efficient when storing or deleting multiple values. It is
+much more efficient to aggregate a bunch of commands and send them to the
+server in one go. This is exactly what the :py:class:`Batch` class, created
+using :py:meth:`Table.batch`, does. A :py:class:`Batch` instance has put and
+delete methods, just like the :py:class:`Table` class, but the changes are sent
+to the server in a single round-trip using :py:meth:`Batch.send`::
+
+   b = table.batch()
+   b.put('row-key-1', {'cf:col1': 'value1', 'cf:col2': 'value2'})
+   b.put('row-key-2', {'cf:col2': 'value2', 'cf:col3': 'value3'})
+   b.put('row-key-3', {'cf:col3': 'value3', 'cf:col4': 'value4'})
+   b.delete('row-key-4')
+   b.send()
+
+.. note::
+
+   Storing and deleting data for the same row key in a single batch leads to
+   unpredictable results, so don't do that.
+
+While the methods on the :py:class:`Batch` instance resemble the
+:py:meth:`~Table.put` and :py:meth:`~Table.delete` methods, they do not take a
+`timestamp` argument for each mutation. Instead, you can specify a single
+`timestamp` argument for the complete batch::
+
+   b = table.batch(timestamp=123456789)
+   b.put(...)
+   b.delete(...)
+   b.send()
+
+:py:class:`Batch` instances can be used as *context managers*, which are most
+useful in combination with Python's ``with`` construct. The example above can
+be simplified to read::
+
+   with table.batch() as b:
+       b.put('row-key-1', {'cf:col1': 'value1', 'cf:col2': 'value2'})
+       b.put('row-key-2', {'cf:col2': 'value2', 'cf:col3': 'value3'})
+       b.put('row-key-3', {'cf:col3': 'value3', 'cf:col4': 'value4'})
+       b.delete('row-key-4')
+
+As you can see, there is no call to :py:meth:`Batch.send` anymore. The batch is
+automatically applied when the ``with`` code block terminates, even in case of
+errors somewhere in the ``with`` block, so it behaves basically the same as a
+``try/finally`` clause. However, some applications require transactional
+behaviour, sending the batch only if no exception occurred. Without a context
+manager this would look something like this::
+
+   b = table.batch()
+   try:
+       b.put('row-key-1', {'cf:col1': 'value1', 'cf:col2': 'value2'})
+       b.put('row-key-2', {'cf:col2': 'value2', 'cf:col3': 'value3'})
+       b.put('row-key-3', {'cf:col3': 'value3', 'cf:col4': 'value4'})
+       b.delete('row-key-4')
+       raise ValueError("Something went wrong!")
+   except ValueError as e:
+       # error handling goes here; nothing is sent to HBase
+       pass
+   else:
+       # no exceptions; send data
+       b.send()
+
+Obtaining the same behaviour is easier using a ``with`` block. The
+`transaction` argument to :py:meth:`Table.batch` is all you need::
+
+   try:
+       with table.batch(transaction=True) as b:
+           b.put('row-key-1', {'cf:col1': 'value1', 'cf:col2': 'value2'})
+           b.put('row-key-2', {'cf:col2': 'value2', 'cf:col3': 'value3'})
+           b.put('row-key-3', {'cf:col3': 'value3', 'cf:col4': 'value4'})
+           b.delete('row-key-4')
+           raise ValueError("Something went wrong!")
+   except ValueError:
+       # error handling goes here; nothing is sent to HBase
+       pass
+
+   # when no error occurred, the transaction succeeded
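+
+The difference between the two modes comes down to what happens when the
+``with`` block exits. The stand-in class below is a hypothetical sketch and not
+the real :py:class:`Batch` implementation; it records "sent" mutations in a
+list instead of talking to HBase:
+

```python
class SketchBatch(object):
    """Stand-in for a batch; collects mutations instead of sending them."""

    def __init__(self, transaction=False):
        self.transaction = transaction
        self.pending = []
        self.sent = []

    def put(self, row, data):
        self.pending.append(('put', row, data))

    def send(self):
        self.sent.extend(self.pending)
        self.pending = []

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # In transaction mode an exception means nothing is sent;
        # otherwise the batch is always sent, as with try/finally.
        if not (self.transaction and exc_type is not None):
            self.send()
        return False  # never swallow the exception


def run(batch):
    """Fail half-way through a batch and report how much was sent."""
    try:
        with batch as b:
            b.put('row-key-1', {'cf:col1': 'value1'})
            raise ValueError("Something went wrong!")
    except ValueError:
        pass
    return len(batch.sent)


sent_plain = run(SketchBatch())
sent_transactional = run(SketchBatch(transaction=True))
```

+
+In this sketch the plain batch still sends the one mutation queued before the
+error, while the transactional batch sends nothing.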
+
+As you may have imagined already, a :py:class:`Batch` keeps all mutations in
+memory until the batch is sent, either by calling :py:meth:`Batch.send()`
+explicitly, or when the ``with`` block ends. This doesn't work for applications
+that need to store huge amounts of data, since it may result in batches that
+are too big to send in one round-trip, or in batches that use too much memory.
+For these cases, the `batch_size` argument can be specified. The `batch_size`
+acts as a threshold: a :py:class:`Batch` instance automatically sends all
+pending mutations as soon as there are `batch_size` pending operations. For
+example, this will result in three round-trips to the server (two batches with
+1000 cells, and one with the remaining 400)::
+
+   with table.batch(batch_size=1000) as b:
+       for i in range(1200):
+           # this put() will result in two mutations (two cells)
+           b.put('row-%04d' % i, {'cf1:col1': 'v1',
+                                  'cf1:col2': 'v2',})
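+
+The round-trip count can be checked with a small simulation. This is only a
+sketch of the arithmetic, not HappyBase code: ``flush_sizes`` is a
+hypothetical function, and it assumes the batch is flushed once the number of
+pending cells reaches `batch_size`, which is what the figures quoted above
+imply:
+

```python
def flush_sizes(num_puts, cells_per_put, batch_size):
    """Return the number of cells sent in each simulated round-trip."""
    sizes = []
    pending = 0
    for _ in range(num_puts):
        pending += cells_per_put
        if pending >= batch_size:  # threshold reached: flush
            sizes.append(pending)
            pending = 0
    if pending:  # final flush when the ``with`` block ends
        sizes.append(pending)
    return sizes


sizes = flush_sizes(num_puts=1200, cells_per_put=2, batch_size=1000)
```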
+
+The appropriate `batch_size` is very application-specific since it depends on
+the data size, so just experiment to see how different sizes work for your
+specific use case.
+
+Using atomic counters
+---------------------
+
+The :py:meth:`Table.counter_inc` and :py:meth:`Table.counter_dec` methods allow
+for atomic incrementing and decrementing of 8-byte wide values, which are
+interpreted as big-endian 64-bit signed integers by HBase. Counters are
+automatically initialised to 0 upon first use. When incrementing or
+decrementing a counter, the value after modification is returned. Example::
+
+   print table.counter_inc('row-key', 'cf1:counter')  # prints 1
+   print table.counter_inc('row-key', 'cf1:counter')  # prints 2
+   print table.counter_inc('row-key', 'cf1:counter')  # prints 3
+
+   print table.counter_dec('row-key', 'cf1:counter')  # prints 2
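+
+The on-disk representation mentioned above, an 8-byte big-endian signed
+integer, can be reproduced with Python's ``struct`` module. This sketch only
+shows the encoding and does not talk to HBase:
+

```python
import struct

# '>q' means: big-endian, signed 64-bit integer
raw = struct.pack('>q', 5)
value, = struct.unpack('>q', raw)

# Eight 0xff bytes decode to -1 because the value is signed
negative, = struct.unpack('>q', b'\xff' * 8)
```

+
+This can be handy when inspecting a counter cell that was read as raw bytes,
+for example through :py:meth:`Table.row`.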
+
+The optional `value` argument specifies how much to increment or decrement by::
+
+   print table.counter_inc('row-key', 'cf1:counter', value=3)  # prints 5
+
+While counters are typically used with the increment and decrement functions
+shown above, the :py:meth:`Table.counter_get` and :py:meth:`Table.counter_set`
+methods can be used to retrieve or set a counter value directly::
+
+   print table.counter_get('row-key', 'cf1:counter')  # prints 5
+
+   table.counter_set('row-key', 'cf1:counter', 12)
+
+Note that an application should *never* :py:meth:`~Table.counter_get` the
+current value, modify it in code and then :py:meth:`~Table.counter_set` the
+modified value; use the atomic :py:meth:`~Table.counter_inc` and
+:py:meth:`~Table.counter_dec` instead!
+
+.. vim: set spell spelllang=en: