==================
Swift stats system
==================

The swift stats system is composed of three parts: log creation, log
uploading, and log processing. The system handles two types of logs (access
and storage stats), but it can be extended to handle other types of logs.

---------
Log Types
---------
***********
Access logs
***********

Access logs are the proxy server logs.

******************
Storage stats logs
******************

Storage logs (also referred to as stats logs) are generated by a stats system
process. swift-account-stats-logger runs on each account server (via cron) and
walks the filesystem looking for account databases. When an account database
is found, the logger selects the account hash, bytes_used, container_count,
and object_count. These values are then written out as one line in a csv file.
One csv file is produced for every run of swift-account-stats-logger. This
means that, system wide, one csv file is produced for every storage node.
Rackspace runs the account stats logger every hour. Therefore, in a cluster of
ten account servers, ten csv files are produced every hour. Because every
account has one entry for every replica in the system, there will be, on
average, three copies of each account in the aggregate of all account stats
csv files created in one system-wide run.

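
The logger run described above can be sketched roughly as follows. This is a
minimal illustration, not the swift-account-stats-logger source: the ``.db``
filename suffix, the ``account_stat`` table name, and all function names are
assumptions made for the example.

```python
import csv
import os
import sqlite3


def find_account_dbs(root):
    """Yield paths of files ending in .db under root (assumed naming)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(".db"):
                yield os.path.join(dirpath, name)


def read_account_stats(db_path):
    """Return (hash, bytes_used, container_count, object_count) for one db.

    The table and column layout is an assumption made for this sketch.
    """
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT hash, bytes_used, container_count, object_count "
            "FROM account_stat"
        ).fetchone()
    finally:
        conn.close()


def write_stats_csv(root, csv_path):
    """Write one csv line per account database found; return the row count."""
    rows = 0
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        for db_path in find_account_dbs(root):
            stats = read_account_stats(db_path)
            if stats is not None:
                writer.writerow(stats)
                rows += 1
    return rows
```

Each call to ``write_stats_csv`` stands in for one run on one node, so a
cluster of ten account servers running it hourly would yield ten such files
per hour.
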
----------------------
Log Processing plugins
----------------------
The swift stats system is written to allow a plugin to be defined for every
log type. Swift includes plugins for both access logs and storage stats logs.
Each plugin is responsible for defining, in a config section, where the logs
are stored on disk, where the logs will be stored in swift (account and
container), the filename format of the logs on disk, the location of the
plugin class definition, and any plugin-specific config values.
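
As an illustration of the kind of config section described above, the sketch
below parses a section into the dict a plugin would receive. The section name
and every option name here are hypothetical placeholders chosen for the
example; they are not taken from Swift's shipped configuration.

```python
import configparser

# Hypothetical plugin section: on-disk location, swift destination
# (account and container), filename format, and plugin class location.
EXAMPLE_CONF = """
[log-processor-access]
log_dir = /var/log/swift/
swift_account = AUTH_example
container_name = log_data
source_filename_format = %Y%m%d%H*
class_path = example.plugins.AccessLogPlugin
"""


def load_plugin_conf(text, section):
    """Return one config section as a plain dict of strings."""
    # Interpolation is disabled so that '%' in the filename format
    # is kept literally instead of being treated as a substitution.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return dict(parser[section])
```

The resulting dict is the sort of value that would be handed to a plugin
constructor.
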

The plugin class definition defines three methods. The constructor must accept
one argument (the dict representation of the plugin's config section). The
process method must accept an iterator, and the account, container, and object
name of the log. The keylist_mapping method accepts no parameters.

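
A skeleton of such a plugin might look like the following. It is a
hypothetical plugin written for illustration, not one of the plugins shipped
with Swift. The line format (``account,bytes_used``), the object name layout
(``YYYY/MM/DD/HH/...``), and the return shape of ``keylist_mapping`` are all
assumptions.

```python
class ExampleStatsPlugin:
    """Hypothetical plugin exposing the three methods described above."""

    def __init__(self, conf):
        # conf is the dict representation of the plugin's config section.
        self.conf = conf

    def process(self, obj_stream, account, container, object_name):
        """Aggregate the lines of one log file.

        Returns a dict keyed on (account, year, month, day, hour).
        """
        # Assumption: the hour bucket is encoded in the object name.
        year, month, day, hour = object_name.split("/")[:4]
        totals = {}
        for line in obj_stream:
            line = line.strip()
            if not line:
                continue
            # Assumption: each line is 'account,bytes_used'.
            acct, bytes_used = line.split(",")
            key = (acct, year, month, day, hour)
            bucket = totals.setdefault(key, {"bytes_used": 0, "lines": 0})
            bucket["bytes_used"] += int(bytes_used)
            bucket["lines"] += 1
        return totals

    def keylist_mapping(self):
        # Assumed shape: csv column name -> key in the process() output.
        return {"account_bytes": "bytes_used", "line_count": "lines"}
```
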
-------------
Log Uploading
-------------

swift-log-uploader accepts a config file and a plugin name. It finds the log
files on disk according to the plugin config section and uploads them to the
swift cluster. This means one uploader process will run on each proxy server
node and each account server node. To avoid uploading partially written log
files, the uploader skips any file whose mtime is less than two hours old.
Rackspace runs this process once an hour via cron.

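
The mtime check can be sketched as below. This is only an illustration of the
two-hour rule, not the swift-log-uploader code; the function name and its
parameters are made up for the example.

```python
import os
import time

MIN_AGE_SECONDS = 2 * 60 * 60  # two hours, per the rule described above


def files_ready_for_upload(log_dir, now=None, min_age=MIN_AGE_SECONDS):
    """Return paths in log_dir whose mtime is at least min_age seconds old."""
    now = time.time() if now is None else now
    ready = []
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if not os.path.isfile(path):
            continue
        # A recently modified file may still be written to, so skip it.
        if now - os.path.getmtime(path) >= min_age:
            ready.append(path)
    return ready
```

With an hourly cron run, a file skipped on one pass is simply picked up on a
later pass once it has aged past the threshold.
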
--------------
Log Processing
--------------

swift-log-stats-collector accepts a config file and generates a csv that is
uploaded to swift. It loads all plugins defined in the config file, generates
a list of all log files in swift that need to be processed, and passes an
iterable of the log file data to the appropriate plugin's process method. The
process method returns a dictionary of data in the log file keyed on (account,
year, month, day, hour). The log-stats-collector process then combines all
dictionaries from all calls to a process method into one dictionary. When the
same value name appears under the same (account, year, month, day, hour) key
in more than one dictionary, the values are summed. Finally, the summed
dictionary is mapped to the final csv values with each plugin's
keylist_mapping method.

The resulting csv file has one line per (account, year, month, day, hour) for
all log files processed in that run of swift-log-stats-collector.
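
The combine-and-map step can be sketched as follows. It is a simplified
illustration rather than the swift-log-stats-collector code: the function
names, the assumed mapping shape (csv column name to value name), and the
alphabetical column order are choices made for the example.

```python
import csv
import io


def combine(results):
    """Merge per-file dictionaries, summing values on key collisions."""
    combined = {}
    for result in results:
        for key, values in result.items():
            bucket = combined.setdefault(key, {})
            for name, value in values.items():
                bucket[name] = bucket.get(name, 0) + value
    return combined


def to_csv(combined, keylist_mapping):
    """Render one csv line per (account, year, month, day, hour) key."""
    # Assumption: columns are emitted in sorted order of csv column name.
    columns = sorted(keylist_mapping)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for key in sorted(combined):
        values = combined[key]
        row = list(key)
        # Missing values default to 0 so every line has the same width.
        row += [values.get(keylist_mapping[c], 0) for c in columns]
        writer.writerow(row)
    return out.getvalue()
```

Here ``combine`` would receive the return values of every ``process`` call in
a run, and ``to_csv`` would use a mapping of the kind a plugin's
``keylist_mapping`` method supplies.
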