merging in stats stuff
@@ -47,7 +47,7 @@ If you need more throughput to either Account or Container Services, they may
|
||||
each be deployed to their own servers. For example you might use faster (but
|
||||
more expensive) SAS or even SSD drives to get faster disk I/O to the databases.
|
||||
|
||||
Load balancing and network design is left as an excercise to the reader,
|
||||
Load balancing and network design is left as an exercise to the reader,
|
||||
but this is a very important part of the cluster, so time should be spent
|
||||
designing the network for a Swift cluster.
|
||||
|
||||
@@ -59,7 +59,7 @@ Preparing the Ring
|
||||
|
||||
The first step is to determine the number of partitions that will be in the
|
||||
ring. We recommend that there be a minimum of 100 partitions per drive to
|
||||
insure even distribution accross the drives. A good starting point might be
|
||||
insure even distribution across the drives. A good starting point might be
|
||||
to figure out the maximum number of drives the cluster will contain, and then
|
||||
multiply by 100, and then round up to the nearest power of two.
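The sizing rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function name `suggested_partition_count` and the example drive count are assumptions, not part of Swift.

```python
def suggested_partition_count(max_drives, partitions_per_drive=100):
    """Return (partition_count, power) for the sizing rule described above.

    Hypothetical helper: multiplies the maximum number of drives by the
    recommended minimum of 100 partitions per drive, then rounds the
    result up to the nearest power of two.
    """
    if max_drives < 1:
        raise ValueError("max_drives must be at least 1")
    target = max_drives * partitions_per_drive
    power = 0
    while 2 ** power < target:
        power += 1
    return 2 ** power, power


# Example: a cluster that is never expected to exceed 5000 drives.
count, power = suggested_partition_count(5000)
print(count, power)
```

The second value is the exponent, which is convenient if the tool that builds the ring expects the partition count expressed as a power of two.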
|
||||
|
||||
@@ -154,8 +154,8 @@ Option Default Description
|
||||
------------------ ---------- ---------------------------------------------
|
||||
swift_dir /etc/swift Swift configuration directory
|
||||
devices /srv/node Parent directory of where devices are mounted
|
||||
mount_check true Weather or not check if the devices are
|
||||
mounted to prevent accidently writing
|
||||
mount_check true Whether or not to check if the devices are
|
||||
mounted to prevent accidentally writing
|
||||
to the root device
|
||||
bind_ip 0.0.0.0 IP Address for server to bind to
|
||||
bind_port 6000 Port for server to bind to
|
||||
@@ -173,7 +173,7 @@ use paste.deploy entry point for the object
|
||||
log_name object-server Label used when logging
|
||||
log_facility LOG_LOCAL0 Syslog log facility
|
||||
log_level INFO Logging level
|
||||
log_requests True Weather or not to log each request
|
||||
log_requests True Whether or not to log each request
|
||||
user swift User to run as
|
||||
node_timeout 3 Request timeout to external services
|
||||
conn_timeout 0.5 Connection timeout to external services
|
||||
@@ -193,7 +193,7 @@ Option Default Description
|
||||
log_name object-replicator Label used when logging
|
||||
log_facility LOG_LOCAL0 Syslog log facility
|
||||
log_level INFO Logging level
|
||||
daemonize yes Weather or not to run replication as a
|
||||
daemonize yes Whether or not to run replication as a
|
||||
daemon
|
||||
run_pause 30 Time in seconds to wait between
|
||||
replication passes
|
||||
@@ -249,9 +249,9 @@ The following configuration options are available:
|
||||
Option Default Description
|
||||
------------------ ---------- --------------------------------------------
|
||||
swift_dir /etc/swift Swift configuration directory
|
||||
devices /srv/node Parent irectory of where devices are mounted
|
||||
mount_check true Weather or not check if the devices are
|
||||
mounted to prevent accidently writing
|
||||
devices /srv/node Parent directory of where devices are mounted
|
||||
mount_check true Whether or not to check if the devices are
|
||||
mounted to prevent accidentally writing
|
||||
to the root device
|
||||
bind_ip 0.0.0.0 IP Address for server to bind to
|
||||
bind_port 6001 Port for server to bind to
|
||||
@@ -339,8 +339,8 @@ Option Default Description
|
||||
------------------ ---------- ---------------------------------------------
|
||||
swift_dir /etc/swift Swift configuration directory
|
||||
devices /srv/node Parent directory or where devices are mounted
|
||||
mount_check true Weather or not check if the devices are
|
||||
mounted to prevent accidently writing
|
||||
mount_check true Whether or not to check if the devices are
|
||||
mounted to prevent accidentally writing
|
||||
to the root device
|
||||
bind_ip 0.0.0.0 IP Address for server to bind to
|
||||
bind_port 6002 Port for server to bind to
|
||||
@@ -353,7 +353,7 @@ user swift User to run as
|
||||
================== ============== ==========================================
|
||||
Option Default Description
|
||||
------------------ -------------- ------------------------------------------
|
||||
use paste.deploy entry point for the account
|
||||
use Entry point for paste.deploy for the account
|
||||
server. For most cases, this should be
|
||||
`egg:swift#account`.
|
||||
log_name account-server Label used when logging
|
||||
@@ -412,6 +412,11 @@ conn_timeout 0.5 Connection timeout to external services
|
||||
Proxy Server Configuration
|
||||
--------------------------
|
||||
|
||||
An example Proxy Server configuration can be found at
|
||||
etc/proxy-server.conf-sample in the source code repository.
|
||||
|
||||
The following configuration options are available:
|
||||
|
||||
[DEFAULT]
|
||||
|
||||
============================ =============== =============================
|
||||
@@ -432,7 +437,7 @@ key_file Path to the ssl .key
|
||||
============================ =============== =============================
|
||||
Option Default Description
|
||||
---------------------------- --------------- -----------------------------
|
||||
use paste.deploy entry point for
|
||||
use Entry point for paste.deploy for
|
||||
the proxy server. For most
|
||||
cases, this should be
|
||||
`egg:swift#proxy`.
|
||||
@@ -443,10 +448,10 @@ log_headers True If True, log headers in each
|
||||
request
|
||||
recheck_account_existence 60 Cache timeout in seconds to
|
||||
send memcached for account
|
||||
existance
|
||||
existence
|
||||
recheck_container_existence 60 Cache timeout in seconds to
|
||||
send memcached for container
|
||||
existance
|
||||
existence
|
||||
object_chunk_size 65536 Chunk size to read from
|
||||
object servers
|
||||
client_chunk_size 65536 Chunk size to read from
|
||||
@@ -474,7 +479,7 @@ rate_limit_account_whitelist Comma separated list of
|
||||
rate limit
|
||||
rate_limit_account_blacklist Comma separated list of
|
||||
account name hashes to block
|
||||
completly
|
||||
completely
|
||||
============================ =============== =============================
|
||||
|
||||
[auth]
|
||||
@@ -482,7 +487,7 @@ rate_limit_account_blacklist Comma separated list of
|
||||
============ =================================== ========================
|
||||
Option Default Description
|
||||
------------ ----------------------------------- ------------------------
|
||||
use paste.deploy entry point
|
||||
use Entry point for paste.deploy
|
||||
to use for auth. To
|
||||
use the swift dev auth,
|
||||
set to:
|
||||
@@ -500,7 +505,7 @@ Memcached Considerations
|
||||
------------------------
|
||||
|
||||
Several of the Services rely on Memcached for caching certain types of
|
||||
lookups, such as auth tokens, and container/account existance. Swift does
|
||||
lookups, such as auth tokens, and container/account existence. Swift does
|
||||
not do any caching of actual object data. Memcached should be able to run
|
||||
on any servers that have available RAM and CPU. At Rackspace, we run
|
||||
Memcached on the proxy servers. The `memcache_servers` config option
|
||||
@@ -526,7 +531,7 @@ Most services support either a worker or concurrency value in the settings.
|
||||
This allows the services to make effective use of the cores available. A good
|
||||
starting point to set the concurrency level for the proxy and storage services
|
||||
to 2 times the number of cores available. If more than one service is
|
||||
sharing a server, then some experimentaiton may be needed to find the best
|
||||
sharing a server, then some experimentation may be needed to find the best
|
||||
balance.
|
||||
|
||||
At Rackspace, our Proxy servers have dual quad core processors, giving us 8
|
||||
@@ -548,7 +553,7 @@ Filesystem Considerations
|
||||
-------------------------
|
||||
|
||||
Swift is designed to be mostly filesystem agnostic--the only requirement
|
||||
beeing that the filesystem supports extended attributes (xattrs). After
|
||||
being that the filesystem supports extended attributes (xattrs). After
|
||||
thorough testing with our use cases and hardware configurations, XFS was
|
||||
the best all-around choice. If you decide to use a filesystem other than
|
||||
XFS, we highly recommend thorough testing.
|
||||
@@ -611,5 +616,5 @@ Logging Considerations
|
||||
|
||||
Swift is set up to log directly to syslog. Every service can be configured
|
||||
with the `log_facility` option to set the syslog log facility destination.
|
||||
It is recommended to use syslog-ng to route the logs to specific log
|
||||
We recommend using syslog-ng to route the logs to specific log
|
||||
files locally on the server and also to remote log collecting servers.
|
||||
|
||||
@@ -7,9 +7,7 @@ Instructions for setting up a dev VM
|
||||
------------------------------------
|
||||
|
||||
This documents setting up a virtual machine for doing Swift development. The
|
||||
virtual machine will emulate running a four node Swift cluster. It assumes
|
||||
you're using *VMware Fusion 3* on *Mac OS X Snow Leopard*, but should give a
|
||||
good idea what to do on other environments.
|
||||
virtual machine will emulate running a four node Swift cluster.
|
||||
|
||||
* Get the *Ubuntu 10.04 LTS (Lucid Lynx)* server image:
|
||||
|
||||
@@ -17,20 +15,9 @@ good idea what to do on other environments.
|
||||
- Ubuntu Live/Install: http://cdimage.ubuntu.com/releases/10.04/release/ubuntu-10.04-dvd-amd64.iso (4.1 GB)
|
||||
- Ubuntu Mirrors: https://launchpad.net/ubuntu/+cdmirrors
|
||||
|
||||
* Create guest virtual machine:
|
||||
|
||||
#. `Continue without disc`
|
||||
#. `Use operating system installation disc image file`, pick the .iso
|
||||
from above.
|
||||
#. Select `Linux` and `Ubuntu 64-bit`.
|
||||
#. Fill in the *Linux Easy Install* details.
|
||||
#. `Customize Settings`, name the image whatever you want
|
||||
(`SAIO` for instance.)
|
||||
#. When the `Settings` window comes up, select `Hard Disk`, create an
|
||||
extra disk (the defaults are fine).
|
||||
#. Start the virtual machine up and wait for the easy install to
|
||||
finish.
|
||||
|
||||
* Create guest virtual machine from the Ubuntu image (if you are going to use
|
||||
a separate partition for swift data, be sure to add another device when
|
||||
creating the VM)
|
||||
* As root on guest (you'll have to log in as you, then `sudo su -`):
|
||||
|
||||
#. `apt-get install python-software-properties`
|
||||
@@ -41,11 +28,22 @@ good idea what to do on other environments.
|
||||
python-xattr sqlite3 xfsprogs python-webob python-eventlet
|
||||
python-greenlet python-pastedeploy`
|
||||
#. Install anything else you want, like screen, ssh, vim, etc.
|
||||
#. `fdisk /dev/sdb` (set up a single partition)
|
||||
#. `mkfs.xfs -i size=1024 /dev/sdb1`
|
||||
#. If you would like to use another partition for storage:
|
||||
|
||||
#. `fdisk /dev/sdb` (set up a single partition)
|
||||
#. `mkfs.xfs -i size=1024 /dev/sdb1`
|
||||
#. Edit `/etc/fstab` and add
|
||||
`/dev/sdb1 /mnt/sdb1 xfs noatime,nodiratime,nobarrier,logbufs=8 0 0`
|
||||
|
||||
#. If you would like to use a loopback device instead of another partition:
|
||||
|
||||
#. `dd if=/dev/zero of=/srv/swift-disk bs=1024 count=0 seek=1000000`
|
||||
(modify seek to make a larger or smaller partition)
|
||||
#. `mkfs.xfs -i size=1024 /srv/swift-disk`
|
||||
#. Edit `/etc/fstab` and add
|
||||
`/srv/swift-disk /mnt/sdb1 xfs loop,noatime,nodiratime,nobarrier,logbufs=8 0 0`
|
||||
|
||||
#. `mkdir /mnt/sdb1`
|
||||
#. Edit `/etc/fstab` and add
|
||||
`/dev/sdb1 /mnt/sdb1 xfs noatime,nodiratime,nobarrier,logbufs=8 0 0`
|
||||
#. `mount /mnt/sdb1`
|
||||
#. `mkdir /mnt/sdb1/1 /mnt/sdb1/2 /mnt/sdb1/3 /mnt/sdb1/4 /mnt/sdb1/test`
|
||||
#. `chown <your-user-name>:<your-group-name> /mnt/sdb1/*`
|
||||
@@ -56,7 +54,7 @@ good idea what to do on other environments.
|
||||
#. Add to `/etc/rc.local` (before the `exit 0`)::
|
||||
|
||||
mkdir /var/run/swift
|
||||
chown <your-user-name>:<your-user-name> /var/run/swift
|
||||
chown <your-user-name>:<your-group-name> /var/run/swift
|
||||
|
||||
#. Create /etc/rsyncd.conf::
|
||||
|
||||
@@ -64,7 +62,7 @@ good idea what to do on other environments.
|
||||
gid = <Your group name>
|
||||
log file = /var/log/rsyncd.log
|
||||
pid file = /var/run/rsyncd.pid
|
||||
|
||||
address = 127.0.0.1
|
||||
|
||||
[account6012]
|
||||
max connections = 25
|
||||
@@ -472,6 +470,11 @@ good idea what to do on other environments.
|
||||
sudo service rsyslog restart
|
||||
sudo service memcached restart
|
||||
|
||||
.. note::
|
||||
|
||||
If you are using a loopback device, substitute `/dev/sdb1` above with
|
||||
`/srv/swift-disk`
|
||||
|
||||
#. Create `~/bin/remakerings`::
|
||||
|
||||
#!/bin/bash
|
||||
|
||||
@@ -24,6 +24,7 @@ Overview:
|
||||
overview_reaper
|
||||
overview_auth
|
||||
overview_replication
|
||||
overview_stats
|
||||
rate_limiting
|
||||
|
||||
Development:
|
||||
|
||||
184
doc/source/overview_stats.rst
Normal file
@@ -0,0 +1,184 @@
|
||||
==================
|
||||
Swift stats system
|
||||
==================
|
||||
|
||||
The swift stats system is composed of three parts: log creation, log
uploading, and log processing. The system handles two types of logs (access
and account stats), but it can be extended to handle other types of logs.
|
||||
|
||||
---------
|
||||
Log Types
|
||||
---------
|
||||
|
||||
***********
|
||||
Access logs
|
||||
***********
|
||||
|
||||
Access logs are the proxy server logs. Rackspace uses syslog-ng to redirect
the proxy log output to an hourly log file. For example, a proxy request that
is made on August 4, 2010 at 12:37 gets logged in a file named 2010080412.
This allows easy log rotation and easy per-hour log processing.
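As a small illustration of the hourly naming scheme, the sketch below maps a request time to its log file name and back. The helper names are hypothetical; the format string is the `%Y%m%d%H` pattern implied by the example file name 2010080412.

```python
from datetime import datetime

# Year, month, day, hour with no separators, e.g. 2010080412.
HOURLY_FORMAT = "%Y%m%d%H"


def hourly_log_name(timestamp):
    """Return the name of the hourly log file a request would land in."""
    return timestamp.strftime(HOURLY_FORMAT)


def hour_of_log(name):
    """Return the start of the hour that a log file name covers."""
    return datetime.strptime(name, HOURLY_FORMAT)


# A request made on August 4, 2010 at 12:37.
print(hourly_log_name(datetime(2010, 8, 4, 12, 37)))
print(hour_of_log("2010080412"))
```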
|
||||
|
||||
******************
|
||||
Account stats logs
|
||||
******************
|
||||
|
||||
Account stats logs are generated by a stats system process.
swift-account-stats-logger runs on each account server (via cron) and walks
the filesystem looking for account databases. When an account database is
found, the logger selects the account hash, bytes_used, container_count, and
object_count. These values are then written out as one line in a csv file. One
csv file is produced for every run of swift-account-stats-logger. This means
that, system wide, one csv file is produced for every storage node. Rackspace
runs the account stats logger every hour. Therefore, in a cluster of ten
account servers, ten csv files are produced every hour. Also, every account
will have one entry for every replica in the system. On average, there will be
three copies of each account in the aggregate of all account stat csv files
created in one system-wide run.
|
||||
|
||||
----------------------
|
||||
Log Processing plugins
|
||||
----------------------
|
||||
|
||||
The swift stats system is written to allow a plugin to be defined for every
log type. Swift includes plugins for both access logs and storage stats logs.
Each plugin is responsible for defining, in a config section, where the logs
are stored on disk, where the logs will be stored in swift (account and
container), the filename format of the logs on disk, the location of the
plugin class definition, and any plugin-specific config values.
|
||||
|
||||
The plugin class definition defines three methods. The constructor must accept
one argument (the dict representation of the plugin's config section). The
process method must accept an iterator, and the account, container, and object
name of the log. The keylist_mapping method accepts no parameters.
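A minimal sketch of a plugin class with this shape follows. Everything beyond the three method signatures is an assumption made for illustration: the class name, the comma-separated line format, the counter names, and the structure returned by `keylist_mapping` are hypothetical and are not taken from the plugins shipped with Swift.

```python
class ExampleLogProcessor:
    """Hypothetical plugin that counts requests and bytes per account-hour."""

    def __init__(self, conf):
        # conf is the dict representation of the plugin's config section.
        self.conf = conf
        # Assumed plugin-specific option; defaults to a comma.
        self.field_separator = conf.get("field_separator", ",")

    def process(self, obj_stream, account, container, object_name):
        """Summarize one log file.

        obj_stream iterates over the lines of the log. Each line is assumed
        (for this sketch only) to look like:
            account,year,month,day,hour,bytes
        Returns a dict keyed on (account, year, month, day, hour).
        """
        totals = {}
        for line in obj_stream:
            fields = line.strip().split(self.field_separator)
            if len(fields) != 6:
                continue  # skip lines that do not match the assumed format
            acct, year, month, day, hour, nbytes = fields
            key = (acct, year, month, day, hour)
            entry = totals.setdefault(key, {"requests": 0, "bytes": 0})
            entry["requests"] += 1
            entry["bytes"] += int(nbytes)
        return totals

    def keylist_mapping(self):
        # Assumed shape: final csv column name -> counter name from process().
        return {"request_count": "requests", "bytes_transferred": "bytes"}


plugin = ExampleLogProcessor({})
lines = ["a,2010,08,04,12,10\n", "a,2010,08,04,12,5\n", "malformed\n"]
print(plugin.process(iter(lines), "stats", "log_data", "2010080412"))
```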
|
||||
|
||||
-------------
|
||||
Log Uploading
|
||||
-------------
|
||||
|
||||
swift-log-uploader accepts a config file and a plugin name. It finds the log
files on disk according to the plugin config section and uploads them to the
swift cluster. This means one uploader process will run on each proxy server
node and each account server node. To avoid uploading partially written log
files, the uploader skips files whose mtime is less than two hours old.
|
||||
Rackspace runs this process once an hour via cron.
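The mtime cutoff can be sketched as a simple filter over the log directory. This is not the swift-log-uploader implementation; the function name and parameters are assumptions, and the upload itself is left out.

```python
import os
import time


def files_ready_for_upload(log_dir, cutoff_seconds=2 * 60 * 60, now=None):
    """Return paths in log_dir last modified at least cutoff_seconds ago.

    Hypothetical helper: files modified more recently than the cutoff are
    assumed to still be written to and are skipped.
    """
    now = time.time() if now is None else now
    ready = []
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if not os.path.isfile(path):
            continue  # ignore subdirectories and other non-files
        if now - os.path.getmtime(path) >= cutoff_seconds:
            ready.append(path)
    return ready
```

Passing `now` explicitly keeps the function easy to test; in a cron job it would normally be left at its default.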
|
||||
|
||||
--------------
|
||||
Log Processing
|
||||
--------------
|
||||
|
||||
swift-log-stats-collector accepts a config file and generates a csv that is
uploaded to swift. It loads all plugins defined in the config file, generates
a list of all log files in swift that need to be processed, and passes an
iterable of the log file data to the appropriate plugin's process method. The
process method returns a dictionary of data in the log file keyed on (account,
year, month, day, hour). The log-stats-collector process then combines all
dictionaries from all calls to a process method into one dictionary. Key
collisions within each (account, year, month, day, hour) dictionary are
summed. Finally, the summed dictionary is mapped to the final csv values with
each plugin's keylist_mapping method.
|
||||
|
||||
The resulting csv file has one line per (account, year, month, day, hour) for
all log files processed in that run of swift-log-stats-collector.
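The combining step described above can be sketched as follows. The function name and the counter names in the example are hypothetical; the sketch only assumes that each call to a process method yields a dict keyed on (account, year, month, day, hour) whose values are dicts of numeric counters.

```python
from collections import defaultdict


def combine_results(per_file_results):
    """Merge per-file result dicts, summing counters on key collisions."""
    combined = defaultdict(lambda: defaultdict(int))
    for result in per_file_results:
        for key, values in result.items():
            for name, amount in values.items():
                combined[key][name] += amount
    # Convert back to plain dicts so the result is easy to inspect.
    return {key: dict(values) for key, values in combined.items()}


hour = ("acct", "2010", "08", "04", "12")
first = {hour: {"requests": 2, "bytes": 15}}
second = {hour: {"requests": 1, "bytes_used": 4096}}
print(combine_results([first, second]))
```

Each key in the combined dictionary then corresponds to one line of the final csv file.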
|
||||
|
||||
|
||||
================================
|
||||
Running the stats system on SAIO
|
||||
================================
|
||||
|
||||
#. Create a swift account to use for storing stats information, and note the
|
||||
account hash. The hash will be used in config files.
|
||||
|
||||
#. Install syslog-ng::
|
||||
|
||||
sudo apt-get install syslog-ng
|
||||
|
||||
#. Add the following to the end of `/etc/syslog-ng/syslog-ng.conf`::
|
||||
|
||||
# Added for swift logging
|
||||
destination df_local1 { file("/var/log/swift/proxy.log" owner(<username>) group(<groupname>)); };
|
||||
destination df_local1_err { file("/var/log/swift/proxy.error" owner(<username>) group(<groupname>)); };
|
||||
destination df_local1_hourly { file("/var/log/swift/hourly/$YEAR$MONTH$DAY$HOUR" owner(<username>) group(<groupname>)); };
|
||||
filter f_local1 { facility(local1) and level(info); };
|
||||
|
||||
filter f_local1_err { facility(local1) and not level(info); };
|
||||
|
||||
# local1.info -/var/log/swift/proxy.log
|
||||
# write to local file and to remote log server
|
||||
log {
|
||||
source(s_all);
|
||||
filter(f_local1);
|
||||
destination(df_local1);
|
||||
destination(df_local1_hourly);
|
||||
};
|
||||
|
||||
# local1.error -/var/log/swift/proxy.error
|
||||
# write to local file and to remote log server
|
||||
log {
|
||||
source(s_all);
|
||||
filter(f_local1_err);
|
||||
destination(df_local1_err);
|
||||
};
|
||||
|
||||
#. Restart syslog-ng
|
||||
|
||||
#. Create the log directories::
|
||||
|
||||
mkdir /var/log/swift/hourly
|
||||
mkdir /var/log/swift/stats
|
||||
chown -R <username>:<groupname> /var/log/swift
|
||||
|
||||
#. Create `/etc/swift/log-processor.conf`::
|
||||
|
||||
[log-processor]
|
||||
swift_account = <your-stats-account-hash>
|
||||
user = <your-user-name>
|
||||
|
||||
[log-processor-access]
|
||||
swift_account = <your-stats-account-hash>
|
||||
container_name = log_data
|
||||
log_dir = /var/log/swift/hourly/
|
||||
source_filename_format = %Y%m%d%H
|
||||
class_path = swift.stats.access_processor.AccessLogProcessor
|
||||
user = <your-user-name>
|
||||
|
||||
[log-processor-stats]
|
||||
swift_account = <your-stats-account-hash>
|
||||
container_name = account_stats
|
||||
log_dir = /var/log/swift/stats/
|
||||
source_filename_format = %Y%m%d%H_*
|
||||
class_path = swift.stats.stats_processor.StatsLogProcessor
|
||||
account_server_conf = /etc/swift/account-server/1.conf
|
||||
user = <your-user-name>
|
||||
|
||||
#. Add the following under [app:proxy-server] in `/etc/swift/proxy-server.conf`::
|
||||
|
||||
log_facility = LOG_LOCAL1
|
||||
|
||||
#. Create a `cron` job to run once per hour to create the stats logs. In
|
||||
`/etc/cron.d/swift-stats-log-creator`::
|
||||
|
||||
0 * * * * <your-user-name> swift-account-stats-logger /etc/swift/log-processor.conf
|
||||
|
||||
#. Create a `cron` job to run once per hour to upload the stats logs. In
|
||||
`/etc/cron.d/swift-stats-log-uploader`::
|
||||
|
||||
10 * * * * <your-user-name> swift-log-uploader /etc/swift/log-processor.conf stats
|
||||
|
||||
#. Create a `cron` job to run once per hour to upload the access logs. In
|
||||
`/etc/cron.d/swift-access-log-uploader`::
|
||||
|
||||
5 * * * * <your-user-name> swift-log-uploader /etc/swift/log-processor.conf access
|
||||
|
||||
#. Create a `cron` job to run once per hour to process the logs. In
|
||||
`/etc/cron.d/swift-stats-processor`::
|
||||
|
||||
30 * * * * <your-user-name> swift-log-stats-collector /etc/swift/log-processor.conf
|
||||
|
||||
After running for a few hours, you should start to see .csv files in the
log_processing_data container in the swift stats account that was created
earlier. One .csv file should be produced per hour, and each file will have one
entry per account per hour for each account with activity in that hour. Note
that the stats will be delayed by at least two hours by default. This can be
changed with the new_log_cutoff variable in the config file. See
`log-processing.conf-sample` for more details.
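To check the `log-processor.conf` layout from the steps above, the file can be read with Python's standard `configparser`. This is a sketch under stated assumptions: the cron commands pass a plugin name such as `access`, so the example looks up the section `log-processor-<name>`, and it overlays that section on the shared `[log-processor]` section. That merging rule, the helper name, and the sample account and user values are illustrative guesses, not documented behavior of the stats tools.

```python
import configparser

# Trimmed copy of the config from the steps above, with placeholder values.
SAMPLE_CONF = """
[log-processor]
swift_account = AUTH_example
user = swift

[log-processor-access]
container_name = log_data
log_dir = /var/log/swift/hourly/
source_filename_format = %Y%m%d%H
class_path = swift.stats.access_processor.AccessLogProcessor
"""


def plugin_settings(conf_text, plugin_name):
    """Return the settings for one plugin as a plain dict (hypothetical)."""
    # interpolation=None is needed because values such as %Y%m%d%H contain
    # percent signs that the default interpolation would try to expand.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    settings = dict(parser["log-processor"])
    settings.update(parser["log-processor-" + plugin_name])
    return settings


print(plugin_settings(SAMPLE_CONF, "access"))
```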
|
||||