Swift stats system
The Swift stats system is composed of three parts: log creation, log uploading, and log processing. The system handles two types of logs (access logs and account stats logs), but it can be extended to handle other types of logs.
Log Types
Access logs
Access logs are the proxy server logs. Rackspace uses syslog-ng to redirect the proxy log output to an hourly log file. For example, a proxy request made on August 4, 2010 at 12:37 is logged in a file named 2010080412. This makes log rotation and per-hour log processing easy.
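The hourly naming scheme amounts to formatting the request time to hour precision. A minimal Python sketch, where `hourly_log_name` is a hypothetical helper rather than part of Swift or syslog-ng:

```python
from datetime import datetime


def hourly_log_name(timestamp: datetime) -> str:
    """Return the hourly log file name (YYYYMMDDHH) for a request time."""
    # Minutes and seconds are dropped, so every request in the same hour
    # lands in the same file.
    return timestamp.strftime("%Y%m%d%H")


if __name__ == "__main__":
    # August 4, 2010 at 12:37 -> 2010080412
    print(hourly_log_name(datetime(2010, 8, 4, 12, 37)))
```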
Account stats logs
Account stats logs are generated by a stats system process. swift-account-stats-logger runs on each account server (via cron) and walks the filesystem looking for account databases. For each account database it finds, the logger selects the account hash, bytes_used, container_count, and object_count, and writes these values as one line in a csv file. Each run of swift-account-stats-logger produces one csv file, so every system-wide run produces one csv file per storage node. Rackspace runs the account stats logger every hour; therefore, a cluster of ten account servers produces ten csv files every hour. Because every replica of an account database is found, each account has one entry per replica in the system. With three replicas, there will on average be three copies of each account in the aggregate of all account stats csv files created in one system-wide run.
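A rough Python sketch of one logger run follows. It assumes, for illustration only, that account databases are SQLite files ending in `.db` with an `account_stat` table holding the four selected columns; `write_account_stats` is hypothetical and is not the swift-account-stats-logger implementation.

```python
import csv
import os
import sqlite3


def write_account_stats(db_root: str, csv_path: str) -> int:
    """Walk db_root for account databases and write one csv line per database.

    Assumption: each database is a SQLite file ending in ".db" with an
    `account_stat` table containing account, bytes_used, container_count,
    and object_count. Returns the number of rows written.
    """
    rows = 0
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out, lineterminator="\n")
        for dirpath, _dirnames, filenames in os.walk(db_root):
            for name in sorted(filenames):
                if not name.endswith(".db"):
                    continue
                conn = sqlite3.connect(os.path.join(dirpath, name))
                try:
                    row = conn.execute(
                        "SELECT account, bytes_used, container_count, object_count "
                        "FROM account_stat"
                    ).fetchone()
                finally:
                    conn.close()
                if row is not None:
                    # One csv line per account database (and so per replica).
                    writer.writerow(row)
                    rows += 1
    return rows
```

Two replicas of the same account database yield two identical lines, which is why each account appears once per replica in the aggregate.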
Log Processing plugins
The Swift stats system is written to allow a plugin to be defined for every log type. Swift includes plugins for both access logs and account stats logs. Each plugin is responsible for defining, in its own config section, where the logs are stored on disk, where the logs will be stored in swift (account and container), the filename format of the logs on disk, the location of the plugin class definition, and any plugin-specific config values.
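Loading a plugin from its config section might look like the following Python sketch. The `[log-processor-<name>]` section naming and the `class_path` key mirror the SAIO config shown later in this section; `load_plugin` itself is hypothetical.

```python
import configparser
import importlib


def load_plugin(conf_path: str, plugin_name: str):
    """Instantiate the plugin configured in [log-processor-<plugin_name>]."""
    # interpolation=None keeps values such as "%Y%m%d%H" literal; the default
    # BasicInterpolation would reject the bare "%" characters.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(conf_path):
        raise FileNotFoundError(conf_path)
    conf = dict(parser["log-processor-" + plugin_name])
    # class_path is a dotted "module.ClassName" string.
    module_name, _, class_name = conf["class_path"].rpartition(".")
    plugin_class = getattr(importlib.import_module(module_name), class_name)
    # The constructor receives the config section as a plain dict.
    return plugin_class(conf)
```

Disabling interpolation matters here because the filename formats in these sections are strftime patterns full of `%` signs.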
The plugin class must define three methods. The constructor must accept one argument: the dict representation of the plugin's config section. The process method must accept an iterator over the log data, plus the account, container, and object name of the log. The keylist_mapping method accepts no parameters.
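A skeleton plugin with the three methods, sketched in Python for the account stats format described above. `ExampleStatsPlugin`, the `replica_count` field, and the assumption that object names begin with `YYYY/MM/DD/HH/` are all hypothetical:

```python
class ExampleStatsPlugin:
    """Hypothetical plugin for lines "account,bytes_used,container_count,object_count"."""

    def __init__(self, conf):
        # conf is the plugin's config section as a plain dict.
        self.conf = conf

    def process(self, obj_stream, account, container, object_name):
        # account and container say where the log object lives in swift; this
        # sketch only needs object_name, assumed to start with "YYYY/MM/DD/HH/".
        year, month, day, hour = object_name.split("/")[:4]
        totals = {}
        for line in obj_stream:
            line = line.strip()
            if not line:
                continue
            try:
                acct, bytes_used, containers, objects = line.split(",")
                counts = {
                    "bytes_used": int(bytes_used),
                    "container_count": int(containers),
                    "object_count": int(objects),
                    "replica_count": 1,
                }
            except ValueError:
                continue  # skip malformed lines
            entry = totals.setdefault((acct.strip('"'), year, month, day, hour), {})
            # Every replica contributes a line, so values are summed across
            # replicas; replica_count records how many lines were summed.
            for field, value in counts.items():
                entry[field] = entry.get(field, 0) + value
        return totals

    def keylist_mapping(self):
        # Maps each final csv column to the key it reads in the per-hour dict.
        return {name: name for name in
                ("bytes_used", "container_count", "object_count", "replica_count")}
```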
Log Uploading
swift-log-uploader accepts a config file and a plugin name. It finds the log files on disk according to the plugin's config section and uploads them to the swift cluster. Because access logs live on the proxy servers and account stats logs on the account servers, one uploader process runs on each proxy server node and on each account server node. To avoid uploading partially written log files, the uploader skips any file whose mtime is less than two hours old. Rackspace runs this process once an hour via cron.
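The file-selection rule can be sketched in Python as below. It assumes the cutoff is expressed in seconds (7200 for two hours) under the name new_log_cutoff mentioned at the end of this section, and it understands only the `%Y`, `%m`, `%d`, `%H`, and `*` codes used in the sample config. `filename_regex` and `files_ready_for_upload` are hypothetical, and the upload step itself is omitted.

```python
import os
import re
import time

# strftime codes this sketch understands, mapped to regex fragments.
_CODES = {"%Y": r"\d{4}", "%m": r"\d{2}", "%d": r"\d{2}", "%H": r"\d{2}"}


def filename_regex(source_filename_format):
    """Translate a source_filename_format such as "%Y%m%d%H_*" into a regex."""
    parts, i = [], 0
    while i < len(source_filename_format):
        code = source_filename_format[i:i + 2]
        if code in _CODES:
            parts.append(_CODES[code])
            i += 2
        elif source_filename_format[i] == "*":
            parts.append(".*")
            i += 1
        else:
            parts.append(re.escape(source_filename_format[i]))
            i += 1
    return re.compile("".join(parts))


def files_ready_for_upload(log_dir, source_filename_format, new_log_cutoff=7200, now=None):
    """Return files in log_dir that match the format and are at least
    new_log_cutoff seconds old, so partially written logs are skipped."""
    now = time.time() if now is None else now
    pattern = filename_regex(source_filename_format)
    ready = []
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if not os.path.isfile(path) or not pattern.fullmatch(name):
            continue
        if now - os.path.getmtime(path) >= new_log_cutoff:
            ready.append(path)
    return ready
```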
Log Processing
swift-log-stats-collector accepts a config file and generates a csv that is uploaded to swift. It loads all plugins defined in the config file, generates a list of all log files in swift that need to be processed, and passes an iterable of each log file's data to the appropriate plugin's process method. The process method returns a dictionary of the data in the log file, keyed on (account, year, month, day, hour). The collector then combines the dictionaries from all calls to process methods into one dictionary; when two results share an (account, year, month, day, hour) key, the values within that key's dictionary are summed. Finally, each plugin's keylist_mapping method maps the summed dictionary to the final csv values.
The resulting csv file has one line per (account, year, month, day, hour) for all log files processed in that run of swift-log-stats-collector.
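The combine-then-map step can be sketched in Python as follows. `combine` and `to_csv` are hypothetical helpers; this sketch handles a single flat keylist_mapping, writes a header row, and fills missing fields with 0, all of which are assumptions rather than documented behavior of swift-log-stats-collector.

```python
import csv
import io


def combine(results):
    """Merge per-file dicts keyed on (account, year, month, day, hour),
    summing inner values field by field when keys collide."""
    combined = {}
    for result in results:
        for key, values in result.items():
            bucket = combined.setdefault(key, {})
            for field, value in values.items():
                bucket[field] = bucket.get(field, 0) + value
    return combined


def to_csv(combined, keylist_mapping):
    """Render one csv line per (account, year, month, day, hour) key.

    keylist_mapping maps each output column to the inner key it reads;
    a missing inner key is written as 0.
    """
    columns = sorted(keylist_mapping)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["account", "year", "month", "day", "hour"] + columns)
    for key in sorted(combined):
        values = combined[key]
        writer.writerow(list(key) + [values.get(keylist_mapping[c], 0) for c in columns])
    return buf.getvalue()
```

For example, two process results that both report bytes for the same account and hour collapse into a single row whose values are the sums.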
Running the stats system on SAIO
Create a swift account to use for storing stats information, and note the account hash. The hash will be used in config files.
Edit /etc/rsyslog.d/10-swift.conf:
# Uncomment the following to have a log containing all logs together
#local1,local2,local3,local4,local5.* /var/log/swift/all.log
$template HourlyProxyLog,"/var/log/swift/hourly/%$YEAR%%$MONTH%%$DAY%%$HOUR%"
local1.*;local1.!notice ?HourlyProxyLog
local1.*;local1.!notice /var/log/swift/proxy.log
local1.notice /var/log/swift/proxy.error
local1.* ~
Edit /etc/rsyslog.conf and make the following change:
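$PrivDropToGroup adm
Then create the hourly log directory, set ownership and permissions, restart rsyslog, and add your user to the adm group:
mkdir -p /var/log/swift/hourly
chown -R syslog:adm /var/log/swift
chmod 775 /var/log/swift /var/log/swift/hourly
service rsyslog restart
usermod -a -G adm <your-user-name>
Log out and back in for the group change to take effect.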
Create `/etc/swift/log-processor.conf`:
[log-processor]
swift_account = <your-stats-account-hash>
user = <your-user-name>
[log-processor-access]
swift_account = <your-stats-account-hash>
container_name = log_data
log_dir = /var/log/swift/hourly/
source_filename_format = %Y%m%d%H
class_path = swift.stats.access_processor.AccessLogProcessor
user = <your-user-name>
[log-processor-stats]
swift_account = <your-stats-account-hash>
container_name = account_stats
log_dir = /var/log/swift/stats/
source_filename_format = %Y%m%d%H_*
class_path = swift.stats.stats_processor.StatsLogProcessor
account_server_conf = /etc/swift/account-server/1.conf
user = <your-user-name>
Add the following under [app:proxy-server] in `/etc/swift/proxy-server.conf`:
log_facility = LOG_LOCAL1
Create a cron job to run once per hour to create the stats logs. In `/etc/cron.d/swift-stats-log-creator`:
0 * * * * <your-user-name> swift-account-stats-logger /etc/swift/log-processor.conf
Create a cron job to run once per hour to upload the stats logs. In `/etc/cron.d/swift-stats-log-uploader`:
10 * * * * <your-user-name> swift-log-uploader /etc/swift/log-processor.conf stats
Create a cron job to run once per hour to upload the access logs. In `/etc/cron.d/swift-access-log-uploader`:
5 * * * * <your-user-name> swift-log-uploader /etc/swift/log-processor.conf access
Create a cron job to run once per hour to process the logs. In `/etc/cron.d/swift-stats-processor`:
30 * * * * <your-user-name> swift-log-stats-collector /etc/swift/log-processor.conf
After the system has run for a few hours, you should start to see .csv files in the log_processing_data container of the stats account created earlier. Each file has one entry per account per hour for every account with activity in that hour, and one .csv file should be produced per hour. By default, the stats are delayed by at least two hours; this can be changed with the new_log_cutoff variable in the config file. See log-processor.conf-sample for more details.