Swift stats system
==================

The swift stats system is composed of three parts: log creation, log uploading, and log processing. The system handles two types of logs (access and account stats), but it can be extended to handle other types of logs.

Log Types
---------

Access logs
~~~~~~~~~~~

Access logs are the proxy server logs. Rackspace uses syslog-ng to redirect the proxy log output to an hourly log file. For example, a proxy request that is made on August 4, 2010 at 12:37 gets logged in a file named 2010080412. This allows easy log rotation and easy per-hour log processing.

Account / Container DB stats logs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

DB stats logs are generated by a stats system process. swift-account-stats-logger runs on each account server (via cron) and walks the filesystem looking for account databases. When an account database is found, the logger selects the account hash, bytes_used, container_count, and object_count. These values are then written out as one line in a csv file. One csv file is produced for every run of swift-account-stats-logger. This means that, system wide, one csv file is produced for every storage node. Rackspace runs the account stats logger every hour. Therefore, in a cluster of ten account servers, ten csv files are produced every hour. Also, every account will have one entry for every replica in the system. On average, there will be three copies of each account in the aggregate of all account stat csv files created in one system-wide run. The swift-container-stats-logger runs in a similar fashion, scanning the container dbs.
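
For illustration only, a single row of such a csv file (account hash, bytes_used, container_count, object_count, with made-up values) might look like::

    d5bce527a1a0fa97ba5bdf151388e75d,20971520,3,42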

Log Processing plugins
----------------------

The swift stats system is written to allow a plugin to be defined for every log type. Swift includes plugins for both access logs and storage stats logs. Each plugin is responsible for defining, in a config section, where the logs are stored on disk, where the logs will be stored in swift (account and container), the filename format of the logs on disk, the location of the plugin class definition, and any plugin-specific config values.

The plugin class defines three methods. The constructor must accept one argument: the dict representation of the plugin's config section. The process method must accept an iterator over the log file's data, plus the account, container, and object name of the log. The keylist_mapping method accepts no parameters.
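
A minimal sketch of that interface (the class name and the parsing details here are hypothetical; only the three-method shape comes from the description above)::

    class ExampleLogProcessor(object):
        """Illustrative plugin skeleton, not part of slogging."""

        def __init__(self, conf):
            # conf is the dict representation of this plugin's config section
            self.conf = conf

        def process(self, obj_stream, account, container, object_name):
            # obj_stream iterates over the log file's data; return a dict
            # keyed on (account, year, month, day, hour)
            aggregated = {}
            for line in obj_stream:
                pass  # parse the line and accumulate per-hour counters here
            return aggregated

        def keylist_mapping(self):
            # map the counter names used above to the final csv columns
            return {}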

Log Uploading
-------------

swift-log-uploader accepts a config file and a plugin name. It finds the log files on disk according to the plugin's config section and uploads them to the swift cluster. This means one uploader process runs on each proxy server node and each account server node. To avoid uploading partially written log files, the uploader skips any file whose mtime is less than two hours old. Rackspace runs this process once an hour via cron.
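
The cutoff rule can be sketched as follows (this is not the uploader's actual code, and the helper name is made up)::

    import os
    import time

    NEW_LOG_CUTOFF = 2 * 60 * 60  # two hours, in seconds

    def old_enough_to_upload(path, cutoff=NEW_LOG_CUTOFF):
        # Only upload files that have not been modified for at least
        # `cutoff` seconds, so partially written logs are left alone.
        return (time.time() - os.path.getmtime(path)) >= cutoff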

Log Processing
--------------

swift-log-stats-collector accepts a config file and generates a csv that is uploaded to swift. It loads all plugins defined in the config file, generates a list of all log files in swift that need to be processed, and passes an iterable of the log file data to the appropriate plugin's process method. The process method returns a dictionary of data in the log file keyed on (account, year, month, day, hour). The log-stats-collector process then combines all dictionaries from all calls to a process method into one dictionary. Key collisions within each (account, year, month, day, hour) dictionary are summed. Finally, the summed dictionary is mapped to the final csv values with each plugin's keylist_mapping method.
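
The combining step can be sketched like this (illustrative only, not slogging's implementation)::

    def combine(per_plugin_results):
        # per_plugin_results: an iterable of dicts returned by process(),
        # each keyed on (account, year, month, day, hour)
        combined = {}
        for result in per_plugin_results:
            for key, counters in result.items():
                totals = combined.setdefault(key, {})
                for name, value in counters.items():
                    # key collisions are summed
                    totals[name] = totals.get(name, 0) + value
        return combined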

The resulting csv file has one line per (account, year, month, day, hour) for all log files processed in that run of swift-log-stats-collector.

Running the stats system on SAIO
--------------------------------

  1. Create a swift account to use for storing stats information, and note the account hash. The hash will be used in config files.

  2. Edit /etc/rsyslog.d/10-swift.conf::

    # Uncomment the following to have a log containing all logs together
    #local1,local2,local3,local4,local5.*   /var/log/swift/all.log
    
    $template HourlyProxyLog,"/var/log/swift/hourly/%$YEAR%%$MONTH%%$DAY%%$HOUR%"
    local1.*;local1.!notice ?HourlyProxyLog
    
    local1.*;local1.!notice /var/log/swift/proxy.log
    local1.notice           /var/log/swift/proxy.error
    local1.*                ~
  3. Edit /etc/rsyslog.conf and make the following change::

    $PrivDropToGroup adm

  4. mkdir -p /var/log/swift/hourly

  5. chown -R syslog.adm /var/log/swift

  6. chmod 775 /var/log/swift /var/log/swift/hourly

  7. service rsyslog restart

  8. usermod -a -G adm <your-user-name>

  9. Relogin to let the group change take effect.

  10. Create `/etc/swift/log-processor.conf` (a quick check of the access log filename pattern used here appears after this list)::

    [log-processor]
    swift_account = <your-stats-account-hash>
    user = <your-user-name>
    
    [log-processor-access]
    swift_account = <your-stats-account-hash>
    container_name = log_data
    log_dir = /var/log/swift/hourly/
    source_filename_pattern = ^
        (?P<year>[0-9]{4})
        (?P<month>[0-1][0-9])
        (?P<day>[0-3][0-9])
        (?P<hour>[0-2][0-9])
        .*$
    class_path = slogging.access_processor.AccessLogProcessor
    user = <your-user-name>
    
    [log-processor-stats]
    swift_account = <your-stats-account-hash>
    container_name = account_stats
    log_dir = /var/log/swift/stats/
    class_path = slogging.stats_processor.StatsLogProcessor
    devices = /srv/1/node
    mount_check = false
    user = <your-user-name>
    
    [log-processor-container-stats]
    swift_account = <your-stats-account-hash>
    container_name = container_stats
    log_dir = /var/log/swift/stats/
    class_path = slogging.stats_processor.StatsLogProcessor
    processable = false
    devices = /srv/1/node
    mount_check = false
    user = <your-user-name>
  11. Add the following under [app:proxy-server] in `/etc/swift/proxy-server.conf`::

    log_facility = LOG_LOCAL1
  12. Create a cron job to run once per hour to create the stats logs. In `/etc/cron.d/swift-stats-log-creator`::

    0 * * * * <your-user-name> /usr/local/bin/swift-account-stats-logger /etc/swift/log-processor.conf
  13. Create a cron job to run once per hour to create the container stats logs. In `/etc/cron.d/swift-container-stats-log-creator`::

    5 * * * * <your-user-name> /usr/local/bin/swift-container-stats-logger /etc/swift/log-processor.conf
  14. Create a cron job to run once per hour to upload the stats logs. In `/etc/cron.d/swift-stats-log-uploader`::

    10 * * * * <your-user-name> /usr/local/bin/swift-log-uploader /etc/swift/log-processor.conf stats
  15. Create a cron job to run once per hour to upload the container stats logs. In `/etc/cron.d/swift-stats-log-uploader`::

    15 * * * * <your-user-name> /usr/local/bin/swift-log-uploader /etc/swift/log-processor.conf container-stats
  16. Create a cron job to run once per hour to upload the access logs. In `/etc/cron.d/swift-access-log-uploader`::

    5 * * * * <your-user-name> /usr/local/bin/swift-log-uploader /etc/swift/log-processor.conf access
  17. Create a cron job to run once per hour to process the logs. In `/etc/cron.d/swift-stats-processor`::

    30 * * * * <your-user-name> /usr/local/bin/swift-log-stats-collector /etc/swift/log-processor.conf
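
Step 10's access log `source_filename_pattern` can be sanity-checked against an hourly log name like the `2010080412` example earlier; the multi-line config value is concatenated onto one line in this standalone snippet (not part of slogging)::

    import re

    pattern = re.compile(
        r'^(?P<year>[0-9]{4})(?P<month>[0-1][0-9])'
        r'(?P<day>[0-3][0-9])(?P<hour>[0-2][0-9]).*$')

    print(pattern.match('2010080412').groupdict())
    # -> year 2010, month 08, day 04, hour 12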

After running for a few hours, you should start to see .csv files in the log_processing_data container in the swift stats account that was created earlier. Each file will have one entry for each account that had activity in that hour. One .csv file should be produced per hour. Note that the stats will be delayed by at least two hours by default. This can be changed with the new_log_cutoff variable in the config file. See log-processor.conf-sample for more details.