Commit 3527bf4ae1

vm.swappiness and vfs_cache_pressure were both set to 1 in the same commit without justification, and both can be harmful when set that low. This commit simply lets the kernel defaults come through. Details on why setting these to 1 is bad, courtesy of Jay Vosburgh:

vfs_cache_pressure

Setting vfs_cache_pressure to 1 in all cases is likely to cause excessive memory usage in the dentry and inode caches for most workloads; for most uses, the default value of 100 is reasonable. The vfs_cache_pressure value specifies the percentage of objects in each of the "dentry" and "inode_entry" slab caches used by filesystems that will be viewed as "freeable" by the slab-shrinking logic. Some other variables also adjust the actual number of objects that the kernel will try to free, but in terms of the freeable quantity, a vfs_cache_pressure of 100 will attempt to free 100 times as many objects in a cache as a setting of 1, and a setting of 200 will attempt to free twice as many as a setting of 100.

This only comes into play when the kernel has entered reclaim, i.e. it is trying to free cached objects in order to make space to satisfy an allocation that would otherwise fail (or an allocation has already failed, or watermarks have been reached and reclaim is occurring asynchronously). With vfs_cache_pressure set to 1, the kernel will disproportionately reclaim pages from the page cache instead of from the dentry/inode caches, and those caches will grow almost without bound (if vfs_cache_pressure is 0, they will literally grow without bound until memory is exhausted). If the system as a whole has a low cache hit ratio on the objects in the dentry and inode caches, they will simply consume memory that is kept idle and force out page cache pages (file data, block data and anonymous pages). Eventually, the system will resort to swapping pages and, if all else fails, to killing processes to free memory. With very low vfs_cache_pressure values, it is more likely that processes will be killed to free memory before dentry/inode cache objects are released. Several customers have alleviated problems by setting this value back to the default, or have had to set it higher to clean things up after running at 1 for so long.

vm.swappiness

Setting this to 1 will heavily favor (ratio 1:199) releasing file-backed pages over writing anonymous pages to swap ("swapping" a file-backed page just frees the page, as it can be re-read from its backing file). This would, for example, favor keeping almost all process anonymous pages (stack, heap, etc.), even for idle processes, in memory over keeping file-backed pages in the page cache.

Change-Id: I94186f3e16f61223e362d3db0ddce799ae6120cb
Closes-Bug: 1770171
Signed-off-by: Bryan Quigley <bryan.quigley@canonical.com>
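For reference, a minimal sketch of checking these tunables on a host and restoring the kernel defaults at runtime (the stock defaults are 100 for vfs_cache_pressure and 60 for swappiness)::

    # Show the values currently in effect
    sysctl vm.vfs_cache_pressure vm.swappiness

    # Restore the kernel defaults at runtime (100 and 60 respectively)
    sysctl -w vm.vfs_cache_pressure=100 vm.swappiness=60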
README.md
Overview
Ceph is a distributed storage and network file system designed to provide excellent performance, reliability, and scalability.
This charm deploys additional Ceph OSD storage service units and should be used in conjunction with the 'ceph-mon' charm to scale out the amount of storage available in a Ceph cluster.
Usage
The charm supports specification of the storage devices to use in the ceph cluster::
    osd-devices:
        A list of devices that the charm will attempt to detect, initialise and
        activate as ceph storage.

        If the charm detects pre-existing data on a device it will go into a
        blocked state and the operator must resolve the situation utilizing the
        `list-disks`, `zap-disk` and/or `blacklist-*` actions.

        This can be a superset of the actual storage devices presented to each
        service unit and can be changed post ceph-osd deployment using
        `juju set`.
For example::
    ceph-osd:
      options:
        osd-devices: /dev/vdb /dev/vdc /dev/vdd /dev/vde
Example utilizing Juju storage::
    ceph-osd:
      storage:
        osd-devices: cinder,20G
Please refer to Juju Storage Documentation for details on support for various storage providers and cloud substrates.
How to deploy::
    juju deploy -n 3 ceph-osd
    juju deploy ceph-mon --to lxd:0
    juju add-unit ceph-mon --to lxd:1
    juju add-unit ceph-mon --to lxd:2
    juju add-relation ceph-osd ceph-mon
Once the 'ceph-mon' charm has bootstrapped the cluster, it will notify the ceph-osd charm which will scan for the configured storage devices and add them to the pool of available storage.
Network Space support
This charm supports the use of Juju Network Spaces, allowing the charm to be bound to network space configurations managed directly by Juju. This is only supported with Juju 2.0 and above.
Network traffic can be bound to specific network spaces using the public (front-side) and cluster (back-side) bindings:
    juju deploy ceph-osd --bind "public=data-space cluster=cluster-space"
Alternatively, these can also be provided as part of a Juju native bundle configuration:
    ceph-osd:
      charm: cs:xenial/ceph-osd
      num_units: 1
      bindings:
        public: data-space
        cluster: cluster-space
Please refer to the Ceph Network Reference for details on how using these options affects network traffic within a Ceph deployment.
NOTE: Spaces must be configured in the underlying provider prior to attempting to use them.
NOTE: Existing deployments using ceph-*-network configuration options will continue to function; these options are preferred over any network space binding provided if set.
AppArmor Profiles
AppArmor is not enforced for Ceph by default. An AppArmor profile can be generated by the charm. However, great care must be taken.
Changing the value of the aa-profile-mode option is disruptive to a running Ceph cluster, as all ceph-osd processes must be restarted as part of changing the AppArmor profile enforcement mode (see the example following the list below).
The generated AppArmor profile currently has a narrow supported use case, and it should always be verified in pre-production against the specific configurations and topologies intended for production.
The AppArmor profile(s) which are generated by the charm should NOT yet be used in the following scenarios:
- When there are separate journal devices.
- On any version of Ceph prior to Luminous.
- On any version of Ubuntu other than 16.04.
- With Bluestore enabled.
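As an illustration only (the option name comes from the charm's configuration; the application name is an assumption), a cautious rollout might start in complain mode before enforcing::

    # Log AppArmor denials without blocking; this restarts ceph-osd processes
    juju config ceph-osd aa-profile-mode=complain

    # Only after verifying against your topology, switch to enforce (also disruptive)
    juju config ceph-osd aa-profile-mode=enforce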
Block Device Encryption
The ceph-osd charm supports encryption of the underlying block devices supporting OSDs.
To use the 'native' key management approach (where dm-crypt keys are stored in the ceph-mon cluster), simply set the 'osd-encrypt' configuration option::
    ceph-osd:
      options:
        osd-encrypt: True
NOTE: This is supported for Ceph Jewel or later.
Alternatively, encryption keys can be stored in Vault; this requires deployment of the vault charm (and associated initialization of vault - see the Vault charm for details) and configuration of the 'osd-encrypt' and 'osd-encrypt-keymanager' options::
    ceph-osd:
      options:
        osd-encrypt: True
        osd-encrypt-keymanager: vault
NOTE: This option is only supported with Ceph Luminous or later.
NOTE: Changing these options post deployment will only take effect for any new block devices added to the ceph-osd application; existing OSD devices will not be encrypted.
Actions
The charm offers actions which may be used to perform operational tasks on individual units.
pause
USE WITH CAUTION - Sets the local osd units in the charm to 'out' but does not stop the osds. Unless the osd cluster is set to noout (see below), this removes them from the ceph cluster and forces ceph to migrate the PGs to other OSDs in the cluster.
From the upstream documentation: "Do not let your cluster reach its full ratio when removing an OSD. Removing OSDs could cause the cluster to reach or exceed its full ratio."
Also note that for small clusters you may encounter the corner case where some PGs remain stuck in the active+remapped state. Refer to the above link on how to resolve this.
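For example, assuming a unit named ceph-osd/0 and the Juju 2.x action CLI (a sketch, not a prescribed maintenance procedure)::

    # Mark this unit's OSDs 'out'; the ceph-osd processes keep running
    juju run-action ceph-osd/0 pause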
pause-health
The pause-health action (on a ceph-mon unit) can be used before pausing a ceph-osd unit to stop the cluster rebalancing the data off this ceph-osd unit. pause-health sets 'noout' on the cluster such that it will not try to rebalance the data across the remaining units.
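For instance, assuming the monitor application is named ceph-mon (an illustrative unit name)::

    # Set 'noout' cluster-wide before pausing an OSD unit
    juju run-action ceph-mon/0 pause-health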
It is up to the user of the charm to determine whether pause-health should be used, as it depends on whether the osd is being paused for maintenance or to remove it from the cluster completely.
NOTE: the pause action does NOT stop the ceph-osd processes.
resume
Sets the local osd units in the charm to 'in'.
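A sketch of bringing the unit back, again assuming ceph-osd/0; the --wait flag (available in recent Juju 2.x releases) blocks until the action completes::

    juju run-action ceph-osd/0 resume --wait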
list-disks
List disks
The 'disks' key is populated with block devices that are known by udev, are not mounted and are not mentioned in the 'osd-journal' configuration option.
The 'blacklist' key is populated with osd-devices in the blacklist stored in the local kv store of this specific unit.
The 'non-pristine' key is populated with block devices that are known by udev, are not mounted, are not mentioned in the 'osd-journal' configuration option and are currently not eligible for use because of the presence of foreign data.
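For example (a sketch assuming a unit named ceph-osd/0)::

    juju run-action ceph-osd/0 list-disks
    # run-action prints an action id; retrieve the 'disks', 'blacklist' and
    # 'non-pristine' keys from the result with:
    juju show-action-output <action-id>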
add-disk
Add disk(s) to Ceph
Parameters
- osd-devices (required): The devices to format and set up as OSD volumes.
- bucket: The name of the bucket in Ceph to add these devices into.
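For example, assuming a spare block device at an illustrative path /dev/vde on unit ceph-osd/0::

    # 'bucket' may additionally be supplied, e.g. bucket=rack1 (illustrative name)
    juju run-action ceph-osd/0 add-disk osd-devices=/dev/vde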
blacklist-add-disk
Add disk(s) to blacklist. Blacklisted disks will not be initialized for use with Ceph even if listed in the application level osd-devices configuration option.
The current blacklist can be viewed with the list-disks action.
NOTE: This action and the blacklist will not have any effect on already-initialized disks.
Parameters
- osd-devices (required): A space-separated list of devices to add to the blacklist. Each element should be an absolute path to a device node or filesystem directory (the latter is supported for ceph >= 0.56.6). Example: '/dev/vdb /var/tmp/test-osd'
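A usage sketch, assuming unit ceph-osd/0 and illustrative paths::

    juju run-action ceph-osd/0 blacklist-add-disk osd-devices='/dev/vdb /var/tmp/test-osd'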
blacklist-remove-disk
Remove disk(s) from blacklist.
Parameters
- osd-devices (required): A space-separated list of devices to remove from the blacklist. Each element should be an existing entry in the unit's blacklist. Use the list-disks action to list current blacklist entries. Example: '/dev/vdb /var/tmp/test-osd'
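Correspondingly, a sketch of removing an entry (illustrative path)::

    juju run-action ceph-osd/0 blacklist-remove-disk osd-devices='/dev/vdb'
    # Confirm with the list-disks action that the blacklist no longer lists it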
zap-disk
Purge disk of all data and signatures for use by Ceph
This action can be necessary in cases where a Ceph cluster is being redeployed as the charm defaults to skipping disks that look like Ceph devices in order to preserve data. In order to forcibly redeploy, the admin is required to perform this action for each disk to be re-consumed.
In addition to triggering this action, it is required to pass an additional parameter option of i-really-mean-it to ensure that the administrator is aware that this will cause data loss on the specified device(s).
Parameters
- devices (required): A space-separated list of devices to remove the partition table from.
- i-really-mean-it (required): This must be toggled to enable actually performing this action.
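For example, assuming /dev/vdb on unit ceph-osd/0 carries data from a previous deployment (illustrative values; this destroys all data on the device)::

    juju run-action ceph-osd/0 zap-disk devices=/dev/vdb i-really-mean-it=true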
Contact Information
Author: James Page <james.page@ubuntu.com>
Report bugs at: http://bugs.launchpad.net/charm-ceph-osd/+filebug
Location: http://jujucharms.com/ceph-osd