Commit 3527bf4ae1

vm.swappiness and vfs_cache_pressure were both set to 1 in the same commit without justification, and both can be harmful when set that low. This commit simply lets the kernel defaults come through. Details on why setting these to 1 is bad, courtesy of Jay Vosburgh:

vfs_cache_pressure

Setting vfs_cache_pressure to 1 in all cases is likely to cause excessive memory usage in the dentry and inode caches for most workloads; for most uses, the default value of 100 is reasonable. The vfs_cache_pressure value specifies the percentage of objects in each of the "dentry" and "inode_entry" slab caches used by filesystems that will be viewed as "freeable" by the slab-shrinking logic. Some other variables also adjust the actual number of objects that the kernel will try to free, but in terms of the freeable quantity, a vfs_cache_pressure of 100 will attempt to free 100 times as many objects in a cache as a setting of 1, and a setting of 200 will attempt to free twice as many as a setting of 100.

This only comes into play when the kernel has entered reclaim, i.e. it is trying to free cached objects in order to make space to satisfy an allocation that would otherwise fail (or an allocation has already failed, or watermarks have been reached and reclaim is occurring asynchronously). With vfs_cache_pressure set to 1, the kernel will disproportionately reclaim pages from the page cache instead of from the dentry/inode caches, and those caches will grow almost without bound (if vfs_cache_pressure is 0, they will literally grow without bound until memory is exhausted). If the system as a whole has a low cache hit ratio on the objects in the dentry and inode caches, they will simply consume memory that is kept idle and force out page cache pages (file data, block data and anonymous pages). Eventually, the system will resort to swapping pages and, if all else fails, to killing processes to free memory. With very low vfs_cache_pressure values, it is more likely that processes will be killed to free memory before dentry/inode cache objects are released. Several customers have alleviated problems by setting this value back to the default, or have had to set it higher to clean things up after running at 1 for so long.

vm.swappiness

Setting this to 1 will heavily favor (ratio 1:199) releasing file-backed pages over writing anonymous pages to swap ("swapping" a file-backed page just frees the page, as it can be re-read from its backing file). This would, for example, favor keeping almost all process anonymous pages (stack, heap, etc.), even for idle processes, in memory over keeping file-backed pages in the page cache.

Change-Id: I94186f3e16f61223e362d3db0ddce799ae6120cb
Closes-Bug: 1770171
Signed-off-by: Bryan Quigley <bryan.quigley@canonical.com>
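For reference, a minimal sketch of checking these tunables on a host and restoring the kernel defaults at runtime (the stock defaults are 100 for vfs_cache_pressure and 60 for swappiness)::

    # Show the values currently in effect
    sysctl vm.vfs_cache_pressure vm.swappiness

    # Restore the kernel defaults at runtime (100 and 60 respectively)
    sysctl -w vm.vfs_cache_pressure=100 vm.swappiness=60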
README.md
Overview
Ceph is a distributed storage and network file system designed to provide excellent performance, reliability, and scalability.
This charm deploys additional Ceph OSD storage service units and should be used in conjunction with the 'ceph-mon' charm to scale out the amount of storage available in a Ceph cluster.
Usage
The charm supports specification of the storage devices to use in the ceph cluster::
    osd-devices:
        A list of devices that the charm will attempt to detect, initialise and
        activate as ceph storage.

        If the charm detects pre-existing data on a device it will go into a
        blocked state and the operator must resolve the situation utilizing the
        `list-disks`, `zap-disk` and/or `blacklist-*` actions.

        This can be a superset of the actual storage devices presented to each
        service unit and can be changed post ceph-osd deployment using
        `juju set`.
For example::
    ceph-osd:
      options:
        osd-devices: /dev/vdb /dev/vdc /dev/vdd /dev/vde
Example utilizing Juju storage::
    ceph-osd:
      storage:
        osd-devices: cinder,20G
Please refer to Juju Storage Documentation for details on support for various storage providers and cloud substrates.
How to deploy::
    juju deploy -n 3 ceph-osd
    juju deploy ceph-mon --to lxd:0
    juju add-unit ceph-mon --to lxd:1
    juju add-unit ceph-mon --to lxd:2
    juju add-relation ceph-osd ceph-mon
Once the 'ceph-mon' charm has bootstrapped the cluster, it will notify the ceph-osd charm which will scan for the configured storage devices and add them to the pool of available storage.
Network Space support
This charm supports the use of Juju Network Spaces, allowing the charm to be bound to network space configurations managed directly by Juju. This is only supported with Juju 2.0 and above.
Network traffic can be bound to specific network spaces using the public (front-side) and cluster (back-side) bindings:
    juju deploy ceph-osd --bind "public=data-space cluster=cluster-space"
Alternatively, these can also be provided as part of a Juju native bundle configuration:
    ceph-osd:
      charm: cs:xenial/ceph-osd
      num_units: 1
      bindings:
        public: data-space
        cluster: cluster-space
Please refer to the Ceph Network Reference for details on how using these options affects network traffic within a Ceph deployment.
NOTE: Spaces must be configured in the underlying provider prior to attempting to use them.
NOTE: Existing deployments using ceph-*-network configuration options will continue to function; these options are preferred over any network space binding provided if set.
AppArmor Profiles
AppArmor is not enforced for Ceph by default. An AppArmor profile can be generated by the charm. However, great care must be taken.
Changing the value of the aa-profile-mode option is disruptive to a running Ceph cluster, as all ceph-osd processes must be restarted as part of changing the AppArmor profile enforcement mode (see the example following the list below).
The generated AppArmor profile currently has a narrow supported use case, and it should always be verified in pre-production against the specific configurations and topologies intended for production.
The AppArmor profile(s) which are generated by the charm should NOT yet be used in the following scenarios:
- When there are separate journal devices.
- On any version of Ceph prior to Luminous.
- On any version of Ubuntu other than 16.04.
- With Bluestore enabled.
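As an illustration only (the option name comes from the charm's configuration; the application name is an assumption), a cautious rollout might start in complain mode before enforcing::

    # Log AppArmor denials without blocking; this restarts ceph-osd processes
    juju config ceph-osd aa-profile-mode=complain

    # Only after verifying against your topology, switch to enforce (also disruptive)
    juju config ceph-osd aa-profile-mode=enforce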
Block Device Encryption
The ceph-osd charm supports encryption of the underlying block devices supporting OSDs.
To use the 'native' key management approach (where dm-crypt keys are stored in the ceph-mon cluster), simply set the 'osd-encrypt' configuration option::
    ceph-osd:
      options:
        osd-encrypt: True
NOTE: This is supported for Ceph Jewel or later.
Alternatively, encryption keys can be stored in Vault; this requires deployment of the vault charm (and associated initialization of vault - see the Vault charm for details) and configuration of the 'osd-encrypt' and 'osd-encrypt-keymanager' options::
    ceph-osd:
      options:
        osd-encrypt: True
        osd-encrypt-keymanager: vault
NOTE: This option is only supported with Ceph Luminous or later.
NOTE: Changing these options post deployment will only take effect for any new block devices added to the ceph-osd application; existing OSD devices will not be encrypted.
Actions
The charm offers actions which may be used to perform operational tasks on individual units.
pause
USE WITH CAUTION - Sets the local osd units in the charm to 'out' but does not stop the osds. Unless the osd cluster is set to noout (see below), this removes them from the ceph cluster and forces ceph to migrate the PGs to other OSDs in the cluster.
From the upstream documentation: "Do not let your cluster reach its full ratio when removing an OSD. Removing OSDs could cause the cluster to reach or exceed its full ratio."
Also note that for small clusters you may encounter the corner case where some PGs remain stuck in the active+remapped state. Refer to the above link on how to resolve this.
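For example, assuming a unit named ceph-osd/0 and the Juju 2.x action CLI (a sketch, not a prescribed maintenance procedure)::

    # Mark this unit's OSDs 'out'; the ceph-osd processes keep running
    juju run-action ceph-osd/0 pause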
pause-health
The pause-health action (on a ceph-mon unit) can be used before pausing a ceph-osd unit to stop the cluster rebalancing the data off this ceph-osd unit. pause-health sets 'noout' on the cluster such that it will not try to rebalance the data across the remaining units.
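For instance, assuming the monitor application is named ceph-mon (an illustrative unit name)::

    # Set 'noout' cluster-wide before pausing an OSD unit
    juju run-action ceph-mon/0 pause-health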
It is up to the user of the charm to determine whether pause-health should be used, as it depends on whether the osd is being paused for maintenance or to remove it from the cluster completely.
NOTE: the pause action does NOT stop the ceph-osd processes.
resume
Sets the local osd units in the charm to 'in'.
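A sketch of bringing the unit back, again assuming ceph-osd/0; the --wait flag (available in recent Juju 2.x releases) blocks until the action completes::

    juju run-action ceph-osd/0 resume --wait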
list-disks
List disks
The 'disks' key is populated with block devices that are known by udev, are not mounted and are not mentioned in the 'osd-journal' configuration option.
The 'blacklist' key is populated with osd-devices in the blacklist stored in the local kv store of this specific unit.
The 'non-pristine' key is populated with block devices that are known by udev, are not mounted, are not mentioned in the 'osd-journal' configuration option and are currently not eligible for use because of the presence of foreign data.
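For example (a sketch assuming a unit named ceph-osd/0)::

    juju run-action ceph-osd/0 list-disks
    # run-action prints an action id; retrieve the 'disks', 'blacklist' and
    # 'non-pristine' keys from the result with:
    juju show-action-output <action-id>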
add-disk
Add disk(s) to Ceph
Parameters
- osd-devices (required): The devices to format and set up as OSD volumes.
- bucket: The name of the bucket in Ceph to add these devices into.
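For example, assuming a spare block device at an illustrative path /dev/vde on unit ceph-osd/0::

    # 'bucket' may additionally be supplied, e.g. bucket=rack1 (illustrative name)
    juju run-action ceph-osd/0 add-disk osd-devices=/dev/vde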
blacklist-add-disk
Add disk(s) to blacklist. Blacklisted disks will not be initialized for use with Ceph even if listed in the application level osd-devices configuration option.
The current blacklist can be viewed with the list-disks action.
NOTE: This action and the blacklist will not have any effect on already-initialized disks.
Parameters
- osd-devices (required): A space-separated list of devices to add to the blacklist. Each element should be an absolute path to a device node or filesystem directory (the latter is supported for ceph >= 0.56.6). Example: '/dev/vdb /var/tmp/test-osd'
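A usage sketch, assuming unit ceph-osd/0 and illustrative paths::

    juju run-action ceph-osd/0 blacklist-add-disk osd-devices='/dev/vdb /var/tmp/test-osd'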
blacklist-remove-disk
Remove disk(s) from blacklist.
Parameters
- osd-devices (required): A space-separated list of devices to remove from the blacklist. Each element should be an existing entry in the unit's blacklist. Use the list-disks action to list current blacklist entries. Example: '/dev/vdb /var/tmp/test-osd'
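Correspondingly, a sketch of removing an entry (illustrative path)::

    juju run-action ceph-osd/0 blacklist-remove-disk osd-devices='/dev/vdb'
    # Confirm with the list-disks action that the blacklist no longer lists it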
zap-disk
Purge disk of all data and signatures for use by Ceph
This action can be necessary in cases where a Ceph cluster is being redeployed as the charm defaults to skipping disks that look like Ceph devices in order to preserve data. In order to forcibly redeploy, the admin is required to perform this action for each disk to be re-consumed.
In addition to triggering this action, it is required to pass an additional parameter option of i-really-mean-it to ensure that the administrator is aware that this will cause data loss on the specified device(s).
Parameters
- devices (required): A space-separated list of devices to remove the partition table from.
- i-really-mean-it (required): This must be toggled to enable actually performing this action.
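For example, assuming /dev/vdb on unit ceph-osd/0 carries data from a previous deployment (illustrative values; this destroys all data on the device)::

    juju run-action ceph-osd/0 zap-disk devices=/dev/vdb i-really-mean-it=true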
Contact Information
Author: James Page <james.page@ubuntu.com>
Report bugs at: http://bugs.launchpad.net/charm-ceph-osd/+filebug
Location: http://jujucharms.com/ceph-osd