Document the best practices, considerations, and recommendation for backups (r8,dsR8)
Update Back Up System Data section. Applied editorial fixes. Change-Id: I72dc57a185ef40f9ca98ffa5fbd841d3ecdffa49 Signed-off-by: Elisamara Aoki Goncalves <elisamaraaoki.goncalves@windriver.com>
		@@ -6,108 +6,207 @@
 | 
			
		||||
Back Up System Data
 | 
			
		||||
===================
 | 
			
		||||
 | 
			
		||||
A system data backup of a |prod-long| system captures core system
 | 
			
		||||
information needed to restore a fully operational |prod-long| cluster.
 | 
			
		||||
A system data backup of a |prod-long| system captures core system information
 | 
			
		||||
needed to restore a fully operational |prod-long| cluster.
 | 
			
		||||
 | 
			
		||||
.. contents:: In this section:
 | 
			
		||||
.. contents:: |minitoc|
 | 
			
		||||
   :local:
 | 
			
		||||
   :depth: 1
 | 
			
		||||
 | 
			
		||||
.. _backing-up-starlingx-system-data-section-N1002E-N1002B-N10001:
 | 
			
		||||
 | 
			
		||||
System Data Backups include:
 | 
			
		||||
 | 
			
		||||
.. _backing-up-starlingx-system-data-ul-enh-3dl-lp:
 | 
			
		||||
 | 
			
		||||
-   platform configuration details
 | 
			
		||||
 | 
			
		||||
-   system databases
 | 
			
		||||
 | 
			
		||||
-   patching and package repositories
 | 
			
		||||
 | 
			
		||||
-   home directory for the **sysadmin** user and all |LDAP| user accounts.
 | 
			
		||||
 | 
			
		||||
.. warning::
 | 
			
		||||
 | 
			
		||||
    During a system backup, if the files contained in 'sysadmin' user's home
 | 
			
		||||
    directory (``/home/sysadmin``) result in the overall size of the backup
 | 
			
		||||
    being larger than 2 Gbytes, the backup operation may fail.
 | 
			
		||||
 | 
			
		||||
.. xreflink See |sec-doc|: :ref:`Local LDAP Linux User Accounts
 | 
			
		||||
    <local-ldap-linux-user-accounts>` for additional information.
 | 
			
		||||
 | 
			
		||||
    .. note::
 | 
			
		||||
        If there is any change in hardware configuration, for example, new
 | 
			
		||||
        NICs, a system backup is required to ensure that there is no
 | 
			
		||||
        configuration mismatch after system restore.
 | 
			
		||||
 | 
			
		||||
.. _backing-up-starlingx-system-data-section-N10089-N1002B-N10001:
 | 
			
		||||
 | 
			
		||||
------------------------------------
 | 
			
		||||
Detailed contents of a system backup
 | 
			
		||||
------------------------------------
 | 
			
		||||
 | 
			
		||||
The backup contains details as listed below:
 | 
			
		||||
Contents of System Backup
 | 
			
		||||
-------------------------
 | 
			
		||||
 | 
			
		||||
.. _backing-up-starlingx-system-data-ul-s3t-bz4-kjb:
 | 
			
		||||
 | 
			
		||||
-   Platform Configuration Data.
 | 
			
		||||
The following content is included in the backup:
 | 
			
		||||
 | 
			
		||||
    All platform configuration data and files required to fully restore the
 | 
			
		||||
    system to a working state following the platform restore procedure.
 | 
			
		||||
- All platform configuration data required to fully restore the system to a
 | 
			
		||||
  working state following the platform restore procedure.
 | 
			
		||||
 | 
			
		||||
-   (Optional) Any end user container images in **registry.local**; that
 | 
			
		||||
    is, any images other than |org| system and application images.
 | 
			
		||||
    |prod| system and application images are repulled from their
 | 
			
		||||
    original source, external registries during the restore procedure.
 | 
			
		||||
  - Platform and Kubernetes databases.
 | 
			
		||||
 | 
			
		||||
-   Home directory 'sysadmin' user, and all |LDAP| user accounts
 | 
			
		||||
    (item=/etc)
 | 
			
		||||
  - Platform configuration files.
 | 
			
		||||
 | 
			
		||||
-   Patching and package repositories:
 | 
			
		||||
  - Platform certificates and keys.
 | 
			
		||||
 | 
			
		||||
    -   item=/opt/patching
 | 
			
		||||
- Home directory for the sysadmin user and all |LDAP| user accounts.
 | 
			
		||||
 | 
			
		||||
    -   item=/var/www/pages/updates
 | 
			
		||||
- (Optional) End-user container images in ``registry.local``; that is, any
  images other than |org| system and application images. |prod| system and
  application images are re-pulled from their original source, the external
  registries, during the restore procedure.
 | 
			
		||||
 | 
			
		||||
- Distributed Cloud Vault (Central System Controller only).
 | 
			
		||||
 | 
			
		||||
.. _backing-up-starlingx-system-data-section-N1021A-N1002B-N10001:
 | 
			
		||||
The following content is excluded from the backup:
 | 
			
		||||
 | 
			
		||||
-----------------------------------
 | 
			
		||||
Data not included in system backups
 | 
			
		||||
-----------------------------------
 | 
			
		||||
- Application |PVC| data on Ceph clusters.
 | 
			
		||||
 | 
			
		||||
.. _backing-up-starlingx-system-data-ul-im2-b2y-lp:
 | 
			
		||||
- Modifications manually made to the file systems, such as configuration
 | 
			
		||||
  changes on the ``/etc`` directory. After a restore operation has been
 | 
			
		||||
  completed, these modifications must be reapplied.
 | 
			
		||||
 | 
			
		||||
-   Application |PVCs| on Ceph clusters.
 | 
			
		||||
- Home directories and passwords of local user accounts. They must be backed up
 | 
			
		||||
  manually by the sysadmin.
 | 
			
		||||
 | 
			
		||||
-   StarlingX application data. Use the command :command:`system
 | 
			
		||||
    application-list` to display a list of installed applications.
 | 
			
		||||
 | 
			
		||||
-   Modifications manually made to the file systems, such as configuration
 | 
			
		||||
    changes on the /etc directory. After a restore operation has been completed,
 | 
			
		||||
    these modifications have to be reapplied.
 | 
			
		||||
 | 
			
		||||
-   Home directories and passwords of local user accounts. They must be
 | 
			
		||||
    backed up manually by the system administrator.
 | 
			
		||||
 | 
			
		||||
-   The /root directory. Use the **sysadmin** account instead when root
 | 
			
		||||
    access is needed.
 | 
			
		||||
- The ``/root`` directory. Use the sysadmin account instead when root access is
 | 
			
		||||
  needed.
 | 
			
		||||
 | 
			
		||||
.. note::
 | 
			
		||||
    The system data backup can only be used to restore the cluster from
 | 
			
		||||
    which the backup was made. You cannot use the system data backup to
 | 
			
		||||
    restore the system to different hardware. Perform a system data backup
 | 
			
		||||
    for each cluster and label the backup accordingly.
 | 
			
		||||
 | 
			
		||||
    To ensure recovery from the backup file during a restore procedure,
 | 
			
		||||
    containers must be in the active state when performing the backup.
 | 
			
		||||
    Containers that are in a shutdown or paused state at the time of the
 | 
			
		||||
    backup will not be recovered after a subsequent restore procedure.
 | 
			
		||||
    Ceph data may be retained when restoring to the same servers and cluster.
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
System Backup Size
 | 
			
		||||
------------------
 | 
			
		||||
 | 
			
		||||
Consider the following for backup size:
 | 
			
		||||
 | 
			
		||||
- The base size of a platform system backup ranges from 10MB to 30MB,
 | 
			
		||||
  depending on the size of the system and deployment. |AIO-SX| systems are
 | 
			
		||||
  typically 20MB or less.
 | 
			
		||||
 | 
			
		||||
- Backing up user home directories can make the backup archive very large. The
  archive is limited to 2GB or less.
 | 
			
		||||
 | 
			
		||||
- Total backup size should be below 100MB when using centralized backup and
 | 
			
		||||
  restore operations.
 | 
			
		||||
 | 
			
		||||
- Container images are large and will only be backed up locally to avoid large
 | 
			
		||||
  image archives being transferred for each system. Container images that are
 | 
			
		||||
  not present on the system may be pulled as part of platform and application
 | 
			
		||||
  deployment, or restored separately to the local registry
 | 
			
		||||
  (``registry.local``).
 | 
			
		||||
 | 
			
		||||
- There can also be a significant size impact when patching is included in the
 | 
			
		||||
  backup.
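
As a rough pre-check for the 2GB limit mentioned above, the size of a home
directory can be estimated before starting a backup. The following is a minimal
sketch, not part of the platform tooling: the function names are hypothetical,
the limit is assumed to mean 2 GiB, and the result is the uncompressed size on
disk, which may differ from the size of the final archive.

```python
import os

# Assumption: the 2GB guidance in this section is interpreted as 2 GiB.
HOME_BACKUP_LIMIT_BYTES = 2 * 1024**3


def directory_size_bytes(root):
    """Return the total size of regular files under root (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable; leave it out of the total.
                continue
    return total


def fits_backup_limit(size_bytes, limit_bytes=HOME_BACKUP_LIMIT_BYTES):
    """Return True when size_bytes does not exceed limit_bytes."""
    return size_bytes <= limit_bytes
```

For example, ``fits_backup_limit(directory_size_bytes("/home/sysadmin"))``
gives a quick indication of whether the sysadmin home directory is likely to
push the backup past the limit.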
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
System Backup Filesystem Usage
 | 
			
		||||
------------------------------
 | 
			
		||||
 | 
			
		||||
The following filesystems are used during the backup operations of the system
 | 
			
		||||
for both local and centralized backup.
 | 
			
		||||
 | 
			
		||||
**Staging Storage**
 | 
			
		||||
 | 
			
		||||
The host filesystem used to stage temporary files during backup operations. The
 | 
			
		||||
filesystem may also be used to store final backup images if the filesystem is
 | 
			
		||||
sufficiently sized to store the backup archives.
 | 
			
		||||
 | 
			
		||||
Host filesystem name: backup
 | 
			
		||||
 | 
			
		||||
System path: ``/opt/backups``
 | 
			
		||||
 | 
			
		||||
Default size: 25GB
 | 
			
		||||
 | 
			
		||||
For more information on how to modify the host filesystem sizes, see
 | 
			
		||||
:ref:`Resize Filesystems on a Host <resizing-filesystems-on-a-host>`.
 | 
			
		||||
 | 
			
		||||
**Local Storage**
 | 
			
		||||
 | 
			
		||||
The host filesystem used to store backup files in a protected partition which
 | 
			
		||||
does not get wiped during system reinstallation. The protected local backup
 | 
			
		||||
partition is typically used by |AIO-SX| systems where there is no redundant
 | 
			
		||||
filesystem storage and is the default for local backups.
 | 
			
		||||
 | 
			
		||||
.. note::
 | 
			
		||||
 | 
			
		||||
    The filesystem is shared with system release pre-staging and needs to be
 | 
			
		||||
    sized for both pre-staging installation media and backup archives.
 | 
			
		||||
 | 
			
		||||
System Path: ``/opt/platform-backup/backups``
 | 
			
		||||
 | 
			
		||||
Default Size: 30GB
 | 
			
		||||
 | 
			
		||||
**Centralized Storage**
 | 
			
		||||
 | 
			
		||||
The Distributed Cloud (DC) Vault filesystem is used to store backup archives
 | 
			
		||||
when using centralized backup and restore. The filesystem size must be
 | 
			
		||||
increased to accommodate subcloud backup archive storage. A separate backup
 | 
			
		||||
archive is stored per subcloud and release, so the filesystem must be sized to
 | 
			
		||||
accommodate all backups.
 | 
			
		||||
 | 
			
		||||
System path: ``/opt/dc-vault/backups/<subcloud-name>/<release-version>``
 | 
			
		||||
 | 
			
		||||
Default size: 15GB
 | 
			
		||||
 | 
			
		||||
.. note::
 | 
			
		||||
 | 
			
		||||
    The filesystem is shared for |DC| subcloud deployment and management and
 | 
			
		||||
    must be sized to store subcloud deployment files (subcloud configuration,
 | 
			
		||||
    ISO images and subcloud staging files).
 | 
			
		||||
 | 
			
		||||
For more information on how to modify the controller filesystem sizes, see
 | 
			
		||||
:ref:`Storage on Controller Hosts
 | 
			
		||||
<controller-hosts-storage-on-controller-hosts>`.
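
Before a backup, it can help to confirm that the target filesystem has enough
free space. The following is a minimal sketch using only the Python standard
library. The paths are taken from this section, the helper names are
hypothetical, and a path that does not exist on a given host is reported as
unavailable rather than raising an error.

```python
import os
import shutil

# Paths as listed in this section; not every host has all of them.
BACKUP_PATHS = {
    "staging": "/opt/backups",
    "local": "/opt/platform-backup/backups",
    "centralized": "/opt/dc-vault/backups",
}


def free_bytes(path):
    """Return free bytes on the filesystem holding path, or None if path is missing."""
    if not os.path.exists(path):
        return None
    return shutil.disk_usage(path).free


def can_hold(path, required_bytes):
    """Return True when path exists and has at least required_bytes free."""
    free = free_bytes(path)
    return free is not None and free >= required_bytes
```

For example, ``can_hold(BACKUP_PATHS["staging"], 100 * 1024**2)`` checks
whether the staging filesystem has room for a 100 MiB archive.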
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
Distributed Cloud Centralized Backups
 | 
			
		||||
-------------------------------------
 | 
			
		||||
 | 
			
		||||
A subcloud's system data and optionally container images (from
 | 
			
		||||
``registry.local``) can be backed up using the DCManager command line
 | 
			
		||||
interface. The subcloud's system backup data can either be stored locally on
 | 
			
		||||
the subcloud or on the System Controller. The subcloud's container image
 | 
			
		||||
backup (from ``registry.local``) can only be stored locally on the subcloud to
 | 
			
		||||
avoid overloading the central storage and the network with large amounts of data
 | 
			
		||||
transfer and redundant storage of images in a central location.
 | 
			
		||||
 | 
			
		||||
.. image:: figures/system-controller-backup-and-restore.png
 | 
			
		||||
    :width: 800
 | 
			
		||||
 | 
			
		||||
For more information on the |CLI| operation of the centralized backup
 | 
			
		||||
capability see :ref:`Backup a Subcloud/Group of Subclouds using DCManager CLI
 | 
			
		||||
<backup-a-subcloud-group-of-subclouds-using-dcmanager-cli-f12020a8fc42>`.
 | 
			
		||||
 | 
			
		||||
For more information on the DCManager Subcloud Backup API, see `Subcloud
 | 
			
		||||
Backups
 | 
			
		||||
<https://docs.starlingx.io/api-ref/distcloud/api-ref-dcmanager-v1.html#subcloud-backups>`__.
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
Execution Time for System Backups
 | 
			
		||||
---------------------------------
 | 
			
		||||
 | 
			
		||||
- The time to execute system backups is approximately 3-4 minutes for an idle
 | 
			
		||||
  system.
 | 
			
		||||
 | 
			
		||||
- Centralized backups may require additional time for network transfer for
 | 
			
		||||
  larger backups.
 | 
			
		||||
 | 
			
		||||
- Subcloud backups may be initiated and monitored from the DCManager |CLI| or
 | 
			
		||||
  API, including parallel backups.
 | 
			
		||||
 | 
			
		||||
- A minor alarm (210.001) "System Backup in progress" is raised while backing
 | 
			
		||||
  up an individual system.
 | 
			
		||||
 | 
			
		||||
- Systems with at least 4 platform cores will have much faster execution times.
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
Recommended Backup and Retention Policies
 | 
			
		||||
-----------------------------------------
 | 
			
		||||
 | 
			
		||||
- All backups should be performed remotely and stored off the system.
 | 
			
		||||
 | 
			
		||||
- All backups should be done during off-peak hours (that is, in a maintenance window).
 | 
			
		||||
 | 
			
		||||
  - Weekly backups should be performed under normal steady state conditions to
 | 
			
		||||
    ensure the system can be restored to a fully operational state.
 | 
			
		||||
 | 
			
		||||
  - Nightly backups are the exception and should only be performed in periods
 | 
			
		||||
    of significant reconfiguration to the system such as during large/mass
 | 
			
		||||
    rollout (addition of subclouds), upgrade cycle of multiple sites, or
 | 
			
		||||
    disaster recovery rehoming of subclouds.
 | 
			
		||||
 | 
			
		||||
- Backups should be performed prior to performing maintenance operations or
 | 
			
		||||
  applying configuration changes to the platform or hosted applications.
 | 
			
		||||
 | 
			
		||||
- The retention period of backups should be approximately one month.
 | 
			
		||||
 | 
			
		||||
  - Since Kubernetes is an intent-based system, the most recent backup is the
 | 
			
		||||
    most important.
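
The retention guidance above can be sketched as a small selection routine. This
is an illustration only, under stated assumptions: the function name, the
``(name, created)`` tuple layout, and the 30-day default (standing in for
"approximately one month") are hypothetical and not part of the platform
tooling. The most recent backup is never selected for removal, regardless of
its age.

```python
from datetime import datetime, timedelta


def select_expired(backups, now, retention_days=30):
    """Return names of backups older than the retention period.

    backups is a list of (name, created) tuples, where created is a datetime.
    The most recent backup is always kept, even if it is past the cutoff.
    """
    if not backups:
        return []
    newest = max(backups, key=lambda item: item[1])
    cutoff = now - timedelta(days=retention_days)
    return sorted(
        name
        for name, created in backups
        if created < cutoff and (name, created) != newest
    )
```

The routine only reports candidates. Deleting archives is left to the operator,
so that copies held in other locations can be checked first.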
 | 
			
		||||
 | 
			
		||||
When the system data backup is complete, the backup file must be kept in a
secure location, preferably with multiple copies for redundancy.
 | 
			
		||||
 | 
			
		||||
.. seealso::
 | 
			
		||||
   :ref:`Run Ansible Backup Playbook Locally on the Controller
 | 
			
		||||
 
 | 
			
		||||
										
											Binary file not shown.
										
									
								
							| 
		 After Width: | Height: | Size: 103 KiB  | 
@@ -70,7 +70,7 @@ End user container images in ``registry.local`` will be backed up during the
 | 
			
		||||
upgrade process. This only includes images other than |prod| system and
 | 
			
		||||
application images. These images are limited to 5 GB in total size. If the
 | 
			
		||||
system contains more than 5 GB of these images, the upgrade start will fail.
 | 
			
		||||
For more details, see :ref:`Detailed contents of a system backup
 | 
			
		||||
For more details, see :ref:`Contents of System Backup
 | 
			
		||||
<backing-up-starlingx-system-data-ul-s3t-bz4-kjb>`.
 | 
			
		||||
 | 
			
		||||
.. rubric:: |proc|
 | 
			
		||||
 
 | 
			
		||||