kerberos-kdc: role to manage Kerberos KDC servers

This adds a role and related testing to manage our Kerberos KDC
servers, intended to replace the puppet modules currently performing
this task.

This role automates realm creation, initial setup, key material
distribution and replica host configuration.  None of this is intended
to run on the production servers, which are already set up with an
active database, and the role should be effectively idempotent in
production.

Note that this does not yet switch the production servers into the new
groups; this can be done in a separate step under controlled
conditions and with related upgrades of the host OS to Focal.

Change-Id: I60b40897486b29beafc76025790c501b5055313d
Ian Wienand 2021-03-05 16:10:01 +11:00
parent 6df7767200
commit c1aff2ed38
19 changed files with 443 additions and 33 deletions

@ -15,9 +15,8 @@ At a Glance
:Hosts:
* kdc*.openstack.org
:Puppet:
* https://opendev.org/opendev/puppet-kerberos
* :git_file:`modules/openstack_project/manifests/kdc.pp`
:Ansible:
* :git_file:`playbooks/service-kerberos.yaml`
:Projects:
* http://web.mit.edu/kerberos
:Bugs:
@ -30,50 +29,59 @@ At a Glance
OpenStack Realm
---------------
OpenStack runs a Kerberos ``Realm`` called ``OPENSTACK.ORG``.
The realm contains a ``Key Distribution Center`` or KDC which is spread
across a master and a slave, as well as an admin server which only runs on the
master. Most of the configuration is in puppet, but initial setup and
the management of user accounts, known as ``principals``, are manual tasks.
OpenStack runs a Kerberos ``Realm`` called ``OPENSTACK.ORG``. The
realm contains a ``Key Distribution Center`` or KDC which is spread
across a primary and a replica, as well as an admin server which only
runs on the primary.
Most of the configuration is in Ansible, but management of user
accounts, known as ``principals``, is a manual task for
administrators.
Realm Creation
--------------
On the first KDC host, the admin needs to run `krb5_newrealm` by hand. Then
admin principals and host principles need to be set up.
Realm creation is exercised by the Ansible roles during testing, but
is not expected to be used in production (because we have an active
realm/database).
Set up host principals for slave propagation::
The general process is:
# execute kadmin.local then run these commands
addprinc -randkey host/kdc03.openstack.org
addprinc -randkey host/kdc04.openstack.org
ktadd host/kdc03.openstack.org
ktadd host/kdc04.openstack.org
* create the new Kerberos database on the primary
* distribute the database ``stash`` file from the primary to
replicas, to allow them to decrypt the database propagated to
them. The stash file is created from a master key kept as a secret.
* create an admin user (password saved in a file on the primary server)
* add host principals for the primary and replica servers
* create keytabs on primary and replica servers (via the admin user),
which allows them to authenticate to each other.
* set up database propagation from primary to replicas with ``kprop``
(primary-side push) and ``kpropd`` (replica-side listen).
Copy the file `/etc/krb5.keytab` to the second kdc host.
The puppet config sets up slave propagation scripts and cron jobs to run them.
You will also need to create a stash file after creating a new realm. Run
`krb5_util stash` on the first kdc host. Copy the file `/etc/krb5kdc/stash`
to all other KDC servers for the krb5-kdc daemons to run.
In a disaster recovery situation, we can provision a fresh realm and
recover principals from dump files (XXX: 2020-03-11 ianw -- dump file
backup to come).
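The general process listed in this section could be sketched roughly as
the following dry run, which only prints the commands for each step
rather than executing them. The function name ``print_realm_plan``, the
realm and the host names are hypothetical placeholders, and the
``-q`` query form of ``kadmin.local`` is used here for brevity; the
Ansible role itself pipes commands into ``kadmin.local`` instead.

```shell
#!/bin/sh
# Dry-run sketch: print (do not run) the commands for a fresh realm.
# Usage: print_realm_plan REALM PRIMARY [REPLICA...]
print_realm_plan() {
    realm=$1
    primary=$2
    shift 2   # the remaining arguments are replica host names

    # Create the database on the primary; -s also writes the stash file.
    echo "kdb5_util create -r $realm -s"

    # Create an admin principal (kadmin.local prompts for its password).
    echo "kadmin.local -q 'addprinc admin/admin@$realm'"

    # Add host principals for the primary and every replica.
    for host in "$primary" "$@"; do
        echo "kadmin.local -q 'addprinc -randkey host/$host'"
    done

    # Write the primary's host keytab. Replicas write their own keytab
    # later by authenticating to kadmin as the admin user.
    echo "kadmin.local -q 'ktadd host/$primary'"

    # Dump the database and push it to each replica, where kpropd listens.
    # The stash file must already have been copied to the replicas.
    echo "kdb5_util dump /var/krb5kdc/slave_datatrans"
    for host in "$@"; do
        echo "kprop -f /var/krb5kdc/slave_datatrans $host"
    done
}

print_realm_plan EXAMPLE.TEST kdc-primary.example.test kdc-replica.example.test
```

Printing the plan instead of running it keeps the sketch safe to try on
any machine; none of these commands should be run against the
production realm.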
.. _addprinc:
Adding A User Principal
-----------------------
First, ensure the user has an entry in puppet so they have a unix
First, ensure the user has an entry in Ansible so they have a Unix
shell account on our hosts. SSH access is not necessary, but keeping
track of usernames and uids with account entries is necessary.
Then, add the user to Kerberos using kadmin (while authenticated as a
kerberos admin) or kadmin.local on the kdc::
If you are already an admin, you should authenticate with ``kinit
<username>/admin``. Otherwise you can use the ``kadmin.local`` tool
(instead of ``kadmin``) on the primary server, which bypasses
authentication and writes to the database directly.
Use ``kadmin`` to add the principal like so::
kadmin: addprinc $USERNAME@OPENSTACK.ORG
Where `$USERNAME` is the lower-case username of their unix account in
puppet. `OPENSTACK.ORG` should be capitalized.
Ansible. `OPENSTACK.ORG` should be capitalized.
If you are adding an admin principal, use
`username/admin@OPENSTACK.ORG`. Admins should additionally have
@ -87,11 +95,11 @@ than a person. There is no difference in their implementation, only
in conventions around how they are created and used. Service
principals are created without passwords and keytab files are used
instead for authentication. The program `k5start` can use keytab
files to automatically obtain kerberos credentials (and AFS if
files to automatically obtain Kerberos credentials (and AFS if
needed).
Add the service principal to Kerberos using kadmin (while
authenticated as a kerberos admin) or kadmin.local on the kdc::
authenticated as a Kerberos admin) or kadmin.local on the kdc::
kadmin: addprinc -randkey service/$NAME@OPENSTACK.ORG
@ -105,6 +113,10 @@ Then save the principal's keytab::
.. warning:: Each time ``ktadd`` is run, the key is rotated and
previous keytabs are invalidated.
These keytabs are then usually base64-encoded, stored as
secret variables, and deployed to hosts via Ansible.
``mirror-update`` is probably a good example.
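As a minimal sketch of that conversion, the following shows one way a
keytab could be encoded to a single-line string suitable for a secret
variable and decoded again on the target host. The helper names
``encode_keytab`` and ``decode_keytab`` are hypothetical, and a
throwaway file stands in for a real keytab.

```shell
#!/bin/sh
# Encode a keytab as one line of base64, suitable for a YAML variable.
encode_keytab() {
    base64 < "$1" | tr -d '\n'
}

# Decode a base64 string back into a keytab readable only by its owner.
decode_keytab() {
    ( umask 077; printf '%s' "$1" | base64 -d > "$2" )
}

# Demonstration with a placeholder file instead of a real keytab.
tmpdir=$(mktemp -d)
printf 'not-a-real-keytab' > "$tmpdir/service.keytab"
encoded=$(encode_keytab "$tmpdir/service.keytab")
decode_keytab "$encoded" "$tmpdir/restored.keytab"
cmp "$tmpdir/service.keytab" "$tmpdir/restored.keytab" && echo "round trip ok"
rm -rf "$tmpdir"
```

Remember that re-running ``ktadd`` to regenerate the keytab rotates the
key, so any stored copy would need to be re-encoded afterwards.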
Resetting A User Principal's Password
-------------------------------------
@ -117,12 +129,12 @@ twice as prompted. If you need to reset your admin principal, use
No Service Outage Server Maintenance
------------------------------------
Should you need perform maintenance on the kerberos server that requires
taking kerberos processes offline you can do this by performing your
Should you need to perform maintenance on the Kerberos server that requires
taking Kerberos processes offline, you can do this by performing your
updates on a single server at a time.
`kdc03.openstack.org` is our primary server and `kdc04.openstack.org`
is the hot standby. Perform your maintenance on `kdc04.openstack.org`
is the replica. Perform your maintenance on `kdc04.openstack.org`
first. Then once that is done we can prepare for taking down the
primary. On `kdc03.openstack.org` run::
@ -132,7 +144,7 @@ You should see::
Database propagation to kdc04.openstack.org: SUCCEEDED
Once this is done the standby server is ready and we can take kdc03
Once this is done the replica is ready and we can take kdc03
offline. When kdc03 is back online rerun `run-kprop.sh` to ensure
everything is working again.
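Before taking the primary offline, the propagation check described
above could be scripted. This is only a sketch: the function name
``propagation_succeeded`` is hypothetical, and it assumes the
propagation report contains exactly the ``SUCCEEDED`` line shown
earlier. Canned output is used here in place of a real run.

```shell
#!/bin/sh
# Succeed only if the report on stdin contains a SUCCEEDED line for the
# given replica. -F: fixed string, -x: whole line, -q: no output.
propagation_succeeded() {
    grep -Fqx "Database propagation to $1: SUCCEEDED"
}

# Canned report standing in for real propagation output.
sample='Database propagation to kdc04.openstack.org: SUCCEEDED'
if printf '%s\n' "$sample" | propagation_succeeded kdc04.openstack.org; then
    echo "replica is up to date"
else
    echo "propagation not confirmed; keep the primary online" >&2
fi
```

On the primary this would presumably be fed from the real script, for
example ``run-kprop.sh 2>&1 | propagation_succeeded kdc04.openstack.org``.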

@ -0,0 +1,9 @@
iptables_extra_public_tcp_ports:
- 88
- 464
- 749
- 754
iptables_extra_public_udp_ports:
- 88
- 464
- 749

@ -0,0 +1,27 @@
Configure a Kerberos KDC server
All KDC servers (primary and replicas) should be in a common
``kerberos-kdc`` group that defines ``kerberos_kdc_realm`` and
``kerberos_kdc_master_key``.
The ``kerberos-kdc-primary`` group should have a single primary KDC
host. It will be configured to replicate its database to hosts in
the ``kerberos-kdc-replica`` group.
Hosts in the ``kerberos-kdc-replica`` group will be configured to
receive updates from the ``kerberos-kdc-primary`` host.
The role should be run twice: once limited to the primary group and
then a second time limited to the replica group.
**Role Variables**
.. zuul:rolevar:: kerberos_kdc_realm
The realm for all KDC servers.
.. zuul:rolevar:: kerberos_kdc_master_key
The master key written into the *stash* file for each KDC, which
allows them to decrypt the database.

@ -0,0 +1,6 @@
# This file is the access control list for krb5 administration.
# When this file is edited run /etc/init.d/krb5-admin-server restart to activate
# One common way to set up Kerberos administration is to give any principal
# ending in /admin full administrative rights.
# The following line enables this:
*/admin *

@ -0,0 +1,14 @@
[Unit]
Description=Kerberos 5 replica KDC update server
[Service]
ExecReload=/bin/kill -HUP $MAINPID
EnvironmentFile=-/etc/default/krb5-kpropd
ExecStart=/usr/sbin/kpropd -D $DAEMON_ARGS
InaccessibleDirectories=/etc/ssh /etc/ssl/private /root
ReadOnlyDirectories=/
ReadWriteDirectories=/var/tmp /tmp /var/lib/krb5kdc /var/run /run
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
[Install]
WantedBy=multi-user.target

@ -0,0 +1,32 @@
- name: Install packages
package:
name:
- krb5-kdc
state: present
- name: Ensure directories
file:
path: '{{ item }}'
state: directory
mode: 0755
owner: root
group: root
loop:
- /etc/krb5kdc
- /var/krb5kdc
- name: Install KDC config
template:
src: 'kdc.conf.j2'
dest: '/etc/krb5kdc/kdc.conf'
mode: 0644
owner: root
group: root
- name: Copy kadm5.acl
copy:
src: kadm5.acl
dest: '/etc/krb5kdc/kadm5.acl'
mode: 0644
owner: root
group: root

@ -0,0 +1,94 @@
- name: Install packages
package:
name:
- krb5-admin-server
state: present
# Note the following is not really for production, where we already
# have a database set up. It is exercised by testing, however.
- name: Look for primary database
stat:
path: /var/lib/krb5kdc/principal
register: _db_created
- name: Setup clean primary
when: not _db_created.stat.exists
block:
- name: Setup primary db
shell: |
yes {{ kerberos_kdc_master_key }} | kdb5_util create -r {{ kerberos_kdc_realm }} -s
- name: Generate and save admin principal password
copy:
dest: '/etc/krb5kdc/admin.passwd'
content: '{{ lookup("password", "/dev/null chars=ascii_letters,digits length=12") }}'
owner: root
group: root
mode: '0600'
- name: Setup initial admin principal
shell: |
echo "addprinc -pw $(cat /etc/krb5kdc/admin.passwd) admin/admin@{{ kerberos_kdc_realm }}" | kadmin.local
# https://web.mit.edu/kerberos/krb5-latest/doc/admin/install_kdc.html
# It is not strictly necessary to have the primary KDC server in
# the Kerberos database, but it can be handy if you want to be
# able to swap the primary KDC with one of the replicas.
- name: Create primary host principal and keytab
shell:
cmd: |
echo "addprinc -randkey host/{{ inventory_hostname }}" | kadmin.local
echo "ktadd host/{{ inventory_hostname }}" | kadmin.local
- name: Create replica host principals
shell:
cmd: 'echo "addprinc -randkey host/{{ item }}" | kadmin.local'
with_inventory_hostnames: kerberos-kdc-replica
# The stash file is used to decrypt the on-disk database. Without
# this you are prompted for the master password on daemon start. This
# needs to be distributed to the replicas so they can also open the
# database.
- name: Read and save stash file
slurp:
src: '/etc/krb5kdc/stash'
register: kerberos_kdc_stash_file_contents
# Export this so replica servers can use this variable to authenticate
# and create keytabs for their host principals, if they need to.
- name: Read in admin/admin password
slurp:
src: "/etc/krb5kdc/admin.passwd"
register: _admin_password
- name: Export admin password
set_fact:
kerberos_kdc_admin_password: '{{ _admin_password.content | b64decode }}'
# kprop is what pushes the db to replicas. Set it up to run via cron
# periodically.
- name: Install kprop script
template:
src: 'run-kprop.sh.j2'
dest: '/usr/local/bin/run-kprop.sh'
mode: 0755
owner: root
group: root
- name: kprop cron to push db to replicas
cron:
name: kprop
minute: 15
job: '/usr/local/bin/run-kprop.sh >/dev/null 2>&1'
- name: start krb5-admin-server
systemd:
state: started
enabled: yes
name: krb5-admin-server
- name: start krb5-kdc
systemd:
state: started
enabled: yes
name: krb5-kdc

@ -0,0 +1,64 @@
- name: Install packages
package:
name:
- krb5-kdc
- krb5-kpropd
state: present
# This is the key to decrypt the database pushed by the primary
- name: Install stash file from primary
shell:
cmd: 'echo "{{ hostvars[groups["kerberos-kdc-primary"][0]]["kerberos_kdc_stash_file_contents"].content }}" | base64 -d > /etc/krb5kdc/stash'
creates: '/etc/krb5kdc/stash'
- name: Ensure stash file permissions
file:
path: /etc/krb5kdc/stash
owner: root
group: root
mode: '0600'
# Use the admin user to write out our host keytab
- name: Create host keytab
shell:
cmd: |
echo "ktadd host/{{ inventory_hostname }}" | kadmin -p admin/admin -w '{{ hostvars[groups["kerberos-kdc-primary"][0]]["kerberos_kdc_admin_password"] }}'
creates: '/etc/krb5.keytab'
# This specifies servers that are allowed to send us updates;
# i.e. the primary server
- name: Install kpropd ACL
template:
src: 'kpropd.acl.j2'
dest: '/etc/krb5kdc/kpropd.acl'
mode: 0644
owner: root
group: root
- name: Install kpropd service
copy:
src: krb5-kpropd.service
dest: /etc/systemd/system/krb5-kpropd.service
mode: 0644
owner: root
group: root
register: _kpropd_service_installed
- name: Reload systemd
systemd:
daemon_reload: yes
when: _kpropd_service_installed.changed
- name: Ensure kpropd running
systemd:
state: started
name: krb5-kpropd
enabled: yes
# Note we can't start krb5-kdc until the database has been replicated;
# the main service-kerberos.yaml playbook handles this.
- name: Ensure krb5-kdc is enabled
systemd:
name: krb5-kdc
enabled: yes
masked: no

@ -0,0 +1,16 @@
[kdcdefaults]
kdc_ports = 750,88
[realms]
{{ kerberos_kdc_realm }} = {
database_name = /var/lib/krb5kdc/principal
admin_keytab = FILE:/etc/krb5kdc/kadm5.keytab
acl_file = /etc/krb5kdc/kadm5.acl
key_stash_file = /etc/krb5kdc/stash
kdc_ports = 750,88
max_life = 10h 0m 0s
max_renewable_life = 7d 0h 0m 0s
master_key_type = aes256-cts
supported_enctypes = aes256-cts:normal
default_principal_flags = +preauth
}

@ -0,0 +1,3 @@
{% for kdc in groups["kerberos-kdc-primary"] %}
host/{{ kdc }}@{{ kerberos_kdc_realm }}
{% endfor %}

@ -0,0 +1,7 @@
#!/bin/sh
kdclist="{% for s in groups['kerberos-kdc-replica'] %}{{ s }} {% endfor %}"
kdb5_util dump /var/krb5kdc/slave_datatrans
for kdc in $kdclist
do
kprop -f /var/krb5kdc/slave_datatrans $kdc
done

@ -0,0 +1,47 @@
# Setting up a fresh realm, as done in CI, is a five-step process:
#
# 1. set up common packages/config
# 2. set up primary; create db, set up kprop pushes, start services.
# 3. configure replica to accept db updates via kpropd
# 4. do a db replication
# 5. start replica daemons now that they have a db copy
#
# In production this is largely a no-op just ensuring things are
# running.
- hosts: "kerberos-kdc:!disabled"
name: "Configure common KDC components"
roles:
- kerberos-client
- kerberos-kdc
- hosts: "kerberos-kdc-primary:!disabled"
name: "Configure Kerberos Primary"
tasks:
- name: Configure primary KDC
include_role:
name: kerberos-kdc
tasks_from: primary
- hosts: "kerberos-kdc-replica:!disabled"
name: "Configure Kerberos Replicas"
tasks:
- name: Configure replica KDC
include_role:
name: kerberos-kdc
tasks_from: replica
- hosts: "kerberos-kdc-primary:!disabled"
name: "Run replication"
tasks:
- name: Run a DB replication
shell: |
/usr/local/bin/run-kprop.sh
- hosts: "kerberos-kdc-replica:!disabled"
name: "Ensure krb5-kdc running"
tasks:
- name: Start krb5-kdc
systemd:
name: krb5-kdc
state: started

@ -0,0 +1,7 @@
- hosts: "kdc-primary.opendev.org"
tasks:
- name: Run kinit
shell: |
cat /etc/krb5kdc/admin.passwd | kinit admin/admin

@ -58,6 +58,7 @@
- group_vars/registry.yaml
- group_vars/gitea.yaml
- group_vars/gitea-lb.yaml
- group_vars/kerberos-kdc.yaml
- group_vars/letsencrypt.yaml
- group_vars/meetpad.yaml
- group_vars/jvb.yaml

@ -27,3 +27,11 @@ groups:
borg-backup:
- borg-backup-test01.opendev.org
- borg-backup-test02.opendev.org
kerberos-kdc:
- kdc-primary.opendev.org
- kdc-replica.opendev.org
kerberos-kdc-primary:
- kdc-primary.opendev.org
kerberos-kdc-replica:
- kdc-replica.opendev.org

@ -0,0 +1,10 @@
# global server settings
kerberos_kdc_realm: OPENDEV.CI
kerberos_kdc_master_key: masterkey123
# client settings
kerberos_realm: OPENDEV.CI
kerberos_admin_server: kdc-primary.opendev.org
kerberos_kdcs:
- kdc-primary.opendev.org
- kdc-replica.opendev.org

@ -593,6 +593,23 @@
- modules/
- manifests/
- job:
name: infra-prod-service-kerberos
parent: infra-prod-service-base
description: Run Kerberos playbook.
vars:
playbook_name: service-kerberos.yaml
infra_prod_ansible_forks: 1
required-projects:
- opendev/system-config
files:
- inventory/
- playbooks/service-kerberos.yaml
- inventory/service/group_vars/kerberos-kdc.yaml
- playbooks/roles/kerberos-kdc/
- roles/kerberos-client/
- playbooks/roles/iptables/
- job:
name: infra-prod-remote-puppet-else
parent: infra-prod-service-base

@ -25,6 +25,7 @@
- name: opendev-buildset-registry
- name: system-config-build-image-hound
soft: true
- system-config-run-kerberos
- system-config-run-lists
- system-config-run-nodepool
- system-config-run-meetpad:
@ -131,6 +132,7 @@
- name: opendev-buildset-registry
- name: system-config-upload-image-hound
soft: true
- system-config-run-kerberos
- system-config-run-lists
- system-config-run-nodepool
- system-config-run-meetpad:
@ -253,6 +255,7 @@
soft: true
- infra-prod-service-bridge
- infra-prod-service-gitea-lb
- infra-prod-service-kerberos
- infra-prod-service-nameserver
- infra-prod-service-nodepool
- infra-prod-service-codesearch:
@ -320,6 +323,7 @@
- infra-prod-service-nameserver
- infra-prod-service-etherpad
- infra-prod-service-meetpad
- infra-prod-service-kerberos
- infra-prod-service-mirror-update
- infra-prod-service-mirror
- infra-prod-service-static

@ -919,3 +919,35 @@
- testinfra/test_refstack.py
# If we rebuild the image, we want to run this job as well.
- docker/refstack/.*
- job:
name: system-config-run-kerberos
parent: system-config-run
ansible-version: 2.9
description: |
Run the playbook for kerberos servers
timeout: 3600
nodeset:
nodes:
- name: bridge.openstack.org
label: ubuntu-bionic
- name: kdc-primary.opendev.org
label: ubuntu-focal
- name: kdc-replica.opendev.org
label: ubuntu-focal
host-vars:
kdc-primary.opendev.org:
host_copy_output:
'/etc/krb5kdc/': logs
'/var/krb5kdc/': logs
kdc-replica.opendev.org:
host_copy_output:
'/etc/krb5kdc/': logs
'/var/krb5kdc/': logs
vars:
run_playbooks:
- playbooks/service-kerberos.yaml
run_test_playbook: playbooks/test-kerberos.yaml
files:
- playbooks/bridge.yaml
- playbooks/roles/kerberos-kdc/