Install ELK with beats to gather metrics
#########################################

:tags: openstack, ansible

About this repository
---------------------

This set of playbooks will deploy an Elastic Stack cluster (Elasticsearch,
Logstash, Kibana) with beats to gather metrics from hosts and store them in
the Elastic Stack. **These playbooks require Ansible 2.5+.**

High-level overview of the Elastic-Stack infrastructure these playbooks will
build and operate against.

.. image:: assets/Elastic-Stack-Diagram.svg
   :alt: Elasticsearch Architecture Diagram
   :align: center

OpenStack-Ansible Integration
-----------------------------

These playbooks can be used with a standalone inventory or as an integrated
part of an OpenStack-Ansible deployment. For a simple example of standalone
inventory, see `test-inventory.yml <tests/inventory/test-inventory.yml>`_.

Optional | Load balancer configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Configure the Elasticsearch endpoints:
  While the Elastic Stack cluster does not need a load balancer to scale, one
  is useful when accessing the Elasticsearch cluster using external tooling.
  Tools like OSProfiler, Grafana, etc. will all benefit from being able to
  interact with Elasticsearch through the load balancer, which provides better
  fault tolerance than connecting to a single node.

  The following section can be added to the `haproxy_extra_services` list to
  create an Elasticsearch backend. The ingress port used to connect to
  Elasticsearch is **9201**. The backend port is **9200**. If this backend is
  set up, make sure you set `internal_lb_vip_address` on the CLI or within a
  known variable file which will be sourced at runtime. If using HAProxy, edit
  the `/etc/openstack_deploy/user_variables.yml` file and add the following
  lines.

  .. code-block:: yaml

     haproxy_extra_services:
       - service:
           haproxy_service_name: elastic-logstash
           haproxy_ssl: False
           haproxy_backend_nodes: "{{ groups['kibana'] | default([]) }}"  # Kibana nodes are also Elasticsearch coordination nodes
           haproxy_port: 9201  # This is set using the "elastic_hap_port" variable
           haproxy_check_port: 9200  # This is set using the "elastic_port" variable
           haproxy_backend_port: 9200  # This is set using the "elastic_port" variable
           haproxy_balance_type: tcp

Configure the Kibana endpoints:
  It is recommended to use a load balancer with Kibana. Like Elasticsearch, a
  load balancer is not required; however, without one users will need to
  connect directly to a single Kibana node to access the dashboard. If a load
  balancer is present it can provide a highly available address for users to
  access a pool of Kibana nodes, which provides a much better user experience.
  If using HAProxy, edit the `/etc/openstack_deploy/user_variables.yml` file
  and add the following lines.

  .. code-block:: yaml

     haproxy_extra_services:
       - service:
           haproxy_service_name: kibana
           haproxy_ssl: False
           haproxy_backend_nodes: "{{ groups['kibana'] | default([]) }}"
           haproxy_port: 81  # This is set using the "kibana_nginx_port" variable
           haproxy_balance_type: tcp
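Once the load balancer has been reconfigured, the endpoints can be
spot-checked from a deployment host. The following is a minimal sketch,
assuming the ports above are unchanged; ``INTERNAL_LB_VIP_ADDRESS`` is a
placeholder shell variable for your load balancer address.

.. code-block:: bash

   # Query the Elasticsearch backend through the load balancer (ingress port 9201).
   curl -s "http://${INTERNAL_LB_VIP_ADDRESS}:9201/_cluster/health?pretty"

   # Confirm the Kibana backend answers on its ingress port (81).
   curl -sI "http://${INTERNAL_LB_VIP_ADDRESS}:81/"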
Configure the APM endpoints:
  It is recommended to use a load balancer for submitting Application
  Performance Monitoring data to the APM server. A load balancer will provide
  a highly available address which APM clients can use to connect to a pool of
  APM nodes. If using HAProxy, edit the
  `/etc/openstack_deploy/user_variables.yml` file and add the following lines.

  .. code-block:: yaml

     haproxy_extra_services:
       - service:
           haproxy_service_name: apm-server
           haproxy_ssl: False
           haproxy_backend_nodes: "{{ groups['apm-server'] | default([]) }}"
           haproxy_port: 8200  # This is set using the "apm_port" variable
           haproxy_balance_type: tcp

Optional | add OSProfiler to an OpenStack-Ansible deployment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To initialize the `OSProfiler` module within OpenStack, the following
overrides can be applied to a user variables file. The HMAC key needs to be
defined consistently throughout the environment. Full example to initialize
the `OSProfiler` modules throughout an OpenStack-Ansible deployment:

.. code-block:: yaml

   profiler_overrides: &os_profiler
     profiler:
       enabled: true
       trace_sqlalchemy: true
       hmac_keys: "UNIQUE_HMACKEY"  # This needs to be set consistently throughout the deployment
       connection_string: "Elasticsearch://{{ internal_lb_vip_address }}:9201"
       es_doc_type: "notification"
       es_scroll_time: "2m"
       es_scroll_size: "10000"
       filter_error_trace: "false"

   aodh_aodh_conf_overrides: *os_profiler
   barbican_config_overrides: *os_profiler
   ceilometer_ceilometer_conf_overrides: *os_profiler
   cinder_cinder_conf_overrides: *os_profiler
   designate_designate_conf_overrides: *os_profiler
   glance_glance_api_conf_overrides: *os_profiler
   gnocchi_conf_overrides: *os_profiler
   heat_heat_conf_overrides: *os_profiler
   horizon_config_overrides: *os_profiler
   ironic_ironic_conf_overrides: *os_profiler
   keystone_keystone_conf_overrides: *os_profiler
   magnum_config_overrides: *os_profiler
   neutron_neutron_conf_overrides: *os_profiler
   nova_nova_conf_overrides: *os_profiler
   octavia_octavia_conf_overrides: *os_profiler
   rally_config_overrides: *os_profiler
   sahara_conf_overrides: *os_profiler
   swift_swift_conf_overrides: *os_profiler
   tacker_tacker_conf_overrides: *os_profiler
   trove_config_overrides: *os_profiler

If a deployer wishes to use multiple keys, they can do so with a
comma-separated list.

.. code-block:: yaml

   profiler_overrides: &os_profiler
     profiler:
       hmac_keys: "key1,key2"

To add the `OSProfiler` section to an existing set of overrides, the `yaml`
section can be added or dynamically appended to a given hash using `yaml`
tags.

.. code-block:: yaml

   profiler_overrides: &os_profiler
     profiler:
       enabled: true
       hmac_keys: "UNIQUE_HMACKEY"  # This needs to be set consistently throughout the deployment
       connection_string: "Elasticsearch://{{ internal_lb_vip_address }}:9201"
       es_doc_type: "notification"
       es_scroll_time: "2m"
       es_scroll_size: "10000"
       filter_error_trace: "false"

   # Example to merge the os_profiler tag into an existing override hash
   nova_nova_conf_overrides:
     section1_override:
       key: "value"
     <<: *os_profiler

While the `osprofiler` and `elasticsearch` libraries should be installed
within all virtual environments by default, it's possible they're missing
within a given deployment. To install these dependencies throughout the
cluster without having to invoke a *repo-build*, the following *adhoc* Ansible
command can be used. The version of the Elasticsearch python library should
match the major version of Elasticsearch being deployed within the
environment.

.. code-block:: bash

   ansible -m shell -a 'find /openstack/venvs/* -maxdepth 0 -type d -exec {}/bin/pip install osprofiler "elasticsearch>=6.0.0,<7.0.0" --isolated \;' all
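To confirm the libraries are present after the install, a similar adhoc
command can be used as a spot-check; this is a minimal sketch that simply
reports the installed versions in each virtual environment.

.. code-block:: bash

   # Report the installed osprofiler and elasticsearch versions in every venv.
   ansible -m shell -a 'find /openstack/venvs/* -maxdepth 0 -type d -exec {}/bin/pip show osprofiler elasticsearch \;' all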
Once the overrides are in place, the **openstack-ansible** playbooks will need
to be rerun. To inject these options into the system, a deployer can use the
`*-config` tags that are a part of all `os_*` roles. The following example
will run the **config** tag on **ALL** OpenStack playbooks.

.. code-block:: bash

   openstack-ansible setup-openstack.yml --tags "$(cat setup-openstack.yml | grep -wo 'os-.*' | awk -F'-' '{print $2 "-config"}' | tr '\n' ',')"

Once the `OSProfiler` module has been initialized, tasks can be profiled on
demand by using the `--profile` or `--os-profile` switch in the various
OpenStack clients along with one of the defined HMAC keys.

Legacy profile example command.

.. code-block:: bash

   glance --profile key1 image-list

Modern profile example command, which requires
`python-openstackclient >= 3.4.1` and the `osprofiler` library.

.. code-block:: bash

   openstack --os-profile key2 image list

If the client library is not installed in the same path as the
`python-openstackclient` client, run the following command to install the
required library.

.. code-block:: bash

   pip install osprofiler

Optional | run the haproxy-install playbook
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   cd /opt/openstack-ansible/playbooks/
   openstack-ansible haproxy-install.yml --tags=haproxy-service-config

Setup | system configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Clone the elk-osa repo

.. code-block:: bash

   cd /opt
   git clone https://github.com/openstack/openstack-ansible-ops

Copy the env.d file into place

.. code-block:: bash

   cd /opt/openstack-ansible-ops/elk_metrics_7x
   cp env.d/elk.yml /etc/openstack_deploy/env.d/

Copy the conf.d file into place

.. code-block:: bash

   cp conf.d/elk.yml /etc/openstack_deploy/conf.d/

In **elk.yml**, list your logging hosts under `elastic-logstash_hosts` to
create the Elasticsearch cluster in multiple containers, and one logging host
under `kibana_hosts` to create the Kibana container.

.. code-block:: bash

   vi /etc/openstack_deploy/conf.d/elk.yml

Create the containers

.. code-block:: bash

   cd /opt/openstack-ansible/playbooks
   openstack-ansible lxc-containers-create.yml --limit elk_all

Deploying | Installing with embedded Ansible
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If this is being executed on a system that already has Ansible installed but
is incompatible with these playbooks, the script
`bootstrap-embedded-ansible.sh` can be sourced to grab an embedded version of
Ansible prior to executing the playbooks.

.. code-block:: bash

   source bootstrap-embedded-ansible.sh

Deploying | Manually resolving the dependencies
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This playbook has external role dependencies. If Ansible is not installed with
the `bootstrap-ansible.sh` script, these dependencies can be resolved with the
``ansible-galaxy`` command and the ``ansible-role-requirements.yml`` file.

* Example galaxy execution

  .. code-block:: bash

     ansible-galaxy install -r ansible-role-requirements.yml

Once the dependencies are set, make sure to set the action plugin path to the
location of the `config_template` action directory. This can be done using the
environment variable `ANSIBLE_ACTION_PLUGINS` or through the use of an
`ansible.cfg` file, for example as sketched below.
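A minimal sketch of the environment variable approach; the path below assumes
``ansible-galaxy`` installed the roles under `/etc/ansible/roles`, so adjust
it to wherever the `config_template` role actually landed on your system.

.. code-block:: bash

   # Point Ansible at the config_template action plugin directory (path is an assumption).
   export ANSIBLE_ACTION_PLUGINS="/etc/ansible/roles/config_template/action"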
Deploying | The environment
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Install master/data Elasticsearch nodes on the elastic-logstash containers,
deploy logstash, deploy Kibana, and then deploy all of the service beats.

.. code-block:: bash

   cd /opt/openstack-ansible-ops/elk_metrics_7x
   ansible-playbook site.yml $USER_VARS

* The `openstack-ansible` command can be used if the version of Ansible on the
  system is greater than **2.5**. This will automatically pick up the
  necessary group_vars for hosts in an OSA deployment.
* You may need to gather facts before running; ``ansible -m setup elk_all``
  will gather the facts you will need.
* If required, add ``-e@/opt/openstack-ansible/inventory/group_vars/all/all.yml``
  to import sufficient OSA group variables to define the OpenStack release.
  Journalbeat will then deploy onto all hosts/containers for releases prior to
  Rocky, and hosts only for Rocky onwards. If the variable
  ``openstack_release`` is undefined, the default behaviour is to deploy
  Journalbeat to hosts only.
* Alternatively, if using the embedded Ansible, create a symlink to include
  all of the OSA group_vars. These are not available by default with the
  embedded Ansible and can be symlinked into the ops repo.

  .. code-block:: bash

     ln -s /opt/openstack-ansible/inventory/group_vars /opt/openstack-ansible-ops/elk_metrics_7x/group_vars

The individual playbooks found within this repository can be independently run
at any time.

Architecture | Data flow
^^^^^^^^^^^^^^^^^^^^^^^^

This diagram outlines the data flow within an Elastic-Stack deployment.

.. image:: assets/Elastic-dataflow.svg
   :alt: Elastic-Stack Data Flow Diagram
   :align: center

Optional | Enable uwsgi stats
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Config overrides can be used to make uwsgi stats available on unix domain
sockets. Any `/tmp/*-uwsgi-stats.sock` socket will be picked up by Metricbeat.

.. code-block:: yaml

   keystone_uwsgi_ini_overrides:
     uwsgi:
       stats: "/tmp/keystone-uwsgi-stats.sock"
   cinder_api_uwsgi_ini_overrides:
     uwsgi:
       stats: "/tmp/cinder-api-uwsgi-stats.sock"
   glance_api_uwsgi_ini_overrides:
     uwsgi:
       stats: "/tmp/glance-api-uwsgi-stats.sock"
   heat_api_uwsgi_ini_overrides:
     uwsgi:
       stats: "/tmp/heat-api-uwsgi-stats.sock"
   heat_api_cfn_init_overrides:
     uwsgi:
       stats: "/tmp/heat-api-cfn-uwsgi-stats.sock"
   nova_api_metadata_uwsgi_ini_overrides:
     uwsgi:
       stats: "/tmp/nova-api-metadata-uwsgi-stats.sock"
   nova_api_os_compute_uwsgi_ini_overrides:
     uwsgi:
       stats: "/tmp/nova-api-os-compute-uwsgi-stats.sock"
   nova_placement_uwsgi_ini_overrides:
     uwsgi:
       stats: "/tmp/nova-placement-uwsgi-stats.sock"
   octavia_api_uwsgi_ini_overrides:
     uwsgi:
       stats: "/tmp/octavia-api-uwsgi-stats.sock"
   sahara_api_uwsgi_ini_overrides:
     uwsgi:
       stats: "/tmp/sahara-api-uwsgi-stats.sock"
   ironic_api_uwsgi_ini_overrides:
     uwsgi:
       stats: "/tmp/ironic-api-uwsgi-stats.sock"
   magnum_api_uwsgi_ini_overrides:
     uwsgi:
       stats: "/tmp/magnum-api-uwsgi-stats.sock"

Rerun all of the **openstack-ansible** playbooks to enable these stats. Use
the `${service_name}-config` tags on all of the `os_*` roles. It's possible to
auto-generate the tags list with the following command.

.. code-block:: bash

   openstack-ansible setup-openstack.yml --tags "$(cat setup-openstack.yml | grep -wo 'os-.*' | awk -F'-' '{print $2 "-config"}' | tr '\n' ',')"
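Once the services have been reconfigured, a stats socket can be spot-checked
directly on a host running the service. A minimal sketch; the uwsgi stats
server prints a JSON document to any client that connects, and this assumes a
netcat build with unix domain socket support.

.. code-block:: bash

   # Dump the JSON stats document from the keystone uwsgi stats socket.
   nc -U /tmp/keystone-uwsgi-stats.sock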
Optional | add Kafka Output format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To send data from Logstash to Kafka, create the `logstash_kafka_options`
variable. This variable will be used as a generator and create a Kafka output
configuration file using the key/value pairs as options.

.. code-block:: yaml

   logstash_kafka_options:
     codec: json
     topic_id: "elk_kafka"
     ssl_key_password: "{{ logstash_kafka_ssl_key_password }}"
     ssl_keystore_password: "{{ logstash_kafka_ssl_keystore_password }}"
     ssl_keystore_location: "/var/lib/logstash/{{ logstash_kafka_ssl_keystore_location | basename }}"
     ssl_truststore_location: "/var/lib/logstash/{{ logstash_kafka_ssl_truststore_location | basename }}"
     ssl_truststore_password: "{{ logstash_kafka_ssl_truststore_password }}"
     bootstrap_servers:
       - server1.local:9092
       - server2.local:9092
       - server3.local:9092
     client_id: "elk_metrics_7x"
     compression_type: "gzip"
     security_protocol: "SSL"
     id: "UniqueOutputID"

For a complete list of all options available within the Logstash Kafka output
plugin, please review the `plugin documentation
<https://www.elastic.co/guide/en/logstash/current/plugins-outputs-kafka.html>`_.

Optional config:
  The following variables are optional and correspond to the example
  `logstash_kafka_options` variable.

  .. code-block:: yaml

     logstash_kafka_ssl_key_password: "secrete"
     logstash_kafka_ssl_keystore_password: "secrete"
     logstash_kafka_ssl_truststore_password: "secrete"

     # SSL certificates in Java KeyStore format
     logstash_kafka_ssl_keystore_location: "/root/kafka/keystore.jks"
     logstash_kafka_ssl_truststore_location: "/root/kafka/truststore.jks"

When using the Kafka output plugin, the options
`logstash_kafka_ssl_keystore_location` and
`logstash_kafka_ssl_truststore_location` will automatically copy a local SSL
key to the logstash nodes. These options are string values and assume the
deployment nodes have local access to the files.
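Before rerunning the playbooks, it can be worth confirming the Kafka brokers
are reachable from the Logstash nodes. A minimal sketch, assuming the
placeholder broker names from the example above and a `logstash` inventory
group; substitute your own broker addresses.

.. code-block:: bash

   # Probe each example Kafka broker from the logstash hosts (names are placeholders).
   ansible logstash -m shell -a 'for b in server1.local server2.local server3.local; do nc -vz "$b" 9092; done'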
Optional | add Grafana visualizations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

See the grafana directory for more information on how to deploy grafana. When
deploying grafana, source the variable file from ELK in order to automatically
connect grafana to the Elasticsearch datastore and import dashboards.
Including the variable file is as simple as adding
``-e @../elk_metrics_7x/vars/variables.yml`` to the grafana playbook run.

Included dashboards:

* https://grafana.com/dashboards/5569
* https://grafana.com/dashboards/5566

Example command using the embedded Ansible from within the grafana directory.

.. code-block:: bash

   ansible-playbook ${USER_VARS} installGrafana.yml \
     -e @../elk_metrics_7x/vars/variables.yml \
     -e 'galera_root_user="root"' \
     -e 'galera_address={{ internal_lb_vip_address }}'

Optional | add kibana custom dashboard
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you want to use a custom dashboard directly on your kibana, you can run the
playbook below. The dashboard uses filebeat to collect the logs of your
deployment.

.. code-block:: bash

   ansible-playbook setupKibanaDashboard.yml $USER_VARS

Overview of kibana custom dashboard

.. image:: assets/openstack-kibana-custom-dashboard.png
   :scale: 50 %
   :alt: Kibana Custom Dashboard
   :align: center

Optional | Customize Elasticsearch cluster configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Cluster configuration can be augmented using several variables which will
force a node to use a given role. By default all nodes are data and ingest
eligible. Available roles are *data*, *ingest*, and *master*.

* ``elasticsearch_node_data``: This variable will override the automatic node
  determination and set a given node to be a *data* node.
* ``elasticsearch_node_ingest``: This variable will override the automatic
  node determination and set a given node to be an *ingest* node.
* ``elasticsearch_node_master``: This variable will override the automatic
  node determination and set a given node to be a *master* node.

Example setting override options within inventory.

.. code-block:: yaml

   all:
     children:
       elastic:
         hosts:
           elk1:
             ansible_host: 10.0.0.1
             ansible_user: root
             elasticsearch_node_master: true
             elasticsearch_node_data: false
             elasticsearch_node_ingest: false
           elk2:
             ansible_host: 10.0.0.2
             ansible_user: root
             elasticsearch_node_master: false
             elasticsearch_node_data: true
             elasticsearch_node_ingest: false
           elk3:
             ansible_host: 10.0.0.3
             ansible_user: root
             elasticsearch_node_master: false
             elasticsearch_node_data: false
             elasticsearch_node_ingest: true
           elk4:
             ansible_host: 10.0.0.4
             ansible_user: root
       logstash:
         hosts:
           elk3:
           elk4:

With these inventory settings, **elk1** would be a master node, **elk2** would
be a data node, **elk3** would be an ingest node, and **elk4** would be both a
data and an ingest node. **elk3** and **elk4** would become the nodes hosting
logstash instances.

Upgrading the cluster
---------------------

To upgrade the packages throughout the Elasticsearch cluster, set the package
state variable, `elk_package_state`, to latest.

.. code-block:: bash

   cd /opt/openstack-ansible-ops/elk_metrics_7x
   ansible-playbook site.yml $USER_VARS -e 'elk_package_state="latest"'
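After an upgrade run, it is worth confirming that every node has rejoined the
cluster and reports the expected version. A minimal sketch using the load
balancer endpoint configured earlier; the address variable is a placeholder.

.. code-block:: bash

   # List node names, versions, and roles; all nodes should report the upgraded version.
   curl -s "http://${INTERNAL_LB_VIP_ADDRESS}:9201/_cat/nodes?v&h=name,version,node.role,master"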
Forcing the Elasticsearch cluster retention policy to refresh
-------------------------------------------------------------

To force the cluster retention policy to refresh, set
`elastic_retention_refresh` to "yes". When `elastic_retention_refresh` is set
to "yes", the retention policy will forcibly be refreshed across all hosts.
This option should only be used when the Elasticsearch storage array is
modified on an existing cluster. Should the Elasticsearch cluster size change
(nodes added or removed), the retention policy will automatically be refreshed
on playbook execution.

.. code-block:: bash

   cd /opt/openstack-ansible-ops/elk_metrics_7x
   ansible-playbook site.yml $USER_VARS -e 'elastic_retention_refresh="yes"'

Troubleshooting
---------------

If everything goes bad, you can clean up with the following commands.

.. code-block:: bash

   openstack-ansible /opt/openstack-ansible-ops/elk_metrics_7x/site.yml -e 'elk_package_state="absent"' --tags package_install
   openstack-ansible /opt/openstack-ansible/playbooks/lxc-containers-destroy.yml --limit elk_all

Local testing
-------------

To test these playbooks within a local environment you will need a single
server with at least 8GiB of RAM and 40GiB of storage on root. Running an
`m1.medium` (openstack) flavor size is generally enough to get an environment
online.

To run the local functional tests, execute the `run-tests.sh` script out of
the tests directory. This will create a 4 node Elasticsearch cluster, 1 Kibana
node with an Elasticsearch coordination process, and 1 APM node. The beats
will be deployed to the environment as if this was a production installation.

.. code-block:: bash

   CLUSTERED=yes tests/run-tests.sh

After the test build is completed, the cluster will test its layout and ensure
processes are functioning normally. Logs for the cluster can be found at
`/tmp/elk-metrics-7x-logs`.

To rerun the playbooks after a test build, source the `tests/manual-test.rc`
file and follow the onscreen instructions.

To clean up a test environment and start again from a bare server, the
`run-cleanup.sh` script can be used. This script is destructive and will purge
all `elk_metrics_7x` related services within the local test environment.

.. code-block:: bash

   tests/run-cleanup.sh

Enabling ELK security
---------------------

By default, ELK 7 is deployed without security enabled. This means that all
service and user interactions are unauthenticated, and communication is
unencrypted. If you wish to enable security features, it is recommended to
start with a deployed cluster with security disabled before following these
steps. Note that this is a multi-stage process and requires unavoidable
downtime.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-basic-setup.html#generate-certificates

* Generate a certificate authority which is unique to the Elastic cluster.
  Ensure you set a password against the certificate bundle.
* Generate a key and certificate for Elasticsearch instances. You may use a
  single bundle for all hosts, or unique bundles if preferred. Again, set a
  password against these.
* Store the CA bundle securely, and configure the following elasticsearch
  Ansible role variables. Note that it may be useful to base64 encode and
  decode the binary certificate bundle files.

  .. code-block:: yaml

     elastic_security_enabled: True
     elastic_security_cert_bundle: "cert-bundle-contents"
     elastic_security_cert_password: "cert-bundle-password"

* Stop all Elasticsearch services.
* Run the `installElastic.yml` playbook against all cluster nodes. This will
  enable security features, but will halt log ingest and monitoring tasks due
  to missing authentication credentials.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html#security-create-builtin-users

* Generate usernames and passwords for key ELK services. Store the output
  securely and set up the following Ansible variables. Note that the
  credentials for system users are generated for you.

  For Kibana hosts, set the following variables:

  - kibana_system_username
  - kibana_system_password
  - kibana_setup_username (*)
  - kibana_setup_password (*)

  For Logstash hosts, set the following variables:

  - logstash_system_username
  - logstash_system_password
  - logstash_internal_username (*)
  - logstash_internal_password (*)

  For Beats hosts, set the following variables:

  - beats_system_username
  - beats_system_password
  - beats_setup_username (*)
  - beats_setup_password (*)

  (*) Users marked with a star are not generated automatically. These must be
  set up manually via the Kibana interface once it has been configured. In
  order for the Kibana playbook to run successfully, the 'elastic' superuser
  can be used initially as the `kibana_setup_username`/`kibana_setup_password`.

  - kibana_setup - any user which is assigned the built-in kibana_admin role
  - logstash_internal - see
    https://www.elastic.co/guide/en/logstash/7.17/ls-security.html#ls-http-auth-basic
  - beats_setup - see the setup role at
    https://www.elastic.co/guide/en/beats/filebeat/7.17/feature-roles.html -
    this user must also be assigned the built-in ingest_admin role

* Set `kibana_object_encryption_key` to a string with a minimum length of 32
  bytes.
* Run the `installKibana.yml` playbook against Kibana hosts. This will
  complete their configuration and should allow you to log in to the web
  interface using the 'elastic' user generated earlier.
* Set up any additional users required by Logstash, Beats or others via the
  Kibana interface and set their variables as noted above.
* Complete the deployment by running the `installLogstash.yml` and Beat
  install playbooks.
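Once security is enabled, a quick check can confirm that anonymous access is
rejected and that the 'elastic' credentials work. A minimal sketch; the
address and password variables are placeholders, and `-k` is only appropriate
while the new CA has not yet been distributed to clients.

.. code-block:: bash

   # An unauthenticated request should now be rejected with HTTP 401.
   curl -sk -o /dev/null -w '%{http_code}\n' "https://${INTERNAL_LB_VIP_ADDRESS}:9201/"

   # Verify the 'elastic' superuser credentials against the authenticate API.
   curl -sk -u "elastic:${ELASTIC_PASSWORD}" "https://${INTERNAL_LB_VIP_ADDRESS}:9201/_security/_authenticate?pretty"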