Deploy Prometheus monitoring system using ansible.
When upgrading from <= 2.4.0 version of this role to >= 2.4.1 please turn off your prometheus instance. More in 2.4.1 release notes
- Ansible >= 2.7 (It might work on previous versions, but we cannot guarantee it)
- jmespath on deployer machine. If you are using Ansible from a Python virtualenv, install jmespath to the same virtualenv via pip.
- gnu-tar on Mac deployer host (
brew install gnu-tar
)
All variables which can be overridden are stored in defaults/main.yml file as well as in table below.
Name | Default Value | Description |
---|---|---|
prometheus_version |
2.15.2 | Prometheus package version. Also accepts latest as parameter. Only prometheus 2.x is supported |
prometheus_binaries_local_dir |
"" | Allows to use local packages instead of ones distributed on github. As parameter it takes a directory where prometheus AND promtool binaries are stored on host on which ansible is ran. This overrides prometheus_version parameter |
prometheus_config_dir |
/etc/prometheus | Path to directory with prometheus configuration |
prometheus_db_dir |
/var/lib/prometheus | Path to directory with prometheus database |
prometheus_web_listen_address |
"0.0.0.0:9090" | Address on which prometheus will be listening |
prometheus_web_external_url |
"" | External address on which prometheus is available. Useful when behind reverse proxy. Ex. http://example.org/prometheus |
prometheus_storage_retention |
"30d" | Data retention period |
prometheus_storage_retention_size |
"0" | Data retention period by size |
prometheus_config_flags_extra |
{} | Additional configuration flags passed to prometheus binary at startup |
prometheus_alertmanager_config |
[] | Configuration responsible for pointing where alertmanagers are. This should be specified as list in yaml format. It is compatible with official <alertmanager_config> |
prometheus_alert_relabel_configs |
[] | Alert relabeling rules. This should be specified as list in yaml format. It is compatible with the official <alert_relabel_configs> |
prometheus_global |
{ scrape_interval: 60s, scrape_timeout: 15s, evaluation_interval: 15s } | Prometheus global config. Compatible with official configuration |
prometheus_remote_write |
[] | Remote write. Compatible with official configuration |
prometheus_remote_read |
[] | Remote read. Compatible with official configuration |
prometheus_external_labels |
environment: "{{ ansible_fqdn | default(ansible_host) | default(inventory_hostname) }}" | Provide map of additional labels which will be added to any time series or alerts when communicating with external systems |
prometheus_targets |
{} | Targets which will be scraped. Better example is provided in our demo site |
prometheus_scrape_configs |
defaults/main.yml#L58 | Prometheus scrape jobs provided in same format as in official docs |
prometheus_config_file |
"prometheus.yml.j2" | Variable used to provide custom prometheus configuration file in form of ansible template |
prometheus_alert_rules |
defaults/main.yml#L81 | Full list of alerting rules which will be copied to {{ prometheus_config_dir }}/rules/ansible_managed.rules . Alerting rules can be also provided by other files located in {{ prometheus_config_dir }}/rules/ which have *.rules extension |
prometheus_alert_rules_files |
defaults/main.yml#L78 | List of folders where ansible will look for files containing alerting rules which will be copied to {{ prometheus_config_dir }}/rules/ . Files must have *.rules extension |
prometheus_static_targets_files |
defaults/main.yml#L78 | List of folders where ansible will look for files containing custom static target configuration files which will be copied to {{ prometheus_config_dir }}/file_sd/ . |
prometheus_targets
is just a map used to create multiple files located in "{{ prometheus_config_dir }}/file_sd" directory. Where file names are composed from top-level keys in that map with .yml
suffix. Those files store file_sd scrape targets data and they need to be read in prometheus_scrape_configs
.
A part of prometheus.yml configuration file which describes what is scraped by prometheus is stored in prometheus_scrape_configs
. For this variable same configuration options as described in prometheus docs are used.
Meanwhile prometheus_targets
is our way of adopting prometheus scrape type file_sd
. It defines a map of files with their content. A top-level keys are base names of files which need to have their own scrape job in prometheus_scrape_configs
and values are a content of those files.
All this mean that you CAN use custom prometheus_scrape_configs
with prometheus_targets
set to {}
. However when you set anything in prometheus_targets
it needs to be mapped to prometheus_scrape_configs
. If it isn't you'll get an error in preflight checks.
Lets look at our default configuration, which shows all features. By default we have this prometheus_targets
:
prometheus_targets:
node: # This is a base file name. File is located in "{{ prometheus_config_dir }}/file_sd/<<BASENAME>>.yml"
- targets: #
- localhost:9100 # All this is a targets section in file_sd format
labels: #
env: test #
Such config will result in creating one file named node.yml
in {{ prometheus_config_dir }}/file_sd
directory.
Next this file needs to be loaded into scrape config. Here is modified version of our default prometheus_scrape_configs
:
prometheus_scrape_configs:
- job_name: "prometheus" # Custom scrape job, here using `static_config`
metrics_path: "/metrics"
static_configs:
- targets:
- "localhost:9090"
- job_name: "example-node-file-servicediscovery"
file_sd_configs:
- files:
- "{{ prometheus_config_dir }}/file_sd/node.yml" # This line loads file created from `prometheus_targets`
---
- hosts: all
roles:
- cloudalchemy.prometheus
vars:
prometheus_targets:
node:
- targets:
- localhost:9100
- demo.cloudalchemy.org:9100
labels:
env: demosite
We provide demo site for full monitoring solution based on prometheus and grafana. Repository with code and links to running instances is available on github and site is hosted on DigitalOcean.
Alerting rules are defined in prometheus_alert_rules
variable. Format is almost identical to one defined in Prometheus 2.0 documentation.
Due to similarities in templating engines, every templates should be wrapped in {% raw %}
and {% endraw %}
statements. Example is provided in defaults/main.yml file.
The preferred way of locally testing the role is to use Docker and molecule (v2.x). You will have to install Docker on your system. See "Get started" for a Docker package suitable to for your system. We are using tox to simplify process of testing on multiple ansible versions. To install tox execute:
pip3 install tox
To run tests on all ansible versions (WARNING: this can take some time)
tox
To run a custom molecule command on custom environment with only default test scenario:
tox -e py35-ansible28 -- molecule test -s default
For more information about molecule go to their docs.
If you would like to run tests on remote docker host just specify DOCKER_HOST
variable before running tox tests.
Combining molecule and travis CI allows us to test how new PRs will behave when used with multiple ansible versions and multiple operating systems. This also allows use to create test scenarios for different role configurations. As a result we have a quite large test matrix which will take more time than local testing, so please be patient.
This project is licensed under MIT License. See LICENSE for more details.