InfluxDB

InfluxDB is an open-source time series database developed by InfluxData. It is written in Go and optimized for fast, high-availability storage and retrieval of time series data in fields such as operations monitoring, application metrics, Internet of Things sensor data, and real-time analytics.

Available solutions




This template is for Zabbix version: 6.2
Also available for: 6.0, 5.4

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/db/influxdb_http?at=release/6.2

InfluxDB by HTTP

Overview

For Zabbix version: 6.2 and higher
This template monitors InfluxDB with Zabbix and works without any external scripts. Most metrics are collected in a single request, thanks to Zabbix bulk data collection.

The template InfluxDB by HTTP collects metrics with the HTTP agent from the InfluxDB /metrics endpoint.
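To make the bulk collection idea concrete, here is a minimal Python sketch of the kind of conversion the PROMETHEUS_TO_JSON step performs on the /metrics response: one text payload becomes a list of rows that dependent items can filter. The function name `prometheus_to_rows`, the sample text and its values are made up for illustration. The parser only handles simple `name{labels} value` lines; the real Zabbix conversion emits more fields and supports more syntax.

```python
import re

# Matches `name{labels} value` or `name value`; timestamps are not handled.
_LINE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
# Matches one `key="value"` pair inside the braces.
_LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def prometheus_to_rows(text):
    """Turn Prometheus exposition text into a list of metric rows."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _LINE.match(line)
        if match is None:  # ignore anything this simplified parser cannot read
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        # The value is kept as a string here (an assumption of this sketch).
        rows.append({"name": name, "labels": labels, "value": value})
    return rows


# Hypothetical payload; real output from /metrics is much longer.
SAMPLE = """\
# HELP influxdb_buckets_total Number of total buckets on the server
# TYPE influxdb_buckets_total counter
influxdb_buckets_total 3
influxdb_info{arch="amd64",version="2.0.4"} 1
"""

rows = prometheus_to_rows(SAMPLE)
for row in rows:
    print(row)
```

Because every dependent item reads from the same parsed list, the InfluxDB instance is queried once per collection interval rather than once per metric.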

This template was tested on:

  • InfluxDB, version 2.0

Setup

See Zabbix template operation for basic instructions.

This template works with self-hosted InfluxDB instances. Internal service metrics are collected from the InfluxDB /metrics endpoint. Organization discovery requires authorization with an API token. See the docs: https://docs.influxdata.com/influxdb/v2.0/security/tokens/

Don't forget to change the macros {$INFLUXDB.URL} and {$INFLUXDB.API.TOKEN}. Also, see the Macros section for a list of macros used to set trigger values.

NOTE: Some metrics may not be collected depending on your InfluxDB instance version and configuration.
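As a rough illustration of how the two macros are used, the Python sketch below builds the two kinds of request: an unauthenticated one for /metrics and a token-authenticated one for the organizations API. The helper `build_request`, the token value and the use of the `/api/v2/orgs` path for discovery are assumptions made for this example; check the template itself for the exact URLs it calls. The `Authorization: Token ...` header format is the one InfluxDB 2.x uses for API tokens.

```python
import urllib.request


def build_request(base_url, path, api_token=""):
    """Build a GET request, adding the token header only when a token is set."""
    url = base_url.rstrip("/") + path
    request = urllib.request.Request(url)
    if api_token:
        # InfluxDB 2.x expects the API token in this header format.
        request.add_header("Authorization", "Token " + api_token)
    return request


# {$INFLUXDB.URL} default from the macro table; the token is a placeholder.
metrics_request = build_request("http://localhost:8086", "/metrics")
orgs_request = build_request(
    "http://localhost:8086", "/api/v2/orgs", "my-example-token"
)

print(metrics_request.full_url)
print(orgs_request.full_url, orgs_request.get_header("Authorization"))
```

Nothing is sent over the network here; passing either object to `urllib.request.urlopen` would perform the actual call.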

Zabbix configuration

No specific Zabbix configuration is required.

Macros used

| Name | Description | Default |
|------|-------------|---------|
| {$INFLUXDB.API.TOKEN} | InfluxDB API authorization token. | `` |
| {$INFLUXDB.ORG_NAME.MATCHES} | Filter of discoverable organizations. | .* |
| {$INFLUXDB.ORG_NAME.NOT_MATCHES} | Filter to exclude discovered organizations. | CHANGE_IF_NEEDED |
| {$INFLUXDB.REQ.FAIL.MAX.WARN} | Maximum number of query request failures for the trigger expression. | 2 |
| {$INFLUXDB.TASK.RUN.FAIL.MAX.WARN} | Maximum number of task run failures for the trigger expression. | 2 |
| {$INFLUXDB.URL} | InfluxDB instance URL. | http://localhost:8086 |

Template links

There are no template links in this template.

Discovery rules

| Name | Description | Type | Key and additional info |
|------|-------------|------|-------------------------|
| Organizations discovery | Discovery of organizations metrics. | HTTP_AGENT | influxdb.orgs.discovery; Preprocessing: (1) JAVASCRIPT: The text is too long. Please see the template. (2) DISCARD_UNCHANGED_HEARTBEAT: 1h; Filter (AND): {#ORG_NAME} NOT_MATCHES_REGEX {$INFLUXDB.ORG_NAME.NOT_MATCHES}, {#ORG_NAME} MATCHES_REGEX {$INFLUXDB.ORG_NAME.MATCHES} |
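The discovery filter can be pictured with a short Python sketch. It assumes the organizations API returns a JSON object with an `orgs` list whose entries carry `id` and `name`, and that the template's JavaScript step maps them to the {#ORG_ID} and {#ORG_NAME} macros; the actual script is not reproduced in this document, so treat the mapping as a guess. The function `discover_orgs` and the sample organizations are hypothetical.

```python
import re


def discover_orgs(api_response, matches=".*", not_matches="CHANGE_IF_NEEDED"):
    """Return discovery rows for organizations that pass both regex filters."""
    rows = []
    for org in api_response.get("orgs", []):
        name = org["name"]
        # Both conditions must hold, mirroring the AND filter of the rule.
        if re.search(matches, name) and not re.search(not_matches, name):
            rows.append({"{#ORG_NAME}": name, "{#ORG_ID}": org["id"]})
    return rows


# Made-up API response with two organizations.
response = {
    "orgs": [
        {"id": "0000000000000001", "name": "prod"},
        {"id": "0000000000000002", "name": "sandbox"},
    ]
}

all_orgs = discover_orgs(response)
without_sandbox = discover_orgs(response, not_matches="^sandbox$")

print(all_orgs)
print(without_sandbox)
```

With the default macro values every organization is discovered, because `.*` matches any name and no real organization is likely to be called CHANGE_IF_NEEDED. Setting {$INFLUXDB.ORG_NAME.NOT_MATCHES} to a pattern such as `^sandbox$` excludes that organization from discovery.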

Items collected

| Group | Name | Description | Type | Key and additional info |
|-------|------|-------------|------|-------------------------|
| InfluxDB | InfluxDB: Instance status | Get the health of an instance. | HTTP_AGENT | influx.healthcheck; Preprocessing: (1) CHECK_NOT_SUPPORTED, ⛔️ON_FAIL: CUSTOM_VALUE -> {"status":"fail"}]}; (2) JAVASCRIPT: return JSON.parse(value).status == 'pass' ? 1: 0; (3) DISCARD_UNCHANGED_HEARTBEAT: 30m |
| InfluxDB | InfluxDB: Boltdb reads, rate | Total number of boltdb reads per second. | DEPENDENT | influxdb.boltdb_reads.rate; Preprocessing: (1) JSONPATH: $[?(@.name=="boltdb_reads_total")].value.first(), ⛔️ON_FAIL: DISCARD_VALUE; (2) CHANGE_PER_SECOND |
| InfluxDB | InfluxDB: Boltdb writes, rate | Total number of boltdb writes per second. | DEPENDENT | influxdb.boltdb_writes.rate; Preprocessing: (1) JSONPATH: $[?(@.name=="boltdb_writes_total")].value.first(), ⛔️ON_FAIL: DISCARD_VALUE; (2) CHANGE_PER_SECOND |
| InfluxDB | InfluxDB: Buckets, total | Total number of buckets on the server. | DEPENDENT | influxdb.buckets.total; Preprocessing: (1) JSONPATH: $[?(@.name=="influxdb_buckets_total")].value.first(), ⛔️ON_FAIL: DISCARD_VALUE; (2) DISCARD_UNCHANGED_HEARTBEAT: 30m |
| InfluxDB | InfluxDB: Dashboards, total | Total number of dashboards on the server. | DEPENDENT | influxdb.dashboards.total; Preprocessing: (1) JSONPATH: $[?(@.name=="influxdb_dashboards_total")].value.first(), ⛔️ON_FAIL: DISCARD_VALUE; (2) DISCARD_UNCHANGED_HEARTBEAT: 30m |
| InfluxDB | InfluxDB: Organizations, total | Total number of organizations on the server. | DEPENDENT | influxdb.organizations.total; Preprocessing: (1) JSONPATH: $[?(@.name=="influxdb_organizations_total")].value.first(), ⛔️ON_FAIL: DISCARD_VALUE; (2) DISCARD_UNCHANGED_HEARTBEAT: 30m |
| InfluxDB | InfluxDB: Scrapers, total | Total number of scrapers on the server. | DEPENDENT | influxdb.scrapers.total; Preprocessing: (1) JSONPATH: $[?(@.name=="influxdb_scrapers_total")].value.first(), ⛔️ON_FAIL: DISCARD_VALUE; (2) DISCARD_UNCHANGED_HEARTBEAT: 30m |
| InfluxDB | InfluxDB: Telegraf plugins, total | Number of individual Telegraf plugins configured. | DEPENDENT | influxdb.telegraf_plugins.total; Preprocessing: (1) JSONPATH: $[?(@.name=="influxdb_telegraf_plugins_count")].value.sum(), ⛔️ON_FAIL: DISCARD_VALUE; (2) DISCARD_UNCHANGED_HEARTBEAT: 30m |
| InfluxDB | InfluxDB: Telegrafs, total | Total number of Telegraf configurations on the server. | DEPENDENT | influxdb.telegrafs.total; Preprocessing: (1) JSONPATH: $[?(@.name=="influxdb_telegrafs_total")].value.first(), ⛔️ON_FAIL: DISCARD_VALUE; (2) DISCARD_UNCHANGED_HEARTBEAT: 30m |
| InfluxDB | InfluxDB: Tokens, total | Total number of tokens on the server. | DEPENDENT | influxdb.tokens.total; Preprocessing: (1) JSONPATH: $[?(@.name=="influxdb_tokens_total")].value.first(), ⛔️ON_FAIL: DISCARD_VALUE; (2) DISCARD_UNCHANGED_HEARTBEAT: 30m |
| InfluxDB | InfluxDB: Users, total | Total number of users on the server. | DEPENDENT | influxdb.users.total; Preprocessing: (1) JSONPATH: $[?(@.name=="influxdb_users_total")].value.first(), ⛔️ON_FAIL: DISCARD_VALUE; (2) DISCARD_UNCHANGED_HEARTBEAT: 30m |
| InfluxDB | InfluxDB: Version | Version of the InfluxDB instance. | DEPENDENT | influxdb.version; Preprocessing: (1) JSONPATH: $[?(@.name=="influxdb_info")].labels.version.first(); (2) DISCARD_UNCHANGED_HEARTBEAT: 3h |
| InfluxDB | InfluxDB: Uptime | InfluxDB process uptime in seconds. | DEPENDENT | influxdb.uptime; Preprocessing: (1) JSONPATH: $[?(@.name=="influxdb_uptime_seconds")].value.first() |
| InfluxDB | InfluxDB: Workers currently running | Total number of workers currently running tasks. | DEPENDENT | influxdb.task_executor_runs_active.total; Preprocessing: (1) JSONPATH: $[?(@.name=="task_executor_total_runs_active")].value.first(), ⛔️ON_FAIL: DISCARD_VALUE |
| InfluxDB | InfluxDB: Workers busy, pct | Percent of total available workers that are currently busy. | DEPENDENT | influxdb.task_executor_workers_busy.pct; Preprocessing: (1) JSONPATH: $[?(@.name=="task_executor_workers_busy")].value.first(), ⛔️ON_FAIL: DISCARD_VALUE |
| InfluxDB | InfluxDB: Task runs failed, rate | Total number of failed runs across all tasks. | DEPENDENT | influxdb.task_executor_complete.failed.rate; Preprocessing: (1) JSONPATH: $[?(@.name=="task_executor_total_runs_complete" && @.labels.status == "failed")].value.sum(), ⛔️ON_FAIL: DISCARD_VALUE; (2) CHANGE_PER_SECOND |
| InfluxDB | InfluxDB: Task runs successful, rate | Total number of runs successfully completed across all tasks. | DEPENDENT | influxdb.task_executor_complete.successful.rate; Preprocessing: (1) JSONPATH: $[?(@.name=="task_executor_total_runs_complete" && @.labels.status == "success")].value.sum(), ⛔️ON_FAIL: DISCARD_VALUE; (2) CHANGE_PER_SECOND |
| InfluxDB | InfluxDB: [{#ORG_NAME}] Query requests bytes, success | Count of bytes received with status 200 per second. | DEPENDENT | influxdb.org.query_request_bytes.success.rate["{#ORG_NAME}"]; Preprocessing: (1) JSONPATH: $[?(@.name=="http_query_request_bytes" && @.labels.status == "200" && @.labels.endpoint == "/api/v2/query" && @.labels.org_id == "{#ORG_ID}") ].value.first(), ⛔️ON_FAIL: DISCARD_VALUE; (2) CHANGE_PER_SECOND |
| InfluxDB | InfluxDB: [{#ORG_NAME}] Query requests bytes, failed | Count of bytes received with status not 200 per second. | DEPENDENT | influxdb.org.query_request_bytes.failed.rate["{#ORG_NAME}"]; Preprocessing: (1) JSONPATH: $[?(@.name=="http_query_request_bytes" && @.labels.status != "200" && @.labels.endpoint == "/api/v2/query" && @.labels.org_id == "{#ORG_ID}") ].value.first(), ⛔️ON_FAIL: DISCARD_VALUE; (2) CHANGE_PER_SECOND |
| InfluxDB | InfluxDB: [{#ORG_NAME}] Query requests, failed | Total number of query requests with status not 200 per second. | DEPENDENT | influxdb.org.query_request.failed.rate["{#ORG_NAME}"]; Preprocessing: (1) JSONPATH: $[?(@.name=="http_query_request_count" && @.labels.status != "200" && @.labels.endpoint == "/api/v2/query" && @.labels.org_id == "{#ORG_ID}") ].value.first(), ⛔️ON_FAIL: DISCARD_VALUE; (2) CHANGE_PER_SECOND |
| InfluxDB | InfluxDB: [{#ORG_NAME}] Query requests, success | Total number of query requests with status 200 per second. | DEPENDENT | influxdb.org.query_request.success.rate["{#ORG_NAME}"]; Preprocessing: (1) JSONPATH: $[?(@.name=="http_query_request_count" && @.labels.status == "200" && @.labels.endpoint == "/api/v2/query" && @.labels.org_id == "{#ORG_ID}") ].value.first(), ⛔️ON_FAIL: DISCARD_VALUE; (2) CHANGE_PER_SECOND |
| InfluxDB | InfluxDB: [{#ORG_NAME}] Query response bytes, success | Count of bytes returned with status 200 per second. | DEPENDENT | influxdb.org.http_query_response_bytes.success.rate["{#ORG_NAME}"]; Preprocessing: (1) JSONPATH: $[?(@.name=="http_query_response_bytes" && @.labels.status == "200" && @.labels.endpoint == "/api/v2/query" && @.labels.org_id == "{#ORG_ID}") ].value.first(), ⛔️ON_FAIL: DISCARD_VALUE; (2) CHANGE_PER_SECOND |
| InfluxDB | InfluxDB: [{#ORG_NAME}] Query response bytes, failed | Count of bytes returned with status not 200 per second. | DEPENDENT | influxdb.org.http_query_response_bytes.failed.rate["{#ORG_NAME}"]; Preprocessing: (1) JSONPATH: $[?(@.name=="http_query_response_bytes" && @.labels.status != "200" && @.labels.endpoint == "/api/v2/query" && @.labels.org_id == "{#ORG_ID}") ].value.first(), ⛔️ON_FAIL: DISCARD_VALUE; (2) CHANGE_PER_SECOND |
| Zabbix raw items | InfluxDB: Get instance metrics | - | HTTP_AGENT | influx.get_metrics; Preprocessing: (1) CHECK_NOT_SUPPORTED, ⛔️ON_FAIL: DISCARD_VALUE; (2) PROMETHEUS_TO_JSON |
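The dependent items above all follow the same two-step pattern: pick values out of the converted metrics with a JSONPath filter, then optionally turn a counter into a rate. The Python sketch below imitates that pattern under a few assumptions: the rows have `name`, `labels` and `value` keys, `select_values` and `change_per_second` are hypothetical helper names, and the `task_type` label and all numbers are invented sample data. It approximates the behavior and does not reproduce the Zabbix implementation.

```python
def select_values(rows, name, **labels):
    """Values of rows whose name and all given labels match (like a JSONPath filter)."""
    return [
        float(row["value"])
        for row in rows
        if row["name"] == name
        and all(row["labels"].get(key) == wanted for key, wanted in labels.items())
    ]


def change_per_second(prev_value, prev_time, value, now):
    """Counter delta divided by elapsed seconds; None when no rate can be computed."""
    elapsed = now - prev_time
    if elapsed <= 0 or value < prev_value:
        # A shrinking counter (for example after a restart) yields no value.
        return None
    return (value - prev_value) / elapsed


# Invented rows in the shape of converted /metrics output.
rows = [
    {"name": "task_executor_total_runs_complete",
     "labels": {"status": "failed", "task_type": "system"}, "value": "4"},
    {"name": "task_executor_total_runs_complete",
     "labels": {"status": "failed", "task_type": "threshold"}, "value": "2"},
    {"name": "task_executor_total_runs_complete",
     "labels": {"status": "success", "task_type": "system"}, "value": "90"},
]

# Equivalent in spirit to ...status == "failed")].value.sum()
failed_total = sum(select_values(rows, "task_executor_total_runs_complete", status="failed"))

# Previous sample was 0 failures, taken 60 seconds earlier.
rate = change_per_second(0.0, 0, failed_total, 60)
print(failed_total, rate)
```

Items that end in `.first()` would take the first element of the selected list instead of the sum, and items without CHANGE_PER_SECOND store the selected value directly.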

Triggers

| Name | Description | Expression | Severity | Dependencies and additional info |
|------|-------------|------------|----------|----------------------------------|
| InfluxDB: Health check was failed | The InfluxDB instance is not available or unhealthy. | last(/InfluxDB by HTTP/influx.healthcheck)=0 | HIGH | |
| InfluxDB: Version has changed | InfluxDB version has changed. Ack to close. | last(/InfluxDB by HTTP/influxdb.version,#1)<>last(/InfluxDB by HTTP/influxdb.version,#2) and length(last(/InfluxDB by HTTP/influxdb.version))>0 | INFO | Manual close: YES |
| InfluxDB: has been restarted | Uptime is less than 10 minutes. | last(/InfluxDB by HTTP/influxdb.uptime)<10m | INFO | Manual close: YES |
| InfluxDB: Too many tasks failure runs | The number of failed runs completed across all tasks is too high. | min(/InfluxDB by HTTP/influxdb.task_executor_complete.failed.rate,5m)>{$INFLUXDB.TASK.RUN.FAIL.MAX.WARN} | WARNING | |
| InfluxDB: [{#ORG_NAME}]: Too many requests failures | Too many query requests failed. | min(/InfluxDB by HTTP/influxdb.org.query_request.failed.rate["{#ORG_NAME}"],5m)>{$INFLUXDB.REQ.FAIL.MAX.WARN} | WARNING | |
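The two WARNING triggers use min(...,5m) rather than last(...), which means the failure rate has to stay above the threshold for the whole five-minute window before the trigger fires. A small Python sketch of that logic follows; the function names and sample values are hypothetical, and the window handling is a simplification of how Zabbix evaluates time-based functions.

```python
def min_over_window(samples, now, window_seconds):
    """Smallest value among (timestamp, value) samples inside the window, or None."""
    recent = [value for ts, value in samples if now - window_seconds < ts <= now]
    return min(recent) if recent else None


def too_many_failures(samples, now, threshold=2, window_seconds=300):
    """True when every sample in the window exceeds the threshold."""
    lowest = min_over_window(samples, now, window_seconds)
    return lowest is not None and lowest > threshold


# Failure rates sampled once a minute (made-up values).
sustained = [(0, 3.0), (60, 4.0), (120, 2.5), (180, 5.0), (240, 3.5)]
spiky = [(0, 3.0), (60, 0.5), (120, 6.0), (180, 5.0), (240, 3.5)]

print(too_many_failures(sustained, now=240))
print(too_many_failures(spiky, now=240))
```

A single quiet minute is enough to keep the trigger from firing, which filters out short bursts of failures. The default threshold of 2 mirrors the defaults of {$INFLUXDB.REQ.FAIL.MAX.WARN} and {$INFLUXDB.TASK.RUN.FAIL.MAX.WARN}.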

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
