HashiCorp Consul

Consul is a service networking solution to automate network configurations, discover services, and enable secure connectivity across any cloud or runtime.

This template is for Zabbix version: 6.2
Also available for: 6.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/consul_http/consul?at=release/6.2

HashiCorp Consul Node by HTTP

Overview

For Zabbix version: 6.2 and higher
This template monitors HashiCorp Consul with Zabbix and works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Remember to enable the Prometheus format for exported metrics; see the Consul documentation.
More information about the metrics can be found in the official documentation.

The template HashiCorp Consul Node by HTTP collects metrics with the HTTP agent from the /v1/agent/metrics endpoint.

This template was tested on:

  • HashiCorp Consul, version 1.10.0

Setup

See Zabbix template operation for basic instructions.

Internal service metrics are collected from the /v1/agent/metrics endpoint. Remember to enable the Prometheus format for exported metrics; see the Consul documentation. The template requires authorization via an API token.

Remember to change the macros {$CONSUL.NODE.API.URL} and {$CONSUL.TOKEN}.
Also, see the Macros section for a list of macros used to set trigger values.
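
As a rough illustration of the kind of request the two macros above feed into, the following sketch builds (but does not send) a request for the agent metrics in Prometheus format. The `format=prometheus` query parameter and the `X-Consul-Token` header are part of Consul's HTTP API; the function name `build_metrics_request` and the token value are hypothetical, and the template itself may pass the token differently. As far as Consul's telemetry documentation describes, the Prometheus format is enabled on the agent through the `prometheus_retention_time` telemetry option.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_metrics_request(api_url: str, token: str) -> Request:
    """Build a GET request for Consul agent metrics in Prometheus format."""
    query = urlencode({"format": "prometheus"})
    url = f"{api_url.rstrip('/')}/v1/agent/metrics?{query}"
    # X-Consul-Token is one of the headers Consul accepts for ACL tokens.
    return Request(url, headers={"X-Consul-Token": token})


# The URL mirrors the {$CONSUL.NODE.API.URL} default; the token is a placeholder.
request = build_metrics_request("http://localhost:8500", "example-token")
print(request.full_url)
```

The request is only constructed here, so the sketch can be inspected without a running Consul agent.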

This template supports Consul namespaces. Set the macros {$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.MATCHES} and {$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.NOT_MATCHES} to filter discovered services by namespace.
In the Open Source version, the service namespace is set to 'None'.
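
The effect of a MATCHES / NOT_MATCHES macro pair can be sketched as below. This is an approximation only: Zabbix evaluates these filters with PCRE, while the sketch uses Python's `re` module, and the helper name `passes_lld_filter` and the sample namespaces are hypothetical.

```python
import re


def passes_lld_filter(value: str, matches: str, not_matches: str) -> bool:
    """Return True when value matches `matches` and does not match `not_matches`."""
    # re.search is used because an unanchored regex may match anywhere in the value.
    return bool(re.search(matches, value)) and not re.search(not_matches, value)


# Hypothetical discovered namespaces; 'None' stands for the Open Source case.
namespaces = ["None", "team-a", "team-b"]
kept = [ns for ns in namespaces if passes_lld_filter(ns, r"^team-", r"^team-b$")]
print(kept)
```

With the default macro values (`.*` and `CHANGE IF NEEDED`), every namespace passes, including 'None', because the exclusion pattern is not expected to match any real name.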

NOTE. Some metrics may not be collected depending on your HashiCorp Consul instance version and configuration.
NOTE. You may also be interested in the Envoy Proxy by HTTP template.

Zabbix configuration

No specific Zabbix configuration is required.

Macros used

Name Description Default
{$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.MATCHES}

Filter for discoverable services on the local node.

.*
{$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.NOT_MATCHES}

Filter to exclude discovered services on the local node.

CHANGE IF NEEDED
{$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.MATCHES}

Filter for discoverable services by namespace on the local node. Enterprise only; in the Open Source version the namespace is set to 'None'.

.*
{$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.NOT_MATCHES}

Filter to exclude discovered services by namespace on the local node. Enterprise only; in the Open Source version the namespace is set to 'None'.

CHANGE IF NEEDED
{$CONSUL.NODE.API.URL}

Consul instance URL.

http://localhost:8500
{$CONSUL.NODE.HEALTH_SCORE.MAX.HIGH}

Maximum acceptable value of node's health score for AVERAGE trigger expression.

4
{$CONSUL.NODE.HEALTH_SCORE.MAX.WARN}

Maximum acceptable value of node's health score for WARNING trigger expression.

2
{$CONSUL.OPEN.FDS.MAX.WARN}

Maximum percentage of used file descriptors.

90
{$CONSUL.TOKEN}

Consul auth token.

<PUT YOUR AUTH TOKEN>

Template links

There are no template links in this template.

Discovery rules

Name Description Type Key and additional info
HTTP API methods discovery

Discovers metrics specific to HTTP API methods.

DEPENDENT consul.http_api_discovery

Preprocessing:

- PROMETHEUS_TO_JSON: consul_api_http{method =~ ".*"}

- JAVASCRIPT: The text is too long. Please see the template.

- DISCARD_UNCHANGED_HEARTBEAT: 3h

Local node services discovery

Discover metrics for services that are registered with the local agent.

DEPENDENT consul.node_services_lld

Preprocessing:

- JAVASCRIPT: The text is too long. Please see the template.

- DISCARD_UNCHANGED_HEARTBEAT: 3h

Filter:

- {#SERVICE_NAME} MATCHES_REGEX {$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.MATCHES}

- {#SERVICE_NAME} NOT_MATCHES_REGEX {$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.NOT_MATCHES}

- {#SERVICE_NAMESPACE} MATCHES_REGEX {$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.MATCHES}

- {#SERVICE_NAMESPACE} NOT_MATCHES_REGEX {$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.NOT_MATCHES}

Overrides:

aggregated status
- {#TYPE} MATCHES_REGEX aggregated_status
- ITEM_PROTOTYPE LIKE Aggregated status - DISCOVER


- ITEM_PROTOTYPE LIKE State - DISCOVER

checks
- {#TYPE} MATCHES_REGEX service_check
- ITEM_PROTOTYPE LIKE Check - DISCOVER

Raft leader metrics discovery

Discover raft metrics for leader nodes.

DEPENDENT consul.raft.leader.discovery

Preprocessing:

- JAVASCRIPT: The text is too long. Please see the template.

- DISCARD_UNCHANGED_HEARTBEAT: 3h

Raft server metrics discovery

Discover raft metrics for server nodes.

DEPENDENT consul.raft.server.discovery

Preprocessing:

- JAVASCRIPT: The text is too long. Please see the template.

- DISCARD_UNCHANGED_HEARTBEAT: 3h
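
The discovery rules above turn collected data into low-level discovery rows. The template's own JavaScript is not reproduced on this page, so the following is only a sketch of the general idea for the HTTP API methods rule: read Prometheus-format lines, pick out the `method` label of `consul_api_http` samples, and emit one row per distinct method. The sample metric lines, the regex, and the function name `discover_http_methods` are assumptions made for illustration.

```python
import json
import re

# Hypothetical excerpt of Prometheus-format output from the agent.
SAMPLE = '''\
consul_api_http{method="GET",path="v1_agent_metrics",quantile="0.5"} 0.21
consul_api_http{method="GET",path="v1_agent_self",quantile="0.5"} 0.35
consul_api_http{method="PUT",path="v1_kv",quantile="0.5"} 1.2
consul_raft_apply 42
'''

# Matches only the consul_api_http family and captures its method label.
LABEL_RE = re.compile(r'^consul_api_http\{[^}]*\bmethod="([^"]*)"')


def discover_http_methods(metrics_text: str) -> list[dict[str, str]]:
    """Return one LLD row per distinct HTTP method seen in consul_api_http samples."""
    methods: list[str] = []
    for line in metrics_text.splitlines():
        found = LABEL_RE.match(line)
        if found and found.group(1) not in methods:
            methods.append(found.group(1))
    return [{"{#HTTP_METHOD}": method} for method in methods]


rows = discover_http_methods(SAMPLE)
print(json.dumps(rows))
```

Each row carries the {#HTTP_METHOD} macro that the HTTP request item prototypes use in their keys.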

Items collected

Group Name Description Type Key and additional info
Consul Consul: Role

Role of current Consul agent.

DEPENDENT consul.role

Preprocessing:

- JSONPATH: $.Config.Server

- BOOL_TO_DECIMAL

- DISCARD_UNCHANGED_HEARTBEAT: 3h

Consul Consul: Version

Version of Consul agent.

DEPENDENT consul.version

Preprocessing:

- JSONPATH: $.Config.Version

- DISCARD_UNCHANGED_HEARTBEAT: 3h

Consul Consul: Number of services

Number of services on current node.

DEPENDENT consul.services_number

Preprocessing:

- JSONPATH: $.Stats.agent.services

- DISCARD_UNCHANGED_HEARTBEAT: 3h

Consul Consul: Number of checks

Number of checks on current node.

DEPENDENT consul.checks_number

Preprocessing:

- JSONPATH: $.Stats.agent.checks

- DISCARD_UNCHANGED_HEARTBEAT: 3h

Consul Consul: Number of check monitors

Number of check monitors on current node.

DEPENDENT consul.check_monitors_number

Preprocessing:

- JSONPATH: $.Stats.agent.check_monitors

- DISCARD_UNCHANGED_HEARTBEAT: 3h

Consul Consul: Process CPU seconds, total

Total user and system CPU time spent in seconds.

DEPENDENT consul.cpu_seconds_total.rate

Preprocessing:

- PROMETHEUS_PATTERN: process_cpu_seconds_total

⛔️ON_FAIL: DISCARD_VALUE ->

- CHANGE_PER_SECOND

Consul Consul: Virtual memory size

Virtual memory size in bytes.

DEPENDENT consul.virtual_memory_bytes

Preprocessing:

- PROMETHEUS_PATTERN: process_virtual_memory_bytes

Consul Consul: RSS memory usage

Resident memory size in bytes.

DEPENDENT consul.resident_memory_bytes

Preprocessing:

- PROMETHEUS_PATTERN: process_resident_memory_bytes

Consul Consul: Goroutine count

The number of Goroutines on Consul instance.

DEPENDENT consul.goroutines

Preprocessing:

- PROMETHEUS_PATTERN: go_goroutines

Consul Consul: Open file descriptors

Number of open file descriptors.

DEPENDENT consul.process_open_fds

Preprocessing:

- PROMETHEUS_PATTERN: process_open_fds

Consul Consul: Open file descriptors, max

Maximum number of open file descriptors.

DEPENDENT consul.process_max_fds

Preprocessing:

- PROMETHEUS_PATTERN: process_max_fds

Consul Consul: Client RPC, per second

The number of RPC requests per second that a Consul agent in client mode makes to a Consul server.

This gives a measure of how much a given agent is loading the Consul servers.

This is only generated by agents in client mode, not Consul servers.

DEPENDENT consul.client_rpc

Preprocessing:

- PROMETHEUS_PATTERN: consul_client_rpc

⛔️ON_FAIL: DISCARD_VALUE ->

- CHANGE_PER_SECOND

Consul Consul: Client RPC failed ,per second

The number of failed RPC requests per second that a Consul agent in client mode makes to a Consul server.

DEPENDENT consul.client_rpc_failed

Preprocessing:

- PROMETHEUS_PATTERN: consul_client_rpc_failed

⛔️ON_FAIL: DISCARD_VALUE ->

- CHANGE_PER_SECOND

Consul Consul: TCP connections, accepted per second

This metric counts the number of times a Consul agent has accepted an incoming TCP stream connection per second.

DEPENDENT consul.memberlist.tcp_accept

Preprocessing:

- PROMETHEUS_PATTERN: consul_memberlist_tcp_accept

⛔️ON_FAIL: DISCARD_VALUE ->

- CHANGE_PER_SECOND

Consul Consul: TCP connections, per second

This metric counts the number of times a Consul agent has initiated a push/pull sync with another agent per second.

DEPENDENT consul.memberlist.tcp_connect

Preprocessing:

- PROMETHEUS_PATTERN: consul_memberlist_tcp_connect

⛔️ON_FAIL: DISCARD_VALUE ->

- CHANGE_PER_SECOND

Consul Consul: TCP send bytes, per second

This metric measures the total number of bytes sent by a Consul agent through the TCP protocol per second.

DEPENDENT consul.memberlist.tcp_sent

Preprocessing:

- PROMETHEUS_PATTERN: consul_memberlist_tcp_sent

⛔️ON_FAIL: DISCARD_VALUE ->

- CHANGE_PER_SECOND

Consul Consul: UDP received bytes, per second

This metric measures the total number of bytes received by a Consul agent through the UDP protocol per second.

DEPENDENT consul.memberlist.udp_received

Preprocessing:

- PROMETHEUS_PATTERN: consul_memberlist_udp_received

⛔️ON_FAIL: DISCARD_VALUE ->

- CHANGE_PER_SECOND

Consul Consul: UDP sent bytes, per second

This metric measures the total number of bytes sent by a Consul agent through the UDP protocol per second.

DEPENDENT consul.memberlist.udp_sent

Preprocessing:

- PROMETHEUS_PATTERN: consul_memberlist_udp_sent

⛔️ON_FAIL: DISCARD_VALUE ->

- CHANGE_PER_SECOND

Consul Consul: GC pause, p90

The 90th percentile of the time consumed by stop-the-world garbage collection (GC) pauses since Consul started. The source metric is in nanoseconds; the 1.0E-9 multiplier converts it to seconds.

DEPENDENT consul.gc_pause.p90

Preprocessing:

- PROMETHEUS_PATTERN: consul_runtime_gc_pause_ns{quantile="0.9"}

⛔️ON_FAIL: DISCARD_VALUE ->

- JAVASCRIPT: return (isNaN(value)) ? 0 : value;

- MULTIPLIER: 1.0E-9

Consul Consul: GC pause, p50

The 50th percentile (median) of the time consumed by stop-the-world garbage collection (GC) pauses since Consul started. The source metric is in nanoseconds; the 1.0E-9 multiplier converts it to seconds.

DEPENDENT consul.gc_pause.p50

Preprocessing:

- PROMETHEUS_PATTERN: consul_runtime_gc_pause_ns{quantile="0.5"}

⛔️ON_FAIL: DISCARD_VALUE ->

- JAVASCRIPT: return (isNaN(value)) ? 0 : value;

- MULTIPLIER: 1.0E-9

Consul Consul: Memberlist: degraded

This metric counts the number of times the Consul agent has performed failure detection on another agent at a slower probe rate.

The agent uses its own health metric as an indicator to perform this action.

If its health score is low, it means that the node is healthy, and vice versa.

DEPENDENT consul.memberlist.degraded

Preprocessing:

- PROMETHEUS_PATTERN: consul_memberlist_degraded

⛔️ON_FAIL: DISCARD_VALUE ->

Consul Consul: Memberlist: health score

This metric describes a node's perception of its own health based on how well it is meeting the soft real-time requirements of the protocol.

This metric ranges from 0 to 8, where 0 indicates "totally healthy".

DEPENDENT consul.memberlist.health_score

Preprocessing:

- PROMETHEUS_PATTERN: consul_memberlist_health_score

⛔️ON_FAIL: DISCARD_VALUE ->

Consul Consul: Memberlist: gossip, p90

The 90th percentile for the number of gossips (messages) broadcast to a set of randomly selected nodes.

DEPENDENT consul.memberlist.dispatch_log.p90

Preprocessing:

- PROMETHEUS_PATTERN: consul_memberlist_gossip{quantile="0.9"}

⛔️ON_FAIL: DISCARD_VALUE ->

- JAVASCRIPT: return (isNaN(value)) ? 0 : value;

Consul Consul: Memberlist: gossip, p50

The 50th percentile (median) for the number of gossips (messages) broadcast to a set of randomly selected nodes.

DEPENDENT consul.memberlist.gossip.p50

Preprocessing:

- PROMETHEUS_PATTERN: consul_memberlist_gossip{quantile="0.5"}

⛔️ON_FAIL: DISCARD_VALUE ->

- JAVASCRIPT: return (isNaN(value)) ? 0 : value;

Consul Consul: Memberlist: msg alive

This metric counts the number of alive Consul agents that the agent has mapped out so far, based on the message information given by the network layer.

DEPENDENT consul.memberlist.msg.alive

Preprocessing:

- PROMETHEUS_PATTERN: consul_memberlist_msg_alive

⛔️ON_FAIL: DISCARD_VALUE ->

Consul Consul: Memberlist: msg dead

This metric counts the number of times a Consul agent has marked another agent to be a dead node.

DEPENDENT consul.memberlist.msg.dead

Preprocessing:

- PROMETHEUS_PATTERN: consul_memberlist_msg_dead

⛔️ON_FAIL: DISCARD_VALUE ->

Consul Consul: Memberlist: msg suspect

The number of times a Consul agent suspects another as failed while probing during gossip protocol.

DEPENDENT consul.memberlist.msg.suspect

Preprocessing:

- PROMETHEUS_PATTERN: consul_memberlist_msg_suspect

⛔️ON_FAIL: DISCARD_VALUE ->

Consul Consul: Memberlist: probe node, p90

The 90th percentile for the time taken to perform a single round of failure detection on a selected Consul agent.

DEPENDENT consul.memberlist.probe_node.p90

Preprocessing:

- PROMETHEUS_PATTERN: consul_memberlist_probeNode{quantile="0.9"}

⛔️ON_FAIL: DISCARD_VALUE ->

- JAVASCRIPT: return (isNaN(value)) ? 0 : value;

Consul Consul: Memberlist: probe node, p50

The 50th percentile (median) for the time taken to perform a single round of failure detection on a selected Consul agent.

DEPENDENT consul.memberlist.probe_node.p50

Preprocessing:

- PROMETHEUS_PATTERN: consul_memberlist_probeNode{quantile="0.5"}

⛔️ON_FAIL: DISCARD_VALUE ->

- JAVASCRIPT: return (isNaN(value)) ? 0 : value;

Consul Consul: Memberlist: push pull node, p90

The 90th percentile for the number of Consul agents that have exchanged state with this agent.

DEPENDENT consul.memberlist.push_pull_node.p90

Preprocessing:

- PROMETHEUS_PATTERN: consul_memberlist_pushPullNode{quantile="0.9"}

⛔️ON_FAIL: DISCARD_VALUE ->

- JAVASCRIPT: return (isNaN(value)) ? 0 : value;

Consul Consul: Memberlist: push pull node, p50

The 50th percentile (median) for the number of Consul agents that have exchanged state with this agent.

DEPENDENT consul.memberlist.push_pull_node.p50

Preprocessing:

- PROMETHEUS_PATTERN: consul_memberlist_pushPullNode{quantile="0.5"}

⛔️ON_FAIL: DISCARD_VALUE ->

- JAVASCRIPT: return (isNaN(value)) ? 0 : value;

Consul Consul: KV store: apply, p90

The 90th percentile for the time it takes to complete an update to the KV store.

DEPENDENT consul.kvs.apply.p90

Preprocessing:

- PROMETHEUS_PATTERN: consul_kvs_apply{quantile="0.9"}

⛔️ON_FAIL: DISCARD_VALUE ->

- JAVASCRIPT: return (isNaN(value)) ? 0 : value;

Consul Consul: KV store: apply, p50

The 50th percentile (median) for the time it takes to complete an update to the KV store.

DEPENDENT consul.kvs.apply.p50

Preprocessing:

- PROMETHEUS_PATTERN: consul_kvs_apply{quantile="0.5"}

⛔️ON_FAIL: DISCARD_VALUE ->

- JAVASCRIPT: return (isNaN(value)) ? 0 : value;

Consul Consul: KV store: apply, rate

The number of updates to the KV store per second.

DEPENDENT consul.kvs.apply.rate

Preprocessing:

- PROMETHEUS_PATTERN: consul_kvs_apply_count

⛔️ON_FAIL: DISCARD_VALUE ->

- CHANGE_PER_SECOND

Consul Consul: Serf member: flap, rate

Increments when an agent is marked dead and then recovers within a short time period.

This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports.

Shown as events per second.

DEPENDENT consul.serf.member.flap.rate

Preprocessing:

- PROMETHEUS_PATTERN: consul_serf_member_flap

⛔️ON_FAIL: DISCARD_VALUE ->

- CHANGE_PER_SECOND

Consul Consul: Serf member: failed, rate

Increments when an agent is marked dead.

This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports.

Shown as events per second.

DEPENDENT consul.serf.member.failed.rate

Preprocessing:

- PROMETHEUS_PATTERN: consul_serf_member_failed

⛔️ON_FAIL: DISCARD_VALUE ->

- CHANGE_PER_SECOND

Consul Consul: Serf member: join, rate

Increments when an agent joins the cluster. If an agent flapped or failed this counter also increments when it re-joins.

Shown as events per second.

DEPENDENT consul.serf.member.join.rate

Preprocessing:

- PROMETHEUS_PATTERN: consul_serf_member_join

⛔️ON_FAIL: DISCARD_VALUE ->

- CHANGE_PER_SECOND

Consul Consul: Serf member: left, rate

Increments when an agent leaves the cluster. Shown as events per second.

DEPENDENT consul.serf.member.left.rate

Preprocessing:

- PROMETHEUS_PATTERN: consul_serf_member_left

⛔️ON_FAIL: DISCARD_VALUE ->

- CHANGE_PER_SECOND

Consul Consul: Serf member: update, rate

Increments when a Consul agent updates. Shown as events per second.

DEPENDENT consul.serf.member.update.rate

Preprocessing:

- PROMETHEUS_PATTERN: consul_serf_member_update

⛔️ON_FAIL: DISCARD_VALUE ->

- CHANGE_PER_SECOND

Consul Consul: ACL: resolves, rate

The number of ACL resolves per second.

DEPENDENT consul.acl.resolves.rate

Preprocessing:

- PROMETHEUS_PATTERN: consul_acl_ResolveToken_count

⛔️ON_FAIL: DISCARD_VALUE ->

- CHANGE_PER_SECOND

Consul Consul: Catalog: register, rate

The number of catalog register operations per second.

DEPENDENT consul.catalog.register.rate

Preprocessing:

- PROMETHEUS_PATTERN: consul_catalog_register_count

⛔️ON_FAIL: DISCARD_VALUE ->

- CHANGE_PER_SECOND

Consul Consul: Catalog: deregister, rate

The number of catalog deregister operations per second.

DEPENDENT consul.catalog.deregister.rate

Preprocessing:

- PROMETHEUS_PATTERN: consul_catalog_deregister_count

⛔️ON_FAIL: DISCARD_VALUE ->

- CHANGE_PER_SECOND

Consul Consul: Snapshot: append line, p90

The 90th percentile for the time taken by the Consul agent to append an entry into the existing log.

DEPENDENT consul.snapshot.append_line.p90

Preprocessing:

- PROMETHEUS_PATTERN: consul_serf_snapshot_appendLine{quantile="0.9"}

⛔️ON_FAIL: DISCARD_VALUE ->

- JAVASCRIPT: return (isNaN(value)) ? 0 : value;

Consul Consul: Snapshot: append line, p50

The 50th percentile (median) for the time taken by the Consul agent to append an entry into the existing log.

DEPENDENT consul.snapshot.append_line.p50

Preprocessing:

- PROMETHEUS_PATTERN: consul_serf_snapshot_appendLine{quantile="0.5"}

⛔️ON_FAIL: DISCARD_VALUE ->

- JAVASCRIPT: return (isNaN(value)) ? 0 : value;

Consul Consul: Snapshot: append line, rate

The number of snapshot appendLine operations per second.

DEPENDENT consul.snapshot.append_line.rate

Preprocessing:

- PROMETHEUS_PATTERN: consul_serf_snapshot_appendLine_count

⛔️ON_FAIL: DISCARD_VALUE ->

- CHANGE_PER_SECOND

Consul Consul: Snapshot: compact, p90

The 90th percentile for the time taken by the Consul agent to compact a log.

This operation occurs only when the snapshot becomes large enough to justify the compaction.

DEPENDENT consul.snapshot.compact.p90

Preprocessing:

- PROMETHEUS_PATTERN: consul_serf_snapshot_compact{quantile="0.9"}

⛔️ON_FAIL: DISCARD_VALUE ->

- JAVASCRIPT: return (isNaN(value)) ? 0 : value;

Consul Consul: Snapshot: compact, p50

The 50th percentile (median) for the time taken by the Consul agent to compact a log.

This operation occurs only when the snapshot becomes large enough to justify the compaction.

DEPENDENT consul.snapshot.compact.p50

Preprocessing:

- PROMETHEUS_PATTERN: consul_serf_snapshot_compact{quantile="0.5"}

⛔️ON_FAIL: DISCARD_VALUE ->

- JAVASCRIPT: return (isNaN(value)) ? 0 : value;

Consul Consul: Snapshot: compact, rate

The number of snapshot compact operations per second.

DEPENDENT consul.snapshot.compact.rate

Preprocessing:

- PROMETHEUS_PATTERN: consul_serf_snapshot_compact_count

⛔️ON_FAIL: DISCARD_VALUE ->

- CHANGE_PER_SECOND

Consul Consul: Get local services check

Data collection check.

DEPENDENT consul.get_local_services.check

Preprocessing:

- JSONPATH: $.error

⛔️ON_FAIL: CUSTOM_VALUE ->

- DISCARD_UNCHANGED_HEARTBEAT: 3h

Consul Consul: ["{#SERVICE_NAME}"]: Aggregated status

Aggregated values of all health checks for the service instance.

DEPENDENT consul.service.aggregated_state["{#SERVICE_ID}"]

Preprocessing:

- JSONPATH: $[?(@.id == "{#SERVICE_ID}")].status.first()

- JAVASCRIPT: The text is too long. Please see the template.

- DISCARD_UNCHANGED_HEARTBEAT: 3h

Consul Consul: ["{#SERVICE_NAME}"]: Check ["{#SERVICE_CHECK_NAME}"]: Status

Current state of health check for the service.

DEPENDENT consul.service.check.state["{#SERVICE_ID}/{#SERVICE_CHECK_ID}"]

Preprocessing:

- JSONPATH: $[?(@.id == "{#SERVICE_ID}")].checks[?(@.CheckID == "{#SERVICE_CHECK_ID}")].Status.first()

- JAVASCRIPT: The text is too long. Please see the template.

- DISCARD_UNCHANGED_HEARTBEAT: 3h

Consul Consul: ["{#SERVICE_NAME}"]: Check ["{#SERVICE_CHECK_NAME}"]: Output

Current output of health check for the service.

DEPENDENT consul.service.check.output["{#SERVICE_ID}/{#SERVICE_CHECK_ID}"]

Preprocessing:

- JSONPATH: $[?(@.id == "{#SERVICE_ID}")].checks[?(@.CheckID == "{#SERVICE_CHECK_ID}")].Output.first()

- DISCARD_UNCHANGED_HEARTBEAT: 3h

Consul Consul: HTTP request: ["{#HTTP_METHOD}"], p90

The 90th percentile of how long it takes to service the given HTTP request for the given verb.

DEPENDENT consul.http.api.p90["{#HTTP_METHOD}"]

Preprocessing:

- PROMETHEUS_PATTERN: consul_api_http{method = "{#HTTP_METHOD}", quantile = "0.9"}: function: sum

⛔️ON_FAIL: DISCARD_VALUE ->

Consul Consul: HTTP request: ["{#HTTP_METHOD}"], p50

The 50th percentile (median) of how long it takes to service the given HTTP request for the given verb.

DEPENDENT consul.http.api.p50["{#HTTP_METHOD}"]

Preprocessing:

- PROMETHEUS_PATTERN: consul_api_http{method = "{#HTTP_METHOD}", quantile = "0.5"}: function: sum

⛔️ON_FAIL: DISCARD_VALUE ->

Consul Consul: HTTP request: ["{#HTTP_METHOD}"], rate

The number of HTTP requests for the given verb per second.

DEPENDENT consul.http.api.rate["{#HTTP_METHOD}"]

Preprocessing:

- PROMETHEUS_PATTERN: consul_api_http_count{method = "{#HTTP_METHOD}"}: function: sum

⛔️ON_FAIL: DISCARD_VALUE ->

- CHANGE_PER_SECOND

Consul Consul: Raft state

Current state of Consul agent.

DEPENDENT consul.raft.state[{#SINGLETON}]

Preprocessing:

- JSONPATH: $.Stats.raft.state

- DISCARD_UNCHANGED_HEARTBEAT: 3h

Consul Consul: Raft state: leader

Increments when a server becomes a leader.

DEPENDENT consul.raft.state_leader[{#SINGLETON}]

Preprocessing:

- PROMETHEUS_PATTERN: consul_raft_state_leader

⛔️ON_FAIL: DISCARD_VALUE ->

Consul Consul: Raft state: candidate

The number of initiated leader elections.

DEPENDENT consul.raft.state_candidate[{#SINGLETON}]

Preprocessing:

- PROMETHEUS_PATTERN: consul_raft_state_candidate

⛔️ON_FAIL: DISCARD_VALUE ->

Consul Consul: Raft: apply, rate

Incremented whenever a leader first passes a message into the Raft commit process (called an Apply operation).

This metric describes the arrival rate of new logs into Raft per second.

DEPENDENT consul.raft.apply.rate[{#SINGLETON}]

Preprocessing:

- PROMETHEUS_PATTERN: consul_raft_apply

⛔️ON_FAIL: DISCARD_VALUE ->

- CHANGE_PER_SECOND

Consul Consul: Raft state: leader last contact, p90

The 90th percentile of how long it takes a leader node to communicate with followers during a leader lease check, in milliseconds.

DEPENDENT consul.raft.leader_last_contact.p90[{#SINGLETON}]

Preprocessing:

- PROMETHEUS_PATTERN: consul_raft_leader_lastContact{quantile="0.9"}

⛔️ON_FAIL: DISCARD_VALUE ->

- JAVASCRIPT: return (isNaN(value)) ? 0 : value;

Consul Consul: Raft state: leader last contact, p50

The 50th percentile (median) of how long it takes a leader node to communicate with followers during a leader lease check, in milliseconds.

DEPENDENT consul.raft.leader_last_contact.p50[{#SINGLETON}]

Preprocessing:

- PROMETHEUS_PATTERN: consul_raft_leader_lastContact{quantile="0.5"}

⛔️ON_FAIL: DISCARD_VALUE ->

- JAVASCRIPT: return (isNaN(value)) ? 0 : value;

Consul Consul: Raft state: commit time, p90

The 90th percentile of the time it takes to commit a new entry to the Raft log on the leader, in milliseconds.

DEPENDENT consul.raft.commit_time.p90[{#SINGLETON}]

Preprocessing:

- PROMETHEUS_PATTERN: consul_raft_commitTime{quantile="0.9"}

⛔️ON_FAIL: DISCARD_VALUE ->

- JAVASCRIPT: return (isNaN(value)) ? 0 : value;

Consul Consul: Raft state: commit time, p50

The 50th percentile (median) of the time it takes to commit a new entry to the Raft log on the leader, in milliseconds.

DEPENDENT consul.raft.commit_time.p50[{#SINGLETON}]

Preprocessing:

- PROMETHEUS_PATTERN: consul_raft_commitTime{quantile="0.5"}

⛔️ON_FAIL: DISCARD_VALUE ->

- JAVASCRIPT: return (isNaN(value)) ? 0 : value;

Consul Consul: Raft state: dispatch log, p90

The 90th percentile of the time it takes for the leader to write log entries to disk, in milliseconds.

DEPENDENT consul.raft.dispatch_log.p90[{#SINGLETON}]

Preprocessing:

- PROMETHEUS_PATTERN: consul_raft_leader_dispatchLog{quantile="0.9"}

⛔️ON_FAIL: DISCARD_VALUE ->

- JAVASCRIPT: return (isNaN(value)) ? 0 : value;

Consul Consul: Raft state: dispatch log, p50

The 50th percentile (median) of the time it takes for the leader to write log entries to disk, in milliseconds.

DEPENDENT consul.raft.dispatch_log.p50[{#SINGLETON}]

Preprocessing:

- PROMETHEUS_PATTERN: consul_raft_leader_dispatchLog{quantile="0.5"}

⛔️ON_FAIL: DISCARD_VALUE ->

- JAVASCRIPT: return (isNaN(value)) ? 0 : value;

Consul Consul: Raft state: dispatch log, rate

The number of times a Raft leader writes a log to disk per second.

DEPENDENT consul.raft.dispatch_log.rate[{#SINGLETON}]

Preprocessing:

- PROMETHEUS_PATTERN: consul_raft_leader_dispatchLog_count

⛔️ON_FAIL: DISCARD_VALUE ->

- CHANGE_PER_SECOND

Consul Consul: Raft state: commit, rate

The number of new entries committed to the Raft log on the leader per second.

DEPENDENT consul.raft.commit_time.rate[{#SINGLETON}]

Preprocessing:

- PROMETHEUS_PATTERN: consul_raft_commitTime_count

⛔️ON_FAIL: DISCARD_VALUE ->

- CHANGE_PER_SECOND

Consul Consul: Autopilot healthy

Tracks the overall health of the local server cluster. 1 if all servers are healthy, 0 if one or more are unhealthy.

DEPENDENT consul.autopilot.healthy[{#SINGLETON}]

Preprocessing:

- PROMETHEUS_PATTERN: consul_autopilot_healthy

⛔️ON_FAIL: DISCARD_VALUE ->

Zabbix raw items Consul: Get instance metrics

Get raw metrics from the Consul instance's /v1/agent/metrics endpoint.

HTTP_AGENT consul.get_metrics

Preprocessing:

- CHECK_NOT_SUPPORTED

⛔️ON_FAIL: DISCARD_VALUE ->

Zabbix raw items Consul: Get node info

Get configuration and member information of the local agent.

HTTP_AGENT consul.get_node_info

Preprocessing:

- CHECK_NOT_SUPPORTED

⛔️ON_FAIL: DISCARD_VALUE ->

Zabbix raw items Consul: Get local services

Get all the services that are registered with the local agent and their status.

SCRIPT consul.get_local_services

Expression:

The text is too long. Please see the template.
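
Several of the preprocessing chains listed for the items above repeat the same few steps. The sketch below mirrors three of them in Python as an illustration only: the NaN guard written in the template as `return (isNaN(value)) ? 0 : value;`, the 1.0E-9 multiplier used on the GC pause items, and a change-per-second calculation between two samples. The function names and sample numbers are hypothetical, and edge cases such as counter resets are not handled here.

```python
import math


def nan_to_zero(value: float) -> float:
    """Mirror of the JavaScript step `return (isNaN(value)) ? 0 : value;`."""
    return 0.0 if math.isnan(value) else value


def gc_pause_seconds(raw_ns: float) -> float:
    """Apply the NaN guard, then the 1.0E-9 multiplier (nanoseconds to seconds)."""
    return nan_to_zero(raw_ns) * 1.0e-9


def change_per_second(prev_value: float, prev_time: float,
                      value: float, time: float) -> float:
    """Rate between two counter samples: (value - prev_value) / (time - prev_time)."""
    return (value - prev_value) / (time - prev_time)


# A quantile with no observations is reported as NaN and becomes 0.
print(gc_pause_seconds(float("nan")))
# A 250 000 ns pause expressed in seconds.
print(gc_pause_seconds(250_000.0))
# A counter that grew from 100 to 160 over 60 seconds.
print(change_per_second(100.0, 0.0, 160.0, 60.0))
```

The NaN guard matters for summary quantiles, which report NaN when no observations fell into the current window; without it the dependent item would receive a non-numeric value.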

Triggers

Name Description Expression Severity Dependencies and additional info
Consul: Version has been changed

Consul version has changed. Ack to close.

last(/HashiCorp Consul Node by HTTP/consul.version,#1)<>last(/HashiCorp Consul Node by HTTP/consul.version,#2) and length(last(/HashiCorp Consul Node by HTTP/consul.version))>0 INFO

Manual close: YES

Consul: Current number of open files is too high

"Heavy file descriptor usage (i.e., near the process’s file descriptor limit) indicates a potential file descriptor exhaustion issue."

min(/HashiCorp Consul Node by HTTP/consul.process_open_fds,5m)/last(/HashiCorp Consul Node by HTTP/consul.process_max_fds)*100>{$CONSUL.OPEN.FDS.MAX.WARN} WARNING
Consul: Node's health score is warning

This metric ranges from 0 to 8, where 0 indicates "totally healthy".

This health score is used to scale the time between outgoing probes, and higher scores translate into longer probing intervals.

For more details see section IV of the Lifeguard paper: https://arxiv.org/pdf/1707.00788.pdf

max(/HashiCorp Consul Node by HTTP/consul.memberlist.health_score,#3)>{$CONSUL.NODE.HEALTH_SCORE.MAX.WARN} WARNING

Depends on:

- Consul: Node's health score is critical

Consul: Node's health score is critical

This metric ranges from 0 to 8, where 0 indicates "totally healthy".

This health score is used to scale the time between outgoing probes, and higher scores translate into longer probing intervals.

For more details see section IV of the Lifeguard paper: https://arxiv.org/pdf/1707.00788.pdf

max(/HashiCorp Consul Node by HTTP/consul.memberlist.health_score,#3)>{$CONSUL.NODE.HEALTH_SCORE.MAX.HIGH} AVERAGE
Consul: Failed to get local services

Failed to get local services. Check debug log for more information.

length(last(/HashiCorp Consul Node by HTTP/consul.get_local_services.check))>0 WARNING
Consul: Aggregated status is 'warning'

Aggregated state of service on the local agent is 'warning'.

last(/HashiCorp Consul Node by HTTP/consul.service.aggregated_state["{#SERVICE_ID}"]) = 1 WARNING
Consul: Aggregated status is 'critical'

Aggregated state of service on the local agent is 'critical'.

last(/HashiCorp Consul Node by HTTP/consul.service.aggregated_state["{#SERVICE_ID}"]) = 2 AVERAGE
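
To make the trigger logic above easier to follow, here is a small sketch that evaluates two of the expressions over plain lists of values. It is a reading aid under stated assumptions, not how Zabbix evaluates triggers internally: the function names are hypothetical, the default thresholds are the macro defaults from the Macros section, and the trigger dependency is modeled by reporting only the higher severity.

```python
from typing import Optional


def open_fds_too_high(open_fds_5m: list[float], max_fds: float,
                      warn_percent: float = 90.0) -> bool:
    """min(open_fds, 5m) / last(max_fds) * 100 > {$CONSUL.OPEN.FDS.MAX.WARN}."""
    return min(open_fds_5m) / max_fds * 100 > warn_percent


def health_score_severity(last_three: list[float], warn: float = 2.0,
                          high: float = 4.0) -> Optional[str]:
    """Return the severity that would fire for max(health_score, #3), if any."""
    worst = max(last_three)
    if worst > high:
        # The WARNING trigger depends on this one, so only AVERAGE is reported.
        return "AVERAGE"
    if worst > warn:
        return "WARNING"
    return None


print(open_fds_too_high([950, 980, 990], max_fds=1024))
print(health_score_severity([0, 3, 1]))
print(health_score_severity([5, 1, 0]))
```

Using the minimum over five minutes means the file descriptor trigger fires only when usage stayed above the threshold for the whole window, while the health score triggers use the maximum of the last three values and so react to a single high reading.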

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help with it at the Zabbix forums.
