Available solutions

HashiCorp Consul Node by HTTP
HashiCorp Consul Cluster by HTTP
3rd party solutions

This template is for Zabbix version: 7.4

Also available for: 7.2 7.0 6.4 6.2 6.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/consul_http/consul?at=release/7.4

HashiCorp Consul Node by HTTP

Overview

This template monitors HashiCorp Consul with Zabbix and works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Remember to enable the Prometheus format for exported metrics; see the Consul documentation.
More information about the metrics can be found in the official documentation.

The HashiCorp Consul Node by HTTP template collects metrics with an HTTP agent item from the /v1/agent/metrics endpoint.

Requirements

Zabbix version: 7.4 and higher.

Tested versions

This template has been tested on:

HashiCorp Consul 1.10.0

Configuration

Setup

Internal service metrics are collected from the /v1/agent/metrics endpoint. Remember to enable the Prometheus format for exported metrics; see the Consul documentation. The template requires authorization via an API token.

Remember to change the macros {$CONSUL.NODE.API.URL} and {$CONSUL.TOKEN}.
Also, see the Macros section for a list of macros used to set trigger values. More information about the metrics can be found in the official documentation.

This template supports Consul namespaces. To filter discovered services by namespace, set the macros {$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.MATCHES} and {$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.NOT_MATCHES}.
In the Open Source version, the service namespace is set to 'None'.

NOTE. Some metrics may not be collected depending on your HashiCorp Consul instance version and configuration.
NOTE. You may also be interested in the Envoy Proxy by HTTP template.
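
Before linking the template, it can help to confirm by hand that the endpoint answers with Prometheus-formatted text. The sketch below only builds the request; it is an illustration, not the template's own HTTP agent configuration. It assumes the `format=prometheus` query parameter of the Consul agent metrics API and assumes the token is sent as a Bearer token (Consul also accepts an `X-Consul-Token` header). The function name `build_metrics_request` is hypothetical. On the Consul side, the Prometheus format is typically enabled by setting `prometheus_retention_time` in the `telemetry` configuration block.

```python
import urllib.request


def build_metrics_request(api_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for Consul metrics in Prometheus format."""
    # api_url plays the role of {$CONSUL.NODE.API.URL}, token of {$CONSUL.TOKEN}.
    url = api_url.rstrip("/") + "/v1/agent/metrics?format=prometheus"
    request = urllib.request.Request(url, method="GET")
    # Assumption: Bearer-style authorization; adjust if your setup uses X-Consul-Token.
    request.add_header("Authorization", "Bearer " + token)
    return request


if __name__ == "__main__":
    req = build_metrics_request("http://localhost:8500", "<PUT YOUR AUTH TOKEN>")
    print(req.get_method(), req.full_url)
    # To actually send it against a running agent:
    # body = urllib.request.urlopen(req, timeout=5).read().decode()
```

If the response is JSON rather than lines such as `process_open_fds 42`, the Prometheus format is most likely not enabled on the agent.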

Macros used

Name	Description	Default
{$CONSUL.NODE.API.URL}	Consul instance URL.	`http://localhost:8500`
{$CONSUL.TOKEN}	Consul auth token.	`<PUT YOUR AUTH TOKEN>`
{$CONSUL.OPEN.FDS.MAX.WARN}	Maximum percentage of used file descriptors.	`90`
{$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.MATCHES}	Filter for discoverable services on the local node.	`.*`
{$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.NOT_MATCHES}	Filter to exclude discovered services on the local node.	`CHANGE IF NEEDED`
{$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.MATCHES}	Filter for discoverable services by namespace on the local node. Enterprise only; in the Open Source version the namespace is set to 'None'.	`.*`
{$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.NOT_MATCHES}	Filter to exclude discovered services by namespace on the local node. Enterprise only; in the Open Source version the namespace is set to 'None'.	`CHANGE IF NEEDED`
{$CONSUL.NODE.HEALTH_SCORE.MAX.WARN}	Maximum acceptable value of node's health score for WARNING trigger expression.	`2`
{$CONSUL.NODE.HEALTH_SCORE.MAX.HIGH}	Maximum acceptable value of node's health score for AVERAGE trigger expression.	`4`
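
The MATCHES / NOT_MATCHES macro pairs act as include and exclude regular expressions during low-level discovery. The following is a rough sketch of that behaviour using Python's `re` module as a stand-in for the regular expression engine Zabbix uses, so edge cases may differ. The service records and the helper names `passes_filter` and `filter_services` are hypothetical.

```python
import re


def passes_filter(value: str, matches: str, not_matches: str) -> bool:
    """Keep a value if it matches `matches` and does not match `not_matches`."""
    return re.search(matches, value) is not None and re.search(not_matches, value) is None


def filter_services(services, name_matches=".*", name_not_matches="CHANGE IF NEEDED",
                    ns_matches=".*", ns_not_matches="CHANGE IF NEEDED"):
    """Return the names of services that pass both the name and the namespace filters."""
    kept = []
    for service in services:
        name_ok = passes_filter(service["name"], name_matches, name_not_matches)
        ns_ok = passes_filter(service["namespace"], ns_matches, ns_not_matches)
        if name_ok and ns_ok:
            kept.append(service["name"])
    return kept


# Hypothetical services; in the Open Source version the namespace is 'None'.
SERVICES = [
    {"name": "web", "namespace": "None"},
    {"name": "db", "namespace": "None"},
    {"name": "consul-esm", "namespace": "None"},
]

if __name__ == "__main__":
    print(filter_services(SERVICES))                               # defaults keep everything
    print(filter_services(SERVICES, name_not_matches="^consul-"))  # excludes consul-esm
```

The default `CHANGE IF NEEDED` works as a placeholder because it is a pattern that no realistic service name or namespace contains, so nothing is excluded until it is changed.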

Items

Name	Description	Type	Key and additional info
Get instance metrics	Get raw metrics from Consul instance /metrics endpoint.	HTTP agent	consul.get_metrics Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Get node info	Get configuration and member information of the local agent.	HTTP agent	consul.get_node_info Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Role	Role of current Consul agent.	Dependent item	consul.role Preprocessing JSON Path: `$.Config.Server` Boolean to decimal Discard unchanged with heartbeat: `3h`
Version	Version of Consul agent.	Dependent item	consul.version Preprocessing JSON Path: `$.Config.Version` Discard unchanged with heartbeat: `3h`
Number of services	Number of services on current node.	Dependent item	consul.services_number Preprocessing JSON Path: `$.Stats.agent.services` Discard unchanged with heartbeat: `3h`
Number of checks	Number of checks on current node.	Dependent item	consul.checks_number Preprocessing JSON Path: `$.Stats.agent.checks` Discard unchanged with heartbeat: `3h`
Number of check monitors	Number of check monitors on current node.	Dependent item	consul.check_monitors_number Preprocessing JSON Path: `$.Stats.agent.check_monitors` Discard unchanged with heartbeat: `3h`
Process CPU seconds, total	Total user and system CPU time spent in seconds.	Dependent item	consul.cpu_seconds_total.rate Preprocessing Prometheus pattern: `VALUE(process_cpu_seconds_total)` ⛔️Custom on fail: Discard value Change per second
Virtual memory size	Virtual memory size in bytes.	Dependent item	consul.virtual_memory_bytes Preprocessing Prometheus pattern: `VALUE(process_virtual_memory_bytes)`
RSS memory usage	Resident memory size in bytes.	Dependent item	consul.resident_memory_bytes Preprocessing Prometheus pattern: `VALUE(process_resident_memory_bytes)`
Goroutine count	The number of Goroutines on Consul instance.	Dependent item	consul.goroutines Preprocessing Prometheus pattern: `VALUE(go_goroutines)`
Open file descriptors	Number of open file descriptors.	Dependent item	consul.process_open_fds Preprocessing Prometheus pattern: `VALUE(process_open_fds)`
Open file descriptors, max	Maximum number of open file descriptors.	Dependent item	consul.process_max_fds Preprocessing Prometheus pattern: `VALUE(process_max_fds)`
Client RPC, per second	Number of times per second that a Consul agent in client mode makes an RPC request to a Consul server. This gives a measure of how much a given agent is loading the Consul servers. This is only generated by agents in client mode, not Consul servers.	Dependent item	consul.client_rpc Preprocessing Prometheus pattern: `VALUE(consul_client_rpc)` ⛔️Custom on fail: Discard value Change per second
Client RPC failed, per second	Number of times per second that a Consul agent in client mode makes an RPC request to a Consul server and fails.	Dependent item	consul.client_rpc_failed Preprocessing Prometheus pattern: `VALUE(consul_client_rpc_failed)` ⛔️Custom on fail: Discard value Change per second
TCP connections, accepted per second	This metric counts the number of times a Consul agent has accepted an incoming TCP stream connection per second.	Dependent item	consul.memberlist.tcp_accept Preprocessing Prometheus pattern: `VALUE(consul_memberlist_tcp_accept)` ⛔️Custom on fail: Discard value Change per second
TCP connections, per second	This metric counts the number of times a Consul agent has initiated a push/pull sync with another agent per second.	Dependent item	consul.memberlist.tcp_connect Preprocessing Prometheus pattern: `VALUE(consul_memberlist_tcp_connect)` ⛔️Custom on fail: Discard value Change per second
TCP send bytes, per second	This metric measures the total number of bytes sent by a Consul agent through the TCP protocol per second.	Dependent item	consul.memberlist.tcp_sent Preprocessing Prometheus pattern: `VALUE(consul_memberlist_tcp_sent)` ⛔️Custom on fail: Discard value Change per second
UDP received bytes, per second	This metric measures the total number of bytes received by a Consul agent through the UDP protocol per second.	Dependent item	consul.memberlist.udp_received Preprocessing Prometheus pattern: `VALUE(consul_memberlist_udp_received)` ⛔️Custom on fail: Discard value Change per second
UDP sent bytes, per second	This metric measures the total number of bytes sent by a Consul agent through the UDP protocol per second.	Dependent item	consul.memberlist.udp_sent Preprocessing Prometheus pattern: `VALUE(consul_memberlist_udp_sent)` ⛔️Custom on fail: Discard value Change per second
GC pause, p90	The 90 percentile for the number of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started, in milliseconds.	Dependent item	consul.gc_pause.p90 Preprocessing Prometheus pattern: `VALUE(consul_runtime_gc_pause_ns{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.` Custom multiplier: `1.0E-9`
GC pause, p50	The 50 percentile (median) for the number of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started, in milliseconds.	Dependent item	consul.gc_pause.p50 Preprocessing Prometheus pattern: `VALUE(consul_runtime_gc_pause_ns{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.` Custom multiplier: `1.0E-9`
Memberlist: degraded	This metric counts the number of times the Consul agent has performed failure detection on another agent at a slower probe rate. The agent uses its own health metric as an indicator to perform this action. If its health score is low, it means that the node is healthy, and vice versa.	Dependent item	consul.memberlist.degraded Preprocessing Prometheus pattern: `VALUE(consul_memberlist_degraded)` ⛔️Custom on fail: Discard value
Memberlist: health score	This metric describes a node's perception of its own health based on how well it is meeting the soft real-time requirements of the protocol. This metric ranges from 0 to 8, where 0 indicates "totally healthy".	Dependent item	consul.memberlist.health_score Preprocessing Prometheus pattern: `VALUE(consul_memberlist_health_score)` ⛔️Custom on fail: Discard value
Memberlist: gossip, p90	The 90 percentile for the number of gossips (messages) broadcasted to a set of randomly selected nodes.	Dependent item	consul.memberlist.dispatch_log.p90 Preprocessing Prometheus pattern: `VALUE(consul_memberlist_gossip{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Memberlist: gossip, p50	The 50 percentile (median) for the number of gossips (messages) broadcasted to a set of randomly selected nodes.	Dependent item	consul.memberlist.gossip.p50 Preprocessing Prometheus pattern: `VALUE(consul_memberlist_gossip{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Memberlist: msg alive	This metric counts the number of alive Consul agents, that the agent has mapped out so far, based on the message information given by the network layer.	Dependent item	consul.memberlist.msg.alive Preprocessing Prometheus pattern: `VALUE(consul_memberlist_msg_alive)` ⛔️Custom on fail: Discard value
Memberlist: msg dead	This metric counts the number of times a Consul agent has marked another agent to be a dead node.	Dependent item	consul.memberlist.msg.dead Preprocessing Prometheus pattern: `VALUE(consul_memberlist_msg_dead)` ⛔️Custom on fail: Discard value
Memberlist: msg suspect	The number of times a Consul agent suspects another as failed while probing during gossip protocol.	Dependent item	consul.memberlist.msg.suspect Preprocessing Prometheus pattern: `VALUE(consul_memberlist_msg_suspect)` ⛔️Custom on fail: Discard value
Memberlist: probe node, p90	The 90 percentile for the time taken to perform a single round of failure detection on a select Consul agent.	Dependent item	consul.memberlist.probe_node.p90 Preprocessing Prometheus pattern: `VALUE(consul_memberlist_probeNode{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Memberlist: probe node, p50	The 50 percentile (median) for the time taken to perform a single round of failure detection on a select Consul agent.	Dependent item	consul.memberlist.probe_node.p50 Preprocessing Prometheus pattern: `VALUE(consul_memberlist_probeNode{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Memberlist: push pull node, p90	The 90 percentile for the number of Consul agents that have exchanged state with this agent.	Dependent item	consul.memberlist.push_pull_node.p90 Preprocessing Prometheus pattern: `VALUE(consul_memberlist_pushPullNode{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Memberlist: push pull node, p50	The 50 percentile (median) for the number of Consul agents that have exchanged state with this agent.	Dependent item	consul.memberlist.push_pull_node.p50 Preprocessing Prometheus pattern: `VALUE(consul_memberlist_pushPullNode{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
KV store: apply, p90	The 90 percentile for the time it takes to complete an update to the KV store.	Dependent item	consul.kvs.apply.p90 Preprocessing Prometheus pattern: `VALUE(consul_kvs_apply{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
KV store: apply, p50	The 50 percentile (median) for the time it takes to complete an update to the KV store.	Dependent item	consul.kvs.apply.p50 Preprocessing Prometheus pattern: `VALUE(consul_kvs_apply{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
KV store: apply, rate	The number of updates to the KV store per second.	Dependent item	consul.kvs.apply.rate Preprocessing Prometheus pattern: `VALUE(consul_kvs_apply_count)` ⛔️Custom on fail: Discard value Change per second
Serf member: flap, rate	Increments when an agent is marked dead and then recovers within a short time period. This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports. Shown as events per second.	Dependent item	consul.serf.member.flap.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_member_flap)` ⛔️Custom on fail: Discard value Change per second
Serf member: failed, rate	Increments when an agent is marked dead. This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports. Shown as events per second.	Dependent item	consul.serf.member.failed.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_member_failed)` ⛔️Custom on fail: Discard value Change per second
Serf member: join, rate	Increments when an agent joins the cluster. If an agent flapped or failed this counter also increments when it re-joins. Shown as events per second.	Dependent item	consul.serf.member.join.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_member_join)` ⛔️Custom on fail: Discard value Change per second
Serf member: left, rate	Increments when an agent leaves the cluster. Shown as events per second.	Dependent item	consul.serf.member.left.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_member_left)` ⛔️Custom on fail: Discard value Change per second
Serf member: update, rate	Increments when a Consul agent updates. Shown as events per second.	Dependent item	consul.serf.member.update.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_member_update)` ⛔️Custom on fail: Discard value Change per second
ACL: resolves, rate	The number of ACL resolves per second.	Dependent item	consul.acl.resolves.rate Preprocessing Prometheus pattern: `VALUE(consul_acl_ResolveToken_count)` ⛔️Custom on fail: Discard value Change per second
Catalog: register, rate	The number of catalog register operations per second.	Dependent item	consul.catalog.register.rate Preprocessing Prometheus pattern: `VALUE(consul_catalog_register_count)` ⛔️Custom on fail: Discard value Change per second
Catalog: deregister, rate	The number of catalog deregister operations per second.	Dependent item	consul.catalog.deregister.rate Preprocessing Prometheus pattern: `VALUE(consul_catalog_deregister_count)` ⛔️Custom on fail: Discard value Change per second
Snapshot: append line, p90	The 90 percentile for the time taken by the Consul agent to append an entry into the existing log.	Dependent item	consul.snapshot.append_line.p90 Preprocessing Prometheus pattern: `VALUE(consul_serf_snapshot_appendLine{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Snapshot: append line, p50	The 50 percentile (median) for the time taken by the Consul agent to append an entry into the existing log.	Dependent item	consul.snapshot.append_line.p50 Preprocessing Prometheus pattern: `VALUE(consul_serf_snapshot_appendLine{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Snapshot: append line, rate	The number of snapshot appendLine operations per second.	Dependent item	consul.snapshot.append_line.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_snapshot_appendLine_count)` ⛔️Custom on fail: Discard value Change per second
Snapshot: compact, p90	The 90 percentile for the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction.	Dependent item	consul.snapshot.compact.p90 Preprocessing Prometheus pattern: `VALUE(consul_serf_snapshot_compact{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Snapshot: compact, p50	The 50 percentile (median) for the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction.	Dependent item	consul.snapshot.compact.p50 Preprocessing Prometheus pattern: `VALUE(consul_serf_snapshot_compact{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Snapshot: compact, rate	The number of snapshot compact operations per second.	Dependent item	consul.snapshot.compact.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_snapshot_compact_count)` ⛔️Custom on fail: Discard value Change per second
Get local services	Get all the services that are registered with the local agent and their status.	Script	consul.get_local_services
Get local services check	Data collection check.	Dependent item	consul.get_local_services.check Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
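
Most rows above follow one pattern: a dependent item picks a single sample out of the raw text returned by "Get instance metrics" with a Prometheus pattern such as `VALUE(process_open_fds)`, and counters are then turned into rates with "Change per second". The sketch below imitates those two steps in Python to show the idea; it is not Zabbix's implementation. The sample text and the helper names are hypothetical, and the parser is simplified: it ignores optional timestamps and returns the first matching sample.

```python
import re

# Hypothetical excerpt of a Prometheus-format response; the values are invented.
SAMPLE = """\
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 42
process_max_fds 1024
consul_runtime_gc_pause_ns{quantile="0.5"} 120000
consul_runtime_gc_pause_ns{quantile="0.9"} 450000
"""

LINE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"')


def prometheus_value(text, metric, labels=None):
    """Return the value of the first sample matching the metric name and labels."""
    wanted = labels or {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if not match or match.group(1) != metric:
            continue
        found = dict(LABEL_RE.findall(match.group(2) or ""))
        if all(found.get(key) == val for key, val in wanted.items()):
            return float(match.group(3))
    return None  # loosely mirrors "Custom on fail: Discard value"


def change_per_second(prev_value, prev_time, value, time):
    """Rate between two counter readings: (value - prev_value) / (time - prev_time)."""
    return (value - prev_value) / (time - prev_time)


if __name__ == "__main__":
    print(prometheus_value(SAMPLE, "process_open_fds"))
    print(prometheus_value(SAMPLE, "consul_runtime_gc_pause_ns", {"quantile": "0.9"}))
    print(change_per_second(100.0, 0.0, 160.0, 60.0))
```

Returning `None` for a missing metric reflects the note in Setup that some metrics may not be exposed, depending on the Consul version and configuration.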

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
HashiCorp Consul Node: Version has been changed	Consul version has changed. Acknowledge to close the problem manually.	`last(/HashiCorp Consul Node by HTTP/consul.version,#1)<>last(/HashiCorp Consul Node by HTTP/consul.version,#2) and length(last(/HashiCorp Consul Node by HTTP/consul.version))>0`	Info	Manual close: Yes
HashiCorp Consul Node: Current number of open files is too high	"Heavy file descriptor usage (i.e., near the process’s file descriptor limit) indicates a potential file descriptor exhaustion issue."	`min(/HashiCorp Consul Node by HTTP/consul.process_open_fds,5m)/last(/HashiCorp Consul Node by HTTP/consul.process_max_fds)*100>{$CONSUL.OPEN.FDS.MAX.WARN}`	Warning
HashiCorp Consul Node: Node's health score is warning	This metric ranges from 0 to 8, where 0 indicates "totally healthy". This health score is used to scale the time between outgoing probes, and higher scores translate into longer probing intervals. For more details see section IV of the Lifeguard paper: https://arxiv.org/pdf/1707.00788.pdf	`max(/HashiCorp Consul Node by HTTP/consul.memberlist.health_score,#3)>{$CONSUL.NODE.HEALTH_SCORE.MAX.WARN}`	Warning	Depends on: HashiCorp Consul Node: Node's health score is critical
HashiCorp Consul Node: Node's health score is critical	This metric ranges from 0 to 8, where 0 indicates "totally healthy". This health score is used to scale the time between outgoing probes, and higher scores translate into longer probing intervals. For more details see section IV of the Lifeguard paper: https://arxiv.org/pdf/1707.00788.pdf	`max(/HashiCorp Consul Node by HTTP/consul.memberlist.health_score,#3)>{$CONSUL.NODE.HEALTH_SCORE.MAX.HIGH}`	Average
HashiCorp Consul Node: Failed to get local services	Failed to get local services. Check debug log for more information.	`length(last(/HashiCorp Consul Node by HTTP/consul.get_local_services.check))>0`	Warning
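
To make the arithmetic in two of these expressions concrete, here is a small Python sketch that evaluates them on plain lists of recent values. It only approximates the trigger logic: the function names and sample numbers are hypothetical, and the default thresholds are taken from the Macros table.

```python
def open_fds_too_high(open_fds_last_5m, max_fds_last, warn_percent=90):
    """min(open_fds, 5m) / last(max_fds) * 100 > {$CONSUL.OPEN.FDS.MAX.WARN}"""
    # Using the minimum over the window means a short spike does not fire the trigger.
    return min(open_fds_last_5m) / max_fds_last * 100 > warn_percent


def health_score_severity(last_three_scores, warn=2, high=4):
    """Evaluate max(health_score, #3) against both thresholds.

    Returns "Average" for the 'critical' trigger, "Warning" for the 'warning'
    trigger, or None. Because the warning trigger depends on the critical one,
    only the higher severity is reported when both conditions hold.
    """
    worst = max(last_three_scores)
    if worst > high:
        return "Average"
    if worst > warn:
        return "Warning"
    return None


if __name__ == "__main__":
    print(open_fds_too_high([950, 960, 930], 1024))
    print(health_score_severity([0, 3, 1]))
```

Note that both comparisons are strict, so a health score equal to the threshold does not fire the trigger.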

LLD rule Local node services discovery

Name	Description	Type	Key and additional info
Local node services discovery	Discover metrics for services that are registered with the local agent.	Dependent item	consul.node_services_lld Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Local node services discovery

Name	Description	Type	Key and additional info
["{#SERVICE_NAME}"]: Aggregated status	Aggregated values of all health checks for the service instance.	Dependent item	consul.service.aggregated_state["{#SERVICE_ID}"] Preprocessing JSON Path: `$[?(@.id == "{#SERVICE_ID}")].status.first()` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
["{#SERVICE_NAME}"]: Check ["{#SERVICE_CHECK_NAME}"]: Status	Current state of health check for the service.	Dependent item	consul.service.check.state["{#SERVICE_ID}/{#SERVICE_CHECK_ID}"] Preprocessing JSON Path: `The text is too long. Please see the template.` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
["{#SERVICE_NAME}"]: Check ["{#SERVICE_CHECK_NAME}"]: Output	Current output of health check for the service.	Dependent item	consul.service.check.output["{#SERVICE_ID}/{#SERVICE_CHECK_ID}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
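
The JSON Path `$[?(@.id == "{#SERVICE_ID}")].status.first()` selects, from an array of service objects, the `status` of the first element whose `id` equals the discovered service ID. A Python equivalent of that selection is sketched below. The payload shape is inferred from the path only, and the numeric mapping is partly an assumption: the trigger prototypes use 1 for 'warning' and 2 for 'critical', while 0 for 'passing' is a guess, since the template's JavaScript step is not shown here.

```python
# Assumption: 1 and 2 come from the trigger prototypes; 0 for "passing" is a guess.
STATUS_CODES = {"passing": 0, "warning": 1, "critical": 2}

# Hypothetical output of the "Get local services" item.
SERVICES = [
    {"id": "web-1", "name": "web", "status": "passing"},
    {"id": "db-1", "name": "db", "status": "critical"},
]


def aggregated_status(services, service_id):
    """Mimic $[?(@.id == "<service_id>")].status.first(), then map the status to a number."""
    statuses = [svc["status"] for svc in services if svc.get("id") == service_id]
    if not statuses:
        return None  # no element matched the filter
    return STATUS_CODES.get(statuses[0])


if __name__ == "__main__":
    print(aggregated_status(SERVICES, "db-1"))
```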

Trigger prototypes for Local node services discovery

Name	Description	Expression	Severity	Dependencies and additional info
HashiCorp Consul Node: Aggregated status is 'warning'	Aggregated state of service on the local agent is 'warning'.	`last(/HashiCorp Consul Node by HTTP/consul.service.aggregated_state["{#SERVICE_ID}"]) = 1`	Warning
HashiCorp Consul Node: Aggregated status is 'critical'	Aggregated state of service on the local agent is 'critical'.	`last(/HashiCorp Consul Node by HTTP/consul.service.aggregated_state["{#SERVICE_ID}"]) = 2`	Average

LLD rule HTTP API methods discovery

Name	Description	Type	Key and additional info
HTTP API methods discovery	Discover HTTP API method-specific metrics.	Dependent item	consul.http_api_discovery Preprocessing Prometheus to JSON: `consul_api_http{method =~ ".*"}` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for HTTP API methods discovery

Name	Description	Type	Key and additional info
HTTP request: ["{#HTTP_METHOD}"], p90	The 90 percentile of how long it takes to service the given HTTP request for the given verb.	Dependent item	consul.http.api.p90["{#HTTP_METHOD}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
HTTP request: ["{#HTTP_METHOD}"], p50	The 50 percentile (median) of how long it takes to service the given HTTP request for the given verb.	Dependent item	consul.http.api.p50["{#HTTP_METHOD}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
HTTP request: ["{#HTTP_METHOD}"], rate	The number of HTTP requests for the given verb per second.	Dependent item	consul.http.api.rate["{#HTTP_METHOD}"] Preprocessing Prometheus pattern: `SUM(consul_api_http_count{method = "{#HTTP_METHOD}"})` ⛔️Custom on fail: Discard value Change per second
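
The discovery rule and the `SUM(...)` pattern work together: discovery finds the HTTP verbs present in `consul_api_http`, and each rate prototype then sums `consul_api_http_count` over all samples that share one verb. The Python sketch below imitates both steps on invented data. The de-duplication of methods is an assumption about what the template's JavaScript step does, and the sample lines and helper names are hypothetical.

```python
import json
import re

# Hypothetical sample in Prometheus text format; paths and values are invented.
SAMPLE = """\
consul_api_http{method="GET",path="v1_agent_self",quantile="0.9"} 0.9
consul_api_http{method="PUT",path="v1_kv",quantile="0.9"} 1.7
consul_api_http_count{method="GET",path="v1_agent_self"} 10
consul_api_http_count{method="GET",path="v1_agent_metrics"} 5
consul_api_http_count{method="PUT",path="v1_kv"} 3
"""

SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Yield (metric_name, labels, value) for each labelled sample line."""
    for line in text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match:
            labels = dict(LABEL_RE.findall(match.group(2)))
            yield match.group(1), labels, float(match.group(3))


def discover_http_methods(text):
    """Build LLD-style JSON with one {#HTTP_METHOD} entry per distinct method."""
    methods = sorted({labels["method"] for name, labels, _ in parse_samples(text)
                      if name == "consul_api_http" and "method" in labels})
    return json.dumps([{"{#HTTP_METHOD}": method} for method in methods])


def sum_by_method(text, metric, method):
    """Rough equivalent of SUM(metric{method = "<method>"})."""
    return sum(value for name, labels, value in parse_samples(text)
               if name == metric and labels.get("method") == method)


if __name__ == "__main__":
    print(discover_http_methods(SAMPLE))
    print(sum_by_method(SAMPLE, "consul_api_http_count", "GET"))
```

Summing is needed because the same verb appears once per request path; the per-verb total is what "Change per second" then converts into a request rate.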

LLD rule Raft server metrics discovery

Name	Description	Type	Key and additional info
Raft server metrics discovery	Discover raft metrics for server nodes.	Dependent item	consul.raft.server.discovery Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Raft server metrics discovery

Name	Description	Type	Key and additional info
Raft state	Current state of Consul agent.	Dependent item	consul.raft.state[{#SINGLETON}] Preprocessing JSON Path: `$.Stats.raft.state` Discard unchanged with heartbeat: `3h`
Raft state: leader	Increments when a server becomes a leader.	Dependent item	consul.raft.state_leader[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_state_leader)` ⛔️Custom on fail: Discard value
Raft state: candidate	The number of initiated leader elections.	Dependent item	consul.raft.state_candidate[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_state_candidate)` ⛔️Custom on fail: Discard value
Raft: apply, rate	Incremented whenever a leader first passes a message into the Raft commit process (called an Apply operation). This metric describes the arrival rate of new logs into Raft per second.	Dependent item	consul.raft.apply.rate[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_apply)` ⛔️Custom on fail: Discard value Change per second
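
The `{#SINGLETON}` macro in these keys suggests a common low-level discovery idiom: the rule returns exactly one entry when a condition holds and an empty list otherwise, so the item prototypes exist only on nodes where they make sense. The template's actual JavaScript is not shown here, so the sketch below is an assumption about how such a rule could work. It uses the `$.Config.Server` field that the Role item reads; the function name is hypothetical.

```python
import json


def raft_server_discovery(node_info_json: str) -> str:
    """Return one {#SINGLETON} entry for server agents and an empty list for clients."""
    info = json.loads(node_info_json)
    # Assumption: server mode is detected from Config.Server, as in the Role item.
    is_server = bool(info.get("Config", {}).get("Server"))
    rows = [{"{#SINGLETON}": ""}] if is_server else []
    return json.dumps(rows)


if __name__ == "__main__":
    print(raft_server_discovery('{"Config": {"Server": true}}'))
    print(raft_server_discovery('{"Config": {"Server": false}}'))
```

With an empty macro value the discovered keys render as, for example, `consul.raft.state[]`, and client-mode agents simply get no Raft items.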

LLD rule Raft leader metrics discovery

Name	Description	Type	Key and additional info
Raft leader metrics discovery	Discover raft metrics for leader nodes.	Dependent item	consul.raft.leader.discovery Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Raft leader metrics discovery

Name	Description	Type	Key and additional info
Raft state: leader last contact, p90	The 90 percentile of how long it takes a leader node to communicate with followers during a leader lease check, in milliseconds.	Dependent item	consul.raft.leader_last_contact.p90[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_leader_lastContact{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Raft state: leader last contact, p50	The 50 percentile (median) of how long it takes a leader node to communicate with followers during a leader lease check, in milliseconds.	Dependent item	consul.raft.leader_last_contact.p50[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_leader_lastContact{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Raft state: commit time, p90	The 90 percentile time it takes to commit a new entry to the raft log on the leader, in milliseconds.	Dependent item	consul.raft.commit_time.p90[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_commitTime{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Raft state: commit time, p50	The 50 percentile (median) time it takes to commit a new entry to the raft log on the leader, in milliseconds.	Dependent item	consul.raft.commit_time.p50[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_commitTime{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Raft state: dispatch log, p90	The 90 percentile time it takes for the leader to write log entries to disk, in milliseconds.	Dependent item	consul.raft.dispatch_log.p90[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_leader_dispatchLog{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Raft state: dispatch log, p50	The 50 percentile (median) time it takes for the leader to write log entries to disk, in milliseconds.	Dependent item	consul.raft.dispatch_log.p50[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_leader_dispatchLog{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Raft state: dispatch log, rate	The number of times a Raft leader writes a log to disk per second.	Dependent item	consul.raft.dispatch_log.rate[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_leader_dispatchLog_count)` ⛔️Custom on fail: Discard value Change per second
Raft state: commit, rate	The number of new entries committed to the Raft log on the leader per second.	Dependent item	consul.raft.commit_time.rate[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_commitTime_count)` ⛔️Custom on fail: Discard value Change per second
Autopilot healthy	Tracks the overall health of the local server cluster. 1 if all servers are healthy, 0 if one or more are unhealthy.	Dependent item	consul.autopilot.healthy[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_autopilot_healthy)` ⛔️Custom on fail: Discard value

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help on the Zabbix forums.

This template is for Zabbix version: 7.2

Also available for: 7.4 7.0 6.4 6.2 6.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/consul_http/consul?at=release/7.2

HashiCorp Consul Node by HTTP

Overview

The HashiCorp Consul Node by HTTP template collects metrics with an HTTP agent item from the /v1/agent/metrics endpoint.

Requirements

Zabbix version: 7.2 and higher.

Tested versions

This template has been tested on:

HashiCorp Consul 1.10.0

Configuration

Setup

NOTE. Some metrics may not be collected depending on your HashiCorp Consul instance version and configuration.
NOTE. You may also be interested in the Envoy Proxy by HTTP template.

Macros used

Name	Description	Default
{$CONSUL.NODE.API.URL}	Consul instance URL.	`http://localhost:8500`
{$CONSUL.TOKEN}	Consul auth token.	`<PUT YOUR AUTH TOKEN>`
{$CONSUL.OPEN.FDS.MAX.WARN}	Maximum percentage of used file descriptors.	`90`
{$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.MATCHES}	Filter to include discovered services on the local node.	`.*`
{$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.NOT_MATCHES}	Filter to exclude discovered services on the local node.	`CHANGE IF NEEDED`
{$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.MATCHES}	Filter to include discovered services by namespace on the local node. Enterprise only; in the Open Source version the namespace is set to 'None'.	`.*`
{$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.NOT_MATCHES}	Filter to exclude discovered services by namespace on the local node. Enterprise only; in the Open Source version the namespace is set to 'None'.	`CHANGE IF NEEDED`
{$CONSUL.NODE.HEALTH_SCORE.MAX.WARN}	Maximum acceptable value of node's health score for WARNING trigger expression.	`2`
{$CONSUL.NODE.HEALTH_SCORE.MAX.HIGH}	Maximum acceptable value of node's health score for AVERAGE trigger expression.	`4`
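
The `MATCHES` / `NOT_MATCHES` macro pairs above act as include and exclude filters for discovered services. A minimal Python sketch of that logic follows, assuming a service is kept only when every include pattern matches and no exclude pattern matches. Python's `re` module stands in for the regular expression engine used by Zabbix, and the service names and the `name` / `namespace` dictionary keys are hypothetical.

```python
import re


def passes_filter(value, matches, not_matches):
    """Keep a value if it matches `matches` and does not match `not_matches`."""
    return (re.search(matches, value) is not None
            and re.search(not_matches, value) is None)


def filter_services(services,
                    name_matches=".*",
                    name_not_matches="CHANGE IF NEEDED",
                    ns_matches=".*",
                    ns_not_matches="CHANGE IF NEEDED"):
    """Apply the service-name and namespace filters; defaults mirror the macro defaults."""
    return [
        svc for svc in services
        if passes_filter(svc["name"], name_matches, name_not_matches)
        and passes_filter(svc["namespace"], ns_matches, ns_not_matches)
    ]


# Hypothetical discovered services (namespace 'None' as on the Open Source version).
services = [
    {"name": "web", "namespace": "None"},
    {"name": "web-sidecar-proxy", "namespace": "None"},
    {"name": "db", "namespace": "None"},
]

# Defaults keep everything; a NOT_MATCHES pattern drops the sidecar proxy.
print([s["name"] for s in filter_services(services)])
print([s["name"] for s in filter_services(services, name_not_matches="-sidecar-proxy$")])
```

With the default values nothing is excluded in practice, because the placeholder `CHANGE IF NEEDED` is unlikely to match a real service name or namespace.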

Items

Name	Description	Type	Key and additional info
Get instance metrics	Get raw metrics from the Consul instance /v1/agent/metrics endpoint.	HTTP agent	consul.get_metrics Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Get node info	Get configuration and member information of the local agent.	HTTP agent	consul.get_node_info Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Role	Role of current Consul agent.	Dependent item	consul.role Preprocessing JSON Path: `$.Config.Server` Boolean to decimal Discard unchanged with heartbeat: `3h`
Version	Version of Consul agent.	Dependent item	consul.version Preprocessing JSON Path: `$.Config.Version` Discard unchanged with heartbeat: `3h`
Number of services	Number of services on current node.	Dependent item	consul.services_number Preprocessing JSON Path: `$.Stats.agent.services` Discard unchanged with heartbeat: `3h`
Number of checks	Number of checks on current node.	Dependent item	consul.checks_number Preprocessing JSON Path: `$.Stats.agent.checks` Discard unchanged with heartbeat: `3h`
Number of check monitors	Number of check monitors on current node.	Dependent item	consul.check_monitors_number Preprocessing JSON Path: `$.Stats.agent.check_monitors` Discard unchanged with heartbeat: `3h`
Process CPU seconds, total	Total user and system CPU time spent in seconds.	Dependent item	consul.cpu_seconds_total.rate Preprocessing Prometheus pattern: `VALUE(process_cpu_seconds_total)` ⛔️Custom on fail: Discard value Change per second
Virtual memory size	Virtual memory size in bytes.	Dependent item	consul.virtual_memory_bytes Preprocessing Prometheus pattern: `VALUE(process_virtual_memory_bytes)`
RSS memory usage	Resident memory size in bytes.	Dependent item	consul.resident_memory_bytes Preprocessing Prometheus pattern: `VALUE(process_resident_memory_bytes)`
Goroutine count	The number of Goroutines on Consul instance.	Dependent item	consul.goroutines Preprocessing Prometheus pattern: `VALUE(go_goroutines)`
Open file descriptors	Number of open file descriptors.	Dependent item	consul.process_open_fds Preprocessing Prometheus pattern: `VALUE(process_open_fds)`
Open file descriptors, max	Maximum number of open file descriptors.	Dependent item	consul.process_max_fds Preprocessing Prometheus pattern: `VALUE(process_max_fds)`
Client RPC, per second	Number of RPC requests per second that a Consul agent in client mode makes to a Consul server. This gives a measure of how much a given agent is loading the Consul servers. This is only generated by agents in client mode, not Consul servers.	Dependent item	consul.client_rpc Preprocessing Prometheus pattern: `VALUE(consul_client_rpc)` ⛔️Custom on fail: Discard value Change per second
Client RPC failed, per second	Number of failed RPC requests per second that a Consul agent in client mode makes to a Consul server.	Dependent item	consul.client_rpc_failed Preprocessing Prometheus pattern: `VALUE(consul_client_rpc_failed)` ⛔️Custom on fail: Discard value Change per second
TCP connections, accepted per second	This metric counts the number of times a Consul agent has accepted an incoming TCP stream connection per second.	Dependent item	consul.memberlist.tcp_accept Preprocessing Prometheus pattern: `VALUE(consul_memberlist_tcp_accept)` ⛔️Custom on fail: Discard value Change per second
TCP connections, per second	This metric counts the number of times a Consul agent has initiated a push/pull sync with another agent per second.	Dependent item	consul.memberlist.tcp_connect Preprocessing Prometheus pattern: `VALUE(consul_memberlist_tcp_connect)` ⛔️Custom on fail: Discard value Change per second
TCP send bytes, per second	This metric measures the total number of bytes sent by a Consul agent through the TCP protocol per second.	Dependent item	consul.memberlist.tcp_sent Preprocessing Prometheus pattern: `VALUE(consul_memberlist_tcp_sent)` ⛔️Custom on fail: Discard value Change per second
UDP received bytes, per second	This metric measures the total number of bytes received by a Consul agent through the UDP protocol per second.	Dependent item	consul.memberlist.udp_received Preprocessing Prometheus pattern: `VALUE(consul_memberlist_udp_received)` ⛔️Custom on fail: Discard value Change per second
UDP sent bytes, per second	This metric measures the total number of bytes sent by a Consul agent through the UDP protocol per second.	Dependent item	consul.memberlist.udp_sent Preprocessing Prometheus pattern: `VALUE(consul_memberlist_udp_sent)` ⛔️Custom on fail: Discard value Change per second
GC pause, p90	The 90th percentile of the time consumed by stop-the-world garbage collection (GC) pauses since Consul started. The source metric is reported in nanoseconds and converted to seconds by the custom multiplier `1.0E-9`.	Dependent item	consul.gc_pause.p90 Preprocessing Prometheus pattern: `VALUE(consul_runtime_gc_pause_ns{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.` Custom multiplier: `1.0E-9`
GC pause, p50	The 50th percentile (median) of the time consumed by stop-the-world garbage collection (GC) pauses since Consul started. The source metric is reported in nanoseconds and converted to seconds by the custom multiplier `1.0E-9`.	Dependent item	consul.gc_pause.p50 Preprocessing Prometheus pattern: `VALUE(consul_runtime_gc_pause_ns{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.` Custom multiplier: `1.0E-9`
Memberlist: degraded	This metric counts the number of times the Consul agent has performed failure detection on another agent at a slower probe rate. The agent uses its own health metric as an indicator to perform this action. If its health score is low, it means that the node is healthy, and vice versa.	Dependent item	consul.memberlist.degraded Preprocessing Prometheus pattern: `VALUE(consul_memberlist_degraded)` ⛔️Custom on fail: Discard value
Memberlist: health score	This metric describes a node's perception of its own health based on how well it is meeting the soft real-time requirements of the protocol. This metric ranges from 0 to 8, where 0 indicates "totally healthy".	Dependent item	consul.memberlist.health_score Preprocessing Prometheus pattern: `VALUE(consul_memberlist_health_score)` ⛔️Custom on fail: Discard value
Memberlist: gossip, p90	The 90th percentile for the number of gossips (messages) broadcast to a set of randomly selected nodes.	Dependent item	consul.memberlist.dispatch_log.p90 Preprocessing Prometheus pattern: `VALUE(consul_memberlist_gossip{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Memberlist: gossip, p50	The 50th percentile (median) for the number of gossips (messages) broadcast to a set of randomly selected nodes.	Dependent item	consul.memberlist.gossip.p50 Preprocessing Prometheus pattern: `VALUE(consul_memberlist_gossip{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Memberlist: msg alive	This metric counts the number of alive Consul agents, that the agent has mapped out so far, based on the message information given by the network layer.	Dependent item	consul.memberlist.msg.alive Preprocessing Prometheus pattern: `VALUE(consul_memberlist_msg_alive)` ⛔️Custom on fail: Discard value
Memberlist: msg dead	This metric counts the number of times a Consul agent has marked another agent to be a dead node.	Dependent item	consul.memberlist.msg.dead Preprocessing Prometheus pattern: `VALUE(consul_memberlist_msg_dead)` ⛔️Custom on fail: Discard value
Memberlist: msg suspect	The number of times a Consul agent suspects another as failed while probing during gossip protocol.	Dependent item	consul.memberlist.msg.suspect Preprocessing Prometheus pattern: `VALUE(consul_memberlist_msg_suspect)` ⛔️Custom on fail: Discard value
Memberlist: probe node, p90	The 90th percentile for the time taken to perform a single round of failure detection on a selected Consul agent.	Dependent item	consul.memberlist.probe_node.p90 Preprocessing Prometheus pattern: `VALUE(consul_memberlist_probeNode{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Memberlist: probe node, p50	The 50th percentile (median) for the time taken to perform a single round of failure detection on a selected Consul agent.	Dependent item	consul.memberlist.probe_node.p50 Preprocessing Prometheus pattern: `VALUE(consul_memberlist_probeNode{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Memberlist: push pull node, p90	The 90th percentile for the number of Consul agents that have exchanged state with this agent.	Dependent item	consul.memberlist.push_pull_node.p90 Preprocessing Prometheus pattern: `VALUE(consul_memberlist_pushPullNode{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Memberlist: push pull node, p50	The 50th percentile (median) for the number of Consul agents that have exchanged state with this agent.	Dependent item	consul.memberlist.push_pull_node.p50 Preprocessing Prometheus pattern: `VALUE(consul_memberlist_pushPullNode{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
KV store: apply, p90	The 90th percentile for the time it takes to complete an update to the KV store.	Dependent item	consul.kvs.apply.p90 Preprocessing Prometheus pattern: `VALUE(consul_kvs_apply{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
KV store: apply, p50	The 50th percentile (median) for the time it takes to complete an update to the KV store.	Dependent item	consul.kvs.apply.p50 Preprocessing Prometheus pattern: `VALUE(consul_kvs_apply{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
KV store: apply, rate	The number of updates to the KV store per second.	Dependent item	consul.kvs.apply.rate Preprocessing Prometheus pattern: `VALUE(consul_kvs_apply_count)` ⛔️Custom on fail: Discard value Change per second
Serf member: flap, rate	Increments when an agent is marked dead and then recovers within a short time period. This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports. Shown as events per second.	Dependent item	consul.serf.member.flap.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_member_flap)` ⛔️Custom on fail: Discard value Change per second
Serf member: failed, rate	Increments when an agent is marked dead. This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports. Shown as events per second.	Dependent item	consul.serf.member.failed.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_member_failed)` ⛔️Custom on fail: Discard value Change per second
Serf member: join, rate	Increments when an agent joins the cluster. If an agent flapped or failed this counter also increments when it re-joins. Shown as events per second.	Dependent item	consul.serf.member.join.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_member_join)` ⛔️Custom on fail: Discard value Change per second
Serf member: left, rate	Increments when an agent leaves the cluster. Shown as events per second.	Dependent item	consul.serf.member.left.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_member_left)` ⛔️Custom on fail: Discard value Change per second
Serf member: update, rate	Increments when a Consul agent updates. Shown as events per second.	Dependent item	consul.serf.member.update.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_member_update)` ⛔️Custom on fail: Discard value Change per second
ACL: resolves, rate	The number of ACL resolves per second.	Dependent item	consul.acl.resolves.rate Preprocessing Prometheus pattern: `VALUE(consul_acl_ResolveToken_count)` ⛔️Custom on fail: Discard value Change per second
Catalog: register, rate	The number of catalog register operations per second.	Dependent item	consul.catalog.register.rate Preprocessing Prometheus pattern: `VALUE(consul_catalog_register_count)` ⛔️Custom on fail: Discard value Change per second
Catalog: deregister, rate	The number of catalog deregister operations per second.	Dependent item	consul.catalog.deregister.rate Preprocessing Prometheus pattern: `VALUE(consul_catalog_deregister_count)` ⛔️Custom on fail: Discard value Change per second
Snapshot: append line, p90	The 90th percentile for the time taken by the Consul agent to append an entry into the existing log.	Dependent item	consul.snapshot.append_line.p90 Preprocessing Prometheus pattern: `VALUE(consul_serf_snapshot_appendLine{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Snapshot: append line, p50	The 50th percentile (median) for the time taken by the Consul agent to append an entry into the existing log.	Dependent item	consul.snapshot.append_line.p50 Preprocessing Prometheus pattern: `VALUE(consul_serf_snapshot_appendLine{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Snapshot: append line, rate	The number of snapshot appendLine operations per second.	Dependent item	consul.snapshot.append_line.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_snapshot_appendLine_count)` ⛔️Custom on fail: Discard value Change per second
Snapshot: compact, p90	The 90th percentile for the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction.	Dependent item	consul.snapshot.compact.p90 Preprocessing Prometheus pattern: `VALUE(consul_serf_snapshot_compact{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Snapshot: compact, p50	The 50th percentile (median) for the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction.	Dependent item	consul.snapshot.compact.p50 Preprocessing Prometheus pattern: `VALUE(consul_serf_snapshot_compact{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Snapshot: compact, rate	The number of snapshot compact operations per second.	Dependent item	consul.snapshot.compact.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_snapshot_compact_count)` ⛔️Custom on fail: Discard value Change per second
Get local services	Get all the services that are registered with the local agent and their status.	Script	consul.get_local_services
Get local services check	Data collection check.	Dependent item	consul.get_local_services.check Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to an empty string Discard unchanged with heartbeat: `3h`
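
Most dependent items above apply a Prometheus pattern to the raw output of `consul.get_metrics`, optionally followed by a custom multiplier or Change per second. The sketch below imitates those steps in Python on a made-up sample. The sample values, the function names, and the simplified line parser are assumptions for illustration, not the template's preprocessing code, and the sketch returns the first matching series, which may differ from how Zabbix handles several matches.

```python
import re

# Made-up excerpt in Prometheus text format (values are illustrative only).
SAMPLE = """\
# TYPE consul_client_rpc counter
consul_client_rpc 1200
# TYPE consul_runtime_gc_pause_ns summary
consul_runtime_gc_pause_ns{quantile="0.5"} 41000
consul_runtime_gc_pause_ns{quantile="0.9"} 95000
"""

LINE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def prometheus_value(text, name, **labels):
    """Rough equivalent of VALUE(name{label="..."}); None plays the role of 'Discard value'."""
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comments
        m = LINE_RE.match(line)
        if not m or m.group("name") != name:
            continue
        found = dict(LABEL_RE.findall(m.group("labels") or ""))
        if all(found.get(k) == v for k, v in labels.items()):
            return float(m.group("value"))
    return None


def change_per_second(prev_value, prev_ts, value, ts):
    """Difference between two samples divided by the seconds between them."""
    return (value - prev_value) / (ts - prev_ts)


rpc_total = prometheus_value(SAMPLE, "consul_client_rpc")
gc_p90_ns = prometheus_value(SAMPLE, "consul_runtime_gc_pause_ns", quantile="0.9")

print(rpc_total)                               # raw counter value
print(gc_p90_ns * 1e-9)                        # custom multiplier 1.0E-9: ns to s
print(change_per_second(1200, 0, 1260, 60))    # counter grew by 60 in 60 seconds
```

Returning `None` for a missing metric mirrors the `Discard value` on-fail action: an agent that does not export a given metric simply produces no data for that item.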

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
HashiCorp Consul Node: Version has been changed	Consul version has changed. Acknowledge to close the problem manually.	`last(/HashiCorp Consul Node by HTTP/consul.version,#1)<>last(/HashiCorp Consul Node by HTTP/consul.version,#2) and length(last(/HashiCorp Consul Node by HTTP/consul.version))>0`	Info	Manual close: Yes
HashiCorp Consul Node: Current number of open files is too high	Heavy file descriptor usage (i.e., near the process’s file descriptor limit) indicates a potential file descriptor exhaustion issue.	`min(/HashiCorp Consul Node by HTTP/consul.process_open_fds,5m)/last(/HashiCorp Consul Node by HTTP/consul.process_max_fds)*100>{$CONSUL.OPEN.FDS.MAX.WARN}`	Warning
HashiCorp Consul Node: Node's health score is warning	This metric ranges from 0 to 8, where 0 indicates "totally healthy". This health score is used to scale the time between outgoing probes, and higher scores translate into longer probing intervals. For more details see section IV of the Lifeguard paper: https://arxiv.org/pdf/1707.00788.pdf	`max(/HashiCorp Consul Node by HTTP/consul.memberlist.health_score,#3)>{$CONSUL.NODE.HEALTH_SCORE.MAX.WARN}`	Warning	Depends on: HashiCorp Consul Node: Node's health score is critical
HashiCorp Consul Node: Node's health score is critical	This metric ranges from 0 to 8, where 0 indicates "totally healthy". This health score is used to scale the time between outgoing probes, and higher scores translate into longer probing intervals. For more details see section IV of the Lifeguard paper: https://arxiv.org/pdf/1707.00788.pdf	`max(/HashiCorp Consul Node by HTTP/consul.memberlist.health_score,#3)>{$CONSUL.NODE.HEALTH_SCORE.MAX.HIGH}`	Average
HashiCorp Consul Node: Failed to get local services	Failed to get local services. Check debug log for more information.	`length(last(/HashiCorp Consul Node by HTTP/consul.get_local_services.check))>0`	Warning
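
To make the trigger arithmetic above concrete, here is a small Python sketch of the open-files expression and of the two health score triggers with their dependency. The function names and the sample values are hypothetical; the thresholds default to the macro defaults (`90`, `2`, `4`).

```python
def open_files_problem(open_fds_5m, max_fds_last, warn_pct=90):
    """min(open_fds, 5m) / last(max_fds) * 100 > {$CONSUL.OPEN.FDS.MAX.WARN}.

    Using the minimum over the window means the trigger fires only if usage
    stayed above the threshold for the whole 5 minutes.
    """
    return min(open_fds_5m) / max_fds_last * 100 > warn_pct


def health_score_severity(scores, warn=2, high=4):
    """max(health_score, #3) against both thresholds.

    The Warning trigger depends on the 'critical' one, so only the higher
    severity (Average) is reported when both expressions are true.
    """
    worst = max(scores[-3:])  # '#3' means the last three collected values
    if worst > high:
        return "Average"
    if worst > warn:
        return "Warning"
    return None


# Sustained high usage fires; one low sample in the window prevents it.
print(open_files_problem([950, 930, 990], 1024))
print(open_files_problem([950, 100, 990], 1024))

# Only the last three values count.
print(health_score_severity([0, 3, 1]))
print(health_score_severity([5, 0, 0, 0]))
print(health_score_severity([1, 5, 2]))
```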

LLD rule Local node services discovery

Name	Description	Type	Key and additional info
Local node services discovery	Discover metrics for services that are registered with the local agent.	Dependent item	consul.node_services_lld Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Local node services discovery

Name	Description	Type	Key and additional info
["{#SERVICE_NAME}"]: Aggregated status	Aggregated values of all health checks for the service instance.	Dependent item	consul.service.aggregated_state["{#SERVICE_ID}"] Preprocessing JSON Path: `$[?(@.id == "{#SERVICE_ID}")].status.first()` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
["{#SERVICE_NAME}"]: Check ["{#SERVICE_CHECK_NAME}"]: Status	Current state of health check for the service.	Dependent item	consul.service.check.state["{#SERVICE_ID}/{#SERVICE_CHECK_ID}"] Preprocessing JSON Path: `The text is too long. Please see the template.` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
["{#SERVICE_NAME}"]: Check ["{#SERVICE_CHECK_NAME}"]: Output	Current output of health check for the service.	Dependent item	consul.service.check.output["{#SERVICE_ID}/{#SERVICE_CHECK_ID}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Trigger prototypes for Local node services discovery

Name	Description	Expression	Severity	Dependencies and additional info
HashiCorp Consul Node: Aggregated status is 'warning'	Aggregated state of service on the local agent is 'warning'.	`last(/HashiCorp Consul Node by HTTP/consul.service.aggregated_state["{#SERVICE_ID}"]) = 1`	Warning
HashiCorp Consul Node: Aggregated status is 'critical'	Aggregated state of service on the local agent is 'critical'.	`last(/HashiCorp Consul Node by HTTP/consul.service.aggregated_state["{#SERVICE_ID}"]) = 2`	Average

LLD rule HTTP API methods discovery

Name	Description	Type	Key and additional info
HTTP API methods discovery	Discover metrics specific to HTTP API methods.	Dependent item	consul.http_api_discovery Preprocessing Prometheus to JSON: `consul_api_http{method =~ ".*"}` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for HTTP API methods discovery

Name	Description	Type	Key and additional info
HTTP request: ["{#HTTP_METHOD}"], p90	The 90th percentile of how long it takes to service the given HTTP request for the given verb.	Dependent item	consul.http.api.p90["{#HTTP_METHOD}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
HTTP request: ["{#HTTP_METHOD}"], p50	The 50th percentile (median) of how long it takes to service the given HTTP request for the given verb.	Dependent item	consul.http.api.p50["{#HTTP_METHOD}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
HTTP request: ["{#HTTP_METHOD}"], rate	The number of HTTP requests for the given verb per second.	Dependent item	consul.http.api.rate["{#HTTP_METHOD}"] Preprocessing Prometheus pattern: `SUM(consul_api_http_count{method = "{#HTTP_METHOD}"})` ⛔️Custom on fail: Discard value Change per second
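
The discovery rule and the rate prototype above can be pictured as two steps: collect the distinct `method` label values of `consul_api_http`, then sum `consul_api_http_count` over all series that share a method. The Python sketch below shows one way this could work. The template's actual JavaScript is not shown on this page, so the parsing code, the function names, and the sample series (including the `path` label values) are assumptions.

```python
import json
import re

# Made-up series; label values are illustrative only.
SAMPLE = """\
consul_api_http{method="GET",path="v1_agent_self",quantile="0.9"} 0.42
consul_api_http{method="PUT",path="v1_kv_",quantile="0.9"} 1.3
consul_api_http_count{method="GET",path="v1_agent_metrics"} 40
consul_api_http_count{method="GET",path="v1_agent_self"} 10
consul_api_http_count{method="PUT",path="v1_kv_"} 5
"""

SERIES_RE = re.compile(r'^(\w+)\{([^}]*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse(text):
    """Return (metric name, labels dict, value) for every labelled series."""
    rows = []
    for line in text.splitlines():
        m = SERIES_RE.match(line.strip())
        if m:
            labels = dict(LABEL_RE.findall(m.group(2)))
            rows.append((m.group(1), labels, float(m.group(3))))
    return rows


def discover_methods(rows):
    """Build LLD rows with one {#HTTP_METHOD} entry per distinct method label."""
    methods = sorted({labels["method"] for name, labels, _ in rows
                      if name == "consul_api_http" and "method" in labels})
    return [{"{#HTTP_METHOD}": m} for m in methods]


def sum_count(rows, method):
    """Rough equivalent of SUM(consul_api_http_count{method = "<method>"})."""
    return sum(value for name, labels, value in rows
               if name == "consul_api_http_count" and labels.get("method") == method)


rows = parse(SAMPLE)
print(json.dumps(discover_methods(rows)))
print(sum_count(rows, "GET"))
```

Summing is needed because several series can share the same method; the summed counter is then turned into a rate by the Change per second step.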

LLD rule Raft server metrics discovery

Name	Description	Type	Key and additional info
Raft server metrics discovery	Discover raft metrics for server nodes.	Dependent item	consul.raft.server.discovery Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Raft server metrics discovery

Name	Description	Type	Key and additional info
Raft state	Current state of Consul agent.	Dependent item	consul.raft.state[{#SINGLETON}] Preprocessing JSON Path: `$.Stats.raft.state` Discard unchanged with heartbeat: `3h`
Raft state: leader	Increments when a server becomes a leader.	Dependent item	consul.raft.state_leader[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_state_leader)` ⛔️Custom on fail: Discard value
Raft state: candidate	The number of initiated leader elections.	Dependent item	consul.raft.state_candidate[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_state_candidate)` ⛔️Custom on fail: Discard value
Raft: apply, rate	Incremented whenever a leader first passes a message into the Raft commit process (called an Apply operation). This metric describes the arrival rate of new logs into Raft per second.	Dependent item	consul.raft.apply.rate[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_apply)` ⛔️Custom on fail: Discard value Change per second

LLD rule Raft leader metrics discovery

Name	Description	Type	Key and additional info
Raft leader metrics discovery	Discover raft metrics for leader nodes.	Dependent item	consul.raft.leader.discovery Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Raft leader metrics discovery

Name	Description	Type	Key and additional info
Raft state: leader last contact, p90	The 90th percentile of how long it takes a leader node to communicate with followers during a leader lease check, in milliseconds.	Dependent item	consul.raft.leader_last_contact.p90[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_leader_lastContact{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Raft state: leader last contact, p50	The 50th percentile (median) of how long it takes a leader node to communicate with followers during a leader lease check, in milliseconds.	Dependent item	consul.raft.leader_last_contact.p50[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_leader_lastContact{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Raft state: commit time, p90	The 90th percentile of the time it takes to commit a new entry to the Raft log on the leader, in milliseconds.	Dependent item	consul.raft.commit_time.p90[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_commitTime{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Raft state: commit time, p50	The 50th percentile (median) of the time it takes to commit a new entry to the Raft log on the leader, in milliseconds.	Dependent item	consul.raft.commit_time.p50[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_commitTime{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Raft state: dispatch log, p90	The 90th percentile of the time it takes for the leader to write log entries to disk, in milliseconds.	Dependent item	consul.raft.dispatch_log.p90[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_leader_dispatchLog{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Raft state: dispatch log, p50	The 50th percentile (median) of the time it takes for the leader to write log entries to disk, in milliseconds.	Dependent item	consul.raft.dispatch_log.p50[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_leader_dispatchLog{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Raft state: dispatch log, rate	The number of times a Raft leader writes a log to disk per second.	Dependent item	consul.raft.dispatch_log.rate[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_leader_dispatchLog_count)` ⛔️Custom on fail: Discard value Change per second
Raft state: commit, rate	The number of new entries committed to the Raft log on the leader per second.	Dependent item	consul.raft.commit_time.rate[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_commitTime_count)` ⛔️Custom on fail: Discard value Change per second
Autopilot healthy	Tracks the overall health of the local server cluster. 1 if all servers are healthy, 0 if one or more are unhealthy.	Dependent item	consul.autopilot.healthy[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_autopilot_healthy)` ⛔️Custom on fail: Discard value

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help on the Zabbix forums.

This template is for Zabbix version: 7.0

Also available for: 7.4 7.2 6.4 6.2 6.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/consul_http/consul?at=release/7.0

HashiCorp Consul Node by HTTP

Overview

Template HashiCorp Consul Node by HTTP — collects metrics by HTTP agent from /v1/agent/metrics endpoint.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

HashiCorp Consul 1.10.0

Configuration

Setup

NOTE. Some metrics may not be collected depending on your HashiCorp Consul instance version and configuration.
NOTE. You may also be interested in the Envoy Proxy by HTTP template.

Macros used

Name	Description	Default
{$CONSUL.NODE.API.URL}	Consul instance URL.	`http://localhost:8500`
{$CONSUL.TOKEN}	Consul auth token.	`<PUT YOUR AUTH TOKEN>`
{$CONSUL.OPEN.FDS.MAX.WARN}	Maximum percentage of used file descriptors.	`90`
{$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.MATCHES}	Filter to include discovered services on the local node.	`.*`
{$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.NOT_MATCHES}	Filter to exclude discovered services on the local node.	`CHANGE IF NEEDED`
{$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.MATCHES}	Filter to include discovered services by namespace on the local node. Enterprise only; in the Open Source version the namespace is set to 'None'.	`.*`
{$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.NOT_MATCHES}	Filter to exclude discovered services by namespace on the local node. Enterprise only; in the Open Source version the namespace is set to 'None'.	`CHANGE IF NEEDED`
{$CONSUL.NODE.HEALTH_SCORE.MAX.WARN}	Maximum acceptable value of node's health score for WARNING trigger expression.	`2`
{$CONSUL.NODE.HEALTH_SCORE.MAX.HIGH}	Maximum acceptable value of node's health score for AVERAGE trigger expression.	`4`

Items

Name	Description	Type	Key and additional info
Get instance metrics	Get raw metrics from Consul instance /metrics endpoint.	HTTP agent	consul.get_metrics Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Get node info	Get configuration and member information of the local agent.	HTTP agent	consul.get_node_info Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Role	Role of current Consul agent.	Dependent item	consul.role Preprocessing JSON Path: `$.Config.Server` Boolean to decimal Discard unchanged with heartbeat: `3h`
Version	Version of Consul agent.	Dependent item	consul.version Preprocessing JSON Path: `$.Config.Version` Discard unchanged with heartbeat: `3h`
Number of services	Number of services on current node.	Dependent item	consul.services_number Preprocessing JSON Path: `$.Stats.agent.services` Discard unchanged with heartbeat: `3h`
Number of checks	Number of checks on current node.	Dependent item	consul.checks_number Preprocessing JSON Path: `$.Stats.agent.checks` Discard unchanged with heartbeat: `3h`
Number of check monitors	Number of check monitors on current node.	Dependent item	consul.check_monitors_number Preprocessing JSON Path: `$.Stats.agent.check_monitors` Discard unchanged with heartbeat: `3h`
Process CPU seconds, total	Total user and system CPU time spent in seconds.	Dependent item	consul.cpu_seconds_total.rate Preprocessing Prometheus pattern: `VALUE(process_cpu_seconds_total)` ⛔️Custom on fail: Discard value Change per second
Virtual memory size	Virtual memory size in bytes.	Dependent item	consul.virtual_memory_bytes Preprocessing Prometheus pattern: `VALUE(process_virtual_memory_bytes)`
RSS memory usage	Resident memory size in bytes.	Dependent item	consul.resident_memory_bytes Preprocessing Prometheus pattern: `VALUE(process_resident_memory_bytes)`
Goroutine count	The number of Goroutines on Consul instance.	Dependent item	consul.goroutines Preprocessing Prometheus pattern: `VALUE(go_goroutines)`
Open file descriptors	Number of open file descriptors.	Dependent item	consul.process_open_fds Preprocessing Prometheus pattern: `VALUE(process_open_fds)`
Open file descriptors, max	Maximum number of open file descriptors.	Dependent item	consul.process_max_fds Preprocessing Prometheus pattern: `VALUE(process_max_fds)`
Client RPC, per second	Number of times per second whenever a Consul agent in client mode makes an RPC request to a Consul server. This gives a measure of how much a given agent is loading the Consul servers. This is only generated by agents in client mode, not Consul servers.	Dependent item	consul.client_rpc Preprocessing Prometheus pattern: `VALUE(consul_client_rpc)` ⛔️Custom on fail: Discard value Change per second
Client RPC failed ,per second	Number of times per second whenever a Consul agent in client mode makes an RPC request to a Consul server and fails.	Dependent item	consul.client_rpc_failed Preprocessing Prometheus pattern: `VALUE(consul_client_rpc_failed)` ⛔️Custom on fail: Discard value Change per second
TCP connections, accepted per second	This metric counts the number of times a Consul agent has accepted an incoming TCP stream connection per second.	Dependent item	consul.memberlist.tcp_accept Preprocessing Prometheus pattern: `VALUE(consul_memberlist_tcp_accept)` ⛔️Custom on fail: Discard value Change per second
TCP connections, per second	This metric counts the number of times a Consul agent has initiated a push/pull sync with another agent per second.	Dependent item	consul.memberlist.tcp_connect Preprocessing Prometheus pattern: `VALUE(consul_memberlist_tcp_connect)` ⛔️Custom on fail: Discard value Change per second
TCP send bytes, per second	This metric measures the total number of bytes sent by a Consul agent through the TCP protocol per second.	Dependent item	consul.memberlist.tcp_sent Preprocessing Prometheus pattern: `VALUE(consul_memberlist_tcp_sent)` ⛔️Custom on fail: Discard value Change per second
UDP received bytes, per second	This metric measures the total number of bytes received by a Consul agent through the UDP protocol per second.	Dependent item	consul.memberlist.udp_received Preprocessing Prometheus pattern: `VALUE(consul_memberlist_udp_received)` ⛔️Custom on fail: Discard value Change per second
UDP sent bytes, per second	This metric measures the total number of bytes sent by a Consul agent through the UDP protocol per second.	Dependent item	consul.memberlist.udp_sent Preprocessing Prometheus pattern: `VALUE(consul_memberlist_udp_sent)` ⛔️Custom on fail: Discard value Change per second
GC pause, p90	The 90 percentile for the number of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started, in milliseconds.	Dependent item	consul.gc_pause.p90 Preprocessing Prometheus pattern: `VALUE(consul_runtime_gc_pause_ns{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.` Custom multiplier: `1.0E-9`
GC pause, p50	The 50 percentile (median) for the number of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started, in milliseconds.	Dependent item	consul.gc_pause.p50 Preprocessing Prometheus pattern: `VALUE(consul_runtime_gc_pause_ns{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.` Custom multiplier: `1.0E-9`
Memberlist: degraded	This metric counts the number of times the Consul agent has performed failure detection on another agent at a slower probe rate. The agent uses its own health metric as an indicator to perform this action. If its health score is low, it means that the node is healthy, and vice versa.	Dependent item	consul.memberlist.degraded Preprocessing Prometheus pattern: `VALUE(consul_memberlist_degraded)` ⛔️Custom on fail: Discard value
Memberlist: health score	This metric describes a node's perception of its own health based on how well it is meeting the soft real-time requirements of the protocol. This metric ranges from 0 to 8, where 0 indicates "totally healthy".	Dependent item	consul.memberlist.health_score Preprocessing Prometheus pattern: `VALUE(consul_memberlist_health_score)` ⛔️Custom on fail: Discard value
Memberlist: gossip, p90	The 90 percentile for the number of gossips (messages) broadcasted to a set of randomly selected nodes.	Dependent item	consul.memberlist.dispatch_log.p90 Preprocessing Prometheus pattern: `VALUE(consul_memberlist_gossip{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Memberlist: gossip, p50	The 50 percentile (median) for the number of gossips (messages) broadcasted to a set of randomly selected nodes.	Dependent item	consul.memberlist.gossip.p50 Preprocessing Prometheus pattern: `VALUE(consul_memberlist_gossip{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Memberlist: msg alive	This metric counts the number of alive Consul agents, that the agent has mapped out so far, based on the message information given by the network layer.	Dependent item	consul.memberlist.msg.alive Preprocessing Prometheus pattern: `VALUE(consul_memberlist_msg_alive)` ⛔️Custom on fail: Discard value
Memberlist: msg dead	This metric counts the number of times a Consul agent has marked another agent to be a dead node.	Dependent item	consul.memberlist.msg.dead Preprocessing Prometheus pattern: `VALUE(consul_memberlist_msg_dead)` ⛔️Custom on fail: Discard value
Memberlist: msg suspect	The number of times a Consul agent suspects another as failed while probing during gossip protocol.	Dependent item	consul.memberlist.msg.suspect Preprocessing Prometheus pattern: `VALUE(consul_memberlist_msg_suspect)` ⛔️Custom on fail: Discard value
Memberlist: probe node, p90	The 90 percentile for the time taken to perform a single round of failure detection on a select Consul agent.	Dependent item	consul.memberlist.probe_node.p90 Preprocessing Prometheus pattern: `VALUE(consul_memberlist_probeNode{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Memberlist: probe node, p50	The 50 percentile (median) for the time taken to perform a single round of failure detection on a select Consul agent.	Dependent item	consul.memberlist.probe_node.p50 Preprocessing Prometheus pattern: `VALUE(consul_memberlist_probeNode{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Memberlist: push pull node, p90	The 90 percentile for the number of Consul agents that have exchanged state with this agent.	Dependent item	consul.memberlist.push_pull_node.p90 Preprocessing Prometheus pattern: `VALUE(consul_memberlist_pushPullNode{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Memberlist: push pull node, p50	The 50 percentile (median) for the number of Consul agents that have exchanged state with this agent.	Dependent item	consul.memberlist.push_pull_node.p50 Preprocessing Prometheus pattern: `VALUE(consul_memberlist_pushPullNode{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
KV store: apply, p90	The 90 percentile for the time it takes to complete an update to the KV store.	Dependent item	consul.kvs.apply.p90 Preprocessing Prometheus pattern: `VALUE(consul_kvs_apply{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
KV store: apply, p50	The 50 percentile (median) for the time it takes to complete an update to the KV store.	Dependent item	consul.kvs.apply.p50 Preprocessing Prometheus pattern: `VALUE(consul_kvs_apply{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
KV store: apply, rate	The number of updates to the KV store per second.	Dependent item	consul.kvs.apply.rate Preprocessing Prometheus pattern: `VALUE(consul_kvs_apply_count)` ⛔️Custom on fail: Discard value Change per second
Serf member: flap, rate	Increments when an agent is marked dead and then recovers within a short time period. This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports. Shown as events per second.	Dependent item	consul.serf.member.flap.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_member_flap)` ⛔️Custom on fail: Discard value Change per second
Serf member: failed, rate	Increments when an agent is marked dead. This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports. Shown as events per second.	Dependent item	consul.serf.member.failed.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_member_failed)` ⛔️Custom on fail: Discard value Change per second
Serf member: join, rate	Increments when an agent joins the cluster. If an agent flapped or failed this counter also increments when it re-joins. Shown as events per second.	Dependent item	consul.serf.member.join.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_member_join)` ⛔️Custom on fail: Discard value Change per second
Serf member: left, rate	Increments when an agent leaves the cluster. Shown as events per second.	Dependent item	consul.serf.member.left.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_member_left)` ⛔️Custom on fail: Discard value Change per second
Serf member: update, rate	Increments when a Consul agent updates. Shown as events per second.	Dependent item	consul.serf.member.update.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_member_update)` ⛔️Custom on fail: Discard value Change per second
ACL: resolves, rate	The number of ACL resolves per second.	Dependent item	consul.acl.resolves.rate Preprocessing Prometheus pattern: `VALUE(consul_acl_ResolveToken_count)` ⛔️Custom on fail: Discard value Change per second
Catalog: register, rate	The number of catalog register operations per second.	Dependent item	consul.catalog.register.rate Preprocessing Prometheus pattern: `VALUE(consul_catalog_register_count)` ⛔️Custom on fail: Discard value Change per second
Catalog: deregister, rate	The number of catalog deregister operations per second.	Dependent item	consul.catalog.deregister.rate Preprocessing Prometheus pattern: `VALUE(consul_catalog_deregister_count)` ⛔️Custom on fail: Discard value Change per second
Snapshot: append line, p90	The 90 percentile for the time taken by the Consul agent to append an entry into the existing log.	Dependent item	consul.snapshot.append_line.p90 Preprocessing Prometheus pattern: `VALUE(consul_serf_snapshot_appendLine{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Snapshot: append line, p50	The 50 percentile (median) for the time taken by the Consul agent to append an entry into the existing log.	Dependent item	consul.snapshot.append_line.p50 Preprocessing Prometheus pattern: `VALUE(consul_serf_snapshot_appendLine{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Snapshot: append line, rate	The number of snapshot appendLine operations per second.	Dependent item	consul.snapshot.append_line.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_snapshot_appendLine_count)` ⛔️Custom on fail: Discard value Change per second
Snapshot: compact, p90	The 90 percentile for the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction.	Dependent item	consul.snapshot.compact.p90 Preprocessing Prometheus pattern: `VALUE(consul_serf_snapshot_compact{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Snapshot: compact, p50	The 50 percentile (median) for the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction.	Dependent item	consul.snapshot.compact.p50 Preprocessing Prometheus pattern: `VALUE(consul_serf_snapshot_compact{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Snapshot: compact, rate	The number of snapshot compact operations per second.	Dependent item	consul.snapshot.compact.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_snapshot_compact_count)` ⛔️Custom on fail: Discard value Change per second
Get local services	Get all the services that are registered with the local agent and their status.	Script	consul.get_local_services
Get local services check	Data collection check.	Dependent item	consul.get_local_services.check Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
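
Most dependent items above pick a single sample out of the Prometheus text with a `VALUE(...)` pattern, and many then apply "Change per second". The Python sketch below imitates those two steps on a made-up sample. The metric names come from the table, but the sample values and the helpers `prom_value` and `change_per_second` are hypothetical. The JavaScript preprocessing steps that the table elides are not reproduced here, and unlike Zabbix this sketch simply returns the first matching series.

```python
import re

# Hypothetical excerpt of Prometheus text output from the metrics endpoint.
SAMPLE = """\
# TYPE consul_runtime_gc_pause_ns summary
consul_runtime_gc_pause_ns{quantile="0.5"} 52000
consul_runtime_gc_pause_ns{quantile="0.9"} 118000
consul_runtime_gc_pause_ns_count 37
process_open_fds 25
"""

LINE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def prom_value(text, name, labels=None):
    """Return the first sample for `name` whose labels include `labels`."""
    wanted = labels or {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = LINE.match(line)
        if not m or m.group(1) != name:
            continue
        found = dict(LABEL.findall(m.group(2) or ""))
        if all(found.get(k) == v for k, v in wanted.items()):
            return float(m.group(3))
    return None


def change_per_second(prev_value, prev_ts, value, ts):
    """Rate of change between two counter readings taken at prev_ts and ts."""
    return (value - prev_value) / (ts - prev_ts)


# Roughly VALUE(consul_runtime_gc_pause_ns{quantile="0.9"}) followed by
# the custom multiplier 1.0E-9 listed for the GC pause items.
p90 = prom_value(SAMPLE, "consul_runtime_gc_pause_ns", {"quantile": "0.9"})
p90_scaled = p90 * 1.0e-9

# Two counter readings taken 60 seconds apart.
rate = change_per_second(1200.0, 0, 1500.0, 60)
```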

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
HashiCorp Consul Node: Version has been changed	Consul version has changed. Acknowledge to close the problem manually.	`last(/HashiCorp Consul Node by HTTP/consul.version,#1)<>last(/HashiCorp Consul Node by HTTP/consul.version,#2) and length(last(/HashiCorp Consul Node by HTTP/consul.version))>0`	Info	Manual close: Yes
HashiCorp Consul Node: Current number of open files is too high	Heavy file descriptor usage (i.e., near the process’s file descriptor limit) indicates a potential file descriptor exhaustion issue.	`min(/HashiCorp Consul Node by HTTP/consul.process_open_fds,5m)/last(/HashiCorp Consul Node by HTTP/consul.process_max_fds)*100>{$CONSUL.OPEN.FDS.MAX.WARN}`	Warning
HashiCorp Consul Node: Node's health score is warning	This metric ranges from 0 to 8, where 0 indicates "totally healthy". This health score is used to scale the time between outgoing probes, and higher scores translate into longer probing intervals. For more details see section IV of the Lifeguard paper: https://arxiv.org/pdf/1707.00788.pdf	`max(/HashiCorp Consul Node by HTTP/consul.memberlist.health_score,#3)>{$CONSUL.NODE.HEALTH_SCORE.MAX.WARN}`	Warning	Depends on: HashiCorp Consul Node: Node's health score is critical
HashiCorp Consul Node: Node's health score is critical	This metric ranges from 0 to 8, where 0 indicates "totally healthy". This health score is used to scale the time between outgoing probes, and higher scores translate into longer probing intervals. For more details see section IV of the Lifeguard paper: https://arxiv.org/pdf/1707.00788.pdf	`max(/HashiCorp Consul Node by HTTP/consul.memberlist.health_score,#3)>{$CONSUL.NODE.HEALTH_SCORE.MAX.HIGH}`	Average
HashiCorp Consul Node: Failed to get local services	Failed to get local services. Check debug log for more information.	`length(last(/HashiCorp Consul Node by HTTP/consul.get_local_services.check))>0`	Warning
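
The file descriptor and health score trigger expressions above can be read as simple arithmetic. The Python sketch below restates them under the default macro values. The function names and input values are hypothetical, and returning only the higher severity is a simplified reading of the "Depends on" relationship between the two health score triggers.

```python
def fds_too_high(open_fds_5m, max_fds_last, warn_pct=90.0):
    """min(open_fds, 5m) / last(max_fds) * 100 > {$CONSUL.OPEN.FDS.MAX.WARN}.

    Using the minimum over the window means every reading in the
    last 5 minutes must be above the threshold.
    """
    return min(open_fds_5m) / max_fds_last * 100 > warn_pct


def health_severity(scores, warn=2, high=4):
    """max(health_score, #3) compared against the WARN and HIGH macros.

    Returns the severity of the trigger that would be shown, or None.
    """
    worst = max(scores[-3:])  # highest of the last three collected values
    if worst > high:
        return "Average"      # the 'critical' trigger
    if worst > warn:
        return "Warning"      # shown only when the 'critical' one is not firing
    return None


sustained = fds_too_high([950, 940, 960], 1024)   # 940 / 1024 is about 91.8 %
brief_spike = fds_too_high([950, 100, 960], 1024) # one low reading resets it
```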

LLD rule Local node services discovery

Name	Description	Type	Key and additional info
Local node services discovery	Discover metrics for services that are registered with the local agent.	Dependent item	consul.node_services_lld Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Local node services discovery

Name	Description	Type	Key and additional info
["{#SERVICE_NAME}"]: Aggregated status	Aggregated values of all health checks for the service instance.	Dependent item	consul.service.aggregated_state["{#SERVICE_ID}"] Preprocessing JSON Path: `$[?(@.id == "{#SERVICE_ID}")].status.first()` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
["{#SERVICE_NAME}"]: Check ["{#SERVICE_CHECK_NAME}"]: Status	Current state of health check for the service.	Dependent item	consul.service.check.state["{#SERVICE_ID}/{#SERVICE_CHECK_ID}"] Preprocessing JSON Path: `The text is too long. Please see the template.` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
["{#SERVICE_NAME}"]: Check ["{#SERVICE_CHECK_NAME}"]: Output	Current output of health check for the service.	Dependent item	consul.service.check.output["{#SERVICE_ID}/{#SERVICE_CHECK_ID}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Trigger prototypes for Local node services discovery

Name	Description	Expression	Severity	Dependencies and additional info
HashiCorp Consul Node: Aggregated status is 'warning'	Aggregated state of service on the local agent is 'warning'.	`last(/HashiCorp Consul Node by HTTP/consul.service.aggregated_state["{#SERVICE_ID}"]) = 1`	Warning
HashiCorp Consul Node: Aggregated status is 'critical'	Aggregated state of service on the local agent is 'critical'.	`last(/HashiCorp Consul Node by HTTP/consul.service.aggregated_state["{#SERVICE_ID}"]) = 2`	Average

LLD rule HTTP API methods discovery

Name	Description	Type	Key and additional info
HTTP API methods discovery	Discover HTTP API method-specific metrics.	Dependent item	consul.http_api_discovery Preprocessing Prometheus to JSON: `consul_api_http{method =~ ".*"}` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for HTTP API methods discovery

Name	Description	Type	Key and additional info
HTTP request: ["{#HTTP_METHOD}"], p90	The 90 percentile of how long it takes to service the given HTTP request for the given verb.	Dependent item	consul.http.api.p90["{#HTTP_METHOD}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
HTTP request: ["{#HTTP_METHOD}"], p50	The 50 percentile (median) of how long it takes to service the given HTTP request for the given verb.	Dependent item	consul.http.api.p50["{#HTTP_METHOD}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
HTTP request: ["{#HTTP_METHOD}"], rate	The number of HTTP requests for the given verb per second.	Dependent item	consul.http.api.rate["{#HTTP_METHOD}"] Preprocessing Prometheus pattern: `SUM(consul_api_http_count{method = "{#HTTP_METHOD}"})` ⛔️Custom on fail: Discard value Change per second
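
The rate prototype uses a `SUM(...)` pattern rather than `VALUE(...)`, because several series can share the same `method` label. A small Python sketch of that aggregation follows. The `path` label values and counts are made-up sample data, and `sum_by_method` is a hypothetical helper, not the template's own code.

```python
import re

# Hypothetical series: the same HTTP method appears with different paths.
SAMPLE = """\
consul_api_http_count{method="GET",path="v1_agent_metrics"} 120
consul_api_http_count{method="GET",path="v1_agent_self"} 30
consul_api_http_count{method="PUT",path="v1_kv"} 7
"""

SERIES = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def sum_by_method(text, name, method):
    """Add up every sample of `name` whose method label equals `method`."""
    total = 0.0
    for line in text.splitlines():
        m = SERIES.match(line)
        if not m or m.group(1) != name:
            continue
        labels = dict(LABEL.findall(m.group(2)))
        if labels.get("method") == method:
            total += float(m.group(3))
    return total


get_total = sum_by_method(SAMPLE, "consul_api_http_count", "GET")
put_total = sum_by_method(SAMPLE, "consul_api_http_count", "PUT")
```

The summed counter is then turned into a per-second rate by the "Change per second" step listed for the item.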

LLD rule Raft server metrics discovery

Name	Description	Type	Key and additional info
Raft server metrics discovery	Discover raft metrics for server nodes.	Dependent item	consul.raft.server.discovery Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Raft server metrics discovery

Name	Description	Type	Key and additional info
Raft state	Current state of Consul agent.	Dependent item	consul.raft.state[{#SINGLETON}] Preprocessing JSON Path: `$.Stats.raft.state` Discard unchanged with heartbeat: `3h`
Raft state: leader	Increments when a server becomes a leader.	Dependent item	consul.raft.state_leader[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_state_leader)` ⛔️Custom on fail: Discard value
Raft state: candidate	The number of initiated leader elections.	Dependent item	consul.raft.state_candidate[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_state_candidate)` ⛔️Custom on fail: Discard value
Raft: apply, rate	Incremented whenever a leader first passes a message into the Raft commit process (called an Apply operation). This metric describes the arrival rate of new logs into Raft per second.	Dependent item	consul.raft.apply.rate[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_apply)` ⛔️Custom on fail: Discard value Change per second

LLD rule Raft leader metrics discovery

Name	Description	Type	Key and additional info
Raft leader metrics discovery	Discover raft metrics for leader nodes.	Dependent item	consul.raft.leader.discovery Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Raft leader metrics discovery

Name	Description	Type	Key and additional info
Raft state: leader last contact, p90	The 90 percentile of how long it takes a leader node to communicate with followers during a leader lease check, in milliseconds.	Dependent item	consul.raft.leader_last_contact.p90[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_leader_lastContact{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Raft state: leader last contact, p50	The 50 percentile (median) of how long it takes a leader node to communicate with followers during a leader lease check, in milliseconds.	Dependent item	consul.raft.leader_last_contact.p50[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_leader_lastContact{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Raft state: commit time, p90	The 90 percentile time it takes to commit a new entry to the raft log on the leader, in milliseconds.	Dependent item	consul.raft.commit_time.p90[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_commitTime{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Raft state: commit time, p50	The 50 percentile (median) time it takes to commit a new entry to the raft log on the leader, in milliseconds.	Dependent item	consul.raft.commit_time.p50[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_commitTime{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Raft state: dispatch log, p90	The 90 percentile time it takes for the leader to write log entries to disk, in milliseconds.	Dependent item	consul.raft.dispatch_log.p90[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_leader_dispatchLog{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Raft state: dispatch log, p50	The 50 percentile (median) time it takes for the leader to write log entries to disk, in milliseconds.	Dependent item	consul.raft.dispatch_log.p50[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_leader_dispatchLog{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Raft state: dispatch log, rate	The number of times a Raft leader writes a log to disk per second.	Dependent item	consul.raft.dispatch_log.rate[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_leader_dispatchLog_count)` ⛔️Custom on fail: Discard value Change per second
Raft state: commit, rate	The number of new entries committed to the Raft log on the leader per second.	Dependent item	consul.raft.commit_time.rate[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_commitTime_count)` ⛔️Custom on fail: Discard value Change per second
Autopilot healthy	Tracks the overall health of the local server cluster. 1 if all servers are healthy, 0 if one or more are unhealthy.	Dependent item	consul.autopilot.healthy[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_autopilot_healthy)` ⛔️Custom on fail: Discard value

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

This template is for Zabbix version: 6.4

Also available for: 7.4 7.2 7.0 6.2 6.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/consul_http/consul?at=release/6.4

HashiCorp Consul Node by HTTP

Overview

Template HashiCorp Consul Node by HTTP — collects metrics by HTTP agent from /v1/agent/metrics endpoint.

Requirements

Zabbix version: 6.4 and higher.

Tested versions

This template has been tested on:

HashiCorp Consul 1.10.0

Configuration

Setup

NOTE. Some metrics may not be collected depending on your HashiCorp Consul instance version and configuration.
NOTE. You may also be interested in the Envoy Proxy by HTTP template.

Macros used

Name	Description	Default
{$CONSUL.NODE.API.URL}	Consul instance URL.	`http://localhost:8500`
{$CONSUL.TOKEN}	Consul auth token.	`<PUT YOUR AUTH TOKEN>`
{$CONSUL.OPEN.FDS.MAX.WARN}	Maximum percentage of used file descriptors.	`90`
{$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.MATCHES}	Filter for discoverable services on the local node.	`.*`
{$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.NOT_MATCHES}	Filter to exclude discovered services on the local node.	`CHANGE IF NEEDED`
{$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.MATCHES}	Filter for discoverable services by namespace on the local node. Enterprise only; in the Open Source version, the namespace is set to 'None'.	`.*`
{$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.NOT_MATCHES}	Filter to exclude discovered services by namespace on the local node. Enterprise only; in the Open Source version, the namespace is set to 'None'.	`CHANGE IF NEEDED`
{$CONSUL.NODE.HEALTH_SCORE.MAX.WARN}	Maximum acceptable value of node's health score for WARNING trigger expression.	`2`
{$CONSUL.NODE.HEALTH_SCORE.MAX.HIGH}	Maximum acceptable value of node's health score for AVERAGE trigger expression.	`4`

Items

Name	Description	Type	Key and additional info
Consul: Get instance metrics	Get raw metrics from Consul instance /metrics endpoint.	HTTP agent	consul.get_metrics Preprocessing Check for not supported value ⛔️Custom on fail: Discard value
Consul: Get node info	Get configuration and member information of the local agent.	HTTP agent	consul.get_node_info Preprocessing Check for not supported value ⛔️Custom on fail: Discard value
Consul: Role	Role of current Consul agent.	Dependent item	consul.role Preprocessing JSON Path: `$.Config.Server` Boolean to decimal Discard unchanged with heartbeat: `3h`
Consul: Version	Version of Consul agent.	Dependent item	consul.version Preprocessing JSON Path: `$.Config.Version` Discard unchanged with heartbeat: `3h`
Consul: Number of services	Number of services on current node.	Dependent item	consul.services_number Preprocessing JSON Path: `$.Stats.agent.services` Discard unchanged with heartbeat: `3h`
Consul: Number of checks	Number of checks on current node.	Dependent item	consul.checks_number Preprocessing JSON Path: `$.Stats.agent.checks` Discard unchanged with heartbeat: `3h`
Consul: Number of check monitors	Number of check monitors on current node.	Dependent item	consul.check_monitors_number Preprocessing JSON Path: `$.Stats.agent.check_monitors` Discard unchanged with heartbeat: `3h`
Consul: Process CPU seconds, total	Total user and system CPU time spent in seconds.	Dependent item	consul.cpu_seconds_total.rate Preprocessing Prometheus pattern: `VALUE(process_cpu_seconds_total)` ⛔️Custom on fail: Discard value Change per second
Consul: Virtual memory size	Virtual memory size in bytes.	Dependent item	consul.virtual_memory_bytes Preprocessing Prometheus pattern: `VALUE(process_virtual_memory_bytes)`
Consul: RSS memory usage	Resident memory size in bytes.	Dependent item	consul.resident_memory_bytes Preprocessing Prometheus pattern: `VALUE(process_resident_memory_bytes)`
Consul: Goroutine count	The number of Goroutines on Consul instance.	Dependent item	consul.goroutines Preprocessing Prometheus pattern: `VALUE(go_goroutines)`
Consul: Open file descriptors	Number of open file descriptors.	Dependent item	consul.process_open_fds Preprocessing Prometheus pattern: `VALUE(process_open_fds)`
Consul: Open file descriptors, max	Maximum number of open file descriptors.	Dependent item	consul.process_max_fds Preprocessing Prometheus pattern: `VALUE(process_max_fds)`
Consul: Client RPC, per second	Number of times per second whenever a Consul agent in client mode makes an RPC request to a Consul server. This gives a measure of how much a given agent is loading the Consul servers. This is only generated by agents in client mode, not Consul servers.	Dependent item	consul.client_rpc Preprocessing Prometheus pattern: `VALUE(consul_client_rpc)` ⛔️Custom on fail: Discard value Change per second
Consul: Client RPC failed, per second	Number of times per second whenever a Consul agent in client mode makes an RPC request to a Consul server and fails.	Dependent item	consul.client_rpc_failed Preprocessing Prometheus pattern: `VALUE(consul_client_rpc_failed)` ⛔️Custom on fail: Discard value Change per second
Consul: TCP connections, accepted per second	This metric counts the number of times a Consul agent has accepted an incoming TCP stream connection per second.	Dependent item	consul.memberlist.tcp_accept Preprocessing Prometheus pattern: `VALUE(consul_memberlist_tcp_accept)` ⛔️Custom on fail: Discard value Change per second
Consul: TCP connections, per second	This metric counts the number of times a Consul agent has initiated a push/pull sync with another agent per second.	Dependent item	consul.memberlist.tcp_connect Preprocessing Prometheus pattern: `VALUE(consul_memberlist_tcp_connect)` ⛔️Custom on fail: Discard value Change per second
Consul: TCP send bytes, per second	This metric measures the total number of bytes sent by a Consul agent through the TCP protocol per second.	Dependent item	consul.memberlist.tcp_sent Preprocessing Prometheus pattern: `VALUE(consul_memberlist_tcp_sent)` ⛔️Custom on fail: Discard value Change per second
Consul: UDP received bytes, per second	This metric measures the total number of bytes received by a Consul agent through the UDP protocol per second.	Dependent item	consul.memberlist.udp_received Preprocessing Prometheus pattern: `VALUE(consul_memberlist_udp_received)` ⛔️Custom on fail: Discard value Change per second
Consul: UDP sent bytes, per second	This metric measures the total number of bytes sent by a Consul agent through the UDP protocol per second.	Dependent item	consul.memberlist.udp_sent Preprocessing Prometheus pattern: `VALUE(consul_memberlist_udp_sent)` ⛔️Custom on fail: Discard value Change per second
Consul: GC pause, p90	The 90 percentile for the number of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started, in milliseconds.	Dependent item	consul.gc_pause.p90 Preprocessing Prometheus pattern: `VALUE(consul_runtime_gc_pause_ns{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.` Custom multiplier: `1.0E-9`
Consul: GC pause, p50	The 50 percentile (median) for the number of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started, in milliseconds.	Dependent item	consul.gc_pause.p50 Preprocessing Prometheus pattern: `VALUE(consul_runtime_gc_pause_ns{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.` Custom multiplier: `1.0E-9`
Consul: Memberlist: degraded	This metric counts the number of times the Consul agent has performed failure detection on another agent at a slower probe rate. The agent uses its own health metric as an indicator to perform this action. If its health score is low, it means that the node is healthy, and vice versa.	Dependent item	consul.memberlist.degraded Preprocessing Prometheus pattern: `VALUE(consul_memberlist_degraded)` ⛔️Custom on fail: Discard value
Consul: Memberlist: health score	This metric describes a node's perception of its own health based on how well it is meeting the soft real-time requirements of the protocol. This metric ranges from 0 to 8, where 0 indicates "totally healthy".	Dependent item	consul.memberlist.health_score Preprocessing Prometheus pattern: `VALUE(consul_memberlist_health_score)` ⛔️Custom on fail: Discard value
Consul: Memberlist: gossip, p90	The 90 percentile for the number of gossips (messages) broadcasted to a set of randomly selected nodes.	Dependent item	consul.memberlist.dispatch_log.p90 Preprocessing Prometheus pattern: `VALUE(consul_memberlist_gossip{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: Memberlist: gossip, p50	The 50 percentile (median) for the number of gossips (messages) broadcasted to a set of randomly selected nodes.	Dependent item	consul.memberlist.gossip.p50 Preprocessing Prometheus pattern: `VALUE(consul_memberlist_gossip{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: Memberlist: msg alive	This metric counts the number of alive Consul agents, that the agent has mapped out so far, based on the message information given by the network layer.	Dependent item	consul.memberlist.msg.alive Preprocessing Prometheus pattern: `VALUE(consul_memberlist_msg_alive)` ⛔️Custom on fail: Discard value
Consul: Memberlist: msg dead	This metric counts the number of times a Consul agent has marked another agent to be a dead node.	Dependent item	consul.memberlist.msg.dead Preprocessing Prometheus pattern: `VALUE(consul_memberlist_msg_dead)` ⛔️Custom on fail: Discard value
Consul: Memberlist: msg suspect	The number of times a Consul agent suspects another as failed while probing during gossip protocol.	Dependent item	consul.memberlist.msg.suspect Preprocessing Prometheus pattern: `VALUE(consul_memberlist_msg_suspect)` ⛔️Custom on fail: Discard value
Consul: Memberlist: probe node, p90	The 90 percentile for the time taken to perform a single round of failure detection on a select Consul agent.	Dependent item	consul.memberlist.probe_node.p90 Preprocessing Prometheus pattern: `VALUE(consul_memberlist_probeNode{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: Memberlist: probe node, p50	The 50 percentile (median) for the time taken to perform a single round of failure detection on a select Consul agent.	Dependent item	consul.memberlist.probe_node.p50 Preprocessing Prometheus pattern: `VALUE(consul_memberlist_probeNode{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: Memberlist: push pull node, p90	The 90 percentile for the number of Consul agents that have exchanged state with this agent.	Dependent item	consul.memberlist.push_pull_node.p90 Preprocessing Prometheus pattern: `VALUE(consul_memberlist_pushPullNode{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: Memberlist: push pull node, p50	The 50 percentile (median) for the number of Consul agents that have exchanged state with this agent.	Dependent item	consul.memberlist.push_pull_node.p50 Preprocessing Prometheus pattern: `VALUE(consul_memberlist_pushPullNode{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: KV store: apply, p90	The 90 percentile for the time it takes to complete an update to the KV store.	Dependent item	consul.kvs.apply.p90 Preprocessing Prometheus pattern: `VALUE(consul_kvs_apply{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: KV store: apply, p50	The 50 percentile (median) for the time it takes to complete an update to the KV store.	Dependent item	consul.kvs.apply.p50 Preprocessing Prometheus pattern: `VALUE(consul_kvs_apply{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: KV store: apply, rate	The number of updates to the KV store per second.	Dependent item	consul.kvs.apply.rate Preprocessing Prometheus pattern: `VALUE(consul_kvs_apply_count)` ⛔️Custom on fail: Discard value Change per second
Consul: Serf member: flap, rate	Increments when an agent is marked dead and then recovers within a short time period. This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports. Shown as events per second.	Dependent item	consul.serf.member.flap.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_member_flap)` ⛔️Custom on fail: Discard value Change per second
Consul: Serf member: failed, rate	Increments when an agent is marked dead. This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports. Shown as events per second.	Dependent item	consul.serf.member.failed.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_member_failed)` ⛔️Custom on fail: Discard value Change per second
Consul: Serf member: join, rate	Increments when an agent joins the cluster. If an agent flapped or failed this counter also increments when it re-joins. Shown as events per second.	Dependent item	consul.serf.member.join.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_member_join)` ⛔️Custom on fail: Discard value Change per second
Consul: Serf member: left, rate	Increments when an agent leaves the cluster. Shown as events per second.	Dependent item	consul.serf.member.left.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_member_left)` ⛔️Custom on fail: Discard value Change per second
Consul: Serf member: update, rate	Increments when a Consul agent updates. Shown as events per second.	Dependent item	consul.serf.member.update.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_member_update)` ⛔️Custom on fail: Discard value Change per second
Consul: ACL: resolves, rate	The number of ACL resolves per second.	Dependent item	consul.acl.resolves.rate Preprocessing Prometheus pattern: `VALUE(consul_acl_ResolveToken_count)` ⛔️Custom on fail: Discard value Change per second
Consul: Catalog: register, rate	The number of catalog register operations per second.	Dependent item	consul.catalog.register.rate Preprocessing Prometheus pattern: `VALUE(consul_catalog_register_count)` ⛔️Custom on fail: Discard value Change per second
Consul: Catalog: deregister, rate	The number of catalog deregister operations per second.	Dependent item	consul.catalog.deregister.rate Preprocessing Prometheus pattern: `VALUE(consul_catalog_deregister_count)` ⛔️Custom on fail: Discard value Change per second
Consul: Snapshot: append line, p90	The 90 percentile for the time taken by the Consul agent to append an entry into the existing log.	Dependent item	consul.snapshot.append_line.p90 Preprocessing Prometheus pattern: `VALUE(consul_serf_snapshot_appendLine{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: Snapshot: append line, p50	The 50 percentile (median) for the time taken by the Consul agent to append an entry into the existing log.	Dependent item	consul.snapshot.append_line.p50 Preprocessing Prometheus pattern: `VALUE(consul_serf_snapshot_appendLine{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: Snapshot: append line, rate	The number of snapshot appendLine operations per second.	Dependent item	consul.snapshot.append_line.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_snapshot_appendLine_count)` ⛔️Custom on fail: Discard value Change per second
Consul: Snapshot: compact, p90	The 90 percentile for the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction.	Dependent item	consul.snapshot.compact.p90 Preprocessing Prometheus pattern: `VALUE(consul_serf_snapshot_compact{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: Snapshot: compact, p50	The 50 percentile (median) for the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction.	Dependent item	consul.snapshot.compact.p50 Preprocessing Prometheus pattern: `VALUE(consul_serf_snapshot_compact{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: Snapshot: compact, rate	The number of snapshot compact operations per second.	Dependent item	consul.snapshot.compact.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_snapshot_compact_count)` ⛔️Custom on fail: Discard value Change per second
Consul: Get local services	Get all the services that are registered with the local agent and their status.	Script	consul.get_local_services
Consul: Get local services check	Data collection check.	Dependent item	consul.get_local_services.check Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
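
Most dependent items in the table above extract one sample from the raw Prometheus text with a `VALUE(...)` pattern and, for counters, apply "Change per second". The following Python sketch illustrates that idea only; it is not the Zabbix implementation. The sample payload and the helper names `prometheus_value` and `change_per_second` are assumptions made for illustration.

```python
import re

# Hypothetical excerpt of a Prometheus-format response (values are made up).
SAMPLE = '''\
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 42
consul_runtime_gc_pause_ns{quantile="0.5"} 120000
consul_runtime_gc_pause_ns{quantile="0.9"} 350000
consul_kvs_apply_count 1500
'''

LINE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"')


def prometheus_value(text, metric, labels=None):
    """Return the value of the first sample of `metric` carrying `labels`."""
    wanted = labels or {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = LINE_RE.match(line)
        if not m or m.group(1) != metric:
            continue
        found = dict(LABEL_RE.findall(m.group(2) or ""))
        if all(found.get(k) == v for k, v in wanted.items()):
            return float(m.group(3))
    return None  # no match; roughly comparable to "Discard value" on failure


def change_per_second(prev_value, prev_ts, value, ts):
    """(value - previous value) / (time - previous time), timestamps in seconds."""
    return (value - prev_value) / (ts - prev_ts)
```

For instance, `prometheus_value(SAMPLE, "consul_runtime_gc_pause_ns", {"quantile": "0.9"})` selects the p90 sample, and two readings of `consul_kvs_apply_count` taken 60 seconds apart can be passed to `change_per_second` to obtain a per-second rate.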

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
Consul: Version has been changed	Consul version has changed. Acknowledge to close the problem manually.	`last(/HashiCorp Consul Node by HTTP/consul.version,#1)<>last(/HashiCorp Consul Node by HTTP/consul.version,#2) and length(last(/HashiCorp Consul Node by HTTP/consul.version))>0`	Info	Manual close: Yes
Consul: Current number of open files is too high	"Heavy file descriptor usage (i.e., near the process’s file descriptor limit) indicates a potential file descriptor exhaustion issue."	`min(/HashiCorp Consul Node by HTTP/consul.process_open_fds,5m)/last(/HashiCorp Consul Node by HTTP/consul.process_max_fds)*100>{$CONSUL.OPEN.FDS.MAX.WARN}`	Warning
Consul: Node's health score is warning	This metric ranges from 0 to 8, where 0 indicates "totally healthy". This health score is used to scale the time between outgoing probes, and higher scores translate into longer probing intervals. For more details see section IV of the Lifeguard paper: https://arxiv.org/pdf/1707.00788.pdf	`max(/HashiCorp Consul Node by HTTP/consul.memberlist.health_score,#3)>{$CONSUL.NODE.HEALTH_SCORE.MAX.WARN}`	Warning	Depends on: Consul: Node's health score is critical
Consul: Node's health score is critical	This metric ranges from 0 to 8, where 0 indicates "totally healthy". This health score is used to scale the time between outgoing probes, and higher scores translate into longer probing intervals. For more details see section IV of the Lifeguard paper: https://arxiv.org/pdf/1707.00788.pdf	`max(/HashiCorp Consul Node by HTTP/consul.memberlist.health_score,#3)>{$CONSUL.NODE.HEALTH_SCORE.MAX.HIGH}`	Average
Consul: Failed to get local services	Failed to get local services. Check debug log for more information.	`length(last(/HashiCorp Consul Node by HTTP/consul.get_local_services.check))>0`	Warning
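
The threshold logic of the file descriptor and health score triggers above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not how Zabbix evaluates expressions: the function names are hypothetical, the defaults mirror the macro defaults (`90`, `2`, `4`), and the trigger dependency is modelled by returning only the higher severity.

```python
def open_fds_too_high(open_fds_5m, max_fds, warn_percent=90):
    """min(open fds over 5m) / last(max fds) * 100 > {$CONSUL.OPEN.FDS.MAX.WARN}."""
    return min(open_fds_5m) / max_fds * 100 > warn_percent


def health_score_severity(scores, warn=2, high=4):
    """Compare the maximum of the last three health scores with both thresholds.

    Returns "Average" for the critical trigger, "Warning" for the warning
    trigger, or None. The warning trigger depends on the critical one, so
    only the higher severity is reported.
    """
    worst = max(scores[-3:])
    if worst > high:
        return "Average"
    if worst > warn:
        return "Warning"
    return None
```

Because the file descriptor check uses the minimum over the window, a single low reading within the five minutes is enough to keep the trigger from firing, which filters out short spikes.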

LLD rule Local node services discovery

Name	Description	Type	Key and additional info
Local node services discovery	Discover metrics for services that are registered with the local agent.	Dependent item	consul.node_services_lld Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Local node services discovery

Name	Description	Type	Key and additional info
Consul: ["{#SERVICE_NAME}"]: Aggregated status	Aggregated values of all health checks for the service instance.	Dependent item	consul.service.aggregated_state["{#SERVICE_ID}"] Preprocessing JSON Path: `$[?(@.id == "{#SERVICE_ID}")].status.first()` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
Consul: ["{#SERVICE_NAME}"]: Check ["{#SERVICE_CHECK_NAME}"]: Status	Current state of health check for the service.	Dependent item	consul.service.check.state["{#SERVICE_ID}/{#SERVICE_CHECK_ID}"] Preprocessing JSON Path: `The text is too long. Please see the template.` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
Consul: ["{#SERVICE_NAME}"]: Check ["{#SERVICE_CHECK_NAME}"]: Output	Current output of health check for the service.	Dependent item	consul.service.check.output["{#SERVICE_ID}/{#SERVICE_CHECK_ID}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Trigger prototypes for Local node services discovery

Name	Description	Expression	Severity	Dependencies and additional info
Consul: Aggregated status is 'warning'	Aggregated state of service on the local agent is 'warning'.	`last(/HashiCorp Consul Node by HTTP/consul.service.aggregated_state["{#SERVICE_ID}"]) = 1`	Warning
Consul: Aggregated status is 'critical'	Aggregated state of service on the local agent is 'critical'.	`last(/HashiCorp Consul Node by HTTP/consul.service.aggregated_state["{#SERVICE_ID}"]) = 2`	Average
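
The trigger prototypes above compare the aggregated status with the numeric values `1` (warning) and `2` (critical). A minimal Python sketch of how a status string could be selected by service ID and turned into such a number is shown below. The template's actual JavaScript is not reproduced in this document, so the input shape, the `passing` to `0` mapping, and the function name are assumptions.

```python
import json

# 1 and 2 follow the trigger expressions; 0 for "passing" is an assumption.
STATUS_TO_CODE = {"passing": 0, "warning": 1, "critical": 2}


def aggregated_state(services_json, service_id):
    """Pick the first status entry for `service_id` and map it to a number."""
    services = json.loads(services_json)
    statuses = [s["status"] for s in services if s.get("id") == service_id]
    if not statuses:
        return None  # service not present in the collected data
    return STATUS_TO_CODE.get(statuses[0])


# Hypothetical payload shaped like the JSON Path in the table expects.
sample = json.dumps([
    {"id": "web-1", "status": "warning"},
    {"id": "db-1", "status": "critical"},
])
```

With this sample, `aggregated_state(sample, "web-1")` yields the value that the 'warning' trigger prototype tests for.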

LLD rule HTTP API methods discovery

Name	Description	Type	Key and additional info
HTTP API methods discovery	Discover metrics specific to HTTP API methods.	Dependent item	consul.http_api_discovery Preprocessing Prometheus to JSON: `consul_api_http{method =~ ".*"}` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for HTTP API methods discovery

Name	Description	Type	Key and additional info
Consul: HTTP request: ["{#HTTP_METHOD}"], p90	The 90 percentile of how long it takes to service the given HTTP request for the given verb.	Dependent item	consul.http.api.p90["{#HTTP_METHOD}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Consul: HTTP request: ["{#HTTP_METHOD}"], p50	The 50 percentile (median) of how long it takes to service the given HTTP request for the given verb.	Dependent item	consul.http.api.p50["{#HTTP_METHOD}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Consul: HTTP request: ["{#HTTP_METHOD}"], rate	The number of HTTP requests for the given verb per second.	Dependent item	consul.http.api.rate["{#HTTP_METHOD}"] Preprocessing Prometheus pattern: `SUM(consul_api_http_count{method = "{#HTTP_METHOD}"})` ⛔️Custom on fail: Discard value Change per second
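
To illustrate how one `{#HTTP_METHOD}` row per verb can be derived from `consul_api_http` samples, and how the `SUM(...)` pattern adds up the counters of one verb across its other labels, here is a hedged Python sketch. The sample lines, the `path` label values, and the function names are assumptions; the template's own discovery script is not shown in this document.

```python
import re

# Hypothetical samples; label values are made up for illustration.
SAMPLE = '''\
consul_api_http{method="GET",path="v1_agent_metrics",quantile="0.5"} 0.4
consul_api_http{method="GET",path="v1_agent_self",quantile="0.5"} 0.2
consul_api_http{method="PUT",path="v1_kv",quantile="0.5"} 1.1
consul_api_http_count{method="GET",path="v1_agent_metrics"} 10
consul_api_http_count{method="GET",path="v1_agent_self"} 5
consul_api_http_count{method="PUT",path="v1_kv"} 2
'''

SAMPLE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse(text):
    """Yield (metric name, labels dict, value) for every labelled sample."""
    for line in text.splitlines():
        m = SAMPLE_RE.match(line)
        if m:
            yield m.group(1), dict(LABEL_RE.findall(m.group(2))), float(m.group(3))


def discover_methods(text):
    """Build one low-level discovery row per distinct HTTP method."""
    methods = sorted({labels["method"] for name, labels, _ in parse(text)
                      if name == "consul_api_http" and "method" in labels})
    return [{"{#HTTP_METHOD}": method} for method in methods]


def sum_count(text, method):
    """Sum consul_api_http_count over all samples of the given method."""
    return sum(value for name, labels, value in parse(text)
               if name == "consul_api_http_count" and labels.get("method") == method)
```

Deduplicating on the method label is what keeps the number of discovered entities small even when many paths are reported for the same verb.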

LLD rule Raft server metrics discovery

Name	Description	Type	Key and additional info
Raft server metrics discovery	Discover raft metrics for server nodes.	Dependent item	consul.raft.server.discovery Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Raft server metrics discovery

Name	Description	Type	Key and additional info
Consul: Raft state	Current state of Consul agent.	Dependent item	consul.raft.state[{#SINGLETON}] Preprocessing JSON Path: `$.Stats.raft.state` Discard unchanged with heartbeat: `3h`
Consul: Raft state: leader	Increments when a server becomes a leader.	Dependent item	consul.raft.state_leader[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_state_leader)` ⛔️Custom on fail: Discard value
Consul: Raft state: candidate	The number of initiated leader elections.	Dependent item	consul.raft.state_candidate[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_state_candidate)` ⛔️Custom on fail: Discard value
Consul: Raft: apply, rate	Incremented whenever a leader first passes a message into the Raft commit process (called an Apply operation). This metric describes the arrival rate of new logs into Raft per second.	Dependent item	consul.raft.apply.rate[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_apply)` ⛔️Custom on fail: Discard value Change per second

LLD rule Raft leader metrics discovery

Name	Description	Type	Key and additional info
Raft leader metrics discovery	Discover raft metrics for leader nodes.	Dependent item	consul.raft.leader.discovery Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Raft leader metrics discovery

Name	Description	Type	Key and additional info
Consul: Raft state: leader last contact, p90	The 90 percentile of how long it takes a leader node to communicate with followers during a leader lease check, in milliseconds.	Dependent item	consul.raft.leader_last_contact.p90[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_leader_lastContact{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: Raft state: leader last contact, p50	The 50 percentile (median) of how long it takes a leader node to communicate with followers during a leader lease check, in milliseconds.	Dependent item	consul.raft.leader_last_contact.p50[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_leader_lastContact{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: Raft state: commit time, p90	The 90 percentile time it takes to commit a new entry to the raft log on the leader, in milliseconds.	Dependent item	consul.raft.commit_time.p90[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_commitTime{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: Raft state: commit time, p50	The 50 percentile (median) time it takes to commit a new entry to the raft log on the leader, in milliseconds.	Dependent item	consul.raft.commit_time.p50[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_commitTime{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: Raft state: dispatch log, p90	The 90 percentile time it takes for the leader to write log entries to disk, in milliseconds.	Dependent item	consul.raft.dispatch_log.p90[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_leader_dispatchLog{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: Raft state: dispatch log, p50	The 50 percentile (median) time it takes for the leader to write log entries to disk, in milliseconds.	Dependent item	consul.raft.dispatch_log.p50[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_leader_dispatchLog{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: Raft state: dispatch log, rate	The number of times a Raft leader writes a log to disk per second.	Dependent item	consul.raft.dispatch_log.rate[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_leader_dispatchLog_count)` ⛔️Custom on fail: Discard value Change per second
Consul: Raft state: commit, rate	The number of commits of new entries to the Raft log on the leader per second.	Dependent item	consul.raft.commit_time.rate[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_commitTime_count)` ⛔️Custom on fail: Discard value Change per second
Consul: Autopilot healthy	Tracks the overall health of the local server cluster. 1 if all servers are healthy, 0 if one or more are unhealthy.	Dependent item	consul.autopilot.healthy[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_autopilot_healthy)` ⛔️Custom on fail: Discard value

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help on the Zabbix forums.

This template is for Zabbix version: 6.2

Also available for: 7.4 7.2 7.0 6.4 6.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/consul_http/consul?at=release/6.2

HashiCorp Consul Node by HTTP

Overview

For Zabbix version: 6.2 and higher.
This template monitors HashiCorp Consul with Zabbix and works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Do not forget to enable the Prometheus format for exported metrics. See the documentation.
More information about the metrics can be found in the official documentation.

Template HashiCorp Consul Node by HTTP — collects metrics by HTTP agent from /v1/agent/metrics endpoint.

This template was tested on:

HashiCorp Consul, version 1.10.0

Setup

Don't forget to change the macros {$CONSUL.NODE.API.URL} and {$CONSUL.TOKEN}.
Also, see the Macros section for a list of macros used to set trigger values.
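
As a rough illustration of how the two macros combine into a request against the `/v1/agent/metrics` endpoint, the Python sketch below builds (but does not send) such a request. It assumes the Prometheus format is requested with the `format=prometheus` query parameter and that the token is passed as a bearer token; which header the template itself sends is not stated here, and the function name is hypothetical.

```python
from urllib.request import Request


def build_metrics_request(api_url, token):
    """Build a GET request for the agent metrics in Prometheus format.

    api_url plays the role of {$CONSUL.NODE.API.URL}, token of {$CONSUL.TOKEN}.
    """
    url = api_url.rstrip("/") + "/v1/agent/metrics?format=prometheus"
    # Assumption: bearer-token authorization is used for the API token.
    return Request(url, headers={"Authorization": "Bearer " + token})


request = build_metrics_request("http://localhost:8500", "example-token")
```

Passing `request` to `urllib.request.urlopen` would perform the call against a reachable Consul agent, which is a quick way to check the URL and token before assigning them to the host macros.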

NOTE. Some metrics may not be collected depending on your HashiCorp Consul instance version and configuration.
NOTE. You may also be interested in the Envoy Proxy by HTTP template.

Zabbix configuration

No specific Zabbix configuration is required.

Macros used

Name	Description	Default
{$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.MATCHES}	Filter for discoverable services on the local node.	`.*`
{$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.NOT_MATCHES}	Filter to exclude discovered services on the local node.	`CHANGE IF NEEDED`
{$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.MATCHES}	Filter for discoverable services by namespace on the local node. Enterprise only; in the Open Source version, the namespace is set to 'None'.	`.*`
{$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.NOT_MATCHES}	Filter to exclude discovered services by namespace on the local node. Enterprise only; in the Open Source version, the namespace is set to 'None'.	`CHANGE IF NEEDED`
{$CONSUL.NODE.API.URL}	Consul instance URL.	`http://localhost:8500`
{$CONSUL.NODE.HEALTH_SCORE.MAX.HIGH}	Maximum acceptable value of node's health score for AVERAGE trigger expression.	`4`
{$CONSUL.NODE.HEALTH_SCORE.MAX.WARN}	Maximum acceptable value of node's health score for WARNING trigger expression.	`2`
{$CONSUL.OPEN.FDS.MAX.WARN}	Maximum percentage of used file descriptors.	`90`
{$CONSUL.TOKEN}	Consul auth token.	`<PUT YOUR AUTH TOKEN>`

Template links

There are no template links in this template.

Discovery rules

Name	Description	Type	Key and additional info
HTTP API methods discovery	Discover metrics specific to HTTP API methods.	DEPENDENT	consul.http_api_discovery Preprocessing: - PROMETHEUS_TO_JSON: `consul_api_http{method =~ ".*"}` - JAVASCRIPT: `The text is too long. Please see the template.` - DISCARD_UNCHANGED_HEARTBEAT: `3h`
Local node services discovery	Discover metrics for services that are registered with the local agent.	DEPENDENT	consul.node_services_lld Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.` - DISCARD_UNCHANGED_HEARTBEAT: `3h` Filter: - {#SERVICE_NAME} MATCHES_REGEX `{$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.MATCHES}` - {#SERVICE_NAME} NOT_MATCHES_REGEX `{$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.NOT_MATCHES}` - {#SERVICE_NAMESPACE} MATCHES_REGEX `{$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.MATCHES}` - {#SERVICE_NAMESPACE} NOT_MATCHES_REGEX `{$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.NOT_MATCHES}` Overrides: aggregated status - {#TYPE} MATCHES_REGEX `aggregated_status` - ITEM_PROTOTYPE LIKE `Aggregated status` - DISCOVER - ITEM_PROTOTYPE LIKE `State` - DISCOVER checks - {#TYPE} MATCHES_REGEX `service_check` - ITEM_PROTOTYPE LIKE `Check` - DISCOVER
Raft leader metrics discovery	Discover raft metrics for leader nodes.	DEPENDENT	consul.raft.leader.discovery Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.` - DISCARD_UNCHANGED_HEARTBEAT: `3h`
Raft server metrics discovery	Discover raft metrics for server nodes.	DEPENDENT	consul.raft.server.discovery Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.` - DISCARD_UNCHANGED_HEARTBEAT: `3h`
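
The filter on the local node services discovery rule pairs each LLD macro with a MATCHES_REGEX and a NOT_MATCHES_REGEX condition. The Python sketch below shows the intended effect, assuming all four conditions are combined with AND; the evaluation type is not listed in the table, and the function name is hypothetical. The defaults are the macro defaults from the Macros section.

```python
import re

DEFAULT_MACROS = {
    "{$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.MATCHES}": ".*",
    "{$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.NOT_MATCHES}": "CHANGE IF NEEDED",
    "{$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.MATCHES}": ".*",
    "{$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.NOT_MATCHES}": "CHANGE IF NEEDED",
}


def passes_filter(row, macros=DEFAULT_MACROS):
    """Return True if a discovered row satisfies all four conditions (AND assumed)."""
    name = row["{#SERVICE_NAME}"]
    namespace = row["{#SERVICE_NAMESPACE}"]
    return (
        re.search(macros["{$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.MATCHES}"], name) is not None
        and re.search(macros["{$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.NOT_MATCHES}"], name) is None
        and re.search(macros["{$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.MATCHES}"], namespace) is not None
        and re.search(macros["{$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.NOT_MATCHES}"], namespace) is None
    )


# Example override that excludes the service named exactly "consul".
custom = dict(DEFAULT_MACROS)
custom["{$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.NOT_MATCHES}"] = "^consul$"
```

The `CHANGE IF NEEDED` default acts as a placeholder pattern that ordinary service names and namespaces do not contain, so nothing is excluded until the macro is overridden.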

Items collected

Group	Name	Description	Type	Key and additional info
Consul	Consul: Role	Role of current Consul agent.	DEPENDENT	consul.role Preprocessing: - JSONPATH: `$.Config.Server` - BOOL_TO_DECIMAL - DISCARD_UNCHANGED_HEARTBEAT: `3h`
Consul	Consul: Version	Version of Consul agent.	DEPENDENT	consul.version Preprocessing: - JSONPATH: `$.Config.Version` - DISCARD_UNCHANGED_HEARTBEAT: `3h`
Consul	Consul: Number of services	Number of services on current node.	DEPENDENT	consul.services_number Preprocessing: - JSONPATH: `$.Stats.agent.services` - DISCARD_UNCHANGED_HEARTBEAT: `3h`
Consul	Consul: Number of checks	Number of checks on current node.	DEPENDENT	consul.checks_number Preprocessing: - JSONPATH: `$.Stats.agent.checks` - DISCARD_UNCHANGED_HEARTBEAT: `3h`
Consul	Consul: Number of check monitors	Number of check monitors on current node.	DEPENDENT	consul.check_monitors_number Preprocessing: - JSONPATH: `$.Stats.agent.check_monitors` - DISCARD_UNCHANGED_HEARTBEAT: `3h`
Consul	Consul: Process CPU seconds, total	Total user and system CPU time spent in seconds.	DEPENDENT	consul.cpu_seconds_total.rate Preprocessing: - PROMETHEUS_PATTERN: `process_cpu_seconds_total` ⛔️ON_FAIL: `DISCARD_VALUE ->` - CHANGE_PER_SECOND
Consul	Consul: Virtual memory size	Virtual memory size in bytes.	DEPENDENT	consul.virtual_memory_bytes Preprocessing: - PROMETHEUS_PATTERN: `process_virtual_memory_bytes`
Consul	Consul: RSS memory usage	Resident memory size in bytes.	DEPENDENT	consul.resident_memory_bytes Preprocessing: - PROMETHEUS_PATTERN: `process_resident_memory_bytes`
Consul	Consul: Goroutine count	The number of Goroutines on Consul instance.	DEPENDENT	consul.goroutines Preprocessing: - PROMETHEUS_PATTERN: `go_goroutines`
Consul	Consul: Open file descriptors	Number of open file descriptors.	DEPENDENT	consul.process_open_fds Preprocessing: - PROMETHEUS_PATTERN: `process_open_fds`
Consul	Consul: Open file descriptors, max	Maximum number of open file descriptors.	DEPENDENT	consul.process_max_fds Preprocessing: - PROMETHEUS_PATTERN: `process_max_fds`
Consul	Consul: Client RPC, per second	Number of times per second whenever a Consul agent in client mode makes an RPC request to a Consul server. This gives a measure of how much a given agent is loading the Consul servers. This is only generated by agents in client mode, not Consul servers.	DEPENDENT	consul.client_rpc Preprocessing: - PROMETHEUS_PATTERN: `consul_client_rpc` ⛔️ON_FAIL: `DISCARD_VALUE ->` - CHANGE_PER_SECOND
Consul	Consul: Client RPC failed ,per second	Number of times per second whenever a Consul agent in client mode makes an RPC request to a Consul server and fails.	DEPENDENT	consul.client_rpc_failed Preprocessing: - PROMETHEUS_PATTERN: `consul_client_rpc_failed` ⛔️ON_FAIL: `DISCARD_VALUE ->` - CHANGE_PER_SECOND
Consul	Consul: TCP connections, accepted per second	This metric counts the number of times a Consul agent has accepted an incoming TCP stream connection per second.	DEPENDENT	consul.memberlist.tcp_accept Preprocessing: - PROMETHEUS_PATTERN: `consul_memberlist_tcp_accept` ⛔️ON_FAIL: `DISCARD_VALUE ->` - CHANGE_PER_SECOND
Consul	Consul: TCP connections, per second	This metric counts the number of times a Consul agent has initiated a push/pull sync with another agent per second.	DEPENDENT	consul.memberlist.tcp_connect Preprocessing: - PROMETHEUS_PATTERN: `consul_memberlist_tcp_connect` ⛔️ON_FAIL: `DISCARD_VALUE ->` - CHANGE_PER_SECOND
Consul	Consul: TCP send bytes, per second	This metric measures the total number of bytes sent by a Consul agent through the TCP protocol per second.	DEPENDENT	consul.memberlist.tcp_sent Preprocessing: - PROMETHEUS_PATTERN: `consul_memberlist_tcp_sent` ⛔️ON_FAIL: `DISCARD_VALUE ->` - CHANGE_PER_SECOND
Consul	Consul: UDP received bytes, per second	This metric measures the total number of bytes received by a Consul agent through the UDP protocol per second.	DEPENDENT	consul.memberlist.udp_received Preprocessing: - PROMETHEUS_PATTERN: `consul_memberlist_udp_received` ⛔️ON_FAIL: `DISCARD_VALUE ->` - CHANGE_PER_SECOND
Consul	Consul: UDP sent bytes, per second	This metric measures the total number of bytes sent by a Consul agent through the UDP protocol per second.	DEPENDENT	consul.memberlist.udp_sent Preprocessing: - PROMETHEUS_PATTERN: `consul_memberlist_udp_sent` ⛔️ON_FAIL: `DISCARD_VALUE ->` - CHANGE_PER_SECOND
Consul	Consul: GC pause, p90	The 90 percentile for the time consumed by stop-the-world garbage collection (GC) pauses since Consul started. Consul reports the value in nanoseconds; the `1.0E-9` multiplier converts it to seconds.	DEPENDENT	consul.gc_pause.p90 Preprocessing: - PROMETHEUS_PATTERN: `consul_runtime_gc_pause_ns{quantile="0.9"}` ⛔️ON_FAIL: `DISCARD_VALUE ->` - JAVASCRIPT: `return (isNaN(value)) ? 0 : value;` - MULTIPLIER: `1.0E-9`
Consul	Consul: GC pause, p50	The 50 percentile (median) for the time consumed by stop-the-world garbage collection (GC) pauses since Consul started. Consul reports the value in nanoseconds; the `1.0E-9` multiplier converts it to seconds.	DEPENDENT	consul.gc_pause.p50 Preprocessing: - PROMETHEUS_PATTERN: `consul_runtime_gc_pause_ns{quantile="0.5"}` ⛔️ON_FAIL: `DISCARD_VALUE ->` - JAVASCRIPT: `return (isNaN(value)) ? 0 : value;` - MULTIPLIER: `1.0E-9`
Consul	Consul: Memberlist: degraded	This metric counts the number of times the Consul agent has performed failure detection on another agent at a slower probe rate. The agent uses its own health metric as an indicator to perform this action. If its health score is low, it means that the node is healthy, and vice versa.	DEPENDENT	consul.memberlist.degraded Preprocessing: - PROMETHEUS_PATTERN: `consul_memberlist_degraded` ⛔️ON_FAIL: `DISCARD_VALUE ->`
Consul	Consul: Memberlist: health score	This metric describes a node's perception of its own health based on how well it is meeting the soft real-time requirements of the protocol. This metric ranges from 0 to 8, where 0 indicates "totally healthy".	DEPENDENT	consul.memberlist.health_score Preprocessing: - PROMETHEUS_PATTERN: `consul_memberlist_health_score` ⛔️ON_FAIL: `DISCARD_VALUE ->`
Consul	Consul: Memberlist: gossip, p90	The 90 percentile for the number of gossips (messages) broadcasted to a set of randomly selected nodes.	DEPENDENT	consul.memberlist.dispatch_log.p90 Preprocessing: - PROMETHEUS_PATTERN: `consul_memberlist_gossip{quantile="0.9"}` ⛔️ON_FAIL: `DISCARD_VALUE ->` - JAVASCRIPT: `return (isNaN(value)) ? 0 : value;`
Consul	Consul: Memberlist: gossip, p50	The 50 percentile (median) for the number of gossips (messages) broadcasted to a set of randomly selected nodes.	DEPENDENT	consul.memberlist.gossip.p50 Preprocessing: - PROMETHEUS_PATTERN: `consul_memberlist_gossip{quantile="0.5"}` ⛔️ON_FAIL: `DISCARD_VALUE ->` - JAVASCRIPT: `return (isNaN(value)) ? 0 : value;`
Consul	Consul: Memberlist: msg alive	This metric counts the number of alive Consul agents, that the agent has mapped out so far, based on the message information given by the network layer.	DEPENDENT	consul.memberlist.msg.alive Preprocessing: - PROMETHEUS_PATTERN: `consul_memberlist_msg_alive` ⛔️ON_FAIL: `DISCARD_VALUE ->`
Consul	Consul: Memberlist: msg dead	This metric counts the number of times a Consul agent has marked another agent to be a dead node.	DEPENDENT	consul.memberlist.msg.dead Preprocessing: - PROMETHEUS_PATTERN: `consul_memberlist_msg_dead` ⛔️ON_FAIL: `DISCARD_VALUE ->`
Consul	Consul: Memberlist: msg suspect	The number of times a Consul agent suspects another as failed while probing during gossip protocol.	DEPENDENT	consul.memberlist.msg.suspect Preprocessing: - PROMETHEUS_PATTERN: `consul_memberlist_msg_suspect` ⛔️ON_FAIL: `DISCARD_VALUE ->`
Consul	Consul: Memberlist: probe node, p90	The 90 percentile for the time taken to perform a single round of failure detection on a select Consul agent.	DEPENDENT	consul.memberlist.probe_node.p90 Preprocessing: - PROMETHEUS_PATTERN: `consul_memberlist_probeNode{quantile="0.9"}` ⛔️ON_FAIL: `DISCARD_VALUE ->` - JAVASCRIPT: `return (isNaN(value)) ? 0 : value;`
Consul	Consul: Memberlist: probe node, p50	The 50 percentile (median) for the time taken to perform a single round of failure detection on a select Consul agent.	DEPENDENT	consul.memberlist.probe_node.p50 Preprocessing: - PROMETHEUS_PATTERN: `consul_memberlist_probeNode{quantile="0.5"}` ⛔️ON_FAIL: `DISCARD_VALUE ->` - JAVASCRIPT: `return (isNaN(value)) ? 0 : value;`
Consul	Consul: Memberlist: push pull node, p90	The 90 percentile for the number of Consul agents that have exchanged state with this agent.	DEPENDENT	consul.memberlist.push_pull_node.p90 Preprocessing: - PROMETHEUS_PATTERN: `consul_memberlist_pushPullNode{quantile="0.9"}` ⛔️ON_FAIL: `DISCARD_VALUE ->` - JAVASCRIPT: `return (isNaN(value)) ? 0 : value;`
Consul	Consul: Memberlist: push pull node, p50	The 50 percentile (median) for the number of Consul agents that have exchanged state with this agent.	DEPENDENT	consul.memberlist.push_pull_node.p50 Preprocessing: - PROMETHEUS_PATTERN: `consul_memberlist_pushPullNode{quantile="0.5"}` ⛔️ON_FAIL: `DISCARD_VALUE ->` - JAVASCRIPT: `return (isNaN(value)) ? 0 : value;`
Consul	Consul: KV store: apply, p90	The 90 percentile for the time it takes to complete an update to the KV store.	DEPENDENT	consul.kvs.apply.p90 Preprocessing: - PROMETHEUS_PATTERN: `consul_kvs_apply{quantile="0.9"}` ⛔️ON_FAIL: `DISCARD_VALUE ->` - JAVASCRIPT: `return (isNaN(value)) ? 0 : value;`
Consul	Consul: KV store: apply, p50	The 50 percentile (median) for the time it takes to complete an update to the KV store.	DEPENDENT	consul.kvs.apply.p50 Preprocessing: - PROMETHEUS_PATTERN: `consul_kvs_apply{quantile="0.5"}` ⛔️ON_FAIL: `DISCARD_VALUE ->` - JAVASCRIPT: `return (isNaN(value)) ? 0 : value;`
Consul	Consul: KV store: apply, rate	The number of updates to the KV store per second.	DEPENDENT	consul.kvs.apply.rate Preprocessing: - PROMETHEUS_PATTERN: `consul_kvs_apply_count` ⛔️ON_FAIL: `DISCARD_VALUE ->` - CHANGE_PER_SECOND
Consul	Consul: Serf member: flap, rate	Increments when an agent is marked dead and then recovers within a short time period. This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports. Shown as events per second.	DEPENDENT	consul.serf.member.flap.rate Preprocessing: - PROMETHEUS_PATTERN: `consul_serf_member_flap` ⛔️ON_FAIL: `DISCARD_VALUE ->` - CHANGE_PER_SECOND
Consul	Consul: Serf member: failed, rate	Increments when an agent is marked dead. This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports. Shown as events per second.	DEPENDENT	consul.serf.member.failed.rate Preprocessing: - PROMETHEUS_PATTERN: `consul_serf_member_failed` ⛔️ON_FAIL: `DISCARD_VALUE ->` - CHANGE_PER_SECOND
Consul	Consul: Serf member: join, rate	Increments when an agent joins the cluster. If an agent flapped or failed this counter also increments when it re-joins. Shown as events per second.	DEPENDENT	consul.serf.member.join.rate Preprocessing: - PROMETHEUS_PATTERN: `consul_serf_member_join` ⛔️ON_FAIL: `DISCARD_VALUE ->` - CHANGE_PER_SECOND
Consul	Consul: Serf member: left, rate	Increments when an agent leaves the cluster. Shown as events per second.	DEPENDENT	consul.serf.member.left.rate Preprocessing: - PROMETHEUS_PATTERN: `consul_serf_member_left` ⛔️ON_FAIL: `DISCARD_VALUE ->` - CHANGE_PER_SECOND
Consul	Consul: Serf member: update, rate	Increments when a Consul agent updates. Shown as events per second.	DEPENDENT	consul.serf.member.update.rate Preprocessing: - PROMETHEUS_PATTERN: `consul_serf_member_update` ⛔️ON_FAIL: `DISCARD_VALUE ->` - CHANGE_PER_SECOND
Consul	Consul: ACL: resolves, rate	The number of ACL resolves per second.	DEPENDENT	consul.acl.resolves.rate Preprocessing: - PROMETHEUS_PATTERN: `consul_acl_ResolveToken_count` ⛔️ON_FAIL: `DISCARD_VALUE ->` - CHANGE_PER_SECOND
Consul	Consul: Catalog: register, rate	The number of catalog register operations per second.	DEPENDENT	consul.catalog.register.rate Preprocessing: - PROMETHEUS_PATTERN: `consul_catalog_register_count` ⛔️ON_FAIL: `DISCARD_VALUE ->` - CHANGE_PER_SECOND
Consul	Consul: Catalog: deregister, rate	The number of catalog deregister operations per second.	DEPENDENT	consul.catalog.deregister.rate Preprocessing: - PROMETHEUS_PATTERN: `consul_catalog_deregister_count` ⛔️ON_FAIL: `DISCARD_VALUE ->` - CHANGE_PER_SECOND
Consul	Consul: Snapshot: append line, p90	The 90 percentile for the time taken by the Consul agent to append an entry into the existing log.	DEPENDENT	consul.snapshot.append_line.p90 Preprocessing: - PROMETHEUS_PATTERN: `consul_serf_snapshot_appendLine{quantile="0.9"}` ⛔️ON_FAIL: `DISCARD_VALUE ->` - JAVASCRIPT: `return (isNaN(value)) ? 0 : value;`
Consul	Consul: Snapshot: append line, p50	The 50 percentile (median) for the time taken by the Consul agent to append an entry into the existing log.	DEPENDENT	consul.snapshot.append_line.p50 Preprocessing: - PROMETHEUS_PATTERN: `consul_serf_snapshot_appendLine{quantile="0.5"}` ⛔️ON_FAIL: `DISCARD_VALUE ->` - JAVASCRIPT: `return (isNaN(value)) ? 0 : value;`
Consul	Consul: Snapshot: append line, rate	The number of snapshot appendLine operations per second.	DEPENDENT	consul.snapshot.append_line.rate Preprocessing: - PROMETHEUS_PATTERN: `consul_serf_snapshot_appendLine_count` ⛔️ON_FAIL: `DISCARD_VALUE ->` - CHANGE_PER_SECOND
Consul	Consul: Snapshot: compact, p90	The 90 percentile for the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction.	DEPENDENT	consul.snapshot.compact.p90 Preprocessing: - PROMETHEUS_PATTERN: `consul_serf_snapshot_compact{quantile="0.9"}` ⛔️ON_FAIL: `DISCARD_VALUE ->` - JAVASCRIPT: `return (isNaN(value)) ? 0 : value;`
Consul	Consul: Snapshot: compact, p50	The 50 percentile (median) for the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction.	DEPENDENT	consul.snapshot.compact.p50 Preprocessing: - PROMETHEUS_PATTERN: `consul_serf_snapshot_compact{quantile="0.5"}` ⛔️ON_FAIL: `DISCARD_VALUE ->` - JAVASCRIPT: `return (isNaN(value)) ? 0 : value;`
Consul	Consul: Snapshot: compact, rate	The number of snapshot compact operations per second.	DEPENDENT	consul.snapshot.compact.rate Preprocessing: - PROMETHEUS_PATTERN: `consul_serf_snapshot_compact_count` ⛔️ON_FAIL: `DISCARD_VALUE ->` - CHANGE_PER_SECOND
Consul	Consul: Get local services check	Data collection check.	DEPENDENT	consul.get_local_services.check Preprocessing: - JSONPATH: `$.error` ⛔️ON_FAIL: `CUSTOM_VALUE ->` - DISCARD_UNCHANGED_HEARTBEAT: `3h`
Consul	Consul: ["{#SERVICE_NAME}"]: Aggregated status	Aggregated values of all health checks for the service instance.	DEPENDENT	consul.service.aggregated_state["{#SERVICE_ID}"] Preprocessing: - JSONPATH: `$[?(@.id == "{#SERVICE_ID}")].status.first()` - JAVASCRIPT: `The text is too long. Please see the template.` - DISCARD_UNCHANGED_HEARTBEAT: `3h`
Consul	Consul: ["{#SERVICE_NAME}"]: Check ["{#SERVICE_CHECK_NAME}"]: Status	Current state of health check for the service.	DEPENDENT	consul.service.check.state["{#SERVICE_ID}/{#SERVICE_CHECK_ID}"] Preprocessing: - JSONPATH: `$[?(@.id == "{#SERVICE_ID}")].checks[?(@.CheckID == "{#SERVICE_CHECK_ID}")].Status.first()` - JAVASCRIPT: `The text is too long. Please see the template.` - DISCARD_UNCHANGED_HEARTBEAT: `3h`
Consul	Consul: ["{#SERVICE_NAME}"]: Check ["{#SERVICE_CHECK_NAME}"]: Output	Current output of health check for the service.	DEPENDENT	consul.service.check.output["{#SERVICE_ID}/{#SERVICE_CHECK_ID}"] Preprocessing: - JSONPATH: `$[?(@.id == "{#SERVICE_ID}")].checks[?(@.CheckID == "{#SERVICE_CHECK_ID}")].Output.first()` - DISCARD_UNCHANGED_HEARTBEAT: `3h`
Consul	Consul: HTTP request: ["{#HTTP_METHOD}"], p90	The 90 percentile of how long it takes to service the given HTTP request for the given verb.	DEPENDENT	consul.http.api.p90["{#HTTP_METHOD}"] Preprocessing: - PROMETHEUS_PATTERN: `consul_api_http{method = "{#HTTP_METHOD}", quantile = "0.9"}`: `function`: `sum` ⛔️ON_FAIL: `DISCARD_VALUE ->`
Consul	Consul: HTTP request: ["{#HTTP_METHOD}"], p50	The 50 percentile (median) of how long it takes to service the given HTTP request for the given verb.	DEPENDENT	consul.http.api.p50["{#HTTP_METHOD}"] Preprocessing: - PROMETHEUS_PATTERN: `consul_api_http{method = "{#HTTP_METHOD}", quantile = "0.5"}`: `function`: `sum` ⛔️ON_FAIL: `DISCARD_VALUE ->`
Consul	Consul: HTTP request: ["{#HTTP_METHOD}"], rate	The number of HTTP requests for the given verb per second.	DEPENDENT	consul.http.api.rate["{#HTTP_METHOD}"] Preprocessing: - PROMETHEUS_PATTERN: `consul_api_http_count{method = "{#HTTP_METHOD}"}`: `function`: `sum` ⛔️ON_FAIL: `DISCARD_VALUE ->` - CHANGE_PER_SECOND
Consul	Consul: Raft state	Current state of Consul agent.	DEPENDENT	consul.raft.state[{#SINGLETON}] Preprocessing: - JSONPATH: `$.Stats.raft.state` - DISCARD_UNCHANGED_HEARTBEAT: `3h`
Consul	Consul: Raft state: leader	Increments when a server becomes a leader.	DEPENDENT	consul.raft.state_leader[{#SINGLETON}] Preprocessing: - PROMETHEUS_PATTERN: `consul_raft_state_leader` ⛔️ON_FAIL: `DISCARD_VALUE ->`
Consul	Consul: Raft state: candidate	The number of initiated leader elections.	DEPENDENT	consul.raft.state_candidate[{#SINGLETON}] Preprocessing: - PROMETHEUS_PATTERN: `consul_raft_state_candidate` ⛔️ON_FAIL: `DISCARD_VALUE ->`
Consul	Consul: Raft: apply, rate	Incremented whenever a leader first passes a message into the Raft commit process (called an Apply operation). This metric describes the arrival rate of new logs into Raft per second.	DEPENDENT	consul.raft.apply.rate[{#SINGLETON}] Preprocessing: - PROMETHEUS_PATTERN: `consul_raft_apply` ⛔️ON_FAIL: `DISCARD_VALUE ->` - CHANGE_PER_SECOND
Consul	Consul: Raft state: leader last contact, p90	The 90 percentile of how long it takes a leader node to communicate with followers during a leader lease check, in milliseconds.	DEPENDENT	consul.raft.leader_last_contact.p90[{#SINGLETON}] Preprocessing: - PROMETHEUS_PATTERN: `consul_raft_leader_lastContact{quantile="0.9"}` ⛔️ON_FAIL: `DISCARD_VALUE ->` - JAVASCRIPT: `return (isNaN(value)) ? 0 : value;`
Consul	Consul: Raft state: leader last contact, p50	The 50 percentile (median) of how long it takes a leader node to communicate with followers during a leader lease check, in milliseconds.	DEPENDENT	consul.raft.leader_last_contact.p50[{#SINGLETON}] Preprocessing: - PROMETHEUS_PATTERN: `consul_raft_leader_lastContact{quantile="0.5"}` ⛔️ON_FAIL: `DISCARD_VALUE ->` - JAVASCRIPT: `return (isNaN(value)) ? 0 : value;`
Consul	Consul: Raft state: commit time, p90	The 90 percentile time it takes to commit a new entry to the raft log on the leader, in milliseconds.	DEPENDENT	consul.raft.commit_time.p90[{#SINGLETON}] Preprocessing: - PROMETHEUS_PATTERN: `consul_raft_commitTime{quantile="0.9"}` ⛔️ON_FAIL: `DISCARD_VALUE ->` - JAVASCRIPT: `return (isNaN(value)) ? 0 : value;`
Consul	Consul: Raft state: commit time, p50	The 50 percentile (median) time it takes to commit a new entry to the raft log on the leader, in milliseconds.	DEPENDENT	consul.raft.commit_time.p50[{#SINGLETON}] Preprocessing: - PROMETHEUS_PATTERN: `consul_raft_commitTime{quantile="0.5"}` ⛔️ON_FAIL: `DISCARD_VALUE ->` - JAVASCRIPT: `return (isNaN(value)) ? 0 : value;`
Consul	Consul: Raft state: dispatch log, p90	The 90 percentile time it takes for the leader to write log entries to disk, in milliseconds.	DEPENDENT	consul.raft.dispatch_log.p90[{#SINGLETON}] Preprocessing: - PROMETHEUS_PATTERN: `consul_raft_leader_dispatchLog{quantile="0.9"}` ⛔️ON_FAIL: `DISCARD_VALUE ->` - JAVASCRIPT: `return (isNaN(value)) ? 0 : value;`
Consul	Consul: Raft state: dispatch log, p50	The 50 percentile (median) time it takes for the leader to write log entries to disk, in milliseconds.	DEPENDENT	consul.raft.dispatch_log.p50[{#SINGLETON}] Preprocessing: - PROMETHEUS_PATTERN: `consul_raft_leader_dispatchLog{quantile="0.5"}` ⛔️ON_FAIL: `DISCARD_VALUE ->` - JAVASCRIPT: `return (isNaN(value)) ? 0 : value;`
Consul	Consul: Raft state: dispatch log, rate	The number of times a Raft leader writes a log to disk per second.	DEPENDENT	consul.raft.dispatch_log.rate[{#SINGLETON}] Preprocessing: - PROMETHEUS_PATTERN: `consul_raft_leader_dispatchLog_count` ⛔️ON_FAIL: `DISCARD_VALUE ->` - CHANGE_PER_SECOND
Consul	Consul: Raft state: commit, rate	The number of new entries committed to the Raft log on the leader per second.	DEPENDENT	consul.raft.commit_time.rate[{#SINGLETON}] Preprocessing: - PROMETHEUS_PATTERN: `consul_raft_commitTime_count` ⛔️ON_FAIL: `DISCARD_VALUE ->` - CHANGE_PER_SECOND
Consul	Consul: Autopilot healthy	Tracks the overall health of the local server cluster. 1 if all servers are healthy, 0 if one or more are unhealthy.	DEPENDENT	consul.autopilot.healthy[{#SINGLETON}] Preprocessing: - PROMETHEUS_PATTERN: `consul_autopilot_healthy` ⛔️ON_FAIL: `DISCARD_VALUE ->`
Zabbix raw items	Consul: Get instance metrics	Get raw metrics from Consul instance /metrics endpoint.	HTTP_AGENT	consul.get_metrics Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: `DISCARD_VALUE ->`
Zabbix raw items	Consul: Get node info	Get configuration and member information of the local agent.	HTTP_AGENT	consul.get_node_info Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: `DISCARD_VALUE ->`
Zabbix raw items	Consul: Get local services	Get all the services that are registered with the local agent and their status.	SCRIPT	consul.get_local_services Expression: `The text is too long. Please see the template.`
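
Many of the dependent items above share the same preprocessing chain: a Prometheus pattern, an optional NaN guard, an optional multiplier, and an optional change-per-second step. The sketch below mimics that chain outside Zabbix to show what each step does. It is an illustration, not the template's implementation: the parser handles only simple exposition lines, and `SAMPLE` and all function names are made up for this example.

```python
import math
import re

# Made-up excerpt in Prometheus text format, for illustration only.
SAMPLE = """\
# TYPE consul_runtime_gc_pause_ns summary
consul_runtime_gc_pause_ns{quantile="0.5"} NaN
consul_runtime_gc_pause_ns{quantile="0.9"} 250000
consul_client_rpc 1200
"""

_LINE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def prometheus_value(text, metric, **labels):
    """Return the first sample of `metric` carrying the given labels, or None.

    None stands in for the "ON_FAIL: DISCARD_VALUE" case (no value is stored).
    """
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comments
        match = _LINE.match(line)
        if not match or match.group(1) != metric:
            continue
        found = dict(_LABEL.findall(match.group(2) or ""))
        if all(found.get(key) == val for key, val in labels.items()):
            return float(match.group(3))
    return None


def nan_to_zero(value):
    """Python equivalent of `return (isNaN(value)) ? 0 : value;`."""
    return 0.0 if math.isnan(value) else value


def change_per_second(prev, curr):
    """(value - prev_value) / (time - prev_time) for (timestamp, value) pairs.

    Returns None when the counter went backwards, since no rate can be derived.
    """
    (t0, v0), (t1, v1) = prev, curr
    if t1 <= t0 or v1 < v0:
        return None
    return (v1 - v0) / (t1 - t0)


# GC pause, p90: pattern -> NaN guard -> multiplier 1.0E-9
gc_p90 = nan_to_zero(prometheus_value(SAMPLE, "consul_runtime_gc_pause_ns", quantile="0.9")) * 1.0e-9
```

A summary quantile is reported as NaN when no observations fell into the window, which is why the quantile items replace NaN with 0 instead of failing. Counter-based items such as `consul.client_rpc` skip the guard and go straight to the per-second rate.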

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
Consul: Version has been changed	Consul version has changed. Ack to close.	`last(/HashiCorp Consul Node by HTTP/consul.version,#1)<>last(/HashiCorp Consul Node by HTTP/consul.version,#2) and length(last(/HashiCorp Consul Node by HTTP/consul.version))>0`	INFO	Manual close: YES
Consul: Current number of open files is too high	Heavy file descriptor usage (i.e., near the process’s file descriptor limit) indicates a potential file descriptor exhaustion issue.	`min(/HashiCorp Consul Node by HTTP/consul.process_open_fds,5m)/last(/HashiCorp Consul Node by HTTP/consul.process_max_fds)*100>{$CONSUL.OPEN.FDS.MAX.WARN}`	WARNING
Consul: Node's health score is warning	This metric ranges from 0 to 8, where 0 indicates "totally healthy". This health score is used to scale the time between outgoing probes, and higher scores translate into longer probing intervals. For more details see section IV of the Lifeguard paper: https://arxiv.org/pdf/1707.00788.pdf	`max(/HashiCorp Consul Node by HTTP/consul.memberlist.health_score,#3)>{$CONSUL.NODE.HEALTH_SCORE.MAX.WARN}`	WARNING	Depends on: - Consul: Node's health score is critical
Consul: Node's health score is critical	This metric ranges from 0 to 8, where 0 indicates "totally healthy". This health score is used to scale the time between outgoing probes, and higher scores translate into longer probing intervals. For more details see section IV of the Lifeguard paper: https://arxiv.org/pdf/1707.00788.pdf	`max(/HashiCorp Consul Node by HTTP/consul.memberlist.health_score,#3)>{$CONSUL.NODE.HEALTH_SCORE.MAX.HIGH}`	AVERAGE
Consul: Failed to get local services	Failed to get local services. Check debug log for more information.	`length(last(/HashiCorp Consul Node by HTTP/consul.get_local_services.check))>0`	WARNING
Consul: Aggregated status is 'warning'	Aggregated state of service on the local agent is 'warning'.	`last(/HashiCorp Consul Node by HTTP/consul.service.aggregated_state["{#SERVICE_ID}"]) = 1`	WARNING
Consul: Aggregated status is 'critical'	Aggregated state of service on the local agent is 'critical'.	`last(/HashiCorp Consul Node by HTTP/consul.service.aggregated_state["{#SERVICE_ID}"]) = 2`	AVERAGE
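
To make the arithmetic in the file-descriptor and health-score triggers concrete, here is a rough Python rendering of their expressions. It is a sketch under assumptions, not how Zabbix evaluates triggers: the function names are hypothetical, the inputs are plain lists of recent values, and the thresholds are the macro defaults (90, 2 and 4).

```python
def open_fds_problem(open_fds_last_5m, max_fds, warn_pct=90):
    """min(open_fds, 5m) / last(max_fds) * 100 > {$CONSUL.OPEN.FDS.MAX.WARN}"""
    return min(open_fds_last_5m) / max_fds * 100 > warn_pct


def health_score_severity(last_three_scores, warn=2, high=4):
    """Return the severity that would be shown for the health-score triggers.

    The WARNING trigger depends on the critical one, so only the higher
    severity is reported when both expressions are true.
    """
    peak = max(last_three_scores)  # max(..., #3)
    if peak > high:
        return "AVERAGE"
    if peak > warn:
        return "WARNING"
    return None
```

Using `min` over five minutes means the file-descriptor trigger fires only when usage stays above the threshold for the whole window: `open_fds_problem([500, 1000, 1010], 1024)` is false even though the latest samples are close to the limit. The health-score triggers use `max` over the last three values, so a single high score is enough to raise a problem.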

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.

This template is for Zabbix version: 6.0

Also available for: 7.4 7.2 7.0 6.4 6.2

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/consul_http/consul?at=release/6.0

HashiCorp Consul Node by HTTP

Overview

Template HashiCorp Consul Node by HTTP — collects metrics by HTTP agent from /v1/agent/metrics endpoint.

Requirements

Zabbix version: 6.0 and higher.

Tested versions

This template has been tested on:

HashiCorp Consul 1.10.0

Configuration

Setup

NOTE. Some metrics may not be collected depending on your HashiCorp Consul instance version and configuration.
NOTE. You may also be interested in the Envoy Proxy by HTTP template.

Macros used

Name	Description	Default
{$CONSUL.NODE.API.URL}	Consul instance URL.	`http://localhost:8500`
{$CONSUL.TOKEN}	Consul auth token.	`<PUT YOUR AUTH TOKEN>`
{$CONSUL.OPEN.FDS.MAX.WARN}	Maximum percentage of used file descriptors.	`90`
{$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.MATCHES}	Filter for services to discover on the local node.	`.*`
{$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.NOT_MATCHES}	Filter to exclude discovered services on local node.	`CHANGE IF NEEDED`
{$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.MATCHES}	Filter for services to discover by namespace on the local node. Enterprise only; in the Open Source version the namespace is set to 'None'.	`.*`
{$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.NOT_MATCHES}	Filter to exclude discovered services by namespace on the local node. Enterprise only; in the Open Source version the namespace is set to 'None'.	`CHANGE IF NEEDED`
{$CONSUL.NODE.HEALTH_SCORE.MAX.WARN}	Maximum acceptable value of node's health score for WARNING trigger expression.	`2`
{$CONSUL.NODE.HEALTH_SCORE.MAX.HIGH}	Maximum acceptable value of node's health score for AVERAGE trigger expression.	`4`
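
Before linking the template, it can help to check `{$CONSUL.NODE.API.URL}` and `{$CONSUL.TOKEN}` by hand. The sketch below builds an equivalent request with the Python standard library. It rests on two assumptions that this page does not spell out: that the Prometheus output is requested with the `format=prometheus` query parameter, and that the token is sent in the `X-Consul-Token` header. The function name `build_metrics_request` is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_metrics_request(api_url, token):
    """Build (but do not send) a GET request for the agent metrics endpoint.

    Assumptions: `format=prometheus` selects the Prometheus output and the
    ACL token is passed in the `X-Consul-Token` header.
    """
    query = urlencode({"format": "prometheus"})
    url = api_url.rstrip("/") + "/v1/agent/metrics?" + query
    return Request(url, headers={"X-Consul-Token": token})


# Example with the default URL macro value; the token is a placeholder.
request = build_metrics_request("http://localhost:8500", "my-token")
```

Passing `request` to `urllib.request.urlopen` would perform the call. A response in Prometheus text format suggests that both macros and the agent's Prometheus export are set up correctly.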

Items

Name	Description	Type	Key and additional info
Consul: Get instance metrics	Get raw metrics from Consul instance /metrics endpoint.	HTTP agent	consul.get_metrics Preprocessing Check for not supported value ⛔️Custom on fail: Discard value
Consul: Get node info	Get configuration and member information of the local agent.	HTTP agent	consul.get_node_info Preprocessing Check for not supported value ⛔️Custom on fail: Discard value
Consul: Role	Role of current Consul agent.	Dependent item	consul.role Preprocessing JSON Path: `$.Config.Server` Boolean to decimal Discard unchanged with heartbeat: `3h`
Consul: Version	Version of Consul agent.	Dependent item	consul.version Preprocessing JSON Path: `$.Config.Version` Discard unchanged with heartbeat: `3h`
Consul: Number of services	Number of services on current node.	Dependent item	consul.services_number Preprocessing JSON Path: `$.Stats.agent.services` Discard unchanged with heartbeat: `3h`
Consul: Number of checks	Number of checks on current node.	Dependent item	consul.checks_number Preprocessing JSON Path: `$.Stats.agent.checks` Discard unchanged with heartbeat: `3h`
Consul: Number of check monitors	Number of check monitors on current node.	Dependent item	consul.check_monitors_number Preprocessing JSON Path: `$.Stats.agent.check_monitors` Discard unchanged with heartbeat: `3h`
Consul: Process CPU seconds, total	Total user and system CPU time spent in seconds.	Dependent item	consul.cpu_seconds_total.rate Preprocessing Prometheus pattern: `VALUE(process_cpu_seconds_total)` ⛔️Custom on fail: Discard value Change per second
Consul: Virtual memory size	Virtual memory size in bytes.	Dependent item	consul.virtual_memory_bytes Preprocessing Prometheus pattern: `VALUE(process_virtual_memory_bytes)`
Consul: RSS memory usage	Resident memory size in bytes.	Dependent item	consul.resident_memory_bytes Preprocessing Prometheus pattern: `VALUE(process_resident_memory_bytes)`
Consul: Goroutine count	The number of Goroutines on Consul instance.	Dependent item	consul.goroutines Preprocessing Prometheus pattern: `VALUE(go_goroutines)`
Consul: Open file descriptors	Number of open file descriptors.	Dependent item	consul.process_open_fds Preprocessing Prometheus pattern: `VALUE(process_open_fds)`
Consul: Open file descriptors, max	Maximum number of open file descriptors.	Dependent item	consul.process_max_fds Preprocessing Prometheus pattern: `VALUE(process_max_fds)`
Consul: Client RPC, per second	Number of times per second whenever a Consul agent in client mode makes an RPC request to a Consul server. This gives a measure of how much a given agent is loading the Consul servers. This is only generated by agents in client mode, not Consul servers.	Dependent item	consul.client_rpc Preprocessing Prometheus pattern: `VALUE(consul_client_rpc)` ⛔️Custom on fail: Discard value Change per second
Consul: Client RPC failed ,per second	Number of times per second whenever a Consul agent in client mode makes an RPC request to a Consul server and fails.	Dependent item	consul.client_rpc_failed Preprocessing Prometheus pattern: `VALUE(consul_client_rpc_failed)` ⛔️Custom on fail: Discard value Change per second
Consul: TCP connections, accepted per second	This metric counts the number of times a Consul agent has accepted an incoming TCP stream connection per second.	Dependent item	consul.memberlist.tcp_accept Preprocessing Prometheus pattern: `VALUE(consul_memberlist_tcp_accept)` ⛔️Custom on fail: Discard value Change per second
Consul: TCP connections, per second	This metric counts the number of times a Consul agent has initiated a push/pull sync with another agent per second.	Dependent item	consul.memberlist.tcp_connect Preprocessing Prometheus pattern: `VALUE(consul_memberlist_tcp_connect)` ⛔️Custom on fail: Discard value Change per second
Consul: TCP send bytes, per second	This metric measures the total number of bytes sent by a Consul agent through the TCP protocol per second.	Dependent item	consul.memberlist.tcp_sent Preprocessing Prometheus pattern: `VALUE(consul_memberlist_tcp_sent)` ⛔️Custom on fail: Discard value Change per second
Consul: UDP received bytes, per second	This metric measures the total number of bytes received by a Consul agent through the UDP protocol per second.	Dependent item	consul.memberlist.udp_received Preprocessing Prometheus pattern: `VALUE(consul_memberlist_udp_received)` ⛔️Custom on fail: Discard value Change per second
Consul: UDP sent bytes, per second	This metric measures the total number of bytes sent by a Consul agent through the UDP protocol per second.	Dependent item	consul.memberlist.udp_sent Preprocessing Prometheus pattern: `VALUE(consul_memberlist_udp_sent)` ⛔️Custom on fail: Discard value Change per second
Consul: GC pause, p90	The 90 percentile for the time consumed by stop-the-world garbage collection (GC) pauses since Consul started. Consul reports the value in nanoseconds; the `1.0E-9` multiplier converts it to seconds.	Dependent item	consul.gc_pause.p90 Preprocessing Prometheus pattern: `VALUE(consul_runtime_gc_pause_ns{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.` Custom multiplier: `1.0E-9`
Consul: GC pause, p50	The 50 percentile (median) for the time consumed by stop-the-world garbage collection (GC) pauses since Consul started. Consul reports the value in nanoseconds; the `1.0E-9` multiplier converts it to seconds.	Dependent item	consul.gc_pause.p50 Preprocessing Prometheus pattern: `VALUE(consul_runtime_gc_pause_ns{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.` Custom multiplier: `1.0E-9`
Consul: Memberlist: degraded	This metric counts the number of times the Consul agent has performed failure detection on another agent at a slower probe rate. The agent uses its own health metric as an indicator to perform this action. If its health score is low, it means that the node is healthy, and vice versa.	Dependent item	consul.memberlist.degraded Preprocessing Prometheus pattern: `VALUE(consul_memberlist_degraded)` ⛔️Custom on fail: Discard value
Consul: Memberlist: health score	This metric describes a node's perception of its own health based on how well it is meeting the soft real-time requirements of the protocol. This metric ranges from 0 to 8, where 0 indicates "totally healthy".	Dependent item	consul.memberlist.health_score Preprocessing Prometheus pattern: `VALUE(consul_memberlist_health_score)` ⛔️Custom on fail: Discard value
Consul: Memberlist: gossip, p90	The 90 percentile for the number of gossips (messages) broadcasted to a set of randomly selected nodes.	Dependent item	consul.memberlist.dispatch_log.p90 Preprocessing Prometheus pattern: `VALUE(consul_memberlist_gossip{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: Memberlist: gossip, p50	The 50 percentile (median) for the number of gossips (messages) broadcasted to a set of randomly selected nodes.	Dependent item	consul.memberlist.gossip.p50 Preprocessing Prometheus pattern: `VALUE(consul_memberlist_gossip{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: Memberlist: msg alive	This metric counts the number of alive Consul agents, that the agent has mapped out so far, based on the message information given by the network layer.	Dependent item	consul.memberlist.msg.alive Preprocessing Prometheus pattern: `VALUE(consul_memberlist_msg_alive)` ⛔️Custom on fail: Discard value
Consul: Memberlist: msg dead	This metric counts the number of times a Consul agent has marked another agent to be a dead node.	Dependent item	consul.memberlist.msg.dead Preprocessing Prometheus pattern: `VALUE(consul_memberlist_msg_dead)` ⛔️Custom on fail: Discard value
Consul: Memberlist: msg suspect	The number of times a Consul agent suspects another as failed while probing during gossip protocol.	Dependent item	consul.memberlist.msg.suspect Preprocessing Prometheus pattern: `VALUE(consul_memberlist_msg_suspect)` ⛔️Custom on fail: Discard value
Consul: Memberlist: probe node, p90	The 90 percentile for the time taken to perform a single round of failure detection on a select Consul agent.	Dependent item	consul.memberlist.probe_node.p90 Preprocessing Prometheus pattern: `VALUE(consul_memberlist_probeNode{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: Memberlist: probe node, p50	The 50 percentile (median) for the time taken to perform a single round of failure detection on a select Consul agent.	Dependent item	consul.memberlist.probe_node.p50 Preprocessing Prometheus pattern: `VALUE(consul_memberlist_probeNode{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: Memberlist: push pull node, p90	The 90 percentile for the number of Consul agents that have exchanged state with this agent.	Dependent item	consul.memberlist.push_pull_node.p90 Preprocessing Prometheus pattern: `VALUE(consul_memberlist_pushPullNode{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: Memberlist: push pull node, p50	The 50 percentile (median) for the number of Consul agents that have exchanged state with this agent.	Dependent item	consul.memberlist.push_pull_node.p50 Preprocessing Prometheus pattern: `VALUE(consul_memberlist_pushPullNode{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: KV store: apply, p90	The 90 percentile for the time it takes to complete an update to the KV store.	Dependent item	consul.kvs.apply.p90 Preprocessing Prometheus pattern: `VALUE(consul_kvs_apply{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: KV store: apply, p50	The 50 percentile (median) for the time it takes to complete an update to the KV store.	Dependent item	consul.kvs.apply.p50 Preprocessing Prometheus pattern: `VALUE(consul_kvs_apply{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: KV store: apply, rate	The number of updates to the KV store per second.	Dependent item	consul.kvs.apply.rate Preprocessing Prometheus pattern: `VALUE(consul_kvs_apply_count)` ⛔️Custom on fail: Discard value Change per second
Consul: Serf member: flap, rate	Increments when an agent is marked dead and then recovers within a short time period. This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports. Shown as events per second.	Dependent item	consul.serf.member.flap.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_member_flap)` ⛔️Custom on fail: Discard value Change per second
Consul: Serf member: failed, rate	Increments when an agent is marked dead. This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports. Shown as events per second.	Dependent item	consul.serf.member.failed.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_member_failed)` ⛔️Custom on fail: Discard value Change per second
Consul: Serf member: join, rate	Increments when an agent joins the cluster. If an agent flapped or failed this counter also increments when it re-joins. Shown as events per second.	Dependent item	consul.serf.member.join.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_member_join)` ⛔️Custom on fail: Discard value Change per second
Consul: Serf member: left, rate	Increments when an agent leaves the cluster. Shown as events per second.	Dependent item	consul.serf.member.left.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_member_left)` ⛔️Custom on fail: Discard value Change per second
Consul: Serf member: update, rate	Increments when a Consul agent updates. Shown as events per second.	Dependent item	consul.serf.member.update.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_member_update)` ⛔️Custom on fail: Discard value Change per second
Consul: ACL: resolves, rate	The number of ACL resolves per second.	Dependent item	consul.acl.resolves.rate Preprocessing Prometheus pattern: `VALUE(consul_acl_ResolveToken_count)` ⛔️Custom on fail: Discard value Change per second
Consul: Catalog: register, rate	The number of catalog register operations per second.	Dependent item	consul.catalog.register.rate Preprocessing Prometheus pattern: `VALUE(consul_catalog_register_count)` ⛔️Custom on fail: Discard value Change per second
Consul: Catalog: deregister, rate	The number of catalog deregister operations per second.	Dependent item	consul.catalog.deregister.rate Preprocessing Prometheus pattern: `VALUE(consul_catalog_deregister_count)` ⛔️Custom on fail: Discard value Change per second
Consul: Snapshot: append line, p90	The 90 percentile for the time taken by the Consul agent to append an entry into the existing log.	Dependent item	consul.snapshot.append_line.p90 Preprocessing Prometheus pattern: `VALUE(consul_serf_snapshot_appendLine{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: Snapshot: append line, p50	The 50 percentile (median) for the time taken by the Consul agent to append an entry into the existing log.	Dependent item	consul.snapshot.append_line.p50 Preprocessing Prometheus pattern: `VALUE(consul_serf_snapshot_appendLine{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: Snapshot: append line, rate	The number of snapshot appendLine operations per second.	Dependent item	consul.snapshot.append_line.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_snapshot_appendLine_count)` ⛔️Custom on fail: Discard value Change per second
Consul: Snapshot: compact, p90	The 90 percentile for the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction.	Dependent item	consul.snapshot.compact.p90 Preprocessing Prometheus pattern: `VALUE(consul_serf_snapshot_compact{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: Snapshot: compact, p50	The 50 percentile (median) for the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction.	Dependent item	consul.snapshot.compact.p50 Preprocessing Prometheus pattern: `VALUE(consul_serf_snapshot_compact{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: Snapshot: compact, rate	The number of snapshot compact operations per second.	Dependent item	consul.snapshot.compact.rate Preprocessing Prometheus pattern: `VALUE(consul_serf_snapshot_compact_count)` ⛔️Custom on fail: Discard value Change per second
Consul: Get local services	Get all the services that are registered with the local agent and their status.	Script	consul.get_local_services
Consul: Get local services check	Data collection check.	Dependent item	consul.get_local_services.check Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Set value to an empty string Discard unchanged with heartbeat: `3h`
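
Most items above take a single sample from the `/v1/agent/metrics` output with a Prometheus pattern such as `VALUE(consul_memberlist_gossip{quantile="0.9"})`. The following Python sketch illustrates that selection only; it is not how Zabbix implements the step. The function name `prom_value` and the sample payload are hypothetical, the parser assumes simple exposition lines without escaped characters in label values, and it returns the first match, whereas Zabbix expects the pattern to match exactly one line.

```python
import re

# Illustrative payload in Prometheus text format (values are made up).
SAMPLE = """\
# TYPE consul_memberlist_gossip summary
consul_memberlist_gossip{quantile="0.5"} 0.012
consul_memberlist_gossip{quantile="0.9"} 0.034
consul_memberlist_gossip_count 1200
consul_memberlist_health_score 0
"""

# metric name, optional {label="value",...} block, then the sample value
LINE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"')


def prom_value(text, metric, labels=None):
    """Return the value of the first sample matching metric and labels.

    Returns None when nothing matches, which is the case where the
    template's "Custom on fail: Discard value" setting would apply.
    """
    labels = labels or {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = LINE_RE.match(line)
        if not m or m.group(1) != metric:
            continue
        found = dict(LABEL_RE.findall(m.group(2) or ""))
        if all(found.get(k) == v for k, v in labels.items()):
            return float(m.group(3))
    return None
```

For example, `prom_value(SAMPLE, "consul_memberlist_gossip", {"quantile": "0.9"})` selects the p90 sample from the illustrative payload.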

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
Consul: Version has been changed	Consul version has changed. Acknowledge to close the problem manually.	`last(/HashiCorp Consul Node by HTTP/consul.version,#1)<>last(/HashiCorp Consul Node by HTTP/consul.version,#2) and length(last(/HashiCorp Consul Node by HTTP/consul.version))>0`	Info	Manual close: Yes
Consul: Current number of open files is too high	Heavy file descriptor usage (i.e., near the process’s file descriptor limit) indicates a potential file descriptor exhaustion issue.	`min(/HashiCorp Consul Node by HTTP/consul.process_open_fds,5m)/last(/HashiCorp Consul Node by HTTP/consul.process_max_fds)*100>{$CONSUL.OPEN.FDS.MAX.WARN}`	Warning
Consul: Node's health score is warning	This metric ranges from 0 to 8, where 0 indicates "totally healthy". This health score is used to scale the time between outgoing probes, and higher scores translate into longer probing intervals. For more details see section IV of the Lifeguard paper: https://arxiv.org/pdf/1707.00788.pdf	`max(/HashiCorp Consul Node by HTTP/consul.memberlist.health_score,#3)>{$CONSUL.NODE.HEALTH_SCORE.MAX.WARN}`	Warning	Depends on: Consul: Node's health score is critical
Consul: Node's health score is critical	This metric ranges from 0 to 8, where 0 indicates "totally healthy". This health score is used to scale the time between outgoing probes, and higher scores translate into longer probing intervals. For more details see section IV of the Lifeguard paper: https://arxiv.org/pdf/1707.00788.pdf	`max(/HashiCorp Consul Node by HTTP/consul.memberlist.health_score,#3)>{$CONSUL.NODE.HEALTH_SCORE.MAX.HIGH}`	Average
Consul: Failed to get local services	Failed to get local services. Check debug log for more information.	`length(last(/HashiCorp Consul Node by HTTP/consul.get_local_services.check))>0`	Warning
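
The arithmetic behind the "Current number of open files is too high" trigger can be sketched in Python as follows. This is an illustration of the expression only, not Zabbix code; the function name `open_fds_too_high` is hypothetical, and the default of 90 is the documented default of `{$CONSUL.OPEN.FDS.MAX.WARN}`.

```python
def open_fds_too_high(open_fds_5m, max_fds, warn_pct=90.0):
    """Mirror min(open_fds,5m) / last(max_fds) * 100 > threshold.

    open_fds_5m: open file descriptor counts collected over the last 5 minutes.
    max_fds: the most recent file descriptor limit.
    """
    if not open_fds_5m or max_fds <= 0:
        return False  # no data to evaluate
    # Using the minimum means usage must stay high for the whole window.
    return min(open_fds_5m) / max_fds * 100 > warn_pct
```

Because the expression uses the 5-minute minimum, a short spike above the threshold does not fire the trigger; every sample in the window has to exceed it.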

LLD rule Local node services discovery

Name	Description	Type	Key and additional info
Local node services discovery	Discover metrics for services that are registered with the local agent.	Dependent item	consul.node_services_lld Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Local node services discovery

Name	Description	Type	Key and additional info
Consul: ["{#SERVICE_NAME}"]: Aggregated status	Aggregated values of all health checks for the service instance.	Dependent item	consul.service.aggregated_state["{#SERVICE_ID}"] Preprocessing JSON Path: `$[?(@.id == "{#SERVICE_ID}")].status.first()` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
Consul: ["{#SERVICE_NAME}"]: Check ["{#SERVICE_CHECK_NAME}"]: Status	Current state of health check for the service.	Dependent item	consul.service.check.state["{#SERVICE_ID}/{#SERVICE_CHECK_ID}"] Preprocessing JSON Path: `The text is too long. Please see the template.` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
Consul: ["{#SERVICE_NAME}"]: Check ["{#SERVICE_CHECK_NAME}"]: Output	Current output of health check for the service.	Dependent item	consul.service.check.output["{#SERVICE_ID}/{#SERVICE_CHECK_ID}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Trigger prototypes for Local node services discovery

Name	Description	Expression	Severity	Dependencies and additional info
Consul: Aggregated status is 'warning'	Aggregated state of service on the local agent is 'warning'.	`last(/HashiCorp Consul Node by HTTP/consul.service.aggregated_state["{#SERVICE_ID}"]) = 1`	Warning
Consul: Aggregated status is 'critical'	Aggregated state of service on the local agent is 'critical'.	`last(/HashiCorp Consul Node by HTTP/consul.service.aggregated_state["{#SERVICE_ID}"]) = 2`	Average
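
The trigger prototypes above compare the aggregated status item against the numeric codes 1 (warning) and 2 (critical). The Python sketch below shows one plausible way such a code could be derived, by taking the worst status among a service's checks. It is an assumption for illustration: the template's actual JavaScript step is not shown in this document, the code 0 for 'passing' is assumed, and the names `SEVERITY` and `aggregated_state` are hypothetical.

```python
# Codes 1 and 2 come from the trigger prototypes; 0 for 'passing' is assumed.
SEVERITY = {"passing": 0, "warning": 1, "critical": 2}


def aggregated_state(check_statuses):
    """Return the worst (highest) status code among the given check statuses.

    An empty list is treated as 'passing' in this sketch.
    """
    return max((SEVERITY[s] for s in check_statuses), default=0)
```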

LLD rule HTTP API methods discovery

Name	Description	Type	Key and additional info
HTTP API methods discovery	Discover HTTP API method-specific metrics.	Dependent item	consul.http_api_discovery Preprocessing Prometheus to JSON: `consul_api_http{method =~ ".*"}` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for HTTP API methods discovery

Name	Description	Type	Key and additional info
Consul: HTTP request: ["{#HTTP_METHOD}"], p90	The 90 percentile of how long it takes to service the given HTTP request for the given verb.	Dependent item	consul.http.api.p90["{#HTTP_METHOD}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Consul: HTTP request: ["{#HTTP_METHOD}"], p50	The 50 percentile (median) of how long it takes to service the given HTTP request for the given verb.	Dependent item	consul.http.api.p50["{#HTTP_METHOD}"] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Consul: HTTP request: ["{#HTTP_METHOD}"], rate	The number of HTTP requests for the given verb per second.	Dependent item	consul.http.api.rate["{#HTTP_METHOD}"] Preprocessing Prometheus pattern: `SUM(consul_api_http_count{method = "{#HTTP_METHOD}"})` ⛔️Custom on fail: Discard value Change per second
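
The rate item combines two steps: `SUM(consul_api_http_count{method = "{#HTTP_METHOD}"})` adds up the request counters of every series with the given method, and "Change per second" turns consecutive totals into a rate. A minimal Python sketch of both steps, with hypothetical function names and made-up sample data (the `path` label values are illustrative):

```python
def sum_for_method(samples, method):
    """Add up counter values of all series whose 'method' label matches."""
    return sum(value for labels, value in samples if labels.get("method") == method)


def change_per_second(prev_value, prev_ts, value, ts):
    """(value - prev_value) / (ts - prev_ts), with timestamps in seconds.

    Returns None when the counter went backwards (for example after an
    agent restart) or when time did not advance; Zabbix stores nothing
    for a decreasing counter and waits for the next value.
    """
    if ts <= prev_ts or value < prev_value:
        return None
    return (value - prev_value) / (ts - prev_ts)


# Two scrapes taken 60 seconds apart.
earlier = [({"method": "GET", "path": "v1_agent_metrics"}, 100.0),
           ({"method": "GET", "path": "v1_agent_self"}, 50.0),
           ({"method": "PUT", "path": "v1_kv"}, 7.0)]
later = [({"method": "GET", "path": "v1_agent_metrics"}, 130.0),
         ({"method": "GET", "path": "v1_agent_self"}, 80.0),
         ({"method": "PUT", "path": "v1_kv"}, 9.0)]

get_rate = change_per_second(sum_for_method(earlier, "GET"), 0,
                             sum_for_method(later, "GET"), 60)
```

The same "Change per second" logic applies to the other `rate` items in this template, which read a single counter with `VALUE(...)` instead of `SUM(...)`.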

LLD rule Raft server metrics discovery

Name	Description	Type	Key and additional info
Raft server metrics discovery	Discover raft metrics for server nodes.	Dependent item	consul.raft.server.discovery Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Raft server metrics discovery

Name	Description	Type	Key and additional info
Consul: Raft state	Current state of Consul agent.	Dependent item	consul.raft.state[{#SINGLETON}] Preprocessing JSON Path: `$.Stats.raft.state` Discard unchanged with heartbeat: `3h`
Consul: Raft state: leader	Increments when a server becomes a leader.	Dependent item	consul.raft.state_leader[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_state_leader)` ⛔️Custom on fail: Discard value
Consul: Raft state: candidate	The number of initiated leader elections.	Dependent item	consul.raft.state_candidate[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_state_candidate)` ⛔️Custom on fail: Discard value
Consul: Raft: apply, rate	Incremented whenever a leader first passes a message into the Raft commit process (called an Apply operation). This metric describes the arrival rate of new logs into Raft per second.	Dependent item	consul.raft.apply.rate[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_apply)` ⛔️Custom on fail: Discard value Change per second

LLD rule Raft leader metrics discovery

Name	Description	Type	Key and additional info
Raft leader metrics discovery	Discover raft metrics for leader nodes.	Dependent item	consul.raft.leader.discovery Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Raft leader metrics discovery

Name	Description	Type	Key and additional info
Consul: Raft state: leader last contact, p90	The 90 percentile of how long it takes a leader node to communicate with followers during a leader lease check, in milliseconds.	Dependent item	consul.raft.leader_last_contact.p90[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_leader_lastContact{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: Raft state: leader last contact, p50	The 50 percentile (median) of how long it takes a leader node to communicate with followers during a leader lease check, in milliseconds.	Dependent item	consul.raft.leader_last_contact.p50[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_leader_lastContact{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: Raft state: commit time, p90	The 90 percentile time it takes to commit a new entry to the raft log on the leader, in milliseconds.	Dependent item	consul.raft.commit_time.p90[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_commitTime{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: Raft state: commit time, p50	The 50 percentile (median) time it takes to commit a new entry to the raft log on the leader, in milliseconds.	Dependent item	consul.raft.commit_time.p50[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_commitTime{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: Raft state: dispatch log, p90	The 90 percentile time it takes for the leader to write log entries to disk, in milliseconds.	Dependent item	consul.raft.dispatch_log.p90[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_leader_dispatchLog{quantile="0.9"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: Raft state: dispatch log, p50	The 50 percentile (median) time it takes for the leader to write log entries to disk, in milliseconds.	Dependent item	consul.raft.dispatch_log.p50[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_leader_dispatchLog{quantile="0.5"})` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.`
Consul: Raft state: dispatch log, rate	The number of times a Raft leader writes a log to disk per second.	Dependent item	consul.raft.dispatch_log.rate[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_leader_dispatchLog_count)` ⛔️Custom on fail: Discard value Change per second
Consul: Raft state: commit, rate	The number of new entries committed to the Raft log on the leader per second.	Dependent item	consul.raft.commit_time.rate[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_raft_commitTime_count)` ⛔️Custom on fail: Discard value Change per second
Consul: Autopilot healthy	Tracks the overall health of the local server cluster. 1 if all servers are healthy, 0 if one or more are unhealthy.	Dependent item	consul.autopilot.healthy[{#SINGLETON}] Preprocessing Prometheus pattern: `VALUE(consul_autopilot_healthy)` ⛔️Custom on fail: Discard value

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

This template is for Zabbix version: 7.4

Also available for: 7.2 7.0 6.4 6.2 6.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/consul_http/consul_cluster?at=release/7.4

HashiCorp Consul Cluster by HTTP

Overview

This template monitors HashiCorp Consul with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

The template HashiCorp Consul Cluster by HTTP collects metrics by HTTP agent from API endpoints. More information about the metrics can be found in the official documentation.

Requirements

Zabbix version: 7.4 and higher.

Tested versions

This template has been tested on:

HashiCorp Consul 1.10.0

Configuration

Setup

The template requires authorization via an API token.

Don't forget to change macros {$CONSUL.CLUSTER.URL}, {$CONSUL.TOKEN}. Also, see the Macros section for a list of macros used to set trigger values.

This template supports Consul namespaces. Set the macro {$CONSUL.NAMESPACE} if you are interested in only one service namespace. Leave the macro unset to get all services. In the Open Source version, leave this macro empty.
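
To check the token and namespace settings outside Zabbix, a request equivalent to the template's catalog query can be built as in the Python sketch below. It rests on assumptions: the template's exact request is not shown in this document, so the endpoint `/v1/catalog/services`, the `X-Consul-Token` header and the `ns` query parameter are taken from the Consul HTTP API rather than from the template, and the function name `build_services_request` is hypothetical. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_services_request(base_url, token, namespace=""):
    """Build a GET request for the service catalog.

    base_url stands in for {$CONSUL.CLUSTER.URL}, token for {$CONSUL.TOKEN}
    and namespace for {$CONSUL.NAMESPACE}. An empty namespace adds no
    query parameter at all.
    """
    url = base_url.rstrip("/") + "/v1/catalog/services"
    if namespace:
        url += "?" + urlencode({"ns": namespace})
    return Request(url, headers={"X-Consul-Token": token})


req = build_services_request("http://localhost:8500", "secret", "team-a")
```

Passing `req` to `urllib.request.urlopen` would send it; an HTTP 403 response typically points at a missing or insufficient token.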

NOTE. Some metrics may not be collected depending on your HashiCorp Consul instance version and configuration.
NOTE. You may also be interested in the Envoy Proxy by HTTP template.

Macros used

Name	Description	Default
{$CONSUL.CLUSTER.URL}	Consul cluster URL.	`http://localhost:8500`
{$CONSUL.TOKEN}	Consul auth token.	`<PUT YOUR AUTH TOKEN>`
{$CONSUL.NAMESPACE}	Consul service namespace (Enterprise only). In the Open Source version, leave this macro empty. Leave it unset to get all services.
{$CONSUL.API.SCHEME}	Consul API scheme. Used in node LLD.	`http`
{$CONSUL.API.PORT}	Consul API port. Used in node LLD.	`8500`
{$CONSUL.LLD.FILTER.NODE_NAME.MATCHES}	Filter for discoverable nodes.	`.*`
{$CONSUL.LLD.FILTER.NODE_NAME.NOT_MATCHES}	Filter to exclude discovered nodes.	`CHANGE IF NEEDED`
{$CONSUL.LLD.FILTER.SERVICE_NAME.MATCHES}	Filter for discoverable services.	`.*`
{$CONSUL.LLD.FILTER.SERVICE_NAME.NOT_MATCHES}	Filter to exclude discovered services.	`CHANGE IF NEEDED`
{$CONSUL.SERVICE_NODES.CRITICAL.MAX.AVG}	Maximum number of service nodes in status 'critical' for trigger expression. Can be used with context.	`0`
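
The MATCHES and NOT_MATCHES macros work as a pair: a discovered name is kept only if it matches the first pattern and does not match the second. A rough Python sketch of that logic follows. It is an approximation, since Zabbix evaluates these filters with its own regular expression engine rather than Python's `re` module, and the function name `passes_lld_filter` is hypothetical.

```python
import re


def passes_lld_filter(name, matches=r".*", not_matches=r"CHANGE IF NEEDED"):
    """Keep a name that matches `matches` and does not match `not_matches`.

    The defaults mirror the macro defaults: '.*' accepts everything, and
    'CHANGE IF NEEDED' is a placeholder that no realistic name contains.
    """
    return (re.search(matches, name) is not None
            and re.search(not_matches, name) is None)
```

For example, setting the NOT_MATCHES macro to `-canary$` would drop a service named `web-canary` while keeping `web`.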

Items

Name	Description	Type	Key and additional info
Cluster leader	Current leader address.	HTTP agent	consul.get_leader Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value Trim: `"` Discard unchanged with heartbeat: `1h`
Nodes: peers	The number of Raft peers for the datacenter in which the agent is running.	HTTP agent	consul.get_peers Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value JSON Path: `$.length()` Discard unchanged with heartbeat: `3h`
Get nodes	Catalog of nodes registered in a given datacenter.	HTTP agent	consul.get_nodes Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Get nodes Serf health status	Get Serf Health Status for all agents in cluster.	HTTP agent	consul.get_cluster_serf Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Nodes: total	Number of nodes on current dc.	Dependent item	consul.nodes_total Preprocessing JSON Path: `$.length()` Discard unchanged with heartbeat: `3h`
Nodes: passing	Number of agents on current dc with serf health status 'passing'.	Dependent item	consul.nodes_passing Preprocessing JSON Path: `$[?(@.Status == "passing")].length()` Discard unchanged with heartbeat: `3h`
Nodes: critical	Number of agents on current dc with serf health status 'critical'.	Dependent item	consul.nodes_critical Preprocessing JSON Path: `$[?(@.Status == "critical")].length()` Discard unchanged with heartbeat: `3h`
Nodes: warning	Number of agents on current dc with serf health status 'warning'.	Dependent item	consul.nodes_warning Preprocessing JSON Path: `$[?(@.Status == "warning")].length()` Discard unchanged with heartbeat: `3h`
Get services	Catalog of services registered in a given datacenter.	HTTP agent	consul.get_catalog_services Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Services: total	Number of services on current dc.	Dependent item	consul.services_total Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
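
The node counters above are dependent items that apply JSON Path filters such as `$[?(@.Status == "passing")].length()` to a list of Serf health checks. The Python sketch below reproduces that counting on made-up data; the node names are illustrative, only the `Status` field is taken from the JSON Path expressions, and the function name `count_by_status` is hypothetical.

```python
import json

# Illustrative response: one Serf health check entry per node.
SERF_CHECKS = json.loads("""[
  {"Node": "consul-1", "Status": "passing"},
  {"Node": "consul-2", "Status": "critical"},
  {"Node": "consul-3", "Status": "passing"}
]""")


def count_by_status(checks, status):
    """Equivalent of $[?(@.Status == "<status>")].length()."""
    return sum(1 for check in checks if check.get("Status") == status)
```

With this data, `count_by_status(SERF_CHECKS, "critical")` is the value that the "One or more nodes in cluster in 'critical' state" trigger would compare against zero.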

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
HashiCorp Consul Cluster: Leader has been changed	Consul cluster leader has changed. Acknowledge to close the problem manually.	`last(/HashiCorp Consul Cluster by HTTP/consul.get_leader,#1)<>last(/HashiCorp Consul Cluster by HTTP/consul.get_leader,#2) and length(last(/HashiCorp Consul Cluster by HTTP/consul.get_leader))>0`	Info	Manual close: Yes
HashiCorp Consul Cluster: One or more nodes in cluster in 'critical' state	One or more agents on current dc with serf health status 'critical'.	`last(/HashiCorp Consul Cluster by HTTP/consul.nodes_critical)>0`	Average
HashiCorp Consul Cluster: One or more nodes in cluster in 'warning' state	One or more agents on current dc with serf health status 'warning'.	`last(/HashiCorp Consul Cluster by HTTP/consul.nodes_warning)>0`	Warning
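
The "Leader has been changed" expression compares the two most recent values of `consul.get_leader` and also requires the latest one to be non-empty. A minimal Python sketch of that logic, with a hypothetical function name and illustrative addresses:

```python
def leader_changed(history):
    """history: stored leader addresses, newest first.

    Mirrors last(#1) <> last(#2) and length(last()) > 0.
    """
    if len(history) < 2:
        return False  # not enough values to compare
    return history[0] != history[1] and len(history[0]) > 0
```

The length check keeps the trigger quiet when the latest value is an empty string, for example while the cluster has no leader.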

LLD rule Consul cluster nodes discovery

Name	Description	Type	Key and additional info
Consul cluster nodes discovery		Dependent item	consul.lld_nodes Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Consul cluster nodes discovery

Name	Description	Type	Key and additional info
Node ["{#NODE_NAME}"]: Serf Health	Node Serf Health Status.	Dependent item	consul.serf.health["{#NODE_NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

LLD rule Consul cluster services discovery

Name	Description	Type	Key and additional info
Consul cluster services discovery		Dependent item	consul.lld_services Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Consul cluster services discovery

Name	Description	Type	Key and additional info
Service ["{#SERVICE_NAME}"]: Nodes passing	The number of nodes with service status `passing` from those registered.	Dependent item	consul.service.nodes_passing["{#SERVICE_NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
Service ["{#SERVICE_NAME}"]: Nodes warning	The number of nodes with service status `warning` from those registered.	Dependent item	consul.service.nodes_warning["{#SERVICE_NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
Service ["{#SERVICE_NAME}"]: Nodes critical	The number of nodes with service status `critical` from those registered.	Dependent item	consul.service.nodes_critical["{#SERVICE_NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
["{#SERVICE_NAME}"]: Get raw service state	Retrieve service instances providing the service indicated on the path.	HTTP agent	consul.get_service_stats["{#SERVICE_NAME}"] Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value

Trigger prototypes for Consul cluster services discovery

Name	Description	Expression	Severity	Dependencies and additional info
HashiCorp Consul Cluster: Service ["{#SERVICE_NAME}"]: Too many nodes with service status 'critical'	One or more nodes with service status 'critical'.	`last(/HashiCorp Consul Cluster by HTTP/consul.service.nodes_critical["{#SERVICE_NAME}"])>{$CONSUL.SERVICE_NODES.CRITICAL.MAX.AVG:"{#SERVICE_NAME}"}`	Average
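
The trigger prototype reads its threshold from `{$CONSUL.SERVICE_NODES.CRITICAL.MAX.AVG:"{#SERVICE_NAME}"}`, a user macro with context. Zabbix uses the value defined for that specific service name if there is one and otherwise falls back to the macro without context (default `0`). The Python sketch below models this lookup with a plain dictionary; the function names and the service names `web` and `batch` are hypothetical.

```python
MACRO = "{$CONSUL.SERVICE_NODES.CRITICAL.MAX.AVG}"


def resolve_macro(macros, name, context):
    """Prefer the value defined for the context, else the context-free value."""
    return macros.get((name, context), macros[(name, None)])


def too_many_critical(critical_nodes, macros, service):
    """Mirror last(nodes_critical) > {$MACRO:"<service>"}."""
    return critical_nodes > resolve_macro(macros, MACRO, service)


# Default threshold 0, with a higher tolerance for one service.
macros = {(MACRO, None): 0, (MACRO, "batch"): 2}
```

With these values, a single critical node fires the trigger for `web`, while `batch` tolerates up to two critical nodes.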

This template is for Zabbix version: 7.2

Also available for: 7.4 7.0 6.4 6.2 6.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/consul_http/consul_cluster?at=release/7.2

HashiCorp Consul Cluster by HTTP

Overview

This template monitors HashiCorp Consul with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

The template HashiCorp Consul Cluster by HTTP collects metrics by HTTP agent from API endpoints. More information about the metrics can be found in the official documentation.

Requirements

Zabbix version: 7.2 and higher.

Tested versions

This template has been tested on:

HashiCorp Consul 1.10.0

Configuration

Setup

The template requires authorization via an API token.

Don't forget to change macros {$CONSUL.CLUSTER.URL}, {$CONSUL.TOKEN}. Also, see the Macros section for a list of macros used to set trigger values.

NOTE. Some metrics may not be collected depending on your HashiCorp Consul instance version and configuration.
NOTE. You may also be interested in the Envoy Proxy by HTTP template.

Macros used

Name	Description	Default
{$CONSUL.CLUSTER.URL}	Consul cluster URL.	`http://localhost:8500`
{$CONSUL.TOKEN}	Consul auth token.	`<PUT YOUR AUTH TOKEN>`
{$CONSUL.NAMESPACE}	Consul service namespace (Enterprise only). In the Open Source version, leave this macro empty. Leave it unset to get all services.
{$CONSUL.API.SCHEME}	Consul API scheme. Used in node LLD.	`http`
{$CONSUL.API.PORT}	Consul API port. Used in node LLD.	`8500`
{$CONSUL.LLD.FILTER.NODE_NAME.MATCHES}	Filter for discoverable nodes.	`.*`
{$CONSUL.LLD.FILTER.NODE_NAME.NOT_MATCHES}	Filter to exclude discovered nodes.	`CHANGE IF NEEDED`
{$CONSUL.LLD.FILTER.SERVICE_NAME.MATCHES}	Filter for discoverable services.	`.*`
{$CONSUL.LLD.FILTER.SERVICE_NAME.NOT_MATCHES}	Filter to exclude discovered services.	`CHANGE IF NEEDED`
{$CONSUL.SERVICE_NODES.CRITICAL.MAX.AVG}	Maximum number of service nodes in status 'critical' for trigger expression. Can be used with context.	`0`

Items

Name	Description	Type	Key and additional info
Cluster leader	Current leader address.	HTTP agent	consul.get_leader Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value Trim: `"` Discard unchanged with heartbeat: `1h`
Nodes: peers	The number of Raft peers for the datacenter in which the agent is running.	HTTP agent	consul.get_peers Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value JSON Path: `$.length()` Discard unchanged with heartbeat: `3h`
Get nodes	Catalog of nodes registered in a given datacenter.	HTTP agent	consul.get_nodes Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Get nodes Serf health status	Get Serf Health Status for all agents in cluster.	HTTP agent	consul.get_cluster_serf Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Nodes: total	Number of nodes on current dc.	Dependent item	consul.nodes_total Preprocessing JSON Path: `$.length()` Discard unchanged with heartbeat: `3h`
Nodes: passing	Number of agents on current dc with serf health status 'passing'.	Dependent item	consul.nodes_passing Preprocessing JSON Path: `$[?(@.Status == "passing")].length()` Discard unchanged with heartbeat: `3h`
Nodes: critical	Number of agents on current dc with serf health status 'critical'.	Dependent item	consul.nodes_critical Preprocessing JSON Path: `$[?(@.Status == "critical")].length()` Discard unchanged with heartbeat: `3h`
Nodes: warning	Number of agents on current dc with serf health status 'warning'.	Dependent item	consul.nodes_warning Preprocessing JSON Path: `$[?(@.Status == "warning")].length()` Discard unchanged with heartbeat: `3h`
Get services	Catalog of services registered in a given datacenter.	HTTP agent	consul.get_catalog_services Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Services: total	Number of services on current dc.	Dependent item	consul.services_total Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
HashiCorp Consul Cluster: Leader has been changed	Consul cluster leader has changed. Acknowledge to close the problem manually.	`last(/HashiCorp Consul Cluster by HTTP/consul.get_leader,#1)<>last(/HashiCorp Consul Cluster by HTTP/consul.get_leader,#2) and length(last(/HashiCorp Consul Cluster by HTTP/consul.get_leader))>0`	Info	Manual close: Yes
HashiCorp Consul Cluster: One or more nodes in cluster in 'critical' state	One or more agents on current dc with serf health status 'critical'.	`last(/HashiCorp Consul Cluster by HTTP/consul.nodes_critical)>0`	Average
HashiCorp Consul Cluster: One or more nodes in cluster in 'warning' state	One or more agents on current dc with serf health status 'warning'.	`last(/HashiCorp Consul Cluster by HTTP/consul.nodes_warning)>0`	Warning

LLD rule Consul cluster nodes discovery

Name	Description	Type	Key and additional info
Consul cluster nodes discovery		Dependent item	consul.lld_nodes Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Consul cluster nodes discovery

Name	Description	Type	Key and additional info
Node ["{#NODE_NAME}"]: Serf Health	Node Serf Health Status.	Dependent item	consul.serf.health["{#NODE_NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

LLD rule Consul cluster services discovery

Name	Description	Type	Key and additional info
Consul cluster services discovery		Dependent item	consul.lld_services Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Consul cluster services discovery

Name	Description	Type	Key and additional info
Service ["{#SERVICE_NAME}"]: Nodes passing	The number of nodes with service status `passing` from those registered.	Dependent item	consul.service.nodes_passing["{#SERVICE_NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
Service ["{#SERVICE_NAME}"]: Nodes warning	The number of nodes with service status `warning` from those registered.	Dependent item	consul.service.nodes_warning["{#SERVICE_NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
Service ["{#SERVICE_NAME}"]: Nodes critical	The number of nodes with service status `critical` from those registered.	Dependent item	consul.service.nodes_critical["{#SERVICE_NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
["{#SERVICE_NAME}"]: Get raw service state	Retrieve service instances providing the service indicated on the path.	HTTP agent	consul.get_service_stats["{#SERVICE_NAME}"] Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value

Trigger prototypes for Consul cluster services discovery

Name	Description	Expression	Severity	Dependencies and additional info
HashiCorp Consul Cluster: Service ["{#SERVICE_NAME}"]: Too many nodes with service status 'critical'	One or more nodes with service status 'critical'.	`last(/HashiCorp Consul Cluster by HTTP/consul.service.nodes_critical["{#SERVICE_NAME}"])>{$CONSUL.SERVICE_NODES.CRITICAL.MAX.AVG:"{#SERVICE_NAME}"}`	Average
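
The trigger prototype compares the critical-node count with `{$CONSUL.SERVICE_NODES.CRITICAL.MAX.AVG:"{#SERVICE_NAME}"}`, a user macro with context. Zabbix uses the value defined for the matching context if one exists and otherwise falls back to the macro without context. The sketch below models only that lookup. The `macros` map, the `web` override and both helper functions are hypothetical and serve only as illustration.

```javascript
// Hypothetical macro definitions: the template default plus one per-service override.
var macros = {
  '{$CONSUL.SERVICE_NODES.CRITICAL.MAX.AVG}': 0,
  '{$CONSUL.SERVICE_NODES.CRITICAL.MAX.AVG:"web"}': 2
};

// Resolve the threshold for a service: context value first, default otherwise.
function criticalThreshold(serviceName) {
  var withContext = '{$CONSUL.SERVICE_NODES.CRITICAL.MAX.AVG:"' + serviceName + '"}';
  if (Object.prototype.hasOwnProperty.call(macros, withContext)) {
    return macros[withContext];
  }
  return macros['{$CONSUL.SERVICE_NODES.CRITICAL.MAX.AVG}'];
}

// Mirrors the comparison in the trigger expression: last(...) > threshold.
function serviceTriggerFires(serviceName, nodesCritical) {
  return nodesCritical > criticalThreshold(serviceName);
}
```

With these assumed values, a service without an override fires on the first critical node, while `web` tolerates up to two.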

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

This template is for Zabbix version: 7.0

Also available for: 7.4 7.2 6.4 6.2 6.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/consul_http/consul_cluster?at=release/7.0

HashiCorp Consul Cluster by HTTP

Overview

This template monitors HashiCorp Consul with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

The HashiCorp Consul Cluster by HTTP template collects metrics with the HTTP agent from API endpoints. More information about the metrics can be found in the official documentation.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

HashiCorp Consul 1.10.0

Configuration

Setup

The template needs to use authorization via an API token.

Don't forget to change the macros {$CONSUL.CLUSTER.URL} and {$CONSUL.TOKEN}. Also, see the Macros section for a list of macros used to set trigger values.

NOTE. Some metrics may not be collected depending on your HashiCorp Consul instance version and configuration.
NOTE. You may be interested in the Envoy Proxy by HTTP template.
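
To make the token requirement concrete, here is a minimal sketch of the kind of request an HTTP client would send for the cluster leader, plus the quote trimming that the `Cluster leader` item applies. It assumes the `/v1/status/leader` endpoint of the Consul HTTP API and a bearer-style `Authorization` header (Consul also accepts the token in an `X-Consul-Token` header). Which endpoint and header the template itself uses is not stated on this page, and the function names are hypothetical.

```javascript
// Builds the request description only; nothing is sent over the network here.
function buildLeaderRequest(clusterUrl, token) {
  return {
    // Strip trailing slashes so the path is not doubled.
    url: clusterUrl.replace(/\/+$/, '') + '/v1/status/leader',
    headers: { Authorization: 'Bearer ' + token }
  };
}

// Mirrors a "Trim" preprocessing step configured with the quote character:
// the endpoint answers with a JSON string such as "10.1.10.12:8300".
function trimQuotes(raw) {
  return raw.replace(/^"+|"+$/g, '');
}
```

In this sketch `clusterUrl` and `token` stand in for the values of `{$CONSUL.CLUSTER.URL}` and `{$CONSUL.TOKEN}`.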

Macros used

Name	Description	Default
{$CONSUL.CLUSTER.URL}	Consul cluster URL.	`http://localhost:8500`
{$CONSUL.TOKEN}	Consul auth token.	`<PUT YOUR AUTH TOKEN>`
{$CONSUL.NAMESPACE}	Consul service namespace. Enterprise only; for the Open Source version, leave this macro empty. Do not set this macro if you want to get all services.
{$CONSUL.API.SCHEME}	Consul API scheme. Used in node LLD.	`http`
{$CONSUL.API.PORT}	Consul API port. Used in node LLD.	`8500`
{$CONSUL.LLD.FILTER.NODE_NAME.MATCHES}	Filter for discoverable nodes.	`.*`
{$CONSUL.LLD.FILTER.NODE_NAME.NOT_MATCHES}	Filter to exclude discovered nodes.	`CHANGE IF NEEDED`
{$CONSUL.LLD.FILTER.SERVICE_NAME.MATCHES}	Filter for discoverable services.	`.*`
{$CONSUL.LLD.FILTER.SERVICE_NAME.NOT_MATCHES}	Filter to exclude discovered services.	`CHANGE IF NEEDED`
{$CONSUL.SERVICE_NODES.CRITICAL.MAX.AVG}	Maximum number of service nodes in status 'critical' for trigger expression. Can be used with context.	`0`

Items

Name	Description	Type	Key and additional info
Cluster leader	Current leader address.	HTTP agent	consul.get_leader Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value Trim: `"` Discard unchanged with heartbeat: `1h`
Nodes: peers	The number of Raft peers for the datacenter in which the agent is running.	HTTP agent	consul.get_peers Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value JSON Path: `$.length()` Discard unchanged with heartbeat: `3h`
Get nodes	Catalog of nodes registered in a given datacenter.	HTTP agent	consul.get_nodes Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Get nodes Serf health status	Get Serf Health Status for all agents in cluster.	HTTP agent	consul.get_cluster_serf Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Nodes: total	Number of nodes on current dc.	Dependent item	consul.nodes_total Preprocessing JSON Path: `$.length()` Discard unchanged with heartbeat: `3h`
Nodes: passing	Number of agents on current dc with serf health status 'passing'.	Dependent item	consul.nodes_passing Preprocessing JSON Path: `$[?(@.Status == "passing")].length()` Discard unchanged with heartbeat: `3h`
Nodes: critical	Number of agents on current dc with serf health status 'critical'.	Dependent item	consul.nodes_critical Preprocessing JSON Path: `$[?(@.Status == "critical")].length()` Discard unchanged with heartbeat: `3h`
Nodes: warning	Number of agents on current dc with serf health status 'warning'.	Dependent item	consul.nodes_warning Preprocessing JSON Path: `$[?(@.Status == "warning")].length()` Discard unchanged with heartbeat: `3h`
Get services	Catalog of services registered in a given datacenter.	HTTP agent	consul.get_catalog_services Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Services: total	Number of services on current dc.	Dependent item	consul.services_total Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
HashiCorp Consul Cluster: Leader has been changed	Consul cluster leader has changed. Acknowledge to close the problem manually.	`last(/HashiCorp Consul Cluster by HTTP/consul.get_leader,#1)<>last(/HashiCorp Consul Cluster by HTTP/consul.get_leader,#2) and length(last(/HashiCorp Consul Cluster by HTTP/consul.get_leader))>0`	Info	Manual close: Yes
HashiCorp Consul Cluster: One or more nodes in cluster in 'critical' state	One or more agents on current dc with serf health status 'critical'.	`last(/HashiCorp Consul Cluster by HTTP/consul.nodes_critical)>0`	Average
HashiCorp Consul Cluster: One or more nodes in cluster in 'warning' state	One or more agents on current dc with serf health status 'warning'.	`last(/HashiCorp Consul Cluster by HTTP/consul.nodes_warning)>0`	Warning
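
The leader trigger combines two conditions: the latest value of `consul.get_leader` differs from the previous one, and the latest value is not empty. The sketch below restates that logic in JavaScript as an illustration only. The `leaderChanged` helper and the sample addresses are assumptions, and returning `false` when fewer than two values exist is a simplification of how Zabbix treats missing history.

```javascript
// history[0] is the most recent value (#1), history[1] the one before it (#2).
function leaderChanged(history) {
  if (history.length < 2) {
    return false; // simplification: not enough data to compare
  }
  return history[0] !== history[1] && history[0].length > 0;
}
```

The non-empty check keeps the trigger quiet when the latest stored value is an empty string, for example while no leader is reported.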

LLD rule Consul cluster nodes discovery

Name	Description	Type	Key and additional info
Consul cluster nodes discovery		Dependent item	consul.lld_nodes Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Consul cluster nodes discovery

Name	Description	Type	Key and additional info
Node ["{#NODE_NAME}"]: Serf Health	Node Serf Health Status.	Dependent item	consul.serf.health["{#NODE_NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

LLD rule Consul cluster services discovery

Name	Description	Type	Key and additional info
Consul cluster services discovery		Dependent item	consul.lld_services Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Consul cluster services discovery

Name	Description	Type	Key and additional info
Service ["{#SERVICE_NAME}"]: Nodes passing	The number of nodes with service status `passing` from those registered.	Dependent item	consul.service.nodes_passing["{#SERVICE_NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
Service ["{#SERVICE_NAME}"]: Nodes warning	The number of nodes with service status `warning` from those registered.	Dependent item	consul.service.nodes_warning["{#SERVICE_NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
Service ["{#SERVICE_NAME}"]: Nodes critical	The number of nodes with service status `critical` from those registered.	Dependent item	consul.service.nodes_critical["{#SERVICE_NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
["{#SERVICE_NAME}"]: Get raw service state	Retrieve service instances providing the service indicated on the path.	HTTP agent	consul.get_service_stats["{#SERVICE_NAME}"] Preprocessing Check for not supported value: `any error` ⛔️Custom on fail: Discard value

Trigger prototypes for Consul cluster services discovery

Name	Description	Expression	Severity	Dependencies and additional info
HashiCorp Consul Cluster: Service ["{#SERVICE_NAME}"]: Too many nodes with service status 'critical'	One or more nodes with service status 'critical'.	`last(/HashiCorp Consul Cluster by HTTP/consul.service.nodes_critical["{#SERVICE_NAME}"])>{$CONSUL.SERVICE_NODES.CRITICAL.MAX.AVG:"{#SERVICE_NAME}"}`	Average

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

This template is for Zabbix version: 6.4

Also available for: 7.4 7.2 7.0 6.2 6.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/consul_http/consul_cluster?at=release/6.4

HashiCorp Consul Cluster by HTTP

Overview

This template monitors HashiCorp Consul with Zabbix and works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

The HashiCorp Consul Cluster by HTTP template collects metrics with the HTTP agent from API endpoints.
More information about the metrics can be found in the official documentation.

Requirements

Zabbix version: 6.4 and higher.

Tested versions

This template has been tested on:

HashiCorp Consul 1.10.0

Configuration

Setup

The template needs to use authorization via an API token.

Don't forget to change the macros {$CONSUL.CLUSTER.URL} and {$CONSUL.TOKEN}.
Also, see the Macros section for a list of macros used to set trigger values.

This template supports Consul namespaces. You can set the macro {$CONSUL.NAMESPACE} if you are interested in only one service namespace. Do not set this macro if you want to get all services.
For the Open Source version, leave this macro empty.

NOTE. Some metrics may not be collected depending on your HashiCorp Consul instance version and configuration.
NOTE. You may be interested in the Envoy Proxy by HTTP template.
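
As an illustration of the namespace setting described above, the sketch below builds a catalog URL that carries a namespace only when one is configured. Consul Enterprise endpoints accept a namespace through the `ns` query parameter. Whether the template passes `{$CONSUL.NAMESPACE}` in exactly this way is an assumption, and `buildServicesUrl` is a hypothetical helper.

```javascript
// Builds the services catalog URL; `ns` is appended only for a non-empty namespace.
function buildServicesUrl(clusterUrl, namespace) {
  var url = clusterUrl.replace(/\/+$/, '') + '/v1/catalog/services';
  if (namespace) {
    url += '?ns=' + encodeURIComponent(namespace);
  }
  return url;
}
```

An empty namespace leaves the URL without a query string, which corresponds to leaving the macro empty on the Open Source version.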

Macros used

Name	Description	Default
{$CONSUL.CLUSTER.URL}	Consul cluster URL.	`http://localhost:8500`
{$CONSUL.TOKEN}	Consul auth token.	`<PUT YOUR AUTH TOKEN>`
{$CONSUL.NAMESPACE}	Consul service namespace. Enterprise only; for the Open Source version, leave this macro empty. Do not set this macro if you want to get all services.
{$CONSUL.API.SCHEME}	Consul API scheme. Used in node LLD.	`http`
{$CONSUL.API.PORT}	Consul API port. Used in node LLD.	`8500`
{$CONSUL.LLD.FILTER.NODE_NAME.MATCHES}	Filter for discoverable nodes.	`.*`
{$CONSUL.LLD.FILTER.NODE_NAME.NOT_MATCHES}	Filter to exclude discovered nodes.	`CHANGE IF NEEDED`
{$CONSUL.LLD.FILTER.SERVICE_NAME.MATCHES}	Filter for discoverable services.	`.*`
{$CONSUL.LLD.FILTER.SERVICE_NAME.NOT_MATCHES}	Filter to exclude discovered services.	`CHANGE IF NEEDED`
{$CONSUL.SERVICE_NODES.CRITICAL.MAX.AVG}	Maximum number of service nodes in status 'critical' for trigger expression. Can be used with context.	`0`

Items

Name	Description	Type	Key and additional info
Consul cluster: Cluster leader	Current leader address.	HTTP agent	consul.get_leader Preprocessing Check for not supported value ⛔️Custom on fail: Discard value Trim: `"` Discard unchanged with heartbeat: `1h`
Consul cluster: Nodes: peers	The number of Raft peers for the datacenter in which the agent is running.	HTTP agent	consul.get_peers Preprocessing Check for not supported value ⛔️Custom on fail: Discard value JSON Path: `$.length()` Discard unchanged with heartbeat: `3h`
Consul cluster: Get nodes	Catalog of nodes registered in a given datacenter.	HTTP agent	consul.get_nodes Preprocessing Check for not supported value ⛔️Custom on fail: Discard value
Consul cluster: Get nodes Serf health status	Get Serf Health Status for all agents in cluster.	HTTP agent	consul.get_cluster_serf Preprocessing Check for not supported value ⛔️Custom on fail: Discard value
Consul: Nodes: total	Number of nodes on current dc.	Dependent item	consul.nodes_total Preprocessing JSON Path: `$.length()` Discard unchanged with heartbeat: `3h`
Consul: Nodes: passing	Number of agents on current dc with serf health status 'passing'.	Dependent item	consul.nodes_passing Preprocessing JSON Path: `$[?(@.Status == "passing")].length()` Discard unchanged with heartbeat: `3h`
Consul: Nodes: critical	Number of agents on current dc with serf health status 'critical'.	Dependent item	consul.nodes_critical Preprocessing JSON Path: `$[?(@.Status == "critical")].length()` Discard unchanged with heartbeat: `3h`
Consul: Nodes: warning	Number of agents on current dc with serf health status 'warning'.	Dependent item	consul.nodes_warning Preprocessing JSON Path: `$[?(@.Status == "warning")].length()` Discard unchanged with heartbeat: `3h`
Consul cluster: Get services	Catalog of services registered in a given datacenter.	HTTP agent	consul.get_catalog_services Preprocessing Check for not supported value ⛔️Custom on fail: Discard value
Consul: Services: total	Number of services on current dc.	Dependent item	consul.services_total Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
Consul cluster: Leader has been changed	Consul cluster leader has changed. Acknowledge to close the problem manually.	`last(/HashiCorp Consul Cluster by HTTP/consul.get_leader,#1)<>last(/HashiCorp Consul Cluster by HTTP/consul.get_leader,#2) and length(last(/HashiCorp Consul Cluster by HTTP/consul.get_leader))>0`	Info	Manual close: Yes
Consul: One or more nodes in cluster in 'critical' state	One or more agents on current dc with serf health status 'critical'.	`last(/HashiCorp Consul Cluster by HTTP/consul.nodes_critical)>0`	Average
Consul: One or more nodes in cluster in 'warning' state	One or more agents on current dc with serf health status 'warning'.	`last(/HashiCorp Consul Cluster by HTTP/consul.nodes_warning)>0`	Warning

LLD rule Consul cluster nodes discovery

Name	Description	Type	Key and additional info
Consul cluster nodes discovery		Dependent item	consul.lld_nodes Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Consul cluster nodes discovery

Name	Description	Type	Key and additional info
Consul: Node ["{#NODE_NAME}"]: Serf Health	Node Serf Health Status.	Dependent item	consul.serf.health["{#NODE_NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

LLD rule Consul cluster services discovery

Name	Description	Type	Key and additional info
Consul cluster services discovery		Dependent item	consul.lld_services Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Consul cluster services discovery

Name	Description	Type	Key and additional info
Consul: Service ["{#SERVICE_NAME}"]: Nodes passing	The number of nodes with service status `passing` from those registered.	Dependent item	consul.service.nodes_passing["{#SERVICE_NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
Consul: Service ["{#SERVICE_NAME}"]: Nodes warning	The number of nodes with service status `warning` from those registered.	Dependent item	consul.service.nodes_warning["{#SERVICE_NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
Consul: Service ["{#SERVICE_NAME}"]: Nodes critical	The number of nodes with service status `critical` from those registered.	Dependent item	consul.service.nodes_critical["{#SERVICE_NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
Consul cluster: ["{#SERVICE_NAME}"]: Get raw service state	Retrieve service instances providing the service indicated on the path.	HTTP agent	consul.get_service_stats["{#SERVICE_NAME}"] Preprocessing Check for not supported value ⛔️Custom on fail: Discard value

Trigger prototypes for Consul cluster services discovery

Name	Description	Expression	Severity	Dependencies and additional info
Consul: Service ["{#SERVICE_NAME}"]: Too many nodes with service status 'critical'	One or more nodes with service status 'critical'.	`last(/HashiCorp Consul Cluster by HTTP/consul.service.nodes_critical["{#SERVICE_NAME}"])>{$CONSUL.CLUSTER.SERVICE_NODES.CRITICAL.MAX.AVG:"{#SERVICE_NAME}"}`	Average

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

This template is for Zabbix version: 6.2

Also available for: 7.4 7.2 7.0 6.4 6.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/consul_http/consul_cluster?at=release/6.2

HashiCorp Consul Cluster by HTTP

Overview

The HashiCorp Consul Cluster by HTTP template collects metrics with the HTTP agent from API endpoints.
More information about the metrics can be found in the official documentation.

This template was tested on:

HashiCorp Consul, version 1.10.0

Setup

The template needs to use authorization via an API token.

Don't forget to change the macros {$CONSUL.CLUSTER.URL} and {$CONSUL.TOKEN}.
Also, see the Macros section for a list of macros used to set trigger values.

This template supports Consul namespaces. You can set the macro {$CONSUL.NAMESPACE} if you are interested in only one service namespace. Do not set this macro if you want to get all services.
For the Open Source version, leave this macro empty.

NOTE. Some metrics may not be collected depending on your HashiCorp Consul instance version and configuration.
NOTE. You may be interested in the Envoy Proxy by HTTP template.

Zabbix configuration

No specific Zabbix configuration is required.

Macros used

Name	Description	Default
{$CONSUL.API.PORT}	Consul API port. Used in node LLD.	`8500`
{$CONSUL.API.SCHEME}	Consul API scheme. Used in node LLD.	`http`
{$CONSUL.CLUSTER.URL}	Consul cluster URL.	`http://localhost:8500`
{$CONSUL.LLD.FILTER.NODE_NAME.MATCHES}	Filter for discoverable nodes.	`.*`
{$CONSUL.LLD.FILTER.NODE_NAME.NOT_MATCHES}	Filter to exclude discovered nodes.	`CHANGE IF NEEDED`
{$CONSUL.LLD.FILTER.SERVICE_NAME.MATCHES}	Filter for discoverable services.	`.*`
{$CONSUL.LLD.FILTER.SERVICE_NAME.NOT_MATCHES}	Filter to exclude discovered services.	`CHANGE IF NEEDED`
{$CONSUL.NAMESPACE}	Consul service namespace. Enterprise only; for the Open Source version, leave this macro empty. Do not set this macro if you want to get all services.	``
{$CONSUL.SERVICE_NODES.CRITICAL.MAX.AVG}	Maximum number of service nodes in status 'critical' for trigger expression. Can be used with context.	`0`
{$CONSUL.TOKEN}	Consul auth token.	`<PUT YOUR AUTH TOKEN>`

Template links

There are no template links in this template.

Discovery rules

Name	Description	Type	Key and additional info
Consul cluster nodes discovery	-	DEPENDENT	consul.lld_nodes Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.` - DISCARD_UNCHANGED_HEARTBEAT: `3h` Filter: - {#NODE_NAME} MATCHES_REGEX `{$CONSUL.LLD.FILTER.NODE_NAME.MATCHES}` - {#NODE_NAME} NOT_MATCHES_REGEX `{$CONSUL.LLD.FILTER.NODE_NAME.NOT_MATCHES}`
Consul cluster services discovery	-	DEPENDENT	consul.lld_services Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.` - DISCARD_UNCHANGED_HEARTBEAT: `3h` Filter: - {#SERVICE_NAME} MATCHES_REGEX `{$CONSUL.LLD.FILTER.SERVICE_NAME.MATCHES}` - {#SERVICE_NAME} NOT_MATCHES_REGEX `{$CONSUL.LLD.FILTER.SERVICE_NAME.NOT_MATCHES}`
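
Each discovery rule keeps an entity only if its name matches the `MATCHES` macro and does not match the `NOT_MATCHES` macro. The sketch below shows that two-part test with the default values from the macro table. It is an approximation: Zabbix evaluates these filters with its own regular expression engine, while this example uses JavaScript's `RegExp`, and the `passesFilter` helper and the node names are hypothetical.

```javascript
// Keep a discovered name only if it matches `matches` and does not match `notMatches`.
function passesFilter(name, matches, notMatches) {
  return new RegExp(matches).test(name) && !new RegExp(notMatches).test(name);
}

// Defaults from the macro table: everything matches, and the placeholder
// "CHANGE IF NEEDED" is unlikely to appear in a real node or service name.
var keptByDefault = passesFilter('consul-server-1', '.*', 'CHANGE IF NEEDED');

// Example override (an assumption): exclude names that start with "test-".
var keptWithOverride = passesFilter('test-node', '.*', '^test-');
```

With the defaults nothing is excluded, which is why the `NOT_MATCHES` macros only take effect once they are changed.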

Items collected

Group	Name	Description	Type	Key and additional info
Consul	Consul: Nodes: total	Number of nodes on current dc.	DEPENDENT	consul.nodes_total Preprocessing: - JSONPATH: `$.length()` - DISCARD_UNCHANGED_HEARTBEAT: `3h`
Consul	Consul: Nodes: passing	Number of agents on current dc with serf health status 'passing'.	DEPENDENT	consul.nodes_passing Preprocessing: - JSONPATH: `$[?(@.Status == "passing")].length()` - DISCARD_UNCHANGED_HEARTBEAT: `3h`
Consul	Consul: Nodes: critical	Number of agents on current dc with serf health status 'critical'.	DEPENDENT	consul.nodes_critical Preprocessing: - JSONPATH: `$[?(@.Status == "critical")].length()` - DISCARD_UNCHANGED_HEARTBEAT: `3h`
Consul	Consul: Nodes: warning	Number of agents on current dc with serf health status 'warning'.	DEPENDENT	consul.nodes_warning Preprocessing: - JSONPATH: `$[?(@.Status == "warning")].length()` - DISCARD_UNCHANGED_HEARTBEAT: `3h`
Consul	Consul: Services: total	Number of services on current dc.	DEPENDENT	consul.services_total Preprocessing: - JAVASCRIPT: `return Object.keys(JSON.parse(value)).length;` - DISCARD_UNCHANGED_HEARTBEAT: `3h`
Consul	Consul: Node ["{#NODE_NAME}"]: Serf Health	Node Serf Health Status.	DEPENDENT	consul.serf.health["{#NODE_NAME}"] Preprocessing: - JSONPATH: `$[?(@.Node == "{#NODE_NAME}" && @.CheckID == "serfHealth")].Status.first()` - JAVASCRIPT: `The text is too long. Please see the template.` - DISCARD_UNCHANGED_HEARTBEAT: `3h`
Consul	Consul: Service ["{#SERVICE_NAME}"]: Nodes passing	-	DEPENDENT	consul.service.nodes_passing["{#SERVICE_NAME}"] Preprocessing: - JSONPATH: `$[?(@.Service.Service == "{#SERVICE_NAME}")].Checks[?(@.CheckID == "serfHealth" && @.Status == 'passing')].length()` - DISCARD_UNCHANGED_HEARTBEAT: `3h`
Consul	Consul: Service ["{#SERVICE_NAME}"]: Nodes warning	-	DEPENDENT	consul.service.nodes_warning["{#SERVICE_NAME}"] Preprocessing: - JSONPATH: `$[?(@.Service.Service == "{#SERVICE_NAME}")].Checks[?(@.CheckID == "serfHealth" && @.Status == 'warning')].length()` - DISCARD_UNCHANGED_HEARTBEAT: `3h`
Consul	Consul: Service ["{#SERVICE_NAME}"]: Nodes critical	-	DEPENDENT	consul.service.nodes_critical["{#SERVICE_NAME}"] Preprocessing: - JSONPATH: `$[?(@.Service.Service == "{#SERVICE_NAME}")].Checks[?(@.CheckID == "serfHealth" && @.Status == 'critical')].length()` - DISCARD_UNCHANGED_HEARTBEAT: `3h`
Consul cluster	Consul cluster: Cluster leader	Current leader address.	HTTP_AGENT	consul.get_leader Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: `DISCARD_VALUE ->` - TRIM: `"` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
Zabbix raw items	Consul cluster: Nodes: peers	The number of Raft peers for the datacenter in which the agent is running.	HTTP_AGENT	consul.get_peers Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: `DISCARD_VALUE ->` - JSONPATH: `$.length()` - DISCARD_UNCHANGED_HEARTBEAT: `3h`
Zabbix raw items	Consul cluster: Get nodes	Catalog of nodes registered in a given datacenter.	HTTP_AGENT	consul.get_nodes Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: `DISCARD_VALUE ->`
Zabbix raw items	Consul cluster: Get nodes Serf health status	Get Serf Health Status for all agents in cluster.	HTTP_AGENT	consul.get_cluster_serf Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: `DISCARD_VALUE ->`
Zabbix raw items	Consul cluster: Get services	Catalog of services registered in a given datacenter.	HTTP_AGENT	consul.get_catalog_services Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: `DISCARD_VALUE ->`
Zabbix raw items	Consul cluster: ["{#SERVICE_NAME}"]: Get raw service state	Retrieve service instances providing the service indicated on the path.	HTTP_AGENT	consul.get_service_stats["{#SERVICE_NAME}"] Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: `DISCARD_VALUE ->`
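
Two of the dependent items in the table above can be read as small computations over the raw JSON. `Consul: Services: total` counts the keys of the services catalog, and the per-service `Nodes passing/warning/critical` items count `serfHealth` checks with a given status. The sketch below reproduces both outside Zabbix. The sample payloads and the `countServiceNodes` helper are assumptions for illustration, with shapes modelled on the fields named in the JSONPath expressions.

```javascript
// Hypothetical catalog payload: service name mapped to its list of tags.
var catalog = JSON.stringify({ consul: [], web: ['v1'], db: [] });

// Same expression as the JAVASCRIPT step listed for consul.services_total.
function servicesTotal(value) {
  return Object.keys(JSON.parse(value)).length;
}

// Hypothetical raw service state: one entry per service instance.
var serviceState = JSON.stringify([
  { Service: { Service: 'web' }, Checks: [{ CheckID: 'serfHealth', Status: 'passing' }] },
  { Service: { Service: 'web' }, Checks: [{ CheckID: 'serfHealth', Status: 'critical' }] }
]);

// Rough equivalent of the JSONPath used by consul.service.nodes_<status>.
function countServiceNodes(value, serviceName, status) {
  var count = 0;
  JSON.parse(value).forEach(function (entry) {
    if (entry.Service.Service !== serviceName) {
      return;
    }
    entry.Checks.forEach(function (check) {
      if (check.CheckID === 'serfHealth' && check.Status === status) {
        count += 1;
      }
    });
  });
  return count;
}
```

Note that only the `serfHealth` check is counted, so a failing service-level check on an otherwise healthy node does not change these numbers.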

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
Consul: One or more nodes in cluster in 'critical' state	One or more agents on current dc with serf health status 'critical'.	`last(/HashiCorp Consul Cluster by HTTP/consul.nodes_critical)>0`	AVERAGE
Consul: One or more nodes in cluster in 'warning' state	One or more agents on current dc with serf health status 'warning'.	`last(/HashiCorp Consul Cluster by HTTP/consul.nodes_warning)>0`	WARNING
Consul: Service ["{#SERVICE_NAME}"]: Too many nodes with service status 'critical'	One or more nodes with service status 'critical'.	`last(/HashiCorp Consul Cluster by HTTP/consul.service.nodes_critical["{#SERVICE_NAME}"])>{$CONSUL.CLUSTER.SERVICE_NODES.CRITICAL.MAX.AVG:"{#SERVICE_NAME}"}`	AVERAGE
Consul cluster: Leader has been changed	Consul cluster leader has changed. Ack to close.	`last(/HashiCorp Consul Cluster by HTTP/consul.get_leader,#1)<>last(/HashiCorp Consul Cluster by HTTP/consul.get_leader,#2) and length(last(/HashiCorp Consul Cluster by HTTP/consul.get_leader))>0`	INFO	Manual close: YES

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.

This template is for Zabbix version: 6.0

Also available for: 7.4 7.2 7.0 6.4 6.2

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/consul_http/consul_cluster?at=release/6.0

HashiCorp Consul Cluster by HTTP

Overview

This template monitors HashiCorp Consul with Zabbix and works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

The HashiCorp Consul Cluster by HTTP template collects metrics with the HTTP agent from API endpoints.
More information about the metrics can be found in the official documentation.

Requirements

Zabbix version: 6.0 and higher.

Tested versions

This template has been tested on:

HashiCorp Consul 1.10.0

Configuration

Setup

The template needs to use authorization via an API token.

Don't forget to change the macros {$CONSUL.CLUSTER.URL} and {$CONSUL.TOKEN}.
Also, see the Macros section for a list of macros used to set trigger values.

This template supports Consul namespaces. You can set the macro {$CONSUL.NAMESPACE} if you are interested in only one service namespace. Do not set this macro if you want to get all services.
For the Open Source version, leave this macro empty.

NOTE. Some metrics may not be collected depending on your HashiCorp Consul instance version and configuration.
NOTE. You may be interested in the Envoy Proxy by HTTP template.

Macros used

Name	Description	Default
{$CONSUL.CLUSTER.URL}	Consul cluster URL.	`http://localhost:8500`
{$CONSUL.TOKEN}	Consul auth token.	`<PUT YOUR AUTH TOKEN>`
{$CONSUL.NAMESPACE}	Consul service namespace. Enterprise only; for the Open Source version, leave this macro empty. Do not set this macro if you want to get all services.
{$CONSUL.API.SCHEME}	Consul API scheme. Used in node LLD.	`http`
{$CONSUL.API.PORT}	Consul API port. Used in node LLD.	`8500`
{$CONSUL.LLD.FILTER.NODE_NAME.MATCHES}	Filter for discoverable nodes.	`.*`
{$CONSUL.LLD.FILTER.NODE_NAME.NOT_MATCHES}	Filter to exclude discovered nodes.	`CHANGE IF NEEDED`
{$CONSUL.LLD.FILTER.SERVICE_NAME.MATCHES}	Filter for discoverable services.	`.*`
{$CONSUL.LLD.FILTER.SERVICE_NAME.NOT_MATCHES}	Filter to exclude discovered services.	`CHANGE IF NEEDED`
{$CONSUL.SERVICE_NODES.CRITICAL.MAX.AVG}	Maximum number of service nodes in status 'critical' for trigger expression. Can be used with context.	`0`

Items

Name	Description	Type	Key and additional info
Consul cluster: Cluster leader	Current leader address.	HTTP agent	consul.get_leader Preprocessing Check for not supported value ⛔️Custom on fail: Discard value Trim: `"` Discard unchanged with heartbeat: `1h`
Consul cluster: Nodes: peers	The number of Raft peers for the datacenter in which the agent is running.	HTTP agent	consul.get_peers Preprocessing Check for not supported value ⛔️Custom on fail: Discard value JSON Path: `$.length()` Discard unchanged with heartbeat: `3h`
Consul cluster: Get nodes	Catalog of nodes registered in a given datacenter.	HTTP agent	consul.get_nodes Preprocessing Check for not supported value ⛔️Custom on fail: Discard value
Consul cluster: Get nodes Serf health status	Get Serf Health Status for all agents in cluster.	HTTP agent	consul.get_cluster_serf Preprocessing Check for not supported value ⛔️Custom on fail: Discard value
Consul: Nodes: total	Number of nodes on current dc.	Dependent item	consul.nodes_total Preprocessing JSON Path: `$.length()` Discard unchanged with heartbeat: `3h`
Consul: Nodes: passing	Number of agents on current dc with serf health status 'passing'.	Dependent item	consul.nodes_passing Preprocessing JSON Path: `$[?(@.Status == "passing")].length()` Discard unchanged with heartbeat: `3h`
Consul: Nodes: critical	Number of agents on current dc with serf health status 'critical'.	Dependent item	consul.nodes_critical Preprocessing JSON Path: `$[?(@.Status == "critical")].length()` Discard unchanged with heartbeat: `3h`
Consul: Nodes: warning	Number of agents on current dc with serf health status 'warning'.	Dependent item	consul.nodes_warning Preprocessing JSON Path: `$[?(@.Status == "warning")].length()` Discard unchanged with heartbeat: `3h`
Consul cluster: Get services	Catalog of services registered in a given datacenter.	HTTP agent	consul.get_catalog_services Preprocessing Check for not supported value ⛔️Custom on fail: Discard value
Consul: Services: total	Number of services on current dc.	Dependent item	consul.services_total Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
Consul cluster: Leader has been changed	Consul cluster leader has changed. Acknowledge to close the problem manually.	`last(/HashiCorp Consul Cluster by HTTP/consul.get_leader,#1)<>last(/HashiCorp Consul Cluster by HTTP/consul.get_leader,#2) and length(last(/HashiCorp Consul Cluster by HTTP/consul.get_leader))>0`	Info	Manual close: Yes
Consul: One or more nodes in cluster in 'critical' state	One or more agents on current dc with serf health status 'critical'.	`last(/HashiCorp Consul Cluster by HTTP/consul.nodes_critical)>0`	Average
Consul: One or more nodes in cluster in 'warning' state	One or more agents on current dc with serf health status 'warning'.	`last(/HashiCorp Consul Cluster by HTTP/consul.nodes_warning)>0`	Warning
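
The leader-change expression compares the two most recent values of `consul.get_leader` and requires the latest one to be non-empty. The logic can be sketched in Python as follows. The function name `leader_changed`, the newest-first list convention, and the sample addresses are assumptions for illustration only; this is not how Zabbix evaluates triggers internally, and the handling of fewer than two values is a simplification.

```python
def leader_changed(history):
    """Mimic the leader-change trigger logic.

    history: collected leader values, newest first, so history[0]
    corresponds to last(...,#1) and history[1] to last(...,#2).
    """
    if len(history) < 2:
        # Assumption for this sketch: with fewer than two values there is
        # nothing to compare, so no problem is reported.
        return False
    latest, previous = history[0], history[1]
    # last(#1) <> last(#2) and length(last) > 0
    return latest != previous and len(latest) > 0


if __name__ == "__main__":
    # Hypothetical leader addresses.
    print(leader_changed(["10.0.0.2:8300", "10.0.0.1:8300"]))
    print(leader_changed(["10.0.0.1:8300", "10.0.0.1:8300"]))
```

The non-empty check keeps the trigger from firing when the latest value is an empty string, for example while no leader is reported.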

LLD rule Consul cluster nodes discovery

Name	Description	Type	Key and additional info
Consul cluster nodes discovery		Dependent item	consul.lld_nodes Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Consul cluster nodes discovery

Name	Description	Type	Key and additional info
Consul: Node ["{#NODE_NAME}"]: Serf Health	Node Serf Health Status.	Dependent item	consul.serf.health["{#NODE_NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

LLD rule Consul cluster services discovery

Name	Description	Type	Key and additional info
Consul cluster services discovery		Dependent item	consul.lld_services Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Consul cluster services discovery

Name	Description	Type	Key and additional info
Consul: Service ["{#SERVICE_NAME}"]: Nodes passing	The number of nodes with service status `passing` from those registered.	Dependent item	consul.service.nodes_passing["{#SERVICE_NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
Consul: Service ["{#SERVICE_NAME}"]: Nodes warning	The number of nodes with service status `warning` from those registered.	Dependent item	consul.service.nodes_warning["{#SERVICE_NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
Consul: Service ["{#SERVICE_NAME}"]: Nodes critical	The number of nodes with service status `critical` from those registered.	Dependent item	consul.service.nodes_critical["{#SERVICE_NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
Consul cluster: ["{#SERVICE_NAME}"]: Get raw service state	Retrieve service instances providing the service indicated on the path.	HTTP agent	consul.get_service_stats["{#SERVICE_NAME}"] Preprocessing Check for not supported value ⛔️Custom on fail: Discard value

Trigger prototypes for Consul cluster services discovery

Name	Description	Expression	Severity	Dependencies and additional info
Consul: Service ["{#SERVICE_NAME}"]: Too many nodes with service status 'critical'	The number of nodes with service status 'critical' exceeds the threshold set in {$CONSUL.CLUSTER.SERVICE_NODES.CRITICAL.MAX.AVG}.	`last(/HashiCorp Consul Cluster by HTTP/consul.service.nodes_critical["{#SERVICE_NAME}"])>{$CONSUL.CLUSTER.SERVICE_NODES.CRITICAL.MAX.AVG:"{#SERVICE_NAME}"}`	Average
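
The trigger prototype reads its threshold from a user macro with context, so a per-service value can override the default. In Zabbix, when no macro with the matching context is defined, the macro without context is used. The lookup can be sketched in Python as below. The helper names `resolve_threshold` and `trigger_fires`, the service names, and all threshold values are hypothetical, and the sketch covers only exact-match contexts.

```python
BASE_MACRO = "{$CONSUL.CLUSTER.SERVICE_NODES.CRITICAL.MAX.AVG}"

# Hypothetical macro values: a default plus an override for a service named "web".
MACROS = {
    BASE_MACRO: "1",
    '{$CONSUL.CLUSTER.SERVICE_NODES.CRITICAL.MAX.AVG:"web"}': "3",
}


def resolve_threshold(macros, service_name):
    """Prefer the macro with a matching context, else fall back to the default."""
    contextual = BASE_MACRO[:-1] + ':"' + service_name + '"}'
    return float(macros.get(contextual, macros[BASE_MACRO]))


def trigger_fires(critical_nodes, macros, service_name):
    """Mimic: last(nodes_critical) > threshold for this service."""
    return critical_nodes > resolve_threshold(macros, service_name)


if __name__ == "__main__":
    print(trigger_fires(2, MACROS, "web"))  # compared against the "web" override
    print(trigger_fires(2, MACROS, "db"))   # compared against the default
```

With this layout, a service that tolerates more critical nodes gets its own context macro on the host, and every other discovered service keeps the default.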

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help on the Zabbix forums.
