Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/etcd_http?at=release/7.4
Etcd by HTTP
Overview
This template is designed to monitor etcd with Zabbix and works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template Etcd by HTTP collects metrics with the HTTP agent from the /metrics endpoint.
Refer to the vendor documentation.
For users of etcd version <= 3.4!
In etcd v3.5, some metrics have been deprecated. See more details in Upgrade etcd from 3.4 to 3.5. Please upgrade your etcd instance, or use an older Etcd by HTTP template version.
Requirements
Zabbix version: 7.4 and higher.
Tested versions
This template has been tested on:
- Etcd 3.5.6
Configuration
Zabbix should be configured according to the instructions in the Templates out of the box section.
Setup
- Make sure that etcd allows the collection of metrics. You can test it by running: curl -L http://localhost:2379/metrics
- Check that etcd is accessible from the Zabbix proxy or Zabbix server, depending on where you plan to do the monitoring. To verify it, run: curl -L http://<etcd_node_address>:2379/metrics
- Add the template to the etcd node. Set the hostname or IP address of the etcd host in the {$ETCD.HOST} macro. By default, the template uses the client port. You can configure the metrics endpoint location by adding the --listen-metrics-urls flag.
For more details, see the etcd documentation.
Additional points to consider:
- If you have specified a non-standard port for etcd, don't forget to change the macros {$ETCD.SCHEME} and {$ETCD.PORT}.
- You can set the {$ETCD.USER} and {$ETCD.PASSWORD} macros in the template to use on a host level if necessary.
- To test availability, run: zabbix_get -s etcd-host -k etcd.health
- See the Macros used section, as it sets the trigger thresholds.
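The curl checks in the Setup steps can also be scripted. What follows is a minimal Python sketch, not part of the template: the helper names build_metrics_url and fetch_metrics are hypothetical, and their arguments only mirror the {$ETCD.SCHEME}, {$ETCD.HOST}, and {$ETCD.PORT} macros with the defaults listed in this document.

```python
from urllib.request import urlopen


def build_metrics_url(host: str, scheme: str = "http", port: int = 2379) -> str:
    """Compose the /metrics URL from values mirroring the template macros."""
    return f"{scheme}://{host}:{port}/metrics"


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Return the raw metrics text, roughly what `curl -L <url>` prints.

    urlopen follows redirects by default, similar to curl's -L flag.
    """
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling fetch_metrics(build_metrics_url("<etcd_node_address>")) from the Zabbix server or proxy host should return the same text as the curl command, provided the node is reachable from there.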
Macros used
Name | Description | Default |
---|---|---|
{$ETCD.HOST} | The hostname or IP address of the etcd host. | <SET ETCD HOST> |
{$ETCD.PORT} | The port of the etcd host. | 2379 |
{$ETCD.SCHEME} | The request scheme, which may be http or https. | http |
{$ETCD.USER} | | |
{$ETCD.PASSWORD} | | |
{$ETCD.LEADER.CHANGES.MAX.WARN} | The maximum number of leader changes. | 5 |
{$ETCD.PROPOSAL.FAIL.MAX.WARN} | The maximum number of proposal failures. | 2 |
{$ETCD.HTTP.FAIL.MAX.WARN} | The maximum number of HTTP request failures. | 2 |
{$ETCD.PROPOSAL.PENDING.MAX.WARN} | The maximum number of proposals in queue. | 5 |
{$ETCD.OPEN.FDS.MAX.WARN} | The maximum percentage of used file descriptors. | 90 |
{$ETCD.GRPC_CODE.MATCHES} | The filter of discoverable gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md. | .* |
{$ETCD.GRPC_CODE.NOT_MATCHES} | The filter to exclude discovered gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md. | CHANGE_IF_NEEDED |
{$ETCD.GRPC.ERRORS.MAX.WARN} | The maximum number of gRPC request failures. | 1 |
{$ETCD.GRPC_CODE.TRIGGER.MATCHES} | The filter of discoverable gRPC codes, which will create triggers. | Aborted|Unavailable |
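To make the three gRPC code filters more concrete, here is a rough Python sketch of how their default values would classify a few gRPC status codes. It is an illustration only: the function names are hypothetical, Python's re module stands in for the regular expression engine Zabbix uses, and the assumption that {$ETCD.GRPC_CODE.TRIGGER.MATCHES} narrows the already discovered set is mine, since this document does not show how the template wires the macro into discovery.

```python
import re

# Default values copied from the macros table.
GRPC_CODE_MATCHES = r".*"
GRPC_CODE_NOT_MATCHES = r"CHANGE_IF_NEEDED"
GRPC_CODE_TRIGGER_MATCHES = r"Aborted|Unavailable"


def is_discovered(code: str) -> bool:
    """A code is kept if it matches MATCHES and does not match NOT_MATCHES."""
    return (re.search(GRPC_CODE_MATCHES, code) is not None
            and re.search(GRPC_CODE_NOT_MATCHES, code) is None)


def gets_trigger(code: str) -> bool:
    """Assumption: triggers exist only for discovered codes matching TRIGGER.MATCHES."""
    return (is_discovered(code)
            and re.search(GRPC_CODE_TRIGGER_MATCHES, code) is not None)
```

With the defaults, every code is discovered (the placeholder CHANGE_IF_NEEDED matches no real status code), while only Aborted and Unavailable would receive a trigger.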
Items
Name | Description | Type | Key and additional info |
---|---|---|---|
Service's TCP port state | | Simple check | net.tcp.service["{$ETCD.SCHEME}","{$ETCD.HOST}","{$ETCD.PORT}"] |
Get node metrics | | HTTP agent | etcd.get_metrics |
Node health | | HTTP agent | etcd.health |
Server is a leader | Whether or not this member is a leader: 1 if it is, 0 otherwise. | Dependent item | etcd.is.leader |
Server has a leader | Whether or not a leader exists: 1 if it exists, 0 if it does not. | Dependent item | etcd.has.leader |
Leader changes | The number of leader changes the member has seen since its start. | Dependent item | etcd.leader.changes |
Proposals committed per second | The number of consensus proposals committed. | Dependent item | etcd.proposals.committed.rate |
Proposals applied per second | The number of consensus proposals applied. | Dependent item | etcd.proposals.applied.rate |
Proposals failed per second | The number of failed proposals seen. | Dependent item | etcd.proposals.failed.rate |
Proposals pending | The current number of pending proposals to commit. | Dependent item | etcd.proposals.pending |
Reads per second | The number of read actions by | Dependent item | etcd.reads.rate |
Writes per second | The number of writes (e.g., | Dependent item | etcd.writes.rate |
Client gRPC received bytes per second | The number of bytes received from gRPC clients per second. | Dependent item | etcd.network.grpc.received.rate |
Client gRPC sent bytes per second | The number of bytes sent to gRPC clients per second. | Dependent item | etcd.network.grpc.sent.rate |
HTTP requests received | The number of requests received into the system (successfully parsed and | Dependent item | etcd.http.requests.rate |
HTTP 5XX | The number of handled failures of requests (non-watches), by the method ( | Dependent item | etcd.http.requests.5xx.rate |
HTTP 4XX | The number of handled failures of requests (non-watches), by the method ( | Dependent item | etcd.http.requests.4xx.rate |
RPCs received per second | The number of gRPC stream messages received on the server. | Dependent item | etcd.grpc.received.rate |
RPCs sent per second | The number of gRPC stream messages sent by the server. | Dependent item | etcd.grpc.sent.rate |
RPCs started per second | The number of RPCs started on the server. | Dependent item | etcd.grpc.started.rate |
Get version | | HTTP agent | etcd.get_version |
Server version | The version of the etcd server. | Dependent item | etcd.server.version |
Cluster version | The version of the etcd cluster. | Dependent item | etcd.cluster.version |
DB size | The total size of the underlying database. | Dependent item | etcd.db.size |
Keys compacted per second | The number of DB keys compacted per second. | Dependent item | etcd.keys.compacted.rate |
Keys expired per second | The number of expired keys per second. | Dependent item | etcd.keys.expired.rate |
Keys total | The total number of keys. | Dependent item | etcd.keys.total |
Uptime | | Dependent item | etcd.uptime |
Virtual memory | The size of virtual memory expressed in bytes. | Dependent item | etcd.virtual.bytes |
Resident memory | The size of resident memory expressed in bytes. | Dependent item | etcd.res.bytes |
CPU | The total user and system CPU time spent in seconds. | Dependent item | etcd.cpu.util |
Open file descriptors | The number of open file descriptors. | Dependent item | etcd.open.fds |
Maximum open file descriptors | The maximum number of open file descriptors. | Dependent item | etcd.max.fds |
Deletes per second | The number of deletes seen by this member per second. | Dependent item | etcd.delete.rate |
PUT per second | The number of puts seen by this member per second. | Dependent item | etcd.put.rate |
Range per second | The number of ranges seen by this member per second. | Dependent item | etcd.range.rate |
Transaction per second | The number of transactions seen by this member per second. | Dependent item | etcd.txn.rate |
Pending events | The total number of pending events to be sent. | Dependent item | etcd.events.sent.rate |
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Etcd: Service is unavailable | | last(/Etcd by HTTP/net.tcp.service["{$ETCD.SCHEME}","{$ETCD.HOST}","{$ETCD.PORT}"])=0 | Average | Manual close: Yes |
Etcd: Node healthcheck failed | See more details on https://etcd.io/docs/v3.5/op-guide/monitoring/#health-check. | last(/Etcd by HTTP/etcd.health)=0 | Average | Depends on: |
Etcd: Failed to fetch info data | Zabbix has not received any data for items for the last 30 minutes. | nodata(/Etcd by HTTP/etcd.is.leader,30m)=1 | Warning | Manual close: Yes. Depends on: |
Etcd: Member has no leader | If a member does not have a leader, it is totally unavailable. | last(/Etcd by HTTP/etcd.has.leader)=0 | Average | |
Etcd: Instance has seen too many leader changes | Rapid leadership changes impact the performance of etcd. | (max(/Etcd by HTTP/etcd.leader.changes,15m)-min(/Etcd by HTTP/etcd.leader.changes,15m))>{$ETCD.LEADER.CHANGES.MAX.WARN} | Warning | |
Etcd: Too many proposal failures | Normally related to two issues: temporary failures related to a leader election or longer downtime caused by a loss of quorum in the cluster. | min(/Etcd by HTTP/etcd.proposals.failed.rate,5m)>{$ETCD.PROPOSAL.FAIL.MAX.WARN} | Warning | |
Etcd: Too many proposals are queued to commit | A rising number of pending proposals suggests a high client load or a member that cannot commit proposals. | min(/Etcd by HTTP/etcd.proposals.pending,5m)>{$ETCD.PROPOSAL.PENDING.MAX.WARN} | Warning | |
Etcd: Too many HTTP requests failures | Too many requests failed on | min(/Etcd by HTTP/etcd.http.requests.5xx.rate,5m)>{$ETCD.HTTP.FAIL.MAX.WARN} | Warning | |
Etcd: Server version has changed | Etcd version has changed. Acknowledge to close the problem manually. | last(/Etcd by HTTP/etcd.server.version,#1)<>last(/Etcd by HTTP/etcd.server.version,#2) and length(last(/Etcd by HTTP/etcd.server.version))>0 | Info | Manual close: Yes |
Etcd: Cluster version has changed | Etcd version has changed. Acknowledge to close the problem manually. | last(/Etcd by HTTP/etcd.cluster.version,#1)<>last(/Etcd by HTTP/etcd.cluster.version,#2) and length(last(/Etcd by HTTP/etcd.cluster.version))>0 | Info | Manual close: Yes |
Etcd: Host has been restarted | Uptime is less than 10 minutes. | last(/Etcd by HTTP/etcd.uptime)<10m | Info | Manual close: Yes |
Etcd: Current number of open files is too high | Heavy file descriptor usage (i.e., near the process's file descriptor limit) indicates a potential file descriptor exhaustion issue. | min(/Etcd by HTTP/etcd.open.fds,5m)/last(/Etcd by HTTP/etcd.max.fds)*100>{$ETCD.OPEN.FDS.MAX.WARN} | Warning | |
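Two of the trigger expressions in the table combine several history functions, so a small worked sketch may help when tuning the thresholds. The Python below only mimics the arithmetic of the leader-change and open-file-descriptor expressions. The function names are hypothetical, each list stands for the values Zabbix would hold for the 15m or 5m window, and the defaults 5 and 90 are the macro defaults listed in this document.

```python
from typing import Sequence


def too_many_leader_changes(window: Sequence[float], max_warn: float = 5) -> bool:
    """(max(...,15m) - min(...,15m)) > {$ETCD.LEADER.CHANGES.MAX.WARN}.

    The item is a counter since start, so max - min is the number of
    leader changes observed inside the window.
    """
    return (max(window) - min(window)) > max_warn


def open_fds_too_high(open_fds_window: Sequence[float], max_fds: float,
                      max_warn: float = 90) -> bool:
    """min(open.fds,5m) / last(max.fds) * 100 > {$ETCD.OPEN.FDS.MAX.WARN}.

    Using min() means usage must stay above the limit for the whole window.
    """
    return min(open_fds_window) / max_fds * 100 > max_warn
```

For example, leader-change samples of 3, 3, 4, 9 give a difference of 6 and would exceed the default of 5, while open-descriptor samples of 950, 900, 990 against a limit of 1024 would not fire, because the lowest sample is about 87.9 percent.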
LLD rule gRPC codes discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
gRPC codes discovery | | Dependent item | etcd.grpc_code.discovery |
Item prototypes for gRPC codes discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
RPCs completed with code {#GRPC.CODE} | The number of RPCs completed on the server with grpc_code {#GRPC.CODE}. | Dependent item | etcd.grpc.handled.rate[{#GRPC.CODE}] |
Trigger prototypes for gRPC codes discovery
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Etcd: Too many failed gRPC requests with code: {#GRPC.CODE} | | min(/Etcd by HTTP/etcd.grpc.handled.rate[{#GRPC.CODE}],5m)>{$ETCD.GRPC.ERRORS.MAX.WARN} | Warning | |
LLD rule Peers discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Peers discovery | | Dependent item | etcd.peer.discovery |
Item prototypes for Peers discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Etcd peer {#ETCD.PEER}: Bytes sent | The number of bytes sent to a peer with the ID {#ETCD.PEER}. | Dependent item | etcd.bytes.sent.rate[{#ETCD.PEER}] |
Etcd peer {#ETCD.PEER}: Bytes received | The number of bytes received from a peer with the ID {#ETCD.PEER}. | Dependent item | etcd.bytes.received.rate[{#ETCD.PEER}] |
Etcd peer {#ETCD.PEER}: Send failures | The number of send failures from a peer with the ID {#ETCD.PEER}. | Dependent item | etcd.sent.fail.rate[{#ETCD.PEER}] |
Etcd peer {#ETCD.PEER}: Receive failures | The number of receive failures from a peer with the ID {#ETCD.PEER}. | Dependent item | etcd.received.fail.rate[{#ETCD.PEER}] |
Feedback
Please report any issues with the template at https://support.zabbix.com.
You can also provide feedback, discuss the template, or ask for help at the Zabbix forums.
Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/etcd_http?at=release/7.2
Etcd by HTTP
Overview
This template is designed to monitor etcd with Zabbix and works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template Etcd by HTTP collects metrics with the HTTP agent from the /metrics endpoint.
Refer to the vendor documentation.
For users of etcd version <= 3.4!
In etcd v3.5, some metrics have been deprecated. See more details in Upgrade etcd from 3.4 to 3.5. Please upgrade your etcd instance, or use an older Etcd by HTTP template version.
Requirements
Zabbix version: 7.2 and higher.
Tested versions
This template has been tested on:
- Etcd 3.5.6
Configuration
Zabbix should be configured according to the instructions in the Templates out of the box section.
Setup
- Make sure that etcd allows the collection of metrics. You can test it by running: curl -L http://localhost:2379/metrics
- Check that etcd is accessible from the Zabbix proxy or Zabbix server, depending on where you plan to do the monitoring. To verify it, run: curl -L http://<etcd_node_address>:2379/metrics
- Add the template to the etcd node. Set the hostname or IP address of the etcd host in the {$ETCD.HOST} macro. By default, the template uses the client port. You can configure the metrics endpoint location by adding the --listen-metrics-urls flag.
For more details, see the etcd documentation.
Additional points to consider:
- If you have specified a non-standard port for etcd, don't forget to change the macros {$ETCD.SCHEME} and {$ETCD.PORT}.
- You can set the {$ETCD.USER} and {$ETCD.PASSWORD} macros in the template to use on a host level if necessary.
- To test availability, run: zabbix_get -s etcd-host -k etcd.health
- See the Macros used section, as it sets the trigger thresholds.
Macros used
Name | Description | Default |
---|---|---|
{$ETCD.HOST} | The hostname or IP address of the etcd host. | <SET ETCD HOST> |
{$ETCD.PORT} | The port of the etcd host. | 2379 |
{$ETCD.SCHEME} | The request scheme, which may be http or https. | http |
{$ETCD.USER} | | |
{$ETCD.PASSWORD} | | |
{$ETCD.LEADER.CHANGES.MAX.WARN} | The maximum number of leader changes. | 5 |
{$ETCD.PROPOSAL.FAIL.MAX.WARN} | The maximum number of proposal failures. | 2 |
{$ETCD.HTTP.FAIL.MAX.WARN} | The maximum number of HTTP request failures. | 2 |
{$ETCD.PROPOSAL.PENDING.MAX.WARN} | The maximum number of proposals in queue. | 5 |
{$ETCD.OPEN.FDS.MAX.WARN} | The maximum percentage of used file descriptors. | 90 |
{$ETCD.GRPC_CODE.MATCHES} | The filter of discoverable gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md. | .* |
{$ETCD.GRPC_CODE.NOT_MATCHES} | The filter to exclude discovered gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md. | CHANGE_IF_NEEDED |
{$ETCD.GRPC.ERRORS.MAX.WARN} | The maximum number of gRPC request failures. | 1 |
{$ETCD.GRPC_CODE.TRIGGER.MATCHES} | The filter of discoverable gRPC codes, which will create triggers. | Aborted|Unavailable |
Items
Name | Description | Type | Key and additional info |
---|---|---|---|
Service's TCP port state | | Simple check | net.tcp.service["{$ETCD.SCHEME}","{$ETCD.HOST}","{$ETCD.PORT}"] |
Get node metrics | | HTTP agent | etcd.get_metrics |
Node health | | HTTP agent | etcd.health |
Server is a leader | Whether or not this member is a leader: 1 if it is, 0 otherwise. | Dependent item | etcd.is.leader |
Server has a leader | Whether or not a leader exists: 1 if it exists, 0 if it does not. | Dependent item | etcd.has.leader |
Leader changes | The number of leader changes the member has seen since its start. | Dependent item | etcd.leader.changes |
Proposals committed per second | The number of consensus proposals committed. | Dependent item | etcd.proposals.committed.rate |
Proposals applied per second | The number of consensus proposals applied. | Dependent item | etcd.proposals.applied.rate |
Proposals failed per second | The number of failed proposals seen. | Dependent item | etcd.proposals.failed.rate |
Proposals pending | The current number of pending proposals to commit. | Dependent item | etcd.proposals.pending |
Reads per second | The number of read actions by | Dependent item | etcd.reads.rate |
Writes per second | The number of writes (e.g., | Dependent item | etcd.writes.rate |
Client gRPC received bytes per second | The number of bytes received from gRPC clients per second. | Dependent item | etcd.network.grpc.received.rate |
Client gRPC sent bytes per second | The number of bytes sent to gRPC clients per second. | Dependent item | etcd.network.grpc.sent.rate |
HTTP requests received | The number of requests received into the system (successfully parsed and | Dependent item | etcd.http.requests.rate |
HTTP 5XX | The number of handled failures of requests (non-watches), by the method ( | Dependent item | etcd.http.requests.5xx.rate |
HTTP 4XX | The number of handled failures of requests (non-watches), by the method ( | Dependent item | etcd.http.requests.4xx.rate |
RPCs received per second | The number of gRPC stream messages received on the server. | Dependent item | etcd.grpc.received.rate |
RPCs sent per second | The number of gRPC stream messages sent by the server. | Dependent item | etcd.grpc.sent.rate |
RPCs started per second | The number of RPCs started on the server. | Dependent item | etcd.grpc.started.rate |
Get version | | HTTP agent | etcd.get_version |
Server version | The version of the etcd server. | Dependent item | etcd.server.version |
Cluster version | The version of the etcd cluster. | Dependent item | etcd.cluster.version |
DB size | The total size of the underlying database. | Dependent item | etcd.db.size |
Keys compacted per second | The number of DB keys compacted per second. | Dependent item | etcd.keys.compacted.rate |
Keys expired per second | The number of expired keys per second. | Dependent item | etcd.keys.expired.rate |
Keys total | The total number of keys. | Dependent item | etcd.keys.total |
Uptime | | Dependent item | etcd.uptime |
Virtual memory | The size of virtual memory expressed in bytes. | Dependent item | etcd.virtual.bytes |
Resident memory | The size of resident memory expressed in bytes. | Dependent item | etcd.res.bytes |
CPU | The total user and system CPU time spent in seconds. | Dependent item | etcd.cpu.util |
Open file descriptors | The number of open file descriptors. | Dependent item | etcd.open.fds |
Maximum open file descriptors | The maximum number of open file descriptors. | Dependent item | etcd.max.fds |
Deletes per second | The number of deletes seen by this member per second. | Dependent item | etcd.delete.rate |
PUT per second | The number of puts seen by this member per second. | Dependent item | etcd.put.rate |
Range per second | The number of ranges seen by this member per second. | Dependent item | etcd.range.rate |
Transaction per second | The number of transactions seen by this member per second. | Dependent item | etcd.txn.rate |
Pending events | The total number of pending events to be sent. | Dependent item | etcd.events.sent.rate |
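Most rows in the Items table are dependent items: one HTTP agent item (etcd.get_metrics) downloads the /metrics text once, and each dependent item extracts a single value from it. The Python below is a simplified sketch of that extraction step, not the template's own preprocessing, which this listing does not reproduce. The function name read_metric is hypothetical, the SAMPLE payload is invented for illustration, and the parser handles only plain lines without labels or timestamps. The metric names themselves are ones etcd exposes.

```python
def read_metric(payload: str, name: str) -> float:
    """Return the value of an unlabelled metric from Prometheus text output."""
    for line in payload.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        metric, _, value = line.rpartition(" ")
        if metric == name:
            return float(value)
    raise KeyError(name)


# Invented sample in the Prometheus text format (values are illustrative).
SAMPLE = """\
# TYPE etcd_server_has_leader gauge
etcd_server_has_leader 1
# TYPE etcd_server_is_leader gauge
etcd_server_is_leader 0
# TYPE etcd_server_leader_changes_seen_total counter
etcd_server_leader_changes_seen_total 3
"""
```

Here read_metric(SAMPLE, "etcd_server_has_leader") corresponds to what the item etcd.has.leader would store. Because the payload is fetched once and reused, adding more dependent items does not add more requests to etcd.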
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Etcd: Service is unavailable | | last(/Etcd by HTTP/net.tcp.service["{$ETCD.SCHEME}","{$ETCD.HOST}","{$ETCD.PORT}"])=0 | Average | Manual close: Yes |
Etcd: Node healthcheck failed | See more details on https://etcd.io/docs/v3.5/op-guide/monitoring/#health-check. | last(/Etcd by HTTP/etcd.health)=0 | Average | Depends on: |
Etcd: Failed to fetch info data | Zabbix has not received any data for items for the last 30 minutes. | nodata(/Etcd by HTTP/etcd.is.leader,30m)=1 | Warning | Manual close: Yes. Depends on: |
Etcd: Member has no leader | If a member does not have a leader, it is totally unavailable. | last(/Etcd by HTTP/etcd.has.leader)=0 | Average | |
Etcd: Instance has seen too many leader changes | Rapid leadership changes impact the performance of etcd. | (max(/Etcd by HTTP/etcd.leader.changes,15m)-min(/Etcd by HTTP/etcd.leader.changes,15m))>{$ETCD.LEADER.CHANGES.MAX.WARN} | Warning | |
Etcd: Too many proposal failures | Normally related to two issues: temporary failures related to a leader election or longer downtime caused by a loss of quorum in the cluster. | min(/Etcd by HTTP/etcd.proposals.failed.rate,5m)>{$ETCD.PROPOSAL.FAIL.MAX.WARN} | Warning | |
Etcd: Too many proposals are queued to commit | A rising number of pending proposals suggests a high client load or a member that cannot commit proposals. | min(/Etcd by HTTP/etcd.proposals.pending,5m)>{$ETCD.PROPOSAL.PENDING.MAX.WARN} | Warning | |
Etcd: Too many HTTP requests failures | Too many requests failed on | min(/Etcd by HTTP/etcd.http.requests.5xx.rate,5m)>{$ETCD.HTTP.FAIL.MAX.WARN} | Warning | |
Etcd: Server version has changed | Etcd version has changed. Acknowledge to close the problem manually. | last(/Etcd by HTTP/etcd.server.version,#1)<>last(/Etcd by HTTP/etcd.server.version,#2) and length(last(/Etcd by HTTP/etcd.server.version))>0 | Info | Manual close: Yes |
Etcd: Cluster version has changed | Etcd version has changed. Acknowledge to close the problem manually. | last(/Etcd by HTTP/etcd.cluster.version,#1)<>last(/Etcd by HTTP/etcd.cluster.version,#2) and length(last(/Etcd by HTTP/etcd.cluster.version))>0 | Info | Manual close: Yes |
Etcd: Host has been restarted | Uptime is less than 10 minutes. | last(/Etcd by HTTP/etcd.uptime)<10m | Info | Manual close: Yes |
Etcd: Current number of open files is too high | Heavy file descriptor usage (i.e., near the process's file descriptor limit) indicates a potential file descriptor exhaustion issue. | min(/Etcd by HTTP/etcd.open.fds,5m)/last(/Etcd by HTTP/etcd.max.fds)*100>{$ETCD.OPEN.FDS.MAX.WARN} | Warning | |
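The two "version has changed" triggers compare the two most recent values of an item and also require the newest value to be non-empty. A minimal Python sketch of that logic follows. The function name is hypothetical and the list stands for the item history, ordered from oldest to newest. Returning False when fewer than two values exist is a simplification of mine, since Zabbix would be unable to evaluate last(...,#2) in that case.

```python
from typing import Sequence


def version_changed(history: Sequence[str]) -> bool:
    """last(#1) <> last(#2) and length(last()) > 0, on an oldest-to-newest list."""
    if len(history) < 2:
        # Simplification: with a single value there is nothing to compare against.
        return False
    latest, previous = history[-1], history[-2]
    return latest != previous and len(latest) > 0
```

The length check keeps an empty reading, for example from a failed version request, from being reported as a version change at the moment it arrives.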
LLD rule gRPC codes discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
gRPC codes discovery | | Dependent item | etcd.grpc_code.discovery |
Item prototypes for gRPC codes discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
RPCs completed with code {#GRPC.CODE} | The number of RPCs completed on the server with grpc_code {#GRPC.CODE}. | Dependent item | etcd.grpc.handled.rate[{#GRPC.CODE}] |
Trigger prototypes for gRPC codes discovery
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Etcd: Too many failed gRPC requests with code: {#GRPC.CODE} | | min(/Etcd by HTTP/etcd.grpc.handled.rate[{#GRPC.CODE}],5m)>{$ETCD.GRPC.ERRORS.MAX.WARN} | Warning | |
LLD rule Peers discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Peers discovery | | Dependent item | etcd.peer.discovery |
Item prototypes for Peers discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Etcd peer {#ETCD.PEER}: Bytes sent | The number of bytes sent to a peer with the ID {#ETCD.PEER}. | Dependent item | etcd.bytes.sent.rate[{#ETCD.PEER}] |
Etcd peer {#ETCD.PEER}: Bytes received | The number of bytes received from a peer with the ID {#ETCD.PEER}. | Dependent item | etcd.bytes.received.rate[{#ETCD.PEER}] |
Etcd peer {#ETCD.PEER}: Send failures | The number of send failures from a peer with the ID {#ETCD.PEER}. | Dependent item | etcd.sent.fail.rate[{#ETCD.PEER}] |
Etcd peer {#ETCD.PEER}: Receive failures | The number of receive failures from a peer with the ID {#ETCD.PEER}. | Dependent item | etcd.received.fail.rate[{#ETCD.PEER}] |
Feedback
Please report any issues with the template at https://support.zabbix.com.
You can also provide feedback, discuss the template, or ask for help at the Zabbix forums.
Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/etcd_http?at=release/7.0
Etcd by HTTP
Overview
This template is designed to monitor etcd with Zabbix and works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template Etcd by HTTP collects metrics with the HTTP agent from the /metrics endpoint.
Refer to the vendor documentation.
For users of etcd version <= 3.4!
In etcd v3.5, some metrics have been deprecated. See more details in Upgrade etcd from 3.4 to 3.5. Please upgrade your etcd instance, or use an older Etcd by HTTP template version.
Requirements
Zabbix version: 7.0 and higher.
Tested versions
This template has been tested on:
- Etcd 3.5.6
Configuration
Zabbix should be configured according to the instructions in the Templates out of the box section.
Setup
- Make sure that etcd allows the collection of metrics. You can test it by running: curl -L http://localhost:2379/metrics
- Check that etcd is accessible from the Zabbix proxy or Zabbix server, depending on where you plan to do the monitoring. To verify it, run: curl -L http://<etcd_node_address>:2379/metrics
- Add the template to the etcd node. Set the hostname or IP address of the etcd host in the {$ETCD.HOST} macro. By default, the template uses the client port. You can configure the metrics endpoint location by adding the --listen-metrics-urls flag.
For more details, see the etcd documentation.
Additional points to consider:
- If you have specified a non-standard port for etcd, don't forget to change the macros {$ETCD.SCHEME} and {$ETCD.PORT}.
- You can set the {$ETCD.USER} and {$ETCD.PASSWORD} macros in the template to use on a host level if necessary.
- To test availability, run: zabbix_get -s etcd-host -k etcd.health
- See the Macros used section, as it sets the trigger thresholds.
Macros used
Name | Description | Default |
---|---|---|
{$ETCD.HOST} | The hostname or IP address of the etcd host. | <SET ETCD HOST> |
{$ETCD.PORT} | The port of the etcd host. | 2379 |
{$ETCD.SCHEME} | The request scheme, which may be http or https. | http |
{$ETCD.USER} | | |
{$ETCD.PASSWORD} | | |
{$ETCD.LEADER.CHANGES.MAX.WARN} | The maximum number of leader changes. | 5 |
{$ETCD.PROPOSAL.FAIL.MAX.WARN} | The maximum number of proposal failures. | 2 |
{$ETCD.HTTP.FAIL.MAX.WARN} | The maximum number of HTTP request failures. | 2 |
{$ETCD.PROPOSAL.PENDING.MAX.WARN} | The maximum number of proposals in queue. | 5 |
{$ETCD.OPEN.FDS.MAX.WARN} | The maximum percentage of used file descriptors. | 90 |
{$ETCD.GRPC_CODE.MATCHES} | The filter of discoverable gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md. | .* |
{$ETCD.GRPC_CODE.NOT_MATCHES} | The filter to exclude discovered gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md. | CHANGE_IF_NEEDED |
{$ETCD.GRPC.ERRORS.MAX.WARN} | The maximum number of gRPC request failures. | 1 |
{$ETCD.GRPC_CODE.TRIGGER.MATCHES} | The filter of discoverable gRPC codes, which will create triggers. | Aborted|Unavailable |
Items
Name | Description | Type | Key and additional info |
---|---|---|---|
Service's TCP port state | Simple check | net.tcp.service["{$ETCD.SCHEME}","{$ETCD.HOST}","{$ETCD.PORT}"] Preprocessing
|
|
Get node metrics | HTTP agent | etcd.get_metrics | |
Node health | HTTP agent | etcd.health Preprocessing
|
|
Server is a leader | It defines - whether or not this member is a leader: 1 - it is; 0 - otherwise. |
Dependent item | etcd.is.leader Preprocessing
|
Server has a leader | It defines - whether or not a leader exists: 1 - it exists; 0 - it does not. |
Dependent item | etcd.has.leader Preprocessing
|
Leader changes | The number of leader changes the member has seen since its start. |
Dependent item | etcd.leader.changes Preprocessing
|
Proposals committed per second | The number of consensus proposals committed. |
Dependent item | etcd.proposals.committed.rate Preprocessing
|
Proposals applied per second | The number of consensus proposals applied. |
Dependent item | etcd.proposals.applied.rate Preprocessing
|
Proposals failed per second | The number of failed proposals seen. |
Dependent item | etcd.proposals.failed.rate Preprocessing
|
Proposals pending | The current number of pending proposals to commit. |
Dependent item | etcd.proposals.pending Preprocessing
|
Reads per second | The number of read actions by |
Dependent item | etcd.reads.rate Preprocessing
|
Writes per second | The number of writes (e.g., |
Dependent item | etcd.writes.rate Preprocessing
|
Client gRPC received bytes per second | The number of bytes received from gRPC clients per second. |
Dependent item | etcd.network.grpc.received.rate Preprocessing
|
Client gRPC sent bytes per second | The number of bytes sent from gRPC clients per second. |
Dependent item | etcd.network.grpc.sent.rate Preprocessing
|
HTTP requests received | The number of requests received into the system (successfully parsed and |
Dependent item | etcd.http.requests.rate Preprocessing
|
HTTP 5XX | The number of handled failures of requests (non-watches), by the method ( |
Dependent item | etcd.http.requests.5xx.rate Preprocessing
|
HTTP 4XX | The number of handled failures of requests (non-watches), by the method ( |
Dependent item | etcd.http.requests.4xx.rate Preprocessing
|
RPCs received per second | The number of RPC stream messages received on the server. |
Dependent item | etcd.grpc.received.rate Preprocessing
|
RPCs sent per second | The number of gRPC stream messages sent by the server. | Dependent item | etcd.grpc.sent.rate |
RPCs started per second | The number of RPCs started on the server. | Dependent item | etcd.grpc.started.rate |
Get version | | HTTP agent | etcd.get_version |
Server version | The version of the | Dependent item | etcd.server.version |
Cluster version | The version of the | Dependent item | etcd.cluster.version |
DB size | The total size of the underlying database. | Dependent item | etcd.db.size |
Keys compacted per second | The number of DB keys compacted per second. | Dependent item | etcd.keys.compacted.rate |
Keys expired per second | The number of expired keys per second. | Dependent item | etcd.keys.expired.rate |
Keys total | The total number of keys. | Dependent item | etcd.keys.total |
Uptime | | Dependent item | etcd.uptime |
Virtual memory | The size of virtual memory expressed in bytes. | Dependent item | etcd.virtual.bytes |
Resident memory | The size of resident memory expressed in bytes. | Dependent item | etcd.res.bytes |
CPU | The total user and system CPU time spent in seconds. | Dependent item | etcd.cpu.util |
Open file descriptors | The number of open file descriptors. | Dependent item | etcd.open.fds |
Maximum open file descriptors | The maximum number of open file descriptors. | Dependent item | etcd.max.fds |
Deletes per second | The number of deletes seen by this member per second. | Dependent item | etcd.delete.rate |
PUT per second | The number of puts seen by this member per second. | Dependent item | etcd.put.rate |
Range per second | The number of ranges seen by this member per second. | Dependent item | etcd.range.rate |
Transaction per second | The number of transactions seen by this member per second. | Dependent item | etcd.txn.rate |
Pending events | The total number of pending events to be sent. | Dependent item | etcd.events.sent.rate |
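Several of the items above are named "per second". A minimal sketch of how such a rate can be derived from two successive samples, assuming the underlying etcd metric is a cumulative value. The function name `change_per_second` and the choice to return `None` when the value goes backwards are illustrative assumptions, not the template's actual preprocessing code.

```python
from typing import Optional


def change_per_second(prev_value: float, prev_ts: float,
                      value: float, ts: float) -> Optional[float]:
    """Return the per-second rate between two samples of a cumulative metric.

    Returns None when no rate can be computed: the timestamps do not
    advance, or the value went backwards (for example after a restart).
    """
    elapsed = ts - prev_ts
    if elapsed <= 0 or value < prev_value:
        return None
    return (value - prev_value) / elapsed


# Two hypothetical samples taken 60 seconds apart.
print(change_per_second(1200.0, 1000.0, 1500.0, 1060.0))  # 5.0
```

Dropping the sample on a backwards step avoids reporting a large negative rate right after a restart.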
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Etcd: Service is unavailable | | last(/Etcd by HTTP/net.tcp.service["{$ETCD.SCHEME}","{$ETCD.HOST}","{$ETCD.PORT}"])=0 | Average | Manual close: Yes |
Etcd: Node healthcheck failed | See more details on https://etcd.io/docs/v3.5/op-guide/monitoring/#health-check. | last(/Etcd by HTTP/etcd.health)=0 | Average | Depends on: Etcd: Service is unavailable |
Etcd: Failed to fetch info data | Zabbix has not received any data for items for the last 30 minutes. | nodata(/Etcd by HTTP/etcd.is.leader,30m)=1 | Warning | Manual close: Yes. Depends on: Etcd: Service is unavailable |
Etcd: Member has no leader | If a member does not have a leader, it is totally unavailable. | last(/Etcd by HTTP/etcd.has.leader)=0 | Average | |
Etcd: Instance has seen too many leader changes | Rapid leadership changes impact the performance of | (max(/Etcd by HTTP/etcd.leader.changes,15m)-min(/Etcd by HTTP/etcd.leader.changes,15m))>{$ETCD.LEADER.CHANGES.MAX.WARN} | Warning | |
Etcd: Too many proposal failures | Normally related to two issues: temporary failures related to a leader election or longer downtime caused by a loss of quorum in the cluster. | min(/Etcd by HTTP/etcd.proposals.failed.rate,5m)>{$ETCD.PROPOSAL.FAIL.MAX.WARN} | Warning | |
Etcd: Too many proposals are queued to commit | Rising pending proposals suggest there is a high client load, or the member cannot commit proposals. | min(/Etcd by HTTP/etcd.proposals.pending,5m)>{$ETCD.PROPOSAL.PENDING.MAX.WARN} | Warning | |
Etcd: Too many HTTP requests failures | Too many requests failed on | min(/Etcd by HTTP/etcd.http.requests.5xx.rate,5m)>{$ETCD.HTTP.FAIL.MAX.WARN} | Warning | |
Etcd: Server version has changed | Etcd version has changed. Acknowledge to close the problem manually. | last(/Etcd by HTTP/etcd.server.version,#1)<>last(/Etcd by HTTP/etcd.server.version,#2) and length(last(/Etcd by HTTP/etcd.server.version))>0 | Info | Manual close: Yes |
Etcd: Cluster version has changed | Etcd version has changed. Acknowledge to close the problem manually. | last(/Etcd by HTTP/etcd.cluster.version,#1)<>last(/Etcd by HTTP/etcd.cluster.version,#2) and length(last(/Etcd by HTTP/etcd.cluster.version))>0 | Info | Manual close: Yes |
Etcd: Host has been restarted | Uptime is less than 10 minutes. | last(/Etcd by HTTP/etcd.uptime)<10m | Info | Manual close: Yes |
Etcd: Current number of open files is too high | Heavy usage of file descriptors (i.e., near the process's file descriptor limit) indicates a potential file descriptor exhaustion issue. | min(/Etcd by HTTP/etcd.open.fds,5m)/last(/Etcd by HTTP/etcd.max.fds)*100>{$ETCD.OPEN.FDS.MAX.WARN} | Warning | |
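The open files trigger above compares a percentage against `{$ETCD.OPEN.FDS.MAX.WARN}`. A minimal sketch of that arithmetic, assuming the five-minute history is available as a list of samples. The function name `open_fds_too_high` is hypothetical, and the default of 90 is the macro default listed in this README.

```python
from typing import Sequence


def open_fds_too_high(open_fds_5m: Sequence[float], max_fds: float,
                      warn_percent: float = 90.0) -> bool:
    """Mirror min(etcd.open.fds,5m)/last(etcd.max.fds)*100 > threshold."""
    if not open_fds_5m or max_fds <= 0:
        return False  # not enough data to evaluate (assumption of this sketch)
    return min(open_fds_5m) / max_fds * 100 > warn_percent


# Hypothetical samples: even the lowest reading is about 92.8% of the limit.
print(open_fds_too_high([950, 960, 990], 1024))  # True
```

Using the minimum over the window means a single spike does not fire the trigger; usage has to stay above the threshold for the whole five minutes.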
LLD rule gRPC codes discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
gRPC codes discovery | | Dependent item | etcd.grpc_code.discovery |
Item prototypes for gRPC codes discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
RPCs completed with code {#GRPC.CODE} | The number of RPCs completed on the server with grpc_code {#GRPC.CODE}. | Dependent item | etcd.grpc.handled.rate[{#GRPC.CODE}] |
Trigger prototypes for gRPC codes discovery
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Etcd: Too many failed gRPC requests with code: {#GRPC.CODE} | | min(/Etcd by HTTP/etcd.grpc.handled.rate[{#GRPC.CODE}],5m)>{$ETCD.GRPC.ERRORS.MAX.WARN} | Warning | |
LLD rule Peers discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Peers discovery | | Dependent item | etcd.peer.discovery |
Item prototypes for Peers discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Etcd peer {#ETCD.PEER}: Bytes sent | The number of bytes sent to a peer with the ID | Dependent item | etcd.bytes.sent.rate[{#ETCD.PEER}] |
Etcd peer {#ETCD.PEER}: Bytes received | The number of bytes received from a peer with the ID | Dependent item | etcd.bytes.received.rate[{#ETCD.PEER}] |
Etcd peer {#ETCD.PEER}: Send failures | The number of sent failures from a peer with the ID | Dependent item | etcd.sent.fail.rate[{#ETCD.PEER}] |
Etcd peer {#ETCD.PEER}: Receive failures | The number of received failures from a peer with the ID | Dependent item | etcd.received.fail.rate[{#ETCD.PEER}] |
Feedback
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/etcd_http?at=release/6.4
Etcd by HTTP
Overview
This template is designed to monitor etcd with Zabbix and works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template Etcd by HTTP collects metrics from the /metrics endpoint with the help of the HTTP agent.
Refer to the vendor documentation.
For users of etcd version <= 3.4:
In etcd v3.5, some metrics have been deprecated. See more details on Upgrade etcd from 3.4 to 3.5. Please upgrade your etcd instance, or use an older Etcd by HTTP template version.
Requirements
Zabbix version: 6.4 and higher.
Tested versions
This template has been tested on:
- Etcd 3.5.6
Configuration
Zabbix should be configured according to the instructions in the Templates out of the box section.
Setup
- Make sure that etcd allows the collection of metrics. You can test it by running: `curl -L http://localhost:2379/metrics`
- Check if etcd is accessible from the Zabbix proxy or Zabbix server, depending on where you are planning to do the monitoring. To verify it, run: `curl -L http://<etcd_node_address>:2379/metrics`
- Add the template to the etcd node. Set the hostname or IP address of the etcd host in the `{$ETCD.HOST}` macro. By default, the template uses the client port. You can configure the metrics endpoint location by adding the `--listen-metrics-urls` flag.
For more details, see the etcd documentation.
Additional points to consider:
- If you have specified a non-standard port for etcd, don't forget to change the macros `{$ETCD.SCHEME}` and `{$ETCD.PORT}`.
- You can set the `{$ETCD.USER}` and `{$ETCD.PASSWORD}` macros in the template for use at the host level if necessary.
- To test availability, run: `zabbix_get -s etcd-host -k etcd.health`
- See the macros section, as it sets the trigger values.
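The curl checks in the setup steps can also be scripted. A minimal sketch using only the Python standard library. The helper names `metrics_url` and `metrics_reachable` are hypothetical, and the example values mirror the default scheme and port from the macros in this README with `localhost` standing in for the etcd host.

```python
from urllib.parse import urlunsplit
from urllib.request import urlopen


def metrics_url(scheme: str, host: str, port: int, path: str = "/metrics") -> str:
    """Build the endpoint URL from the scheme, host, and port values."""
    return urlunsplit((scheme, f"{host}:{port}", path, "", ""))


def metrics_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with HTTP 200, False on any network error."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, and socket timeouts all derive from OSError
        return False


url = metrics_url("http", "localhost", 2379)
print(url)  # http://localhost:2379/metrics
# To run the actual check from the Zabbix server or proxy host:
# print(metrics_reachable(url))
```

Run it from the machine that will do the polling, since reachability from a workstation says nothing about the Zabbix server or proxy.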
Macros used
Name | Description | Default |
---|---|---|
{$ETCD.HOST} | The hostname or IP address of the | <SET ETCD HOST> |
{$ETCD.PORT} | The port of the | 2379 |
{$ETCD.SCHEME} | The request scheme which may be | http |
{$ETCD.USER} | | |
{$ETCD.PASSWORD} | | |
{$ETCD.LEADER.CHANGES.MAX.WARN} | The maximum number of leader changes. | 5 |
{$ETCD.PROPOSAL.FAIL.MAX.WARN} | The maximum number of proposal failures. | 2 |
{$ETCD.HTTP.FAIL.MAX.WARN} | The maximum number of HTTP request failures. | 2 |
{$ETCD.PROPOSAL.PENDING.MAX.WARN} | The maximum number of proposals in queue. | 5 |
{$ETCD.OPEN.FDS.MAX.WARN} | The maximum percentage of used file descriptors. | 90 |
{$ETCD.GRPC_CODE.MATCHES} | The filter of discoverable gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md. | .* |
{$ETCD.GRPC_CODE.NOT_MATCHES} | The filter to exclude discovered gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md. | CHANGE_IF_NEEDED |
{$ETCD.GRPC.ERRORS.MAX.WARN} | The maximum number of gRPC request failures. | 1 |
{$ETCD.GRPC_CODE.TRIGGER.MATCHES} | The filter of discoverable gRPC codes, which will create triggers. | Aborted|Unavailable |
Items
Name | Description | Type | Key and additional info |
---|---|---|---|
Etcd: Service's TCP port state | | Simple check | net.tcp.service["{$ETCD.SCHEME}","{$ETCD.HOST}","{$ETCD.PORT}"] |
Etcd: Get node metrics | | HTTP agent | etcd.get_metrics |
Etcd: Node health | | HTTP agent | etcd.health |
Etcd: Server is a leader | Defines whether this member is a leader: 1 - it is; 0 - otherwise. | Dependent item | etcd.is.leader |
Etcd: Server has a leader | Defines whether a leader exists: 1 - it exists; 0 - it does not. | Dependent item | etcd.has.leader |
Etcd: Leader changes | The number of leader changes the member has seen since its start. | Dependent item | etcd.leader.changes |
Etcd: Proposals committed per second | The number of consensus proposals committed. | Dependent item | etcd.proposals.committed.rate |
Etcd: Proposals applied per second | The number of consensus proposals applied. | Dependent item | etcd.proposals.applied.rate |
Etcd: Proposals failed per second | The number of failed proposals seen. | Dependent item | etcd.proposals.failed.rate |
Etcd: Proposals pending | The current number of pending proposals to commit. | Dependent item | etcd.proposals.pending |
Etcd: Reads per second | The number of read actions by | Dependent item | etcd.reads.rate |
Etcd: Writes per second | The number of writes (e.g., | Dependent item | etcd.writes.rate |
Etcd: Client gRPC received bytes per second | The number of bytes received from gRPC clients per second. | Dependent item | etcd.network.grpc.received.rate |
Etcd: Client gRPC sent bytes per second | The number of bytes sent to gRPC clients per second. | Dependent item | etcd.network.grpc.sent.rate |
Etcd: HTTP requests received | The number of requests received into the system (successfully parsed and | Dependent item | etcd.http.requests.rate |
Etcd: HTTP 5XX | The number of handled failures of requests (non-watches), by the method ( | Dependent item | etcd.http.requests.5xx.rate |
Etcd: HTTP 4XX | The number of handled failures of requests (non-watches), by the method ( | Dependent item | etcd.http.requests.4xx.rate |
Etcd: RPCs received per second | The number of RPC stream messages received on the server. | Dependent item | etcd.grpc.received.rate |
Etcd: RPCs sent per second | The number of gRPC stream messages sent by the server. | Dependent item | etcd.grpc.sent.rate |
Etcd: RPCs started per second | The number of RPCs started on the server. | Dependent item | etcd.grpc.started.rate |
Etcd: Get version | | HTTP agent | etcd.get_version |
Etcd: Server version | The version of the | Dependent item | etcd.server.version |
Etcd: Cluster version | The version of the | Dependent item | etcd.cluster.version |
Etcd: DB size | The total size of the underlying database. | Dependent item | etcd.db.size |
Etcd: Keys compacted per second | The number of DB keys compacted per second. | Dependent item | etcd.keys.compacted.rate |
Etcd: Keys expired per second | The number of expired keys per second. | Dependent item | etcd.keys.expired.rate |
Etcd: Keys total | The total number of keys. | Dependent item | etcd.keys.total |
Etcd: Uptime | | Dependent item | etcd.uptime |
Etcd: Virtual memory | The size of virtual memory expressed in bytes. | Dependent item | etcd.virtual.bytes |
Etcd: Resident memory | The size of resident memory expressed in bytes. | Dependent item | etcd.res.bytes |
Etcd: CPU | The total user and system CPU time spent in seconds. | Dependent item | etcd.cpu.util |
Etcd: Open file descriptors | The number of open file descriptors. | Dependent item | etcd.open.fds |
Etcd: Maximum open file descriptors | The maximum number of open file descriptors. | Dependent item | etcd.max.fds |
Etcd: Deletes per second | The number of deletes seen by this member per second. | Dependent item | etcd.delete.rate |
Etcd: PUT per second | The number of puts seen by this member per second. | Dependent item | etcd.put.rate |
Etcd: Range per second | The number of ranges seen by this member per second. | Dependent item | etcd.range.rate |
Etcd: Transaction per second | The number of transactions seen by this member per second. | Dependent item | etcd.txn.rate |
Etcd: Pending events | The total number of pending events to be sent. | Dependent item | etcd.events.sent.rate |
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Etcd: Service is unavailable | | last(/Etcd by HTTP/net.tcp.service["{$ETCD.SCHEME}","{$ETCD.HOST}","{$ETCD.PORT}"])=0 | Average | Manual close: Yes |
Etcd: Node healthcheck failed | See more details on https://etcd.io/docs/v3.5/op-guide/monitoring/#health-check. | last(/Etcd by HTTP/etcd.health)=0 | Average | Depends on: Etcd: Service is unavailable |
Etcd: Failed to fetch info data | Zabbix has not received any data for items for the last 30 minutes. | nodata(/Etcd by HTTP/etcd.is.leader,30m)=1 | Warning | Manual close: Yes. Depends on: Etcd: Service is unavailable |
Etcd: Member has no leader | If a member does not have a leader, it is totally unavailable. | last(/Etcd by HTTP/etcd.has.leader)=0 | Average | |
Etcd: Instance has seen too many leader changes | Rapid leadership changes impact the performance of | (max(/Etcd by HTTP/etcd.leader.changes,15m)-min(/Etcd by HTTP/etcd.leader.changes,15m))>{$ETCD.LEADER.CHANGES.MAX.WARN} | Warning | |
Etcd: Too many proposal failures | Normally related to two issues: temporary failures related to a leader election or longer downtime caused by a loss of quorum in the cluster. | min(/Etcd by HTTP/etcd.proposals.failed.rate,5m)>{$ETCD.PROPOSAL.FAIL.MAX.WARN} | Warning | |
Etcd: Too many proposals are queued to commit | Rising pending proposals suggest there is a high client load, or the member cannot commit proposals. | min(/Etcd by HTTP/etcd.proposals.pending,5m)>{$ETCD.PROPOSAL.PENDING.MAX.WARN} | Warning | |
Etcd: Too many HTTP requests failures | Too many requests failed on | min(/Etcd by HTTP/etcd.http.requests.5xx.rate,5m)>{$ETCD.HTTP.FAIL.MAX.WARN} | Warning | |
Etcd: Server version has changed | Etcd version has changed. Acknowledge to close the problem manually. | last(/Etcd by HTTP/etcd.server.version,#1)<>last(/Etcd by HTTP/etcd.server.version,#2) and length(last(/Etcd by HTTP/etcd.server.version))>0 | Info | Manual close: Yes |
Etcd: Cluster version has changed | Etcd version has changed. Acknowledge to close the problem manually. | last(/Etcd by HTTP/etcd.cluster.version,#1)<>last(/Etcd by HTTP/etcd.cluster.version,#2) and length(last(/Etcd by HTTP/etcd.cluster.version))>0 | Info | Manual close: Yes |
Etcd: Host has been restarted | Uptime is less than 10 minutes. | last(/Etcd by HTTP/etcd.uptime)<10m | Info | Manual close: Yes |
Etcd: Current number of open files is too high | Heavy usage of file descriptors (i.e., near the process's file descriptor limit) indicates a potential file descriptor exhaustion issue. | min(/Etcd by HTTP/etcd.open.fds,5m)/last(/Etcd by HTTP/etcd.max.fds)*100>{$ETCD.OPEN.FDS.MAX.WARN} | Warning | |
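The leader changes trigger above works on a counter that only grows, so it compares the largest and smallest readings in a 15-minute window. A minimal sketch of that comparison. The function name `too_many_leader_changes` is hypothetical, and the default of 5 is the `{$ETCD.LEADER.CHANGES.MAX.WARN}` default listed in this README.

```python
from typing import Sequence


def too_many_leader_changes(samples_15m: Sequence[int], max_warn: int = 5) -> bool:
    """Mirror (max(leader.changes,15m) - min(leader.changes,15m)) > threshold."""
    if not samples_15m:
        return False  # no history yet (assumption of this sketch)
    return max(samples_15m) - min(samples_15m) > max_warn


# Hypothetical readings: the counter rose from 12 to 19, i.e. 7 changes.
print(too_many_leader_changes([12, 12, 15, 19]))  # True
```

The absolute counter value is irrelevant here; only the growth inside the window matters, so a long-lived member with many historical elections does not alert by itself.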
LLD rule gRPC codes discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
gRPC codes discovery | | Dependent item | etcd.grpc_code.discovery |
Item prototypes for gRPC codes discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Etcd: RPCs completed with code {#GRPC.CODE} | The number of RPCs completed on the server with grpc_code {#GRPC.CODE}. | Dependent item | etcd.grpc.handled.rate[{#GRPC.CODE}] |
Trigger prototypes for gRPC codes discovery
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Etcd: Too many failed gRPC requests with code: {#GRPC.CODE} | | min(/Etcd by HTTP/etcd.grpc.handled.rate[{#GRPC.CODE}],5m)>{$ETCD.GRPC.ERRORS.MAX.WARN} | Warning | |
LLD rule Peers discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Peers discovery | | Dependent item | etcd.peer.discovery |
Item prototypes for Peers discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Etcd: Etcd peer {#ETCD.PEER}: Bytes sent | The number of bytes sent to a peer with the ID | Dependent item | etcd.bytes.sent.rate[{#ETCD.PEER}] |
Etcd: Etcd peer {#ETCD.PEER}: Bytes received | The number of bytes received from a peer with the ID | Dependent item | etcd.bytes.received.rate[{#ETCD.PEER}] |
Etcd: Etcd peer {#ETCD.PEER}: Send failures | The number of sent failures from a peer with the ID | Dependent item | etcd.sent.fail.rate[{#ETCD.PEER}] |
Etcd: Etcd peer {#ETCD.PEER}: Receive failures | The number of received failures from a peer with the ID | Dependent item | etcd.received.fail.rate[{#ETCD.PEER}] |
Feedback
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/etcd_http?at=release/6.2
Etcd by HTTP
Overview
For Zabbix version: 6.2 and higher.
This template is designed to monitor etcd with Zabbix and works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template Etcd by HTTP collects metrics from the /metrics endpoint with the help of the HTTP agent.
Refer to the vendor documentation.
For users of etcd version <= 3.4:
In etcd v3.5, some metrics have been deprecated. See more details on Upgrade etcd from 3.4 to 3.5. Please upgrade your etcd instance, or use an older Etcd by HTTP template version.
This template has been tested on:
- Etcd, version 3.5.6
Setup
See Zabbix template operation for basic instructions.
Follow these instructions:
- Import the template into Zabbix.
- After importing the template, make sure that etcd allows the collection of metrics. You can test it by running: `curl -L http://localhost:2379/metrics`
- Check if etcd is accessible from the Zabbix proxy or Zabbix server, depending on where you are planning to do the monitoring. To verify it, run: `curl -L http://<etcd_node_address>:2379/metrics`
- Add the template to each etcd node. By default, the template uses the client port. You can configure the metrics endpoint location by adding the `--listen-metrics-urls` flag. (For more details, see the etcd documentation.)
Additional points to consider:
- If you have specified a non-standard port for etcd, don't forget to change the macros `{$ETCD.SCHEME}` and `{$ETCD.PORT}`.
- You can set the `{$ETCD.USER}` and `{$ETCD.PASSWORD}` macros in the template for use at the host level if necessary.
- To test availability, run: `zabbix_get -s etcd-host -k etcd.health`
- See the macros section, as it sets the trigger values.
Configuration
No specific Zabbix configuration is required.
Macros used
Name | Description | Default |
---|---|---|
{$ETCD.GRPC.ERRORS.MAX.WARN} | The maximum number of gRPC request failures. | 1 |
{$ETCD.GRPC_CODE.MATCHES} | The filter of discoverable gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md. | .* |
{$ETCD.GRPC_CODE.NOT_MATCHES} | The filter to exclude discovered gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md. | CHANGE_IF_NEEDED |
{$ETCD.GRPC_CODE.TRIGGER.MATCHES} | The filter of discoverable gRPC codes, which will create triggers. | Aborted|Unavailable |
{$ETCD.HTTP.FAIL.MAX.WARN} | The maximum number of HTTP request failures. | 2 |
{$ETCD.LEADER.CHANGES.MAX.WARN} | The maximum number of leader changes. | 5 |
{$ETCD.OPEN.FDS.MAX.WARN} | The maximum percentage of used file descriptors. | 90 |
{$ETCD.PASSWORD} | - | |
{$ETCD.PORT} | The port of | 2379 |
{$ETCD.PROPOSAL.FAIL.MAX.WARN} | The maximum number of proposal failures. | 2 |
{$ETCD.PROPOSAL.PENDING.MAX.WARN} | The maximum number of proposals in queue. | 5 |
{$ETCD.SCHEME} | The request scheme which may be | http |
{$ETCD.USER} | - | |
Template links
There are no template links in this template.
Discovery rules
Name | Description | Type | Key and additional info |
---|---|---|---|
gRPC codes discovery | - | DEPENDENT | etcd.grpc_code.discovery Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: Filter: AND - {#GRPC.CODE} NOT_MATCHES_REGEX - {#GRPC.CODE} MATCHES_REGEX Overrides: trigger |
Peers discovery | - | DEPENDENT | etcd.peer.discovery Preprocessing: - PROMETHEUS_TO_JSON: |
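The gRPC codes discovery rule above turns Prometheus output into a list of `{#GRPC.CODE}` values and then applies a MATCHES and a NOT_MATCHES filter joined with AND. A minimal sketch of that idea, not the template's actual JavaScript step. The metric name `grpc_server_handled_total`, the sample data, and the function name `discover_grpc_codes` are assumptions for illustration; the filter defaults are the `{$ETCD.GRPC_CODE.MATCHES}` and `{$ETCD.GRPC_CODE.NOT_MATCHES}` defaults from this README.

```python
import re

# Hypothetical excerpt of a /metrics response.
SAMPLE = '''\
# TYPE grpc_server_handled_total counter
grpc_server_handled_total{grpc_code="OK",grpc_method="Range",grpc_service="etcdserverpb.KV",grpc_type="unary"} 42
grpc_server_handled_total{grpc_code="Unavailable",grpc_method="Watch",grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"} 3
grpc_server_handled_total{grpc_code="OK",grpc_method="Put",grpc_service="etcdserverpb.KV",grpc_type="unary"} 7
'''

# Captures the grpc_code label value of each sample line.
CODE_RE = re.compile(r'^grpc_server_handled_total\{[^}]*\bgrpc_code="([^"]*)"')


def discover_grpc_codes(metrics_text, matches=".*", not_matches="CHANGE_IF_NEEDED"):
    """Return LLD-style rows for each distinct gRPC code that passes both filters."""
    codes = []
    for line in metrics_text.splitlines():
        m = CODE_RE.match(line)
        if not m:
            continue  # comment line or a different metric
        code = m.group(1)
        if code in codes:
            continue  # already discovered
        # Both conditions must hold (AND), as in the rule's filter.
        if re.search(matches, code) and not re.search(not_matches, code):
            codes.append(code)
    return [{"{#GRPC.CODE}": code} for code in codes]


print(discover_grpc_codes(SAMPLE))
# [{'{#GRPC.CODE}': 'OK'}, {'{#GRPC.CODE}': 'Unavailable'}]
```

With the default `CHANGE_IF_NEEDED` placeholder nothing is excluded, since no gRPC status code contains that string; setting the exclusion filter to `^OK$` would leave only the failure codes.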
Items collected
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Etcd | Etcd: Service's TCP port state | - | SIMPLE | net.tcp.service["{$ETCD.SCHEME}","{HOST.CONN}","{$ETCD.PORT}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
Etcd | Etcd: Node health | - | HTTP_AGENT | etcd.health Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
Etcd | Etcd: Server is a leader | Defines whether this member is a leader: 1 - it is; 0 - otherwise. | DEPENDENT | etcd.is.leader Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
Etcd | Etcd: Server has a leader | Defines whether a leader exists: 1 - it exists; 0 - it does not. | DEPENDENT | etcd.has.leader Preprocessing: - PROMETHEUS_PATTERN: - DISCARD_UNCHANGED_HEARTBEAT: |
Etcd | Etcd: Leader changes | The number of leader changes the member has seen since its start. | DEPENDENT | etcd.leader.changes Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: Proposals committed per second | The number of consensus proposals committed. | DEPENDENT | etcd.proposals.committed.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Proposals applied per second | The number of consensus proposals applied. | DEPENDENT | etcd.proposals.applied.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Proposals failed per second | The number of failed proposals seen. | DEPENDENT | etcd.proposals.failed.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Proposals pending | The current number of pending proposals to commit. | DEPENDENT | etcd.proposals.pending Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: Reads per second | The number of read actions by | DEPENDENT | etcd.reads.rate Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: Writes per second | The number of writes (e.g., | DEPENDENT | etcd.writes.rate Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: Client gRPC received bytes per second | The number of bytes received from gRPC clients per second. | DEPENDENT | etcd.network.grpc.received.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Client gRPC sent bytes per second | The number of bytes sent to gRPC clients per second. | DEPENDENT | etcd.network.grpc.sent.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: HTTP requests received | The number of requests received into the system (successfully parsed and | DEPENDENT | etcd.http.requests.rate Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: HTTP 5XX | The number of handled failures of requests (non-watches), by the method ( | DEPENDENT | etcd.http.requests.5xx.rate Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: HTTP 4XX | The number of handled failures of requests (non-watches), by the method ( | DEPENDENT | etcd.http.requests.4xx.rate Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: RPCs received per second | The number of RPC stream messages received on the server. | DEPENDENT | etcd.grpc.received.rate Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: RPCs sent per second | The number of gRPC stream messages sent by the server. | DEPENDENT | etcd.grpc.sent.rate Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: RPCs started per second | The number of RPCs started on the server. | DEPENDENT | etcd.grpc.started.rate Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: Server version | The version of the | DEPENDENT | etcd.server.version Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Etcd | Etcd: Cluster version | The version of the | DEPENDENT | etcd.cluster.version Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Etcd | Etcd: DB size | The total size of the underlying database. | DEPENDENT | etcd.db.size Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: Keys compacted per second | The number of DB keys compacted per second. | DEPENDENT | etcd.keys.compacted.rate Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Etcd | Etcd: Keys expired per second | The number of expired keys per second. | DEPENDENT | etcd.keys.expired.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Keys total | The total number of keys. | DEPENDENT | etcd.keys.total Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: Uptime | | DEPENDENT | etcd.uptime Preprocessing: - PROMETHEUS_PATTERN: - JAVASCRIPT: |
Etcd | Etcd: Virtual memory | The size of virtual memory expressed in bytes. | DEPENDENT | etcd.virtual.bytes Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: Resident memory | The size of resident memory expressed in bytes. | DEPENDENT | etcd.res.bytes Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: CPU | The total user and system CPU time spent in seconds. | DEPENDENT | etcd.cpu.util Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Open file descriptors | The number of open file descriptors. | DEPENDENT | etcd.open.fds Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: Maximum open file descriptors | The maximum number of open file descriptors. | DEPENDENT | etcd.max.fds Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: Deletes per second | The number of deletes seen by this member per second. | DEPENDENT | etcd.delete.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: PUT per second | The number of puts seen by this member per second. | DEPENDENT | etcd.put.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Range per second | The number of ranges seen by this member per second. | DEPENDENT | etcd.range.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Transaction per second | The number of transactions seen by this member per second. | DEPENDENT | etcd.txn.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Pending events | The total number of pending events to be sent. | DEPENDENT | etcd.events.sent.rate Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: RPCs completed with code {#GRPC.CODE} | The number of RPCs completed on the server with grpc_code {#GRPC.CODE}. | DEPENDENT | etcd.grpc.handled.rate[{#GRPC.CODE}] Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: Etcd peer {#ETCD.PEER}: Bytes sent | The number of bytes sent to a peer with the ID | DEPENDENT | etcd.bytes.sent.rate[{#ETCD.PEER}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Etcd | Etcd: Etcd peer {#ETCD.PEER}: Bytes received | The number of bytes received from a peer with the ID | DEPENDENT | etcd.bytes.received.rate[{#ETCD.PEER}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Etcd | Etcd: Etcd peer {#ETCD.PEER}: Send failures | The number of sent failures from a peer with the ID | DEPENDENT | etcd.sent.fail.rate[{#ETCD.PEER}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Etcd | Etcd: Etcd peer {#ETCD.PEER}: Receive failures | The number of received failures from a peer with the ID | DEPENDENT | etcd.received.fail.rate[{#ETCD.PEER}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Zabbix raw items | Etcd: Get node metrics | - | HTTP_AGENT | etcd.get_metrics |
Zabbix raw items | Etcd: Get version | - | HTTP_AGENT | etcd.get_version |
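Most dependent items in the table above start with a PROMETHEUS_PATTERN step that picks one value out of the raw `etcd.get_metrics` response. A minimal sketch of that extraction for metrics without labels. The function name `prometheus_value` and the sample text are hypothetical, and the metric names shown are assumed to be the ones behind `etcd.has.leader` and `etcd.proposals.pending`.

```python
import re
from typing import Optional

# Hypothetical excerpt of a /metrics response.
SAMPLE = '''\
# HELP etcd_server_has_leader Whether or not a leader exists. 1 is existence, 0 is not.
# TYPE etcd_server_has_leader gauge
etcd_server_has_leader 1
# TYPE etcd_server_proposals_pending gauge
etcd_server_proposals_pending 0
'''


def prometheus_value(metrics_text: str, metric: str) -> Optional[float]:
    """Return the value of an unlabelled metric, or None if it is absent."""
    # The name must be followed by whitespace, so a prefix of a longer
    # metric name does not match by accident.
    pattern = re.compile(r'^' + re.escape(metric) + r'\s+(\S+)')
    for line in metrics_text.splitlines():
        m = pattern.match(line)
        if m:
            return float(m.group(1))
    return None


print(prometheus_value(SAMPLE, "etcd_server_has_leader"))  # 1.0
print(prometheus_value(SAMPLE, "etcd_missing_metric"))     # None
```

Because every dependent item reads from the same raw response, one HTTP request per polling interval is enough for all of them.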
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Etcd: Service is unavailable | - | last(/Etcd by HTTP/net.tcp.service["{$ETCD.SCHEME}","{HOST.CONN}","{$ETCD.PORT}"])=0 | AVERAGE | Manual close: YES |
Etcd: Node healthcheck failed | See more details on https://etcd.io/docs/v3.5/op-guide/monitoring/#health-check. | last(/Etcd by HTTP/etcd.health)=0 | AVERAGE | Depends on: - Etcd: Service is unavailable |
Etcd: Failed to fetch info data | Zabbix has not received data for items for the last 30 minutes. | nodata(/Etcd by HTTP/etcd.is.leader,30m)=1 | WARNING | Manual close: YES Depends on: - Etcd: Service is unavailable |
Etcd: Member has no leader | If a member does not have a leader, it is totally unavailable. | last(/Etcd by HTTP/etcd.has.leader)=0 | AVERAGE | |
Etcd: Instance has seen too many leader changes | Rapid leadership changes impact the performance of | (max(/Etcd by HTTP/etcd.leader.changes,15m)-min(/Etcd by HTTP/etcd.leader.changes,15m))>{$ETCD.LEADER.CHANGES.MAX.WARN} | WARNING | |
Etcd: Too many proposal failures | Normally related to two issues: temporary failures related to a leader election or longer downtime caused by a loss of quorum in the cluster. | min(/Etcd by HTTP/etcd.proposals.failed.rate,5m)>{$ETCD.PROPOSAL.FAIL.MAX.WARN} | WARNING | |
Etcd: Too many proposals are queued to commit | Rising pending proposals suggests there is a high client load, or the member cannot commit proposals. | min(/Etcd by HTTP/etcd.proposals.pending,5m)>{$ETCD.PROPOSAL.PENDING.MAX.WARN} | WARNING | |
Etcd: Too many HTTP requests failures | Too many requests failed on | min(/Etcd by HTTP/etcd.http.requests.5xx.rate,5m)>{$ETCD.HTTP.FAIL.MAX.WARN} | WARNING | |
Etcd: Server version has changed | The Etcd version has changed. Acknowledge to close manually. | last(/Etcd by HTTP/etcd.server.version,#1)<>last(/Etcd by HTTP/etcd.server.version,#2) and length(last(/Etcd by HTTP/etcd.server.version))>0 | INFO | Manual close: YES |
Etcd: Cluster version has changed | The Etcd version has changed. Acknowledge to close manually. | last(/Etcd by HTTP/etcd.cluster.version,#1)<>last(/Etcd by HTTP/etcd.cluster.version,#2) and length(last(/Etcd by HTTP/etcd.cluster.version))>0 | INFO | Manual close: YES |
Etcd: Host has been restarted | The host uptime is less than 10 minutes. | last(/Etcd by HTTP/etcd.uptime)<10m | INFO | Manual close: YES |
Etcd: Current number of open files is too high | Heavy usage of a file descriptor (i.e., near the limit of the process's file descriptor) indicates a potential file descriptor exhaustion issue. If the file descriptors are exhausted, | min(/Etcd by HTTP/etcd.open.fds,5m)/last(/Etcd by HTTP/etcd.max.fds)*100>{$ETCD.OPEN.FDS.MAX.WARN} | WARNING | |
Etcd: Too many failed gRPC requests with code: {#GRPC.CODE} | - | min(/Etcd by HTTP/etcd.grpc.handled.rate[{#GRPC.CODE}],5m)>{$ETCD.GRPC.ERRORS.MAX.WARN} | WARNING | |
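The two "version has changed" triggers above compare the latest value with the one before it and also require the latest value to be non-empty. A minimal sketch of that condition, assuming the item history is available as a list with the most recent value first. The function name `version_changed` is hypothetical.

```python
from typing import Sequence


def version_changed(history: Sequence[str]) -> bool:
    """Mirror last(#1) <> last(#2) and length(last()) > 0.

    `history` holds item values, most recent first.
    """
    if len(history) < 2:
        return False  # nothing to compare against yet (assumption of this sketch)
    latest, previous = history[0], history[1]
    return latest != previous and len(latest) > 0


print(version_changed(["3.5.7", "3.5.6"]))  # True
print(version_changed(["", "3.5.6"]))       # False
```

The length check keeps an empty reading, for example from a failed version request, from being reported as a version change.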
Feedback
Please report any issues with the template at https://support.zabbix.com.
Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/etcd_http?at=release/6.0
Etcd by HTTP
Overview
This template is designed to monitor etcd with Zabbix and works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template Etcd by HTTP collects metrics from the /metrics endpoint with the help of the HTTP agent.
Refer to the vendor documentation.
For users of etcd version <= 3.4:
In etcd v3.5, some metrics have been deprecated. See more details on Upgrade etcd from 3.4 to 3.5. Please upgrade your etcd instance, or use an older Etcd by HTTP template version.
Requirements
Zabbix version: 6.0 and higher.
Tested versions
This template has been tested on:
- Etcd 3.5.6
Configuration
Zabbix should be configured according to the instructions in the Templates out of the box section.
Setup
Follow these instructions:
- Import the template into Zabbix.
- After importing the template, make sure that etcd allows the collection of metrics. You can test it by running: `curl -L http://localhost:2379/metrics`.
- Check if etcd is accessible from Zabbix proxy or Zabbix server, depending on where you are planning to do the monitoring. To verify it, run `curl -L http://<etcd_node_address>:2379/metrics`.
- Add the template to each etcd node. By default, the template uses the client port. You can configure the metrics endpoint location by adding the `--listen-metrics-urls` flag. (For more details, see the etcd documentation.)
Additional points to consider:
- If you have specified a non-standard port for etcd, don't forget to change the macros {$ETCD.SCHEME} and {$ETCD.PORT}.
- You can set the {$ETCD.USER} and {$ETCD.PASSWORD} macros in the template for use on a host level if necessary.
- To test availability, run: `zabbix_get -s etcd-host -k etcd.health`.
- See the macros section, as it will set the trigger values.
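The reachability check from the steps above can also be scripted. The following Python sketch builds the metrics URL from values that correspond to {$ETCD.SCHEME} and {$ETCD.PORT} and fetches it; the helper names and the example host are assumptions made for illustration only:

```python
from urllib.parse import urlunsplit
from urllib.request import urlopen


def metrics_url(host, scheme="http", port=2379, path="/metrics"):
    """Build the metrics endpoint URL from scheme, host and port."""
    return urlunsplit((scheme, f"{host}:{port}", path, "", ""))


def fetch_metrics(host, scheme="http", port=2379, timeout=5):
    """Return the raw metrics text, roughly what the curl command prints."""
    with urlopen(metrics_url(host, scheme, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# "etcd-1.example" is a placeholder host name.
print(metrics_url("etcd-1.example"))
```

Run `fetch_metrics` from the Zabbix server or proxy host, since that is where the HTTP agent item is executed.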
Macros used
Name | Description | Default |
---|---|---|
{$ETCD.PORT} | The port of the etcd API endpoint. | 2379 |
{$ETCD.SCHEME} | The request scheme, which may be http or https. | http |
{$ETCD.USER} | | |
{$ETCD.PASSWORD} | | |
{$ETCD.LEADER.CHANGES.MAX.WARN} | The maximum number of leader changes. | 5 |
{$ETCD.PROPOSAL.FAIL.MAX.WARN} | The maximum number of proposal failures. | 2 |
{$ETCD.HTTP.FAIL.MAX.WARN} | The maximum number of HTTP request failures. | 2 |
{$ETCD.PROPOSAL.PENDING.MAX.WARN} | The maximum number of proposals in queue. | 5 |
{$ETCD.OPEN.FDS.MAX.WARN} | The maximum percentage of used file descriptors. | 90 |
{$ETCD.GRPC_CODE.MATCHES} | The filter of discoverable gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md. | `.*` |
{$ETCD.GRPC_CODE.NOT_MATCHES} | The filter to exclude discovered gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md. | CHANGE_IF_NEEDED |
{$ETCD.GRPC.ERRORS.MAX.WARN} | The maximum number of gRPC request failures. | 1 |
{$ETCD.GRPC_CODE.TRIGGER.MATCHES} | The filter of discoverable gRPC codes, which will create triggers. | `Aborted\|Unavailable` |
Items
Name | Description | Type | Key and additional info |
---|---|---|---|
Etcd: Service's TCP port state | | Simple check | net.tcp.service["{$ETCD.SCHEME}","{HOST.CONN}","{$ETCD.PORT}"] |
Etcd: Get node metrics | | HTTP agent | etcd.get_metrics |
Etcd: Node health | | HTTP agent | etcd.health |
Etcd: Server is a leader | Whether this member is a leader: 1 if it is, 0 otherwise. | Dependent item | etcd.is.leader |
Etcd: Server has a leader | Whether a leader exists: 1 if it exists, 0 if it does not. | Dependent item | etcd.has.leader |
Etcd: Leader changes | The number of leader changes the member has seen since its start. | Dependent item | etcd.leader.changes |
Etcd: Proposals committed per second | The number of consensus proposals committed. | Dependent item | etcd.proposals.committed.rate |
Etcd: Proposals applied per second | The number of consensus proposals applied. | Dependent item | etcd.proposals.applied.rate |
Etcd: Proposals failed per second | The number of failed proposals seen. | Dependent item | etcd.proposals.failed.rate |
Etcd: Proposals pending | The current number of pending proposals to commit. | Dependent item | etcd.proposals.pending |
Etcd: Reads per second | The number of read actions by get/getRecursive, local to this member. | Dependent item | etcd.reads.rate |
Etcd: Writes per second | The number of writes (e.g., set/compareAndDelete) seen by this member. | Dependent item | etcd.writes.rate |
Etcd: Client gRPC received bytes per second | The number of bytes received from gRPC clients per second. | Dependent item | etcd.network.grpc.received.rate |
Etcd: Client gRPC sent bytes per second | The number of bytes sent to gRPC clients per second. | Dependent item | etcd.network.grpc.sent.rate |
Etcd: HTTP requests received | The number of requests received into the system (successfully parsed and authd). | Dependent item | etcd.http.requests.rate |
Etcd: HTTP 5XX | The number of handled failures of requests (non-watches), by the method (GET/PUT etc.), and the code 5XX. | Dependent item | etcd.http.requests.5xx.rate |
Etcd: HTTP 4XX | The number of handled failures of requests (non-watches), by the method (GET/PUT etc.), and the code 4XX. | Dependent item | etcd.http.requests.4xx.rate |
Etcd: RPCs received per second | The number of RPC stream messages received on the server. | Dependent item | etcd.grpc.received.rate |
Etcd: RPCs sent per second | The number of gRPC stream messages sent by the server. | Dependent item | etcd.grpc.sent.rate |
Etcd: RPCs started per second | The number of RPCs started on the server. | Dependent item | etcd.grpc.started.rate |
Etcd: Get version | | HTTP agent | etcd.get_version |
Etcd: Server version | The version of the etcd server. | Dependent item | etcd.server.version |
Etcd: Cluster version | The version of the etcd cluster. | Dependent item | etcd.cluster.version |
Etcd: DB size | The total size of the underlying database. | Dependent item | etcd.db.size |
Etcd: Keys compacted per second | The number of DB keys compacted per second. | Dependent item | etcd.keys.compacted.rate |
Etcd: Keys expired per second | The number of expired keys per second. | Dependent item | etcd.keys.expired.rate |
Etcd: Keys total | The total number of keys. | Dependent item | etcd.keys.total |
Etcd: Uptime | The etcd server uptime. | Dependent item | etcd.uptime |
Etcd: Virtual memory | The size of virtual memory expressed in bytes. | Dependent item | etcd.virtual.bytes |
Etcd: Resident memory | The size of resident memory expressed in bytes. | Dependent item | etcd.res.bytes |
Etcd: CPU | The total user and system CPU time spent in seconds. | Dependent item | etcd.cpu.util |
Etcd: Open file descriptors | The number of open file descriptors. | Dependent item | etcd.open.fds |
Etcd: Maximum open file descriptors | The maximum number of open file descriptors. | Dependent item | etcd.max.fds |
Etcd: Deletes per second | The number of deletes seen by this member per second. | Dependent item | etcd.delete.rate |
Etcd: PUT per second | The number of puts seen by this member per second. | Dependent item | etcd.put.rate |
Etcd: Range per second | The number of ranges seen by this member per second. | Dependent item | etcd.range.rate |
Etcd: Transaction per second | The number of transactions seen by this member per second. | Dependent item | etcd.txn.rate |
Etcd: Pending events | The total number of pending events to be sent. | Dependent item | etcd.events.sent.rate |
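Most items in the table above are dependent items that pick one value out of the text returned by the master item etcd.get_metrics. The Python sketch below illustrates that idea on a small sample in the Prometheus text format. The metric names are real etcd metrics, but which metric backs which item is an assumption here, and `metric_value` is a hypothetical helper rather than Zabbix's own preprocessing code:

```python
import re

# Hypothetical excerpt of a /metrics response.
SAMPLE = """\
# HELP etcd_server_has_leader Whether or not a leader exists. 1 is existence, 0 is not.
# TYPE etcd_server_has_leader gauge
etcd_server_has_leader 1
# TYPE etcd_server_leader_changes_seen_total counter
etcd_server_leader_changes_seen_total 3
"""


def metric_value(text, name):
    """Return the value of an unlabelled metric, or None if it is absent."""
    pattern = re.compile(rf"^{re.escape(name)}\s+(\S+)\s*$", re.MULTILINE)
    match = pattern.search(text)
    return float(match.group(1)) if match else None


print(metric_value(SAMPLE, "etcd_server_has_leader"))
print(metric_value(SAMPLE, "etcd_server_leader_changes_seen_total"))
```

Because every dependent item reads from the same response, etcd is queried once per polling interval regardless of how many items the template defines.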
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Etcd: Service is unavailable | | last(/Etcd by HTTP/net.tcp.service["{$ETCD.SCHEME}","{HOST.CONN}","{$ETCD.PORT}"])=0 | Average | Manual close: Yes |
Etcd: Node healthcheck failed | See more details on https://etcd.io/docs/v3.5/op-guide/monitoring/#health-check. | last(/Etcd by HTTP/etcd.health)=0 | Average | Depends on: Etcd: Service is unavailable |
Etcd: Failed to fetch info data | Zabbix has not received any data for items for the last 30 minutes. | nodata(/Etcd by HTTP/etcd.is.leader,30m)=1 | Warning | Manual close: Yes. Depends on: Etcd: Service is unavailable |
Etcd: Member has no leader | If a member does not have a leader, it is totally unavailable. | last(/Etcd by HTTP/etcd.has.leader)=0 | Average | |
Etcd: Instance has seen too many leader changes | Rapid leadership changes impact the performance of etcd significantly. It also signals that the leader is unstable, perhaps due to network connectivity issues or excessive load hitting the etcd cluster. | (max(/Etcd by HTTP/etcd.leader.changes,15m)-min(/Etcd by HTTP/etcd.leader.changes,15m))>{$ETCD.LEADER.CHANGES.MAX.WARN} | Warning | |
Etcd: Too many proposal failures | Normally related to two issues: temporary failures related to a leader election or longer downtime caused by a loss of quorum in the cluster. | min(/Etcd by HTTP/etcd.proposals.failed.rate,5m)>{$ETCD.PROPOSAL.FAIL.MAX.WARN} | Warning | |
Etcd: Too many proposals are queued to commit | Rising pending proposals suggest that there is a high client load, or that the member cannot commit proposals. | min(/Etcd by HTTP/etcd.proposals.pending,5m)>{$ETCD.PROPOSAL.PENDING.MAX.WARN} | Warning | |
Etcd: Too many HTTP requests failures | Too many requests failed on the etcd instance with a 5xx HTTP code. | min(/Etcd by HTTP/etcd.http.requests.5xx.rate,5m)>{$ETCD.HTTP.FAIL.MAX.WARN} | Warning | |
Etcd: Server version has changed | Etcd version has changed. Acknowledge to close the problem manually. | last(/Etcd by HTTP/etcd.server.version,#1)<>last(/Etcd by HTTP/etcd.server.version,#2) and length(last(/Etcd by HTTP/etcd.server.version))>0 | Info | Manual close: Yes |
Etcd: Cluster version has changed | Etcd version has changed. Acknowledge to close the problem manually. | last(/Etcd by HTTP/etcd.cluster.version,#1)<>last(/Etcd by HTTP/etcd.cluster.version,#2) and length(last(/Etcd by HTTP/etcd.cluster.version))>0 | Info | Manual close: Yes |
Etcd: Host has been restarted | Uptime is less than 10 minutes. | last(/Etcd by HTTP/etcd.uptime)<10m | Info | Manual close: Yes |
Etcd: Current number of open files is too high | Heavy usage of a file descriptor (i.e., near the limit of the process's file descriptor) indicates a potential file descriptor exhaustion issue. | min(/Etcd by HTTP/etcd.open.fds,5m)/last(/Etcd by HTTP/etcd.max.fds)*100>{$ETCD.OPEN.FDS.MAX.WARN} | Warning | |
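The leader-changes trigger works on a counter that only grows, so it looks at the spread of values within the 15-minute window rather than at the latest value. A small Python sketch of that comparison, with a hypothetical function name and an assumed default threshold of 5 taken from the macros table:

```python
def too_many_leader_changes(samples, max_changes=5):
    """Mimic (max(changes,15m) - min(changes,15m)) > {$ETCD.LEADER.CHANGES.MAX.WARN}.

    samples: values of the leader-changes counter collected in the window.
    """
    if not samples:
        return False  # no data in the window, nothing to compare
    return (max(samples) - min(samples)) > max_changes


# The counter went from 10 to 17 within the window: 7 changes, above the limit.
print(too_many_leader_changes([10, 10, 12, 17]))
# Only 2 changes in the window: below the limit.
print(too_many_leader_changes([10, 11, 12]))
```

The absolute counter value does not matter; a member that has seen 500 leader changes since start raises no problem as long as the counter stays flat inside the window.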
LLD rule gRPC codes discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
gRPC codes discovery | | Dependent item | etcd.grpc_code.discovery |
Item prototypes for gRPC codes discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Etcd: RPCs completed with code {#GRPC.CODE} | The number of RPCs completed on the server with grpc_code {#GRPC.CODE}. | Dependent item | etcd.grpc.handled.rate[{#GRPC.CODE}] |
Trigger prototypes for gRPC codes discovery
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Etcd: Too many failed gRPC requests with code: {#GRPC.CODE} | | min(/Etcd by HTTP/etcd.grpc.handled.rate[{#GRPC.CODE}],5m)>{$ETCD.GRPC.ERRORS.MAX.WARN} | Warning | |
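The three gRPC code macros act as regular-expression filters: one decides which codes are discovered, one excludes codes, and one selects the codes that get a trigger. The Python sketch below shows how the default values interact, assuming the filters are applied as partial (search-style) matches; the function names are hypothetical:

```python
import re


def is_discovered(code, matches=r".*", not_matches=r"CHANGE_IF_NEEDED"):
    """A code is kept when it matches the include filter and not the exclude filter."""
    return (re.search(matches, code) is not None
            and re.search(not_matches, code) is None)


def gets_trigger(code, trigger_matches=r"Aborted|Unavailable"):
    """A discovered code gets a trigger only if it matches the trigger filter."""
    return re.search(trigger_matches, code) is not None


for code in ["OK", "Aborted", "Unavailable", "NotFound"]:
    print(code, is_discovered(code), gets_trigger(code))
```

With the defaults, every code reported by etcd gets an item, but only Aborted and Unavailable get a trigger. Replacing CHANGE_IF_NEEDED with a real pattern excludes the matching codes from discovery altogether.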
LLD rule Peers discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Peers discovery | | Dependent item | etcd.peer.discovery |
Item prototypes for Peers discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Etcd: Etcd peer {#ETCD.PEER}: Bytes sent | The number of bytes sent to a peer with the ID {#ETCD.PEER}. | Dependent item | etcd.bytes.sent.rate[{#ETCD.PEER}] |
Etcd: Etcd peer {#ETCD.PEER}: Bytes received | The number of bytes received from a peer with the ID {#ETCD.PEER}. | Dependent item | etcd.bytes.received.rate[{#ETCD.PEER}] |
Etcd: Etcd peer {#ETCD.PEER}: Send failures | The number of send failures from a peer with the ID {#ETCD.PEER}. | Dependent item | etcd.sent.fail.rate[{#ETCD.PEER}] |
Etcd: Etcd peer {#ETCD.PEER}: Receive failures | The number of receive failures from a peer with the ID {#ETCD.PEER}. | Dependent item | etcd.received.fail.rate[{#ETCD.PEER}] |
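Peers discovery turns the peer IDs found in the metrics into low-level discovery rows, one per peer, so that the item prototypes above are created for each of them. The following Python sketch shows one way such rows could be derived. It assumes that peer IDs appear in the `To` and `From` labels of the `etcd_network_peer_*` metrics; the sample IDs and the `discover_peers` helper are made up for illustration:

```python
import json
import re

# Hypothetical excerpt of a /metrics response from a three-member cluster.
SAMPLE = '''\
etcd_network_peer_sent_bytes_total{To="8e9e05c52164694d"} 1024
etcd_network_peer_received_bytes_total{From="8e9e05c52164694d"} 2048
etcd_network_peer_sent_bytes_total{To="91bc3c398fb3c146"} 512
'''


def discover_peers(text):
    """Return LLD JSON with one {#ETCD.PEER} entry per distinct peer ID."""
    ids = re.findall(
        r'^etcd_network_peer_\w+\{(?:To|From)="([^"]+)"\}', text, re.MULTILINE
    )
    return json.dumps([{"{#ETCD.PEER}": peer} for peer in sorted(set(ids))])


print(discover_peers(SAMPLE))
```

Deduplicating the IDs matters because the same peer shows up in several metrics, and each peer should produce only one set of items.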
Feedback
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/etcd_http?at=release/5.4
Etcd by HTTP
Overview
For Zabbix version: 5.4 and higher
This template monitors etcd with Zabbix and works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template Etcd collects metrics with the HTTP agent from the /metrics endpoint.
See https://etcd.io/docs/v3.4.0/op-guide/monitoring/#metrics-endpoint.
This template was tested on:
- Etcd, version 3.0+
Setup
See Zabbix template operation for basic instructions.
- Import the template into Zabbix.
- After importing the template, make sure that etcd allows metric collection. Test by running: `curl -L http://localhost:2379/metrics`
- Check if etcd is accessible from Zabbix proxy or Zabbix server, depending on where you are planning to do the monitoring. To verify, run: `curl -L http://<etcd_node_address>:2379/metrics`
- Add the template to each node with etcd. By default, the template uses the client port. You can configure the metrics endpoint location with the `--listen-metrics-urls` flag (see the etcd docs).
If you have specified a non-standard port for etcd, don't forget to change the macros {$ETCD.SCHEME} and {$ETCD.PORT}.
If needed, you can set the {$ETCD.USER} and {$ETCD.PASSWORD} macros in the template for use on the host level.
To test availability, run: `zabbix_get -s etcd-host -k etcd.health`
Also see the macros section, as it sets the trigger values.
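The etcd.health item stores the node health as a number so that triggers can compare it with 0. The Python sketch below illustrates that conversion, assuming the health endpoint returns a JSON body such as `{"health":"true"}`; the function name is hypothetical and the exact preprocessing parameters of the template are not shown in this text:

```python
import json


def health_to_decimal(body):
    """Extract the health field from the JSON body and map it to 1 or 0."""
    value = json.loads(body).get("health")
    # The field may arrive as the string "true" or as a JSON boolean.
    return 1 if str(value).lower() == "true" else 0


print(health_to_decimal('{"health":"true"}'))
print(health_to_decimal('{"health":"false"}'))
```

A value of 0 is what the "Node healthcheck failed" trigger reacts to.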
Zabbix configuration
No specific Zabbix configuration is required.
Macros used
Name | Description | Default |
---|---|---|
{$ETCD.GRPC.ERRORS.MAX.WARN} | Maximum number of gRPC request failures. | 1 |
{$ETCD.GRPC_CODE.MATCHES} | Filter of discoverable gRPC codes. See https://github.com/grpc/grpc/blob/master/doc/statuscodes.md. | `.*` |
{$ETCD.GRPC_CODE.NOT_MATCHES} | Filter to exclude discovered gRPC codes. See https://github.com/grpc/grpc/blob/master/doc/statuscodes.md. | CHANGE_IF_NEEDED |
{$ETCD.GRPC_CODE.TRIGGER.MATCHES} | Filter of discoverable gRPC codes which will create triggers. | `Aborted\|Unavailable` |
{$ETCD.HTTP.FAIL.MAX.WARN} | Maximum number of HTTP request failures. | 2 |
{$ETCD.LEADER.CHANGES.MAX.WARN} | Maximum number of leader changes. | 5 |
{$ETCD.OPEN.FDS.MAX.WARN} | Maximum percentage of used file descriptors. | 90 |
{$ETCD.PASSWORD} | - | |
{$ETCD.PORT} | The port of the etcd API endpoint. | 2379 |
{$ETCD.PROPOSAL.FAIL.MAX.WARN} | Maximum number of proposal failures. | 2 |
{$ETCD.PROPOSAL.PENDING.MAX.WARN} | Maximum number of proposals in queue. | 5 |
{$ETCD.SCHEME} | Request scheme, which may be http or https. | http |
{$ETCD.USER} | - | |
Template links
There are no template links in this template.
Discovery rules
Name | Description | Type | Key and additional info |
---|---|---|---|
gRPC codes discovery | - | DEPENDENT | etcd.grpc_code.discovery Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: Filter: AND - {#GRPC.CODE} NOT_MATCHES_REGEX - {#GRPC.CODE} MATCHES_REGEX Overrides: trigger |
Peers discovery | - | DEPENDENT | etcd.peer.discovery Preprocessing: - PROMETHEUS_TO_JSON: |
Items collected
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Etcd | Etcd: Service's TCP port state | - | SIMPLE | net.tcp.service["{$ETCD.SCHEME}","{HOST.CONN}","{$ETCD.PORT}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
Etcd | Etcd: Node health | - | HTTP_AGENT | etcd.health Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
Etcd | Etcd: Server is a leader | Whether or not this member is a leader. 1 if it is, 0 otherwise. | DEPENDENT | etcd.is.leader Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
Etcd | Etcd: Server has a leader | Whether or not a leader exists. 1 if it exists, 0 if it does not. | DEPENDENT | etcd.has.leader Preprocessing: - PROMETHEUS_PATTERN: - DISCARD_UNCHANGED_HEARTBEAT: |
Etcd | Etcd: Leader changes | The number of leader changes the member has seen since its start. | DEPENDENT | etcd.leader.changes Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: Proposals committed per second | The number of consensus proposals committed. | DEPENDENT | etcd.proposals.committed.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Proposals applied per second | The number of consensus proposals applied. | DEPENDENT | etcd.proposals.applied.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Proposals failed per second | The number of failed proposals seen. | DEPENDENT | etcd.proposals.failed.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Proposals pending | The current number of pending proposals to commit. | DEPENDENT | etcd.proposals.pending Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: Reads per second | Number of read actions by get/getRecursive, local to this member. | DEPENDENT | etcd.reads.rate Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: Writes per second | Number of writes (e.g. set/compareAndDelete) seen by this member. | DEPENDENT | etcd.writes.rate Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: Client gRPC received bytes per second | The number of bytes received from gRPC clients per second. | DEPENDENT | etcd.network.grpc.received.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Client gRPC sent bytes per second | The number of bytes sent to gRPC clients per second. | DEPENDENT | etcd.network.grpc.sent.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: HTTP requests received | Number of requests received into the system (successfully parsed and authd). | DEPENDENT | etcd.http.requests.rate Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: HTTP 5XX | Number of handled failures of requests (non-watches), by method (GET/PUT etc.), and code 5XX. | DEPENDENT | etcd.http.requests.5xx.rate Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: HTTP 4XX | Number of handled failures of requests (non-watches), by method (GET/PUT etc.), and code 4XX. | DEPENDENT | etcd.http.requests.4xx.rate Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: RPCs received per second | The number of RPC stream messages received on the server. | DEPENDENT | etcd.grpc.received.rate Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: RPCs sent per second | The number of gRPC stream messages sent by the server. | DEPENDENT | etcd.grpc.sent.rate Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: RPCs started per second | The number of RPCs started on the server. | DEPENDENT | etcd.grpc.started.rate Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: Server version | Version of the Etcd server. | DEPENDENT | etcd.server.version Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Etcd | Etcd: Cluster version | Version of the Etcd cluster. | DEPENDENT | etcd.cluster.version Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Etcd | Etcd: DB size | Total size of the underlying database. | DEPENDENT | etcd.db.size Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: Keys compacted per second | The number of DB keys compacted per second. | DEPENDENT | etcd.keys.compacted.rate Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Etcd | Etcd: Keys expired per second | The number of expired keys per second. | DEPENDENT | etcd.keys.expired.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Keys total | Total number of keys. | DEPENDENT | etcd.keys.total Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: Uptime | Etcd server uptime. | DEPENDENT | etcd.uptime Preprocessing: - PROMETHEUS_PATTERN: - JAVASCRIPT: |
Etcd | Etcd: Virtual memory | Virtual memory size in bytes. | DEPENDENT | etcd.virtual.bytes Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: Resident memory | Resident memory size in bytes. | DEPENDENT | etcd.res.bytes Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: CPU | Total user and system CPU time spent in seconds. | DEPENDENT | etcd.cpu.util Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Open file descriptors | Number of open file descriptors. | DEPENDENT | etcd.open.fds Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: Maximum open file descriptors | The maximum number of open file descriptors. | DEPENDENT | etcd.max.fds Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: Deletes per second | The number of deletes seen by this member per second. | DEPENDENT | etcd.delete.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: PUT per second | The number of puts seen by this member per second. | DEPENDENT | etcd.put.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Range per second | The number of ranges seen by this member per second. | DEPENDENT | etcd.range.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Transaction per second | The number of transactions seen by this member per second. | DEPENDENT | etcd.txn.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Pending events | Total number of pending events to be sent. | DEPENDENT | etcd.events.sent.rate Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: RPCs completed with code {#GRPC.CODE} | The number of RPCs completed on the server with grpc_code {#GRPC.CODE}. | DEPENDENT | etcd.grpc.handled.rate[{#GRPC.CODE}] Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: Etcd peer {#ETCD.PEER}: Bytes sent | The number of bytes sent to the peer with ID {#ETCD.PEER}. | DEPENDENT | etcd.bytes.sent.rate[{#ETCD.PEER}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Etcd | Etcd: Etcd peer {#ETCD.PEER}: Bytes received | The number of bytes received from the peer with ID {#ETCD.PEER}. | DEPENDENT | etcd.bytes.received.rate[{#ETCD.PEER}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Etcd | Etcd: Etcd peer {#ETCD.PEER}: Send failures | The number of send failures from the peer with ID {#ETCD.PEER}. | DEPENDENT | etcd.sent.fail.rate[{#ETCD.PEER}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Etcd | Etcd: Etcd peer {#ETCD.PEER}: Receive failures | The number of receive failures from the peer with ID {#ETCD.PEER}. | DEPENDENT | etcd.received.fail.rate[{#ETCD.PEER}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Zabbix_raw_items | Etcd: Get node metrics | - | HTTP_AGENT | etcd.get_metrics |
Zabbix_raw_items | Etcd: Get version | - | HTTP_AGENT | etcd.get_version |
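Many items in the table above end with a CHANGE_PER_SECOND step, which turns an ever-growing counter into a per-second rate. A minimal Python sketch of that calculation, with a hypothetical function name; it follows the usual formula (value - previous value) / (time - previous time) and skips the sample when the counter has gone backwards, for example after an etcd restart:

```python
def change_per_second(prev_value, prev_time, value, time):
    """Return the per-second rate between two counter samples, or None.

    prev_time and time are timestamps in seconds.
    """
    elapsed = time - prev_time
    if elapsed <= 0 or value < prev_value:
        return None  # no usable interval, or the counter was reset
    return (value - prev_value) / elapsed


# 600 more proposals committed over 60 seconds gives 10 per second.
print(change_per_second(1000, 0, 1600, 60))
# The counter dropped (restart), so no rate is produced for this sample.
print(change_per_second(500, 0, 100, 60))
```

This is why the thresholds of the rate-based triggers are expressed per second, not as totals.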
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Etcd: Service is unavailable | - | last(/Etcd by HTTP/net.tcp.service["{$ETCD.SCHEME}","{HOST.CONN}","{$ETCD.PORT}"])=0 | AVERAGE | Manual close: YES |
Etcd: Node healthcheck failed | https://etcd.io/docs/v3.4.0/op-guide/monitoring/#health-check | last(/Etcd by HTTP/etcd.health)=0 | AVERAGE | Depends on: - Etcd: Service is unavailable |
Etcd: Failed to fetch info data (or no data for 30m) | Zabbix has not received data for items for the last 30 minutes. | nodata(/Etcd by HTTP/etcd.is.leader,30m)=1 | WARNING | Manual close: YES Depends on: - Etcd: Service is unavailable |
Etcd: Member has no leader | If a member does not have a leader, it is totally unavailable. | last(/Etcd by HTTP/etcd.has.leader)=0 | AVERAGE | |
Etcd: Instance has seen too many leader changes (over {$ETCD.LEADER.CHANGES.MAX.WARN} for 15m) | Rapid leadership changes impact the performance of etcd significantly. It also signals that the leader is unstable, perhaps due to network connectivity issues or excessive load hitting the etcd cluster. | (max(/Etcd by HTTP/etcd.leader.changes,15m)-min(/Etcd by HTTP/etcd.leader.changes,15m))>{$ETCD.LEADER.CHANGES.MAX.WARN} | WARNING | |
Etcd: Too many proposal failures (over {$ETCD.PROPOSAL.FAIL.MAX.WARN} for 5m) | Normally related to two issues: temporary failures related to a leader election or longer downtime caused by a loss of quorum in the cluster. | min(/Etcd by HTTP/etcd.proposals.failed.rate,5m)>{$ETCD.PROPOSAL.FAIL.MAX.WARN} | WARNING | |
Etcd: Too many proposals are queued to commit (over {$ETCD.PROPOSAL.PENDING.MAX.WARN} for 5m) | Rising pending proposals suggest that there is a high client load or that the member cannot commit proposals. | min(/Etcd by HTTP/etcd.proposals.pending,5m)>{$ETCD.PROPOSAL.PENDING.MAX.WARN} | WARNING | |
Etcd: Too many HTTP requests failures (over {$ETCD.HTTP.FAIL.MAX.WARN} for 5m) | Too many requests failed on the etcd instance with a 5xx HTTP code. | min(/Etcd by HTTP/etcd.http.requests.5xx.rate,5m)>{$ETCD.HTTP.FAIL.MAX.WARN} | WARNING | |
Etcd: Server version has changed (new version: {ITEM.VALUE}) | Etcd version has changed. Ack to close. | last(/Etcd by HTTP/etcd.server.version,#1)<>last(/Etcd by HTTP/etcd.server.version,#2) and length(last(/Etcd by HTTP/etcd.server.version))>0 | INFO | Manual close: YES |
Etcd: Cluster version has changed (new version: {ITEM.VALUE}) | Etcd version has changed. Ack to close. | last(/Etcd by HTTP/etcd.cluster.version,#1)<>last(/Etcd by HTTP/etcd.cluster.version,#2) and length(last(/Etcd by HTTP/etcd.cluster.version))>0 | INFO | Manual close: YES |
Etcd: has been restarted (uptime < 10m) | Uptime is less than 10 minutes. | last(/Etcd by HTTP/etcd.uptime)<10m | INFO | Manual close: YES |
Etcd: Current number of open files is too high (over {$ETCD.OPEN.FDS.MAX.WARN}% for 5m) | Heavy file descriptor usage (i.e., near the process's file descriptor limit) indicates a potential file descriptor exhaustion issue. If the file descriptors are exhausted, etcd may panic because it cannot create new WAL files. | min(/Etcd by HTTP/etcd.open.fds,5m)/last(/Etcd by HTTP/etcd.max.fds)*100>{$ETCD.OPEN.FDS.MAX.WARN} | WARNING | |
Etcd: Too many failed gRPC requests with code: {#GRPC.CODE} (over {$ETCD.GRPC.ERRORS.MAX.WARN} in 5m) | - | min(/Etcd by HTTP/etcd.grpc.handled.rate[{#GRPC.CODE}],5m)>{$ETCD.GRPC.ERRORS.MAX.WARN} | WARNING | |
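The two version-change triggers compare the latest stored value with the one before it and also require the latest value to be non-empty. A short Python sketch of that condition, assuming the item history is available as a list ordered from oldest to newest; the function name is hypothetical:

```python
def version_changed(history):
    """Mimic last(v,#1) <> last(v,#2) and length(last(v)) > 0.

    history: version strings ordered from oldest to newest.
    """
    if len(history) < 2:
        return False  # nothing to compare against yet
    latest, previous = history[-1], history[-2]
    return latest != previous and len(latest) > 0


print(version_changed(["3.4.0", "3.5.6"]))  # upgrade detected
print(version_changed(["3.5.6", "3.5.6"]))  # unchanged
print(version_changed(["3.5.6", ""]))       # empty value is ignored
```

The length check keeps the trigger from firing when a poll returns an empty string instead of a version.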
Feedback
Please report any issues with the template at https://support.zabbix.com
Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/etcd_http?at=release/5.0
Template App Etcd by HTTP
Overview
For Zabbix version: 5.0 and higher
This template monitors etcd with Zabbix and works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template Etcd collects metrics with the HTTP agent from the /metrics endpoint.
See https://etcd.io/docs/v3.4.0/op-guide/monitoring/#metrics-endpoint.
This template was tested on:
- Etcd, version 3.0+
Setup
See Zabbix template operation for basic instructions.
- Import the template into Zabbix.
- After importing the template, make sure that etcd allows metric collection. Test by running: `curl -L http://localhost:2379/metrics`
- Check if etcd is accessible from Zabbix proxy or Zabbix server, depending on where you are planning to do the monitoring. To verify, run: `curl -L http://<etcd_node_address>:2379/metrics`
- Add the template to each node with etcd. By default, the template uses the client port. You can configure the metrics endpoint location with the `--listen-metrics-urls` flag (see the etcd docs).
If you have specified a non-standard port for etcd, don't forget to change the macros {$ETCD.SCHEME} and {$ETCD.PORT}.
If needed, you can set the {$ETCD.USER} and {$ETCD.PASSWORD} macros in the template for use on the host level.
To test availability, run: `zabbix_get -s etcd-host -k etcd.health`
Also see the macros section, as it sets the trigger values.
Zabbix configuration
No specific Zabbix configuration is required.
Macros used
Name | Description | Default |
---|---|---|
{$ETCD.GRPC.ERRORS.MAX.WARN} | Maximum number of gRPC request failures. | 1 |
{$ETCD.GRPC_CODE.MATCHES} | Filter of discoverable gRPC codes. See https://github.com/grpc/grpc/blob/master/doc/statuscodes.md. | `.*` |
{$ETCD.GRPC_CODE.NOT_MATCHES} | Filter to exclude discovered gRPC codes. See https://github.com/grpc/grpc/blob/master/doc/statuscodes.md. | CHANGE_IF_NEEDED |
{$ETCD.GRPC_CODE.TRIGGER.MATCHES} | Filter of discoverable gRPC codes which will create triggers. | `Aborted\|Unavailable` |
{$ETCD.HTTP.FAIL.MAX.WARN} | Maximum number of HTTP request failures. | 2 |
{$ETCD.LEADER.CHANGES.MAX.WARN} | Maximum number of leader changes. | 5 |
{$ETCD.OPEN.FDS.MAX.WARN} | Maximum percentage of used file descriptors. | 90 |
{$ETCD.PASSWORD} | - | |
{$ETCD.PORT} | The port of the etcd API endpoint. | 2379 |
{$ETCD.PROPOSAL.FAIL.MAX.WARN} | Maximum number of proposal failures. | 2 |
{$ETCD.PROPOSAL.PENDING.MAX.WARN} | Maximum number of proposals in queue. | 5 |
{$ETCD.SCHEME} | Request scheme, which may be http or https. | http |
{$ETCD.USER} | - | |
Template links
There are no template links in this template.
Discovery rules
Name | Description | Type | Key and additional info |
---|---|---|---|
gRPC codes discovery | | DEPENDENT | etcd.grpc_code.discovery Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: Filter: AND - A: {#GRPC.CODE} NOT_MATCHES_REGEX - B: {#GRPC.CODE} MATCHES_REGEX Overrides: trigger |
Peers discovery | | DEPENDENT | etcd.peer.discovery Preprocessing: - PROMETHEUS_TO_JSON: |
Items collected
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Etcd | Etcd: Service's TCP port state | - |
SIMPLE | net.tcp.service["{$ETCD.SCHEME}","{HOST.CONN}","{$ETCD.PORT}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
Etcd | Etcd: Node health | - |
HTTP_AGENT | etcd.health Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL - DISCARD_UNCHANGED_HEARTBEAT: |
Etcd | Etcd: Server is a leader | Whether or not this member is a leader. 1 if is, 0 otherwise. |
DEPENDENT | etcd.is.leader Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
Etcd | Etcd: Server has a leader | Whether or not a leader exists. 1 is existence, 0 is not. |
DEPENDENT | etcd.has.leader Preprocessing: - PROMETHEUS_PATTERN: - DISCARD_UNCHANGED_HEARTBEAT: |
Etcd | Etcd: Leader changes | The number of leader changes the member has seen since its start. |
DEPENDENT | etcd.leader.changes Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: Proposals committed per second | The number of consensus proposals committed. |
DEPENDENT | etcd.proposals.committed.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Proposals applied per second | The number of consensus proposals applied. |
DEPENDENT | etcd.proposals.applied.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Proposals failed per second | The number of failed proposals seen. |
DEPENDENT | etcd.proposals.failed.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Proposals pending | The current number of pending proposals to commit. |
DEPENDENT | etcd.proposals.pending Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: Reads per second | Number of reads action by (get/getRecursive), local to this member. |
DEPENDENT | etcd.reads.rate Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: Writes per second | Number of writes (e.g. set/compareAndDelete) seen by this member. |
DEPENDENT | etcd.writes.rate Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: Client gRPC received bytes per second | The number of bytes received from grpc clients per second |
DEPENDENT | etcd.network.grpc.received.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Client gRPC sent bytes per second | The number of bytes sent from grpc clients per second |
DEPENDENT | etcd.network.grpc.sent.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: HTTP requests received | Number of requests received into the system (successfully parsed and authd). |
DEPENDENT | etcd.http.requests.rate Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: HTTP 5XX | The number of failed requests (non-watches) that returned a 5XX code, by method (GET/PUT etc.). |
DEPENDENT | etcd.http.requests.5xx.rate Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: HTTP 4XX | The number of failed requests (non-watches) that returned a 4XX code, by method (GET/PUT etc.). |
DEPENDENT | etcd.http.requests.4xx.rate Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: RPCs received per second | The number of gRPC stream messages received on the server. |
DEPENDENT | etcd.grpc.received.rate Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: RPCs sent per second | The number of gRPC stream messages sent by the server. |
DEPENDENT | etcd.grpc.sent.rate Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: RPCs started per second | The number of RPCs started on the server. |
DEPENDENT | etcd.grpc.started.rate Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: Server version | Version of the Etcd server. |
DEPENDENT | etcd.server.version Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Etcd | Etcd: Cluster version | Version of the Etcd cluster. |
DEPENDENT | etcd.cluster.version Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Etcd | Etcd: DB size | Total size of the underlying database. |
DEPENDENT | etcd.db.size Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: Keys compacted per second | The number of DB keys compacted per second. |
DEPENDENT | etcd.keys.compacted.rate Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Etcd | Etcd: Keys expired per second | The number of expired keys per second. |
DEPENDENT | etcd.keys.expired.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Keys total | Total number of keys. |
DEPENDENT | etcd.keys.total Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: Uptime | Etcd server uptime. |
DEPENDENT | etcd.uptime Preprocessing: - PROMETHEUS_PATTERN: - JAVASCRIPT: |
Etcd | Etcd: Virtual memory | Virtual memory size in bytes. |
DEPENDENT | etcd.virtual.bytes Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: Resident memory | Resident memory size in bytes. |
DEPENDENT | etcd.res.bytes Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: CPU | Total user and system CPU time spent in seconds. |
DEPENDENT | etcd.cpu.util Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Open file descriptors | Number of open file descriptors. |
DEPENDENT | etcd.open.fds Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: Maximum open file descriptors | The maximum number of open file descriptors. |
DEPENDENT | etcd.max.fds Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: Deletes per second | The number of deletes seen by this member per second. |
DEPENDENT | etcd.delete.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: PUT per second | The number of puts seen by this member per second. |
DEPENDENT | etcd.put.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Range per second | The number of ranges seen by this member per second. |
DEPENDENT | etcd.range.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Transaction per second | The number of transactions seen by this member per second. |
DEPENDENT | etcd.txn.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Events sent per second | The number of events sent by this member per second. |
DEPENDENT | etcd.events.sent.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Pending events | Total number of pending events to be sent. |
DEPENDENT | etcd.events.sent.rate Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: RPCs completed with code {#GRPC.CODE} | The number of RPCs completed on the server with grpc_code {#GRPC.CODE}. |
DEPENDENT | etcd.grpc.handled.rate[{#GRPC.CODE}] Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: Etcd peer {#ETCD.PEER}: Bytes sent | The number of bytes sent to the peer with ID {#ETCD.PEER}. |
DEPENDENT | etcd.bytes.sent.rate[{#ETCD.PEER}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Etcd | Etcd: Etcd peer {#ETCD.PEER}: Bytes received | The number of bytes received from the peer with ID {#ETCD.PEER}. |
DEPENDENT | etcd.bytes.received.rate[{#ETCD.PEER}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Etcd | Etcd: Etcd peer {#ETCD.PEER}: Send failures | The number of send failures from the peer with ID {#ETCD.PEER}. |
DEPENDENT | etcd.sent.fail.rate[{#ETCD.PEER}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Etcd | Etcd: Etcd peer {#ETCD.PEER}: Receive failures | The number of receive failures from the peer with ID {#ETCD.PEER}. |
DEPENDENT | etcd.received.fail.rate[{#ETCD.PEER}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Zabbix_raw_items | Etcd: Get node metrics | - |
HTTP_AGENT | etcd.get_metrics |
Zabbix_raw_items | Etcd: Get version | - |
HTTP_AGENT | etcd.get_version |
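
Many of the dependent items above chain a Prometheus pattern step with `CHANGE_PER_SECOND`. As a rough illustration of that chain outside Zabbix, the Python sketch below pulls one unlabelled sample from Prometheus text output and turns two successive counter readings into a per-second rate. The metric name `etcd_server_proposals_committed_total`, the sample payloads, and the helper names are assumptions made for this example; the patterns the template actually uses are not shown in the table.

```python
from typing import Optional

# Hypothetical /metrics excerpts taken 60 seconds apart (values are invented).
SCRAPE_T0 = """\
# comment and HELP/TYPE lines start with '#' and are skipped
etcd_server_proposals_committed_total 1000
"""
SCRAPE_T1 = """\
etcd_server_proposals_committed_total 1300
"""


def prometheus_value(text: str, metric: str) -> float:
    """Return the value of the first unlabelled sample named `metric`."""
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, comments, and lines without a value field.
        if len(parts) < 2 or parts[0].startswith("#"):
            continue
        if parts[0] == metric:
            return float(parts[1])  # parts[2], if present, is a timestamp
    raise KeyError(metric)


def change_per_second(prev_value: float, prev_ts: float,
                      value: float, ts: float) -> Optional[float]:
    """(value - prev_value) / (ts - prev_ts); None when the counter went down."""
    if ts <= prev_ts:
        raise ValueError("timestamps must be strictly increasing")
    if value < prev_value:
        # A decreasing counter (for example after a restart) yields no value.
        return None
    return (value - prev_value) / (ts - prev_ts)


METRIC = "etcd_server_proposals_committed_total"  # assumed metric name
rate = change_per_second(
    prometheus_value(SCRAPE_T0, METRIC), 0.0,
    prometheus_value(SCRAPE_T1, METRIC), 60.0,
)
print(rate)  # 5.0
```

Returning `None` for a decreasing counter follows the documented behaviour of the Zabbix "Change per second" step, which discards a negative difference and waits for the next value.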
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Etcd: Service is unavailable | - | {TEMPLATE_NAME:net.tcp.service["{$ETCD.SCHEME}","{HOST.CONN}","{$ETCD.PORT}"].last()}=0 | AVERAGE | Manual close: YES |
Etcd: Node healthcheck failed | https://etcd.io/docs/v3.4.0/op-guide/monitoring/#health-check | {TEMPLATE_NAME:etcd.health.last()}=0 | AVERAGE | Depends on: - Etcd: Service is unavailable |
Etcd: Failed to fetch info data (or no data for 30m) | Zabbix has not received data for items for the last 30 minutes | {TEMPLATE_NAME:etcd.is.leader.nodata(30m)}=1 | WARNING | Manual close: YES Depends on: - Etcd: Service is unavailable |
Etcd: Member has no leader | "If a member does not have a leader, it is totally unavailable." | {TEMPLATE_NAME:etcd.has.leader.last()}=0 | AVERAGE | |
Etcd: Instance has seen too many leader changes (over {$ETCD.LEADER.CHANGES.MAX.WARN} for 15m) | Rapid leadership changes impact the performance of etcd significantly. It also signals that the leader is unstable, perhaps due to network connectivity issues or excessive load hitting the etcd cluster. | {TEMPLATE_NAME:etcd.leader.changes.delta(15m)}>{$ETCD.LEADER.CHANGES.MAX.WARN} | WARNING | |
Etcd: Too many proposal failures (over {$ETCD.PROPOSAL.FAIL.MAX.WARN} for 5m) | "Normally related to two issues: temporary failures related to a leader election or longer downtime caused by a loss of quorum in the cluster." | {TEMPLATE_NAME:etcd.proposals.failed.rate.min(5m)}>{$ETCD.PROPOSAL.FAIL.MAX.WARN} | WARNING | |
Etcd: Too many proposals are queued to commit (over {$ETCD.PROPOSAL.PENDING.MAX.WARN} for 5m) | "Rising pending proposals suggests there is a high client load or the member cannot commit proposals." | {TEMPLATE_NAME:etcd.proposals.pending.min(5m)}>{$ETCD.PROPOSAL.PENDING.MAX.WARN} | WARNING | |
Etcd: Too many HTTP requests failures (over {$ETCD.HTTP.FAIL.MAX.WARN} for 5m) | "Too many requests failed on etcd instance with 5xx HTTP code" | {TEMPLATE_NAME:etcd.http.requests.5xx.rate.min(5m)}>{$ETCD.HTTP.FAIL.MAX.WARN} | WARNING | |
Etcd: Server version has changed (new version: {ITEM.VALUE}) | Etcd version has changed. Ack to close. | {TEMPLATE_NAME:etcd.server.version.diff()}=1 and {TEMPLATE_NAME:etcd.server.version.strlen()}>0 | INFO | Manual close: YES |
Etcd: Cluster version has changed (new version: {ITEM.VALUE}) | Etcd version has changed. Ack to close. | {TEMPLATE_NAME:etcd.cluster.version.diff()}=1 and {TEMPLATE_NAME:etcd.cluster.version.strlen()}>0 | INFO | Manual close: YES |
Etcd: has been restarted (uptime < 10m) | Uptime is less than 10 minutes. | {TEMPLATE_NAME:etcd.uptime.last()}<10m | INFO | Manual close: YES |
Etcd: Current number of open files is too high (over {$ETCD.OPEN.FDS.MAX.WARN}% for 5m) | "Heavy file descriptor usage (i.e., near the process's file descriptor limit) indicates a potential file descriptor exhaustion issue. If the file descriptors are exhausted, etcd may panic because it cannot create new WAL files." | {TEMPLATE_NAME:etcd.open.fds.min(5m)}/{TEMPLATE_NAME:etcd.max.fds.last()}*100>{$ETCD.OPEN.FDS.MAX.WARN} | WARNING | |
Etcd: Too many failed gRPC requests with code: {#GRPC.CODE} (over {$ETCD.GRPC.ERRORS.MAX.WARN} in 5m) | - | {TEMPLATE_NAME:etcd.grpc.handled.rate[{#GRPC.CODE}].min(5m)}>{$ETCD.GRPC.ERRORS.MAX.WARN} | WARNING | |
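
Several of the triggers above apply `min(5m)` before comparing against a threshold macro, so they fire only when every value in the window is above the limit. The Python sketch below mirrors the arithmetic of the open file descriptor expression, `min(open fds over 5m) / last(max fds) * 100 > {$ETCD.OPEN.FDS.MAX.WARN}`. The function name, the sample readings, and the threshold of 90 are assumptions chosen for illustration; check the macros section for the value your installation uses.

```python
from typing import Sequence


def open_fds_trigger(open_fds_window: Sequence[float],
                     max_fds_last: float,
                     threshold_pct: float) -> bool:
    """Evaluate min(window) / max_fds * 100 > threshold, as in the trigger."""
    if not open_fds_window:
        raise ValueError("the 5m window contains no values")
    if max_fds_last <= 0:
        raise ValueError("max_fds_last must be positive")
    usage_pct = min(open_fds_window) / max_fds_last * 100
    return usage_pct > threshold_pct


# Hypothetical readings of etcd.open.fds collected during the last 5 minutes.
brief_spike = [58000, 60000, 59500]   # lowest value is about 88.5% of the limit
sustained = [60000, 61000, 62000]     # lowest value is about 91.6% of the limit

print(open_fds_trigger(brief_spike, 65536, 90))  # False
print(open_fds_trigger(sustained, 65536, 90))    # True
```

Using the minimum rather than the last value means a single high reading does not raise the problem; usage has to stay above the threshold for the whole window.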
Feedback
Please report any issues with the template at https://support.zabbix.com