Envoy Proxy by HTTP
Overview
For Zabbix version: 6.2 and higher
This template monitors Envoy Proxy with Zabbix and works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template Envoy Proxy by HTTP collects metrics with an HTTP agent from the {$ENVOY.METRICS.PATH} endpoint (default: /stats/prometheus).
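To illustrate the bulk collection idea, here is a minimal Python sketch, not the template's actual preprocessing: one metrics document is parsed once, and individual values are then picked out of it by metric name and labels. The sample text, the metric names in it, and the helper names `parse_prometheus` and `pick` are assumptions made for illustration.

```python
import re

# Matches one sample line such as: name{label="v"} 12.5
# Simplified: assumes label values contain no escaped quotes or '}' characters.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_prometheus(text):
    """Return a list of (name, labels, value) samples, skipping comments and blanks."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def pick(samples, name, **labels):
    """Return the first value whose name and labels match, or None."""
    for sample_name, sample_labels, value in samples:
        if sample_name == name and all(
            sample_labels.get(key) == wanted for key, wanted in labels.items()
        ):
            return value
    return None


# Hypothetical scrape result, fetched once and reused for every item.
SAMPLE_TEXT = """\
# TYPE envoy_server_uptime gauge
envoy_server_uptime{} 4521
envoy_server_live{} 1
envoy_cluster_membership_total{envoy_cluster_name="backend"} 3
"""

samples = parse_prometheus(SAMPLE_TEXT)
uptime = pick(samples, "envoy_server_uptime")
members = pick(samples, "envoy_cluster_membership_total", envoy_cluster_name="backend")
```

The point of the design is that the endpoint is requested once per interval, while each dependent item only runs a cheap lookup against the already fetched text.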
This template was tested on:
- Envoy Proxy, version 1.20.2
Setup
See Zabbix template operation for basic instructions.
Internal service metrics are collected from the {$ENVOY.METRICS.PATH} endpoint (default: /stats/prometheus). For details, see https://www.envoyproxy.io/docs/envoy/v1.20.0/operations/stats_overview
Don't forget to set the macros {$ENVOY.URL} and {$ENVOY.METRICS.PATH} to match your instance.
Also, see the Macros section for a list of macros used to set trigger values.
NOTE. Some metrics may not be collected depending on your Envoy Proxy instance version and configuration.
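Before linking the template, it can help to confirm that the endpoint answers at all. The following Python sketch assumes that the scrape URL is simply {$ENVOY.URL} followed by {$ENVOY.METRICS.PATH}; the function names `metrics_url` and `fetch_metrics` are hypothetical helpers, not part of Zabbix or Envoy.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def metrics_url(base_url, metrics_path):
    """Join the instance URL and the metrics path into one scrape URL."""
    return urljoin(base_url.rstrip("/") + "/", metrics_path.lstrip("/"))


def fetch_metrics(base_url, metrics_path, timeout=5.0):
    """Fetch the raw metrics text; raises urllib.error.URLError on failure."""
    with urlopen(metrics_url(base_url, metrics_path), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Default macro values from this template.
url = metrics_url("http://localhost:9901", "/stats/prometheus")
```

Calling `fetch_metrics("http://localhost:9901", "/stats/prometheus")` against a running instance should return plain text in Prometheus format; an exception or an empty body suggests the macros need adjusting.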
Zabbix configuration
No specific Zabbix configuration is required.
Macros used
Name | Description | Default |
---|---|---|
{$ENVOY.CERT.MIN} | Minimum number of days before certificate expiration, used in the trigger expression. | 7 |
{$ENVOY.METRICS.PATH} | The path from which Zabbix scrapes metrics in Prometheus format. | /stats/prometheus |
{$ENVOY.URL} | Instance URL. | http://localhost:9901 |
Template links
There are no template links in this template.
Discovery rules
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster metrics discovery | - | DEPENDENT | envoy.lld.cluster Preprocessing: - PROMETHEUS_TO_JSON - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
HTTP metrics discovery | - | DEPENDENT | envoy.lld.http Preprocessing: - PROMETHEUS_TO_JSON - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
Listeners metrics discovery | - | DEPENDENT | envoy.lld.listeners Preprocessing: - PROMETHEUS_TO_JSON - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
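The discovery rules above turn the scraped metrics into a list of entities. The Python sketch below shows the general idea only: distinct values of one label become low-level discovery rows keyed by an LLD macro. The label name `envoy_cluster_name`, the sample data, and the helper `discover` are assumptions; the template's own JavaScript step is not reproduced here.

```python
import json


def discover(samples, label, macro):
    """Build LLD rows from the distinct values of one label."""
    values = sorted({labels[label] for _, labels in samples if label in labels})
    return [{macro: value} for value in values]


# Hypothetical samples already converted from Prometheus text: (metric name, labels).
SAMPLES = [
    ("envoy_cluster_membership_total", {"envoy_cluster_name": "backend"}),
    ("envoy_cluster_membership_healthy", {"envoy_cluster_name": "backend"}),
    ("envoy_cluster_membership_total", {"envoy_cluster_name": "auth"}),
    ("envoy_server_uptime", {}),
]

rows = discover(SAMPLES, "envoy_cluster_name", "{#CLUSTER_NAME}")
lld_json = json.dumps(rows)
```

Each row then yields one set of item prototypes, for example the `Cluster ["{#CLUSTER_NAME}"]` items listed in the next section.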
Items collected
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Envoy Proxy | Envoy Proxy: Server state | State of the server. Live - (default) Server is live and serving traffic. Draining - Server is draining listeners in response to external health checks failing. Pre initializing - Server has not yet completed cluster manager initialization. Initializing - Server is running the cluster manager initialization callbacks (e.g., RDS). | DEPENDENT | envoy.server.state Preprocessing: - PROMETHEUS_PATTERN: - DISCARD_UNCHANGED_HEARTBEAT: |
Envoy Proxy | Envoy Proxy: Server live | 1 if the server is not currently draining, 0 otherwise. | DEPENDENT | envoy.server.live Preprocessing: - PROMETHEUS_PATTERN: - DISCARD_UNCHANGED_HEARTBEAT: |
Envoy Proxy | Envoy Proxy: Uptime | Current server uptime in seconds. | DEPENDENT | envoy.server.uptime Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Envoy Proxy | Envoy Proxy: Certificate expiration, day before | Number of days until the next certificate being managed will expire. | DEPENDENT | envoy.server.days_until_first_cert_expiring Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: Server concurrency | Number of worker threads. | DEPENDENT | envoy.server.concurrency Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: Memory allocated | Current amount of allocated memory in bytes. Total of both new and old Envoy processes on hot restart. | DEPENDENT | envoy.server.memory_allocated Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: Memory heap size | Current reserved heap size in bytes. New Envoy process heap size on hot restart. | DEPENDENT | envoy.server.memory_heap_size Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: Memory physical size | Current estimate of total bytes of the physical memory. New Envoy process physical memory size on hot restart. | DEPENDENT | envoy.server.memory_physical_size Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: Filesystem, flushed by timer rate | Number of times per second that internal flush buffers are written to a file due to flush timeout. | DEPENDENT | envoy.filesystem.flushed_by_timer.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Filesystem, write completed rate | Number of times per second that a file was written. | DEPENDENT | envoy.filesystem.write_completed.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Filesystem, write failed rate | Number of times per second that an error occurred during a file write operation. | DEPENDENT | envoy.filesystem.write_failed.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Filesystem, reopen failed rate | Number of times per second that a file failed to be opened. | DEPENDENT | envoy.filesystem.reopen_failed.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Connections, total | Total connections of both new and old Envoy processes. | DEPENDENT | envoy.server.total_connections Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: Connections, parent | Total connections of the old Envoy process on hot restart. | DEPENDENT | envoy.server.parent_connections Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: Clusters, warming | Number of currently warming (not active) clusters. | DEPENDENT | envoy.cluster_manager.warming_clusters Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: Clusters, active | Number of currently active (warmed) clusters. | DEPENDENT | envoy.cluster_manager.active_clusters Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: Clusters, added rate | Total clusters added (either via static config or CDS) per second. | DEPENDENT | envoy.cluster_manager.cluster_added.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Clusters, modified rate | Total clusters modified (via CDS) per second. | DEPENDENT | envoy.cluster_manager.cluster_modified.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Clusters, removed rate | Total clusters removed (via CDS) per second. | DEPENDENT | envoy.cluster_manager.cluster_removed.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Clusters, updates rate | Total cluster updates per second. | DEPENDENT | envoy.cluster_manager.cluster_updated.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Listeners, active | Number of currently active listeners. | DEPENDENT | envoy.listener_manager.total_listeners_active Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: Listeners, draining | Number of currently draining listeners. | DEPENDENT | envoy.listener_manager.total_listeners_draining Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: Listener, warming | Number of currently warming listeners. | DEPENDENT | envoy.listener_manager.total_listeners_warming Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: Listener manager, initialized | A boolean (1 if started and 0 otherwise) that indicates whether listeners have been initialized on workers. | DEPENDENT | envoy.listener_manager.workers_started Preprocessing: - PROMETHEUS_PATTERN: - DISCARD_UNCHANGED_HEARTBEAT: |
Envoy Proxy | Envoy Proxy: Listeners, create failure | Total failed listener object additions to workers per second. | DEPENDENT | envoy.listener_manager.listener_create_failure.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Listeners, create success | Total listener objects successfully added to workers per second. | DEPENDENT | envoy.listener_manager.listener_create_success.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Listeners, added | Total listeners added (either via static config or LDS) per second. | DEPENDENT | envoy.listener_manager.listener_added.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Listeners, stopped | Total listeners stopped per second. | DEPENDENT | envoy.listener_manager.listener_stopped.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Membership, total | Current cluster membership total. | DEPENDENT | envoy.cluster.membership_total["{#CLUSTER_NAME}"] Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Membership, healthy | Current cluster healthy total (inclusive of both health checking and outlier detection). | DEPENDENT | envoy.cluster.membership_healthy["{#CLUSTER_NAME}"] Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Membership, unhealthy | Current cluster unhealthy total (membership total minus healthy total). | CALCULATED | envoy.cluster.membership_unhealthy["{#CLUSTER_NAME}"] Expression: last(//envoy.cluster.membership_total["{#CLUSTER_NAME}"]) - last(//envoy.cluster.membership_healthy["{#CLUSTER_NAME}"]) |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Membership, degraded | Current cluster degraded total. | DEPENDENT | envoy.cluster.membership_degraded["{#CLUSTER_NAME}"] Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Connections, total | Current cluster total connections. | DEPENDENT | envoy.cluster.upstream_cx_total["{#CLUSTER_NAME}"] Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Connections, active | Current cluster total active connections. | DEPENDENT | envoy.cluster.upstream_cx_active["{#CLUSTER_NAME}"] Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests total, rate | Current cluster request total per second. | DEPENDENT | envoy.cluster.upstream_rq_total.rate["{#CLUSTER_NAME}"] Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests timeout, rate | Current cluster requests that timed out waiting for a response per second. | DEPENDENT | envoy.cluster.upstream_rq_timeout.rate["{#CLUSTER_NAME}"] Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests completed, rate | Total upstream requests completed per second. | DEPENDENT | envoy.cluster.upstream_rq_completed.rate["{#CLUSTER_NAME}"] Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests 2xx, rate | Aggregate HTTP 2xx response codes per second. | DEPENDENT | envoy.cluster.upstream_rq_2x.rate["{#CLUSTER_NAME}"] Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests 3xx, rate | Aggregate HTTP 3xx response codes per second. | DEPENDENT | envoy.cluster.upstream_rq_3x.rate["{#CLUSTER_NAME}"] Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests 4xx, rate | Aggregate HTTP 4xx response codes per second. | DEPENDENT | envoy.cluster.upstream_rq_4x.rate["{#CLUSTER_NAME}"] Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests 5xx, rate | Aggregate HTTP 5xx response codes per second. | DEPENDENT | envoy.cluster.upstream_rq_5x.rate["{#CLUSTER_NAME}"] Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests pending | Total active requests pending a connection pool connection. | DEPENDENT | envoy.cluster.upstream_rq_pending_active["{#CLUSTER_NAME}"] Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests active | Total active requests. | DEPENDENT | envoy.cluster.upstream_rq_active["{#CLUSTER_NAME}"] Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Upstream bytes out, rate | Total sent connection bytes per second. | DEPENDENT | envoy.cluster.upstream_cx_tx_bytes_total.rate["{#CLUSTER_NAME}"] Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Upstream bytes in, rate | Total received connection bytes per second. | DEPENDENT | envoy.cluster.upstream_cx_rx_bytes_total.rate["{#CLUSTER_NAME}"] Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Listener ["{#LISTENER_ADDRESS}"]: Connections, active | Total active connections. | DEPENDENT | envoy.listener.downstream_cx_active["{#LISTENER_ADDRESS}"] Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: Listener ["{#LISTENER_ADDRESS}"]: Connections, rate | Total connections per second. | DEPENDENT | envoy.listener.downstream_cx_total.rate["{#LISTENER_ADDRESS}"] Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Listener ["{#LISTENER_ADDRESS}"]: Sockets, undergoing | Sockets currently undergoing listener filter processing. | DEPENDENT | envoy.listener.downstream_pre_cx_active["{#LISTENER_ADDRESS}"] Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Requests, rate | Total requests per second. | DEPENDENT | envoy.http.downstream_rq_total.rate["{#CONN_MANAGER}"] Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Requests, active | Total active requests. | DEPENDENT | envoy.http.downstream_rq_active["{#CONN_MANAGER}"] Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Requests timeout, rate | Total requests closed due to a timeout on the request path per second. | DEPENDENT | envoy.http.downstream_rq_timeout["{#CONN_MANAGER}"] Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Connections, rate | Total connections per second. | DEPENDENT | envoy.http.downstream_cx_total["{#CONN_MANAGER}"] Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Connections, active | Total active connections. | DEPENDENT | envoy.http.downstream_cx_active["{#CONN_MANAGER}"] Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Bytes in, rate | Total bytes received per second. | DEPENDENT | envoy.http.downstream_cx_rx_bytes_total.rate["{#CONN_MANAGER}"] Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Bytes out, rate | Total bytes sent per second. | DEPENDENT | envoy.http.downstream_cx_tx_bytes_tota.rate["{#CONN_MANAGER}"] Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Zabbix raw items | Envoy Proxy: Get node metrics | Get server metrics. | HTTP_AGENT | envoy.get_metrics Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: |
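Many items in the table end with a CHANGE_PER_SECOND step, and one item is CALCULATED from two others. The Python sketch below illustrates both ideas under simple assumptions: a rate is the counter difference divided by the elapsed time, and a sample is dropped when the counter goes backwards (for example after a restart). The function names and sample numbers are hypothetical.

```python
def change_per_second(prev, curr):
    """Compute a per-second rate from two (timestamp_seconds, counter_value) pairs.

    Returns None when time did not advance or the counter decreased,
    since no meaningful rate can be derived in those cases.
    """
    prev_time, prev_value = prev
    curr_time, curr_value = curr
    elapsed = curr_time - prev_time
    if elapsed <= 0 or curr_value < prev_value:
        return None
    return (curr_value - prev_value) / elapsed


def unhealthy_members(total, healthy):
    """Mirror the calculated item: membership total minus membership healthy."""
    return total - healthy


# 300 additional requests over 60 seconds.
rate = change_per_second((100.0, 1200.0), (160.0, 1500.0))
unhealthy = unhealthy_members(total=3, healthy=2)
```

Storing rates instead of raw counters keeps graphs comparable across polling intervals, at the cost of needing two samples before the first value appears.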
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Envoy Proxy: Server state is not live | - | last(/Envoy Proxy by HTTP/envoy.server.state) > 0 | AVERAGE | |
Envoy Proxy: Service has been restarted | Uptime is less than 10 minutes. | last(/Envoy Proxy by HTTP/envoy.server.uptime)<10m | INFO | Manual close: YES |
Envoy Proxy: Failed to fetch metrics data | Zabbix has not received data for items for the last 10 minutes. | nodata(/Envoy Proxy by HTTP/envoy.server.uptime,10m)=1 | WARNING | Manual close: YES |
Envoy Proxy: SSL certificate expires soon | Please check certificate. Less than {$ENVOY.CERT.MIN} days left until the next certificate being managed will expire. | last(/Envoy Proxy by HTTP/envoy.server.days_until_first_cert_expiring)<{$ENVOY.CERT.MIN} | WARNING | |
Envoy Proxy: There are unhealthy clusters | - | last(/Envoy Proxy by HTTP/envoy.cluster.membership_unhealthy["{#CLUSTER_NAME}"]) > 0 | AVERAGE | |
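The trigger conditions above can be read as simple comparisons on the latest values. The following Python sketch restates three of them for illustration only; it is not how Zabbix evaluates expressions. The numeric state codes (0 for Live, 1 for Draining, 2 for Pre initializing, 3 for Initializing) are an assumption based on the order in the item description, and `evaluate` is a hypothetical helper.

```python
# Assumed numeric codes for envoy.server.state; verify against your Envoy version.
SERVER_STATES = {0: "Live", 1: "Draining", 2: "Pre initializing", 3: "Initializing"}

TEN_MINUTES = 10 * 60  # the 10m threshold, in seconds


def evaluate(state, uptime_seconds, cert_days, cert_min_days=7):
    """Return the names of the triggers that would fire for the given values.

    cert_min_days plays the role of {$ENVOY.CERT.MIN} (default 7).
    """
    problems = []
    if state > 0:
        label = SERVER_STATES.get(state, "unknown")
        problems.append("Server state is not live (%s)" % label)
    if uptime_seconds < TEN_MINUTES:
        problems.append("Service has been restarted")
    if cert_days < cert_min_days:
        problems.append("SSL certificate expires soon")
    return problems


healthy = evaluate(state=0, uptime_seconds=86400, cert_days=30)
degraded = evaluate(state=1, uptime_seconds=120, cert_days=3)
```

Raising `cert_min_days` widens the warning window, which corresponds to changing {$ENVOY.CERT.MIN} on the host.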
Feedback
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help with it on the Zabbix forums.