Envoy Proxy

Envoy is an open source edge and service proxy, designed for cloud-native applications.

This template is for Zabbix version: 6.2
Also available for: 6.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/envoy_proxy_http?at=release/6.2

Envoy Proxy by HTTP

Overview

For Zabbix version: 6.2 and higher
This template monitors Envoy Proxy with Zabbix and works without any external scripts. Most metrics are collected in one go, thanks to Zabbix bulk data collection.

The Envoy Proxy by HTTP template collects metrics with the HTTP agent from the {$ENVOY.METRICS.PATH} endpoint (default: /stats/prometheus).
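
The bulk collection idea can be illustrated with a minimal Python sketch: one payload is parsed once and several values are read from it, roughly as dependent items read from a master item. The sample payload, its values, and the helper name `parse_simple_metrics` are assumptions made for illustration and are not part of the template.

```python
# Hypothetical sample of a /stats/prometheus response (values invented).
SAMPLE_PAYLOAD = """\
# TYPE envoy_server_uptime gauge
envoy_server_uptime{} 3600
# TYPE envoy_server_live gauge
envoy_server_live{} 1
# TYPE envoy_server_concurrency gauge
envoy_server_concurrency{} 4
"""


def parse_simple_metrics(payload):
    """Parse samples of the form 'name{...} value' or 'name value' into a dict.

    Labels are dropped, so this only suits metrics that appear once.
    """
    metrics = {}
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0]
        metrics[name] = float(value)
    return metrics


# One "scrape", many values: each lookup plays the role of a dependent item.
metrics = parse_simple_metrics(SAMPLE_PAYLOAD)
uptime = metrics["envoy_server_uptime"]
live = metrics["envoy_server_live"]
```

In the template itself this work is done by Zabbix preprocessing steps, not by external code.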

This template was tested on:

  • Envoy Proxy, version 1.20.2

Setup

See Zabbix template operation for basic instructions.

Internal service metrics are collected from the {$ENVOY.METRICS.PATH} endpoint (default: /stats/prometheus). For details, see https://www.envoyproxy.io/docs/envoy/v1.20.0/operations/stats_overview

Don't forget to change the macros {$ENVOY.URL} and {$ENVOY.METRICS.PATH}. Also, see the Macros section for a list of macros used to set trigger values.
NOTE: Some metrics may not be collected depending on your Envoy Proxy instance version and configuration.
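
Before linking the template, it can help to confirm that the metrics endpoint is reachable from the machine that will do the polling. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the URL and path are simply the macro defaults listed in this document.

```python
from urllib.request import urlopen

ENVOY_URL = "http://localhost:9901"   # default of {$ENVOY.URL}
METRICS_PATH = "/stats/prometheus"    # default of {$ENVOY.METRICS.PATH}


def metrics_url(base_url, path):
    """Join the instance URL and the metrics path without doubling slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def fetch_metrics(url, timeout=5.0):
    """Return the raw Prometheus-format text; raises on network or HTTP errors."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def main():
    """Fetch once and report how many lines came back."""
    url = metrics_url(ENVOY_URL, METRICS_PATH)
    body = fetch_metrics(url)
    print(f"{url}: {len(body.splitlines())} lines received")
```

Calling `main()` on a host that can reach the Envoy admin interface should print a line count. A connection error suggests that {$ENVOY.URL} needs adjusting.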

Zabbix configuration

No specific Zabbix configuration is required.

Macros used

Name | Description | Default
{$ENVOY.CERT.MIN} | Minimum number of days before certificate expiration, used in the trigger expression. | 7
{$ENVOY.METRICS.PATH} | The path from which Zabbix scrapes metrics in Prometheus format. | /stats/prometheus
{$ENVOY.URL} | Instance URL. | http://localhost:9901
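
As a rough illustration of how these macros combine into the scraped address, the sketch below substitutes macro values into a string, with per-host overrides taking precedence over the template defaults. It is a simplified model written for this example. Zabbix resolves user macros internally and supports features this sketch ignores, and `expand_macros` is a hypothetical helper name.

```python
import re

# Defaults taken from the macros listed above.
MACRO_DEFAULTS = {
    "{$ENVOY.CERT.MIN}": "7",
    "{$ENVOY.METRICS.PATH}": "/stats/prometheus",
    "{$ENVOY.URL}": "http://localhost:9901",
}

# Matches tokens such as {$ENVOY.URL}.
MACRO_RE = re.compile(r"\{\$[A-Z0-9_.]+\}")


def expand_macros(text, overrides=None):
    """Replace {$MACRO} tokens; overrides win over the template defaults."""
    values = {**MACRO_DEFAULTS, **(overrides or {})}
    # Unknown macros are left untouched rather than guessed.
    return MACRO_RE.sub(lambda m: values.get(m.group(0), m.group(0)), text)


default_target = expand_macros("{$ENVOY.URL}{$ENVOY.METRICS.PATH}")
custom_target = expand_macros(
    "{$ENVOY.URL}{$ENVOY.METRICS.PATH}",
    overrides={"{$ENVOY.URL}": "http://envoy.example:9901"},  # hypothetical host
)
```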

Template links

There are no template links in this template.

Discovery rules

Name Description Type Key and additional info
Cluster metrics discovery

-

DEPENDENT envoy.lld.cluster

Preprocessing:

- PROMETHEUS_TO_JSON

- JAVASCRIPT: The text is too long. Please see the template.

- DISCARD_UNCHANGED_HEARTBEAT: 3h

HTTP metrics discovery

-

DEPENDENT envoy.lld.http

Preprocessing:

- PROMETHEUS_TO_JSON

- JAVASCRIPT: The text is too long. Please see the template.

- DISCARD_UNCHANGED_HEARTBEAT: 3h

Listeners metrics discovery

-

DEPENDENT envoy.lld.listeners

Preprocessing:

- PROMETHEUS_TO_JSON

- JAVASCRIPT: The text is too long. Please see the template.

- DISCARD_UNCHANGED_HEARTBEAT: 3h
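
The JavaScript steps of these discovery rules are not reproduced here, so the following Python sketch is only an assumption about the general approach: scan the scraped metrics, collect the distinct values of one label, and emit them as low-level discovery rows. The label names and LLD macro names are taken from the item prototypes in this document. The sample payload and the `discover` helper are hypothetical.

```python
import json
import re

# Matches one label pair such as envoy_cluster_name="service_a".
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def discover(payload, metric_prefix, label, lld_macro):
    """Collect unique values of `label` on metrics starting with `metric_prefix`."""
    seen = []
    for line in payload.splitlines():
        if line.startswith("#") or not line.startswith(metric_prefix):
            continue
        labels = dict(LABEL_RE.findall(line))
        value = labels.get(label)
        if value is not None and value not in seen:
            seen.append(value)  # keep first-seen order, drop duplicates
    return [{lld_macro: value} for value in seen]


# Hypothetical payload with invented cluster and listener names.
SAMPLE = """\
envoy_cluster_membership_total{envoy_cluster_name="service_a"} 3
envoy_cluster_membership_healthy{envoy_cluster_name="service_a"} 2
envoy_cluster_membership_total{envoy_cluster_name="service_b"} 1
envoy_listener_downstream_cx_active{envoy_listener_address="0.0.0.0_10000"} 5
"""

clusters = discover(SAMPLE, "envoy_cluster_", "envoy_cluster_name", "{#CLUSTER_NAME}")
listeners = discover(
    SAMPLE, "envoy_listener_", "envoy_listener_address", "{#LISTENER_ADDRESS}"
)
print(json.dumps(clusters))
```

Each row then drives one set of item prototypes, for example the `Cluster ["{#CLUSTER_NAME}"]` items listed under Items collected.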

Items collected

Group Name Description Type Key and additional info
Envoy Proxy Envoy Proxy: Server state

State of the server.

Live - (default) Server is live and serving traffic.

Draining - Server is draining listeners in response to external health checks failing.

Pre initializing - Server has not yet completed cluster manager initialization.

Initializing - Server is running the cluster manager initialization callbacks (e.g., RDS).

DEPENDENT envoy.server.state

Preprocessing:

- PROMETHEUS_PATTERN: envoy_server_state

- DISCARD_UNCHANGED_HEARTBEAT: 3h
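
The state arrives as a number, so a small lookup makes it readable. This sketch assumes the numeric codes follow the order of the list above (0 for Live through 3 for Initializing), which matches the "state > 0" trigger later in this document. Verify the codes against the Envoy statistics documentation for your version. `describe_state` is a hypothetical helper name.

```python
# Assumed numeric codes for envoy_server_state, in the order listed above.
SERVER_STATES = {
    0: "Live",
    1: "Draining",
    2: "Pre initializing",
    3: "Initializing",
}


def describe_state(value):
    """Return a readable name for an envoy_server_state sample."""
    code = int(float(value))  # accepts 2, 2.0 or "2"
    return SERVER_STATES.get(code, f"Unknown ({code})")
```

For example, `describe_state(0)` gives "Live", and any non-zero value corresponds to a state the "Server state is not live" trigger would flag.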

Envoy Proxy Envoy Proxy: Server live

1 if the server is not currently draining, 0 otherwise.

DEPENDENT envoy.server.live

Preprocessing:

- PROMETHEUS_PATTERN: envoy_server_live

- DISCARD_UNCHANGED_HEARTBEAT: 3h

Envoy Proxy Envoy Proxy: Uptime

Current server uptime in seconds.

DEPENDENT envoy.server.uptime

Preprocessing:

- PROMETHEUS_PATTERN: envoy_server_uptime

⛔️ON_FAIL: DISCARD_VALUE ->

Envoy Proxy Envoy Proxy: Certificate expiration, day before

Number of days until the next certificate being managed will expire.

DEPENDENT envoy.server.days_until_first_cert_expiring

Preprocessing:

- PROMETHEUS_PATTERN: envoy_server_days_until_first_cert_expiring

Envoy Proxy Envoy Proxy: Server concurrency

Number of worker threads.

DEPENDENT envoy.server.concurrency

Preprocessing:

- PROMETHEUS_PATTERN: envoy_server_concurrency

Envoy Proxy Envoy Proxy: Memory allocated

Current amount of allocated memory in bytes. Total of both new and old Envoy processes on hot restart.

DEPENDENT envoy.server.memory_allocated

Preprocessing:

- PROMETHEUS_PATTERN: envoy_server_memory_allocated

Envoy Proxy Envoy Proxy: Memory heap size

Current reserved heap size in bytes. New Envoy process heap size on hot restart.

DEPENDENT envoy.server.memory_heap_size

Preprocessing:

- PROMETHEUS_PATTERN: envoy_server_memory_heap_size

Envoy Proxy Envoy Proxy: Memory physical size

Current estimate of total bytes of the physical memory. New Envoy process physical memory size on hot restart.

DEPENDENT envoy.server.memory_physical_size

Preprocessing:

- PROMETHEUS_PATTERN: envoy_server_memory_physical_size

Envoy Proxy Envoy Proxy: Filesystem, flushed by timer rate

Total number of times internal flush buffers are written to a file due to flush timeout per second.

DEPENDENT envoy.filesystem.flushed_by_timer.rate

Preprocessing:

- PROMETHEUS_PATTERN: envoy_filesystem_flushed_by_timer

- CHANGE_PER_SECOND
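
The CHANGE_PER_SECOND step used by this and the other rate items turns a growing counter into a per-second rate. A minimal sketch of that arithmetic, (value - prev_value) / (time - prev_time), is shown below. Returning nothing when the counter goes backwards reflects my understanding of how Zabbix treats counter resets and is labeled as an assumption. The sample numbers are invented.

```python
def change_per_second(prev_value, prev_time, value, time):
    """Compute (value - prev_value) / (time - prev_time).

    Returns None when the counter decreased (assumed to mean a reset,
    for example after an Envoy restart), so no rate is stored.
    """
    elapsed = time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    if value < prev_value:
        return None
    return (value - prev_value) / elapsed


# Hypothetical counter samples taken 60 seconds apart.
rate = change_per_second(prev_value=120, prev_time=1000, value=180, time=1060)
```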

Envoy Proxy Envoy Proxy: Filesystem, write completed rate

Total number of times a file was written per second.

DEPENDENT envoy.filesystem.write_completed.rate

Preprocessing:

- PROMETHEUS_PATTERN: envoy_filesystem_write_completed

- CHANGE_PER_SECOND

Envoy Proxy Envoy Proxy: Filesystem, write failed rate

Total number of times an error occurred during a file write operation per second.

DEPENDENT envoy.filesystem.write_failed.rate

Preprocessing:

- PROMETHEUS_PATTERN: envoy_filesystem_write_failed

- CHANGE_PER_SECOND

Envoy Proxy Envoy Proxy: Filesystem, reopen failed rate

Total number of times a file failed to be opened, per second.

DEPENDENT envoy.filesystem.reopen_failed.rate

Preprocessing:

- PROMETHEUS_PATTERN: envoy_filesystem_reopen_failed

- CHANGE_PER_SECOND

Envoy Proxy Envoy Proxy: Connections, total

Total connections of both new and old Envoy processes.

DEPENDENT envoy.server.total_connections

Preprocessing:

- PROMETHEUS_PATTERN: envoy_server_total_connections

Envoy Proxy Envoy Proxy: Connections, parent

Total connections of the old Envoy process on hot restart.

DEPENDENT envoy.server.parent_connections

Preprocessing:

- PROMETHEUS_PATTERN: envoy_server_parent_connections

Envoy Proxy Envoy Proxy: Clusters, warming

Number of currently warming (not active) clusters.

DEPENDENT envoy.cluster_manager.warming_clusters

Preprocessing:

- PROMETHEUS_PATTERN: envoy_cluster_manager_warming_clusters

Envoy Proxy Envoy Proxy: Clusters, active

Number of currently active (warmed) clusters.

DEPENDENT envoy.cluster_manager.active_clusters

Preprocessing:

- PROMETHEUS_PATTERN: envoy_cluster_manager_active_clusters

Envoy Proxy Envoy Proxy: Clusters, added rate

Total clusters added (either via static config or CDS) per second.

DEPENDENT envoy.cluster_manager.cluster_added.rate

Preprocessing:

- PROMETHEUS_PATTERN: envoy_cluster_manager_cluster_added

- CHANGE_PER_SECOND

Envoy Proxy Envoy Proxy: Clusters, modified rate

Total clusters modified (via CDS) per second.

DEPENDENT envoy.cluster_manager.cluster_modified.rate

Preprocessing:

- PROMETHEUS_PATTERN: envoy_cluster_manager_cluster_modified

- CHANGE_PER_SECOND

Envoy Proxy Envoy Proxy: Clusters, removed rate

Total clusters removed (via CDS) per second.

DEPENDENT envoy.cluster_manager.cluster_removed.rate

Preprocessing:

- PROMETHEUS_PATTERN: envoy_cluster_manager_cluster_removed

- CHANGE_PER_SECOND

Envoy Proxy Envoy Proxy: Clusters, updates rate

Total cluster updates per second.

DEPENDENT envoy.cluster_manager.cluster_updated.rate

Preprocessing:

- PROMETHEUS_PATTERN: envoy_cluster_manager_cluster_updated

- CHANGE_PER_SECOND

Envoy Proxy Envoy Proxy: Listeners, active

Number of currently active listeners.

DEPENDENT envoy.listener_manager.total_listeners_active

Preprocessing:

- PROMETHEUS_PATTERN: envoy_listener_manager_total_listeners_active: function: sum
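
Several patterns in this template end with "function: sum", which adds up every series that matches the pattern. The sketch below shows the idea for a pattern without label filters. The payload, including the `worker` label, is invented for illustration, and `sum_metric` is a hypothetical helper.

```python
def sum_metric(payload, metric_name):
    """Sum every sample whose metric name equals `metric_name`, ignoring labels.

    Returns None when nothing matched, so "no data" is not confused with 0.
    """
    total = 0.0
    matched = False
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        if series.split("{", 1)[0] == metric_name:
            total += float(value)
            matched = True
    return total if matched else None


# Hypothetical payload: the same metric reported under two label sets.
SAMPLE = """\
envoy_listener_manager_total_listeners_active{worker="a"} 2
envoy_listener_manager_total_listeners_active{worker="b"} 3
envoy_listener_manager_total_listeners_warming{} 0
"""

active = sum_metric(SAMPLE, "envoy_listener_manager_total_listeners_active")
```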

Envoy Proxy Envoy Proxy: Listeners, draining

Number of currently draining listeners.

DEPENDENT envoy.listener_manager.total_listeners_draining

Preprocessing:

- PROMETHEUS_PATTERN: envoy_listener_manager_total_listeners_draining: function: sum

Envoy Proxy Envoy Proxy: Listener, warming

Number of currently warming listeners.

DEPENDENT envoy.listener_manager.total_listeners_warming

Preprocessing:

- PROMETHEUS_PATTERN: envoy_listener_manager_total_listeners_warming: function: sum

Envoy Proxy Envoy Proxy: Listener manager, initialized

A boolean (1 if started and 0 otherwise) that indicates whether listeners have been initialized on workers.

DEPENDENT envoy.listener_manager.workers_started

Preprocessing:

- PROMETHEUS_PATTERN: envoy_listener_manager_workers_started

- DISCARD_UNCHANGED_HEARTBEAT: 3h

Envoy Proxy Envoy Proxy: Listeners, create failure

Total failed listener object additions to workers per second.

DEPENDENT envoy.listener_manager.listener_create_failure.rate

Preprocessing:

- PROMETHEUS_PATTERN: envoy_listener_manager_listener_create_failure

- CHANGE_PER_SECOND

Envoy Proxy Envoy Proxy: Listeners, create success

Total listener objects successfully added to workers per second.

DEPENDENT envoy.listener_manager.listener_create_success.rate

Preprocessing:

- PROMETHEUS_PATTERN: envoy_listener_manager_listener_create_success

- CHANGE_PER_SECOND

Envoy Proxy Envoy Proxy: Listeners, added

Total listeners added (either via static config or LDS) per second.

DEPENDENT envoy.listener_manager.listener_added.rate

Preprocessing:

- PROMETHEUS_PATTERN: envoy_listener_manager_listener_added

- CHANGE_PER_SECOND

Envoy Proxy Envoy Proxy: Listeners, stopped

Total listeners stopped per second.

DEPENDENT envoy.listener_manager.listener_stopped.rate

Preprocessing:

- PROMETHEUS_PATTERN: envoy_listener_manager_listener_stopped

- CHANGE_PER_SECOND

Envoy Proxy Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Membership, total

Current cluster membership total.

DEPENDENT envoy.cluster.membership_total["{#CLUSTER_NAME}"]

Preprocessing:

- PROMETHEUS_PATTERN: envoy_cluster_membership_total{envoy_cluster_name = "{#CLUSTER_NAME}"}: function: sum

Envoy Proxy Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Membership, healthy

Current cluster healthy total (inclusive of both health checking and outlier detection).

DEPENDENT envoy.cluster.membership_healthy["{#CLUSTER_NAME}"]

Preprocessing:

- PROMETHEUS_PATTERN: envoy_cluster_membership_healthy{envoy_cluster_name = "{#CLUSTER_NAME}"}: function: sum

Envoy Proxy Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Membership, unhealthy

Current number of unhealthy cluster members, calculated as total members minus healthy members.

CALCULATED envoy.cluster.membership_unhealthy["{#CLUSTER_NAME}"]

Expression:

last(//envoy.cluster.membership_total["{#CLUSTER_NAME}"]) - last(//envoy.cluster.membership_healthy["{#CLUSTER_NAME}"])
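
The calculated item above is a plain subtraction of the two latest values. As a small worked example with invented numbers (the helper name is hypothetical):

```python
def unhealthy_by_cluster(total, healthy):
    """Per cluster: latest membership total minus latest healthy count."""
    return {name: total[name] - healthy[name] for name in total}


# Hypothetical latest values for two discovered clusters.
latest_total = {"service_a": 5, "service_b": 2}
latest_healthy = {"service_a": 3, "service_b": 2}

unhealthy = unhealthy_by_cluster(latest_total, latest_healthy)
```

Here `service_a` has 2 unhealthy members and `service_b` has none, so only `service_a` would satisfy an "unhealthy > 0" condition.
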
Envoy Proxy Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Membership, degraded

Current cluster degraded total.

DEPENDENT envoy.cluster.membership_degraded["{#CLUSTER_NAME}"]

Preprocessing:

- PROMETHEUS_PATTERN: envoy_cluster_membership_degraded{envoy_cluster_name = "{#CLUSTER_NAME}"}: function: sum

Envoy Proxy Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Connections, total

Current cluster total connections.

DEPENDENT envoy.cluster.upstream_cx_total["{#CLUSTER_NAME}"]

Preprocessing:

- PROMETHEUS_PATTERN: envoy_cluster_upstream_cx_total{envoy_cluster_name = "{#CLUSTER_NAME}"}: function: sum

Envoy Proxy Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Connections, active

Current cluster total active connections.

DEPENDENT envoy.cluster.upstream_cx_active["{#CLUSTER_NAME}"]

Preprocessing:

- PROMETHEUS_PATTERN: envoy_cluster_upstream_cx_active{envoy_cluster_name = "{#CLUSTER_NAME}"}: function: sum

Envoy Proxy Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests total, rate

Current cluster request total per second.

DEPENDENT envoy.cluster.upstream_rq_total.rate["{#CLUSTER_NAME}"]

Preprocessing:

- PROMETHEUS_PATTERN: envoy_cluster_upstream_rq_total{envoy_cluster_name = "{#CLUSTER_NAME}"}: function: sum

- CHANGE_PER_SECOND

Envoy Proxy Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests timeout, rate

Current cluster requests that timed out waiting for a response per second.

DEPENDENT envoy.cluster.upstream_rq_timeout.rate["{#CLUSTER_NAME}"]

Preprocessing:

- PROMETHEUS_PATTERN: envoy_cluster_upstream_rq_timeout{envoy_cluster_name = "{#CLUSTER_NAME}"}: function: sum

- CHANGE_PER_SECOND

Envoy Proxy Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests completed, rate

Total upstream requests completed per second.

DEPENDENT envoy.cluster.upstream_rq_completed.rate["{#CLUSTER_NAME}"]

Preprocessing:

- PROMETHEUS_PATTERN: envoy_cluster_upstream_rq_completed{envoy_cluster_name = "{#CLUSTER_NAME}"}: function: sum

- CHANGE_PER_SECOND

Envoy Proxy Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests 2xx, rate

Upstream requests answered with a 2xx response code, per second.

DEPENDENT envoy.cluster.upstream_rq_2x.rate["{#CLUSTER_NAME}"]

Preprocessing:

- PROMETHEUS_PATTERN: envoy_cluster_upstream_rq_xx{envoy_cluster_name = "{#CLUSTER_NAME}", envoy_response_code_class="2"}: function: sum

- CHANGE_PER_SECOND

Envoy Proxy Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests 3xx, rate

Upstream requests answered with a 3xx response code, per second.

DEPENDENT envoy.cluster.upstream_rq_3x.rate["{#CLUSTER_NAME}"]

Preprocessing:

- PROMETHEUS_PATTERN: envoy_cluster_upstream_rq_xx{envoy_cluster_name = "{#CLUSTER_NAME}", envoy_response_code_class="3"}: function: sum

- CHANGE_PER_SECOND

Envoy Proxy Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests 4xx, rate

Upstream requests answered with a 4xx response code, per second.

DEPENDENT envoy.cluster.upstream_rq_4x.rate["{#CLUSTER_NAME}"]

Preprocessing:

- PROMETHEUS_PATTERN: envoy_cluster_upstream_rq_xx{envoy_cluster_name = "{#CLUSTER_NAME}", envoy_response_code_class="4"}: function: sum

- CHANGE_PER_SECOND

Envoy Proxy Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests 5xx, rate

Upstream requests answered with a 5xx response code, per second.

DEPENDENT envoy.cluster.upstream_rq_5x.rate["{#CLUSTER_NAME}"]

Preprocessing:

- PROMETHEUS_PATTERN: envoy_cluster_upstream_rq_xx{envoy_cluster_name = "{#CLUSTER_NAME}", envoy_response_code_class="5"}: function: sum

- CHANGE_PER_SECOND
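
The four response-class items share one metric, `envoy_cluster_upstream_rq_xx`, and differ only in the `envoy_response_code_class` label. The sketch below groups that metric by class for a single cluster. It is an illustration under assumptions: the payload is invented and `requests_by_class` is a hypothetical helper, not the template's preprocessing.

```python
import re

# One sample line of the shared metric: labels in braces, then the value.
SAMPLE_RE = re.compile(
    r"^envoy_cluster_upstream_rq_xx\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$"
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def requests_by_class(payload, cluster_name):
    """Sum envoy_cluster_upstream_rq_xx per response code class for one cluster."""
    totals = {}
    for line in payload.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if not match:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        if labels.get("envoy_cluster_name") != cluster_name:
            continue
        code_class = labels.get("envoy_response_code_class", "?") + "xx"
        totals[code_class] = totals.get(code_class, 0.0) + float(match.group("value"))
    return totals


# Hypothetical counters for two clusters.
SAMPLE = """\
envoy_cluster_upstream_rq_xx{envoy_response_code_class="2",envoy_cluster_name="service_a"} 940
envoy_cluster_upstream_rq_xx{envoy_response_code_class="5",envoy_cluster_name="service_a"} 12
envoy_cluster_upstream_rq_xx{envoy_response_code_class="2",envoy_cluster_name="service_b"} 70
"""

service_a = requests_by_class(SAMPLE, "service_a")
```

These are raw counters. The template then applies CHANGE_PER_SECOND to turn each one into a rate.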

Envoy Proxy Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests pending

Total active requests pending a connection pool connection.

DEPENDENT envoy.cluster.upstream_rq_pending_active["{#CLUSTER_NAME}"]

Preprocessing:

- PROMETHEUS_PATTERN: envoy_cluster_upstream_rq_pending_active{envoy_cluster_name = "{#CLUSTER_NAME}"}: function: sum

Envoy Proxy Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests active

Total active requests.

DEPENDENT envoy.cluster.upstream_rq_active["{#CLUSTER_NAME}"]

Preprocessing:

- PROMETHEUS_PATTERN: envoy_cluster_upstream_rq_active{envoy_cluster_name = "{#CLUSTER_NAME}"}: function: sum

Envoy Proxy Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Upstream bytes out, rate

Total sent connection bytes per second.

DEPENDENT envoy.cluster.upstream_cx_tx_bytes_total.rate["{#CLUSTER_NAME}"]

Preprocessing:

- PROMETHEUS_PATTERN: envoy_cluster_upstream_cx_tx_bytes_total{envoy_cluster_name = "{#CLUSTER_NAME}"}: function: sum

- CHANGE_PER_SECOND

Envoy Proxy Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Upstream bytes in, rate

Total received connection bytes per second.

DEPENDENT envoy.cluster.upstream_cx_rx_bytes_total.rate["{#CLUSTER_NAME}"]

Preprocessing:

- PROMETHEUS_PATTERN: envoy_cluster_upstream_cx_rx_bytes_total{envoy_cluster_name = "{#CLUSTER_NAME}"}: function: sum

- CHANGE_PER_SECOND

Envoy Proxy Envoy Proxy: Listener ["{#LISTENER_ADDRESS}"]: Connections, active

Total active connections.

DEPENDENT envoy.listener.downstream_cx_active["{#LISTENER_ADDRESS}"]

Preprocessing:

- PROMETHEUS_PATTERN: envoy_listener_downstream_cx_active{envoy_listener_address = "{#LISTENER_ADDRESS}"}: function: sum

Envoy Proxy Envoy Proxy: Listener ["{#LISTENER_ADDRESS}"]: Connections, rate

Total connections per second.

DEPENDENT envoy.listener.downstream_cx_total.rate["{#LISTENER_ADDRESS}"]

Preprocessing:

- PROMETHEUS_PATTERN: envoy_listener_downstream_cx_total{envoy_listener_address = "{#LISTENER_ADDRESS}"}: function: sum

- CHANGE_PER_SECOND

Envoy Proxy Envoy Proxy: Listener ["{#LISTENER_ADDRESS}"]: Sockets, undergoing

Sockets currently undergoing listener filter processing.

DEPENDENT envoy.listener.downstream_pre_cx_active["{#LISTENER_ADDRESS}"]

Preprocessing:

- PROMETHEUS_PATTERN: envoy_listener_downstream_pre_cx_active{envoy_listener_address = "{#LISTENER_ADDRESS}"}: function: sum

Envoy Proxy Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Requests, rate

Total requests per second.

DEPENDENT envoy.http.downstream_rq_total.rate["{#CONN_MANAGER}"]

Preprocessing:

- PROMETHEUS_PATTERN: envoy_http_downstream_rq_total{envoy_http_conn_manager_prefix = "{#CONN_MANAGER}"}: function: sum

- CHANGE_PER_SECOND

Envoy Proxy Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Requests, active

Total active requests.

DEPENDENT envoy.http.downstream_rq_active["{#CONN_MANAGER}"]

Preprocessing:

- PROMETHEUS_PATTERN: envoy_http_downstream_rq_active{envoy_http_conn_manager_prefix = "{#CONN_MANAGER}"}: function: sum

Envoy Proxy Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Requests timeout, rate

Total requests closed due to a timeout on the request path per second.

DEPENDENT envoy.http.downstream_rq_timeout["{#CONN_MANAGER}"]

Preprocessing:

- PROMETHEUS_PATTERN: envoy_http_downstream_rq_timeout{envoy_http_conn_manager_prefix = "{#CONN_MANAGER}"}: function: sum

- CHANGE_PER_SECOND

Envoy Proxy Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Connections, rate

Total connections per second.

DEPENDENT envoy.http.downstream_cx_total["{#CONN_MANAGER}"]

Preprocessing:

- PROMETHEUS_PATTERN: envoy_http_downstream_cx_total{envoy_http_conn_manager_prefix = "{#CONN_MANAGER}"}: function: sum

- CHANGE_PER_SECOND

Envoy Proxy Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Connections, active

Total active connections.

DEPENDENT envoy.http.downstream_cx_active["{#CONN_MANAGER}"]

Preprocessing:

- PROMETHEUS_PATTERN: envoy_http_downstream_cx_active{envoy_http_conn_manager_prefix = "{#CONN_MANAGER}"}: function: sum

Envoy Proxy Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Bytes in, rate

Total bytes received per second.

DEPENDENT envoy.http.downstream_cx_rx_bytes_total.rate["{#CONN_MANAGER}"]

Preprocessing:

- PROMETHEUS_PATTERN: envoy_http_downstream_cx_rx_bytes_total{envoy_http_conn_manager_prefix = "{#CONN_MANAGER}"}: function: sum

- CHANGE_PER_SECOND

Envoy Proxy Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Bytes out, rate

Total bytes sent per second.

DEPENDENT envoy.http.downstream_cx_tx_bytes_tota.rate["{#CONN_MANAGER}"]

Preprocessing:

- PROMETHEUS_PATTERN: envoy_http_downstream_cx_tx_bytes_total{envoy_http_conn_manager_prefix = "{#CONN_MANAGER}"}: function: sum

- CHANGE_PER_SECOND

Zabbix raw items Envoy Proxy: Get node metrics

Get server metrics.

HTTP_AGENT envoy.get_metrics

Preprocessing:

- CHECK_NOT_SUPPORTED

⛔️ON_FAIL: DISCARD_VALUE ->

Triggers

Name | Description | Expression | Severity | Dependencies and additional info
Envoy Proxy: Server state is not live | - | last(/Envoy Proxy by HTTP/envoy.server.state) > 0 | AVERAGE | -
Envoy Proxy: Service has been restarted | Uptime is less than 10 minutes. | last(/Envoy Proxy by HTTP/envoy.server.uptime)<10m | INFO | Manual close: YES
Envoy Proxy: Failed to fetch metrics data | Zabbix has not received data for items for the last 10 minutes. | nodata(/Envoy Proxy by HTTP/envoy.server.uptime,10m)=1 | WARNING | Manual close: YES
Envoy Proxy: SSL certificate expires soon | Please check certificate. Less than {$ENVOY.CERT.MIN} days left until the next certificate being managed will expire. | last(/Envoy Proxy by HTTP/envoy.server.days_until_first_cert_expiring)<{$ENVOY.CERT.MIN} | WARNING | -
Envoy Proxy: There are unhealthy clusters | - | last(/Envoy Proxy by HTTP/envoy.cluster.membership_unhealthy["{#CLUSTER_NAME}"]) > 0 | AVERAGE | -
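
To make the trigger conditions concrete, the sketch below applies the comparisons from the expressions above to one snapshot of latest values. It is a simplified model: the `nodata()` trigger is left out because it depends on timestamps rather than values, the snapshot is invented, and `fired_triggers` is a hypothetical helper.

```python
def fired_triggers(sample, cert_min_days=7):
    """Return the trigger names whose condition holds for one snapshot.

    `cert_min_days` stands in for {$ENVOY.CERT.MIN} (default 7).
    """
    fired = []
    if sample["envoy.server.state"] > 0:
        fired.append("Server state is not live")
    if sample["envoy.server.uptime"] < 10 * 60:  # 10m expressed in seconds
        fired.append("Service has been restarted")
    if sample["envoy.server.days_until_first_cert_expiring"] < cert_min_days:
        fired.append("SSL certificate expires soon")
    for cluster, unhealthy in sample["membership_unhealthy"].items():
        if unhealthy > 0:
            fired.append(f"There are unhealthy clusters ({cluster})")
    return fired


# Hypothetical snapshot: recently restarted, one cluster with an unhealthy member.
snapshot = {
    "envoy.server.state": 0,
    "envoy.server.uptime": 120,
    "envoy.server.days_until_first_cert_expiring": 30,
    "membership_unhealthy": {"service_a": 1, "service_b": 0},
}

problems = fired_triggers(snapshot)
```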

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help with it at the Zabbix forums.
