Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/vault_http?at=release/7.0
HashiCorp Vault by HTTP
Overview
This template monitors HashiCorp Vault with Zabbix and works without any external scripts. Most of the metrics are collected in a single request, thanks to Zabbix bulk data collection.
The template Vault by HTTP collects metrics by HTTP agent from the /sys/metrics API endpoint.
See https://www.vaultproject.io/api-docs/system/metrics.
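As a rough illustration of the kind of request behind the metrics collection, the sketch below builds the endpoint URL from the connection macros and reads it with the token passed in the X-Vault-Token header. The /v1 prefix and the format=prometheus query parameter come from the Vault HTTP API; the function names and the host vault.example.com are placeholders, and the exact URL used by the template is an assumption here.

```python
import urllib.request


def build_metrics_url(scheme: str, host: str, port: int) -> str:
    """Build the Vault metrics URL (assumed layout, Prometheus output format)."""
    return f"{scheme}://{host}:{port}/v1/sys/metrics?format=prometheus"


def fetch_metrics(url: str, token: str, timeout: float = 5.0) -> str:
    """Fetch the raw metrics text; the token goes into the X-Vault-Token header."""
    request = urllib.request.Request(url, headers={"X-Vault-Token": token})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder values standing in for {$VAULT.API.SCHEME}, {$VAULT.HOST}
    # and {$VAULT.API.PORT}.
    print(build_metrics_url("http", "vault.example.com", 8200))
```

Calling fetch_metrics against a live server requires a token that is allowed to read the metrics endpoint.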
Requirements
Zabbix version: 7.0 and higher.
Tested versions
This template has been tested on:
- Vault 1.6
Configuration
Zabbix should be configured according to the instructions in the Templates out of the box section.
Setup
See Zabbix template operation for basic instructions.
Configure Vault API. See Vault Configuration.
Create a Vault service token and set it in the macro {$VAULT.TOKEN}.
Macros used
Name | Description | Default |
---|---|---|
{$VAULT.API.PORT} | Vault port. | 8200 |
{$VAULT.API.SCHEME} | Vault API scheme. | http |
{$VAULT.HOST} | Vault host name. | <PUT YOUR VAULT HOST> |
{$VAULT.OPEN.FDS.MAX.WARN} | Maximum percentage of used file descriptors for trigger expression. | 90 |
{$VAULT.LEADERSHIP.SETUP.FAILED.MAX.WARN} | Maximum number of Vault leadership setup failures. | 5 |
{$VAULT.LEADERSHIP.LOSSES.MAX.WARN} | Maximum number of Vault leadership losses. | 5 |
{$VAULT.LEADERSHIP.STEPDOWNS.MAX.WARN} | Maximum number of Vault leadership step downs. | 5 |
{$VAULT.LLD.FILTER.STORAGE.MATCHES} | Filter of discoverable storage backends. | .+ |
{$VAULT.TOKEN} | Vault auth token. | <PUT YOUR AUTH TOKEN> |
{$VAULT.TOKEN.ACCESSORS} | Vault accessors separated by spaces for monitoring token expiration time. | |
{$VAULT.TOKEN.TTL.MIN.CRIT} | Token TTL critical threshold. | 3d |
{$VAULT.TOKEN.TTL.MIN.WARN} | Token TTL warning threshold. | 7d |
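The TTL thresholds above use Zabbix time suffixes (3d, 7d). To make the comparison against a TTL in seconds concrete, here is a minimal sketch of the conversion. The helper name to_seconds is hypothetical, and the suffix table covers only the standard Zabbix time suffixes s, m, h, d and w.

```python
import re

# Zabbix time suffixes and their length in seconds.
_SUFFIX_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def to_seconds(value: str) -> int:
    """Convert a value such as '3d' or '600' into seconds."""
    match = re.fullmatch(r"(\d+)([smhdw]?)", value.strip())
    if match is None:
        raise ValueError(f"unsupported time value: {value!r}")
    number, suffix = match.groups()
    # A bare number is already in seconds.
    return int(number) * _SUFFIX_SECONDS.get(suffix, 1)


if __name__ == "__main__":
    print(to_seconds("3d"), to_seconds("7d"))
```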
Items
Name | Description | Type | Key and additional info |
---|---|---|---|
Get health | | HTTP agent | vault.get_health |
Get leader | | HTTP agent | vault.get_leader |
Get metrics | | HTTP agent | vault.get_metrics |
Clear metrics | | Dependent item | vault.clear_metrics |
Get tokens | Get information about tokens via their accessors. Accessors are defined in the macro "{$VAULT.TOKEN.ACCESSORS}". | Script | vault.get_tokens |
Check WAL discovery | | Dependent item | vault.check_wal_discovery |
Check replication discovery | | Dependent item | vault.check_replication_discovery |
Check storage discovery | | Dependent item | vault.check_storage_discovery |
Check mountpoint discovery | | Dependent item | vault.check_mountpoint_discovery |
Initialized | Initialization status. | Dependent item | vault.health.initialized |
Sealed | Seal status. | Dependent item | vault.health.sealed |
Standby | Standby status. | Dependent item | vault.health.standby |
Performance standby | Performance standby status. | Dependent item | vault.health.performance_standby |
Performance replication | Performance replication mode https://www.vaultproject.io/docs/enterprise/replication | Dependent item | vault.health.replication_performance_mode |
Disaster Recovery replication | Disaster recovery replication mode https://www.vaultproject.io/docs/enterprise/replication | Dependent item | vault.health.replication_dr_mode |
Version | Server version. | Dependent item | vault.health.version |
Healthcheck | Vault healthcheck. | Dependent item | vault.health.check |
HA enabled | HA enabled status. | Dependent item | vault.leader.ha_enabled |
Is leader | Leader status. | Dependent item | vault.leader.is_self |
Get metrics error | Get metrics error. | Dependent item | vault.get_metrics.error |
Process CPU seconds, total | Total user and system CPU time spent in seconds. | Dependent item | vault.metrics.process.cpu.seconds.total |
Open file descriptors, max | Maximum number of open file descriptors. | Dependent item | vault.metrics.process.max.fds |
Open file descriptors, current | Number of open file descriptors. | Dependent item | vault.metrics.process.open.fds |
Process resident memory | Resident memory size in bytes. | Dependent item | vault.metrics.process.resident_memory.bytes |
Uptime | Server uptime. | Dependent item | vault.metrics.process.uptime |
Process virtual memory, current | Virtual memory size in bytes. | Dependent item | vault.metrics.process.virtual_memory.bytes |
Process virtual memory, max | Maximum amount of virtual memory available in bytes. | Dependent item | vault.metrics.process.virtual_memory.max.bytes |
Audit log requests, rate | Number of all audit log requests across all audit log devices. | Dependent item | vault.metrics.audit.log.request.rate |
Audit log request failures, rate | Number of audit log request failures. | Dependent item | vault.metrics.audit.log.request.failure.rate |
Audit log response, rate | Number of audit log responses across all audit log devices. | Dependent item | vault.metrics.audit.log.response.rate |
Audit log response failures, rate | Number of audit log response failures. | Dependent item | vault.metrics.audit.log.response.failure.rate |
Barrier DELETE ops, rate | Number of DELETE operations at the barrier. | Dependent item | vault.metrics.barrier.delete.rate |
Barrier GET ops, rate | Number of GET operations at the barrier. | Dependent item | vault.metrics.vault.barrier.get.rate |
Barrier LIST ops, rate | Number of LIST operations at the barrier. | Dependent item | vault.metrics.barrier.list.rate |
Barrier PUT ops, rate | Number of PUT operations at the barrier. | Dependent item | vault.metrics.barrier.put.rate |
Cache hit, rate | Number of times a value was retrieved from the LRU cache. | Dependent item | vault.metrics.cache.hit.rate |
Cache miss, rate | Number of times a value was not in the LRU cache. This results in a read from the configured storage. | Dependent item | vault.metrics.cache.miss.rate |
Cache write, rate | Number of times a value was written to the LRU cache. | Dependent item | vault.metrics.cache.write.rate |
Check token, rate | Number of token checks handled by Vault core. | Dependent item | vault.metrics.core.check.token.rate |
Fetch ACL and token, rate | Number of ACL and corresponding token entry fetches handled by Vault core. | Dependent item | vault.metrics.core.fetch.acl_and_token |
Requests, rate | Number of requests handled by Vault core. | Dependent item | vault.metrics.core.handle.request |
Leadership setup failed, counter | Cluster leadership setup failures which have occurred in a highly available Vault cluster. | Dependent item | vault.metrics.core.leadership.setup_failed |
Leadership setup lost, counter | Cluster leadership losses which have occurred in a highly available Vault cluster. | Dependent item | vault.metrics.core.leadership_lost |
Post-unseal ops, counter | Duration of time taken by post-unseal operations handled by Vault core. | Dependent item | vault.metrics.core.post_unseal |
Pre-seal ops, counter | Duration of time taken by pre-seal operations. | Dependent item | vault.metrics.core.pre_seal |
Requested seal ops, counter | Duration of time taken by requested seal operations. | Dependent item | vault.metrics.core.seal_with_request |
Seal ops, counter | Duration of time taken by seal operations. | Dependent item | vault.metrics.core.seal |
Internal seal ops, counter | Duration of time taken by internal seal operations. | Dependent item | vault.metrics.core.seal_internal |
Leadership step downs, counter | Cluster leadership step downs. | Dependent item | vault.metrics.core.step_down |
Unseal ops, counter | Duration of time taken by unseal operations. | Dependent item | vault.metrics.core.unseal |
Fetch lease times, counter | Time taken to fetch lease times. | Dependent item | vault.metrics.expire.fetch.lease.times |
Fetch lease times by token, counter | Time taken to fetch lease times by token. | Dependent item | vault.metrics.expire.fetch.lease.times.by_token |
Number of expiring leases | Number of all leases which are eligible for eventual expiry. | Dependent item | vault.metrics.expire.num_leases |
Expire revoke, count | Time taken to revoke a token. | Dependent item | vault.metrics.expire.revoke |
Expire revoke force, count | Time taken to forcibly revoke a token. | Dependent item | vault.metrics.expire.revoke.force |
Expire revoke prefix, count | Time taken to revoke tokens on a prefix. | Dependent item | vault.metrics.expire.revoke.prefix |
Revoke secrets by token, count | Time taken to revoke all secrets issued with a given token. | Dependent item | vault.metrics.expire.revoke.by_token |
Expire renew, count | Time taken to renew a lease. | Dependent item | vault.metrics.expire.renew |
Renew token, count | Time taken to renew a token which does not need to invoke a logical backend. | Dependent item | vault.metrics.expire.renew_token |
Register ops, count | Time taken for register operations. | Dependent item | vault.metrics.expire.register |
Register auth ops, count | Time taken for register authentication operations which create lease entries without lease ID. | Dependent item | vault.metrics.expire.register.auth |
Policy GET ops, rate | Number of operations to get a policy. | Dependent item | vault.metrics.policy.get_policy.rate |
Policy LIST ops, rate | Number of operations to list policies. | Dependent item | vault.metrics.policy.list_policies.rate |
Policy DELETE ops, rate | Number of operations to delete a policy. | Dependent item | vault.metrics.policy.delete_policy.rate |
Policy SET ops, rate | Number of operations to set a policy. | Dependent item | vault.metrics.policy.set_policy.rate |
Token create, count | The time taken to create a token. | Dependent item | vault.metrics.token.create |
Token createAccessor, count | The time taken to create a token accessor. | Dependent item | vault.metrics.token.createAccessor |
Token lookup, rate | Number of token lookups. | Dependent item | vault.metrics.token.lookup.rate |
Token revoke, count | The time taken to revoke a token. | Dependent item | vault.metrics.token.revoke |
Token revoke tree, count | Time taken to revoke a token tree. | Dependent item | vault.metrics.token.revoke.tree |
Token store, count | Time taken to store an updated token entry without writing to the secondary index. | Dependent item | vault.metrics.token.store |
Runtime allocated bytes | Number of bytes allocated by the Vault process. This could burst from time to time, but should return to a steady state value. | Dependent item | vault.metrics.runtime.alloc.bytes |
Runtime freed objects | Number of freed objects. | Dependent item | vault.metrics.runtime.free.count |
Runtime heap objects | Number of objects on the heap. This is a good general memory pressure indicator worth establishing a baseline and thresholds for alerting. | Dependent item | vault.metrics.runtime.heap.objects |
Runtime malloc count | Cumulative count of allocated heap objects. | Dependent item | vault.metrics.runtime.malloc.count |
Runtime num goroutines | Number of goroutines. This serves as a general system load indicator worth establishing a baseline and thresholds for alerting. | Dependent item | vault.metrics.runtime.num_goroutines |
Runtime sys bytes | Number of bytes allocated to Vault. This includes what is being used by Vault's heap and what has been reclaimed but not given back to the operating system. | Dependent item | vault.metrics.runtime.sys.bytes |
Runtime GC pause, total | The total garbage collector pause time since Vault was last started. | Dependent item | vault.metrics.total.gc.pause |
Runtime GC runs, total | Total number of garbage collection runs since Vault was last started. | Dependent item | vault.metrics.runtime.total.gc.runs |
Token count, total | Total number of service tokens available for use; counts all un-expired and un-revoked tokens in Vault's token store. This measurement is performed every 10 minutes. | Dependent item | vault.metrics.token |
Token count by auth, total | Total number of service tokens that were created by an auth method. | Dependent item | vault.metrics.token.by_auth |
Token count by policy, total | Total number of service tokens that have a policy attached. | Dependent item | vault.metrics.token.by_policy |
Token count by ttl, total | Number of service tokens, grouped by the TTL range they were assigned at creation. | Dependent item | vault.metrics.token.by_ttl |
Token creation, rate | Number of service or batch tokens created. | Dependent item | vault.metrics.token.creation.rate |
Secret kv entries | Number of entries in each key-value secret engine. | Dependent item | vault.metrics.secret.kv.count |
Token secret lease creation, rate | Counts the number of leases created by secret engines. | Dependent item | vault.metrics.secret.lease.creation.rate |
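The Get tokens item reads its accessors from {$VAULT.TOKEN.ACCESSORS}, a space-separated list. As a hedged sketch of how such a value could be turned into lookup requests, the code below splits the macro value and prepares one JSON body per accessor for Vault's auth/token/lookup-accessor endpoint. Whether the template's script uses exactly this endpoint is an assumption, and the function names and accessor strings are hypothetical.

```python
import json
from typing import List


def split_accessors(macro_value: str) -> List[str]:
    """Split the space-separated macro value, ignoring repeated whitespace."""
    return macro_value.split()


def lookup_payloads(macro_value: str) -> List[str]:
    """Build one JSON request body per accessor (assumed request shape)."""
    return [json.dumps({"accessor": a}) for a in split_accessors(macro_value)]


if __name__ == "__main__":
    # Made-up accessor values for illustration only.
    for body in lookup_payloads("accessor-one accessor-two"):
        print(body)
```

An empty macro value yields no payloads, which corresponds to the empty default of the macro.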
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Vault server is sealed | https://www.vaultproject.io/docs/concepts/seal | last(/HashiCorp Vault by HTTP/vault.health.sealed)=1 | Average | |
Version has changed | Vault version has changed. Acknowledge to close the problem manually. | last(/HashiCorp Vault by HTTP/vault.health.version,#1)<>last(/HashiCorp Vault by HTTP/vault.health.version,#2) and length(last(/HashiCorp Vault by HTTP/vault.health.version))>0 | Info | Manual close: Yes |
Vault server is not responding | | last(/HashiCorp Vault by HTTP/vault.health.check)=0 | High | |
Failed to get metrics | | length(last(/HashiCorp Vault by HTTP/vault.get_metrics.error))>0 | Warning | Depends on: |
Current number of open files is too high | | min(/HashiCorp Vault by HTTP/vault.metrics.process.open.fds,5m)/last(/HashiCorp Vault by HTTP/vault.metrics.process.max.fds)*100>{$VAULT.OPEN.FDS.MAX.WARN} | Warning | |
has been restarted | Uptime is less than 10 minutes. | last(/HashiCorp Vault by HTTP/vault.metrics.process.uptime)<10m | Info | Manual close: Yes |
High frequency of leadership setup failures | There have been more than {$VAULT.LEADERSHIP.SETUP.FAILED.MAX.WARN} Vault leadership setup failures in the past 1h. | (max(/HashiCorp Vault by HTTP/vault.metrics.core.leadership.setup_failed,1h)-min(/HashiCorp Vault by HTTP/vault.metrics.core.leadership.setup_failed,1h))>{$VAULT.LEADERSHIP.SETUP.FAILED.MAX.WARN} | Average | |
High frequency of leadership losses | There have been more than {$VAULT.LEADERSHIP.LOSSES.MAX.WARN} Vault leadership losses in the past 1h. | (max(/HashiCorp Vault by HTTP/vault.metrics.core.leadership_lost,1h)-min(/HashiCorp Vault by HTTP/vault.metrics.core.leadership_lost,1h))>{$VAULT.LEADERSHIP.LOSSES.MAX.WARN} | Average | |
High frequency of leadership step downs | There have been more than {$VAULT.LEADERSHIP.STEPDOWNS.MAX.WARN} Vault leadership step downs in the past 1h. | (max(/HashiCorp Vault by HTTP/vault.metrics.core.step_down,1h)-min(/HashiCorp Vault by HTTP/vault.metrics.core.step_down,1h))>{$VAULT.LEADERSHIP.STEPDOWNS.MAX.WARN} | Average | |
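To make the arithmetic in two of these expressions concrete, here is a small sketch that mirrors them on plain Python lists. It illustrates the formulas only and does not reproduce how Zabbix evaluates triggers; the function names and sample values are hypothetical, and the default thresholds are the macro defaults listed earlier.

```python
from typing import Sequence


def open_fds_too_high(open_fds_5m: Sequence[float], max_fds: float,
                      warn_percent: float = 90.0) -> bool:
    """Mirror min(open.fds,5m) / last(max.fds) * 100 > {$VAULT.OPEN.FDS.MAX.WARN}."""
    return min(open_fds_5m) / max_fds * 100 > warn_percent


def counter_increase_exceeds(samples_1h: Sequence[float],
                             threshold: float = 5) -> bool:
    """Mirror max(counter,1h) - min(counter,1h) > threshold for the leadership counters."""
    return max(samples_1h) - min(samples_1h) > threshold


if __name__ == "__main__":
    # Every sample in the window must be above 90% of the limit to fire.
    print(open_fds_too_high([950, 930, 990], max_fds=1024))
    # The counter grew by 6 within the window, which is more than 5.
    print(counter_increase_exceeds([10, 12, 16]))
```

Using min over the window means a single brief spike in open file descriptors does not fire the trigger, while the max minus min form measures how much a cumulative counter grew within the hour.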
LLD rule Storage metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Storage metrics discovery | Storage backend metrics discovery. | Dependent item | vault.storage.discovery |
Item prototypes for Storage metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Storage [{#STORAGE}] {#OPERATION} ops, rate | Number of {#OPERATION} operations against the {#STORAGE} storage backend. | Dependent item | vault.metrics.storage.rate[{#STORAGE}, {#OPERATION}] |
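Discovered storage backends are filtered by the regular expression in {$VAULT.LLD.FILTER.STORAGE.MATCHES}, whose default .+ accepts any non-empty name. A minimal sketch of that filtering idea follows. The backend names and the function name are hypothetical, and treating the filter as a partial regular expression match is an assumption.

```python
import re
from typing import Iterable, List


def filter_storages(names: Iterable[str], pattern: str = ".+") -> List[str]:
    """Keep only the storage names that match the filter expression."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


if __name__ == "__main__":
    # The default pattern keeps every non-empty name.
    print(filter_storages(["consul", "raft"]))
    # A stricter pattern narrows discovery to a single backend.
    print(filter_storages(["consul", "raft"], "^raft$"))
```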
LLD rule Mountpoint metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Mountpoint metrics discovery | Mountpoint metrics discovery. | Dependent item | vault.mountpoint.discovery |
Item prototypes for Mountpoint metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Rollback attempt [{#MOUNTPOINT}] ops, rate | Number of operations to perform a rollback operation on the given mount point. | Dependent item | vault.metrics.rollback.attempt.rate[{#MOUNTPOINT}] |
Route rollback [{#MOUNTPOINT}] ops, rate | Number of operations to dispatch a rollback operation to a backend, and for that backend to process it. Rollback operations are automatically scheduled to clean up partial errors. | Dependent item | vault.metrics.route.rollback.rate[{#MOUNTPOINT}] |
LLD rule WAL metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
WAL metrics discovery | Discovery for WAL metrics. | Dependent item | vault.wal.discovery |
Item prototypes for WAL metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Delete WALs, count{#SINGLETON} | Time taken to delete a Write Ahead Log (WAL). | Dependent item | vault.metrics.wal.deletewals[{#SINGLETON}] |
GC deleted WAL{#SINGLETON} | Number of Write Ahead Logs (WAL) deleted during each garbage collection run. | Dependent item | vault.metrics.wal.gc.deleted[{#SINGLETON}] |
WALs on disk, total{#SINGLETON} | Total number of Write Ahead Logs (WAL) on disk. | Dependent item | vault.metrics.wal.gc.total[{#SINGLETON}] |
Load WALs, count{#SINGLETON} | Time taken to load a Write Ahead Log (WAL). | Dependent item | vault.metrics.wal.loadWAL[{#SINGLETON}] |
Persist WALs, count{#SINGLETON} | Time taken to persist a Write Ahead Log (WAL). | Dependent item | vault.metrics.wal.persistwals[{#SINGLETON}] |
Flush ready WAL, count{#SINGLETON} | Time taken to flush a ready Write Ahead Log (WAL) to storage. | Dependent item | vault.metrics.wal.flushready[{#SINGLETON}] |
LLD rule Replication metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Replication metrics discovery | Discovery for replication metrics. | Dependent item | vault.replication.discovery |
Item prototypes for Replication metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Stream WAL missing guard, count{#SINGLETON} | Number of incidences where the starting Merkle Tree index used to begin streaming WAL entries is not matched/found. | Dependent item | vault.metrics.logshipper.streamWALs.missing_guard[{#SINGLETON}] |
Stream WAL guard found, count{#SINGLETON} | Number of incidences where the starting Merkle Tree index used to begin streaming WAL entries is matched/found. | Dependent item | vault.metrics.logshipper.streamWALs.guard_found[{#SINGLETON}] |
Merkle commit index{#SINGLETON} | The last committed index in the Merkle Tree. | Dependent item | vault.metrics.replication.merkle.commit_index[{#SINGLETON}] |
Last WAL{#SINGLETON} | The index of the last WAL. | Dependent item | vault.metrics.replication.wal.last_wal[{#SINGLETON}] |
Last DR WAL{#SINGLETON} | The index of the last DR WAL. | Dependent item | vault.metrics.replication.wal.last_dr_wal[{#SINGLETON}] |
Last performance WAL{#SINGLETON} | The index of the last Performance WAL. | Dependent item | vault.metrics.replication.wal.last_performance_wal[{#SINGLETON}] |
Last remote WAL{#SINGLETON} | The index of the last remote WAL. | Dependent item | vault.metrics.replication.fsm.last_remote_wal[{#SINGLETON}] |
LLD rule Token metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Token metrics discovery | Token metrics discovery. | Dependent item | vault.tokens.discovery |
Item prototypes for Token metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Token [{#TOKEN_NAME}] error | Token lookup error text. | Dependent item | vault.token_via_accessor.error["{#ACCESSOR}"] |
Token [{#TOKEN_NAME}] has TTL | Whether the token has a TTL. | Dependent item | vault.token_via_accessor.has_ttl["{#ACCESSOR}"] |
Token [{#TOKEN_NAME}] TTL | The TTL period of the token. | Dependent item | vault.token_via_accessor.ttl["{#ACCESSOR}"] |
Trigger prototypes for Token metrics discovery
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Token [{#TOKEN_NAME}] lookup error occurred | | length(last(/HashiCorp Vault by HTTP/vault.token_via_accessor.error["{#ACCESSOR}"]))>0 | Warning | Depends on: |
Token [{#TOKEN_NAME}] will expire soon | | last(/HashiCorp Vault by HTTP/vault.token_via_accessor.has_ttl["{#ACCESSOR}"])=1 and last(/HashiCorp Vault by HTTP/vault.token_via_accessor.ttl["{#ACCESSOR}"])<{$VAULT.TOKEN.TTL.MIN.CRIT} | Average | |
Token [{#TOKEN_NAME}] will expire soon | | last(/HashiCorp Vault by HTTP/vault.token_via_accessor.has_ttl["{#ACCESSOR}"])=1 and last(/HashiCorp Vault by HTTP/vault.token_via_accessor.ttl["{#ACCESSOR}"])<{$VAULT.TOKEN.TTL.MIN.WARN} | Warning | Depends on: |
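The two expiry triggers share the same condition and differ only in the threshold. The sketch below shows that logic as a single function, with the default thresholds of 3 and 7 days expressed in seconds. The function name is hypothetical, and returning only the more severe result when both conditions hold is an assumption about the dependency between the two triggers.

```python
from typing import Optional

DAY = 86400
CRIT_SECONDS = 3 * DAY  # default of {$VAULT.TOKEN.TTL.MIN.CRIT}
WARN_SECONDS = 7 * DAY  # default of {$VAULT.TOKEN.TTL.MIN.WARN}


def token_ttl_severity(has_ttl: int, ttl_seconds: int,
                       crit: int = CRIT_SECONDS,
                       warn: int = WARN_SECONDS) -> Optional[str]:
    """Return the severity that would fire for a token, or None."""
    if has_ttl != 1:
        # Tokens without a TTL never expire, so neither trigger applies.
        return None
    if ttl_seconds < crit:
        return "Average"
    if ttl_seconds < warn:
        return "Warning"
    return None


if __name__ == "__main__":
    print(token_ttl_severity(1, 2 * DAY))
    print(token_ttl_severity(1, 5 * DAY))
    print(token_ttl_severity(0, 100))
```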
Feedback
Please report any issues with the template at https://support.zabbix.com.
You can also provide feedback, discuss the template, or ask for help on the Zabbix forums.