Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/db/cockroachdb_http?at=release/6.2
CockroachDB by HTTP
Overview
For Zabbix version: 6.2 and higher
This template monitors CockroachDB nodes with Zabbix and works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template CockroachDB node by HTTP collects metrics by HTTP agent from the Prometheus endpoint and the health endpoints.
This template was tested on:
- CockroachDB, version 21.2.8
Setup
See Zabbix template operation for basic instructions.
Internal node metrics are collected from the Prometheus /_status/vars endpoint. Node health metrics are collected from the /health and /health?ready=1 endpoints. The template doesn't require a session token.
Don't forget to change the macro {$COCKROACHDB.API.SCHEME} according to your setup (secure or insecure node). Also, see the Macros section for a list of macros used to set trigger values.
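As a rough illustration of how the three endpoints relate to the scheme and port macros, the sketch below builds the URLs the template is expected to poll. The function name `endpoint_urls` and the host name are hypothetical; the defaults mirror the documented macro defaults (http, 8080).

```python
from urllib.parse import urlunsplit


def endpoint_urls(host, scheme="http", port=8080):
    """Build the metrics, health, and readiness URLs for one node.

    scheme and port stand in for {$COCKROACHDB.API.SCHEME} and
    {$COCKROACHDB.API.PORT}; this helper is illustrative only.
    """
    if scheme not in ("http", "https"):
        raise ValueError("scheme must be 'http' or 'https'")
    netloc = f"{host}:{port}"
    return {
        # Prometheus-format internal node metrics
        "metrics": urlunsplit((scheme, netloc, "/_status/vars", "", "")),
        # Basic node health
        "health": urlunsplit((scheme, netloc, "/health", "", "")),
        # Readiness check
        "readiness": urlunsplit((scheme, netloc, "/health", "ready=1", "")),
    }
```

For a secure node, passing `scheme="https"` corresponds to setting {$COCKROACHDB.API.SCHEME} to https.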
NOTE. Some metrics may not be collected depending on your CockroachDB version and configuration.
Zabbix configuration
No specific Zabbix configuration is required.
Macros used
Name | Description | Default |
---|---|---|
{$COCKROACHDB.API.PORT} | The port of CockroachDB API and Prometheus endpoint. | 8080 |
{$COCKROACHDB.API.SCHEME} | Request scheme which may be http or https. | http |
{$COCKROACHDB.CERT.CA.EXPIRY.WARN} | Number of days until the CA certificate expires. | 90 |
{$COCKROACHDB.CERT.NODE.EXPIRY.WARN} | Number of days until the node certificate expires. | 30 |
{$COCKROACHDB.CLOCK.OFFSET.MAX.WARN} | Maximum clock offset of the node against the rest of the cluster in milliseconds for trigger expression. | 300 |
{$COCKROACHDB.OPEN.FDS.MAX.WARN} | Maximum percentage of used file descriptors. | 80 |
{$COCKROACHDB.STATEMENTS.ERRORS.MAX.WARN} | Maximum number of SQL statements errors for trigger expression. | 2 |
{$COCKROACHDB.STORE.USED.MIN.CRIT} | The critical threshold of the available disk space in percent. | 10 |
{$COCKROACHDB.STORE.USED.MIN.WARN} | The warning threshold of the available disk space in percent. | 20 |
Template links
There are no template links in this template.
Discovery rules
Name | Description | Type | Key and additional info |
---|---|---|---|
Storage metrics discovery | Discover per store metrics. | DEPENDENT | cockroachdb.store.discovery Preprocessing: - PROMETHEUS_TO_JSON: - DISCARD_UNCHANGED_HEARTBEAT: |
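To make the discovery step more concrete, here is a minimal sketch of how per-store entities could be derived from Prometheus text output. It is not the PROMETHEUS_TO_JSON implementation; it assumes the stores are identified by a label named `store`, and the sample metric lines are made up for illustration.

```python
import re

# Hypothetical sample of Prometheus exposition text (not real node output).
SAMPLE = """\
# HELP capacity Total storage capacity
# TYPE capacity gauge
capacity{store="1"} 1000
capacity{store="2"} 2000
sys_uptime 120
"""

# Matches label pairs such as store="1" inside the braces.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def discover_stores(text, label="store"):
    """Return low-level-discovery style rows, one per distinct store label."""
    stores = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and samples without a label set.
        if not line or line.startswith("#"):
            continue
        if "{" not in line or "}" not in line:
            continue
        labels_part = line[line.index("{") + 1 : line.rindex("}")]
        labels = dict(LABEL_RE.findall(labels_part))
        value = labels.get(label)
        if value is not None and value not in stores:
            stores.append(value)
    return [{"{#STORE}": s} for s in stores]
```

Each returned row plays the role of one {#STORE} value, from which the per-store item prototypes listed below would be created.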
Items collected
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
CockroachDB | CockroachDB: Service ping | Check if HTTP/HTTPS service accepts TCP connections. | SIMPLE | net.tcp.service["{$COCKROACHDB.API.SCHEME}","{HOST.CONN}","{$COCKROACHDB.API.PORT}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
CockroachDB | CockroachDB: Clock offset | Mean clock offset of the node against the rest of the cluster. | DEPENDENT | cockroachdb.clock.offset Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Version | Build information. | DEPENDENT | cockroachdb.version Preprocessing: - PROMETHEUS_PATTERN: - DISCARD_UNCHANGED_HEARTBEAT: |
CockroachDB | CockroachDB: CPU: System time | System CPU time. | DEPENDENT | cockroachdb.cpu.system_time Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: CPU: User time | User CPU time. | DEPENDENT | cockroachdb.cpu.user_time Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: CPU: Utilization | CPU utilization in %. | DEPENDENT | cockroachdb.cpu.util Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Disk: IOPS in progress, rate | Number of disk IO operations currently in progress on this host. | DEPENDENT | cockroachdb.disk.iops.in_progress.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Disk: Reads, rate | Bytes read from all disks per second since this process started | DEPENDENT | cockroachdb.disk.read.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Disk: Read IOPS, rate | Number of disk read operations per second across all disks since this process started. | DEPENDENT | cockroachdb.disk.iops.read.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Disk: Writes, rate | Bytes written to all disks per second since this process started. | DEPENDENT | cockroachdb.disk.write.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Disk: Write IOPS, rate | Disk write operations per second across all disks since this process started. | DEPENDENT | cockroachdb.disk.iops.write.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: File descriptors: Limit | Open file descriptors soft limit of the process. | DEPENDENT | cockroachdb.descriptors.limit Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: File descriptors: Open | The number of open file descriptors. | DEPENDENT | cockroachdb.descriptors.open Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: GC: Pause time | The amount of processor time used by Go's garbage collector across all nodes. During garbage collection, application code execution is paused. | DEPENDENT | cockroachdb.gc.pause_time Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: GC: Runs, rate | The number of times that Go's garbage collector was invoked per second across all nodes. | DEPENDENT | cockroachdb.gc.runs.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Go: Goroutines count | Current number of Goroutines. This count should rise and fall based on load. | DEPENDENT | cockroachdb.go.goroutines.count Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: KV transactions: Aborted, rate | Number of aborted KV transactions per second. | DEPENDENT | cockroachdb.kv.transactions.aborted.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: KV transactions: Committed, rate | Number of KV transactions (including 1PC) committed per second. | DEPENDENT | cockroachdb.kv.transactions.committed.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Live nodes count | The number of live nodes in the cluster (will be 0 if this node is not itself live). | DEPENDENT | cockroachdb.live_count Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Liveness heartbeats, rate | Number of successful node liveness heartbeats per second from this node. | DEPENDENT | cockroachdb.heartbeaths.success.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Memory: Allocated by Cgo | Current bytes of memory allocated by the C layer. | DEPENDENT | cockroachdb.memory.cgo.allocated Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Memory: Allocated by Go | Current bytes of memory allocated by the Go layer. | DEPENDENT | cockroachdb.memory.go.allocated Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Memory: Managed by Cgo | Total bytes of memory managed by the C layer. | DEPENDENT | cockroachdb.memory.cgo.managed Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Memory: Managed by Go | Total bytes of memory managed by the Go layer. | DEPENDENT | cockroachdb.memory.go.managed Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Memory: Total usage | Resident set size (RSS) of memory in use by the node. | DEPENDENT | cockroachdb.memory.total Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Network: Bytes received, rate | Bytes received per second on all network interfaces since this process started. | DEPENDENT | cockroachdb.network.bytes.received.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Network: Bytes sent, rate | Bytes sent per second on all network interfaces since this process started. | DEPENDENT | cockroachdb.network.bytes.sent.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Time series: Sample errors, rate | The number of errors encountered while attempting to write metrics to disk, per second. | DEPENDENT | cockroachdb.ts.samples.errors.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Time series: Samples written, rate | The number of successfully written metric samples per second. | DEPENDENT | cockroachdb.ts.samples.written.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Slow requests: DistSender RPCs | Number of RPCs stuck or retrying for a long time. | DEPENDENT | cockroachdb.slow_requests.rpc Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: SQL: Bytes received, rate | Total amount of incoming SQL client network traffic in bytes per second. | DEPENDENT | cockroachdb.sql.bytes.received.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: SQL: Bytes sent, rate | Total amount of outgoing SQL client network traffic in bytes per second. | DEPENDENT | cockroachdb.sql.bytes.sent.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Memory: Allocated by SQL | Current SQL statement memory usage for root. | DEPENDENT | cockroachdb.memory.sql Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: SQL: Schema changes, rate | Total number of SQL DDL statements successfully executed per second. | DEPENDENT | cockroachdb.sql.schema_changes.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: SQL sessions: Open | Total number of open SQL sessions. | DEPENDENT | cockroachdb.sql.sessions Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: SQL statements: Active | Total number of SQL statements currently active. | DEPENDENT | cockroachdb.sql.statements.active Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: SQL statements: DELETE, rate | A moving average of the number of DELETE statements successfully executed per second. | DEPENDENT | cockroachdb.sql.statements.delete.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: SQL statements: Executed, rate | Number of SQL queries executed per second. | DEPENDENT | cockroachdb.sql.statements.executed.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: SQL statements: Denials, rate | The number of statements denied per second by a feature flag. | DEPENDENT | cockroachdb.sql.statements.denials.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: SQL statements: Active flows distributed, rate | The number of distributed SQL flows currently active per second. | DEPENDENT | cockroachdb.sql.statements.flows.active.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: SQL statements: INSERT, rate | A moving average of the number of INSERT statements successfully executed per second. | DEPENDENT | cockroachdb.sql.statements.insert.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: SQL statements: SELECT, rate | A moving average of the number of SELECT statements successfully executed per second. | DEPENDENT | cockroachdb.sql.statements.select.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: SQL statements: UPDATE, rate | A moving average of the number of UPDATE statements successfully executed per second. | DEPENDENT | cockroachdb.sql.statements.update.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: SQL statements: Contention, rate | Total number of SQL statements that experienced contention per second. | DEPENDENT | cockroachdb.sql.statements.contention.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: SQL statements: Errors, rate | Total number of statements which returned a planning or runtime error per second. | DEPENDENT | cockroachdb.sql.statements.errors.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: SQL transactions: Open | Total number of currently open SQL transactions. | DEPENDENT | cockroachdb.sql.transactions.open Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: SQL transactions: Aborted, rate | Total number of SQL transaction abort errors per second. | DEPENDENT | cockroachdb.sql.transactions.aborted.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: SQL transactions: Committed, rate | Total number of SQL transaction COMMIT statements successfully executed per second. | DEPENDENT | cockroachdb.sql.transactions.committed.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: SQL transactions: Initiated, rate | Total number of SQL transaction BEGIN statements successfully executed per second. | DEPENDENT | cockroachdb.sql.transactions.initiated.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: SQL transactions: Rolled back, rate | Total number of SQL transaction ROLLBACK statements successfully executed per second. | DEPENDENT | cockroachdb.sql.transactions.rollbacks.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Uptime | Process uptime. | DEPENDENT | cockroachdb.uptime Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Node certificate expiration date | Node certificate expires at that date. | DEPENDENT | cockroachdb.cert.expire_date.node Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: CA certificate expiration date | CA certificate expires at that date. | DEPENDENT | cockroachdb.cert.expire_date.ca Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: Bytes: Live | Number of logical bytes stored in live key-value pairs on this node. Live data excludes historical and deleted data. | DEPENDENT | cockroachdb.storage.bytes.[{#STORE},live] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: Bytes: System | Number of physical bytes stored in system key-value pairs. | DEPENDENT | cockroachdb.storage.bytes.[{#STORE},system] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: Capacity available | Available storage capacity. | DEPENDENT | cockroachdb.storage.capacity.[{#STORE},available] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: Capacity total | Total storage capacity. This value may be explicitly set using --store. If a store size has not been set, this metric displays the actual disk capacity. | DEPENDENT | cockroachdb.storage.capacity.[{#STORE},total] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: Capacity used | Disk space in use by CockroachDB data on this node. This excludes the Cockroach binary, operating system, and other system files. | DEPENDENT | cockroachdb.storage.capacity.[{#STORE},used] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: Capacity available in % | Available storage capacity in %. | CALCULATED | cockroachdb.storage.capacity.[{#STORE},available_percent] Expression: last(//cockroachdb.storage.capacity.[{#STORE},available]) / last(//cockroachdb.storage.capacity.[{#STORE},total]) * 100 |
CockroachDB | CockroachDB: Storage [{#STORE}]: Replication: Lease holders | Number of lease holders. | DEPENDENT | cockroachdb.replication.[{#STORE},lease_holders] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: Bytes: Logical | Number of logical bytes stored in key-value pairs on this node. This includes historical and deleted data. | DEPENDENT | cockroachdb.storage.bytes.[{#STORE},logical] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: Rebalancing: Average queries, rate | Number of kv-level requests received per second by the store, averaged over a large time period as used in rebalancing decisions. | DEPENDENT | cockroachdb.rebalancing.queries.average.[{#STORE},rate] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: Rebalancing: Average writes, rate | Number of keys written (i.e. applied by raft) per second to the store, averaged over a large time period as used in rebalancing decisions. | DEPENDENT | cockroachdb.rebalancing.writes.average.[{#STORE},rate] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: Queue processing failures: Consistency, rate | Number of replicas which failed processing in the consistency checker queue per second. | DEPENDENT | cockroachdb.queue.processing_failures.consistency.[{#STORE},rate] Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Storage [{#STORE}]: Queue processing failures: GC, rate | Number of replicas which failed processing in the GC queue per second. | DEPENDENT | cockroachdb.queue.processing_failures.gc.[{#STORE},rate] Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Storage [{#STORE}]: Queue processing failures: Raft log, rate | Number of replicas which failed processing in the Raft log queue per second. | DEPENDENT | cockroachdb.queue.processing_failures.raftlog.[{#STORE},rate] Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Storage [{#STORE}]: Queue processing failures: Raft snapshot, rate | Number of replicas which failed processing in the Raft repair queue per second. | DEPENDENT | cockroachdb.queue.processing_failures.raftsnapshot.[{#STORE},rate] Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Storage [{#STORE}]: Queue processing failures: Replica GC, rate | Number of replicas which failed processing in the replica GC queue per second. | DEPENDENT | cockroachdb.queue.processing_failures.gc_replica.[{#STORE},rate] Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Storage [{#STORE}]: Queue processing failures: Replicate, rate | Number of replicas which failed processing in the replicate queue per second. | DEPENDENT | cockroachdb.queue.processing_failures.replicate.[{#STORE},rate] Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Storage [{#STORE}]: Queue processing failures: Split, rate | Number of replicas which failed processing in the split queue per second. | DEPENDENT | cockroachdb.queue.processing_failures.split.[{#STORE},rate] Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Storage [{#STORE}]: Queue processing failures: Time series maintenance, rate | Number of replicas which failed processing in the time series maintenance queue per second. | DEPENDENT | cockroachdb.queue.processing_failures.tsmaintenance.[{#STORE},rate] Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Storage [{#STORE}]: Ranges count | Number of ranges. | DEPENDENT | cockroachdb.ranges.[{#STORE},count] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: Ranges unavailable | Number of ranges with fewer live replicas than needed for quorum. | DEPENDENT | cockroachdb.ranges.[{#STORE},unavailable] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: Ranges underreplicated | Number of ranges with fewer live replicas than the replication target. | DEPENDENT | cockroachdb.ranges.[{#STORE},underreplicated] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: RocksDB read amplification | The average number of real read operations executed per logical read operation. | DEPENDENT | cockroachdb.rocksdb.[{#STORE},read_amp] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: RocksDB cache hits, rate | Count of block cache hits per second. | DEPENDENT | cockroachdb.rocksdb.cache.hits.[{#STORE},rate] Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Storage [{#STORE}]: RocksDB cache misses, rate | Count of block cache misses per second. | DEPENDENT | cockroachdb.rocksdb.cache.misses.[{#STORE},rate] Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Storage [{#STORE}]: RocksDB cache hit ratio | Block cache hit ratio in %. | CALCULATED | cockroachdb.rocksdb.cache.[{#STORE},hit_ratio] Expression: last(//cockroachdb.rocksdb.cache.hits.[{#STORE},rate]) / (last(//cockroachdb.rocksdb.cache.hits.[{#STORE},rate]) + last(//cockroachdb.rocksdb.cache.misses.[{#STORE},rate])) * 100 |
CockroachDB | CockroachDB: Storage [{#STORE}]: Replication: Replicas | Number of replicas. | DEPENDENT | cockroachdb.replication.replicas.[{#STORE},count] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: Replication: Replicas quiesced | Number of quiesced replicas. | DEPENDENT | cockroachdb.replication.replicas.[{#STORE},quiesced] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: Slow requests: Latch acquisitions | Number of requests that have been stuck for a long time acquiring latches. | DEPENDENT | cockroachdb.slow_requests.[{#STORE},latch_acquisitions] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: Slow requests: Lease acquisitions | Number of requests that have been stuck for a long time acquiring a lease. | DEPENDENT | cockroachdb.slow_requests.[{#STORE},lease_acquisitions] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: Slow requests: Raft proposals | Number of requests that have been stuck for a long time in raft. | DEPENDENT | cockroachdb.slow_requests.[{#STORE},raft_proposals] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: RocksDB SSTables | The number of SSTables in use. | DEPENDENT | cockroachdb.rocksdb.[{#STORE},sstables] Preprocessing: - PROMETHEUS_PATTERN: |
Zabbix raw items | CockroachDB: Get metrics | Get raw metrics from the Prometheus endpoint. | HTTP_AGENT | cockroachdb.get_metrics Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: |
Zabbix raw items | CockroachDB: Get health | Get node /health endpoint | HTTP_AGENT | cockroachdb.get_health Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: - REGEX: - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix raw items | CockroachDB: Get readiness | Get node /health?ready=1 endpoint | HTTP_AGENT | cockroachdb.get_readiness Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: - REGEX: - DISCARD_UNCHANGED_HEARTBEAT: |
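The rate items and the two CALCULATED items in the table above boil down to simple arithmetic. The sketch below mirrors that arithmetic in Python for illustration only; the function names are hypothetical, and returning `None` for an unusable sample is a choice made for this sketch rather than a description of Zabbix internals.

```python
def change_per_second(prev_value, prev_ts, value, ts):
    """Rate between two counter samples: (value - prev_value) / (ts - prev_ts).

    Returns None when no time has elapsed or the counter went backwards
    (for example after a node restart), so no rate can be computed.
    """
    elapsed = ts - prev_ts
    if elapsed <= 0 or value < prev_value:
        return None
    return (value - prev_value) / elapsed


def available_percent(available, total):
    """Mirror of the 'Capacity available in %' expression."""
    if total == 0:
        return None  # sketch choice: avoid division by zero
    return available / total * 100


def cache_hit_ratio(hits_rate, misses_rate):
    """Mirror of the 'RocksDB cache hit ratio' expression."""
    lookups = hits_rate + misses_rate
    if lookups == 0:
        return None  # sketch choice: no cache traffic, ratio undefined
    return hits_rate / lookups * 100
```

For example, a counter that moves from 100 to 160 over 60 seconds gives a rate of 1 per second, and a store with 250 bytes available out of 1000 is 25% available.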
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
CockroachDB: Service is down | - | last(/CockroachDB by HTTP/net.tcp.service["{$COCKROACHDB.API.SCHEME}","{HOST.CONN}","{$COCKROACHDB.API.PORT}"]) = 0 | AVERAGE | |
CockroachDB: Clock offset is too high | Cockroach-measured clock offset is nearing limit (by default, servers kill themselves at 400ms from the mean). | min(/CockroachDB by HTTP/cockroachdb.clock.offset,5m) > {$COCKROACHDB.CLOCK.OFFSET.MAX.WARN} * 0.001 | WARNING | |
CockroachDB: Version has changed | - | last(/CockroachDB by HTTP/cockroachdb.version) <> last(/CockroachDB by HTTP/cockroachdb.version,#2) and length(last(/CockroachDB by HTTP/cockroachdb.version)) > 0 | INFO | |
CockroachDB: Current number of open files is too high | Getting close to open file descriptor limit. | min(/CockroachDB by HTTP/cockroachdb.descriptors.open,10m) / last(/CockroachDB by HTTP/cockroachdb.descriptors.limit) * 100 > {$COCKROACHDB.OPEN.FDS.MAX.WARN} | WARNING | |
CockroachDB: Node is not executing SQL | Node is not executing SQL despite having connections. | last(/CockroachDB by HTTP/cockroachdb.sql.sessions) > 0 and last(/CockroachDB by HTTP/cockroachdb.sql.statements.executed.rate) = 0 | WARNING | |
CockroachDB: SQL statements errors rate is too high | - | min(/CockroachDB by HTTP/cockroachdb.sql.statements.errors.rate,5m) > {$COCKROACHDB.STATEMENTS.ERRORS.MAX.WARN} | WARNING | |
CockroachDB: Node has been restarted | Uptime is less than 10 minutes. | last(/CockroachDB by HTTP/cockroachdb.uptime) < 10m | INFO | |
CockroachDB: Failed to fetch node data | Zabbix has not received data for items for the last 5 minutes. | nodata(/CockroachDB by HTTP/cockroachdb.uptime,5m) = 1 | WARNING | Depends on: - CockroachDB: Service is down |
CockroachDB: Node certificate expires soon | Node certificate expires soon. | (last(/CockroachDB by HTTP/cockroachdb.cert.expire_date.node) - now()) / 86400 < {$COCKROACHDB.CERT.NODE.EXPIRY.WARN} | WARNING | |
CockroachDB: CA certificate expires soon | CA certificate expires soon. | (last(/CockroachDB by HTTP/cockroachdb.cert.expire_date.ca) - now()) / 86400 < {$COCKROACHDB.CERT.CA.EXPIRY.WARN} | WARNING | |
CockroachDB: Storage [{#STORE}]: Available storage capacity is low | Storage is running low on free space (less than {$COCKROACHDB.STORE.USED.MIN.WARN}% available). | max(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) < {$COCKROACHDB.STORE.USED.MIN.WARN} Recovery expression: min(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) > {$COCKROACHDB.STORE.USED.MIN.WARN} | WARNING | Depends on: - CockroachDB: Storage [{#STORE}]: Available storage capacity is critically low |
CockroachDB: Storage [{#STORE}]: Available storage capacity is critically low | Storage is running critically low on free space (less than {$COCKROACHDB.STORE.USED.MIN.CRIT}% available). | max(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) < {$COCKROACHDB.STORE.USED.MIN.CRIT} Recovery expression: min(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) > {$COCKROACHDB.STORE.USED.MIN.CRIT} | AVERAGE | |
CockroachDB: Node is unhealthy | Node's /health endpoint has returned HTTP 500 Internal Server Error which indicates unhealthy mode. | last(/CockroachDB by HTTP/cockroachdb.get_health) = 500 | AVERAGE | Depends on: - CockroachDB: Service is down |
CockroachDB: Node is not ready | Node's /health?ready=1 endpoint has returned HTTP 503 Service Unavailable. Possible reasons: - node is in the wait phase of the node shutdown sequence; - node is unable to communicate with a majority of the other nodes in the cluster, likely because the cluster is unavailable due to too many nodes being down. | last(/CockroachDB by HTTP/cockroachdb.get_readiness) = 503 and last(/CockroachDB by HTTP/cockroachdb.uptime) > 5m | AVERAGE | Depends on: - CockroachDB: Service is down |
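Two of the trigger patterns in the table above are easier to follow as plain arithmetic: the certificate expiry check, and the storage triggers that use a separate recovery expression. The sketch below is an illustrative reading of those expressions, not Zabbix code; the function names are hypothetical, and `samples` stands for the item values seen in the 5-minute window.

```python
SECONDS_PER_DAY = 86400


def cert_expires_soon(expire_ts, now_ts, warn_days):
    """(expire_date - now) / 86400 < warn_days, with Unix timestamps."""
    return (expire_ts - now_ts) / SECONDS_PER_DAY < warn_days


def storage_trigger_state(was_problem, samples, threshold):
    """Return True if the storage trigger is in the problem state.

    Problem expression:  max(samples) < threshold
    Recovery expression: min(samples) > threshold
    The gap between the two keeps the trigger from flapping when the
    available percentage hovers around the threshold.
    """
    if not was_problem:
        # Fires only when every sample in the window is below the threshold.
        return max(samples) < threshold
    # Stays in problem until every sample is above the threshold.
    return not (min(samples) > threshold)
```

With the default warning threshold of 20, a window of 18, 19, 17 opens the problem; a later window of 19, 22, 25 keeps it open because one sample is still below 20, and a window of 21, 22, 25 closes it.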
Feedback
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help with it at the Zabbix forums.