ClickHouse

ClickHouse is an open-source column-oriented DBMS for online analytical processing developed by the Russian IT company Yandex for the Yandex.Metrica web analytics service. ClickHouse allows analysis of data that is updated in real time. The system is marketed for high performance.

Available solutions




Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/db/clickhouse_http


Template DB ClickHouse by HTTP

Overview

For Zabbix version: 5.0
This template monitors ClickHouse with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

This template was tested on:

  • ClickHouse, version 19.14+, 20.3+

Setup

Create a user to monitor the service:

Create the file /etc/clickhouse-server/users.d/zabbix.xml:
<yandex>
  <users>
    <zabbix>
      <password>zabbix_pass</password>
      <networks incl="networks" />
      <profile>web</profile>
      <quota>default</quota>
      <allow_databases>
        <database>test</database>
      </allow_databases>
    </zabbix>
  </users>
</yandex>

Login and password are also set in macros:

  • {$CLICKHOUSE.USER}
  • {$CLICKHOUSE.PASSWORD}

If you don't need authentication, remove the headers from the HTTP agent items.
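As a rough illustration of what an authenticated request to the ClickHouse HTTP interface looks like, the sketch below builds one with the credentials passed as headers. `X-ClickHouse-User` and `X-ClickHouse-Key` are header names that ClickHouse accepts for HTTP authentication; the host name, the query, and the helper `build_request` are assumptions made for this example and are not taken from the template.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(host, query, user=None, password=None, scheme="http", port=8123):
    """Build (but do not send) a GET request for the ClickHouse HTTP interface."""
    url = f"{scheme}://{host}:{port}/?{urlencode({'query': query})}"
    headers = {}
    if user is not None:
        # Credentials travel as headers; leave both out when no authentication is needed.
        headers["X-ClickHouse-User"] = user
        headers["X-ClickHouse-Key"] = password or ""
    return Request(url, headers=headers)


# Hypothetical host; user and password mirror the template's default macro values.
req = build_request("clickhouse.example.com", "SELECT 1", "zabbix", "zabbix_pass")
print(req.full_url)
```

Calling `build_request` without `user` yields a request with no authentication headers, which corresponds to removing the headers from the items.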

Zabbix configuration

No specific Zabbix configuration is required.

Macros used

Name Description Default
{$CLICKHOUSE.DELAYED.FILES.DISTRIBUTED.COUNT.MAX.WARN}

Maximum size of distributed files queue to insert for trigger expression.

600
{$CLICKHOUSE.DELAYED.INSERTS.MAX.WARN}

Maximum number of delayed inserts for trigger expression.

0
{$CLICKHOUSE.LLD.FILTER.DB.MATCHES}

Filter of discoverable databases

.*
{$CLICKHOUSE.LLD.FILTER.DB.NOT_MATCHES}

Filter to exclude discovered databases

CHANGE_IF_NEEDED
{$CLICKHOUSE.LLD.FILTER.DICT.MATCHES}

Filter of discoverable dictionaries

.*
{$CLICKHOUSE.LLD.FILTER.DICT.NOT_MATCHES}

Filter to exclude discovered dictionaries

CHANGE_IF_NEEDED
{$CLICKHOUSE.LOG_POSITION.DIFF.MAX.WARN}

Maximum diff between log_pointer and log_max_index.

30
{$CLICKHOUSE.NETWORK.ERRORS.MAX.WARN}

Maximum number of network errors for trigger expression.

5
{$CLICKHOUSE.PARTS.PER.PARTITION.WARN}

Maximum number of parts per partition for trigger expression.

300
{$CLICKHOUSE.PASSWORD}

-

zabbix_pass
{$CLICKHOUSE.PORT}

The port of ClickHouse HTTP endpoint

8123
{$CLICKHOUSE.QUERY_TIME.MAX.WARN}

Maximum ClickHouse query time in seconds for trigger expression

600
{$CLICKHOUSE.QUEUE.SIZE.MAX.WARN}

Maximum size of the queue for operations waiting to be performed for trigger expression.

20
{$CLICKHOUSE.REPLICA.MAX.WARN}

Replication lag across all tables for trigger expression.

600
{$CLICKHOUSE.SCHEME}

Request scheme which may be http or https

http
{$CLICKHOUSE.USER}

-

zabbix

Template links

There are no template links in this template.

Discovery rules

Name Description Type Key and additional info
Tables

Info about tables

DEPENDENT clickhouse.tables.discovery

Filter:

AND

- A: {#DB} MATCHES_REGEX {$CLICKHOUSE.LLD.FILTER.DB.MATCHES}

- B: {#DB} NOT_MATCHES_REGEX {$CLICKHOUSE.LLD.FILTER.DB.NOT_MATCHES}

Replicas

Info about replicas

DEPENDENT clickhouse.replicas.discovery

Filter:

AND

- A: {#DB} MATCHES_REGEX {$CLICKHOUSE.LLD.FILTER.DB.MATCHES}

- B: {#DB} NOT_MATCHES_REGEX {$CLICKHOUSE.LLD.FILTER.DB.NOT_MATCHES}

Dictionaries

Info about dictionaries

DEPENDENT clickhouse.dictionaries.discovery

Filter:

AND

- A: {#NAME} MATCHES_REGEX {$CLICKHOUSE.LLD.FILTER.DICT.MATCHES}

- B: {#NAME} NOT_MATCHES_REGEX {$CLICKHOUSE.LLD.FILTER.DICT.NOT_MATCHES}
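The filters above combine two conditions with AND: the discovered name must match the first regular expression and must not match the second. A minimal sketch of that logic follows, assuming Python's `re` module behaves closely enough to Zabbix regular expressions for simple patterns; the helper name `passes_filter` and the database names are hypothetical.

```python
import re


def passes_filter(value, matches, not_matches):
    """Return True when value satisfies both conditions (A AND B)."""
    cond_a = re.search(matches, value) is not None   # A: MATCHES_REGEX
    cond_b = re.search(not_matches, value) is None   # B: NOT_MATCHES_REGEX
    return cond_a and cond_b


databases = ["default", "system", "test"]
# Default macro values: everything matches and nothing is excluded.
print([d for d in databases if passes_filter(d, r".*", r"CHANGE_IF_NEEDED")])
# Overriding the NOT_MATCHES macro to exclude the system database.
print([d for d in databases if passes_filter(d, r".*", r"^system$")])
```

With the defaults, the `CHANGE_IF_NEEDED` placeholder is unlikely to match a real database or dictionary name, so nothing is filtered out until the macro is changed.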

Items collected

Group Name Description Type Key and additional info
ClickHouse ClickHouse: Longest currently running query time

Get longest running query.

HTTP_AGENT clickhouse.process.elapsed
ClickHouse ClickHouse: Ping HTTP_AGENT clickhouse.ping

Preprocessing:

- REGEX: Ok\. 1

⛔️ON_FAIL: CUSTOM_VALUE -> 0

- DISCARD_UNCHANGED_HEARTBEAT: 10m
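The first two preprocessing steps of the ping item can be read as: return 1 when the response body contains `Ok.`, otherwise fall back to 0. The sketch below mimics that reading in Python; the function name `ping_value` is hypothetical and the heartbeat step is left out.

```python
import re


def ping_value(body):
    # REGEX step: pattern "Ok\." with output "1"; ON_FAIL replaces a failed match with 0.
    return 1 if re.search(r"Ok\.", body) else 0


print(ping_value("Ok.\n"))
print(ping_value(""))
```

Mapping a failed match to 0 instead of an error is what lets the "Service is down" trigger compare the last value with 0.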

ClickHouse ClickHouse: Version

Version of the server

HTTP_AGENT clickhouse.version

Preprocessing:

- DISCARD_UNCHANGED_HEARTBEAT: 1d
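The idea behind DISCARD_UNCHANGED_HEARTBEAT can be sketched as a small filter: a value is kept when it differs from the last kept value, or when the heartbeat period has elapsed since a value was last kept. This is an illustrative model rather than the Zabbix implementation; the class name and the exact boundary handling are assumptions.

```python
class HeartbeatFilter:
    """Keep a value if it changed or if `heartbeat` seconds passed since the last kept one."""

    def __init__(self, heartbeat):
        self.heartbeat = heartbeat
        self.last_value = None
        self.last_kept_at = None

    def keep(self, value, now):
        first = self.last_kept_at is None
        changed = first or value != self.last_value
        expired = (not first) and now - self.last_kept_at >= self.heartbeat
        if changed or expired:
            self.last_value, self.last_kept_at = value, now
            return True
        return False


one_day = 24 * 60 * 60
version_filter = HeartbeatFilter(one_day)
print(version_filter.keep("20.3.8", now=0))    # first value is kept
print(version_filter.keep("20.3.8", now=60))   # unchanged within the heartbeat
```

With a one-day heartbeat, an unchanged version string is stored roughly once a day, which keeps history small while still proving the item is alive.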

ClickHouse ClickHouse: Revision

Revision of the server.

DEPENDENT clickhouse.revision

Preprocessing:

- JSONPATH: $[?(@.metric == "Revision")].value.first()

ClickHouse ClickHouse: Uptime

Number of seconds since ClickHouse server start

DEPENDENT clickhouse.uptime

Preprocessing:

- JSONPATH: $[?(@.metric == "Uptime")].value.first()

ClickHouse ClickHouse: New queries per second

Number of queries to be interpreted and potentially executed. Does not include queries that failed to parse or were rejected due to AST size limits, quota limits or limits on the number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries.

DEPENDENT clickhouse.query.rate

Preprocessing:

- JSONPATH: $[?(@.event == "Query")].value.first()

⛔️ON_FAIL: CUSTOM_VALUE -> 0

- CHANGE_PER_SECOND
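The per-second rate items follow one pattern: pick the counter for a named event out of the system.events array, fall back to 0 if it is absent, then divide the difference between two consecutive readings by the elapsed time. A minimal sketch of that pipeline follows; the sample rows, timestamps, and function names are hypothetical.

```python
def event_value(rows, name):
    """Rough equivalent of $[?(@.event == name)].value.first() with ON_FAIL -> 0."""
    for row in rows:
        if row.get("event") == name:
            return float(row["value"])
    return 0.0  # custom value used when the event is not present


def change_per_second(prev, curr):
    """prev and curr are (timestamp_seconds, value) pairs."""
    (t0, v0), (t1, v1) = prev, curr
    return (v1 - v0) / (t1 - t0)


# Hypothetical readings of the events array taken 60 seconds apart.
sample_1 = [{"event": "Query", "value": "1000"}, {"event": "SelectQuery", "value": "700"}]
sample_2 = [{"event": "Query", "value": "1600"}, {"event": "SelectQuery", "value": "1000"}]
rate = change_per_second((0, event_value(sample_1, "Query")),
                         (60, event_value(sample_2, "Query")))
print(rate)
```

Because the counters in system.events only grow while the server runs, the rate is the useful quantity; the raw value mostly reflects uptime.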

ClickHouse ClickHouse: New SELECT queries per second

Number of SELECT queries to be interpreted and potentially executed. Does not include queries that failed to parse or were rejected due to AST size limits, quota limits or limits on the number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries.

DEPENDENT clickhouse.select_query.rate

Preprocessing:

- JSONPATH: $[?(@.event == "SelectQuery")].value.first()

⛔️ON_FAIL: CUSTOM_VALUE -> 0

- CHANGE_PER_SECOND

ClickHouse ClickHouse: New INSERT queries per second

Number of INSERT queries to be interpreted and potentially executed. Does not include queries that failed to parse or were rejected due to AST size limits, quota limits or limits on the number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries.

DEPENDENT clickhouse.insert_query.rate

Preprocessing:

- JSONPATH: $[?(@.event == "InsertQuery")].value.first()

⛔️ON_FAIL: CUSTOM_VALUE -> 0

- CHANGE_PER_SECOND

ClickHouse ClickHouse: Delayed insert queries

"Number of INSERT queries that are throttled due to high number of active data parts for partition in a MergeTree table."

DEPENDENT clickhouse.insert.delay

Preprocessing:

- JSONPATH: $[?(@.metric == "DelayedInserts")].value.first()

ClickHouse ClickHouse: Current running queries

Number of executing queries

DEPENDENT clickhouse.query.current

Preprocessing:

- JSONPATH: $[?(@.metric == "Query")].value.first()

ClickHouse ClickHouse: Current running merges

Number of executing background merges

DEPENDENT clickhouse.merge.current

Preprocessing:

- JSONPATH: $[?(@.metric == "Merge")].value.first()

ClickHouse ClickHouse: Inserted bytes per second

The number of uncompressed bytes inserted in all tables.

DEPENDENT clickhouse.inserted_bytes.rate

Preprocessing:

- JSONPATH: $[?(@.event == "InsertedBytes")].value.first()

⛔️ON_FAIL: CUSTOM_VALUE -> 0

- CHANGE_PER_SECOND

ClickHouse ClickHouse: Read bytes per second

"Number of bytes (the number of bytes before decompression) read from compressed sources (files, network)."

DEPENDENT clickhouse.read_bytes.rate

Preprocessing:

- JSONPATH: $[?(@.event == "ReadCompressedBytes")].value.first()

⛔️ON_FAIL: CUSTOM_VALUE -> 0

- CHANGE_PER_SECOND

ClickHouse ClickHouse: Inserted rows per second

The number of rows inserted in all tables.

DEPENDENT clickhouse.inserted_rows.rate

Preprocessing:

- JSONPATH: $[?(@.event == "InsertedRows")].value.first()

⛔️ON_FAIL: CUSTOM_VALUE -> 0

- CHANGE_PER_SECOND

ClickHouse ClickHouse: Merged rows per second

Rows read for background merges.

DEPENDENT clickhouse.merge_rows.rate

Preprocessing:

- JSONPATH: $[?(@.event == "MergedRows")].value.first()

⛔️ON_FAIL: CUSTOM_VALUE -> 0

- CHANGE_PER_SECOND

ClickHouse ClickHouse: Uncompressed bytes merged per second

Uncompressed bytes that were read for background merges

DEPENDENT clickhouse.merge_bytes.rate

Preprocessing:

- JSONPATH: $[?(@.event == "MergedUncompressedBytes")].value.first()

⛔️ON_FAIL: CUSTOM_VALUE -> 0

- CHANGE_PER_SECOND

ClickHouse ClickHouse: Max count of parts per partition across all tables

"The ClickHouse MergeTree table engine splits each INSERT query into partitions (PARTITION BY expression) and adds one or more parts per INSERT inside each partition; a background merge process then runs."

DEPENDENT clickhouse.max.part.count.for.partition

Preprocessing:

- JSONPATH: $[?(@.metric == "MaxPartCountForPartition")].value.first()

ClickHouse ClickHouse: Current TCP connections

Number of connections to TCP server (clients with native interface).

DEPENDENT clickhouse.connections.tcp

Preprocessing:

- JSONPATH: $[?(@.metric == "TCPConnection")].value.first()

ClickHouse ClickHouse: Current HTTP connections

Number of connections to HTTP server.

DEPENDENT clickhouse.connections.http

Preprocessing:

- JSONPATH: $[?(@.metric == "HTTPConnection")].value.first()

ClickHouse ClickHouse: Current distribute connections

Number of connections to remote servers sending data that was INSERTed into Distributed tables.

DEPENDENT clickhouse.connections.distribute

Preprocessing:

- JSONPATH: $[?(@.metric == "DistributedSend")].value.first()

ClickHouse ClickHouse: Current MySQL connections

Number of connections to MySQL server.

DEPENDENT clickhouse.connections.mysql

Preprocessing:

- JSONPATH: $[?(@.metric == "MySQLConnection")].value.first()

⛔️ON_FAIL: CUSTOM_VALUE -> 0

ClickHouse ClickHouse: Current Interserver connections

Number of connections from other replicas to fetch parts.

DEPENDENT clickhouse.connections.interserver

Preprocessing:

- JSONPATH: $[?(@.metric == "InterserverConnection")].value.first()

ClickHouse ClickHouse: Network errors per second

Network errors (timeouts and connection failures) during query execution, background pool tasks and DNS cache update.

DEPENDENT clickhouse.network.error.rate

Preprocessing:

- JSONPATH: $[?(@.event == "NetworkErrors")].value.first()

⛔️ON_FAIL: CUSTOM_VALUE -> 0

- CHANGE_PER_SECOND

ClickHouse ClickHouse: Read syscalls in fly

Number of read (read, pread, io_getevents, etc.) syscalls in flight.

DEPENDENT clickhouse.read

Preprocessing:

- JSONPATH: $[?(@.metric == "Read")].value.first()

ClickHouse ClickHouse: Write syscalls in fly

Number of write (write, pwrite, io_getevents, etc.) syscalls in flight.

DEPENDENT clickhouse.write

Preprocessing:

- JSONPATH: $[?(@.metric == "Write")].value.first()

ClickHouse ClickHouse: Allocated bytes

"Total number of bytes allocated by the application."

DEPENDENT clickhouse.jemalloc.allocated

Preprocessing:

- JSONPATH: $[?(@.metric == "jemalloc.allocated")].value.first()

ClickHouse ClickHouse: Resident memory

"Maximum number of bytes in physically resident data pages mapped by the allocator, comprising all pages dedicated to allocator metadata, pages backing active allocations, and unused dirty pages."

DEPENDENT clickhouse.jemalloc.resident

Preprocessing:

- JSONPATH: $[?(@.metric == "jemalloc.resident")].value.first()

ClickHouse ClickHouse: Mapped memory

"Total number of bytes in active extents mapped by the allocator."

DEPENDENT clickhouse.jemalloc.mapped

Preprocessing:

- JSONPATH: $[?(@.metric == "jemalloc.mapped")].value.first()

ClickHouse ClickHouse: Memory used for queries

"Total amount of memory (bytes) allocated in currently executing queries."

DEPENDENT clickhouse.memory.tracking

Preprocessing:

- JSONPATH: $[?(@.metric == "MemoryTracking")].value.first()

ClickHouse ClickHouse: Memory used for background merges

"Total amount of memory (bytes) allocated in the background processing pool (which is dedicated to background merges, mutations and fetches). Note that this value may include a drift when the memory was allocated in the context of the background processing pool and freed in another context, or vice versa. This happens naturally due to caches for table indexes and doesn't indicate memory leaks."

DEPENDENT clickhouse.memory.tracking.background

Preprocessing:

- JSONPATH: $[?(@.metric == "MemoryTrackingInBackgroundProcessingPool")].value.first()

ClickHouse ClickHouse: Memory used for backround moves

"Total amount of memory (bytes) allocated in the background processing pool (which is dedicated to background moves). Note that this value may include a drift when the memory was allocated in the context of the background processing pool and freed in another context, or vice versa. This happens naturally due to caches for table indexes and doesn't indicate memory leaks."

DEPENDENT clickhouse.memory.tracking.background.moves

Preprocessing:

- JSONPATH: $[?(@.metric == "MemoryTrackingInBackgroundMoveProcessingPool")].value.first()

⛔️ON_FAIL: CUSTOM_VALUE -> 0

ClickHouse ClickHouse: Memory used for background schedule pool

"Total amount of memory (bytes) allocated in background schedule pool (that is dedicated for bookkeeping tasks of Replicated tables)."

DEPENDENT clickhouse.memory.tracking.schedule.pool

Preprocessing:

- JSONPATH: $[?(@.metric == "MemoryTrackingInBackgroundSchedulePool")].value.first()

ClickHouse ClickHouse: Memory used for merges

"Total amount of memory (bytes) allocated for background merges. Included in MemoryTrackingInBackgroundProcessingPool. Note that this value may include a drift when the memory was allocated in the context of the background processing pool and freed in another context, or vice versa. This happens naturally due to caches for table indexes and doesn't indicate memory leaks."

DEPENDENT clickhouse.memory.tracking.merges

Preprocessing:

- JSONPATH: $[?(@.metric == "MemoryTrackingForMerges")].value.first()

ClickHouse ClickHouse: Current distributed files to insert

Number of pending files to process for asynchronous insertion into Distributed tables. Number of files for every shard is summed.

DEPENDENT clickhouse.distributed.files

Preprocessing:

- JSONPATH: $[?(@.metric == "DistributedFilesToInsert")].value.first()

ClickHouse ClickHouse: Distributed connection fail with retry per second

Connection retries in replicated DB connection pool

DEPENDENT clickhouse.distributed.files.retry.rate

Preprocessing:

- JSONPATH: $[?(@.metric == "DistributedConnectionFailTry")].value.first()

⛔️ON_FAIL: CUSTOM_VALUE -> 0

- CHANGE_PER_SECOND

ClickHouse ClickHouse: Distributed connection fail with retry per second

"Connection failures after all retries in replicated DB connection pool"

DEPENDENT clickhouse.distributed.files.fail.rate

Preprocessing:

- JSONPATH: $[?(@.metric == "DistributedConnectionFailAtAll")].value.first()

⛔️ON_FAIL: CUSTOM_VALUE -> 0

- CHANGE_PER_SECOND

ClickHouse ClickHouse: Replication lag across all tables

Maximum replica queue delay relative to current time

DEPENDENT clickhouse.replicas.max.absolute.delay

Preprocessing:

- JSONPATH: $[?(@.metric == "ReplicasMaxAbsoluteDelay")].value.first()

ClickHouse ClickHouse: Total replication tasks in queue DEPENDENT clickhouse.replicas.sum.queue.size

Preprocessing:

- JSONPATH: $[?(@.metric == "ReplicasSumQueueSize")].value.first()

ClickHouse ClickHouse: Total number read-only Replicas

"Number of Replicated tables that are currently in readonly state due to re-initialization after ZooKeeper session loss or due to startup without ZooKeeper configured."

DEPENDENT clickhouse.replicas.readonly.total

Preprocessing:

- JSONPATH: $[?(@.metric == "ReadonlyReplica")].value.first()

ClickHouse ClickHouse: {#DB}.{#TABLE}: Bytes

Table size in bytes. Database: {#DB}, table: {#TABLE}

DEPENDENT clickhouse.table.bytes["{#DB}.{#TABLE}"]

Preprocessing:

- JSONPATH: $[?(@.database == "{#DB}" && @.table == "{#TABLE}")].bytes.first()

ClickHouse ClickHouse: {#DB}.{#TABLE}: Parts

Number of parts of the table. Database: {#DB}, table: {#TABLE}

DEPENDENT clickhouse.table.parts["{#DB}.{#TABLE}"]

Preprocessing:

- JSONPATH: $[?(@.database == "{#DB}" && @.table == "{#TABLE}")].parts.first()

ClickHouse ClickHouse: {#DB}.{#TABLE}: Rows

Number of rows in the table. Database: {#DB}, table: {#TABLE}

DEPENDENT clickhouse.table.rows["{#DB}.{#TABLE}"]

Preprocessing:

- JSONPATH: $[?(@.database == "{#DB}" && @.table == "{#TABLE}")].rows.first()

ClickHouse ClickHouse: {#DB}: Bytes

Database size in bytes.

DEPENDENT clickhouse.db.bytes["{#DB}"]

Preprocessing:

- JSONPATH: $[?(@.database == "{#DB}")].bytes.sum()
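The table items and the database item read the same array in two ways: the per-table items take the first row matching both database and table, while the database size sums `bytes` over every row of that database. The following sketch shows the difference; the rows are hypothetical and only assume that each entry carries `database`, `table`, `bytes`, `parts` and `rows` fields.

```python
def table_metric(rows, db, table, field):
    """Like $[?(@.database == db && @.table == table)].<field>.first()."""
    for row in rows:
        if row["database"] == db and row["table"] == table:
            return int(row[field])
    raise LookupError(f"{db}.{table}")


def database_bytes(rows, db):
    """Like $[?(@.database == db)].bytes.sum()."""
    return sum(int(row["bytes"]) for row in rows if row["database"] == db)


# Hypothetical rows in the shape the tables info item is assumed to return.
rows = [
    {"database": "test", "table": "hits", "bytes": "4096", "parts": "3", "rows": "100"},
    {"database": "test", "table": "visits", "bytes": "1024", "parts": "1", "rows": "10"},
    {"database": "other", "table": "hits", "bytes": "512", "parts": "1", "rows": "5"},
]
print(table_metric(rows, "test", "hits", "parts"), database_bytes(rows, "test"))
```

Matching on both fields matters because the same table name can exist in several databases, as `hits` does in the sample rows.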

ClickHouse ClickHouse: {#DB}.{#TABLE}: Replica readonly

Whether the replica is in read-only mode.

This mode is turned on if the config doesn’t have sections with ZooKeeper, if an unknown error occurred when reinitializing sessions in ZooKeeper, and during session reinitialization in ZooKeeper.

DEPENDENT clickhouse.replica.is_readonly["{#DB}.{#TABLE}"]

Preprocessing:

- JSONPATH: $[?(@.database == "{#DB}" && @.table == "{#TABLE}")].is_readonly.first()

ClickHouse ClickHouse: {#DB}.{#TABLE}: Replica session expired

True if the ZooKeeper session expired

DEPENDENT clickhouse.replica.is_session_expired["{#DB}.{#TABLE}"]

Preprocessing:

- JSONPATH: $[?(@.database == "{#DB}" && @.table == "{#TABLE}")].is_session_expired.first()

ClickHouse ClickHouse: {#DB}.{#TABLE}: Replica future parts

Number of data parts that will appear as the result of INSERTs or merges that haven’t been done yet.

DEPENDENT clickhouse.replica.future_parts["{#DB}.{#TABLE}"]

Preprocessing:

- JSONPATH: $[?(@.database == "{#DB}" && @.table == "{#TABLE}")].future_parts.first()

ClickHouse ClickHouse: {#DB}.{#TABLE}: Replica parts to check

Number of data parts in the queue for verification. A part is put in the verification queue if there is suspicion that it might be damaged.

DEPENDENT clickhouse.replica.parts_to_check["{#DB}.{#TABLE}"]

Preprocessing:

- JSONPATH: $[?(@.database == "{#DB}" && @.table == "{#TABLE}")].parts_to_check.first()

ClickHouse ClickHouse: {#DB}.{#TABLE}: Replica queue size

Size of the queue for operations waiting to be performed.

DEPENDENT clickhouse.replica.queue_size["{#DB}.{#TABLE}"]

Preprocessing:

- JSONPATH: $[?(@.database == "{#DB}" && @.table == "{#TABLE}")].queue_size.first()

ClickHouse ClickHouse: {#DB}.{#TABLE}: Replica queue inserts size

Number of inserts of blocks of data that need to be made.

DEPENDENT clickhouse.replica.inserts_in_queue["{#DB}.{#TABLE}"]

Preprocessing:

- JSONPATH: $[?(@.database == "{#DB}" && @.table == "{#TABLE}")].inserts_in_queue.first()

ClickHouse ClickHouse: {#DB}.{#TABLE}: Replica queue merges size

Number of merges waiting to be made.

DEPENDENT clickhouse.replica.merges_in_queue["{#DB}.{#TABLE}"]

Preprocessing:

- JSONPATH: $[?(@.database == "{#DB}" && @.table == "{#TABLE}")].merges_in_queue.first()

ClickHouse ClickHouse: {#DB}.{#TABLE}: Replica log max index

Maximum entry number in the log of general activity. (Has a non-zero value only when there is an active session with ZooKeeper.)

DEPENDENT clickhouse.replica.log_max_index["{#DB}.{#TABLE}"]

Preprocessing:

- JSONPATH: $[?(@.database == "{#DB}" && @.table == "{#TABLE}")].log_max_index.first()

ClickHouse ClickHouse: {#DB}.{#TABLE}: Replica log pointer

Maximum entry number in the log of general activity that the replica copied to its execution queue, plus one. (Has a non-zero value only when there is an active session with ZooKeeper.)

DEPENDENT clickhouse.replica.log_pointer["{#DB}.{#TABLE}"]

Preprocessing:

- JSONPATH: $[?(@.database == "{#DB}" && @.table == "{#TABLE}")].log_pointer.first()

ClickHouse ClickHouse: {#DB}.{#TABLE}: Total replicas

Total number of known replicas of this table. (Has a non-zero value only when there is an active session with ZooKeeper.)

DEPENDENT clickhouse.replica.total_replicas["{#DB}.{#TABLE}"]

Preprocessing:

- JSONPATH: $[?(@.database == "{#DB}" && @.table == "{#TABLE}")].total_replicas.first()

ClickHouse ClickHouse: {#DB}.{#TABLE}: Active replicas

Number of replicas of this table that have a session in ZooKeeper (i.e., the number of functioning replicas). (Has a non-zero value only when there is an active session with ZooKeeper.)

DEPENDENT clickhouse.replica.active_replicas["{#DB}.{#TABLE}"]

Preprocessing:

- JSONPATH: $[?(@.database == "{#DB}" && @.table == "{#TABLE}")].active_replicas.first()

ClickHouse ClickHouse: Dictionary {#NAME}: Bytes allocated

The amount of RAM the dictionary uses.

DEPENDENT clickhouse.dictionary.bytes_allocated["{#NAME}"]

Preprocessing:

- JSONPATH: $[?(@.name == "{#NAME}")].bytes_allocated.first()

ClickHouse ClickHouse: Dictionary {#NAME}: Element count

Number of items stored in the dictionary.

DEPENDENT clickhouse.dictionary.element_count["{#NAME}"]

Preprocessing:

- JSONPATH: $[?(@.name == "{#NAME}")].element_count.first()

ClickHouse ClickHouse: Dictionary {#NAME}: Load factor

The percentage filled in the dictionary (for a hashed dictionary, the percentage filled in the hash table).

DEPENDENT clickhouse.dictionary.load_factor["{#NAME}"]

Preprocessing:

- JSONPATH: $[?(@.name == "{#NAME}")].load_factor.first()

- MULTIPLIER: 100

ClickHouse_ZooKeeper ClickHouse: ZooKeeper sessions

Number of sessions (connections) to ZooKeeper. Should be no more than one.

DEPENDENT clickhouse.zookeper.session

Preprocessing:

- JSONPATH: $[?(@.metric == "ZooKeeperSession")].value.first()

ClickHouse_ZooKeeper ClickHouse: ZooKeeper watches

Number of watches (e.g., event subscriptions) in ZooKeeper.

DEPENDENT clickhouse.zookeper.watch

Preprocessing:

- JSONPATH: $[?(@.metric == "ZooKeeperWatch")].value.first()

ClickHouse_ZooKeeper ClickHouse: ZooKeeper requests

Number of requests to ZooKeeper in progress.

DEPENDENT clickhouse.zookeper.request

Preprocessing:

- JSONPATH: $[?(@.metric == "ZooKeeperRequest")].value.first()

ClickHouse_ZooKeeper ClickHouse: ZooKeeper wait time

Time spent in waiting for ZooKeeper operations.

DEPENDENT clickhouse.zookeper.wait.time

Preprocessing:

- JSONPATH: $[?(@.event == "ZooKeeperWaitMicroseconds")].value.first()

⛔️ON_FAIL: CUSTOM_VALUE -> 0

- MULTIPLIER: 0.000001

- CHANGE_PER_SECOND

ClickHouse_ZooKeeper ClickHouse: ZooKeeper exeptions per second

Count of ZooKeeper exceptions that do not belong to user/hardware exceptions.

DEPENDENT clickhouse.zookeper.exeptions.rate

Preprocessing:

- JSONPATH: $[?(@.event == "ZooKeeperOtherExceptions")].value.first()

⛔️ON_FAIL: CUSTOM_VALUE -> 0

- CHANGE_PER_SECOND

ClickHouse_ZooKeeper ClickHouse: ZooKeeper hardware exeptions per second

Count of ZooKeeper exceptions caused by session moved/expired, connection loss, marshalling error, operation timed out and invalid zhandle state.

DEPENDENT clickhouse.zookeper.hw_exeptions.rate

Preprocessing:

- JSONPATH: $[?(@.event == "ZooKeeperHardwareExceptions")].value.first()

⛔️ON_FAIL: CUSTOM_VALUE -> 0

- CHANGE_PER_SECOND

ClickHouse_ZooKeeper ClickHouse: ZooKeeper user exeptions per second

Count of ZooKeeper exceptions caused by no znodes, bad version, node exists, node empty and no children for ephemeral.

DEPENDENT clickhouse.zookeper.user_exeptions.rate

Preprocessing:

- JSONPATH: $[?(@.event == "ZooKeeperUserExceptions")].value.first()

⛔️ON_FAIL: CUSTOM_VALUE -> 0

- CHANGE_PER_SECOND

Zabbix_raw_items ClickHouse: Get system.events

Get information about the number of events that have occurred in the system.

HTTP_AGENT clickhouse.system.events

Preprocessing:

- JSONPATH: $.data

Zabbix_raw_items ClickHouse: Get system.metrics

Get metrics that can be calculated instantly or that have a current value (format: JSONEachRow).

HTTP_AGENT clickhouse.system.metrics

Preprocessing:

- JSONPATH: $.data

Zabbix_raw_items ClickHouse: Get system.asynchronous_metrics

Get metrics that are calculated periodically in the background

HTTP_AGENT clickhouse.system.asynchronous_metrics

Preprocessing:

- JSONPATH: $.data

Zabbix_raw_items ClickHouse: Get system.settings

Get information about settings that are currently in use.

HTTP_AGENT clickhouse.system.settings

Preprocessing:

- JSONPATH: $.data

- DISCARD_UNCHANGED_HEARTBEAT: 1h

Zabbix_raw_items ClickHouse: Get replicas info

-

HTTP_AGENT clickhouse.replicas

Preprocessing:

- JSONPATH: $.data

Zabbix_raw_items ClickHouse: Get tables info

-

HTTP_AGENT clickhouse.tables

Preprocessing:

- JSONPATH: $.data

Zabbix_raw_items ClickHouse: Get dictionaries info

-

HTTP_AGENT clickhouse.dictionaries

Preprocessing:

- JSONPATH: $.data

Triggers

Name Description Expression Severity Dependencies and additional info
ClickHouse: There are queries running more than {$CLICKHOUSE.QUERY_TIME.MAX.WARN} seconds

-

{TEMPLATE_NAME:clickhouse.process.elapsed.last()}>{$CLICKHOUSE.QUERY_TIME.MAX.WARN} AVERAGE

Manual close: YES

ClickHouse: ClickHouse: Service is down

-

{TEMPLATE_NAME:clickhouse.ping.last()}=0 AVERAGE

Manual close: YES

ClickHouse: Version has changed (new version: {ITEM.VALUE})

ClickHouse version has changed. Ack to close.

{TEMPLATE_NAME:clickhouse.version.diff()}=1 and {TEMPLATE_NAME:clickhouse.version.strlen()}>0 INFO

Manual close: YES

ClickHouse: has been restarted (uptime < 10m)

Uptime is less than 10 minutes

{TEMPLATE_NAME:clickhouse.uptime.last()}<10m INFO

Manual close: YES

ClickHouse: Failed to fetch info data (or no data for 30m)

Zabbix has not received data for items for the last 30 minutes

{TEMPLATE_NAME:clickhouse.uptime.nodata(30m)}=1 WARNING

Manual close: YES

Depends on:

- ClickHouse: ClickHouse: Service is down

ClickHouse: Too many throttled insert queries (over {$CLICKHOUSE.DELAYED.INSERTS.MAX.WARN} for 5 min)

ClickHouse has INSERT queries that are throttled due to a high number of active data parts per partition in a MergeTree table. Decrease the INSERT frequency.

{TEMPLATE_NAME:clickhouse.insert.delay.min(5m)}>{$CLICKHOUSE.DELAYED.INSERTS.MAX.WARN} WARNING

Manual close: YES

ClickHouse: Too many MergeTree parts (over 90% of {$CLICKHOUSE.PARTS.PER.PARTITION.WARN})

"Decrease the frequency of INSERT queries. The ClickHouse MergeTree table engine splits each INSERT query into partitions (PARTITION BY expression) and adds one or more parts per INSERT inside each partition; a background merge process then runs. When a partition holds too many unmerged parts, SELECT query performance can degrade significantly, so ClickHouse tries to delay the INSERT or abort it."

{TEMPLATE_NAME:clickhouse.max.part.count.for.partition.min(5m)}>{$CLICKHOUSE.PARTS.PER.PARTITION.WARN} * 0.9 WARNING

Manual close: YES
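To make the parts trigger concrete: it fires when even the smallest value seen in the last 5 minutes is above 90% of the threshold, so a single spike is not enough. The sketch below models that check under stated assumptions; the sample timestamps and the function name `parts_trigger_fires` are hypothetical, and 300 is the default of {$CLICKHOUSE.PARTS.PER.PARTITION.WARN}.

```python
def parts_trigger_fires(samples, now, threshold=300, window=300):
    """samples: (timestamp_seconds, value) pairs; window of 300 s stands for min(5m)."""
    recent = [value for ts, value in samples if now - window <= ts <= now]
    # No data in the window means there is nothing to evaluate in this sketch.
    return bool(recent) and min(recent) > threshold * 0.9


sustained = [(0, 280), (60, 290), (120, 275)]
spike = [(0, 100), (60, 290), (120, 120)]
print(parts_trigger_fires(sustained, now=120))
print(parts_trigger_fires(spike, now=120))
```

Using the minimum over the window is a simple way to require a sustained condition; the other `min(5m)` triggers in this list follow the same idea with their own thresholds.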

ClickHouse: Too many network errors (over {$CLICKHOUSE.NETWORK.ERRORS.MAX.WARN} in 5m)

The number of errors (timeouts and connection failures) during query execution, background pool tasks and DNS cache update is too high.

{TEMPLATE_NAME:clickhouse.network.error.rate.min(5m)}>{$CLICKHOUSE.NETWORK.ERRORS.MAX.WARN} WARNING
ClickHouse: Too many distributed files to insert (over {$CLICKHOUSE.DELAYED.FILES.DISTRIBUTED.COUNT.MAX.WARN} for 5 min)

"Clickhouse servers and in config.xml

https://clickhouse.tech/docs/en/operations/table_engines/distributed/"

{TEMPLATE_NAME:clickhouse.distributed.files.min(5m)}>{$CLICKHOUSE.DELAYED.FILES.DISTRIBUTED.COUNT.MAX.WARN} WARNING

Manual close: YES

ClickHouse: Replication lag is too hight (over {$CLICKHOUSE.REPLICA.MAX.WARN} sec for 5min)

"When a replica lags too far behind, it can be skipped by Distributed SELECT queries without errors, and you will get wrong query results."

{TEMPLATE_NAME:clickhouse.replicas.max.absolute.delay.min(5m)}>{$CLICKHOUSE.REPLICA.MAX.WARN} WARNING

Manual close: YES

ClickHouse: {#DB}.{#TABLE} Replica is readonly

This mode is turned on if the config doesn’t have sections with ZooKeeper, if an unknown error occurred when reinitializing sessions in ZooKeeper, and during session reinitialization in ZooKeeper.

{TEMPLATE_NAME:clickhouse.replica.is_readonly["{#DB}.{#TABLE}"].min(5m)}=1 WARNING
ClickHouse: {#DB}.{#TABLE} Replica session is expired

This mode is turned on if the config doesn’t have sections with ZooKeeper, if an unknown error occurred when reinitializing sessions in ZooKeeper, and during session reinitialization in ZooKeeper.

{TEMPLATE_NAME:clickhouse.replica.is_session_expired["{#DB}.{#TABLE}"].min(5m)}=1 WARNING
ClickHouse: Too many operations in queue (over {$CLICKHOUSE.QUEUE.SIZE.MAX.WARN} for 5m)

-

{TEMPLATE_NAME:clickhouse.replica.queue_size["{#DB}.{#TABLE}"].min(5m)}>{$CLICKHOUSE.QUEUE.SIZE.MAX.WARN:"{#TABLE}"} WARNING
ClickHouse: Differense between log_max_index and log_pointer is too hight (More than {$CLICKHOUSE.LOG_POSITION.DIFF.MAX.WARN})

-

{Template DB ClickHouse by HTTP:clickhouse.replica.log_max_index["{#DB}.{#TABLE}"].last()} - {TEMPLATE_NAME:clickhouse.replica.log_pointer["{#DB}.{#TABLE}"].last()} > {$CLICKHOUSE.LOG_POSITION.DIFF.MAX.WARN} WARNING
ClickHouse: Number of active replicas less than number of total replicas

-

{TEMPLATE_NAME:clickhouse.replica.active_replicas["{#DB}.{#TABLE}"].max(5m)} < {Template DB ClickHouse by HTTP:clickhouse.replica.total_replicas["{#DB}.{#TABLE}"].last()} WARNING
ClickHouse: Too many ZooKeeper sessions opened

"Number of sessions (connections) to ZooKeeper. Should be no more than one, because using more than one connection to ZooKeeper may lead to bugs due to lack of linearizability (stale reads) that ZooKeeper consistency model allows."

{TEMPLATE_NAME:clickhouse.zookeper.session.min(5m)}>1 WARNING
ClickHouse: Configuration has been changed

ClickHouse configuration has been changed. Ack to close.

{TEMPLATE_NAME:clickhouse.system.settings.diff()}=1 and {TEMPLATE_NAME:clickhouse.system.settings.strlen()}>0 INFO

Manual close: YES

Feedback

Please report any issues with the template at https://support.zabbix.com
