Ceph

Ceph is a free-software storage platform that implements object storage on a single distributed computer cluster and provides interfaces for object-, block-, and file-level storage. Ceph aims primarily for completely distributed operation without a single point of failure, scalability to the exabyte level, and free availability.

Available solutions




Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/ceph_agent2


Ceph by Zabbix Agent2

Overview

For Zabbix version: 5.2 and higher
The template to monitor a Ceph cluster by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

Template Ceph by Zabbix Agent2 — collects metrics by polling zabbix-agent2.

This template was tested on:

  • Ceph, version 14.2

Setup

See Zabbix template operation for basic instructions.

  1. Set up and configure zabbix-agent2 compiled with the Ceph monitoring plugin (a setup sketch follows the test command below).
  2. Set the {$CEPH.CONNSTRING} macro to a connection string such as <protocol(host:port)> or to a named session.
  3. Set the user name and password in host macros ({$CEPH.USER}, {$CEPH.API.KEY}) if you want to override parameters from the Zabbix agent configuration file.

Test availability: zabbix_get -s ceph-host -k ceph.ping["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"]
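
On the Ceph side, the RESTful manager module must be enabled and an API key created for the monitoring user. The sketch below is a minimal, hedged example: the ceph commands follow the upstream mgr/restful module, while the Plugins.Ceph.Sessions.* parameter names for a named agent2 session are assumptions that should be verified against the Zabbix agent 2 documentation for your version.

  # Enable the Ceph RESTful module and create an API key for Zabbix
  ceph mgr module enable restful
  ceph restful create-self-signed-cert        # the module serves HTTPS only
  ceph restful create-key zabbix              # prints the key to use as {$CEPH.API.KEY}

  # Optional: define a named session by appending to /etc/zabbix/zabbix_agent2.conf
  # (parameter names assumed from the agent2 Ceph plugin; verify in the Zabbix docs):
  #   Plugins.Ceph.Sessions.ceph1.Uri=https://localhost:8003
  #   Plugins.Ceph.Sessions.ceph1.User=zabbix
  #   Plugins.Ceph.Sessions.ceph1.ApiKey=<key printed by "ceph restful create-key zabbix">
  systemctl restart zabbix-agent2

  # With a named session, the availability test above becomes:
  zabbix_get -s ceph-host -k ceph.ping[ceph1]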

Zabbix configuration

No specific Zabbix configuration is required.

Macros used

Name                 Description  Default
{$CEPH.API.KEY}      -            zabbix_pass
{$CEPH.CONNSTRING}   -            https://localhost:8003
{$CEPH.USER}         -            zabbix

Template links

There are no template links in this template.

Discovery rules

Name  Description  Type            Key and additional info
OSD   -            ZABBIX_PASSIVE  ceph.osd.discovery["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"]
Pool  -            ZABBIX_PASSIVE  ceph.pool.discovery["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"]
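
Both discovery keys can be queried directly with zabbix_get to confirm that low-level discovery will find the OSDs and pools. The output shape sketched below is an assumption for illustration: a JSON array of {#OSDNAME} (or {#POOLNAME}) macro objects that Zabbix expands into the per-OSD and per-pool items listed in the next section.

  zabbix_get -s ceph-host -k 'ceph.osd.discovery["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"]'
  # assumed output shape:
  # [{"{#OSDNAME}":"0"},{"{#OSDNAME}":"1"},{"{#OSDNAME}":"2"}]

  zabbix_get -s ceph-host -k 'ceph.pool.discovery["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"]'
  # assumed output shape:
  # [{"{#POOLNAME}":"device_health_metrics"},{"{#POOLNAME}":"rbd"}]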

Items collected

Group Name Description Type Key and additional info
Ceph Ceph: Ping ZABBIX_PASSIVE ceph.ping["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"]

Preprocessing:

- DISCARD_UNCHANGED_HEARTBEAT: 30m

Ceph Ceph: Number of Monitors

Number of Monitors configured in Ceph cluster

DEPENDENT ceph.num_mon

Preprocessing:

- JSONPATH: $.num_mon

- DISCARD_UNCHANGED_HEARTBEAT: 30m

Ceph Ceph: Overall cluster status

Overall Ceph cluster status, e.g. 0 - HEALTH_OK, 1 - HEALTH_WARN or 2 - HEALTH_ERR

DEPENDENT ceph.overall_status

Preprocessing:

- JSONPATH: $.overall_status

- DISCARD_UNCHANGED_HEARTBEAT: 10m

Ceph Ceph: Minimum Mon release version

Minimum monitor release version (min_mon_release_name)

DEPENDENT ceph.min_mon_release_name

Preprocessing:

- JSONPATH: $.min_mon_release_name

- DISCARD_UNCHANGED_HEARTBEAT: 1h

Ceph Ceph: Ceph Read bandwidth

Global read Bytes per second

DEPENDENT ceph.rd_bytes.rate

Preprocessing:

- JSONPATH: $.rd_bytes

- CHANGE_PER_SECOND

Ceph Ceph: Ceph Write bandwidth

Global write Bytes per second

DEPENDENT ceph.wr_bytes.rate

Preprocessing:

- JSONPATH: $.wr_bytes

- CHANGE_PER_SECOND

Ceph Ceph: Ceph Read operations per sec

Global read operations per second

DEPENDENT ceph.rd_ops.rate

Preprocessing:

- JSONPATH: $.rd_ops

- CHANGE_PER_SECOND

Ceph Ceph: Ceph Write operations per sec

Global write operations per second

DEPENDENT ceph.wr_ops.rate

Preprocessing:

- JSONPATH: $.wr_ops

- CHANGE_PER_SECOND

Ceph Ceph: Total bytes available

Total bytes available in Ceph cluster

DEPENDENT ceph.total_avail_bytes

Preprocessing:

- JSONPATH: $.total_avail_bytes

Ceph Ceph: Total bytes

Total (RAW) capacity of Ceph cluster in bytes

DEPENDENT ceph.total_bytes

Preprocessing:

- JSONPATH: $.total_bytes

Ceph Ceph: Total bytes used

Total bytes used in Ceph cluster

DEPENDENT ceph.total_used_bytes

Preprocessing:

- JSONPATH: $.total_used_bytes

Ceph Ceph: Total number of objects

Total number of objects in Ceph cluster

DEPENDENT ceph.total_objects

Preprocessing:

- JSONPATH: $.total_objects

Ceph Ceph: Number of Placement Groups

Total number of Placement Groups in Ceph cluster

DEPENDENT ceph.num_pg

Preprocessing:

- JSONPATH: $.num_pg

- DISCARD_UNCHANGED_HEARTBEAT: 10m

Ceph Ceph: Number of Placement Groups in Temporary state

Total number of Placement Groups in pg_temp state

DEPENDENT ceph.num_pg_temp

Preprocessing:

- JSONPATH: $.num_pg_temp

Ceph Ceph: Number of Placement Groups in Active state

Total number of Placement Groups in active state

DEPENDENT ceph.pg_states.active

Preprocessing:

- JSONPATH: $.pg_states.active

Ceph Ceph: Number of Placement Groups in Clean state

Total number of Placement Groups in clean state

DEPENDENT ceph.pg_states.clean

Preprocessing:

- JSONPATH: $.pg_states.clean

Ceph Ceph: Number of Placement Groups in Peering state

Total number of Placement Groups in peering state

DEPENDENT ceph.pg_states.peering

Preprocessing:

- JSONPATH: $.pg_states.peering

Ceph Ceph: Number of Placement Groups in Scrubbing state

Total number of Placement Groups in scrubbing state

DEPENDENT ceph.pg_states.scrubbing

Preprocessing:

- JSONPATH: $.pg_states.scrubbing

Ceph Ceph: Number of Placement Groups in Undersized state

Total number of Placement Groups in undersized state

DEPENDENT ceph.pg_states.undersized

Preprocessing:

- JSONPATH: $.pg_states.undersized

Ceph Ceph: Number of Placement Groups in Backfilling state

Total number of Placement Groups in backfilling state

DEPENDENT ceph.pg_states.backfilling

Preprocessing:

- JSONPATH: $.pg_states.backfilling

Ceph Ceph: Number of Placement Groups in degraded state

Total number of Placement Groups in degraded state

DEPENDENT ceph.pg_states.degraded

Preprocessing:

- JSONPATH: $.pg_states.degraded

Ceph Ceph: Number of Placement Groups in inconsistent state

Total number of Placement Groups in inconsistent state

DEPENDENT ceph.pg_states.inconsistent

Preprocessing:

- JSONPATH: $.pg_states.inconsistent

Ceph Ceph: Number of Placement Groups in Unknown state

Total number of Placement Groups in unknown state

DEPENDENT ceph.pg_states.unknown

Preprocessing:

- JSONPATH: $.pg_states.unknown

Ceph Ceph: Number of Placement Groups in remapped state

Total number of Placement Groups in remapped state

DEPENDENT ceph.pg_states.remapped

Preprocessing:

- JSONPATH: $.pg_states.remapped

Ceph Ceph: Number of Placement Groups in recovering state

Total number of Placement Groups in recovering state

DEPENDENT ceph.pg_states.recovering

Preprocessing:

- JSONPATH: $.pg_states.recovering

Ceph Ceph: Number of Placement Groups in backfill_toofull state

Total number of Placement Groups in backfill_toofull state

DEPENDENT ceph.pg_states.backfill_toofull

Preprocessing:

- JSONPATH: $.pg_states.backfill_toofull

Ceph Ceph: Number of Placement Groups in backfill_wait state

Total number of Placement Groups in backfill_wait state

DEPENDENT ceph.pg_states.backfill_wait

Preprocessing:

- JSONPATH: $.pg_states.backfill_wait

Ceph Ceph: Number of Placement Groups in recovery_wait state

Total number of Placement Groups in recovery_wait state

DEPENDENT ceph.pg_states.recovery_wait

Preprocessing:

- JSONPATH: $.pg_states.recovery_wait

Ceph Ceph: Number of Pools

Total number of pools in Ceph cluster

DEPENDENT ceph.num_pools

Preprocessing:

- JSONPATH: $.num_pools

Ceph Ceph: Number of OSDs

Number of known storage daemons in Ceph cluster

DEPENDENT ceph.num_osd

Preprocessing:

- JSONPATH: $.num_osd

- DISCARD_UNCHANGED_HEARTBEAT: 10m

Ceph Ceph: Number of OSDs in state: UP

Total number of online storage daemons in Ceph cluster

DEPENDENT ceph.num_osd_up

Preprocessing:

- JSONPATH: $.num_osd_up

- DISCARD_UNCHANGED_HEARTBEAT: 10m

Ceph Ceph: Number of OSDs in state: IN

Total number of participating storage daemons in Ceph cluster

DEPENDENT ceph.num_osd_in

Preprocessing:

- JSONPATH: $.num_osd_in

- DISCARD_UNCHANGED_HEARTBEAT: 10m

Ceph Ceph: Ceph OSD avg fill

Average fill of OSDs

DEPENDENT ceph.osd_fill.avg

Preprocessing:

- JSONPATH: $.osd_fill.avg

Ceph Ceph: Ceph OSD max fill

Fill percentage of the most filled OSD

DEPENDENT ceph.osd_fill.max

Preprocessing:

- JSONPATH: $.osd_fill.max

Ceph Ceph: Ceph OSD min fill

Fill percentage of the least filled OSD

DEPENDENT ceph.osd_fill.min

Preprocessing:

- JSONPATH: $.osd_fill.min

Ceph Ceph: Ceph OSD max PGs

Maximum amount of PGs on OSDs

DEPENDENT ceph.osd_pgs.max

Preprocessing:

- JSONPATH: $.osd_pgs.max

Ceph Ceph: Ceph OSD min PGs

Minimum amount of PGs on OSDs

DEPENDENT ceph.osd_pgs.min

Preprocessing:

- JSONPATH: $.osd_pgs.min

Ceph Ceph: Ceph OSD avg PGs

Average amount of PGs on OSDs

DEPENDENT ceph.osd_pgs.avg

Preprocessing:

- JSONPATH: $.osd_pgs.avg

Ceph Ceph: Ceph OSD Apply latency Avg

Average apply latency of OSDs

DEPENDENT ceph.osd_latency_apply.avg

Preprocessing:

- JSONPATH: $.osd_latency_apply.avg

Ceph Ceph: Ceph OSD Apply latency Max

Maximum apply latency of OSDs

DEPENDENT ceph.osd_latency_apply.max

Preprocessing:

- JSONPATH: $.osd_latency_apply.max

Ceph Ceph: Ceph OSD Apply latency Min

Minimum apply latency of OSDs

DEPENDENT ceph.osd_latency_apply.min

Preprocessing:

- JSONPATH: $.osd_latency_apply.min

Ceph Ceph: Ceph OSD Commit latency Avg

Average commit latency of OSDs

DEPENDENT ceph.osd_latency_commit.avg

Preprocessing:

- JSONPATH: $.osd_latency_commit.avg

Ceph Ceph: Ceph OSD Commit latency Max

Maximum commit latency of OSDs

DEPENDENT ceph.osd_latency_commit.max

Preprocessing:

- JSONPATH: $.osd_latency_commit.max

Ceph Ceph: Ceph OSD Commit latency Min

Minimum commit latency of OSDs

DEPENDENT ceph.osd_latency_commit.min

Preprocessing:

- JSONPATH: $.osd_latency_commit.min

Ceph Ceph: Ceph backfill full ratio

Backfill full ratio setting of Ceph cluster as configured on OSDMap

DEPENDENT ceph.osd_backfillfull_ratio

Preprocessing:

- JSONPATH: $.osd_backfillfull_ratio

- DISCARD_UNCHANGED_HEARTBEAT: 10m

Ceph Ceph: Ceph full ratio

Full ratio setting of Ceph cluster as configured on OSDMap

DEPENDENT ceph.osd_full_ratio

Preprocessing:

- JSONPATH: $.osd_full_ratio

- DISCARD_UNCHANGED_HEARTBEAT: 10m

Ceph Ceph: Ceph nearfull ratio

Near full ratio setting of Ceph cluster as configured on OSDMap

DEPENDENT ceph.osd_nearfull_ratio

Preprocessing:

- JSONPATH: $.osd_nearfull_ratio

- DISCARD_UNCHANGED_HEARTBEAT: 10m

Ceph Ceph: [osd.{#OSDNAME}] OSD in DEPENDENT ceph.osd[{#OSDNAME},in]

Preprocessing:

- JSONPATH: $.osds.{#OSDNAME}.in

- DISCARD_UNCHANGED_HEARTBEAT: 10m

Ceph Ceph: [osd.{#OSDNAME}] OSD up DEPENDENT ceph.osd[{#OSDNAME},up]

Preprocessing:

- JSONPATH: $.osds.{#OSDNAME}.up

- DISCARD_UNCHANGED_HEARTBEAT: 10m

Ceph Ceph: [osd.{#OSDNAME}] OSD PGs DEPENDENT ceph.osd[{#OSDNAME},num_pgs]

Preprocessing:

- JSONPATH: $.osds.{#OSDNAME}.num_pgs

⛔️ON_FAIL: DISCARD_VALUE ->

Ceph Ceph: [osd.{#OSDNAME}] OSD fill DEPENDENT ceph.osd[{#OSDNAME},fill]

Preprocessing:

- JSONPATH: $.osds.{#OSDNAME}.osd_fill

⛔️ON_FAIL: DISCARD_VALUE ->

Ceph Ceph: [osd.{#OSDNAME}] OSD latency apply

Time taken to flush an update to disks.

DEPENDENT ceph.osd[{#OSDNAME},latency_apply]

Preprocessing:

- JSONPATH: $.osds.{#OSDNAME}.osd_latency_apply

⛔️ON_FAIL: DISCARD_VALUE ->

Ceph Ceph: [osd.{#OSDNAME}] OSD latency commit

Time taken to commit an operation to the journal.

DEPENDENT ceph.osd[{#OSDNAME},latency_commit]

Preprocessing:

- JSONPATH: $.osds.{#OSDNAME}.osd_latency_commit

⛔️ON_FAIL: DISCARD_VALUE ->

Ceph Ceph: [{#POOLNAME}] Pool Used

Total bytes used in pool.

DEPENDENT ceph.pool[{#POOLNAME},bytes_used]

Preprocessing:

- JSONPATH: $.pools.{#POOLNAME}.bytes_used

Ceph Ceph: [{#POOLNAME}] Max available

The maximum available space in the given pool.

DEPENDENT ceph.pool[{#POOLNAME},max_avail]

Preprocessing:

- JSONPATH: $.pools.{#POOLNAME}.max_avail

Ceph Ceph: [{#POOLNAME}] Pool RAW Used

Bytes used in pool including copies made.

DEPENDENT ceph.pool[{#POOLNAME},stored_raw]

Preprocessing:

- JSONPATH: $.pools.{#POOLNAME}.stored_raw

Ceph Ceph: [{#POOLNAME}] Pool Percent Used

Percentage of storage used per pool

DEPENDENT ceph.pool[{#POOLNAME},percent_used]

Preprocessing:

- JSONPATH: $.pools.{#POOLNAME}.percent_used

Ceph Ceph: [{#POOLNAME}] Pool objects

Number of objects in the pool.

DEPENDENT ceph.pool[{#POOLNAME},objects]

Preprocessing:

- JSONPATH: $.pools.{#POOLNAME}.objects

Ceph Ceph: [{#POOLNAME}] Pool Read bandwidth

Per-pool read Bytes/second

DEPENDENT ceph.pool[{#POOLNAME},rd_bytes.rate]

Preprocessing:

- JSONPATH: $.pools.{#POOLNAME}.rd_bytes

- CHANGE_PER_SECOND

Ceph Ceph: [{#POOLNAME}] Pool Write bandwidth

Per-pool write Bytes/second

DEPENDENT ceph.pool[{#POOLNAME},wr_bytes.rate]

Preprocessing:

- JSONPATH: $.pools.{#POOLNAME}.wr_bytes

- CHANGE_PER_SECOND

Ceph Ceph: [{#POOLNAME}] Pool Read operations

Per-pool read operations/second

DEPENDENT ceph.pool[{#POOLNAME},rd_ops.rate]

Preprocessing:

- JSONPATH: $.pools.{#POOLNAME}.rd_ops

- CHANGE_PER_SECOND

Ceph Ceph: [{#POOLNAME}] Pool Write operations

Per-pool write operations/second

DEPENDENT ceph.pool[{#POOLNAME},wr_ops.rate]

Preprocessing:

- JSONPATH: $.pools.{#POOLNAME}.wr_ops

- CHANGE_PER_SECOND

Zabbix_raw_items Ceph: Get overall cluster status ZABBIX_PASSIVE ceph.status["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"]
Zabbix_raw_items Ceph: Get OSD stats ZABBIX_PASSIVE ceph.osd.stats["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"]
Zabbix_raw_items Ceph: Get OSD dump ZABBIX_PASSIVE ceph.osd.dump["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"]
Zabbix_raw_items Ceph: Get df ZABBIX_PASSIVE ceph.df.details["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"]
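
Most items above are DEPENDENT items: the four Zabbix_raw_items master items each fetch one JSON document in a single agent call, and the dependent items pick individual fields out of it with the JSONPATH steps listed, while the *.rate items additionally apply CHANGE_PER_SECOND, i.e. (current value - previous value) / seconds between the two checks. A quick way to see what a dependent item will receive is to query a master item and filter it with jq; which master item carries which field is not shown in this listing, so the ceph.status/field pairing below is an assumption based on the JSONPath expressions above.

  # Query the bulk status master item and extract the field used by the
  # dependent item ceph.overall_status (master/field pairing assumed).
  zabbix_get -s ceph-host \
    -k 'ceph.status["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"]' | jq '.overall_status'

  # CHANGE_PER_SECOND applied to a counter such as rd_bytes yields a rate:
  #   rate = (value_now - value_previous) / (seconds between the two polls)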

Triggers

Name Description Expression Severity Dependencies and additional info
Ceph: Can not connect to cluster

The connection to the Ceph RESTful module is broken (any error is reported, including authentication and configuration issues).

{TEMPLATE_NAME:ceph.ping["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"].last()}=0 AVERAGE
Ceph: Cluster in ERROR state

-

{TEMPLATE_NAME:ceph.overall_status.last()}=2 AVERAGE

Manual close: YES

Ceph: Cluster in WARNING state

-

{TEMPLATE_NAME:ceph.overall_status.last()}=1

Recovery expression:

{TEMPLATE_NAME:ceph.overall_status.last()}=0
WARNING

Manual close: YES

Depends on:

- Ceph: Cluster in ERROR state

Ceph: Minimum monitor release version has changed (new version: {ITEM.VALUE})

Ceph version has changed. Ack to close.

{TEMPLATE_NAME:ceph.min_mon_release_name.diff()}=1 and {TEMPLATE_NAME:ceph.min_mon_release_name.strlen()}>0 INFO

Manual close: YES

Ceph: OSD osd.{#OSDNAME} is down

OSD osd.{#OSDNAME} is marked "down" in the osdmap.

The OSD daemon may have been stopped, or peer OSDs may be unable to reach the OSD over the network.

{TEMPLATE_NAME:ceph.osd[{#OSDNAME},up].last()} = 0 AVERAGE
Ceph: OSD osd.{#OSDNAME} is full

-

{TEMPLATE_NAME:ceph.osd[{#OSDNAME},fill].min(15m)} > {Ceph by Zabbix Agent2:ceph.osd_full_ratio.last()}*100 AVERAGE
Ceph: Ceph OSD osd.{#OSDNAME} is near full

-

{TEMPLATE_NAME:ceph.osd[{#OSDNAME},fill].min(15m)} > {Ceph by Zabbix Agent2:ceph.osd_nearfull_ratio.last()}*100 WARNING

Depends on:

- Ceph: OSD osd.{#OSDNAME} is full
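
The two capacity triggers compare the discovered OSD fill level, reported as a percentage, against the cluster-wide ratios, which Ceph stores as fractions; hence the *100 in the expressions. With Ceph's default full ratio of 0.95, for example, the "is full" trigger fires once an OSD has stayed above 95% for 15 minutes. The values can be checked by hand; which master item carries each field, and the exact jq paths, are assumptions inferred from the JSONPath expressions listed above.

  # Fill level of osd.0 (percentage) and the cluster full ratio (fraction)
  zabbix_get -s ceph-host -k 'ceph.osd.stats["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"]' | jq '.osds."0".osd_fill'
  zabbix_get -s ceph-host -k 'ceph.osd.dump["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"]'  | jq '.osd_full_ratio'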

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help with it at the ZABBIX forums.
