Zabbix Server "livelock" (history syncers hang) upon massive influx of old timestamp

  • neri_866
    Junior Member
    • Feb 2021
    • 2

    #1

    We are experiencing a critical issue where Zabbix Server 6.0.39 LTS enters a state of "livelock" (unresponsive, high internal process load) when Zabbix Proxies send a large batch of metrics with invalid timestamps (Unix Epoch / year 1970).
    This happens when monitored devices lose power or their CMOS battery fails, and their clocks reset to 1970 before they synchronize via NTP.

    The server does not recover automatically and requires a kill -9 to restart.

    Environment

    Zabbix Component: Server
    Version: 6.0.39 LTS
    OS: Linux (Ubuntu 24.04.2 .11.0-26-generic #26~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC)
    Database: postgresql-15.12
    timescaledb-2: 2.15.2
    Proxy: Active mode (~20,000 hosts, 40 items per host, 10-15 min update interval)
    Hardware: 16 vCPU, 32GB RAM, SSD (NVMe)

    Enable compression: off
    Override item history period: yes
    Override item trends period: yes
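
For a sense of scale, the figures above imply the following average value rate. This is only a back-of-the-envelope sketch: it assumes every item reports exactly once per update interval, and the function name is made up for illustration.

```python
# Rough estimate of the steady-state value rate implied by the environment above.
# Assumption: each item delivers exactly one value per update interval.
HOSTS = 20_000
ITEMS_PER_HOST = 40


def values_per_second(hosts: int, items_per_host: int, interval_s: int) -> float:
    """Average new values per second if every item reports once per interval."""
    return hosts * items_per_host / interval_s


total_items = HOSTS * ITEMS_PER_HOST
rate_fast = values_per_second(HOSTS, ITEMS_PER_HOST, 10 * 60)  # 10 min interval
rate_slow = values_per_second(HOSTS, ITEMS_PER_HOST, 15 * 60)  # 15 min interval

print(f"{total_items} items, ~{rate_slow:.0f} to ~{rate_fast:.0f} values/s")
```

So normal load is on the order of 900 to 1,300 values per second across roughly 800,000 items. A burst of backlogged values from proxies comes on top of that.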



    Observed Behavior

    History Syncer utilization spikes step by step to 75-100% and stays there.
    pg_stat_activity shows thousands of rapidly appearing/disappearing SELECT queries:

    SELECT clock,ns,value FROM history_uint WHERE itemid=... AND clock<=... ORDER BY clock DESC LIMIT 2

    These queries are extremely fast (<2 ms) and put no noticeable load on the database or disk I/O, but their sheer volume appears to saturate the server's processing loop.
    The server stops processing new data.
    Logs are flooded with: item "..." value timestamp "1970.01.01..." is outside history storage period.
    Tuning attempts: Reducing HistoryCacheSize (to 64M) and StartDBSyncers (to 6-10) improved throughput but did not prevent the final freeze during heavy "1970" storms.
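
To make the "outside history storage period" message concrete, here is a minimal sketch of the kind of check it implies. This is not Zabbix source code. The function name, the 90-day period and the example "now" are all assumptions chosen for illustration.

```python
# Illustrative only: NOT taken from Zabbix. It models the idea that a value is
# rejected when its timestamp is older than (now - history storage period).
SECONDS_PER_DAY = 86_400


def is_outside_history_period(value_clock: int, now: int, history_days: int) -> bool:
    """Return True if value_clock is older than the start of the history window."""
    oldest_allowed = now - history_days * SECONDS_PER_DAY
    return value_clock < oldest_allowed


NOW = 1_767_225_600   # example "current time" (hypothetical)
HISTORY_DAYS = 90     # hypothetical history storage period

epoch_value = is_outside_history_period(0, NOW, HISTORY_DAYS)            # 1970 timestamp
recent_value = is_outside_history_period(NOW - 3600, NOW, HISTORY_DAYS)  # one hour old

print(epoch_value, recent_value)
```

Under these assumptions a value stamped 1970 is always outside the window, so every value in such a storm would hit the rejection path and produce a log line.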

    Workaround / Fix Confirmation
    The issue was resolved immediately, but only after applying a hard constraint at the database level that rejects old data before the Zabbix logic tries to process it fully:

    ALTER TABLE history ADD CONSTRAINT history_clock_check CHECK (clock > 1767225600) NOT VALID;
    -- (Applied to all history_* tables)
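
Since the constraint has to be repeated for every history table, a small helper can generate the statements and show which date the cutoff stands for. This is a sketch: the table list is the standard Zabbix 6.0 set of history tables, the per-table constraint naming simply extends the name used above, and the helper itself is hypothetical. It only prints SQL and does not connect to a database.

```python
from datetime import datetime, timezone

# Standard Zabbix 6.0 history tables; adjust if your schema differs.
HISTORY_TABLES = ["history", "history_uint", "history_str", "history_text", "history_log"]
CUTOFF = 1767225600  # same value as in the constraint above


def constraint_sql(table: str, cutoff: int) -> str:
    """Build the ALTER TABLE statement for one history table."""
    return (
        f"ALTER TABLE {table} ADD CONSTRAINT {table}_clock_check "
        f"CHECK (clock > {cutoff}) NOT VALID;"
    )


# Convert the cutoff to a readable UTC date to confirm what it rejects.
cutoff_utc = datetime.fromtimestamp(CUTOFF, tz=timezone.utc)
statements = [constraint_sql(t, CUTOFF) for t in HISTORY_TABLES]

print(f"-- rejects rows with clock <= {cutoff_utc.isoformat()}")
for stmt in statements:
    print(stmt)
```

The cutoff corresponds to 2026-01-01 00:00:00 UTC. In PostgreSQL, NOT VALID skips validation of existing rows while still enforcing the check on new inserts and updates, so older history already in the tables is left alone.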