We are experiencing a critical issue where Zabbix Server 6.0.39 LTS enters a "livelock" state (unresponsive, with high internal process load) when Zabbix proxies send large batches of metrics with invalid timestamps (Unix epoch / year 1970).
This happens when monitored devices lose power/CMOS battery and reset their time to 1970 before synchronizing NTP.
The server does not recover automatically and requires a kill -9 to restart.
Environment
Zabbix Component: Server
Version: 6.0.39 LTS
OS: Linux (Ubuntu 24.04.2, kernel 6.11.0-26-generic #26~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC)
Database: PostgreSQL 15.12
TimescaleDB: 2.15.2
Proxy: Active mode (~20,000 hosts, 40 items per host, 10-15 min update interval)
Hardware: 16 vCPU, 32GB RAM, SSD (NVMe)
Enable compression: off
Override item history period: yes
Override item trends period: yes
Observed Behavior
History syncer utilization climbs step by step to 75-100% and stays there.
pg_stat_activity shows thousands of rapidly appearing/disappearing SELECT queries:
SELECT clock,ns,value FROM history_uint WHERE itemid=... AND clock<=... ORDER BY clock DESC LIMIT 2
These queries are extremely fast (<2 ms) and put no noticeable load on the database or disk I/O, but their sheer volume floods the server's processing logic (see the snapshot query below).
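A pg_stat_activity snapshot such as the following can confirm the flood. This is a minimal sketch; the LIKE pattern is an assumption and may need adjusting for your schema or search path:

SELECT pid, state, query_start, query
FROM pg_stat_activity
WHERE query LIKE 'SELECT clock,ns,value FROM history%'  -- matches the flood queries above
ORDER BY query_start;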
The server stops processing new data
Logs are flooded with: item "..." value timestamp "1970.01.01..." is outside history storage period.
Tuning attempts: reducing HistoryCacheSize (to 64M) and StartDBSyncers (to 6-10) improved throughput but did not prevent the eventual freeze during heavy "1970" storms (parameter sketch below).
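For reference, the corresponding zabbix_server.conf parameters; the exact values below are illustrative, based on the tuning described above:

# zabbix_server.conf -- tuning attempted during the incident (illustrative values)
HistoryCacheSize=64M
StartDBSyncers=8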
Workaround / Fix Confirmation
The issue was resolved, immediately and only, after applying a hard CHECK constraint at the database level that rejects old data before the Zabbix server logic tries to fully process it:
ALTER TABLE history ADD CONSTRAINT history_clock_check CHECK (clock > 1767225600) NOT VALID;
-- (Applied to all history_* tables)
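Since the constraint is needed on every history_* table, a PL/pgSQL loop like the one below can apply it in one step. This is a sketch assuming the stock Zabbix 6.0 schema (the table list and constraint names are assumptions); the cutoff 1767225600 corresponds to 2026-01-01 00:00:00 UTC:

DO $$
DECLARE
    t text;
BEGIN
    -- Stock Zabbix 6.0 history tables; extend the list if your schema differs.
    FOREACH t IN ARRAY ARRAY['history','history_uint','history_str','history_text','history_log'] LOOP
        EXECUTE format(
            'ALTER TABLE %I ADD CONSTRAINT %I CHECK (clock > 1767225600) NOT VALID',
            t, t || '_clock_check');
    END LOOP;
END $$;

NOT VALID skips checking existing rows, so the constraint takes effect for new inserts immediately without a full table scan.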