Hi
Our Zabbix 6.0.18 installation has had three incidents where the zabbix_server process exited for no obvious reason.
Relevant logs:
.....
1681176:20230727:023410.101 SNMP agent item "ifHCOutOctets.["46"]" on host <removed> failed: first network error, wait for 15 seconds
1682860:20230727:023410.425 SNMP agent item "hwWlanRadioChUtilizationRate2.[<removed>]" on host <removed> failed: first network error, wait for 15 seconds
1613353:20230727:023411.180 One child process died (PID:1681055,exitcode/signal:6). Exiting ...
1613353:20230727:023411.180 PROCESS EXIT: 1681055
1613361:20230727:023411.180 HA manager has been paused
1683126:20230727:023411.306 cannot write to IPC socket: Broken pipe
1683126:20230727:023411.306 cannot send data to LLD manager service
1683709:20230727:023412.010 cannot write to IPC socket: Broken pipe
1683709:20230727:023412.010 cannot retrieve alert results
1613361:20230727:023412.860 HA manager has been stopped
1613353:20230727:023412.976 syncing history data...
1613353:20230727:023413.065 syncing history data... 100.000000%
1613353:20230727:023413.065 syncing history data done
1613353:20230727:023413.065 syncing trend data...
1613353:20230727:023453.129 syncing trend data done
1613353:20230727:023454.099 Zabbix Server stopped. Zabbix 6.0.18 (revision d2032721bc8).
3089556:20230727:023504.224 Starting Zabbix Server. Zabbix 6.0.18 (revision d2032721bc8).
.....
I looked through several pages of log entries preceding the crash, and all of them relate to hosts becoming unreachable or becoming reachable again.
We are running an HA environment, so the standby node takes over as expected.
Raising the log verbosity is not practical: the issue is rare (weeks pass between crashes) and the environment is fairly large, at ~40,000 hosts and 2 million items.
Any ideas on what might be causing this, or how to troubleshoot it further?
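For reference, the log search described above could be scripted roughly as follows. This is only a sketch: it assumes every log line has the `PID:YYYYMMDD:HHMMSS.mmm message` layout seen in the excerpt, and the function name `last_entries_of_dead_children` and the `keep` parameter are made up for illustration. It remembers the last few lines written by each PID and, when the main process reports "One child process died", returns whatever the dead child itself logged beforehand.

```python
import re
from collections import defaultdict, deque

# Layout assumed from the excerpt: "<pid>:<YYYYMMDD>:<HHMMSS.mmm> <message>"
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6}\.\d{3}) (.*)$")
DIED_RE = re.compile(r"One child process died \(PID:(\d+),exitcode/signal:(\d+)\)")


def last_entries_of_dead_children(lines, keep=20):
    """Hypothetical helper: for each 'child process died' message, return
    the last `keep` lines that the dead PID wrote before that point."""
    recent = defaultdict(lambda: deque(maxlen=keep))  # pid -> its latest lines
    crashes = []
    for raw in lines:
        line = raw.rstrip("\n")
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that do not follow the assumed layout
        pid, date, time, msg = m.groups()
        died = DIED_RE.search(msg)
        if died:
            dead_pid, code = died.groups()
            crashes.append({
                "when": f"{date}:{time}",
                "pid": dead_pid,
                "signal": code,
                "last_entries": list(recent.get(dead_pid, [])),
            })
        else:
            recent[pid].append(line)
    return crashes
```

It can be fed an open log file, e.g. `last_entries_of_dead_children(open(path))`, where `path` points at the server log. In the excerpt above, PID 1681055 does not appear before the crash message, so the result for that excerpt alone would have an empty `last_entries`; a longer stretch of the log is needed to see what that process was doing.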
Info on system:
- 2 Zabbix server nodes (6.0.18)
- 2 Frontends (6.0.18)
- 3 database nodes (PostgreSQL 14.8 with timeseries). The primary database node is set statically in the Zabbix configuration (no database failover occurs)
All VMs running Ubuntu 22.04
Thanks
Stefano