Yesterday I spent several hours troubleshooting and eventually trying to salvage my installation. We are not amused.
The important part of my installation is my main Zabbix server (MZS) and a remote node (RN). This setup had been running just fine for months, but after I added a host to the RN, the MZS started crashing. Apparently the RN had picked up some data that the MZS couldn't stomach.
Logs on the RN and MZS gave no insight into the particular cause, because at DebugLevel==3 nothing important gets logged, and at DebugLevel==4 far too much gets logged. No discernible errors or suspicious-looking strings turned up at DebugLevel==4. I did not save any logs, and I'm not going to provoke the same error again.
Both servers were running 1.4.5, and downgrading the MZS to 1.4.3 made no difference. The host I added to the RN was a Cisco Soho 91, with the standard snmp2 template. Both machines are running Debian 4.0, the DB is MySQL, and communication is through an SSH tunnel.
I eventually truncated all the history tables on the RN, and that worked. But I lost data, obviously. Cleaning history for the host's items wasn't enough. It would be very, very nice to have features like "start housekeeping", "truncate all history", "Sync node A to node B" and more along those lines.
So now I have two big problems: I can't monitor my remote Cisco host, and I'm basically afraid of adding anything new to the system. That's pretty bad.
Zabbix developers:
- Please comment on the above.
- Please tell me how you're going to ensure that this won't occur again.
- Please tell me what you're going to do _right now_.
