Notifications stuck after upgrade from 2.4.7 to 3.0.2

  • thtux
    Junior Member
    • Apr 2011
    • 14

    #1

    Notifications stuck after upgrade from 2.4.7 to 3.0.2

    We migrated a Zabbix installation yesterday. Some facts about the environment:

    * SuSE Linux Enterprise Server 11 SP4 running Zabbix server 2.4.7 compiled from source
    * 4 Proxies monitoring about 200 hosts total
    * MySQL-5.6.16 on separate host
    * 800 Windows Servers monitored (including proxies)
    * 60k items, 12k triggers, 750 new values per second
    * Database contains about 1 billion records in total
    * Custom media-script forwarding problem/recovery messages to event mgmt system

    Before the migration the system had been running perfectly for months, with no issues whatsoever.

    The upgrade itself was painless and the database migration went smoothly, taking only a few seconds. The Zabbix server came up and the web frontend worked too, but the event forwarding got stuck.

    The system was configured to call the media-script once for every trigger in problem state every 10 minutes. At the time of the migration we had 5 triggers in problem state but only one of them continued to call the media script.

    To make sure the media-script was not part of the problem, I replaced it with a very simple shell script which only wrote the subject to a logfile. It worked for one of the open alerts, and the script was also called (repeatedly) for some other triggers which were NOT in problem state and hadn't been for weeks. The subject of those calls was also clearly marked: "OK: ...".
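    A stand-in script of that kind could look roughly like the sketch below. It is not the script that was actually used. It assumes the server passes recipient, subject and message as the first three arguments, and the logfile path is a made-up placeholder.

```shell
#!/bin/sh
# Minimal stand-in for a Zabbix media script (sketch, not the original script).
# Assumption: arguments are $1 = recipient, $2 = subject, $3 = message.
# LOGFILE is a hypothetical path; override it via the environment if needed.
LOGFILE="${LOGFILE:-/tmp/zabbix-media-test.log}"

log_subject() {
    # Append a timestamp and the given subject as one line to the logfile.
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$1" >> "$LOGFILE"
}

# Log only the subject; recipient and message body are ignored on purpose.
log_subject "${2:-}"
```

    With something like this in place, every call made by the server leaves one line in the logfile, so it is easy to see which triggers still cause notifications.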

    The PHP GUI had no issues showing the new alerts we provoked for testing, but none of them was forwarded to the media-script. Also, there were no "actions" listed for these triggers. It looked as if the open triggers were shown in the web GUI and updated by the server itself, while the notifications used a completely different set of triggers and reported values which had not changed for weeks.

    We had actually given up already, and I wanted to try one last thing before restoring the old environment and database. At first I only reset all triggers to "OK" in the database; this did NOT solve the problem. I then also removed all (1.5M) events from the database. After this intervention the media-script worked perfectly fine, with nothing else changed.

    Now it's about 24h since the upgrade and we're still running 3.0.2. Our event-management-system happily receives all notifications from the Zabbix server and no triggers are "stuck". It "looks" like it works now.

    To be honest I don't feel too confident about this "solution". I shouldn't need to manually remove 1.5M records from a table and reset the state of all triggers and I'm not sure if this really is the way to go. Will this work in the future or will this create some other issues later on?

    Thanks for reading this post. Zabbix 3.0.2 seems to be working for us now. Has anyone else had this kind of issue? And is there a way to check the integrity of the database content to make sure it's all OK now?
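
    To make the question more concrete, the kind of check meant here might look like the read-only sketch below. It is only a guess at where leftovers could hide, not a confirmed cause of the problem. The table and column names (escalations, events, triggers) are assumed from the Zabbix 3.0 schema and should be verified before running anything.

```shell
#!/bin/sh
# Read-only sanity queries (sketch). Table and column names are assumptions
# based on the Zabbix 3.0 schema; check them against your own database.
print_checks() {
cat <<'SQL'
-- Escalations whose originating event no longer exists
SELECT e.escalationid, e.triggerid, e.eventid
  FROM escalations e
  LEFT JOIN events ev ON ev.eventid = e.eventid
 WHERE e.eventid IS NOT NULL
   AND ev.eventid IS NULL;

-- Triggers in PROBLEM state (value = 1) without any trigger event
SELECT t.triggerid, t.description
  FROM triggers t
 WHERE t.value = 1
   AND NOT EXISTS (SELECT 1 FROM events ev
                    WHERE ev.source = 0 AND ev.object = 0
                      AND ev.objectid = t.triggerid);
SQL
}

# Print the queries; pipe the output into the mysql client to run them.
print_checks
```

    The script only prints SQL, so nothing is changed unless the output is deliberately piped into the mysql client (with the database name, here assumed to be "zabbix", as argument). Rows returned by either query would be candidates for a closer look, not proof of corruption.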