We run in active/standby and someone did a bad thing...

  • guzzijason
    Senior Member
    • Dec 2015
    • 106

    #1

    I have a pair of zabbix servers in an active-standby configuration. Unfortunately, today someone inadvertently started zabbix_server on the standby node, which, as you might expect, has caused issues with the DB (i.e. duplicate keys).

    Does anyone know of a good way to recover from this and clean up the DB? At the moment, I'm considering just throwing away a few hours' worth of data in the DB, but I'm only starting to dig into the problem now.

    I guess until zabbix server becomes HA-aware, I'm going to have to resort to drastic safety measures, like completely removing the server binary from the standby node.
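    A less drastic alternative to deleting the binary might be a small wrapper that refuses to launch zabbix_server unless this node currently holds the cluster's floating IP. The sketch below is an assumption, not a Zabbix feature: `VIP`, `ZABBIX_SERVER_BIN`, `holds_address` and `start_if_active` are hypothetical names, and it presumes your cluster manager moves a virtual IP to whichever node is active. It relies on the fact that binding a socket to an address the host does not own fails with `EADDRNOTAVAIL`.

```python
import errno
import os
import socket
import sys

# Hypothetical values: replace with the floating IP your cluster manager
# moves between nodes and the real path of your zabbix_server binary.
VIP = "192.0.2.10"
ZABBIX_SERVER_BIN = "/usr/sbin/zabbix_server"


def holds_address(ip: str) -> bool:
    """Return True if `ip` is currently assigned to a local interface.

    Binding to an address this host does not own raises EADDRNOTAVAIL.
    Caveat: on Linux with net.ipv4.ip_nonlocal_bind=1 the bind succeeds
    anyway, so this check is not reliable on such hosts.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((ip, 0))  # port 0: let the kernel pick any free port
        return True
    except OSError as exc:
        if exc.errno == errno.EADDRNOTAVAIL:
            return False
        raise  # any other error is unexpected; do not guess
    finally:
        sock.close()


def start_if_active(ip: str = VIP, binary: str = ZABBIX_SERVER_BIN) -> None:
    """Replace this process with zabbix_server, but only on the active node."""
    if not holds_address(ip):
        sys.stderr.write(f"{ip} is not local; refusing to start {binary}\n")
        sys.exit(1)
    # os.execv does not return on success; extra CLI args are passed through.
    os.execv(binary, [binary] + sys.argv[1:])
```

    You would point your init script or service unit at the wrapper instead of the binary. It only guards the normal start path: someone running the binary by hand still bypasses it, so letting the cluster manager own the service is still the stronger fix.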

    __Jason
  • bbrendon
    Senior Member
    • Sep 2005
    • 870

    #2
    I vote restore from backup. I don't recall ever seeing a script that goes through and validates the correctness of what's stored in the DB.
    Unofficial Zabbix Expert

    • guzzijason
      Senior Member
      • Dec 2015
      • 106

      #3
      My DB uses daily partitions. Losing some data is tolerable, so I just truncated today's partitions and restarted zabbix. So far, this seems to be working OK. Unfortunately, in my haste to get this resolved, I seem to have mangled one of my DB nodes, which is (hopefully) recovering now.

      The zabbix_server running on the good DB node seems happy for now.

      (another benefit of partitioning?)
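      For anyone in the same spot, here is a rough sketch of generating the truncation statements for one day. It assumes MySQL native partitioning (`ALTER TABLE ... TRUNCATE PARTITION`) and a hypothetical naming scheme of one partition per day called `pYYYY_MM_DD`. The post does not say which DB or naming scheme is in use, so adjust both. Trends tables are left out because they are often partitioned monthly rather than daily.

```python
from datetime import date
from typing import List, Sequence

# Zabbix history tables; trim this to the tables you actually partition daily.
HISTORY_TABLES = (
    "history",
    "history_uint",
    "history_str",
    "history_text",
    "history_log",
)


def partition_name(day: date) -> str:
    """Hypothetical naming scheme: one partition per day, named pYYYY_MM_DD."""
    return day.strftime("p%Y_%m_%d")


def truncate_statements(day: date, tables: Sequence[str] = HISTORY_TABLES) -> List[str]:
    """Build MySQL statements that empty one day's partition in each table."""
    part = partition_name(day)
    return [f"ALTER TABLE {t} TRUNCATE PARTITION {part};" for t in tables]


if __name__ == "__main__":
    # Print today's statements; review them before feeding them to mysql.
    for stmt in truncate_statements(date.today()):
        print(stmt)
```

      Printing the SQL rather than executing it keeps a human in the loop, which seems wise given how easy it was to damage a DB node in a hurry. Presumably zabbix_server should be stopped while the statements run.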

      __Jason