Hi,
Last Friday, our Zabbix server suddenly stopped working properly. I was working on a new template, and when I came back after some time away from the keyboard, I was greeted with an error message about a locked table or something similar. It disappeared quickly, so I don't recall the exact wording. I thought a simple refresh would fix it, but it didn't.
Instead, I noticed that the majority of my items had stopped collecting data. Some still work, but only sporadically. Restarting the Zabbix server gets all items working again, but after a short while they stop again. The queue shows most items as awaiting update for "more than 10 minutes".
There are no error messages in the server log that provide any hints. I suspected a DB problem and had a look at the processlist, but I don't see anything alarming. There are a lot of long-lasting "sleep" connections and about five queries in the "sending data" state for anywhere from 3 to 22 seconds. Maybe that's a little weird, but I don't know. Those queries disappear after a while, so it's not as if they get stuck forever.
Please help me debug this! Any advice is appreciated. I do have a DB backup, and if there's a chance it will fix things, I'd apply it. But I'd lose some work, so I want to be sure there's nothing else I can do first.
Cheers,
Chris