Hello,
I have a problem that, despite all my research and reading, I can't find an answer to.
A little bit of history
---------
I have been using Zabbix for more than a year, studying it day by day, learning new things, optimizing and tweaking.
For most of that time I was running a small setup, about 50 NVPS on a single VPS. I had problems, tweaked and changed things, and got it to work.
I recently expanded my setup to about 700 hosts / 250 NVPS. I knew I had to move off the single VPS, so it now runs on two SSD VPSes: one for the Zabbix server and one for the Percona MySQL database.
The problem
------------------
Active agents: "Zabbix Agent is unreachable for 2 minutes"
Like I said, I have had a lot of problems: iowait, unoptimized MySQL settings, unpartitioned tables, Zabbix server settings, etc.
The symptom of all of them is the message above, which is emailed to me for every host I have, hundreds of times or more. Sometimes it flaps between PROBLEM and OK so often that I get thousands of emails until I reboot the server.
The DB is now optimized and partitioned, and the same goes for the Zabbix server's various caches (most of them are 50-90% free).
Most of the time things work great, but sometimes I run into the problem above. All metrics for both VPSes look normal, the Zabbix server looks OK, and I saw no spikes in any values; the same goes for the DB (I am monitoring it with the Percona MySQL templates).
I can't figure out where the problem is. This system is in production and should not behave like this. I have been struggling with this for a very long time; every time I think "hey, you got this to work", I get another problem.
Any help is much appreciated.
Thanks in advance.