We run Zabbix server 5.0.4. For a number of reasons the Timeout parameter is set to a large value, and because of that even StartPollers=1000 is not enough. We saw this in the item "Utilization of poller data collector processes in %" in the Zabbix server's self-monitoring.
We added a proxy of the same version, 5.0.4, running in active mode. We distributed the hosts between the server and the proxy so that poller utilization is about the same on both; the average is now ~68% on each. The server and the proxy are on the same LAN segment.
No triggers are firing on the Zabbix proxy. Latest data for the proxy does not show high utilization for any process. The proxy's own queue is only several hundred items.
Latest data on the server does not show high utilization for any process either.
But the queue on the Zabbix server looks bad (screenshot attached). Items monitored by the Zabbix server itself are fine: their queue is several hundred items. The queue for items monitored through the proxy, however, is very large.
The first thing we ruled out was the database. Monitoring shows no significant load on it, and values collected by the server itself are written without problems, as the small queue for those items shows.
Next we queried the history tables directly. Recent values really are missing for the items that appear in the proxy queue.
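For anyone who wants to repeat that check, here is a minimal Python sketch of the idea: take the newest `clock` per item and flag items whose last value is older than their update interval plus a grace period. It rests on several assumptions: an in-memory SQLite database stands in for the real MySQL/PostgreSQL backend, only `items.itemid`, `items.delay`, `history.itemid` and `history.clock` are used, and interval strings are plain values like `30`, `30s` or `1m` (no user macros, no flexible intervals). In a real Zabbix database values are split by type across `history`, `history_uint`, `history_str` and other tables, so a real query would need to cover each of them. The helper names and the `grace` threshold are hypothetical.

```python
import sqlite3
import time

# Seconds per suffix for simple Zabbix-style interval strings (assumption: no macros).
SUFFIX_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def delay_to_seconds(delay: str) -> int:
    """Convert a simple interval string such as '30', '30s' or '1m' to seconds."""
    delay = delay.strip()
    if delay and delay[-1] in SUFFIX_SECONDS:
        return int(delay[:-1]) * SUFFIX_SECONDS[delay[-1]]
    return int(delay)


def stale_items(conn, now, grace=5):
    """Return (itemid, seconds_overdue) for items whose newest value is older
    than delay + grace; seconds_overdue is None if the item has no history."""
    rows = conn.execute(
        """
        SELECT i.itemid, i.delay, MAX(h.clock)
        FROM items i LEFT JOIN history h ON h.itemid = i.itemid
        GROUP BY i.itemid, i.delay
        ORDER BY i.itemid
        """
    ).fetchall()
    result = []
    for itemid, delay, last_clock in rows:
        if last_clock is None:
            result.append((itemid, None))  # never received a value
            continue
        overdue = now - last_clock - delay_to_seconds(delay)
        if overdue > grace:
            result.append((itemid, overdue))
    return result


def build_demo_db(now):
    """Hypothetical demo data using a simplified items/history schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (itemid INTEGER PRIMARY KEY, delay TEXT)")
    conn.execute(
        "CREATE TABLE history (itemid INTEGER, clock INTEGER, value REAL, ns INTEGER)"
    )
    conn.executemany(
        "INSERT INTO items VALUES (?, ?)",
        [(1, "1m"), (2, "30s"), (3, "5m"), (4, "60")],
    )
    conn.executemany(
        "INSERT INTO history VALUES (?, ?, ?, 0)",
        [(1, now - 90, 1.0), (1, now - 30, 1.0), (2, now - 3600, 2.0), (4, now - 64, 4.0)],
    )
    return conn


if __name__ == "__main__":
    now = int(time.time())
    print(stale_items(build_demo_db(now), now))
```

With the demo data, item 2 is reported as roughly an hour overdue and item 3 as having no values at all, while items 1 and 4 are within their interval plus grace.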
The queue details show that the missing items are completely random: different types, different devices, different sites. The only thing they have in common is that they are collected through the proxy. The list is dynamic: items drop out of the queue and new ones are added to it. Some items stay in the queue for hours or even days.
There are no related errors in the logs on either side.
Somewhere on the path from the proxy to the server to the database these values get lost, and once an item starts losing values, it keeps losing them for a long time.
Has anyone come across something similar?