
Strange data losses in a server + proxy setup.

  • Loftyara
    Junior Member
    • Jun 2016
    • 10

    #1

    We run Zabbix server 5.0.4. For a number of reasons we use a large value for the Timeout parameter, and as a result even StartPollers=1000 is not enough. We saw this in the item "Utilization of poller data collector processes, in %" in the Zabbix server's self-monitoring.
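    Why a large Timeout exhausts even 1000 pollers can be made concrete with Little's law: the average number of busy pollers equals the check rate multiplied by the average time each check holds a poller. The sketch below is only an illustration under that assumption; the function names and the numbers (500 checks per second, 0.5 s and 3 s per check) are hypothetical, not measurements from this setup.

```python
import math


def busy_pollers(checks_per_second: float, avg_check_seconds: float) -> float:
    """Little's law: average number of pollers busy at any moment."""
    return checks_per_second * avg_check_seconds


def poller_utilization(checks_per_second: float, avg_check_seconds: float,
                       start_pollers: int) -> float:
    """Expected busy percentage; capped at 100 because pollers saturate."""
    busy = busy_pollers(checks_per_second, avg_check_seconds)
    return min(100.0, 100.0 * busy / start_pollers)


def pollers_needed(checks_per_second: float, avg_check_seconds: float,
                   target_pct: float) -> int:
    """Smallest poller count that keeps utilization at or below target_pct."""
    busy = busy_pollers(checks_per_second, avg_check_seconds)
    return math.ceil(busy * 100.0 / target_pct)


if __name__ == "__main__":
    rate = 500.0  # hypothetical checks per second
    print(poller_utilization(rate, 0.5, 1000))  # 25.0
    # If checks to unreachable hosts wait out a 3 s timeout, each holds a poller for 3 s.
    print(poller_utilization(rate, 3.0, 1000))  # 100.0 (1500 needed > 1000 available)
    print(pollers_needed(rate, 0.5, 70))        # 358
```

    The point is that utilization scales linearly with the time a check occupies a poller, so a few slow or timing-out hosts can cost as much capacity as many fast ones.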
    We added a proxy of the same version (5.0.4), running in active mode. We distributed the hosts between the server and the proxy so that their poller utilization is equal; it now averages ~68% on both. The server and the proxy are on the same LAN segment.
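    The post does not say how the hosts were split. One common way to equalize load between two collectors is a greedy "largest first" assignment: sort hosts by estimated poller load and give each one to whichever side is currently lighter. The sketch below assumes a per-host load estimate (for example checks per second times average check time) is already known; `split_hosts` and the sample loads are hypothetical.

```python
import heapq


def split_hosts(host_load, targets=("server", "proxy")):
    """Greedy largest-first split of hosts so that total load is balanced.

    host_load maps host name -> estimated poller load (hypothetical units).
    Returns (assignment, totals): hosts per target and summed load per target.
    """
    heap = [(0.0, name) for name in targets]  # (current load, target name)
    heapq.heapify(heap)
    assignment = {name: [] for name in targets}
    # Heaviest hosts first; each goes to the currently lightest target.
    for host, load in sorted(host_load.items(), key=lambda kv: kv[1], reverse=True):
        total, name = heapq.heappop(heap)
        assignment[name].append(host)
        heapq.heappush(heap, (total + load, name))
    totals = {name: total for total, name in heap}
    return assignment, totals


if __name__ == "__main__":
    loads = {"a": 5, "b": 4, "c": 3, "d": 3, "e": 1}
    assignment, totals = split_hosts(loads)
    print(assignment)  # {'server': ['b', 'c', 'e'], 'proxy': ['a', 'd']}
    print(totals)
```

    Greedy largest-first does not guarantee a perfect split, but it is simple and usually lands close, which matches the goal of roughly equal poller utilization on both sides.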
    No triggers are firing for the Zabbix proxy. Latest data shows no high utilization for any of its processes. The proxy's own queue is several hundred items.
    Latest data on the server does not show high utilization for any process either.
    But the queue shown on the Zabbix server is a problem (screenshot attached). Items monitored directly by the server are fine: their queue is several hundred items. The queue for items monitored through the proxy, however, is very large.
    The first thing we ruled out was the database. Monitoring shows no heavy load on it, and the server writes its own values without problems, as the small queue for server-monitored items shows.
    Next we queried the history tables. The items that are in the proxy queue really are missing recent values.
    Queue details show that completely random items are affected: different types, different devices, different sites. The only thing they have in common is that they are monitored through the proxy. The list is dynamic: items leave the queue and new ones join it. Some items stay in the queue for hours or even days.
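    The history check can be expressed as a small staleness test: an item is suspicious if its newest stored value is older than one update interval plus a grace period, and grouping those items by monitoring source shows whether only proxy-monitored items are affected. The sketch assumes the newest clock per itemid has already been pulled from the history tables; the `ItemState` record, the 5-second grace period, and the sample rows are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class ItemState:
    itemid: int
    delay_s: int               # update interval in seconds
    last_clock: Optional[int]  # Unix time of newest history value; None if no history
    proxy: Optional[str]       # proxy name; None if polled by the server itself


def stale_items(items, now, grace_s=5):
    """Items whose newest value is older than one interval plus a grace period."""
    return [
        it for it in items
        if it.last_clock is None or now - it.last_clock > it.delay_s + grace_s
    ]


def stale_by_source(items, now, grace_s=5):
    """Count stale items per monitoring source ('server' or the proxy name)."""
    return Counter(it.proxy or "server" for it in stale_items(items, now, grace_s))


if __name__ == "__main__":
    now = 1_000_000
    items = [
        ItemState(1, 60, now - 30, None),        # fresh, polled by the server
        ItemState(2, 60, now - 3600, "proxy1"),  # no value for an hour
        ItemState(3, 300, now - 200, "proxy1"),  # fresh
        ItemState(4, 60, None, "proxy1"),        # never stored a value
    ]
    print([it.itemid for it in stale_items(items, now)])  # [2, 4]
    print(stale_by_source(items, now))                    # Counter({'proxy1': 2})
```

    Running this periodically would also show the dynamic behavior described here, with different item IDs appearing and disappearing between runs.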

    There are no related errors in the logs on either side.
    Somewhere between the proxy, the server, and the database these values are lost, and once an item starts losing values, it keeps losing them for a long time.
    Has anyone come across something similar?
    [Attached screenshot: 1.jpg]
  • Loftyara
    Junior Member
    • Jun 2016
    • 10

    #2
    Originally posted by splitek
    If you have items stuck for days, I suggest going into that item and looking for the info icon; sometimes it contains an explanation of what is going wrong.
    Sometimes items go into an unsupported state. You can try clicking "Check now" and watching "Latest data" at the same time.
    If this is completely random (different items, templates, hosts), an item is sometimes stuck and sometimes not, and you have already checked this proxy and its DB, then it can be a network problem (but then it should show up in the server log or the host/agent logs).
    It's generally hard to tell what's wrong just by looking at the queue, but I always go from the queue to the item configuration.
    If an item goes into the unsupported state, it is not shown in the queue. These items don't have an error. They are active; they just don't get new values. It is as if the Zabbix server receives the item values from the proxy, and therefore does not mark the items unsupported, but does not write them to history.
    This cannot be a network problem. The two servers sit next to each other, Zabbix uses TCP, and a network problem would not be this selective.
    If we restart the server and the proxy, the situation repeats, but with other items.
    1-2% of all items get stuck, and they are different items every time.


    • Loftyara
      Junior Member
      • Jun 2016
      • 10

      #3
      Originally posted by splitek
      Check this: https://blog.zabbix.com/how-to-troub...h-queue/12244/

      One more thing: from time to time I see some of the items stuck in the queue while other items from the same agent are not. Pushing "Check now" in the item configuration makes them disappear from the queue. Another item that gets stuck frequently is a poorly written LLD rule. It returns duplicates from time to time, but it does return data. So why is the item stuck?
      Thanks, great article. Thanks to it I once again confirmed that there is no backlog on the proxy side. The queue size based on the ID difference in the database ranges from 0 to several thousand values, and several thousand values is the size of a single send. I also see several thousand values being sent to the server every second. Now I am even more convinced that the problem is on the server side. Yet the server stores its own values perfectly.
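      The distinction made here, between a backlog that oscillates around one send batch and one that keeps growing, can be checked by sampling the ID difference over time and fitting a trend. The sketch below assumes samples of (seconds, backlog) have already been collected; the function names, the "twice the batch size" bound, and the slope tolerance are hypothetical choices, not Zabbix settings.

```python
from statistics import mean


def backlog_slope(samples):
    """Least-squares slope of (t_seconds, backlog) samples, in values per second."""
    ts = [t for t, _ in samples]
    bs = [b for _, b in samples]
    t_mean, b_mean = mean(ts), mean(bs)
    num = sum((t - t_mean) * (b - b_mean) for t, b in samples)
    den = sum((t - t_mean) ** 2 for t in ts)
    if den == 0:
        raise ValueError("need samples at two or more distinct times")
    return num / den


def proxy_is_keeping_up(samples, batch_size, max_slope=1.0):
    """Bounded backlog: never far above one send batch and not trending upward."""
    peak = max(b for _, b in samples)
    return peak <= 2 * batch_size and backlog_slope(samples) <= max_slope


if __name__ == "__main__":
    sawtooth = [(0, 3000), (1, 0), (2, 3000), (3, 0)]  # fills up, then one send drains it
    growing = [(0, 1000), (60, 5000), (120, 9000), (180, 13000)]
    print(proxy_is_keeping_up(sawtooth, batch_size=3000))  # True
    print(proxy_is_keeping_up(growing, batch_size=3000))   # False
```

      With only a few samples the slope of a sawtooth is noisy, so in practice the window should span many send cycles before the trend is trusted.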


      • Loftyara
        Junior Member
        • Jun 2016
        • 10

        #4
        I have spotted the issue.
        The proxy polls the item correctly, and its value is written to the database.
        There are two more items on this host: one is a dependent item of this one, and the other is a calculated item that uses its result.
        According to the documentation and common sense, the Zabbix server processes dependent and calculated items itself.
        I see these items as active, but they have no history.
        Version 5.0.5 was released yesterday. I'll try updating and will report back on any changes.
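        To check whether every stuck item is derived from a proxy-polled value, one could walk the dependency graph from each proxy-polled master item and list the derived items that have no history. The item model below (a `master` ID for dependent items, an `inputs` list for calculated items) is a simplified, hypothetical representation, not the Zabbix database schema or API, and `has_history` stands in for whatever history lookup is used.

```python
from collections import deque


def derived_items(items, root):
    """IDs of all items computed, directly or transitively, from item `root`.

    `items` maps itemid -> {"master": id or None, "inputs": [ids]}.
    """
    children = {}
    for iid, spec in items.items():
        parents = []
        if spec.get("master") is not None:
            parents.append(spec["master"])  # dependent item
        parents.extend(spec.get("inputs", []))  # calculated item
        for p in parents:
            children.setdefault(p, []).append(iid)
    seen = set()
    queue = deque([root])
    while queue:  # breadth-first walk over derived items
        cur = queue.popleft()
        for child in children.get(cur, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def derived_without_history(items, root, has_history):
    """Derived items with no stored values, the symptom described in the post."""
    return sorted(i for i in derived_items(items, root) if not has_history(i))


if __name__ == "__main__":
    items = {
        100: {"master": None, "inputs": []},  # polled by the proxy
        101: {"master": 100},                 # dependent on 100
        102: {"inputs": [100]},               # calculated from 100
        200: {"master": None, "inputs": []},  # unrelated item
    }
    print(derived_without_history(items, 100, lambda i: i == 100))  # [101, 102]
```

        If the stuck items in the queue turn out to be exactly this derived set while the master items have history, that narrows the problem to where the derived values are computed rather than to the proxy-to-server transfer.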
