Large environment assistance

  • suprdave
    Junior Member
    • May 2011
    • 12

    #1

    Our current Zabbix setup appears to be missing data and thinking hosts are offline when they really aren't every once in a while. This is not a physical networking problem. The database itself seems to be keeping up fine. The following are our stats:

    # of proxies: ~90
    # of hosts monitored: ~8000
    # of DB items: 64,185
    # of new values per second:

    +---------------------+
    | count(*)/avg(delay) |
    +---------------------+
    |              2.5523 |
    +---------------------+
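For anyone checking the figure above: count(*)/avg(delay) divides the item count by the mean polling interval. A minimal sketch of that arithmetic follows, next to the alternative of summing each item's own rate (1/delay). The delay values are hypothetical, and the sketch assumes every delay is a positive number of seconds. The two estimates agree only when all items share the same delay; with mixed delays, count/avg gives the lower number.

```python
def nvps_count_over_avg(delays):
    """Item count divided by the mean delay, as in count(*)/avg(delay)."""
    return len(delays) / (sum(delays) / len(delays))


def nvps_sum_of_rates(delays):
    """Sum of per-item rates: an item polled every d seconds adds 1/d values per second."""
    return sum(1.0 / d for d in delays)


# Hypothetical delays in seconds, for illustration only (not taken from our database).
sample_delays = [30, 60, 3600]

print("count/avg(delay):", nvps_count_over_avg(sample_delays))
print("sum(1/delay):    ", nvps_sum_of_rates(sample_delays))
```

If the two numbers differ a lot on real data, a few items with very long delays are probably pulling the average up.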



    We believe there may simply be too many proxies for the Zabbix master to keep up with. We're thinking we need to lower the total number of proxies and have each proxy handle more hosts. Any thoughts, or is there other information I could provide that would help?

    Thanks!
  • tchjts1
    Senior Member
    • May 2008
    • 1605

    #2
    Originally posted by suprdave
    Our current Zabbix setup appears to be missing data
    Can you clarify that statement? Where are you missing data? Are you seeing data breaks in your graphs?

    Originally posted by suprdave
    and thinking hosts are offline when they really aren't every once in a while.
    Again, can you clarify this? What is telling you that your hosts are offline? Are you talking about the stock "Host is unreachable" alert? I have completely done away with that alert. I found it was a false alert more often than not. A more reliable trigger/alert is one built on the agent.ping item with a time threshold, such as 5 or 10 minutes. I have mine set at 20 minutes, so I get an alert if "no Zabbix data is being received for > 20 minutes". Do a forum search on "agent.ping" to get the exact syntax usage.
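To illustrate the logic of that kind of "no data" check (this is not the Zabbix trigger itself), here is a minimal sketch. The function name and timestamps are hypothetical. In older Zabbix expression syntax the trigger would look something like {HOST:agent.ping.nodata(1200)}=1, but treat that as an assumption and verify it against your version.

```python
NODATA_WINDOW = 20 * 60  # 20 minutes in seconds, the threshold mentioned above


def no_data_received(last_value_ts, now_ts, window=NODATA_WINDOW):
    """Return True when no agent.ping value arrived within the last `window` seconds.

    last_value_ts: Unix timestamp of the most recent value, or None if none was ever seen.
    now_ts: current Unix timestamp.
    """
    if last_value_ts is None:
        return True
    return (now_ts - last_value_ts) > window


# Hypothetical timestamps, for illustration only.
now = 1_000_000
print(no_data_received(now - 300, now))   # last value 5 minutes ago
print(no_data_received(now - 1500, now))  # last value 25 minutes ago
```

The point of the long window is that a short hiccup between proxy and server does not fire an alert; only a sustained silence does.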

    Regarding proxies: we now have about 2,000 monitored hosts and 14 proxies in our environment, with really only about 9 of them utilized. The proxy with the largest workload has ~350 hosts assigned to it.

    • suprdave
      Junior Member
      • May 2011
      • 12

      #3
      Hi!

      >Can you clarify that statement? Where are you missing data? Are you seeing data breaks in your graphs?

      - Poor choice of words, sorry. It's alerting on "Zabbix agent down" when the agent is in fact not down. The "Zabbix agent down" alerts seem to come in batches when a proxy appears to have trouble talking to the main Zabbix server. Typically just restarting the proxy resolves this problem.

      • xsbr
        Junior Member
        Zabbix Certified Specialist
        • Oct 2009
        • 25

        #4
        Originally posted by suprdave
        Typically just restarting the proxy resolves this problem.
        Please let me know which DB engine your proxy uses. If you're using SQLite, I can show you a simple script to measure the queue inside the proxy.
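As a rough idea of what such a script could look like (a sketch, not a definitive implementation): it counts the rows in the proxy's history buffer that have not yet been sent to the server. It assumes the proxy schema has a proxy_history table and an ids row (table_name 'proxy_history', field_name 'history_lastid') whose nextid holds the last id already sent. Those names are from memory, so check them against your proxy database first. The function name and the database path are hypothetical.

```python
import sqlite3


def proxy_queue_length(conn):
    """Count proxy_history rows newer than the last id recorded as sent.

    Assumes (unverified) that ids.nextid for ('proxy_history', 'history_lastid')
    is the id of the last row already forwarded to the server. If that row is
    missing, every buffered row is counted as unsent.
    """
    row = conn.execute(
        "SELECT COUNT(*) FROM proxy_history WHERE id > COALESCE("
        "(SELECT nextid FROM ids"
        " WHERE table_name = 'proxy_history' AND field_name = 'history_lastid'), 0)"
    ).fetchone()
    return row[0]


def main(db_path):
    """Open the proxy's SQLite file and print the unsent row count."""
    conn = sqlite3.connect(db_path)
    try:
        print("unsent values in proxy queue:", proxy_queue_length(conn))
    finally:
        conn.close()
```

Calling main() with the path to the proxy's SQLite file (for example a hypothetical /path/to/zabbix_proxy.db) every minute or so would show whether the queue grows before the alert batches start.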

        Are you monitoring by DNS or by IP? If you're monitoring by DNS, I recommend running a DNS cache on the proxy, because it does a lot of DNS queries.
