Agent not responding when host busy?

  • gryphius
    Member
    • Aug 2007
    • 30

    #1

    Agent not responding when host busy?

    Last night my boss and I got woken up at 4 am by a "zabbix agent not responding" alert. We use {tmpl_agent_important:agent.ping.nodata(300)}=1 as the alerting trigger (send an SMS if no data arrives for 5 minutes; agent.ping is checked every 30 seconds).

    It seems that the agent did not return any data for about 7 minutes. At that time, the monitored host was under load, doing logrotates.

    Has anyone had similar problems with Linux agents? (I've found similar threads, but they all talk about Windows only.)

    Could you give advice on how to work around this false positive? I don't just want to disable alerting between 03:45 and 04:15 because I fear I could miss real alerts. Any better options?

    Environment: Zabbix 1.4.2, Server Centos 5, Agent Centos 4.5
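To make the timing concrete, here is a minimal Python sketch of how a nodata(300) check reacts to a 7-minute gap in agent.ping values arriving every 30 seconds. It is a simplified model, not Zabbix's actual implementation: the function name `nodata_fires`, the sample timestamps, and the exact handling of the window boundary are assumptions made for illustration.

```python
def nodata_fires(received_at, now, period=300):
    """Return True if no value arrived in the last `period` seconds before `now`.

    Simplified model of a nodata(period) check; the real server evaluates
    such triggers on its own schedule, which is not modelled here.
    """
    return not any(now - period < t <= now for t in received_at)


# agent.ping arrives every 30 s until t=600, then nothing for 7 minutes
# (420 s), then values resume at t=1020.
samples = list(range(0, 601, 30)) + list(range(1020, 1201, 30))

for now in (899, 901, 1019, 1020):
    print(now, nodata_fires(samples, now))

# The same gap with a hypothetical 10-minute window never fires.
print(any(nodata_fires(samples, t, period=600) for t in range(600, 1201)))
```

In this model the 5-minute window fires about 300 seconds into the gap and clears as soon as the agent answers again, while a window longer than the gap stays quiet throughout, at the cost of reacting later to a real outage.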
  • rts
    Member
    • May 2007
    • 54

    #2
    We've got similar problems. Certain servers are rotated out of production to do processing tasks, and we know they're going to be heavily loaded. However, Zabbix reports them as being offline when we know they're actually still online, just a bit busy. What's the solution here?


    • oliverm
      Senior Member
      • May 2006
      • 155

      #3
      Same here. We have a client with a ropy old web server. A couple of times a day they have it do something (zip up logs before downloading, unzip some large client upload) and the alerts go off.

      Problem is, we have learnt to ignore them, which is worse than acting on them. I'd love to find a solution.

      Olly


      • gryphius
        Member
        • Aug 2007
        • 30

        #4
        "we have learnt to ignore them"
        As I can't actually ignore my mobile ringing at 4 am, I have increased the timeout before Zabbix sends an SMS. But this is just a workaround I'm not really happy with. Has anyone got a solution that solves the real problem, i.e. the Zabbix agent not responding?


        • oliverm
          Senior Member
          • May 2006
          • 155

          #5
          Thinking wildly here. If the server is overloaded just before it stops responding, could you perhaps set a dependency on the alert/trigger so that it checks the last value of the CPU usage?
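The idea can be sketched in Python as a small decision function. This only illustrates the logic, not how Zabbix trigger dependencies are configured: the name `should_alert`, the load threshold of 5.0, and the 900-second hard limit are all hypothetical values. The hard limit is an added assumption so that a host which dies while busy is still reported eventually.

```python
def should_alert(seconds_since_last_ping, last_cpu_load,
                 nodata_period=300, load_threshold=5.0, hard_limit=900):
    """Decide whether a silent agent should page someone.

    All thresholds are illustrative assumptions, not Zabbix defaults.
    """
    if seconds_since_last_ping < nodata_period:
        return False  # agent answered recently enough
    if seconds_since_last_ping >= hard_limit:
        return True   # silent for too long, busy or not
    # Silent, but not for long: hold back only if the last known load was high.
    host_was_busy = last_cpu_load is not None and last_cpu_load >= load_threshold
    return not host_was_busy


print(should_alert(420, last_cpu_load=12.0))   # busy host, 7-minute gap
print(should_alert(420, last_cpu_load=0.3))    # idle host, 7-minute gap
print(should_alert(1000, last_cpu_load=12.0))  # busy host, silent past the hard limit
```

The trade-off is that a genuinely dead host is reported later if its last recorded load happened to be high.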


          • nelsonab
            Senior Member
            Zabbix Certified Specialist, Zabbix Certified Professional
            • Sep 2006
            • 1233

            #6
            Try changing the "nice" level for the agent process so it has higher priority than other processes. This may improve the chances that the agent will receive a slice of the CPU when the Zabbix server tries to connect to the agent. I don't think the agent is able to set its own nice level as of yet, so it will have to be done manually. I haven't played with nice much lately and my brain is a little fuzzy, so a quick education about renice would be in order before you use it. :-) Top can also change the nice level, but don't set it too low or your computer may become unresponsive.
            RHCE, author of zbxapi
            Ansible, the missing piece (Zabconf 2017): https://www.youtube.com/watch?v=R5T9NidjjDE
            Zabbix and SNMP on Linux (Zabconf 2015): https://www.youtube.com/watch?v=98PEHpLFVHM
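As a rough illustration of the renice suggestion, the following Python sketch sets the nice value of a process through the standard `os.setpriority` call (Unix only). The helper names `clamp_nice` and `set_nice` are made up for this example, and finding the agent's process IDs is left out. Lowering a nice value below its current level normally requires root.

```python
import os

NICE_MIN, NICE_MAX = -20, 19  # Linux nice range; lower value means higher priority


def clamp_nice(value):
    """Keep a requested nice value inside the valid range."""
    return max(NICE_MIN, min(NICE_MAX, value))


def set_nice(pid, niceness):
    """Set the absolute nice value of `pid`.

    Returns True on success, False if the caller lacks permission
    or the process does not exist.
    """
    try:
        os.setpriority(os.PRIO_PROCESS, pid, clamp_nice(niceness))
    except (PermissionError, ProcessLookupError):
        return False
    return True


# Harmless demonstration: re-apply this process's current nice value.
current = os.getpriority(os.PRIO_PROCESS, 0)
print(set_nice(os.getpid(), current))
```

For the agent itself one would call something like `set_nice(agent_pid, -5)` as root for each agent process, where `agent_pid` is a placeholder. A moderate value such as -5 is only a suggestion, in line with the warning above about going too low.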


            • rts
              Member
              • May 2007
              • 54

              #7
              One solution

              I've managed to stop receiving alerts by changing the UnreachablePeriod in zabbix_server.conf from 45 seconds to 180 seconds. My understanding of this parameter is that the host is deemed unreachable if no items are returned within this time period.

              If my understanding is wrong, then I'd love someone to correct me. However, it does seem to have the desired effect.
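For anyone who wants to script that change, here is a small Python sketch that rewrites a `Name=value` line in configuration text. The helper `set_param` is hypothetical, and the sample text is made up rather than copied from a real zabbix_server.conf. Only the parameter name UnreachablePeriod and the values 45 and 180 come from the post above.

```python
import re


def set_param(conf_text, name, value):
    """Return conf_text with `name=value` set.

    Replaces the first active or commented-out `name=...` line,
    or appends a new line if the parameter is absent.
    """
    pattern = re.compile(rf"^[ \t]*#?[ \t]*{re.escape(name)}[ \t]*=.*$", re.MULTILINE)
    line = f"{name}={value}"
    if pattern.search(conf_text):
        return pattern.sub(lambda m: line, conf_text, count=1)
    return conf_text.rstrip("\n") + "\n" + line + "\n"


sample = "# example settings\nUnreachablePeriod=45\n"
print(set_param(sample, "UnreachablePeriod", 180))
```

The server most likely needs a restart before it picks up the new value.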
