Interrupted system call !!

  • azurit
    Junior Member
    • Aug 2007
    • 23

    #1

    Interrupted system call !!

    hi,

    I'm still having problems with host availability:

    3020:20080305:161214 Timeout while answering request
    3020:20080305:161214 Get value from agent failed. Error: ZBX_TCP_READ() failed [Interrupted system call]
    3020:20080305:161214 Host [XXX]: first network error, wait for 15 seconds
    3020:20080305:161214 Parameter [agent.ping] will be checked after 120 seconds on host [XXX]

    This happens about once a week for a random host (including localhost), and I also get an e-mail saying 'host unreachable'. The hosts are, of course, always up and working.

    Debian Etch, Zabbix version 1.4.5 dev (downloaded on 02.03.2008), both server and client. The same thing happened on 1.4.3.

    Please respond, Alexei, this is a very serious problem for me (Zabbix is useless in this state, it's generating LOTS of false positives). Do you need any other info?
  • Alexei
    Founder, CEO
    Zabbix Certified Trainer
    Zabbix Certified Specialist
    Zabbix Certified Professional
    • Sep 2004
    • 5654

    #2
    ZABBIX has failed to connect to the agent due to a timeout, as simple as that. Why do you think this is ZABBIX's fault? I'd like to see at least one piece of evidence before any troubleshooting!
    Alexei Vladishev
    Creator of Zabbix, Product manager
    New York | Tokyo | Riga
    My Twitter


    • azurit
      Junior Member
      • Aug 2007
      • 23

      #3
      Originally posted by Alexei
      ZABBIX has failed to connect to the agent due to a timeout, as simple as that. Why do you think this is ZABBIX's fault? I'd like to see at least one piece of evidence before any troubleshooting!
      I can't say for sure, of course, but everything else is working fine. How can I find out what is causing this? Or how can I prove that it's Zabbix's fault?


      • azurit
        Junior Member
        • Aug 2007
        • 23

        #4
        Alexei, I'm running 'agent.ping' every 30 seconds and my 'unreachability' trigger is set like this:
        {host:agent.ping.nodata(90)}=1

        So if there is no data for 90 seconds (3 checks), the host is considered unreachable. What I don't understand is why I'm getting 'host unreachable' because of _one_ network error. Two more checks should be made before the host is considered unreachable.


        • Alexei
          Founder, CEO
          Zabbix Certified Trainer
          Zabbix Certified Specialist
          Zabbix Certified Professional
          • Sep 2004
          • 5654

          #5
          Originally posted by azurit
          Alexei, I'm running 'agent.ping' every 30 seconds and my 'unreachability' trigger is set like this:
          {host:agent.ping.nodata(90)}=1

          So if there is no data for 90 seconds (3 checks), the host is considered unreachable. What I don't understand is why I'm getting 'host unreachable' because of _one_ network error. Two more checks should be made before the host is considered unreachable.
          nodata(90) means that no data has arrived for agent.ping in the last 90 seconds, for ANY reason: high latency, performance problems on the ZABBIX server side, failed checks, network problems, anything!
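
To make the nodata(90) point concrete, here is a minimal sketch of the idea in Python. It is only an illustration, not Zabbix's actual implementation: the function name `nodata`, the timestamp list, and the exact handling of the 90-second boundary are all assumptions made for the example.

```python
def nodata(value_times, now, period):
    """Return 1 if no value arrived within the last `period` seconds, else 0.

    Hypothetical helper: `value_times` holds the timestamps (in seconds)
    at which agent.ping values were actually stored. The function does not
    care WHY a value is missing, only that nothing arrived in the window.
    """
    recent = [t for t in value_times if 0 <= now - t < period]
    return 0 if recent else 1


# Values were stored at t=0, 30 and 60, then nothing more came in.
received = [0, 30, 60]

print(nodata(received, now=140, period=90))  # 0: last value is 80 s old
print(nodata(received, now=160, period=90))  # 1: last value is 100 s old
```

The trigger {host:agent.ping.nodata(90)}=1 therefore counts elapsed time without data, not the number of failed checks.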

          • azurit
            Junior Member
            • Aug 2007
            • 23

            #6
            Originally posted by Alexei
            nodata(90) means that no data has arrived for agent.ping in the last 90 seconds, for ANY reason: high latency, performance problems on the ZABBIX server side, failed checks, network problems, anything!
            Yes, but no data for _3_ checks (because I'm doing 1 check every 30 seconds). There is only one message about network problems in the logs, which probably means that one check failed (no data). What about the other two checks? Why is the host immediately considered 'unreachable'?


            • Alexei
              Founder, CEO
              Zabbix Certified Trainer
              Zabbix Certified Specialist
              Zabbix Certified Professional
              • Sep 2004
              • 5654

              #7
              Originally posted by azurit
              Yes, but no data for _3_ checks (because I'm doing 1 check every 30 seconds). There is only one message about network problems in the logs, which probably means that one check failed (no data). What about the other two checks? Why is the host immediately considered 'unreachable'?
              One check per 30 seconds under ideal conditions. After a timeout, ZABBIX protects itself by disabling the host for 60 seconds (if I recall correctly). You got one timeout, and that is enough to make all three checks fail.
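
The arithmetic behind this can be sketched with a toy timeline in Python. This is a simplified model under stated assumptions, not the real ZABBIX scheduler: the function `longest_gap` and its parameters are made up for illustration. The 60-second figure is the one recalled above, while the log in post #1 reports that agent.ping "will be checked after 120 seconds".

```python
def longest_gap(interval, failed_at, recheck_delay, horizon):
    """Longest gap (in seconds) between stored values when one check times out.

    Toy model: checks run every `interval` seconds starting at t=0. The
    check at `failed_at` stores nothing, and the item is next checked
    `recheck_delay` seconds later. Pass failed_at=None for no failure.
    """
    stored = []
    t = 0
    while t <= horizon:
        if t == failed_at:
            t += recheck_delay  # host is skipped, no value stored
            continue
        stored.append(t)
        t += interval
    return max(b - a for a, b in zip(stored, stored[1:]))


print(longest_gap(30, None, 0, 300))   # 30: no failure, a value every 30 s
print(longest_gap(30, 60, 60, 300))    # 90: one timeout, recheck after 60 s
print(longest_gap(30, 60, 120, 300))   # 150: one timeout, recheck after 120 s
```

In this model a single timeout already opens a gap of 90 seconds with a 60-second pause, and 150 seconds with the 120-second recheck from the log, which is enough for a 90-second nodata window to see no values.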

              • azurit
                Junior Member
                • Aug 2007
                • 23

                #8
                Originally posted by Alexei
                One check per 30 seconds under ideal conditions. After a timeout, ZABBIX protects itself by disabling the host for 60 seconds (if I recall correctly). You got one timeout, and that is enough to make all three checks fail.
                So what do you suggest as a solution? How should I check for availability?

