Collection/display problems?

  • pattieja
    Junior Member
    • Mar 2005
    • 14

    #1

    I've started to notice an issue with ZABBIX 1.1beta8: the graphs it generates show only sparse data. What is even more surprising is that when any host is powered off (i.e., becomes unreachable), the graphs for all the other hosts that are still reachable are affected for as long as that host stays unavailable. This seems like incorrect behavior to me, unless I am misunderstanding what is happening.

    I have set NoTimeWait=1 in my server configuration and in all agent configurations as well; I don't know whether this affects anything. I am also seeing entries in the Queue screen, but they always seem to drop to 0 and then start climbing again. I increased the timeout values in the agent and server configurations from 5 seconds to 30 seconds.

    The system running zabbix_server is hardly loaded at all. Current uptime shows:

    15:29:19 up 10 days, 4:12, 1 user, load average: 0.00, 0.01, 0.03

    So far I'm monitoring only 5 hosts in total, including the zabbix_server host.

    I don't understand why data gathering should be inhibited for systems that are available. This even happens to data collected through zabbix_agentd on the system running zabbix_server itself.

    A few pictures to illustrate (attached):
  • pattieja
    Junior Member
    • Mar 2005
    • 14

    #2
    Happened again

    Since this issue began, I have added more servers running zabbix_agentd (1.1beta8) to the mix. Last night I left 5 of them powered off without changing their status to "not monitored". Data collection for all the other, available hosts stopped almost from the moment I turned them off and stayed that way overnight, until I turned them back on just now. There were over 600 entries in the "More than 5 minutes" queue.

    Surely I am not the only person experiencing this issue? If others are seeing it too, hopefully it's just a configuration problem and not a problem with the design of ZABBIX.


    • Alexei
      Founder, CEO
      Zabbix Certified Trainer
      Zabbix Certified Specialist
      Zabbix Certified Professional
      • Sep 2004
      • 5654

      #3
      Anything interesting in zabbix_server's log file?
      Alexei Vladishev
      Creator of Zabbix, Product manager
      New York | Tokyo | Riga


      • pattieja
        Junior Member
        • Mar 2005
        • 14

        #4
        Just a bunch of:
        008110:20060421:024132 Timeout while connecting to [app-vs05-dev]
        008110:20060421:024132 Host [app-vs05-dev] will be checked after [60] seconds
        (randomly picked from the zabbix_server.log file)

        These go back and forth all night, pretty much only for app-vs05-dev and app-vs06-dev.

        However, at the time I turned them off yesterday, this appeared in the log:

        008110:20060420:184132 Cannot connect to [app-vs06-dev] [Connection refused]
        008110:20060420:184132 Started network errors for [app-vs06-dev]
        008110:20060420:184132 Host [app-vs06-dev]: another network error, wait for 5 seconds
        008110:20060420:184132 Cannot connect to [app-vs04-dev] [Connection refused]
        008110:20060420:184132 Started network errors for [app-vs04-dev]
        008110:20060420:184132 Host [app-vs04-dev]: another network error, wait for 5 seconds
        008110:20060420:184132 Cannot connect to [app-vs05-dev] [Connection refused]
        008110:20060420:184132 Started network errors for [app-vs05-dev]
        008110:20060420:184132 Host [app-vs05-dev]: another network error, wait for 5 seconds
        008110:20060420:184132 Cannot connect to [app-vs07-dev] [Connection refused]
        008110:20060420:184132 Started network errors for [app-vs07-dev]
        008110:20060420:184132 Host [app-vs07-dev]: another network error, wait for 5 seconds
        008110:20060420:184135 Cannot connect to [app-vs06-dev] [Connection refused]
        008110:20060420:184135 Host [app-vs06-dev]: another network error, wait for 5 seconds
        008110:20060420:184137 Cannot connect to [app-vs04-dev] [Connection refused]
        008110:20060420:184137 Host [app-vs04-dev]: another network error, wait for 5 seconds
        008110:20060420:184138 Cannot connect to [app-vs05-dev] [Connection refused]
        008110:20060420:184138 Host [app-vs05-dev]: another network error, wait for 5 seconds
        008110:20060420:184138 Cannot connect to [app-vs07-dev] [Connection refused]
        008110:20060420:184138 Host [app-vs07-dev]: another network error, wait for 5 seconds
        008110:20060420:184140 Cannot connect to [app-vs06-dev] [Connection refused]
        008110:20060420:184140 Host [app-vs06-dev]: another network error, wait for 5 seconds
        008110:20060420:184142 Cannot connect to [app-vs04-dev] [Connection refused]
        008110:20060420:184142 Host [app-vs04-dev]: another network error, wait for 5 seconds
        008110:20060420:184143 Cannot connect to [app-vs05-dev] [Connection refused]
        008110:20060420:184143 Host [app-vs05-dev]: another network error, wait for 5 seconds
        008110:20060420:184144 Cannot connect to [app-vs07-dev] [Connection refused]
        008110:20060420:184144 Host [app-vs07-dev]: another network error, wait for 5 seconds
        008110:20060420:184145 Cannot connect to [app-vs06-dev] [Connection refused]
        008110:20060420:184145 Host [app-vs06-dev]: another network error, wait for 5 seconds
        008110:20060420:184217 Timeout while connecting to [app-vs04-dev]
        008110:20060420:184217 Host [app-vs04-dev]: another network error, wait for 5 seconds
        008110:20060420:184247 Timeout while connecting to [app-vs04-dev]
        008110:20060420:184247 Host [app-vs04-dev] will be checked after [60] seconds
        008110:20060420:184317 Timeout while connecting to [app-vs06-dev]
        008110:20060420:184317 Host [app-vs06-dev] will be checked after [60] seconds
        008110:20060420:184347 Timeout while connecting to [app-vs04-dev]
        008110:20060420:184347 Host [app-vs04-dev] will be checked after [60] seconds
        008110:20060420:184417 Timeout while connecting to [app-vs07-dev]
        008110:20060420:184417 Host [app-vs07-dev] will be checked after [60] seconds

        After this flurry, the log settled into alternately checking app-vs05-dev and app-vs06-dev every 60 seconds for the rest of the night, until I turned all the systems back on. Every once in a while during the night, app-vs04-dev also appeared as timed out and to be checked again after 60 seconds. app-vs07-dev, however, did not appear again after its last entry until almost exactly 12 hours later, when it was once more reported as timed out and to be checked again after 60 seconds.
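The timeout entries in the log excerpt above can be checked mechanically. The following is a minimal Python sketch that assumes only the `<pid>:<YYYYMMDD>:<HHMMSS> <message>` layout visible in those lines; the helper names `timeout_events` and `gaps_seconds` are illustrative, not part of Zabbix. It pulls out the "Timeout while connecting" entries from 18:42:17 to 18:44:17 and measures the spacing between them.

```python
import re
from datetime import datetime

# Layout as seen in the excerpt: "<pid>:<YYYYMMDD>:<HHMMSS> <message>"
LINE_RE = re.compile(r"^(\d+):(\d{8}):(\d{6}) (.*)$")
TIMEOUT_RE = re.compile(r"Timeout while connecting to \[([^\]]+)\]")


def timeout_events(lines):
    """Return (timestamp, host) for every 'Timeout while connecting' entry."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a log line
        _pid, day, clock, message = m.groups()
        t = TIMEOUT_RE.search(message)
        if t:
            ts = datetime.strptime(day + clock, "%Y%m%d%H%M%S")
            events.append((ts, t.group(1)))
    return events


def gaps_seconds(events):
    """Seconds elapsed between consecutive timeout entries."""
    return [int((b[0] - a[0]).total_seconds()) for a, b in zip(events, events[1:])]


sample = """\
008110:20060420:184217 Timeout while connecting to [app-vs04-dev]
008110:20060420:184217 Host [app-vs04-dev]: another network error, wait for 5 seconds
008110:20060420:184247 Timeout while connecting to [app-vs04-dev]
008110:20060420:184247 Host [app-vs04-dev] will be checked after [60] seconds
008110:20060420:184317 Timeout while connecting to [app-vs06-dev]
008110:20060420:184317 Host [app-vs06-dev] will be checked after [60] seconds
008110:20060420:184347 Timeout while connecting to [app-vs04-dev]
008110:20060420:184347 Host [app-vs04-dev] will be checked after [60] seconds
008110:20060420:184417 Timeout while connecting to [app-vs07-dev]
008110:20060420:184417 Host [app-vs07-dev] will be checked after [60] seconds
""".splitlines()

events = timeout_events(sample)
print(gaps_seconds(events))  # [30, 30, 30, 30]
```

Each timeout lands exactly 30 seconds after the previous one, matching the 30-second timeout mentioned in the first post, and every entry carries the same process ID (008110). That pattern would be consistent with one process waiting out the full timeout on each unreachable host before moving on, though the log alone does not prove it.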

        When I arrived in the morning, the graphs for every server monitored by ZABBIX were empty ([no data]). The moment I turned these 4 servers (app-vs[04-07]-dev) back on, the graphs started showing data again for all monitored hosts.


        • pattieja
          Junior Member
          • Mar 2005
          • 14

          #5
          Also, if I leave the hosts defined but set their status to "not monitored", all the graphs work fine for the other hosts (as long as those are all available). If a monitored server becomes unavailable, the graphs become sparse or show no data for as long as it stays unavailable.

          It also appears that the more servers are unavailable, the worse the "non-graphing" becomes.
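One way to picture why the gaps could grow with the number of unreachable hosts is a toy model. It is an assumption, not something established in this thread, that a single poller works through its checks strictly one after another and blocks for the full timeout on every check against a host that is down. The 50 ms cost of a check against a reachable host is an invented figure, and the model ignores the retry backoff ("will be checked after [60] seconds") visible in the log. `cycle_seconds` is a hypothetical helper.

```python
def cycle_seconds(reachable_checks, unreachable_checks, timeout_s, check_ms=50):
    """Seconds for ONE poller to work through every queued check once.

    Hypothetical model: checks run strictly in sequence; a check against a
    reachable host costs check_ms milliseconds (an invented figure), and a
    check against an unreachable host blocks for the full timeout.
    """
    total_ms = reachable_checks * check_ms + unreachable_checks * timeout_s * 1000
    return total_ms / 1000


if __name__ == "__main__":
    # 100 checks against healthy hosts, plus 0, 1 or 4 checks against hosts that are down.
    for down in (0, 1, 4):
        for timeout_s in (5, 30):
            print(f"down={down} timeout={timeout_s}s -> cycle {cycle_seconds(100, down, timeout_s)} s")
```

Under these assumptions, one pass takes 5 seconds with every host up, but four down hosts stretch it to 25 seconds with a 5-second timeout and to 125 seconds with a 30-second timeout. In such a model every item, on reachable hosts too, would be polled less often than configured, and raising the timeout would amplify the effect rather than relieve it.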


          • krusty
            Senior Member
            • Oct 2005
            • 222

            #6
            Same problem with 1.1beta9

            I have the same problem with 1.1beta9.

            I checked the database, but I cannot find any problems or missing data; Zabbix is collecting the data correctly. However, some graphs are not displayed at all, and others are displayed only some of the time. Can anybody help me?

            I have changed the values for servers, pollers and trappers in zabbix_server.conf, because I think the problem lies there.
            After the change, some graphs are displayed, but not all. How can I find out which values are optimal for my server?

            If you need more information, please ask.
