
Trigger Flapping from proxy items

  • Gigoudt
    Junior Member
    • Sep 2010
    • 22

    #1

    Trigger Flapping from proxy items

    Hi,

    I'm having a strange problem with my Zabbix implementation. I hope someone has a clue or has had the same problem.

    For some time now, the triggers for a lot of hosts have been flapping on the dashboard for no apparent reason. The trigger uses an icmpping check ORed with nodata().

    When you look at the collected data, the host is up and the timestamps of the collected values are within the nodata() period. I can't figure out why the trigger shows a problem and why it keeps flapping. The data is collected, and on time!

    Polled data doesn't have this problem. All of these trigger problems involve data collected from proxies. (I don't think adding more trappers is the solution, because the data is being collected.)

    Also, "Zabbix server is running" sometimes turns to NO and, after a short time, back to YES. There is no error in the server log.

    What does Zabbix use to determine the "Zabbix server is running" status?

    The strange thing is that the data is collected, and the problem occurs only for proxy items.

    I hope someone can give me a hint on where to keep searching.

    I'm using Zabbix 1.8.2.

    Thanks
  • untergeek
    Senior Member
    Zabbix Certified Specialist
    • Jun 2009
    • 512

    #2
    1.8.2 has a number of shortcomings. Among the first things to change would be to upgrade to 1.8.4, or better yet, 1.8.5RC2.

    Beyond that, post your server configuration (names redacted for security, of course) and your number of items, hosts, triggers and new values per second so we can try to help with all available information. Also useful will be what hardware you are using for both the zabbix_server and the DB backend.

    Have you checked your "Queue" in Administration -> Queue in the UI to see if things are getting backed up?


    • Gigoudt
      Junior Member
      • Sep 2010
      • 22

      #3
      Untergeek, thanks for the reply. Here are the settings from the server .conf file:


      StartPollers=70
      # StartIPMIPollers=0
      StartPollersUnreachable=20
      StartTrappers=80
      StartPingers=15
      # StartDiscoverers=1
      # StartHTTPPollers=1
      # HousekeepingFrequency=1
      # MaxHousekeeperDelete=500
      # DisableHousekeeping=0
      # SenderFrequency=30
      CacheSize=256M
      # CacheUpdateFrequency=60
      HistoryCacheSize=64M
      # TrendCacheSize=4M
      # HistoryTextCacheSize=16M
      # NodeNoEvents=0
      # NodeNoHistory=0
      Timeout=6
      # TrapperTimeout=300
      # TrapperTimeout=5
      # UnreachablePeriod=45
      # UnavailableDelay=60
      # UnreachableDelay=15
      All lines starting with # are commented out, so the defaults apply.


      Hosts: 8160 (2000 polled with agents; the rest are SNMP, external and simple check items monitored from proxies)
      Items: 93003
      Triggers: 28872
      Required server performance, new values per second: 201.7149

      Number of proxies: 2000


      Hardware

      Zabbix server:
      Single CPU, 4 cores, 2.27 GHz
      5 GB RAM

      DB:
      Dual CPU, 2x4
      12 GB RAM

      Thanks for any advice and suggested changes. As you say, I'm going to upgrade to 1.8.4.

      But I can't figure out why the server doesn't evaluate the triggers correctly, as explained in the first post.

      Bye


      • untergeek
        Senior Member
        Zabbix Certified Specialist
        • Jun 2009
        • 512

        #4
        Might I also suggest increasing timeout from 6 to the 30 that is allowed?

        When you deal with a lot of items, latency tends to increase. With a higher timeout, you give zabbix more time to process requests before they are dropped. It is possible that your requests are being dropped because of a low timeout.


        • richlv
          Senior Member
          Zabbix Certified Trainer
          Zabbix Certified Specialist
          Zabbix Certified Professional
          • Oct 2005
          • 3112

          #5
          I'd strongly suggest thinking twice before increasing the timeout, though. If many systems are unreachable, pollers may sit waiting for a long time before timing out, leaving them no chance to gather data from other, available systems.
          Zabbix 3.0 Network Monitoring book
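
To put rough numbers on the trade-off discussed above, here is a back-of-the-envelope sketch in Python. It is a simplified model, not how Zabbix actually schedules checks: it assumes every check against an unreachable system blocks one poller for the full timeout, and it ignores the separate unreachable pollers. The figure of 500 failing checks per minute is purely hypothetical; the 70 pollers and the timeouts of 6 and 30 seconds come from the thread.

```python
def poller_time_fraction(failing_checks_per_min, timeout_s, pollers):
    """Fraction of total poller time spent waiting on timeouts.

    Simplified model: each failing check holds one poller for the
    whole timeout. A result above 1.0 means the pollers cannot even
    finish the failing checks, let alone the healthy ones.
    """
    busy_seconds = failing_checks_per_min * timeout_s
    capacity_seconds = pollers * 60  # poller-seconds available per minute
    return busy_seconds / capacity_seconds


# Hypothetical load: 500 failing checks per minute, 70 pollers.
for timeout in (6, 30):
    fraction = poller_time_fraction(500, timeout, 70)
    print(f"Timeout={timeout}: {fraction:.2f} of poller capacity")
```

Under these assumptions, raising the timeout from 6 to 30 multiplies the time lost to unreachable systems by five, which is the risk described in the post above.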


          • Gigoudt
            Junior Member
            • Sep 2010
            • 22

            #6
            OK, I will try that. The thing is that this is production, so I will wait for the weekend to change the parameter and restart.

            Do you think the other values and the hardware are OK?

            Bye


            • Gigoudt
              Junior Member
              • Sep 2010
              • 22

              #7
              I couldn't wait, so I made the change!

              I will post the results later!

              Thanks again to both of you


              • Gigoudt
                Junior Member
                • Sep 2010
                • 22

                #8
                The trigger flapping continues, the same as in the first post!

                I don't think the timeout will help, because the data is being collected. The problem occurs when the trigger is calculated (and only for proxy items), and the timeout has nothing to do with that.

                I will try a wider nodata() range.
                Could UnreachablePeriod help?


                • Gigoudt
                  Junior Member
                  • Sep 2010
                  • 22

                  #9
                  Hi,

                  Changing nodata() to use more seconds helped a lot with the flapping.

                  I also found out that there was a bandwidth problem. With that solved, the whole system is working really well.

                  It seems that the data collected by the proxies was getting stuck because of the bandwidth problem. The change to nodata() also helps.
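
For anyone hitting the same symptom, the effect can be reproduced with a toy model in Python. This is only a sketch under stated assumptions, not Zabbix's actual implementation: it assumes a nodata()-style check is re-evaluated every 30 seconds and compares the current time with the moment the last value reached the server. The 300-second delivery bursts and the window sizes of 120 and 600 seconds are hypothetical numbers chosen for illustration.

```python
def count_flaps(arrival_times, window, check_every=30, horizon=3600):
    """Count PROBLEM/OK transitions of a nodata(window)-style check.

    arrival_times: seconds at which values reach the server.
    window: seconds without data before the check reports a problem.
    """
    arrivals = sorted(arrival_times)
    idx = 0
    last_arrival = 0  # assume one value was received at t=0
    in_problem = False
    flaps = 0
    for now in range(check_every, horizon + 1, check_every):
        # Take into account every value that has arrived by `now`.
        while idx < len(arrivals) and arrivals[idx] <= now:
            last_arrival = arrivals[idx]
            idx += 1
        problem = (now - last_arrival) > window
        if problem != in_problem:
            flaps += 1
            in_problem = problem
    return flaps


# The proxy keeps collecting, but a congested link delivers the
# values to the server only in bursts every 300 seconds.
burst_arrivals = list(range(300, 3601, 300))

print(count_flaps(burst_arrivals, window=120))  # narrow window
print(count_flaps(burst_arrivals, window=600))  # wider window
```

In this model the narrow window changes state twice per burst even though no data is ever lost, while the wider window never fires. That matches what was seen here: the history looks complete afterwards, yet the trigger flaps until either the window is widened or the delivery delay is fixed.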

                  I still have one question, if someone could help. Sometimes "Zabbix server is running" changes to NO.

                  What does the application check to determine this status?

                  The process? The connection to the database? Data loss?

                  Knowing this could help me diagnose future problems.

                  Thanks
