Zabbix agent on server is unreachable for 5 minutes

  • ilia
    Member
    • Dec 2018
    • 37

    #1

    Zabbix agent on server is unreachable for 5 minutes

    I had many more of these before, then I set these parameters:

    StartPollers=1000
    StartPreprocessors=10
    StartPollersUnreachable=1000
    StartTrappers=100
    StartPingers=1000
    CacheSize=32M

    Now I get far fewer, but I still see approximately 20-30 random unreachable events a day.
    Any idea what I should do?
    [Image: zabbix.png]
  • ingus.vilnis
    Senior Member
    Zabbix Certified Trainer
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Mar 2014
    • 908

    #2
    Setting max values for all data gathering processes is a bad idea, particularly if you can live with as little as 32MB cache, meaning you have a very small environment.

    Start by lowering the number of started processes so they are about 30-40% busy. The same goes for the internal processes, for which you have not posted any graphs or settings. Actually, I'd recommend going with the default values for all settings and then gradually increasing them until performance is optimal.
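
A rough sketch of the 30-40% rule of thumb above, in Python. It only classifies an average busy percentage that you read off the graphs yourself; the function names and the "lower / keep / raise" labels are made up for illustration and are not part of Zabbix.

```python
from typing import Dict

# Target band taken from the advice above: processes roughly 30-40% busy.
TARGET_LOW = 30.0
TARGET_HIGH = 40.0


def tuning_hint(busy_percent: float) -> str:
    """Return a rough hint for one process type, given its average busy %."""
    if busy_percent < TARGET_LOW:
        return "lower"  # more processes are started than the load needs
    if busy_percent > TARGET_HIGH:
        return "raise"  # processes are busier than the target band
    return "keep"


def review(busy_by_process: Dict[str, float]) -> Dict[str, str]:
    """Map each process type to a hint; names and readings come from the caller."""
    return {name: tuning_hint(pct) for name, pct in busy_by_process.items()}
```

For example, `review({"poller": 2.5, "pinger": 71.0})` (made-up readings) would suggest lowering the pollers and raising the pingers. Change one setting at a time and watch the graphs again before the next step.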

    • ilia
      Member
      • Dec 2018
      • 37

      #3
      I thought about it and set these values:

      StartPollers=100
      StartIPMIPollers=10
      StartPreprocessors=100
      StartPollersUnreachable=100
      StartTrappers=100
      StartPingers=100
      StartDiscoverers=100
      CacheSize=64M
      HistoryCacheSize=32M
      HistoryIndexCacheSize=16M
      TrendCacheSize=16M
      ValueCacheSize=32M
      Timeout=10


      The system has 4 cores and 8 GB of RAM (approx. 1.5 GB of RAM is used).

      It was stable for a few hours,
      and then a few unreachable events occurred.
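
When a settings list like this is copied by hand, a misspelled parameter name is easy to miss. The sketch below is a small Python check, not a Zabbix tool: it parses `name=value` lines and reports names that are not in a short allow-list. The list only contains the parameters mentioned in this thread, so it is deliberately incomplete, and the helper names are hypothetical.

```python
from typing import Dict, List

# Only the zabbix_server.conf parameters mentioned in this thread;
# extend the set before using it on a full config file.
KNOWN = {
    "StartPollers", "StartIPMIPollers", "StartPreprocessors",
    "StartPollersUnreachable", "StartTrappers", "StartPingers",
    "StartDiscoverers", "CacheSize", "HistoryCacheSize",
    "HistoryIndexCacheSize", "TrendCacheSize", "ValueCacheSize", "Timeout",
}


def parse_conf(text: str) -> Dict[str, str]:
    """Parse name=value lines, skipping blank lines and # comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:  # ignore lines without "="
            settings[name.strip()] = value.strip()
    return settings


def unknown_names(settings: Dict[str, str]) -> List[str]:
    """Return parameter names that are not in the KNOWN allow-list."""
    return sorted(name for name in settings if name not in KNOWN)
```

Feeding it the text of the config file and printing `unknown_names(parse_conf(text))` gives a quick list of names to double-check against the documentation.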

      [Image: zabbix2.png]

      • Mayday
        Junior Member
        • Apr 2019
        • 2

        #4
        Please check whether the clocks on the agent servers are synchronized. Thanks.

        • ilia
          Member
          • Dec 2018
          • 37

          #5
          I did, all computers have an NTP-synced clock.
          I upgraded from 4.0 to 4.2 a few hours ago; let's hope that will be enough.

          • ingus.vilnis
            Senior Member
            Zabbix Certified Trainer
            Zabbix Certified Specialist, Zabbix Certified Professional
            • Mar 2014
            • 908

            #6
            An upgrade alone will not help if you have such high values for started processes set in the configuration file.

            In Template App Zabbix Server there are five graphs that illustrate the health of the Zabbix server very well. Based on their readings, all settings in the config file can be adjusted. You can post them all here, along with all currently configured values from the config file, to give more insight.

            • ilia
              Member
              • Dec 2018
              • 37

              #7
              Since the upgrade it seems good;
              maybe the upgrade did some sort of flush or cleaned up some values.

              Here are the charts: the first includes the load at startup, and the second excludes the startup load.
              [Image: zabbix3h.png]
              [Image: zabbix3hnostart.png]

              • ingus.vilnis
                Senior Member
                Zabbix Certified Trainer
                Zabbix Certified Specialist, Zabbix Certified Professional
                • Mar 2014
                • 908

                #8
                When you receive the "false" alerts, are there really gaps in the collected values from those hosts? Compare one such time when you receive the alert with graphs from that host.

                • ilia
                  Member
                  • Dec 2018
                  • 37

                  #9
                  Check the chart in my 3rd post: when I had a spike at 12:20, I received a lot of alerts,
                  but before that I had been receiving them all the time, usually in large chunks.

                  I never had a gap in the data, but sometimes I had a relative spike (most of the time under the 80% mark).

                  • ingus.vilnis
                    Senior Member
                    Zabbix Certified Trainer
                    Zabbix Certified Specialist, Zabbix Certified Professional
                    • Mar 2014
                    • 908

                    #10
                    12:20 Pollers busy + UnreachablePollers busy = Network problem where Zabbix can't connect to devices it monitors.

                    Not sure what spikes you mention but go to Monitoring -> Latest data or Graphs for any device which had this agent unreachable alert and carefully look if you have all the values with exact timestamps collected at the problem period.
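
If eyeballing timestamps in Latest data gets tedious, the same check can be sketched in Python. This assumes you have exported the collection times of one item as Unix timestamps (how you export them is up to you) and that a gap longer than 5 minutes (300 s) is what matters, as the alert name suggests. `find_gaps` is a made-up helper name.

```python
from typing import List, Tuple


def find_gaps(timestamps: List[int], max_gap: int = 300) -> List[Tuple[int, int]]:
    """Return (previous, next) timestamp pairs that are more than max_gap seconds apart."""
    ordered = sorted(timestamps)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > max_gap
    ]
```

An empty result for the problem period means values kept arriving, which points away from a real collection outage on that item.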

                    • ilia
                      Member
                      • Dec 2018
                      • 37

                      #11
                      Just a note: I had a successful ping and a working SSH connection while Zabbix reported the error.
                      I never had internal issues when this occurred, because most of the time it occurs on hosts that are idle.

                      I cannot find anything special.
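
Ping and SSH exercise ICMP and the SSH port, not the port the agent listens on, so a direct TCP check can be a useful extra data point. A minimal Python sketch, assuming the agent uses the default passive-check port 10050 (adjust if `ListenPort` was changed); `agent_port_reachable` is a hypothetical helper name. It only tests that a TCP connection can be opened, not that the agent answers item requests.

```python
import socket


def agent_port_reachable(host: str, port: int = 10050, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Run it from the Zabbix server itself at the moment an alert fires, since that is the network path the pollers use.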

                      • ingus.vilnis
                        Senior Member
                        Zabbix Certified Trainer
                        Zabbix Certified Specialist, Zabbix Certified Professional
                        • Mar 2014
                        • 908

                        #12
                        "Special" would be missing data in, for example, the CPU utilization graph of a host that had the false alert.

                        Look in the Zabbix server log file for that period and check whether there are any entries about the hosts with alerts.
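
A small Python sketch for narrowing the server log down to one host and one time window. It relies on Zabbix log lines starting with `PID:YYYYMMDD:HHMMSS.mmm` followed by the message, and on the host name appearing somewhere in the message text; the latter is an assumption, and `host_lines` is a made-up helper name.

```python
import re
from typing import List

# Matches "PID:YYYYMMDD:HHMMSS.mmm message", with optional leading spaces.
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\.(\d{3})\s+(.*)$")


def host_lines(log_text: str, host: str, day: str, start: str, end: str) -> List[str]:
    """Return entries mentioning host on day (YYYYMMDD) between start and end (HHMMSS)."""
    hits: List[str] = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that do not start with the timestamp prefix
        _pid, date, time_, _ms, message = match.groups()
        # Zero-padded HHMMSS strings compare correctly as plain strings.
        if date == day and start <= time_ <= end and host in message:
            hits.append(f"{date} {time_} {message}")
    return hits
```

For the spike mentioned earlier in the thread you would read the log file into a string and call something like `host_lines(text, "myhost", "YYYYMMDD", "121500", "122500")`, with the host name and date filled in from your own alert.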

                        • ilia
                          Member
                          • Dec 2018
                          • 37

                          #13
                          I tried; no gaps or errors. On the next error I'll dive deeper into it, because I have now enabled some external monitoring.
