false - Zabbix agent on host1 is unreachable for 5 minutes: host1

  • mkash28
    Junior Member
    • Oct 2015
    • 13

    #1

    Every other night, at almost exactly 11:30 - 11:40, I get a flood of alerts (unreachable for 5 minutes) from all my hosts. The flood used to start with a "poller processes 75% busy" alert and end with another "poller processes 75% busy" alert. Now I no longer see the poller process alerts, maybe because we switched to active checks.
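    To confirm the "every other night around 11:30 - 11:40" pattern, the alert times can be grouped into 10-minute windows. A minimal Python sketch, assuming the alert timestamps have already been collected (for example from the notification emails); the `bucket_alerts` helper and the sample times are hypothetical:

```python
from collections import Counter
from datetime import datetime

def bucket_alerts(timestamps, minutes=10):
    """Count alerts per fixed-size window (minutes should divide 60)."""
    counts = Counter()
    for ts in timestamps:
        # Round down to the start of the window that contains ts.
        start = ts.replace(minute=ts.minute - ts.minute % minutes,
                           second=0, microsecond=0)
        counts[start] += 1
    return counts

# Hypothetical alert times: two late-night bursts and one daytime alert.
alerts = [
    datetime(2015, 10, 20, 23, 31), datetime(2015, 10, 20, 23, 33),
    datetime(2015, 10, 20, 23, 38), datetime(2015, 10, 21, 14, 5),
    datetime(2015, 10, 22, 23, 34), datetime(2015, 10, 22, 23, 36),
]

for window, n in sorted(bucket_alerts(alerts).items()):
    print(window.strftime("%Y-%m-%d %H:%M"), n)
```

    With this sample it prints 3 alerts for 2015-10-20 23:30, 1 for 2015-10-21 14:00 and 2 for 2015-10-22 23:30; a real export showing the same late-night clusters would point at something scheduled rather than random.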

    The issue won't go away despite several changes to the zabbix_server and MySQL parameters. I also tried to delay the alert reporting (because the unreachable condition corrects itself almost immediately), but that doesn't seem to work correctly. So far, the only way to avoid it is to change the severity of the alerts and opt out of receiving the texts/emails, but that is very dangerous because we rely on Zabbix operationally. Here are my settings:

    zabbix_server.conf


    LogFile=/var/log/zabbix/zabbix_server.log
    LogFileSize=0
    PidFile=/var/run/zabbix/zabbix_server.pid
    DBHost=localhost
    DBName=db1
    DBUser=dbusr
    DBPassword=psswd
    DBSocket=/var/lib/mysql/mysql.sock
    StartPollers=1000
    StartDiscoverers=150
    SNMPTrapperFile=/var/log/snmptt/snmptt.log
    ListenIP=0.0.0.0
    HousekeepingFrequency=2
    MaxHousekeeperDelete=1000
    CacheSize=128M
    StartDBSyncers=10
    ValueCacheSize=128M
    Timeout=30
    UnreachableDelay=60
    AlertScriptsPath=/usr/lib/zabbix/alertscripts
    ExternalScripts=/usr/lib/zabbix/externalscripts



    zabbix_agentd.conf

    PidFile=/var/run/zabbix/zabbix_agentd.pid
    LogFile=/var/log/zabbix/zabbix_agentd.log
    LogFileSize=0
    Server=localhost
    ListenPort=10050
    ServerActive=1.2.3.4
    Hostname=Server1.gov
    Include=/etc/zabbix/zabbix_agentd.d/

    my.cnf


    [mysqld]
    datadir=/var/lib/mysql
    socket=/var/lib/mysql/mysql.sock
    user=mysql
    max_connections=10000
    wait_timeout=95000
    max_allowed_packet=1000M
    innodb_buffer_pool_size=32G
    innodb_buffer_pool_instances=1
    query_cache_type=1
    query_cache_size=128M
    join_buffer_size=300
    table_open_cache=2500
    binlog_format=mixed
    symbolic-links=0

    [mysqld_safe]
    log-error=/var/log/mysqld.log
    pid-file=/var/run/mysqld/mysqld.pid
  • mkash28
    Junior Member
    • Oct 2015
    • 13

    #2

    It seems like a bug in the software. I have created a delayed action so that any alert matching "Trigger name like unreachable for 5 minutes" is delayed for a few minutes. For all other actions I have added an AND condition "Trigger name not like unreachable for 5 minutes" so that they do not take any action.

    Now, if I manually disable the agent on one of my hosts and wait, the alert does go through the delayed action and is sent after 3 minutes have passed.
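    The delayed action works like a debounce: the message is sent only if the problem is still open when the delay expires, so a host that recovers almost immediately should never produce a notification. A minimal sketch of that rule, assuming each problem is described by its start and end time in seconds (`end=None` while it is still open); `should_notify` is a hypothetical helper, not Zabbix code, and it ignores details such as multiple escalation steps and recovery messages:

```python
def should_notify(start, end, now, delay):
    """Decide whether a delayed notification would have been sent by `now`.

    start, end, now: times in seconds; end is None while the problem is open.
    delay: seconds to wait before notifying (the delayed action's step delay).
    """
    fire_at = start + delay
    if now < fire_at:
        return False  # the delay has not elapsed yet
    # Notify only if the problem was still open when the delay expired.
    return end is None or end > fire_at

DELAY = 180  # 3 minutes, matching the manual test described above

print(should_notify(start=0, end=20, now=300, delay=DELAY))    # False: recovered after 20 s
print(should_notify(start=0, end=None, now=300, delay=DELAY))  # True: still down after 5 min
```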

    But at night it follows a totally different path. I seem to be getting those unreachable alerts through some unknown action: all other actions block "unreachable for 5 minutes", and my delayed action would have held the alert back. Also, the emails and texts I get are not the ones from the delayed action; the format is different. Any help is highly appreciated. I am getting ready to throw Zabbix out the window and start a new solution from scratch.
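    One way to narrow down the "unknown action" is to list, for a given trigger name, every action whose conditions it satisfies. With the setup described above only the delayed action should match; any other match would send the alert without the delay and with its own message format. A minimal sketch, assuming "like" behaves as a substring match and all conditions of an action are combined with AND; the action names, including the unconditioned "Legacy notify" action, are hypothetical:

```python
PATTERN = "unreachable for 5 minutes"

# Hypothetical actions: name -> list of (operator, value) conditions on the
# trigger name. All conditions of an action must hold (AND).
actions = {
    "Delayed unreachable": [("like", PATTERN)],
    "Ops email": [("not like", PATTERN)],
    "Ops SMS": [("not like", PATTERN)],
    "Legacy notify": [],  # no trigger-name condition, so it matches everything
}

def matches(conditions, trigger_name):
    """Return True if every condition holds for trigger_name."""
    for op, value in conditions:
        found = value in trigger_name  # "like" assumed to mean substring match
        if op == "like" and not found:
            return False
        if op == "not like" and found:
            return False
    return True

def matching_actions(trigger_name):
    """Names of all actions whose conditions match trigger_name."""
    return [name for name, conds in actions.items() if matches(conds, trigger_name)]

print(matching_actions("Zabbix agent on host1 is unreachable for 5 minutes"))
# ['Delayed unreachable', 'Legacy notify']
```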

    • jan.garaj
      Senior Member
      Zabbix Certified Specialist
      • Jan 2010
      • 506

      #3
      Are you really sure that's a false alert? It seems to be a problem in your infrastructure: for example, a backup job (possibly running on a non-Zabbix server) that overloads your network between 11:30 and 11:40. Dunno.

      There are a million possible problems in your infrastructure. First, find the cause. Check the performance graphs: CPU usage/load, network throughput, HDD, IOPS, Zabbix performance metrics, DB metrics, and so on. Look for any spikes/drops between 11:30 and 11:40 that may indicate what the problem is.
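      Scanning exported graph data for such spikes can be done by comparing each sample inside the 11:30 - 11:40 window with a baseline taken from the rest of the night. A minimal sketch using only the standard library; the sample series, the 3-standard-deviation threshold and the `find_spikes` helper are assumptions, and real data would come from the item history:

```python
from datetime import datetime, time
from statistics import mean, stdev

def find_spikes(samples, window_start, window_end, k=3.0):
    """Return (timestamp, value) samples inside [window_start, window_end)
    that differ from the out-of-window mean by more than k standard deviations.

    Assumes the window does not cross midnight and that at least two
    samples fall outside it (stdev needs two values).
    """
    def in_window(ts):
        return window_start <= ts.time() < window_end

    baseline = [v for ts, v in samples if not in_window(ts)]
    mu, sigma = mean(baseline), stdev(baseline)
    return [(ts, v) for ts, v in samples if in_window(ts) and abs(v - mu) > k * sigma]

# Hypothetical network throughput samples (Mbit/s), one every 10 minutes.
samples = [
    (datetime(2015, 10, 20, 23, 0), 40.0),
    (datetime(2015, 10, 20, 23, 10), 42.0),
    (datetime(2015, 10, 20, 23, 20), 41.0),
    (datetime(2015, 10, 20, 23, 30), 95.0),
    (datetime(2015, 10, 20, 23, 40), 43.0),
    (datetime(2015, 10, 20, 23, 50), 39.0),
]

for ts, value in find_spikes(samples, time(23, 30), time(23, 40)):
    print(ts.strftime("%H:%M"), value)  # prints: 23:30 95.0
```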
      Devops Monitoring Expert advice: Dockerize/automate/monitor all the things.
      My DevOps stack: Docker / Kubernetes / Mesos / ECS / Terraform / Elasticsearch / Zabbix / Grafana / Puppet / Ansible / Vagrant

      • mkash28
        Junior Member
        • Oct 2015
        • 13

        #4
        Thank you very much for the reply. I thought about that, but since my server (a Dell PowerEdge R720) is powerful enough to handle the load, I didn't put enough thought into it. I will review the cron jobs that run during that time once again. Thanks again.
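        Since the alerts arrive every other night, a cron job scheduled on alternating days is one candidate worth checking. The sketch below lists crontab entries that fire between 23:30 and 23:40 on a given date. It is a minimal matcher, assuming user-crontab lines with five time fields (no user column as in /etc/crontab) and supporting only `*`, `*/n`, single values, `a-b`, `a-b/n` and comma lists (no month/weekday names or `@daily`-style shortcuts); the sample crontab is hypothetical:

```python
from datetime import date, datetime, time, timedelta

def field_matches(field, value, low):
    """Match one crontab field against value.

    Supports '*', '*/n', 'a', 'a-b' and 'a-b/n', plus comma-separated lists.
    `low` is the smallest legal value of the field (0 or 1).
    """
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/")
            step = int(step_text)
        if part == "*":
            start, end = low, None
        elif "-" in part:
            start, end = (int(x) for x in part.split("-"))
        else:
            start = end = int(part)
        if value >= start and (end is None or value <= end) and (value - start) % step == 0:
            return True
    return False

def runs_at(entry, when):
    """Return True if a five-field crontab entry fires at datetime `when`."""
    minute, hour, dom, month, dow = entry.split()[:5]
    cron_dow = when.isoweekday() % 7  # cron counts Sunday as 0 (or 7)
    dom_ok = field_matches(dom, when.day, 1)
    dow_ok = field_matches(dow, cron_dow, 0) or (cron_dow == 0 and field_matches(dow, 7, 0))
    if dom.startswith("*") or dow.startswith("*"):
        day_ok = dom_ok and dow_ok
    else:
        day_ok = dom_ok or dow_ok  # both restricted: cron accepts either one
    return (field_matches(minute, when.minute, 0) and field_matches(hour, when.hour, 0)
            and field_matches(month, when.month, 1) and day_ok)

def jobs_in_window(crontab, day, start, end):
    """Return (first run time, command) for entries firing in [start, end) on day."""
    hits = []
    for line in crontab.strip().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        t, stop = datetime.combine(day, start), datetime.combine(day, end)
        while t < stop:
            if runs_at(line, t):
                hits.append((t.strftime("%H:%M"), " ".join(line.split()[5:])))
                break
            t += timedelta(minutes=1)
    return hits

# Hypothetical user crontab.
crontab = """
# m  h   dom  mon dow  command
35  23  */2  *   *    /usr/local/bin/backup.sh
0   2   *    *   *    /usr/local/bin/rotate-logs.sh
30  23  *    *   1-5  /usr/local/bin/db-dump.sh
"""

for run_time, command in jobs_in_window(crontab, date(2015, 10, 21), time(23, 30), time(23, 40)):
    print(run_time, command)
```

        Note that `*/2` in the day-of-month field selects odd days (1, 3, 5, ...), so after a 31-day month such a job runs on two consecutive nights (the 31st and the 1st).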

        • kloczek
          Senior Member
          • Jun 2006
          • 1771

          #5
          Originally posted by mkash28
          It seems like a bug in the software. I have created a delayed action so that any alert matching "Trigger name like unreachable for 5 minutes" is delayed for a few minutes. For all other actions I have added an AND condition "Trigger name not like unreachable for 5 minutes" so that they do not take any action.
          Some people say that the only miracle is that there are no miracles.

          As long as only you know your trigger definition, sorry, but no one can confirm your impressions, because there is nothing showing that you may be right. As long as no one has noticed any issues with the trigger evaluation code, assuming that it must be a software bug should be the last thing you do.

          BTW: mentioning in the trigger message the hostname on which the trigger is active wastes precious space in the active triggers table, because another column of the same table already clearly shows which host the trigger is active on. In other words, you are duplicating this information and making the list harder to read.
          http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
          https://kloczek.wordpress.com/
          zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
          My zabbix templates https://github.com/kloczek/zabbix-templates

          • bbrendon
            Senior Member
            • Sep 2005
            • 870

            #6
            Did you check the slow query log? Since you know the exact times, you might want to log into the server and watch it using your favorite version of top. Also keep mytop open. It doesn't sound like a zabbix bug.
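            If the slow query log is enabled (`slow_query_log` does not appear in the my.cnf posted above), it can be summarized per minute to check whether slow queries pile up around 11:30 - 11:40. A minimal sketch, assuming a MySQL 5.7-style log where each entry has an ISO 8601 `# Time:` line followed by a `# Query_time:` line (older versions use a different timestamp format and may omit repeated `# Time:` lines); the sample log is hypothetical:

```python
import re
from collections import Counter
from datetime import datetime

TIME_RE = re.compile(r"^# Time: (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})")
QUERY_RE = re.compile(r"^# Query_time: ([\d.]+)")

def slow_queries_per_minute(lines, min_seconds=1.0):
    """Count slow-log entries per minute with Query_time >= min_seconds."""
    counts = Counter()
    current = None  # timestamp of the entry being read
    for line in lines:
        m = TIME_RE.match(line)
        if m:
            current = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S")
            continue
        m = QUERY_RE.match(line)
        if m and current is not None and float(m.group(1)) >= min_seconds:
            counts[current.strftime("%Y-%m-%d %H:%M")] += 1
    return counts

# Hypothetical excerpt of a slow query log.
sample = """\
# Time: 2015-10-20T23:31:05.123456Z
# User@Host: dbusr[dbusr] @ localhost []  Id:    42
# Query_time: 12.500000  Lock_time: 0.000100 Rows_sent: 0  Rows_examined: 1000000
SELECT 1; /* hypothetical query */
# Time: 2015-10-20T23:31:40.000001Z
# User@Host: dbusr[dbusr] @ localhost []  Id:    43
# Query_time: 3.200000  Lock_time: 0.000000 Rows_sent: 1  Rows_examined: 500000
SELECT 2; /* hypothetical query */
"""

print(slow_queries_per_minute(sample.splitlines()))
```

            Because the function only iterates over lines, an open log file object can be passed in place of `sample.splitlines()`.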
            Unofficial Zabbix Expert
            Blog, Corporate Site
