Hundreds of unreachable for 5 minutes every few days

  • ionoxx
    Junior Member
    • Dec 2015
    • 8

    #1

    Good day,

    I'm fairly new to Zabbix and have been enjoying my time with it so far. I do have a recurring issue that is troublesome, though.

    Usually every few weeks, but now twice in two days, the server stops processing data and I get a PROBLEM "unreachable for 5 minutes" for all my hosts, then about 5 minutes later, an OK.

    Here is all the relevant information and performance graphs that I can think of that might help.

    Hardware:
    XenServer VM
    4x vCPU E5-2609v3
    4GB RAM
    30GB VHD on LSI RAID6 6x 900GB 10K WD XE

    There are very few VMs on this server, but the XenServer performance monitor shows that at 6:58am this morning, when the last error occurred, I/O for this VM spiked to about 660 IOPS; it usually averages 130.

    Software:
    Zabbix 2.4.6 on CentOS 6.7

    The attached graphs show what Zabbix saw in itself. I'm not sure what to make of them.

    Thanks in advance!
    Attached Files
  • LenR
    Senior Member
    • Sep 2009
    • 1005

    #2
    Check the logs for housekeeping or some other high I/O process. Do you have "log long queries" set in MySQL? If so, check that too.

    I've seen similar problems caused by external activity in VM hosting (vMotion, VMware admins moving large storage objects), but if you see the IOPS on your VM, I'd guess it's internal to your Zabbix server.

    Could be a backup process, patching, maybe even GUI activity, DDOS, lots of fun.....


    • ionoxx
      Junior Member
      • Dec 2015
      • 8

      #3
      I enabled the slow query log on the MySQL server.

      I do see a correlation with housekeeping, which usually runs just before the end of each hour. But that isn't always the case.

      I did have a look at the zabbix_server.log file. When one of these events happens, I get lines like this in the log:

      item <item-name> became not supported: Timeout while executing a shell script.

      Most of my items are active agent checks, but the ones that fail are updated versions of this script:


      They do not time out under normal circumstances.
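
As a rough aid for checking that kind of correlation, here is a minimal Python sketch that counts the shell-script timeout messages per minute and lists the minutes in which the housekeeper logged anything. It assumes the usual `PID:YYYYMMDD:HHMMSS.mmm message` layout of zabbix_server.log and that housekeeper lines contain the word "housekeeper". The sample lines, host names and item keys are made up for illustration and are not taken from the log discussed in this thread.

```python
import re
from collections import Counter

# Assumed line layout: "<pid>:<YYYYMMDD>:<HHMMSS.mmm> <message>"
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\.\d{3}\s+(.*)$")
TIMEOUT_TEXT = "Timeout while executing a shell script"


def timeouts_per_minute(lines):
    """Count timeout messages per (date, HHMM) bucket."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and TIMEOUT_TEXT in match.group(4):
            date, clock = match.group(2), match.group(3)
            counts[(date, clock[:4])] += 1
    return counts


def housekeeper_minutes(lines):
    """Return the (date, HHMM) buckets in which a housekeeper line appears."""
    minutes = set()
    for line in lines:
        match = LINE_RE.match(line)
        if match and "housekeeper" in match.group(4).lower():
            minutes.add((match.group(2), match.group(3)[:4]))
    return minutes


# Hypothetical sample data, only to show the expected input shape.
SAMPLE_LINES = [
    '  2041:20151210:065500.101 executing housekeeper',
    '  2050:20151210:065801.250 item "example-host:example.key" became not supported: '
    'Timeout while executing a shell script.',
    '  2051:20151210:065830.007 item "example-host2:example.key" became not supported: '
    'Timeout while executing a shell script.',
    '  2050:20151210:065902.640 item "example-host3:example.key" became not supported: '
    'Timeout while executing a shell script.',
    'a line that does not follow the assumed layout',
]

for bucket, count in sorted(timeouts_per_minute(SAMPLE_LINES).items()):
    print(bucket, count)
print("housekeeper seen in:", sorted(housekeeper_minutes(SAMPLE_LINES)))
```

To run it against a real log, pass the lines of the file (for example `open(path)`, where `path` is wherever your installation writes zabbix_server.log) instead of `SAMPLE_LINES`. If the timeout bursts line up with housekeeper minutes only some of the time, that points to a second cause.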


      • Colttt
        Senior Member
        Zabbix Certified Specialist
        • Mar 2009
        • 878

        #4
        Please also increase your RAM.
        Debian-User

        Sorry for my bad english


        • ionoxx
          Junior Member
          • Dec 2015
          • 8

          #5
          I constantly have 70% of my 4GB of RAM unused.

          The MySQL instance is at its default settings. Could it be that the MySQL server is choking on the large amount of disk I/O it's trying to do because it's not caching enough in RAM?


          • LenR
            Senior Member
            • Sep 2009
            • 1005

            #6
            I think the problem is caused by the delay in the external check script, as shown by the timeout error message. I think each external check ties up a poller process until the timeout occurs; if more external checks are waiting than you have pollers, nothing else gets processed.

            Does this script do DNS lookups? How many pollers are you running, and how many of these scripts might be running concurrently?
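
To make the poller argument concrete, here is a small Python sketch of a simplified model: every hung external check occupies one poller for the full timeout, and pollers pick up queued work in order. The function name and the numbers used below are hypothetical; in a real setup the two inputs would correspond to the `StartPollers` and `Timeout` settings in zabbix_server.conf. This is an illustration of the reasoning, not a description of how the Zabbix scheduler is actually implemented.

```python
import heapq


def start_delay_for_normal_check(pollers, hung_checks, timeout_s):
    """Seconds a regular check waits when `hung_checks` hanging checks are queued ahead of it.

    Simplified model (an assumption, not Zabbix internals): each hung check
    blocks one poller for exactly `timeout_s` seconds.
    """
    if pollers < 1:
        raise ValueError("need at least one poller")
    free_at = [0.0] * pollers  # time at which each poller becomes free
    heapq.heapify(free_at)
    for _ in range(hung_checks):
        earliest = heapq.heappop(free_at)  # next poller to become available
        heapq.heappush(free_at, earliest + timeout_s)
    # The regular check starts as soon as the first poller is free again.
    return free_at[0]


# Hypothetical numbers, chosen only for illustration.
for hung in (4, 5, 50):
    delay = start_delay_for_normal_check(pollers=5, hung_checks=hung, timeout_s=30)
    print(f"{hung} hung checks -> regular check waits {delay:.0f} s")
```

In this model, nothing is delayed while there are fewer hung checks than pollers; once the hung checks equal or outnumber the pollers, every other check waits in multiples of the timeout. That is why both the number of pollers and the number of scripts that can hang at the same time matter.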
