Zabbix on Hyperv 2022 keeps going offline

  • zabbixuser2022
    Junior Member
    • Mar 2022
    • 2

    #1

    Hey all,

    I'm pretty new to Zabbix and I'm not much of a Linux guy, but I recently set up Zabbix on our HP Hyper-V host. I've got 23 hosts I'm monitoring, a mixture of Windows ZBX clients and SNMP switches, so it's a pretty light install. The VM has been going offline recently for 30-60 minutes, and then I'm bombarded with emails telling me everything is offline, then again when it's back online (as per my trigger instructions). None of the other VMs have the same issue at the time. The Zabbix server itself isn't pingable from the network during the downtime, and when I look at the server, I see the following message:

    ^^[ 979.749856] TCP: eth0: Driver has suspect GRO implementation, TCP performance may be compromised.

    I've seen this message before and the server is still online, so perhaps that is a red herring!?

    I believe I'm running version 5.4.9. It's the official Zabbix appliance from the website, which runs CentOS. I've also installed all the available CentOS updates since, and that doesn't seem to have fixed it, but rebooting the VM does fix it until the next time it decides to go offline. It was fine initially for a couple of weeks; I had no problems whatsoever.

    I've given the server 2 CPUs and 4GB RAM with a 20GB disk. I thought this would be adequate for 20-30 hosts.

    Any thoughts would be great! Thank you.

  • zabbixuser2022
    Junior Member
    • Mar 2022
    • 2

    #2
    Update: I've upgraded to version 6 and updated everything, but I still seem to be having the same issues. Has anyone else had this?


    • tim.mooney
      Senior Member
      • Dec 2012
      • 1427

      #3
      Originally posted by zabbixuser2022
      I'm pretty new to Zabbix and I'm not much of a Linux guy, but I recently set up Zabbix on our HP Hyper-V host. I've got 23 hosts I'm monitoring, a mixture of Windows ZBX clients and SNMP switches, so it's a pretty light install. The VM has been going offline recently for 30-60 minutes, and then I'm bombarded with emails telling me everything is offline, then again when it's back online (as per my trigger instructions). None of the other VMs have the same issue at the time. The Zabbix server itself isn't pingable from the network during the downtime, and when I look at the server, I see the following message:

      ^^[ 979.749856] TCP: eth0: Driver has suspect GRO implementation, TCP performance may be compromised.
      That might be related to the problem and is certainly worth researching, but I would look in other logs on the appliance first to see if there are other clues.

      Have you ever caught the problem while it's happening, or does it usually happen and then resolve itself before you get a chance to investigate? I'm not very familiar with Hyper-V and we don't run our Zabbix server/database/frontend on a VM, but at least in my experience with other VMs running on VMware vSphere, when a VM is running low on some resource there will sometimes be warnings about resource utilization. Does Hyper-V have something similar, so you can tell if the VM was using all of its allocated CPU or other resources?

      As far as logs go, I would look in a couple of places:
      1. /var/log/messages, for general system logs. Can you correlate any of the log messages from that file to any of the times when the VM guest has been completely unreachable? There aren't any messages about "oom-killer" in the logs during any of those times, are there?
      2. /var/log/zabbix/zabbix_server.log, for Zabbix server specific logs. Is there anything in those logs at the same time as the VM guest being unreachable?
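
      If scrolling through /var/log/messages by hand gets tedious, the correlation step can be roughed out in a few lines of Python. This is only a sketch: the function name, the keyword list and the idea of passing in the year are my own assumptions (the usual syslog timestamp has no year in it), and it only understands lines that start with a "Mar 14 02:15:01" style timestamp.

```python
from datetime import datetime

# Assumed starting keywords; "oom-killer" is the one asked about above,
# "Out of memory" is what the kernel logs when it actually kills a process.
KEYWORDS = ("oom-killer", "Out of memory")


def suspect_lines(lines, start, end, year, keywords=KEYWORDS):
    """Return log lines inside [start, end] that mention any keyword.

    lines    -- iterable of syslog-style lines ("Mar 14 02:15:01 host ...")
    start    -- datetime marking the beginning of the outage window
    end      -- datetime marking the end of the outage window
    year     -- year to assume, since the timestamp does not carry one
    """
    hits = []
    for line in lines:
        try:
            # The first 15 characters hold the month, day and time.
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped line, skip it
        in_window = start <= stamp <= end
        mentions = any(k.lower() in line.lower() for k in keywords)
        if in_window and mentions:
            hits.append(line.rstrip("\n"))
    return hits
```

      To use it, open /var/log/messages, pass the file object as `lines`, and set `start` and `end` to roughly when the "everything is offline" emails began and stopped. An empty result does not rule anything out; it just means none of the chosen keywords showed up in that window.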

      Before rebooting the VM, you could also use the 'dmesg' command to look at the kernel log buffer, but under normal conditions the important stuff from the log buffer will get written to other static system logs (boot.log for some stuff, /var/log/messages for most other stuff). Note that the kernel log buffer that 'dmesg' prints does not survive a reboot, so there's no point looking at it after a reboot.
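
      Since the buffer is gone after a reboot, one option is to dump it to a file first. A minimal sketch, assuming Python 3 is available on the appliance; the helper name and the file naming scheme are made up for illustration, and reading the kernel buffer may need root on some systems.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def save_command_output(cmd, out_dir="."):
    """Run cmd (a list of arguments) and save its stdout to a timestamped file.

    Returns the Path of the file that was written.
    """
    # check=False: still write whatever came back even if the command fails.
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    # Use only the program's base name so an absolute path in cmd[0]
    # does not override out_dir.
    out_path = Path(out_dir) / f"{Path(cmd[0]).name}-{stamp}.log"
    out_path.write_text(result.stdout)
    return out_path
```

      Calling `save_command_output(["dmesg"], "/root")` just before the reboot would leave a file such as /root/dmesg-<timestamp>.log that can be read afterwards.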


      • rthonpm
        Member
        • Jan 2016
        • 41

        #4
        Have you installed the hyperv-daemons packages and followed the other recommendations for running CentOS in Hyper-V?


