zabbix-agent2 7.2.1 fails to start with NVIDIA GPU plugin on Linux

  • cgunther
    Junior Member
    • Dec 2024
    • 2

    #1

    I upgraded zabbix-agent2 from 6.4 to 7.2.1, and it then failed to start, so no metrics reached my local Zabbix server. The log file /var/log/zabbix/zabbix_agent2.log was empty because the process never started.

    $ sudo systemctl status zabbix-agent2
    ● zabbix-agent2.service - Zabbix Agent 2
    Loaded: loaded (/usr/lib/systemd/system/zabbix-agent2.service; enabled; vendor preset: disabled)
    Active: activating (auto-restart) (Result: exit-code) since Sun 2024-12-29 11:18:03 AEDT; 5s ago
    Process: 112777 ExecStart=/usr/sbin/zabbix_agent2 -c $CONFFILE (code=exited, status=1/FAILURE)
    Main PID: 112777 (code=exited, status=1/FAILURE)

    Dec 29 11:18:03 hostname systemd[1]: zabbix-agent2.service: Main process exited, code=exited, status=1/FAILURE
    Dec 29 11:18:03 hostname systemd[1]: zabbix-agent2.service: Failed with result 'exit-code'.

    I tried running the agent directly and got the following error:

    $ sudo /usr/sbin/zabbix_agent2 -c /etc/zabbix/zabbix_agent2.conf
    2024/12/29 11:19:16.376575 [NVIDIA] failed to kill plugin /usr/libexec/zabbix/zabbix-agent2-plugin-nvidia-gpu: Failed to kill plugin "/usr/libexec/zabbix/zabbix-agent2-plugin-nvidia-gpu" process: os: process already finished.
    zabbix_agent2 [113023]: ERROR: Cannot register plugins: failed to register metrics of plugin "NVIDIA": failed to start plugin: failed to create connection with plugin /usr/libexec/zabbix/zabbix-agent2-plugin-nvidia-gpu: failed to get connection within the time limit 3000000000.


    I couldn't see how to disable a specific plugin, so I removed the plugin configuration file: /etc/zabbix/zabbix_agent2.d/plugins.d/nvidia.conf
    Now zabbix-agent2 starts correctly, sends metrics, and writes logs.

    This failed on all of my Linux virtual machines and hosts. None of them has an NVIDIA GPU; the machines use Intel embedded graphics.
    • AlmaLinux 8.10 (VMs with SELinux enabled; most are headless)
    • Debian GNU/Linux 11 (VMs)
    • XCP-NG 8.3.0 (CentOS 7 based physical machines)
  • RedaDev17
    Junior Member
    • Dec 2024
    • 1

    #2
    Hello,

    I encountered the same issue with Zabbix Agent 2. To resolve it, you need to comment out the single line in the nvidia.conf file. This will prevent Zabbix Agent 2 from attempting to interact with the NVIDIA plugin, thereby eliminating the error.

    You can do this by editing the file:

    nano /etc/zabbix/zabbix_agent2.d/plugins.d/nvidia.conf
    Then comment out the Plugins.NVIDIA.System.Path line so that it reads:

    # Plugins.NVIDIA.System.Path=/usr/libexec/zabbix/zabbix-agent2-plugin-nvidia-gpu
    Save the file and restart the Zabbix Agent 2 service. The error should no longer appear.
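For anyone with several affected hosts, the manual edit above could be scripted roughly as follows. This is only a sketch: the function name comment_out_nvidia_path is made up for illustration, and it assumes the setting appears as a single uncommented Plugins.NVIDIA.System.Path= line, as in the file quoted above. It leaves a .bak copy next to the original.

```shell
#!/bin/sh
# Hypothetical helper (not part of Zabbix): comment out any uncommented
# Plugins.NVIDIA.System.Path line in the given config file.
# A backup of the original is kept as <file>.bak.
comment_out_nvidia_path() {
    conf_file="$1"
    # Lines that already start with '#' do not match, so re-running is harmless.
    sed -i.bak 's|^[[:space:]]*\(Plugins\.NVIDIA\.System\.Path=.*\)$|# \1|' "$conf_file"
}

# Example use (run as root; path and service name as quoted in this thread):
#   comment_out_nvidia_path /etc/zabbix/zabbix_agent2.d/plugins.d/nvidia.conf
#   systemctl restart zabbix-agent2
```

Check the file afterwards before restarting; if the line in your nvidia.conf looks different, the pattern will simply not match and nothing is changed.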



    • NPadmin
      NPadmin commented
      Thank you for this, fixed my crashing service.
  • cgunther
    Junior Member
    • Dec 2024
    • 2

    #3
    Thanks RedaDev17 for the suggestion on disabling the plugin.

    The real problem is that the Zabbix agent should start by default instead of crashing. Hopefully the Zabbix team can correct that in a future release.


    • shafuq
      Junior Member
      • Oct 2024
      • 17

      #4
      I've been working to solve this for 4 hours. In my case it was a fresh agent2 on a new server; I only did a minor update from 7.2.0 to 7.2.1. Today I noticed the service kept crashing every few seconds. Your solution worked! My other updated servers/agents didn't show this issue, so I still don't understand what is different about this one. But my problem is solved now. Thank you!


      • troffasky
        Senior Member
        • Jul 2008
        • 565

        #5
        Hash tag me too. The install page suggests:
        apt install zabbix-agent2 zabbix-agent2-plugin-*

        If you skip the "zabbix-agent2-plugin-*" bit then it all works fine.
        If you followed the instructions and then try to fix it by removing all the unnecessary plugins with

        apt remove zabbix-agent2-plugin-ember-plus zabbix-agent2-plugin-mongodb zabbix-agent2-plugin-mssql zabbix-agent2-plugin-nvidia-gpu zabbix-agent2-plugin-postgresql

        the agent *still* won't start, because config files are left behind that refer to the now-missing binaries [I guess? The logs started complaining about plugins I had removed, which they hadn't complained about before I removed them].
        So I had to purge the packages, config files included:
        dpkg -P zabbix-agent2-plugin-ember-plus zabbix-agent2-plugin-mongodb zabbix-agent2-plugin-mssql zabbix-agent2-plugin-nvidia-gpu zabbix-agent2-plugin-postgresql

        and then all was well. What a ballache!
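If you want to see which leftover config files are the problem before purging, something like the following might help. It is a rough sketch, not a Zabbix tool: find_orphaned_plugin_confs is a made-up name, and it assumes every loadable plugin is configured with an uncommented Plugins.<Name>.System.Path=<binary> line, like the NVIDIA one quoted earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: list plugin config files whose System.Path setting
# points at a binary that is missing or not executable.
# Prints one "<conf file>: <binary path>" line per problem found.
find_orphaned_plugin_confs() {
    conf_dir="$1"
    for conf in "$conf_dir"/*.conf; do
        [ -f "$conf" ] || continue
        # Extract the value of uncommented Plugins.<Name>.System.Path= lines.
        sed -n 's|^[[:space:]]*Plugins\.[^.]*\.System\.Path=\(.*\)$|\1|p' "$conf" |
        while IFS= read -r bin; do
            [ -x "$bin" ] || printf '%s: %s\n' "$conf" "$bin"
        done
    done
}

# Example use (directory as quoted earlier in this thread):
#   find_orphaned_plugin_confs /etc/zabbix/zabbix_agent2.d/plugins.d
```

Anything it prints is a config file that still names a binary the agent cannot run, which would match the symptoms described above. It only reads files and changes nothing.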
