
Zabbix introduces blocking to a non-blocking check

  • AquaNewt
    Junior Member
    • Feb 2024
    • 2

    #1


    I've written two scripts: one that discovers NFS mounts, and a second that checks whether a given NFS mount is actually working or is hanging, either because the server has vanished or because a firewall is blocking the connection.

    discovery:
    Code:
    #!/bin/bash
    
    mounts=$(mount -t nfs,nfs3,nfs4 | cut -d' ' -f3)
    
    COMMA=''
    echo '['
    
    if [[ -n ${mounts} ]]; then
      while read -r line; do
        echo "${COMMA} { \"{#NFSMOUNT}\":\"${line}\" }"
        COMMA=','
      done <<<"$mounts"
    fi
    
    echo ']'
    check:
    Code:
    #!/bin/bash
    
    mount="$*"
    
    read -r -t1 < <(stat -t "${mount}") || lsof -b 2>/dev/null | grep "${mount}"
    echo $?
    The check prints 0 to stdout if everything works and 1 if it doesn't. When I block NFS in the filesystem and run the check manually, in almost every conceivable way, the script finishes in about a second and prints 1, as it should. I can run it as root or as the zabbix user with sudo (needed for the stat) without problems.

    However, if the NFS mount is broken and the script is run either normally via Zabbix or via `zabbix_agent2 -t nfs.check["/mnt/broken"]`, the check hangs for about a minute before being killed. The logs also show `failed to kill [sudo /etc/zabbix/userparams-scripts/nfs_check.sh "/mnt/broken"]: operation not permitted`. At no point should the script itself hang, nor should Zabbix need to kill it because of a timeout.

    I'm at my wits' end. Is Zabbix broken, or is there anything I can do on my end to make this check work?
  • AquaNewt
    Junior Member
    • Feb 2024
    • 2

    #2
    I've found the reason for this behaviour: `read -t<number>` apparently doesn't work when bash is non-interactive and just hangs indefinitely. The same hang can be reproduced by running the check with `watch ./check.sh /mnt/broken`. The solution is to replace the `read` part with `timeout 1 stat -t "${mount}" >/dev/null`.
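
    For anyone landing here later, this is a minimal sketch of how the check might look with that replacement applied. It is not the exact script I deployed: the function name `nfs_check` is made up for illustration, the `lsof` fallback from the original is left out for brevity, and any failure (including the exit status 124 that GNU `timeout` returns when it has to kill `stat`) is collapsed to a plain 1. It assumes GNU coreutils `timeout` and `stat` are available.

```shell
#!/bin/bash
# Sketch only: nfs_check is a hypothetical name, not part of the original scripts.

# Prints 0 if stat on the given path succeeds within one second, 1 otherwise.
nfs_check() {
  local mount="$*"

  # timeout terminates stat if it has not returned after 1 second.
  # stdout and stderr are discarded so only the 0/1 result reaches the caller.
  if timeout 1 stat -t "${mount}" >/dev/null 2>&1; then
    echo 0
  else
    # Covers both a timed-out stat (status 124) and a stat that failed outright.
    echo 1
  fi
}

# Only run the check when a path was passed on the command line.
if [[ $# -gt 0 ]]; then
  nfs_check "$@"
fi
```

    Called as `./check.sh /mnt/broken`, this should print a single 0 or 1 and return after roughly a second at most, whether or not it is started from an interactive shell.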
