Discussion thread for official Zabbix SMART Disk monitoring


  • evert
    replied
    Originally posted by PavelZ
    That's the idea behind this plugin. But it only does this during autodiscovery, doesn't it?
    I get them once an hour, as far as I can tell. (At which interval does Autodiscovery run?)

    Originally posted by PavelZ
    By the way, if you don't like the large number of these messages in the log, you can turn them off by configuring the sudoers file.
    How do I do this? And does this only suppress the messages, or does it actually keep smartctl from executing the variations?


  • PavelZ
    replied
    That's the idea behind this plugin. But it only does this during autodiscovery, doesn't it?

    By the way, if you don't like the large number of these messages in the log, you can turn them off by configuring the sudoers file.


  • evert
    replied
    Why does Zabbix query several types of controllers for each HDD?

    Code:
    Feb 24 00:42:14 app sudo[2115688]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a /dev/sdc -d cciss,0 -j
    Feb 24 00:42:14 app sudo[2115683]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a /dev/sde -d scsi -j
    Feb 24 00:42:14 app sudo[2115684]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a /dev/sdb -d scsi -j
    Feb 24 00:42:14 app sudo[2115685]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a /dev/sda -d cciss,0 -j
    Feb 24 00:42:14 app sudo[2115687]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a /dev/sdd -d areca,1 -j
    Feb 24 00:42:14 app sudo[2115692]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a /dev/sdc -d 3ware,0 -j
    Feb 24 00:42:14 app sudo[2115690]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a /dev/nvme0 -j
    Feb 24 00:42:14 app sudo[2115694]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a /dev/sdc -d areca,1 -j
    Feb 24 00:42:14 app sudo[2115696]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a /dev/sdb -d cciss,0 -j
    Feb 24 00:42:14 app sudo[2115699]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a /dev/sda -d areca,1 -j
    Feb 24 00:42:14 app sudo[2115689]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a /dev/sde -d 3ware,0 -j
    Feb 24 00:42:14 app sudo[2115702]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a /dev/sda -d scsi -j
    Feb 24 00:42:14 app sudo[2115705]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a /dev/sdd -d 3ware,0 -j
    Feb 24 00:42:14 app sudo[2115697]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a /dev/sda -d 3ware,0 -j
    Feb 24 00:42:14 app sudo[2115704]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a /dev/sde -d areca,1 -j
    Feb 24 00:42:14 app sudo[2115700]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a /dev/sdc -d scsi -j
    Feb 24 00:42:14 app sudo[2115698]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a /dev/sde -d cciss,0 -j
    Feb 24 00:42:14 app sudo[2115709]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a /dev/sdd -d cciss,0 -j
    Feb 24 00:42:14 app sudo[2115701]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a /dev/sdb -d areca,1 -j
    Feb 24 00:42:14 app sudo[2115703]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a /dev/sdb -d sat -j
    Feb 24 00:42:14 app sudo[2115686]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a /dev/sdb -d 3ware,0 -j
    Feb 24 00:42:14 app sudo[2115708]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a /dev/sdc -d sat -j
    Feb 24 00:42:14 app sudo[2115695]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a /dev/sdd -d scsi -j
    Feb 24 00:42:14 app sudo[2115693]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a /dev/sda -d sat -j
    Feb 24 00:42:14 app sudo[2115691]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a /dev/sdd -d sat -j
    Feb 24 00:42:14 app sudo[2115707]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a /dev/nvme1 -j
    Feb 24 00:42:14 app sudo[2115706]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a /dev/sde -d sat -j
    Can I configure the correct type for each disk, to prevent all these extra queries? Or, perhaps even easier, disable the whole '-d XXX' parameter being sent?
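To see at a glance which device types are being tried for each disk, the sudo log lines above can be summarized with a short script. This is only a sketch: the regular expression is written against the log format quoted in this post, and the function name `types_per_device` is made up for illustration.

```python
import re
from collections import defaultdict

# Written against the "COMMAND=... smartctl -a <dev> [-d <type>] -j" lines quoted above.
CMD_RE = re.compile(r"COMMAND=\S*smartctl -a (?P<dev>\S+)(?: -d (?P<type>\S+))? -j")


def types_per_device(log_lines):
    """Map each device to the sorted list of -d types smartctl was called with."""
    seen = defaultdict(set)
    for line in log_lines:
        match = CMD_RE.search(line)
        if match:
            # Lines without -d (the NVMe ones) are recorded as "(auto)".
            seen[match.group("dev")].add(match.group("type") or "(auto)")
    return {dev: sorted(types) for dev, types in seen.items()}


sample = [
    "Feb 24 00:42:14 app sudo[2115688]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a /dev/sdc -d cciss,0 -j",
    "Feb 24 00:42:14 app sudo[2115692]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a /dev/sdc -d 3ware,0 -j",
    "Feb 24 00:42:14 app sudo[2115708]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a /dev/sdc -d sat -j",
    "Feb 24 00:42:14 app sudo[2115690]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a /dev/nvme0 -j",
]

for dev, types in sorted(types_per_device(sample).items()):
    print(dev, types)
```

Feeding it the full output of the journal instead of `sample` gives one line per disk, which makes it easier to show in a bug report how many variations are attempted.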


  • PavelZ
    replied
    There are a few workarounds that I decided to share.
    Original Zabbix templates collection: pavlozt/somezabbixtemplates on GitHub.

    This should work around, for now, the problems with the long surface test (ZBX-22770) and the ioctl storm (ZBX-25632).


  • PavelZ
    replied
    Originally posted by eTawR
    Did anybody else have this problem and find a solution?
    This is a fairly common situation with many indicators now. The preprocessing step for this item is "Discard unchanged with heartbeat", set to 6h, so an unchanged value is stored only once every 6 hours.
    It's not easy to decide how the Zabbix creators should handle this. When you look at the item graph, you just have to keep this in mind.

    I suggest using the updated dashboards or Grafana, where missing values can be connected.
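To illustrate why the graph ends up as a handful of isolated points, here is a simplified model of a "discard unchanged with heartbeat" step. It is a sketch, not Zabbix's actual implementation: the function name is invented and the exact boundary handling in Zabbix may differ.

```python
def discard_unchanged_with_heartbeat(samples, heartbeat_s):
    """Return the (timestamp, value) pairs that would be kept.

    A sample is kept if its value differs from the last kept value,
    or if at least heartbeat_s seconds passed since the last kept sample.
    """
    stored = []
    last_value = object()  # sentinel so the first sample always counts as changed
    last_stored_ts = None
    for ts, value in samples:
        changed = value != last_value
        expired = last_stored_ts is not None and ts - last_stored_ts >= heartbeat_s
        if changed or expired:
            stored.append((ts, value))
            last_value = value
            last_stored_ts = ts
    return stored


# Hourly temperature readings: constant for 13 hours, then one change.
hourly = [(h * 3600, 35) for h in range(13)] + [(13 * 3600, 36)]
kept = discard_unchanged_with_heartbeat(hourly, heartbeat_s=6 * 3600)
print(len(hourly), "collected,", len(kept), "stored:", kept)
```

With a 6h heartbeat, 14 collected samples shrink to 4 stored ones, so a graph that draws only stored points looks like scattered dots unless the frontend connects them.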


  • PavelZ
    replied
    Originally posted by DmitryDonskikh
    Discovery works OK and the "Get disk attributes" item receives data, BUT the JSON does not contain all attributes, and it has only RAW values.
    See, this is what it returns for "Wear levelling count":
    There is a separate ticket dedicated to this problem https://support.zabbix.com/browse/ZBX-25646
    And I suggest you leave a mark there. I also can't figure out how to get this data.

    Also, it is unclear how you are going to unify these indicators between different SSD manufacturers? According to my research, everything is not that simple.
    The situation is somewhat smoothed out by the fact that in modern NVMe disks the attributes have been standardized.
    Last edited by PavelZ; 06-12-2024, 12:34.


  • DmitryDonskikh
    replied
    Hi guys!
    I have a working Zabbix server 7.0.6 and a bunch of Win10/11 PCs with Zabbix agent2 7.0.6, with smartctl 7.4.
    Zabbix template template_module_smart_agent2.yaml version 7.0-1.
    Discovery works OK and the "Get disk attributes" item receives data, BUT the JSON does not contain all attributes, and it has only RAW values.
    See, this is what it returns for "Wear levelling count":

    zabbix plugin:
    Code:
    "wear_leveling_count": {
      "value": 633,
      "raw": "633"
    }
    while smartctl -a -j is much more informative:

    smartctl:
    Code:
    {
      "id": 173,
      "name": "Wear_Leveling_Count",
      "value": 50,
      "worst": 50,
      "thresh": 5,
      "when_failed": "",
      "flags": {
        "value": 51,
        "string": "PO--CK ",
        "prefailure": true,
        "updated_online": true,
        "performance": false,
        "error_rate": false,
        "event_count": true,
        "auto_keep": true
      },
      "raw": {
        "value": 633,
        "string": "633"
      }
    }
    Moreover, plugin's JSON lacks a lot of valuable information which smartctl can obtain: disk size, speed, form-factor etc:

    smartctl:
    Code:
      "user_capacity": {
        "blocks": 250069680,
        "bytes": 128035676160
      },
      "logical_block_size": 512,
      "physical_block_size": 4096,
      "rotation_rate": 0,
      "form_factor": {
        "ata_value": 3,
        "name": "2.5 inches"
      },
      "interface_speed": {
        "current": {
          "sata_value": 3,
          "string": "6.0 Gb/s",
          "units_per_second": 60,
          "bits_per_unit": 100000000
        }
      },
    and so on.
    Why so? IIUC, I should rely on SMART-calculated values, not RAW values. Am I missing something? Or how should I deal with it?
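As a rough illustration of the difference between the two, the sketch below reads the smartctl attribute entry quoted above and compares the normalized value with its threshold. It relies on the usual SMART convention that an attribute counts as failed once the normalized value drops to or below the threshold; the function name and the returned field names are invented for this example.

```python
import json

# Trimmed copy of the smartctl -a -j attribute entry quoted above.
ATTR_JSON = """
{
  "id": 173,
  "name": "Wear_Leveling_Count",
  "value": 50,
  "worst": 50,
  "thresh": 5,
  "when_failed": "",
  "raw": {"value": 633, "string": "633"}
}
"""


def attribute_health(attr):
    """Summarize one ATA SMART attribute entry from smartctl's JSON output."""
    value, thresh = attr["value"], attr["thresh"]
    return {
        "name": attr["name"],
        "normalized": value,
        "threshold": thresh,
        "raw": attr["raw"]["value"],
        "margin": value - thresh,       # how far the normalized value is above the threshold
        "failing": value <= thresh,     # usual SMART convention, see lead-in
    }


summary = attribute_health(json.loads(ATTR_JSON))
print(summary)
```

The raw value alone (633) says nothing about how close the disk is to the vendor's limit, while the normalized value and threshold (50 against 5) do, which is why having only RAW values in the plugin output is limiting.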

    Last edited by DmitryDonskikh; 02-12-2024, 01:58.


  • schoeppi
    replied
    No, I do not have a better solution than the adapted trigger posted before. And I still don't know if this is the right approach.

    SMART monitoring with Zabbix is IMHO not really good. The problem I described triggers many false alerts. And when SMART values change, the operational data shows only a generic number with the problem, so I have to find out myself whether there is a really serious problem or not.

    I wonder how others do SMART monitoring with Zabbix, and I wonder if there are better solutions than the template.

    In general I am very happy with Zabbix, but SMART monitoring is one of the few things which really should be improved.


  • dixie2k
    replied
    Originally posted by schoeppi
    Once per day I get this message for all disks in all our servers:

    Problem name: SMART [sdXYZ sat]: Disk self-test is not passed

    This seems to be a bug which occurs when a self-test is running on the disks, which is the case for our machines.

    Do you have the same issue and how do you handle it? Do you have self-tests deactivated?

    For now I've changed the trigger to the following, to hopefully stop this error until it is fixed in the SMART Zabbix agent 2 active template, but I don't know if this is the right approach:

    last(/SMART by Zabbix agent 2/smart.disk.test[{#NAME}],#1)="false" and last(/SMART by Zabbix agent 2/smart.disk.test[{#NAME}],#2)="false"

    What do you think and how do you get around this bug?
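The effect of the adapted expression can be modelled in a few lines. This is a sketch of the logic only, not Zabbix code: `history` is a hypothetical list of collected smart.disk.test values, oldest first, and `fires_latest_only` is an assumed stand-in for a trigger that looks at the newest value alone.

```python
def fires_latest_only(history):
    """Assumed baseline: fire as soon as the newest value is "false"."""
    return bool(history) and history[-1] == "false"


def fires_adapted(history):
    """Model of the adapted expression: last(...,#1) and last(...,#2) must both be "false"."""
    return len(history) >= 2 and history[-1] == "false" and history[-2] == "false"


# A single "false" reading, e.g. while a self-test is running.
blip = ["true", "true", "false"]
# Two "false" readings in a row.
persistent = ["true", "false", "false"]

for name, history in (("blip", blip), ("persistent", persistent)):
    print(name, fires_latest_only(history), fires_adapted(history))
```

The adapted form ignores a single "false" reading and fires only when two consecutive readings are "false", at the cost of reporting a real failure one collection interval later.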
    schoeppi

    Have you managed a resolution? If so, how?

    Thanks!


  • eTawR
    replied
    I've got smart integration working so far. My only problem is, that the graphs are not really graphs:

    [Attached image: image.png]

    Did anybody else have this problem and find a solution?

    Originally posted by molnart

    was there ever a solution for this? I too am struggling with getting SMART data without sudo installed

    Maybe you can define an alias (or a script in /usr/local/bin, for that matter) that does nothing except execute the following command(s)?


  • moooola
    commented on a reply
    I am using Zabbix Agent2 7.0.3 on a Windows10 Pro 64bit host. I also had the same error and after some trial and error I got the behavior I wanted with this:

    Plugins.Smart.Path=C:/Program Files/smartmontools/bin/smartctl.exe
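The doubled quotes in the reported error (""C:\\Program Files\\...\\smartctl.exe"") suggest that the value is taken literally, quotes included. The sketch below models that behavior with a plain key=value split. It is an assumption about how the setting is read, inferred only from that error message, and `parse_conf_line` is an invented helper, not part of the agent.

```python
def parse_conf_line(line):
    """Split a 'Key=Value' line without any quote handling (assumed behavior)."""
    key, _, value = line.partition("=")
    return key.strip(), value.strip()


quoted = r'Plugins.Smart.Path="C:\Program Files\smartmontools\bin\smartctl.exe"'
unquoted = "Plugins.Smart.Path=C:/Program Files/smartmontools/bin/smartctl.exe"

for line in (quoted, unquoted):
    key, value = parse_conf_line(line)
    # A value that still carries quote characters is not a usable file path.
    print(key, "->", value, "| starts with a quote:", value.startswith('"'))
```

Under that assumption the quoted form is looked up as a file whose name begins with a quote character, which would explain "file does not exist", while the unquoted form passes the path through unchanged even though it contains a space.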

  • molnart
    replied
    Originally posted by mhk

    same here.
    the thing I found in regards to Proxmox - that I can't find a way to undo - is this:

    from the agent log:
    Code:
    2022/08/23 15:30:52.485456 executing direct exporter task for key 'smart.disk.discovery'
    2022/08/23 15:30:52.485476 [Smart] executing smartctl command: sudo -n smartctl --scan -j
    2022/08/23 15:30:52.485900 [Smart] command sudo -n smartctl --scan -j smartctl raw response: sh: 1: sudo: not found
    2022/08/23 15:30:52.485926 failed to execute direct exporter task for key 'smart.disk.discovery' error: 'Cannot fetch data: Failed to scan for devices: Cannot unmarshal JSON: invalid character 's' looking for beginning of value..'
    Notice the "sudo -n smartctl --scan [...]" and then the "sh: 1: sudo: not found"
    which I assume is what causes it, since Proxmox runs on Debian as root (no sudo pre-installed), but the plugin always wants to call the discovery with sudo and hence fails...

    any idea (apart from installing sudo, which I really don't want tbh) how to tell the thing to call its function without sudo?

    at least that's what I found out so far, trying to solve this
    was there ever a solution for this? I too am struggling with getting SMART data without sudo installed
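The "invalid character 's' looking for beginning of value" part of the agent log is what a JSON parser reports when it is handed the shell's error text instead of smartctl's output. The sketch below reproduces the same situation with Python's json module; the agent itself is not written in Python, and `parse_scan_output` plus the minimal valid sample are made up for illustration.

```python
import json


def parse_scan_output(raw):
    """Try to parse what the scan command printed; return (data, error_message)."""
    try:
        return json.loads(raw), None
    except json.JSONDecodeError as exc:
        # exc.pos is the offset of the first character the parser could not accept.
        return None, f"cannot parse {raw[exc.pos:exc.pos + 1]!r} at position {exc.pos}"


# The raw response from the agent log above: the shell's error text, not JSON.
data, error = parse_scan_output("sh: 1: sudo: not found")
print(data, "|", error)

# A hypothetical, minimal piece of valid JSON for comparison.
data, error = parse_scan_output('{"devices": []}')
print(data, "|", error)
```

So the JSON error is only a symptom: the first character the parser trips over is the "s" of "sh: 1: sudo: not found", and the real cause is the missing sudo binary.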


  • whoops
    replied
    Is it possible to use it normally on windows?
    I am using Win11 as the monitored host with Zabbix Agent 2 v7.0.3, and installed the latest smartmontools with default settings.

    Out of the box:
    17822:20240910:121835.500 discovery rule "win11host:smart.disk.discovery" became not supported: Failed to execute smartctl: failed to look up smartctl exec path: exec: "smartctl": executable file not found in %PATH%.

    Defined:
    Code:
    Plugins.Smart.Path="C:\Program Files\smartmontools\bin\smartctl.exe"
    at "C:\Program Files\Zabbix Agent 2\zabbix_agent2.d\plugins.d\smart.conf" and got:
    Cannot fetch data.: failed to scan for devices: failed to look up smartctl exec path: exec: ""C:\\Program Files\\smartmontools\\bin\\smartctl.exe"": file does not exist.



  • Turbid
    replied
    Originally posted by ga6QWsJ2dVEF
    I am having some difficulty with this template as the smart.disk.discovery uses random RAID controllers to obtain the SMART stats (see output below), but the only one that works well for me is the scsi device type (specifically: -d scsi -- output not shown, but trust me, it works).

    Code:
    14:31:20.917075 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sda -d 3ware,0 -j
    14:31:20.917116 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdb -d areca,1 -j
    14:31:20.917161 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdh -d sat -j
    14:31:20.917224 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdf -d cciss,0 -j
    14:31:20.917281 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sda -d sat -j
    14:31:20.917315 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdi -d areca,1 -j
    14:31:20.917417 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdb -d 3ware,0 -j
    14:31:20.917482 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdf -d 3ware,0 -j
    14:31:20.917542 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdf -d sat -j
    14:31:20.917592 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdf -d areca,1 -j
    14:31:20.917704 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sde -d cciss,0 -j
    14:31:20.917751 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sda -d areca,1 -j
    14:31:20.917799 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sda -d cciss,0 -j
    14:31:20.917865 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdc -d 3ware,0 -j
    14:31:20.917952 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdi -d 3ware,0 -j
    14:31:20.918036 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdb -d cciss,0 -j
    14:31:20.918150 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdg -d cciss,0 -j
    14:31:20.918316 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdg -d sat -j
    14:31:20.918486 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdh -d cciss,0 -j
    14:31:20.918588 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdb -d sat -j
    14:31:20.918685 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdc -d cciss,0 -j
    14:31:20.918733 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdc -d sat -j
    14:31:20.918838 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdi -d cciss,0 -j
    14:31:20.918925 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdg -d areca,1 -j
    14:31:20.918971 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdi -d sat -j
    14:31:20.919022 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdj -d sat -j
    14:31:20.919139 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdc -d areca,1 -j
    14:31:20.919192 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdh -d 3ware,0 -j
    14:31:20.919263 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdj -d cciss,0 -j
    14:31:20.919362 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdd -d areca,1 -j
    14:31:20.919428 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdj -d 3ware,0 -j
    14:31:20.919803 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sde -d 3ware,0 -j
    14:31:20.919886 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdg -d 3ware,0 -j
    14:31:20.919967 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sde -d sat -j
    14:31:20.920031 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdd -d 3ware,0 -j
    14:31:20.920094 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdd -d cciss,0 -j
    14:31:20.920157 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sde -d areca,1 -j
    14:31:20.920245 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdh -d areca,1 -j
    14:31:20.920425 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdj -d areca,1 -j
    14:31:20.920585 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdd -d sat -j
    On the Zabbix template, I tried to set the LLD macros {#RAIDTYPE} and {#RAID} to scsi, but this was unsuccessful. I also tried to force the Zabbix Agent SMART plugin executable path to include the full executable path with -d scsi, and that did not work (I don't recommend it: the smart.disk.discovery process keeps looping continuously).

    Any hints? This template gets the correct data only with the SCSI device type, but I have no idea how to set this in the Zabbix Agent2 config or in the Zabbix template for the host.
    I encountered the same problem. According to the logs, the agent is in an infinite loop and the smart.disk.discovery request times out:

    Code:
    # sudo -u zabbix zabbix_agent2 -c /etc/zabbix/zabbix_agent2.conf -t smart.disk.discovery
    smart.disk.discovery [m|ZBX_NOTSUPPORTED] [Timeout occurred while gathering data.]
    #
    # journalctl -u zabbix-agent2 | tail
    сен 05 10:15:52 node-3-krd sudo[1271362]:   zabbix : PWD=/ ; USER=root ; COMMAND=/sbin/smartctl -a /dev/sdd -d cciss,0 -j
    сен 05 10:15:52 node-3-krd sudo[1271363]:   zabbix : PWD=/ ; USER=root ; COMMAND=/sbin/smartctl -a /dev/sdc -d cciss,0 -j
    сен 05 10:15:52 node-3-krd sudo[1271357]:   zabbix : PWD=/ ; USER=root ; COMMAND=/sbin/smartctl -a /dev/sdc -d cciss,0 -j
    сен 05 10:15:52 node-3-krd sudo[1271366]:   zabbix : PWD=/ ; USER=root ; COMMAND=/sbin/smartctl -a /dev/sdd -d cciss,0 -j
    сен 05 10:15:52 node-3-krd sudo[1271370]:   zabbix : PWD=/ ; USER=root ; COMMAND=/sbin/smartctl -a /dev/sdc -d cciss,0 -j
    сен 05 10:15:52 node-3-krd sudo[1271374]:   zabbix : PWD=/ ; USER=root ; COMMAND=/sbin/smartctl -a /dev/sdb -d cciss,0 -j
    сен 05 10:15:52 node-3-krd sudo[1271378]:   zabbix : PWD=/ ; USER=root ; COMMAND=/sbin/smartctl -a /dev/sdb -d cciss,0 -j
    сен 05 10:15:52 node-3-krd sudo[1271377]:   zabbix : PWD=/ ; USER=root ; COMMAND=/sbin/smartctl -a /dev/sdb -d cciss,0 -j
    сен 05 10:15:52 node-3-krd sudo[1271375]:   zabbix : PWD=/ ; USER=root ; COMMAND=/sbin/smartctl -a /dev/sdb -d cciss,0 -j
    сен 05 10:15:52 node-3-krd sudo[1271380]:   zabbix : PWD=/ ; USER=root ; COMMAND=/sbin/smartctl -a /dev/sdd -d cciss,0 -j


    Someone also encountered this on Reddit: https://www.reddit.com/r/zabbix/comm...oring_timeout/

    P.S. https://support.zabbix.com/browse/ZBX-25181
    Last edited by Turbid; 05-09-2024, 10:31.


  • Gomo
    replied
    I would appreciate it if someone could take a look at my issue posted here https://www.zabbix.com/forum/zabbix-...-agent-2-issue and assist. Thanks!
