Discussion thread for official Zabbix SMART Disk monitoring

  • bcnx
    Junior Member
    • Jan 2011
    • 19

    #46
    I think I know the problem: these two commands produce very different output. It seems NVMe drives are less suited to being monitored via SMART:

    # smartctl -a /dev/nvme0 -j
    # smartctl -a /dev/sda -j

    Not sure how to get NVMe drives to output the same info. Maybe I need to monitor those using nvme-cli (https://github.com/narbehaj/zabbix-nvme).

    Cheers,

    BC
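
For anyone comparing the two outputs mentioned in this post, here is a minimal Python sketch of reading both JSON shapes. The field names (`smart_status.passed`, `temperature.current`, `power_on_time.hours`, `nvme_smart_health_information_log.percentage_used`, `ata_smart_attributes`) are what I understand smartctl 7.x to emit with `-j`; treat them as assumptions and check them against your own output. The two sample reports are made up and heavily trimmed.

```python
import json


def summarize_smart(report: dict) -> dict:
    """Reduce a `smartctl -a -j` report to a few device-independent fields."""
    summary = {
        "passed": report.get("smart_status", {}).get("passed"),
        "temperature_c": report.get("temperature", {}).get("current"),
        "power_on_hours": report.get("power_on_time", {}).get("hours"),
        "kind": "unknown",
        "percentage_used": None,
    }
    # Assumption: NVMe reports carry a health log, ATA reports an attribute table.
    nvme_log = report.get("nvme_smart_health_information_log")
    if nvme_log is not None:
        summary["kind"] = "nvme"
        summary["percentage_used"] = nvme_log.get("percentage_used")
    elif "ata_smart_attributes" in report:
        summary["kind"] = "ata"
    return summary


# Made-up, trimmed sample reports (not real smartctl output).
nvme_report = json.loads(
    '{"smart_status": {"passed": true}, "temperature": {"current": 35},'
    ' "nvme_smart_health_information_log": {"percentage_used": 3}}'
)
ata_report = json.loads(
    '{"smart_status": {"passed": true}, "temperature": {"current": 31},'
    ' "ata_smart_attributes": {"table": []}}'
)

print(summarize_smart(nvme_report))
print(summarize_smart(ata_report))
```

The point is only that both device kinds share a few top-level keys, while wear information lives in different places.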


    • bcnx
      Junior Member
      • Jan 2011
      • 19

      #47
      I have a problem that seems to be purely related to Proxmox VE servers (version 6.4 and 7.x as reported by a user in a previous post). The problem becomes apparent with this command:

      Code:
      /usr/local/sbin/zabbix_agent2 -c /usr/local/etc/zabbix_agent2.conf -t smart.disk.discovery
      smart.disk.discovery [m|ZBX_NOTSUPPORTED] [Failed to execute smartctl: Command execution failed: exit status 127.]
      The Zabbix Agent 2 is running as root, so no permission problems. Using smartmontools 7.1.

      I did a ton of troubleshooting, but to no avail. Any input is appreciated.

      Cheers

      BC
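
Exit status 127 is what a POSIX shell returns when it cannot find the command it was asked to run, so the message in this post may point to a missing binary or a PATH problem rather than to smartctl itself failing. A small Python sketch that reproduces the status; the command name is deliberately made up, and the helper `run_via_shell` is hypothetical, not part of Zabbix:

```python
import subprocess
from typing import Tuple


def run_via_shell(command: str) -> Tuple[int, str]:
    """Run a command line through `sh -c` and return (exit status, combined output)."""
    proc = subprocess.run(["sh", "-c", command], capture_output=True, text=True)
    return proc.returncode, (proc.stdout + proc.stderr).strip()


# The command below does not exist on purpose, so the shell reports "not found".
status, output = run_via_shell("definitely-not-installed-cmd --scan -j")
print(status)
print(output)
```

Running the exact command line the agent uses, by hand and as the same user, is a quick way to see which binary the shell cannot find.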


      • mhk
        Junior Member
        • Dec 2020
        • 9

        #48
        I have a problem that seems to be purely related to Proxmox VE servers (version 6.4 and 7.x as reported by a user in a previous post).
        same here.
        the thing I found in regards to Proxmox - that I can't find a way to undo - is this:

        from the agent log:
        Code:
        2022/08/23 15:30:52.485456 executing direct exporter task for key 'smart.disk.discovery'
        2022/08/23 15:30:52.485476 [Smart] executing smartctl command: sudo -n smartctl --scan -j
        2022/08/23 15:30:52.485900 [Smart] command sudo -n smartctl --scan -j smartctl raw response: sh: 1: sudo: not found
        2022/08/23 15:30:52.485926 failed to execute direct exporter task for key 'smart.disk.discovery' error: 'Cannot fetch data: Failed to scan for devices: Cannot unmarshal JSON: invalid character 's' looking for beginning of value..'
        Notice the "sudo -n smartctl --scan [...]" and then the "sh: 1: sudo: not found",
        which I assume is the cause: Proxmox runs on Debian as root (no sudo pre-installed), but the plugin always wants to call the discovery with sudo and hence fails...

        any idea (apart from installing sudo, which I really don't want tbh) how to tell the thing to call its function without sudo?

        at least that's what I found out so far, trying to solve this
        ZBX_NOTSUPPORTED: Cannot fetch data: Failed to scan for devices: Cannot unmarshal JSON: invalid character 's' looking for beginning of value..
        Last edited by mhk; 23-08-2022, 16:20.
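
The "invalid character 's'" part of that error fits the log: the raw response was the shell's own message, "sh: 1: sudo: not found", and its first character is the "s" the JSON parser trips over. A minimal Python sketch of the same failure, with a hypothetical helper `parse_scan_output` that keeps the raw text in the error so the real cause stays visible:

```python
import json


def parse_scan_output(raw: str) -> dict:
    """Parse the text returned by a `smartctl --scan -j` call, keeping the raw text on failure."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        raise RuntimeError(f"expected JSON but got: {raw!r}") from exc


# The raw response taken from the agent log in this post.
raw_response = "sh: 1: sudo: not found"

try:
    parse_scan_output(raw_response)
except RuntimeError as err:
    print(err)
```

This is only an illustration of why a "command not found" message surfaces as a JSON error; it is not the plugin's code.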


        • Ch77
          Junior Member
          • Sep 2022
          • 3

          #49
          Can somebody explain to me what it means? I have a trigger error "SMART [sda]: Some command to the disk failed" and Exit status - 4 - "Bit 4: We found prefail Attributes <= threshold."


          • doogie
            Junior Member
            • May 2020
            • 15

            #50
            Originally posted by Ch77
            Can somebody explain to me what it means? I have a trigger error "SMART [sda]: Some command to the disk failed" and Exit status - 4 - "Bit 4: We found prefail Attributes <= threshold."
            I'm getting this error as well, but only on Debian servers for some reason. All the CentOS boxes seem to be working fine. They're all the same - Dell servers with Megaraid and all seems fine from smartctl side.
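
One thing that may help when reading these messages: smartctl's exit status is a bitmask, so "exit status 4" and "bit 4" are not the same thing. A status of 4 has only bit 2 set, which is the "Some command to the disk failed" case; bit 4 ("prefail Attributes <= threshold") would show up as a status of 16. A minimal Python sketch of the decoding, with the bit descriptions paraphrased from the smartctl man page:

```python
from typing import List

# Bit meanings paraphrased from the EXIT STATUS section of the smartctl man page.
SMARTCTL_EXIT_BITS = {
    0: "Command line did not parse.",
    1: "Device open failed or device did not identify itself.",
    2: "Some SMART or other ATA command to the disk failed.",
    3: 'SMART status check returned "DISK FAILING".',
    4: "We found prefail Attributes <= threshold.",
    5: "Some Attributes have been <= threshold at some time in the past.",
    6: "The device error log contains records of errors.",
    7: "The device self-test log contains records of errors.",
}


def decode_exit_status(status: int) -> List[str]:
    """Return the description of every bit that is set in a smartctl exit status."""
    return [text for bit, text in SMARTCTL_EXIT_BITS.items() if status & (1 << bit)]


for status in (4, 16):
    print(status, decode_exit_status(status))
```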


            • Ch77
              Junior Member
              • Sep 2022
              • 3

              #51
              I added the SMART template to a second Debian server and received this:
              Code:
              Cannot create item: item with the same key "smart.disk.get[{#PATH},"{#RAIDTYPE}"]" already exists.
              Cannot create item: item with the same key "smart.disk.get[{#PATH},"{#RAIDTYPE}"]" already exists.
              Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
              Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
              Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
              Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
              Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
              Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
              Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
              Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
              Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
              Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
              Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
              Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
              Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
              Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
              Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
              Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
              Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
              Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
              However, there are no items with a key like "smart.*".



              • Ch77
                Junior Member
                • Sep 2022
                • 3

                #52
                Originally posted by Ch77
                I added the SMART template to a second Debian server and received this:
                Code:
                Cannot create item: item with the same key "smart.disk.get[{#PATH},"{#RAIDTYPE}"]" already exists.
                Cannot create item: item with the same key "smart.disk.get[{#PATH},"{#RAIDTYPE}"]" already exists.
                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                However, there are no items with a key like "smart.*".

                The problem was the Zabbix agent version. I upgraded from 5.4 to 6.4 and the problem was gone.


                • rob4ikomg
                  Junior Member
                  • Oct 2022
                  • 1

                  #53
                  Hello all

                  I have a "Windows 10" host:
                  --- zabbix agent (Active). Version: "zabbix_agentd Win64 (service) (Zabbix) 6.2.3". Location "C:\monitoring\zabbix\zabbix_agent.conf".
                  --- Smart tool "smartmontools". Version: "smartctl 7.3 2022-02-28 r5338 [x86_64-w64-mingw32-w10-21H2] (sf-7.3-1)". Location: "C:\Program Files\smartmontools\bin\smartctl.exe".

                  Code:
                  zabbix agent config:
                  LogFile=c:\monitoring\zabbix\zabbix_agentd.log
                  DebugLevel=3
                  LogFileSize=1
                  
                  StartAgents=0
                  
                  ServerActive=178.151.*.*:*
                  Hostname=home
                  RefreshActiveChecks=61
                  
                  Timeout=30


                  I added a host to the zabbix server, but I get an error:
                  [Image: Screenshot_2.png]
                  [Image: Screenshot_3.png]

                  Code:
                  zabbix_agent.log
                  
                    7364:20221016:074123.806 Starting Zabbix Agent [home]. Zabbix 6.2.3 (revision 98ee88fc19d).
                    7364:20221016:074123.807 **** Enabled features ****
                    7364:20221016:074123.807 IPv6 support:          YES
                    7364:20221016:074123.807 TLS support:           YES
                    7364:20221016:074123.808 **************************
                    7364:20221016:074123.808 using configuration file: C:\monitoring\zabbix\zabbix_agent.conf
                    7364:20221016:074124.251 agent #0 started [main process]
                   14892:20221016:074124.252 agent #1 started [collector]
                   16404:20221016:074124.252 agent #2 started [active checks #1]
                   16404:20221016:074138.565 active check "smart.disk.discovery" is not supported: Unsupported item key.
                   16404:20221016:074148.857 active check "smart.disk.discovery" is not supported: Unsupported item key.
                   16404:20221016:074158.991 active check "smart.disk.discovery" is not supported: Unsupported item key.
                   16404:20221016:074208.097 active check "smart.disk.discovery" is not supported: Unsupported item key.

                  what am I doing wrong?


                  • ga6QWsJ2dVEF
                    Junior Member
                    • Nov 2022
                    • 2

                    #54
                    Originally posted by rob4ikomg
                    Hello all

                    I have a "Windows 10" host:
                    --- zabbix agent (Active). Version: "zabbix_agentd Win64 (service) (Zabbix) 6.2.3". Location "C:\monitoring\zabbix\zabbix_agent.conf".
                    --- Smart tool "smartmontools". Version: "smartctl 7.3 2022-02-28 r5338 [x86_64-w64-mingw32-w10-21H2] (sf-7.3-1)". Location: "C:\Program Files\smartmontools\bin\smartctl.exe".

                    Code:
                    zabbix agent config:
                    LogFile=c:\monitoring\zabbix\zabbix_agentd.log
                    DebugLevel=3
                    LogFileSize=1
                    
                    StartAgents=0
                    
                    ServerActive=178.151.*.*:*
                    Hostname=home
                    RefreshActiveChecks=61
                    
                    Timeout=30


                    I added a host to the zabbix server, but I get an error:
                    [Image: Screenshot_2.png]
                    [Image: Screenshot_3.png]

                    Code:
                    zabbix_agent.log
                    
                    7364:20221016:074123.806 Starting Zabbix Agent [home]. Zabbix 6.2.3 (revision 98ee88fc19d).
                    7364:20221016:074123.807 **** Enabled features ****
                    7364:20221016:074123.807 IPv6 support: YES
                    7364:20221016:074123.807 TLS support: YES
                    7364:20221016:074123.808 **************************
                    7364:20221016:074123.808 using configuration file: C:\monitoring\zabbix\zabbix_agent.conf
                    7364:20221016:074124.251 agent #0 started [main process]
                    14892:20221016:074124.252 agent #1 started [collector]
                    16404:20221016:074124.252 agent #2 started [active checks #1]
                    16404:20221016:074138.565 active check "smart.disk.discovery" is not supported: Unsupported item key.
                    16404:20221016:074148.857 active check "smart.disk.discovery" is not supported: Unsupported item key.
                    16404:20221016:074158.991 active check "smart.disk.discovery" is not supported: Unsupported item key.
                    16404:20221016:074208.097 active check "smart.disk.discovery" is not supported: Unsupported item key.

                    what am I doing wrong?


                    Switch from zabbix-agentd to zabbix-agent2 and it should work. I ran into the same problem: the SMART plugin is part of Zabbix Agent 2, not the classic agent.


                    • ga6QWsJ2dVEF
                      Junior Member
                      • Nov 2022
                      • 2

                      #55
                      I am having some difficulty with this template as the smart.disk.discovery uses random RAID controllers to obtain the SMART stats (see output below), but the only one that works well for me is the scsi device type (specifically: -d scsi -- output not shown, but trust me, it works).

                      Code:
                        14:31:20.917075 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sda -d 3ware,0 -j
                        14:31:20.917116 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdb -d areca,1 -j
                        14:31:20.917161 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdh -d sat -j
                        14:31:20.917224 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdf -d cciss,0 -j
                        14:31:20.917281 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sda -d sat -j
                        14:31:20.917315 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdi -d areca,1 -j
                        14:31:20.917417 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdb -d 3ware,0 -j
                        14:31:20.917482 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdf -d 3ware,0 -j
                        14:31:20.917542 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdf -d sat -j
                        14:31:20.917592 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdf -d areca,1 -j
                        14:31:20.917704 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sde -d cciss,0 -j
                        14:31:20.917751 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sda -d areca,1 -j
                        14:31:20.917799 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sda -d cciss,0 -j
                        14:31:20.917865 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdc -d 3ware,0 -j
                        14:31:20.917952 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdi -d 3ware,0 -j
                        14:31:20.918036 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdb -d cciss,0 -j
                        14:31:20.918150 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdg -d cciss,0 -j
                        14:31:20.918316 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdg -d sat -j
                        14:31:20.918486 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdh -d cciss,0 -j
                        14:31:20.918588 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdb -d sat -j
                        14:31:20.918685 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdc -d cciss,0 -j
                        14:31:20.918733 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdc -d sat -j
                        14:31:20.918838 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdi -d cciss,0 -j
                        14:31:20.918925 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdg -d areca,1 -j
                        14:31:20.918971 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdi -d sat -j
                        14:31:20.919022 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdj -d sat -j
                        14:31:20.919139 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdc -d areca,1 -j
                        14:31:20.919192 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdh -d 3ware,0 -j
                        14:31:20.919263 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdj -d cciss,0 -j
                        14:31:20.919362 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdd -d areca,1 -j
                        14:31:20.919428 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdj -d 3ware,0 -j
                        14:31:20.919803 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sde -d 3ware,0 -j
                        14:31:20.919886 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdg -d 3ware,0 -j
                        14:31:20.919967 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sde -d sat -j
                        14:31:20.920031 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdd -d 3ware,0 -j
                        14:31:20.920094 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdd -d cciss,0 -j
                        14:31:20.920157 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sde -d areca,1 -j
                        14:31:20.920245 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdh -d areca,1 -j
                        14:31:20.920425 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdj -d areca,1 -j
                        14:31:20.920585 [Smart] executing smartctl command: sudo -n smartctl -a /dev/sdd -d sat -j
                      On the Zabbix template, I tried to set the LLD macros {#RAIDTYPE} and {#RAID} to scsi, but this was unsuccessful. I also tried to force the Zabbix Agent SMART plugin executable path to include the full executable path with -d scsi, and that did not work either (I don't recommend it: the smart.disk.discovery process keeps looping continuously).

                      Any hints? This template gets the correct data only with the SCSI device type, but I have no idea how to set this on the Zabbix Agent2 config or on the Zabbix Template for the host.
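
The log in this post looks less random once it is read as every device combined with every device type: ten devices (sda to sdj) times four types (3ware,0, areca,1, cciss,0, sat) gives the 40 commands shown, and scsi is simply not among the types tried. A minimal Python sketch that reproduces the pattern; this is an illustration based on the log only, not the plugin's source, and the `sudo -n` prefix is left out for brevity:

```python
from itertools import product
from typing import List

# Devices and device types as they appear in the log excerpt of this post.
devices = [f"/dev/sd{letter}" for letter in "abcdefghij"]
device_types = ["3ware,0", "areca,1", "cciss,0", "sat"]


def candidate_commands(devs: List[str], types: List[str]) -> List[str]:
    """Build one smartctl command line per (device, device type) pair."""
    return [f"smartctl -a {dev} -d {dtype} -j" for dev, dtype in product(devs, types)]


commands = candidate_commands(devices, device_types)
print(len(commands))
print(commands[0])
```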


                      • 2b2bff
                        Junior Member
                        • Nov 2020
                        • 4

                        #56
                        Another input: SSD and NVMe wear-out is not recognized in all cases. Currently the template looks for "percentage used", but different disks represent this differently. For example:

                        233 Media_Wearout_Indicator 0x0032 082 082 000 Old_age Always - 0

                        or

                        202 Percent_Lifetime_Remain 0x0030 098 098 001 Old_age Offline - 2

                        So the template should be extended to cover them as well...

                        Cheers.
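
A minimal Python sketch of how the two attribute lines in this post could be mapped to a single "percent used" figure. It assumes that for both attributes the normalized VALUE column counts down from 100 as the drive wears; that is a common vendor convention but not a guarantee, so check it against the drive's documentation. The attribute set and the helper names are hypothetical.

```python
from typing import Optional

# Assumption: for these attributes the normalized VALUE starts at 100 and
# decreases with wear, so "percent used" is 100 - VALUE.
WEAR_ATTRIBUTES = {"Media_Wearout_Indicator", "Percent_Lifetime_Remain"}


def parse_attribute_line(line: str) -> dict:
    """Split one line of the classic `smartctl -A` attribute table into fields."""
    fields = line.split()
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": int(fields[3]),
        "worst": int(fields[4]),
        "thresh": int(fields[5]),
        "raw": fields[9],
    }


def percent_used(line: str) -> Optional[int]:
    """Return an estimated wear percentage, or None for unrelated attributes."""
    attr = parse_attribute_line(line)
    if attr["name"] not in WEAR_ATTRIBUTES:
        return None
    return 100 - attr["value"]


intel_style = "233 Media_Wearout_Indicator 0x0032 082 082 000 Old_age Always - 0"
micron_style = "202 Percent_Lifetime_Remain 0x0030 098 098 001 Old_age Offline - 2"
print(percent_used(intel_style), percent_used(micron_style))
```

Note that the raw value is not used here: in the two samples it is 0 in one case and 2 in the other, which is exactly why a single "percentage used" lookup misses some drives.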


                        • 2b2bff
                          Junior Member
                          • Nov 2020
                          • 4

                          #57
                          And another observation: I have an SSD with one entry in the error log. Now it is marked as failing forever.
                          I think the trigger should instead fire only when the error count in the log has increased. That way you can ignore the existing error for the time being and only care about new ones...
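
The logic proposed in this post can be sketched in a few lines of Python: compare each sample of the error count with the previous one and alert only on growth. The function names are hypothetical; in Zabbix terms this is roughly what a trigger built on the change() function expresses, as opposed to a plain "count > 0" check.

```python
from typing import List, Optional


def error_count_increased(previous: Optional[int], current: int) -> bool:
    """Return True only when the error-log count grew since the last sample."""
    if previous is None:  # first sample: nothing to compare against
        return False
    return current > previous


def alerts_for(samples: List[int]) -> List[bool]:
    """Evaluate a series of error counts the way a change-based trigger would."""
    alerts = []
    previous = None
    for current in samples:
        alerts.append(error_count_increased(previous, current))
        previous = current
    return alerts


# A disk that already has one old error, then logs a new one on the fourth sample.
print(alerts_for([1, 1, 1, 2, 2]))
```

With this rule the old entry never raises an alert, and the alert clears again once the count stops growing.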


                          • gmseeley
                            Junior Member
                            • Jan 2023
                            • 2

                            #58
                            The smart plugin should allow the filtering/configuration of the "raid" types as Proxmox/Debian will spam the system logs with:

                            Code:
                            program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
                            and is caused by the code invoking smartctl with "areca", "cciss" and "3ware" on systems where none of these raid devices are in use/exist.

                            I also think the plugin should have configuration to allow for optional sudo invocation of smartctl if the user running zabbix has permissions without sudo. Why? Again, it spams the logs with:

                            Code:
                            pam_unix(sudo:session): session closed for user root
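
A minimal Python sketch of the "optional sudo" behaviour requested in this post: call smartctl directly when already root, and fall back to `sudo -n` only when needed. This is a hypothetical helper for illustration, not an existing plugin option; the `is_root` and `sudo_available` parameters are made explicit so the decision is easy to test.

```python
import os
import shutil
from typing import List


def build_smartctl_argv(args: List[str], is_root: bool, sudo_available: bool) -> List[str]:
    """Prefix smartctl with `sudo -n` only when the caller is not root."""
    if is_root:
        return ["smartctl", *args]
    if sudo_available:
        return ["sudo", "-n", "smartctl", *args]
    raise RuntimeError("not running as root and sudo is not installed")


def build_for_current_user(args: List[str]) -> List[str]:
    """Same decision, based on the current process (POSIX only)."""
    return build_smartctl_argv(args, os.geteuid() == 0, shutil.which("sudo") is not None)


print(build_smartctl_argv(["--scan", "-j"], is_root=True, sudo_available=False))
```

On a root-only system such as the Proxmox hosts described earlier in the thread, this would avoid both the missing-sudo failure and the pam_unix log lines.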


                            • 2b2bff
                              Junior Member
                              • Nov 2020
                              • 4

                              #59
                              The discovery rule for Linux gets disks by name rather than by id. But names like /dev/sda can change depending on the order in which the OS detects the disks. IMHO it would be better to use /dev/disk/by-id because those don't change...
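
A minimal Python sketch of the idea: resolve the symlinks under /dev/disk/by-id to see which stable names point at which kernel device. The function name is hypothetical, and the directory is a parameter so the sketch can be tried against any directory of symlinks.

```python
import os
from typing import Dict, List


def stable_names(by_id_dir: str = "/dev/disk/by-id") -> Dict[str, List[str]]:
    """Map each resolved device path to the by-id link names that point at it."""
    mapping: Dict[str, List[str]] = {}
    for name in sorted(os.listdir(by_id_dir)):
        target = os.path.realpath(os.path.join(by_id_dir, name))
        mapping.setdefault(target, []).append(name)
    return mapping


# Example call on a Linux host (commented out because the directory may not exist):
# for device, names in stable_names().items():
#     print(device, names)
```

A disk usually has more than one by-id link, and partitions get their own links, so a discovery rule built on this would still need to pick one naming scheme and filter out partitions.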


                              • ktt
                                Junior Member
                                • May 2017
                                • 7

                                #60
                                Originally posted by Ch77
                                I added the SMART template to a second Debian server and received this:
                                Code:
                                Cannot create item: item with the same key "smart.disk.get[{#PATH},"{#RAIDTYPE}"]" already exists.
                                Cannot create item: item with the same key "smart.disk.get[{#PATH},"{#RAIDTYPE}"]" already exists.
                                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                                Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
                                However, there are no items with a key like "smart.*".

                                I'm having this same problem with version 5.2.7. On a couple of Ubuntu 20.04 computers SMART monitoring works well, but on one computer I receive this error. All of them use the same 5.2.7 version of the agent, and all are behind proxies.

                                When I run this in the problematic computer's local shell, the response is as it should be.
                                Code:
                                zabbix_get -s localhost -p 10050 -k smart.disk.discovery
                                [{"{#NAME}":"sda sat","{#DISKTYPE}":"HDD","{#MODEL}":"WDC WD2500BEVT-00A23T0","{#SN}":"WD-WX91A6043599"}]
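
One thing visible in that response: it contains {#NAME}, {#DISKTYPE}, {#MODEL} and {#SN}, but none of the macros the error messages complain about ({#PATH}, {#RAIDTYPE}, {#ATTRIBUTES}). That would be consistent with a template that expects a newer agent than the one answering, which matches the earlier report in this thread that upgrading the agent fixed it. A minimal Python sketch of the comparison; the helper name is hypothetical and the expected macro set is taken only from the error text quoted above:

```python
import json
from typing import Set

# Macros named in the error messages quoted in this post.
EXPECTED_MACROS = {"{#PATH}", "{#RAIDTYPE}", "{#ATTRIBUTES}"}


def missing_macros(discovery_json: str, expected: Set[str] = EXPECTED_MACROS) -> Set[str]:
    """Return the expected LLD macros that at least one discovered row lacks."""
    missing: Set[str] = set()
    for row in json.loads(discovery_json):
        missing |= expected - set(row)
    return missing


# The discovery response from this post.
response = (
    '[{"{#NAME}":"sda sat","{#DISKTYPE}":"HDD",'
    '"{#MODEL}":"WDC WD2500BEVT-00A23T0","{#SN}":"WD-WX91A6043599"}]'
)
print(sorted(missing_macros(response)))
```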
                                ​

