Discussion thread for official Zabbix SMART Disk monitoring

  • robsitz
    Junior Member
    • Jan 2021
    • 4

    #31
    Originally posted by max.ch.88

    If you don't have any unsupported items or LLD errors, the template works correctly.
    Hi max.ch.88

    OK, I will try to monitor an old HDD with errors in SMART data and I will observe Zabbix behavior through smartmontools.

    Thank you.

    Regards
    RS

    Comment

    • pax0707
      Junior Member
      • Mar 2022
      • 6

      #32
      SanDisk SSDs

      Code:
      Warning    PROBLEM    Attribute discovery: SMART [sda]: Attribute 244 Thermal_Throttle is failed
      The trigger is expecting an integer.
      Code:
      ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
      5 Reallocated_Sector_Ct 0x0032 100 100 --- Old_age Always - 0
      9 Power_On_Hours 0x0032 100 100 --- Old_age Always - 16233
      12 Power_Cycle_Count 0x0032 100 100 --- Old_age Always - 16
      165 Total_Write/Erase_Count 0x0032 100 100 --- Old_age Always - 25689
      166 Min_W/E_Cycle 0x0032 100 100 --- Old_age Always - 125
      167 Min_Bad_Block/Die 0x0032 100 100 --- Old_age Always - 52
      168 Maximum_Erase_Cycle 0x0032 100 100 --- Old_age Always - 159
      169 Total_Bad_Block 0x0032 100 100 --- Old_age Always - 185
      170 Unknown_Marvell_Attr 0x0032 100 100 --- Old_age Always - 0
      171 Program_Fail_Count 0x0032 100 100 --- Old_age Always - 0
      172 Erase_Fail_Count 0x0032 100 100 --- Old_age Always - 0
      173 Avg_Write/Erase_Count 0x0032 100 100 --- Old_age Always - 125
      174 Unexpect_Power_Loss_Ct 0x0032 100 100 --- Old_age Always - 9
      184 End-to-End_Error 0x0032 100 100 --- Old_age Always - 0
      187 Reported_Uncorrect 0x0032 100 100 --- Old_age Always - 0
      188 Command_Timeout 0x0032 100 100 --- Old_age Always - 0
      194 Temperature_Celsius 0x0022 072 046 --- Old_age Always - 28 (Min/Max 0/46)
      199 SATA_CRC_Error 0x0032 100 100 --- Old_age Always - 0
      230 Perc_Write/Erase_Count 0x0032 100 100 --- Old_age Always - 18727 6400 18727
      232 Perc_Avail_Resrvd_Space 0x0033 100 100 --- Pre-fail Always - 100
      233 Total_NAND_Writes_GiB 0x0032 100 100 --- Old_age Always - 31014
      234 Perc_Write/Erase_Ct_BC 0x0032 100 100 --- Old_age Always - 204712
      241 Total_Writes_GiB 0x0030 100 100 --- Old_age Offline - 69829
      242 Total_Reads_GiB 0x0030 100 100 --- Old_age Offline - 39323
      244 Thermal_Throttle 0x0032 000 100 --- Old_age Always - 0


      Code:
          PROBLEM        SMART [sda]: Attribute 244 Unknown_Attribute is failed
      Code:
          PROBLEM        SMART [sda]: Attribute 9 Power_On_Hours_and_Msec is failed
      Code:
      ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
      5 Retired_Block_Count 0x0032 100 100 --- Old_age Always - 0
      9 Power_On_Hours_and_Msec 0x0032 000 100 --- Old_age Always - 9211h+00m+00.000s
      12 Power_Cycle_Count 0x0032 100 100 --- Old_age Always - 5032
      165 Unknown_Attribute 0x0032 100 100 --- Old_age Always - 61749752249210
      166 Unknown_Attribute 0x0032 100 100 --- Old_age Always - 64
      167 Unknown_Attribute 0x0032 100 100 --- Old_age Always - 0
      168 Unknown_Attribute 0x0032 100 100 --- Old_age Always - 188
      169 Unknown_Attribute 0x0032 100 100 --- Old_age Always - 0
      170 Reserve_Block_Count 0x0032 100 100 --- Old_age Always - 0
      171 Program_Fail_Count 0x0032 100 100 --- Old_age Always - 0
      172 Erase_Fail_Count 0x0032 100 100 --- Old_age Always - 0
      173 Unknown_SandForce_Attr 0x0032 100 100 --- Old_age Always - 141
      174 Unexpect_Power_Loss_Ct 0x0032 100 100 --- Old_age Always - 40
      187 Reported_Uncorrect 0x0032 100 100 --- Old_age Always - 0
      188 Command_Timeout 0x0032 100 100 --- Old_age Always - 0
      194 Temperature_Celsius 0x0022 066 032 --- Old_age Always - 34 (Min/Max 0/68)
      199 SATA_CRC_Error_Count 0x0032 100 100 --- Old_age Always - 0
      230 Life_Curve_Status 0x0032 100 100 --- Old_age Always - 21797091414995
      232 Available_Reservd_Space 0x0033 100 100 --- Pre-fail Always - 100
      233 SandForce_Internal 0x0032 100 100 --- Old_age Always - 33629
      234 SandForce_Internal 0x0032 100 100 --- Old_age Always - 142790
      241 Lifetime_Writes_GiB 0x0030 253 253 --- Old_age Offline - 62178
      242 Lifetime_Reads_GiB 0x0030 253 253 --- Old_age Offline - 108956
      244 Unknown_Attribute 0x0032 000 100 --- Old_age Always - 0
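In the two tables above, both attributes reported as failed have a normalized VALUE of 000 with no threshold (---), and attribute 9 on the second drive also has a raw value that is not a plain integer. The sketch below is only an illustration of how to pick such rows out of pasted `smartctl -A` output. It is not the template's trigger logic, and the function name `suspicious_rows` is made up for this example.

```python
import re

# Column layout of the attribute rows as pasted in this thread:
# ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE...
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\S+)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s+(.*)$"
)

def suspicious_rows(table_text):
    """Return (id, name, reasons) for rows a naive integer check may trip on."""
    found = []
    for line in table_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header or blank line
        attr_id, name, _flag, value, _worst, thresh, _type, _upd, _when, raw = m.groups()
        reasons = []
        if int(value) == 0 and thresh == "---":
            reasons.append("VALUE is 000 and no threshold is reported")
        if not raw.strip().isdigit():
            reasons.append("RAW_VALUE is not a plain integer")
        if reasons:
            found.append((int(attr_id), name, reasons))
    return found

# Rows copied from the second table above
SAMPLE = """\
9 Power_On_Hours_and_Msec 0x0032 000 100 --- Old_age Always - 9211h+00m+00.000s
12 Power_Cycle_Count 0x0032 100 100 --- Old_age Always - 5032
194 Temperature_Celsius 0x0022 066 032 --- Old_age Always - 34 (Min/Max 0/68)
244 Unknown_Attribute 0x0032 000 100 --- Old_age Always - 0
"""

for attr_id, name, reasons in suspicious_rows(SAMPLE):
    print(attr_id, name, "; ".join(reasons))
```

Attribute 194 is listed only because of its annotated raw value; it is not one of the attributes the trigger complained about.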

      Comment

      • solution
        Senior Member
        • Jun 2020
        • 269

        #33
        After updating the template to:
        "zabbix_export: version: '6.0' date: '2022-04-06T19:34:02Z'..." https://git.zabbix.com/projects/ZBX/...at=release/6.0

        no items are created, and the following errors appear:
        Cannot create item: item with the same key "smart.disk.get[{#PATH},"{#RAIDTYPE}"]" already exists.
        Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
        Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
        Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
        Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
        Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
        Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
        Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
        Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".
        Cannot accurately apply filter: no value received for macro "{#ATTRIBUTES}".

        (These are shown under Host --> Discovery rules --> Info.)


        I already did "delete and clear", but the same errors appear.
        Last edited by solution; 23-04-2022, 22:32.
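One possible reading of the errors in #33, offered as an assumption rather than a confirmed diagnosis: if the discovery data does not contain {#PATH} and {#RAIDTYPE}, every discovered disk expands the item prototype to the same literal key, which would explain "item with the same key ... already exists". The sketch below illustrates that effect. The helper names and the sample discovery rows are hypothetical, and leaving unknown macros unexpanded is an assumption about the server's behavior.

```python
def expand_key(prototype_key, row):
    """Substitute the LLD macros present in `row`; unknown macros stay as-is."""
    key = prototype_key
    for macro, value in row.items():
        key = key.replace(macro, value)
    return key

def duplicate_keys(prototype_key, rows):
    """Return the expanded keys that occur more than once across `rows`."""
    seen, dupes = set(), []
    for row in rows:
        key = expand_key(prototype_key, row)
        if key in seen:
            dupes.append(key)
        seen.add(key)
    return dupes

# Item prototype key quoted in the error message
PROTOTYPE = 'smart.disk.get[{#PATH},"{#RAIDTYPE}"]'

# Hypothetical discovery rows without {#PATH} and {#RAIDTYPE}
rows_without_macros = [{"{#NAME}": "sda"}, {"{#NAME}": "sdb"}]

# Hypothetical discovery rows that do carry both macros
rows_with_macros = [
    {"{#NAME}": "sda sat", "{#PATH}": "/dev/sda", "{#RAIDTYPE}": "sat"},
    {"{#NAME}": "sdb sat", "{#PATH}": "/dev/sdb", "{#RAIDTYPE}": "sat"},
]

print(duplicate_keys(PROTOTYPE, rows_without_macros))
print(duplicate_keys(PROTOTYPE, rows_with_macros))
```

If this reading is right, comparing the macros in the output of `zabbix_agent2 -t smart.disk.discovery` with the ones the template expects would be a reasonable first check.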

        Comment

        • damage
          Junior Member
          • Apr 2022
          • 3

          #34
          Same here on version 5.0 (yes, I know I should update): I get the same errors that solution reported in #33. I noticed a major change to the template in Git on 29 Mar 2022. Using a version from before that date "solved" the problem.

          I haven't investigated further yet. Maybe someone can explain what happened?

          regards
          Daniel

          EDIT: Be aware that the macros {$SMART.DISK.NAME.MATCHES} and {$SMART.ATTRIBUTE.ID.MATCHES} MUST be set in template versions from before 29 Mar 2022. I've set both to "^.*$", although to be honest I don't know exactly what {$SMART.ATTRIBUTE.ID.MATCHES} does.
          Last edited by damage; 24-04-2022, 18:47.
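On the question of what a value like "^.*$" does: a regular-expression filter keeps only the discovered entries whose macro value matches the pattern, and "^.*$" matches any string, so nothing is filtered out. Judging by its name, SMART.ATTRIBUTE.ID.MATCHES presumably applies such a filter to the attribute ID, but that is an assumption. The sketch below shows the general idea with a hypothetical `{#ID}` macro and made-up entries; it is not taken from the template.

```python
import re

def lld_filter(entries, macro, pattern):
    """Keep the entries whose value for `macro` matches the regular expression."""
    rx = re.compile(pattern)
    return [e for e in entries if rx.search(e.get(macro, ""))]

# Hypothetical discovery entries; {#ID} is an assumed macro name
ENTRIES = [
    {"{#ID}": "5", "{#ATTRNAME}": "Reallocated_Sector_Ct"},
    {"{#ID}": "9", "{#ATTRNAME}": "Power_On_Hours"},
    {"{#ID}": "244", "{#ATTRNAME}": "Thermal_Throttle"},
]

# "^.*$" matches everything, so all three entries survive
print(len(lld_filter(ENTRIES, "{#ID}", r"^.*$")))

# A narrower pattern keeps only attributes 5 and 9
print([e["{#ATTRNAME}"] for e in lld_filter(ENTRIES, "{#ID}", r"^(5|9)$")])
```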

          Comment

          • Colttt
            Senior Member
            Zabbix Certified Specialist
            • Mar 2009
            • 878

            #35
            It would be very nice if someone could also add this to zabbix-agent, because I want to monitor SMART values on FreeBSD as well.
            Debian-User

            Sorry for my bad english

            Comment

            • solution
              Senior Member
              • Jun 2020
              • 269

              #36
              I use this script and template: https://github.com/v-zhuravlev/zbx-smartctl
              It works on Windows and Linux.

              I'm only testing the native Zabbix template because the triggers of the script/template above don't work properly in Zabbix 6 and I don't know how to fix them. That project also hasn't had updates or answers for a while.
              However, I'm not actually using the native Zabbix template because it's unreadable. The data is very hard to read.

              **Sorry my bad english, translate using google translator.

              Comment

              • fl0w
                Junior Member
                • Aug 2021
                • 5

                #37
                Hi,
                I still have an issue on a fresh install on Rocky Linux 8.5 + zabbix-agent2-6.0.4-1.el8.x86_64, I've set up the sudoers file like below:
                Code:
                Defaults:zabbix !requiretty
                zabbix ALL=(ALL) NOPASSWD:/usr/sbin/smartctl
                I've successfully run smartctl as the zabbix user (after setting its shell to /bin/bash in /etc/passwd for testing):
                Code:
                bash-4.4$ sudo smartctl -a /dev/sda
                smartctl 7.1 2020-04-05 r5049 [x86_64-linux-5.17.7-1.el8.elrepo.x86_64] (local build)
                Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
                
                === START OF INFORMATION SECTION ===
                Device Model: TOSHIBA MD04ACA600
                But running zabbix_get from the Zabbix server gets the following error:
                Code:
                bash-4.4$ zabbix_get -s host.domain.tld -k smart.disk.discovery
                ZBX_NOTSUPPORTED: Cannot fetch data: Failed to scan for devices: Cannot unmarshal JSON: invalid character 's' looking for beginning of value..
                Any idea?
                Thanks

                Comment

                • dyndan
                  Junior Member
                  • May 2022
                  • 13

                  #38
                  Hi,
                  I'm having a problem getting this to work on Linux.
                  However, it works out of the box on Windows.

                  Tested on Proxmox V7.2.3
                  Zabbix Agent 2 V6.0.4
                  Smartmontools V7.2 r5155

                  Template: SMART by Zabbix agent 2 active

                  To make things easier to debug, I'm running Zabbix agent as root, so I shouldn't have any permission issues.
                  Code:
                  ps aux | grep zabbix
                  root 2294695 0.1 0.0 1316324 16412 ? Ssl 12:11 0:00 /usr/sbin/zabbix_agent2 -c /etc/zabbix/zabbix_agent2.conf
                  What I get in the agent log is as follows
                  Code:
                  2022/05/17 12:12:24.003012 check 'smart.disk.get' is not supported: Failed to execute smartctl: Command execution failed: exit status 127.
                  2022/05/17 12:13:24.002925 check 'smart.disk.get' is not supported: Cannot fetch data: Failed to scan for devices: Cannot unmarshal JSON: invalid character 's' looking for beginning of value..
                  2022/05/17 12:14:24.001999 check 'smart.disk.get' is not supported: Cannot fetch data: Failed to scan for devices: Cannot unmarshal JSON: invalid character 's' looking for beginning of value..
                  Nothing relevant in syslog.

                  Any idea?
                  Thanks








                  Comment


                  • bcnx
                    bcnx commented
                    I have the exact same problem on Proxmox VE 6.4. It works fine on Ubuntu. I have no idea why, so I'm using an alternative way of monitoring NVMe until I find a solution.
                • fl0w
                  Junior Member
                  • Aug 2021
                  • 5

                  #39
                  Found it: Zabbix is calling smartctl at the path /usr/local/bin/smartctl, which doesn't exist.
                  I've created a symlink:
                  Code:
                  ln -s /usr/sbin/smartctl /usr/local/bin/smartctl
                  And my sudoers is like this:
                  Code:
                  Defaults:zabbix !requiretty
                  zabbix ALL=(ALL) NOPASSWD:/usr/sbin/smartctl
                  Now it works.

                  Comment


                  • bcnx
                    bcnx commented
                    You can use a parameter in the Zabbix Agent 2 config file to point to the actual location of your smartctl binary. That's less messy than a symlink.
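The comment above does not name the parameter. To my knowledge, the smart plugin of Zabbix agent 2 reads it from `Plugins.Smart.Path`, but treat that name as an assumption and check the documentation for your agent version. A minimal sketch for checking what a config file sets it to (the function name and the sample config are hypothetical):

```python
def configured_smartctl_path(conf_text, key="Plugins.Smart.Path"):
    """Return the value of `key` from agent config text, or None if it is unset."""
    value = None
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, val = line.partition("=")
        if sep and name.strip() == key:
            value = val.strip()  # the last occurrence wins in this sketch
    return value

# Hypothetical excerpt of /etc/zabbix/zabbix_agent2.conf
SAMPLE_CONF = """\
Server=192.0.2.10
# Plugins.Smart.Path=
Plugins.Smart.Path=/usr/sbin/smartctl
"""

print(configured_smartctl_path(SAMPLE_CONF))
```

With the path set this way, the sudoers entry for /usr/sbin/smartctl shown in #39 and the binary the agent calls would refer to the same file.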
                • dyndan
                  Junior Member
                  • May 2022
                  • 13

                  #40
                  Following up on my post #38: got it, sudo was missing.

                  Code:
                  Command [sudo -n smartctl -j -V] execution failed: exit status 127
                  sh: 1: sudo: not found
                  Installed sudo, and now it works.

                  Thank you
                  Last edited by dyndan; 19-05-2022, 13:51. Reason: Typo
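This also fits the JSON error quoted in #37 and #38. The agent evidently tries to parse the command's output as JSON, and a shell message such as "sh: 1: sudo: not found" begins with the letter s, which cannot start a JSON value. The sketch below reproduces the same kind of failure with Python's JSON parser. It only illustrates the mechanism and is not the agent's code; the function name is made up.

```python
import json

def json_problem(output):
    """Return None if `output` parses as JSON, else a short description."""
    try:
        json.loads(output)
        return None
    except json.JSONDecodeError as exc:
        bad = output[exc.pos:exc.pos + 1]
        return f"invalid character {bad!r} at offset {exc.pos}"

# Shell error text from the post above
print(json_problem("sh: 1: sudo: not found"))

# Hypothetical, trimmed stand-in for valid `smartctl -j` output
print(json_problem('{"smartctl": {"version": [7, 2]}}'))
```

Running the agent's command by hand as the zabbix user (`sudo -n smartctl -j -V`, as shown in the log above) and checking whether the output starts with `{` is a quick way to tell the two cases apart.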

                  Comment

                  • santiagobiali
                    Junior Member
                    • Apr 2022
                    • 5

                    #41
                    Hi, I'm having an issue where Zabbix agent 2 discovers only one disk on Proxmox.

                    Code:
                    >zabbix_agent2 -V
                    zabbix_agent2 (Zabbix) 6.2.0beta2
                    Revision c8058d35892 10 May 2022, compilation time: May 10 2022 13:07:14
                    Compiled with OpenSSL 1.1.1k 25 Mar 2021
                    Running with OpenSSL 1.1.1n 15 Mar 2022
                    Code:
                    >uname -a
                    Linux proxmoxteste 5.15.35-1-pve #1 SMP PVE 5.15.35-3 (Wed, 11 May 2022 07:57:51 +0200) x86_64 GNU/Linux
                    Code:
                    >smartctl -V
                    smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.35-1-pve] (local build)
                    I also tested building smartmontools 7.3, but got the same result.

                    Code:
                    > sudo -u zabbix smartctl --scan
                    /dev/sda -d scsi # /dev/sda, SCSI device
                    /dev/sdd -d scsi # /dev/sdd, SCSI device
                    /dev/sdf -d scsi # /dev/sdf, SCSI device
                    /dev/sdg -d scsi # /dev/sdg, SCSI device
                    /dev/sdh -d scsi # /dev/sdh, SCSI device
                    /dev/sdi -d scsi # /dev/sdi, SCSI device
                    /dev/sdk -d scsi # /dev/sdk, SCSI device
                    /dev/sdl -d scsi # /dev/sdl, SCSI device
                    /dev/sdm -d scsi # /dev/sdm, SCSI device
                    /dev/sdn -d scsi # /dev/sdn, SCSI device
                    /dev/sdo -d scsi # /dev/sdo, SCSI device
                    /dev/sdp -d scsi # /dev/sdp, SCSI device
                    /dev/sdq -d scsi # /dev/sdq, SCSI device
                    /dev/sdr -d scsi # /dev/sdr, SCSI device
                    As you can see above, the zabbix user can see all disks with smartctl --scan, but when I try to discover them in Zabbix I get only sda:

                    Code:
                    >sudo -u zabbix zabbix_agent2 -c /etc/zabbix/zabbix_agent2.conf -t smart.disk.discovery
                    [{"{#NAME}": "sda sat",
                    "{#DISKTYPE}": "ssd",
                    "{#MODEL}": "KINGSTON SA400S37240G",
                    "{#SN}": "50026B778256BAD2",
                    "{#PATH}": "/dev/sda",
                    "{#RAIDTYPE}": "sat",
                    "{#ATTRIBUTES}": "Raw_Read_Error_Rate Power_On_Hours Power_Cycle_Count Write_Protect_Mode SATA_Phy_Error_Count Bad_Block_Rate Bad_Blk_Ct_Erl/Lat Erase_Fail_Count MaxAvgErase_Ct Program_Fail_Count Erase_Fail_Count Reported_Uncorrect Unsafe_Shutdown_Count Temperature_Celsius Reallocated_Event_Count SATA_CRC_Error_Count CRC_Error_Count SSD_Life_Left Flash_Writes_GiB Lifetime_Writes_GiB Lifetime_Reads_GiB Average_Erase_Count Max_Erase_Count Total_Erase_Count Total_Erase_Count"}]
                    "sudo -u zabbix zabbix_agent2 -c /etc/zabbix/zabbix_agent2.conf -t smart.attribute.discovery" also returns only sda data.

                    Comment

                    • santiagobiali
                      Junior Member
                      • Apr 2022
                      • 5

                      #42
                      Hey, Zabbix should use the disk's serial number instead of the name (sda/sdb/sdc) as its unique ID, because the name can change in a lot of situations.
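To illustrate the suggestion: the discovery output in #41 already carries a `{#SN}` macro next to `{#PATH}`, so entries can be keyed by serial number and still be matched when device names get reshuffled. This is a minimal sketch of the idea, not how the template works. The second serial number and the "two boots" scenario are invented for the example.

```python
import json

def index_by_serial(discovery_json):
    """Map each disk's serial number ({#SN}) to its current device path ({#PATH})."""
    return {entry["{#SN}"]: entry["{#PATH}"] for entry in json.loads(discovery_json)}

# Two hypothetical boots of the same machine; "EXAMPLE0002" is a made-up serial
BOOT_1 = json.dumps([
    {"{#NAME}": "sda sat", "{#SN}": "50026B778256BAD2", "{#PATH}": "/dev/sda"},
    {"{#NAME}": "sdb sat", "{#SN}": "EXAMPLE0002", "{#PATH}": "/dev/sdb"},
])
BOOT_2 = json.dumps([
    {"{#NAME}": "sda sat", "{#SN}": "EXAMPLE0002", "{#PATH}": "/dev/sda"},
    {"{#NAME}": "sdb sat", "{#SN}": "50026B778256BAD2", "{#PATH}": "/dev/sdb"},
])

before = index_by_serial(BOOT_1)
after = index_by_serial(BOOT_2)

# The same serial numbers are present in both boots, only the paths moved
for serial in sorted(before):
    print(serial, before[serial], "->", after[serial])
```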

                      Comment

                      • rdziwinski
                        Junior Member
                        • Jun 2022
                        • 1

                        #43
                        Hello,
                        I have the same issue as solution.

                        The following error appears 5 times (once per disk in the server):
                        Code:
                        Cannot create item: item with the same key "smart.disk.get[]" already exists.
                        All seems ok:

                        Code:
                        root@zabbix-server:~# zabbix_get -s agent.com -k "smart.disk.get" -p 10050 #return data
                        [email protected]:~# sudo -u zabbix smartctl --scan # list disks
                        USER=root ; COMMAND=/usr/sbin/smartctl -a /dev/bus/0 -d megaraid,2 -j # execute well
                        Zabbix Server 6.0.4
                        Zabbix Agent 6.0.4
                        Smartmontools 7.1


                        Comment

                        • Codename_Pineapple
                          Junior Member
                          • Jun 2022
                          • 1

                          #44
                          Hi there!

                          Will support for SAS disks be added someday?

                          Comment

                        • bcnx
                          Junior Member
                          • Jan 2011
                          • 19

                          #45
                          Hi,
                          I have problems with NVMe drives on a Linux server. Most items work fine, but "Self-test passed" is greyed out in Latest data. In the server log file I see two kinds of error messages:

                          15921:20220701:164122.410 error reason for "WKSTU:smart.disk.test[nvme9]" changed: Preprocessing failed for: [{"device":{"info_name":"/dev/nvme3","name":"/dev/nvme3","protocol":"NVMe","type":"nvme"},"disk_n...
                          1. Failed: cannot extract value from json by path "$[?(@.disk_name=='nvme9')].ata_smart_data.self_test.status.passed.first()": no data matches the specified path

                          15921:20220701:163821.963 error reason for "WKSTU:smart.disk.test[nvme5]" changed: Preprocessing failed for: [{"ata_sct_capabilities":{"data_table_supported":true,"error_recovery_control_supported":true,"fe...
                          1. Failed: cannot extract value from json by path "$[?(@.disk_name=='nvme5')].ata_smart_data.self_test.status.passed.first()": no data matches the specified path

                          I've tried several versions of smartmontools and had smartctl run with root privileges, to no avail. SATA drives seem to work fine.

                          Does anyone have the same problem? I'm on Zabbix 5.4.

                          Cheers,

                          BC
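A possible reading of the two errors in #45, stated as an assumption because the logged payloads are truncated: the JSONPath in the message reaches into `ata_smart_data`, which is an ATA-specific section, while the record for an NVMe device may simply not contain it. The sketch below shows how such a lookup comes back empty. The two records are hypothetical and heavily trimmed, shaped only after the fragments visible in the log lines.

```python
def lookup(record, dotted_path):
    """Follow a dotted key path through nested dicts; None if any key is missing."""
    node = record
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

# The part of the JSONPath from the error message that follows the disk filter
ATA_PATH = "ata_smart_data.self_test.status.passed"

# Hypothetical, trimmed records
sata_record = {
    "disk_name": "sda",
    "ata_smart_data": {"self_test": {"status": {"passed": True}}},
}
nvme_record = {
    "disk_name": "nvme9",
    "device": {"name": "/dev/nvme3", "protocol": "NVMe", "type": "nvme"},
}

print(lookup(sata_record, ATA_PATH))
print(lookup(nvme_record, ATA_PATH))
```

Running `smartctl -j -a` against one of the NVMe devices and searching the output for `ata_smart_data` would confirm or rule out this explanation.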

                          Comment
