
Discussion thread for official Zabbix SMART Disk monitoring

  • PavelZ
    Senior Member
    • Dec 2024
    • 162

    #106
    I noticed another obvious problem:
    You show the disk /dev/sda as a Raid 1 Volume, i.e. it consists of two physical disks.
    smartctl returns only one set of indicators for it. Obviously, it cannot combine two sets of data.
    The physical disks should not be addressed as /dev/sda.
    On Windows, such disks are usually polled separately, with the driver (device type) specified via -d.
    There is some key or naming convention for this.

    "scsi_vendor": "Intel",\r\n "scsi_product": "Raid 1 Volume",\r\n
    Try to find the correct command. It will be something like:
    smartctl -a /dev/csmi0,1

    I do not have this exact controller model, so it is difficult for me to give a ready-made recipe. But I hope this information will help you.

    Comment

    • PavelZ
      Senior Member
      • Dec 2024
      • 162

      #107
      @INFinite Please note that the template is designed to exclude some devices that do not support queries. There are two macros {$SMART.DISK.NAME.MATCHES} and {$SMART.DISK.NAME.NOT_MATCHES}
      If you exclude /dev/sda, the errors will disappear.

      But does smartctl --scan also output other devices on its list? You need to make sure of this.

      Comment

      • INFinite
        Junior Member
        • Feb 2025
        • 10

        #108
        Originally posted by PavelZ
        @INFinite Please note that the template is designed to exclude some devices that do not support queries. There are two macros {$SMART.DISK.NAME.MATCHES} and {$SMART.DISK.NAME.NOT_MATCHES}
        If you exclude /dev/sda, the errors will disappear.

        But does smartctl --scan also output other devices on its list? You need to make sure of this.
        [Screenshot: image.png]
        I admit that I could have done something wrong, so I'll give you a screenshot. If everything is correct, then the result has not changed.

        C:\Windows\system32>smartctl --scan
        /dev/sda -d scsi # /dev/sda, SCSI device
        /dev/sdb -d ata # /dev/sdb, ATA device
        /dev/sdc -d sat # /dev/sdc [SAT], ATA device
        /dev/csmi0,0 -d ata # /dev/csmi0,0, ATA device
        /dev/csmi0,1 -d ata # /dev/csmi0,1, ATA device
        /dev/nvme0 -d nvme # /dev/nvme0, NVMe device
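To compare what smartctl itself reports with what the agent discovers, the scan output above can be split into device path and device type. This is only a rough sketch: the regular expression and the helper names parse_scan and query_command are assumptions made for illustration and are not part of the Zabbix plugin, while -a, -j and -d are regular smartctl options.

```python
import re

# Matches lines such as: /dev/csmi0,1 -d ata # /dev/csmi0,1, ATA device
SCAN_LINE = re.compile(r"^(?P<path>\S+)\s+-d\s+(?P<type>\S+)\s+#")


def parse_scan(text):
    """Return (device_path, device_type) tuples found in `smartctl --scan` text."""
    devices = []
    for line in text.splitlines():
        match = SCAN_LINE.match(line.strip())
        if match:
            devices.append((match.group("path"), match.group("type")))
    return devices


def query_command(path, dev_type):
    """Build a per-device query command (hypothetical helper, for manual checks)."""
    return ["smartctl", "-a", "-j", "-d", dev_type, path]


# Scan output copied from the post above.
scan_output = """\
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d ata # /dev/sdb, ATA device
/dev/sdc -d sat # /dev/sdc [SAT], ATA device
/dev/csmi0,0 -d ata # /dev/csmi0,0, ATA device
/dev/csmi0,1 -d ata # /dev/csmi0,1, ATA device
/dev/nvme0 -d nvme # /dev/nvme0, NVMe device
"""

devices = parse_scan(scan_output)
for path, dev_type in devices:
    print(" ".join(query_command(path, dev_type)))
```

Running each printed command by hand should show which device makes smartctl fail, independently of the agent.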

        This is a regular Intel software RAID. Agent version 6.4 sees the disks behind this RAID perfectly.
        Here is the output from Agent 6.4:
        Code:
        C:\Windows\system32>"C:\Program Files\Zabbix Agent 2\zabbix_agent2.exe" -c "C:\Program Files\Zabbix Agent 2\zabbix_agent2.conf" -t smart.disk.discovery
        smart.disk.discovery                          [s|[{"{#NAME}":"csmi0,0","{#DISKTYPE}":"ssd","{#MODEL}":"KINGSTON SV300S37A60G","{#SN}":"50026B7246018A2F","{#PATH}":"/dev/csmi0,0","{#RAIDTYPE}":"","{#ATTRIBUTES}":"Raw_Read_Error_Rate Retired_Block_Count Power_On_Hours_and_Msec Power_Cycle_Count Program_Fail_Count Erase_Fail_Count Unexpect_Power_Loss_Ct Wear_Range_Delta Program_Fail_Count Erase_Fail_Count Reported_Uncorrect Airflow_Temperature_Cel Temperature_Celsius ECC_Uncorr_Error_Count Reallocated_Event_Count Unc_Soft_Read_Err_Rate Soft_ECC_Correct_Rate Life_Curve_Status SSD_Life_Left SandForce_Internal SandForce_Internal Lifetime_Writes_GiB Lifetime_Reads_GiB"},
        {"{#NAME}":"csmi0,1","{#DISKTYPE}":"ssd","{#MODEL}":"KINGSTON SV300S37A60G","{#SN}":"50026B7246018A2C","{#PATH}":"/dev/csmi0,1","{#RAIDTYPE}":"","{#ATTRIBUTES}":"Raw_Read_Error_Rate Retired_Block_Count Power_On_Hours_and_Msec Power_Cycle_Count Program_Fail_Count Erase_Fail_Count Unexpect_Power_Loss_Ct Wear_Range_Delta Program_Fail_Count Erase_Fail_Count Reported_Uncorrect Airflow_Temperature_Cel Temperature_Celsius ECC_Uncorr_Error_Count Reallocated_Event_Count Unc_Soft_Read_Err_Rate Soft_ECC_Correct_Rate Life_Curve_Status SSD_Life_Left SandForce_Internal SandForce_Internal Lifetime_Writes_GiB Lifetime_Reads_GiB"},
        {"{#NAME}":"nvme0","{#DISKTYPE}":"nvme","{#MODEL}":"","{#SN}":"","{#PATH}":"/dev/nvme0","{#RAIDTYPE}":"","{#ATTRIBUTES}":""},
        {"{#NAME}":"sdc","{#DISKTYPE}":"hdd","{#MODEL}":"ST1000LM024 HN-M101MBB","{#SN}":"S30CJ9CFC09569","{#PATH}":"/dev/sdc","{#RAIDTYPE}":"","{#ATTRIBUTES}":"Raw_Read_Error_Rate Throughput_Performance Spin_Up_Time Start_Stop_Count Reallocated_Sector_Ct Seek_Error_Rate Seek_Time_Performance Power_On_Hours Spin_Retry_Count Calibration_Retry_Count Power_Cycle_Count G-Sense_Error_Rate Power-Off_Retract_Count Temperature_Celsius Hardware_ECC_Recovered Reallocated_Event_Count Current_Pending_Sector Offline_Uncorrectable UDMA_CRC_Error_Count Multi_Zone_Error_Rate Load_Retry_Count Load_Cycle_Count"}]]
        Today, once again trying to find a solution on the Internet, I came across bug report ZBX-26098 with the same problem.

        Comment

        • PavelZ
          Senior Member
          • Dec 2024
          • 162

          #109
          Today, once again trying to find a solution on the Internet, I came across bug report ZBX-26098 with the same problem.
          https://support.zabbix.com/projects/...sues/ZBX-26098
          A very similar problem. Similar configuration with Intel software RAID 1.

          Well, what's the problem with a certain number of unsupported items? Other disks are being queried.

          Comment

          • INFinite
            Junior Member
            • Feb 2025
            • 10

            #110
            Originally posted by PavelZ
            Well, what's the problem with a certain number of unsupported items? Other disks are being queried.
            I tried excluding all disks, but I still get an error:
            Code:
            Cannot fetch data.: got error executing worker pool: smartctl returned error: unknown error from smartctl.

            Comment

            • PavelZ
              Senior Member
              • Dec 2024
              • 162

              #111
              This filter works only during autodiscovery. Polling of /dev/sda by items that already exist will still not work.
              You need to delete the problematic items and run discovery again.

              Comment

              • INFinite
                Junior Member
                • Feb 2025
                • 10

                #112
                Originally posted by PavelZ
                This filter works only during autodiscovery. Polling of /dev/sda by items that already exist will still not work.
                You need to delete the problematic items and run discovery again.
                I unlinked the template from the host with data deletion, linked it again, configured the filter, and started discovery. The error is the same.
                If I got something wrong, could you describe in more detail what needs to be done?

                Comment

                • PavelZ
                  Senior Member
                  • Dec 2024
                  • 162

                  #113
                  As usual, you need to take things apart and check them piece by piece:

                  Autodiscovery should return the list of device names as JSON.
                  Filters should filter out /dev/sda and other problematic devices.
                  Master items should be created only for those disks that smartctl can query.
                  Master items should be queried without errors.
                  Dependent items should be calculated from the JSON.
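The first two checks in the list above can be tried offline with a small script. This is a sketch under assumptions: it supposes that the two macros are regular expressions applied to {#NAME}, the default values shown are assumed rather than confirmed here, the sample rows are shortened to the {#NAME} field only, and Python's re module stands in for the regex engine Zabbix uses. The function name filter_devices is made up for this example.

```python
import json
import re


def filter_devices(discovery_json, matches=r"^.*$", not_matches=r"CHANGE_IF_NEEDED"):
    """Keep names that match `matches` and do not match `not_matches`.

    The parameters mimic {$SMART.DISK.NAME.MATCHES} and
    {$SMART.DISK.NAME.NOT_MATCHES}; the defaults are assumptions.
    """
    keep = re.compile(matches)
    drop = re.compile(not_matches)
    names = [row["{#NAME}"] for row in json.loads(discovery_json)]
    return [name for name in names if keep.search(name) and not drop.search(name)]


# Illustrative discovery rows; note that {#NAME} carries no /dev/ prefix
# in the agent output shown earlier in the thread.
sample = json.dumps([
    {"{#NAME}": "sda"},
    {"{#NAME}": "sdb"},
    {"{#NAME}": "sdc"},
    {"{#NAME}": "csmi0,0"},
    {"{#NAME}": "csmi0,1"},
    {"{#NAME}": "nvme0"},
])

remaining = filter_devices(sample, not_matches=r"^sda$")
print(remaining)
```

If the list printed here still contains the RAID volume, the filter expression is the first thing to revisit. If it does not, the next suspects are the discovery call itself and the items left over from an earlier discovery.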

                  Comment

                  • AlexPRN
                    Junior Member
                    • Mar 2025
                    • 1

                    #114
                    Hello.
                    Current Agent versions on Windows do not work with SMART. When I run
                    Code:
                    C:\Program Files\Zabbix Agent 2>zabbix_agent2.exe -c zabbix_agent2.conf -t smart.disk.discovery
                    on 7.2.x versions, it returns:
                    Code:
                    smart.disk.discovery                          [m|ZBX_NOTSUPPORTED] [Cannot fetch data.: got error executing worker pool: failed to execute smartctl: "{\r\n  \"json_format_version\": [\r\n  ................... : exit status 32.]
                    But on the 6.2.9 Agent everything works fine. Does anyone know the reason?
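For reading the "exit status 32" in the message above: according to the smartctl manual page, the exit status is a bitmask, so a non-zero value can be returned even when smartctl printed a complete report. The sketch below decodes it. The bit descriptions are paraphrased from the manual page, the function name decode_exit_status is made up for this example, and how the agent reacts to each bit is not something this snippet claims to know.

```python
# Bit meanings paraphrased from the EXIT STATUS section of the smartctl manual page.
EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed, or device is in a low-power mode",
    2: "a SMART or other ATA command failed, or a checksum error was found",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes found at or below threshold",
    5: "SMART status is OK, but some attributes were at or below threshold in the past",
    6: "device error log contains records of errors",
    7: "device self-test log contains records of errors",
}


def decode_exit_status(status):
    """Return the descriptions of all bits set in a smartctl exit status."""
    return [text for bit, text in EXIT_BITS.items() if status & (1 << bit)]


for line in decode_exit_status(32):
    print(line)
```

With status 32 only bit 5 is set, which by the manual page describes the disk's attribute history rather than a failure to run the command.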

                    Comment

                    • PavelZ
                      Senior Member
                      • Dec 2024
                      • 162

                      #115
                      AlexPRN
                      We discussed similar issues a bit above. Could you please clarify which specific disks you have? Are you using RAID? What is the operating system version?
                      Did you also update the template on the Zabbix server when you updated the agent?

                      Comment

                      • teiteoer
                        Junior Member
                        • Mar 2025
                        • 1

                        #116
                        Hello everyone,

                        I'm trying to use smart.disk.discovery from the SMART by Zabbix agent 2 template, but I'm getting the following error:

                        cannot parse response: cannot find pair with name "value"
                        My system has 2x NVMe drives and 1x SATA SSD.

                        Here’s what I get when running smartctl -a -j:
                        • For the SATA SSD (/dev/sda):
                        Code:
                        "smart_status": {
                          "passed": true
                        }
                        • For /dev/nvme0:

                        Code:
                        "smart_status": {
                          "passed": true,
                          "nvme": {
                            "value": 0
                          }
                        }
                        • For /dev/nvme1:

                        Code:
                        "smart_status": {
                          "passed": false,
                          "nvme": {
                            "value": 4,
                            "spare_below_threshold": false,
                            "temperature_above_or_below_threshold": false,
                            "reliability_degraded": true,
                            "media_read_only": false,
                            "volatile_memory_backup_failed": false,
                            "persistent_memory_region_unreliable": false,
                            "other": 0
                          }
                        }

                        It seems like Zabbix expects a "value" field directly under "smart_status", but NVMe devices return a nested structure instead.
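To illustrate the difference between the two shapes, here is a minimal sketch of a reader that tolerates both. It is not the code of the Zabbix plugin or template; the function name read_smart_status is hypothetical, and the idea that "value" may sit directly under "smart_status" is only the assumption made in this post.

```python
import json


def read_smart_status(report):
    """Return (passed, value); either element is None when the field is absent."""
    status = report.get("smart_status") or {}
    passed = status.get("passed")
    # Look for "value" directly under smart_status first (assumed shape),
    # then fall back to the nested "nvme" object shown in the smartctl output.
    value = status.get("value")
    if value is None:
        value = (status.get("nvme") or {}).get("value")
    return passed, value


# Fragments taken from the outputs quoted in this post.
sata = json.loads('{"smart_status": {"passed": true}}')
nvme1 = json.loads(
    '{"smart_status": {"passed": false, "nvme": {"value": 4, "reliability_degraded": true}}}'
)

print(read_smart_status(sata))   # (True, None)
print(read_smart_status(nvme1))  # (False, 4)
```

The SATA SSD has no "value" at all, so whatever consumes this data has to accept a missing field as well as a nested one.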

                        Has anyone encountered this issue or found a workaround?
                        Is this a known limitation, or is there something I can adjust in the template or item configuration to handle NVMe responses properly?

                        Any guidance would be appreciated!

                        Thanks in advance!

                        Comment

                        • PavelZ
                          Senior Member
                          • Dec 2024
                          • 162

                          #117
                          Don't you want to make sure the Zabbix agent version matches the template version first? I don't see any version information from you.
                          It would be stupid to do complex debugging without checking something simple first.

                          Comment

                          • PavelZ
                            Senior Member
                            • Dec 2024
                            • 162

                            #118
                            "smart_status": {
                            "passed": false,
                            You noticed that there is a failure here, right?
                            This is certainly not encouraging. It is possible that Zabbix does not have enough different broken hardware to test the agent's behavior in such situations.

                            I suggest saving the agent log and the full output of all commands so that the situation can be simulated after the disk is replaced.
                            Last edited by PavelZ; 26-03-2025, 22:17.

                            Comment

                            • anmg
                              Junior Member
                              • Sep 2021
                              • 14

                              #119
                              Hello!

                              I cannot gather data from drives behind an Adaptec RAID controller.
                              The scan command does not provide useful information:
                              Code:
                              # sudo -u zabbix smartctl --scan
                              /dev/sda -d scsi # /dev/sda, SCSI device
                              /dev/sdb -d scsi # /dev/sdb, SCSI device
                              Check the RAID controller type:
                              Code:
                              # lspci | grep RAID
                              02:00.0 RAID bus controller: Adaptec Series 6 - 6G SAS/PCIe 2 (rev 01)
                              Use lsscsi to determine the actual drive names. Below are two separate logical volumes, RAID 1 each.
                              Code:
                              # lsscsi -g
                              [0:0:0:0] disk Adaptec spegel V1.0 /dev/sda /dev/sg0
                              [0:0:1:0] disk Adaptec LogicalDrv 1 V1.0 /dev/sdb /dev/sg1
                              [0:1:0:0] disk ST1000NM0011 SN03 - /dev/sg2
                              [0:1:1:0] disk ST1000DM003-9YN1 CC46 - /dev/sg3
                              [0:1:2:0] disk WDC WD8004FRYZ-01VAE 01.0 - /dev/sg4
                              [0:1:3:0] disk WDC WD8004FRYZ-01VAE 01.0 - /dev/sg5
                              [0:3:0:0] enclosu ADAPTEC Virtual SGPIO - /dev/sg6

                              dmesg output:
                              Code:
                              # dmesg | grep scsi
                              [    0.668768] scsi host1: ahci
                              [    0.669251] scsi host2: ahci
                              [    0.672367] scsi host3: ahci
                              [    0.673256] scsi host4: ahci
                              [    0.673573] scsi host5: ahci
                              [    0.673970] scsi host6: ahci
                              [    1.138269] scsi host0: aacraid
                              [    1.139214] scsi 0:0:0:0: Direct-Access     Adaptec  spegel           V1.0 PQ: 0 ANSI: 2
                              [    1.140236] scsi 0:0:1:0: Direct-Access     Adaptec  LogicalDrv 1     V1.0 PQ: 0 ANSI: 2
                              [    1.153685] scsi 0:1:0:0: Direct-Access              ST1000NM0011     SN03 PQ: 1 ANSI: 5
                              [    1.156229] scsi 0:1:1:0: Direct-Access              ST1000DM003-9YN1 CC46 PQ: 1 ANSI: 5
                              [    1.158421] scsi 0:1:2:0: Direct-Access     WDC      WD8004FRYZ-01VAE 01.0 PQ: 1 ANSI: 5
                              [    1.160658] scsi 0:1:3:0: Direct-Access     WDC      WD8004FRYZ-01VAE 01.0 PQ: 1 ANSI: 5
                              [    1.191221] scsi 0:3:0:0: Enclosure         ADAPTEC  Virtual SGPIO         PQ: 0 ANSI: 5
                              [    1.227400] sd 0:0:0:0: Attached scsi generic sg0 type 0
                              [    1.227805] sd 0:0:1:0: Attached scsi generic sg1 type 0
                              [    1.228399] scsi 0:1:0:0: Attached scsi generic sg2 type 0
                              [    1.230211] scsi 0:1:1:0: Attached scsi generic sg3 type 0
                              [    1.230954] scsi 0:1:2:0: Attached scsi generic sg4 type 0
                              [    1.232649] scsi 0:1:3:0: Attached scsi generic sg5 type 0
                              [    1.232951] scsi 0:3:0:0: Attached scsi generic sg6 type 13
                              Finally, I can get actual information from smartctl using the device names /dev/sg2 through /dev/sg5:
                              Code:
                              # smartctl -d sat --all /dev/sg2


                              lsscsi has to be installed; it was not preinstalled on Ubuntu 24.04.
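The manual steps above could be scripted roughly as follows. This is a sketch under assumptions: it relies on `lsscsi -g` printing the block device and the generic device as the last two columns, as in the output above, and it reuses the `-d sat` type only because that worked on this controller. The function name passthrough_disks is invented for the example.

```python
def passthrough_disks(lsscsi_output):
    """Return /dev/sgN paths of disks that have no block device (shown as '-')."""
    found = []
    for line in lsscsi_output.splitlines():
        fields = line.split()
        # Vendor and model columns may contain spaces, so rely only on the
        # first two fields and the last two (block device, generic device).
        if len(fields) < 4 or fields[1] != "disk":
            continue
        block_dev, sg_dev = fields[-2], fields[-1]
        if block_dev == "-" and sg_dev.startswith("/dev/sg"):
            found.append(sg_dev)
    return found


# Output copied from the lsscsi -g listing above.
lsscsi_output = """\
[0:0:0:0] disk Adaptec spegel V1.0 /dev/sda /dev/sg0
[0:0:1:0] disk Adaptec LogicalDrv 1 V1.0 /dev/sdb /dev/sg1
[0:1:0:0] disk ST1000NM0011 SN03 - /dev/sg2
[0:1:1:0] disk ST1000DM003-9YN1 CC46 - /dev/sg3
[0:1:2:0] disk WDC WD8004FRYZ-01VAE 01.0 - /dev/sg4
[0:1:3:0] disk WDC WD8004FRYZ-01VAE 01.0 - /dev/sg5
[0:3:0:0] enclosu ADAPTEC Virtual SGPIO - /dev/sg6
"""

physical = passthrough_disks(lsscsi_output)
for dev in physical:
    print("smartctl -d sat --all " + dev)
```

The logical volumes and the enclosure entry are skipped, which leaves only the four physical drives behind the controller.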

                              So maybe more RAID controller discovery logic could be added to the SMART plugin?

                              Also, any suggestions on how to use filters in Zabbix templates? Will they help if smartctl --scan does not return valid disk names?

                              Comment

                              • PavelZ
                                Senior Member
                                • Dec 2024
                                • 162

                                #120
                                scan command does not provide useful information
                                # sudo -u zabbix smartctl --scan
                                I thought that Zabbix agent 2 runs smartctl --scan -d several times, once for each driver it knows. Isn't that so?

                                Let's update all the programs, enable agent logging, and see what happens.

                                Comment
