Discussion thread for official Zabbix SMART Disk monitoring


  • INFinite
    replied
    Originally posted by PavelZ
    @INFinite Please note that the template is designed to let you exclude devices that do not support these queries. There are two macros for this: {$SMART.DISK.NAME.MATCHES} and {$SMART.DISK.NAME.NOT_MATCHES}.
    If you exclude /dev/sda, the errors will disappear.

    But does smartctl --scan list any other devices? You need to make sure of this.
    [Screenshot: image.png]
    I admit I may have done something wrong, so here is a screenshot. If the settings are correct, then the result has not changed.

    C:\Windows\system32>smartctl --scan
    /dev/sda -d scsi # /dev/sda, SCSI device
    /dev/sdb -d ata # /dev/sdb, ATA device
    /dev/sdc -d sat # /dev/sdc [SAT], ATA device
    /dev/csmi0,0 -d ata # /dev/csmi0,0, ATA device
    /dev/csmi0,1 -d ata # /dev/csmi0,1, ATA device
    /dev/nvme0 -d nvme # /dev/nvme0, NVMe device
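To check exactly which devices `smartctl --scan` reports, the listing can be split into path, `-d` type and comment. A minimal Python sketch, assuming every line has the `<path> -d <type> # <comment>` shape seen above; lines in any other shape are simply skipped. `ScanEntry` and `parse_scan` are hypothetical names, not part of Zabbix or smartmontools.

```python
from typing import NamedTuple


class ScanEntry(NamedTuple):
    path: str         # device path, e.g. "/dev/csmi0,0"
    dev_type: str     # value passed to smartctl -d, e.g. "ata"
    description: str  # free text after '#'


def parse_scan(output: str) -> list[ScanEntry]:
    """Parse `smartctl --scan` lines of the form '<path> -d <type> # <comment>'."""
    entries = []
    for raw in output.splitlines():
        command, _, comment = raw.partition("#")
        tokens = command.split()
        # Keep only lines with exactly: path, "-d", type.
        if len(tokens) == 3 and tokens[1] == "-d":
            entries.append(ScanEntry(tokens[0], tokens[2], comment.strip()))
    return entries


if __name__ == "__main__":
    listing = """/dev/sda -d scsi # /dev/sda, SCSI device
/dev/csmi0,0 -d ata # /dev/csmi0,0, ATA device"""
    for entry in parse_scan(listing):
        print(entry.path, entry.dev_type)
```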

    It's a regular Intel software RAID. Agent version 6.4 sees the disks behind this RAID without any problem.
    Here is the output from agent 6.4:
    Code:
    C:\Windows\system32>"C:\Program Files\Zabbix Agent 2\zabbix_agent2.exe" -c "C:\Program Files\Zabbix Agent 2\zabbix_agent2.conf" -t smart.disk.discovery
    smart.disk.discovery                          [s|[{"{#NAME}":"csmi0,0","{#DISKTYPE}":"ssd","{#MODEL}":"KINGSTON SV300S37A60G","{#SN}":"50026B7246018A2F","{#PATH}":"/dev/csmi0,0","{#RAIDTYPE}":"","{#ATTRIBUTES}":"Raw_Read_Error_Rate Retired_Block_Count Power_On_Hours_and_Msec Power_Cycle_Count Program_Fail_Count Erase_Fail_Count Unexpect_Power_Loss_Ct Wear_Range_Delta Program_Fail_Count Erase_Fail_Count Reported_Uncorrect Airflow_Temperature_Cel Temperature_Celsius ECC_Uncorr_Error_Count Reallocated_Event_Count Unc_Soft_Read_Err_Rate Soft_ECC_Correct_Rate Life_Curve_Status SSD_Life_Left SandForce_Internal SandForce_Internal Lifetime_Writes_GiB Lifetime_Reads_GiB"},
    {"{#NAME}":"csmi0,1","{#DISKTYPE}":"ssd","{#MODEL}":"KINGSTON SV300S37A60G","{#SN}":"50026B7246018A2C","{#PATH}":"/dev/csmi0,1","{#RAIDTYPE}":"","{#ATTRIBUTES}":"Raw_Read_Error_Rate Retired_Block_Count Power_On_Hours_and_Msec Power_Cycle_Count Program_Fail_Count Erase_Fail_Count Unexpect_Power_Loss_Ct Wear_Range_Delta Program_Fail_Count Erase_Fail_Count Reported_Uncorrect Airflow_Temperature_Cel Temperature_Celsius ECC_Uncorr_Error_Count Reallocated_Event_Count Unc_Soft_Read_Err_Rate Soft_ECC_Correct_Rate Life_Curve_Status SSD_Life_Left SandForce_Internal SandForce_Internal Lifetime_Writes_GiB Lifetime_Reads_GiB"},
    {"{#NAME}":"nvme0","{#DISKTYPE}":"nvme","{#MODEL}":"","{#SN}":"","{#PATH}":"/dev/nvme0","{#RAIDTYPE}":"","{#ATTRIBUTES}":""},
    {"{#NAME}":"sdc","{#DISKTYPE}":"hdd","{#MODEL}":"ST1000LM024 HN-M101MBB","{#SN}":"S30CJ9CFC09569","{#PATH}":"/dev/sdc","{#RAIDTYPE}":"","{#ATTRIBUTES}":"Raw_Read_Error_Rate Throughput_Performance Spin_Up_Time Start_Stop_Count Reallocated_Sector_Ct Seek_Error_Rate Seek_Time_Performance Power_On_Hours Spin_Retry_Count Calibration_Retry_Count Power_Cycle_Count G-Sense_Error_Rate Power-Off_Retract_Count Temperature_Celsius Hardware_ECC_Recovered Reallocated_Event_Count Current_Pending_Sector Offline_Uncorrectable UDMA_CRC_Error_Count Multi_Zone_Error_Rate Load_Retry_Count Load_Cycle_Count"}]]
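The discovery result is easier to compare across agent versions once it is parsed. A minimal Python sketch, assuming the `-t` test output wraps the JSON value as `[s|...]` exactly as in the line above; `parse_test_output` and `summarize` are hypothetical helper names.

```python
import json


def parse_test_output(line: str) -> list[dict]:
    """Extract the LLD JSON array from a 'key   [s|<json>]' test output line."""
    text = line.rstrip()
    start = text.find("[s|")
    if start == -1 or not text.endswith("]"):
        raise ValueError("unexpected test output format")
    # Drop the '[s|' prefix and the single closing ']' of the wrapper.
    return json.loads(text[start + 3 : -1])


def summarize(disks: list[dict]) -> list[tuple[str, str, str]]:
    """Return (name, path, disk type) for every discovered disk."""
    return [(d["{#NAME}"], d["{#PATH}"], d["{#DISKTYPE}"]) for d in disks]


if __name__ == "__main__":
    sample = ('smart.disk.discovery    [s|[{"{#NAME}":"nvme0","{#DISKTYPE}":"nvme",'
              '"{#PATH}":"/dev/nvme0"}]]')
    print(summarize(parse_test_output(sample)))
    # prints: [('nvme0', '/dev/nvme0', 'nvme')]
```

In the 6.4 output above, this would list csmi0,0 and csmi0,1 (the disks behind the RAID) rather than the RAID volume itself.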
    Today, while searching the Internet for a solution once again, I came across bug report ZBX-26098, which describes the same problem.



  • PavelZ
    replied
    @INFinite Please note that the template is designed to let you exclude devices that do not support these queries. There are two macros for this: {$SMART.DISK.NAME.MATCHES} and {$SMART.DISK.NAME.NOT_MATCHES}.
    If you exclude /dev/sda, the errors will disappear.

    But does smartctl --scan list any other devices? You need to make sure of this.
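To illustrate how the two macros interact, here is a minimal Python sketch of the include/exclude check. It assumes the discovery rule keeps a disk when {#NAME} matches {$SMART.DISK.NAME.MATCHES} and does not match {$SMART.DISK.NAME.NOT_MATCHES}, with unanchored (search) semantics. {#NAME} holds values such as `sda` or `csmi0,0` without the `/dev/` prefix (as in the agent 6.4 discovery output posted in this thread), so the exclusion pattern `^sda$` is an assumed example value. Python's `re` only approximates the PCRE engine Zabbix uses, which is fine for simple patterns like these.

```python
import re


def lld_filter(names: list[str], matches: str, not_matches: str) -> list[str]:
    """Keep {#NAME} values that match `matches` and do not match `not_matches`.

    Uses re.search, i.e. an unanchored match: add ^ and $ to match whole names.
    """
    keep = re.compile(matches)
    drop = re.compile(not_matches)
    return [name for name in names if keep.search(name) and not drop.search(name)]


if __name__ == "__main__":
    discovered = ["sda", "sdb", "sdc", "csmi0,0", "csmi0,1", "nvme0"]
    # Hypothetical macro values: include everything, exclude the RAID volume sda.
    print(lld_filter(discovered, r"^.*$", r"^sda$"))
    # prints: ['sdb', 'sdc', 'csmi0,0', 'csmi0,1', 'nvme0']
```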



  • PavelZ
    replied
    I noticed another obvious problem:
    the disk you show, /dev/sda, is an Intel "Raid 1 Volume", i.e. it consists of two physical disks.
    smartctl returns a single set of attributes per device; it obviously cannot combine two sets of data into one.
    So the device should not be queried as /dev/sda.
    On Windows, the disks are usually polled separately, with the device type specified via -d.
    There is a specific option or naming convention for this.

    "scsi_vendor": "Intel", "scsi_product": "Raid 1 Volume"
    Try to find the correct command. It will be something like:
    smartctl -a /dev/csmi0,1

    I don't have this exact controller model, so it is difficult for me to give a ready-made recipe, but I hope this information helps.
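A rough way to spot such devices automatically is to look at the `smartctl -a -j` JSON itself. The Python sketch below is a heuristic, not part of the template: it treats a device as not worth polling when `scsi_product` contains "Raid" (an assumption based only on the Intel output in this thread) or when `smart_support.available` is false. A missing `smart_support` key is assumed to mean "available". `poll_decision` is a hypothetical name.

```python
import json


def poll_decision(smartctl_json: str) -> tuple[bool, str]:
    """Heuristic: should this device be polled, judging by `smartctl -a -j` output?"""
    data = json.loads(smartctl_json)
    product = data.get("scsi_product", "")
    # Assumption: Intel RST volumes report a product name such as "Raid 1 Volume".
    if "raid" in product.lower():
        return False, f"RAID volume ({product}); query the member disks instead"
    # A missing smart_support key is treated as "available".
    if not data.get("smart_support", {}).get("available", True):
        return False, "SMART is not available on this device"
    return True, "ok"


if __name__ == "__main__":
    # Fields copied from the /dev/sda output quoted in this thread.
    sda = ('{"scsi_vendor": "Intel", "scsi_product": "Raid 1 Volume", '
           '"smart_support": {"available": false}}')
    print(poll_decision(sda))
```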



  • INFinite
    replied
    Originally posted by PavelZ
    Technically, all historical versions are available, but they are not easy to find if you're not used to working with Git.
    Several branches complicate things further: you need to switch to the right branch and download the file at this path:
    https://github.com/zabbix/zabbix/commits/master/templates/server/smart_agent2/template_module_smart_agent2.yaml


    I installed template version 7.0 and agent version 7.0.7. The result is the same:
    Code:
    Cannot fetch data.: got error executing worker pool: smartctl returned error: unknown error from smartctl.



  • PavelZ
    replied
    Technically, all historical versions are available, but they are not easy to find if you're not used to working with Git.
    Several branches complicate things further: you need to switch to the right branch and download the file at this path:
    https://github.com/zabbix/zabbix/commits/master/templates/server/smart_agent2/template_module_smart_agent2.yaml
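If you prefer to skip the Git history UI, a file from a specific branch can be fetched directly through GitHub's raw.githubusercontent.com URLs. A minimal Python sketch; the branch name `release/6.4` and the assumption that the template lives at the same path on that branch are not confirmed here, so check the repository's branch list first. `raw_template_url` is a hypothetical helper.

```python
TEMPLATE_PATH = "templates/server/smart_agent2/template_module_smart_agent2.yaml"


def raw_template_url(branch: str, path: str = TEMPLATE_PATH) -> str:
    """Build a raw.githubusercontent.com URL for `path` on `branch` of zabbix/zabbix."""
    return f"https://raw.githubusercontent.com/zabbix/zabbix/{branch}/{path}"


if __name__ == "__main__":
    # "release/6.4" is an assumed branch name; verify it in the repository first.
    print(raw_template_url("release/6.4"))
```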



    The update interval was 10s.
    In the standard template, however, this interval is 5 minutes. There is no telling what other errors such frequent polling may cause.



  • INFinite
    replied
    Originally posted by PavelZ
    Wait, you do understand that templates also need to be updated? The template version has to match as well.
    I'm not ready to debug this problem remotely, but there are some obvious considerations.
    Try to get to a state where both the template and the agent are version 7.0.

    Also, I suggest focusing on smartctl first and making the error go away.
    My template version is 7.2-1.
    Can you tell me where I can find older versions of the template?

    When I replaced the files with those from the 6.4 archive, everything worked immediately.

    exit status 4.
    It occurs rarely; it just happened to show up in this example. I attribute it to polling too frequently: the update interval was 10s.
    Last edited by INFinite; 04-03-2025, 13:22.



  • PavelZ
    replied
    Wait, you do understand that templates also need to be updated? The template version has to match as well.
    I'm not ready to debug this problem remotely, but there are some obvious considerations.
    Try to get to a state where both the template and the agent are version 7.0.

    Also, I suggest focusing on smartctl first and making this error go away:
    exit status 4.
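smartctl's exit status is a bitmask (see the EXIT STATUS section of the smartctl man page), so "exit status 4" means only bit 2 is set: some SMART or other ATA command to the disk failed, or a SMART data structure had a checksum error. A minimal Python decoder, with the bit descriptions paraphrased from the man page; `decode_exit_status` is a hypothetical name.

```python
# Bit meanings paraphrased from the smartctl(8) man page, EXIT STATUS section.
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed, no IDENTIFY DEVICE data, or device in low-power mode",
    2: "a SMART or other ATA command failed, or a SMART data checksum error",
    3: "SMART status check returned 'DISK FAILING'",
    4: "prefail attributes at or below threshold",
    5: "'DISK OK', but some attributes were at or below threshold in the past",
    6: "device error log contains records of errors",
    7: "device self-test log contains records of errors",
}


def decode_exit_status(status: int) -> list[str]:
    """Return the meaning of every bit set in a smartctl exit status."""
    return [text for bit, text in SMARTCTL_EXIT_BITS.items() if status & (1 << bit)]


if __name__ == "__main__":
    for reason in decode_exit_status(4):
        print(reason)
    # prints: a SMART or other ATA command failed, or a SMART data checksum error
```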
    Last edited by PavelZ; 04-03-2025, 12:00.



  • INFinite
    commented on 's reply
    Thanks for the hint about which direction to look in.

  • INFinite
    replied
    Originally posted by INFinite
    Hello.
    I ran into this problem when setting up SMART monitoring on Windows (I tried it on several versions of Windows):
    Code:
    Cannot fetch data.: got error executing worker pool: failed to execute smartctl: "{\r\n \"json_format_version\": [\r\n 1,\r\n 0\r\n ],\r\n \"smartctl\":
    {\r\n \"version\": [\r\n 7,\r\n 4\r\n ],\r\n \"pre_release\": false,\r\n \"svn_revision\": \"5530\",\r\n \"platform_info\": \"x86_64-w64-mingw32-2012r2\",
    \r\n \"build_info\": \"(sf-7.4-1)\",\r\n \"argv\": [\r\n \"smartctl\",\r\n \"-a\",\r\n \"/dev/sda\",\r\n \"-j\"\r\n ],\r\n \"exit_status\": 4\r\n },\r\n \"local_time\":
    {\r\n \"time_t\": 1740725274,\r\n \"asctime\": \"Fri Feb 28 09:47:54 2025 RTZ\"\r\n },\r\n \"device\": {\r\n \"name\": \"/dev/sda\",\r\n \"info_name\":
    \"/dev/sda\",\r\n \"type\": \"scsi\",\r\n \"protocol\": \"SCSI\"\r\n },\r\n \"scsi_vendor\": \"Intel\",\r\n \"scsi_product\": \"Raid 1 Volume\",\r\n
    \"scsi_model_name\": \"Intel Raid 1 Volume\",\r\n \"scsi_revision\": \"1.0.\",\r\n \"scsi_version\": \"SPC-3\",\r\n \"user_capacity\": {\r\n \"blocks\": 111357952,
    \r\n \"bytes\": 57015271424\r\n },\r\n \"logical_block_size\": 512,\r\n \"scsi_lb_provisioning\": {\r\n \"name\": \"thin provisioned\",\r\n \"value\": 2,\r\n
    \"management_enabled\": {\r\n \"name\": \"LBPME\",\r\n \"value\": -1\r\n },\r\n \"read_zeros\": {\r\n \"name\": \"LBPRZ\",\r\n \"value\": 0\r\n }\r\n },\r\n
    \"rotation_rate\": 0,\r\n \"logical_unit_id\": \"0x61fa116d01000000001517ffff0aeb84\",\r\n \"device_type\": {\r\n \"scsi_terminology\":
    \"Peripheral Device Type [PDT]\",\r\n \"scsi_value\": 0,\r\n \"name\": \"disk\"\r\n },\r\n \"smart_support\": {\r\n \"available\": false\r\n },\r\n
    \"temperature\": {\r\n \"current\": 0,\r\n \"drive_trip\": 0\r\n },\r\n \"seagate_farm_log\": {\r\n \"supported\": false\r\n }\r\n}\r": exit status 4.
    The data arrives, but not in the correct format.

    Ubuntu 24
    Zabbix server 7.2.4
    Zabbix agent 2 7.2.3
    smartmontools 7.4
    Template: SMART by Zabbix agent 2

    Please help.
    Through experimentation, I found that the "SMART by Zabbix agent 2" template works with agent version 6.4.21 and possibly earlier, but not with versions 7.0.x or 7.2.x.



  • INFinite
    replied
    Originally posted by PavelZ
    INFinite,
    There are some changes in the new versions of the agent.
    Would you like to try slightly older ones?

    I suggest version 7.0.7.
    After switching to it, I get the following error:

    Code:
    Cannot fetch data.: got error executing worker pool: smartctl returned error: unknown error from smartctl.



  • PavelZ
    replied
    INFinite,
    There are some changes in the new versions of the agent.
    Would you like to try slightly older ones?

    I suggest version 7.0.7.



  • PavelZ
    replied
    Originally posted by evert

    Does the method you describe reduce security? In what way?
    Any action you take affects security, so I will not take responsibility for it.

    I am simply pointing out that this alternative is described right in the README as the recommended one.



  • INFinite
    replied
    Hello.
    I ran into this problem when setting up SMART monitoring on Windows (I tried it on several versions of Windows):
    Code:
    Cannot fetch data.: got error executing worker pool: failed to execute smartctl: "{\r\n \"json_format_version\": [\r\n 1,\r\n 0\r\n ],\r\n \"smartctl\":
    {\r\n \"version\": [\r\n 7,\r\n 4\r\n ],\r\n \"pre_release\": false,\r\n \"svn_revision\": \"5530\",\r\n \"platform_info\": \"x86_64-w64-mingw32-2012r2\",
    \r\n \"build_info\": \"(sf-7.4-1)\",\r\n \"argv\": [\r\n \"smartctl\",\r\n \"-a\",\r\n \"/dev/sda\",\r\n \"-j\"\r\n ],\r\n \"exit_status\": 4\r\n },\r\n \"local_time\":
    {\r\n \"time_t\": 1740725274,\r\n \"asctime\": \"Fri Feb 28 09:47:54 2025 RTZ\"\r\n },\r\n \"device\": {\r\n \"name\": \"/dev/sda\",\r\n \"info_name\":
    \"/dev/sda\",\r\n \"type\": \"scsi\",\r\n \"protocol\": \"SCSI\"\r\n },\r\n \"scsi_vendor\": \"Intel\",\r\n \"scsi_product\": \"Raid 1 Volume\",\r\n
    \"scsi_model_name\": \"Intel Raid 1 Volume\",\r\n \"scsi_revision\": \"1.0.\",\r\n \"scsi_version\": \"SPC-3\",\r\n \"user_capacity\": {\r\n \"blocks\": 111357952,
    \r\n \"bytes\": 57015271424\r\n },\r\n \"logical_block_size\": 512,\r\n \"scsi_lb_provisioning\": {\r\n \"name\": \"thin provisioned\",\r\n \"value\": 2,\r\n
    \"management_enabled\": {\r\n \"name\": \"LBPME\",\r\n \"value\": -1\r\n },\r\n \"read_zeros\": {\r\n \"name\": \"LBPRZ\",\r\n \"value\": 0\r\n }\r\n },\r\n
    \"rotation_rate\": 0,\r\n \"logical_unit_id\": \"0x61fa116d01000000001517ffff0aeb84\",\r\n \"device_type\": {\r\n \"scsi_terminology\":
    \"Peripheral Device Type [PDT]\",\r\n \"scsi_value\": 0,\r\n \"name\": \"disk\"\r\n },\r\n \"smart_support\": {\r\n \"available\": false\r\n },\r\n
    \"temperature\": {\r\n \"current\": 0,\r\n \"drive_trip\": 0\r\n },\r\n \"seagate_farm_log\": {\r\n \"supported\": false\r\n }\r\n}\r": exit status 4.
    The data arrives, but not in the correct format.

    Ubuntu 24
    Zabbix server 7.2.4
    Zabbix agent 2 7.2.3
    smartmontools 7.4
    Template: SMART by Zabbix agent 2

    Please help.
    Last edited by INFinite; 28-02-2025, 14:34.



  • evert
    replied
    Originally posted by PavelZ
    The company cannot afford to reduce security and publish such instructions, but we can)
    Does the method you describe reduce security? In what way?
    Last edited by evert; 27-02-2025, 11:53.



  • PavelZ
    replied
    The company cannot afford to reduce security and publish such instructions, but we can)

    The old template also has a good example in its README:
    Code:
    Cmnd_Alias SMARTCTL = /usr/sbin/smartctl
    zabbix ALL = (ALL) NOPASSWD: SMARTCTL
    Defaults!SMARTCTL !logfile, !syslog, !pam_session
    Without the Defaults line, the commands still run; they just clutter the logs.

