Discussion thread for official Zabbix SMART Disk monitoring
-
Hi,
could someone please help me to figure out what I'm doing wrong with this template?
I currently run Zabbix 5.0.14 on Ubuntu 20.04 and I would like to monitor some HDDs on my Windows 10 machine. The agent on the Windows 10 machine is updated to the latest version (5.0.14).
I've successfully imported the template but, as other users mentioned, I only receive
Code:
Failed to scan for devices: Cannot unmarshal JSON: invalid character 'P' in string escape code..
or
Code:
Failed to scan for devices: Cannot unmarshal JSON: invalid character 'n' after top-level value..
The main difference between the two errors is that the first occurs with the path configured in the agent conf file, while the other occurs when the environment variable is used.
Running the check locally with the agent on Windows 10, I successfully get the data, but it fails when I try to test it with zabbix_get.
On Windows I use the Administrator account, so there shouldn't be any permission-related problem.
Thanks in advance
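Both messages have the form produced by Go's encoding/json decoder (Zabbix agent 2 is written in Go), which suggests the agent received something other than clean JSON from smartctl. The thread does not confirm the exact cause, so the following is only an illustration of the two failure classes: an invalid backslash escape inside a string, and extra text after a complete JSON value. The sample strings are made up, and Python's error wording differs from Go's.

```python
import json

# Hypothetical output containing an unescaped Windows path. In JSON a
# backslash must be followed by a valid escape character, so "\P"
# (as in "C:\Program Files") is rejected by a strict parser.
bad_escape = r'{"argv": ["C:\Program Files\smartmontools\bin\smartctl.exe"]}'

# Hypothetical output where extra text follows a complete JSON document.
trailing_text = '{"devices": []} not json'


def try_parse(text):
    """Return (True, value) on success or (False, error message) on failure."""
    try:
        return True, json.loads(text)
    except json.JSONDecodeError as exc:
        return False, exc.msg


print(try_parse(bad_escape))
print(try_parse(trailing_text))
print(try_parse('{"devices": []}'))
```

Capturing exactly what smartctl prints for the agent's account and feeding it to a strict parser like this is one way to see which of the two cases applies.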
-
Hi Team,
thanks for the great plugin, but I have an issue with it. I'm trying to add monitoring for NVMe drives and am receiving this error:
Preprocessing failed for: [{"device":{"info_name":"/dev/nvme0","name":"/dev/nvme0","protocol":"NVMe","type":"nvme"},"disk_n...
1. Failed: cannot extract value from json by path "$[?(@.disk_name=='nvme0')].ata_smart_data.self_test.status.passed.first()": no data matches the specified path
I have smartctl 7.2, and from what I can see the returned value is completely different from what the template looks for:
zabbix_get -s lt-mum2-psr01 -k smart.disk.get
[{"device":{"info_name":"/dev/nvme0","name":"/dev/nvme0","protocol":"NVMe","type":"nvme"},"disk_name":"nvme0","disk_type":"unknown","firmware_version":"002C","json_format_version":[1,0],"local_time":{"asctime":"Sun Aug 8 21:30:18 2021 IST","time_t":1628438418},"logical_block_size":512,"model_name":"INTEL SSDPEKNW020T8","nvme_controller_id":1,"nvme_ieee_oui_identifier":6083300,"nvme_namespaces":[{"capacity":{"blocks":4000797360,"bytes":2048408248320},"formatted_lba_size":512,"id":1,"size":{"blocks":4000797360,"bytes":2048408248320},"utilization":{"blocks":4000797360,"bytes":2048408248320}}],"nvme_number_of_namespaces":1,"nvme_pci_vendor":{"id":32902,"subsystem_id":32902},"nvme_smart_health_information_log":{"available_spare":100,"available_spare_threshold":10,"controller_busy_time":1893,"critical_comp_time":0,"critical_warning":0,"data_units_read":20036417,"data_units_written":8547996,"host_reads":92243619,"host_writes":43771250,"media_errors":0,"num_err_log_entries":0,"percentage_used":0,"power_cycles":186,"power_on_hours":2831,"temperature":28,"unsafe_shutdowns":6,"warning_temp_time":0},"nvme_version":{"string":"1.3","value":66304},"power_cycle_count":186,"power_on_time":{"hours":2831},"serial_number":"BTNH003101J42P0C","smart_status":{"nvme":{"value":0},"passed":true},"smartctl":{"argv":["smartctl","-a","/dev/nvme0","-j"],"build_info":"(local build)","exit_status":0,"platform_info":"x86_64-linux-5.4.114-ltzero.12.x86_64","svn_revision":"5155","version":[7,2]},"temperature":{"current":28},"user_capacity":{"blocks":4000797360,"bytes":2048408248320}}]
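The NVMe output above has no ata_smart_data object, which is why the template's JSONPath finds nothing. A minimal sketch of that lookup in Python, using trimmed-down, hypothetical copies of the outputs posted in this thread; the helper names are made up for illustration and are not part of the template or the plugin.

```python
import json

# Trimmed, hypothetical versions of the NVMe and ATA outputs in this thread.
nvme_disk = json.loads(
    '{"disk_name": "nvme0", "device": {"type": "nvme"},'
    ' "smart_status": {"nvme": {"value": 0}, "passed": true}}'
)
ata_disk = json.loads(
    '{"disk_name": "sda", "device": {"type": "ata"},'
    ' "ata_smart_data": {"self_test": {"status": {"passed": true}}},'
    ' "smart_status": {"passed": true}}'
)


def self_test_passed(disk):
    """Return ata_smart_data.self_test.status.passed, or None if absent."""
    node = disk
    for key in ("ata_smart_data", "self_test", "status", "passed"):
        if not isinstance(node, dict) or key not in node:
            return None  # the "no data matches the specified path" case
        node = node[key]
    return node


def overall_health(disk):
    """Return smart_status.passed, which both posted outputs contain."""
    return disk.get("smart_status", {}).get("passed")


print(self_test_passed(nvme_disk), overall_health(nvme_disk))
print(self_test_passed(ata_disk), overall_health(ata_disk))
```

Both posted outputs do carry smart_status.passed, so an item that must work for NVMe as well would need to read a field like that rather than the ATA-only self-test status.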
-
Hi All,
these are my attempts so far, all unsuccessful.
Server side
Zabbix Server 5.4.3
Host with Zabbix Agent 2 5.4.3 active mode has been linked to SMART agent 2 active template
Host with Zabbix Agent 2 5.4.3 passive mode has been linked to SMART agent 2 template
Client side
- Windows 10 Pro with Zabbix Agent 2 active + smartmontools 7.2
- Windows 10 Home with Zabbix Agent 2 passive + smartmontools 7.2
On both client computers I have successfully run these commands:
Code:
C:\Program Files\Zabbix Agent 2>zabbix_agent2.exe -t smart.disk.discovery
smart.disk.discovery [s|[{"{#NAME}":"sda","{#DISKTYPE}":"SSD","{#MODEL}":"Samsung SSD 860 EVO 250GB","{#SN}":"S4CJNX0N310488H"}]]
C:\Program Files\Zabbix Agent 2>zabbix_agent2.exe -t smart.attribute.discovery
smart.attribute.discovery [s|[{"{#NAME}":"sda","{#DISKTYPE}":"SSD","{#ID}":5,"{#ATTRNAME}":"Reallocated_Sector_Ct","{#THRESH}":0},{"{#NAME}":"sda","{#DISKTYPE}":"SSD","{#ID}":9,"{#ATTRNAME}":"Power_On_Hours","{#THRESH}":0},{"{#NAME}":"sda","{#DISKTYPE}":"SSD","{#ID}":12,"{#ATTRNAME}":"Power_Cycle_Count","{#THRESH}":0},{"{#NAME}":"sda","{#DISKTYPE}":"SSD","{#ID}":177,"{#ATTRNAME}":"Wear_Leveling_Count","{#THRESH}":0},{"{#NAME}":"sda","{#DISKTYPE}":"SSD","{#ID}":179,"{#ATTRNAME}":"Used_Rsvd_Blk_Cnt_Tot","{#THRESH}":0},{"{#NAME}":"sda","{#DISKTYPE}":"SSD","{#ID}":181,"{#ATTRNAME}":"Program_Fail_Cnt_Total","{#THRESH}":0},{"{#NAME}":"sda","{#DISKTYPE}":"SSD","{#ID}":182,"{#ATTRNAME}":"Erase_Fail_Count_Total","{#THRESH}":0},{"{#NAME}":"sda","{#DISKTYPE}":"SSD","{#ID}":183,"{#ATTRNAME}":"Runtime_Bad_Block","{#THRESH}":0},{"{#NAME}":"sda","{#DISKTYPE}":"SSD","{#ID}":187,"{#ATTRNAME}":"Uncorrectable_Error_Cnt","{#THRESH}":0},{"{#NAME}":"sda","{#DISKTYPE}":"SSD","{#ID}":190,"{#ATTRNAME}":"Airflow_Temperature_Cel","{#THRESH}":0},{"{#NAME}":"sda","{#DISKTYPE}":"SSD","{#ID}":195,"{#ATTRNAME}":"ECC_Error_Rate","{#THRESH}":0},{"{#NAME}":"sda","{#DISKTYPE}":"SSD","{#ID}":199,"{#ATTRNAME}":"CRC_Error_Count","{#THRESH}":0},{"{#NAME}":"sda","{#DISKTYPE}":"SSD","{#ID}":235,"{#ATTRNAME}":"POR_Recovery_Count","{#THRESH}":0},{"{#NAME}":"sda","{#DISKTYPE}":"SSD","{#ID}":241,"{#ATTRNAME}":"Total_LBAs_Written","{#THRESH}":0}]]
C:\Program Files\Zabbix Agent 2>zabbix_agent2.exe -t smart.disk.get
smart.disk.get [s|[{"ata_smart_attributes":{"revision":1,"table":[{"flags":{"auto_keep":true,"error_rate":false,"event_count":true,"performance":false,"prefailure":true,"string":"PO--CK ","updated_online":true,"value":51},"id":5,"name":"Reallocated_Sector_Ct","raw":{"string":"0","value":0},"value":100,"worst":100},{"flags":{"auto_keep":true,"error_rate":false,"event_count":true,"performance":false,"prefailure":false,"string":"-O--CK ","updated_online":true,"value":50},"id":9,"name":"Power_On_Hours","raw":{"string":"526","value":526},"value":99,"worst":99},{"flags":{"auto_keep":true,"error_rate":false,"event_count":true,"performance":false,"prefailure":false,"string":"-O--CK ","updated_online":true,"value":50},"id":12,"name":"Power_Cycle_Count","raw":{"string":"607","value":607},"value":99,"worst":99},{"flags":{"auto_keep":false,"error_rate":false,"event_count":true,"performance":false,"prefailure":true,"string":"PO--C- ","updated_online":true,"value":19},"id":177,"name":"Wear_Leveling_Count","raw":{"string":"6","value":6},"value":99,"worst":99},{"flags":{"auto_keep":false,"error_rate":false,"event_count":true,"performance":false,"prefailure":true,"string":"PO--C- ","updated_online":true,"value":19},"id":179,"name":"Used_Rsvd_Blk_Cnt_Tot","raw":{"string":"0","value":0},"value":100,"worst":100},{"flags":{"auto_keep":true,"error_rate":false,"event_count":true,"performance":false,"prefailure":false,"string":"-O--CK ","updated_online":true,"value":50},"id":181,"name":"Program_Fail_Cnt_Total","raw":{"string":"0","value":0},"value":100,"worst":100},{"flags":{"auto_keep":true,"error_rate":false,"event_count":true,"performance":false,"prefailure":false,"string":"-O--CK ","updated_online":true,"value":50},"id":182,"name":"Erase_Fail_Count_Total","raw":{"string":"0","value":0},"value":100,"worst":100},{"flags":{"auto_keep":false,"error_rate":false,"event_count":true,"performance":false,"prefailure":true,"string":"PO--C- ","updated_online":true,"value":19},"id":183,"name":"Runtime_Bad_Block","raw":{"string":"0","value":0},"value":100,"worst":100},{"flags":{"auto_keep":true,"error_rate":false,"event_count":true,"performance":false,"prefailure":false,"string":"-O--CK ","updated_online":true,"value":50},"id":187,"name":"Uncorrectable_Error_Cnt","raw":{"string":"0","value":0},"value":100,"worst":100},{"flags":{"auto_keep":true,"error_rate":false,"event_count":true,"performance":false,"prefailure":false,"string":"-O--CK ","updated_online":true,"value":50},"id":190,"name":"Airflow_Temperature_Cel","raw":{"string":"42","value":42},"value":58,"worst":48},{"flags":{"auto_keep":false,"error_rate":true,"event_count":true,"performance":false,"prefailure":false,"string":"-O-RC- ","updated_online":true,"value":26},"id":195,"name":"ECC_Error_Rate","raw":{"string":"0","value":0},"value":200,"worst":200},{"flags":{"auto_keep":true,"error_rate":true,"event_count":true,"performance":true,"prefailure":false,"string":"-OSRCK ","updated_online":true,"value":62},"id":199,"name":"CRC_Error_Count","raw":{"string":"0","value":0},"value":100,"worst":100},{"flags":{"auto_keep":false,"error_rate":false,"event_count":true,"performance":false,"prefailure":false,"string":"-O--C- ","updated_online":true,"value":18},"id":235,"name":"POR_Recovery_Count","raw":{"string":"2","value":2},"value":99,"worst":99},{"flags":{"auto_keep":true,"error_rate":false,"event_count":true,"performance":false,"prefailure":false,"string":"-O--CK ","updated_online":true,"value":50},"id":241,"name":"Total_LBAs_Written","raw":{"string":"3052987434","value":3052987434},"value":99,"worst":99}]},"ata_smart_data":{"capabilities":{"attribute_autosave_enabled":true,"conveyance_self_test_supported":false,"error_logging_supported":true,"exec_offline_immediate_supported":true,"gp_logging_supported":false,"offline_is_aborted_upon_new_cmd":false,"offline_surface_scan_supported":false,"selective_self_test_supported":true,"self_tests_supported":true,"values":[83,3]},"offline_data_collection":{"completion_seconds":0,"status":{"string":"was never started","value":0}},"self_test":{"polling_minutes":{"extended":85,"short":2},"status":{"passed":true,"string":"completed without error","value":0}}},"device":{"info_name":"/dev/sda","name":"/dev/sda","protocol":"ATA","type":"ata"},"disk_name":"sda","disk_type":"unknown","firmware_version":"RVT04B6Q","in_smartctl_database":true,"json_format_version":[1,0],"local_time":{"asctime":"Thu Aug 19 15:22:20 2021 ","time_t":1629379340},"model_family":"Samsung based SSDs","model_name":"Samsung SSD 860 EVO 250GB","power_cycle_count":607,"power_on_time":{"hours":526},"serial_number":"S4CJNX0N310488H","smart_status":{"passed":true},"smartctl":{"argv":["smartctl","-a","/dev/sda","-j"],"build_info":"(sf-7.2-1)","exit_status":4,"platform_info":"x86_64-w64-mingw32-w10-b19043","svn_revision":"5155","version":[7,2]},"temperature":{"current":42},"trim":{"supported":false}}]]
But on the front end I always get an error message, and I get the same message if I run this command from the Zabbix server CLI:
Code:
pi@raspberrypi:~ $ zabbix_get -s 192.168.1.111 -p 10050 -k smart.disk.get
ZBX_NOTSUPPORTED: Failed to scan for devices: Cannot unmarshal JSON: invalid character 'n' after top-level value..
Zabbix Agent 2 and smartmontools (smartctl) run with Administrator privileges on both Windows 10 computers.
So, the final question: has anyone managed to get SMART monitoring working properly with Windows 10 hosts?
Regards
RS
Last edited by robsitz; 27-08-2021, 16:27.
-
Hi Team and @AlexL
thanks for the great plugin, but I have an issue with SSD/NVMe drives.
AlexeySaff and robsitz have already reported this, but there is still no solution.
Could you check and update this great template?
Thanks!
-
I ran into this as well and figured out how to fix it.
There are two options.
Option 1:
Modify the existing item to use the raw value instead.
Consequence: the triggers will then most likely not work and should be removed or disabled, since they compare the threshold against the normalized value and not the raw value.
Option 2:
Add a new item in addition to the existing one. This gives you some duplication but keeps the triggers intact.
This is what I decided to do.
Open the template "Template Module SMART by Zabbix agent 2".
Go to Discovery rules and click on "Attribute discovery".
Go to "Item prototypes".
You should have only one here; open it and click on Clone.
Change the name; I decided on: "SMART [{#NAME}]: ID {#ID} {#ATTRNAME} (raw value)"
Change the key so it differs from the previous one; I went for: smart.disk.rawvalue[{#NAME},{#ID}]
Go to Preprocessing and change the JSONPath to: $[?(@.disk_name=='{#NAME}')].ata_smart_attributes.table[?(@.id=={#ID})].raw.value.first()
(basically replacing "value" with "raw.value")
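To make the difference between the two JSONPath expressions concrete, here is a rough Python equivalent of what they select. The sample data is a hypothetical, trimmed copy of the smart.disk.get output posted earlier in this thread, and the attribute() helper is made up for illustration.

```python
# Hypothetical, trimmed sample shaped like the smart.disk.get output above.
disks = [
    {
        "disk_name": "sda",
        "ata_smart_attributes": {
            "table": [
                {"id": 9, "name": "Power_On_Hours",
                 "raw": {"string": "526", "value": 526}, "value": 99},
                {"id": 190, "name": "Airflow_Temperature_Cel",
                 "raw": {"string": "42", "value": 42}, "value": 58},
            ]
        },
    }
]


def attribute(disks, disk_name, attr_id):
    """Return the attribute entry for one disk, or None if it is missing."""
    for disk in disks:
        if disk.get("disk_name") != disk_name:
            continue
        for entry in disk.get("ata_smart_attributes", {}).get("table", []):
            if entry.get("id") == attr_id:
                return entry
    return None


entry = attribute(disks, "sda", 190)
normalized = entry["value"]   # what the original item prototype reads
raw = entry["raw"]["value"]   # what the cloned "(raw value)" item reads
print(normalized, raw)
```

For attribute 190 in that output the normalized value is 58 while the raw value is 42, which is why thresholds written for one scale should not be reused for the other.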
While in the template, you may also want to fix the broken power-on hours so that the value makes sense:
Go back to the template itself, open "Disk discovery" instead, and find the item prototype for power-on hours.
On the Preprocessing tab, add a "Custom multiplier" step and set it to 3600.
This converts the hours into seconds, as expected by this item, so the value is displayed properly.
There is still something else that needs fixing: the graphs have significant gaps in them. I haven't investigated this yet.
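The arithmetic behind the 3600 multiplier for power-on hours, as a small sketch. The function name is made up; the sample input is the power_on_time.hours value from the Samsung SSD output earlier in this thread.

```python
SECONDS_PER_HOUR = 3600  # the "Custom multiplier" value suggested in the post


def power_on_seconds(power_on_hours):
    """Convert smartctl's power-on hours into the seconds the item expects."""
    return power_on_hours * SECONDS_PER_HOUR


print(power_on_seconds(526))  # 526 hours expressed in seconds
```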
-
One more thing: the agent seems to get the device type very wrong at times.
Devices whose type is unknown get marked as SSD even when that is not the case.
This happens with all my mechanical drives behind a MegaRAID SAS controller in IT mode (JBOD).
-
Hello. A quick solution is to add the path to smartctl.exe to your global PATH variable and not use the Plugins.Smart.Path parameter in zabbix_agent2.conf. Also, you can use a system account instead of a dedicated user to start the Zabbix agent 2 service.
Last edited by max.ch.88; 29-09-2021, 16:58.
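A quick way to check whether smartctl is actually resolvable, sketched with Python's standard shutil.which. The resolve_smartctl() helper and its configured_path argument are made up for illustration and only mimic the idea of "explicit path if set, otherwise search PATH"; this is not how the agent itself is implemented.

```python
import shutil


def resolve_smartctl(configured_path=None, name="smartctl"):
    """Return the executable that would plausibly be run, or None.

    configured_path stands in for Plugins.Smart.Path; when it is empty,
    the lookup falls back to searching PATH for the default name.
    """
    if configured_path:
        return shutil.which(configured_path)
    return shutil.which(name)


print(resolve_smartctl())
```

The result depends on the environment of the account that runs it, so the check is only meaningful when run under the same account as the Zabbix agent 2 service.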
-
Hi max.ch.88
first of all, thanks for your support.
My Windows 10 Pro account (username Roberto) is a member of the local Administrators group.
As you can see in the picture below, the path of smartctl.exe is already in my PATH environment variable, and the Zabbix Agent 2 service is running under the Local System account.
In addition, every Plugins.Smart option in the Zabbix Agent 2 configuration file is commented out:
### Option: Plugins.Smart.Timeout
# The maximum time in seconds for waiting before smartctl execution is terminated.
# The timeout is for a single smartctl command line execution.
#
# Mandatory: no
# Range: 1-30
# Default: <Global timeout>
# Plugins.Smart.Timeout=
### Option: Plugins.Smart.Path
# Path to smartctl executable.
#
# Mandatory: no
# Default: smartctl
# Plugins.Smart.Path=
Where am I wrong?
Thank you in advance for any further support.
Regards
RS
-
Hi there.
I have a problem running the SMART disk check in my environment. I have tried many things and still get 'unsupported item key' for the 'SMART: Get attributes' item.
The discovery rules 'Disk discovery' and 'Attribute discovery' are also unsupported.
I raised the Zabbix agent debug level and see this in the log:
3486871:20211220:111015.320 Requested [smart.disk.get]
3486871:20211220:111015.320 Sending back [ZBX_NOTSUPPORTED: Unsupported item key.]
I have checked that the Zabbix agent's user (zabbix) can run smartctl, and it runs fine:
Code:
sudo -u zabbix smartctl --version
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-80-generic] (local build)
However, many other items are working fine, including disk-related ones: utilization, queues, etc.
What am I doing wrong? Also, as a newbie (but impressed) Zabbix user, I have a more general question: how do I debug the Zabbix workflow in low-level detail? For this particular problem, is there a more granular tool to trace why it is not working?
OS: Ubuntu 20.04.3 LTS
Zabbix server: 5.4.8 (in docker's container)
Zabbix agent: 5.4.8
-
My bad! All this time I had no clue that I was trying to use the Agent 2 template on a system where the first-generation agent was installed. So this problem is solved.
However, now I have a problem with 'cannot unmarshal json'. I'm going to investigate it further.
-
I have tried the plugin on Proxmox 7.1-10 (based on Debian Bullseye), on two different computers, with the following results:
On one there was a megaraid adapter, and the disks on that adapter were not recognized. On the other there were some rather old HDDs, and they were discovered as SSDs.
The megaraid machine is more interesting; checking the disks gives the following output:
Code:
sudo smartctl --scan
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d scsi # /dev/sdb, SCSI device
/dev/sdc -d scsi # /dev/sdc, SCSI device
/dev/bus/8 -d megaraid,14 # /dev/bus/8 [megaraid_disk_14], SCSI device
/dev/bus/8 -d megaraid,15 # /dev/bus/8 [megaraid_disk_15], SCSI device
/dev/bus/8 -d megaraid,16 # /dev/bus/8 [megaraid_disk_16], SCSI device
/dev/bus/8 -d megaraid,17 # /dev/bus/8 [megaraid_disk_17], SCSI device
/dev/bus/8 -d megaraid,18 # /dev/bus/8 [megaraid_disk_18], SCSI device
/dev/bus/8 -d megaraid,19 # /dev/bus/8 [megaraid_disk_19], SCSI device
/dev/bus/8 -d megaraid,20 # /dev/bus/8 [megaraid_disk_20], SCSI device
/dev/nvme0 -d nvme # /dev/nvme0, NVMe device
As can be seen, the megaraid subdevice numbering starts at 14, whereas the original program looks for subdevices starting from 0 and, since the first one is not found, stops searching.
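A minimal sketch of reading the subdevice numbers straight from the scan output instead of counting up from 0. It is written in Python for brevity (the plugin itself is a Go program), the sample text is a shortened copy of the output above, and the function names are made up for illustration.

```python
import re

# Shortened copy of the `smartctl --scan` output posted above.
SCAN_OUTPUT = """\
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/bus/8 -d megaraid,14 # /dev/bus/8 [megaraid_disk_14], SCSI device
/dev/bus/8 -d megaraid,15 # /dev/bus/8 [megaraid_disk_15], SCSI device
/dev/nvme0 -d nvme # /dev/nvme0, NVMe device
"""

# Each line starts with "<path> -d <type>"; the rest is a comment.
LINE_RE = re.compile(r"^(?P<path>\S+)\s+-d\s+(?P<type>\S+)")


def parse_scan(output):
    """Return (path, device type) pairs in the order smartctl reported them."""
    devices = []
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            devices.append((match.group("path"), match.group("type")))
    return devices


def megaraid_ids(devices):
    """Return the megaraid subdevice numbers exactly as reported."""
    return [
        int(dev_type.split(",", 1)[1])
        for _, dev_type in devices
        if dev_type.startswith("megaraid,")
    ]


print(megaraid_ids(parse_scan(SCAN_OUTPUT)))
```

Keeping the path and the type together as one pair also avoids collapsing several megaraid entries that share the same /dev/bus/8 path.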
For the HDDs detected as SSDs, the problem is that the plugin decides based on the rotation rate attribute; if the attribute is not present, the plugin treats it as zero and therefore assumes the disk is an SSD.
I spent some time fixing the Go program; please find attached an updated version.
I made the following changes:
- There is a sort after running smartctl --scan which ignores the megaraid subdevices, so it reduces my seven megaraid drives to one. I have fixed this sort.
- When there are megaraid adapters, instead of sequentially looking for subdevices starting from 0, use the subdevice numbers reported by smartctl --scan.
- If a device has rotation rate == 0 (possibly because the attribute is missing), check the SpinUp time parameter, and if it is present, set the rotation rate to 1, which causes the disk to be detected as an HDD.
- When running smartctl --json, if the result doesn't start with { and doesn't end with }, then the sudo right for smartctl is probably not set, so I changed the error message to "Smartctl did not return json, check if sudo is enabled for zabbix user", instead of "Failed to scan for devices: Cannot unmarshal JSON: invalid character 'P' in string escape code." or similar.
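The last two changes in the list above can be sketched roughly as follows. This is a Python illustration of the described logic only, not the attached Go code; the function names and the boolean has_spin_up_time argument are made up.

```python
def classify_disk(rotation_rate, has_spin_up_time):
    """Hypothetical HDD/SSD decision following the post's description.

    rotation_rate may be None or 0 when smartctl does not report it; in that
    case the presence of a spin-up time attribute is taken as a sign of an HDD.
    """
    if rotation_rate:
        return "hdd"
    if has_spin_up_time:
        return "hdd"
    return "ssd"


def looks_like_json_object(text):
    """Cheap sanity check before handing smartctl output to a JSON parser."""
    stripped = text.strip()
    return stripped.startswith("{") and stripped.endswith("}")


print(classify_disk(None, True))
print(looks_like_json_object("sudo: a password is required"))
```

The second check does not validate the JSON; it only lets the caller report a clearer error when the output is obviously something else, such as a sudo prompt.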
-
Hello robsitz
Please set Plugins.Smart.Path="c:\smartmontools\bin\smartctl.exe" in your zabbix-agent2 config file and restart the service.
-
Hi max.ch.88
thanks for your reply!
A small step forward, I suppose.
After following your instructions I no longer see the error "Failed to scan for devices...".
The final question is: how can I check within Zabbix whether the SMART by Zabbix agent 2 active template is working correctly?
Thank you in advance.
Regards
RS