Ad Widget

Collapse

SNMP results not always written to database

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • ThomasV
    Junior Member
    • Nov 2024
    • 9

    #1

    SNMP results not always written to database

    Hi,

    Running Zabbix 7.0.5 with MariaDB 15.1

    I am monitoring twenty Dell Cumulus switches with the "Network Generic Device by SNMP" template. I am seeing gaps in my interface graphs, so i started troubleshooting.
    All switches except one are disabled to make troubleshooting easier.

    When we look at latest data for this host/interface, we can see Bits received was not updated, while we know it was polled.
    Click image for larger version

Name:	Screenshot 2024-11-18 200039.png
Views:	180
Size:	23.9 KB
ID:	494491

    With tcpdump i can see the switch interfaces are polled every 3 minutes. I am focusing on a specific interface here and have set SNMP agent "use combined request" to disabled.
    Click image for larger version

Name:	Screenshot 2024-11-18 200016.png
Views:	313
Size:	79.2 KB
ID:	494489

    Next i increased the polling logger to see it from th zabbix perspective. We can see Zabbix polling the item.
    Click image for larger version

Name:	Screenshot 2024-11-18 200031.png
Views:	180
Size:	77.1 KB
ID:	494490

    Next i've started a MariaDB query log. When we look for inserts with the item-id for Bits Received for this interfaces, we can "sometimes" see data being inserted into the DB, but also sometimes we can see the poll, but not the DB insert.
    Click image for larger version

Name:	image.png
Views:	208
Size:	70.8 KB
ID:	494492
    My question is, what would be the best next step to enable for logging in Zabbix?
    With zabbix_server log_level_increase i do not see a specific process_type for Database related logging.

    Thanks in advance, Thomas
  • cyber
    Senior Member
    Zabbix Certified SpecialistZabbix Certified Professional
    • Dec 2006
    • 4807

    #2
    History syncer processes should deal with DB writes...
    • history syncer - history DB writer

    Comment

    • ThomasV
      Junior Member
      • Nov 2024
      • 9

      #3
      Thanks for the pointer cyber

      When i try this, i can see the Bits Received and Bits Send being polled with tcpdump
      Click image for larger version

Name:	Screenshot 2024-11-20 105759.png
Views:	208
Size:	79.0 KB
ID:	494602

      When i grep the log for queries on those two items i can see these. Bits received is not saved in history_uint
      Click image for larger version

Name:	Screenshot 2024-11-20 105815.png
Views:	166
Size:	85.6 KB
ID:	494603

      When i examine the log for futher info, i can see these errors. It looks like zabbix has no value for this polled item. But we know from the tcpdump that snmp returned a valid value.
      Code:
      26899:20241120:105632.345 zbx_substitute_functions_results() expression[3]:'({38306}>({$IF.UTIL.MAX:"swp20s0"}/100)*{38307} or
      {38308}>({$IF.UTIL.MAX:"swp20s0"}/100)*{38307}) and
      {38307}>0' => '(ERROR(Cannot evaluate function avg(/sw-0021/net.if.in[ifInOctets.56],15m): not enough data.)>("{$IF.UTIL.MAX:\"swp20s0\"}"/100)*1000000000 or
      ERROR(Cannot evaluate function avg(/sw-0021/net.if.out[ifOutOctets.56],15m): not enough data.)>("{$IF.UTIL.MAX:\"swp20s0\"}"/100)*1000000000) and
      1000000000>0'
      
       26899:20241120:105632.345 zbx_substitute_functions_results() expression[3]:'{38306}<(({$IF.UTIL.MAX:"swp20s0"}-3)/100)*{38307} and
      {38308}<(({$IF.UTIL.MAX:"swp20s0"}-3)/100)*{38307}' => 'ERROR(Cannot evaluate function avg(/sw-0021/net.if.in[ifInOctets.56],15m): not enough data.)<(("{$IF.UTIL.MAX:\"swp20s0>
      ERROR(Cannot evaluate function avg(/sw-0021/net.if.out[ifOutOctets.56],15m): not enough data.)<(("{$IF.UTIL.MAX:\"swp20s0\"}"-3)/100)*1000000000'
      What would be the best logger module to see which value is received by zabbix from the snmp get?

      Thanks, Thomas

      Comment

      • cyber
        Senior Member
        Zabbix Certified SpecialistZabbix Certified Professional
        • Dec 2006
        • 4807

        #4
        These error are about trigger calculations... it has not enough data for that 15m avg calculation...

        v7 has snmp poller process(es), which should show you, what it gets from device...

        Comment

        • ThomasV
          Junior Member
          • Nov 2024
          • 9

          #5
          I've disabled all devices and all items, except two items (Bits Received and Send) on the same port on the same switch. I can see in the logs SNMP is polling, it just does not write the value to the history_uint table

          Value that is polled and saved in DB
          Code:
          27801:20241120:135635.331 zbx_setproctitle() title:'poller #2 [got 0 values in 0.000274 sec, getting values]'
           27801:20241120:135635.331 In get_values()
           27801:20241120:135635.331 In zbx_dc_config_get_poller_items() poller_type:0
           27801:20241120:135635.331 End of zbx_dc_config_get_poller_items():1
           27801:20241120:135635.331 In zbx_prepare_items() num:1
           27801:20241120:135635.331 In substitute_key_macros_impl() data:'net.if.out[ifOutOctets.56]'
           27801:20241120:135635.331 End of substitute_key_macros_impl():SUCCEED data:'net.if.out[ifOutOctets.56]'
           27801:20241120:135635.331 In substitute_simple_macros_impl() data:'161'
           27801:20241120:135635.331 End substitute_simple_macros_impl() data:'161'
           27801:20241120:135635.331 In substitute_simple_macros_impl() data:'cumuluspassword'
           27801:20241120:135635.331 End substitute_simple_macros_impl() data:'cumuluspassword'
           27801:20241120:135635.331 In substitute_key_macros_impl() data:'1.3.6.1.2.1.2.2.1.16.56'
           27801:20241120:135635.331 End of substitute_key_macros_impl():SUCCEED data:'1.3.6.1.2.1.2.2.1.16.56'
           27801:20241120:135635.331 In substitute_simple_macros_impl() data:'20s'
           27801:20241120:135635.331 End substitute_simple_macros_impl() data:'20s'
           27801:20241120:135635.331 End of zbx_prepare_items()
           27801:20241120:135635.331 In get_values_snmp() host:'sw-0021' addr:'10.250.255.21' num:1
           27801:20241120:135635.331 In zbx_snmp_open_session()
           27801:20241120:135635.331 In zbx_is_ip4() ip:'10.250.255.21'
           27801:20241120:135635.331 End of zbx_is_ip4():SUCCEED
           27801:20241120:135635.331 SNMP [[email protected]:161]
           27801:20241120:135635.331 End of zbx_snmp_open_session()
           27801:20241120:135635.331 In zbx_snmp_process_standard()
           27801:20241120:135635.331 In zbx_snmp_translate() OID:'1.3.6.1.2.1.2.2.1.16.56'
           27801:20241120:135635.331 End of zbx_snmp_translate() oid_translated:'1.3.6.1.2.1.2.2.1.16.56'
           27801:20241120:135635.331 In zbx_snmp_get_values() num:1 level:0
           27801:20241120:135635.333 zbx_snmp_get_values() snmp_sess_synch_response() status:0 s_snmp_errno:0 errstat:0 mapping_num:1
           27801:20241120:135635.333 In zbx_snmp_set_result() type:65
           27801:20241120:135635.333 End of zbx_snmp_set_result():SUCCEED
           27801:20241120:135635.333 End of zbx_snmp_get_values():SUCCEED
           27801:20241120:135635.333 End of zbx_snmp_process_standard():SUCCEED
           27801:20241120:135635.333 In zbx_snmp_close_session()
           27801:20241120:135635.333 End of zbx_snmp_close_session()
           27801:20241120:135635.333 End of get_values_snmp()
           27801:20241120:135635.333 In zbx_activate_item_interface() interfaceid:36 itemid:50735 type:20 version:0
           27801:20241120:135635.333 End of zbx_activate_item_interface()
           27801:20241120:135635.333 In zbx_preprocess_item_value()
           27801:20241120:135635.333 End of zbx_preprocess_item_value()
           27801:20241120:135635.333 In zbx_ipc_socket_write()
           27801:20241120:135635.333 End of zbx_ipc_socket_write():SUCCEED
           27801:20241120:135635.333 End of get_values():1
           27801:20241120:135635.333 zbx_setproctitle() title:'poller #2 [got 1 values in 0.002470 sec, idle 5 sec]'
           27801:20241120:135635.333 In zbx_ipc_async_socket_recv() timeout:5
           27788:20241120:135635.333 zbx_ipc_async_socket_recv() code:9997 size:0 data:
           27788:20241120:135635.333 End of zbx_ipc_async_socket_recv():0
           27788:20241120:135635.333 In zbx_ipc_async_socket_recv() timeout:0
           27788:20241120:135635.333 End of zbx_ipc_async_socket_recv():0
           27788:20241120:135635.333 In zbx_sync_history_cache() history_num:1
           27788:20241120:135635.333 In zbx_dc_expand_user_and_func_macros() '365d'
           27788:20241120:135635.333 End of zbx_dc_expand_user_and_func_macros() '365d'
           27788:20241120:135635.333 In zbx_dc_expand_user_and_func_macros() '31d'
           27788:20241120:135635.333 End of zbx_dc_expand_user_and_func_macros() '31d'
           27788:20241120:135635.333 In DCmass_prepare_history() history_num:1
           27788:20241120:135635.333 End of DCmass_prepare_history()
           27788:20241120:135635.333 In DBmass_add_history()
           27788:20241120:135635.333 In zbx_history_add_values()
           27788:20241120:135635.334 query [txnlev:1] [begin;]
           27788:20241120:135635.334 query [txnlev:1] [insert into history_uint (itemid,clock,ns,value) values (50735,1732107395,333335943,18865040);.]
           27788:20241120:135635.334 query [txnlev:1] [commit;]
           27791:20241120:135635.345 End of zbx_ipc_async_socket_recv():0
           27791:20241120:135635.345 zbx_setproctitle() title:'history syncer #4 [processed 0 values, 0 triggers in 0.000094 sec, syncing history]'
           27791:20241120:135635.345 In zbx_sync_history_cache() history_num:1
           27791:20241120:135635.345 zbx_setproctitle() title:'history syncer #4 [processed 0 values, 0 triggers in 0.000063 sec, idle 1 sec]'
           27791:20241120:135635.345 In zbx_ipc_async_socket_recv() timeout:1
           27788:20241120:135635.347 End of zbx_history_add_values()
           27788:20241120:135635.347 End of DBmass_add_history()
           27788:20241120:135635.347 In zbx_dc_mass_update_trends()
           27788:20241120:135635.347 End of zbx_dc_mass_update_trends()
          Polled and not stored in the DB
          Code:
           28421:20241120:144444.270 zbx_setproctitle() title:'poller #3 [got 0 values in 0.000406 sec, getting values]'
           28421:20241120:144444.270 In get_values()
           28421:20241120:144444.270 In zbx_dc_config_get_poller_items() poller_type:0
           28421:20241120:144444.270 End of zbx_dc_config_get_poller_items():1
           28421:20241120:144444.270 In zbx_prepare_items() num:1
           28421:20241120:144444.271 In substitute_key_macros_impl() data:'net.if.in[ifInOctets.56]'
           28421:20241120:144444.271 End of substitute_key_macros_impl():SUCCEED data:'net.if.in[ifInOctets.56]'
           28421:20241120:144444.271 In substitute_simple_macros_impl() data:'161'
           28421:20241120:144444.271 End substitute_simple_macros_impl() data:'161'
           28421:20241120:144444.271 In substitute_simple_macros_impl() data:'cumuluspassword'
           28421:20241120:144444.271 End substitute_simple_macros_impl() data:'cumuluspassword'
           28421:20241120:144444.271 In substitute_key_macros_impl() data:'1.3.6.1.2.1.2.2.1.10.56'
           28421:20241120:144444.271 End of substitute_key_macros_impl():SUCCEED data:'1.3.6.1.2.1.2.2.1.10.56'
           28421:20241120:144444.271 In substitute_simple_macros_impl() data:'20s'
           28421:20241120:144444.271 End substitute_simple_macros_impl() data:'20s'
           28421:20241120:144444.271 End of zbx_prepare_items()
           28421:20241120:144444.271 In get_values_snmp() host:'sw-0021' addr:'10.250.255.21' num:1
           28421:20241120:144444.271 In zbx_snmp_open_session()
           28421:20241120:144444.271 In zbx_is_ip4() ip:'10.250.255.21'
           28421:20241120:144444.271 End of zbx_is_ip4():SUCCEED
           28421:20241120:144444.271 SNMP [[email protected]:161]
           28421:20241120:144444.271 End of zbx_snmp_open_session()
           28421:20241120:144444.271 In zbx_snmp_process_standard()
           28421:20241120:144444.271 In zbx_snmp_translate() OID:'1.3.6.1.2.1.2.2.1.10.56'
           28421:20241120:144444.271 End of zbx_snmp_translate() oid_translated:'1.3.6.1.2.1.2.2.1.10.56'
           28421:20241120:144444.271 In zbx_snmp_get_values() num:1 level:0
           28421:20241120:144444.273 zbx_snmp_get_values() snmp_sess_synch_response() status:0 s_snmp_errno:0 errstat:0 mapping_num:1
           28421:20241120:144444.273 In zbx_snmp_set_result() type:65
           28421:20241120:144444.273 End of zbx_snmp_set_result():SUCCEED
           28421:20241120:144444.273 End of zbx_snmp_get_values():SUCCEED
           28421:20241120:144444.273 End of zbx_snmp_process_standard():SUCCEED
           28421:20241120:144444.273 In zbx_snmp_close_session()
           28421:20241120:144444.273 End of zbx_snmp_close_session()
           28421:20241120:144444.273 End of get_values_snmp()
           28421:20241120:144444.273 In zbx_activate_item_interface() interfaceid:36 itemid:50564 type:20 version:0
           28421:20241120:144444.273 End of zbx_activate_item_interface()
           28421:20241120:144444.273 In zbx_preprocess_item_value()
           28421:20241120:144444.273 End of zbx_preprocess_item_value()
           28421:20241120:144444.273 In zbx_ipc_socket_write()
           28421:20241120:144444.273 End of zbx_ipc_socket_write():SUCCEED
           28421:20241120:144444.273 End of get_values():1
           28421:20241120:144444.273 zbx_setproctitle() title:'poller #3 [got 1 values in 0.003022 sec, idle 5 sec]'
           28421:20241120:144444.273 In zbx_ipc_async_socket_recv() timeout:5
           28411:20241120:144444.313 End of zbx_ipc_async_socket_recv():0
          Something looks different about zbx_ipc_async_socket_recv() in the logs, can this have something to do with the issue?

          Thanks, Thomas

          Comment

          • ThomasV
            Junior Member
            • Nov 2024
            • 9

            #6
            With more debugging on, it seems the value after preprocessing is empty;

            Bits Send has Value returned from pre-processing
            Code:
             29753:20241120:183235.437 [3] pp_worker_entry() process task type:5 itemid:50735
             29753:20241120:183235.437 [3] In pp_execute(): value:2232498461 type:uint64
             29753:20241120:183235.437 [3] In pp_execute_step() step:10 params:'' value:'2232498461' cache:(nil)
             29753:20241120:183235.437 [3] In zbx_dc_expand_user_and_func_macros() ''
             29753:20241120:183235.437 [3] End of zbx_dc_expand_user_and_func_macros() ''
             29753:20241120:183235.437 [3] End of pp_execute_step() ret:SUCCEED value:3486863
             29753:20241120:183235.437 [3] In pp_execute_step() step:1 params:'8' value:'3486863' cache:(nil)
             29753:20241120:183235.437 [3] In zbx_dc_expand_user_and_func_macros() '8'
             29753:20241120:183235.437 [3] End of zbx_dc_expand_user_and_func_macros() '8'
             29753:20241120:183235.437 [3] End of pp_execute_step() ret:SUCCEED value:27894904
             29753:20241120:183235.437 [3] End of pp_execute(): value:'27894904' type:uint64
            Bits Received has no value returned from pre-processing
            Code:
             29753:20241120:183244.435 [2] pp_worker_entry() process task type:5 itemid:50564
             29753:20241120:183244.435 [2] In pp_execute(): value:611136069 type:uint64
             29753:20241120:183244.435 [2] In pp_execute_step() step:10 params:'' value:'611136069' cache:(nil)
             29753:20241120:183244.435 [2] In zbx_dc_expand_user_and_func_macros() ''
             29753:20241120:183244.435 [2] End of zbx_dc_expand_user_and_func_macros() ''
             29753:20241120:183244.435 [2] End of pp_execute_step() ret:SUCCEED value:
             29753:20241120:183244.435 [2] End of pp_execute(): value:'' type:none​

            Comment

            • cyber
              Senior Member
              Zabbix Certified SpecialistZabbix Certified Professional
              • Dec 2006
              • 4807

              #7
              I am out of ideas there...

              Comment

              • ThomasV
                Junior Member
                • Nov 2024
                • 9

                #8
                Ok, so i had a look at the pre-processing of these values. There are two steps:
                • Change per second
                • Custom Multiplier = 8
                We can see in the logs the value is empty after the first step.
                When i read the documentation for "Change per second" it states:
                As this calculation may produce floating-point numbers, it is recommended to set the 'Type of information' to Numeric (float), even if the incoming raw values are integers. This is especially relevant for small numbers where the decimal part matters. If the floating-point values are large and may exceed the 'float' field length in which case the entire value may be lost, it is actually suggested to use Numeric (unsigned) and thus trim only the decimal part.
                But the items are Numeric (Unsigned). So i changed them to Numeric (Float) just to try. This did not solve my problem.

                Then i gave up, and switched templates. From "Network Generic Device by SNMP" to "Linux by SNMP" as these Cumulus Switches are linux at the core. The "Linux by SNMP" also has an Interface Discovery with the same items.
                I must note that these are also Numeric (Unsigned). But they worked. No gaps in my graphs anymore.

                The main difference between these templates is the polling time for the interface items. On the "Network Generic Device by SNMP" the interval is 3m and on "Linux by SNMP" the interval is 1m.

                I changed the interval on the "Network Generic Device by SNMP" template items from 3m to 1m, and this solves the problem. So, somewhere in the "Change per second" calculation if it is around 180 seconds it might cause a problem, and if it's 60 seconds it does not.
                Last edited by ThomasV; 21-11-2024, 20:17.

                Comment


                • tim.mooney
                  tim.mooney commented
                  Editing a comment
                  It's rare to see an issue on these forums that's this well debugged. I know it's easier for you to switch to a different template, but for the sake of others I wish you would report this via support.zabbix.com. Without a support contract there's no guarantee that you'll get a response, but it sure feels like there's a problem hiding here somewhere.
              • cyber
                Senior Member
                Zabbix Certified SpecialistZabbix Certified Professional
                • Dec 2006
                • 4807

                #9
                But both items (bits in/out) have same interval.. same preprocessing.. .. one is OK, one is not.. Still a mystery.

                Comment

                Working...