NVPS is too low with MariaDB 10.11 and Zabbix 5.0.41

  • firesh
    Junior Member
    • Aug 2021
    • 18

    #1

    Hi,

    My NVPS is only ~200-300, which is weird: before migrating from MariaDB 10.4 to MariaDB 10.11, our NVPS used to be ~1.2K.
    To be specific, I am using Zabbix 5.0.41, running in containers.

    I have 6794 hosts and 311258 items.
    I have 1 proxy, but the server itself also handles ~2k of those hosts through passive checks.

    The server has 40 cores and 377 GB of RAM.

    Below is the current configuration that I use:
    Code:
    -e ZBX_STARTPOLLERS=1000
    -e ZBX_STARTPREPROCESSORS=300
    -e ZBX_STARTPOLLERSUNREACHABLE=300
    -e ZBX_MAXHOUSEKEEPERDELETE=100000
    -e ZBX_STARTTRAPPERS=20
    -e ZBX_STARTPINGERS=100
    -e ZBX_STARTDISCOVERERS=100
    -e ZBX_STARTHTTPPOLLERS=20
    -e ZBX_STARTALERTERS=50
    -e ZBX_CACHESIZE=10G
    -e ZBX_STARTDBSYNCERS=100
    -e ZBX_HISTORYCACHESIZE=512M
    -e ZBX_HISTORYINDEXCACHESIZE=512M
    -e ZBX_TRENDCACHESIZE=512M
    -e ZBX_VALUECACHESIZE=1G
    -e ZBX_STARTPROXYPOLLERS=10
    -e ZBX_PROXYCONFIGFREQUENCY=120
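To put this configuration in perspective, the Python sketch below parses the `-e` flags exactly as posted, adds up the `ZBX_START*` process counts, and totals the shared-memory caches. It only counts the process types listed here (the server also starts internal processes such as the configuration syncer and housekeeper, which are not included), and it treats the `K`/`M`/`G` suffixes as powers of 1024, which is an assumption about how the sizes are interpreted.

```python
import re

# The -e flags from the post, copied as-is.
DOCKER_FLAGS = """
-e ZBX_STARTPOLLERS=1000
-e ZBX_STARTPREPROCESSORS=300
-e ZBX_STARTPOLLERSUNREACHABLE=300
-e ZBX_MAXHOUSEKEEPERDELETE=100000
-e ZBX_STARTTRAPPERS=20
-e ZBX_STARTPINGERS=100
-e ZBX_STARTDISCOVERERS=100
-e ZBX_STARTHTTPPOLLERS=20
-e ZBX_STARTALERTERS=50
-e ZBX_CACHESIZE=10G
-e ZBX_STARTDBSYNCERS=100
-e ZBX_HISTORYCACHESIZE=512M
-e ZBX_HISTORYINDEXCACHESIZE=512M
-e ZBX_TRENDCACHESIZE=512M
-e ZBX_VALUECACHESIZE=1G
-e ZBX_STARTPROXYPOLLERS=10
-e ZBX_PROXYCONFIGFREQUENCY=120
"""

# Assumption: size suffixes are binary multiples.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}
CACHE_VARS = ("ZBX_CACHESIZE", "ZBX_HISTORYCACHESIZE", "ZBX_HISTORYINDEXCACHESIZE",
              "ZBX_TRENDCACHESIZE", "ZBX_VALUECACHESIZE")

def parse_flags(text):
    """Map each '-e NAME=VALUE' line to {NAME: VALUE}."""
    return dict(re.findall(r"-e\s+(ZBX_\w+)=(\S+)", text))

def to_bytes(size):
    """Convert '512M' or '10G' style sizes to bytes; bare numbers are bytes."""
    suffix = size[-1].upper()
    if suffix in UNITS:
        return int(size[:-1]) * UNITS[suffix]
    return int(size)

env = parse_flags(DOCKER_FLAGS)
# Each ZBX_START* variable sets how many processes of one type are started.
processes = sum(int(v) for k, v in env.items() if k.startswith("ZBX_START"))
cache_bytes = sum(to_bytes(env[k]) for k in CACHE_VARS)

print(f"worker processes from ZBX_START*: {processes}")
print(f"per core on a 40-core host: {processes / 40:.0f}")
print(f"shared caches: {cache_bytes / 1024**3:.1f} GiB")
```

With the posted values this reports 2000 processes (about 50 per core) and 12.5 GiB of shared caches.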
    I also frequently get this error message:
    Code:
    601]: ERROR [file and function: <db.c,DBget_nextid>, revision:c0c2d49, line:814] Something impossible has just happened.
    601:20240228:070710.160 === Backtrace: ===
    601:20240228:070710.160 18: /usr/sbin/zabbix_server: discoverer #11 [processed 0 rules in 0.000000 sec, performing discovery](zbx_backtrace+0x58) [0x5555556d3dd8]
    601:20240228:070710.160 17: /usr/sbin/zabbix_server: discoverer #11 [processed 0 rules in 0.000000 sec, performing discovery](DBget_maxid_num+0x256) [0x55555572f966]
    601:20240228:070710.160 16: /usr/sbin/zabbix_server: discov
    My MariaDB 10.11 InnoDB parameters:
    Code:
    Name    Value
    innodb_adaptive_flushing    ON
    innodb_adaptive_flushing_lwm    10.000000
    innodb_adaptive_hash_index    OFF
    innodb_adaptive_hash_index_parts    8
    innodb_autoextend_increment    64
    innodb_autoinc_lock_mode    1
    innodb_buf_dump_status_frequency    0
    innodb_buffer_pool_chunk_size    251658240
    innodb_buffer_pool_dump_at_shutdown    ON
    innodb_buffer_pool_dump_now    OFF
    innodb_buffer_pool_dump_pct    25
    innodb_buffer_pool_filename    ib_buffer_pool
    innodb_buffer_pool_load_abort    OFF
    innodb_buffer_pool_load_at_startup    ON
    innodb_buffer_pool_load_now    OFF
    innodb_buffer_pool_size    16106127360
    innodb_change_buffer_max_size    25
    innodb_change_buffering    none
    innodb_checksum_algorithm    full_crc32
    innodb_cmp_per_index_enabled    OFF
    innodb_compression_algorithm    zlib
    innodb_compression_default    OFF
    innodb_compression_failure_threshold_pct    5
    innodb_compression_level    6
    innodb_compression_pad_pct_max    50
    innodb_data_file_path    ibdata1:12M:autoextend
    innodb_data_home_dir    
    innodb_deadlock_detect    ON
    innodb_deadlock_report    full
    innodb_default_encryption_key_id    1
    innodb_default_row_format    dynamic
    innodb_defragment    OFF
    innodb_defragment_fill_factor    0.900000
    innodb_defragment_fill_factor_n_recs    20
    innodb_defragment_frequency    40
    innodb_defragment_n_pages    7
    innodb_defragment_stats_accuracy    0
    innodb_disable_sort_file_cache    OFF
    innodb_doublewrite    ON
    innodb_encrypt_log    OFF
    innodb_encrypt_tables    OFF
    innodb_encrypt_temporary_tables    OFF
    innodb_encryption_rotate_key_age    1
    innodb_encryption_rotation_iops    100
    innodb_encryption_threads    0
    innodb_fast_shutdown    1
    innodb_fatal_semaphore_wait_threshold    600
    innodb_file_per_table    ON
    innodb_fill_factor    100
    innodb_flush_log_at_timeout    1
    innodb_flush_log_at_trx_commit    2
    innodb_flush_method    O_DIRECT
    innodb_flush_neighbors    1
    innodb_flush_sync    ON
    innodb_flushing_avg_loops    30
    innodb_force_primary_key    OFF
    innodb_force_recovery    0
    innodb_ft_aux_table    
    innodb_ft_cache_size    8000000
    innodb_ft_enable_diag_print    OFF
    innodb_ft_enable_stopword    ON
    innodb_ft_max_token_size    84
    innodb_ft_min_token_size    3
    innodb_ft_num_word_optimize    2000
    innodb_ft_result_cache_limit    2000000000
    innodb_ft_server_stopword_table    
    innodb_ft_sort_pll_degree    2
    innodb_ft_total_cache_size    640000000
    innodb_ft_user_stopword_table    
    innodb_immediate_scrub_data_uncompressed    OFF
    innodb_instant_alter_column_allowed    add_drop_reorder
    innodb_io_capacity    200
    innodb_io_capacity_max    2000
    innodb_lock_wait_timeout    50
    innodb_log_buffer_size    16777216
    innodb_log_file_buffering    ON
    innodb_log_file_size    536870912
    innodb_log_group_home_dir    ./
    innodb_lru_flush_size    32
    innodb_lru_scan_depth    1536
    innodb_max_dirty_pages_pct    90.000000
    innodb_max_dirty_pages_pct_lwm    0.000000
    innodb_max_purge_lag    0
    innodb_max_purge_lag_delay    0
    innodb_max_purge_lag_wait    4294967295
    innodb_max_undo_log_size    10485760
    innodb_monitor_disable    
    innodb_monitor_enable    
    innodb_monitor_reset    
    innodb_monitor_reset_all    
    innodb_old_blocks_pct    37
    innodb_old_blocks_time    1000
    innodb_online_alter_log_max_size    134217728
    innodb_open_files    2000
    innodb_optimize_fulltext_only    OFF
    innodb_page_size    16384
    innodb_prefix_index_cluster_optimization    ON
    innodb_print_all_deadlocks    OFF
    innodb_purge_batch_size    300
    innodb_purge_rseg_truncate_frequency    128
    innodb_purge_threads    8
    innodb_random_read_ahead    OFF
    innodb_read_ahead_threshold    56
    innodb_read_io_threads    64
    innodb_read_only    OFF
    innodb_read_only_compressed    OFF
    innodb_rollback_on_timeout    OFF
    innodb_sort_buffer_size    1048576
    innodb_spin_wait_delay    4
    innodb_stats_auto_recalc    ON
    innodb_stats_include_delete_marked    OFF
    innodb_stats_method    nulls_equal
    innodb_stats_modified_counter    0
    innodb_stats_on_metadata    OFF
    innodb_stats_persistent    ON
    innodb_stats_persistent_sample_pages    20
    innodb_stats_traditional    ON
    innodb_stats_transient_sample_pages    8
    innodb_status_output    OFF
    innodb_status_output_locks    OFF
    innodb_strict_mode    ON
    innodb_sync_spin_loops    30
    innodb_table_locks    ON
    innodb_temp_data_file_path    ibtmp1:12M:autoextend
    innodb_tmpdir    
    innodb_undo_directory    ./
    innodb_undo_log_truncate    OFF
    innodb_undo_tablespaces    0
    innodb_use_atomic_writes    ON
    innodb_use_native_aio    OFF
    innodb_write_io_threads    32
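The dump reports sizes in raw bytes. The small Python helper below (a sketch; the variable names and values are copied from the dump, everything else is illustrative) converts the byte-valued settings into binary units and relates the buffer pool to the host's memory. It reads the "377 GB" from the post as 377 GiB, which is an approximation.

```python
# Byte-valued settings copied from the InnoDB variable dump above.
INNODB = {
    "innodb_buffer_pool_size": 16106127360,
    "innodb_buffer_pool_chunk_size": 251658240,
    "innodb_log_file_size": 536870912,
    "innodb_log_buffer_size": 16777216,
}
HOST_RAM_GIB = 377  # "377 GB of RAM" from the post, read as GiB (approximation)

def human(n):
    """Format a byte count using binary units (KiB, MiB, GiB, ...)."""
    value = float(n)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if value < 1024:
            return f"{value:g} {unit}"
        value /= 1024
    return f"{value:g} PiB"

for name, size in INNODB.items():
    print(f"{name:32} {human(size)}")

pool = INNODB["innodb_buffer_pool_size"]
chunk = INNODB["innodb_buffer_pool_chunk_size"]
print(f"buffer pool = {pool // chunk} chunks, remainder {pool % chunk} bytes")
print(f"buffer pool share of host RAM: {pool / 1024**3 / HOST_RAM_GIB:.1%}")
```

For the posted values the buffer pool works out to 15 GiB (64 chunks of 240 MiB), roughly 4% of the host's memory, and the redo log file is 512 MiB.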
    Any help would be greatly appreciated.

    #strugglingsoul
  • cyber
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Dec 2006
    • 4807

    #2
    Have you taken an inventory? Are all the items still present and receiving data, especially discovered items?

    About your config:
    100 discoverers? Are you doing some massive network discovery all the time? That looks like overkill, and your server also complains about discovery and impossible things...
    100 DB syncers? The default of 4 is OK for 4k NVPS; beyond that you add 1 or 2, not 96. Increasing this may do more harm than good.
    10 proxy pollers when you have just one proxy is a waste of resources.
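Read literally, the rule of thumb here is: keep the default 4 DB syncers up to roughly 4k NVPS, then grow the pool slowly. The Python sketch below encodes that; the step of one extra syncer per additional 2k NVPS is an assumption made for illustration, not a Zabbix formula.

```python
import math

def suggested_db_syncers(nvps, base=4, base_nvps=4000, step_nvps=2000):
    """Hypothetical heuristic: `base` syncers up to `base_nvps`, then one more
    for each started `step_nvps` above that (the step size is an assumption)."""
    if nvps <= base_nvps:
        return base
    return base + math.ceil((nvps - base_nvps) / step_nvps)

CONFIGURED = 100  # ZBX_STARTDBSYNCERS from the original post
for nvps in (300, 1200, 6000):
    s = suggested_db_syncers(nvps)
    print(f"{nvps:>5} NVPS -> {s} syncers ({CONFIGURED - s} fewer than configured)")
```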


    • firesh
      Junior Member
      • Aug 2021
      • 18

      #3
      Originally posted by cyber
      Have you taken an inventory? Are all the items still present and receiving data, especially discovered items?

      About your config:
      100 discoverers? Are you doing some massive network discovery all the time? That looks like overkill, and your server also complains about discovery and impossible things...
      100 DB syncers? The default of 4 is OK for 4k NVPS; beyond that you add 1 or 2, not 96. Increasing this may do more harm than good.
      10 proxy pollers when you have just one proxy is a waste of resources.
      100 discoverers: I went by 1 discovery rule = 1 thread per subnet scan. I already have 42 subnets in the list, so that is approximately 42 threads; I just doubled it and rounded up.
      100 DB syncers: all right, this is probably my bad. However, it worked efficiently with MariaDB 10.4, and with MariaDB 10.11 it went to shit. So maybe MariaDB 10.11 became more efficient?
      10 proxy pollers: well, I actually have 4 proxies, but the rest are online with only discovery happening, nothing else (almost abandoned), so the logic I was going with was 1 proxy = 1 proxy poller.

      Do educate me if these thoughts are completely absurd.
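The discoverer sizing described above (one process per subnet rule, doubled, then rounded up) reduces to a one-line calculation. In this Python sketch, rounding up to the next multiple of 100 is an assumption, chosen because it reproduces the configured value of 100.

```python
import math

def discoverers_for(subnet_rules, headroom=2, round_to=100):
    """One discoverer per subnet rule, multiplied by `headroom`, then rounded
    up to a multiple of `round_to` (the rounding step is an assumption)."""
    return math.ceil(subnet_rules * headroom / round_to) * round_to

print(discoverers_for(42))               # 42 * 2 = 84, rounded up to 100
print(discoverers_for(42, round_to=10))  # 84 rounded up to 90
```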


      • firesh
        Junior Member
        • Aug 2021
        • 18

        #4
        You mentioned that it is the discoverer that hits "Something impossible has just happened".
        But it's not just the discoverer; it happens with the trapper as well. To my knowledge I do not have anything running on the trapper, unless proxy pollers use the trapper, since my proxies are active.
        Everything else is passive.

        However, I have a bad feeling about this one:
        Code:
        zabbix_server [601]: ERROR [file and function: <db.c,DBget_nextid>, revision:c0c2d49, line:814] Something impossible has just happened.
           601:20240229:085134.772 === Backtrace: ===
           601:20240229:085134.772 18: /usr/sbin/zabbix_server: discoverer #11 [processed 0 rules in 0.000000 sec, performing discovery](zbx_backtrace+0x58) [0x5555556d3dd8]
           601:20240229:085134.772 17: /usr/sbin/zabbix_server: discoverer #11 [processed 0 rules in 0.000000 sec, performing discovery](DBget_maxid_num+0x256) [0x55555572f966]
           601:20240229:085134.772 16: /usr/sbin/zabbix_server: discoverer #11 [processed 0 rules in 0.000000 sec, performing discovery](+0x1e33ca) [0x5555557373ca]
           601:20240229:085134.772 15: /usr/sbin/zabbix_server: discoverer #11 [processed 0 rules in 0.000000 sec, performing discovery](DBdelete_triggers+0x18d) [0x55555573a97d]
           601:20240229:085134.772 14: /usr/sbin/zabbix_server: discoverer #11 [processed 0 rules in 0.000000 sec, performing discovery](+0x1e6b05) [0x55555573ab05]
           601:20240229:085134.772 13: /usr/sbin/zabbix_server: discoverer #11 [processed 0 rules in 0.000000 sec, performing discovery](DBdelete_items+0x636) [0x55555573b576]
           601:20240229:085134.772 12: /usr/sbin/zabbix_server: discoverer #11 [processed 0 rules in 0.000000 sec, performing discovery](DBdelete_hosts+0x184) [0x55555573c7c4]
           601:20240229:085134.772 11: /usr/sbin/zabbix_server: discoverer #11 [processed 0 rules in 0.000000 sec, performing discovery](DBdelete_hosts_with_prototypes+0xe0) [0x55555573ee80]
           601:20240229:085134.772 10: /usr/sbin/zabbix_server: discoverer #11 [processed 0 rules in 0.000000 sec, performing discovery](op_host_del+0x9e) [0x55555566db5e]
           601:20240229:085134.772 9: /usr/sbin/zabbix_server: discoverer #11 [processed 0 rules in 0.000000 sec, performing discovery](process_actions+0x5d8) [0x555555666ea8]
           601:20240229:085134.772 8: /usr/sbin/zabbix_server: discoverer #11 [processed 0 rules in 0.000000 sec, performing discovery](+0x1150a6) [0x5555556690a6]
           601:20240229:085134.772 7: /usr/sbin/zabbix_server: discoverer #11 [processed 0 rules in 0.000000 sec, performing discovery](zbx_process_events+0x253) [0x55555566ac23]
           601:20240229:085134.772 6: /usr/sbin/zabbix_server: discoverer #11 [processed 0 rules in 0.000000 sec, performing discovery](discoverer_thread+0xf13) [0x5555555b7763]
           601:20240229:085134.772 5: /usr/sbin/zabbix_server: discoverer #11 [processed 0 rules in 0.000000 sec, performing discovery](zbx_thread_start+0x24) [0x5555556df2a4]
           601:20240229:085134.772 4: /usr/sbin/zabbix_server: discoverer #11 [processed 0 rules in 0.000000 sec, performing discovery](MAIN_ZABBIX_ENTRY+0x8c9) [0x5555555a9d69]
           601:20240229:085134.772 3: /usr/sbin/zabbix_server: discoverer #11 [processed 0 rules in 0.000000 sec, performing discovery](daemon_start+0x175) [0x5555556d39b5]
           601:20240229:085134.772 2: /usr/sbin/zabbix_server: discoverer #11 [processed 0 rules in 0.000000 sec, performing discovery](main+0x687) [0x5555555a23e7]
           601:20240229:085134.772 1: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7ffff6ab7083]
        Code:
        zabbix_server [2036]: ERROR [file and function: <db.c,DBget_nextid>, revision:c0c2d49, line:814] Something impossible has just happened.
          2036:20240229:084220.608 === Backtrace: ===
          2036:20240229:084220.608 21: /usr/sbin/zabbix_server: trapper #18 [processing data](zbx_backtrace+0x58) [0x5555556d3dd8]
          2036:20240229:084220.608 20: /usr/sbin/zabbix_server: trapper #18 [processing data](DBget_maxid_num+0x256) [0x55555572f966]
          2036:20240229:084220.608 19: /usr/sbin/zabbix_server: trapper #18 [processing data](+0x1e33ca) [0x5555557373ca]
          2036:20240229:084220.608 18: /usr/sbin/zabbix_server: trapper #18 [processing data](DBdelete_triggers+0x18d) [0x55555573a97d]
          2036:20240229:084220.608 17: /usr/sbin/zabbix_server: trapper #18 [processing data](+0x1e6b05) [0x55555573ab05]
          2036:20240229:084220.608 16: /usr/sbin/zabbix_server: trapper #18 [processing data](DBdelete_items+0x636) [0x55555573b576]
          2036:20240229:084220.608 15: /usr/sbin/zabbix_server: trapper #18 [processing data](DBdelete_hosts+0x184) [0x55555573c7c4]
          2036:20240229:084220.608 14: /usr/sbin/zabbix_server: trapper #18 [processing data](DBdelete_hosts_with_prototypes+0xe0) [0x55555573ee80]
          2036:20240229:084220.608 13: /usr/sbin/zabbix_server: trapper #18 [processing data](op_host_del+0x9e) [0x55555566db5e]
          2036:20240229:084220.608 12: /usr/sbin/zabbix_server: trapper #18 [processing data](process_actions+0x5d8) [0x555555666ea8]
          2036:20240229:084220.608 11: /usr/sbin/zabbix_server: trapper #18 [processing data](+0x1150a6) [0x5555556690a6]
          2036:20240229:084220.608 10: /usr/sbin/zabbix_server: trapper #18 [processing data](zbx_process_events+0x253) [0x55555566ac23]
          2036:20240229:084220.608 9: /usr/sbin/zabbix_server: trapper #18 [processing data](process_proxy_data+0x17c0) [0x55555574d290]
          2036:20240229:084220.608 8: /usr/sbin/zabbix_server: trapper #18 [processing data](zbx_recv_proxy_data+0x39f) [0x5555555dcaef]
          2036:20240229:084220.608 7: /usr/sbin/zabbix_server: trapper #18 [processing data](+0x80c0d) [0x5555555d4c0d]
          2036:20240229:084220.608 6: /usr/sbin/zabbix_server: trapper #18 [processing data](trapper_thread+0x218) [0x5555555d66f8]
          2036:20240229:084220.608 5: /usr/sbin/zabbix_server: trapper #18 [processing data](zbx_thread_start+0x24) [0x5555556df2a4]
          2036:20240229:084220.608 4: /usr/sbin/zabbix_server: trapper #18 [processing data](MAIN_ZABBIX_ENTRY+0x9fb) [0x5555555a9e9b]
          2036:20240229:084220.608 3: /usr/sbin/zabbix_server: trapper #18 [processing data](daemon_start+0x175) [0x5555556d39b5]
          2036:20240229:084220.608 2: /usr/sbin/zabbix_server: trapper #18 [processing data](main+0x687) [0x5555555a23e7]
          2036:20240229:084220.608 1: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7ffff6ab7083]
          2036:20240229:084220.608 0: /usr/sbin/zabbix_server: trapper #18 [processing data](_start+0x2e) [0x5555555a8fce]
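The two backtraces can be compared mechanically. The Python sketch below pulls the symbol names out of backtrace lines and keeps the frames both traces share. The excerpts are shortened copies of the traces above (process titles replaced with `[...]`, only some frames kept), and frames without a symbol, such as `(+0x1e33ca)`, are skipped.

```python
import re

# "<pid>:<date>:<time> <frame>: <binary>: <title>(<symbol>+0x<offset>) [<address>]"
# Group 1 is the frame number, group 2 the symbol (empty for "(+0x1e33ca)").
FRAME = re.compile(r"\s(\d+):\s.*?\((\w*)\+0x[0-9a-f]+\)")

DISCOVERER = """
601:20240229:085134.772 17: /usr/sbin/zabbix_server: discoverer #11 [...](DBget_maxid_num+0x256) [0x55555572f966]
601:20240229:085134.772 16: /usr/sbin/zabbix_server: discoverer #11 [...](+0x1e33ca) [0x5555557373ca]
601:20240229:085134.772 15: /usr/sbin/zabbix_server: discoverer #11 [...](DBdelete_triggers+0x18d) [0x55555573a97d]
601:20240229:085134.772 10: /usr/sbin/zabbix_server: discoverer #11 [...](op_host_del+0x9e) [0x55555566db5e]
601:20240229:085134.772 9: /usr/sbin/zabbix_server: discoverer #11 [...](process_actions+0x5d8) [0x555555666ea8]
601:20240229:085134.772 7: /usr/sbin/zabbix_server: discoverer #11 [...](zbx_process_events+0x253) [0x55555566ac23]
601:20240229:085134.772 6: /usr/sbin/zabbix_server: discoverer #11 [...](discoverer_thread+0xf13) [0x5555555b7763]
"""

TRAPPER = """
2036:20240229:084220.608 20: /usr/sbin/zabbix_server: trapper #18 [...](DBget_maxid_num+0x256) [0x55555572f966]
2036:20240229:084220.608 18: /usr/sbin/zabbix_server: trapper #18 [...](DBdelete_triggers+0x18d) [0x55555573a97d]
2036:20240229:084220.608 13: /usr/sbin/zabbix_server: trapper #18 [...](op_host_del+0x9e) [0x55555566db5e]
2036:20240229:084220.608 12: /usr/sbin/zabbix_server: trapper #18 [...](process_actions+0x5d8) [0x555555666ea8]
2036:20240229:084220.608 10: /usr/sbin/zabbix_server: trapper #18 [...](zbx_process_events+0x253) [0x55555566ac23]
2036:20240229:084220.608 9: /usr/sbin/zabbix_server: trapper #18 [...](process_proxy_data+0x17c0) [0x55555574d290]
"""

def call_chain(log_text):
    """Named frames of a backtrace, outermost caller first."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME.search(line)
        if match and match.group(2):
            frames.append((int(match.group(1)), match.group(2)))
    # Higher frame numbers are deeper in the call stack, so sort ascending.
    return [name for _, name in sorted(frames)]

def shared_path(a, b):
    """Frames of chain `a` that also appear in chain `b`, in `a`'s order."""
    in_b = set(b)
    return [name for name in a if name in in_b]

print(" -> ".join(shared_path(call_chain(DISCOVERER), call_chain(TRAPPER))))
```

For these excerpts it prints `zbx_process_events -> process_actions -> op_host_del -> DBdelete_triggers -> DBget_maxid_num`, which suggests that in both cases the error is raised while an action operation is deleting a host, whether the event came from discovery or from proxy data received by the trapper.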
        Also, does it matter if the database default collation is different? All my tables meet the required collation, but the database default collation isn't a match.
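To compare the database default with the table collations, the values can be read from `information_schema` and checked in Python. The sketch below assumes the target is `utf8_bin`, the collation used in the Zabbix 5.0 MySQL setup instructions, and normalises the `utf8mb3` name that MariaDB 10.6 and later report in place of `utf8`, so that a 10.4 to 10.11 upgrade does not show up as a false mismatch. The sample rows are hypothetical; in practice they would come from the two queries shown.

```python
# Queries whose results feed the check; `%s` is a placeholder for the schema
# name (e.g. 'zabbix') when run through a DB-API driver using that paramstyle.
SCHEMA_SQL = ("SELECT DEFAULT_COLLATION_NAME FROM information_schema.SCHEMATA "
              "WHERE SCHEMA_NAME = %s")
TABLES_SQL = ("SELECT TABLE_NAME, TABLE_COLLATION FROM information_schema.TABLES "
              "WHERE TABLE_SCHEMA = %s")

EXPECTED = "utf8_bin"  # assumption: the collation Zabbix 5.0 expects

def normalise(collation):
    """Map MariaDB 10.6+ 'utf8mb3_*' names back to the older 'utf8_*' spelling."""
    name = collation.lower()
    if name.startswith("utf8mb3_"):
        name = "utf8_" + name[len("utf8mb3_"):]
    return name

def collation_report(schema_default, table_collations):
    """Return whether the schema default matches and which tables do not.
    Views report NULL (None) as TABLE_COLLATION, so they are skipped."""
    mismatched = {table: coll for table, coll in table_collations.items()
                  if coll is not None and normalise(coll) != EXPECTED}
    return normalise(schema_default) == EXPECTED, mismatched

# Hypothetical rows, shaped like the results of the two queries above.
schema_ok, bad_tables = collation_report(
    "latin1_swedish_ci",
    {"items": "utf8mb3_bin", "history": "utf8mb3_bin", "hosts": "utf8mb3_bin"},
)
print(f"schema default matches: {schema_ok}; mismatched tables: {bad_tables}")
```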


        • cyber
          Senior Member
          Zabbix Certified Specialist, Zabbix Certified Professional
          • Dec 2006
          • 4807

          #5
          Originally posted by firesh

          100 discoverers: I went by 1 discovery rule = 1 thread per subnet scan. I already have 42 subnets in the list, so that is approximately 42 threads; I just doubled it and rounded up.
          100 DB syncers: all right, this is probably my bad. However, it worked efficiently with MariaDB 10.4, and with MariaDB 10.11 it went to shit. So maybe MariaDB 10.11 became more efficient?
          10 proxy pollers: well, I actually have 4 proxies, but the rest are online with only discovery happening, nothing else (almost abandoned), so the logic I was going with was 1 proxy = 1 proxy poller.

          Do educate me if these thoughts are completely absurd.
          OK, with a load of network scans it might be fine.
          But the number of DB syncers is definitely too much.
          I have 20 passive proxies and 8 pollers for them, with a usual load of ~20%; it goes higher only when there are network issues. But I don't think having 10 pollers for your 5 proxies will affect much...
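The figures in this reply allow a rough estimate: busy percentage times pool size gives how many pollers are working on average, and dividing by the proxy count gives a per-proxy load. The Python sketch assumes each proxy costs about the same to poll, which is a simplification, and uses the 5 proxies and 10 pollers mentioned here.

```python
def busy_pollers(pool_size, busy_pct):
    """Average number of pollers that are busy at a given utilisation."""
    return pool_size * busy_pct / 100

# 20 passive proxies served by 8 pollers at ~20% busy (figures from this reply).
per_proxy = busy_pollers(8, 20) / 20
# Assumption: the other installation's proxies cost about the same to poll.
expected_busy = per_proxy * 5
idle_share = 1 - expected_busy / 10   # with 10 proxy pollers configured

print(f"load per proxy: {per_proxy:.2f} pollers")
print(f"expected busy pollers for 5 proxies: {expected_busy:.1f} of 10")
print(f"idle share of the pool: {idle_share:.0%}")
```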

          I have a nagging feeling that I have seen similar errors somewhere, probably here on the forum, but the reasons or fixes escape me now.


          • firesh
            Junior Member
            • Aug 2021
            • 18

            #6
            Hi,

            Another thing bothering me: every time I restart Zabbix-Server-Mysql (the Zabbix backend),
            the NVPS spikes temporarily (in this picture up to 500, and I have seen it go up to 8k before),
            but after that it drops tremendously, to only 200 NVPS.

            This is weird behaviour for the pollers, assuming the NVPS count comes from the pollers.

            Also note that a lot of my data comes through SSH agent checks, not really the Zabbix agent.

            [Image: image.png]


            • firesh
              Junior Member
              • Aug 2021
              • 18

              #7
              Another thing I think Zabbix might not have taken into consideration in the 4k NVPS figure you mention:
              that value assumes you are using the Zabbix agent as your collector, not the SSH agent, doesn't it?

              Has Zabbix ever tested what NVPS they get with SSH agent checks?

              I feel like even with 1000 pollers, SSH agent checks are still a bottleneck.
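Whether the pollers are the bottleneck can be estimated with Little's law: the average number of busy pollers equals values per second times the average time each check takes. The Python sketch assumes each poller handles one blocking check at a time; the per-check durations are hypothetical placeholders, not measurements from this thread.

```python
def max_nvps(pollers, seconds_per_check):
    """Throughput ceiling when every poller is busy (Little's law: lambda = L / W)."""
    return pollers / seconds_per_check

def pollers_needed(nvps, seconds_per_check):
    """Pollers that must be busy on average to sustain `nvps` (L = lambda * W)."""
    return nvps * seconds_per_check

POLLERS = 1000  # ZBX_STARTPOLLERS from the original post
# Hypothetical average durations: a fast agent check vs. slower SSH sessions.
for seconds in (0.05, 0.5, 2.0, 4.0):
    print(f"{seconds:>4}s per check: at most {max_nvps(POLLERS, seconds):,.0f} NVPS, "
          f"{pollers_needed(1200, seconds):,.0f} pollers busy for 1.2K NVPS")
```

Measuring the real average SSH check duration and plugging it in would show whether 1000 pollers can sustain the NVPS seen before the migration.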


              • cyber
                Senior Member
                Zabbix Certified Specialist, Zabbix Certified Professional
                • Dec 2006
                • 4807

                #8
                NVPS - Enabled items from monitored hosts are included in the calculation. Log items are counted as one value per item update interval. Regular interval values are counted; flexible and scheduling interval values are not. The calculation is not adjusted during a "nodata" maintenance period. Trapper items are not counted.


                So if your SSH checks use a regular interval, they should be counted...
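The counting rule quoted above translates into a simple sum: every enabled item on a monitored host contributes one value per regular update interval, while trapper items and flexible or scheduling intervals are left out. The Python sketch groups items by type and interval; the groups and their sizes are hypothetical, chosen only to show the calculation.

```python
from dataclasses import dataclass

@dataclass
class ItemGroup:
    item_type: str          # label only; "trapper" is excluded below
    count: int              # number of items sharing this interval
    delay_seconds: int      # regular update interval
    enabled: bool = True    # item enabled and host monitored

def required_nvps(groups):
    """Expected values per second under the counting rule quoted above.
    Log items count as one value per interval, so they need no special case.
    Flexible and scheduling intervals are not modelled here."""
    total = 0.0
    for g in groups:
        if not g.enabled or g.item_type == "trapper":
            continue
        total += g.count / g.delay_seconds
    return total

# Hypothetical split, not taken from the thread.
groups = [
    ItemGroup("ssh", 200_000, 300),
    ItemGroup("zabbix_agent", 100_000, 60),
    ItemGroup("trapper", 11_258, 60),
]
print(f"expected NVPS: {required_nvps(groups):.0f}")
```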
