Zabbix proxy - a lot of queues.

  • Himura
    Junior Member
    • Feb 2021
    • 14

    #1

    Zabbix proxy - a lot of queues.

    Hello everyone,

    I'd like to ask for help and best practices on tuning the Zabbix server and proxies.

    What I have:

    Server side: 6 GB RAM, 4 cores of a Xeon CPU E5620 2.40 GHz, 100 GB HDD; the Zabbix environment runs in Docker containers (server 5.4.2 + PostgreSQL + nginx).
    Proxy side (active mode, using SQLite3): 4 GB RAM, 2 Xeon CPU E5-2620 v3 processors, 10 GB HDD.

    PostgreSQL tuning is as follows (anything not listed is at its default value):
    max_connections = 198
    shared_buffers = 1536MB
    effective_cache_size = 4608MB
    maintenance_work_mem = 768MB
    checkpoint_completion_target = 0.9
    wal_buffers = 16MB
    default_statistics_target = 500
    random_page_cost = 4
    effective_io_concurrency = 2
    work_mem = 1985kB
    min_wal_size = 4GB
    max_wal_size = 16GB
    max_worker_processes = 4
    max_parallel_workers_per_gather = 2
    max_parallel_workers = 4
    max_parallel_maintenance_workers = 2
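For a rough sense of how these PostgreSQL values fit on the 6 GB server, the arithmetic can be sketched in Python. This is only an illustration under stated assumptions: the worst case is taken as every connection holding exactly one `work_mem` allocation at once (a single query can use several), only one maintenance operation is counted, and `effective_cache_size` is left out because it is a planner hint rather than allocated memory. The function and constant names are hypothetical.

```python
# Values copied from the post; the estimation model itself is an assumption.
TOTAL_RAM_MB = 6 * 1024          # server RAM stated in the post
SHARED_BUFFERS_MB = 1536
MAINTENANCE_WORK_MEM_MB = 768
WORK_MEM_KB = 1985
MAX_CONNECTIONS = 198


def worst_case_work_mem_mb(work_mem_kb: int, connections: int) -> float:
    """Memory used if every connection holds one work_mem allocation at once."""
    return work_mem_kb * connections / 1024


def postgres_budget_mb() -> float:
    """shared_buffers + one maintenance operation + worst-case work_mem."""
    return (SHARED_BUFFERS_MB
            + MAINTENANCE_WORK_MEM_MB
            + worst_case_work_mem_mb(WORK_MEM_KB, MAX_CONNECTIONS))


if __name__ == "__main__":
    budget = postgres_budget_mb()
    print(f"PostgreSQL worst-case estimate: {budget:.0f} MB "
          f"({budget / TOTAL_RAM_MB:.0%} of RAM)")
```

The same host also runs the Zabbix server and nginx containers, so whatever this estimate leaves over is shared with them.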
    Server configuration (anything not listed is at its default value):
    ZBX_STARTPOLLERS: 90
    ZBX_STARTPREPROCESSORS: 60
    ZBX_STARTPOLLERSUNREACHABLE: 30
    ZBX_STARTPINGERS: 100

    ZBX_TIMEOUT: 30
    ZBX_IPMIPOLLERS: 10
    ZBX_STARTTRAPPERS: 20
    ZBX_STARTDBSYNCERS: 5

    ZBX_CACHESIZE: 128M
    ZBX_VALUECACHESIZE: 128M
    ZBX_HISTORYCACHESIZE: 60M
    ZBX_HISTORYINDEXCACHESIZE: 30M
    ZBX_TRENDCACHESIZE: 30M

    ZBX_MAXHOUSEKEEPERDELETE: 10000
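To put the server settings above in proportion to the 4 available cores, the listed process counts and cache sizes can be added up. The sketch below only sums the values given in the post; processes left at their defaults are not counted, and the helper names are hypothetical.

```python
# Values copied from the server configuration in the post.
PROCESSES = {
    "ZBX_STARTPOLLERS": 90,
    "ZBX_STARTPREPROCESSORS": 60,
    "ZBX_STARTPOLLERSUNREACHABLE": 30,
    "ZBX_STARTPINGERS": 100,
    "ZBX_IPMIPOLLERS": 10,
    "ZBX_STARTTRAPPERS": 20,
    "ZBX_STARTDBSYNCERS": 5,
}
CACHES_MB = {
    "ZBX_CACHESIZE": 128,
    "ZBX_VALUECACHESIZE": 128,
    "ZBX_HISTORYCACHESIZE": 60,
    "ZBX_HISTORYINDEXCACHESIZE": 30,
    "ZBX_TRENDCACHESIZE": 30,
}
CPU_CORES = 4  # stated in the post


def total_processes(processes: dict) -> int:
    """Number of explicitly configured worker processes."""
    return sum(processes.values())


def total_cache_mb(caches: dict) -> int:
    """Sum of the explicitly configured cache sizes, in MB."""
    return sum(caches.values())


if __name__ == "__main__":
    procs = total_processes(PROCESSES)
    print(f"{procs} configured processes, {procs / CPU_CORES:.1f} per core")
    print(f"{total_cache_mb(CACHES_MB)} MB of configured caches")
```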
    Proxy configuration (anything not listed is at its default value):
    DBName=/tmp/zabbix_proxy.db
    StartPollers=40
    StartPollersUnreachable=5
    StartTrappers=10
    StartPingers=10
    CacheSize=124M
    StartDBSyncers=10
    HistoryCacheSize=32M
    HistoryIndexCacheSize=12M
    Timeout=15
    The server graphs show the following about the server status:
    [Image: data gathering process busy.png]
    [Image: internal process busy.png]

    It looks OK, but the server performance is horrible:

    [Image: server performance 1 day.png]

    I'm using some LLD discovery scripts to detect the network interface parameters and the status of wireless APs on the controller side. From time to time I run into a very strange situation: of the items created by the same discovery rule, some work fine while others, at the same time, do not:
    [Image: sample interface working.png]

    [Image: image_18517.png]

    I would really appreciate your help!

    With best regards, Max.
    Last edited by Himura; 08-03-2021, 10:57.
  • Himura
    Junior Member
    • Feb 2021
    • 14

    #2
    Update: even the items that were working fine started returning the error "SNMP error (tooBig)" after new devices were added to Zabbix monitoring.

    • Himura
      Junior Member
      • Feb 2021
      • 14

      #3
      UPD: restarting the proxy service triggers a complete re-creation of the database on the proxy side, and the unsupported items keep switching their status from "Not supported" to "Enabled" and back again.

      • Himura
        Junior Member
        • Feb 2021
        • 14

        #4
        UPD: I did an unlink and clear of all the data as a test. Now I see the following situation:
        [Image: queue overview.png]

        Could someone help, please?

        • Hamardaban
          Senior Member
          Zabbix Certified SpecialistZabbix Certified Professional
          • May 2019
          • 2713

          #5
          Your problem is not in the proxy settings (although you can reduce the number of pollers that run) but in how the SNMP subsystem works.
          Try toggling the bulk request checkbox in the host settings.
          Look in the logs of the monitored devices for errors. Update the device firmware. If you use SNMPv3, try switching to v2 (it removes the load of encryption processing).
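As background to the bulk request advice: a "tooBig" error means the device could not fit the reply to a request into a single response message, so asking for fewer objects per request is the usual way around it. The sketch below only illustrates that batching idea in Python; it is not how Zabbix implements bulk requests. `batch_oids` and `max_per_request` are hypothetical names, and the OIDs are built from the IF-MIB `ifHighSpeed` column (1.3.6.1.2.1.31.1.1.1.15) purely as example input.

```python
from typing import Iterator, List


def batch_oids(oids: List[str], max_per_request: int) -> Iterator[List[str]]:
    """Split a list of OIDs into requests of at most max_per_request objects."""
    if max_per_request < 1:
        raise ValueError("max_per_request must be at least 1")
    for start in range(0, len(oids), max_per_request):
        yield oids[start:start + max_per_request]


if __name__ == "__main__":
    # Example input only: ifHighSpeed for interface indexes 1..10.
    oids = [f"1.3.6.1.2.1.31.1.1.1.15.{index}" for index in range(1, 11)]
    for request_number, batch in enumerate(batch_oids(oids, 4), start=1):
        print(f"request {request_number}: {len(batch)} objects")
```

With a batch size of 1 this degenerates into one object per request, which is roughly what disabling bulk requests amounts to: more round trips, but smaller replies.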

          • Himura
            Junior Member
            • Feb 2021
            • 14

            #6
            Hello Hamardaban and thank you for your support!

            I've turned off bulk requests as you advised. The situation is now as follows:
            [Image: изображение_2021-03-14_194208.png]

            In the proxy logs I can see a lot of similar entries:

            41551:20210314:204256.684 SNMP agent item "ap-clients-[]" on host "Wireless Controller" failed: first network error, wait for 15 seconds
            41559:20210314:204311.697 resuming SNMP agent checks on host "Wireless Controller": connection restored
            41524:20210314:204356.716 SNMP agent item "ap-eth-download[]" on host "Wireless Controller" failed: first network error, wait for 15 seconds
            41563:20210314:204411.282 resuming SNMP agent checks on host "Wireless Controller": connection restored
            41534:20210314:204456.781 SNMP agent item "ap-eth-download[]" on host "Wireless Controller" failed: first network error, wait for 15 seconds
            41559:20210314:204511.648 resuming SNMP agent checks on host "Wireless Controller": connection restored
            41552:20210314:204556.432 SNMP agent item "net.if.speed[ifHighSpeed.14]" on host "Wireless Controller" failed: first network error, wait for 15 seconds
            41559:20210314:204611.986 resuming SNMP agent checks on host "Wireless Controller": connection restored
            41538:20210314:204656.496 SNMP agent item "ap-radio-state-2ghz-[]" on host "Wireless Controller" failed: first network error, wait for 15 seconds
            41562:20210314:204712.009 resuming SNMP agent checks on host "Wireless Controller": connection restored
            But whenever a new entry appears, I am able to run snmpwalk for that specific parameter and receive the needed data.

            Any advice or help would be really appreciated.
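To see which items fail most often and how long each interruption lasts, proxy log lines like the ones quoted above can be summarized with a short script. This is a minimal sketch that assumes the `PID:YYYYMMDD:HHMMSS.mmm message` layout and the two message texts exactly as they appear in the post; the regular expressions and function names are hypothetical, and the sample input is the first four quoted lines.

```python
import re
from collections import Counter
from datetime import datetime

# Layout assumed from the quoted lines: PID:YYYYMMDD:HHMMSS.mmm message
LOG_LINE = re.compile(
    r'^\s*(?P<pid>\d+):(?P<date>\d{8}):(?P<time>\d{6}\.\d{3})\s+(?P<msg>.*)$'
)
FAILED = re.compile(
    r'SNMP agent item "(?P<item>[^"]+)" on host "(?P<host>[^"]+)" failed'
)
RESTORED = re.compile(r'resuming SNMP agent checks on host "(?P<host>[^"]+)"')


def parse_timestamp(date: str, time: str) -> datetime:
    """Turn the date and time fields of a log line into a datetime."""
    return datetime.strptime(f"{date} {time}", "%Y%m%d %H%M%S.%f")


def summarize(lines):
    """Return (failures per item, seconds from each failure to recovery)."""
    failures = Counter()
    outages = []
    pending = {}  # host -> time of the failure still awaiting recovery
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue
        timestamp = parse_timestamp(match["date"], match["time"])
        failed = FAILED.search(match["msg"])
        if failed:
            failures[failed["item"]] += 1
            pending.setdefault(failed["host"], timestamp)
            continue
        restored = RESTORED.search(match["msg"])
        if restored and restored["host"] in pending:
            started = pending.pop(restored["host"])
            outages.append((timestamp - started).total_seconds())
    return failures, outages


SAMPLE = [
    '41551:20210314:204256.684 SNMP agent item "ap-clients-[]" on host "Wireless Controller" failed: first network error, wait for 15 seconds',
    '41559:20210314:204311.697 resuming SNMP agent checks on host "Wireless Controller": connection restored',
    '41524:20210314:204356.716 SNMP agent item "ap-eth-download[]" on host "Wireless Controller" failed: first network error, wait for 15 seconds',
    '41563:20210314:204411.282 resuming SNMP agent checks on host "Wireless Controller": connection restored',
]

if __name__ == "__main__":
    failures, outages = summarize(SAMPLE)
    for item, count in failures.most_common():
        print(f"{item}: {count} failure(s)")
    print("seconds until recovery:", [round(seconds, 3) for seconds in outages])
```

Run against the full proxy log, a summary like this makes it easier to tell whether the failures cluster on particular items or recur at a fixed interval.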
