Zabbix monitoring has delay graph

  • arovah
    Member
    • Nov 2020
    • 43

    #1

    Zabbix monitoring has delay graph

    Dear Colleagues,

    Is there any explanation for why the graphs on my Zabbix are always delayed (by about 1-2 hours)? This always happens at night. The topology is as follows: device/agent -> Zabbix proxy (2 pollers/Zabbix proxy) -> Zabbix server -> database server. I'm using Zabbix server version 4.2.0, and the database is PostgreSQL 9.6. When the graphs are delayed and I check the performance of the Zabbix proxy, the history syncer process is always above 70%, with a maximum of 100%. Here is the configuration in zabbix_server.conf:

    StartPollers=50
    StartPollersUnreachable=25
    StartTrappers=10
    StartPingers=50
    StartDiscoverers=5
    StartAlerters=5
    HousekeepingFrequency=0
    CacheSize=8G
    CacheUpdateFrequency=900
    StartDBSyncers=25
    HistoryCacheSize=2G
    HistoryIndexCacheSize=2G
    TrendCacheSize=2G
    ValueCacheSize=24G
    Timeout=3


    Is there something wrong with my configuration above? How do I solve the slow query on my database, for example:
    slow query: 6.776881 sec, "select eventid,object,objectid from problem where r_eventid is null and source=3 and ( (object=0 and (objectid between 498885 and 498893 or objectid between 1072376 and 1072384 or objectid in (400427,400429,400430,700503,700504,1094354,1094355,1094356,1072367,1072368,1072370,1072373,1072374,1292969,1292981,1293000,1293002,1293003,1293004,1293016,1293023,1309458,1313766))))"
    38319:20210730:230149.747 slow query: 5.018176 sec, "insert into history_uint (itemid,clock,ns,value) values (2205511,1627656447,208711164,2096),(1403623,1627656447,542251607,280),(1403671,1627656447,542251607,1214759048),(1403599,1627656447,542251607,0),(1403656,1627656447,542251607,41695288),(1403732,1627656447,542251607,0),(1403849,1627656447,542251607,0),(1403718,1627656447,542251607,0),(1403561,1627656447,542251607,0),(1403706,1627656447,542251607,296),(1191465,1627656450,989874179,0),(491800,1627656450,989874179,0),(491708,1627656450,994046832,11408),(491682,1627656450,994046832,3984),(2612694,1627656450,994046832,2064),(491786,1627656450,994531540,0),(491787,1627656450,994531540,0),(491701,1627656450,994531540,15584),(491819,1627656450,994531540,32),(491681,1627656450,994531540,0),(1117788,1627656450,994531540,16),(491809,1627656450,994531540,304),(491693,1627656450,994531540,7664),(923930,1627656450,994531540,96),(2612297,1627656450,994531540,184),(491807,16276)

    I would really appreciate any answers and advice.
    Thank you.

    [Attached image: internal_process_busy.png]
    [Attached image: zabbix_proxy_performance.png]
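Editor's note: before tuning anything, it can help to see which statements dominate the slow-query entries. The following is a minimal Python sketch, assuming the log lines look like the two excerpts quoted in this post. The helper names (`classify`, `summarize`) and the shortened sample lines are illustrative and not part of Zabbix.

```python
import re
from collections import defaultdict

# Matches entries such as:
#   38319:20210730:230149.747 slow query: 5.018176 sec, "insert into history_uint ..."
SLOW_QUERY = re.compile(r'slow query: (?P<secs>\d+\.\d+) sec, "(?P<sql>.*)')


def classify(sql):
    """Return (verb, table) for a SQL statement, or (verb, '?') if no table is found."""
    words = sql.strip().lower().split()
    verb = words[0] if words else "?"
    if verb == "update" and len(words) > 1:
        return verb, words[1]
    # The table name follows "into" for inserts and "from" for selects/deletes.
    keyword = "into" if verb == "insert" else "from"
    if keyword in words:
        idx = words.index(keyword)
        if idx + 1 < len(words):
            return verb, words[idx + 1]
    return verb, "?"


def summarize(lines):
    """Return {(verb, table): (count, total_seconds)} for all slow-query lines."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in lines:
        match = SLOW_QUERY.search(line)
        if not match:
            continue  # not a slow-query entry
        key = classify(match.group("sql"))
        totals[key][0] += 1
        totals[key][1] += float(match.group("secs"))
    return {key: (count, round(secs, 6)) for key, (count, secs) in totals.items()}


# Shortened versions of the two log entries quoted above (value lists trimmed).
sample = [
    'slow query: 6.776881 sec, "select eventid,object,objectid from problem '
    'where r_eventid is null and source=3"',
    '38319:20210730:230149.747 slow query: 5.018176 sec, "insert into history_uint '
    '(itemid,clock,ns,value) values (2205511,1627656447,208711164,2096)"',
]

if __name__ == "__main__":
    for (verb, table), (count, secs) in summarize(sample).items():
        print(f"{verb:<7}{table:<15}{count:>4} entries {secs:>10.3f} s total")
```

Fed the full server log instead of `sample`, the summary would show whether the time goes mostly into history inserts or into reads on tables such as `problem`.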
  • arovah
    Member
    • Nov 2020
    • 43

    #2
    Hi splitek,

    Thank you for your response. Do you know how far I should increase the number of DB syncers, given a DB server with 32 CPU cores and 64 GB of memory?

    Thanks

    • splitek
      Senior Member
      • Dec 2018
      • 101

      #3
      Settings should be based on NVPS (new values per second) and on the hardware (disks). I suggest first trying the default: StartDBSyncers=4

      You have HousekeepingFrequency=0. Are you using partitioning? If so, 0 is OK. If not, or if partitioning is done by TSDB (TimescaleDB), I suggest staying with the default of 1.

      I wonder why the "data sender" line shows "no data". The proxy sends data to Zabbix, so there should be something (perhaps Zabbix 4.x does not have that process, or maybe there are problems with sending data to Zabbix?)

      Try looking in Zabbix at the graphs of the "cache" items (such as zabbix[rcache,buffer,pused], zabbix[wcache,history,pused] and others). They tell you whether settings like HistoryCacheSize are optimal.

      Do you have the DB and Zabbix on the same host? If so, in my opinion more memory should go to PostgreSQL. It's hard to tell which settings are good for a particular DB without testing and constantly monitoring it, but here is a web page that could help you set the memory for the DB:

      If you are using TSDB, there is a tool, "timescaledb-tune", which sets PostgreSQL memory for use with TimescaleDB.

      • arovah
        Member
        • Nov 2020
        • 43

        #4
        Hi splitek,

        Thanks for your reply. I'm not sure about decreasing to StartDBSyncers=4, because when the graph delay happens the history syncer goes up to 100%. I tried increasing StartDBSyncers from 25 to 30, and for a while it was better. But two days after the increase, the same problem happened again. We don't use database partitioning, but we have a separate server for the database. Can you suggest the best configuration based on the data below?
        Number of hosts (enabled/disabled/templates): 17736 (16903 / 706 / 127)
        Number of items (enabled/disabled/not supported): 1432344 (1348313 / 42306 / 41725)
        Number of triggers (enabled/disabled [problem/ok]): 715747 (693939 / 21808 [30106 / 663833])
        I have also attached the Zabbix cache graph. I would really appreciate any suggestions and support.
        Thank you
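Editor's note: earlier in the thread it was suggested that settings be based on NVPS. The figures in this post do not include update intervals, so the following Python sketch only brackets the load: it assumes every enabled item produces one value per interval, and the three average intervals are purely hypothetical. The function name `estimate_nvps` is illustrative.

```python
def estimate_nvps(enabled_items, avg_interval_seconds):
    """Rough new-values-per-second estimate: one value per enabled item per interval."""
    if avg_interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return enabled_items / avg_interval_seconds


ENABLED_ITEMS = 1348313  # enabled items, taken from the status figures in this post

if __name__ == "__main__":
    # Hypothetical average update intervals in seconds; the real mix is not given here.
    for interval in (60, 120, 300):
        nvps = estimate_nvps(ENABLED_ITEMS, interval)
        print(f"{interval:>4} s average interval -> about {nvps:,.0f} NVPS")
```

With a 60-second average interval this works out to roughly 22,000 values per second. The "Required server performance, new values per second" row of the Zabbix system information report should give the actual figure.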

        • splitek
          Senior Member
          • Dec 2018
          • 101

          #5
          From the cache graph we can see that cache usage is low. For the trend, history index, and value caches (parameters: TrendCacheSize, HistoryIndexCacheSize, ValueCacheSize), the values could even be lower.
          The graph also shows that the Zabbix history write cache goes from an average of ~1.7% to a maximum of ~24%. This happens between 20:00 and 23:00. The question is whether it happens periodically (every day, or at other intervals). Maybe something is running on the DB server during those hours (a backup?).
          Maybe the monitored hosts are offline during those hours? That can switch many triggers to the problem state at once, which generates many events, so the DB has much more to write.
          I suggest looking into the PostgreSQL logs.
          Some configuration parameters should be tuned (e.g. shared_buffers, effective_cache_size, work_mem, maintenance_work_mem). This link can help: https://blog.crunchydata.com/blog/op...er-performance
          and this tool: https://pgtune.leopard.in.ua/#/

          It is also worth considering a PG version upgrade. v9.6 is supported only until November of this year. Newer versions of PG also perform better.
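Editor's note: as a rough illustration of how the four parameters named above scale with memory, here is a Python sketch that applies common rules of thumb to the 64 GB database server mentioned earlier in the thread. It is not the algorithm used by pgtune. The percentages, the 2 GB cap, the `max_connections=200` default, and the function name are all assumptions chosen for illustration, and it presumes the host is dedicated to PostgreSQL.

```python
def suggest_pg_memory(total_ram_gb, max_connections=200):
    """Rule-of-thumb starting points for a dedicated PostgreSQL host (assumptions only)."""
    shared_buffers_gb = total_ram_gb // 4            # ~25% of RAM
    effective_cache_size_gb = total_ram_gb * 3 // 4  # ~75% of RAM (planner hint only)
    # RAM/16, capped at 2 GB, for VACUUM and index builds.
    maintenance_work_mem_mb = min(2048, total_ram_gb * 1024 // 16)
    # Split the memory outside shared_buffers across connections, allowing
    # for roughly three sort/hash operations per connection.
    remaining_mb = (total_ram_gb - shared_buffers_gb) * 1024
    work_mem_mb = remaining_mb // (max_connections * 3)
    return {
        "shared_buffers": f"{shared_buffers_gb}GB",
        "effective_cache_size": f"{effective_cache_size_gb}GB",
        "maintenance_work_mem": f"{maintenance_work_mem_mb}MB",
        "work_mem": f"{work_mem_mb}MB",
    }


if __name__ == "__main__":
    # 64 GB is the DB server size mentioned earlier in the thread.
    for name, value in suggest_pg_memory(64).items():
        print(f"{name} = {value}")
```

Values like these are only a starting point; the PostgreSQL logs and the cache graphs should decide whether they need to go up or down.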

          • arovah
            Member
            • Nov 2020
            • 43

            #6
            Hi splitek,

            I really appreciate your reply. It sometimes happens from after 22:00 until 3 am or later, and then it is normal again. You are correct, there is a backup process on our database, but from the morning until before 22:00 it always happens.
            I'm really out of ideas for this. Do you have any solution, or can anyone else help me? Please find the attached file; it has happened again.

            • splitek
              Senior Member
              • Dec 2018
              • 101

              #7
              Can you consider upgrading PG to v12 or v13, and then Zabbix to at least 5.0?
              I am also thinking of defining a maintenance period for the time when these breaks happen and adding to it the hosts that have a large number of items/triggers. Maybe some hosts "spam" events during those hours? I had a situation like that, but every time it ended in a Zabbix restart, not in problems with syncers (this was Zabbix 4.4, as far as I remember).
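Editor's note: one way to test the idea that some hosts "spam" events at night is to count events per host and hour. The Python sketch below assumes the events have already been exported as (host, Unix timestamp) pairs; how to get that export is not covered here. The host names, the threshold, and the function name are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def noisy_hosts_by_hour(events, threshold):
    """Return {(host, utc_hour): count} for host/hour buckets with count >= threshold.

    events is an iterable of (host, unix_clock) pairs.
    """
    counts = Counter()
    for host, clock in events:
        hour = datetime.fromtimestamp(clock, tz=timezone.utc).hour
        counts[(host, hour)] += 1
    return {key: n for key, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    midnight = 1627603200  # 2021-07-30 00:00:00 UTC, used only as a sample base
    # Hypothetical export: one host firing repeatedly at 22:00, another once at 03:00.
    sample_events = [("sw-core-1", midnight + 22 * 3600 + i) for i in range(5)]
    sample_events.append(("sw-edge-2", midnight + 3 * 3600))
    for (host, hour), n in noisy_hosts_by_hour(sample_events, threshold=3).items():
        print(f"{host}: {n} events in hour {hour:02d}:00 UTC")
```

Hosts that show up repeatedly in the night-time buckets would be the natural candidates for the maintenance period. The hours are in UTC, so they need shifting to the server's local time before comparing with the graphs.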

              • arovah
                Member
                • Nov 2020
                • 43

                #8
                Hi splitek,

                Many thanks for your reply. We will consider it.

                Thank you
