Zabbix history syncer processes more than 75% busy in Zabbix 4.4

  • hamidhszd
    Junior Member
    • Jun 2019
    • 8

    #1

    Hi guys,
    I have a serious problem on my Zabbix server: I am getting a lot of alerts about busy processes:

    Zabbix history syncer processes more than 75% busy /
    Zabbix alerter processes more than 75% busy /
    Zabbix escalator processes more than 75% busy /
    Zabbix discoverer processes more than 75% busy /

    LogFileSize=0
    StartPollers=500
    StartPollersUnreachable=400
    StartPingers=30
    StartHTTPPollers=60
    StartVMwareCollectors=30
    VMwareCacheSize=1G
    CacheSize=3G
    HistoryCacheSize=2012M
    HistoryIndexCacheSize=100M
    TrendCacheSize=1G
    ValueCacheSize=400M
    Timeout=20
    UnavailableDelay=3
    UnreachableDelay=2
    ExternalScripts=/usr/lib/zabbix/externalscripts
    LogSlowQueries=3000


    I would really appreciate your help with this issue.

    [Attached image: Capture.PNG]
  • doctorbal82
    Member
    • Oct 2016
    • 39

    #2
    hamidhszd,

    How large is your Zabbix database? Which database are you using (MariaDB, PostgreSQL)?

    When History Syncer processes stay busy for long periods, you can always raise the HistoryCacheSize parameter in your zabbix_server.conf file, as long as the system has memory to spare.

    I see your housekeeper process is also taking some time to run. This typically indicates that the database is struggling to clean up "old" items, triggers, etc.

    Depending on the database you use, I would suggest implementing partitioning.

    If you use PostgreSQL, consider looking into TimescaleDB (https://www.zabbix.com/documentation...ll/timescaledb) or implementing a third-party partitioning tool. I wrote a bit about doing this with pg_partman for PostgreSQL here: https://github.com/Doctorbal/zabbix-...s-partitioning

    Partitioning will be your biggest performance improvement in the long term.

    In the interim, try increasing HistoryCacheSize slightly and see whether that reduces the alerts you are seeing on a regular basis.

    Best Regards,
    Andreas

    • tim.mooney
      Senior Member
      • Dec 2012
      • 1427

      #3
      In addition to all the good questions and advice that doctorbal82 had, what does your server have for new values per second (NVPS)?

      500 pollers and 400 unreachable pollers seem really high unless you have a massive environment. Setting those too high (when not needed) seems to be another source of contention that can hurt performance.
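
For readers unsure what NVPS measures: it is roughly the number of item values the server has to process and store each second. The server's own figure is the one to quote, but it can be cross-checked from item counts and update intervals. A minimal Python sketch, assuming hypothetical item groups (the counts and intervals below are made up, not taken from this thread):

```python
def estimate_nvps(item_groups):
    """Estimate new values per second from (item_count, update_interval_seconds) pairs."""
    total = 0.0
    for count, interval_s in item_groups:
        if interval_s <= 0:
            raise ValueError("update interval must be positive")
        # Each item contributes one value every interval_s seconds.
        total += count / interval_s
    return total


# Hypothetical environment: 12000 items polled every 60 s, 6000 items every 30 s.
groups = [(12000, 60), (6000, 30)]
print(f"estimated NVPS: {estimate_nvps(groups):.0f}")
```

Halving an update interval doubles that group's share of the load, so long intervals on low-value items are the cheapest way to bring the figure down.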

      • ingus.vilnis
        Senior Member
        Zabbix Certified Trainer
        Zabbix Certified Specialist
        Zabbix Certified Professional
        • Mar 2014
        • 908

        #4
        Poor database performance and the lack of history* and trends* table partitioning are the key here.

        The Zabbix server configuration file as posted also does not make sense on its own; it must be viewed in a larger context, along with telemetry from the Zabbix internal processes and caches.

        Sorry, but I have to totally disagree with the suggestion to raise HistoryCacheSize even more. (2012M: why such a weird value anyway?) 128M is enough for most Zabbix setups, and this cache should be 99% free all the time. If your history cache is filling up, it is most likely because the History Syncers cannot keep up. Adding more cache just buys a bit more time until a full crash or total freeze of Zabbix, nothing more. If the history cache is full, you will never recover from it anyway. And don't think adding more History Syncers will help; it won't.
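
To put rough numbers on that argument: if values arrive faster than the history syncers write them out, the cache fills at the difference between the two rates, so a bigger cache only postpones the moment it is full. A back-of-the-envelope Python sketch; the rates and the bytes-per-value figure are assumptions picked for illustration, not measurements from this server and not Zabbix internals:

```python
def seconds_until_full(cache_bytes, incoming_nvps, synced_nvps, bytes_per_value):
    """Return seconds until the cache is full, or None if the syncers keep up."""
    backlog_rate = incoming_nvps - synced_nvps  # values per second piling up
    if backlog_rate <= 0:
        return None  # cache drains as fast as it fills; size is irrelevant
    return cache_bytes / (backlog_rate * bytes_per_value)


MIB = 1024 * 1024
# Assumed: 400 values/s arriving, the database absorbing only 300 values/s,
# and 100 bytes of cache per value.
for size_mib in (128, 2012):
    t = seconds_until_full(size_mib * MIB, incoming_nvps=400,
                           synced_nvps=300, bytes_per_value=100)
    print(f"{size_mib}M cache: full after about {t / 3600:.1f} hours")
```

Under these assumptions both sizes end in a full cache; only the delay differs. The fix has to raise the write rate on the database side.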

        So start with a performance analysis of the Zabbix server and the database. Search the topics here; there are a few good ones that will get you started.

        • hamidhszd
          Junior Member
          • Jun 2019
          • 8

          #5
          Originally posted by doctorbal82
          Thanks a lot for your answer.
          But as you know, the HistoryCacheSize range is 128K-2G, so I can't increase it much further.
          I also don't think it is a cache problem.
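
As a quick check of how close 2012M is to the 2G limit mentioned above, here is a small Python sketch that converts the size suffixes to bytes. It assumes the K/M/G suffixes are binary multiples (1024-based); the helper names are made up for this example:

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def to_bytes(value):
    """Convert a size such as '2012M' or '128K' to bytes (1024-based suffixes)."""
    value = value.strip().upper()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)  # plain number of bytes, no suffix


def headroom(value, upper="2G"):
    """Bytes remaining between a configured size and the upper limit."""
    return to_bytes(upper) - to_bytes(value)


print(headroom("2012M") // UNITS["M"], "MiB left below 2G")
```

With 1024-based units, 2G is 2048M, so the configured value sits 36M below the ceiling.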


          By the way, I'm using MariaDB and the database is very big.
          Could you please tell me more about partitioning?

          And what about moving the DB to a separate server?


          [Attached image: Capture.PNG]

          • hamidhszd
            Junior Member
            • Jun 2019
            • 8

            #6
            Originally posted by tim.mooney

            I've reduced the two poller settings to 50 and 40, but nothing changed.

            [Attached image: Capture.PNG]

            • hamidhszd
              Junior Member
              • Jun 2019
              • 8

              #7
              Originally posted by ingus.vilnis
              Thanks for your advice, but I've searched a lot and couldn't find anything useful related to my problem.

            • doctorbal82
              Member
              • Oct 2016
              • 39

              #8
              Very interesting insight from ingus.vilnis that increasing HistoryCacheSize will not be the solution here. Now that I see your graph of cache usage, I can see that the cache is indeed not the issue.

              As mentioned, partitioning will be your best friend. I do not have experience with MySQL partitioning, but try searching this forum or check out https://zabbix.org/wiki/Main_Page for some ideas.

              If you want to migrate to PostgreSQL from MySQL check out a recent YouTube guide by the official Zabbix team: https://youtu.be/S-C5NCZJnt0

              There are some fundamentals you can look at to see whether they help resolve the problem:
              • Separate the services: 3 separate servers/VMs; 1 for database; 1 for Zabbix Server; 1 for Zabbix Web Server
              • Use SSD over HDD, particularly for the database. This is where you will see a large improvement.
              • Ensure you provide the database plenty of memory and tune the settings appropriately. Search the forum for MariaDB tuning.
              • Ensure you provide the Zabbix Server plenty of CPU for processing items quickly and efficiently.
              • Ensure templates are "lean and mean". Keep them simple and ensure the item processing is simple and effective.
              • Ensure you use the latest software packages to mitigate security issues and bugs across the OS, Zabbix, the DB, the web server (apache, nginx), etc.
              • Partition! It is daunting to implement but worth it. Once Zabbix NVPS hits over 1000 you will see a lot more problems if no partitioning is in place.
              I'm sure the esteemed individuals on this forum can provide more performance improvements but this is an immediate list off the top of my head.

              Best Regards,
              Andreas

              • ingus.vilnis
                ingus.vilnis commented
                Hi Andreas,

                Yes, History Cache has to be free all the time. If it is not - you have problems elsewhere, more cache will not solve it.

                Partitioning is the way to go but I see that this instance has only 400 nvps so it could do well without partitioning too. Just make sure the DB is fast.

                Migration from one DB engine to another: no. If you made a poor choice of DB engine (with the added inability to tune it properly for the intended workload), then migrating to another engine will require the same tuning and will also introduce a heap of problems caused by mistakes made during the migration. Watching a YouTube video is one thing; doing it in production is completely different. Postgres will not solve the problems in this particular case.

                Separation of services is good, but again it might be overkill for a 400 nvps setup. The good thing, though, is that you get a more isolated view of the problems, e.g. which component (Zabbix, DB or web) is causing the load. You can't tell that for sure on an all-in-one server.

                SSD for the database is good, that is true, but again, at 400 nvps any disk will do. How did we live 5 years ago when SSD was a luxury?

                RAM for innodb_buffer_pool_size is crucial for MySQL. Give it 50-75% of total RAM of DB server.
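
The 50-75% guideline translates into a concrete range once the DB server's RAM is known. A tiny Python sketch of the arithmetic; the 16 GB figure is a hypothetical example, not the RAM of the server in this thread, and the function name is made up:

```python
def buffer_pool_range_gib(total_ram_gib, low=0.50, high=0.75):
    """Return the (low, high) innodb_buffer_pool_size range in GiB for the 50-75% rule."""
    return total_ram_gib * low, total_ram_gib * high


# Hypothetical dedicated DB server with 16 GB of RAM.
low, high = buffer_pool_range_gib(16)
print(f"innodb_buffer_pool_size between {low:.0f}G and {high:.0f}G")
```

The rule assumes the machine is dedicated to the database; on an all-in-one server the Zabbix caches and the web frontend need their share of RAM first.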

                Zabbix server does not need plenty of CPU. Nor RAM. Nor disk. I am looking now at an instance of 4000 nvps (10 times more than here). 4 cores @30% utilized, 8GB RAM.

                Clean templates: yes. Monitor only things that you need and UNDERSTAND the meaning of. Otherwise it is a waste of power and data. If possible, use Zabbix agent (active) type items for standard server telemetry.

                Latest packages: yes, but with caution. I have stayed on an older but stable Zabbix for a longer time just because newer releases broke some features.

                And again on partitioning: it is best configured on a clean, empty server. The problem with big existing databases is that it requires a longer downtime.
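
For readers wondering what time-based partitioning looks like on MySQL/MariaDB, the usual idea is RANGE partitions on the integer `clock` column of the history and trends tables. The Python sketch below only generates the SQL text; it is an illustration under assumptions, not a tested procedure. The start date and the one-partition-per-day scheme are hypothetical choices, the last partition must cover the newest existing rows, and new partitions have to be added ahead of time by some scheduled job that is not shown here.

```python
from datetime import datetime, timedelta, timezone


def daily_partitions(table, first_day, days):
    """Build an ALTER TABLE statement with one RANGE partition per day on `clock`."""
    parts = []
    for i in range(days):
        day = first_day + timedelta(days=i)
        # Upper bound is midnight of the following day, as a Unix timestamp.
        upper = int((day + timedelta(days=1)).timestamp())
        parts.append(f"  PARTITION p{day:%Y_%m_%d} VALUES LESS THAN ({upper})")
    body = ",\n".join(parts)
    return f"ALTER TABLE {table} PARTITION BY RANGE (clock) (\n{body}\n);"


# Hypothetical start date; boundaries are computed in UTC.
start = datetime(2020, 5, 1, tzinfo=timezone.utc)
print(daily_partitions("history", start, 3))
```

On a large existing table such a statement rewrites the whole table, which is the downtime mentioned above. Old data is then removed by dropping whole partitions rather than by row-by-row deletes.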

                Best Regards,
                Ingus

              • doctorbal82
                doctorbal82 commented
                Excellent, professional insight ingus.vilnis! Thank you for breaking it down from your experience.

                I am actually quite surprised that the Zabbix server does not need that much CPU, RAM or disk. 4000 nvps on 4 cores and 8GB RAM is amazing! I am not at that scale yet but will keep that in mind.

                Also good call out on the latest packages. Something to keep in mind for sure.

                Best Regards,
                Andreas

              • ingus.vilnis
                ingus.vilnis commented
                Please keep in mind that those mentioned 4 cores are just for Zabbix server alone.

                DB runs separately on 10 cores / 40 GB RAM using Percona MySQL.

                The web frontend runs on two nginx servers (2 cores, 4 GB each), load-balanced by F5, handling 50-70 simultaneous users daily.
            • rightkick
              Junior Member
              • May 2021
              • 3

              #9
              I had the same warnings and the web UI was very slow. Setting the following on my MariaDB server made a huge difference:

              Code:
              key_buffer_size = 1G
              query_cache_size = 16M
              max_heap_table_size = 32M
              innodb_buffer_pool_size = 2G

              • Gusalma
                Gusalma commented
                This solved the "Utilization of history syncer internal processes" problem for me; it went from 100% in use to 5%. Thanks.