Zabbix processes busy every Saturday from 5:00 - 8:30 am

  • DTIServicedesk
    Junior Member
    • Sep 2023
    • 9

    #1

    Zabbix processes busy every Saturday from 5:00 - 8:30 am


    Hello everyone

    First of all, here are my versions:
    - Zabbix: 6.0.20 (70 hosts, ~6k items, 800 triggers, ~10k new values / hr)
    - Ubuntu: 22.04.3 LTS (4 CPUs, 8 GB RAM, 50 GB HDD)
    - MySQL: 8.0.34

    Now to my problem:
    I upgraded Zabbix from 3.2.0 to 6.0.20 in August. I built a completely new Ubuntu server environment for Zabbix, and so far everything works fine.
    But ever since I copied the old Zabbix database over to the new one, the Zabbix server monitoring has shown busy processes every Saturday from 5:00 - 8:30 am, starting from the very first Saturday. During this time it is not possible to send any data to Zabbix, which causes many problems that would not occur normally.

    The process graph shows that the history syncer percentage (red line) rises sharply at 5 am. The housekeeper process also increases its busy percentage once it starts.
    I don't know yet whether the high load is caused by a Zabbix-internal process, by another VM on the same ESXi host, or by the ESXi host itself.

    I also checked the logs (agent and server) but found nothing noticeable.

    It feels like a scheduled task (internal or external) starts somewhere and causes this issue every Saturday.

    Does anyone have experience with such problems, or are there any hints I can work with?
    Could the server have too little CPU or memory? Average free memory over the last 7 days is 6 GB, so I don't think it's a memory issue.
    Is there a way to check Zabbix-internal scheduled tasks / cron jobs?
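
    For reference, here is how OS-level scheduled jobs can be listed on Ubuntu (standard cron/systemd tools, nothing Zabbix-specific; a quick sketch of the obvious places to check):
    Code:
    # systemd timers, including inactive ones
    systemctl list-timers --all

    # system-wide cron jobs
    cat /etc/crontab
    ls -l /etc/cron.d /etc/cron.daily /etc/cron.weekly /etc/cron.monthly

    # per-user crontabs (run as root; prints an error if a user has none)
    for u in root zabbix mysql; do echo "== $u =="; crontab -l -u "$u"; done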


    If more information is needed, please ask and I will provide it.

    Thanks in advance for any help.

    Cheers, Simon


    Zabbix server config file:
    Code:
    ListenPort=10051
    LogFile=/var/log/zabbix/zabbix_server.log
    LogFileSize=0
    DebugLevel=3
    PidFile=/run/zabbix/zabbix_server.pid
    SocketDir=/run/zabbix
    DBName=zabbix
    DBUser=zabbix
    DBPassword=PASSWORD
    SNMPTrapperFile=/var/log/snmptrap/snmptrap.log
    Timeout=4
    FpingLocation=/usr/bin/fping
    Fping6Location=/usr/bin/fping6
    LogSlowQueries=3000
    StatsAllowedIP=127.0.0.1
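
    A side note on the config above: with LogSlowQueries=3000, any DB statement taking longer than 3 seconds is written to the server log, so slow queries can be correlated with the Saturday window, for example (a sketch assuming the standard pid:yyyymmdd:hhmmss.mmm log-line prefix):
    Code:
    # count "slow query" log lines per day and hour
    grep 'slow query' /var/log/zabbix/zabbix_server.log \
      | awk -F: '{print $2, substr($3, 1, 2)}' | sort | uniq -c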


  • tim.mooney
    Senior Member
    • Dec 2012
    • 1427

    #2
    Originally posted by DTIServicedesk
    - Ubuntu: 22.04.3 LTS (4 CPUs, 8 GB RAM, 50 GB HDD)

    (each Saturday from 5:00 - 8:30 am).

    or another VM on the same ESXi host or the ESXi host itself.
    More RAM and some tuning for your MySQL install may help, but if the problem happens just once a week and it's a VM in a shared hosting environment, I would say there's about a 75% chance that it's caused by high I/O on whatever backing storage your VM resides upon.

    Do you do some kind of weekly VM snapshots or backups during that time? Some other VM on the same storage that's got a lot of I/O going on because of a weekly scheduled job? If you're not the vSphere/ESXi admin, ask the admin to check the I/O graphs for your storage pool during that time, to see if some other VM is monopolizing the I/O.
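
    If you have shell access to the Zabbix VM, disk I/O can also be captured from inside the guest during the Saturday window, e.g. with iostat from the sysstat package (a sketch; the log path and sampling schedule are assumptions):
    Code:
    # root crontab entry: every Saturday at 04:45, sample extended disk stats
    # every 60 s, 225 times (about 3 h 45 min), appending to a log for review
    # (requires the sysstat package: apt install sysstat)
    45 4 * * 6 /usr/bin/iostat -dxm 60 225 >> /var/log/iostat-saturday.log 2>&1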


    • cyber
      Senior Member
      Zabbix Certified Specialist, Zabbix Certified Professional
      • Dec 2006
      • 4807

      #3
      Yeah... what tim.mooney said. Look around in your environment; Zabbix does not have such weekly jobs. The housekeeper runs every 1 h by default, and the history syncers run constantly. What they have in common is accessing the DB, so if your storage gets slow, the processes cannot write, etc.
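
      For reference, the housekeeper settings can be read from the server config, and a housekeeper run can be triggered by hand to see whether it reproduces the load (HousekeepingFrequency, MaxHousekeeperDelete, and the housekeeper_execute runtime command are standard zabbix_server options):
      Code:
      # show housekeeper settings; unset means the defaults (1 h interval, 5000 rows per delete)
      grep -E '^(HousekeepingFrequency|MaxHousekeeperDelete)' /etc/zabbix/zabbix_server.conf

      # trigger one housekeeper cycle now and watch the internal process graphs
      zabbix_server -R housekeeper_execute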


      • DTIServicedesk
        Junior Member
        • Sep 2023
        • 9

        #4
        Good morning everyone

        Thanks for your replies.

        I checked the backups of the VMs on the same ESXi host. They did start every Saturday morning, so we decided to move them to Sunday. Unfortunately, that didn't change anything; the same issue still appears on Saturday morning.

        Next steps are:
        - increase CPU on the Zabbix VM
        - add ESXi monitoring to Zabbix, to check whether the whole ESXi host is under high load (see the sketch after this list)
        - shut down unnecessary VMs on the same ESXi host
        - check the Zabbix DB and try to improve it
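
        ESXi monitoring needs the VMware collector processes enabled in the server config before the VMware templates can collect anything (a minimal sketch; the collector count is an assumption for a small setup, the other values are the documented defaults):
        Code:
        # /etc/zabbix/zabbix_server.conf -- then restart zabbix-server
        StartVMwareCollectors=2
        VMwareFrequency=60
        VMwarePerfFrequency=60
        VMwareCacheSize=8M
        After the restart, the stock VMware template can be linked to a host whose {$VMWARE.URL} macro points at the vCenter/ESXi SDK endpoint.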

        If someone has another approach, please let me know.

        I will keep posting updates here if something changes.


        Thx and cheers
        Simon


        • DTIServicedesk
          Junior Member
          • Sep 2023
          • 9

          #5

          Hello everyone

          I've got some updates on the issue:
          First of all, the issue is still occurring. What we changed last week:
          - Increased the CPU of the Zabbix host
          - Shut down unnecessary VMs on the ESXi host

          Next steps:
          - Check the MySQL DB and improve it wherever I can

          If anyone else has ideas on this case, please let me know. I don't know where to look anymore.

          Edit: and if someone has a good MySQL optimization guide, I would be very thankful.

          Cheers, Simon

          Screenshots attached:
          2x ESXi performance
          2x VM performance
          1x Zabbix processes
          Attached Files
          Last edited by DTIServicedesk; 09-10-2023, 13:48.


          • cyber
            Senior Member
            Zabbix Certified Specialist, Zabbix Certified Professional
            • Dec 2006
            • 4807

            #6

            A very general optimizing guide from last week's summit:
            https://assets.zabbix.com/files/even..._your_Zabbix_setup.pdf
            It mentions just a couple of things about MySQL... but I don't think those would apply to an instance as small as yours...


            Maybe just a wild guess, but what else is going on in your environment at that time? It seems that the escalator and alerter processes also go up in usage then. Is there an alert storm at that time because other hosts in your environment are doing strange things?
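
            One way to check for an alert storm is to count rows in the alerts table per hour directly in the DB (a sketch against the stock Zabbix schema; add -u/-p credentials as needed):
            Code:
            # alerts per hour over the last 7 days (look for a spike on Saturday 05:00-08:30)
            mysql -D zabbix -e "
            SELECT FROM_UNIXTIME(clock - (clock % 3600)) AS hour_bucket, COUNT(*) AS alerts
            FROM alerts
            WHERE clock > UNIX_TIMESTAMP(NOW()) - 7*86400
            GROUP BY hour_bucket
            ORDER BY hour_bucket;"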


            • DTIServicedesk
              Junior Member
              • Sep 2023
              • 9

              #7
              Hello everyone

              Last week I didn't change much, but I added the database to Zabbix monitoring to check whether I could see anything conspicuous on the database itself.

              After checking the PDF linked above, I changed the following database variables (I did this yesterday, so I have to wait until Saturday to see whether the issue still appears):
              Code:
              ################################
              # Added MySQL variables by SHU #
              ################################
              # 231017 SHU Added innodb_buffer_pool_size = 8G
              # 231017 SHU Added innodb_log_file_size = 512M
              # 231017 SHU Added innodb_file_per_table = 1
              # 231017 SHU Added innodb_log_buffer_size = 4M
              #
              innodb_buffer_pool_size = 8G
              innodb_log_file_size = 512M
              innodb_file_per_table = 1
              innodb_log_buffer_size = 4M
              I also installed mysqltuner and tuning-primer and checked what else I could change (link: https://hostadvice.com/how-to/web-ho...-ubuntu-18-04/).
              Recommendations from mysqltuner:
              Code:
              -------- Recommendations ---------------------------------------------------------------------------
              General recommendations:
                  Check warning line(s) in /var/log/mysql/error.log file
                  MySQL was started within the last 24 hours: recommendations may be inaccurate
                  Configure your accounts with ip or subnets only, then update your configuration with skip-name-resolve=ON
                  We will suggest raising the 'join_buffer_size' until JOINs not using indexes are found.
                           See https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_join_buffer_size
                  Be careful, increasing innodb_redo_log_capacity means higher crash recovery mean time
              Variables to adjust:
                  skip-name-resolve=ON
                  join_buffer_size (> 256.0K, or always use indexes with JOINs)
                  innodb_redo_log_capacity should be (=2G) if possible, so InnoDB Redo log Capacity equals 25% of buffer pool size.
              So I changed the join_buffer_size as well:
              Code:
              # 231018 SHU Added join_buffer_size=524288
              join_buffer_size=524288
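
              All of the new values can be verified at runtime after the MySQL restart (standard SHOW VARIABLES statements; one caveat worth double-checking: innodb_buffer_pool_size = 8G equals the VM's total 8 GB RAM from post #1, which would leave nothing for Zabbix and the OS):
              Code:
              # confirm the running values (sizes are reported in bytes)
              mysql -e "SHOW VARIABLES WHERE Variable_name IN
              ('innodb_buffer_pool_size','innodb_log_file_size',
               'innodb_file_per_table','innodb_log_buffer_size','join_buffer_size');"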

              As visible in the screenshot "231018_Database_BufferUsage.jpg", the buffer utilization dropped by 80%.

              The screenshot "231018_Zabbix_HousekeepingProcess.jpg" also shows slightly lower housekeeper process usage (the changes were made on 17 October at around 3 pm).


              I have some positive vibes that this could help and hopefully sort out our problems this coming Saturday.

              I will give another update next week.

              Cheers, Simon
              Attached Files


              • DTIServicedesk
                Junior Member
                • Sep 2023
                • 9

                #8
                Hello,

                It has been a long time since I last wrote something about this case.

                In the meantime we solved the problem by moving the whole Zabbix VM to another ESXi host, which has considerably more power in terms of RAM, CPU, and disk performance.
                Since then we haven't faced the problem anymore.

                I will close the case (or have it closed).

                Cheers, Simon

