No Data triggers firing between 1AM and 4AM every day

  • cdecarlo
    Junior Member
    • Nov 2019
    • 23

    #1

    Hey everybody,

    I have been dealing with a maddening problem with our Zabbix installation. Every morning between 1AM and 4AM, many of our triggers that use the nodata() function fire.

    I'm trying to determine whether something internal to Zabbix happens every morning between 1AM and 4AM, or whether something external to Zabbix is causing these issues.

    The only sign of trouble in the Zabbix logs is an increase in both the number and the duration of "slow query" warnings in the server and proxy log files. During the day we occasionally get "slow query" warnings, with most query times between 3 and 5 seconds. During the 1AM - 4AM window, however, the server log shows many more of these warnings, and for much longer times. The proxy logs almost never contain slow query warnings except between 1AM and 4AM, when we get several with timings between 3 and 5 seconds (the proxies run a local SQLite DB).

    For example, here are two warnings from the server log, one from 09:00 AM this morning and one from 04:02 AM this morning:

    Code:
    16215:20200716:090015.193 slow query: 3.478317 sec, "insert into trends_uint (itemid,clock…..
    Code:
    16217:20200716:040213.392 slow query: 72.393781 sec, "insert into trends_uint (itemid,clock...
    We are looking for anything in our environment that might be causing these slowdowns, but we also wanted to check whether anything is scheduled internally in Zabbix between 1AM and 4AM.
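To compare the daytime and overnight warnings more systematically, the log lines above can be parsed into (timestamp, duration) pairs. The sketch below is a hypothetical helper, not part of Zabbix; it assumes the `<pid>:<YYYYMMDD>:<HHMMSS.mmm> slow query: <seconds> sec` layout shown in the two examples and skips every other line.

```python
import re
from datetime import datetime

# Hypothetical parser for lines shaped like the examples above:
#   <pid>:<YYYYMMDD>:<HHMMSS.mmm> slow query: <seconds> sec, "<sql>"
SLOW_QUERY_RE = re.compile(
    r"^\s*(?P<pid>\d+):(?P<date>\d{8}):(?P<time>\d{6}\.\d{3}) "
    r"slow query: (?P<secs>\d+(?:\.\d+)?) sec"
)


def parse_slow_query(line):
    """Return (timestamp, seconds) for a slow-query line, or None otherwise."""
    m = SLOW_QUERY_RE.match(line)
    if m is None:
        return None
    # %f accepts the 3-digit millisecond field and pads it to microseconds.
    ts = datetime.strptime(m["date"] + m["time"], "%Y%m%d%H%M%S.%f")
    return ts, float(m["secs"])


def slow_queries(lines, min_secs=0.0):
    """Yield parsed (timestamp, seconds) pairs lasting at least min_secs."""
    for line in lines:
        parsed = parse_slow_query(line)
        if parsed is not None and parsed[1] >= min_secs:
            yield parsed
```

Feeding it the lines of a server log file (for example `slow_queries(open(path), min_secs=10)`, where `path` is whatever log location your installation uses) keeps only the long overnight warnings.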

    Interestingly, these triggers usually show the "resolved time before the alert time" symptom. I guess that makes sense if everything has slowed down and the server cannot process the item data from all of our proxy servers fast enough, even though the monitored hosts were actually up and running.
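One way to see why a processing backlog can look like missing data is a toy model: a value only counts for nodata() once the server has finished processing it. This is a simplified illustration under that assumption, not Zabbix's actual timer logic; the `Value` type and `nodata_fires` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Value:
    clock: int         # when the value was collected (seconds)
    processed_at: int  # when the server finished processing/storing it


def nodata_fires(values, now, period):
    """Toy nodata(period): True if no value already processed by `now`
    was collected within the last `period` seconds."""
    return not any(
        v.processed_at <= now and v.clock > now - period for v in values
    )


# Item collected every 60 s; values from clock 120 on are stuck behind a
# slow database and only get processed at t=420.
delayed = [Value(0, 1), Value(60, 61), Value(120, 420), Value(180, 420),
           Value(240, 420), Value(300, 420)]
normal = [Value(c, c + 1) for c in range(0, 360, 60)]
```

With `period=180`, the delayed series fires at `now=300` even though the host kept sending data, while the normal series does not. The value that clears the problem was collected at clock 180, before the problem was raised at 300; assuming the recovery is stamped with that value's clock, this would produce the "resolved before alert" ordering.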

    We are currently running Zabbix 5.0.1 on Ubuntu 18.04 with PostgreSQL 10.12 (on VMware; the VM has 8 CPUs and 12 GB RAM). I have turned off housekeeping for items/trends because a custom script prunes these records every hour (no partitioning is currently in use). Our configuration runs perfectly fine except between 1AM and 4AM, when everything slows down dramatically. The slowdown does not last the whole hour: it lasts about 10 minutes at the start of each hour, and then things calm down until the next hour.

    So...

    1:00 AM - 1:10 AM -> SLOW
    1:11 AM - 1:59 AM -> Normal
    2:00 AM - 2:10 AM -> SLOW
    2:11 AM - 2:59 AM -> Normal
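To check the "first 10 minutes of each hour" pattern against the logs, slow-query entries can be grouped by hour of day and by whether they fall in that window. This is a hypothetical helper; it takes (timestamp, seconds) pairs and treats minutes 0 through 10 as the slow window, matching the schedule above.

```python
from collections import defaultdict
from datetime import datetime


def summarize_by_window(entries, slow_minutes=10):
    """Group (timestamp, seconds) pairs by hour of day and by whether they
    fall within the first `slow_minutes` minutes (inclusive) of the hour.

    Returns {(hour, in_window): (count, max_seconds)}.
    """
    counts = defaultdict(int)
    worst = defaultdict(float)
    for ts, secs in entries:
        key = (ts.hour, ts.minute <= slow_minutes)
        counts[key] += 1
        worst[key] = max(worst[key], secs)
    return {key: (counts[key], worst[key]) for key in counts}
```

If the problem is really confined to the top of the hour, the `(hour, True)` buckets for the overnight hours should dominate both the counts and the maximum durations.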

    Does anybody know if there is something scheduled in Zabbix internally on the hour ONLY between 1AM and 4AM (or something that behaves very differently between those hours) that might explain these symptoms? We are still looking at our environment to see if there is something external causing these issues.
  • cdecarlo
    Junior Member
    • Nov 2019
    • 23

    #2
    OK, so I'm going to answer my own question.

    We ended up migrating our Postgres DB storage from SAS to SSD storage on our NetApp filer. This seems to have reduced disk latency during the 1AM - 4AM window enough to stop the "No Data" alerts from firing. We are following up with NetApp to find out what is happening on our filers between 1AM and 4AM that slows disk performance down so much. I still occasionally see "slow query" warnings in the logs during that window, but the longest ones are now around 16 seconds, and most are around 4 seconds. Zabbix seems to be very sensitive to DB performance: we were seeing high write latency on the disks while the alerts were firing, which slowed the DB down enough that it could not keep up with incoming metrics.
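A rough way to watch write latency from inside the VM, before and after a storage change, is to time small synchronous writes on the volume that holds the database. This is a hypothetical sketch, not a replacement for proper storage monitoring; it assumes you point it at a directory on the same filesystem as the PostgreSQL data directory, and it uses 8 KB writes because that is PostgreSQL's default block size.

```python
import os
import tempfile
import time


def fsync_latencies(directory, samples=5, block_size=8192):
    """Write `block_size` bytes and fsync, `samples` times, in a temporary
    file under `directory`; return each write+fsync duration in ms."""
    payload = os.urandom(block_size)
    results = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(samples):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the data to stable storage
            results.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)  # clean up the probe file
    return results
```

Running it periodically (for example every minute from cron) and logging the maximum value would show whether latency spikes line up with the top of each hour between 1AM and 4AM.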


    • Hamardaban
      Senior Member
      Zabbix Certified Specialist, Zabbix Certified Professional
      • May 2019
      • 2713

      #3
      For Zabbix, the most important component is the database. Poor database performance or delays strongly affect the operation of the entire system.
      The documentation says this explicitly: https://www.zabbix.com/documentation...ormance_tuning
