Zabbix http poller processes more than 75% busy

  • moses.moore
    Junior Member
    • Dec 2014
    • 24

    #1

    Zabbix http poller processes more than 75% busy

    How would I diagnose this, to see what the http poller processes are getting hung up on?

    I've been seeing more of this warning lately. I've only got 48 webchecks, and the load average for the machine isn't increased when I see this trigger, nor am I noticing slow response times for the remote servers.

    I tried running a little bash script to monitor http sites while Zabbix does, and the bash script didn't notice timeouts when the Zabbix http poller processes did get hung up (and it's not my script causing the problem, the problem started before I tried this).

    Can I get more info about the http poller processes, such as how long each webcheck takes to complete or which ones are timing out?
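    For an independent comparison like the bash script mentioned above, here is a minimal Python sketch that times each fetch and records timeouts. The function names, the URL list, and the 15-second timeout are assumptions for illustration; this measures from outside Zabbix and does not inspect the http poller itself.

```python
import time
import urllib.request


def time_fetch(url, timeout=15.0, opener=urllib.request.urlopen):
    """Fetch url once and return (seconds_elapsed, bytes_read, error_or_None)."""
    start = time.monotonic()
    size = 0
    error = None
    try:
        # Read the whole body so a stalled transfer shows up in the timing.
        with opener(url, timeout=timeout) as resp:
            size = len(resp.read())
    except Exception as exc:  # e.g. URLError or a socket timeout
        error = repr(exc)
    return time.monotonic() - start, size, error


def report(urls, timeout=15.0):
    """Print one tab-separated line per URL: elapsed time, size, status."""
    for url in urls:
        elapsed, size, error = time_fetch(url, timeout=timeout)
        print(f"{url}\t{elapsed:.3f}s\t{size} bytes\t{error or 'ok'}")
```

    Calling `report(["http://example.com/"])` with your own list of webcheck URLs (the one shown is a placeholder) prints a line per site, which can be compared against the times when the trigger fires.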
  • BDiE8VNy
    Senior Member
    • Apr 2010
    • 680

    #2
    This pre-defined trigger indicates that the Zabbix http poller processes spend more than 75% of their time busy processing web scenarios.

    Zabbix has separate process types for different tasks; http poller (web scenarios) is one of them. For many process types, Zabbix lets you configure how many processes of that type are started. The workload is then balanced equally among these processes.

    Each process in Zabbix generally processes only one task (e.g. gathering an item value) at a time. There might be exceptions like SNMP bulk-requests though.

    In your case, consider increasing StartHTTPPollers.
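    To see why a handful of slow scenarios can keep a poller busy without raising the machine's load average, here is a rough back-of-the-envelope sketch. It is a simplified model, not how Zabbix computes the figure: it assumes every scenario runs once per update interval and that a poller handles one scenario at a time. The durations, the 60-second interval, and the function name are made up for illustration.

```python
def poller_busy_percent(durations_s, interval_s, pollers):
    """Rough share of time the pollers spend working, as a percentage.

    durations_s: seconds each web scenario takes per run (hypothetical values)
    interval_s:  update interval shared by all scenarios (assumed)
    pollers:     number of http poller processes (StartHTTPPollers)
    """
    work = sum(durations_s)           # seconds of work needed per interval
    capacity = interval_s * pollers   # seconds of poller time per interval
    return 100.0 * work / capacity


# 48 scenarios: 45 finish in 0.5 s, 3 wait out a 15 s timeout.
durations = [0.5] * 45 + [15.0] * 3

print(poller_busy_percent(durations, interval_s=60, pollers=1))  # 112.5
print(poller_busy_percent(durations, interval_s=60, pollers=2))  # 56.25
```

    Under these assumptions, a poller waiting on a timeout is idle from the CPU's point of view but still counts as busy, and doubling the number of pollers halves the estimate.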

    You might also be interested in "Monitoring how busy Zabbix processes are" and "Runtime loglevel changing"


    • moses.moore
      Junior Member
      • Dec 2014
      • 24

      #3
      in your case consider increasing StartHTTPPollers.
      The name suggests that's how many poller processes Zabbix starts with, but there is no explicit maximum. The documentation you quoted says "The upper limit used to be 255 before version 1.8.5." so maybe it's just an unfortunate name. I'll try that.

      After upping the log level, I'm seeing many messages like the following:

      Code:
      cannot process step "GET /" of web scenario "filmsite" on host "<omitted>": 
      Timeout was reached: Operation timed out 
      after 15737 milliseconds with 34048 bytes received
      ... but 34048 bytes is the size of the front page of filmsite. It looks like the http poller fetches the entire page, then waits for more for fifteen seconds, then gives up and reports that it didn't get what it was looking for? That's really weird.
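      To find out which scenarios time out and how much data had arrived by then, messages like the one above can be pulled out of the log with a small parser. This is a sketch that assumes the message has exactly the wording shown in this post (it may differ between Zabbix versions); the function name and the returned field names are made up for illustration.

```python
import re

# Matches the message format quoted above, after whitespace is normalised.
TIMEOUT_RE = re.compile(
    r'cannot process step "(?P<step>[^"]*)" '
    r'of web scenario "(?P<scenario>[^"]*)" '
    r'on host "(?P<host>[^"]*)": '
    r'(?P<reason>.*?) after (?P<ms>\d+) milliseconds '
    r'with (?P<bytes>\d+) bytes received'
)


def parse_step_failure(message):
    """Return the fields of a failed-step message as a dict, or None."""
    # The message may be wrapped over several lines; collapse the whitespace.
    flat = " ".join(message.split())
    match = TIMEOUT_RE.search(flat)
    if match is None:
        return None
    fields = match.groupdict()
    fields["ms"] = int(fields["ms"])
    fields["bytes"] = int(fields["bytes"])
    return fields


sample = '''cannot process step "GET /" of web scenario "filmsite" on host "<omitted>":
Timeout was reached: Operation timed out
after 15737 milliseconds with 34048 bytes received'''

print(parse_step_failure(sample))
```

      Counting the results per scenario shows whether the timeouts are concentrated on a few sites, and comparing the byte count with the known page size shows whether the full page had already arrived, as it had here.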
