web test log to find out why it failed

  • moses.moore
    Junior Member
    • Dec 2014
    • 24

    #1

    How do I find out why a web test scenario failed? I've only seen records of when it failed, and at what step it failed, but not why that step failed.

    I have a web test scenario that fails often, many times a week, but only for one poll period each time. By the time a human can get to it to test with other tools, Zabbix already reports that it's back to normal. None of the other websites on the same machine are suffering; the web test scenario itself is just looking for a '302 Moved Temporarily' response so I know it's not some website app code gone awry.

    I could set up a little monitoring script to poll at the same time as Zabbix for the same thing, but if I have to do that then why do I have Zabbix in the first place? So I feel there must be a way for Zabbix to tell me why a web test scenario has failed, not just "yes it failed."

    Here's a description of the web test:

    Name: cheers
    Update Interval: 120
    Retries: 1
    Agent: Google Chrome 17
    Variables:
    - {hostname}=[omitted]
    Enabled: true
    Steps:
    - {name: "GET /", url: "http://{hostname}/", follow-redirects: false, timeout: 15, required-status-codes: "301-302"}

    And this is the trigger:

    {mt:web.test.fail[cheers].last()}<>0
  • moses.moore
    Junior Member
    • Dec 2014
    • 24

    #2
    So I did set up a bash script to watch the same URL, and carp if there was any change in the output or any problem fetching the URL, polling every 30 seconds while Zabbix polls every 120 seconds.

    Zabbix reported a problem with the web scenario at 5:20am while I was sleeping. During the same time, the 30-second polling loop reported that fetching the URL was no different than before.

    I really would like to know why Zabbix's web scenario reported a problem when something else running on the same machine polling 4x more frequently did not.
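    The 30-second polling loop described above could be sketched roughly as follows. This is not the bash script from the post, only an illustration in Python under stated assumptions: the names `fetch`, `describe`, and `watch` are hypothetical, redirects are deliberately not followed (matching the scenario's follow-redirects: false), and the 15-second timeout and 301-302 status check mirror the web test settings. Logging the status, byte count, and elapsed time on every poll makes it easier to compare against whatever Zabbix reports for the same moment.

```python
import time
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following 3xx responses so the 302 itself is observed."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def fetch(url, timeout=15.0):
    """Fetch url once; return (status, body_bytes, elapsed_seconds, error)."""
    opener = urllib.request.build_opener(NoRedirect)
    start = time.monotonic()
    try:
        with opener.open(url, timeout=timeout) as resp:
            status, size = resp.status, len(resp.read())
        error = None
    except urllib.error.HTTPError as exc:
        # With redirects disabled, a 301/302 surfaces here as an HTTPError.
        status, size, error = exc.code, len(exc.read()), None
    except OSError as exc:
        # URLError, socket timeouts and connection errors are OSError subclasses.
        status, size, error = None, 0, str(exc)
    return status, size, time.monotonic() - start, error


def describe(result, expected=(301, 302)):
    """Turn one fetch result into a single log line with an OK/FAIL verdict."""
    status, size, elapsed, error = result
    ok = error is None and status in expected
    verdict = "OK" if ok else "FAIL"
    return f"{verdict} status={status} bytes={size} elapsed={elapsed:.3f}s error={error}"


def watch(url, interval=30.0, rounds=None, fetcher=fetch, sleep=time.sleep):
    """Poll url every `interval` seconds; rounds=None means poll forever."""
    n = 0
    while rounds is None or n < rounds:
        print(time.strftime("%Y-%m-%d %H:%M:%S"), describe(fetcher(url)))
        n += 1
        if rounds is None or n < rounds:
            sleep(interval)
```

    Calling `watch("http://<hostname>/")` with the real hostname (omitted in the post) would print one timestamped line per poll. The `fetcher` and `sleep` parameters exist only so the loop can be exercised without touching the network.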

    • jan.garaj
      Senior Member
      Zabbix Certified Specialist
      • Jan 2010
      • 506

      #3
      Increase the debug level for web monitoring (and the log file size as well, since debug logging is verbose), and you will see the reason for the failure in the log file.
      Historically, issues that might arise when using Zabbix have not been easy to troubleshoot. Not everything that would be useful is always logged. Log level can be increased, but those who still remember the first time they saw what DebugLevel=4 can do will understand why that option usually helped more advanced users. Besides, changing the […]

      • moses.moore
        Junior Member
        • Dec 2014
        • 24

        #4
        Thank you; now I have more info.

        cannot process step "GET /" of web scenario "cheers" on host "mt": Timeout was reached: Operation timed out after 15968 milliseconds with 55212 bytes received
        I already know from the 30-second loop I had running at the same time that the remote server is not really timing out. The extra logging also shows that many machines across many remote locations are producing "Operation timed out" and "first network error / connection restored" messages. I'll use what you showed me to investigate further.
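        To compare many such failures across hosts, the log message can be broken into fields. The following is a minimal sketch that assumes every line has exactly the shape quoted above; other Zabbix versions or other error reasons may be worded differently, and the name `parse_failure` is hypothetical. In the quoted line, the elapsed time (15968 ms) is just past the scenario's 15-second step timeout, and 55212 bytes had already arrived, which at least suggests the connection was established and data was flowing when the timeout hit.

```python
import re

# Shape of the message quoted in this post; treated as an assumption.
FAILURE = re.compile(
    r'cannot process step "(?P<step>[^"]*)" of web scenario "(?P<scenario>[^"]*)" '
    r'on host "(?P<host>[^"]*)": (?P<reason>.*)'
)
# Timeout detail as it appears in the quoted reason text.
TIMEOUT = re.compile(
    r"timed out after (?P<ms>\d+) milliseconds with (?P<bytes>\d+) bytes received"
)


def parse_failure(line):
    """Return a dict describing one failure line, or None if it does not match."""
    match = FAILURE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    timeout = TIMEOUT.search(info["reason"])
    # Numeric fields stay None when the reason is not a timeout of this form.
    info["elapsed_ms"] = int(timeout["ms"]) if timeout else None
    info["bytes_received"] = int(timeout["bytes"]) if timeout else None
    return info
```

        Feeding each matching log line through `parse_failure` and grouping by host or by hour would show whether the timeouts cluster at particular times or on particular machines.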
