Converting Web page to XML(?)

  • high-t
    Member
    • Dec 2014
    • 68

    #1

    Converting Web page to XML(?)

    Hi.

    I wish to monitor changes made to the table on this page. I know how to parse XML (using xmlstarlet), so I figured I would try to convert the webpage's table to XML.

    With that in mind, my questions are:
    1. Is there a known "html-table to xml" utility that can make such a conversion?
    2. How would you tackle this? Would you take a different approach?


    Thanks!

    On a side note: it's funny... Site5 is using Zabbix to feed this page, so in a sense this is actually reverse engineering.

    Amit.
  • aib
    Senior Member
    • Jan 2014
    • 1615

    #2
    First of all, Zabbix typically uses the paradigm: one service, one number.
    So, for a page which contains the status of eight (8) services, you need 8 items, configured under the host "s9-london".
    So far so good?

    Then, you can use a script which splits the HTML page and uses zabbix_sender to fill in the data for all eight items. It doesn't matter which language, library, etc. you use.

    Finally, that script can be used to monitor all services, OR you can use a simple one-line user parameter for the Zabbix agent if you want to monitor only one service. It's all up to you.
    Sincerely yours,
    Aleksey
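
The script approach described above can be sketched as follows. This is only a minimal sketch, not the poster's actual script. It assumes that each service cell sits on a line containing "centered" and carries an alt="..." attribute (as in the markup quoted later in the thread), and that the services appear in the order HTTP, FTP, MySQL, POP3, IMAP, SMTP, SSH, Load. The item keys (site5.http, site5.ftp, ...), the function names, and the sample rows are hypothetical.

```shell
# Two sample rows standing in for the downloaded status page.
# The "ico_available.png" file name is an assumption for illustration.
sample_page() {
    printf '%s\n' \
        '<td class="centered"><img src="/img/ico_available.png" alt="Online" title="Online" /></td>' \
        '<td class="centered"><img src="/img/ico_unavailable.png" alt="Offline" title="Offline" /></td>'
}

# sender_lines HOST: read HTML on stdin and print one
# "<host> <key> <value>" line per service, the input format that
# zabbix_sender reads with -i. The item keys are hypothetical.
sender_lines() {
    host=$1
    grep 'centered' \
        | sed -n 's/.*alt="\([^"]*\)".*/\1/p' \
        | awk -v host="$host" '
            BEGIN { n = split("http ftp mysql pop3 imap smtp ssh load", svc, " ") }
            NR <= n { printf "%s site5.%s \"%s\"\n", host, svc[NR], $0 }'
}

sample_page | sender_lines s9-london
```

Piping the output into `zabbix_sender -z <server> -i -` would submit the values, provided the matching items exist on the host and are of type "Zabbix trapper".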


    • high-t
      Member
      • Dec 2014
      • 68

      #3
      Thank you, Aleksey, for the reply.

      I understand the "one service, one number" paradigm. With that, I might as well decide to treat it all as a single item, in which any change to the item's value means "something" went wrong. This is an approach I already use with some of my items, and it works just as well.
      When a finer level of granularity is needed, you're right: it comes down to testing each service separately. Agreed.

      So with that, if you take a look at the HTML code of the page, you'll see it is a classic table layout, and for each service/column there's an icon representing its state. HTTP is the first icon, FTP is the second one, MySQL the third, and so on. How would you go about querying the nth icon's value?

      Thanks!

      Amit.


      • aib
        Senior Member
        • Jan 2014
        • 1615

        #4
        You know, I tried to get the page with curl and it doesn't work. I don't know why I can use wget but cannot do the same with curl. Lack of knowledge, I guess.

        Then, when you create an item, you give it a name like "HTTP service".
        You know in advance that this item will collect information only about the HTTP status, so you can collect the data directly from line one.
        The same goes for each line, right?
        Sincerely yours,
        Aleksey


        • high-t
          Member
          • Dec 2014
          • 68

          #5
          Well, I don't know why you're not able to use curl.
          I use it quite extensively with no particular problems.

          For instance, the following is a use case for extracting a domain's expiry date:



          Regarding your suggestion, I'm not quite following you. How would you suggest collecting a specific piece of data from that HTML page? For example, how would you extract the FTP status?

          Thanks!


          • aib
            Senior Member
            • Jan 2014
            • 1615

            #6
            I also had no problem with curl before, but: (see attached picture)

            As you can see, wget downloaded the page and saved it in the INDEX.HTML file, but curl got a "403 Forbidden" response. This is my problem.
            Sincerely yours,
            Aleksey
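
One possible explanation for the difference, offered here only as an assumption since the attached picture is not available: some web servers reject requests based on the client's User-Agent header, and curl and wget identify themselves differently. A minimal sketch that makes curl send a browser-like User-Agent follows. The function name, the User-Agent string, and the output path are illustrative, and nothing in the thread confirms that this is the actual cause of the 403.

```shell
# fetch_status_page SERVER: download the Site5 status page for SERVER with curl.
# Assumption: the 403 is caused by User-Agent filtering; this is not confirmed
# anywhere in the thread.
fetch_status_page() {
    # -sS: quiet but still report errors, -f: fail on HTTP errors such as 403,
    # -L: follow redirects, -A: override the User-Agent header.
    curl -sS -f -L \
        -A 'Mozilla/5.0 (X11; Linux x86_64)' \
        -o "/tmp/$1.html" \
        "http://www.site5.com/support/current-status/$1/"
}

# Example call (needs network access, so it is left commented out):
# fetch_status_page s9-london && grep -c 'centered' /tmp/s9-london.html
```

If the request still fails with this header, the cause lies elsewhere and comparing the verbose output of both tools (`curl -v`, `wget -d`) would be the next step.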


            • aib
              Senior Member
              • Jan 2014
              • 1615

              #7
              Regarding your request about getting the HTTP status: I used wget and created a UserParameter= line in zabbix_agent.conf:
              Code:
              UserParameter=http_status_site5[*],wget -q http://www.site5.com/support/current-status/$1/ -O /tmp/$1.html; grep "centered" /tmp/$1.html | head -1 | awk '{print $$5};' | awk -F'"' '{print $$2};'
              And I tested it:
              Code:
              [root@z ~]# zabbix_get -s localhost -k http_status_site5[barnum]
              Online
              [root@z ~]# zabbix_get -s localhost -k http_status_site5[s9-london]
              Online
              Now you can create an item on the host with:
              Code:
              Key : http_status_site5[s9-london]
              Type : Text
              And create a trigger:
              Code:
              Name: HTTP service on s9-london is NOT Online ({ITEM.VALUE1})
              Expression: {hostname:http_status_site5[s9-london].str("Online")}<>1
              If you want to check a different service, you can copy/paste the UserParameter line and change the part "| head -1 |" to "| sed -n 7p |", where "7" is the number of the line you want to check:
              Code:
              1 = HTTP
              2 = FTP
              3 = MySQL
              4 = POP3
              5 = IMAP
              6 = SMTP
              7 = SSH
              8 = Load
              Sincerely yours,
              Aleksey
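
The pipeline in the UserParameter above can be tried outside Zabbix on sample markup. The sketch below assumes rows shaped like the one quoted later in the thread; the ico_available.png file name and the function names are made up for illustration. Note that in a plain shell the awk fields are written $5 and $2. The doubled $$ is only needed inside a UserParameter that uses [*], where Zabbix would otherwise treat $5 as a positional parameter.

```shell
# Three sample rows standing in for the downloaded status page.
# The "ico_available.png" file name is an assumption for illustration.
sample_page() {
    printf '%s\n' \
        '<td class="centered"><img src="/img/ico_available.png" alt="Online" title="Online" /></td>' \
        '<td class="centered"><img src="/img/ico_unavailable.png" alt="Offline" title="Offline" /></td>' \
        '<td class="centered"><img src="/img/ico_available.png" alt="Online" title="Online" /></td>'
}

# nth_status N: read HTML on stdin and print the status of the Nth service.
# Same steps as the UserParameter: keep the "centered" lines, pick line N,
# take the fifth whitespace-separated field (title="..."), then the text
# between the double quotes.
nth_status() {
    grep 'centered' | sed -n "${1}p" | awk '{print $5}' | awk -F'"' '{print $2}'
}

sample_page | nth_status 1   # prints: Online
sample_page | nth_status 2   # prints: Offline
```

Asking for a line that does not exist, for example `nth_status 9` on this sample, prints nothing, which is also what a partially loaded page would produce.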


              • high-t
                Member
                • Dec 2014
                • 68

                #8
                Thank you so much, Aleksey, for your help. Highly appreciated!

                I took a slightly different approach: I decided to use two parameters, the server name and the service being monitored. This way, I can extend the check to other Site5 servers as well. I also used "cut" rather than awk.
                Here's my UserParameter:

                Code:
                UserParameter=site5_status[*],wget -q http://www.site5.com/support/current-status/$1/ -O /tmp/$1.html; grep "centered" /tmp/$1.html | sed -n "$2"p | cut -d"=" -f4 |cut -d'"' -f2
                Following the above, I created one item for each server+service being monitored:

                Code:
                #Monitor Server s9-london, Service HTTP
                Key : site5_status[s9-london,1]
                Finally, I created a trigger for each item, but decided to query for "Offline" rather than "not Online". This is because, as it turns out, Site5's webpage does not fully load every time. I'm pretty sure this is a problem with Site5's web server being overloaded, but it means that in such cases the trigger is unable to find "Online" and hence reports the service as if it were Offline.

                Code:
                Name: Site5/s9-london/FTP is {ITEM.VALUE1}
                Expression: {hostname:site5_status[s9-london,2].str("Offline")}=1
                That concludes the monitoring of site5's status page.
                Next up in my project:


                • aib
                  Senior Member
                  • Jan 2014
                  • 1615

                  #9
                  Originally posted by high-t
                  Thank you so much, Aleksey, for your help. Highly appreciated!
                  Any time!
                  Originally posted by high-t
                  Finally, I created a trigger for each item, but decided to query for "Offline" rather than "not Online". This is because, as it turns out, Site5's webpage does not fully load every time. I'm pretty sure this is a problem with Site5's web server being overloaded, but it means that in such cases the trigger is unable to find "Online" and hence reports the service as if it were Offline.
                  In this case you may run into a problem when the status is undetermined.
                  I'm not sure that the status can only be Online OR Offline. So your trigger would not fire if the status is Undetermined (or something else).
                  Sincerely yours,
                  Aleksey


                  • high-t
                    Member
                    • Dec 2014
                    • 68

                    #10
                    Originally posted by aib
                    In this case you may run into a problem when the status is undetermined.
                    I'm not sure that the status can only be Online OR Offline. So your trigger would not fire if the status is Undetermined (or something else).
                    In fact, querying for Offline rather than "not Online" is meant to solve exactly that. This is because I only want to be informed when the service is offline, and I definitely want to disregard any form of undetermined status.

                    As for your last remark: I went ahead and tested servers which had services offline at the time, and made sure that the status page's code contains the word Offline.
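
The distinction discussed here (Online, Offline, or anything else, including an empty result from a partially loaded page) can also be made explicit by turning the text into a number, in line with the "one service, one number" idea from earlier in the thread. This is a minimal sketch; the function name and the numeric codes 1, 0, and 2 are arbitrary choices, not something either poster used.

```shell
# status_to_number TEXT: map the scraped status text to a number.
#   1 = Online, 0 = Offline, 2 = anything else (empty or unexpected text,
#   for example when the page did not load completely).
status_to_number() {
    case "$1" in
        Online)  echo 1 ;;
        Offline) echo 0 ;;
        *)       echo 2 ;;
    esac
}

status_to_number Online    # prints: 1
status_to_number Offline   # prints: 0
status_to_number ''        # prints: 2
```

With a numeric item fed this way, a trigger that compares the last value with 0 would fire only on Offline, while the value 2 could be ignored or tracked separately.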


                    • aib
                      Senior Member
                      • Jan 2014
                      • 1615

                      #11
                      I tried to get an Offline status from the status page. Finally, I got it.

                      When any service goes Offline, the code looks like this:
                      Code:
                      <td class="centered"><img src="/img/ico_unavailable.png" alt="Offline" title="Offline" /></td>
                      So you can use the image name OR the alt text OR the title.
                      Sincerely yours,
                      Aleksey
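
Pulling each of those three values out of the quoted row could look like the sketch below. The helper name attr is hypothetical, and the sed expression assumes that the attribute is preceded by a space and its value is enclosed in double quotes, as in the row above.

```shell
# The row quoted above, as a shell variable.
row='<td class="centered"><img src="/img/ico_unavailable.png" alt="Offline" title="Offline" /></td>'

# attr NAME HTML: print the value of attribute NAME found in HTML.
attr() {
    printf '%s\n' "$2" | sed -n "s/.* $1=\"\([^\"]*\)\".*/\1/p"
}

attr src "$row"     # prints: /img/ico_unavailable.png
attr alt "$row"     # prints: Offline
attr title "$row"   # prints: Offline
```

The alt or title text gives the status directly, whereas the image name would need an extra mapping step from file name to status.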
