Two minor annoyances: Upgrade and Monitoring Log files

  • cstackpole
    Senior Member
    Zabbix Certified Specialist
    • Oct 2006
    • 225

    #1

    Two minor annoyances: Upgrade and Monitoring Log files

    Hello,
    I have two issues that are not really a big deal, but I thought I would ask in case someone has some insight they would like to share. The first is a question regarding an upgrade, and the second is a performance question regarding log files. The server is now 1.4 and the agents are all 1.4 variants (some are the Debian Lenny 1.4 packages, others are compiled from a nightly developer download from last week). I did a quick search on the forums and didn't find anything. If I missed something glaringly obvious, I apologize.

    The first is:
    I updated my Debian server from Zabbix 1.1.7 to 1.4. It was a smooth upgrade and everything appears to work well overall. The "problem" is that I added one Linux system under 1.1.7, and I just added a new, nearly identical Linux system under 1.4. Both were assigned the standard unix_t template.

    The 1.4 host has really cool sub-menus in General->Latest Data, while the 1.1.7 host has a flat view of everything. When I create a new item on the 1.4 host I can pick which sub-menu I want; the 1.1.7 host does not have those listings. So I went to Configuration->Hosts, and sure enough the 1.4 host has Unix_t listed as a template, but the 1.1.7 host has nothing listed as a template. "Well, there's your problem," I thought, and added Unix_t to the 1.1.7 host. I got all the cool updates, but the items were all new (e.g. the existing data did not carry over to the new Unix_t items, so I had two disk-free-space graphs).

    The only way I found to get the new look/feel/features was to delete the host and re-add it to Zabbix. That loses all the data, which is not desirable for several systems. I can do without the updated features on those systems, but I thought I would ask if there is something I am missing or something I could do. Any tips, hints, or suggestions?


    Second:
    I have a log file I am monitoring. There are keywords I need to watch for, and different actions need to be taken depending on the keyword. This log file is updated several times a second.
    So I have items like:
    log[/my/log/file]
    log[/my/log/file,ALERT]
    log[/my/log/file,ERROR]
    log[/my/log/file,HeartBeat]

    Basically I am monitoring the same file four times and triggering off of those items. At first I couldn't really detect any performance hit; on the nodes themselves it does not appear to make one lick of difference. Then I added 10 systems to the server, and it makes a BIG difference. Monitoring 30 systems (those 10 plus an additional 20) without those 4 log items, my dual P4 3 GHz with a GB of RAM sits happily at about 15-20% per processor. When I enable those 4 items on those 10 systems, within about 15 minutes my server sits at about 80-95% load.
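    For reference, the triggers hanging off those items look roughly like this (a sketch only; "myhost" and the 300-second heartbeat window are placeholders for whatever your setup actually uses):

    {myhost:log[/my/log/file,ALERT].str(ALERT)}=1
    {myhost:log[/my/log/file,ERROR].str(ERROR)}=1
    {myhost:log[/my/log/file,HeartBeat].nodata(300)}=1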

    Can someone suggest a better method?

    Thanks for your input!
    cstackpole
  • Alexei
    Founder, CEO
    Zabbix Certified Trainer
    Zabbix Certified Specialist
    Zabbix Certified Professional
    • Sep 2004
    • 5654

    #2
    You may consider using a single log[] with a complex GNU-style regular expression as a second parameter:

    log[/my/log/file,"Complex regular expression which will match ALERT, ERROR, whatever"]
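    For instance, something along these lines (an illustrative pattern only):

    log[/my/log/file,"ALERT|ERROR|HeartBeat"]

    The triggers can then tell the keywords apart by matching against the values of that single item.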
    Alexei Vladishev
    Creator of Zabbix, Product manager
    New York | Tokyo | Riga
    My Twitter


    • cstackpole
      Senior Member
      Zabbix Certified Specialist
      • Oct 2006
      • 225

      #3
      I will give that a try and see what happens.
      Thanks Alexei!


      • cstackpole
        Senior Member
        Zabbix Certified Specialist
        • Oct 2006
        • 225

        #4
        Hello Alexei,
        Just thought I would post back some of my experiences dealing with this high load. The server is 1.4.1 running on Debian.

        I tried your solution of combining several of the statements into one. That didn't seem to make a noticeable difference. So I continued to work on the expressions.

        Then last night, one of these servers went down. Zabbix never sent an email, and the frontend still showed green the whole time it was down. When the server came back online, the "server {HOSTNAME} is unreachable" trigger went grey. The history shows "none" for the status for almost a week! I still got alerts when I would run a process to max out the memory, push the load, or fill the hard drive, but I could turn off the agent (and/or the server!) and never get an email. I couldn't get the unknown state to commit to ON or OFF.

        So off to the forums I went. I found this: http://www.zabbix.com/forum/showthre...5&page=1&pp=10

        and began working my way through it. I verified that only the 1.4.1 Linux agents were going into the unknown state and all the other systems were OK (the 1.1.7 and 1.1.4 agents responded as they should). Nothing seemed to work for the 1.4.1 agents, though. I stopped the agents on the systems, then stopped the Zabbix server, and brought it all back up. Still unknown.

        So I tried again. Brought down the agents. Brought down the server. I even turned off Apache this time. I don't know why I decided to check, but I ran ps to verify Zabbix was stopped; it was. That's when I noticed MANY, MANY MySQL entries; way more than I had seen before on this system. top was showing that the load had not died down significantly after Zabbix was stopped (it went from >5.5 to hovering around 5).

        I stopped MySQL, checked that it was gone (ps again) and that the load had dropped, then started MySQL, Apache, zabbix_agentd, and zabbix_server. I brought up all the other agents as well, and the UNKNOWN status is gone! Zabbix reports machines when they are offline/rebooted again, the status is updated, and, most importantly, with the original set of log items enabled the load is sitting below 1.5!
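        For the record, the sequence was roughly the following (the exact script and binary names depend on how things were installed here, so treat them as placeholders):

        /etc/init.d/mysql stop
        ps aux | grep -E 'mysqld|zabbix'    # confirm everything is really down
        /etc/init.d/mysql start
        /etc/init.d/apache2 start
        zabbix_agentd                       # the daemons built from source are started directly
        zabbix_server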

        I am really not certain what happened. Zabbix appeared to function just fine with 1.4 before I added the log items/triggers. I noticed the increase in load after enabling them, but it seems as if it was MySQL that was actually using all the resources. My concern is that MySQL just got backlogged trying to keep up, and if that is true I can expect this to happen again. I will keep a closer eye on this and let you know if it happens again. I can always reboot my development machine every couple of days and make sure that Zabbix spots it.

        Before this moment flees too far from my memory, is there anything that might be helpful to you guys? I know you have worked on this issue before so if there is something that might help out, please let me know. There wasn't anything in the logs that looked important to me or different from normal, but I could be wrong. I also grabbed a screenshot (though it looks similar to those already posted in the other posts).

        If I can help, please let me know.
        cstackpole


        • Alexei
          Founder, CEO
          Zabbix Certified Trainer
          Zabbix Certified Specialist
          Zabbix Certified Professional
          • Sep 2004
          • 5654

          #5
          As far as I am aware, Debian and Ubuntu perform integrity checks on MySQL databases when MySQL is restarted.

          Also, you may check MySQL processes and SQL statements (mysqladmin processlist) to see what exactly is going on when CPU load is too high.
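          For example (credentials are placeholders):

          mysqladmin -u root -p processlist

          or, from within the mysql client:

          SHOW FULL PROCESSLIST;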
          Alexei Vladishev
          Creator of Zabbix, Product manager
          New York | Tokyo | Riga
          My Twitter


          • bbrendon
            Senior Member
            • Sep 2005
            • 870

            #6
            Regarding your first issue: that is the behavior I experienced as well when I upgraded. For some trends I didn't want to lose, I fixed it manually at the database level.

            When I upgraded, I lost almost all my history.
            Unofficial Zabbix Expert
            Blog, Corporate Site

