No logfile monitoring after upgrade from 1.4.4 to 1.6.1

  • dyost
    Junior Member
    • Oct 2008
    • 8

    #1

    No logfile monitoring after upgrade from 1.4.4 to 1.6.1

    Greetings,

    The stats:


    OS: Red Hat Enterprise Linux ES release 4 (Nahant Update 7)

    uname -a: Linux myhost 2.6.9-55.ELsmp #1 SMP Fri Apr 20 17:03:35 EDT 2007 i686 i686 i386 GNU/Linux

    ZABBIX Server (daemon) v1.6.1 (04 November 2008)
    Compilation time: Nov 6 2008 11:45:55

    ZABBIX Agent v1.6.1 (04 November 2008)
    Compilation time: Nov 6 2008 11:45:55

    mysql Ver 14.7 Distrib 4.1.22, for redhat-linux-gnu (i386) using readline 4.3

    PHP 4.3.9 (cgi) (built: Jul 15 2008 10:14:59)
    Copyright (c) 1997-2004 The PHP Group
    Zend Engine v1.3.0, Copyright (c) 1998-2004 Zend Technologies



    I upgraded from 1.4.4 to 1.6.1 and now none of the logfile monitoring works. My logfile active check items are now "not supported" (Not supported by ZABBIX agent).

    Absolutely nothing in any log file (neither server nor agent) gives me any idea whatsoever as to **why** they don't work.

    1) I have tried restarting the agent according to:



    No help.

    2) I have tried re-activating the items. They stay active for a minute or two and then switch back to "not supported" (and during the supposed "active" time no log file data was read--doesn't show up in history).

    Again, nothing whatsoever in any Zabbix log file to tell me what in the world is going on.

    3) I have tried creating new logfile monitor items, maybe with some lines from one of the actual logfiles in a temp file (and monitor that temp file), to see if at least the *new* items work.

    Nope.


    More notes:

    1) Everything worked just fine before the upgrade. I changed *nothing* in the configuration.

    2) Notwithstanding #1, I admit that I had not manually checked the logfile items recently--it is *possible* they were already shanked before the upgrade and I didn't notice, though they were working fine for weeks before. But I am fairly certain all was fine right up to the upgrade. I mention this just in case the timing with the upgrade is a wild goose chase on this problem.

    3) I am aware that the logfile items have to be set to "active" and that the hostname has to match in the agent's config file. They are set to active and hostnames do match--again, note this all *worked* before and I did not change any configs, nor any items.


    Do you have any ideas or even any pointers on how to get Zabbix to tell me *why* it keeps arbitrarily disabling these logfile items now?

    Thanks much,
    DY
  • dyost
    Junior Member
    • Oct 2008
    • 8

    #2
    Anybody? Anybody have any ideas? It appears to me that 1.6.1 simply stopped supporting logfile monitoring. Sure wish I would have known that logfile monitoring would go away--I'd not have upgraded. Better yet would be to have a full test lab to test the upgrade, but I don't have that luxury.


    Here's some info that I got by increasing the debug level:

    AGENT log:

    3463:20081122:124602 Sending [{
    "request":"active checks",
    "host":"usptitan1"}]
    3463:20081122:124602 Before read
    3463:20081122:124602 Got [{
    "response":"success",
    "data":[
    {
    "key":"log[\/var\/log\/aiparc.log]",
    "delay":"1",
    "lastlogsize":"-1074939056"}]}]
    3463:20081122:124602 In parse_list_of_checks() [{
    "response":"success",
    "data":[
    {
    "key":"log[\/var\/log\/aiparc.log]",
    "delay":"1",
    "lastlogsize":"-1074939056"}]}]
    3463:20081122:124602 In disable_all_metrics()
    3463:20081122:124602 In add_check('log[/var/log/aiparc.log]', 1, -1074939056)
    3463:20081122:124602 In process_active_checks('127.0.0.1',10051)
    3463:20081122:124602 In process log (/var/log/aiparc.log,-1075613568)
    3463:20081122:124602 Cannot set postition to [-1075613568] for [/var/log/aiparc.log] [Invalid argument]
    3463:20081122:124602 Active check [log[/var/log/aiparc.log]] is not supported. Disabled.
    3463:20081122:124602 In process_value('usptitan1','log[/var/log/aiparc.log]','ZBX_NOTSUPPORTED')
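
The `[Invalid argument]` in the agent log above is the error a seek to a negative absolute offset produces on Linux. A minimal Python sketch, assuming the agent positions itself in the file with an lseek-style call (the agent's actual C code is not shown in this thread); the temp file is a hypothetical stand-in for the real log file:

```python
import errno
import os
import tempfile


def try_seek(path, offset):
    """Seek to an absolute offset; return None on success or the errno name on failure."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        os.close(fd)


# Hypothetical stand-in for /var/log/aiparc.log
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"line 1\nline 2\n")
    demo_path = tmp.name

negative_result = try_seek(demo_path, -1075613568)  # offset taken from the agent log
valid_result = try_seek(demo_path, 0)               # a sane offset for comparison
os.unlink(demo_path)
print(negative_result, valid_result)
```

So the agent is probably behaving sensibly given what it was handed: the offset itself is the bad input.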




    Why in the world does it claim the logfile position (last read position) is -1075613568? No idea. So I went into the MySQL backend and looked at the lastlogsize for the item. It's 5213662, right there in the field, not -1075613568.

    The thread referenced above says a restart worked--not for me. I've restarted a thousand times.


    Here's the SERVER log (separate timestamp--tried twice, and the logs get rotated so fast that I lose them--don't worry, same error occurred before):

    17796:20081122:125602 Trapper got [{
    "request":"active checks",
    "host":"usptitan1"}] len 50
    17796:20081122:125602 In send_list_of_active_checks_json()
    17796:20081122:125602 Host:usptitan1
    17796:20081122:125602 Query [select i.itemid,i.key_,h.host,h.port,i.delay,i.description,i.nextcheck,i.type,i.snmp_community,i.snmp_oid,h.useip,h.ip,i.history,i.lastvalue,i.prevvalue,i.hostid,h.status,i.value_type,h.errors_from,i.snmp_port,i.delta,i.prevorgvalue,i.lastclock,i.units,i.multiplier,i.snmpv3_securityname,i.snmpv3_securitylevel,i.snmpv3_authpassphrase,i.snmpv3_privpassphrase,i.formula,h.available,i.status,i.trapper_hosts,i.logtimefmt,i.valuemapid,i.delay_flex,h.dns,i.params,i.trends,h.useipmi,h.ipmi_port,h.ipmi_authtype,h.ipmi_privilege,h.ipmi_username,h.ipmi_password,i.ipmi_sensor from hosts h, items i where i.hostid=h.hostid and h.status=0 and i.type=7 and h.host='usptitan1' and h.proxy_hostid=0 and (i.status=0 or (i.status=3 and i.nextcheck<=1227380162)) and h.hostid between 000000000000000 and 099999999999999]
    17796:20081122:125602 In substitute_simple_macros (data:"log[/var/log/aiparc.log]")
    17796:20081122:125602 End substitute_simple_macros (result:log[/var/log/aiparc.log])
    17796:20081122:125602 Sending [{
    "response":"success",
    "data":[
    {
    "key":"log[\/var\/log\/aiparc.log]",
    "delay":"1",
    "lastlogsize":"-1074939056"}]}]



    So there again, why -1074939056?
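
The server reply in the log above is plain JSON, so a sanity check on it is easy to script. A minimal sketch, assuming a log position can never legitimately be negative; `find_bad_positions` is a hypothetical helper, not part of Zabbix:

```python
import json

# Reply copied from the server log in this post
reply = r'''{
"response":"success",
"data":[
{
"key":"log[\/var\/log\/aiparc.log]",
"delay":"1",
"lastlogsize":"-1074939056"}]}'''


def find_bad_positions(raw):
    """Return (key, lastlogsize) pairs whose lastlogsize is negative."""
    checks = json.loads(raw).get("data", [])
    return [(c["key"], int(c["lastlogsize"]))
            for c in checks if int(c["lastlogsize"]) < 0]


bad = find_bad_positions(reply)
print(bad)
```

Since the bad value is already present in what the server sends, the check points at the server side of the exchange rather than at the agent's parsing.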

    If I add a brand new item with some *world readable* temp file as its "log" to try to practice on, I get -1074939056. If I reactivate existing (previously-working) log items, I get -1074939056 for all of them.

    Any ideas at all?


    • dyost
      Junior Member
      • Oct 2008
      • 8

      #3
      Quick update: it was saying -1074939056 for all of the lastlogsize values. Did that for quite a while. Now it's saying -1075728032 is the position for all of them. All log files now claim -1075728032 as the lastlogsize even though their actual value in the MySQL item table is a valid, positive integer.

      Looks like a completely shanked pointer somewhere to me.
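
One way to see why these values look like a stray pointer is to reinterpret them as unsigned 32-bit integers. A small sketch; reading the results as stack addresses is an assumption based on the 32-bit (i686) kernel quoted in post #1, not something the logs prove:

```python
def as_unsigned32(value):
    """Reinterpret a signed 32-bit integer as unsigned and return it as 8 hex digits."""
    return format(value & 0xFFFFFFFF, "08x")


# lastlogsize values reported in this thread
reported = [-1074939056, -1075613568, -1075728032]
hex_values = [as_unsigned32(v) for v in reported]
for value, hex_value in zip(reported, hex_values):
    print(value, "->", "0x" + hex_value)
```

All three land just below 0xC0000000, the region where a 32-bit Linux process usually keeps its stack, which would fit an uninitialized or mis-sized local variable being sent instead of the database value.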


      • runner
        Junior Member
        • Mar 2008
        • 3

        #4
        Exactly the same for me

        And no help as well.

        @dyost, did you try with an old agentd (like 1.4x) ?

        CentOS 5.2
        2.6.18-92.el5 #1 SMP Tue Jun 10 18:51:06 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
        zabbix agentd 1.6.1
        zabbix server v1.6.1

        Maybe some Guru could help?


        • dyost
          Junior Member
          • Oct 2008
          • 8

          #5
          @ runner: I didn't realize you could use mismatching agentd vs. server versions, but I guess that makes sense. I may try that, though I'm nervous about everything so much now that who knows what else would break. I'm surprised that 1.6.1 completely breaks logfile monitoring but nobody has responded--no help or bugfix planned. I may edit the code myself. In any case, it'll probably be a very long time before we upgrade again, given all this.


          • runner
            Junior Member
            • Mar 2008
            • 3

            #6
            Hello dyost,

            I tested a combination of Zabbix Server 1.6.1 with agentd 1.4.5 running on CentOS 4.2 (2.6.9-42.0.8.ELsmp #1 SMP Tue Jan 30 12:33:47 EST 2007 i686 i686 i386 GNU/Linux) and it works!

            IMHO you don't have to worry about using older agents with a higher-versioned server. I have some fairly old agents. My oldest is 1.1.7 and it still works. IMHO the worst that may happen is no data. But I wouldn't try the other direction (agent higher than server).
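
That rule of thumb (agent no newer than the server) can be written down as a quick check. This is only a sketch of the advice in this post, not an official Zabbix compatibility policy; `agent_ok_for_server` is a made-up helper name:

```python
def parse_version(text):
    """Turn '1.6.1' into (1, 6, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def agent_ok_for_server(agent, server):
    """True when the agent is the same version as the server or older."""
    return parse_version(agent) <= parse_version(server)


print(agent_ok_for_server("1.4.5", "1.6.1"))  # older agent, newer server
print(agent_ok_for_server("1.6.1", "1.4.6"))  # newer agent, older server
```

Comparing tuples of integers rather than raw strings avoids mis-ordering versions such as 1.10 and 1.9.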

            Over the next few days I will try some other machine and agent combinations.

            Maybe someone can put this in the Bugtrack for agent v1.6.1?

            Regards
            Karsten


            • Silvery
              Junior Member
              Zabbix Certified Specialist
              • Oct 2008
              • 28

              #7
              My Zabbix_Server is version 1.6.1 and my Zabbix_Agent is version 1.4.6. With this combination active agents also work fine.


              • Alexei
                Founder, CEO
                Zabbix Certified Trainer
                Zabbix Certified Specialist
                Zabbix Certified Professional
                • Sep 2004
                • 5654

                #8
                Originally posted by dyost
                Looks like a completely shanked pointer somewhere to me.
                You are right. This is a 1.6.1-specific problem, which is fixed in the latest code.
                Alexei Vladishev
                Creator of Zabbix, Product manager
                New York | Tokyo | Riga

                • dyost
                  Junior Member
                  • Oct 2008
                  • 8

                  #9
                  [NOTE: it's crazy that I was in the middle of typing this reply when Alexei's response came in]

                  Thanks very much, runner and Silvery. It is definitely confirmed that 1.6.1 is not ready for production--not the agent, anyway. Logfile monitoring is completely hosed--but Alexei has weighed in that it'll be fixed.

                  I followed your advice and used an older agent, while keeping the newer server.

                  So now I am running 1.6.1 on the server and 1.4.6 as the agent. So far, it's working just fine, from what I can tell.

                  Of course, it's also consuming about 100% of the CPU. One of the reasons I had upgraded to 1.6.1 was for the advertised performance gains, but I am 99% sure that the CPU usage will settle once it gets done parsing the days and days of logfile data that hadn't been read yet.

                  I know why the CPU consumption had been so good--because logfile monitoring was broken for me! But I'll still hope that 1.6.1 on the server has performance benefits, and my 1.4.6 agent appears to be working OK.

                  Thanks very much!


                  • Alexei
                    Founder, CEO
                    Zabbix Certified Trainer
                    Zabbix Certified Specialist
                    Zabbix Certified Professional
                    • Sep 2004
                    • 5654

                    #10
                    The problem is specific to ZABBIX server 1.6.1. The agents are not affected!
                    Alexei Vladishev
                    Creator of Zabbix, Product manager
                    New York | Tokyo | Riga

                    • dyost
                      Junior Member
                      • Oct 2008
                      • 8

                      #11
                      Originally posted by Alexei
                      The problem is specific to ZABBIX server 1.6.1. The agents are not affected!

                      Wait--now I'm a bit confused. Why does swapping out the 1.6.1 agent with a 1.4.6 agent at least *appear* to make things work, then? Is it risky or otherwise inadvisable to "fix" this with a workaround by running an older agent?

                      All I did was stop the agent, copy new binaries (1.4.6) over the top, and restart the agent. Never touched the server. And now logfile monitoring appears to be working (as do the rest of the items).


                      • chyaroslav
                        Junior Member
                        • Jun 2008
                        • 8

                        #12
                        Problem with 1.6.1

                        Does anybody know whether this bug has been fixed in the 1.6.4 server?


                        • Alexei
                          Founder, CEO
                          Zabbix Certified Trainer
                          Zabbix Certified Specialist
                          Zabbix Certified Professional
                          • Sep 2004
                          • 5654

                          #13
                          Originally posted by chyaroslav
                          Does anybody know whether this bug has been fixed in the 1.6.4 server?
                          The problem was resolved a long time ago, I think in 1.6.2.
                          Alexei Vladishev
                          Creator of Zabbix, Product manager
                          New York | Tokyo | Riga

                          • chyaroslav
                            Junior Member
                            • Jun 2008
                            • 8

                            #14
                            Originally posted by Alexei
                            The problem was resolved a long time ago, I think in 1.6.2.
                            Nevertheless, I have installed 1.6.4 and the problem still remains.
                            ZABBIX Server (daemon) v1.6.4 (3 April 2009)
                            Compilation time: Apr 9 2009 10:09:05

                            2423:20090409:110801 Sending [ZBX_GET_ACTIVE_CHECKS
                            d-tst-1
                            ]
                            2423:20090409:110801 Before read
                            2423:20090409:110801 In parse_list_of_checks() [log[/var/adm/messages,error]:1:-1078953032
                            ZBX_EOF
                            ]
                            2423:20090409:110801 In disable_all_metrics()
                            2423:20090409:110801 Parsed [log[/var/adm/messages,error]:1:-1078953032]
                            2423:20090409:110801 In add_check('log[/var/adm/messages,error]', 1, -1078953032)
                            2423:20090409:110801 Parsed [ZBX_EOF]
                            2423:20090409:110801 In process_active_checks('10.61.12.40',10051)
                            2423:20090409:110801 In process log (/var/adm/messages,-1078953032)
                            2423:20090409:110801 Cannot set postition to [-1078953032] for [/var/adm/messages] [Invalid argument]
                            2423:20090409:110801 Active check [log[/var/adm/messages,error]] is not supported. Disabled.
                            2423:20090409:110801 XML before sending [<req><host>ZC10c3QtMQ==</host><key>bG9nWy92YXIvYWRtL21lc3NhZ2VzLGVycm9yXQ== </key><data>WkJYX05PVFNVUFBPUlRFRA==</data><lastlogsize>LTEwNzg5NTMwMzI=</lastlogsize></req>]
                            2423:20090409:110801 OK
                            2423:20090409:110801 In get_min_nextcheck()
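
The fields in the `XML before sending` line above are base64-encoded. Decoding them shows exactly what the agent reports back to the server. A short sketch using only the values from that log line; `decode_fields` is a hypothetical helper name:

```python
import base64
import re

# Copied from the "XML before sending" log line
xml_line = ("<req><host>ZC10c3QtMQ==</host>"
            "<key>bG9nWy92YXIvYWRtL21lc3NhZ2VzLGVycm9yXQ== </key>"
            "<data>WkJYX05PVFNVUFBPUlRFRA==</data>"
            "<lastlogsize>LTEwNzg5NTMwMzI=</lastlogsize></req>")


def decode_fields(xml):
    """Map each leaf tag inside <req> to its base64-decoded text."""
    pairs = re.findall(r"<(\w+)>([^<]*)</\1>", xml)
    return {tag: base64.b64decode(value.strip()).decode("ascii")
            for tag, value in pairs}


fields = decode_fields(xml_line)
for tag, text in fields.items():
    print(tag, "=", text)
```

The decoded `lastlogsize` is the same negative number the agent received in the check list, so the agent is simply echoing the bad position back along with `ZBX_NOTSUPPORTED`.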


                            • chyaroslav
                              Junior Member
                              • Jun 2008
                              • 8

                              #15
                              problem

                              Please, anybody, help me! What is wrong with my configuration? Maybe it is only a problem with Solaris agents?
