
it services parent services not turning off

  • lamont
    Member
    • Nov 2007
    • 89

    #1

    it services parent services not turning off

    I've got some "interior node" IT services that have no triggers set on them and are set to "problem, if at least one child has a problem".

    Occasionally I see cases where the child IT services have all gone green (status = 0) but the interior nodes are still at status = 4. I can go into the database and run a select on the services_links table to find all the child nodes, and none of them has a non-zero status. I have to edit the database by hand and set status = 0 in the services table to fix it.
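For illustration, the check and manual fix described above could be sketched roughly as follows. This is only a sketch against a simplified stand-in schema in an in-memory SQLite database: the column names (`serviceid`, `status`, `triggerid`, `serviceupid`, `servicedownid`) and the helpers `build_demo_db`, `find_stale_parents`, and `clear_stale_parents` are assumptions for the example and should be checked against the real Zabbix schema before running anything similar on a live database.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory stand-in for the services tables (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE services (
            serviceid INTEGER PRIMARY KEY,
            status    INTEGER NOT NULL,
            triggerid INTEGER            -- NULL for interior nodes
        );
        CREATE TABLE services_links (
            serviceupid   INTEGER NOT NULL,  -- parent
            servicedownid INTEGER NOT NULL   -- child
        );
        -- 1 and 2 are interior nodes stuck at status 4; 3 and 4 are green leaves.
        INSERT INTO services VALUES (1, 4, NULL), (2, 4, NULL), (3, 0, 100), (4, 0, 101);
        INSERT INTO services_links VALUES (1, 2), (2, 3), (2, 4);
        """
    )
    return conn


def find_stale_parents(conn):
    """Interior nodes with non-zero status whose direct children are all at status 0."""
    rows = conn.execute(
        """
        SELECT p.serviceid
        FROM services p
        WHERE p.status <> 0
          AND p.triggerid IS NULL
          AND EXISTS (SELECT 1 FROM services_links l
                      WHERE l.serviceupid = p.serviceid)
          AND NOT EXISTS (SELECT 1 FROM services_links l
                          JOIN services c ON c.serviceid = l.servicedownid
                          WHERE l.serviceupid = p.serviceid AND c.status <> 0)
        ORDER BY p.serviceid
        """
    ).fetchall()
    return [row[0] for row in rows]


def clear_stale_parents(conn):
    """Reset stale parents to 0, repeating because a fix can expose a stale grandparent."""
    cleared = []
    while True:
        stale = find_stale_parents(conn)
        if not stale:
            break
        conn.executemany(
            "UPDATE services SET status = 0 WHERE serviceid = ?",
            [(serviceid,) for serviceid in stale],
        )
        cleared.extend(stale)
    conn.commit()
    return cleared
```

On the demo data, `find_stale_parents` first reports only service 2, because service 1 still has a non-zero child; once 2 is cleared, 1 becomes stale as well, which is why the fix loops until nothing is left.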

    I don't know how Zabbix sets those status fields, but some race condition or bug gets exercised every now and then and fails to clear parent IT services correctly (most of the time this works, so it must be a 1-in-100 or 1-in-1000 kind of bug).
  • lamont
    Member
    • Nov 2007
    • 89

    #2
    bueller? bueller?


    • lamont
      Member
      • Nov 2007
      • 89

      #3
      Bumping this one again.

      I looked through a bunch of the IT services child/parent status-setting code, but it was not obvious where to go fishing for this bug...


      • Aly
        ZABBIX developer
        • May 2007
        • 1126

        #4
        If you could describe the circumstances in which the bug happens, maybe we could reproduce it and fix it.

        P.S. Create a bug report in our support system so it doesn't get lost over time.
        Zabbix | ex GUI developer


        • lamont
          Member
          • Nov 2007
          • 89

          #5
          The bug report page seems to be down or responding slowly for me right now (support.zabbix.com?).

          I can't describe the circumstances in which it occurs. I only know the effect: I wind up with a Zabbix service that is an "interior node" in the tree, with no triggers directly attached to it, but with a non-zero status. If you walk down the tree to all the IT services it inherits from, you find that none of the triggers below it are on.

          I assume there is an algorithm in Zabbix that, when a trigger turns off, walks up the IT services tree turning off the status bits on the IT services. That algorithm seems to have some kind of race condition in it. Maybe there's a race when it runs concurrently because of state changes on two different triggers underneath a single "interior node"? Both concurrent executions conclude they need to leave the "interior node" status non-zero because the other trigger is still on, then they turn off their own "edge nodes" and leave the "interior node" status orphaned?

          I don't know if that is happening, but something like that seems to be happening...
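To make the suspected interleaving concrete, here is a small deterministic simulation of it. This is not Zabbix code, only a sketch of the hypothesis above: the `recover` and `run` helpers, the node names, and the two-step "snapshot siblings, then write" structure are all assumptions made for the example.

```python
def recover(status, children, parent, leaf):
    """One trigger-recovery handler, split into two steps by a yield.

    Step 1 snapshots the sibling statuses; step 2 clears the leaf and
    recomputes the parent from that (possibly outdated) snapshot.
    """
    snapshot = {c: status[c] for c in children if c != leaf}
    yield  # another handler may run between the read and the write
    status[leaf] = 0
    status[parent] = max([0, *snapshot.values()])
    yield


def run(schedule):
    """Run handlers for leaves 'a' and 'b' in the given step order."""
    status = {"parent": 4, "a": 4, "b": 4}
    workers = {
        name: recover(status, ["a", "b"], "parent", name) for name in ("a", "b")
    }
    for name in schedule:
        next(workers[name])  # advance that handler by one step
    return status


# One handler finishes before the other starts: the parent is cleared.
serial = run(["a", "a", "b", "b"])

# Both handlers read before either writes: the parent stays at 4.
interleaved = run(["a", "b", "a", "b"])
```

In the interleaved schedule each handler sees the other leaf still at 4, so both write 4 to the parent even though both leaves end at 0, which matches the orphaned interior node described above. If something like this is the cause, recomputing the parent from fresh reads inside a single lock or transaction would close the window.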

          It doesn't happen very often. We have a fairly massive number of IT services configured, about 3000 per server across about 8 servers, and we only leak about one a week. However, since it is a very visible UI error and requires manual tweaking in the database to fix, it is giving Zabbix a bit of a black eye here.
          Last edited by lamont; 25-06-2009, 18:43.
