Zabbix became slow after upgrading to 1.4.2

  • eli.stair
    Junior Member
    • May 2006
    • 20

    #31
    Tracking/optimizing slow queries

    I'd suggest turning on the slow query log and determining, from the query text, where those queries are being generated (i.e. from the PHP interface, or from which part of the C sources) and for what operation. Your statement is ambiguous: are you seeing runtimes of 2000+ seconds, or 2000+ slow queries over some period of time?

    It's a good goal, but not necessarily a good bet that you can get rid of all slow queries, since they are affected by so many factors (aside from the fact that "slow" is a user-defined threshold). First off, you have to take the time to tune your operating system, IO subsystem, and database server for the available hardware. At that point you're in a better state no matter what queries are thrown at the system, and it becomes much easier to discern the cause of any single slow query.

    I've found several instances of misbehaving queries that could be optimized, and the Zabbix crew resolved those in the next dot-release after I gave them details on the running query, the circumstances involved, and what was generating it. I've seen a great improvement, but there is obviously still room for more. I'm working on tracking down further queries that can be tuned, and any assistance you can give benefits us all.

    Currently the biggest violators I see continue to be GUI operations on templates, which appear to perform far more queries/updates than necessary against every host and related table.

    /eli


    • Alexei
      Founder, CEO
      Zabbix Certified Trainer
      Zabbix Certified Specialist
      Zabbix Certified Professional
      • Sep 2004
      • 5654

      #32
      Originally posted by eli.stair
      Currently the biggest violators I see continue to be GUI operations on templates, which appear to perform far more queries/updates than necessary against every host and related table.
      Last week we fixed extremely inefficient code related to the manipulation of template triggers. Perhaps this is exactly the problem you noticed.
      Alexei Vladishev
      Creator of Zabbix, Product manager
      New York | Tokyo | Riga
      My Twitter


      • eli.stair
        Junior Member
        • May 2006
        • 20

        #33
        Code optimizations

        That's one, thanks Alexei! The biggest query issue currently is the way template operations in the GUI recurse through the host/item/other tables. For example, changing the status of a template (disabling/adding/modifying an item) that has hundreds of attached hosts can take tens of minutes, or much longer, and may end incomplete or with errors. I haven't yet dug deep enough into this issue again (re-enabling the all-query log, checking the execution time of each query, writing smarter non-loop operations) to propose a code fix.

        1) ---
        The underlying problem is that ALL queries are issued serially, each with a limited scope. This causes huge, multiple, redundant loops against a single matching host/item at a time, instead of querying the necessary fields on ALL impacted hosts once, caching the output, and then taking action. Similarly, the INSERT/UPDATE queries are too serialized and could be batched (or wrapped in a transaction) at the end of the entire process. Essentially, the current PHP code is perfectly functional for small installed bases, but the high number of simple iterative loops makes these operations highly inefficient. Basically, the cost of this logic grows linearly with the number of hosts, and that doesn't work well for my site.
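        To make the "query once, cache, then act" idea concrete, here is a minimal Python/sqlite3 sketch contrasting one SELECT per host with a single SELECT covering all impacted hosts. The two-table schema and the function names are hypothetical simplifications, not the real Zabbix schema or the items.inc.php code.

```python
import sqlite3

def make_db(n_hosts):
    """Build a toy in-memory database (hypothetical, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE hosts (hostid INTEGER PRIMARY KEY, host TEXT);
        CREATE TABLE items (itemid INTEGER PRIMARY KEY, hostid INTEGER,
                            key_ TEXT, status INTEGER);
    """)
    for hostid in range(1, n_hosts + 1):
        conn.execute("INSERT INTO hosts VALUES (?, ?)", (hostid, f"host{hostid}"))
        conn.execute("INSERT INTO items (hostid, key_, status) VALUES (?, ?, 0)",
                     (hostid, "custom.cpu.cs"))
    conn.commit()
    return conn

def fetch_serial(conn, hostids, key):
    """Loop-per-host pattern: one SELECT round trip for every host."""
    cache, queries = {}, 0
    for hostid in hostids:
        queries += 1
        cache[hostid] = conn.execute(
            "SELECT itemid, status FROM items WHERE hostid = ? AND key_ = ?",
            (hostid, key)).fetchone()
    return cache, queries

def fetch_batched(conn, hostids, key):
    """Fetch the needed fields for ALL impacted hosts once, then cache them."""
    placeholders = ",".join("?" * len(hostids))
    rows = conn.execute(
        "SELECT hostid, itemid, status FROM items "
        f"WHERE key_ = ? AND hostid IN ({placeholders})",
        (key, *hostids)).fetchall()
    return {hostid: (itemid, status) for hostid, itemid, status in rows}, 1
```

        Both functions build the same cache; only the number of round trips differs (one per host versus one in total). For very large host lists the IN (...) list would need to be chunked, since databases cap the number of bound parameters per statement.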

        The best examples of these serialization problems come from the iterations in the following include/items.inc.php functions and the way they are called (serially, looped individually) from items.php: delete_item (l. 731), update_item (l. 353), smart_update_item (l. 468), and others.

        I believe the highest-cost factor currently is the use of update_item(). Even smart_update_item() is issued in a completely serialized fashion, in a loop from items.php, and it appears to return the result of calling update_item(), with all of its inherent while loops! After a cursory review (I'm happy to be corrected), it actually looks to me like any change to an item in a template indirectly results in at least one full query of all values for all items, then an UPDATE of all values for that item, for ALL hosts linked to the template. All of this is completely serialized, so it takes a long while (the single-query execution time multiplied by the number of queries) and makes no use of the optimizations the database can perform.

        2) ---
        Operations on templates with hundreds of hosts have caused me grief in the database: lingering entries from deleted hosts, with triggers/items that remain. I'm unsure of the root cause, whether it's failed/timed-out operations executed through the browser or removals that weren't thorough. These become apparent when you perform another operation that acts on the template, since the orphaned host/item is still present and matches the template. Worse, there is currently no database lint tool to detect this cruft.
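        The kind of lint check described above could start from anti-join queries like these. This is a minimal sketch on a hypothetical two-table schema (the hostid 10067 echoes the error output below); a real check against Zabbix would need the actual table and column names for items, triggers, and their link tables.

```python
import sqlite3

# Each check lists rows whose parent row no longer exists (hypothetical schema).
ORPHAN_CHECKS = {
    "items without host": """
        SELECT i.itemid FROM items i
        LEFT JOIN hosts h ON h.hostid = i.hostid
        WHERE h.hostid IS NULL ORDER BY i.itemid""",
    "items linked to missing template item": """
        SELECT i.itemid FROM items i
        LEFT JOIN items t ON t.itemid = i.templateid
        WHERE i.templateid IS NOT NULL AND t.itemid IS NULL ORDER BY i.itemid""",
}

def lint(conn):
    """Run every orphan check; return {check name: [offending itemids]}."""
    return {name: [row[0] for row in conn.execute(sql)]
            for name, sql in ORPHAN_CHECKS.items()}

def demo_db():
    """Small illustrative database with two deliberately orphaned items."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE hosts (hostid INTEGER PRIMARY KEY, host TEXT);
        CREATE TABLE items (itemid INTEGER PRIMARY KEY, hostid INTEGER,
                            key_ TEXT, templateid INTEGER);
        INSERT INTO hosts VALUES (1, 'Template_Linux'), (2, 'web01');
        INSERT INTO items VALUES
            (1, 1, 'custom.cpu.cs', NULL),   -- template item
            (2, 2, 'custom.cpu.cs', 1),      -- healthy linked copy
            (3, 10067, 'custom.cpu.cs', 1),  -- host 10067 was deleted
            (4, 2, 'custom.cpu.cs', 99);     -- template item 99 was deleted
    """)
    return conn
```

        The same LEFT JOIN ... IS NULL pattern extends to any child table whose parent key may have been deleted, so a lint tool could simply be a list of such queries.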

        The scariest part of all is the potential for serious table-data corruption (effectively breaking Zabbix) if an operation initiated by an admin through the PHP interface is interrupted. My only suggestion to solve this is to move away from the current method of long-running PHP scripts, which must complete successfully for things to work... and instead pre-generate all the SQL queries that are part of the batch operation up front, store those in a "batch" table, then have the Zabbix daemon or something similar run them in the background in a transaction-safe manner.
        ---
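        The batch-table idea could look roughly like this minimal Python/sqlite3 sketch: the front end only stores the pre-generated statements, and a separate worker applies each batch in one transaction, marking it done or failed. The `batches`/`batch_ops` tables and the function names are hypothetical; storing parameters as JSON is just one possible choice.

```python
import json
import sqlite3

def init_queue(conn):
    """Create the hypothetical batch tables."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS batches (
            batch_id INTEGER PRIMARY KEY,
            status TEXT NOT NULL DEFAULT 'pending');
        CREATE TABLE IF NOT EXISTS batch_ops (
            batch_id INTEGER, seq INTEGER, stmt TEXT, params TEXT,
            PRIMARY KEY (batch_id, seq));
    """)

def enqueue(conn, statements):
    """Front end: store every pre-generated (sql, params) pair, then return at once."""
    with conn:
        batch_id = conn.execute("INSERT INTO batches DEFAULT VALUES").lastrowid
        conn.executemany(
            "INSERT INTO batch_ops VALUES (?, ?, ?, ?)",
            [(batch_id, seq, sql, json.dumps(params))
             for seq, (sql, params) in enumerate(statements)])
    return batch_id

def run_pending(conn):
    """Background worker: apply each pending batch all-or-nothing."""
    pending = [row[0] for row in conn.execute(
        "SELECT batch_id FROM batches WHERE status = 'pending' ORDER BY batch_id")]
    for batch_id in pending:
        ops = conn.execute("SELECT stmt, params FROM batch_ops "
                           "WHERE batch_id = ? ORDER BY seq", (batch_id,)).fetchall()
        try:
            with conn:  # one transaction per batch; rolled back on any error
                for sql, params in ops:
                    conn.execute(sql, json.loads(params))
                conn.execute("UPDATE batches SET status = 'done' WHERE batch_id = ?",
                             (batch_id,))
        except sqlite3.Error:
            with conn:
                conn.execute("UPDATE batches SET status = 'failed' WHERE batch_id = ?",
                             (batch_id,))
```

        A batch that fails partway leaves no partial changes behind and is simply flagged, so an admin can inspect or retry it instead of discovering half-updated hosts later.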

        Here is snipped output from a change made to a template, showing items that point to deleted hostids erroring out:

        # No host with hostid=[10067]
        # Item ':custom.cpu.cs' updated
        # No host with hostid=[10068]
        # Item ':custom.cpu.cs' updated
        # No host with hostid=[10069]
        # Item ':custom.cpu.cs' updated
        # No host with hostid=[10070]
        # Item ':custom.cpu.cs' updated
        # No host with hostid=[10071]
        # Item ':custom.cpu.cs' updated
        # No host with hostid=[10072]
        # Item ':custom.cpu.cs' updated
        # No host with hostid=[10073]
        # Item ':custom.cpu.cs' updated
        # No host with hostid=[10074]
        # Item ':custom.cpu.cs' updated
        # No host with hostid=[10075]
        # Item ':custom.cpu.cs' updated
        # No host with hostid=[10076]
        # Item ':custom.cpu.cs' updated


        • Alexei
          Founder, CEO
          Zabbix Certified Trainer
          Zabbix Certified Specialist
          Zabbix Certified Professional
          • Sep 2004
          • 5654

          #34
          Originally posted by eli.stair
          My only suggestion to solve this is to move away from the current method of long-running PHP scripts, which must complete successfully for things to work... and instead pre-generate all the SQL queries that are part of the batch operation up front, store those in a "batch" table, then have the Zabbix daemon or something similar run them in the background in a transaction-safe manner.
          This is not necessary. Transactions can be (and will be) implemented on the PHP side as well, making all batch operations immune to all sorts of interrupts.
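          What PHP-side transactions buy can be sketched in a few lines of Python/sqlite3: the whole per-host loop runs inside one transaction, so an interruption partway through leaves no half-applied change. `RequestAborted` and `apply_template_change` are hypothetical names; on MySQL this only protects tables on a transactional engine such as InnoDB, since MyISAM tables ignore transactions.

```python
import sqlite3

class RequestAborted(Exception):
    """Hypothetical stand-in for a browser disconnect or script timeout."""

def apply_template_change(conn, hostids, new_delay, abort_after=None):
    """Update every linked host inside ONE transaction (all or nothing)."""
    with conn:  # commit at the end, or roll back if anything raises
        for done, hostid in enumerate(hostids):
            if abort_after is not None and done == abort_after:
                raise RequestAborted(f"interrupted after {done} hosts")
            conn.execute("UPDATE items SET delay = ? WHERE hostid = ?",
                         (new_delay, hostid))
```

          If the process is killed outright rather than raising an exception, the database server discards the uncommitted transaction when the connection closes, with the same net effect. Note that this fixes atomicity, not speed: the loop is still serial.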
