Zabbix 1.8 hates Oracle (as a backend)

  • untergeek
    Senior Member
    Zabbix Certified Specialist
    • Jun 2009
    • 512

    #1

    Having perused the code (disclaimer: I'm no coder, but I can follow code after a fashion), it is apparent that Zabbix 1.8 uses a single DB syncer process. This is probably no problem for MySQL when it's set up as a local database, but what happens when the database is remote, across a network?

    At any rate, we've finally added more hosts than Zabbix can handle. Our database is capable of much more, but apparently Zabbix on the SPARC architecture (running in 32 bits, no less) with an Oracle backend is just not going to keep up.

    Number of items (monitored/disabled/not supported): 15178 (15084 / 1 / 93)

    Code:
    29188:20100408:155639.349 DB syncer spent 0.000124 second while processing 0 items. Nextsync after 5 sec.
    29188:20100408:155644.353 DB syncer spent 0.000087 second while processing 0 items. Nextsync after 5 sec.
    29188:20100408:155649.354 DB syncer spent 0.000078 second while processing 0 items. Nextsync after 5 sec.
    29188:20100408:155658.826 DB syncer spent 4.472079 second while processing 186 items. Nextsync after 5 sec.
    29188:20100408:155729.525 DB syncer spent 25.698532 second while processing 1000 items. Nextsync after 5 sec.
    29188:20100408:155827.152 DB syncer spent 52.626974 second while processing 2000 items. Nextsync after 5 sec.
    29188:20100408:160041.239 DB syncer spent 129.086294 second while processing 7000 items. Nextsync after 5 sec.
    29188:20100408:160427.822 DB syncer spent 221.582604 second while processing 11000 items. Nextsync after 4 sec.
    29188:20100408:161048.093 DB syncer spent 376.270834 second while processing 21000 items. Nextsync after 4 sec.
    29188:20100408:161820.495 DB syncer spent 448.400870 second while processing 29000 items. Nextsync after 4 sec.

    In other words, we're underwater. The "queue" backs up within minutes of start-up: the history write cache fills up quickly, and the DB syncer seems unable to send data to our database fast enough to keep our chin above water.

    Questions from my team include, "Why is this single-threaded (a single process)? Why isn't it parallelized?" I understand the problems with data concurrency, but they have a valid point. Is there any way we can speed this along, or spawn multiple parallel processes? What can be done?

    I understand that MySQL is preferred, but why wouldn't it have just as hard a time? We're about to attempt a move to a Linux/Intel front end, as we fear that the slower (but optimized for multi-threading) design of the SPARC boxes has something to do with this.

    Our problem really does seem to be that Zabbix can't write to our DB fast enough to keep up with the stream of data we are trying to monitor, at least with Oracle on the backend and a Sun T5120 on the server side (8 cores, 16G of RAM).

    Any suggestions?
  • Alexei
    Founder, CEO
    Zabbix Certified Trainer
    Zabbix Certified Specialist
    Zabbix Certified Professional
    • Sep 2004
    • 5654

    #2
    Originally posted by untergeek
    Our problem really does seem to be that Zabbix can't write to our DB fast enough to keep up with the stream of data we are trying to monitor, at least with Oracle on the backend and a Sun T5120 on the server side (8 cores, 16G of RAM).

    Any suggestions?
    Zabbix does exactly the same processing for all types of back-end databases. If you say that Zabbix works fast under MySQL but is very slow under Oracle, it basically means that Oracle (settings, configuration, whatever) is to blame. There is nothing Oracle-specific in Zabbix code.

    Yes, we do not use Oracle-specific tweaks, but Zabbix server should demonstrate performance with Oracle at least comparable to, if not better than, with MySQL, as Oracle theoretically scales much better. Your DB hardware is very nice; I would do some Oracle tuning.
    Alexei Vladishev
    Creator of Zabbix, Product manager
    New York | Tokyo | Riga
    My Twitter


    • untergeek

      #3
      Our DB hardware is nicer than that! That's the spec for our Zabbix server. In spite of the nice hardware, Zabbix can't write out fast enough to keep up.

      I suppose I should have stated that we have these numbers:

      Number of hosts (real, not templates): 342
      Number of triggers: 4320
      Required server performance, new values per second: 248.45277777778

      Shouldn't Oracle be able to keep up with that? I would expect so. The problem, according to our DBA, is not that Oracle is incapable, but that Zabbix isn't sending fast enough. Zabbix also sends plain-text SQL statements instead of binary-formatted ones; there is a cost to that (a topic for another thread), but our hardware should be capable regardless.

      What seems to be happening is that Zabbix (on OUR SPARC hardware) can't process all of the entries fast enough, because the DB syncer is single-threaded, and single-threaded performance on our SPARC hardware is nowhere near that of newer (or even older) Intel hardware.

      Also of note: the recommended setup is MySQL running locally, with Zabbix connecting through a socket instead of via TCP. Shouldn't that also give better theoretical performance than network connections?

      I guess our questions are as follows:
      1. Can the number of DB Syncers be increased? We see this option in server.c
        Code:
        int     CONFIG_DBSYNCER_FORKS           = 1;
        and would love to increase it, but would not try without some understanding of what that would do.
      2. Would migrating to Intel hardware running Linux (Ubuntu) increase our throughput, even if we stay with Oracle on the backend?
      3. If there are ways to increase performance of the millions of writes we're doing to Oracle, what kind of tweaks have been or are being used by others? Do you have any ideas?


      We love Zabbix. We're just disappointed that it can't handle the load we're trying to throw at it as presently configured. We're not abandoning Zabbix. We're just trying to learn how to make this work right.


      • untergeek

        #4
        Problem solved in an unsupported way.

        We set this in server.c and it works like a champ with Oracle. (I can't recommend this to anyone using any other database: Oracle is designed to take care of some of these things so you don't have to, and I don't think the other SQL databases can.)

        Code:
        int     CONFIG_DBSYNCER_FORKS           = 12;
        Yes, we tried it, and it worked. Not only did it work, it worked well. I tested this with as many as 100 syncers and as few as 5 (1 being the default).

        With 100 syncers we started to run into collisions on the IDS table; in fact, with the number of items we have, that happened with as few as 25. 50 was okay in our certification environment, which has only 109 values per second (as reported on the Dashboard). Our production environment has the items described above, and we had no collisions with 12 syncers. This is working so well we're surprised beyond measure.

        I understand that this is not supported or recommended. It works well for us and thought you should know.

        Of 13852 captured lines containing "DB syncer spent x.xx second while processing y items":
        1145 are greater than 2 seconds (approximately 8.27%)
        1290 are between 1 and 2 seconds (approximately 9.31%)
        11417 are less than 1 second (approximately 82.42%)

        The top 10 longest syncs:
        31.669499
        31.321584
        30.168350
        29.419622
        27.699291
        27.088114
        23.160533
        21.065184
        19.959127
        19.725157

        The top 10 shortest syncs:
        0.000077
        0.000077
        0.000077
        0.000077
        0.000077
        0.000077
        0.000077
        0.000077
        0.000077
        0.000077

        We are not using DebugLevel=4 any more, but will continue to monitor the progress of this for the coming week.


        • Alexei

          #5
          This is absolutely not a recommended way of speeding up Zabbix. It may lead to collisions in the IDS table regardless of the number of syncers, as well as other issues (loss of ordering for historical data, etc.).

          A number of significant improvements to the logic of the existing database cache are on the way, and it is very likely that they will be released within 3-4 weeks. Please be patient, and meanwhile be careful when trying unofficial hacks.


          • untergeek

            #6
            Alexei,

            I hope the changes work. Unfortunately, I think our problem is inherent to our architecture. Meanwhile, we are evaluating this carefully; we are not taking it lightly. I believe the only reason we're succeeding is that we're using Oracle, which processes things in a different enough manner that it should succeed in this way.

            So far the data is good. We'll keep you posted.


            • Alexei

              #7
              Great, please keep me updated.


              • untergeek

                #8
                It's been a few weeks so far and it's been running well.

                I did notice this line in the Changelog:

                - [ZBXNEXT-325] added StartDBSyncers parameter for parallel writing to DB (Sasha)

                This is excellent news! I am looking forward to official support for this in 1.8.3!

                Out of curiosity, did my experience have anything to do with this or was it in the pipes already?


                • Alexei

                  #9
                  Originally posted by untergeek
                  Out of curiosity, did my experience have anything to do with this or was it in the pipes already?
                  It was already in our pipeline. An incomplete list of planned performance related improvements can be found here: https://support.zabbix.com/browse/ZBXNEXT-318

