Segregating Zbx Svr db and UI db workloads

  • maheshme973
    Junior Member
    • Jul 2011
    • 9

    #1

    Segregating Zbx Svr db and UI db workloads

    Our volumes are growing rapidly; we are beginning to see DB (MySQL) issues due to the UI workload interfering with the Zabbix Server workload on the DB.

    My initial attempt was to create a MySQL slave configured just for UI tasks. It worked for some time.

    But after an hour or so, MySQL replication started breaking: duplicate primary keys were being reported for tables like events, auditlog, etc.
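
    For reference, this is roughly how the breakage showed up on the slave (output trimmed; the values are illustrative, not the exact log):

      mysql> SHOW SLAVE STATUS\G
      ...
      Slave_SQL_Running: No
             Last_Errno: 1062
             Last_Error: Error 'Duplicate entry '12345' for key 'PRIMARY''
                         on query 'insert into events ...'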

    So, clearly, "just viewing" dashboards from the UI also generates data that is likely to conflict with data generated by the Zbx server.

    Is there a fool-proof way of making sure UI load doesn't interfere with Zabbix Server load on the DB?
  • BDiE8VNy
    Senior Member
    • Apr 2010
    • 680

    #2
    I once set up a configuration that directed DML to the master DB and read queries to the hot-standby DB. I used pgpool-II for that, which is PostgreSQL-only; maybe there's something similar for MySQL.
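
    The relevant part of pgpool.conf looked roughly like this (a sketch from memory; host names are placeholders):

      # pgpool.conf -- writes go to the primary, SELECTs are load-balanced
      master_slave_mode = on            # primary/standby operation
      master_slave_sub_mode = 'stream'  # PostgreSQL streaming replication
      load_balance_mode = on            # distribute read queries

      backend_hostname0 = 'pg-master'   # placeholder
      backend_port0 = 5432
      backend_weight0 = 0               # keep reads off the master

      backend_hostname1 = 'pg-standby'  # placeholder
      backend_port1 = 5432
      backend_weight1 = 1               # all balanced reads go here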

    I eventually reverted that setup: immediately after configuration changes were made in the frontend (directed to the master DB), they were queried back to show the current status (directed to the hot-standby).
    Since the data was sometimes not replicated that fast, the frontend showed the previous setting. That was quite confusing to users.

    Besides consulting commercial support, you could think about creating your own dashboard.

    However, what kind of interference do you observe exactly?

    Edit:
    Well, the issue I described with pgpool-II was due to the fact that I was using asynchronous replication. The problem would not have appeared with synchronous replication.
    Last edited by BDiE8VNy; 27-09-2015, 12:36.


    • maheshme973
      Junior Member
      • Jul 2011
      • 9

      #3
      By interference I meant performance issues. The DB as well as the Zbx servers became sluggish every now and then. There were numerous lock waits in the DB, and several of the Zbx trapper processes would stop accepting new connections (netstat showed them in the CLOSE_WAIT state).

      The issues we were seeing are described here: https://www.zabbix.com/forum/showthread.php?t=49732

      At present, the UI is running on a separate server but points to the same DB instance as the Zbx server. As part of the change, we also upgraded the MySQL disks to SSDs.

      For the last 2 days we have not seen any issues. It is difficult to tell whether the UI load separation did the trick or it was just the SSDs. If it's the latter, I guess we will start seeing issues again when the workload increases further.
      --------------------------------------------------------------------------------------------------
      BTW, I like your approach. It's a shame you ran into issues. I will start looking for the MySQL equivalent of what you did.

      About my approach:

      I was seeing MySQL replication errors on the auditlog, profiles, triggers, events, and items tables.

      This means these tables were being updated on both MySQL servers.

      I thought about this a little (after my previous post) and here are my hypotheses:

      - I think the auditlog and profiles tables may genuinely be updated by both the UI and the Zbx server. The way around this could be to not replicate these tables (see the sketch after this list). Omitting them may not cause any functional issues, other than the fact that the information will be spread over both servers.

      - I think the triggers, events, and items errors are artificial: I suspect my colleague was sending her UI transactions to the master MySQL instance (by using the wrong IP address in the browser) while my UI transactions were going to the slave. Anyway, this is just a theory; I will test it further when I have the time.
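
      A minimal sketch of the table exclusion, assuming the schema is named zabbix (untested):

        # my.cnf on the slave -- skip replicating the UI-written tables
        [mysqld]
        replicate-ignore-table = zabbix.auditlog
        replicate-ignore-table = zabbix.profiles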

      If my analysis is right, my approach deserves another chance. I will post an update if I get round to doing it.

      I also realize my earlier approach may fall short functionally, because any configuration change I make from the UI will need to end up on the master MySQL. For that I may have to resort to a master-master setup where both instances replicate from each other. That adds a whole new dimension of complexity!
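
      If I do try master-master, I would at least set the usual auto-increment offsets so the two instances cannot generate the same primary keys (a generic MySQL sketch, not Zabbix-specific advice):

        # my.cnf on instance A
        [mysqld]
        server-id = 1
        log-bin = mysql-bin
        auto_increment_increment = 2  # both masters step by 2
        auto_increment_offset = 1     # A generates odd ids

        # instance B would use server-id = 2 and
        # auto_increment_offset = 2 (even ids)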
      Last edited by maheshme973; 25-09-2015, 22:48.


      • maheshme973
        Junior Member
        • Jul 2011
        • 9

        #4
        I installed mysql-proxy to separate reads and writes. It works, but I am still seeing several read queries go to the master database.

        My guess is that these read queries are inside transactions.
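
        For the record, I am running it with the bundled read/write splitting script, roughly like this (host names are placeholders; the script path may differ per distro):

          # writes go to the first backend, reads to the read-only one
          mysql-proxy \
            --proxy-backend-addresses=db-master:3306 \
            --proxy-read-only-backend-addresses=db-slave:3306 \
            --proxy-lua-script=/usr/share/mysql-proxy/rw-splitting.lua

        As far as I can tell, rw-splitting.lua pins a session to the master for the duration of a transaction, which would explain the reads I am seeing there.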


        • Oscar Garcia
          Junior Member
          • Oct 2015
          • 6

          #5
          Success using Percona Cluster Server and 3 nodes

          Hello, we had success using Percona Cluster Server with 3 nodes in a test environment, and in the end we got a better machine (a quicker CPU with more cores, and more RAM).

          The write bottleneck was still there (all three nodes continuously write data to RAID 1 disks), but reads from the UI are load-balanced between the nodes using Linux Director, reducing overall CPU usage (our previous bottleneck).
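
          The balancing itself is plain LVS; roughly like this (the VIP and node addresses are placeholders):

            # virtual MySQL service on the VIP, round-robin between readers
            ipvsadm -A -t 192.0.2.10:3306 -s rr
            # the three Percona nodes as real servers (NAT forwarding)
            ipvsadm -a -t 192.0.2.10:3306 -r 192.0.2.1:3306 -m
            ipvsadm -a -t 192.0.2.10:3306 -r 192.0.2.2:3306 -m
            ipvsadm -a -t 192.0.2.10:3306 -r 192.0.2.3:3306 -m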

          We experienced abnormal behavior with zabbix_sender from remote hosts: sometimes there is a long delay (5+ minutes) between sending data and it showing up in Latest data. I don't know if it is a Percona cluster issue or a Zabbix 2.2 issue. I think the problem is still there; even with the newer server, the data is sometimes delayed.

          We had another issue related to clock time differences, but it was solved by installing and configuring NTP (our firewall filters external NTP sources) on the 3 DB nodes, the 3 nginx/php-fpm nodes, and the zabbix_server nodes.
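
          Since external NTP is filtered, every node just syncs against an internal source; the ntp.conf boils down to something like this (the internal host name is a placeholder):

            # /etc/ntp.conf -- internal time source only
            server ntp.internal.example iburst
            # local clock as a last resort if the source is unreachable
            server 127.127.1.0
            fudge  127.127.1.0 stratum 10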

          Today we are using distributed monitoring:
          * 4 environments (overall services, production, pre-production, and development).
          * Every environment has an independent Zabbix server and database node.
          * All 4 environments (plus the old one, still being migrated) are accessed through a 3-node nginx/php-fpm cluster.

          The old environment had 965 hosts and ~90 items per second (with a severe IO and CPU bottleneck); the newest one has 560 hosts and ~40 items per second (increasing every day, but with lots of CPU/RAM resources free).

          More than 80% of our monitoring is based on external scripts sending data with zabbix_sender to localhost, so I think the items-per-second calculations are wrong in both environments.

          Regards.
