History Syncer 100%

  • sb195c@att.com
    Member
    • Oct 2017
    • 41

    #1


    I'm having a problem where my history syncer processes go to 100% busy at random times. When this occurs, they seem to be in a state where they can't catch up, so the history cache fills; at some point they either eventually recover, or I have to kill the Zabbix process entirely. I have everything tuned as optimally as possible (everything runs on its own device, plenty of hardware, MySQL tuned, etc.). I have the large tables partitioned and have disabled housekeeping for history/trends. I don't believe it's housekeeping, because when it occurs the housekeeper has not run. I immediately see slow queries, both SELECT and UPDATE, in zabbix_server.log. Are there any debugging steps I can take to gather more information from Zabbix about what may be causing this? I am on release 3.4.7.
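One way to get more out of the slow-query lines is to group them by process ID and by statement/table, then match the busiest PIDs against `ps` output (Zabbix normally sets process titles such as `history syncer #1 [...]`). This is a minimal sketch that assumes the slow-query line shape `<pid>:<YYYYMMDD>:<HHMMSS.mmm> slow query: <seconds> sec, "<sql>"`; check that against your own zabbix_server.log first. `classify` and `summarize_slow_queries` are hypothetical helper names, not part of Zabbix.

```python
import re
import sys
from collections import Counter, defaultdict

# Assumed zabbix_server.log slow-query shape (verify against your own log):
#   <pid>:<YYYYMMDD>:<HHMMSS.mmm> slow query: <seconds> sec, "<sql>"
SLOW_RE = re.compile(
    r'^\s*(?P<pid>\d+):\d{8}:\d{6}\.\d{3} '
    r'slow query: (?P<secs>[\d.]+) sec, "(?P<sql>.*?)"?\s*$'
)


def classify(sql):
    """Reduce a statement to '<verb> <table>', e.g. 'update items'."""
    words = [w.lower() for w in sql.replace(",", " ").split()]
    if not words:
        return "unknown"
    verb, table = words[0], "?"
    if verb == "update" and len(words) > 1:
        table = words[1]
    else:
        # SELECT/DELETE name the table after FROM, INSERT after INTO
        for kw in ("from", "into"):
            if kw in words:
                i = words.index(kw)
                if i + 1 < len(words):
                    table = words[i + 1]
                break
    return f"{verb} {table}"


def summarize_slow_queries(lines):
    """Count slow queries per PID, and per statement kind as (count, worst seconds)."""
    per_pid = Counter()
    per_kind = defaultdict(list)
    for line in lines:
        m = SLOW_RE.match(line)
        if not m:
            continue  # not a slow-query line
        per_pid[m.group("pid")] += 1
        per_kind[classify(m.group("sql"))].append(float(m.group("secs")))
    return per_pid, {k: (len(v), max(v)) for k, v in per_kind.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        pids, kinds = summarize_slow_queries(f)
    for pid, n in pids.most_common(10):
        print(f"pid {pid}: {n} slow queries")
    for kind, (n, worst) in sorted(kinds.items(), key=lambda kv: -kv[1][0]):
        print(f"{kind}: {n} queries, worst {worst:.1f} s")
```

Run it as `python3 slowq.py /var/log/zabbix/zabbix_server.log` (the script name is arbitrary). If most slow statements come from the history syncer PIDs and hit the same table, that narrows down where to look on the DB side.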

  • vesper1978
    Member
    • Nov 2016
    • 59

    #2
    How many history syncer processes are you running? You may need to have more. However, if you're seeing slow queries, you may have a DB performance issue. How large are your history_* tables? What kind of resources does your DB server have?


    • sb195c@att.com
      Member
      • Oct 2017
      • 41

      #3
      Each of these runs within VMware vSphere, running Ubuntu 16.04.3 LTS.

      Status of Zabbix
      Parameter | Value | Details
      Zabbix server is running | Yes | wtc2zasv01:10051
      Number of hosts (enabled/disabled/templates) | 979 | 886 / 4 / 89
      Number of items (enabled/disabled/not supported) | 382546 | 382072 / 252 / 222
      Number of triggers (enabled/disabled [problem/ok]) | 172036 | 171839 / 197 [150 / 171689]
      Number of users (online) | 11 | 2
      Required server performance, new values per second | 2062.4 |

      ZABBIX SERVER
      4 CPU
      32 GB RAM
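For a rough sense of scale for the daily history partitions: if every value counted in "Required server performance" ends up in a history table (an assumption; items that keep no history reduce this), the daily row count is simply NVPS × 86,400. The helper name below is illustrative only.

```python
SECONDS_PER_DAY = 86_400


def history_rows_per_day(nvps):
    """Approximate history rows inserted per day at a steady `nvps` rate."""
    return nvps * SECONDS_PER_DAY


# 2062.4 is the "Required server performance" figure from the status page above
print(f"{history_rows_per_day(2062.4):,.0f} rows/day")  # prints 178,191,360 rows/day
```

That is on the order of 178 million rows per day spread across the history_* tables, all of which pass through the history syncers.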

      root@wtc2zasv01:/etc/zabbix# cat zabbix_server.conf
      LogFile=/var/log/zabbix/zabbix_server.log
      LogFileSize=0
      PidFile=/var/run/zabbix/zabbix_server.pid
      SocketDir=/var/run/zabbix
      DBHost = wtc2zadb02
      DBName=zabbix
      DBUser=zabbix
      DBPassword = ########
      StartPollers = 10
      StartPollersUnreachable = 10
      StartPingers = 10
      StartDiscoverers = 10
      SNMPTrapperFile=/var/log/snmptrap/snmptrap.log
      CacheSize = 2G
      HistoryCacheSize = 256M
      HistoryIndexCacheSize = 256M
      TrendCacheSize = 256M
      ValueCacheSize = 2G
      Timeout=4
      AlertScriptsPath=/usr/lib/zabbix/alertscripts
      ExternalScripts=/usr/lib/zabbix/externalscripts
      FpingLocation=/usr/bin/fping
      Fping6Location=/usr/bin/fping6
      #LogSlowQueries=3000
      LogSlowQueries=10000
      StartProxyPollers=20
      ProxyConfigFrequency = 300
      ProxyDataFrequency = 1
      StartDBSyncers = 24
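To relate this config to the question about how many history syncers are running, here is a small sketch that parses the `Key = value` lines above (spaces around `=` tolerated, `#` lines skipped), totals the explicitly set `Start*` process counts, and divides the status page's NVPS by `StartDBSyncers`. It is an even-load approximation only and says nothing about how long each syncer waits on the database. `parse_zabbix_conf` and `start_processes` are hypothetical names.

```python
def parse_zabbix_conf(text):
    """Parse 'Key=value' / 'Key = value' lines, ignoring blanks and comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf


def start_processes(conf):
    """Return only the explicitly configured Start* process counts."""
    return {k: int(v) for k, v in conf.items() if k.startswith("Start")}


# Excerpt of the zabbix_server.conf posted above
SAMPLE = """
StartPollers = 10
StartPollersUnreachable = 10
StartPingers = 10
StartDiscoverers = 10
#LogSlowQueries=3000
LogSlowQueries=10000
StartProxyPollers=20
StartDBSyncers = 24
"""

procs = start_processes(parse_zabbix_conf(SAMPLE))
print(sum(procs.values()))                         # 84 explicitly configured processes
print(round(2062.4 / procs["StartDBSyncers"], 1))  # 85.9 values/sec per syncer, if evenly spread
```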

      DB SERVER
      12 CPU
      64 GB RAM
      DB is replicating to a hot standby using Master-Master replication
      Housekeeper is disabled for History and Trends
      All large tables are partitioned (history daily, trends monthly)

      /var/lib/mysql - is on an EMC VMax storage SAN with 2TB of disk allocated
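For readers unfamiliar with this setup: daily partitioning of a Zabbix history table is commonly done as RANGE partitioning on the integer `clock` column (UNIX seconds). The poster's actual partitioning script isn't shown; this is a hypothetical helper that emits the `ADD PARTITION` statement for one day, assuming the table is already `PARTITION BY RANGE (clock)`, boundaries fall at midnight UTC, and there is no `MAXVALUE` partition (MySQL only lets `ADD PARTITION` extend the high end of a range).

```python
from datetime import date, datetime, timedelta, timezone


def daily_partition_ddl(table, day):
    """Build an ADD PARTITION statement whose partition ends at the next UTC midnight."""
    # Rows with clock < midnight (UTC) of the following day land in this partition
    next_midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc) + timedelta(days=1)
    bound = int(next_midnight.timestamp())
    return (f"ALTER TABLE {table} ADD PARTITION "
            f"(PARTITION p{day:%Y%m%d} VALUES LESS THAN ({bound}));")


if __name__ == "__main__":
    # Pre-create the next three days for two of the history tables
    start = date(2018, 4, 12)
    for table in ("history", "history_uint"):
        for offset in range(3):
            print(daily_partition_ddl(table, start + timedelta(days=offset)))
```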

      mysqld.cnf

      [mysqld_safe]
      socket = /var/run/mysqld/mysqld.sock
      nice = 0

      [mysqld]
      #
      # * Basic Settings
      #
      user = mysql
      pid-file = /var/run/mysqld/mysqld.pid
      socket = /var/run/mysqld/mysqld.sock
      port = 3306
      basedir = /usr
      datadir = /var/lib/mysql
      tmpdir = /tmp
      lc-messages-dir = /usr/share/mysql
      skip-external-locking
      skip-name-resolve
      #
      # Instead of skip-networking the default is now to listen only on
      # localhost which is more compatible and is not less secure.
      #bind-address = 127.0.0.1
      #
      # * Fine Tuning
      #
      key_buffer_size = 16M
      max_allowed_packet = 64M
      thread_stack = 192K
      thread_cache_size = 8
      # This replaces the startup script and checks MyISAM tables if needed
      # the first time they are touched
      myisam-recover-options = BACKUP
      #max_connections = 100
      #table_cache = 64
      #thread_concurrency = 10
      #
      # * Query Cache Configuration
      #
      query_cache_limit = 1M
      query_cache_size = 0
      query_cache_type = 0

      #
      # * Logging and Replication
      #
      # Both location gets rotated by the cronjob.
      # Be aware that this log type is a performance killer.
      # As of 5.1 you can enable the log at runtime!
      #general_log_file = /var/log/mysql/mysql.log
      #general_log = 1
      #
      # Error log - should be very few entries.
      #
      log_error = /var/log/mysql/error.log
      #
      # Here you can see queries with especially long duration
      #log_slow_queries = /var/log/mysql/mysql-slow.log
      #long_query_time = 2
      #log-queries-not-using-indexes
      #
      # The following can be used as easy to replay backup logs or for replication.
      # note: if you are setting up a replication slave, see README.Debian about
      # other settings you may need to change.
      #server-id = 1
      #log_bin = /var/log/mysql/mysql-bin.log
      expire_logs_days = 10
      max_binlog_size = 100M
      #binlog_do_db = include_database_name
      #binlog_ignore_db = include_database_name
      #
      # * InnoDB
      #
      # InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/.
      # Read the manual for more InnoDB related options. There are many!
      innodb_buffer_pool_size = 48G
      innodb_buffer_pool_instances = 16
      innodb_log_file_size=8G
      innodb_lru_scan_depth=256
      #innodb_io_capacity = 2000
      innodb_io_capacity = 10000
      #
      # * Security Features
      #
      # Read the manual, too, if you want chroot!
      # chroot = /var/lib/mysql/
      #
      # For generating SSL certificates I recommend the OpenSSL GUI "tinyca".
      #
      # ssl-ca=/etc/mysql/cacert.pem
      # ssl-cert=/etc/mysql/server-cert.pem
      # ssl-key=/etc/mysql/server-key.pem



      max_connections = 500
      optimizer_switch = 'index_condition_pushdown=off'

      server_id = 2
      log-bin="mysql-bin"
      binlog-do-db=zabbix
      binlog-ignore-db=information_schema
      binlog-ignore-db=mysql
      replicate-ignore-db=test
      replicate-ignore-db=information_schema
      replicate-ignore-db=mysql
      relay-log="mysql-relay-log"
      auto-increment-increment = 2
      auto-increment-offset = 2
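A quick consistency check of the InnoDB numbers above, as a sketch: buffer pool share of RAM, size per buffer pool instance, total redo log size, and whether the pool size is a multiple of `innodb_buffer_pool_chunk_size` × `innodb_buffer_pool_instances` (MySQL 5.7 automatically adjusts the pool size when it isn't). Since the file sets neither `innodb_log_files_in_group` nor `innodb_buffer_pool_chunk_size`, the code assumes the MySQL 5.7 defaults (2 files, 128M chunks), and it assumes 5.7 because that is what Ubuntu 16.04 ships. These checks don't explain the slowdown on their own.

```python
def innodb_checks(ram_gib, pool_gib, pool_instances, log_file_gib,
                  log_files_in_group=2, chunk_mib=128):
    """Derived InnoDB figures; defaults assume MySQL 5.7 values not set in mysqld.cnf."""
    pool_mib = pool_gib * 1024
    return {
        "pool_share_of_ram": pool_gib / ram_gib,
        "gib_per_pool_instance": pool_gib / pool_instances,
        "total_redo_gib": log_file_gib * log_files_in_group,
        # True means MySQL will not need to adjust innodb_buffer_pool_size
        "pool_is_chunk_multiple": pool_mib % (chunk_mib * pool_instances) == 0,
    }


# Values from the DB server description and mysqld.cnf above
print(innodb_checks(ram_gib=64, pool_gib=48, pool_instances=16, log_file_gib=8))
```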



      I'm definitely looking at the DB as a bottleneck, but I'm trying to determine exactly what the history syncers are doing when this problem occurs. I've logged query activity during this time, and SELECTs and UPDATEs become horribly slow (I can take an UPDATE that is logged as a slow query, run it on my standby DB, and it runs instantly).

      However, I would think that if the DB were overrun, the problem would persist after Zabbix was shut down and restarted. I sometimes have to kill the Zabbix parent process completely to kick it free (losing all the data in the history cache). After a restart, the proxies all start to plow the data they have been collecting and storing locally back to the Zabbix server. When this happens, the syncers (and the DB) actually keep up fine, and the queue settles itself out within about 5-10 minutes (the rest of the system seems OK).
      Last edited by sb195c@att.com; 11-04-2018, 23:31.
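The behaviour described (syncers fall behind, the history cache fills, yet a restart with a full proxy backlog drains in minutes) fits a simple rate model: the cache grows whenever values arrive faster than the syncers write them, so a sustained drop in write throughput fills it linearly. This is a back-of-the-envelope sketch only; the 100 bytes per value and the degraded rate of 500 values/s in the example are made-up placeholders, not measurements from this system, and Zabbix's real cache accounting is more involved.

```python
import math

MIB = 1024 ** 2


def seconds_until_full(cache_bytes, in_rate, out_rate, bytes_per_value):
    """Time for an empty cache to fill when inflow exceeds the sync rate (values/s)."""
    surplus = in_rate - out_rate
    if surplus <= 0:
        return math.inf  # syncers keep up; the cache never fills
    return cache_bytes / (surplus * bytes_per_value)


def seconds_to_drain(backlog_values, in_rate, out_rate):
    """Time to clear a backlog once the sync rate exceeds inflow again."""
    spare = out_rate - in_rate
    return math.inf if spare <= 0 else backlog_values / spare


# HistoryCacheSize = 256M and NVPS = 2062.4 come from the thread;
# 500 values/s and 100 bytes/value are hypothetical placeholders.
fill = seconds_until_full(256 * MIB, in_rate=2062.4, out_rate=500, bytes_per_value=100)
print(f"cache full after ~{fill / 60:.0f} min")  # prints cache full after ~29 min
```

Because the fill time depends only on the gap between inflow and write rate, a write slowdown that persists will eventually fill any HistoryCacheSize; a larger cache only buys time.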


      • sb195c@att.com
        Member
        • Oct 2017
        • 41

        #4
        Also, my configuration syncer process constantly runs at around 35% busy, and nothing I've tried seems to change that. Could that be related?

