Postgres For large zabbix

  • pietro54
    Senior Member
    • Feb 2011
    • 112

    #1

    Postgres For large zabbix

    Hello,
    I have a big Zabbix database: over 170 GB, over 300 NVPS. It's working "not badly", but it could work better.
    I believe the bottleneck of the system is the database.

    I'm looking for some tuning tips.
    Of course I have read a lot about how to tune Zabbix and Postgres, and about what the hardware should be.

    So, my configuration:
    Hardware:
    2x X5690 @ 3.47 GHz
    24 GB RAM
    4x 240 GB SSDs in RAID 10
    But the RAID controller is only 1.5 Gbps. <- I plan to change this.

    Software:
    Postgres 9.1
    Zabbix 2.0.4
    Redhat 6.3

    zabbix_server.conf
    NodeID=1
    LogFile=/tmp/zabbix_server.log
    LogFileSize=1000
    DebugLevel=3
    DBHost=localhost
    DBName=zabbix
    DBUser=zabbix
    DBPassword=5r4e3w2q
    SenderFrequency=5
    CacheSize=32M
    HistoryCacheSize=256M
    TrendCacheSize=16M
    HistoryTextCacheSize=64M
    Timeout=15
    FpingLocation=/usr/local/sbin/fping
    LogSlowQueries=10000
    StartPollers=30
    StartPingers=15
    StartPollersUnreachable=30
    StartIPMIPollers=10
    StartTrappers=15
    StartDBSyncers=10
    StartHTTPPollers=15
    postgresql.conf
    max_connections = 300 # (change requires restart)
    shared_buffers = 12288MB # min 128kB
    work_mem = 64MB # min 64kB
    maintenance_work_mem = 216MB # min 1MB
    wal_level = hot_standby # minimal, archive, or hot_standby
    wal_buffers = -1 # min 32kB, -1 sets based on shared_buffers
    checkpoint_segments = 64 # in logfile segments, min 1, 16MB each
    checkpoint_timeout = 30min # range 30s-1h
    checkpoint_completion_target = 0.8 # checkpoint target duration, 0.0 - 1.0
    max_wal_senders = 5 # max number of walsender processes
    cpu_tuple_cost = 0.0030 # same scale as above
    cpu_index_tuple_cost = 0.0010 # same scale as above
    cpu_operator_cost = 0.0005 # same scale as above
    effective_cache_size = 12384MB
    log_destination = 'stderr' # Valid values are combinations of
    logging_collector = on # Enable capturing of stderr and csvlog
    log_directory = 'pg_log' # directory where log files are written,
    log_filename = 'postgresql-%a.log' # log file name pattern,
    log_truncate_on_rotation = on # If on, an existing log file with the
    log_rotation_age = 1d # Automatic rotation of logfiles will
    log_rotation_size = 0 # Automatic rotation of logfiles will
    log_min_error_statement = warning # values in order of decreasing detail:
    log_min_duration_statement = 1000 # -1 is disabled, 0 logs all statements
    log_error_verbosity = default # terse, default, or verbose messages
    log_line_prefix = '%m %u %e' # special values:
    autovacuum = on # Enable autovacuum subprocess? 'on'
    autovacuum_max_workers = 5 # max number of autovacuum subprocesses
    autovacuum_naptime = 1min # time between autovacuum runs
    datestyle = 'iso, mdy'
    lc_messages = 'en_US.UTF-8' # locale for system error message
    lc_monetary = 'en_US.UTF-8' # locale for monetary formatting
    lc_numeric = 'en_US.UTF-8' # locale for number formatting
    lc_time = 'en_US.UTF-8' # locale for time formatting
    default_text_search_config = 'pg_catalog.english'

    1. I'm afraid autovacuum isn't working at all.
    2. I'm running VACUUM FULL every Sunday. This helps for a while. The database gets lighter by up to 15 GB (175 GB before and 160 GB after the vacuum).
    3. I want to keep a long history. Right now I'm keeping almost every item for 90 days. I plan to keep history for 190 days and trends for 365 days.
    4. I know I should set up partitioning, so we can skip this tip.
    5. 300 NVPS is a big value, but the plan is ~500 NVPS.
    6. When PostgreSQL 9.3 becomes a stable release, I want to migrate my database to it.
    7. Is there something other than VACUUM ANALYZE in Postgres, like ANALYZE in Oracle?

    Please advise me: is this the correct path to tuning my Zabbix, and is there something I can do now?

    Regards,
    Pietro
    Last edited by pietro54; 01-09-2013, 13:08.
  • f.koch
    Member
    Zabbix Certified Specialist
    • Feb 2010
    • 85

    #2
    Hi,

    1. I'm afraid autovacuum isn't working at all.
    hm, why do you think it's not working?
    Make sure the statistics collector is enabled (stats_start_collector and stats_row_level on older versions; since PostgreSQL 8.3 these were replaced by track_counts, which autovacuum requires).
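
    To verify autovacuum activity you can query the statistics views. A sketch (assumes a psql session on the zabbix database; the table names are the standard Zabbix ones):

    Code:
    SELECT relname, last_autovacuum, autovacuum_count
    FROM pg_stat_user_tables
    WHERE relname IN ('history', 'history_uint', 'trends', 'trends_uint')
    ORDER BY last_autovacuum DESC NULLS LAST;

    If last_autovacuum stays NULL on the busy history tables for days, autovacuum is most likely not reaching them.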


    5. 300 NVPS is a big value, but the plan is ~500 NVPS.
    do you have problems at 300 NVPS?

    In general, which problems are you trying to solve?

    regards


    • BDiE8VNy
      Senior Member
      • Apr 2010
      • 680

      #3
      Hi!

      Just want to write down some things that I did when I was facing serious performance (mostly I/O-related) issues.
      Everything is written from memory and is certainly not everything that I did.

      First things that I would suggest:
      • Get familiar with Zabbix internals (processes, caches, value processing, etc.)
      • Don't touch Zabbix history caches unless you have to - lots does not help lots
      • Don't increase StartDBSyncers unless you have to and know how history synchronization works
      • If you use Zabbix proxies, and there are only a few reasons not to, set StartTrappers on the server to more than the number of proxies
      • Be sure you know how to capture bad performing database queries and how to analyze them
      • Disable housekeeper and partition history and trends tables
      • If using Zabbix prior to 2.1.3, make sure every foreign key child column is indexed
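
      For the partitioning point: on PostgreSQL 9.x this is usually done with table inheritance. A minimal sketch for one monthly child of the history table (the table and column names are the standard Zabbix ones; a real setup also needs a trigger or cron job to create partitions and route inserts, and constraint_exclusion enabled):

      Code:
      CREATE TABLE history_p2013_09 (
          CHECK (clock >= extract(epoch FROM timestamp '2013-09-01')::integer
             AND clock <  extract(epoch FROM timestamp '2013-10-01')::integer)
      ) INHERITS (history);
      CREATE INDEX history_p2013_09_1 ON history_p2013_09 (itemid, clock);

      Dropping an old month is then a cheap DROP TABLE instead of a huge DELETE by the housekeeper.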


      The following settings are for informational purposes only and are not intended to be reasonable or suitable for any other installation!

      Code:
      shared_buffers = 2GB
      temp_buffers = 32MB
      work_mem = 64MB
      maintenance_work_mem = 512MB
      max_stack_depth = 9MB
      shared_buffers is probably too high. I definitely noticed better frontend responsiveness up to 1 GB.
      temp_buffers, work_mem and maintenance_work_mem are set by feel.
      max_stack_depth is set to the size limit enforced by the kernel.

      Code:
      synchronous_commit = off
      checkpoint_segments = 128
      checkpoint_timeout = 60min
      checkpoint_completion_target = 0.9
      synchronous_commit is disabled since Zabbix data is (at this time) not critical in terms of currentness in case of a DB crash.
      checkpoint_segments is sized to ensure that under normal load checkpoints are triggered by timeout only.
      checkpoint_timeout is set to its maximum. That should minimize the number of dirty buffer writes for frequently updated data.
      checkpoint_completion_target is set to spread dirty buffer writes out so that they complete just before the next checkpoint starts.

      Code:
      seq_page_cost = 4.2
      effective_cache_size = 20GB
      autovacuum_naptime = 15s
      autovacuum_vacuum_cost_delay = 0
      seq_page_cost had to be adjusted to avoid sequential scans for a particular query where an index scan is the better choice.
      effective_cache_size gives the database an idea of how many pages might be located in the filesystem buffer cache.
      autovacuum_naptime is balanced according to total vacuum time.
      autovacuum_vacuum_cost_delay is set to 0 to be as aggressive as possible. The history tables have a per-table autovacuum_vacuum_cost_delay of 20.
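
      The per-table autovacuum_vacuum_cost_delay mentioned above is set with a storage parameter, e.g. (a sketch, using the history table as an example):

      Code:
      ALTER TABLE history SET (autovacuum_vacuum_cost_delay = 20);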

      Currently the most annoying issue for me: frozen XIDs.
      Anti-wraparound freezing can be a pain on large history tables (in my case up to 100 GB per month) when autovacuum kicks in.
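
      To see which tables are approaching an anti-wraparound vacuum, you can compare relfrozenxid age against autovacuum_freeze_max_age. A sketch:

      Code:
      SELECT relname, age(relfrozenxid)
      FROM pg_class
      WHERE relkind = 'r'
      ORDER BY age(relfrozenxid) DESC
      LIMIT 10;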

      Here some information about the corresponding Zabbix environment.
      All Linux hosts are virtualized via OpenVZ

      Zabbix-Server
      - CentOS 6.4 (i386)
      - PostgreSQL 9.1.9
      - Zabbix-Server 2.0.7

      Zabbix-Proxies (SQLite)
      - CentOS 5.9 and 6.4 (i386 and x86_64)
      - 40 proxies in use
      - All proxies equipped with Zabbix-Java-Gateway, snmptrapd, snmptt, ODBC and DBforBix

      System x3650 M3
      2 x Intel Xeon Processor X5670
      64GB RAM (Total)
      8GB RAM (Zabbix OpenVZ container including DB/Server/Frontend)
      8x 2.5" SAS 300GB 10K RAID6 (1x Spare)

      Some numbers
      Monitored hosts: ~1.2K
      Monitored items: ~150K
      Active triggers: ~80K
      Host groups: ~200
      User groups: ~30
      Users: ~100
      NVPS: ~1K

      The following numbers are in average over one day:

      Zabbix busy internal timer processes ~25%
      Zabbix busy configuration syncer ~5%
      Zabbix busy history syncer ~13%

      CPU idle time 85%
      CPU user time 14%
      CPU system time 0.9%
      CPU iowait time 0.1%

      Read Ops ~12
      Write Ops ~780

      Read Bps ~190 KB/s
      Write Bps ~ 6.9 MB/s

      Edit:
      Recent modifications:
      Code:
      shared_buffers = 1GB
      bgwriter_lru_maxpages = 0
      checkpoint_segments = 250
      #autovacuum_naptime = 1min
      deadlock_timeout = 5s
      shared_buffers (as already mentioned) was set a bit too high.
      bgwriter_lru_maxpages is set to 0 to disable background writing and avoid heavily updated data being written too often. I tried this before without significant performance gain. This time it makes a difference - I was probably too impatient. Anyhow, ZBXNEXT-1689 is a better approach and will make this unnecessary.
      checkpoint_segments increased to avoid checkpoints triggered by xlog.
      autovacuum_naptime is reset to its default (1min).
      deadlock_timeout has been set according to the maximum execution time of most queries under normal load.

      Edit:
      Code:
      shared_buffers = 2GB
      #temp_buffers = 8MB
      work_mem = 80MB
      It turns out that reducing shared_buffers from 2GB to 1GB (in combination with disabled background writing) caused checkpointing to finish within 20 minutes, and thus higher I/O rates.
      After reverting the setting back to 2GB, checkpointing is spread over ~45 minutes.
      temp_buffers reverted to default.
      work_mem increased to avoid creating temporary files on larger sorts.
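
      Whether sorts actually spill to temporary files can be confirmed before raising work_mem further, e.g. with this postgresql.conf setting (0 logs every temporary file created, together with its size):

      Code:
      log_temp_files = 0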
      Last edited by BDiE8VNy; 26-09-2013, 18:17. Reason: some modifications


      • pietro54
        Senior Member
        • Feb 2011
        • 112

        #4
        Hi,
        sorry for the very late response.
        Thank you for your reply, and thank you for your patience with my text.

        It didn't help a lot with my performance issue, but it was very informative.


        • kloczek
          Senior Member
          • Jun 2006
          • 1771

          #5
          Originally posted by pietro54
          Hello,
          I have a big Zabbix database: over 170 GB, over 300 NVPS. It's working "not badly", but it could work better.
          TBH, IMO it is a very small Zabbix installation; mid-size Zabbix installations begin somewhere at a few k to a few tens of k NVPS.

          I believe the bottleneck of the system is the database.
          The bottleneck is in the storage .. in the latency of I/Os (not even in the max bandwidth)

          [..]
          Software:
          Postgres 9.1
          Zabbix 2.0.4
          Redhat 6.3

          [..]

          1. I'm afraid autovacuum isn't working at all.
          2. I'm running VACUUM FULL every Sunday. This helps for a while. The database gets lighter by up to 15 GB (175 GB before and 160 GB after the vacuum).
          3. I want to keep a long history. Right now I'm keeping almost every item for 90 days. I plan to keep history for 190 days and trends for 365 days.
          4. I know I should set up partitioning, so we can skip this tip.
          5. 300 NVPS is a big value, but the plan is ~500 NVPS.
          6. When PostgreSQL 9.3 becomes a stable release, I want to migrate my database to it.
          7. Is there something other than VACUUM ANALYZE in Postgres, like ANALYZE in Oracle?

          Please advise me: is this the correct path to tuning my Zabbix, and is there something I can do now?
          First, switch to partitioning. The database type (Postgres or MySQL) does not really matter. Also switch to Zabbix 2.2 to take advantage of Zabbix server caching, which decreases the number of SELECTs by up to 3 times.

          I'm running 1.2k NVPS on four-year-old hardware (an HP blade Gen6).
          The DB backend is using 2x 200 GB SSDs (behind a RAID controller).
          Zabbix server 2.2 is running on an HP blade Gen1 and is not using even 10% of its CPU power.
          I should switch the DB backend to Solaris to use ZFS compression (my current Zabbix database is 170 GB). With a 1 MB ZFS record size the compression ratio is about 3.6, so my 200 GB SSDs will effectively behave like storage more than 3 times larger and faster (the cost of buying enterprise SSDs of that size for the Gen6 is almost twice the Solaris license cost .. if you need support).

          Zabbix uses a typical warehouse-style database. Using PostgreSQL for such a DB is IMO overkill. MySQL, being simpler and easier to maintain, is IMO the better choice (but that is only my private opinion and I'm aware that some people may not agree). However, what may matter more here, IMO, are the skills of whoever will be looking after the Zabbix DB backend.
          http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
          https://kloczek.wordpress.com/
          zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
          My zabbix templates https://github.com/kloczek/zabbix-templates
