Zabbix Postgresql Locking

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • rsvancara
    Member
    • Jul 2012
    • 42

    #1

    Zabbix Postgresql Locking

    We frequently see ShareLock problems with Zabbix when it updates the items table. Unlike other bugs reported in Jira, where this happens on the ids table, we are seeing it on the items table as a result of concurrent updates and PostgreSQL's row-level locking.

    The errors we see are:

    2012-11-01 00:14:29 PDT zabbix zabbix 127.0.0.1 - DETAIL: Process 5985 waits for ShareLock on transaction 48349150; blocked by process 5983.
    Process 5983 waits for ShareLock on transaction 48349166; blocked by process 5985.
    Process 5985: update items set lastclock=1351754060,lastns=334498000,prevvalue=lastvalue,lastvalue='1351754060' where itemid=23420;
    update items set lastclock=1351754061,lastns=339616000,prevvalue=lastvalue,lastvalue='99.967760' where itemid=23421;
    update items set lastclock=1351754066,lastns=667765507,prevvalue=lastvalue,lastvalue='0.137500' where itemid=23897;
    update items set lastclock=1351754066,lastns=669010892,prevorgvalue='4662521788565',prevvalue=lastvalue,lastvalue='77696' where itemid=23957;
    update items set lastclock=1351754058,lastns=314361000,prevvalue=lastvalue,lastvalue='478419185664' where itemid=24798;
    update items set lastclock=1351754059,lastns=323575000,prevvalue=lastvalue,lastvalue='0.000000' where itemid=24799;
    update items set lastclock=1351754060,lastns=334898000,prevvalue=lastvalue,lastvalue='0.016120' where itemid=24800;
    update items set lastclock=1351754061,lastns=343300000,prevvalue=lastvalue,lastvalue='0.003640' where itemid=24801;
    update items set lastclock=1351754058,lastns=317769000,prevorg
    Process 5983: update items set lastclock=1351754065,lastns=641967535,prevorgvalue='29892503160',prevvalue=lastvalue,lastvalue='7791' where itemid=23776;
    update items set lastclock=1351754065,lastns=644268407,prevvalue=lastvalue,lastvalue='0.145000' where itemid=23896;
    update items set lastclock=1351754064,lastns=216544563,prevorgvalue='0',prevvalue=lastvalue,lastvalue='0' where itemid=23955;
    update items set lastclock=1351754065,lastns=642674637,prevorgvalue='9239423731381',prevvalue=lastvalue,lastvalue='7635408' where itemid=23956;
    update items set lastclock=1351754053,lastns=273681000,prevvalue=lastvalue,lastvalue='87.648513' where itemid=24073;
    update items set lastclock=1351754053,lastns=282197000,prevvalue=lastvalue,lastvalue='495' where itemid=24733;
    update items set lastclock=1351754054,lastns=284275000,prevvalue=lastvalue,lastvalue='0.000000' where itemid=24734;
    update items set lastclock=1351754053,lastns=278317000,prevvalue=lastvalue,lastvalue='96.814452' where itemid=24793;
    update items set lastclock=1351754054
    2012-11-01 00:14:29 PDT zabbix zabbix 127.0.0.1 - HINT: See server log for query details.

    As of this morning, we have seen this 585 times; transactions appear to be stepping on top of each other. This has been an issue since Zabbix 2.0.1, when we started using Zabbix in a large university environment. It is causing a performance bottleneck, as Postgres must wait for one transaction to finish before committing another.

    Yesterday we had 10,056 ShareLock instances.

    Perhaps 32 StartDBSyncers is too high? Any recommendations? Does anyone on this forum really know Zabbix? I only ask because it seems like people do not get much help here, and the answers seem more like guesses.
  • Colttt
    Senior Member
    Zabbix Certified Specialist
    • Mar 2009
    • 878

    #2
    Hello!

    Maybe you can create a bug report for this?

    Otherwise, you can use pg_pool to manage the connections; some other people use it with Zabbix (f.koch?) and it works fine.
    Debian-User

    Sorry for my bad english


    • rsvancara
      Member
      • Jul 2012
      • 42

      #3
      Created a Bug Report

      I created a bug report. However, from what I can see of the way Zabbix is designed, it will not work well in a large-scale, enterprise environment on PostgreSQL, due to the way PostgreSQL implements locking and the way Zabbix brutally shoves updates into the items table. This is not PostgreSQL's fault; it is the software engineer(s) not recognizing the limitations of the database platform and providing the proper optimizations to support it.

      If you are interested, check out dbcache.c and look at how it creates a single large transaction of roughly 1000 update statements against the items table. Now imagine a scenario where two or more of these transactions touch the same itemid: you get lock contention, and the database must figure out what to do. In some cases there are deadlocks, and the database must perform an expensive check to detect that condition, which ultimately leads to other problems.

      At least that has been my observation.
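
      To make the pattern concrete, here is a minimal sketch (hypothetical itemids, run in two concurrent psql sessions) of how two batched syncer transactions that update the same rows in a different order end up waiting on each other, exactly as in the ShareLock log above:

      Code:
      -- Session A:
      BEGIN;
      UPDATE items SET lastclock = 0 WHERE itemid = 23420;  -- A takes a row lock on 23420

      -- ...meanwhile, Session B:
      BEGIN;
      UPDATE items SET lastclock = 0 WHERE itemid = 23421;  -- B takes a row lock on 23421

      -- Session A:
      UPDATE items SET lastclock = 0 WHERE itemid = 23421;  -- A blocks, waiting on B's transaction
      -- Session B:
      UPDATE items SET lastclock = 0 WHERE itemid = 23420;  -- B blocks on A: deadlock detected,
                                                            -- one transaction is rolled back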


      • rsvancara
        Member
        • Jul 2012
        • 42

        #4
        pg_pool

        I do not see how this helps, as Zabbix's processes hold their connections open, so a pool is essentially worthless unless it can perform caching of some kind or transparently fix the lock contention by serializing the inserts.


        • f.koch
          Member
          Zabbix Certified Specialist
          • Feb 2010
          • 85

          #5
          Originally posted by rsvancara
          Perhaps 32 StartDBSyncers is too high? Any recommendations? Does anyone on this forum really know Zabbix? I only ask because it seems like people do not get much help here, and the answers seem more like guesses.
          Hi, this is your problem. Set the DBSyncers back to the default (comment the setting out); you don't need more syncers. The high syncer setting is the cause of the ShareLock errors. (It's not the root cause, but with the default you won't hit it.)
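
          For reference, this is the relevant line in zabbix_server.conf; leaving it commented out gives you the default of 4 syncers (a sketch of the stock configuration file):

          Code:
          ### Option: StartDBSyncers
          # Number of database syncer (history syncer) processes.
          # Commented out = use the default of 4.
          # StartDBSyncers=4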

          Why did you set the DB syncers to 32?

          regards flo
          Last edited by f.koch; 02-11-2012, 10:00.


          • Colttt
            Senior Member
            Zabbix Certified Specialist
            • Mar 2009
            • 878

            #6
            Can you please post your bug report (number)?
            Debian-User

            Sorry for my bad english


            • rsvancara
              Member
              • Jul 2012
              • 42

              #7
              StartDBSyncers

              In order to keep up with the workload: we were seeing ShareLock and deadlock problems even at 6 StartDBSyncers. I can see how this would be problematic, but the setting was increased in an effort to improve throughput, which clearly is not working.


              • f.koch
                Member
                Zabbix Certified Specialist
                • Feb 2010
                • 85

                #8
                Originally posted by rsvancara
                In order to keep up with the workload: we were seeing ShareLock and deadlock problems even at 6 StartDBSyncers. I can see how this would be problematic, but the setting was increased in an effort to improve throughput, which clearly is not working.
                Hm, do you know the internal process monitoring? Take a look; if there is a problem with the history syncer processes (that's the name for the DB syncers), you will see it there.
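
                For example, an internal item along these lines (a sketch; key syntax per Zabbix's internal checks) returns the average percentage of time the history syncer processes are busy:

                Code:
                zabbix[process,history syncer,avg,busy]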

                Can you give some info about your system? DB size, new values per second, monitored hosts?

                Which throughput would you like to increase?


                • rsvancara
                  Member
                  • Jul 2012
                  • 42

                  #9
                  System Specifications

                  Code:
                  Zabbix server is running	Yes	its-zabbix-01.it.wsu.edu:10051
                  Number of hosts (monitored/not monitored/templates)	3830	3743 / 29 / 58
                  Number of items (monitored/disabled/not supported)	256323	241129 / 2622 / 12572
                  Number of triggers (enabled/disabled)[problem/unknown/ok]	197027	197011 / 16  [125 / 0 / 196886]
                  Number of users (online)	66	12
                  Required server performance, new values per second	2159.63	 -
                  Database Server/Zabbix Server (Physical Hardware)
                  Memory: 96GB RAM
                  CPU: 12 Core
                  Disks: 6 x 15K RPM drives (300GB each), RAID 1+0

                  Two Proxy Servers
                  32GB RAM each
                  8 CPU Cores
                  Mirrored 10K RPM, 300GB drives.

                  I can see large load spikes that correlate with the db syncers running.


                  • rsvancara
                    Member
                    • Jul 2012
                    • 42

                    #10
                    Additional Information

                    We have the history tables partitioned and the housekeeper is turned off.
                    Additionally, I have to manually vacuum the items table; otherwise it grows to over 44GB for just over 200,000 records. Perhaps tuning back the number of syncer processes will help.
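
                    In the meantime, the manual vacuum and some per-table autovacuum tuning can be done in plain SQL (a sketch using standard PostgreSQL storage parameters; the thresholds are only illustrative):

                    Code:
                    -- Reclaim dead tuples by hand (plain VACUUM does not block updates;
                    -- VACUUM FULL would shrink the file but takes an exclusive lock).
                    VACUUM ANALYZE items;

                    -- Make autovacuum fire much more aggressively on this one hot table,
                    -- e.g. after roughly 1% of its ~200,000 rows have been updated.
                    ALTER TABLE items SET (autovacuum_vacuum_scale_factor = 0.01,
                                           autovacuum_vacuum_threshold = 200);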


                    • f.koch
                      Member
                      Zabbix Certified Specialist
                      • Feb 2010
                      • 85

                      #11
                      Originally posted by rsvancara
                      Code:
                      Zabbix server is running	Yes	its-zabbix-01.it.wsu.edu:10051
                      Number of hosts (monitored/not monitored/templates)	3830	3743 / 29 / 58
                      Number of items (monitored/disabled/not supported)	256323	241129 / 2622 / 12572
                      Number of triggers (enabled/disabled)[problem/unknown/ok]	197027	197011 / 16  [125 / 0 / 196886]
                      Number of users (online)	66	12
                      Required server performance, new values per second	2159.63	 -
                      Database Server/Zabbix Server (Physical Hardware)
                      Memory: 96GB RAM
                      CPU: 12 Core
                      Disks: 6 x 15K RPM drives (300GB each), RAID 1+0

                      Two Proxy Servers
                      32GB RAM each
                      8 CPU Cores
                      Mirrored 10K RPM, 300GB drives.

                      I can see large load spikes that correlate with the db syncers running.
                      Load spikes? You mean system load? Have you tuned your PostgreSQL configuration?
                      In such a big environment, you should run the Zabbix server and the database on different hardware.

                      How big is your database?


                      • rsvancara
                        Member
                        • Jul 2012
                        • 42

                        #12
                        Database Server

                        We have tuned the database server per the PostgreSQL documentation. Here are our modifications to the default postgresql.conf file:

                        Code:
                        listen_addresses = '*'
                        max_connections = 300 # (change requires restart)
                        shared_buffers = 40GB # min 128kB # WSUIT
                        work_mem = 1024MB # WSU IT
                        maintenance_work_mem = 512MB # WSU IT
                        effective_io_concurrency = 4
                        synchronous_commit = off # immediate fsync at commit # WSUIT
                        wal_buffers = 1024MB
                        checkpoint_segments = 512 # in logfile segments, min 1, 16MB each # WSUIT
                        checkpoint_completion_target = 0.9 # checkpoint target duration, 0.0 - 1.0 # WSUIT
                        effective_cache_size = 48GB # WSUIT
                        logging_collector = on # Enable capturing of stderr and csvlog
                        log_directory = '/var/log/pg_log' # directory where log files are written, WSUIT
                        log_filename = 'postgresql-%a.log' # log file name pattern,
                        log_truncate_on_rotation = on # If on, an existing log file of the
                        log_rotation_size = 100MB # Automatic rotation of logfiles will
                        log_min_error_statement = error # values in order of decreasing detail:
                        log_min_duration_statement = 1000 # WSUIT -- Log statements running over one second
                        log_line_prefix = '%t %u %d %h - ' # special values:
                        log_lock_waits = on # WSUIT - Turn on logging waits
                        autovacuum_max_workers = 6 # WSUIT -- Increase these for our larger tables
                        datestyle = 'iso, mdy'
                        lc_messages = 'en_US.UTF-8' # locale for system error message
                        lc_monetary = 'en_US.UTF-8' # locale for monetary formatting
                        lc_numeric = 'en_US.UTF-8' # locale for number formatting
                        lc_time = 'en_US.UTF-8' # locale for time formatting

                        postgres=# select version();
                        version
                        --------------------------------------------------------------------------------------------------------------
                        PostgreSQL 9.1.6 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4), 64-bit


                        We are unable to run the database on separate hardware at this time. In terms of load, yes, I am referring to high system CPU utilization; but I/O is at 0%, which seems indicative of something waiting on something else.

                        The database cluster occupies about 250GB of disk space.
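
                        One way we confirm that backends are waiting on locks rather than on I/O is to query the statistics views while the syncers run (a sketch; the column names are for the 9.1 catalogs, where the backend pid is still called procpid):

                        Code:
                        -- Blocked backends and what they are running (PostgreSQL 9.1;
                        -- 9.2 renamed procpid/current_query to pid/query).
                        SELECT procpid, waiting, current_query
                          FROM pg_stat_activity
                         WHERE waiting;

                        -- Locks that have been requested but not yet granted.
                        SELECT locktype, transactionid, pid, mode
                          FROM pg_locks
                         WHERE NOT granted;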


                        • f.koch
                          Member
                          Zabbix Certified Specialist
                          • Feb 2010
                          • 85

                          #13
                          OK, so you have high CPU load but not much I/O wait when the DB syncers are running. So where is the problem?


                          • rsvancara
                            Member
                            • Jul 2012
                            • 42

                            #14
                            Good Question

                            All I know is that a parallel implementation running MySQL on a system with 16GB of RAM is outperforming Postgres and Zabbix while monitoring the same number of items, though not as many hosts.

                            Our queues are backed up, and I cannot figure out how to push more data through Zabbix and Postgres. Additionally, the high CPU load is a result of misconfigured syncers; I do understand that. But normally our system sits mostly idle, yet the queue remains backed up, and our data is full of holes.

                            If you have ideas, let me know; otherwise, I have looked into everything, including running the server in debug mode and using strace.

                            We are actively converting everything over to MySQL now. The only issue is going to be the housekeeper, but I think a simple DELETE ... WHERE statement, as sketched below, will clean up the history tables and allow us to turn off the housekeeper.
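
                            For what it's worth, the cleanup I have in mind is along these lines (a sketch in MySQL syntax, since that is where we are headed; assumes the standard Zabbix schema, where clock is a Unix epoch, with an illustrative 30-day retention):

                            Code:
                            -- Purge history rows older than 30 days; repeat for history_uint,
                            -- history_str, history_text and history_log.
                            DELETE FROM history
                             WHERE clock < UNIX_TIMESTAMP(NOW() - INTERVAL 30 DAY);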


                            • f.koch
                              Member
                              Zabbix Certified Specialist
                              • Feb 2010
                              • 85

                              #15
                              Have you considered asking Zabbix support for help? Do you have a support contract? With such a big installation it makes sense.

                              I don't think this is a problem we can solve just by posting information in the forum.
                              Last edited by f.koch; 02-11-2012, 21:34.

