weird problem with history write cache filling

  • mjcig
    Junior Member
    • Apr 2014
    • 5

    #1

    weird problem with history write cache filling

    Just recently I have noticed an issue where the history cache is filling up. I have restarted the zabbix server processes and the cache will clear, then slowly begin to fill back up again.

    The cache does not fill up quickly; I would say it fills at about 1% an hour. The only way it recovers is if I restart the services again.

    I have increased the number of db syncers twice, first by 2 and then by 4, and there is no change in the behavior, except that the db syncer average busy rate has decreased, which I would expect.

    Reviewing the internal zabbix processes, all look good. The db syncer busy rate is fine and there is nothing in the zabbix log that appears to be a problem. An occasional slow query here and there, but nothing alarming.

    Any thoughts on what to look into next?

    The database admin states all looks good with the db, and besides the cache filling, everything else is performing well. The only weird thing I have noticed is that alerts and escalations have been slow at times and in some cases are queueing to be sent. Maybe a look at these tables?

    Anyway, here are a few details of our config:

    Zabbix 1.8.9 - built from source
    RHEL 5.8
    Oracle 11gR2 (11.2) with patches (cannot recall exactly, but after three service packs and patches, it solved our performance problems)
    2822 nvps

    Appreciate any input
  • mjcig
    Junior Member
    • Apr 2014
    • 5

    #2
    Had an opportunity to restart the zabbix server and there is no change in the behavior of the history cache filling. The frequency has increased, but nothing on the database or in the logs has pointed to any area to dig into further.

    I confirmed there are no log items defined as checks, and I am still trying to understand the impact if I let the cache fill completely. In past experience, when the history cache fills, the db syncers become busier as they are unable to process and write the data to the db quickly enough.

    The odd thing is this is not the case here. There is no performance impact based on the internal checks I have in place.

    Attached are graphs for the last 7 days to help illustrate what I am seeing. The changes in the history cache are from me restarting the zabbix services, since in the past when the history cache fills, data to be processed falls behind.
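    For reference, the write-cache utilisation can be tracked with Zabbix internal items; a possible set of keys (a sketch only, the exact key parameters may vary between versions) is:

    Code:
    zabbix[wcache,history,pfree]   # % free in the history write cache
    zabbix[wcache,text,pfree]      # % free in the text history write cache
    zabbix[wcache,trend,pfree]     # % free in the trend write cache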

    Any other thoughts?

    • mjcig
      Junior Member
      • Apr 2014
      • 5

      #3
      To add... I let the cache run to zero, and after bouncing along the bottom for a few hours, one of the proxy servers stopped sending values to the primary, creating a backlog of values.

      I am planning on moving to 1.8.20 for the short term while we evaluate upgrade options. We have a large history partition in our db with over 1.4 billion rows of data.

      Anyone running 1.8.20 and can chime in with any issues they may have encountered?

      Thanks in advance

      • tchjts1
        Senior Member
        • May 2008
        • 1605

        #4
        Increasing DB Syncers is not your answer. You should leave that at the default unless you have a very large installation monitoring thousands of hosts.

        Instead, there are some cache settings in your zabbix_server.conf that you should increase (I think 1.8.2 has them). There are 3 of them. Just search that conf file for "cache" and you'll see them. Maybe try bumping them up an additional 128M.

        After you adjust them, restart your Zabbix server process.

        One of the cache settings is this: (As you see, I have mine set to 256M)

        Code:
        ### Option: CacheSize
        #       Size of configuration cache, in bytes.
        #       Shared memory size for storing host, item and trigger data.
        #
        # Mandatory: no
        # Range: 128K-1G
        # Default:
        # CacheSize=8M
        CacheSize=256M
        It may also benefit you to bump your Timeout= setting to 10 if you are still at the default of 3.
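        Presumably the other two cache settings a search of zabbix_server.conf turns up are HistoryCacheSize and HistoryTextCacheSize; a sketch of bumping them (the sizes here are only examples, tune them to your available memory):

        Code:
        # Shared memory for history values waiting to be written to the database
        HistoryCacheSize=128M
        # Shared memory for text/log history values waiting to be written
        HistoryTextCacheSize=128M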

        • mjcig
          Junior Member
          • Apr 2014
          • 5

          #5
          Thanks for your thoughts.

          I agree that bumping up the number of dbsyncers will not solve the problem, but we write 150 million rows of data to the history table daily. I have played around with increasing and decreasing the dbsyncers here and there with no major changes.

          As for the History Cache, I have it set to 384M; it was previously configured for 256M. The odd thing is this behavior started occurring out of nowhere. Before 4/10 the cache with 256M never went below 99% free, and it was that way for a year. Whether I change it to more or less, I still see the same behavior. It is almost as if I stumbled upon some bug and the values in the buffer are not some sort of simple integer value.

          As for the timeout, it is my understanding this is for external checks... but it has been some time. We originally had this set higher and saw increased load on our zabbix host. We moved those processes to scripts leveraging zabbix_sender to push the values in, which is much more efficient at processing.
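          For context, the sort of push those scripts do looks roughly like this (server name, host name, key and value are placeholders):

          Code:
          # push a single value for item key "custom.metric" on host "web01"
          zabbix_sender -z zabbix-server.example.com -s web01 -k custom.metric -o 42
          # or send many values at once from a file of "<host> <key> <value>" lines
          zabbix_sender -z zabbix-server.example.com -i /tmp/values.txt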

          In fact our complete environment is configured for active monitoring of the agents. We saw a huge increase in the number of values a proxy and server can process, as the burden of polling and waiting for data is no longer an issue.

          • mjcig
            Junior Member
            • Apr 2014
            • 5

            #6
            Update -

            I compiled and updated the server and proxy binaries to 1.8.20 and am still experiencing the same behavior as before, with the history cache being consumed.

            Are there any other internal checks or tools to review what type of data is being written to the history cache or inspect what data is in the buffer?

            • waydena
              Junior Member
              • Feb 2012
              • 4

              #7
              Same history write cache problem out of nowhere

              We had the same problem out of nowhere after a system had been running for about 6 months. (Monitoring a system with 6 Windows servers and 30 Windows workstations, 4 switches, and a couple of VoIP gateways; running on a Dell R520 with 32 GB of RAM.)

              I had added a couple of log file monitors and tweaked some switch SNMP monitoring. This was before the initial problem.

              After the problem I turned off the log monitoring, reduced the polling frequency in some widely used templates (intervals from 30 to 60 secs), and reduced the history retention in other templates (from 30 days to 7).

              I suspect the system may have gotten even busier dumping all the data that was no longer required due to the reduced history settings; the housekeeping spiked a little also. I tried changing the housekeeping interval from the default 1 hr to 6 hrs (as suggested in other forum posts), but that appeared to make things worse, so I set it back to hourly. It may also have been that we had just rolled over into the period when a lot of history started getting dumped. These were both changes after the initial problem, though.
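              For reference, that interval is the HousekeepingFrequency parameter in zabbix_server.conf; a sketch of the change that was tried and then reverted:

              Code:
              # Housekeeper run interval, in hours (tried at 6, set back to the default of 1)
              HousekeepingFrequency=1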

              Our cache sizes were previously all defaults. I increased the settings as follows:

              Cache Size: 8M -> 32M
              History Cache: 16M -> 32M
              History Text Cache: 16M -> 32M

              The system is behaving now (with log monitors turned back on), but it is difficult to tell for sure whether the problem was resolved by the new cache settings, because it is back to running at 99.9% free all the time, as it was prior to "the incident"...
