**tchjts1** · 25-06-2014, 17:41

What is your polling interval for those 2 items? I am using agent version 2.0.9 with those (active) items at 60 second intervals.

I use the stock triggers and have no issues detecting a server reboot or agent unavailable.

These are the triggers I have:

Server restarted:

Code:

{Template OS Windows:system.uptime.last(0)}<600

Agent unreachable:

Code:

{Template OS Windows:agent.ping.nodata(5m)}=1
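For anyone unfamiliar with the trigger syntax, here is a rough Python sketch of what those two expressions check. It only illustrates the logic and is not how Zabbix evaluates triggers internally; the function names, the timestamps and the exact boundary handling are assumptions made for the example.

```python
def server_restarted(last_uptime_seconds, threshold=600):
    """Roughly system.uptime.last(0) < 600: the newest uptime value is under 10 minutes."""
    return last_uptime_seconds < threshold


def agent_unreachable(last_value_timestamp, now, window_seconds=300):
    """Roughly agent.ping.nodata(5m) = 1: no value has arrived in the last 5 minutes."""
    return (now - last_value_timestamp) > window_seconds


now = 1_000_000  # hypothetical "current time" in seconds

print(server_restarted(120))              # host booted 2 minutes ago
print(server_restarted(86_400))           # host has been up for a day
print(agent_unreachable(now - 30, now))   # last ping arrived 30 seconds ago
print(agent_unreachable(now - 900, now))  # last ping arrived 15 minutes ago
```

The first and last calls are the cases where the corresponding trigger would fire.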

**PiotrIr** · 25-06-2014, 17:48

tchjts1,

Thank you for your reply.

The "Server restarted" trigger is exactly the same as in your example.

For agent.ping.nodata I've tried 5m, 15m, 60m and 1440m.

**tchjts1** · 25-06-2014, 17:53

As mentioned in my earlier response:
What is your polling interval for those 2 items?

Also, when looking at one of your hosts under items and then under triggers, do they both have green checkmarks under the "Error" column, or are they showing red with an actual error?

**PiotrIr** · 25-06-2014, 17:59

Sorry, I misunderstood you.

I'm not sure if polling interval is the same as update interval (if not, could you tell me how to check this?). If so:

agent.ping 30s
system.uptime 300s

Both have a green mark under the "Error" column.

**tchjts1** · 25-06-2014, 18:05

Polling interval and update interval... same to me. Your settings are fine there.
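As a rule of thumb (my own, not an official Zabbix check), the nodata() window should be longer than the item's update interval, and the uptime threshold should be at least as long as the uptime item's interval so that a sample lands below the threshold after a reboot. A minimal sketch of that check, with a hypothetical helper name and all values in seconds:

```python
def intervals_look_sane(ping_interval, nodata_window, uptime_interval, uptime_threshold):
    """Rule-of-thumb check only; all arguments are in seconds."""
    # If the window were shorter than the interval, nodata() would fire between normal polls.
    nodata_ok = nodata_window > ping_interval
    # If the interval were longer than the threshold, a reboot could go unsampled.
    uptime_ok = uptime_interval <= uptime_threshold
    return nodata_ok and uptime_ok


# agent.ping every 30s with nodata(5m), system.uptime every 300s with a 600s threshold
print(intervals_look_sane(30, 300, 300, 600))
```

With the intervals quoted above this returns True, which is why I don't think the intervals are the cause.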

When you say they are green, you are looking at a host, and not at the template, right?

The next step I would take is to check whether the Zabbix internal processes are sufficient to handle the workload.

Take a look at this post, at the last paragraph and the graphs that follow.

If you assign that template to your Zabbix server, you will be able to see if Zabbix is struggling or not. Here is the post:

https://www.zabbix.com/forum/showthread.php?t=41219

**PiotrIr** · 25-06-2014, 18:29

Thank you for your reply.

Yes, green is on host.

Could you help me interpret the data, please? Some processes are very busy, but I'm not sure whether they relate to the issue or, if they do, how to resolve the problem.

**tchjts1** · 25-06-2014, 18:38

Those images are very hard to see because they are so small. But I will certainly be glad to help you interpret them.

Can you take a screenshot of each graph and attach them as separate images? (I use MWSnap3 for this purpose.)

Also, instead of using a 1 hour timeframe in your graphs, please use 1 day (24 hours) instead. This will give a better overall picture of your process usage.

MWSnap3 will allow your screenshots to look like this when you upload them:

Attached Files

**PiotrIr** · 26-06-2014, 10:15

tchjts1,

Once again, thank you for your help.
The pictures are below. You will see a break of around 45 minutes in the data; I shut down a server for that time.

**tchjts1** · 26-06-2014, 17:44

I would make a few adjustments to your zabbix_server.conf file.

You can see that your trapper processes are 100% busy all the time.
I would also increase your pollers a bit.
I would also allocate some more configuration cache.

So these are the settings you should adjust, then after you do that, you need to restart your Zabbix server process:

(Note that I leave the defaults in place with the comment sign # preceding that line so that I always know what the default setting is, and I put the new values on a new line without the # sign.)

I am only guessing that you are running with all the stock default values. If you have already modified those values, then increase them in small chunks until you get the desired results. If you are running on the default values, then the settings suggested below may work for you.

For trappers:

Code:

### Option: StartTrappers
#       Number of pre-forked instances of trappers.
#
# Mandatory: no
# Range: 0-1000
# Default:
# StartTrappers=5
StartTrappers=15

For pollers:

Code:

### Option: StartPollers
#       Number of pre-forked instances of pollers.
#
# Mandatory: no
# Range: 0-1000
# Default:
# StartPollers=5
StartPollers=35

I would also bump up the unreachable pollers a bit from default:

Code:

### Option: StartPollersUnreachable
#       Number of pre-forked instances of pollers for unreachable hosts (including IPMI).
#
# Mandatory: no
# Range: 0-1000
# Default:
# StartPollersUnreachable=1
StartPollersUnreachable=5

For configuration cache:

Code:

### Option: CacheSize
#       Size of configuration cache, in bytes.
#       Shared memory size for storing host, item and trigger data.
#
# Mandatory: no
# Range: 128K-1G
# Default:
# CacheSize=8M
CacheSize=64M

Do not change the StartDBSyncers value. Leave that at 4.

Remember to restart the Zabbix server process after making the adjustments. Let things run for a few hours, then re-check your internal process graphs and see if things have improved.
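Before restarting, it can help to double-check which values are actually active, since the commented-out defaults sit right next to the new lines. Below is a minimal sketch of such a check; the function name is hypothetical, and the sample text simply repeats the settings suggested above rather than reading a real file.

```python
def effective_settings(conf_text):
    """Return the active (uncommented) Key=Value pairs from zabbix_server.conf text."""
    settings = {}
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments, including the commented-out defaults.
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if separator:
            settings[key.strip()] = value.strip()
    return settings


sample = """\
# StartTrappers=5
StartTrappers=15
# StartPollers=5
StartPollers=35
# StartPollersUnreachable=1
StartPollersUnreachable=5
# CacheSize=8M
CacheSize=64M
"""

for name, value in effective_settings(sample).items():
    print(f"{name} = {value}")
```

To run it against the real configuration, pass in the contents of your own zabbix_server.conf; its location depends on how Zabbix was installed.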

**PiotrIr** · 27-06-2014, 12:05

tchjts1,

You are a genius. This works like a charm! Thank you so much!

**tchjts1** · 01-07-2014, 19:26

You're welcome. Keep in mind my suggested settings are just that - suggestions. You may need to tweak them further for optimal performance.

These internal graphs should be reviewed on a regular basis, and particularly if you are adding in more hosts/items/triggers as time goes on.

One other point is that the template the internal items belong to also has built in triggers that you should have seen on the Zabbix dashboard. Specifically the one that would have triggered for trappers being more than 75% busy.

I would suggest that these triggers be taken seriously as it certainly affects Zabbix performance.
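The idea behind those triggers can be sketched roughly as follows, assuming each one compares an average of the process's busy percentage over a recent window against 75. The exact expressions are defined in the template; the helper name and the sample numbers here are made up for illustration.

```python
def busy_alerts(samples_by_process, threshold=75.0):
    """Return the process types whose average busy % exceeds the threshold."""
    alerts = []
    for process, samples in samples_by_process.items():
        if not samples:
            continue  # no data collected for this process type
        average = sum(samples) / len(samples)
        if average > threshold:
            alerts.append(process)
    return sorted(alerts)


# Hypothetical busy percentages, not taken from the graphs in this thread
hypothetical = {
    "trapper": [100.0, 100.0, 98.5],
    "poller": [40.0, 55.0, 61.0],
    "housekeeper": [],
}

print(busy_alerts(hypothetical))
```

With these made-up numbers only the trapper entry is reported, which matches the situation before the configuration change.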

**PiotrIr** · 02-07-2014, 17:51

I will keep an eye on this. Once again, thank you so much.

**tchjts1** · 02-07-2014, 17:54

I have an action set up to notify me by e-mail any time these items trip the thresholds.

**PiotrIr** · 02-07-2014, 18:06

This makes perfect sense, thank you for the advice.

I’ve also noticed a small problem with my Zabbix housekeeper process. When it runs at 100% it slows the server down a lot. I decreased items from 500 to 100 and will see if this helps. I read some posts and optimized MySQL (it didn't help much), but is there any other way to bring that 100% down? I noticed the problem is HDD speed, but I can't improve that as there is no budget for new hardware.

As I’m playing with Zabbix (I must say I like it), new things have come to mind, and I wonder if you could help me.

I have bandwidth monitoring on a couple of routers using Template SNMP Interfaces, and this is working perfectly fine. However, to troubleshoot issues I would like to monitor this per source (internal) IP address. Is there any way to monitor bandwidth per IP address directly on the router instead of the switch?

The other thing is recording connections on the router (source -> destination). I realize this may take a lot of resources, but sometimes, when I need to find an infected PC in the network, it could be very helpful.

Once again, thank you for your help.

Server status problem
