**tchjts1** · 22-01-2013, 22:46

On your internal process % statistics, I do not see "Zabbix busy poller process %" listed. (I saw proxy poller process). Your graph only shows a 1 hour timeframe. You should look at a larger window and see how that looks. Your proxy poller process looks relatively low for that one hour. You should take a look at Zabbix busy poller process %.

Anyway, how many hosts are you monitoring? I have about 200 hosts and I have my StartPollers value set at 70 for Zabbix server. (No proxies) You should also take a look at your cache usage stats.

**mcmyst** · 23-01-2013, 00:02

I have around 300 network switches monitored via SNMPv2.
Here are the graphs:

I know that my pollers look low, but before today StartPollers was at 8 for both the server and the proxy. Today I set it to 15, but there was no change in the queue. The pollers do not look busy, so in my opinion increasing the value will not change anything. Do you think I should try more?

But I don't know where I should increase it: on the server or on the proxy?

**tchjts1** · 23-01-2013, 00:14

Originally posted by mcmyst

I know that my pollers look low, but before today StartPollers was at 8 for both the server and the proxy. Today I set it to 15, but there was no change in the queue. The pollers do not look busy, so in my opinion increasing the value will not change anything. Do you think I should try more?

But I don't know where I should increase it: on the server or on the proxy?

Yeah, you're right. All your pollers look fine for resource usage. Are you seeing anything helpful in your zabbix_server.log?

Maybe you are having some timeouts happening. You could try increasing this variable by a few seconds to see if it helps:

Code:

### Option: Timeout
#       Specifies how long we wait for agent, SNMP device or external check (in seconds).
#
# Mandatory: no
# Range: 1-30
# Default:
# Timeout=3

And any changes you make to the zabbix_server.conf file require a restart of the Zabbix server process.

**mcmyst** · 23-01-2013, 00:21

OK, thank you. I will try tomorrow morning and post the results here.

**tchjts1** · 23-01-2013, 00:23

Check your server log first before you change settings... see if there is any obvious error happening there.

**flako** · 23-01-2013, 00:46

Hello,
How does the 'Zabbix performance' graph (the queue) look?
Looking at your graphs, I would try disabling the housekeeper (in zabbix_server.conf). It is 100% saturated (I'll bet you a beer that it never finishes), which saturates the DB and causes the number of queued items to grow. You are also running it every 70 minutes. That is too frequent; once per day would be enough if the housekeeper worked.

**tchjts1** · 23-01-2013, 01:07

Although the housekeeper may be contributing to some of the bottleneck, I wouldn't recommend disabling it unless you have your DB partitioned and are managing old data that way.

My housekeeper is the same as yours, except a bit shorter in duration. I would look at optimizing your settings in my.cnf rather than disabling housekeeper.

I, too, thought the same as Flako and set it to run only every 12 hours. That was a bad choice though, as it then ran for about 2 hours solid instead of 5 or 10 minutes every hour.

**mcmyst** · 23-01-2013, 07:59

Thank you all for your replies.

I know that the housekeeper is a performance killer, but I have to run it to delete old data because I don't have a partitioned database (MySQL can't use foreign keys on partitioned tables). And even if I changed database engines, I want to keep some items for 3 years and others for one year, so partitioning would never work.

Here are my logs from the proxy:

Code:

snmp_build: unknown failure 25485:20130123:064848.679 SNMP item [Ethernet0-0-5.ifHCInBroadcastPkts] on host [SWITCH] failed: first network error, wait for 15 seconds
snmp_build: unknown failure 25491:20130123:064903.039 SNMP item [Ethernet0-0-32.ifHCInBroadcastPkts] on host [SWITCH] failed: another network error, wait for 15 seconds
 25491:20130123:064918.054 resuming SNMP checks on host [SWITCH]: connection restored
snmp_build: unknown failure 25485:20130123:064928.660 SNMP item [Ethernet0-0-35.ifHCInBroadcastPkts] on host [SWITCH] failed: first network error, wait for 15 seconds
 25491:20130123:064943.079 resuming SNMP checks on host [SWITCH]: connection restored
snmp_build: unknown failure 25478:20130123:064948.530 SNMP item [Ethernet0-0-3.ifHCInBroadcastPkts] on host [SWITCH] failed: first network error, wait for 15 seconds
snmp_build: unknown failure 25491:20130123:065003.085 SNMP item [Ethernet0-0-3.ifHCInBroadcastPkts] on host [SWITCH] failed: another network error, wait for 15 seconds
 25491:20130123:065018.110 resuming SNMP checks on host [SWITCH]: connection restored
snmp_build: unknown failure 25481:20130123:065048.282 SNMP item [FastEthernet2-14.ifHCInBroadcastPkts] on host [SWITCH] failed: first network error, wait for 15 seconds
 25469:20130123:065050.155 Received configuration data from server. Datalen 10951816
snmp_build: unknown failure 25491:20130123:065106.423 SNMP item [FastEthernet2-14.ifHCInBroadcastPkts] on host [SWITCH] failed: another network error, wait for 15 seconds
snmp_build: unknown failure 25478:20130123:065108.605 SNMP item [Ethernet0-0-6.ifHCInBroadcastPkts] on host [SWITCH] failed: first network error, wait for 15 seconds
 25491:20130123:065118.432 resuming SNMP checks on host [SWITCH]: connection restored
 25488:20130123:065123.439 resuming SNMP checks on host [SWITCH]: connection restored
snmp_build: unknown failure 25473:20130123:065128.598 SNMP item [Ethernet0-0-4.ifHCInBroadcastPkts] on host [SWITCH] failed: first network error, wait for 15 seconds
snmp_build: unknown failure 25491:20130123:065143.439 SNMP item [Ethernet0-0-4.ifHCInBroadcastPkts] on host [SWITCH] failed: another network error, wait for 15 seconds
snmp_build: unknown failure 25491:20130123:065158.449 SNMP item [Ethernet0-0-41.ifHCInBroadcastPkts] on host [SWITCH] failed: another network error, wait for 15 seconds
 25491:20130123:065213.507 resuming SNMP checks on host [SWITCH]: connection restored
snmp_build: unknown failure 25485:20130123:065223.634 SNMP item [Ethernet0-0-44.ifHCInBroadcastPkts] on host [SWITCH] failed: first network error, wait for 15 seconds
snmp_build: unknown failure 25488:20130123:065238.459 SNMP item [Ethernet0-0-44.ifHCInBroadcastPkts] on host [SWITCH] failed: another network error, wait for 15 seconds
 25488:20130123:065253.475 resuming SNMP checks on host [SWITCH]: connection restored
 25469:20130123:065307.898 Received configuration data from server. Datalen 10951816
snmp_build: unknown failure 25486:20130123:065308.798 SNMP item [Ethernet0-0-40.ifHCInBroadcastPkts] on host [SWITCH] failed: first network error, wait for 15 seconds
snmp_build: unknown failure 25488:20130123:065324.192 SNMP item [Ethernet0-0-40.ifHCInBroadcastPkts] on host [SWITCH] failed: another network error, wait for 15 seconds
 25491:20130123:065338.211 resuming SNMP checks on host [SWITCH]: connection restored

These are the only errors I see. I think it is because there are too many SNMP queries on the host and it stops responding. I will see if I can raise the limits on all the switches.
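When the same kind of error repeats like this, it can help to count failures per item, since a handful of items failing over and over points at the items rather than at the switch. Here is a minimal Python sketch, assuming log lines shaped like the excerpt above. The regular expression and the function name `count_snmp_failures` are my own illustration, not something Zabbix ships.

```python
import re
from collections import Counter

# Matches failure lines of the form seen in the proxy log excerpt, e.g.
# "... SNMP item [Ethernet0-0-5.ifHCInBroadcastPkts] on host [SWITCH] failed: first network error, ..."
# The pattern is an assumption based only on that excerpt.
FAILURE_RE = re.compile(
    r"SNMP item \[(?P<item>[^\]]+)\] on host \[(?P<host>[^\]]+)\] failed: "
    r"(?:first|another) network error"
)


def count_snmp_failures(lines):
    """Return a Counter keyed by (host, item) of 'network error' log lines."""
    counts = Counter()
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            counts[(match.group("host"), match.group("item"))] += 1
    return counts


if __name__ == "__main__":
    # Sample lines copied from the log excerpt above.
    sample = [
        "snmp_build: unknown failure 25478:20130123:064948.530 SNMP item [Ethernet0-0-3.ifHCInBroadcastPkts] on host [SWITCH] failed: first network error, wait for 15 seconds",
        "snmp_build: unknown failure 25491:20130123:065003.085 SNMP item [Ethernet0-0-3.ifHCInBroadcastPkts] on host [SWITCH] failed: another network error, wait for 15 seconds",
        " 25491:20130123:065018.110 resuming SNMP checks on host [SWITCH]: connection restored",
    ]
    # Print the most frequently failing items first.
    for (host, item), n in count_snmp_failures(sample).most_common():
        print(f"{host} {item}: {n} failure(s)")
```

Items that show up at the top of this list again and again are worth checking individually (key, OID, community) before touching any limits on the device.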

On the server side, I have only this:

Code:

 1047:20130123:062534.813 Sending configuration data to proxy 'proxy'. Datalen 10951816
  1048:20130123:062753.672 Sending configuration data to proxy 'proxy'. Datalen 10951816
  1055:20130123:062837.192 housekeeper deleted: 1134607 records from history and trends, 0 records of deleted items, 0 events, 0 alerts, 0 sessions
  1047:20130123:063011.860 Sending configuration data to proxy 'proxy'. Datalen 10951816
  1046:20130123:063229.189 Sending configuration data to proxy 'proxy'. Datalen 10951816
  1049:20130123:063446.985 Sending configuration data to proxy 'proxy'. Datalen 10951816
  1049:20130123:063704.754 Sending configuration data to proxy 'proxy'. Datalen 10951816
  1049:20130123:063922.709 Sending configuration data to proxy 'proxy'. Datalen 10951816
  1048:20130123:064139.850 Sending configuration data to proxy 'proxy'. Datalen 10951816
  1047:20130123:064357.460 Sending configuration data to proxy 'proxy'. Datalen 10951816
  1050:20130123:064614.870 Sending configuration data to proxy 'proxy'. Datalen 10951816
  1046:20130123:064832.537 Sending configuration data to proxy 'proxy'. Datalen 10951816
  1048:20130123:065049.954 Sending configuration data to proxy 'proxy'. Datalen 10951816
  1050:20130123:065307.683 Sending configuration data to proxy 'proxy'. Datalen 10951816
  1049:20130123:065525.448 Sending configuration data to proxy 'proxy'. Datalen 10951816
  1050:20130123:065742.999 Sending configuration data to proxy 'proxy'. Datalen 10951816

Yes, the housekeeper is saturated at 100%, but as you can see, it is not causing a lot of trouble for the database. But you made me realize that the housekeeper is still enabled on my proxy. Could that be the cause? I will try disabling it at work this morning, just to see if it gets better.

**mcmyst** · 23-01-2013, 09:52

Ok so I think I have found my problem thanks to this post:

https://www.zabbix.com/forum/showpost.php?p=99280&postcount=4

So I double-checked the proxy log entries saying "first network error" and found that some item OIDs were malformed, as follows:
.3.6.1.2.1.31.1.1.1.9.6 instead of 1.3.6.1.2.1.31.1.1.1.9.6

The thing is that the items with these OIDs were not in the "Not Supported" state but in the "Enabled" state.

In fact, I had developed a program to automatically create Zabbix items/triggers/graphs through the API. The typo was in my program code...

So now I have to track down all the malformed OIDs and it should be much better!
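For anyone who needs to hunt down OIDs like this, a quick sanity check on the numeric form can catch them before they reach Zabbix. This is only a sketch: the function name `is_valid_numeric_oid` and the sample list are hypothetical, and it only handles purely numeric OIDs (symbolic names such as `IF-MIB::ifHCInOctets.6` are rejected). It relies on the standard rule that the first arc of an OID is 0, 1 or 2, and that the second arc is below 40 when the first is 0 or 1.

```python
import re

# Purely numeric dotted OID with at least two arcs, optionally with one leading dot.
_NUMERIC_OID = re.compile(r"^\.?\d+(\.\d+)+$")


def is_valid_numeric_oid(oid: str) -> bool:
    """Return True if `oid` is a numeric OID with a valid root.

    The first arc must be 0, 1 or 2; when it is 0 or 1, the second arc
    must be below 40. Symbolic (textual) OIDs are out of scope and
    return False.
    """
    if not _NUMERIC_OID.match(oid):
        return False
    arcs = [int(arc) for arc in oid.lstrip(".").split(".")]
    if arcs[0] > 2:
        return False
    if arcs[0] < 2 and arcs[1] >= 40:
        return False
    return True


if __name__ == "__main__":
    # Hypothetical list; in practice the OIDs would come from wherever
    # the items are generated or stored.
    candidates = [
        ".3.6.1.2.1.31.1.1.1.9.6",   # the malformed form from this thread
        "1.3.6.1.2.1.31.1.1.1.9.6",  # the intended form
    ]
    for oid in candidates:
        status = "ok" if is_valid_numeric_oid(oid) else "MALFORMED"
        print(f"{oid}: {status}")
```

Running a check like this inside the program that generates the items would have flagged the missing leading 1, because an OID starting with arc 3 has no valid root.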

**mcmyst** · 23-01-2013, 11:11

We did it!

So the problem was 29 items with malformed OIDs that were not reported as "Not Supported"...

So thank you everyone for your help, and thank you 'tchjts1' for pointing me to the logs!

And as you can see, even if the housekeeper is at 100%, the queue is getting lower and lower !

Huge zabbix queue but where is the bottleneck ?
