Some points of interest:
Solaris 10 running on a SPARC-Enterprise-T5120 with 16G of RAM.
Zabbix 1.8 (or trunk, it flunks with either)
Oracle 10.2.0.1 (running on a remote host, accessed via GigE).
Presently, zabbix_server.conf is at its defaults for everything except the mandatory settings, such as hostname and database setup. We've tried cranking various values up to the max and it doesn't make a bit of difference (which leads me to believe the problem is related to database performance).
Number of hosts (monitored/not monitored/templates): 89 (49 / 0 / 40)
Number of items (monitored/disabled/not supported): 1694 (1677 / 0 / 17)
Number of triggers (enabled/disabled) [true/unknown/false]: 1163 (1163 / 0) [5 / 810 / 348]
Required server performance, new values per second: 19.013055555556
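For anyone checking the math: as far as I understand it, the "required server performance" figure is roughly the sum of 1/interval over all monitored items. Here is a minimal sketch of that assumption. The item keys and intervals in the list are invented for illustration, and `required_nvps` and `average_interval` are just helper names I made up, not anything from Zabbix itself.

```python
# Hypothetical item list: (item key, update interval in seconds).
# These intervals are made up for illustration; they are not from this setup.
items = [
    ("vm.memory.size[free]", 60),
    ("system.cpu.load[,avg1]", 30),
    ("vfs.fs.size[/,pfree]", 300),
]


def required_nvps(items):
    """Sum of 1/interval over all items (assumed formula, see lead-in)."""
    return sum(1.0 / interval for _key, interval in items)


def average_interval(item_count, nvps):
    """Average update interval (seconds) implied by an item count and nvps."""
    return item_count / nvps


if __name__ == "__main__":
    # New values per second needed for the three made-up items above.
    print(round(required_nvps(items), 4))
    # Average interval implied by the figures reported in this post:
    # 1677 monitored items and 19.013055555556 new values per second.
    print(round(average_interval(1677, 19.013055555556), 1))
```

If the assumption holds, the reported numbers work out to an average item interval of a bit under a minute and a half, which is not an aggressive polling schedule.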
You'd think a server like this could handle just about anything you could throw at it, but somehow that's not the case with Zabbix Server 1.8 and Oracle 10.2.0.1 on the backend.
Preliminary data shows that we can sustain around 12 values per second before the system simply appears to stop updating. After about 15 to 30 minutes, my QUEUE is pegged at 1100 for Zabbix Agent and 111 for Simple Checks in the "Longer than 10 minutes" column. I am overwhelmed by false positives on a huge number of triggers, apparently because the incoming data is not getting recorded. Even with DebugLevel at 4 I'm only seeing SUCCEED for incoming data. It just doesn't seem to be getting in fast enough. Here's an example:
Code:
23266:20091228:220143.479 Get value from agent result: '2335078905'
23266:20091228:220143.479 End of get_value():SUCCEED
23266:20091228:220143.479 In calculate_item_nextcheck (1663,60,"",1262059303)
23266:20091228:220143.479 End calculate_item_nextcheck (result:1262059363)
23266:20091228:220143.479 In substitute_simple_macros (data:'vm.memory.size[free]')
23266:20091228:220143.479 In get_value() key:'vm.memory.size[free]'
23266:20091228:220143.480 In get_value_agent() host:'REDACTED' addr:'REDACTED' key:'vm.memory.size[free]'
23266:20091228:220143.481 Sending [vm.memory.size[free]
I am also flummoxed by how long it's taking to restart Zabbix. First it takes forever to write the values out to the database and then it takes forever to start again after:
Code:
22157:20091228:213915.653 tr value [0] event_prev_value [2] event_last_status [0] new_value [2]
22157:20091228:213915.653 Updating trigger
22157:20091228:213915.653 Query [txnlev:1] [update triggers set value=2,lastchange=1262056518,error='Zabbix was restarted.' where triggerid=13645]
All of the bajillion similar entries go by at the speed of slow. Is it really updating every trigger with "Zabbix was restarted"?
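To put a number on "the speed of slow", one option is to pull the timestamps out of the debug log and work out how many trigger updates per second actually go through. The following is a rough sketch, not a tested tool. It assumes every log line starts with the `PID:YYYYMMDD:HHMMSS.mmm` prefix seen in the excerpts; `parse_line` and `trigger_update_rate` are names invented here, and in the sample only the first line comes from the log above (the other two are made up).

```python
import re
from datetime import datetime, timedelta

# Matches the prefix seen in the excerpts: PID:YYYYMMDD:HHMMSS.mmm message
LINE_RE = re.compile(r"^(\d+):(\d{8}):(\d{6})\.(\d{3}) (.*)$")


def parse_line(line):
    """Return (pid, timestamp, message), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    pid, date, time, millis, message = match.groups()
    stamp = datetime.strptime(date + time, "%Y%m%d%H%M%S")
    stamp += timedelta(milliseconds=int(millis))
    return int(pid), stamp, message


def trigger_update_rate(lines):
    """Trigger updates per second between the first and last update seen."""
    stamps = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None and parsed[2].startswith("Updating trigger"):
            stamps.append(parsed[1])
    if len(stamps) < 2:
        return None  # not enough data points to compute a rate
    elapsed = (max(stamps) - min(stamps)).total_seconds()
    if elapsed == 0:
        return None  # all updates share one timestamp; rate is undefined
    # n updates span n-1 intervals
    return (len(stamps) - 1) / elapsed


# First line is from the log above; the other two are invented sample data.
sample = [
    "22157:20091228:213915.653 Updating trigger",
    "22157:20091228:213915.903 Updating trigger",
    "22157:20091228:213916.153 Updating trigger",
]

if __name__ == "__main__":
    print(trigger_update_rate(sample))
```

Feeding it the real server log (wherever LogFile points on your system) and comparing the result against the 1163 triggers listed above should show whether the restart time is dominated by these per-trigger UPDATE statements.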
We did NOT have any problems like this with 1.6.x. What gives? OCI was supposed to be faster than libsqlora8. It doesn't seem that way to me.
I guess I want to know that there is a) a way to improve Oracle performance, b) a magical configuration setting I'm missing, or c) something else?