**Alexei** · 29-12-2009, 10:04

Zabbix 1.8 is supposed to run much faster than 1.6.x. The after-restart trigger update logic is exactly the same as in 1.6. 19 values per second can easily be handled by embedded hardware; your box is capable of monitoring 50x more hosts, items and triggers (provided disk IO is fast).

**untergeek** · 29-12-2009, 15:42

Thanks for the reply, Alexei. I just want to know why the discrepancy exists. We were handling 50 values per second with 1.6.8 and the server wasn't breaking a sweat. Why in 1.8 am I suffering? OCI vs. libsqlora8 can't explain this, can it? It was bound to the lib32 oracle libraries and so is the OCI 1.8 server. I just don't get it. I will paste in a screen capture of my queue so you can see how backed up it all is (I'm replying from my iPhone right now so I can't).

**Alexei** · 29-12-2009, 15:45

Yes, OCI is supposed to work faster. I cannot answer your question without seeing more details.

**untergeek** · 29-12-2009, 16:01

What details would you like? I will harvest anything I can get from the debug log.

**untergeek** · 29-12-2009, 16:08

Here's the screen cap of my queue (attached).

**untergeek** · 29-12-2009, 16:31

Configuration:

Code:

Configuration:

  Detected OS:           solaris2.10
  Install path:          /usr/local
  Compilation arch:      solaris

  Compiler:              /usr/bin/cc
  Compiler flags:        -I/usr/local/include -I/opt/sfw/include -I/opt/oracle/product/10.2.0.1/rdbms/public -I/opt/oracle/product/10.2.0.1/rdbms/demo       -I/usr/sfw/include -I/usr/local/include -I. -I/usr/local/include    -I/usr/local/include 

  Enable server:         yes
  With database:         Oracle
  WEB Monitoring via:    cURL
  Native Jabber:         no
  SNMP:                  net-snmp
  IPMI:                  no
  Linker flags:          -L/usr/local/include -L/opt/oracle/product/10.2.0.1/lib32 -R/usr/local/include -R/opt/oracle/product/10.2.0.1/lib32  -L/usr/local/lib   -L/opt/oracle/product/10.2.0.1/lib      -L/opt/sfw/lib -lcurl -L/usr/sfw/lib -lssl -lcrypto -lsocket -lnsl -lssl -lcrypto -lsocket -lnsl -ldl -lz  -L/usr/local/lib -L/usr/sfw/lib -L/usr/local/lib -lnetsnmp -lgen -lelf -lnsl -lsocket -lcrypto  -L/usr/local/lib -L/usr/sfw/lib -L/usr/local/lib -lnetsnmp -lgen -lelf -lnsl -lsocket -lcrypto 
  Libraries:             -lkvm -lm -lnsl -lkstat -lsocket  -lresolv -liconv  -lclntsh -lnnz10     -lcurl  -lnetsnmp  

  Enable proxy:          yes
  With database:         Oracle
  WEB Monitoring via:    cURL
  SNMP:                  net-snmp
  IPMI:                  no
  Linker flags:          -L/usr/local/include -L/opt/oracle/product/10.2.0.1/lib32 -R/usr/local/include -R/opt/oracle/product/10.2.0.1/lib32  -L/usr/local/lib   -L/opt/oracle/product/10.2.0.1/lib     -L/opt/sfw/lib -lcurl -L/usr/sfw/lib -lssl -lcrypto -lsocket -lnsl -lssl -lcrypto -lsocket -lnsl -ldl -lz  -L/usr/local/lib -L/usr/sfw/lib -L/usr/local/lib -lnetsnmp -lgen -lelf -lnsl -lsocket -lcrypto  -L/usr/local/lib -L/usr/sfw/lib -L/usr/local/lib -lnetsnmp -lgen -lelf -lnsl -lsocket -lcrypto 
  Libraries:             -lkvm -lm -lnsl -lkstat -lsocket  -lresolv -liconv  -lclntsh -lnnz10    -lcurl  -lnetsnmp  

  Enable agent:          yes
  Linker flags:          -L/usr/local/include -L/opt/oracle/product/10.2.0.1/lib32 -R/usr/local/include -R/opt/oracle/product/10.2.0.1/lib32  -L/usr/local/lib 
  Libraries:             -lkvm -lm -lnsl -lkstat -lsocket  -lresolv -liconv

  LDAP support:          no
  IPv6 support:          no

We're not using any proxy for monitoring. All hosts are directly reachable by the Zabbix Server. I merely compiled it in case we wanted it in the future.

**untergeek** · 29-12-2009, 17:50

Zabbix Shutdown time

How long should it take to shut down a Zabbix 1.8 server?

I understand that it is performing history syncing. How long should it take for this to complete?

Here's how long it takes for the above server:

Code:

  2945:20091229:092625.677 One child process died (PID:3236). Exiting ...
  2945:20091229:092627.769 Syncing history data...
  2945:20091229:092923.772 Syncing history data...3.637686%
  2945:20091229:093310.670 Syncing history data...7.275373%
  2945:20091229:093705.617 Syncing history data...10.913059%
  2945:20091229:093939.803 Syncing history data...14.550746%
  2945:20091229:094218.742 Syncing history data...18.188432%
  2945:20091229:094613.130 Syncing history data...21.826119%
  2945:20091229:095004.805 Syncing history data...25.463805%
  2945:20091229:095250.587 Syncing history data...29.101491%
  2945:20091229:095520.293 Syncing history data...32.739178%
  2945:20091229:095912.743 Syncing history data...36.376864%

I'm not even going to bother making you wait for the end of this. There are simply not enough items for this to take this long, are there?

**untergeek** · 29-12-2009, 18:36

Here's the complete story.
It took from 9:26AM until 10:30AM to sync history data.

This can't be right.

Code:

  2945:20091229:092625.677 One child process died (PID:3236). Exiting ...
  2945:20091229:092627.769 Syncing history data...
  2945:20091229:092923.772 Syncing history data...3.637686%
  2945:20091229:093310.670 Syncing history data...7.275373%
  2945:20091229:093705.617 Syncing history data...10.913059%
  2945:20091229:093939.803 Syncing history data...14.550746%
  2945:20091229:094218.742 Syncing history data...18.188432%
  2945:20091229:094613.130 Syncing history data...21.826119%
  2945:20091229:095004.805 Syncing history data...25.463805%
  2945:20091229:095250.587 Syncing history data...29.101491%
  2945:20091229:095520.293 Syncing history data...32.739178%
  2945:20091229:095912.743 Syncing history data...36.376864%
  2945:20091229:100307.935 Syncing history data...40.014551%
  2945:20091229:100541.387 Syncing history data...43.652237%
  2945:20091229:100800.715 Syncing history data...47.289924%
  2945:20091229:101212.558 Syncing history data...50.927610%
  2945:20091229:101614.931 Syncing history data...54.565296%
  2945:20091229:101933.392 Syncing history data...58.202983%
  2945:20091229:102131.802 Syncing history data...61.840669%
  2945:20091229:102508.659 Syncing history data...65.478356%
  2945:20091229:102854.285 Syncing history data...69.116042%
  2945:20091229:102904.472 Syncing history data...70.163696%
  2945:20091229:102915.612 Syncing history data...75.489269%
  2945:20091229:102926.746 Syncing history data...79.927246%
  2945:20091229:102936.797 Syncing history data...83.477628%
  2945:20091229:102946.252 Syncing history data...87.519098%
  2945:20091229:102956.350 Syncing history data...92.204438%
  2945:20091229:103006.552 Syncing history data...97.224445%
  2945:20091229:103012.663 Syncing history data...done.
  2945:20091229:103012.664 Syncing trends data...
  2945:20091229:103018.604 Syncing trends data...done.
  2945:20091229:103018.606 Zabbix Server stopped.
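
For anyone who wants to measure the same thing from their own server log, here is a rough sketch in Python. It assumes every line starts with a `PID:YYYYMMDD:HHMMSS.mmm` prefix, as in the excerpt above. The function names `parse_ts` and `sync_duration` are made up for illustration and are not part of Zabbix.

```python
from datetime import datetime


def parse_ts(line):
    """Parse the 'PID:YYYYMMDD:HHMMSS.mmm' prefix of a log line into a datetime."""
    prefix = line.split()[0]              # e.g. '2945:20091229:092627.769'
    _pid, date, time = prefix.split(":")
    return datetime.strptime(date + time, "%Y%m%d%H%M%S.%f")


def sync_duration(lines):
    """Return the time between the first 'Syncing history data...' line
    and the matching '...done.' line."""
    start = end = None
    for line in lines:
        if "Syncing history data..." not in line:
            continue
        ts = parse_ts(line)
        if start is None:
            start = ts                    # first history sync message
        if line.rstrip().endswith("done."):
            end = ts                      # history sync finished
    if start is None or end is None:
        raise ValueError("history sync start or end not found in log")
    return end - start
```

Fed the log above, this works out to about 1 hour 3 minutes 45 seconds between the first "Syncing history data..." line (09:26:27) and "done." (10:30:12).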

**chivo** · 30-12-2009, 23:13

Is this the same database and hardware you used with Zabbix 1.6.x? If you upgraded, did you follow the upgrade procedure for the database changes? Primarily I'm thinking about having to drop specific indexes and create new ones.

Second, is this Oracle database used for any other applications? What is the back-end storage like? Even with 1.6.x you should be able to handle much more than 50 new values per second. Using an HP Intel server with similar memory, I can run the Zabbix server and MySQL DB and have 538 hosts with 56496 items checked, and I'm not really pushing the system. (That's about 260 new values per second.)

Given that, I would check that your Oracle configuration is optimized and disk IO for your DB is good.
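
As a back-of-the-envelope check on those numbers, here is a small Python sketch. It assumes the rate of new values per second is simply the item count divided by the average update interval. That is a simplification, and the helper name `implied_avg_interval` is hypothetical.

```python
def implied_avg_interval(item_count, values_per_second):
    """Average update interval (in seconds) implied by an item count
    and an observed rate of new values per second."""
    return item_count / values_per_second


# Figures quoted in the post above: 56496 items at about 260 new values per second.
interval = implied_avg_interval(56496, 260)
print(f"average interval: {interval:.1f} s")
```

Under that assumption, 56496 items at about 260 values per second corresponds to an average interval of a bit over 217 seconds per item.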

**untergeek** · 30-12-2009, 23:26

It is the same hardware and Oracle server. We even started over from scratch with a clean schema for 1.8, just to be sure. It's not from having plugged-up indexes.

**ericgearhart** · 03-01-2010, 21:23

Hmmm, this really feels like a possible bug with the Oracle driver to me, especially if it was changed from 1.6 -> 1.8.

If I had an extra Oracle box lying around I'd try to help... unfortunately my experience in DBA-related things is limited to MySQL and Postgres (and a tiny bit of MS SQL).

**untergeek** · 03-01-2010, 21:36

They say they've tried with Oracle 11g. Can anyone running Oracle 11g on Solaris confirm that they are not experiencing this problem?

**ericgearhart** · 03-01-2010, 21:42

There's a "free" (as in beer) Oracle edition similar to MS SQL Express, aptly titled "Oracle Express Edition"

"Express Edition[40] ('Oracle Database XE'), introduced in 2005, offers Oracle 10g free to distribute on Windows and Linux platforms. It has a footprint of only 150 MB and is restricted to the use of a single CPU, a maximum of 4 GB of user data. Although it can install on a server with any amount of memory, it uses a maximum of 1 GB.[41] Support for this version comes exclusively through on-line forums and not through Oracle support." (from the WP article)

... I don't see an Express Edition for 11g though. If there was an 11g Express version I'd be tempted to throw up an 11g Express Edition Linux VM and test this...

**untergeek** · 03-01-2010, 21:43

Thanks for the thought. Yeah, we're running full enterprise Oracle. I'd probably need an apples to apples comparison. I might be able to convince my boss to run an 11g install somewhere on our cluster.

Ridiculously low performance threshold in 1.8
