**tchjts1** · 23-04-2014, 14:06

Use your Linux OS template to get CPU, memory, IO, etc stats from your Zabbix server.

I recall the UI slowness from version 2.0.6. If you move to 2.0.9, I think you will see that is resolved. You have an NVPS of 86. Ours is currently just over 1,000 and the GUI, graphs and screens are lightning fast.

See the bottom of this post and the graphs that follow it for ways to improve Zabbix internal tuning: https://www.zabbix.com/forum/showthread.php?t=41219

**zabbixfk** · 23-04-2014, 14:54

Obtaining I/O ops, RAM, CPU details for zabbix server : For Benchmarking

Thanks for the reply. Very sorry that I did not convey my message properly.

I have already linked that template and am getting the CPU/RAM data.

In the process of benchmarking and sizing (as my device count will grow, and thereby items/triggers and NVPS as well), I was looking to find out:

a) how much CPU/RAM/IO each Zabbix server process (zabbix_server) takes (any formula, commands, or scripts to calculate this)
b) how much CPU/RAM/IO is consumed when a device is added to the Zabbix server from the UI
c) how much load each UI session (a user logged in to the Zabbix server from a browser) adds, as I need to add more users and most of them will be online.
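For (a), one rough approach is to sum what `ps` reports for every zabbix_server process. The following is only a minimal sketch: the helper names `read_ps` and `summarize` are made up for illustration, and the sample line values are hypothetical, not measurements from this server. Note that `ps` reports pcpu as CPU time divided by process lifetime (not an instantaneous figure), and that summing RSS counts shared memory once per process, so the memory total is an upper bound.

```python
import subprocess


def read_ps(process_name="zabbix_server"):
    """Return raw ps output: one 'pid pcpu rss' line per matching process.

    Each column gets its own -o option so the empty header ('=') applies
    to that column only.
    """
    result = subprocess.run(
        ["ps", "-C", process_name, "-o", "pid=", "-o", "pcpu=", "-o", "rss="],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def summarize(ps_output):
    """Sum %CPU and resident memory (ps prints RSS in KiB) over all lines."""
    count, cpu_total, rss_kib = 0, 0.0, 0
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        count += 1
        cpu_total += float(fields[1])
        rss_kib += int(fields[2])
    return {"processes": count, "cpu_percent": cpu_total, "rss_mib": rss_kib / 1024}


# Hypothetical sample in the same 'pid pcpu rss' layout, NOT measured data.
sample = "1201 0.5 10240\n1202 1.5 20480\n"
print(summarize(sample))
```

On the server itself, `summarize(read_ps())` would give the totals for zabbix_server; passing the web server's process name instead would give a rough figure for the frontend side.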

I am also posting some of the graphs you mentioned in your reply.
Thanks.
[IMG]http://s30.postimg.org/ru8glhxkx/Screen_Shot_2014_04_23_at_6_12_01_PM.png[/IMG]

**tchjts1** · 23-04-2014, 15:28

Your unreachable poller process is running very high. On your Zabbix server, in zabbix_server.conf, try increasing StartPollersUnreachable by 5 and restarting your Zabbix server process.

Your housekeeper process also looks a bit strange to me. What are your settings for that? I run mine every 1 hour with a maxdelete of 500. With that setting, it runs for about 10 minutes every hour with a very predictable pattern. Can you show that same graph for a 1 day (24 hour) period instead of a 7 day period?

And also, as I mentioned, if you have the ability to upgrade to version 2.0.9, I think you will see much better performance in the GUI.

**jan.garaj** · 23-04-2014, 23:48

10% CPU iowait is not good - it can be good if you have a slow HDD and a super fast CPU - I think that you have a slow HDD and a lot of IOPS (try disabling debug level 4). Also, some standard CPU usage metrics are missing. Why? Standard Linux tools can help you detect your bottleneck:
iostat -xd..., mpstat -P ALL, top/htop, uptime (load), mysqltuner.pl, ....

**tchjts1** · 23-04-2014, 23:54

Originally posted by jan.garaj

10% CPU iowait is not good - it can be good if you have slow hdd and super fast CPU, ....

Speaking of IO Wait, we had a similar issue because of the swappiness setting in Linux. Check this post I wrote up: https://www.zabbix.com/forum/showthread.php?t=38575

Whether this applies in this case or not, I don't know. But it is worth being informed about possible solutions.

**zabbixfk** · 24-04-2014, 09:51

Thank you all for the replies.

I will work on the tuning.
- I am also a bit confused about why CPU metrics are not displayed on the Zabbix graphs (I have used the Zabbix server template); will debug more on this.

- Will try again after disabling the debug level (it looks like too much log writing may be causing the disk wait).

- I was looking at the top/iostat/uptime commands and also wrote one small script (please check the first message of this thread), but was not able to diagnose the results/issues. We actually changed the HDD after seeing more than 12% iowait (iostat command output).

- Sorry, I am not in a position to take downtime and upgrade the build to 2.0.9 (I see that the latest is version 2.2.x). Moreover, the database size is close to 22G (MySQL), and I am not sure the auto-upgrade scripts in 2.2.x can handle that.

- Good point about the swappiness setting: it was 60 (the default), and I have changed it to 10 now. Let's see how it goes (after changing this, iostat -xkcd shows 8.93% iowait).

- Housekeeper settings are 1 and 500 (run every 1 hour, 500 deletes per run); during this process I can see CPU shooting up.

- Thanks again for all the replies. Sorry to deviate, but I wanted to find out more on the benchmark part; what I was looking at was sizing. Assuming the existing device count doubles, what kind of server would I need in future with respect to CPU/RAM etc., based on measuring how much each zabbix_server process and each Zabbix Apache frontend process takes? That said, all the replies are very helpful for tuning, for understanding the Zabbix server, and for finding out what I am doing wrong.

Thanks.

**jan.garaj** · 24-04-2014, 23:47

12% iowait (command output of iostat) is IMHO not OK. Find the reason: http://superuser.com/questions/50091...gnu-linux-base
Your server sizing looks perfect for 100 NVPS, so keep investigating.

**pc99096** · 26-04-2014, 07:42

there might be a performance problem with slow hdd, other hw specifications look ok. i would try to decrease the history/trend interval in the templates.

**zabbixfk** · 28-04-2014, 14:19

Thank you all for the replies.

I spent a couple of hours analysing disk performance, running both iostat and iotop, but was unable to come to a conclusion; maybe my interpretation of the results is poor.
IOTop output:

I was told that jbd2/dm-2-8 is the disk controller, so its value touching close to 90% in the IO column is okay (though it sounded scary to me).

IOSTAT output

Code:

Linux 2.6.32-358.18.1.el6.x86_64 (zabbixServer) 	04/28/2014 	_x86_64_	(8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.30    0.00    2.04    9.09    0.00   84.57

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.53   459.67    4.26  147.04   135.76  2359.41    32.98     1.07    7.06   4.99  75.53

SAR output. ( sar -p -d)

Code:

                      DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
Average:          sda    144.12    144.56   4036.38     29.01     10.57     73.38      6.24     89.96
Average:    VolGroup-lv_root     14.23      4.66    133.13      9.68      2.19    153.69      8.03     11.43
Average:    VolGroup-lv_swap      0.08      0.67      0.00      8.00      0.00     28.03     15.46      0.13
Average:    VolGroup-lv_var    490.61    139.22   3903.25      8.24     45.62     92.99      1.83     89.75
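As a sanity check on how the sar columns relate to each other, here is a minimal sketch (the function names are made up for illustration). It assumes sar's usual convention that rd_sec/s and wr_sec/s are counted in 512-byte sectors, and it recomputes avgrq-sz from the other columns using the values in the Average rows above.

```python
SECTOR_BYTES = 512  # sar reports rd_sec/s and wr_sec/s in 512-byte sectors


def sectors_to_kb(sectors_per_s):
    """Convert a sectors-per-second rate into kB/s (1 kB = 1024 bytes)."""
    return sectors_per_s * SECTOR_BYTES / 1024


def avg_request_size(tps, rd_sec, wr_sec):
    """Average request size in sectors, which sar prints as avgrq-sz."""
    return (rd_sec + wr_sec) / tps


# (tps, rd_sec/s, wr_sec/s) copied from the 'Average:' rows above.
rows = {
    "sda": (144.12, 144.56, 4036.38),
    "VolGroup-lv_var": (490.61, 139.22, 3903.25),
}

for dev, (tps, rd, wr) in rows.items():
    print(dev,
          round(avg_request_size(tps, rd, wr), 2), "sectors/request,",
          round(sectors_to_kb(wr), 1), "kB/s written")
```

The recomputed request sizes match the avgrq-sz column (29.01 and 8.24), and the write rate on sda comes out at roughly 2 MB/s, which is almost all of the traffic on this disk.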

And from this link, I used the formula below to calculate IOPS:

Code:

 IOPS = (d * dIOPS) / (%r + (F * %w))

Where:
d = number of disks
dIOPS = IOPS per disk (for 7.2K rpm, values range between 75 and 100; I chose 80)
%r = fraction of read workload (rd_sec / (rd_sec + wr_sec)), from the sar output
%w = fraction of write workload (wr_sec / (rd_sec + wr_sec))
F = RAID factor; for RAID 5, it is 4 for writes, 1 for reads
I got 82 as the value, which is much less than I thought. I was expecting 200+.
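The calculation can be sketched as below. Two things here are assumptions, not facts from the post: the write penalty F is taken to multiply the write fraction, i.e. IOPS = d * dIOPS / (%r + F * %w), and the disk count is taken as d = 4. Both were chosen because together they reproduce the 82 quoted above from the sda row of the sar output.

```python
def effective_iops(disks, iops_per_disk, rd_sec, wr_sec, write_penalty):
    """Usable IOPS of an array once the RAID write penalty is applied."""
    total = rd_sec + wr_sec
    read_frac = rd_sec / total    # %r in the formula
    write_frac = wr_sec / total   # %w in the formula
    return disks * iops_per_disk / (read_frac + write_penalty * write_frac)


# rd_sec/s and wr_sec/s for sda from the sar output; d=4 is an assumption.
iops = effective_iops(disks=4, iops_per_disk=80,
                      rd_sec=144.56, wr_sec=4036.38, write_penalty=4)
print(round(iops))
```

Because the workload is about 97% writes, the RAID 5 penalty applies to nearly every request, which is why the result sits close to d * dIOPS / 4 rather than anywhere near 200.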

It would be great if somebody can guide / share your thoughts/pointers.

Thanks.

**jan.garaj** · 29-04-2014, 10:50

Your Zabbix server's required performance is 84.41 new values per second.
It's 84.41*90 ~= 7,6kb ~= 1kB/s
1kB/sec + trend data ~~~~= 2kB/s
Zabbix needs to write about 2 kB/s of data to the DB, but the MySQL server writes >2 MB/s to the hard disk.

Your average queue size (avgqu-sz) for your /var (VolGroup-lv_var) is 45.62. It's a terrible value :-)
But I don't understand:
- why sda in the iostat output has a "normal" avgqu-sz of 1.07
- why the controller has 58% IOPS
What happens to the iotop stats if you stop MySQL and Zabbix? Is your RAID/LVM healthy?

From my point of view, you should minimize I/O operations from MySQL. Do not use, or disable if possible:
- query log
- binary logs

**zabbixfk** · 29-04-2014, 12:35

Thank you for the quick reply.

I did not understand your calculation of the 2K/s Zabbix write rate; can you please elaborate?

I am not sure why this controller (I guess you are referring to jbd2/dm-2-8) is shooting up on I/O. I did some googling but am still not able to figure it out.

Since it's kind of production, I am not in a position to stop Zabbix/MySQL :'(

I have disabled the query log (no change in iotop output), but I can't disable the binary log, because this Zabbix server is the master in a master/slave replication of the MySQL DB that I set up to another server.

It looks like RAID/LVM is healthy, as vgscan/lvmdiskscan didn't complain (maybe I am using the wrong commands to check?)

Code:

vgdisplay --verbose |grep PV |grep Name
    Finding all volume groups
    Finding volume group "VolGroup"
  PV Name               /dev/sda2

Code:

vgscan 
  Reading all physical volumes.  This may take a while...
  Found volume group "VolGroup" using metadata type lvm2

Code:

lvmdiskscan 
  /dev/ram0             [      16.00 MiB] 
  /dev/root             [      50.00 GiB] 
  /dev/ram1             [      16.00 MiB] 
  /dev/sda1             [     500.00 MiB] 
  /dev/VolGroup/lv_swap [       5.88 GiB] 
  /dev/ram2             [      16.00 MiB] 
  /dev/sda2             [     464.76 GiB] LVM physical volume
  /dev/VolGroup/lv_var  [     408.88 GiB] 
  /dev/ram3             [      16.00 MiB] 
  /dev/ram4             [      16.00 MiB] 
  /dev/ram5             [      16.00 MiB] 
  /dev/ram6             [      16.00 MiB] 
  /dev/ram7             [      16.00 MiB] 
  /dev/ram8             [      16.00 MiB] 
  /dev/ram9             [      16.00 MiB] 
  /dev/ram10            [      16.00 MiB] 
  /dev/ram11            [      16.00 MiB] 
  /dev/ram12            [      16.00 MiB] 
  /dev/ram13            [      16.00 MiB] 
  /dev/ram14            [      16.00 MiB] 
  /dev/ram15            [      16.00 MiB] 
  3 disks
  17 partitions
  0 LVM physical volume whole disks
  1 LVM physical volume

And this has journal enabled,

Code:

tune2fs -l /dev/mapper/VolGroup-lv_var  | grep has_journal
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize

When I checked the count of processes in uninterruptible sleep (IO), jbd2/dm-2-8 (PID 1457) showed up most often, so:

Code:

cat /proc/1457/io 
rchar: 0
wchar: 0
syscr: 0
syscw: 0
read_bytes: 0
write_bytes: 49116737536
cancelled_write_bytes: 0

Very surprised to see that rchar and read_bytes are 0!
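For context, jbd2 is the kernel's journaling thread for the ext4 filesystem on that device (the has_journal feature shown above), so a counter set with only write_bytes growing is what one would expect from it. A minimal sketch for turning the /proc counters into something readable follows; `parse_proc_io` is a made-up helper name, and the sample text is the output pasted above. The counters are cumulative, so the total says nothing about the rate unless it is sampled twice.

```python
def parse_proc_io(text):
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore lines without a colon
            counters[key.strip()] = int(value)
    return counters


# Output of `cat /proc/1457/io` as posted above.
sample = """rchar: 0
wchar: 0
syscr: 0
syscw: 0
read_bytes: 0
write_bytes: 49116737536
cancelled_write_bytes: 0
"""

io = parse_proc_io(sample)
written_gib = io["write_bytes"] / 2**30
print(f"{written_gib:.1f} GiB written, {io['read_bytes']} bytes read")
```

Reading the real file twice, a known number of seconds apart, and dividing the write_bytes difference by the interval would give the journal write rate to compare against the iostat figures.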

Code:

hdparm -tT /dev/sda2

/dev/sda2:
 Timing cached reads:   10770 MB in  2.00 seconds = 5391.83 MB/sec
 Timing buffered disk reads:  342 MB in  3.01 seconds =  113.59 MB/sec

Any pointers would be greatly helpful.

Thanks

**Navern** · 29-04-2014, 18:20

Originally posted by zabbixfk

Looks like RAID/LVM is healthy, - as vgscan/lvmdiskscan didn't complain ( may be i am using wrong commands to check ? )

To check your RAID health, you need the specific utility for your RAID controller. For example, if you use an Adaptec RAID controller, you can use the arcconf utility to check the health of your hardware RAID.

**jan.garaj** · 30-04-2014, 00:17

Originally posted by zabbixfk

I did not understand your calculation of the 2K/s Zabbix write rate; can you please elaborate?

One numeric value requires about 90 bytes in the database. I combined this with your required new values per second, plus some additional space for trends, to get my estimate of 2K/s of data writes (if you monitor only numeric values; if you monitor logs, it will of course be more).
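The estimate can be checked with a short sketch. It assumes exactly 90 bytes per value and ignores trends, indexes and log overhead, and it uses the 2359.41 wkB/s figure for sda from the iostat output earlier in the thread. Taken at face value, 84.41 values/s at 90 bytes each comes to about 7.6 kB/s rather than 1 kB/s, but the conclusion is the same: the disk is being written to a few hundred times faster than the raw history data requires.

```python
BYTES_PER_VALUE = 90               # approximate per-value size quoted above
NVPS = 84.41                       # required new values per second
OBSERVED_WRITE_KB_PER_S = 2359.41  # wkB/s for sda from the iostat output

# Raw history data rate, before trends, indexes and logs.
history_bytes_per_s = NVPS * BYTES_PER_VALUE

# How many times larger the observed write rate is (iostat kB = 1024 bytes).
ratio = OBSERVED_WRITE_KB_PER_S * 1024 / history_bytes_per_s

print(f"history data: {history_bytes_per_s / 1024:.1f} kB/s")
print(f"observed writes are about {ratio:.0f}x larger")
```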

Sorry, I don't have deep knowledge about disk, so I can't check your disk outputs.

My opinion:
- disk is overloaded (disk queue 45 for /var)
- write load from mysql has unexpected high value - my expectation is 2K for data file + 2K for bin log + overhead ~~~=> 100KB/s (not >2MB)

Try to check your DB server with mysqltuner.pl
Check what DB is doing (SHOW FULL PROCESSLIST), status (SHOW STATUS), ...

**zabbixfk** · 05-05-2014, 07:43

Thank you all for the replies.

Thanks @jan.garaj for the explanation.

I also suspect disk issues. MySQL seems to be the culprit: if I shut down MySQL, everything returns to normal (CPU load goes down, I/O goes back to normal).

SHOW FULL PROCESSLIST shows, most of the time, either deletes from the history_* tables or a lot of update queries, plus some Zabbix connections in sleep mode.

I had set the housekeeping frequency to 1 (hour), with 500 deletes per run.

Thanks.
