How to monitor hardware?

  • gospodin.horoshiy
    Senior Member
    • Sep 2008
    • 272

    #1

    How to monitor hardware?

    Hi

    I would like to discuss which trigger thresholds work best for hardware monitoring.

    Here is what I've come up with:
    I)Memory

    1)\Memory\Available Bytes
    {t: perf_counter["\Memory\Available Bytes"].avg(300)}<10000000

    If the average is less than 10 megabytes over the last 5 minutes - fire trigger: low memory

    2) \Memory\Pages/sec
    {t: perf_counter["\Memory\Pages/sec"].avg(1800)}>50

    If average value is higher than 50 for last 30 minutes - fire trigger:

    a) If Trigger 1 is also fired: Low memory
    b) If Trigger 1 is not fired: High Pages
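The way the two memory triggers combine can be sketched in a few lines of Python. This is only an illustration of the decision logic described above, not Zabbix syntax; the function name `classify_memory` and the sample lists are hypothetical, and the thresholds are the ones from the post.

```python
from statistics import mean

LOW_MEMORY_BYTES = 10_000_000   # threshold from trigger 1 (about 10 MB)
HIGH_PAGES_PER_SEC = 50         # threshold from trigger 2


def classify_memory(available_bytes, pages_per_sec):
    """Return the alert name for the two memory triggers, or None.

    available_bytes: Available Bytes samples from the last 5 minutes
    pages_per_sec:   Pages/sec samples from the last 30 minutes
    (hypothetical inputs; in Zabbix these would be item history values)
    """
    low_memory = mean(available_bytes) < LOW_MEMORY_BYTES
    high_pages = mean(pages_per_sec) > HIGH_PAGES_PER_SEC

    if low_memory:
        # Trigger 1 on its own, or case (a) when paging is high as well
        return "Low memory"
    if high_pages:
        # Case (b): heavy paging while available memory still looks fine
        return "High Pages"
    return None
```

For example, `classify_memory([500_000_000], [80, 90])` returns "High Pages", which is the situation asked about in question 2 below.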


    II)CPUs

    1)\Processor(_Total)\% Processor Time
    {t: perf_counter["\Processor(_Total)\% Processor Time"].avg(1800)}>80

    If average is more than 80 % for last 30 minutes - fire trigger: CPU's high utilization level

    III)Physical disks
    Use _Total when there is only one disk (or a hardware RAID)

    1)\Physical Disk(_Total)\% Disk Time
    {t: perf_counter["\Physical Disk(_Total)\% Disk Time"].avg(1800)}>80

    If average is more than 80 % for last 30 minutes - fire trigger: HDD is slow

    2)\Physical Disk(_Total)\Avg Queue Length
    {t: perf_counter["\Physical Disk(_Total)\Avg Queue Length"].avg(1800)}>2

    If average value is more than 2 for last 30 minutes - fire trigger: HDD is slow


    IV)Network
    Switched Fast Ethernet network

    1)\Network Interface(NIC LAN)\Total Bytes Sent/sec
    {t: perf_counter["\Network Interface(NIC LAN)\Total Bytes Sent/sec"].avg(1800)}>80000000

    If average is more than 80 Mbit/s for last 30 minutes - fire trigger: Network bottleneck


    This is my 'beta' version of the counters, and I wonder how accurate they actually are. What I want is for every real bottleneck and poor-performance event to fire a trigger, while keeping the number of false alerts to a minimum.

    Questions:
    1)Which triggers and counters do you use for hardware monitoring, and how accurate do you think mine are?
    2)Pages/sec is above 50 but Available Bytes is OK. What does that mean?
    3)High Disk Time. Does it really mean bad performance?
    4)What else am I missing for a full picture?



    P.S. The Windows Performance Monitor counters here are only examples. They may not be written correctly.
    Zbx 2.0.4 on Debian and MYSQL5 on Ubuntu Server 64bit 8.04,
    200+ Win Agents, 50+ Linux Agents, 150+ Network Devices
  • gospodin.horoshiy
    Senior Member
    • Sep 2008
    • 272

    #2
    One thing I can say already:

    2) \Memory\Pages/sec
    {t: perf_counter["\Memory\Pages/sec"].avg(1800)}>50

    If average value is higher than 50 for last 30 minutes - fire trigger:
    "Spikes" in this counter give me a headache. How should I change it?

    I would rather require that the minimum value over the last 30 minutes be higher than 50 (or 20), i.e. .min(1800)>20, but I'm afraid that a single sample of zero would keep a real alert from firing.
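The trade-off between the average and the minimum can be seen on two made-up sample windows. The sketch below is plain Python, not Zabbix trigger syntax; the sample values and the `summarize` helper are assumptions chosen only to illustrate the effect. It also shows two statistics that are less sensitive to a single outlier (the median and the share of samples above the threshold); whether a given Zabbix version offers equivalent trigger functions would need to be checked.

```python
from statistics import mean, median


def summarize(samples, threshold):
    """Compare statistics that a trigger could be based on (illustrative only)."""
    above = sum(1 for s in samples if s > threshold)
    return {
        "avg": mean(samples),
        "min": min(samples),
        "median": median(samples),
        "share_above": above / len(samples),
    }


# Hypothetical Pages/sec samples: a quiet host with one large spike
spiky = [5, 3, 4, 2000, 6, 4, 5, 3, 4, 6]
# Hypothetical samples: sustained heavy paging with a single zero reading
sustained = [60, 75, 0, 80, 65, 70, 90, 85, 60, 72]

print(summarize(spiky, 20))      # avg is pulled far above 50 by one sample
print(summarize(sustained, 20))  # min is 0, so a min-based trigger stays silent
```

In the first window the average exceeds 50 although only one sample is high; in the second, the minimum is 0 although nine of the ten samples are well above 20.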
    Last edited by gospodin.horoshiy; 05-11-2008, 13:17.

    • gospodin.horoshiy
      Senior Member
      • Sep 2008
      • 272

      #3
      Guys, how do you set up triggers for your hardware anyway?

      • fast.ryder
        Member
        • Apr 2008
        • 46

        #4
        Windows Performance Counters - Monitoring

        Hello!

        Usually I only create triggers for NIC bandwidth usage exceeding a given value, like you do, and for low available RAM.

        Disk I/O is hard for me to measure, since I have only one Windows machine, acting as a file server. All the others are Linux-based, so I cannot collect usage values from enough machines to form an acceptable baseline.

        I would also recommend adding some CPU counters such as "Interrupt Time", which may indicate hardware problems with specialized cards (telephony, etc.).

        Please take care in selecting item Units, since in your post you use the bytes sent/sec counter and explain it as being "Mbps". This is wrong, you must multiply bytes by 8 to get bits and then you can create a trigger.
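The unit conversion mentioned above can be checked with a short Python sketch. It assumes decimal megabits (1 Mbit = 1,000,000 bits) and a 100 Mbit/s Fast Ethernet link; the function names are hypothetical and not part of Zabbix.

```python
FAST_ETHERNET_MBIT = 100  # assumed nominal link speed in Mbit/s


def bytes_per_sec_to_mbit(bytes_per_sec):
    """Convert a bytes/sec counter value to Mbit/s (decimal megabits)."""
    return bytes_per_sec * 8 / 1_000_000


def link_utilization(bytes_per_sec, link_mbit=FAST_ETHERNET_MBIT):
    """Fraction of the link used by the given byte rate."""
    return bytes_per_sec_to_mbit(bytes_per_sec) / link_mbit


print(bytes_per_sec_to_mbit(80_000_000))  # 640.0, far above Fast Ethernet
print(bytes_per_sec_to_mbit(10_000_000))  # 80.0
print(link_utilization(10_000_000))       # 0.8
```

Under these assumptions, a trigger meant to fire at 80 Mbit/s on a raw bytes/sec item would compare against 10,000,000 rather than 80,000,000, unless the item already applies a multiplier of 8.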

        Cheers,

        Ivo Pereira
        IT Consultant
        Portugal


        • gospodin.horoshiy
          Senior Member
          • Sep 2008
          • 272

          #5
          Hi, thanks for the reply!

          Do you monitor CPU Interrupt time? If you do, what is your trigger condition?


          Originally posted by fast.ryder:
          "Please take care in selecting item Units, since in your post you use the bytes sent/sec counter and explain it as being "Mbps". This is wrong, you must multiply bytes by 8 to get bits and then you can create a trigger."

          I have proper multipliers here, don't worry, but thanks for noticing anyway.