Rare Problem down systems slows zabbix

  • fmtaylor2
    Member
    • May 2006
    • 66

    #1

    I have what I hope is a rare problem. We have to shut down our datacenter for high-voltage power work. We were attempting to use Zabbix to verify that all the systems were actually offline before shutting down the power. However, as the number of offline systems increased, Zabbix responded more and more slowly, to the point that it wasn't reporting status in a timely manner (e.g., I shut a system down and 5 minutes later Zabbix still had not recognized it as down). What should I change to avoid this issue?

    Here is some data:

    Number of hosts (monitored/not monitored/templates/deleted)   288        209 / 67 / 12 / 0
    Number of items (monitored/disabled/not supported)[trapper]   29460      15555 / 3259 / 10646 [1]
    Number of triggers (enabled/disabled)[true/unknown/false]     16394      8213 / 8181 [33 / 90 / 8090]
    Number of users (online)                                      30         3
    Required server performance, new values per second            211.6517   -

    # This is config file for ZABBIX server process
    # To get more information about ZABBIX,
    # go http://www.zabbix.com

    ############ GENERAL PARAMETERS #################

    # This defines which server this is.
    # Default value 1
    # This parameter must be between 1 and 255
    Server=2

    # Number of pre-forked instances of pollers
    # Default value is 6
    # This parameter must be between 5 and 255
    StartPollers=32

    # Number of pre-forked instances of trappers
    # Default value is 5
    # This parameter must be between 2 and 255
    StartTrappers=2

    # Listen port for trapping. Default port number is 10051. This parameter
    # must be between 1024 and 32767

    ListenPort=10051

    # How often ZABBIX will perform housekeeping procedure
    # (in hours)
    # Default value is 1 hour
    # Housekeeping is removing unnecessary information from
    # tables history, alert, and alarms
    # This parameter must be between 1 and 24

    HousekeepingFrequency=1

    # How often ZABBIX will try to send unsent alerts
    # (in seconds)
    # Default value is 30 seconds
    SenderFrequency=60

    # Uncomment this line to disable housekeeping procedure

    #DisableHousekeeping=1

    # Specifies debug level
    # 0 - debug is not created
    # 1 - critical information
    # 2 - error information
    # 3 - warnings (default)
    # 4 - for debugging (produces lots of information)

    DebugLevel=3

    # Specifies how long we wait for agent (in sec)
    # Must be between 1 and 30
    Timeout=15

    # After how many seconds of unavailability treat a host as unavailable
    UnavailablePeriod=240

    # Name of PID file

    PidFile=/tmp/zabbix_server.pid

    # Name of log file
    # If not set, syslog is used

    LogFile=/tmp/zabbix_server.log

    #Location for custom alert scripts
    AlertScriptsPath=/home/zabbix/bin/
    ExternalScripts=/home/zabbix
    #Location of 'fping'. Default is /usr/sbin/fping
    FpingLocation=/usr/local/sbin/fping

    # Frequency of ICMP pings. Default is 30 seconds.
    #PingerFrequency=30

    # Database host name
    # Default is localhost

    #DBHost=localhost

    # Database name

    DBName=zabbix

    # Database user

    DBUser=zabbix

    # Database password
    # Comment this line if no password used

    DBPassword=z4bb1x

    # Connect to MySQL using Unix socket?
    DBSocket=/var/lib/mysql/mysql.sock
    #DBSocket=/tmp/mysql.sock
    UnreachableDelay = 30
    UnreachablePeriod = 120
    UnavailableDelay = 300

    Any advice?
  • nelsonab
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Sep 2006
    • 1233

    #2
    Are you using active checks? Judging by your config file you aren't.

    What comes to mind is that the Zabbix_Server may be querying the various hosts and stalling while it waits for one to respond before moving on to the next one. I also notice you have a 4-minute delay before the Zabbix_Server will show a host as unavailable. As I understand it, this means the server will try every key/item on that host until the host is shown as unavailable, at which point it reverts to one or two keys until the host comes back online. While testing each key there is an additional TCP timeout, which may delay things further.
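To put rough numbers on the stall described above, here is a minimal sketch of a worst-case estimate. It assumes every check against a down host blocks one poller for the full Timeout and that the pollers split the work evenly. Both are simplifications and not a description of how the Zabbix server actually schedules checks. `StartPollers=32` and `Timeout=15` come from the posted config; the function name and the figure of 1000 blocked checks are made up for illustration.

```python
import math


def worst_case_drain_seconds(blocked_checks: int, pollers: int, timeout_s: float) -> float:
    """Upper bound on the time needed to work through checks that all time out.

    Assumes each check blocks one poller for the full timeout and that the
    pollers share the work evenly; both are simplifications.
    """
    if pollers <= 0:
        raise ValueError("pollers must be positive")
    # Each round is one batch of checks running in parallel, one per poller.
    rounds = math.ceil(blocked_checks / pollers)
    return rounds * timeout_s


# StartPollers=32 and Timeout=15 are taken from the posted config.
# The 1000 blocked checks are a hypothetical figure, not from the thread.
estimate = worst_case_drain_seconds(blocked_checks=1000, pollers=32, timeout_s=15)
print(f"{estimate:.0f} s")
```

Under these assumptions the estimate grows linearly with both the number of timing-out checks and the Timeout value, which would be consistent with the slowdown getting worse as more systems went offline.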

    Can any of the developers comment on how this process works? Is polling controlled by one thread, or do the pollers all work completely independently of each other?
    RHCE, author of zbxapi
    Ansible, the missing piece (Zabconf 2017): https://www.youtube.com/watch?v=R5T9NidjjDE
    Zabbix and SNMP on Linux (Zabconf 2015): https://www.youtube.com/watch?v=98PEHpLFVHM

    • Alexei
      Founder, CEO
      Zabbix Certified Trainer
      Zabbix Certified Specialist, Zabbix Certified Professional
      • Sep 2004
      • 5654

      #3
      I would decrease Timeout to 3 seconds. This would make ZABBIX passive checks (ZABBIX agent, SNMP, etc.) much more responsive when many checks are timing out.
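For anyone applying this change by script rather than by hand, here is a minimal sketch that rewrites the `Timeout` line in a config text. The helper name `set_option` is hypothetical and not part of any Zabbix tooling. It only handles simple uncommented `Key=Value` lines (with optional spaces around `=`) and appends the option if no such line exists.

```python
import re


def set_option(config_text: str, key: str, value: str) -> str:
    """Return config_text with the uncommented `key=...` line set to value.

    Hypothetical helper: commented-out lines (starting with '#') are left
    alone, and the option is appended if no active line is found.
    """
    pattern = re.compile(rf"^([ \t]*){re.escape(key)}[ \t]*=.*$", re.MULTILINE)
    # A function replacement avoids backslash-escape surprises in `value`.
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}{key}={value}", config_text)
    if count == 0:
        new_text = config_text.rstrip("\n") + f"\n{key}={value}\n"
    return new_text


# Fragment copied from the config posted in this thread.
sample = (
    "# Specifies how long we wait for agent (in sec)\n"
    "# Must be between 1 and 30\n"
    "Timeout=15\n"
)
print(set_option(sample, "Timeout", "3"))
```

The same call would work on the full text of the server config file read from disk; where that file lives on a given system is not stated in this thread.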
      Alexei Vladishev
      Creator of Zabbix, Product manager
      New York | Tokyo | Riga
