A few issues with a moderately new zabbix user

  • jgp77
    Junior Member
    • May 2013
    • 3

    #1

    A few issues with a moderately new zabbix user

    Some preliminary information:

    [root@centosbase ~]# zabbix_server -V
    Zabbix server v2.0.5 (revision 33558) (12 February 2013)
    Compilation time: Feb 13 2013 20:16:05

    All zabbix_agentd are 2.0+

    A few issues I've run into as a moderately new user:


    1. If I unlink a template from a host - any warnings that were generated for it previously are still reflected in Monitoring>Overview and do not dissipate on their own.

    2. If I update an item prototype, it takes some time (something like 20 minutes) for the change to be reflected in Monitoring->Overview->Data.

    3. When an agent auto-registers, the server replies to its first request with: cannot send list of active checks to [77.74.76.78]: host [test-host.com] not found. If I restart the agent, it successfully gets a list of items to check:
    12317:20130520:104809.982 Starting Zabbix Agent [test-host.com]. Zabbix 2.0.6 (revision 35158).
    12318:20130520:104809.983 agent #0 started [collector]
    12320:20130520:104809.983 agent #1 started [active checks]

    This is just a necessary, and I think intentional, lag that must take place. I only listed it here in case auto-registration could perhaps be made to have the host's item checks ready before the initial check.

    4. I get errors on many of my active agents like:
    13129:20130520:104923.019 Starting Zabbix Agent [test-host.com]. Zabbix 2.0.6 (revision 35158).
    13130:20130520:104923.019 agent #0 started [collector]
    13132:20130520:104923.020 agent #1 started [active checks]
    zabbix_agentd [13780]: Is this process already running? Could not lock PID file [/var/run/zabbix/zabbix_agentd.pid]: [11] Resource temporarily unavailable
    zabbix_agentd [3980]: Is this process already running? Could not lock PID file [/var/run/zabbix/zabbix_agentd.pid]: [11] Resource temporarily unavailable
    zabbix_agentd [24719]: Is this process already running? Could not lock PID file [/var/run/zabbix/zabbix_agentd.pid]: [11] Resource temporarily unavailable

    I assume this is normal since three processes/threads are trying to access the same pid file? However a message letting me know this isn't an error per se would be nice (assuming it isn't)

    5. I cannot seem to find the interval at which zabbix-agent sends and receives updates, though I do understand the process (an agent asks for the items matching its hostname, the server responds with a list, and the agent sends back the results). I would like to set this as small as possible.

    I am using the template OS-Linux on all of my hosts, and all items, item prototypes and so on have been changed to agent (active). However, when I do something intentionally to set off a trigger, such as updating /etc/passwd, it takes far too long for me to get alerted. If this were a disk that filled up on a production server, I'd really need the alert within 30 seconds, a minute at the most (phone calls would be coming in well before this).

    6. On the frontend, when I go to graphs, select Eth0 Traffic and set it to hosts: all, only one host's graph actually shows. I was expecting more of a screen-like appearance with eth0 traffic for all hosts.

    7. I get this one annoying log message every now and then on the zabbix server:
    3664:20130519:193201.978 [Z3005] query failed: [2006] MySQL server has gone away [select hostid,key_,status,filter,error,lifetime from items where itemid=23521]

    I've read about this problem and increased the memory limit and several things in my my.cnf:
    max_connections = 20000
    wait_timeout = 28000
    max_allowed_packet = 64M

    Obviously I don't want that many connections on a 4GB box and it's not solving the problem anyway - any help is appreciated.
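
For a sense of scale on the max_connections value above: a common rule of thumb multiplies the per-connection buffers by max_connections to get a worst-case memory figure. The sketch below only does that arithmetic. The per-connection buffer sizes are illustrative assumptions, not values taken from this server, and MySQL allocates most of these buffers only when a query needs them, so the result is an upper bound rather than a prediction.

```python
# Rough upper bound on memory that per-connection buffers could claim.
# All buffer sizes below are assumed, illustrative values (in KiB);
# read the real ones from your own server before drawing conclusions.
ASSUMED_PER_CONNECTION_KIB = {
    "sort_buffer_size": 2048,
    "read_buffer_size": 128,
    "read_rnd_buffer_size": 256,
    "join_buffer_size": 128,
    "thread_stack": 256,
}


def worst_case_connection_kib(max_connections, per_connection_kib):
    """Return max_connections times the sum of the per-connection buffers."""
    return max_connections * sum(per_connection_kib.values())


# 20000 is the max_connections value quoted in the post above.
total_kib = worst_case_connection_kib(20000, ASSUMED_PER_CONNECTION_KIB)
print(f"{total_kib / 1024 / 1024:.1f} GiB")
```

With these assumed sizes the bound comes out at roughly 54 GiB, far beyond a 4GB box, which is one reason a very large max_connections is usually not the right lever for this error.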




    I am very new to Zabbix, so I assume all of these come down to personal ignorance. That said, I think this is the best monitoring solution by miles, compared with Nagios with its annoyingly complex NRPE setups to monitor local disks and its complete lack of easy graphing. Your auto-registration has also allowed me to deploy Zabbix with Puppet, which has been key. If posting all of these at once is a headache, my apologies.
  • tchjts1
    Senior Member
    • May 2008
    • 1605

    #2
    Originally posted by jgp77
    Some preliminary information:

    [root@centosbase ~]# zabbix_server -V
    Zabbix server v2.0.5 (revision 33558) (12 February 2013)
    Compilation time: Feb 13 2013 20:16:05

    All zabbix_agentd are 2.0+

    A few issues I've run into as a moderately new user:


    1. If I unlink a template from a host - any warnings that were generated for it previously are still reflected in Monitoring>Overview and do not dissipate on their own.
    Use "Unlink and clear".

    Originally posted by jgp77
    2. If I update an item prototype, it takes some time (something like 20 minutes) for the change to be reflected in Monitoring->Overview->Data.
    That's normal.

    Originally posted by jgp77
    3. When an agent auto-registers, the server replies to its first request with: cannot send list of active checks to [77.74.76.78]: host [test-host.com] not found. If I restart the agent, it successfully gets a list of items to check:
    12317:20130520:104809.982 Starting Zabbix Agent [test-host.com]. Zabbix 2.0.6 (revision 35158).
    12318:20130520:104809.983 agent #0 started [collector]
    12320:20130520:104809.983 agent #1 started [active checks]
    That is also normal during auto-registration. Active checks are attempted right off the bat, prior to the host information being generated in the frontend.


    Originally posted by jgp77
    4. I get errors on many of my active agents like:
    13129:20130520:104923.019 Starting Zabbix Agent [test-host.com]. Zabbix 2.0.6 (revision 35158).
    13130:20130520:104923.019 agent #0 started [collector]
    13132:20130520:104923.020 agent #1 started [active checks]
    zabbix_agentd [13780]: Is this process already running? Could not lock PID file [/var/run/zabbix/zabbix_agentd.pid]: [11] Resource temporarily unavailable
    zabbix_agentd [3980]: Is this process already running? Could not lock PID file [/var/run/zabbix/zabbix_agentd.pid]: [11] Resource temporarily unavailable
    zabbix_agentd [24719]: Is this process already running? Could not lock PID file [/var/run/zabbix/zabbix_agentd.pid]: [11] Resource temporarily unavailable
    I have only ever seen that error for zabbix_server.pid. I have never seen it for zabbix_agentd.pid

    I have always just left the default setting as is in zabbix_agentd.conf for this:
    Code:
    ############ GENERAL PARAMETERS #################
    
    ### Option: PidFile
    #       Name of PID file.
    #
    # Mandatory: no
    # Default:
    # PidFile=/tmp/zabbix_agentd.pid
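
Error [11] in the log lines above is EAGAIN on Linux ("Resource temporarily unavailable"), which is what a non-blocking lock attempt returns when something else already holds the lock. The sketch below reproduces that situation in general terms with Python's fcntl.flock. It is not Zabbix source code, and the file name is a hypothetical stand-in for the real PID file.

```python
import errno
import fcntl
import os
import tempfile

# Generic demonstration of a non-blocking advisory lock on a PID file.
# This is not how Zabbix is implemented; it only reproduces the errno.


def try_lock(path):
    """Open path and try to take an exclusive, non-blocking lock.

    Returns (fd, None) on success or (None, errno) if the lock is held.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        os.close(fd)
        return None, exc.errno
    return fd, None


with tempfile.TemporaryDirectory() as tmp_dir:
    pid_path = os.path.join(tmp_dir, "demo_agentd.pid")  # hypothetical name
    first_fd, first_err = try_lock(pid_path)    # the already running daemon
    second_fd, second_err = try_lock(pid_path)  # a second start attempt
    print("first attempt error:", first_err)
    print("second attempt error:", second_err, os.strerror(second_err))
    os.close(first_fd)
```

The second attempt fails while the first holder is still alive, so a message like the one in the log usually points at a second start attempt against an agent that is already running, rather than at the agent's own worker processes.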

    Originally posted by jgp77
    I am using the template OS-Linux on all of my hosts, and all items, item prototypes and so on have been changed to agent (active). However, when I do something intentionally to set off a trigger, such as updating /etc/passwd, it takes far too long for me to get alerted. If this were a disk that filled up on a production server, I'd really need the alert within 30 seconds, a minute at the most (phone calls would be coming in well before this).
    What is the "interval" value for this item in your template? In mine, it is set by default to 3600 seconds. So, it checks it once per hour. That is when your trigger will evaluate it and alert on it if it meets the trigger criteria.
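
To put rough numbers on this, the sketch below adds up the delays involved for an active item. The 3600-second interval is the one quoted above. The other figures (5 seconds for BufferSend, 60 for CacheUpdateFrequency, 120 for RefreshActiveChecks) are assumed defaults, so check your own zabbix_agentd.conf and zabbix_server.conf. Network latency and server-side processing are ignored.

```python
# Back-of-the-envelope timing for an active item. Ignores network latency
# and server-side processing, so real delays will be somewhat longer.


def worst_case_detection_delay(item_interval, buffer_send):
    """Seconds from a change on the host to the value leaving the agent.

    The change can happen right after a check (wait one full interval),
    and the collected value can then sit in the agent's send buffer.
    """
    return item_interval + buffer_send


def worst_case_config_delay(cache_update_frequency, refresh_active_checks):
    """Seconds until an edited item reaches an active agent.

    The server reloads its configuration cache first, then the agent
    has to re-request its list of active checks.
    """
    return cache_update_frequency + refresh_active_checks


# 3600 is the interval quoted above; 5, 60 and 120 are assumed defaults
# for BufferSend, CacheUpdateFrequency and RefreshActiveChecks.
print(worst_case_detection_delay(3600, 5))   # interval from the template
print(worst_case_detection_delay(30, 5))     # interval lowered to 30 s
print(worst_case_config_delay(60, 120))      # time for an edit to propagate
```

Under these assumptions a 30-second alert target needs an item interval well below 30 seconds, and a freshly edited interval may take a few minutes to reach the agent.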


    • jgp77
      Junior Member
      • May 2013
      • 3

      #3
      Ah thank you very much - knowing that the item update interval is handled on the server and not the agent clears things up dramatically.

      I had the PidFile set to:
      PidFile=/var/run/zabbix/zabbix_agentd.pid

      I've commented this out as it's not very important and I will check back to see if that clears things up.

      I appreciate your answers. Knowing what shouldn't and should be expected helps.

      My last remaining concern is that MySQL error that happens every so often.


      • tchjts1
        Senior Member
        • May 2008
        • 1605

        #4
        Originally posted by jgp77

        7. I get this one annoying log message every now and then on the zabbix server:
        3664:20130519:193201.978 [Z3005] query failed: [2006] MySQL server has gone away [select hostid,key_,status,filter,error,lifetime from items where itemid=23521]

        I've read about this problem and increased the memory limit and several things in my my.cnf:
        max_connections = 20000
        wait_timeout = 28000
        max_allowed_packet = 64M
        What version of MySQL are you using? I am on:
        # mysql -V
        mysql Ver 14.14 Distrib 5.1.61, for redhat-linux-gnu (x86_64) using readline 5.1

        My DB server is on a VM with 16GB of memory. That is not an error that I have a problem with. My my.cnf is this: (Again, tailored for a 16GB box)

        Code:
        [mysqld]
        datadir=/data/mysql
        socket=/var/lib/mysql/mysql.sock
        user=mysql
        
        # logs
        slow_query_log=1
        slow_query_log_file=/var/log/mysql_slow.log
        # log_slow_queries=/var/log/mysql_slow.log
        long_query_time=20
        symbolic-links=0
        
        ## Added 10/10/12
        port = 3306
        skip-external-locking
        max_allowed_packet = 1M
        table_open_cache = 512
        read_buffer_size = 2M
        read_rnd_buffer_size = 8M
        myisam_sort_buffer_size = 64M
        #thread_concurrency = 8
        
        # Zabbix parameters
        innodb_file_per_table
        max_allowed_packet = 16M
        innodb_data_home_dir = /data/mysql
        innodb_data_file_path = ibdata1:10M:autoextend
        innodb_log_group_home_dir = /data/mysql
        innodb_buffer_pool_size = 8G
        innodb_additional_mem_pool_size = 32M
        innodb_lock_wait_timeout = 120
        innodb_log_file_size = 120M
        innodb_thread_concurrency = 8
        key_buffer_size = 512M
        max_connections=512
        table_cache=4096
        query_cache_size = 128M
        tmp_table_size = 8M
        thread_cache_size = 64
        sort_buffer_size = 16M
        
        
        [mysqld_safe]
        log-error=/var/log/mysqld.log
        pid-file=/var/run/mysqld/mysqld.pid
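
One thing worth checking in a my.cnf like the one above is options that are set more than once: max_allowed_packet appears twice there (1M, then 16M), and when an option is repeated the last occurrence takes effect. The sketch below is a minimal checker for that, run against a shortened excerpt of the file. The function name and the excerpt variable are made up for this example.

```python
from collections import defaultdict

# Excerpt of the my.cnf above, limited to a few lines for brevity.
MY_CNF_EXCERPT = """\
[mysqld]
port = 3306
skip-external-locking
max_allowed_packet = 1M
table_open_cache = 512
# Zabbix parameters
innodb_file_per_table
max_allowed_packet = 16M
max_connections=512

[mysqld_safe]
log-error=/var/log/mysqld.log
"""


def find_repeated_options(text):
    """Map (section, option) to its values for options set more than once."""
    seen = defaultdict(list)
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            continue
        name, _, value = line.partition("=")
        # MySQL treats dashes and underscores in option names alike.
        key = (section, name.strip().replace("-", "_"))
        seen[key].append(value.strip())
    return {key: values for key, values in seen.items() if len(values) > 1}


for (section, name), values in find_repeated_options(MY_CNF_EXCERPT).items():
    print(f"[{section}] {name}: {values} (last value wins)")
```

To see what the running server actually uses, compare the result with the output of SHOW VARIABLES in the mysql client.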
