Zabbix Server 2.2.0: unreachable for 5 minutes

  • LeeMcL
    Junior Member
    • Nov 2013
    • 2

    #1

    Zabbix Server 2.2.0: unreachable for 5 minutes

    I have a very small Zabbix setup with just 11 hosts being monitored at the moment. One of those hosts does dual duty, running as both the server and an agent. All the machines are running Ubuntu 12.04 with Zabbix installed from the Ubuntu packages from zabbix.com. The machines all have plenty of CPU, RAM and network capacity. For example, the Zabbix server has 8 x 3.4 GHz CPUs and 32 GB RAM, and its load average rarely gets above 0.1.

    Until a week ago everything was running Zabbix 2.0.9, both agents and server. A week ago I upgraded just the server to 2.2.0, using the packages from zabbix.com.

    Since then in the logs I get bursts for up to a couple of hours of:

    Zabbix agent item [...] on host [...] failed: first network error, wait for 15 seconds
    usually a few of these then
    temporarily disabling Zabbix agent checks on host [..]: host unavailable
    enabling Zabbix agent checks on host []: host became available

    The items vary and it seems to affect all the hosts I monitor.

    Sometimes I'll get emails PROBLEM: Zabbix agent on ... is unreachable for 5 minutes and then after 10 minutes I'll get the OK.

    Unfortunately, as this seems to affect all my hosts, I'll get one or more PROBLEM then OK emails for each server.

    While I was getting a burst of these emails I changed zabbix_server.conf and restarted the server. (The rest of the fields are the defaults, apart from the Zabbix MySQL details):

    StartPollers=20
    Timeout=15
    UnreachablePeriod=90
    UnreachableDelay=10

    This at least stopped the emails and seemed to help with collecting data. However I still get a burst of emails.

    In the logs I get these:

    Zabbix agent item [...] on host [...] failed: first network error, wait for 10 seconds
    resuming Zabbix agent checks on host [...]: connection restored
    then 10 seconds later
    Zabbix agent item [...] on host [...] failed: first network error, wait for 10 seconds
    Zabbix agent item [...] on host [...] failed: another network error, wait for 10 seconds
    resuming Zabbix agent checks on host [...]: connection restored
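To check whether the bursts hit every host at the same moment, the messages quoted above can be tallied from the server log. This is only a minimal sketch: it assumes the message wording shown in this post, and the function name `tally_events` and the category labels are hypothetical, not part of Zabbix.

```python
import re
from collections import Counter

# Message fragments copied from the log lines quoted in this thread.
PATTERNS = {
    "first_error": re.compile(r"failed: first network error"),
    "another_error": re.compile(r"failed: another network error"),
    "disabled": re.compile(r"temporarily disabling Zabbix agent checks"),
    "restored": re.compile(r"(resuming|enabling) Zabbix agent checks"),
}


def tally_events(lines):
    """Count how many log lines fall into each message category."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                break  # each line is counted at most once
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python tally.py < zabbix_server.log
    for name, count in sorted(tally_events(sys.stdin).items()):
        print(name, count)
```

Running it over a log slice from a quiet period and from a burst gives a rough picture of how many checks fail before the connection is restored.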

    I've not found anything similar in the forums. I've checked the Zabbix item queue and it's all zeros. I have no problems with network connectivity while Zabbix is reporting the network errors. And since one of the hosts being checked is the server itself, it's unlikely to be a network issue. There are no problems in the Zabbix agent logs.
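One way to confirm that the agents stay reachable during a burst is to time a plain TCP connection to the agent port from the server. The sketch below assumes the agents listen on the default port 10050; `time_connect` is a hypothetical helper, and it only tests the TCP handshake, not the Zabbix agent protocol.

```python
import socket
import time


def time_connect(host, port=10050, timeout=3.0):
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        # create_connection raises OSError on refusal, timeout or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python probe.py host1 host2 ...
    for host in sys.argv[1:]:
        elapsed = time_connect(host)
        if elapsed is None:
            print(host, "connect failed")
        else:
            print(host, "connected in %.3f s" % elapsed)
```

If these connections succeed quickly while the server is logging network errors, the delay is more likely inside the server or its database than on the network.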

    The Zabbix data gathering process busy graph is basically flat except for the poller processes, which are at 20%. The Zabbix internal process busy graph has no value over 0.2.

    I'd really appreciate any help as to what might be causing this.

    Thanks
  • BrunoSpinelli
    Member
    • Mar 2013
    • 34

    #2
    Good afternoon.
    LeeMcL

    I had some problems with version 2.2.0, but I still think it is a very stable and good version to use.

    After I installed the proxy and the agent, the machines reported the error
    "host not found" on both the proxy and the agent. Checking the settings and analyzing the logs turned up an error regarding macros.
    I found the solution as follows:
    In my case the database is MySQL, and the installation uses the schema.sql file to create the database tables. I believe the schema.sql that comes in the 2.2.0 package was buggy, because that is where the error occurred.
    The solution I found was to install version 2.2.0 using the schema.sql file from version 2.0.9.
    Everything is now working here.

    Regards,
    Bruno Spinelli


    • LeeMcL
      Junior Member
      • Nov 2013
      • 2

      #3
      It was MySQL

      Thanks for that. I really hadn't thought that a network error report was being caused by a database issue. I've not checked MySQL usage in a while as it has been very low for a long time.

      On investigation it turns out that a new search was being run against a different database on the Zabbix server machine, and it was basically hammering MySQL (the queries and bandwidth increase 10-20 fold when that search is run).

      I've been told it's not going to be fixed for some time, so instead I've just moved the Zabbix server onto a different machine. It's now on a tiny server, but without having to fight for MySQL resources it's doing well.


      • lprikockis
        Junior Member
        • Apr 2010
        • 4

        #4
        it's not *just* mysql...

        I'm experiencing basically the same problems with my Zabbix 2.2.2 installation, except that I'm using PostgreSQL, and nothing other than Zabbix is hammering on it.

        Load on the zabbix server is pretty low and I don't see any checks queuing up or anything else that would account for these frequent timeouts/failures to connect to agents.

        any other thoughts on where I should be looking for the problem?


        • AballahSonDis
          Junior Member
          • Sep 2013
          • 7

          #5
          Same Issue

          I am having the same issue - just wondering what you did to fix mysql? This is on its own machine so not sure what the deal is.

          current database version 020220000/02020000

          I am also getting some odd errors:
          cannot parse OID became not supported.

          Any thoughts?
