Zabbix "stuck"

  • Pedro.Almeida
    Junior Member
    • Sep 2014
    • 22

    #1

    Something odd has happened on two completely separate 3.0.1 installations.

    The frontend reports that the Zabbix server is down, and lots of agent.nodata(600) triggers fire.
    It looks as if no data from agents is being received.
    Nothing odd shows in the logs, only the occasional business-as-usual message from other checks (for instance, web scenarios).

    It is as if, all of a sudden, none of the agents can talk to Zabbix.
    Perhaps some kind of leak? A socket leak?

    Restarting the zabbix-server process resolves the issue.

    Has this happened to anyone else?

    Even the local zabbix_agent reports the issue:

    28617:20160408:061534.482 active check data upload to [127.0.0.1:10051] started to fail ([connect] cannot connect to [[127.0.0.1]:10051]: [4] Interrupted system call)
    28617:20160408:061849.587 active check data upload to [127.0.0.1:10051] is working again
    28617:20160408:061852.587 active check data upload to [127.0.0.1:10051] started to fail ([connect] cannot connect to [[127.0.0.1]:10051]: [4] Interrupted system call)
    28617:20160408:062013.630 active check data upload to [127.0.0.1:10051] is working again
    28617:20160408:062016.634 active check data upload to [127.0.0.1:10051] started to fail ([connect] cannot connect to [[127.0.0.1]:10051]: [4] Interrupted system call)
    28617:20160408:062058.712 active check data upload to [127.0.0.1:10051] is working again
    28617:20160408:062104.726 active check data upload to [127.0.0.1:10051] started to fail ([connect] cannot connect to [[127.0.0.1]:10051]: [4] Interrupted system call)
    28617:20160408:062131.726 active check data upload to [127.0.0.1:10051] is working again
    28617:20160408:062134.727 active check data upload to [127.0.0.1:10051] started to fail ([connect] cannot connect to [[127.0.0.1]:10051]: [4] Interrupted system call)
    28617:20160408:062201.727 active check data upload to [127.0.0.1:10051] is working again
    28617:20160408:062204.728 active check data upload to [127.0.0.1:10051] started to fail ([connect] cannot connect to [[127.0.0.1]:10051]: [4] Interrupted system call)
    28617:20160408:062325.829 active check data upload to [127.0.0.1:10051] is working again
    28617:20160408:062328.851 active check data upload to [127.0.0.1:10051] started to fail ([connect] cannot connect to [[127.0.0.1]:10051]: [4] Interrupted system call)
    28617:20160408:062343.892 active check data upload to [127.0.0.1:10051] is working again
    28617:20160408:062346.892 active check data upload to [127.0.0.1:10051] started to fail ([connect] cannot connect to [[127.0.0.1]:10051]: [4] Interrupted system call)
    28617:20160408:062352.893 active check data upload to [127.0.0.1:10051] is working again
    28617:20160408:062355.893 active check data upload to [127.0.0.1:10051] started to fail ([connect] cannot connect to [[127.0.0.1]:10051]: [4] Interrupted system call)
    28617:20160408:062455.893 active check data upload to [127.0.0.1:10051] is working again
    28617:20160408:062458.894 active check data upload to [127.0.0.1:10051] started to fail ([connect] cannot connect to [[127.0.0.1]:10051]: [4] Interrupted system call)
    28617:20160408:062504.894 active check data upload to [127.0.0.1:10051] is working again
    28617:20160408:062507.894 active check data upload to [127.0.0.1:10051] started to fail ([connect] cannot connect to [[127.0.0.1]:10051]: [4] Interrupted system call)
    28617:20160408:062534.899 active check data upload to [127.0.0.1:10051] is working again
    28617:20160408:062645.961 active check data upload to [127.0.0.1:10051] started to fail ([connect] cannot connect to [[127.0.0.1]:10051]: [4] Interrupted system call)
    28617:20160408:063207.634 active check data upload to [127.0.0.1:10051] is working again
    28617:20160408:063207.634 active check data upload to [127.0.0.1:10051] started to fail ([connect] cannot connect to [[127.0.0.1]:10051]: [111] Connection refused)
    28617:20160408:063327.258 active check data upload to [127.0.0.1:10051] is working again
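
To see how long each interruption lasted, the fail/recover pairs in an agent log like the one above can be paired up with a small script. This is only a rough sketch in Python: it assumes the `PID:YYYYMMDD:HHMMSS.mmm message` prefix and the two message texts seen in these lines, and the names `parse_line` and `outage_windows` are made up for illustration, not part of Zabbix.

```python
import re
from datetime import datetime

# Assumed prefix, as seen in the log above: PID:YYYYMMDD:HHMMSS.mmm message
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\.(\d{3})\s+(.*)$")
# Trailing "[errno] description)" at the end of a "started to fail" message
ERR_RE = re.compile(r"\[(\d+)\] ([^\[\]()]+)\)\s*$")


def parse_line(line):
    """Return (timestamp, message), or None if the line has another shape."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    _pid, date, time, ms, msg = m.groups()
    ts = datetime.strptime(f"{date} {time}.{ms}", "%Y%m%d %H%M%S.%f")
    return ts, msg


def outage_windows(lines):
    """Pair each 'started to fail' with the next 'is working again'.

    Returns a list of (start, duration_in_seconds, error) tuples.
    """
    windows = []
    start = error = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip blank or unrelated lines
        ts, msg = parsed
        if "started to fail" in msg:
            err = ERR_RE.search(msg)
            start = ts
            error = f"[{err.group(1)}] {err.group(2)}" if err else "unknown"
        elif "is working again" in msg and start is not None:
            windows.append((start, (ts - start).total_seconds(), error))
            start = error = None
    return windows


# Four lines copied from the log in this post
SAMPLE = """
28617:20160408:061534.482 active check data upload to [127.0.0.1:10051] started to fail ([connect] cannot connect to [[127.0.0.1]:10051]: [4] Interrupted system call)
28617:20160408:061849.587 active check data upload to [127.0.0.1:10051] is working again
28617:20160408:063207.634 active check data upload to [127.0.0.1:10051] started to fail ([connect] cannot connect to [[127.0.0.1]:10051]: [111] Connection refused)
28617:20160408:063327.258 active check data upload to [127.0.0.1:10051] is working again
"""

for start, seconds, error in outage_windows(SAMPLE.splitlines()):
    print(f"{start:%H:%M:%S}  {seconds:8.3f}s  {error}")
```

Grouping the windows by error text makes it easier to tell the repeated "[4] Interrupted system call" failures apart from the single "[111] Connection refused" entry at the end of the log.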
  • Pedro.Almeida
    Junior Member
    • Sep 2014
    • 22

    #2
    Just one note.

    One of these systems has been upgraded through 1.8, 2.0.x, 2.4.x, and up to 3.0.1 (over the years, of course, not all at once).
    The other went 2.4.3 => 2.4.7 => 3.0.0 => 3.0.1.

    This behaviour has only been seen on 3.0.x.

    Perhaps this was something that slipped into the connection broker when encryption was added.

    • glebs.ivanovskis
      Senior Member
      • Jul 2015
      • 237

      #3
      Originally posted by Pedro.Almeida
      Perhaps this was something that slipped into the connection broker when encryption was added.
      You're right. Looks like ZBX-10530, already fixed and released in 3.0.2rc1.

      • Pedro.Almeida
        Junior Member
        • Sep 2014
        • 22

        #4
        Originally posted by glebs.ivanovskis
        You're right. Looks like ZBX-10530, already fixed and released in 3.0.2rc1.
        Thanks a lot!
        The strangest thing is that disabling the Housekeeper seemed to take care of it (on both servers).
        We never had performance issues. Perhaps a mix of conditions?
