SSH Items randomly working

  • Kukulkan
    Junior Member
    • Jun 2020
    • 7

    #1

    Hi, I am trying to monitor a server using an SSH login, but Zabbix only works at random with my checks.

    Every check item works sometimes when I test it, but mostly I get

    Cannot read data from SSH server

    I did a lot of research and also tried updating these two values in /etc/zabbix/zabbix-server.conf:
    Code:
    timeout=10
    StartPollers=100
    But I had no success. Most of my items fail. Only one or two random checks work every now and then.

    I can see Zabbix logging in successfully on the observed system (/var/log/secure), but it seems that Zabbix intermittently fails to execute the SSH command or to parse its output.

    Here is one of the affected configurations:
    Name: CPU Load
    Type: SSH Agent
    Key: ssh.run["cpu load",vsprovider.de.mycompany.com,22,utf-8]
    Host interface: <my host>
    Authentication method: Password
    User name: <username>
    Password: <password>
    Executed script: cat /proc/loadavg|cut -d ' ' -f2



    It works sometimes, but sadly only every now and then. If I run the same login and command in a normal SSH session, it works fine.
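
For reference, a minimal Python sketch of what the item's script extracts. `/proc/loadavg` holds the 1-, 5- and 15-minute load averages followed by process counts and the last PID, so `cut -d ' ' -f2` selects the 5-minute figure. The function name and the sample line are illustrative assumptions, not taken from the monitored host.

```python
def second_field(loadavg_line: str) -> str:
    """Mimic `cut -d ' ' -f2`: return the second space-delimited field."""
    # Split on single spaces, as cut does with -d ' '.
    fields = loadavg_line.rstrip("\n").split(" ")
    if len(fields) < 2:
        raise ValueError("expected at least two space-separated fields")
    return fields[1]


# Made-up sample in the /proc/loadavg format (not from the real server).
sample = "0.20 0.18 0.12 1/80 11206\n"
print(second_field(sample))
```

If the item is meant to show the 1-minute load, the field index would be 1 rather than 2.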
    Last edited by Kukulkan; 15-06-2020, 12:20.
  • Kukulkan
    Junior Member
    • Jun 2020
    • 7

    #2
    Hello, and sorry for the bump, but I really need to fix this.

    I have no clue why it only works randomly. If I manually test any item multiple times, it works for about 10% of the requests; the other 90% fail.

    The affected system tells me that all the SSH logins are successful, but Zabbix mostly reports "Cannot read data from SSH server".

    What can I do?
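
One way to put a number on "about 10%" outside of Zabbix is to repeat the same command through the OpenSSH client and count the successes. The sketch below is only an assumption-laden illustration: `ssh_check` and `success_rate` are hypothetical helper names, and `BatchMode=yes` assumes key-based authentication rather than the password login used in this thread. Note also that the OpenSSH client is a different implementation from the SSH library the Zabbix server is linked against, so a 100% result here only shows that the target server itself is reliable.

```python
import subprocess
from typing import Callable


def ssh_check(host: str, user: str, command: str, timeout: int = 10) -> bool:
    """Run one command via the OpenSSH client; True if it exits 0 with output."""
    try:
        proc = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", "-o", f"ConnectTimeout={timeout}",
             f"{user}@{host}", command],
            capture_output=True, text=True, timeout=timeout + 5,
        )
    except (subprocess.TimeoutExpired, OSError):
        # Timed out, or the ssh binary could not be started.
        return False
    return proc.returncode == 0 and proc.stdout.strip() != ""


def success_rate(check: Callable[[], bool], attempts: int) -> float:
    """Call `check` repeatedly and return the fraction of successful calls."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    ok = sum(1 for _ in range(attempts) if check())
    return ok / attempts


# Example call (not executed here; host and user are placeholders):
# success_rate(lambda: ssh_check("vsprovider.de.mycompany.com", "<username>",
#                                "cat /proc/loadavg|cut -d ' ' -f2"), 20)
```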

    • Kukulkan
      Junior Member
      • Jun 2020
      • 7

      #3
      I just found, with debug level 5, that Zabbix says it cannot resolve my domain name. But a ping for the identical domain from the appliance works (copy & paste), and the nameserver for the interface is also set correctly. I restarted the Zabbix appliance, but without success.

      Anyway, I then changed the items to use the IP address instead of the domain name. It still works only sometimes. This is what I get in the log for such an item:

      Code:
      2217:20200609:153545.930 itemid:30929 hostid:10324 key:'ssh.run["user cnt",192.168.11.59,22,utf-8]'
      2217:20200609:153545.930 type:13 value_type:3
      2217:20200609:153545.930 interfaceid:2
      2217:20200609:153545.930 state:1 error:'Cannot read data from SSH server'
      2217:20200609:153545.930 flags:0 status:0
      2217:20200609:153545.930 valuemapid:0
      2217:20200609:153545.930 lastlogsize:0 mtime:0
      2217:20200609:153545.930 delay:'5m' nextcheck:1591717529 lastclock:0
      2217:20200609:153545.930 data_expected_from:1591716945
      2217:20200609:153545.930 history:1 history_sec:7776000
      2217:20200609:153545.930 poller_type:0 location:1
      2217:20200609:153545.930 inventory_link:0
      2217:20200609:153545.930 priority:1 schedulable:1
      2217:20200609:153545.930 units:'' trends:1
      2217:20200609:153545.930 ssh:[username:'zabbix' password:'myZabbixUserPwd' authtype:0 params:'who|wc -l']
      2217:20200609:153545.930 ssh:[publickey:'' privatekey:'']
      Sadly, this does not give me any information about why it fails :-(

      Any tips?
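
When many items fail, it can help to reduce such debug dumps to the few fields that matter. The following Python sketch pulls the item key, state and error out of lines in the format shown above (`PID:YYYYMMDD:HHMMSS.mmm message`). The regular expressions and the `summarize` helper are assumptions based only on this excerpt, not an official Zabbix log parser. As far as I know, `state:1` marks the item as not supported.

```python
import re

# Line layout as seen in the excerpt: "PID:YYYYMMDD:HHMMSS.mmm message".
LINE_RE = re.compile(r"^\s*(?P<pid>\d+):(?P<date>\d{8}):(?P<time>\d{6}\.\d{3}) (?P<msg>.*)$")
KEY_RE = re.compile(r"\bkey:'(?P<key>.*)'$")
ERR_RE = re.compile(r"\bstate:(?P<state>\d+) error:'(?P<error>.*)'$")


def summarize(lines):
    """Collect key, timestamp, state and error from one item's debug dump."""
    summary = {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # Skip anything that is not a log line.
        msg = m.group("msg")
        k = KEY_RE.search(msg)
        if k:
            summary["key"] = k.group("key")
            summary["timestamp"] = m.group("date") + " " + m.group("time")
        e = ERR_RE.search(msg)
        if e:
            summary["state"] = int(e.group("state"))
            summary["error"] = e.group("error")
    return summary


log = """\
2217:20200609:153545.930 itemid:30929 hostid:10324 key:'ssh.run["user cnt",192.168.11.59,22,utf-8]'
2217:20200609:153545.930 type:13 value_type:3
2217:20200609:153545.930 state:1 error:'Cannot read data from SSH server'
""".splitlines()

print(summarize(log))
```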

      • Kukulkan
        Junior Member
        • Jun 2020
        • 7

        #4
        With the help of user123132 from the IRC channel, I found additional information:

        The agent log always returns success:

        Code:
        27013:20200610:074528.429 zbx_setproctitle() title:'active checks #1 [idle 1 sec]'
        27009:20200610:074528.659 zbx_setproctitle() title:'collector [processing data]'
        27009:20200610:074528.659 In update_cpustats()
        27009:20200610:074528.659 End of update_cpustats()
        27009:20200610:074528.659 zbx_setproctitle() title:'collector [idle 1 sec]'
        27011:20200610:074529.069 zbx_setproctitle() title:'listener #2 [processing request]'
        27011:20200610:074529.069 Requested [system.users.num]
        27011:20200610:074529.069 In zbx_popen() command:'who | wc -l'
        27011:20200610:074529.070 End of zbx_popen():7
        27726:20200610:074529.070 zbx_popen(): executing script
        27011:20200610:074529.073 In zbx_waitpid()
        27011:20200610:074529.073 zbx_waitpid() exited, status:0
        27011:20200610:074529.073 End of zbx_waitpid():27726
        27011:20200610:074529.073 EXECUTE_STR() command:'who | wc -l' len:1 cmd_result:'2'
        27011:20200610:074529.073 Sending back [2]
        So I get successful results for every SSH call, but Zabbix is still telling me "Cannot read data from SSH server".

        This is the item's configuration:
        [Screenshot of the item configuration: gakQI2n.png]

        I increased timeout on both server and agent and also increased StartPollers to 100. Not working :-(

        • Kukulkan
          Junior Member
          • Jun 2020
          • 7

          #5
          Sadly, no help so far. By now I consider this a serious bug. I have spent many hours investigating and still have no idea whether I am doing something wrong...

          • tim.mooney
            Senior Member
            • Dec 2012
            • 1427

            #6
            The Zabbix bug tracker is at support.zabbix.com.

            You shouldn't need 100 pollers unless you have a much larger environment than you have so far described.

            What the "Host Interface" should be set to is (I think) poorly documented for this type of item. The example in the docs, I think, shows it set to the server's zabbix_server port -- note the 10051 rather than 10050. I have no idea if that matters in this case, but if you haven't tried it, you should probably try it and see if it changes the behavior.

            We haven't used the ssh agent item type in many years, because for anything that can run the agent, the agent is generally much more efficient. For things that can't run the agent, we've so far been able to use other types of checks (SNMP, HTTP, simple) that are again more efficient.

            • Kukulkan
              Junior Member
              • Jun 2020
              • 7

              #7
              Hi Tim. Thanks for the reply. I tried with port 10051, but that does not change anything. It only works randomly (about one in ten).

              The machine to observe does not offer third party repositories and is not able to run the agent. SNMP is not available but SSH would do fine. I will report the issue if I do not need an additional account for that.

              UPDATE: Someone already reported this: https://support.zabbix.com/browse/ZBX-17756
              Last edited by Kukulkan; 16-06-2020, 11:25.

              • walter.egosson
                Junior Member
                • Aug 2020
                • 2

                #8
                Had the exact same issue. The solution was to compile libssh from source from the official website (https://www.libssh.org/get-it/) and follow the instructions.
                At the time of writing, Debian Buster still relies on version 0.8 of libssh. Hope it helps.
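
To see which libssh version a system actually provides before and after such a rebuild, a small Python sketch using `ctypes` can ask the library itself through libssh's `ssh_version()` function. The helper names `libssh_version` and `version_tuple` are hypothetical, and the script inspects whichever libssh the dynamic loader finds, which is an assumption: the `zabbix_server` binary may be linked against a different copy (checking with `ldd` would confirm that).

```python
import ctypes
import ctypes.util
import re
from typing import Optional, Tuple


def libssh_version() -> Optional[str]:
    """Return the version string reported by the system libssh, or None."""
    path = ctypes.util.find_library("ssh")
    if path is None:
        return None  # libssh is not installed or not found by the loader.
    try:
        lib = ctypes.CDLL(path)
        lib.ssh_version.argtypes = [ctypes.c_int]
        lib.ssh_version.restype = ctypes.c_char_p
        raw = lib.ssh_version(0)  # 0 = no minimum version required.
    except (OSError, AttributeError):
        return None
    return raw.decode("ascii", "replace") if raw else None


def version_tuple(text: str) -> Tuple[int, ...]:
    """Turn a string starting with '0.8.7' or '0.8' into a tuple of ints."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if not m:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in m.groups() if part is not None)


version = libssh_version()
print(version if version is not None else "libssh not found")
```

Comparing `version_tuple(version)` against `(0, 9)` would then show whether the system is still on the 0.8 series mentioned above.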
