**tzn** · 25-04-2011, 00:07

I'm suffering from the same issue on several hosts. We are testing Zabbix to replace our Nagios installation (2500+ hosts, 30000+ checks), and on some hosts we are getting false positives from net.tcp.service[ssh].
I have ruled out the custom kernel (grsec, etc.), the agent version, and so on. Where should I look next?

**richlv** · 25-04-2011, 05:23

proc.num returns the number of currently running processes. the default provided triggers check that value.

net.tcp.service connects to the port (from the agent) and checks for a correct response.

if that fails, there are multiple possible reasons, including network issues or an overloaded target host.

is the check failing constantly? is sshd listening on all interfaces (including localhost)?

**tzn** · 25-04-2011, 12:01

A network problem is not the cause; the check reports the same status when run locally:

Code:

root@sXXXXX:/etc# time sudo -u zabbix /usr/sbin/zabbix_agentd -t net.tcp.service[ssh]
net.tcp.service[]                             [u|0]

real    0m5.061s
user    0m0.004s
sys     0m0.000s

The check also takes a long time to finish.

sshd is listening on all interfaces:

Code:

root@sXXXXX:/etc# netstat -nltp |grep ssh
tcp        0      0 10.10.x.x:22         0.0.0.0:*               LISTEN      4119/sshd       
tcp        0      0 127.0.0.1:22            0.0.0.0:*               LISTEN      4119/sshd

The host is also not busy:

Code:

root@sXXXXX:/etc# w
 11:40:27 up 7 days, 23:16,  1 user,  load average: 0.20, 0.30, 0.26

Running strace on the agent gives the following output:

Code:

11:52:31.750369 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
11:52:31.750416 connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
11:52:31.750555 read(3, "", 5)          = 0
11:52:36.793393 write(3, "0\n", 2)      = 2
11:52:36.793484 close(3)                = 0
11:52:36.793537 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ...}) = 0
11:52:36.793598 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe562115000
11:52:36.793658 write(1, "net.tcp.service[]               "..., 52net.tcp.service[]                             [u|0]
) = 52
11:52:36.793728 exit_group(0)           = ?

The connection itself takes a long time, about 5 seconds, and the agent is connecting to localhost for some unknown reason. I tried connecting manually, and there is one significant difference:

Code:

root@sXXXXX:/etc# nc 91.121.73.47 22
SSH-2.0-OpenSSH_5.1p1 Debian-6ubuntu2

Protocol mismatch.
root@sXXXXX:/etc# nc 127.0.0.1 22
root@sXXXXX:/etc#
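
The manual test above can be repeated from a script. Below is a minimal Python sketch, assuming all that matters is whether the server sends a line starting with `SSH-` before the timeout; it is not the agent's actual code, and `read_banner` and `looks_like_ssh` are hypothetical helper names. Note that in the strace output the roughly 5-second gap falls between the `read()` call and the next syscall, and `read()` returns 0, which is consistent with a server that accepts the connection and then closes it without sending a banner.

```python
import socket


def read_banner(host, port=22, timeout=5.0):
    """Connect and return the first line the server sends.

    Returns "" if the connection fails, times out, or the peer
    closes the socket without sending anything.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            data = sock.recv(256)
    except OSError:  # covers refused connections and timeouts
        return ""
    if not data:
        return ""
    return data.decode("ascii", errors="replace").splitlines()[0]


def looks_like_ssh(banner):
    """Assumption: an SSH server identifies itself with a line starting 'SSH-'."""
    return banner.startswith("SSH-")
```

For example, `looks_like_ssh(read_banner("127.0.0.1"))` returning `False` while the same call against the external address returns `True` would match the nc output above.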

**tzn** · 25-04-2011, 12:13

Problem solved

denyhosts was the problem. localhost was blacklisted for ssh in /etc/hosts.deny
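
To spot this kind of entry quickly, here is a rough Python sketch that scans hosts.deny-style text for rules covering localhost. It assumes simple `daemon: client` lines and ignores wildcard patterns, `EXCEPT`, and bracketed IPv6 addresses; `localhost_denials` is a hypothetical helper, not part of denyhosts or Zabbix.

```python
import re

# Client tokens treated as "covers localhost" in this simplified sketch.
LOCAL_CLIENTS = {"127.0.0.1", "localhost", "ALL"}


def localhost_denials(text, daemon="sshd"):
    """Return the rules in hosts.deny-style text that deny localhost for `daemon`."""
    hits = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        daemons, _, rest = line.partition(":")
        clients = rest.split(":", 1)[0]  # ignore an optional third field
        daemon_list = re.split(r"[\s,]+", daemons.strip())
        client_list = re.split(r"[\s,]+", clients.strip())
        daemon_matches = daemon in daemon_list or "ALL" in daemon_list
        if daemon_matches and LOCAL_CLIENTS.intersection(client_list):
            hits.append(line)
    return hits
```

Typical use would be `localhost_denials(open("/etc/hosts.deny").read())`; an empty list means no simple rule of this kind was found.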

**hgomez** · 28-05-2013, 09:56

I ran into the same problem on some hosts after upgrading to Zabbix 2.0.6.

I had no problems with 2.0.2-2.0.5, but now I very often see:

Trigger: SSH server is down on mymach1.myorg.com
Trigger status: PROBLEM
Trigger severity: Average
Trigger URL:

Item values:

1. SSH server (mymach1.myorg.com:net.tcp.service[ssh]): 0
2. *UNKNOWN* (*UNKNOWN*:*UNKNOWN*): *UNKNOWN*
3. *UNKNOWN* (*UNKNOWN*:*UNKNOWN*): *UNKNOWN*

And 1 minute later:

Trigger: SSH server is down on mymach1.myorg.com
Trigger status: OK
Trigger severity: Average
Trigger URL:

Item values:

1. SSH server (mymach1.myorg.com:net.tcp.service[ssh]): 1
2. *UNKNOWN* (*UNKNOWN*:*UNKNOWN*): *UNKNOWN*
3. *UNKNOWN* (*UNKNOWN*:*UNKNOWN*): *UNKNOWN*

All systems are running Zabbix 2.0.6, and I'm using passive check mode.

Is anyone else seeing the same problem in 2.0.6?

**Heilig** · 30-05-2013, 10:55

Check the trigger expression. Here is the latest article on the subject:

No more flapping. Define triggers the smart way. - Zabbix Blog

http://blog.zabbix.com/no-more-flapping-define-triggers-the-smart-way/1488/

**hgomez** · 30-05-2013, 11:05

Interesting.

Would you recommend using TRIGGER.VALUE or min()?

**Heilig** · 30-05-2013, 11:21

In this situation I recommend using the min() function.

**hgomez** · 30-05-2013, 11:46

I selected last(1m), to ensure the alert is triggered only if ssh is down for 1 minute.
Since polling happens every 30s, that should be 2 samples from the remote host, if I'm right.

**Heilig** · 30-05-2013, 12:30

No, the last() function isn't good in this situation. The trigger will flap each time you receive a "0" value. I made a mistake in my previous post: min() is also better avoided with items that return only two values (0 or 1).
What you described is max(1m)=0. That is what I recommend you use.
If the service continues to flap, you can make the trigger less sensitive (by increasing the time interval), but it is better to find and eliminate the cause of this behaviour.
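
The difference between the three functions is easier to see on sample data. The following Python sketch is only a simulation, not Zabbix code: it assumes 30s polling so that a 1m window holds the last 2 values, assumes every expression is compared with `=0`, and uses a made-up sample series with one glitch followed by a real outage. `fires` and `last` are hypothetical helper names.

```python
def last(values):
    """Most recent value in the window."""
    return values[-1]


def fires(samples, func, window):
    """For each new sample, report whether func(last `window` values) == 0."""
    result = []
    for i in range(len(samples)):
        recent = samples[max(0, i - window + 1): i + 1]
        result.append(func(recent) == 0)
    return result


# 1 = ssh answered, 0 = ssh did not answer (made-up data):
# a single glitch at index 2, then a real outage at indexes 5-7.
samples = [1, 1, 0, 1, 1, 0, 0, 0, 1]

by_last = fires(samples, last, window=1)  # PROBLEM on every single 0
by_min = fires(samples, min, window=2)    # PROBLEM if any value in the window is 0
by_max = fires(samples, max, window=2)    # PROBLEM only if all values in the window are 0
```

In this simulation only the `max` variant ignores the single glitch and still reports the real outage, one polling interval after it starts.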

**hgomez** · 30-05-2013, 17:00

I finally used:

net.tcp.service[ssh].min(1m)

For now it seems to prevent the glitches.

I will try with max:

net.tcp.service[ssh].max(1m)

**rondeniable** · 13-02-2014, 07:20

RE: CentOS 6

In my case, I had two issues.

First, I found that localhost was not set in /etc/hosts (*yeah, I know*).

Second, I did not have localhost set as a ListenAddress in sshd_config.

I added this line to /etc/ssh/sshd_config and restarted:
ListenAddress 0.0.0.0

Hope that helps
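
The first of these two issues can be checked with a few lines of Python. This is a small sketch, assuming the usual hosts-file layout of an address followed by one or more names; `names_for` is a hypothetical helper.

```python
def names_for(hosts_text, address="127.0.0.1"):
    """Return all names mapped to `address` in hosts-file style text."""
    names = []
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 2 and fields[0] == address:
            names.extend(fields[1:])
    return names
```

Running `"localhost" in names_for(open("/etc/hosts").read())` should give `True` on a host where localhost is defined for 127.0.0.1.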

**rubendob** · 06-06-2014, 15:49

Solved!

Originally posted by tzn

denyhosts was the problem. localhost was blacklisted for ssh in /etc/hosts.deny

Hey,

Thanks folks, it was exactly the same issue here.

Great job.

False Positive Trigger Problem with ssh Service
