Somebody help me !! URGENT!!


Bharathu
09-01-2006, 11:24
Hi!!

I'm using Zabbix 1.1 beta 4. The Zabbix server is installed on Linux. I am not able to get my Windows agent to communicate with my Zabbix server. I always get this message in my zabbix_agentd.log.

[09-Jan-2006 14:42:32] *************** Log file opened ****************
[09-Jan-2006 14:42:32] Collector thread initialized successfully
[09-Jan-2006 14:42:32] Zabbix Win32 Agent started
[09-Jan-2006 14:42:53] Active checks [Cannot connect to [10.239.18.13:10051] [No error]]
:( :(

I am enclosing the configuration details of my server & agent.

**********ZABBIX_SERVER.CONF *********

Server=1
StartSuckers=6
StartTrappers=5
ListenPort=10051
SenderFrequency=30
DisableHousekeeping=1
DebugLevel=3
Timeout=5
LogFile=/tmp/zabbix_server.log

DBHost=localhost
DBName=zabbix
DBUser=root

#DBSocket=/tmp/mysql.sock

***********ZABBIX_AGENTD.CONF*****************

Server=10.239.18.13
ServerPort=10051
Hostname=10.239.19.120
ListenPort=10050
#ListenIP=10.239.18.13
StartAgents=5
#DisableActive=1
DebugLevel=3
LogFile=C:\zabbix_agentd.log
Timeout=3
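
To make the port roles in the agent file above easier to see, here is a minimal Python sketch that parses KEY=VALUE lines the way the file appears to be laid out (blank lines and lines starting with # are skipped). The names parse_zabbix_conf and AGENT_CONF are made up for illustration, and the real agent's parser may behave differently.

```python
def parse_zabbix_conf(text):
    """Return a dict of KEY=VALUE settings, ignoring blanks and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or commented-out option
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line
        settings[key.strip()] = value.strip()
    return settings


# Agent settings copied from the post above (hypothetical variable name).
AGENT_CONF = """\
Server=10.239.18.13
ServerPort=10051
Hostname=10.239.19.120
ListenPort=10050
#ListenIP=10.239.18.13
StartAgents=5
#DisableActive=1
DebugLevel=3
LogFile=C:\\zabbix_agentd.log
Timeout=3
"""

conf = parse_zabbix_conf(AGENT_CONF)
print("agent connects out to", conf["Server"], "port", conf["ServerPort"])
print("agent listens for the server on port", conf["ListenPort"])
```

Read this way, ServerPort (10051) is the port on the server that the agent connects out to for active checks, while ListenPort (10050) is the port the agent itself listens on for checks the server initiates.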

Nate Bell
09-01-2006, 15:49
I'll get the ball rolling. I haven't ever tried running a Zabbix Agent on a Win32 box. That said, I have used Windows extensively otherwise, and my first guess would be that your agent can't connect because a firewall running on Windows is blocking the port Zabbix uses.

Nate

cameronsto
09-01-2006, 17:15
Is there a firewall between the agent and server blocking requests to port 10051?

-cameron

elkor
10-01-2006, 01:57
Are you only using active checks for the Windows box? Regardless of a firewall blocking inbound 10051 traffic, if outbound (server -> agent) 10050 traffic is allowed, "zabbix agent" type items should function even with this message present in the agent's log file.

Bharathu
10-01-2006, 05:50
First, thanks for your replies, but I don't have any firewalls installed on my PC. Also, I ran the agent in two modes, i.e. "c:\zabbixw32 start" and "c:\zabbixw32 standalone". Either way I get the same error in my log file. In my web interface I see 0 in the "number of values stored" field.

When I ran my server on the Linux box, it created 11 threads in total. But when I run the command "strace zabbix_server" I get the following output.


*****************************************
rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0
socket(PF_FILE, SOCK_STREAM, 0) = 4
fcntl64(4, F_GETFL) = 0x2 (flags O_RDWR)
connect(4, {sa_family=AF_FILE, path="/var/lib/mysql/mysql.sock"}, 110) = 0
setsockopt(4, SOL_IP, IP_TOS, [8], 4) = -1 EOPNOTSUPP (Operation not supported)
setsockopt(4, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
read(4, "9\0\0\0", 4) = 4
read(4, "\n4.1.0-alpha-standard-log\0\333#\0\0t!"..., 57) = 57
stat64("/usr/share/mysql/charsets/Index.xml", {st_mode=S_IFREG|0777, st_size=17147, ...}) = 0
open("/usr/share/mysql/charsets/Index.xml", O_RDONLY|O_LARGEFILE) = 5
read(5, "<?xml version=\'1.0\' encoding=\"ut"..., 17147) = 17147
close(5) = 0
write(4, "\22\0\0\1\215\240\0\0\0root\0\0zabbix\0", 22) = 22
read(4, "\5\0\0\2", 4) = 4
read(4, "\0\0\0\2\0", 5) = 5
fcntl64(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
read(4, 0x87d6ac8, 8192) = -1 EAGAIN (Resource temporarily unavailable)
fcntl64(4, F_SETFL, O_RDWR) = 0
write(4, "\7\0\0\0\2zabbix", 11) = 11
read(4, "\5\0\0\1", 4) = 4
read(4, "\0\0\0\2\0", 5) = 5
fcntl64(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
read(4, 0x87d6ac8, 8192) = -1 EAGAIN (Resource temporarily unavailable)
fcntl64(4, F_SETFL, O_RDWR) = 0
write(4, "\'\0\0\0\3select refresh_unsupported "..., 43) = 43
read(4, "\1\0\0\1", 4) = 4
read(4, "\1", 1) = 1
read(4, "%\0\0\2", 4) = 4
read(4, "\6config\23refresh_unsupported\3\4\0\0\1"..., 37) = 37
read(4, "\1\0\0\3", 4) = 4
read(4, "\376", 1) = 1
read(4, "\4\0\0\4", 4) = 4
read(4, "\003600", 4) = 4
read(4, "\1\0\0\5", 4) = 4
read(4, "\376", 1) = 1
getuid32() = 0
socket(PF_FILE, SOCK_STREAM, 0) = 5
fcntl64(5, F_GETFL) = 0x2 (flags O_RDWR)
fcntl64(5, F_SETFL, O_RDWR|O_NONBLOCK) = 0
connect(5, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
close(5) = 0
socket(PF_FILE, SOCK_STREAM, 0) = 5
fcntl64(5, F_GETFL) = 0x2 (flags O_RDWR)
fcntl64(5, F_SETFL, O_RDWR|O_NONBLOCK) = 0
connect(5, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
close(5) = 0
open("/etc/passwd", O_RDONLY) = 5
fcntl64(5, F_GETFD) = 0
fcntl64(5, F_SETFD, FD_CLOEXEC) = 0
fstat64(5, {st_mode=S_IFREG|0644, st_size=2138, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fe5000
read(5, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 2138
close(5) = 0
munmap(0xb7fe5000, 4096) = 0
setgid32(502) = 0
setuid32(502) = 0
setresgid32(-1, 502, -1) = 0
setresuid32(-1, 502, -1) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7fe6708) = 27765
File [/tmp/zabbix_server.pid] exists. Is this process already running ?
--- SIGCHLD (Child exited) @ 0 (0) ---
exit_group(0) = ?
******************************************

I'm a total newbie with Linux, just 2 weeks in. I can't make head or tail of the strace output I got. Any ideas?
Waiting for replies, guys :)

Regards,
Bharathu

elkor
10-01-2006, 13:32
Ok,

One thing at a time. Firstly, you have your Zabbix server.

Next, you have the box you're monitoring (your PC?), regardless of operating system.

On the server, through the web interface, you must first add the host to be monitored in the configuration/hosts section. When that is done you must add items that you want to monitor for that host in the configuration/items section. To be useful for alerts you should add triggers for the items as well in configuration/triggers, but that can be done at a later date. Let's just verify you can collect data.

Add an item that is checked somewhat frequently and will show activity, such as cpu load. For simplicity's sake, have it be of the "zabbix agent" type.

Then fire up the agent on the machine to be monitored. Don't worry about the "cannot connect to server" messages for now. That message just means that the agent can't "call home" to the server to see if it has any active checks to make; it is 100% possible that the system is working just fine and you still get that message in the agent's log.

Post back here and let us know what you've got.
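
The point about the agent "calling home" can be sketched as a small table of who opens the connection for each kind of check. This is an illustrative model only, not Zabbix code: the names DIRECTIONS and working_modes are hypothetical, and the ports are the defaults that appear in the config files earlier in the thread.

```python
# For each check mode: (who opens the connection, who listens, listening port).
DIRECTIONS = {
    "passive": ("server", "agent", 10050),   # "zabbix agent" items: server polls agent
    "active": ("agent", "server", 10051),    # agent calls home to the server
}


def working_modes(blocked_inbound):
    """Return the check modes that can still work.

    blocked_inbound is a set of (machine, port) pairs on which inbound
    connections are refused, e.g. {("agent", 10050)}.
    """
    return sorted(
        mode
        for mode, (_initiator, listener, port) in DIRECTIONS.items()
        if (listener, port) not in blocked_inbound
    )


# Inbound 10051 blocked on the server: only server-initiated polling works.
print(working_modes({("server", 10051)}))
# Inbound 10050 blocked on the agent machine: only active checks work.
print(working_modes({("agent", 10050)}))
```

The two modes fail independently, which is why an active-check error in the agent log says nothing about whether "zabbix agent" items are being collected.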

Bharathu
10-01-2006, 14:09
Hi

I tried what you said. I added a host with "host name = 10.239.19.120"; this machine has the agent installed and runs Windows XP. I also selected "use IP" and entered the same IP for my host. I set the port to 10051. I did not add anything from the templates. I don't really know what to do with them, so I didn't touch them.

Next I added an item with "description = cpu load", host = 10.239.19.120, type = zabbix agent, and left the key and units fields empty. This is still not working: when I go to "Monitoring/Latest data", it just shows a "-" for "cpu load" in the last check, last value and change fields.

I found that it was showing "Cannot connect to [10.239.19.120] [Connection refused]" in Configuration/Hosts (for my host, 10.239.19.120).

Alexei
10-01-2006, 14:13
> Next I added an item with "description = cpu load", host = 10.239.19.120, type = zabbix agent, and left the key and units fields empty.

1. The item's key must not be empty!
2. Check if the ZABBIX server is listening on port 10051 (netstat -an|grep 10051)
3. Try to connect to the ZABBIX server from the monitored host (telnet <server> 10051)

Bharathu
10-01-2006, 14:16
Hi,
What should I type as the item key? For example, for cpu load, what should I use as the key? And I cannot use telnet; it does not work.
I get this output when I use the netstat command:

[zabbix@localhost bin]$ netstat -an|grep 10051
tcp 0 0 0.0.0.0:10051 0.0.0.0:* LISTEN
Regards,
Bharathu

Bharathu
10-01-2006, 14:23
I have the following errors in my server log file
:(
****************************************
030325:20060110:181048 Starting zabbix_server. ZABBIX 1.1beta4.
030327:20060110:181048 server #1 started [Alerter]
030329:20060110:181048 server #2 started [Timer]
030331:20060110:181048 server #3 started [ICMP pinger]
030333:20060110:181048 server #4 started [Escalator]
030335:20060110:181048 server #5 started [Poller. SNMP:OFF]
030337:20060110:181048 server #6 started [Trapper]
030339:20060110:181048 server #7 started [Trapper]
030335:20060110:181048 Cannot connect to [10.239.19.120] [Connection refused]
030335:20060110:181048 Host [10.239.19.120] will be checked after [60] seconds
030341:20060110:181048 server #8 started [Trapper]
030346:20060110:181048 server #9 started [Trapper]
030348:20060110:181048 server #10 started [Trapper]
030325:20060110:181048 server #0 started [Housekeeper]
030335:20060110:181148 Cannot connect to [10.239.19.120] [Connection refused]
030335:20060110:181148 Host [10.239.19.120] will be checked after [60] seconds
030335:20060110:181248 Cannot connect to [10.239.19.120] [Connection refused]
030335:20060110:181248 Host [10.239.19.120] will be checked after [60] seconds
030335:20060110:182248 Cannot connect to [10.239.19.120] [Connection refused]
030335:20060110:182248 Cannot connect to [10.239.19.120] [Connection refused]
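
A long log like the one above is easier to read when the repeated failures are tallied per host. The following Python sketch does that under an assumption: that each line has the form pid:YYYYMMDD:HHMMSS followed by the message, which is inferred from the excerpt and not taken from Zabbix documentation. The names LINE_RE, CONNECT_RE and count_connect_failures are hypothetical.

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpt: "<pid>:<date>:<time> <message>".
LINE_RE = re.compile(r"^(?P<pid>\d+):(?P<date>\d{8}):(?P<time>\d{6}) (?P<msg>.*)$")
CONNECT_RE = re.compile(
    r"Cannot connect to \[(?P<host>[^\]]+)\] \[(?P<reason>[^\]]+)\]"
)


def count_connect_failures(log_text):
    """Count 'Cannot connect' messages per (host, reason) pair."""
    failures = Counter()
    for line in log_text.splitlines():
        entry = LINE_RE.match(line.strip())
        if not entry:
            continue  # skip separators and anything not in the assumed layout
        hit = CONNECT_RE.search(entry.group("msg"))
        if hit:
            failures[(hit.group("host"), hit.group("reason"))] += 1
    return failures


# A few lines copied from the log above.
SAMPLE = """\
030335:20060110:181048 Cannot connect to [10.239.19.120] [Connection refused]
030335:20060110:181048 Host [10.239.19.120] will be checked after [60] seconds
030341:20060110:181048 server #8 started [Trapper]
030335:20060110:181148 Cannot connect to [10.239.19.120] [Connection refused]
"""

for (host, reason), n in count_connect_failures(SAMPLE).items():
    print(f"{host}: {reason} x{n}")
```

In this excerpt every failure comes from the same process id (030335, the one that started as the Poller) and the same host, so the refusals all concern the server's outgoing connection to the agent machine.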

Bharathu
10-01-2006, 15:22
One more thing: I could not configure the Zabbix server with net-snmp support, as it gave me errors. So I did not use the "--with-net-snmp" option while configuring the server. :confused:
Please send me some valuable tips. Thanking you all in anticipation,
Bharathu

elkor
10-01-2006, 15:24
Documentation for item key syntax is here (http://www.zabbix.com/manual/v1.1/config_items.php). The name is simply a label; it is the key that tells the server what to actually monitor.

For cpu load the key is system.cpu.load.

I know the documentation is sparse, but as soon as you get connectivity I HIGHLY suggest you spend some time going through it and working with the program in detail, or it will be a very rocky road ;)

Bharathu
10-01-2006, 16:37
Hi,
Thanks for the info. I will surely go through the documentation once again as you said. But can you please help me check whether my configuration files (server & agent) are correct? I attached them in my first post. Also, I really don't understand how to make the software run in polling mode or in active check mode. I heard that active checks are the best. Can you please help by giving an example of how to configure active check mode and polling mode? I'm getting confused by the terms "server port", "listen port", "listen ip" etc. I feel I am going wrong somewhere here, which is leading to my problems.
Thanking you all in anticipation,

Bharathu :)

Bharathu
11-01-2006, 06:17
Please, can you check the config files I attached and tell me if I have them right? Does anyone have any ideas? I am still not able to record any values! :confused: :(

Bharathu
11-01-2006, 13:10
Hi, I ran telnet to my Linux server from the Windows host (telnet 10.239.18.13 10051), and then immediately used netstat on my Linux box to check (netstat -an | grep 10.239.19.120).
Then I got this output:
tcp 0 0 10.239.18.13:10051 10.239.19.120:4962 ESTABLISHED

After some time it goes into this state:

tcp 0 0 10.239.18.13:10051 10.239.19.120:4962 TIME_WAIT

I don't know why this is happening. How and why does it go into the TIME_WAIT state? Need urgent help, please! :eek:

Alexei
11-01-2006, 14:02
You're asking very basic questions! Read books...

Bharathu
11-01-2006, 14:11
Thanks for your reply, sir. I know I am not good at this stuff; I'm trying hard here. Can you at least tell me how I can solve the error "Cannot connect to [10.239.19.120] [Connection refused]" in my zabbix_server.log file? Your help will be really appreciated, sir.

kisters
11-01-2006, 14:40
Hi,

Let's start from the bottom. First check if you can connect from the Windows workstation to the Zabbix Linux server:
On the Windows computer try a telnet 10.239.18.13 10051
If you get a black box and you can get back to the command line by pressing Enter, the Zabbix services on the server are running. If you get a message telling you that you cannot connect to the host, either there is a firewall running on the Zabbix server or the processes aren't running.

Then try the same thing from the zabbix-server:
telnet 10.239.19.120 10050

There won't be a prompt, just
Connected to 10.239.19.120
Escape character is '^]'.

If you press enter you will get
ZBX_NOTSUPPORTED
Connection closed by foreign host.

If you get other results, either the Zabbix agent isn't running on the Windows PC or there is a firewall (Windows XP SP2 / Windows 2003 SP1 Firewall?) enabled.
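
Where telnet is not available, the same two reachability tests can be approximated with a short Python sketch. It only checks whether a TCP connection can be opened, which is what the telnet test shows; it says nothing about whether Zabbix itself is healthy. The function name can_connect is hypothetical, and the addresses in the usage note are the ones from this thread.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the address and applies the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Run on the Windows PC, can_connect("10.239.18.13", 10051) corresponds to the first telnet test; run on the Zabbix server, can_connect("10.239.19.120", 10050) corresponds to the second. A False result from the second one matches the "Connection refused" lines in the server log.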

Bharathu
11-01-2006, 14:56
Thanks for your prompt reply.
I tried what you said. When I ran telnet from Windows I got the black screen, and when I pressed Enter it returned to the command line. But even if I don't press Enter, it returns to the command line after some time. Is that normal?

Next I tried telnet on the Linux machine as you said, and I got these messages:

telnet 10.239.19.120 10050
Trying 10.239.19.120...
Connected to 10.239.19.120 (10.239.19.120).
Escape character is '^]'.
Connection closed by foreign host.

But when I pressed Enter, as you said, I got the message you described:
[zabbix@localhost bin]$ telnet 10.239.19.120 10050
Trying 10.239.19.120...
Connected to 10.239.19.120 (10.239.19.120).
Escape character is '^]'.

ZBX_NOTSUPPORTED
Connection closed by foreign host.

Now what should I do? My Windows machine is running Service Pack 2.
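
The telnet session above suggests a simple exchange: one line goes to the agent on port 10050, one line of text comes back, and the agent closes the connection. Under that assumption, which is inferred only from this session, a request can be sketched in Python as below. The function name query_agent is hypothetical, and as far as I know later Zabbix releases wrap these messages in a header, so this plain-text form may not work against them.

```python
import socket


def query_agent(host, key, port=10050, timeout=3.0):
    """Send one key line to an agent and return its text reply.

    Assumes the plain-text exchange seen in the telnet session:
    a line is sent, the agent answers and then closes the connection.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(key.encode("ascii") + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break  # agent closed the connection
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace").strip()
```

An empty key reproduces the "press Enter" test, so query_agent("10.239.19.120", "") would be expected to return ZBX_NOTSUPPORTED here. Passing the key mentioned earlier in the thread, system.cpu.load, is the natural next test, though what the agent returns for it depends on the agent version.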

ReefShark
11-01-2006, 15:52
Not sure, but I think you should disable active checks for now unless you intend to use them.
So in your zabbix_agentd.conf:
#ServerPort=10051
DisableActive=1
... and restart your agent process of course.

When I run into trouble I usually go to the most basic configuration and settings and build up step by step from there, so you can easily tell where things get screwy.

I think compiling with net-snmp requires you have that package installed. So if you intend to use it, install net-snmp and compile Zabbix again.

Based on your telnet output, I think your server <-> agent communication works as intended, so there might be something wrong with your GUI setup. Have you tried using any of the template items? (Host.Win32 has a few that are backward compatible, so they should be usable.)
Also, is your agent running on your Windows box (check in the Services window)?

Nate Bell
11-01-2006, 15:54
It sounds like you can connect through telnet just fine. I also don't see anything wrong with your .conf files that you posted. Personally, I haven't had much luck getting active checks to work with Linux, so all my items are set to ZABBIX_Agent. I HIGHLY recommend you stay away from active checks until you have gotten a simple setup working with normal ZABBIX_Agent items. Here are some things to check:

If you have both the Zabbix server and the Zabbix agentd on the Windows PC running, take a look at your Zabbix web interface. Click on Configuration -> Hosts and make sure you have set up a host that points to the Windows PC. To double-check your setup, here are the steps: enter a name for the host in the Host text box, click Use IP Address, and enter the Windows PC's IP address. Everything else is fine to leave as default, so click Add.

Once you have a host, you have to add items for it. From the Hosts screen, click the name of the host that corresponds to the Windows PC. In the text box at the bottom, enter these values to add an item:
Description: Outgoing traffic on interface eth0 (1min)
Type: Zabbix agent
key: netloadout1[eth0]
units: Bps
and leave everything else at its default value. Click Add.

Now, with that setup, you should be able to look at that host's latest data, and see it gathering the amount of outgoing traffic on your ethernet device once every minute.

If that does not work, then something else is going on. If it does work, then you can start adding more items that come built into zabbix.

elkor
11-01-2006, 16:32
We're all willing to help you out here, but you also need to make some effort.

I suspect that there is nothing wrong with your installation at all. You simply don't have any items defined, and you are getting hung up on the error message in the agent's log file saying that it can't connect back to the server. It is safe to ignore this unless your items are of the type ZABBIX_Active, which they should not be.

Does your server show with a Monitored status in the hosts section of the web front end? If so, go check the documentation section and add some items, I think you'll see it's working correctly.

Bharathu
12-01-2006, 05:17
Guys!!
Yesterday I finally got some values recorded. My configuration is working for active checks only. I got many values recorded. I added items to monitor CPU utilization. There is still some problem with the Windows PC where my agent is running: the Linux box cannot connect to my Windows agent. That's why I am getting the "Connection refused" error in my server log. But my active checks were getting recorded on the server. So I saw the graph for the first time, and I am thrilled.

Thank you so much guys, you rock! I'm still in the process of checking other items, and I also want to find out why my Windows PC is refusing the connection from the Linux server. Windows Firewall is OFF, but there is McAfee software installed. I don't know whether that's responsible. I don't have any control over it; I'm on a network and I am not its admin.

Will get back to you with updates. Thanks again :)

ReefShark
12-01-2006, 10:37
> I'm still in the process of checking other items, and I also want to find out why my Windows PC is refusing the connection from the Linux server. Windows Firewall is OFF, but there is McAfee software installed. I don't know whether that's responsible. I don't have any control over it; I'm on a network and I am not its admin.

I am sure McAfee is blocking all incoming traffic on port 10050. In fact, it should block traffic on that port unless there is a reason not to. If you use Zabbix, there is obviously a reason not to block that particular port. So take a look at the McAfee config and open up port 10050. Whether you are a network admin or not shouldn't matter, because it's a piece of software on your PC.

Kudos on getting your setup working, btw ;)

Bharathu
12-01-2006, 11:43
I really have no control over it. I went through the firewall settings where access to the ports can be changed. All those fields were disabled, so I cannot do anything. I am still trying to find some other way out. Just as with the active checks, I have a strong feeling I can make the server polling method work too.
For the time being I am thrilled to see these superb graphs I'm getting. 20 days of hard work finally paid off. :)

Bharathu
13-01-2006, 06:40
Hello everybody...

Now I've got the server polling working too, and it is doing a great job. For server polling, these are the modifications required in the Windows agent config file.

Server=10.239.18.13
#ServerPort=10051
#Hostname=10.239.52.10 (windows host)
ListenPort=10050
ListenIP=10.239.18.13 (my zabbix server installed here)

Also, I added the host from the web GUI with port "10050" and added the item with the "zabbix agent" type.

This worked for me :)