Hi All,
This morning at around 10:30 there was a deadlock on my DB, and Zabbix reported all my Tomcat instances as down (a false alert).
The only messages I found in my zabbix_server.log are the ones below. Only the Tomcat checks failed, but I cannot see those failures in zabbix_server.log.
I already raised the Timeout setting from 5 to 20 seconds, but it didn't help.
I use UserParameters for the Tomcat checks.
My trigger:
{Template_Tomcat01:tomcat01-uptime[{$IPSTRING01},{$JMXPORT01}].nodata(180)}=1
My item:
tomcat01-uptime[{$IPSTRING01},{$JMXPORT01}]
My userparameter:
UserParameter=tomcat01-uptime[*],java -jar /usr/local/tomcatshared/skajla-JMXClient.jar $1 $2 admin jmxpass java.lang:type=Runtime Uptime
$1 is the Tomcat IP address
$2 is the Tomcat port
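To make clear what I expect the trigger to do, here is a minimal Python sketch of how I understand nodata(180): it returns 1 when no value has arrived for the item within the last 180 seconds. This is only a simplified model and not Zabbix's actual implementation; the function name, its arguments and the timestamps are my own assumptions for illustration.

```python
from datetime import datetime, timedelta


def nodata(last_value_time, now, period_seconds):
    """Simplified model of nodata(N) (an assumption, not Zabbix code).

    Returns 1 if no value was received within the last N seconds, else 0.
    """
    if last_value_time is None:
        # The item never delivered a value at all.
        return 1
    age = now - last_value_time
    return 1 if age > timedelta(seconds=period_seconds) else 0


# Hypothetical timestamps, chosen only to illustrate the two cases.
now = datetime(2014, 3, 10, 10, 31, 0)
fresh = now - timedelta(seconds=60)    # last value 60 s ago
stale = now - timedelta(seconds=200)   # last value 200 s ago

print(nodata(fresh, now, 180))
print(nodata(stale, now, 180))
```

If this model is right, the trigger fires whenever the UserParameter delivers no value for more than 180 seconds, for example while the check is timing out, even if Tomcat itself is still running.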
Could you please advise how I can collect more logs?
I am using Zabbix server 2.0.2 and PostgreSQL 9.1; the agents are version 1:1.8.11-1.
Thanks,
Andrew
Code:
30294:20140310:103014.862 Zabbix agent item [proc.num[]] on host [ ] failed: first network error, wait for 15 seconds
30289:20140310:103031.606 Zabbix agent item [system.uptime] on host [ ] failed: first network error, wait for 15 seconds
30299:20140310:103035.961 resuming Zabbix agent checks on host [ ]: connection restored
30320:20140310:103038.009 item [ :psql.db_returned[zabbix]] became not supported: Received value [timeout while executing a shell script] is not suitable for value type [Numeric (unsigned)] and data type [Decim
30320:20140310:103038.014 item [ :mysql.status.Innodb_pages_read] became not supported: Received value [timeout while executing a shell script] is not suitable for value type [Numeric (unsigned)] and data type [Decim
30323:20140310:103043.150 item [ :psql.db_connections[zabbix]] became not supported: Received value [timeout while executing a shell script] is not suitable for value type [Numeric (unsigned)] and data type [Decim
30317:20140310:103045.781 web scenario step "Home page on :Home page on " error: error doing curl_easy_perform: Timeout was reached
30308:20140310:103105.868 Sending configuration data to proxy 'Zabbix proxy '. Datalen 94224
30322:20140310:103106.471 item [ :psql.tx_commited] became not supported: Received value [timeout while executing a shell script] is not suitable for value type [Numeric (unsigned)] and data type [Decim
30322:20140310:103106.474 item [ :psql.db_fetched[zabbix]] became not supported: Received value [timeout while executing a shell script] is not suitable for value type [Numeric (unsigned)] and data type [Decim
30322:20140310:103106.476 item [ :psql.db_deleted[zabbix]] became not supported: Received value [timeout while executing a shell script] is not suitable for value type [Numeric (unsigned)] and data type [Decim
30322:20140310:103106.482 item [ :mysql.status.Open_tables] became not supported: Received value [timeout while executing a shell script] is not suitable for value type [Numeric (unsigned)] and data type [Decim
30322:20140310:103106.484 item [ :perf_counter[\ASP.NET Apps v4.0.30319(__Total__)\Requests/Sec]] became not supported: ZBX_NOTSUPPORTED
30321:20140310:103106.863 item [ :psql.server_processes] became not supported: Received value [timeout while executing a shell script] is not suitable for value type [Numeric (unsigned)] and data type [Decim
30321:20140310:103106.871 item [ :tomcat01-uptime[{$IPSTRING01},{$JMXPORT01}]] became supported
30301:20140310:103109.343 cannot send list of active checks to [10.29.47.121]: host [FRFYEZD01] not monitored
30321:20140310:103110.167 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR: deadlock detected
DETAIL: Process 6354 waits for ShareLock on transaction 162940348; blocked by process 6352.
Process 6352 waits for ShareLock on transaction 162940345; blocked by process 6354.
HINT: See server log for query details.
[update ids set nextid=nextid+12 where nodeid=0 and table_name='events' and field_name='eventid']
zabbix_server [30321]: ERROR [file:db.c,line:1412] Something impossible has just happened.
30321:20140310:103110.312 item [ :psql.db_size[zabbix]] became not supported: Received value [timeout while executing a shell script] is not suitable for value type [Numeric (unsigned)] and data type [Decim
30321:20140310:103110.317 item [ :psql.db_connections[zabbix]] became supported
30320:20140310:103110.329 item [ :psql.db_returned[zabbix]] became supported
30320:20140310:103110.333 item [ :mysql.status.Innodb_pages_read] became supported
30322:20140310:103111.232 item [ :psql.db_fetched[zabbix]] became supported
30322:20140310:103111.234 item [ :psql.db_deleted[zabbix]] became supported
30322:20140310:103111.242 item [ :mysql.status.Open_tables] became supported
30322:20140310:103121.534 item [ :psql.tx_commited] became supported
30317:20140310:103200.308 web scenario step "Home page on :Home page on " error: error doing curl_easy_perform: Timeout was reached
30317:20140310:103315.518 web scenario step "Home page on :Home page on " error: error doing curl_easy_perform: Timeout was reached
30317:20140310:103655.367 web scenario step "Home page on :Home page on " error: error doing curl_easy_perform: Timeout was reached
30317:20140310:103810.136 web scenario step "Home page on :Home page on " error: error doing curl_easy_perform: Timeout was reached
30322:20140310:103852.309 item [ :tomcat01-uptime[{$IPSTRING01},{$JMXPORT01}]] became supported
30317:20140310:103925.780 web scenario step "Home page on :Home page on " error: error doing curl_easy_perform: Timeout was reached
30317:20140310:104040.485 web scenario step "Home page on :Home page on " error: error doing curl_easy_perform: Timeout was reached
30320:20140310:104120.409 item [ :perf_counter[\ASP.NET Apps v4.0.30319(__Total__)\Requests/Sec]] became supported
30316:20140310:104120.884 web scenario step "GEO Web 2 Categories:Category 2698" error: error doing curl_easy_perform: Timeout was reached
30323:20140310:104147.096 item [ :psql.db_size[zabbix]] became supported
30291:20140310:105357.455 Zabbix agent item [agent.ping] on host [ ] failed: first network error, wait for 15 seconds
30318:20140310:105408.173 web scenario step " Web 2 Home SSL:Home SSL" error: error doing curl_easy_perform: Timeout was reached
30292:20140310:105409.833 Zabbix agent item [net.tcp.service[ssh]] on host [ ] failed: first network error, wait for 15 seconds
30299:20140310:105411.114 resuming Zabbix agent checks on host [ ]: connection restored
30299:20140310:105424.157 resuming Zabbix agent checks on host [ ]: connection restored
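To pick the Tomcat entries out of a log like the one above, I put together a small Python sketch. It assumes every regular line has the form PID:YYYYMMDD:HHMMSS.mmm followed by the message, as in the excerpt; continuation lines (DETAIL, HINT and so on) are skipped. The function names and the sample list are my own, not part of Zabbix.

```python
import re
from datetime import datetime

# PID:YYYYMMDD:HHMMSS.mmm message  (format inferred from the excerpt above)
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\.(\d{3})\s+(.*)$")


def parse_line(line):
    """Return a dict with pid, time and message, or None for other lines."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    pid, date, clock, millis, message = match.groups()
    stamp = datetime.strptime(date + clock, "%Y%m%d%H%M%S")
    stamp = stamp.replace(microsecond=int(millis) * 1000)
    return {"pid": int(pid), "time": stamp, "message": message}


def filter_messages(lines, needle):
    """Keep only parsed entries whose message contains the given text."""
    entries = []
    for line in lines:
        entry = parse_line(line)
        if entry is not None and needle in entry["message"]:
            entries.append(entry)
    return entries


# Three lines copied from the log excerpt above.
sample = [
    "30321:20140310:103106.871 item [ :tomcat01-uptime[{$IPSTRING01},{$JMXPORT01}]] became supported",
    "30299:20140310:103035.961 resuming Zabbix agent checks on host [ ]: connection restored",
    "DETAIL: Process 6354 waits for ShareLock on transaction 162940348; blocked by process 6352.",
]

for entry in filter_messages(sample, "tomcat01-uptime"):
    print(entry["time"], entry["pid"], entry["message"])
```

In practice the list would come from reading zabbix_server.log line by line instead of the hard-coded sample.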