Hi,
We have had encryption with certificates running for some time now, and it has mostly worked very well. Thanks to Puppet, the switch from unencrypted to encrypted communication went very smoothly and with little downtime.
But there's a very strange problem when we add a new server to Zabbix that has never had contact with the Zabbix server before (a brand-new VM).
All certificates and configuration are generated and deployed by Puppet before the Zabbix agent starts for the very first time. But in this case (and only this case), we cannot get it to work correctly unless we do one of two things:
- Let the agent connect to the server once without any encryption, and then turn encryption back on (we never tested this with a PSK)
- Restart the Zabbix server service
It gets even weirder. We tried to find where the problem starts and enabled debug logging on both the server and the agent. But everything looks as it should: the data is sent by the agent and received by the server (logs below), yet no data is written to the database.
Sent by the agent (data part shortened a bit):
Code:
2664:20160721:134701.518 In send_buffer() host:'11.22.33.2' port:10051 entries:37/100
2664:20160721:134701.519 In zbx_tls_connect(): issuer:"CN=Company CA - G1,O=Foo,L=Bar,ST=Blah,C=US" subject:"CN=server,O=Foo,L=Bar,ST=Blah,C=US"
2664:20160721:134701.533 peer certificate issuer:"CN=Company CA - G1,O=Foo,L=Bar,ST=Blah,C=US" subject:"CN=server,O=Foo,L=Bar,ST=Blah,C=US"
2664:20160721:134701.533 End of zbx_tls_connect():SUCCEED (established TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256)
2664:20160721:134701.533 JSON before sending [{"request":"agent data","data":[{"host":"client","key":"agent.hostname","value":"client","clock":1469101616,"ns":476152236},{"host":"client","key":"agent.ping","value":"1","clock":1469101616,"ns":476297844},{"host":"client","key":"agent.version","value":"3.0.3","clock":1469101616,"ns":476437202},
...]
2664:20160721:134701.534 JSON back [{"response":"success","info":"processed: 36; failed: 1; total: 37; seconds spent: 0.000608"}]
2664:20160721:134701.534 In check_response() response:'{"response":"success","info":"processed: 36; failed: 1; total: 37; seconds spent: 0.000608"}'
2664:20160721:134701.534 info from server: 'processed: 36; failed: 1; total: 37; seconds spent: 0.000608'
2664:20160721:134701.534 End of check_response():SUCCEED
2664:20160721:134701.534 OK
2664:20160721:134701.534 End of send_buffer():SUCCEED
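The agent log above echoes the server's per-batch summary ("processed: 36; failed: 1; total: 37"). To watch that summary across many send cycles instead of reading it by eye, a small Python sketch can pull the counters out of each line. The regex is based only on the line format shown above, and `parse_server_info` is a hypothetical helper, not part of Zabbix:

```python
import re
from typing import Dict

# The server's per-batch summary as echoed in the agent log, e.g.
# "processed: 36; failed: 1; total: 37; seconds spent: 0.000608"
INFO_RE = re.compile(
    r"processed: (\d+); failed: (\d+); total: (\d+); seconds spent: ([0-9.]+)"
)


def parse_server_info(line: str) -> Dict[str, float]:
    """Hypothetical helper: extract the counters from one debug log line."""
    match = INFO_RE.search(line)
    if match is None:
        raise ValueError("line carries no server summary")
    processed, failed, total, seconds = match.groups()
    return {
        "processed": int(processed),
        "failed": int(failed),
        "total": int(total),
        "seconds": float(seconds),
    }


if __name__ == "__main__":
    line = ("2664:20160721:134701.534 info from server: 'processed: 36; "
            "failed: 1; total: 37; seconds spent: 0.000608'")
    counts = parse_server_info(line)
    # Every value the agent sent is accounted for as processed or failed.
    assert counts["processed"] + counts["failed"] == counts["total"]
    print(counts)
```

Run over the whole agent log, this shows whether the server kept reporting "processed" values for the new host during the period in which nothing appeared in the database.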
And received by the server:
Code:
8362:20160721:134701.531 In zbx_tls_accept()
8362:20160721:134701.544 peer certificate issuer:"CN=Company CA - G1,O=Foo,L=Bar,ST=Blah,C=US" subject:"CN=client,O=Foo,L=Bar,ST=Blah,C=US"
8362:20160721:134701.544 End of zbx_tls_accept():SUCCEED (established TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256)
8362:20160721:134701.544 __zbx_zbx_setproctitle() title:'trapper #10 [processing data]'
8362:20160721:134701.545 trapper got '{"request":"agent data","data":[{"host":"client","key":"agent.hostname","value":"client","clock":1469101616,"ns":476152236},{"host":"client","key":"agent.ping","value":"1","clock":1469101616,"ns":476297844},{"host":"client","key":"agent.version","value":"3.0.3","clock":1469101616,"ns":476437202}
...],"clock":1469101621,"ns":533376880}'
8362:20160721:134701.545 In recv_agenthistory()
8362:20160721:134701.545 In process_hist_data()
8362:20160721:134701.545 In process_mass_data()
8362:20160721:134701.546 item [client:Basic[yumupdatecount]] error: Received value [xxx: Permission deniedInvalid value] is not suitable for value type [Numeric (unsigned)] and data type [Decimal]
8362:20160721:134701.546 End of process_mass_data()
8362:20160721:134701.546 End of process_hist_data():SUCCEED
8362:20160721:134701.546 In zbx_send_response()
8362:20160721:134701.546 zbx_send_response() '{"response":"success","info":"processed: 36; failed: 1; total: 37; seconds spent: 0.000608"}'
8362:20160721:134701.546 End of zbx_send_response():SUCCEED
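Since all Zabbix processes write into the same debug log, it can help to isolate one process (here trapper PID 8362) and list the outcome of each step it logs. A minimal Python sketch, assuming the `<pid>:<YYYYMMDD>:<HHMMSS.mmm> <message>` line layout visible in the logs above; `parse_line`, `summarize` and `failures` are hypothetical helper names:

```python
import re
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Debug line layout as seen above: "<pid>:<YYYYMMDD>:<HHMMSS.mmm> <message>"
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6}\.\d{3}) (.*)$")
# "End of <function>():<RESULT>", optionally followed by details in parentheses
END_RE = re.compile(r"^End of (\w+)\(\):(\w+)")


def parse_line(line: str) -> Optional[Tuple[int, datetime, str]]:
    """Split a debug line into (pid, timestamp, message); None for continuation lines."""
    match = LINE_RE.match(line)
    if match is None:
        return None  # e.g. the wrapped tail of a long JSON payload
    pid, day, clock, message = match.groups()
    stamp = datetime.strptime(f"{day} {clock}", "%Y%m%d %H%M%S.%f")
    return int(pid), stamp, message


def summarize(lines: List[str]) -> Dict[int, Dict[str, list]]:
    """Per process ID: every 'End of ...():RESULT' outcome and every item error."""
    summary: Dict[int, Dict[str, list]] = defaultdict(
        lambda: {"results": [], "errors": []}
    )
    for raw in lines:
        parsed = parse_line(raw)
        if parsed is None:
            continue
        pid, _stamp, message = parsed
        end = END_RE.match(message)
        if end:
            summary[pid]["results"].append((end.group(1), end.group(2)))
        elif " error: " in message:
            summary[pid]["errors"].append(message)
    return dict(summary)


def failures(summary: Dict[int, Dict[str, list]]) -> List[Tuple[int, str, str]]:
    """All logged steps whose result was not SUCCEED, as (pid, function, result)."""
    return [
        (pid, func, result)
        for pid, info in summary.items()
        for func, result in info["results"]
        if result != "SUCCEED"
    ]
```

On the server excerpt above, this yields only SUCCEED results (zbx_tls_accept, process_hist_data, zbx_send_response) plus the single yumupdatecount item error, which is consistent with reading the log by eye.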
There's one error while parsing the data, but that's only a small permission issue.
Result: No data is ever written to the database as long as we don't do one of the two things mentioned above.
It's a bit annoying to reproduce and test, because it seems to only work (or rather, fail) on a fresh machine that has never had contact with the server before.
Any suggestions on how we can debug this further? Since we only have a production system, we're a bit limited in what we can test.
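A read-only way to check whether values from the new host ever reach the database, without changing anything on the production server, is to ask the frontend API for the newest stored history of one of its items (for example `agent.ping`). A sketch in Python using the Zabbix API method `history.get`; the URL, session token and item ID are placeholders, and value type 3 (numeric unsigned) is an assumption about the item being queried:

```python
import json
import urllib.request
from typing import List


def build_history_request(auth: str, itemids: List[str],
                          value_type: int = 3, limit: int = 10) -> dict:
    """JSON-RPC body for history.get: the newest stored values of the given items.

    value_type 3 selects numeric unsigned history; use the type that
    matches the item you query (assumption: agent.ping is unsigned)."""
    return {
        "jsonrpc": "2.0",
        "method": "history.get",
        "params": {
            "output": "extend",
            "history": value_type,
            "itemids": itemids,
            "sortfield": "clock",
            "sortorder": "DESC",
            "limit": limit,
        },
        "auth": auth,
        "id": 1,
    }


def call_api(url: str, body: dict) -> dict:
    """POST one request to the frontend's api_jsonrpc.php and decode the reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json-rpc"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholders: frontend URL, a token obtained via user.login, an item ID.
    URL = "https://zabbix.example.com/api_jsonrpc.php"
    body = build_history_request("SESSION_TOKEN", ["12345"])
    print(json.dumps(body, indent=2))
    # Uncomment once the placeholders are real:
    # print(call_api(URL, body)["result"])
```

An empty `result` for an item the agent is known to be sending would confirm, independently of the debug logs, that the values are acknowledged by the trapper but never stored.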
Thank you,
Urs