After upgrading to version 4.0.1, we are seeing lots of "failed: first network error" messages in the server log. In addition, some graphs show empty values, values are missing from history, and we even get problems raised by "unreachable" triggers.
[root@server_zabbixs1 etc]# tail -f /var/log/zabbix/zabbix_server.log
21088:20181127:212901.948 Zabbix agent item "net.tcp.listen[9050]" on host "server_go-html1.domain.dc" failed: first network error, wait for 15 seconds
21075:20181127:212929.753 Zabbix agent item "net.if.out[WAN Miniport (L2TP)]" on host "server_go-html2.domain.dc" failed: first network error, wait for 15 seconds
21091:20181127:212931.963 resuming Zabbix agent checks on host "server_go-html1.domain.dc": connection restored
21087:20181127:212937.977 Zabbix agent item "net.if.out[WAN Miniport (IP)-QoS Packet Scheduler-0000]" on host "server_mq3.domain.dc" failed: first network error, wait for 15 seconds
21079:20181127:212947.650 Zabbix agent item "net.if.out[vmxnet3 Ethernet Adapter-WFP 802.3 MAC Layer LightWeight Filter-0000]" on host "server_GO-API1.domain.dc" failed: first network error, wait for 15 seconds
21059:20181127:212956.197 Zabbix agent item "net.if.out[WAN Miniport (IPv6)-WFP Native MAC Layer LightWeight Filter-0000]" on host "server_webdemo2.domain.dc" failed: first network error, wait for 15 seconds
21091:20181127:212959.983 resuming Zabbix agent checks on host "server_go-html2.domain.dc": connection restored
21063:20181127:213004.459 Zabbix agent item "perf_counter["\Web Service Cache\File Cache Hits %",300]" on host "server_go-sl1.domain.dc" failed: first network error, wait for 15 seconds
21091:20181127:213008.000 resuming Zabbix agent checks on host "server_mq3.domain.dc": connection restored
21091:20181127:213017.006 resuming Zabbix agent checks on host "server_GO-API1.domain.dc": connection restored
21091:20181127:213026.019 resuming Zabbix agent checks on host "server_webdemo2.domain.dc": connection restored
21046:20181127:213031.261 Zabbix agent item "net.if.out[Microsoft Kernel Debug Network Adapter]" on host "server_go-sl1.domain.dc" failed: another network error, wait for 15 seconds
21091:20181127:213050.046 resuming Zabbix agent checks on host "server_go-sl1.domain.dc": connection restored
21087:20181127:213051.819 Zabbix agent item "net.if.in[vmxnet3 Ethernet Adapter-WFP LightWeight Filter-0000]" on host "server_xen6542.domain.dc" failed: first network error, wait for 15 seconds
21064:20181127:213058.660 Zabbix agent item "vfs.fs.size[E:,used]" on host "server_mq3.domain.dc" failed: first network error, wait for 15 seconds
21056:20181127:213108.149 Zabbix agent item "net.if.in[WAN Miniport (SSTP)]" on host "server_MAINTCLN2.domain.dc" failed: first network error, wait for 15 seconds
21090:20181127:213121.011 resuming Zabbix agent checks on host "server_xen6542.domain.dc": connection restored
21075:20181127:213123.599 Zabbix agent item "net.if.in[Microsoft ISATAP Adapter #2]" on host "server_go-sl4.domain.dc" failed: first network error, wait for 15 seconds
21090:20181127:213128.016 resuming Zabbix agent checks on host "server_mq3.domain.dc": connection restored
21067:20181127:213136.766 Zabbix agent item "system.cpu.load[percpu,avg15]" on host "server_sqlhtk.domain.dc" failed: first network error, wait for 15 seconds
21090:20181127:213138.025 resuming Zabbix agent checks on host "server_MAINTCLN2.domain.dc": connection restored
21046:20181127:213143.158 Zabbix agent item "net.if.in[WAN Miniport (PPPOE)]" on host "server_INT2.domain.dc" failed: first network error, wait for 15 seconds
21090:20181127:213153.033 resuming Zabbix agent checks on host "server_go-sl4.domain.dc": connection restored
21090:20181127:213206.040 resuming Zabbix agent checks on host "server_sqlhtk.domain.dc": connection restored
21090:20181127:213213.044 resuming Zabbix agent checks on host "server_INT2.domain.dc": connection restored
We have already tried increasing server resources (CPU and memory), changing the number of pollers, and tuning MariaDB. We also upgraded the agent on some servers. There are no errors in the agent log. Everything else seems to work perfectly, so the network itself is probably fine. Does anyone have an idea what could be causing this?
I have attached a screenshot of the Zabbix server's CPU load.
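To see whether the errors cluster on particular hosts and how long each host stays unreachable, the log lines can be tallied. This is only a minimal sketch in Python, assuming the log format shown above; the function name `summarize` and both regular expressions are my own and not part of Zabbix.

```python
import re
from collections import Counter
from datetime import datetime

# Patterns written against the log lines pasted above (an assumption:
# other Zabbix versions may word these messages differently).
ERROR_RE = re.compile(
    r'^(?P<pid>\d+):(?P<ts>\d{8}:\d{6}\.\d{3}) Zabbix agent item "(?P<key>.*)" '
    r'on host "(?P<host>[^"]+)" failed: (?P<kind>first|another) network error'
)
RESTORE_RE = re.compile(
    r'^(?P<pid>\d+):(?P<ts>\d{8}:\d{6}\.\d{3}) resuming Zabbix agent checks '
    r'on host "(?P<host>[^"]+)": connection restored'
)


def parse_ts(text):
    """Parse a timestamp such as '20181127:212901.948'."""
    return datetime.strptime(text, "%Y%m%d:%H%M%S.%f")


def summarize(lines):
    """Return (errors per host, list of (host, seconds until restored))."""
    errors = Counter()
    pending = {}   # host -> time of the error that opened the outage
    outages = []
    for line in lines:
        m = ERROR_RE.match(line)
        if m:
            errors[m["host"]] += 1
            # Keep the earliest error time if several arrive before a restore.
            pending.setdefault(m["host"], parse_ts(m["ts"]))
            continue
        m = RESTORE_RE.match(line)
        if m and m["host"] in pending:
            start = pending.pop(m["host"])
            seconds = (parse_ts(m["ts"]) - start).total_seconds()
            outages.append((m["host"], seconds))
    return errors, outages
```

It could be run against the server log with something like `errors, outages = summarize(open("/var/log/zabbix/zabbix_server.log"))`. If the errors are spread evenly across all hosts and item keys rather than concentrated on a few, that would point more toward the server side than toward individual agents.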