Zabbix Agent Containers... On a train

  • millap
    Junior Member
    • Sep 2025
    • 2

    #1

    Zabbix Agent Containers... On a train

    Hey all,

    First post, apologies if I'm missing something critical. I'm involved with a trial of LEO connectivity on some trains in the UK. As part of the trial, we've been evaluating different on-board monitoring platforms to tell us the health of on-board systems, as well as pulling SNMP statistics from supported systems. We have a central Zabbix server, with Zabbix Proxies running as containers within the on-board FW on each train feeding data back. So far, so good. The connectivity isn't always reliable, even with LEO, due to our geographic position and gaps in LEO coverage for some periods of time. The undulating path the train takes, and tunnels, can also lead to loss of backhaul.

    Our on-board backhaul router maintains a tunnel to the DC where the main Zabbix server lives, and this tunnel can intermittently go down. The issue I have is that the remote Zabbix Agents lose their record of host availability for SNMP-monitored hosts, leaving them in an 'Unknown' state. If we remotely restart the container via Ansible, bingo: within a few seconds, the state of the monitored hosts returns. Even when we've lost the state of the monitored SNMP agents, we're still receiving data from them for interfaces, CPU, memory, etc.

    Before we restart the container, we have logs which look like this -

    Code:
    8:20260223:013830.008 received configuration data from server at "1.2.3.4", datalen 233947
    38:20260223:013833.536 enabling SNMP agent checks on host "FW-000000-001": interface became available
    38:20260223:013836.548 enabling SNMP agent checks on host "SWI-000000-00-001": interface became available
    38:20260223:013839.487 enabling SNMP agent checks on host "SWI-000000-00-002": interface became available
    38:20260223:013840.551 enabling SNMP agent checks on host "SWI-000000-00-001": interface became available
    38:20260223:013841.548 enabling SNMP agent checks on host "SWI-000000-00-002": interface became available
    38:20260223:013842.533 enabling SNMP agent checks on host "RTR-000000-001": interface became available
    16:20260223:020754.448 executing housekeeper
    16:20260223:020754.480 housekeeper [deleted 0 records in 0.000380 sec, idle for 1 hour(s)]
    8:20260223:023424.731 received configuration data from server at "1.2.3.4", datalen 29712
    16:20260223:030754.004 executing housekeeper
    16:20260223:030755.005 housekeeper [deleted 71882 records in 0.451045 sec, idle for 1 hour(s)]
    15:20260223:031250.668 Unable to connect to [1.2.3.4]:10051 [cannot connect to [[1.2.3.4]:10051]: connection timed out]
    15:20260223:031250.668 Will try to reconnect every 1 second(s)
    15:20260223:031250.704 Connection restored.
    8:20260223:031254.676 Unable to connect to [1.2.3.4]:10051 [cannot connect to [[1.2.3.4]:10051]: connection timed out]
    8:20260223:031254.676 Will try to reconnect every 10 second(s)
    8:20260223:031255.780 Connection restored.
    Restarting the container clears the issue. The logs look remarkably similar -

    Code:
    user@fw:~$ sho container log zabbix-proxy | no-more
    Preparing Zabbix proxy
    Starting Zabbix Proxy (active) [000000]. Zabbix 7.4.2 (revision 7aa4e07).
    Press Ctrl+C to exit.
    
    1:20260224:121243.196 Starting Zabbix Proxy (active) [000000]. Zabbix 7.4.2 (revision 7aa4e07).
    1:20260224:121243.196 **** Enabled features ****
    1:20260224:121243.196 SNMP monitoring: YES
    1:20260224:121243.196 IPMI monitoring: YES
    1:20260224:121243.196 Web monitoring: YES
    1:20260224:121243.196 VMware monitoring: YES
    1:20260224:121243.196 ODBC: YES
    1:20260224:121243.196 SSH support: YES
    1:20260224:121243.196 IPv6 support: YES
    1:20260224:121243.196 TLS support: YES
    1:20260224:121243.196 **************************
    1:20260224:121243.196 using configuration file: /etc/zabbix/zabbix_proxy.conf
    1:20260224:121243.206 cannot open database file "/var/lib/zabbix/db_data/000000.sqlite": [2] No such file or directory
    1:20260224:121243.207 creating database ...
    1:20260224:121244.244 current database version (mandatory/optional): 07040000/07040000
    1:20260224:121244.244 required mandatory version: 07040000
    1:20260224:121244.246 proxy #0 started [main process]
    8:20260224:121244.247 proxy #1 started [configuration syncer #1]
    8:20260224:121244.265 no records in "settings" table
    9:20260224:121244.274 proxy #2 started [trapper #1]
    10:20260224:121244.274 proxy #3 started [trapper #2]
    11:20260224:121244.275 proxy #4 started [trapper #3]
    12:20260224:121244.276 proxy #5 started [trapper #4]
    13:20260224:121244.278 proxy #6 started [trapper #5]
    14:20260224:121244.279 proxy #7 started [preprocessing manager #1]
    15:20260224:121244.280 proxy #8 started [data sender #1]
    16:20260224:121244.285 proxy #9 started [housekeeper #1]
    17:20260224:121244.286 proxy #10 started [http poller #1]
    18:20260224:121244.288 proxy #11 started [browser poller #1]
    19:20260224:121244.291 proxy #12 started [discovery manager #1]
    20:20260224:121244.297 proxy #13 started [history syncer #1]
    21:20260224:121244.299 proxy #14 started [history syncer #2]
    22:20260224:121244.303 proxy #15 started [history syncer #3]
    23:20260224:121244.308 proxy #16 started [history syncer #4]
    24:20260224:121244.309 proxy #17 started [self-monitoring #1]
    25:20260224:121244.310 proxy #18 started [task manager #1]
    26:20260224:121244.311 proxy #19 started [poller #1]
    27:20260224:121244.311 proxy #20 started [poller #2]
    28:20260224:121244.311 proxy #21 started [poller #3]
    29:20260224:121244.315 proxy #22 started [poller #4]
    30:20260224:121244.317 proxy #23 started [poller #5]
    31:20260224:121244.319 proxy #24 started [unreachable poller #1]
    32:20260224:121244.321 proxy #25 started [icmp pinger #1]
    33:20260224:121244.322 proxy #26 started [availability manager #1]
    37:20260224:121244.330 proxy #30 started [snmp poller #1]
    34:20260224:121244.337 proxy #27 started [odbc poller #1]
    35:20260224:121244.339 proxy #28 started [http agent poller #1]
    37:20260224:121244.340 thread started
    36:20260224:121244.340 proxy #29 started [agent poller #1]
    35:20260224:121244.340 thread started
    38:20260224:121244.341 proxy #31 started [internal poller #1]
    36:20260224:121244.341 thread started
    14:20260224:121244.411 [2] thread started [preprocessing worker #2]
    14:20260224:121244.411 [4] thread started [preprocessing worker #4]
    14:20260224:121244.411 [1] thread started [preprocessing worker #1]
    14:20260224:121244.412 [5] thread started [preprocessing worker #5]
    14:20260224:121244.412 [7] thread started [preprocessing worker #7]
    14:20260224:121244.412 [8] thread started [preprocessing worker #8]
    14:20260224:121244.412 [6] thread started [preprocessing worker #6]
    14:20260224:121244.412 [10] thread started [preprocessing worker #10]
    14:20260224:121244.412 [3] thread started [preprocessing worker #3]
    14:20260224:121244.412 [11] thread started [preprocessing worker #11]
    14:20260224:121244.412 [12] thread started [preprocessing worker #12]
    14:20260224:121244.413 [14] thread started [preprocessing worker #14]
    14:20260224:121244.413 [15] thread started [preprocessing worker #15]
    14:20260224:121244.415 [13] thread started [preprocessing worker #13]
    14:20260224:121244.415 [9] thread started [preprocessing worker #9]
    14:20260224:121244.415 [16] thread started [preprocessing worker #16]
    8:20260224:121244.471 received configuration data from server at "1.2.3.4", datalen 233947
    19:20260224:121245.648 thread started [discovery worker #1]
    19:20260224:121245.648 thread started [discovery worker #3]
    19:20260224:121245.648 thread started [discovery worker #4]
    19:20260224:121245.648 thread started [discovery worker #5]
    19:20260224:121245.648 thread started [discovery worker #2]
    37:20260224:121246.396 enabling SNMP agent checks on host "FW-000000-001": interface became available
    37:20260224:121247.337 enabling SNMP agent checks on host "RTR-000000-001": interface became available
    37:20260224:121247.300 enabling SNMP agent checks on host "SWI-000000-00-001": interface became available
    37:20260224:121247.385 enabling SNMP agent checks on host "SWI-000000-00-001": interface became available
    37:20260224:121248.345 enabling SNMP agent checks on host "SWI-000000-00-002": interface became available
    Restarting the container, either manually or in an automated fashion, is fine, but it's not a very elegant solution, and in the OS we use to run the containers we've hit issues with the number of volumes exceeding a watermark when the Zabbix Agent container is restarted so often (leading to more maintenance from the podman perspective).

    I was wondering if any experts who've worked with the agent in less-than-100%-reliable backhaul environments might be able to offer advice on whether there are optional environment variables we can pass to the agent to mitigate the loss of monitoring. Our container config in the FW host OS is fairly basic and is as follows -

    Code:
    set container name zabbix-proxy allow-host-networks
    set container name zabbix-proxy capability 'net-raw'
    set container name zabbix-proxy environment ZBX_DEBUGLEVEL value '3'
    set container name zabbix-proxy environment ZBX_HOSTNAME value '000000'
    set container name zabbix-proxy environment ZBX_PROXYMODE value '0'
    set container name zabbix-proxy environment ZBX_SERVER_HOST value '1.2.3.4'
    set container name zabbix-proxy environment ZBX_SERVER_PORT value '10051'
    set container name zabbix-proxy image 'zabbix/zabbix-proxy-sqlite3'
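    On that note, the knobs I've been eyeing as candidates are the unreachability timers from zabbix_proxy.conf (UnreachablePeriod, UnreachableDelay, UnavailableDelay). Assuming the official zabbix/zabbix-proxy-sqlite3 image maps these to ZBX_* environment variables in the usual way, the extra config would look something like this; the values below are illustrative guesses, not something we've tested on the trains:

```shell
# Hedged sketch only: ZBX_* names assumed to map onto the matching
# zabbix_proxy.conf parameters; values are illustrative, not tested.
# Keep treating a host as unreachable for longer before marking it unavailable...
set container name zabbix-proxy environment ZBX_UNREACHABLEPERIOD value '600'
# ...retry unreachable hosts every 30s instead of the default 15s...
set container name zabbix-proxy environment ZBX_UNREACHABLEDELAY value '30'
# ...and re-check unavailable hosts every 30s instead of the default 60s.
set container name zabbix-proxy environment ZBX_UNAVAILABLEDELAY value '30'
```

    If anyone knows whether these actually influence the 'Unknown' interface state after a backhaul drop, that would be great to hear.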
    Thanks for taking the time to read this post!

    Andy

    Last edited by millap; Yesterday, 16:01.
  • guntis_liepins
    Member
    • Oct 2025
    • 34

    #2
    Do you monitor SNMP items via proxy too? The proxy should reconnect without restarting the container...


    • millap
      Junior Member
      • Sep 2025
      • 2

      #3
      Originally posted by guntis_liepins
      Do you monitor SNMP items via proxy too? The proxy should reconnect without restarting the container...
      Hiya,
      Yes, we do. That's why it's flagged as an issue. Host availability is based on the SNMP-monitored hosts. It's like the container agent forgets to return availability to the central Zabbix instance until the container is restarted, at which point the following log entries show things have kicked back into action -

      Code:
      37:20260224:121246.396 enabling SNMP agent checks on host "FW-000000-001": interface became available
      37:20260224:121247.337 enabling SNMP agent checks on host "RTR-000000-001": interface became available
      37:20260224:121247.300 enabling SNMP agent checks on host "SWI-000000-00-001": interface became available
      37:20260224:121247.385 enabling SNMP agent checks on host "SWI-000000-00-001": interface became available
      37:20260224:121248.345 enabling SNMP agent checks on host "SWI-000000-00-002": interface became available
      Once this is done, the external platform which uses the Zabbix API to monitor host availability shows a good state -

      [Attachment: Screenshot 2026-02-27 at 08.41.10.png]
      Before that, the Monitoring Agent state is 'UP', but Hosts UP is 0. On the Zabbix Dashboard, this is what availability looks like before the container restart -

      [Attachment: Screenshot 2026-02-27 at 08.46.02.png]

      Andy
