Can't deal with Zabbix active check

  • GillezDeleuze
    Junior Member
    • Dec 2023
    • 4

    #1

    Hello guys!

    I have used Zabbix for a long time and everything was fine. I run passive checks on ~100 VMs (Linux and Windows), and they still work.
    But a few days ago one of my machines (which already has a Zabbix agent) started misbehaving: it was running out of memory, and the PostgreSQL process got killed. I realized that I can ship these logs to the Zabbix server, so that a trigger can notify me about it.
    I set things up the way the tutorials describe. The host was already configured with the proper hostname (it matches the hostname of the misbehaving machine), and I changed the zabbix-agent config and reloaded the agent.

    ```
    ############ GENERAL PARAMETERS #################
    ### Option: PidFile
    # Name of PID file.
    #
    # Mandatory: no
    # Default:
    # PidFile=/tmp/zabbix_agentd.pid
    PidFile=/run/zabbix/zabbix_agentd.pid
    EnableRemoteCommands=1
    LogFile=/var/log/zabbix-agent/zabbix_agentd.log


    LogFileSize=2
    DebugLevel=5

    Server=serverlinux
    #DisablePassive=0
    #DisableActive=0
    ListenPort=10050

    ServerActive=serverlinux
    Timeout=30
    #listenIP=serverlinux

    ```

    ```
    37482:20231228:115242.411 Starting Zabbix Agent [staffcop-pc]. Zabbix 4.0.17 (revision a528a0a4bc).
    37482:20231228:115242.411 **** Enabled features ****
    37482:20231228:115242.411 IPv6 support: YES
    37482:20231228:115242.411 TLS support: YES
    37482:20231228:115242.411 **************************
    37482:20231228:115242.411 using configuration file: /etc/zabbix/zabbix_agentd.conf
    37482:20231228:115242.411 In zbx_load_modules()
    37482:20231228:115242.411 End of zbx_load_modules():SUCCEED
    37482:20231228:115242.412 In init_collector_data()
    37482:20231228:115242.412 In zbx_dshm_create() size:0
    37482:20231228:115242.412 End of zbx_dshm_create():SUCCEED shmid:-1
    37482:20231228:115242.412 End of init_collector_data()
    37482:20231228:115242.412 agent #0 started [main process]
    37493:20231228:115242.413 agent #1 started [collector]
    37493:20231228:115242.413 In init_cpu_collector()
    37493:20231228:115242.413 End of init_cpu_collector():SUCCEED
    37493:20231228:115242.413 __zbx_zbx_setproctitle() title:'collector [processing data]'
    37493:20231228:115242.413 In update_cpustats()
    37494:20231228:115242.414 agent #2 started[listener #1]
    37494:20231228:115242.414 In zbx_tls_init_child()
    37493:20231228:115242.414 End of update_cpustats()
    37493:20231228:115242.414 __zbx_zbx_setproctitle() title:'collector [idle 1 sec]'
    37494:20231228:115242.414 GnuTLS library (version 3.6.13) initialized
    37494:20231228:115242.414 End of zbx_tls_init_child()
    37494:20231228:115242.414 __zbx_zbx_setproctitle() title:'listener #1 [waiting for connection]'
    37497:20231228:115242.414 agent #5 started [active checks #1]
    37497:20231228:115242.414 In zbx_tls_init_child()
    37497:20231228:115242.414 GnuTLS library (version 3.6.13) initialized
    37497:20231228:115242.414 End of zbx_tls_init_child()
    37497:20231228:115242.414 In init_active_metrics()
    37497:20231228:115242.415 buffer: first allocation for 100 elements
    37497:20231228:115242.415 End of init_active_metrics()
    37497:20231228:115242.415 In send_buffer() host:'serverlinux' port:10051 entries:0/100
    37497:20231228:115242.415 End of send_buffer():SUCCEED
    37497:20231228:115242.415 __zbx_zbx_setproctitle() title:'active checks #1 [getting list of active checks]'
    37497:20231228:115242.415 In refresh_active_checks() host:'serverlinux' port:10051
    37496:20231228:115242.415 agent #4 started[listener #3]
    37496:20231228:115242.415 In zbx_tls_init_child()
    37496:20231228:115242.416 GnuTLS library (version 3.6.13) initialized
    37496:20231228:115242.416 End of zbx_tls_init_child()
    37496:20231228:115242.416 __zbx_zbx_setproctitle() title:'listener #3 [waiting for connection]'
    37495:20231228:115242.417 agent #3 started[listener #2]
    37495:20231228:115242.417 In zbx_tls_init_child()
    37495:20231228:115242.417 GnuTLS library (version 3.6.13) initialized
    37495:20231228:115242.417 End of zbx_tls_init_child()
    37495:20231228:115242.417 __zbx_zbx_setproctitle() title:'listener #2 [waiting for connection]'
    37497:20231228:115242.424 active check configuration update from [serverlinux:10051] started to fail (cannot connect to [[serverlinux]:10051]: [111] Connection refused)
    37497:20231228:115242.424 End of refresh_active_checks():FAIL
    37497:20231228:115242.424 __zbx_zbx_setproctitle() title:'active checks #1 [processing active checks]'
    37497:20231228:115242.424 In process_active_checks() server:'serverlinux' port:10051
    37497:20231228:115242.424 End of process_active_checks()
    37497:20231228:115242.424 In get_min_nextcheck()
    37497:20231228:115242.424 End of get_min_nextcheck():-1
    37497:20231228:115242.424 __zbx_zbx_setproctitle() title:'active checks #1 [idle 1 sec]'
    37493:20231228:115243.414 __zbx_zbx_setproctitle() title:'collector [processing data]'
    37493:20231228:115243.414 In update_cpustats()
    37493:20231228:115243.414 End of update_cpustats()
    37493:20231228:115243.414 __zbx_zbx_setproctitle() title:'collector [idle 1 sec]'
    37497:20231228:115243.424 In send_buffer() host:'serverlinux' port:10051 entries:0/100
    37497:20231228:115243.424 End of send_buffer():SUCCEED
    37497:20231228:115243.424 __zbx_zbx_setproctitle() title:'active checks #1 [idle 1 sec]'
    37496:20231228:115244.062 __zbx_zbx_setproctitle() title:'listener #3 [processing request]'
    37496:20231228:115244.062 Requested [system.cpu.num]
    ```

    A few more details: my Zabbix sits behind NAT, so only the VM's firewall could be blocking something, but I have opened ports 10050 and 10051.
    Also, my Zabbix server runs in Kubernetes (I just converted some Docker settings to Kubernetes for it).
    Passive checks work fine on all Zabbix agents, including this machine.
    I also tried changing the config file to something like ServerActive=myserver:10050,
    and then the log showed a different problem: it said that Zabbix can't parse the active checks.
    Can you please help me?
    BTW, sorry for my bad English.
  • MRedbourne
    Senior Member
    • Feb 2023
    • 103

    #2
    Can you post the OS information belonging to the Zabbix Agent and Zabbix Server?

    37482:20231228:115242.411 Starting Zabbix Agent [<REDACTED>]. Zabbix 4.0.17 (revision a528a0a4bc).
    It seems that you're running an LTS release that went EOL around 2 years ago. I'd suggest upgrading it, since it no longer receives general or security updates.

    37497:20231228:115242.424 active check configuration update from [serverlinux:10051] started to fail (cannot connect to [[serverlinux]:10051]: [111] Connection refused)
    Your Zabbix server is refusing the connection. You mentioned it's running in NAT. You need to check two firewall policies potentially.

    1. Perimeter/Internal Corporate Firewall.
    2. Host firewall (ufw, iptables, firewalld)

    ufw status | grep -i "10051"
    firewall-cmd --list-all | grep -i "10051"
    iptables -L INPUT -v -n | grep -i "10051"

    If those don't return any results (you're likely to have only one of these tools installed on the Zabbix server), the host firewall is likely rejecting the connection. Also check your perimeter firewalls for any traffic destined for TCP/10051, and note what the perimeter says. "Timeout" usually means that the host either dropped the connection or isn't listening. If the perimeter/internal firewall dropped the connection, have a look at the ACLs.

    Edit: Also check netstat.
    sudo netstat -tunlp | egrep -i "10050|10051"

    If netstat only shows one listening port (10050), Zabbix Server isn't configured correctly.
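
    As a rough sketch of that check, the helper below reads `netstat -tunlp` output on stdin and reports whether each of the two ports shows up as LISTEN. The function name `zabbix_listen_report` is made up for illustration; the only assumption is that the local address is printed as `ADDRESS:PORT` on a line containing `LISTEN`.

    ```shell
    #!/usr/bin/env bash
    # Hypothetical helper: report which Zabbix ports appear as LISTEN
    # in `netstat -tunlp` output read from stdin.
    # 10050 = agent (passive checks), 10051 = server/trapper (active checks).
    zabbix_listen_report() {
        local input port
        input=$(cat)
        for port in 10050 10051; do
            # Match a local address ending in :PORT on a LISTEN line.
            if printf '%s\n' "$input" | grep -Eq ":${port}[[:space:]].*LISTEN"; then
                echo "${port}: listening"
            else
                echo "${port}: not listening"
            fi
        done
    }
    ```

    Run it on the Zabbix server host as `sudo netstat -tunlp | zabbix_listen_report`. Note that it only sees sockets in the host's own network namespace.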
    Last edited by MRedbourne; 28-12-2023, 21:09.


    • GillezDeleuze
      Junior Member
      • Dec 2023
      • 4

      #3
      Thank you for the fast reply, and sorry for my slow response.
      My Zabbix server version is 6.4.8, and it runs in Kubernetes.
      And you're absolutely right: the agent runs on 20.04, but I can't upgrade it for several reasons (
      The firewall is OK, but yeah, it seems that zabbix-server is not configured properly, because:
      ```
      root@serverlinux:/home/kubeadm/k_yamls/k_zabbix_mariadb# sudo netstat -tunlp | egrep -i "10050|10051"
      tcp 0 0 0.0.0.0:10050 0.0.0.0:* LISTEN 3060077/zabbix_agen
      tcp6 0 0 :::10050 :::* LISTEN 3060077/zabbix_agen
      ```
      I should probably post my Kubernetes settings to clarify:
      ```
      root@serverlinux:/home/kubeadm/k_yamls/k_zabbix_mariadb# cat zabbix-server-deploy.yaml
      apiVersion: v1
      kind: Service
      metadata:
        name: zabbix-server
        namespace: kube-system
        labels:
          app: zabbix-server
      spec:
        type: NodePort
        ports:
        - port: 10051
          targetPort: 10051
          nodePort: 30017
          protocol: TCP
        selector:
          app: zabbix-server

      ---

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: zabbix-server
        namespace: kube-system
        labels:
          name: zabbix-server
          app: zabbix-server
      spec:
        replicas: 1
        revisionHistoryLimit: 3
        selector:
          matchLabels:
            name: zabbix-server
        strategy:
          rollingUpdate:
            maxSurge: 30%
            maxUnavailable: 30%
        template:
          metadata:
            labels:
              name: zabbix-server
              app: zabbix-server
          spec:
            hostname: zabbix-server
            volumes:
            - name: zabbix-storage
              persistentVolumeClaim:
                claimName: zabbix-pvc
            containers:
            - name: zabbix-server
              image: zabbix/zabbix-server-mysql
              imagePullPolicy: IfNotPresent
              resources:
                limits:
                  cpu: 400m
                  memory: 1024Mi
                requests:
                  cpu: 100m
                  memory: 100Mi
              ports:
              - containerPort: 10051
              env:
              - name: DB_SERVER_HOST
                value: "mariadb-server"
              - name: MYSQL_USER
                value: "xxx"
              - name: MYSQL_PASSWORD
                value: "xxx"
              - name: MYSQL_DATABASE
                value: "zxxx"
              - name: ZBX_CACHESIZE
                value: "1024M"
              #- name: TZ
              #  value: "Asia/Shanghai"
              volumeMounts:
              - name: zabbix-storage
                mountPath: /usr/lib/zabbix/alertscripts
              - name: zabbix-storage
                mountPath: /usr/lib/zabbix/externalscripts
      ```


      • MRedbourne
        Senior Member
        • Feb 2023
        • 103

        #4

        Hey Mate,

        Unfortunately, I know little about K8s. Can you redact and post:
        • /etc/zabbix/zabbix_server.conf
        • /var/log/zabbix/zabbix_server.log
        Does your zabbix-server have AppArmour/AppArmor installed? If so, what do the policies look like?

        Restart the zabbix-server daemon and try running "ausearch -m avc -ts recent". Does that return anything?


        • GillezDeleuze
          Junior Member
          • Dec 2023
          • 4

          #5
          Good evening!)
          So, in my Kubernetes pod, most of the lines in /etc/zabbix/zabbix_server. just repeat the server settings from the previous message, but here it is in .txt format.

          Here are also some logs (events for this agent around that time), but I guess they are not related; they were probably caused by my changing the settings (I disabled the passive interface on this agent).
          ```
          215:20231228:115150.717 Zabbix agent item "proc.num" on host "staffcop-pc" failed: first network error, wait for 15 seconds
          216:20231228:115207.694 resuming Zabbix agent checks on host "staffcop-pc": connection restored
          212:20231228:115644.561 Zabbix agent item "system.cpu.load[all,avg5]" on host "staffcop-pc" failed: first network error, wait for 15 seconds
          216:20231228:115659.045 resuming Zabbix agent checks on host "staffcop-pc": connection restored
          214:20231228:115949.470 Zabbix agent item "system.users.num" on host "staffcop-pc" failed: first network error, wait for 15 seconds
          216:20231228:120005.218 resuming Zabbix agent checks on host "staffcop-pc": connection restored
          214:20231228:120142.760 Zabbix agent item "system.cpu.load[all,avg15]" on host "staffcop-pc" failed: first network error, wait for 15 seconds
          216:20231228:120158.381 resuming Zabbix agent checks on host "staffcop-pc": connection restored
          214:20231228:120249.616 Zabbix agent item "system.users.num" on host "staffcop-pc" failed: first network error,
          ```
          That is all.

          AppArmor is installed neither on the agent nor on the Kubernetes server.
          kubeadm@serverlinux:~$ sudo ausearch -m avc -ts recent
          <no matches>


          • Markku
            Senior Member
            Zabbix Certified SpecialistZabbix Certified ProfessionalZabbix Certified Expert
            • Sep 2018
            • 1781

            #6
            nodePort: 30017
            Based on some searching, your Zabbix server service is now exposed to the outside world on port 30017, not on the default port 10051, so you need to configure the new port on the active agent(s).
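
            A minimal sketch of that agent-side change, assuming the agent reaches the Kubernetes node as `serverlinux` (the name already used in the posted agent config) and that the NodePort from the posted Service, 30017, is reachable from the agent. The helper name `set_server_active` is made up for illustration; it rewrites the `ServerActive` line in a given config file and keeps a `.bak` copy.

            ```shell
            #!/usr/bin/env bash
            # Hypothetical helper: point the agent's ServerActive at HOST:PORT.
            # Edits the file in place and keeps a .bak copy when it rewrites a line.
            set_server_active() {
                local conf=$1 host=$2 port=$3
                if grep -q '^ServerActive=' "$conf"; then
                    # Replace the existing uncommented ServerActive line.
                    sed -i.bak "s|^ServerActive=.*|ServerActive=${host}:${port}|" "$conf"
                else
                    # No active server configured yet: append one.
                    printf 'ServerActive=%s:%s\n' "$host" "$port" >> "$conf"
                fi
            }
            ```

            With the values from this thread that would be `set_server_active /etc/zabbix/zabbix_agentd.conf serverlinux 30017`, followed by a restart of the agent. The `Server=` line used for passive checks is left untouched.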

            Markku


            • GillezDeleuze
              Junior Member
              • Dec 2023
              • 4

              #7
              OK, Markku, I'll try that... Thanks
