Node without metrics collection - Intermittent Problem

  • jmanuelrch
    Junior Member
    • Jan 2025
    • 5

    #1

    Hi Zabbix community,

    I’m experiencing a recurring issue with approximately 30 nodes in my infrastructure, which trigger the alert "Nodo sin Recoleccion de Metricas." This problem occurs randomly and frequently on devices monitored via SNMP and Zabbix Agent.

    Here are the main findings and observed behaviors:
    1. Alert activation condition:
      • The alert is triggered when critical metrics such as CPU or virtual memory utilization are not received for a period of 10 minutes.
      • Although these metrics are polled every 5 minutes, most of the resulting problem events last only a few seconds.
    2. Behavior on SNMP-monitored devices:
      • During the downtime, not only are the critical metrics unavailable, but other metrics, such as network connection data across various interfaces, also fail to collect.
      • However, the SNMP availability metric remains stable and does not show any drops, ruling out SNMP connectivity issues.
    3. Behavior on Zabbix Agent-monitored devices:
      • A similar pattern is observed: multiple metrics fail to collect for short periods, but the nodes themselves do not appear to lose connectivity.
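
    To make the alert condition in point 1 concrete, here is a minimal sketch of a "no data for 10 minutes" check against a 5-minute polling schedule. It is a simplified model, not Zabbix's actual trigger evaluation: the function name `nodata_fires`, the 30-second evaluation interval, and the example timeline are all assumptions made for illustration. It suggests that a single missed or late poll may be enough to open a problem that clears a few seconds later.

```python
from typing import List

def nodata_fires(value_times: List[int], window: int, check_times: List[int]) -> List[int]:
    """Return the check times at which no value arrived within the last `window` seconds.

    Simplified model (assumption): the condition holds when the most recent
    value is at least `window` seconds old at the moment of the check.
    """
    fired = []
    for now in check_times:
        seen = [t for t in value_times if t <= now]
        if not seen or now - max(seen) >= window:
            fired.append(now)
    return fired

WINDOW = 600  # 10 minutes, from the alert condition in the post

# Hypothetical timeline in seconds: the poll due at 600 is missed and the
# next value arrives 5 seconds late, at 905.
received = [0, 300, 905, 1200]
checks = list(range(600, 1201, 30))  # evaluate every 30 s (assumed)

print(nodata_fires(received, WINDOW, checks))
```

    In this model the condition is true only at the check at 900 seconds and is false again at the next check, because the value at 905 has arrived by then. If that matches the real behaviour, widening the window beyond two polling intervals would be one thing to test.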

    I’ve performed a preliminary analysis but haven’t been able to pinpoint the root cause. I will attach screenshots illustrating:
    • A list of occurrences of the same problem, with their durations. (duration-zabbix-1.png)
    • Examples of missing metrics. (missing-metrics-zabbix-2.png | missing-metrics-zabbix-3.png)
    • Comparisons with the SNMP availability metric. (availability-snmp-zabbix-4.png)

    I would greatly appreciate your help in identifying possible causes, additional diagnostic steps, or configuration adjustments I could make to resolve this issue.

    Thank you in advance for your support.

    Best regards,
    Manuel Rodriguez
  • Blevar
    Member
    • Jan 2025
    • 68

    #2
    Does this issue occur if you monitor fewer devices?
    Maybe try increasing the number of agent and SNMP pollers. Look for
    Code:
    # StartAgentPollers=
    and
    Code:
    # StartSNMPPollers=
    in
    Code:
    /etc/zabbix/zabbix_server.conf
    <- uncomment these and try fiddling with the values.
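
    Before changing anything, it may help to see which of the two settings are currently active. The sketch below is one way to check: it treats the file as simple `Name=value` lines with `#` comments, and the helper name `poller_settings` and the sample value `10` are made up for illustration. Whether both parameters are available depends on the Zabbix server version, so a missing entry is not necessarily a mistake.

```python
import re
from typing import Dict, Optional

# Parameter names taken from the reply above.
PARAMS = ("StartAgentPollers", "StartSNMPPollers")

def poller_settings(conf_text: str) -> Dict[str, Optional[str]]:
    """Map each parameter to its active value, or None if unset or commented out."""
    found: Dict[str, Optional[str]] = {name: None for name in PARAMS}
    for line in conf_text.splitlines():
        # Lines starting with '#' do not match, so commented-out entries stay None.
        match = re.match(r"\s*(\w+)\s*=\s*(.*?)\s*$", line)
        if match and match.group(1) in found:
            found[match.group(1)] = match.group(2)
    return found

# Hypothetical file contents; the value 10 is only an example.
sample = """\
# StartAgentPollers=
StartSNMPPollers=10
"""

print(poller_settings(sample))
```

    On the server itself, the same function could be called on the text of /etc/zabbix/zabbix_server.conf, for example `poller_settings(open("/etc/zabbix/zabbix_server.conf").read())`.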
