Monitoring/Alert on systemd mounts

  • jhboricua
    Senior Member
    • Dec 2021
    • 113

    #1

    Monitoring/Alert on systemd mounts

    This is a rather lengthy post but I'm trying to provide as much info as possible.

    We use systemd mounts on several of our Linux AWS instances, targeting either EFS or CIFS network shares. We have run into cases where a server fails to mount the network share for various reasons. The problem is that we currently have no way to get alerts when these network shares are down on the clients, so we often end up reacting only after a consumer brings it to our attention. We currently use the 'Systemd by Zabbix agent 2' template on our Linux hosts to monitor systemd services, so I set out to see how to leverage that template for systemd mounts. The template, in its current incarnation, discovers systemd service and socket units, so the first order of business was to discover mount units. I figured that one out fairly quickly by modeling the discovery after the service unit discovery, and I now have a discovery rule for mount units. The first item prototype I set up in this discovery targets the following key in order to get the raw data for each unit:
    Code:
    systemd.unit.get["{#UNIT.NAME}",Mount]
    Zabbix discovers the mount units and gets the following raw data (formatted for easy viewing):
    Code:
    {
      "AmbientCapabilities": 0,
      "AppArmorProfile": [false, ""],
      "BlockIOAccounting": false,
      "BlockIODeviceWeight": [],
      "BlockIOReadBandwidth": [],
      "BlockIOWeight": 18446744073709551615,
      "BlockIOWriteBandwidth": [],
      "CPUAccounting": false,
      "CPUAffinity": "",
      "CPUQuotaPerSecUSec": 18446744073709551615,
      "CPUSchedulingPolicy": 0,
      "CPUSchedulingPriority": 0,
      "CPUSchedulingResetOnFork": false,
      "CPUShares": 18446744073709551615,
      "Capabilities": "",
      "CapabilityBoundingSet": 18446744073709551615,
      "ControlGroup": "/system.slice/mnt-efs.mount",
      "ControlPID": 0,
      "Delegate": false,
      "DeviceAllow": [],
      "DevicePolicy": "auto",
      "DirectoryMode": 493,
      "Environment": [],
      "EnvironmentFiles": [],
      "ExecMount": [
        [
          "/bin/mount",
          [
            "/bin/mount",
            "<filesystemID>.efs.us-east-1.amazonaws.com:/",
            "/mnt/efs",
            "-t",
            "efs",
            "-o",
            "rw,user"
          ],
          false,
          1657913026954499,
          72937867165,
          1657913028376711,
          72939289376,
          29696,
          1,
          0
        ]
      ],
      "ExecRemount": [],
      "ExecUnmount": [],
      "Group": "",
      "IOScheduling": 0,
      "IgnoreSIGPIPE": true,
      "InaccessibleDirectories": [],
      "KillMode": "control-group",
      "KillSignal": 15,
      "LazyUnmount": false,
      "LimitAS": 18446744073709551615,
      "LimitCORE": 18446744073709551615,
      "LimitCPU": 18446744073709551615,
      "LimitDATA": 18446744073709551615,
      "LimitFSIZE": 18446744073709551615,
      "LimitLOCKS": 18446744073709551615,
      "LimitMEMLOCK": 65536,
      "LimitMSGQUEUE": 819200,
      "LimitNICE": 0,
      "LimitNOFILE": 4096,
      "LimitNPROC": 31448,
      "LimitRSS": 18446744073709551615,
      "LimitRTPRIO": 0,
      "LimitRTTIME": 18446744073709551615,
      "LimitSIGPENDING": 31448,
      "LimitSTACK": 18446744073709551615,
      "MemoryAccounting": false,
      "MemoryCurrent": 18446744073709551615,
      "MemoryLimit": 18446744073709551615,
      "MountFlags": 0,
      "Nice": 0,
      "NoNewPrivileges": false,
      "NonBlocking": false,
      "OOMScoreAdjust": 0,
      "Options": "rw,nosuid,nodev,noexec,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,noresvport,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.134.37.172,local_lock=none,addr=10.134.37.105,user",
      "PAMName": "",
      "PassEnvironment": [],
      "Personality": "",
      "PrivateDevices": false,
      "PrivateNetwork": false,
      "PrivateTmp": false,
      "ProtectHome": "no",
      "ProtectSystem": "no",
      "ReadOnlyDirectories": [],
      "ReadWriteDirectories": [],
      "RestrictAddressFamilies": [false, []],
      "Result": "success",
      "RootDirectory": "",
      "RuntimeDirectory": [],
      "RuntimeDirectoryMode": 493,
      "SELinuxContext": [false, ""],
      "SameProcessGroup": true,
      "SecureBits": 0,
      "SendSIGHUP": false,
      "SendSIGKILL": true,
      "Slice": "system.slice",
      "SloppyOptions": false,
      "SmackProcessLabel": [false, ""],
      "StandardError": "inherit",
      "StandardInput": "null",
      "StandardOutput": "journal",
      "StartupBlockIOWeight": 18446744073709551615,
      "StartupCPUShares": 18446744073709551615,
      "SupplementaryGroups": [],
      "SyslogIdentifier": "",
      "SyslogLevelPrefix": true,
      "SyslogPriority": 30,
      "SystemCallArchitectures": [],
      "SystemCallErrorNumber": 0,
      "SystemCallFilter": [false, []],
      "TTYPath": "",
      "TTYReset": false,
      "TTYVHangup": false,
      "TTYVTDisallocate": false,
      "TasksAccounting": false,
      "TasksCurrent": 18446744073709551615,
      "TasksMax": 18446744073709551615,
      "TimeoutUSec": 90000000,
      "TimerSlackNSec": 50000,
      "Type": "nfs4",
      "UMask": 18,
      "User": "",
      "UtmpIdentifier": "",
      "What": "<filesystemID>.efs.us-east-1.amazonaws.com:/",
      "Where": "/mnt/efs",
      "WorkingDirectory": ""
    }
    From there I created a dependent item prototype to grab the mount point from 'Where' in the JSON output above. That value is correctly discovered and populated in the Zabbix client, in the case above as /mnt/efs.

    The thing I'm struggling with right now is the trigger to alert on. Unlike the systemd service units, the raw output returned for the systemd mount units doesn't include the active state of the unit, which would tell me whether the unit is running. I was hoping I could run a stat command such as:
    Code:
    stat -f --format="%T" <path from mountpoint value in discovered dependent item>
    and trigger an alert if the value returned is anything but cifs or nfs. However, even after extensive searching, I'm having a hard time understanding how to accomplish this in Zabbix, as the command needs to run on the monitored instance. Hence my post.
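A minimal sketch of that check run by hand on the monitored host, assuming GNU coreutils `stat`. The function names `fs_type`, `is_network_fs` and `check_path` are illustrative, and the list of type names is an assumption about what `stat -f` reports for NFS and CIFS/SMB mounts on a given kernel.

```shell
#!/bin/sh
# Print the filesystem type name that stat reports for a path.
fs_type() {
    stat -f --format='%T' "$1"
}

# Succeed (exit 0) when a type name looks like a network filesystem.
# The patterns are an assumption; adjust them to what your hosts report.
is_network_fs() {
    case "$1" in
        nfs*|cifs*|smb*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print 1 when the path is backed by a network share, else 0.
# A path that cannot be stat'ed yields an empty type and therefore 0.
check_path() {
    if is_network_fs "$(fs_type "$1" 2>/dev/null)"; then
        echo 1
    else
        echo 0
    fi
}
```

For example, `check_path /mnt/efs` would be expected to print 1 while the share is mounted and 0 when the directory is just an empty folder on the local disk.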
    Last edited by jhboricua; 04-08-2022, 21:18.
  • jhboricua
    Senior Member
    • Dec 2021
    • 113

    #2
    I managed to come up with a solution for our environment and want to share it here in case someone finds themselves in the same position. At work we currently run the following for Zabbix:
    • Zabbix 5.0 server
    • Zabbix Agent2 on monitored Windows/Linux systems.
    • Systemd by Zabbix Agent 2 Template for monitoring Systemd Services
    I added two discoveries to the Systemd by Zabbix Agent 2 template. One for systemd mounts and another for systemd automounts.

    [Image: image.png]

    Here's the systemd mount discovery; the automount one is the same except that the key is systemd.unit.discovery[automount]:




    I also added appropriately named macros for these in the template and targeted the names of the mount and automount units I want discovered (mnt-.*.mount and mnt-.*.automount). Depending on the Linux server you run, the system creates other systemd mount units by default that you might not want to monitor. By starting the filter with mnt-, I'm targeting only the systemd mounts created by us. The discovery is also set to discover a mount or automount unit only if it is enabled. So if you only use mount units (without automount), those will be discovered. If you use automount units in conjunction with mount units, only the automount units are discovered, because the mounts are not enabled themselves; they are started by the automount unit instead.
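To see which unit names a filter such as `mnt-.*.mount` accepts, the pattern can be tried against sample names with `grep -E`. This is only an approximation: Zabbix evaluates filters as PCRE, and the function names and unit names here are made up, but for patterns this simple the two engines behave the same. Because the pattern is unanchored and its last `.` is unescaped, it also matches `mnt-efs.automount`. That is harmless when the discovery key already returns only mount units, and the anchored form `^mnt-.*\.mount$` is a stricter alternative.

```shell
#!/bin/sh
# Succeed when the unit name ($2) matches the regular expression ($1).
unit_matches() {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

# Print each sample unit name with the verdict for the given filter.
show_matches() {
    filter=$1
    shift
    for unit in "$@"; do
        if unit_matches "$filter" "$unit"; then
            echo "$unit: discovered"
        else
            echo "$unit: filtered out"
        fi
    done
}
```

A call such as `show_matches '^mnt-.*\.mount$' mnt-efs.mount home.mount mnt-efs.automount` lists the verdict for each name.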

    [Image: image.png]

    I then created the following item prototypes in the systemd mount discovery. I did the same for the automount discovery, targeting the appropriate key. The only difference is that the automount unit doesn't have a target property, so it has only three items versus the four you see below for the mount units.

    [Image: image.png]

    I use the get unit info item prototype to gather some information to display in Zabbix via the dependent items, such as the path where the network share is mounted. On the mount unit I also gather what is being mounted, meaning the Windows share UNC path or the NFS/EFS target. I do this purely for informational purposes.

    The item that I use to monitor the actual mount is the Mount state item. Here I'm targeting the key 'systemd.automount.active_state' or 'systemd.mount.active_state'. That key is populated by two UserParameters I set up on my Linux hosts. These are:

    Code:
    UserParameter=systemd.mount.active_state[*],awk -F"=" '/Where.*/ {print $$2}' < "/etc/systemd/system/$1" | xargs -I{} stat -f --format="%T" {} | grep -c 'cifs\|nfs\|smb'
    UserParameter=systemd.automount.active_state[*],awk -F"=" '/Where.*/ {print $$2}' < "/etc/systemd/system/$1" | xargs -I{} stat -f --format="%T" {} | grep -c 'cifs\|nfs\|smb'
    The item prototype passes the discovered systemd mount or automount unit name as the $1 parameter. The awk command extracts the value of the 'Where' line in the systemd unit file for the mount/automount. This is the path where the share is mounted. It then pipes that path to the stat command via xargs and uses grep to check whether the filesystem type returned is cifs, smb2/smb3 or nfs. If it is, the output of the command is 1. If not, it's 0. The intent here is to stat the path where the network share gets mounted. If the value returned is, say, ext4 or xfs or any filesystem other than CIFS/SMB/NFS, then we assume that the share is not mounted.
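The awk step can be tried on its own. The sketch below is a slightly stricter variant, not the exact command above: it matches only lines that begin with `Where=`, since the pattern `/Where.*/` would also match any other line that happens to contain the word. The function name `unit_where` is illustrative. Outside a UserParameter line the awk field is written `$2`; the doubled `$$2` is only needed in the agent configuration, where `$1`, `$2` and so on refer to the key parameters.

```shell
#!/bin/sh
# Print the mount point declared by the Where= line of a systemd unit file.
# Stops at the first match so a single path is returned.
unit_where() {
    awk -F'=' '/^Where=/ { print $2; exit }' "$1"
}
```

For a unit file containing `Where=/mnt/efs`, `unit_where /etc/systemd/system/mnt-efs.mount` prints `/mnt/efs`, which can then be handed to stat as in the UserParameter lines.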

    Now on to the trigger. In our case, we want to be alerted if the value of the active_state item is not 1 for 5 straight minutes.
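As a rough sketch only, a trigger prototype with that meaning could be written in Zabbix 5.0 expression syntax as below. The template name is a placeholder, and both the choice of `max(5m)` and the quoting of the key parameter are assumptions, not necessarily the expression used in the actual template. It relies on the item only ever returning 0 or 1, so a five-minute maximum of 0 means no check in that window saw the share mounted.

```
{<template name>:systemd.mount.active_state["{#UNIT.NAME}"].max(5m)}=0
```

The automount discovery would use the same form with the `systemd.automount.active_state` key.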

    [Image: image.png]

    We used Chef to deploy the two UserParameter keys to all our Linux systems as an override file in the /etc/zabbix/zabbix_agent2.d folder. So far everything is working great and I learned a great deal in the process. But I wanted to share this for anyone who might be trying to accomplish the same.
    Last edited by jhboricua; 20-08-2022, 18:49.


    • jhboricua
      Senior Member
      • Dec 2021
      • 113

      #3
      I'm attaching the modified template. It is for the 5.0 release of Zabbix but can be easily used as a reference for modifying the template for newer releases. The original can be found at: https://git.zabbix.com/projects/ZBX/...at=release/5.0


      • jhboricua
        Senior Member
        • Dec 2021
        • 113

        #4
        And here is what it looks like on a host after discovery:

        [Image: image.png]


        • jhboricua
          Senior Member
          • Dec 2021
          • 113

          #5
          Mods, feel free to move this post to the Zabbix Cookbook subforum if you feel it is more appropriate for it to be there.


          • jankbroke
            Junior Member
            • Jan 2024
            • 1

            #6
            I just created an account here so I could say thank you for posting this, this is exactly what I have been looking for.


            • jhboricua
              Senior Member
              • Dec 2021
              • 113

              #7
              Originally posted by jankbroke
              I just created an account here so I could say thank you for posting this, this is exactly what I have been looking for.
              Glad to hear you found it useful. I have an improved version of the template after learning more about discoveries and applying some of the stuff I learned from the Zabbix training courses. It no longer requires User Parameters to get the state and type of the mounts. Enjoy.
