Monitoring a service in Linux

  • NewUser1
    Member
    • Oct 2022
    • 36

    #1

    Monitoring a service in Linux

    Hello

    I am trying to monitor the DHCP service running on two of my Linux servers. I created a new item using the key format "proc.num[<name>,<user>,<state>,<cmdline>,<zone>]". After that, I tried creating a trigger, and that is where I ran into a problem.

    Here is the expression for the trigger
    Code:
    func(/Linux by Zabbix agent/proc.num[isc-dhcp-server,,run,,])=0
    Error:
    • Invalid parameter "/1/expression": unknown function "func".
    When I changed the function to min and added ",15m":
    Code:
    min(/Linux by Zabbix agent/proc.num[isc-dhcp-server,,run,,],15m)=0
    Error:
    • Incorrect item key "proc.num[isc-dhcp-server,,run,,]" provided for trigger expression on "Linux by Zabbix agent".


    My goal is to be alerted when the DHCP service is not running.
  • Markku
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
    • Sep 2018
    • 1781

    #2
    "Incorrect item key" --> do you have an item configured with that key? (You should have)

    Markku

      • NewUser1

      #3
      I have an item created with key "proc.num["isc-dhcp-server",,run]"

      I click to create a trigger where the 3 dots are.
      Last edited by NewUser1; 30-12-2022, 22:44.

        • Markku

        #4
        proc.num["isc-dhcp-server",,run]
        vs.
        proc.num[isc-dhcp-server,,run,,]

        They are not the same, hence your "Incorrect item key" error.
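To make the difference between the two keys visible, here is a minimal Python sketch that splits each key into its name and raw parameters. The `split_key` helper is hypothetical and deliberately simplified (no nested arrays or escaped quotes). It does not reproduce how Zabbix parses keys internally; it only shows that the two strings are written differently.

```python
def split_key(key: str) -> tuple[str, list[str]]:
    """Split an item key into its name and a list of raw parameters.

    Simplified on purpose: handles double-quoted parameters only.
    """
    name, _, rest = key.partition("[")
    if not rest:
        return name, []
    body = rest[:-1] if rest.endswith("]") else rest
    params, current, in_quotes = [], "", False
    for ch in body:
        if ch == '"':
            in_quotes = not in_quotes
            current += ch
        elif ch == "," and not in_quotes:
            params.append(current)
            current = ""
        else:
            current += ch
    params.append(current)
    return name, params


item_key = 'proc.num["isc-dhcp-server",,run]'     # key configured on the item
trigger_key = 'proc.num[isc-dhcp-server,,run,,]'  # key typed into the trigger

print(split_key(item_key))
print(split_key(trigger_key))
print(item_key == trigger_key)
```

The simplest way to avoid the mismatch is to copy the key from the item configuration into the trigger expression exactly as it is written there.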

        Markku

          • NewUser1

          #5
          Okay, I fixed it. Here is the change I made:

          min(/server/proc.num["isc-dhcp-server",,run],5m) <= 0

          Does this mean that if the service is not running for more than 5 mins I will get an alert for it?
          Last edited by Alex.S; 04-01-2023, 09:21.

            • Markku

            #6
            Let's test it: imagine the last values are (in one-minute intervals):

            1
            1
            0
            1
            1

            Your min() expression will result in 0 (the minimum of those five values), which you compare to 0. --> Your statement is true (0 <= 0), so a problem is raised. That's not what you wanted, right?

            Using max() would be better. With the same five values it results in 1 (the maximum of those five values) --> 1 <= 0 is not true, so there is no problem yet.

            If all five values are 0 (0, 0, 0, 0, 0), then their maximum is 0 --> your statement is true: 0 <= 0 --> a problem is raised, just like you wanted.

            To simplify it a bit, there is no need to check for negative values (the number of processes cannot be negative, can it?):

            max(/server/proc.num["isc-dhcp-server",,run],5m) = 0
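The same reasoning can be checked with a few lines of Python. This is only a sketch of the comparison logic, assuming the 5m window holds exactly the five sample values above. `trigger_fires` is a made-up helper, not a Zabbix function.

```python
def trigger_fires(values, func):
    """Return True when func(values) == 0, mirroring `func(/host/key,5m)=0`."""
    return func(values) == 0


one_blip = [1, 1, 0, 1, 1]  # a single missed check inside the window
all_down = [0, 0, 0, 0, 0]  # the process was absent for the whole window

for label, window in (("one blip", one_blip), ("all down", all_down)):
    print(label,
          "| min fires:", trigger_fires(window, min),
          "| max fires:", trigger_fires(window, max))
```

With min() a single zero is enough to raise the problem, while max() raises it only when every value in the window is zero.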

            Markku
            Last edited by Alex.S; 04-01-2023, 09:21.

              • NewUser1

              #7
              This is giving a false positive: the service is on, but we are getting an alert that it is off.
              Last edited by NewUser1; 03-01-2023, 23:05.

                • Markku

                #8
                At the time that is happening, what does Latest data show for the last 5 minutes?

                Hmm, I don't think you should have "run" there at all, it requires that the process is actively executing during the checks, and as a background process that's not really happening, right?
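A rough Python sketch of why a state filter can return 0 for a process that exists. The snapshot, the process name "mydaemon", and the mapping from state names to ps-style letters are all assumptions made for illustration, not taken from the Zabbix agent source.

```python
# Assumed mapping from proc.num state names to ps-style state letters.
STATE_LETTERS = {"run": "R", "sleep": "S", "disk": "D", "zomb": "Z", "trace": "T"}


def count_processes(table, name, state="all"):
    """Count rows in table whose name matches and whose state passes the filter."""
    count = 0
    for proc_name, letter in table:
        if proc_name != name:
            continue
        if state == "all" or STATE_LETTERS[state] == letter:
            count += 1
    return count


# Made-up snapshot: a daemon waiting for requests is typically sleeping.
snapshot = [("mydaemon", "S"), ("sshd", "S"), ("ps", "R")]

print(count_processes(snapshot, "mydaemon", "run"))  # filter on "run" only
print(count_processes(snapshot, "mydaemon"))         # no state filter
```

In this made-up snapshot the daemon is counted only when the state filter is left out, which is the situation described above.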

                Markku

                • NewUser1 commented
                  Latest data says 0 for the past 5 minutes. I don't know.
                  All I want is a problem to show up when the service is off, and then I want the problem to resolve when the service is back up. This is on a Linux machine. So far it hangs on Resolving. It seems like "run" and "trace" don't have values attached to them that I can use for the comparison at the end of the expression.
              • Markku

                #9
                Remove "run" and the commas before it and report back how it goes. It will then count the number of existing processes.

                Markku

                  • NewUser1

                  #10
                  I removed it, and now the alert has triggered again, saying the service is not running, but that is not true. I can see the service running. Here is the current trigger:

                  max(/server/proc.num["isc-dhcp-server"],5m) = 0

                    • Markku

                    #11
                    Are you sure you even have a process called "isc-dhcp-server"?

                    Use the ps aux command to see the list of processes.
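As an illustration of the check, here is a small Python sketch that pulls the executable names out of ps-style output and counts exact matches. The sample output is invented and trimmed to three columns, and "example-service" and "exampled" are placeholder names for a service unit and the process it starts. Real ps aux output has more columns and different names.

```python
import os

# Invented, trimmed sample; not real output from any server.
SAMPLE_PS = """\
USER   PID COMMAND
root     1 /sbin/init
root   812 /usr/sbin/sshd -D
daemon 945 /usr/sbin/exampled -f /etc/example.conf
"""


def command_names(ps_output):
    """Return the executable base name from each row of the sample output."""
    names = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        executable = fields[2].split()[0]
        names.append(os.path.basename(executable))
    return names


names = command_names(SAMPLE_PS)
print(names)
print(names.count("example-service"))  # the service name is not a process name
print(names.count("exampled"))         # the process the service starts
```

If the name used in the item does not appear in the process list, a count of 0 is expected even while the service is up.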

                    Markku

                      • NewUser1

                      #12
                      Well, I figured it out; your question made me go down the rabbit hole. I was monitoring the "service", not the "process". I went ahead and used this expression:

                      min(/server/proc.num["dhcp"],5m) = 0
