Monitoring microservices

  • jplarson
    Junior Member
    • Nov 2018
    • 4

    #1

    Monitoring microservices

    My company's operations department is using Zabbix to monitor servers, which is where Zabbix appears to excel. I am a software developer and want to add instrumentation into our software to manage a host of things. Due to past experience, I was going to feed this into Prometheus, but the ops team would prefer I feed Zabbix. I can understand their request, so I'm trying to fill it.

    I want to instrument things like:

    -Request latency across various portions of the system
    -Transactions per second, again, across various portions of the system
    -A few other transactional types of metrics, such as the ebb and flow of various microservice instances

    These sorts of things are never tied to a particular machine. I can't just monitor dev001 and dev002. There's a web server, which is behind a load balancer, so it might really be five webservers spread across 5 hosts. There's a redis cluster and a postgresql instance. On each web server will be one or more instances of our core application. There may also be other hosts, and somewhere all our microservices are running. Those guys come up for a while, do some work, and go away. Except during failure conditions, they tend to stay up for 30 minutes to two days. Once they go away, they're generally gone for good, as they are tied to various events in our system. (And in this case, an event is a real world event with a configured start and stop time.)

    I want to be able to track this kind of data for alerting if things start to fail, but I also want to track usage over time. This will help ops in machine sizing. It will also help dev if someone says, "Suddenly after last week's release, the system is slower." I'd love to be able to go back to all these statistics and compare performance over arbitrary units of time.

    Zabbix appears to be very host-based, but as a programmer, I don't care about hosts. I care about services / microservices. I care about latency. Operations cares about CPU and memory usage and page swaps, but I'm not here to instrument any of that.

    So my question is: can Zabbix help with what I'm trying to do? In searching the forums, there are only two hits against microservices, and neither of them was terribly useful.

    If Zabbix can help, can anyone give me pointers on how I should organize my data when I'm not dealing with hosts but instead with services? As far as I'm concerned, this could all run on a single host (like it does in development mode) or on 45. I don't care. I care about the overall performance of my software and about identifying when I suddenly added a bottleneck that wasn't there last week.

    Maybe there's a good writeup on how to do this type of monitoring. Searching the docs for services told me about monitoring Windows services, which isn't remotely the same thing.

    Any pointers are appreciated.
  • Mike2K
    Member
    • Oct 2018
    • 62

    #2
    Is this a web service? If so, you should look at web monitoring...

    • jplarson
      Junior Member
      • Nov 2018
      • 4

      #3
      I'm using the term "microservices" in an exceedingly loose fashion. Yes, we have web services, but a whole lot of what I want to monitor does not have any sort of HTTP interface, and it is transitory.

      To offer a little more context: we have events (real world events), and during those events, we have a number of processes that run to service the event. (We refer to them as agents.) We spawn those up prior to the start of the event and shut them down afterwards. We might have 100 events going on at a time.

      We also have connections into our web services.

      I want to track all the metrics associated with this so we can watch for chokepoints creeping into the system as well as load on the individual hosts. The first part is to help the programmers figure out when something needs performance tuning. The second is so ops can determine when to add more host instances.

      Host monitoring can tell you when you are spiking CPU or I/O or swapping, but that doesn't tell you if your latency is changing. I don't care if I'm running at 90% CPU. I care whether latency is increasing.

      So... I'm going to bring up agents associated with an event. They're going to work for a while then go away, never to be seen again. But I want to track the work they're doing. I don't know if I can look at data for an individual event. It may be we'll store that in our local DB. But I do want to track aggregate data. X number of this transaction happening within this timeslice, with an average / peak latency of La and Lp. Do the same thing with each type of transaction, plus aggregate all transactions, plus somehow aggregate end-to-end, which is a series of transactions.
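
The per-timeslice aggregation described above (a count plus average and peak latency for each transaction type, and for all transactions together) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class names, the 60-second slice width, and the reserved "all" bucket are hypothetical and not part of Zabbix or Prometheus.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SliceStats:
    """Running totals for one transaction type within one timeslice."""
    count: int = 0
    total_latency: float = 0.0
    peak_latency: float = 0.0

    @property
    def average_latency(self) -> float:
        # Avoid dividing by zero for an empty bucket.
        return self.total_latency / self.count if self.count else 0.0


class TimesliceAggregator:
    """Groups latency samples into fixed-width timeslices (hypothetical helper)."""

    def __init__(self, slice_seconds: int = 60):
        self.slice_seconds = slice_seconds
        # Maps (slice_start, transaction_type) -> SliceStats.
        self.stats = defaultdict(SliceStats)

    def record(self, transaction_type: str, timestamp: float, latency: float) -> None:
        # Round the timestamp down to the start of its timeslice.
        slice_start = int(timestamp // self.slice_seconds) * self.slice_seconds
        # Update the per-type bucket and the "all" bucket (assumed reserved name).
        for key in (transaction_type, "all"):
            bucket = self.stats[(slice_start, key)]
            bucket.count += 1
            bucket.total_latency += latency
            bucket.peak_latency = max(bucket.peak_latency, latency)
```

Each agent would call `record()` as transactions complete and periodically ship the finished slices to whichever monitoring backend is chosen. Nothing here is tied to a host, so the same numbers come out whether the system runs on one machine or 45.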

      I don't know how much I can do with Zabbix. I've never used it, and I was on the edge of how we used Prometheus in my last position. (I was the boss, and it was one of my employees who figured all this out. I just looked at the pretty graphs and said, "Maybe we should figure out why that's ramping up over time".)

      Pointers?

      Is Zabbix the right tool? I'm willing to leave managing hosts to the ops team, and Zabbix seems well suited for that. I'm instrumenting software. Should I be looking at Prometheus instead, and we use them side-by-side?

      • kloczek
        Senior Member
        • Jun 2006
        • 1771

        #4
        With Zabbix you can monitor anything, as long as the metrics of the monitored objects can be represented as numbers or strings.
        With such metrics you could even monitor a bunch of dwarfs (position/coordinates -> numbers, names -> strings).
        http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
        https://kloczek.wordpress.com/
        zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
        My zabbix templates https://github.com/kloczek/zabbix-templates

        • ITOMDave
          Member
          • Nov 2018
          • 53

          #5
          Hi all, I was just looking through a few posts to see if I could help anywhere. I'm thinking that maybe an old technology may be able to help here.

          In the dim and distant past I've used the Application Response Measurement (ARM) API from the Open Group to do this kind of thing. As kloczek said, as long as the data is available, Zabbix can deal with it. The trick is getting the right data available at the right time in the right place. There's some info here (yes, I know it's very old and there are probably more modern solutions around, but in the interests of community spirit this may help): https://en.wikipedia.org/wiki/Applic...se_Measurement

          • andris
            Zabbix developer
            • Feb 2012
            • 228

            #6
            Zabbix 4.2 can collect data from Prometheus exporters: https://www.zabbix.com/documentation...pes/prometheus
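
For context, a Prometheus exporter is just an HTTP endpoint that serves metrics as plain text lines, so an application can expose them without committing to either tool. Below is a minimal sketch of rendering that text format using only the Python standard library. The function name and the metric and label names are hypothetical, and it covers only simple samples (no HELP/TYPE comments or histograms).

```python
def render_prometheus(metrics):
    """Render (name, labels, value) tuples as Prometheus text exposition lines."""
    lines = []
    for name, labels, value in metrics:
        if labels:
            # Escape backslashes, double quotes and newlines in label values.
            label_text = ",".join(
                '{}="{}"'.format(
                    key,
                    str(val)
                    .replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n"),
                )
                for key, val in sorted(labels.items())
            )
            lines.append("{}{{{}}} {}".format(name, label_text, value))
        else:
            lines.append("{} {}".format(name, value))
    # The format expects each sample on its own line, ending with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_prometheus([
        ("request_latency_seconds", {"service": "checkout"}, 0.25),
        ("transactions_total", {}, 42),
    ]), end="")
```

Serving this text over HTTP lets a Prometheus server scrape it, and per the linked documentation Zabbix 4.2 can read the same kind of output, so the instrumentation would not need to change if both tools end up running side by side.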

            • kedaly
              Junior Member
              • May 2019
              • 3

              #7
              So here's the problem: a lot of microservices run in orchestration layers such as Docker Swarm and Kubernetes pods. There is no way to poll them, so you'll need an active agent to push the data out to Zabbix for individual instances of services.

              So has anyone solved for this?
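
One possible approach, offered as a sketch rather than a confirmed solution: Zabbix trapper items accept pushed values, which is what the `zabbix_sender` utility uses. The code below builds the same kind of "sender data" packet by hand. It assumes trapper items with matching keys already exist on a Zabbix host, and that this host is a logical one named after the service rather than a machine. The host name `event-agents` and the item keys are hypothetical.

```python
import json
import socket
import struct


def build_sender_packet(host, values):
    """Build a Zabbix 'sender data' packet for the given host and key/value pairs."""
    payload = json.dumps({
        "request": "sender data",
        "data": [
            {"host": host, "key": key, "value": str(value)}
            for key, value in values.items()
        ],
    }).encode("utf-8")
    # Header: "ZBXD", flags byte 0x01, then the payload length as 8 bytes little-endian.
    return b"ZBXD\x01" + struct.pack("<Q", len(payload)) + payload


def send_packet(server, packet, port=10051, timeout=5.0):
    """Send the packet to a Zabbix server or proxy (10051 is the default trapper port)."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(packet)
        return sock.recv(4096)  # raw response, left unparsed in this sketch


if __name__ == "__main__":
    # Hypothetical logical host and item keys; nothing is sent here.
    packet = build_sender_packet("event-agents", {"txn.latency.avg": 0.25})
    print(len(packet))
```

Because the instance pushes to a fixed logical host, short-lived containers do not each need to be registered in Zabbix. The trade-off is that per-instance detail is lost unless it is encoded in the item keys.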

              • Newuser2020
                Junior Member
                • Sep 2020
                • 2

                #8
                Hello all, I am trying to feed Zabbix metrics to Prometheus. Is it possible? I do not see any documentation on the web regarding this integration. My boss does not want to use the Zabbix 4.2 Prometheus integration (https://blog.zabbix.com/zabbix-4-2-p...ation/7558/#10).
                Since both do monitoring, he wants to leverage Prometheus, as we can run queries there. Any help will be much appreciated.
