Future of Zabbix - service-oriented items
  • otheus
    Member
    • Mar 2009
    • 53

    #1

    Future of Zabbix - service-oriented items

    In the 3.0 Release notes, I saw an interesting item (paraphrasing):

    zabbix_agent support will be dropped because apparently no one was using inetd.
    I find this ironic given that zabbix_agent was the only solution to a peculiar problem that has evolved within the last two years, and is coming down the collective pipes of sysadmins everywhere: the Docker phenomenon.

    To be more succinct, we are seeing an increase in the demand to host and monitor services (or service sets) which are completely unattached to their physical -- or even virtual -- servers. This presents a serious challenge to the Zabbix monitoring design -- and to most other monitoring service designs.

    I ran into this a few years ago while managing a RedHat 5 cluster engine cluster. The services in particular were NFS mounts which would dynamically migrate from one host to another. Each service was tied to a service IP, and multiple services could be on the same host. There was no sensible way to monitor these services as tied to the physical hosts, so I had to create Zabbix "hosts" which were at the service IP and contained only the items relevant for the service being monitored. But this led to another problem: the zabbix agent was not too keen on being used to monitor IP addresses it didn't know about (maybe that was an old problem or my memory is faulty here). Certainly, active checks on such a service could not work.

    With Docker, the situation is much worse: miniature "containers" hold the service that is running, and said service is typically behind a NAT'd, dynamically-created IP address. The typical Docker setup is a single process or process tree and does not include services such as sshd, init, or inetd. However, the typical Docker setup is not very usable in production, monitored environments. In such environments, the container needs to hold these other things. But you're still left with a dynamic IP, limiting the utility of running a zabbix agent daemon. By contrast, inetd would make more sense here. (It's also the mechanism of choice for the check-mk monitoring system.)

    This monitoring complexity is exacerbated by things like Kubernetes, Swarm, and other container-clustering technologies. Services will not only have random IPs, but random ports! A special discovery agent figures out where these services are running and is queried to redirect traffic accordingly. (The right thing to do would be to finally properly extend DNS to announce services, like RPC did aeons ago. But I digress.) Any decent monitoring system will need to adapt to this scenario, especially w.r.t. automatic service discovery.

    I hope Zabbix can remain at the forefront and adapt to this new tech gracefully and intelligently.
  • otheus
    Member
    • Mar 2009
    • 53

    #2
    Modest proposal

    OK, so from the above, what I think Zabbix needs is a new architecture (a rough sketch follows the list below). This new architecture introduces the concept of Services that are akin to Hosts.

    * An item may belong to either a Service or a Host.

    * A Service is not permanently associated with a Host. It may be dynamically associated with 0 or more hosts.

    * If associated with 0 hosts, the Service is said to be down, offline, or unavailable.

    * Each Service, similar to hosts, may have one or more items. Such items would logically be related to the service, such as (for an HTTP service) the number of worker threads, the response time of a URL fetch operation, and log entries. The items should not know much about the host it's running on, but may include things such as IP address, MAC address, hostname; but not things like OS load or CPU utilization -- these latter are the domain of the Host items.

    * If a Service migrates from one Host to another, its metadata will be updated accordingly.

    * For Service-autodiscovery, Zabbix will need to rely on external entities such as container registries.

    * A special zabbix-agent may be needed to deal with such service-oriented monitoring. It might, for instance, detect when a service has moved onto its host and notify the Zbx server.
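
    To make the shape of this more concrete, here is a rough, purely illustrative sketch of what such a Service object might carry alongside today's Host object. None of these fields or item keys exist in Zabbix; they only show the idea.

    Code:
    # Purely illustrative sketch of the proposed "Service" entity -- nothing
    # here is an existing Zabbix object or item key.
    service = {
        "name": "webshop-frontend",
        "items": [                          # service-scoped items only
            "http.worker.threads",
            "http.response.time[/healthz]",
            "log.count[error]",
        ],
        "hosts": [],                        # 0 hosts => service is down/offline
        "metadata": {                       # updated when the service migrates
            "ip": "10.0.3.17",
            "hostname": "node-b.example.com",
        },
        "discovery": "container-registry",  # external source of truth
    }

    # A service bound to no host is considered unavailable, per the proposal.
    available = len(service["hosts"]) > 0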


    • jan.garaj
      Senior Member
      Zabbix Certified Specialist
      • Jan 2010
      • 506

      #3
      Hard-coded service orientation can reach its limits in the future. I prefer metric/item tagging. For example:

      Code:
      - response.time [host=elasticsearch]
      - response.time [service=elasticsearch]
      - response.time [container=elasticsearch]
      - response.time [host=elasticsearch,service=elasticsearch,container=elasticsearch]
      I don't understand how the inetd zabbix agent version can be used for Docker monitoring. Can you explain it, please?
      Devops Monitoring Expert advice: Dockerize/automate/monitor all the things.
      My DevOps stack: Docker / Kubernetes / Mesos / ECS / Terraform / Elasticsearch / Zabbix / Grafana / Puppet / Ansible / Vagrant


      • otheus
        Member
        • Mar 2009
        • 53

        #4
        Originally posted by jan.garaj
        Hard-coded service orientation can reach its limits in the future. I prefer metric/item tagging. For example:
        I don't understand the example you provide at all. Response.time? What does that mean? What are these tags? Is this an example of how Zabbix might be configured in the future?

        Originally posted by jan.garaj
        I don't understand how the inetd zabbix agent version can be used for Docker monitoring. Can you explain it, please?
        It's an ugly hack. The idea isn't specific to Docker.

        Context: let's say that you have a cluster of (possibly virtual) hosts (as opposed to containers), each with its own fixed IP and each with a zabbix daemon running. Additionally, you'd have one or more service IPs that float within the cluster depending on where the clustering service places each service. Now, we want to monitor such a service, but it's pointless to monitor it via the (for example) 4 fixed-IP hosts; it should be monitored on the one host that currently holds the floating IP address. So we configure Zabbix to have a "host" identified by its service IP and which holds all items related to that particular service. (URL monitoring is an example of such an item.)

        We can do this with the standard daemon agent, provided we configure it to listen on 0.0.0.0. But what if, for some reason (security, isolation, etc.), we don't want it to listen on 0.0.0.0 but on the fixed IP address of the server? In general that's not a problem, but it is a problem for the floating service IP -- now we need to configure a daemon to listen on an address that does not exist most of the time, because it's on another server. The work-around is to use inetd/xinetd and to configure its filtering rules to hand a request arriving on the given service IP off to the zabbix-agent.
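
        Roughly, the inetd-style exchange is just "key in, value out" per connection. A toy stand-in for the old inetd-mode zabbix_agent (ignoring the real protocol framing, with made-up item keys, purely to show the shape) could look like this:

        Code:
        #!/usr/bin/env python3
        # Toy stand-in for the inetd-mode zabbix_agent: xinetd, bound to the
        # floating service IP, runs this once per connection, so the responder
        # only reads one item key from stdin and prints one value.
        import sys
        import time

        def handle(key):
            if key == "service.ping":       # made-up key, for illustration
                return "1"
            if key == "service.uptime":     # made-up key, for illustration
                return str(int(time.monotonic()))
            return "ZBX_NOTSUPPORTED"

        if __name__ == "__main__":
            key = sys.stdin.readline().strip()
            sys.stdout.write(handle(key) + "\n")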

        With Docker, this may become an obvious solution: instead of building up a container that runs init and which includes a dedicated zabbix monitoring service (because, you know, Docker claims this is not the Docker way of doing things, and because, you know, systemd doesn't like to run inside a container, and because, you know, it's increasingly difficult to install services outside of systemd), the monitoring service can be handled nicely by xinetd. Every time the Zbx server makes a request to the agent via the service IP, xinetd steps in and launches the zabbix agent inside the container.

        Caveat: I haven't actually tried this. I'm not even sure how the various Docker clustering wares really handle service IPs. This particular coalescence of problem and solution came about as I was playing around with Docker, trying to figure out its suitability for running certain applications within our datacenter. One of the models was to use LVS and keepalived to assign a service IP to one of several servers, each running a Docker container of the service, with the additional question of how to run the monitoring service within the container as well. The problem is that the containers themselves cannot assign (or even know about) the service IP: that must be handled by the containing OS.


        • kloczek
          Senior Member
          • Jun 2006
          • 1771

          #5
          Originally posted by otheus
          OK, so from the above, what I think Zabbix needs is a new architecture. This new architecture introduces the concept of Services that are akin to Hosts.

          * An item may belong to either a Service or a Host.

          * A Service is not permanently associated with a Host. It may be dynamically associated with 0 or more hosts.

          * If associated with 0 hosts, the Service is said to be down, offline, or unavailable.

          * Each Service, similar to hosts, may have one or more items. Such items would logically be related to the service, such as (for an HTTP service) the number of worker threads, the response time of a URL fetch operation, and log entries. The items should not know much about the host it's running on, but may include things such as IP address, MAC address, hostname; but not things like OS load or CPU utilization -- these latter are the domain of the Host items.

          * If a Service migrates from one Host to another, its metadata will be updated accordingly.

          * For Service-autodiscovery, Zabbix will need to rely on external entities such as container registries.

          * A special zabbix-agent may be needed to deal with such service-oriented monitoring. It might, for instance, detect when a service has moved onto its host and notify the Zbx server.
          Nihil novi sub sole (Latin: nothing new under the Sun)

          Your scenario is the typical scenario of monitoring a multi-node cluster with N>2.
          What you should do is just set up a dummy host and put the metrics for monitoring your service on that host.

          I'm really surprised how many people think that things like containerisation, job processing or async processing are something discovered only in the last few years
          No .. most of those things have been around for more years than some admins have been on this Planet ..
          Instead of reinventing the wheel, more people should try asking older SAs/SEs how to deal with such dilemmas
          http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
          https://kloczek.wordpress.com/
          zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
          My zabbix templates https://github.com/kloczek/zabbix-templates


          • Alexei
            Founder, CEO
            Zabbix Certified Trainer
            Zabbix Certified Specialist
            Zabbix Certified Professional
            • Sep 2004
            • 5654

            #6
            I think that the notion of 'host' (computational unit, container, whatever) will always be present. Zabbix architecture is flexible enough to adapt to different use cases; I wouldn't expect any major paradigm shift in this space.

            Zabbix 3.2 is introducing problem (event) tags and an event correlation module that will eventually bring a top-level view of problems and services, along with a much more flexible way of managing actions and top-level dependencies.

            I believe we still miss a good and well-understood way of defining applications (services); it will come in the future.
            Alexei Vladishev
            Creator of Zabbix, Product manager
            New York | Tokyo | Riga
            My Twitter


            • kloczek
              Senior Member
              • Jun 2006
              • 1771

              #7
              Alex, as author of Zabbix you know that everything is hooked on the definition of some kind of new keys/monitoring.

              A perfect example here is the web check (which may be similar to a service). Each web check adds more than one item with a metrics definition. To be honest, at the moment I don't see how to define such a thing as a high-level service in a similar way as it is done with web checks.
              The only thing which I see could be defined is a generic system.service[<service>] key which, depending on the OS, may deliver the status of the <service>.
              AFAIK such a key is at the moment provided on Windows, and I see some possibilities of extending the definition of this key to other OSes or distros.
              However, even across different flavours of the Unices it is possible to add some more sophisticated variations; on Solaris, for example, it is possible to add sampling of start/stop service timeouts or the number of restarts made automatically. In the case of SMF (Service Management Facility) on Solaris it is possible to do this over trapper items hooked into some core SMF infrastructure.
              The theoretically similar systemd on Linux is very immature compared to SMF, even from the point of view of the status of services or instances of services.
              http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
              https://kloczek.wordpress.com/
              zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
              My zabbix templates https://github.com/kloczek/zabbix-templates


              • Alexei
                Founder, CEO
                Zabbix Certified Trainer
                Zabbix Certified Specialist
                Zabbix Certified Professional
                • Sep 2004
                • 5654

                #8
                I don't think services should be hard-linked to underlying resources as implemented for WEB checks.

                The way I see it is to have low-level resources somehow loosely connected to business-level services. It can be achieved by engaging tagging (coming in 3.2) that would allow a more flexible relationship between IT Services and events. Well, let's see where it goes.
                Alexei Vladishev
                Creator of Zabbix, Product manager
                New York | Tokyo | Riga
                My Twitter


                • kloczek
                  Senior Member
                  • Jun 2006
                  • 1771

                  #9
                  Originally posted by Alexei
                  I don't think services should be hard-linked to underlying resources as implemented for WEB checks.
                  I was only mentioning that, the same as with web checks, some other classes of resources may in the future have some special support to monitor/present state, and IMO one of the candidates may be high-level services monitoring, to provide a kind of abstraction.

                  Example: on Linux, let's say we have a service like zabbix-agent, and it would be good to know whether this service is in a state that guarantees it will be started automatically after a reboot.

                  On RHEL6 you can check this using "/sbin/chkconfig --list zabbix-agent | cut -f5", checking whether you get "3:on". On RHEL7 you can check whether someone enabled this service using systemd commands. On other types of distributions such a check can be done using another method. IMO a key like system.service[] could be used to hide such details, making templates more portable over time and/or when moving between distributions.
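
                  As a rough sketch of what could sit behind such a key (or behind a UserParameter today) -- purely illustrative, the fallback logic and the key name are assumptions, not an existing Zabbix item:

                  Code:
                  #!/usr/bin/env python3
                  # Rough sketch of a cross-distro "enabled at boot?" check that a
                  # hypothetical system.service[<service>,enabled] key could wrap.
                  import subprocess
                  import sys

                  def service_enabled(name):
                      # Prefer systemd (RHEL7 and most modern distros).
                      try:
                          r = subprocess.run(["systemctl", "is-enabled", name],
                                             capture_output=True, text=True)
                          return 1 if r.stdout.strip() == "enabled" else 0
                      except FileNotFoundError:
                          pass
                      # Fall back to SysV chkconfig (RHEL6 and friends).
                      try:
                          r = subprocess.run(["/sbin/chkconfig", "--list", name],
                                             capture_output=True, text=True)
                          # The runlevel 3 field reads "3:on" when enabled.
                          return 1 if "3:on" in r.stdout else 0
                      except FileNotFoundError:
                          return 0

                  if __name__ == "__main__":
                      # e.g. UserParameter=service.enabled[*],/usr/local/bin/service_enabled.py $1
                      print(service_enabled(sys.argv[1] if len(sys.argv) > 1 else "zabbix-agent"))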

                  In other words, services monitoring is not only something which presents the current state of the running processes, but also history, like "has service A been automatically restarted in the last 1h by systemd on Linux or SMF on Solaris because it crashed?", or future state, answering questions like "will service A be started automatically after a reboot or not?"
                  Other examples of service-related metrics:
                  - how long did it take systemd/SMF to start the Oracle listener before it reported that it is in a fully initialized state?
                  - messages sent on stderr/stdout.
                  http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
                  https://kloczek.wordpress.com/
                  zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
                  My zabbix templates https://github.com/kloczek/zabbix-templates


                  • otheus
                    Member
                    • Mar 2009
                    • 53

                    #10
                    Originally posted by Alexei
                    Zabbix architecture is flexible enough to adapt to different use cases; I wouldn't expect any major paradigm shift in this space.
                    @Alexei, more precisely, Zabbix's configuration can be hacked heroically to adapt to this use case. But what I'm strongly suggesting here is that the hacks are so ugly and inelegant that it begs the question: why Zabbix? One of Z's key benefits is its configurability in the Web GUI and its powerful templating system. Secondly, this is a problem that I think will envelop all monitoring systems -- and system admins -- and I'd personally like to see Zabbix ahead of the curve.

                    Originally posted by kloczek
                    Your scenario is the typical scenario of monitoring a multi-node cluster with N>2. What you should do is just set up a dummy host and put the metrics for monitoring your service on that host.
                    @kloczek, this is a hack. It's also one that doesn't scale very well. It also rests on some assumptions that are not always valid.

                    I did discuss the multi-node cluster problem. Your suggested solution requires that services be reachable on the routable network. It also requires that the zabbix agent be able to listen on that IP address. It also assumes that services are managed statically. In a Docker world, these assumptions are not necessarily valid.

                    As I mentioned before: the Docker community resists running init within containers. What they think is ideal is this scenario for one host running in a Kubernetes environment:

                    Code:
                     |-- pid 1822 docker container 1 running mysql "alpha" port 3306
                     |-- pid 1823 docker container 2 running httpd "Aaron" port 80,443
                     |-- pid 2010 docker container 3 running mysql "beta" port 3306
                     |-- pid 2011 docker container 4 running httpd "Bob" port 80,443
                     |-- pid 3430 docker container 5 running mysql "delta" port 3306 
                     |-- pid 3431 docker container 6 running httpd "David" port 80,443
                    Each service is running in its own container. Each service can have the same port number because internally, they listen on different network interfaces. Somehow (undefined, because multiple possibilities exist) the services are multiplexed on the public network IP address (usually via different port numbers and a discovery service of some sort). Each service may suddenly move between host A and host B. When it moves to host B, it may have (1) a different internal IP, (2) a different external IP/port, (3) a different container id. The service IP address is one managed by Kubernetes, Swarm, or whatever. The standard way is to set up a reverse NAT path for the service IP to reach the inside container; docker itself does this. I see great difficulty in ensuring that docker redirects incoming connections to the zabbix agent.

                    To monitor HTTP, I want the following items:
                    • Connection time to service ip/port
                    • Number of processes running of that service
                    • Number of lines in log file for that service
                    • Custom item which extracts /extended-status (for Apache) from localhost
                    • memory usage of all HTTP services


                    To monitor MySQL, I might have 50 or so items that correlate to the various values from mysqladmin variables and related commands. One way to do this while avoiding firewall difficulties is with zabbix trapper items and zabbix-sender. Obviously, these are wrapped up in a script which will be run within the related container (a rough sketch of such a script follows the list below). However, there are also standard items:
                    • Connection time to service ip/port
                    • Number of kernel threads for mysql process
                    • Memory usage of mysql process
                    • Number of lines in slowqueries log
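
                    As a sketch of that in-container collector idea (the server name, the Zabbix "host" name, the item keys and the chosen variables are made-up placeholders, not existing template items):

                    Code:
                    #!/usr/bin/env python3
                    # Rough sketch: read a few values from "mysqladmin extended-status"
                    # and push them to the Zabbix server as trapper items via zabbix_sender.
                    import subprocess

                    ZABBIX_SERVER = "zbx.example.com"   # assumption: reachable from the container
                    SERVICE_HOST = "mysql-alpha"        # the Zabbix "host" representing this service
                    WANTED = {"Threads_connected", "Slow_queries", "Questions"}

                    def mysql_status():
                        out = subprocess.run(["mysqladmin", "extended-status"],
                                             capture_output=True, text=True, check=True).stdout
                        values = {}
                        for line in out.splitlines():
                            parts = [p.strip() for p in line.strip("| \n").split("|")]
                            if len(parts) >= 2 and parts[0] in WANTED:
                                values[parts[0]] = parts[1]
                        return values

                    def send(key, value):
                        # zabbix_sender -z <server> -s <host> -k <key> -o <value>
                        subprocess.run(["zabbix_sender", "-z", ZABBIX_SERVER, "-s", SERVICE_HOST,
                                        "-k", "mysql.status[%s]" % key, "-o", value], check=True)

                    if __name__ == "__main__":
                        for k, v in mysql_status().items():
                            send(k, v)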


                    If the MySQL service suddenly dies, the related HTTP service(s) will also trigger alerts. Finally, service pairs might be created dynamically in response to load (dynamic resource allocation in clouds is part of the point of all this docker stuff).

                    Now, how do we configure zabbix (or anything, really) to monitor this?

                    Option 1

                    As @kloczek suggested: each service gets an IP and a host entry within the Zabbix config. To do this, the service must have a routable IP. Further: the zabbix agent must be listening on that IP and in the same container as the service. Without a sysinit, the admin must make sure (somehow) that when docker container 1 is launched for mysql "alpha", it also launches a zabbix agent within that container. So our pseudo process table looks like:


                    Code:
                     |-- pid 1822 docker container 1 running mysql "alpha" port 3306
                     |-- pid 1823 docker container 2 running httpd "Aaron" port 80,443
                     |-- pid 1824 docker container 1 running zabbix-agentd "Aragorn" port 10050
                     |-- pid 1825 docker container 2 running zabbix-agentd "Aardvark" port 10050
                     |-- pid 2010 docker container 3 running mysql "beta" port 3306
                     |-- pid 2011 docker container 4 running httpd "Bob" port 80,443
                     |-- pid 2012 docker container 3 running zabbix-agentd "Balin" port 10050
                     |-- pid 2013 docker container 4 running zabbix-agentd "Beatle" port 10050
                     ...
                    As long as the IPs are routable and static (or discoverable via dynamic DNS or something), and the sysadmin has a really good knowledge of Docker, this won't be too hard to configure: For each service, the Zabbix admin creates a new host based on the relevant template. This solves most of the problems. However:
                    • There is no (obvious) way to trace a problem VM to its "physical" host.
                    • There is no way to create a dependency on the service's actual "physical" host.
                    • How to handle dynamically created service-host sets? (Can this be done currently with Discovery? Even so, we'd have to set up host discovery at a relatively high frequency; a rough discovery sketch follows this list.)
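
                    By "discovery sketch" I mean something like the following: ask the local Docker engine what is running and emit Zabbix low-level discovery JSON for a discovery rule to consume. The LLD macro names and the ports handling are illustrative assumptions, not an existing template:

                    Code:
                    #!/usr/bin/env python3
                    # Rough sketch of a per-host "special discovery agent": list running
                    # containers and print Zabbix low-level discovery JSON.
                    import json
                    import subprocess

                    def discover_containers():
                        fmt = "{{.Names}}\t{{.Image}}\t{{.Ports}}"
                        out = subprocess.run(["docker", "ps", "--format", fmt],
                                             capture_output=True, text=True, check=True).stdout
                        entries = []
                        for line in out.splitlines():
                            name, image, ports = (line.split("\t") + ["", ""])[:3]
                            entries.append({
                                "{#SVC_NAME}": name,    # hypothetical LLD macros
                                "{#SVC_IMAGE}": image,
                                "{#SVC_PORTS}": ports,
                            })
                        return {"data": entries}

                    if __name__ == "__main__":
                        # Could be wired up as a UserParameter or pushed with zabbix_sender.
                        print(json.dumps(discover_containers(), indent=2))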


                    Option 2:

                    Templates with dynamic parameters so that multiple template sets can be added to a host? Items that can "move" from one "host" to another on demand? All of that sounds rather ugly.

                    Option 3:

                    A separate "service" hierarchy under which items can be assigned. Obviously this would need to be designed so it solves the problems/weaknesses above.


                    Originally posted by Alexei
                    Zabbix 3.2 is introducing problem (event) tags and an event correlation module that will eventually bring a top-level view of problems and services, along with a much more flexible way of managing actions and top-level dependencies.

                    I believe we still miss a good and well-understood way of defining applications (services); it will come in the future.
                    I don't see how tags help with the problem of configuration. Can the tags be used to help the admin figure out which VM/host is the problem when a group of pseudo-service-hosts present problems? Regardless, I'm glad you see a future direction here, and yes, you're right: we need to have it better defined.

                    Thanks for listening.


                    • kloczek
                      Senior Member
                      • Jun 2006
                      • 1771

                      #11
                      Originally posted by otheus
                      I did discuss the multi-node cluster problem. Your suggested solution requires that services be reachable on the routable network. It also requires that the zabbix agent be able to listen on that IP address. It also assumes that services are managed statically. In a Docker world, these assumptions are not necessarily valid.
                      The above assumption is valid only in the case of passive monitoring, which is known not to scale well, so sooner or later you will stop using passive monitoring.
                      In the case of active monitoring none of the above applies.
                      Even on the same IP (it does not need to be a single host .. it can even be one IP behind a NAT gateway) you can have as many agents as you want, because when querying the server/proxy the agent says "give me the monitoring cfg for <host_A>".
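
                      Roughly, that active-agent exchange looks like this (a simplified sketch; header/framing details vary by Zabbix version, so treat it as an illustration rather than a drop-in client):

                      Code:
                      #!/usr/bin/env python3
                      # Simplified sketch of the active-agent handshake: the agent connects
                      # to the server/proxy and asks for the item list of a host *name*,
                      # so the agent's own IP does not matter.
                      import json
                      import socket
                      import struct

                      def request_active_checks(server, host, port=10051):
                          payload = json.dumps({"request": "active checks", "host": host}).encode()
                          frame = b"ZBXD\x01" + struct.pack("<Q", len(payload)) + payload
                          with socket.create_connection((server, port), timeout=5) as s:
                              s.sendall(frame)
                              reply = s.makefile("rb").read()
                          # Skip the 5-byte signature and 8-byte length in the reply.
                          return json.loads(reply[13:])

                      if __name__ == "__main__":
                          # "host_A" only has to match a host configured in the frontend.
                          print(request_active_checks("zbx.example.com", "host_A"))
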
                      http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
                      https://kloczek.wordpress.com/
                      zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
                      My zabbix templates https://github.com/kloczek/zabbix-templates


                      • SS*
                        Junior Member
                        • Jul 2015
                        • 5

                        #12
                        "The docker phenomenon."

                        "really surprised in how many cases people are thinking that things like containerisation"

                        From one viewpoint it is simple yet surprisingly effective: cognitive ease. The word Docker itself is easy to say and read, and as a word is repeated a positive association is formed. As I recall, Larry Ellison remarked that he was surprised by how much the name of a piece of software affects the likelihood of its success.

                        How was it done before? I take it by this - http://blog.aquasec.com/a-brief-hist...to-docker-2016

                        1979.. chroot. Yep wasn't even born then.

