
SLA Explanation / Example


navtek007
16-05-2005, 16:50
Hi All,

This question has probably been asked a lot, but I still don't understand how the SLA configuration works. For example, say we have an SLA that specifies the following (dummy numbers):

Service             SLA    Service hours
Web Services         75%   8:30am - 5:30pm
Database Services   100%   6:30am - 6:30pm
Mail Services       100%   6:00am - 6:30pm

How would I implement this using the SLA config screen?

Thanks in advance.
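
As a rough illustration of what such a requirement means numerically (not of how the Zabbix SLA screen implements it), the sketch below counts availability only inside the agreed service hours. The function name `availability_in_window`, the outage list, and the dates are hypothetical; the window and the 75% target come from the dummy "Web Services" row above.

```python
from datetime import date, datetime, time, timedelta

def availability_in_window(outages, day_start, day_end, first_day, days):
    """Return percent availability counted only inside the daily service window.

    outages   -- list of (start, end) datetime pairs when the service was down
    day_start -- time of day the SLA window opens
    day_end   -- time of day the SLA window closes
    first_day -- date of the first day in the reporting period
    days      -- number of consecutive days in the reporting period
    """
    total = timedelta()
    down = timedelta()
    for n in range(days):
        day = first_day + timedelta(days=n)
        win_start = datetime.combine(day, day_start)
        win_end = datetime.combine(day, day_end)
        total += win_end - win_start
        for out_start, out_end in outages:
            # Only the part of the outage that overlaps the window counts.
            overlap = min(out_end, win_end) - max(out_start, win_start)
            if overlap > timedelta():
                down += overlap
    return 100.0 * (1 - down / total)

# Hypothetical outages: one partly inside the window, one entirely outside it.
outages = [
    (datetime(2005, 5, 16, 7, 0), datetime(2005, 5, 16, 10, 0)),
    (datetime(2005, 5, 17, 20, 0), datetime(2005, 5, 17, 23, 0)),
]
pct = availability_in_window(outages, time(8, 30), time(17, 30), date(2005, 5, 16), 2)
print(f"{pct:.2f}% measured against a 75% target")
```

Only the 1.5 hours between 8:30 and 10:00 on the first day count as downtime here; the evening outage falls outside the service hours and does not affect the figure.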

Kepeke1967
23-05-2005, 12:16
Hello,

We have the same issue here: applying timeframes to certain SLA monitoring items. It doesn't seem to work using triggers from a template host.
We must be doing something wrong.

Any help would be much appreciated!
:)

navtek007
14-06-2005, 16:33
OK, it's been a while since the last response, so I will just stick with the first part of my question:

I still don't understand how to configure the IT Services side of things.

1. Do I just create a service called "Database Services" and then hard/soft link all my database-related triggers to it? Or is each trigger created as a sub-service, e.g. "Database is Unreachable" is created as another service and then linked to the "Database Services" service?

2. Would I only add triggers that affect the uptime of the service, and not ones like disk space? Or are certain trigger severity levels (Warning, Disaster, etc.) weighted differently when calculating the SLA level?

3. What is the difference between a soft and a hard link?

4. Could someone give me an example of how I could use this?

Sorry for the long post, but I feel I am missing out on this functionality in Zabbix.

Thanks in advance.

illumin8
14-06-2005, 21:21
I still don't understand how to configure the IT Services side of things?
I have the same issue. There is no documentation on how IT Services work: nothing that explains the difference between the MAX and MIN calculation or between hard and soft links, and not even a quick tutorial on how to set it up.

My problem is I've tried creating IT Services that are linked to a trigger. I make the trigger go True, then the SLA stays at 100%, and isn't adjusted like it should be. I must be configuring it incorrectly and I'm just not sure how. Any help you guys could give would be greatly appreciated.

illumin8
15-06-2005, 14:56
I have hammered on it for around 8 hours, creating IT Services, linking them to Triggers, everything, and no matter what I do, I can't get the SLA it's reporting to drop below 100%....

Edit: OK, I finally got it to drop below 100% by using soft links instead of hard. I still don't know what the difference is (perhaps someone could explain it to me). It seems to be accurately reporting SLA now; however, it would be great to have a short explanation of it.

I also found a bug (I think) in IT Service setup (affecting 1.0 and alpha 1.1): If you set an IT Service to MIN algorithm, you can't edit it again. When you try to edit it, the Algorithm field shows up blank (doesn't populate from the database), and you can no longer update the entry. You have to delete it and start over.

navtek007
16-06-2005, 09:25
Alexei, would you be able to help us all out here? There have been so many threads about this being a grey area in Zabbix. Everything else works great, and I'm sure this works well too, but I don't think a lot of people know how to use it properly. Some definitions and examples would be a great start, I think.

Thanks in advance.

illumin8
17-06-2005, 20:55
OK, I think I've got it figured out, and I'm hoping this will help others who need to set up SLAs. This is how I did it:

First of all, a little background. I have about 100 hosts that I'm monitoring with the icmpping simple check. They don't have any Zabbix agents installed, and I'm only monitoring them with pings to track server uptime. Each host has only 1 item and 1 trigger called "Cannot ping {HOSTNAME}".

Now, how to set up IT Services:


1. Log in to Zabbix.
2. Click on Configuration.
3. Click on IT Services.
4. Click on the drop-down next to Trigger, and choose your host group, then host, then the trigger that is the basis for forming the SLA. In my case, the trigger I am monitoring is "Cannot ping {HOSTNAME}". I'm using linked templates, so it actually expands the trigger to show the real hostname.
5. Check the two boxes that say "Show SLA" and "Link to trigger".
6. Type in the acceptable SLA percentage. Leave the service name blank. It will automatically use the name of the trigger anyway.
7. Click the Add button.
8. Repeat the above procedure until you have added all the triggers that might count toward an SLA. For me, this is only 1 trigger per server, but you might have many more you want to track.
9. Now, you're going to create a Parent service that will hold all of the child triggers you just added SLAs for.
10. Type in a name for the parent service. This might be something like "Oracle Database Server".
11. Choose a status calculation algorithm. Here is an explanation of what the two options, MAX and MIN, will do: Use the MAX algorithm for things like a farm of web servers, where if one or two are down, the service is still fine. Things that are load-balanced or clustered, like web servers, are prime candidates for MAX. Use the MIN algorithm for services that are not clustered or load-balanced. For example, if you're tracking a number of conditions on a single server, like "Server load is below a certain level" and "Response time is good", you want to use MIN, because if any one trigger gets set to True, you want it to mark the service as down.
12. Set the SLA percentage to what you want.
13. Do not check the "Link to trigger" checkbox.
14. Click the Add button.
15. Now that you've created the parent service, you need to link it to the child services (triggers) that you created above.
16. Click on the Parent service in the list of services. You will be taken to a similar screen, but you are editing just that service instead of the master list of all services.
17. In the "Link to" section, choose the child trigger to link the parent service to. I chose to use a soft link, since that's the default, but I still don't know what the difference is. If somebody wants to clarify that, I would greatly appreciate it.
18. Click the add link button.
19. Continue until you have added all of the child triggers you want to that service.
20. Parent services can also be child services of other parent services. This allows you to create hierarchies.
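
The procedure above boils down to three kinds of records: leaf services tied to a trigger, parent services with a calculation algorithm, and links between them. The sketch below models that in plain Python just to make the shape explicit. It is not Zabbix's schema or API, and every name in it (`Service`, `Link`, `ServiceConfig`, the host names, the 99.9 target) is hypothetical. The `soft` flag is only recorded, since the thread does not establish what distinguishes soft from hard links.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Service:
    name: str
    algorithm: Optional[str] = None   # "MAX" or "MIN" for parents, None for leaves
    sla_target: float = 99.0          # acceptable SLA percentage
    trigger: Optional[str] = None     # set only when "Link to trigger" is checked
    show_sla: bool = True

@dataclass
class Link:
    parent: Service
    child: Service
    soft: bool = True                 # soft is the default mentioned in the post

@dataclass
class ServiceConfig:
    services: List[Service] = field(default_factory=list)
    links: List[Link] = field(default_factory=list)

    def add_trigger_service(self, trigger, sla_target):
        # Leaf service: name left blank in the form, so it takes the trigger's name.
        svc = Service(name=trigger, sla_target=sla_target, trigger=trigger)
        self.services.append(svc)
        return svc

    def add_parent(self, name, algorithm, sla_target):
        # Parent service: has an algorithm and is not linked to a trigger itself.
        svc = Service(name=name, algorithm=algorithm, sla_target=sla_target)
        self.services.append(svc)
        return svc

    def add_link(self, parent, child, soft=True):
        self.links.append(Link(parent, child, soft))

    def children(self, parent):
        return [link.child for link in self.links if link.parent is parent]

cfg = ServiceConfig()
leaves = [cfg.add_trigger_service(f"Cannot ping db{n}", 99.9) for n in (1, 2)]
parent = cfg.add_parent("Oracle Database Server", "MIN", 99.9)
for leaf in leaves:
    cfg.add_link(parent, leaf)

print([child.name for child in cfg.children(parent)])
```

Because a parent is itself a `Service`, it can be passed as the `child` of another parent, which is all a hierarchy needs.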


Ok, so now that I've explained all of this, I thought I would give you a more real-world example of how I used it.


1. We have two data centers. We'll call them "East Coast" and "West Coast".
2. Each data center has three web servers. We'll call them "eastweb1", "eastweb2", "eastweb3", "westweb1", "westweb2", and "westweb3".
3. Each web server has a trigger defined that will be set if it goes offline or crashes.
4. The first thing we do is create 6 child services, one for each web server. Each child service is created by linking it to the trigger for that web server.
5. Next, we create two parent services, one called "Web Servers - East Coast" and one called "Web Servers - West Coast". Each one of these is created using the MAX algorithm, since if any one server is up, the service is considered operational.
6. Now we link "eastweb1", "eastweb2", and "eastweb3" to the parent "Web Servers - East Coast", and link "westweb1", "westweb2", and "westweb3" to "Web Servers - West Coast".
7. Now, we create another parent service called simply "Web Servers", again using the MAX algorithm, since even if we lose an entire data center, as long as the other data center is operational, the web service is still considered up.
8. We link "Web Servers - East Coast" and "Web Servers - West Coast" to the parent service "Web Servers".


You should be all set at this point. When you go into the IT Services View screen, you'll see just "Web Servers" listed as a service, along with the SLA. If you click on Web Servers, you can drill down and see availability for both East Coast and West Coast. If you click on either of those, you can see availability on an individual service basis.
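
The East Coast / West Coast hierarchy can be sketched as a small tree with a recursive roll-up. This is a minimal sketch that mirrors the description above rather than Zabbix's code: `service_up` and the tuple layout are hypothetical, and MAX and MIN are modelled exactly as the post describes them ("up if any child is up" versus "down if any child is down"). Later Zabbix front ends label the two algorithms "Problem, if at least one child has a problem" and "Problem, if all children have problems", so it is worth confirming against your own version which of those MAX and MIN correspond to before relying on this mapping.

```python
def service_up(node, trigger_fired):
    """Return True if the service is considered up, following the post's description.

    node          -- either a trigger name (leaf) or a tuple (name, algorithm, children)
    trigger_fired -- dict mapping trigger name to True when the trigger is set
    """
    if isinstance(node, str):
        # Leaf: the web server is up while its "offline" trigger is not set.
        return not trigger_fired.get(node, False)
    _name, algorithm, children = node
    states = [service_up(child, trigger_fired) for child in children]
    if algorithm == "MAX":
        # As described in the post: up if any one child is up.
        return any(states)
    if algorithm == "MIN":
        # As described in the post: down as soon as any child is down.
        return all(states)
    raise ValueError(f"unknown algorithm: {algorithm}")

east = ("Web Servers - East Coast", "MAX", ["eastweb1", "eastweb2", "eastweb3"])
west = ("Web Servers - West Coast", "MAX", ["westweb1", "westweb2", "westweb3"])
web = ("Web Servers", "MAX", [east, west])

# Hypothetical situation: the whole East Coast data center is offline.
fired = {"eastweb1": True, "eastweb2": True, "eastweb3": True}

print(service_up(east, fired))  # False
print(service_up(west, fired))  # True
print(service_up(web, fired))   # True
```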

So you can see how easy it is to create hierarchies of services with this. It seems to be working for me. I hope this tutorial helps someone else.

navtek007
18-06-2005, 13:09
Great! That is exactly what I wanted to know. Good example of the MAX/MIN functionality as well.

illumin8
15-12-2005, 17:12
Will somebody please sticky this thread? Even though I've had to figure all of this out by trial and error, it is still the BEST and ONLY documentation on the web that actually explains how to set up IT Services in Zabbix.

IT Services is probably the most requested, and yet least usable and documented feature in Zabbix.

It would be nice to see Alexei give some type of official explanation of how IT Services is supposed to work.

Wolfgang
15-12-2005, 18:53
@illumin8

Thank you very much for putting together how to set up SLAs :-)

crs9
09-03-2006, 20:36
I'm still having a few problems with SLA and am not sure where I'm going wrong. I'm using beta7.
1) I create my triggers to calculate my SLA.
2) I then create my parent.
3) I then soft link my triggers to the parent.
4) I can go back into the config and see the parent as service 1 and the triggers as service 2.
5) The strange thing is that when I click on a trigger in the config, the trigger selection within its config is set to the default, meaning it says "all" and "select host...". Is this correct?
6) The second strange thing is that when I go to monitor IT Services and click on the parent, it drills down, but nothing is on the next page, which tells me I'm certainly missing a step.

Can anyone shed some light on the step I'm missing?

Thanks

herr_bpl
28-03-2006, 12:13
I can only confirm: no matter how I create the service(s) and play with the triggers, the SLA stands proudly at 100%. Impressive for management, but absolutely not reliable :'(

I hope the next version and feature freeze will shed some light on it...

Alexei
28-03-2006, 12:54
I can only confirm: no matter how I create the service(s) and play with the triggers, the SLA stands proudly at 100%. Impressive for management, but absolutely not reliable :'(

This is because a service's SLA status is first updated on a trigger change. When adding a new service, or linking a service to a trigger, the SLA is OK by default.

This is to be changed.
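
To make the effect concrete, here is a minimal sketch, assuming the SLA is derived purely from a log of status changes and that a service with no recorded change is treated as OK, as described above. The function `sla_percent` and the integer-second timestamps are hypothetical and do not reflect Zabbix's actual implementation.

```python
def sla_percent(events, period_start, period_end, initial_ok=True):
    """Percent of the period during which the service counted as OK.

    events     -- list of (timestamp, ok) status changes, sorted by timestamp
    initial_ok -- status assumed before the first recorded change
    """
    ok_time = 0
    ok = initial_ok
    cursor = period_start
    for ts, new_ok in events:
        if ts <= period_start:
            # Change happened before the period: it only sets the starting state.
            ok = new_ok
            continue
        if ts >= period_end:
            break
        if ok:
            ok_time += ts - cursor
        cursor = ts
        ok = new_ok
    if ok:
        ok_time += period_end - cursor
    return 100.0 * ok_time / (period_end - period_start)

hour = 3600
# Trigger was already in a problem state when the service was linked, and never changed:
print(sla_percent([], 0, hour))                    # 100.0
# The same hour if the real initial state had been known:
print(sla_percent([], 0, hour, initial_ok=False))  # 0.0
# Once the trigger changes state, downtime starts to be counted:
print(sla_percent([(1800, False)], 0, hour))       # 50.0
```

With no status change on record, the default-OK assumption alone yields 100%, which matches the behaviour reported in this thread for newly created services.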

ghislain
28-03-2006, 16:51
Is that article OK with you, Alexei? If the info is good, perhaps it can be added to the wiki?

regards,
Ghislain. :D :rolleyes:

axel
04-05-2006, 09:54
Hi, I have problems with IT Services too.

I set it up like this:

- Routers MIN - SLA 100%
  - Trigger (Ping Router1) MIN - SLA 100%
  - Trigger (Ping Router2) MIN - SLA 100%
  - Trigger (Ping Router3) MIN - SLA 100%

But if I check the Availability report of Router2, I get 98.9177%.

I use Zabbix 1.0 too, and there IT Services works OK.

For example, if Trigger (Ping Router1) is ON, nothing happens on the IT Services side. The services tell me that everything is OK.

Maybe I don't know what IT Services should do ;)

Please help, thanks.



SFM-Router                    OK   -   -                 Show
[TRIGGER] PING VPN Router1    OK   -   99.05%/100.00%    Show
[TRIGGER] PING VPN Router2    OK   -   99.05%/100.00%    Show

Robert Wagnon
02-02-2008, 05:26
I guessed at a similar set of instructions to the one listed, then performed the listed instructions exactly and experimented with the system.

However, the SLA values never drop below 100%.

I've used a Web Slow trigger on the response time of a Web Step. I set the trigger threshold to 1 second and determined that we are slower than 1 second 50% of the time. The trigger constantly flips (ON/OFF), but the SLA stays at 100%.

Hm....

Any suggestions?

Aly
04-02-2008, 10:17
Check this thread (http://www.zabbix.com/forum/showthread.php?t=8129)...

banderas20
01-07-2009, 12:09
I have hammered on it for around 8 hours, creating IT Services, linking them to Triggers, everything, and no matter what I do, I can't get the SLA it's reporting to drop below 100%....

Edit: OK, I finally got it to drop below 100% by using soft links instead of hard. I still don't know what the difference is (perhaps someone could explain it to me). It seems to be accurately reporting SLA now; however, it would be great to have a short explanation of it.

I also found a bug (I think) in IT Service setup (affecting 1.0 and alpha 1.1): If you set an IT Service to MIN algorithm, you can't edit it again. When you try to edit it, the Algorithm field shows up blank (doesn't populate from the database), and you can no longer update the entry. You have to delete it and start over.

But the question is, how do I set up soft links in a lowest-level IT Service (one linked to a trigger)?

Thanks in advance!

wilfrik2003
14-01-2011, 19:40
The issue I am having is this:
How can I configure the SLA so that each Zabbix user can only see the SLA of the group he has access to? I have about 8 users with access limited to the groups assigned to them, and I don't want them to see the SLAs of other users' groups, only their own. How can I add this restriction?

Your help would be appreciated.