IT services/Sla always 100%. Why?
maurotb@libero.it
19-10-2007, 17:12
Hi,
I have set up an SLA with a simple trigger:
{ServerTest:net.tcp.port[,21].last(0)}=1
Parent service: Root
Status calculation algorithm: None
Service times: none
Correctly, when the FTP server is down, under Monitoring -> IT Services
Zabbix shows me "Status:Disaster Reason:FTP server (TCP) Not running on LastFloor".
If I click on my trigger/service, I see:
2007.Oct.19 15:44:07 TRUE - 1.4 hours 1.4 hours 100%
2007.Oct.19 15:43:06 FALSE - 1 mins 1 mins 1.22%
2007.Oct.19 15:40:06 TRUE - 3 mins 1.4 hours 98.82%
2007.Oct.19 15:37:05 FALSE - 3 mins 4 mins 4.53%
2007.Oct.19 15:28:07 TRUE - 9 mins 1.6 hours 95.
OK, this works, but my SLA is always 100%!
If I click on the SLA bars, problems is always 0d 0h 0m, downtime is always 0d 0h 0m, and the SLA is always 100%.
I have checked the service_alarms table in the DB.
It has one record:
servicealarmid=200 serviceid=28 clock=1192801386 value=0
No other records are present.
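For anyone following along, here is a rough Python sketch of how an SLA percentage could be derived from rows like the one above. It is not Zabbix's actual code. It assumes each row marks a status change at `clock`, that `value` 0 means OK and anything greater means a problem, and that the status is OK before the first row. The function name `sla_percent` is made up for illustration. Under those assumptions a lone `value=0` row yields zero downtime, which would match a constant 100%.

```python
def sla_percent(rows, period_start, period_end):
    """Return the percentage of [period_start, period_end) spent in OK status.

    rows: iterable of (clock, value) status changes; value 0 = OK (assumed).
    The status before the first row is assumed to be OK.
    """
    total = period_end - period_start
    if total <= 0:
        raise ValueError("period must have positive length")

    downtime = 0
    status = 0             # assumed OK until a row says otherwise
    cursor = period_start  # time up to which downtime has been accounted
    for clock, value in sorted(rows):
        if clock >= period_end:
            break
        if clock > cursor:
            if status > 0:
                downtime += clock - cursor
            cursor = clock
        status = value     # rows before the period still set the starting status
    if status > 0:
        downtime += period_end - cursor
    return 100.0 * (total - downtime) / total


# The single row from the post, evaluated over a hypothetical one-week window.
week_start = 1192800000
only_row = [(1192801386, 0)]
print(sla_percent(only_row, week_start, week_start + 7 * 86400))  # -> 100.0
```

If problem rows (value > 0) never reach the table, a calculation of this kind has nothing to subtract, whatever the trigger history shows.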
Please help me.
Thanks
I've also got this problem, every month or so I convince myself it must just be a simple configuration error on my part but each time it beats me. My correlating triggers have been active for reasonable lengths of time yet the SLAs in my IT services always show 100%.
Can anyone help? Is anyone successfully using the IT services feature?
John
michaeltje
08-11-2007, 09:06
I'm experiencing the exact same problem. I tested this with a simple ping command. Any fixes would be appreciated.
Hmm, I don't see any problem yet. It works fine for me. I created a service, and so far I have 1h 16m of problems, with an SLA of 99.24%.
Tested on the latest revision.
In my own testing I noticed the calculation stayed at 100% (with only one record in the service_alarms table) until a new week started on Monday; then it started following the new trigger events as they came in.
Winfried
theologu
10-01-2008, 16:18
Aly, please tell us the steps that you performed. If it works for you, maybe we are doing something wrong.
I added 4 triggers, and some of them have been ON, but the SLA is 100%.
In the IT Services configuration, I created a node, "Web Appz":
Status calculation algorithm - Max of children
Show SLA - Yes
I added leaves to that node:
"Apache", "Space", "CPU". For each one I linked a trigger.
The "Apache" trigger was ON.
That branch has a problem (Apache), so over time the SLA of the "Web Appz" node gets worse.
theologu
11-01-2008, 11:50
Thank you Aly, now I can confirm it is working.
Please tell us about the calculation algorithm: what do MIN of children, MAX of children and None mean?
Also, what is the effect of "Service times"? By default, if no service time is added, is the service considered always active?
If I add a service called "Postfix" linked to a trigger "Email server is down", and in Service times I set a one-time downtime from Sunday 13:30 till Sunday 14:00, does that mean that if Postfix is down in that period, the SLA is not altered?
Please also tell us the purpose of soft links.
Thank you!
Hi Aly,
I'd appreciate an explanation of Max of children and Min of children. It would at least give us a clear understanding of that calculation, since it is missing from the documentation (CMIIW).
Thanks
BEE
If I add a service called "Postfix" linked to a trigger "Email server is down", and in Service times I set a one-time downtime from Sunday 13:30 till Sunday 14:00, does that mean that if Postfix is down in that period, the SLA is not altered?
Yes, you are correct.
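The effect confirmed here can be pictured with a small Python sketch. It assumes a downtime window simply removes the overlapping seconds from the problem time that counts against the SLA. The function name and the seconds-based interface are hypothetical and this is not how Zabbix itself is implemented.

```python
def counted_problem_seconds(problem, downtime_windows):
    """Seconds of a problem interval that fall outside all downtime windows.

    problem: (start, end) in seconds.
    downtime_windows: list of (start, end) windows, assumed non-overlapping.
    """
    p_start, p_end = problem
    excluded = 0
    for d_start, d_end in downtime_windows:
        overlap = min(p_end, d_end) - max(p_start, d_start)
        if overlap > 0:
            excluded += overlap
    return (p_end - p_start) - excluded


def hhmm(hours, minutes):
    """Seconds since midnight, to keep the example readable."""
    return hours * 3600 + minutes * 60


window = [(hhmm(13, 30), hhmm(14, 0))]  # the one-time downtime from the question

# Postfix down 13:35-13:55, entirely inside the window: nothing is counted.
print(counted_problem_seconds((hhmm(13, 35), hhmm(13, 55)), window))  # -> 0

# Postfix down 13:40-14:10: only the 10 minutes after 14:00 are counted.
print(counted_problem_seconds((hhmm(13, 40), hhmm(14, 10)), window))  # -> 600
```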
Min/None/Max of children determines which child status is assigned to the parent node: MIN selects the minimal status, MAX selects the maximal status, and None means no child status is assigned to the parent node.
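To make the three algorithms concrete, here is a minimal Python sketch of the rule as described above. It assumes statuses are integers where 0 is OK and higher numbers are more severe. The function name, the string labels and the `own_status` parameter are invented for the example and do not come from the Zabbix source.

```python
def parent_status(algorithm, child_statuses, own_status=0):
    """Status a parent node would take under each algorithm.

    Statuses are integers: 0 = OK, higher = more severe (assumed).
    own_status is what the parent keeps when children are ignored.
    """
    if algorithm == "none" or not child_statuses:
        return own_status            # children do not affect the parent
    if algorithm == "max":
        return max(child_statuses)   # the worst child decides
    if algorithm == "min":
        return min(child_statuses)   # a problem only if every child has one
    raise ValueError(f"unknown algorithm: {algorithm!r}")


children = [0, 5, 0]  # e.g. "Apache" in a problem state, "Space" and "CPU" OK
print(parent_status("max", children))   # -> 5
print(parent_status("min", children))   # -> 0
print(parent_status("none", children))  # -> 0
```

Read this way, with MIN a parent only shows a problem while all of its children are in a problem state at the same time.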
Can you explain the min/max of children concept in more detail, please?
hardtofi
08-02-2008, 12:53
Can you explain the min/max of children concept in more detail, please?
There's a pretty nice explanation here (http://www.zabbix.com/forum/showthread.php?p=2991#post2991)
I too have some problems with the SLAs. I've set it up like this:
A trigger that fires if the response time of a page is > 1s
Then I did the following:
1. Parent node with no links to triggers, with show sla and MIN of children
2. Added a service to the parent, linked to the trigger
Both of these show 100% all the time, although the trigger fires about once every 30-60 calls.
After reading the thread I linked to above, I made a new attempt (without deleting the non-working SLA check above):
1. Made a service linked to the same trigger, with show sla and none as calculation
2. Made a parent, set it to depend on the above one and min as calculation
Now the parent shows 99.96% but the child shows 100%!? 99.96% may be right, but I really can't figure out why the child (which must be where the parent gets its number in the first place) reports 100%.
Can someone pretty please explain this weird behavior to me? Also, why does the first attempt (using the exact same trigger) report 100% in both the parent and the child?
acucatti
08-02-2008, 14:35
Hi,
I did what was suggested above, but I still have the same problem. The SLA is always 100%.
Any tips?
Thanks
acucatti
08-02-2008, 15:36
I'm using Zabbix 1.4.2 on Solaris 8.
I'd appreciate some help.
Thanks again.
Folks, I think we just have to bite the bullet on this one: the WHOLE IT Services/SLA function of Zabbix is riddled with bugs to the point of being unusable. Not to mention that its basic idea is wrong to begin with.
That explains the lack of documentation and response from Alexei.
This "feature" should never have made it into the stable release, not in its present state at least. :(
There's a pretty nice explanation here (http://www.zabbix.com/forum/showthread.php?p=2991#post2991)
The eleventh paragraph is wrong. Read my article about min/max/none.
acucatti
08-02-2008, 20:27
Thanks for your answer Aly and skogan,
I had already read those posts, but it still doesn't work. I don't think I'm doing anything wrong, but I could be missing something.
Do you know if there is a bug in version 1.4.2?
Do I have to wait at least one week for the SLA to work properly, as suggested in an earlier post? I don't think so...
Neither the parents nor the children work.
Is there any other tip or workaround?
Thanks again
hardtofi
11-02-2008, 09:04
Can anyone explain this behavior (screen shot below)? It's related to my post earlier in this thread.
The parent is configured with:
Depends on the child (not soft linked(?))
Calculation alg: min of child
Show sla: yes
Acceptable: 99.8
No service times
No links to trigger.
The child:
No dependency
Show SLA: yes
Acceptable: 99.8
No service times
Link to trigger: yes
If I click the bar with percentages of the parent it shows 3h 17m of problems last week, if I click the bar of the child it shows 0 time problems.
Is there any logic here I'm not understanding? I would have expected both to have the same percentage of error.
(Edit: I'm using Zabbix 1.4.4 compiled from source.)
The SLA of a parent service depends on its previous SLA history. If there was another service (as a leaf) that you deleted and replaced with a new one, the parent node's SLA is not calculated from the newly added service alone; it also includes all of the parent node's previous SLA data.
hardtofi
11-02-2008, 11:09
The SLA of a parent service depends on its previous SLA history. If there was another service (as a leaf) that you deleted and replaced with a new one, the parent node's SLA is not calculated from the newly added service alone; it also includes all of the parent node's previous SLA data.
Thanks for the reply, but the problem is that the trigger the child depends on is firing regularly, so the number reported by the parent is probably correct, while the child's is wrong.
If I click on the link [1s SLA on search] for the child on the "IT Services" page, I get the attached list (cropped since it's very long). And this certainly isn't 100%.
hardtofi
12-02-2008, 11:28
I've been looking at the table zabbix.service_alarms now, and it seems rows (for some IT services) are not being inserted there when the triggers go off, hence the lack of correct data. I tried inserting a few rows manually and that does affect the percentage.
What causes zabbix_serverd to call DBadd_service_alarm and insert rows there? Is there a bug, or does it depend on something I might have missed in my configuration?
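For anyone who wants to check which services are affected, a row count per service is a quick test. The Python sketch below runs against a stand-in SQLite database, not a real Zabbix one. The `service_alarms` columns are the ones quoted earlier in this thread, while the `services` table with `serviceid` and `name` columns is an assumption to verify against your own schema. The service names and the helper `alarm_counts` are made up for the example.

```python
import sqlite3


def alarm_counts(conn):
    """Return [(serviceid, name, row_count)], including services with 0 rows."""
    query = """
        SELECT s.serviceid, s.name, COUNT(a.servicealarmid)
        FROM services AS s
        LEFT JOIN service_alarms AS a ON a.serviceid = s.serviceid
        GROUP BY s.serviceid, s.name
        ORDER BY s.serviceid
    """
    return conn.execute(query).fetchall()


# Stand-in data in an in-memory SQLite database (assumed, simplified schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE services (serviceid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE service_alarms (
        servicealarmid INTEGER PRIMARY KEY,
        serviceid INTEGER, clock INTEGER, value INTEGER);
    INSERT INTO services VALUES (28, 'parent'), (29, 'child');
    INSERT INTO service_alarms VALUES (200, 28, 1192801386, 0);
""")

for row in alarm_counts(conn):
    print(row)
```

A service that is linked to a regularly firing trigger but shows a count of 0 or 1 here would be a candidate for the missing-rows problem described above.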