I've got a bunch of aggregate checks for various pools of servers. In this case each pool has 4 servers and, among other things, I'm looking at load average and averaging it across the pool with an aggregate item, pretty much copied and pasted from the docs, like so:
grpavg["Servers Pool 34","system.cpu.load","last","0"]
Works great, but one of the boxes in the pool is down (status "Not monitored"). The "last" value the aggregate has been using for it over the past week is the last one recorded before that server went down. That makes sense taken literally, but I'd expect the aggregate to average the loads of just the 3 alive servers, instead of averaging 3 alive servers and 1 dead one (with very out-of-date data).
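To make the difference concrete, here is a rough sketch of the arithmetic in Python. This is not Zabbix syntax and not how Zabbix works internally; the host names, load values, and the 300-second freshness cutoff are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LastValue:
    host: str          # hypothetical host name
    load: float        # last recorded load average (made-up value)
    age_seconds: int   # how long ago that value was recorded


def average_all(values: List[LastValue]) -> float:
    """Average every last value, stale or not (what I'm seeing now)."""
    return sum(v.load for v in values) / len(values)


def average_fresh(values: List[LastValue], max_age_seconds: int) -> Optional[float]:
    """Average only values recorded within max_age_seconds (what I'd expect)."""
    fresh = [v for v in values if v.age_seconds <= max_age_seconds]
    if not fresh:
        return None  # no live servers, so there is nothing to average
    return sum(v.load for v in fresh) / len(fresh)


ONE_WEEK = 7 * 24 * 3600

pool = [
    LastValue("srv1", 1.0, 30),
    LastValue("srv2", 2.0, 30),
    LastValue("srv3", 3.0, 30),
    LastValue("srv4", 10.0, ONE_WEEK),  # the dead box, stuck on its week-old value
]

print(average_all(pool))         # 4.0: the stale value drags the average up
print(average_fresh(pool, 300))  # 2.0: only the 3 alive servers are counted
```

With these made-up numbers, the one stale value doubles the pool average, which is the kind of skew I'm trying to avoid.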
Am I doing something wrong? I've been trying to wrap my head around a workaround (to have it average the 3 alive ones, leaving out the 4th dead one) but between aggregates and calculated checks, I can't find the magic bullet.
Thanks!