Somewhat crazy idea, but has anyone tried out pure in-memory tables with Zabbix for the history* tables and/or anything else that gets a high volume of transactional writes?
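For concreteness, here's roughly what I have in mind on the MySQL side (just a sketch I haven't actually run, and the exact set of history* tables depends on your Zabbix version):

  -- raise the MEMORY-engine size limit first (my.cnf), e.g.:
  --   max_heap_table_size = 8G
  -- then switch the high-write history tables over:
  ALTER TABLE history      ENGINE=MEMORY;
  ALTER TABLE history_uint ENGINE=MEMORY;
  ALTER TABLE history_str  ENGINE=MEMORY;
  -- history_text and history_log use TEXT columns, which the MEMORY
  -- engine can't store, so those would have to stay on disk.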
The data will get wiped, obviously, if the box reboots.
However, at Amazon we used something very similar to this kind of architecture: highly distributed boxes that did in-memory storage of two weeks' worth of application datapoints. It wasn't used for trending and performance analysis, but it made the monitored datapoints cheap enough that the software developers could monitor as many points from their software as they wanted, and there was no policing of how heavy anyone's monitoring was. If a software team decided to monitor 100 datapoints per minute from any particular server (times the size of their deployed base of servers), the monitoring solution would basically scale to that just by adding RAM (there was a cost here, but nothing like the cost of scaling out IOPS to handle the same load).
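To put rough numbers on that (my own back-of-envelope, not Amazon's actual figures): 100 datapoints per minute is about 2 million rows per server over two weeks (100 x 60 x 24 x 14), so at something like 100 bytes per stored value that's only on the order of 200 MB of RAM per monitored server -- trivial to add, compared to provisioning disks that can sustain the same random write rate.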
So I've been wondering whether something similar could be done with Zabbix, and whether anyone has tried it. I'd also set up lighter-weight monitoring of the critical trending data (CPU, network, I/O, memory, etc.) that would be persistent on separate servers.
I'm not sure how to handle upgrades -- it might be possible to set up slaves, upgrade the offline one, and fail back and forth to keep the data from getting wiped.
Obviously there would need to be an understanding that the SLA for this monitoring data allows for occasional complete data loss (e.g. a cooling failure at the datacenter that forces machines to shut down would mean losing everything) -- but that would be the cost of having "unlimited" items that could be monitored from software.
I haven't tried this out yet, but architecturally, from 30,000 feet, this is exactly the kind of monitoring infrastructure that Amazon uses internally (and they have *very* good monitoring and metrics compared to what I've seen since I left).