I have two metrics: one is the running total of data used on a piece of networking equipment, and the other is a rate() derived from that total. For some reason, the rate() occasionally returns massive spikes into the 200+ Gbps range even though the underlying total barely changed.
For example, here is the underlying data for a sample time range when the error occurred:
And here is the spike from the exact same time range:
And here are the relevant configs for each of the metrics as well.

Anyone know what could be causing these massive spikes? The only thing I can think of is the interplay between the check interval (10s) and the window the rate() function is using (20s), but I don't know enough about the underlying rate() function to be sure, or to know how to resolve it.
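To make that suspicion concrete, here is a minimal Python sketch of how I imagine a windowed rate might behave. This is purely my assumption: I don't know how the real rate() is implemented, and the function name `windowed_rate_bps` and the sample values are hypothetical. It only shows that if a 20s window happens to contain just two samples whose timestamps are much closer together than the 10s check interval, a first-to-last slope gets inflated.

```python
from typing import List, Optional, Tuple

# (timestamp in seconds, counter value in bytes); hypothetical sample format
Sample = Tuple[float, float]


def windowed_rate_bps(samples: List[Sample], now: float,
                      window_s: float = 20.0) -> Optional[float]:
    """Assumed rate(): slope between the first and last sample that fall
    inside [now - window_s, now], converted from bytes/s to bits/s."""
    in_window = [s for s in samples if now - window_s <= s[0] <= now]
    if len(in_window) < 2:
        return None  # not enough points to compute a slope
    (t0, v0), (t1, v1) = in_window[0], in_window[-1]
    if t1 <= t0:
        return None  # avoid dividing by zero or a negative interval
    return (v1 - v0) * 8 / (t1 - t0)


# Evenly spaced checks: 1,000,000 bytes every 10 s.
steady = [(0.0, 0.0), (10.0, 1_000_000.0), (20.0, 2_000_000.0)]

# Same counter values, but the middle check is recorded late (made-up timing),
# so two samples land only 0.5 s apart.
jittered = [(0.0, 0.0), (19.5, 1_000_000.0), (20.0, 2_000_000.0)]

print(windowed_rate_bps(steady, now=20.0))    # window covers all three samples
print(windowed_rate_bps(jittered, now=25.0))  # window covers only the last two
```

In this sketch the counter grows by the same amount in both cases, but the second call divides one interval's worth of bytes by 0.5s instead of 10s, so the result is 20 times higher. I can't say whether this is what actually happens with my metrics, only that it is the kind of interaction I'm worried about.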
Thanks!