
How to Efficiently Manage Your Operational Data

Discover how to efficiently manage operational data for debugging and trend analysis, reducing costs and gaining real-time insights with Shoreline.io's approach.

I’m frequently asked how long you should keep operational data. It makes sense: holding onto that data can get really expensive. Below are the two cases I walk through in this video.


Case One: Real-Time Event Debugging

You need data because an event is happening right now. The data you want is real-time data, from the last second up to the last hour, at per-second granularity. This is the data you need to debug a live event, without having to go into box after box after box and query each one by hand.


The problem is that production operations are, at their core, a distributed-systems problem. Most companies handle this by pulling all that data into one central system. That creates lag, it prevents you from knowing what's going on right now, and it costs a lot of money: you end up storing a lot of metrics that just say everything is fine. It also creates inconsistency across your data silos. Your metrics live in a different place than your logs, which live in a different place from your resource inventory, and they're all slightly inconsistent with each other.


At Shoreline, we believe the ground truth is on the boxes themselves. We treat the distributed system like a distributed system: we query nodes directly, giving a real-time, per-second view of metrics, resources, and the output of Linux commands. We think that's necessary to debug in the moment.
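As a rough illustration of what that fan-out looks like, here is a minimal sketch in Python. This is not Shoreline's implementation; `query_node` is a hypothetical stand-in for a call to a lightweight agent running on each host.

```python
import asyncio

async def query_node(host: str, metric: str) -> tuple[str, float]:
    # Hypothetical stand-in for an RPC/HTTP call to the host's local
    # agent, which reads the latest per-second value off the box itself.
    await asyncio.sleep(0.01)  # simulate a network round-trip
    return host, 0.42          # placeholder metric value

async def fan_out(hosts: list[str], metric: str) -> dict[str, float]:
    # Query every node in parallel instead of funneling all data
    # through a central store first -- treating the distributed
    # system like a distributed system.
    results = await asyncio.gather(*(query_node(h, metric) for h in hosts))
    return dict(results)

if __name__ == "__main__":
    hosts = [f"node-{i}" for i in range(100)]
    latest = asyncio.run(fan_out(hosts, "cpu.utilization"))
    print(f"got {len(latest)} live samples in one parallel round-trip")
```

The point is that the freshest sample comes straight off each box, with no centralized ingestion pipeline adding lag in between.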



Case Two: Operational Reporting and Trend Analysis

There is also a need to do operational reporting over time, for example across the last week or month. For that, most of the data doesn’t need to be high-grain, but for the issues that do occur you need high-grain, high-fidelity information. (The rest you may not care as much about.) In these moments, what you need is accurate data: there are trends, patterns, and anomalies you want to track.

At Shoreline we deal with this by transforming the raw data into the frequency-time domain using wavelets, the same technology that underpins image codecs like JPEG 2000. That gives us really great compression, about 40x, which enables both high-resolution per-second data and trend analysis over time. The compression works because you're no longer looking for individual data points. What you're looking for is the shape of the curve: examining that curve at different points, and matching it against history to see if the same pattern occurred in the past.
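To make the idea concrete, here is a minimal sketch of wavelet compression using the open-source PyWavelets library. This is not Shoreline's pipeline; the `db4` wavelet, the decomposition depth, and the 2.5% keep-rate (roughly 1/40 of the coefficients, matching the ~40x figure above) are illustrative choices.

```python
import numpy as np
import pywt  # PyWavelets

def compress(signal: np.ndarray, keep: float = 0.025) -> list[np.ndarray]:
    """Wavelet-compress a metric series by keeping only the largest
    coefficients; keep=0.025 targets roughly 40x compression."""
    coeffs = pywt.wavedec(signal, "db4", level=6)
    flat = np.concatenate(coeffs)
    # Zero out everything below the magnitude threshold. The few
    # surviving coefficients capture the *shape* of the curve; in
    # practice you would store only the nonzero values and their
    # positions, which is what buys the compression.
    threshold = np.quantile(np.abs(flat), 1 - keep)
    return [pywt.threshold(c, threshold, mode="hard") for c in coeffs]

def decompress(coeffs: list[np.ndarray]) -> np.ndarray:
    return pywt.waverec(coeffs, "db4")

# One day of per-second samples for a single synthetic metric.
t = np.arange(86_400)
series = np.sin(t / 600) + 0.05 * np.random.randn(t.size)

approx = decompress(compress(series))[: series.size]
print("max reconstruction error:", np.max(np.abs(series - approx)))
```

Because the compressed representation preserves the curve's shape, comparing two series reduces to comparing their small sets of surviving coefficients, which is what makes matching against historical patterns cheap.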



Conclusion

At the end of the day, you need two things: 1) live, high-resolution data, and 2) cost-effective data you can retain for a long time. We believe people shouldn’t store raw operational data for a long time, because we don’t think they’ll look at it. But if it does need to be accessed, Shoreline makes it efficient to look at the data. One hundred metrics sampled every second costs Shoreline about 25 cents per host per year. That’s so inexpensive, we don’t even bother charging for it right now.
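For a sense of scale, here is a back-of-the-envelope calculation (the 8 bytes per raw sample is our assumption, not a Shoreline figure):

```python
# Rough storage math for one host over one year.
metrics, seconds_per_year = 100, 86_400 * 365
samples = metrics * seconds_per_year   # ~3.15 billion samples/host/year
raw_bytes = samples * 8                # ~25 GB/host/year uncompressed
compressed = raw_bytes / 40            # ~630 MB at the ~40x quoted above
print(f"{samples:,} samples -> {raw_bytes / 1e9:.1f} GB raw, "
      f"{compressed / 1e9:.2f} GB at 40x")
```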

Take a look at what our solution can do for your team.