Niall Murphy, former SRE at Google and Microsoft and co-author of the O'Reilly book Site Reliability Engineering, shares his experience of using Shoreline's Incident Automation Platform.
Hello, my name is Niall Murphy, and I’ve been working in Internet infrastructure for over 20 years.
As a result, I know production incidents are an infinitely renewable resource. Given a large enough or complicated enough system, they’re gonna keep happening. They’re very hard to handle, and almost no one is very good at handling them. I’ve often reflected on how much gets solved by ad-hoc automation, or by scripts engineers keep in their home directories to save the day at the last moment. That's fine, but it’s not a sustainable approach, and making your team consistently better at incidents remains very hard.
What I like about the Shoreline approach is that it’s not just an automation tool, it’s not just a production debugging tool, but it actually helps you to make your whole team better. It’s kind of a shared and common language for doing things in production. You get both to automate away the boring, repetitive toil that every operational situation has, and to use a fine scalpel to slice off a particular datacenter when the DB has gone haywire.
The first time I saw Shoreline, I was really impressed with the fact that you could write a simple expression to pick up the hosts you cared about, but also that this expression was resolved in real time, so when the instances change, the value changes too. Then the fact that you could stick a filter on that and say “all instances with the following names and with the following CPU usage” was really powerful. You could partition the problem space really quickly. Then attaching a named alert to that was really useful, and finally acting on it automatically made it really easy to handle the kind of low-grade toil that almost every system has to struggle with. You just make it go away until you’ve time to fix it properly later.
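To make that chain concrete (a live expression, a filter, a named alert, an automatic action), here is a minimal Python sketch. It is not Shoreline's own language or API: `Host`, `Selector`, `Alert`, the `web-` prefix, the 90% threshold, and the alert name are all hypothetical, chosen only to illustrate the idea of a query that is re-resolved each time it is used.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Host:
    # Hypothetical host record; a real system would pull this from live inventory.
    name: str
    cpu_percent: float


class Selector:
    """A host query that is re-evaluated on every resolve() call."""

    def __init__(self, source: Callable[[], Iterable[Host]]):
        self._source = source
        self._predicates: List[Callable[[Host], bool]] = []

    def where(self, predicate: Callable[[Host], bool]) -> "Selector":
        # Return a new selector so existing ones are never mutated.
        narrowed = Selector(self._source)
        narrowed._predicates = self._predicates + [predicate]
        return narrowed

    def resolve(self) -> List[Host]:
        # Nothing is cached: the source is read fresh each time.
        return [h for h in self._source() if all(p(h) for p in self._predicates)]


@dataclass
class Alert:
    name: str
    selector: Selector
    action: Callable[[Host], None]

    def check(self) -> List[str]:
        # Run the action on every host that currently matches.
        matched = self.selector.resolve()
        for host in matched:
            self.action(host)
        return [h.name for h in matched]


# Hypothetical fleet standing in for a live inventory.
fleet = [Host("web-1", 35.0), Host("web-2", 97.0), Host("db-1", 99.0)]
restarted: List[str] = []

# "All instances with the following names and with the following CPU usage."
hot_web = (
    Selector(lambda: fleet)
    .where(lambda h: h.name.startswith("web-"))
    .where(lambda h: h.cpu_percent > 90.0)
)

alert = Alert("web_cpu_high", hot_web, lambda h: restarted.append(h.name))

first = alert.check()
fleet.append(Host("web-3", 95.0))  # a new instance appears
second = alert.check()             # the same selector now picks it up
```

Because the selector holds a function rather than a snapshot of hosts, the second check sees the newly added instance without anyone redefining the alert.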
Shoreline’s the kind of solution that handles both: the stuff you just need to keep going for now, and the urgent issue you need to fix more quickly than you otherwise could.
Shoreline buys you time to do the right thing, instead of just stumbling from problem to problem.