
Unleashing the Power of AI in IT Operations

In this discussion, Evan and Anurag delve into Shoreline's AI Ops platform for incident response. Anurag, drawing on his experience leading reliable services at AWS, highlights the challenge of maintaining high availability in the face of rapid growth and emphasizes the role of innovative automation in delivering consistent service to demanding customers. He suggests that the first step in driving reliability for cloud services is understanding the root causes of incidents, and points to Shoreline's free tool designed to help with this. The conversation also features a case study of a major Shoreline client managing a 30,000-node fleet across multiple clouds and regions. Anurag shares how the client runs security checks and issue detection across thousands of instances simultaneously, treating the entire fleet as a single entity. For a deeper dive, the full video podcast is available on YouTube and LinkedIn.

Evan Kirstel and Anurag Gupta

Overview

In this conversation, Evan and Anurag talk about Shoreline's AI Ops platform for incident response.

Anurag covers what he learned while leading services at AWS, where rapid growth made reliability more challenging. Because of the scale, the team had to get innovative with automation to ensure high availability for customers with high expectations.

Today, keeping cloud services reliable is a challenge for many companies. Anurag recommends that the first step is simply to understand what is causing incidents, which gives a good blueprint for where to make repairs or shorten diagnostics. (Shoreline has a free tool to help with this.)
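As a rough illustration of that first step, here is a minimal Python sketch that tallies incidents by cause so the most frequent and most time-consuming ones stand out. It is not Shoreline's tool or method; the record fields (`cause`, `minutes_to_resolve`), the sample data, and the `rank_causes` helper are all hypothetical, and a real team would pull this data from its own ticketing or paging system.

```python
from collections import Counter

# Hypothetical incident records; field names and values are illustrative only.
incidents = [
    {"id": 1, "cause": "disk full", "minutes_to_resolve": 40},
    {"id": 2, "cause": "cert expired", "minutes_to_resolve": 90},
    {"id": 3, "cause": "disk full", "minutes_to_resolve": 30},
    {"id": 4, "cause": "memory leak", "minutes_to_resolve": 60},
    {"id": 5, "cause": "disk full", "minutes_to_resolve": 50},
    {"id": 6, "cause": "memory leak", "minutes_to_resolve": 45},
]


def rank_causes(records):
    """Return (cause, incident_count, total_minutes) tuples, most frequent first."""
    counts = Counter(r["cause"] for r in records)
    minutes = Counter()
    for r in records:
        minutes[r["cause"]] += r["minutes_to_resolve"]
    return [(cause, n, minutes[cause]) for cause, n in counts.most_common()]


if __name__ == "__main__":
    for cause, n, total in rank_causes(incidents):
        print(f"{cause}: {n} incidents, {total} minutes spent")
```

Even a simple ranking like this can suggest where an automated repair would pay off first, or where faster diagnostics would save the most time.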

Anurag also tells the story of a large Shoreline customer that runs a fleet of 30,000 nodes across all three major clouds and 'umpteen' regions. "It's really cool for them to be able to run a command across 10,000 instances at once, to detect if something might be wrong or check for new security vulnerabilities. You can manage your fleet as though it were a single box. And that's cool."

Check out the full video podcast on YouTube or on LinkedIn.