Blog

Resources and insights

Read the latest on what we're up to and what we're most excited about.

DASH 2023
Event

DASH, by Datadog, is an annual two-day conference packed with hands-on learning and inspiration. Let’s build and scale the next generation of applications, infrastructure, security, and technical teams together.

How to Solve the Challenges of MELT Data at Scale
Video

The bigger the data set, the slower it is to analyze. With MELT data (metrics, events, logs, and traces), you need to be able to run a query at scale across your fleet and see what's happening in the live environment. That’s why, at Shoreline, we favor modeling the distributed system as a distributed system.

How to Reduce Alarm Noise
Video

In most companies, 50–80% of alarms are noisy. Employees learn to snooze them, which isn’t always the right thing to do. Wouldn't it be better to easily see your top issues each week, and which alarms might be set incorrectly?

Building a Culture Around Reliability
Video

It's not some other team's job to keep your service up, just as it's not some other team's job to fix your bugs or make sure your system doesn't have vulnerabilities. We all have to own it. That is what a culture of reliability requires.

How Shoreline Helps You Get a 4 9’s SLA
Video

Since we’re all running on similar infrastructure, when someone solves an issue, everyone should be able to benefit from the fix. That’s one of the ways we help our customers save time, reduce errors, and reach a four 9’s (99.99% availability) SLA.
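To put an SLA target in concrete terms, it can be translated into a downtime budget. The minimal Python sketch below does that arithmetic; it assumes a 365-day year and a 30-day month, and the helper name `downtime_budget_minutes` is hypothetical. At four 9’s (99.99%), the budget works out to about 52.6 minutes per year, or roughly 4.3 minutes per 30-day month.

```python
def downtime_budget_minutes(availability: float, period_days: float) -> float:
    """Hypothetical helper: allowed downtime (in minutes) for an availability target over a period."""
    total_minutes = period_days * 24 * 60
    # Everything not covered by the availability target is the downtime budget.
    return total_minutes * (1 - availability)


# Compare three, four, and five 9's, assuming a 365-day year and a 30-day month.
for nines, target in [(3, 0.999), (4, 0.9999), (5, 0.99999)]:
    per_year = downtime_budget_minutes(target, 365)
    per_month = downtime_budget_minutes(target, 30)
    print(f"{nines} nines: {per_year:.1f} min/year, {per_month:.2f} min/month")
```

Each extra nine cuts the budget by a factor of ten, which is why manual incident response alone struggles to keep up at four 9’s.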

A Guide to Building Reliable Systems
Video

Amazon S3's claim of 11 nines (99.999999999%) of durability promises near-immortal data storage, but real-world factors such as solar events and correlated failures challenge that durability. Understanding its limits is crucial for robust system design.
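The arithmetic behind the 11 nines figure is worth spelling out. At 99.999999999% annual durability, each object has roughly a 1-in-100-billion chance of being lost in a year, so storing 10,000,000 objects means an expected loss of about one object every 10,000 years (the same example AWS uses). The minimal sketch below assumes object failures are independent, and correlated failures are exactly what break that assumption. The function name `expected_annual_object_loss` is hypothetical.

```python
def expected_annual_object_loss(num_objects: int, nines: int) -> float:
    """Hypothetical helper: expected objects lost per year for a durability of `nines` nines."""
    # 11 nines (99.999999999%) -> per-object annual loss probability of 1e-11.
    annual_loss_probability = 10.0 ** -nines
    # Multiplying by the object count is only valid if objects fail independently;
    # correlated failures (e.g., one event hitting many devices) invalidate this.
    return num_objects * annual_loss_probability


losses_per_year = expected_annual_object_loss(10_000_000, 11)
print(f"Expected objects lost per year: {losses_per_year:.1e}")
print(f"About one object lost every {1 / losses_per_year:,.0f} years")
```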

Decoding Taylor Swift’s Ticketmaster Debacle
Video

What can we learn from Ticketmaster's Taylor Swift debacle? Ticketmaster faced unprecedented demand that crashed its site for many hours. This could have been avoided if it had designed a reliable service that behaves like an escalator rather than an elevator: a stalled escalator still works as stairs, while a stalled elevator stops everyone.

SLC DevOpsDays 2023
Event

Every year, we look forward to connecting with our local DevOps community at SLC DevOpsDays: sharing and learning from experts in our community, and working with the DevOps thought leaders who visit our event. We are very excited to be back in Salt Lake City on March 14–15, 2023.

KubeCon '23
Event

The Cloud Native Computing Foundation’s flagship conference gathers adopters and technologists from leading open source and cloud native communities in Chicago, Illinois, from November 6–9, 2023.

AWS Summit New York 2023
Event

Join us in person at AWS Summit New York to see how your peers and competitors are using the cloud to their advantage, and learn how you can use AWS to jump-start, grow, or supercharge your business and your career. AWS Summit New York is a free event.

Monitorama 2023
Event

Monitorama brings together the brightest minds from the open source development and operations communities to keep pushing the boundaries of observability software and practices, all while having a great time in a casual setting. Join us in Portland, Oregon, June 26–28, 2023.

Unleashing the Power of AI in IT Operations
Podcast

In a compelling discussion, Evan and Anurag delve into the intricacies of Shoreline's AIOps platform for incident response. Drawing on his experience leading reliable services at AWS, Anurag highlights the challenge of maintaining high availability during rapid growth and emphasizes the role of automation in delivering consistent service to demanding customers. He suggests that the first step in driving reliability for cloud services is understanding the root causes of incidents, and points to a free Shoreline tool designed to help with this. The conversation also features a case study of a major Shoreline customer managing a 30,000-node fleet across multiple clouds and regions: Anurag explains how the customer runs security checks and detects issues across thousands of instances at once by treating the entire fleet as a single entity. The full video podcast is available on YouTube and LinkedIn.

About Company Values
Video

Part of the reason to create a company is to create the environment you want to be in. So it’s important that your interview process reflects your values. Otherwise, the sheer number of people joining will dilute them.

Risks of Automation vs. Human Errors
Video

Automation is risky: errors in remediation code could worsen an outage. While that’s true, we also know that human error causes five times more incidents than automation does. You can fix code. You can't fix people.

Is Automation Too Time-Consuming?
Video

“Automation takes too much time. We're too busy fighting fires to think about it.” The problem with this approach is that 48% of incidents are straightforward and repetitive. Don't have people fix them manually. Teach the computer how to do it.

How to Manage Failure without Wasting Resources
Video

How can you make better use of the resources you keep in reserve for failover? Here's how we put our failover capacity to work on useful background activity that can be paused or deferred when a failure happens, so those resources aren't sitting idle until things hit the fan.

How to Reduce Waste for Unexpected Demands
Video

Shoreline's back ends have low utilization most of the time. But once an hour, we pull telemetry data from all agents, causing a spike in CPU, memory, and network utilization. See how we recognize resources over-provisioned for these demand spikes as waste, and eliminate it.

Slack vs. Waste
Video

Waste is when resources are heavily over-provisioned, underutilized, or not used at all. Slack can look like the same thing, but you create it on purpose. Understanding the difference is important for driving costs down.

Why You Should Automate Production Ops
Video

Most on-call issues are commonplace: they happen again and again. It’s important to automate them because automation is a one-time investment that doesn’t make mistakes and stays with you forever.

TigerGraph: Scaling in the Cloud with a Small Ops Team
Webinar

Shoreline founder and CEO Anurag Gupta joins Dr. Jay Yu, TigerGraph's VP of Product and Innovation, in this webinar hosted by DevOps.com to discuss ways to scale cloud operations quickly without incurring high costs or continually expanding the cloud DevOps team.

5 Ways to Prevent an Outage
Article

The main challenge in preventing outages is that components such as disks, nodes, and networks inevitably break down. To mitigate this, companies need to acknowledge human error as an unavoidable factor, especially when people enter numerous commands manually every day. Investigating how minor errors can cause significant damage, and putting safeguards and redundancies in place, are essential steps to reduce the risk and impact of outages.

DASH
Event

DASH, by Datadog, is an annual conference about building and scaling the next generation of applications, infrastructure, security, and technical teams, held in 2022 at the Javits Center North in New York.

KubeCon ’22
Event

The Cloud Native Computing Foundation’s flagship conference gathers adopters and technologists from leading open source and cloud native communities in Detroit, Michigan, from October 24–28, 2022.

re:Invent
Event

For 10 years, the global cloud community has come together at re:Invent to meet, get inspired, and rethink what's possible. Join us again this year in Las Vegas for the biggest, most comprehensive, and most vibrant event in cloud computing.

Shoreline Incident Automation Overview
Video

Shoreline’s Incident Automation Platform was built to reduce manual and repetitive work, so that you can repair issues faster, increase team productivity, and eliminate thousands of hours of degraded service.