Our videos

Approaches and tips

Learn from our quick explanations, demos, and best practices.

How to Solve the Challenges of MELT Data at Scale

The bigger the data set, the slower it is to analyze. For MELT (metrics, events, logs, and traces) data, you need to be able to execute a query at scale across your fleet and see what's going on in the live environment. That's why, at Shoreline, we favor modeling the distributed system as a distributed system.

How to Reduce Alarm Noise

In any company, 50-80% of the alarms are noisy. Employees get trained to snooze these alarms – which isn’t always the right thing to do. Wouldn't it be better if you could easily see which are your top issues each week, and which alarms might be set incorrectly?

Building a Culture Around Reliability

It's not some other team's job to keep your service up. Just like it's not some other team's job to fix your bugs or make sure that your system doesn't have vulnerabilities. We all have to own it. That is what a culture of reliability requires.

How Shoreline Helps You Get a 4 9’s SLA

Since we’re all sitting on similar infrastructure, if someone solves an issue, everyone should be able to benefit from it. That’s one of the ways we help our customers to save time, reduce errors, and get to a four 9’s SLA.
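To put a four 9's target in concrete terms, here is a minimal sketch of the downtime budget that 99.99% availability leaves you. The function name and the choice of a 365-day year and a 30-day month are assumptions made for illustration, not part of Shoreline's product.

```python
# Downtime budget for an availability target expressed as a number of nines.
# Assumption: a 365-day year and a 30-day month; names are illustrative only.

MINUTES_PER_YEAR = 365 * 24 * 60
MINUTES_PER_30_DAYS = 30 * 24 * 60


def downtime_budget_minutes(nines: int, period_minutes: float) -> float:
    """Return the allowed downtime, in minutes, over the given period."""
    unavailability = 10.0 ** (-nines)  # four nines -> 0.0001
    return period_minutes * unavailability


yearly = downtime_budget_minutes(4, MINUTES_PER_YEAR)
monthly = downtime_budget_minutes(4, MINUTES_PER_30_DAYS)
print(f"{yearly:.2f} min/year, {monthly:.2f} min/30 days")
# prints "52.56 min/year, 4.32 min/30 days"
```

With under five minutes of budget per month, a single manually handled incident can use up the whole allowance, which is why time to repair matters so much at this level.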

A Guide to Building Reliable Systems

Amazon S3's 11 nines claim promises near-immortal data storage, but real-world factors like solar events and correlated failures challenge this durability. Understanding the limits is crucial for robust system design.
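As a rough illustration of what 11 nines means on paper, the sketch below computes the expected object loss for a given number of stored objects. It assumes each object is lost independently with an annual probability of 1e-11, which is exactly the assumption that correlated failures break. The function name is hypothetical.

```python
# Expected annual object loss under an 11 nines durability figure.
# Assumption: losses are independent per object, which correlated
# failures violate in practice. Names here are illustrative only.

ANNUAL_LOSS_PROBABILITY = 1e-11  # 1 - 0.99999999999


def expected_objects_lost_per_year(object_count: int) -> float:
    """Return the expected number of objects lost in one year."""
    return object_count * ANNUAL_LOSS_PROBABILITY


lost = expected_objects_lost_per_year(10_000_000)
years_per_loss = 1 / lost
print(f"about one object lost every {years_per_loss:.0f} years")
# prints "about one object lost every 10000 years"
```

The number looks reassuring only as long as failures stay independent. A single correlated event can remove many objects at once, and this simple model does not capture that.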

Decoding Taylor Swift’s Ticketmaster Debacle

What can we learn from the Ticketmaster (Taylor Swift) debacle? Ticketmaster experienced unprecedented demand that crashed its site for many hours. If the service had been designed to behave like an escalator instead of an elevator, this could have been avoided.

About Company Values

Part of the reason to create a company is to create the environment you want to be in. So it’s important that you reflect your values in your interview process. Otherwise, the sheer number of people joining will dilute things.

Risks of Automation vs. Human Errors

Automation is risky. Errors in the remediation code could worsen an outage. While that’s true, we also know that human error causes 5x more incidents than automation. You can fix code. You can't fix people.

Is Automation Too Time-Consuming?

"Automation takes us too much time. We're way too busy fighting fires to think about it." The problem with this approach is that 48% of incidents are straightforward and repetitive. Don't have people fix them manually. Teach the computer how to do it.

How to Manage Failure without Wasting Resources

How can you make better use of the resources you set aside for failover? Here's how we put capacity reserved for failover to work on useful background activity that can be paused or deferred when a failure happens.

How to Reduce Waste for Unexpected Demands

Shoreline's back ends run at low utilization most of the time. But once an hour, we pull telemetry data from all agents, which causes a spike in CPU, memory, and network utilization. See how we turn the resources over-provisioned for demand spikes into waste, and then eliminate it.

Slack vs. Waste

Waste is when resources are heavily over-provisioned, underutilized, or not utilized at all. Slack looks like the same thing, but you create it on purpose. Understanding the difference is important for driving costs down.

Why You Should Automate Production Ops

Most on-call issues are commonplace, which means they happen again and again. It’s important to automate the fixes for these issues because automation is a one-time investment, doesn’t make mistakes, and stays with you forever.

Shoreline Incident Automation Overview

Shoreline’s Incident Automation Platform was built to reduce manual and repetitive work, so that you can repair issues faster, increase team productivity, and eliminate thousands of hours of degraded service.