Fleetwide Repairs

Safely fix incidents across your entire fleet, with less overhead, and with fewer errors.

Act on the entire production fleet as if it were a single box.

With Shoreline’s agents installed fleet-wide, SREs can instantly and easily execute the actions they need, wherever they need to be run.

  • Never SSH into individual boxes
  • Never load and manage versions of shell scripts on each node
  • Never manage permissions and credentials box by box

See how it works:

Arm your production ops team with real-time debug data, and guide them with pre-approved repair actions.

"Shoreline is 10X faster than Ansible."

Site Reliability Engineer, Observability Company

Below is an example of a Shoreline command in action:

pod | app="cc_processor" | filter(pod_cpu_usage > 80) | `bouncepods.sh`

This command has three parts: a resource query, a metric query, and a Linux command. The first part, pod | app="cc_processor", selects the pods tagged cc_processor, which run the credit card processing app. The second part filters on a metric, CPU usage, to narrow the set to pods with CPU usage above 80. Finally, Shoreline executes the bouncepods.sh script on each pod that meets both criteria so that Kubernetes can redistribute the pods. This is how you can run a precise action across your entire fleet in seconds.
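To make the three stages concrete, here is a minimal Python sketch of the same select, filter, and act flow. It is an illustration only, not Shoreline's implementation: the Pod class, the helper functions, and the sample fleet are all hypothetical, and the bounce function merely stands in for bouncepods.sh.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    # Hypothetical stand-in for a pod resource; not a Shoreline type.
    name: str
    tags: dict
    pod_cpu_usage: float


def select_by_tag(pods, key, value):
    """Stage 1, resource query: keep pods whose tag matches."""
    return [p for p in pods if p.tags.get(key) == value]


def filter_by_metric(pods, predicate):
    """Stage 2, metric query: keep pods whose metric satisfies the predicate."""
    return [p for p in pods if predicate(p)]


def run_action(pods, action):
    """Stage 3, action: run the action on every remaining pod."""
    return {p.name: action(p) for p in pods}


def bounce(pod):
    # Placeholder for the bouncepods.sh script; it only reports what it would do.
    return f"bounced {pod.name}"


# Made-up sample fleet for the illustration.
fleet = [
    Pod("cc-1", {"app": "cc_processor"}, 93.0),
    Pod("cc-2", {"app": "cc_processor"}, 41.0),
    Pod("web-1", {"app": "storefront"}, 97.0),
]

targets = filter_by_metric(
    select_by_tag(fleet, "app", "cc_processor"),
    lambda p: p.pod_cpu_usage > 80,
)
results = run_action(targets, bounce)
```

Only cc-1 passes both stages in this sample: cc-2 has the right tag but low CPU usage, and web-1 has high CPU usage but the wrong tag.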

Execute actions across the fleet in parallel.

Instead of logging into individual boxes, you can apply a Shoreline command to every host in your fleet, or only to the specific pods whose metrics have hit specified states.
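The general idea of fanning one action out across many hosts at once can be sketched with Python's standard thread pool. This is an assumption-laden illustration, not how Shoreline's agents work: run_everywhere, check_disk, and the host names are hypothetical, and the action runs locally rather than on a remote machine.

```python
from concurrent.futures import ThreadPoolExecutor


def run_everywhere(hosts, action, max_workers=8):
    """Run the action for every host concurrently and collect results by host."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the input hosts.
        outputs = pool.map(action, hosts)
        return dict(zip(hosts, outputs))


def check_disk(host):
    # Placeholder action; a real one would run a command on the host.
    return f"{host}: ok"


hosts = [f"host-{i}" for i in range(5)]
report = run_everywhere(hosts, check_disk)
```

The caller writes the action once and gets back one result per host, which is the same shape of interaction as running a single command against the whole fleet.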

Empower your SRE team to fix incidents quickly with guardrails.

Shoreline empowers with limits. No more SSH-ing into boxes with unlimited access. Shoreline Notebooks guide users through resolving issues while limiting risk with role-based access control (RBAC), blast radius constraints, and circuit breakers.
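As a rough sketch of how these three kinds of guardrail can fit together, the Python below checks a role against an allow list, caps the share of the fleet a single run may touch, and stops early after repeated failures. Everything here is hypothetical, including the role names, the action names, the 25% limit, and the failure threshold; it does not describe Shoreline's actual policy model or API.

```python
class GuardrailError(Exception):
    """Raised when a request is rejected before any action runs."""


# Hypothetical policy: which roles may run which pre-approved actions.
ALLOWED_ACTIONS = {
    "sre": {"bounce_pods", "clear_tmp"},
    "support": {"clear_tmp"},
}


def guarded_run(role, action_name, action, targets, fleet_size,
                max_blast_fraction=0.25, max_failures=2):
    # RBAC: the role must be allowed to run this action.
    if action_name not in ALLOWED_ACTIONS.get(role, set()):
        raise GuardrailError(f"role {role!r} may not run {action_name!r}")

    # Blast radius: refuse to touch too large a share of the fleet at once.
    if len(targets) > max_blast_fraction * fleet_size:
        raise GuardrailError("target set exceeds the blast radius limit")

    # Circuit breaker: stop as soon as failures reach the threshold.
    done, failures = [], 0
    for target in targets:
        try:
            action(target)
            done.append(target)
        except Exception:
            failures += 1
            if failures >= max_failures:
                break
    return done, failures
```

For example, guarded_run("support", "bounce_pods", ...) is rejected by the role check before anything runs, while an action that keeps failing is abandoned after two failures instead of being attempted on every remaining target.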

Use pre-approved actions to quickly and safely remediate incidents.

By repairing common production operations issues with pre-approved actions drawn from a library of proven best practices, the on-call team can safely and quickly reduce MTTR and improve overall reliability.

Monitor real-time alerts, view full Alarm details, and execute linked repair commands.

Learn more: see the documentation

Ready to get started?

Shoreline helps you eliminate repetitive tickets and increase your availability at the same time. Get started today with a free trial.