Site Reliability Engineering: How Google Runs Production Systems
Authors: Betsy Beyer, Chris Jones, Jennifer Petoff, Niall Richard Murphy

Building and operating distributed systems is fundamental to large-scale production infrastructure, but doing so in a scalable, reliable, and efficient way requires a lot of trial and error. In this collection of essays and articles, three Site Reliability Engineers from Google explain how the company has successfully navigated these waters over the past decade. You'll learn how Google continuously deploys and monitors some of the largest software systems in the world, how its Site Reliability Engineering team learns and improves after outages, and how the team balances risk-taking against reliability with error budgets.