Site Reliability Engineer (SRE) vs. DevOps: What’s the Difference?

Maria Homann

May 4, 2021

Site Reliability Engineering (SRE) is a term that has gained a lot of traction in recent years. Learn what it is, how it’s different from DevOps, and how you can use it as a part of your software development strategy.

Every time a new concept emerges in software development, it sparks interest and prompts the questions: How is this different from what we’re already doing? And can we do things better if we follow these principles?

SRE is no different.

In this blog post, we’ll take a look at Site Reliability Engineering vs DevOps, and the similarities and differences between the two.

If you’re interested in learning what it takes to implement SRE, join our webinar on this topic and learn how Credit Suisse have transitioned towards an SRE culture by accelerating automation and empowering people with self-service tools.

What is Site Reliability Engineering (SRE)?

SRE stands for Site Reliability Engineering or Site Reliability Engineer. It’s a set of practices and a culture, as well as a job role.

The term was coined at Google, and SRE is commonly considered a slightly evolved version of DevOps, perhaps because Ben Treynor, the founder of Google’s Site Reliability Team, said that SRE is “what happens when a software engineer is tasked with what used to be called operations,” which could also be said about DevOps.

But what does that actually mean?

It means that, as a Site Reliability Engineer, it’s your job to align two different, sometimes conflicting, goals: to get code developed and shipped fast, and to ensure that the code is highly reliable.

In other words, the SRE deals with contrasting endeavors such as stability and agility, quality and speed, proactivity and reactivity, pre-production and post-production, and innovation and operation.

SREs do this by splitting their time roughly equally between development and operations (Google caps operational work at 50% of an SRE’s time). They must ensure that non-functional, operational requirements such as performance, security, availability, and maintainability are met in product design and development.

They must also consistently build automation into their work in order to optimize the software development pipeline.

What is DevOps?

DevOps is, like SRE, a set of practices and a culture, as well as a job role.

The mindset behind DevOps is that if you build software, you also own it. By removing the division of responsibility between those who build software and those who run it, it becomes easier to ensure that software keeps a high level of quality from development to release and that it moves with speed and agility through the DevOps pipeline.

To better understand what DevOps is, we’ll break the concept down into five core principles:

1. Break down silos

A core role of DevOps is to break down barriers between teams or departments and encourage communication and collaboration for improved development pipelines.

DevOps personnel have unique insight into the full pipeline and bring together knowledge from the development and operations sides so that developers can gain more insight into operations and vice versa.

2. Accept failure, and fail fast

Software fails sometimes. That’s why we test. So instead of pretending that this doesn’t happen, DevOps aims to find ways to mitigate risk and to ensure that the same mistakes don’t happen twice.

This is one of the reasons why test automation is core to DevOps: it helps find those mistakes, and find them early in the release cycle, where they are cheaper to fix.

3. Introduce change gradually

Instead of deploying large changes to production, the idea with DevOps is to deploy smaller, incremental changes, and to do so more frequently.

This makes it easier to review changes, contain any bugs that occur as a result, and roll back those changes when needed.
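
To make the idea concrete, here is a minimal sketch of a staged rollout in Python. It is an illustration rather than a reference implementation: the function name, the stage percentages, the 1% error threshold, and the `error_rate_at` callback are all assumptions made for this example, and a real pipeline would read its error rate from a monitoring system.

```python
from typing import Callable, Sequence


def staged_rollout(
    stages: Sequence[int],
    error_rate_at: Callable[[int], float],
    max_error_rate: float = 0.01,
) -> int:
    """Return the traffic percentage left on the new version (0 = rolled back).

    stages: increasing traffic percentages, e.g. (1, 10, 50, 100).
    error_rate_at: hypothetical callback that reports the observed error
        rate (0.0 to 1.0) once the new version serves that percentage.
    max_error_rate: assumed threshold above which the change is rolled back.
    """
    if not stages:
        raise ValueError("at least one rollout stage is required")
    for percent in stages:
        if error_rate_at(percent) > max_error_rate:
            return 0  # roll back: a small, recent change is cheap to undo
    return stages[-1]  # every stage stayed within the threshold


# Simulated measurements: the change misbehaves once it reaches 50% of traffic.
observed = {1: 0.001, 10: 0.002, 50: 0.08, 100: 0.08}
print(staged_rollout((1, 10, 50, 100), observed.__getitem__))  # 0
```

In this simulated run the problem surfaces at the 50% stage, so the change is rolled back before it ever reaches all users.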

4. Leverage tools and automation

In order to make frequent releases possible, and to ensure a high level of quality within those releases, optimization of the release pipeline is critical.

It’s the role of DevOps to build the release pipeline with tools and automation, which increase speed and accuracy while minimizing the risk of human error.

Automation removes unnecessary manual work that is repetitive and error-prone, and allows for much quicker feedback loops. Without automation, it’s simply not possible to deliver at the pace required in today’s software production, and automation should therefore be considered a prerequisite for successful DevOps practices.

5. Measure everything

Outcomes are difficult to evaluate if you can’t measure them. DevOps teams must use data and metrics to understand the outcomes of the initiatives they take, such as using automation for efficiency gains.
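
As a small example of what measuring can look like, the sketch below computes an availability figure from a batch of HTTP status codes. The function name and the choice to count only 5xx responses as failures are assumptions made for illustration; your own definition of a "good" request may differ.

```python
from typing import Iterable


def availability(status_codes: Iterable[int]) -> float:
    """Fraction of requests that did not end in a server error (5xx)."""
    codes = list(status_codes)
    if not codes:
        raise ValueError("no requests observed")
    ok = sum(1 for code in codes if code < 500)
    return ok / len(codes)


# Made-up sample: 997 successes, 1 client error, 2 server errors.
sample = [200] * 997 + [404] + [500] * 2
print(availability(sample))  # 0.998
```

A number like this only becomes useful once it is tracked over time and compared against a target the team has agreed on.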

Learn more about DevOps in our blog post and whitepaper: What is DevOps?

So how is SRE different from DevOps?

Truth be told, there aren’t many differences between the two concepts. They both build on the same principles.

Unlike waterfall and DevOps, which are competing approaches, SRE and DevOps aren’t competing methods. Rather, SRE and DevOps are two sides of the same coin with a shared purpose - to break down organizational barriers to deliver better software, faster.

However, if you think of DevOps as the underlying philosophy, you could say that SRE is the prescriptive way of putting that philosophy into practice - at least that’s how Google have defined it themselves.

Both concepts came into existence around the same time, and both from the same need - to bring the mindset and knowledge from development into operations and to bring the need for stability and reliability (hence the name) from operations back to development.

In essence, the main difference between Site Reliability Engineering and DevOps is that SRE is more practical, and DevOps is more philosophical.

Site Reliability Engineering vs DevOps - a side-by-side comparison:

DevOps	Site Reliability Engineering (SRE)
Break down organizational silos	Share ownership of product across teams
Accept failure and fail fast	Closely examine failures in order to ensure that they don’t happen twice, and plan for failure by incorporating failure costs into the budget
Introduce change gradually	Push changes out in small scale, and carefully test before a full-blown release
Leverage tools and automation	Consistently look for automation opportunities and try to implement tools that can optimize processes and remove manual work
Measure everything	Define and measure key performance indicators to track progress and health of systems
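
The "budget" in the table above is usually called an error budget in SRE practice: the amount of failure a service is allowed before it misses its reliability target. Here is a minimal sketch of that arithmetic, assuming a request-based target. The function name, the 99.9% target, and the request counts are made up for illustration.

```python
from fractions import Fraction


def error_budget_remaining(
    target: str, total_requests: int, failed_requests: int
) -> Fraction:
    """Return the unspent share of the error budget (negative if overspent).

    target: reliability target as a decimal string, e.g. "0.999" for 99.9%.
        A string is used so the arithmetic stays exact.
    """
    goal = Fraction(target)
    if not 0 < goal < 1:
        raise ValueError("target must be strictly between 0 and 1")
    allowed_failures = (1 - goal) * total_requests
    if allowed_failures == 0:
        raise ValueError("no requests observed, so there is no budget yet")
    return 1 - Fraction(failed_requests) / allowed_failures


# A 99.9% target over 1,000,000 requests allows 1,000 failures.
# With 250 failures so far, three quarters of the budget is left.
remaining = error_budget_remaining("0.999", 1_000_000, 250)
print(float(remaining))  # 0.75
```

When the remaining budget approaches zero, a team following this approach would typically slow down releases and focus on reliability; while plenty of budget is left, it can afford to ship changes faster.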

Ready to learn more about SRE and how you can use automation strategically as a part of your SRE practices? Join our webinar: How Credit Suisse have transitioned towards an SRE culture by accelerating automation and empowering people with self-service tools.