# Understanding SRE

URL:: https://share.snipd.com/episode/21a5570a-c634-4856-8ed8-4f3058b1fdd6
Author:: The Stack Overflow Podcast
## Highlights
> DevOps and SRE: The Philosophy of Product Development
> Key takeaways:
> - DevOps is a more overarching philosophy of running product development compared to SRE.
> - DevOps emphasizes short feedback cycles and frequent production deployment to measure effectiveness and operate fast under user feedback.
> - While DevOps provides a philosophical standpoint, it lacks concrete tools and organizational methodology.
> - SRE is considered a concrete implementation of the DevOps philosophy.
> - SRE can be seen as implementing the interface of DevOps in computer science terms.
> Transcript:
> Speaker 1
> And if you compare SRE with DevOps, then you're totally right in saying that DevOps is a more overarching philosophy of running product development. So with DevOps, you've got a philosophy of short feedback cycles where you are able to deploy to production frequently, you are able to measure the effectiveness of what you deployed on the users, and that way you are able to operate fast under the tight guidance of user feedback. And within that realm, you also need to operate the services reliably as well. And although DevOps tells you from the philosophical standpoint that you need to do this, it doesn't give you very concrete tools and organizational methodology in order to do so. So if you look into this, then you'll find early-create videos that compare DevOps and SRE and they tell you that SRE is a concrete implementation of the DevOps philosophy. So in computer science terms, class SRE implements the DevOps interface, so to speak. ([Time 0:04:08](https://share.snipd.com/snip/7a75754a-bb06-4bdc-9d82-b4fe3019ecb2))
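
The "class SRE implements DevOps" remark can be read quite literally. The sketch below is only an illustration of that metaphor: the method names and the practices returned by each method are assumptions chosen for the example, not something the episode spells out.

```python
from abc import ABC, abstractmethod


class DevOps(ABC):
    """The philosophy: says what has to be achieved, not how."""

    @abstractmethod
    def deploy_frequently(self) -> str: ...

    @abstractmethod
    def measure_user_impact(self) -> str: ...

    @abstractmethod
    def operate_reliably(self) -> str: ...


class SRE(DevOps):
    """One concrete implementation. The practices below are illustrative."""

    def deploy_frequently(self) -> str:
        return "small, gradual rollouts"

    def measure_user_impact(self) -> str:
        return "service level indicators measured from the user's side"

    def operate_reliably(self) -> str:
        return "manage by service level objectives"


if __name__ == "__main__":
    sre = SRE()
    print(sre.operate_reliably())
```

The abstract class cannot be instantiated on its own, which mirrors the point made in the transcript: the philosophy alone does not tell you what to do until something implements it.
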
> The Importance of Service-Level Objectives and Involving SRE Thinking in the Product Life Cycle
> Key takeaways:
> - The more developers are put on call for their services, the more incentives they have to implement reliability from the beginning.
> - Involving SRE thinking at the beginning of the product life cycle can contribute to reliability implementation and durability thinking.
> - When developers are involved in operations work, they are more likely to prioritize reliability in the design and development process of services.
> Transcript:
> Speaker 1
> The more you put developers on call for their services, the more incentives they will have to implement the reliability into the services from the beginning. So this is coming back to the question that you asked a couple of minutes ago. So is it only about really operating the stuff towards the end, or is it also more about involving the SRE thinking at the beginning of the product life cycle? And the more you put the actual developers who implement reliability into doing operations work, the more they will actually do the reliability implementation and durability thinking in the design process and development process of the services. ([Time 0:10:34](https://share.snipd.com/snip/771b4b93-2130-441b-b75f-721bfcce8654))
> The Role of SRE Organization in Running Services
> Transcript:
> Speaker 1
> And if you follow the original SRE book from Google, then you'll see that although they've got an entire SRE organization that is running the services, actually it never starts with the full SRE support from the SRE organization. So you actually need to convince the SRE organization within Google to help your product. And before they do this, you are responsible for running your service yourself. So that means you are totally in the "you build it, you run it" philosophy. And once you've later enlisted support from the SRE organization, if your services then fall below certain service level objectives that you have agreed between the development organization and the SRE organization, then the SRE organization will return the services to you and then you are back to "you build it, you run it". So that means that you, as the development organization, always have skin in the game of running services, even in the original SRE literature. And I would recommend doing so. ([Time 0:12:00](https://share.snipd.com/snip/631df172-34b8-4273-a745-910f59afad05))
> The Core Practice of Managed Service Level Objectives in SRE
> Key takeaways:
> - Managing by service level objectives is a central principle in SRE methodology.
> - Defining and adopting service level objectives is crucial for quantifying reliability.
> - Adopting service level objectives allows SRE infrastructure to provide data for influencing prioritization of reliability.
> - Not all companies talking about SRE have adopted service level objectives, but it is essential.
> Transcript:
> Speaker 1
> If you look at the SRE principles, then one of the central principles there is to manage by service level objectives. Right. So if there are no service level objectives, then how can you manage? Then you've sort of picked bits and pieces out of the SRE methodology and you say that you are running the SRE way, but actually the core essence is still missing there. So therefore, I'd say one of the core practices that's on the rise now is to actually define and adopt and manage by service level objectives. I think this is absolutely key because without this, you cannot really quantify your reliability. Without this, you cannot have your SRE infrastructure provide you with data, short-term data, long-term data about what's reliable to which extent, what's not reliable to which extent and for which time period and so on. And therefore, you cannot present that data in application to your product management in order to influence prioritization of reliability stuff where there is least reliability at the moment and so on. But I think adopting service level objectives and really leaning into that idea and really bringing folks to the same table in order to get that done, I think this is really core. And it's surprising to me that not all companies are doing this who are talking about SRE. I think this is absolutely essential. ([Time 0:15:38](https://share.snipd.com/snip/af412876-e849-439c-8718-c2f59bffe893))
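
To make "quantify your reliability" concrete, here is a minimal sketch of checking a request-based availability SLO over a short and a long window. The `SLO` class, the function names, the 99.9% target, and the request counts are all hypothetical. The error-budget calculation is a common SRE convention that the episode does not mention by name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    name: str
    target: float  # e.g. 0.999 means 99.9% of requests should succeed

    def __post_init__(self) -> None:
        if not 0.0 < self.target < 1.0:
            raise ValueError("target must be strictly between 0 and 1")


def availability(good: int, total: int) -> float:
    """Measured reliability: fraction of requests that succeeded."""
    if total == 0:
        return 1.0  # assumption: a window with no traffic counts as compliant
    return good / total


def error_budget_remaining(slo: SLO, good: int, total: int) -> float:
    """Share of the window's error budget still unspent (negative if overspent)."""
    if total == 0:
        return 1.0
    allowed_bad = (1.0 - slo.target) * total
    actual_bad = total - good
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    slo = SLO(name="checkout availability", target=0.999)
    # Hypothetical (good, total) request counts per window.
    windows = {"last 24h": (99_950, 100_000), "last 30d": (2_996_000, 3_000_000)}
    for label, (good, total) in windows.items():
        measured = availability(good, total)
        status = "met" if measured >= slo.target else "missed"
        budget = error_budget_remaining(slo, good, total)
        print(f"{label}: {measured:.4%} ({status}), budget left {budget:.0%}")
```

Numbers like these, split by service and time window, are the kind of data that can be brought to product management when arguing for reliability work.
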
## New highlights added October 16, 2023 at 6:16 PM
> How to build a good SRE program
> Key takeaways:
> - A successful SRE program should establish a joint understanding of reliability objectives at both an organizational and team level.
> - Transparent tracking of goals and service level objectives is important to assess if reliability goals are being met.
> - Continuous dialogue within teams and at the organizational level is necessary to evaluate if reliability objectives are being fulfilled and if customer satisfaction is being achieved.
> Transcript:
> Speaker 1
> So what I would suggest, a successful SRE program will first of all establish a joint understanding of the reliability objectives that we've got as an organization and then as a set of services owned by a particular team. So each team will have an understanding of their reliability goal. Number one. So then number two, what will happen is that there will be transparent tracking of those goals, whether the services are fulfilling the goals or service level objectives or not. And then there will be a rather continuous dialogue within each team and also at the higher level within the organization, whether we are fulfilling the reliability objectives, the SLOs that we set for ourselves and whether despite fulfilling them, we are still getting customer complaints or the other way around. ([Time 0:17:45](https://share.snipd.com/snip/91eb1818-bdc1-4a90-9008-ab3c98863310))
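
The last point, comparing SLO status with what customers actually report, can be sketched as a small review helper. Everything here is an assumption for illustration: the `Finding` names, the complaint threshold, and the interpretation attached to each case are one plausible reading of "or the other way around", not guidance from the episode.

```python
from enum import Enum


class Finding(Enum):
    HEALTHY = "SLO met, customers satisfied"
    SLO_TOO_LOOSE = "SLO met, but customers still complain"
    SLO_TOO_STRICT = "SLO missed, but no customer complaints"
    UNRELIABLE = "SLO missed and customers complain"


def review(slo_met: bool, complaints: int, threshold: int = 0) -> Finding:
    """Classify one service for the recurring reliability review.

    `threshold` is a hypothetical number of complaints tolerated per period.
    """
    unhappy = complaints > threshold
    if slo_met:
        return Finding.SLO_TOO_LOOSE if unhappy else Finding.HEALTHY
    return Finding.UNRELIABLE if unhappy else Finding.SLO_TOO_STRICT


if __name__ == "__main__":
    # Hypothetical services: (SLO met?, complaints this period)
    services = {"search": (True, 0), "checkout": (True, 7), "export": (False, 0)}
    for name, (met, complaints) in services.items():
        print(f"{name}: {review(met, complaints).value}")
```

The two mismatch cases are the ones worth discussing in the team and organization-level dialogue, since they suggest the SLO itself may need to change rather than the service.
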
## New highlights added October 20, 2023 at 10:57 AM
> Episode AI notes
> 1. DevOps is a more overarching philosophy of running product development compared to SRE. DevOps emphasizes short feedback cycles and frequent production deployment to measure effectiveness and operate fast under user feedback. While DevOps provides a philosophical standpoint, it lacks concrete tools and organizational methodology. SRE is considered a concrete implementation of the DevOps philosophy. SRE can be seen as implementing the interface of DevOps in computer science terms.
> 2. The more developers are put on call for their services, the more incentives they have to implement reliability from the beginning. Involving SRE thinking at the beginning of the product life cycle can contribute to reliability implementation and durability thinking. When developers are involved in operations work, they are more likely to prioritize reliability in the design and development process of services.
> 3. The SRE organization at Google does not provide full support initially. You need to convince the SRE organization to help your product. Initially, you are responsible for running your service yourself. Once you enlist support from the SRE organization, certain service level objectives are agreed upon. If the service level objectives are not met, the services are returned to you. It is recommended to have a stake in running services as a development organization.
> 4. Managing by service level objectives is a central principle in SRE methodology. Defining and adopting service level objectives is crucial for quantifying reliability. Adopting service level objectives allows SRE infrastructure to provide data for influencing prioritization of reliability. Not all companies talking about SRE have adopted service level objectives, but it is essential.
> 5. A successful SRE program should establish a joint understanding of reliability objectives at both an organizational and team level. Transparent tracking of goals and service level objectives is important to assess if reliability goals are being met. Continuous dialogue within teams and at the organizational level is necessary to evaluate if reliability objectives are being fulfilled and if customer satisfaction is being achieved. ([Time 0:00:00](https://share.snipd.com/episode-takeaways/f95ace80-737a-4ffa-b932-19d62c15334a))
---
Title: Understanding SRE
Author: The Stack Overflow Podcast
Tags: readwise, podcasts
date: 2024-01-30
---
## Highlights
> DevOps and SRE: The Philosophy of Product Development
> Summary:
> DevOps is an overarching philosophy of running product development that focuses on short feedback cycles, frequent production deployment, and measuring effectiveness. However, it lacks concrete tools and methodology. On the other hand, SRE is a concrete implementation of the DevOps philosophy, providing the tools and organizational methodology needed for reliable service operation. In computer science terms, SRE implements the DevOps interface.
> The Importance of Service-Level Objectives and Involving SRE Thinking in the Product Life Cycle
> Summary:
> Developers who are put on call for their services have more incentive to prioritize the reliability of those services from the start. This emphasizes the importance of involving SRE thinking at the beginning of the product life cycle. When developers are directly responsible for operations work and reliability implementation, they are more likely to incorporate durability thinking into the design and development process.
> The Role of SRE Organization in Running Services
> Summary:
> According to the original SRE book from Google, the SRE organization does not provide full support for running a service from the start. You first need to convince them to take it on, and until then you are responsible for running your own service. Even after you enlist their support, if the service falls below the agreed service level objectives, the SRE organization will return it to you. The development organization therefore always has skin in the game of running services, even in the original SRE literature.
> The Core Practice of Managed Service Level Objectives in SRE
> Summary:
> To truly adopt the principles of SRE, it is crucial to define and manage by service level objectives (SLOs). Without SLOs, measuring reliability and influencing prioritization becomes impossible. Many companies claiming to follow SRE overlook this key practice, but it is essential for success.
> How to build a good SRE program
> Summary:
> A successful SRE program should establish a joint understanding of reliability goals within the organization. Each team should track their own goals transparently. Continuous dialogue should occur within teams and at the organizational level to ensure goals are being met and customer satisfaction is achieved.