Service Level Objectives (SLI, SLO, SLA) Explained Simply
Below is a concise and hopefully memorable framework for understanding SLI, SLO and SLA. The three concepts are hierarchical, so you can think of them as a pyramid: SLIs form the base, SLOs build on SLIs, and SLAs build on SLOs.
Service Level Indicator (SLI)
This answers the question: what are we going to measure? In one word: metrics.
Here are some common metrics for HTTP-based microservices:
- Response time (the time between sending a request and receiving its response)
- Throughput (the number of requests the system handles per unit of time, e.g. requests per second)
- Error rate (the ratio of failed requests to total requests)
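To make these three SLIs concrete, here is a minimal Python sketch that computes them from a batch of request records. It assumes a hypothetical `Request` record, counts only HTTP 5xx responses as failures, and uses the 95th-percentile (nearest-rank) response time as the latency SLI; your service may define any of these differently.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """One observed HTTP request (hypothetical record shape)."""
    latency_ms: float  # time between sending the request and receiving the response
    status: int        # HTTP status code returned


def percentile(values, p):
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def compute_slis(requests, window_seconds):
    """Compute the three example SLIs over a measurement window."""
    latencies = [r.latency_ms for r in requests]
    # Assumption: only server errors (5xx) count as failed requests.
    failed = sum(1 for r in requests if r.status >= 500)
    return {
        "p95_latency_ms": percentile(latencies, 95),
        "throughput_rps": len(requests) / window_seconds,
        "error_rate": failed / len(requests),  # failed / total
    }


requests = [
    Request(120.0, 200),
    Request(250.0, 200),
    Request(90.0, 500),
    Request(300.0, 200),
]
print(compute_slis(requests, window_seconds=2.0))
# {'p95_latency_ms': 300.0, 'throughput_rps': 2.0, 'error_rate': 0.25}
```

Note that the error rate divides by all requests, not just the successful ones, so it always falls between 0 and 1.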
Service Level Objective (SLO)
This answers the question: which values of our SLIs are acceptable? In one phrase: SLI + thresholds.
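An SLO pairs an SLI with a threshold. Typical examples are "95% of requests complete in under 300 ms" and "the error rate stays below 0.1%".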
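The sketch below expresses "SLI + thresholds" in Python. The `SLO` class, the target values, and the measured numbers are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """An SLI name plus the maximum acceptable value (hypothetical shape)."""
    sli: str
    max_value: float

    def is_met(self, measured: float) -> bool:
        return measured <= self.max_value


# Hypothetical objectives: p95 latency at most 300 ms, error rate at most 0.1%.
slos = [
    SLO("p95_latency_ms", 300.0),
    SLO("error_rate", 0.001),
]

# Hypothetical SLI values measured over some window.
measured = {"p95_latency_ms": 240.0, "error_rate": 0.004}


def evaluate(slos, measured):
    """Map each SLI name to True if its SLO is met, else False."""
    return {slo.sli: slo.is_met(measured[slo.sli]) for slo in slos}


print(evaluate(slos, measured))
# {'p95_latency_ms': True, 'error_rate': False}
```

Every threshold here is an upper bound. An SLI where higher is better, such as availability, would need the comparison flipped.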
Service Level Agreement (SLA)
This answers the question: what happens if we don't stay within our SLO thresholds? In one phrase: SLO + consequences.
Many software teams that serve only internal stakeholders don't have formal SLAs; SLAs are usually contracts between a company and its customers. For example, the SLA for Google Compute Engine entitles customers to financial compensation, in the form of credits, if Google fails to meet the commitments in the agreement.
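Here is a minimal Python sketch of "SLO + consequences": monthly uptime is checked against a tiered credit schedule. The tiers, percentages, and function names are hypothetical and are not taken from any real provider's SLA.

```python
# Hypothetical credit schedule, checked top to bottom:
# (minimum monthly uptime %, credit as % of the monthly bill).
CREDIT_TIERS = [
    (99.9, 0),   # SLO met: no credit owed
    (99.0, 10),
    (95.0, 25),
    (0.0, 50),
]


def monthly_uptime_percent(downtime_minutes, days_in_month=30):
    """Percentage of the month during which the service was up."""
    total_minutes = days_in_month * 24 * 60
    return 100 * (total_minutes - downtime_minutes) / total_minutes


def sla_credit(uptime_percent, monthly_bill):
    """Return the credit owed to the customer under the hypothetical tiers."""
    for min_uptime, credit_percent in CREDIT_TIERS:
        if uptime_percent >= min_uptime:
            return monthly_bill * credit_percent / 100
    return 0.0  # defensive fallback; the 0.0 tier catches any valid uptime


# 120 minutes of downtime in a 30-day month is about 99.72% uptime,
# which misses the 99.9% SLO and falls in the 10% credit tier.
print(sla_credit(monthly_uptime_percent(120), monthly_bill=1000.0))
# 100.0
```

The SLO (99.9% uptime) is the first tier's threshold; everything below it is the "consequences" half of the SLA.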