DORA Four Key Metrics (Accelerate book)
The Accelerate Book
In the book Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations, Dr. Nicole Forsgren, Jez Humble and Gene Kim studied what separated high-performing technology organisations from their less effective counterparts.
The book summarises several years of rigorous research from the State of DevOps Reports, built upon more than 23,000 survey responses from companies all around the world. The organisations studied included start-ups and enterprises, for-profit and not-for-profit organisations, and companies which were born digital alongside those that had to undergo digital transformation.
The Four Key Metrics
The research identified that just Four Key Metrics distinguish the performance of various technology organisations. These "North Star" metrics serve as indicators of overall software engineering health.
These metrics aren't "The Goal" of a business, but organisations which did well against these metrics had higher rates of profitability, market share and customer satisfaction. In other words, they allowed organisations to experiment faster, ship reliably and prevent burnout.
Likewise, these metrics aren't "leading indicators" or "local metrics" which tell you whether you need to increase, say, unit test coverage or cut build times; they measure the overall engineering health of a team. By analysing these metrics and drilling down into inefficient areas, you can ensure you're constantly optimising things that will improve the end-to-end engineering health of your team, rather than optimising a local area where it will make no difference.
The Four Key Metrics are as follows:
- Cycle Time (Change Lead Time) - Time to implement, test, and deliver code for a feature (measured from first commit to deployment)
- Deployment Frequency - Number of deployments in a given duration of time
- Change Failure Rate (CFR) - Percentage of deployments which caused a failure in production
- Mean Time to Recovery (MTTR) - Mean time it takes to restore service after production failure
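To make the definitions above concrete, here is a minimal sketch of how the four metrics could be computed from a list of deployment records. The `Deployment` record, its field names and the choice of the median for Cycle Time are assumptions made for illustration; they are not taken from the book or from any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean, median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    first_commit_at: datetime
    deployed_at: datetime
    caused_failure: bool = False
    restored_at: Optional[datetime] = None  # only set when caused_failure is True


def four_key_metrics(deployments, period_days):
    """Compute the four metrics over a reporting period of `period_days` days."""
    if not deployments:
        raise ValueError("at least one deployment is required")

    # Cycle Time (Change Lead Time): first commit to deployment, in hours.
    lead_hours = [
        (d.deployed_at - d.first_commit_at).total_seconds() / 3600
        for d in deployments
    ]

    # Change Failure Rate: share of deployments that caused a failure.
    failures = [d for d in deployments if d.caused_failure]

    # Mean Time to Recovery: deployment of the failing change to restoration.
    recovery_hours = [
        (d.restored_at - d.deployed_at).total_seconds() / 3600
        for d in failures
        if d.restored_at is not None
    ]

    return {
        "median_cycle_time_hours": median(lead_hours),
        "deployments_per_day": len(deployments) / period_days,
        "change_failure_rate_pct": 100 * len(failures) / len(deployments),
        "mttr_hours": mean(recovery_hours) if recovery_hours else None,
    }


# Example data: four deployments in one week, one of which failed.
example = [
    Deployment(datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 17)),
    Deployment(datetime(2024, 1, 2, 9), datetime(2024, 1, 3, 9),
               caused_failure=True, restored_at=datetime(2024, 1, 3, 11)),
    Deployment(datetime(2024, 1, 4, 9), datetime(2024, 1, 4, 13)),
    Deployment(datetime(2024, 1, 5, 9), datetime(2024, 1, 5, 21)),
]
metrics = four_key_metrics(example, period_days=7)
```

The median is used here because a single long-running change can distort a mean; a team could equally report a mean or a percentile.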
Holistic EngProd Metric
Cycle Time measures what happens from a developer picking up some work, through to it going into production.
Many engineering leaders fall into the trap of measuring one indicator (say, deployment time) without understanding how the entire picture looks. Focusing on only one local area can lead to optimisation where no bottleneck exists.
By looking at a global metric, you can optimise the entire Software Development Lifecycle.
A Metric for Engineers
As Cycle Time measures engineering process, rather than product outcomes, it is a measure that is actionable for engineering teams.
The engineering team can take complete ownership of improving it without external dependencies.
Product Managers will continue to prioritise work as makes sense for the business, but engineering will be able to deliver that work faster.
Optimises for Flow
As described in Why Shipping Software Smaller Helps Deliver Better Product, it is better to optimise for flow instead of just the volume of work delivered. Improving flow allows you to deliver better business outcomes reliably whilst preventing developer burnout.
Cycle Time encourages a focus on flow instead of lower-impact volume-based metrics.
Having this holistic picture allows you to drill into problem areas.
For example, after seeing typical Cycle Times, you may notice that work is slowing down during Code Review. Drilling further, you might see that first code reviews take too long to complete because slow builds block the approval workflow.
By first looking at the global picture and then drilling down into local areas, you are able to improve the entire Software Development Lifecycle, instead of just one area.
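As a rough sketch of this drill-down, Cycle Time can be split into stages between milestones, and the stage with the largest typical duration is the first place to look. The milestone names and the dictionary layout below are hypothetical; a real pipeline would take these timestamps from its version control and deployment tooling.

```python
from datetime import datetime
from statistics import median

# Hypothetical milestones in the life of one change, in chronological order.
MILESTONES = ["first_commit", "pr_opened", "first_review", "merged", "deployed"]


def stage_hours(change):
    """Return the hours spent between each pair of consecutive milestones."""
    return {
        f"{start} -> {end}": (change[end] - change[start]).total_seconds() / 3600
        for start, end in zip(MILESTONES, MILESTONES[1:])
    }


def slowest_stage(changes):
    """Return the stage with the highest median duration, plus all medians."""
    per_change = [stage_hours(c) for c in changes]
    medians = {
        stage: median(pc[stage] for pc in per_change)
        for stage in per_change[0]
    }
    return max(medians, key=medians.get), medians


# Example data: two changes that both wait a long time for a first review.
changes = [
    {
        "first_commit": datetime(2024, 1, 1, 9),
        "pr_opened": datetime(2024, 1, 1, 12),
        "first_review": datetime(2024, 1, 2, 12),
        "merged": datetime(2024, 1, 2, 14),
        "deployed": datetime(2024, 1, 2, 15),
    },
    {
        "first_commit": datetime(2024, 1, 3, 9),
        "pr_opened": datetime(2024, 1, 3, 14),
        "first_review": datetime(2024, 1, 4, 10),
        "merged": datetime(2024, 1, 4, 13),
        "deployed": datetime(2024, 1, 4, 14),
    },
]
bottleneck, medians = slowest_stage(changes)
```

In this example data the wait between opening a Pull Request and its first review dominates, which matches the Code Review scenario described above.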
No Quality Measure
Cycle Time will measure speed, but it won't measure quality. It should typically be paired with strong engineering practice (in managing the balance of risk and reward, to maintain reliability) alongside more quantitative measures of reliability.
For example, whilst Cycle Time is a key metric for engineering performance, it might also be worthwhile looking at Median Full Resolution Time for bugs to ensure that customer bugs are resolved in a timely fashion.
As Cycle Time is a lagging indicator, it can be hard to gain visibility into risks early when using it. Ideally you want to address risks before they show up in the metrics.
Using alerts, teams can spot problems as they start to appear, before they turn into larger issues. For example, Slack alerts that warn when a Pull Request is stuck in review or in back-and-forth discussion allow issues to be resolved promptly.
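A minimal sketch of the detection logic behind such an alert might look like the following. The `PullRequest` record and both thresholds are hypothetical, and delivering the message to Slack is deliberately left out; the function only decides which Pull Requests deserve a warning.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PullRequest:
    """Hypothetical snapshot of an open Pull Request."""
    title: str
    opened_at: datetime
    review_comments: int
    approved: bool = False


def stuck_pull_requests(prs, now,
                        max_wait=timedelta(hours=24),  # assumed threshold
                        max_comments=20):              # assumed threshold
    """Return (title, reason) pairs for Pull Requests that look stuck."""
    alerts = []
    for pr in prs:
        if pr.approved:
            continue  # approved work is no longer waiting on review
        if now - pr.opened_at > max_wait:
            alerts.append((pr.title, "waiting too long for review"))
        elif pr.review_comments > max_comments:
            alerts.append((pr.title, "long back-and-forth discussion"))
    return alerts


# Example data checked at a fixed point in time.
now = datetime(2024, 1, 10, 12)
open_prs = [
    PullRequest("Add login", datetime(2024, 1, 8, 12), review_comments=2),
    PullRequest("Fix typo", datetime(2024, 1, 10, 9), review_comments=1),
    PullRequest("Refactor billing", datetime(2024, 1, 10, 8), review_comments=35),
    PullRequest("Old but approved", datetime(2024, 1, 1), 0, approved=True),
]
alerts = stuck_pull_requests(open_prs, now)
```

Each returned pair could then be formatted into a message and posted to whichever chat tool the team uses.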
Cycle Time vs Change Lead Time
Cycle Time is an end-to-end measure of the Software Development Lifecycle (and allows you to find bottlenecks in the entire development lifecycle).
Change Lead Time is a considerably narrower metric that covers the time from code commit to deployment.
Many tools which track DORA metrics will only allow you to track Change Lead Time, which only provides visibility into part of the software engineering process. This is because DORA metrics are typically used for DevOps optimisation, instead of for EngProd.
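The difference can be sketched with two small functions. This assumes Cycle Time starts when a developer picks up the work (a hypothetical `work_started_at` timestamp, for example a ticket moving to "in progress") and Change Lead Time starts at the code commit; the exact starting points vary between tools.

```python
from datetime import datetime, timedelta


def cycle_time(work_started_at, deployed_at):
    """End-to-end time from picking up the work to production."""
    return deployed_at - work_started_at


def change_lead_time(commit_at, deployed_at):
    """Narrower measure: from the code commit to production."""
    return deployed_at - commit_at


# Example: work starts on Monday, the commit lands on Wednesday,
# and the change is deployed on Thursday.
work_started_at = datetime(2024, 1, 1, 9)
commit_at = datetime(2024, 1, 3, 9)
deployed_at = datetime(2024, 1, 4, 9)

full = cycle_time(work_started_at, deployed_at)
narrow = change_lead_time(commit_at, deployed_at)
unseen = full - narrow  # time that Change Lead Time alone does not show
```

In this example two of the three days fall before the commit, so a tool that reports only Change Lead Time would see just one day of the work.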
Other metrics within DORA also focus on DevOps areas rather than engineering productivity (for example, Change Failure Rate and Mean Time To Recovery rather than customer-reported bugs).
Deployment Frequency is a volume-based metric, so it often tells us very little and can be unhelpful - particularly on healthy projects that have little need for change. By measuring Change Lead Time effectively, we are able to remove the need for this metric.