The process of an engineering team adapting to be able to ship software faster and more reliably without burnout.
North Star Metric
A North Star Metric is a metric that a team uses as its overriding objective. For a company seeking to undergo a DevOps Transformation, metrics like Change Lead Time, Deployment Frequency, Mean Time to Recovery and Change Failure Rate (see definitions below) may be used as these North Stars.
A North Star Metric should not encapsulate the entire business objective (e.g. revenue growth), but it should not be so localised that the big picture is missed. When local metrics (leading indicators) are optimised, they result in improvements against North Star Metrics, which ultimately improve the business objective metrics.
Issue (Jira Ticket)
Issues or tickets in project management tools like Jira are used to record, assign and track work that is completed by engineers.
Commit
A commit is the fundamental building block of a change to software. New features, bug fixes and any other kind of software change contain one or more commits that record changes to software in a Version Control System.
Pull Request
A Pull Request encapsulates one or more commits. Pull Requests allow developers to propose a change to software, so that automated tests can run and colleagues can review the change before it is merged.
Version Control System (VCS)
A VCS records changes to software and allows developers to manage these changes collectively. As a VCS holds software projects in their entirety, it is often referred to as a Source Code Management (SCM) system. Git is a common open-source VCS, and commercial providers like GitHub, GitLab and Atlassian Bitbucket will host repositories of code stored using Git in the cloud.
Cycle Time (& Issue Cycle Time)
Cycle Time represents the time taken for a given iteration of work to be completed. It measures the time from starting work on something to that job being completed. Unlike Lead Time, this does not include the time spent waiting for the work to start.
When using Git data, Haystack will present Cycle Time as the time from first commit to that change being merged (but not necessarily actually deployed into production). This effectively measures the cycle of how long it takes to finalise, test and review a given software change.
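As a minimal sketch of the Git-based definition above, the calculation is a subtraction of two timestamps. The function name and its parameters are hypothetical and chosen for illustration; this is not Haystack's implementation.

```python
from datetime import datetime, timedelta

def git_cycle_time(first_commit_at: datetime, merged_at: datetime) -> timedelta:
    """Time from the first commit on a change to that change being merged."""
    if merged_at < first_commit_at:
        raise ValueError("merged_at precedes first_commit_at")
    return merged_at - first_commit_at

# Hypothetical example: first commit on Monday morning, merged Wednesday afternoon.
cycle = git_cycle_time(datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 6, 15, 30))
print(cycle)  # 2 days, 6:30:00
```

Deployment time is deliberately absent from the inputs, since this metric stops at the merge.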
When using Jira data, Haystack can present an Issue Cycle Time metric which measures the entire time from a developer indicating that they have started working on a task, through to them marking that work as complete.
Change Lead Time
Change Lead Time represents the time taken from first commit being made to that change being fully deployed in production. It essentially measures the entire time it takes to propose a software change through to that change actually being in the hands of a user.
As this is often the most reproducible part of the software development process, it is the easiest to optimise.
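The sketch below shows one way Change Lead Time could be aggregated across several changes. It assumes each change is a pair of hypothetical timestamps (first commit, production deploy) and reports the median, which is only one reasonable choice of aggregate. Changes that have not been deployed yet are skipped.

```python
from datetime import datetime, timedelta
from statistics import median

def change_lead_times(changes):
    """changes: iterable of (first_commit_at, deployed_at); deployed_at is None if not yet deployed."""
    return [deployed - first for first, deployed in changes if deployed is not None]

def median_lead_time(changes):
    """Median lead time over deployed changes, or None if nothing has been deployed."""
    times = change_lead_times(changes)
    if not times:
        return None
    return timedelta(seconds=median(t.total_seconds() for t in times))

# Hypothetical data: two deployed changes and one still waiting for a deploy.
changes = [
    (datetime(2024, 3, 1, 9), datetime(2024, 3, 2, 9)),
    (datetime(2024, 3, 1, 9), datetime(2024, 3, 4, 9)),
    (datetime(2024, 3, 2, 9), None),
]
print(median_lead_time(changes))  # 2 days, 0:00:00
```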
Deployment Frequency
The number of times in a given time period that an engineering team will deploy a change to production.
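A count of production deployments per period can be sketched as follows. The choice of ISO weeks as the period and the function name are assumptions made for illustration.

```python
from collections import Counter
from datetime import date

def deployments_per_week(deploy_dates):
    """Count production deployments per (ISO year, ISO week)."""
    counts = Counter()
    for d in deploy_dates:
        iso = d.isocalendar()
        counts[(iso[0], iso[1])] += 1
    return dict(counts)

# Hypothetical deploy log: two deploys in one week, one in the next.
deploys = [date(2024, 3, 4), date(2024, 3, 6), date(2024, 3, 12)]
print(deployments_per_week(deploys))  # {(2024, 10): 2, (2024, 11): 1}
```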
Engineering Onboarding Time
Engineering Onboarding Time is the mean time a new team member takes to make their first commit, measured from the date they joined.
Mean Time to Recovery
Time from a production incident occurring to the system's health recovering.
Typically engineering teams will automate the detection of production outages using alerting tools like Prometheus and PagerDuty. When an outage is detected by a tool like Prometheus, an incident will be raised in PagerDuty. An on-call engineer will acknowledge the incident and begin working on the issue. Once their work is done, the incident will be automatically resolved by the alerting system as the system regains health.
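Given the trigger and resolve timestamps for each incident, the mean can be sketched as below. The input shape is an assumption; in practice these timestamps would come from an export of the alerting tool, which is not shown here.

```python
from datetime import datetime, timedelta

def mean_time_to_recovery(incidents):
    """incidents: list of (triggered_at, resolved_at) pairs for resolved incidents."""
    if not incidents:
        return None
    total = sum((resolved - triggered for triggered, resolved in incidents), timedelta())
    return total / len(incidents)

# Hypothetical incidents lasting 30 and 90 minutes.
incidents = [
    (datetime(2024, 3, 4, 10, 0), datetime(2024, 3, 4, 10, 30)),
    (datetime(2024, 3, 9, 22, 0), datetime(2024, 3, 9, 23, 30)),
]
print(mean_time_to_recovery(incidents))  # 1:00:00
```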
Full Resolution Time for Bugs
DevOps teams will often consider MTTR to mean Mean Time to Recovery, the time to resolve an incident. This metric is often highly dependent on what is considered an incident. Many ordinary customer bugs never qualify as incidents but can still be important to track, given that the time taken to fix a bug can often be inversely correlated with customer satisfaction.
By instead tracking Full Resolution Time for bugs you are able to identify how quickly your team is able to resolve customer issues and respond to challenges when they come up.
Sprint Completion Rate
The percentage of tickets in a given sprint that have been completed as originally planned.
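One way to read "as originally planned" is to count only tickets that were in the sprint when it started, so that tickets added later do not inflate the rate. The sketch below makes that assumption explicit; the ticket keys are invented.

```python
def sprint_completion_rate(planned, completed):
    """Percentage of originally planned tickets that were completed in the sprint.

    Assumption: tickets completed but not in the original plan are ignored.
    """
    planned = set(planned)
    if not planned:
        return None
    return 100 * len(planned & set(completed)) / len(planned)

# Hypothetical sprint: four tickets planned, three of them completed,
# plus one unplanned ticket (PROJ-9) that does not count.
planned = ["PROJ-1", "PROJ-2", "PROJ-3", "PROJ-4"]
completed = ["PROJ-1", "PROJ-2", "PROJ-4", "PROJ-9"]
print(sprint_completion_rate(planned, completed))  # 75.0
```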
Throughput (& Issue Throughput)
Throughput, when calculated from Git, represents the number of Pull Requests merged over a given timeframe.
Issue throughput is instead calculated from Jira and measures the number of tickets closed in a given timeframe.
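Both variants reduce to counting events inside a time window. A minimal sketch, assuming a list of merge timestamps for the Git variant (the same function would work on ticket close timestamps for the Jira variant):

```python
from datetime import datetime

def throughput(event_times, start, end):
    """Number of events (PR merges or ticket closures) with start <= time < end."""
    return sum(1 for t in event_times if t is not None and start <= t < end)

# Hypothetical merge timestamps; None marks a Pull Request that is still open.
merged_at = [datetime(2024, 3, 4), datetime(2024, 3, 7), None, datetime(2024, 3, 12)]
print(throughput(merged_at, datetime(2024, 3, 4), datetime(2024, 3, 11)))  # 2
```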
Leading Indicators and Risks
Merge Without Review
Number of Pull Requests merged without review. This tracks when a pull request has been merged without appropriate code review.
Approved, not merged
Approved pull requests awaiting merge. This tracks when pull requests have been reviewed and approved but not yet merged.
High Discussion Activity
Pull requests that are stuck in a back-and-forth discussion. When a pull request has more than 10 comment cycles, it indicates that developers are stuck in a discussion.
A comment cycle is defined as at least one comment made by a commenter, followed by at least one comment made in response on the same pull request.
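The definition of a comment cycle leaves some room for interpretation. The sketch below takes one possible reading: consecutive comments by the same author form a single turn, and every two turns (a comment turn plus a response turn) make one cycle. The function names and the assumption that any comment by a different author counts as a response are illustrative only.

```python
from itertools import groupby

def count_comment_cycles(comment_authors):
    """comment_authors: authors of a pull request's comments, in chronological order."""
    turns = sum(1 for _ in groupby(comment_authors))  # runs of same-author comments
    return turns // 2

def has_high_discussion_activity(comment_authors, threshold=10):
    """True when the pull request has more than `threshold` comment cycles."""
    return count_comment_cycles(comment_authors) > threshold

# Hypothetical thread: ana comments, ben replies twice, ana follows up.
print(count_comment_cycles(["ana", "ben", "ben", "ana"]))  # 1
```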
>1K Line Changes
Pull Requests opened with more than 1K line changes. This alerts you when a pull request has > 1K line changes - these typically take much longer to merge.
High Cycle Time Warning
Pull Requests with a Cycle Time of more than 5 days (in Git). This alerts you when pull requests are taking longer than expected - an opportunity to retro and unblock team members.
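Threshold-based indicators such as the 1K line changes and 5 day Cycle Time warnings can be sketched as simple checks on a pull request record. The PullRequest class and its fields are hypothetical, and lines_changed is assumed to be additions plus deletions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class PullRequest:
    number: int
    lines_changed: int                    # assumed: additions + deletions
    first_commit_at: datetime
    merged_at: Optional[datetime] = None  # None while the PR is still open

def risk_flags(pr, now):
    """Return the names of the threshold-based indicators that apply to this PR."""
    flags = []
    if pr.lines_changed > 1000:
        flags.append(">1K Line Changes")
    # For open PRs, measure the cycle time accumulated so far.
    end = pr.merged_at or now
    if end - pr.first_commit_at > timedelta(days=5):
        flags.append("High Cycle Time Warning")
    return flags

pr = PullRequest(number=7, lines_changed=1500, first_commit_at=datetime(2024, 3, 1))
print(risk_flags(pr, now=datetime(2024, 3, 8)))
# ['>1K Line Changes', 'High Cycle Time Warning']
```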
Commits Awaiting Review
Pull Requests with new commits awaiting further review. This helps keep active pull requests top of mind for reviewers.
Members that have more than 3 active pull requests in progress. This alerts you when members are overworking - context switching decreases developer efficiency by nearly 50%.
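A minimal sketch of this check, assuming the input is one author name per active pull request (the names are invented):

```python
from collections import Counter

def overloaded_members(active_pr_authors, limit=3):
    """Members with more than `limit` active pull requests, sorted by name."""
    counts = Counter(active_pr_authors)
    return sorted(member for member, n in counts.items() if n > limit)

# Hypothetical snapshot: ana has four active PRs, ben has three.
authors = ["ana", "ben", "ana", "ben", "ana", "ben", "ana"]
print(overloaded_members(authors))  # ['ana']
```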
Abnormal amount of activity over the weekend. This alerts you to members working over the weekend - giving an opportunity to encourage healthier working habits.
Too Many Cooks
Number of pull requests that had 3 or more committers.
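As an illustration, the count can be derived from the distinct commit authors on each pull request. The mapping from pull request number to commit authors is a hypothetical input shape.

```python
def too_many_cooks(pr_committers, threshold=3):
    """Pull requests whose commits come from `threshold` or more distinct committers."""
    return [pr for pr, committers in pr_committers.items()
            if len(set(committers)) >= threshold]

# Hypothetical data: PR 101 has three distinct committers, PR 102 has two.
pr_committers = {101: ["ana", "ben", "cy", "ana"], 102: ["ana", "ben"]}
print(len(too_many_cooks(pr_committers)))  # 1
```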
Issues in Multiple Sprints
Tracking issues that span multiple sprints is a great way of identifying work that is not properly broken down or that the team are stuck on. By drilling into these metrics, you can then identify the rogue Jira issues that need expediting.
Sprint Scope Change
When issues are added to a sprint after it has already started, this can mean the team is working beyond its capacity and is focussing on work that might not have been properly prioritised. By identifying issues that have been added mid-sprint, you can spot and mitigate these problems.
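Identifying mid-sprint additions comes down to comparing when each issue entered the sprint with the sprint's start time. A sketch, assuming a hypothetical mapping from issue key to the time it was added to the sprint:

```python
from datetime import datetime

def added_mid_sprint(issue_added_at, sprint_start):
    """Issue keys that were added to the sprint after it had already started."""
    return sorted(key for key, added in issue_added_at.items() if added > sprint_start)

sprint_start = datetime(2024, 3, 4, 9, 0)
issue_added_at = {
    "PROJ-1": datetime(2024, 3, 1, 14, 0),   # planned before the sprint began
    "PROJ-2": datetime(2024, 3, 6, 11, 0),   # added on day three
}
print(added_mid_sprint(issue_added_at, sprint_start))  # ['PROJ-2']
```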
Unlinked Pull Requests
Even if you require that Pull Requests contain a Jira label, this requirement can sometimes be bypassed with invalid Jira IDs, like “PROJ-XXX” or “PROJ-0”. Tracking Unlinked Pull Requests helps you ensure that Pull Requests are correctly linked to Jira tickets and match the work that you’re trying to achieve, so you can manage this risk more effectively. This metric is calculated using Git and Jira data in conjunction.
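The check needs both data sources because a syntactically plausible ID such as PROJ-0 can only be rejected by looking it up among real Jira issues. The sketch below assumes the Jira ID appears in the pull request title and that the set of existing issue keys is already available; the regular expression is an approximation of the usual Jira key shape, not an official pattern.

```python
import re

# Approximate Jira key shape: uppercase project key, hyphen, issue number.
JIRA_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")

def is_linked(pr_title, known_issue_keys):
    """True if the title mentions at least one key that exists in Jira."""
    return any(key in known_issue_keys for key in JIRA_KEY.findall(pr_title))

def unlinked_pull_requests(pr_titles, known_issue_keys):
    """PR numbers whose titles do not reference any existing Jira issue."""
    return [number for number, title in pr_titles.items()
            if not is_linked(title, known_issue_keys)]

known = {"PROJ-12", "PROJ-13"}
titles = {
    1: "PROJ-12 Add login page",
    2: "PROJ-XXX quick fix",   # placeholder ID, does not match the key shape
    3: "PROJ-0 cleanup",       # well-formed, but no such issue exists
    4: "Fix typo",             # no ID at all
}
print(unlinked_pull_requests(titles, known))  # [2, 3, 4]
```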
Pull Requests per Issue
Pull Requests per Issue is a fairly simple metric, but it can be a good proxy for understanding whether too much work is being grouped into one Jira story, and whether there are too many places where developers need to make changes before they can push changes live. Whilst this metric shouldn’t be used as a goal in its own right, drilling down on this measure allows you to identify deeper issues.
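A sketch of the calculation, assuming each pull request has already been resolved to a Jira issue key (or None when it is unlinked). The per-issue counts support the drill-down, and the average gives the headline number.

```python
from collections import Counter

def pull_requests_per_issue(pr_issue_keys):
    """Count of pull requests per Jira issue; unlinked PRs (None) are ignored."""
    return Counter(key for key in pr_issue_keys if key is not None)

def average_prs_per_issue(pr_issue_keys):
    counts = pull_requests_per_issue(pr_issue_keys)
    if not counts:
        return None
    return sum(counts.values()) / len(counts)

# Hypothetical data: three PRs for PROJ-1, one for PROJ-2, one unlinked.
keys = ["PROJ-1", "PROJ-1", "PROJ-2", "PROJ-1", None]
print(pull_requests_per_issue(keys).most_common(1))  # [('PROJ-1', 3)]
print(average_prs_per_issue(keys))                   # 2.0
```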