How to Fix Software Development Bottlenecks
Cycle Time is split into several metrics. In the last article, we discussed how to identify exactly where the bottleneck is.
Once you've identified the bottleneck, it is important to understand how to improve it. Usually this process involves actively listening to members of your team and looking at sub-metrics in other tools (like your CI build system) to understand where the opportunities are. Then you can proceed to propose the change.
This part of the process involves collecting feedback from your engineering team and using professional engineering judgement to identify technical problems. It involves not just quantitative data, but qualitative data too.
Critically, you must then apply management and technical skills to be able to prioritise and ultimately fix the problem.
In this article, we describe some common bottlenecks at each place within the Software Development Lifecycle to provide some ideas for improvement programmes. These are ideas to validate, and may not match the use-case you currently face. Our next article will describe how you can propose and get buy-in for an improvement.
In Haystack, Development Time is measured from the first commit to a Pull Request being raised.
This can be affected by the following:
- Difficulty running/testing the work in a development environment.
- Difficulty running required build processes locally.
- Changes in requirements or unclear requirements leading to local rework.
- Work is sized too large and requires multiple commits to complete.
- Too many people committing work on the same branch before a Pull Request is raised.
- Technical debt making the code overly complicated or too hard to work on, so additional expertise is needed in development.
To resolve this, it can often be useful to experience the development process first-hand or pair program with another developer who is routinely working in this area. Developer experience surveys and retrospectives can also be of use here.
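To make the definition of Development Time concrete, here is a minimal sketch in Python. It assumes you already have the commit timestamps and the time the Pull Request was raised; the function name `development_time` and the sample dates are hypothetical and are not part of Haystack's API.

```python
from datetime import datetime, timedelta


def development_time(commit_times: list[datetime], pr_opened_at: datetime) -> timedelta:
    """Time from the earliest commit to the Pull Request being raised."""
    if not commit_times:
        raise ValueError("a Pull Request needs at least one commit")
    # Only the earliest commit matters, so commits pushed after the
    # Pull Request was raised do not change the result.
    first_commit = min(commit_times)
    return pr_opened_at - first_commit


# Hypothetical sample data for illustration only.
commits = [datetime(2024, 3, 4, 9, 30), datetime(2024, 3, 5, 14, 0)]
opened = datetime(2024, 3, 5, 16, 30)
print(development_time(commits, opened))  # 1 day, 7:00:00
```

Tracking this value per Pull Request can help show whether a long Development Time comes from a few oversized pieces of work or from a consistent slowdown across the team.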
Review Time is often the bottleneck for many teams. It is broken down into three different metrics, so let's take each one individually.
### First Response Time
First Response Time is the time spent in the code review process waiting for the first response (approval or comment). This gives us a sense of how long developers spend waiting for their team to respond. Teams should aim for first response time within a few hours of opening.
Some common reasons for this being the bottleneck include:
- Slow or flaky CI tests delaying the software from leaving the automated test phase.
- Poor review assignment (consider using something like a CODEOWNERS file to automatically add reviewers or automatic assignment).
- Developers struggling to prioritise reviews due to high workload.
- Insufficient Collective Code Ownership meaning there aren't enough people to review the code.
### Rework Time
Rework Time is the time spent rewriting code after the Pull Request has been opened. This gives us a sense of how long developers spend rewriting code to get an approved Pull Request. Teams should aim for a Rework Time of less than 2 days.
Some causes include:
- Codebase complexity means unintended effects are caused during the development process.
- Insufficient linting to identify issues earlier in the development process.
- Poor automated testing meaning bugs are only identified later in the process.
- Work is sized too large.
- Work is subject to scope change after the PR is worked on.
- Unclear original requirements.
- Unclear or controversial software development patterns.
### Idle Completion Time
Idle Completion Time is the time from the last commit to the Pull Request being merged. This gives us a sense of how long the final review process takes. It is rare for this to be the bottleneck, and teams should aim to eliminate it as much as possible.
Reasons for this include:
- Poor project management leads to developers starting new work before completing their last task, leading to task switching (increasing Cycle Time and reducing care and productivity).
- Slow pre-merge build processes.
- No automation to merge PRs once they have sufficient approval (automatic merging can help reduce this).
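The three Review Time metrics can be sketched together to see which one dominates for a given Pull Request. This is an illustrative sketch rather than Haystack's implementation: the `PullRequest` fields and the `review_breakdown` function are hypothetical, and it assumes Rework Time runs from the first response to the last commit, which the definitions above do not spell out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PullRequest:
    # Hypothetical record; populate it from your own Git hosting data.
    opened_at: datetime
    first_response_at: datetime  # first approval or comment
    last_commit_at: datetime
    merged_at: datetime


def review_breakdown(pr: PullRequest) -> dict[str, timedelta]:
    """Split the review phase of one Pull Request into its three sub-metrics."""
    first_response = pr.first_response_at - pr.opened_at
    # Assumption: rework runs from the first response to the last commit.
    # If no commits followed the first response, treat rework as zero.
    rework = max(pr.last_commit_at - pr.first_response_at, timedelta(0))
    idle_completion = pr.merged_at - pr.last_commit_at
    return {
        "first_response_time": first_response,
        "rework_time": rework,
        "idle_completion_time": idle_completion,
    }


# Hypothetical sample data for illustration only.
pr = PullRequest(
    opened_at=datetime(2024, 3, 5, 16, 30),
    first_response_at=datetime(2024, 3, 6, 10, 0),
    last_commit_at=datetime(2024, 3, 7, 15, 0),
    merged_at=datetime(2024, 3, 7, 17, 0),
)
breakdown = review_breakdown(pr)
largest = max(breakdown, key=breakdown.get)
print(largest, breakdown[largest])  # rework_time 1 day, 5:00:00
```

Running this over many Pull Requests, rather than a single one, gives a steadier picture of which sub-metric deserves attention first.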
Deployment Time is the time taken from a Pull Request being merged to the software being deployed in production.
This can commonly be improved by increasing the number of regular deployments, for example, by using a release train to regularly release new code. Ultimately you should aim for software to be deployed on each merge to keep this to a minimum.
Common reasons for this to be the bottleneck include:
- Insufficient number of deployments, due to slow and manual deployment processes.
- Poor ability to manage the risks associated with deployments (deploying more frequently in a more automated way should result in fewer failures, not more).
- Deployment schedule doesn't match development times.
- Automated deployment processes are too slow and complex.
- Developers are not trusted to deploy and maintain their own code.
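To illustrate why more frequent deployments reduce Deployment Time, the sketch below compares a weekly and a daily release train for the same set of merges. It is a simplified model under stated assumptions: trains depart at a fixed interval, every merged change ships on the next departure, and deployment is treated as instantaneous. The function names and dates are hypothetical.

```python
from datetime import datetime, timedelta


def next_departure(merged_at: datetime, first_departure: datetime, interval: timedelta) -> datetime:
    """Return the first release train departing at or after merged_at."""
    if merged_at <= first_departure:
        return first_departure
    # Count the whole intervals that have elapsed, rounding up so that a
    # merge landing between two trains waits for the later one.
    trains_passed, remainder = divmod(merged_at - first_departure, interval)
    if remainder:
        trains_passed += 1
    return first_departure + trains_passed * interval


def average_deployment_time(merge_times: list[datetime], first_departure: datetime, interval: timedelta) -> timedelta:
    """Average wait between merge and deployment (assumes instant deploys)."""
    waits = [next_departure(m, first_departure, interval) - m for m in merge_times]
    return sum(waits, timedelta()) / len(waits)


# Hypothetical sample data: three merges in one week, first train Monday 10:00.
first_train = datetime(2024, 3, 4, 10, 0)
merges = [
    datetime(2024, 3, 4, 11, 0),
    datetime(2024, 3, 6, 15, 0),
    datetime(2024, 3, 8, 9, 0),
]
print(average_deployment_time(merges, first_train, timedelta(weeks=1)))  # 4 days, 22:20:00
print(average_deployment_time(merges, first_train, timedelta(days=1)))   # 14:20:00
```

In this toy example, moving from a weekly to a daily train cuts the average wait from almost five days to under fifteen hours. Deploying on each merge would shrink the wait to the duration of the deployment pipeline itself.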