I have been working with a group on whether they should have a single organizational backlog or split it into a product backlog and a technical improvements backlog. Like many organisations, they want to make sure that infrastructure work and technical debt are prioritized properly.
Any practitioner will tell you that technical improvements generally get ignored in favour of product improvements. Paying down technical debt is normally deprioritized whenever there is a chance to improve the product from the customer’s perspective. At one organisation I worked with, the CEO said that thirty percent of every team’s capacity should be focused on improving the quality of the product. Every quarter the CEO would review where the teams were focusing and discover that product improvements dominated, normally taking one hundred percent of the effort.
In theory, a separate backlog for technical improvements makes sense. We assign a certain percentage of each team’s capacity to technical improvements and then work through those improvements in backlog order. In practice, however, we know that the teams will work almost exclusively from the product backlog. With two backlogs, you also have the problem of deciding how the priority of each item in the technical improvement backlog relates to the priority of items in the product backlog.
The only solution is to have one backlog containing both product and technical improvements. The investor group that prioritises the backlog considers the relative importance of each item regardless of whether it is a product improvement (i.e. customer or business benefit) or a technical improvement. That way it is clear to teams when a technical improvement is more important than the new feature the client wants.
So where does the confusion in Agile circles come from? It turns out that SAFe advocates two organizational backlogs: a product backlog and an architectural backlog. The authors of SAFe must know that two backlogs create this problem, which leads to excessive technical debt, so why do they advocate this approach? My theory is quite simple. SAFe advocates a particularly bad implementation of Weighted Shortest Job First (WSJF) to prioritise the product backlog. WSJF contains a bias that favours backlog items whose outcome is known. In Cynefin terms, WSJF is biased towards “Obvious” and “Complicated” items, and against “Complex” and “Chaotic” items. Technical backlog items often (though not always) fall into the “Complex” and “Chaotic” domains. My theory is that the authors of SAFe saw this problem in early implementations of SAFe and so separated out the product and technical backlogs.
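To make the claimed bias concrete, here is a minimal Python sketch of WSJF as it is commonly described for SAFe: Cost of Delay (user-business value + time criticality + risk reduction/opportunity enablement) divided by job size. The backlog items and every score are hypothetical. The sketch also assumes, for illustration only, that a group scoring an item whose outcome it cannot predict tends to give that item low value and criticality scores.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    business_value: int    # user-business value (relative score)
    time_criticality: int  # relative score
    risk_reduction: int    # risk reduction / opportunity enablement (relative score)
    job_size: int          # relative size; larger means more work

    @property
    def cost_of_delay(self) -> int:
        return self.business_value + self.time_criticality + self.risk_reduction

    @property
    def wsjf(self) -> float:
        return self.cost_of_delay / self.job_size


def rank(items):
    """Highest WSJF first."""
    return sorted(items, key=lambda i: i.wsjf, reverse=True)


# Hypothetical backlog. The refactor's benefit is uncertain, so (by assumption)
# it is scored conservatively on value and time criticality.
backlog = [
    Item("New client report", business_value=8, time_criticality=8, risk_reduction=1, job_size=5),
    Item("Refactor billing module", business_value=1, time_criticality=2, risk_reduction=5, job_size=5),
    Item("Upgrade checkout UX", business_value=5, time_criticality=3, risk_reduction=1, job_size=3),
]

for item in rank(backlog):
    print(f"{item.name}: WSJF = {item.wsjf:.2f}")
```

The arithmetic itself is not at fault; the bias enters through the inputs, because an item whose benefit cannot be stated with confidence tends to attract low scores and sinks to the bottom of the ranking.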
Practice has shown that one backlog is needed if technical items are going to be given the appropriate priority. So how do we do this in practice? The real problem is that technical improvements are often expressed in terms of cost and the “What” / “How”, and rarely in terms of the benefit they will deliver. To express that benefit, two additional organisation-level metrics are required: one for customer-perceived quality (functional bugs, performance bugs, UX bugs, availability, etc.) and one for the lead time to deliver value (e.g. weighted lead time for investments and lead time from detection to fix for bugs). Technical improvements can then be expressed in terms of these metrics (e.g. “paying down this technical debt will reduce the uncertainty of lead time for changes to this component” or “this item will reduce the probability of bugs”). The outcome is often unknowable, which means these items sit in the “Complex” or “Chaotic” domain; however, the investor group can understand the intent. Once the intent is known, it is possible to construct a narrative that explains the value of the technical item in the context of other product investments. It is then possible to construct a backlog where WSJF may assist the prioritization discussion but does not dominate it.
A couple of related points:
- Paying down technical debt is a great way to train new people on a code base and reduce key-man dependency (itself a form of technical debt). If the organisation knows it will be making major changes to a particular component, paying down technical debt in that component is a great way to prepare for that future development and to build additional capacity so that the work can be done more quickly.
- A more important point is that having to put technical debt into the organizational backlog is a transitional state. It is necessary because teams allow technical debt to build up and then see the need to pay it down in large chunks. In mature Agile teams, the teams gradually improve the quality of the code base as part of every piece of work they do. For example, a few years ago I visited Nat Pryce on a project. He showed me a graph his team had created of the number of lines of code in their application. When they started, it contained one million lines of code. After six months it contained one hundred thousand lines of code. They had not taken time out to pay down technical debt; rather, it was a continual process alongside developing new features. On a mature team you are less likely to see backlog items to clean up technical debt, because clean-up is part of normal development. In other words, a mature Agile culture cleans up as it goes along, whereas an immature Agile culture has teams that see a choice between delivering features and paying down technical debt. It would appear that SAFe intends to embed this immature practice in its process by having a separate technical backlog.
In conclusion, a separate technical backlog is a failure state. It institutionalizes immature practices and creates a separation between product and technical concerns where there should be no split. A second, technical backlog hides technical concerns from the product organisation when they should be a primary concern. Instead, create metrics based on quality and lead time that allow engineers to communicate the importance of the work they need to do.
One backlog to rule them all, not two or three or more!