"How would you teach someone to read a Cumulative Flow Diagram?" (Spoiler alert: the CFD is most definitely NOT the single most important part of Kanban.)
It was circa 2010 and I had just met Frank Vega for the first time at a Kanban meetup. In all honesty, I may have met Frank in passing before that, but 2010 was the first time I had a chance to really talk to him. At that time, Frank was one of the few people in the Kanban community who really knew what he was talking about (in my opinion, of course). On a Skype call shortly after that meetup, Frank asked his fairly innocuous question, "How would you teach someone to read a Cumulative Flow Diagram?" (Incidentally, this was a completely rhetorical question, as Frank knew very well--better than anyone else, as it turns out--how to read a CFD.) In the decade or more since I was asked that question, most of my career has been based around trying to come up with a competent answer. My search has led me to the following conclusion:
Most of what has become Kanban orthodoxy is simply wrong. Kanban doctrine as it exists today is based mostly on rumor and misunderstanding and is seldom based on science, much less fact. The way the community talked about Cumulative Flow Diagrams in the early days (not only how to read them, but their importance altogether) was a perfect example of this triumph of ignorance.
Bear with me for a moment while I explain.
At the heart of what makes a Cumulative Flow Diagram (CFD) work is a relationship known as Little's Law. Dr. Little even uses a CFD in one of the proofs of his eponymous law [1] (note: he called it a Cumulative Arrivals/Departures diagram, as shown in Figure 1.1):
I bring up Little's Law because that equation is often held up as the foundational, irrefutable mathematical justification for Kanban. And it is. It's just not for the reasons that you think it is. Frank's question forced me into a deep dive of Little's Law (LL) and was the impetus for me to discover why all of us should really care about this simple equation (Spoiler alert: Little's Law itself is also NOT the single most important part of Kanban).
Let me say first that a full explanation of LL is beyond the scope of this book (for a fuller discussion please see "Actionable Agile Metrics for Predictability" [2]); however, I will need to take a few moments to summarize some of its relevant points.
LL can be stated as CT = WIP / TH where CT is the average Cycle Time of your process, WIP is the average Work In Progress of your process, and TH is the average Throughput of your process.
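To make the arithmetic concrete, here is a minimal sketch in Python. The function name `cycle_time` and the numbers (an average of 12 items in progress, 3 items finished per day) are made up for illustration and do not come from any real process.

```python
def cycle_time(avg_wip: float, avg_throughput: float) -> float:
    """Little's Law: CT = WIP / TH (all three are averages over the same period)."""
    if avg_throughput <= 0:
        raise ValueError("average throughput must be positive")
    return avg_wip / avg_throughput


# Hypothetical averages for one period, using days as the unit of time.
avg_wip = 12.0         # items in progress
avg_throughput = 3.0   # items finished per day
avg_ct = cycle_time(avg_wip, avg_throughput)  # days

print("CT  =", avg_ct)                    # CT = WIP / TH
print("WIP =", avg_ct * avg_throughput)   # rearranged: WIP = CT * TH
print("TH  =", avg_wip / avg_ct)          # rearranged: TH = WIP / CT
```

The three print statements are the same relationship rearranged; if you know any two of the averages, the equation gives you the third.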
LL is exact in its calculation and this equation can be applied to any flow system. Before I explain that statement further, let’s pause for a second and try an experiment. If you know the average TH, CT, and WIP of your process (for the last month, for example), I’d like you to plug those numbers into the LL equation now. Try several different permutations. Maybe first divide your WIP by your TH and see if you get your CT. Then try multiplying your CT by your TH to see if you get your WIP, etc. What do you see? My guess is your numbers aren’t quite coming out the way you would expect them to or as predicted by LL. Not only are they probably off, but in some cases they are probably off by quite a bit.
What’s going on here? The LL calculation is indeed exact, but it is only exact in contexts where a specific set of assumptions is fulfilled. Those assumptions are (for the time period under observation):
1. The average departure rate must equal the average arrival rate
2. All items that enter the system must finish and exit the system
3. The amount of WIP is roughly the same at the beginning and end of the time interval under observation
4. The average age of WIP is neither growing nor declining
5. Consistent units are used for the measurement of TH, CT, and WIP
[Note: Assumption #5 is given here for completeness only, as it is trivial. All it is saying is that if you want to measure CT in days, then TH needs to be measured per day and average WIP must be measured by day. Mixing units is a big no-no (e.g., CT in weeks and TH in story points), but that should be intuitively obvious. And if anyone on your team struggles with this point then you have bigger problems than how to best apply LL.]
The fact that your calculated numbers didn’t come out as predicted by LL tells us that your process explicitly or implicitly violated one or more of the LL assumptions at least once, and probably at multiple points, over the time period that you chose for your calculation. The net effect of violating LL’s assumptions is that you have destabilized your process--as evidenced by the equation not working.
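The experiment above can also be sketched in code. What follows is only an illustration under stated assumptions: the `Item` type, the helper names, and both data sets are hypothetical, time is counted in whole days starting at day 0, and an item counts as in progress from its start day up to (but not including) its end day. The sketch reports on assumptions #1 through #3 only; it does not check the age of WIP.

```python
from statistics import mean
from typing import NamedTuple, Optional


class Item(NamedTuple):
    start: int            # day the item entered the process
    end: Optional[int]    # day the item left the process; None if it never finished


def wip_on(items, day):
    """Count items that have started but not yet finished on the given day."""
    return sum(1 for i in items if i.start <= day and (i.end is None or i.end > day))


def flow_report(items, n_days):
    """Summarize days 0 .. n_days - 1 and compare measured CT with WIP / TH."""
    finished = [i for i in items if i.end is not None and 0 < i.end <= n_days]
    if not finished:
        raise ValueError("no items finished in the window")
    avg_wip = mean(wip_on(items, d) for d in range(n_days))
    throughput = len(finished) / n_days
    return {
        "arrivals": sum(1 for i in items if 0 <= i.start < n_days),  # assumption 1
        "departures": len(finished),                                 # assumption 1
        "never_finished": sum(1 for i in items if i.end is None),    # assumption 2
        "wip_first_day": wip_on(items, 0),                           # assumption 3
        "wip_last_day": wip_on(items, n_days - 1),                   # assumption 3
        "measured_ct": mean(i.end - i.start for i in finished),
        "wip_over_th": avg_wip / throughput,
    }


# Hypothetical data: every item that starts also finishes inside the window.
stable = [Item(0, 2), Item(1, 3), Item(2, 4), Item(3, 5), Item(4, 6)]
# Same process, plus three items that are started and then left to sit.
unstable = stable + [Item(2, None), Item(3, None), Item(4, None)]

for name, items in (("stable", stable), ("unstable", unstable)):
    print(name, flow_report(items, n_days=6))
```

In the stable data set, arrivals match departures and WIP / TH agrees with the measured average Cycle Time of 2 days. In the unstable one, the measured Cycle Time of the finished items is still 2 days, but WIP / TH works out to roughly 3.8 because arrivals outpace departures and WIP grows over the window.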
System stability (from a LL perspective) is so important because it is impossible to optimize a process that is inherently unstable. Your experience tells you this. How easy is it to optimize a process where the number of things you are working on increases every day? How easy is it to optimize a process where all the things you work on get blocked by dependencies on other teams? The LL assumptions, therefore, act as a powerful guide to policies we should implement to help prevent our process from destabilizing.
Any time you are applying Kanban to your context, you care about all five of Little's Law's assumptions (whether you know it or not).
And of those five, there is one assumption that rules them all--as promised by the title of this chapter.
A thorough understanding of what it means to violate each of LL's assumptions is key to the optimization of your delivery process. So let's take a minute to walk through each of those in a bit more detail.
The first thing to observe about the assumptions is that #1 and #3 are logically equivalent. I'm not sure why Dr. Little calls these out separately because I've never seen a case where one is fulfilled but the other is not. Therefore, I think we can safely treat those two as the same. But more importantly, you'll notice what Little is not saying here with either #1 or #3. He is making no judgment about the actual amount of WIP that is required to be in the system. He says nothing of less WIP being better or more WIP being worse. In fact, Little couldn't care less. All he cares about is that WIP is stable over time. So while having arrivals match departures (and thus unchanging WIP over time) is important, that tells us *nothing* about whether we have too much WIP, too little WIP, or just the right amount of WIP. Assumptions #1 and #3 therefore, while important, can be ruled out as *the* most important.
Assumption #2 is one that is frequently ignored. In your work, how often do you start something but never complete it? My guess is the number of times that has happened to you over the past few months is something greater than zero. Even so, while this assumption is again of crucial importance, violating it is usually the exception rather than the rule. Unless you find yourself in a context where you are always abandoning more work than you complete (in which case you have much bigger problems than LL), this assumption will also not be the dominant reason why you have a suboptimal workflow.
Which leaves us with assumption #4. Allowing items to arbitrarily age is the single greatest factor as to why you are not efficient, effective, or predictable at delivering customer value. Stated a different way, if you plan to adopt Kanban (or if you are already practicing Kanban), the single most important aspect that you should be paying attention to is not letting work items age unnecessarily!
More than limiting WIP, more than visualizing work, more than finding bottlenecks (which is not really a Kanban thing anyway), the only question to ask of your Kanban system is: are you letting items age needlessly?
Before we get into aging, we need to take a step back and first talk about Cycle Time (CT). Most people think that the reason Kanban emphasizes CT so much is so that we can pressure Agile teams into getting more things done faster. Nothing could be further from the truth. The reason that Kanban cares about CT is because CT represents the time to customer feedback.
We'll see in a later chapter that until a work item is actually in the hands of the customer, that item represents only hypothetical value. Value can only be determined by the customers themselves, and that determination can only be made after the item is delivered. Thus, CT is really a measure of "time to validated feedback".
However, CT itself can only be calculated at or after the moment when the item has actually finished. Before it has finished all we know is the item's age. That aging process starts immediately once work begins. Further, work items will continue to age until they are ultimately delivered to the customer. Thus, the more items age, the longer we delay precious feedback from the customer.
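The difference between age and Cycle Time is easy to sketch in code. This is a minimal illustration, not a prescribed method: the item names, the dates, and the helper names are hypothetical, and elapsed time is assumed to be the plain difference in calendar days (some teams count both the start and the finish day, which adds one).

```python
from datetime import date
from typing import Optional


def elapsed_days(started: date, until: date) -> int:
    """Whole calendar days between the start of work and the given date."""
    return (until - started).days


def age_or_cycle_time(started: date, finished: Optional[date], today: date):
    """Return ("cycle_time", days) for finished items, ("age", days) otherwise."""
    if finished is not None:
        return ("cycle_time", elapsed_days(started, finished))
    # Unfinished work has no Cycle Time yet; all we can measure is its age.
    return ("age", elapsed_days(started, today))


today = date(2024, 3, 15)  # hypothetical "now"
items = {
    "A": (date(2024, 3, 1), date(2024, 3, 8)),   # finished
    "B": (date(2024, 3, 4), None),               # still in progress
    "C": (date(2024, 3, 11), None),              # still in progress
}

for name, (started, finished) in items.items():
    print(name, age_or_cycle_time(started, finished, today))
```

Run the same loop tomorrow and item A's Cycle Time is unchanged, while the ages of B and C are each one day larger. That is the sense in which unfinished work keeps aging until it is delivered.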
That delayed feedback increases the chances of something going wrong with delivery. Maybe the business environment changes, maybe customer requirements change, maybe a global pandemic takes over--it's impossible to know what might happen to change a customer's needs. But what we do know is that longer age represents higher risk. And the ultimate risk is that we spend a long time working on something that ends up not being valuable. As my friend and colleague Prateek Singh likes to say, "it is all about finding out how wrong you are as quickly as possible." By letting items age unnecessarily, you are not just sabotaging your ability to deliver, you are sabotaging your ability to deliver what your customers really want.
So if aging is so bad, how do we prevent it from happening?
A question I love to ask in my workshops is "what are the two most effective ways to prevent items from aging unnecessarily?" This question usually stumps attendees because they want to run back to the dogma that they've been previously taught. You'll get answers like "lower WIP", or "clear blockers", or the like. But as we've just seen those answers don't necessarily lead to shorter age.
The first way to prevent items from aging is to finish them. It's that simple. If an item finishes, it is no longer aging. We can then begin the process to get customer feedback.
The second (and probably even better) way to prevent items from aging is to not start them. How many times are you and your team pressured to start work when you are not ready just for the sake of looking like you are making progress? From an LL perspective, that is the absolute worst thing that you can do.
Now let's put this all together. If you finish work as quickly as possible and don't start work until you are ready to do so, what have you just done? You guessed it, you've just controlled Work In Progress.
The real reason to control WIP is to prevent unnecessary aging.
We can take this logic a step further and assert that **all** Kanban practices can be derived from the basic principle that we don't want items to age unnecessarily. Why visualize work? So we can see where work is piling up and items are aging unnecessarily. Why mark work as blocked? So we can see where flow is not happening and items are aging unnecessarily. Why implement pull policies? So some items aren't allowed to jump the queue which would cause other items to age unnecessarily. And so on.
All Kanban practices can be derived from the singular motivation of not wanting items to age unnecessarily.
One final and very important thing to mention about how to prevent items from aging too much: If an item is taking too long to flow then the biggest culprit is probably that the item is too big. One of the first things you should look at for a "stuck" item is creative ways to break it up into many smaller items. Keep in mind that the idea here is not to break items up just to make our numbers look good. Rather, we want to find ways to break a big, valuable piece of work into several smaller--but still valuable!--pieces of work. In flow terms what we are really talking about is batch size. So many times you may be working on a single item--a single story, a single epic, a single feature, whatever--but what you really are working on are several small items masquerading as one large item (strategies for breaking work down will be explored in more detail in Chapter 3). The best signal you have that something may be too big and therefore may need to be broken down is its age. Ignore age at your peril.
If you are not paying attention to aging, you are missing the only real reason to do Kanban.
In other words, if Kanban is all about the optimal delivery of customer value, then how do you really know how optimal you are? The answer does not lie in WIP limits, CFDs, Flow Efficiency, change management, or any of the other B.S. you may have been fed until now. The answer lies in your ability to know whether your items are aging unnecessarily or not. Everything else in Kanban should be subordinate to that one aim.
However, an item that is aging in and of itself is not necessarily a bad thing. The reality is that all items must age to some extent before they can be delivered. The question we must ask, therefore, is how much age is too much age?
The answer is the Service Level Expectation or SLE. SLEs are another one of those topics that you have probably not heard much about. The SLE is so fundamental, in fact, that it deserves its own chapter, and that chapter immediately follows...
Endnotes
1. Little, J. D. C., and S. C. Graves. "Little's Law." In D. Chhajed and T. J. Lowe, eds., Building Intuition: Insights from Basic Operations Management Models and Principles (Springer Science + Business Media LLC, New York, 2008).
2. Vacanti, Daniel. "Actionable Agile Metrics for Predictability" (ActionableAgile Press, 2014).
Remember you can always download a PDF of The Kanban Pocket Guide here: https://prokanban.org/kpg/