The Kanban Pocket Guide: Chapter 2 - The Service Level Expectation

Maybe you've heard a myth about Kanban that goes something like this:

"Because Kanban has no timeboxes, items are allowed to take as much time as they need to finish."

Or maybe you've heard something like this:

"We can't do Scrum because we can't finish items in two weeks. So we do Kanban because Kanban has no requirement to get items done in two weeks."

Putting aside the misconceptions about Scrum in the second statement for a moment, both of the above quotes are incorrect in their assessment of Kanban. Work items in Kanban do not get to sit in progress forever and finish whenever we get around to working on them. That is the antithesis of flow. Flow implies movement or progress, and if items are just sitting and aging, then there is no flow. No flow, no Kanban.

So, no, in Kanban items don't get to take as long as they want to finish. But what is Kanban's solution to this problem? Well, as always, it's helpful to look at things from the perspective of our customers. What's the first question our customers will ask us once we start to work on something for them? If you answered "When will it be done?" then you win a prize. Whether you agree or not, that is a reasonable question for our customers to ask. And we need a way to provide them an answer.

If you think about it, what our customers are really asking us to do is to predict the future. Therefore, any answer we give them is tantamount to a forecast. A funny thing about the future, however, is that it has this nasty habit of being full of uncertainty. Despite what some people might tell you, no one can predict the future with 100% certainty. The moment uncertainty is involved in any endeavour, a probabilistic approach is warranted.

For example, before I flip this coin, tell me with 100% certainty that it will come up exactly heads. Obviously you can't give 100% certainty before the flip, but what you can say is that you have a 50% chance of it being heads (and a 50% chance of it being tails). As another example, before I roll this 6-sided die, tell me with 100% certainty that I will roll exactly a 3. Again, 100% certainty doesn't exist, but I do know I have about a 17% (one in six) chance of rolling a 3.

The same principle applies in our work. Once I start to work on an item, it is impossible for me to say with 100% certainty exactly how long it will take for that item to finish. But what I can do is look at historical data to come up with a probabilistic statement about how long it should take (e.g., "85% chance of finishing in 12 days or less"). By the way, another word for "probabilistic statement about the future" is "forecast".
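To make the idea of a probabilistic statement from historical data concrete, here is a minimal Python sketch. The cycle times are made-up illustration data, and `chance_of_finishing_within` is a hypothetical helper rather than part of any Kanban tool: it simply counts the share of past items that finished within a given number of days, assuming future items will behave like past ones.

```python
def chance_of_finishing_within(cycle_times_days, days):
    """Share of historical items that finished in `days` days or fewer."""
    if not cycle_times_days:
        raise ValueError("need at least one historical cycle time")
    finished_in_time = sum(1 for ct in cycle_times_days if ct <= days)
    return finished_in_time / len(cycle_times_days)


# Hypothetical cycle times (in days) of 20 previously finished items
history = [2, 3, 3, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 10, 11, 12, 12, 14, 18, 25]

chance = chance_of_finishing_within(history, 12)
print(f"{chance:.0%} chance of finishing in 12 days or less")
```

With this sample, 17 of the 20 items finished in 12 days or less, which is the same shape of statement as the "85% chance of finishing in 12 days or less" example.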

Putting this all together, when our customers ask "when will it be done?" we need to answer them with a forecast. In Kanban, the probabilistic statement about how long it will take for individual items to finish once started is known as the Service Level Expectation or SLE.

From the Kanban Guide: "The SLE is a forecast of how long it should take a single work item to flow from started to finished. The SLE itself has two parts: a period of elapsed time and a probability associated with that period (e.g., "85% of work items will be finished in eight days or less"). The SLE should be based on historical cycle time, and once calculated, should be visualized on the Kanban board. If historical cycle time data does not exist, a best guess will do until there is enough historical data for a proper SLE calculation."
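The two parts of the SLE (a period of elapsed time plus a probability) can be sketched in Python as follows, assuming cycle times are recorded in whole days. The percentile uses the nearest-rank method, which is only one of several percentile conventions; tools that interpolate may report slightly different numbers. `MIN_SAMPLES` is a hypothetical threshold for "enough historical data" (the Kanban Guide does not give one), and below it the sketch falls back to a best guess, as the Guide allows.

```python
from dataclasses import dataclass

# Hypothetical cut-off for "enough historical data"; the Kanban Guide sets no number.
MIN_SAMPLES = 10


@dataclass(frozen=True)
class ServiceLevelExpectation:
    days: int           # part 1: a period of elapsed time
    percentile: int     # part 2: the probability for that period, in percent
    from_history: bool  # False while the SLE is still a best guess

    def __str__(self) -> str:
        return (f"{self.percentile}% of work items will be finished "
                f"in {self.days} days or less")


def calculate_sle(cycle_times_days, percentile=85, best_guess_days=None):
    """Nearest-rank SLE from historical cycle times, or a best guess if history is thin."""
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    if len(cycle_times_days) >= MIN_SAMPLES:
        ordered = sorted(cycle_times_days)
        rank = -(-percentile * len(ordered) // 100)  # integer ceiling of p * n / 100
        return ServiceLevelExpectation(ordered[rank - 1], percentile, True)
    if best_guess_days is None:
        raise ValueError("not enough history yet; supply a best guess")
    return ServiceLevelExpectation(best_guess_days, percentile, False)


# Hypothetical cycle times (in days) of 20 finished items
history = [2, 3, 3, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 10, 11, 12, 12, 14, 18, 25]
print(calculate_sle(history))
print(calculate_sle([3, 9], best_guess_days=10))
```

With the sample data, the first call yields a 12-day, 85% SLE; the second has too little history and so reports the best guess instead, flagged by `from_history=False` so the team knows to recalculate once more items have finished.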

The SLE serves two functions in Kanban. First, it provides a completion forecast for work items once they have started. Second, the SLE helps us to answer the question we posed at the end of the last chapter, namely, "how much age is too much age?".

How to calculate an SLE for your process is covered ad nauseam in AAMFP (Actionable Agile Metrics for Predictability) and WWIBD (When Will It Be Done?). If you are not familiar with the derivation of an SLE, I would ask you to familiarize yourself with that concept before proceeding, as our attention here will be focused on the practicalities of how to use the SLE once calculated, especially in relation to age.

Percentiles as Intervention Triggers

As items age, we gain information about them. The percentiles on our Scatterplot work as perfect checkpoints to examine our newfound information. We will use these checkpoints to be as proactive as possible to ensure that work gets completed in a timely and predictable manner.

How does this work? Let’s talk about the 50th percentile first, and let’s assume for this discussion that our team is using an 85th percentile SLE. Once an item has been in progress long enough that its age equals the Cycle Time of the 50th percentile line, we can say a couple of things. First, we can say that, by definition, this item is now larger than half the work items we have seen before. That might give us reason to pause. What have we found out about this item that might require us to take action on it? Do we need to swarm on it? Do we need to break it up? Do we need to escalate the removal of a blocker? The urgency of these questions is due to the second thing we can say when an item’s age reaches the 50th percentile. When we first pulled the work item into our process, it had a 15% chance of violating its SLE (that is the very definition of using the 85th percentile as an SLE). Now that the item has hit the 50th percentile, the chance of it violating its SLE has doubled from 15% to 30%. Remember, the older an item gets, the larger the probability that it will get older. Even if that does not cause concern, it should at least cause conversation. This is what actionable predictability is all about.
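The doubling from 15% to 30% follows from conditional probability, assuming future cycle times behave like the historical ones. Writing $T$ for an item's eventual cycle time and $t_{p}$ for the cycle time at the $p$th percentile line, an item whose age has reached $t_{50}$ is known to satisfy $T > t_{50}$ (treating cycle time as continuous), and every item with $T > t_{85}$ also has $T > t_{50}$, so:

```latex
P\left(T > t_{85} \mid T > t_{50}\right)
  = \frac{P\left(T > t_{85}\right)}{P\left(T > t_{50}\right)}
  = \frac{1 - 0.85}{1 - 0.50}
  = \frac{0.15}{0.50}
  = 0.30
```

More generally, once an item's age reaches the $q$th percentile line (with $q < 85$), its chance of missing an 85th percentile SLE is $0.15 / (1 - q/100)$.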

When an item has aged to the 70th percentile line, we know it is older than more than two-thirds of the other items we have seen before. And now its chance of missing its SLE has jumped to 50%. Flip a coin. The conversations we were having earlier (e.g., pair, swarm, break the item up) should now become all the more urgent.

And they should continue to be urgent as that work item’s age gets closer and closer to the 85th percentile. The last thing we want is for that item to violate its SLE, even though in this example we know that will happen 15% of the time. We want to make sure that we have done everything we can to prevent a violation. The reason is that an item breaching its SLE does not mean we all of a sudden take our foot off the gas. We still need to finish that work. Some customer somewhere is waiting for their value to be delivered.
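Putting the checkpoints together, here is a minimal Python sketch of percentile lines used as intervention triggers. It assumes an 85th percentile SLE, trigger lines at the 50th, 70th, and 85th percentiles, nearest-rank percentiles, and hypothetical item ages; `highest_line_reached` and `chance_of_missing_sle` are illustrative helpers, not a standard API. The reported chance uses (100 - 85) / (100 - p), the same arithmetic behind the 15%, 30%, and 50% figures above, and it only holds if future items behave like past ones.

```python
# Hypothetical trigger lines; the 85th percentile line is also the SLE.
TRIGGER_PERCENTILES = (50, 70, 85)
SLE_PERCENTILE = 85


def percentile_line(cycle_times_days, percentile):
    """Nearest-rank percentile of historical cycle times, in days."""
    ordered = sorted(cycle_times_days)
    rank = -(-percentile * len(ordered) // 100)  # integer ceiling of p * n / 100
    return ordered[rank - 1]


def highest_line_reached(age_days, cycle_times_days):
    """Highest trigger percentile whose line the item's age has reached, or None."""
    reached = None
    for p in TRIGGER_PERCENTILES:
        if age_days >= percentile_line(cycle_times_days, p):
            reached = p
    return reached


def chance_of_missing_sle(reached_percentile):
    """Chance the item ends up beyond the SLE line, given the line it has reached.

    Uses (100 - SLE) / (100 - p), assuming future cycle times follow
    the same distribution as the historical ones.
    """
    p = 0 if reached_percentile is None else reached_percentile
    if p >= SLE_PERCENTILE:
        raise ValueError("item has already reached its SLE line")
    return (100 - SLE_PERCENTILE) / (100 - p)


# Hypothetical history (days) and ages of four items currently in progress
history = [2, 3, 3, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 10, 11, 12, 12, 14, 18, 25]
ages_in_progress = {"A": 4, "B": 7, "C": 10, "D": 12}

for item, age in ages_in_progress.items():
    line = highest_line_reached(age, history)
    if line == SLE_PERCENTILE:
        print(f"{item}: age {age} days, at or past the SLE line; keep pushing it to done")
    elif line is None:
        print(f"{item}: age {age} days, below every trigger line")
    else:
        chance = chance_of_missing_sle(line)
        print(f"{item}: age {age} days, reached p{line}, "
              f"{chance:.0%} chance of missing the SLE; swarm, split, or escalate?")
```

The output is only a prompt for the conversations described above; the decision to swarm, split, or escalate still belongs to the team.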

Right Sizing

I lied to you earlier when I said the SLE serves two functions in Kanban.  It actually serves three.  The third function is to assist in a practice known as right sizing.  There is a pervasive Kanban myth that all items that flow through your process have to be of the same size. After all, that's the only way that WIP Limits make sense, right?  Wrong.  There is nothing in flow theory that demands that all items that flow through a WIP-controlled system be exactly the same size.  In fact, there is a whole theory of variation that acknowledges that not only do items not have to be of exactly the same size but that there is also nothing you could ever do to make them all exactly the same size--even if you wanted to.  That is, variation in work item size will always exist.  

The consequence of variation, therefore, is that we have to design a system that is able to gracefully handle the varying sizes of items that will ultimately enter it.  But there are limits to the amount of variation that we can handle.  To illustrate this idea, Frank Vega loves to use the example of a wood chipper (anyone who has seen the movie *Fargo* knows exactly what I am talking about).  Think about what happens when you try to shove a tree branch that is too big into the wood chipper (à la *Fargo*).  At the very least that branch will get stuck.  At worst, that branch will break your chipper.  Likewise, what happens if you were to pick up a bunch of sawdust and throw it all into the chipper?  That would clog things up too.  But those are extreme cases.  The wood chipper can reasonably handle anything sized between sawdust and a small tree trunk.  Any branch that the wood chipper can handle without struggling is said to be right sized.

The same is true of your process.  For your process, the right size is defined by the range of outcomes covered by the percentile you have chosen for your SLE.  For example, if you are using the 85th percentile as your SLE, and in your process the 85th percentile is 12 days or less, then the right size for items flowing into your system is 12 days or less.

Conclusion

Once you get the hang of managing work by age, then you are about 80% of the way to being able to optimize flow.  There are a few odds and ends that we still need to cover (i.e., the rest of this book), but none of that will mean anything unless you grasp the concept of work item age!

Remember, however, that it is impossible for us to know how big an item is *before* we start to work on it.  As work progresses, we need to continually compare that item's age to our SLE, using percentile lines as triggers as previously discussed.  Just because you thought something was right sized when you started it doesn't mean that it actually is.  The only way to truly know is to monitor aging for each and every item that flows through your system.

But monitoring is only half the battle--and it's not even the most important part.  The other half of the battle is to take action once you have the information that an item is taking too long to complete.  The best information in the world is useless unless action is taken.  Hence, Kanban practice #2, "Active Management of Items in Progress".  Luckily, we'll talk about that topic next.

Remember you can always download a PDF of The Kanban Pocket Guide here: https://prokanban.org/kpg/