Feature Monte Carlo

What is Monte Carlo?

In simplest terms, Monte Carlo means running an experiment many times, often tens of thousands of times. We generate random paths through the future, using the past as a model, to estimate the likelihood of different outcomes.

The Inputs

The key metric we use for MC is Throughput. This is the number of stories closed on a daily basis by the team. If the box below represents your process, this is the number of stories exiting it per day.


Next, we need the amount of work remaining. This is in the form of Features expected to be delivered during this release and the stories attached to those features. With Throughput and the amount of work remaining, we have two parts of the equation, the rate at which we get work done and the amount of work remaining.


The first two inputs help us figure out when we will be done with the features for a release. We can then compare this to the end date of the release and figure out whether each feature will be delivered within that release. The release end date is the third input that makes Monte Carlo tick.
There is one more input that we have to consider to make the simulations more realistic, but we will get to that a little later.

Mixing the Ingredients

With Monte Carlo simulations, what we are trying to do is simulate the future using the recent past. If the remaining days in the release look like the recent past, what is the probability that we can get our work done? Let us start with figuring out the probability of getting a certain number of stories done in the next 30 days. We will take a look at a team (sample data from a software development team) that is trying to get 87 stories done in the next 30 days. As the model, we will use the team's daily throughput from the past 35 days, which is the window we use for most teams.


These numbers were taken from a file export from a software development team on 3/12/2020. These are not real priorities or complete numbers. The throughput and the number of stories in these features are accurate as of 3/12/2020, but the 30-day deadline is completely hypothetical.
Take a look at the screenshot below. It shows the three inputs that we are considering (you might have to zoom in a bit).

The three (horribly drawn) boxes are the inputs we have discussed previously. The red box is the daily throughput of the team – the number of stories closed in each of the past 35 days. The yellow box is the date range we are simulating for – 30 days (3/26 to 4/25). The blue box is the number of items we are trying to get done.


A single Monte Carlo simulation is essentially a simulation of what the team will do over the next 30 days. Monte Carlo randomly selects a day from the past 35 days and assigns its throughput to the first day in the simulation (3/26). It then picks another random day from the past and assigns its throughput to the next day (3/27). This is done for every day until the end date of our simulation (4/25). We then total up all the throughputs for the simulated future, and that total is the result of this single Monte Carlo simulation.
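To make the single run concrete, here is a minimal Python sketch of one simulated future. The function name `simulate_once` and the sample `history` list are made up for illustration. They are not the team's real numbers, and this is not the code behind the screenshots.

```python
import random

def simulate_once(daily_throughput, days_to_simulate, rng=random):
    """One simulated future: for each future day, copy the throughput of a
    randomly chosen past day, then total the stories completed."""
    return sum(rng.choice(daily_throughput) for _ in range(days_to_simulate))

# Hypothetical history (NOT the team's data): stories closed on each of the past 35 days.
history = [0, 2, 1, 0, 3, 1, 0] * 5

# A seeded generator makes the run repeatable; drop it for a fresh random future.
total = simulate_once(history, days_to_simulate=30, rng=random.Random(42))
print(f"Stories completed in this one simulated future: {total}")
```

Past days are sampled with replacement, so the same past day can show up more than once in a simulated future.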


Doing this once is not enough. This single run is only one possible future. We repeat this process thousands of times. We are running 3000 simulations for the results shown here. We capture the resulting total stories completed in each of these simulations. We also keep track of how many times we got each result.
Now that we have these thousands of simulated futures, we can analyze the results as shown in the bottom half of the screenshot. That is a histogram of how often we got each result from our simulation. We can pick the number of stories that are remaining to be completed, 87 in this case, and find out what percentage of simulations resulted in 87 or more stories completed. As seen in the screenshot, the answer is 1.7%. We can answer this question for any ‘number of stories’ we are interested in. We can also place our features in priority order and find out where we cross certain thresholds: 85% for initial concern (Yellow) and 65% for a greater degree of concern (Red). Let us see what it looks like for the features for this team, assuming this hypothetical 30-day deadline.
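The repeat-and-count step can be sketched in a few lines of Python. This is only an illustration under assumptions: `chance_to_finish` is a hypothetical name, the `history` values are invented, and the 3000 default simply mirrors the number of runs mentioned above.

```python
import random
from collections import Counter

def chance_to_finish(daily_throughput, days, target_stories, runs=3000, seed=None):
    """Run many simulated futures and return (share of runs that completed at
    least `target_stories`, histogram of total stories -> number of runs)."""
    rng = random.Random(seed)
    totals = [
        sum(rng.choice(daily_throughput) for _ in range(days))
        for _ in range(runs)
    ]
    histogram = Counter(totals)  # how many times we got each result
    hits = sum(count for total, count in histogram.items() if total >= target_stories)
    return hits / runs, histogram

# Hypothetical history (NOT the team's data): stories closed on each of the past 35 days.
history = [0, 2, 1, 0, 3, 1, 0] * 5

probability, histogram = chance_to_finish(history, days=30, target_stories=40, seed=7)
print(f"Chance of finishing 40 or more stories in 30 days: {probability:.1%}")
```

Changing `target_stories` answers the same question for any other number of stories without rerunning anything by hand.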

As the cumulative number of stories to be done goes up, the ‘Chance to Finish’ for each successive feature keeps decreasing.

Input 4 Has Joined the Game

The method described above makes one major assumption: teams work on one feature at a time, in strict priority order. It assumes that teams are able to direct all their throughput towards the first feature, then the next, then the next, and so on. As you can imagine, that assumption is a major problem. Teams often, for a variety of reasons, work on multiple features simultaneously. This is a completely valid way for a team to work, and our Monte Carlo models should take it into account.


The last input that is used in these Monte Carlo runs is the number of features a team has in progress. We determine the features in progress by literally counting the features that have passed the start point, but not yet passed the finished point in the team’s process. This is the team’s Feature WIP. As teams have multiple features “In Progress” at a time, in order to accurately match the way the teams work, we need to spread the throughput across these features. 
Just as before, a random throughput from the past 35 days is selected for each simulated day, and this throughput is used to lower the remaining story count of randomly selected “In Progress” features. When one of these features reaches a story count of 0, it is marked as complete and the date on which it would complete is recorded. Also, when a feature is marked as complete, the next feature in priority order is put into progress. The process is repeated until there are no more stories remaining to be done or there are no more days left to simulate.
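Here is a rough Python sketch of one such run with Feature WIP. Treat it as one reading of the description above, not the actual implementation. In particular, it assumes each simulated story is handed, one at a time, to a randomly chosen in-progress feature, and that every feature has at least one story. The name `simulate_features` and the sample numbers are hypothetical.

```python
import random

def simulate_features(daily_throughput, feature_sizes, days, feature_wip, rng):
    """One simulated future with Feature WIP.
    `feature_sizes` lists stories remaining per feature, in priority order
    (assumed >= 1 each). Returns {feature_index: day_it_finished}."""
    remaining = list(feature_sizes)
    order = list(range(len(feature_sizes)))
    in_progress = order[:feature_wip]   # the first `feature_wip` features start now
    backlog = order[feature_wip:]       # the rest wait, still in priority order
    finished_on = {}
    for day in range(1, days + 1):
        # Assumption: each story closed today goes to a random in-progress feature.
        for _ in range(rng.choice(daily_throughput)):
            if not in_progress:
                return finished_on      # no stories left to do
            feature = rng.choice(in_progress)
            remaining[feature] -= 1
            if remaining[feature] == 0:
                finished_on[feature] = day
                in_progress.remove(feature)
                if backlog:
                    in_progress.append(backlog.pop(0))  # start the next priority
    return finished_on

# Hypothetical inputs (NOT the team's data).
history = [0, 2, 1, 0, 3, 1, 0] * 5
sizes = [5, 8, 3, 12]
result = simulate_features(history, sizes, days=30, feature_wip=2, rng=random.Random(3))
print(result)  # maps feature index to the simulated day it completed
```

With `feature_wip=1` this collapses to the strict priority-order behaviour described earlier, since there is only ever one feature to pick from.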


The important thing to note is that adding the Feature WIP into the simulations does not change the number of stories a team can get done, it only changes which features those stories come from.


The percentage of these simulations that have a feature ending before the Due Date of that feature is the success probability of that feature. Below are some examples using the same data that we used above, before we considered Feature WIP. The Feature WIP = 1 results are the same as shown above. 
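As a small illustration of that percentage, the Python sketch below turns a set of run results into per-feature success probabilities. The helper name `success_probabilities`, the feature labels, and the numbers are invented for the example. It also assumes that finishing on the due day counts as a success, which the description above does not spell out.

```python
def success_probabilities(run_results, due_day_by_feature):
    """`run_results` holds one dict per simulation run, mapping a feature to the
    simulated day it finished (features that never finished are absent).
    Returns the share of runs in which each feature finished by its due day."""
    runs = len(run_results)
    never = float("inf")  # stands in for "did not finish in this run"
    return {
        feature: sum(1 for run in run_results if run.get(feature, never) <= due_day) / runs
        for feature, due_day in due_day_by_feature.items()
    }

# Four made-up runs for two hypothetical features, both due on day 30.
runs = [
    {"A": 10, "B": 25},
    {"A": 12},            # B did not finish in this run
    {"A": 31, "B": 29},   # A finished a day late
    {"A": 8, "B": 31},
]
chances = success_probabilities(runs, {"A": 30, "B": 30})
for feature, chance in chances.items():
    print(f"Feature {feature}: {chance:.0%} chance of finishing by its due date")
```

In practice the list of runs would come from thousands of simulated futures rather than four hand-written ones.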


Feature WIP = 2

You can see some of these numbers starting to change. While most of these are the same as before, feature number 9 has dropped dramatically, from a 78% to a 23% chance of success. Meanwhile, the chance of success for the smaller feature number 10 has gone up from about 62% to 91%.
Let us see what happens when we increase the Feature WIP to 3 –

What if this team decided to have a feature WIP of 10 instead? What would those results look like?

Hopefully, the pattern is becoming evident. As we move from a WIP of 1 to larger WIPs, the success probabilities shift from being in priority order to being in roughly the order of the number of stories in a feature. Initially (with very low WIPs), priority has the greatest effect on the chance to finish. As WIP becomes higher, though, feature size becomes the greatest determinant of the chance to finish. In fact, if this team were to use a WIP of 18, i.e., start all features on Day 1, the results would be as follows:

As you can see, these results are strictly in the order of the number of stories in the feature.
Hopefully, this explains to a decent extent how Monte Carlo simulations produce the results that you see on a daily basis. There is no real magic to this voodoo. It is literally taking the data the team is generating and applying it to the future. There is no human intervention, just the team’s throughput data which acts as the model, the date as set by the team, the scope as determined by the features, and the Feature WIP as decided by the team. These 4 ingredients are then applied to the future and the results are presented. Below is the pseudo-code for how these simulations run. 
