Can’t We Just Use Story Points with Monte Carlo Simulations?


Approaching Multiple Item Forecasts

Teams and organisations are regularly faced with questions of forecasting. When will this set of features be delivered? How many items can we expect to complete by a given date? These are not trivial questions - whether you are aligning with stakeholders, planning a release, or making an informed decision.

Monte Carlo Simulation (MCS) has become an increasingly common technique for tackling these questions. By running thousands of simulated scenarios based on historical data, MCS provides a probabilistic forecast rather than the false sense of precision that comes from a single-point estimate - a far more honest way to answer the core questions of "when will it be done?" and "how much can we deliver?"

This article is not an introduction to MCS or multiple item forecasting - there are excellent resources that cover both in far greater and better depth, and I’d encourage you to explore them if you are building your foundational understanding. A couple of my favourites are Flow Forecasting Pocket Guide and When Will It Be Done? 

What I’m here to explore is something more specific: why Throughput is the right input - the “currency” - when running MCS for multiple item forecasts.

The Currency Behind MCS: Throughput

Before exploring why Throughput is the right unit of measure, it helps to ground ourselves in what it actually is. The ProKanban definition is a useful starting point: Throughput is the number of work items completed per unit of time.

Simple. No estimation, no weighting, just a count.

When you look across the articles, books and research on MCS, you'll find Throughput consistently positioned as the input of choice. In practice, this means two inputs drive the simulation:

  • Historical daily Throughput - the count of items your team completed per day (or per week, sprint, or month) over a given period
  • Number of remaining items - the count of items you need to complete (in a "when" forecast)

Both inputs are counts of actual items. No conversion, no translation - just real data reflecting what your team has actually delivered.
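To make the mechanics concrete, here is a minimal sketch of a "when will it be done?" simulation driven by those two inputs. The historical throughput values, remaining-item count, and trial count below are made-up illustrative numbers, not from the article:

```python
import random

# Illustrative inputs (assumptions for this sketch, not real team data)
historical_daily_throughput = [0, 2, 1, 0, 3, 1, 2, 0, 1, 2]  # items completed per day
remaining_items = 30
trials = 10_000

days_needed = []
for _ in range(trials):
    done, days = 0, 0
    while done < remaining_items:
        # Each simulated day, sample a throughput value from real history
        done += random.choice(historical_daily_throughput)
        days += 1
    days_needed.append(days)

days_needed.sort()
# Percentiles turn the 10,000 simulated outcomes into a probabilistic answer
p50 = days_needed[int(trials * 0.50)]
p85 = days_needed[int(trials * 0.85)]
print(f"50% likely within {p50} days, 85% likely within {p85} days")
```

Notice there is no unit conversion anywhere: both inputs are counts of items, which is precisely the point.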

This naturally raises a question worth sitting with: why couldn't we just use story points? Feed in completed daily story points as the historical input, enter the sum of remaining estimates, and let the simulation run. From a pure mathematical standpoint, you technically could - MCS doesn't mind what unit you feed it, as long as the inputs are consistent.

But there are several compelling reasons why you wouldn't want to. I'll explore four of the key reasons why Throughput is so highly regarded as the “currency” for multiple item forecasts.

1. Throughput Saves Estimation Time

Picture a planning poker session. Cards on the table, five different opinions, a heated discussion about whether this story is a 5, 8 or 13, and eventually (even though approaches do vary) an average taken. Multiply that across every item in your backlog, every sprint, every new feature that gets broken down. It becomes a cottage industry in its own right - significant time spent in refinement and planning, producing/updating numbers that ultimately don't tell you when something will be delivered.

If story points were the input for an MCS, you'd need to estimate every single item in the backlog - not just to establish a points-based history for completed items, but also to produce a "remaining work" figure to simulate against. Time-wise, that's a considerable investment before you've even run the simulation.

Throughput and remaining items (required for “when” forecasts) sidestep this entirely: there's nothing to calculate or discuss - they're a count.

Yes, right-sizing (not to be confused with same-sizing) still applies, but from experience it is a far simpler and more productive conversation than assigning points.

One of my previous teams reduced the time spent in refinement sessions by 60% after moving away from estimation-based approaches. That time was reinvested into actually delivering, reducing uncertainty through doing, rather than trying to predict it all upfront.

2. Throughput Excludes Subjectivity and Bias

Estimation is, by its nature, a subjective exercise. When a team discusses whether an item is a 3 or a 5, they are making relative guesses through ordinal data - influenced by experience, context, confidence and any number of human factors. Two people can look at the same piece of work and land on entirely different numbers, both with perfectly reasonable justifications.

Feed that subjectivity (e.g. points completed per day) into an MCS and you embed additional variability. You're no longer running thousands of scenarios against real historical delivery data; you're running them against opinions. The variability introduced by inconsistent or subjective estimates produces insight that reflects how your team felt about the work rather than how they actually delivered it.

Throughput removes this entirely. A completed item is a completed item; there's nothing to interpret. The count is objective, and that objectivity is what makes it a reliable simulation input.

I'll acknowledge that right-sizing introduces a degree of subjectivity. But the opportunity for bias is significantly lower, and crucially, we have active mechanisms to manage it - item aging being a key one.

3. Throughput is Easily Understood

I'll keep this one short, because the point is a simple one.

Have you ever tried explaining story points to someone new? The confused looks, the questions - What's velocity? Why Fibonacci numbers? Why not just use days? Even when people grasp the concept, teams apply it inconsistently.

All of that complexity is noise when what you're really trying to do is forecast delivery.

Throughput needs almost no explanation. It's a count, items completed per day. People understand it immediately and can instantly apply it. It's also drift-resistant; unlike story points, which tend to inflate or deflate over time as teams/contexts shift, a count remains a count.

The one instinct worth watching out for is the temptation to relate points back to time. It's a natural human response, and a problematic one. That leads us into the fourth and final reason.

4. Throughput Accounts for Blockers, Waiting, and Dependencies

I've saved my favourite until last.

A common assumption in estimation is that complexity drives duration: a more complex item will naturally take longer to deliver. But the data often tells a different story. When you plot item complexity (points) against actual cycle time, the correlation is weak at best. Complex items don't reliably take longer, and simple items don't reliably move faster.

Why? Because when teams estimate, they are largely thinking about the active work: the coding, the testing, the doing. What they aren't accounting for is everything in between: the waiting for a dependency, the blocker that sits unresolved for three days, the handoff that gets stuck in a queue. These delays can account for 90% or more of the time from start to completion, yet they rarely feature in an estimate. That's not a criticism of teams - it's simply very difficult to estimate the unknown. Now imagine using that as an input into an MCS: you're missing out on a lot of delivery context.

Throughput doesn't require you to predict any of this. It captures it automatically.

There is a well-established scientific relationship, under certain assumptions, between Cycle Time, WIP, and Throughput, known as Little's Law. This is not a heuristic or a rule of thumb; it is the actual physics of how work moves through a system. Because Throughput is derived from real end-to-end historic delivery data, it inherently reflects the full picture: the active work, the waiting, the blockers, and the dependencies. 
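As a quick illustration of that relationship, in its long-run average form Little's Law states that Cycle Time = WIP / Throughput. The figures below are purely illustrative assumptions chosen for the arithmetic:

```python
# Little's Law (long-run averages): avg Cycle Time = avg WIP / avg Throughput
# The numbers here are illustrative assumptions, not real team data.
avg_wip = 12          # items in progress on average
avg_throughput = 3.0  # items completed per day on average

avg_cycle_time = avg_wip / avg_throughput  # average days from start to finish
print(avg_cycle_time)  # 4.0
```

Because Throughput sits in this identity alongside WIP and Cycle Time, a simple count of completed items already encodes every queue, blocker, and handoff delay that stretched the end-to-end time.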

This gives your MCS a true end-to-end view of how work flows through your system. And if your process has multiple defined finish points (through multiple Definitions of Workflow (DoWs)) Throughput also gives you the flexibility to run multiple forecasts based on those different “finish” points.

Summary

When faced with a multiple-item forecast, the question is simple: what is the most effective and efficient unit of measure to feed into your MCS? I hope I have provided some insight into why the answer is Throughput.

To recap, the four reasons I’ve explored:

  • It saves estimation time - Throughput is a count, readily available in your delivery data. Reinvest that time into actually delivering potential value and removing uncertainty through doing.
  • It excludes subjectivity and bias - A completed item is a completed item. No ordinal guesswork and no unnecessary variability introduced into your simulation.
  • It is easily understood - A count needs no lengthy explanation. It's intuitive, immediately applicable, and drift-resistant.
  • It accounts for blockers, waiting, and dependencies - Throughput captures the full end-to-end reality of how work moves through your system, including everything that estimation frequently, but understandably, excludes.

Whilst story points remain mathematically valid as an input, the reasons above make a case for why there's a better alternative that is researched, taught and practised.

Don't take my word for it - in the next article I'll run a Monte Carlo Simulation using both story points and Throughput side by side, so you can see the real-world impact of these issues for yourself.

You may also find yourself asking a different question: if Throughput and Cycle Time share a scientific relationship through Little's Law, could Cycle Time serve as an equally effective input? It's a great question - stay tuned.