Why Story Points are a measure of complexity, not effort

Measures of estimation

For those of you who have not heard of Story Points, let me start by explaining that they are a widely used way of estimating the backlog of work on a project, particularly in agile teams.

Instead of estimating Feature X in the backlog as “8 Man Days”, you might estimate Feature X as “8 Story Points”.

But herein lies a problem. What exactly do we mean by “8 Man Days” and “8 Story Points”? They are not interchangeable.

What is a “Man Day”?

First, let us re-visit that old friend of ours in estimation: The Man Day!

The Man Day, which would perhaps be better referred to as “The Worker Day”, has long been used as a unit of estimation when you can accurately predict the complexity of the task ahead and when your available workers are of equal ability.

Say you want to build a wall X feet long and Y feet high. You might say it is “8 Man Days” of effort. The effort, here, is the operative word.

So, if you have 1 person, and they work for 8 Days, they should have built the wall if your estimate is accurate.

And if you have 2 people, they should have built the wall in 4 days. And 4 should have built the wall in 2 days, and so on.

In its simplest form, the effort of Man Days can be divided by the available resource of workers – though for reasons that are beyond the scope of this article, this is not strictly true!
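To make the arithmetic concrete, here is a minimal sketch of that simple division. The function name `days_to_finish` is made up for illustration, and it assumes exactly what the wall example assumes: identical workers and an accurate estimate, with none of the real-world overheads hinted at above.

```python
import math


def days_to_finish(effort_man_days: float, workers: int) -> int:
    """Naive calendar days: total effort shared evenly among identical workers."""
    if workers < 1:
        raise ValueError("need at least one worker")
    # Round up, because a part-used day is still a day on the calendar.
    return math.ceil(effort_man_days / workers)


# The wall from the example: 8 Man Days of effort.
for workers in (1, 2, 4):
    print(workers, "worker(s):", days_to_finish(8, workers), "day(s)")
```

With 1, 2 and 4 workers this gives 8, 4 and 2 days, matching the figures above.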

But what happens if you need to estimate a task whose complexity you cannot accurately predict, and you cannot rely on equal ability amongst your available resources?

What is a Story Point?

People have been building walls since the time of the Ancient Egyptians. And maybe before. Humanity, as a whole, knows what it’s doing when it comes to building walls.

But what about modern computer systems? These can often be complex and custom in design. And in an agile environment, they’re also constantly evolving.

And with the continual advent of new technologies, some staff might be more competent in some technologies than others at any given time.

In such an environment, it is no longer appropriate to estimate in terms of effort. Instead, it is far better to estimate in terms of complexity.

And just as the “Man Day” is a unit of effort, the watt a unit of power and the gram a unit of mass, it can be said that the “Story Point” is a unit of complexity.

When we estimate Feature X as “8 Story Points”, we are saying Feature X has a Complexity of 8. But what do we mean by “Complexity of 8”?

Fibonacci Sequence and Relative Complexity

When we discussed Man Days above with our “8 Man Days” example, the 8 acted as a multiplier. As in “1 Man Day x 8”. And we could then do arithmetic on our estimate of effort, as in “(1 Man Day x 8) / 2 Men”.

Unfortunately, we are fresh out of luck when it comes to using arithmetic to measure Complexity with Story Points and must instead look at Relative Complexity.

This is why many Story Point scales use a Fibonacci-style sequence (0, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89) or similar.

We should never say that Feature X with 8 Story Points is 4 times more complex than Feature Y with 2 Story Points. (Don’t be tempted to use the word “effort” here. Story Points are a unit of complexity, not effort!) We can only say that Feature X is relatively more complex than Feature Y and relatively less complex than Feature Z, which has, say, 13 Story Points.

The widening gaps between the numbers in the sequence reflect the growing uncertainty of the estimate as the complexity grows!
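As a rough illustration, the sketch below takes the scale quoted above, prints the gap between each step and the next, and then ranks three features by their points. The feature names and the helpers `gaps` and `rank_by_complexity` are hypothetical. Note that the code only sorts the estimates; it deliberately never divides one by another.

```python
# The scale quoted in this article (a Fibonacci-style sequence).
SCALE = (0, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89)


def gaps(scale):
    """Distance between each value on the scale and the next one up."""
    return [upper - lower for lower, upper in zip(scale, scale[1:])]


def rank_by_complexity(estimates):
    """Order features from least to most complex; only the order is meaningful."""
    return sorted(estimates, key=estimates.get)


# Hypothetical backlog using the numbers from the example above.
estimates = {"Feature X": 8, "Feature Y": 2, "Feature Z": 13}

print(gaps(SCALE))
print(rank_by_complexity(estimates))
```

The gaps get wider towards the top of the scale, so the bigger the item, the coarser the bucket it falls into.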

Why Story Points cannot be converted into Man Days

I hope, by now, that this is obvious. They are simply not the same thing. One is a unit of effort, the other a unit of complexity.

And on top of that, one can have arithmetic easily applied to it, while the other is a relative scale that cannot.

But herein lies the beauty of Story Points, because just as Story Points solve the issue of complexity in a computer system in a way that Man Days cannot, they also help solve the issue of resource skill imbalance.

Solving the issue of resource skill imbalance

Say you have two software developers in a room, and they need to estimate two features: Feature X and Feature Y.

The more experienced of the two developers looks at Feature X and triumphantly declares “2 Man Days” and then looks at Feature Y and triumphantly declares “4 Man Days”.

But the less experienced developer looks uneasy. They would have much rather declared “4 Man Days” for Feature X and “8 Man Days” for Feature Y.

This won’t do – when your boss is looking on expectantly, there can only be one correct answer! But what are these developers really saying?

They are both saying that Feature Y is relatively more complex than Feature X. Using Story Points, they can both agree, for example, that Feature X has a complexity of 3 and Feature Y has a complexity of 5.

In terms of committing to work, they have committed to implementing Features X and Y with a combined complexity of 8. They have not committed to a defined effort. They can’t. Committing to a defined effort, as illustrated above, requires knowing in advance which specific person will carry out the implementation.

So how can we measure this work?

Measuring throughput

In simple terms: you measure your Velocity over time.

Using the above example, both developers have committed to 8 Story Points of Complexity. It’s a Complexity that the experienced developer estimated would take 6 Man Days to complete, and the less experienced developer estimated would take 12 Man Days to complete.

But this does not matter. If we are working with the Scrum methodology, we are working within a time-boxed Sprint time frame. Say, two weeks. In the case of our example, this would be 20 Man Days of average resource availability: 5 days x 2 weeks x 2 developers.

And if, say, this is our first Sprint, we might choose to only commit to delivering our 8 Story Points of Complexity.

At the end of the first Sprint, one of three things will have happened. Either the 8 Story Points of Complexity have used up our available resource time exactly, or we have finished ahead of schedule (say in half the available time), or we still have outstanding work. The developers have either perfectly committed, under-committed or over-committed.

If the first scenario happens, then our Velocity is currently 8 Story Points per 20 Man Days of average resource availability. We should commit to no more than 8 Story Points of Complexity for the following Sprint and see how we get on, with our Velocity averaging out over time.

If scenario two happens, then our Velocity is currently 16 Story Points per 20 Man Days of average resource availability and we should commit to 16 Story Points of Complexity for the following Sprint. Again, our Velocity will average out over time.

And finally, if scenario three happens, then our current Velocity is less than 8 Story Points per 20 Man Days of average resource availability and we should commit to fewer Story Points going forward.
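The three scenarios can be sketched in a few lines. This is only an illustration: the function names are made up, Velocity is simply scaled up from the time actually used to a full Sprint of capacity, and the 5 completed Story Points in the third scenario is an assumed figure, since all we know there is that it is fewer than 8.

```python
def sprint_capacity(days_per_week: int, weeks: int, developers: int) -> int:
    """Man Days of average resource availability in one Sprint."""
    return days_per_week * weeks * developers


def velocity(points_completed: float, man_days_used: float, capacity: float) -> float:
    """Story Points per full Sprint, scaled up from the time actually used."""
    if man_days_used <= 0:
        raise ValueError("man_days_used must be positive")
    return points_completed * capacity / man_days_used


capacity = sprint_capacity(days_per_week=5, weeks=2, developers=2)  # 20 Man Days

scenarios = {
    "perfectly committed": velocity(8, 20, capacity),  # all 8 points, all the time used
    "under-committed": velocity(8, 10, capacity),      # all 8 points in half the time
    "over-committed": velocity(5, 20, capacity),       # assumed: only 5 of 8 points done
}

for name, points_per_sprint in scenarios.items():
    print(f"{name}: {points_per_sprint:g} Story Points per {capacity} Man Days")
```

Whatever number comes out is the amount to commit to for the following Sprint.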

It’s important to note that while it’s the ideal scenario, perfect committing is rare – people are generally optimistic about what they can get done! The trick is that Velocity averages out over time, so you’ll see a more stable and consistent picture of delivery when you average over several Sprints (3 is a common number). Over time, teams will learn what their true Velocity is and adapt as new skills are developed.
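One simple way to do that averaging is a rolling mean over the most recent Sprints. The sketch below assumes a window of 3 Sprints and uses an invented history of completed Story Points; neither the helper name `rolling_velocity` nor the numbers come from a real team.

```python
from statistics import mean


def rolling_velocity(completed_points, window=3):
    """Average Story Points completed over the most recent `window` Sprints."""
    if window < 1:
        raise ValueError("window must be at least 1")
    if not completed_points:
        raise ValueError("need at least one finished Sprint")
    # Slicing copes with histories shorter than the window.
    return mean(completed_points[-window:])


# Hypothetical history: Story Points completed in each finished Sprint, oldest first.
history = [8, 16, 10, 13]

print("Commit to roughly", rolling_velocity(history), "Story Points next Sprint")
```

A short window reacts quickly as the team picks up new skills, while a longer one smooths out the odd unusually good or bad Sprint.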

Conclusion

Hopefully, this article has been of some use in shining a light on why Story Points are a measure of complexity, not effort, and why they should be used as such when it comes to improving the software development estimation practices of your teams.

If you have any further questions regarding anything Agile for your business development, then please get in touch.

Published: Apr 12, 2018

Updated: Jun 22, 2023
