
Why Story Points are a measure of complexity, not effort

In this post, Clearvision Development Expert Rob Giddings explores the differences between effort and complexity when it comes to measures of estimation in Story Points.

Measures of estimation

For those of you who have not heard of Story Points, let me start by explaining that they have become a widely preferred method of estimating the backlog of work on a project.

Instead of estimating Feature X in the backlog as “8 Man Days”, you might estimate Feature X as “8 Story Points”.

But herein lies a problem. What exactly do we mean by “8 Man Days” and “8 Story Points”? They are not interchangeable.

What is a “Man Day”?

First let us re-visit that old friend of ours in estimation: The Man Day!


The Man Day, which would perhaps be better referred to as “The Worker Day”, has long been used as a unit of estimation when you can accurately predict the complexity of the task ahead and when your available workers are of equal ability.

Say you want to build a wall X feet long and Y feet high. You might say it is “8 Man Days” of effort. Effort, here, being the operative word.

So, if you have 1 person and they work for 8 days, they should have built the wall if your estimate is accurate.

And if you have 2 people, they should have built the wall in 4 days. And 4 people should have built it in 2 days, and so on.

In its simplest form, the effort in Man Days can be divided by the number of available workers, though for reasons that are beyond the scope of this article, this is not strictly true!
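The idealised arithmetic behind the wall example can be sketched in a few lines of Python. This is a minimal illustration rather than a planning tool: the function name `days_to_complete` is hypothetical, and it deliberately bakes in the assumptions the wall example relies on (equally able workers, no coordination overhead).

```python
def days_to_complete(man_days: float, workers: int) -> float:
    """Idealised Man Day arithmetic: effort split evenly among workers.

    Assumes every worker is equally productive and that adding people adds
    no overhead, which, as noted above, is not strictly true in practice.
    """
    if workers < 1:
        raise ValueError("need at least one worker")
    return man_days / workers


# The 8 Man Day wall from the example, built by 1, 2 and 4 people.
for workers in (1, 2, 4):
    print(workers, days_to_complete(8, workers))
```

This division is exactly the operation that, as the rest of the article argues, has no meaningful equivalent for Story Points.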

But what happens if you need to estimate a task that you can neither accurately predict the complexity of, nor rely on equal ability amongst your available resources?

What is a Story Point?

People have been building walls since the time of the Ancient Egyptians, and perhaps before. Humanity as a whole knows what it’s doing when it comes to building walls.

But what about modern computer systems? These can often be complex and custom in design. And in an agile environment, they’re also constantly evolving.

And with the continual advent of new technologies, at any given time some staff might be more competent in certain technologies than others.

In such an environment, it is no longer appropriate to measure estimation in terms of effort. Instead, it is far better to measure in terms of complexity.

And just as the “Man Day” is a unit of effort, the watt a unit of power and the gram a unit of mass, it can be said that the “Story Point” is a unit of complexity.

When we estimate Feature X as “8 Story Points”, we are saying Feature X has a Complexity of 8. But what do we mean by “Complexity of 8”?

Fibonacci sequence and Relative Complexity

When we discussed Man Days above with our “8 Man Days” example, the 8 acted as a multiplier, as in “1 Man Day x 8”. We could then do arithmetic on our estimate of effort, as in “(1 Man Day x 8) / 2 Men”.

Unfortunately, we are fresh out of luck when it comes to using arithmetic to measure Complexity with Story Points, and must instead look at Relative Complexity.

This is why many Story Point scales use a Fibonacci sequence (0, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89) or similar.

We should never say that Feature X with 8 Story Points is 4 times more complex than Feature Y with 2 Story Points. (Don’t be tempted to use the word “effort” here. Story Points are a unit of complexity, not effort!) We can only say that Feature X is relatively more complex than Feature Y, and relatively less complex than Feature Z, which has, say, 13 Story Points.

The Fibonacci sequence reflects this: the gaps between its values widen as complexity grows, mirroring the growing uncertainty of the estimate!
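A short Python sketch can generate the scale quoted above and show how the gaps between neighbouring values widen. The function name `story_point_scale` and its `limit` parameter are hypothetical; the sketch assumes the common variant that drops the repeated 1 of the true Fibonacci sequence (0, 1, 1, 2, ...).

```python
def story_point_scale(limit: int = 89) -> list[int]:
    """Build a Fibonacci-style Story Point scale: 0, 1, 2, 3, 5, 8, ...

    Starts from 1 and 2 so the duplicate 1 of the true Fibonacci
    sequence does not appear, and stops at `limit`.
    """
    scale = [0, 1]
    a, b = 1, 2
    while b <= limit:
        scale.append(b)
        a, b = b, a + b
    return scale


scale = story_point_scale()
print(scale)  # [0, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]

# Gap between each value and the next: it grows with the value, so large
# estimates are deliberately coarser than small ones.
gaps = [high - low for low, high in zip(scale, scale[1:])]
print(gaps)  # [1, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

The widening gaps are the point: a team is never asked to choose between, say, 34 and 35, because that precision is not available at that level of complexity.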

Why Story Points cannot be converted into Man Days

I hope by now that this is obvious. They are simply not the same thing. One is a unit of effort, the other a unit of complexity.

And on top of that, one can have arithmetic easily applied to it, while the other is a relative scale that cannot.

But herein lies the beauty of Story Points: just as they solve the issue of complexity in a computer system in a way that Man Days cannot, they also help solve the issue of resource skill imbalance.

Solving the issue of resource skill imbalance

Say you have two software developers in a room and they need to estimate two features: Feature X and Feature Y.

The more experienced of the two developers looks at Feature X and triumphantly declares “2 Man Days” and then looks at Feature Y and triumphantly declares “4 Man Days”.

But the less experienced developer looks uneasy. They would much rather have declared “4 Man Days” for Feature X and “8 Man Days” for Feature Y.

This won’t do: when your boss is looking on expectantly, there can only be one correct answer! But what are these developers really saying?

They are both saying that Feature Y is relatively more complex than Feature X. Using Story Points, they can both agree, for example, that Feature X has a complexity of 3, and Feature Y has a complexity of 5.

In terms of committing to work, they have committed to implementing Features X and Y with a combined complexity of 8. They have not committed to a defined effort. They can’t. Committing to a defined effort, as illustrated above, requires a commitment that one specific person will carry out the implementation.

So how can we measure this work?

Measuring throughput

In simple terms: you measure your Velocity over time.

Using the above example, both developers have committed to 8 Story Points of Complexity. It’s a Complexity that the experienced developer estimated would take 6 Man Days to complete and the less experienced developer estimated would take 12 Man Days to complete.

But this does not matter. If we are working with the Scrum methodology, we are working within a time-boxed Sprint time frame. Say, two weeks. In the case of our example this would be 20 Man Days of average resource availability: 5 days x 2 weeks x 2 developers.
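The numbers in this example can be tallied in Python to show that the two developers’ Man Day totals disagree while the Story Point total does not, and to compute the Sprint capacity. This is a minimal sketch; the variable names are illustrative only.

```python
# Man Day estimates from the example: (Feature X, Feature Y) per developer.
man_day_estimates = {
    "experienced": (2, 4),
    "less experienced": (4, 8),
}
# The Story Points both developers agreed on.
story_points = {"Feature X": 3, "Feature Y": 5}

# The effort totals depend on who is estimating...
for developer, days in man_day_estimates.items():
    print(f"{developer}: {sum(days)} Man Days")  # 6 and 12 respectively

# ...but the committed complexity is the same for the whole team.
committed_points = sum(story_points.values())
print(f"committed: {committed_points} Story Points")  # committed: 8 Story Points

# Sprint capacity: working days per week x weeks per Sprint x developers.
DAYS_PER_WEEK, WEEKS_PER_SPRINT, DEVELOPERS = 5, 2, 2
capacity_man_days = DAYS_PER_WEEK * WEEKS_PER_SPRINT * DEVELOPERS
print(f"capacity: {capacity_man_days} Man Days")  # capacity: 20 Man Days
```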

And if, say, this is our first Sprint, we might choose to commit to delivering only our 8 Story Points of Complexity.

At the end of the first Sprint, one of three things will have happened. Either the 8 Story Points of Complexity have utilised our available resource time exactly; we finish ahead of schedule (say in half the available time); or we continue to have outstanding work. The developers have either perfectly committed, under committed or over committed.

If the first scenario happens, then our Velocity is currently 8 Story Points per 20 Man Days of average resource availability. We should commit to no more than 8 Story Points of Complexity for the following Sprint and see how we get on, with our Velocity averaging out over time.

If scenario two happens, then our Velocity is currently 16 Story Points per 20 Man Days of average resource availability and we should commit to 16 Story Points of Complexity for the following Sprint. Again, our Velocity will average out over time.

And finally, if scenario three happens, then our current Velocity is less than 8 Story Points per 20 Man Days of average resource availability and we should commit to fewer Story Points going forward.
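The three scenarios can be expressed as a small Python helper that extrapolates a Sprint’s Velocity from the points completed and the share of the Sprint used. The name `observed_velocity` is hypothetical, and the sketch assumes throughput would have stayed constant for the rest of the Sprint, which is the same simplification the scenarios make. The figure in scenario three is an invented illustration.

```python
def observed_velocity(points_completed: float, fraction_of_sprint_used: float) -> float:
    """Story Points per Sprint, extrapolated from the share of the Sprint used.

    Hypothetical helper: assumes the team would have kept the same pace
    for the remainder of the Sprint.
    """
    if not 0 < fraction_of_sprint_used <= 1:
        raise ValueError("fraction_of_sprint_used must be in (0, 1]")
    return points_completed / fraction_of_sprint_used


# Scenario 1: all 8 points done, using the whole Sprint.
print(observed_velocity(8, 1.0))  # 8.0
# Scenario 2: all 8 points done in half the Sprint.
print(observed_velocity(8, 0.5))  # 16.0
# Scenario 3 (illustrative): only Feature Y (5 points) finished by the end.
print(observed_velocity(5, 1.0))  # 5.0
```

Only fully completed stories are counted here, so an unfinished Feature X contributes nothing in scenario three.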

It’s important to note that while it’s the ideal scenario, perfect committing is rare: people are generally optimistic about what they can get done! The trick is that Velocity averages out over time, so you’ll see more stable and consistent delivery when it is averaged over several sprints (3 is a common number). Over time, teams will learn what their true Velocity is, and adapt as new skills are developed.
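Averaging Velocity over recent sprints can be sketched as a rolling window in Python. The class `VelocityTracker` and the Sprint results below are hypothetical; the default window of 3 follows the common number mentioned above, and a plain mean is only one reasonable choice of average.

```python
from collections import deque


class VelocityTracker:
    """Rolling average of completed Story Points over the last `window` Sprints.

    Hypothetical class for illustration only.
    """

    def __init__(self, window: int = 3) -> None:
        # A deque with maxlen discards the oldest Sprint automatically.
        self.history: deque = deque(maxlen=window)

    def record_sprint(self, points_completed: int) -> None:
        self.history.append(points_completed)

    def forecast(self) -> float:
        """Suggested commitment for the next Sprint: the rolling average."""
        if not self.history:
            raise ValueError("no completed Sprints recorded yet")
        return sum(self.history) / len(self.history)


tracker = VelocityTracker()
for completed in (8, 16, 5, 12):  # hypothetical Sprint results
    tracker.record_sprint(completed)

print(list(tracker.history))  # [16, 5, 12]
print(tracker.forecast())     # 11.0
```

Because the first Sprint has dropped out of the window, one unusually good or bad Sprint only sways the forecast for a limited time.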

Conclusion

Hopefully this article has been of some use in shining a light on why Story Points are a measure of complexity, not effort, and why they should be used as such when it comes to improving the software development estimation practices of your teams.

Rob is an experienced and enthusiastic software developer, having worked on all aspects of software development for large custom web based systems, which underpin the businesses that rely on them.

Renowned for his ability to work efficiently and to a very high standard, Rob has technical, problem-solving and analytical skills that are often called upon.

  • Michael Van Geertruy

    I have an issue with your thesis statement: story points are definitely not a measure of complexity. Think about it like this: if you take a user story for licking 1,000 stamps and compare it to minor brain surgery, there is a huge difference in complexity. But the effort to get the work done is about the same. Mike Cohn and Jeff Sutherland both agree that story points are a relative measure of effort, not complexity. Use story points to quickly estimate effort prior to placing a user story onto your sprint backlog. Then, use hours to estimate tasks and measure time left to complete user stories.

    • Robert Giddings

      Hi Michael,

      Thank you for replying to my blog post. You are not the only one to have mentioned to me that story points are not a measure of “complexity”. But I think this boils down to semantics and perhaps I was wrong to specifically use the word “complexity”. I think we will both agree that story points are a measure of “X”, whatever “X” might specifically be. What I wanted to highlight was that story points are not a measure of “effort” in the traditional sense either. Let me explain.

      Suppose I tell you the weight of an object as 10 kilograms. Depending on who you are and where you are located, you might respond “Mind if I have that in pounds and ounces?”. The point being, both are a measure of weight and there exists a conversion between the two.

      Similarly, if we specifically state that story points are a measure of “effort”, then it stands to reason there exists a conversion that can be used to convert story points into the traditional measure of effort: “man days”. And vice versa.
      But to the best of my knowledge, no such conversion exists. And if such a conversion did exist, surely it would completely undermine the point of story points?

      Answering your 1000 stamp licking and brain surgery example, which I assume you have got from here (https://www.mountaingoatsoftware.com/blog/its-effort-not-complexity), I disagree with it as a good example for this reason.

      In the example it is stated that the 1000 stamp licking (performed by a little kid) and the brain surgery (performed by a brain surgeon) “despite their vastly different complexities, the two items should be given the same number of story points – each is expected to take the same amount of time.”

      But at that point, both the little kid and brain surgeon might just as well have dispensed with story points altogether and just stated their estimate in “man days”. But again, does this not completely undermine the point of story points?

      Now assuming both the little kid and the brain surgeon are working in the same team and both the 1000 stamp licking and brain surgery are items in the same backlog, then I would expect both items to have completely different story points.
      They should both agree that the brain surgery is “X” (whatever X is) more than a 1000 stamp licking. The fact that the brain surgeon might well complete the brain surgery in the same amount of time as the little kid completes the 1000 stamp licking is not relevant, because story points are supposed to be applied to stories independently of who will ultimately perform the task. That is the whole reason behind using story points instead of man days.

      Backlog forecasting can be performed on completed story points in order to give an indication of future throughput. This is the team’s velocity. For example if the little kid and brain surgeon deem the 1000 stamp licking to be 2 story points and the brain surgery to be 100 story points, but the brain surgeon completes the brain surgery in the same time it takes the little kid to perform the 1000 stamp licking then the team as a whole is deemed to have a velocity of 51 story points at that moment in time.

      I’m sure neither of us will be the last to comment on the topic of story points and the discussion continues…

      • Another Opinion

        “But at that point, both the little kid and brain surgeon might just as well have dispensed with story points altogether and just stated their estimate in “man days”. But again, does this not completely undermine the point of story points?”

        I disagree with this – it’s not correct to say they ‘might as well’ revert to using man days if story points are used in this way, as there are still significant differences.

        When used in this way (to capture effort, therefore having a time consideration), story points are about one’s perceived effort and using the Fibonacci scale offers a way to abstract this into the realm of perception. At this point we are not holding people to a set number of days and it allows for stories being completed quicker and more slowly than anticipated. Therefore this makes the velocity meaningful.

        Additionally, the velocity is more accurate too. Surely your velocity is skewed and less reliable when purely estimating complexity. For example, say one sprint you have two simple tasks that each take a few days (say they’re labour intensive), and the next sprint you have four more complex tasks that can each be done more quickly than the simple tasks.

        Sprint 1 = 2 x 1 Story Point = 2
        Sprint 2 = 4 x 3 Story Points = 12

        Your average velocity is immediately less accurate than if effort was a factor.

        After all, sprints are defined by time. Therefore the points that fit into it need to consider effort (and therefore time) too.

    • Jeff

      Wrong.

  • Peter

    The problem is that you cannot compare something you have never done with something you have (even approximately) done. Further, if you have already (approximately) done it then why are you doing it again and not simply reusing/refining your previous attempt?

    In the case of building a wall/licking stamps/brain surgery – you have a reasonable degree of certainty what you need to do and how/when you need to do it. In the case of painting a picture you have more uncertainties (light/subject/size/medium/etc) than you do certainties, so how can you accurately estimate how long it will take? Either you compromise quantity/quality or you compromise your estimate.

    Story points must include effort and uncertainty as well as complexity; otherwise developers will just pad/fudge tickets to match the required velocity. Also, measuring one team’s velocity against another’s is counter-productive: it leads to bidding wars. There have been many well-publicized large (cost and time) projects where the project was extended and extended again because the ‘winning bidder’ simply asked for an extension and nobody would admit that the initial guesstimate was just plain wrong; meanwhile the ‘losing bidder’ was probably more accurate and would probably have already delivered.

    I have close to 40 years’ experience developing software for both large corporations and small start-ups.

  • masterBrog

    Thanks for this well written article. I feel it gives a well-rounded summary of the benefits of thinking in relative terms when story pointing. However, I do feel the use of the word ‘complexity’ over-simplifies what a developer should consider when story pointing.

    Bringing it back to a real-world example: I recently worked on a product that needed some simple but very lengthy migration work done on it. If we had story pointed on relative complexity alone, we would have pointed it similarly to fixing a typo, i.e. low complexity. We all knew the migration work was going to take a lot longer to complete. If the ‘relative complexity’ approach had been taken, we would have over-loaded the current sprint and reduced our velocity so much that the next sprint would have been under-loaded.

    What we actually did was discuss the work involved (including several factors) and conclude that it was going to take about the same amount of time (or effort) as another story on the backlog, and then give it the same story points as that existing story. The result was that our velocity for that sprint better represented what the team could actually achieve.

    This is where people kick off because I mentioned the word ‘time’. But we’re only using it as a way of helping us work out the relative size of the stories. We are NOT using time as an estimate with which to set deadlines or figure out how much we can fit in a sprint.

    In the real-world, I believe thinking purely in terms of relative complexity won’t arrive at a useful velocity and will therefore result in inferior planned sprints, which is ultimately what story pointing is attempting to mitigate against.