Observations on Estimation

Teams following a process like Scrum tend to estimate the "size" of stories as an aid to figuring out a commitment for a sprint. My view is that this is a transitional practice, and that the aim should be to learn how to make stories all roughly the same size so that commitments (also a transitional practice) can be determined by counting.

While all of that is going on, teams that want to estimate numerically (rather than with, say, "t-shirt" sizing) tend to choose a scale: a sequence of licit values from which estimates must be drawn. The various planning tools that demand a numerical field be filled in tend to force this issue.

I've noticed a tendency for "expert" level practitioners to want to use some clever non-linear scale, maybe Fibonacci numbers (1, 2, 3, 5, 8, 13), maybe a geometric series (1, 2, 4, 8, 16), and they will have some sophisticated reason why this or that series is preferred. And I've noticed that a lot of teams aren't comfortable with this. They want to use a linear scale.
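Whatever the stated rationale, one concrete difference between these scales is the ratio between neighbouring values. The sketch below is only an illustration of that point; the function names are my own and not from any planning tool.

```python
def fibonacci_scale(n):
    """First n values of the Fibonacci-style estimation scale 1, 2, 3, 5, 8, ..."""
    values, a, b = [], 1, 2
    for _ in range(n):
        values.append(a)
        a, b = b, a + b
    return values


def geometric_scale(n, ratio=2):
    """First n values of a geometric scale: 1, ratio, ratio**2, ..."""
    return [ratio ** i for i in range(n)]


def linear_scale(n):
    """First n values of a linear scale: 1, 2, 3, ..."""
    return list(range(1, n + 1))


def step_ratios(scale):
    """Ratio of each value to the one before it, rounded to two places."""
    return [round(b / a, 2) for a, b in zip(scale, scale[1:])]


if __name__ == "__main__":
    for name, scale in [
        ("fibonacci", fibonacci_scale(6)),
        ("geometric", geometric_scale(5)),
        ("linear", linear_scale(6)),
    ]:
        print(name, scale, step_ratios(scale))
```

On the geometric scale every step is the same relative jump; on the Fibonacci scale the jumps settle at a little over one and a half; on the linear scale they shrink towards one, so neighbouring large values become harder and harder to tell apart in relative terms.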

It seems to be traumatic enough that the estimates don't have units, or even dimensions. The idea that estimates are dimensionless but also structured can be a double cause of confusion.

Anecdote: a team had been estimating and planning and delivering consistently for a good long time. Their velocity was fairly constant, but drifted over time (fair enough). One day it turned out that their velocity happened to be numerically equal to the number of team members times the number of days to the next planning horizon. Someone noticed this and with a huge sigh of relief the team concluded that these mysterious "units" in which they estimated were actually man-days in disguise. Now they finally understood what they were estimating! And they promptly lost the ability to estimate: their next planning session was all over the place and it took some time for their planning activities to converge again. My inference was that it's actually quite important that estimates are dimensionless.

Anecdote: a User Experience expert at a client had been involved in some research whereby (as a side effect) members of the general public had to create a scale that made sense to them within which to rank the usability of features. These folks were presented with different generic objects and asked to give them a "size", and then to give a corresponding "size" to some other generic objects in order to create a scale that made sense to them, which would then be applied to the merit of the system features that were the actual target of the research. They created linear scales.

[After seeing this he added the observation that the point of this process was to avoid what often happens with the "strongly disagree, disagree, no preference..." type of scale, which tends to give either polarised or bland results, neither of which is that useful.]

That surprised me at first, since I know that the physics of our sensory apparatus are generally non-linear, and memory is non-linear and so forth. But thinking about it some more I realised that our experience tends to seem to be linear, even if the underlying phenomena aren't.

Meanwhile, if one did want to use a particular scale for estimating the size of stories, why not use one of the series of preferred values? They are very well established in engineering and product design and offer interesting error-minimising properties. On the other hand, it might be a real struggle to get a team to decide if a story was a 1.6 or a 3.15.
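To make the error-minimising property concrete, here is a rough sketch. It assumes the series in question is the Renard R10 series (1.6 and 3.15 both appear in it), with the values typed in from memory of the standard tables rather than computed. The helper names are hypothetical.

```python
# Assumption: the R10 preferred values for one decade, as commonly tabulated.
# They are conventional roundings of 10 ** (i / 10), not exact powers.
R10 = [1.00, 1.25, 1.60, 2.00, 2.50, 3.15, 4.00, 5.00, 6.30, 8.00]


def exact_steps(n=10):
    """Unrounded geometric steps that an n-step decade series approximates."""
    return [10 ** (i / n) for i in range(n)]


def snap(value, scale):
    """Return the scale value closest to `value`."""
    return min(scale, key=lambda s: abs(s - value))


def worst_relative_error(scale, samples=1000):
    """Largest relative error from snapping, sampled evenly across the scale's range."""
    lo, hi = scale[0], scale[-1]
    worst = 0.0
    for k in range(samples + 1):
        v = lo + (hi - lo) * k / samples
        worst = max(worst, abs(snap(v, scale) - v) / v)
    return worst


if __name__ == "__main__":
    linear = list(range(1, 9))
    print("R10 worst error:   ", round(worst_relative_error(R10), 3))
    print("linear worst error:", round(worst_relative_error(linear), 3))
```

Over the same range of 1 to 8, rounding to the nearest R10 value is never off by much more than about an eighth of the true value, whereas rounding to a linear 1 to 8 scale can be off by about a third at the small end. The price is the awkward-looking values.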

I don't have a grand narrative into which to fit these observations, but here is another related anecdote about estimation.
