One of the most common mistakes people make when looking at data is to jump to conclusions about the data. We all live in a world of cause and effect. It is only natural that when we see data that appears to show cause and effect, we assume that it does. But it often doesn’t. This article shows the difference between cause and effect relationships and correlated data.
A Simple Experiment
Studies have shown that the temperature of soil has an effect on the germination rates of seeds. But does soil temperature affect the rate of growth of plants? You decide to find out, and you pot a bunch of grass in individual containers, stick thermometers in each pot, and measure the grass height and temperature every day. You spread the pots out in your back yard in areas that get different amounts of sunlight. The pots that get the most light are the hottest; the ones that get the least light are the coolest. All plants get the same amount of water. Assume all of your measurements are accurate.
Imagine that you had data that showed the following combinations of soil temperature and plant growth.
- Soil temperature: 70 F. Grass growth: 1 inch
- Soil temperature: 72 F. Grass growth: 2 inches
- Soil temperature: 74 F. Grass growth: 3 inches
- Soil temperature: 76 F. Grass growth: 4 inches
- Soil temperature: 78 F. Grass growth: 5 inches
- Soil temperature: 80 F. Grass growth: 6 inches
Can you conclude that warmer soil temperatures between 70 and 80 degrees cause grass to grow more?
That is the common mistake that many people make when looking at data.
You could just as easily conclude that faster growing grass causes warmer soil temperatures, and you would be just as correct. Or rather, incorrect.
Correlation
Correlation between two sets of data means that they have related trends. The data in our example above are correlated. Rising temperatures coincide with increasing growth rates. Increasing growth rates coincide with warmer soil temperatures.
When you can show that two factors move together, but cannot say that one caused the other, what you have is a correlation.
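To make this concrete, here is a minimal sketch (assuming Python 3.10+, standard library only) that computes the correlation coefficient for the temperature and growth numbers listed above. The coefficient only tells you that the two series move together; it says nothing about which one, if either, is the cause.

```python
# Minimal sketch: Pearson correlation for the soil-temperature / grass-growth
# data listed above (Python 3.10+ standard library).
from statistics import correlation

soil_temp_f = [70, 72, 74, 76, 78, 80]  # soil temperature, degrees F
growth_in = [1, 2, 3, 4, 5, 6]          # grass growth, inches

r = correlation(soil_temp_f, growth_in)
print(f"Pearson r = {r:.2f}")  # 1.00 -- perfectly correlated, but silent on causation
```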
Consider another experiment – you plant the same grass in the same pots. This time, you install heaters under each pot, and you control the temperature of all of the pots. You set the pots to each of the different temperatures you observed in the previous experiment. Then you put all of the pots in a dark closet with no light.
None of the grass grows.
Causation
The grass didn’t grow in the closet because the warmth isn’t what caused the grass to grow.
In the first experiment, the pots were placed in areas of the yard that got varying amounts of sunlight. The sunlight caused the grass to grow, and it caused the soil to get warmer. If you had measured the amount of sunlight, as well as the soil temperature and the grass growth, you would have seen three values that were correlated.
To confirm that sunlight was the causal, or independent, variable, you can inspect the pots that were warmed in the dark. You could also run an experiment that combined the variation in sunlight with constant soil temperatures – isolating the sunlight variable. You might even have to do that experiment to rule out the possibility that fresh air contributes to grass growth (because fresh air is present in the outside experiment, but not when the pots are put in the closet).
Jumping to Conclusions
One of the goals in writing software requirements is to make them measurable. Some requirements are easily measurable – the query returns results in 10 seconds, for example. You write a query, test it, and look at the results. If the code is too slow, you rewrite and test again, until you meet the requirement as defined.
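As a hedged illustration of that kind of check (the 10-second threshold comes from the example above; run_query is a placeholder, not a real API), a sketch like this is all the measurement takes:

```python
# Hypothetical sketch: checking a "query returns results in 10 seconds" requirement.
# run_query() is a stand-in for whatever actually executes the query.
import time

MAX_SECONDS = 10.0  # the requirement as defined

def run_query():
    time.sleep(0.5)  # placeholder for real query execution
    return ["row 1", "row 2"]

start = time.perf_counter()
results = run_query()
elapsed = time.perf_counter() - start

print(f"Returned {len(results)} rows in {elapsed:.2f}s")
print("PASS" if elapsed <= MAX_SECONDS else "FAIL: rewrite and test again")
```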
Other requirements, while easily measurable, may be hard to control. Consider a usability requirement like “A novice user will be able to complete the task in 2 minutes.” This is easily measurable – but if you run the test, and the novice user takes 3 minutes – what do you change to reduce the amount of time? Do you add interactive help? This would certainly make it easier for the user once she becomes competent – but she will have to spend time reading the instructions when she is still a novice.
[As an aside – measuring the impact of usability investments is very hard – and one of the reasons is the challenge of separating causality and correlation.]
We run the risk of jumping to conclusions when defining these metrics. Imagine that you are doing an analysis of a shopping cart, as part of designing a new one. You find that 80% of the users who add an item to the shopping cart (from a product page) never purchase the product. You also find that there are 8 steps required to complete a purchase, starting with adding the item to the cart (and then viewing the cart, confirming your order, etc…). You may notice that almost all of these users stay on the site until the 7th step – where only a few ultimately purchase.
You could write a requirement that says that purchasing a product requires no more than 6 steps. If you did, you would be jumping to the conclusion that the number of steps causes the users to abandon the purchase. You would also be specifying design (which is bad). It may be that step 7 requires them to enter their social security number – and that this is the deal-breaker. If your developers implement a solution that only has 4 steps, but keep the social security number requirement, do you expect to get a massive improvement in your funnel?
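To see where such a funnel actually leaks, here is a hypothetical sketch. The step names and user counts are invented for illustration; only the overall shape matches the example above (roughly 80% of users abandon, with the big drop at the social security number step rather than spread evenly across all eight steps).

```python
# Hypothetical sketch: per-step drop-off in an 8-step checkout funnel.
# The counts are made up to match the pattern described above: most users
# reach step 7, but few complete the purchase.
funnel = [
    ("1. add item to cart",   1000),
    ("2. view cart",           950),
    ("3. sign in",             900),
    ("4. shipping address",    870),
    ("5. shipping method",     850),
    ("6. payment details",     830),
    ("7. enter SSN",           820),
    ("8. confirm purchase",    200),
]

for (step, users), (_, next_users) in zip(funnel, funnel[1:]):
    drop = 100 * (1 - next_users / users)
    print(f"{step:<22} {drop:5.1f}% drop off before the next step")

print(f"Overall conversion: {100 * funnel[-1][1] / funnel[0][1]:.0f}%")  # ~20%
```

Cutting the number of steps without touching the step where the drop actually happens would not change this picture much.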
Writing good measurable requirements is hard, not because measuring is hard (although sometimes it is), but because correlation is often mistaken for causality – and we therefore choose bad things to measure. Choosing what to measure is the hard part.
Good points about the trickiness of measuring usability.
It’s true that, in some cases, we’re not so much interested in reducing time or effort for users as we are in keeping them from being frustrated or from abandoning the session or product. Depending on user psychology, it’s even conceivable that adding steps to a user task could decrease frustration.
If we’re willing to commit to solving the larger problems, we can instead specify “frustration level” and “abandonment rate” metrics.
A frustration level attribute might constrain the frustration measured by a certain neural indicator (I call this “neurorequirements”). In practice, we might indirectly test the constraint by polling users about their frustration level.
An abandonment rate attribute is relatively straightforward to measure if you set up a test with representative users and scenarios.
Either way, I like the emphasis on first understanding – and thinking about how to measure – the problems we’re really trying to solve.
Thanks Roger!
I chuckled that you’ve defined a “hyphenated requirement.” Not disagreeing – just grinning.
Yeah, this usability measurement stuff is hard. I don’t know that we understand enough to generalize things yet – we have to anecdotally tackle the problems situation by situation.
Frustration is a tricky one. Hadn’t really thought about it before. Polling may be the only way. But as you say, it’s all about first principles – why do we care about frustration levels?
Well, at the root we want customers to buy our products. So we want to foster repeat business and positive word of mouth. But I don’t think measuring those things lies in the realm of requirements.
I generally recommend stopping once you hit a basic need (a la Maslow’s hierarchy).