There are all kinds of statistics that get thrown around, some of which appear rather dubious. As a result, there seems to be a lot of suspicion about statistics in general. But is that warranted?
Let’s start with what statistics are. According to Wikipedia, statistics is “the discipline that concerns the collection, organization, displaying, analysis, interpretation and presentation of data.” Descriptive statistics summarize and describe the data that were actually collected, while inferential statistics use a sample to draw conclusions about a larger population, taking random variation into account.
Performing statistical operations isn’t inherently subjective, and steps are deliberately taken to reduce the risk of bias. However, the people doing the analysis can make errors or poor choices. Also, once statistical results are released into the world, they’re often interpreted without taking the whole picture into consideration.
Let’s say I wanted to do a political poll. I find some people to ask, collect my data, crunch my numbers, and find that Peanut the guinea pig is the favoured candidate to rule the world by 85% of respondents.
Go Peanut! What more could anyone possibly need to know? Well, a fair bit, actually, and that 85% figure doesn’t mean much without it. How many people were polled? This may be written as N=100 if there were 100 people polled. How did I find those people, and do they represent a broad cross-section of the population of interest? What questions were asked and what were the possible options to choose from? What if the only options were Peanut or some dude who’s already dead?
You’ll also need some other details to accompany that 85% figure, which will look something like this line from Ipsos: “the results of the poll are considered accurate to within +/- 3.5 percentage points, 19 times out of 20.” Huh?
Well, applying that to the Peanut poll, it means you can say with 95% confidence (aka 19 times out of 20; determining this involves statistical calculations) that the share of people who are pro-Peanut for leader of the world is 85%, give or take a margin of error of 3.5 percentage points on either side to account for random chance in sampling. In other words, the true figure is likely somewhere between 81.5% and 88.5%.
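For anyone curious about the “statistical calculations” part, here is a minimal sketch of how a margin of error for a poll percentage is commonly approximated. It assumes a simple random sample and the standard normal approximation, with z = 1.96 for a 95% confidence level; the function name and the sample sizes are made up for illustration, and real pollsters such as Ipsos may use different methods.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample and the normal approximation.
    p is the observed proportion (0 to 1), n is the number of respondents,
    and z = 1.96 corresponds to a 95% confidence level.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical Peanut poll: 85% support, with different numbers of respondents.
for n in (100, 400, 1000):
    moe = margin_of_error(0.85, n)
    print(f"N={n}: 85% +/- {moe * 100:.1f} percentage points")
```

Under these assumptions, a poll with N=100 would have a margin of error of about 7 percentage points, and it would take roughly 400 respondents to get down to 3.5. That’s one reason the number of people polled matters so much.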
Research data is often interpreted using statistics too. In clinical trials, a 95% confidence level is the usual choice, just as in public opinion polls, although a stricter level such as 99% is sometimes used. A “confidence interval” would be determined for each treatment arm based on the margin of error and the chosen confidence level.
To use random unrealistic numbers, let’s say 50% of people responded to drug A, with a confidence interval ranging from 40-60%. Let’s say 31% of people responded to placebo, with the confidence interval ranging from 21-41%. Even though 50% sounds much better than 31%, the two intervals overlap, so these numbers on their own don’t show a clearly “significant” difference between drug A and placebo; a formal test comparing the two groups would be needed to settle it. Significance in a statistical sense means the difference is larger than what would be expected to happen by chance.
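Here is a rough sketch of that kind of comparison in code. The group sizes (96 people per arm) are invented purely so the numbers land near the example above, the interval uses the simple normal approximation at a 95% level (pass a different z for another level), and the function names are hypothetical. Checking whether two intervals overlap is only a quick screen, not a substitute for a formal test comparing the two groups.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Approximate confidence interval for a proportion (normal approximation).

    z = 1.96 corresponds to a 95% confidence level.
    Returns (low, high) as proportions clipped to the range 0 to 1.
    """
    p = successes / n
    moe = z * math.sqrt(p * (1 - p) / n)
    return (max(0.0, p - moe), min(1.0, p + moe))

def intervals_overlap(a, b):
    """Return True if intervals a and b, each (low, high), share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical trial arms: responders out of 96 participants each.
drug_a = proportion_ci(48, 96)   # 50% responded
placebo = proportion_ci(30, 96)  # about 31% responded

print(f"Drug A:  {drug_a[0]:.1%} to {drug_a[1]:.1%}")
print(f"Placebo: {placebo[0]:.1%} to {placebo[1]:.1%}")
print("Intervals overlap:", intervals_overlap(drug_a, placebo))
```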
How to interpret statistics?
Statistics can be very useful, but they’re only as good as the data they’re based on, and they’re only generalizable if a convincing reason is given as to why they should be.
So, what can you watch out for? If you’re just given a figure without any context as to how it was arrived at, the number alone isn’t going to mean a whole heck of a lot. Statistical calculations should result in a better understanding of the phenomenon in question; when used properly, the purpose should not be to obfuscate.
Also, pay attention to the source. The sketchier the source is, the more likely they haven’t adhered to proper statistical principles.
In the end, statistics aren’t inherently good or bad; they’re just a way of managing numbers. The more you understand about how they work, the more you’ll be aware of what kinds of questions to ask about any stats you’re presented with.
And as for Peanut, watch out, because he’ll continue his campaigning to be the next ruler of the world.