There are all kinds of statistics that get thrown around, some of which appear rather dubious. As a result, there seems to be a lot of suspicion about statistics in general. But is that warranted?
Let’s start with what statistics are. According to Wikipedia, statistics is “the discipline that concerns the collection, organization, displaying, analysis, interpretation and presentation of data.” Descriptive statistics summarize and describe the features of a set of data, while inferential statistics use a sample to draw conclusions about a larger population, taking random variation into account.
Performing statistical operations is not inherently subjective, and steps are deliberately taken to reduce the risk of bias. However, the people performing the statistics can make errors or poor choices. Also, once statistical results are released out into the world, they’re often interpreted without taking the whole picture into consideration.
Let’s say I wanted to do a political poll. I find some people to ask, collect my data, crunch my numbers, and find that Peanut the guinea pig is the favoured candidate to rule the world for 85% of respondents.
Go Peanut! What more could anyone possibly need to know? Well, a fair bit, actually, and that 85% figure doesn’t mean much without it. You’ll want to know how many people were polled, which may be written as N=100 if there were 100 people polled. You’ll want to know how I found those people, and whether they represent a broad cross-section of the population of interest. You’ll also want to know what question was asked and what the possible answers were. What if the only options were Peanut or some dude who’s already dead?
You’ll also need some other details to accompany that 85% figure, which will look something like this line from Ipsos: “the results of the poll are considered accurate to within +/- 3.5 percentage points, 19 times out of 20.” Huh?
Well, applying that to the Peanut poll, it means you can say with 95% confidence (aka 19 times out of 20; determining this involves statistical calculations) that the true level of pro-Peanut support is 85%, give or take a margin of error of 3.5 percentage points on either side to account for random chance in sampling. In other words, somewhere between 81.5% and 88.5% of people are pro-Peanut for leader of the world.
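For the curious, here’s a rough sketch of the kind of calculation that sits behind a margin of error. It assumes a simple random sample and uses the standard normal approximation for a proportion; real pollsters typically add weighting and other adjustments on top of this. The function names and the sample sizes plugged in are mine, for illustration only.

```python
import math

# z-scores (standard normal quantiles) for common confidence levels
Z_SCORES = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}


def margin_of_error(p, n, confidence=0.95):
    """Approximate margin of error for a sample proportion p from n respondents.

    Uses the normal approximation and assumes a simple random sample.
    """
    z = Z_SCORES[confidence]
    return z * math.sqrt(p * (1 - p) / n)


def confidence_interval(p, n, confidence=0.95):
    """Return (low, high) bounds around p, clipped to the range 0 to 1."""
    moe = margin_of_error(p, n, confidence)
    return max(0.0, p - moe), min(1.0, p + moe)


# Peanut poll: 85% support among a hypothetical 100 respondents
moe = margin_of_error(0.85, 100)
low, high = confidence_interval(0.85, 100)
print(f"Margin of error: +/- {moe * 100:.1f} percentage points")
print(f"95% confidence interval: {low * 100:.1f}% to {high * 100:.1f}%")
```

With only 100 respondents, this works out to a margin of roughly 7 percentage points rather than 3.5. Under the same assumptions, getting down to 3.5 points takes closer to 800 respondents, which is one reason the number of people polled matters so much.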
Statistics are often used in research studies to interpret the data. In clinical trials, a stricter confidence level such as 99% may be used rather than the 95% level often used for public opinion polls. A “confidence interval” would be determined for each treatment arm based on the margin of error and the chosen confidence level.
To use random unrealistic numbers, let’s say 50% of people responded to drug A, with a confidence interval ranging from 40-60%. Let’s say 31% of people responded to placebo, with the confidence interval ranging from 21-41%. Even though 50% sounds much better than 31%, the two intervals overlap, so on this rough reading the statistics don’t clearly show a “significant” difference between drug A and placebo. (A formal comparison would test the difference between the two groups directly.) Significance in a statistical sense means the differences are larger than what would be expected to happen by chance.
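Here’s a small sketch of that comparison: build a confidence interval for each treatment arm and check whether the two overlap. The 150 people per arm is a made-up number (the example above doesn’t give a sample size), the 99% level is an assumption, and the intervals use the same normal approximation that applies to a simple poll, so the output will only roughly match the figures in the text.

```python
import math

Z_99 = 2.576  # standard normal quantile for a 99% confidence level (assumed)


def proportion_interval(p, n, z=Z_99):
    """Normal-approximation confidence interval for a sample proportion."""
    moe = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - moe), min(1.0, p + moe)


def intervals_overlap(a, b):
    """True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


N_PER_ARM = 150  # hypothetical; not taken from any real trial

drug_a = proportion_interval(0.50, N_PER_ARM)
placebo = proportion_interval(0.31, N_PER_ARM)

print(f"Drug A:  {drug_a[0]:.1%} to {drug_a[1]:.1%}")
print(f"Placebo: {placebo[0]:.1%} to {placebo[1]:.1%}")
print("Intervals overlap:", intervals_overlap(drug_a, placebo))
```

Checking for overlap is a quick, conservative screen rather than a proper significance test. A formal test looks at the difference between the two groups directly and can reach a different conclusion, so treat this as an illustration of the idea and not a recipe for analyzing a trial.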
Statistics can be very useful, but they’re only as good as the data they’re based on, and they’re only generalizable if a convincing reason is given as to why they should be.
So, what can you watch out for? If you’re just given a figure without any context as to how it was arrived at, the number alone isn’t going to mean a whole heck of a lot. Statistical calculations should result in a better understanding of the phenomenon in question; when used properly, the purpose should not be to obfuscate.
Also, pay attention to the source. The sketchier the source is, the more likely they haven’t adhered to proper statistical principles.
In the end, statistics aren’t inherently good or bad; they’re just a way of managing numbers. The more you understand about how they work, the more you’ll be aware of what kinds of questions to ask about any stats you’re presented with.
And as for Peanut, watch out, because he’ll continue his campaigning to be the next ruler of the world.
There’s more about separating science from BS on the science corner page.