What the heck is Benford’s law and why should anyone care? I was oblivious until I recently saw an episode of the Netflix series Connected on the topic, but it turns out that the distribution of numbers in our world is a lot less random than you might think.
Physicist Frank Benford was actually the second one at the party on this idea (after astronomer Simon Newcomb), but he got the name. Benford’s law describes the frequency at which the digits 1-9 would appear as leading digits in sets of numerical data. Yawn, right? Except not so much.
Intuition would say each digit from 1-9 is equally likely to show up as the leading digit, which would work out to about 11% of the time for each one. That wouldn’t be noteworthy, and Benford would have faded into oblivion.
But the world isn’t boring, it’s weird, and there isn’t an even distribution of probabilities. It turns out that lower leading digits are a lot more common than higher digits.
The heading of this graph happens to be in Polish, just because that’s what Wikipedia had, but it shows the frequency distribution for each digit. The number one is by far the most common leading digit, occurring in about 30% of the numbers in a data set, and nine is the least common, only occurring in under 5%.
So, if you had 100 numbers from a data set of a type that fits Benford’s law, about 30 of them would start with a 1, about 18 would start with a 2, and so on, with only 4-5 starting with a 9.
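For anyone who wants to see where those percentages come from, the standard statement of Benford’s law is that the leading digit d turns up with probability log10(1 + 1/d). The post itself doesn’t spell out the formula, so treat this as a small sketch; the function name `benford_expected` is just made up for illustration.

```python
import math

def benford_expected(digit: int) -> float:
    """Expected share of numbers whose leading digit is `digit` (1-9)."""
    return math.log10(1 + 1 / digit)

# Print the expected share for each possible leading digit.
for d in range(1, 10):
    print(f"{d}: {benford_expected(d):.1%}")
```

The nine shares add up to 100%, with 1 at the top of the list and 9 at the bottom.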
That’s the background, but where it gets interesting is that this pattern shows up in many different types of data sets that refer to both man-made and natural phenomena. Wikipedia gives as examples electricity bills, street addresses, stock prices, house prices, population figures, death rates, geographical areas of countries, and natural features like lengths and drainage rates of rivers. All Benford.
The heights of the tallest buildings in the world? Benford, no matter what the units of measurement are. It also isn’t just a quirk of base 10: the pattern shows up in other systems of counting too, although the exact percentages depend on the base.
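The units point is easy to poke at with a toy example. The sketch below does not use real building heights; it assumes a made-up list of "heights" (a geometric sequence, which is a type of data known to fit Benford well) and converts it from metres to feet. The names `leading_digit` and `digit_shares` are hypothetical helpers, not from any library.

```python
import math
from collections import Counter

def leading_digit(value) -> int:
    """First non-zero digit in the number's printed form."""
    for ch in str(value):
        if ch in "123456789":
            return int(ch)
    raise ValueError("no non-zero digit found")

def digit_shares(values):
    """Share of values starting with each digit 1-9."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}

# ASSUMPTION: synthetic stand-in data, not actual building heights.
heights_m = [1.1 ** n for n in range(500)]
heights_ft = [h * 3.28084 for h in heights_m]  # same data in feet

shares_m = digit_shares(heights_m)
shares_ft = digit_shares(heights_ft)

# Compare both unit systems against the Benford expectation.
for d in range(1, 10):
    expected = math.log10(1 + 1 / d)
    print(f"{d}: metres {shares_m[d]:.1%}  feet {shares_ft[d]:.1%}  Benford {expected:.1%}")
```

Both columns should land close to the Benford percentages, even though every individual number changed when the units did.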
Cancer incidence rates? Benford.
Friend counts on social platforms? Benford… except bots don’t follow natural patterns and thus don’t tend to conform to Benford’s law.
And that’s where Benford’s law can come in handy. Numbers that have been fudged tend to be more uniform (what we might instinctively think of as random) than natural data. That means Benford’s law can help with detecting things like:
- accounting fraud and fraudulent tax returns
- fraudulent government macroeconomic data
- fraudulent scientific results
- election results that have been tampered with
- images that have been manipulated (the data sets come from measures of properties like light intensities)
- fudged campaign finance figures
- falsely reported toxic gas emissions
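To give a rough feel for how a check like this might work, here is a minimal sketch that counts leading digits and compares them with the Benford expectation using a chi-square statistic. That particular test is an assumption chosen for illustration, not something the sources above prescribe, and both data sets are made up: powers of 2 stand in for "natural" numbers, and the evenly spread numbers 100-999 stand in for fudged ones. The helper names are hypothetical.

```python
import math
from collections import Counter

def leading_digit(value) -> int:
    """First non-zero digit in the number's printed form."""
    for ch in str(value):
        if ch in "123456789":
            return int(ch)
    raise ValueError("no non-zero digit found")

def benford_chi_square(values) -> float:
    """Chi-square distance between observed leading digits and Benford."""
    counts = Counter(leading_digit(v) for v in values)
    n = sum(counts.values())
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts[d] - expected) ** 2 / expected
    return stat

# Critical value for 8 degrees of freedom at the 5% level.
THRESHOLD = 15.51

natural_like = [2 ** n for n in range(200)]  # ASSUMPTION: toy "natural" data
too_uniform = list(range(100, 1000))         # ASSUMPTION: toy "fudged" data

for name, data in [("natural-like", natural_like), ("too uniform", too_uniform)]:
    stat = benford_chi_square(data)
    verdict = "looks Benford-ish" if stat < THRESHOLD else "worth a closer look"
    print(f"{name}: chi-square = {stat:.1f} -> {verdict}")
```

A high score is only a flag that something deserves a second look, not proof of fraud, since plenty of honest data sets don’t fit Benford’s law in the first place.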
This makes my inner geek so, so excited. It’s weird, and I like weird!
There’s an explanation given in this article in The American Statistician for why Benford’s law works the way it does. I’m not mathematically oriented enough to follow it, but I’m okay with that.
In a way, it’s probably more interesting not to get bogged down in statistics, and just contemplate that the world is full of patterns that we don’t really understand, or have only scratched the surface of. And I like being a geek [insert contented sigh here].