Benford’s Law and the Weird Ways Our World Works


What the heck is Benford’s law and why should anyone care? I was oblivious until I recently saw an episode of the Netflix series Connected about it, but it turns out that the distribution of numbers in our world is a lot less random than you might think.

Physicist Frank Benford was actually the second person to notice this pattern (astronomer Simon Newcomb got there first), but his name is the one that stuck. Benford’s law describes how often each of the digits 1-9 appears as the leading (first) digit in sets of numerical data. Yawn, right? Except not so much.

Uneven distribution of probabilities

If leading digits were spread evenly, each digit from 1-9 would be equally likely to show up, about 11% (1/9) of the time. That wouldn’t be noteworthy, and Benford would have faded into oblivion.

But the world isn’t boring; it’s weird, and the probabilities aren’t evenly distributed. It turns out that lower leading digits are a lot more common than higher ones.

Graph of the frequency distribution of leading digits 1-9, headed “Rozkład Benforda” (Polish for “Benford distribution”)
Gknor, Public domain, via Wikimedia Commons

The graph’s heading happens to be in Polish simply because that’s the version Wikipedia had, but it shows how often each digit appears as the leading digit. The digit 1 is by far the most common, leading about 30% of the numbers in a data set, while 9 is the least common, leading fewer than 5%.

So if you took 100 numbers from a data set that follows Benford’s law, about 30 of them would start with a 1, about 18 would start with a 2, and so on, with only 4 or 5 starting with a 9.
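Those percentages come from a simple formula: the probability that a number’s leading digit is d equals log10(1 + 1/d). As a rough sketch (the function name `benford_probability` is just an illustrative choice), here it is in Python, turned into expected counts for a sample of 100 numbers:

```python
import math


def benford_probability(d: int) -> float:
    """Probability that the leading digit is d, according to Benford's law."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


sample_size = 100
for d in range(1, 10):
    p = benford_probability(d)
    # Share of numbers starting with d, and how many of 100 that would be.
    print(f"{d}: {p:.1%}  (~{p * sample_size:.1f} of {sample_size})")
```

Running it prints 30.1% for the digit 1 down to 4.6% for the digit 9. The nine probabilities add up to exactly 1 because the terms telescope: log10(2/1) + log10(3/2) + … + log10(10/9) = log10(10) = 1.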

Where Benford shows up

That’s the background, but where it gets interesting is that this pattern shows up in many different kinds of data sets, describing both man-made and natural phenomena. Wikipedia gives examples like electricity bills, street addresses, stock prices, house prices, population figures, death rates, geographical areas of countries, and natural features like the lengths and drainage rates of rivers. All Benford.

The heights of the tallest buildings in the world? Benford, no matter what units they’re measured in. The law also holds, in a generalized form, whether you count in base 10 or in a different base.
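“No matter what units” is a property called scale invariance: multiplying every value by the same constant (say, converting metres to feet) leaves the leading-digit pattern intact. I don’t have real building heights to hand, so this sketch assumes a stand-in data set instead: the powers of 2, a sequence known to follow Benford’s law, read as hypothetical lengths in metres. It compares the leading-digit shares before and after conversion to feet:

```python
import math
from collections import Counter
from decimal import Decimal


def leading_digit(x) -> int:
    """First significant digit of a nonzero number.

    Decimal(x) converts an int or float exactly, so there is no rounding
    error near powers of 10, and its digit tuple never starts with a zero.
    """
    return Decimal(x).as_tuple().digits[0]


def digit_shares(values) -> dict[int, float]:
    """Fraction of the values whose leading digit is 1, 2, ..., 9."""
    counts = Counter(leading_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}


# Hypothetical stand-in data: 2**1 ... 2**1000, read as lengths in metres.
metres = [2 ** n for n in range(1, 1001)]
feet = [m * 3.28084 for m in metres]  # 1 metre is about 3.28084 feet

benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
metre_shares = digit_shares(metres)
feet_shares = digit_shares(feet)

for d in range(1, 10):
    print(f"{d}: Benford {benford[d]:.3f} | metres {metre_shares[d]:.3f} | feet {feet_shares[d]:.3f}")
```

Both columns should track the Benford column closely, even though every single number changed when the units did.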

Cancer incidence rates? Benford.

Friend counts on social platforms? Benford… except that bots don’t follow natural patterns, so their numbers don’t tend to conform to Benford’s law.

Uses of Benford’s law

And that’s where Benford’s law comes in handy. Numbers that have been fudged tend to have their leading digits spread more evenly (what we might instinctively think of as random) than natural data do. That means Benford’s law can help detect things like:

  • accounting fraud and fraudulent tax returns
  • fraudulent government macroeconomic data
  • fraudulent scientific results
  • election results that have been tampered with
  • images that have been manipulated (the data sets come from measures of properties like light intensities)
  • fudged campaign finance figures
  • falsely reported toxic gas emissions
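One common way to put this into practice is to compare the observed leading-digit counts against Benford’s expected counts using Pearson’s chi-square statistic. The Python below is a simplified sketch, not a forensic tool: the function names are hypothetical, the cutoff of 15.51 is the standard chi-square critical value for 8 degrees of freedom (9 digits minus 1) at the 5% significance level, and both data sets are made-up stand-ins (powers of 2 for “natural” data, the evenly spread numbers 100 to 999 for fudged data):

```python
import math
from collections import Counter
from decimal import Decimal

# Chi-square critical value for 8 degrees of freedom at alpha = 0.05.
CRITICAL_VALUE = 15.51


def leading_digit(x) -> int:
    """First significant digit of a nonzero number (exact via Decimal)."""
    return Decimal(x).as_tuple().digits[0]


def benford_chi_square(values) -> float:
    """Pearson chi-square statistic: observed leading digits vs Benford's law."""
    counts = Counter(leading_digit(v) for v in values if v != 0)
    n = sum(counts.values())
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts[d] - expected) ** 2 / expected
    return stat


def looks_suspicious(values) -> bool:
    """Flag a data set whose leading digits stray too far from Benford."""
    return benford_chi_square(values) > CRITICAL_VALUE


natural = [2 ** n for n in range(1, 1001)]  # stand-in for naturally occurring data
fudged = list(range(100, 1000))             # every leading digit equally common

print(f"natural: chi2 = {benford_chi_square(natural):.1f}, suspicious = {looks_suspicious(natural)}")
print(f"fudged:  chi2 = {benford_chi_square(fudged):.1f}, suspicious = {looks_suspicious(fudged)}")
```

The evenly spread numbers land far above the cutoff, while the powers of 2 stay well below it. With real data, a high score only flags numbers for a closer look: not every kind of data follows Benford’s law (values confined to a narrow range usually don’t), so a “suspicious” result isn’t proof of fraud on its own.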

This makes my inner geek so, so excited. It’s weird, and I like weird!

How does it work?

There’s an explanation given in this article in The American Statistician for why Benford’s law works the way it does. I’m not mathematically oriented enough to follow it, but I’m okay with that.

In a way, it’s probably more interesting not to get bogged down in statistics, and just contemplate that the world is full of patterns that we don’t really understand, or have only scratched the surface of. And I like being a geek [insert contented sigh here].

The science corner: pseudoscience, public health, and media literacy

Writing about science and debunking pseudoscience makes my heart sing! Visit the How to Spot Pseudoscience page to explore other Science Corner posts on Mental Health @ Home.
