A Bayesian Detour 4

The dressed ox weighed in at 1198 pounds.

The mean guess of the 800 or so participants was 1197 pounds.

A one pound difference. Essentially a perfect guess.

This is not a coincidence. It cannot be a coincidence. This sort of experiment has been repeated many times. A classroom teacher could repeat it with a jar of M&Ms. It is simply a fact that the group somehow had access to information in a pure and unfiltered way, despite the fact that essentially every individual in that group did not. The group had access to knowledge that no individual in the group had access to.

This is absolutely remarkable.

The "wisdom of crowds" is amazing. Obviously, it is not a perfect result, robust to all possible questions or aggregation systems. Some types of issues get a biased response. Some methods of aggregation also inherently skew the result. But the fact that it can even work at all is what I want to focus on here. This actually works. Add together all of the random guesses from all of the people, and sometimes -- not always but sometimes -- you get the best possible answer that is available to us.

What the fuck?

Seriously.

What. The. Fuck.

There's a well-written book on this: James Surowiecki's The Wisdom of Crowds.

The Miracle of Aggregation can occur with basic counting: the weight of an ox, or jelly beans in a jar in the classroom. It can also work on Who Wants to Be a Millionaire?, where the audience's plurality answer -- which was right about 90% of the time -- can give us the beginning of a story to explain it.

There are four answers. An audience gives their guesses. Let's say 96% of the audience has not the slightest clue -- Knightian Uncertainty! -- what the actual answer is. If they just "randomly" throw out a guess, then the purely ignorant guesses could show up distributed roughly equally among the four possibilities, around 24% per box, if there's no bias-introducing factor. (This might be a problem if the button to push A is bigger and shinier than the others, or if 99% of the crowd is named Barry, and the Barrys really love their name. But we can ignore that for now.)

The final 4% -- the informed audience members -- are not going to mash a random button. Their knowledge will coordinate their answer. And this will bump the correct answer into the plurality answer. As long as the "errors are unbiased", and you have a large enough crowd, the knowledgeable can be a very small percentage of the whole and yet still decide the entire issue. Again, this is weird, but now we have the beginnings of an explanation for it. With the right assumption about the distribution of the errors, the tiniest bit of signal will cut right through the noise.
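Here's a quick way to poke at that story. This is a minimal sketch in Python, not anything the show or the book actually ran. It assumes the informed 4% all pick the right answer, everyone else picks uniformly at random among the four options, and the crowd "succeeds" only when the right answer strictly beats every other option. The function names (`tally_votes`, `success_rate`) and the crowd sizes are made up for illustration.

```python
import random
from collections import Counter

OPTIONS = "ABCD"
CORRECT = "A"  # arbitrary label; the options are symmetric under these assumptions


def tally_votes(crowd_size, informed_share, rng):
    """Count votes: the informed pick CORRECT, the rest guess uniformly (assumed)."""
    informed = round(crowd_size * informed_share)
    votes = Counter({option: 0 for option in OPTIONS})
    votes[CORRECT] += informed
    for _ in range(crowd_size - informed):
        votes[rng.choice(OPTIONS)] += 1
    return votes


def success_rate(crowd_size, informed_share=0.04, trials=200, seed=0):
    """Share of simulated crowds whose strict plurality is the correct answer."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        votes = tally_votes(crowd_size, informed_share, rng)
        best_wrong = max(n for option, n in votes.items() if option != CORRECT)
        if votes[CORRECT] > best_wrong:  # ties count as failures
            wins += 1
    return wins / trials


if __name__ == "__main__":
    # Hypothetical crowd sizes, chosen only to show the trend.
    for size in (100, 1_000, 10_000):
        print(size, success_rate(size))
```

Under these assumptions the expected split is 28% for the right answer (the 4% informed plus a quarter of the remaining 96%) against 24% for each wrong one. A small crowd's noise can easily swamp that four-point edge. A big crowd's can't.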

Now we need to focus on the nature of that assumption: unbiased errors. That's a weird one.

But that was a discrete case. Now I want to go back to a continuous case, where we have an infinite range of answers. We can go back to that dead ox.

This is not a multiple choice test. The numbers for the weight guessing competition were left open. Fill in the blank. Although a negative weight is not a logically feasible possibility, a joker could still have written in such a number. The entry fee would have made such a joke slightly costly, but it could still be done.

How do we tell a story of knowledge and ignorance in the case when the "uninformed" could have written down literally anything?

This is a deep fucking question.

It's the same WTF question about what happened at the county fair but this time we're going to use an ever so slight bit of math. We're going to try to understand the distribution of answers that would lead to a mean that is correct.

Essentially every single person in that crowd was ignorant. They got the answer wrong. They were further away than the group average. But as it happens, essentially every single person who got the answer wrong by guessing high was balanced by a person who got it wrong by guessing low. And not just balanced, but balanced almost perfectly.

In the discrete case, we could posit perfect ignorance and just throw up a uniform distribution for those people. But we can't do that any longer. We can't throw up a uniform distribution over the real number line. No such distribution exists, since a flat density stretched over an infinite line can't add up to one. And even if it did exist, it wouldn't describe what happened. [b]The guesses weren't totally random.[/b]

They were centered on the truth.

They could have written literally anything. A billion pounds? A trillion? Ten times the mass of the known universe? Well, why not? There were no limits to the answer. People could have written down 10^10000000 as the weight, but they didn't. We can't allow that perfect Knightian Uncertainty, where people have literally no idea. We can't allow that, because that is not how human knowledge works. They were centered on the truth.
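To see how much work "centered on the truth" is doing, here's a rough sketch in Python. Everything in it is an assumption for illustration: guesses are drawn from a normal distribution around the true weight, the 75-pound spread and the round 1200-pound ox are invented numbers, and `bias` is a hypothetical knob that shifts everyone's guess the same way. It is not a reconstruction of the actual contest entries.

```python
import random
import statistics


def simulate_guesses(true_weight, spread, crowd_size, bias=0.0, seed=0):
    """Draw guesses whose errors are normal with mean `bias` and std `spread` (assumed)."""
    rng = random.Random(seed)
    return [rng.gauss(true_weight + bias, spread) for _ in range(crowd_size)]


def crowd_report(guesses, true_weight):
    """Return (error of the mean guess, share of individuals farther off than the mean)."""
    crowd_error = abs(statistics.fmean(guesses) - true_weight)
    worse = sum(abs(g - true_weight) > crowd_error for g in guesses)
    return crowd_error, worse / len(guesses)


if __name__ == "__main__":
    truth = 1200.0  # round stand-in, not the real ox
    for bias in (0.0, 50.0):
        guesses = simulate_guesses(truth, spread=75.0, crowd_size=800, bias=bias)
        error, share = crowd_report(guesses, truth)
        print(f"bias={bias:>5}: crowd error {error:.1f} lb, beats {share:.0%} of individuals")
```

With unbiased errors the individual misses mostly cancel, and the error of the mean shrinks roughly like the spread divided by the square root of the crowd size. A shared bias doesn't cancel at all. The crowd just inherits it.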

What's the ten trillionth digit of pi? Damned if I know. But wait, I'm not going to write a question mark there. There are ten possibilities. I can do at least that much, if not any more than that. I'm not going to write "eleven" as if that were sensible.

If we're trying to understand people who guess the weight of dead meat, we can likewise eliminate some answers. If we're trying to model those guesses, we can assume away anything that's larger than the mass of the known universe. Or really, any guess that requires scientific notation. The participants have entered a contest, they have some incentive to get the answer right, they're not going to write down "a thousand million pounds" as their answer.

You can say, "I don't know anything about A, B, or C!"

You can say that, but it is very often not true. Just a figure of speech. You know some things about the weight of dead meat. No negative numbers. No numbers that are most easily written in scientific notation. I can say that I don't know anything about the weight of a butchered and dressed ox. Except that I do.

It will weigh more than a person. You know that. I know that.

It will weigh less than the living beast.

That is information.
