
A Note on Statistical Deception

I recently saw a comment asserting that “people lie, but numbers don’t lie.” Set that against the old saying: “There are lies, damned lies, and statistics.”

Numbers indeed don’t lie; they’re just numbers. But just as “guns don’t kill, people kill” (though people often kill more efficiently with guns), numbers don’t lie, yet people regularly lie with numbers. Even more, people lie to themselves with numbers.

Numbers are not immaculately conceived. They often have interesting pasts. The first thing you should ask about any number is: How was this number generated?

Let me give a fictional example. Suppose I want to know how many people like Key Lime Pie. It seems likely the number will be quite high, but I am faced with a skeptic who wants some solid evidence.

What might this evidence be? Well, I could say, “I talked to a lot of people, and nearly all of them like Key Lime Pie.” OK, great, but that sounds rather subjective. I need a way to make my assertion sound more objective. So I ask 10 people, and what I ask them is not whether they like Key Lime Pie, but rather for a rating between 1 and 10 of how much they like it. I’m just going to make up some results; I actually could not care less about who likes what kind of pie. Let’s suppose my result averages out to 9.5, which might be produced by 5 people saying 10 and 5 saying 9.

So now, instead of saying that most people I talk to like Key Lime Pie, or even that “almost everyone” does, I can say that in a survey, Key Lime Pie was rated 95% on a scale of intensity of liking.
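The arithmetic behind that headline number is trivial, which is part of the point. Here is a minimal sketch in Python, using the made-up ratings from the example above (not real data):

```python
# Made-up ratings from the example: five people say 10, five say 9.
ratings = [10, 10, 10, 10, 10, 9, 9, 9, 9, 9]

mean = sum(ratings) / len(ratings)   # (5*10 + 5*9) / 10 = 9.5
percent = mean / 10 * 100            # rescale the 1-10 average to a percentage

print(f"Average rating: {mean}")           # 9.5
print(f"Headline number: {percent:.0f}%")  # 95%
```

Two lines of arithmetic turn ten subjective opinions into an official-looking percentage; nothing in the computation adds any objectivity.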

What have I actually done? I have simply assigned numbers to people’s subjective opinions. There is no guidance (nor do I think there could be adequate guidance) on precisely what each number means. So I have numbers attached to a subjective feeling, totaled and averaged. I sound more objective, but my information is still quite subjective.

Further, if you don’t know how many people I asked, or how my question was phrased, or how I selected the people I asked, then while my number sounds objective, you really have no idea how good it is.

Now suppose I ask another question: How many of you have stolen a slice of Key Lime Pie? Probably none of my ten people admit to this crime, and so I can make up a new, objective-sounding result: despite how intensely people like Key Lime Pie, they still decline to steal any.

The information is only as good as the people from whom it was acquired, and it may have been corrupted by the procedure itself. For example, I might have asked the question this way: “I despise people who don’t like Key Lime Pie! I think they’re evil. Rate how much you like Key Lime Pie on a scale of 1 to 10.”

Depending on how I present myself (it’s possible people will decide they hate the pie just to annoy me!), this has the potential to change the results.

Compare this to a coin toss or dice rolling experiment. In those cases, barring methods of cheating, the information is objective. I can still derive false conclusions, but the head or the tail is not a matter of opinion.
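To make the contrast concrete, here is a small Python sketch of such an experiment (a simulation, not real tosses); whatever the counts come out to be, they are facts about the tosses, not anyone’s opinion:

```python
import random

random.seed(42)  # fixed seed only so the illustration is reproducible

# Simulate 100 fair coin tosses; each outcome is a fact, not a rating.
tosses = [random.choice(["heads", "tails"]) for _ in range(100)]

print("heads:", tosses.count("heads"))
print("tails:", tosses.count("tails"))
```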

In my introduction to statistics course we had to do this exercise with coins, with predictable results: none of us was able to rig the coin tosses. Then we had to conduct some kind of survey. I created a survey on the perception of various jobs.

Someone helped me with part of the interviewing, and when she brought back her surveys she asked whether I wanted to interview a different candidate instead. I had used random selection from a list of names, all students at the college I attended. The reason she wondered whether to discard that one interview was that the respondent had seen that she was rating “doctor” against “janitor” (two of the ten occupations listed), thought the comparison was unfair, and so gave every occupation a score of 10.
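For illustration, random selection from such a list might look like this minimal Python sketch (the names here are hypothetical stand-ins, not the actual roster):

```python
import random

# Hypothetical stand-in roster; in the anecdote this was a list of students.
roster = ["Alice", "Bob", "Carol", "Dave", "Eve", "Frank", "Grace", "Heidi"]

random.seed(7)  # fixed seed only so the sketch is reproducible
respondents = random.sample(roster, k=3)  # draw 3 names without replacement

print(respondents)
```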

Of course I kept the interview, because that was one of the possible human responses to the question. Note that I don’t object to using a numerical scale for responses; it can help people answer a question and distinguish intensities. The important thing is, well, the next point.

When you see a statistical result presented, what do you do?

  1. Ask how the data was collected.
  2. Ask how the data was analyzed.
  3. Ask what sample was used.
  4. Ask about the qualifications and potential biases of the interviewers.

All of these things and a number of other details can be found in the publication data (often online these days) for the survey in question. If you can’t find it, be very skeptical of the survey. A survey without known methodology is just a bunch of numbers thrown at a page. It might be good data, but without the methodology you don’t know.

“But this is very difficult. It takes too much time,” you say. Well, unless you follow up and learn this information, you don’t actually “know” that the survey data you are using is any good.

I suggest a simple option in that case: withhold your opinion. You don’t need an opinion on everything. I often say, “I don’t know enough about this, or I don’t have enough time to investigate it, for my opinion to have any validity. Thus I will refrain from expressing one.”

I suggest this procedure!
