The most accurate survey of a group of people is a vote: Just ask everyone to make a decision and tally the ballots. It's 100% accurate, assuming you counted the votes correctly.
(By the way, there's a whole other topic in math that describes the errors people can make when they try to measure things like that. But, for now, let's assume you can count with 100% accuracy.)
Here's the problem: Running elections costs a lot of money. It's simply not practical to conduct a public election every time you want to test a new product or ad campaign. So companies, campaigns and news organizations ask a small, randomly selected group of people instead. The idea is that you're surveying a sample of people who will accurately represent the beliefs or opinions of the entire population.
But how many people do you need to ask to get a representative sample?
The best way to figure this one out is to think about it backwards. Let's say you picked a specific number of people in the United States at random. What then is the chance that the people you picked do not accurately represent the U.S. population as a whole? For example, what is the chance that the percentage of those people you picked who said their favorite color was blue does not match the percentage of people in the entire U.S. who like blue best?
Of course, our little mental exercise here assumes you didn't do anything sneaky like phrase your question in a way that makes people more or less likely to pick blue as their favorite color. Like, say, telling people "You know, the color blue has been linked to cancer. Now that I've told you that, what is your favorite color?" That's called a leading question, and it's a big no-no in surveying.
Common sense will tell you (if you listen...) that the chance that your sample is off the mark will decrease as you add more people to your sample. In other words, the more people you ask, the more likely you are to get a representative sample. This is easy so far, right?
Okay, enough with the common sense. It's time for some math. (insert smirk here) The formula that describes the relationship I just mentioned is basically this:
The margin of error in a sample = 1 divided by the square root of the number of people in the sample
How did someone come up with that formula, you ask? Like most formulas in statistics, this one can trace its roots back to pathetic gamblers who were so desperate to hit the jackpot that they'd even stoop to mathematics for an "edge." If you really want to know the gory details, the formula comes from the standard deviation of the results you'd get from a whole bunch of samples of the same size. When opinion is split 50/50 (the worst case), that standard deviation is 0.5 divided by the square root of the sample size, and two of those standard deviations add up to 1 divided by the square root of the sample size.
Which is mathematical jargon for..."Trust me. It works, okay?"
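If you'd rather check than trust, here's a small simulation you can run yourself. It's a sketch under stated assumptions, not anybody's official method: it invents a population in which exactly half of the people like blue best (the 50/50 split is the worst case, the one with the widest spread), draws 2,000 random samples of 400 people each, and measures how much the sample results bounce around. The helper name `simulate_sample_proportions` and the fixed random seed are my own illustrative choices. Two standard deviations of those sample results should come out very close to 1 divided by the square root of 400.

```python
import random
import statistics

def simulate_sample_proportions(true_share, sample_size, num_samples, seed=0):
    """Draw num_samples random samples; return the share of 'yes' answers in each."""
    rng = random.Random(seed)  # fixed seed so every run gives the same numbers
    shares = []
    for _ in range(num_samples):
        # Each person independently answers "yes, blue is my favorite" with probability true_share.
        yes = sum(1 for _ in range(sample_size) if rng.random() < true_share)
        shares.append(yes / sample_size)
    return shares

# Hypothetical population: exactly half of everyone likes blue best.
n = 400
shares = simulate_sample_proportions(true_share=0.5, sample_size=n, num_samples=2_000)
spread = statistics.pstdev(shares)  # standard deviation of the 2,000 sample results
print(f"standard deviation of the sample results: {spread:.4f}")
print(f"two standard deviations:                  {2 * spread:.4f}")
print(f"1 divided by the square root of {n}:      {1 / n ** 0.5:.4f}")
```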
Plug in the numbers: The square root of 1,600 is 40, and 1 divided by 40 is 0.025. So a sample of just 1,600 people gives you a margin of error of 2.5 percentage points, which is pretty darn good for a poll.
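Here's the same rule of thumb as a few lines of Python, run for a handful of sample sizes. The `margin_of_error` function is just a name I picked for this sketch, not a library call, and it assumes the simple 1-over-the-square-root formula above.

```python
import math

def margin_of_error(sample_size):
    """Rule-of-thumb 95% margin of error, 1 / sqrt(n), returned as a fraction."""
    return 1 / math.sqrt(sample_size)

for n in (100, 400, 1_600, 10_000):
    print(f"{n:>6} people -> +/- {margin_of_error(n) * 100:.1f} percentage points")
```

That prints 10.0, 5.0, 2.5 and 1.0 points. Notice that quadrupling the sample only cuts the margin of error in half: Going from 400 to 1,600 people takes you from 5 points down to 2.5.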
You've probably heard that term, "margin of error," plenty of times before. Reporters toss it around like a hot potato, as if they'll get burned if they hold on to it too long (say, by trying to explain what it means). That's because many reporters have no idea what a "margin of error" really represents.
I gave you the math up above. But let's talk about what that math represents. When you do a poll or survey, you're making a very educated guess about what the larger population thinks. If a poll has a margin of error of 2.5 percentage points, that means that if you ran that poll 100 times, asking a different random sample of people each time, the percentage of people giving a particular answer would land within 2.5 points of the true figure for the entire population in about 95 of those 100 polls. Flip that around, and you can be about 95 percent confident that the true figure lies within 2.5 points of any one poll's result.
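To see that interpretation in action, this sketch runs 1,000 imaginary polls of 1,600 people each against a made-up population that is split exactly 50/50, then counts how many polls land within 2.5 points of that true 50 percent. The population, the random seed and the `run_polls` helper are all assumptions for illustration. Expect the count to come out at roughly 95 percent of the polls.

```python
import math
import random

def run_polls(true_share, sample_size, num_polls, seed=1):
    """Simulate num_polls independent polls; return the share answering 'yes' in each."""
    rng = random.Random(seed)  # fixed seed for repeatable results
    results = []
    for _ in range(num_polls):
        yes = sum(1 for _ in range(sample_size) if rng.random() < true_share)
        results.append(yes / sample_size)
    return results

true_share = 0.50           # hypothetical: the whole population is split exactly 50/50
n = 1_600
margin = 1 / math.sqrt(n)   # rule of thumb: 0.025, or 2.5 points
polls = run_polls(true_share, n, num_polls=1_000)

# Count the polls whose result landed within the margin of the TRUE population value.
hits = sum(1 for share in polls if abs(share - true_share) <= margin)
print(f"{hits} of {len(polls)} polls landed within 2.5 points of the true 50 percent")
```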
(WARNING: Math Geek Stuff!)
Why 95 times out of 100? The margin of error is half the width of what statisticians call a confidence interval, and the math behind it is much like the math behind the standard deviation. At the 95 percent confidence level, the margin of error works out to about two standard deviations (1.96, to be exact) of the results you'd expect from your polling sample. Occasionally you will see surveys reported at a 99 percent confidence level, which corresponds to about 2.6 standard deviations and a noticeably larger margin of error. (Three standard deviations would take you to about 99.7 percent.)
(End of Math Geek Stuff!)
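For anyone who wants to check those multipliers, Python's standard library can look them up on the normal curve with `statistics.NormalDist` (available in Python 3.8 and later). This sketch assumes the worst-case 50/50 split and a sample of 1,600 people; `z_multiplier` is a helper name made up for this example.

```python
from statistics import NormalDist

def z_multiplier(confidence):
    """Standard deviations on each side of the mean that capture `confidence` of a normal curve."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

n = 1_600
worst_case_sd = 0.5 / n ** 0.5  # spread of a sample share when opinion splits 50/50

for level in (0.95, 0.99):
    z = z_multiplier(level)
    print(f"{level:.0%} confidence: {z:.2f} standard deviations, "
          f"+/- {z * worst_case_sd * 100:.1f} points")

# How much of the normal curve lies within three standard deviations of the mean?
coverage_3sd = NormalDist().cdf(3) - NormalDist().cdf(-3)
print(f"three standard deviations cover {coverage_3sd:.1%}")
```

At 95 percent confidence, the exact multiplier of 1.96 gives about 2.4 points for 1,600 people, a shade under the 2.5 that the rounded-off rule of thumb produces. At 99 percent, the margin grows to about 3.2 points.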
If a poll says that 48 percent of registered voters surveyed are likely to vote for Candidate A and 46 percent of those voters plan to cast their ballots for Candidate B, you'll likely hear reporters saying that Candidate A has a two-point lead. Now, that's true in this poll, but given the likely margin of error, a mathematician wouldn't say that Candidate A has a two-point lead in the actual race. There's just too much of a chance that Candidate A's true support is enough lower than 48 percent and Candidate B's true support is enough higher than 46 percent that the two might actually be tied, or that Candidate B might even have a slight lead. You can't say for sure on the basis of a single poll with a two-point gap.
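Here's a rough way to put numbers on that, assuming a sample of 1,600 people (the example doesn't say how many were polled) and the 1-over-the-square-root rule for each candidate's share. The second function uses the standard textbook approximation for the margin of error on the gap between two candidates measured in the same poll; because a voter who picks A can't also pick B, that gap wobbles more than either candidate's share does. Both function names are illustrative.

```python
import math

def share_range(share, sample_size):
    """Rule-of-thumb 95% range for one candidate's share: share +/- 1/sqrt(n)."""
    margin = 1 / math.sqrt(sample_size)
    return share - margin, share + margin

def lead_margin(share_a, share_b, sample_size, z=1.96):
    """Approximate 95% margin of error on the gap between two shares from the same poll."""
    # Var(pA - pB) = [pA + pB - (pA - pB)^2] / n for two answers to one multiple-choice question.
    variance = (share_a + share_b - (share_a - share_b) ** 2) / sample_size
    return z * math.sqrt(variance)

n = 1_600  # assumption: the example poll doesn't give a sample size
a, b = 0.48, 0.46
low_a, high_a = share_range(a, n)
low_b, high_b = share_range(b, n)
print(f"Candidate A: {low_a:.1%} to {high_a:.1%}")
print(f"Candidate B: {low_b:.1%} to {high_b:.1%}")
print(f"Ranges overlap: {low_a <= high_b}")  # A's low end vs. B's high end
print(f"Lead: {a - b:.1%} +/- {lead_margin(a, b, n):.1%}")
```

The two ranges overlap, and the margin of error on the lead comes out at more than twice the size of the two-point lead itself.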
If you want to get a more accurate picture of who's going to win the election, you need to look at more polls. Just as asking more people in one poll helps reduce your margin of error, looking at multiple polls can help you get a more accurate view of what people really think. Analysts such as Nate Silver and Sam Wang have created models that average multiple polls to help predict which candidates are most likely to win elections. (Silver got his start using baseball statistics to predict future on-field performance, which goes to show that numbers can help you predict things other than elections.) In 2012, Silver was 50-for-50 in predicting state results in the presidential election, based on his model for averaging publicly available polls.
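The simplest version of that idea (much simpler than what Silver's or Wang's models actually do) treats several polls as one big sample: Weight each poll's result by its number of respondents, and compute the margin of error from the combined head count. That only holds if the polls are independent, ask the same question of the same population and share no systematic bias, which are big assumptions. The poll results below are invented for illustration, and the helper names are hypothetical.

```python
import math

def pooled_share(polls):
    """Average (share, sample_size) pairs, weighting each poll by its sample size."""
    total = sum(n for _, n in polls)
    return sum(share * n for share, n in polls) / total

def pooled_margin(polls):
    """Rule-of-thumb margin of error, treating all respondents as one sample."""
    return 1 / math.sqrt(sum(n for _, n in polls))

# Invented results for one candidate from four hypothetical polls: (share, sample size).
polls = [(0.48, 1_600), (0.46, 1_600), (0.49, 1_600), (0.47, 1_600)]
print(f"single poll margin: +/- {1 / math.sqrt(1_600):.2%}")
print(f"pooled estimate:    {pooled_share(polls):.1%} +/- {pooled_margin(polls):.2%}")
```

Four polls of 1,600 people behave like one poll of 6,400, so under those assumptions the margin of error drops from 2.5 points to 1.25.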
Here's one more thing that surprises people: The size of the entire population barely matters when you're measuring the accuracy of polls, as long as it's much larger than your sample. You could have a nation of 250,000 people or 250 million and that won't meaningfully affect how big your sample needs to be to come within your desired margin of error. The Math Gods just don't care.
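To see why the population size barely matters, this sketch applies what statisticians call the finite population correction, which shrinks the margin of error by the square root of (N - n) / (N - 1), where N is the population and n is the sample. It assumes the 1-over-the-square-root rule and a 1,600-person sample; `margin_with_fpc` is a made-up helper name.

```python
import math

def margin_with_fpc(sample_size, population_size):
    """Rule-of-thumb margin of error, shrunk by the finite population correction."""
    base = 1 / math.sqrt(sample_size)
    correction = math.sqrt((population_size - sample_size) / (population_size - 1))
    return base * correction

n = 1_600
for population in (250_000, 250_000_000):
    print(f"population {population:>11,}: +/- {margin_with_fpc(n, population):.2%}")
```

Even for the 250,000-person nation, the correction only trims the margin from 2.50 to about 2.49 percent; for 250 million, the difference doesn't show up at two decimal places. The correction matters only when the sample is a sizable slice of the whole population.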
Sometimes you'll see polls with anywhere from 600 to 1,800 people, all promising the same margin of error. That's because pollsters often want to break down their poll results by the gender, age, race or income of the people in the sample. To do that, the pollster needs to have enough women, for example, in the overall sample to ensure a reasonable margin of error among just the women. And the same goes for young adults, retirees, rich people, poor people, etc. That means that in order to have a poll with a margin of error of five percent among many different subgroups, a survey will need to include many more than the 400 people it takes to get that five percent margin in the overall sample (the square root of 400 is 20, and 1 divided by 20 is 5 percent).
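The sample-size math can be run backwards, too: Flip the rule of thumb around and the number of people you need is 1 divided by the margin squared. This sketch works in percentage points to keep the arithmetic exact, and the subgroup shares are hypothetical examples (women making up about half a sample, some other group a quarter). Both function names are made up for illustration.

```python
import math

def required_sample_size(margin_points):
    """People needed for a rule-of-thumb margin given in percentage points: (100 / margin)^2."""
    return math.ceil((100 / margin_points) ** 2)

def total_for_subgroup(margin_points, subgroup_percent):
    """Overall sample needed so a subgroup making up subgroup_percent of it hits the margin."""
    per_group = required_sample_size(margin_points)
    return -(-per_group * 100 // subgroup_percent)  # integer ceiling division

print(required_sample_size(5))    # 400: the whole sample at +/- 5 points
print(total_for_subgroup(5, 50))  # 800: if women are about half the sample (hypothetical)
print(total_for_subgroup(5, 25))  # 1600: for a group that is a quarter of the sample (hypothetical)
```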
Read the rest of Robert's statistics lessons for people who don't know math.
© Robert Niles.