being an informed consumer of polls

Nate Silver is accusing the Strategic Vision polling firm of making up its data (which would be much cheaper than collecting it!). I don’t know the truth about this case, but I’m especially interested because one of the disputed Strategic Vision polls found extremely poor knowledge of civics among Oklahoma students–and civics is my main interest.

In any case, this seems an opportune moment for some general remarks about polling. I’ve been involved (usually with collaborators) in commissioning nine national surveys using at least five different firms. My experience has been mixed, ranging from deep respect to real concerns. All the firms I’ve worked with have been basically reliable, but polling is not a pure science: there is an important element of art or craft, and quality is inconsistent. If you take survey results as precise and fully reliable, you’re naive. But if you reject high-quality surveys because you don’t like the results (something I’ve seen happen on many occasions), you are equally mistaken. Although polling is partly an art or a craft, a good poll is far from arbitrary.

Polling would be more of a science if probability sampling really worked. The theory suggests that you can say something about a whole population based on a small number of respondents, randomly selected. But it is never possible to achieve pure random selection. That’s partly because there is no list of all Americans from which names can be drawn; you have to use a substitute (such as randomly generated telephone numbers) which must omit some individuals. To make matters worse, most people don’t agree to be surveyed. The response rate is always far below 50% and very uneven across demographic groups. So if you randomly dialed a bunch of phone numbers and took down the answers of the first 1000 people who agreed to be interviewed, you would have a deeply biased survey.
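To make that concrete, here is a toy simulation in Python. All the numbers in it (group sizes, response rates, opinions) are invented for illustration; the point is only that uneven response rates skew a raw, unweighted result.

```python
import random

# A toy simulation (made-up numbers) of how uneven response rates bias
# a raw, unweighted survey.
random.seed(0)

GROUPS = [
    # (label, share of population, response rate, prob. of answering "yes")
    ("older, landline-heavy", 0.4, 0.15, 0.70),
    ("younger, cell-only",    0.6, 0.04, 0.40),
]

# True population value: the groups' opinions weighted by their real shares.
true_yes = sum(share * p_yes for _, share, _, p_yes in GROUPS)

# Dial at random until 1,000 people agree to be interviewed.
respondents = []
while len(respondents) < 1000:
    label, share, response_rate, p_yes = random.choices(
        GROUPS, weights=[g[1] for g in GROUPS])[0]
    if random.random() < response_rate:        # did they pick up and agree?
        respondents.append(random.random() < p_yes)

raw_yes = sum(respondents) / len(respondents)
print(f"true share saying yes:    {true_yes:.2f}")  # 0.52
print(f"unweighted survey result: {raw_yes:.2f}")   # roughly 0.61, pulled toward the high-response group
```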

Instead, all pollsters I’ve dealt with use some combination of demographic quotas (i.e., they contact people at random until they have enough respondents in each category), lists of individuals who are likely to respond, and weighting. To “weight” a sample after it has been collected, you adjust it to match the whole population. If, for instance, young white males make up only half as large a share of your sample as they do of the population, each one’s responses must count for two.
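Here is a minimal sketch of that adjustment, again with invented shares; real pollsters do this across many demographic cells at once, but the arithmetic in each cell is just population share divided by sample share.

```python
# A minimal sketch of weighting, with invented shares. Each cell's weight is
# its population share divided by its sample share, so underrepresented cells
# count for more.

population_share = {"young white men": 0.10, "everyone else": 0.90}
sample_share     = {"young white men": 0.05, "everyone else": 0.95}

weights = {cell: population_share[cell] / sample_share[cell]
           for cell in population_share}
print(weights)   # {'young white men': 2.0, 'everyone else': 0.947...}

# Applying the weights to a (toy) set of answers:
answers = [("young white men", True), ("everyone else", False),
           ("everyone else", True), ("everyone else", False)]
weighted_yes = sum(weights[cell] for cell, said_yes in answers if said_yes)
total_weight = sum(weights[cell] for cell, _ in answers)
print(weighted_yes / total_weight)   # weighted estimate of the "yes" share
```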

Every survey I know of has been weighted, but how it is done makes a great deal of difference. Doubling the responses of 150 young white males is not a big problem; but sometimes three young Latino males are made to stand in for 100, and that stretches the representativeness of the sample past the breaking point. There is also a huge question about which categories to use for weighting. You can weight a sample to match the ethnic, age, and gender composition of the whole population and still get badly biased answers to opinion questions if you don’t also weight by religion or political party. But if you weight by everything, you haven’t really taken a survey.
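One rough way to quantify that stretching (not something the firms I’ve worked with necessarily report) is Kish’s “effective sample size,” which shrinks as weights become more uneven. The sketch below applies it to hypothetical weights resembling the two cases just described.

```python
# Kish's effective sample size: (sum of weights)^2 / (sum of squared weights).
# The two scenarios below use hypothetical weights.

def effective_sample_size(weights):
    return sum(weights) ** 2 / sum(w * w for w in weights)

# 1,000 respondents where 150 people are simply doubled:
modest  = [2.0] * 150 + [1.0] * 850
# 1,000 respondents where 3 people stand in for about 100 (weight ~33 each):
extreme = [33.3] * 3 + [1.0] * 997

print(round(effective_sample_size(modest)))    # ~912: little information lost
print(round(effective_sample_size(extreme)))   # ~278: much of the sample's value is gone
```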

A pollster can reduce the need for weighting by making dogged efforts to reach the people who were originally selected at random, but that’s expensive. It’s much cheaper to move on to someone else and then “weight” after the fact. Even the most dogged efforts never yield particularly high response rates, which is why weighting is unavoidable.

I have focused here on sampling issues, but of course the questions one chooses to ask on a poll also introduce all kinds of bias. Writing good questions is very much an art, and no amount of statistical testing for reliability and validity can ever tell you for certain whether your questions are good.

At CIRCLE, we are an unusual client because we never purchase a report from a survey firm. We purchase the data, including the unweighted dataset. We then spend a lot of time analyzing the sample and making judgment calls. Sometimes, we will refrain from talking about particular subsamples because the weights are too large. In one case, we threw out a whole national poll that had cost more than $100,000 to collect because we did not believe it was reliable.

If you’re a regular reader of polls in the news, you can’t analyze the raw data. You have to rely on intermediaries, such as reporters, editors, clients, and associations of pollsters, to separate the wheat from the chaff. I don’t believe that simple distinctions (such as random-digit-dialing versus online samples) are all that helpful, because there are tradeoffs between the various methods. You can only hope that the intermediaries you trust have looked closely at response rates, quotas, and weighting schemes. It’s also helpful to apply some common sense–for instance, I find it difficult to believe that Oklahoma students know as little as Strategic Vision claims–and to compare more than one poll if they are available.

Finally, I wouldn’t throw the baby out with the bathwater. Most respectable pollsters struggle hard to get representative samples and to ask good questions. Their results are more reliable for some purposes than for others. (For instance, an estimate of the frequency of a behavior, like “37% of people volunteer,” is less reliable than a pattern in the data, such as “volunteers are much better educated than non-volunteers.”) But it’s certainly wise to credit their basic findings about what people like and what they believe.