Like most academics I get the occasional paper rejected from academic journals. In theory, the peer review process ensures that the research in published papers is scientifically sound and that the writing accurately describes the research. But when my papers on polling get rejected, I am almost always told to read one particular paper: "Comparing the Accuracy of RDD Telephone Surveys and Internet Surveys Conducted with Probability and Non-Probability Samples." The weird thing is that I do not think my work on non-probability samples disagrees with this paper at all; in fact, I am a huge fan of the research in it. I just think the authors and their supporters completely misunderstand the implications of their own research.

The punchline of the paper is the last line of the abstract: "These results are consistent with the conclusion that non-probability samples yield data that are neither as accurate as nor more accurate than data obtained from probability samples." The authors do a commendable job of separating out the mode (face-to-face, mail, telephone, or online) from the sampling method (probability-based or non-probability-based). The meat of the result is on the method, not the mode, so I will focus there.

The research focuses on two probability-based and seven non-probability-based samples, asking questions about primary demographics (sex, age, etc.), secondary demographics (marital status, household size, etc.), and non-demographics (cigarette and alcohol consumption, etc.). The responses to the nine studies are compared to benchmarks from the Census (for the demographic questions) and government surveys (for the non-demographics).
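The paper's headline numbers read like average absolute errors against those benchmarks. Here is a minimal sketch of that comparison, assuming that metric; the figures below are invented for illustration and are not the paper's data:

```python
# Sketch of the accuracy metric: average absolute error versus a benchmark.
# All numbers are invented for illustration; they are not the paper's data.

# Benchmark values (e.g., from the Census or a government survey), in percent.
benchmark = {"married": 52.0, "smokes": 21.0, "owns_home": 67.0}

# One survey's post-stratified estimates for the same quantities.
survey_estimate = {"married": 49.5, "smokes": 25.0, "owns_home": 64.0}

def average_absolute_error(estimates, benchmarks):
    """Mean of |estimate - benchmark| across all benchmarked questions."""
    errors = [abs(estimates[q] - benchmarks[q]) for q in benchmarks]
    return sum(errors) / len(errors)

print(round(average_absolute_error(survey_estimate, benchmark), 2))  # 3.17
```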

The key results are at the top of Table 2; everything else basically just averages or breaks down this portion of the table:

[Figure: the top portion of Table 2 from the paper]

1) Non-Post-Stratified Results Are Not Meaningful: The authors keep going back and forth between post-stratified and non-post-stratified results, but no polling company ever publishes raw (i.e., non-post-stratified) results. It is useful and interesting to see how hard the weights are working, but that is not a finding; it is a raw data point on the way to findings.

2) Post-Stratification Method Favors Probability-based Samples: The authors use a post-stratification method that was created and optimized for probability-based samples. The further a sample is from representative, the larger the weights that must be placed on individual responses to make the weighted marginal demographics match the general population (see the raking sketch after this list). Our research is emphatic that more advanced modeling and post-stratification help most precisely when the non-probability sample is least representative.

3) Error in Benchmarks Favors Probability-based Samples: The authors weight the results to the primary demographics; thus, the main results are on the secondary demographics and non-demographics. The secondary demographics are matched to the Census and the non-demographics to government surveys, and both of those benchmark sources are probably dominated by face-to-face interviewing. Thus the mode-based bias (differences that occur simply because of the mode in which people answer) will be closer to that of the telephone surveys than that of the internet surveys (the mode of all the non-probability-based samples). We will never know the ground truth of the percentage of the American population with a given household size or cigarette-smoking habit, but telephone and face-to-face surveys are likely to have errors similar to each other, and possibly quite different from internet surveys.

4) Study Does Not Apply to Market Research Questions: The authors asked questions that all have objective answers known by the respondent. Most market research asks subjective or conditional questions whose answers the respondent does not already know. The error and variability in those questions are very different.

5) Probability-based Polling Is a Set Method; Non-Probability Is Not: The authors note that they picked popular non-probability pollsters. This is not a systematic review of the best non-probability methods, but a group of pollsters chosen by authors out to prove that non-probability polling does not work. Probability-based polling, by contrast, is a settled methodology with less pollster-level variability.

6) How often do you care if something is 3 or 5 percentage points off? How much are you willing to pay for that in both time and money? If we assume that the authors did not stack the research against non-probability-based methods, what should we make of their results? The probability-based samples have errors of 2.9 and 3.4 percentage points; the non-probability samples: 4.5, 4.5, 5.1, 5.2, 5.2, 5.5, and 6.6. That works out to averages of roughly 3.2 versus 5.2, a gap of about two points. For many questions that is perfectly adequate, especially if the cost is 5x or 10x less and the turnaround is half or a quarter of the time of a comparable probability-based sample.
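To make point 2 concrete, here is a minimal sketch of raking (iterative proportional fitting), one standard way to post-stratify a sample to population margins. Everything here is invented for illustration, and it is not the paper's exact procedure: a deliberately skewed sample forces the underrepresented cells to carry very large weights, which is exactly the sense in which the weights have to work harder the worse the sample.

```python
# Minimal raking (iterative proportional fitting) sketch. All numbers are
# invented for illustration; this is not the paper's exact procedure.
from collections import defaultdict

# A deliberately skewed sample: too young and too male.
sample = [
    {"sex": "m", "age": "18-44"}, {"sex": "m", "age": "18-44"},
    {"sex": "m", "age": "18-44"}, {"sex": "m", "age": "45+"},
    {"sex": "f", "age": "18-44"}, {"sex": "f", "age": "45+"},
]

# Population margins (proportions) the weighted sample should match.
targets = {"sex": {"m": 0.49, "f": 0.51}, "age": {"18-44": 0.45, "45+": 0.55}}

weights = [1.0] * len(sample)
for _ in range(50):  # alternate over variables until the margins converge
    for var, target in targets.items():
        totals = defaultdict(float)
        for w, person in zip(weights, sample):
            totals[person[var]] += w
        grand = sum(totals.values())
        for i, person in enumerate(sample):
            # Rescale each respondent so this variable's weighted margin
            # matches its population target.
            weights[i] *= target[person[var]] * grand / totals[person[var]]

# The rarest cell (the one older woman) ends up with the largest weight,
# roughly 2.1 versus about 0.6 for each of the overrepresented young men.
for person, w in zip(sample, weights):
    print(person, round(w, 2))
```

The same machinery that gently corrects a mildly unrepresentative probability sample produces extreme weights on a badly skewed panel, which is why more aggressive modeling helps non-probability samples the most.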

Even with a set of authors stacking the deck against non-probability-based samples as much as possible, non-probability-based samples do pretty well when you consider cost in time and money alongside accuracy. This paper, with much of its data now over 10 years old, is a great forerunner of my work in non-probability-based sampling. So, while many people cite my paper as a key example of non-probability-based polling, I tell everyone how inspired I was by this paper, even if its fans use it as a blunt weapon to say that probability-based polling rules!