Huffington Post’s Pollster counts about 100 head-to-head polls for Hillary Clinton versus Donald Trump since the start of 2016. Their model, which aggregates these polls with a local trend, estimates a consistent Clinton lead that started at 44.9 to 42.3 on January 1, 2016, expanded to 49.5 to 38.0 on March 31, 2016 and is now 44.4 to 40.1. Clinton has led most polls. At the same time there are about 60 Bernie Sanders versus Donald Trump polls, because polling companies generally ask all combinations of viable candidates. As recently as late April this generally meant 6 hypothetical general election matchups.

Our experimental polling using the mobile application-based polling firm Pollfish has shown a much more stable view of the general election, with the Democratic nominee moving from a slight deficit to a strong lead over the first five months of 2016. And, when we say Democratic nominee, we mean whoever that may be, against whoever the Republican nominee may be. The chart below takes the Democratic poll share and divides it by the sum of the Democratic and Republican poll shares. We do the same for the Pollster trend.
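The two-party share described above can be sketched in a few lines of Python. This is only an illustration: the function name and the dictionary are our own labels, and the inputs are the three Pollster trend values quoted earlier in this column, not the underlying poll data.

```python
def two_party_share(dem: float, rep: float) -> float:
    """Democratic share of the combined Democratic + Republican poll share."""
    total = dem + rep
    if total <= 0:
        raise ValueError("combined poll share must be positive")
    return dem / total


# Pollster trend estimates quoted in this column: (Clinton, Trump) in percent.
pollster_trend = {
    "2016-01-01": (44.9, 42.3),
    "2016-03-31": (49.5, 38.0),
    "latest": (44.4, 40.1),
}

shares = {date: two_party_share(dem, rep) for date, (dem, rep) in pollster_trend.items()}

for date, share in shares.items():
    print(f"{date}: {share:.1%}")
```

Dividing by the two-party total removes undecided and third-party responses from the comparison, which is what lets a party-based question and a candidate-based trend sit on the same scale.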

[Figure 1: Democratic share of the two-party poll share over 2016, our Pollfish polling versus the Pollster trend.]

We have been fielding just one question since the start of the year: Who are you most likely to vote for in the upcoming presidential election? The answer options are: Definitely Republican candidate, Likely Republican candidate, Likely Democratic candidate, Definitely Democratic candidate, and not voting. This question focuses on the party rather than the individual candidate because we believe it will reflect voting in November more accurately than candidate-specific polling in the spring.

The hypothetical matchup question suffers from many issues, including noise (i.e., random fluctuations) and strategic polling (i.e., people answering the poll differently than they will vote).

Noise can stem from two sources: measurement error in the individual responses and random variation from poll to poll. Most Americans do not focus on politics all day; as hard as it is to believe for everyone reading this column, Gallup’s May 13-15 poll showed just 40% of adults paying very close attention, and this number is probably inflated due to social desirability bias (i.e., people pretend they are reading about politics when they are actually reading about Dancing with the Stars). The result is a disconnect between the answers people give when confronted with a new question and their true underlying beliefs. In addition, a hypothetical matchup between two candidates voters have yet to consider involves estimating uncertainty accurately, a notoriously difficult task.

It is a good idea to think of the collection of polls as a sequence of noisy measurements. While multiple measurements get us closer to the truth, the single polls that informed the Huffington Post’s Pollster estimates earlier this year might be just that: random deviations from some unknown mean. Given the number of polls taken into consideration now, we might contend that the movement documented by Pollster is at least in part a move from noise to truth, but this is likely erroneous as well. While multiple polls have the potential to take care of the noise issue, the problem is how to aggregate them correctly. Polls differ in the quality of their sampling design (i.e., deciding how to define the sample of possible respondents) and of their post-sampling analytics (i.e., making respondents as comparable to the voting population as possible). Ideally, we would want to see a weighted average of polls where the weight tells us something about poll quality, but in reality the weight remains unknown.
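To make the idea of a quality-weighted average concrete, here is a minimal Python sketch. Everything in it is hypothetical: the three poll values and their quality weights are invented for illustration, and since the true weights are unknown in practice, this shows only the arithmetic, not a method anyone actually uses to aggregate polls.

```python
def weighted_poll_average(polls):
    """Average poll shares, weighting each poll by an assumed quality score.

    polls: list of (share, weight) pairs with non-negative weights.
    """
    total_weight = sum(weight for _, weight in polls)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(share * weight for share, weight in polls) / total_weight


# Hypothetical polls: (Democratic two-party share, assumed quality weight).
polls = [
    (0.52, 1.0),
    (0.56, 0.5),  # assumed lower-quality poll, so it counts for less
    (0.50, 2.0),  # assumed higher-quality poll, so it counts for more
]

simple_mean = sum(share for share, _ in polls) / len(polls)
weighted_mean = weighted_poll_average(polls)

print(f"simple mean:   {simple_mean:.3f}")
print(f"weighted mean: {weighted_mean:.3f}")
```

With these made-up numbers the weighted average sits below the simple mean, because the poll given the most weight reports the lowest share. A different set of assumed weights would move the answer, which is exactly why unknown poll quality is a problem for aggregation.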

Strategic polling is a second threat to the validity of hypothetical matchup polls. Consider a strategic Sanders supporter: It is rational for her to try to boost Sanders’ standing, especially in a still ongoing primary, by saying she would vote for Sanders over Trump, but not Clinton over Trump. Strategic polling could result in polls showing that Sanders is a more viable general election candidate. But the vast majority of Republican voters, 93%, supported Romney in 2012, despite a contentious primary, and the vast majority of Democratic voters, 89%, supported Obama in 2008, despite their contentious primary. This is a strong pattern that we expect to continue in 2016. Thus, with Trump securing his party’s nomination first, it was not surprising that he began narrowing the gap. Clinton’s poll numbers vis-à-vis Trump’s are likely temporarily deflated until she officially secures her party’s nomination or Sanders drops out of the race.

Our experimental poll cuts through the noise and the strategic polling by focusing on the eventual nominees, whoever they may be, which eliminates strategic answers. We control for measurement error by holding question design stable, and address noise by superimposing high-end dynamic statistical models leveraging access to Big Data on all likely voters.

Our data is clear: the trend is in favor of the Democratic nominee. Rather than the big swing we see in the aggregation of traditional polling, we have seen a steady increase in the poll share for the Democratic nominee. When it comes to the national popular vote, a strong predictor of state-level voting, Donald Trump is going to start this presidential campaign with a serious deficit to the eventual Democratic nominee, Hillary Clinton.

Tobias Konitzer is a Ph.D. candidate in communication at Stanford University.

David Rothschild is an economist at Microsoft Research. Find him on Twitter @DavMicRot and at PredictWise.com.

Methods Note: We develop a dynamic statistical model that yields probabilities of sub-demographic groups supporting either party, and is able to parse out noise from substantive movement. We then weight these probabilities based on the proportion of this sub-demographic in the likely voter space. We estimate the likely voter space leveraging Big Data on all registered American voters. The data was collected in 18 waves and includes 18,000 responses total.
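The weighting step in the Methods Note can be illustrated with a short Python sketch. It is a simplified stand-in, not our actual model: the group names, support probabilities and voter-space proportions below are all hypothetical, and the dynamic model that produces the probabilities is not shown.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """One sub-demographic group (all values here are hypothetical)."""
    name: str
    dem_support: float  # modeled probability of supporting the Democratic candidate
    voter_share: float  # proportion of the likely voter space in this group


def weighted_support(cells):
    """Weight each group's support probability by its share of likely voters."""
    total_share = sum(cell.voter_share for cell in cells)
    if total_share <= 0:
        raise ValueError("voter shares must sum to a positive number")
    return sum(cell.dem_support * cell.voter_share for cell in cells) / total_share


cells = [
    Cell("group A", dem_support=0.60, voter_share=0.30),
    Cell("group B", dem_support=0.45, voter_share=0.50),
    Cell("group C", dem_support=0.55, voter_share=0.20),
]

estimate = weighted_support(cells)
print(f"weighted Democratic support: {estimate:.3f}")
```

The point of weighting by the likely voter space rather than by the raw sample is that a group over-represented among respondents does not get to count for more than its share of the electorate.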