Calibrated polling: Post-Mortem


PredictWise Calibrated Polling explained 

Late in the cycle, we introduced a new method we dubbed PredictWise Calibrated Polling. As a brief reminder, this is how Calibrated Polling works. As we have maintained throughout the cycle, estimating who people will vote for, conditional on them turning out, is easier than estimating the turnout composition of the electorate. Our methods separate these two distinct sources of uncertainty, which allows us to tackle each problem on its own. But that still means that estimating the turnout composition is hard. Usually, we rely on a full voter file and attempt to estimate the composition of the electorate with an ensemble of likely-voter, demographic, and financial models for each person on the file, drawing on a host of auxiliary data, such as the Census ACS.
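To illustrate why separating the two problems matters, here is a minimal sketch of the decomposition: the topline is the sum, over groups, of each group's share of the electorate times its vote choice. The function name `democratic_share`, the two groups, and all numbers are hypothetical and chosen only for illustration; they are not PredictWise estimates.

```python
def democratic_share(composition, support):
    """Topline Democratic two-party share: sum of turnout share x support."""
    if abs(sum(composition.values()) - 1.0) > 1e-9:
        raise ValueError("turnout composition must sum to 1")
    return sum(composition[group] * support[group] for group in composition)


# Hypothetical vote choice, conditional on turning out (held fixed).
support = {"group_a": 0.40, "group_b": 0.85}

# Two hypothetical turnout compositions for the same electorate.
lower_b_turnout = {"group_a": 0.80, "group_b": 0.20}
higher_b_turnout = {"group_a": 0.70, "group_b": 0.30}

share_low = democratic_share(lower_b_turnout, support)
share_high = democratic_share(higher_b_turnout, support)
print(f"{share_low:.3f} vs {share_high:.3f}")
```

With vote choice held fixed, shifting ten points of the electorate from one group to the other moves the topline by several points in this toy setup, which is why the composition estimate deserves its own attention.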

Note: all of these are probability models, and simply rolling them up usually gets us close to the actual turnout composition in the aggregate, but it also means we are more prone to error the smaller the geography of interest. For example, at the aggregate level, our estimates of the two-party Democratic vote margin hovered somewhere between 8.0 and 9.5 percentage points during the last two weeks before the election and settled on 8.1 ppt, with the ground truth being 7.1 ppt. So, overall pretty good, and about a percentage point from the ground truth. But things get more complicated in smaller areas, such as congressional districts. Measurement errors in the underlying probability models (for example, the probability that our likely voter is Black) start to matter much more and are less likely to cancel out in the aggregate. This cycle, we explored using early-vote returns in real time to recalibrate the models feeding into our estimated likely voter space. Here is the intuition:

  1. We match early votes back to the voter file and use our massive national generic ballot model, built on more than 3 million responses from 200,000 survey respondents and well over 30 million behavioral data points, to estimate the Democratic proportion among early votes.
  2. We optimize the calibrations of the probability models feeding our turnout space such that, among our survey respondents, we can reproduce the early voting numbers from 1., i.e. the "actual" Democratic-Republican breakdown of all early votes ("actual" being in quotation marks here because it is still modeled). This (a) requires a large enough subset of survey respondents who have in fact voted early, and hence a large enough overall N, and (b) assumes that early voters are representative of election-day voters, which we believe was more true in 2018 than in the past. Briefly, optimizing the calibration of the probability models works like this: if our baseline model assumes a voter in our turnout space is 80% White, 10% Black, 5% Hispanic, and 5% "Other", we move the thresholds around until the "new" probabilistic breakdown is able to reproduce the results from 1. The new result, for example, could be an "adjusted" model that assumes a voter in our turnout space is 82% White, 5% Black, 8% Hispanic, and 5% "Other". In practice, this is a complex multi-dimensional optimization problem.
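To make step 2 more concrete, here is a deliberately simplified, one-dimensional sketch. It is not the actual multi-dimensional optimization described above. It swaps in exponential tilting solved by bisection, which reweights the baseline breakdown as little as possible (in the Kullback-Leibler sense) while hitting a target Democratic share. The function name `tilt_composition`, the per-group support rates, and the 49% early-vote target are all hypothetical; only the 80/10/5/5 baseline comes from the text.

```python
import math


def tilt_composition(baseline, support, target, tol=1e-10):
    """Reweight baseline shares so the implied Democratic share equals target.

    Uses weights proportional to baseline * exp(lam * support) and finds lam
    by bisection. Target must lie between min(support) and max(support).
    """
    def implied(lam):
        raw = [b * math.exp(lam * s) for b, s in zip(baseline, support)]
        total = sum(raw)
        shares = [r / total for r in raw]
        return shares, sum(sh * s for sh, s in zip(shares, support))

    lo, hi = -50.0, 50.0  # the implied share is increasing in lam
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if implied(mid)[1] < target:
            lo = mid
        else:
            hi = mid
    return implied((lo + hi) / 2)


groups = ["White", "Black", "Hispanic", "Other"]
baseline = [0.80, 0.10, 0.05, 0.05]   # baseline breakdown from the text
support = [0.40, 0.90, 0.65, 0.55]    # hypothetical Democratic support rates
target = 0.49                         # hypothetical early-vote Democratic share

adjusted, dem_share = tilt_composition(baseline, support, target)
for group, old, new in zip(groups, baseline, adjusted):
    print(f"{group}: {old:.3f} -> {new:.3f}")
```

Because this sketch has a single free parameter, it can only shift weight toward or away from more Democratic groups; a real calibration over many models and many constraints would need a proper multi-dimensional optimizer.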

Results of three pre-registered polls, published before the election


So, without further ado, how'd we do? We rushed out three polls built on this method on November 5, one day before the election: TX-Sen, NV-Sen, and CA-22. In TX, we had Beto 49%, Cruz 51% (timestamped on November 5 on Twitter, and right here on our blog). The ground truth (two-party vote): Beto 48.7%, Cruz 51.3%!

In NV, we had Jacky Rosen at 51% and Dean Heller at 49% (timestamped on November 5 on Twitter, and right here on our blog). The ground truth (two-party vote): Rosen 52.6%, Heller 47.4%.

Finally, in CA-22, we had Janz 43%, Nunes 57% (timestamped on November 5 on Twitter, and right here on our blog). The ground truth (two-party vote): Janz 44.8%, Nunes 55.2%.
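For a quick summary of the three results, the snippet below computes the absolute error on the Democratic two-party share for each race, using only the numbers quoted above. The variable names are illustrative.

```python
# (predicted, actual) Democratic two-party share in percent, from the text.
polls = {
    "TX-Sen": (49.0, 48.7),  # Beto
    "NV-Sen": (51.0, 52.6),  # Rosen
    "CA-22": (43.0, 44.8),   # Janz
}

# Absolute error in percentage points, rounded to one decimal.
errors = {
    race: round(abs(predicted - actual), 1)
    for race, (predicted, actual) in polls.items()
}
mean_abs_error = sum(errors.values()) / len(errors)

for race, err in errors.items():
    print(f"{race}: {err} ppt")
print(f"Mean absolute error: {mean_abs_error:.1f} ppt")
```

That works out to 0.3 ppt in TX, 1.6 ppt in NV, and 1.8 ppt in CA-22, for a mean absolute error of roughly 1.2 ppt across the three polls.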


We are currently exploring how calibrated polling would have fared had it been applied to our private polling across 40 of the most competitive House races.