Pre-Election Thoughts on Campaign Data Space

We spent the last few cycles doing academic research on politics, pushing the boundaries of both methods (data collection, analytics, and consumption) and domain understanding (mapping public opinion, effect of advertising, and news). Some of this work has had an impact on how actual practitioners create and use data around politics. But we missed something critical with that research: we were not answering the most crucial questions that actual practitioners needed answered (or thought they needed answered). So we spent this cycle learning a lot about what was actually going on inside of campaigns and related institutions, continuing to push boundaries, but with a goal of creating outputs that are more immediately relevant.

A few high-level points:

(1) Pollsters are actually consultants. If you have an image of a campaign having a question about which topic to discuss, or which message to package it in, and a pollster trying to find the answer in a data-driven, unbiased way, you are mainly wrong. It is much more like a "pollster" turned consultant finding some way, bolstered with some carefully pre-selected data, to confirm the campaign's prior. This means that campaigns are pushing the exact same message they wanted to push in the first place, validated with some data. Better methods are not really going to help in that scenario, but different available information, automatically distilled and processed via algorithms, could help break that cycle of confirmation bias.

(2) Even when they are trying their best to answer questions objectively, practitioners are not using the latest tools and frequently make data-free decisions where data could be readily available. Progress is hampered by near monopolies in many aspects of data flow, from data collection to analytics. Further, while the right is well integrated vertically, the left is not: coordination problems further impede progress. This is a huge advantage for the right.

(3) People in the industry probably feel like there is innovation, but it is all at the margins. Practitioners suffer from the same problem as academics: everyone is focused on marginal returns from new methods, which has left them vulnerable to the tyranny of the marginal return. It is not that useful to get 0.5 percentage points more accurate if it costs too much and no one will shift their investment because of it.

(4) On the left, presidential campaigns can break these cycles, but that means we see bursts of real innovation every four or eight years, followed by more of the same as those innovators settle in as monopolists. On the right, money is outside the presidential cycle, leading to a more constant flow of innovation, hampered by their much shallower talent pool.

So what are we doing:

(1) Topics and Messages: We are tracking 200+ (and growing) dimensions on economics, political position, and value-frames (monthly since mid-2017). This would have been impossible before this cycle; it is possible now because we are using the incredibly cost-effective Dynamic MRP we developed last cycle. We hope that this data will demonstrate to people that gaining insights into public opinion does not need to be an ad-hoc adventure: we believe that carefully selected omnibus surveys can replace many of the overlapping ad-hoc surveys in the industry. And surveys do not need to focus on the horse race or other topline questions: cheaper means we can explore the value of new questions, at scale. This data also helps people see the value of breaking away from expensive, slow, and inflexible survey methods. We have a few examples where it could be making an impact:

Example: We have been publishing articles since the middle of 2016 on how popular the provisions of ObamaCare are, despite our polls showing the standard support for ObamaCare itself. Further, we showed that as ObamaCare became more popular, support for its provisions (or components) was not moving: only the joint threat of losing healthcare and media focus on what the ACA is was making it more popular. We know that having this type of data available on dozens of issues will help transform not just what the population thinks about legislation, but what elites think the population thinks about legislation. We were able to let everyone know how unpopular the Republican tax plan would be, before it passed, which proved to be a real surprise to the Republicans. Similarly, we were able to show back in the Republican primary of 2016 how immigration and trade were good topics for candidate Trump: both unpopular with the Republican base and supported by elites on both sides (because immigration and trade are really good for America). We also showed how and when the Republicans went too far on both, and began to lose elite support. Traditional pollsters do not have the trends on these topics, and they are not asking the right questions to move past people just giving a proxy for their party identification and get to how people really feel about the topics.

Example: We were able to demonstrate, in combination with TargetSmart and others, the usefulness of our value-frames (racism, authoritarianism, etc.) and engagement metrics (talking about politics, donating, etc.) in both targeting and predicting engagement. But we want to emphasize something here: all of our work assumes these second-order value-frames are context dependent. They work differently in conjunction with different topics.
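For readers unfamiliar with MRP (multilevel regression and poststratification), the poststratification step can be sketched roughly as follows. This is a minimal illustration of the general technique, not the Dynamic MRP described above: the demographic cells, population counts, and support values are all made up, and in a real application the cell-level estimates would come from a multilevel model fit to survey responses.

```python
# Hypothetical poststratification cells: (age group, education) -> adult population.
population = {
    ("18-44", "college"): 300,
    ("18-44", "no_college"): 500,
    ("45+", "college"): 400,
    ("45+", "no_college"): 800,
}

# Hypothetical model-based estimates of support for a policy in each cell.
# In MRP these would be predictions from a multilevel regression.
cell_support = {
    ("18-44", "college"): 0.70,
    ("18-44", "no_college"): 0.55,
    ("45+", "college"): 0.60,
    ("45+", "no_college"): 0.40,
}


def poststratify(cell_estimates, cell_counts):
    """Population-weighted average of cell-level estimates."""
    total = sum(cell_counts[cell] for cell in cell_estimates)
    weighted = sum(cell_estimates[cell] * cell_counts[cell] for cell in cell_estimates)
    return weighted / total


# Estimate for the full population.
overall = poststratify(cell_support, population)

# Estimate for a subgroup: restrict to the cells that belong to it.
no_college_cells = {c: s for c, s in cell_support.items() if c[1] == "no_college"}
no_college = poststratify(no_college_cells, population)

print(round(overall, 4), round(no_college, 4))
```

Because the same cell-level estimates can be re-weighted to any target population, one omnibus survey can serve many geographies and subgroups, which is the cost argument made above.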

(2) Individual-level Targeting and ROI: With this data we have rethought what it means to target voters at the individual level. Right now targeting is all about the marginal voter, but is that right? The main data provider on the left spends all of its time and energy finding the next person to flip to Democratic or to turn out to vote. That makes sense the day before the election, but what about a month or a year out? We think targeting should be about voters who are not as obvious, but who have issue and value alignment that makes them movable over time. The Republicans have begun targeting Black males, just as they targeted white non-college educated men in 2016. This is about finding voters who may leap-frog traditional marginal voters to switch teams. But we do not focus on aggregate groups; we focus on individuals, each made of a complex web of policy preferences and values. If a candidate is running on healthcare and taxes, then the campaign should not just be targeting people on the margin to "vote Democratic or Republican", but people who strongly support healthcare and tax policies similar to the candidate's.

Example: As a really cool demonstration, we created segmentations, used as the basis of digital and addressable TV persuasive ad buys, in KS-04 and CA-45, delivered within 12 hours to the client. The technology we are developing allows us to score voters flexibly on segmentations that can combine different issues and outcomes, reflecting the content of the actual campaign. As opposed to static segmentations, we are further able to (a) include the most pressing issues of the respective campaigns in the segmentation scores (e.g. taxes, healthcare), and (b) track ROI over time. We think this is part of the reason our segmentations have led to record-level ad engagement metrics [validation to come].
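To make the idea of a flexible, campaign-specific segmentation score concrete, here is a minimal sketch. It is not the production scoring technology described above: the `Voter` class, the issue names, the weights, and the choice to treat a missing issue score as a neutral 0.5 are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Voter:
    voter_id: str
    issue_scores: dict  # issue name -> modeled support in [0, 1] (hypothetical)


def segment_score(voter, issue_weights, neutral=0.5):
    """Weighted average of a voter's issue scores over the campaign's issues.

    Issues with no modeled score for this voter are treated as neutral
    (an assumption of this sketch, not a recommendation).
    """
    total_weight = sum(issue_weights.values())
    if total_weight <= 0:
        raise ValueError("issue weights must sum to a positive number")
    weighted = sum(
        weight * voter.issue_scores.get(issue, neutral)
        for issue, weight in issue_weights.items()
    )
    return weighted / total_weight


# A hypothetical campaign running mostly on healthcare, with taxes second.
campaign_weights = {"healthcare": 2.0, "taxes": 1.0}

voters = [
    Voter("A", {"healthcare": 0.9, "taxes": 0.8}),
    Voter("B", {"healthcare": 0.4, "taxes": 0.6}),
    Voter("C", {"healthcare": 0.7}),  # no taxes score available
]

# Rank voters by alignment with this campaign's issue mix.
ranked = sorted(voters, key=lambda v: segment_score(v, campaign_weights), reverse=True)
ranked_ids = [v.voter_id for v in ranked]
print(ranked_ids)
```

Changing `campaign_weights` re-ranks the same voters for a different campaign without re-collecting any data, which is the contrast with a static segmentation.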

(3) Methods: We moved away from model refinement, which had increasingly narrow returns, towards whole new methods of utilizing the voter files and surveys to project the likely voter space. We do believe transparency in analytics is important: both of us have produced or advised on the leading research in survey methodology (see here for an in-depth white paper describing our standard model). And we are increasingly concerned about (a) the monopoly on data collection and, especially, data analytics, which lies with a handful of shops or, in the latter case, with only one shop in the progressive eco-system, and (b) a related wunderkind mentality that allows these shops to hide behind a certain image of statistical acumen. Going forward, we believe innovations need to address the best ways to mix behavioral and survey data in order to add scale and granularity beyond the possibilities of survey data collection alone.

Example: The bulk of our data collection focuses on attitudinal data, but because we collect data on smartphones via Random Device Engagement, we are also able to leverage vast data points on, say, installed applications and, crucially, precise geo-locations. We are still exploring methods of combining behavioral and attitudinal data, but so far we have already incorporated more than 30 million behavioral data points into our models. In general, we do not have tons of results from this tracking data yet, because its value also lies between cycles, not just within cycles.

Example: We incorporated early voting data and other known outcomes as a way to roll the target population forward each day as people voted. We are calling this new method PredictWise Calibrated Polling.
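The general idea of rolling a target population forward as ballots come in can be sketched as below. This is an illustration only and should not be read as the Calibrated Polling method itself: the cell structure, field names, turnout probabilities, and support values are hypothetical, and the sketch simply treats people who have already voted as certain members of the electorate while everyone else is weighted by a modeled turnout probability.

```python
def roll_forward(cells):
    """Return (expected electorate size, estimated support share).

    Each cell is a dict with hypothetical fields:
      registered     - registered voters in the cell
      already_voted  - ballots already returned (known from early vote data)
      turnout_prob   - modeled turnout probability for those who have not voted
      support        - modeled candidate support in the cell, in [0, 1]
    """
    expected_voters = []
    for cell in cells:
        remaining = cell["registered"] - cell["already_voted"]
        # Known voters count fully; remaining voters count by turnout probability.
        expected_voters.append(cell["already_voted"] + remaining * cell["turnout_prob"])
    total = sum(expected_voters)
    support = sum(n * c["support"] for n, c in zip(expected_voters, cells)) / total
    return total, support


# Day 1: only the first cell has early votes recorded.
day1 = [
    {"registered": 1000, "already_voted": 200, "turnout_prob": 0.5, "support": 0.6},
    {"registered": 1000, "already_voted": 0, "turnout_prob": 0.4, "support": 0.4},
]
total1, share1 = roll_forward(day1)

# Day 2: a surge of early votes in the second cell shifts the target population.
day2 = [
    {"registered": 1000, "already_voted": 200, "turnout_prob": 0.5, "support": 0.6},
    {"registered": 1000, "already_voted": 500, "turnout_prob": 0.4, "support": 0.4},
]
total2, share2 = roll_forward(day2)

print(round(share1, 4), round(share2, 4))
```

In this toy example the estimate moves even though no opinion changed, because the composition of the expected electorate changed once the known early votes were taken into account.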

We are pretty good at horse-race polling as well (see especially our applications of Calibrated Polling in Texas 2018 and Montana 2017), but that is not really where the key innovation is. We are increasingly concerned that too many key players in the progressive eco-system are missing the forest for the trees, as the old adage goes.