By the 2016 presidential campaign, Big Data had almost become an antiquated term. At least since Sasha Issenberg’s romanticized account of Obama’s win in the 2008 election, fueled by war rooms stacked with high-powered machines, the idea of data scientists informing major campaign decisions based on data voodoo generated in windowless caves has been omnipresent in the endless discourse of American campaigns. Then, during the 2016 campaign, a small data analytics shop by the name of Cambridge Analytica took it a step further: As first reported by the Swiss magazine Das Magazin, the company’s CEO, Alexander Nix, was fond of saying that his team had profiled the personality of every adult in the United States of America, 220 million people in all, truly a new milestone in the advent of Big Data. And not just any data, but psychometrics, the new Holy Grail of political analytics. Of course, accounts like this were soon discredited as overstated and inaccurate.
The truth today is that Big Data has been essential in presidential campaigns for years. Most major campaigns subscribe to voter files that deliver turnout history sourced from the states' Departments of State, demographics, and even financial data for most adult Americans. The truth is also that many components of these big datasets are not as informative as they appear from casually skimming data dictionaries. Take education, for example: No US data vendor collects or possesses information on the educational attainment of every adult American. Instead, education in these datasets is “modeled”, i.e. predicted from models trained on auxiliary data sources. In fact, most of the data used in modern campaigns are model scores rather than ‘actual’ data. Of course, some of these variables provide value, but understanding the data-generating process becomes crucial. Another example: Race, while sourced directly from voter files in some states, is mostly modeled. Because clients desire a single estimate, these probabilistic scores are oftentimes collapsed into one racial category per person, and in most cases that category is White. Aggregating such individual-level assignments can lead to serious errors. For example, it is far from clear that the racial composition of the electorate based on such scores, which put the 2012 electorate at 76% white, is accurate. In all likelihood, the psychometric scores Cambridge Analytica developed suffered from similar shortcomings. The most interesting question goes beyond the quality of such data to its usefulness: Even if Cambridge Analytica had conducted personality tests with every single voter, would the results have been predictive of real-world events such that they could help inform a presidential campaign?
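To see why collapsing probabilistic race scores into a single label per person can distort aggregate estimates, consider a minimal sketch. The five voters and their probabilities are invented for illustration, and the function names `hard_shares` and `soft_shares` are our own; this is not how any particular vendor computes its figures.

```python
# Hypothetical modeled race probabilities for five voters (illustrative only).
voters = [
    {"white": 0.55, "black": 0.35, "hispanic": 0.10},
    {"white": 0.60, "black": 0.30, "hispanic": 0.10},
    {"white": 0.51, "black": 0.44, "hispanic": 0.05},
    {"white": 0.20, "black": 0.70, "hispanic": 0.10},
    {"white": 0.52, "black": 0.18, "hispanic": 0.30},
]


def hard_shares(voters):
    """Assign each voter to the single most likely category, then count."""
    counts = {}
    for probs in voters:
        label = max(probs, key=probs.get)  # most probable category
        counts[label] = counts.get(label, 0) + 1
    return {label: n / len(voters) for label, n in counts.items()}


def soft_shares(voters):
    """Average the probabilities directly, keeping the model's uncertainty."""
    totals = {}
    for probs in voters:
        for label, p in probs.items():
            totals[label] = totals.get(label, 0.0) + p
    return {label: t / len(voters) for label, t in totals.items()}


hard = hard_shares(voters)
soft = soft_shares(voters)
print("hard assignment:", hard)
print("averaged probabilities:", soft)
```

In this toy example, hard assignment labels four of the five voters White (80%), while averaging the underlying probabilities puts the White share at roughly 48%. Whenever the most likely category wins by a narrow margin for many people, the single-label approach overstates that category in the aggregate.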
While we cannot assess the Cambridge Analytica scores, PredictWise has undertaken a similar exercise: We have projected scores on 9 psychometric value clusters, including authoritarianism, economic populism, compassion and racial resentment, as well as 10 political issues, onto almost 250,000,000 adult Americans. The scores are powered by survey responses from more than 50,000 Americans, by algorithms we developed for our rather successful prediction of the 2016 presidential election, and by multi-item measures that collapse responses to several survey items into a single score. Starting in April, PredictWise will update scores for more than 8,0000 demographic clusters in every congressional district each month, resulting in more than 1 billion fresh data points. One of our early test cases came in the form of the Alabama Senate election on December 12, 2017. Many counties had become bluer compared to the 2016 presidential election. We wanted to know: Could our racial resentment scores have foreshadowed this movement of the electoral map in Alabama? When we plot racial resentment across Alabama, we find substantial variation: Racial resentment is much less prevalent in more urban counties, such as Mobile or Jefferson County, and widespread in the north-central part of the state.
These dynamics indeed correlate with the change in the electoral map between 2016 and 2017. In counties less plagued by racial resentment, voters switched to Democratic Senate candidate Doug Jones in droves.
In fact, our resentment scores predict movement toward the Democratic camp conditional on partisan composition as well as the age, gender and race distributions in these counties. We estimate that a 1% lower prevalence of racial resentment is associated with a 0.5 percentage point gain for Doug Jones vis-à-vis Hillary Clinton in these counties.
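“Conditional on” here means a regression with controls: the coefficient on resentment is read holding the other county characteristics fixed. The sketch below illustrates that reading with ordinary least squares on synthetic, noise-free data. The six county rows, the single control (2016 Democratic share) and the helper `ols` are all hypothetical, and the outcome is constructed so that the resentment coefficient is -0.5 by design; it is not a re-estimate of the Alabama result or a description of the PredictWise model.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Synthetic counties: (resentment score in %, 2016 Democratic share in %).
counties = [(35, 52), (45, 30), (55, 25), (60, 38), (70, 20), (50, 45)]

# Synthetic outcome, built without noise so the true coefficients are known:
# shift toward the Democrat = 2.0 - 0.5 * resentment + 0.1 * dem_share
shift = [2.0 - 0.5 * r + 0.1 * d for r, d in counties]

# Design matrix: intercept, resentment, and the control variable.
X = [[1.0, r, d] for r, d in counties]
intercept, b_resentment, b_dem_share = ols(X, shift)

print(f"resentment coefficient: {b_resentment:.3f}")
```

A coefficient of -0.5 says that, comparing two counties with the same value of the control, the one with resentment one unit lower is expected to show a Democratic shift half a percentage point larger. With real data the estimate would carry sampling error, and the full set of controls named above would enter the design matrix as additional columns.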
Yes, Big Data is not the Holy Grail of politics. But the challenge will be to identify when psychometric or other models can be useful, and so far the PredictWise racial resentment scores do fall into that “useful” category.