I launched a new website, Microsoft Prediction Lab, with a few friends, including Miro Dudik and David Pennock. The website consolidates research into both non-representative polling and prediction games. I have spent years studying how various kinds of raw data (polling, prediction markets, and social media and other online data) can be transformed into indicators of present interest and sentiment, as well as predictions, for varying populations, and then how decision makers can allocate resources using the low-latency, quantifiable market intelligence that we produce. Microsoft Prediction Lab allows us to innovate continuously not only on the path from raw data to analytics to consumption, but also on the collection of the data itself.

Microsoft Prediction Lab serves two symbiotic purposes: for it to be a successful laboratory, it must also be a successful product, and vice versa. The project is designed to promote engagement and showcase the bleeding-edge work of Microsoft Research (and other collaborators). Further, the research is making an impact on how people create predictions in the multi-billion-dollar election industry, and we expect that impact to spread into other domains soon.

Markets: Markets have been an efficient method of aggregating data for millennia, and prediction markets have been forecasting elections for over a century, but there is room for improvement. Here are a few of the innovations we are exploring in Microsoft Prediction Lab. First, we are examining how well markets can work without currency by using incentives like teams and leaderboards. Second, we are examining how we can lower the barriers to entry into markets by building more intuitive interfaces and wording the questions efficiently, depending on the user’s knowledge of markets and expectations. Third, we are matching the right questions to the right people to ensure that the flow of information from the users to the market is maximized. Fourth, once the data is collected, we are using fully combinatorial market makers. Individual probabilities are interesting, but combinatorial and conditional probabilities pose a meaningful and interesting challenge.
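
To make the idea of combinatorial and conditional probabilities concrete, here is a minimal sketch of a market maker that prices every joint outcome of a handful of yes/no events. It assumes a logarithmic market scoring rule (LMSR) cost function, which is one common choice and not necessarily what Prediction Lab runs; the class name `JointLMSR`, the liquidity parameter `b`, and the two example events are all hypothetical.

```python
import itertools
import math


class JointLMSR:
    """Toy market maker over every joint outcome of n binary events.

    Assumption: an LMSR cost function. Enumerating all 2**n outcomes only
    works for small n; it is meant to illustrate the idea, not to scale.
    """

    def __init__(self, n_events, b=100.0):
        self.b = b  # liquidity parameter (hypothetical default)
        self.outcomes = list(itertools.product([False, True], repeat=n_events))
        self.shares = {o: 0.0 for o in self.outcomes}

    def _weights(self):
        # Shift by the max share count for numerical stability.
        m = max(self.shares.values())
        return m, {o: math.exp((q - m) / self.b) for o, q in self.shares.items()}

    def cost(self):
        # C(q) = b * log(sum_i exp(q_i / b)), computed stably.
        m, w = self._weights()
        return m + self.b * math.log(sum(w.values()))

    def prob(self, event):
        # Market probability that the predicate `event` holds.
        _, w = self._weights()
        return sum(v for o, v in w.items() if event(o)) / sum(w.values())

    def conditional(self, event, given):
        # P(event | given) = P(event and given) / P(given).
        return self.prob(lambda o: event(o) and given(o)) / self.prob(given)

    def buy(self, event, amount):
        # Buy `amount` shares on every joint outcome where `event` holds;
        # the trader pays the change in the cost function.
        before = self.cost()
        for o in self.outcomes:
            if event(o):
                self.shares[o] += amount
        return self.cost() - before


# Two hypothetical events: candidate wins state A, candidate wins state B.
market = JointLMSR(n_events=2)
wins_a = lambda o: o[0]
wins_b = lambda o: o[1]

paid = market.buy(lambda o: o[0] and o[1], 50)  # bet on winning both
print(round(paid, 2))
print(round(market.prob(wins_a), 3))
print(round(market.conditional(wins_a, wins_b), 3))
```

Because the market tracks joint outcomes, a single bet on "wins both" moves the individual probabilities and also makes each event more likely conditional on the other, which a set of independent single-question markets could not express.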

Polls: The only acceptable form of polling in the multi-billion-dollar survey research field utilizes representative “probability” samples; my colleagues and I argue that, with proper statistical adjustment, non-representative polling data can translate into accurate predictions, often in a much more timely and cost-effective fashion. We demonstrated this by applying multilevel regression and post-stratification (MRP) to a 2012 election survey on the Xbox gaming platform. This was an incredibly non-representative sample. Yet not only did the adjusted top-line projections from this data closely track standard indicators, but we also used the data’s unusual size and panel structure to answer a meaningful political puzzle. We found that reported swings in public opinion polls are generally not due to actual shifts in vote intention, but rather are the result of temporary periods of relatively low response rates among supporters of the reportedly slumping candidate. We raise the possibility that decades of large, reported swings in public opinion, including the perennial “convention bounce,” are mostly artifacts of sampling bias. More broadly, the work on the Xbox, and subsequent studies with Sharad Goel, show great promise for using non-representative polling data to measure public opinion and address broader social science questions at a lower cost, with more speed and flexibility.
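
The post-stratification half of that adjustment can be sketched in a few lines: estimate support within each demographic cell, then reweight the cells by their share of the target population instead of their share of the sample. The sketch below is a simplification with made-up numbers. It uses raw cell averages where MRP would use estimates from a fitted multilevel regression, and the two age cells, the sample counts, and the population counts are all hypothetical.

```python
def raw_estimate(sample):
    """Unadjusted share of supporters among all respondents."""
    respondents = sum(n for n, _ in sample.values())
    supporters = sum(s for _, s in sample.values())
    return supporters / respondents


def poststratify(sample, population):
    """Weight each cell's support rate by that cell's population size.

    `sample` maps cell -> (respondents, supporters);
    `population` maps cell -> number of people in the target population.
    Simplification: cell rates are raw averages. In MRP they would come
    from a multilevel model, which stabilizes sparsely sampled cells.
    """
    total = sum(population[cell] for cell in sample)
    weighted = sum(
        (supporters / respondents) * population[cell]
        for cell, (respondents, supporters) in sample.items()
    )
    return weighted / total


# Hypothetical sample that heavily over-represents younger respondents.
sample = {
    "under_45": (800, 480),  # 60% support
    "45_plus": (200, 80),    # 40% support
}
# Hypothetical target population, where older voters are the majority.
population = {"under_45": 300, "45_plus": 700}

print(round(raw_estimate(sample), 2))
print(round(poststratify(sample, population), 2))
```

In this toy example the raw sample overstates support because it is dominated by the more supportive cell; reweighting to the population's makeup pulls the estimate back down.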

Visit the new site at: Prediction.Microsoft.com.