Someone forwarded me this article by Ray Hennessy: The Problem With Polling, Surveys and Opinion Is That People Fib. The piece is so (1) riddled with factual errors and (2) stuffed with over-the-top “business lessons” that I thought it was a joke. My friend convinced me that the author was serious, so I am going to use it as a foil to defend the polling in the 2016 primary, while cautioning against how that polling was interpreted.

The article starts by claiming that “polls are a lousy way to get a true sense of someone’s opinion.” Why? Because “every political pundit, analyst, and media heavyweight … said Trump’s entire candidacy was a joke.” Mr. Hennessy goes on to say that “poll after poll showed lackluster support for Trump.” Then he argues that the reason the polls were so bad is that “no one wanted to admit” they were going to vote for Trump. Every one of these claims is factually wrong.

Poll after poll showed rising and strong support for Trump. As Natalie Jackson noted in the Huffington Post, the polls were right: they started with Trump in the lead, and he basically just went up from there. And, as Andre Tartar noted at Bloomberg, the raw polls from Real Clear Politics did pretty well in the state-by-state predictions. It is certainly possible that some people lied to live pollsters, which would explain Trump’s somewhat higher numbers in internet-based polls, but this was a small effect. That pundits, analysts, and media heavyweights, people who generally do not care about data, were wrong is hardly an indictment of the polls. But some people who did use the polls were wrong too, in that they gave low probabilities of Trump winning the primary.

The data-based pundits, analysts, and media heavyweights knew that historically polling served as a useful guide to current sentiment, but they also knew that historically the establishment candidate always won. So it made sense for them to note that the polls had Trump way in the lead, but that he was likely to lose. Further, trying to make their models more robust, data journalists like Nate Silver added in endorsements (which Trump lacked), a factor that historically correlated with victory; again, that relationship did not hold in 2016. Let me be very clear: this is not the fault of the data, but of the models run on the data! The models that translated the raw data (polls and endorsements) into forecasts were under-identified: there have been only a handful of contested primaries in the modern nominating era, so any model mapping polls and endorsements to a nominee is fit on a tiny sample. The pundits should not have run models on the polling data with so few outcomes. This is not about what the pundits wanted to happen; it is about attaching too much precision to under-identified models.
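To make the under-identification point concrete, here is a minimal sketch, a toy simulation of my own and not any forecaster’s actual model. It fits a two-coefficient “polls plus endorsements” logistic model by grid search, first on ten simulated past races and then on five hundred, and forecasts a Trump-like candidate (big polling lead, few endorsements). Every number in it is an assumption chosen purely for illustration.

```python
# Toy illustration of under-identification: with only a handful of past
# outcomes, the fitted win probabilities swing wildly depending on which
# few races happen to be in the sample. Not any forecaster's real model.
import numpy as np

rng = np.random.default_rng(0)

def fit_and_forecast(n_history):
    # Hypothetical training data: polling lead (points) and endorsement
    # count for n_history past nomination races, with a binary outcome.
    poll_lead = rng.normal(5, 10, n_history)
    endorsements = rng.poisson(20, n_history)
    # Assumed data-generating process, for illustration only.
    logit = 0.15 * poll_lead + 0.05 * (endorsements - 20)
    won = rng.random(n_history) < 1 / (1 + np.exp(-logit))

    # Crude maximum-likelihood fit by grid search over two coefficients.
    best, best_ll = (0.0, 0.0), -np.inf
    for b1 in np.linspace(-1, 1, 41):
        for b2 in np.linspace(-1, 1, 41):
            z = b1 * poll_lead + b2 * (endorsements - 20)
            p = np.clip(1 / (1 + np.exp(-z)), 1e-9, 1 - 1e-9)
            ll = np.sum(np.where(won, np.log(p), np.log(1 - p)))
            if ll > best_ll:
                best, best_ll = (b1, b2), ll

    # Forecast a Trump-like candidate: 20-point lead, 2 endorsements.
    b1, b2 = best
    z = b1 * 20 + b2 * (2 - 20)
    return 1 / (1 + np.exp(-z))

# Refit 20 times on fresh samples of each size and report the range of
# forecasts: sparse outcomes give an under-identified, unstable model.
for n in (10, 500):
    forecasts = [fit_and_forecast(n) for _ in range(20)]
    print(n, round(min(forecasts), 2), round(max(forecasts), 2))
```

The expected pattern is that the ten-race fits send the forecast swinging across most of the unit interval from refit to refit, while the five-hundred-race fits settle down. That spread, not any flaw in the polls themselves, is what under-identification looks like.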

The business lesson we should take from this is that market intelligence can be hard to parse when outcomes are sparse. This is why business leaders should actually understand the underlying data, not just the topline results. Sometimes it is impossible to efficiently translate raw polling data into polished market intelligence. The millions of dollars spent on polling did an amazingly good job of capturing the sentiment of the target population; the models did a bad job of translating that sentiment into a probability of victory.