Throughout 2015, attention to the polling positions of presidential candidates reached unprecedented heights. As poll numbers increasingly defined a candidate’s strength and dictated access to the national platform of the debate stage, discussion of what these horse-race numbers mean, and especially how predictive they are, grew as well.
Data journalists and political scientists rightly warned against placing too much stock in primary polling, which has a long history of instability as presidential races unfold. An entire Twitter account was dedicated to showing where primary leaders from recent elections stood on a given date, and for most of 2015 it revealed how little predictive power primary polling held. On any given day, even late in the year, you might have stumbled on leads by Newt Gingrich in 2012, Rudy Giuliani in 2008, Hillary Clinton in 2008, and Wesley Clark or Howard Dean in 2004.
The point is that primary polling is notoriously variable, which makes the obsession with whoever leads the polls on a given day misguided. A more expansive analysis of prior decades of elections would very likely bear this out as well.
At the same time, this general dismissal of primary polls came with a caveat: the closer one gets to election day, the more accurate the polls become.
Though it might be early for any retrospection, considering the Democratic and Republican primary elections have only just begun, it’s still worth looking back at how predictive the polls were now that the results of the Iowa caucus and New Hampshire primary are in hand.
Below I plot, for the GOP side only, the correlation coefficient between candidates’ monthly polling averages and their eventual share of the vote in each state’s election, for every month going back to the start of 2015. This gives an indication of the predictive power of polls at each stage of the election cycle.
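For readers who want to reproduce this kind of calculation, here is a minimal sketch of how a month-by-month correlation could be computed. It assumes a Pearson correlation across candidates, which may or may not match the exact coefficient used for the plot. The candidate labels, month labels, and every number below are hypothetical placeholders, not real polling data, and the function names are my own.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL numbers for illustration only (percent of the vote).
final_vote = {"A": 35.0, "B": 16.0, "C": 12.0, "D": 11.0, "E": 2.0}

# HYPOTHETICAL monthly polling averages per candidate (percent).
monthly_poll_avg = {
    "early": {"A": 8.0, "B": 14.0, "C": 3.0, "D": 15.0, "E": 9.0},
    "late": {"A": 31.0, "B": 13.0, "C": 12.0, "D": 10.0, "E": 3.0},
}


def correlation_by_month(monthly_poll_avg, final_vote):
    """For each month, correlate candidates' poll averages with their final vote."""
    result = {}
    for month, polls in monthly_poll_avg.items():
        # Only compare candidates that appear in both the polls and the results.
        candidates = sorted(set(polls) & set(final_vote))
        xs = [polls[c] for c in candidates]
        ys = [final_vote[c] for c in candidates]
        result[month] = pearson(xs, ys)
    return result


by_month = correlation_by_month(monthly_poll_avg, final_vote)
for month, r in by_month.items():
    print(f"{month}: r = {r:.2f}")
```

Each month yields one coefficient, computed across candidates, so a plot of these values over time traces how well that month’s polls lined up with the final result.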
As can be clearly seen, the predictiveness of polling depends on proximity to the election: polls in both states mirror the final vote more closely the nearer they are to their respective election dates. Iowa caucus surveys demonstrate this particularly well. While all over the place for the first five months of 2015, polls in the state gradually rise in predictive power before reaching their high-water mark in the final eight weeks before the caucus.
Perhaps the most striking result here is the stability of New Hampshire’s vote as expressed in public opinion polling: beginning as early as July of last year, polls attain a strong correlation with the final outcome and hold it in every month thereafter. This may have been an abnormally predictive year for polls in New Hampshire, but even so, the principle about primary polls, specifically those in the early-voting states during the year before the elections, still stands: the closer you get, the more predictive you’ll find your polls.
2/14/16 Note: Compared with the run-up to past primaries, this cycle will likely show a much higher correlation earlier in the campaign. This almost certainly results from the unusual context of the GOP primary, in which at one point 17 candidates were in the race. Many of those at the low end of public support consistently remained there, so their polling closely mirrored their election-day outcomes. This group (think Jim Gilmore, Mike Huckabee, Rick Santorum, George Pataki, etc.) makes up a large share of the field, and consequently produces a much stronger monthly correlation with eventual results than would appear if only the highest-polling candidates were examined. Nevertheless, especially in New Hampshire, with Trump and the candidates in the establishment lane, the stability described above still properly reflects polling predictiveness.
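The field-size effect described in this note can be illustrated with a small sketch. It assumes a Pearson correlation and uses entirely hypothetical (poll average, final vote) pairs, not real candidates or real numbers; the names `contenders` and `long_shots` are mine. The idea is that a cluster of candidates who poll near zero and finish near zero pulls the coefficient up, even when polls say little about the ordering among the leaders.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def corr(pairs):
    """Correlation between the poll and vote columns of (poll, vote) pairs."""
    polls, votes = zip(*pairs)
    return pearson(polls, votes)


# HYPOTHETICAL (poll average %, final vote %) pairs, for illustration only.
# Among the leaders, polls and results are only weakly related.
contenders = [(30.0, 18.0), (12.0, 25.0), (18.0, 11.0), (9.0, 14.0), (14.0, 10.0)]

# Low-polling candidates who stay low on election day.
long_shots = [(1.0, 0.5), (0.5, 0.2), (1.5, 1.0), (0.8, 0.3), (1.2, 0.6), (0.4, 0.1)]

r_contenders = corr(contenders)
r_full_field = corr(contenders + long_shots)

print(f"contenders only: r = {r_contenders:.2f}")
print(f"full field:      r = {r_full_field:.2f}")
```

With these made-up numbers the full-field coefficient comes out well above the contenders-only one, which is the pattern the note attributes to the crowded 2016 GOP field.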