In versions of the Cooperative Congressional Election Study before 2016, vote validated turnout was consistently higher than actual turnout across states. Grimmer et al. 2017, for example, show this phenomenon here in Figure 1. Matching CCES respondents to individual state voter files to verify whether they voted using governmental records gives a more accurate picture of voter turnout, but the CCES–as with nearly all other surveys–still suffers from a bias where those who take the survey are more likely to have voted than those who did not take it, all else equal.
However, this trend took a weird turn with the 2016 CCES. Unlike the typical overrepresentation of individuals who voted in the CCES, the 2016 version seems to have an underrepresentation of voters. The below graph shows this at the state level, plotting actual voter eligible population (VEP) turnout on the x-axis against CCES vote validated turnout on the y-axis. The closer that the points (states) fall on the 45-degree line, the closer CCES vote validated turnout approximates actual turnout at the state level.
The line of best fit in red clearly does not follow the 45-degree line, indicating that CCES vote validated turnout estimates are very far off from the truth. For comparison, I did a similar plot but for vote share–state level Democratic two-party vote share in the CCES vs. actual two-party vote share:
This result should suggest that it’s not that state level estimates of political outcomes from the CCES are wholly unreliable. Rather, the problem is more specific to state level turnout in the CCES, which Grimmer et al. 2017 stress. That still doesn’t address the switch from average overrepresentation to underrepresentation of voters from 2012 to 2016 in the CCES. In particular, regarding the first graph above, a set of seven states–at around 60-70 percent actual turnout but at around 25 percent CCES turnout–were very inaccurate. I plot the same relationship but change the points on the graph to state initials to clarify which states make up this group:
CCES turnout estimates in seven Northeastern states–Connecticut, Maine, Massachusetts, New Jersey, New Hampshire, Rhode Island, and Vermont–severely underestimated actual turnout. The below table gives the specific numbers on estimated turnout from the CCES, actual turnout, and deviation of CCES turnout from actual turnout (“error”) across these seven states:
On average, CCES turnout in these states underestimated actual turnout by 38.1 percentage points. It is very unlikely that the CCES just happened to sample many more non-voters in these seven states, which marks one explanation for this peculiar result. Another more likely explanation concerns problems with matching CCES survey respondents to the voter file, as Shiro Kuriwaki suggested to me. This turns out to be the likely source for the egregious error. Catalist, a company that manages a voter file database and which matched respondents from the CCES survey to the voter file, had very low match rates for respondents from Connecticut (40.7 percent match rate), Maine (35.6), Massachusetts (32.2), New Jersey (32.1), New Hampshire (38.2), Rhode Island (37.2), and Vermont (33 percent). The below graph illustrates how this affects turnout estimates:
Catalist match rate (the percentage of survey respondents that were matched to the voter file) is plotted on the x-axis, and the difference in CCES turnout and actual turnout (i.e. error) is plotted on the y-axis. These two variables are very closely linked, and for an obvious reason: the CCES treats respondents that are not matched to the voter file as non-voters. Inaccuracies with turnout estimates in fact reflect inaccuracies with voter file match rate. This weird pattern in 2016 is not about overrepresentation of non-voters in the seven specific states but rather about errors in properly carrying out the matching process in those states. The under-matching issue has received attention from CCES organizers and it appears it will be corrected soon:
What’s still strange is that even after ignoring those error-plagued seven states, you don’t observe the usual overrpresentation in the remaining states without a clear matching problem. Many are close to the 45-degree line (that indicates accurate survey turnout estimates) and fall on either side of the line, with more still under the line–suggesting that in several states, the CCES sampled more non-voters than it should have. The estimates remain close to actual turnout, but I still think this is unusual compared to the known consistent overrepresentation of voters in past CCES surveys (again, see Figure 1 here). Perhaps lower-than-usual voter file match rate–while not to the same degree as in the seven Northeastern states–also contributed to a lower than expected CCES vote validated turnout across many other states. However, it could also be that voter/non-voter CCES nonresponse bias occurred to a smaller degree (and even flipped in direction for some states) in 2016.
It looks like this issue in the CCES has been fixed and the corrected dataset has been posted to Dataverse.
I re-did the main part of the analysis above with the updated CCES vote validation data. As the below figure plotting actual turnout against CCES turnout shows, considerable less error results. I calculate “error” as CCES turnout rate minus actual VEP turnout rate. The average error is +0.57 points, ranging from -10.8 (the CCES underestimating turnout) to +10.8 (overestimate), and the half of all states have lie between an error of -3.95 and +5.38.