The Decline of Competitive Precincts from 2008 to 2016 (Decision Desk)

Using recently released data on presidential vote totals at the precinct level from Ryne Rohla, I analyzed the trend toward less competitive precinct landscapes over the last three elections. You can find this analysis in a blog post over at Decision Desk HQ.


Survey Nonresponse Bias, Declining Social Trust, and 2016 Polling Error

Response Rate Decline and Resulting Bias

The decline of response rates is one of the most serious methodological and even existential problems that survey research faces today. Interview yield rate–the percent of people who complete a survey out of all those sampled to take it–has been in a steady decline in the last few decades. Pew reports a precipitous response rate fall from 36 percent in 1997 to nine percent in 2012.

This introduces the potential for nonresponse bias, a phenomenon wherein people who respond to surveys, and the answers they give, may be systematically different from those who do not respond and the answers they would have given (which go unrecorded). While this bias cannot be directly calculated (because you do not have responses from the non-responding population), comparing low and high response rate surveys–and their demographic and response compositions–moves us closer to an answer. The largest well-established bias is that people who have disproportionately high levels of civic engagement select into taking surveys. In other words, on measures of volunteerism, community involvement, contacting public officials, and other political activity, surveys tend to display a much more engaged population than exists in reality.

As response rates decline even further, the assumption needed for unbiased results–that the population that responds is indistinguishable from the one that doesn’t–becomes all the more tenuous. More importantly, the bias could start to expand to characteristics beyond just those that relate to civic engagement. Pew most recently did not find significant nonresponse biases on metrics of political orientation like partisanship. With nonresponse bias widely considered one possible source of state polling error in 2016, it’s not far-fetched to believe that the previous lack of evidence of bias along political beliefs could be changing in the context of this past election cycle, or at least to consider that possibility.

The Role of Social Trust in Survey-Taking

One factor that could help explain this declining willingness to take surveys and the accompanying potential for unreliable survey results involves changes in social trust. Clare Malone at FiveThirtyEight highlighted the overall declining trust in American social institutions as recorded by Gallup. Since the early- to mid-2000s, Americans have reported lower levels of confidence in media, banks, Congress, and other democratic institutions. These trends are particularly significant considering that the winning campaign of the 2016 presidential election played to and capitalized on much of this distrust and disillusionment with the social system.

Given that response rates had been sinking before trust in institutions truly began to erode, the relationship–on a qualitative level–is not overwhelmingly strong. However, there could still be a mechanism at play here, and one that possibly has grown over time: the effect of social trust–and social capital more broadly–on survey response rates. Some existing evidence that speaks to this question comes from the Democratic firm Civis Analytics, which had been sounding the alarm about survey response rates since well before last year’s Election Day. After the election, Civis made the clear connection between the “Bowling Alone” voter–a term based on Robert Putnam’s book on declining social capital in the U.S.–and pre-election polling error.

While uncertain whether nonresponse bias or coverage bias (the inability of surveys to reach certain groups) was more at work, Matt Lackey of Civis stressed that polls were failing to capture the opinion of a certain segment of voters in one way or another. This segment is one Putnam describes as composed of more blue-collar whites, who might have faced economic difficulties (related to industrial changes and globalization) and been uprooted from their homes. Most importantly, this group showed declining levels of social trust and community ties, all of which fall under social capital traits. Certain qualities would suggest this group was more disposed to voting for Donald Trump. Given their absence from polling, this would obviously introduce error in pre-election polls. As David Martin, also of Civis, points out, survey-taking willingness–in a context where compensation is often not provided for taking a poll–depends largely on an individual’s sense of “civic duty” and “social obligation.” When that sense is missing or weak, a clear mechanism for survey-taking refusal begins to form.

Martin and a co-author, Benjamin Newman, also examine the association between social capital and an activity closely related to survey-taking: Census participation. The two found that Census response rates were strong positive predictors of different measures of social capital, such as trust in and interaction with one’s neighbors. The supposed “cause-and-effect” is the inverse of what I suggested before, but the strength of the relationship here is what matters and lends support for this conceptualization of what drives survey-taking. (Moreover, if any causal link existed, it would not be greater Census response rates causing social trust to increase, but rather the other way around.) Older studies on this issue also frame survey participation–in censuses or not–as a type of “community involvement” and “civic obligation.” Changes in this type of social capital would thus clearly have implications for survey response rates, especially in the context of the noted decline in these rates.

Social Trust Levels Over Time

It’s difficult to directly test this idea; instead, I turned to the General Social Survey to at the very least document this development. Specifically, I wanted to check responses to the question about whether most people can be trusted, which the GSS has asked in several years from 1972 to 2014. One key caveat should be kept in mind. Using survey data to inform this debate presents a problem, as this data gleans information from people who still respond to a survey, ignoring the main object of interest here: people who refuse to take surveys and their levels of social trust. However, I consider the data here to still be informative of the overall trend, and if anything it captures a conservative estimate. If levels of trust from those who refuse to take surveys were somehow included in this analysis, the pattern of declining social trust–as well as overall levels of distrust–would only be greater. With that in mind, here are the rates at which all GSS respondents say most people can be trusted (in green) and cannot be trusted (in orange) over time (with another response option that stays about constant and small, “Depends,” not included):


From 1972 to about 1990, trust declines and distrust increases a little bit, but overall, responses remain fairly stable. Over the last couple of decades, however, social trust drops dramatically. From 1972 to 2014, the percent saying they cannot trust most other people increases from 50.0 to 64.7 (+14.7 percentage points) and the percent saying they can trust others falls from 46.3 to 30.3 (-16 points). This is consistent with previous claims of a serious decline in this one key aspect of social capital.

Social Trust Decline By Political Orientation

Next, I break up those same results by party identification, plotting rates of social trust and distrust among the three main partisanship groups: Democrats (with Democratic leaners), Independents, and Republicans (with Republican leaners):


By far, the largest decline in social trust occurs among unaffiliated self-identifying Independents. The percentage of Independents saying they “cannot trust” most other people rises from 48.2 percent in 1972 to 70.2 percent in 2014, a 22 percentage point increase. The rate of social trust in others sank as low as 18.3 percent among Independents in 2010, starting from a high of 46.5 percent during this time frame.

Importantly, there are some signs pointing to greater declines in trust among Republicans than among Democrats during these last four decades. Although Republicans still express higher levels of trust at most points, they undergo greater change. While the percentage of Democrats saying they cannot trust most other people increases by 11.2 percentage points from 1972 to 2014, Republicans grow distrustful at a much faster rate, showing a 17.7-point jump in distrust. Similarly, while the share of Democrats saying they can trust most other people falls by 12.1 points over this time span, the share of Republicans saying the same falls by 19.3 points. Thus, social trust has declined at a higher rate among Republicans than among Democrats over time.

Assuming social trust impacts survey response rates, this difference in trends could prove very consequential. Nonresponse rates matter less when they’re evenly distributed across different values of a variable that might be tied to an outcome you’re interested in–like vote choice. This latter assumption is especially important for something like political orientation (e.g. partisanship or ideology) which qualifies as more of a latent variable and thus a measure that researchers cannot reliably weight on to effectively root out bias. When response rates–perhaps driven by trust–change more based on different values of a variable–a result hard to adjust for–biased survey results become all the more possible. Accordingly, Lackey observed the following dynamic during the 2016 campaign:

  • “What we found this year is there is a difference between those who took surveys and those who didn’t. People who took these surveys were more supportive of Hillary Clinton and Democrats.”

This sounds an awful lot like the important concept of partisan differential nonresponse, which was often emphasized during the 2016 election cycle and highlighted the different likelihoods of Democrats and Republicans to respond to polls. For the most part, it’s the same idea, but I think it’s worth noting one distinction. The type of differential nonresponse supposedly driven by decades-long shifts in social trust levels seems more persistent and longstanding, having less to do with fluctuations in a particular election season. I would consider this different from the kind of selection in and out of polls depending on events during the campaign that Doug Rivers and others describe.
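To make the mechanism concrete, here is a minimal sketch of how unequal response rates alone can shift a poll's topline. All numbers are hypothetical illustrations (an evenly split electorate and made-up response rates), not estimates from Civis or any 2016 poll, and the function name `observed_share` is my own.

```python
def observed_share(true_share_a, response_rate_a, response_rate_b):
    """Share of respondents backing candidate A when supporters of A and B
    answer the poll at different rates (two-candidate simplification)."""
    respondents_a = true_share_a * response_rate_a
    respondents_b = (1 - true_share_a) * response_rate_b
    return respondents_a / (respondents_a + respondents_b)

# Hypothetical electorate split evenly between candidates A and B.
equal_rates = observed_share(0.50, 0.09, 0.09)   # both groups respond at 9%
skewed_rates = observed_share(0.50, 0.10, 0.08)  # A's supporters respond more

print(f"Equal response rates:   A at {equal_rates:.1%}")
print(f"Unequal response rates: A at {skewed_rates:.1%}")
```

In this toy setup, a two-point gap in response rates turns a tied race into a poll showing A ahead by roughly 11 points, even though nobody changed their vote.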

Graphing these social trust rates but broken up by three self-reported ideological groups reflects similar patterns (in terms of implications on a right-left wing spectrum), as shown below. While the percentage of liberals from 1975 to 2014 saying most people cannot be trusted grew by 3.8 points, the same rate of distrust among conservatives increased by 10.8 points. This is hardly unexpected given how correlated partisanship and ideology have become and given what the previous plot displayed. Still, it goes to show the potential impact varying trends in social trust could have on producing non-response biases that involve latent political orientations and thus biases that are not easily correctable.


The Case of Survey-Taking Among Republicans and Conservatives

The above two graphs and findings quantify the greater increase in distrust among Republicans and conservatives. Taken in conjunction with research relating social capital levels to survey response rates, it’s very reasonable to consider distrust as a mechanism for declining response rates and growing differential (partisan) nonresponse in particular. The influence of rising social distrust–of people generally but also of social institutions–among those on the right wing is not unprecedented. Concerns over the ills of “big government,” intrusion into daily life, and violation of individual privacy exist more commonly among conservatives. While a bit speculative, this notion is generally pretty accurate. In this sense, contacts about taking surveys–which of course necessitate revelation of personal information–could be received more poorly, and as acts of intrusion, by conservatives. Perhaps distrust of others in this way has made Americans on the right less likely to respond to pollsters soliciting survey responses.

Along similar lines, when I first started to consider non-response bias as a source for 2016 polling error back in November, I recalled the efforts to cut administration of the American Community Survey that came from Republican lawmakers. In 2012, the Republican-led House voted on eliminating the ACS. Here are some descriptions of the motivations behind these efforts (emphasis mine):

  • “This is a program that intrudes on people’s lives, just like the Environmental Protection Agency or the bank regulators,” said Daniel Webster, a first-term Republican congressman from Florida who sponsored the relevant legislation.
  • Mr. Webster says that businesses should instead be thanking House Republicans for reducing the government’s reach. “What really promotes business in this country is liberty,” he said, “not demand for information.”

Even within the last few years, Republican lawmakers continue to be the ones pushing to curb survey research, specifically viewing and criticizing the ACS as an invasion of privacy. A 2015 FiveThirtyEight piece by Ben Casselman on similar issues in Canada touched on the efforts by Texas Congressman Ted Poe in the U.S. Poe has frequently introduced a bill in Congress to make the mandatory ACS a voluntary survey, a change that could seriously damage the quality of data which many rely on for important decision-making–an experience that Canada had to suffer with one of its major household surveys. The language Poe uses in support of the bill reinforces the comments made by Webster above, speaking to the role of right-of-center ideology in refusing to take surveys. Again, the points of emphasis from Poe are the “government-mandated” nature of the ACS, that “the government will come after you” if the ACS is not taken, and that the ACS constitutes “another example of unnecessary and completely unwarranted government intrusion.”

Especially in recent years, the ACS has become increasingly tied into the conservative ideological framework’s negative perception of big government, with personal questions seen as intrusions into daily life and as governmental overreach. Evidence of this directly involves only the ACS, which makes sense given it is conducted by a governmental organization. However, I would consider it very likely that the same linkage drawn by conservatives extends to political surveys more broadly, such as pre-election polls; in other words, all types of surveys come to represent intrusive violations of privacy at odds with key conservative ideological tenets. Notably, the remarks from conservatives that I’ve noted here come from elites (congresspeople) and not from the masses in the form of opinion polling, for example. But given the well-established tendency of the public to follow cues from elites concerning an issue and to adopt positions from elites of the same partisan stripes, it seems likely that the public–specifically the Republican/conservative rank-and-file on the receiving end of these cues–is also coming to view the ACS and all surveys in a similar light. In this (speculative) sense, right-wingers could be growing less willing to partake in surveys as questions and debate surrounding matters like the ACS become more common. Consequently, this would lead to non-response biases that involve political orientations. While a bit removed, the notion of social trust and capital could still be at play here in this expansion of where conservative ideology is applied.

Trust Among Non-College Whites and Polling Error

Finally, I wanted to check the same social trust data from before but along non-political variables–namely, the combination of two variables that made for one of the most crucial demographic groups in this past election: non-college-educated whites. This segment of the population swung strongly toward the Republican column with Trump as the party’s presidential candidate. Here’s how the group’s level of social trust changed over the last few decades:


The pattern of sizable declines in social trust/increases in social distrust mirrors that seen in the first graph above for all GSS respondents. However, the decline in trust among this subgroup occurs at a higher rate. While the percentage of all respondents saying they cannot trust most people increases by 14.7 points over this time period, non-college whites grow 22.3 percentage points more distrustful during the same time. Again, given there’s likely some (though hard to quantify) impact of social trust on survey response rate, this faster growing distrust among non-college whites could also imply nonresponse rates among this group that are growing faster than among the entire population. Nonresponse bias (and changes in this bias over time) could be playing a particularly important role among non-college whites, who also happen to have assumed an even greater importance in deciding the 2016 election. Correcting for demographic biases is within the reach of pollsters. Though I’m not entirely certain of this, while pollsters can weight on race (white percentage) and education (non-college percentage) individually, they likely often do not weight on their interaction (white non-college percentage). Perhaps this explains why a demographic nonresponse bias such as this one can persist even after statistical adjustments.
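The point about weighting on race and education separately, but not on their interaction, can be illustrated with a small sketch. It rakes a sample to the race and education margins of a made-up population and then checks the white non-college cell. The population and sample shares are hypothetical, and simple raking (iterative proportional fitting) is my assumption about the kind of adjustment involved, not a description of any particular pollster's method.

```python
def rake(table, row_targets, col_targets, iterations=200):
    """Alternately rescale rows and columns of a table of shares until
    its margins match the targets (raking, a.k.a. iterative proportional fitting)."""
    t = [row[:] for row in table]
    for _ in range(iterations):
        for i, target in enumerate(row_targets):
            total = sum(t[i])
            t[i] = [cell * target / total for cell in t[i]]
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in t)
            for row in t:
                row[j] *= target / total
    return t

# Hypothetical shares. Rows: white, nonwhite. Columns: non-college, college.
population = [[0.45, 0.25], [0.20, 0.10]]
# Hypothetical respondents: white non-college adults badly underrepresented.
sample = [[0.20, 0.50], [0.20, 0.10]]

row_targets = [sum(row) for row in population]        # race margin
col_targets = [sum(col) for col in zip(*population)]  # education margin

raked = rake(sample, row_targets, col_targets)
print(f"White share after raking:       {sum(raked[0]):.3f}")
print(f"Non-college share after raking: {raked[0][0] + raked[1][0]:.3f}")
print(f"White non-college after raking: {raked[0][0]:.3f} "
      f"(population: {population[0][0]:.2f})")
```

With these made-up inputs, both margins match the population after raking, yet the white non-college cell comes out at roughly 0.39 rather than 0.45. Raking on margins preserves the sample's own association between race and education, so the joint cell stays underweighted unless the interaction itself is a weighting target.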

Considering all of this in tandem could begin to shed light on where and why polling error occurred in 2016. Take the following graph of non-college-educated white percentage of a state’s white population and absolute polling error in Clinton margin as one indicator:


The relationship between non-college whites and polling error in a state is fairly strong at the bivariate level; the adjusted R-squared here is 0.37, a good amount for just one variable explaining another. This assessment is very ecological in nature, and thus should be viewed with caution. However, the relationship does speak to a mechanism that could be linking a particular demographic to not taking surveys and thus causing inaccuracy in polling. With their social trust declining at a faster rate than that of the overall population, non-college-educated white Americans might also be growing less likely to take surveys. Correcting for this group’s exclusion from surveys is not always straightforward, introducing the potential for non-response bias in survey results. When this group significantly changes its vote preference to the extent of “vot[ing] like a minority group,” non-response bias could lead to the type of polling error that both public and private surveys suffered from during the 2016 election.
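For readers less familiar with the statistic, here is a minimal sketch of how an adjusted R-squared for a one-predictor regression can be computed. The five data points are toy values chosen for easy arithmetic; they are not the state-level data behind the graph, and the function name is my own.

```python
def adjusted_r_squared(x, y):
    """Adjusted R-squared for a simple (one-predictor) linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r_squared = sxy ** 2 / (sxx * syy)
    # Penalize for the single predictor: (n - 1) / (n - p - 1) with p = 1.
    return 1 - (1 - r_squared) * (n - 1) / (n - 2)

# Toy values only (stand-ins for a state's non-college white share and
# its absolute polling error).
toy_x = [1, 2, 3, 4, 5]
toy_y = [2, 4, 5, 4, 5]
toy_adj_r2 = adjusted_r_squared(toy_x, toy_y)
print(f"Adjusted R-squared: {toy_adj_r2:.3f}")
```

The adjustment matters most with few observations: the toy data have a raw R-squared of 0.60 but an adjusted value of about 0.47.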

6/22/17 edit: For the graphs about the social trust question from the GSS, I used the response options laid out in the GSS data explorer and not the actual ones to this question in the survey (see page 387 here). This does not change the meaning of these graphs by much, as the “Can trust” vs. “Cannot trust” options capture the difference in response to the question well and more clearly.

Survey House Effects in Donald Trump Approval Ratings: 3/18/17 Update

A couple weeks ago, I worked on calculating survey house effects in approval rating polls on Donald Trump. As more time has passed and the number of polls to examine has increased since then, it’s worth checking in again on what polls are most and least biased on Trump, with other factors held constant. That’s what the below plot shows. (Before that, read the original explanation for how these house effects are calculated.)


Just like three weeks ago, the smallest in-house effect in either direction belongs to Gallup. After controlling for polling universe, survey mode, and other variables, the Gallup effect–relative to all other polls–on Trump’s net approval rating is just 0.51 points. Other polls with fairly small house effects are CBS, FOX, YouGov/Economist, and CNN.

On the other end of the spectrum, Quinnipiac has the largest anti-Trump in-house effect–11.2 net points less approval for Trump relative to other polls, all else equal–while Rasmussen has the largest pro-Trump in-house effect at 10.8 points. PPP (D), ICITIZEN, Kaiser FF, Pew, and NBC/SurveyMonkey also have sizable anti-Trump effects, while Suffolk/USA Today, Monmouth University, NBC/WSJ, and Politico/Morning Consult have large pro-Trump effects (greater than five points in either direction). Some of these house effects rely on small numbers of polls to gauge a pollster’s bias, so these results should still be viewed cautiously.

Another Lesson in the Importance of Grouping Independent Leaners

Survey Monkey does a lot of great, innovative work, but the one thing that continues to pester me is how they present certain results. While they offer a wide array of ways to examine their polling data, party breakdowns of results–how Democrats and Republicans responded to a recent poll question, for example–usually use ungrouped party identification scales. As I’ve discussed in detail, this method of not grouping Independents with the party to which they lean creates a poor measure of party affiliation. It implies Democrat-leaning and Republican-leaning Independents do not have any inclinations toward the major parties and are distinct from partisans in terms of their political behavior. I find that notion is misleading and mistaken for the most part, as these Independent leaners behave like regular partisans along several outcomes, such as vote choice, self-reported ideology, and issue positions. This idea is nothing new, as I review in my past blog post. Political science research has long documented this phenomenon, both the existence of these closet partisans and the misleading nature of classifying them as Independents.

Just as evidence for the importance of grouping leaners (closet partisans) persists, so too do the tendencies on the part of professional pollsters to not adequately account for this. An even partisanship distribution–roughly a third of respondents each identifying as Democrats, Independents, and Republicans–should raise an immediate red flag. When you group Independent leaners with their respective parties and thus follow the more informative method, that distribution shifts to about 40-45 percent for each of the two parties, and 10-20 percent pure, unaffiliated Independents–close to what Pew finds here. That red flag was raised for me concerning a recent Survey Monkey poll and discussion of its results. Here’s the snippet of the results release that caught my eye, regarding support for “the Republican health care plan to repeal and replace the Affordable Care Act”:

  • “Yet those with less partisan attachment have largely turned against the bill. Americans who initially identify as independent oppose the Republican health care bill by a large margin (40 percent support, 59 percent oppose). Independents who lean to neither party are against it by an even larger margin (32 percent support, 61 percent oppose).”

Of course, this instance is not nearly as egregious as others. Survey Monkey does in fact break down results with grouped leaners. But a look at this post’s presentation reveals its emphasis: the first mention of party breakdowns uses the deceptive ungrouped leaner method. The one graph in the blog post for this topic uses ungrouped leaners too. As a result, this type of data on Independents essentially includes respondents who are identical to normal partisans, which clearly should not be the intent of these results.

I’m taking a critical tone here, but among the many things that Survey Monkey excels at is transparency. One component of that is the release of detailed demographic breakdowns. The inclusion of an ungrouped, three-point party ID and a fuller five-point party ID scale allowed me to examine their methodological decisions more closely, along with what they decide to stress in the presentation of their results. When asked the following question:

  • Do you support or oppose the Republican healthcare plan to repeal and replace the Affordable Care Act?

Here’s how results break out if Independent leaners are classified as Independents (making for a three-point PID system with ungrouped leaners):


Again, this method–which Survey Monkey emphasized in this blog post/results release and in ones in the past–necessarily classifies partisan-like respondents as Independents. The other partisan breakdown for opinion on this new healthcare plan provided by Survey Monkey uses a five-point party affiliation scale. While not implementing the usual seven-point scale (and the more effective one for capturing more partisanship variation), this breakdown proves much more meaningful. Namely, it separates out the Independent leaners, which allows us to compare them to self-identifying partisans. Here’s how that breakdown looks:


Note that these responses refer to opinion on the new Republican health care plan. This establishes that more support is the more Republican position, and more opposition is the more Democratic stance. When we compare support rates across the five different groups, it becomes very clear that leaners (“Lean R” and “Lean D”) hold quite similar opinions to those that regular partisans (“Rep.” and “Dem.”) do. Nearly the same share of self-identifying Republicans (84%) support the plan as Republican-leaning Independents (83%) do. The same closeness in opinion holds for Democratic partisans (7%) and Democratic leaners (10%). It’s thus worth noting how seriously misleading it is to categorize these leaners as Independents, as the previous chart–the one with data Survey Monkey emphasizes most–does. The middle group (“Pure Ind.”), pure Independents, expresses fairly distinct opinions from these surrounding four groups, with 33% support.

The same story emerges when we look at the rates of opposition. Republican partisans (14%) are just about indistinguishable in opinion from Republican leaners (16%). The same holds on the other side of the aisle, where Democratic-leaning Independents in fact express slightly more Democratic views (92% oppose) than self-identifying Democrats do (88%), though the difference here may not be too statistically significant.

The broader point stands: Independents who lean toward a party are very close in opinion to–and largely indistinguishable from–partisans who identify as such when they’re initially asked about their party ties. Moreover, it’s only after separating out these leaners that the behavior and opinion of the leftover Independents becomes meaningful and closer to a reflection of those who truly don’t hold strong partisan ties. Movement within this specific group is often consequential for political outcomes, a notion that would be difficult to recognize without proper party affiliation categorization.
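The two ways of collapsing the five-point scale can be sketched as a simple recode. The category labels and support percentages are the ones reported above; the function and variable names are hypothetical, and since group sizes are not given here, the sketch only relabels groups rather than computing pooled support.

```python
# Support for the Republican health care plan by five-point party ID,
# as reported in the breakdown above (percent supporting).
support_by_pid5 = {"Rep.": 84, "Lean R": 83, "Pure Ind.": 33, "Lean D": 10, "Dem.": 7}

def ungrouped_three_point(pid5):
    """Leaners are counted as Independents (the presentation criticized here)."""
    if pid5 in ("Lean D", "Pure Ind.", "Lean R"):
        return "Independent"
    return "Democrat" if pid5 == "Dem." else "Republican"

def grouped_three_point(pid5):
    """Leaners are counted with the party toward which they lean."""
    if pid5 in ("Dem.", "Lean D"):
        return "Democrat"
    if pid5 in ("Rep.", "Lean R"):
        return "Republican"
    return "Independent"

for pid5, support in support_by_pid5.items():
    print(f"{pid5:<10} {support:>3}% support | "
          f"ungrouped: {ungrouped_three_point(pid5):<11} | "
          f"grouped: {grouped_three_point(pid5)}")
```

Under the ungrouped recode, the "Independent" label spans groups with 83, 33, and 10 percent support; under the grouped recode it covers only the pure Independents at 33 percent.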

This, of course, is not to say that only party identification data and breakdowns that use grouped leaners should be reported. Though leaners behave very similarly to partisans, the fact that they eschew partisan labels in the first place is indeed substantively important. However, when presenting information on opinion splits by party to the public in a concise fashion, it behooves everyone involved–in order to produce the most realistic representation of political behavior–to group leaners with the parties to which they lean, and emphasize this data above all else.

How Education and Religiosity Divided the White Vote in 2016 (Decision Desk)

I wrote another blog post over at Decision Desk using 2016 CCES data, this time breaking down the white vote in the 2016 election using two key demographic variables: educational attainment and level of religiosity. Below are some data visualizations I use in the piece, which you can find here.



Racial Prejudice Measurement and Modeling Trump Vote Choice

What drove voting for Donald Trump in the 2016 election most? While not always explicitly part of analyses of the 2016 political environment, the question was very often a central focus, approached from several angles. Much of the discussion has propagated the idea that economic anxiety and dissatisfaction pushed people–and the presidency–to Trump. Early political science analysis–largely by Michael Tesler at the Monkey Cage–has, however, shown greater evidence behind the idea that racial resentment was associated with voting for Trump to a degree that voting for Mitt Romney in 2012 and John McCain in 2008 was not. In a similar–but, importantly, not identical–vein of thought, other measures that get at old-fashioned racial prejudice have also been shown by Schaffner et al. to predict Trump support more strongly than economic satisfaction does.

Recently released 2016 CCES data can help further check for signs of racial prejudice driving Trump support, as well as determining the impact of other factors. Before that, though, it’s worth touching on the nuanced literature of social science measurement of racial prejudice. This paper by Christopher DeSante and Candis Smith both introduces innovative new ideas in this realm and reviews the body of prior research very well. Several different types of questions have been used to approximate racial prejudice–what’s termed the “old-fashioned” kind, which now proves difficult to uncover in surveys due to social desirability bias among respondents. Perhaps the most commonly used has been the racial resentment scale, composed of agree/disagree answers to four questions. Here’s the format in which it appeared in the 2016 ANES Pilot survey:

[Image: racial resentment questions from the 2016 ANES Pilot survey]

It’s the prejudice metric constructed from these questions that was used by Tesler to show racism drove Trump support in 2016 in a way it did not for support of Romney and McCain. Schaffner et al.’s finding is based on a metric using different questions–an important bit of nuance in this debate. The problem with the above traditional racial resentment battery is that other research, by Carney and Enos for example, has argued that a wider conservative ideology (mainly one promoting rugged individualism) was driving responses to this racial resentment battery–more so than racial prejudice itself. Similar criticisms of the set of racial resentment questions motivated DeSante and Smith to find better measures to uncover prejudice. After testing a wide array of survey questions attempting to do just that, the authors concluded that there are two new dimensions that are significant predictors of more old-fashioned racism: cognitive (awareness and acknowledgment of racism) and empathetic (empathy for and experiences with other racial groups) dimensions. Crucially, questions that make up these dimensions significantly affect opinion on conservative issues that involve race, but not conservative issues unrelated to it, distinguishing these measures from simple conservative ideology.

Schaffner et al. use these questions–specifically those that DeSante and Smith recommend–for their own study. As one of the principal investigators for the CCES, Schaffner implemented many of those same questions in the survey. Here are the four questions that intend to proxy racial prejudice, which emerge from the two new dimensions of racism (cognitive and empathetic) that DeSante and Smith emphasize:

  1. I am angry that racism exists.
  2. White people in the U.S. have certain advantages because of the color of their skin.
  3. I often find myself fearful of people of other races.
  4. Racial problems in the U.S. are rare, isolated situations.

Respondents were asked to strongly agree, somewhat agree, neither agree nor disagree, somewhat disagree, or strongly disagree with these four statements. Disagreement on the first and second and agreement on the third and fourth connote greater racism.

To test the effects of these four questions–that I’ll call a “racism scale”–I averaged responses to them to create one measure of racial prejudice (ranging from a value of 1 to 5 where 5 is most prejudiced; mean = 2.20, standard deviation = 0.79). Here’s what the distribution of those values looks like–right-skewed, as most respondents don’t display very high levels of racism on these dimensions:
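As a rough sketch of how a scale like this could be built: the snippet below assumes each item is coded 1 (strongly agree) through 5 (strongly disagree). That coding is an assumption made for illustration, not the documented CCES coding, and the item keys and the `racism_scale` function are hypothetical names. Under that coding, the first two items already run in the prejudiced direction, while the third and fourth are flipped before averaging.

```python
from statistics import mean

# Assumed coding (not confirmed by the CCES codebook):
# 1 = strongly agree ... 5 = strongly disagree.
SCALE_MAX = 5

# Hypothetical item names. Agreement with these two statements signals
# MORE prejudice, so their values are reversed before averaging.
REVERSED_ITEMS = {"fearful_other_races", "racial_problems_rare"}


def racism_scale(responses):
    """Average the four items so that 5 is the most prejudiced score."""
    recoded = []
    for item, value in responses.items():
        if item in REVERSED_ITEMS:
            value = SCALE_MAX + 1 - value  # 1 <-> 5, 2 <-> 4, 3 stays 3
        recoded.append(value)
    return mean(recoded)


# A hypothetical respondent who mostly rejects the prejudiced position.
example = {
    "angry_racism_exists": 1,   # strongly agree
    "white_advantages": 2,      # somewhat agree
    "fearful_other_races": 4,   # somewhat disagree
    "racial_problems_rare": 5,  # strongly disagree
}
print(racism_scale(example))
```

If the survey file codes the responses in the opposite direction, the set of reversed items would simply switch to the first two statements.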


I tested the strength of this racism scale by including it in a regression along with several other variables typically included in vote choice/support models. There were nine other variables in addition to the racism scale:

  1. Race (effects of black, Hispanic/Latino, and other races relative to that of whites)
  2. Education (effects of some college, college, and postgraduate education relative to that of high school or less)
  3. Age group (effects of 30-44, 45-54, 55-64, and 65+ age groups relative to that of 18-29)
  4. Gender (effects of male relative to that of female)
  5. Party identification (seven-point partisanship scale where the lowest value is most Democrat and highest is most Republican)
  6. Ideology (five-point self-reported ideology scale where the lowest value is most liberal and the highest is most conservative)
  7. Income bracket (effects of $30k-50k, $50k-70k, $70k-100k, $100k-200k, and $200k+ income groups relative to that of under $30k)
  8. Religious importance in respondent’s life (1-4 point scale where 1 is “not at all important” and 4 is “very important”)
  9. Census-designated region (effect of urban region relative to that of rural)

In running this logistic regression (the table for which is at the bottom of this post), I regressed a binary vote choice variable–taking a value of 1 if the respondent reported voting for Trump and a value of 0 for Clinton vote–on these 10 different variables. Coefficients in a logit model aren’t as easily interpretable as in a linear model, so I’ll just focus on the strength of the relationship (going by the size of the z-value in the regression output) of a certain variable–holding all others constant–with vote choice.
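For readers who do want a feel for the coefficients themselves, one common approach is to exponentiate them into odds ratios. The sketch below does that for three coefficients copied from the regression table at the bottom of this post; the `odds_ratio` helper is a hypothetical name, and this is an illustration of the standard conversion rather than part of the original analysis.

```python
import math

# Coefficients copied from the regression table at the bottom of this post.
coefficients = {
    "Party Identification": 0.988,
    "Racism Scale": 1.573,
    "Urban Region (Rural Baseline)": -0.395,
}


def odds_ratio(coefficient):
    """Multiplicative change in the odds of a Trump vote for a
    one-unit increase in the variable, holding the others constant."""
    return math.exp(coefficient)


for name, beta in coefficients.items():
    print(f"{name}: odds ratio = {odds_ratio(beta):.2f}")
```

Read this way, a one-point increase on the racism scale multiplies the odds of a reported Trump vote by roughly 4.8. Because the variables sit on different scales (five points versus seven for party identification, for instance), odds ratios alone don’t rank predictors by strength, which is why the z-values are the comparison used here.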

The strongest predictor of Trump vote in this logit model is by far the partisanship score (where higher values indicate greater attachment to Republican identification). This should not come as much of a surprise, but it does reaffirm the bearing of party identification on vote choice. The second strongest predictor, though, is the composite racism scale I explained before. (Both of these relationships are in the positive direction with vote choice for Trump.) Thus, racism–as expressed by this new dimension drawn from DeSante and Smith’s survey questions–is the variable most strongly associated with Trump vote outside of partisanship.

Other significant relationships with Trump vote in the positive direction include political ideology and religious importance, as well as older age group effects (relative to that of 18-29 year-olds) and male gender (relative to female gender). In the other direction, black race and Hispanic/Latino race (both relative to whites) and postgraduate education (relative to high school or less) are the strongest negative predictors of Trump vote. Urban region (relative to rural) as well as college degree (relative to high school or less) and “other” race (relative to whites) were also statistically significant negative predictors.

Logistic regression results (dependent variable: Trump Vote Choice)

Variable                                       Coefficient
Blacks (Whites Baseline)                       -1.103***
Hispanic/Latino (Whites Baseline)              -0.915***
Other (Whites Baseline)                        -0.357***
Some College (HS or Less Baseline)             -0.108*
College Degree (HS or Less Baseline)           -0.469***
Postgraduate Degree (HS or Less Baseline)      -0.760***
Age 30-44 (Age 18-29 Baseline)                  0.449***
Age 45-54 (Age 18-29 Baseline)                  0.586***
Age 55-64 (Age 18-29 Baseline)                  0.567***
Age 65+ (Age 18-29 Baseline)                    0.509***
Male (Female Baseline)                          0.363***
Party Identification                            0.988***
Self-Reported Ideology                          0.584***
Income $30k-50k (Under $30k Baseline)           0.049
Income $50k-70k (Under $30k Baseline)           0.196***
Income $70k-100k (Under $30k Baseline)         -0.002
Income $100k-200k (Under $30k Baseline)        -0.200**
Income $200k+ (Under $30k Baseline)            -0.444***
Religious Importance                            0.211***
Urban Region (Rural Baseline)                  -0.395***
Racism Scale                                    1.573***
Constant                                       -9.709***

Observations                                   33,667
Log Likelihood                                 -6,559.886
Akaike Inf. Crit.                              13,163.770

Note: *p<0.1; **p<0.05; ***p<0.01

Note: As with all current analyses using CCES data, self-reported vote is not validated. Read the second and third paragraphs of this article for an explanation of what this means and what it could imply.


How 2016 Vote Choice Broke Down By Rural Status and Age Among Different Races

Region proved one of the biggest divides in the 2016 election, a notion evident when simply looking at county election results maps or at polling data. As more data emerges concerning voting behavior in this past election, most recently with the 2016 CCES, I wanted to illustrate some aspects of this relationship between region and vote choice (for now, just through subgroup frequencies).

While there was no “region” variable in the CCES–for urban or rural designation–the survey did contain county FIPS codes for respondents. Using that variable, I was able to match Census data on the number of people living in a rural or urban area within each county. Specifically, I could attach a “percent rural” figure for each county to the survey respondents living in it. In this way I could approximate how rural each respondent’s area is. While the Census considers a county rural if 50 percent or more of its inhabitants live in a rural area, I do not use this classification because it leaves very few rural inhabitants for the two minority subgroup analyses below. Instead, for each racial group, I construct rural status quartiles, based on whether respondents fall within the first, second, third, or fourth quartile of county rural percentage within that group.
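The matching and quartile steps could be sketched roughly as follows. The field names (`race`, `county_fips`), the `pct_rural_by_fips` lookup, and the function name are all hypothetical stand-ins, not the actual CCES or Census variable names, and the cut-point method is Python’s default rather than necessarily the one used for the graphs.

```python
from bisect import bisect_left
from collections import defaultdict
from statistics import quantiles


def assign_rural_quartiles(respondents, pct_rural_by_fips):
    """Return (respondent, quartile) pairs, with quartiles computed
    separately within each race group (1 = least rural, 4 = most rural)."""
    # Step 1: attach the county-level percent-rural figure via the FIPS code.
    by_race = defaultdict(list)
    for person in respondents:
        pct_rural = pct_rural_by_fips[person["county_fips"]]
        by_race[person["race"]].append((person, pct_rural))

    # Step 2: find the three quartile cut points within each race group,
    # then place each respondent relative to those cut points.
    labeled = []
    for members in by_race.values():
        cuts = quantiles([pct for _, pct in members], n=4)
        for person, pct in members:
            quartile = bisect_left(cuts, pct) + 1
            labeled.append((person, quartile))
    return labeled


# Hypothetical data: eight respondents in counties that are 0% to 70% rural.
pct_rural = {f"{i:05d}": 10.0 * i for i in range(8)}
people = [{"race": "white", "county_fips": fips} for fips in pct_rural]
for person, quartile in assign_rural_quartiles(people, pct_rural):
    print(person["county_fips"], quartile)
```

In this sketch a value that ties exactly with a cut point falls into the lower quartile, so groups with many respondents in fully urban counties can end up with uneven quartile sizes.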

I’ll highlight some interesting points as bullet points below each graph.


  • Among whites, there’s a clear relationship between living in a rural area and vote choice; Trump vote increases as you progress into more rural areas
  • Breaking this phenomenon up by age doesn’t add much–the rural status and vote association remains relatively the same across each age group
  • I’ve shown before that white youth went third party more than any other group; here, white youth in the most rural areas seem slightly more inclined to vote third party (“Other”) than white youth in the least rural areas


  • Because of sample size considerations, I use two rather than five age groups for Hispanics; however, the rural status and vote relationship generally holds within this racial group too, as Hispanics in more rural areas report voting Trump more
  • However, the positive relationship between rural status and Trump vote does not grow as steadily among respondents age 45+ as it does among 18-44 year olds


  • Among blacks, voting for Trump doesn’t appear related to rural status
  • The interesting aspect here is what’s occurring with blacks living in the least rural quartile, particularly with the youngest age bracket: the most urban black youth voted for Clinton at a lower rate than 15 of these other 19 subgroups (the differences are statistically significant at p<0.05)

The Educational and Age Dimensions of the 2016 White Vote

I often hear a refrain in politics that posits that younger people tend to be more liberal and vote more Democratic than older folks. It is also often framed as a longstanding trend in politics, which is not the case, as this more liberal hue to the youth’s political belief system only developed in the last one to two decades. For example, see the second graph on an old post of mine here. It shows that for the 18-29 age group, vote choice was very split if not more Republican from 1968 to 1988, but has gradually shifted more Democratic since then. In 2012, according to ANES time series data, the Democratic margin among the youth vote was the largest since 1964.

So are young people becoming gradually more left-leaning in their voting behavior, a phenomenon attributable to age? For the most part, I would disagree with that idea, and I spell out the reasoning behind this claim in a blog post here. As with income and education, it’s practically impossible to look at age as a variable for breaking down vote choice–or other political behavior metrics–without also including race. Once you split the entire population by race and age, you can see, for example, that while overall the youth vote trends more Democratic, the white youth vote remains fairly Republican. The same goes for party identification and political ideology: youth as a whole are more Democratic and liberal, but whites are majority (or plurality) Republican and conservative while non-whites are the opposite. This pattern arises from the varying racial compositions of different age groups. Younger age groups, and the 18-29 year-old bracket in particular, are much more racially diverse than older ones. Given that non-white race is highly (positively) correlated with Democratic vote choice, identification, and other left-wing attributes, that makes the youth group as a whole more liberal.

However, there is something to be said about younger age groups espousing more liberal beliefs–even after controlling for race. The recently released CCES 2016 data sheds important new light on this topic. With its huge sample (n = 64,600, n = 52,899 for people interviewed after the election), I can drill down to small subgroups and still get a large sample. I had an exchange over Twitter about this same topic, and another important variable came to my attention: education. This could possibly explain the youth Democratic shift compared to other age cohorts–after controlling for race–as younger groups have had higher rates of educational attainment. Pair that with ample evidence of higher levels of education becoming increasingly correlated with Democratic and liberal political beliefs in recent decades, and this could easily be a part of the story for this youth dynamic. In that vein of thought, I introduce education in this analysis.

The below graph shows vote choice–for Hillary Clinton, Donald Trump, or another candidate–broken down by education (high school or less, some college, college degree, and postgraduate degree) and age group (18-29, 30-44, 45-54, 55-64, and 65+). I restrict this to whites only, as this is where the “controlling for race” hypothesis is most relevant.


This breakdown makes for 20 different subgroups, which all have considerable sample sizes that make for more certain vote choice estimates. By doing this, I can essentially control for (in the loose sense) education, age, and race to look at vote choice. Among whites, age and education are both related to vote choice–moving across one variable results in close to a monotonic increase or decrease in Clinton/Trump vote choice.

Moving from left to right across each row means looking at differences in age while holding education constant. In the first row, looking only at respondents with a high school degree or less, Clinton vote gradually decreases as the age group gets older. The same roughly holds for each educational group, as 18-29 year olds consistently vote more Democratic than the other four age groups. Thus, age is negatively correlated with Democratic vote choice.

Moving from top to bottom on each column shows what happens when you hold age constant, and examine vote choice changes by education. For example, in the first column looking at only 18-29 year olds, Clinton vote gradually increases as the level of education goes up (from HS or less to postgraduate degree attainment). The same dynamic occurs for every other age group: as education goes up, so too does Clinton vote. In this sense, I can say education is positively correlated with Democratic vote choice.

However, while two independent relationships exist with vote choice, one is stronger than the other–education. A bigger change in Clinton vote occurs moving from least to most educated for each age group than when moving from youngest to oldest age group for each educational level. The average absolute difference (not paying attention to direction of the relationship) between the highest and lowest age groups is 12.9 points; between the highest and lowest educational groups, it’s more than double at 26.6 points. Moreover, as someone mentioned on Twitter to me, Clinton and Trump vote doesn’t change much across different educational levels for the three oldest age groups. All in all, both variables seem to be related to vote choice among whites only, but education appears to be more strongly associated with it than age is. To better suss out all these different effects, I’ll soon regress vote choice on different variables such as these and check what significant effects come up. I’ll hopefully have that model ready to present and explain in another blog post in a few days.
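The comparison of average absolute differences can be made concrete with a small sketch. The Clinton vote shares below are hypothetical numbers chosen only to illustrate the calculation; they are not the CCES estimates from the graph, and the function names are likewise made up for this example.

```python
from statistics import mean

# HYPOTHETICAL Clinton vote shares (percent), for illustration only.
# Rows are education levels (lowest to highest); columns are the age
# groups 18-29, 30-44, 45-54, 55-64, and 65+ (youngest to oldest).
clinton_share = {
    "HS or less":   [40, 32, 28, 27, 26],
    "Some college": [48, 40, 36, 34, 33],
    "College":      [58, 50, 47, 45, 44],
    "Postgraduate": [68, 62, 60, 58, 57],
}


def average_age_gap(table):
    """Mean absolute gap between the oldest and youngest age groups,
    taken within each education level."""
    return mean(abs(row[-1] - row[0]) for row in table.values())


def average_education_gap(table):
    """Mean absolute gap between the highest and lowest education
    levels, taken within each age group."""
    rows = list(table.values())
    lowest, highest = rows[0], rows[-1]
    return mean(abs(high - low) for high, low in zip(highest, lowest))


print(average_age_gap(clinton_share))
print(average_education_gap(clinton_share))
```

Taking absolute values before averaging keeps gaps in opposite directions from cancelling out, which matches the “not paying attention to direction” framing above.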


Voting Wait Times Experienced by Different Races in the 2016 Election

Wait times for voting on Election Day seem to have elicited much more concern in the last election cycle. The issue speaks to the state of democracy in the country and how accessible the simplest democratic function is to the mass public. Notably, as with other aspects of voting ability in the U.S., wait times have often been shown to carry a racial dimension. For example, a report using 2012 data showed African-Americans averaged nearly twice the wait time to vote that whites did. While experiencing lower wait times than blacks, Hispanics and Asian-Americans also had to stand in line longer to vote than whites did. An analysis at the precinct level came to a similar conclusion, finding that minority neighborhoods experienced longer wait times than white ones, which was driven by differences in the resources (voting machines and poll workers) that different neighborhoods got. Simply put, such a phenomenon falls within the pervasive web of racial inequalities in this country.

New survey data publicly released in the last 24 hours sheds new light on this issue in the context of the most recent general election. The Cooperative Congressional Election Study is the largest and one of the highest quality surveys available for public use (I tend to use the other big one, the American National Election Study, more often), and the data for the 2016 iteration of this survey was recently released. As part of the second wave of the survey, conducted after the election, the following question was asked of and answered by 34,293 survey respondents:

  • Approximately, how long did you have to wait in line to vote?

The answers ranged from “not at all,” “less than 10 minutes,” “10-30 minutes,” “31 minutes – 1 hour,” to “more than 1 hour.” Below, I break up those responses by five different race/ethnicity categories (all of which are weighted means):


While not overwhelmingly stark, the pattern is clear here and reinforces past research: whites spend a lot less time waiting in line to vote than minority groups. Moving from left to right along these graphs indicates longer wait times; the distribution for non-white groups tilts much more toward longer wait times than that for whites. Forty percent of whites report no wait time at all, while fewer blacks (25.1 percent), Hispanics (25.5 percent), and Asians (26.1 percent) do the same. While 39.6 percent of blacks, 38.7 percent of Hispanics, and 34 percent of Asians waited for more than 10 minutes in line, only 27.3 percent of whites did. Narrowing that down to people who waited for more than half an hour in line makes the gap more pronounced for Asians (16.2 percent) and Hispanics (13.2 percent) relative to whites (8.7 percent). Obstacles such as these surely play at least some role in the much lower turnout rates among Hispanics and Asians.
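For anyone wanting to reproduce this kind of breakdown, here is a minimal sketch of weighted response shares by group. The rows and weights are hypothetical placeholders rather than CCES records, and `weighted_shares` is a made-up helper name; the only idea carried over from the analysis is that each respondent counts in proportion to a survey weight.

```python
from collections import defaultdict


def weighted_shares(rows):
    """rows: iterable of (group, answer, weight) tuples.
    Returns {group: {answer: weighted percent of that group}}."""
    totals = defaultdict(float)
    by_answer = defaultdict(lambda: defaultdict(float))
    for group, answer, weight in rows:
        totals[group] += weight
        by_answer[group][answer] += weight
    return {
        group: {
            answer: 100 * weight_sum / totals[group]
            for answer, weight_sum in answers.items()
        }
        for group, answers in by_answer.items()
    }


# HYPOTHETICAL respondents: (race/ethnicity, wait-time answer, survey weight).
rows = [
    ("White", "Not at all", 1.5),
    ("White", "10-30 minutes", 0.5),
    ("Black", "Not at all", 0.5),
    ("Black", "More than 1 hour", 1.5),
]
shares = weighted_shares(rows)
print(shares["White"])
print(shares["Black"])
```

Figures like “more than 10 minutes” then come from summing the shares of the three longest answer categories within each group.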

And just to further drive this point home, the below graph includes the same information as the first one but divides wait times by whites and non-whites. As the previous subgroup comparisons indicate, whites have to wait less to cast their votes in the U.S. than non-whites do, a dynamic that has now clearly persisted into 2016 with the most recent election.


Update @1:00pm EST: I made a small error calculating the race/ethnicity variable for Hispanics/Latinos earlier. I’ve corrected it in the above graphs and in the explanation in the text. It makes a very small difference, but if anything, the correction shows Hispanics waiting longer in line to vote.
