Thoughts on “Racial Equality Frames and Public Policy Support” Working Paper

A recent political science working paper titled “Racial Equality Frames and Public Policy Support: Survey Experimental Evidence” by English and Kalla (whom I’ll refer to as EK) has garnered a lot of attention and generated plenty of interesting discussion. Many important sticking points have already been debated at length, but I wanted to highlight a few reactions I had while reading the paper that I haven’t seen mentioned much. They include both supportive and critical perspectives, and while they are paper-specific, they also apply broadly to how we do survey experiments and empirical research.

1. In survey experiments, testing stimuli drawn from the real world is both valuable and justifiable. One of the more common critiques of EK’s work has been the choice of treatment frames. There will always be substantial room for researcher discretion in how experimental stimuli are created. EK are motivated by real-world messaging appeals — made by current politicians — that increasingly invoke race. In this sense, the stimuli of interest are already clearly defined, and the jump from real-world construct to experimental representation is much less ambiguous. This is the defense one of the authors offers for EK’s frame choices — these are frames from the real world, which confers benefits such as stronger external validity. I confronted similar critiques in a messaging-type survey experiment I ran in the past and had a similar defense: if your experimental design connects clearly to your research question, and that question seeks to test reactions to pieces of real-world information environments, stimuli based on real-world content are very useful.

2. We should be cautious in judging effect sizes in messaging survey experiments, which can underestimate influence. Another common pushback EK received centered on the seemingly minor size of the treatment effects on public opinion. Moving beyond statistical significance alone is important, but oftentimes people have benchmarks in mind for opinion change (e.g. those from observational polling trends) that are not comparable to experimental treatment effect magnitudes. Survey experiments give a useful but far from perfect sense of how large an effect a treatment would have in the real world. I can think of at least two reasons why effects may appear smaller in experiments, mainly in the context of treatments involving messaging, cues, and information. First, subjects may have received the treatment before entering the experiment and thus be less moveable by it once in the study — a phenomenon known as “pretreatment effects.” For example, imagine that some subjects already heard the class- or race-based appeals in EK’s design before the study, and updated their policy opinions in response. Had this not happened (and those persuadable by the frames not already been persuaded), we’d observe larger effects in the experiment. Second, a one-shot treatment like EK’s is not the same as cumulative, repeated exposure — the type of “treatment receipt” more likely to occur in the real world if a certain frame is adopted widely (see here for a related perspective). Whether or not these issues are the cause, it might be unsurprising that recent research argues survey experiments may actually be underestimating the effects of cue- and information-based treatments.

3. The difference between a significant effect and an insignificant effect is not necessarily itself significant. At the heart of EK’s study is the finding that racial frames are more effective than class or race+class ones. Nearly all public discussion of the paper revolves around this point too. After all, testing which message is most effective is a question of comparison. Unfortunately, EK do not provide this statistical test — whether treatment effect A (e.g. the class frame) is statistically significantly different from treatment effect B (e.g. the race frame) — even though it is central to their study’s purpose and necessary for properly understanding its findings. This is of course not a fatal error — the test can easily be run and added to the paper — but its omission is worth pointing out. Sadly, I can’t say it was strange to see it missing, as this slippage (seeing one significant effect and one insignificant effect, then discussing results as if the two effects are themselves distinguishable) is endemic in quantitative social science.
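To make the point concrete, here is a minimal simulated sketch (the effect sizes and the `welch_z_p` helper are my own illustration, not EK's data or code): one frame can clear p < .05 against control while another does not, yet the only test that licenses "frame A beats frame B" is the direct contrast between the two treatment arms.

```python
import numpy as np
from math import erfc, sqrt

def welch_z_p(a, b):
    """Difference in means and a two-sided p-value (normal approximation,
    which is fine at survey-experiment sample sizes)."""
    diff = a.mean() - b.mean()
    se = sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    z = diff / se
    return diff, erfc(abs(z) / sqrt(2))

# Simulated policy support (1-7 scale) under three arms; hypothetical effects.
rng = np.random.default_rng(7)
control = rng.normal(4.00, 1.5, 500)
race    = rng.normal(4.30, 1.5, 500)
klass   = rng.normal(4.15, 1.5, 500)

print(welch_z_p(race, control))   # race frame vs. control
print(welch_z_p(klass, control))  # class frame vs. control
print(welch_z_p(race, klass))     # the direct race-vs-class test the claim needs
```

With draws like these, one arm can look significant against control while the head-to-head contrast between arms is far noisier, which is exactly the slippage described above.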

4. In the presence of many researcher degrees of freedom, we need a clear sense of what all combinations show and (in)consistency of results. Experiments are often nice because, compared to observational work, researchers have many fewer degrees of freedom (i.e. choices in how to conduct analysis), analysis is straightforward, and as a result we might be more inclined to believe the result is not a false positive or p-hacked. When reading EK’s paper that uses an experiment, I was surprised to see how many different analytical choices were being used. For example, analysis could diverge on the following dimensions: 1) continuous outcome vs. binary measure, 2) survey weights vs. no weights (the construction of which, a little concerningly, was not specified in the pre-analysis plan), 3) pretreatment covariates as controls vs. none, 4) pooled across issues vs. first issue (or other issue subsets) only, and 5) many different subgroups where effects were checked. Despite many researcher degrees of freedom, we only see a small slice of all possible combinations. It would be unwieldy to individually present results from all combinations, but this issue could be better handled (e.g. maybe showing a distribution of effects/t-statistics/p-values for all combinations). At the very least, we need to have a much better sense of how consistent results from all combinations are, as this bears directly on any conclusions we can draw.
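A specification-curve style exercise starts by simply enumerating the combinations. A sketch of the first four dimensions listed above (the choice labels are mine, and subgroup splits would multiply the count further):

```python
from itertools import product

# The analytic dimensions described above, enumerated:
choices = {
    "outcome":  ["continuous", "binary"],
    "weights":  ["weighted", "unweighted"],
    "controls": ["covariates", "none"],
    "issues":   ["pooled", "first_issue"],
}

specs = [dict(zip(choices, combo)) for combo in product(*choices.values())]
print(len(specs))  # 16 specifications before any subgroup splits
```

One would then run the same treatment-effect estimate under each specification and plot the resulting distribution of effects, t-statistics, or p-values, which is exactly the kind of consistency summary called for here.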

5. Many borderline significant results, a weird distribution of p-values, and a large number of statistical tests without correction for multiple testing are all concerning. As I was reading EK’s paper, I started noticing a peculiar number of results that were significant at the p<.05 level. (Given that the authors distinguish between significance at the .01, .05, and .10 levels, I assume these p<.05 results mean the p-value was between .01 and .05, not just anywhere below .05.) By my count, of the 18 reported p-values in the paper’s main text, 4 were <.01, 7 were between .01 and .05, 3 were between .05 and .10, and 4 were >.10. I also counted 51 different statistical tests (e.g. treatment X vs. control comparisons). Why is this concerning? A concentration of results right near the .05 significance threshold is odd and typically seen as a telltale sign of false positives. Indeed, lower p-values correlate with higher replicability, where even small shifts within the .01 to .10 range seem to matter. These concerns are exacerbated by the many researcher degrees of freedom noted earlier (e.g. what if one small change in analytical choices moves a p<.05 result above the threshold?). Importantly, EK do not correct for multiple testing in any way, which is especially problematic given the number of tests here (more tests mean more opportunities to stumble on false positives). It’s not clear how many of their significant results would survive multiple testing corrections, but the number of borderline significant results does not bode well.
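As a sketch of what a correction would involve, here is the Benjamini-Hochberg step-up procedure (controlling the false discovery rate) applied to an illustrative set of p-values with the borderline pattern described above; the values are hypothetical, not EK's:

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Return a reject/keep flag per p-value under the BH step-up rule:
    find the largest rank k with p_(k) <= k * alpha / m, reject those k."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject

# Illustrative p-values clustered in the borderline .01-.05 zone:
pvals = [0.004, 0.012, 0.023, 0.031, 0.042, 0.048, 0.07, 0.2]
print(benjamini_hochberg(pvals))  # only the two smallest survive correction
```

For scale, a simple Bonferroni cutoff with 51 tests would be roughly .05/51 ≈ .001, which almost none of the borderline results would clear.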


Ingroup or Outgroup Leader Influence? (Barber and Pope 2018 Robustness Section Reanalysis)


In their article “Does Party Trump Ideology? Disentangling Party and Ideology in America” recently published in the journal American Political Science Review, Michael Barber and Jeremy Pope present a very compelling, important, and timely study. Investigating the extent of party loyalty and the “follow-the-leader” dynamic among the American public, the authors test how partisans react to flexible policy position-taking by President Donald Trump—and one similar case study for Barack Obama—with a survey experiment. The main finding is striking: on average, when Trump takes a conservative position on a policy issue Republicans express more conservative beliefs on that policy themselves, and when Trump takes a liberal stance Republicans too become significantly more liberal as a result. The latter exemplifies blind leader adherence best—even when Trump takes positions outside of his party’s mainstream ideology, mass members of his party still become more likely to adopt his stance.

What about Democrats?

A common reaction to this finding has been questions about partisan (a)symmetries. The study mostly concerned Republican members of the public and their current leader, Trump. Should we expect the same dynamic among Democrats blindly following a comparable leader of their party? To address this, Barber and Pope discuss a robustness analysis towards the end of their paper (in the subsection “Robustness: Other Political Leaders as Tests”) that tests for leader effects among Democrats using Barack Obama as the cue-giver. Specifically, they exploit the close similarity between a new immigration asylum policy from Trump—introduced in the spring of 2018—and the policy stance of the Obama administration a presidency earlier (both policies held that families/children, when arrested by the border patrol, would be held in a detention facility before an asylum hearing). The leader cues in support of the policy can thus be credibly interchanged (i.e. experimentally manipulated). In the experiment, partisans were either randomly told 1) that this is Trump’s policy or 2) that this was Obama’s policy, or 3) given no cue, after which they expressed how strongly they agreed (a value of 5 on a 1-5 scale) or disagreed (1) with the policy.

Barber and Pope describe their results and the implications of them in the following way:

“The results show large effects for Democrats and smaller, but still statistically significant effects for Republicans…

…Democrats are also willing to adjust their preferences when told that the policy was coming from Obama…”

Separating out Ingroup and Outgroup Cues

Though not explicitly stated, the purpose of this robustness study is to test whether evidence of strong ingroup partisan loyalty and influence from leaders within the same party appears for Democrats as well. Because partisans are exposed to both Obama and Trump cues, results from this experimental design can speak not only to ingroup dynamics but to outgroup dynamics as well. The analysis approach used in the article, however, cannot distinguish between these two forces possibly at play. This is because outcomes from the experimental control condition (no exposure to a leader cue) are omitted from the analysis. Specifically, to calculate the effects (appearing in Figure 6 in the actual article), the treatment variable makes use of just the Trump cue and Obama cue conditions. The displayed treatment effects are simply the difference in policy opinion between these two conditions.

Original analysis

Below is a graph reproducing the results in the original article (with replication data) using the original analysis approach: regressing the policy opinion variable on a binary variable that—for Republicans—takes on a value of 1 if the cue comes from Trump and 0 if it comes from Obama (and the opposite for Democrats). Thick bars represent 90% confidence intervals and thin ones 95% confidence intervals. (Note: The original article shows 0.22 instead of 0.23, due to differences in rounding.)


Democrats agreed with the policy by 1.18 points less when told it was Trump’s compared to being told it was Obama’s. Republicans agreed with the policy by 0.23 points more when it came from Trump (vs. coming from Obama). It is not clear, though, whether these effects are driven more by partisans following ingroup leaders on policy (Democrats following Obama) or by partisans being repelled by outgroup leaders (Democrats moving away from Trump’s stance). Fortunately, this can be separated out. Instead of comparing average opinion levels between the Trump and Obama cue conditions, it is more informative to compare the average in the Trump cue condition to that in the control condition, and the average in the Obama cue condition to that in the control (again split by respondent partisanship).


The below plot presents results from setting the control condition as the reference group in distinct “Trump cue” and “Obama cue” treatment variables, which predict policy opinion among Democrats (left-hand side) and Republicans (right). The Obama cue estimate and confidence interval appear in purple while those for the Trump cue appear in orange.
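In regression form, this amounts to dummy-coding the two cue conditions with the no-cue control as the reference group. A minimal sketch on simulated data (the group means are hypothetical, not the replication values; with dummy coding, each coefficient recovers that cue's gap from the control mean):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical Democrats' policy agreement (1-5 scale) by condition.
control = rng.normal(3.0, 1.0, 300)
obama   = rng.normal(3.4, 1.0, 300)
trump   = rng.normal(2.2, 1.0, 300)

y = np.concatenate([control, obama, trump])
X = np.column_stack([
    np.ones(len(y)),                                    # intercept = control mean
    np.r_[np.zeros(300), np.ones(300), np.zeros(300)],  # Obama-cue dummy
    np.r_[np.zeros(300), np.zeros(300), np.ones(300)],  # Trump-cue dummy
])
intercept, b_obama, b_trump = np.linalg.lstsq(X, y, rcond=None)[0]
# b_obama and b_trump are each cue's effect relative to the control condition.
print(b_obama, b_trump)
```

Because the model is saturated, the estimated coefficients equal the simple differences in condition means, so this is just a convenient way to get both cue-vs-control contrasts (with standard errors, in a full regression package) at once.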

After breaking up the cue effects like this, the result for Republicans is no longer significant at conventional levels. The Obama cue treatment does not move their opinion much, while the Trump cue moves them 0.18 points toward greater support, but the effect is not significant.

Of course, the key part of this study is opinion movement among Democrats. In making use of the control condition, this reanalysis reveals that the outgroup leader effect from Trump is nearly twice as large as the ingroup leader effect from Obama on mass Democratic opinion, though both dynamics are at play. When told the immigration asylum policy was Obama’s policy during his presidency, Democrats become 0.42 points more supportive relative to the control. This effect is statistically significant, and provides evidence of what this robustness study was seeking: mass Democrats following their own leader on policy. When told the policy is Trump’s, Democrats react more strongly, becoming 0.76 points more opposed to the policy compared to the control (also statistically significant). To sum up, the ingroup follow-the-leader effect certainly arises for Democrats in this study. But the reported treatment effect was driven in larger part by a reaction to an outgroup leader’s expressed stance—a dynamic different than the one at the heart of the original article.
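The decomposition is just arithmetic: the Trump-vs-Obama contrast reported in the original article is the gap between the two cue arms, which splits into the ingroup and outgroup components estimated above.

```python
# Reanalysis estimates for Democrats (1-5 agreement scale).
obama_vs_control = 0.42    # ingroup pull toward the policy
trump_vs_control = -0.76   # outgroup push away from the policy

# The contrast the original analysis reports is the gap between the two arms:
trump_vs_obama = trump_vs_control - obama_vs_control
print(trump_vs_obama)  # -1.18, matching the 1.18-point Trump-vs-Obama gap
```

Seen this way, roughly two-thirds of the original 1.18-point effect comes from the outgroup (Trump) reaction and one-third from the ingroup (Obama) pull.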


Beyond clarifying this part of Barber and Pope’s paper, the specific result should not come as that much of a surprise in the context of related literature. In his 2012 article “Polarizing Cues” in the American Journal of Political Science, Stephen Nicholson uses a survey experiment to find that when party leaders take a position on housing and immigration policies, mass partisans from the out-party move significantly away from this leader’s position. (For example, Republicans oppose an immigration bill substantially more when they hear Obama supports it versus when they don’t hear his position.) Thus, this particularly strong reaction to an outgroup leader cue in Barber and Pope’s robustness study—which likely incited negative partisanship—makes sense.

[As an interesting aside, Nicholson curiously does not find strong evidence for ingroup leader persuasion; partisans don’t follow the leader much. This contrasts with Barber and Pope’s main results: as Figure 1 in their paper indicates, Republicans follow their ingroup leader in Trump a considerable amount, but Democrats do not react that negatively to an outgroup leader (Trump) in expressing their policy opinion. Future research should address this uncertainty, paying special attention to 1) cue type (actual leader names? anonymous partisan Congress members? party labels?), 2) study timing (during a campaign? right after one, when a president’s policy orientation is not as clear?), and 3) policy areas (will attached source cues be viewed credibly by respondents? do the issues vary in salience?).]

Do Only Republicans Follow-the-Leader? No

Where does this leave us? To reiterate, the purpose of this robustness study was to check whether the follow-the-leader dynamic is specific to Republicans (as the main study results may imply) or common to all partisans. The experiment does indeed support the idea that Democrats also sometimes follow the leader on policy opinion—just not as much as the original results may have indicated.

Moreover, other pieces of evidence support a view of partisan symmetry for this behavior. In the first part of a working paper of mine that builds on Barber and Pope’s article, I evaluate how partisans form their opinions in response to policy positions taken by leaders outside the party mainstream (a liberal position by Trump for Republicans, a conservative one by Obama for Democrats). In both cases, partisans follow their respective leaders. For example, when told Obama has expressed support for a major free trade bill previously proposed by Republican legislators, Democrats become 1.11 points more supportive of the bill (on a 1-7 scale) compared to those not exposed to an Obama cue.

Second, panel survey evidence from Gabe Lenz in his 2012 book “Follow the Leader? How Voters Respond to Politicians’ Policies and Performance” is also telling. One of the case studies Lenz uses is George W. Bush’s proposal, during the 2000 election, to invest Social Security funds in the stock market (his opponent, Al Gore, opposed it), which became the most prominent policy debate of the campaign. From August to late October of 2000—during which the issue became particularly salient—Lenz finds that Gore supporters changed their policy opinions most to bring them in line with their leader’s stance of opposition. Given that Gore supporters are more likely to be Democrats, this serves as another example of Democrats following their leader on policy. These pieces of evidence—along with Barber and Pope’s robustness study—thus show the follow-the-leader dynamic cuts across partisan lines.


Turnout Underestimates and Voter File Match Rate Problems in the 2016 CCES

In versions of the Cooperative Congressional Election Study (CCES) before 2016, vote validated turnout was consistently higher than actual turnout across states. Grimmer et al. 2017, for example, show this phenomenon in their Figure 1. Matching CCES respondents to individual state voter files—verifying against governmental records whether they voted—gives a more accurate picture of voter turnout, but the CCES, as with nearly all other surveys, still suffers from a bias where those who take the survey are more likely to have voted than those who do not, all else equal.

However, this trend took a weird turn with the 2016 CCES. Unlike the typical overrepresentation of voters in the CCES, the 2016 version seems to have an underrepresentation of voters. The below graph shows this at the state level, plotting actual voting-eligible population (VEP) turnout on the x-axis against CCES vote validated turnout on the y-axis. The closer the points (states) fall to the 45-degree line, the more closely CCES vote validated turnout approximates actual turnout at the state level.


The line of best fit in red clearly does not follow the 45-degree line, indicating that CCES vote validated turnout estimates are very far off from the truth. For comparison, I did a similar plot but for vote share–state level Democratic two-party vote share in the CCES vs. actual two-party vote share:


This suggests that state-level estimates of political outcomes from the CCES are not wholly unreliable. Rather, the problem is more specific to state-level turnout in the CCES, which Grimmer et al. 2017 stress. That still doesn’t explain the switch from average overrepresentation to underrepresentation of voters from 2012 to 2016 in the CCES. In particular, in the first graph above, a set of seven states—at around 60-70 percent actual turnout but around 25 percent CCES turnout—were very inaccurate. I plot the same relationship but change the points on the graph to state initials to clarify which states make up this group:


CCES turnout estimates in seven Northeastern states–Connecticut, Maine, Massachusetts, New Jersey, New Hampshire, Rhode Island, and Vermont–severely underestimated actual turnout. The below table gives the specific numbers on estimated turnout from the CCES, actual turnout, and deviation of CCES turnout from actual turnout (“error”) across these seven states:


On average, CCES turnout in these states underestimated actual turnout by 38.1 percentage points. One explanation for this peculiar result—that the CCES just happened to sample many more non-voters in these seven states—is very unlikely. Another, more likely explanation concerns problems with matching CCES survey respondents to the voter file, as Shiro Kuriwaki suggested to me. This turns out to be the likely source of the egregious error. Catalist, the company that manages the voter file database and matched CCES respondents to it, had very low match rates for respondents from Connecticut (40.7 percent match rate), Maine (35.6), Massachusetts (32.2), New Jersey (32.1), New Hampshire (38.2), Rhode Island (37.2), and Vermont (33.0). The below graph illustrates how this affects turnout estimates:


Catalist match rate (the percentage of survey respondents matched to the voter file) is plotted on the x-axis, and the difference between CCES turnout and actual turnout (i.e. error) is plotted on the y-axis. These two variables are very closely linked, and for an obvious reason: the CCES treats respondents who are not matched to the voter file as non-voters. Inaccuracies in turnout estimates thus reflect inaccuracies in voter file match rates. This weird 2016 pattern is not about overrepresentation of non-voters in the seven specific states but rather about errors in properly carrying out the matching process in those states. The under-matching issue has received attention from CCES organizers and it appears it will be corrected soon.
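The mechanism can be written down directly: since unmatched respondents are coded as non-voters, a state's estimated turnout is mechanically capped at its match rate. A quick sketch (the turnout-among-matched figure is hypothetical):

```python
def cces_turnout_estimate(match_rate, turnout_among_matched):
    """Unmatched respondents count as non-voters, so the estimate
    can never exceed the match rate itself."""
    return match_rate * turnout_among_matched

# Massachusetts-like match rate (32.2%) with a hypothetical 85% turnout
# among matched respondents:
print(cces_turnout_estimate(0.322, 0.85))  # ≈ 0.27, versus roughly 67% actual
```

So a match rate in the low 30s makes a 25-ish percent turnout estimate almost inevitable, regardless of how many actual voters were sampled.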



What’s still strange is that even after ignoring those error-plagued seven states, you don’t observe the usual overrepresentation in the remaining states without a clear matching problem. Many are close to the 45-degree line (which indicates accurate survey turnout estimates) and fall on either side of it, with more still under the line—suggesting that in several states, the CCES sampled more non-voters than it should have. The estimates remain close to actual turnout, but this is still unusual compared to the known consistent overrepresentation of voters in past CCES surveys (again, see Figure 1 of Grimmer et al. 2017). Perhaps a lower-than-usual voter file match rate—though not to the degree seen in the seven Northeastern states—also contributed to lower than expected CCES vote validated turnout across many other states. However, it could also be that voter/non-voter CCES nonresponse bias occurred to a smaller degree (and even flipped in direction for some states) in 2016.

Update 2/10/18:

It looks like this issue in the CCES has been fixed and the corrected dataset has been posted to Dataverse.

Update 2/14/18:

I re-did the main part of the analysis above with the updated CCES vote validation data. As the below figure plotting actual turnout against CCES turnout shows, considerably less error results. I calculate “error” as the CCES turnout rate minus the actual VEP turnout rate. The average error is +0.57 points, ranging from -10.8 (the CCES underestimating turnout) to +10.8 (overestimating), and the middle half of all states lie between errors of -3.95 and +5.38.
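For reference, those summary figures are just the mean and the 25th/75th percentiles of the state-level error vector; a sketch with hypothetical stand-in values (not the actual state errors):

```python
import numpy as np

# Hypothetical stand-in for the 50-state error vector
# (CCES turnout minus actual VEP turnout, in percentage points).
errors = np.array([-10.8, -6.2, -3.95, -1.0, 0.5, 2.3, 5.38, 10.8])

print(errors.mean())                    # average error
print(np.percentile(errors, [25, 75]))  # bounds of the middle half of states
```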



Issue Positions and Identity in White Southern Partisan Realignment

The book Democracy for Realists is incredibly important for understanding the current American political environment, but as its authors Christopher Achen and Larry Bartels show, it also sheds light on key historical events. In one particularly informative example, Achen and Bartels apply their framework–the predominance of social identities and groups over issues and policy preferences for shaping political outcomes–to the question of what drove white partisan realignment in the South. Conventional wisdom holds that differences in opinion on racial policy issues underpinned Southern white flight from the Democratic Party. Achen and Bartels, however, demonstrate that the evolving partisan distribution of Southern whites did not differ much by opinion on key issues, such as support for or opposition to (1) enforced racial integration in schools or (2) government aid for blacks. Instead, Southern whites on either side of these issues moved just about equally away from the Democratic Party and to the Republican Party, leading Achen and Bartels to conclude that white Southern partisan realignment was not about policy issues. In further analysis, the authors show the partisan movement centered more on white Southern identity, proxied by feeling thermometer ratings of “Southerners,” as those strongest in this identity were most likely to have left the Democratic Party.

While not policy preference questions as specific as the ones Achen and Bartels used, there are other interesting data in the ANES—not used by the authors—on general issue positions and perceptions speaking to racial conservatism. I wanted to examine these, as well as the Southerner feeling thermometer the authors used, to further shed light on white Southern partisan realignment—and whether it varied more by issue positions or by indicators of identity attachment.

For a couple of years in the 1960s and 70s, the ANES asked respondents whether they favored desegregation, strict segregation, or something in between. Below, I plot what the Democratic margin (Democratic % – Republican %) looked like by position on this issue among whites in the South. (Note: In all of the below plots, point sizes correspond to sample size, to give a sense of the certainty of the estimates and serve as a reminder that they should be interpreted with caution, as they’re not very precise.)


This is a short time frame, but if issue positions were driving partisan realignment, we would expect people who favored strict segregation/something in between to become less Democratic (i.e. drop further downward along the y-axis) at a faster rate than those who favored desegregation. At least in these early stages of realignment shown here, that’s not the case. There is movement (downward) away from the Democratic Party, but it doesn’t consistently occur in either of these issue position groups to a greater degree. Instead, those favoring the more racially liberal position of desegregation (the red line) trend Republican at faster rates in some of these years.

Another question with a longer time span is also informative. From the 1960s to the 90s, ANES respondents indicated whether they believed civil rights leaders were pushing too fast, too slowly, or moving at about the right speed. While not about a specific policy, the question does capture racial ideology to some extent—answers of “Too fast,” plotted in red in the below graph, mark the more conservative response.


As the graph shows, shifts away from the Democratic Party do not follow conservative or liberal positions on this issue. White Southerners who believed civil rights leaders pushed too fast and those who believed leaders pushed too slowly/at the right speed were about equally likely to leave the Democratic Party over time. Once again, this goes to show that key racial issues of the day did not shape partisanship change in the white South.

In conjunction with similar analysis by Achen and Bartels showing the same dynamic, the main takeaway here is that white Southern movement away from the Democratic Party and to the Republican Party does not appear to be associated with positions on racial issues. To argue for an identity-driven story of partisan change, Achen and Bartels focus on a feeling thermometer of “Southerners” (similar ratings are asked of other social groups too). While far from perfect, this measure should capture some semblance of Southern identity—what Achen and Bartels argue contributes most to the realignment. As with the prior graphs, I wanted to check how the white Southern partisan distribution varies by strength of this Southern identity proxy. I constructed “Strong Southern Identity” (at or above the 75th percentile of the Southerners thermometer rating) and “Weak Southern Identity” measures from this ANES question, and plotted how the margin for Democratic partisan identification varied over time across these two identity strength levels. (Note: Different handling of this data—e.g. using the median or a rating of 50 as the cutoff for high vs. low identity strength—produces similar results.)
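The split described here is straightforward to construct. A sketch with simulated stand-in data (the thermometer ratings and party IDs below are random, not ANES values):

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated "Southerners" thermometer ratings (0-100) and party ID
# (1 = Democrat, -1 = Republican, 0 = other) for white Southern respondents.
therm = rng.integers(0, 101, size=500)
party = rng.choice([1, 0, -1], size=500)

cutoff = np.percentile(therm, 75)  # "Strong Southern Identity" threshold
strong = therm >= cutoff

def dem_margin(p):
    """Democratic % minus Republican %."""
    return 100 * ((p == 1).mean() - (p == -1).mean())

print(dem_margin(party[strong]), dem_margin(party[~strong]))
```

Repeating this margin calculation within each survey year, for the strong- and weak-identity groups separately, produces the trend lines plotted below.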


Although this thermometer rating isn’t asked in several years, a pattern emerges: starting by the mid- to late-1970s, white Southerners with the strongest sense of Southern identity become more Republican over time than those with a weaker sense of this identity. Specifically, in 1976, those with strong Southern identities were 64 percent Democratic and 19 percent Republican. By 2008, they were 27 percent Democratic and 63 percent Republican. Those with weak Southern identities, on the other hand, were 50 percent Democratic and 33 percent Republican in 1976. By 2008 they had certainly changed their partisanship too, but not to the same degree: 36 percent Democratic and 49 percent Republican. In sum, over this 32-year span, the partisanship of strong Southern identifiers swung a net 80 points in favor of the GOP—for weak Southern identifiers, the swing was less than half that, at just 31 points.

Taking this graph and earlier ones together, these results further reinforce the notion–as established by Achen and Bartels–that identity, more so than racial conservatism or liberalism on issues, played a bigger role in the partisan realignment of white Southerners. The power of social identity relative to that of policy preferences for political behavior seems to dominate today’s political scene–perhaps this dynamic is a bigger part of American political history than commonly accepted as well.


Social Exclusion and Demographic Determinants of Minority Group Partisanship


In a recent Journal of Politics article, Alexander Kuo, Neil Malhotra, and Cecilia Hyunjung Mo make a very interesting and novel contribution to our understanding of partisan identification. In what’s particularly relevant to non-white minority groups, the authors argue that experiences of social exclusion on the basis of one’s racial/ethnic group membership can influence political identity. People can interpret individual experiences of exclusion as group exclusion. When one party is considered more exclusionary, these experiences can define which party best represents group interests, motivating greater attachment to/detachment from certain parties. Kuo et al. cite past research to establish the prevailing view of the Democratic Party as the party most beneficial to ethnic minority groups and the less exclusionary one. As a result, feelings of social exclusion should translate into greater identification with and support for the Democratic Party.
