Tracking National Attention toward Mass Shootings with Google Trends Data

Many often lament that attention toward mass shootings, and the subsequent debate they engender, is fleeting. In a matter of a week, if not days, national discussion about the tragedy itself, as well as about measures to prevent future ones (largely centered on gun control), quickly evaporates. However, with the most recent mass shooting, at Stoneman Douglas High School, there does seem to be evidence of a different trajectory.

To capture “national attention” toward this mass shooting, I used Google Trends data to track web search frequencies for two sets of searches: “gun control” and the name of the location of the mass shooting. In addition to doing this for Stoneman Douglas, I gathered similar data (using the gtrendsR R package) for all other mass shootings among the ten deadliest; including Stoneman Douglas, this amounted to seven mass shootings.

Below are two graphs showing the trajectories for both search terms. For each graph, search volume is placed on a 0-100 scale (where 100 represents the highest volume). First, I show searches for gun control seven days before and six days after each of the seven mass shootings:


Each event follows a very similar path. Before Stoneman Douglas, four of the six saw a spike in public discussion about gun control followed by a dramatic decline into obscurity. The trends following the Sandy Hook and San Bernardino shootings diverged from this pattern, as even about a week after those events, debate about gun control persisted. The Stoneman Douglas shooting has followed the trajectory of these latter two: after falling a bit from its peak, gun control debate–as measured by Google searches, a serviceable but not perfect proxy–has persisted in the following week. Moreover, six days out, attention toward gun control in the aftermath of Stoneman Douglas eclipsed that after Sandy Hook and San Bernardino.
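One note on the measure: Google Trends reports relative, not absolute, search volume, rescaling each series so that its maximum equals 100. The original data were pulled with gtrendsR in R; the snippet below is just a Python sketch of that rescaling, with made-up raw counts.

```python
def rescale_to_trends(volumes):
    """Rescale raw search volumes so the series maximum maps to 100,
    mimicking Google Trends' relative 0-100 normalization."""
    peak = max(volumes)
    return [round(100 * v / peak) for v in volumes]

# Hypothetical daily volumes around a shooting (fourth value = event day)
raw = [120, 95, 110, 5400, 3200, 2100, 1500, 1800]
print(rescale_to_trends(raw))  # the peak day maps to 100
```

Because each series is scaled to its own peak, the graphs compare trajectories over time rather than absolute numbers of searches.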

A similarly distinctive trend for Stoneman Douglas materializes in the following graph as well, which plots web searches for the shooting location name in the two weeks following the shooting:


In nearly every case, the two-week aftermath saw the shooting quickly fall off the radar. In most cases, it took just a matter of days for public attention to dissipate. Interestingly, for the five days after this most recent shooting, it seemed like Stoneman Douglas was following this same trajectory. But within the last few days (Days 6, 7, and 8 on the graph), attention toward the Stoneman Douglas shooting has reversed its descent into obscurity and has instead started to receive renewed attention (it is now on an upward trend). The distinctive post-tragedy trajectory for Stoneman Douglas–maintaining national attention and spurring gun control debate more than usual–is fairly clear by now, and perhaps owes to the role that the school’s students have played at the center of the national debate on gun control in the week following the tragedy.


Vote Validation and Possible Underestimates of Turnout among Younger Americans

Vote validation data appended onto survey data is incredibly valuable. Due to the propensity of individuals to lie about whether they voted in an election, self-reported turnout is unreliable. Moreover, as that linked Ansolabehere and Hersh 2012 paper shows, this overreport bias is not uniform across Americans of different demographic characteristics, which further precludes any credible use of self-reported turnout in surveys. Checking self-reported turnout against governmental records of whether individuals actually voted provides a much more accurate (though not flawless) measure of whether someone really voted in an election. I say it’s not flawless because, in order to create this metric–validated turnout–survey respondents need to be matched to the voter file (each state has one) that contains turnout information on them. This matching process does not always go smoothly. I explored one case of that in my last post (an issue which has since been fixed). Another potential issue was raised on Twitter by political scientist Michael McDonald:

Aside from the topic of this specific discussion, McDonald is making an important broader point that survey-takers who move (have less residential stability) are less likely to be matched to the voter file; even if they turn out to vote, they may not be matched, and thus would show up as non-voters on surveys with vote validation. Younger individuals tend to move more, and so this flaw could impact them most.
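To make the failure mode concrete, here is a toy Python sketch. The CCES matching was done by a vendor using richer, probabilistic methods; the records, names, and keys below are all invented. The point is that a respondent who moved after the voter file was built fails the match and is coded as a non-voter even though they actually voted.

```python
# Hypothetical voter file keyed on (name, birth_year, zip). Real vendors
# use fuzzier, proprietary matching; this exact-key version is a toy.
voter_file = {
    ("ana diaz", 1990, "02139"): {"voted_2016": True},
    ("bob lee", 1955, "03060"): {"voted_2016": True},
}

def validated_turnout(respondent):
    """Return the voter-file turnout record if matched; unmatched
    respondents are coded as non-voters, as in vote-validated surveys."""
    key = (respondent["name"], respondent["birth_year"], respondent["zip"])
    record = voter_file.get(key)
    return record["voted_2016"] if record else False

# Ana voted but moved to a new ZIP after the file was built: no match,
# so she is wrongly counted as a non-voter. Bob, a stayer, matches fine.
mover = {"name": "ana diaz", "birth_year": 1990, "zip": "02145"}
stayer = {"name": "bob lee", "birth_year": 1955, "zip": "03060"}
print(validated_turnout(mover), validated_turnout(stayer))  # False True
```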

I thought it might be interesting to check for evidence of such a pattern with CCES vote validated turnout by age, and compare those estimates against another commonly used data source to study turnout among different demographics: the Current Population Survey (CPS). For the latter data, I pulled two estimates of turnout from McDonald’s website: 1) CPS turnout with a Census weight (which I’ll refer to as “CPS Turnout”) and 2) CPS turnout with a Census weight and a correction for vote overreport bias (which I’ll refer to as “Corrected CPS Turnout”), more detail on which can be found here. I end up with three turnout estimate sources (CCES, CPS, Corrected CPS) across four age groups (18-29, 30-44, 45-59, 60+), all of which I graph below. The key comparison is between CCES turnout and the two CPS turnout estimates. As McDonald describes, the correction to the CPS turnout is important. Therefore, I pay special attention to the Corrected CPS metric, showing the difference between CCES and Corrected CPS turnout estimates in red above the bars for each age group.


These surveys use very different sampling and weighting procedures, so, on average, they will likely produce different estimates. If those differences were constant across age groups, there would likely be nothing going on with respect to the movers/youth turnout underestimate theory. However, the difference–the (CCES – Corrected CPS) metric in red–does in fact vary by age. Most vividly, there is no difference between the two estimates for the oldest age group, Americans 60 and older: each says about 71 percent of those age 60+ turned out to vote in 2016. However, for each younger age group, CCES vote validated turnout is smaller than the Corrected CPS estimate. The largest difference (a 12.4 point “underestimate”) curiously appears for the 30-44 age group. This result doesn’t fall seamlessly in line with the youth turnout underestimate theory, which would suggest that the younger the age group, the larger the underestimate. But the lack of an underestimate for the oldest age group–almost surely the most residentially stable of the groups–compared to underestimates of between five and 13 points for the younger age groups is very telling.
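The red values are computed simply as CCES turnout minus Corrected CPS turnout within each age group. A sketch of that computation; only the roughly 71 percent figure for the 60+ group and the 12.4-point gap for the 30-44 group come from the estimates discussed above, while the remaining numbers are made up for illustration.

```python
# Illustrative turnout estimates by age group, in percent (hypothetical
# except the 60+ figures and the implied 30-44 gap noted in the text).
cces = {"18-29": 43.0, "30-44": 50.0, "45-59": 63.0, "60+": 71.0}
corrected_cps = {"18-29": 48.5, "30-44": 62.4, "45-59": 68.0, "60+": 71.0}

# The quantity shown in red above each bar: CCES minus Corrected CPS.
diffs = {age: round(cces[age] - corrected_cps[age], 1) for age in cces}
print(diffs)
```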

I would need to find data on residential mobility by age group in order to confirm this, but it does seem that those most likely to move–the youngest three age groups–see a greater difference between a turnout score built from vote validation and one that doesn’t use vote validation, the CPS. If that’s the case, I think the theory that vote validation misses some movers, and thus likely some younger Americans who are actual voters, is convincing. This notion would fall in line with takeaways from past research similarly looking at the ties between movers, age, and political participation. Thus, the results here shouldn’t be too surprising, but this possible underestimate of youth turnout is something researchers should keep in mind when using surveys that include vote validated turnout, like the CCES. Regardless, this represents just one (potential) drawback amid an otherwise extremely useful dataset for studying political behavior. Every survey has its flaws, but few have a measure of vote validated turnout, which will always prove more reliable than self-reported turnout metrics found in typical surveys.


Turnout Underestimates and Voter File Match Rate Problems in the 2016 CCES

In versions of the Cooperative Congressional Election Study (CCES) before 2016, vote validated turnout was consistently higher than actual turnout across states. Grimmer et al. 2017, for example, show this phenomenon in Figure 1 here. Matching CCES respondents to individual state voter files to verify whether they voted using governmental records gives a more accurate picture of voter turnout, but the CCES–as with nearly all other surveys–still suffers from a bias whereby those who take the survey are more likely to have voted than those who did not take it, all else equal.

However, this trend took a weird turn with the 2016 CCES. Unlike the typical overrepresentation of individuals who voted in the CCES, the 2016 version seems to have an underrepresentation of voters. The below graph shows this at the state level, plotting actual voter eligible population (VEP) turnout on the x-axis against CCES vote validated turnout on the y-axis. The closer that the points (states) fall on the 45-degree line, the closer CCES vote validated turnout approximates actual turnout at the state level.


The line of best fit in red clearly does not follow the 45-degree line, indicating that CCES vote validated turnout estimates are very far off from the truth. For comparison, I did a similar plot but for vote share–state level Democratic two-party vote share in the CCES vs. actual two-party vote share:


This result suggests that it’s not that state level estimates of political outcomes from the CCES are wholly unreliable. Rather, the problem is more specific to state level turnout in the CCES, a point Grimmer et al. 2017 stress. That still doesn’t explain the switch from average overrepresentation to underrepresentation of voters from 2012 to 2016 in the CCES. In particular, in the first graph above, a set of seven states–at around 60-70 percent actual turnout but only around 25 percent CCES turnout–were very inaccurate. I plot the same relationship but change the points on the graph to state initials to clarify which states make up this group:


CCES turnout estimates in seven Northeastern states–Connecticut, Maine, Massachusetts, New Jersey, New Hampshire, Rhode Island, and Vermont–severely underestimated actual turnout. The below table gives the specific numbers on estimated turnout from the CCES, actual turnout, and deviation of CCES turnout from actual turnout (“error”) across these seven states:


On average, CCES turnout in these states underestimated actual turnout by 38.1 percentage points. One explanation for this peculiar result would be that the CCES just happened to sample many more non-voters in these seven states, which is very unlikely. Another, more likely explanation concerns problems with matching CCES survey respondents to the voter file, as Shiro Kuriwaki suggested to me. This turns out to be the likely source of the egregious error. Catalist, a company that manages a voter file database and which matched respondents from the CCES survey to the voter file, had very low match rates for respondents from Connecticut (40.7 percent), Maine (35.6), Massachusetts (32.2), New Jersey (32.1), New Hampshire (38.2), Rhode Island (37.2), and Vermont (33.0). The below graph illustrates how this affects turnout estimates:


Catalist match rate (the percentage of survey respondents that were matched to the voter file) is plotted on the x-axis, and the difference in CCES turnout and actual turnout (i.e. error) is plotted on the y-axis. These two variables are very closely linked, and for an obvious reason: the CCES treats respondents that are not matched to the voter file as non-voters. Inaccuracies with turnout estimates in fact reflect inaccuracies with voter file match rate. This weird pattern in 2016 is not about overrepresentation of non-voters in the seven specific states but rather about errors in properly carrying out the matching process in those states. The under-matching issue has received attention from CCES organizers and it appears it will be corrected soon:



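A useful sanity check on the match-rate explanation: because unmatched respondents are coded as non-voters, a state’s estimated turnout can never exceed its match rate. A quick Python sketch using the match rates reported above:

```python
# Catalist match rates reported above (percent of respondents matched
# to the voter file) for the seven problem states.
match_rate = {
    "CT": 40.7, "ME": 35.6, "MA": 32.2, "NJ": 32.1,
    "NH": 38.2, "RI": 37.2, "VT": 33.0,
}

# Unmatched respondents count as non-voters, so estimated turnout is
# capped at the match rate -- far below these states' actual turnout,
# which sat around 60-70 percent.
for state, rate in sorted(match_rate.items()):
    print(f"{state}: CCES turnout can be at most {rate:.1f}%")
```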
What’s still strange is that even after ignoring those error-plagued seven states, you don’t observe the usual overrepresentation in the remaining states without a clear matching problem. Many are close to the 45-degree line (which indicates accurate survey turnout estimates), falling on either side of it but more often under it–suggesting that in several states, the CCES sampled more non-voters than it should have. The estimates remain close to actual turnout, but I still think this is unusual compared to the known consistent overrepresentation of voters in past CCES surveys (again, see Figure 1 here). Perhaps a lower-than-usual voter file match rate–though not to the same degree as in the seven Northeastern states–also contributed to lower than expected CCES vote validated turnout across many other states. However, it could also be that voter/non-voter CCES nonresponse bias occurred to a smaller degree (and even flipped in direction for some states) in 2016.

Update 2/10/18:

It looks like this issue in the CCES has been fixed and the corrected dataset has been posted to Dataverse.

Update 2/14/18:

I re-did the main part of the analysis above with the updated CCES vote validation data. As the below figure plotting actual turnout against CCES turnout shows, considerably less error results. I calculate “error” as the CCES turnout rate minus the actual VEP turnout rate. The average error is +0.57 points, errors range from -10.8 (the CCES underestimating turnout) to +10.8 (overestimating), and the middle half of all states lie between errors of -3.95 and +5.38.


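The summary statistics above (mean error, range, and the bounds of the middle half of states) can be computed from per-state errors like so. The error values in this sketch are made up for illustration, not the actual post-correction CCES numbers.

```python
import statistics

def summarize_errors(errors):
    """Summarize per-state error (CCES turnout minus actual VEP turnout,
    in percentage points): mean, range, and interquartile bounds."""
    q1, _, q3 = statistics.quantiles(errors, n=4)
    return {
        "mean": round(statistics.mean(errors), 2),
        "min": min(errors),
        "max": max(errors),
        "middle_half": (round(q1, 2), round(q3, 2)),
    }

# Made-up per-state errors for illustration.
errors = [-10.8, -6.2, -3.9, -1.0, 0.5, 2.3, 5.4, 10.8]
print(summarize_errors(errors))
```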

Leftovers from “Democrats Are Changing Their Minds About Race, and the Youth Are Leading the Way”

Here is some additional analysis and information for an NYMag piece that Sean McElwee and I wrote.

We used Voter Study Group panel data to track changes in racial attitudes toward blacks–as measured by the traditional racial resentment battery–over time. Below is a more standard cross-sectional approach that doesn’t exploit the panel structure, showing same-year racial resentment levels among Democrats vs. all Americans in 2011 and 2016.


The key takeaway above is that the shift toward more racially liberal attitudes occurs among all Americans, but happens at a faster rate among Democrats.

Then, we checked which demographic characteristics were most associated with this racial liberalization trend among Democrats specifically. To do so, we restricted our sample–a panel of 8,000 Americans interviewed in both 2011 and 2016–to only respondents who identified as Democrats in both years. Thus, we’re following the same group of consistent Democrats and seeing what characteristics predict change to more liberal racial attitudes.


The demographic most strongly associated with this change turned out to be age, as the above graph shows; it plots a net agreement level for each racial resentment item across three key age groups. We see that the youngest individuals–those age 17-29 during the 2011 survey–show the greatest shifts toward more liberal racial attitudes.
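Net agreement for an item is simply the percent agreeing minus the percent disagreeing, pooling the “strongly” and “somewhat” categories. A small Python sketch with invented responses:

```python
def net_agreement(responses):
    """Percent agreeing minus percent disagreeing with an item.
    'Strongly' and 'somewhat' responses are pooled, as in the graph."""
    agree = sum(r in ("strongly agree", "somewhat agree") for r in responses)
    disagree = sum(r in ("strongly disagree", "somewhat disagree")
                   for r in responses)
    return round(100 * (agree - disagree) / len(responses), 1)

# Hypothetical responses to one racial resentment item in one age group.
responses = ["strongly agree", "somewhat agree", "somewhat agree",
             "strongly disagree", "somewhat disagree"]
print(net_agreement(responses))  # 3 agree vs. 2 disagree out of 5 -> 20.0
```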

We also checked whether this held up in a multivariate model that accounted for other demographic attributes of respondents. Specifically, we regressed a dependent variable–indicating whether a respondent shifted from a non-liberal racial attitude in 2011 to a liberal racial attitude in 2016–on a few key demographic variables. As an example, I’ll describe the components that went into Model 1 from the table below, which models the battery item asking whether people agreed that blacks have gotten less than they deserve over the last few years:

  • Dependent variable: Our outcome is a 1/0 indicator. For this particular battery item, agreement (strongly or somewhat agree) represents a liberal racial attitude. Thus, to capture a shift toward a liberal racial attitude, this variable takes on a value of 1 if a respondent answered anything other than agreement in 2011 AND agreed with the statement in 2016, and 0 otherwise. I’ll note a couple of other points. First, ordinary least squares regression produces the same results as logistic regression, so we stick with OLS as a matter of interpretability. Second, using a binary variable means we ignore degree of agreement (i.e. we treat “strongly” and “somewhat” agree the same), which could still be important. This is a tradeoff we make, placing greater value on a simple measurement of an attitude switch. I may try some different modeling strategies that capture degree of agreement–I’ll update this post whenever I get around to that.
  • Independent variables: The predictors here are race (Non-white race with whites as the baseline), age (Age 30-54 and Age 17-29 with Age 55+ as the baseline), gender (Female with Male as the baseline), and education (College grad with Non-college grad as the baseline).
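The dependent-variable coding described in the first bullet can be sketched as follows (a Python illustration; the four Likert response labels are assumed):

```python
AGREE = {"strongly agree", "somewhat agree"}

def liberal_shift(resp_2011, resp_2016):
    """1 if the respondent did NOT agree in 2011 but agreed in 2016
    (a shift to the liberal attitude on this item), else 0."""
    return int(resp_2011 not in AGREE and resp_2016 in AGREE)

print(liberal_shift("somewhat disagree", "strongly agree"))   # shifted: 1
print(liberal_shift("somewhat agree", "strongly agree"))      # already agreed: 0
print(liberal_shift("strongly disagree", "somewhat disagree"))  # no shift: 0
```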


Comparing the size and significance of the coefficients here indicates that the youngest Democrats (Age 17-29) shifted their racial attitudes in the liberal direction the most. Importantly, the strength of the relationship holds when controlling for other potentially important variables, like race and education.

Update 2/2/18:

I ran the same models but with a continuous (rather than binary) racial attitudes scale as the dependent variable. The results from before hold: the youngest age group drives overall racial liberalization among Democrats the most. For each item and each wave (2011 and 2016), I created a 1-4 scale out of the four-point agree/disagree Likert scale, where 4 always represented the most liberal racial attitude and 1 the most conservative. I then used the difference between the 2016 and 2011 scales to create the outcome measure (indicating racial attitude change in the liberal direction). Below is a plot of the coefficients from the same multivariate regression as above, except that the dependent variable is now this new continuous measure. As an example of how to interpret this result: on the “deservemore” item, Democrats age 17-29 grew 0.36 points more racially liberal than Democrats age 55+.


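The continuous outcome described in the update can be sketched like this (a Python illustration; the Likert labels are assumed, and for items where disagreement is the liberal answer the mapping would be reversed):

```python
# Map four-point Likert responses to a 1-4 liberalism scale for an item
# where agreement is the liberal answer (labels are assumed).
SCALE = {
    "strongly disagree": 1,
    "somewhat disagree": 2,
    "somewhat agree": 3,
    "strongly agree": 4,
}

def liberal_change(resp_2011, resp_2016):
    """2016 score minus 2011 score: positive values indicate movement
    toward the more racially liberal attitude."""
    return SCALE[resp_2016] - SCALE[resp_2011]

print(liberal_change("somewhat disagree", "strongly agree"))  # moved +2 liberal
```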