This analysis was done in collaboration with G. Elliott Morris, a fellow junior undergraduate student interested in political science and statistics. You can find his blog here and his Twitter page here.
While the polling-rich election season may have ended months ago, there remains plenty of debate surrounding public opinion data. Nowadays, the focus revolves around Donald Trump’s approval rating numbers. The data, which has come in the form of 95 different polls containing approval ratings as of February 26th, has been interpreted very differently by different parts of the public–including by the president himself:
The same people who did the phony election polls, and were so wrong, are now doing approval rating polls. They are rigged just like before.
— Donald J. Trump (@realDonaldTrump) January 17, 2017
It’s unlikely data (of any kind) could sway Trump. But the wide-ranging reaction by the public is more understandable, as these polls are telling somewhat different stories. Of the 95 approval rating polls that have been conducted, the average net rating is -3.4 points (Approve % minus Disapprove %, so negative values indicate more people disapprove of Trump than approve of him). But this net approval rating has ranged from as low as -18 points to as high as +18 points. Given that polls differ on several factors, such as the medium through which they are conducted and the portion of the public they survey, this variability should not come as a total surprise. What’s important in this case is to measure which qualities of polls lead to a more favorable or unfavorable result for Trump in his approval rating. Doing so makes it easier to interpret the influx of polls in the public domain.
Trying to gauge the effects of different factors all at once makes this situation ripe for multivariate regression analysis. Natalie Jackson of HuffPost Pollster took a key first step on this front, finding that Rasmussen polls, polls of registered voters, and polls conducted online had positive, significant effects in a regression predicting Trump net approval. Below, in work done together with G. Elliott Morris, we try to expand on this by first running a more recent regression for both net approval rating and approval percentage, and then calculating house effects for each pollster.
What Affects Trump’s Approval Rating?
In order to take several different survey characteristics into account all at once and estimate the isolated effect of each while controlling for the others, we used multivariate linear regression. The table below shows two models: one predicting net approval rating (% approving of Trump in a poll minus % disapproving of Trump), appearing in column 1, and one predicting approval percentage (% approving of Trump in a poll), appearing in column 2. The data came from the HuffPost Pollster website. We used the same independent variables for both models:
- Survey population (i.e. polling universe): We looked at polls surveying all adults in the United States, only registered voters, or “likely voters” (modeled on their likelihood to vote in the next election). The “Adult Population” serves as the baseline for this variable, with the table below showing the effects of a “Registered Voter Population” and a “Likely Voter Population” relative to that baseline.
- Survey mode: This variable takes into account how a survey is conducted: through a live phone interview, a self-administered online questionnaire, or a mix of interactive voice response and online surveys (IVR/Online). We limit our scope to these three survey modes. As with the previous variable, we have a baseline, “Live Phone” polls, and the regression table reports the effects of “IVR/Online” and “Online” polls relative to it.
- Days since the inauguration: Calculated as the days between a poll’s end field date and January 20th, 2017.
- Poll field time: Calculated as the difference in days between the start and end date of a poll’s period in the field.
- No opinion percentage: Calculated by subtracting the percent approving and disapproving of Trump from 100, leaving us with people who weren’t sure or had no opinion about Trump in a poll.
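The derived variables in the list above can be sketched in a few lines of Python. This is only an illustration of the definitions, not the code we actually ran: the function name, the field names, and the example poll (45% approve, 50% disapprove, fielded February 1st to 4th) are all hypothetical.

```python
from datetime import date

# Inauguration date used as the reference point for the time trend.
INAUGURATION = date(2017, 1, 20)


def derive_poll_variables(approve, disapprove, start, end):
    """Compute the derived variables for one poll (hypothetical helper)."""
    return {
        # Net approval: % approving minus % disapproving.
        "net_approval": approve - disapprove,
        # Days between the end of the field period and the inauguration.
        "days_since_inauguration": (end - INAUGURATION).days,
        # Length of the field period in days.
        "field_time": (end - start).days,
        # Whatever is left of 100% after approve and disapprove.
        "no_opinion_pct": 100 - approve - disapprove,
    }


# Made-up poll, for illustration only.
example = derive_poll_variables(45, 50, date(2017, 2, 1), date(2017, 2, 4))
print(example)
```

The population and mode variables would enter the regression as indicator (dummy) variables, with adults and live phone polls as the omitted baselines.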
|Variable|Net Approval (1)|Approve Pct. (2)|
|---|---|---|
|Registered Voter Population (Relative to Adults)|5.209***|2.604***|
|Likely Voter Population (Relative to Adults)|5.108|2.554|
|IVR/Online Mode (Relative to Live Phone)|9.044**|4.522**|
|Internet Mode (Relative to Live Phone)|7.777***|3.888***|
|Days Since Inauguration|-0.269***|-0.135***|
|Poll Field Time|-0.744**|-0.372**|
|No Opinion Pct.|-0.117|-0.559***|
|Residual Std. Error (df = 87)|4.096|2.048|
|F Statistic (df = 7; 87)|38.277***|69.823***|

Note: *p<0.1; **p<0.05; ***p<0.01
Both models come up with several statistically significant independent variables, and they explain a large amount of the variation in net approval (74%) and approve percentage (84%) across polls. In model 1, predicting net approval, polls that survey registered voters show a net approval 5.2 points better for Trump than polls that survey all adults. This suggests that when you narrow the sampled population from the entire public to only those registered to vote, you end up with respondents more favorable to Trump. Even bigger effects appear for the mode variable. Relative to live phone surveys, IVR/online polls are a net 9.0 points and internet polls a net 7.8 points more favorable to Trump. This makes the early mode effect in Trump approval rating polls very clear: surveys conducted online and without a live interviewer result in much better net approval ratings for Trump than surveys conducted over the phone by live interviewers.
The variable for days since the inauguration is also statistically significant, but in the negative direction: with each day we get further away from the inauguration, Trump’s net approval gets 0.27 points worse. This makes sense given that events during his presidency have likely tarnished his image rather than improved it, with more of them accumulating as his presidency has progressed. The variable for the number of days a poll was in the field is also significant and negative, which would indicate that the longer a poll was fielded, the worse Trump’s net approval came out. However, it’s hard to see what actual mechanism would cause this, and it’s likely that this variable picks up the effect of another variable (e.g. survey quality), so this significant effect is not very meaningful.
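To see how the coefficients combine, here is a small sketch that adds up the model 1 effects for a given poll profile. The coefficients are copied from column 1 of the table; the helper name `expected_shift` and the example profiles are hypothetical. It assumes the additive linear model holds and reports only differences relative to the baseline (a live phone poll of adults), since the table lists no intercept.

```python
# Net-approval coefficients copied from column 1 of the regression table.
NET_COEF = {
    "registered_voters": 5.209,   # relative to all adults
    "likely_voters": 5.108,       # relative to all adults (not significant)
    "ivr_online": 9.044,          # relative to live phone
    "internet": 7.777,            # relative to live phone
    "days_since_inauguration": -0.269,
    "field_time_days": -0.744,
    "no_opinion_pct": -0.117,
}


def expected_shift(profile):
    """Sum coefficient * value over the characteristics in `profile`."""
    return sum(NET_COEF[name] * value for name, value in profile.items())


# An internet poll of registered voters vs. a live phone poll of adults,
# holding everything else equal.
online_rv = expected_shift({"registered_voters": 1, "internet": 1})

# The same kind of poll ending ten days later, holding everything else equal.
ten_days_later = expected_shift({"days_since_inauguration": 10})

print(round(online_rv, 3))       # 12.986
print(round(ten_days_later, 2))  # -2.69
```

In other words, under this model an internet poll of registered voters would be expected to show a net approval about 13 points higher than an otherwise identical live phone poll of adults.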
The second model regresses approval percentage, rather than net approval, on all the aforementioned predictors. The same significant effects (coefficients) result and are in the same direction as those in model 1: registered voter populations, IVR/online survey modes, internet-only survey modes, fewer days since the inauguration, and shorter field periods result in higher percentages approving of Trump. The variable for no opinion percentage comes up as significant and negative, but this is an artifact of it being related to the dependent variable in this model; approve % and 100 – (approve % + disapprove %) are part of the same 100% of all respondents, so an increase in one tends to come with a decrease in the other.
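The close match between the two columns is not a coincidence, and a short derivation makes it explicit. The symbols below are our own shorthand (they do not appear in the regression output): A is approve %, D is disapprove %, and N is the no opinion percentage as defined earlier. Because N is a regressor in both models and least squares is linear in the dependent variable, the two sets of coefficients are tied together exactly:

```latex
\begin{aligned}
N &= 100 - A - D
  \;\Longrightarrow\; \text{Net} = A - D = 2A + N - 100, \\[4pt]
\hat\beta^{\text{net}}_{j} &= 2\,\hat\beta^{\text{app}}_{j}
  \quad \text{for every predictor } j \text{ other than } N, \\[4pt]
\hat\beta^{\text{net}}_{N} &= 2\,\hat\beta^{\text{app}}_{N} + 1 .
\end{aligned}
```

The table agrees up to rounding: 2 × 2.604 = 5.208 against 5.209 for registered voters, 2 × (-0.559) + 1 = -0.118 against -0.117 for no opinion percentage, and the residual standard error doubles as well (2 × 2.048 = 4.096). So the two models carry the same information about population, mode, and timing effects, and the only real difference is in how the no opinion coefficient should be read.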
Survey House Effects
Evidence of these mode, population, and period effects is not new. Where we add a new layer of understanding is in calculating survey house effects below.
At this early stage in Trump’s presidency, there aren’t as many approval rating polls to evaluate as we would like. There are currently 34 from Gallup and 23 from Rasmussen, but no other pollster has conducted more than five polls asking about Trump approval. This presents a problem, as any house effects we calculate are based on a small sample of polls from a given pollster. Survey house effects are likely fairly variable in these first few months of the Trump era, and effects that appear at this point could easily change over the course of the next few months. It’s important to keep this caveat in mind when viewing the house effect calculations below: they give a good early picture of house effects, but not as clear a signal as we would get in a few months. That being said, here’s how we carried out this process.
First, we downloaded approval rating data for Trump from the HuffPost Pollster website. Including only data for a poll’s entire population (and not just Republicans or Democrats, for example), we created 17 different variables for the 17 different pollsters who have asked about the president’s approval rating. These variables individually went into different regressions predicting Trump approval percentage (or his net approval rating), along with population (adults–the baseline–registered voters, and likely voters), mode (live phone–the baseline–Internet, and IVR/Online), days since the inauguration, poll field time, and the no opinion percentage (as described before). In this way, for each pollster, we were able to make all other polls the baseline in a regression, and then calculate the effect of each pollster on Trump approval (or net rating) relative to a baseline of all other polls. We term this effect–the coefficient from each different regression for each different pollster–the “house effect” of a given pollster.
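The one-pollster-versus-the-rest procedure described above can be sketched as follows. This is a simplified illustration under loud assumptions, not our actual code: the data are synthetic (built so that the hypothetical “House A” sits exactly 3 points above the other pollsters), only days since the inauguration is included as a control, and `ols` and `house_effects` are hypothetical helpers written in plain Python. The full models also include population, mode, field time, and no opinion percentage.

```python
def ols(X, y):
    """Least squares via the normal equations (X'X) b = X'y, Gauss-Jordan."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def house_effects(polls, outcome="net_approval"):
    """Run one regression per pollster; all other polls form the baseline."""
    effects = {}
    y = [float(p[outcome]) for p in polls]
    for house in sorted({p["pollster"] for p in polls}):
        # Columns: intercept, indicator for this pollster, days control.
        X = [[1.0,
              1.0 if p["pollster"] == house else 0.0,
              float(p["days"])] for p in polls]
        # The coefficient on the indicator is the house effect.
        effects[house] = ols(X, y)[1]
    return effects


# Synthetic polls: net = -2 + 3 * [pollster is House A] - 0.25 * days.
polls = [
    {"pollster": name, "days": d,
     "net_approval": -2 + (3 if name == "House A" else 0) - 0.25 * d}
    for name, days in [("House A", (1, 5, 9)),
                       ("House B", (2, 6, 10)),
                       ("House C", (3, 7, 11))]
    for d in days
]

effects = house_effects(polls)
for house, effect in effects.items():
    print(f"{house}: {effect:+.2f}")
```

Because each regression compares one pollster against all the others pooled together, the estimates for different pollsters are not independent of one another, which is one more reason to read the small-sample house effects cautiously.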
The graph below plots the survey house effect for 17 different pollsters when using net rating as the dependent variable in 17 different regressions, from greatest effect against Trump in blue to greatest effect in favor of his net rating in red:
After controlling for various different survey characteristics, PPP polls have the strongest in-house effect against Trump out of all polls measuring approval rating of the new president. On the other end of the spectrum, Rasmussen polls have the strongest in-house effect in favor of Trump in terms of producing greater net approval ratings.
Using approval percentage as the dependent variable in this process doesn’t change much–only the range in coefficient values–as it tells the same story as the above graph:
The table below lays out the survey house effects (i.e. regression coefficients) for each of the 17 pollsters, for both net approval and approval percentage. Let’s use net approval as an example of how to interpret these numbers. Rasmussen polls have an in-house bias that makes Trump net approval 13.8 points better relative to all other polls. Meanwhile, PPP has the opposite effect: relative to all other pollsters, its in-house bias is 13.8 net points worse for Trump. Gallup, at a net -0.3 points, is currently the pollster with the smallest in-house survey effect in either direction. The effects for all the other pollsters follow the same scheme: negative values indicate a survey house bias against Trump, and positive values indicate a survey house bias in favor of Trump.
As mentioned before, a lot of these calculations are tentative. Outside of Gallup and Rasmussen, pollsters don’t have large enough samples of approval ratings for us to draw confident conclusions about house effects. This should just serve as a guide for what to look out for, and which polls have shown early signs of in-house biases. At the moment, Gallup, which has the smallest house effect, is the clearest indicator of Trump’s approval rating, so it might be worth putting more stock in the polls it releases.