2016 Postmortem
Yet Another Primer: When polls get it wrong, part 1--your sample fails you
Traditional disclaimer: I am a statistical analyst in real life who works in big data using many of the same techniques found in the world of polling; I am also a Clinton supporter. However, these primers are intended to be nonpartisan and educational. Any examples I make using real candidates are for illustrative purposes only and are not subtle political commentary.
I was thinking about what other topics I could cover in these primers, and it occurred to me that it may be worthwhile to talk about how polling can actually miss the mark. I've talked a bit about this before, and I will crib some of my notes from those other posts, but I want to do something fairly original that deals with cases where the poll results themselves are wrong but not due to sneaky moves on the part of the pollster (i.e. the methodology was properly followed, but the poll was still a loser). I am going to divide this into two categories: "micro" misses, where individual polls are wrong, and "macro" misses, where the majority of polls are wrong. I will get into examples of both of these in future posts, but let's start with a micro issue for our first go-around.
There's a 5% chance your sample skunked you
I am going to begin with a repeat topic, because this seems to be an area of contention: scientific polls can be wrong, and they can be wrong because the sample is not representative of the population being examined. Pollsters generally set their confidence level at 95%, which leaves a 5% limit. What this means is that out of every 20 polls done on a population, 1 of them, on average, will have a misrepresentative random sample. Sometimes these are small misses, but other times they are major. For example, if a sample contains 50 African American respondents, it is theoretically possible that the random draw coincidentally gets 45 of them who are backing Donald Trump. Now, in real life, it is very unlikely that Trump is getting 90% support from this demographic, but if the sample was truly random, the pollster needs to follow their methodology. Soon enough there will be a press release about how African Americans have embraced Trump. That result would easily be considered a 1-in-20 situation.
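If you want to see the 1-in-20 figure fall out on its own, here is a quick simulation sketch. The numbers are hypothetical (a candidate with 50% true support, 1,000-person polls); the point is just that even perfectly random sampling puts roughly 5% of polls outside their own margin of error.

```python
import random

random.seed(42)

POP_SUPPORT = 0.50   # hypothetical true support in the population
N = 1000             # respondents per simulated poll
TRIALS = 10_000      # number of simulated polls

# Standard 95% margin of error for a proportion: 1.96 * sqrt(p*(1-p)/n)
moe = 1.96 * (POP_SUPPORT * (1 - POP_SUPPORT) / N) ** 0.5

misses = 0
for _ in range(TRIALS):
    # One poll: N random respondents, each backing the candidate with
    # probability POP_SUPPORT
    sample_support = sum(random.random() < POP_SUPPORT for _ in range(N)) / N
    if abs(sample_support - POP_SUPPORT) > moe:
        misses += 1

print(f"Margin of error: {moe:.3f}")
print(f"Polls whose sample missed by more than the MOE: {misses / TRIALS:.1%}")
# Roughly 1 in 20 simulated polls lands outside the margin of error
```

Nothing about any individual poll tells you it was one of the unlucky ones; the 5% only shows up across many repetitions, which is exactly the problem.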
I will also give you one example from my work. I analyze literally hundreds of thousands of data points in a day, so a lot of what I do is sample creation. Just last week I pulled a sample that underestimated a cost metric by 45% because a majority of the random sample managed to fall in a "valley" where an asset was being used in a non-normal way. Completely random chance, but the data points picked just happened to be in the same general area.
Now, here is the difference between the two examples: I recognized that my sample was way off, and I was able to do something about it. In my case, I pulled multiple samples to confirm or reject the findings of the original sample. Pollsters, however, don't have that luxury, as pulling new samples from the population costs both time and money. As a result, they have to follow their established methodology and release the poll.
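The re-pull check I describe can be sketched like this (the cost figures and sample sizes are made up for illustration): take the original sample's estimate, then draw many more independent samples and see whether that first estimate sits far outside the pack.

```python
import random
import statistics

random.seed(0)

# Hypothetical population of cost figures with a true mean near 100
population = [random.gauss(100, 30) for _ in range(100_000)]
true_mean = statistics.mean(population)

def poll(n=50):
    """One random sample's estimate of the population mean."""
    return statistics.mean(random.sample(population, n))

suspect = poll()  # the original sample's estimate

# Re-pull many independent samples to see whether the suspect estimate
# is typical or an outlier among them
checks = [poll() for _ in range(200)]
z = (suspect - statistics.mean(checks)) / statistics.stdev(checks)
print(f"true mean {true_mean:.1f}, suspect estimate {suspect:.1f}, z-score {z:+.1f}")
# A z-score far beyond about +/-2 would flag the original sample as unrepresentative
```

A pollster effectively gets one draw of `poll()` per survey, which is why this kind of sanity check is a luxury of big-data work, not polling.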
So, statistically speaking, an average of 1 out of every 20 polls on a common subject by the same pollster is wrong. Sometimes, just sometimes, the numbers can lie.
cosmicone (11,014 posts)
Sheepshank (12,504 posts)
I'd always thought that the margin of error, let's say it's 5%, meant that an actual percentage quoted could vary up or down 5%. Whaddya know, learned something new.
Godhumor (6,437 posts)
And the MOE is essentially what you just described. There just happens to be a second type of error possible, which is the confidence level: how sure the pollster is that the sample itself is a representative sample, and that is almost always set at 95%. The MOE then essentially says, "If the sample is good, the real results are somewhere within this band."
It is rarely acknowledged in the industry, however, that a sample can be bad.
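For anyone curious how the two numbers fit together: the 95% confidence level fixes the z-value (1.96), and the sample size then determines the MOE. A minimal sketch, using a hypothetical 1,000-person poll with a candidate at 50%:

```python
import math

def margin_of_error(p, n, z=1.96):
    """Margin of error for a sample proportion p with sample size n.
    z=1.96 corresponds to the usual 95% confidence level."""
    return z * math.sqrt(p * (1 - p) / n)

# A typical poll: 1,000 respondents, candidate at 50%
print(f"{margin_of_error(0.50, 1000):.1%}")  # about 3.1%
```

Notice the confidence level lives inside the formula as z, which is why the two error types travel together: the familiar "+/-3 points" band is only meaningful in the 19-of-20 cases where the sample itself was good.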
oasis (49,388 posts)
Godhumor (6,437 posts)
Agschmid (28,749 posts)
JaneyVee (19,877 posts)
jeff47 (26,549 posts)
But "likely voter" screens are one such way to get a bad sample.
For example, there are some polls from Iowa where the "likely voter" screen excludes anyone who did not participate in the 2008 caucuses. That means they throw out everyone under 26 (you had to be 18 in 2008), and it also excludes people who were young or indifferent in 2008 but are more involved now.
My apologies if you were just wanting to cover random anomalies in "part 1".
Godhumor (6,437 posts)
Whether I get to them depends on interest in the posts. So, if not, this is a good reminder. Thanks.