
Godhumor

(6,437 posts)
Mon Nov 9, 2015, 12:57 AM

Yet Another Primer: When polls get it wrong part 1--your sample fails you

Traditional disclaimer: I am a statistical analyst in real life who works in big data using many of the same techniques found in the world of polling; I am also a Clinton supporter. However, these primers are intended to be nonpartisan and educational. Any examples I make using real candidates are for illustrative purposes only and are not subtle political commentary.

I was thinking about what other topics I could cover in these primers, and it occurred to me that it may be worthwhile to talk about how polling can actually miss the mark. I've talked a bit about this before, and I will crib some of my notes from those other posts, but I want to do something fairly original that deals with cases where the poll results themselves are wrong, but not due to sneaky moves on the part of the pollster (i.e., the methodology was properly followed, but the poll was still a loser). I am going to divide this into two categories: "micro" misses, where individual polls are wrong, and "macro" misses, where the majority of polls are wrong. I will get into examples of both in future posts, but let's start with a micro issue for our first go-around.

There's a 5% chance your sample skunked you

I am going to begin with a repeat topic, because this seems to be an area of contention. Scientific polls can be wrong. And they can be wrong because the sample is not representative of the population being examined. Pollsters generally run their analyses at a 95% confidence level, which leaves a 5% chance that a given random sample misrepresents the population. What this means is that out of every 20 polls done on a population, 1 of them, on average, will have a misrepresentative random sample. Sometimes these are small misses, but other times they're major. For example, say a sample contains 50 African American respondents. It is theoretically possible that the random sample coincidentally gets 45 of them who are backing Donald Trump. Now, in real life, it is very unlikely that Trump is getting 90% support from this demographic, but, if the sample was truly random, the pollster needs to follow their methodology. Soon enough there will be a press release about how African Americans have embraced Trump. That result would easily be considered a 1-in-20 situation.
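If you want to see that 5% in action, here is a minimal simulation sketch. Every number in it (the true support, the sample size, the trial count) is a made-up input, not data from any real poll; the point is just that roughly 1 sample in 20 lands outside the standard 95% band by pure chance:

```python
import math
import random

# Assumed inputs for illustration only -- not from any real poll
TRUE_SUPPORT = 0.50   # the candidate's actual support in the population
SAMPLE_SIZE = 1000    # a typical national poll
TRIALS = 10_000       # number of independent polls to simulate

# Textbook 95% margin of error for a proportion
moe = 1.96 * math.sqrt(TRUE_SUPPORT * (1 - TRUE_SUPPORT) / SAMPLE_SIZE)

random.seed(1)
bad_samples = 0
for _ in range(TRIALS):
    hits = sum(random.random() < TRUE_SUPPORT for _ in range(SAMPLE_SIZE))
    estimate = hits / SAMPLE_SIZE
    # A "bad" sample lands outside the 95% band around the truth
    if abs(estimate - TRUE_SUPPORT) > moe:
        bad_samples += 1

print(f"Margin of error: +/-{moe:.1%}")
print(f"Misrepresentative samples: {bad_samples / TRIALS:.1%}")  # ~5%, about 1 in 20
```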

I will also give you one example from my work. I analyze literally hundreds of thousands of data points in a day, so a lot of what I do is sample creation. Just last week I pulled a sample that underestimated a cost metric by 45%, because a majority of the random sample managed to fall in a "valley" where an asset was being used in a non-normal way. Completely random chance, but the data points picked just happened to be in the same general area.

Now, here is the difference between the two examples: I recognized that my sample was way off, and I was able to do something about it. In my case, I pulled multiple samples to confirm or reject the findings of the original sample. Pollsters, however, don't have that luxury, as pulling new samples from the population costs both time and money. As a result, they have to follow their established methodology and release the poll.
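For the curious, here is a minimal sketch of that re-pull check. The cost figures and the "valley" are invented stand-ins for my actual data; the point is the technique, not the numbers:

```python
import random
import statistics

# Invented population: most costs cluster near 100 units, but 5% of
# usage sits in a cheap "valley" around 20 units
random.seed(7)
population = ([random.gauss(100, 15) for _ in range(95_000)]
              + [random.gauss(20, 5) for _ in range(5_000)])
true_mean = statistics.mean(population)

def sample_mean(n=200):
    """Estimate the cost metric from one random sample."""
    return statistics.mean(random.sample(population, n))

first_pull = sample_mean()
print(f"True mean: {true_mean:.1f}   First sample: {first_pull:.1f}")

# The luxury an analyst has that a pollster doesn't: re-pull several
# samples and see whether the first estimate was an unlucky draw
repulls = [sample_mean() for _ in range(10)]
print("Re-pulls:", [round(m, 1) for m in repulls])
# If the first estimate sits well outside the spread of the re-pulls,
# the original sample was probably a bad draw, not a real finding
```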

So, statistically speaking, an average of 1 out of every 20 polls on a common subject by the same pollster is wrong. Sometimes, just sometimes, the numbers can lie.

Sheepshank

(12,504 posts)
2. Thanks for the info
Mon Nov 9, 2015, 02:31 AM

I'd always thought that the margin of error, let's say it's 5%, meant that an actual percentage quoted could vary up or down 5%. Whaddya know, learned something new.

Godhumor

(6,437 posts)
4. Margin of Error is a different topic
Mon Nov 9, 2015, 07:46 AM

And MOE is essentially what you just described. There just happens to be a second type of error possible, which is the confidence level the pollster has that the sample itself is representative, and that is almost always set at 95%. MOE is then essentially, "If the sample is correct, the real results are somewhere within this band."
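To make the two numbers concrete, here is a minimal sketch using the textbook MOE formula for a proportion. The sample size and the 50% figure are assumptions for illustration, not from any particular poll:

```python
import math

# Assumed inputs for illustration: 1,000 respondents, candidate at 50%
n = 1000
p = 0.50
z = 1.96  # z-score behind the near-universal 95% confidence level

moe = z * math.sqrt(p * (1 - p) / n)
print(f"MOE: +/-{moe:.1%}")  # about +/-3.1 points

# Read the two numbers together: IF this sample is one of the 19 in 20
# that are representative, the true figure is within the MOE band.
# The 95% confidence level is the separate bet that it IS representative.
```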

It is rarely acknowledged in the industry, however, that a sample can be bad.

jeff47

(26,549 posts)
8. Don't know if you're going to cover this later
Mon Nov 9, 2015, 12:55 PM

but "likely voter" screens are one such way to get a bad sample.

For example, there are some polls from Iowa where the "likely voter" screen excludes anyone who did not participate in the 2008 caucuses. That means they throw out everyone under 26 (you had to be 18 in 2008). It also excludes people who were young or indifferent in 2008 but are more involved now.
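A quick back-of-the-envelope sketch shows why that kind of screen skews things. Every percentage in it is invented, since I don't have the pollster's actual crosstabs:

```python
import random

# Invented numbers: say under-26 voters back Candidate A at 60%,
# everyone else at 45%, and under-26 voters are 15% of the electorate
random.seed(3)
electorate = ([("under26", random.random() < 0.60) for _ in range(15_000)]
              + [("older", random.random() < 0.45) for _ in range(85_000)])

def support(voters):
    return sum(backs_a for _, backs_a in voters) / len(voters)

# The caucus-participation screen silently drops the under-26 group
screened = [v for v in electorate if v[0] != "under26"]

print(f"True support for A: {support(electorate):.1%}")  # ~47%
print(f"After the screen:   {support(screened):.1%}")    # ~45%
# This isn't random noise like a 1-in-20 bad sample -- the screen bakes
# in a systematic skew against whoever leads among the excluded voters
```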

My apologies if you were just wanting to cover random anomalies in "part 1".

Godhumor

(6,437 posts)
9. Yup, voter screens are on my list
Mon Nov 9, 2015, 01:10 PM

Whether I get to them depends on interest in the posts. So, if not, this is a good reminder. Thanks.
