I just re-read this thing, and... well... HOLY CRAP! This report sends up a HUGE red flag. Statistics aren't proof, but we're getting damn close to it now.
----
A poll should be a random sample representative of the whole data set. The measurement error for a random sample is approximately normally distributed, and for our sample -- comparing a poll to a dataset of votes -- it should be clustered around zero, with deviations equally likely to have occurred in both directions (i.e., errors favoring Bush and errors favoring Kerry should have been equally likely). Sort of like this:
Non-normal measurement errors indicate a problem with the sample. If the poll's sample collection were bad, and Mitofsky had failed to draw a random sample and weight it properly, then the measurement error would not have been normally distributed. It would have been lopsided:
Now here is what OUR measurement error curve looks like:
We have a normally distributed measurement error, as the paper shows, but it is not clustered around zero. It's shifted left.
I need to do some more research on this, but intuitively, if we have a normal measurement error distribution that is centered around some number other than zero, and we're expecting it to be centered around zero, that suggests there is a problem with the dataset. That's especially true if our sample (the poll) is not a set of values drawn EXPLICITLY from the dataset.
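For anyone who wants to poke at that intuition, here is a rough Python sketch of one way to check whether a batch of errors is centered on zero. It is not from the paper. The function names, the spread, and the shift of -1.0 are all made up for illustration, and the errors are simulated rather than real poll data.

```python
import math
import random
import statistics


def center_z_score(errors):
    """Return how many standard errors the mean of `errors` lies from zero."""
    n = len(errors)
    mean = statistics.fmean(errors)
    std_err = statistics.stdev(errors) / math.sqrt(n)
    return mean / std_err


def simulated_errors(center, spread, count, seed):
    """Draw `count` normally distributed errors (all parameters are hypothetical)."""
    rng = random.Random(seed)
    return [rng.gauss(center, spread) for _ in range(count)]


if __name__ == "__main__":
    # Hypothetical case 1: errors centered on zero, as a sound sample should give.
    centered = simulated_errors(center=0.0, spread=1.0, count=1000, seed=1)
    # Hypothetical case 2: same bell shape, but shifted left by one unit.
    shifted = simulated_errors(center=-1.0, spread=1.0, count=1000, seed=1)
    print("centered z:", round(center_z_score(centered), 2))
    print("shifted z:", round(center_z_score(shifted), 2))
```

A z-score near zero means the errors look centered where we expect. A z-score far from zero means the whole curve has moved, even though its shape is still a normal bell.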
See, ordinarily, you'd take a random sampling of measurements from a very large dataset by directly measuring your variable. The poll technically doesn't do this, because whether one believes in fraud or not, asking 12,000 people how they voted is NOT the same thing as directly observing a random sample of 12,000 ballots being scanned and tallied. It's not really sampling the same dataset. The dataset of all tallied votes (let's call it Set T) and the dataset of all 115 million voters' responses to a poll (which we'll call Set P), had such a thing been taken, are not the same. If we can have confidence in the counting, then they should represent the same data, but they aren't the same collection. The poll is sampling one and claiming that it represents both.
Clearly, this poll did not represent the set of tabulated votes--Set T--but it was normally distributed, which means it was a random sample of Set P. That implies that Set P was shifted one standard deviation away from Set T. So again, we either have this bizarre nationwide pattern of Bush voters being quiet (or maybe even lying about how they voted), or we have a problem with the counting. The paper offers up very good evidence against the idea of Bush voters being quiet or lying about their votes. The biggest errors favoring Bush occurred in strongly Republican areas where they would not have any reason to feel embarrassed about their votes, had they voted for Bush.
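The Set P versus Set T idea can be sketched as a small simulation. This is only a toy under assumptions I'm making up: the 51% and 48% shares, the 1,000 respondents per poll, and the function names are all hypothetical and are not taken from the paper or from the actual poll.

```python
import random
import statistics


def poll_share(true_share, respondents, rng):
    """Share of simulated respondents who report voting for candidate A."""
    hits = sum(1 for _ in range(respondents) if rng.random() < true_share)
    return hits / respondents


def tally_minus_poll(share_p, share_t, polls, respondents, seed):
    """Errors (tallied share minus poll share) across many simulated polls.

    share_p: candidate A's share in Set P (what voters would report).
    share_t: candidate A's share in Set T (what was tallied).
    """
    rng = random.Random(seed)
    return [share_t - poll_share(share_p, respondents, rng) for _ in range(polls)]


if __name__ == "__main__":
    # Hypothetical: Set P and Set T agree, so errors should center on zero.
    same = tally_minus_poll(share_p=0.50, share_t=0.50,
                            polls=300, respondents=1000, seed=7)
    # Hypothetical: Set P and Set T differ by three points.
    apart = tally_minus_poll(share_p=0.51, share_t=0.48,
                             polls=300, respondents=1000, seed=7)
    for label, errors in (("same", same), ("apart", apart)):
        print(label, "mean:", round(statistics.fmean(errors), 4),
              "stdev:", round(statistics.stdev(errors), 4))
```

In both cases every poll is an honest random sample of Set P, so the spread of the errors comes out about the same. The only thing that changes when the two sets differ is where the curve is centered, which is the pattern described above. The simulation can't say WHY the sets differ; it only shows what that difference would look like.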
//Edit final name of that paragraph. Typo. Bush, not Kerry. :P
---
Sorry for the long nerdy post. I hope I haven't confused anyone.