I was asked to describe the problems with political polls, and this is a great year for showing the problems in predicting from opinion polls. Projecting itself isn’t the problem. We take partial duration series (like flood data) and project the likelihood of larger and smaller events occurring. In my 70 years on the planet, I’ve seen a couple of hundred-year floods on the same river, and it isn’t a big deal. When you project a hundred-year occurrence from 38 years of data, it is only a question of how wrong you’re going to be. Poker odds are easy: there are only 52 cards (unless you play with a joker). A pair of dice has only 36 possible combinations, which add up to just 11 different totals. The potential combinations of weather and climate during our planet’s existence aren’t quite infinite, but they approach it.
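To put a number on why a couple of hundred-year floods in one lifetime isn’t a big deal, here is a rough sketch. It assumes a hundred-year flood means a 1% chance in any given year and that each year is independent of the others. Both are textbook simplifications, not something measured on any particular river, and the function name is just for illustration.

```python
from math import comb

def prob_at_least(k, years, annual_prob=0.01):
    """Chance of seeing k or more events in `years` independent years."""
    # Add up the binomial probabilities of seeing fewer than k events...
    p_fewer = sum(
        comb(years, i) * annual_prob**i * (1 - annual_prob) ** (years - i)
        for i in range(k)
    )
    # ...and take the complement.
    return 1 - p_fewer

# Assumption: a "hundred-year flood" has a 1% chance each year.
p_one_or_more = prob_at_least(1, 70)
p_two_or_more = prob_at_least(2, 70)

print(f"At least one hundred-year flood in 70 years: {p_one_or_more:.1%}")
print(f"At least two hundred-year floods in 70 years: {p_two_or_more:.1%}")
```

Under those assumptions, one such flood in 70 years is about a coin flip, and two or more comes out to roughly one chance in six. The harder part is the 1% figure itself, which has to be estimated from a record much shorter than a hundred years.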
In January 2016, Gallup announced “Democratic, Republican Identification Near Historical Lows,” and explained that 26% identified as Republicans and 29% as Democrats. On January 16, 2020, 30% identified as Republicans and 27% as Democrats. Gallup’s most recent figures, from September 14, show 28% identifying as Republicans and 27% as Democrats. If I start with a good model based on the 2016 election results, I have a problem in 2020: the party mix the model was built on has shifted.
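Here is a small sketch of why that shift matters. The party shares are the Gallup figures quoted above, with everyone else lumped into an "other" group (my assumption). The support rates for a generic Candidate A are invented purely for illustration; they are not poll results.

```python
# Party identification shares from the Gallup figures quoted above.
# "other" is simply the remainder (an assumption for this sketch).
party_mix_2016 = {"republican": 0.26, "democrat": 0.29, "other": 0.45}
party_mix_2020 = {"republican": 0.30, "democrat": 0.27, "other": 0.43}

# HYPOTHETICAL support for a generic Candidate A within each group.
# These numbers are made up for illustration only.
support_for_a = {"republican": 0.90, "democrat": 0.08, "other": 0.45}

def weighted_estimate(party_mix, support):
    """Overall support when each group is weighted by its population share."""
    return sum(party_mix[group] * support[group] for group in party_mix)

estimate_2016_mix = weighted_estimate(party_mix_2016, support_for_a)
estimate_2020_mix = weighted_estimate(party_mix_2020, support_for_a)

print(f"Weighted with the 2016 party mix: {estimate_2016_mix:.1%}")
print(f"Weighted with the 2020 party mix: {estimate_2020_mix:.1%}")
print(f"Difference: {estimate_2020_mix - estimate_2016_mix:+.1%}")
```

With these made-up support rates, the same survey answers come out about two and a half points apart depending on which year’s party mix does the weighting. That is more than enough to flip a close race.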
For political polls, our universe consists of registered voters – but that gets to be a problem: “The Public Interest Legal Foundation (PILF) found that 244 counties across the United States exceed 100% voter registration. Counties in 28 states plus the District of Columbia and Alaska have more voters registered than adults living in those jurisdictions.
After a review of records submitted to the federal government, The Public Interest Legal Foundation (PILF) discovered 244 counties in which voter registration levels exceed the number of living adults in the jurisdiction. Additionally, 279 counties have registration rates ranging from 95%-99%, which PILF determines are ‘implausibly high.’”
Polling is based on “best available data.” It is a coincidence that the initials are BAD. Starting from poor data makes it hard to develop a way to project with accuracy, and it’s hard enough anyway.
California has more immigrants than any other state. In 2017, 27% of California’s residents were foreign born, and a little over half of them are US citizens. That means about one in eight contacts in California is a non-citizen and not eligible to vote. If you survey Montana, 2% of the residents are immigrants, and 58% of those are naturalized citizens. Less than 1 percent of Montana residents aren’t citizens, so few calls there reach non-citizens. It isn’t easy to develop a national model that projects surveys accurately when the share of ineligible contacts varies this much from state to state.
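The arithmetic behind those two states can be sketched in a few lines. The 27%, 2% and 58% figures come from the paragraph above. The 52% naturalized share for California is my stand-in for "a little over half", and the sketch assumes a pollster’s contacts are a random draw from all residents.

```python
def non_citizen_share(foreign_born_share, naturalized_share):
    """Share of all residents who are foreign born and not naturalized."""
    return foreign_born_share * (1 - naturalized_share)

# California: 27% foreign born; "a little over half" are citizens.
# ASSUMPTION: 52% is used here as a stand-in for "a little over half".
california = non_citizen_share(0.27, 0.52)

# Montana: 2% foreign born, 58% of them naturalized (figures from the text).
montana = non_citizen_share(0.02, 0.58)

for state, share in (("California", california), ("Montana", montana)):
    print(f"{state}: {share:.1%} non-citizens, about 1 contact in {1 / share:.0f}")
```

California comes out near 13% and Montana under 1%, so a screening rule tuned for one state would be well off in the other.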
And then there are the folks who lie to pollsters. In 2012, polls in South Dakota showed strong support for legislation that would limit abortion access, but the vote turned out the other way. It was the first time I encountered the effect that is now called “the shy Trump voter.” When you think about it, it isn’t particularly rational to believe that the guy who calls you and interrupts dinner has your privacy as a main concern. On that issue, it looks like 3% or more of the survey respondents weren’t truthful. Face it, there was more than a zero chance that the voice on the other end of the phone might report your comments back to your Aunt Sally!
I am glad I never had to make a living polling and predicting elections. It’s easy to look at the data and predict Trump will carry Montana and Biden will carry California. It’s a bit riskier to project Florida, or North Carolina, or Ohio.