Response to KOS and Nate Silver on Polling Dispute



A commonsense, supported analysis on the KOS & Nate Silver v. Research 2000 controversy.


Regarding the Research 2000 Pile-on

(Also see the latest, Research 2000: A Closer Look at Volatility)

Richard Charnin (TruthIsAll)

Markos Moulitsas of Daily Kos and Nate Silver of fivethirtyeight.com have each taken aim at Research 2000, which did the pre-election polling for Daily Kos.

Markos cites the Research 2000 poll that found only 27% of Republicans supported gays in the military. He found this to be way too low. His evidence? Gallup and ABC polls showed over 60% GOP support. So because they are MSM, that makes them right and R2000 wrong?

He gives Republicans too much credit. If 90% of Democrats support gays in the military as do 30% of Republicans and 60% of independents, then we would have about 61% supporting in total. In any case, why does Markos go after R2000, which did the 2008 pre-election presidential polling for him?
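The 61% figure is a weighted average, so it depends on the party mix of the sample. Here is a minimal sketch of that arithmetic. The support levels come from the paragraph above; the party-identification weights are hypothetical, chosen only for illustration, and are not from R2000 or any other poll.

```python
# Support levels by party, taken from the paragraph above.
support = {"Democrat": 0.90, "Republican": 0.30, "Independent": 0.60}

# ASSUMPTION: hypothetical party-identification weights (not from the article).
party_weights = {"Democrat": 0.37, "Republican": 0.33, "Independent": 0.30}


def overall_support(weights, support_by_group):
    """Weighted average of group support, normalized by the total weight."""
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[g] * support_by_group[g] for g in weights)
    return weighted_sum / total_weight


total = overall_support(party_weights, support)
print(f"Overall support: {total:.1%}")
```

With these assumed weights the total comes out near 61%. A different party mix would shift it by a point or two in either direction.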

Nate Silver wrote that the R2000 data “feels way too clean for me”. Better clean than dirty, Nate. Would he rather see the kind of volatility that CNN and Gallup had in their tracking polls? For some reason, Wolf Blitzer always appeared perplexed whenever Gore jumped in the 2000 poll.

Nate points to a chart depicting age breakdowns in the Democratic vote share for the last 20 contests surveyed by R2K and PPP. He writes: “The age breakdowns in Research 2000's numbers are almost always close to "perfect" -- in 20 out of 20 cases, for instance, the Democrat gets a lower vote share from among 30-44 year olds than among 18-29 year olds. PPP's data, on the other hand, is *much* messier -- which is what I think we should expect when comparing small subsamples, particularly subsamples of lots of different races that are subject to different demographic patterns.”

Of course, 18-29 year olds consistently vote more Democratic than the 30-44 group. Is that news? Is 20 out of 20 cases reasonable? Let’s compute the probability that the Democratic share of the 18-29 age group would exceed their share of the 30-44 group in all 20 elections.

In 2008, Obama had 66% of the 18-29 segment and 52% of the 30-44 group. In 2004, Kerry had 56% and 48%, respectively. So in the last two elections, the Democrats averaged a 61% share of the 18-29 age group and 50% of the 30-44 group. Given these shares, the probability is virtually 100% that the Democratic 18-29 share would exceed the 30-44 share in all 20 elections. Even assuming a 56% Dem share of the younger group, the probability is 98%.
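For readers who want to try this kind of calculation themselves, here is a rough sketch using a normal approximation for the difference between two subsample shares. It is not necessarily the method used for the figures above. The subsample sizes are hypothetical (the article does not give them), the 20 polls are treated as independent, and every poll is assumed to have the same true shares. The answer is quite sensitive to the assumed subsample size.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_young_exceeds(p_young, p_older, n_young, n_older):
    """P(sampled 18-29 share > sampled 30-44 share) in one poll (normal approx.)."""
    se_diff = math.sqrt(
        p_young * (1 - p_young) / n_young + p_older * (1 - p_older) / n_older
    )
    return normal_cdf((p_young - p_older) / se_diff)


def prob_all_polls(p_young, p_older, n_young, n_older, polls=20):
    """Probability the 18-29 share is higher in every one of `polls` independent polls."""
    return prob_young_exceeds(p_young, p_older, n_young, n_older) ** polls


# ASSUMPTION: hypothetical per-age-group subsample sizes, not from the article.
for n in (100, 400, 1000):
    for p_young in (0.61, 0.56):
        p = prob_all_polls(p_young, 0.50, n, n)
        print(f"n={n:5d}  young share={p_young:.0%}  P(20 of 20)={p:.3f}")
```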

Nate also finds it strange that in the last 30 races, Democrats did better among women in all cases. Once again, I ask, is that news? I will spare you the probability analysis. He also questions the lack of volatility in R2000. It’s not “noisy” enough for his taste.

“Likewise, take a look at their Presidential tracking numbers from 2008 (http://www.dailykos.com/dailypoll/2008/11/4). They published their daily results in addition to their three-day rolling average ... and the daily results were remarkably consistent from day to day. At no point, for instance, in the two months that they published daily results did Obama's vote share fluctuate by more than a net of 2 points from day to day (to reiterate, this is for the daily results (n=~360) and not the rolling average). That just seems extremely unlikely -- there should be more noise than that.”

You want noise? OK, Nate, let’s take a look at the national pre-election polls from Sept. 10 and compare Obama’s 3-poll moving average to the 3-day R2000 tracking poll. The average change in the R2K poll was 0.67% compared to 0.99% in the pre-election poll moving average. The moving average standard deviation was 1.58% for R2K compared to 1.88% for the national polls. The absolute standard deviation of the R2K percentage change in vote share was 0.53% compared to 0.58% for all polls. Finally, the average Obama R2000 share was 50.27%. It was 49.65% in all polls.

The R2K poll shares are rounded to the nearest percent, so small changes are not reflected in the table. For instance, assume the actual R2K tracking poll results on successive days were 49.7% and 50.2%. These would both be shown as 50% by R2K and the actual 0.5% change would show as 0%. The effect is to lower the overall R2K volatility (standard deviation) to 1.58% as shown above. Therefore, the True R2K volatility must be closer to the ALL polls 1.88% figure.
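The rounding example can be checked with a few lines of code. This is only a sketch: the unrounded series below is hypothetical (it starts with the 49.7% and 50.2% values from the example and adds made-up days), and the helper names are mine, not R2K's.

```python
import statistics


def daily_changes(series):
    """Day-to-day differences in a series of poll shares."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def volatility(series):
    """Population standard deviation of the series, in percentage points."""
    return statistics.pstdev(series)


# ASSUMPTION: hypothetical unrounded daily shares; only the first two
# values (49.7 and 50.2) come from the example in the text.
unrounded = [49.7, 50.2, 50.4, 49.6, 50.3]
rounded = [round(x) for x in unrounded]

print("unrounded changes:", [f"{c:+.1f}" for c in daily_changes(unrounded)])
print("rounded changes:  ", daily_changes(rounded))
print(f"unrounded volatility: {volatility(unrounded):.2f}")
print(f"rounded volatility:   {volatility(rounded):.2f}")
```

In this series every value rounds to 50, so the published numbers show no movement at all. Note that a series straddling a .5 boundary (say 49.4 followed by 49.6) would round the other way and show a full point of movement, so how much rounding hides depends on the data.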

Nate and Markos have each denigrated exit polls. In fact, they won’t even talk about Election Fraud. Yet they fabricate faux outrage about an independent polling firm that is not the MSM. Nate was recently hired by the NY Times. Unfortunately, the Grey Lady (“All the News that’s Fit to Print”) does not report Election Fraud. In 2004-2005, Markos locked out posters who sought to present statistical and anecdotal evidence pointing to election fraud. But the Kossacks rebelled. Those “conspiracy nuts” are now allowed to post.

Obama’s recorded share was 52.9%, a 9.5 million margin. But the True Vote model indicates he had 57.5%, a 22 million margin. Want proof that fraud cost Obama 13 million? Consider this: The National Exit Poll required that there be 12 million more returning Bush voters than Kerry voters. That is not only implausible; it is mathematically impossible. There were likely 10 million more returning Kerry voters than Bush voters.

Why don’t you write about it in the Times, Nate? Or mention it the next time you get on MSNBC? On the other hand, you better not – if you want to keep that job.

Reprinted with the author's permission.
Original article:
Regarding the Research 2000 Pile-on


Michael Collins July 2, 2010 - 3:43pm
( categories: Blog Criticism )

When they don't produce the results you want to report...

Synoia July 2, 2010 - 4:16pm

When R2K give up all the data and we can sift through it then I'll know who to trust. I'm still with Kos on this one.

If they don't give up the data it means they have something to hide.

"Sí che dal fatto il dir non sia diverso."

-Dante

Sean Paul Kelley July 2, 2010 - 5:06pm

... but, within the context of the statistical arguments advanced by KOS and Silver and the analysis above, which clearly debunks it, I don't see the case that there was any deceptive practice. It's just not there.

Researchers are supposed to share data for the furtherance of knowledge. One of the biggest cases ever on sharing primary data was the 2004 exit polls for the presidential election. Mitofsky and company refused congressional requests for primary exit polling data and then, to deflect pressure, shared primary data with a chosen few. That didn't work. It created a host of questions. (As it turned out, fraud was demonstrated using just the data presented by the polling company.)

So we're at an impasse. We're told that R2000 won't share data (although I have not seen their direct response). Regardless of any corporate policy etc., the methods and data need to be released.

As for the substantive arguments about the numbers, KOS and Silver's case is very weak.

Michael Collins July 2, 2010 - 5:39pm

And when the report comes, the outcome will be devastating for one or the other in this controversy. If R2000 produces a report that is satisfactory according to neutral polling professionals, it's going to be an unhappy day on the other side. If they produce something sketchy, then it's their head on the block. Big risk to make those accusations imho.

Michael Collins July 2, 2010 - 5:45pm

Has he published on this since July 1? He got a cease and desist order, according to the president of R2000. July 1 is the last post I've seen.

Michael Collins July 6, 2010 - 2:34am

The Editor in Chief of Gallup Polling, Frank Newport, just wrote a comment on the R2000-KOS controversy. He's not taking sides. However, it seems that he's positioning the polling industry standards for research and reporting as the arbiter of this case, which makes sense.

His first comment below concerns the responsibility for quality assurance by the publisher of polling information.

In a general sense, however, any public arguments about the veracity of polling data are of concern to polling professionals -- just as allegations of plagiarism are of concern to journalists, and allegations of data fabrication in published research are of concern to scientists. ...

… I would emphasize the ultimate responsibility which rests with the entity commissioning or releasing poll data, just as a newspaper or broadcast outlet has the ultimate responsibility for what it releases or publishes. Frank Newport, Gallup Polling, July 1

This is just part of a lengthy response to the controversy by the head of R2000 polling. KOS filed suit against R2000, and R2000 filed a countersuit.

Let's also be honest here and state that many in this witch hunt are in the tank for the Kos. If they were truly fair and objective, they would allow this matter to go through the legal process. Most importantly, what would they all report if Kos did receive what they wanted? I can answer that, they would two things, first they would find a biased statistician that would find a subjective flaw in the data and two, even if the data met the satisfaction of all, our vindication would be on page 7 in a small blurb in the Metro section and Kos would not even receive a slap on the wrist from any of them. Most importantly, our reputation will never be repaired even after we are vindicated, so in that sense Kos has already been successful

This is why this will be my only response on this matter until after we are vindicated. No more inquiries or questions will be answered until after this is settled. Del Ali, TPM, July 1

Michael Collins July 3, 2010 - 12:18am

but Nate Silver's jugular instincts, not on math itself but on what appears to be the far more important question of what is most meaningful to measure, carry a lot of weight.

I certainly don't have the relevant background to assess this directly, but even as a layman I can see the problem he's talking about here:

... one relatively obvious problem they identified was the unusual movement in Obama's weekly tracking numbers. In particular, Obama's favorability number rarely remained the same from week to week in Research 2000's tracking, instead almost always moving in one direction or the other by at least a point:

Compare this to Obama's weekly approval numbers in Gallup's polling, which resembled a far more natural and indeed fairly normal distribution.

The stats moved by increments that seem quite plausible right up until you notice that the increments by which they move appear to be significantly less-randomly distributed than Gallup's. And suspiciously so - they primarily tend to move by nice, safe single points up or down - and look at that bizarre "donut hole" right in the middle. It looks as if Research 2000's tracking forgot that Obama's stats *not* moving was an option.

This seems to be part of what Silver means by "too clean" - when analyzed like this, the data appears to either have been heavily massaged or falsified outright. He's hedging on which it is, but under the circumstances I'd expect that.


"The best-informed man is not necessarily the wisest. Indeed there is a danger that precisely in the multiplicity of his knowledge he will lose sight of what is essential."

- Dietrich Bonhoeffer

Escher Sketch July 3, 2010 - 1:37am

Good response. I'll fill in an answer with "edit" tomorrow. In the meantime, check out the new link at the top on "Volatility." It's fascinating.

On your question, See this

Michael Collins July 3, 2010 - 1:55am

..."forcing" into "all odds" or "all evens" issue. If you're always forcing the responses from one category to the other, there should logically be a lot of 1% shifts in the distribution. I'd really like to know what the reason for this "forcing" is - seems unlikely to me to be related to fraud (well at least in a real simple, direct, commonly recognized way). Why, were one trying to make up data out of whole cloth, would one do something so obvious? Seems unlikely. Right now, I'm wondering whether we're not seeing the patterns of some rather "hard" smoothing [probably not actually smoothing, but some sort of post-processing].

As to the data being "too clean" - idly I wonder what would happen were one to take one's data and resample the hell out of it. I'm not a pollster (in the sense of political polling) so I don't have a good understanding of whether anyone's played around with this in the polling world, but resampling based analytical strategies are all the rage right now - wonder if some clever bunny decided that it would be an interesting basis for data presentation in a polling context (i.e., what we see isn't simple tabulation of responses, but the product of thousands of re-sampling runs based on the raw responses).

“The absence of any US-Iran bilateral channel...may have the perverse effect of reinforcing Iranian interest in progressing in the nuclear realm so that the US will be forced to take it seriously and engage it directly." ~ Richard Haass

JustPlainDave July 3, 2010 - 9:03am

In fact Silver posted a column that discusses the possibility that the human intervention demonstrated in the results is inadvertent rather than deliberate: Is Research 2000 Merely Mangling Its Data - Rather Than Fabricating It?

... Grebner et al. argue that this phenomenon could only be a reflection of human intervention - no naturally occurring statistical process could produce it. In my view, that conclusion is correct beyond the shadow of a doubt. This does not necessarily imply, however, that the human being is making up the numbers entirely. He could also be manipulating real data in a scientifically unsound way. Perhaps it feels abnormal to Ali when his raw data has not shown some change in Obama's ratings: he would therefore tweak the numbers upward or downward by a point, as he feels he has license to. In essence, he could be using real data and [e.g. inadvertently - ES] making it look fake...

I should add that I wouldn't have any idea if I'm presenting any of Silver's more compelling arguments - I just chose one that made a good visual example from my complete layman's perspective.

The example cited above - the uncanny degree of correlation between the expected outcomes and R2K's claimed captures, vs the far noisier data gathered by PPP - seems to merit an accompanying graphic too.

... Take a look at the attached chart, for example: these are the age breakdowns in the Democratic vote share for the last 20 contests surveyed by R2K and PPP, respectively. The age breakdowns in Research 2000's numbers are almost always close to "perfect" -- in 20 out of 20 cases, for instance, the Democrat gets a lower vote share from among 30-44 year olds than among 18-29 year olds. PPP's data, on the other hand, is *much* messier -- which is what I think we should expect when comparing small subsamples, particularly subsamples of lots of different races that are subject to different demographic patterns...

...

Why, were one trying to make up data out of whole cloth, would one do something so obvious? Seems unlikely. Right now, I'm wondering whether we're not seeing the patterns of some rather "hard" smoothing [probably not actually smoothing, but some sort of post-processing].

I don't disagree with your speculation here, but it's suggested that the potential answer to your first question might lie in human psychology. It's alleged that people falsifying strings of numbers don't simulate randomness very well at all - for example they tend to pick "zero" significantly less often than they pick the numbers one through nine. The answer if that proved to be the case would then be "because they couldn't help themselves". Of course this wouldn't rule out an alternative explanation - some form of data processing that inadvertently produced the same results.

[by the way - that's not to tell you things you probably know better than I, but for the benefit of others who might follow the thread]


"The best-informed man is not necessarily the wisest. Indeed there is a danger that precisely in the multiplicity of his knowledge he will lose sight of what is essential."

- Dietrich Bonhoeffer

Escher Sketch July 3, 2010 - 1:37pm

They're suing KOS in Maryland and KOS is suing them in Northern California. R2000 said they have no further comment until trial so it's speculation all around, particularly on the odd-even issue. However, here's the link to some further analysis and part of what's there.

The R2000 polls seem to match up with Gallup in a favorable way.

From "Research 2000: A Closer Look at Volatility" July 5, 2010

R2K was a 3-day tracking poll of 1100 total respondents. It was essentially no different than standard polls that sample 360-370 voters a day but only report the total sample-size with a typical 3.0% margin of error. Most tracking polls report a continuous three-day moving average, but avoid providing the daily results. R2K provided the daily numbers. Why don’t the other pollsters do likewise?

Therefore, we can look at the R2K tracking poll as just another standard poll that provides daily and full 3-day samples. The volatility of the R2K poll is equivalent to standard polls of 1000-1200 respondents. R2K provided more information than other 2008 tracking polls that only reported the rolling 3-day average.

Table 1 displays R2K daily statistics.

The margin of error is 1.96 times the standard deviation (a measure of volatility) at the 95% confidence level.
The standard deviation of Obama’s daily poll shares was 1.83%. It was 1.59% for the 3-day moving average.
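As a quick check on the sample sizes mentioned above, here is a sketch of the usual margin-of-error formula for a simple random sample. It assumes the worst case of a 50% share and ignores weighting and design effects, which real polls have to account for.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a share p estimated from a simple random sample of n."""
    standard_error = math.sqrt(p * (1 - p) / n)
    return z * standard_error


# Daily sample (~360) and full 3-day sample (1100), as described above.
for n in (360, 1100):
    print(f"n={n}: margin of error = {margin_of_error(n):.1%}")
```

For 1100 respondents this gives about 3.0%, matching the figure quoted for the full 3-day sample. For a single day of about 360 respondents it is roughly 5.2%.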

Table 2 is a comparison of Gallup vs. R2K.

Gallup was a registered voter (RV) poll. R2K was a likely voter (LV) poll.
The average shares and volatilities (standard deviation) closely match.
There was a strong 0.70 correlation between Obama’s Gallup and R2K shares.
There was a good 0.50 correlation between McCain’s Gallup and R2K shares.
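For anyone who wants to reproduce a correlation like the ones above, here is a minimal sketch of the Pearson correlation coefficient. The two short series are hypothetical placeholders, not the actual Gallup or R2K numbers, so the result will not match the 0.70 or 0.50 figures.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# ASSUMPTION: hypothetical daily shares for illustration only.
gallup_obama = [49, 50, 51, 50, 52]
r2k_obama = [50, 50, 51, 51, 52]

print(f"correlation: {pearson(gallup_obama, r2k_obama):.2f}")
```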

Michael Collins July 6, 2010 - 2:29am

Even if the R2000 poll was falsified, the numbers had to come from somewhere. Probably from other polls.

Exact timestamps are needed to check if big moves in R2000 polls were coincidental, lagging or leading other polls. Thus having a good correlation with another pollster could be a bad sign.

Additionally it looks very bad when R2000 doesn't release details on its polling. Who did what and when? There should be tons of traces like phone bills etc.

When you see that somebody calculates correlation, you have to take into account that maybe he doesn't know how to calculate it. That's normal.


-- Thinking inside the box maximizes risk adjusted profits

Singular July 7, 2010 - 5:32pm

Completely falsified data would never lead other polls. I'm assuming that the falsification would be based on the results of other polls. The results would have a smaller standard deviation.

Both polls show negative skew (higher bars on negative side).

R2K results have a couple more extreme values (kurtosis), hinting at a smaller sample or a more aggressive likely voter model.


-- Thinking inside the box maximizes risk adjusted profits

Singular July 5, 2010 - 4:10pm
