This, published by Markos at Daily Kos, is quite worrisome.
I have just published a report by three statistics wizards showing, quite convincingly, that the weekly Research 2000 State of the Nation poll we ran the past year and a half was likely bunk….
We contracted with Research 2000 to conduct polling and to provide us with the results of their surveys. Based on the report of the statisticians, it’s clear that we did not get what we paid for. We were defrauded by Research 2000, and while we don’t know if some or all of the data was fabricated or manipulated beyond recognition, we know we can’t trust it. Meanwhile, Research 2000 has refused to offer any explanation….
While the investigation didn’t look at all of Research 2000 polling conducted for us, fact is I no longer have any confidence in any of it, and neither should anyone else. I ask that all poll tracking sites remove any Research 2000 polls commissioned by us from their databases. I hereby renounce any post we’ve written based exclusively on Research 2000 polling.
Very unpleasant business. We commissioned and published three (IIRC) Research 2000 polls over the last couple of years, including one in the Scott Brown-Martha Coakley Senate race that got a lot of attention. We will be following what happens between Daily Kos and Research 2000 with great interest. Where things are now:
the lawyers will soon take over, as Daily Kos will be filing suit within the next day or two.
If any statistics wizards care to look over the report to which Markos refers, it is available here.
UPDATE: This is getting really ugly. Daily Kos’s lawyer, Adam Bonin, has told TPM that Research 2000 “handed Daily Kos fiction.” Research 2000, for its part, has retained the large Howrey law firm, which has both threatened to sue Daily Kos and sent a threatening cease and desist letter to Nate Silver at FiveThirtyEight.com.
jconway says
Was not going with Car Talk’s pollster, Paul Murky of Murky Research.
heartlanddem says
Are reliable too.
peter-porcupine says
couves says
Awkward.
lasthorseman says
the publicity out of said poll you wanted. What, BTW, is the point of asking morons what they think, outside of trying to market something? You may want to Google Mark Dice and find the YouTube video about Doc Holliday and Wyatt Earp signing the Declaration of Independence.
mark-bail says
reptilian humanoids for a conspiracy theory that at least has entertainment value.
dcsohl says
Because our system of governance is based on asking those same “morons”* to decide who’s going to lead the commonwealth and nation?
* Your word, not mine.
amberpaw says
While “figures don’t lie,” “liars can figure,” and it is all in how the numbers are set up and analyzed.

Not being a “numbers person” and having to use a CPA, like I do, I have learned that the presentation seems to matter more than the data most of the time.
stomv says
This isn’t about figuring out the combination of phraseology and resulting numbers which offer a rosy picture.
This is about the raw data being so statistically unlikely that error — willful or not — is the only explanation.
Let’s say I ask every BMGer to go collect all the coins in their couches, under their car seats, in their pockets, and in their purses and wallets, and count it up. If every single BMGer reported that the pennies found always came in sets of three, that the number of quarters was always even, that the people with more sofas and cars always found more coins than those with fewer sofas and only bicycles, etc… that’s what this data looks like. The raw data itself has statistical patterns which simply don’t happen in actual sampling — they could only be explained by a measurement error.
somervilletom says
It looks like manufactured data to me — like they didn’t do the poll at all and instead conjured up some guesses and took the money to the bank.
stomv says
it might not be. I won’t suggest one or the other.
An example of the problems is the even/odd pattern. When counts were broken down by gender, the two numbers were almost always both odd or both even. This data isn’t natural — but is it pencil-whipped, or is it just some bug or rounding error in software? I won’t speculate.
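Here’s a quick Python sketch of why that even/odd pattern is so damning — a toy simulation with parameters I made up for illustration, not R2K’s actual data or sample sizes. In honest, independently sampled gender cross-tabs, the two rounded percentages should agree in parity (both even or both odd) only about half the time:

```python
import random

def parity_match_rate(n_polls: int, n_per_group: int = 300,
                      p: float = 0.5, seed: int = 1) -> float:
    """Simulate honest polls; return the fraction in which the men's and
    women's rounded percentages are both even or both odd."""
    rng = random.Random(seed)
    matches = 0
    for _ in range(n_polls):
        men = round(100 * sum(rng.random() < p for _ in range(n_per_group)) / n_per_group)
        women = round(100 * sum(rng.random() < p for _ in range(n_per_group)) / n_per_group)
        matches += (men % 2) == (women % 2)
    return matches / n_polls

# Honest sampling hovers near 50% parity agreement;
# R2K's cross-tabs agreed nearly 100% of the time.
print(parity_match_rate(1000))
```

Getting near-100% agreement from data like this is the statistical equivalent of flipping a coin a thousand times and always calling it right.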
Another example is that the week-to-week changes in Obama’s ratings in the tracking poll were almost NEVER +0. He’d get + (or -) 1 or 2 or 3 in a way which mapped to a normal distribution centered on 0, but instead of 0 being the most frequent value, it almost never occurred. Fraud, or the artifact of some sort of moving average? I won’t speculate.
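A similar toy simulation (again, my own illustrative parameters, not R2K’s actual methodology or sample sizes) shows why a legitimate weekly tracker should produce plenty of +0 weeks:

```python
import random

def zero_change_rate(n_weeks: int = 1000, n_resp: int = 1000,
                     p: float = 0.5, seed: int = 7) -> float:
    """Simulate an honest weekly tracking poll and return the fraction of
    week-to-week changes in the rounded topline that are exactly zero."""
    rng = random.Random(seed)
    pcts = [round(100 * sum(rng.random() < p for _ in range(n_resp)) / n_resp)
            for _ in range(n_weeks)]
    deltas = [b - a for a, b in zip(pcts, pcts[1:])]
    return deltas.count(0) / len(deltas)

# With honest sampling, 0 is the modal weekly change (roughly one week
# in six at this sample size); in the R2K tracker it almost never appeared.
print(zero_change_rate())
```

The exact rate depends on sample size, but the point is that zero should be the single most common change, not a vanishingly rare one.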
ryepower12 says
That’s pretty amazing. Stunning, actually.
cater68 says
I predicted a Brown victory through clenched teeth. Anyone with a pulse knew something extraordinary was afoot. It would be interesting to re-post the poll and affiliated comments….
stomv says
The R2K polls are concerning not because their results were out of line with expectations, but instead because they were too much in line with expectations.
Check pollster. The R2K polls weren’t coming up with statistically different top line results than the other pollsters (save Ras).
It’s like this: go to a shopping mall in the middle of the day. I’m going to pay you $100 to count all the cars. I don’t know how many cars are in the parking lot, and neither do you. We both know that three other folks have counted, and have come up with 825, 832, and 803.
Now, you have a few choices:
1. Count the cars, and try to do a really good job.
2. Count the cars, but if you make a few mistakes, who cares?
3. Don’t count the cars — just return a number that sounds close.
(1) is the hardest, and (3) is the easiest. It really looks like what R2K did is akin to (3). Their results were believable in each single-poll analysis because they fell in line with the others. So let’s say hypothetically that R2K were hired to count cars in 5,000 parking lots. While their numbers always seemed right, over the span of thousands of counts, they looked like this:
832
23658
8258
14
770
336
.
.
.
2224
Notice something about those numbers? They’re all even. You’d never notice this unless you looked at their results across many, many polls. Once you’ve noticed it, it’s easy to do statistical tests to see how likely that pattern would exist “in real life”… and it turns out that some of the patterns turn up on the order of one in gajillions.
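The arithmetic behind “one in gajillions” is easy to sketch (with illustrative counts of my own, not the report’s actual figures): if an honest count is even with probability 1/2, the chance that every one of n independent counts comes out even collapses exponentially.

```python
def odds_all_even(n_counts: int) -> float:
    """Chance that n independent honest counts all come out even,
    taking each count to be even with probability 1/2."""
    return 0.5 ** n_counts

print(odds_all_even(8))    # 1 in 256 -- already suspicious
print(odds_all_even(200))  # about 6e-61 -- "one in gajillions" territory
```

That’s why no single poll looks wrong, but the pattern across thousands of counts is impossible to explain by honest sampling.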
The problem with the R2K numbers is not that they’re wrong. The top lines are consistently reasonably accurate. The problem with the R2K numbers is that the underlying data demonstrates statistical properties which suggest that it isn’t the correct raw data — that either (a) it’s made up, or (b) it’s been fudged in a way which is systematic. Not fudged to get a different overall outcome, but fudged to make the numbers more clean — like getting a regular trim doesn’t make your hairdo shorter, it just makes the longest hairs a bit shorter.
af says
make people question the whole field of political polling. What do they hope to accomplish, get an idea where the electorate is on a given issue or race, move support for or against a candidate or issue by a desired poll result, or reinforce a chosen position by promoting a poll that supports it? I think the political news industry is far too hooked on the opiate of polls to withdraw, but withdraw they must. Polls have become like the scoop or exclusive to the news business. Do they inform the viewers, or are they just a selling feature that says “I’m doing my job better than them, buy from me”?
peter-porcupine says
Said this many times – the ONLY polls that count are taken in Novembers.
sabutai says
So will you sign this letter that says the poll taken last January that put Scott Brown into the Senate doesn’t count?
peter-porcupine says
johnt001 says
It’s on the rec list at this link:
http://www.dailykos.com/story/…
It has a very interesting chart showing an analysis of the cross-tabs for R2K vs. PPP – the R2K results are all highly correlated, while PPP’s show randomness in the correlation. Highly correlated numbers are not random; they are artificial, made up. Checking the correlation in this manner is a method for detecting accounting fraud.
stomv says
D,C,or B:
Drop me an email and I may be able to give this an hour or two this weekend. At the very least, I may be able to identify if the problematic raw data trends in the kos polls also show up in the raw data for which y’all paid.
I’m no statistics guru, but I do have a relatively high comfort level when taking a dip in raw data.
sleeples says
Check out Nate Silver’s fantastic breakdown of the nonrandom results.
Amazing. There is so little verification of pollsters that we now have at LEAST two companies just raking in money off phony data. How many other polling firms are scams?
kate says
I was reminded the other day that you won the prediction contest I did back in the Coakley race. I didn’t post the “winner” until the thread was dead. Please contact me off-line at KateDonaghue AT aol DOT com. I don’t remember what I promised the winner!
stomv says
what was interesting about the R2K polls is that the problem wouldn’t have been discoverable had kos not commissioned continuous polling. If R2K had done polling the way lots of other firms do — a poll here, a poll there, without consistent time intervals and samples — it wouldn’t have been visible at all.
sabutai says
But if you were going to defraud clients of thousands of dollars as a polling operation, wouldn’t you be a little better at it than this?
It would take about two hours to write a quick program (even in BASIC) to plug in random deviations from some baseline numbers. Run the program every week and voilà, who would know?
shillelaghlaw says
10 A = INT(RND(1) * 100)
20 PRINT A
30 GOTO 10
medfieldbluebob says
Or, their random number generator is seriously flawed. Which is why they eventually got caught (if indeed they’ve been “caught”). There is enough non-randomness in their data to be very suspicious, and a good random number generator wouldn’t have generated that.
It strikes me that a fair amount of thought and effort went into faking this data. You need randomness, yes. But the results also have to pass muster with a whole lot of people, some of them very good statisticians/pollsters like Nate at FiveThirtyEight.com. And the results had to compare, somehow, with a few dozen other pollsters, who are also your competitors and would love to expose you as a fraud.
Thousands of eyeballs were looking at this data, and two years later they’re getting caught. The Obama tracking poll differences are the most compelling, for me anyway. If a simple random number generator was used, there would have been many 0’s in that data. There are almost none.
Like Madoff, if they’d put as much thought and effort into actually doing the work they got paid to do, they might have actually produced decent results.
kbusch says
When cryptographers go about breaking codes they look very carefully for things that are non-random. Research 2000 appears to have been playing a game at which they could easily be caught.