Wednesday, July 27, 2011

A Stimulating Paper


            This is actually the post that started it all.  When I first saw this paper by economists Bill Dupor of Ohio State and Timothy Conley of the University of Western Ontario, I knew that I needed to start blogging about misuses of statistics.  However, I wanted my first post to be more accessible and less technical, so I’ve been holding off on writing this until now, when hopefully my audience will bear with me through a more technical post (of course, this delay allowed Noah Smith to more or less scoop me on this one, but what can you do?).
            I was just about ready to hit the roof when I first read this paper.  The statistical mistakes made in it are appalling, and to make matters worse, they’re made by people who should know better.  I’m willing to cut journalists some slack when they make statistical mistakes, but given their graduate school curricula, and given that almost every research university in the Western world keeps at least one statistician on staff specifically for the purpose of helping social scientists who are writing papers, economists have absolutely no excuse.  As for the way that this paper has been treated by people aside from its authors?  More on that infuriating topic later.

             In brief, Conley and Dupor want to claim that the American Recovery and Reinvestment Act (usually referred to by proponents and detractors alike  as “the stimulus bill”) saved approximately 450,000 public sector jobs, while destroying 1 million private sector jobs.  Some of their macroeconomic assumptions strike me as suspect, to say the least (e.g. their claim that because many public sector workers are highly educated, they could have easily found jobs in the private sector... tell that to any law school graduate looking for work right now).  However, I can’t claim to be an economist, so I’ll let some real economists make those critiques for me.  In particular, Krugman and Baker (the economists I’ve cited) note that the models used by Conley and Dupor are not in keeping with certain established procedures of econometrics, and are probably deeply flawed.  But let’s just try to take their models at face value for now.

            So, here’s their initial attempt.   The authors divide the economy into four sectors:
1)   Government
2)   HELP services (= Health, private Education, and Professional and Business services)
3)   Goods-producing
4)   Non-HELP services
Call these sectors i=1,...,4.  Then, number the states j=1,...,50.  The authors then attempt to fit 2 regression models:
Model I (fungibility imposed):
            EMPLOYij = a*(OFFSETj - LOSSj) + c'*ANCj + e
Note that EMPLOYij is the rate of employment growth in the ith economic sector of the jth state over the 18-month period ending in September 2010; OFFSETj is the ratio of ARRA dollars actually spent to 2008 state government tax revenue for state j; LOSSj is measured as the twenty-month decrease in state tax revenues, plus the increase in Medicaid spending, ending in March 2010, relative to 2008 tax revenue for state j; and ANCj is a set of state-specific ancillary variables.  "e" is the error term (i.e. the random noise that needs to be incorporated into any regression model), and for some reason the authors don't indicate what its distribution is.
Model II (fungibility not imposed):
            EMPLOYij = b*OFFSETj - d*LOSSj + k*ANCj + e
The variables have the same meanings as before.  I'm not going to say too much about this except to note that it's odd that neither model includes an intercept term (i.e. in both cases we're trying to fit something like y = m*x + b, but just assuming that b = 0 without justifying that assumption).
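For readers who want to see what fitting specifications like these looks like in practice, here is a minimal sketch in Python using statsmodels.  The data are made up and the variable names simply mirror the paper's; nothing here reproduces Conley and Dupor's actual data or estimates.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Fifty fake "states" with made-up regressors; EMPLOY stands in for
# employment growth in one sector.  None of these numbers come from the paper.
df = pd.DataFrame({
    "OFFSET": rng.normal(0.05, 0.02, 50),
    "LOSS":   rng.normal(0.08, 0.03, 50),
    "ANC":    rng.normal(0.0, 1.0, 50),
})
df["EMPLOY"] = 0.5 * (df.OFFSET - df.LOSS) + 0.1 * df.ANC + rng.normal(0, 0.01, 50)

# Model I imposes fungibility by using only the difference OFFSET - LOSS.
# The "- 1" drops the intercept, mirroring the specification above.
model_1 = smf.ols("EMPLOY ~ I(OFFSET - LOSS) + ANC - 1", data=df).fit()

# Model II lets OFFSET and LOSS enter with separate coefficients.
model_2 = smf.ols("EMPLOY ~ OFFSET + LOSS + ANC - 1", data=df).fit()

print(model_1.params)
print(model_2.conf_int(alpha=0.10))   # 90% confidence intervals for the coefficients

The "- 1" in each formula is what drops the intercept; deleting it restores the intercept and makes for an obvious robustness check.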
As with all regressions, the values of a, c', b, d, and k, along with the variance of the e's, must be estimated from the data.  Conley and Dupor do just this, and then use their estimates to make predictions about how the stimulus affected employment.  Their findings are summarized by what they call a 90% confidence interval (actually a 90% prediction interval, but hey, who's counting?).  Here are their findings (across 46 of the 50 states), with the first number in each cell giving the point estimate and the bracketed numbers giving the lower and upper bounds of the 90% prediction interval.  Estimates are in thousands of jobs:

Sector            Government        HELP Services       Goods-producing     Non-HELP services

Model I           443               -772                -362                92
                  [-35, 920]        [-1378, -166]       [-942, 218]         [-347, 531]

Model II          473               -880                -832                -433
                  [-531, 1477]      [-1912, 152]        [-2172, 507]        [-1515, 649]

And here is where I hit the roof.  First of all, I'm not sure why Conley and Dupor used a 90% prediction interval instead of the nearly universal 95% prediction interval.  But what about the intervals themselves?  With a single exception (HELP services under Model I), every prediction interval includes a large positive region.  The upshot, of course, is that if we were to test the null hypothesis that the stimulus produced a large net gain in the number of jobs in a given sector, we would fail to reject it in nearly every case.  And had the authors used 95% intervals instead of 90%, things would look even worse: a 95% prediction interval is necessarily wider, and would reach even further into positive territory.  The paper may claim that "Our benchmark results suggest that the ARRA created/saved approximately 450 thousand state and local government jobs and destroyed/forestalled roughly one million private sector jobs," but in fact the authors have found no such thing.  The claim that the ARRA destroyed private sector jobs is simply not statistically significant.  Additionally, the prediction intervals are very wide across the board, which is often a sign of a poorly fitted regression model.
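To make that last point concrete, here's a quick back-of-the-envelope calculation in Python.  It assumes the reported 90% intervals are symmetric, normal-theory intervals (they look symmetric around the point estimates, but that is my assumption, not something the paper states), backs out an implied standard error, and widens each Model I interval to 95%.

from scipy.stats import norm

# Point estimates and 90% prediction intervals from the Model I row of the
# table above, in thousands of jobs.
model_1 = {
    "Government":        (443,  (-35, 920)),
    "HELP services":     (-772, (-1378, -166)),
    "Goods-producing":   (-362, (-942, 218)),
    "Non-HELP services": (92,   (-347, 531)),
}

z90, z95 = norm.ppf(0.95), norm.ppf(0.975)    # about 1.645 and 1.960

for sector, (est, (lo, hi)) in model_1.items():
    se = (hi - lo) / (2 * z90)                # implied standard error
    lo95, hi95 = est - z95 * se, est + z95 * se
    print(f"{sector:18s} 95% interval: [{lo95:8.0f}, {hi95:8.0f}]"
          f"  includes zero: {lo95 < 0 < hi95}")

Under these assumptions, three of the four Model I intervals include zero at the 95% level; the Model II intervals already include zero at the 90% level, so they certainly would at 95%.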
I can't even begin to convey how sloppy this is.  As I mentioned in my last post, point estimation is never enough.  Think back to the example of the first coin, and suppose that I concluded that, because the point estimate for the probability of heads was 0.482, there was no way the coin was in fact biased towards heads.  However, because a 95% confidence interval for the probability of heads includes all the values between 0.5 and 0.5129702, we cannot reject the hypothesis that the coin is weighted towards heads (albeit only slightly).  Or, perhaps more comparable here, suppose that I flipped the coin 10 times, got 4 heads, and concluded that the probability of heads was 0.4, never mind that a 95% confidence interval for the probability of heads after such an experiment is 0.1216 < (Pr. of heads) < 0.7376 (again, this is why we prefer large sample sizes if we can get them; the range of plausible values gets smaller and smaller the more times we run the experiment).  That's essentially what the authors have done here: they've estimated a few values, and haven't bothered with the necessary hypothesis testing or goodness-of-fit testing.  Had I attempted to pull something like this in graduate school, whatever professor was grading the paper would have handed it back to me before the ink had dried.
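For concreteness, here's a short Python sketch that reproduces those two coin intervals.  The interval methods (a normal approximation for the first, an exact Clopper-Pearson interval for the second) are my guesses at what generates the numbers quoted above.

from statsmodels.stats.proportion import proportion_confint

# 482 heads in 1000 flips: even here, the 95% CI reaches past 0.5
lo, hi = proportion_confint(count=482, nobs=1000, alpha=0.05, method="normal")
print(f"482 heads in 1000 flips: 95% CI ({lo:.4f}, {hi:.4f})")  # about (0.4510, 0.5130)

# 4 heads in 10 flips: the exact 95% CI is enormous
lo, hi = proportion_confint(count=4, nobs=10, alpha=0.05, method="beta")
print(f"4 heads in 10 flips:     95% CI ({lo:.4f}, {hi:.4f})")  # about (0.1216, 0.7376)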

Oh, and it gets worse.  In a later section, the authors try out still more model specifications.  I won't go into the details, except to say that in each case the 90% prediction intervals for the number of jobs created/destroyed have positive upper bounds in every sector.  And worst of all, in my opinion, is a specification that predicts the stimulus money led to a small net loss of government jobs.  As Dean Baker notes:
They also have the peculiar result that in one specification they find no significant effect of stimulus on public sector job creation, yet do find a significant loss of jobs in the private sector. Both sides of this are troubling. It really is hard to believe that the stimulus did not even create jobs (or prevent job loss) in the public sector. What exactly did those boneheads do with the money, eat it? If you didn't find that the stimulus created jobs in the public sector, then it seems likely that your instrumental variable is not capturing the effect of the stimulus very well.
One of my professors in graduate school used to say that the first thing to do with any data set is to graph it, and just look at it, before trying anything fancy.  Again, our authors don't seem interested in such basic precautions.

Now, let's have a talk about the larger implications of this paper.  First of all, its authors should have known better than to do this.  And it just gets worse when you look at the way it's been received by other economists.  Greg Mankiw of Harvard University approvingly says on his blog (which is actually where I first saw the paper) that, "Tim Conley and Bill Dupor have a new paper on the American Recovery and Reinvestment Act (that is, the Obama stimulus bill).  Their empirical findings..." and goes on to quote the paper's abstract.  Not a word is said about the authors' failure to do the sort of statistical legwork that might have allowed them to claim their findings were "empirical" in the first place!  Karl Smith (of UNC Chapel Hill) links to the paper without noting any of its flaws, as does Tyler Cowen.

This is worrisome on a number of levels.  Greg Mankiw is the author of a very, very popular textbook that's used to teach introductory economics courses to undergraduates.  I think it's fair to say that a significant percentage of people who went to college after 2000 or so got their first taste of econ through Principles of Economics.  Mankiw is welcome to his political opinions, of course, and he's welcome to write smarmy, self-serving op-eds such as "I Can Afford Higher Taxes. But They'll Make Me Work Less," but presumably he still knows how to do hypothesis testing (if he doesn't, I need to have a word with MIT's economics faculty about exactly why they gave him a PhD in the first place), and when someone in his position presents a poorly supported paper as an empirical finding, the public's trust has probably been abused.  The same goes for people like Tyler Cowen, though maybe to a lesser extent.  And I have to admit, I'm not entirely sure why certain economists (Mankiw, Cowen, John H. Cochrane, Eugene Fama, and more) seem so eager to risk not only their own credibility, but perhaps the credibility of the Dismal Science itself, by carrying water for a Republican Party that believes fiscal stimulus is always, always bad, even when such beliefs fly in the face of so much empirical experience (and hence can seem more like theology than economics).

On to the reaction of the media.  To its credit, most of the mainstream media did not tout Conley and Dupor's findings.  Fox News, alas, did (and never thought to say even a word about the CBO's analysis of the stimulus, nor that of the NBER... so much for "fair and balanced" coverage), and thanks to them, anyone who watched the segment now thinks that "an exhaustive study" showed that the stimulus "destroyed jobs":

[embedded video clip]

Then, of course, there's the right-wing blogosphere.  I don't expect much from the blogosphere in general, regardless of political orientation; heck, you can find stories along the lines of "Bush caused 9/11" and other such nonsense in much of the left-wing blogosphere with a two-second Google search.  I am a bit curious about exactly how so many right-wing bloggers, who I'm pretty sure don't make a regular habit of reading academic research in their spare time, came across an abstruse econ paper.  Which brings me back to my point about the economics profession: university professors need to be careful not to hype their results, and not to dress up improperly tested results as settled findings, because in today's media environment the blogs will find out what you've written and run with it.  John Maynard Keynes aptly observed that:
Practical men, who believe themselves to be quite exempt from any intellectual influence, are usually the slaves of some defunct economist. Madmen in authority, who hear voices in the air, are distilling their frenzy from some academic scribbler of a few years back.
 Today, it seems, most "practical men" blog; defunct economists of the world, beware what you write.

UPDATE:  A few people have asked me if I'm being too hard on Greg Mankiw here, since he didn't write the offending paper to begin with.  I contend that I'm only taking him at his word:
Let me make one thing clear: When I link to another economist here on this blog, it is typically because I think his or her arguments are worth hearing and thinking about, not necessarily because I agree with all of them. I don't have the time (and, in some cases, expertise) to offer a refereeing service for every article I mention. So when I say, "Here is an article by Professor X," I mean "Here is an article by Professor X," not "Here is an article by Professor X, and I approve of everything he says."
Sorry Professor Mankiw, but I'm not going to let you off the hook on this one.  Conley and Dupor's arguments aren't worth hearing and thinking about because even their own models don't support the conclusions they want to draw.  It's that simple.

Again, remember the example of the coin (I have a suspicion I may keep coming back to this over the life of this blog).  Suppose I published a paper saying something to the effect of "Experiment shows U.S. coins are weighted towards tails" based on my finding that the coin showed 482 heads in 1000 flips, and that for some reason statistics professors wrote approvingly of my paper, and, if called out on it, claimed they just wanted to add to the debate.  But again, even on their own terms, the data don't support my published conclusion, so there's no debate to be had.  Mankiw can't get out of this by claiming that posting a link to other economists' work is not necessarily a stamp of approval on their results.  For him even to link to Conley and Dupor in the first place borders on "Shape of Earth: Views Differ" reporting, and I expect better from an economist of his stature.
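For what it's worth, that hypothetical coin paper fails the most basic check you could throw at it.  A two-sided exact binomial test (my choice of test, since the example doesn't specify one) doesn't come close to rejecting fairness:

from scipy.stats import binomtest

# Two-sided exact test of "the coin is fair" against 482 heads in 1000 flips
result = binomtest(k=482, n=1000, p=0.5, alternative="two-sided")
print(f"p-value = {result.pvalue:.3f}")  # well above 0.05: no evidence of bias in either direction

Even on the friendliest reading, 482 heads out of 1000 flips is exactly the kind of result a fair coin produces all the time.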
