Wednesday, July 27, 2011

A Stimulating Paper

            This is actually the post that started it all.  When I first saw this paper by economists Bill Dupor of Ohio State and Timothy Conley of the University of Western Ontario, I knew that I needed to start blogging about misuses of statistics.  However, I wanted my first post to be more accessible and less technical, so I’ve been holding off on writing this until now, when hopefully my audience will bear with me through a more technical post (of course, this delay allowed Noah Smith to kind of scoop me on this one, but what can you do?).
            I was just about ready to hit the roof when I first read this paper.  The statistical mistakes made in it are appalling, and to make matters worse, they’re made by people who should know better.  I’m willing to cut journalists some slack when they make statistical mistakes, but given their graduate school curricula, and given that almost every research university in the Western world keeps at least one statistician on staff specifically for the purpose of helping social scientists who are writing papers, economists have absolutely no excuse.  As for the way that this paper has been treated by people aside from its authors?  More on that infuriating topic later.

Wednesday, July 20, 2011

A Quick and Dirty Introduction to Parametric Statistics, Point Estimation, and Hypothesis Testing

This post is highly technical in nature.  Obviously, this is not ideal.  However, I think that this is by far the best way to lead up to the next entry (which I should hopefully be able to complete by Friday).  I also hope that this post will give at least a small taste of the work of statisticians, since as I mentioned in my first post, the public has only a vague idea of what we do.  So no matter what, I think this post is worth your attention.  Also, I want to note at the outset that feedback is welcome.  If anything in this post is vague or hard to follow, don't be afraid to let me know in the comments section!  I'll do my best to revise the post accordingly.

Generally when laypeople use the word “statistics,” what they really mean are percentages. “67% of all quoted statistics are made up on the spot,” and so forth.  The field of statistics is actually much broader than that. Simply put, statistics is the science (or art, depending on how you see it) of drawing defensible conclusions from data that have some element of randomness built into them.  Despite some recent challenges to its supremacy, the reigning methodology for drawing such conclusions remains what practitioners have come to call “parametric statistics.”  When practicing parametric statistics, we assume that the data follow a known probability distribution which can be defined solely in terms of a small set of parameters.  In practice, this means that even a very large data set can be summarized efficiently by only a few values, and that we can make predictions with a relatively small amount of computing power (among other benefits).

There are actually two competing ways of deciding upon reasonable values for parameters, but for the purposes of this post, we'll confine ourselves to the methodological assumptions of what has come to be called "frequentist" statistics (this set of assumptions is also sometimes referred to as “classical statistics,” but I happen to think that this designation is a bit of a historical distortion).  The competing “Bayesian” methodology for estimating parameter values will have to wait for another time, as it's not relevant to the post I want to introduce.
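To make the "summarize a big data set with a few parameters" idea concrete, here's a minimal sketch in Python.  Everything in it is invented for illustration: I'm assuming the data are heights drawn from a normal distribution, and the "true" values 170 and 8 are made up.  The point is just that two numbers (the estimated mean and standard deviation) stand in for ten thousand observations.

```python
import random
import statistics

# Hypothetical example: pretend heights (in cm) follow a normal
# distribution with unknown mean mu and standard deviation sigma.
random.seed(42)
true_mu, true_sigma = 170.0, 8.0
data = [random.gauss(true_mu, true_sigma) for _ in range(10_000)]

# Frequentist point estimates: the sample mean and sample standard
# deviation.  These two parameters summarize all 10,000 observations.
mu_hat = statistics.fmean(data)
sigma_hat = statistics.stdev(data)

print(f"estimated mu    = {mu_hat:.2f}")
print(f"estimated sigma = {sigma_hat:.2f}")
```

With a sample this large, the two estimates land very close to the "true" values used to generate the data, which is exactly the efficiency parametric statistics buys you.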

Wednesday, July 13, 2011

Life Expectancy

     It's nice to know that I'm not the only person out there who cares about misrepresented statistics:

(Matthew Zeitlin guest posting on Jonathan Chait's blog in response to a huge mistake made by the New York Times columnist Charles Blow).

   I recommend reading the whole post, since Zeitlin does a great job with it.

   I also want to say a word about life expectancy as a statistical concept when applied to public policy, since I feel Zeitlin doesn't quite go far enough, and survival analysis is subject to a lot of media abuse (abuse that I expect will continue during the 2012 elections, alas).  One line that I keep hearing again and again in the media from people who style themselves as "Serious Thinkers on Fiscal Matters" (people who praised Paul Ryan's "Roadmap," I'm looking at you) is something like this: "Social Security is a ticking time bomb.  Nobody ever anticipated in the 1930's that old people would live well into their 70's and 80's.  You retired at 55 and croaked by 60.  So we must impose drastic benefit cuts now in order to keep the system solvent."  It's definitely true that life expectancies at birth in developed nations have improved markedly since the 1930's.  A boy born in 1940 had a life expectancy of 60.8, and a girl born in the same year could expect to live to the age of 65.2.  Had those two babies been born as I write this post (using the SSA's mortality calculator), they could expect to live to 82.2 and 86.1 respectively.  These gains are enormous, and represent a great societal achievement.  However, this increase is mostly not because the elderly are living longer; it's mainly because of the incredible reductions in child mortality that occurred during the 20th century.
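To see how much child mortality alone can move life expectancy at birth, here's a deliberately oversimplified toy model in Python.  The numbers are made up, not real mortality tables: I assume everyone who survives infancy dies at exactly age 75, and only the infant-mortality rate differs between the two eras.

```python
# Toy model (hypothetical numbers, NOT real actuarial data): everyone
# who survives infancy dies at exactly age 75; infants who die
# contribute roughly zero years of life.

def life_expectancy_at_birth(q0: float, adult_death_age: float = 75.0) -> float:
    """Expected age at death for a newborn, given infant mortality rate q0."""
    return (1.0 - q0) * adult_death_age

# Stylized rates: roughly 10% infant mortality "then" vs. 0.5% "now".
then_e0 = life_expectancy_at_birth(0.10)
now_e0 = life_expectancy_at_birth(0.005)

print(f"life expectancy at birth, then: {then_e0:.1f}")  # 67.5
print(f"life expectancy at birth, now:  {now_e0:.1f}")   # 74.6
```

Notice that in this model a 65-year-old in either era still lives to 75, so life expectancy *at 65* hasn't budged, even though life expectancy at birth jumped by about seven years.  That's the distinction the "ticking time bomb" line glosses over.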