Wednesday, August 17, 2011

When Statistics Really Do Matter

The Atlantic recently published an article claiming that homeopathic medicine is, well, triumphant.  We are also told that alternative medicine will succeed where allopathic medicine is currently failing.
And what statistics does this article cite to bolster its case?  Not a one.  Instead, we get quips dismissing the importance of randomized clinical trials, and a lot of anecdotes about instances where alternative medicine provided comfort to someone with a chronic condition.
Indeed, if there's a single critique to be made of alternative medicine, it is that none of its methods ever outperform the placebo in randomized clinical trials (the gold standard for testing medical treatments).  To make matters worse, most practitioners of alternative medicine claim that their methods can't be adequately tested by clinical trials (I guess calling something a non-falsifiable hypothesis is now a defense?), something that the article I linked to makes quite clear.  Paul Meier must be rolling over in his grave.

Sunday, August 14, 2011

RIP, Paul Meier

This obituary in the New York Times, of the great American statistician Paul Meier, is well worth reading.  One little point: the Kaplan-Meier estimator is actually not terribly complicated to compute.  This is part of what makes the formula such a great accomplishment.  Still, overall, this obituary is a great tribute to one of the more important statisticians of the last century, and a reminder that my discipline can have a truly positive impact on the world.
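
To back up the claim that the estimator is not hard to compute, here is a minimal sketch of the Kaplan-Meier product-limit calculation in Python.  The function name `kaplan_meier` and the five-subject toy data set are my own hypothetical choices for illustration, not anything taken from the obituary or from Kaplan and Meier's paper.  At each time where an event is observed, the running survival estimate is multiplied by (1 - deaths / number still at risk); censored subjects simply drop out of the risk set.

```python
from collections import Counter


def kaplan_meier(times, observed):
    """Return [(t, S(t))] at each distinct time where an event was observed.

    times:    follow-up time for each subject
    observed: True if the event (e.g. death) was seen at that time,
              False if the subject was censored (lost to follow-up)
    """
    deaths = Counter(t for t, d in zip(times, observed) if d)
    exits = Counter(times)      # every subject leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Remove both deaths and censored subjects before the next time point.
        at_risk -= exits[t]
    return curve


# Hypothetical toy data: five subjects, two of them censored.
times = [2, 3, 3, 5, 8]
observed = [True, True, False, True, False]
curve = kaplan_meier(times, observed)
print([(t, round(s, 3)) for t, s in curve])
```

With this toy data the estimated survival drops to 0.8, then 0.6, then 0.3 at times 2, 3, and 5.  The subject censored at time 3 still counts as at risk at time 3, which is the usual convention for ties.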

Tuesday, August 9, 2011

Here We Go Again

A friend recently pointed me to an article in The Economist.  In the article, we're told "In June McKinsey, a consultancy, found in a survey that 30% of firms would definitely or probably stop offering insurance after 2014, when the exchanges are in place."  We've been through this before: the McKinsey study is simply not credible, because its authors failed to follow most of the rules of good surveys.  (For a refresher on this, see here and here.)  So why on earth is The Economist, which claims to offer "authoritative insight and opinion on international news, politics, business, finance, science and technology," quoting a bogus survey?  Worse, why are they quoting it after quite a few credible journalists and academics have criticized its statistical failings?  Sigh.

I haven't had time to go through the related survey by the Federation of Independent Business - which apparently found that 57% of companies would consider dropping insurance completely if some employees began moving into the insurance exchanges created by the PPACA - but its findings are intriguing enough that I intend to do so as soon as possible.

Thursday, August 4, 2011

When Statistics are Correct, but Don't/Shouldn't Matter

Since President Obama recently signed the debt-limit bill that passed Congress, this issue is sort of moot, but I would like to raise it anyway.

Nate Silver and Bruce Bartlett both point out that there is significant public support for raising taxes on wealthy Americans.  The polling data they point to are accurate (and I would say that, in general, Nate Silver's assessments of political polling are top-notch).  The question, however, is: who cares?  Obviously, it's valuable for us to know where public opinion lies on a lot of issues.  From my perspective, however, forcing politicians to vote in accordance with opinion polls could undermine the effectiveness of representative government.  Moreover, I think liberals should be hesitant to use these polling data to support arguments in favor of higher tax rates; many of the reforms they hold dear might never have come about if decisions in Washington were always based on polls.  Keep in mind that when the Supreme Court struck down restrictions on interracial marriage in 1967, approximately 73% of the population disapproved of interracial marriage (compared to about 17% as of 2007).  If politicians of either party feel that raising taxes right now is a terrible idea, then they should feel free to legislate accordingly.  If the public dislikes the results, we have the same remedy that we have always had: throw the bums out when they're up for reelection.

Wednesday, July 27, 2011

A Stimulating Paper


            This is actually the post that started it all.  When I first saw this paper by economists Bill Dupor of Ohio State and Timothy Conley of the University of Western Ontario, I knew that I needed to start blogging about misuses of statistics.  However, I wanted my first post to be more accessible and less technical, so I’ve been holding off on writing this until now, when hopefully my audience will bear with me through a more technical post (of course, this delay allowed Noah Smith to kind of scoop me on this one, but what can you do?).
            I was just about ready to hit the roof when I first read this paper.  The statistical mistakes made in it are appalling, and to make matters worse, they’re made by people who should know better.  I’m willing to cut journalists some slack when they make statistical mistakes, but given their graduate school curricula, and given that almost every research university in the Western world keeps at least one statistician on staff specifically for the purpose of helping social scientists who are writing papers, economists have absolutely no excuse.  As for the way that this paper has been treated by people aside from its authors?  More on that infuriating topic later.

Wednesday, July 20, 2011

A Quick and Dirty Introduction to Parametric Statistics, Point Estimation, and Hypothesis Testing

This post is highly technical in nature.  Obviously, this is not ideal.  However, I think that this is by far the best way to lead up to the next entry (which I should hopefully be able to complete by Friday).  I also hope that this post will give at least a small taste of the work of statisticians, since, as I mentioned in my first post, the public has only a vague idea of what we do.  So no matter what, I think this post is worth your attention.  Also, I want to note at the outset that feedback is welcome.  If anything in this post is vague or hard to follow, don't be afraid to let me know in the comments section!  I'll do my best to revise the post accordingly.


Generally, when laypeople use the word “statistics,” what they really mean are percentages. “67% of all quoted statistics are made up on the spot,” and so forth.  The field of statistics is actually much broader than that. Simply put, statistics is the science (or art, depending on how you see it) of drawing defensible conclusions from data that have some element of randomness built into them.  Despite some recent challenges to its supremacy, the reigning methodology for drawing such conclusions remains what practitioners have come to call “parametric statistics.”  When practicing parametric statistics, we assume that the data follow a known probability distribution which can be defined solely in terms of a small set of parameters.  In practice, this means that even a very large data set can be summarized efficiently by only a few values, and that we can make predictions with a relatively small amount of computing power (among other benefits).  There are actually two competing ways of deciding upon reasonable values for parameters, but for the purposes of this post, we'll confine ourselves to the methodological assumptions of what has come to be called "frequentist" statistics (this set of assumptions is also sometimes referred to as “classical statistics,” but I happen to think that this designation is a bit of a historical distortion).  The competing “Bayesian” methodology for estimating parameter values will have to wait for another time, as it's not relevant to the post I want to introduce.
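
To make the idea of summarizing a large data set with a few parameters concrete, here is a minimal sketch in Python using only the standard library.  Everything specific in it is an assumption of mine for illustration: the simulated "heights," the true mean of 170 and standard deviation of 8, the seed, and the threshold of 185.  The point is only that, once we assume the data are normally distributed, two estimated numbers stand in for ten thousand observations.

```python
from statistics import NormalDist

# Hypothetical data-generating process (made up for this example):
# heights in cm, normally distributed with mean 170 and sd 8.
true_dist = NormalDist(mu=170.0, sigma=8.0)
data = true_dist.samples(10_000, seed=42)

# Parametric step: assume normality and estimate the two parameters
# (sample mean and sample standard deviation) from the data.
fitted = NormalDist.from_samples(data)
print(f"estimated mean = {fitted.mean:.2f}, estimated sd = {fitted.stdev:.2f}")

# Prediction from the two parameters alone, versus counting the raw data.
threshold = 185.0
p_model = 1.0 - fitted.cdf(threshold)
p_empirical = sum(x > threshold for x in data) / len(data)
print(f"P(height > {threshold}) from fitted model: {p_model:.4f}")
print(f"P(height > {threshold}) from raw counts:   {p_empirical:.4f}")
```

If the normality assumption is reasonable, the two probabilities should land close together, even though the first is computed from just two numbers.  If the assumption is wrong, the parametric answer can be badly off, which is why checking it matters.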

Wednesday, July 13, 2011

Life Expectancy

     It's nice to know that I'm not the only person out there who cares about misrepresented statistics:

(Matthew Zeitlin guest posting on Jonathan Chait's blog in response to a huge mistake made by the New York Times columnist Charles Blow).


   I recommend reading the whole post, since Zeitlin does a great job with it.


   I also want to say a word about life expectancy as a statistical concept when applied to public policy, since I feel Zeitlin doesn't quite go far enough, and survival analysis is subject to a lot of media abuse (abuse that I expect will continue during the 2012 elections, alas).  One line that I keep hearing again and again in the media from people who style themselves as "Serious Thinkers on Fiscal Matters" (people who praised Paul Ryan's "Roadmap," I'm looking at you) is something like this: "Social Security is a ticking time bomb.  Nobody ever anticipated in the 1930s that old people would live well into their 70s and 80s.  You retired at 55 and croaked by 60.  So we must impose drastic benefit cuts now in order to keep the system solvent."  It's definitely true that life expectancy at birth in developed nations has improved markedly since the 1930s.  A boy born in 1940 had a life expectancy of 60.8, and a girl born in the same year could expect to live to the age of 65.2.  Had those two babies been born as I write this post (using the SSA's mortality calculator), they could expect to live to 82.2 and 86.1 respectively.  These gains are enormous, and represent a great societal achievement.  However, this increase is mostly not because the elderly are living longer; it's mainly because of the incredible reductions in child mortality that occurred during the 20th century.
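
   The arithmetic behind that last point is easy to see in a toy example.  The sketch below uses two entirely made-up cohorts of ten people each (these are not SSA figures, and the function name `life_expectancy` is my own); it treats life expectancy as the plain average age at death among those who reach a given age, which is a simplification of how actuaries actually build life tables.  The only difference between the cohorts is that three childhood deaths in the first are replaced by deaths in old age in the second.

```python
def life_expectancy(ages_at_death, from_age=0):
    """Average age at death among those who survive to `from_age`."""
    survivors = [a for a in ages_at_death if a >= from_age]
    return sum(survivors) / len(survivors)


# Hypothetical cohorts (invented ages at death, for illustration only).
# Three members of the first cohort die in early childhood.
high_child_mortality = [1, 2, 4, 68, 72, 75, 78, 80, 82, 85]
# Same seven adults; the three children now survive into old age.
low_child_mortality = [74, 77, 80, 68, 72, 75, 78, 80, 82, 85]

for label, cohort in [("high child mortality", high_child_mortality),
                      ("low child mortality", low_child_mortality)]:
    at_birth = life_expectancy(cohort)
    at_65 = life_expectancy(cohort, from_age=65)
    print(f"{label}: at birth = {at_birth:.1f}, at age 65 = {at_65:.1f}")
```

   In this invented example, life expectancy at birth jumps from 54.7 to 77.1, while the expected age at death for someone who has already reached 65 stays at about 77.1 in both cohorts.  Nobody who reached retirement lived any longer; the average moved because fewer children died.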