Lies Cubed: 2011

Friday, September 2, 2011

Inconceivable! [Updated]

One thing that never ceases to both amuse and depress me is the way that many individuals react to numbers that their preexisting biases just can't accept. The other day, your favorite blogger was wasting time talking politics: an acquaintance of mine claimed that America was suffering from a decline in moral values, and pointed to the fact that the average American household carries a high amount of credit card debt as proof of this assertion. Obviously, moral judgements aren't really part of a statistician's job description, but I still felt the need to weigh in on this one.

Brains! (Updated)

I really should not be having to write this post. And yet Rick Perry leaves me no choice (I suspect that the governor of Texas will be taking up far too much of my time should he actually receive the Republican nomination). Time for an actual quote from Rick Perry's speech announcing his candidacy for the 2012 Republican nomination: "We’re dismayed at the injustice that nearly half of all Americans don’t even pay any income tax."

This is a perfect example of what some bloggers have come to call a "zombie lie": a lie or half-truth that continues to be repeated ad nauseam no matter how many times fact checkers attempt to shoot it down (like all zombies, these lies cannot be killed).

Perry is certainly not the first conservative figure to harp on this point. Fox News, the editorial page of the Wall Street Journal, the Tax Foundation and quite a few conservative politicians have been complaining about this for a while. The number of households that pay no income tax is dutifully trotted out, and the Randian conclusion that we're all supposed to draw from the statistic is obvious: the rich carry all the burdens in our society, while those at the lower end of the income distribution are nothing but lazy freeloaders who don't even pay anything to support the vast federal leviathan.

Statisticians in the News

From the BBC. I do have to ask though: when did we acquire the reputation as emotionless calculating machines? Yes, we tend to be logically minded people who enjoy mathematical questions, but Mr. Spock exists only in the world of fiction.

Wednesday, August 17, 2011

Do They Have to Make it so Easy?

As some of my readers may remember, when I began blogging, I divided numeric distortions into the following three categories:
1) lies
2) damned lies
3) statistics

I expected to spend my time writing about category 3. Yet for some reason I'm now writing about category 2. The latest from Rush Limbaugh (emphases mine):

Obama is always running around complaining and whining and moaning about all that he inherited from George W. Bush. Well, he inherited a AAA credit rating, an unemployment rate of 5.7%. Does anybody doubt that this is on purpose?

Where does Limbaugh get his numbers, exactly? Here's the seasonally adjusted unemployment rate from January 2008 through the end of 2009 (source: US Bureau of Labor Statistics via Google's Public Data Explorer):

(You can get specific numbers for each month by rolling your cursor along the trend line in the plot, and you can click on the "explore data" link to play around with this a little more). In November of 2008, when Obama was elected, unemployment stood at 6.8%, and had been on a steady upward trajectory since April of 2008, before Obama had even won the Democratic primary. By the time Obama took office in January of 2009, unemployment had climbed to 7.8%. In fact, unemployment in the US hasn't been at or below 5.7% since June of 2008. So I suppose Rush thinks Obama took over from Bush from the moment he won the Democratic primary?

Not to be outdone, Sean Hannity of Fox News claimed that unemployment stood at 5.6% when Obama took over from Bush:

Look, I am not here to defend Barack Obama's economic record. Unemployment has failed to go under 8.8% since he took office, and I might add that the BLS's numbers probably understate the true extent of unemployment because they don't account for people who want to work full-time, but are forced into part-time jobs, or people who are working at jobs for which they're grossly overqualified (e.g. all the highly intelligent law school graduates out there right now who are employed, but not as attorneys). As Christina Romer notes, high unemployment for more than 2 years should be considered a national emergency, and I think that the long-term social and political consequences of this recession will be devastating. Obama clearly bears a lot of responsibility here. But apparently Rush Limbaugh and Fox News are now trying to pretend that the economy was rosy when Bush left office, and that unemployment had not begun its relentless upward march before it was even clear that Obama would be the Democratic nominee (and I guess they'll soon try to tell us that Bear Stearns and Lehman Brothers didn't implode on Bush's watch either). Damned lies, all of it.

When Statistics Really Do Matter

The Atlantic recently published an article claiming that homeopathic medicine is, well, triumphant. We are also told that alternative medicine will succeed where allopathic medicine is currently failing.
And what statistics does this article cite to bolster its case? Not a one. Instead, we get quips dismissing the importance of randomized clinical trials, and a lot of anecdotes about instances where alternative medicine provided comfort to someone with a chronic condition.
Indeed, if there's a single critique to be made of alternative medicine, it is that none of its methods ever outperform the placebo in randomized clinical trials (the gold standard for testing medical treatments). To make matters worse, most practitioners of alternative medicine claim that their methods can't be adequately tested by clinical trials (I guess calling something a non-falsifiable hypothesis is now a defense?), something that the article I linked to makes quite clear. Paul Meier must be rolling over in his grave.

Sunday, August 14, 2011

RIP, Paul Meier

This obituary in the New York Times, of the great American statistician Paul Meier, is well worth reading. One little point: the Kaplan-Meier estimator is actually not terribly complicated to compute. This is part of what makes the formula such a great accomplishment. Still, overall, this obituary is a great tribute to one of the more important statisticians of the last century, and a reminder that my discipline can have a truly positive impact on the world.
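To give a sense of how little computation is involved, here is a minimal sketch of the product-limit idea in Python. The function name `kaplan_meier` and the toy data are my own illustrative assumptions, not anything taken from the obituary. At each time where a death is observed, the running survival probability is multiplied by (1 - deaths / number still at risk); censored subjects simply leave the risk set without triggering a drop.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed event.

    times  -- follow-up time for each subject
    events -- 1 if the event (e.g. death) was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # everyone who exits the risk set at time t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Subjects censored at time t still count as at risk at time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve

# Hypothetical toy data: six subjects, two of them censored (event = 0).
toy_times = [2, 3, 3, 5, 8, 8]
toy_events = [1, 1, 0, 1, 0, 1]
km_curve = kaplan_meier(toy_times, toy_events)

for t, s in km_curve:
    print(f"t = {t}: estimated survival = {s:.3f}")
```

Real survival software adds confidence intervals and careful tie handling, but the core estimate is essentially this loop.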

Tuesday, August 9, 2011

Here We Go Again

A friend recently pointed me to an article in The Economist. In the article, we're told "In June McKinsey, a consultancy, found in a survey that 30% of firms would definitely or probably stop offering insurance after 2014, when the exchanges are in place." We've been through this before: the McKinsey study is simply not credible, because its authors failed to follow most of the rules of good surveys. (For a refresher on this, see here and here). So why on earth is The Economist, which claims to offer "authoritative insight and opinion on international news, politics, business, finance, science and technology," quoting a bogus survey? Worse, why are they quoting it after quite a few credible journalists and academics have criticized its statistical failings? Sigh.

I haven't had time to go through the related survey by the Federation of Independent Business - which apparently found that 57% of companies would consider dropping insurance completely if some employees began moving into the insurance exchanges created by the PPACA - but its findings are intriguing enough that I intend to do so as soon as possible.

Thursday, August 4, 2011

When Statistics are Correct, but Don't/Shouldn't Matter

Since President Obama recently signed the debt-limit bill that passed Congress, this issue is sort of moot, but I would like to raise it anyway.

Nate Silver and Bruce Bartlett both point out that there is significant public support for raising taxes on wealthy Americans. The polling data they point to are accurate (and I would say that, in general, Nate Silver's assessments of political polling are top-notch). The question, however, is: who cares? Obviously, it's valuable for us to know where public opinion lies on a lot of issues. From my perspective, however, forcing politicians to vote in accordance with opinion polls could undermine the effectiveness of representative government. Moreover, I think liberals should be hesitant to use these polling data to support arguments in favor of higher tax rates; many of the reforms they hold dear might never have come about if decisions in Washington were always based on polls. Keep in mind that when the Supreme Court struck down restrictions on interracial marriage in 1967, approximately 73% of the population disapproved of interracial marriage (compared to about 17% as of 2007). If politicians of either party feel that raising taxes right now is a terrible idea, then they should feel free to legislate accordingly. If the public dislikes the results, we have the same remedy that we have always had: throw the bums out when they're up for reelection.

Wednesday, July 27, 2011

A Stimulating Paper

This is actually the post that started it all. When I first saw this paper by economists Bill Dupor of Ohio State and Timothy Conley of the University of Western Ontario, I knew that I needed to start blogging about misuses of statistics. However, I wanted my first post to be more accessible and less technical, so I’ve been holding off on writing this until now, when hopefully my audience will bear with me through a more technical post (of course, this delay allowed Noah Smith to kind of scoop me on this one, but what can you do?).

I was just about ready to hit the roof when I first read this paper. The statistical mistakes made in it are appalling, and to make matters worse, they’re made by people who should know better. I’m willing to cut journalists some slack when they make statistical mistakes, but given their graduate school curricula, and given that almost every research university in the Western world keeps at least one statistician on staff specifically for the purpose of helping social scientists who are writing papers, economists have absolutely no excuse. As for the way that this paper has been treated by people aside from its authors? More on that infuriating topic later.

A Quick and Dirty Introduction to Parametric Statistics, Point Estimation, and Hypothesis Testing

This post is highly technical in nature. Obviously, this is not ideal. However, I think that this is by far the best way to lead up to the next entry (which I should hopefully be able to complete by Friday). I also hope that this post will give at least a small taste of the work of statisticians, since as I mentioned in my first post, the public has only a vague idea of what we do. So no matter what, I think this post is worth your attention. Also, I want to note at the outset that feedback is welcome. If anything in this post is vague or hard to follow, don't be afraid to let me know in the comments section! I'll do my best to revise the post accordingly.

Generally when laypeople use the word “statistics,” what they really mean are percentages. “67% of all quoted statistics are made up on the spot,” and so forth. The field of statistics is actually much broader than that. Simply put, statistics is the science (or art, depending on how you see it) of drawing defensible conclusions from data that has some element of randomness built into it. Despite some recent challenges to its supremacy, the reigning methodology for drawing such conclusions remains what practitioners have come to call “parametric statistics.” When practicing parametric statistics, we assume that the data are such that they follow a known probability distribution which can be defined solely in terms of a small set of parameters. In practice, this means that even a very large data set can be summarized efficiently by only a few values, and that we can make predictions with a relatively small amount of computing power (among other benefits). There are actually two competing ways of deciding upon reasonable values for parameters, but for the purposes of this post, we'll confine ourselves to the methodological assumptions of what has come to be called "frequentist" statistics (this set of assumptions is also sometimes referred to as “classical statistics,” but I happen to think that this designation is a bit of a historical distortion). The competing “Bayesian” methodology for estimating parameter values will have to wait for another time, as it's not relevant to the post I want to introduce.
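As a rough illustration of what "summarized by only a few values" means, here is a small sketch in Python. Everything in it is an assumption made for the example: the data are simulated, the normal distribution is chosen for convenience, and the numbers 170, 10 and 185 are arbitrary. The point is only that two estimated parameters can stand in for ten thousand raw values when we want to make a prediction.

```python
import random
from statistics import NormalDist

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical data: 10,000 simulated measurements (assumption: truly normal,
# with mean 170 and standard deviation 10).
data = [random.gauss(170, 10) for _ in range(10_000)]

# The parametric step: compress 10,000 values into two estimated parameters.
fitted = NormalDist.from_samples(data)

# A prediction made from the two parameters alone, checked against the raw data.
model_share = 1 - fitted.cdf(185)
empirical_share = sum(x > 185 for x in data) / len(data)

print(f"estimated mean = {fitted.mean:.2f}, estimated sd = {fitted.stdev:.2f}")
print(f"share above 185: model {model_share:.3f}, data {empirical_share:.3f}")
```

The agreement is good here only because the simulated data really are normal. With real data, checking whether the assumed distribution is reasonable is a large part of the job.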

Life Expectancy

It's nice to know that I'm not the only person out there who cares about misrepresented statistics:

(Matthew Zeitlin guest posting on Jonathan Chait's blog in response to a huge mistake made by the New York Times columnist Charles Blow).

I recommend reading the whole post, since Zeitlin does a great job with it.

I also want to say a word about life expectancy as a statistical concept when applied to public policy, since I feel Zeitlin doesn't quite go far enough, and survival analysis is subject to a lot of media abuse (abuse that I expect will continue during the 2012 elections, alas). One line that I keep hearing again and again in the media from people who style themselves as "Serious Thinkers on Fiscal Matters" (people who praised Paul Ryan's "Roadmap," I'm looking at you) is something like this: "Social Security is a ticking time bomb. Nobody ever anticipated in the 1930's that old people would live well into their 70's and 80's. You retired at 55 and croaked by 60. So we must impose drastic benefit cuts now in order to keep the system solvent." It's definitely true that life expectancy at birth in developed nations has improved markedly since the 1930's. A boy born in 1940 had a life expectancy of 60.8 and a girl born in the same year could expect to live to the age of 65.2. Had those two babies been born as I write this post, they could expect to live to 82.2 and 86.1 respectively (according to the SSA's mortality calculator). These gains are enormous, and represent a great societal achievement. However, this increase is mostly not because the elderly are living longer; it's mainly because of the incredible reductions in child mortality that occurred during the 20th century.
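The arithmetic behind that last point can be sketched with a deliberately exaggerated toy example in Python. The two cohorts below are made-up numbers, not SSA data, and the helper `life_expectancy` is a hypothetical name for this illustration. Both cohorts have identical mortality among people who reach 65; only the number of deaths in early childhood differs.

```python
def life_expectancy(deaths_by_age, from_age=0):
    """Mean age at death among those who survive to at least `from_age`."""
    survivors = {age: n for age, n in deaths_by_age.items() if age >= from_age}
    total = sum(survivors.values())
    return sum(age * n for age, n in survivors.items()) / total

# Hypothetical cohorts of 1,000 births each (made-up numbers, NOT SSA data).
# Keys are ages at death, values are how many people die at that age.
high_child_mortality = {1: 200, 50: 200, 70: 300, 80: 300}
low_child_mortality = {1: 10, 50: 200, 70: 395, 80: 395}

for name, cohort in [("high child mortality", high_child_mortality),
                     ("low child mortality", low_child_mortality)]:
    at_birth = life_expectancy(cohort)
    at_65 = life_expectancy(cohort, from_age=65)
    print(f"{name}: at birth {at_birth:.2f}, for those reaching 65: {at_65:.2f}")
```

In this toy case, life expectancy at birth rises from 55.2 to 69.26, while someone who reaches 65 can expect to live to 75 in both cohorts. Real data show some gain at older ages too, but the example shows why a jump in the at-birth figure says little by itself about how long retirees collect benefits.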

How Not to Take a Survey, Part II

Previously, in “How Not to Take a Survey,” I discussed McKinsey’s opaque approach to its survey on employer reactions to the PPACA. Now, it seems, McKinsey has bowed to pressure from the media and the Democratic Party, and has released some of its survey data. Those who follow this link can also download PDFs of the survey and the resulting data (I recommend reading the questionnaire at the very least).

Again, I’m not sure how I feel about the political dimensions of this imbroglio. The business press has every right to ask McKinsey questions; that’s part of the role that we assign to the media in a democratic society. But should McKinsey be made to answer to the demands of a political party? That thought makes me more than a little nervous.

Why Sample?

One question that I've received a few times since I wrote "How Not to Take a Survey," is the following: given how thorny issues of sampling are, why bother in the first place? Wouldn't it just be easier to put the question under study to every available member of the entire population? There'd then be no need to bother with the mathematics. The answer, I think, is contained in the following joke, very often told to beginning graduate students:

What's in a Name?

I've received a few questions as to why I'm blogging under the pseudonym of "Student." The answer is actually very simple: "Student" is a pseudonym that has a very proud history in the field of statistics. While I don't think I've ever accomplished anything comparable to the achievements of William Sealy Gosset, I'm proud to follow in his footsteps as an anonymous statistician.

Thursday, June 16, 2011

Introducing This Blog

Everybody knows the famous quotes. “There are three kinds of lies: lies, damned lies, and statistics.” “67% of all cited statistics are made up on the spot.” And the list goes on. I’d like to think that my fellow statisticians and I have done a little more for the world than produce a great mass of lies, and I think that part of the reason we so often hear statements such as the one made by the illustrious Samuel Clemens is a failure on our part to explain to the larger public what it is that we actually do. Either way though, the quotes point to an important fact: fuzzy numbers are often used to bolster weak arguments, particularly when those arguments are being made by politicians (though they certainly have plenty of company in this regard).

How Not to Take a Survey

The world has actually been on shockingly good behavior regarding the use of statistics ever since the idea for this blog entered my head. Fortunately, just as I was beginning to despair of ever finding a good topic for a first post, McKinsey and Company handed me this little gem of a report.

Some Background:

To summarize, McKinsey’s proprietary research arm is claiming that, as a result of the Patient Protection and Affordable Care Act (often referred to as “Obamacare”), 30% of private sector employers in the United States will stop offering employer sponsored insurance (or ESI for short) to their employees after 2014, when the law’s main provisions go into effect. I’ll begin by noting that the Congressional Budget Office estimated a figure of about 7% for the same question, and studies by the Rand Corporation, the Urban Institute and Mercer all suggest that the number of employers who currently offer traditional ESI for their employees but who intend to end this benefit after 2014 is minimal. Mercer also qualifies its findings by noting that in Massachusetts - where laws signed by former governor Mitt Romney in 2006 have produced a regulatory climate very similar in many ways to the PPACA - very few employers of any size have actually dropped traditional ESI since 2006. So McKinsey is clearly an outlier here. Now, obviously, being an outlier isn’t what disqualifies the study. It’s fully possible that McKinsey is correct, and has seen something that the CBO et al. either haven’t noticed, or are refusing to see. And it’s important to note that the Urban Institute and Rand both based their conclusions on simulations, rather than polling data; this isn’t a criticism (I happen to think that their simulation methods are rather good), but it needs to be pointed out that their methods are different. Anyway, while the studies done by the opposing camp have their imperfections, McKinsey has done virtually everything it can to undermine the credibility of its own report, to the point that the latter should not be taken seriously unless or until McKinsey releases more information to the public.
I suggest reading the short article in its entirety, since it’s an excellent example of how not to publish survey data if you wish to be taken seriously, and of why the general public should be skeptical of survey data to begin with.
