Thursday, June 16, 2011

Introducing This Blog

            Everybody knows the famous quotes.  “There are three kinds of lies: lies, damned lies, and statistics.” “67% of all cited statistics are made up on the spot.”  And the list goes on.  I’d like to think that my fellow statisticians and I have done a little more for the world than produce a great mass of lies, and I think that part of the reason we so often hear statements such as the one made by the illustrious Samuel Clemens is a failure on our part to explain to the larger public what it is that we actually do.  Either way, though, the quotes point to an important fact: fuzzy numbers are often used to bolster weak arguments, particularly when those arguments are being made by politicians (though they certainly have plenty of company in this regard).

            As much as I hate to admit it, I’ve actually taken to using Mark Twain’s little quip as a rubric for dealing with misuses of statistics in the public sphere.  Most of the errors or misrepresentations that cause me to scream at the media like a lunatic can be thought of as lies, damned lies, or statistics.  We’ll examine each category in turn, and then I’ll try to give a broader idea of what I intend to do with this blog.
            Let’s begin with the category of “lies.”  This is the label I give to numbers that a speaker or writer invents because the real numbers just aren’t known.  The best example that occurs to me as I write this is Franklin Delano Roosevelt’s second inaugural address, in which FDR spoke of “One-third of a nation ill-housed, ill-clad, ill-nourished.” The speech certainly captured the experiences of a public that was weary of the Great Depression, and summed up their harrowing fears of further economic distress.  There’s just one problem: the figure of one-third is completely made up.  Most of the federal and state agencies that keep track of such numbers are legacies of either the New Deal or the Great Society, and therefore did not exist in 1937.  And while the decennial census currently asks about household incomes, this was not the case in 1930 (the time of the last census before FDR’s presidency).  At best, the federal government (or at least the IRS) had a vague sense of the salaries of the American workforce, since wage earners were required to pay income taxes after 1913.  So when FDR spoke of “one-third of a nation” he was merely guessing.
            Now we move on to “damned lies.”  I use this label to refer to numbers that are made up even though the true figures exist and are publicly available.  This category tends to be reserved for claims along the lines of Senator Jon Kyl’s recent outburst:
Everybody goes to clinics, to hospitals, to doctors, and so on. Some people go to Planned Parenthood. But you don’t have to go to Planned Parenthood to get your cholesterol or your blood pressure checked. If you want an abortion, you go to Planned Parenthood, and that’s well over 90 percent of what Planned Parenthood does.  (Emphasis mine)
The claim that abortions account for “well over 90 percent of what Planned Parenthood does” is not only false, it’s blatantly and evidently false, and had Senator Kyl bothered to spend even a few minutes doing his research, he would have known that.  For a rebuttal, here’s Politifact:
Planned Parenthood calculates the numbers by services provided, rather than dollars spent. In a fact sheet last updated in March 2011, the group lists the following breakdown of its services:

Contraception (including reversible contraception, emergency contraception, vasectomies and tubal sterilizations): 4,009,549 services

Sexually transmitted infections testing and treatment: 3,955,916 services

Cancer screening and prevention: 1,830,811 services

Other women’s health services (including pregnancy tests and prenatal care): 1,178,369 services

Abortions: 332,278 procedures

Miscellaneous (including primary care and adoption referrals): 76,977

Total services: 11,383,900

By this tally, abortions accounted for just under 3 percent of the procedures Planned Parenthood provided in 2009, which is the most recent year for which the group is reporting statistics. And that would make Kyl’s statement way off.

We should note a few caveats.

First, we think many people would acknowledge a difference between providing an abortion and, say, handing out a pack of condoms or conducting a blood test. The former is a significant surgical procedure, whereas the latter are quick and inexpensive services. So Planned Parenthood’s use of "services" as its yardstick likely decreases abortion’s prominence compared to what other measurements would show. Using dollars spent or hours devoted to patient care would likely put abortion above 3 percent in the calculations.

Second, it’s worth noting that Planned Parenthood self-reported these numbers, although the group says each affiliate’s numbers are independently audited. (There is no single, national audit.) So we have no choice but to accept their accuracy more or less on faith.

Still, even with those caveats, we do think that Kyl has vastly overstated the share of abortions.
So much for “damned lies.”
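For readers who would like to check Politifact's arithmetic themselves, here is a minimal Python sketch.  The counts are copied straight from the fact sheet quoted above; the dictionary and the helper function are my own names, chosen purely for illustration.

```python
# Service counts as quoted by Politifact from Planned Parenthood's
# March 2011 fact sheet (figures are for 2009).
services = {
    "Contraception": 4_009_549,
    "STI testing and treatment": 3_955_916,
    "Cancer screening and prevention": 1_830_811,
    "Other women's health services": 1_178_369,
    "Abortions": 332_278,
    "Miscellaneous": 76_977,
}

# Sum of the six categories; this should match the quoted total.
total = sum(services.values())


def share(category):
    """Return the category's fraction of all services provided."""
    return services[category] / total


for name in services:
    print(f"{name:32s} {share(name):6.1%}")

print(f"Total services: {total:,}")
```

The six categories do add up to the quoted total of 11,383,900, and abortions come out at just under 3 percent of services, exactly as Politifact says.  Keep in mind that this counts services, not dollars or hours, so the first caveat in the quoted passage still applies.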
            The most interesting case is, not surprisingly, “statistics.”  “Statistics” in scare quotes is the term I reserve for what I strongly suspect Mark Twain and his ilk were referring to when they disparaged my discipline: numbers and percentages that may be true in some ways, but are also misleadingly presented or have some inherent flaw that is kept hidden.  One final time, let’s take an example.  The year was 1964, and Ronald Reagan was endorsing Barry Goldwater in his bid for the Presidency.  At one point during his endorsement speech, Reagan made the following intriguing claim:
Federal employees -- federal employees number two and a half million; and federal, state, and local, one out of six of the nation's work force employed by government. These proliferating bureaus with their thousands of regulations have cost us many of our constitutional safeguards.
The figure of “one out of six” Reagan gives is accurate enough.  But what were all those government employees actually doing?  Reagan doesn’t say, but we can do better.  In 1964, about two-thirds of federal employees worked for either the Post Office or the Department of Defense (with uniformed military personnel alone accounting for 52%, which is probably not so surprising a figure for the height of the Cold War).  The vast majority of local and state workers were policemen, firefighters and schoolteachers.  Reagan obviously meant to leave one with the impression of a vast monstrosity of a government, filled with selfish and lazy bureaucrats gorging on a trough of money fed to them by a rapacious IRS.  Certainly that’s how many of Goldwater’s supporters understood Reagan.  Somehow though, the actual numbers probably would have weakened that impression.
            What all these quotes have in common is that they neatly illustrate just why people continue to push fuzzy numbers, and why at least some of the public had already begun to grow skeptical as early as the 1800s.  Simply put, statistics seem to have an almost unmatched rhetorical power.  This is pretty ironic when you consider just how poor the rhetorical skills of most statisticians are, and how little the public sphere and public policy interest a good number of my colleagues.  Be that as it may, FDR probably said what he did because it sounded more powerful to his ear, and more acutely captured the public’s distress and fear about the seemingly endless Great Depression than more truthfully saying “I see a great number of our fellow citizens, ill-housed, ill-clad, ill-nourished.”  Similarly, between the onset of the Great Depression and 1964 the federal government did greatly expand in size and scope, beyond anything known to historical memory.  Reagan’s rhetoric spoke to a large segment of American society that had begun to fear the newly invigorated government and its powers, and to such an audience the prospect of a government that employed one of every six workers must have seemed truly frightening.  The actual occupations of government workers were probably beside the point.
            That fuzzy numbers have rhetorical power does not excuse their use.  Quite the opposite.  I think the public is ill-served whenever statistics are misrepresented, no matter how well-meaning the purveyor of those figures may be.  As we’ll hopefully see through the life of this blog, however, these inaccuracies can often be exposed with just a little research.  It’s certainly true that some statistical models are complex and demand a solid mathematical background (drug trials come to mind here), but a lot of what gets discussed in the public sphere requires little more than the ability to think critically, and maybe to do some basic arithmetic once in a while.  Hopefully this blog will cause its readers to pause whenever a journalist, politician, or someone else in the public eye starts rattling off numbers that don’t smell quite right.  In short, I’m hoping that I can help people to avoid being taken in by all those “lies, damned lies, and statistics.”


  1. Though I have no information on every state, I volunteered in CT and they did, in fact, have an independent audit every year. (They also had a lot of trouble with alphabetical filing, but that's a separate problem.) PP's director claims they saw three million patients last year but even so, about 11% of patients got abortions (assuming no repeat customers, which is definitely not the case). Those 11% certainly also got their pap smears and something contraceptive, that's for darn sure. In any event, Kyl is an ill-informed weasel.

    As for the third category: prenatal screening tests!! Wildly inaccurate!

  2. As a perinatologist (= high risk OB specialist), I kinda feel like I need to comment now. I assume, Janice, we are speaking of prenatal genetic diagnosis tests? (As opposed to, say, checking a maternal blood count, which has less of an overt connection to statistical misuse).

    Anyway, what I wanted to say is that many of us know how useless the language of statistics is for certain situations. That is, saying: "Your fetus has a 1/100 chance of Down's syndrome" is not terribly useful, since you're not going to have this pregnancy 100 times. It either is, or it isn't, is the way most people see it; and they tend to hear what we say as a yes or a no.
    (And that's for people who don't have numeracy issues! Which is pretty much everybody!).

    So the language is inadequate, I think, rather than the test being inaccurate. The test does what it is designed to do; whether that is what people *want* it to do, or understand it to have done is an entirely different story.

    But I think prenatal screening tests probably need their own post for their interesting and yes, quite likely not-particularly-helpful use of statistics. And I'll stop writing now, and go think about posting at my place.

  3. @C: I'm talking mainly about the 'risk assessment' tests like triple screen, which is certainly not a genetic test (I have a PhD in biochem/genetics; I do know the difference. :) ). If the true-negative rate is, as quoted, ~95% for women under 30, whose basal risk is <1/1000, that's what I, personally, would consider inaccurate. And yes, people's misunderstandings of genetics statistics give me heartburn too!

  4. I just wanted to say I look forward to future posts on this blog. I'm a journalist who comes from a math and science background, instead of a literary background, and love this blog's topic.

    Have you ever read "A Mathematician Reads the Newspaper"? It's very good.
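
The exchange about screening tests in comments 2 and 3 comes down to a base-rate calculation, which can be sketched in a few lines of Python.  The ~95% true-negative rate and the baseline risk of 1 in 1,000 are the figures from comment 3 (which actually says "less than" 1 in 1,000, so this is the high end).  The sensitivity of 1.0 is my own deliberately generous assumption, not a number from the thread: it pretends the test never misses an affected pregnancy.  The function and variable names are likewise just for illustration.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that a positive screen is a true positive (Bayes' rule)."""
    true_pos = sensitivity * prevalence                   # affected and flagged
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # unaffected but flagged
    return true_pos / (true_pos + false_pos)


prevalence = 1 / 1000  # from comment 3 ("<1/1000"), so the high end
specificity = 0.95     # the "true-negative rate" of ~95% from comment 3
sensitivity = 1.0      # ASSUMPTION: best case, not a figure from the comments

ppv = positive_predictive_value(prevalence, sensitivity, specificity)
print(f"Share of positive screens that are true positives: {ppv:.1%}")
```

Under these assumptions only about 2 percent of positive screens are true positives, and a lower sensitivity or a lower baseline risk would only shrink that share.  That is consistent with both comments: the test can perform as designed while a positive result still means much less than most people hear.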