Monday, June 20, 2011

Why Sample?

    One question I've received a few times since I wrote "How Not to Take a Survey" is the following: given how thorny the issues of sampling are, why bother in the first place?  Wouldn't it be easier to put the question under study to every available member of the population?  Then there'd be no need to bother with the mathematics.  The answer, I think, is contained in the following joke, very often told to beginning graduate students:


    A statistics professor was describing sampling theory to his class, explaining how a sample can be studied and used to generalize to a population. One of the students in the back of the room kept shaking his head. "What's the matter?" asked the professor. "I don't believe it," said the student, "why not study the whole population in the first place?" The professor continued explaining the ideas of random and representative samples. The student still shook his head. The professor launched into the mechanics of proportional stratified samples, randomized cluster sampling, the standard error of the mean, and the central limit theorem. The student remained unconvinced saying, "Too much theory, too risky, I couldn't trust just a few numbers in place of ALL of them." Attempting a more practical example, the professor then explained the scientific rigor and meticulous sample selection of the Nielsen television ratings which are used to determine how multiple millions of advertising dollars are spent. The student remained unimpressed saying, "You mean that just a sample of a few thousand can tell us exactly what over 300 MILLION people are doing?" Finally, the professor, somewhat disgruntled with this skepticism, replied, "Well, the next time you go to the campus clinic and they want to do a blood test...tell them that's not good enough ...tell them to TAKE IT ALL!!"

    Obviously, this joke is just a tad hyperbolic (and you have to wonder a little about the sadistic streak exhibited by the professor).  But I think it communicates the point well: there are many situations where a small subset of the population under study is sufficiently representative of the population as a whole.  Even more drastically, there are situations where examining the entire population is destructive (quality control studies on an assembly line, for example, where testing an item can mean breaking it).  Hence sampling.  Of course, some populations are less homogeneous than our bloodstreams, and that's why a well-developed theory of sampling and survey design is necessary.  While no sample design is perfect, the mathematical frameworks that have been developed since the late 18th century or so allow us to draw conclusions about a large population without trying to examine every member of it, which is often prohibitive in terms of time and expense.  And just as importantly, the same mathematical frameworks allow us to quantify how much uncertainty the sampling methods introduce into our findings.
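
    To make that last point a little more concrete, here is a minimal Python sketch of how the uncertainty in a sample can be estimated. It is only an illustration under stated assumptions: a simple random sample of yes/no answers drawn from a very large population, and the usual normal approximation for a 95% margin of error. The function names, the 30% "true share," and the sample size of 1,000 are all made up for the example.

```python
import math
import random


def estimate_proportion(sample, z=1.96):
    """Return (estimate, margin_of_error) for a list of 0/1 responses.

    Assumes a simple random sample and uses the normal approximation;
    z=1.96 corresponds to a 95% confidence level.
    """
    n = len(sample)
    p_hat = sum(sample) / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, z * standard_error


def simulate_survey(true_share=0.30, sample_size=1000, seed=0):
    """Simulate one survey of a very large population (hypothetical numbers).

    Each respondent independently answers "yes" (1) with probability
    true_share, which approximates sampling from a huge population.
    """
    rng = random.Random(seed)
    sample = [1 if rng.random() < true_share else 0 for _ in range(sample_size)]
    return estimate_proportion(sample)


if __name__ == "__main__":
    p_hat, margin = simulate_survey()
    print(f"estimate: {p_hat:.3f} +/- {margin:.3f}")
```

    Under these assumptions, a sample of 1,000 gives a margin of error of roughly three percentage points, and that figure depends on the size of the sample rather than on the size of the population it was drawn from. Real survey designs (stratified or cluster samples, for instance) need different formulas.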
