Statistical Studies and Experiments

A. Polls, Studies and Experiments: Sampling Phase

As we saw in the section on statistics, statistical support for claims can be generated either by enumeration (counting each instance in an entire population) or by estimation (counting each instance in only a subset of the entire population). Though enumeration is, in principle, the more accurate of the two, it is far from perfect, especially as the size of the population increases. In the United States, the best-known example of enumeration in a large population is probably the national census, which takes place every ten years and is widely considered to contain many errors. However, since the Constitution specifically requires enumeration, and since switching to a potentially more accurate method might have political consequences (for example, representation in the U.S. House of Representatives is based on the census), we continue to use the very expensive and time-consuming method of enumeration. It is possible that we could get just as accurate a picture of the U.S. population more quickly and cost-effectively through estimation, but the assumptions involved in devising that estimation would be even more subject to political pressure and influence.

When large populations are involved, estimation is usually employed. (Even the U.S. Census, which by law counts the population by enumeration, uses estimation to produce most of its analysis of American society.) Estimation involves two stages: first, selection of the group or population to study, and then the investigation itself, which involves collection and analysis of information.

The selection stage is roughly the same in all forms of estimation. The process of selection is, first, to identify the group or population that the estimation will describe (the target), and then to select from that target a smaller but representative group (the sample). In theory, if the sample is fully representative of the target, then what is true of the sample is true of the target. Unfortunately, a fully representative sample can be obtained in only a few situations, such as checking the specifications on a mass-produced engine part. As a result, researchers have devised methods for selecting samples that are representative enough of the target to support estimations about it. These methods can be divided into two types:

  • Random sampling. The most reliable way of choosing a sample that will prove representative of its target is to select that sample randomly, that is, by some method that eliminates the biases and expectations of the researcher. In the example of the mass-produced engine part above, samples could be taken at random off the production line and checked for compliance with set specifications. Note that this "randomness" does not necessarily imply a lack of order: measuring every twenty-third part, for example (a method known as systematic sampling), could still be considered a random sample, provided nothing about the production line repeats in a cycle that matches that interval. In fact, an ordered process of sampling helps eliminate the influence of the person doing the sampling, who may be swayed by qualities (position on the belt, time of day, and so on) that would skew the sample or make it less representative.

    While random sampling is theoretically the most reliable way of producing a representative sample, it is all but impossible to achieve in a human population. Let's take an example: in order to study voting patterns in San Jose, a polling group chose its sample randomly from the white pages of the telephone book, calling every twenty-seventh name. This sounds like a random sample, but is it? If our target is anyone living in San Jose, this method has already eliminated everyone whose name is not listed in the phone book, because they do not have a phone, have an unlisted number, have moved to the area recently, or were left out through an error in producing the phone book. Not only does this seem to exclude a fairly large segment of the population of San Jose, but the excluded people also seem likely to include groups whose voting patterns might differ from those of the people whose names do appear in the white pages.

    Even if we adjust our target to study the voting patterns of those listed in the San Jose phone book, that "random" sampling method might still not prove to be random. Why? Having chosen its sample, the polling group begins to make its calls and, not surprisingly, a large percentage of those called refuse to take the time to answer the series of questions. This means two things. First, those who do respond are self-selected; that is, they are in the sample not just because they were chosen "randomly," but because they themselves chose to participate. People's motivations for participating in such surveys vary, but one likely reason is that they feel strongly about one or more of the issues involved, and individuals who self-select because of strong feelings about the subject of the estimation can be said to bias the sample. Second, the same is true of those who refuse to participate: some of them refuse not simply because they are too busy, but because they are indifferent to the issues, do not want to express their views (perhaps because those views are controversial), or have some other substantive reason for opting out. Many polls are conducted in this way, but the risk of a biased sample is very high.

  • Adjusted sampling. Let's go back to the phone book example above. Having realized that their sample was self-selected and therefore biased, the researchers could still use their survey results if they found some way to make their sample more representative. To do so, they could identify the factors they think would influence the subject they are studying. So, in the case of voting patterns in San Jose, they might use such categories as income level, race or ethnicity, gender, education, party affiliation, and occupation to create a profile of each member of the sample. Then, by comparing those profiles with established information about the target population in general, they could choose, from among their original sample, a smaller sample that closely represents the target, at least in the categories they have identified.

    So, even though it may seem a contradiction in terms, when a truly random process is unavailable (which is usually the case in human populations), sometimes the best way to deal with a biased sample is to adjust for it through a very un-random selection process: matching the sample to the target in significant ways. But this option has its own dangers. After all, the researchers obviously do not understand everything about the topic under study, or else they would not be researching it. So the first problem is that they may not recognize all the significant categories affecting the outcome, and if they are unaware of one or more, they cannot adjust for them. (Imagine using an adjusted sample to study the increase in lung cancer over the past hundred years if you were not aware of the impact of smoking on the disease. Since your sample would not then be adjusted to reflect the proportion of smokers in the general population, your results would probably fail to identify smoking as the source of much of the increase.) A second problem is that allowing researchers to select the members of the sample group opens the door to the sorts of errors that human involvement often brings, from the unintentional to the unethical. A good estimation will have safeguards built into it to prevent or limit these problems. In the end, any estimation is only as reliable as its design.
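The two selection methods described under random sampling, and the way an incomplete phone list combined with refusals can tilt a poll, can be sketched in Python. This is a minimal illustration, not a polling tool: the function names are hypothetical, and the group shares, support rates, and "reach" rates in the usage lines are invented numbers chosen only to show the effect, not figures about San Jose.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Pick k distinct items, each item equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def systematic_sample(population, step, start=0):
    """Pick every step-th item, starting at index start (e.g. every 23rd part)."""
    return population[start::step]

def expected_respondent_support(groups):
    """Expected share of *respondents* who support a candidate.

    Each group is (population_share, support_rate, reach_rate), where
    reach_rate is the hypothetical chance that a member is listed in the
    phone book AND agrees to answer the questions.
    """
    reached = sum(share * reach for share, _, reach in groups)
    supporters = sum(share * support * reach for share, support, reach in groups)
    return supporters / reached

parts = list(range(1, 101))                      # parts numbered 1..100 off the line
print(systematic_sample(parts, 23, start=22))    # [23, 46, 69, 92]
print(simple_random_sample(parts, 4, seed=42))   # four parts chosen at random

# Hypothetical electorate: 70% easy to reach, 30% hard to reach.
groups = [(0.7, 0.40, 0.30), (0.3, 0.70, 0.05)]
true_support = sum(share * support for share, support, _ in groups)
poll_estimate = expected_respondent_support(groups)
print(round(true_support, 2), round(poll_estimate, 2))  # prints 0.49 0.42
```

With these made-up numbers, the people the poll can reach favor the candidate less than the hard-to-reach group does, so the poll reports about 42% support when the whole electorate's support is 49%. Because the gap comes from who gets into the sample, polling more people the same way would not close it.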
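The matching step described under adjusted sampling can be sketched as quota selection: decide how many respondents each category should contribute, based on the target's known proportions, and fill those quotas from the original sample. The function name, the single `income` category, and the proportions below are assumptions for illustration; a real adjustment would match several categories at once, and it still cannot correct for a category the researchers never thought to include. Shuffling before filling the quotas is one way (assumed here, not prescribed by the text) to limit the researcher's personal influence over who is chosen.

```python
import random
from collections import Counter

def quota_subsample(respondents, category, target_shares, size, seed=None):
    """Choose about `size` respondents whose mix of `category` matches target_shares.

    respondents:   list of dicts, e.g. {"id": 7, "income": "low"}
    target_shares: proportions in the target, e.g. {"low": 0.3, "high": 0.7}
    """
    # Rounding can make the quotas add up to slightly more or less than size.
    quotas = {level: round(share * size) for level, share in target_shares.items()}
    pool = list(respondents)
    random.Random(seed).shuffle(pool)  # random order, so the researcher does not pick

    chosen, filled = [], Counter()
    for person in pool:
        level = person[category]
        if filled[level] < quotas.get(level, 0):
            chosen.append(person)
            filled[level] += 1

    shortfall = {lvl: q - filled[lvl] for lvl, q in quotas.items() if filled[lvl] < q}
    if shortfall:
        raise ValueError(f"not enough respondents to fill quotas: {shortfall}")
    return chosen

# Hypothetical phone survey that over-represents high-income respondents.
levels = ["low"] * 10 + ["middle"] * 20 + ["high"] * 30
respondents = [{"id": i, "income": lvl} for i, lvl in enumerate(levels)]
target = {"low": 0.3, "middle": 0.5, "high": 0.2}  # assumed target proportions

adjusted = quota_subsample(respondents, "income", target, size=20, seed=1)
print(Counter(p["income"] for p in adjusted))  # 6 low, 10 middle, 4 high
```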

We have seen that one of the central areas of concern for any estimation is the way in which the sample is selected. The next section will continue this discussion by distinguishing between polls, studies, and experiments, and by looking at how the results of estimations should be interpreted.

B. Polls, Studies and Experiments: Investigation Phase

The investigation phase of estimation can take one of three forms: polls, studies, or experiments. As you read in the last section, all three of these begin with a rather similar process of identifying the target population, and then selecting a representative sample from that target. After that, polls, studies, and experiments become quite different.

The differences between polls, studies, and experiments are easy to spot once we look at how each one gathers its information.

In experiments, the researchers themselves actively manipulate some factor affecting the sample group, either by introducing it where it was absent before or by removing it where it once existed. Experiments always move from cause to effect: they manipulate the suspected cause and then gather data about the results of that manipulation.

Generally speaking, samples in experiments tend to be smaller than in other forms of estimation, and each sample is further divided into at least two groups: the experimental group and the control group. The manipulation of the suspected cause occurs only in the experimental group. Because it is difficult to trace effects to a single cause, it is important to have a second group, the control group, which is statistically similar to the experimental group and which undergoes all the experiences of the experimental group except the introduction or removal of the single cause under study.
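One common way to make the experimental and control groups statistically similar is to assign subjects to them at random, so that no one, including the researcher, decides who receives the manipulation. The text does not prescribe a particular method, so the sketch below is an assumption: it shuffles the sample, splits it in half, and then estimates the effect of the manipulated cause as the difference between the two groups' average outcomes. The function names and the outcome scores are hypothetical.

```python
import random
from statistics import mean

def assign_groups(subjects, seed=None):
    """Randomly split subjects into an experimental group and a control group."""
    shuffled = list(subjects)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]  # control gets the extra subject if odd

def estimated_effect(experimental_outcomes, control_outcomes):
    """Difference in average outcome: experimental minus control."""
    return mean(experimental_outcomes) - mean(control_outcomes)

experimental, control = assign_groups(range(1, 21), seed=3)
print("experimental:", sorted(experimental))
print("control:     ", sorted(control))

# Hypothetical outcome scores measured after the manipulation.
print(estimated_effect([7, 8, 6, 9], [5, 6, 5, 4]))  # prints 2.5
```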

In a medical experiment, for example, researchers know that some patients respond favorably to any medication, at least at first. So when the experimental group is given a pill containing the drug under study, the control group is often given a harmless sugar pill with no active ingredient, to simulate the experience of taking medication as it occurs in the experimental group. Such sugar pills are known as placebos, a term that can be applied more generally to any neutral activity or stimulus introduced in the control group to reproduce the experiences of the experimental group; the tendency of subjects to respond favorably to any treatment, including sugar pills, is known as the placebo effect.
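The reason for the placebo group can be shown with simple arithmetic: the improvement seen in the drug group includes the placebo effect, so the drug's own contribution is estimated by subtracting the improvement rate in the placebo group. The trial counts below are invented for illustration, and the function names are hypothetical.

```python
def improvement_rate(improved, total):
    """Fraction of a group whose condition improved."""
    return improved / total

def placebo_adjusted_effect(drug_improved, drug_total, placebo_improved, placebo_total):
    """Improvement attributable to the drug beyond what sugar pills produce."""
    return (improvement_rate(drug_improved, drug_total)
            - improvement_rate(placebo_improved, placebo_total))

# Hypothetical trial: 60 of 100 improve on the drug, 35 of 100 on the placebo.
naive = improvement_rate(60, 100)
adjusted = placebo_adjusted_effect(60, 100, 35, 100)
print(f"drug group improved: {naive:.0%}")            # 60%
print(f"improvement beyond placebo: {adjusted:.0%}")  # 25%
```

Without the control group, the whole 60% might be credited to the drug; the comparison suggests that only about 25 percentage points of it are.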

In studies, the researchers only passively collect data, whether they record the data from their own observations or analyze existing records. On the one hand, because they do not involve the active control of a suspected cause, studies can only show correlation, never causation. On the other hand, studies have the flexibility of moving from the effect back to its cause, as well as from the cause forward to its effect.

Studies, then, are largely a matter of statistical analysis. They lack the direct manipulation of experiments, and they usually do not need to rely on the statements of individuals, as polls do. Depending on the design of the study, a control group (that is, a second sample group similar to the first but missing the factor under study) may be used to help strengthen the causal arguments, which will be discussed briefly below.
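Because studies work with recorded data rather than manipulation, their typical product is a measure of association. The Pearson correlation coefficient is one widely used measure (chosen here as an example; the text does not name a specific statistic): it runs from -1 to 1, and values far from 0 indicate that two variables tend to move together. The function below is a plain-Python sketch (Python 3.10 and later also provide `statistics.correlation`), and the data are made up. However strong the result, it still shows only correlation, not causation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable never varies")
    return cov / (spread_x * spread_y)

# Hypothetical records: years of smoking vs. a made-up health-risk score.
years_smoked = [0, 5, 10, 20, 30]
risk_score = [1, 2, 4, 6, 9]
print(round(pearson_r(years_smoked, risk_score), 3))
```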

In polls, researchers rely on what people say, rather than studying a phenomenon itself. Polls are the most common type of estimation, and require the least amount of investigative effort because, once the sample is chosen, pollsters simply ask that sample questions and record the responses. Unlike experiments and studies, polls can only be conducted on human populations, since only humans can communicate their responses. (Exceptions such as signing apes, talking parrots, and clicking cetaceans suggest that polls may be done among non-human species in the future, but not yet.) Unfortunately, polls must rely on the veracity of their subjects--and humans are notorious liars, especially on subjects of enough consequence to warrant study, such as sexual practices, food consumption, voting preferences, spending habits, and so on.

Sometimes polls do not seem to be concerned with correlation at all. Asking likely voters whom they favor, for example, does not appear to involve correlation or causation. However, pollsters are usually looking for patterns that associate the relevant qualities of their adjusted sample (age, race or ethnicity, gender, education, party affiliation, occupation, income, and so on) with the results of their poll.
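The pattern-finding that pollsters do is often presented as a cross-tabulation: the share of each answer within each demographic group of the sample. Here is a minimal sketch, assuming each response is stored as a dictionary; the field names and the responses themselves are hypothetical.

```python
from collections import Counter, defaultdict

def crosstab(responses, group_key, answer_key):
    """Return {group: {answer: share of that group giving the answer}}."""
    counts = defaultdict(Counter)
    for response in responses:
        counts[response[group_key]][response[answer_key]] += 1
    return {
        group: {answer: n / sum(answers.values()) for answer, n in answers.items()}
        for group, answers in counts.items()
    }

# Hypothetical responses to "Whom do you favor?"
responses = [
    {"age": "18-29", "favors": "Candidate A"},
    {"age": "18-29", "favors": "Candidate B"},
    {"age": "18-29", "favors": "Candidate A"},
    {"age": "65+", "favors": "Candidate B"},
    {"age": "65+", "favors": "Candidate B"},
]
for group, shares in crosstab(responses, "age", "favors").items():
    print(group, {answer: round(share, 2) for answer, share in shares.items()})
```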

Since all forms of estimation usually aim to show either correlation or causation, they all employ causal reasoning, such as you read about in the section by that name at the beginning of Part 3. Showing that one factor is the difference or the commonality between sample groups that show a particular outcome is the whole purpose of estimation, and usually both forms of causal reasoning (difference and commonality) must be employed to demonstrate the causation or correlation convincingly.

Polls, studies, and experiments usually report their results at a 95% confidence level: if the estimation were repeated many times with fresh samples, about 95% of the resulting ranges would be expected to contain the true value. In addition, as you read in the section on statistics, the results of all estimations are limited by a factor called the "margin of error," which depends largely on the size of the sample used. The results of estimations cannot be precise, but must be expressed within the range of the margin of error. If, for example, George W. Bush received 49% support in a Florida opinion poll during the election, and the poll has a margin of error of plus or minus 3 percentage points, then we can be 95% confident that Bush's actual support in Florida at that time was somewhere between 46% (49 - 3) and 52% (49 + 3). Note that the poll does not show that Bush's support was exactly 49%: any figure within that range is consistent with the results, even though figures near the middle of the range are somewhat more plausible than those at its edges.
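For a sample proportion, the margin of error can be computed with the standard normal approximation: at 95% confidence it is about 1.96 times the square root of p(1 - p)/n, where p is the observed proportion and n the sample size. The sketch below assumes a simple random sample, which, as the previous section showed, real polls rarely achieve. The text does not give the Florida poll's sample size; 1,067 is used only because it is roughly the size that yields a margin of about 3 points.

```python
from math import sqrt

Z_95 = 1.96  # standard normal value for 95% confidence

def margin_of_error(p, n, z=Z_95):
    """Approximate margin of error for a proportion p observed in a sample of n."""
    return z * sqrt(p * (1 - p) / n)

def confidence_interval(p, moe):
    """Range of plausible values for the true proportion."""
    return p - moe, p + moe

n = 1067                        # hypothetical sample size
moe = margin_of_error(0.5, n)   # p = 0.5 gives the largest (worst-case) margin
print(f"margin of error: ±{moe:.1%}")       # ±3.0%
low, high = confidence_interval(0.49, 0.03)
print(f"Bush: {low:.0%} to {high:.0%}")     # 46% to 52%

# Quadrupling the sample only halves the margin of error.
print(f"n = {4 * n}: ±{margin_of_error(0.5, 4 * n):.1%}")
```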

Also note that, to be statistically significant, the difference between two results (say, the support for Bush and for John Kerry) must be large enough that their ranges do not overlap; as a rule of thumb, a lead in a single poll must be roughly twice the margin of error. If, in the same poll, Kerry received support from 46% of likely voters and Bush from 49%, and the poll had a margin of error of plus or minus 3 percentage points, then Kerry's support should actually be read as falling between 43% and 49%, and Bush's between 46% and 52%. Because these ranges overlap, and despite Bush's apparent 3-point lead, we must conclude that the difference between Bush's 49% and Kerry's 46% is not statistically significant.
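The comparison of the two ranges can also be made directly, by computing a margin of error for the lead itself. When both shares come from the same poll, a standard approximation for the variance of the difference is (p1 + p2 - (p1 - p2)^2)/n. The sketch below applies it, again assuming a simple random sample and the hypothetical size of 1,067 respondents. For the 49% versus 46% example, the margin on the lead comes out near 6 points, about twice the poll's reported margin, which is why a 3-point lead is not significant.

```python
from math import sqrt

def lead_margin(p1, p2, n, z=1.96):
    """Approximate 95% margin of error for the lead p1 - p2 within one poll of size n."""
    # Both shares come from the same respondents, so they are negatively related;
    # this variance formula accounts for that.
    variance = (p1 + p2 - (p1 - p2) ** 2) / n
    return z * sqrt(variance)

def lead_is_significant(p1, p2, n, z=1.96):
    """True if the observed lead is larger than its own margin of error."""
    return abs(p1 - p2) > lead_margin(p1, p2, n, z)

n = 1067  # hypothetical sample size giving a reported margin of about ±3 points
print(f"margin on the lead: ±{lead_margin(0.49, 0.46, n):.1%}")
print(lead_is_significant(0.49, 0.46, n))  # False: a 3-point lead is too small
print(lead_is_significant(0.55, 0.40, n))  # True: a 15-point lead is not
```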