the decline effect

A fascinating article in the New Yorker by Jonah Lehrer explores the phenomenon of scientific findings that seem to “wear off.” They are demonstrated in a series of initial studies, but later attempts at replication fail to confirm them.

For example, “second generation” antipsychotic drugs worked well in double-blind experimental trials when they were first developed, but now perform worse than the first generation drugs. The phenomenon of “verbal overshadowing” (in which describing something degrades our memories of it) was well demonstrated in experiments that, when replicated today, find much smaller effects. Symmetry was found to attract mates in numerous species, from insects to people, but recent studies no longer find that pattern as strongly, if at all. Even in physics, “the weak coupling ratio exhibited by decaying neutrons … appears to have fallen by more than ten standard deviations [if one compares experiments conducted] between 1969 and 2001.”

It’s not clear that we should expect one explanation for all these examples. In fact, it’s not clear that one phenomenon is occurring–rather than a series of anecdotes plucked from a vast and diverse literature. But the article drew my attention to a specific problem that may, indeed, be widespread.

In statistical studies, we usually say that an effect is significant if there is no more than a 5% chance that a result at least that large would arise from random sampling error when no real effect exists. That means that when researchers test hypotheses that are actually false, about 5% of those tests will still come out “significant” by chance. This error rate should not be a problem if studies are replicated and knowledge is built cumulatively. False results should become outliers and have little effect on the literature.
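To make that arithmetic concrete, here is a small simulation (my own sketch, not anything from Lehrer’s article, with arbitrary sample sizes) that runs thousands of studies of a nonexistent effect and counts how many clear the 5% bar by chance alone:

```python
# Minimal sketch: run many two-sample t-tests on data with NO real difference
# and count how often they come out "significant" at p < 0.05 anyway.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_studies = 10_000
false_positives = 0

for _ in range(n_studies):
    # Both groups are drawn from the same distribution: the null hypothesis is true.
    control = rng.normal(loc=0.0, scale=1.0, size=30)
    treatment = rng.normal(loc=0.0, scale=1.0, size=30)
    _, p_value = stats.ttest_ind(control, treatment)
    if p_value < 0.05:
        false_positives += 1

print(f"'Significant' results with no real effect: {false_positives / n_studies:.1%}")
# Expect roughly 5% -- exactly the error rate the significance threshold allows.
```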

But most studies are not published, either because the researchers don’t think they have found anything interesting and try a different approach before they submit their articles, or because they submit drafts that are rejected. A study is more likely to be published if it is interesting (meaning counter-intuitive), if it supports the researchers’ bold hypotheses, and if their hypotheses reflect beliefs that are either popular or interestingly controversial. So an erroneous result that arises from random sampling error–which is no one’s fault, by itself–is much more likely to be published than a valid but boring null result. For instance, if you raise students’ test scores with a simple and quick new intervention, journals will be delighted to publish your study; but if you show that this new idea didn’t work, you will be hard pressed to publish. Also, if you show that a conventional form of teaching modestly raises students’ skills, your study may be rejected as dull.

Once a study is in print, none of the incentives (money, fame, and promotion) encourage replicating it exactly to make sure it wasn’t a fluke. Instead, researchers are encouraged to try similar studies in different contexts–and those are most likely to be published if they again confirm the hypothesis. If, by predictable luck, 5% of studies falsely confirm a faddish new theory, they will have decent odds of being published and dominating the literature. It is only once a theory becomes well established that the incentives shift and one can make a mark by disproving it. That could explain why particularly well-established findings are suddenly subjected to replication, decades after they first arose, and are shown to be false or exaggerated.
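Extending the earlier sketch, here is a rough simulation (again my own illustration, with made-up parameters for the share of true hypotheses, effect size, and publication odds) of that filter at work: most tested hypotheses are false, but “significant” results are far more likely to be published than null ones, so chance findings can dominate the published record of positive results.

```python
# Rough sketch of a publication filter, using assumed (not empirical) parameters.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_studies = 10_000
share_true_hypotheses = 0.1    # assumption: only 10% of tested ideas are real
true_effect_size = 0.3         # assumption: a modest real effect when one exists
publish_if_significant = 0.9   # assumption: 90% of significant results get published
publish_if_null = 0.1          # assumption: 10% of null results get published

published_significant = 0
published_significant_false = 0

for _ in range(n_studies):
    hypothesis_is_true = rng.random() < share_true_hypotheses
    effect = true_effect_size if hypothesis_is_true else 0.0
    control = rng.normal(0.0, 1.0, size=30)
    treatment = rng.normal(effect, 1.0, size=30)
    _, p_value = stats.ttest_ind(control, treatment)
    significant = p_value < 0.05
    publish_prob = publish_if_significant if significant else publish_if_null
    if significant and rng.random() < publish_prob:
        published_significant += 1
        if not hypothesis_is_true:
            published_significant_false += 1

print(f"Published 'positive' findings that are false positives: "
      f"{published_significant_false / published_significant:.1%}")
# With these assumed parameters, well over half of the published positive
# findings are chance results -- even though each individual test used p < 0.05.
```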

There is nothing particularly spooky about this process. The fault lies not with nature but with how we organize the institutions of science. Lehrer’s subtitle asks: “Is there something wrong with the scientific method?” I think the right answer may be a little like Gandhi’s purported comment about Western civilization: “It would be a good idea.”

