Often, people who say that they have experienced specific services or opportunities also report better outcomes (such as educational success, employment, or health) than similarly situated people who never received those opportunities. For example, according to a paper published by CIRCLE, young adults who were required to perform community service as part of their middle school coursework are 14 percentage points more likely to graduate on time from college, even when one compares them to people who are similar with respect to all the other factors measured in the survey, such as test scores, parental education, race, and gender. One could conclude that community service has a 14-point positive impact on college graduation.
Indeed, that effect is possible. But service-learning has not yet been tested with a more rigorous method of evaluation. The “gold standard” is a randomized experiment, in which some people (the “treatment group”) are randomly assigned to receive an experience that others (the “control group”) don’t get. If assignment is random, then the two groups are alike on average in every other respect, and the difference in their outcomes is a measure of the impact of the experience.
Experiences that appear highly beneficial in studies of whole populations often show modest or no results in experimental tests. This pattern is common enough that it calls for some general reflections.
why experiments rarely show impact
Survey-based studies and experiments may produce divergent results because the people who receive opportunities have advantages that account for their later successes and that are not measured in surveys–such as motivation or perseverance, helpful networks, enrollment at subtly better schools, or ties to motivated teachers and other helpful individuals. These advantages explain their good outcomes and account for what we originally hoped were benefits of specific programs.
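The mechanism described above can be illustrated with a small simulation. This is only a sketch under invented assumptions: the `simulate` function, the `motivation` variable, and all of the numbers are hypothetical and are not drawn from the CIRCLE paper or any real study. The program is given no true effect at all, yet the people who choose to enroll still look better off, because an unmeasured trait drives both enrollment and outcomes.

```python
import random
from statistics import mean


def simulate(n=20000, true_effect=0.0, seed=0):
    """Compare a survey-style gap with an experimental gap.

    All quantities are hypothetical; 'motivation' stands in for any
    advantage that a survey does not measure.
    """
    rng = random.Random(seed)
    obs_enrolled, obs_not_enrolled = [], []
    exp_treatment, exp_control = [], []

    for _ in range(n):
        motivation = rng.gauss(0, 1)  # unmeasured advantage
        noise = rng.gauss(0, 1)       # everything else that affects outcomes

        # Survey world: more motivated people are more likely to enroll.
        enrolls = motivation + rng.gauss(0, 1) > 0
        outcome = motivation + true_effect * enrolls + noise
        (obs_enrolled if enrolls else obs_not_enrolled).append(outcome)

        # Experimental world: a coin flip decides who gets the program.
        assigned = rng.random() < 0.5
        outcome = motivation + true_effect * assigned + noise
        (exp_treatment if assigned else exp_control).append(outcome)

    return {
        "observational_gap": mean(obs_enrolled) - mean(obs_not_enrolled),
        "experimental_gap": mean(exp_treatment) - mean(exp_control),
    }


if __name__ == "__main__":
    for name, gap in simulate().items():
        print(f"{name}: {gap:.2f}")
```

With the default `true_effect` of zero, the observational gap comes out well above zero while the experimental gap stays close to zero. Controlling for measured factors would not help in this toy setup, since the trait that matters is, by construction, one the survey never recorded.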
(Another possible reason for the failure of programs to “work” when tested in randomized experiments is that the control groups actually find alternative programs. If that happens, an experiment will miss real benefits. But it’s my general sense that this is a relatively rare explanation.)
the injustice of testing only some programs
We treat programs and institutions with profound inconsistency. Government-funded programs for poor people are expected to show long-term impact in “gold standard” experiments, but no one ever asks whether services provided to high-income people work–even when such services are publicly subsidized. For instance, my university provides four years of elaborate educational, social, recreational, health, and housing services for its undergraduates. No one imagines that the impact of that package of services would ever be tested in a randomized experiment. Universities like mine obviously confer advantages on the individuals who graduate; our graduates are preferred in the job market. Since they benefit (relative to others), they want the opportunity to attend. And since they have political and economic clout, they get the opportunities they want. But a randomized experiment might find that the social benefit of a Tufts education is small, especially if the control group got a BA at half the cost.
what we should do
Notwithstanding this serious unfairness, I believe we should expect programs aimed at poor people to “work” under rigorous tests. Political power is unequally distributed. Those without power never get much public assistance. Given limited funds, we need to spend every dollar well. To test programs experimentally is not punitive; it’s a matter of making sure that we really do good.
In fact, I’m in favor of widespread field experimentation, with the following caveats:
1. There is an appropriate life-cycle for programs. They shouldn’t be expected to “work” in randomized studies from Day One. There should first be a fairly long process of informal experimentation and adjustment. Such experimentation should be supported.
2. When many programs fail to show impact, we shouldn’t become generally pessimistic about social interventions. We are holding them to standards that we never use in the private sector or when assessing other types of government programs (such as weapons purchases, agricultural subsidies, or macroeconomic policies).
3. Benefits need not always be long-term. Many beneficial interventions wear off, and that is an argument for follow-up, not for canceling the programs. Besides, if a program actually makes life better for 13-year-olds, that seems like an important advantage even if they are not still better off when they are 20.
4. Experiments should be used to improve programs. That is much more promising than lurching from one untested strategy to another every time results are disappointing.
5. Randomized experimentation is a rather detached, arm’s-length method. But it is also an easy method to explain, which means that participants can take a real part in it. They should influence important aspects of the research, such as decisions about what to measure and interpretations of the results.