making tests

I’m at the Educational Testing Service in Princeton, NJ, helping them to cook up a test. There are many cooks at work on this particular broth. In fact, what strikes me most about designing a national student assessment is its enormous complexity. There are laws, policies, rules, budgets, curricular standards and objectives, “items” (i.e., questions of various types), answer keys, trainings and guides for scorers, data from preliminary laboratory tests of items, pilot tests of whole exams, statistical results from the pilot tests, revised items and instruments, final results, statistical scales, summary measures, and reports. There are content experts, item-writers, statisticians, psychometricians, scorers, trainers, trainers-of-trainers, and various layers of contractors, government agencies, and reviews.

If you are prone to distrust or dislike pencil-and-paper exams (and I understand and respect those arguments), all this apparatus may seem like a bureaucratic and technocratic nightmare. Indeed, any test involves countless value judgments, guesses, and compromises, often buried in technical or administrative jargon. There is something extremely “Weberian” about a government-sponsored exam or assessment: it is a classic example of the effort to standardize and measure in order to control and improve.

On the other hand, if you are not directly acquainted with the process of test-design at the federal level, you might not realize how many different people struggle to develop and implement tests that reflect ethical principles of fairness, reliability, relevance, and social significance. The result is, if nothing else, the product of a lot of hard work.