the monumental task that confronts a high-stakes testing state

Let’s say you don’t especially trust teachers to assess their own students, because their ratings can be inconsistent and biased. So you want to use validated and standardized assessments to evaluate students, schools, and teachers. Let’s say, furthermore, that your state authorizes about 4,000 different courses, from kindergarten through 12th grade. (A subject like science in 3rd grade counts as a “course,” by the way.) Each course encompasses many different content areas; for instance, an American history course covers the Revolution, the Civil War, civil rights, and so on. For each topic in each course, you need assessment “items” (questions or prompts of various kinds). You need more than a few items for each topic; one question does not yield a valid score. You can’t repeat items without allowing kids to cheat by looking at old tests. And you will be testing frequently: more than once per year in each course, once you count make-up tests and practice tests.
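To get a feel for the scale, here is a back-of-the-envelope sketch of that multiplication. Only the 4,000 courses figure comes from the scenario above; the topics per course, items per topic, and test forms per year are illustrative guesses of mine, not numbers from any state's documents, and `items_needed` is just a hypothetical helper name.

```python
def items_needed(courses, topics_per_course, items_per_topic, forms_per_year):
    """Rough count of distinct items if no item is reused across test forms."""
    return courses * topics_per_course * items_per_topic * forms_per_year


COURSES = 4_000          # from the scenario above
TOPICS_PER_COURSE = 10   # assumption: content areas covered in one course
ITEMS_PER_TOPIC = 5      # assumption: "more than a few" items per topic
FORMS_PER_YEAR = 3       # assumption: regular, make-up, and practice tests

total = items_needed(COURSES, TOPICS_PER_COURSE, ITEMS_PER_TOPIC, FORMS_PER_YEAR)
print(f"{total:,} items")  # 600,000 items under these assumptions
```

Change any of the guesses and the total moves a lot, but it is hard to pick plausible values that bring it much below six figures.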

The upshot is that you will need at least several hundred thousand assessment items to make the whole system work. See Florida’s Race to the Top Assessments page for some of the documents on which my estimate is based. Thus …

  1. You face an expensive undertaking, and if you skimp, you will get poor items, written by people who are not sophisticated about the content or well trained in writing assessments. Pilot-testing items costs even more money.
  2. Even if you spend enough money, writing several hundred thousand items is a human enterprise. Error is inevitable. Some proportion of your items will be flatly incorrect or invalid in other ways. Many will be too easy or too hard, or inadequate to assess the desired skills and knowledge.

On its own, this is not an argument against high-stakes testing. The best argument in favor is that measuring pretty well is better than not measuring at all. But the cost and frailty of the whole system must certainly be taken into consideration. After all, the power of the state stands behind these assessments. If a kid cannot move on to 8th grade, or if a teacher loses his job because of test scores, that is a state decision. I think people may reasonably view it as almost a juridical process.

In the corporate context, employers are always assessing employees, and vice versa. It is not OK if an employer’s assessments are biased or arbitrary, but using standardized measures may at least reduce inevitable bias, and the market does offer a theoretical solution to injustice (the employee finds a different job). In contrast, if a state moves from not making high-stakes assessments at all to making them badly, it’s like imposing a new juridical regime that makes arbitrary decisions. I see a serious threat to justice.

