assessment: an overview « Peter Levine

Recently I presented some thoughts about why and how we might use assessment in civic education. Most of my points apply to education in general. People seemed to find these ideas useful, so I offer my notes here.

Assessment for what?

“Formative”: to find out what students or other people know or can do before an educational experience begins, so that we can tailor the education to their needs.
As an incentive for performance. For instance, if students must pass a civics test to complete ninth grade (the theory goes), they will work hard at civics.
As a gatekeeper: perhaps no one should hold a high school diploma unless he or she can demonstrate particular knowledge.
To guide institutions or public policies. For example, we assess programs to decide whether to fund or require them; we evaluate teachers to determine their employment status.
For the improvement of programs or institutions: in other words, as helpful feedback to educators and administrators.
To impress outsiders, such as potential funders, with the merits of programs.

Assessment of whom?

Students–but we might choose to focus on average students, highly at-risk students, talented and motivated students who are potential leaders, or groups of students to see how they perform as teams.
Educators
Programs
Schools
States and other governmental entities

Assessment of what?

Students’ knowledge, skills, dispositions, values, or habits and behaviors. Note that the kinds of knowledge we want children to possess are enormously various and extensive. In the civic domain, skills encompass fairly typical academic skills (such as interpreting a written political speech) and distinctively civic skills (such as moderating a meeting or dealing with a free-rider in a group). Dispositions and habits can be assessed, but not when the stakes are high. Asking students to report on their own values and behaviors and then holding them accountable for their answers seems an invitation to lie.
Schools’ offerings–for example, what courses they provide; whether a student newspaper exists.
Teachers’ performance.
Programs’ effects: ideally, the changes in students that are causally attributable to their experience in a given program.
Pedagogical techniques or strategies, or elements of programs abstracted from specific programs. For instance, several evaluations of programs that include seminars for teachers (Facing History and Ourselves is an example) have shown good effects on students. But we do not know whether seminars for teachers, per se, are helpful.

Assessment by whom?

Program staff or teachers, who can assess students or programs
Supervisors, who can assess teachers or students
Expert evaluators or test-designers
Voters, citizens, or parents, whose role can either be informal (putting pressure on schools to remedy perceived failures) or formal (serving on evaluation committees, reviewing data)
Students, who can be asked for their opinions about teachers or programs. More interestingly, they can be asked to supply relatively objective data about educational experiences.

Assessment how?

Tests or test-like written instruments. These are relatively inexpensive, standardizable, and subject to public review, but they are limited to factual knowledge and fairly simple academic skills. They are also limited to assessing individuals’ work, not group work.
Performances or portfolios that are graded by teachers or juries.
Simulations or games–winning or scoring well on the game would lead to a positive assessment.
Evaluations based on people’s opinions of the program, e.g., college students’ course evaluations.
Longitudinal studies, which repeat some of the same survey items at different times. Repeating a survey very soon after an intervention tells us nothing about retention. Repeating it after a long while precludes attributing any changes to a particular intervention. Repeating it many times is helpful but generally expensive, because of the costs of retaining individuals in a study.
Randomized experiments, about which I have written before. My favorite design, by the way, is a wait-list control, in which volunteer participants are assigned to receive the experience either immediately or after a delay, and the two groups are assessed simultaneously. Facing History and Ourselves, the Bill of Rights Institute, and the Center on Politics at the University of Virginia have shown that randomized field experiments of civic education are possible.
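
To make the wait-list design concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the procedure used by any of the organizations named above: the function names, the even split, the fixed seed, and the use of a simple difference in mean scores are all hypothetical choices.

```python
import random


def assign_wait_list(volunteers, seed=0):
    """Randomly split volunteers into an immediate group and a wait-list group.

    Assumption: an even split, with any odd person out placed on the wait list.
    The seed is fixed only so that the assignment can be reproduced and audited.
    """
    rng = random.Random(seed)
    shuffled = list(volunteers)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def estimated_effect(immediate_scores, wait_list_scores):
    """Difference in mean scores when both groups are assessed at the same time.

    Assumption: the immediate group has already had the experience and the
    wait-list group has not, so the wait-list group serves as the control.
    """
    def mean(values):
        return sum(values) / len(values)

    return mean(immediate_scores) - mean(wait_list_scores)


# Hypothetical usage: ten volunteer classrooms, identified by number.
immediate, wait_list = assign_wait_list(range(1, 11), seed=42)
```

Because everyone in both groups volunteered, the comparison is between similar people, and no one is permanently denied the experience. A real study would also need a significance test and a plan for handling attrition, which this sketch omits.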

What we lack

In the civics field, we are most seriously in need of:

Tools for reliably assessing advanced skills, especially distinctively civic or leadership skills that are not also academic skills.
Tools for assessing participation in group projects and discussions.
Assessments of the quality of “inputs” (not what students know but what schools teach).
Well-designed assessments of the impact of professional development for teachers on their students.