Buros-Nebraska Series on Measurement and Testing



From: Assessment of Teaching: Purposes, Practices, and Implications for the Profession, edited by James V. Mitchell, Jr., Steven L. Wise, and Barbara S. Plake; Series Editor Jane Close Conoley (Hillsdale, New Jersey, Hove & London: Lawrence Erlbaum Associates, 1990)


Copyright © 1990 by Lawrence Erlbaum Associates, Inc. Digital Edition Copyright © 2012 Buros Center for Testing. This book may be downloaded, saved, and printed by an individual for their own use. No part of this book may be re-published, re-posted, or redistributed without written permission of the holder of copyright.


In a recent update on teacher testing practices across the United States, Rudner (1988) reported that 44 states have developed teacher-certification-testing programs, with 26 states currently testing prospective teachers as a certification requirement and another 18 states scheduled to implement such programs in the near future. It is obvious that teacher testing has become a very extended endeavor. It has also stimulated extended debate.

It was in acknowledgment of the importance of this extended teacher testing and associated debate that the Advisory Committee of the Buros Institute of Mental Measurements decided to devote its 1987 annual symposium to the topic of teacher assessment. As we developed the plans for this symposium, we tried to keep in mind two principles to guide our thinking and planning: (a) our treatment of teacher assessment was not to be narrowly conceived and focused on a singular aspect of teacher assessment (e.g., assessment for certification), but rather was to address the larger measurement and implementation issues that were generic to many or all teacher assessment settings; and (b) we hoped that we could avoid the mere rehashing of old issues and instead effectively advance thinking about teacher assessment in ways that, in John Dewey's words, would represent "a level deeper and more inclusive than is represented by the ideas and practices of the contending parties" (Dewey, 1949, p. v). We hope that we have accomplished that, at least to some degree, both in the symposium and now with the book. The purpose of this concluding chapter is to work within this context to highlight and compare some of the salient thoughts of the several contributors, to reflect on their meaning and implications, and to point out some of the issues that remain. Each contributor is considered in turn, with summary comment to follow about their combined contributions. W. James Popham, the keynote speaker, is considered first.

When we first asked Jim Popham to present the keynote address at the symposium, we had in mind both his extended and important contributions to this area and the fact that this experience would qualify him admirably for addressing the questions implied by the topic we had tentatively suggested: "Teacher Assessment: Why and for What Purpose?" When he accepted the invitation, Popham asked whether he could "spice up" the title, and the final result was a "spicing up" of both title and topic by focusing on an issue that he felt had very potent implications for the future of teacher assessment: "Face Validity: Siren Song for Teacher Testers."

Popham's contention that we are being lured away from more important concerns by becoming preoccupied with face validity considerations is an important and timely one for many participants in the teacher-testing enterprise, particularly those who are not as indoctrinated with the holy trinity of validity classifications as most measurement people are. But measurement people are only a small contingent in the teacher-testing arena, and the lure of face-validity considerations over the more important consideration of the validity of score-based inferences is but another example of the miscommunication, differing (and sometimes unknowledgeable) expectations of different groups, and downright wish fulfillment that often seems rampant whenever the issue of teacher testing arises. The "quick fix" mentality often found in the public, legislators, governmental agencies, and even in some educators creates a setting where the siren song of face validity becomes irresistibly appealing. Popham is to be congratulated for warning us of the risk.

I am almost totally in accord with the major points made by Popham, and this should be kept in mind in the following discussion. However, there were some issues that were raised in my mind that did not necessarily lessen the effect of Popham's arguments but were stimulated by the major directions that his arguments took. If these are side issues, they are important side issues, and they are an interesting example of how a focus on one particular issue can raise other issues for which the answers sought are important in their own right as well as for their contribution to the understanding of the original issue.

The first issue relates to Popham's definition of face validity, a definition that I believe most of us would find acceptable: "Face validity constitutes the perceived legitimacy of a test for the use to which it is being put." As I read that definition I was struck by the extent to which "perceived legitimacy" plays a role not only in the face-validity setting but also in the content-validity exercises that are so much a part of the local validation effort for teacher tests like the National Teacher Examinations (NTE). It is sometimes hard to determine why "perceived legitimacy" is accorded so much more professional approval in the case of content validity than it is for face validity. For the NTE, for example, a typical content validity exercise would have a college-based panel address the question of the content appropriateness of each test item by asking each panelist whether 90% of the applicants for entry-level certification have had the opportunity to acquire the knowledge or academic skills being tested; another panel, in this case a school-based panel, would address the question of the job relatedness of each test item by asking each panelist how important the knowledge or skill was for the beginning teacher in general. If this isn't a "perceived legitimacy" question, I don't know what is. There are differences, of course, but are the differences critical? In the case of the NTE panels, for example, the panelists are supposed to be either experts or very knowledgeable people who have direct personal experience with the content or job that defines the judgment setting. Face-validity judgments usually refer to judgments by less qualified or knowledgeable people. Another difference is what is judged. The NTE panels judge either content relatedness with teacher-training curricula or relatedness to the job of teaching. Face-validity considerations involve judgments about whether the test or test items look appropriate for the testing of teachers. But they are both "perceived legitimacy" judgments with all the human frailties usually associated with such judgments.