Licensure Testing: Purposes, Procedures, and Practices, ed. James C. Impara (Lincoln, NE: Buros Institute of Mental Measurements, University of Nebraska-Lincoln, 1995).


Copyright © 1995 by Buros Institute of Mental Measurements. Digital Edition copyright © 2012 Buros Center for Testing.


When test scores are used to make important decisions, as is typically the case with licensure tests, the validity of test score interpretations is critical. The validity of the decision (e.g., pass or fail on the licensure examination) relies heavily on the validity of the test score on which that decision is based. So, although validity is always a critical component of test score interpretation, it takes on increased importance when the score is used in high-stakes decision situations such as licensure testing.

Issues in validity for licensure tests have been addressed in Chapter 4 of this volume. The focus of this chapter is on techniques that have been developed for identifying one source of invalidity in test interpretation: differential item functioning (DIF) by identifiable groups. The chapter begins with a discussion of what constitutes DIF and under what circumstances it poses a source of invalidity in test interpretation. Next, various methods for identifying test items that function differentially are highlighted. This section focuses principally on multiple-choice test items, although a separate subsection on applications of DIF methods to constructed-response items is presented. The chapter ends with a concluding section that makes recommendations for future developments in identifying test items that function inappropriately for different sub-populations.

This chapter concentrates on the individual items that make up the test, not on administrative or other aspects of testing that also might influence examinee test performance. Specifically, this chapter considers ways to identify items that function differentially for identifiable sub-populations. Other reasons for differences in score performance (e.g., speeded conditions, administration medium, test anxiety, test-wiseness) are extremely important; however, these issues are beyond the scope of this chapter.

The focus of this chapter is on approaches that show promise for identifying items that function differentially in licensure tests. It is not the intent of this chapter to present step-by-step details of the calculations these methods require. The reader should consult other books that present formulas for such calculations, particularly Berk (1982), Camilli and Shepard (1994), and Holland and Wainer (1993). Further, this chapter is not designed to be a comprehensive resource for DIF methods; instead, it samples from these methods the techniques that are relevant to, or dominant in, DIF analysis for licensure test applications.