Licensure Testing: Purposes, Procedures, and Practices, ed. James C. Impara (Lincoln, NE: Buros Institute of Mental Measurements, University of Nebraska-Lincoln, 1995).
The number of people in the United States who carry some responsibility for the writing of examination questions and the construction of tests is unknown. In the Preface to The Construction and Use of Achievement Examinations, published by the American Council on Education in 1936, the authors indicated that the number probably exceeded a million. That number has certainly grown in the past 60 years. Questions are posed to students by teachers at all levels of education; the Armed Forces have people whose job it is to construct tests which are used in the promotion of personnel; over 1,000 occupations are regulated by the states and many, ranging from the professions to the trades, require licensure or certification (Brinegar, 1990). Many licensure and certification decisions are based on test performance.
Over the years, the types of test questions in use have changed: emphasis has shifted from performance testing to multiple-choice testing and back to performance assessment. Apprenticeship programs in the trades (a kind of continuous assessment of performance) have been supplemented, or even replaced, by written examinations, or by a combination of written and performance tests. More recently, technology has begun to enter testing; for example, computer administration of questions, interactive video, and CD-ROM are beginning to be used.
Regardless of the type of test, whether it was written 50 years ago or last week, there are some important concerns. Fundamental among these concerns are the reliability and validity of the measures. The purpose of this chapter is to focus on the psychometric issues of reliability and validity of measures as they pertain to licensure examinations. In addition, the chapter focuses on the relationship of the measures to various guidelines: those of the Equal Employment Opportunity Commission (EEOC, 1975) and The Standards for Educational and Psychological Testing, produced by a joint committee of the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME) and published by the APA (1985). (We will refer to the EEOC document as the EEOC Guidelines and the AERA, APA, and NCME document as the Standards.)
Frequent references are made to the reliability and validity of examinations when, in reality, it is the scores and the decisions made on the basis of the scores that are, or are not, reliable and valid. In the context of licensure, scores are used to make decisions. Statistical analysis may show that the scores possess properties indicative of reliability. Studies may be conducted to show that the measures have some type of validity. However, reliable and valid scores may be used inconsistently or incorrectly, and when this happens, the decisions made on the basis of the scores may not be reliable or valid decisions.
The discussion of reliability and validity in this chapter focuses on the traditional concepts of reliability and validity rather than on a more contemporary approach broadly called generalizability theory. We focus on the more traditional concepts simply because most licensure and certification programs with which we are familiar have not yet made the transition to generalizability theory as their basic approach to reporting the psychometric characteristics of their tests.