Buros-Nebraska Series on Measurement and Testing

 

Date of this Version

1995

Document Type

Article

Citation

Licensure Testing: Purposes, Procedures, and Practices, ed. James C. Impara (Lincoln, NE: Buros Institute of Mental Measurements, University of Nebraska-Lincoln, 1995).

Comments

Copyright © 1995 by Buros Institute of Mental Measurements. Digital Edition copyright © 2012 Buros Center for Testing.

Abstract

New technologies continue to emerge each year and influence testing practices. In particular, in the last 10 years the personal computer has evolved from a curious and minimally useful tool to an indispensable partner in many certification and licensure testing programs. It is involved in every aspect, including candidate scheduling, test assembly, test administration, test scoring and analysis, and score reporting. Initially, it is used to determine the content to be included in the job analysis instrument, and later, to analyze the returned surveys. After the job analysis is completed and test specifications prepared, it can be used to bank test items written to the specifications. Assembly of test forms and typesetting of final copy prior to printing can be expertly accomplished. When paired with an optical mark reader scanner, it can be used to score and analyze tests. As an alternative to paper-and-pencil test delivery, items can be loaded onto a computer and administered in a variety of alternate forms, with instantaneous feedback to candidates. Likewise, score reports can be prepared and mailed to candidates using information stored in the candidate database.

As the personal computer has gained in power, it has had a significant impact on the psychometric practices of testing. Statistical packages written for the "PC" platform are now as powerful as their mainframe counterparts. This has increased the accessibility of resource-hungry technologies such as Item Response Theory (IRT), making them available to many more individuals than those at universities and large testing companies. In turn, this availability has stimulated research on new technologies and encouraged their transition from "ivory tower" applications to real-world, applied testing environments. Although the transition has not been totally painless, the initial trepidation has been overcome, and many organizations are beyond "testing the waters." They are in the operational mode of running IRT and classical psychometric test analyses concurrently. In this chapter, I will discuss what I consider to be the most significant of these technologies, as they relate to the major areas of testing, and attempt to forecast their impact on several areas of licensure testing practices throughout the 1990s.

JOB ANALYSIS AND TEST SPECIFICATIONS

Job analysis is the initial step in any well-designed licensure testing program. The purpose of job analysis is to identify the content to be included on the examination, commonly referred to as test specifications, thereby establishing content validity. A typical procedure includes the development of a sufficient number of task and/or knowledge/skill/ability (KSA) statements to describe the important job activities completely. These statements are then evaluated by a group of job experts, who identify the most important activities through a rating process. The rating results are used to develop test specifications: the content areas to be covered on the examination and their relative emphasis. A common procedure involves a committee of job experts making rational decisions about the structure and relative weighting of the content. For example, the structure may be defined as three major content areas, and the relative weighting may be 20% for Content Area I, 35% for Area II, and 45% for Area III.
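To make the weighting example concrete, the following minimal sketch shows one way relative weights such as 20%, 35%, and 45% could be turned into item counts for a test form. The function name `allocate_items`, the test lengths, and the largest-remainder rounding rule are all assumptions introduced for illustration; the chapter does not prescribe any particular allocation method.

```python
def allocate_items(weights, test_length):
    """Convert percentage weights per content area into whole item counts.

    weights: dict mapping content area -> integer percentage (must sum to 100).
    test_length: total number of items on the form (hypothetical value).
    Rounding uses a largest-remainder rule, which is an assumption made
    here for illustration, not a procedure taken from the chapter.
    """
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")

    # Exact (possibly fractional) share of items for each area.
    exact = {area: pct * test_length / 100 for area, pct in weights.items()}

    # Start from the whole-number part of each share.
    counts = {area: int(share) for area, share in exact.items()}

    # Hand out any remaining items to the areas with the largest fractions.
    leftover = test_length - sum(counts.values())
    by_fraction = sorted(exact, key=lambda a: exact[a] - counts[a], reverse=True)
    for area in by_fraction[:leftover]:
        counts[area] += 1
    return counts


# Weights from the example in the text; the test length is hypothetical.
weights = {"I": 20, "II": 35, "III": 45}
print(allocate_items(weights, 100))
```

When the weights do not divide the test length evenly, the committee would still need to decide which content area receives the extra item; the rounding rule above is only one defensible convention.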

Several methods exist for making these determinations statistically. However, not all have a sound empirical basis. Specifications are sometimes determined by first combining several rating scales for each activity statement into a "criticality value." For example, in a job analysis study of law enforcement special agents, Sistrunk and Smith (1982) calculated a "Task Importance Value" by multiplying the difficulty and criticality ratings and then adding the time-spent rating to this product. Test section weights are sometimes calculated by summing the individual criticality values for all task/KSA statements assigned to that section. Although this procedure may have intuitive appeal, it has no more statistical basis than the rational approach described earlier. Both rational and empirical procedures may yield the same results, and it has been my experience that a carefully conducted rational judgment procedure produces very usable test specifications.
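The arithmetic described above can be sketched as follows. The Task Importance Value formula (difficulty times criticality, plus time spent) follows the description of Sistrunk and Smith (1982) given in the text; the rating values, the section labels, and the final step of normalizing section sums into proportions are assumptions added for illustration.

```python
from collections import defaultdict


def task_importance_value(difficulty, criticality, time_spent):
    """Task Importance Value as described in the text:
    (difficulty x criticality) + time spent."""
    return difficulty * criticality + time_spent


def section_weights(tasks):
    """Sum the importance values of the tasks in each section, then
    express each section sum as a proportion of the grand total.

    The normalization to proportions is an assumption; the text only
    says that values are summed within each section.
    """
    totals = defaultdict(float)
    for task in tasks:
        totals[task["section"]] += task_importance_value(
            task["difficulty"], task["criticality"], task["time_spent"]
        )
    grand_total = sum(totals.values())
    return {section: value / grand_total for section, value in totals.items()}


# Hypothetical mean ratings for four task statements (not real data).
tasks = [
    {"section": "I", "difficulty": 2, "criticality": 3, "time_spent": 4},
    {"section": "II", "difficulty": 3, "criticality": 4, "time_spent": 3},
    {"section": "II", "difficulty": 2, "criticality": 2, "time_spent": 1},
    {"section": "III", "difficulty": 4, "criticality": 4, "time_spent": 4},
]

for section, weight in section_weights(tasks).items():
    print(f"Content Area {section}: {weight:.0%}")
```

The sketch also illustrates the caution raised in the text: the resulting weights depend on how the rating scales are combined and on how many statements happen to be written for each section, so the numbers carry no more statistical authority than a committee's rational judgment.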