
Using IRT created models of ability in standard setting

June E Smith, University of Nebraska - Lincoln

Abstract

This study explored the use of item response theory (IRT) in setting standards (cut scores) on examinations. The Angoff method of standard setting requires experts to estimate the probability that a just-barely-proficient candidate will succeed on each item. These estimates have been found to represent poorly the actual performance of known groups of just-barely-proficient examinees (Impara & Plake, 1997). This study explored the potential for establishing item estimates using latent trait parameters. Two equivalent 110-item subsets (Forms A and B) of a 220-item operational certification examination were submitted to two 15-member panels of judges for Angoff standard setting. The resulting probability estimates were converted to 1/0 vectors through a probabilistic methodology, allowing a simulation of the Yes/No Angoff alternative proposed by Impara and Plake (1997). These vectors were entered into Bilog (1990) as “synthetic candidates”; the score “earned” by a judge's synthetic candidate represented that judge's cut score. Cut scores were calculated from item probabilities for individual judges and for estimates averaged across judges for (1) Angoff probabilities, (2) the Yes/No simulation, and (3) IRT synthetic-candidate estimates. Another methodology (Best Fit) was tested that compared each judge's simulated 1/0 vector to vectors created from IRT estimates at 20 evenly spaced theta levels between −2 and 2. Cut scores created with IRT estimates were higher than those from the Yes/No simulation for 9 of 15 Form A judges and for 11 of 15 Form B judges. With per-item estimates averaged across judges, the Yes/No simulation set a cut score of 136 (Forms A plus B), while the IRT estimates set a cut score of 139. The Best Fit methodology produced varied and inconclusive results. These results indicated the need for further study with actual Yes/No data.
Other indices made possible through the estimation of latent traits such as item information and person fit should also be explored for use in standard setting.
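The abstract does not give the dissertation's actual algorithms or item parameters, but the two key steps it describes — converting a judge's Angoff probability estimates into a simulated Yes/No (1/0) response vector, and the Best Fit search over a grid of 20 theta levels — can be illustrated with a minimal Python sketch. Everything below is an assumption for illustration: a two-parameter logistic (2PL) IRT model, made-up item parameters, and made-up judge estimates; the study itself used Bilog and operational examination data.

```python
import math
import random

def angoff_to_vector(probs, rng):
    """Simulate a Yes/No judgment from Angoff estimates: each item is
    marked 1 ('the just-barely-proficient candidate would succeed')
    with probability equal to the judge's per-item estimate."""
    return [1 if rng.random() < p else 0 for p in probs]

def p_correct_2pl(theta, a, b):
    """Assumed 2PL IRT model: probability that a candidate at ability
    theta answers an item with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-1.7 * a * (theta - b)))

def best_fit_theta(vector, items, grid):
    """Best Fit sketch: return the theta level whose model-implied
    response probabilities are closest (least squared distance) to the
    judge's simulated 1/0 vector."""
    def distance(theta):
        return sum((v - p_correct_2pl(theta, a, b)) ** 2
                   for v, (a, b) in zip(vector, items))
    return min(grid, key=distance)

rng = random.Random(0)                                  # reproducible draw
items = [(1.0, d / 10.0) for d in range(-10, 10)]       # hypothetical (a, b) pairs
probs = [0.5 + 0.02 * i for i in range(20)]             # hypothetical Angoff estimates
vec = angoff_to_vector(probs, rng)                      # simulated Yes/No vector
grid = [-2 + 4 * k / 19 for k in range(20)]             # 20 evenly spaced theta levels in [-2, 2]
theta_hat = best_fit_theta(vec, items, grid)            # judge's best-fitting ability level
```

The judge's cut score would then be the raw score implied by `theta_hat` (or, in the synthetic-candidate approach, the score Bilog assigns the 1/0 vector); this sketch only shows the vector construction and the grid search.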

Subject Area

Educational psychology|School administration

Recommended Citation

Smith, June E, "Using IRT created models of ability in standard setting" (1999). ETD collection for University of Nebraska-Lincoln. AAI9929231.
https://digitalcommons.unl.edu/dissertations/AAI9929231
