Application of item response theory to criterion-referenced measurement: An investigation of the effects of model choice, sample size, and test length on reliability and estimation accuracy
Abstract
This study focused on the application of item response theory (IRT) to criterion-referenced testing. The first purpose was to investigate the effects of model choice and of reduced test length through optimal item selection methods on the reliability of a criterion-referenced examination. A second purpose was to investigate the effects of model choice, sample size, and reduced test length on the accuracy of ability and parameter estimation. Combinations of sample sizes (250 and 500 examinees) and test lengths (50 and 100 items) were used to study the effects on estimation accuracy with five different IRT models. LOGIST 5 was used to obtain item and ability estimates for the five estimation models. Actual test data were obtained from a 1986 administration of the Psychiatric and Mental Health Nurse Certification Examination given by the American Nurses Association; a total of 2,039 examinees took this 150-item multiple-choice test. Optimal item selection produced shortened versions of the examination whose reliability estimates were comparable to those obtained from the 150-item examination and, in the case of the 100-item examination, even higher. Model performance for the full sample of examinees across the three test lengths showed that the two- and three-parameter models provided the best model-data fit. Comparison of estimation accuracy among the IRT models in the reduced sample-size and test-length conditions revealed that the one- and modified one-parameter models provided more accurate estimates than the two- or three-parameter models. Little difference was noted between the accuracy of the one-parameter model and that of the modified one-parameter model. In conclusion, the more general models performed best in the larger sample, while the one- and modified one-parameter models performed best in the smaller samples. The negligible differences between the guessing and non-guessing models across the conditions were attributed to the lack of guessing in the data.
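For reference, the model names used in the abstract conventionally denote the following logistic item response functions; these standard forms are not stated in the abstract itself, and the "modified one-parameter" model is specific to the study and is not reproduced here. The three-parameter logistic model gives the probability that an examinee of ability \(\theta\) answers item \(i\) correctly as

\[ P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-D a_i(\theta - b_i)}} \]

where \(b_i\) is item difficulty, \(a_i\) is item discrimination, \(c_i\) is the lower asymptote (pseudo-guessing) parameter, and \(D\) is a scaling constant (commonly 1.7). The two-parameter model fixes \(c_i = 0\), and the one-parameter (Rasch-type) model additionally constrains \(a_i\) to a common value across items, so the guessing versus non-guessing contrast noted in the conclusion turns on whether \(c_i\) is estimated.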
Subject Area
Educational evaluation
Recommended Citation
Pozehl, Bunny Jo, "Application of item response theory to criterion-referenced measurement: An investigation of the effects of model choice, sample size, and test length on reliability and estimation accuracy" (1990). ETD collection for University of Nebraska-Lincoln. AAI9030146.
https://digitalcommons.unl.edu/dissertations/AAI9030146