Department of Educational Psychology


Published in Educational and Psychological Measurement 76:1 (2016), pp. 141–163; doi: 10.1177/0013164415585166


Copyright © 2016 HyeSun Lee and Kurt F. Geisinger. Published by SAGE Publications. Used by permission.


This study investigated the impact of matching criterion purification on the accuracy of differential item functioning (DIF) detection in large-scale assessments. Three matching approaches for DIF analyses (block-level matching, pooled booklet matching, and equated pooled booklet matching) were employed with the Mantel–Haenszel procedure, each with and without purification. Five factors were manipulated: test length, the proportion of items exhibiting DIF, sample size, the ratio of the reference group to the focal group, and the presence of an average ability difference between the two groups. A systematic difference between test forms was also considered. Overall, matching criterion purification improved the power of DIF detection in all three approaches. The size of the power improvement differed across the three approaches, depending on the psychometric characteristics of the items exhibiting DIF and on whether an average ability difference existed. Purification also slightly reduced Type I error rates in the three approaches when no mean ability difference existed between the two groups. Considering the power improvement together with the control of Type I error rates, purification of the matching criterion in the pooled booklet matching and equated pooled booklet matching approaches can be recommended for DIF analyses in large-scale assessments.
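
For readers unfamiliar with the procedure named in the abstract, the following is a minimal Python sketch of Mantel–Haenszel DIF detection with a two-stage purification of the matching criterion. It is an illustration under stated assumptions, not the authors' code: the function names (`mh_stats`, `mh_d_dif`, `purified_dif`), the two-stage purification rule, and the 3.84 flagging cutoff (chi-square, 1 df, alpha = .05) are all assumptions made here, and the block-level and pooled booklet matching designs studied in the article are not modeled.

```python
import math
from collections import defaultdict

def mh_stats(responses, groups, item, matching_items):
    """Return (common odds ratio, continuity-corrected MH chi-square) for one item.

    responses: list of 0/1 item-score lists, one per examinee.
    groups: list of "R" (reference) or "F" (focal), one per examinee.
    matching_items: item indices summed to form the matching score.
    """
    # One 2x2 table per matching-score stratum, stored as [A, B, C, D]:
    # A/B = reference correct/incorrect, C/D = focal correct/incorrect.
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for resp, group in zip(responses, groups):
        score = sum(resp[j] for j in matching_items)
        cell = (0 if resp[item] == 1 else 1) + (0 if group == "R" else 2)
        strata[score][cell] += 1

    num = den = sum_a = sum_e = sum_v = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        if n < 2:
            continue  # variance term is undefined for a single examinee
        num += a * d / n
        den += b * c / n
        sum_a += a
        sum_e += (a + b) * (a + c) / n  # expected A under no DIF
        sum_v += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))

    alpha = num / den if den > 0 else float("inf")
    chi2 = (abs(sum_a - sum_e) - 0.5) ** 2 / sum_v if sum_v > 0 else 0.0
    return alpha, chi2

def mh_d_dif(alpha):
    """Convert the common odds ratio to the ETS delta metric (MH D-DIF)."""
    return -2.35 * math.log(alpha)

def purified_dif(responses, groups, n_items, crit=3.84):
    """Two-stage purification (assumed rule): flag items, then rematch without them."""
    all_items = list(range(n_items))
    # Stage 1: match on the total score over all items.
    flagged = {i for i in all_items
               if mh_stats(responses, groups, i, all_items)[1] > crit}
    # Stage 2: drop flagged items from the matching score, but always
    # keep the studied item itself in its own matching criterion.
    results = {}
    for i in all_items:
        matching = [j for j in all_items if j not in flagged or j == i]
        results[i] = mh_stats(responses, groups, i, matching)
    return flagged, results
```

With real data, `purified_dif(responses, groups, n_items)` would return the set of items flagged in the first stage and the purified statistics for every item. Keeping the studied item in its own matching score is a common convention, adopted here as an assumption.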