Effects of unreliability, missing data, and nonlinearity on the performance of least squares methods of correcting for rater stringency

Melody Ann Hertzog, University of Nebraska - Lincoln

Abstract

When ratings are collected using an incompletely crossed design, differences in rater stringency may unfairly affect the scores. Various statistical adjustments to remove this form of rater bias have been proposed. This study investigated the performance of the ordinary least squares (OLS) and weighted least squares (WLS) adjustment methods proposed by Wilson (1988) using simulated data that varied in terms of generating model, rater unreliability, rater bias, and percentage of data missing from the Rater x Examinee matrix. Data were generated to fit either a linear model, Cason and Cason's (1984) nonlinear Rater Response Theory model, or the latter with violation of its assumption of equal rater sensitivities. The range of rater reliabilities was .20-.39, .40-.63, or .64-.82. Data matrices were 14% complete or 21% complete. Degree of rater bias was based on work by Houston, Raymond, and Svec (1991). Results for the linear model showed that OLS generally outperformed the conventional approach of averaging rater scores on all dependent measures: correlation of true and estimated scores, root mean squared error, and percentage of correct pass/fail decisions made at selected cut points. Using WLS did not result in any additional advantage. Only at the lowest levels of rater reliability and bias was there no advantage in using the adjustment methods. In many situations, the adjustment method produced gains in accuracy as large as those obtained by adding a rater and using the conventional method. With all methods, there was a tendency to make more false negative decisions than expected when the cut score was in the lower tail of the score distribution and more false positive decisions than expected in the upper tail. However, the least squares methods reduced this tendency when reliability was high and the bias level at least medium. Similar results were obtained for both nonlinear models.
The author concludes that the least squares methods merit further investigation and offers guidelines for application of results to field studies.
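
To make the contrast between averaging and a least squares adjustment concrete, the following is a minimal sketch and not the dissertation's procedure or Wilson's (1988) exact formulation. It assumes an additive linear model, score = examinee level + rater effect + error, and finds a least squares fit by alternating updates over the observed cells of an incomplete Rater x Examinee matrix. The function name, the toy data, and the convention that rater effects are centered at zero (negative meaning more stringent) are all illustrative assumptions.

```python
def adjust_for_stringency(ratings, n_iter=500, tol=1e-10):
    """Least squares fit of score = examinee + rater effect on observed cells.

    ratings: list of (examinee, rater, score) tuples; the design may be
    incomplete but is assumed to be connected (raters linked through
    shared examinees). Hypothetical helper written for illustration.
    """
    examinees = sorted({e for e, _, _ in ratings})
    raters = sorted({r for _, r, _ in ratings})
    theta = {e: 0.0 for e in examinees}  # examinee level estimates
    effect = {r: 0.0 for r in raters}    # rater effects (negative = stringent)

    for _ in range(n_iter):
        # Best examinee levels given the current rater effects.
        new_theta = {}
        for e in examinees:
            resid = [s - effect[r] for ee, r, s in ratings if ee == e]
            new_theta[e] = sum(resid) / len(resid)
        # Best rater effects given the updated examinee levels.
        new_effect = {}
        for r in raters:
            resid = [s - new_theta[e] for e, rr, s in ratings if rr == r]
            new_effect[r] = sum(resid) / len(resid)
        change = max(
            max(abs(new_theta[e] - theta[e]) for e in examinees),
            max(abs(new_effect[r] - effect[r]) for r in raters),
        )
        theta, effect = new_theta, new_effect
        if change < tol:
            break

    # The model is identified only up to a constant, so center the rater
    # effects at zero and move the constant into the examinee levels.
    shift = sum(effect.values()) / len(effect)
    effect = {r: v - shift for r, v in effect.items()}
    theta = {e: v + shift for e, v in theta.items()}
    return theta, effect


# Toy, noise-free data (invented for illustration): six examinees, four
# raters, two ratings per examinee, so the matrix is half complete.
true_level = {"e1": 5.0, "e2": 6.0, "e3": 7.0, "e4": 5.0, "e5": 6.0, "e6": 7.0}
true_effect = {"A": 1.0, "B": -1.0, "C": 0.5, "D": -0.5}
design = [("e1", "A"), ("e1", "B"), ("e2", "B"), ("e2", "C"),
          ("e3", "C"), ("e3", "D"), ("e4", "D"), ("e4", "A"),
          ("e5", "A"), ("e5", "C"), ("e6", "B"), ("e6", "D")]
ratings = [(e, r, true_level[e] + true_effect[r]) for e, r in design]

# Conventional approach: average whatever ratings each examinee received.
raw_mean = {}
for e in true_level:
    scores = [s for ee, _, s in ratings if ee == e]
    raw_mean[e] = sum(scores) / len(scores)

theta, effect = adjust_for_stringency(ratings)
for e in sorted(true_level):
    print(e, "raw mean:", round(raw_mean[e], 3), "adjusted:", round(theta[e], 3))
```

In this toy case examinee e5 happens to draw two lenient raters, so the raw mean overstates the true level, while the fitted examinee level removes the estimated rater effects. With noisy ratings the recovery would only be approximate, which is the situation the simulation study examines.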

Subject Area

Educational evaluation

Recommended Citation

Hertzog, Melody Ann, "Effects of unreliability, missing data, and nonlinearity on the performance of least squares methods of correcting for rater stringency" (1996). ETD collection for University of Nebraska-Lincoln. AAI9628234.
https://digitalcommons.unl.edu/dissertations/AAI9628234
