Discipline-Based Education Research Group


Presented at DBER Group Discussion March 2015.


Copyright (c) 201 Tony Albano


1. Test design
   1. Construct – the unobservable trait or attribute we want to measure
   2. Operationalizing – translating the construct into something observable
   3. Measurement – using scores to represent amounts of the construct via operations
   4. Scale – a set of operations (items) used to create composite scores
   5. Inference – assuming our scores describe some change in the construct
   6. Reliability – extent to which inferences are consistent
   7. Validity – extent to which inferences are accurate
   8. Purpose – the who, what, and why

2. Item writing
   Standards or learning objectives – define what students should know or be able to do

3. Scale development
   Difficulty – indexes average performance on an item
   Discrimination – indexes the relationship between a single item and the total score
   Internal consistency – indexes the shared relationship among all items
   Bias – tells us that performance differs for students with certain demographic or background characteristics
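
The first three scale-development statistics can be illustrated with a short sketch. It is not taken from the presentation. It assumes dichotomously scored items (1 = correct, 0 = incorrect), takes difficulty as the proportion correct, takes discrimination as the uncorrected item-total Pearson correlation, and uses Cronbach's alpha as one common index of internal consistency. The response matrix and all function names are hypothetical. Bias is left out, because detecting it requires group membership data and a differential item functioning analysis.

```python
from statistics import mean, variance

# Hypothetical scored responses: rows are students, columns are items.
# 1 = correct, 0 = incorrect. These are made-up values for illustration only.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cross / (ss_x * ss_y) ** 0.5


def item_difficulty(data):
    """Difficulty: average performance on each item (proportion correct)."""
    return [mean(item) for item in zip(*data)]


def item_discrimination(data):
    """Discrimination: correlation between each item and the total score.

    Assumption: the uncorrected item-total correlation is used, so each
    item also contributes to the total it is correlated with.
    """
    totals = [sum(student) for student in data]
    return [pearson(item, totals) for item in zip(*data)]


def cronbach_alpha(data):
    """Internal consistency: Cronbach's alpha from item and total variances."""
    k = len(data[0])
    item_variances = [variance(item) for item in zip(*data)]
    total_variance = variance([sum(student) for student in data])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


difficulty = item_difficulty(responses)
discrimination = item_discrimination(responses)
alpha = cronbach_alpha(responses)

print("Difficulty:", difficulty)
print("Discrimination:", [round(r, 3) for r in discrimination])
print("Cronbach's alpha:", round(alpha, 3))
```

An item whose scores do not vary (every student correct, or every student incorrect) has zero variance, so its item-total correlation is undefined. This sketch does not guard against that case.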