Date of this Version
Presented at DBER Group Discussion March 2015.
1. Test design
   1. Construct – the unobservable trait or attribute we want to measure
   2. Operationalizing – translating the construct into something observable
   3. Measurement – using scores to represent amounts of the construct via operations
   4. Scale – a set of operations (items) used to create composite scores
   5. Inference – assuming our scores describe some change in the construct
   6. Reliability – extent to which inferences are consistent
   7. Validity – extent to which inferences are accurate
   8. Purpose – the who, what, and why
2. Item writing: Standards or learning objectives – define what students should know or be able to do
3. Scale development: Difficulty indexes average performance on an item; Discrimination indexes the relationship between a single item and the total score; Internal consistency indexes the shared relationship among all items; Bias tells us that performance differs for students with certain demographic or background characteristics
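
As a rough illustration of the first three scale-development statistics, here is a minimal sketch, not the presenter's method. It assumes items are scored 0/1. It takes difficulty as the proportion of students answering correctly, discrimination as the uncorrected item-total Pearson correlation, and internal consistency as Cronbach's alpha. The response matrix and the function names `pearson` and `item_statistics` are hypothetical, introduced only for this example.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes nonzero variance)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def item_statistics(responses):
    """Return (difficulty, discrimination, alpha) for a students-by-items 0/1 matrix."""
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]  # composite score per student
    items = [[row[j] for row in responses] for j in range(n_items)]

    # Difficulty: average performance on each item (proportion correct).
    difficulty = [mean(col) for col in items]

    # Discrimination: correlation between each item and the total score.
    # This is the uncorrected version, so the item is included in the total.
    discrimination = [pearson(col, totals) for col in items]

    # Internal consistency: Cronbach's alpha from item and total-score variances.
    item_var_sum = sum(pvariance(col) for col in items)
    alpha = (n_items / (n_items - 1)) * (1 - item_var_sum / pvariance(totals))
    return difficulty, discrimination, alpha


# Hypothetical data: 6 students (rows) by 4 items (columns), 1 = correct.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
]

difficulty, discrimination, alpha = item_statistics(responses)
print("difficulty:", difficulty)
print("discrimination:", discrimination)
print("alpha:", alpha)
```

An item that every student answers the same way has zero variance, so its discrimination is undefined and this sketch would raise a division error. Bias is not covered here, because detecting it requires comparing performance across student groups.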