Published as Chapter 3 in Improving Surveys with Paradata: Analytic Uses of Process Information, First Edition. Edited by Frauke Kreuter. John Wiley & Sons, 2013, pp. 43–72.
Survey researchers and methodologists seek new and innovative ways of evaluating the quality of data collected from sample surveys. Paradata, or data collected for free from computerized survey instruments, have increasingly been used in survey methodological work for this purpose (Couper, 1998). One error source that has been studied using paradata is measurement error, or the deviation of a response from a “true” value (Groves, 1989; Biemer and Lyberg, 2003). Although paradata have been used in the psychological literature since the 1980s (see Fazio, 1990, for an early review) and were adapted to telephone interviews by Bassili in the early 1990s (Bassili and Fletcher, 1991; Bassili and Scott, 1996), their adoption and use for studying measurement-error-related outcomes have grown exponentially with the growth of web surveys and increased use of computerization in interviewer-administered surveys (Couper, 1998; Heerwegh, 2003; Couper and Lyberg, 2005). Paradata serve as a proxy for breakdowns in the cognitive response process or identify problems that respondents and interviewers have with a survey instrument (Couper, 2000; Yan and Tourangeau, 2008).
Paradata can be collected at a variety of levels, resulting in a complex, hierarchical data structure. Examples of paradata collected automatically by many computerized survey software systems include timing data, keystroke data, mouse click data, and information about the type of interface such as the web browser and screen resolution. Examples of paradata that inform the measurement process, but are not collected automatically, include behavior codes, analysis of vocal characteristics, and interviewer evaluations or observations of the survey-taking process. The paradata available for capture vary by mode of data collection and the software used for data collection. One challenge is that not all off-the-shelf software programs capture paradata, and thus user-generated programs have been developed to assist in recording paradata. Further complicating matters is how the data are recorded, ranging from text or sound files to ready-to-analyze variables. In this chapter, we review different types of paradata, evaluate how paradata differ by mode, and examine how to turn paradata into an analytic dataset.
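To make the hierarchical structure concrete, the following minimal sketch shows one way that event-level paradata might be collapsed into a respondent-by-item analytic dataset. The record layout, the event names (`item_shown`, `click`, `submit`), and the derived variables are hypothetical and chosen only for illustration; actual paradata formats depend on the mode and the software used.

```python
from collections import defaultdict

# Hypothetical event-level paradata: one record per action.
# Assumed layout: (respondent id, item id, event type, timestamp in ms).
raw_events = [
    ("r1", "q1", "item_shown", 0),
    ("r1", "q1", "click", 2100),
    ("r1", "q1", "click", 3400),
    ("r1", "q1", "submit", 4000),
    ("r1", "q2", "item_shown", 4000),
    ("r1", "q2", "click", 5500),
    ("r1", "q2", "submit", 6000),
    ("r2", "q1", "item_shown", 0),
    ("r2", "q1", "click", 900),
    ("r2", "q1", "submit", 1500),
]


def build_item_level_dataset(events):
    """Collapse events nested in items nested in respondents to one row per respondent-item."""
    grouped = defaultdict(list)
    for respondent, item, event, timestamp in events:
        grouped[(respondent, item)].append((timestamp, event))

    rows = []
    for (respondent, item), item_events in sorted(grouped.items()):
        item_events.sort()  # order events by timestamp
        shown = next((t for t, e in item_events if e == "item_shown"), None)
        submitted = next((t for t, e in reversed(item_events) if e == "submit"), None)
        clicks = sum(1 for _, e in item_events if e == "click")
        # Latency is left missing when either boundary event was not recorded.
        latency = submitted - shown if shown is not None and submitted is not None else None
        rows.append({
            "respondent": respondent,
            "item": item,
            "latency_ms": latency,
            "clicks": clicks,
            # Illustrative assumption: every click after the first is an answer change.
            "answer_changes": max(clicks - 1, 0),
        })
    return rows


dataset = build_item_level_dataset(raw_events)
for row in dataset:
    print(row)
```

Each output row can then be merged with the survey responses by respondent and item, which keeps the nesting of items within respondents explicit for multilevel analysis.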