Sociology, Department of


Date of this Version

2-26-2019

Document Type

Article

Citation

Presented at “Interviewers and Their Effects from a Total Survey Error Perspective Workshop,” University of Nebraska-Lincoln, February 26-28, 2019.

Comments

Copyright 2019 by the authors.

Abstract

Standardized survey interviewing techniques are intended to reduce interviewers’ effects on survey data. A common method for assessing whether interviewers read survey questions exactly as worded is behavior coding. However, manually behavior coding an entire survey is expensive and time-consuming. Machine learning techniques such as Recurrent Neural Networks (RNNs) may offer a way to partially automate this process, saving time and money. RNNs learn to categorize sequential data (e.g., conversational speech) based on patterns learned from previously categorized examples. Yet the feasibility of an automated RNN-based behavior coding approach, and how accurately it codes behaviors compared to human behavior coders, remain unknown.

In this poster, we compare coding of interviewer question-asking behaviors by human undergraduate coders to coding of the same transcripts by RNNs. Humans transcribe and manually behavior code each interview in the Work and Leisure Today II telephone survey (AAPOR RR3=7.8%) at the conversational turn level (n=47,900 question-asking turns) to identify when interviewers asked questions (1) exactly as worded, (2) with minor changes (i.e., changes not affecting question meaning), or (3) with major changes (i.e., changes affecting question meaning). Using a random subset of interview transcripts as training examples, we train RNNs to classify interviewer question-asking behaviors into these same categories. A random 10% subsample of transcripts (n=94) was also coded by master coders to evaluate inter-coder reliability. We compare the reliability of the human coders’ coding and the RNNs’ coding, each measured against the master coders. Preliminary results indicate that the human coders and the RNNs have equal reliability (p>.05) for questions with a large proportion of major and minor changes. We conclude with implications for behavior coding telephone interview surveys using machine learning in general, and RNNs in particular.
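To illustrate the general idea of turn-level classification described above, the sketch below shows a minimal recurrent classifier that maps an integer-encoded conversational turn to one of the three behavior codes. This is a hypothetical example for illustration only: the architecture (a PyTorch LSTM), tokenization, vocabulary size, and hyperparameters are assumptions and are not the authors' actual configuration.

```python
# Illustrative sketch only: a minimal RNN classifier for coding interviewer
# question-asking turns into three categories
# (0 = exact wording, 1 = minor change, 2 = major change).
# Vocabulary size, tokenization, and hyperparameters are assumed, not taken
# from the poster.
import torch
import torch.nn as nn

class TurnClassifier(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=64, hidden_dim=128, num_classes=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer-encoded transcript of one turn
        embedded = self.embedding(token_ids)
        _, (hidden, _) = self.rnn(embedded)   # final hidden state summarizes the turn
        return self.classifier(hidden[-1])    # logits over the three behavior codes

# Toy usage: one batch of two already integer-encoded turns, padded to equal length.
model = TurnClassifier()
batch = torch.tensor([[12, 45, 7, 0, 0], [3, 98, 22, 5, 17]])
logits = model(batch)
predicted_codes = logits.argmax(dim=1)  # predicted behavior code per turn
```

In a workflow like the one described in the abstract, such a model would be trained on the manually coded subset of transcripts and then applied to the remaining turns, with its output compared against master-coder judgments to assess reliability.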
