Honors Program


Schmidt, J. (2019). Subjective evaluation of temperament in beef cattle: Does training improve the reliability of the assessment? Undergraduate Honors thesis. University of Nebraska - Lincoln.


Copyright Jessica Schmidt 2019.


In recent years there has been an increased focus on docility in cattle selection programs due to its impact on animal well-being, human safety, and profitability. Unfortunately, when subjective evaluations are used to assess docility, the reliability of such evaluations both within (intraobserver) and between (interobserver) observers has been equivocal. The objective of this study was to develop and test various training methods to enable participants to score animal behavior more consistently. Fifteen-second audio-video recordings capturing the behavior of cattle while loosely restrained in a chute were available. These videos had previously been assigned an official chute score by experienced observers based on an ethogram categorizing behavior from docile (1) to aggressive (6). Ninety volunteers were recruited. At the start of session 1 (S1), these participants completed a questionnaire seeking demographic and animal experience information. They were then shown 14 videos randomly drawn from a pool, with each video shown twice unbeknownst to them; they therefore watched 28 videos in total. Using the same ethogram, they scored each video. Based on responses to the survey, participants were randomly assigned to one of three treatments within demographic categories: a combination of cattle experience level (experienced, inexperienced), age (college, other), and gender (male, female). The first group served as the control (C; n = 31) and received no training. Treatment 1 (T1; n = 30) watched a 20-minute training video. Treatment 2 (T2; n = 29) watched the same training video and completed a self-test. The following week (6.2 ± 1.8 days later) the participants repeated the exercise at a second session (S2). Differences in inter- and intraobserver reliabilities were tested using a weighted Kappa coefficient (K). Experience level, age, and gender did not influence reliability at S1 (P > 0.28) or when comparing sessions (P > 0.05).
At S2, interobserver reliabilities were higher (P < 0.05) following training (KT1 = 0.68; KT2 = 0.73) as compared to no training (KC = 0.64); however, there was no difference between training methods (P = 0.94). Intraobserver reliability also increased between sessions for all three treatments but did not differ among them (P > 0.31). In all cases, however, the value of K exceeded 0.70, which is considered a threshold value for strong reliability. Relationships among demography, animal experience and treatments were explored
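
For readers unfamiliar with the weighted Kappa coefficient, the following is a minimal sketch of how agreement between two sets of chute scores on the 1 to 6 scale could be computed. It is an illustration only, not the analysis code used in the thesis. The abstract does not state which weighting scheme was applied, so linear disagreement weights are an assumption here, and the function name `weighted_kappa` and the example scores are hypothetical.

```python
def weighted_kappa(scores_a, scores_b, categories=(1, 2, 3, 4, 5, 6)):
    """Cohen's weighted kappa for two equal-length lists of ordinal scores.

    Assumption: linear disagreement weights |i - j| / (k - 1). The source
    abstract does not say which weighting scheme was used.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) == 0:
        raise ValueError("score lists must be non-empty and of equal length")

    n = len(scores_a)
    k = len(categories)
    index = {category: i for i, category in enumerate(categories)}

    # Observed joint proportions: rows are the first scoring, columns the second.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(scores_a, scores_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions, used for the agreement expected by chance.
    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at the extremes
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row_totals[i] * col_totals[j]

    if weighted_expected == 0.0:
        raise ValueError("kappa is undefined when every score is identical")

    return 1.0 - weighted_observed / weighted_expected


# Hypothetical intraobserver example: one participant's scores for the same
# 14 videos on their first and second viewing within a session.
first_viewing = [1, 2, 2, 3, 4, 1, 5, 6, 3, 2, 4, 1, 3, 5]
second_viewing = [1, 2, 3, 3, 4, 1, 5, 5, 3, 2, 4, 2, 3, 5]
print(round(weighted_kappa(first_viewing, second_viewing), 2))
```

The same function could be applied to a participant's scores against the official chute scores, or to pairs of participants, to obtain an interobserver value. With weights of this kind, a disagreement of one category is penalized less than a disagreement across several categories, which suits an ordinal ethogram.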