Date of this Version
Cognit Comput. 2018 October; 10(5): 703–717.
The early eye tracking studies of Yarbus provided descriptive evidence that an observer’s task influences patterns of eye movements, leading to the tantalizing prospect that an observer’s intentions could be inferred from their saccade behavior. We investigate the predictive value of task and eye movement properties by creating a computational cognitive model of saccade selection based on instructed task and internal cognitive state using a Dynamic Bayesian Network (DBN). Understanding how humans generate saccades under different conditions and cognitive sets links recent work on salience models of low-level vision with higher-level cognitive goals. This model provides a Bayesian, cognitive approach to top-down transitions in attentional set in prefrontal areas along with vector-based saccade generation from the superior colliculus. Our approach is to begin with eye movement data that have previously been shown to differ across tasks. We first present an analysis of the extent to which individual saccadic features are diagnostic of an observer’s task. Second, we use those features to infer an underlying cognitive state that potentially differs from the instructed task. Finally, we demonstrate how changes of cognitive state over time can be incorporated into a generative model of eye movement vectors without resorting to an external decision homunculus. Internal cognitive state frees the model from the assumption that instructed task is the only factor influencing observers’ saccadic behavior. While the inclusion of hidden temporal state does not improve the classification accuracy of the model, it does allow accurate prediction of saccadic sequence results observed in search paradigms. Because the model is generative, it is capable of saccadic simulation in real time. We demonstrate that the properties of its generated saccadic vectors closely match those of human observers given a particular task and cognitive state.
Many current models of vision focus entirely on bottom-up salience to produce estimates of spatial “areas of interest” within a visual scene. While a few recent models do add top-down knowledge and task information, we believe our contribution is important in three key ways. First, we incorporate task as learned attentional sets that are capable of self-transition given only information available to the visual system. This matches the influential theory of bias signals proposed by Miller and Cohen (Annu Rev Neurosci 24:167–202, 2001) and implements selection of state without simply shifting the decision to an external homunculus. Second, our model is generative and capable of predicting sequence artifacts in saccade generation such as those found in visual search. Third, our model generates relative saccadic vector information as opposed to absolute spatial coordinates. This more closely matches the internal saccadic representations as they are generated in the superior colliculus.
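To make the generative idea concrete, the following minimal Python sketch shows a hidden state that transitions on its own and emits relative saccade vectors rather than absolute coordinates. It is an illustration only and not the model described here: it uses a two-state hidden Markov chain (the simplest kind of DBN), and the state names, transition probabilities, amplitude parameters, and uniform direction distribution are all hypothetical values chosen for readability.

```python
import math
import random

# Hypothetical attentional sets (hidden cognitive states); not taken from the paper.
STATES = ["search", "inspect"]

# Assumed transition probabilities P(next_state | state); each row sums to 1.
TRANSITIONS = {
    "search": {"search": 0.8, "inspect": 0.2},
    "inspect": {"search": 0.3, "inspect": 0.7},
}

# Assumed emission parameters per state: (mean, s.d.) of saccade amplitude in degrees.
AMPLITUDE = {"search": (8.0, 3.0), "inspect": (2.0, 1.0)}


def next_state(state, rng):
    """Sample the next hidden state from the current state's transition row."""
    options = TRANSITIONS[state]
    return rng.choices(list(options), weights=list(options.values()), k=1)[0]


def emit_saccade(state, rng):
    """Sample a relative saccade vector (dx, dy) in degrees for the given state."""
    mean, sd = AMPLITUDE[state]
    amplitude = abs(rng.gauss(mean, sd))          # keep amplitude non-negative
    direction = rng.uniform(0.0, 2.0 * math.pi)   # uniform direction is an assumption
    return amplitude * math.cos(direction), amplitude * math.sin(direction)


def simulate(n_saccades, start_state="search", seed=0):
    """Generate a list of (state, (dx, dy)) pairs with no external controller."""
    rng = random.Random(seed)
    state = start_state
    sequence = []
    for _ in range(n_saccades):
        sequence.append((state, emit_saccade(state, rng)))
        state = next_state(state, rng)            # the state self-transitions
    return sequence


if __name__ == "__main__":
    for state, (dx, dy) in simulate(5):
        print(f"{state:8s} dx={dx:6.2f} dy={dy:6.2f}")
```

In this toy version the state changes depend only on the previous state. In a fuller model the transition would also be conditioned on the instructed task and on observed saccade features, which is what allows the cognitive state to differ from the instructed task.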