Computer Science and Engineering, Department of

 

Date of this Version

11-2010

Document Type

Article

Comments

A Thesis presented to the faculty of the Graduate College at the University of Nebraska in partial fulfillment of requirements for the degree of Master of Science (Major: Computer Science) under the supervision of Professor Leen-Kiat Soh. Lincoln, Nebraska, November 2010. Copyright 2010 Adam Eck.

Abstract

In many real-world applications of multi-agent systems, agent reasoning suffers from bounded rationality caused by both limited resources and limited knowledge. When sensing also requires resources, the agent's knowledge revision suffers because it cannot always sense when needed or as accurately as needed, which in turn leads to poor decision making. In this research, we consider what happens when sensing activities require the use of stateful resources, which we define as resources whose state-dependent behavior changes over time based on usage. Specifically, sensing itself can change the state of a resource, and thus its behavior, which affects both the information gathered and the resulting knowledge refinement. This produces a phenomenon in which the sensing activity can and will distort its own outcome (and potentially future outcomes), termed the Observer Effect after the similar phenomenon in the physical sciences. Under this effect, an agent faces a strategic tradeoff between 1) satisfying the need for knowledge refinement, and 2) satisfying the need to avoid corrupting knowledge with distorted sensing outcomes. To address this tradeoff, we model sensing activity selection as a Markov decision process (MDP) in which an agent optimizes knowledge refinement while considering the state of the resources used during sensing. In this model, the agent uses reinforcement learning to learn both a controller for activity selection and how to predict expected knowledge refinement based on resource usage during sensing. Our approach differs from other bounded rationality and sensing research because we consider how to make sensing decisions with stateful resources that produce side effects such as the Observer Effect, rather than assuming stateless resources with no such side effects. We evaluate our approach in 1) a fully observable robotic mining simulation and 2) a partially observable user preference elicitation simulation.
The results demonstrate that considering the Observer Effect during sensing activity selection through our approach yields better knowledge refinement and often better task performance than not considering the effect of stateful resource usage.
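To make the sensing tradeoff concrete, the following is a minimal sketch, not the thesis's model or code. It assumes a single hypothetical sensing resource whose "heat" level rises each time it is used. Sensing a hot resource yields distorted, lower-value (eventually negative) knowledge refinement, while resting lets the resource recover. The names `MAX_HEAT`, `step`, `q_learning`, and `greedy_policy`, the reward shape, and all parameter values are illustrative assumptions. The controller is learned with plain tabular Q-learning, which stands in for whatever reinforcement learning method the thesis actually uses.

```python
import random

# Hypothetical toy model of the Observer Effect: one sensing resource whose
# "heat" level (0..MAX_HEAT) rises each time it is used to sense.
# The constants and reward shape below are illustrative assumptions only.
MAX_HEAT = 3
ACTIONS = ("sense", "rest")


def step(heat, action):
    """Return (reward, next_heat) for the toy stateful resource."""
    if action == "sense":
        # Knowledge refinement degrades as the resource heats up; at the
        # highest heat the distorted reading is worse than no reading.
        reward = 1.0 - 0.4 * heat
        return reward, min(heat + 1, MAX_HEAT)
    # Resting gains no knowledge but lets the resource cool down.
    return 0.0, max(heat - 1, 0)


def q_learning(episodes=2000, steps=50, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning over (heat, action) pairs with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(h, a): 0.0 for h in range(MAX_HEAT + 1) for a in ACTIONS}
    for _ in range(episodes):
        heat = rng.randrange(MAX_HEAT + 1)  # random start state for coverage
        for _ in range(steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(heat, a)])  # exploit
            reward, nxt = step(heat, action)
            # One-step temporal-difference target using the best next action.
            target = reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(heat, action)] += alpha * (target - q[(heat, action)])
            heat = nxt
    return q


def greedy_policy(q):
    """Map each resource state to the action with the highest learned value."""
    return {h: max(ACTIONS, key=lambda a: q[(h, a)]) for h in range(MAX_HEAT + 1)}


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

In this toy setting, an agent that ignores resource state and always senses drives the resource to its hottest state and collects negative refinement indefinitely. The learned controller instead senses while the resource is cool and rests once usage has distorted it, which mirrors, in a greatly simplified form, the tradeoff described in the abstract.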
