In many real-world applications of multi-agent systems, agent reasoning suffers from bounded rationality caused by both limited resources and limited knowledge. When agent sensing also requires resource use, the agent's knowledge revision is impaired because it cannot always sense when, and as accurately as, needed, further leading to poor decision making. In this research, we consider what happens when sensing activities require the use of stateful resources, which we define as resources whose behavior changes over time as a function of usage. Specifically, sensing itself can change the state of a resource, and thus its behavior, which affects both the information gathered and the resulting knowledge refinement. This produces a phenomenon where the sensing activity can and will distort its own outcome (and potentially future outcomes), termed the Observer Effect after the similar phenomenon in the physical sciences. Under this effect, an agent faces a strategic tradeoff between 1) refining its knowledge and 2) avoiding corruption of that knowledge by distorted sensing outcomes. To address this tradeoff, we model sensing activity selection as a Markov decision process (MDP) in which an agent optimizes knowledge refinement while considering the state of the resources used during sensing. In this model, the agent uses reinforcement learning to learn both a controller for activity selection and a predictor of expected knowledge refinement based on resource usage during sensing. Our approach differs from other bounded rationality and sensing research in that we consider how to make decisions about sensing with stateful resources that produce side effects such as the Observer Effect, as opposed to simply using stateless resources with no such side effects. We evaluate our approach in 1) a fully observable robotic mining simulation and 2) a partially observable user preference elicitation simulation.
The results demonstrate that accounting for the Observer Effect during sensing activity selection through our approach yields better knowledge refinement, and often better task performance, than ignoring the effects of stateful resource usage.
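To make the tradeoff concrete, the following is a minimal, purely illustrative sketch of the kind of MDP the abstract describes: a tabular Q-learning agent chooses between sensing (which refines knowledge but wears the resource, so that further sensing returns distorted, lower-value observations) and resting (which lets the resource recover). The state space, dynamics, reward, and all names here are assumptions for illustration, not the authors' actual formulation or experimental domains.

```python
import random

# Illustrative toy model (not the paper's actual domains): the agent's
# state is the wear level of a single stateful sensing resource.
RESOURCE_LEVELS = 4          # 0 = fully worn, 3 = pristine
ACTIONS = ("sense", "rest")

def step(resource, action):
    """Return (next_resource, reward). Sensing with a worn resource
    yields distorted observations, modeled as a smaller
    knowledge-refinement reward; resting restores the resource."""
    if action == "sense":
        reward = resource / (RESOURCE_LEVELS - 1)   # accuracy scales with resource state
        next_resource = max(0, resource - 1)        # usage degrades the resource (Observer Effect)
    else:
        reward = 0.0
        next_resource = min(RESOURCE_LEVELS - 1, resource + 1)
    return next_resource, reward

def q_learn(episodes=2000, horizon=20, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning of a sensing-activity controller that
    accounts for the resource's state."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(RESOURCE_LEVELS) for a in ACTIONS}
    for _ in range(episodes):
        s = RESOURCE_LEVELS - 1
        for _ in range(horizon):
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            s2, r = step(s, a)
            best_next = max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2
    return q

q = q_learn()
```

Under these toy dynamics the learned policy captures the tradeoff the abstract names: with a pristine resource the agent prefers to sense, while with a fully worn resource sensing would only return distorted, valueless data, so the agent prefers to rest and restore the resource before sensing again.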