Adaptive Behavior

 

Adaptive Behavior, Vol. 14, No. 4, 339-356 (2006)
DOI: 10.1177/1059712306072334

Reactivity and Safe Learning in Multi-Agent Systems

Bikramjit Banerjee

Department of Electrical Engineering and Computer Science, Tulane University, New Orleans, LA 70118, USA, banerjee{at}eecs.tulane.edu

Jing Peng

Department of Electrical Engineering and Computer Science, Tulane University, New Orleans, LA 70118, USA

Multi-agent reinforcement learning (MRL) is a growing area of research. What makes it particularly challenging is that multiple learners render each other's environments non-stationary. In addition to adapting their behaviors to other learning agents, online learners must also provide assurances about their online performance in order to promote user trust in adaptive agent systems deployed in real-world applications. In this article, instead of developing new algorithms with such assurances, we study the question of safety in the online performance of some existing MRL algorithms. We identify the key notion of reactivity of a learner by analyzing how an algorithm (PHC-Exploiter), designed to exploit some simpler opponents, can itself be exploited by them. We quantify and analyze this concept of reactivity in the context of these algorithms to explain their experimental behaviors. We argue that no learner can be designed that can deliberately avoid exploitation. We also show that any attempt to optimize reactivity must take into account a tradeoff with sensitivity to noise, and devise an adaptive method (based on environmental feedback) designed to maximize the learner's safety and minimize its sensitivity to noise.

Key Words: multi-agent systems • reinforcement learning • game theory

