Adaptive Behavior

Reactivity and Safe Learning in Multi-Agent Systems

Bikramjit Banerjee

Department of Electrical Engineering and Computer Science, Tulane University, New Orleans, LA 70118, USA, banerjee{at}eecs.tulane.edu

Jing Peng

Department of Electrical Engineering and Computer Science, Tulane University, New Orleans, LA 70118, USA

Multi-agent reinforcement learning (MRL) is a growing area of research. What makes it particularly challenging is that multiple learners render each other's environments non-stationary. In addition to adapting their behaviors to other learning agents, online learners must also provide assurances about their online performance in order to promote user trust of adaptive agent systems deployed in real world applications. In this article, instead of developing new algorithms with such assurances, we study the question of safety in online performance of some existing MRL algorithms. We identify the key notion of reactivity of a learner by analyzing how an algorithm (PHC-Exploiter), designed to exploit some simpler opponents, can itself be exploited by them. We quantify and analyze this concept of reactivity in the context of these algorithms to explain their experimental behaviors. We argue that no learner can be designed that can deliberately avoid exploitation. We also show that any attempt to optimize reactivity must take into account a tradeoff with sensitivity to noise, and devise an adaptive method (based on environmental feedback) designed to maximize the learner's safety and minimize its sensitivity to noise.

Key Words: multi-agent systems • reinforcement learning • game theory

Adaptive Behavior, Vol. 14, No. 4, 339-356 (2006)
DOI: 10.1177/1059712306072334

