Adaptive Behavior

 

Adaptive Behavior, Vol. 14, No. 4, 339-356 (2006)
DOI: 10.1177/1059712306072334

Reactivity and Safe Learning in Multi-Agent Systems

Bikramjit Banerjee

Department of Electrical Engineering and Computer Science, Tulane University, New Orleans, LA 70118, USA, banerjee{at}eecs.tulane.edu

Jing Peng

Department of Electrical Engineering and Computer Science, Tulane University, New Orleans, LA 70118, USA

Multi-agent reinforcement learning (MRL) is a growing area of research. What makes it particularly challenging is that multiple learners render each other's environments non-stationary. In addition to adapting their behaviors to other learning agents, online learners must also provide assurances about their online performance in order to promote user trust in adaptive agent systems deployed in real-world applications. In this article, instead of developing new algorithms with such assurances, we study the question of safety in the online performance of some existing MRL algorithms. We identify the key notion of reactivity of a learner by analyzing how an algorithm (PHC-Exploiter), designed to exploit some simpler opponents, can itself be exploited by them. We quantify and analyze this concept of reactivity in the context of these algorithms to explain their experimental behaviors. We argue that no learner can be designed that can deliberately avoid exploitation. We also show that any attempt to optimize reactivity must take into account a tradeoff with sensitivity to noise, and devise an adaptive method (based on environmental feedback) designed to maximize the learner's safety and minimize its sensitivity to noise.

Key Words: multi-agent systems • reinforcement learning • game theory
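
The tradeoff between reactivity and sensitivity to noise mentioned in the abstract can be illustrated with a toy sketch. This is not the article's definition of reactivity or its adaptive method; it assumes, purely for illustration, that a learner tracks an opponent's action frequency with an exponential moving average, and it uses the hypothetical step size `alpha` as a stand-in for reactivity. The function names are likewise hypothetical.

```python
def steps_to_adapt(alpha, tol=0.1):
    """Count updates needed to notice that the opponent switched actions.

    Assumption: the opponent always played action 1 (estimate is exactly 1.0)
    and now always plays action 0. Returns the number of updates until the
    estimated frequency of action 1 falls to tol or below.
    """
    estimate = 1.0
    steps = 0
    while estimate > tol:
        estimate += alpha * (0.0 - estimate)  # moving-average update toward 0
        steps += 1
    return steps


def swing_under_alternation(alpha, n_steps=200, window=20):
    """Measure how much the estimate fluctuates on an alternating stream.

    Assumption: the stream 1, 0, 1, 0, ... is a deterministic stand-in for a
    noisy opponent whose true frequency of action 1 is 0.5. Returns the gap
    between the largest and smallest estimate over the last `window` updates.
    """
    estimate = 0.5
    history = []
    for t in range(n_steps):
        observation = 1.0 if t % 2 == 0 else 0.0
        estimate += alpha * (observation - estimate)
        history.append(estimate)
    tail = history[-window:]
    return max(tail) - min(tail)


if __name__ == "__main__":
    for alpha in (0.5, 0.05):
        print(alpha, steps_to_adapt(alpha), swing_under_alternation(alpha))
```

Under these assumptions, a large `alpha` adapts to the switch in a handful of updates but fluctuates widely on the alternating stream, while a small `alpha` stays close to 0.5 on that stream but takes many more updates to register the switch.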

