Adaptive Behavior, Vol. 14, No. 4, 339-356 (2006)
DOI: 10.1177/1059712306072334
Reactivity and Safe Learning in Multi-Agent Systems
Bikramjit Banerjee
Department of Electrical Engineering and Computer Science, Tulane University, New Orleans, LA 70118, USA, banerjee{at}eecs.tulane.edu
Jing Peng
Department of Electrical Engineering and Computer Science, Tulane University, New Orleans, LA 70118, USA
Multi-agent reinforcement learning (MRL) is a growing area of research. What makes it particularly challenging is that multiple learners render each other's environments non-stationary. In addition to adapting their behaviors to other learning agents, online learners must also provide assurances about their online performance in order to promote user trust in adaptive agent systems deployed in real-world applications. In this article, instead of developing new algorithms with such assurances, we study the question of safety in the online performance of some existing MRL algorithms. We identify the key notion of reactivity of a learner by analyzing how an algorithm (PHC-Exploiter), designed to exploit some simpler opponents, can itself be exploited by them. We quantify and analyze this concept of reactivity in the context of these algorithms to explain their experimental behaviors. We argue that no learner can be designed that can deliberately avoid exploitation. We also show that any attempt to optimize reactivity must take into account a tradeoff with sensitivity to noise, and devise an adaptive method (based on environmental feedback) designed to maximize the learner's safety and minimize its sensitivity to noise.
Key Words: multi-agent systems, reinforcement learning, game theory
