Adaptive Behavior

Reinforcement Learning for RoboCup Soccer Keepaway

Peter Stone

Department of Computer Sciences, The University of Texas at Austin, pstone{at}cs.utexas.edu

Richard S. Sutton

Department of Computing Science, University of Alberta, sutton{at}cs.ualberta.ca

Gregory Kuhlmann

Department of Computer Sciences, The University of Texas at Austin, kuhlmann{at}cs.utexas.edu

RoboCup simulated soccer presents many challenges to reinforcement learning methods, including a large state space, hidden and uncertain state, multiple independent agents learning simultaneously, and long and variable delays in the effects of actions. We describe our application of episodic SMDP Sarsa(λ) with linear tile-coding function approximation and variable λ to learning higher-level decisions in a keepaway subtask of RoboCup soccer. In keepaway, one team, "the keepers," tries to keep control of the ball for as long as possible despite the efforts of "the takers." The keepers learn individually when to hold the ball and when to pass to a teammate. Our agents learned policies that significantly outperform a range of benchmark policies. We demonstrate the generality of our approach by applying it to a number of task variations, including different field sizes and different numbers of players on each team.
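The abstract names the core learner: episodic SMDP Sarsa(λ) with linear tile-coding function approximation. The Python sketch below illustrates that combination on a toy input; it is not the authors' implementation. The class names `TileCoder` and `SmdpSarsaLambda`, the uniform-grid tilings, replacing traces, ε-greedy selection, and every parameter value are illustrative assumptions, and λ is held fixed here rather than varied as the abstract describes. In keepaway, the discrete actions would correspond to holding the ball or passing to a particular teammate, and `tau` would be the number of simulator steps a chosen action lasted before the next decision.

```python
import random


class TileCoder:
    """Uniform-grid tile coder (hypothetical helper, not the paper's code).

    Covers the box [low, high] with `n_tilings` grids, each shifted by a
    fraction of a tile width, so every input activates exactly one tile
    per tiling (a binary feature vector with n_tilings ones).
    """

    def __init__(self, low, high, tiles_per_dim, n_tilings):
        self.low, self.high = list(low), list(high)
        self.tiles_per_dim, self.n_tilings = tiles_per_dim, n_tilings
        # One extra tile per dimension so the shifted grids still cover the box.
        self.per_tiling = (tiles_per_dim + 1) ** len(self.low)
        self.n_features = n_tilings * self.per_tiling

    def active(self, x):
        """Return the index of the one active tile in each tiling."""
        indices = []
        for t in range(self.n_tilings):
            flat = 0
            for d, value in enumerate(x):
                width = (self.high[d] - self.low[d]) / self.tiles_per_dim
                shift = width * t / self.n_tilings  # stagger the tilings evenly
                coord = int((value - self.low[d] + shift) / width)
                coord = min(max(coord, 0), self.tiles_per_dim)  # clamp out-of-range inputs
                flat = flat * (self.tiles_per_dim + 1) + coord
            indices.append(t * self.per_tiling + flat)
        return indices


class SmdpSarsaLambda:
    """Linear Sarsa(lambda) over tile-coded features, one weight vector per action.

    SMDP variant: an action may last `tau` primitive steps, so the bootstrap
    term is discounted by gamma**tau and traces decay by (gamma**tau) * lam.
    """

    def __init__(self, coder, n_actions, alpha=0.1, gamma=1.0, lam=0.9,
                 epsilon=0.01, seed=0):
        self.coder, self.n_actions = coder, n_actions
        self.alpha = alpha / coder.n_tilings  # split the step size across tilings
        self.gamma, self.lam, self.epsilon = gamma, lam, epsilon
        self.w = [[0.0] * coder.n_features for _ in range(n_actions)]
        self.z = [{} for _ in range(n_actions)]  # sparse eligibility traces
        self.rng = random.Random(seed)

    def q(self, feats, a):
        # Linear value: sum of the weights of the active tiles for action a.
        return sum(self.w[a][i] for i in feats)

    def _choose(self, feats):
        # Epsilon-greedy; ties go to the lowest action index.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = [self.q(feats, a) for a in range(self.n_actions)]
        return values.index(max(values))

    def _mark_and_update(self, delta):
        # Replacing traces: the taken action's active tiles get trace 1,
        # and the same tiles under other actions are cleared.
        for a in range(self.n_actions):
            for i in self.feats:
                if a == self.action:
                    self.z[a][i] = 1.0
                else:
                    self.z[a].pop(i, None)
        for a in range(self.n_actions):
            for i, e in self.z[a].items():
                self.w[a][i] += self.alpha * delta * e

    def start(self, x):
        """Begin an episode in state x and return the first action."""
        for traces in self.z:
            traces.clear()
        self.feats = self.coder.active(x)
        self.action = self._choose(self.feats)
        return self.action

    def step(self, reward, x, tau):
        """`reward`: discounted reward accrued while the last action ran for `tau` steps."""
        feats = self.coder.active(x)
        action = self._choose(feats)
        discount = self.gamma ** tau
        delta = reward + discount * self.q(feats, action) - self.q(self.feats, self.action)
        self._mark_and_update(delta)
        for traces in self.z:
            for i in traces:
                traces[i] *= discount * self.lam
        self.feats, self.action = feats, action
        return action

    def end(self, reward):
        """Terminal update: no bootstrap term after the episode ends."""
        delta = reward - self.q(self.feats, self.action)
        self._mark_and_update(delta)
```

A minimal usage pattern is to call `start` at the beginning of an episode, `step` at each later decision point, and `end` when the episode terminates. Keeping one weight vector per action and touching only the few active tiles makes each update cheap, which is the usual motivation for pairing Sarsa(λ) with a sparse linear tile coder.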

Key Words: multiagent systems • machine learning • multiagent learning • reinforcement learning • robot soccer


Adaptive Behavior, Vol. 13, No. 3, 165-188 (2005)
DOI: 10.1177/105971230501300301


This article has been cited by other articles:


J. L. Krichmar
The Neuromodulatory System: A Framework for Survival and Adaptive Behavior in a Challenging World
Adaptive Behavior, December 1, 2008; 16(6): 385 - 399.


S. Whiteson, M. E. Taylor, and P. Stone
Empirical Studies in Action Selection with Reinforcement Learning
Adaptive Behavior, March 1, 2007; 15(1): 33 - 50.