Reinforcement Learning for RoboCup Soccer Keepaway
Peter Stone
Department of Computer Sciences, The University of Texas at Austin, pstone{at}cs.utexas.edu
Richard S. Sutton
Department of Computing Science, University of Alberta, sutton{at}cs.ualberta.ca
Gregory Kuhlmann
Department of Computer Sciences, The University of Texas at Austin, kuhlmann{at}cs.utexas.edu
RoboCup simulated soccer presents many challenges to reinforcement learning methods, including a large state space, hidden and uncertain state, multiple independent agents learning simultaneously, and long and variable delays in the effects of actions. We describe our application of episodic SMDP Sarsa(λ) with linear tile-coding function approximation and variable λ to learning higher-level decisions in a keepaway subtask of RoboCup soccer. In keepaway, one team, "the keepers," tries to keep control of the ball for as long as possible despite the efforts of "the takers." The keepers learn individually when to hold the ball and when to pass to a teammate. Our agents learned policies that significantly outperform a range of benchmark policies. We demonstrate the generality of our approach by applying it to a number of task variations including different field sizes and different numbers of players on each team.
Key Words: multiagent systems, machine learning, multiagent learning, reinforcement learning, robot soccer
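To make the method named in the abstract more concrete, the following is a minimal sketch of Sarsa(λ) with linear function approximation over tile-coded binary features. It is an illustration only, not the authors' implementation: the names tile_indices and LinearSarsaLambda, the one-dimensional tile coder, the replacing-trace variant, the step-size scaling, and all parameter values are assumptions made for this example. In an SMDP setting such as keepaway, the reward passed to update could be the time elapsed between two decision points; that choice is likewise an assumption here.

```python
import random


def tile_indices(x, lo, hi, n_tilings=4, tiles_per_tiling=8):
    """Return one active feature index per tiling for a scalar x in [lo, hi]."""
    width = (hi - lo) / tiles_per_tiling
    active = []
    for t in range(n_tilings):
        offset = t * width / n_tilings            # each tiling is shifted a little
        idx = int((x - lo + offset) / width)
        idx = min(idx, tiles_per_tiling)          # one extra tile absorbs the shift
        active.append(t * (tiles_per_tiling + 1) + idx)
    return active


class LinearSarsaLambda:
    """Episodic Sarsa(lambda) with a linear value function and replacing traces."""

    def __init__(self, n_features, n_actions,
                 alpha=0.1, gamma=1.0, lam=0.5, epsilon=0.01):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.lam, self.epsilon = alpha, gamma, lam, epsilon
        self.w = [[0.0] * n_features for _ in range(n_actions)]  # weights
        self.z = [[0.0] * n_features for _ in range(n_actions)]  # eligibility traces

    def q(self, active, action):
        """Action value: sum of the weights of the active binary features."""
        return sum(self.w[action][i] for i in active)

    def choose(self, active, rng=random):
        """Epsilon-greedy action selection; ties go to the lowest action index."""
        if rng.random() < self.epsilon:
            return rng.randrange(self.n_actions)
        values = [self.q(active, a) for a in range(self.n_actions)]
        return values.index(max(values))

    def start_episode(self):
        """Clear all eligibility traces at the start of an episode."""
        for za in self.z:
            for i in range(len(za)):
                za[i] = 0.0

    def update(self, active, action, reward, next_active=None, next_action=None):
        """One Sarsa(lambda) step; leave next_active as None on the final step."""
        decay = self.gamma * self.lam
        for za in self.z:                         # decay every trace
            for i in range(len(za)):
                za[i] *= decay
        for i in active:                          # replacing traces for visited features
            self.z[action][i] = 1.0
        delta = reward - self.q(active, action)   # temporal-difference error
        if next_active is not None:
            delta += self.gamma * self.q(next_active, next_action)
        step = self.alpha / len(active)           # spread the step over active tiles
        for wa, za in zip(self.w, self.z):
            for i in range(len(wa)):
                wa[i] += step * delta * za[i]


# Hypothetical usage: two actions (say, hold or pass), one state variable in [0, 1].
agent = LinearSarsaLambda(n_features=4 * 9, n_actions=2)
agent.start_episode()
features = tile_indices(0.3, 0.0, 1.0)
action = agent.choose(features)
agent.update(features, action, reward=1.0)        # terminal step in this toy example
```

A real keepaway learner would tile several state variables (for example distances and angles between players) and concatenate the resulting feature indices; the single variable above only keeps the sketch short.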
Adaptive Behavior, Vol. 13, No. 3, 165-188 (2005)
DOI: 10.1177/105971230501300301
