Adaptive Behavior

An Architecture for Learning "Potential Field" Cognitive Maps with an Application to Mobile Robotics

Anthony G. Pipe

Intelligent Autonomous Systems Engineering Laboratory, University of the West of England

The learning architecture described in this article autonomously acquires a topographical (metric) map that encodes a measure of "value" for xy-Cartesian locations in an environment. Low-value areas arise from two sources. Direct negative reinforcement from the environment results when the robot discovers obstacles or has other "unpleasant" experiences. The other source of negative reinforcement is generated internally by the learning algorithm, as it identifies regions that are a long distance away from the "pleasant" places in the environment. Conversely, examples of "pleasant" places, where positive environmental reward is received, might be energy-charging sites or simply locations that the robot should visit in executing its daily tasks. In general, what the robot learns is a map of "motivational" tendencies, or "expectancies". In such a map, the value attached to a place comes to reflect a balance between the good and bad rewards attainable from that position. When the Temporal Difference learning part of the architecture is turned on, that measure of value comes to include an estimate of how far, in travel time, it is to positive reinforcement. The architecture is loosely based on an Adaptive Heuristic Critic structure. Exploration of a continuous-valued search space is conducted by an Evolution Strategy, tuned for fast and approximate optimization. Knowledge acquired autonomously from this exploration is stored in a Radial Basis Function (RBF) neural network. Inherent features of this neural network type lead to the creation of a "potential field" structure that exerts appetitive and aversive "forces" on the robot as it moves around in the environment. The results of simulation experiments are presented, with a view to illustrating the strengths and weaknesses of the architecture. The map-building architecture proposed here is intended to form part of an overall navigational system. In future work it will be integrated with a self-localization algorithm, landmark-based topological mapping, and a reactive system for dealing with local dynamics in the environment.
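
The idea of a value map stored in an RBF network and shaped by Temporal Difference learning can be illustrated with a minimal sketch. It is not the article's implementation: the class `RBFValueMap`, the function `train_corridor`, the Gaussian basis width, the learning rate, the discount factor, and the five-cell corridor are all assumptions chosen for illustration, and the Evolution Strategy used for exploration is omitted in favor of a fixed walk toward a single rewarded location.

```python
import math


class RBFValueMap:
    """Hypothetical value map: a weighted sum of Gaussian bumps over xy locations."""

    def __init__(self, centers, width=0.5, lr=0.1, gamma=0.9):
        self.centers = list(centers)   # (x, y) center of each basis function
        self.width = width             # assumed Gaussian width
        self.lr = lr                   # assumed learning rate
        self.gamma = gamma             # assumed discount factor
        self.weights = [0.0] * len(self.centers)

    def features(self, pos):
        """Activation of every basis function at position pos."""
        x, y = pos
        return [
            math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * self.width ** 2))
            for cx, cy in self.centers
        ]

    def value(self, pos):
        """Estimated value of position pos."""
        return sum(w * f for w, f in zip(self.weights, self.features(pos)))

    def td_update(self, pos, reward, next_pos, terminal=False):
        """One TD(0) step: move value(pos) toward reward + gamma * value(next_pos)."""
        target = reward
        if not terminal:
            target += self.gamma * self.value(next_pos)
        error = target - self.value(pos)
        phi = self.features(pos)
        self.weights = [w + self.lr * error * f for w, f in zip(self.weights, phi)]
        return error


def train_corridor(episodes=500):
    """Walk a 5-cell corridor repeatedly; reward +1 on arriving at the last cell."""
    cells = [(float(i), 0.0) for i in range(5)]
    vmap = RBFValueMap(centers=cells)
    for _ in range(episodes):
        for i in range(4):
            arrived = (i + 1 == 4)
            vmap.td_update(cells[i], 1.0 if arrived else 0.0, cells[i + 1],
                           terminal=arrived)
    return vmap


vmap = train_corridor()
for x in range(4):
    print(x, round(vmap.value((float(x), 0.0)), 3))
```

In this sketch the learned value falls off with travel distance from the rewarded cell, which is the sense in which the map encodes an estimate of how far away positive reinforcement is. Negative rewards passed to `td_update` would lower the value of the surrounding locations in the same way.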

Key Words: Cognitive Maps • Mobile Robotics • Potential Field Maps • Adaptive Heuristic Critic Reinforcement Learning • Evolutionary Computation • Neural Networks

Adaptive Behavior, Vol. 8, No. 2, 173-203 (2000)
DOI: 10.1177/105971230000800205

