Advanced Search

Journal Navigation

Journal Home

Subscriptions

Archive

Contact Us

Table of Contents

Sign In to gain access to subscriptions and/or personal tools.
Adaptive Behavior
This Article
Right arrow Full Text (PDF)
Right arrow References
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Alert me to new issues of the journal
Right arrow Add to Saved Citations
Right arrow Download to citation manager
Right arrowRequest Permissions
Right arrow Request Reprints
Right arrow Add to My Marked Citations
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via Google Scholar
Right arrow Citing Articles via Scopus
Google Scholar
Right arrow Articles by Baird, L. C.
Right arrow Articles by Klopf, A. H.
Right arrow Search for Related Content
Social Bookmarking
 Add to CiteULike   Add to Complore   Add to Connotea   Add to Del.icio.us   Add to Digg   Add to Reddit   Add to Technorati   Add to Twitter  
What's this?

A Hierarchical Network of Provably Optimal Learning Control Systems: Extensions of the Associative Control Process (ACP) Network

Leemon C. Baird, III

Wright Laboratory

A. Harry Klopf

Wright Laboratory

An associative control process (ACP) network is a learning control system that can reproduce a variety of animal learning results from classical and instrumental conditioning experiments (Klopf, Morgan, & Weaver, 1993; see also the article, 'A Hierarchical Network of Control Systems that Learn"). The ACP networks proposed and tested by Klopf, Morgan, and Weaver are not guaranteed, however, to learn optimal policies for maximizing reinforcement. Optimal behavior is guaranteed for a reinforcement learning system such as Q-learning (Watkins, 1989), but simple Q-learning is incapable of reproducing the animal learning results that ACP networks reproduce. We propose two new models that reproduce the animal learning results and are provably optimal. The first model, the modified ACP network, embodies the smallest number of changes necessary to the ACP network to guarantee that optimal policies will be learned while still reproducing the animal learning results. The second model, the single-layer ACP network, embodies the smallest number of changes necessary to Q-learning to guarantee that it reproduces the animal learning results while still learning optimal policies. We also propose a hierarchical network architecture within which several reinforcement learning systems (e.g., Q-learning systems, single-layer ACP networks, or any other learning controller) can be combined in a hierarchy. We implement the hierarchical network architecture by combining four of the single-layer ACP networks to form a controller for a standard inverted pendulum dynamic control problem. The hierarchical controller is shown to learn more reliably and more than an order of magnitude faster than either the single-layer ACP network or the Barto, Sutton, and Anderson (1983) learning controller for the benchmark problem.

Key Words: optimal control • learning • Q-learning • hierarchical control

Adaptive Behavior, Vol. 1, No. 3, 321-352 (1993)
DOI: 10.1177/105971239300100303


Add to CiteULike CiteULike   Add to Complore Complore   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us   Add to Digg Digg   Add to Reddit Reddit   Add to Technorati Technorati   Add to Twitter Twitter    What's this?


This article has been cited by other articles:


Home page
Adaptive BehaviorHome page
M. E. Harmon, L. C. Baird III, and A. H. Klopf
Reinforcement Learning Applied to a Differential Game
Adaptive Behavior, September 1, 1995; 4(1): 3 - 28.
[Abstract] [PDF]