
Multi-agent learning with the success-story algorithm

  • Learning, Cooperation and Competition
  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1221)

Abstract

We study systems of multiple reinforcement learners. Each leads a single life lasting from birth to unknown death, and in between it tries to accelerate its reward intake. Its actions and learning algorithms consume part of its life, since computational resources are limited. The expected reward for a certain behavior may change over time, partly because of the other learners' actions and learning processes. For such reasons, previous approaches to multi-agent reinforcement learning are either limited or heuristic by nature. Using a simple backtracking method called the “success-story algorithm”, however, each of our learners is able to establish, at certain times called evaluation points, success histories of behavior modifications: it simply undoes all previous modifications that were not empirically observed to trigger lifelong reward accelerations (computation time for learning and testing is taken into account). Then it continues to act and learn until the next evaluation point. Success histories can be enforced despite interference from other learners. The principle allows for plugging in a wide variety of learning algorithms. An experiment illustrates its feasibility.
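
To make the evaluation step concrete, the following is a minimal sketch, in Python, of how the success-story criterion could be checked and enforced. It is not the authors' implementation: the Checkpoint record, the undo callback, and the use of plain floats for lifetime and cumulative reward are assumptions introduced for illustration. The idea mirrors the abstract: every behavior modification is pushed onto a stack together with the learner's consumed lifetime and cumulative reward at that moment, and at an evaluation point modifications are popped (and undone) until the reward per time measured since each surviving modification increases strictly, from the oldest modification to the newest.

    # Hedged sketch of a success-story evaluation step (assumed names, not the
    # paper's code).  Assumes now > 0 and now > cp.time for every checkpoint.
    from dataclasses import dataclass
    from typing import Any, Callable, List

    @dataclass
    class Checkpoint:
        time: float        # lifetime consumed when the modification was made
        reward: float      # cumulative reward collected up to that moment
        undo_info: Any     # whatever is needed to restore the earlier policy

    def ssc_holds(stack: List[Checkpoint], now: float, reward_now: float) -> bool:
        # Success-story criterion: reward per time since each surviving
        # modification must strictly increase, starting from reward per time
        # since birth.
        prev_rate = reward_now / now                      # rate since birth
        for cp in stack:                                  # oldest modification first
            rate = (reward_now - cp.reward) / (now - cp.time)
            if rate <= prev_rate:
                return False
            prev_rate = rate
        return True

    def ssa_evaluate(stack: List[Checkpoint], now: float, reward_now: float,
                     undo: Callable[[Any], None]) -> None:
        # Evaluation point: pop and undo modifications until the criterion
        # holds; an empty stack satisfies it trivially.
        while stack and not ssc_holds(stack, now, reward_now):
            undo(stack.pop().undo_info)

Between evaluation points a learner would keep acting and learning, pushing a new Checkpoint whenever it modifies its behavior; only the modifications that survive every later call to ssa_evaluate make up its success history.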

Editor information

Gerhard Weiß

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Schmidhuber, J., Zhao, J. (1997). Multi-agent learning with the success-story algorithm. In: Weiß, G. (ed.) Distributed Artificial Intelligence Meets Machine Learning: Learning in Multi-Agent Environments. LDAIS 1996, LIOME 1996. Lecture Notes in Computer Science, vol 1221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62934-3_43

  • DOI: https://doi.org/10.1007/3-540-62934-3_43

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-62934-4

  • Online ISBN: 978-3-540-69050-4

  • eBook Packages: Springer Book Archive
