Abstract
We study systems of multiple reinforcement learners. Each leads a single life lasting from birth to unknown death, and in between tries to accelerate its reward intake. Its actions and learning algorithms consume part of its life — computational resources are limited. The expected reward for a given behavior may change over time, partly because of other learners' actions and learning processes. For such reasons, previous approaches to multi-agent reinforcement learning are either limited or heuristic by nature. Using a simple backtracking method called the “success-story algorithm”, however, each of our learners can establish a success history of behavior modifications: at certain times called evaluation points, it simply undoes all previous modifications that were not empirically observed to trigger lifelong reward accelerations (computation time spent on learning and testing is taken into account). It then continues to act and learn until the next evaluation point. Success histories can be enforced despite interference from other learners. The principle allows for plugging in a wide variety of learning algorithms. An experiment illustrates its feasibility.
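The backtracking step sketched in the abstract can be illustrated as follows. This is a minimal sketch, not the paper's actual code: the names `ssa_evaluate` and `ssc_holds`, and the stack layout of `(time, cumulative_reward, undo_callback)` triples, are illustrative assumptions.

```python
# Hedged sketch of the success-story algorithm's evaluation step.
# Each stack entry is a hypothetical triple:
#   (time of modification, cumulative reward at that time, undo callback).

def ssc_holds(stack, t_now, r_now):
    """Success-story criterion: reward per time since each surviving
    modification must strictly increase from oldest to newest, starting
    above the lifelong average reward rate."""
    rates = [(r_now - r) / (t_now - t) for (t, r, _) in stack]
    baseline = r_now / t_now  # average reward rate since birth (t = 0)
    return all(a < b for a, b in zip([baseline] + rates, rates))

def ssa_evaluate(stack, t_now, r_now):
    """At an evaluation point, undo (pop) the most recent modifications
    until the success-story criterion holds for all that remain."""
    while stack and not ssc_holds(stack, t_now, r_now):
        _, _, undo = stack.pop()
        undo()  # revert this policy modification
    return stack
```

Only modifications followed by a lifelong reward acceleration survive on the stack; everything else is reverted, which is what makes success histories enforceable even when other learners perturb the reward stream.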
© 1997 Springer-Verlag Berlin Heidelberg
Cite this paper
Schmidhuber, J., Zhao, J. (1997). Multi-agent learning with the success-story algorithm. In: Weiß, G. (ed.) Distributed Artificial Intelligence Meets Machine Learning: Learning in Multi-Agent Environments. LDAIS/LIOME 1996. Lecture Notes in Computer Science, vol 1221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62934-3_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-62934-4
Online ISBN: 978-3-540-69050-4