
Multi-agent learning with the success-story algorithm

  • Learning, Cooperation and Competition
  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1221)

Abstract

We study systems of multiple reinforcement learners. Each leads a single life lasting from birth to unknown death, and in between it tries to accelerate its reward intake. Its actions and learning algorithms consume part of its life, since computational resources are limited. The expected reward for a certain behavior may change over time, partly because of the other learners' actions and learning processes. For such reasons, previous approaches to multi-agent reinforcement learning are either limited or heuristic by nature. Using a simple backtracking method called the “success-story algorithm”, however, each of our learners is able to establish, at certain times called evaluation points, success histories of behavior modifications: it simply undoes all previous modifications that were not empirically observed to trigger lifelong reward accelerations (computation time for learning and testing is taken into account). Then it continues to act and learn until the next evaluation point. Success histories can be enforced despite interference from other learners. The principle allows for plugging in a wide variety of learning algorithms. An experiment illustrates its feasibility.
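
To make the evaluation step concrete, the following is a minimal sketch, in Python, of how the success-story criterion could be checked and enforced. It is not the authors' implementation: the Checkpoint record, the undo callback, and the use of plain floats for lifetime and cumulative reward are assumptions introduced for illustration. The idea mirrors the abstract: every behavior modification is pushed onto a stack together with the learner's consumed lifetime and cumulative reward at that moment, and at an evaluation point modifications are popped (and undone) until the reward per time measured since each surviving modification increases strictly, from the oldest modification to the newest.

    # Hedged sketch of a success-story evaluation step (assumed names, not the
    # paper's code).  Assumes now > 0 and now > cp.time for every checkpoint.
    from dataclasses import dataclass
    from typing import Any, Callable, List

    @dataclass
    class Checkpoint:
        time: float        # lifetime consumed when the modification was made
        reward: float      # cumulative reward collected up to that moment
        undo_info: Any     # whatever is needed to restore the earlier policy

    def ssc_holds(stack: List[Checkpoint], now: float, reward_now: float) -> bool:
        # Success-story criterion: reward per time since each surviving
        # modification must strictly increase, starting from reward per time
        # since birth.
        prev_rate = reward_now / now                      # rate since birth
        for cp in stack:                                  # oldest modification first
            rate = (reward_now - cp.reward) / (now - cp.time)
            if rate <= prev_rate:
                return False
            prev_rate = rate
        return True

    def ssa_evaluate(stack: List[Checkpoint], now: float, reward_now: float,
                     undo: Callable[[Any], None]) -> None:
        # Evaluation point: pop and undo modifications until the criterion
        # holds; an empty stack satisfies it trivially.
        while stack and not ssc_holds(stack, now, reward_now):
            undo(stack.pop().undo_info)

Between evaluation points a learner would keep acting and learning, pushing a new Checkpoint whenever it modifies its behavior; only the modifications that survive every later call to ssa_evaluate make up its success history.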

Editor information

Gerhard Weiß

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Schmidhuber, J., Zhao, J. (1997). Multi-agent learning with the success-story algorithm. In: Weiß, G. (ed.) Distributed Artificial Intelligence Meets Machine Learning: Learning in Multi-Agent Environments. LDAIS 1996, LIOME 1996. Lecture Notes in Computer Science, vol 1221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62934-3_43

  • DOI: https://doi.org/10.1007/3-540-62934-3_43

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-62934-4

  • Online ISBN: 978-3-540-69050-4

  • eBook Packages: Springer Book Archive
