
Multiagent reinforcement learning through merging individually learned value functions

Abstract: In cooperative multiagent systems, learning the optimal policies of multiple agents is very difficult. Because the numbers of states and actions grow exponentially with the number of agents, the agents' action policies become intractable. By learning the optimal value function for a task, an agent can learn its optimal action policy for that task. If a task can be decomposed into several subtasks and the agents have learned the optimal value function for each subtask, this knowledge can help the agents learn the optimal action policies for the whole task when they act simultaneously. A novel multiagent online reinforcement learning algorithm, LU-Q, is proposed that merges the agents' independently learned optimal value functions. Applying a transformation to the individually learned value functions loosens the constraints on the optimal value functions of each subtask. In each learning iteration of LU-Q, the agents' joint action set in a state is processed: some actions are pruned from that state's available action set according to the multiagent value function defined in LU-Q. Because the available action set of each state shrinks gradually over the iterations, the convergence of the value functions is accelerated. The effectiveness, soundness and convergence of LU-Q are analyzed, and experimental results show that LU-Q achieves better learning performance than standard Q-learning.
Source: Journal of Harbin Institute of Technology (New Series) [哈尔滨工业大学学报(英文版)], EI, CAS, 2005, No. 3, pp. 346-350, 5 pages
Keywords: reinforcement learning; multiagent; value function; computer technology; expert systems; knowledge engineering; evaluation function
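
The abstract describes pruning joint actions from a state's available action set so that later Q-learning iterations search a smaller set. The following is a minimal sketch of that general idea only, not the LU-Q algorithm itself, whose pruning rule and value-function transformation are not given in this record. It assumes a hypothetical bound-based rule: each joint action has a lower and an upper value estimate (which could, for instance, be derived from individually learned subtask value functions), and an action is dropped when its upper estimate falls below the best lower estimate. The names `prune_actions`, `q_update`, `lower` and `upper` are illustrative assumptions.

```python
from collections import defaultdict


def prune_actions(available, lower, upper):
    """Keep only actions whose upper bound reaches the best lower bound.

    Assumed rule (not taken from the paper): an action that cannot beat
    the best guaranteed value is removed from the available set.
    """
    best_lower = max(lower[a] for a in available)
    return {a for a in available if upper[a] >= best_lower}


def q_update(q, state, action, reward, next_state, next_available,
             alpha=0.1, gamma=0.9):
    """One tabular Q-learning step that maximizes over non-pruned actions only."""
    best_next = max((q[(next_state, a)] for a in next_available), default=0.0)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


# Hypothetical bounds for three joint actions in a single state.
lower = {"a0": 1.0, "a1": 2.5, "a2": 0.5}
upper = {"a0": 2.0, "a1": 4.0, "a2": 3.0}

# "a0" is pruned because its upper bound (2.0) is below the best lower bound (2.5).
available = prune_actions({"a0", "a1", "a2"}, lower, upper)

q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
q_update(q, "s0", "a1", reward=1.0, next_state="s0", next_available=available)
```

Restricting the maximization to the pruned set is what would shrink the search in each iteration; whether pruning preserves the optimal policy depends on the bounds being valid, which this sketch simply assumes.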

