Multiagent reinforcement learning through merging individually learned value functions

Abstract: In cooperative multiagent systems, learning the agents' optimal policies is very difficult. Because the numbers of states and actions grow exponentially with the number of agents, the agents' action policies become increasingly intractable. An agent can learn its optimal action policy for a task by learning the task's value functions. If a task can be decomposed into several subtasks and the agents have already learned the optimal value functions for each subtask, this knowledge can help them learn the optimal action policies for the whole task when they act simultaneously. By merging the agents' independently learned optimal value functions, a novel multiagent online reinforcement learning algorithm, LU-Q, is proposed. Applying a transformation to the individually learned value functions loosens the constraints on the optimal value functions of each subtask. In each learning iteration of LU-Q, the agents' joint action set in a state is processed, and some actions are pruned from that state's available action set according to the multiagent value function defined in LU-Q. Because the available action set of each state shrinks gradually over LU-Q's iterations, convergence of the value functions is accelerated. LU-Q's effectiveness, soundness, and convergence are analyzed, and experimental results show that LU-Q achieves better learning performance than standard Q-learning.
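The abstract describes LU-Q only at a high level, so the following is a minimal sketch of the general idea under stated assumptions, not the paper's algorithm. It assumes two agents with three actions each, two states, one-step episodes, made-up subtask value tables (`SUBTASK_Q`), and a made-up conflict penalty. The merge rule (a plain sum of subtask values) and the pruning rule (drop a joint action once its optimistic value bound falls below the best reward actually observed in that state) are hypothetical stand-ins for LU-Q's transformation and multiagent value function, which the abstract does not specify. Because the penalty is never negative, the merged value is an upper bound on each joint action's true value, which is what makes this particular pruning rule safe.

```python
import random
from itertools import product

# Hypothetical toy problem (all numbers are made up for illustration):
# 2 agents, 3 actions each, 2 states; every episode is a single joint step.
N_ACTIONS = 3
STATES = (0, 1)
# Individually learned subtask values Q_i(s, a_i), assumed already given.
SUBTASK_Q = (
    {0: [1.0, 0.0, 0.5], 1: [0.0, 0.3, 1.0]},  # agent 0's subtask
    {0: [0.0, 1.0, 0.2], 1: [0.4, 0.0, 1.0]},  # agent 1's subtask
)
CONFLICT_PENALTY = 0.8  # paid when both agents pick the same action index


def joint_actions(n_agents, n_actions):
    """Joint action set; its size grows as n_actions ** n_agents."""
    return list(product(range(n_actions), repeat=n_agents))


def merged_value(s, joint):
    """Hypothetical merge rule: sum of the agents' subtask values."""
    return sum(q[s][a] for q, a in zip(SUBTASK_Q, joint))


def reward(s, joint):
    """True joint reward: merged subtask rewards minus an interaction penalty."""
    penalty = CONFLICT_PENALTY if len(set(joint)) < len(joint) else 0.0
    return merged_value(s, joint) - penalty


def learn(episodes=200, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    available = {s: joint_actions(len(SUBTASK_Q), N_ACTIONS) for s in STATES}
    # Optimistic initialisation from the merged values (an upper bound here,
    # because the interaction penalty is never negative).
    q = {s: {a: merged_value(s, a) for a in available[s]} for s in STATES}
    visited = {s: set() for s in STATES}
    for episode in range(episodes):
        s = STATES[episode % len(STATES)]  # visit states round-robin
        acts = available[s]
        if rng.random() < epsilon:
            a = rng.choice(acts)  # explore among still-available joint actions
        else:
            a = max(acts, key=q[s].__getitem__)  # greedy joint action
        # Q-learning update with learning rate 1 and a terminal next state:
        # Q(s, a) <- r. Adequate only because the toy rewards are deterministic.
        q[s][a] = reward(s, a)
        visited[s].add(a)
        # Prune: every Q(s, b) is an upper bound on b's true value, so any b
        # whose bound is below the best reward observed in s is suboptimal.
        best_known = max(q[s][b] for b in visited[s] if b in acts)
        available[s] = [b for b in acts if q[s][b] >= best_known]
    policy = {s: max(available[s], key=q[s].__getitem__) for s in STATES}
    return policy, available
```

For these toy numbers, calling `learn()` prunes each state's 9 joint actions down to the single best one: (0, 1) in state 0 and (2, 0) in state 1. In state 1 the merged values alone would favor (2, 2), which the conflict penalty makes suboptimal, so the learning step is still needed to correct the merged estimate.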

Authors: Zhang Huaxiang (张化祥), Huang Shangteng (黄上腾)

Affiliations: Information and Management School; Dept. of Computer Science and Engineering

Source: Journal of Harbin Institute of Technology (New Series) (哈尔滨工业大学学报（英文版）, English edition), 2005, No. 3, pp. 346-350 (5 pages); indexed in EI and CAS.

Keywords: reinforcement learning; multiagent; value function; computer technology; expert systems; knowledge engineering; evaluation function

CLC number: TP182 [Automation and Computer Technology / Control Theory and Control Engineering]
