
采用资格迹的神经网络学习控制算法 (Cited by: 4)

Learning to control by neural networks using eligibility traces
Abstract: Reinforcement learning is an important approach to adaptive learning control in continuous state spaces, but it suffers from low learning efficiency and slow convergence. Building on back-propagation (BP) neural networks combined with eligibility traces, we propose an algorithm, with a complete description, that realizes multi-step updates in the reinforcement learning process. It solves the problem of back-propagating the output layer's local gradient to the hidden-layer nodes, and thereby updates the hidden-layer weights rapidly. We also propose an improved residual method that linearly weights the updates of each layer during network training, obtaining both the learning speed of the direct-gradient method and the convergence properties of the residual-gradient method; applied to the hidden-layer weight updates, it improves the convergence of the value function. The algorithms are verified and analyzed on a simulated cart-pole balancing system. The results show that, after a short period of learning, the method successfully controls the cart-pole and improves learning efficiency significantly.
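The multi-step update described in the abstract is the standard backward view of TD(λ): every network weight carries an eligibility trace that decays by γλ each step, so a single TD error credits all recently visited weights at once. The following minimal sketch shows that mechanism on a small feedforward value network; the layer sizes, the stand-in transition and reward, and all hyperparameters are illustrative assumptions, not the paper's experimental setup.

import numpy as np

rng = np.random.default_rng(0)

# Tiny value network: state (n_in) -> tanh hidden layer (n_h) -> scalar V(s).
n_in, n_h = 4, 8
W1 = rng.normal(scale=0.1, size=(n_h, n_in))
W2 = rng.normal(scale=0.1, size=(1, n_h))

def forward(s):
    h = np.tanh(W1 @ s)      # hidden activations
    return float(W2 @ h), h  # value estimate V(s)

def value_grads(s, h):
    # Back-propagate the output node's local gradient (1 for a linear
    # output) to the hidden layer, giving dV/dW2 and dV/dW1.
    gW2 = h[None, :]
    gW1 = (W2.T * (1.0 - h[:, None] ** 2)) @ s[None, :]
    return gW1, gW2

alpha, gamma, lam = 0.05, 0.99, 0.8
e1, e2 = np.zeros_like(W1), np.zeros_like(W2)  # one trace per weight

s = rng.normal(size=n_in)
for step in range(1000):
    s_next = s + 0.1 * rng.normal(size=n_in)  # stand-in transition
    r = -0.01 * float(s @ s)                  # stand-in reward
    v, h = forward(s)
    v_next, _ = forward(s_next)
    delta = r + gamma * v_next - v            # one-step TD error
    gW1, gW2 = value_grads(s, h)
    e1 = gamma * lam * e1 + gW1               # decay and accumulate traces
    e2 = gamma * lam * e2 + gW2
    W1 += alpha * delta * e1                  # a single TD error credits
    W2 += alpha * delta * e2                  # every recently visited weight
    s = s_next                                # (reset traces at episode ends)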
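The improved residual method builds on Baird-style residual algorithms, which for the same TD error linearly blend the direct-gradient update (fast, but possibly divergent under function approximation) and the residual-gradient update (convergent, but slow). A minimal sketch of that basic blend follows; the function name and the mixing coefficient phi are assumptions for illustration, and the paper's per-layer optimized weighting is not reproduced here.

def blended_update(delta, grad_v_s, grad_v_s_next, alpha, gamma, phi):
    # Direct-gradient term: treats V(s') as a fixed target.
    direct = delta * grad_v_s
    # Residual-gradient term: differentiates the full TD error,
    # including the discounted successor value.
    residual = delta * (grad_v_s - gamma * grad_v_s_next)
    # Linear combination of the two weight changes.
    return alpha * ((1.0 - phi) * direct + phi * residual)

With phi = 0 this reduces to the direct method used in the TD(λ) sketch above; phi = 1 gives the pure residual-gradient update.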
Source: Control Theory & Applications (《控制理论与应用》), indexed in EI, CAS, CSCD, Peking University Core, 2015, No. 7: 887-894 (8 pages).
Funding: Supported by the National Natural Science Foundation of China (61403205, 61373027, 60117089) and the Qufu Normal University Laboratory Open Fund (sk201415).
Keywords: reinforcement learning; neural networks; eligibility traces; cart-pole system; gradient descent