
采用资格迹的神经网络学习控制算法 (Cited by: 4)

Learning to control by neural networks using eligibility traces
Abstract: Reinforcement learning is an important approach to adaptive learning control in continuous state spaces, but it suffers from low learning efficiency and slow convergence. Building on back-propagation (BP) neural networks combined with eligibility traces, we propose an algorithm, with a complete description, that realizes multi-step updates in the reinforcement learning process. It solves the problem of back-propagating the output layer's local gradient to the hidden-layer nodes, and thereby updates the hidden-layer weights rapidly. We also propose an improved residual method that linearly weights the updates of each layer during network training, obtaining both the learning speed of the direct-gradient method and the convergence properties of the residual-gradient method; applied to the hidden-layer weight updates, it improves the convergence of the value function. The algorithms are verified and analyzed on a simulated cart-pole balancing system. The results show that, after a short period of learning, the method successfully controls the cart-pole and improves learning efficiency significantly.
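The multi-step update described in the abstract is the standard backward view of TD(λ): every network weight carries an eligibility trace that decays by γλ each step, so a single TD error credits all recently visited weights at once. The following minimal sketch shows that mechanism on a small feedforward value network; the layer sizes, the stand-in transition and reward, and all hyperparameters are illustrative assumptions, not the paper's experimental setup.

import numpy as np

rng = np.random.default_rng(0)

# Tiny value network: state (n_in) -> tanh hidden layer (n_h) -> scalar V(s).
n_in, n_h = 4, 8
W1 = rng.normal(scale=0.1, size=(n_h, n_in))
W2 = rng.normal(scale=0.1, size=(1, n_h))

def forward(s):
    h = np.tanh(W1 @ s)      # hidden activations
    return float(W2 @ h), h  # value estimate V(s)

def value_grads(s, h):
    # Back-propagate the output node's local gradient (1 for a linear
    # output) to the hidden layer, giving dV/dW2 and dV/dW1.
    gW2 = h[None, :]
    gW1 = (W2.T * (1.0 - h[:, None] ** 2)) @ s[None, :]
    return gW1, gW2

alpha, gamma, lam = 0.05, 0.99, 0.8
e1, e2 = np.zeros_like(W1), np.zeros_like(W2)  # one trace per weight

s = rng.normal(size=n_in)
for step in range(1000):
    s_next = s + 0.1 * rng.normal(size=n_in)  # stand-in transition
    r = -0.01 * float(s @ s)                  # stand-in reward
    v, h = forward(s)
    v_next, _ = forward(s_next)
    delta = r + gamma * v_next - v            # one-step TD error
    gW1, gW2 = value_grads(s, h)
    e1 = gamma * lam * e1 + gW1               # decay and accumulate traces
    e2 = gamma * lam * e2 + gW2
    W1 += alpha * delta * e1                  # a single TD error credits
    W2 += alpha * delta * e2                  # every recently visited weight
    s = s_next                                # (reset traces at episode ends)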
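The improved residual method builds on Baird-style residual algorithms, which for the same TD error linearly blend the direct-gradient update (fast, but possibly divergent under function approximation) and the residual-gradient update (convergent, but slow). A minimal sketch of that basic blend follows; the function name and the mixing coefficient phi are assumptions for illustration, and the paper's per-layer optimized weighting is not reproduced here.

def blended_update(delta, grad_v_s, grad_v_s_next, alpha, gamma, phi):
    # Direct-gradient term: treats V(s') as a fixed target.
    direct = delta * grad_v_s
    # Residual-gradient term: differentiates the full TD error,
    # including the discounted successor value.
    residual = delta * (grad_v_s - gamma * grad_v_s_next)
    # Linear combination of the two weight changes.
    return alpha * ((1.0 - phi) * direct + phi * residual)

With phi = 0 this reduces to the direct method used in the TD(λ) sketch above; phi = 1 gives the pure residual-gradient update.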
Source: Control Theory & Applications (《控制理论与应用》), indexed in EI, CAS, CSCD, Peking University Core, 2015, No. 7: 887-894 (8 pages).
Funding: Supported by the National Natural Science Foundation of China (61403205, 61373027, 60117089) and the Qufu Normal University Laboratory Open Fund (sk201415).
Keywords: reinforcement learning; neural networks; eligibility traces; cart-pole system; gradient descent