Reinforcement Learning Control for Nonlinear Systems Based on Elman Neural Network
Abstract: For the control of nonlinear systems with continuous states and unknown system models, a Q-learning control method based on an Elman neural network is proposed. The good dynamic properties and generalization ability of the Elman network are used to estimate the Q-values of state-action pairs online, which alleviates the "curse of dimensionality" that easily arises when generalizing over the state space. Borrowing the eligibility-trace mechanism that the TD(λ) algorithm applies to states, an eligibility trace is defined for each weight vector of the network to accelerate learning. The method is applied to the mountain-car control problem, a task with continuous states; the learning system obtains an effective car-climbing control policy after roughly 60 trials. Simulation results show that the proposed method can achieve model-free reinforcement-learning control of nonlinear systems with continuous states.
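The abstract invokes the eligibility-trace mechanism of TD(λ), transplanted from states to network weights, without spelling out the update. As a point of reference (the notation below is assumed, not taken from the paper), the standard gradient form keeps one trace per weight and updates

$$\delta_t = r_{t+1} + \gamma \max_{a'} Q_w(s_{t+1}, a') - Q_w(s_t, a_t),$$

$$e_t = \gamma\lambda\, e_{t-1} + \nabla_w Q_w(s_t, a_t), \qquad w_{t+1} = w_t + \alpha\, \delta_t\, e_t,$$

where $w$ stacks the Elman network's connection weights, $\alpha$ is the learning rate, and $\lambda \in [0,1]$ sets how quickly credit assigned through old gradients decays; $\lambda = 0$ recovers plain one-step Q-learning with backpropagation.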
Source: Journal of China University of Mining & Technology (《中国矿业大学学报》), 2006, No. 5: 653-657 (5 pages). Indexed in EI, CAS, CSCD; Peking University core journal.
Funding: National Natural Science Foundation of China (Grant No. 60475030).
Keywords: nonlinear system; reinforcement learning; Q-learning; Elman neural network; eligibility trace
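To make the pipeline concrete, the sketch below pairs Q-learning with a small Elman network (one recurrent hidden layer whose previous activations feed back in as context units) and per-weight eligibility traces, on the standard mountain-car dynamics. It is a minimal illustration, not the paper's implementation: the network size, step sizes, λ, ε-greedy exploration, feature scaling, the one-step truncated gradient through the context units, and the Watkins-style trace reset are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Mountain-car dynamics (standard Sutton & Barto formulation) ---
def step(pos, vel, action):            # action in {0, 1, 2} -> force {-1, 0, +1}
    force = action - 1
    vel = np.clip(vel + 0.001 * force - 0.0025 * np.cos(3 * pos), -0.07, 0.07)
    pos = np.clip(pos + vel, -1.2, 0.6)
    if pos == -1.2:
        vel = 0.0                      # inelastic collision with the left wall
    done = pos >= 0.5                  # goal flag on the right hilltop
    return pos, vel, (0.0 if done else -1.0), done

# --- Minimal Elman network: recurrent hidden layer, one Q output per action ---
n_in, n_h, n_out = 2, 20, 3
W_in = rng.normal(0.0, 0.1, (n_h, n_in))   # input -> hidden
W_c  = rng.normal(0.0, 0.1, (n_h, n_h))    # context (previous hidden) -> hidden
b_h  = np.zeros(n_h)
W_o  = rng.normal(0.0, 0.1, (n_out, n_h))  # hidden -> Q-values
b_o  = np.zeros(n_out)
weights = (W_in, W_c, b_h, W_o, b_o)

def forward(x, ctx):
    h = np.tanh(W_in @ x + W_c @ ctx + b_h)
    return W_o @ h + b_o, h            # Q(s, .) and the new context vector

alpha, gamma, lam, eps = 0.05, 1.0, 0.8, 0.1

for episode in range(200):
    pos, vel = rng.uniform(-0.6, -0.4), 0.0       # usual start region, at rest
    ctx = np.zeros(n_h)
    traces = [np.zeros_like(w) for w in weights]  # one trace per weight array
    done, steps = False, 0
    while not done and steps < 5000:
        x = np.array([pos / 1.2, vel / 0.07])     # crude feature scaling
        q, h = forward(x, ctx)
        greedy = int(np.argmax(q))
        a = greedy if rng.random() > eps else int(rng.integers(3))
        pos2, vel2, r, done = step(pos, vel, a)
        q2, _ = forward(np.array([pos2 / 1.2, vel2 / 0.07]), h)
        delta = (r if done else r + gamma * np.max(q2)) - q[a]
        # Gradient of Q(s, a) w.r.t. each weight array; the context is treated
        # as a fixed input (one-step truncated backpropagation).
        dh = W_o[a] * (1.0 - h ** 2)   # backprop through the tanh hidden layer
        g_Wo = np.zeros_like(W_o); g_Wo[a] = h
        g_bo = np.zeros_like(b_o); g_bo[a] = 1.0
        grads = [np.outer(dh, x), np.outer(dh, ctx), dh, g_Wo, g_bo]
        for e, g in zip(traces, grads):
            e *= gamma * lam           # decay all traces ...
            e += g                     # ... then accumulate the new gradient
        for w, e in zip(weights, traces):
            w += alpha * delta * e     # TD(lambda)-style weight update
        if a != greedy:                # Watkins' Q(lambda): cut traces after
            for e in traces:           # exploratory actions
                e[:] = 0.0
        pos, vel, ctx, steps = pos2, vel2, h, steps + 1
    print(f"episode {episode:3d}: {steps} steps")
```

Treating the context units as a fixed extra input during backpropagation (instead of unrolling through time) is the classic, cheap way to train an Elman network; it also keeps each per-step gradient the same shape as its weight array, so one trace can be stored per weight, in the spirit of the trace-per-weight-vector scheme the abstract describes.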

References (13; the record lists the first 10)

[1] 闫友彪, 陈元琰. A survey of the main strategies in machine learning [J]. Application Research of Computers (计算机应用研究), 2004, 21(7): 4-10.
[2] MICHIE D, CHAMBERS R A. BOXES: an experiment in adaptive control [J]. Machine Intelligence, 1968, 2(2): 137-152.
[3] BARAS J S, BORKAR V S. A learning algorithm for Markov decision processes with adaptive state aggregation [C]// Proceedings of the IEEE Conference on Decision and Control. Piscataway, NJ: IEEE, 2000: 3351-3356.
[4] MOORE A W, ATKESON C G. The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces [J]. Machine Learning, 1995, 21(3): 199-233.
[5] LIN C K. A reinforcement learning adaptive fuzzy controller for robots [J]. Fuzzy Sets and Systems, 2003, 137(3): 339-352.
[6] 蒋国飞, 吴沧浦. Control of an inverted pendulum based on the Q-learning algorithm and BP neural networks [J]. Acta Automatica Sinica (自动化学报), 1998, 24(5): 662-666.
[7] KUROZUMI R, FUJISAWA S, YAMAMOTO T, et al. Development of an automatic travel system for electric wheelchairs using reinforcement learning systems and CMACs [C]// Proceedings of the International Joint Conference on Neural Networks. Honolulu: IEEE, 2002: 1690-1695.
[8] SUTTON R S, BARTO A G. Reinforcement learning: an introduction [M]. Cambridge, MA: The MIT Press, 1998.
[9] WATKINS C J C H, DAYAN P. Technical note: Q-learning [J]. Machine Learning, 1992, 8(3): 279-292.
[10] 许世范, 王雪松, 郝继飞. Predicting model for complex production process based on dynamic neural network [J]. Journal of China University of Mining and Technology, 2001, 11(1): 20-23.
