Abstract
For the control of nonlinear systems with continuous states and unknown dynamic models, a Q-learning control method based on an Elman neural network was proposed. The Q-values of state-action pairs were estimated online, exploiting the Elman network's good dynamic characteristics and generalization ability, which mitigates the "curse of dimensionality" that easily arises when generalizing over the state space. To accelerate learning of the neural network, an eligibility trace was defined for each connection weight, borrowing the state eligibility-trace mechanism of the TD(λ) algorithm. The method was applied to the mountain-car control problem with continuous states: the learning system obtained an effective control policy after roughly 60 trials. Simulation results indicate that the proposed method can effectively achieve model-free reinforcement-learning control of nonlinear systems with continuous states.
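To make the combination described in the abstract concrete, below is a minimal, hypothetical sketch of an Elman (simple recurrent) network estimating Q(s, a), trained by Q-learning with per-weight eligibility traces in the spirit of TD(λ). All class and variable names, layer sizes, and hyperparameters are illustrative, not taken from the paper; for brevity the traces here are kept only for the output weights, whereas the paper's method maintains traces for the network's weight vector generally.

```python
import numpy as np

class ElmanQ:
    """Illustrative Elman-network Q-value estimator with eligibility traces.

    A hidden layer with tanh activation receives the state concatenated
    with context units (a copy of the previous hidden activations); a
    linear output layer produces one Q-value per discrete action.
    """

    def __init__(self, n_state, n_action, n_hidden=8,
                 alpha=0.1, gamma=0.99, lam=0.8, seed=0):
        rng = np.random.default_rng(seed)
        # input weights act on [state, context]
        self.W_in = rng.normal(0.0, 0.1, (n_hidden, n_state + n_hidden))
        self.W_out = rng.normal(0.0, 0.1, (n_action, n_hidden))
        self.context = np.zeros(n_hidden)      # Elman context units
        self.e_out = np.zeros_like(self.W_out)  # one trace per output weight
        self.alpha, self.gamma, self.lam = alpha, gamma, lam

    def q_values(self, s):
        x = np.concatenate([s, self.context])
        self.h = np.tanh(self.W_in @ x)   # hidden activations
        self.context = self.h.copy()      # fed back as context next step
        return self.W_out @ self.h        # Q(s, a) for every action a

    def update(self, s, a, r, s_next, done):
        q_sa = self.q_values(s)[a]
        h_s = self.h.copy()               # hidden activations for (s, context)
        q_next = 0.0 if done else np.max(self.q_values(s_next))
        delta = r + self.gamma * q_next - q_sa   # TD error
        self.e_out *= self.gamma * self.lam      # decay all weight traces
        self.e_out[a] += h_s              # accumulate dQ(s,a)/dW_out[a]
        self.W_out += self.alpha * delta * self.e_out
        return delta
```

Usage on a mountain-car-style task would feed the two-dimensional state (position, velocity) to `q_values`, pick an action (e.g. ε-greedily over the three thrust actions), and call `update` after each transition; the traces let a single TD error adjust weights responsible for Q-values of earlier steps as well.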
Source
Journal of China University of Mining & Technology (《中国矿业大学学报》)
Indexed in: EI, CAS, CSCD, PKU Core Journals (北大核心)
2006, No. 5, pp. 653-657 (5 pages)
Funding
National Natural Science Foundation of China (60475030)
Keywords
nonlinear system
reinforcement learning
Q learning
Elman neural network
eligibility trace