
Learning Control of DHP Method Based on Complex Process Simplified Model (Cited by: 2)

Abstract: The standard DHP (Dual Heuristic Programming) method requires an accurate model of the controlled plant to compute the Jacobian matrices of the next state with respect to the current state and the control action, a requirement that is difficult to meet for complex processes. This paper proposes a learning-control strategy based on the DHP method with a simplified model: approximate Jacobian matrices obtained from a simplified process model are used in DHP training, relaxing the need for an exact plant model. Simulation results for setpoint control of a bioreactor show that the proposed method accelerates the learning process and is robust over a wider range of parameter variations.
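The DHP critic learns the costate λ(t) = ∂J/∂x(t), and with a policy u = π(x) its training target is λ*(t) = ∂U/∂x + (∂U/∂u)(∂π/∂x) + γ λ(t+1)[∂f/∂x + (∂f/∂u)(∂π/∂x)], so the only model information needed is the pair of plant Jacobians ∂f/∂x and ∂f/∂u. The sketch below illustrates the abstract's idea of taking those Jacobians from a simplified surrogate model rather than from the exact process. It is a minimal illustration, not the paper's implementation: the scalar dynamics, the quadratic utility, the linear actor/critic parameterizations, and all names (simplified_model, dhp_step, etc.) are assumptions chosen for brevity.

```python
# Minimal DHP sketch with approximate Jacobians from a simplified model.
# Scalar plant, quadratic utility, linear actor/critic -- all illustrative.

GAMMA = 0.95   # discount factor
RHO = 0.1      # control penalty in the utility U = 0.5*e^2 + 0.5*RHO*u^2

def simplified_model(x, u):
    """Crude first-order surrogate of the true process x(t+1) = f(x(t), u(t))."""
    return 0.9 * x + 0.1 * u

def approx_jacobians(x, u, eps=1e-4):
    """Finite-difference Jacobians of the surrogate; these stand in for the
    exact plant Jacobians that standard DHP would require."""
    dfdx = (simplified_model(x + eps, u) - simplified_model(x - eps, u)) / (2 * eps)
    dfdu = (simplified_model(x, u + eps) - simplified_model(x, u - eps)) / (2 * eps)
    return dfdx, dfdu

w_critic = 0.0   # critic: lambda(x) = w_critic * (x - x_ref)
k_actor = 0.0    # actor:  u = k_actor * (x - x_ref)

def dhp_step(x, x_ref, lr_c=0.05, lr_a=0.05):
    """One DHP learning step for setpoint control."""
    global w_critic, k_actor
    e = x - x_ref
    u = k_actor * e
    x_next = simplified_model(x, u)         # in practice: the real plant transition
    dfdx, dfdu = approx_jacobians(x, u)     # approximate Jacobians from the surrogate
    dUdx, dUdu = e, RHO * u                 # partials of the quadratic utility
    lam_next = w_critic * (x_next - x_ref)  # critic estimate of dJ/dx at t+1
    # Costate target: lam* = dU/dx + dU/du*du/dx + gamma*lam(t+1)*(df/dx + df/du*du/dx)
    lam_target = dUdx + dUdu * k_actor + GAMMA * lam_next * (dfdx + dfdu * k_actor)
    lam = w_critic * e
    w_critic -= lr_c * (lam - lam_target) * e   # gradient step on 0.5*(lam - lam*)^2
    # Actor descends dJ/du = dU/du + gamma*lam(t+1)*df/du; chain rule du/dk_actor = e
    dJdu = dUdu + GAMMA * lam_next * dfdu
    k_actor -= lr_a * dJdu * e
    return x_next

# Usage: drive the state toward the setpoint x_ref = 1.0
x = 0.0
for _ in range(2000):
    x = dhp_step(x, x_ref=1.0)
print(f"state: {x:.3f}, actor gain: {k_actor:.3f}")
```

The control penalty RHO keeps the learned gain finite, so the closed loop settles near (not exactly at) the setpoint; the point of the example is only that the DHP updates never touch an exact plant model, only Jacobians of the surrogate.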
Authors: 陈宗海 (Chen Zonghai), 文锋 (Wen Feng)
Source: Control and Decision (《控制与决策》; EI, CSCD, Peking University Core), 2006, No. 10, pp. 1087-1091 (5 pages)
Fund: National Natural Science Foundation of China (60575033)
Keywords: reinforcement learning; DHP method; bioreactor; simplified model
