Funding: Supported by the Innovation Technology Fund of the Hong Kong Special Administrative Region of China (Grant No. GHP/011/05).
Abstract: Improvements in hybrid electric vehicle (HEV) fuel economy and emissions depend heavily on an efficient energy management strategy (EMS). However, the uncertainty of future driving conditions is difficult to address in EMS design. Most existing EMSs operate on fixed parameters and cannot adapt to varying driving conditions, so they usually fail to fully exploit the potential of these advanced vehicles. In this paper, a novel EMS design procedure based on neural dynamic programming (NDP) is proposed. NDP is a generic online learning algorithm that combines stochastic dynamic programming (SDP) with the temporal difference (TD) method. Instead of computing the utility function and optimal control actions through the Bellman equations, the NDP algorithm uses two neural networks to approximate them, with the network weights updated online by the TD method. This avoids the high computational cost that SDP suffers from and makes the algorithm suitable for real-time implementation. The main advantage of the NDP EMS is that it relies on no prior information about future driving conditions and can self-tune over a wide range of operating conditions. The NDP EMS was applied to "Qianghua-I", a prototype parallel HEV, and verified on a revolving drum test bench. Experimental results illustrate the potential of the proposed EMS in improving fuel economy and keeping state of charge (SOC) deviations low. The proposed research ensures the optimality of the NDP EMS as well as its real-time applicability.
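To make the two-network scheme concrete, below is a minimal actor-critic sketch in the spirit of the NDP approach the abstract describes: a critic network approximates the cost-to-go and an actor network approximates the control action, both trained online with TD updates. The surrogate plant, cost function, network sizes, and learning rates are all illustrative assumptions, not the paper's "Qianghua-I" setup or the authors' implementation.

```python
# Hedged sketch of an NDP-style (action-dependent) actor-critic with TD learning.
# Everything below the imports is an assumed toy setup, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)

def net_init(n_in, n_hid, n_out):
    """One-hidden-layer tanh network."""
    return {"W1": rng.normal(0, 0.3, (n_hid, n_in)), "b1": np.zeros(n_hid),
            "W2": rng.normal(0, 0.3, (n_out, n_hid)), "b2": np.zeros(n_out)}

def forward(p, z):
    h = np.tanh(p["W1"] @ z + p["b1"])
    return p["W2"] @ h + p["b2"], h

def sgd_step(p, z, h, dy, lr):
    """One gradient step given d(loss)/d(output) = dy."""
    dh = (p["W2"].T @ dy) * (1.0 - h**2)
    p["W2"] -= lr * np.outer(dy, h); p["b2"] -= lr * dy
    p["W1"] -= lr * np.outer(dh, z); p["b1"] -= lr * dh

def input_grad(p, h, dy):
    """Backpropagate dy through the net to obtain the gradient w.r.t. its input."""
    dh = (p["W2"].T @ dy) * (1.0 - h**2)
    return p["W1"].T @ dh

# Hypothetical 2-state surrogate plant: x = [SOC deviation, power demand],
# u = torque-split command. A stand-in for the real HEV dynamics.
def plant(x, u):
    soc_dev = 0.9 * x[0] - 0.1 * float(u[0])
    demand = np.clip(0.8 * x[1] + 0.2 * rng.standard_normal(), -1.0, 1.0)
    cost = float(u[0])**2 + 4.0 * soc_dev**2   # fuel-use proxy + SOC penalty
    return np.array([soc_dev, demand]), cost

actor = net_init(2, 8, 1)    # u = actor(x): approximates the control law
critic = net_init(3, 8, 1)   # J = critic(x, u): approximates the cost-to-go
gamma, lr_a, lr_c = 0.95, 5e-3, 1e-2

x = np.array([0.5, 0.0])
for t in range(20000):
    u, ha = forward(actor, x)
    u_expl = u + 0.1 * rng.standard_normal(1)        # exploration noise
    x_next, cost = plant(x, u_expl)

    # Critic: TD(0) update toward cost + gamma * J(x', u') (semi-gradient,
    # target treated as constant).
    u_next, _ = forward(actor, x_next)
    J_next, _ = forward(critic, np.concatenate([x_next, u_next]))
    z = np.concatenate([x, u_expl])
    J, hc = forward(critic, z)
    td_error = J - (cost + gamma * J_next)
    sgd_step(critic, z, hc, td_error, lr_c)

    # Actor: descend dJ/du, backpropagated through the critic.
    z_a = np.concatenate([x, u])
    _, hc_a = forward(critic, z_a)
    dJ_du = input_grad(critic, hc_a, np.ones(1))[2:]  # gradient w.r.t. the u slot
    sgd_step(actor, x, ha, dJ_du, lr_a)

    x = x_next
    if t % 5000 == 0:
        print(f"t={t:6d}  cost={cost:.4f}  SOC dev={x[0]:+.3f}")
```

Because both networks learn from a single sampled transition per step, this kind of scheme avoids the full state-space sweeps that make SDP expensive, which is the real-time argument the abstract makes.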
Abstract: This article introduces the state-of-the-art development of adaptive dynamic programming and reinforcement learning (ADPRL). First, algorithms in reinforcement learning (RL) are introduced and their roots in dynamic programming are illustrated. Adaptive dynamic programming (ADP) is then introduced, following a brief discussion of dynamic programming. Research in ADP and RL has developed rapidly over the past decade, from algorithms, to convergence and optimality analyses, to stability results. Several key steps in the recent theoretical development of ADPRL are discussed, together with some future perspectives. In particular, convergence and optimality results of value iteration and policy iteration are reviewed, followed by an introduction to the most recent results on the stability analysis of value iteration algorithms.
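As a point of reference for the convergence results the article reviews, here is a minimal value iteration sketch on a made-up finite MDP. The transition kernel, stage costs, and discount factor below are illustrative assumptions; the convergence itself follows from the standard gamma-contraction property of the Bellman backup.

```python
# Hedged sketch of value iteration on a toy 3-state, 2-action MDP.
# P, cost, and gamma are arbitrary illustrative choices, not from the article.
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(1)

# Random transition kernel P[a, s, s'] with rows normalized to probabilities.
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)
cost = rng.random((n_actions, n_states))         # stage cost g(x, u)

V = np.zeros(n_states)                           # initial guess V_0 = 0
for k in range(1000):
    # Bellman backup: V_{k+1}(x) = min_u [ g(x, u) + gamma * E[V_k(x')] ]
    Q = cost + gamma * P @ V                     # Q[a, s]
    V_new = Q.min(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-10:        # contraction => convergence
        break
    V = V_new

policy = Q.argmin(axis=0)                        # greedy policy from converged V
print(f"converged after {k} iterations; V* = {V}, policy = {policy}")
```

Policy iteration replaces the single Bellman backup per sweep with a full policy evaluation step followed by policy improvement; the convergence and optimality results for both iterations, and the stability analysis of value iteration, are the topics the article surveys.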