Abstract
To solve the problem of multi-UAV close-range air combat maneuvering decision-making, a multi-UAV close-range air combat maneuvering strategy generation algorithm based on a parameter-sharing Q-network and neural fictitious self-play is proposed. First, a hybrid Markov game model applicable to different UAV formation sizes is designed, together with a reinforcement learning framework for generating multi-UAV maneuvering decision strategies, the parameter-sharing Q-network; an autoencoder is used to compress the state space and improve the efficiency of strategy learning. Then, neural fictitious self-play is applied so that the maneuvering strategy converges to the Nash equilibrium strategy. Finally, simulation experiments are carried out on the parameter selection of the autoencoder, the training process of the strategy generation algorithm, and the rationality and transferability of the maneuvering strategy. The simulation results show that introducing the autoencoder effectively improves the efficiency of strategy learning, and that the multi-UAV close-range air combat maneuvering strategy generated by the algorithm is reasonable and transfers well.
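As a concrete illustration of two components named in the abstract, the following is a minimal PyTorch sketch of an autoencoder that compresses the raw air-combat state and a Q-network whose parameters are shared by every UAV in the formation. All layer sizes, dimensions, and names (StateAutoencoder, SharedQNetwork, etc.) are illustrative assumptions and are not taken from the paper.

```python
# Minimal sketch of the two ideas described in the abstract:
# (1) an autoencoder that compresses the raw air-combat state vector, and
# (2) a Q-network whose weights are shared by every UAV in the formation.
# Dimensions and layer sizes below are illustrative assumptions only.

import torch
import torch.nn as nn


class StateAutoencoder(nn.Module):
    """Compress the raw state vector into a low-dimensional code."""

    def __init__(self, state_dim: int, code_dim: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, code_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 128), nn.ReLU(),
            nn.Linear(128, state_dim),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Reconstruction output, used only when training the autoencoder.
        return self.decoder(self.encoder(state))


class SharedQNetwork(nn.Module):
    """One set of weights evaluated on every UAV's encoded observation."""

    def __init__(self, code_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(code_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, code: torch.Tensor) -> torch.Tensor:
        # Q-values over a discrete set of candidate maneuvers.
        return self.net(code)


if __name__ == "__main__":
    state_dim, code_dim, n_actions, n_uavs = 24, 8, 7, 4  # assumed sizes
    ae = StateAutoencoder(state_dim, code_dim)
    q_net = SharedQNetwork(code_dim, n_actions)

    # Every UAV runs the same Q-network on its own compressed observation,
    # so the same weights can be reused when the formation size changes.
    observations = torch.randn(n_uavs, state_dim)
    q_values = q_net(ae.encoder(observations))
    actions = q_values.argmax(dim=-1)
    print(actions)
```

Because each UAV evaluates identical weights on its own encoded observation, the policy is independent of how many teammates are present, which is the property the abstract refers to as transferability across formation sizes.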
Authors
KONG Wei-ren, ZHOU De-yun, ZHAO Yi-yang, YANG Wan-sha
(School of Electronics and Information, Northwestern Polytechnical University, Xi’an, Shaanxi 710129, China; School of Computer Science, The University of Sydney, Sydney 2006, Australia)
Source
Control Theory & Applications (《控制理论与应用》), 2022, No. 2, pp. 352-362 (11 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
Funding
Supported by the National Natural Science Foundation of China (61603299, 61612385) and the Fundamental Research Funds for the Central Universities (3102019ZX016).
Keywords
air combat decision-making
multi-UAV cooperation
reinforcement learning
fictitious self-play