Abstract: As the penetration of renewable energy in the power grid increases, frequency regulation by conventional thermal units can no longer meet power-quality requirements. To address the large area control error of traditional automatic generation control (AGC) systems in multi-source scenarios, a multi-source frequency-regulation coordination strategy based on a Stackelberg game and an improved deep neural network (S-DNN) is proposed. First, an improved multi-level deep neural network (DNN) is designed, in which a DNN layer, a natural gradient boosting layer, and a least-squares support vector machine layer successively perform prediction, evaluation, and action execution, outputting the total frequency-regulation power command. This multi-level model accounts for the dynamic impact of renewable penetration on the frequency-regulation system and learns richer features from historical information and real-time states, improving the accuracy of time-series regulation commands. Then, based on Stackelberg game theory, the power allocation among the regulation sources is optimized in light of their regulation characteristics and synergy, improving the economy of secondary frequency regulation. Finally, case studies verify the effectiveness of the proposed coordination strategy. Compared with conventional frequency-regulation methods, the proposed S-DNN strategy effectively reduces the area control error and frequency deviation while lowering regulation costs.
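The abstract does not give implementation details, but the three-stage pipeline it describes can be sketched as a stacked regression model. Below is a minimal, illustrative Python sketch assuming a DNN prediction layer, an NGBoost evaluation layer fitted on the DNN's residuals, and a kernel ridge regressor standing in for the least-squares SVM layer (LS-SVM regression reduces to a kernel ridge problem). The feature set, stacking scheme, and hyperparameters here are assumptions for illustration, not the paper's actual architecture.

```python
# Minimal sketch of a three-stage "predict -> evaluate -> execute" pipeline
# in the spirit of the S-DNN model described in the abstract. All feature
# names and the exact stacking scheme are assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.kernel_ridge import KernelRidge  # stand-in for the LS-SVM layer
from ngboost import NGBRegressor              # natural gradient boosting

rng = np.random.default_rng(0)

# Toy training data: [ACE, frequency deviation, renewable penetration, load]
X = rng.normal(size=(500, 4))
y = 0.8 * X[:, 0] + 0.5 * X[:, 1] - 0.3 * X[:, 2] + 0.1 * rng.normal(size=500)

# Stage 1 (prediction): a DNN maps the system state to a raw power command.
dnn = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000).fit(X, y)
p_dnn = dnn.predict(X)

# Stage 2 (evaluation): NGBoost models the residual of the DNN output,
# providing a boosted correction term for the raw command.
ngb = NGBRegressor(n_estimators=200, verbose=False).fit(X, y - p_dnn)
p_ngb = p_dnn + ngb.predict(X)

# Stage 3 (execution): an LS-SVM-style kernel regressor maps the refined
# estimate plus the state to the final total regulation power command.
Z = np.column_stack([X, p_ngb])
lssvm = KernelRidge(kernel="rbf", alpha=1e-2).fit(Z, y)

# Inference on a new system state.
x_now = rng.normal(size=(1, 4))
p1 = dnn.predict(x_now)
p2 = p1 + ngb.predict(x_now)
p_total = lssvm.predict(np.column_stack([x_now, p2]))
print("total frequency-regulation power command:", p_total[0])
```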
Abstract: This paper addresses the difficulty of solving traditional pursuit-evasion differential game models in complex real-world environments, especially under incomplete information and high computational complexity. An improved multi-agent reinforcement learning method based on the Soft Actor-Critic (SAC) algorithm is proposed and applied to the differential game problem of unmanned aerial vehicles (UAVs) pursuing a single intelligent target. The advantage of SAC in pursuit-evasion differential games lies in its natural implementation of the mixed-strategy concept: its stochastic policy lets it respond to dynamic changes in the opponent's behavior, while offering strong exploration, stability, and robustness. Compared with other reinforcement learning algorithms, SAC is better suited to games with high uncertainty, complex opponent behavior, and continuous action spaces. A partially observable environment is assumed in which neither the pursuer nor the evader knows the full state; both must make decisions from partial environmental observations. To solve this continuous optimization problem, the multi-agent Soft Actor-Critic (MASAC) algorithm is adopted, enabling both agents in the pursuit-evasion scenario to learn their optimal strategies through interaction with the environment. Finally, experiments demonstrate the applicability and potential of the improved multi-agent reinforcement learning method in UAV pursuit-evasion scenarios under partial observability.
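As a rough illustration of the MASAC setup the abstract describes, the sketch below shows one actor update for the pursuer under the common centralized-training, decentralized-execution convention: each agent's stochastic (squashed Gaussian) actor sees only its partial observation, while the critic sees joint observations and actions. The observation layout, network sizes, and update details are assumptions for illustration, not the paper's implementation.

```python
# Minimal PyTorch sketch of one MASAC actor update for the pursuer.
# Dimensions, architectures, and the fake batch are illustrative only.
import torch
import torch.nn as nn

OBS, ACT, N_AGENTS = 8, 2, 2        # per-agent obs/action dims, pursuer + evader
JOINT = N_AGENTS * (OBS + ACT)      # centralized critic input size

def mlp(inp, out):
    return nn.Sequential(nn.Linear(inp, 256), nn.ReLU(),
                         nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, out))

class Actor(nn.Module):
    """Squashed Gaussian policy acting on the agent's partial observation."""
    def __init__(self):
        super().__init__()
        self.net = mlp(OBS, 2 * ACT)  # outputs mean and log-std
    def forward(self, obs):
        mu, log_std = self.net(obs).chunk(2, dim=-1)
        std = log_std.clamp(-5, 2).exp()
        dist = torch.distributions.Normal(mu, std)
        u = dist.rsample()            # reparameterized sample
        a = torch.tanh(u)             # squash into bounded action space
        # Log-probability with the tanh change-of-variables correction.
        logp = (dist.log_prob(u) - torch.log(1 - a.pow(2) + 1e-6)).sum(-1)
        return a, logp

actor = Actor()
critic = mlp(JOINT, 1)               # centralized Q(joint_obs, joint_actions)
alpha = 0.2                          # entropy temperature

# One illustrative actor update on a fake batch of transitions.
batch = 64
joint_obs = torch.randn(batch, N_AGENTS * OBS)
own_obs = joint_obs[:, :OBS]         # pursuer sees only its own observation
other_act = torch.randn(batch, ACT)  # evader's action from its own policy

a, logp = actor(own_obs)
q_in = torch.cat([joint_obs, a, other_act], dim=-1)
# SAC actor objective: maximize Q while keeping policy entropy high.
actor_loss = (alpha * logp - critic(q_in).squeeze(-1)).mean()
actor_loss.backward()
print("actor loss:", actor_loss.item())
```

In a full implementation each agent would also maintain twin critics with target networks and a temperature update; the evader would be trained symmetrically with its own actor against the same centralized critics.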