Funding: Supported by the Universiti Tunku Abdul Rahman (UTAR), Malaysia, under UTARRF (IPSR/RMC/UTARRF/2021-C1/T05).
Abstract: The recent surge of mobile subscribers and user data traffic has accelerated the telecommunication sector's adoption of fifth-generation (5G) mobile networks. The cloud radio access network (CRAN) is a prominent framework in the 5G mobile network for meeting these demands by deploying low-cost, intelligent distributed antennas known as remote radio heads (RRHs). However, achieving optimal resource allocation (RA) in CRAN with traditional approaches remains challenging due to its complex structure. In this paper, we introduce a convolutional neural network-based deep Q-network (CNN-DQN) to balance energy consumption and guarantee the user quality of service (QoS) demand in the downlink CRAN. We first formulate the Markov decision process (MDP) for energy efficiency (EE) and build a 3-layer CNN to capture environment features as the input state space. We then use the DQN to turn the RRHs on and off dynamically based on user QoS demand and energy consumption in the CRAN. Finally, we solve the RA problem subject to user constraints and transmit power to guarantee the user QoS demand and maximize the EE with a minimum number of active RRHs. We conclude with simulations comparing our proposed scheme against the Nature DQN and the traditional approach.
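The abstract fixes the architecture only loosely (a 3-layer CNN over environment features, with the DQN selecting RRH on/off actions), so the following sketch is one plausible reading rather than the authors' implementation; the grid size, channel counts, and the encoding of RRH on/off patterns as discrete actions are all assumptions.

```python
import torch
import torch.nn as nn

class CNNDQN(nn.Module):
    """Illustrative 3-layer CNN Q-network: the state is a 2-D grid of
    channel/QoS features over the CRAN area; each action is one on/off
    configuration of the RRHs (all dimensions here are assumptions)."""
    def __init__(self, n_rrh_configs, grid=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(32 * grid * grid, n_rrh_configs)

    def forward(self, state_grid):
        x = self.features(state_grid)   # (B, 32, grid, grid)
        return self.head(x.flatten(1))  # one Q-value per RRH configuration

# Greedy action = the RRH on/off pattern with the highest predicted Q-value.
q_net = CNNDQN(n_rrh_configs=2 ** 4)    # e.g. 4 RRHs -> 16 on/off patterns
state = torch.randn(1, 1, 32, 32)
action = q_net(state).argmax(dim=1)
```

Encoding each on/off pattern as one discrete action keeps the standard DQN machinery intact, at the cost of an action space that grows as 2^N in the number of RRHs.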
Funding: Supported in part by the Pioneer and Leading Goose R&D Program of Zhejiang Province under Grant 2022C01083 (Dr. Yu Li, https://zjnsf.kjt.zj.gov.cn/) and under Grant 2023C01217 (Dr. Yu Li, https://zjnsf.kjt.zj.gov.cn/).
Abstract: With the rapid development of the mobile Internet, spatial crowdsourcing has become more and more popular. Spatial crowdsourcing consists of many different types of applications, such as spatial crowd-sensing services. Spatial crowd-sensing collects and analyzes traffic sensing data from clients such as vehicles and traffic lights to construct intelligent traffic prediction models. Beyond collecting sensing data, spatial crowdsourcing also includes spatial delivery services such as DiDi and Uber. Appropriate task assignment and worker selection largely determine the service quality of spatial crowdsourcing applications. Previous research conducted task assignment via traditional matching approaches or simple network models, but advanced mining methods for exploring the relationships between workers, task publishers, and the spatio-temporal attributes of tasks are lacking. Therefore, in this paper we propose a Deep Double Dueling Spatial-temporal Q Network (D3SQN) that adaptively learns the spatial-temporal relationships between tasks, task publishers, and workers in a dynamic environment to achieve optimal allocation. Specifically, D3SQN is trained via reinforcement learning and augmented with a spatial-temporal transformer that estimates the expected state values and action advantages so as to improve the accuracy of task assignments. Extensive experiments are conducted over real data collected from DiDi and ELM, and the simulation results verify the effectiveness of our proposed models.
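For readers unfamiliar with the dueling machinery D3SQN builds on, here is a minimal dueling Q-head in Python; the paper's spatial-temporal transformer encoder is stood in for by a plain MLP, and all dimensions (state features, candidate workers) are assumptions.

```python
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    """Minimal dueling head of the kind D3SQN builds on: a shared
    embedding (a plain MLP here, standing in for the paper's
    spatial-temporal transformer) feeds separate state-value and
    action-advantage streams."""
    def __init__(self, state_dim, n_workers, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # V(s)
        self.advantage = nn.Linear(hidden, n_workers)  # A(s, a) per candidate worker

    def forward(self, task_state):
        h = self.encoder(task_state)
        v, a = self.value(h), self.advantage(h)
        # Standard dueling aggregation: Q = V + (A - mean(A)).
        return v + a - a.mean(dim=1, keepdim=True)

# Assigning a task = picking the candidate worker with the highest Q-value.
q = DuelingHead(state_dim=16, n_workers=8)(torch.randn(1, 16))
worker = q.argmax(dim=1)
```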
Abstract: In industrial applications, the dynamic, ever-changing nature of streaming data makes it difficult for reinforcement learning algorithms to strike a good balance between model convergence and knowledge forgetting during training. Considering that content requests on the industrial floor are highly correlated with the current production task, an adaptive caching strategy based on an Integrated Deep Q-Network (IDQN) algorithm is proposed. In the offline phase, the algorithm trains and saves multiple historical task models from different historical task data. In the online phase, whenever a change in the task features of the real-time data stream is detected, the network model is retrained: if the features of the real-time data stream belong to a historical task, the corresponding historical task model is imported into the Deep Q-Network (DQN) for training; otherwise, a model is trained directly on the real-time data stream and labeled as a new task model. Simulation results show that, compared with reference algorithms, IDQN effectively reduces model convergence time and improves caching efficiency when content request popularity changes dynamically.
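A minimal sketch of the online phase described above, assuming hypothetical helpers `classify` (maps stream features to a known historical task ID, or None) and `train_dqn` (trains or fine-tunes a DQN on the live stream):

```python
import copy

def select_task_model(stream_features, task_library, classify, train_dqn):
    """Sketch of IDQN's online phase (helper names are hypothetical):
    when the task features of the incoming stream change, either
    warm-start the DQN from the matching historical model or train
    and register a new one."""
    task_id = classify(stream_features)             # historical task, or None
    if task_id in task_library:
        dqn = copy.deepcopy(task_library[task_id])  # import historical weights
        dqn = train_dqn(dqn, stream_features)       # fine-tune on the live stream
    else:
        dqn = train_dqn(None, stream_features)      # train from scratch
        task_library[f"task_{len(task_library)}"] = dqn  # label as a new task model
    return dqn
```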
Funding: Supported by the National Ministries and Research Funds (3020020221111).
Abstract: A gait control method for a biped robot based on the deep Q-network (DQN) algorithm is proposed to enhance the stability of walking on uneven ground. This control strategy is an intelligent learning method of posture adjustment. The robot is taken as an agent and trained to walk steadily on an uneven surface with obstacles, using a simple reward function based on forward progress. The reward-punishment (RP) mechanism of the DQN algorithm is established after obtaining the offline gait, which was generated in advance by foot-trajectory planning. Instead of implementing a complex dynamic model, the proposed method enables the biped robot to learn to adjust its posture on uneven ground and ensures walking stability. The performance and effectiveness of the proposed algorithm were validated in the V-REP simulation environment. The results demonstrate that the biped robot's lateral tilt angle stays below 3° after implementing the proposed method and that walking stability is markedly improved.
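As a concrete illustration of a reward-punishment mechanism of this kind, the toy function below rewards forward progress and punishes tilt and falls; the coefficients, and the use of the reported 3° figure as a soft threshold, are assumptions rather than the paper's exact shaping.

```python
def gait_reward(dx, tilt_deg, fallen):
    """Illustrative RP shaping in the spirit of the paper: reward
    forward progress, penalize lateral tilt, punish falling.
    Coefficients and the 3-degree threshold are assumptions guided
    by the reported stability margin."""
    if fallen:
        return -100.0                      # punishment: episode-ending fall
    reward = 10.0 * dx                     # reward proportional to forward progress
    if abs(tilt_deg) > 3.0:                # soft penalty beyond the stability margin
        reward -= abs(tilt_deg) - 3.0
    return reward
```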
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62071377, 62101442, 62201456), the Natural Science Foundation of Shaanxi Province (Grant Nos. 2023-YBGY-036, 2022JQ-687), and the Graduate Student Innovation Foundation Project of Xi'an University of Posts and Telecommunications under Grant CXJJDL2022003.
Abstract: The Internet of Medical Things (IoMT) is regarded as a critical technology for intelligent healthcare in the foreseeable 6G era. Nevertheless, due to the limited computing capability of edge devices and task-related coupling relationships, the IoMT faces unprecedented challenges. Considering the associative connections among tasks, this paper proposes a computation offloading policy for multiple user devices (UDs) that exploits device-to-device (D2D) communication and a multi-access edge computing (MEC) technique under the IoMT scenario. Specifically, to minimize the total delay and energy consumption with respect to IoMT requirements, we first analyze and model the detailed local-execution, MEC-execution, D2D-execution, and associated-task offloading exchange models. The associated-task offloading of multiple UDs is then formulated as a mixed-integer nonconvex optimization problem. Given the advantages of deep reinforcement learning (DRL) in processing tasks with coupling relationships, a Double-DQN-based associative tasks computing offloading (DDATO) algorithm is proposed to obtain the optimal solution, making the best offloading decision under the condition that the UDs' tasks are associative. Furthermore, to reduce the complexity of the DDATO algorithm, a cache-aided procedure is introduced before the data training process; this avoids redundant offloading and computing for tasks that have already been cached by other UDs. In addition, we use a dynamic ε-greedy strategy in the action-selection stage of the algorithm, preventing it from falling into locally optimal solutions. Simulation results demonstrate that, compared with other existing methods for associative task models with different structures in the IoMT network, the proposed algorithm lowers the total cost more effectively and efficiently while also providing a tradeoff between delay and energy-consumption tolerance.
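The dynamic ε-greedy rule mentioned above admits many schedules; a common linear-decay variant is sketched below, with all schedule constants assumed rather than taken from the paper.

```python
import random

def dynamic_epsilon_greedy(q_values, step, eps_start=1.0, eps_end=0.05, decay=5000):
    """Sketch of a dynamic ε-greedy rule of the kind DDATO uses to avoid
    local optima: ε decays with the training step, so exploration
    dominates early and exploitation late (constants are assumptions)."""
    eps = eps_end + (eps_start - eps_end) * max(0.0, 1.0 - step / decay)
    if random.random() < eps:
        return random.randrange(len(q_values))  # explore: random offloading choice
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit: argmax Q
```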
Abstract: Under path planning with dynamic obstacles, a mobile robot using the traditional deep Q-learning network (DQN) collides frequently during exploration and struggles to reach the goal. To address this problem, an improved DQN algorithm is proposed that refines both the exploration strategy and the experience replay mechanism. For exploration, the rapidly-exploring random tree (RRT) algorithm automatically generates static prior knowledge to guide action selection, replacing the random actions of the ε-greedy strategy and raising the agent's success rate in reaching the goal. For experience utilization, a clustered experience replay mechanism is designed with the K-means algorithm: experiences are clustered by the position information of dynamic obstacles, and replay preferentially samples experiences similar to the agent's current state, enabling the agent to avoid collisions with dynamic obstacles more effectively. Simulation experiments in a 2D grid environment show that, in dynamic environments, the algorithm avoids both static and dynamic obstacles and successfully reaches the goal, verifying its feasibility for path planning with dynamic obstacle avoidance.
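A minimal sketch of such a clustered replay mechanism, assuming each stored transition is summarized by a feature vector (e.g., agent and dynamic-obstacle positions) and using scikit-learn's KMeans; the cluster count and the sampling rule are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def clustered_sample(replay_states, replay_items, current_state, batch, k=4):
    """Sketch of K-means clustered experience replay: transitions are
    clustered by their state/obstacle features, and the training batch
    is drawn from the cluster nearest the agent's current state."""
    km = KMeans(n_clusters=k, n_init=10).fit(replay_states)   # (N, d) features
    cluster = km.predict(current_state.reshape(1, -1))[0]     # agent's cluster
    idx = np.flatnonzero(km.labels_ == cluster)               # similar experiences
    picks = np.random.choice(idx, size=min(batch, len(idx)), replace=False)
    return [replay_items[i] for i in picks]
```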
Abstract: With the advent of Reinforcement Learning (RL) and its continuous progress, state-of-the-art RL systems have emerged for many challenging real-world tasks. Given the scope of this area, various techniques are found in the literature. One notable family, Multiple Deep Q-Network (DQN) based RL systems, uses multiple DQN-based entities that learn together and communicate with each other. In such a scheme, the learning has to be distributed wisely among all entities and the inter-entity communication protocol has to be carefully designed. As more complex DQNs come to the fore, the overall complexity of these multi-entity systems has increased manyfold, leading to issues such as difficulty in training, high resource demands, long training times, and difficulty in fine-tuning, all of which hurt performance. Taking a cue from the parallel processing found in nature and its efficacy, we propose a lightweight ensemble-based approach for solving core RL tasks. It uses multiple binary-action DQNs with a shared state and reward. The benefits of the proposed approach are overall simplicity, faster convergence, and better performance compared with conventional DQN-based approaches, and it can potentially be extended to any type of DQN by forming its ensemble. Through extensive experimentation, promising results are obtained with the proposed ensemble approach on OpenAI Gym tasks and Atari 2600 games compared with recent techniques. The proposed approach achieves a state-of-the-art score of 500 on the CartPole-v1 task, 259.2 on the LunarLander-v2 task, and state-of-the-art results on four out of five Atari 2600 games.
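The abstract does not spell out how the binary decisions are combined, so the sketch below assumes the simplest reading: each member of the ensemble contributes one bit of a composite action over the shared state.

```python
import torch
import torch.nn as nn

class BinaryDQN(nn.Module):
    """One ensemble member: a tiny Q-network with exactly two actions,
    sharing the same state (and reward) as its peers."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, 2))

    def forward(self, s):
        return self.net(s)

# Assumed combination rule: each binary DQN decides one bit of the
# composite action, so n members jointly cover a 2^n action space.
state = torch.randn(1, 8)
ensemble = [BinaryDQN(state_dim=8) for _ in range(3)]
bits = [int(m(state).argmax(dim=1)) for m in ensemble]
composite_action = sum(b << i for i, b in enumerate(bits))
```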
Abstract: In a rechargeable wireless sensor network (WSN), using an unmanned aerial vehicle (UAV) as a mobile base station (BS) to charge sensors and collect data effectively prolongs the network's lifetime. In this paper, we jointly optimize the UAV's flight trajectory and the sensor selection and operation modes to maximize the average data traffic of all sensors within the WSN during the UAV's finite flight time, while ensuring, via wireless power transfer (WPT), the energy required by each sensor. We consider a practical scenario in which the UAV has no prior knowledge of sensor locations. The UAV performs autonomous navigation based on the status information obtained within its coverage area, which is modeled as a Markov decision process (MDP). A deep Q-network (DQN) executes the navigation based on the UAV position, the battery level state, channel conditions, and the current data traffic of sensors within the UAV's coverage area. Our simulation results demonstrate that the DQN algorithm significantly improves network performance in terms of average data traffic and trajectory design.
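As a small illustration of the state space the abstract enumerates, the helper below packs the UAV position, battery level, channel conditions, and covered sensors' data traffic into a single DQN input vector; the field sizes and ordering are assumptions.

```python
import numpy as np

def build_state(uav_xy, battery, channel_gains, data_traffic):
    """Sketch of the MDP state the DQN conditions on, as listed in the
    abstract: UAV position, battery level, channel conditions, and the
    current data traffic of covered sensors (array sizes are assumptions)."""
    return np.concatenate([np.asarray(uav_xy, dtype=np.float32),
                           [np.float32(battery)],
                           np.asarray(channel_gains, dtype=np.float32),
                           np.asarray(data_traffic, dtype=np.float32)])

# The action set could be discretized headings plus a hover/charge mode;
# the DQN maps the state above to one such move at each decision epoch.
state = build_state((0.5, 0.5), 0.8, np.ones(4), np.zeros(4))
```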