Search Results (23)

Search Parameters:
Keywords = parameterized deep reinforcement learning

31 pages, 2520 KB  
Article
Parameterized Reinforcement Learning with Route Guidance for Controlling Urban Road Traffic Networks
by Edwin M. Kataka, Thomas O. Olwal, Karim Djouani and Prosper Z. Sotenga
Future Transp. 2026, 6(2), 56; https://doi.org/10.3390/futuretransp6020056 - 28 Feb 2026
Viewed by 237
Abstract
Traditional macroscopic fundamental diagram (MFD)-based traffic perimeter metering control strategies rely on full knowledge of vehicle accumulation and inter-regional flow dynamics, assumptions that seldom hold in heterogeneous and highly variable real-world networks. Classical data-driven reinforcement learning methods face similar constraints, often converging slowly and exhibiting low sample efficiency when confronted with such complexities. Motivated by these limitations, this paper proposes a Parameterized Deep Q-Network perimeter control (P-DQNPC) scheme designed for multi-region urban road networks. The framework jointly optimizes discrete actions (regional routing choices) and continuous actions (signal-timing or flow-duration regulation) within a model-free learning structure. The approach is first trained and validated on synthetic MFD data to establish stable and interpretable policy behavior under controlled conditions. It is then transferred and further evaluated using real-world measurements from the Performance Measurement System—San Francisco Bay Area (PeMS-SF), a dataset collected from 18,954 loop detectors across the California State Highway System. PeMS-SF is selected due to its high spatial and temporal resolution, broad network coverage, and strong ability to capture realistic and diverse congestion patterns, qualities that support both rigorous validation and generalization to other metropolitan regions. Experimental results show that P-DQNPC consistently outperforms state-of-the-art baselines, including deep deterministic policy gradient, deep Q-network, and No-Control schemes. The proposed method achieves superior regulation of regional accumulations and demonstrates enhanced robustness in large, heterogeneous, and uncertain urban traffic environments. Full article
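The hybrid discrete–continuous action selection at the heart of P-DQN-style controllers can be illustrated with a minimal sketch. The dimensions, names, and random weights below are hypothetical stand-ins for trained networks, not the paper's actual architecture: an actor head proposes one bounded continuous parameter per discrete action, and a Q-network then picks the discrete action whose (state, parameter) pair scores highest.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical): 3 discrete routing choices, each carrying
# 1 continuous parameter (e.g. a normalized green-time duration).
STATE_DIM, N_DISCRETE, PARAM_DIM = 4, 3, 1

# Random weights stand in for the trained actor and Q networks.
W_actor = rng.normal(size=(STATE_DIM, N_DISCRETE * PARAM_DIM))
W_q = rng.normal(size=(STATE_DIM + N_DISCRETE * PARAM_DIM, N_DISCRETE))

def continuous_params(state):
    """Actor head: one bounded continuous parameter per discrete action."""
    return np.tanh(state @ W_actor)

def select_action(state):
    """P-DQN selection: evaluate Q(s, x(s)), take the argmax discrete action,
    and keep the continuous parameter attached to it."""
    x = continuous_params(state)
    q = np.concatenate([state, x]) @ W_q
    k = int(np.argmax(q))
    return k, x[k * PARAM_DIM:(k + 1) * PARAM_DIM]

state = rng.normal(size=STATE_DIM)
k, param = select_action(state)
```

In training, the Q-network and the actor are updated jointly: the Q-loss drives the discrete head, while the actor is moved along the gradient that increases the Q-value of its own parameters.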

19 pages, 3860 KB  
Article
An Improved DQN Framework with Dual Residual Horizontal Feature Pyramid for Autonomous Fault Diagnosis in Strong-Noise Scenarios
by Sha Li, Tong Wang, Xin Xu, Weiting Gan, Kun Chen, Xinyan Fan and Xueming Xu
Sensors 2025, 25(24), 7639; https://doi.org/10.3390/s25247639 - 16 Dec 2025
Cited by 1 | Viewed by 515
Abstract
Fault diagnosis methods based on deep learning have made considerable progress in recent years. However, actual industrial scenarios involve severe background noise and limited computing resources, which pose challenges to the practical application of fault diagnosis models. In response to these issues, this paper proposes a novel noise-resistant and lightweight fault diagnosis framework with a nonlinear timestep degenerative greedy strategy (NTDGS) and a dual residual horizontal feature pyramid (DRHFPN) for fault diagnosis in strong-noise scenarios. This method exploits the strong fitting ability of deep learning to model the reinforcement learning agent through parameterization, fully leveraging the advantages of both deep learning and reinforcement learning. NTDGS is further developed to adaptively adjust the action sampling strategy of the agent at different training stages, improving the convergence speed of the network. To enhance the noise resistance of the network, DRHFPN is constructed, which can filter out interference noise at the feature-map level by fusing local feature details and global semantic information. Furthermore, the feature map weighting attention mechanism (FMWAM) is designed to enhance the weak-feature extraction ability of the network through adaptive weighting of the feature maps. Finally, the performance of the proposed method is evaluated on different datasets and in strong-noise environments. Experiments show that in various fault diagnosis scenarios, the proposed method has better noise resistance, higher fault diagnosis accuracy, and fewer parameters compared to other methods. Full article
(This article belongs to the Special Issue Smart Sensors for Machine Condition Monitoring and Fault Diagnosis)

31 pages, 1040 KB  
Article
Navigating the Trade-Offs: A Quantitative Analysis of Reinforcement Learning Reward Functions for Autonomous Maritime Collision Avoidance
by Björn Krautwig, Dominik Wans, Li Li, Till Temmen, Lucas Koch, Markus Eisenbarth and Jakob Andert
J. Mar. Sci. Eng. 2025, 13(12), 2233; https://doi.org/10.3390/jmse13122233 - 23 Nov 2025
Cited by 1 | Viewed by 1009
Abstract
Autonomous navigation is critical for unlocking the full potential of Unmanned Surface Vehicles (USVs) in complex maritime environments. Deep Reinforcement Learning (DRL) has emerged as a powerful paradigm for developing self-learning control policies, yet the design of reward functions to balance conflicting objectives, particularly fast arrival at the target position and collision avoidance, remains a major challenge. The precise, quantitative impact of reward parameterization on a USV’s maneuvering behavior and the inherent performance trade-offs have not been thoroughly investigated. Here, we demonstrate that by systematically varying reward function weights within a framework relying on the Proximal Policy Optimization (PPO), it is possible to quantitatively map the trade-off between collision avoidance safety and mission time. Our results, derived from simulations, show that agents trained with balanced reward weights achieve target-reaching success rates exceeding 98% in dynamic multi-obstacle scenarios. Conversely, configurations that disproportionately penalize obstacle proximity lead to overly cautious behavior and mission failure, with success rates dropping to 22% due to workspace boundary violations. This work provides a data-driven methodological framework for reward function design and parameter selection in safety-critical robotic applications, moving beyond ad-hoc tuning towards a more structured parameter influence analysis. Full article
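The trade-off the study quantifies comes down to how the reward weights are set. A minimal sketch of such a weighted-sum reward function follows; the weight names, default values, and thresholds are hypothetical illustrations, not the paper's actual parameterization:

```python
def reward(d_goal_prev, d_goal, d_obstacle, reached, collided,
           w_progress=1.0, w_prox=0.5, w_time=0.01,
           safe_radius=50.0, r_goal=100.0, r_collision=-100.0):
    """Weighted-sum step reward balancing fast arrival against obstacle
    proximity; all weights and thresholds are illustrative."""
    if collided:
        return r_collision            # terminal penalty for a collision
    if reached:
        return r_goal                 # terminal bonus for reaching the target
    r = w_progress * (d_goal_prev - d_goal)       # reward closing the distance
    if d_obstacle < safe_radius:
        r -= w_prox * (safe_radius - d_obstacle)  # penalize unsafe proximity
    r -= w_time                                   # per-step cost: arrive quickly
    return r
```

Sweeping `w_prox` against `w_progress` is exactly the kind of systematic weight variation the abstract describes: too large a proximity weight makes avoidance dominate and the agent never commits to the target.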

24 pages, 2181 KB  
Article
DPDQN-TER: An Improved Deep Reinforcement Learning Approach for Mobile Robot Path Planning in Dynamic Scenarios
by Shuyuan Gao, Yang Xu, Xiaoxiao Guo, Chenchen Liu and Xiaobai Wang
Sensors 2025, 25(21), 6741; https://doi.org/10.3390/s25216741 - 4 Nov 2025
Viewed by 1464
Abstract
Efficient and stable path planning in dynamic and obstacle-dense environments, such as large-scale structure assembly measurement, is essential for improving the practicality and environmental adaptability of mobile robots in measurement and quality inspection tasks. However, traditional reinforcement learning methods often suffer from inefficient use of experience and a limited capability to represent policy structures in complex dynamic scenarios. To overcome these limitations, this study proposes a method named DPDQN-TER that integrates Transformer-based sequence modeling with a multi-branch parameter policy network. The proposed method introduces a temporal-aware experience replay mechanism that employs multi-head self-attention to capture causal dependencies within state-transition sequences. By dynamically weighting and sampling critical obstacle-avoidance experiences, this mechanism significantly improves learning efficiency, policy performance, and stability in dynamic environments. Furthermore, a multi-branch parameter policy structure is designed to decouple the continuous parameter generation tasks of different action categories into independent subnetworks, thereby reducing parameter interference and improving deployment-time efficiency. Extensive simulation experiments were conducted in both static and dynamic obstacle environments, as well as in cross-environment validation. The results show that DPDQN-TER achieves higher success rates, shorter path lengths, and faster convergence compared with benchmark algorithms including Parameterized Deep Q-Network (PDQN), Multi-Pass Deep Q-Network (MPDQN), and PDQN-TER. Ablation studies further confirm that both the Transformer-enhanced replay mechanism and the multi-branch parameter policy network contribute significantly to these improvements. These findings demonstrate improved overall performance (e.g., success rate, path length, and convergence) and generalization capability of the proposed method, indicating its potential as a practical solution for autonomous navigation of mobile robots in complex industrial measurement scenarios. Full article

26 pages, 1208 KB  
Article
Quantum Computing Meets Deep Learning: A QCNN Model for Accurate and Efficient Image Classification
by Sunil Prajapat, Manish Tomar, Pankaj Kumar, Rajesh Kumar and Athanasios V. Vasilakos
Mathematics 2025, 13(19), 3148; https://doi.org/10.3390/math13193148 - 2 Oct 2025
Cited by 1 | Viewed by 3214
Abstract
In deep learning, Convolutional Neural Networks (CNNs) serve as fundamental models, leveraging the correlational structure of data for tasks such as image classification and processing. However, CNNs face significant challenges in terms of computational complexity and accuracy. Quantum computing offers a promising avenue to overcome these limitations by introducing a quantum counterpart, Quantum Convolutional Neural Networks (QCNNs). QCNNs significantly reduce computational complexity, enhance the model's ability to capture intricate patterns, and improve classification accuracy. This paper presents a fully parameterized QCNN model, specifically designed for Noisy Intermediate-Scale Quantum (NISQ) devices. The proposed model employs two-qubit interactions throughout the algorithm, leveraging parameterized quantum circuits (PQCs) with rotation and entanglement gates to efficiently encode and process image data. This design not only ensures computational efficiency but also enhances compatibility with current quantum hardware. Our experimental results demonstrate the model's notable performance in binary classification tasks on the MNIST dataset, highlighting the potential of quantum-enhanced deep learning in image recognition. Further, we extend our framework to the Wine dataset, reformulated as a binary classification problem distinguishing Class 0 wines from the rest. The QCNN again demonstrates remarkable learning capability, achieving 97.22% test accuracy. This extension validates the versatility of the model across domains and reinforces the promising role of quantum neural networks in tackling a broad range of classification tasks. Full article

24 pages, 2157 KB  
Article
Research on Aerodynamic Force/Thrust Vector Combined Trajectory Optimization Method for Hypersonic Drones Based on Deep Reinforcement Learning
by Zijun Zhang, Yunfan Zhou, Leichao Yang, Wenzhong Jin and Jun Wang
Actuators 2025, 14(9), 461; https://doi.org/10.3390/act14090461 - 22 Sep 2025
Viewed by 973
Abstract
This paper addresses the cruise range maximization problem for hypersonic drones by proposing a combined aerodynamic force/thrust vector trajectory optimization method. A novel continuous linear parameterization strategy for trajectory optimization is innovatively developed, achieving continuous thrust vector trajectory optimization throughout the entire flight using only 21 parameters through recursive linear function design. This approach reduces parameter dimensionality and effectively addresses sparse rewards and training difficulties in reinforcement learning. The study integrates the Deep Deterministic Policy Gradient (DDPG) algorithm with deep residual networks for trajectory optimization, systematically exploring the impact mechanisms of different aerodynamic force and thrust vector combination modes on range performance. Through collaborative trajectory optimization of thrust vectors and flight height, simulation results demonstrate that the combined trajectory optimization strategy achieves a total range enhancement of approximately 146.14 km compared to pure aerodynamic control, with continuous linearly parameterized thrust vector trajectory optimization providing superior performance over traditional segmented methods. These results verify the significant advantages of the proposed trajectory optimization approach and the effectiveness of the deep reinforcement learning framework. Full article
(This article belongs to the Section Aerospace Actuators)

22 pages, 9960 KB  
Article
Extremal-Aware Deep Numerical Reinforcement Learning Fusion for Marine Tidal Prediction
by Xiaodao Chen, Gongze Zheng and Yuewei Wang
J. Mar. Sci. Eng. 2025, 13(9), 1771; https://doi.org/10.3390/jmse13091771 - 13 Sep 2025
Cited by 1 | Viewed by 1019
Abstract
In the context of global climate change and accelerated urbanization, coastal cities face severe threats from storm surges, and accurately predicting coastal water-level changes during storm surges has become a core technological demand for disaster prevention and reduction. Storm surges are caused by atmospheric pressure and wind conditions, and their destructive power is closely related to the morphology of the coastline. Traditional tide-level prediction models often face difficulties in boundary-condition parameterization, since tide-level changes result from the combined effect of various complex processes. Past prediction studies have been dominated by harmonic analysis and numerical simulation, each with its own limitations. Although machine learning applications in tide prediction have garnered attention, issues such as data inconsistency or missing data still exist. The physical–data fusion approach aims to overcome the limitations of single methods but still faces some challenges. This paper proposes a Deep-Numerical-Reinforcement learning fusion prediction model (DNR), which adopts ensemble learning. First, deep learning models and the numerical Finite-Volume Coastal Ocean Model (FVCOM) are used to predict tide levels at different tide stations, and then a fusion approach based on the improved reinforcement learning model DDPG_dual is applied for model assimilation. This reinforcement learning fusion model includes a module specifically designed to handle tide extreme points. In the case of the Typhoon Mangkhut storm surge, the DNR model achieved the best results for tide-level predictions at six tide stations in the South China Sea. Full article
(This article belongs to the Section Coastal Engineering)

14 pages, 2743 KB  
Article
Parametric Dueling DQN- and DDPG-Based Approach for Optimal Operation of Microgrids
by Wei Huang, Qing Li, Yuan Jiang and Xiaoya Lu
Processes 2024, 12(9), 1822; https://doi.org/10.3390/pr12091822 - 27 Aug 2024
Cited by 7 | Viewed by 1818
Abstract
This study is aimed at addressing the problem of optimizing microgrid operations to improve local renewable energy consumption and ensure the stability of multi-energy systems. Microgrids are localized power systems that integrate distributed energy sources, storage, and controllable loads to enhance energy efficiency and reliability. The proposed approach introduces a novel microgrid optimization method that leverages the parameterized Dueling Deep Q-Network (Dueling DQN) and Deep Deterministic Policy Gradient (DDPG) algorithms. The method employs a parametric hybrid action-space reinforcement learning technique, where the DDPG is utilized to convert discrete actions into continuous action values corresponding to each discrete action, while the Dueling DQN uses the current observation states and these continuous action values to predict the discrete actions that maximize Q-values. This integrated strategy is designed to tackle the co-scheduling challenge in microgrids, enabling them to dynamically select the most favorable control strategies based on their specific states and the actions of other intelligent entities. The ultimate objective is to minimize the overall operational costs of microgrids while ensuring the efficient local consumption of renewable energy and maintaining the stability of multi-energy systems. Simulation experiments were conducted to validate the efficacy and superiority of the proposed method in achieving the optimal microgrid operation, showcasing its potential to improve service quality and reduce operational expenses. Average rewards increased by 30% and 15% compared to the use of the Dueling DQN or DDPG only. Full article
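The dueling decomposition this method builds on splits the Q-value into a scalar state value and per-action advantages. A minimal sketch of the dueling head, with toy sizes and random weights standing in for a trained network (the names here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def dueling_q(features, W_v, W_a):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
    Subtracting the mean advantage makes the V/A split identifiable."""
    v = features @ W_v          # scalar state value V(s)
    a = features @ W_a          # one advantage per discrete action A(s, a)
    return v + a - a.mean()

# Hypothetical sizes: 8 shared features, 4 discrete dispatch actions.
features = rng.normal(size=8)
W_v = rng.normal(size=8)
W_a = rng.normal(size=(8, 4))
q = dueling_q(features, W_v, W_a)
```

A useful sanity check of the mean-subtracted form: the Q-values average exactly to the state value, so the value and advantage streams cannot drift against each other during training.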

27 pages, 11040 KB  
Article
PolyDexFrame: Deep Reinforcement Learning-Based Pick-and-Place of Objects in Clutter
by Muhammad Babar Imtiaz, Yuansong Qiao and Brian Lee
Machines 2024, 12(8), 547; https://doi.org/10.3390/machines12080547 - 11 Aug 2024
Cited by 1 | Viewed by 2505
Abstract
This research study presents a polydexterous deep reinforcement learning-based pick-and-place framework for industrial clutter scenarios. In the proposed framework, the agent learns the pick-and-place of regularly and irregularly shaped objects in clutter by using the sequential combination of prehensile and non-prehensile robotic manipulations involving different robotic grippers in a completely self-supervised manner. The problem was tackled as a reinforcement learning problem; after the Markov decision process (MDP) was designed, the off-policy model-free Q-learning algorithm was deployed using deep Q-networks as a Q-function approximator. Four distinct robotic manipulations, i.e., grasp from the prehensile manipulation category and inward slide, outward slide, and suction grip from the non-prehensile manipulation category, were considered as actions. The Q-function comprised four fully convolutional networks (FCN) corresponding to each action based on memory-efficient DenseNet-121 variants outputting pixel-wise maps of action-values jointly trained via the pixel-wise parametrization technique. Rewards were awarded according to the status of the action performed, and backpropagation was conducted accordingly for the FCN generating the maximum Q-value. The results showed that the agent learned the sequential combination of the polydexterous prehensile and non-prehensile manipulations, where the non-prehensile manipulations increased the possibility of prehensile manipulations. We achieved promising results in comparison to the baselines, differently designed variants, and density-based testing clutter. Full article

14 pages, 665 KB  
Article
Discretionary Lane-Change Decision and Control via Parameterized Soft Actor–Critic for Hybrid Action Space
by Yuan Lin, Xiao Liu and Zishun Zheng
Machines 2024, 12(4), 213; https://doi.org/10.3390/machines12040213 - 22 Mar 2024
Cited by 10 | Viewed by 2399
Abstract
This study focuses on a crucial task in the field of autonomous driving, autonomous lane change. Autonomous lane change plays a pivotal role in improving traffic flow, alleviating driver burden, and reducing the risk of traffic accidents. However, due to the complexity and uncertainty of lane-change scenarios, the functionality of autonomous lane change still faces challenges. In this research, we conducted autonomous lane-change simulations using both deep reinforcement learning (DRL) and model predictive control (MPC). Specifically, we used the parameterized soft actor–critic (PASAC) algorithm to train a DRL-based lane-change strategy to output both discrete lane-change decisions and continuous longitudinal vehicle acceleration. We also used MPC for lane selection based on the smallest predictive car-following costs for the different lanes. For the first time, we compared the performance of DRL and MPC in the context of lane-change decisions. The simulation results indicated that, under the same reward/cost function and traffic flow, both MPC and PASAC achieved a collision rate of 0%. PASAC demonstrated a comparable performance to MPC in terms of average rewards/costs and vehicle speeds. Full article
(This article belongs to the Special Issue Data-Driven and Learning-Based Control for Vehicle Applications)

14 pages, 7528 KB  
Article
Optimal Power Allocation in Optical GEO Satellite Downlinks Using Model-Free Deep Learning Algorithms
by Theodore T. Kapsis, Nikolaos K. Lyras and Athanasios D. Panagopoulos
Electronics 2024, 13(3), 647; https://doi.org/10.3390/electronics13030647 - 4 Feb 2024
Cited by 5 | Viewed by 2107
Abstract
Geostationary (GEO) satellites are employed in optical frequencies for a variety of satellite services providing wide coverage and connectivity. Multi-beam GEO high-throughput satellites offer Gbps broadband rates and, jointly with low-Earth-orbit mega-constellations, are anticipated to enable a large-scale free-space optical (FSO) network. In this paper, a power allocation methodology based on deep reinforcement learning (DRL) is proposed for optical satellite systems disregarding any channel statistics knowledge requirements. An all-FSO, multi-aperture GEO-to-ground system is considered and an ergodic capacity optimization problem for the downlink is formulated with transmitted power constraints. A power allocation algorithm was developed, aided by a deep neural network (DNN) which is fed channel state information (CSI) observations and trained in a parameterized on-policy manner through a stochastic policy gradient approach. The proposed method does not require the channels’ transition models or fading distributions. To validate and test the proposed allocation scheme, experimental measurements from the European Space Agency’s ARTEMIS optical satellite campaign were utilized. It is demonstrated that the predicted average capacity greatly exceeds other baseline heuristic algorithms while strongly converging to the supervised, unparameterized approach. The predicted average channel powers differ only by 0.1 W from the reference ones, while the baselines differ significantly more, about 0.1–0.5 W. Full article
(This article belongs to the Special Issue New Advances of Microwave and Optical Communication)

19 pages, 599 KB  
Article
Computation Offloading and Resource Allocation Based on P-DQN in LEO Satellite Edge Networks
by Xu Yang, Hai Fang, Yuan Gao, Xingjie Wang, Kan Wang and Zheng Liu
Sensors 2023, 23(24), 9885; https://doi.org/10.3390/s23249885 - 17 Dec 2023
Cited by 9 | Viewed by 3239
Abstract
Traditional low earth orbit (LEO) satellite networks are typically independent of terrestrial networks, which develop relatively slowly due to the on-board capacity limitation. By integrating emerging mobile edge computing (MEC) with LEO satellite networks to form the business-oriented "end-edge-cloud" multi-level computing architecture, some computing-sensitive tasks can be offloaded by ground terminals to satellites, thereby satisfying more tasks in the network. Making computation offloading and resource allocation decisions in LEO satellite edge networks, nevertheless, poses challenges in tracking network dynamics and handling sophisticated actions. For the discrete-continuous hybrid action space and time-varying networks, this work aims to use the parameterized deep Q-network (P-DQN) for the joint computation offloading and resource allocation. First, the characteristics of time-varying channels are modeled, and then both communication and computation models under three different offloading decisions are constructed. Second, the constraints on task offloading decisions, on remaining available computing resources, and on the power control of LEO satellites as well as the cloud server are formulated, followed by the maximization problem of satisfied task number over the long run. Third, using the parameterized action Markov decision process (PAMDP) and P-DQN, the joint computation offloading, resource allocation, and power control are made in real time, to accommodate dynamics in LEO satellite edge networks and handle the discrete-continuous hybrid action space. Simulation results show that the proposed P-DQN method could approach the optimal control, and outperforms other reinforcement learning (RL) methods for merely either discrete or continuous action space, in terms of the long-term rate of satisfied tasks. Full article
(This article belongs to the Special Issue Integration of Satellite-Aerial-Terrestrial Networks)

20 pages, 631 KB  
Article
Throughput Optimization for Blockchain System with Dynamic Sharding
by Chuyi Liu, Jianxiong Wan, Leixiao Li and Bingbing Yao
Electronics 2023, 12(24), 4915; https://doi.org/10.3390/electronics12244915 - 6 Dec 2023
Cited by 11 | Viewed by 4794
Abstract
Sharding technology, which divides a network into multiple disjoint groups so that transactions can be processed in parallel, is applied to blockchain systems as a promising solution to improve Transactions Per Second (TPS). This paper considers the Optimal Blockchain Sharding (OBCS) problem as a Markov Decision Process (MDP) where the decision variables are the number of shards, the block size, and the block interval. Previous works solved the OBCS problem via Deep Reinforcement Learning (DRL)-based methods, where the action space must be discretized to increase processability. However, discretization degrades the quality of the solution, since the optimal solution usually lies between discrete values. In this paper, we treat the block size and block interval as continuous decision variables and provide dynamic sharding strategies based on them. The Branching Dueling Q-Network Blockchain Sharding (BDQBS) algorithm is designed for discrete action spaces. Compared with traditional DRL algorithms, BDQBS overcomes the drawbacks of high action-space dimensionality and difficulty in training neural networks, improving the performance of the blockchain system by a factor of 1.25. We also propose a sharding control algorithm based on the Parameterized Deep Q-Networks (P-DQN) algorithm, i.e., the Parameterized Deep Q-Networks Blockchain Sharding (P-DQNBS) algorithm, to efficiently handle the discrete–continuous hybrid action space without scalability issues, effectively improving the TPS by up to 28%. Full article
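The throughput objective behind the three decision variables (shard count, block size, block interval) follows from parallel block processing. A deliberately idealized sketch, assuming uniform transaction size and ignoring cross-shard transactions and consensus overhead, which are exactly what make the real control problem hard:

```python
def sharded_tps(n_shards, block_size_kb, tx_size_kb, block_interval_s):
    """Idealized throughput of a sharded chain: each shard commits one
    block per interval, and shards run in parallel, so TPS scales
    linearly with the shard count. Cross-shard coordination costs,
    which cap this scaling in practice, are deliberately ignored."""
    tx_per_block = block_size_kb / tx_size_kb
    return n_shards * tx_per_block / block_interval_s
```

The formula makes the trade-off visible: raising the block size or shard count or shrinking the interval all raise TPS in this idealized model, but each also raises propagation and coordination costs, which is why the paper treats block size and interval as continuous variables to be optimized rather than fixed.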
(This article belongs to the Special Issue New Advances in Distributed Computing and Its Applications)

12 pages, 1357 KB  
Article
Energy Efficient Power Allocation in Massive MIMO Based on Parameterized Deep DQN
by Shruti Sharma and Wonsik Yoon
Electronics 2023, 12(21), 4517; https://doi.org/10.3390/electronics12214517 - 2 Nov 2023
Cited by 8 | Viewed by 2577
Abstract
Machine learning offers advanced tools for efficient management of radio resources in modern wireless networks. In this study, we leverage a multi-agent deep reinforcement learning (DRL) approach, specifically the Parameterized Deep Q-Network (DQN), to address the challenging problem of power allocation and user association in massive multiple-input multiple-output (M-MIMO) communication networks. Our approach tackles a multi-objective optimization problem aiming to maximize network utility while meeting stringent quality of service requirements in M-MIMO networks. To address the non-convex and nonlinear nature of this problem, we introduce a novel multi-agent DQN framework. This framework defines a large action space, state space, and reward functions, enabling us to learn a near-optimal policy. Simulation results demonstrate the superiority of our Parameterized Deep DQN (PD-DQN) approach when compared to traditional DQN and RL methods. Specifically, we show that our approach outperforms traditional DQN methods in terms of convergence speed and final performance. Additionally, our approach shows 72.2% and 108.5% improvement over DQN methods and the RL method, respectively, in handling large-scale multi-agent problems in M-MIMO networks. Full article
(This article belongs to the Special Issue Deep Reinforcement Learning and Its Latest Applications)

18 pages, 3146 KB  
Article
Hierarchical Episodic Control
by Rong Zhou, Zhisheng Zhang and Yuan Wang
Appl. Sci. 2023, 13(20), 11544; https://doi.org/10.3390/app132011544 - 21 Oct 2023
Viewed by 2537
Abstract
Deep reinforcement learning is a major research focus in artificial intelligence and has been successfully applied in many research areas; however, low training efficiency and a high demand for samples limit its application. Inspired by the rapid learning mechanisms of the hippocampus, this paper proposes a hierarchical episodic control model that extends episodic memory to the domain of hierarchical reinforcement learning. The model is theoretically justified and employs a hierarchical implicit memory planning approach for counterfactual trajectory value estimation. Starting from the final step and recursively moving back along the trajectory, a hidden plan is formed within the episodic memory. Experience is aggregated both along trajectories and across trajectories, and the model is updated using a multi-headed backpropagation similar to bootstrapped neural networks. This model extends the parameterized episodic memory framework to the realm of hierarchical reinforcement learning and is theoretically analyzed to demonstrate its convergence and effectiveness. Experiments conducted in four-room games, Mujoco, and UE4-based active tracking highlight that the hierarchical episodic control model effectively enhances training efficiency. It demonstrates notable improvements in both low-dimensional and high-dimensional environments, even in cases of sparse rewards. This model can enhance the training efficiency of reinforcement learning and is suitable for application scenarios that do not rely heavily on exploration, such as unmanned aerial vehicles, robot control, and computer vision applications. Full article
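The backward, recursive value propagation described in the abstract is the core move of episodic control. A minimal tabular sketch, under the simplifying assumptions of exact state keys and a single hierarchy level (the paper's model is hierarchical and parameterized; this only shows the backward max-return update):

```python
GAMMA = 0.99  # illustrative discount factor

def episodic_backup(trajectory, memory):
    """Backward pass over one finished trajectory: starting from the final
    step, accumulate the discounted return, and store in episodic memory
    the best return ever observed for each (state, action) pair."""
    ret = 0.0
    for state, action, reward in reversed(trajectory):
        ret = reward + GAMMA * ret
        key = (state, action)
        # Keep the max over episodes: episodic memory remembers the best
        # outcome seen from this (state, action), enabling rapid reuse.
        memory[key] = max(memory.get(key, float("-inf")), ret)
    return memory

memory = {}
episodic_backup([("s0", "a", 1.0), ("s1", "a", 0.0), ("s2", "a", 10.0)], memory)
```

Because values propagate in one sweep from the end of the episode, a single successful trajectory immediately informs every earlier state along it, which is why this style of memory helps most under sparse rewards.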