Search Results (900)

Search Parameters:
Keywords = reinforcement learning (RL)

20 pages, 589 KiB  
Article
Intelligent Queue Scheduling Method for SPMA-Based UAV Networks
by Kui Yang, Chenyang Xu, Guanhua Qiao, Jinke Zhong and Xiaoning Zhang
Drones 2025, 9(8), 552; https://doi.org/10.3390/drones9080552 - 6 Aug 2025
Abstract
Static Priority-based Multiple Access (SPMA) is an emerging and promising wireless MAC protocol widely used in Unmanned Aerial Vehicle (UAV) networks. UAV networks, also known as drone networks, are systems of interconnected UAVs that communicate and collaborate to perform tasks autonomously or semi-autonomously. These networks leverage wireless communication technologies to share data, coordinate movements, and optimize mission execution. In SPMA, traffic arriving at a UAV network node is divided into multiple priorities according to information timeliness, and the packets of each priority are stored in corresponding queues with different transmission thresholds, thus guaranteeing a high success rate and low latency for the highest-priority traffic. Unfortunately, the multi-priority queue scheduling of SPMA deprives low-priority traffic of packet transmission opportunities, which results in unfairness among different-priority traffic. To address this problem, this paper proposes an Adaptive Credit-Based Shaper with Reinforcement Learning (ACBS-RL) to balance the performance of all-priority traffic. In ACBS-RL, the Credit-Based Shaper (CBS) is introduced to SPMA to provide relatively fair packet transmission opportunities among multiple traffic queues by limiting the transmission rate. Because the wireless environment is dynamic, a Q-learning-based reinforcement learning method is leveraged to adaptively adjust the parameters of CBS (i.e., idleslope and sendslope) to achieve better performance across all priority queues. Extensive simulation results show that, compared with the traditional SPMA protocol, the proposed ACBS-RL can increase UAV network throughput while guaranteeing the Quality of Service (QoS) requirements of all priority traffic. Full article
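The Q-learning loop this abstract describes can be sketched in a few lines. Everything below (the queue-occupancy states, the three idleslope adjustments, and the constants) is an illustrative placeholder, not the authors' design:

```python
import random

# Illustrative sketch only: states are coarse queue-occupancy buckets and
# actions nudge a queue's CBS idleslope (sendslope then follows as
# idleslope minus the port rate). None of these values come from the paper.
STATES = range(5)            # queue-occupancy buckets
ACTIONS = (-1.0, 0.0, 1.0)   # Mbit/s adjustment applied to idleslope

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
q_table = {(s, a): 0.0 for s in STATES for a in ACTIONS}

def choose_action(state, rng=random):
    """Epsilon-greedy choice over the idleslope adjustments."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])

def update(state, action, reward, next_state):
    """Tabular Q-learning update after observing one scheduling step."""
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    q_table[(state, action)] += ALPHA * (
        reward + GAMMA * best_next - q_table[(state, action)]
    )
```

A scheduler would call `choose_action` once per scheduling interval, apply the adjustment, observe a reward such as weighted per-priority throughput, and call `update`.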

21 pages, 3733 KiB  
Article
DNO-RL: A Reinforcement-Learning-Based Approach to Dynamic Noise Optimization for Differential Privacy
by Guixin Wang, Xiangfei Liu, Yukun Zheng, Zeyu Zhang and Zhiming Cai
Electronics 2025, 14(15), 3122; https://doi.org/10.3390/electronics14153122 - 5 Aug 2025
Abstract
With the globalized deployment of cross-border vehicle location services and the trajectory data, which contain user identity information and geographically sensitive features, the variability in privacy regulations in different jurisdictions can further exacerbate the technical and compliance challenges of data privacy protection. Traditional static differential privacy mechanisms struggle to accommodate spatiotemporal heterogeneity in dynamic scenarios because of the use of a fixed privacy budget parameter, leading to wasted privacy budgets or insufficient protection of sensitive regions. This study proposes a reinforcement-learning-based dynamic noise optimization method (DNO-RL) that dynamically adjusts the Laplacian noise scale by real-time sensing of vehicle density, region sensitivity, and the remaining privacy budget via a deep Q-network (DQN), with the aim of providing context-adaptive differential privacy protection for cross-border vehicle location services. Simulation experiments of cross-border scenarios based on the T-Drive dataset showed that DNO-RL reduced the average localization error by 28.3% and saved 17.9% of the privacy budget compared with the local differential privacy under the same privacy budget. This study provides a new paradigm for the dynamic privacy–utility balancing of cross-border vehicular networking services. Full article
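The core mechanism, Laplace noise whose scale is chosen per context from a privacy budget, can be sketched as follows. The `choose_epsilon` heuristic is a hypothetical stand-in for the paper's learned DQN policy:

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) by inverse-CDF transform."""
    u = rng.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def perturb_location(x, y, epsilon, sensitivity=1.0, rng=None):
    """Laplace mechanism: noise scale = sensitivity / epsilon, so a
    smaller epsilon (stronger privacy) injects more noise."""
    rng = rng or random.Random()
    b = sensitivity / epsilon
    return x + laplace_noise(b, rng), y + laplace_noise(b, rng)

def choose_epsilon(density, sensitivity_score, remaining_budget):
    """Hypothetical stand-in for the learned policy: allocate less of the
    remaining budget (i.e., add more noise) in sparse or highly sensitive
    regions. The real policy is a trained DQN, not a hand-coded rule."""
    base = 0.5 * remaining_budget
    return max(0.05, base * density / (1.0 + sensitivity_score))
```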

14 pages, 1714 KiB  
Article
A Kalman Filter-Based Localization Calibration Method Optimized by Reinforcement Learning and Information Matrix Fusion
by Zijia Huang, Qiushi Xu, Menghao Sun and Xuzhen Zhu
Entropy 2025, 27(8), 821; https://doi.org/10.3390/e27080821 (registering DOI) - 1 Aug 2025
Abstract
To address the degradation in localization accuracy caused by insufficient robustness of filter parameters and inefficient multi-trajectory data fusion in dynamic environments, this paper proposes a Kalman filter-based localization calibration method optimized by reinforcement learning and information matrix fusion (RL-IMKF). An actor–critic reinforcement learning network is designed to adaptively adjust the state covariance matrix, enhancing the Kalman filter’s adaptability to environmental changes. Meanwhile, a multi-trajectory information matrix fusion strategy is introduced, which aggregates multiple trajectories in the information domain via weighted inverse covariance matrices to suppress error propagation and improve system consistency. Experiments using both simulated and real-world sensor data demonstrate that the proposed method outperforms traditional extended Kalman filter approaches in terms of localization accuracy and stability, providing a novel solution for cooperative localization calibration of unmanned aerial vehicle (UAV) swarms in dynamic environments. Full article
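The information-matrix fusion step has a compact form: estimates are combined in the information (inverse-covariance) domain so that confident trajectories dominate. A one-dimensional sketch with hypothetical inputs:

```python
def fuse_information(estimates):
    """Information-form fusion of independent scalar estimates.
    Each estimate is (value, variance); weighting by the inverse variance
    (the information matrix in 1-D) lets confident trajectories dominate
    the fused result, which suppresses error propagation."""
    total_info = sum(1.0 / var for _, var in estimates)
    weighted = sum(val / var for val, var in estimates)
    return weighted / total_info, 1.0 / total_info
```

In the multi-trajectory case the same formula applies with matrices: the fused information matrix is the sum of the per-trajectory inverse covariances.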
(This article belongs to the Special Issue Complexity, Entropy and the Physics of Information II)

18 pages, 1910 KiB  
Article
Hierarchical Learning for Closed-Loop Robotic Manipulation in Cluttered Scenes via Depth Vision, Reinforcement Learning, and Behaviour Cloning
by Hoi Fai Yu and Abdulrahman Altahhan
Electronics 2025, 14(15), 3074; https://doi.org/10.3390/electronics14153074 - 31 Jul 2025
Abstract
Despite rapid advances in robot learning, the coordination of closed-loop manipulation in cluttered environments remains a challenging and relatively underexplored problem. We present a novel two-level hierarchical architecture for a depth vision-equipped robotic arm that integrates pushing, grasping, and high-level decision making. Central to our approach is a prioritised action–selection mechanism that facilitates efficient early-stage learning via behaviour cloning (BC), while enabling scalable exploration through reinforcement learning (RL). A high-level decision neural network (DNN) selects between grasping and pushing actions, and two low-level action neural networks (ANNs) execute the selected primitive. The DNN is trained with RL, while the ANNs follow a hybrid learning scheme combining BC and RL. Notably, we introduce an automated demonstration generator based on oriented bounding boxes, eliminating the need for manual data collection and enabling precise, reproducible BC training signals. We evaluate our method on a challenging manipulation task involving five closely packed cubic objects. Our system achieves a completion rate (CR) of 100%, an average grasping success (AGS) of 93.1% per completion, and only 7.8 average decisions taken for completion (DTC). Comparative analysis against three baselines—a grasping-only policy, a fixed grasp-then-push sequence, and a cloned demonstration policy—highlights the necessity of dynamic decision making and the efficiency of our hierarchical design. In particular, the baselines yield lower AGS (86.6%) and higher DTC (10.6 and 11.4) scores, underscoring the advantages of content-aware, closed-loop control. These results demonstrate that our architecture supports robust, adaptive manipulation and scalable learning, offering a promising direction for autonomous skill coordination in complex environments. Full article
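The prioritised action selection the abstract describes, imitate a demonstration when one covers the current state and otherwise explore, might be sketched like this (the state keys, primitives, and epsilon are illustrative, not the paper's):

```python
import random

def select_action(state, demo_policy, q_values, epsilon=0.2, rng=random):
    """Prioritised action selection sketch: early in training, follow the
    automated demonstration when one covers this state (the behaviour-
    cloning signal); otherwise fall back to epsilon-greedy exploration
    over the learned action values."""
    if state in demo_policy:
        return demo_policy[state]
    if rng.random() < epsilon:
        return rng.choice(list(q_values[state]))
    return max(q_values[state], key=q_values[state].get)
```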

21 pages, 3473 KiB  
Article
Reinforcement Learning for Bipedal Jumping: Integrating Actuator Limits and Coupled Tendon Dynamics
by Yudi Zhu, Xisheng Jiang, Xiaohang Ma, Jun Tang, Qingdu Li and Jianwei Zhang
Mathematics 2025, 13(15), 2466; https://doi.org/10.3390/math13152466 - 31 Jul 2025
Abstract
In high-dynamic bipedal locomotion control, robotic systems are often constrained by motor torque limitations, particularly during explosive tasks such as jumping. One of the key challenges in reinforcement learning lies in bridging the sim-to-real gap, which mainly stems from both inaccuracies in simulation models and the limitations of motor torque output, ultimately leading to the failure of deploying learned policies in real-world systems. Traditional RL methods usually focus on peak torque limits but ignore that motor torque changes with speed. By only limiting peak torque, they prevent the torque from adjusting dynamically based on velocity, which can reduce the system’s efficiency and performance in high-speed tasks. To address these issues, this paper proposes a reinforcement learning jump-control framework tailored for tendon-driven bipedal robots, which integrates dynamic torque boundary constraints and torque error-compensation modeling. First, we developed a torque transmission coefficient model based on the tendon-driven mechanism, taking into account tendon elasticity and motor-control errors, which significantly improves the modeling accuracy. Building on this, we derived a dynamic joint torque limit that adapts to joint velocity, and designed a torque-aware reward function within the reinforcement learning environment, aimed at encouraging the policy to implicitly learn and comply with physical constraints during training, effectively bridging the gap between simulation and real-world performance. Hardware experimental results demonstrate that the proposed method effectively satisfies actuator safety limits while achieving more efficient and stable jumping behavior. This work provides a general and scalable modeling and control framework for learning high-dynamic bipedal motion under complex physical constraints. Full article
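A velocity-dependent torque boundary of the kind described, full peak torque up to a base speed and a derating above it, can be sketched as below; the numbers are placeholder motor parameters, not the robot's:

```python
def torque_limit(omega, tau_peak=60.0, omega_base=10.0, omega_max=30.0):
    """Hypothetical torque-speed envelope: peak torque up to the base
    speed, then a linear derating down to zero at the maximum speed.
    All parameters are illustrative placeholders."""
    w = abs(omega)
    if w <= omega_base:
        return tau_peak
    if w >= omega_max:
        return 0.0
    return tau_peak * (omega_max - w) / (omega_max - omega_base)

def clamp_torque(tau_cmd, omega):
    """Clamp a commanded joint torque to the velocity-dependent limit,
    the dynamic boundary a torque-aware reward would teach the policy
    to respect."""
    lim = torque_limit(omega)
    return max(-lim, min(lim, tau_cmd))
```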

24 pages, 2070 KiB  
Article
Reinforcement Learning-Based Finite-Time Sliding-Mode Control in a Human-in-the-Loop Framework for Pediatric Gait Exoskeleton
by Matthew Wong Sang and Jyotindra Narayan
Machines 2025, 13(8), 668; https://doi.org/10.3390/machines13080668 - 30 Jul 2025
Abstract
Rehabilitation devices such as actuated lower-limb exoskeletons can provide essential mobility assistance for pediatric patients with gait impairments. Enhancing their control systems under conditions of user variability and dynamic disturbances remains a significant challenge, particularly in active-assist modes. This study presents a human-in-the-loop control architecture for a pediatric lower-limb exoskeleton, combining outer-loop admittance control with robust inner-loop trajectory tracking via a non-singular terminal sliding-mode (NSTSM) controller. Designed for active-assist gait rehabilitation in children aged 8–12 years, the exoskeleton dynamically responds to user interaction forces while ensuring finite-time convergence under system uncertainties. To enhance adaptability, we augment the inner-loop control with a twin delayed deep deterministic policy gradient (TD3) reinforcement learning framework. The actor–critic RL agent tunes NSTSM gains in real-time, enabling personalized model-free adaptation to subject-specific gait dynamics and external disturbances. The numerical simulations show improved trajectory tracking, with RMSE reductions of 27.82% (hip) and 5.43% (knee), and IAE improvements of 40.85% and 10.20%, respectively, over the baseline NSTSM controller. The proposed approach also reduced the peak interaction torques across all the joints, suggesting more compliant and comfortable assistance for users. While minor degradation is observed at the ankle joint, the TD3-NSTSM controller demonstrates improved responsiveness and stability, particularly in high-load joints. This research contributes to advancing pediatric gait rehabilitation using RL-enhanced control, offering improved mobility support and adaptive rehabilitation outcomes. Full article
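For readers unfamiliar with the inner-loop controller, a non-singular terminal sliding surface and a toy reaching law look roughly like this; the gains here are fixed placeholders, whereas the paper's TD3 agent tunes them online:

```python
import math

def ntsm_surface(e, e_dot, beta=1.0, p=5, q=3):
    """Non-singular terminal sliding surface s = e + beta * e_dot^(p/q),
    with p, q odd and 1 < p/q < 2 (signed fractional power). Gains such
    as beta are what an RL tuner would adapt per joint."""
    return e + beta * math.copysign(abs(e_dot) ** (p / q), e_dot)

def smc_torque(e, e_dot, beta=1.0, p=5, q=3, k=2.0):
    """Toy reaching law: push s toward zero with a saturated sign term
    (tanh) to limit chattering. Illustrative only, not the paper's
    control law."""
    s = ntsm_surface(e, e_dot, beta, p, q)
    return -k * math.tanh(s)
```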

23 pages, 5330 KiB  
Article
Explainable Reinforcement Learning for the Initial Design Optimization of Compressors Inspired by the Black-Winged Kite
by Mingming Zhang, Zhuang Miao, Xi Nan, Ning Ma and Ruoyang Liu
Biomimetics 2025, 10(8), 497; https://doi.org/10.3390/biomimetics10080497 - 29 Jul 2025
Abstract
Although artificial intelligence methods such as reinforcement learning (RL) show potential in optimizing the design of compressors, there are still two major challenges remaining: limited design variables and insufficient model explainability. For the initial design of compressors, this paper proposes a technical approach that incorporates deep reinforcement learning and decision tree distillation to enhance both the optimization capability and explainability. First, a pre-selection platform for the initial design scheme of the compressors is constructed based on the Deep Deterministic Policy Gradient (DDPG) algorithm. The optimization space is significantly enlarged by expanding the co-design of 25 key variables (e.g., the inlet airflow angle, the reaction, the load coefficient, etc.). Then, the initial design of six-stage axial compressors is successfully completed, with the axial efficiency increasing to 84.65% at the design speed and the surge margin extending to 10.75%. The design scheme is closer to the actual needs of engineering. Secondly, Shapley Additive Explanations (SHAP) analysis is utilized to reveal the influence of the mechanism of the key design parameters on the performance of the compressors in order to enhance the model explainability. Finally, the decision tree inspired by the black-winged kite (BKA) algorithm takes the interpretable design rules and transforms the data-driven intelligent optimization into explicit engineering experience. Through experimental validation, this method significantly improves the transparency of the design process while maintaining the high performance of the DDPG algorithm. The extracted design rules not only have clear physical meanings but also can effectively guide the initial design of the compressors, providing a new idea with both optimization capability and explainability for its intelligent design. Full article
(This article belongs to the Special Issue Advances in Biological and Bio-Inspired Algorithms)

27 pages, 405 KiB  
Article
Comparative Analysis of Centralized and Distributed Multi-UAV Task Allocation Algorithms: A Unified Evaluation Framework
by Yunze Song, Zhexuan Ma, Nuo Chen, Shenghao Zhou and Sutthiphong Srigrarom
Drones 2025, 9(8), 530; https://doi.org/10.3390/drones9080530 - 28 Jul 2025
Abstract
Unmanned aerial vehicles (UAVs), commonly known as drones, offer unprecedented flexibility for complex missions such as area surveillance, search and rescue, and cooperative inspection. This paper presents a unified evaluation framework for the comparison of centralized and distributed task allocation algorithms specifically tailored to multi-UAV operations. We first contextualize the classical assignment problem (AP) under UAV mission constraints, including the flight time, propulsion energy capacity, and communication range, and evaluate optimal one-to-one solvers including the Hungarian algorithm, the Bertsekas ϵ-auction algorithm, and a minimum cost maximum flow formulation. To reflect the dynamic, uncertain environments that UAV fleets encounter, we extend our analysis to distributed multi-UAV task allocation (MUTA) methods. In particular, we examine the consensus-based bundle algorithm (CBBA) and a distributed auction 2-opt refinement strategy, both of which iteratively negotiate task bundles across UAVs to accommodate real-time task arrivals and intermittent connectivity. Finally, we outline how reinforcement learning (RL) can be incorporated to learn adaptive policies that balance energy efficiency and mission success under varying wind conditions and obstacle fields. Through simulations incorporating UAV-specific cost models and communication topologies, we assess each algorithm’s mission completion time, total energy expenditure, communication overhead, and resilience to UAV failures. Our results highlight the trade-off between strict optimality, which is suitable for small fleets in static scenarios, and scalable, robust coordination, necessary for large, dynamic multi-UAV deployments. Full article
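The classical assignment problem at the heart of the centralized solvers can be illustrated with a brute-force version (this is what the Hungarian algorithm computes in O(n³); exhaustive search is only feasible for small fleets):

```python
from itertools import permutations

def optimal_assignment(cost):
    """Exhaustive solver for the one-to-one UAV-to-task assignment
    problem. cost[u][t] is UAV u's cost (e.g., flight time or energy)
    for task t; the Hungarian algorithm finds the same minimum in
    polynomial time, so use this only to illustrate tiny instances."""
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[u][perm[u]] for u in range(n))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return list(best_perm), best_cost
```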

37 pages, 1037 KiB  
Review
Machine Learning for Flood Resiliency—Current Status and Unexplored Directions
by Venkatesh Uddameri and E. Annette Hernandez
Environments 2025, 12(8), 259; https://doi.org/10.3390/environments12080259 - 28 Jul 2025
Abstract
A systems-oriented review of machine learning (ML) over the entire flood management spectrum, encompassing fluvial flood control, pluvial flood management, and resiliency-risk characterization was undertaken. Deep learners like long short-term memory (LSTM) networks perform well in predicting reservoir inflows and outflows. Convolution neural networks (CNNs) and other object identification algorithms are being explored in assessing levee and flood wall failures. The use of ML methods in pump station operations is limited due to lack of public-domain datasets. Reinforcement learning (RL) has shown promise in controlling low-impact development (LID) systems for pluvial flood management. Resiliency is defined in terms of the vulnerability of a community to floods. Multi-criteria decision making (MCDM) and unsupervised ML methods are used to capture vulnerability. Supervised learning is used to model flooding hazards. Conventional approaches perform better than deep learners and ensemble methods for modeling flood hazards due to paucity of data and large inter-model predictive variability. Advances in satellite-based, drone-facilitated data collection and Internet of Things (IoT)-based low-cost sensors offer new research avenues to explore. Transfer learning at ungauged basins holds promise but is largely unexplored. Explainable artificial intelligence (XAI) is seeing increased use and helps the transition of ML models from black-box forecasters to knowledge-enhancing predictors. Full article
(This article belongs to the Special Issue Hydrological Modeling and Sustainable Water Resources Management)

25 pages, 3791 KiB  
Article
Optimizing Multitenancy: Adaptive Resource Allocation in Serverless Cloud Environments Using Reinforcement Learning
by Mohammed Naif Alatawi
Electronics 2025, 14(15), 3004; https://doi.org/10.3390/electronics14153004 - 28 Jul 2025
Abstract
The growing adoption of serverless computing has highlighted critical challenges in resource allocation, policy fairness, and energy efficiency within multitenancy cloud environments. This research proposes a reinforcement learning (RL)-based adaptive resource allocation framework to address these issues. The framework models resource allocation as a Markov Decision Process (MDP) with dynamic states that include latency, resource utilization, and energy consumption. A reward function is designed to optimize the throughput, latency, and energy efficiency while ensuring fairness among tenants. The proposed model demonstrates significant improvements over heuristic approaches, achieving a 50% reduction in latency (from 250 ms to 120 ms), a 38.9% increase in throughput (from 180 tasks/s to 250 tasks/s), and a 35% improvement in energy efficiency. Additionally, the model reduces operational costs by 40%, achieves SLA compliance rates above 98%, and enhances fairness by lowering the Gini coefficient from 0.25 to 0.10. Under burst loads, the system maintains a service level objective success rate of 94% with a time to scale of 6 s. These results underscore the potential of RL-based solutions for dynamic workload management, paving the way for more scalable, cost-effective, and sustainable serverless multitenancy systems. Full article
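Two pieces of the framework are easy to make concrete: the Gini coefficient used to measure fairness across tenants, and a reward that trades off throughput, latency, energy, and fairness. The weights below are hypothetical, not the paper's:

```python
def gini(allocations):
    """Gini coefficient of per-tenant allocations: 0.0 is perfectly
    fair; values near 1.0 mean one tenant gets almost everything."""
    xs = sorted(allocations)
    n = len(xs)
    total = sum(xs)
    if total == 0:
        return 0.0
    cum = sum((i + 1) * x for i, x in enumerate(xs))
    return (2.0 * cum) / (n * total) - (n + 1.0) / n

def reward(throughput, latency, energy, allocations,
           w=(1.0, 1.0, 0.5, 2.0)):
    """Hypothetical shape of an MDP reward for this setting: favour
    throughput, penalize latency, energy, and unfairness (Gini). The
    weights are illustrative placeholders."""
    return (w[0] * throughput - w[1] * latency
            - w[2] * energy - w[3] * gini(allocations))
```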
(This article belongs to the Special Issue New Advances in Cloud Computing and Its Latest Applications)

18 pages, 889 KiB  
Article
Dynamic Leader Election and Model-Free Reinforcement Learning for Coordinated Voltage and Reactive Power Containment Control in Offshore Island AC Microgrids
by Xiaolu Ye, Zhanshan Wang, Qiufu Wang and Shuran Wang
J. Mar. Sci. Eng. 2025, 13(8), 1432; https://doi.org/10.3390/jmse13081432 - 27 Jul 2025
Abstract
Island microgrids are essential for the exploitation and utilization of offshore renewable energy resources. However, voltage regulation and accurate reactive power sharing remain significant technical challenges that need to be addressed. To tackle these issues, this paper proposes an algorithm that integrates a dynamic leader election (DLE) mechanism and model-free reinforcement learning (RL). The algorithm aims to address the issue of fixed leaders restricting reactive power flow between buses during heavy load variations in island microgrids, while also overcoming the challenge of obtaining model parameters such as resistance and inductance in practical microgrids. First, we establish a voltage containment control and reactive power error model for island alternating current (AC) microgrids and construct a corresponding value function based on this error model. Second, a dynamic leader election algorithm is designed to address the issue of fixed leaders restricting reactive power flow between buses due to preset voltage limits under unknown or heavy load conditions. The algorithm adaptively selects leaders based on bus load, allowing the voltage limits to adjust accordingly and regulating reactive power flow. Then, to address the difficulty of accurately acquiring parameters such as resistance and inductance in microgrid lines, a model-free reinforcement learning method is introduced. This method relies on real-time measurements of voltage and reactive power data, without requiring specific model parameters. Ultimately, simulation experiments on offshore island microgrids are conducted to validate the effectiveness of the proposed algorithm. Full article
(This article belongs to the Section Ocean Engineering)

16 pages, 1823 KiB  
Article
Collaborative Target Tracking Algorithm for Multi-Agent Based on MAPPO and BCTD
by Yuebin Zhou, Yunling Yue, Bolun Yan, Linkun Li, Jinsheng Xiao and Yuan Yao
Drones 2025, 9(8), 521; https://doi.org/10.3390/drones9080521 - 24 Jul 2025
Abstract
Target tracking is a representative task in multi-agent reinforcement learning (MARL), where agents must collaborate effectively in environments with dense obstacles, evasive targets, and high-dimensional observations—conditions that often lead to local optima and training inefficiencies. To address these challenges, this paper proposes a collaborative tracking algorithm for UAVs that integrates behavior cloning with temporal difference (BCTD) and multi-agent proximal policy optimization (MAPPO). Expert trajectories are generated using the artificial potential field (APF), followed by policy pre-training via behavior cloning and TD-based value optimization. MAPPO is then employed for dynamic fine-tuning, enhancing robustness and coordination. Experiments in a simulated environment show that the proposed MAPPO+BCTD framework outperforms MAPPO, QMIX, and MADDPG in success rate, convergence speed, and tracking efficiency. The proposed method effectively alleviates the local optimization problem of APF and the training inefficiency problem of RL, offering a scalable and reliable solution for dynamic multi-agent coordination. Full article
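The artificial potential field used to generate the expert trajectories combines an attractive pull toward the target with repulsive pushes from obstacles inside an influence radius. A minimal 2-D sketch with illustrative gains:

```python
import math

def apf_velocity(agent, target, obstacles,
                 k_att=1.0, k_rep=1.0, influence=2.0):
    """One artificial-potential-field step: the attractive term pulls
    toward the target, and each obstacle within `influence` adds the
    standard repulsive gradient k_rep*(1/d - 1/d0)/d^2 along the unit
    vector away from it. Gains are placeholders."""
    ax = k_att * (target[0] - agent[0])
    ay = k_att * (target[1] - agent[1])
    for ox, oy in obstacles:
        dx, dy = agent[0] - ox, agent[1] - oy
        d = math.hypot(dx, dy)
        if 0 < d < influence:
            gain = k_rep * (1.0 / d - 1.0 / influence) / d ** 2
            ax += gain * dx / d
            ay += gain * dy / d
    return ax, ay
```

Rolling this step forward from varied start states yields the expert trajectories that behavior cloning then imitates.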
(This article belongs to the Special Issue Cooperative Perception for Modern Transportation)

35 pages, 1231 KiB  
Review
Toward Intelligent Underwater Acoustic Systems: Systematic Insights into Channel Estimation and Modulation Methods
by Imran A. Tasadduq and Muhammad Rashid
Electronics 2025, 14(15), 2953; https://doi.org/10.3390/electronics14152953 - 24 Jul 2025
Abstract
Underwater acoustic (UWA) communication supports many critical applications but still faces several physical-layer signal processing challenges. In response, recent advances in machine learning (ML) and deep learning (DL) offer promising solutions to improve signal detection, modulation adaptability, and classification accuracy. These developments highlight the need for a systematic evaluation to compare various ML/DL models and assess their performance across diverse underwater conditions. However, most existing reviews on ML/DL-based UWA communication focus on isolated approaches rather than integrated system-level perspectives, which limits cross-domain insights and reduces their relevance to practical underwater deployments. Consequently, this systematic literature review (SLR) synthesizes 43 studies (2020–2025) on ML and DL approaches for UWA communication, covering channel estimation, adaptive modulation, and modulation recognition across both single- and multi-carrier systems. The findings reveal that models such as convolutional neural networks (CNNs), long short-term memory networks (LSTMs), and generative adversarial networks (GANs) enhance channel estimation performance, achieving error reductions and bit error rate (BER) gains ranging from 10⁻³ to 10⁻⁶. Adaptive modulation techniques incorporating support vector machines (SVMs), CNNs, and reinforcement learning (RL) attain classification accuracies exceeding 98% and throughput improvements of up to 25%. For modulation recognition, architectures like sequence CNNs, residual networks, and hybrid convolutional–recurrent models achieve up to 99.38% accuracy with latency below 10 ms.
Finally, the SLR identifies key challenges in UWA communication, including high complexity, limited data, fragmented performance metrics, deployment realities, energy constraints and poor scalability. It also outlines future directions like lightweight models, physics-informed learning, advanced RL strategies, intelligent resource allocation, and robust feature fusion to build reliable and intelligent underwater systems. Full article
(This article belongs to the Section Artificial Intelligence)

21 pages, 354 KiB  
Article
Adaptive Broadcast Scheme with Fuzzy Logic and Reinforcement Learning Dynamic Membership Functions in Mobile Ad Hoc Networks
by Akobir Ismatov, BeomKyu Suh, Jian Kim, YongBeom Park and Ki-Il Kim
Mathematics 2025, 13(15), 2367; https://doi.org/10.3390/math13152367 - 23 Jul 2025
Abstract
Broadcasting in Mobile Ad Hoc Networks (MANETs) is significantly challenged by dynamic network topologies. Traditional fuzzy logic-based schemes often rely on static fuzzy tables and fixed membership functions, which limits their ability to adapt to evolving network conditions. To address these limitations, in this paper we conduct a comparative study of two innovative broadcasting schemes that enhance adaptability through dynamic fuzzy logic membership functions. The first approach (Model A) dynamically adjusts membership functions based on changing network parameters and fine-tunes the broadcast (BC) versus do-not-broadcast (DNB) ratio. Model B, on the other hand, introduces a multi-profile switching mechanism that selects among distinct fuzzy parameter sets optimized for various macro-level scenarios, such as energy constraints or node density, without altering the broadcasting ratio. Reinforcement learning (RL) is employed in both models: in Model A for BC/DNB ratio optimization, and in Model B for action decisions within selected profiles. Unlike prior fuzzy logic or reinforcement learning approaches that rely on fixed profiles or static parameter sets, our work introduces adaptability at both the membership function and profile selection levels, significantly improving broadcasting efficiency and flexibility across diverse MANET conditions. Comprehensive simulations demonstrate that both proposed schemes significantly reduce redundant broadcasts and collisions, leading to lower network overhead and improved message delivery reliability compared to traditional static methods. Specifically, our models achieve consistent packet delivery ratios (PDRs), reduce end-to-end delay by approximately 23–27%, and lower redundancy and overhead by 40–60% and 40–50%, respectively, in high-density and high-mobility scenarios. 
Furthermore, this comparative analysis highlights the strengths and trade-offs between reinforcement learning-driven broadcasting ratio optimization (Model A) and parameter-based dynamic membership function adaptation (Model B), providing valuable insights for optimizing broadcasting strategies. Full article
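Model A's core mechanism — shifting membership-function breakpoints as network conditions drift — can be sketched in a few lines. This is a minimal illustration under invented assumptions (normalized density/energy inputs, a single made-up broadcast rule, and a simple exponential drift of the breakpoints), not the paper's actual rule base:

```python
def tri(x, a, b, c):
    """Triangular membership function: rises on [a, b], falls on [b, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

class DynamicFuzzyBroadcaster:
    """Broadcast decision from normalized node density and residual energy,
    with membership breakpoints that drift toward recently observed network
    conditions (the dynamic-membership idea behind Model A)."""

    def __init__(self):
        # illustrative breakpoints for "low density" and "high energy"
        self.density_mid = 0.5
        self.energy_mid = 0.5

    def adapt(self, observed_density, observed_energy, rate=0.1):
        # drift membership centers toward the running network conditions
        self.density_mid += rate * (observed_density - self.density_mid)
        self.energy_mid += rate * (observed_energy - self.energy_mid)

    def decide(self, density, energy):
        low_density = tri(density, -1.0, 0.0, self.density_mid * 2)
        high_energy = tri(energy, self.energy_mid * 2 - 1.0, 1.0, 2.0)
        # made-up rule: broadcast when density is low OR energy is high
        score = max(low_density, high_energy)
        return score >= 0.5
```

In the paper, an RL agent drives the adaptation (the BC/DNB ratio in Model A, profile selection in Model B) rather than the fixed drift rate used here.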

27 pages, 5145 KiB  
Article
An Improved Deep Q-Learning Approach for Navigation of an Autonomous UAV Agent in 3D Obstacle-Cluttered Environment
by Ghulam Farid, Muhammad Bilal, Lanyong Zhang, Ayman Alharbi, Ishaq Ahmed and Muhammad Azhar
Drones 2025, 9(8), 518; https://doi.org/10.3390/drones9080518 - 23 Jul 2025
Abstract
The performance of UAVs while executing various mission profiles greatly depends on the choice of planning algorithm. Reinforcement learning (RL) algorithms can be used effectively for robot path planning. However, because ties between actions are broken by random selection, the traditional Q-learning algorithm and its variants suffer from slow convergence and suboptimal path planning in high-dimensional navigational environments. To solve these problems, we propose an improved deep Q-network (DQN) incorporating an efficient tie-breaking mechanism, prioritized experience replay (PER), and L2-regularization. The adopted tie-breaking mechanism improves action selection and ultimately helps generate an optimal trajectory for the UAV in a 3D cluttered environment. To improve the convergence speed of traditional Q-learning, prioritized experience replay is used, which learns from experiences with high temporal-difference (TD) error rather than sampling stored transitions uniformly during training. This also prioritizes high-reward experiences (e.g., reaching a goal), helping the agent rediscover these valuable states and improve learning. Moreover, L2-regularization is adopted to encourage smaller weights, yielding more stable and smoother Q-values that reduce erratic action selections and promote smoother UAV flight paths. Finally, the performance of the proposed method is presented and thoroughly compared against that of the traditional DQN, demonstrating its superior effectiveness. Full article
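Two of the ingredients above — deterministic tie-breaking and proportional PER — can be sketched briefly. This is a generic illustration under assumed details (the priority exponent, the prefer-previous-action tie rule), not the authors' exact implementation:

```python
import random

class PrioritizedReplay:
    """Toy proportional prioritized experience replay: transitions with
    larger |TD error| are sampled more often; alpha controls how strongly
    priority shapes the distribution (alpha = 0 reduces to uniform)."""

    def __init__(self, alpha=0.6, eps=1e-3):
        self.alpha, self.eps = alpha, eps
        self.buffer, self.priorities = [], []

    def add(self, transition, td_error):
        self.buffer.append(transition)
        # eps keeps zero-error transitions sampleable
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, k, rng=random):
        total = sum(self.priorities)
        weights = [p / total for p in self.priorities]
        idx = rng.choices(range(len(self.buffer)), weights=weights, k=k)
        return [self.buffer[i] for i in idx]

def argmax_tiebreak(q_values, prev_action=None):
    """Greedy action with deterministic tie-breaking: among equal-valued
    actions, prefer the previous action (for smoother trajectories), else
    the lowest index, instead of choosing at random."""
    best = max(q_values)
    ties = [a for a, q in enumerate(q_values) if q == best]
    return prev_action if prev_action in ties else ties[0]
```

A production DQN would store priorities in a sum-tree for O(log n) sampling and add importance-sampling weights to correct the induced bias.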
