Search Results (3,469)

Search Parameters:
Keywords = deep reinforcement learning

25 pages, 3662 KB  
Article
Evaluating the Perception, Understanding, and Forgetting of Progressive Neural Networks: A Quantitative and Qualitative Analysis
by Lucía Güitta-López, Jaime Boal and Álvaro J. López-López
AI 2026, 7(4), 120; https://doi.org/10.3390/ai7040120 (registering DOI) - 31 Mar 2026
Abstract
The use of virtual environments to collect the experience required by deep reinforcement learning models is accelerating the deployment of these algorithms in industrial environments. However, once the experience-gathering problem is solved, it is necessary to address how to efficiently transfer the knowledge from the virtual scenario to reality. This paper focuses on examining Progressive Neural Networks (PNNs) as a promising transfer learning technique. The analyses carried out range from studying the capabilities and limits of the layers responsible for learning the state representation from a pixel space, which could arguably be the convolutional blocks, to the forgetting agents suffer when learning a new task. Introducing controlled visual changes in the environment scene can lead to a performance degradation of 50.3% in the worst-case scenario. These visual discrepancies significantly impact the agent’s learning time and accuracy when using a PNN architecture. Regarding the PNN forgetting assessment, partial forgetting occurs in two of the three environments analyzed, those where the agent masters its new task. This could be due to a balance between the relevance of the new features learned and the ones inherited from the teacher agent.
28 pages, 4715 KB  
Article
Techno-Economic and SLA-Aware Control of 5G Cloud-RAN via Multi-Objective and Penalty-Constrained Reinforcement Learning
by Sherif M. Aboul, Hala M. Abd El Kader, Esraa M. Eid and Shimaa S. Ali
Network 2026, 6(2), 20; https://doi.org/10.3390/network6020020 - 31 Mar 2026
Abstract
Fifth-generation (5G) mobile networks must simultaneously satisfy stringent latency targets, high user density, and energy-aware operation across heterogeneous services. Cloud Radio Access Networks (C-RAN) provide architectural flexibility through centralized baseband processing, but they also introduce new control challenges related to fronthaul constraints, dynamic traffic variations, and joint radio–compute coordination with Mobile Edge Computing (MEC). This paper proposes a unified AI-driven optimization framework for adaptive 5G C-RAN management, where the controller dynamically tunes key system decisions, including functional split selection, TDD downlink ratio, user–RU association, fronthaul load management, and MEC offloading proportion. To enable fair benchmarking under identical simulation settings, a static baseline policy is compared against five adaptive control strategies: Deep Q-Network (DQN), Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), Multi-Objective Reinforcement Learning (MORL), and a deterministic Service-Level Agreement (SLA)-aware controller, the Penalty-Constrained Hierarchical Action Controller (PCHAC). Performance evaluation across techno-economic and service KPIs shows that intelligent control significantly improves operational profit, tail-latency behavior, and energy efficiency while enhancing SLA compliance compared with non-adaptive operation. The results highlight the practicality of multi-objective and constraint-aware learning for next-generation C-RAN orchestration under scaling traffic demand.
22 pages, 1119 KB  
Article
The Dual-Core Driving Mechanism of Intelligent Oilfield Development: From Data Perception to Decision-Optimized Ecosystems
by Junxiang Wang, Fei Li, Jing Hu, Xincheng Ma, Siyan Hong, Jun Luo, Tianyu Bao, Shuoyao Dong, Yuming Yang, Jun Chu, Yushin Evgeny Sergeevich and Li He
Processes 2026, 14(7), 1120; https://doi.org/10.3390/pr14071120 - 30 Mar 2026
Abstract
Intelligent oilfield development is experiencing an increasingly deep integration between localized automation and integrated, data-centric ecosystems. To systematically delineate the knowledge structure and technological trajectories within this field, this study analyzes 225 high-quality publications. This study innovatively employs a custom toolchain based on the Dart language for heterogeneous data cleaning and standardization, ensuring high accuracy and scientific rigor in the analysis samples. The investigation reveals a distinct dual-core driving mechanism underpinning recent advancements: a cognitive cluster centered on Artificial Intelligence and Deep Learning for complex data interpretation and prediction, and a decision-making cluster focused on Operational Optimization and Predictive Modeling for production enhancement. These two clusters respectively encompass eight sub-clusters: “artificial intelligence,” “machine learning,” “deep learning,” “performance,” “enhanced oil recovery,” “model,” “optimization,” and “predication.” This dual-core framework signifies a paradigm shift from experience-based practices to a synergistic “AI-enabled + mathematical optimization” approach. The analysis further explores emerging trends, including the potential of deep reinforcement learning for dynamic decision-making and the critical role of cybersecurity and model robustness in safety risk management. By mapping the current landscape and core mechanisms, this study provides a foundational reference for researchers and practitioners to navigate the future development of intelligent oilfields towards more resilient and efficient ecosystems.
23 pages, 3054 KB  
Article
A Graph Reinforcement Learning-Based Charging Guidance Strategy for Electric Vehicles in Faulty Electricity–Transportation Coupled Networks
by Yi Pan, Mingshen Wang, Haiqing Gan, Xize Jiao, Kemin Dai, Xinyu Xu, Yuhai Chen and Zhe Chen
Symmetry 2026, 18(4), 591; https://doi.org/10.3390/sym18040591 - 30 Mar 2026
Abstract
To address the issues of load aggregation and traffic congestion in faulty electricity–transportation coupled networks (ETCNs), this paper proposes an electric vehicle (EV) charging guidance strategy based on Graph Reinforcement Learning (GRL). First, a graph-structured feature extraction model is developed. The GraphSAGE module is employed to capture the multi-scale spatiotemporal features of the ETCN. The topological changes and energy-information interaction characteristics under fault scenarios are analyzed. Second, a Finite Markov Decision Process (FMDP) framework is established to address the stochastic and dynamic nature of EV charging behavior. The charging station selection and route planning problem is transformed into an agent decision-making process. A reward function is designed by incorporating voltage constraints, traffic flow constraints, and state-of-charge margin penalties. This ensures a balanced consideration of power grid security and traffic efficiency. The FMDP model is then solved using a Deep Q-Network (DQN) to achieve optimal EV charging guidance under fault conditions. Finally, case studies are conducted on a coupled simulation scenario consisting of an IEEE 33-node power distribution system and a 23-node transportation network. Results show that the proposed method reduces the system operation cost to 218,000 CNY, controls the voltage deviation rate of the distribution network at 3.1% in line with the operation standard, and enables the model to achieve stable convergence after only 250 training episodes. It can effectively optimize the charging load distribution and maintain the voltage stability of the power grid under fault conditions.
(This article belongs to the Special Issue Symmetry with Power Systems: Control and Optimization)
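The reward design mentioned in the abstract above (voltage constraints, traffic flow constraints, and state-of-charge margin penalties) could be sketched roughly as follows. This is only an illustration: the function name, the weights, the voltage band, and the margin threshold are all hypothetical, since the abstract does not give the paper's actual formulation.

```python
def charging_reward(cost, voltage_pu, road_flow, road_capacity, soc_margin,
                    v_min=0.95, v_max=1.05, min_margin=0.1,
                    w_volt=10.0, w_traffic=5.0, w_soc=20.0):
    """Hypothetical reward: negative cost minus weighted constraint penalties.

    All default limits and weights are assumed values for illustration.
    """
    # Voltage penalty: total excursion outside [v_min, v_max] over all nodes.
    volt_pen = sum(max(0.0, v_min - v) + max(0.0, v - v_max) for v in voltage_pu)
    # Traffic penalty: flow above capacity on each road segment.
    traffic_pen = sum(max(0.0, f - c) for f, c in zip(road_flow, road_capacity))
    # State-of-charge penalty: shortfall of the remaining margin below the minimum.
    soc_pen = max(0.0, min_margin - soc_margin)
    return -cost - w_volt * volt_pen - w_traffic * traffic_pen - w_soc * soc_pen
```

When no constraint is violated the reward reduces to the negative cost, and each violation lowers it in proportion to its assumed weight.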
37 pages, 9197 KB  
Article
Research on Intelligent Path Planning and Management of X-Type Mecanum-Wheeled Mobile Robot Based on Improved Proximal Policy Optimization–Gated Recurrent Unit Model
by Ning An, Songlin Yang and Shihan Kong
Machines 2026, 14(4), 382; https://doi.org/10.3390/machines14040382 - 30 Mar 2026
Abstract
To enhance the navigation efficiency and obstacle avoidance capability of omnidirectional mobile robots in unstructured and complex environments, this paper conducts research on intelligent path planning and management for X-type Mecanum-wheeled mobile robots with the improved Proximal Policy Optimization–Gated Recurrent Unit (PPO-GRU) model on the basis of robot kinematics modeling and deep reinforcement learning. First, by performing kinematic modeling of the X-type Mecanum-wheeled chassis and designing a high-dimensional state space along with a multi-factor composite reward function, the agent training environment for the robot–environment interaction control is established, laying the environmental foundation for in-depth research on path planning. Second, based on the construction of a Proximal Policy Optimization (PPO) path planning model, the PPO model is integrated with Gated Recurrent Units (GRUs) to form an improved PPO-GRU path planning model, thereby achieving an end-to-end path planning strategy. Finally, using a self-developed kinematic simulation platform for the X-type Mecanum-wheeled robot, the rationality and robustness of the proposed path planning model are investigated through ablation experiments, comparative experiments, dynamic environment tests, and tests considering key real-world phenomena. The research results indicate that the improved PPO-GRU path planning model increases the path planning success rate to 96%, reduces the average number of collisions by 82.7%, and achieves an average linear velocity reaching 84.5% of the maximum speed set in the environment. While attaining high-precision and robust planning management for autonomous navigation paths, it significantly improves the response speed of the agent’s autonomous navigation path planning.
(This article belongs to the Section Robotics, Mechatronics and Intelligent Machines)
23 pages, 2287 KB  
Article
Large-Scale Metro Train Timetable Rescheduling via Multi-Agent Deep Reinforcement Learning: A High-Dimensional Optimization Approach in Flatland Environment
by Jufen Yang, Haozhe Yang, Weikang Wang and Chengyang Xia
Appl. Sci. 2026, 16(7), 3338; https://doi.org/10.3390/app16073338 - 30 Mar 2026
Abstract
Metro train timetable rescheduling (TTR) is a critical task for ensuring the reliability of urban rail transit systems. However, with the increasing density of railway networks and the growing number of operational trains, TTR has evolved into a typical high-dimensional and large-scale optimization problem. Traditional mathematical programming and heuristic approaches often struggle with the “curse of dimensionality” and fail to provide real-time responses under stochastic disturbances. To address these challenges, this paper proposes a novel framework based on Multi-Agent Deep Reinforcement Learning (MADRL). Specifically, we model the TTR problem as a decentralized cooperative process and utilize the Multi-Agent Advantage Actor-Critic (MAA2C) algorithm to optimize train schedules dynamically. The proposed framework is implemented within the Flatland simulation environment, which allows for the representation of complex arbitrary topologies. We design a composite reward function that minimizes total delay deviation while maximizing passenger satisfaction, subject to constraints such as headway, operating time, and train capacity. Furthermore, to enhance the robustness of the model against high-dimensional state uncertainties, random disturbances following a negative exponential distribution are introduced during training. Experimental results across various scenarios, ranging from simple dual-track to complex random networks, demonstrate that the MAA2C-based approach significantly outperforms traditional baselines. It not only achieves faster convergence in small-scale scenarios but also demonstrates superior computational efficiency and scalability in large-scale environments, effectively minimizing passenger waiting times. This study validates the potential of MADRL in solving high-dimensional traffic control problems for intelligent transportation systems.
(This article belongs to the Special Issue Advances in Transportation and Smart City)
36 pages, 4649 KB  
Article
A Multi-Objective Collaborative Optimization Approach for Building Integrated Energy Systems Based on Deep Reinforcement Learning
by Limin Wang, Yongkai Wu, Jumin Zhao, Wei Gao and Dengao Li
Appl. Sci. 2026, 16(7), 3280; https://doi.org/10.3390/app16073280 - 28 Mar 2026
Viewed by 91
Abstract
To address the challenges of coordinated optimization in building integrated energy systems (IES) under the dual-carbon targets (characterized by strong multi-energy coupling, significant uncertainty in renewable generation, and stringent safety constraints), a novel safe deep reinforcement learning algorithm, Safe-DDPG, is proposed. Traditional deep reinforcement learning methods often suffer from high constraint-violation risk and limited policy reliability due to coupled objectives in building IES optimization. To overcome these limitations, a dual-channel critic architecture is designed to independently evaluate and decouple economic and safety objectives. In addition, a dynamic safety–penalty mechanism based on logarithmic barrier functions is introduced, together with an adaptive exploration strategy, enabling dynamic balancing between economic cost and constraint satisfaction according to system states during training. Experimental results demonstrate that, compared with mainstream algorithms, Safe-DDPG achieves substantial improvements across multiple key performance indicators: safety violations are reduced by up to 96.7%, average daily operating costs decrease by 18.5%, and cumulative rewards increase by more than 30%. Ablation studies further confirm the effectiveness and necessity of each core component. Two DRL methods from reference papers are also reproduced and compared against the proposed method, which shows significant advantages in reward value and economic cost. This work provides a safe, reliable, and efficient reinforcement-learning-based approach for optimization and scheduling of building energy systems under complex operational constraints.
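To give a feel for what a safety penalty based on a logarithmic barrier function can look like, here is a minimal sketch for a single constraint of the form `value <= limit`. It is not the Safe-DDPG formulation, which the abstract does not spell out: the function name, the coefficient `mu`, the cutoff `eps`, and the linear extension beyond the boundary are all assumptions made for illustration.

```python
import math

def log_barrier_penalty(value, limit, mu=0.1, eps=1e-6):
    """Hypothetical safety penalty for a constraint value <= limit.

    mu (barrier strength) and eps (cutoff near the boundary) are assumed values.
    """
    slack = limit - value
    if slack <= eps:
        # At or beyond the boundary the logarithm is undefined, so continue the
        # barrier linearly with the slope it has at slack == eps.
        return -mu * math.log(eps) + mu * (eps - slack) / eps
    # Classic log barrier: grows without bound as slack approaches zero.
    return -mu * math.log(slack)
```

The penalty rises smoothly as the constrained quantity approaches its limit and stays finite (but very large) once the limit is exceeded, so it can be subtracted from a reward during training without producing infinities.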
31 pages, 10290 KB  
Article
Incremental Nonlinear Reinforcement Learning Control for a Civil Aircraft with Model Uncertainties and Actuator Faults
by Qian Zhang, Weizhi Lyu, Congjie Yang, Jiaxin Chen and Shiqian Liu
Aerospace 2026, 13(4), 315; https://doi.org/10.3390/aerospace13040315 - 27 Mar 2026
Viewed by 139
Abstract
The problem of fault-tolerant attitude tracking control for civil aircraft with model uncertainties and actuator faults is studied. A robust multiple inversion-based incremental nonlinear dynamic inversion (RMI-INDI) fault-tolerant control method is proposed. Firstly, considering that the higher-order term is neglected in the INDI method, an RMI method is proposed to deal with the higher-order term and model uncertainties of the INDI control. Secondly, to achieve the optimal control parameters for the INDI controller, a reinforcement learning (RL) method is suggested, where a Deep Deterministic Policy Gradient (DDPG) algorithm with a smooth reward function is designed. Finally, the performance of the proposed RL-RMI-INDI fault-tolerant controller is demonstrated in two simulation scenarios. Compared with SMC control, RMI-NDI control, and INDI control without RL, tracking errors and overshoots are greatly reduced by the proposed RL-RMI-INDI controller for attitude tracking missions, even under model uncertainties and actuator faults.
(This article belongs to the Special Issue Challenges and Innovations in Aircraft Flight Control (2nd Edition))
20 pages, 3452 KB  
Article
Effectiveness of Experience-Sharing Group Learning in Deep Reinforcement Learning
by Keita Muroya, Makoto Ikeda and Akira Notsu
Appl. Sci. 2026, 16(7), 3250; https://doi.org/10.3390/app16073250 - 27 Mar 2026
Viewed by 154
Abstract
Deep reinforcement learning faces a critical trade-off between computational cost and performance. This study proposes an experience-sharing group-learning framework in which multiple agents with different network sizes collaboratively learn a single task through a shared experience replay memory. Unlike conventional multi-agent approaches that assume homogeneous agents, our method enables agents with different computational capabilities to share experiences, allowing low-performance agents to benefit from high-performance agents’ quality experiences. The proposed method was evaluated in CartPole and Super Mario Bros environments. In CartPole two-agent experiments, the low-performance agent (Agent16, 404 parameters) achieved approximately 2× performance improvement (93.3 to 184.4 steps) through group learning, while the high-performance agent (Agent64, 4676 parameters) maintained comparable performance, though several group conditions fell below the solo 200-step result. Three-agent experiments further improved Agent16 to 196.5 steps with reduced variance. Under step-matched comparisons in Super Mario Bros, the low-capacity agent benefits from experience sharing beyond solo baselines that consume roughly twice as many steps, while the high-capacity agent remains broadly comparable between group and solo. Claims are limited to step-based normalisation. Q-value analysis revealed accelerated early learning, with Q-values increasing by +10.1 (Mario) and +7.7 (Luigi) at 1 million steps. These results demonstrate that experience-sharing group learning can improve learning efficiency for resource-constrained agents under a fixed environment-step budget.
(This article belongs to the Special Issue Advances in Intelligent Systems—2nd edition)
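The core idea in the abstract above, several agents writing to and sampling from one experience replay memory, can be sketched as below. This is a minimal illustration under assumptions: the class name, the transition layout, and uniform sampling are hypothetical choices, and the paper may store or sample experiences differently.

```python
import random
from collections import deque

class SharedReplayMemory:
    """Hypothetical replay memory written to and sampled by several agents."""

    def __init__(self, capacity):
        # A bounded deque evicts the oldest transition once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, agent_id, state, action, reward, next_state, done):
        # The agent id is stored only so the origin of a transition can be inspected.
        self.buffer.append((agent_id, state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling, regardless of which agent produced the transition.
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)
```

In use, a small network and a large network would each call `push` after every environment step and `sample` before every update, so the smaller agent trains on transitions that the larger agent collected.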
15 pages, 1915 KB  
Article
Structural Health Diagnosis Using Advanced Spectrum Analysis and Artificial Intelligence of Ground Penetrating Radar Signals
by Wael Zatar, Hien Nghiem, Feng Xiao and Gang Chen
Buildings 2026, 16(7), 1330; https://doi.org/10.3390/buildings16071330 - 27 Mar 2026
Viewed by 199
Abstract
This paper presents a non-destructive, optimized variational mode decomposition (VMD)-based ground-penetrating radar (GPR) method developed for identifying void defects in reinforced concrete (RC) structures. This study also presents an enhanced framework for defect detection in RC by integrating advanced spectrum analysis with deep learning techniques. A GPR investigation was conducted on an RC bridge deck with known structural defects to generate a representative dataset reflecting both intact and void-defective conditions. In addition to conventional spectral techniques such as fast Fourier transform (FFT), spectrogram, and scalogram, an optimized variational mode decomposition (VMD) method was implemented. The VMD approach decomposes GPR signals into intrinsic mode functions, enabling refined feature extraction beyond traditional spectral methods and allowing clear differentiation between intact and defective signals. The limited availability and quality of small GPR datasets have restricted the application of a functional 1D-CNN, which generally requires at least several hundred datasets. To address this challenge, a data augmentation strategy is adopted. FFT-based features were successfully utilized to train a one-dimensional convolutional neural network (1D-CNN) for automated defect identification. The results demonstrate that both the advanced spectrum-based approach and the hybrid framework combining spectral analysis with deep learning significantly improve defect detection performance. Overall, the proposed methodology provides an effective and intelligent solution to support timely, data-driven decision-making for maintenance and safety assurance of bridge infrastructure.
(This article belongs to the Section Building Structures)
18 pages, 2168 KB  
Review
Artificial Intelligence in Transcriptomics: From Human-in-the-Loop to Agentic AI
by Giulia Gentile, Giovanna Morello, Valentina La Cognata, Maria Guarnaccia and Sebastiano Cavallaro
J. Pers. Med. 2026, 16(4), 181; https://doi.org/10.3390/jpm16040181 - 27 Mar 2026
Viewed by 243
Abstract
To better understand the complexity of biological systems, research has shifted from a reductionist to a holistic approach, expanding the focus from single genes to a genome-scale view of gene activity and regulation. This is known as transcriptomics, a continuously growing field generating gene expression signatures from different technologies. A comparable paradigm shift has occurred in computational systems biology with the implementation of Artificial Intelligence (AI) learning models for gene expression analysis and integration. These models enable transcriptome-based profiling to address challenges of data heterogeneity, integration, and updating, assisting human intelligence and enhancing their ability to retrieve, analyze, integrate, and generate data recursively, thanks to their intrinsic predictive, inferential, reinforcement, and generative capabilities. Additionally, while scientists worldwide are still learning how to leverage AI methods that can maintain the human-in-the-loop, a new fundamental change is emerging: agentic AI, which can autonomously act and employ other AI methods to pursue its objectives. As a futuristic perspective, the proposed data analysis pipeline imagines agentic AI systems allowing the automated retrieval and pre-processing of heterogeneous transcriptomics data, analysis and integration with other omics datasets, performed with an incremental updating and recurrent analysis (IURA) model that could allow the detection of guideline updates (e.g., disease reclassification) and the generation of new hypotheses, such as candidate biomarkers or transcriptome–phenotype correlations. Since personalized medicine could derive profound benefits from its use, this scenario also raises important considerations regarding the advantages and concerns associated with the use of scientific AI agents in research and clinical practice.
32 pages, 4620 KB  
Article
Joint Resource Allocation for Maritime RIS–RSMA Communications Using Fractal-Aware Robust Deep Reinforcement Learning
by Da Liu, Kai Su, Nannan Yang and Jingbo Zhang
Fractal Fract. 2026, 10(4), 223; https://doi.org/10.3390/fractalfract10040223 (registering DOI) - 27 Mar 2026
Viewed by 88
Abstract
Sea-surface reflections and wind–wave motion render maritime channels strongly time-varying and statistically non-stationary, while nearshore deployments face sparse infrastructure and co-channel multiuser interference. This study integrates reconfigurable intelligent surfaces (RISs) with rate-splitting multiple access (RSMA) for joint online resource allocation. A physics-inspired time-varying channel model is established by embedding fractional Brownian motion-driven slow statistical drift and reflection-phase perturbations. With imperfect, delayed channel state information (CSI) and discrete RIS phase quantization, a proportional-fairness utility maximization problem is formulated to jointly optimize shore base-station precoding, RIS phase shifts, and RSMA common-rate allocation. To cope with strong non-convexity, high dimensionality, mixed continuous–discrete coupling, and partial observability, a fractal-aware recurrent robust Actor–Critic (FRRAC) algorithm is developed. FRRAC encodes short observation histories using a gated recurrent unit and incorporates a lightweight Hurst-proxy estimator to capture slow channel statistics for robust value evaluation and policy learning. Truncated quantile critics and mixed prioritized–uniform replay further improve value robustness, training stability, and sample efficiency. Simulation results show that FRRAC converges faster and more stably under both conventional and fractal non-stationary channel modeling, and outperforms representative baselines across the objective and multiple statistical metrics, validating its effectiveness for joint resource optimization in maritime RIS–RSMA systems.
(This article belongs to the Section Optimization, Big Data, and AI/ML)
17 pages, 4309 KB  
Article
A Deep Reinforcement Learning Approach for Joint Resource Allocation in Time-Varying Underwater Acoustic Cooperative Networks
by Liangliang Zeng, Tongxing Zheng, Yifan Wu, Yimeng Ge and Jiahao Gao
J. Mar. Sci. Eng. 2026, 14(7), 616; https://doi.org/10.3390/jmse14070616 - 27 Mar 2026
Viewed by 241
Abstract
Underwater acoustic sensor networks (UASNs) have emerged as a pivotal technology for ocean exploration, tactical surveillance, and environmental monitoring. However, the underwater acoustic channel poses severe challenges, including high propagation delay, limited bandwidth, and rapid time-varying multipath fading, which significantly degrade communication reliability. Cooperative communication, which exploits spatial diversity via relay nodes, offers a promising solution to these impairments. In this paper, we investigate the joint optimization of relay selection and power allocation in UASNs to maximize the long-term system energy efficiency and throughput. This problem is inherently complex due to the hybrid action space, which couples the discrete selection of relay nodes with the continuous allocation of transmission power, and the absence of real-time, perfect channel state information (CSI). To address these challenges, we propose a novel deep hybrid reinforcement learning (DHRL) framework utilizing a parameterized deep Q-Network (P-DQN) architecture. Unlike traditional approaches that discretize power levels or relax discrete constraints, our approach seamlessly integrates a deterministic policy network for continuous power control and a value-based network for discrete relay evaluation. Furthermore, we incorporate a prioritized experience replay (PER) mechanism to improve sample efficiency by focusing on rare but significant channel transition events. We provide a comprehensive theoretical analysis of the algorithm’s complexity and convergence properties. Extensive simulation results demonstrate that the proposed DHRL algorithm outperforms state-of-the-art combinatorial bandit algorithms and conventional deep reinforcement learning baselines in terms of system energy efficiency, and also exhibits superior robustness against channel estimation errors.
(This article belongs to the Section Coastal Engineering)
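The hybrid action described in the abstract above (a discrete relay choice coupled with a continuous transmit power) can be illustrated with a minimal sketch of P-DQN-style action selection. This is not the authors' implementation: the dimensions, the random linear "networks", and names such as `actor`, `q_values`, and `select_action` are hypothetical stand-ins for trained models.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM = 4    # hypothetical size of the channel-state feature vector
NUM_RELAYS = 3   # hypothetical number of candidate relay nodes
P_MAX = 1.0      # hypothetical maximum transmit power (normalized)

# Random linear maps stand in for the trained policy and value networks.
W_actor = rng.normal(size=(NUM_RELAYS, STATE_DIM))
W_q = rng.normal(size=(NUM_RELAYS, STATE_DIM + NUM_RELAYS))


def actor(state):
    """Deterministic policy: one continuous power level per relay, bounded by P_MAX."""
    return P_MAX / (1.0 + np.exp(-(W_actor @ state)))


def q_values(state, powers):
    """Value network: one Q-value per relay, given the state and all power parameters."""
    return W_q @ np.concatenate([state, powers])


def select_action(state, epsilon=0.0):
    """Return the hybrid action (relay index, transmit power)."""
    powers = actor(state)  # continuous part, computed for every relay
    if rng.random() < epsilon:
        relay = int(rng.integers(NUM_RELAYS))  # explore a random relay
    else:
        relay = int(np.argmax(q_values(state, powers)))  # greedy discrete part
    return relay, float(powers[relay])


state = rng.normal(size=STATE_DIM)
relay, power = select_action(state)
print(relay, power)
```

The point of the design is that power is never discretized: the policy proposes a power level for each relay, and the value network only has to rank the relays given those proposals.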

24 pages, 6273 KB  
Article
Manufacturing-Induced Defect Taxonomy and Visual Detection in UD Tapes with Carbon and Glass Fiber Reinforcements
by Gönenç Duran
Polymers 2026, 18(7), 807; https://doi.org/10.3390/polym18070807 - 26 Mar 2026
Viewed by 177
Abstract
Continuous unidirectional (UD) thermoplastic composite tapes are increasingly used in aerospace, automotive, and energy applications because of their high specific strength, low weight, recyclability, and compatibility with automated manufacturing. Since final component performance strongly depends on tape quality, reliable defect characterization and detection are essential. In this study, manufacturing-induced defects in polypropylene-based UD tapes reinforced with carbon and glass fibers were investigated using real images acquired directly from laboratory-scale production without synthetic data. Defects related to interfacial integrity, matrix distribution, fiber architecture, and surface irregularities were systematically analyzed, and a practical four-class defect taxonomy was established. To enable automated inspection under limited-data conditions, lightweight YOLOv8, YOLOv11, and the new YOLO26 models were comparatively evaluated using a UD tape-specific augmentation strategy combining physically constrained Albumentations and on-the-fly augmentation. Among the tested models, YOLO26-s achieved the best overall performance, reaching a mean mAP@0.5 of 0.87 ± 0.03, outperforming YOLOv11 (0.83) and YOLOv8 (0.78), with 0.90 precision and 0.85 recall. Interfacial (0.92 mAP) and matrix-related (0.90 mAP) defects were detected most reliably, whereas fiber-related (0.89 mAP) and surface defects (0.79 mAP) remained more challenging, particularly in glass-fiber-reinforced tapes due to transparency-masking effects. The results demonstrate the potential of compact deep learning models for computationally efficient and manufacturing-relevant in-line quality monitoring of UD tape production. Full article
(This article belongs to the Special Issue Artificial Intelligence in Polymers)

34 pages, 6554 KB  
Article
Syncretic Grad-CAM Integrated ViT-CNN Hybrids with Inherent Explainability for Early Thyroid Cancer Diagnosis from Ultrasound
by Ahmed Y. Alhafdhi, Gibrael Abosamra and Abdulrhman M. Alshareef
Diagnostics 2026, 16(7), 999; https://doi.org/10.3390/diagnostics16070999 - 26 Mar 2026
Viewed by 159
Abstract
Background/Objectives: Accurate detection of thyroid cancer using ultrasound remains a challenge, as malignant nodules can be microscopic and heterogeneous, easily confused with point clusters and borderline-featured tissues. Current studies in deep learning demonstrate good performance with convolutional neural networks (CNNs) and clustering; however, many approaches focus on local tissue and provide limited, non-quantitative interpretation, reducing clinical confidence. This study proposes an integrated framework combining enhanced convolutional feature encoders (DenseNet169 and VGG19) with an enhanced vision transformer (ViT-E) to integrate local feature and global relational context during learning, rather than delayed integration. Methods: The proposed framework integrates enhanced convolutional feature encoders (DenseNet169 and VGG19) with an enhanced vision transformer (ViT-E), enabling simultaneous learning of local feature representations and global relational context. This design allows feature fusion during the learning stage instead of delayed integration, aiming to improve diagnostic performance and interpretability in thyroid ultrasound image analysis. Results: The best-performing model, ViT-E–DenseNet169, achieved 98.5% accuracy, 98.9% sensitivity, 99.15% specificity, and 97.35% AUC, surpassing the robust basic hybrid model (CNN–XGBoost/ANN) and existing systems. A second contribution is improved interpretability, moving from mere illustration to validation. Gradient-weighted class activation mapping (Grad-CAM) maps demonstrated distinct and clinically understandable concentration patterns across various thyroid cancers: precise intralesional concentration for high-confidence malignancies (PTC = 0.968), edge/interface concentration for capsule risk patterns (PTC = 0.957), and broader-field activation consistent with infiltration concerns (PTC = 0.984), while benign scans showed low and diffuse activation (PTC = 0.002). Spatial audits reinforced this behavior (IoU/PAP: 0.72/91%, 0.65/78%, 0.58/62%). Conclusions: The integrated ViT-E–DenseNet169 framework provides highly accurate thyroid cancer detection while offering clinically meaningful interpretability through Grad-CAM-based spatial validation, supporting improved confidence in AI-assisted ultrasound diagnosis. Full article
(This article belongs to the Special Issue Deep Learning Techniques for Medical Image Analysis)
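The IoU figures in the spatial audit above presumably compare a Grad-CAM activation region with an annotated lesion region. The abstract does not say how the heatmap was binarized, so the sketch below assumes a fixed threshold of 0.5 and uses toy 4×4 arrays; the function name `iou` and all values are illustrative. PAP is not defined in the abstract and is left out.

```python
import numpy as np


def iou(heatmap, lesion_mask, threshold=0.5):
    """Intersection over union of a thresholded heatmap and a binary lesion mask."""
    hot = heatmap >= threshold          # assumed binarization rule
    lesion = lesion_mask.astype(bool)
    union = np.logical_or(hot, lesion).sum()
    if union == 0:
        return 0.0                      # both regions empty: define IoU as 0
    return float(np.logical_and(hot, lesion).sum() / union)


# Toy data: a 2x3 activation block and a 2x3 lesion block that share 4 pixels.
heatmap = np.zeros((4, 4))
heatmap[1:3, 1:4] = 0.9
lesion = np.zeros((4, 4), dtype=int)
lesion[1:3, 0:3] = 1

print(iou(heatmap, lesion))  # 4 shared pixels / 8 pixels in the union = 0.5
```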