Search Results (3,205)

Search Parameters:
Keywords = deep encoder

28 pages, 3166 KB  
Article
A Hierarchical Dynamic Path Planning Framework for Autonomous Vehicles Based on Physics-Informed Potential Field and TD3 Reinforcement Learning
by Yan Pan, Yu Wang and Bin Ran
Appl. Sci. 2026, 16(7), 3610; https://doi.org/10.3390/app16073610 - 7 Apr 2026
Abstract
Autonomous driving in dense traffic demands policies that ensure safety, accurate path tracking, and ride comfort, yet reinforcement learning (RL) alone suffers from low sample efficiency and weak safety guarantees, while classical artificial potential field (APF) methods lack adaptability to dynamic scenarios. This paper proposes PIPF-TD3, which integrates APF theory with the Twin Delayed Deep Deterministic Policy Gradient (TD3) by embedding composite potential values and Doppler-weighted gradients as physics-informed features into the state vector. A Hybrid A* planner generates a reference path encoded as an attractive field; repulsive fields model nearby obstacles using real-time perception data; and a multi-objective reward function jointly optimizes path tracking, collision avoidance, and ride comfort. Experiments in CARLA 0.9.14 across two scenarios—a highway segment with mixed obstacles and a signalized intersection with conflicting turning movements—show that PIPF-TD3 achieves 100% task completion with zero collisions, whereas TD3 without potential field guidance suffers a 90% collision rate. PIPF-TD3 reduces mean cross-track error to 0.12 m (72.1% reduction over the rule-based FSM baseline), maintains 67.0% larger safety clearance, and yields RMS longitudinal and lateral accelerations of 1.12 and 0.75 m/s², outperforming the FSM by 37.1% and 42.7%. These results confirm that Doppler-weighted physical priors substantially enhance RL-based driving safety and quality in complex traffic conditions. Full article
(This article belongs to the Section Transportation and Future Mobility)
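The physics-informed feature idea in the abstract above can be illustrated with a classical APF computation: a quadratic attractive field toward a reference-path point plus inverse-distance repulsive fields around obstacles, with the potential value and its gradient appended to the RL state. The function name, gains (`k_att`, `k_rep`), influence radius `d0`, and feature layout below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def potential_features(pos, ref_point, obstacles, k_att=1.0, k_rep=1.0, d0=5.0):
    """Composite APF value and gradient at `pos` (illustrative sketch)."""
    pos = np.asarray(pos, dtype=float)
    # Attractive field: quadratic pull toward the reference-path point.
    u_att = 0.5 * k_att * np.sum((pos - ref_point) ** 2)
    grad = k_att * (pos - ref_point)
    # Repulsive fields: active only inside the influence radius d0.
    u_rep = 0.0
    for obs in obstacles:
        d = np.linalg.norm(pos - obs)
        if 0.0 < d < d0:
            u_rep += 0.5 * k_rep * (1.0 / d - 1.0 / d0) ** 2
            grad += -k_rep * (1.0 / d - 1.0 / d0) * (pos - obs) / d ** 3
    return np.array([u_att + u_rep, grad[0], grad[1]])

state = np.array([0.0, 0.0, 10.0])                 # e.g. x, y, speed
phys = potential_features([0.0, 0.0], np.array([4.0, 0.0]),
                          [np.array([1.5, 0.5])])
augmented_state = np.concatenate([state, phys])    # fed to the TD3 actor/critic
```

In this reading, the TD3 networks never see raw obstacle lists; they see a smooth scalar field and its gradient, which is what makes the prior "physics-informed".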
17 pages, 1711 KB  
Article
Surface EMG-Based Hand Gesture Recognition Using a Hybrid Multistream Deep Learning Architecture
by Yusuf Çelik and Umit Can
Sensors 2026, 26(7), 2281; https://doi.org/10.3390/s26072281 - 7 Apr 2026
Abstract
Surface electromyography (sEMG) enables non-invasive measurement of muscle activity for applications such as human–machine interaction, rehabilitation, and prosthesis control. However, high noise levels, inter-subject variability, and the complex nature of muscle activation hinder robust gesture classification. This study proposes a multistream hybrid deep-learning architecture for the FORS-EMG dataset to address these challenges. The model integrates Temporal Convolutional Networks (TCN), depthwise separable convolutions, bidirectional Long Short-Term Memory (LSTM)–Gated Recurrent Unit (GRU) layers, and a Transformer encoder to capture complementary temporal and spectral patterns, and an ArcFace-based classifier to enhance class separability. We evaluate the approach under three protocols: subject-wise, random split without augmentation, and random split with augmentation. In the augmented random-split setting, the model attains 96.4% accuracy, surpassing previously reported values. In the subject-wise setting, accuracy is 74%, revealing limited cross-user generalization. The results demonstrate the method’s high performance and highlight the impact of data-partition strategies for real-world sEMG-based gesture recognition. Full article
(This article belongs to the Special Issue Machine Learning in Biomedical Signal Processing)
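The ArcFace-based classifier mentioned above can be sketched in a few lines: it adds an angular margin to the true-class angle before scaling the cosine logits, which pushes same-class embeddings into a tighter angular cone. The shapes, margin, and scale below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def arcface_logits(embeddings, weights, labels, margin=0.5, scale=30.0):
    """ArcFace-style logits: angular margin on the target class (sketch)."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos = np.clip(e @ w, -1.0, 1.0)      # cosine similarity to each class center
    theta = np.arccos(cos)
    rows = np.arange(len(labels))
    theta[rows, labels] += margin        # penalize only the true class's angle
    return scale * np.cos(theta)

rng = np.random.default_rng(0)
logits = arcface_logits(rng.normal(size=(4, 8)),   # 4 samples, 8-dim embeddings
                        rng.normal(size=(8, 5)),   # 5 gesture classes
                        np.array([0, 1, 2, 3]))
```

During training these margined logits feed a standard softmax cross-entropy; at inference the margin is dropped and plain cosine similarity is used.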
10 pages, 512 KB  
Proceeding Paper
Multitask Deep Neural Network for IMU Calibration, Denoising, and Dynamic Noise Adaption for Vehicle Navigation
by Frieder Schmid and Jan Fischer
Eng. Proc. 2026, 126(1), 44; https://doi.org/10.3390/engproc2026126044 - 7 Apr 2026
Abstract
In intelligent vehicle navigation, efficient sensor data processing and accurate system stabilization are critical to maintain robust performance, especially when GNSS signals are unavailable or unreliable. Classical calibration methods for Inertial Measurement Units (IMUs), such as discrete and system-level calibration, fail to capture time-varying, non-linear, and non-Gaussian noise characteristics. Likewise, Kalman filters typically assume static measurement noise levels for non-holonomic constraints (NHCs), resulting in suboptimal performance in dynamic environments. Furthermore, zero-velocity detection plays a vital role in preventing error accumulation by enabling reliable zero-velocity updates during motion stops, but classical thresholding approaches often lack robustness and precision. To address these limitations, we propose a novel multitask deep neural network (MTDNN) architecture that jointly learns IMU calibration, adaptive noise level estimation for NHC, and zero-velocity detection solely from raw IMU data. This shared-encoder design minimizes computational overhead, enabling real-time deployment on resource-constrained platforms such as the Raspberry Pi. The model is trained using post-processed GNSS-RTK ground truth trajectories obtained from both a proprietary dataset and the publicly available 4Seasons dataset. Experimental results confirm the proposed system’s superior accuracy, efficiency, and real-time capability in GNSS-denied conditions. Full article
(This article belongs to the Proceedings of European Navigation Conference 2025)

16 pages, 2876 KB  
Article
Design and Implementation of a High-Resolution Real-Time Ultrasonic Endoscopy Imaging System Based on FPGA and Coded Excitation
by Haihang Gu, Fujia Sun, Shuhao Hou and Shuangyuan Wang
Electronics 2026, 15(7), 1526; https://doi.org/10.3390/electronics15071526 - 6 Apr 2026
Abstract
High-frequency endoscopic ultrasound is crucial for the early diagnosis of gastrointestinal tumors. However, achieving high axial resolution, deep tissue signal-to-noise ratio, and real-time data processing simultaneously remains a significant challenge in hardware implementation. This paper proposes a miniaturized real-time high-frequency imaging system based on the Xilinx Artix-7 FPGA. To overcome attenuation limitations of high-frequency signals, we employ a 4-bit Barker code-encoded excitation scheme coupled with a programmable ±100 V high-voltage transmission circuit. This effectively enhances echo energy without exceeding peak voltage safety thresholds. At the receiver end, the system utilizes a multi-channel analog front end integrated with mixed-signal time-gain compensation technology. Furthermore, to address transmission bottlenecks for massive echo data, we designed a Low-Voltage Differential Signaling (LVDS) interface logic based on dynamic phase calibration, ensuring stable, high-speed data transfer to the host computer via USB 3.0. Experimental results with a 20 MHz transducer demonstrate that the system achieves real-time B-mode imaging at 30 frames per second. Phantom testing revealed an axial resolution of 0.13 mm, enabling clear differentiation of 0.1 mm microstructures. Compared to conventional single-pulse excitation, coded excitation technology improved signal-to-noise ratio (SNR) by approximately 4.5 dB at a depth of 40 mm. These results validate the system’s capability for high-precision deep imaging suitable for clinical endoscopy applications, delivered in a compact, low-power form factor. Full article
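The benefit of coded excitation can be demonstrated with a toy pulse-compression example: correlating the received echo with the transmitted code collapses the 4-chip coded pulse back to a single sharp peak with low sidelobes, so the longer (higher-energy) transmission does not cost axial resolution. This sketch uses an idealized noiseless echo and says nothing about the paper's FPGA pipeline:

```python
import numpy as np

# Length-4 Barker sequence: its autocorrelation has a mainlobe of 4 and
# sidelobe magnitudes of at most 1, which is what makes it a good code.
code = np.array([1, 1, -1, 1], dtype=float)

echo = np.zeros(64)
echo[20:24] = code                                # ideal echo from one reflector

# Matched filtering = cross-correlation with the transmitted code.
compressed = np.correlate(echo, code, mode="same")
peak = int(np.argmax(np.abs(compressed)))         # index of the compressed mainlobe
```

With noise added, the mainlobe-to-sidelobe ratio of the code is what buys the SNR gain the abstract reports at depth.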

24 pages, 4411 KB  
Article
GT-TD3: A Kinematics-Aware Graph-Transformer Framework for Stable Trajectory Tracking of High-Degree-of-Freedom (DOF) Manipulators
by Hanwen Miao, Haoran Hou, Zhaopeng Zhu, Zheng Chao and Rui Zhang
Machines 2026, 14(4), 397; https://doi.org/10.3390/machines14040397 - 5 Apr 2026
Abstract
Accurate trajectory tracking of redundant manipulators is difficult because the controller must simultaneously model local couplings between adjacent joints and global dependencies across the whole kinematic chain. Existing reinforcement learning methods typically employ multilayer perceptrons, which do not explicitly exploit manipulator structure and therefore show limited stability and representation ability in high-dimensional continuous control tasks. This paper proposes GT-TD3, a Graph Transformer-enhanced-Twin Delayed Deep Deterministic Policy Gradient framework, for redundant manipulator trajectory tracking. The proposed actor first converts the raw system state into joint-level node features and uses a graph neural network to extract local kinematic coupling information. A Transformer is then employed to capture long-range dependencies among joints. To strengthen the use of structural priors, topology- and distance-related bias terms are incorporated into the attention mechanism, enabling the network to encode manipulator structure during global feature learning. Experiments on a 7-DoF KUKA iiwa manipulator in PyBullet demonstrate that GT-TD3 outperforms MLP, pure GNN, and pure Transformer baselines in tracking performance. The proposed method achieves more stable training, faster convergence, and smoother and more accurate end-effector motion. The results show that the integration of local graph modeling and structure-aware global attention provides an effective solution for high-precision trajectory tracking of redundant manipulators. Full article
(This article belongs to the Section Robotics, Mechatronics and Intelligent Machines)
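The topology- and distance-related attention bias described above can be sketched as a penalty subtracted from the attention scores before the softmax, so joints far apart along the kinematic chain attend to each other less by default. The linear penalty `alpha * dist` below is an illustrative choice, not the paper's exact formulation:

```python
import numpy as np

def biased_attention(x, dist, alpha=0.5):
    """Self-attention over joint features with a chain-distance bias (sketch)."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d) - alpha * dist   # dist[i, j] = chain distance
    scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x

n = 7                                              # joints of a 7-DoF arm
idx = np.arange(n)
dist = np.abs(idx[:, None] - idx[None, :]).astype(float)
rng = np.random.default_rng(1)
out = biased_attention(rng.normal(size=(n, 16)), dist)
```

Because the bias enters additively before the softmax, the network can still override it when a long-range dependency matters; the structural prior only shapes the default attention pattern.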

22 pages, 1280 KB  
Article
Enhancing Early Skin Cancer Detection: A Deep Learning Approach with Multi-Scale Feature Refinement and Fusion
by Siyuan Wu, Pengfei Zhao, Huafu Xu and Zimin Wang
Symmetry 2026, 18(4), 612; https://doi.org/10.3390/sym18040612 - 5 Apr 2026
Abstract
The global incidence of skin cancer is rising, making it an increasingly critical public health issue. Malignant skin tumors such as melanoma originate from pathological alterations in skin cells, and their accurate early-stage segmentation is crucial for quantitative analysis, early diagnosis, and effective treatment. However, achieving precise and efficient segmentation remains a major challenge, as existing methods often struggle to capture complex lesion characteristics. To address this challenge, we propose a novel deep learning framework that integrates the PVT v2 backbone with two key modules: the Spatial-Aware Feature Enhancement (SAFE) module and the Multiscale Dual Cross-attention Fusion (MDCF) module. The SAFE module enhances multi-scale encoder features through a dual-branch architecture, which adaptively extracts offset information to integrate fine-grained shallow details with deep semantic information, thereby bridging the feature gap across network depths. The MDCF module establishes bidirectional cross-attention between decoder and encoder features, followed by multi-scale deformable convolutions that capture lesion boundaries and small fragments across heterogeneous receptive fields, thereby enriching semantic details while suppressing background interference. The proposed model was evaluated on two public benchmark datasets (ISIC 2016 and ISIC 2018), achieving Intersection over Union (IoU) scores of 87.33% and 83.67%, respectively. These results demonstrate superior performance compared to current state-of-the-art methods and indicate that our framework significantly enhances skin lesion image analysis, offering a promising tool for improving early detection of skin cancer. Full article
(This article belongs to the Special Issue Symmetric/Asymmetric Study in Medical Imaging)

37 pages, 33258 KB  
Article
An Intelligent Gated Fusion Network for Waterbody Recognition in Multispectral Remote Sensing Imagery
by Tong Zhao, Chuanxun Hou, Zhili Zhang and Zhaofa Zhou
Remote Sens. 2026, 18(7), 1088; https://doi.org/10.3390/rs18071088 - 4 Apr 2026
Abstract
Accurate water body segmentation from multispectral remote sensing imagery is critical for hydrological monitoring and environmental management. However, leveraging transfer learning with pre-trained models remains challenging due to the dimensional mismatch between three-channel RGB-based architectures and multi-band spectral data. To address this mismatch, this study proposes a novel segmentation network, termed Intelligent Gated Fusion Network (IGF-Net), built upon a dual-branch feature encoder module and a core Intelligent Gated Fusion Module (IGFM). The IGFM achieves adaptive fusion of visual and spectral features through a cascaded mechanism integrating differences-and-commonalities parallel modeling, channel-context priors, and adaptive temperature control. We evaluate IGF-Net on the newly constructed Tiangong-2 remote sensing image water body semantic segmentation dataset, which comprises 3776 meticulously annotated multispectral image patches. Comprehensive experiments demonstrate that IGF-Net achieves strong and consistent performance on this dataset, with an Intersection over Union of 0.8742 and a Dice coefficient of 0.9239, consistently outperforming the evaluated baseline methods, such as FCN, U-Net, and DeepLabv3+. It also exhibits strong cross-dataset generalization capabilities on an independent Sentinel-2 water segmentation dataset. Ablation studies and visualization analyses confirm that the proposed fusion strategy significantly enhances segmentation accuracy and stability, particularly in complex scenarios. Full article
(This article belongs to the Topic Advances in Hydrological Remote Sensing)
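A gated fusion step of the kind described can be sketched as a learned sigmoid gate that mixes the visual and spectral branches channel by channel, with a temperature scaling the gate logits. This minimal form is a hypothetical reading of the IGFM, not its actual definition (the paper's cascaded differences-and-commonalities modeling is omitted):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(f_vis, f_spec, w, temperature=1.0):
    """Per-channel gate over visual vs. spectral features (illustrative)."""
    # Gate logits come from both branches; temperature sharpens/softens them.
    g = sigmoid(w @ np.concatenate([f_vis, f_spec]) / temperature)
    return g * f_vis + (1.0 - g) * f_spec          # convex per-channel mix

rng = np.random.default_rng(2)
c = 8                                              # channels per branch
f_vis = rng.normal(size=c)
f_spec = rng.normal(size=c)
fused = gated_fusion(f_vis, f_spec, rng.normal(size=(c, 2 * c)))
```

Because the gate is a convex combination, each fused channel stays between the two branch values; a low temperature drives the gate toward a hard per-channel selection.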
16 pages, 6392 KB  
Article
An Engineered clMagR Tetramer with Enhanced Magnetism for Magnetic Manipulation
by Peng Zhang, Xiujuan Zhou, Shenting Zhang, Peilin Yang, Zhu-An Xu, Xin Zhang, Junfeng Wang, Tiantian Cai, Yuebin Zhang and Can Xie
Biomolecules 2026, 16(4), 537; https://doi.org/10.3390/biom16040537 - 3 Apr 2026
Abstract
Biological manipulation via physical stimuli such as light and magnetism has become a central goal in modern biotechnology. Among these modalities, magnetic fields offer unique advantages, including deep tissue penetration and untethered interventions in living systems. An ideal platform for such a magnetogenetic toolkit would be a genetically encodable protein with tunable magnetic features under physiological conditions. However, the development of such tools has been hindered by the lack of robust and stable protein scaffolds with strong intrinsic magnetic properties. Inspired by animal magnetoreception in nature, here, we rationally designed and systematically screened single-chain variants of the magnetoreceptor MagR. Through nine iterative rounds of design and experimental validation, we generated 25 constructs and ultimately identified a stable single-chain-dimer-based-tetramer, SDT-MagR, as the optimal magnetic molecular platform. This engineered protein exhibits exceptional structural stability and state-dependent magnetic behavior, showing ferrimagnetic-like characteristics in the solid state and paramagnetic behavior in solution. With enhanced magnetic susceptibility, purified SDT-MagR can be directly attracted by a magnet in vitro, establishing it as a promising new platform for future biomagnetic manipulation and magnetogenetics applications. Full article
(This article belongs to the Topic Metalloproteins and Metalloenzymes, 2nd Edition)

29 pages, 1303 KB  
Article
An Enhanced Traffic Classifier Based on Self-Supervised Feature Learning
by Shaoqing Jiang, Xin Luo, Hongyi Wang, Gang Chen and Hongwei Zhao
Appl. Sci. 2026, 16(7), 3493; https://doi.org/10.3390/app16073493 - 3 Apr 2026
Abstract
Encrypted network traffic classification is an important research topic in the field of network security. Although deep learning-based methods have made progress, they still face three main challenges: first, the semantic information in encrypted traffic is inadequately represented, making it difficult for existing methods to effectively capture the hierarchical interaction relationships between packet-level and flow-level features; second, models rely on large amounts of labeled data for supervised training, resulting in high training costs and limited generalization ability in new scenarios; third, in existing self-supervised methods, the functions of the encoder and decoder are coupled, which restricts the full potential of the encoder’s representation learning. To address these issues, this paper proposes an Enhanced Traffic Classifier (ETC) based on self-supervised feature learning. The model first constructs a multi-level interactive traffic representation matrix, converting raw traffic into structured grayscale images that fuse packet-level and flow-level temporal features, thereby addressing the problem of missing semantic information. On this basis, an improved Masked Image Modeling Vision Transformer architecture is adopted. Through a three-stage decoupled design of encoder–regressor–decoder, the encoder focuses solely on feature extraction, the regressor performs masked representation prediction, and the decoder is only responsible for image reconstruction, thereby fully unleashing the encoder’s feature learning capability. Furthermore, during the fine-tuning stage, an Attentive Probing classification mechanism is introduced to replace the traditional linear classification head. By using learnable class query vectors to dynamically focus on semantic regions relevant to the classification target, the model’s recognition accuracy and robustness are further improved. 
Experiments are conducted on five public datasets, including USTC-TFC2016 and CICIoT2022, as well as a self-built Human-Internet dataset. The results show that ETC significantly outperforms mainstream methods such as YaTC and ET-BERT in core metrics including accuracy and F1-score, while also demonstrating strong generalization in few-shot scenarios. Full article
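The masked-image-modeling pretraining that ETC builds on can be illustrated with the masking step alone: a fixed fraction of patches of the grayscale traffic "image" is hidden before encoding, and reconstruction of the hidden patches is the pretext task. Patch size, mask ratio, and the zero fill are illustrative; the paper's decoupled regressor and decoder stages are omitted:

```python
import numpy as np

def mask_patches(img, patch=4, ratio=0.75, seed=0):
    """Randomly zero out a fraction of non-overlapping patches (sketch)."""
    h, w = img.shape
    ph, pw = h // patch, w // patch
    n = ph * pw
    rng = np.random.default_rng(seed)
    masked_ids = rng.choice(n, size=int(n * ratio), replace=False)
    out = img.copy()
    for pid in masked_ids:
        r, c = divmod(pid, pw)
        out[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = 0.0
    return out, masked_ids

gray = np.arange(256, dtype=float).reshape(16, 16)  # stand-in traffic "image"
masked, ids = mask_patches(gray)
```

In the three-stage design the abstract describes, only the visible patches would reach the encoder, the regressor would predict representations for `ids`, and the decoder alone would handle pixel reconstruction.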

23 pages, 1312 KB  
Article
From Text to Structure: Precise Cognitive Diagnosis via Semantic Enhancement and Dynamic Q-Matrix Calibration
by Jingxing Fan, Zhichang Zhang and Yuming Du
Appl. Sci. 2026, 16(7), 3477; https://doi.org/10.3390/app16073477 - 2 Apr 2026
Abstract
Traditional cognitive diagnosis models typically rely on expert-annotated Q-matrices to define the relationship between exercises and knowledge concepts. This process is not only highly subjective and costly, but also prone to introducing noise and bias, which directly affects diagnostic accuracy. Meanwhile, most existing deep learning-based methods overlook the rich semantic information contained in concept descriptions, making it difficult to deeply model the intrinsic relationships among knowledge points, resulting in limited interpretability of the models. To address these issues, this paper proposes a cognitive diagnosis model that incorporates key textual information from concept descriptions to refine the Q-matrix (KECQCD). The core innovation of the model lies in leveraging the pre-trained language model RoBERTa to encode concept texts, fusing semantic features with identifier embeddings through a gating mechanism to construct semantically-enhanced concept representations. It designs a concept-exercise heterogeneous information network and employs a graph attention mechanism to adaptively aggregate node features, explicitly modeling high-order knowledge dependencies. Furthermore, a multi-task joint learning framework is established to predict student performance while dynamically correcting association errors in the initial Q-matrix. Experimental results on the public Junyi dataset show that the KECQCD model significantly outperforms mainstream baseline models across multiple metrics, including accuracy (ACC), area under the curve (AUC), and root mean square error (RMSE). Ablation studies confirm the effectiveness of each core module, and diagnostic consistency (DOA) evaluation further demonstrates the enhanced interpretability of the model’s outcomes. This research offers a new solution for building accurate, reliable, and interpretable cognitive diagnosis systems, contributing positively to the advancement of personalized intelligent education. 
Full article

33 pages, 10259 KB  
Article
Multimodal Remote Sensing Image Classification Based on Dynamic Group Convolution and Bidirectional Guided Cross-Attention Fusion
by Lu Zhang, Yaoguang Yang, Zhaoshuang He, Guolong Li, Feng Zhao, Wenqiang Hua, Gongwei Xiao and Jingyan Zhang
Remote Sens. 2026, 18(7), 1066; https://doi.org/10.3390/rs18071066 - 2 Apr 2026
Abstract
The synergistic integration of Hyperspectral Imaging (HSI) and Light Detection and Ranging (LiDAR) data has become a pivotal strategy in remote sensing for precise land-cover classification. However, existing multimodal deep learning frameworks frequently suffer from intrinsic limitations, including rigid feature extraction protocols, underutilization of LiDAR-derived textural information, and asymmetric fusion mechanisms that fail to balance the contribution of spectral and elevation features effectively. To address these challenges, this paper proposes a novel framework named DGC-BCAF, which integrates Dynamic Group Convolution and Bidirectional Guided Cross-Attention Fusion to achieve adaptive feature representation and robust cross-modal interaction. First, a Dynamic Group Convolution (DGConv) module embedded within a ResNet18 backbone is designed to function as the central spatial context extractor. Unlike traditional group convolution, this module learns a dynamic relationship matrix to automatically group input channels, thereby facilitating flexible and context-aware feature representation that adapts to complex spatial distributions. Second, to overcome the insufficient exploitation of elevation data, we introduce a dedicated LiDAR texture encoding branch. This branch innovatively fuses Gray-Level Co-occurrence Matrix (GLCM) statistical features with multi-scale convolutional representations, capturing both geometric height information and fine-grained surface textural details that are critical for distinguishing objects with similar elevations. Finally, central to our architecture is the Bidirectional Cross-Attention Fusion (BCAF) module. Unlike standard unidirectional fusion approaches, BCAF employs a LiDAR geometry to guide the selection of salient spectral bands, while simultaneously utilizing spectral signatures to emphasize informative LiDAR channels. This mutual guidance ensures a balanced contribution from both modalities. 
Extensive experiments conducted on three benchmark datasets—Houston 2013, Trento, and MUUFL—demonstrate that DGC-BCAF consistently outperforms state-of-the-art methods in terms of overall accuracy, average accuracy, and Kappa coefficient. The results confirm that the proposed adaptive grouping and bidirectional guidance strategies significantly improve classification performance, particularly in distinguishing spectrally similar materials and delineating complex urban structures. Full article
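Bidirectional cross-attention of the kind BCAF describes can be sketched with two mirrored attention calls, one per modality: spectral tokens query the LiDAR tokens, and LiDAR tokens query the spectral tokens. Identity projections are used for brevity, and the token counts and dimensions are made up:

```python
import numpy as np

def softmax(s):
    s = s - s.max(axis=-1, keepdims=True)
    e = np.exp(s)
    return e / e.sum(axis=-1, keepdims=True)

def cross_attention(q_feat, kv_feat):
    """Queries from one modality attend over keys/values from the other
    (identity Q/K/V projections for brevity; illustrative only)."""
    scores = q_feat @ kv_feat.T / np.sqrt(q_feat.shape[-1])
    return softmax(scores) @ kv_feat

rng = np.random.default_rng(3)
hsi = rng.normal(size=(6, 16))     # spectral tokens
lidar = rng.normal(size=(4, 16))   # elevation/texture tokens

# Bidirectional guidance: each modality is re-expressed in terms of the other.
hsi_guided = cross_attention(hsi, lidar)
lidar_guided = cross_attention(lidar, hsi)
```

Running both directions is what distinguishes this from the standard unidirectional fusion the abstract contrasts against: neither modality is fixed as the sole "guide".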

19 pages, 1843 KB  
Article
Expert Knowledge-Infused Learning for Indoor Radio Propagation Environment Digital Twins
by Haotian Wang, Lili Xu, Yu Zhang, Tao Peng and Wenbo Wang
Sensors 2026, 26(7), 2199; https://doi.org/10.3390/s26072199 - 2 Apr 2026
Abstract
Digital Twin (DT) technology, which enables the simulation, evaluation, and optimization of physical entities through synchronized digital replicas, has attracted increasing attention in the context of wireless networks. Among the various components involved, the radio propagation environment is fundamental to communication performance, making its accurate digital replication a critical challenge. This paper focuses on constructing a high-precision radio propagation environment DT using deep learning (DL) methods. While data-driven DL has become a mainstream solution for signal propagation prediction in DTs, its performance depends heavily on the model’s ability to learn intrinsic propagation patterns from data. Owing to the complex interactions between wireless signals and environmental obstacles, conventional DL models often struggle to efficiently capture implicit propagation laws solely from raw data. To address this issue, we propose a general methodology for incorporating expert knowledge of radio propagation into DL frameworks. Building upon the widely adopted encoder–decoder architecture, the proposed approach explicitly integrates theoretical propagation knowledge to enhance learning efficiency and prediction accuracy. Ablation experiments demonstrate that the inclusion of expert knowledge significantly improves the performance of DL-based radio environment DTs. This work highlights the potential of knowledge–data dual-driven DL as a promising direction for advancing radio propagation environment DTs. Full article
(This article belongs to the Topic AI-Driven Wireless Channel Modeling and Signal Processing)

21 pages, 1291 KB  
Article
Development of a Software Model for Classification and Automatic Cataloging of Archive Documents
by Adilbek Dauletov, Bahodir Muminov, Noila Matyakubova, Uldona Abdurahmonova, Khurshida Bakhriyeva and Makhbubakhon Fayzieva
Information 2026, 17(4), 341; https://doi.org/10.3390/info17040341 - 1 Apr 2026
Abstract
This study proposes an integrated software model for automatic document classification and metadata generation based on the Dublin Core standard to address the issue of rapid and consistent management of archival documents in a digital environment. This approach combines the stages of receiving incoming documents, converting them to text using optical character recognition (OCR), image preprocessing (binarization, deskew, noise reduction), and text cleaning and vectorization (TF–IDF) into a single pipeline. In the document classification stage, the Bidirectional Encoder Representations from Transformers (BERT) model with a context-sensitive transformer architecture is used, along with classical machine learning models (Logistic Regression, Naive Bayes, Support Vector Machine) and an ensemble approach (LightGBM), to increase accuracy by modeling the document content at a deep semantic level. Experiments were conducted on the RVL-CDIP dataset; OCR efficiency was evaluated using the Character Error Rate (CER), and classification results were evaluated using the accuracy, precision, recall, and F1-score metrics. The results confirmed the high stability and generalization ability of the BERT (accuracy, 95.1%; F1, 95.0%) and LightGBM (accuracy, 93.2%; F1, 93.2%) models. In the final stage, OCR, NER, and classification outputs are automatically organized into Dublin Core metadata elements (Title, Creator, Date, Description, Subject, Type, Format, Language) and exported in JSON/XML formats. This automation significantly reduces manual cataloging effort and improves indexing and retrieval efficiency in digital archival systems. Full article
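The final cataloging step, mapping pipeline outputs onto Dublin Core elements and exporting JSON, can be sketched directly; all field values below are invented placeholders, not data from the study:

```python
import json

# Minimal sketch of the cataloging output: pipeline results (OCR text,
# NER entities, classifier label) organized into Dublin Core elements.
record = {
    "dc:title": "Annual report 1987",                         # from NER/heuristics
    "dc:creator": "Regional State Archive",                   # from NER
    "dc:date": "1987-12-31",                                  # from NER
    "dc:description": "OCR-extracted summary of the document body.",
    "dc:subject": "administrative report",                    # from the classifier
    "dc:type": "Text",
    "dc:format": "application/pdf",
    "dc:language": "uz",
}
payload = json.dumps(record, ensure_ascii=False, indent=2)    # JSON export
```

An XML export would serialize the same eight elements under the `dc:` namespace; the record structure is what makes the catalog entries consistent and machine-indexable.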
28 pages, 4366 KB  
Article
Temporal Transformer with Conditional Tabular GAN for Credit Card Fraud Detection: A Sequential Deep Learning Approach
by Jiaying Chen, Yiwen Liang, Jingyi Liu and Mengjie Zhou
Mathematics 2026, 14(7), 1183; https://doi.org/10.3390/math14071183 - 1 Apr 2026
Abstract
Credit card fraud detection remains a critical challenge in financial security, characterized by severe class imbalance and the need to capture complex temporal patterns in transaction sequences. Traditional machine learning approaches treat transactions as independent events, failing to model the sequential nature of user behavior and suffering from inadequate handling of minority class samples. In this paper, we propose an integrated framework that combines generative modeling and time-aware sequential learning for credit card fraud detection. Our approach addresses two fundamental limitations: (1) we model transaction histories as temporal sequences using a Transformer-based architecture that captures both long-term dependencies and abrupt behavioral changes through multi-head self-attention mechanisms, and (2) we employ CTGAN to generate high-quality synthetic fraudulent samples, providing more effective oversampling than conventional techniques like SMOTE. The Time-Aware Transformer incorporates temporal encoding and position-aware attention to preserve transaction order and time intervals, while CTGAN learns the complex conditional distributions of fraudulent transactions to produce realistic synthetic samples. We evaluate our framework on the IEEE-CIS Fraud Detection dataset, demonstrating significant improvements over representative classical and sequential deep-learning baselines. Experimental results show that our method achieves superior performance with an AUC-ROC of 0.982, precision of 0.891, recall of 0.876, and F1-score of 0.883, outperforming the representative baselines considered in this study, including traditional machine learning models, standalone deep learning architectures, and supervised sequential neural models. Ablation studies confirm the individual contributions of both the sequential modeling component and the generative oversampling strategy. Our work demonstrates that combining temporal sequence modeling with generative synthesis provides a robust solution for imbalanced fraud detection, with potential applications extending to other domains requiring sequential pattern recognition under extreme class imbalance. Full article
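One common way to inject irregular time intervals into a Transformer, as the temporal encoding mentioned above does, is to apply a sinusoidal encoding to the gaps between consecutive transactions rather than to positions. The sketch below shows this idea only; the paper's exact formulation and dimensionality are not specified in the abstract, so the `d_model=16` choice and the gap-based formulation are assumptions.

```python
import numpy as np

def time_aware_encoding(timestamps, d_model=16):
    """Sinusoidal encoding of inter-transaction time gaps (seconds).

    Unlike standard positional encoding, the phase is driven by the
    elapsed time between events, so two bursts of rapid transactions
    look different from evenly spaced ones.
    """
    gaps = np.diff(timestamps, prepend=timestamps[0])      # first gap = 0
    i = np.arange(d_model // 2)
    freqs = 1.0 / (10000 ** (2 * i / d_model))             # (d_model/2,)
    angles = gaps[:, None] * freqs[None, :]                # (T, d_model/2)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

# Five transactions with irregular spacing (seconds since a reference time)
ts = np.array([0.0, 30.0, 35.0, 3600.0, 3605.0])
enc = time_aware_encoding(ts)
print(enc.shape)  # (5, 16)
```

In a full model this encoding would be added to (or concatenated with) the per-transaction feature embeddings before the self-attention layers.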
23 pages, 13635 KB  
Article
Deep Reinforcement Learning for Autonomous Underwater Navigation: A Comparative Study with DWA and Digital Twin Validation
by Zamirddine Mari, Mohamad Motasem Nawaf and Pierre Drap
Sensors 2026, 26(7), 2179; https://doi.org/10.3390/s26072179 - 1 Apr 2026
Abstract
Autonomous navigation in underwater environments is challenged by the absence of GPS, degraded visibility, and submerged obstacles. This article investigates these issues using the BlueROV2, an open platform for scientific experimentation. We propose a deep reinforcement learning approach based on the Proximal Policy Optimization (PPO) algorithm, using an observation space that combines target-oriented navigation information, a virtual occupancy grid, and raycasting along the boundaries of the operational area. This information is encoded into a high-dimensional observation space of 84 dimensions, providing the agent with comprehensive local and global situational awareness. The learned policy is compared against a reference deterministic kinematic planner, the Dynamic Window Approach (DWA), a robust baseline for obstacle avoidance. The evaluation is conducted in a realistic simulation environment and complemented by validation on a physical BlueROV2 supervised by a 3D digital twin of the test site, reducing the risks associated with real-world experimentation. The results show that the PPO policy consistently outperforms DWA in highly cluttered environments, owing to better local adaptation and fewer collisions. Finally, experiments demonstrate the transferability of the learned behavior from simulation to the real world, confirming the relevance of deep RL for autonomous navigation in underwater robotics. Full article
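An 84-dimensional observation of the kind described above can be assembled by concatenating the three cue types into one flat vector. The 4 + 64 + 16 split below is purely an assumption for illustration (the abstract states only the total dimensionality), as are the `build_observation` helper and the component semantics in the comments.

```python
import numpy as np

def build_observation(target_vec, occupancy_grid, ray_distances):
    """Concatenate local and global cues into a single observation vector.

    Assumed split (illustrative only):
      4  target-oriented values (e.g. distance, bearing, depth/heading error)
      64 cells of an 8x8 virtual occupancy grid around the vehicle
      16 normalized raycast distances to the operational-area boundaries
    """
    assert target_vec.shape == (4,)
    assert occupancy_grid.shape == (8, 8)
    assert ray_distances.shape == (16,)
    return np.concatenate([target_vec, occupancy_grid.ravel(), ray_distances])

obs = build_observation(
    target_vec=np.array([0.7, 0.1, -0.05, 0.0]),
    occupancy_grid=np.zeros((8, 8)),   # free space everywhere
    ray_distances=np.ones(16),         # boundaries at maximum range
)
print(obs.shape)  # (84,)
```

A vector of this shape would then be fed to the PPO policy and value networks at every control step.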
