Search Results (1,626)

Search Parameters:
Keywords = streaming feature

22 pages, 660 KB  
Article
Symmetry-Aware Dynamic Graph Learning for One-Step Scenic-Spot Visitor Demand Forecasting
by Wenliang Cheng, Yiqiang Wang, Yulong Xiao and Yuxue Xiao
Symmetry 2026, 18(3), 449; https://doi.org/10.3390/sym18030449 (registering DOI) - 6 Mar 2026
Abstract
Accurate one-step forecasting of scenic-spot visitor demand is challenging due to strong non-stationarity, holiday-induced peaks, and abrupt reputation-driven shocks. We propose a symmetry-aware dynamic graph learning framework that fuses social–physical sensing streams for robust demand prediction. Online reviews are treated as social sensing, transformed into daily sentiment indicators, and aligned with demand using a delay-aware aggregation scheme. To capture evolving inter-spot dependencies, we construct a time-varying adjacency matrix that is integrated into a lightweight spatio-temporal forecasting model, Dynamic Spatio-temporal Graph Attention LSTM (DSGAT-LSTM). The model preserves the permutation-invariant property of graph learning while introducing sentiment-guided feature reweighting and sentiment-gated temporal updates to better track volatility. Experiments on multi-year daily data from multiple A-level scenic spots with holiday and weather context demonstrate consistent error reductions over representative temporal and graph-based baselines, together with improved stability under peak and shock conditions. We will release the processed feature-level dataset and implementation scripts to support reproducibility. Full article
(This article belongs to the Special Issue Advances in Machine Learning and Symmetry/Asymmetry)
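The sentiment-gated temporal update mentioned in the abstract can be illustrated with a minimal sketch: a sigmoid gate driven by the daily sentiment indicator blends a candidate state with the previous one. The gate parameters and toy values below are hypothetical, not taken from the paper.

```python
import math

def sentiment_gated_update(h_prev, h_candidate, sentiment, w=1.0, b=0.0):
    """Blend the previous hidden state with a candidate state.

    A sigmoid gate driven by the daily sentiment indicator decides how much
    of the candidate update is let through; w and b are illustrative gate
    parameters, not values from the paper.
    """
    g = 1.0 / (1.0 + math.exp(-(w * sentiment + b)))  # gate in (0, 1)
    return [g * c + (1.0 - g) * p for p, c in zip(h_prev, h_candidate)]

# Neutral sentiment (0.0) gives g = 0.5: an even blend of old and new state.
state = sentiment_gated_update([0.0, 2.0], [1.0, 0.0], sentiment=0.0)
```

Strongly positive sentiment pushes the gate toward 1, letting the new observation dominate; this is one plausible way volatility tracking could be sentiment-gated.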

13 pages, 2998 KB  
Article
Inhomogeneous Fluid Motion Induced by Standing Surface Acoustic Wave (SAW): A Finite Element Study
by Jialong Hu, Chao Zhang and Yufeng Zhou
Micromachines 2026, 17(3), 330; https://doi.org/10.3390/mi17030330 (registering DOI) - 6 Mar 2026
Abstract
Acoustofluidics has emerged as a transformative technology for contact-free manipulation of microparticles and fluids in microscale systems. Although bulk acoustic waves (BAWs) are known to displace inhomogeneous fluids through acoustic radiation force (ARF) acting at fluid interfaces, the capability of surface acoustic waves (SAWs) to produce analogous relocation phenomena remains largely unexplored. This study addresses a critical gap in acoustofluidic theory by presenting the first comprehensive finite element method investigation of SAW-driven motion of inhomogeneous fluid confined within microchannels of widths equal to one full or one-half SAW wavelength. Unlike BAW-based systems that generate uniform pressure fields across channel heights, SAW devices exhibit inherently nonuniform vertical pressure distributions and intense near-boundary streaming—features that fundamentally alter fluid relocation dynamics. Our simulations demonstrate that despite high-frequency operation (6.65 MHz) and strong ARF, standing SAW fields fail to achieve stable fluid relocation in both initially stable and unstable configurations due to vertical pressure stratification and rapid floor-level streaming. Nevertheless, these same characteristics generate vigorous transverse folding flows that enable exceptionally rapid homogenization, offering a distinct acoustofluidic mechanism for on-chip mixing. These findings not only elucidate fundamental physical differences between BAW and SAW actuation in multiphase microfluidic systems but also establish design principles for SAW-induced microfluidic mixers. The results provide crucial theoretical guidance for device optimization where rapid homogenization is desired over stable stratification. Full article

18 pages, 919 KB  
Article
Development of a Machine Learning-Based Predictive Model and Clinically Oriented Web Application for 30-Day Mortality Following Cardiac Surgery
by Telmo Miguel-Medina, Susel Góngora Alonso, Isabel de la Torre Díez, Miriam Blanco Sáez, Hector Lazaro Arrechea Elissalt, Atenea Ruigómez Noriega and María Lourdes del Río Solá
Sensors 2026, 26(5), 1656; https://doi.org/10.3390/s26051656 (registering DOI) - 5 Mar 2026
Abstract
This study aimed to develop and validate a machine learning-based model for predicting 30-day mortality in cardiac surgery patients and to implement a functional, clinician-oriented web application that enables the real-time use of the model. A retrospective cohort of 325 cardiac surgery patients was analysed using supervised machine learning. After preprocessing and clinical feature selection, several models were trained and evaluated through cross-validation. XGBoost achieved the best results, with an AUC-ROC of 0.968, recall of 0.800, and Brier score of 0.058. To facilitate clinical usability, a web-based application was developed using Streamlit, enabling clinicians to input patient data and predict mortality in real time. The application includes SHAP-based explainability for each prediction, thereby ensuring model transparency. Preliminary feedback from clinicians indicated that the tool was intuitive and informative and showed potential for preoperative risk assessment. The integration of a robust ML (machine learning) model with a functional clinical application offers a practical tool for supporting decision-making in cardiac surgery. This combined approach enhances both accuracy and accessibility, which are key to real-world impacts. Future work will involve multicentre validation and user-centred refinement. Full article
(This article belongs to the Special Issue Novel Implantable Sensors and Biomedical Applications)
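The two less familiar metrics quoted above are easy to reproduce: recall is true positives over actual positives, and the Brier score is the mean squared gap between predicted probability and outcome (lower is better). A minimal sketch with made-up predictions, not the study's data:

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def recall(y_true, y_pred):
    """True positives over all actual positives."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    return tp / sum(y_true)

# Toy cohort of 4 patients with 2 actual deaths (labels and probabilities
# are illustrative only).
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]
# brier_score = ((0.1)**2 + (0.2)**2 + (0.6)**2 + (0.1)**2) / 4 = 0.105
```

A well-calibrated, accurate model drives the Brier score toward 0; the paper's 0.058 indicates probability estimates that track observed outcomes closely.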

23 pages, 3615 KB  
Article
A Foundational Edge-AI Sensing Framework for Occupancy-Driven Energy Management in SMOs
by Yutong Chen, Daisuke Sumiyoshi, Xiangyu Wang, Takahiro Yamamoto, Takahiro Ueno and Jewon Oh
IoT 2026, 7(1), 25; https://doi.org/10.3390/iot7010025 - 5 Mar 2026
Abstract
Occupant presence is a primary driver of Heating, Ventilation, and Air Conditioning (HVAC) and lighting energy consumption in office environments. Existing occupancy-sensing solutions often rely on privacy-sensitive modalities or require costly infrastructure, limiting their applicability in Small and Medium Offices (SMOs). To address these limitations, this study proposes a lightweight CSI-based occupancy-sensing framework based on a dual-core ESP32-S3 architecture, enabling concurrent CSI processing, environmental sensing, and cloud communication. A multi-stage signal preprocessing pipeline compresses raw CSI streams into a compact 56×8 statistical feature matrix, achieving 98.86% classification accuracy for multi-level occupancy estimation. Compared with image-based baselines such as DenseNet121, the proposed approach reduces input data size to 24 kB and model parameters to 138 K, yielding over 129× reduction in transmission volume without sacrificing performance. These results demonstrate that the proposed framework provides a practical, privacy-preserving, and edge-deployable solution for occupancy-aware energy management in SMOs. Full article
(This article belongs to the Special Issue IoT Meets AI: Driving the Next Generation of Technology)
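The 56×8 statistical feature matrix suggests one row of summary statistics per subcarrier. The sketch below compresses a raw stream this way; the particular choice of eight statistics is an assumption for illustration, not the paper's exact feature set:

```python
import statistics

def csi_feature_matrix(csi_stream):
    """Compress a raw CSI stream (subcarriers x samples) into one row of
    eight summary statistics per subcarrier."""
    matrix = []
    for samples in csi_stream:
        s = sorted(samples)
        matrix.append([
            statistics.mean(samples),
            statistics.pstdev(samples),
            min(samples),
            max(samples),
            statistics.median(samples),
            max(samples) - min(samples),   # peak-to-peak range
            s[len(s) // 4],                # approximate lower quartile
            s[(3 * len(s)) // 4],          # approximate upper quartile
        ])
    return matrix

# 56 subcarriers, 100 samples each -> a 56x8 feature matrix.
stream = [[(i + j) % 7 for j in range(100)] for i in range(56)]
features = csi_feature_matrix(stream)
```

Whatever the exact statistics, the payoff is the same: 100 raw samples per subcarrier collapse to 8 numbers, which is where the reported reduction in transmission volume comes from.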
22 pages, 3288 KB  
Article
An Intelligent Real-Time System for Sentence-Level Recognition of Continuous Saudi Sign Language Using Landmark-Based Temporal Modeling
by Adel BenAbdennour, Mohammed Mukhtar, Osama Almolike, Bilal A. Khawaja and Abdulmajeed M. Alenezi
Sensors 2026, 26(5), 1652; https://doi.org/10.3390/s26051652 - 5 Mar 2026
Abstract
A persistent challenge for Deaf and Hard-of-Hearing individuals is the communication gap between sign language users and the hearing community, particularly in regions with limited automated translation resources. In Saudi Arabia, this gap is amplified by the reliance on Saudi Sign Language (SSL) and the scarcity of real-time, sentence-level translation systems. This paper presents a real-time system for sentence-level recognition of continuous SSL and direct mapping to natural spoken Arabic. The proposed system operates end-to-end on live video streams or pre-recorded content, extracting spatio-temporal landmark features using the MediaPipe Holistic framework. For classification, the input feature vector consists of 225 features derived from hand and body pose landmarks. These features are processed by a Bidirectional Long Short-Term Memory (BiLSTM) network trained on the ArabSign (ArSL) dataset to perform direct sentence-level classification over a vocabulary of 50 continuous Arabic sign language sentences, supported by an idle-based segmentation mechanism that enables natural, uninterrupted signing. Experimental evaluation demonstrates robust generalization: under a Leave-One-Signer-Out (LOSO) cross-validation protocol, the model attains a mean sentence-level accuracy of 94.2%, outperforming the fixed signer-independent split baseline of 92.07%, while maintaining real-time performance suitable for interactive use. To enhance linguistic fluency, an optional post-recognition refinement stage is incorporated using a large language model (LLM), followed by text-to-speech synthesis to produce audible Arabic output; this refinement operates strictly as post-processing and is not included in the reported recognition accuracy metrics. The results demonstrate that direct sentence-level modeling, combined with landmark-based feature extraction and real-time segmentation, provides an effective and practical solution for continuous SSL sentence recognition in real-time. 
Full article
(This article belongs to the Special Issue Sensor Systems for Gesture Recognition (3rd Edition))
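The 225-dimensional input is consistent with MediaPipe Holistic's landmark counts: two hands × 21 landmarks × (x, y, z) plus 33 pose landmarks × (x, y, z) = 126 + 99 = 225. A minimal sketch of assembling such a per-frame vector (the field ordering is an assumption):

```python
def build_feature_vector(left_hand, right_hand, pose):
    """Flatten per-frame landmarks into one 225-dim feature vector.

    left_hand / right_hand: 21 (x, y, z) tuples each (zeros if undetected);
    pose: 33 (x, y, z) tuples.  2*21*3 + 33*3 = 225 features.
    """
    vec = []
    for landmarks, expected in ((left_hand, 21), (right_hand, 21), (pose, 33)):
        assert len(landmarks) == expected
        for x, y, z in landmarks:
            vec.extend((x, y, z))
    return vec

hand = [(0.1, 0.2, 0.0)] * 21
pose = [(0.5, 0.5, 0.1)] * 33
frame_features = build_feature_vector(hand, hand, pose)
```

A sequence of such frame vectors is what a BiLSTM like the one described would consume for sentence-level classification.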

22 pages, 2688 KB  
Article
SOP: Selective Orthogonal Projection for Composed Image Retrieval
by Su Cheng and Guoyang Liu
Sensors 2026, 26(5), 1621; https://doi.org/10.3390/s26051621 - 4 Mar 2026
Abstract
The proliferation of intelligent sensor networks in urban surveillance and remote sensing has triggered the explosive growth of unstructured visual sensor data. Accurately retrieving targets from these massive streams based on complex cross-modal user intents remains a critical bottleneck for efficient intelligent perception. Composed Image Retrieval (CIR) addresses this by enabling retrieval via a multi-modal query that combines a reference image with semantic control signals. However, existing methods often struggle with abstract instructions in real-world scenarios. Consequently, models often suffer from feature distribution shifts due to focus ambiguity, as well as semantic erosion caused by highly entangled visual and textual features. To address these challenges, we propose a geometry-based Selective Orthogonal Projection Network (SOP). First, the Selective Focus Recovery module quantifies instruction uncertainty via information entropy and calibrates shifted query features to the true target distribution using structural consistency regularization. Second, to ensure data fidelity, we introduce Orthogonal Subspace Projection and Geometric Composition Fidelity. These mechanisms employ Gram–Schmidt orthogonalization to decouple features into a constant visual base and an orthogonal modification increment, restricting semantic modifications to the null space. Extensive experiments on FashionIQ, Shoes, and CIRR datasets demonstrate that SOP significantly outperforms SOTA methods, offering a novel solution for efficient large-scale sensor data retrieval and analysis. Full article
(This article belongs to the Section Intelligent Sensors)
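The orthogonal decoupling step reduces to a single Gram–Schmidt projection: strip from the textual modification its component along the visual base vector, so the edit acts only in directions orthogonal to the preserved base. A toy-vector sketch (function name and values are hypothetical):

```python
def orthogonal_increment(visual_base, modification):
    """Remove from `modification` its component along `visual_base`,
    leaving an increment orthogonal to the preserved visual feature."""
    dot = sum(v * m for v, m in zip(visual_base, modification))
    norm_sq = sum(v * v for v in visual_base)
    return [m - (dot / norm_sq) * v for v, m in zip(visual_base, modification)]

base = [1.0, 0.0, 0.0]          # constant visual base direction
mod = [0.7, 0.5, -0.2]          # raw textual modification
inc = orthogonal_increment(base, mod)   # component along base removed
```

Because the increment is orthogonal to the base, composing `base + inc` leaves the base's contribution intact, which is the "constant visual base" property the abstract describes.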

21 pages, 14880 KB  
Article
Beyond the Black Box: Interpretable Multi-Trait Essay Scoring with Trait-Aware Transformer
by Xiaoyi Tang
Electronics 2026, 15(5), 1066; https://doi.org/10.3390/electronics15051066 - 4 Mar 2026
Abstract
The rapid advancement of automated essay scoring (AES) has been constrained by a representation bottleneck, where monolithic models collapse diverse facets of writing constructs into a single, uninterpretable signal, undermining the pedagogical value of multi-dimensional rating traits. To address this limitation, the RoBERTa-based Trait-Aware Transformer (RoBERTa-TAT) is introduced. This architectural reframing replaces unified pooling with parallel, trait-specific attention streams, preserving and disentangling critical features such as conceptual depth and mechanical precision. Tested on the ASAP Dataset-7, RoBERTa-TAT attains a new state-of-the-art Quadratic Weighted Kappa (QWK) of 0.936, outperforming sequential baselines and conventional Transformer variants. Beyond gains in accuracy, this trait-specialized architecture recasts scoring from a black-box prediction into a transparent diagnostic tool, enabling actionable, fine-grained feedback at different rating traits. High-resolution inspection reveals that the model’s internal representations correlate with specific linguistic markers—such as discourse connectives for organization—suggesting a degree of structural alignment with expert judgment. By aligning high-capacity representation learning with the granular demands of formative assessment, RoBERTa-TAT provides a practical, interpretable blueprint for deploying accountable AI in education and broadening access to expert diagnostic insight. Full article
(This article belongs to the Section Artificial Intelligence)
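The parallel trait-specific attention streams replace a single pooled embedding with one attention-weighted summary per trait. A minimal sketch of attention pooling for one such trait (token vectors and scores are toy values, and the scoring network is omitted):

```python
import math

def attention_pool(hidden_states, scores):
    """Softmax-normalize per-token scores, then take the weighted sum of
    token representations: one pooled vector for one scoring trait."""
    m = max(scores)
    exp = [math.exp(s - m) for s in scores]   # numerically stable softmax
    total = sum(exp)
    weights = [e / total for e in exp]
    dim = len(hidden_states[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden_states))
            for d in range(dim)]

tokens = [[1.0, 0.0], [0.0, 1.0]]
pooled = attention_pool(tokens, scores=[0.0, 0.0])  # equal scores -> mean
```

Running one such pooling head per trait, each with its own learned scores, is what lets the model attend to discourse connectives for organization while a sibling head attends to mechanics.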

23 pages, 1094 KB  
Article
Exploring the Limits of Probes for Latent Representation Edits in GPT Models
by Austin L. Davis, Robinson Vasquez Ferrer and Gita Sukthankar
AI 2026, 7(3), 92; https://doi.org/10.3390/ai7030092 - 4 Mar 2026
Abstract
This article evaluates the use of probing classifiers to modify the internal hidden state of a chess-playing transformer, which has been trained on sequences of chess moves and can generate new moves when prompted. Probing classifiers are a technique for understanding and modifying the operation of neural networks in which a smaller classifier is trained to use the model’s internal representation to learn a probing task. The aim of this research is to discover whether the learned model possesses an editable internal representation of the chess game, despite being trained without explicit information about the rules of chess. We contrast the performance of standard linear probes against Sparse Autoencoders (SAEs), a latent space interpretability technique designed to decompose polysemantic concepts into atomic features via an overcomplete basis. Our experiments demonstrate that linear probes trained directly on the residual stream significantly outperform probes based on SAE latents. When quantifying the success of interventions via the probability of legal moves, linear probe edits achieved an 88% success rate, whereas SAE-based edits yielded only 41%. These findings suggest that while SAEs are valuable for specific interpretability tasks, they do not enhance the controllability of hidden states compared to raw vectors. Finally, we show that the residual stream respects the Markovian property of chess, validating the feasibility of applying consistent edits across different time steps for the same board state. Full article
(This article belongs to the Section AI Systems: Theory and Applications)
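A linear probe edit typically nudges the residual-stream vector along the probe's weight direction until the probe's logit hits a target value. The closed-form step below is a common formulation of such interventions, given as an illustrative sketch rather than the authors' exact procedure:

```python
def probe_edit(hidden, w, b, target_logit):
    """Minimally move `hidden` so the linear probe w.h + b equals
    target_logit; the correction acts only along the probe direction w."""
    logit = sum(wi * hi for wi, hi in zip(w, hidden)) + b
    norm_sq = sum(wi * wi for wi in w)
    scale = (target_logit - logit) / norm_sq
    return [hi + scale * wi for wi, hi in zip(w, hidden)]

w, b = [2.0, 0.0], 0.0
h = [1.0, 3.0]                                      # probe logit = 2.0
h_edited = probe_edit(h, w, b, target_logit=6.0)    # logit now 6.0
```

The edit is minimal in the least-squares sense: it changes nothing orthogonal to `w`, which is why success can be measured by whether the edited state still decodes to legal moves.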

21 pages, 7035 KB  
Article
Feature Complementarity-Guided Multi-Weight Multi-Scale Fusion Framework for Underwater Image Enhancement
by Gaopeixuan Sang, Tianyu Cheng and Liang Hua
Appl. Sci. 2026, 16(5), 2451; https://doi.org/10.3390/app16052451 - 3 Mar 2026
Abstract
The selective wavelength absorption and scattering effects caused by complex underwater optical environments lead to a significant contradiction between color restoration and structural preservation in image enhancement. To break through this bottleneck, this paper proposes a multi-weight-guided hierarchical feature fusion framework, which transforms underwater image enhancement into a problem of optimal integration of multi-dimensional feature streams. Addressing underwater image degradation, the method constructs three complementary feature branches targeting visibility restoration, contrast enhancement, and texture compensation. Guided by multiple weights derived from Laplacian contrast, saliency, and saturation, a Laplacian and Gaussian pyramid-based multi-scale fusion strategy is designed, achieving the simultaneous preservation of global structure and enhancement of local high-frequency details. Experimental results on the SQUID real-world underwater open dataset demonstrate that, compared with eleven advanced algorithms, the proposed method exhibits high equilibrium and superiority in key metrics such as AG, IE, ENL, and UCIQE. Furthermore, its visual stability and robustness in complex and variable water environments are validated through the rank-sum composite evaluation method (RSCEM) and a refined scoring strategy. Full article
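Stripped of the pyramid machinery, the weight-guided fusion is a per-pixel convex combination: the three branch weight maps are normalized to sum to one at each pixel and used to blend the branch outputs. A single-scale sketch with toy values:

```python
def fuse_pixels(branches, weight_maps):
    """Blend corresponding pixels from several enhancement branches using
    per-pixel weights (e.g. Laplacian contrast, saliency, saturation),
    normalized so they sum to one at each pixel."""
    fused = []
    for px in range(len(branches[0])):
        total = sum(w[px] for w in weight_maps)
        fused.append(sum(b[px] * (w[px] / total)
                         for b, w in zip(branches, weight_maps)))
    return fused

# Three branch outputs for a 2-pixel "image" (toy values).
visibility = [0.2, 0.8]
contrast   = [0.6, 0.4]
texture    = [0.4, 0.6]
weights = ([1.0, 1.0], [1.0, 1.0], [2.0, 2.0])   # texture weighted double
out = fuse_pixels([visibility, contrast, texture], weights)
```

The paper's Laplacian/Gaussian pyramid version performs this same blend per scale, which suppresses the halo artifacts that naive single-scale weighted fusion produces at sharp weight transitions.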

31 pages, 3408 KB  
Article
Grad-CAM Enhanced Explainable Deep Learning for Multi-Class Lung Cancer Classification Using DE-SAMNet Model
by Murat Kılıç, Merve Bıyıklı, Abdulkadir Yelman, Hüseyin Fırat, Hüseyin Üzen, İpek Balikçi Çiçek and Abdulkadir Şengür
Diagnostics 2026, 16(5), 757; https://doi.org/10.3390/diagnostics16050757 - 3 Mar 2026
Viewed by 38
Abstract
Background/Objectives: Lung cancer (LC) is the leading cause of cancer-related mortality worldwide, making early and accurate diagnosis crucial for improving patient outcomes. Although chest computed tomography (CT) enables detailed assessment of lung abnormalities, manual interpretation is time-consuming, requires specialized expertise, and is prone to diagnostic variability. To address these challenges, this study proposes DE-SAMNet, a hybrid deep learning framework for automated multi-class LC classification from CT scans. Methods: The model integrates two pre-trained convolutional neural networks—DenseNet121 and EfficientNetB0—operating in parallel to extract complementary multi-scale features. A Spatial Attention Module (SAM) is applied to each feature stream to emphasize clinically important regions. Final classification is performed through a compact fusion mechanism involving global average pooling, batch normalization, and a fully connected layer. DE-SAMNet was evaluated on two datasets: a public dataset (IQ-OTH/NCCD) with benign, malignant, and normal cases, and a private clinical dataset including benign, malignant, cystic, and healthy cases. Results: On the public dataset, the model achieved a 99.00% F1-score, 98.41% recall, 99.64% precision, and 99.54% accuracy. On the private dataset, it obtained 95.96% accuracy, 95.99% precision, 96.04% F1-score, and 96.21% recall, outperforming existing approaches. To enhance reliability, explainable AI (XAI) techniques such as Grad-CAM were used to visualize the model’s decision rationale. The resulting heatmaps effectively highlight lesion-specific regions, offering transparency and supporting clinical interpretability. Conclusions: This explainability strengthens trust in automated predictions and demonstrates the clinical potential of the proposed system. Overall, DE-SAMNet delivers a highly accurate and interpretable solution for early LC detection. Full article
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
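The Grad-CAM heatmaps referenced above follow the standard recipe: average each channel's gradients to get a channel importance weight, then take a ReLU of the weighted sum of activation maps. A framework-free sketch with toy 2×2 maps:

```python
def grad_cam(activations, gradients):
    """activations/gradients: per-channel 2-D maps of equal shape.
    Returns the ReLU'd, gradient-weighted combination of activation maps."""
    rows, cols = len(activations[0]), len(activations[0][0])
    # Channel weight = global average of that channel's gradients.
    weights = [sum(sum(row) for row in g) / (rows * cols) for g in gradients]
    cam = [[0.0] * cols for _ in range(rows)]
    for w, a in zip(weights, activations):
        for r in range(rows):
            for c in range(cols):
                cam[r][c] += w * a[r][c]
    return [[max(0.0, v) for v in row] for row in cam]   # ReLU

acts  = [[[1.0, 0.0], [0.0, 1.0]]]   # one toy activation channel
grads = [[[1.0, 1.0], [1.0, 1.0]]]   # uniform gradient -> weight 1.0
heat = grad_cam(acts, grads)
```

In a real pipeline the maps come from the last convolutional block and the heatmap is upsampled to CT resolution; the ReLU keeps only regions that positively support the predicted class.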

21 pages, 1469 KB  
Article
Development of Surveillance Robots Based on Face Recognition Using High-Order Statistical Features and Evidence Theory
by Slim Ben Chaabane, Rafika Harrabi, Anas Bushnag and Hassene Seddik
J. Imaging 2026, 12(3), 107; https://doi.org/10.3390/jimaging12030107 - 28 Feb 2026
Viewed by 172
Abstract
The recent advancements in technologies such as artificial intelligence (AI), computer vision (CV), and Internet of Things (IoT) have significantly extended various fields, particularly in surveillance systems. These innovations enable real-time facial recognition processing, enhancing security and ensuring safety. However, mobile robots are commonly employed in surveillance systems to handle risky tasks that are beyond human capability. In this paper, we present a prototype of a cost-effective mobile surveillance robot built on the Raspberry PI 4, designed for integration into various industrial environments. This smart robot detects intruders using IoT and face recognition technology. The proposed system is equipped with a passive infrared (PIR) sensor and a camera for capturing live-streaming video and photos, which are sent to the control room through IoT technology. Additionally, the system uses face recognition algorithms to differentiate between company staff and potential intruders. The face recognition method combines high-order statistical features and evidence theory to improve facial recognition accuracy and robustness. High-order statistical features are used to capture complex patterns in facial images, enhancing discrimination between individuals. Evidence theory is employed to integrate multiple information sources, allowing for better decision-making under uncertainty. This approach effectively addresses challenges such as variations in lighting, facial expressions, and occlusions, resulting in a more reliable and accurate face recognition system. When the system detects an unfamiliar individual, it sends out alert notifications and emails to the control room with the captured picture using IoT. A web interface has also been set up to control the robot from a distance through Wi-Fi connection. The proposed face recognition method is evaluated, and a comparative analysis with existing techniques is conducted. 
Experimental results with 400 test images of 40 individuals demonstrate the effectiveness of combining various attribute images in improving human face recognition performance. Experimental results indicate that the algorithm can identify human faces with an accuracy of 98.63%. Full article
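Evidence-theory fusion of multiple feature sources typically uses Dempster's rule of combination to merge mass functions. A minimal two-source sketch restricted to singleton hypotheses (identities "A" and "B"; masses are toy values, and compound focal elements are omitted):

```python
def dempster_combine(m1, m2):
    """Combine two mass functions over the same singleton hypotheses
    (Dempster's rule, normalizing away the conflicting mass)."""
    combined = {h: m1[h] * m2[h] for h in m1}
    k = sum(combined.values())     # 1 - k is the mass on disagreeing pairs
    return {h: v / k for h, v in combined.items()}

# Two feature sources that both slightly favor identity "A".
m1 = {"A": 0.6, "B": 0.4}
m2 = {"A": 0.7, "B": 0.3}
fused = dempster_combine(m1, m2)
```

Note how two weak, agreeing sources yield a stronger combined belief in "A" than either source alone, which is the mechanism behind the claimed robustness to lighting, expression, and occlusion variations.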

19 pages, 1458 KB  
Article
A Dual-Stream Transformer with Self-Supervised Contrastive Training for fMRI-Based Autism Spectrum Disorder Classification
by Zirui Li and Lei Wang
Brain Sci. 2026, 16(3), 277; https://doi.org/10.3390/brainsci16030277 - 28 Feb 2026
Viewed by 93
Abstract
Background/Objectives: Autism Spectrum Disorder (ASD) diagnosis is difficult due to heterogeneity. Current Time-series Transformer (TST) methods cannot capture both dynamic and global brain connectivity simultaneously, which limits ASD classification performance. Methods: We propose TwoTST, a dual-stream Transformer that combines raw Region of Interest (ROI) time series and Pearson correlation matrices (PCC). We pre-train the two TST branches via self-supervised learning by randomly masking ROIs and PCC, use contrastive learning and fine-tuning for feature alignment, evaluate five fusion strategies, and analyze relative parameter changes during fine-tuning. Results: Experiments were conducted on the ABIDE I dataset using the CC200 atlas. Contrastive learning, pre-training, and the dual-stream structure improve mean AUC by 3–6%, 3–7%, and 3–4%, respectively. Attention Pooling is the optimal fusion strategy. Relative parameter changes are 0.32–0.44 for TST modules and 0.31–1.45 for contrastive projection heads. Conclusions: TwoTST effectively integrates dynamic and global connectivity for ASD identification. The proposed design outperforms single-stream models and provides a reliable approach for neuroimaging-based disorder classification. Full article
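The PCC branch's input is a ROI×ROI Pearson correlation matrix computed from the per-region time series; a minimal sketch of that construction:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(roi_series):
    """One time series per ROI -> symmetric PCC connectivity matrix."""
    n = len(roi_series)
    return [[pearson(roi_series[i], roi_series[j]) for j in range(n)]
            for i in range(n)]

# Three toy ROIs: the second tracks the first, the third runs opposite.
series = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
pcc = correlation_matrix(series)
```

With the CC200 atlas this yields a 200×200 matrix per subject, capturing the static, global connectivity that complements the raw time-series stream.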

29 pages, 56852 KB  
Article
MFE-DETR: Multimodal Feature-Enhanced Detection Transformer for RGB–Infrared Object Detection in Aerial Imagery
by Zekai Yan and Mu-Jiang-Shan Wang
Symmetry 2026, 18(3), 417; https://doi.org/10.3390/sym18030417 - 27 Feb 2026
Viewed by 94
Abstract
Multimodal object detection utilizing RGB and infrared (IR) imagery has become a critical research area for unmanned aerial vehicle (UAV) surveillance applications, providing reliable perception under various lighting and environmental conditions. Nevertheless, current methods encounter three primary challenges: (1) insufficient utilization of frequency-domain properties in heterogeneous modalities, (2) restricted adaptability in crossmodal feature integration across different environmental scenarios, and (3) inadequate modeling of fine-grained spatial relationships for accurate object localization. To overcome these limitations, we introduce MFE-DETR, a novel Multimodal Feature-Enhanced Detection Transformer that achieves superior RGB-IR fusion through three complementary innovations. First, we present the Dual-Modality Enhancement Module (DMEM) with two specialized processing streams: the Haar wavelet decomposition stream (HWD-Stream) that conducts multi-resolution frequency-domain analysis to independently enhance low-frequency structural components and high-frequency textural information, and the Attention-guided Kolmogorov–Arnold Refinement Stream (AKR-Stream) that employs learnable spline-parameterized activation functions for adaptive nonlinear feature refinement. Second, we enhance the Cross-scale Channel Feature Fusion module by integrating an Adaptive Feature Fusion Module (AFAM) with complementary gating mechanisms that dynamically adjust modality contributions according to spatial informativeness. Third, we introduce the Bilinear Attention-Enhanced Detection Module (BADM) that models second-order feature interactions through factorized bilinear pooling, facilitating fine-grained crossmodal correlation analysis. Extensive experiments on the DroneVehicle benchmark show that MFE-DETR attains 78.6% mAP50 and 57.8% mAP50:95, outperforming state-of-the-art approaches by 5.3% and 3.7%, respectively. 
Additional evaluations on the VisDrone dataset further confirm the excellent generalization performance of our method, especially for small object detection with 18.6% APS, achieving a 1.5% improvement over existing techniques. Comprehensive ablation studies and visualizations offer detailed insights into the effectiveness of each proposed component. Full article
(This article belongs to the Section Computer)
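The HWD-Stream's Haar decomposition splits a signal into a low-frequency (average) half carrying structure and a high-frequency (difference) half carrying edges and texture. One level of the 1-D transform, as a sketch (the paper applies the 2-D analogue to feature maps):

```python
import math

def haar_level(signal):
    """One level of the 1-D Haar wavelet transform (even-length input):
    returns (approximation, detail) coefficient lists."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

low, high = haar_level([4.0, 4.0, 2.0, 0.0])
# low  captures smooth structure; high is nonzero only where
# neighboring samples differ (the 2.0 -> 0.0 edge).
```

Enhancing the two coefficient sets independently, then inverting the transform, is what lets the module strengthen low-frequency structure and high-frequency texture separately.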
28 pages, 8658 KB  
Article
Time–Frequency Respiratory Impedance Maps Enable Within-Breath Deep Learning for Small Airway Dysfunction Identification
by Dongfang Zhao, Sunxiaohe Li, Peng Wang, Pang Wu, Zhenfeng Li, Lidong Du, Xianxiang Chen, Ting Yang, Jingen Xia and Zhen Fang
Bioengineering 2026, 13(3), 280; https://doi.org/10.3390/bioengineering13030280 - 27 Feb 2026
Abstract
Small airway dysfunction (SAD) is an early functional abnormality associated with multiple chronic airway diseases. However, clinical assessment often relies on spirometry-based indices, which require forced maneuvers and are sensitive to subject effort, thereby increasing patient burden and complicating quality control. In contrast, Impulse Oscillometry (IOS) requires only tidal breathing, imposing minimal subject burden while providing respiratory impedance indices informative for SAD identification. This study proposes a dual-domain complementary deep learning framework based on IOS for SAD identification, leveraging within-breath impedance dynamics. Specifically, raw IOS time-series signals are transformed into time–frequency respiratory impedance maps (TFRIM) capturing impedance over frequency and within-breath time. A two-stream architecture is then used to jointly learn complementary features from TFRIM and the original time-series signals. To mitigate inter-subject baseline variability, we further introduce a demographics-driven adaptive feature modulation module for subject-specific calibration. The model jointly predicts multiple small-airway indices, with decision-level fusion applied during inference. Experimental validation on 2510 subjects using five-fold cross-validation demonstrates that the proposed framework achieves an accuracy of 81.39%, outperforming representative baselines. These results suggest the potential utility of combining within-breath IOS dynamics with subject-specific calibration for SAD identification, warranting further external validation before screening deployment. Full article
(This article belongs to the Special Issue AI-Driven Approaches to Diseases Detection and Diagnosis)
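The TFRIM representation resolves impedance jointly over frequency and within-breath time. A generic way to build such a map from a 1-D signal is a short-time Fourier magnitude array (frequency bins x analysis frames); the sketch below, on a synthetic signal, illustrates that general construction and is not the authors' exact preprocessing pipeline:

```python
import numpy as np

def time_frequency_map(sig: np.ndarray, win: int = 64, hop: int = 32) -> np.ndarray:
    """Short-time Fourier magnitude map: rows are frequency bins,
    columns are overlapping analysis frames across the record."""
    window = np.hanning(win)
    frames = []
    for start in range(0, len(sig) - win + 1, hop):
        seg = sig[start:start + win] * window      # windowed segment
        frames.append(np.abs(np.fft.rfft(seg)))    # magnitude spectrum
    return np.stack(frames, axis=1)                # shape: (win//2 + 1, n_frames)

fs = 256
t = np.arange(fs * 4) / fs                 # 4 s synthetic record
sig = np.sin(2 * np.pi * 5 * t)            # single 5 Hz oscillatory component
tfm = time_frequency_map(sig)
```

Each column captures the spectrum at one moment of the breath, so within-breath impedance dynamics become a 2-D image a convolutional stream can consume alongside the raw time series.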
25 pages, 5720 KB  
Article
MuRDE-FPN: Precise UAV Localization Using Enhanced Feature Pyramid Network
by Monika Kisieliūtė and Ignas Daugėla
Drones 2026, 10(3), 162; https://doi.org/10.3390/drones10030162 - 27 Feb 2026
Abstract
Unmanned aerial vehicles (UAVs) require reliable autonomous positioning independent of external satellite navigation signals, motivating the development of a vision-based, end-to-end finding point in map (FPI) framework. This study introduces MuRDE-FPN, an enhanced feature pyramid network (FPN) designed for precise UAV localization, building upon a lightweight one-stream transformer-based (OS-PCPVT) backbone. MuRDE-FPN integrates efficient channel attention (ECA) for adaptive channel recalibration and features two novel components: a multi-receptive deformable enhancement (MuRDE) that utilizes deformable convolutions with varying kernel sizes to refine the semantically rich final feature layer, and a feature alignment module (FAM) for cross-level fusion. Evaluated on the UL14 dataset and a new, more diverse UAV-Sat dataset, MuRDE-FPN consistently outperformed four state-of-the-art FPI methods (FPI, WAMF-FPI, OS-FPI, DCD-FPI). It achieved a relative distance score of 84.26 on UL14 and 63.74 on UAV-Sat datasets, demonstrating improved localization. Ablation studies confirmed the cumulative benefits of ECA, MuRDE, and FAM. These findings highlight the effectiveness of custom FPN designs and targeted feature enhancements for precise cross-view positioning, with MuRDE-FPN providing a robust solution and the UAV-Sat dataset offering a new benchmark for evaluation. Future efforts will address computational efficiency and performance across varying data quality environments. Full article
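The ECA mechanism used here for channel recalibration pools each channel to a scalar descriptor, applies a small 1-D convolution across neighboring channels, and gates the feature map with a sigmoid. A minimal NumPy sketch, using a uniform stand-in kernel where the real module learns its weights:

```python
import numpy as np

def eca(feat: np.ndarray, k: int = 3) -> np.ndarray:
    """Efficient channel attention over a (C, H, W) feature map:
    global average pooling, a 1-D convolution of size k across the
    channel dimension, and a sigmoid gate that rescales each channel."""
    c = feat.shape[0]
    pooled = feat.mean(axis=(1, 2))                 # (C,) channel descriptors
    pad = k // 2
    padded = np.pad(pooled, pad, mode="edge")       # same-length 1-D conv
    kernel = np.full(k, 1.0 / k)                    # stand-in for learned weights
    conv = np.array([np.dot(padded[i:i + k], kernel) for i in range(c)])
    gate = 1.0 / (1.0 + np.exp(-conv))              # sigmoid attention weights
    return feat * gate[:, None, None]               # recalibrated feature map

feat = np.ones((4, 2, 2))                           # toy 4-channel feature map
out = eca(feat)
```

Because the convolution only mixes adjacent channels, the recalibration stays lightweight — a few parameters per layer rather than the full-channel projections of heavier attention blocks.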