Search Results (6,447)

Search Parameters:
Keywords = multi-scale network

22 pages, 3358 KB  
Article
MultiScaleSleepNet: A Hybrid CNN–BiLSTM–Transformer Architecture with Multi-Scale Feature Representation for Single-Channel EEG Sleep Stage Classification
by Cenyu Liu, Qinglin Guan, Wei Zhang, Liyang Sun, Mengyi Wang, Xue Dong and Shuogui Xu
Sensors 2025, 25(20), 6328; https://doi.org/10.3390/s25206328 - 13 Oct 2025
Abstract
Accurate automatic sleep stage classification from single-channel EEG remains challenging due to the need for effective extraction of multiscale neurophysiological features and modeling of long-range temporal dependencies. This study aims to address these limitations by developing an efficient and compact deep learning architecture tailored for wearable and edge device applications. We propose MultiScaleSleepNet, a hybrid convolutional neural network–bidirectional long short-term memory–transformer architecture that extracts multiscale temporal and spectral features through parallel convolutional branches, followed by sequential modeling using a BiLSTM memory network and transformer-based attention mechanisms. The model obtained an accuracy, macro-averaged F1 score, and kappa coefficient of 88.6%, 0.833, and 0.84 on the Sleep-EDF dataset; 85.6%, 0.811, and 0.80 on the Sleep-EDF Expanded dataset; and 84.6%, 0.745, and 0.79 on the SHHS dataset. Ablation studies indicate that attention mechanisms and spectral fusion consistently improve performance, with the most notable gains observed for stages N1, N3, and rapid eye movement. MultiScaleSleepNet demonstrates competitive performance across multiple benchmark datasets while maintaining a compact size of 1.9 million parameters, suggesting robustness to variations in dataset size and class distribution. The study supports the feasibility of real-time, accurate sleep staging from single-channel EEG using parameter-efficient deep models suitable for portable systems. Full article
(This article belongs to the Special Issue AI on Biomedical Signal Sensing and Processing for Health Monitoring)
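The multi-scale front end described in the abstract, parallel convolutional branches with different kernel sizes over a single-channel signal, can be sketched in a few lines. This is a hedged illustration with fixed averaging kernels (a real model learns its filters end to end); the function name and kernel sizes below are invented for the example, not taken from the paper.

```python
import numpy as np

def multiscale_branches(x, kernel_sizes=(3, 7, 15)):
    """Parallel 1-D convolution branches at several kernel sizes.

    Simplified sketch: each branch is a moving-average filter, so
    small kernels keep fast transients and large kernels keep slow
    rhythms. Branch outputs are stacked along a feature axis, as in
    multi-scale CNN front ends.
    """
    feats = []
    for k in kernel_sizes:
        kernel = np.ones(k) / k
        feats.append(np.convolve(x, kernel, mode="same"))
    return np.stack(feats, axis=0)  # shape: (num_branches, len(x))

eeg = np.sin(np.linspace(0, 8 * np.pi, 300))  # toy single-channel signal
F = multiscale_branches(eeg)
print(F.shape)  # (3, 300)
```

In the paper's setting, such branch outputs would feed the BiLSTM and transformer stages for temporal modeling; here they simply demonstrate the parallel-branch shape contract.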
19 pages, 4130 KB  
Article
Deep Learning Application of Fruit Planting Classification Based on Multi-Source Remote Sensing Images
by Jiamei Miao, Jian Gao, Lei Wang, Lei Luo and Zhi Pu
Appl. Sci. 2025, 15(20), 10995; https://doi.org/10.3390/app152010995 - 13 Oct 2025
Abstract
With global climate change, urbanization, and agricultural resource limitations, precision agriculture and crop monitoring are crucial worldwide. Integrating multi-source remote sensing data with deep learning enables accurate crop mapping, but selecting optimal network architectures remains challenging. To improve remote sensing-based fruit planting classification and support orchard management and rural revitalization, this study explored feature selection and network optimization. We proposed an improved CF-EfficientNet model (incorporating FGMF and CGAR modules) for fruit planting classification. Multi-source remote sensing data (Sentinel-1, Sentinel-2, and SRTM) were used to extract spectral, vegetation, polarization, terrain, and texture features, thereby constructing a high-dimensional feature space. Feature selection identified 13 highly discriminative bands, forming an optimal dataset of preferred bands (PBs). Two classification datasets, multi-spectral bands (MS) and preferred bands (PBs), were constructed, and five typical deep learning models were compared: (1) EfficientNetB0, (2) AlexNet, (3) VGG16, (4) ResNet18, and (5) RepVGG. The experimental results showed that the EfficientNetB0 model based on the preferred bands performed best in overall accuracy (87.1%) and Kappa coefficient (0.677). A Fine-Grained Multi-scale Fusion (FGMF) module and a Condition-Guided Attention Refinement (CGAR) module were then incorporated into EfficientNetB0, and the traditional SGD optimizer was replaced with Adam to construct the CF-EfficientNet architecture. The results indicated that the improved CF-EfficientNet model achieved high performance in crop classification, with an overall accuracy of 92.6% and a Kappa coefficient of 0.830, improvements of 5.5 percentage points and 0.153 over the baseline model, demonstrating superiority in both classification accuracy and stability. Full article
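The feature-selection step, keeping a small set of highly discriminative bands out of a high-dimensional feature stack, can be illustrated with a simple class-separability score. The scoring criterion and function name below are assumptions for illustration only; the paper's actual selection method is not specified in this listing.

```python
import numpy as np

def select_bands(X, y, k=3):
    """Rank feature bands by a crude separability score (gap between
    class means divided by the pooled standard deviation) and keep
    the indices of the top-k bands."""
    scores = []
    for j in range(X.shape[1]):
        gap = abs(X[y == 0, j].mean() - X[y == 1, j].mean())
        scores.append(gap / (X[:, j].std() + 1e-9))
    order = np.argsort(scores)[::-1]
    return sorted(order[:k].tolist())

rng = np.random.default_rng(0)
y = np.array([0] * 50 + [1] * 50)
X = rng.normal(size=(100, 5))
X[y == 1, 2] += 3.0  # make band 2 clearly discriminative
print(select_bands(X, y, k=1))  # band 2 should rank first
```

A real pipeline would use a criterion robust to multi-class labels and correlated bands (e.g., mutual information or wrapper selection); the point here is only the shape of the operation: score every band, keep the best few.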

26 pages, 2931 KB  
Review
Prospects of AI-Powered Bowel Sound Analytics for Diagnosis, Characterization, and Treatment Management of Inflammatory Bowel Disease
by Divyanshi Sood, Zenab Muhammad Riaz, Jahnavi Mikkilineni, Narendra Nath Ravi, Vineeta Chidipothu, Gayathri Yerrapragada, Poonguzhali Elangovan, Mohammed Naveed Shariff, Thangeswaran Natarajan, Jayarajasekaran Janarthanan, Naghmeh Asadimanesh, Shiva Sankari Karuppiah, Keerthy Gopalakrishnan and Shivaram P. Arunachalam
Med. Sci. 2025, 13(4), 230; https://doi.org/10.3390/medsci13040230 - 13 Oct 2025
Abstract
Background: This narrative review examines the role of artificial intelligence (AI) in bowel sound analysis for the diagnosis and management of inflammatory bowel disease (IBD). Inflammatory bowel disease (IBD), encompassing Crohn’s disease and ulcerative colitis, presents a significant clinical burden due to its unpredictable course, variable symptomatology, and reliance on invasive procedures for diagnosis and disease monitoring. Despite advances in imaging and biomarkers, tools such as colonoscopy and fecal calprotectin remain costly, uncomfortable, and impractical for frequent or real-time assessment. Meanwhile, bowel sounds—an overlooked physiologic signal—reflect underlying gastrointestinal motility and inflammation but have historically lacked objective quantification. With recent advances in artificial intelligence (AI) and acoustic signal processing, there is growing interest in leveraging bowel sound analysis as a novel, non-invasive biomarker for detecting IBD, monitoring disease activity, and predicting disease flares. This approach holds the promise of continuous, low-cost, and patient-friendly monitoring, which could transform IBD management. Objectives: This narrative review assesses the clinical utility, methodological rigor, and potential future integration of artificial intelligence (AI)-driven bowel sound analysis in inflammatory bowel disease (IBD), with a focus on its potential as a non-invasive biomarker for disease activity, flare prediction, and differential diagnosis. Methods: This manuscript reviews the potential of AI-powered bowel sound analysis as a non-invasive tool for diagnosing, monitoring, and managing inflammatory bowel disease (IBD), including Crohn’s disease and ulcerative colitis. Traditional diagnostic methods, such as colonoscopy and biomarkers, are often invasive, costly, and impractical for real-time monitoring. 
The manuscript explores bowel sounds, which reflect gastrointestinal motility and inflammation, as an alternative biomarker by utilizing AI techniques like convolutional neural networks (CNNs), transformers, and gradient boosting. We analyze data on acoustic signal acquisition (e.g., smart T-shirts, smartphones), signal processing methodologies (e.g., MFCCs, spectrograms, empirical mode decomposition), and validation metrics (e.g., accuracy, F1 scores, AUC). Studies were assessed for clinical relevance, methodological rigor, and translational potential. Results: Across studies enrolling 16–100 participants, AI models achieved diagnostic accuracies of 88–96%, with AUCs ≥ 0.83 and F1 scores ranging from 0.71 to 0.85 for differentiating IBD from healthy controls and IBS. Transformer-based approaches (e.g., HuBERT, Wav2Vec 2.0) consistently outperformed CNNs and tabular models, yielding F1 scores of 80–85%, while gradient boosting on wearable multi-microphone recordings demonstrated robustness to background noise. Distinct acoustic signatures were identified, including prolonged sound-to-sound intervals in Crohn’s disease (mean 1232 ms vs. 511 ms in IBS) and high-pitched tinkling in stricturing phenotypes. Despite promising performance, current models remain below established biomarkers such as fecal calprotectin (~90% sensitivity for active disease), and generalizability is limited by small, heterogeneous cohorts and the absence of prospective validation. Conclusions: AI-powered bowel sound analysis represents a promising, non-invasive tool for IBD monitoring. However, widespread clinical integration requires standardized data acquisition protocols, large multi-center datasets with clinical correlates, explainable AI frameworks, and ethical data governance. Future directions include wearable-enabled remote monitoring platforms and multi-modal decision support systems integrating bowel sounds with biomarker and symptom data. 
This manuscript emphasizes the need for large-scale, multi-center studies, the development of explainable AI frameworks, and the integration of these tools within clinical workflows. Future directions include remote monitoring using wearables and multi-modal systems that combine bowel sounds with biomarkers and patient symptoms, aiming to transform IBD care into a more personalized and proactive model. Full article

28 pages, 4479 KB  
Article
Integrated Network Pharmacology and Molecular Dynamics Reveal Multi-Target Anticancer Mechanisms of Myrtus communis Essential Oils
by Ahmed Bayoudh, Nidhal Tarhouni, Riadh Ben Mansour, Saoussen Mekrazi, Raoudha Sadraoui, Karim Kriaa, Zakarya Ahmed, Ahlem Soussi, Imen Kallel and Bilel Hadrich
Pharmaceuticals 2025, 18(10), 1542; https://doi.org/10.3390/ph18101542 - 13 Oct 2025
Abstract
Background: Cancer’s multifactorial complexity demands innovative polypharmacological strategies that can simultaneously target multiple oncogenic pathways. Natural products, with their inherent chemical diversity, offer promising multi-target therapeutic potential. This study comprehensively investigates the anticancer mechanisms of Tunisian Myrtus communis essential oils (McEOs) using an integrated computational-experimental framework to elucidate their polypharmacological basis and therapeutic potential. Methods: McEO composition was characterized via GC-MS analysis. Antiproliferative activity was evaluated against HeLa (cervical), MCF-7 (breast), and Raji (lymphoma) cancer cell lines using MTT assays. A multi-scale computational pipeline integrated network pharmacology, molecular docking against eight key oncoproteins, and 100 ns all-atom molecular dynamics simulations to elucidate molecular mechanisms and target interactions. Results: GC-MS revealed a 1,8-cineole-rich chemotype (38.94%) containing significant sesquiterpenes. McEO demonstrated potent differential cytotoxicity: HeLa (IC50 = 8.12 μg/mL) > MCF-7 (IC50 = 19.59 μg/mL) > Raji cells (IC50 = 27.32 μg/mL). Network pharmacology quantitatively explained this differential sensitivity through target overlap analysis, showing higher associations with breast (23%) and cervical (18.3%) versus lymphoma (5.5%) cancer pathways. Molecular docking identified spathulenol as a high-affinity Androgen Receptor (AR) antagonist (XP GScore: −9.650 kcal/mol). Molecular dynamics simulations confirmed exceptional spathulenol-AR complex stability, maintaining critical hydrogen bonding with Asn705 for 96% of simulation time. Conclusions: McEO exerts sophisticated multi-target anticancer effects through synergistic constituent interactions, notably spathulenol’s potent AR antagonism. 
This integrated computational-experimental approach validates McEO’s polypharmacological basis and supports its therapeutic potential, particularly for hormone-dependent malignancies, while establishing a robust framework for natural product bioactivity deconvolution. Full article
(This article belongs to the Section Natural Products)
17 pages, 4775 KB  
Article
Robust Occupant Behavior Recognition via Multimodal Sequence Modeling: A Comparative Study for In-Vehicle Monitoring Systems
by Jisu Kim and Byoung-Keon D. Park
Sensors 2025, 25(20), 6323; https://doi.org/10.3390/s25206323 - 13 Oct 2025
Abstract
Understanding occupant behavior is critical for enhancing safety and situational awareness in intelligent transportation systems. This study investigates multimodal occupant behavior recognition using sequential inputs extracted from 2D pose, 2D gaze, and facial movements. We conduct a comprehensive comparative study of three distinct architectural paradigms: a static Multi-Layer Perceptron (MLP), a recurrent Long Short-Term Memory (LSTM) network, and an attention-based Transformer encoder. All experiments are performed on the large-scale Occupant Behavior Classification (OBC) dataset, which contains approximately 2.1 million frames across 79 behavior classes collected in a controlled, simulated environment. Our results demonstrate that temporal models significantly outperform the static baseline. The Transformer model, in particular, emerges as the superior architecture, achieving a state-of-the-art Macro F1 score of 0.9570 with a configuration of a 50-frame span and a step size of 10. Furthermore, our analysis reveals that the Transformer provides an excellent balance between high performance and computational efficiency. These findings demonstrate the superiority of attention-based temporal modeling with multimodal fusion and provide a practical framework for developing robust and efficient in-vehicle occupant monitoring systems. Implementation code and supplementary resources are available (see Data Availability Statement). Full article
31 pages, 9234 KB  
Article
A Dual-Branch Framework Integrating the Segment Anything Model and Semantic-Aware Network for High-Resolution Cropland Extraction
by Dujuan Zhang, Yiping Li, Yucai Shen, Hengliang Guo, Haitao Wei, Jian Cui, Gang Wu, Tian He, Lingling Wang, Xiangdong Liu and Shan Zhao
Remote Sens. 2025, 17(20), 3424; https://doi.org/10.3390/rs17203424 - 13 Oct 2025
Abstract
Accurate spatial information of cropland is crucial for precision agricultural management and ensuring national food security. High-resolution remote sensing imagery combined with deep learning algorithms provides a promising approach for extracting detailed cropland information. However, due to the diverse morphological characteristics of croplands across different agricultural landscapes, existing deep learning methods encounter challenges in precise boundary localization. The advancement of large-scale vision models has led to the emergence of the Segment Anything Model (SAM), which has demonstrated remarkable performance on natural images and attracted considerable attention in the field of remote sensing image segmentation. However, when applied to high-resolution cropland extraction, SAM faces limitations in semantic expressiveness and cross-domain adaptability. To address these issues, this study proposes a dual-branch framework integrating SAM and a semantically aware network (SAM-SANet) for high-resolution cropland extraction. Specifically, a semantically aware branch based on a semantic segmentation network is applied to identify cropland areas, complemented by a boundary-constrained SAM branch that directs the model’s attention to boundary information and enhances cropland extraction performance. Additionally, a boundary-aware feature fusion module and a prompt generation and selection module are incorporated into the SAM branch for precise cropland boundary localization. The former aggregates multi-scale edge information to enhance boundary representation, while the latter generates prompts with high relevance to the boundary. To evaluate the effectiveness of the proposed approach, we construct three cropland datasets named GID-CD, JY-CD and QX-CD. Experimental results on these datasets demonstrated that SAM-SANet achieved mIoU scores of 87.58%, 91.17% and 71.39%, along with mF1 scores of 93.54%, 95.35% and 82.21%, respectively. 
Comparative experiments with mainstream semantic segmentation models further confirmed the superior performance of SAM-SANet in high-resolution cropland extraction. Full article

30 pages, 23104 KB  
Article
MSAFNet: Multi-Modal Marine Aquaculture Segmentation via Spatial–Frequency Adaptive Fusion
by Guolong Wu and Yimin Lu
Remote Sens. 2025, 17(20), 3425; https://doi.org/10.3390/rs17203425 - 13 Oct 2025
Abstract
Accurate mapping of marine aquaculture areas is critical for marine environmental management, ecosystem protection, and sustainable resource utilization. However, remote sensing imagery from single-sensor modalities has inherent limitations when extracting aquaculture zones in complex marine environments. To address this challenge, we constructed a multi-modal dataset from five Chinese coastal regions using cloud detection methods and developed the Multi-modal Spatial–Frequency Adaptive Fusion Network (MSAFNet) for optical–radar data fusion. MSAFNet employs a dual-path architecture with a Multi-scale Dual-path Feature Module (MDFM) that combines CNN and Transformer capabilities to extract multi-scale features. It further applies a Dynamic Frequency Domain Adaptive Fusion Module (DFAFM) to deeply integrate multi-modal features in both the spatial and frequency domains, effectively leveraging the complementary advantages of different sensor data. Results demonstrate that MSAFNet achieves 76.93% mean intersection over union (mIoU), 86.96% mean F1 score (mF1), and 93.26% mean Kappa coefficient (mKappa) in extracting floating raft aquaculture (FRA) and cage aquaculture (CA), significantly outperforming existing methods. Applied to China's coastal waters, the model generated nearshore aquaculture distribution maps for 2020, demonstrating its generalization capability and practical value in complex marine environments. This approach provides reliable technical support for marine resource management and ecological monitoring. Full article
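The core idea of fusing two modalities in the frequency domain, which the DFAFM does adaptively, reduces to a toy sketch: transform each modality's feature map with an FFT, blend the spectra, and invert. The fixed blending weight `alpha` below is a stand-in for the learned, adaptive weighting of the actual module.

```python
import numpy as np

def frequency_fuse(opt_feat, sar_feat, alpha=0.5):
    """Blend two modality feature maps in the frequency domain:
    FFT each map, take a weighted average of the spectra, and
    inverse-FFT back to the spatial domain."""
    spec = alpha * np.fft.fft2(opt_feat) + (1 - alpha) * np.fft.fft2(sar_feat)
    return np.real(np.fft.ifft2(spec))

optical = np.ones((4, 4))   # toy optical feature map
radar = np.zeros((4, 4))    # toy SAR feature map
fused = frequency_fuse(optical, radar)  # constant 0.5 by FFT linearity
```

Because the FFT is linear, a fixed-weight spectral blend equals the same blend in the spatial domain; the practical benefit of a frequency-domain module comes from weighting frequency bands differently, which this sketch deliberately omits.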

17 pages, 2558 KB  
Article
Spatiotemporal Forecasting of Regional Electric Vehicles Charging Load: A Multi-Channel Attentional Graph Network Integrating Dynamic Electricity Price and Weather
by Hui Ding, Youyou Guo and Haibo Wang
Electronics 2025, 14(20), 4010; https://doi.org/10.3390/electronics14204010 - 13 Oct 2025
Abstract
Accurate spatiotemporal forecasting of electric vehicle (EV) charging load is essential for smart grid management and efficient charging service operation. This paper introduces a novel spatiotemporal graph convolutional network with cross-attention (STGCN-Attention) for multi-factor charging load prediction. The model captures cross-scale spatiotemporal correlations by adaptively integrating historical charging load, charging pile occupancy, dynamic electricity prices, and meteorological data. Evaluations in real-world charging scenarios showed that the proposed model achieved superior performance in hour-ahead forecasting, reducing Mean Absolute Error (MAE) by 9% and 16% compared to traditional STGCN and LSTM models, respectively, and attained approximately 30% higher accuracy than 24 h-ahead prediction. Furthermore, the study identified an optimal 1-2-1 multi-scale temporal window strategy (hour–day–week) and revealed key driving factors. The combined input of load, occupancy, and electricity price yielded the best results (RMSE = 37.21, MAE = 27.34), while introducing temperature and precipitation raised errors by 2–8%, highlighting challenges in fine-grained weather integration. These findings provide actionable insights for real-time and intraday charging scheduling. Full article
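At the core of any STGCN-style model is a graph convolution that propagates station features over the charging-station graph. A minimal Kipf-style layer is sketched below under the assumption of a symmetric normalized adjacency; the paper's exact propagation rule and the cross-attention stage are not reproduced here.

```python
import numpy as np

def gcn_layer(A, X, W):
    """One Kipf-style graph convolution: add self-loops, symmetrically
    normalize the adjacency, propagate node features X, apply a linear
    map W, then a ReLU nonlinearity."""
    A_hat = A + np.eye(A.shape[0])                      # self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ X @ W, 0.0)

# Three charging stations in a line; identity features and weights
# make the output equal the normalized adjacency itself.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = gcn_layer(A, np.eye(3), np.eye(3))
print(H.shape)  # (3, 3)
```

Stacking such layers with temporal convolutions over per-station load histories gives the spatiotemporal backbone onto which attention over external factors (price, weather) is attached.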

22 pages, 7434 KB  
Article
A Lightweight Image-Based Decision Support Model for Marine Cylinder Lubrication Based on CNN-ViT Fusion
by Qiuyu Li, Guichen Zhang and Enrui Zhao
J. Mar. Sci. Eng. 2025, 13(10), 1956; https://doi.org/10.3390/jmse13101956 - 13 Oct 2025
Abstract
Under the context of “Energy Conservation and Emission Reduction,” low-sulfur fuel has become widely adopted in maritime operations, posing significant challenges to cylinder lubrication systems. Traditional oil injection strategies, heavily reliant on manual experience, suffer from instability and high costs. To address this, a lightweight image retrieval model for cylinder lubrication is proposed, leveraging deep learning and computer vision to support oiling decisions based on visual features. The model comprises three components: a backbone network, a feature enhancement module, and a similarity retrieval module. Specifically, EfficientNetB0 serves as the backbone for efficient feature extraction under low computational overhead. MobileViT Blocks are integrated to combine the local feature perception of Convolutional Neural Networks (CNNs) with the global modeling capacity of Transformers. To further improve the receptive field and multi-scale representation, Receptive Field Blocks (RFB) are introduced between the components. Additionally, the Convolutional Block Attention Module (CBAM) attention mechanism enhances focus on salient regions, improving feature discrimination. A high-quality image dataset was constructed using WINNING’s large bulk carriers under various sea conditions. The experimental results demonstrate that the EfficientNetB0 + RFB + MobileViT + CBAM model achieves excellent performance with minimal computational cost: 99.71% Precision, 99.69% Recall, and 99.70% F1-score, improvements of 11.81%, 15.36%, and 13.62%, respectively, over the baseline EfficientNetB0. With an increase of only 0.3 GFLOPs in computation and 8.3 MB in model size, the approach balances accuracy and inference efficiency. The model also demonstrates good robustness and application stability in real-world ship testing, with potential for further adoption in intelligent ship maintenance. Full article
(This article belongs to the Section Ocean Engineering)
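The similarity-retrieval stage, matching a query image's embedding against a gallery by cosine similarity, is the generic pattern behind this kind of image-based decision support. A minimal sketch follows; the two-dimensional embeddings and gallery labels are invented for illustration, whereas the paper's features come from its CNN–ViT backbone.

```python
import numpy as np

def retrieve(query, gallery, top_k=3):
    """Rank gallery embeddings by cosine similarity to the query
    embedding and return the indices of the top-k matches."""
    q = query / np.linalg.norm(query)
    G = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return np.argsort(G @ q)[::-1][:top_k].tolist()

gallery = np.array([[1.0, 0.0],    # e.g. a "normal oil film" reference
                    [0.0, 1.0],    # e.g. an "insufficient oil" reference
                    [0.7, 0.7]])   # a mixed reference condition
query = np.array([1.0, 0.1])
best = retrieve(query, gallery, top_k=2)
print(best)  # [0, 2]
```

Retrieving the nearest labeled reference images, rather than predicting a class directly, lets an operator inspect which historical lubrication states the current cylinder image most resembles before adjusting the oiling rate.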

17 pages, 1278 KB  
Article
KG-FLoc: Knowledge Graph-Enhanced Fault Localization in Secondary Circuits via Relation-Aware Graph Neural Networks
by Xiaofan Song, Chen Chen, Xiangyang Yan, Jingbo Song, Huanruo Qi, Wenjie Xue and Shunran Wang
Electronics 2025, 14(20), 4006; https://doi.org/10.3390/electronics14204006 - 13 Oct 2025
Abstract
This paper introduces KG-FLoc, a knowledge graph-enhanced framework for secondary circuit fault localization in intelligent substations. The proposed KG-FLoc innovatively formalizes secondary components (e.g., circuit breakers, disconnectors) as graph nodes and their multi-dimensional relationships (e.g., electrical connections, control logic) as edges, constructing the first comprehensive knowledge graph (KG) to structurally and operationally model secondary circuits. By reframing fault localization as a knowledge graph link prediction task, KG-FLoc identifies missing or abnormal connections (edges) as fault indicators. To address dynamic topologies and sparse fault samples, KG-FLoc integrates two core innovations: (1) a relation-aware gated unit (RGU) that dynamically regulates information flow through adaptive gating mechanisms, and (2) a hierarchical graph isomorphism network (GIN) architecture for multi-scale feature extraction. Evaluated on real-world datasets from 110 kV/220 kV substations, KG-FLoc achieves 97.2% accuracy in single-fault scenarios and 93.9% accuracy in triple-fault scenarios, surpassing SVM, RF, MLP, and standard GNN baselines by 12.4–31.6%. Beyond enhancing substation reliability, KG-FLoc establishes a knowledge-aware paradigm for fault diagnosis in industrial systems, enabling precise reasoning over complex interdependencies. Full article
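Framing fault localization as knowledge-graph link prediction means scoring candidate edges and flagging implausible ones as faults. As a hedged illustration of edge scoring (KG-FLoc's own scoring runs on RGU-gated GNN features, not on this), a classic DistMult-style triple score:

```python
import numpy as np

def distmult_score(h, r, t):
    """DistMult triple score sum_i h_i * r_i * t_i: higher means the
    edge (head, relation, tail) is more plausible. In a link-prediction
    view of fault localization, edges scoring far below expectation
    flag missing or abnormal connections."""
    return float(np.sum(h * r * t))

h = np.array([1.0, 0.5])   # head-entity embedding (toy values)
r = np.array([1.0, 1.0])   # relation embedding, e.g. "electrically-connected"
t = np.array([1.0, 2.0])   # tail-entity embedding
score = distmult_score(h, r, t)  # 1*1*1 + 0.5*1*2 = 2.0
print(score)
```

In practice the entity and relation embeddings are learned from the substation knowledge graph, and every secondary-circuit edge is scored; a healthy circuit yields high scores on its known edges, and a drop localizes the fault to a specific connection.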

20 pages, 5086 KB  
Article
A Multi-Modal Attention Fusion Framework for Road Connectivity Enhancement in Remote Sensing Imagery
by Yongqi Yuan, Yong Cheng, Bo Pan, Ge Jin, De Yu, Mengjie Ye and Qian Zhang
Mathematics 2025, 13(20), 3266; https://doi.org/10.3390/math13203266 - 13 Oct 2025
Abstract
Ensuring the structural continuity and completeness of road networks in high-resolution remote sensing imagery remains a major challenge for current deep learning methods, especially under conditions of occlusion caused by vegetation, buildings, or shadows. To address this, we propose a novel post-processing enhancement framework that improves the connectivity and accuracy of initial road extraction results produced by any segmentation model. The method employs a dual-stream encoder architecture, which jointly processes RGB images and preliminary road masks to obtain complementary spatial and semantic information. A core component is the MAF (Multi-Modal Attention Fusion) module, designed to capture fine-grained, long-range, and cross-scale dependencies between image and mask features. This fusion leads to the restoration of fragmented road segments, the suppression of noise, and overall improvement in road completeness. Experiments on benchmark datasets (DeepGlobe and Massachusetts) demonstrate substantial gains in precision, recall, F1-score, and mIoU, confirming the framework’s effectiveness and generalization ability in real-world scenarios. Full article
(This article belongs to the Special Issue Mathematical Methods for Machine Learning and Computer Vision)
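Cross-attention between the two streams (RGB image features as one modality, preliminary road-mask features as the other) follows the standard scaled dot-product pattern. A minimal numpy sketch with toy dimensions, none of the paper's multi-scale machinery, and invented example values:

```python
import numpy as np

def cross_attention(Q, K, V):
    """Scaled dot-product cross-attention: queries from one stream
    (e.g. mask features) attend over keys/values from the other
    stream (e.g. image features)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)            # softmax over keys
    return w @ V

Q = np.array([[1.0, 0.0]])                 # one mask-stream query
K = np.array([[1.0, 0.0], [0.0, 1.0]])     # two image-stream keys
V = np.array([[10.0, 0.0], [0.0, 10.0]])   # their value vectors
out = cross_attention(Q, K, V)             # weighted toward key 0
```

Attending from mask positions into image features is what lets a fusion module like MAF pull evidence from occluded image regions to reconnect a fragmented road segment.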

21 pages, 3794 KB  
Article
DEIM-SFA: A Multi-Module Enhanced Model for Accurate Detection of Weld Surface Defects
by Yan Sun, Yingjie Xie, Ran Peng, Yifan Zhang, Wei Chen and Yan Guo
Sensors 2025, 25(20), 6314; https://doi.org/10.3390/s25206314 - 13 Oct 2025
Abstract
High-precision automated detection of metal welding defects is critical to ensuring structural safety and reliability in modern manufacturing. However, existing methods often struggle with insufficient fine-grained feature retention, inefficient multi-scale information fusion, and vulnerability to complex background interference, resulting in low detection accuracy. This work addresses these limitations by introducing DEIM-SFA, a novel detection framework designed for automated visual inspection with industrial machine vision sensors. The model introduces a structure-aware dynamic convolution (SPD-Conv) that focuses on the fine-grained structure of defects while suppressing background noise interference; a multi-scale dynamic fusion pyramid (FTPN) architecture that achieves efficient, adaptive aggregation of feature information from different receptive fields for consistent detection of multi-scale targets; and a lightweight, efficient multi-scale attention module (EMA) that further enhances the model's ability to locate salient regions in complex scenarios. The network is evaluated on a self-built welding defect dataset. Experimental results show that DEIM-SFA improves mAP50 by 3.9% over the baseline model, mAP75 by 4.3%, mAP50–95 by 3.7%, and Recall by 1.4%. The model is consistently superior in detection accuracy across targets of various sizes while maintaining well-balanced model complexity and inference efficiency, comprehensively surpassing existing state-of-the-art (SOTA) methods. Full article
(This article belongs to the Section Industrial Sensors)
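The SPD-Conv component named above is commonly built on a space-to-depth rearrangement: spatial pixels are folded into channels so the feature map is downsampled without discarding fine-grained detail, and a non-strided convolution follows. As a minimal sketch (the function name and shapes here are illustrative, not taken from the paper):

```python
import numpy as np

def space_to_depth(x: np.ndarray, scale: int = 2) -> np.ndarray:
    """Rearrange spatial blocks into channels: (C, H, W) -> (C*scale^2, H/scale, W/scale).

    Downsamples without discarding any pixels, so fine-grained defect detail
    that a strided convolution would drop is preserved in extra channels.
    """
    c, h, w = x.shape
    assert h % scale == 0 and w % scale == 0, "H and W must be divisible by scale"
    x = x.reshape(c, h // scale, scale, w // scale, scale)
    x = x.transpose(0, 2, 4, 1, 3)              # (C, s, s, H/s, W/s)
    return x.reshape(c * scale * scale, h // scale, w // scale)

feat = np.arange(2 * 4 * 4, dtype=np.float32).reshape(2, 4, 4)
out = space_to_depth(feat, 2)
print(out.shape)  # (8, 2, 2)
```

Every input value survives the rearrangement; only the layout changes, which is why this step is lossless compared with stride-2 downsampling.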
16 pages, 5273 KB  
Article
Scale-Adaptive High-Resolution Imaging Using a Rotating-Prism-Guided Variable-Boresight Camera
by Zhaojun Deng, Anhu Li, Xin Zhao, Yonghao Lai and Jialiang Jin
Sensors 2025, 25(20), 6313; https://doi.org/10.3390/s25206313 (registering DOI) - 12 Oct 2025
Abstract
Large-field-of-view (FOV) and high-resolution imaging have long been central goals of imaging technology. A scale-adaptive high-resolution imaging architecture is established using a rotating-prism-embedded variable-boresight camera. By planning the prism motion, multi-view images with rich information are combined to form a large-scale FOV image. The boresight is guided towards the region of interest (ROI) in the combined FOV to reconstruct super-resolution (SR) images with the desired information. A novel distortion correction method is proposed using virtual symmetrical prisms with complementary rotation angles. Based on reverse ray tracing, the dispersion induced by light of different wavelengths, which experience different refractive indices, can be eliminated by accurate pixel-level position compensation. For resolution enhancement, we provide a new SR imaging scheme consisting of a residual removal network and an information enhancement network based on multi-view image fusion. The experiments show that the proposed architecture can achieve both large-FOV scene imaging for situational awareness and SR ROI display to acquire details, effectively perform distortion and dispersion correction, and alleviate occlusion to a certain extent. It also provides higher image clarity than traditional SR methods and overcomes the problem of balancing large-scale imaging and high-resolution imaging. Full article
(This article belongs to the Collection 3D Imaging and Sensing System)
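The dispersion the abstract describes arises because each wavelength sees a slightly different refractive index at the prism surface, so reverse ray tracing must refract each color separately before the pixel-level compensation. A minimal sketch of the underlying step, the vector form of Snell's law, under illustrative indices and angles (none taken from the paper):

```python
import numpy as np

def refract(d: np.ndarray, n: np.ndarray, n1: float, n2: float) -> np.ndarray:
    """Vector form of Snell's law: refract unit direction d at a surface
    with unit normal n, passing from refractive index n1 into n2."""
    d = d / np.linalg.norm(d)
    n = n / np.linalg.norm(n)
    r = n1 / n2
    cos_i = -np.dot(n, d)                    # cosine of incidence angle
    sin_t2 = r * r * (1.0 - cos_i * cos_i)   # sin^2 of refraction angle
    if sin_t2 > 1.0:
        raise ValueError("total internal reflection")
    cos_t = np.sqrt(1.0 - sin_t2)
    return r * d + (r * cos_i - cos_t) * n   # refracted unit direction

# Trace the same incident ray at two indices (e.g. red vs. blue light);
# the angle between the exit rays is the dispersion to compensate per pixel.
d_in = np.array([np.sin(np.radians(20)), 0.0, np.cos(np.radians(20))])
normal = np.array([0.0, 0.0, -1.0])
red = refract(d_in, normal, 1.0, 1.510)
blue = refract(d_in, normal, 1.0, 1.530)
angle = np.degrees(np.arccos(np.clip(np.dot(red, blue), -1.0, 1.0)))
print(f"dispersion angle = {angle:.4f} deg")
```

Running the same trace backwards (swapping n1 and n2 and negating directions) is the reverse-tracing direction the method relies on.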

23 pages, 23535 KB  
Article
FANT-Det: Flow-Aligned Nested Transformer for SAR Small Ship Detection
by Hanfu Li, Dawei Wang, Jianming Hu, Xiyang Zhi and Dong Yang
Remote Sens. 2025, 17(20), 3416; https://doi.org/10.3390/rs17203416 (registering DOI) - 12 Oct 2025
Abstract
Ship detection in synthetic aperture radar (SAR) remote sensing imagery is of great significance in military and civilian applications. However, two factors limit detection performance: (1) a high prevalence of small-scale ship targets with limited information content and (2) interference from speckle noise and land–sea clutter. To address these challenges, we propose a novel end-to-end (E2E) transformer-based SAR ship detection framework, called Flow-Aligned Nested Transformer for SAR Small Ship Detection (FANT-Det). Specifically, in the feature extraction stage, we introduce a Nested Swin Transformer Block (NSTB). The NSTB employs a two-level local self-attention mechanism to enhance fine-grained target representation, thereby enriching features of small ships. For multi-scale feature fusion, we design a Flow-Aligned Depthwise Efficient Channel Attention Network (FADEN). FADEN achieves precise alignment of features across different resolutions via semantic flow and filters background clutter through lightweight channel attention, further enhancing small-target feature quality. Moreover, we propose an Adaptive Multi-scale Contrastive Denoising (AM-CDN) training paradigm. AM-CDN constructs adaptive perturbation thresholds jointly determined by a target scale factor and a clutter factor, generating contrastive denoising samples that better match the physical characteristics of SAR ships. Finally, extensive experiments on three widely used open SAR ship datasets demonstrate that the proposed method achieves superior detection performance, outperforming current state-of-the-art (SOTA) benchmarks. Full article
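The "lightweight channel attention" in FADEN follows the general ECA pattern: global average pooling per channel, a small 1D convolution across the channel dimension, and a sigmoid gate that rescales each channel. A minimal NumPy sketch of that pattern (the fixed kernel weights here are illustrative; in a trained network they are learned):

```python
import numpy as np

def eca_channel_attention(x: np.ndarray, k: int = 3) -> np.ndarray:
    """ECA-style lightweight channel attention on a (C, H, W) feature map:
    global average pool -> 1D conv of size k across channels -> sigmoid gate."""
    pooled = x.mean(axis=(1, 2))                  # per-channel descriptor, (C,)
    w = np.full(k, 1.0 / k)                       # illustrative fixed 1D kernel
    padded = np.pad(pooled, k // 2, mode="edge")  # keep length C after conv
    conv = np.convolve(padded, w, mode="valid")   # local cross-channel mixing
    gate = 1.0 / (1.0 + np.exp(-conv))            # sigmoid gate in (0, 1)
    return x * gate[:, None, None]                # rescale each channel

feat = np.random.rand(8, 16, 16).astype(np.float32)
out = eca_channel_attention(feat)
print(out.shape)  # (8, 16, 16)
```

Because the gate involves only C pooled values and a size-k kernel, the cost is negligible next to the backbone, which is what makes this flavor of attention attractive for clutter suppression.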

39 pages, 13725 KB  
Article
SRTSOD-YOLO: Stronger Real-Time Small Object Detection Algorithm Based on Improved YOLO11 for UAV Imageries
by Zechao Xu, Huaici Zhao, Pengfei Liu, Liyong Wang, Guilong Zhang and Yuan Chai
Remote Sens. 2025, 17(20), 3414; https://doi.org/10.3390/rs17203414 (registering DOI) - 12 Oct 2025
Abstract
To address the challenges of small target detection in UAV aerial images—such as difficulty in feature extraction, complex background interference, high miss rates, and stringent real-time requirements—this paper proposes an innovative model series named SRTSOD-YOLO, based on YOLO11. The backbone network incorporates a Multi-scale Feature Complementary Aggregation Module (MFCAM), designed to mitigate the loss of small target information as network depth increases. By integrating channel and spatial attention mechanisms with multi-scale convolutional feature extraction, MFCAM effectively locates small objects in the image. Furthermore, we introduce a novel neck architecture termed Gated Activation Convolutional Fusion Pyramid Network (GAC-FPN). This module enhances multi-scale feature fusion by emphasizing salient features while suppressing irrelevant background information. GAC-FPN employs three key strategies: adding a detection head with a small receptive field while removing the original largest one, leveraging large-scale features more effectively, and incorporating gated activation convolutional modules. To tackle the issue of positive–negative sample imbalance, we replace the conventional binary cross-entropy loss with an adaptive threshold focal loss in the detection head, accelerating network convergence. Additionally, to accommodate diverse application scenarios, we develop multiple versions of SRTSOD-YOLO by adjusting the width and depth of the network modules: nano (SRTSOD-YOLO-n), small (SRTSOD-YOLO-s), medium (SRTSOD-YOLO-m), and large (SRTSOD-YOLO-l). Experimental results on the VisDrone2019 and UAVDT datasets demonstrate that SRTSOD-YOLO-n improves mAP@0.5 by 3.1% and 1.2% compared to YOLO11n, while SRTSOD-YOLO-l achieves gains of 7.9% and 3.3% over YOLO11l, respectively. Compared to other state-of-the-art methods, SRTSOD-YOLO-l attains the highest detection accuracy while maintaining real-time performance, underscoring the superiority of the proposed approach. Full article
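The imbalance fix above builds on the standard binary focal loss, which down-weights easy examples so the abundant easy negatives in small-object detection do not dominate the gradient; the paper's adaptive-threshold variant is not reproduced here. A minimal sketch of the base loss with illustrative predictions:

```python
import numpy as np

def focal_loss(p: np.ndarray, y: np.ndarray,
               gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Binary focal loss: cross-entropy scaled by (1 - p_t)^gamma, so
    well-classified (easy) examples contribute almost nothing."""
    eps = 1e-7
    p = np.clip(p, eps, 1 - eps)
    p_t = np.where(y == 1, p, 1 - p)             # prob of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha) # class-balance weight
    return float(np.mean(-alpha_t * (1 - p_t) ** gamma * np.log(p_t)))

y = np.array([1, 1, 0, 0])
easy = np.array([0.9, 0.95, 0.1, 0.05])   # confident, correct predictions
hard = np.array([0.6, 0.55, 0.4, 0.45])   # uncertain predictions
print(focal_loss(easy, y), focal_loss(hard, y))
```

With gamma = 2, the easy batch is penalized far less than the hard one, which is the mechanism that keeps the many easy background samples from swamping training.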
