Search Results (93)

Search Parameters:
Keywords = robust clutter suppression

24 pages, 5693 KB  
Article
From Geometric Alignment to Scale Balance: Directional Strip Convolution and Efficient Scale Fusion for Remote Sensing Ship Detection
by Jing Sun, Guoyou Shi, Yaxin Yang and Xiaolian Cheng
Remote Sens. 2026, 18(6), 873; https://doi.org/10.3390/rs18060873 - 12 Mar 2026
Abstract
Optical remote sensing ship detection faces significant challenges in realistic maritime scenes due to strong background clutter (e.g., docks, shorelines, wake streaks), extreme scale variation, and the elongated geometry of ships with diverse orientations. These factors frequently lead to geometric misalignment, unstable localization, and false alarms, particularly in congested ports and complex sea states. To enhance robustness under clutter while retaining the set prediction efficiency of DETR, we propose the Directional Efficient Network (DENet), a structure-aware enhancement built upon RT-DETR. DENet introduces two complementary components. First, Directional Strip Convolution (DSConv) replaces the standard 3×3 convolution for spatial mixing. By predicting offsets conditioned on input features, DSConv performs strip aggregation that aligns with slender hull structures, thereby suppressing interference from line-shaped background patterns. Second, Efficient Scale Fusion (ESF) augments the Hybrid Encoder as an additive residual correction. It combines multiple receptive field branches with lightweight differential compensation to balance low-frequency context and high-frequency structural transitions, ensuring stable multi-scale fusion in cluttered scenes. Extensive experiments demonstrate the effectiveness of DENet. On ShipRSImageNet, APval improves from 58.8% to 63.2% and AP50val increases from 68.5% to 73.6%. Consistent gains are also observed on NWPU VHR-10, where APval reaches 63.0% and AP50val reaches 94.6%, alongside improvements on the Infrared Ship Database and VisDrone2019-DET, validating the method’s generalization capabilities.
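The strip-aggregation idea behind DSConv can be illustrated with a fixed-orientation strip filter: a 1×L window responds strongly along a slender, line-like structure and weakly across it. This is a simplified sketch only — the paper's DSConv additionally predicts offsets from input features, and all names and sizes here are illustrative.

```python
def strip_filter(img, length=5, horizontal=True):
    """1xL mean 'strip' aggregation with zero padding at the borders."""
    h, w = len(img), len(img[0])
    r = length // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for k in range(-r, r + 1):
                yy, xx = (y, x + k) if horizontal else (y + k, x)
                if 0 <= yy < h and 0 <= xx < w:
                    acc += img[yy][xx]
            out[y][x] = acc / length
    return out

# A slender horizontal "hull" of ones in an otherwise empty 7x7 scene.
img = [[0.0] * 7 for _ in range(7)]
for x in range(1, 6):
    img[3][x] = 1.0

resp = strip_filter(img, length=5, horizontal=True)
```

A horizontal strip centered on the hull aggregates to 1.0, while a vertical strip at the same pixel averages in mostly background, illustrating why orientation-aligned aggregation suppresses cross-cutting clutter.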

25 pages, 6369 KB  
Article
A Lightweight Attention-Guided and Geometry-Aware Framework for Robust Maritime Ship Detection in Complex Electro-Optical Environments
by Zhe Zhang, Chang Lin and Bing Fang
Automation 2026, 7(2), 48; https://doi.org/10.3390/automation7020048 - 12 Mar 2026
Abstract
Reliable ship detection in complex maritime optical imagery is a fundamental requirement for intelligent maritime monitoring and maritime automation systems. However, severe image degradation, large-scale variations, and background clutter often lead to feature ambiguity and unstable detection performance in real-world maritime environments. To address these challenges, this paper proposes a lightweight one-stage ship detection framework designed for robust real-time perception under degraded maritime sensing conditions. The proposed method incorporates an Adaptive Expert Selection Attention (AESA) mechanism to perform adaptive feature selection and background suppression under visually degraded conditions, together with a Geometry-Aware MultiScale Fusion (GAMF) module that enables orientation-aware aggregation of contextual information for elongated ship targets near complex sea–sky boundaries. In addition, a geometry-aware bounding box regression refinement is introduced to improve localization consistency in image space. Extensive experiments conducted on a unified real-world maritime benchmark demonstrate that the proposed framework consistently outperforms the baseline YOLO11n model by approximately 2–5 percentage points in terms of mAP@0.5 and mAP@0.5:0.95, while maintaining moderate computational complexity and real-time inference capability. These results indicate that the proposed method provides a practical and deployment-oriented perception solution for maritime automation applications, including onboard electro-optical sensing and coastal surveillance.

27 pages, 15115 KB  
Article
An Object Tracking Algorithm Based on Multi-Scale Attention and Adaptive Fusion
by Deyu Zhang, Haiyang Li and Yanhui Lv
Appl. Sci. 2026, 16(6), 2646; https://doi.org/10.3390/app16062646 - 10 Mar 2026
Abstract
Single-object tracking in complex scenes faces challenges such as drastic target scale variation and strong background interference. To address these issues, an object tracking algorithm based on multi-scale attention and adaptive fusion is proposed. The method integrates a multi-scale attention module and an adaptive gated fusion module, enabling the adaptive mining of key features and dynamic adjustment of fusion weights across multi-level features. This effectively highlights target regions, suppresses redundant information, and enhances the model’s discriminative capability and robustness under complex backgrounds and occlusion. Experiments are conducted on the OTB100 and UAV123 datasets. Results show that, compared with the baseline model, the proposed algorithm improves the success rate and precision by 1.9% and 3.3%, respectively, on OTB100, and by 2.9% and 3.5%, respectively, on UAV123. Moreover, it achieves superior performance when facing typical challenging attributes such as occlusion, scale variation, and background clutter. In summary, the proposed algorithm enhances both tracking accuracy and robustness, offering a viable approach for object tracking under complex conditions.
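Gated fusion of two feature branches can be sketched as a per-channel sigmoid gate blending the branches: fused_i = g_i·a_i + (1 − g_i)·b_i. This is a minimal illustration of the blending arithmetic, not the paper's adaptive gated fusion module; in practice the gate logits would be predicted from the features rather than supplied by hand.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fuse(a, b, gate_logits):
    """Blend two feature vectors channel-wise with sigmoid gates."""
    return [sigmoid(l) * x + (1.0 - sigmoid(l)) * y
            for x, y, l in zip(a, b, gate_logits)]

# Logit 0 gives an equal blend; a large positive logit trusts branch `a`.
fused = gated_fuse([1.0, 1.0], [0.0, 0.0], [0.0, 10.0])
```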

24 pages, 18324 KB  
Article
DTRFR: A Unified Detector for Diverse Target Detection in High-Spatial-Resolution Spaceborne Infrared Video
by Xiaoying Wu, Dandan Li, Xin Chen, Kai Hu and Peng Rao
Remote Sens. 2026, 18(5), 780; https://doi.org/10.3390/rs18050780 - 4 Mar 2026
Abstract
Spaceborne infrared small-target detection plays a critical role in space-sky early warning, disaster rescue, and reconnaissance tracking, benefiting from all-time, all-weather, and wide-area monitoring capabilities. The deployment of high-spatial-resolution infrared payloads (ground sampling distance, GSD < 10 m) has introduced pronounced scale diversity among targets, leading to size-sensitive performance degradation in existing detectors and heightened risks of missed detections or false alarms in mixed-size scenarios. Furthermore, multi-frame infrared small-target detection methods often face challenges in maintaining consistent temporal coherence during feature propagation across sequences. To overcome these limitations in high-resolution spaceborne infrared videos, we propose DTRFR, an end-to-end unified detection framework built on an enhanced recurrent feature refinement architecture. This approach incorporates a realistic SITP-QLSD dataset derived from QLSAT-2 infrared backgrounds, featuring diverse scenes, multi-size small targets, and a dedicated generalization sub-test set with extremely small targets partially unseen in training; a multi-scale IRFeatureExtractor leveraging parallel convolutions and dilated receptive fields for improved cross-scale discrimination and clutter suppression; and an adaptive gating pyramid deformable alignment module to optimize sequence alignment and enhance temporal consistency, enabling robust performance across various clutter levels and dynamic backgrounds. Extensive evaluations on SITP-QLSD demonstrate that DTRFR attains competitive performance, achieving mIoU of 74.32% and Pd of 94.51% on the main set, with strong robustness on the generalization sub-test set (Pd = 92.37%). Compared to single-frame and multi-frame baselines, the proposed method achieves higher detection accuracy with significantly reduced false alarms, benefiting from multi-scale feature extraction that enables robust detection of small targets of different sizes in infrared videos.
(This article belongs to the Section Remote Sensing Image Processing)
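The benefit of dilated receptive fields mentioned above comes down to simple arithmetic: for stride-1 convolutions, the receptive field grows as rf = 1 + Σ (kernel − 1)·dilation over the stack, so dilation widens context without extra parameters. The layer configuration below is illustrative, not the paper's IRFeatureExtractor.

```python
def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    `layers` is a list of (kernel_size, dilation) pairs;
    rf = 1 + sum((kernel - 1) * dilation).
    """
    return 1 + sum((k - 1) * d for k, d in layers)

# Three 3x3 convs with dilations 1, 2, 4 cover a 15-sample window,
# versus 7 for three plain 3x3 convs.
rf_dilated = receptive_field([(3, 1), (3, 2), (3, 4)])
rf_plain = receptive_field([(3, 1), (3, 1), (3, 1)])
```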

29 pages, 7057 KB  
Article
Research on Automatic Velocity Spectrum Picking Algorithm for Seabed Multiples Based on Deep Learning
by Sixin Zhu, Xu Zhao, Shuo Cai and Fuyao Cui
Appl. Sci. 2026, 16(5), 2373; https://doi.org/10.3390/app16052373 - 28 Feb 2026
Abstract
Multiples often dominate semblance (velocity) spectra and bias stacking-velocity picks, degrading NMO correction and subsequent imaging. We propose FLD–PA, a practical two-stage workflow for automatic velocity-spectrum picking under strong multiple interference. First, the feature-level decoupling detector (FLD) uses an attention-enhanced, YOLOv4-style architecture to localize sparse key picks while suppressing multiple-related clutter. Second, the physics-informed Point Adjustment (PA) module refines coarse picks by enforcing lateral continuity across adjacent spectra and time consistency constraints derived from the stacked section. This refinement yields a geophysically plausible velocity trend. Experiments on two real datasets from a single offshore survey (with non-overlapping CMP/line subsets) show that FLD–PA improves PA@10px from 91.50% to 93.14% and reduces RMSE from 12.40 to 10.15 pixels compared with a YOLOv8–LSTM baseline. Both methods are evaluated under a matched-recall setting (≈81%), with confidence thresholds tuned on a held-out validation subset. Overall, FLD–PA improves the accuracy and stability of velocity picking under strong multiple interference. However, our evaluation focuses on within-survey robustness; cross-survey generalization remains for future work.
(This article belongs to the Section Earth Sciences)
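The two headline metrics are straightforward to compute from pick coordinates: PA@10px is the fraction of predicted picks within 10 pixels of the reference, and RMSE is the root-mean-square pixel error. A minimal sketch (the pick coordinates below are made up for illustration):

```python
import math

def pick_metrics(pred, true, tol=10.0):
    """Return (PA@tol, RMSE) for paired predicted/reference picks."""
    errs = [math.dist(p, t) for p, t in zip(pred, true)]
    pa = sum(e <= tol for e in errs) / len(errs)
    rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
    return pa, rmse

# Errors of 0, 5, and 20 pixels: two of three picks fall within 10 px.
pa, rmse = pick_metrics([(0, 0), (3, 4), (20, 0)],
                        [(0, 0), (0, 0), (0, 0)])
```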

20 pages, 30586 KB  
Article
Orthogonal-Heading Wavelength-Resolution SAR Image Stack Fusion-Based Foliage-Penetrating Vehicle Detection
by Haonan Zhang and Daoxiang An
Remote Sens. 2026, 18(5), 734; https://doi.org/10.3390/rs18050734 - 28 Feb 2026
Abstract
This paper presents an orthogonal-heading wavelength-resolution SAR (WRSAR) target detection framework that fuses multi-heading image stacks for foliage-penetrating (FOPEN) vehicle detection. First, a low-rank–sparse decomposition is applied to very-high-frequency (VHF), ultra-wideband (UWB) WRSAR stacks to suppress vegetation clutter and enhance target contrast. The clutter-suppressed sparse stacks acquired from orthogonal headings are then fused to enrich target scattering characteristics. Finally, a Rayleigh-entropy statistic computed on the fused sparse stack is used to represent discontinuous positional changes. Based on the non-negative nature of WRSAR amplitudes for both clutter and FOPEN targets, we introduce a non-negative constrained tensor robust principal component analysis (NCTRPCA) to improve sparsity in the stack components. Furthermore, since Shannon differential entropy has no tunable parameter, we replace it with the Rayleigh entropy (RE) and derive its closed-form expression for the proposed detector. Experiments on the publicly available multi-heading, multi-temporal CARABAS II dataset show that the proposed orthogonal-heading WRSAR fusion achieves higher FOPEN vehicle detection performance than recent state-of-the-art methods while maintaining moderate computational cost.
(This article belongs to the Section Engineering Remote Sensing)
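The sparsity-promoting step in low-rank–sparse decompositions is typically a soft-thresholding (proximal) operator; under a non-negativity constraint, as in non-negative RPCA variants, negative results are clamped to zero. The one-liner below sketches only that elementwise shrinkage step — not the authors' full NCTRPCA algorithm, and the threshold value is illustrative.

```python
def soft_threshold_nonneg(x, lam):
    """Non-negative soft-thresholding: max(x - lam, 0), elementwise.

    Entries below the threshold (including all negatives) are zeroed,
    leaving only strong, non-negative responses in the sparse component.
    """
    return [max(v - lam, 0.0) for v in x]

s = soft_threshold_nonneg([0.2, 1.5, -0.7], 0.5)
```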

30 pages, 11442 KB  
Article
Robust 2D Human Pose Estimation with Parallel Graph–Attention Modeling and Entropy-Aware Feature Decoding
by Jiayuan Zhao, Dingyao Yu, Chunjia Han, Yingcheng Xu and Chunlei Shi
Entropy 2026, 28(3), 265; https://doi.org/10.3390/e28030265 - 28 Feb 2026
Abstract
Robust 2D human pose estimation remains challenging due to occlusion and background interference, which introduce substantial uncertainty into visual representations. This paper proposes PMNet, a Parallel Modeling Network that integrates explicit graph-based structural modeling and implicit self-attention-based semantic modeling through parallel pathways to jointly capture local dependencies and global contextual relationships among keypoints. From an information-theoretic perspective, occlusion and clutter can be interpreted as sources of increased representational entropy, and PMNet addresses this issue by progressively reducing uncertainty through complementary structural reasoning and attention-based information selection. The framework incorporates a criss-cross attention module to suppress irrelevant features, an adaptive nonlinear fusion strategy to balance complementary information across parallel branches, and an error-compensated decoding method to sharpen heatmap distributions and refine keypoint localization while maintaining efficiency. Extensive experiments on the MPII and COCO benchmarks demonstrate that PMNet achieves state-of-the-art or comparable performance, attaining 92.42% PCKh@0.5 on MPII and 77.3% AP on COCO. Ablation studies and qualitative visualizations further confirm the effectiveness of each component, showing improved signal-to-noise ratios and more concentrated heatmap responses. Overall, PMNet provides a robust and efficient pose estimation framework with strong potential for real-world applications such as surveillance and autonomous systems.
(This article belongs to the Section Multidisciplinary Applications)
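Error-compensated heatmap decoding commonly means shifting the argmax a quarter pixel toward the larger neighbor, which corrects the systematic bias of pure argmax read-out. The sketch below shows that standard trick; PMNet's exact decoder may differ, and the heatmap values are made up.

```python
def decode_heatmap(hm):
    """Argmax keypoint decoding with a quarter-pixel shift toward the
    larger neighbour in each axis (a standard compensation heuristic)."""
    h, w = len(hm), len(hm[0])
    y, x = max(((r, c) for r in range(h) for c in range(w)),
               key=lambda rc: hm[rc[0]][rc[1]])
    fx, fy = float(x), float(y)
    if 0 < x < w - 1:
        fx += 0.25 if hm[y][x + 1] > hm[y][x - 1] else -0.25
    if 0 < y < h - 1:
        fy += 0.25 if hm[y + 1][x] > hm[y - 1][x] else -0.25
    return fx, fy

# Peak at (1, 1); larger neighbours to the right and below pull the
# decoded coordinate toward them.
hm = [[0.0, 0.1, 0.0],
      [0.2, 1.0, 0.6],
      [0.0, 0.3, 0.0]]
xy = decode_heatmap(hm)
```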

39 pages, 9763 KB  
Article
SAR-DRBNet: Adaptive Feature Weaving and Algebraically Equivalent Aggregation for High-Precision Rotated SAR Detection
by Lanfang Lei, Sheng Chang, Zhongzhen Sun, Xinli Zheng, Changyu Liao, Wenjun Wei, Long Ma and Ping Zhong
Remote Sens. 2026, 18(4), 619; https://doi.org/10.3390/rs18040619 - 16 Feb 2026
Abstract
Synthetic aperture radar (SAR) imagery is widely used for target detection in complex backgrounds and adverse weather conditions. However, high-precision detection of rotated small targets remains challenging due to severe speckle noise, significant scale variations, and the need for robust rotation-aware representations. To address these issues, we propose SAR-DRBNet, a high-precision rotated small-target detection framework built upon YOLOv13. First, we introduce a Detail-Enhanced Oriented Bounding Box detection head (DEOBB), which leverages multi-branch enhanced convolutions to strengthen fine-grained feature extraction and improve oriented bounding box regression, thereby enhancing rotation sensitivity and localization accuracy for small targets. Second, we design a Ck-MultiDilated Reparameterization Block (CkDRB) that captures multi-scale contextual cues and suppresses speckle interference via multi-branch dilated convolutions and an efficient reparameterization strategy. Third, we propose a Dynamic Feature Weaving module (DynWeave) that integrates global–local dual attention with dynamic large-kernel convolutions to adaptively fuse features across scales and orientations, improving robustness in cluttered SAR scenes. Extensive experiments on three widely used SAR rotated object detection benchmarks (HRSID, RSDD-SAR, and DSSDD) demonstrate that SAR-DRBNet achieves a strong balance between detection accuracy and computational efficiency compared with state-of-the-art oriented bounding box detectors, while exhibiting superior cross-dataset generalization. These results indicate that SAR-DRBNet provides an effective and reliable solution for rotated small-target detection in SAR imagery.

24 pages, 4319 KB  
Article
HLNet: A Lightweight Network for Ship Detection in Complex SAR Environments
by Xiaopeng Guo, Fan Deng, Jie Gong, Jing Zhang, Jiajia Guo, Yong Wang, Yinmei Zeng and Gongquan Li
Remote Sens. 2026, 18(4), 577; https://doi.org/10.3390/rs18040577 - 12 Feb 2026
Abstract
The coherent speckle noise in synthetic aperture radar (SAR) imagery, together with complex sea clutter and large variations in ship target scales, poses significant challenges to accurate and robust ship detection, particularly under strict lightweight constraints required by satellite-borne and airborne platforms. To address this issue, this paper proposes a high-precision lightweight detection network, termed High-Lightweight Net (HLNet), specifically designed for SAR ship detection. The network incorporates a novel multi-scale backbone, Multi-Scale Net (MSNet), which integrates dynamic feature completion and multi-core parallel convolutions to alleviate small-target feature loss and suppress background interference. To further enhance multi-scale feature fusion while reducing model complexity, a lightweight path aggregation feature pyramid network, High-Lightweight Feature Pyramid (HLPAFPN), is introduced by reconstructing fusion pathways and removing redundant channels. In addition, a lightweight detection head, High-Lightweight Head (HLHead), is designed by combining grouped convolutions with distribution focal loss to improve localization robustness under low signal-to-noise ratio conditions. Extensive experiments conducted on the public SSDD and HRSID datasets demonstrate that HLNet achieves mAP50 scores of 98.3% and 91.7%, respectively, with only 0.66 M parameters. Further evaluations on the more challenging CSID subset, composed of complex scenes selected from SSDD and HRSID, demonstrate that HLNet attains an mAP50 of 75.9%, outperforming the baseline by 4.3%. These results indicate that HLNet achieves an effective balance between detection accuracy and computational efficiency, making it well-suited for deployment on resource-constrained SAR platforms.
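In distribution-focal-loss (DFL) style regression, as used in HLHead, the head predicts a discrete distribution over distance bins and the regressed value is read out as its expectation. A minimal sketch of that read-out (the bin count and logits here are made up, and this is only the expectation step, not the loss itself):

```python
import math

def dfl_expectation(logits):
    """Expected bin index under softmax(logits): d = sum_i p_i * i."""
    m = max(logits)                      # stabilize the softmax
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return sum(i * e / z for i, e in enumerate(exps))
```

A distribution concentrated on one bin reads out as that bin; a uniform distribution reads out as the midpoint, which is why DFL can express sub-bin (sub-pixel) box distances.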

23 pages, 1285 KB  
Article
GTO-YOLO11n: YOLOv11n-Based Efficient Target Detection in Ship Remote Sensing Imagery
by Bei Xiao, Peisheng Liu, Xiwang Guo, Bin Hu, Jiankang Ren and Yushuang Jiang
Processes 2026, 14(4), 583; https://doi.org/10.3390/pr14040583 - 7 Feb 2026
Abstract
Accurate and efficient ship detection in remote sensing imagery is a key enabler of intelligent maritime surveillance operations, supporting real-time decision-making in search and rescue, traffic management, and maritime law enforcement. However, remote ship images pose unique challenges for detection. These include densely distributed targets, complex sea-land backgrounds, large aspect ratios, diverse ship geometries, and high color similarity between ships and their surroundings. To address these issues under the computational constraints of unmanned aerial platforms, we propose GTO-YOLO11n, an enhanced YOLOv11n-based detection model tailored for efficient maritime ship sensing. First, we introduce the GatedFDConvBlock, which employs gated convolutional filtering to strengthen feature extraction for small and elongated ships while suppressing background clutter, thereby reducing missed and false detections in dense scenes. Second, we improve the C2PSA module with a dynamic multi-scale attention design, TSSABlock_DMS, to adaptively model cross-scale feature interactions and enhance robustness to complex maritime environments. Third, we replace the original detection head with OBB_ED, a parameter-sharing head that incorporates depthwise separable convolution (DSConv) and an angle prediction branch to lower model complexity while preserving high-quality localization and classification. To verify the performance of the algorithm, experiments were conducted on the public datasets HRSC2016, HRSC2016-MS, and ShipRSImageNet. The mAP@50 results were 95.2%, 88.3%, and 76.7%, showing improvements of 3.2%, 2.2%, and 2.6% compared to the original YOLOv11n.
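The complexity saving from depthwise separable convolution is plain parameter arithmetic: a standard k×k convolution costs c_in·c_out·k², while the depthwise (c_in·k²) plus pointwise (c_in·c_out) factorization costs far less. The channel counts below are illustrative, not GTO-YOLO11n's actual layer sizes; bias terms are omitted.

```python
def conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (no bias)."""
    return c_in * c_out * k * k

def dsconv_params(c_in, c_out, k):
    """Depthwise separable: depthwise k x k plus pointwise 1 x 1."""
    return c_in * k * k + c_in * c_out

# For 64 -> 64 channels with a 3x3 kernel the factorized form is
# roughly 8x smaller (36,864 vs 4,672 parameters).
std, sep = conv_params(64, 64, 3), dsconv_params(64, 64, 3)
```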

24 pages, 3204 KB  
Article
Web-Based Explainable AI System Integrating Color-Rule and Deep Models for Smart Durian Orchard Management
by Wichit Sookkhathon and Chawanrat Srinounpan
AgriEngineering 2026, 8(1), 23; https://doi.org/10.3390/agriengineering8010023 - 9 Jan 2026
Abstract
This study presents a field-oriented AI web system for durian orchard management that recognizes leaf health from on-orchard images under variable illumination. Two complementary pipelines are employed: (1) a rule-based module operating in HSV and CIE Lab color spaces that suppresses sun-induced specular highlights via V/L* thresholds and applies interpretable hue–chromaticity rules with spatial constraints; and (2) a Deep Feature (PCA–SVM) pipeline that extracts features from pretrained ResNet50 and DenseNet201 models, performs dimensionality reduction using Principal Component Analysis, and classifies samples into three agronomic classes: healthy, leaf-spot, and leaf-blight. This hybrid architecture enhances transparency for growers while remaining robust to illumination variations and background clutter typical of on-farm imaging. Preliminary on-farm experiments under real-world field conditions achieved approximately 80% classification accuracy, whereas controlled evaluations using curated test sets showed substantially higher performance for the Deep Features and Ensemble model, with accuracy reaching 0.97–0.99. The web interface supports near-real-time image uploads, annotated visual overlays, and Thai-language outputs. Usability testing with thirty participants indicated very high satisfaction (mean 4.83, SD 0.34). The proposed system serves as both an instructional demonstrator for explainable AI-based image analysis and a practical decision-support tool for digital horticultural monitoring.
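Threshold-based specular-highlight suppression in HSV flags pixels that are very bright (high V) and desaturated (low S) — the signature of sun glint. The sketch below uses Python's stdlib `colorsys`; the exact thresholds are illustrative assumptions, not the authors' calibrated V/L* values.

```python
import colorsys

def specular_mask(rgb_pixels, v_thresh=0.95, s_thresh=0.2):
    """Flag likely specular-highlight pixels (RGB floats in [0, 1])."""
    mask = []
    for r, g, b in rgb_pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        mask.append(v >= v_thresh and s <= s_thresh)
    return mask

# A near-white glint pixel is flagged; a saturated green leaf pixel is not.
m = specular_mask([(1.0, 1.0, 0.98), (0.1, 0.6, 0.1)])
```

Flagged pixels would then be excluded before the hue–chromaticity rules are applied, so glare does not masquerade as a lesion.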

14 pages, 9038 KB  
Article
BSGNet: Vehicle Detection in UAV Imagery of Construction Scenes via Biomimetic Edge Awareness and Global Receptive Field Modeling
by Yongwei Wang, Yuan Chen, Yakun Xie, Jun Zhu, Chao Dang and Hao Zhu
Drones 2026, 10(1), 32; https://doi.org/10.3390/drones10010032 - 5 Jan 2026
Abstract
Detecting vehicles in remote sensing images of construction sites captured by Unmanned Aerial Vehicles (UAVs) faces severe challenges, including extremely small target scales, high inter-class visual similarity, cluttered backgrounds, and highly variable imaging conditions. To address these issues, we propose BSGNet (Biomimetic Sharpening and Global Receptive Field Network)—a novel detection architecture that synergistically fuses biologically inspired visual mechanisms with global receptive field modeling. Inspired by the Sustained Contrast Detection (SCD) mechanism in frog retinal ganglion cells, we design a Perceptual Sharpening Module (PSM). This module combines dual-path contrast enhancement with spatial attention mechanisms to significantly improve sensitivity to the high-frequency edge structures of small targets while effectively suppressing interfering backgrounds. To overcome the inherent limitation of such biomimetic mechanisms—specifically their restricted local receptive fields—we further introduce a Global Heterogeneous Receptive Field Learning Module (GRM). This module employs parallel multi-branch dilated convolutions and local detail enhancement paths to achieve joint modeling of long-range semantic context and fine-grained local features. Extensive experiments on our newly constructed UAV Construction Vehicle (UCV) dataset demonstrate that BSGNet achieves state-of-the-art performance: obtaining 64.9% APs on small targets and 81.2% on the overall mAP@0.5 metric, with an inference latency of only 31.4 milliseconds, outperforming existing mainstream detection frameworks in multiple metrics. Furthermore, the model demonstrates robust generalization performance on public datasets.
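The classic form of contrast enhancement underlying sharpening modules like the PSM is Laplacian (unsharp-style) boosting: each sample is pushed away from its neighborhood mean, which amplifies edges. A 1D sketch of that idea only — the paper's dual-path, attention-gated module is considerably richer.

```python
def sharpen_1d(x, alpha=1.0):
    """Laplacian contrast boost on a 1D signal.

    Interior samples become x[i] + alpha * (2*x[i] - x[i-1] - x[i+1]);
    endpoints are copied unchanged.
    """
    out = list(x)
    for i in range(1, len(x) - 1):
        out[i] = x[i] + alpha * (2 * x[i] - x[i - 1] - x[i + 1])
    return out

# A step edge gains overshoot/undershoot, i.e. higher local contrast.
edge = sharpen_1d([0.0, 0.0, 1.0, 1.0])
```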

21 pages, 4180 KB  
Article
Mine Exogenous Fire Detection Algorithm Based on Improved YOLOv9
by Xinhui Zhan, Rui Yao, Yun Qi, Chenhao Bai, Qiuyang Li and Qingjie Qi
Processes 2026, 14(1), 169; https://doi.org/10.3390/pr14010169 - 4 Jan 2026
Abstract
Exogenous fires in underground coal mines are characterized by low illumination, smoke occlusion, heavy dust loading and pseudo fire sources, which jointly degrade image quality and cause missed and false alarms in visual detection. To achieve accurate and real-time early warning under such conditions, this paper proposes a mine exogenous fire detection algorithm based on an improved YOLOv9m, termed PPL-YOLO-F-C. First, a lightweight PP-LCNet backbone is embedded into YOLOv9m to reduce the number of parameters and GFLOPs while maintaining multi-scale feature representation suitable for deployment on resource-constrained edge devices. Second, a Fully Connected Attention (FCAttention) module is introduced to perform fine-grained frequency–channel calibration, enhancing discriminative flame and smoke features and suppressing low-frequency background clutter and non-flame textures. Third, the original upsampling operators in the neck are replaced by the CARAFE content-aware dynamic upsampler to recover blurred flame contours and tenuous smoke edges and to strengthen small-object perception. In addition, an MPDIoU-based bounding-box regression loss is adopted to improve geometric sensitivity and localization accuracy for small fire spots. Experiments on a self-constructed mine fire image dataset comprising 3000 samples show that the proposed PPL-YOLO-F-C model achieves a precision of 97.36%, a recall of 84.91%, mAP@50 of 96.49% and mAP@50:95 of 76.6%, outperforming Faster R-CNN, YOLOv5m, YOLOv7 and YOLOv8m while using fewer parameters and lower computational cost. The results demonstrate that the proposed algorithm provides a robust and efficient solution for real-time exogenous fire detection and edge deployment in complex underground mine environments.
(This article belongs to the Section AI-Enabled Process Engineering)
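MPDIoU augments plain IoU with the squared distances between the two boxes' top-left and bottom-right corners, normalized by the image diagonal: MPDIoU = IoU − d_tl²/(w²+h²) − d_br²/(w²+h²). The sketch below follows that published formulation for axis-aligned boxes; the box and image sizes are illustrative, and the actual loss in the paper is applied inside YOLOv9's regression branch.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def mpdiou(a, b, img_w, img_h):
    """IoU penalized by normalized corner-distance terms."""
    norm = img_w ** 2 + img_h ** 2
    d_tl = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    d_br = (a[2] - b[2]) ** 2 + (a[3] - b[3]) ** 2
    return iou(a, b) - d_tl / norm - d_br / norm
```

Unlike plain IoU, the corner terms still give a useful gradient signal when boxes have the same overlap but different placements, which is what improves geometric sensitivity for small fire spots.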

20 pages, 2188 KB  
Article
SAQ-YOLO: An Efficient Small Object Detection Model for Unmanned Aerial Vehicle in Maritime Search and Rescue
by Sichen Li, Hao Yi, Shengyi Chen, Xinmin Chen, Mao Xu and Feifan Yu
Appl. Sci. 2026, 16(1), 131; https://doi.org/10.3390/app16010131 - 22 Dec 2025
Abstract
In Search and Rescue (SAR) missions, UAVs must be capable of detecting small objects from complex and noise-prone maritime images. Existing small object detection methods typically rely on super-resolution techniques or complex structural designs, which often demand significant computational resources and fail to meet the real-time requirements for small mobile devices in SAR tasks. To address this challenge, we propose SAQ-YOLO, an efficient small object detection model based on the YOLO framework. We design a Small Object Auxiliary Query branch, which uses deep semantic information to guide the fusion of shallow features, thereby improving small object capture efficiency. Additionally, SAQ-YOLO incorporates a series of lightweight channel, spatial, and group (large kernel) gated attention mechanisms to suppress background clutter in complex maritime environments, enhancing feature extraction at a low computational cost. Experiments on the SeaDronesSee dataset demonstrate that, compared to YOLOv11s, SAQ-YOLO reduces the number of parameters by approximately 70% while increasing mAP@50 by 2.1 percentage points. Compared to YOLOv11n, SAQ-YOLO improves mAP@50 by 8.7 percentage points. When deployed on embedded platforms, SAQ-YOLO achieves an inference latency of only 35 milliseconds per frame, meeting the real-time requirements of maritime SAR applications. These results suggest that SAQ-YOLO provides an efficient and deployable solution for UAV SAR operations in vast and highly dynamic marine environments. Future work will focus on enhancing the robustness of the detection model.

24 pages, 8304 KB  
Article
STAIR-DETR: A Synergistic Transformer Integrating Statistical Attention and Multi-Scale Dynamics for UAV Small Object Detection
by Linna Hu, Penghao Xue, Bin Guo, Yiwen Chen, Weixian Zha and Jiya Tian
Sensors 2025, 25(24), 7681; https://doi.org/10.3390/s25247681 - 18 Dec 2025
Abstract
Detecting small objects in unmanned aerial vehicle (UAV) imagery remains a challenging task due to the limited target scale, cluttered backgrounds, severe occlusion, and motion blur commonly observed in dynamic aerial environments. This study presents STAIR-DETR, a real-time synergistic detection framework derived from RT-DETR, featuring comprehensive enhancements in feature extraction, resolution transformation, and detection head design. A Statistical Feature Attention (SFA) module is incorporated into the neck to replace the original AIFI, enabling token-level statistical modeling that strengthens fine-grained feature representation while effectively suppressing background interference. The backbone is reinforced with a Diverse Semantic Enhancement Block (DSEB), which employs multi-branch pathways and dynamic convolution to enrich semantic expressiveness without sacrificing spatial precision. To mitigate information loss during scale transformation, an Adaptive Scale Transformation Operator (ASTO) is proposed by integrating Context-Guided Downsampling (CGD) and Dynamic Sampling (DySample), achieving context-aware compression and content-adaptive reconstruction across resolutions. In addition, a high-resolution P2 detection head is introduced to leverage shallow-layer features for accurate classification and localization of extremely small targets. Extensive experiments conducted on the VisDrone2019 dataset demonstrate that STAIR-DETR attains 41.7% mAP@50 and 23.4% mAP@50:95, outperforming contemporary state-of-the-art (SOTA) detectors while maintaining real-time inference efficiency. These results confirm the effectiveness and robustness of STAIR-DETR for precise small object detection in complex UAV-based imaging scenarios.
(This article belongs to the Special Issue Dynamics and Control System Design for Robotics)
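Token-level statistical modeling starts from per-token summary statistics such as the mean and standard deviation of each token's feature vector, which an attention module can then reweight. The sketch below computes only those summaries — a guess at the kind of statistic involved; the paper's SFA module is more involved, and the token values are illustrative.

```python
import math

def token_stats(tokens):
    """Per-token (mean, std) over each token's feature vector."""
    out = []
    for t in tokens:
        m = sum(t) / len(t)
        var = sum((v - m) ** 2 for v in t) / len(t)
        out.append((m, math.sqrt(var)))
    return out
```

A flat background token yields near-zero standard deviation while a token covering a small target shows higher spread, giving a cheap signal for suppressing background tokens.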
