MDPI - Publisher of Open Access Journals

18 pages, 6162 KB

Open Access Article

YOLO-UTD: A Domain-Specific Detection Framework for Small Objects in UAV Traffic Surveillance

by Hailang Huang, Meng Li, Jiebao Zhang and Yitong Li

Sensors 2026, 26(12), 3931; https://doi.org/10.3390/s26123931 (registering DOI) - 20 Jun 2026

Detecting objects in drone-captured aerial imagery is particularly formidable due to challenges such as the prevalence of numerous small targets and their dense spatial distribution. To bridge this gap, this paper introduces YOLO-UTD (YOLO-UAV Traffic Detection), a dedicated small object detector tailored for drone traffic surveillance. Built upon the YOLOv8 framework, the proposed model incorporates three principal enhancements. First, a specialized small-object detection head replaces the original large-object head to increase the sensitivity to fine-grained features. Second, we introduce a shallow-augmented feature pyramid network (SFPN) into the neck module. The SFPN enriches the semantic content of high-resolution shallow features via dense multiscale interactions and CARAFE upsampling, boosting performance on small targets. Finally, a C2fA layer is integrated into the deep backbone stages to adaptively fuse spatial details and semantic context through a dual-path architecture and a cross-attention mechanism, thereby dynamically refining features critical for small objects. Extensive experiments on the VisDrone2019 dataset validate that YOLO-UTD achieves a 3.6% higher mean average precision (mAP) than YOLOv8 while preserving a low parameter footprint, with a particularly significant gain of 5.3% in vehicle detection accuracy. These findings confirm the model’s efficacy and strong potential for application in smart city drone surveillance. Full article
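The effect of swapping the large-object head for a small-object head can be illustrated with simple grid arithmetic. The sketch below is not taken from the paper: the 640-pixel input, the default YOLOv8 strides of 8/16/32, and the stride-4 replacement head are assumptions used only to show why a finer grid helps with small targets.

```python
# Illustrative sketch only. The input size and the stride-4 head are
# assumptions; the abstract does not state the exact strides used by YOLO-UTD.

def head_summary(input_size: int, strides: list[int]) -> dict[int, int]:
    """Map each detection-head stride to the side length of its feature map."""
    return {stride: input_size // stride for stride in strides}


def grid_cells_covered(object_px: int, stride: int) -> float:
    """Approximate number of feature-map cells spanned by one side of an object."""
    return object_px / stride


# Default YOLOv8 heads predict at strides 8, 16 and 32.
baseline = head_summary(640, [8, 16, 32])

# Hypothetical variant: the stride-32 (large-object) head is dropped and a
# stride-4 (small-object) head is added.
small_object = head_summary(640, [4, 8, 16])
```

Under these assumptions a 12-pixel vehicle spans 3 cells on the stride-4 map but less than half a cell on the stride-32 map, which is the intuition behind dedicating a head to high-resolution features.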

(This article belongs to the Topic Transformer and Deep Learning Applications in Image Processing)

29 pages, 6688 KB

Open Access Article

CGMSN: CFAR-Guided Mode-Selective Network for SAR Target Detection

by Lingjuan Yu, Xinya Xiong, Xiaochun Xie, Miaomiao Liang, Xiangchun Yu, Xuan Jiao and Wen Hong

Remote Sens. 2026, 18(12), 2040; https://doi.org/10.3390/rs18122040 - 18 Jun 2026

Viewed by 85

Abstract

Improving detection performance across diverse synthetic aperture radar (SAR) scenes remains challenging because different datasets exhibit different levels of target–background separability. To address this issue, we propose a constant false alarm rate (CFAR)-guided mode-selective network (CGMSN), which selects an appropriate feature-fusion mode according to the CFAR target–background separation margin. Specifically, CFAR is used as an interpretable statistical tool to construct an anomaly response map. The separation margin is then calculated by comparing the average CFAR anomaly responses of annotated target regions and their surrounding contextual backgrounds. Based on this indicator, a You Only Look Once version 8 (YOLOv8)-based mode-selective detector is constructed with three key components. First, a lightweight representation-enhanced backbone that integrates ResNet18 and a dilated convolutional spatial pyramid (DCSP) module is adopted to improve contextual representation while maintaining moderate model complexity. Second, a mode-selective neck (MSN) is designed with three predefined fusion modes, where the appropriate fusion depth is selected according to the CFAR-guided target–background separation margin of each dataset. Third, a complete intersection over union (CIoU)-modulated head (CMH) is developed to enhance classification–regression alignment and suppress clutter-induced responses. Experiments on SAR-Aircraft-1.0, High-Resolution SAR Images Dataset (HRSID), and SAR Ship Detection Dataset (SSDD) indicate that datasets with smaller CFAR target–background separation margins benefit from deeper fusion, while datasets with larger separation margins can adopt shallower fusion. Moreover, CGMSN outperforms representative detectors on the evaluated SAR datasets, which have diverse scene characteristics.
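The separation margin described above can be sketched as a difference of two means over an anomaly response map. This is a minimal illustration, not the authors' implementation: the use of a difference rather than a ratio, the width of the context ring (`pad`), the thresholds `low` and `high`, and the mode names are all assumptions.

```python
from statistics import mean


def cells(rows, cols, r0, c0, r1, c1):
    """Grid coordinates inside an end-exclusive box, clipped to the map."""
    return [(r, c)
            for r in range(max(r0, 0), min(r1, rows))
            for c in range(max(c0, 0), min(c1, cols))]


def separation_margin(response, box, pad=2):
    """Mean response inside the target box minus mean response in a ring around it.

    response: 2D list holding a CFAR anomaly response map (assumed precomputed).
    box: (row0, col0, row1, col1), end-exclusive annotation.
    pad: hypothetical width of the surrounding context ring, in pixels.
    """
    rows, cols = len(response), len(response[0])
    r0, c0, r1, c1 = box
    target = cells(rows, cols, r0, c0, r1, c1)
    inside = set(target)
    context = [rc for rc in cells(rows, cols, r0 - pad, c0 - pad, r1 + pad, c1 + pad)
               if rc not in inside]
    target_mean = mean([response[r][c] for r, c in target])
    context_mean = mean([response[r][c] for r, c in context])
    return target_mean - context_mean


def select_fusion_mode(margin, low=0.3, high=0.6):
    """Pick one of three fusion depths; thresholds and names are hypothetical."""
    if margin < low:
        return "deep"      # weak separability: fuse more levels
    if margin < high:
        return "medium"
    return "shallow"       # strong separability: lighter fusion suffices
```

The mapping from small margins to deeper fusion follows the trend reported in the abstract; in the paper the margin is computed per dataset, so a real pipeline would average it over all annotated targets before selecting a mode.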

(This article belongs to the Special Issue Target Recognition and Detection Based on High Resolution Radar Images (Second Edition))

26 pages, 3882 KB

Open Access Article

Remote Sensing Small Object Detection Network Based on Wavelet-Convolution and Fine-Grained Preservation

by Hangyu Li and Tiecheng Song

Information 2026, 17(6), 609; https://doi.org/10.3390/info17060609 (registering DOI) - 18 Jun 2026

Viewed by 137

Abstract

Small object detection in remote sensing imagery is a fundamental task for visual information extraction, yet it remains challenging due to extremely small target scales, complex backgrounds, and the loss of discriminative feature information caused by repeated downsampling. To address these issues, this paper proposes a Wavelet-Convolution and Fine-Grained Preservation Network (WCFPNet) based on YOLOv8n. Specifically, a Wavelet-Convolution Module (WCM) is introduced into the backbone to decompose feature maps into low- and high-frequency sub-bands, thereby enhancing structural feature modeling and preserving subtle target details. To compensate for the weakened fine-grained information after repeated downsampling, an Enhanced Spatial Pyramid Pooling-Fast (ESPPF) module is embedded at the end of the backbone to strengthen multi-scale contextual aggregation. In addition, an Enhanced Feature Pyramid Network (EFPN) is designed in the neck to facilitate the propagation of shallow and intermediate fine-grained features to high-level semantic features through cross-level fusion and the Convolutional Block Attention Module (CBAM). Experiments on the NWPU VHR-10 dataset show that WCFPNet achieves 0.879 mAP@0.5 and 0.515 mAP@0.5:0.95, outperforming YOLOv8n by 1.7 and 2.5 percentage points, respectively. Moreover, the proposed WCFPNet achieves a competitive performance compared with several representative detectors while maintaining moderate model complexity. These results demonstrate the effectiveness of WCFPNet in challenging remote sensing scenes characterized by complex backgrounds, dense object distributions, and weak textures. Full article
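Decomposing a feature map into low- and high-frequency sub-bands can be illustrated with a single-level 2D Haar transform. The sketch below is a generic Haar decomposition in plain Python, not the WCM module itself; the choice of the Haar wavelet, the sub-band names, and the sign convention are assumptions.

```python
def haar_dwt2(x):
    """Single-level orthonormal 2D Haar transform of an even-sized 2D list.

    Returns four half-resolution sub-bands: LL (low frequency) and
    LH, HL, HH (detail bands; naming conventions differ between libraries).
    """
    if len(x) % 2 or len(x[0]) % 2:
        raise ValueError("height and width must be even")
    bands = {"LL": [], "LH": [], "HL": [], "HH": []}
    for i in range(0, len(x), 2):
        row = {name: [] for name in bands}
        for j in range(0, len(x[0]), 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            row["LL"].append((a + b + c + d) / 2)  # local average
            row["LH"].append((a - b + c - d) / 2)  # difference across columns
            row["HL"].append((a + b - c - d) / 2)  # difference across rows
            row["HH"].append((a - b - c + d) / 2)  # diagonal difference
        for name in bands:
            bands[name].append(row[name])
    return bands


def haar_idwt2(bands):
    """Invert haar_dwt2, recovering the original map exactly."""
    out = []
    for i in range(len(bands["LL"])):
        top, bottom = [], []
        for j in range(len(bands["LL"][0])):
            s, h = bands["LL"][i][j], bands["LH"][i][j]
            v, g = bands["HL"][i][j], bands["HH"][i][j]
            top += [(s + h + v + g) / 2, (s - h + v - g) / 2]
            bottom += [(s + h - v - g) / 2, (s - h - v + g) / 2]
        out += [top, bottom]
    return out
```

Because the transform is invertible, halving the resolution this way moves fine detail into the high-frequency bands instead of discarding it, which is the property wavelet-based downsampling relies on.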

(This article belongs to the Special Issue Emerging Research in Target Detection and Recognition in Remote Sensing Images, 2nd Edition)

40 pages, 24197 KB

Open Access Article

Research on Object Detection in Cluttered Hospital Corridor Scenes with CSAWOA-YOLOv8

by Tianye Luo, Jing Hu, Bangcheng Zhang, Xinming Zhang and Shaoming Luo

Biomimetics 2026, 11(6), 431; https://doi.org/10.3390/biomimetics11060431 - 17 Jun 2026

Viewed by 116

Abstract

Hospital corridors are dynamic, cluttered scenes with large variations in target scale, frequent occlusions, and densely distributed small objects, which challenge the accuracy and efficiency of existing methods on resource-constrained platforms. To address these challenges, this work introduces CSAWOA-YOLOv8, a high-precision detection framework for complex medical environments that combines the Cross Search Adaptive Whale Optimization Algorithm (CSAWOA) with You Only Look Once version 8 (YOLOv8). By jointly modelling high-level semantic information and low-level cues such as texture and colour, the proposed model obtains a more discriminative and informative feature representation. A Transformer-Convolutional Bottleneck Structure (T-CBS) module is proposed to extract shallow-level features and integrate global contextual information, addressing target occlusion. Integrating the BiFormer module further enhances feature discriminability, improving small-target recognition while reducing sensitivity to background noise. The classification function is modified to address class imbalance in complex corridor environments. The combination of these two concepts balances diversity in detection against convergence speed, leading to improved optimization performance and greater resistance to local-optimum stagnation. Meanwhile, an improved version of the WOA, termed CSAWOA, is developed to enable automatic hyperparameter optimization for the improved YOLOv8 model. Experimental results show improvements of 4.9%, 6.1%, and 8.3% in mAP, precision, and recall, respectively, over YOLOv8, along with better generalization. Overall, the proposed method provides a reliable and efficient approach for object detection in complex hospital corridors, offering a valuable foundation for future research and real-world healthcare applications.
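For orientation, the sketch below shows the standard Whale Optimization Algorithm applied to a toy hyperparameter search. It is the baseline WOA, not the CSAWOA variant proposed in the paper, whose cross-search and adaptive components are not described in the abstract. The objective function, the search bounds, and the population settings are hypothetical.

```python
import math
import random


def whale_optimize(objective, bounds, n_whales=12, n_iter=60, seed=0):
    """Minimise `objective` over box `bounds` with the standard WOA update rules."""
    rng = random.Random(seed)

    def clip(x):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    whales = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_whales)]
    best = min(whales, key=objective)[:]
    best_val = objective(best)

    for it in range(n_iter):
        a = 2.0 * (1.0 - it / n_iter)  # decreases linearly from 2 toward 0
        for i, x in enumerate(whales):
            r1, r2, p = rng.random(), rng.random(), rng.random()
            A, C = 2.0 * a * r1 - a, 2.0 * r2
            if p < 0.5:
                # Encircle the best whale when |A| < 1, otherwise explore
                # around a randomly chosen whale.
                ref = best if abs(A) < 1.0 else rng.choice(whales)
                new = [rv - A * abs(C * rv - xv) for rv, xv in zip(ref, x)]
            else:
                # Spiral (bubble-net) move toward the best whale, shape constant b = 1.
                ell = rng.uniform(-1.0, 1.0)
                new = [abs(bv - xv) * math.exp(ell) * math.cos(2.0 * math.pi * ell) + bv
                       for bv, xv in zip(best, x)]
            whales[i] = clip(new)
            val = objective(whales[i])
            if val < best_val:  # keep the best solution seen so far
                best, best_val = whales[i][:], val
    return best, best_val


def toy_objective(params):
    """Hypothetical stand-in for a validation loss over (log10 learning rate, momentum)."""
    log_lr, momentum = params
    return (log_lr + 3.0) ** 2 + (momentum - 0.9) ** 2
```

In a real setting `objective` would train and validate the detector for each candidate, so every evaluation is expensive and the population size and iteration count would have to be kept small.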

(This article belongs to the Section Biological Optimisation and Management)

33 pages, 8778 KB

Open Access Article

SPTD-YOLO: Small-Object-Aware Pyramidal and Task-Aligned Dynamic YOLO for UAV Small Object Detection

by Jiarui Liang, Jiachen Yu, Mingyang Li, Yikui Zhai and Xiaolin Tian

Appl. Sci. 2026, 16(12), 6062; https://doi.org/10.3390/app16126062 - 15 Jun 2026

Viewed by 124

Abstract

Unmanned aerial vehicle (UAV) object detection plays an essential role in modern visual perception, but it remains challenging because UAV imagery typically contains extremely small, densely distributed objects embedded in complex backgrounds. Conventional detectors, including the recent YOLOv12, are prone to losing critical spatial details during downsampling and often exhibit task misalignment between classification and localization, particularly under severe scale variations. To address these problems, this study proposes SPTD-YOLO, a small-object-aware pyramidal and task-aligned dynamic detector. Specifically, a Small Object Enhanced Pyramid (SOEP) is developed by incorporating SPDConv and CSPOmniKernel to preserve and refine shallow, fine-grained features. In addition, a high-resolution P₂ detection layer is introduced to increase spatial grid density and strengthen the structural representation of tiny objects. Furthermore, a Task-Aligned Dynamic Detection Head (TADDH) is designed to decouple and coordinate classification and regression through dynamic convolution and a synergistic dual-gating mechanism. Experiments on VisDrone2019 show that SPTD-YOLO improves mAP@0.5 by 8.37% and mAP@0.5:0.95 by 5.11% over YOLOv12 while maintaining practical efficiency for UAV edge deployment. Full article
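SPDConv is built on a space-to-depth rearrangement, which halves the spatial resolution by moving pixels into extra channels rather than discarding them. The sketch below shows only that rearrangement on nested lists; the channel ordering is one possible choice, and the convolution that normally follows, as well as the exact configuration used in SOEP, are not covered.

```python
def space_to_depth(x, block=2):
    """Rearrange a [C][H][W] nested list into [C*block*block][H//block][W//block].

    Every input value appears exactly once in the output, so no spatial
    detail is lost when the resolution is reduced.
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    if height % block or width % block:
        raise ValueError("height and width must be divisible by block")
    out = []
    for c in range(channels):
        for dy in range(block):
            for dx in range(block):
                # Take every `block`-th pixel starting at offset (dy, dx).
                out.append([[x[c][i][j] for j in range(dx, width, block)]
                            for i in range(dy, height, block)])
    return out
```

A strided convolution or pooling layer with the same downsampling factor would drop or merge three of every four pixels, which is the loss of spatial detail the abstract attributes to conventional downsampling.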

21 pages, 3582 KB

Open Access Article

An Improved YOLOv8n Method for Small Thermal Defect Detection of Photovoltaic Modules in UAV Infrared Inspection

by Tengfei He, Zhongyuan Mao and Yuanchang Zhong

Remote Sens. 2026, 18(12), 1986; https://doi.org/10.3390/rs18121986 - 15 Jun 2026

Viewed by 171

Abstract

To address missed detections, false alarms, and deployment limitations in thermal defect detection of photovoltaic modules from unmanned aerial vehicle (UAV) infrared images, this paper proposes an improved detection method based on You Only Look Once version 8 nano (YOLOv8n). The proposed method is optimized according to the characteristics of UAV infrared photovoltaic inspection, including small thermal targets, weak and diffuse thermal responses, complex backgrounds, and lightweight deployment requirements. Specifically, a P2 shallow feature layer is introduced to enhance fine-grained feature perception for small thermal defects, while Ghost Convolution (GhostConv) is incorporated into the backbone to reduce model complexity. In addition, C2f-Large Separable Kernel Attention (C2f-LSKA) is embedded in the neck to strengthen contextual and spatial feature modeling under complex infrared backgrounds, and Wise-IoU version 3 (WIoUv3) is adopted to improve bounding box regression and localization stability for boundary-ambiguous thermal anomalies. Experiments are conducted on a self-constructed UAV infrared thermal imaging dataset. From nearly 10,000 inspection images, 3000 representative images are selected and manually annotated, covering typical challenges such as small hot spots, low-contrast defects, complex background interference, and diffuse abnormal temperature-rise regions. Compared with the baseline YOLOv8n, the proposed method improves Precision, Recall, mean average precision at an IoU threshold of 0.5 (mAP@0.5), and mean average precision averaged over IoU thresholds from 0.5 to 0.95 (mAP@0.5:0.95) by 5.1, 11.4, 9.6, and 13.2 percentage points, respectively, while reducing the number of parameters and model size by 65.8% and 61.9%, respectively. These results indicate that the proposed method improves detection accuracy and localization quality under the evaluated UAV infrared inspection setting while maintaining lightweight characteristics. Full article
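The two mAP variants quoted above differ only in the IoU thresholds at which a prediction counts as correct. The sketch below computes box IoU and lists the thresholds behind mAP@0.5:0.95; the step of 0.05 follows the common COCO convention and is an assumption, since the abstract gives only the range. The helper `matched_fraction` is a hypothetical illustration and is not a mAP implementation.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


# Ten thresholds: 0.50, 0.55, ..., 0.95 (COCO-style step, assumed here).
COCO_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def matched_fraction(pred, truth, thresholds=COCO_THRESHOLDS):
    """Fraction of thresholds at which `pred` would count as a true positive."""
    score = iou(pred, truth)
    return sum(score >= t for t in thresholds) / len(thresholds)
```

A box with an IoU of 0.72 is a hit at 0.5 but a miss at 0.75 and above, so gains in mAP@0.5:0.95 reflect tighter localization and not only more detections.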

38 pages, 26167 KB

Open Access Article

Uncertainty-Aware Keypoint Guidance and Fractional Fourier Feature Enhancement for Multi-Class SAR Aircraft Detection

by Yu Qiu, Bin Zou, Fangzhou Han, Lamei Zhang and Jordi J. Mallorqui

Remote Sens. 2026, 18(12), 1969; https://doi.org/10.3390/rs18121969 - 13 Jun 2026

Viewed by 112

Abstract

Aircraft targets in SAR imagery often exhibit discrete scattering characteristics, significant variations in pose and scale, strong speckle noise in background clutter, and complex background interference, which jointly hinder stable structural feature extraction and accurate target localization. Existing detectors for SAR aircraft recognition primarily rely on bounding-box regression and classification; they do not completely exploit target structural cues, spatial attention, and frequency-domain information. To address these limitations, we propose a collaborative detection framework that integrates an uncertainty-aware keypoint-driven module (UAKM) with a fractional Fourier convolution backbone (S-FRConv). UAKM introduces a center-keypoint regression branch that jointly predicts keypoint coordinates and Laplacian scale parameters and employs a 2D Laplace negative log-likelihood loss to estimate uncertainty. The derived dense uncertainty heatmap is then used as spatial attention weights to guide distribution-based regression and multi-scale feature re-weighting, without requiring any additional annotations. S-FRConv embeds the Fractional Fourier Transform into shallow backbone layers and C2f modules, enabling joint spatial–spectral feature modeling that suppresses speckle noise and enhances edge and orientation representations. Experiments on the public SAR-AIRcraft-1.0 dataset demonstrate that the proposed method systematically improves the detection performance. For the Nano model, the overall mAP50 increases from 0.810 to 0.867, and the mAP 50:95 improves from 0.637 to 0.655 compared with the baseline, corresponding to gains of 5.7 and 1.8 percentage points, respectively. These results validate the effectiveness and generalization potential of combining uncertainty-driven spatial attention with fractional spectral feature enhancement for SAR aircraft target detection. Full article
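A Laplace negative log-likelihood lets a regression branch predict a keypoint together with a scale that expresses its own uncertainty. The sketch below assumes the two coordinates are modelled as independent Laplace distributions; the abstract does not specify the exact form of the 2D loss used in UAKM, so this factorized version is an illustration only.

```python
import math


def laplace_nll_2d(pred_xy, scale_xy, target_xy):
    """Negative log-likelihood of a 2D target under per-axis Laplace distributions.

    pred_xy: predicted keypoint coordinates (mu_x, mu_y).
    scale_xy: predicted Laplace scales (b_x, b_y), each strictly positive.
    target_xy: ground-truth keypoint coordinates.
    """
    total = 0.0
    for mu, b, t in zip(pred_xy, scale_xy, target_xy):
        if b <= 0:
            raise ValueError("Laplace scale must be positive")
        # -log( 1/(2b) * exp(-|t - mu| / b) )
        total += math.log(2.0 * b) + abs(t - mu) / b
    return total
```

For a fixed error e along one axis, the term log(2b) + e/b is minimised at b = e, so the predicted scale tracks the expected localization error. That is what allows the scale map to be read as an uncertainty estimate without additional annotations.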

(This article belongs to the Special Issue Object Detection in Remote Sensing Imagery)

19 pages, 3589 KB

Open Access Article

DIDW-YOLOv11: The Steel Surface Defect Detection Method Based on Improved YOLOv11 Network

by Jiajun Jiang, Yaodan Zhang, Ziyang Xue and Chuzheng Wang

Electronics 2026, 15(12), 2593; https://doi.org/10.3390/electronics15122593 - 12 Jun 2026

Viewed by 125

Abstract

Steel surface defect detection is crucial for steel quality and safe use. High computational cost and low detection accuracy remain the main issues in current steel defect detection models. To address these issues, this paper proposes a new steel surface defect detection model named DIDW-YOLOv11. First, the YOLOv11 C3k2 module is replaced by C3K2-DIMB, which integrates C3K2 and DIMB by introducing DynamicInceptionDWConv2d (DIDW) to strengthen detailed feature extraction for tiny and weak-texture defects and to improve the matching of multi-scale receptive fields. Second, the YOLOv11 SPPF module is enhanced with the IDWFSPPF module, which combines average pooling and max pooling to optimize the fusion of local and global information and improve multi-scale feature fusion. Finally, an auxiliary detection head (ADH) with an additional coarse loss function is proposed; it applies extra supervision to shallow features to suppress background noise and reduce false detections. Experimental results on the NEU-DET and GC10-DET datasets show that DIDW-YOLOv11 improves mAP@0.5 by 4.9% and 3.8%, respectively, over the baseline YOLOv11s. These results indicate that DIDW-YOLOv11 exhibits stronger recognition ability and robustness in complex and diverse defect detection, providing an effective solution for steel defect detection in industrial production.

(This article belongs to the Special Issue Advanced Technologies and Applications for Computer Vision and Recognition Systems)

32 pages, 14789 KB

Open Access Article

A Multi-Dimensional Feature Enhancement Network for SAR Target Detection via Cascaded Frequency–Spatial Refinement

by Shanhong Guo, Ji Zhu, Gao Chen, Mu Yang and Weixing Sheng

Remote Sens. 2026, 18(12), 1888; https://doi.org/10.3390/rs18121888 - 8 Jun 2026

Viewed by 278

Abstract

Target detection in synthetic aperture radar (SAR) images is constrained by three primary challenges. First, speckle noise overlaps heavily with the high-frequency features of target edges in the frequency domain, so standard convolutions cannot suppress noise without sacrificing edge texture. Second, the scattering signature of a SAR target varies markedly with viewing angle, and a fixed-parameter convolution kernel cannot accommodate this spatial non-stationarity. Third, deep and shallow levels of the feature pyramid differ in semantics and resolution, and a naive element-wise sum either introduces noise interference or loses small-target signals. We propose the Frequency–Spatial Detection Network (FSDNet), whose core FSDBlock cascades three operators to address these failure modes in turn. Wavelet Convolution (WTConv) projects features into Haar sub-bands and applies independent low- and high-frequency kernels prior to inverse-DWT reconstruction, suppressing noise while preserving edges. Receptive-Field Attention Convolution (RFAConv) generates location-conditional kernels and so adapts to non-stationary scattering. Spatial Context Self-Attention (SCSA) aggregates discrete scattering points into coherent target representations via long-range grouped attention. At the fusion stage, CGAFusion replaces FPN element-wise addition with a channel–spatial–pixel triple-attention soft switch that mitigates deep–shallow semantic mismatch. On HRSID, FSDNet attains mAP₅₀ = 92.3% and mAP₅₀:₉₅ = 68.6%. On SSDD, it attains mAP₅₀ = 98.7% and mAP₅₀:₉₅ = 74.2%. Both sets of results consistently surpass the baseline methods. Against the strongest YOLO baseline (YOLOv11n), FSDNet improves HRSID mAP₅₀ by +1.7 percentage points (pp) and mAP₅₀:₉₅ by +2.3 pp, and SSDD mAP₅₀ by +0.5 pp and mAP₅₀:₉₅ by +2.7 pp; against the capacity-fair YOLOv11s reference (∼51% more parameters), FSDNet still leads on mAP₅₀, mAP₅₀:₉₅, recall, and F1. Ablation studies and power-spectral-density analyses corroborate the contribution of each module and confirm WTConv’s role in preserving high-frequency target features.

29 pages, 3734 KB

Open Access Article

Bathymetric Inversion of Tibetan Plateau Lakes Using Hyperspectral Imagery and ICESat-2 Data

by Chang Zhong, Yu Zhao, Mengchun Pan, Qi Zhang, Xinxin Sui, Li Chen, Ning Wang and Fan Bu

Remote Sens. 2026, 18(12), 1886; https://doi.org/10.3390/rs18121886 - 8 Jun 2026

Viewed by 226

Abstract

Lake depth is a fundamental parameter for estimating lake storage, analyzing basin morphology, and understanding the evolution of plateau lakes. Compared with typical shallow lakes, Tibetan Plateau lakes are characterized by high elevation, strong radiation, pronounced inter-lake and inter-annual variability, and in some cases considerable basin depth, which limits the accuracy, stability, and generalization ability of existing bathymetric inversion methods based on single-source optical imagery. Meanwhile, although ICESat-2 can provide sparse but high-precision along-track bathymetric constraints, a unified framework suitable for plateau-lake scenarios is still lacking. To address this issue, this study proposes TabKAN, a bathymetric inversion framework for Tibetan Plateau lakes under joint constraints from hyperspectral imagery and ICESat-2 data. TabKAN constructs tabular input features from hyperspectral reflectance, water indices, imaging geometry, and environmental variables; employs TabNet for feature selection and encoding; and introduces a KAN regression head to enhance nonlinear bathymetric mapping. A joint-supervision and bias-correction mechanism is further designed to incorporate ICESat-2 samples, thereby improving model robustness across lakes and acquisition dates. To enhance the temporal coverage of training samples, multi-year sample expansion based on stereo-mapping data is introduced, and a stripe-aware self-supervised learning strategy is developed for hyperspectral image restoration and pretraining. Experiments on five Tibetan Plateau lakes, including Anglaren Co, Caiduo Chaka, Cuoe, Geren Co, and Qixiang Co, show that the proposed method outperforms benchmark methods in both overall accuracy and depth-stratified evaluation, while providing more stable recovery of basin morphology and depth gradients. 
These results demonstrate that combining hyperspectral information, ICESat-2 laser constraints, and stripe-aware pretraining can effectively improve the accuracy and robustness of bathymetric inversion for Tibetan Plateau lakes and provide a new technical route for storage estimation and change monitoring of cold inland lakes. Full article

(This article belongs to the Special Issue Recent Advances in Hyperspectral Remote Sensing: Theories, Technologies and Applications)

24 pages, 3834 KB

Open Access Article

DMNet: A Frequency-Enhanced and Adaptive Spatial Fusion Network for RGB–Infrared Object Detection

by Yuchen Yao, Xinlong Wang and Yan Liu

Sensors 2026, 26(12), 3625; https://doi.org/10.3390/s26123625 - 6 Jun 2026

Viewed by 371

Abstract

Object detection in complex environments remains challenging due to illumination variations, background clutter, and the presence of small objects. Multimodal detection methods based on RGB and infrared (IR) data have shown promising potential by leveraging complementary information across modalities. However, existing approaches still suffer from cross-modal feature misalignment, loss of fine-grained details, and insufficient semantic interaction. In this work, we introduce a novel dual-stream framework called DMNet, specifically tailored for visible and IR multimodal object detection. The architecture integrates four core components designed to tackle these challenges: surface detail fusion (SDF) for shallow feature alignment, wavelet feature extraction (WFE) for frequency-domain enhancement, context-guided enhancement (CGE) for semantic refinement, and adaptive spatial fusion (ASF) for multi-scale feature aggregation. We conduct extensive evaluations on three benchmark datasets, including M3FD, LLVIP, and VEDAI, demonstrating that DMNet achieves superior detection performance compared with existing methods. Experimental results confirm that DMNet outperforms existing approaches, achieving an mAP@0.5 of 78.4% on M3FD, 94.4% on LLVIP, and 59.0% on VEDAI. Notably, the model maintains a relatively compact parameter scale (5.72 million parameters) while achieving superior detection performance, making it suitable for practical deployment. These findings highlight DMNet as an effective and efficient solution for multimodal object detection under challenging conditions, especially in low-light and small-object scenarios. Full article

(This article belongs to the Section Sensing and Imaging)

21 pages, 12268 KB

Open Access Article

Phase Congruency-Guided Cross-Scale Contextual Fusion Network for Salient Object Detection in Optical Remote Sensing Images

by Junfang Jiang, Wanjin Wang, Xiaohui Lin, Pingping Miao, Lina Gao and Mingzhu Xu

Remote Sens. 2026, 18(11), 1847; https://doi.org/10.3390/rs18111847 - 4 Jun 2026

Viewed by 186

Abstract

In recent years, salient object detection in optical remote sensing images (ORSI-SOD) has garnered increasing research attention. However, in practical applications, issues such as blurred target edges under low contrast and complex background interference continue to restrict the accuracy and robustness of detection. To address these problems, this paper proposes the Phase Congruency-Guided Cross-Scale Contextual Fusion Network (PCFNet). Specifically, we design a novel Phase Congruency Enhanced (PCE) Module to address low contrast between targets and backgrounds. It acquires phase features via Fourier decomposition and uses them to generate a weighting map that modulates the shallow features via element-wise multiplication, thereby highlighting structurally significant regions. We also adopt a tailored loss weighting mechanism for phase congruency learning so that the PCE module adapts better. To address complex background interference, we design a novel Dynamic Residual Fusion (DRF) Module. It leverages dynamic spatial attention to generate sample-specific kernels that perform convolution to spatially weight features and uses consecutive residual connections, thereby refining multi-scale features to accurately capture effective targets under complex background interference. Experiments on the ORSSD, EORSSD, and ORSI4199 benchmarks demonstrate that PCFNet achieves the best result on nine and the second-best result on three of the twelve core evaluation metrics, outperforming 23 state-of-the-art methods. Notably, the F_β score is 1.16% higher than HFCNet on ORSSD and 0.85% higher than MCPNet on EORSSD.
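For readers unfamiliar with the F_β score quoted above, the sketch below computes it from precision and recall. The default β² = 0.3 is the value commonly used in salient object detection and is an assumption here; the abstract does not state which β the authors used.

```python
def f_beta(precision: float, recall: float, beta_sq: float = 0.3) -> float:
    """Weighted harmonic mean of precision and recall.

    beta_sq is beta squared; values below 1 weight precision more heavily.
    """
    denom = beta_sq * precision + recall
    if denom <= 0:
        return 0.0
    return (1.0 + beta_sq) * precision * recall / denom
```

With β² = 1 this reduces to the familiar F1 score; with β² = 0.3 a drop in precision costs more than the same drop in recall.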

(This article belongs to the Section Remote Sensing Image Processing)

25 pages, 2289 KB

Open Access Article

Superpixel Random Selection Random Walk Multi-Branch Depthwise Convolutional Neural Network for Hyperspectral Image Classification

by Kai Zhang, Xinwei Jiang and Zhihua Cai

Sensors 2026, 26(11), 3558; https://doi.org/10.3390/s26113558 - 3 Jun 2026

Viewed by 294

Abstract

Convolutional neural networks (CNNs) and training-free CNN variants have been successfully applied to hyperspectral image (HSI) processing and analysis. Training-free CNNs have shown promising feature extraction performance, which could effectively address the issue of typical CNNs being highly parameterized; however, inevitable noise and redundancy in the randomly selected training-free convolutional kernels often lead to unsatisfactory performance. To address this issue, we propose the Superpixel Random Selection Random Walk Multi-Branch Depthwise Convolutional Neural Network (SRSRWMD-CNN). Specifically, we propose a novel training-free convolutional neural network characterized by inter-layer multi-scale integration and intra-layer grouping. Various superpixel groups are first generated through multi-scale superpixel segmentation algorithms; a predetermined number of superpixels is then randomly sampled from these groups to serve as training-free convolution kernels. This mechanism enables adaptive computation of HSI feature maps without costly model training in the feature extraction stage, allowing the network to effectively capture a multi-scale spectral–spatial feature representation. Additionally, we propose a multi-branch depthwise convolution strategy that mitigates feature learning errors while significantly enhancing feature representation capabilities. A random walk strategy is employed to expand the receptive field and enhance the robustness of the training-free convolution kernels. Finally, the multi-scale spectral–spatial features are concatenated with the multiple convolutional stages to fuse salient shallow and deep features for accurate HSI classification. Extensive experiments demonstrate that the proposed method achieves superior performance compared to state-of-the-art algorithms.

(This article belongs to the Special Issue High-Frequency Spectroscopy and Imaging: Techniques and Applications)


28 pages, 37658 KB

Open AccessArticle

LDSDet: Long-Range Context and Dynamic Cross-Modal Alignment for Multimodal Object Detection Under Challenging Illumination

by Shijun Sun, Shuai Ma, Xuyang Feng, Chen Sun, Baolong Ding, Yaoyao Ran and Yihong Zhang

Remote Sens. 2026, 18(11), 1827; https://doi.org/10.3390/rs18111827 - 3 Jun 2026

Viewed by 322

Abstract

In the field of remote sensing applications, multimodal object detection has emerged as an important technique for enhancing perception robustness in UAV-based scenarios. Nevertheless, RGB–IR UAV detection remains difficult: Degraded illumination destabilizes shallow representations and weakens local discriminative cues, while spatial inconsistencies and fluctuating modality reliability further hinder cross-modal interaction. In addition, existing methods, which often depend on global illumination estimation or simplistic fusion schemes, struggle to jointly maintain contextual stability, reliable cross-modal interaction, and compact discriminative representations in complex aerial scenes. To address these issues, this paper proposes LDSDet, an RGB–IR multimodal UAV object detector for challenging illumination conditions. Specifically, LDSDet integrates three complementary modules: a Long-range Aware Residual Convolution (LARC) module that enhances contextual perception and stabilizes shallow features; a Dynamic Attention-based Cross-modal Fusion (DACF) block that performs spatially adaptive RGB–IR interaction; and a lightweight SeqShuffleGate (SSG) module that suppresses redundant fusion responses to yield compact and discriminative multimodal representations. Extensive experiments on DroneVehicle, FLIR-Aligned, and LLVIP demonstrate the effectiveness of LDSDet, which achieves 85.2% $\mathrm{mAP}_{50}$, 45.3% mAP, and 67.1% mAP, respectively, showing strong robustness under day–night alternation, low-light environments, and complex illumination variations.

(This article belongs to the Section Remote Sensing for Geospatial Science)


16 pages, 2242 KB

Open AccessArticle

‘Typical’ No More: Digital Re-Evaluation of Yanguoxia Caririchnium Trackways Reveals Behavioural Complexity

by Anthony Romilio

Geosciences 2026, 16(6), 221; https://doi.org/10.3390/geosciences16060221 - 2 Jun 2026

Viewed by 273

Abstract

Ornithopod dinosaur trackways (OA and OB) from the Lower Cretaceous Hekou Group at Yanguoxia (Gansu Province, China) have previously been described as “typical”, a term applied to contrast them with swim traces from the same surface rather than as a comprehensive behavioural assessment. Building on published trackway maps, this study uses an expanded suite of quantitative digital analytical tools to reassess ichnotaxonomic affinity, manus–pes relationships, and locomotor behaviour. Pes morphology in both trackways is consistent with the ornithopod ichnogenus Caririchnium, with closest affinity to Caririchnium lotus. However, quantitative analysis reveals crossover events, extreme pes-dominated heteropody, and unusual manus placement that depart substantially from expectations for typical quadrupedal ornithopod locomotion. These features are most parsimoniously explained by trackmaker locomotion under shallow subaqueous conditions, in which partial buoyancy reduced effective forelimb loading rather than reflecting anatomically reduced palmar surfaces. Exploratory statistical analysis indicates left–right asymmetry in pace and step parameters within the OA trackway, raising the possibility of lateralised locomotor behaviour. Together, these findings demonstrate that trackways previously regarded as typical may preserve unrecognised behavioural complexity, and that digital re-evaluation of legacy ichnological datasets can substantially refine interpretations of dinosaur locomotion.

(This article belongs to the Special Issue Perspectives on Palaeogeography, Palaeoclimate, Palaeobiology and Sedimentary Records)


Search Results (542)
