
Search Results (739)

Search Parameters:
Keywords = fine-grained image

24 pages, 8476 KiB  
Article
A Weakly Supervised Network for Coarse-to-Fine Change Detection in Hyperspectral Images
by Yadong Zhao and Zhao Chen
Remote Sens. 2025, 17(15), 2624; https://doi.org/10.3390/rs17152624 - 28 Jul 2025
Abstract
Hyperspectral image change detection (HSI-CD) provides substantial value in environmental monitoring, urban planning and other fields. In recent years, deep-learning-based HSI-CD methods have made remarkable progress due to their powerful nonlinear feature learning capabilities, yet they face several challenges: the mixed-pixel phenomenon, which affects pixel-level detection accuracy; heterogeneous spatial scales of change targets, where coarse-grained features fail to preserve fine-grained details; and dependence on high-quality labels. To address these challenges, this paper introduces WSCDNet, a weakly supervised HSI-CD network employing coarse-to-fine feature learning, with key innovations including: (1) a dual-branch detection framework that integrates binary and multiclass change detection at the sub-pixel level and enhances collaborative optimization through a cross-feature coupling module; (2) a multi-granularity aggregation and difference feature enhancement module for detecting easily confused regions, which effectively improves the model’s detection accuracy; and (3) a weakly supervised learning strategy that reduces model sensitivity to noisy pseudo-labels through decision-level consistency measurement and sample filtering mechanisms. Experimental results demonstrate that WSCDNet effectively enhances the accuracy and robustness of HSI-CD tasks, exhibiting superior performance under complex scenarios and weakly supervised conditions.
(This article belongs to the Section Remote Sensing Image Processing)
19 pages, 2106 KiB  
Article
Rethinking Infrared and Visible Image Fusion from a Heterogeneous Content Synergistic Perception Perspective
by Minxian Shen, Gongrui Huang, Mingye Ju and Kaikuang Ma
Sensors 2025, 25(15), 4658; https://doi.org/10.3390/s25154658 - 27 Jul 2025
Abstract
Infrared and visible image fusion (IVIF) endeavors to amalgamate the thermal radiation characteristics from infrared images with the fine-grained texture details from visible images, aiming to produce fused outputs that are more robust and information-rich. Among the existing methodologies, those based on generative adversarial networks (GANs) have demonstrated considerable promise. However, such approaches are frequently constrained by their reliance on homogeneous discriminators possessing identical architectures, a limitation that can precipitate the emergence of undesirable artifacts in the resultant fused images. To surmount this challenge, this paper introduces HCSPNet, a novel GAN-based framework. HCSPNet distinctively incorporates heterogeneous dual discriminators, meticulously engineered for the fusion of disparate source images inherent in the IVIF task. This architectural design ensures the steadfast preservation of critical information from the source inputs, even when faced with scenarios of image degradation. Specifically, the two structurally distinct discriminators within HCSPNet are augmented with adaptive salient information distillation (ASID) modules, each uniquely structured to align with the intrinsic properties of infrared and visible images. This mechanism impels the discriminators to concentrate on pivotal components during their assessment of whether the fused image has proficiently inherited significant information from the source modalities—namely, the salient thermal signatures from infrared imagery and the detailed textural content from visible imagery—thereby markedly diminishing the occurrence of unwanted artifacts. Comprehensive experimentation conducted across multiple publicly available datasets substantiates the preeminence and generalization capabilities of HCSPNet, underscoring its significant potential for practical deployment. 
Additionally, we demonstrate that the proposed heterogeneous dual discriminators can serve as a plug-and-play structure to improve the performance of existing GAN-based methods.
(This article belongs to the Section Sensing and Imaging)
23 pages, 3875 KiB  
Article
Soil Water-Soluble Ion Inversion via Hyperspectral Data Reconstruction and Multi-Scale Attention Mechanism: A Remote Sensing Case Study of Farmland Saline–Alkali Lands
by Meichen Liu, Shengwei Zhang, Jing Gao, Bo Wang, Kedi Fang, Lu Liu, Shengwei Lv and Qian Zhang
Agronomy 2025, 15(8), 1779; https://doi.org/10.3390/agronomy15081779 - 24 Jul 2025
Viewed by 357
Abstract
The salinization of agricultural soils is a serious threat to farming and ecological balance in arid and semi-arid regions. Accurate estimation of soil water-soluble ions (calcium, carbonate, magnesium, and sulfate) is necessary for reliable monitoring of soil salinization and for sustainable land management. Ground-based hyperspectral data are valuable for soil salinization monitoring, but their acquisition cost is high and their coverage is small. Therefore, this study proposes a two-stage deep learning framework based on multispectral remote-sensing images. First, the wavelet transform is used to enhance a Transformer and extract fine-grained spectral features to reconstruct ground-based hyperspectral data. Comparison with measured ground-based hyperspectral data shows that the reconstructed spectra match the measurements in the 450–998 nm range, with R² up to 0.98 and MSE = 0.31. This high similarity compensates for the low spectral resolution and weak feature expression of multispectral remote-sensing data. Subsequently, this enhanced spectral information is integrated and fed into a novel multiscale self-attentive Transformer model (MSATransformer) to invert four water-soluble ions. Compared with BPANN, MLP, and the standard Transformer model, our model remains robust across different spectra, achieving an R² of up to 0.95 and reducing the average relative error by more than 30%. For the strongly responsive ions magnesium and sulfate, R² reaches 0.92 and 0.95 (with RMSE of 0.13 and 0.29 g/kg, respectively). For the weakly responsive ions calcium and carbonate, R² stays above 0.80 (RMSE below 0.40 g/kg). The MSATransformer framework provides a low-cost, high-accuracy solution for monitoring soil salinization at large scales and supports precision farmland management.
(This article belongs to the Special Issue Water and Fertilizer Regulation Theory and Technology in Crops)
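The abstract does not say how the wavelet transform is wired into the Transformer. As a hedged illustration of the kind of decomposition involved, the sketch below computes one level of the standard orthonormal Haar wavelet transform of a spectrum in pure Python. The choice of the Haar wavelet, the function names, and the sample reflectance values are assumptions for illustration, not the authors' configuration.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar wavelet transform (even-length input)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / s for a, b in pairs]  # low-pass: smooth spectral trend
    detail = [(a - b) / s for a, b in pairs]  # high-pass: local band-to-band changes
    return approx, detail


def haar_idwt(approx, detail):
    """Invert haar_dwt, restoring the original even/odd samples."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


# Hypothetical reflectance values for six adjacent bands (illustrative only)
reflectance = [0.21, 0.22, 0.25, 0.31, 0.30, 0.30]
approx, detail = haar_dwt(reflectance)
restored = haar_idwt(approx, detail)
```

The detail coefficients isolate the fine-grained band-to-band variation that a plain smoothing step would discard, and the transform is exactly invertible, so no spectral information is lost by the decomposition itself.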

22 pages, 4611 KiB  
Article
MMC-YOLO: A Lightweight Model for Real-Time Detection of Geometric Symmetry-Breaking Defects in Wind Turbine Blades
by Caiye Liu, Chao Zhang, Xinyu Ge, Xunmeng An and Nan Xue
Symmetry 2025, 17(8), 1183; https://doi.org/10.3390/sym17081183 - 24 Jul 2025
Viewed by 190
Abstract
Performance degradation of wind turbine blades often stems from geometric asymmetry induced by damage. Existing methods for assessing damage struggle to balance accuracy and efficiency because of their limited ability to capture fine-grained geometric asymmetries associated with multi-scale damage under complex background interference. To address this, this paper proposes a novel detection model named MMC-YOLO, based on the high-speed detection model YOLOv10-N. First, a Multi-Scale Perception Gated Convolution (MSGConv) module is designed, which constructs a full-scale receptive field through multi-branch fusion and channel rearrangement to enhance the extraction of geometric asymmetry features. Second, a Multi-Scale Enhanced Feature Pyramid Network (MSEFPN) is developed, integrating dynamic path aggregation and an SENetv2 attention mechanism to suppress background interference and amplify the damage response. Finally, a Channel-Compensated Filtering (CCF) module is constructed to preserve critical channel information using a dynamic buffering mechanism. Evaluated on a dataset of 4818 wind turbine blade damage images, MMC-YOLO achieves an mAP@[0.5:0.95] of 82.4%, a 4.4% improvement over the baseline YOLOv10-N model, and a recall of 91.1%, an 8.7% increase, while maintaining a lightweight parameter count of 4.2 million. This framework significantly improves geometric asymmetry defect detection accuracy while ensuring real-time performance, meeting engineering requirements for high efficiency and precision.
(This article belongs to the Special Issue Symmetry and Its Applications in Image Processing)
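The mAP@[0.5:0.95] figure quoted above is the COCO-style metric: average precision is computed at ten IoU thresholds (0.50, 0.55, ..., 0.95) and then averaged. The minimal sketch below, assuming boxes in (x1, y1, x2, y2) form, shows only the IoU test and how a single detection can count as a match at some thresholds but not others; the full ranking and precision-recall integration is omitted, and the function names are illustrative.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


# COCO-style thresholds 0.50, 0.55, ..., 0.95 averaged by mAP@[0.5:0.95]
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def match_flags(pred_box, gt_box):
    """Whether a prediction counts as a true positive at each IoU threshold."""
    overlap = iou(pred_box, gt_box)
    return [overlap >= t for t in IOU_THRESHOLDS]


# A detection overlapping its ground truth with IoU = 0.64 is a hit only at 0.50-0.60.
flags = match_flags((0, 0, 10, 10), (0, 0, 8, 8))
```

Because loosely localized boxes stop counting as hits at the stricter thresholds, mAP@[0.5:0.95] rewards precise localization far more than mAP@0.5 does.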

22 pages, 7139 KiB  
Article
Influence of Fe Ions on the Surface, Microstructural and Optical Properties of Solution Precursor Plasma-Sprayed TiO2 Coatings
by Key Simfroso, Romnick Unabia, Anna Gibas, Michał Mazur, Paweł Sokołowski and Rolando Candidato
Coatings 2025, 15(8), 870; https://doi.org/10.3390/coatings15080870 - 24 Jul 2025
Viewed by 502
Abstract
This work investigates how Fe incorporation influences the surface, microstructural, and optical properties of solution precursor plasma-sprayed TiO2 coatings. The Fe-TiO2 coatings were prepared using titanium isopropoxide and iron acetylacetonate as precursors, with ethanol as the solvent. X-ray diffraction analysis revealed both anatase and rutile TiO2 phases, with rutile predominant, which was also confirmed by Raman spectroscopy. The anatase content increased upon the addition of Fe ions. A longer spray distance further enhanced the anatase content and reduced the average TiO2 crystallite size in the Fe-added coatings. SEM cross-sectional images displayed fine-grained, densely packed deposits in the Fe-added coatings. UV-Vis spectroscopy showed visible-light absorption by the Fe-TiO2 coatings, with reduced band gap energies ranging from 2.846 ± 0.002 eV to 2.936 ± 0.003 eV. Photoluminescence analysis showed reduced emission intensity at 356 nm (3.48 eV) for the Fe-TiO2 coatings. These findings confirm that solution precursor plasma spraying is an effective method for developing Fe-TiO2 coatings with potential application as visible-light-active photocatalysts.
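The abstract quotes energies in eV and wavelengths in nm; the two are related by λ = hc/E, with hc ≈ 1239.84 eV·nm. The short sketch below (function and constant names are illustrative) converts the reported values, showing that band gaps of 2.846 to 2.936 eV correspond to absorption edges at roughly 436 to 422 nm, inside the visible range, and that 3.48 eV corresponds to about 356 nm, consistent with the photoluminescence peak quoted above.

```python
# Planck constant times the speed of light, expressed in eV * nm
HC_EV_NM = 1239.84198


def ev_to_nm(energy_ev):
    """Photon wavelength (nm) corresponding to a photon energy (eV): lambda = h*c / E."""
    return HC_EV_NM / energy_ev


for label, energy in [("band gap (upper)", 2.936), ("band gap (lower)", 2.846), ("PL peak", 3.48)]:
    print(f"{label}: {energy} eV -> {ev_to_nm(energy):.1f} nm")
```

A smaller band gap means a longer cutoff wavelength, which is why the reduced band gaps translate into absorption of visible light.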

25 pages, 5142 KiB  
Article
Wheat Powdery Mildew Severity Classification Based on an Improved ResNet34 Model
by Meilin Li, Yufeng Guo, Wei Guo, Hongbo Qiao, Lei Shi, Yang Liu, Guang Zheng, Hui Zhang and Qiang Wang
Agriculture 2025, 15(15), 1580; https://doi.org/10.3390/agriculture15151580 - 23 Jul 2025
Viewed by 211
Abstract
Crop disease identification is a pivotal research area in smart agriculture, forming the foundation for disease mapping and targeted prevention strategies. Among the most prevalent wheat diseases worldwide, powdery mildew, caused by fungal infection, poses a significant threat to crop yield and quality, making early and accurate detection crucial for effective management. In this study, we present QY-SE-MResNet34, a deep learning-based classification model that builds upon ResNet34 to perform multi-class classification of wheat leaf images and assess powdery mildew severity at the single-leaf level. The proposed methodology begins with dataset construction following the GB/T 17980.22-2000 national standard for powdery mildew severity grading, resulting in a curated collection of 4248 wheat leaf images at the grain-filling stage across six severity levels. To enhance model performance, we integrated transfer learning with ResNet34, leveraging pretrained weights to improve feature extraction and accelerate convergence. Further refinements included embedding a Squeeze-and-Excitation (SE) block to strengthen feature representation while maintaining computational efficiency. The model architecture was also optimized by modifying the first convolutional layer (conv1) to better capture fine-grained leaf textures and edge features: the original 7 × 7 kernel was replaced with a 3 × 3 kernel, the stride was set to 1, and the padding was set to 1. Subsequently, the optimal training strategy was determined through hyperparameter tuning experiments, and GrabCut-based background processing and data augmentation were introduced to enhance model robustness. In addition, interpretability techniques such as channel masking and Grad-CAM were employed to visualize the model’s decision-making process.
Experimental validation demonstrated that QY-SE-MResNet34 achieved an 89% classification accuracy, outperforming established models such as ResNet50, VGG16, and MobileNetV2 and surpassing the original ResNet34 by 11%. This study delivers a high-performance solution for single-leaf wheat powdery mildew severity assessment, offering practical value for intelligent disease monitoring and early warning systems in precision agriculture.
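The effect of the conv1 change can be checked with the standard convolution output-size formula, out = ⌊(in + 2·padding − dilation·(kernel − 1) − 1) / stride⌋ + 1. The sketch below assumes a 224 × 224 input, a common ImageNet-style size that the abstract does not state. The standard ResNet34 stem (7 × 7, stride 2, padding 3) halves the resolution, whereas the modified 3 × 3, stride 1, padding 1 layer preserves it.

```python
def conv_output_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial axis of a 2-D convolution (floor convention)."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (size + 2 * padding - effective_kernel) // stride + 1


# Standard ResNet34 stem convolution vs. the modified conv1 described in the abstract
print(conv_output_size(224, kernel=7, stride=2, padding=3))  # 112
print(conv_output_size(224, kernel=3, stride=1, padding=1))  # 224
```

If the rest of the network is left unchanged, every later feature map is then twice as large in each spatial dimension, which preserves fine leaf texture at the cost of extra computation in the deeper stages.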

17 pages, 1927 KiB  
Article
ConvTransNet-S: A CNN-Transformer Hybrid Disease Recognition Model for Complex Field Environments
by Shangyun Jia, Guanping Wang, Hongling Li, Yan Liu, Linrong Shi and Sen Yang
Plants 2025, 14(15), 2252; https://doi.org/10.3390/plants14152252 - 22 Jul 2025
Viewed by 253
Abstract
To address the low recognition accuracy and high model complexity of crop disease identification models operating in complex field environments, this study proposes a novel hybrid model named ConvTransNet-S, which integrates Convolutional Neural Networks (CNNs) and transformers for crop disease identification tasks. Unlike existing hybrid approaches, ConvTransNet-S introduces three key innovations. First, Local Perception Unit (LPU) and Lightweight Multi-Head Self-Attention (LMHSA) modules synergistically enhance the extraction of fine-grained plant disease details and model global dependency relationships, respectively. Second, an Inverted Residual Feed-Forward Network (IRFFN) optimizes the feature propagation path, thereby enhancing the model’s robustness against interference such as lighting variations and leaf occlusions. This combination of an LPU, LMHSA, and an IRFFN achieves a dynamic balance between local texture perception and global context modeling, effectively resolving the trade-offs inherent in standalone CNNs or transformers. Finally, a phased architecture design achieves efficient fusion of multi-scale disease features, which enhances feature discriminability while reducing model complexity. The experimental results indicated that ConvTransNet-S achieved a recognition accuracy of 98.85% on the PlantVillage public dataset. The model has only 25.14 million parameters, a computational load of 3.762 GFLOPs, and an inference time of 7.56 ms. Testing on a self-built in-field complex scene dataset comprising 10,441 images revealed that ConvTransNet-S achieved an accuracy of 88.53%, which represents improvements of 14.22%, 2.75%, and 0.34% over EfficientNetV2, Vision Transformer, and Swin Transformer, respectively.
Furthermore, the ConvTransNet-S model achieved up to 14.22% higher disease recognition accuracy under complex background conditions while reducing the parameter count by 46.8%. This confirms that its unique multi-scale feature mechanism can effectively distinguish disease features from background features, providing a novel technical approach for disease diagnosis in complex agricultural scenarios and demonstrating significant application value for intelligent agricultural management.
(This article belongs to the Section Plant Modeling)

35 pages, 58241 KiB  
Article
DGMNet: Hyperspectral Unmixing Dual-Branch Network Integrating Adaptive Hop-Aware GCN and Neighborhood Offset Mamba
by Kewen Qu, Huiyang Wang, Mingming Ding, Xiaojuan Luo and Wenxing Bao
Remote Sens. 2025, 17(14), 2517; https://doi.org/10.3390/rs17142517 - 19 Jul 2025
Viewed by 218
Abstract
Hyperspectral sparse unmixing (SU) networks have recently received considerable attention because they model hyperspectral images (HSIs) with a priori spectral libraries and capture nonlinear features through deep networks. This approach avoids errors associated with endmember extraction and enhances unmixing performance via nonlinear modeling. However, two major challenges remain: the use of large spectral libraries with high coherence leads to computational redundancy and performance degradation; moreover, certain feature extraction models, such as the Transformer, exhibit strong representational capabilities but suffer from high computational complexity. To address these limitations, this paper proposes DGMNet, a hyperspectral unmixing dual-branch network integrating an adaptive hop-aware GCN and neighborhood offset Mamba. Specifically, DGMNet consists of two parallel branches. The first branch employs the adaptive hop-neighborhood-aware GCN (AHNAGC) module to model global spatial features. The second branch uses the neighborhood spatial offset Mamba (NSOM) module to capture fine-grained local spatial structures. The Mamba-enhanced dual-stream feature fusion (MEDFF) module then fuses the global and local spatial features extracted by the two branches and performs spectral feature learning through a spectral attention mechanism. Moreover, DGMNet incorporates a spectral-library-pruning mechanism into the SU network, with a new pruning strategy that accounts for the contribution of small-target endmembers, thereby enabling the dynamic selection of valid endmembers and reducing computational redundancy. Finally, an improved ESS-Loss is proposed, which combines an enhanced total variation (ETV) term with an l1/2 sparsity constraint to refine model performance.
The experimental results on two synthetic and five real datasets demonstrate the effectiveness and superiority of the proposed method compared with state-of-the-art methods. Notably, experiments on the Shahu dataset from the Gaofen-5 satellite further demonstrate DGMNet’s robustness and generalization.
(This article belongs to the Special Issue Artificial Intelligence in Hyperspectral Remote Sensing Data Analysis)
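The abstract names an l1/2 sparsity constraint and an enhanced total variation (ETV) term but does not define the ETV. The minimal sketch below shows the standard l1/2 penalty on an abundance vector and the standard anisotropic TV of an abundance map as a stand-in; the ETV itself, any weighting between the terms, and the function names are not taken from the paper.

```python
import math


def l_half_penalty(abundances):
    """l1/2 sparsity term: sum of |a_i| ** 0.5 over one pixel's abundance vector."""
    return sum(math.sqrt(abs(a)) for a in abundances)


def anisotropic_tv(abundance_map):
    """Standard anisotropic total variation of one 2-D abundance map (list of rows)."""
    rows, cols = len(abundance_map), len(abundance_map[0])
    tv = 0.0
    for r in range(rows):
        for c in range(cols):
            if r + 1 < rows:  # vertical neighbour difference
                tv += abs(abundance_map[r + 1][c] - abundance_map[r][c])
            if c + 1 < cols:  # horizontal neighbour difference
                tv += abs(abundance_map[r][c + 1] - abundance_map[r][c])
    return tv


# Both vectors sum to one, so their l1 norms are equal (1.0), but l1/2 prefers the sparse one.
sparse = [1.0, 0.0, 0.0, 0.0]
mixed = [0.25, 0.25, 0.25, 0.25]
print(l_half_penalty(sparse), l_half_penalty(mixed))  # 1.0 2.0
```

This illustrates why l1/2 is used in sparse unmixing: with nonnegative abundances that sum to one, the l1 norm is always 1 and cannot favor sparse solutions, whereas the l1/2 term does. The TV term, in turn, penalizes abrupt changes between neighbouring pixels' abundances.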

21 pages, 5616 KiB  
Article
Symmetry-Guided Dual-Branch Network with Adaptive Feature Fusion and Edge-Aware Attention for Image Tampering Localization
by Zhenxiang He, Le Li and Hanbin Wang
Symmetry 2025, 17(7), 1150; https://doi.org/10.3390/sym17071150 - 18 Jul 2025
Viewed by 219
Abstract
When faced with diverse types of image tampering and image quality degradation in real-world scenarios, traditional image tampering localization methods often struggle to balance boundary accuracy and robustness. To address these issues, this paper proposes FENet (Fusion-Enhanced Network), a symmetry-guided dual-branch image tampering localization network that integrates adaptive feature fusion and edge attention mechanisms. The method is based on a structurally symmetric dual-branch architecture that extracts RGB semantic features and SRM noise residual information to comprehensively capture the fine-grained differences in tampered regions at the visual and statistical levels. To effectively fuse these different features, this paper designs a self-calibrating fusion (SCF) module, which introduces a content-aware dynamic weighting mechanism to adaptively adjust the importance of each feature branch, thereby enhancing the discriminative power and expressiveness of the fused features. Furthermore, since image tampering often involves abnormal changes in edge structures, we propose an edge-aware coordinate attention mechanism (ECAM). By jointly modeling spatial position information and edge-guided information, ECAM guides the model to focus more precisely on potential tampering boundaries, thereby enhancing its boundary detection and localization capabilities. Experiments on public datasets such as Columbia, CASIA, and NIST16 demonstrate that FENet achieves significantly better results than existing methods. We also analyze the model’s performance under various image quality conditions, such as JPEG compression and Gaussian blur, demonstrating its robustness in real-world scenarios. Experiments in Facebook, Weibo, and WeChat scenarios show that our method achieves average F1 scores that are 2.8%, 3%, and 5.6% higher than those of existing state-of-the-art methods, respectively.
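The SRM noise-residual branch mentioned above relies on fixed high-pass filters that suppress image content and keep local noise statistics. The abstract does not list FENet's filter set; as an assumption, the sketch below uses the widely used 5 × 5 "KV" kernel from the spatial rich model (normalized by 12) and applies it as a plain "valid" correlation in pure Python.

```python
# 5x5 "KV" high-pass kernel from the spatial rich model (SRM); normalised by 12 below.
KV = [
    [-1,  2,  -2,  2, -1],
    [ 2, -6,   8, -6,  2],
    [-2,  8, -12,  8, -2],
    [ 2, -6,   8, -6,  2],
    [-1,  2,  -2,  2, -1],
]


def srm_residual(image):
    """'Valid' 2-D correlation of a grayscale image (list of rows) with KV / 12."""
    size = len(KV)
    height, width = len(image), len(image[0])
    residual = []
    for r in range(height - size + 1):
        row = []
        for c in range(width - size + 1):
            acc = sum(KV[i][j] * image[r + i][c + j] for i in range(size) for j in range(size))
            row.append(acc / 12.0)
        residual.append(row)
    return residual


# Smooth content (here a horizontal intensity ramp) is suppressed entirely...
ramp = [[10 * c for c in range(7)] for _ in range(7)]
# ...while an isolated spike, standing in for a local statistical inconsistency, survives.
spike = [[0] * 5 for _ in range(5)]
spike[2][2] = 12
```

Every row and column of the kernel sums to zero, so flat regions and linear gradients produce no response; only local deviations, which splicing and retouching often introduce, pass through to the noise branch.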

28 pages, 5450 KiB  
Article
DFAST: A Differential-Frequency Attention-Based Band Selection Transformer for Hyperspectral Image Classification
by Deren Fu, Yiliang Zeng and Jiahong Zhao
Remote Sens. 2025, 17(14), 2488; https://doi.org/10.3390/rs17142488 - 17 Jul 2025
Viewed by 157
Abstract
Hyperspectral image (HSI) classification faces challenges such as high dimensionality, spectral redundancy, and difficulty in modeling the coupling between spectral and spatial features. Existing methods fail to fully exploit first-order derivatives and frequency-domain information, which limits classification performance. To address these issues, this paper proposes a Differential-Frequency Attention-based Band Selection Transformer (DFAST) for HSI classification. Specifically, a Differential-Frequency Attention-based Band Selection Embedding Module (DFASEmbeddings) is designed to extract original spectral, first-order derivative, and frequency-domain features via a multi-branch structure. Learnable band selection attention weights are introduced to adaptively select important bands, capture critical spectral information, and significantly reduce redundancy. A 3D convolution and a spectral–spatial attention mechanism are applied to perform fine-grained modeling of spectral and spatial features, further enhancing the capture of global dependencies in spectral–spatial features. The embedded features are then input into a cascaded Transformer encoder (SCEncoder) for deep modeling of spectral–spatial coupling characteristics to achieve classification. Additionally, the learned band selection attention weights are output for dimensionality reduction. Experiments on several public hyperspectral datasets demonstrate that the proposed method outperforms existing CNN- and Transformer-based approaches in classification performance.
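The three inputs to the multi-branch embedding (original spectrum, first-order derivative, frequency-domain features) can be illustrated on a single pixel's spectrum. The sketch below is a hedged approximation, not DFASEmbeddings: it assumes a forward difference for the derivative and the magnitude of a plain discrete Fourier transform along the band axis, and the function names are hypothetical.

```python
import cmath


def first_derivative(spectrum):
    """Forward difference between adjacent bands (length N - 1)."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def dft_magnitude(spectrum):
    """Magnitude of the discrete Fourier transform along the band axis (O(N^2) DFT)."""
    n = len(spectrum)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(spectrum)))
        for k in range(n)
    ]


def branch_features(spectrum):
    """Hypothetical three-branch input: raw, derivative, and frequency-domain features."""
    return spectrum, first_derivative(spectrum), dft_magnitude(spectrum)


raw, deriv, freq = branch_features([0.10, 0.12, 0.18, 0.30, 0.31, 0.29, 0.27, 0.26])
```

The derivative branch highlights absorption edges (where reflectance changes fastest), while the frequency branch separates the smooth spectral envelope (low-frequency terms) from rapid band-to-band oscillation (high-frequency terms). A practical implementation would use an FFT rather than this quadratic-time loop.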

29 pages, 5825 KiB  
Article
BBSNet: An Intelligent Grading Method for Pork Freshness Based on Few-Shot Learning
by Chao Liu, Jiayu Zhang, Kunjie Chen and Jichao Huang
Foods 2025, 14(14), 2480; https://doi.org/10.3390/foods14142480 - 15 Jul 2025
Viewed by 275
Abstract
Deep learning approaches for pork freshness grading typically require large datasets, which limits their practical application due to the high costs associated with data collection. To address this challenge, we propose BBSNet, a lightweight few-shot learning model designed for accurate freshness classification with a limited number of images. BBSNet incorporates a batch channel normalization (BCN) layer to enhance feature distinguishability and employs BiFormer for optimized fine-grained feature extraction. Trained on a dataset of 600 pork images graded by microbial cell concentration, BBSNet achieved an average accuracy of 96.36% in a challenging 5-way 80-shot task. This approach significantly reduces data dependency while maintaining high accuracy, presenting a viable solution for cost-effective real-time pork quality monitoring. This work introduces a novel framework that connects laboratory freshness indicators to industrial applications in data-scarce conditions. Future research will investigate its extension to various food types and optimization for deployment on portable devices.
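In few-shot evaluation, a "5-way 80-shot" task presents 5 classes with 80 labelled support images each, and the model classifies separate query images from those same classes. The generic episode sampler below illustrates that setup. The five grade names, the 120-images-per-class split, and the 20 query images per class are hypothetical; the abstract gives only the total of 600 images.

```python
import random


def sample_episode(images_by_class, n_way, k_shot, n_query, rng=random):
    """Draw one N-way K-shot episode as (support, query) lists of (item, label) pairs."""
    classes = rng.sample(sorted(images_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        # Draw support and query items together so they never overlap.
        picks = rng.sample(images_by_class[cls], k_shot + n_query)
        support += [(item, label) for item in picks[:k_shot]]
        query += [(item, label) for item in picks[k_shot:]]
    return support, query


# Hypothetical split: 5 freshness grades x 120 image ids = 600 images
data = {f"grade_{g}": [f"g{g}_img{i}" for i in range(120)] for g in range(5)}
support, query = sample_episode(data, n_way=5, k_shot=80, n_query=20, rng=random.Random(0))
```

Labels are re-indexed per episode (0 to N−1), which is what lets a few-shot model generalize to class sets it has not been trained on directly.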

28 pages, 7404 KiB  
Article
SR-YOLO: Spatial-to-Depth Enhanced Multi-Scale Attention Network for Small Target Detection in UAV Aerial Imagery
by Shasha Zhao, He Chen, Di Zhang, Yiyao Tao, Xiangnan Feng and Dengyin Zhang
Remote Sens. 2025, 17(14), 2441; https://doi.org/10.3390/rs17142441 - 14 Jul 2025
Viewed by 327
Abstract
Object detection in aerial imagery captured by Unmanned Aerial Vehicles (UAVs) is widely employed across various domains, including engineering construction, traffic regulation, and precision agriculture. However, aerial images are typically characterized by numerous small targets, significant occlusion, and densely clustered targets, rendering traditional detection algorithms largely ineffective for such imagery. This work proposes SR-YOLO, a small target detection algorithm specifically tailored to address these challenges in UAV-captured aerial images. First, the Space-to-Depth layer and Receptive Field Attention Convolution are combined into an SR-Conv module that replaces the Conv module in the original backbone network. This hybrid module extracts more fine-grained information about small target features by converting image spatial information into depth information and by strengthening the network’s attention to targets of different scales. Second, a small target detection layer and a bidirectional feature pyramid network mechanism are introduced to enhance the neck network, thereby strengthening the feature extraction and fusion capabilities for small targets. Finally, the model’s detection performance for small targets is improved by using the Normalized Wasserstein Distance loss function to optimize the Complete Intersection over Union loss function. Empirical results demonstrate that the SR-YOLO algorithm significantly enhances the precision of small target detection in UAV aerial images. Ablation and comparative experiments are conducted on the VisDrone2019 and RSOD datasets. Compared to the baseline algorithm YOLOv8s, SR-YOLO improves mAP@0.5 by 6.3% and 3.5% and mAP@0.5:0.95 by 3.8% and 2.3% on VisDrone2019 and RSOD, respectively. It also achieves superior detection results compared to other mainstream target detection methods.
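A space-to-depth layer downsamples without discarding pixels: each 2 × 2 spatial neighbourhood is moved into the channel dimension, so a C × H × W map becomes 4C × (H/2) × (W/2). The pure-Python sketch below illustrates the rearrangement on nested lists. The sub-grid ordering mirrors the common slicing pattern x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2] and is an assumption; the Receptive Field Attention Convolution that SR-Conv pairs it with is not shown.

```python
def space_to_depth(x, block=2):
    """Rearrange a C x H x W map (nested lists) into (C * block**2) x (H/block) x (W/block)."""
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    if height % block or width % block:
        raise ValueError("H and W must be divisible by block")
    out = []
    # Column offset outermost, then row offset, then channel: for block=2 this gives
    # sub-grids (0,0), (1,0), (0,1), (1,1), each holding all C input channels.
    for col_off in range(block):
        for row_off in range(block):
            for c in range(channels):
                out.append([row[col_off::block] for row in x[c][row_off::block]])
    return out


# One 2x2 channel becomes four 1x1 channels; every input value is kept.
tiny = [[[1, 2], [3, 4]]]
rearranged = space_to_depth(tiny)
```

Compared with a stride-2 convolution, which only samples some positions, this keeps every value of the feature map, which is why it helps preserve the few pixels that make up a small target.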

25 pages, 85368 KiB  
Article
SMA-YOLO: An Improved YOLOv8 Algorithm Based on Parameter-Free Attention Mechanism and Multi-Scale Feature Fusion for Small Object Detection in UAV Images
by Shenming Qu, Chaoxu Dang, Wangyou Chen and Yanhong Liu
Remote Sens. 2025, 17(14), 2421; https://doi.org/10.3390/rs17142421 - 12 Jul 2025
Viewed by 622
Abstract
Complex scenes and densely distributed small objects frequently lead to serious false and missed detections when detecting small objects in unmanned aerial vehicle (UAV) images. To address this, we propose a UAV image small object detection algorithm termed SMA-YOLO. First, a parameter-free simple slicing convolution (SSC) module is integrated into the backbone network to slice and enhance the feature maps so as to effectively retain the features of small objects. Next, to enhance the information exchange between upper and lower layers, we design a multi-cross-scale feature pyramid network (M-FPN). The C2f-Hierarchical-Phantom Convolution (C2f-HPC) module in this network effectively reduces information loss through fine-grained multi-scale feature fusion. Finally, the adaptive spatial feature fusion detection head (ASFFDHead) introduces an additional P2 detection head that increases the resolution of the feature maps to better locate small objects. Moreover, the ASFF mechanism optimizes the detection process by filtering out information conflicts during multi-scale feature fusion, thereby significantly improving small object detection capability. Using YOLOv8n as the baseline, SMA-YOLO is evaluated on the VisDrone2019 dataset, achieving a 7.4% improvement in mAP@0.5 and a 13.3% reduction in model parameters; we also verified its generalization ability on the UAVDT and RSOD datasets, which demonstrates the effectiveness of our approach.
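In the original adaptive spatial feature fusion (ASFF) formulation, features from each pyramid level are resized to a common resolution and combined with per-pixel weights that pass through a softmax, so the weights at every location sum to one and conflicting levels can be suppressed locally. The sketch below shows only that weighting step on single-channel maps. It assumes the resizing has already been done and takes the weight logits as given (in ASFF they are produced by learned 1 × 1 convolutions); it is not SMA-YOLO's ASFFDHead.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a short list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def asff_fuse(features, logits):
    """Fuse L same-size 2-D maps with per-pixel softmax weights.

    features[l][r][c]: level-l feature value at (r, c), already resized to a common grid.
    logits[l][r][c]:   level-l weight logit at (r, c).
    """
    levels = len(features)
    rows, cols = len(features[0]), len(features[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            w = softmax([logits[l][r][c] for l in range(levels)])
            fused[r][c] = sum(w[l] * features[l][r][c] for l in range(levels))
    return fused


# Three pyramid levels, one pixel each: equal logits average them, a dominant logit selects one.
levels = [[[3.0]], [[6.0]], [[9.0]]]
averaged = asff_fuse(levels, [[[0.0]], [[0.0]], [[0.0]]])
selected = asff_fuse(levels, [[[20.0]], [[0.0]], [[0.0]]])
```

Because the weights vary per pixel, a location containing a small object can rely mostly on the high-resolution level while background regions draw on coarser levels.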

25 pages, 4882 KiB  
Article
HSF-YOLO: A Multi-Scale and Gradient-Aware Network for Small Object Detection in Remote Sensing Images
by Fujun Wang and Xing Wang
Sensors 2025, 25(14), 4369; https://doi.org/10.3390/s25144369 - 12 Jul 2025
Abstract
Small object detection (SOD) in remote sensing images (RSIs) is challenging because of scale variation, severe occlusion, and complex backgrounds, which often lead to high miss and false detection rates. To address these issues, this paper proposes a novel detection framework named HSF-YOLO, designed to jointly enhance feature encoding, attention interaction, and localization precision within the YOLOv8 backbone. Specifically, we introduce three tailored modules: Hybrid Atrous Enhanced Convolution (HAEC), a Spatial–Interactive–Shuffle attention module (C2f_SIS), and a Focal Gradient Refinement Loss (FGR-Loss). The HAEC module captures multi-scale semantic and fine-grained local information through parallel atrous and standard convolutions, enhancing small object representation across scales. The C2f_SIS module fuses spatial attention and improved channel attention with a channel shuffle strategy to strengthen feature interaction and suppress background noise. The FGR-Loss incorporates gradient-aware localization, focal weighting, and separation-aware constraints to improve regression accuracy and training robustness. Extensive experiments were conducted on three public remote sensing datasets. Compared with the baseline YOLOv8, HSF-YOLO improved mAP@0.5 and mAP@0.5:0.95 by 5.7% and 4.0% on the VisDrone2019 dataset, by 2.3% and 2.5% on the DIOR dataset, and by 2.3% and 2.1% on the NWPU VHR-10 dataset, respectively. These results indicate that HSF-YOLO is a unified and effective solution for small object detection in complex RSI scenarios, offering a good balance between accuracy and efficiency.
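The abstract names a "channel shuffle strategy" inside C2f_SIS without specifying it. Assuming it follows the standard ShuffleNet-style operation (reshape C channels into `(groups, C // groups)`, transpose, and flatten), a minimal sketch on a flat list of channel indices is shown below; `channel_shuffle` is a hypothetical name, and a real network would apply the same permutation to tensor channels.

```python
def channel_shuffle(channels, groups):
    """ShuffleNet-style channel shuffle on a flat list of channels.

    Equivalent to reshaping C channels to (groups, C // groups),
    transposing to (C // groups, groups), and flattening, so that
    channels from different groups become interleaved.
    """
    c = len(channels)
    if groups <= 0 or c % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = c // groups
    # Output position k takes the (k // groups)-th channel of group (k % groups).
    return [channels[(k % groups) * per_group + k // groups] for k in range(c)]


print(channel_shuffle(list(range(6)), 2))  # [0, 3, 1, 4, 2, 5]
```

Interleaving channels this way lets a following grouped operation see information from every input group, which is the usual motivation for shuffling between attention or grouped-convolution stages.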
(This article belongs to the Special Issue Application of Satellite Remote Sensing in Geospatial Monitoring)

23 pages, 10392 KiB  
Article
Dual-Branch Luminance–Chrominance Attention Network for Hydraulic Concrete Image Enhancement
by Zhangjun Peng, Li Li, Chuanhao Chang, Rong Tang, Guoqiang Zheng, Mingfei Wan, Juanping Jiang, Shuai Zhou, Zhenggang Tian and Zhigui Liu
Appl. Sci. 2025, 15(14), 7762; https://doi.org/10.3390/app15147762 - 10 Jul 2025
Abstract
Hydraulic concrete is a critical infrastructure material, and its surface condition plays a vital role in quality assessments for water conservancy and hydropower projects. However, images taken in complex hydraulic environments often suffer from degraded quality due to low lighting, shadows, and noise, making it difficult to distinguish defects from the background and thereby hindering accurate defect detection and damage evaluation. In this study, following a systematic analysis of the color space characteristics of hydraulic concrete, we propose a Dual-Branch Luminance–Chrominance Attention Network (DBLCANet-HCIE) specifically designed for low-light hydraulic concrete image enhancement. Inspired by human visual perception, the network simultaneously improves global contrast and preserves fine-grained defect textures, which are essential for structural analysis. The proposed architecture consists of a Luminance Adjustment Branch (LAB) and a Chroma Restoration Branch (CRB). The LAB incorporates a Luminance-Aware Hybrid Attention Block (LAHAB) to capture both the global luminance distribution and local texture details, enabling adaptive illumination correction through comprehensive scene understanding. The CRB integrates a Channel Denoiser Block (CDB) for channel-specific noise suppression and a Frequency-Domain Detail Enhancement Block (FDDEB) to refine chrominance information and enhance subtle defect textures. A feature fusion block then fuses the features output by the two branches, producing images with enhanced luminance, reduced noise, and preserved surface anomalies. To validate the proposed approach, we construct a dedicated low-light hydraulic concrete image dataset (LLHCID). Extensive experiments on both the LOLv1 and LLHCID benchmarks demonstrate that the proposed method significantly enhances the visual interpretability of hydraulic concrete surfaces while effectively addressing low-light degradation.
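The abstract does not state which color space separates the luminance and chrominance branches. As an illustrative assumption only, the sketch below uses the full-range BT.601 (JPEG/JFIF) RGB to YCbCr conversion to show how a pixel splits into one luminance value (the kind of input a luminance branch would adjust) and two chrominance values (the kind a chroma branch would restore). `rgb_to_ycbcr` and `split_luma_chroma` are hypothetical helper names.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 (JPEG/JFIF) RGB -> YCbCr for 8-bit values.

    Y carries luminance; Cb and Cr carry chrominance offset around 128.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def split_luma_chroma(pixels):
    """Split a list of (R, G, B) pixels into a luminance list and a (Cb, Cr) list."""
    luma, chroma = [], []
    for r, g, b in pixels:
        y, cb, cr = rgb_to_ycbcr(r, g, b)
        luma.append(y)
        chroma.append((cb, cr))
    return luma, chroma


# A dark and a bright gray pixel differ only in luminance.
luma, chroma = split_luma_chroma([(40, 40, 40), (200, 200, 200)])
```

Because the Cb and Cr coefficients each sum to zero, any neutral gray maps to Cb = Cr = 128, so brightening a low-light gray region changes only Y. This separation is what makes it plausible to correct illumination and denoise color in different branches.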