Search Results (507)

Search Parameters:
Keywords = feature pyramid networks (FPN)

20 pages, 11548 KB  
Article
Frequency-Aware Feature Pyramid Framework for Contextual Representation in Remote Sensing Object Detection
by Lingyun Gu, Qingyun Fang, Eugene Popov, Vitalii Pavlov, Sergey Volvenko, Sergey Makarov and Ge Dong
Astronautics 2026, 1(1), 5; https://doi.org/10.3390/astronautics1010005 - 17 Jan 2026
Abstract
Remote sensing object detection is a critical task in Earth observation. Despite the remarkable progress made in general object detection, existing detectors struggle with remote sensing scenarios due to the prevalence of numerous small objects with limited discriminative cues. Cutting-edge studies have shown that incorporating contextual information effectively enhances the detection performance for small objects. Meanwhile, recent research has revealed that convolution in the frequency domain is capable of capturing long-range spatial dependencies with high efficiency. Inspired by this, we propose a Frequency-aware Feature Pyramid Framework (FFPF) for remote sensing object detection, which consists of a novel Frequency-aware ResNet (F-ResNet) and a Bilateral Spectral-aware Feature Pyramid Network (BS-FPN). Specifically, the F-ResNet extracts spectral context information by plugging frequency-domain convolution into each stage of the backbone, thereby enriching the features of small objects. In addition, the BS-FPN employs a bilateral sampling strategy and skip connections to model the association of object features at different scales, enabling the contextual information extracted by the F-ResNet to be fully leveraged. Extensive experiments are conducted on a public remote sensing image dataset and a natural image dataset. The experimental results demonstrate the excellent performance of the FFPF, achieving 73.8% mAP on the DIOR dataset without using any additional training tricks. Full article
(This article belongs to the Special Issue Feature Papers on Spacecraft Dynamics and Control)
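As an illustrative aside: the efficiency claim behind frequency-domain convolution rests on the convolution theorem, where pointwise multiplication of spectra equals circular convolution, so a single spectral product mixes every spatial position at once. The 1-D NumPy toy below (a sketch, not the paper's F-ResNet; `spectral_conv` and `circular_conv` are hypothetical names) checks that identity:

```python
import numpy as np

def spectral_conv(x, w):
    # Pointwise multiplication in the frequency domain: every output
    # element mixes ALL input positions, i.e. a global receptive field
    # obtained in O(N log N) instead of O(N^2).
    return np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(w), n=len(x))

def circular_conv(x, w):
    # Direct O(N^2) reference: y[i] = sum_j x[j] * w[(i - j) mod N]
    n = len(x)
    return np.array([sum(x[j] * w[(i - j) % n] for j in range(n))
                     for i in range(n)])

rng = np.random.default_rng(0)
x, w = rng.standard_normal(16), rng.standard_normal(16)
assert np.allclose(spectral_conv(x, w), circular_conv(x, w))
```

The same identity extends to 2-D feature maps via `np.fft.rfft2`, which is what makes spectral layers attractive for modeling long-range context in a backbone.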

25 pages, 65227 KB  
Article
SAANet: Detecting Dense and Crossed Stripe-like Space Objects Under Complex Stray Light Interference
by Yuyuan Liu, Hongfeng Long, Xinghui Sun, Yihui Zhao, Zhuo Chen, Yuebo Ma and Rujin Zhao
Remote Sens. 2026, 18(2), 299; https://doi.org/10.3390/rs18020299 - 16 Jan 2026
Abstract
With the deployment of mega-constellations, the proliferation of on-orbit Resident Space Objects (RSOs) poses a severe challenge to Space Situational Awareness (SSA). RSOs produce elongated and stripe-like signatures in long-exposure imagery as a result of their relative orbital motion. The accurate detection of these signatures is essential for critical applications like satellite navigation and space debris monitoring. However, on-orbit detection faces two challenges: the obscuration of dim RSOs by complex stray light interference, and their dense overlapping trajectories. To address these challenges, we propose the Shape-Aware Attention Network (SAANet), establishing a unified Shape-Aware Paradigm. The network features a streamlined Shape-Aware Feature Pyramid Network (SA-FPN) with structurally integrated Two-way Orthogonal Attention (TTOA) to explicitly model linear topologies, preserving dim signals under intense stray light conditions. Concurrently, we propose an Adaptive Linear Oriented Bounding Box (AL-OBB) detection head that leverages a Joint Geometric Constraint Mechanism to resolve the ambiguity of regressing targets amid dense, overlapping trajectories. Experiments on the AstroStripeSet and StarTrails datasets demonstrate that SAANet achieves state-of-the-art (SOTA) performance, achieving Recalls of 0.930 and 0.850, and Average Precisions (APs) of 0.864 and 0.815, respectively. Full article

22 pages, 15950 KB  
Article
An Automatic Identification Method for Large-Scale Landslide Hazard Potential Integrating InSAR and CRF-Faster RCNN: A Case Study of Ahai Reservoir Area in Jinsha River Basin
by Yujuan Dong, Yongfa Li, Xiaoqing Zuo, Na Liu, Xiaona Gu, Haoyi Shi, Rukun Jiang, Fangzhen Guo, Zhengxiong Gu and Yongzhi Chen
Remote Sens. 2026, 18(2), 283; https://doi.org/10.3390/rs18020283 - 15 Jan 2026
Abstract
Currently, the manual delineation of landslide anomalies from Interferometric Synthetic Aperture Radar (InSAR) deformation data is labor-intensive and time-consuming, creating a major bottleneck for operational large-scale landslide mapping. This study proposes an automated approach for large-scale landslide identification by integrating InSAR technology with an improved Faster Regional Convolutional Neural Network (Faster R-CNN). First, surface deformation over the study area was obtained using the Small Baseline Subset Interferometric Synthetic Aperture Radar (SBAS-InSAR) technique. An enhanced CRF-Faster R-CNN model was then developed by incorporating a 50-layer Residual Network (ResNet-50) backbone, strengthened with a Convolutional Block Attention Module (CBAM), within a Feature Pyramid Network (FPN) framework. This model was applied to deformation velocity maps for the automated detection of landslide-prone areas. Preliminary results were subsequently validated and refined using optical images to produce a final landslide inventory. The proposed method was evaluated in the Ahai Reservoir area of the Jinsha River Basin using 248 ascending and descending Sentinel-1A images acquired between January 2019 and December 2021. Its performance was compared with that of the standard Faster R-CNN model. The results indicate that the CRF-Faster R-CNN model outperforms the conventional approach in terms of landslide anomaly detection, convergence speed, and overall accuracy. A total of 38 potential landslide hazards were identified in the Ahai Reservoir area, with an 84% validation accuracy confirmed through field investigations. This study provides crucial technical support for the rapid identification and operational application of large-scale potential landslide hazards. Full article

23 pages, 1308 KB  
Article
MFA-Net: Multiscale Feature Attention Network for Medical Image Segmentation
by Jia Zhao, Han Tao, Song Liu, Meilin Li and Huilong Jin
Electronics 2026, 15(2), 330; https://doi.org/10.3390/electronics15020330 - 12 Jan 2026
Abstract
Medical image segmentation acts as a foundational element of medical image analysis. Yet its accuracy is frequently limited by the scale fluctuations of anatomical targets and the intricate contextual traits inherent in medical images—including vaguely defined structural boundaries and irregular shape distributions. To tackle these constraints, we design a multi-scale feature attention network (MFA-Net), customized specifically for thyroid nodule, skin lesion, and breast lesion segmentation tasks. This network framework integrates three core components: a Bidirectional Feature Pyramid Network (Bi-FPN), a Slim-neck structure, and the Convolutional Block Attention Module (CBAM). CBAM steers the model to prioritize boundary regions while filtering out irrelevant information, which in turn enhances segmentation precision. Bi-FPN facilitates more robust fusion of multi-scale features via iterative integration of top-down and bottom-up feature maps, supported by lateral and vertical connection pathways. The Slim-neck design is constructed to simplify the network’s architecture while effectively merging multi-scale representations of both target and background areas, thus enhancing the model’s overall performance. Validation across four public datasets covering thyroid ultrasound (TNUI-2021, TN-SCUI 2020), dermoscopy (ISIC 2016), and breast ultrasound (BUSI) shows that our method outperforms state-of-the-art segmentation approaches, achieving Dice similarity coefficients of 0.955, 0.971, 0.976, and 0.846, respectively. Additionally, the model maintains a compact parameter count of just 3.05 million and delivers an inference latency of just 1.9 milliseconds—metrics that significantly outperform those of current leading segmentation techniques. In summary, the proposed framework demonstrates strong performance in thyroid, skin, and breast lesion segmentation, delivering an optimal trade-off between high accuracy and computational efficiency. Full article
(This article belongs to the Special Issue Deep Learning for Computer Vision Application: Second Edition)
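Several entries in these results build on BiFPN-style fusion. The characteristic ingredient is "fast normalized fusion": each incoming (already-resized) feature map gets a learnable non-negative scalar, and the scalars are normalized by their sum rather than a softmax. A minimal NumPy sketch (illustrative only; `fast_normalized_fusion` is a hypothetical name, and the learnable weights are passed in as plain floats):

```python
import numpy as np

def fast_normalized_fusion(feats, raw_w, eps=1e-4):
    # BiFPN-style weighted fusion: clip the learnable scalars to be
    # non-negative (ReLU), then normalize by their sum so the result is
    # a convex-like combination of the input feature maps.
    w = np.maximum(raw_w, 0.0)     # keep weights >= 0
    w = w / (w.sum() + eps)        # normalize without a softmax
    return sum(wi * f for wi, f in zip(w, feats))
```

With equal weights this reduces to (almost) a plain average; a strongly negative weight is clipped to zero, letting the network learn to ignore an unhelpful scale.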

23 pages, 41532 KB  
Article
CW-DETR: An Efficient Detection Transformer for Traffic Signs in Complex Weather
by Tianpeng Wang, Qiaoshuang Teng, Shangyu Sun, Weidong Song, Jinhe Zhang and Yuxuan Li
Sensors 2026, 26(1), 325; https://doi.org/10.3390/s26010325 - 4 Jan 2026
Abstract
Traffic sign detection under adverse weather conditions remains challenging due to severe feature degradation caused by rain, fog, and snow, which significantly impairs the performance of existing detection systems. This study presents the CW-DETR (Complex Weather Detection Transformer), an end-to-end detection framework designed to address weather-induced feature deterioration in real-time applications. Building upon the RT-DETR, our approach integrates four key innovations: a multipath feature enhancement network (FPFENet) for preserving fine-grained textures, a Multiscale Edge Enhancement Module (MEEM) for combating boundary degradation, an adaptive dual-stream bidirectional feature pyramid network (ADBF-FPN) for cross-scale feature compensation, and a multiscale convolutional gating module (MCGM) for suppressing semantic–spatial confusion. Extensive experiments on the CCTSDB2021 dataset demonstrate that the CW-DETR achieves 69.0% AP and 94.4% AP50, outperforming state-of-the-art real-time detectors by 2.3–5.7 percentage points while maintaining computational efficiency (56.8 GFLOPs). A cross-dataset evaluation on TT100K, the TSRD, CNTSSS, and real-world snow conditions (LNTU-TSD) confirms the robust generalization capabilities of the proposed model. These results establish CW-DETR as an effective solution for all-weather traffic sign detection in intelligent transportation systems. Full article
(This article belongs to the Section Remote Sensors)

29 pages, 15342 KB  
Article
GS-BiFPN-YOLO: A Lightweight and Efficient Method for Segmenting Cotton Leaves in the Field
by Weiqing Wu and Liping Chen
Agriculture 2026, 16(1), 102; https://doi.org/10.3390/agriculture16010102 - 31 Dec 2025
Abstract
Instance segmentation of cotton leaves in complex field environments presents challenges including low accuracy, high computational complexity, and costly data annotation. This paper presents GS-BiFPN-YOLO, a lightweight instance segmentation method that integrates SAM for semi-automatic labeling and enhances YOLOv11n-seg with GSConv, BiFPN, and CBAMs to reduce annotation cost and improve accuracy. To streamline parameters, the YOLOv11-seg architecture incorporates the lightweight GSConv module, utilizing group convolution and channel shuffle. Integration of a Bidirectional Feature Pyramid Network (BiFPN) enhances multi-scale feature fusion, while a Convolutional Block Attention Module (CBAM) boosts discriminative focus on leaf regions through dual-channel and spatial attention mechanisms. Experimental results on a self-built cotton leaf dataset reveal that GS-BiFPN-YOLO achieves a bounding box and mask mAP@0.5 of 0.988 and a recall of 0.972, maintaining a computational cost of 9.0 GFLOPs and achieving an inference speed of 322 FPS. In comparison to other lightweight models (YOLOv8n-seg to YOLOv12n-seg), the proposed approach achieves superior segmentation accuracy while preserving high real-time performance. This research offers a practical solution for precise and efficient cotton leaf instance segmentation, thereby facilitating the advancement of intelligent monitoring systems for cotton production. Full article
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
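CBAM recurs across many of these abstracts (sequential channel attention, then spatial attention). A heavily simplified, parameter-free NumPy sketch of that two-step gating — the real module adds a shared MLP for the channel branch and a 7×7 convolution for the spatial branch, both omitted here, and `cbam_like` is a hypothetical name:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cbam_like(x):
    # x: (C, H, W). Step 1, channel attention: a per-channel gate built
    # from global average and max pooling. Step 2, spatial attention: a
    # per-location gate built from channel-wise average and max maps.
    c_att = sigmoid(x.mean(axis=(1, 2)) + x.max(axis=(1, 2)))   # (C,)
    x = x * c_att[:, None, None]
    s_att = sigmoid(x.mean(axis=0) + x.max(axis=0))             # (H, W)
    return x * s_att[None, :, :]
```

Both gates lie in (0, 1), so the module can only rescale (never amplify) features — it reweights where and what the detector attends to, which is exactly the role these papers assign it.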

15 pages, 3148 KB  
Article
A Cross-Scale Feature Fusion Method for Effectively Enhancing Small Object Detection Performance
by Yaoxing Kang, Yunzuo Zhang, Yaheng Ren and Yu Cheng
Information 2026, 17(1), 25; https://doi.org/10.3390/info17010025 - 31 Dec 2025
Abstract
Deep learning-based industrial product surface defect detection methods are replacing manual inspection, while the issue of small object detection remains a key challenge in the current field of surface defect detection. The feature pyramid structures demonstrate great potential in improving the performance of small object detection and are one of the important current research directions. Nevertheless, traditional feature pyramid networks still suffer from problems such as imprecise focus on key features, insufficient feature discrimination capabilities, and weak correlations between features. To address these issues, this paper proposes a plug-and-play guided focus feature pyramid network, named GF-FPN. Built on the foundation of FPN, this network is designed with a bottom-up guided aggregation network (GFN): through a lightweight pyramidal attention module (LPAM), star operation, and residual connections, it establishes correlations between objects and local contextual information, as well as between shallow-level details and deep-level semantic features. This enables the feature pyramid network to focus on key features, enhance the ability to distinguish between objects and backgrounds, and thereby improve the model’s small object detection performance. Experimental results on the self-built TinyIndus dataset and NEU-DET demonstrate that the detection model based on GF-FPN exhibits more competitive advantages in object detection compared to existing models. Full article
(This article belongs to the Special Issue Machine Learning in Image Processing and Computer Vision)
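Since GF-FPN, like most entries here, is "built on the foundation of FPN", it may help to recall the baseline it extends: a top-down pathway that upsamples each coarser pyramid level and adds it to the lateral feature below. A minimal NumPy sketch (nearest-neighbor upsampling, 1×1 lateral convolutions omitted; `fpn_topdown` is a hypothetical name):

```python
import numpy as np

def fpn_topdown(c_feats):
    # c_feats: backbone maps ordered fine -> coarse, each (C, H, W) with
    # H and W halving at each level. The top-down pathway upsamples the
    # coarser merged map 2x and adds it to the lateral feature below,
    # injecting deep semantics into high-resolution levels.
    p = [None] * len(c_feats)
    p[-1] = c_feats[-1]
    for i in range(len(c_feats) - 2, -1, -1):
        up = p[i + 1].repeat(2, axis=1).repeat(2, axis=2)  # 2x nearest
        p[i] = c_feats[i] + up
    return p
```

The outputs keep each level's spatial resolution, which is why small objects are read from the finest level — the property GF-FPN's guided aggregation is designed to strengthen.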

19 pages, 3524 KB  
Article
Research on Underwater Fish Scale Loss Detection Method Based on Improved YOLOv8m and Transfer Learning
by Qiang Wang, Zhengyang Yu, Renxin Liu, Xingpeng Peng, Xiaoling Yang and Xiuwen He
Fishes 2026, 11(1), 21; https://doi.org/10.3390/fishes11010021 - 29 Dec 2025
Abstract
Monitoring fish skin health is essential in aquaculture, where scale loss serves as a critical indicator of fish health and welfare. However, automatic detection of scale loss regions remains challenging due to factors such as uneven underwater illumination, water turbidity, and complex background conditions. To address this issue, we constructed a scale loss dataset comprising approximately 2750 images captured under both clear above-water and complex underwater conditions, featuring over 7200 annotated targets. Various image enhancement techniques were evaluated, and the Clarity method was selected for preprocessing underwater samples to enhance feature representation. Based on the YOLOv8m architecture, we replaced the original FPN + PAN structure with a weighted bidirectional feature pyramid network to improve multi-scale feature fusion. A convolutional block attention module was incorporated into the output layers to highlight scale loss features in both channel and spatial dimensions. Additionally, a two-stage transfer learning strategy was employed, involving pretraining the model on above-water data and subsequently fine-tuning it on a limited set of underwater samples to mitigate the effects of domain shift. Experimental results demonstrate that the proposed method achieves a mAP50 of 96.81%, a 5.98 percentage point improvement over the baseline YOLOv8m, with Precision and Recall increased by 10.14% and 8.70%, respectively. This approach reduces false positives and false negatives, showing excellent detection accuracy and robustness in complex underwater environments, offering a practical and effective approach for early fish disease monitoring in aquaculture. Full article
(This article belongs to the Special Issue Application of Artificial Intelligence in Aquaculture)
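The mAP50 (mAP@0.5) figures quoted throughout these results all rest on one primitive: a prediction counts as a true positive only if its intersection-over-union with a ground-truth box reaches 0.5. A short self-contained sketch of axis-aligned IoU (illustrative; `iou` is a hypothetical name):

```python
def iou(a, b):
    # Boxes as (x1, y1, x2, y2). At the mAP@0.5 threshold, a detection
    # matches a ground-truth box when iou(pred, gt) >= 0.5.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```

For example, two unit-offset 2×2 boxes overlap in a 1×1 square, giving IoU = 1/7 — well below the 0.5 match threshold.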

26 pages, 10427 KB  
Article
Accurate and Efficient Recognition of Mixed Diseases in Apple Leaves Using a Multi-Task Learning Approach
by Peng Luan, Nawei Guo, Libo Li, Bo Li, Zhanmin Zhao, Li Ma and Bo Liu
Agriculture 2026, 16(1), 71; https://doi.org/10.3390/agriculture16010071 - 28 Dec 2025
Abstract
The increasing complexity of plant disease manifestations, especially in cases of multiple simultaneous infections, poses significant challenges to sustainable agriculture. To address this issue, we introduce the Apple Leaf Mixed Disease Recognition (ALMDR) model, a novel multi-task learning approach specifically designed for identifying and quantifying mixed disease infections in apple leaves. ALMDR comprises four key modules: a Group Feature Pyramid Network (GFPN) for multi-scale feature extraction, a Multi-Label Classification Head (MLCH) for disease type prediction, a Leaf Segmentation Head (LSH), and a Lesion Segmentation Head (LeSH) for precise delineation of leaf and lesion areas. The GFPN enhances the traditional Feature Pyramid Network (FPN) through differential sampling and grouping strategies, significantly improving the capture of fine-grained disease characteristics. The MLCH enables simultaneous classification of multiple diseases on a single leaf, effectively addressing the mixed infection problem. The segmentation heads (LSH and LeSH) work in tandem to accurately isolate leaf and lesion regions, facilitating detailed analysis of disease patterns. Experimental results on the Plant Pathology 2021-FGVC8 dataset demonstrate ALMDR’s effectiveness, outperforming state-of-the-art methods across multiple tasks. Our model achieves high performance in multi-label classification (F1-score of 93.74%), detection and segmentation (mean Average Precision (mAP) of 51.32% and 45.50%, respectively), and disease severity estimation (R2 = 0.9757). Additionally, the model maintains this accuracy while processing 6.25 frames per second, balancing performance with computational efficiency. ALMDR demonstrates potential for real-time disease management in apple orchards, with possible applications extending to other crops. Full article
(This article belongs to the Section Crop Protection, Diseases, Pests and Weeds)

19 pages, 3910 KB  
Article
Defect Detection Algorithm of Galvanized Sheet Based on S-C-B-YOLO
by Yicheng Liu, Gaoxia Fan, Hanquan Zhang and Dong Xiao
Mathematics 2026, 14(1), 110; https://doi.org/10.3390/math14010110 - 28 Dec 2025
Abstract
Galvanized steel sheets are vital anti-corrosion materials, yet their surface quality is prone to defects that impact performance. Manual inspection is inefficient, while conventional machine vision struggles with complex, small-scale defects in industrial settings. Although deep learning offers promising solutions, standard object detection models such as YOLOv5 (YOLO is short for ‘You Only Look Once’) exhibit limitations in handling the subtle textures, scale variations, and reflective surfaces characteristic of galvanized sheet defects. To address these challenges, this paper proposes S-C-B-YOLO, an enhanced detection model based on YOLOv5. First, a Squeeze-and-Excitation (SE) attention mechanism is integrated into the deep layers of the backbone network to adaptively recalibrate channel-wise features, improving focus on defect-relevant information. Second, a Transformer block is combined with a C3 module to form a C3TR module, enhancing the model’s ability to capture global contextual relationships for irregular defects. Finally, the original path aggregation network (PANet) is replaced with a bidirectional feature pyramid network (Bi-FPN) to facilitate more efficient multi-scale feature fusion, significantly boosting sensitivity to small defects. Extensive experiments on a dedicated galvanized sheet defect dataset show that S-C-B-YOLO achieves a mean average precision (mAP@0.5) of 92.6% and an inference speed of 62 FPS, outperforming several baseline models including YOLOv3, YOLOv7, and Faster R-CNN. The proposed model demonstrates a favorable balance between accuracy and speed, offering a robust and practical solution for automated, real-time defect inspection in galvanized steel production. Full article
(This article belongs to the Special Issue Advance in Neural Networks and Visual Learning)

24 pages, 18949 KB  
Article
KGE–SwinFpn: Knowledge Graph Embedding in Swin Feature Pyramid Networks for Accurate Landslide Segmentation in Remote Sensing Images
by Chunju Zhang, Xiangyu Zhao, Peng Ye, Xueying Zhang, Mingguo Wang, Yifan Pei and Chenxi Li
Remote Sens. 2026, 18(1), 71; https://doi.org/10.3390/rs18010071 - 25 Dec 2025
Abstract
Landslide disasters are complex spatiotemporal phenomena. Existing deep learning (DL) models for remote sensing (RS) image analysis primarily exploit shallow visual features, inadequately incorporating critical geological, geographical, and environmental knowledge. This limitation impairs detection accuracy and generalization, especially in complex terrains and diverse vegetation conditions. We propose Knowledge Graph Embedding in Swin Feature Pyramid Networks (KGE–SwinFpn), a novel RS landslide segmentation framework that integrates explicit domain knowledge with deep features. First, a comprehensive landslide knowledge graph is constructed, organizing multi-source factors (e.g., lithology, topography, hydrology, rainfall, land cover, etc.) into entities and relations that characterize controlling, inducing, and indicative patterns. A dedicated KGE Block learns embeddings for these entities and discretized factor levels from the landslide knowledge graph, enabling their fusion with multi-scale RS features in SwinFpn. This approach preserves the efficiency of automatic feature learning while embedding prior knowledge guidance, enhancing data–knowledge–model coupling. Experiments demonstrate significant outperformance over classic segmentation networks: on the Yuan-yang dataset, KGE–SwinFpn achieved 96.85% pixel accuracy (PA), 88.46% mean pixel accuracy (MPA), and 82.01% mean intersection over union (MIoU); on the Bijie dataset, it attained 96.28% PA, 90.72% MPA, and 84.47% MIoU. Ablation studies confirm the complementary roles of different knowledge features and the KGE Block’s contribution to robustness in complex terrains. Notably, the KGE Block is architecture-agnostic, suggesting broad applicability for knowledge-guided RS landslide detection and promising enhanced technical support for disaster monitoring and risk assessment. Full article

26 pages, 8829 KB  
Article
YOLO-MSLT: A Multimodal Fusion Network Based on Spatial Linear Transformer for Cattle and Sheep Detection in Challenging Environments
by Yixing Bai, Yongquan Li, Ruoyu Di, Jingye Liu, Xiaole Wang, Chengkai Li and Pan Gao
Agriculture 2026, 16(1), 35; https://doi.org/10.3390/agriculture16010035 - 23 Dec 2025
Abstract
Accurate detection of cattle and sheep is a core task in precision livestock farming. However, the complexity of agricultural settings, where visible light images perform poorly under low-light or occluded conditions and infrared images are limited in resolution, poses significant challenges for current smart monitoring systems. To tackle these challenges, this study aims to develop a robust multimodal fusion detection network for the accurate and reliable detection of cattle and sheep in complex scenes. To achieve this, we propose YOLO-MSLT, a multimodal fusion detection network based on YOLOv10, which leverages the complementary nature of visible light and infrared data. The core of YOLO-MSLT incorporates a Cross Flatten Fusion Transformer (CFFT), composed of the Linear Cross-modal Spatial Transformer (LCST) and Deep-wise Enhancement (DWE), designed to enhance modality collaboration by performing complementary fusion at the feature level. Furthermore, a Content-Guided Attention Feature Pyramid Network (CGA-FPN) is integrated into the neck to improve the representation of multi-scale object features. Validation was conducted on a cattle and sheep dataset built from 5056 pairs of multimodal images (visible light and infrared) collected in the Manas River Basin, Xinjiang. Results demonstrate that YOLO-MSLT performs robustly in complex terrain, low-light, and occlusion scenarios, achieving an mAP@0.5 of 91.8% and a precision of 93.2%, significantly outperforming mainstream detection models. This research provides an impactful and practical solution for cattle and sheep detection in challenging agricultural environments. Full article
(This article belongs to the Section Farm Animal Production)

24 pages, 1837 KB  
Article
SD-GASNet: Efficient Dual-Domain Multi-Scale Fusion Network with Self-Distillation for Surface Defect Detection
by Jiahao Fu, Zili Zhang, Tao Peng, Xinrong Hu and Jun Zhang
Sensors 2026, 26(1), 23; https://doi.org/10.3390/s26010023 - 19 Dec 2025
Abstract
Surface defect detection is vital in industrial quality control. While deep learning has largely automated inspection, accurately locating defects with large-scale variations or those difficult to distinguish from similar backgrounds remains challenging. Furthermore, achieving high-precision and real-time performance under limited computational resources in deployment environments complicates effective solutions. In this work, we propose SD-GASNet, a network based on a self-distillation model compression strategy. To identify subtle defects, we design an Alignment, Enhancement, and Synchronization Feature Pyramid Network (AES-FPN) fusion network incorporating the Frequency Domain Information Gathering-and-Allocation (FIGA) mechanism and the Channel Synchronization (CS) module for industrial images from different sensors. Specifically, FIGA refines features via the Multi-scale Feature Alignment (MFA) module, then the Frequency-Guided Perception Enhancement Module (FGPEM) extracts high- and low-frequency information to enhance spatial representation. The CS module compensates for information loss during feature fusion. Addressing computational constraints, we adopt self-distillation with an Enhanced KL divergence loss function to boost lightweight model performance. Extensive experiments on three public datasets (NEU-DET, PCB, and TILDA) demonstrate that SD-GASNet achieves state-of-the-art performance with excellent generalization, delivering superior accuracy and a competitive inference speed of 180 FPS, offering a robust and generalizable solution for sensor-based industrial imaging applications. Full article
(This article belongs to the Section Fault Diagnosis & Sensors)
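The self-distillation strategy mentioned here typically trains the lightweight branch against temperature-softened outputs of a stronger branch with a KL-divergence loss. A generic NumPy sketch of that loss (standard knowledge-distillation form, not SD-GASNet's enhanced variant; `distill_kl` is a hypothetical name):

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-softened softmax; subtracting the max is for
    # numerical stability and does not change the result.
    z = z / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distill_kl(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) between softened distributions, scaled by
    # T^2 so gradient magnitudes stay comparable across temperatures.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return (T ** 2) * np.sum(p * (np.log(p) - np.log(q)))
```

The loss is zero when student and teacher agree and strictly positive otherwise, so minimizing it pulls the compressed model toward the teacher's softened predictions.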

14 pages, 17578 KB  
Article
A Two-Stage High-Precision Recognition and Localization Framework for Key Components on Industrial PCBs
by Li Wang, Liu Ouyang, Huiying Weng, Xiang Chen, Anna Wang and Kexin Zhang
Mathematics 2026, 14(1), 4; https://doi.org/10.3390/math14010004 - 19 Dec 2025
Viewed by 223
Abstract
Precise recognition and localization of electronic components on printed circuit boards (PCBs) are crucial for industrial automation tasks, including robotic disassembly, high-precision assembly, and quality inspection. However, strong visual interference from silkscreen characters, copper traces, solder pads, and densely packed small components often degrades the accuracy of deep learning-based detectors, particularly under complex industrial imaging conditions. This paper presents a two-stage, coarse-to-fine PCB component localization framework based on an optimized YOLOv11 architecture and a sub-pixel geometric refinement module. The proposed method enhances the backbone with a Convolutional Block Attention Module (CBAM) to suppress background noise and strengthen discriminative features. It also integrates a tiny-object detection branch and a weighted Bi-directional Feature Pyramid Network (BiFPN) for more effective multi-scale feature fusion, and it employs a customized hybrid loss with vertex-offset supervision to enable pose-aware bounding box regression. In the second stage, the coarse predictions guide contour-based sub-pixel fitting using template geometry to achieve industrial-grade precision. Experiments show significant improvements over baseline YOLOv11, particularly for small and densely arranged components, indicating that the proposed approach meets the stringent requirements of industrial robotic disassembly. Full article
(This article belongs to the Special Issue Complex Process Modeling and Control Based on AI Technology)
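The weighted BiFPN fusion mentioned above is not detailed in the abstract; assuming it follows EfficientDet's fast normalized fusion (the usual formulation of a weighted BiFPN), a minimal sketch is:

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """EfficientDet-style fast normalized fusion: each input feature map is
    scaled by a non-negative learnable weight, and the weights are normalized
    by their sum so the result is a convex combination without a softmax."""
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)  # ReLU keeps weights >= 0
    w = w / (w.sum() + eps)
    return sum(wi * f for wi, f in zip(w, features))

# Two same-resolution feature maps; with equal weights the fusion is
# approximately the elementwise mean (up to eps).
a = np.ones((4, 4))
b = 3 * np.ones((4, 4))
fused = fast_normalized_fusion([a, b], [1.0, 1.0])
```

In a full BiFPN, the inputs at each node would first be resampled to a common resolution, and the weights would be trained jointly with the detector.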

18 pages, 3588 KB  
Article
CE-FPN-YOLO: A Contrast-Enhanced Feature Pyramid for Detecting Concealed Small Objects in X-Ray Baggage Images
by Qianxiang Cheng, Zhanchuan Cai, Yi Lin, Jiayao Li and Ting Lan
Mathematics 2025, 13(24), 4012; https://doi.org/10.3390/math13244012 - 16 Dec 2025
Viewed by 903
Abstract
Accurate detection of concealed items in X-ray baggage images is critical for public safety in high-security environments such as airports and railway stations. However, small objects with low material contrast, such as plastic lighters, remain challenging to identify due to background clutter, overlapping contents, and weak edge features. In this paper, we propose a novel architecture called the Contrast-Enhanced Feature Pyramid Network (CE-FPN), designed to be integrated into the YOLO detection framework. CE-FPN introduces a contrast-guided multi-branch fusion module that enhances small-object representations by emphasizing texture boundaries and improving semantic consistency across feature levels. When incorporated into YOLO, the proposed CE-FPN significantly boosts detection accuracy on the HiXray dataset, achieving up to a +10.1% improvement in mAP@50 for the nonmetallic lighter class and an overall +1.6% gain, while maintaining low computational overhead. In addition, the model attains a mAP@50 of 84.0% under low-resolution settings and 87.1% under high-resolution settings, further demonstrating its robustness across different input qualities. These results confirm that CE-FPN effectively enhances YOLO's capability in detecting small and concealed objects, making it a promising solution for real-world security inspection applications. Full article
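The internals of CE-FPN's contrast-guided module are not given in the abstract. As a toy illustration only (the hypothetical `local_contrast_gate` below is not the paper's module), contrast-based gating can boost responses at texture boundaries by comparing each location against its local mean:

```python
import numpy as np

def local_contrast_gate(feat, k=3):
    """Toy contrast gate: subtract a k x k box-filtered local mean from the
    feature map, normalize the absolute residual to [0, 1], and use it as a
    multiplicative attention map that emphasizes texture boundaries."""
    h, w = feat.shape
    pad = k // 2
    padded = np.pad(feat, pad, mode="edge")
    mean = np.zeros_like(feat)
    for i in range(h):
        for j in range(w):
            mean[i, j] = padded[i:i + k, j:j + k].mean()
    contrast = np.abs(feat - mean)
    gate = contrast / (contrast.max() + 1e-12)
    return feat * (1.0 + gate)   # boost high-contrast (edge-like) responses

# A vertical step edge: the gated output amplifies the boundary columns
# while leaving flat regions unchanged.
x = np.zeros((6, 6))
x[:, 3:] = 1.0
y = local_contrast_gate(x)
```

A learned version would replace the box filter with convolutions and apply the gate per channel inside the pyramid, but the boundary-emphasis intuition is the same.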
