Search Results (921)

Search Parameters: Keywords = faster-rcnn

29 pages, 24793 KB  
Article
SAR-ESAE: Echo Signal-Guided Adversarial Example Generation Method for Synthetic Aperture Radar Target Detection
by Jiahao Cui, Jiale Duan, Wang Guo, Chengli Peng and Haifeng Li
Remote Sens. 2025, 17(17), 3080; https://doi.org/10.3390/rs17173080 - 4 Sep 2025
Viewed by 201
Abstract
Synthetic Aperture Radar (SAR) target detection models are highly vulnerable to adversarial attacks, which significantly reduce detection performance and robustness. Existing adversarial SAR target detection approaches mainly focus on the image domain and neglect the critical role of signal propagation, making it difficult to fully capture the connection between the physical space and the image domain. To address this limitation, we propose an Echo Signal-Guided Adversarial Example Generation method for SAR target detection (SAR-ESAE). The core idea is to embed adversarial perturbations into SAR echo signals and propagate them through the imaging and inverse scattering processes, thereby establishing a unified attack framework across the signal, image, and physical spaces. In this way, perturbations not only appear as pixel-level distortions in SAR images but also alter the scattering characteristics of 3D target models in the physical space. Simulation experiments on the Scenario-SAR dataset demonstrate that the SAR-ESAE method reduces the mean Average Precision of the YOLOv3 model by 23.5% and 8.6% compared to Dpatch and RaLP attacks, respectively. Additionally, it achieves strong attack effectiveness in both echo-signal and target-model attack experiments and exhibits evident adversarial transferability across detection models with different architectures, such as Faster-RCNN and FCOS. Full article
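For readers unfamiliar with adversarial example generation against detectors, the sketch below shows the basic pixel-space (FGSM-style) attack that methods such as SAR-ESAE go beyond: perturb the input along the sign of the gradient of the detection loss. It uses torchvision's Faster R-CNN as a stand-in detector; the image, box, and epsilon values are illustrative assumptions, and none of the paper's echo-signal or physical-space machinery is reproduced.

```python
# Minimal pixel-space adversarial perturbation against a detector's training losses.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.train()  # detection models return a loss dict when given targets in train mode

image = torch.rand(3, 512, 512, requires_grad=True)            # surrogate SAR-like image (assumption)
target = [{"boxes": torch.tensor([[100.0, 120.0, 180.0, 200.0]]),  # one dummy ground-truth box
           "labels": torch.tensor([1])}]

losses = model([image], target)        # loss_classifier, loss_box_reg, loss_objectness, loss_rpn_box_reg
sum(losses.values()).backward()

epsilon = 8.0 / 255.0                  # perturbation budget (assumption)
adv_image = (image + epsilon * image.grad.sign()).clamp(0.0, 1.0).detach()
```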

27 pages, 1157 KB  
Article
An Ultra-Lightweight and High-Precision Underwater Object Detection Algorithm for SAS Images
by Deyin Xu, Yisong He, Jiahui Su, Lu Qiu, Lixiong Lin, Jiachun Zheng and Zhiping Xu
Remote Sens. 2025, 17(17), 3027; https://doi.org/10.3390/rs17173027 - 1 Sep 2025
Viewed by 345
Abstract
Underwater Object Detection (UOD) based on Synthetic Aperture Sonar (SAS) images is one of the core tasks of underwater intelligent perception systems. However, the existing UOD methods suffer from excessive model redundancy, high computational demands, and severe image quality degradation due to noise. To mitigate these issues, this paper proposes an ultra-lightweight and high-precision underwater object detection method for SAS images. Based on a single-stage detection framework, four efficient and representative lightweight modules are developed, focusing on three key stages: feature extraction, feature fusion, and feature enhancement. For feature extraction, the Dilated-Attention Aggregation Feature Module (DAAFM) is introduced, which leverages a multi-scale Dilated Attention mechanism for strengthening the model’s capability to perceive key information, thereby improving the expressiveness and spatial coverage of extracted features. For feature fusion, the Channel–Spatial Parallel Attention with Gated Enhancement (CSPA-Gate) module is proposed, which integrates channel–spatial parallel modeling and gated enhancement to achieve effective fusion of multi-level semantic features and dynamic response to salient regions. In terms of feature enhancement, the Spatial Gated Channel Attention Module (SGCAM) is introduced to strengthen the model’s ability to discriminate the importance of feature channels through spatial gating, thereby improving robustness to complex background interference. Furthermore, the Context-Aware Feature Enhancement Module (CAFEM) is designed to guide feature learning using contextual structural information, enhancing semantic consistency and feature stability from a global perspective. To alleviate the challenge of limited sample size of real sonar images, a diffusion generative model is employed to synthesize a set of pseudo-sonar images, which are then combined with the real sonar dataset to construct an augmented training set. A two-stage training strategy is proposed: the model is first trained on the real dataset and then fine-tuned on the synthetic dataset to enhance generalization and improve detection robustness. The SCTD dataset results confirm that the proposed technique achieves better precision than the baseline model with only 10% of its parameter size. Notably, on a hybrid dataset, the proposed method surpasses Faster R-CNN by 10.3% in mAP50 while using only 9% of its parameters. Full article
(This article belongs to the Special Issue Underwater Remote Sensing: Status, New Challenges and Opportunities)

19 pages, 2846 KB  
Article
Cross-Domain Object Detection with Hierarchical Multi-Scale Domain Adaptive YOLO
by Sihan Zhu, Peipei Zhu, Yuan Wu and Wensheng Qiao
Sensors 2025, 25(17), 5363; https://doi.org/10.3390/s25175363 - 29 Aug 2025
Viewed by 365
Abstract
To alleviate the performance degradation caused by domain shift, domain adaptive object detection (DAOD) has achieved compelling success in recent years. DAOD aims to improve the model’s detection performance on the target domain by reducing the distribution discrepancy between different domains. However, most existing methods are built on the two-stage Faster R-CNN, whose limited detection efficiency makes it unsuitable for real-world applications. In this paper, we propose a novel Hierarchical Multi-scale Domain Adaptive (HMDA) method by integrating a simple but effective one-stage YOLO framework. HMDA-YOLO mainly consists of the hierarchical backbone adaptation and the multi-scale head adaptation. The former performs hierarchical adaptation based on the differences in representational information of features at different depths of the backbone network, which promotes comprehensive distribution alignment and suppresses negative transfer. The latter makes full use of the rich discriminative information in the feature maps to be detected for multi-scale adaptation. Additionally, it can reduce local instance divergence and ensure the model’s multi-scale detection capability. In this way, HMDA can improve the model’s generalization ability while ensuring its discriminating capability. We empirically verify the effectiveness of our method on four cross-domain object detection scenarios comprising different domain shifts. Experimental results and analyses demonstrate that HMDA-YOLO can achieve competitive performance with real-time detection efficiency. Full article
(This article belongs to the Special Issue Advanced Signal Processing for Affective Computing)
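As context for the adaptation modules described above, the most common building block for feature-level domain alignment in DAOD is a domain classifier trained through a gradient reversal layer; a minimal sketch follows. This is a generic illustration of the idea, not the paper's hierarchical multi-scale design, and the channel sizes are assumptions.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) gradients in the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class DomainClassifier(nn.Module):
    """Predicts source vs. target from backbone features; reversed gradients push the
    backbone toward domain-invariant features."""
    def __init__(self, in_channels=256, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 256, 1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(256, 1))
    def forward(self, feat):
        return self.net(GradReverse.apply(feat, self.lambd))

# usage: logits = DomainClassifier(256)(feature_map)
#        loss = nn.functional.binary_cross_entropy_with_logits(logits, domain_labels)
```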

16 pages, 2127 KB  
Article
VIPS: Learning-View-Invariant Feature for Person Search
by Hexu Wang, Wenlong Luo, Wei Wu, Fei Xie, Jindong Liu, Jing Li and Shizhou Zhang
Sensors 2025, 25(17), 5362; https://doi.org/10.3390/s25175362 - 29 Aug 2025
Viewed by 301
Abstract
Unmanned aerial vehicles (UAVs) have become indispensable tools for surveillance, enabled by their ability to capture multi-perspective imagery in dynamic environments. Among critical UAV-based tasks, cross-platform person search—detecting and identifying individuals across distributed camera networks—presents unique challenges. Severe viewpoint variations, occlusions, and cluttered backgrounds in UAV-captured data degrade the performance of conventional discriminative models, which struggle to maintain robustness under such geometric and semantic disparities. To address this, we propose view-invariant person search (VIPS), a novel two-stage framework combining Faster R-CNN with a view-invariant re-Identification (VIReID) module. Unlike conventional discriminative models, VIPS leverages the semantic flexibility of large vision–language models (VLMs) and adopts a two-stage training strategy to decouple and align text-based ID descriptors and visual features, enabling robust cross-view matching through shared semantic embeddings. To mitigate noise from occlusions and cluttered UAV-captured backgrounds, we introduce a learnable mask generator for feature purification. Furthermore, drawing from vision–language models, we design view prompts to explicitly encode perspective shifts into feature representations, enhancing adaptability to UAV-induced viewpoint changes. Extensive experiments on benchmark datasets demonstrate state-of-the-art performance, with ablation studies validating the efficacy of each component. Beyond technical advancements, this work highlights the potential of VLM-derived semantic alignment for UAV applications, offering insights for future research in real-time UAV-based surveillance systems. Full article
(This article belongs to the Section Remote Sensors)
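A hedged sketch of the plain detect-then-re-identify pipeline that VIPS builds on: Faster R-CNN proposes person boxes, crops are embedded, and gallery matching uses cosine similarity. The embedding network here is an ordinary ResNet-50 stand-in; the paper's VLM-based view-invariant ReID module, learnable mask generator, and view prompts are not reproduced.

```python
import torch
import torch.nn.functional as F
import torchvision
from torchvision.models.detection import fasterrcnn_resnet50_fpn

detector = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
embedder = torchvision.models.resnet50(weights="DEFAULT")
embedder.fc = torch.nn.Identity()     # 2048-d embeddings as a stand-in ReID head
embedder.eval()

@torch.no_grad()
def person_embeddings(image, score_thresh=0.8):
    """image: float tensor (3, H, W) in [0, 1]; returns kept person boxes and L2-normalized embeddings."""
    det = detector([image])[0]
    keep = (det["labels"] == 1) & (det["scores"] > score_thresh)   # COCO class 1 = person
    crops = []
    for box in det["boxes"][keep].round().long():
        x1, y1, x2, y2 = box.tolist()
        crop = image[:, y1:y2, x1:x2]
        crops.append(F.interpolate(crop[None], size=(256, 128))[0])  # common ReID crop size
    if not crops:
        return det["boxes"][keep], torch.empty(0, 2048)
    emb = embedder(torch.stack(crops))
    return det["boxes"][keep], F.normalize(emb, dim=1)

# matching: similarity = gallery_emb @ query_emb.T  (cosine, since embeddings are normalized)
```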

26 pages, 23082 KB  
Article
SPyramidLightNet: A Lightweight Shared Pyramid Network for Efficient Underwater Debris Detection
by Yi Luo and Osama Eljamal
Appl. Sci. 2025, 15(17), 9404; https://doi.org/10.3390/app15179404 - 27 Aug 2025
Viewed by 353
Abstract
Underwater debris detection plays a crucial role in marine environmental protection. However, existing object detection algorithms generally suffer from excessive model complexity and insufficient detection accuracy, making it difficult to meet the real-time detection requirements in resource-constrained underwater environments. To address this challenge, this paper proposes a novel lightweight object detection network named the Shared Pyramid Lightweight Network (SPyramidLightNet). The network adopts an improved architecture based on YOLOv11 and achieves an optimal balance between detection performance and computational efficiency by integrating three core innovative modules. First, the Split–Merge Attention Block (SMAB) employs a dynamic kernel selection mechanism and split–merge strategy, significantly enhancing feature representation capability through adaptive multi-scale feature fusion. Second, the C3 GroupNorm Detection Head (C3GNHead) introduces a shared convolution mechanism and GroupNorm normalization strategy, substantially reducing the computational complexity of the detection head while maintaining detection accuracy. Finally, the Shared Pyramid Convolution (SPyramidConv) replaces traditional pooling operations with a parameter-sharing multi-dilation-rate convolution architecture, achieving more refined and efficient multi-scale feature aggregation. Extensive experiments on underwater debris datasets demonstrate that SPyramidLightNet achieves 0.416 on the mAP@0.5:0.95 metric, significantly outperforming mainstream algorithms including Faster-RCNN, SSD, RT-DETR, and the YOLO series. Meanwhile, compared to the baseline YOLOv11, the proposed algorithm achieves an 11.8% parameter compression and a 17.5% computational complexity reduction, with an inference speed reaching 384 FPS, meeting the stringent requirements for real-time detection. Ablation experiments and visualization analyses further validate the effectiveness and synergistic effects of each core module. This research provides important theoretical guidance for the design of lightweight object detection algorithms and lays a solid foundation for the development of automated underwater debris recognition and removal technologies. Full article
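As a rough illustration of the parameter-sharing idea behind SPyramidConv, the sketch below reuses one 3×3 weight tensor at several dilation rates and averages the responses, so multi-scale context is gathered without extra parameters. The dilation rates and the averaging step are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedPyramidConv(nn.Module):
    def __init__(self, channels, dilations=(1, 2, 3)):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(channels, channels, 3, 3))
        nn.init.kaiming_normal_(self.weight)
        self.bias = nn.Parameter(torch.zeros(channels))
        self.dilations = dilations

    def forward(self, x):
        # same weights applied at growing dilation rates, then aggregated
        outs = [F.conv2d(x, self.weight, self.bias, padding=d, dilation=d)
                for d in self.dilations]
        return torch.stack(outs, dim=0).mean(dim=0)

y = SharedPyramidConv(64)(torch.rand(1, 64, 32, 32))   # -> (1, 64, 32, 32)
```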

37 pages, 10467 KB  
Article
Cascaded Hierarchical Attention with Adaptive Fusion for Visual Grounding in Remote Sensing
by Huming Zhu, Tianqi Gao, Zhixian Li, Zhipeng Chen, Qiuming Li, Kongmiao Miao, Biao Hou and Licheng Jiao
Remote Sens. 2025, 17(17), 2930; https://doi.org/10.3390/rs17172930 - 23 Aug 2025
Viewed by 486
Abstract
Visual grounding for remote sensing (RSVG) is the task of localizing the referred object in remote sensing (RS) images by parsing free-form language descriptions. However, RSVG faces the challenge of low detection accuracy due to unbalanced multi-scale grounding capabilities, with grounding accuracy markedly higher for large objects than for small ones. Based on Faster R-CNN, we propose Faster R-CNN in Visual Grounding for Remote Sensing (FR-RSVG), a two-stage method for grounding RS objects. Building on this foundation, to enhance the ability to ground multi-scale objects, we propose Faster R-CNN with Adaptive Vision-Language Fusion (FR-AVLF), which introduces a layered Adaptive Vision-Language Fusion (AVLF) module. Specifically, this method can adaptively fuse deep or shallow visual features according to the input text (e.g., location-related or object characteristic descriptions), thereby optimizing semantic feature representation and improving grounding accuracy for objects of different scales. Given that RSVG is essentially an expanded form of RS object detection, and considering the knowledge the model acquired in prior RS object detection tasks, we propose Faster R-CNN with Adaptive Vision-Language Fusion Pretrained (FR-AVLFPRE). To further enhance model performance, we propose Faster R-CNN with Cascaded Hierarchical Attention Grounding and Multi-Level Adaptive Vision-Language Fusion Pretrained (FR-CHAGAVLFPRE), which introduces a cascaded hierarchical attention grounding mechanism, employs a more advanced language encoder, and improves upon AVLF by proposing Multi-Level AVLF, significantly improving localization accuracy in complex scenarios. Extensive experiments on the DIOR-RSVG dataset demonstrate that our model surpasses most existing advanced models. To validate the generalization capability of our model, we conducted zero-shot inference experiments on shared categories between DIOR-RSVG and both Complex Description DIOR-RSVG (DIOR-RSVG-C) and OPT-RSVG datasets, achieving performance superior to most existing models. Full article
(This article belongs to the Section AI Remote Sensing)
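One simple way to realize text-conditioned fusion of shallow and deep visual features, in the spirit of (though not identical to) the AVLF module described above, is a sigmoid gate predicted from the sentence embedding; the dimensions and module name below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AdaptiveVLFusion(nn.Module):
    def __init__(self, vis_channels=256, text_dim=768):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(text_dim, vis_channels), nn.Sigmoid())

    def forward(self, shallow_feat, deep_feat, text_emb):
        # shallow_feat, deep_feat: (B, C, H, W); text_emb: (B, text_dim)
        g = self.gate(text_emb)[:, :, None, None]          # per-channel gate in [0, 1]
        return g * deep_feat + (1.0 - g) * shallow_feat    # text-driven mixture of feature levels

fused = AdaptiveVLFusion()(torch.rand(2, 256, 32, 32),
                           torch.rand(2, 256, 32, 32),
                           torch.rand(2, 768))
```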

20 pages, 1818 KB  
Article
Image Captioning Model Based on Multi-Step Cross-Attention Cross-Modal Alignment and External Commonsense Knowledge Augmentation
by Liang Wang, Meiqing Jiao, Zhihai Li, Mengxue Zhang, Haiyan Wei, Yuru Ma, Honghui An, Jiaqi Lin and Jun Wang
Electronics 2025, 14(16), 3325; https://doi.org/10.3390/electronics14163325 - 21 Aug 2025
Viewed by 612
Abstract
To address the semantic mismatch between limited textual descriptions in image captioning training datasets and the multi-semantic nature of images, as well as the underutilized external commonsense knowledge, this article proposes a novel image captioning model based on multi-step cross-attention cross-modal alignment and external commonsense knowledge enhancement. The model employs a backbone architecture comprising CLIP’s ViT visual encoder, Faster R-CNN, BERT text encoder, and GPT-2 text decoder. It incorporates two core mechanisms: a multi-step cross-attention mechanism that iteratively aligns image and text features across multiple rounds, progressively enhancing inter-modal semantic consistency for more accurate cross-modal representation fusion. Moreover, the model employs Faster R-CNN to extract region-based object features. These features are mapped to corresponding entities within the dataset through entity probability calculation and entity linking. External commonsense knowledge associated with these entities is then retrieved from the ConceptNet knowledge graph, followed by knowledge embedding via TransE and multi-hop reasoning. Finally, the fused multimodal features are fed into the GPT-2 decoder to steer caption generation, enhancing the lexical richness, factual accuracy, and cognitive plausibility of the generated descriptions. In the experiments, the model achieves CIDEr scores of 142.6 on MSCOCO and 78.4 on Flickr30k. Ablations confirm both modules enhance caption quality. Full article
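The TransE embedding step mentioned above scores a knowledge triple (h, r, t) by how closely h + r approximates t and is trained with a margin loss against corrupted triples; a minimal sketch of that scoring and loss:

```python
import torch

def transe_score(h, r, t, p=2):
    """Negative translation distance -||h + r - t||_p for batches of embeddings."""
    return -torch.norm(h + r - t, p=p, dim=-1)

def transe_loss(pos_score, neg_score, margin=1.0):
    """Standard margin ranking loss: positive triples should outscore corrupted ones by `margin`."""
    return torch.clamp(margin - pos_score + neg_score, min=0.0).mean()

# example with random 100-d embeddings
h, r, t = (torch.randn(8, 100) for _ in range(3))
loss = transe_loss(transe_score(h, r, t), transe_score(h, r, torch.randn(8, 100)))
```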

19 pages, 7468 KB  
Article
A Comparative Study of Hybrid Machine-Learning vs. Deep-Learning Approaches for Varroa Mite Detection and Counting
by Amira Ghezal and Andreas König
Sensors 2025, 25(16), 5075; https://doi.org/10.3390/s25165075 - 15 Aug 2025
Viewed by 391
Abstract
This study presents a comparative evaluation of traditional machine-learning (ML) and deep-learning (DL) approaches for detecting and counting Varroa destructor mites in hyperspectral images. As Varroa infestations pose a serious threat to honeybee health, accurate and efficient detection methods are essential. The ML pipeline—based on Principal Component Analysis (PCA), k-Nearest Neighbors (kNN), and Support Vector Machine (SVM)—was previously published and achieved high performance (precision = 0.9983, recall = 0.9947), with training and inference completed in seconds on standard CPU hardware. In contrast, the DL approach, employing Faster R-CNN with ResNet-50 and ResNet-101 backbones, was fine-tuned on the same manually annotated images. Despite requiring GPU acceleration and longer training times and presenting reproducibility challenges, the deep-learning models achieved precisions of 0.966 and 0.971, recalls of 0.757 and 0.829, and F1-scores of 0.848 and 0.894 for ResNet-50 and ResNet-101, respectively. Qualitative results further demonstrate the robustness of the ML method under limited-data conditions. These findings highlight the differences between ML and DL approaches in resource-constrained scenarios and offer practical guidance for selecting suitable detection strategies. Full article
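For reference, fine-tuning torchvision's Faster R-CNN with a ResNet-50 FPN backbone for a two-class task (background plus mite) follows the standard recipe below; the optimizer settings are illustrative and the authors' exact training configuration is not reproduced.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)  # background + Varroa

optimizer = torch.optim.SGD([p for p in model.parameters() if p.requires_grad],
                            lr=0.005, momentum=0.9, weight_decay=5e-4)
model.train()
# for images, targets in data_loader:            # targets: list of dicts with "boxes", "labels"
#     losses = model(images, targets)
#     optimizer.zero_grad(); sum(losses.values()).backward(); optimizer.step()
```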

28 pages, 14601 KB  
Article
Balancing Accuracy and Computational Efficiency: A Faster R-CNN with Foreground-Background Segmentation-Based Spatial Attention Mechanism for Wild Plant Recognition
by Zexuan Cui, Zhibo Chen and Xiaohui Cui
Plants 2025, 14(16), 2533; https://doi.org/10.3390/plants14162533 - 14 Aug 2025
Viewed by 382
Abstract
Computer vision recognition technology, due to its non-invasive and convenient nature, can effectively avoid damage to fragile wild plants during recognition. However, balancing model complexity, recognition accuracy, and data processing difficulty on resource-constrained hardware is a critical issue that needs to be addressed. To tackle these challenges, we propose an improved lightweight Faster R-CNN architecture named ULS-FRCN. This architecture includes three key improvements: a Light Bottleneck module based on depthwise separable convolution to reduce model complexity; a Split SAM lightweight spatial attention mechanism to improve recognition accuracy without increasing model complexity; and unsharp masking preprocessing to enhance model performance while reducing data processing difficulty and training costs. We validated the effectiveness of ULS-FRCN using five representative wild plants from the PlantCLEF 2015 dataset. Ablation experiments and multi-dataset generalization tests show that ULS-FRCN significantly outperforms the baseline model in terms of mAP, mean F1 score, and mean recall, with improvements of 12.77%, 0.01, and 9.07%, respectively. Compared to the original Faster R-CNN, our lightweight design and attention mechanism reduce training parameters, improve inference speed, and enhance computational efficiency. This approach is suitable for deployment on resource-constrained forestry devices, enabling efficient plant identification and management without the need for high-performance servers. Full article
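Of the three improvements, the unsharp masking preprocessing is the most self-contained; a common OpenCV formulation subtracts a Gaussian-blurred copy from the original, as sketched below. The sigma and amount values (and the file name) are illustrative, not the paper's settings.

```python
import cv2

def unsharp_mask(image, sigma=2.0, amount=1.5):
    """Sharpen by boosting the difference between the image and its Gaussian blur."""
    blurred = cv2.GaussianBlur(image, (0, 0), sigma)
    # sharpened = (1 + amount) * image - amount * blurred
    return cv2.addWeighted(image, 1.0 + amount, blurred, -amount, 0)

sharpened = unsharp_mask(cv2.imread("plant.jpg"))   # hypothetical input path
```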

23 pages, 7313 KB  
Article
Marine Debris Detection in Real Time: A Lightweight UTNet Model
by Junqi Cui, Shuyi Zhou, Guangjun Xu, Xiaodong Liu and Xiaoqian Gao
J. Mar. Sci. Eng. 2025, 13(8), 1560; https://doi.org/10.3390/jmse13081560 - 14 Aug 2025
Viewed by 543
Abstract
The increasingly severe issue of marine debris presents a critical threat to the sustainable development of marine ecosystems. Real-time detection is essential for timely intervention and cleanup. Furthermore, the density of marine debris exhibits significant depth-dependent variation, resulting in degraded detection accuracy. Based on 9625 publicly available underwater images spanning various depths, this study proposes UTNet, a lightweight neural model, to improve the effectiveness of real-time intelligent identification of marine debris through multidimensional optimization. Compared to Faster R-CNN, SSD, and YOLOv5/v8/v11/v12, the UTNet model demonstrates enhanced performance in random image detection, achieving maximum improvements of 3.5% in mAP50 and 9.3% in mAP50-95, while maintaining reduced parameter count and low computational complexity. The UTNet model is further evaluated on underwater videos for real-time debris recognition at varying depths to validate its capability. Results show that the UTNet model exhibits a consistently increasing trend in confidence levels across different depths as detection distance decreases, with peak values of 0.901 at the surface and 0.764 at deep-sea levels. In contrast, the other six models display greater performance fluctuations and fail to maintain detection stability, particularly at intermediate and deep depths, with evident false positives and missed detections. In summary, the lightweight UTNet model developed in this study achieves high detection accuracy and computational efficiency, enabling real-time, high-precision detection of marine debris at varying depths and ultimately benefiting mitigation and cleanup efforts. Full article
(This article belongs to the Section Marine Pollution)

18 pages, 2151 KB  
Article
Drone-Assisted Plant Stress Detection Using Deep Learning: A Comparative Study of YOLOv8, RetinaNet, and Faster R-CNN
by Yousef-Awwad Daraghmi, Waed Naser, Eman Yaser Daraghmi and Hacene Fouchal
AgriEngineering 2025, 7(8), 257; https://doi.org/10.3390/agriengineering7080257 - 11 Aug 2025
Viewed by 560
Abstract
Drones have been widely used in precision agriculture to capture high-resolution images of crops, providing farmers with advanced insights into crop health, growth patterns, nutrient deficiencies, and pest infestations. Although several machine and deep learning models have been proposed for plant stress and disease detection, their performance regarding accuracy and computational time still requires improvement, particularly under limited data. Therefore, this paper aims to address these challenges by conducting a comparative analysis of three State-of-the-Art object detection deep learning models: YOLOv8, RetinaNet, and Faster R-CNN, and their variants to identify the model with the best performance. To evaluate the models, the research uses a real-world dataset from potato farms containing images of healthy and stressed plants, with stress resulting from biotic and abiotic factors. The models are evaluated under limited conditions with original data of size 360 images and expanded conditions with augmented data of size 1560 images. The results show that YOLOv8 variants outperform the other models by achieving larger mAP@50 values and lower inference times on both the original and augmented datasets. The YOLOv8 variants achieve mAP@50 ranging from 0.798 to 0.861 and inference times ranging from 11.8 ms to 134.3 ms, while RetinaNet variants achieve mAP@50 ranging from 0.587 to 0.628 and inference times ranging from 118.7 ms to 158.8 ms, and Faster R-CNN variants achieve mAP@50 ranging from 0.587 to 0.628 and inference times ranging from 265 ms to 288 ms. These findings highlight YOLOv8’s robustness, speed, and suitability for real-time aerial crop monitoring, particularly in data-constrained environments. Full article
(This article belongs to the Topic Digital Agriculture, Smart Farming and Crop Monitoring)
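The abstract reports expanding the 360 original images to 1560 through augmentation but does not list the transforms; a typical box-aware augmentation pipeline (purely illustrative, not the authors' choice) could be assembled with Albumentations:

```python
import albumentations as A

# Geometric and photometric transforms that also update the bounding boxes.
augment = A.Compose(
    [A.HorizontalFlip(p=0.5),
     A.RandomBrightnessContrast(p=0.3),
     A.Rotate(limit=15, p=0.5)],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
)
# out = augment(image=image, bboxes=boxes, labels=labels)  # boxes follow the geometry
```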

21 pages, 9664 KB  
Article
A Detection Approach for Wheat Spike Recognition and Counting Based on UAV Images and Improved Faster R-CNN
by Donglin Wang, Longfei Shi, Huiqing Yin, Yuhan Cheng, Shaobo Liu, Siyu Wu, Guangguang Yang, Qinge Dong, Jiankun Ge and Yanbin Li
Plants 2025, 14(16), 2475; https://doi.org/10.3390/plants14162475 - 9 Aug 2025
Viewed by 450
Abstract
This study presents an innovative unmanned aerial vehicle (UAV)-based intelligent detection method utilizing an improved Faster Region-based Convolutional Neural Network (Faster R-CNN) architecture to address the inefficiency and inaccuracy inherent in manual wheat spike counting. We systematically collected a high-resolution image dataset (2000 images, 4096 × 3072 pixels) covering key growth stages (heading, grain filling, and maturity) of winter wheat (Triticum aestivum L.) during 2022–2023 using a DJI M300 RTK equipped with multispectral sensors. The dataset encompasses diverse field scenarios under five fertilization treatments (organic-only, organic–inorganic 7:3 and 3:7 ratios, inorganic-only, and no fertilizer) and two irrigation regimes (full and deficit irrigation), ensuring representativeness and generalizability. For model development, we replaced conventional VGG16 with ResNet-50 as the backbone network, incorporating residual connections and channel attention mechanisms to achieve 92.1% mean average precision (mAP) while reducing parameters from 135 M to 77 M (43% decrease). The GFLOPs of the improved model were reduced from 1.9 to 1.7, a decrease of 10.53%, improving computational efficiency. Performance tests demonstrated a 15% reduction in missed detection rate compared to YOLOv8 in dense canopies, with spike count regression analysis yielding R² = 0.88 (p < 0.05) against manual measurements and yield prediction errors below 10% for optimal treatments. To validate robustness, we established a dedicated 500-image test set (25% of total data) spanning density gradients (30–80 spikes/m²) and varying illumination conditions, maintaining >85% accuracy even under cloudy weather. Furthermore, by integrating spike recognition with agronomic parameters (e.g., grain weight), we developed a comprehensive yield estimation model achieving 93.5% accuracy under optimal water–fertilizer management (70% ETc irrigation with 3:7 organic–inorganic ratio). This work systematically addresses key technical challenges in automated spike detection through standardized data acquisition, lightweight model design, and field validation, offering significant practical value for smart agriculture development. Full article
(This article belongs to the Special Issue Plant Phenotyping and Machine Learning)
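The backbone replacement described above (VGG16 to ResNet-50) can be sketched with recent torchvision building blocks as shown below; the paper's channel-attention additions and training details are omitted, and the two-class setting (background plus spike) is an assumption.

```python
from torchvision.models import ResNet50_Weights
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

# ResNet-50 with an FPN neck as the Faster R-CNN backbone (requires torchvision >= 0.13)
backbone = resnet_fpn_backbone(backbone_name="resnet50",
                               weights=ResNet50_Weights.DEFAULT,
                               trainable_layers=3)
model = FasterRCNN(backbone, num_classes=2)   # background + wheat spike (assumption)
```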

16 pages, 2338 KB  
Article
YOLOv5s-TC: An Improved Intelligent Model for Insulator Fault Detection Based on YOLOv5s
by Yingying Yin, Yunpeng Duan, Xin Wang, Shuo Han and Chengyang Zhou
Sensors 2025, 25(16), 4893; https://doi.org/10.3390/s25164893 - 8 Aug 2025
Viewed by 304
Abstract
Insulators play a pivotal role in power grid infrastructure, offering indispensable electrical insulation and mechanical support. Precise and efficient detection of insulator faults is of paramount importance for safeguarding grid reliability and ensuring operational safety. With the rapid advancements in UAV (unmanned aerial vehicle) technology and deep learning, there has been a notable transition from traditional manual inspections to automated UAV-based detection systems. To further enhance detection accuracy, this study conducts a series of systematic improvements to the YOLOv5s model and proposes an advanced intelligent insulator detection model, namely YOLOv5s-TC. Firstly, this new model replaces the C3 (Cross Stage Partial Bottleneck with 3 convolutions) module with Bottleneck Transformers to enhance feature learning ability. Secondly, the CBAM (Convolutional Block Attention Module) is introduced to make the model focus more on the key features of the images, thus improving the target localization ability. Finally, the improved loss function named OSIoU is adopted to further enhance detection accuracy. Comparative experiments demonstrate that YOLOv5s-TC achieves significant performance gains, with mean average precision improvements of 4.4%, 24.5%, and 13.9% over the original YOLOv5s, Faster R-CNN, and SSD models, respectively. The results indicate that YOLOv5s-TC offers superior detection performance and greater reliability for practical power grid inspection applications. Full article
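For context, the CBAM module introduced above applies channel attention followed by spatial attention; below is a minimal sketch with the commonly used defaults (reduction 16, 7×7 spatial kernel), which are not necessarily the paper's settings.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(                        # shared MLP for channel attention
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False))
        self.spatial = nn.Conv2d(2, 1, spatial_kernel,
                                 padding=spatial_kernel // 2, bias=False)

    def forward(self, x):
        # channel attention: avg- and max-pooled descriptors through the shared MLP
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True)) +
                           self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca
        # spatial attention: channel-wise avg and max maps through a 7x7 conv
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa

y = CBAM(256)(torch.rand(1, 256, 20, 20))
```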

23 pages, 4350 KB  
Article
Gardens Fire Detection Based on the Symmetrical SSS-YOLOv8 Network
by Bo Liu, Junhua Wang, Qing An, Yanglu Wan, Jianing Zhou and Xijiang Chen
Symmetry 2025, 17(8), 1269; https://doi.org/10.3390/sym17081269 - 8 Aug 2025
Viewed by 367
Abstract
Fire detection primarily relies on sensors such as smoke detectors, heat detectors, and flame detectors. However, due to cost constraints, it is impractical to deploy the large number of such sensors required for fire detection in outdoor gardens and landscapes. To address this challenge and enhance fire detection accuracy in gardens while keeping the design lightweight, this paper proposes an improved symmetry SSS-YOLOv8 model for lightweight fire detection in garden video surveillance. Firstly, the SPDConv layer from ShuffleNetV2 is used to preserve flame or smoke information, combined with the Conv_Maxpool layer to reduce computational complexity. Subsequently, the SE module is introduced into the backbone feature extraction network to enhance features specific to fire and smoke. ShuffleNetV2 and the SE module are configured into a symmetric local network structure to enhance the extraction of flame or smoke features. Finally, WIoU is introduced as the bounding box regression loss function to further ensure the detection performance of the symmetry SSS-YOLOv8 model. Experimental results demonstrate that the improved symmetry SSS-YOLOv8 model achieves precision and recall rates for garden flame and smoke detection both exceeding 0.70. Compared to the YOLOv8n model, it exhibits a 2.1 percentage point increase in mAP, while its parameter count is only 1.99 M, 65.7% of the original model’s. The proposed model demonstrates superior detection accuracy for garden fires compared to other YOLO series models of the same type, as well as different types of SSD and Faster R-CNN models. Full article
(This article belongs to the Section Computer)
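The SE module mentioned above is the standard squeeze-and-excitation block: global average pooling followed by a small bottleneck MLP that reweights channels. A minimal sketch, using the usual reduction ratio rather than the paper's setting:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(x.mean((2, 3))).view(b, c, 1, 1)   # squeeze -> per-channel excitation weights
        return x * w                                    # reweight channels

y = SEBlock(128)(torch.rand(2, 128, 40, 40))
```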

15 pages, 4258 KB  
Article
Complex-Scene SAR Aircraft Recognition Combining Attention Mechanism and Inner Convolution Operator
by Wansi Liu, Huan Wang, Jiapeng Duan, Lixiang Cao, Teng Feng and Xiaomin Tian
Sensors 2025, 25(15), 4749; https://doi.org/10.3390/s25154749 - 1 Aug 2025
Viewed by 397
Abstract
Synthetic aperture radar (SAR), as an active microwave imaging system, has the capability of all-weather and all-time observation. In response to the challenges of aircraft detection in SAR images due to the complex background interference caused by the continuous scattering of airport buildings and the demand for real-time processing, this paper proposes a YOLOv7-MTI recognition model that combines the attention mechanism and involution. By integrating the MTCN module and involution, performance is enhanced. The Multi-TASP-Conv network (MTCN) module aims to effectively extract low-level semantic and spatial information using a shared lightweight attention gate structure to achieve cross-dimensional interaction between “channels and space” with very few parameters, capturing the dependencies among multiple dimensions and improving feature representation ability. Involution helps the model adaptively adjust the weights of spatial positions through dynamic parameterized convolution kernels, strengthening the discrete strong scattering points specific to aircraft and suppressing the continuous scattering of the background, thereby alleviating the interference of complex backgrounds. Experiments on the SAR-AIRcraft-1.0 dataset, which includes seven categories such as A220, A320/321, A330, ARJ21, Boeing737, Boeing787, and others, show that the mAP and mRecall of YOLOv7-MTI reach 93.51% and 96.45%, respectively, outperforming Faster R-CNN, SSD, YOLOv5, YOLOv7, and YOLOv8. Compared with the basic YOLOv7, mAP is improved by 1.47%, mRecall by 1.64%, and FPS by 8.27%, achieving an effective balance between accuracy and speed, providing research ideas for SAR aircraft recognition. Full article
(This article belongs to the Section Radar Sensors)
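For readers unfamiliar with involution, the sketch below follows the operator's published formulation (Li et al., CVPR 2021): a small bottleneck generates a per-pixel kernel that is shared across channel groups, which is what lets the detector emphasize location-specific (e.g., strong-scattering) responses. The group and reduction settings are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class Involution(nn.Module):
    def __init__(self, channels, kernel_size=7, groups=4, reduction=4):
        super().__init__()
        self.k, self.g = kernel_size, groups
        self.reduce = nn.Conv2d(channels, channels // reduction, 1)
        self.span = nn.Conv2d(channels // reduction, kernel_size * kernel_size * groups, 1)
        self.unfold = nn.Unfold(kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        b, c, h, w = x.shape
        kernel = self.span(self.reduce(x))                         # (B, K*K*G, H, W), per-pixel kernels
        kernel = kernel.view(b, self.g, 1, self.k * self.k, h, w)
        patches = self.unfold(x).view(b, self.g, c // self.g, self.k * self.k, h, w)
        return (kernel * patches).sum(dim=3).view(b, c, h, w)      # apply dynamic kernel per location

y = Involution(64)(torch.rand(1, 64, 32, 32))
```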