Search Results (383)

Search Parameters:
Keywords = three-scale fusion module

23 pages, 24211 KB  
Article
BMDNet-YOLO: A Lightweight and Robust Model for High-Precision Real-Time Recognition of Blueberry Maturity
by Huihui Sun and Rui-Feng Wang
Horticulturae 2025, 11(10), 1202; https://doi.org/10.3390/horticulturae11101202 - 5 Oct 2025
Abstract
Accurate real-time detection of blueberry maturity is vital for automated harvesting. However, existing methods often fail under occlusion, variable lighting, and dense fruit distribution, leading to reduced accuracy and efficiency. To address these challenges, we designed a lightweight deep learning framework that integrates improved feature extraction, attention-based fusion, and progressive transfer learning to enhance robustness and adaptability. Specifically, we propose BMDNet-YOLO, a lightweight model based on an enhanced YOLOv8n. The backbone incorporates a FasterPW module with parallel convolution and point-wise weighting to improve feature extraction efficiency and robustness. A coordinate attention (CA) mechanism in the neck enhances spatial-channel feature selection, while adaptive weighted concatenation ensures efficient multi-scale fusion. The detection head employs a heterogeneous lightweight structure combining group and depthwise separable convolutions to minimize parameter redundancy and boost inference speed. Additionally, a three-stage transfer learning framework (source-domain pretraining, cross-domain adaptation, and target-domain fine-tuning) improves generalization. Experiments on 8,250 field-collected and augmented images show BMDNet-YOLO achieves 95.6% mAP@0.5, 98.27% precision, and 94.36% recall, surpassing existing baselines. This work offers a robust solution for deploying automated blueberry harvesting systems. Full article
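
The coordinate attention (CA) mechanism cited in the neck is a general-purpose, previously published attention block rather than something specific to this paper; as a point of reference, a minimal PyTorch sketch of standard coordinate attention is shown below (channel sizes and the reduction ratio are illustrative assumptions, not values from the paper).

```python
# Minimal sketch of a standard coordinate attention block (Hou et al., 2021),
# shown only to illustrate the general mechanism referenced in the abstract.
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        # Pool along each spatial axis separately to keep positional information.
        x_h = x.mean(dim=3, keepdim=True)                           # (b, c, h, 1)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)       # (b, c, w, 1)
        y = torch.cat([x_h, x_w], dim=2)                            # (b, c, h+w, 1)
        y = self.act(self.bn(self.conv1(y)))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                       # gate over rows
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))   # gate over columns
        return x * a_h * a_w

feat = torch.randn(1, 64, 40, 40)
print(CoordinateAttention(64)(feat).shape)  # torch.Size([1, 64, 40, 40])
```
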
15 pages, 3389 KB  
Article
Photovoltaic Decomposition Method Based on Multi-Scale Modeling and Multi-Feature Fusion
by Zhiheng Xu, Peidong Chen, Ran Cheng, Yao Duan, Qiang Luo, Huahui Zhang, Zhenning Pan and Wencong Xiao
Energies 2025, 18(19), 5271; https://doi.org/10.3390/en18195271 - 4 Oct 2025
Abstract
Deep learning-based Non-Intrusive Load Monitoring (NILM) methods have been widely applied to residential load identification. However, photovoltaic (PV) loads exhibit strong non-stationarity, high dependence on weather conditions, and strong coupling with multi-source data, which limit the accuracy and generalization of existing models. To address these challenges, this paper proposes a multi-scale and multi-feature fusion framework for PV disaggregation, consisting of three modules: Multi-Scale Time Series Decomposition (MTD), Multi-Feature Fusion (MFF), and Temporal Attention Decomposition (TAD). These modules jointly capture short-term fluctuations, long-term trends, and deep dependencies across multi-source features. Experiments were conducted on real residential datasets from southern China. Results show that, compared with representative baselines such as SGN-Conv and MAT-Conv, the proposed method reduces MAE by over 60% and SAE by nearly 70% for some users, and it achieves more than 45% error reduction in cross-user tests. These findings demonstrate that the proposed approach significantly enhances both accuracy and generalization in PV load disaggregation. Full article
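
The abstract does not detail the MTD module; as a rough illustration of how multi-scale time-series decomposition is commonly done, the sketch below splits a load series into trend components and a residual using moving averages of increasing window length (window sizes are assumptions, not values from the paper).

```python
import torch
import torch.nn.functional as F

def multi_scale_decompose(series: torch.Tensor, windows=(5, 25, 97)):
    """Split a 1-D series (batch, length) into per-scale trend components plus a
    residual, using moving averages of increasing window size. Illustrative only."""
    residual = series
    trends = []
    for w in windows:
        pad = w // 2
        # Replicate-pad so the moving average keeps the original length.
        padded = F.pad(residual.unsqueeze(1), (pad, w - 1 - pad), mode="replicate")
        trend = F.avg_pool1d(padded, kernel_size=w, stride=1).squeeze(1)
        trends.append(trend)
        residual = residual - trend
    return trends, residual

x = torch.randn(2, 288)          # e.g. two days of 5-minute aggregate load
trends, res = multi_scale_decompose(x)
print([t.shape for t in trends], res.shape)
```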

28 pages, 32809 KB  
Article
LiteSAM: Lightweight and Robust Feature Matching for Satellite and Aerial Imagery
by Boya Wang, Shuo Wang, Yibin Han, Linfeng Xu and Dong Ye
Remote Sens. 2025, 17(19), 3349; https://doi.org/10.3390/rs17193349 - 1 Oct 2025
Abstract
We present a (Light)weight (S)atellite–(A)erial feature (M)atching framework (LiteSAM) for robust UAV absolute visual localization (AVL) in GPS-denied environments. Existing satellite–aerial matching methods struggle with large appearance variations, texture-scarce regions, and limited efficiency for real-time UAV applications. LiteSAM integrates three key components to address these issues. First, efficient multi-scale feature extraction optimizes representation, reducing inference latency for edge devices. Second, a Token Aggregation–Interaction Transformer (TAIFormer) with a convolutional token mixer (CTM) models inter- and intra-image correlations, enabling robust global–local feature fusion. Third, a MinGRU-based dynamic subpixel refinement module adaptively learns spatial offsets, enhancing subpixel-level matching accuracy and cross-scenario generalization. The experiments show that LiteSAM achieves competitive performance across multiple datasets. On UAV-VisLoc, LiteSAM attains an RMSE@30 of 17.86 m, outperforming state-of-the-art semi-dense methods such as EfficientLoFTR. Its optimized variant, LiteSAM (opt., without dual softmax), delivers inference times of 61.98 ms on standard GPUs and 497.49 ms on NVIDIA Jetson AGX Orin, which are 22.9% and 19.8% faster than EfficientLoFTR (opt.), respectively. With 6.31M parameters, which is 2.4× fewer than EfficientLoFTR’s 15.05M, LiteSAM proves to be suitable for edge deployment. Extensive evaluations on natural image matching and downstream vision tasks confirm its superior accuracy and efficiency for general feature matching. Full article
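
The "dual softmax" that the optimized variant drops refers to the standard mutual-softmax scoring used by semi-dense matchers such as LoFTR and EfficientLoFTR; a generic sketch of that scoring step (temperature and descriptor sizes are illustrative) follows.

```python
import torch
import torch.nn.functional as F

def dual_softmax_scores(desc_a: torch.Tensor, desc_b: torch.Tensor, temperature: float = 0.1):
    """Generic dual-softmax matching score between two sets of L2-normalized
    descriptors (n, d) and (m, d), as used in semi-dense matchers. Illustrative only."""
    sim = desc_a @ desc_b.t() / temperature          # (n, m) similarity matrix
    # Softmax in both directions; the product keeps only mutually confident pairs.
    return sim.softmax(dim=1) * sim.softmax(dim=0)

a = F.normalize(torch.randn(512, 256), dim=1)
b = F.normalize(torch.randn(600, 256), dim=1)
scores = dual_softmax_scores(a, b)
matches = scores.argmax(dim=1)                       # best candidate in b for each row of a
print(scores.shape, matches.shape)
```
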
25 pages, 9710 KB  
Article
SCS-YOLO: A Lightweight Cross-Scale Detection Network for Sugarcane Surface Cracks with Dynamic Perception
by Meng Li, Xue Ding, Jinliang Wang and Rongxiang Luo
AgriEngineering 2025, 7(10), 321; https://doi.org/10.3390/agriengineering7100321 - 1 Oct 2025
Abstract
Detecting surface cracks on sugarcane is a critical step in ensuring product quality control, with detection precision directly impacting raw material screening efficiency and economic benefits in the sugar industry. Traditional methods face three core challenges: (1) complex background interference complicates texture feature extraction; (2) variable crack scales limit models’ cross-scale feature generalization capabilities; and (3) high computational complexity hinders deployment on edge devices. To address these issues, this study proposes a lightweight sugarcane surface crack detection model, SCS-YOLO (Surface Cracks on Sugarcane-YOLO), based on the YOLOv10 architecture. This model incorporates three key technical innovations. First, the designed RFAC2f module (Receptive-Field Attentive CSP Bottleneck with Dual Convolution) significantly enhances feature representation capabilities in complex backgrounds through dynamic receptive field modeling and multi-branch feature processing/fusion mechanisms. Second, the proposed DSA module (Dynamic SimAM Attention) achieves adaptive spatial optimization of cross-layer crack features by integrating dynamic weight allocation strategies with parameter-free spatial attention mechanisms. Finally, the DyHead detection head employs a dynamic feature optimization mechanism to reduce parameter count and computational complexity. Experiments demonstrate that on the Sugarcane Crack Dataset v3.1, compared to the baseline model YOLOv10, our model improves mAP50:95 to 71.8% (up 2.1%). Simultaneously, it achieves significant reductions in parameter count (down 19.67%) and computational load (down 11.76%), while boosting FPS to 122 to meet real-time detection requirements. Considering precision, complexity, and FPS together, the SCS-YOLO detection framework proposed in this study provides a feasible technical reference for the intelligent detection of sugarcane quality in the raw materials of the sugar industry. Full article
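
The DSA module builds on SimAM, a published parameter-free attention; for context, a minimal sketch of the standard SimAM formulation (not the paper's dynamic variant) is given below.

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free SimAM attention (Yang et al., 2021), shown as a generic sketch."""
    def __init__(self, e_lambda: float = 1e-4):
        super().__init__()
        self.e_lambda = e_lambda

    def forward(self, x):
        b, c, h, w = x.shape
        n = h * w - 1
        # Per-channel energy: pixels far from the channel mean receive larger weights.
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
        v = d.sum(dim=(2, 3), keepdim=True) / n
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5
        return x * torch.sigmoid(e_inv)

print(SimAM()(torch.randn(1, 32, 64, 64)).shape)
```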

19 pages, 5891 KB  
Article
MS-YOLOv11: A Wavelet-Enhanced Multi-Scale Network for Small Object Detection in Remote Sensing Images
by Haitao Liu, Xiuqian Li, Lifen Wang, Yunxiang Zhang, Zitao Wang and Qiuyi Lu
Sensors 2025, 25(19), 6008; https://doi.org/10.3390/s25196008 - 29 Sep 2025
Abstract
In remote sensing imagery, objects smaller than 32×32 pixels suffer from three persistent challenges that existing detectors inadequately resolve: (1) their weak signal is easily submerged in background clutter, causing high miss rates; (2) the scarcity of valid pixels yields few geometric or textural cues, hindering discriminative feature extraction; and (3) successive down-sampling irreversibly discards high-frequency details, while multi-scale pyramids still fail to compensate. To counteract these issues, we propose MS-YOLOv11, an enhanced YOLOv11 variant that integrates “frequency-domain detail preservation, lightweight receptive-field expansion, and adaptive cross-scale fusion.” Specifically, a 2D Haar wavelet first decomposes the image into multiple frequency sub-bands to explicitly isolate and retain high-frequency edges and textures while suppressing noise. Each sub-band is then processed independently by small-kernel depthwise convolutions that enlarge the receptive field without over-smoothing. Finally, the Mix Structure Block (MSB) employs the MSPLCK module to perform densely sampled multi-scale atrous convolutions for rich context of diminutive objects, followed by the EPA module that adaptively fuses and re-weights features via residual connections to suppress background interference. Extensive experiments on DOTA and DIOR demonstrate that MS-YOLOv11 surpasses the baseline in mAP@50, mAP@95, parameter efficiency, and inference speed, validating its targeted efficacy for small-object detection. Full article
(This article belongs to the Section Remote Sensors)
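
The frequency-domain step described here starts from a single-level 2D Haar decomposition, which is a standard transform; a minimal sketch (independent of the paper's implementation) is shown below.

```python
import torch

def haar_dwt2d(x: torch.Tensor):
    """Single-level 2D Haar decomposition of (b, c, H, W) with even H and W into
    four sub-bands, each (b, c, H/2, W/2). Generic sketch; sub-band naming
    conventions vary between libraries."""
    a = x[..., 0::2, 0::2]   # top-left pixel of each 2x2 block
    b = x[..., 0::2, 1::2]   # top-right
    c = x[..., 1::2, 0::2]   # bottom-left
    d = x[..., 1::2, 1::2]   # bottom-right
    ll = (a + b + c + d) / 2   # low-frequency approximation
    lh = (a - b + c - d) / 2   # high-pass along width (responds to vertical edges)
    hl = (a + b - c - d) / 2   # high-pass along height (responds to horizontal edges)
    hh = (a - b - c + d) / 2   # diagonal detail
    return ll, lh, hl, hh

img = torch.randn(1, 3, 256, 256)
print([s.shape for s in haar_dwt2d(img)])  # four (1, 3, 128, 128) sub-bands
```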

26 pages, 11189 KB  
Article
DSEE-YOLO: A Dynamic Edge-Enhanced Lightweight Model for Infrared Ship Detection in Complex Maritime Environments
by Siyu Wang, Yunsong Feng, Wei Jin, Liping Liu, Changqi Zhou, Huifeng Tao and Lei Cai
Remote Sens. 2025, 17(19), 3325; https://doi.org/10.3390/rs17193325 - 28 Sep 2025
Abstract
Complex marine infrared images, which suffer from background interference, blurred features, and indistinct contours, hamper detection accuracy. Meanwhile, the limited computing power, storage, and energy of maritime devices require target detection models suitable for real-time detection. To address these issues, we propose DSEE-YOLO (Dynamic Ship Edge-Enhanced YOLO), an efficient lightweight infrared ship detection algorithm. It integrates three innovative modules with pruning and self-distillation: the C3k2_MultiScaleEdgeFusion module replaces the original bottleneck with a MultiEdgeFusion structure to boost edge feature expression; the lightweight DS_ADown module uses DSConv (depthwise separable convolution) to reduce parameters while preserving feature capability; and the DyTaskHead dynamically aligns classification and localization features through task decomposition. Redundant structures are pruned via LAMP (Layer-Adaptive Sparsity for the Magnitude-Based Pruning), and performance is optimized via BCKD (Bridging Cross-Task Protocol Inconsistency for Knowledge Distillation) self-distillation, yielding a lightweight, efficient model. Experimental results show the DSEE-YOLO outperforms YOLOv11n when applied to our self-constructed IRShip dataset by reducing parameters by 42.3% and model size from 10.1 MB to 3.5 MB while increasing mAP@0.50 by 2.8%, mAP@0.50:0.95 by 3.8%, precision by 2.3%, and recall by 3.0%. These results validate its high-precision detection capability and lightweight advantages in complex infrared scenarios, offering an efficient solution for real-time maritime infrared ship monitoring. Full article
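
DSConv here denotes depthwise separable convolution, a standard factorization of a dense convolution into a per-channel spatial convolution followed by a 1x1 pointwise projection; the generic sketch below (not the paper's DS_ADown module) also prints a parameter comparison against a dense 3x3 convolution with the same channel widths.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Standard depthwise separable convolution: a per-channel spatial convolution
    followed by a 1x1 pointwise convolution. Generic sketch."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3, stride: int = 1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, stride, padding=k // 2,
                                   groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Parameter comparison against a dense 3x3 convolution with the same shapes.
dense = nn.Conv2d(128, 256, 3, padding=1, bias=False)
dsc = DepthwiseSeparableConv(128, 256)
print(sum(p.numel() for p in dense.parameters()),
      sum(p.numel() for p in dsc.parameters() if p.requires_grad))
```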

19 pages, 15475 KB  
Article
Oriented Object Detection with RGB-D Data for Corn Pose Estimation
by Yuliang Gao, Haonan Tang, Yuting Wang, Tao Liu, Zhen Li, Bin Li and Lifeng Zhang
Appl. Sci. 2025, 15(19), 10496; https://doi.org/10.3390/app151910496 - 28 Sep 2025
Abstract
Precise oriented object detection of corn provides critical support for automated agricultural tasks such as harvesting, spraying, and precision management. In this work, we address this challenge by leveraging oriented object detection in combination with depth information to estimate corn poses. To enhance detection accuracy while maintaining computational efficiency, we construct a precise annotated oriented corn detection dataset and propose YOLOv11OC, an improved detector. YOLOv11OC integrates three key components: Angle-aware Attention Module for angle encoding and orientation perception, Cross-Layer Fusion Network for multi-scale feature fusion, and GSConv Inception Network for efficient multi-scale representation. Together, these modules enable accurate oriented detection while reducing model complexity. Experimental results show that YOLOv11OC achieves 97.6% mAP@0.75, exceeding YOLOv11 by 3.2%, and improves mAP50:95 by 5.0%. Furthermore, when combined with depth maps, the system achieves 92.5% pose estimation accuracy, demonstrating its potential to advance intelligent and automated cultivation and spraying. Full article
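
The abstract does not specify how the Angle-aware Attention Module encodes orientation; one common way to keep oriented-box angle regression continuous at the periodic boundary is a (sin, cos) encoding, sketched below purely as an illustration of the general idea (the function names are hypothetical).

```python
import torch

def encode_angle(theta: torch.Tensor) -> torch.Tensor:
    """Encode box angles (radians) as (sin, cos) pairs so that angles near the
    periodic boundary stay close in feature space. Generic illustration."""
    return torch.stack((torch.sin(theta), torch.cos(theta)), dim=-1)

def decode_angle(enc: torch.Tensor) -> torch.Tensor:
    """Invert the encoding with atan2; the (sin, cos) vector need not be unit length."""
    return torch.atan2(enc[..., 0], enc[..., 1])

theta = torch.tensor([0.05, 3.10, -3.10])   # angles near the +/- pi wrap-around
print(decode_angle(encode_angle(theta)))
```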

26 pages, 10666 KB  
Article
FALS-YOLO: An Efficient and Lightweight Method for Automatic Brain Tumor Detection and Segmentation
by Liyan Sun, Linxuan Zheng and Yi Xin
Sensors 2025, 25(19), 5993; https://doi.org/10.3390/s25195993 - 28 Sep 2025
Abstract
Brain tumors are highly malignant diseases that severely threaten the nervous system and patients’ lives. MRI is a core technology for brain tumor diagnosis and treatment due to its high resolution and non-invasiveness. However, existing YOLO-based models face challenges in brain tumor MRI image detection and segmentation, such as insufficient multi-scale feature extraction and high computational resource consumption. This paper proposes an improved lightweight brain tumor detection and instance segmentation model named FALS-YOLO, based on YOLOv8n-Seg and integrating three key modules: FLRDown, AdaSimAM, and LSCSHN. FLRDown enhances multi-scale tumor perception, AdaSimAM suppresses noise and improves feature fusion, and LSCSHN achieves high-precision segmentation with reduced parameters and computational burden. Experiments on the tumor-otak dataset show that FALS-YOLO achieves Precision (B) of 0.892, Recall (B) of 0.858, mAP@0.5 (B) of 0.912 in detection, and Precision (M) of 0.899, Recall (M) of 0.863, mAP@0.5 (M) of 0.917 in segmentation, outperforming YOLOv5n-Seg, YOLOv8n-Seg, YOLOv9s-Seg, YOLOv10n-Seg and YOLOv11n-Seg. Compared with YOLOv8n-Seg, FALS-YOLO reduces parameters by 31.95%, computational amount by 20.00%, and model size by 32.31%. It provides an efficient, accurate and practical solution for the automatic detection and instance segmentation of brain tumors in resource-limited environments. Full article
(This article belongs to the Special Issue Emerging MRI Techniques for Enhanced Disease Diagnosis and Monitoring)
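
Parameter reductions such as the 31.95% reported here are typically computed by counting trainable parameters of the two models; a minimal way to do this in PyTorch is shown below (the two toy models are placeholders, not FALS-YOLO and YOLOv8n-Seg).

```python
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    """Count trainable parameters of a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

baseline = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.Conv2d(64, 64, 3, padding=1))
slim = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.Conv2d(32, 32, 3, padding=1))
reduction = 100 * (1 - count_params(slim) / count_params(baseline))
print(f"parameter reduction: {reduction:.2f}%")
```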

22 pages, 2395 KB  
Article
Multimodal Alignment and Hierarchical Fusion Network for Multimodal Sentiment Analysis
by Jiasheng Huang, Huan Li and Xinyue Mo
Electronics 2025, 14(19), 3828; https://doi.org/10.3390/electronics14193828 - 26 Sep 2025
Abstract
The widespread emergence of multimodal data on social platforms has presented new opportunities for sentiment analysis. However, previous studies have often overlooked the issue of detail loss during modal interaction fusion. They also exhibit limitations in addressing semantic alignment challenges and the sensitivity of modalities to noise. To enhance analytical accuracy, a novel model named MAHFNet is proposed. The proposed architecture is composed of three main components. Firstly, an attention-guided gated interaction alignment module is developed for modeling the semantic interaction between text and image using a gated network and a cross-modal attention mechanism. Next, a contrastive learning mechanism is introduced to encourage the aggregation of semantically aligned image-text pairs. Subsequently, an intra-modality emotion extraction module is designed to extract local emotional features within each modality. This module serves to compensate for detail loss during interaction fusion. The intra-modal local emotion features and cross-modal interaction features are then fed into a hierarchical gated fusion module, where the local features are fused through a cross-gated mechanism to dynamically adjust the contribution of each modality while suppressing modality-specific noise. Then, the fusion results and cross-modal interaction features are further fused using a multi-scale attention gating module to capture hierarchical dependencies between local and global emotional information, thereby enhancing the model’s ability to perceive and integrate emotional cues across multiple semantic levels. Finally, extensive experiments have been conducted on three public multimodal sentiment datasets, with results demonstrating that the proposed model outperforms existing methods across multiple evaluation metrics. Specifically, on the TumEmo dataset, our model achieves improvements of 2.55% in ACC and 2.63% in F1 score compared to the second-best method. On the HFM dataset, these gains reach 0.56% in ACC and 0.9% in F1 score, respectively. On the MVSA-S dataset, these gains reach 0.03% in ACC and 1.26% in F1 score. These findings collectively validate the overall effectiveness of the proposed model. Full article
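
The contrastive step that pulls semantically aligned image-text pairs together is usually implemented as a symmetric InfoNCE objective over a batch; the sketch below shows that generic form (temperature and embedding size are assumptions, not MAHFNet's exact loss).

```python
import torch
import torch.nn.functional as F

def info_nce(img_emb: torch.Tensor, txt_emb: torch.Tensor, temperature: float = 0.07):
    """Symmetric InfoNCE loss for a batch of matched image/text embeddings (b, d).
    Matched pairs sit on the diagonal of the similarity matrix. Generic sketch."""
    img = F.normalize(img_emb, dim=1)
    txt = F.normalize(txt_emb, dim=1)
    logits = img @ txt.t() / temperature
    targets = torch.arange(img.size(0), device=img.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

loss = info_nce(torch.randn(16, 256), torch.randn(16, 256))
print(loss.item())
```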

16 pages, 10633 KB  
Article
HVI-Based Spatial–Frequency-Domain Multi-Scale Fusion for Low-Light Image Enhancement
by Yuhang Zhang, Huiying Zheng, Xinya Xu and Hancheng Zhu
Appl. Sci. 2025, 15(19), 10376; https://doi.org/10.3390/app151910376 - 24 Sep 2025
Abstract
Low-light image enhancement aims to restore images captured under extreme low-light conditions. Existing methods demonstrate that fusing Fourier transform magnitude and phase information within the RGB color space effectively improves enhancement results. Meanwhile, recent advances have demonstrated that certain color spaces based on human visual perception, such as Hue–Value–Intensity (HVI), are superior to RGB for enhancing low-light images. However, these methods neglect the key impact of the coupling relationship between spatial and frequency-domain features on image enhancement. This paper proposes a spatial–frequency-domain multi-scale fusion for low-light image enhancement by exploring the intrinsic relationships among the three channels of HVI space, which consists of a dual-path parallel processing architecture. In the spatial domain, a specifically designed multi-scale feature extraction module systematically captures comprehensive structural information. In the frequency domain, our model establishes deep coupling between spatial features and Fourier transform features in the I-channel. The effectively fused features from both domains synergistically drive an encoder–decoder network to achieve superior image enhancement performance. Extensive experiments on multiple public benchmark datasets show that the proposed method significantly outperforms state-of-the-art approaches in both quantitative metrics and visual quality. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
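
Coupling spatial and frequency-domain features starts from splitting a signal into Fourier magnitude and phase; the generic torch.fft sketch below shows the split and the lossless round trip back to the spatial domain (it is not the paper's I-channel module).

```python
import torch

def split_fourier(x: torch.Tensor):
    """Split a (b, c, h, w) tensor into Fourier magnitude and phase and show the
    lossless round trip back to the spatial domain. Generic illustration."""
    spec = torch.fft.rfft2(x, norm="ortho")
    magnitude, phase = spec.abs(), spec.angle()
    recon = torch.fft.irfft2(torch.polar(magnitude, phase), s=x.shape[-2:], norm="ortho")
    return magnitude, phase, recon

x = torch.randn(1, 1, 64, 64)
mag, pha, rec = split_fourier(x)
print(mag.shape, torch.allclose(x, rec, atol=1e-5))
```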

24 pages, 6747 KB  
Article
YOLOv11-MSE: A Multi-Scale Dilated Attention-Enhanced Lightweight Network for Efficient Real-Time Underwater Target Detection
by Zhenfeng Ye, Xing Peng, Dingkang Li and Feng Shi
J. Mar. Sci. Eng. 2025, 13(10), 1843; https://doi.org/10.3390/jmse13101843 - 23 Sep 2025
Abstract
Underwater target detection is a critical technology for marine resource management and ecological protection, but its performance is often limited by complex underwater environments, including optical attenuation, scattering, and dense distributions of small targets. Existing methods have significant limitations in feature extraction efficiency, robustness in class-imbalanced scenarios, and computational complexity. To address these challenges, this study proposes a lightweight adaptive detection model, YOLOv11-MSE, which optimizes underwater detection performance through three core innovations. First, a multi-scale dilated attention (MSDA) mechanism is embedded into the backbone network to dynamically capture multi-scale contextual features while suppressing background noise. Second, a Slim-Neck architecture based on GSConv and VoV-GSCSPC modules is designed to achieve efficient feature fusion via hybrid convolution strategies, significantly reducing model complexity. Finally, an efficient multi-scale attention (EMA) module is introduced in the detection head to reinforce key feature representations and suppress environmental noise through cross-dimensional interactions. Experiments on the underwater detection dataset (UDD) demonstrate that YOLOv11-MSE outperforms the baseline model YOLOv11, achieving a 9.67% improvement in detection precision and a 3.45% increase in mean average precision (mAP50) while reducing computational complexity by 6.57%. Ablation studies further validate the synergistic optimization effects of each module, particularly in class-imbalanced scenarios where detection precision for rare categories (e.g., scallops) is significantly enhanced, with precision and mAP50 improving by 60.62% and 10.16%, respectively. This model provides an efficient solution for edge computing scenarios, such as underwater robots and ecological monitoring, through its lightweight design and high underwater target detection capability. Full article
(This article belongs to the Section Ocean Engineering)
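
Multi-scale dilated attention is built on parallel atrous (dilated) convolutions with different dilation rates; a generic multi-branch sketch of that idea is shown below (dilation rates and channel width are illustrative).

```python
import torch
import torch.nn as nn

class MultiDilationBlock(nn.Module):
    """Parallel 3x3 convolutions with increasing dilation rates, concatenated and
    projected back to the input width. Generic multi-scale context sketch."""
    def __init__(self, channels: int, dilations=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d, bias=False)
            for d in dilations
        )
        self.project = nn.Conv2d(channels * len(dilations), channels, 1, bias=False)

    def forward(self, x):
        return self.project(torch.cat([branch(x) for branch in self.branches], dim=1))

print(MultiDilationBlock(64)(torch.randn(1, 64, 80, 80)).shape)
```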

24 pages, 6470 KB  
Article
A Method for Improving the Efficiency and Effectiveness of Automatic Image Analysis of Water Pipes
by Qiuping Wang, Lei Lu, Shuguang Liu, Qunfang Hu, Guihui Zhong, Zhan Su and Shengxin Xu
Water 2025, 17(18), 2781; https://doi.org/10.3390/w17182781 - 20 Sep 2025
Abstract
The integrity of urban water supply pipelines, an essential element of municipal infrastructure, is frequently undermined by internal defects such as corrosion, tuberculation, and foreign matter. Traditional inspection methods relying on CCTV are time-consuming, labor-intensive, and prone to subjective interpretation, which hinders the timely and accurate assessment of pipeline conditions. This study proposes YOLOv8-VSW, a systematically optimized and lightweight model based on YOLOv8 for automated defect detection in in-service pipelines. The framework is twofold: First, to overcome data limitations, a specialized defect dataset was constructed and augmented using photometric transformation, affine transformation, and noise injection. Second, the model architecture was improved on three levels: a VanillaNet backbone was adopted for lightweighting, a C2f-Star module was introduced to enhance multi-scale feature fusion, and the WIoUv3 dynamic loss function was employed to improve robustness under complex imaging conditions. Experimental results demonstrate the superior performance of the proposed YOLOv8-VSW model. This study validates the framework on a curated, real-world image dataset, where YOLOv8-VSW achieved mAP@50 of 83.5%, a 4.0% improvement over the baseline. Concurrently, GFLOPs were reduced by approximately 38.9%, while the inference speed was increased to 603.8 FPS. The findings validate the effectiveness of the proposed method, delivering a solution that effectively balances detection accuracy, computational efficiency, and model size. The results establish a strong technical basis for the intelligent and automated control of safety in urban water supply systems. Full article
(This article belongs to the Section Urban Water Management)
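
The photometric transformation, affine transformation, and noise injection used for augmentation can be sketched with torchvision transforms plus additive Gaussian noise, as below; the parameter values are assumptions, and for detection the geometric transforms would also have to be applied to the box labels, which this sketch omits.

```python
import torch
from torchvision import transforms

# Illustrative augmentation pipeline: photometric jitter, a random affine
# transform, and additive Gaussian noise. Parameter values are assumptions.
augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.2),
    transforms.RandomAffine(degrees=10, translate=(0.05, 0.05), scale=(0.9, 1.1)),
    transforms.Lambda(lambda img: (img + 0.02 * torch.randn_like(img)).clamp(0.0, 1.0)),
])

image = torch.rand(3, 480, 640)          # placeholder pipe-inspection frame in [0, 1]
print(augment(image).shape)
```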

27 pages, 9667 KB  
Article
REU-YOLO: A Context-Aware UAV-Based Rice Ear Detection Model for Complex Field Scenes
by Dongquan Chen, Kang Xu, Wenbin Sun, Danyang Lv, Songmei Yang, Ranbing Yang and Jian Zhang
Agronomy 2025, 15(9), 2225; https://doi.org/10.3390/agronomy15092225 - 20 Sep 2025
Abstract
Accurate detection and counting of rice ears serve as a critical indicator for yield estimation, but the complex conditions of paddy fields limit the efficiency and precision of traditional sampling methods. We propose REU-YOLO, a model specifically designed for rice ear images collected by low-altitude UAV remote sensing, to address issues such as high density, complex spatial distribution, and occlusion in field scenes. Initially, we combine the Additive Block containing Convolutional Additive Self-attention (CAS) and Convolutional Gated Linear Unit (CGLU) to propose a novel module called Additive-CGLU-C2F (AC-C2f) as a replacement for the original C2f in YOLOv8. This module captures contextual information between different regions of an image and improves the feature extraction ability of the model. We also introduce the DropBlock strategy to reduce overfitting and replace the original SPPF module with the SPPFCSPC-G module to enhance feature representation and improve the capacity of the model to extract features across varying scales. We further propose a feature fusion network called Multi-branch Bidirectional Feature Pyramid Network (MBiFPN), which introduces a small object detection head and adjusts the head to focus more on small and medium-sized rice ear targets. By using adaptive average pooling and bidirectional weighted feature fusion, shallow and deep features are dynamically fused to enhance the robustness of the model. Finally, the Inner-PIoU loss function is introduced to improve the adaptability of the model to rice ear morphology. On the self-developed UAVR dataset, REU-YOLO achieves a precision (P) of 90.76%, a recall (R) of 86.94%, an mAP0.5 of 93.51%, and an mAP0.5:0.95 of 78.45%, which are 4.22%, 3.76%, 4.85%, and 8.27% higher than the corresponding values obtained with YOLOv8s, respectively. Furthermore, three public datasets, DRPD, MrMT, and GWHD, were used to perform a comprehensive evaluation of REU-YOLO. The results show that REU-YOLO exhibits strong generalization capability and more stable detection performance. Full article
(This article belongs to the Section Precision and Digital Agriculture)
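
The bidirectional weighted feature fusion in MBiFPN follows the general BiFPN idea of combining feature maps with learnable, normalized non-negative weights; a minimal sketch of that fusion step (not the paper's exact network) is given below.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusion(nn.Module):
    """Fuse same-shaped feature maps with learnable non-negative weights that are
    normalized to sum to one (fast normalized fusion, as in BiFPN). Generic sketch."""
    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, feats):
        w = F.relu(self.weights)
        w = w / (w.sum() + self.eps)
        return sum(wi * f for wi, f in zip(w, feats))

p_shallow, p_deep = torch.randn(1, 128, 40, 40), torch.randn(1, 128, 40, 40)
print(WeightedFusion(2)((p_shallow, p_deep)).shape)
```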

22 pages, 3632 KB  
Article
RFR-YOLO-Based Recognition Method for Dairy Cow Behavior in Farming Environments
by Congcong Li, Jialong Ma, Shifeng Cao and Leifeng Guo
Agriculture 2025, 15(18), 1952; https://doi.org/10.3390/agriculture15181952 - 15 Sep 2025
Abstract
Cow behavior recognition constitutes a fundamental element of effective cow health monitoring and intelligent farming systems. Within large-scale cow farming environments, several critical challenges persist, including the difficulty in accurately capturing behavioral feature information, substantial variations in multi-scale features, and high inter-class similarity among different cow behaviors. To address these limitations, this study introduces an enhanced target detection algorithm for cow behavior recognition, termed RFR-YOLO, which is developed upon the YOLOv11n framework. A well-structured dataset encompassing nine distinct cow behaviors—namely, lying, standing, walking, eating, drinking, licking, grooming, estrus, and limping—is constructed, comprising a total of 13,224 labeled samples. The proposed algorithm incorporates three major technical improvements: First, an Inverted Dilated Convolution module (Region Semantic Inverted Convolution, RsiConv) is designed and seamlessly integrated with the C3K2 module to form the C3K2_Rsi module, which effectively reduces computational overhead while enhancing feature representation. Second, a Four-branch Multi-scale Dilated Attention mechanism (Four Multi-Scale Dilated Attention, FMSDA) is incorporated into the network architecture, enabling the scale-specific features to align with the corresponding receptive fields, thereby improving the model’s capacity to capture multi-scale characteristics. Third, a Reparameterized Generalized Residual Feature Pyramid Network (Reparameterized Generalized Residual-FPN, RepGRFPN) is introduced as the Neck component, allowing for the features to propagate through differentiated pathways and enabling flexible control over multi-scale feature expression, thereby facilitating efficient feature fusion and mitigating the impact of behavioral similarity. The experimental results demonstrate that RFR-YOLO achieves precision, recall, mAP50, and mAP50:95 values of 95.9%, 91.2%, 94.9%, and 85.2%, respectively, representing performance gains of 5.5%, 5%, 5.6%, and 3.5% over the baseline model. Despite a marginal increase in computational complexity of 1.4G, the algorithm retains a high detection speed of 147.6 frames per second. The proposed RFR-YOLO algorithm significantly improves the accuracy and robustness of target detection in group cow farming scenarios. Full article
(This article belongs to the Section Farm Animal Production)
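
Throughput figures such as the reported 147.6 frames per second are usually measured by timing repeated forward passes after a warm-up; the sketch below shows one simple way to do this (the toy model is a placeholder, and on a GPU the timing loop would additionally need torch.cuda.synchronize calls).

```python
import time
import torch
import torch.nn as nn

def measure_fps(model: nn.Module, input_shape=(1, 3, 640, 640), iters: int = 200):
    """Rough CPU FPS measurement: warm up, then time repeated forward passes.
    Generic benchmarking sketch; the model below is a placeholder, not RFR-YOLO."""
    model.eval()
    x = torch.randn(*input_shape)
    with torch.no_grad():
        for _ in range(10):            # warm-up iterations
            model(x)
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        elapsed = time.perf_counter() - start
    return iters / elapsed

toy = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 16, 3, padding=1))
print(f"{measure_fps(toy):.1f} FPS")
```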

19 pages, 2675 KB  
Article
Fast Intra-Coding Unit Partitioning for 3D-HEVC Depth Maps via Hierarchical Feature Fusion
by Fangmei Liu, He Zhang and Qiuwen Zhang
Electronics 2025, 14(18), 3646; https://doi.org/10.3390/electronics14183646 - 15 Sep 2025
Abstract
As a new generation 3D video coding standard, 3D-HEVC offers highly efficient compression. However, its recursive quadtree partitioning mechanism and frequent rate-distortion optimization (RDO) computations lead to a significant increase in coding complexity. Particularly, intra-frame coding in depth maps, which incorporates tools like depth modeling modes (DMMs), substantially prolongs the decision-making process for coding unit (CU) partitioning, becoming a critical bottleneck in compression encoding time. To address this issue, this paper proposes a fast CU partitioning framework based on hierarchical feature fusion convolutional neural networks (HFF-CNNs). It aims to significantly accelerate the overall encoding process while ensuring excellent encoding quality by optimizing depth map CU partitioning decisions. This framework synergistically captures CU’s global structure and local details through multi-scale feature extraction and channel attention mechanisms (SE module). It introduces the wavelet energy ratio designed for quantifying the texture complexity of depth map CU and the quantization parameter (QP) that reflects the encoding quality as external features, enhancing the dynamic perception ability of the model from different dimensions. Ultimately, it outputs depth-corresponding partitioning predictions through three fully connected layers, strictly adhering to HEVC’s quad-tree recursive segmentation mechanism. Experimental results demonstrate that, across eight standard test sequences, the proposed method achieves an average encoding time reduction of 48.43%, significantly lowering intra-frame encoding complexity with a BDBR increment of only 0.35%. The model exhibits outstanding lightweight characteristics with minimal inference time overhead. Compared with the representative methods under comparison, this method achieves a better balance between cross-resolution adaptability and computational efficiency, providing a feasible optimization path for real-time 3D-HEVC applications. Full article
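
The SE module mentioned here is the standard squeeze-and-excitation channel attention (Hu et al., 2018); for reference, a minimal generic sketch follows (the reduction ratio is an assumption).

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Standard squeeze-and-excitation channel attention (Hu et al., 2018):
    global average pooling, bottleneck MLP, sigmoid gating. Generic sketch."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        scale = self.fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)
        return x * scale

print(SEBlock(64)(torch.randn(1, 64, 16, 16)).shape)
```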