FCA-YOLO: An Efficient Deep Learning Framework for Real-Time Monitoring of Stored-Grain Pests in Smart Warehouses
Abstract
1. Introduction
- Construction of the Grain Pest Image Dataset MPest3: This dataset focuses on stored wheat and includes morphological features of typical pests such as Tribolium castaneum, Sitophilus oryzae, and Cryptolestes ferrugineus, providing a foundation for feature extraction and analysis in complex scenarios.
- Design of a Multi-Scale Feature Fusion Mechanism: An FPN-based structure is integrated into YOLOv8, enhancing the detection of small targets across multiple scales by effectively combining shallow and deep features.
- Introduction of the Lightweight Residual Module CNeB: This module combines depthwise separable convolutions and feature alignment strategies to reduce computational cost while improving feature representation and model stability.
- Proposal of the ASFF Detection Head Structure: This structure applies an adaptive weighting mechanism in the spatial dimension to effectively mitigate the interference of complex backgrounds and target occlusion on detection performance (illustrative sketches of the CNeB and ASFF ideas follow this list).
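To make the CNeB and ASFF contributions above concrete, the following is a minimal PyTorch sketch, assuming generic reimplementations of the two ideas: a lightweight residual block built from a depthwise separable convolution (CNeB-style) and adaptive spatial feature fusion across three pyramid levels (ASFF-style). The class names, channel counts, and layer choices are illustrative assumptions, not the authors' released code.

```python
# Illustrative sketches of the techniques named above; class names, channel
# counts, and layer choices are assumptions, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CNeBBlock(nn.Module):
    """Lightweight residual block using a depthwise separable convolution."""

    def __init__(self, channels: int):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, 1)
        self.norm = nn.BatchNorm2d(channels)
        self.act = nn.SiLU()

    def forward(self, x):
        # The residual connection keeps training stable while the
        # depthwise/pointwise pair keeps parameters and FLOPs low.
        return x + self.act(self.norm(self.pointwise(self.depthwise(x))))


class ASFFFusion(nn.Module):
    """Fuses three pyramid levels with per-pixel learned weights for one output level."""

    def __init__(self, channels: int):
        super().__init__()
        # One 1x1 conv per level produces a single-channel spatial weight map.
        self.weight_convs = nn.ModuleList([nn.Conv2d(channels, 1, 1) for _ in range(3)])
        self.out_conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, feats):
        # feats: three tensors [B, C, Hi, Wi]; align all levels to the first level's size.
        size = feats[0].shape[-2:]
        aligned = [f if f.shape[-2:] == size else F.interpolate(f, size=size, mode="nearest")
                   for f in feats]
        # Softmax across levels gives adaptive, location-dependent fusion weights,
        # which is what lets the head suppress background clutter and occlusion.
        w = torch.softmax(
            torch.cat([conv(f) for conv, f in zip(self.weight_convs, aligned)], dim=1), dim=1
        )
        fused = sum(w[:, i:i + 1] * aligned[i] for i in range(3))
        return self.out_conv(fused)


if __name__ == "__main__":
    # Toy pyramid levels (strides 8/16/32 for an assumed 640x640 input).
    p3, p4, p5 = (torch.randn(1, 64, s, s) for s in (80, 40, 20))
    fused = ASFFFusion(64)([CNeBBlock(64)(p) for p in (p3, p4, p5)])
    print(fused.shape)  # torch.Size([1, 64, 80, 80])
```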
2. Materials and Methods
2.1. Experimental Datasets
2.2. Experimental Environment
2.3. Modeling Assessment
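The comparison and ablation tables below report Precision, Recall, mAP, and F1-Score. For reference, a standard formulation of these detection metrics, assuming TP, FP, and FN are counted at a fixed IoU threshold and N is the number of pest classes:

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},
\qquad
\mathrm{AP}_i = \int_0^1 P_i(R)\,\mathrm{d}R, \qquad
\mathrm{mAP} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{AP}_i .
```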
2.4. FCA-YOLO Model Architecture
Algorithm 1: FCA-YOLO Detection Framework (Part 1). Require: Input; Ensure: Result.
Algorithm 2: FCA-YOLO Detection Framework (Part 2).
2.4.1. Baseline Model YOLOv8
2.4.2. Gold-YOLO: A Model Based on Feature Pyramid Networks
2.4.3. Attention Mechanism CNeB
2.4.4. Attention Mechanism ASFF
3. Results
3.1. Comparison Experiment
3.2. Ablation Experiment
4. Discussion
4.1. Hybrid Technology Comparison
4.2. Cross-Domain Validation
4.3. Limitations and Future Work
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
| Dataset | Training (images) | Test (images) | Validation (images) |
| --- | --- | --- | --- |
| Tribolium castaneum | 878 + 272 | 120 + 34 | 109 + 34 |
| Sitophilus oryzae | 980 + 307 | 113 + 39 | 123 + 38 |
| Cryptolestes ferrugineus | 649 + 278 | 106 + 35 | 105 + 35 |
| Environment | Version | Parameter | Value |
| --- | --- | --- | --- |
| Windows | 11 Professional Edition | Input Image Size | |
| GPU | NVIDIA GeForce GTX 1660 SUPER (NVIDIA, Santa Clara, CA, USA) | Epochs | 200 |
| Python | 3.9.19 | Optimizer | SGD |
| Torch | 2.0.0 | SGD Momentum | 0.937 |
| CUDA | 11.8 | Batch Size | 16 |
| C++ Version | 199711 | Patience | 10 |
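As a minimal sketch of how these hyperparameters map onto a training run, assuming the Ultralytics YOLOv8 Python API and a hypothetical dataset configuration file mpest3.yaml (the input image size is left at the library default, since the table does not list it):

```python
# Baseline training sketch matching the table's hyperparameters; the dataset
# file name (mpest3.yaml) and model variant (yolov8n.pt) are assumptions.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")      # baseline YOLOv8 weights
model.train(
    data="mpest3.yaml",         # hypothetical dataset config for MPest3
    epochs=200,                 # Epochs = 200
    batch=16,                   # Batch Size = 16
    optimizer="SGD",            # Optimizer = SGD
    momentum=0.937,             # SGD Momentum = 0.937
    patience=10,                # early-stopping Patience = 10
)
```

The FCA-YOLO modifications themselves (Gold-YOLO neck, CNeB module, ASFF head) would require a custom model definition; this snippet only reproduces the baseline training setup implied by the table.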
| Num | Model Name | Precision cn (%) | Precision mx (%) | Precision xc (%) | Recall cn (%) | Recall mx (%) | Recall xc (%) | mAP (%) | FLOPs (G) | Params (M) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Faster-rcnn_r50_fpn | 98.7 | 96.5 | 96.1 | 92.6 | 98.1 | 83.9 | 96.5 | 74.16 | 41.36 |
| 2 | SSDlite_mobilenetv2 | 96.0 | 96.7 | 87.3 | 95.1 | 97.5 | 23.1 | 79.8 | 0.69 | 3.06 |
| 3 | YOLOv6 | 67.4 | 73.6 | 46.8 | 72.9 | 77.7 | 57.1 | 95.2 | 11.39 | 4.64 |
| 4 | YOLOv8 | 96.6 | 95.3 | 90.2 | 91.8 | 96.8 | 82.7 | 95.2 | 8.1 | 3.01 |
| 5 | YOLOv10 | 94.2 | 94.6 | 88.8 | 92.7 | 94.8 | 79.5 | 94.2 | 8.2 | 2.70 |
| 6 | YOLOv12 | 95.7 | 99.1 | 90.6 | 94.9 | 94.9 | 78.4 | 94.2 | 5.8 | 2.51 |

cn, mx, and xc denote the three per-class columns, one for each pest species in MPest3.
| Num | Attention Mechanism | mAP (%) | Params (M) | Preprocess | FLOPs (G) |
| --- | --- | --- | --- | --- | --- |
| 1 | C2f | 96.23 | 8.04 | 0.5 ms | 17.5 |
| 2 | C2f_DySnakeConv | 96.28 | 8.97 | 0.4 ms | 19.6 |
| 3 | C2f_SPDC | 77.45 | 8.04 | 0.6 ms | 4.6 |
| 4 | C2f_CNeB | 96.86 | 8.45 | 0.4 ms | 18.6 |
| 5 | C2f_GhostConv | 95.97 | 7.85 | 0.6 ms | 17 |
| Num | Attention Mechanism | mAP (%) | Layers | Preprocess | FLOPs (G) |
| --- | --- | --- | --- | --- | --- |
| 1 | Detect | 96.86 | 533 | 0.4 ms | 18.6 |
| 2 | PC_Detect | 96.58 | 561 | 0.5 ms | 16.1 |
| 3 | SC_Detect | 94.81 | 546 | 0.4 ms | 16.2 |
| 4 | SA_Detect | 95.92 | 613 | 0.5 ms | 17.5 |
| 5 | Rep_Detect | 96.41 | 562 | 0.5 ms | 18.9 |
| 6 | ASFF_Detect | 97.29 | 612 | 0.4 ms | 20.8 |
| Num | G | C | A | Precision (cn) | Precision (mx) | Precision (xc) | mAP (%) | Accuracy (%) | F1-Score | Postprocess |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | | | | 0.97 | 0.95 | 0.90 | 95.23 | 85.74 | 0.92 | 1.1 ms |
| 2 | ✔ | | | 0.97 | 0.99 | 0.93 | 96.23 | 90.25 | 0.95 | 0.8 ms |
| 3 | ✔ | ✔ | | 0.97 | 0.98 | 0.90 | 96.86 | 90.36 | 0.94 | 0.8 ms |
| 4 | ✔ | ✔ | ✔ | 0.96 | 0.98 | 0.95 | 97.29 | 90.25 | 0.95 | 0.7 ms |

G, C, and A denote the Gold-YOLO feature-fusion neck, the CNeB module, and the ASFF detection head, respectively.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).