Article

Vision-Guided Cleaning System for Seed-Production Wheat Harvesters Using RGB-D Sensing and Object Detection

1 College of Mechanical and Electrical Engineering, Qingdao Agricultural University, Qingdao 266109, China
2 Qingdao Plantech Mechanical Technology Co., Ltd., Qingdao 266109, China
* Author to whom correspondence should be addressed.
Agriculture 2026, 16(1), 100; https://doi.org/10.3390/agriculture16010100
Submission received: 2 December 2025 / Revised: 21 December 2025 / Accepted: 29 December 2025 / Published: 31 December 2025
(This article belongs to the Section Agricultural Technology)

Abstract

Residues in the grain tank of seed-production wheat harvesters often cause varietal admixture, making it difficult to maintain seed purity above 99%. To address this, an intelligent cleaning system was developed for automatic residue recognition and removal. The system utilizes an RGB-D camera and an embedded AI unit paired with an improved lightweight object detection model. This model, enhanced for feature extraction and compressed via LAMP, was successfully deployed on a Jetson Nano, achieving 92.5% detection accuracy and 13.37 FPS for real-time 3D localization of impurities. A D–H kinematic model was established for the 4-DOF cleaning manipulator. By integrating the PSO and FWA algorithms, the motion trajectory was optimized for time-optimality, reducing movement time from 9 s to 5.96 s. Furthermore, a gas–solid coupled simulation verified the separation capability of the cyclone-type dust extraction unit, which prevents motor damage and centralizes residue collection. Field tests confirmed the system’s comprehensive functionality, achieving an average cleaning rate of 92.6%. The proposed system successfully enables autonomous residue cleanup, effectively minimizing the risk of variety mixing and significantly improving the harvest purity and operational reliability of seed-production wheat. It presents a novel technological path for efficient seed production under the paradigm of smart agriculture.

1. Introduction

As one of the three major staple crops worldwide, wheat is cultivated on more than 33 million hectares of land in China and serves as a vital component of the national food supply system [1]. With the continuous expansion of seed production and propagation, increasingly stringent standards have been imposed on seed purity. During seed production, seed purity is generally maintained at levels exceeding 99%; even a minor admixture can cause varietal degradation, thereby diminishing both agricultural productivity and economic value [2]. At present, conventional combine harvesters are primarily designed for commercial grain production, and dedicated cleaning functions for residual kernels are generally not incorporated into their design, resulting in frequent retention of kernels or impurities in the grain tank and conveying components after harvesting [3]. During subsequent harvesting operations, these residues readily cause admixture among different varieties, thereby becoming a key technical bottleneck in the harvest process for seed production.
Existing studies have been conducted to improve the conveying and cleaning structures inside combine harvesters [4]. However, these improvements are primarily oriented toward conventional grain harvesting scenarios, where the main objectives are to enhance operational efficiency or reduce grain losses, rather than to strictly control cross-contamination among different varieties. Consequently, existing technologies still struggle to meet the stringent purity requirements of seed production [5]. Arai et al. developed a grain conveying mechanism for rice combine harvesters, which reduced the cleaning time by approximately 50% compared to traditional structures [6]. However, their study primarily focused on general grain harvesting, where the tolerance for residue cross-contamination is higher than that in seed production scenarios. In contrast, this study specifically targets wheat seed production, which demands stricter cleaning standards to prevent variety mixing. In recent years, the application of object detection technology in agriculture has been widely studied [7,8,9], and YOLO [10] has shown advantages in terms of speed and simplicity of model architecture, making it particularly well suited to mobile deployment requirements in agricultural engineering. Jing et al. proposed a lightweight hybrid design network [11], Wheat-YoloNet, based on an improved version of YOLOv8 for wheat ear recognition; compared with the YOLOv8n baseline model, the parameter count of the improved model was reduced by 33.3%, thereby facilitating deployment and operation on devices with limited computing resources. Machine vision techniques have been increasingly applied in cereal processing industries. In the milling and malting industry of barley seeds, vision-based methods have been employed for kernel identification and surface structure assessment, demonstrating the feasibility of video-based inspection approaches in grain cleaning and processing applications [12]. 
However, those studies mainly focus on industrial grain processing and quality evaluation, and their application to automated cleaning tasks inside seed-production harvesters has rarely been reported. YOLO-based approaches have demonstrated high accuracy and real-time performance in tasks such as wheat spike recognition and grain detection, indicating their strong engineering applicability in agricultural scenarios. Nevertheless, this body of work has largely targeted field phenotyping or yield estimation, and the application of object detection techniques to the automated removal of internal residues in combine harvesters remains largely unexplored. In this study, object detection is integrated with a four-degree-of-freedom cleaning manipulator to enable the removal of residual material from combine harvesters.
At present, a limited number of studies have begun attempting to combine visual perception, manipulator operation, and motion planning for automated cleaning or manipulation tasks. For instance, Wang et al. proposed a vision-guided cleaning robot system, verifying the feasibility of such methods in the industrial cleaning sector [13]. In the realm of autonomous cleaning robotics, Bergies et al. developed a vision system based on RGB-D cameras and a modified YOLOv3 algorithm to detect and locate various trash types on indoor floors [14]. Their work successfully demonstrated the feasibility of using deep learning to guide robotic cleaning in structured, open indoor environments. Additionally, Barathraj et al. proposed an intelligent beach cleaning robot to address coastal pollution [15]. This system integrates YOLOv5-based detection with dual refuse collection mechanisms, effectively filtering plastic debris from sand. In the energy infrastructure sector, Luo et al. developed a photovoltaic panel cleaning robot utilizing an improved lightweight YOLOv8 model [16]. By integrating optimized path planning algorithms, this system achieved autonomous dust removal on solar panels, significantly enhancing photoelectric conversion efficiency. However, the cleaning task within a seed production harvester differs fundamentally from the aforementioned studies. The interior of the harvester is characterized by narrow compartments and severe mechanical occlusions. Consequently, general cleaning solutions cannot be directly applied to the complex internal compartments of agricultural machinery, necessitating a specialized robotic system designed specifically for this confined environment.
Qing et al. addressed the problem of seed residues on the header of rice combine harvesters by designing a self-cleaning header device, in which several nozzles were arranged as key components [17]. Field experiments conducted under optimal parameter settings showed that the cleaning time per cycle was 10 s and that the self-cleaning rate reached 97.68%. This device largely addresses the problem of residue removal at the header of the harvester and exhibits high cleaning efficiency and a high cleaning rate; however, the approach is applicable only to cleaning the header portion of the harvester, and residues inside the harvester still cannot be removed. As mentioned above, research on cleaning systems for seed-production wheat combine harvesters is limited, and automated cleaning systems specifically designed for these machines are clearly lacking.
To address the above problems, a cleaning system for seed-production wheat combine harvesters that integrates visual detection with manipulator-based cleaning is proposed in this paper. The proposed cleaning system accomplishes the entire process from residue recognition to cleaning. The system integrates an RGB-D vision system, an embedded computing platform, and a four-degree-of-freedom cleaning manipulator to achieve autonomous detection, localization, and removal of residues in the grain tank, thereby providing a new technical route for ensuring high seed purity and enhancing the functionality of harvesting equipment.
The main contributions of this study are as follows:
(1)
An intelligent cleaning system is proposed for seed-production wheat combine harvesters, which realizes automatic detection, localization, and adsorption-based removal of residues in the grain tank after harvesting;
(2)
A YOLOv11-SMASE detection model is proposed, which enhances residue detection capability under complex conditions and achieves model lightweighting, enabling successful deployment on the NVIDIA Jetson Nano platform;
(3)
A time-optimal trajectory planning method integrating Particle Swarm Optimization (PSO) and the Fireworks Algorithm (FWA) is proposed to improve the motion efficiency of a robotic manipulator in vision-guided cleaning tasks.

2. Materials and Methods

This study comprises the structural design of the device, training of the object detection model, time-optimal trajectory planning and optimization, edge-computing deployment, drive unit design, and experimental validation. The overall research workflow is illustrated in Figure 1. First, raw images of residues were captured by a camera and annotated to form a dedicated dataset. The dataset was then preprocessed by adding noise and applying rotation and flipping operations. An improved object detection model was constructed on the basis of this dataset to satisfy the requirements of residue detection. The constructed model was deployed on an edge-computing platform, where a depth camera is employed to perform real-time detection. After the positions of residues are detected, the position information is transmitted to an STM32 controller (STMicroelectronics, Plan-les-Ouates, Switzerland), which drives the operation of each unit and applies the proposed intelligent optimization algorithm to reduce the operating time of the cleaning device and improve efficiency. These components collectively form a complete closed-loop process from data acquisition to residue removal.

2.1. Construction of the Dataset

The dataset employed in this study was collected by the authors within the grain bin of a Lovol RG70 combine harvester (LOVOL, Weifang, China). Images of grain-bin residues—including wheat kernels, partially threshed grains, and straw—were captured using a LUMIX G100 camera (Panasonic, Tokyo, Japan) from multiple angles under varying lighting conditions. The images possess three resolutions (4096 × 3072, 3072 × 3072, and 3888 × 3888 pixels) and are stored in .JPG format. A total of 974 original images were obtained, and representative samples from the dataset are illustrated in Figure 2. The number of labels in the three samples was 11,539, 9729, and 9032, respectively.
Image data augmentation is a widely adopted technique in deep learning, designed to enhance model robustness and generalization by increasing the size and diversity of the training dataset [18]. In this study, data augmentation was applied to expand the grain-bin residue dataset, thereby simulating a more realistic residue environment. As illustrated in Figure 3, multiple augmentation techniques were employed, including vertical flipping, 90° rotation, and Gaussian noise addition, followed by manual verification to ensure the validity of the augmented dataset. Finally, the original dataset was randomly expanded threefold, yielding a total of 2922 images, which were subsequently divided into a training set (2400 images) and a validation set (522 images) at an 8:2 ratio.
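The three augmentation operations named above can be sketched with NumPy; the Gaussian noise standard deviation (σ = 10 on an 8-bit intensity scale) is an illustrative assumption, and the actual pipeline also included manual verification of the augmented images.

```python
import numpy as np

def augment(image, rng):
    """Produce the three augmented variants described in the text:
    vertical flip, 90-degree rotation, and Gaussian-noise addition.
    The noise sigma (10 on an 8-bit scale) is an assumed value."""
    flipped = np.flipud(image)                       # vertical flip
    rotated = np.rot90(image)                        # 90-degree rotation
    noise = rng.normal(0.0, 10.0, size=image.shape)  # additive Gaussian noise
    noisy = np.clip(image.astype(np.float64) + noise, 0, 255).astype(np.uint8)
    return flipped, rotated, noisy

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 48, 3), dtype=np.uint8)  # stand-in image
flipped, rotated, noisy = augment(img, rng)
```

Applying all three variants to each original image yields the threefold expansion reported above (974 originals to 2922 images in total).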

2.2. YOLOv11-SMASE Model

To address the challenges of diverse target sizes, dense distribution, and uneven lighting in the combine harvester grain tank—which often lead to missed detections and false alarms—this paper proposes the YOLOv11-SMASE architecture based on the lightweight YOLOv11n, as shown in Figure 4. Distinct from simple module stacking, the proposed architecture establishes a synergistic ‘Enhance-and-Refine’ mechanism. Specifically, the C3k2-SMAFB-CGLU module, integrated with self-modulated attention and gated convolution, is introduced into Layers 6 and 8 of the backbone to serve as a feature enhancer, strengthening the extraction of fine-grained details for small and occluded targets. Subsequently, the C2PSA-SEFFN module is embedded at Layer 10, functioning as a feature refiner that combines frequency domain modeling with channel attention to suppress environmental noise caused by lighting variations. This cooperative design ensures a closed loop from feature capture to refinement, significantly improving the model’s robustness in unstructured field environments.

2.2.1. C3k2_SMAFB_CGLU Structure

To enhance multi-scale perception and channel selection at the deep feature stage, the C3k2 module was structurally reconstructed. First, while preserving the original topology, we replaced the internal Bottleneck unit with the SMAFormer Block (SMAFB). As shown in Figure 5, the SMA module employs an embedded modulator and Multi-Head Self-Attention (MSA) to strengthen the capture of global context information.
Second, to address the semantic over-smoothing issue inherent in the original E-MLP structure, we substituted it with a Convolutional Gated Linear Unit (CGLU), whose structure is illustrated in Figure 6. Distinct from standard GLUs that generate gating signals solely from self-features, the CGLU embeds a 3 × 3 Depth-wise Convolution within the gating branch. This design forces the model to incorporate fine-grained local information from the neighborhood during channel selection, enabling context-aware feature screening.
Consequently, the C3k2-SMAFB-CGLU module integrates the global modeling of SMAFB with the local fine-grained gating of CGLU, significantly improving the parsing accuracy for small targets and complex textures.

2.2.2. C2PSA-SEFFN Structure

The C2PSA (Cross-Stage Partial Parallel Spatial Attention) module employs a parallel structure with main and auxiliary branches, combining the CSP (Cross Stage Partial) structure and the PSA (Pyramid Squeeze Attention) mechanism to enhance multi-scale feature-extraction capability. In this study, a spectrum-enhanced feed-forward network (SEFFN) [19] is introduced into the main branch, and its structure is illustrated in Figure 7. While maintaining the overall C2PSA framework, a feature-enhancement path is constructed that couples spatial attention with frequency-domain modulation. After spatial-attention aggregation in the main branch, the features are fed into the SEFFN module to achieve multi-stage feature remodeling from the spatial domain to the frequency domain. The SEFFN module first expands the channel dimension using a 1 × 1 convolution and then splits the features into two branches, which employ standard 3 × 3 depthwise convolution and dilated 3 × 3 depthwise convolution, respectively, to extract short-range and long-range contextual information.

2.2.3. Model Pruning

To facilitate deployment on embedded terminals, we employ the Layer-Adaptive Magnitude-based Pruning (LAMP) method to compress the YOLOv11-SMASE model, with the target acceleration ratio set to 2.0 [20]. Distinct from traditional global thresholding strategies, LAMP utilizes a layer-adaptive mechanism that prevents the degradation of critical feature layers caused by uniform pruning. The core principle involves quantifying the relative importance of connection weights via the LAMP score, which is defined as follows (Equation (1)):
\mathrm{score}(u; W) = \frac{W_u^2}{\sum_{v \ge u} W_v^2}
The algorithm automatically allocates sparsity across layers subject to a global pruning rate constraint. Specifically, it prioritizes the removal of redundant channels with lower LAMP scores while strictly preserving the optimal connections within each layer. This approach significantly reduces parameter volume while maintaining the model’s feature representation capability. The pruning process is illustrated in Figure 8.
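Equation (1) can be computed per layer by sorting the weight magnitudes and taking suffix sums of their squares; the following NumPy sketch scores a single flattened weight tensor (treating one tensor as one layer is an assumption about pruning granularity).

```python
import numpy as np

def lamp_scores(weights):
    """LAMP score per Eq. (1): with weights indexed in ascending
    squared magnitude, score(u) = w_u^2 / sum_{v >= u} w_v^2."""
    w2 = np.sort(np.ravel(weights) ** 2)   # ascending squared magnitudes
    tail = np.cumsum(w2[::-1])[::-1]       # suffix sums: sum_{v >= u} w_v^2
    return w2 / tail

scores = lamp_scores(np.array([0.1, -0.5, 0.3, 2.0]))
# the largest-magnitude weight in a layer always scores 1.0,
# so at least one connection per layer survives pruning
```

Because scores are normalized within each layer, ranking them globally yields the layer-adaptive sparsity allocation described above.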

2.2.4. Model Deployment

To ensure the practical applicability of the improved model in real-world production, it was deployed on the NVIDIA Jetson Nano platform (NVIDIA, Santa Clara, CA, USA). Validation was performed by capturing real-time images of the grain-bin interior after harvesting, using an Intel RealSense D435i depth camera (Intel, Santa Clara, CA, USA) for image acquisition. The deployment architecture is illustrated in Figure 9.

2.3. Construction of the Residue Cleaning System

To realize automated cleaning of residues within the harvester grain bin, a dedicated cleaning device was developed in this study. The device minimizes interference with the harvesting process and performs cleaning operations once harvesting is completed. The base of the device comprises an enclosed electric slide rail. In addition, to enable the suction nozzle to approach the residues as closely as possible, a four-degree-of-freedom (4-DOF) cleaning manipulator was designed, with the nozzle mounted on its end effector. The nozzle is connected to the vacuum system through an air hose embedded within the manipulator structure, thereby reducing the likelihood of interference with the grain bin. To detect residue positions, an RGB-D camera (Intel RealSense D435i) is integrated at the end of the cleaning manipulator. The detection program was developed in Python3.8 and deployed on an NVIDIA Jetson Nano platform. The Jetson Nano communicates with the STM32 main controller, transmitting the detected residue-position data. The main controller subsequently drives the manipulator, electric slide rail, and vacuum motor to complete the cleaning process. The system architecture of the wheat-harvester grain-bin cleaning device is illustrated in Figure 10.
The program flow of the cleaning system is illustrated in Figure 11. The manipulator controller and the Jetson Nano program operate in synchronization. The manipulator controller is primarily responsible for receiving coordinate data, controlling the servo motors of each joint, and operating the electric slide rail. In contrast, the Jetson Nano primarily handles target detection, coordinate transformation, data transmission, and human–machine interaction. To ensure the accuracy of coordinate transformation, a standard checkerboard was used for hand–eye calibration in advance to determine the rigid-body transformation matrix between the camera and the robot end-effector.

2.4. Kinematic Modeling and Trajectory Planning of the Cleaning Device

2.4.1. Kinematic Modeling

In complex operating environments, the multi-joint structure of the cleaning device necessitates a unified modeling approach for accurate kinematic representation. In this study, the standard Denavit–Hartenberg (D–H) parameter method [21] is employed to describe each joint of the cleaning device and to establish its kinematic model. This method accurately characterizes the geometric constraints among the motion pairs, and the established D–H coordinate system is illustrated in Figure 12.
Based on the kinematic model of the cleaning device and the D–H coordinate system, the parameters are obtained as listed in Table 1. The spatial relationship between each pair of adjacent joints is characterized by four parameters (a, α, d, θ), and the corresponding homogeneous transformation matrix is defined as shown in Equation (2). By sequentially computing the transformation matrices of all joints and multiplying them in order, the overall transformation relationship of the end-effector with respect to the base coordinate system is derived, as presented in Equation (3).
{}^{i-1}T_i = \begin{bmatrix} \cos\theta_i & -\sin\theta_i\cos\alpha_i & \sin\theta_i\sin\alpha_i & a_i\cos\theta_i \\ \sin\theta_i & \cos\theta_i\cos\alpha_i & -\cos\theta_i\sin\alpha_i & a_i\sin\theta_i \\ 0 & \sin\alpha_i & \cos\alpha_i & d_i \\ 0 & 0 & 0 & 1 \end{bmatrix}
{}^{0}T_4 = {}^{0}T_1\,{}^{1}T_2\,{}^{2}T_3\,{}^{3}T_4 = \begin{bmatrix} n_x & o_x & a_x & p_x \\ n_y & o_y & a_y & p_y \\ n_z & o_z & a_z & p_z \\ 0 & 0 & 0 & 1 \end{bmatrix}
In the equation, (n_x, n_y, n_z), (o_x, o_y, o_z), and (a_x, a_y, a_z) denote the orientation of the device’s end-effector, whereas (p_x, p_y, p_z) represents its spatial position.
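Equations (2) and (3) translate directly into code: build one homogeneous transform per joint and chain them. The D–H rows below are placeholder values for illustration, not the Table 1 parameters.

```python
import numpy as np

def dh_transform(a, alpha, d, theta):
    """Homogeneous transform between adjacent links, Eq. (2)."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

def forward_kinematics(dh_rows):
    """Chain the per-joint transforms, Eq. (3).
    dh_rows are (a, alpha, d, theta) tuples."""
    T = np.eye(4)
    for a, alpha, d, theta in dh_rows:
        T = T @ dh_transform(a, alpha, d, theta)
    return T

# Placeholder D-H table (illustrative values only)
T = forward_kinematics([(0.00, np.pi / 2, 0.10,  0.0),
                        (0.25, 0.0,       0.00,  0.3),
                        (0.20, 0.0,       0.00, -0.2),
                        (0.08, 0.0,       0.00,  0.1)])
position = T[:3, 3]   # (p_x, p_y, p_z) of the end-effector
```

The upper-left 3 × 3 block of T carries the (n, o, a) orientation columns, and the last column carries the position, matching Equation (3).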
In this study, an analytical method was employed to solve the inverse kinematics of the cleaning device. This analytical approach offers rapid computation and high accuracy, thereby satisfying the system’s real-time performance requirements.
To facilitate analytical computation, the end-effector pose in three-dimensional space is projected onto the manipulator’s workspace plane. Let the desired end position be defined as:
P = \begin{bmatrix} x & y & z \end{bmatrix}^{T}
According to the geometric relationship,  θ 1  can be determined from the horizontal projection of the end-effector, as given by:
\theta_1 = \operatorname{atan2}(y, x)
Thus, the equivalent coordinates of the end-effector on this plane are defined as follows:
s = \sqrt{x^2 + y^2}, \qquad h = z - a_1
Here, s represents the horizontal distance from the end-effector to the central axis of the base, whereas h denotes the vertical height of the end-effector relative to the reference plane of the first link.
Let the end-effector orientation angle be φ; then, the wrist-point position of the preceding link, sw and hw, can be expressed as follows:
s_w = s - a_4\cos\varphi, \qquad h_w = h - a_4\sin\varphi
Furthermore:
r = \sqrt{s_w^2 + h_w^2}, \qquad \beta = \operatorname{atan2}(h_w,\ s_w)
Here, r represents the projected distance from the wrist point to the base plane, whereas  β  denotes the angle of the wrist point relative to the horizontal axis.
According to the law of cosines, the following relationship can be derived:
\cos\theta_3 = \frac{r^2 - a_2^2 - a_3^2}{2 a_2 a_3}
\sin\theta_3 = \pm\sqrt{1 - \cos^2\theta_3}
\theta_3 = \operatorname{atan2}\left(\pm\sqrt{1 - \cos^2\theta_3},\ \cos\theta_3\right)
According to planar geometric relationships, the following expressions can be derived:
\gamma = \operatorname{atan2}\left(a_3\sin\theta_3,\ a_2 + a_3\cos\theta_3\right)
\theta_2 = \beta - \gamma
Furthermore,  θ 4  is expressed as follows:
\theta_4 = \varphi - (\theta_2 + \theta_3)
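The chain of Equations (4)–(14) can be collected into one analytical routine. The link lengths below are placeholders (not the Table 1 values), and the sign reconstruction follows the law-of-cosines form; the `elbow` argument selects between the two ± branches of θ₃.

```python
import numpy as np

# Placeholder link lengths a1..a4 (illustrative, not Table 1 values)
A1, A2, A3, A4 = 0.10, 0.25, 0.20, 0.08

def inverse_kinematics(x, y, z, phi, elbow=+1):
    """Analytical IK for the 4-DOF cleaning manipulator,
    following the planar-projection derivation of Eqs. (4)-(14)."""
    theta1 = np.arctan2(y, x)                      # base rotation
    s = np.hypot(x, y)                             # distance to base axis
    h = z - A1                                     # height above link-1 plane
    sw = s - A4 * np.cos(phi)                      # wrist point
    hw = h - A4 * np.sin(phi)
    r = np.hypot(sw, hw)
    beta = np.arctan2(hw, sw)
    c3 = (r**2 - A2**2 - A3**2) / (2 * A2 * A3)    # law of cosines
    s3 = elbow * np.sqrt(max(0.0, 1.0 - c3**2))    # elbow-up / elbow-down
    theta3 = np.arctan2(s3, c3)
    gamma = np.arctan2(A3 * s3, A2 + A3 * c3)
    theta2 = beta - gamma
    theta4 = phi - (theta2 + theta3)               # orientation closure
    return theta1, theta2, theta3, theta4
```

A quick sanity check is to recompute the planar end-effector position from the solved angles and confirm it matches the requested target.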

2.4.2. Construction of the Piecewise Polynomial Interpolation Function

In this study, the 3–5–3 polynomial method was employed to construct a trajectory framework that satisfies boundary conditions, thereby ensuring motion smoothness and continuity. Compared with uniform higher-order strategies, this approach uses cubic polynomials for the short-stroke lifting and lowering phases, effectively preventing the positional overshoot and oscillation often associated with high-order interpolation in short-duration movements, while the quintic polynomial in the intermediate long-distance translation phase ensures continuous acceleration. Along the motion path of the cleaning device, one starting point, one endpoint, and two transition points were defined, dividing the overall trajectory into three segments, which were interpolated using cubic, quintic, and cubic polynomials, respectively. The 3–5–3 piecewise polynomial interpolation expression is given in Equation (15):
\theta_{i1}(t_1) = a_{i13}t_1^3 + a_{i12}t_1^2 + a_{i11}t_1 + a_{i10}
\theta_{i2}(t_2) = a_{i25}t_2^5 + a_{i24}t_2^4 + a_{i23}t_2^3 + a_{i22}t_2^2 + a_{i21}t_2 + a_{i20}
\theta_{i3}(t_3) = a_{i33}t_3^3 + a_{i32}t_3^2 + a_{i31}t_3 + a_{i30}
In the equation, θ_{i1}(t_1), θ_{i2}(t_2), and θ_{i3}(t_3) represent the cubic polynomial trajectory of the first segment, the quintic polynomial trajectory of the second segment, and the cubic polynomial trajectory of the third segment for each joint, respectively; t_1, t_2, and t_3 denote the interpolation times of the three trajectory segments, whereas a_{imn} represents the n-th coefficient of the m-th interpolation segment for the i-th joint.
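For one cubic segment of Equation (15), the four coefficients follow directly from position and velocity boundary conditions; the quintic middle segment is analogous, with six conditions that additionally enforce acceleration continuity. A minimal sketch:

```python
def cubic_segment(theta0, thetaf, T, v0=0.0, vf=0.0):
    """Coefficients (a3, a2, a1, a0) of one cubic segment,
    from the boundary positions theta0/thetaf and velocities v0/vf
    over segment duration T."""
    a0 = theta0
    a1 = v0
    a2 = (3.0 * (thetaf - theta0) - (2.0 * v0 + vf) * T) / T**2
    a3 = (-2.0 * (thetaf - theta0) + (v0 + vf) * T) / T**3
    return a3, a2, a1, a0

def eval_poly(coeffs, t):
    """Evaluate a polynomial given coefficients in descending order."""
    return sum(c * t**k for k, c in enumerate(reversed(coeffs)))
```

With zero boundary velocities, the segment starts and ends at rest, which is the desired behavior for the short lifting and lowering strokes.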
The segmented interpolation times are treated as optimization variables to perform time-allocation optimization for the trajectories of each joint in the cleaning device. The objective function, together with the joint velocity and acceleration constraints, is formulated as follows:
f(t) = \min \sum_{j=1}^{3} t_{ij}, \quad \text{s.t.} \quad \max\left|a_{ij}\right| \le a_{\max}, \quad \max\left|v_{ij}\right| \le v_{\max}
In the equation, f(t) denotes the fitness function; j represents the index of the path segment; t_{ij} is the interpolation time (s) of the j-th segment for the i-th joint; v_max denotes the maximum angular velocity of each joint (rad/s); and a_max denotes the maximum angular acceleration of each joint (rad/s²).
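One common way to realize this constrained objective in a swarm optimizer is a penalty form: total time is the raw fitness, and any time allocation whose per-segment peaks exceed the joint limits is pushed out of contention. The per-segment peak values are assumed here to be precomputed from the 3-5-3 coefficients; this is a sketch, not the paper's exact formulation.

```python
def fitness(times, vmax, amax, seg_vel_peaks, seg_acc_peaks):
    """Penalty version of Eq. (16): minimise total interpolation time,
    rejecting allocations that violate velocity/acceleration limits.
    seg_vel_peaks / seg_acc_peaks are assumed precomputed per segment."""
    total = sum(times)
    if max(seg_vel_peaks) > vmax or max(seg_acc_peaks) > amax:
        return total + 1e6   # large penalty keeps infeasible solutions out
    return total

feasible = fitness([2.0, 3.0, 1.5], vmax=1.5, amax=2.0,
                   seg_vel_peaks=[0.8, 1.2, 0.9],
                   seg_acc_peaks=[1.1, 1.9, 0.7])
infeasible = fitness([1.0, 1.2, 0.8], vmax=1.5, amax=2.0,
                     seg_vel_peaks=[2.4, 1.2, 0.9],
                     seg_acc_peaks=[1.1, 1.9, 0.7])
```

Shrinking any segment time raises its velocity and acceleration peaks, so the optimizer naturally settles at the shortest feasible schedule.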

2.4.3. Improved FWA–PSO Hybrid Algorithm

To optimize the time parameters for 3-5-3 polynomial interpolation [22], this study proposes an Improved Hybrid FWA-PSO Algorithm (IFPHA). By organically integrating the explosive global search mechanism of the Fireworks Algorithm (FWA) with the fast convergence capability of Particle Swarm Optimization (PSO) [23,24], this approach effectively overcomes the limitation of premature convergence inherent in standard PSO. During execution, the algorithm first employs the FWA explosion operator to maintain population diversity and perform wide-area exploration. Subsequently, the generated sparks are treated as particles and refined using the PSO mechanism for precise optimization.
To overcome insufficient exploration in the early stages and the tendency to fall into local optima in the standard PSO algorithm, this study introduces dynamic learning factors that adaptively adjust  c 1  and  c 2  during the iterative process. During the early phase, the adjustment expands the search range to enhance global optimization capability, whereas in the later phase, it strengthens convergence, guiding particles to rapidly approach the global optimum. The corresponding expressions are given as follows:
c_1 = 2\cos^2\!\left(\frac{\pi\, i}{2\, i_{\max}}\right), \qquad c_2 = 2 - c_1
In the equation,  i  denotes the current iteration number, whereas  i max  represents the maximum number of iterations.
To achieve a wider search range during the early stage and to improve convergence accuracy and speed during the later stage, the inertia weight is dynamically adjusted. Accordingly, a nonlinear inertia weight factor is introduced in this study and is expressed as follows:
\omega(k) = \omega_{\mathrm{start}} - \left(\omega_{\mathrm{start}} - \omega_{\mathrm{end}}\right)\left[\, 2\frac{k}{T_{\max}} - \left(\frac{k}{T_{\max}}\right)^{2}\right]
In the equation, ω(k) represents the inertia weight at the k-th iteration; ω_start denotes the initial inertia weight (ω_start = 0.9); ω_end is the final inertia weight (ω_end = 0.4); and T_max represents the maximum number of iterations. The variation in the improved inertia weight is illustrated in Figure 13.
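The two adaptive schedules can be sketched as follows; the c₂ = 2 − c₁ relation reconstructs a minus sign apparently lost in extraction, and the ω_start = 0.9, ω_end = 0.4 values follow the text.

```python
import numpy as np

def learning_factors(i, i_max):
    """Dynamic learning factors: c1 = 2*cos^2(pi*i / (2*i_max)),
    with c2 = 2 - c1 (assumed complementary form)."""
    c1 = 2.0 * np.cos(np.pi * i / (2.0 * i_max)) ** 2
    return c1, 2.0 - c1

def inertia_weight(k, t_max, w_start=0.9, w_end=0.4):
    """Nonlinear inertia weight: quadratic decay from w_start to w_end."""
    r = k / t_max
    return w_start - (w_start - w_end) * (2.0 * r - r**2)
```

Early in the run c₁ dominates (wide cognitive search) and ω is large; by the final iteration c₂ dominates (social convergence) and ω has decayed to its floor, matching the two-phase behavior described above.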
Furthermore, to enhance the ability to escape from local optima, the spark generation strategy in the FWA component is improved, and an Elite Cauchy Mutation operator is introduced to replace traditional Gaussian mutation. The improved spark generation formula is shown in Equation (19), which dynamically allocates search resources based on fitness. By leveraging the significant heavy-tail characteristic of the Cauchy distribution, the algorithm generates larger-scale perturbations, thereby significantly improving global optimization efficiency in complex environments.
S_i = M \cdot \frac{Y_{\max} - f(x_i) + \varepsilon}{\sum_{i=1}^{N}\left(Y_{\max} - f(x_i) + \varepsilon\right)}
In the equation, M is a constant used to regulate the number of sparks; Y_max represents the maximum fitness value of the current population; f(x_i) denotes the fitness value of the i-th firework; and ε is introduced to prevent division-by-zero errors.
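Equation (19) assigns more sparks to fitter fireworks (for a minimization problem, those with lower fitness values). A sketch with an illustrative M = 50:

```python
import numpy as np

def spark_counts(fitness_values, M=50, eps=1e-12):
    """Spark allocation per Eq. (19):
    S_i = M * (Y_max - f_i + eps) / sum_j (Y_max - f_j + eps),
    where Y_max is the worst (largest) fitness in the population."""
    f = np.asarray(fitness_values, dtype=float)
    y_max = f.max()
    weights = y_max - f + eps          # better fireworks get larger weights
    return M * weights / weights.sum()

s = spark_counts([3.0, 5.0, 9.0], M=50)
# the lowest-fitness (best) firework receives the most sparks
```

Concentrating sparks near good solutions while the Elite Cauchy Mutation injects heavy-tailed perturbations is what lets the hybrid balance exploitation against escaping local optima.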

3. Results

3.1. Experimental Environment

To ensure experimental fairness, all models were evaluated under identical experimental conditions. The configuration of the experimental environment is presented in Table 2.
To ensure stable convergence across all models, and based on extensive empirical evaluation, the optimized training parameters employed in this study are presented in Table 3.

3.2. Evaluation Indicators

In this study, model performance was evaluated using the commonly adopted metrics in object detection, including Precision (P), Recall (R), and Mean Average Precision (mAP). Model lightweight characteristics were assessed based on the number of parameters, floating-point operations (FLOPs), and model size. Detailed descriptions of these evaluation metrics are presented in Table 4.

3.3. Ablation Study

To enhance the performance of residue recognition, multiple improvements were introduced, and the performance of the YOLOv11 framework was systematically analyzed. The C3k2-SMAFB-CGLU and C2PSA-SEFFN modules were incorporated to improve the recognition accuracy of grain bin residues. Specifically, the original YOLOv11 model served as the baseline for ablation experiments, in which improvement strategies were sequentially added to evaluate the contribution of each module and its impact on the baseline model. As shown in Table 5, introducing either the C3k2-SMAFB-CGLU or C2PSA-SEFFN module individually enhanced residue recognition performance. Replacing the original C3k2 module with the C3k2-SMAFB-CGLU module increased recall (R) and mean average precision (mAP) by 0.4% and 0.8%, respectively. This improvement is primarily attributed to the integration of multi-dimensional attention and fine-grained channel-gating structures during feature extraction. The SMAFB module enhanced small-target perception through multi-dimensional attention, whereas the CGLU’s local gating mechanism improved channel feature consistency, resulting in better detection of fine residues such as wheat kernels. The improvement achieved by the C2PSA-SEFFN module mainly stems from its spatial attention mechanism, which strengthens multi-scale feature fusion, and its frequency-domain enhancement mechanism, which improves feature distribution and suppresses noise, thereby yielding more stable and accurate detection results. When both the C3k2-SMAFB-CGLU and C2PSA-SEFFN modules were integrated simultaneously, the model achieved the highest accuracy and precision, with mAP and R values reaching 93.1% and 87%, respectively, while the number of parameters and FLOPs remained nearly identical to those of the original model. This demonstrates a complementary interaction between the two modules.
C3k2-SMAFB-CGLU enhances spatial feature extraction and multi-scale context aggregation, whereas C2PSA-SEFFN performs channel-wise feature recalibration to suppress background noise. Since both modules utilize lightweight attention mechanisms, they effectively improve the model’s representational capability without significantly increasing the parameter count or FLOPs.
To further evaluate the robustness of the proposed model, we conducted a quantitative analysis on a subset of the validation data representing complex lighting conditions (e.g., strong shadows and dim light). The experimental results indicate that the baseline YOLOv11n model suffered a significant performance drop, with the mAP decreasing to 83.1%, whereas the proposed YOLOv11-SMASE maintained a higher accuracy of 88.4%. This 5.3% superiority in complex scenarios demonstrates the effectiveness of the SEFFN module in suppressing environmental noise.
To more clearly illustrate the results, HiResCAM (High-Resolution Class Activation Mapping) was employed to visualize the models’ prediction regions before and after the improvements. In the heatmap, warmer colors indicate higher attention to that region. As shown in Figure 14, when the YOLOv11n model is used for detection, the model’s primary attention is overly concentrated on small particles such as wheat kernels, thereby reducing its focus on straw and wheat ears. This not only causes small grains on wheat ears to be misclassified as individual wheat kernels but also increases the likelihood that irrelevant impurities are erroneously detected as residues. By contrast, the improved YOLOv11n-SMASE model exhibits a much broader and more uniform attention distribution: attention paid to wheat kernels is appropriately reduced, whereas attention to straw and wheat ears is markedly enhanced, thereby reducing missed and false detections. The comparison of heatmaps before and after optimization clearly demonstrates the effectiveness of the model improvements.

3.4. Model Deployment Experiment

To enable deployment of the model on edge-computing devices, comparative experiments were conducted before and after applying LAMP to verify its effectiveness. The comparison revealed significant reductions in both weight file size and parameter count following pruning. After applying LAMP, the model’s computational demand decreased by 3.2 GFLOPs, its file size was reduced by 3.38 MB (a 64.5% reduction), and the total number of parameters decreased to 802,276. These results provide compelling evidence of the effectiveness of the LAMP method in model optimization.
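To make the pruning step concrete, the layer-adaptive magnitude-based pruning (LAMP) score of Lee et al. [20] can be sketched as follows. This is an illustrative NumPy implementation, not the deployment code used in this study: each weight's squared magnitude is normalized by the tail sum of squared magnitudes of all weights in the same layer that are at least as large, and the globally lowest-scoring fraction of weights is then zeroed.

```python
import numpy as np

def lamp_scores(weights):
    """LAMP score per weight: squared magnitude divided by the tail sum of
    squared magnitudes of all weights in the layer at least as large."""
    w2 = np.sort(weights.ravel() ** 2)        # ascending squared magnitudes
    tail = np.cumsum(w2[::-1])[::-1]          # suffix sums: sum of w2[i:]
    scores = w2 / tail                        # score of the i-th smallest weight
    # map scores back to the original (unsorted) positions
    order = np.argsort(weights.ravel() ** 2)
    out = np.empty_like(scores)
    out[order] = scores
    return out.reshape(weights.shape)

def lamp_prune(layers, sparsity=0.5):
    """Zero out the globally lowest-scoring fraction of weights."""
    scores = np.concatenate([lamp_scores(w).ravel() for w in layers])
    k = int(sparsity * scores.size)
    thresh = np.partition(scores, k)[k]
    return [np.where(lamp_scores(w) < thresh, 0.0, w) for w in layers]
```

Because the scores are normalized per layer, the global threshold adapts the pruning ratio to each layer's weight distribution, which is what allows aggressive overall sparsity without collapsing any single layer.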
To validate the performance of the proposed YOLOv11-SMASE model in real-world scenarios and to ensure that it meets the real-time requirements of grain-bin residue cleaning, the model was deployed on an NVIDIA Jetson Nano edge-computing device and evaluated there. The detailed hardware specifications are presented in Table 6.
The deployment results of each model on mobile devices are presented in Table 7. In terms of detection accuracy, the YOLOv11n-SMASE model without LAMP achieved the highest performance, maintaining an accuracy of 92.5%. The unmodified YOLOv11n model exhibited the lowest accuracy, at 90.7%. The improved and pruned YOLOv11n-SMASE-LAMP model achieved an accuracy of 91.7%, showing only a slight reduction; however, its frame rate reached 13.37 FPS, representing a 45.1% improvement compared with the unpruned model. Additionally, FLOPs were reduced by 3.2 GFLOPs, and the model size decreased by 63.4%. These results demonstrate that the optimized model achieves faster inference on devices with limited computational resources, thereby validating the effectiveness of the pruning strategy and the feasibility of real-time deployment in practical applications.

3.5. Simulation of Time-Optimal Trajectory Planning

Based on the fusion strategy of the Fireworks Algorithm (FWA) and the Particle Swarm Optimization (PSO) algorithm, and combined with Cauchy mutation, dynamic learning factors, and adaptive inertia weights, the time-segmentation allocation problem was optimized and solved. To verify the effectiveness of the proposed hybrid algorithm, comparative analyses were conducted against the basic PSO and standalone FWA models. The main parameter settings were as follows: the maximum number of iterations was set to 150, and the population size was set to 50. For the basic PSO, the learning factors were  c 1 = c 2 = 0.45 , and the inertia weight was  ω = 0.5 . For the improved PSO, the learning factors were dynamically adjusted within the range [0.5, 2.5], with  c min = 0.5  and  c max = 2.5 ; the inertia weights were defined as  ω max = 0.9  and  ω min = 0.4 . For the FWA, the number of fireworks was set to  S min = 15  and  S max = 50 , with explosion amplitudes of  A min = 0.04  and  A max = 0.8 .
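As an illustrative sketch of the adaptive parameters described above, the inertia weight can be decreased from ω_max = 0.9 to ω_min = 0.4 while the learning factors vary within [0.5, 2.5]. Linear schedules are assumed here for simplicity; the exact update rules used in this study may differ.

```python
def inertia_weight(t, T, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia weight: explore early, exploit late."""
    return w_max - (w_max - w_min) * t / T

def learning_factors(t, T, c_min=0.5, c_max=2.5):
    """Dynamic learning factors: the cognitive factor c1 shrinks while the
    social factor c2 grows as iterations t progress toward T."""
    c1 = c_max - (c_max - c_min) * t / T
    c2 = c_min + (c_max - c_min) * t / T
    return c1, c2
```

Early iterations thus favor individual exploration (large ω and c1), while late iterations favor convergence toward the swarm best (large c2), matching the exploration-exploitation balance the hybrid algorithm aims for.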
The iterative results are illustrated in Figure 15. It is evident that the improved Fireworks–Particle Swarm Hybrid Algorithm (IFPHA) converges significantly faster than the two conventional algorithms, and its final solution quality is substantially better. Owing to its stronger local optimization capability, the PSO algorithm exhibits faster convergence than the FWA, whereas the FWA demonstrates superior global exploration ability, resulting in slower convergence but improved final optimization performance. The hybrid algorithm effectively exploits the complementary strengths of PSO and FWA, achieving substantial improvements in global search capability, local convergence, convergence speed, accuracy, and robustness, thereby validating the effectiveness of the proposed method.
Furthermore, to validate the engineering feasibility on embedded hardware, a hardware test was conducted using the STM32H750 main controller (STMicroelectronics, Plan-les-Ouates, Switzerland) operating at 480 MHz. By utilizing internally simulated coordinates of 20 randomly distributed targets as input, 20 independent trials were performed. Statistical results indicate that the average planning time of the improved algorithm was 0.682 s, with a standard deviation of 0.035 s and a variation range from 0.645 s to 0.712 s. Compared to the operation cycle of 56.0 s, this computational latency accounts for only 1.2%, demonstrating the feasibility of the algorithm on resource-constrained embedded devices.
Based on the improved 3–5–3 polynomial piecewise interpolation trajectory optimization strategy, trajectory planning simulations were performed using the MATLAB (R2020b) Robotics Toolbox to generate the end-effector motion trajectory through predefined waypoints, along with the optimized joint angular displacement, velocity, and acceleration curves. As shown in Figure 16, after optimization using the improved Fireworks–Particle Swarm Hybrid Algorithm (IFPHA), the manipulator's motion time was reduced from 9.00 s to 5.96 s, an approximately 34% reduction. The peak values of joint velocity and acceleration increased, while their variation curves remained smooth and continuous without abrupt transitions. The joints experienced minimal impact, leading to improved response speed and enhanced operational efficiency of the system. These results ensure the high-performance operation of the cleaning mechanism and further validate the effectiveness of the proposed algorithm.
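The role of the optimizer in this step can be sketched as follows: the decision variables are the three segment durations, and the fitness is the total motion time plus penalties for violating joint velocity and acceleration limits. For brevity, this hypothetical single-joint sketch models each segment as a rest-to-rest cubic, whose peak speed and acceleration have the closed forms 1.5|Δq|/T and 6|Δq|/T²; the paper's 3–5–3 scheme uses a quintic middle segment, and the limits below are illustrative.

```python
def segment_peaks(dq, T):
    """Peak |velocity| and |acceleration| of a rest-to-rest cubic segment
    covering displacement dq in time T."""
    return 1.5 * abs(dq) / T, 6.0 * abs(dq) / T ** 2

def fitness(times, dqs, v_max, a_max, penalty=1e3):
    """Total motion time plus penalties for exceeding joint limits.
    An optimizer such as IFPHA minimizes this over the segment times."""
    cost = sum(times)
    for T, dq in zip(times, dqs):
        v_pk, a_pk = segment_peaks(dq, T)
        if v_pk > v_max:
            cost += penalty * (v_pk - v_max)  # velocity-limit violation
        if a_pk > a_max:
            cost += penalty * (a_pk - a_max)  # acceleration-limit violation
    return cost
```

Shrinking the segment times reduces the first term but raises the peak velocity and acceleration, so the minimizer settles on the shortest schedule that still respects the joint limits, which is exactly the time-optimality trade-off described above.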

3.6. Simulation Experiment of the Vacuum Unit

The cyclone separator comprises a dust outlet, inlet and outlet pipes, and a conical cylinder [25]. It is installed outside the grain bin and connected to the suction pipe and the collector, as illustrated in Figure 17.
To evaluate the cyclone separator's ability to effectively separate air and impurities and to prevent debris from entering and damaging the vacuum motor, a numerical simulation of the internal gas–solid two-phase flow was performed. According to the characteristics of the actual motor, the inlet boundary condition was defined as a velocity inlet, with both the continuous and discrete phases assigned an inlet velocity of 28 m/s. Based on wheat varieties commonly cultivated in the Qingdao region, the wheat grain, straw, and wheat-ear models were sized at 7 × 4 × 3 mm, 46 × 5 × 3 mm, and 37 × 15 × 13 mm, respectively. The generation rate was set to 1000 particles/s, with a quantity ratio of 7:2:1 for the respective models. The outlet boundary condition was defined as a pressure outlet under standard atmospheric pressure, while the discrete phase boundary was specified as an escape condition. The wall boundary conditions were set as no-slip walls; the outlet wall for the discrete phase was treated as a trap, and all remaining wall boundaries were defined as rebound conditions.
A numerical simulation was conducted to evaluate the gas–solid separation performance of the cyclone separator for residual wheat kernels, straw, and partially unthreshed grains. The particle trajectories within the cyclone separator are illustrated in Figure 18. Driven by the rotational airflow entering through the inlet, particles move downward in a counterclockwise spiral. The tangential velocity peaks near the wall of the exhaust cylinder and gradually decreases as the gas flows downward. Particles are captured when they reach the bottom of the cone. A small fraction of untrapped particles spiral upward in a counterclockwise direction and escape through the exhaust outlet under the influence of secondary rotational airflow.
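As a rough plausibility check on this behavior, the classical Lapple cut-size formula estimates the particle diameter collected with 50% efficiency; particles much larger than d50 are almost certainly trapped. The inlet width and number of effective turns used below are hypothetical (the paper does not report them), while the inlet velocity of 28 m/s and grain density of 1300 kg/m³ follow the simulation setup.

```python
import math

def lapple_d50(mu, W, Ne, vi, rho_p, rho_g=1.2):
    """Lapple cut diameter d50 (m): particles larger than d50 are collected
    with more than 50% efficiency.  mu: gas viscosity (Pa*s), W: inlet
    width (m), Ne: effective number of turns, vi: inlet velocity (m/s),
    rho_p / rho_g: particle / gas density (kg/m^3)."""
    return math.sqrt(9 * mu * W / (2 * math.pi * Ne * vi * (rho_p - rho_g)))

# Hypothetical geometry: 0.05 m inlet width, 5 effective turns.
d50 = lapple_d50(mu=1.8e-5, W=0.05, Ne=5, vi=28, rho_p=1300)
```

Under these assumptions d50 is on the order of a few micrometres, far below the millimetre scale of wheat kernels and straw, which is consistent with the simulated trapping of residues at the bottom of the cone.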
In addition, to verify that the separation performance and particle trajectories of the cyclone separator conformed to the design expectations, an EDEM–Fluent coupled simulation was conducted. The model parameters used in EDEM are summarized in Table 8 and Table 9.
A CFD–DEM two-way coupling approach was employed to evaluate the solid–gas separation effectiveness within the separator. As illustrated in Figure 19, where purple indicates wheat kernels and green indicates straw, the particle trajectories demonstrate that, under the combined effects of centrifugal force and gravity, the residues are effectively discharged into the collector while the airflow exits through the exhaust port. The simulation results confirm the rationality of the separation mechanism and validate the flow field design of the cyclone separator.

3.7. Grain Bin Cleaning Experiment

A field cleaning experiment was carried out using the grain bin of a Lovol RG70 combine harvester, which measured 1507 × 928 × 1634 mm. During the experiment, the cleaning device was installed at the center of the grain bin’s side wall, enabling coverage of the entire area beneath the auger, where residues are most densely accumulated.
To evaluate the cleaning performance of the system, ten cleaning trials were conducted inside the grain tank. The experimental procedure was standardized as follows: First, a fixed mass of 1.0 kg of residues was prepared for each trial, with a mass ratio of kernels, straw, and broken ears controlled at 8:1:1. These residues were poured into the grain bin and randomly distributed to simulate natural accumulation. Subsequently, the cleaning device was activated. To ensure effective suction, the system employed a 1200 W vacuum motor connected via a 3 m long flexible corrugated hose with a 32 mm inner diameter. Upon completion, the residues collected in the bin were weighed. The cleaning rate was defined as the ratio of the mass of removed residues to the initial mass. The experiments yielded an average cleaning rate of 92.6% with a standard deviation of 1.85%, ranging from 89.1% to 95.8%. The average cleaning time was 56 s with a standard deviation of 1.72 s, varying between 53 s and 59 s. The field experiment setup is illustrated in Figure 20.
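The reported statistics follow directly from the per-trial definition above. The removed-mass values in this sketch are hypothetical placeholders (the paper reports only the aggregate mean of 92.6% and standard deviation of 1.85%), included solely to show the computation:

```python
from statistics import mean, stdev

def cleaning_rate(m_removed, m_initial=1.0):
    """Cleaning rate = mass of removed residues / initial residue mass."""
    return m_removed / m_initial

# Hypothetical removed masses (kg) for ten 1.0 kg residue charges.
removed = [0.93, 0.91, 0.95, 0.92, 0.94, 0.90, 0.93, 0.96, 0.91, 0.92]
rates = [cleaning_rate(m) for m in removed]
avg, sd = mean(rates), stdev(rates)
```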

4. Discussion

4.1. System Performance Analysis

Harvesting, as a critical stage in large-scale seed production, directly influences final seed quality. To address the issue of varietal admixture caused by harvest residues remaining in the grain tank, this study proposes a vision-guided robotic cleaning system for seed-production wheat harvesters, enabling automated detection and removal of post-harvest residues. The system integrates a lightweight deep learning detection model, a time-optimal motion planning algorithm, and a dedicated cleaning mechanism.
In terms of visual perception, the improved YOLOv11-SMASE model exhibits strong feature extraction capability under the complex and cluttered background conditions inside the grain tank. By incorporating the CGLU and SEFFN modules, the model effectively enhances the representation and recognition of fine-grained features associated with residual materials. Experimental results demonstrate that the proposed model not only surpasses the baseline in detection accuracy but also maintains real-time inference performance on resource-constrained edge computing platforms, thereby providing a reliable perceptual foundation for subsequent precise cleaning operations.
With respect to motion planning, the proposed Improved Fireworks–Particle Swarm Hybrid Algorithm (IFPHA) significantly improves the operational efficiency of the system. By integrating the global exploration capability of the Fireworks Algorithm (FWA) with the fast convergence characteristics of Particle Swarm Optimization (PSO), the hybrid algorithm effectively balances global exploration and local exploitation, mitigating the tendency of conventional optimization methods to become trapped in local optima. The optimized manipulator trajectories achieve time optimality while substantially reducing the cleaning cycle duration. At the same time, the generated motion profiles remain smooth and continuous, which helps to reduce mechanical impact, minimize wear, and extend the service life of the cleaning device.
Furthermore, the cyclone-based dust collection unit, whose effectiveness was verified through gas–solid two-phase coupled simulations, successfully prevents impurities from being drawn into the suction motor while enabling centralized collection of residual materials. This design not only ensures thorough cleaning but also avoids secondary contamination and contributes to improving the durability and reliability of the overall system.

4.2. Limitations and Future Work

Although the proposed system demonstrates effective cleaning performance in experimental validation, certain limitations remain when considering complex real-world agricultural environments, which also indicate directions for future research.
The dataset used in this study was primarily collected from the grain tank of a Lovol RG70 combine harvester. Although data augmentation techniques were applied to increase sample diversity, the background characteristics remain highly dependent on the specific harvester model. Variations in material properties, color, and structural design among different harvester brands or models may affect the cross-platform generalization capability of the detection model. Future work will focus on expanding the dataset to include images acquired from a wider range of harvester types and brands, thereby further validating and enhancing the general applicability of the proposed system.
In addition, real harvesting operations are often accompanied by high dust concentrations and severe illumination variations. Although image enhancement techniques alleviate some of these disturbances, the detection accuracy of a single vision-based perception system may still fluctuate under extreme operating conditions, potentially resulting in missed detections of residual materials. Future research will explore the integration of LiDAR and vision-based sensing to establish a multi-sensor fusion perception framework with improved robustness against environmental interference.
Moreover, the current validation primarily focuses on functional feasibility and short-term performance. The cumulative effects of prolonged vibration during harvesting operations and long-term dust accumulation on sensor accuracy and actuator lifespan have not yet been systematically evaluated. Future studies will therefore involve long-duration field tests to comprehensively assess the durability, stability, and long-term reliability of the proposed system throughout its operational lifecycle.

5. Conclusions

An automated cleaning system that integrates machine vision with a robotic manipulator is developed in this study to address the problem of seed admixture caused by post-harvest residues in the grain tank. Within this system, an improved object detection algorithm, YOLOv11-SMASE, is proposed to achieve accurate recognition of residues under complex background conditions. With the incorporation of the LAMP pruning strategy, the model's computational requirements are significantly reduced, enabling successful deployment on resource-constrained edge-computing devices; even without high-performance hardware, the model accurately identifies residue features such as straw and wheat ears, meeting the detection requirements of the cleaning system. An improved particle swarm–fireworks hybrid algorithm is developed to increase the motion efficiency of the manipulator, yield smoother trajectories, and effectively reduce non-working time during the cleaning process. A cyclone-type dust extraction unit compatible with the cleaning system is designed, and EDEM–Fluent coupled simulations verify its favorable gas–solid separation performance, ensuring that residues fall into the collector rather than entering the dust extraction motor, thereby avoiding secondary contamination and motor damage. Field test results indicate that the developed system achieves an average cleaning rate of 92.6%, significantly reducing the risk of varietal admixture and improving the harvest purity of seed-production wheat.
However, in certain complex scenarios, such as excessive residue accumulation and highly variable illumination, recognition stability and cleaning thoroughness are adversely affected; future work will therefore focus on expanding multi-scenario datasets to enhance the model’s generalization capability and on investigating visual–servo-based adaptive adsorption strategies to further improve system stability under complex operating conditions.

Author Contributions

Conceptualization, L.Z. and C.Y.; methodology, L.Z. and J.X.; software, X.Z.; validation, J.Z. and J.X.; formal analysis, R.Y. and X.Z.; investigation, J.Z.; resources, L.Z.; data curation, J.X. and X.Z.; writing—original draft preparation, J.X.; writing—review and editing, J.X.; visualization, R.Y.; supervision, G.L. and C.Y.; project administration, G.L.; funding acquisition, L.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program (2023YFD2000404-1) and the Shandong Modern Agricultural Industry System Wheat Industry Innovation Team (SDAIT-01-13).

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

Authors Cheng Yang and Guoying Li were employed by the company Qingdao Plantech Mechanical Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Wang, F.; Wang, L.; Yu, X.; Gao, J.; Ma, D.; Guo, H.; Zhao, H. Effect of Planting Density on the Nutritional Quality of Grain in Representative High-Yielding Maize Varieties from Different Eras. Agriculture 2023, 13, 1835.
2. Wimalasekera, R. Role of Seed Quality in Improving Crop Yields. In Crop Production and Global Environmental Issues; Hakeem, K.R., Ed.; Springer International Publishing: Cham, Switzerland, 2015; pp. 153–168.
3. Jiang, Q.; Xiao, Z.; Zhong, J.; Zhao, Y.; Huang, H.; Zhang, Z.; Wang, R. Study of a Cleaning Intelligent Control System for Rice and Wheat Combine Harvester. In Proceedings of the 2020 IEEE International Conference on Artificial Intelligence and Information Systems (ICAIIS), Dalian, China, 20–22 March 2020; pp. 400–406.
4. Jiang, Q.; Liu, Y.; Zhou, X.; Yang, Y.; Zhang, J.; Jiang, B.; Fu, Y. Intelligent Control Knowledge-Based System for Cleaning Device of Rice–Wheat Combine Harvester. Int. J. Pattern Recognit. Artif. Intell. 2023, 37, 2359015.
5. Liang, Z.; Wada, M.E. Development of cleaning systems for combine harvesters: A review. Biosyst. Eng. 2023, 236, 79–102.
6. Arai, K.; Shimazu, M.; Umeda, N.; Kurihara, E. Study of Grain Conveyor Structure to Reduce Cleaning Time of Combine Harvester. Jpn. Agric. Res. Q. 2019, 53, 247–253.
7. Farjon, G.; Huijun, L.; Edan, Y. Deep-learning-based counting methods, datasets, and applications in agriculture: A review. Precis. Agric. 2023, 24, 1683–1711.
8. Zhang, Q.; Liu, Y.; Gong, C.; Chen, Y.; Yu, H. Applications of Deep Learning for Dense Scenes Analysis in Agriculture: A Review. Sensors 2020, 20, 1520.
9. Nawaz, S.A.; Li, J.; Bhatti, U.A.; Shoukat, M.U.; Ahmad, R.M. AI-based object detection latest trends in remote sensing, multimedia and agriculture applications. Front. Plant Sci. 2022, 13, 1041514.
10. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
11. Jing, F.; Wang, C.; Li, J.; Yang, C.; Liu, H.; Chen, Y. A Dual Detection Head YOLO Model with Its Application in Wheat Ear Recognition. Int. J. Cogn. Inform. Nat. Intell. 2024, 18, 17.
12. Szczypiński, P.M.; Zapotoczny, P. Computer vision algorithm for barley kernel identification, orientation estimation and surface structure assessment. Comput. Electron. Agric. 2025, 87, 32–38.
13. Wang, L.; Cheng, W.; Wang, C.; Jin, Z.; Peng, G.; Xiong, X. Motion planning of cleaning robot based on 3D vision. Sci. Prog. 2025, 108, 00368504251395134.
14. Bergies, S.-A.; Nguyen, P.T.-T.; Kuo, C.-H. Cleaning Robot Vision System Based on RGBD Camera and Deep Learning YOLO-based Object Detection Algorithm. Int. J. Irobotics 2021, 4, 23–29.
15. Barathraj, M.; Yathunanthanasarma, B.; Mahiliny, J.; Jayasekara, A.G.B.P. Intelligent Beach Cleaning Robot with Dual Modes of Refuse Collection and YOLO-based Detection. In Proceedings of the 2024 4th International Conference on Electrical Engineering (EECon), Pekanbaru, Indonesia, 16–17 October 2024; pp. 89–94.
16. Luo, J.; Wang, G.; Lei, Y.; Wang, D.; Chen, Y.; Zhang, H. A photovoltaic panel cleaning robot with a lightweight YOLO v8. Front. Robot. AI 2025, 12, 1606774.
17. Qing, Y.; Chen, L.; Chen, D.; Wang, P.; Sun, W.; Yang, R. Simulation and Optimization of a Self-Cleaning Device for the Header of a Rice Seed Harvester Using Fluent-EDEM Coupling. Agriculture 2024, 14, 2312.
18. Chen, W.; Zhao, Y.; You, T.; Wang, H.; Yang, Y.; Yang, K. Automatic Detection of Scattered Garbage Regions Using Small Unmanned Aerial Vehicle Low-Altitude Remote Sensing Images for High-Altitude Natural Reserve Environmental Protection. Environ. Sci. Technol. 2021, 55, 3604–3611.
19. Sun, S.; Ren, W.; Zhou, J.; Gan, J.; Wang, R.; Cao, X. A Hybrid Transformer-Mamba Network for Single Image Deraining. arXiv 2024, arXiv:2409.00410.
20. Lee, J.; Park, S.; Mo, S.; Ahn, S.; Shin, J. Layer-adaptive Sparsity for the Magnitude-based Pruning. arXiv 2021, arXiv:2010.07611.
21. Denavit, J.; Hartenberg, R.S. A Kinematic Notation for Lower-Pair Mechanisms Based on Matrices. J. Appl. Mech. 1955, 22, 215–221.
22. Jiao, Y.; Zhao, Y.; Wen, S. Time-optimal trajectory planning for 6R manipulator arm based on chaotic improved sparrow search algorithm. Ind. Robot. Int. J. Robot. Res. Appl. 2025, 52, 509–521.
23. Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the ICNN'95—International Conference on Neural Networks, Perth, Australia, 27 November–1 December 1995; Volume 4, pp. 1942–1948.
24. Li, J.; Tan, Y. A Comprehensive Review of the Fireworks Algorithm. ACM Comput. Surv. 2019, 52, 1–28.
25. Wu, Z.; Su, C.; Xu, H.; Wang, L. Numerical Simulation of Dust Removal in the Cyclone Collector of a Straw Crusher Based on a Discrete Phase Model. Fluid Dyn. Mater. Process. 2022, 19, 1143–1157.
Figure 1. The overall research route.
Figure 2. Sample Images from the Residue Dataset.
Figure 3. Image augmentation of the dataset, including rotation, flipping, and noise addition.
Figure 4. Structure of the Improved YOLOv11 Model.
Figure 5. Structure of the SMA Module.
Figure 6. Structure of the CGLU Module.
Figure 7. Structure of the SEFFN Module.
Figure 8. Schematic diagram of LAMP.
Figure 9. Model Deployment Architecture.
Figure 10. Hardware Architecture of the Cleaning System.
Figure 11. Flowchart of the Cleaning Program.
Figure 12. D–H Coordinate Frames of the Cleaning Device.
Figure 13. Variation Curve of the Inertia Weight.
Figure 14. Visualization Heatmap Comparison Before and After Model Improvement.
Figure 15. Comparison of Convergence Curves.
Figure 16. Comparison of Joint Angle, Velocity, and Acceleration Curves for Different Optimization Algorithms.
Figure 17. Structure of Cyclone Dust Collector.
Figure 18. Flow Field of the Cyclone Separator.
Figure 19. Particle Trajectory Diagram.
Figure 20. Field Experiment.
Table 1. D-H parameters of the 4-DOF cleaning device.
i    α(i−1) (°)    a(i−1) (mm)    d(i) (mm)    θ(i) (°)
1    −90           435            0            −90~90
2    0             390            0            −30~90
3    0             230            0            0~90
4    0             30             0            −90~90
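The forward kinematics implied by Table 1 can be sketched as follows. The α(i−1)/a(i−1) column headers suggest the modified (Craig) D-H convention, which is assumed here; the actual frame assignments follow Figure 12 and may differ in detail.

```python
import numpy as np

def dh_transform(alpha, a, d, theta):
    """Link transform in the modified (Craig) D-H convention, matching the
    alpha_{i-1}, a_{i-1}, d_i, theta_i columns of Table 1 (angles in rad,
    lengths in mm)."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    ct, st = np.cos(theta), np.sin(theta)
    return np.array([
        [ct,      -st,      0.0,  a],
        [st * ca,  ct * ca, -sa, -sa * d],
        [st * sa,  ct * sa,  ca,  ca * d],
        [0.0,      0.0,      0.0, 1.0],
    ])

def forward_kinematics(thetas):
    """End-effector pose of the 4-DOF cleaner for joint angles (rad)."""
    # (alpha_{i-1} [rad], a_{i-1} [mm], d_i [mm]) rows per Table 1
    params = [(-np.pi / 2, 435.0, 0.0),
              (0.0, 390.0, 0.0),
              (0.0, 230.0, 0.0),
              (0.0, 30.0, 0.0)]
    T = np.eye(4)
    for (alpha, a, d), theta in zip(params, thetas):
        T = T @ dh_transform(alpha, a, d, theta)
    return T
```

At the zero configuration the end effector lies 435 + 390 + 230 + 30 = 1085 mm from the base along the x-axis, which serves as a quick sanity check of the parameter table.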
Table 2. Experimental Environment Configuration.
Hardware    Configuration       Software    Configuration
CPU         Intel i5-14600KF    MATLAB      R2020b
GPU         RTX 5070 (12 GB)    PyTorch     2.8.0
RAM         32 GB               CUDA        12.9
Hard disk   2 TB                Python      3.8
Table 3. Training Parameters.
Training Parameter       Configuration
Batch Size               32
Initial Learning Rate    0.01
Optimizer                SGD
Number of Epochs         300
Image Size               640 × 640
Momentum                 0.937
Weight Decay             0.0005
Table 4. Description of Evaluation Metrics.
Evaluation Indicator    Full Name                 Formula
P                       Precision rate            P = TP / (TP + FP) × 100%
R                       Recall rate               R = TP / (TP + FN) × 100%
mAP                     Mean average precision    mAP = (1/M) Σ ∫[0,1] P(R) dR, where M is the number of classes
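The metrics in Table 4 can be computed as in the following sketch; tp, fp, and fn denote per-class detection counts, and the trapezoidal area here is a simplified stand-in for the interpolated P-R integration used by standard evaluators.

```python
import numpy as np

def precision_recall(tp, fp, fn):
    """Precision and recall (%) from detection counts, per Table 4."""
    p = tp / (tp + fp) * 100.0
    r = tp / (tp + fn) * 100.0
    return p, r

def average_precision(recalls, precisions):
    """AP as the trapezoidal area under sampled (recall, precision) points;
    mAP is the mean of per-class APs."""
    r = np.asarray(recalls, dtype=float)
    p = np.asarray(precisions, dtype=float)
    return float(np.sum(np.diff(r) * (p[:-1] + p[1:]) / 2.0))
```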
Table 5. Comparison of Ablation Experiments for Different Modules.
Model configuration                 Precision (%)    Recall (%)    mAP (%)    Parameters    FLOPs (G)
YOLOv11n (baseline)                 92.0             84.6          91.1       2,583,737     6.3
+ C3k2-SMAFB-CGLU                   92.0             85.0          91.9       2,971,009     6.6
+ C2PSA-SEFFN                       91.8             84.8          91.8       2,620,753     6.3
+ C3k2-SMAFB-CGLU + C2PSA-SEFFN     92.5             87.0          93.1       2,592,733     6.3
Table 6. Hardware Specifications of the Deployment Platform.
Category            Parameters
Operating System    Ubuntu 18.04
CPU                 4-core Arm® Cortex®-A57 (Arm Holdings, Cambridge, Cambridgeshire, UK)
CPU Max Freq        1.43 GHz
GPU                 NVIDIA Tegra X1
GPU Max Freq        921 MHz
Memory              4 GB
Storage             64 GB
Power               5–10 W
Table 7. Comparison of Deployment Performance among Different Models.
Model                   Precision (%)    Recall (%)    F1 Score (%)    FPS      FLOPs (G)    Model Size (MB)
YOLOv8n                 91.0             84.6          87.7            8.02     8.1          6.0
YOLOv11n                90.7             82.5          86.4            9.49     6.3          5.2
YOLOv11n-SMASE          92.5             86.9          89.6            9.21     6.3          5.2
YOLOv11n-SMASE-LAMP     91.7             87.1          89.3            13.37    3.1          1.9
Table 8. Material Property Parameters Used in the Simulation.
Particle                    Poisson Ratio    Density (kg/m^3)    Shear Modulus (Pa)
Grain                       0.3              1300                5.1 × 10^8
Straw                       0.38             1850                1 × 10^6
Separator wall (plastic)    0.42             900                 5.3 × 10^8
Table 9. Mechanical Parameters Used in the Simulation.
Contact Form             Coefficient of Restitution    Coefficient of Static Friction    Coefficient of Kinetic Friction
Grain–Grain              0.5                           0.45                              0.35
Grain–Straw              0.4                           0.35                              0.3
Grain–Separator wall     0.35                          0.45                              0.3
Straw–Straw              0.3                           0.5                               0.4
Straw–Separator wall     0.35                          0.5                               0.4
