1. Introduction
As one of the three major staple crops worldwide, wheat is cultivated on more than 33 million hectares of land in China and serves as a vital component of the national food supply system [1]. With the continuous expansion of seed production and propagation, increasingly stringent standards have been imposed on seed purity. During seed production, seed purity is generally maintained at levels exceeding 99%; even a minor admixture can cause varietal degradation, thereby diminishing both agricultural productivity and economic value [2]. At present, conventional combine harvesters are primarily designed for commercial grain production, and dedicated cleaning functions for residual kernels are generally not incorporated into their design, resulting in frequent retention of kernels or impurities in the grain tank and conveying components after harvesting [3]. During subsequent harvesting operations, these residues readily cause admixture among different varieties, thereby becoming a key technical bottleneck in the harvest process for seed production.
Existing studies have been conducted to improve the conveying and cleaning structures inside combine harvesters [4]. However, these improvements are primarily oriented toward conventional grain harvesting scenarios, where the main objectives are to enhance operational efficiency or reduce grain losses, rather than to strictly control cross-contamination among different varieties. Consequently, existing technologies still struggle to meet the stringent purity requirements of seed production [5]. Arai et al. developed a grain conveying mechanism for rice combine harvesters, which reduced the cleaning time by approximately 50% compared to traditional structures [6]. However, their study primarily focused on general grain harvesting, where the tolerance for residue cross-contamination is higher than that in seed production scenarios. In contrast, this study specifically targets wheat seed production, which demands stricter cleaning standards to prevent variety mixing. In recent years, the application of object detection technology in agriculture has been widely studied [7, 8, 9], and YOLO [10] has shown advantages in terms of speed and simplicity of model architecture, making it particularly well suited to mobile deployment requirements in agricultural engineering. Jing et al. proposed a lightweight hybrid design network [11], Wheat-YoloNet, based on an improved version of YOLOv8 for wheat ear recognition; compared with the YOLOv8n baseline model, the parameter count of the improved model was reduced by 33.3%, thereby facilitating deployment and operation on devices with limited computing resources. Machine vision techniques have been increasingly applied in cereal processing industries. In the milling and malting industry of barley seeds, vision-based methods have been employed for kernel identification and surface structure assessment, demonstrating the feasibility of video-based inspection approaches in grain cleaning and processing applications [12]. However, existing studies mainly focus on industrial grain processing and quality evaluation, and their application to automated cleaning tasks inside seed-production harvesters has rarely been reported. YOLO-based approaches have demonstrated high accuracy and real-time performance in tasks such as wheat spike recognition and grain detection, indicating their strong engineering applicability in agricultural scenarios. However, existing studies mainly focus on field phenotyping or yield estimation, and the application of object detection techniques to the automated removal of internal residues in combine harvesters remains largely unexplored. In this study, object detection is integrated with a four-degree-of-freedom cleaning manipulator to enable the removal of residual material from combine harvesters.
At present, a limited number of studies have begun attempting to combine visual perception, manipulator operation, and motion planning for automated cleaning or manipulation tasks. For instance, Wang et al. proposed a vision-guided cleaning robot system, verifying the feasibility of such methods in the industrial cleaning sector [13]. In the realm of autonomous cleaning robotics, Bergies et al. developed a vision system based on RGB-D cameras and a modified YOLOv3 algorithm to detect and locate various trash types on indoor floors [14]. Their work successfully demonstrated the feasibility of using deep learning to guide robotic cleaning in structured, open indoor environments. Additionally, Barathraj et al. proposed an intelligent beach cleaning robot to address coastal pollution [15]. This system integrates YOLOv5-based detection with dual refuse collection mechanisms, effectively filtering plastic debris from sand. In the energy infrastructure sector, Luo et al. developed a photovoltaic panel cleaning robot utilizing an improved lightweight YOLOv8 model [16]. By integrating optimized path planning algorithms, this system achieved autonomous dust removal on solar panels, significantly enhancing photoelectric conversion efficiency. However, the cleaning task within a seed production harvester differs fundamentally from the aforementioned studies. The interior of the harvester is characterized by narrow compartments and severe mechanical occlusions. Consequently, general cleaning solutions cannot be directly applied to the complex internal compartments of agricultural machinery, necessitating a specialized robotic system designed specifically for this confined environment.
Qing et al. addressed the problem of seed residues on the header of rice combine harvesters by designing a self-cleaning header device, in which several nozzles were arranged as key components [17]. Field experiments conducted under optimal parameter settings showed that the cleaning time per cycle was 10 s and that the self-cleaning rate reached 97.68%. This device largely addresses the problem of residue removal at the header of the harvester and exhibits high cleaning efficiency and a high cleaning rate; however, the approach is applicable only to cleaning the header portion of the harvester, and residues inside the harvester still cannot be removed. As mentioned above, research on cleaning systems for seed-production wheat combine harvesters is limited, and automated cleaning systems specifically designed for these machines are clearly lacking.
To address the above problems, a cleaning system for seed-production wheat combine harvesters that integrates visual detection with manipulator-based cleaning is proposed in this paper. The proposed cleaning system accomplishes the entire process from residue recognition to cleaning. The system integrates an RGB-D vision system, an embedded computing platform, and a four-degree-of-freedom cleaning manipulator to achieve autonomous detection, localization, and removal of residues in the grain tank, thereby providing a new technical route for ensuring high seed purity and enhancing the functionality of harvesting equipment.
The main contributions of this study are as follows:
- (1) An intelligent cleaning system is proposed for seed-production wheat combine harvesters, which realizes automatic detection, localization, and adsorption-based removal of residues in the grain tank after harvesting;
- (2) A YOLOv11-SMASE detection model is proposed, which enhances residue detection capability under complex conditions and achieves model lightweighting, enabling successful deployment on the NVIDIA Jetson Nano platform;
- (3) A time-optimal trajectory planning method integrating Particle Swarm Optimization (PSO) and the Fireworks Algorithm (FWA) is proposed to improve the motion efficiency of a robotic manipulator in vision-guided cleaning tasks.
3. Results
3.1. Experimental Environment
To ensure experimental fairness, all models were evaluated under identical experimental conditions. The configuration of the experimental environment is presented in Table 2.
To ensure stable convergence across all models, and based on extensive empirical evaluation, the optimized training parameters employed in this study are presented in Table 3.
3.2. Evaluation Indicators
In this study, model performance was evaluated using the commonly adopted metrics in object detection, including Precision (P), Recall (R), and Mean Average Precision (mAP). Model lightweight characteristics were assessed based on the number of parameters, floating-point operations (FLOPs), and model size. Detailed descriptions of these evaluation metrics are presented in Table 4.
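The definitions in Table 4 are not reproduced in this section, so the following is only a minimal sketch of how P, R, and AP are commonly computed. It assumes the standard definitions P = TP/(TP + FP) and R = TP/(TP + FN), and it assumes an all-point interpolated area under the precision-recall curve for AP; the interpolation scheme actually used in this study may differ. All function names are illustrative.

```python
import numpy as np


def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def average_precision(recalls, precisions) -> float:
    """Area under the PR curve with all-point interpolation (assumed scheme).

    `recalls` must be sorted in ascending order, with `precisions` aligned.
    """
    r = np.concatenate(([0.0], np.asarray(recalls, dtype=float), [1.0]))
    p = np.concatenate(([0.0], np.asarray(precisions, dtype=float), [0.0]))
    # Replace each precision by the maximum precision at any higher recall.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas wherever recall changes.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))


def mean_average_precision(ap_per_class) -> float:
    """mAP is the mean of the per-class AP values."""
    return float(np.mean(ap_per_class))
```

For a residue detector with several classes (for example kernels, straw, and ears), `average_precision` would be evaluated once per class and the results averaged with `mean_average_precision`.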
3.3. Ablation Study
To enhance the performance of residue recognition, multiple improvements were introduced, and the performance of the YOLOv11 framework was systematically analyzed. The C3k2-SMAFB-CGLU and C2PSA-SEFFN modules were incorporated to improve the recognition accuracy of grain bin residues. Specifically, the original YOLOv11 model served as the baseline for ablation experiments, in which improvement strategies were sequentially added to evaluate the contribution of each module and its impact on the baseline model. As shown in Table 5, introducing either the C3k2-SMAFB-CGLU or C2PSA-SEFFN module individually enhanced residue recognition performance. Replacing the original C3k2 module with the C3k2-SMAFB-CGLU module increased recall (R) and mean average precision (mAP) by 0.4% and 0.8%, respectively. This improvement is primarily attributed to the integration of multi-dimensional attention and fine-grained channel-gating structures during feature extraction. The SMAFB module enhanced small-target perception through multi-dimensional attention, whereas the CGLU’s local gating mechanism improved channel feature consistency, resulting in better detection of fine residues such as wheat kernels. The improvement achieved by the C2PSA-SEFFN module mainly stems from its spatial attention mechanism, which strengthens multi-scale feature fusion, and its frequency-domain enhancement mechanism, which improves feature distribution and suppresses noise, thereby yielding more stable and accurate detection results. When both the C3k2-SMAFB-CGLU and C2PSA-SEFFN modules were integrated simultaneously, the model achieved the highest accuracy and precision, with mAP and R values reaching 93.1% and 87%, respectively, while the number of parameters and FLOPs remained nearly identical to those of the original model. This demonstrates a complementary interaction between the two modules. C3k2-SMAFB-CGLU enhances spatial feature extraction and multi-scale context aggregation, whereas C2PSA-SEFFN performs channel-wise feature recalibration to suppress background noise. Since both modules utilize lightweight attention mechanisms, they effectively improve the model’s representational capability without significantly increasing the parameter count or FLOPs.
To further evaluate the robustness of the proposed model, we conducted a quantitative analysis on a subset of the validation data representing complex lighting conditions (e.g., strong shadows and dim light). The experimental results indicate that the baseline YOLOv11n model suffered a significant performance drop, with the mAP decreasing to 83.1%, whereas the proposed YOLOv11-SMASE maintained a higher accuracy of 88.4%. This advantage of 5.3 percentage points in complex scenarios demonstrates the effectiveness of the SEFFN module in suppressing environmental noise.
To more clearly illustrate the results, HiResCAM (High-Resolution Class Activation Mapping) was employed to visualize the models’ prediction regions before and after the improvements. In the heatmap, warmer colors indicate higher attention to that region. As shown in Figure 14, when the YOLOv11n model is used for detection, the model’s primary attention is overly concentrated on small particles such as wheat kernels, thereby reducing its focus on straw and wheat ears. This not only causes small grains on wheat ears to be misclassified as individual wheat kernels but also increases the likelihood that irrelevant impurities are erroneously detected as residues. By contrast, the improved YOLOv11n-SMASE model exhibits a much broader and more uniform attention distribution: attention paid to wheat kernels is appropriately reduced, whereas attention to straw and wheat ears is markedly enhanced, thereby reducing missed and false detections. The comparison of heatmaps before and after optimization clearly demonstrates the effectiveness of the model improvements.
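For readers unfamiliar with HiResCAM, the core computation can be sketched as follows. The map is obtained by multiplying the feature activations element-wise with the gradients of the target score and summing over channels, in contrast to Grad-CAM, which first averages the gradients spatially. The sketch assumes that the activations and gradients of one layer have already been extracted as arrays; which layer and which detection score were used for Figure 14 is not stated here, and the function name is illustrative.

```python
import numpy as np


def hirescam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Compute a HiResCAM-style heatmap.

    activations, gradients: arrays of shape (C, H, W) from the same layer.
    Returns an (H, W) map scaled to [0, 1]; larger values mean more attention.
    """
    # Element-wise product, then sum over the channel axis.
    cam = np.sum(gradients * activations, axis=0)
    # Keep only positive evidence for the target score.
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

The normalized map is typically resized to the input image resolution and rendered with a warm-to-cool colormap before being overlaid on the image.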
3.4. Model Deployment Experiment
To enable deployment of the model on edge-computing devices, comparative experiments were conducted before and after applying LAMP to verify its effectiveness. The comparison revealed significant reductions in both weight file size and parameter count following pruning. After applying LAMP, the model’s computational demand decreased by 3.2 GFLOPs, its file size was reduced by 3.38 MB (a 64.5% reduction), and the total number of parameters decreased to 802,276. These results provide compelling evidence of the effectiveness of the LAMP method in model optimization.
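The scoring rule behind LAMP (Layer-Adaptive Magnitude-based Pruning) can be illustrated with a short sketch. Within each layer, the squared weights are sorted in ascending order, and each weight is scored by its squared magnitude divided by the sum of the squared magnitudes of all weights that are at least as large; the lowest-scoring weights across all layers are then removed. The sketch below shows the unstructured, weight-level form of this rule. How the rule was applied to the YOLOv11-SMASE layers (for example, at channel level) is not described in this section, so the function names and the global thresholding step are assumptions.

```python
import numpy as np


def lamp_scores(weights: np.ndarray) -> np.ndarray:
    """LAMP score of every weight in a single layer."""
    flat_sq = weights.ravel().astype(float) ** 2
    order = np.argsort(flat_sq)              # ascending squared magnitude
    sorted_sq = flat_sq[order]
    # suffix[u] = sum of sorted_sq[u:], i.e. all weights at least as large.
    suffix = np.cumsum(sorted_sq[::-1])[::-1]
    scores_sorted = np.divide(
        sorted_sq, suffix, out=np.zeros_like(sorted_sq), where=suffix > 0
    )
    scores = np.empty_like(flat_sq)
    scores[order] = scores_sorted            # undo the sort
    return scores.reshape(weights.shape)


def lamp_global_masks(layers: dict, sparsity: float) -> dict:
    """Boolean keep-masks after pruning the globally lowest LAMP scores.

    `sparsity` is the target fraction of weights to remove; ties at the
    threshold may remove slightly more than that fraction.
    """
    scores = {name: lamp_scores(w) for name, w in layers.items()}
    all_scores = np.concatenate([s.ravel() for s in scores.values()])
    k = int(sparsity * all_scores.size)      # number of weights to prune
    if k == 0:
        return {name: np.ones(w.shape, dtype=bool) for name, w in layers.items()}
    threshold = np.sort(all_scores)[k - 1]
    return {name: s > threshold for name, s in scores.items()}
```

Because the largest weight in every layer always receives a score of 1, no layer is pruned away entirely, which is the property that makes the per-layer sparsity adaptive.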
To validate the performance of the proposed YOLOv11-SMASE model in real-world scenarios and ensure that it meets the real-time requirements of grain-bin residue cleaning, deployment experiments were conducted on the NVIDIA Jetson Nano edge-computing device. The detailed hardware specifications are presented in Table 6.
The deployment results of each model on mobile devices are presented in Table 7. In terms of detection accuracy, the YOLOv11n-SMASE model without LAMP achieved the highest performance, maintaining an accuracy of 92.5%. The unmodified YOLOv11n model exhibited the lowest accuracy, at 90.7%. The improved and pruned YOLOv11n-SMASE-LAMP model achieved an accuracy of 91.7%, showing only a slight reduction; however, its frame rate reached 13.37 FPS, representing a 45.1% improvement compared with the unpruned model. Additionally, FLOPs were reduced by 3.2 GFLOPs, and the model size decreased by 63.4%. These results demonstrate that the optimized model achieves faster inference on devices with limited computational resources, thereby validating the effectiveness of the pruning strategy and the feasibility of real-time deployment in practical applications.
3.5. Simulation of Time-Optimal Trajectory Planning
Based on the fusion strategy of the Fireworks Algorithm (FWA) and the Particle Swarm Optimization (PSO) algorithm, and combined with Cauchy mutation, dynamic learning factors, and adaptive inertia weights, the time-segmentation allocation problem was optimized and solved. To verify the effectiveness of the proposed hybrid algorithm, comparative analyses were conducted against the basic PSO and standalone FWA models. The main parameter settings were as follows: the maximum number of iterations was set to 150, and the population size was set to 50. For the basic PSO, the learning factors were , and the inertia weight was . For the improved PSO, the learning factors were dynamically adjusted within the range [0.5, 2.5], with and ; the inertia weights were defined as and . For the FWA, the number of fireworks was set to and , with explosion amplitudes of and .
The iterative results are illustrated in Figure 15. It is evident that the improved Fireworks–Particle Swarm Hybrid Algorithm (IFPHA) converges significantly faster than the two conventional algorithms, and its final fitness value is substantially higher. Owing to its stronger local optimization capability, the PSO algorithm exhibits faster convergence than the FWA, whereas the FWA demonstrates superior global exploration ability, resulting in slower convergence but improved final optimization performance. The hybrid algorithm effectively exploits the complementary strengths of PSO and FWA, achieving substantial improvements in global search capability, local convergence, convergence speed, accuracy, and robustness, thereby validating the effectiveness of the proposed method.
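To make the PSO-side improvements concrete, the following sketch implements a particle swarm with a linearly decreasing inertia weight, dynamically adjusted learning factors, and a Cauchy mutation of the global best. It is not the full IFPHA: the fireworks explosion operator is omitted, and only the iteration count (150), the population size (50), and the learning-factor range [0.5, 2.5] are taken from the text. The inertia bounds (0.9 and 0.4), the mutation scale, the schedules, and the toy cost function with its hypothetical minimum segment times are assumptions made purely for illustration.

```python
import numpy as np

# Hypothetical minimum feasible duration of each trajectory segment (s).
T_MIN = np.array([1.0, 2.0, 1.5])


def toy_time_cost(t: np.ndarray) -> float:
    """Total motion time plus a penalty for segments shorter than T_MIN."""
    violation = np.maximum(T_MIN - t, 0.0).sum()
    return float(t.sum() + 1e3 * violation)


def pso_cauchy(cost, lower, upper, n_particles=50, n_iter=150,
               w_max=0.9, w_min=0.4, c_min=0.5, c_max=2.5, seed=0):
    """Minimise `cost` inside the box [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    x = rng.uniform(lower, upper, size=(n_particles, dim))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_cost = np.array([cost(p) for p in x])
    g = int(np.argmin(pbest_cost))
    gbest, gbest_cost = pbest[g].copy(), float(pbest_cost[g])
    history = [gbest_cost]
    for it in range(n_iter):
        frac = it / max(n_iter - 1, 1)
        w = w_max - (w_max - w_min) * frac    # inertia decreases over time
        c1 = c_max - (c_max - c_min) * frac   # cognitive factor shrinks
        c2 = c_min + (c_max - c_min) * frac   # social factor grows
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lower, upper)
        costs = np.array([cost(p) for p in x])
        improved = costs < pbest_cost
        pbest[improved] = x[improved]
        pbest_cost[improved] = costs[improved]
        g = int(np.argmin(pbest_cost))
        if pbest_cost[g] < gbest_cost:
            gbest, gbest_cost = pbest[g].copy(), float(pbest_cost[g])
        # Heavy-tailed Cauchy perturbation of the global best; kept only if better.
        scale = 0.1 * (upper - lower) * (1.0 - frac)
        candidate = np.clip(gbest + scale * rng.standard_cauchy(dim), lower, upper)
        cand_cost = float(cost(candidate))
        if cand_cost < gbest_cost:
            gbest, gbest_cost = candidate, cand_cost
        history.append(gbest_cost)
    return gbest, gbest_cost, history


if __name__ == "__main__":
    best_t, best_cost, hist = pso_cauchy(toy_time_cost, [0.5] * 3, [5.0] * 3)
    print("segment times:", best_t, "total cost:", best_cost)
```

The large early cognitive factor encourages each particle to explore around its own best position, while the growing social factor pulls the swarm toward the global best in later iterations; the Cauchy mutation gives the global best an occasional long jump that can help it leave a local optimum.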
Furthermore, to validate the engineering feasibility on embedded hardware, a hardware test was conducted using the STM32H750 main controller (STMicroelectronics, Plan-les-Ouates, Switzerland) operating at 480 MHz. By utilizing internally simulated coordinates of 20 randomly distributed targets as input, 20 independent trials were performed. Statistical results indicate that the average planning time of the improved algorithm was 0.682 s, with a standard deviation of 0.035 s and a variation range from 0.645 s to 0.712 s. Compared to the operation cycle of 56.0 s, this computational latency accounts for only 1.2%, demonstrating the feasibility of the algorithm on resource-constrained embedded devices.
Based on the improved 3–5–3 polynomial piecewise interpolation trajectory optimization strategy, trajectory planning simulations were performed using the MATLAB (R2020b) Robotics Toolbox to generate the end-effector motion trajectory through predefined waypoints, along with the optimized joint angular displacement, velocity, and acceleration curves. As shown in Figure 16, after optimization using the improved Fireworks–Particle Swarm Hybrid Algorithm (IFPHA), the manipulator’s motion time was reduced from 9.00 s to 5.96 s, corresponding to a reduction of approximately 33.8%. The peak values of joint velocity and acceleration increased, while their variation curves remained smooth and continuous without abrupt transitions. The joints experienced minimal impact, leading to improved response speed and enhanced operational efficiency of the system. These results ensure the high-performance operation of the cleaning mechanism and further validate the effectiveness of the proposed algorithm.
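The structure of a 3–5–3 piecewise polynomial can be sketched for a single joint as follows. The joint passes through four positions using a cubic, a quintic, and a cubic segment (4 + 6 + 4 = 14 coefficients), and the coefficients follow from 14 linear conditions. The sketch assumes the common formulation with zero velocity and acceleration at the start and end and with continuous position, velocity, and acceleration at the two intermediate waypoints; the exact constraints, waypoints, and segment durations used in this study are not given in this section, so the numerical values in the usage example are hypothetical.

```python
import numpy as np


def _row(order: int, t: float, deriv: int) -> np.ndarray:
    """Row vector of d^deriv/dt^deriv [1, t, ..., t^order] evaluated at t."""
    row = np.zeros(order + 1)
    for k in range(deriv, order + 1):
        coef = 1.0
        for j in range(deriv):
            coef *= k - j
        row[k] = coef * t ** (k - deriv)
    return row


def solve_353(points, durations):
    """Return coefficient arrays (cubic, quintic, cubic) for one joint.

    points: (p0, p1, p2, p3) joint positions; durations: (T1, T2, T3).
    Each segment uses local time starting at 0.
    """
    p0, p1, p2, p3 = points
    T1, T2, T3 = durations
    A = np.zeros((14, 14))
    b = np.zeros(14)
    s1, s2, s3 = slice(0, 4), slice(4, 10), slice(10, 14)
    # Start: position p0, zero velocity and acceleration.
    A[0, s1] = _row(3, 0.0, 0); b[0] = p0
    A[1, s1] = _row(3, 0.0, 1)
    A[2, s1] = _row(3, 0.0, 2)
    # First waypoint: both segments hit p1; velocity and acceleration match.
    A[3, s1] = _row(3, T1, 0); b[3] = p1
    A[4, s2] = _row(5, 0.0, 0); b[4] = p1
    A[5, s1] = _row(3, T1, 1); A[5, s2] = -_row(5, 0.0, 1)
    A[6, s1] = _row(3, T1, 2); A[6, s2] = -_row(5, 0.0, 2)
    # Second waypoint: both segments hit p2; velocity and acceleration match.
    A[7, s2] = _row(5, T2, 0); b[7] = p2
    A[8, s3] = _row(3, 0.0, 0); b[8] = p2
    A[9, s2] = _row(5, T2, 1); A[9, s3] = -_row(3, 0.0, 1)
    A[10, s2] = _row(5, T2, 2); A[10, s3] = -_row(3, 0.0, 2)
    # End: position p3, zero velocity and acceleration.
    A[11, s3] = _row(3, T3, 0); b[11] = p3
    A[12, s3] = _row(3, T3, 1)
    A[13, s3] = _row(3, T3, 2)
    coeffs = np.linalg.solve(A, b)
    return coeffs[s1], coeffs[s2], coeffs[s3]


def evaluate(coeffs: np.ndarray, t: float, deriv: int = 0) -> float:
    """Evaluate a segment (or its derivative) at local time t."""
    return float(_row(len(coeffs) - 1, t, deriv) @ coeffs)


if __name__ == "__main__":
    # Hypothetical joint angles (rad) and segment durations (s).
    seg1, seg2, seg3 = solve_353((0.0, 0.4, 0.9, 1.2), (2.0, 2.0, 2.0))
    print("velocity at first waypoint:", evaluate(seg1, 2.0, deriv=1))
```

In a time-optimal setting, the three durations become the decision variables of the optimizer: each candidate (T1, T2, T3) is turned into a trajectory with `solve_353`, and the peak velocity and acceleration are checked against the joint limits before the total time is scored.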
3.6. Simulation Experiment of the Vacuum Unit
The cyclone separator comprises a dust outlet, inlet and outlet pipes, and a conical cylinder [25]. It is installed outside the grain bin and connected to the suction pipe and the collector, as illustrated in Figure 17.
To evaluate the cyclone separator’s ability to effectively separate air and impurities and to prevent debris from entering and damaging the vacuum motor, a numerical simulation of the internal gas–solid two-phase flow was performed. According to the characteristics of the actual motor, the inlet boundary condition was defined as a velocity inlet, with both the continuous and discrete phases assigned an inlet velocity of 28 m/s. Based on the commonly cultivated wheat varieties in the Qingdao region, the wheat kernel, straw, and wheat ear particle models were sized at 7 × 4 × 3 mm, 46 × 5 × 3 mm, and 37 × 15 × 13 mm, respectively. The generation rate was set to 1000 particles/s, with a quantity ratio of 7:2:1 for the respective models. The outlet boundary condition was defined as a pressure outlet under standard atmospheric pressure, while the discrete phase boundary was specified as an escape condition. The wall boundary conditions were set as no-slip walls; the outlet wall for the discrete phase was treated as a trap, and all remaining wall boundaries were defined as rebound conditions.
A numerical simulation was conducted to evaluate the gas–solid separation performance of the cyclone separator for residual wheat kernels, straw, and partially unthreshed grains. The particle trajectories within the cyclone separator are illustrated in Figure 18. Driven by the rotational airflow entering through the inlet, particles move downward in a counterclockwise spiral. The tangential velocity peaks near the wall of the exhaust cylinder and gradually decreases as the gas flows downward. Particles are captured when they reach the bottom of the cone. A small fraction of untrapped particles spiral upward in a counterclockwise direction and escape through the exhaust outlet under the influence of secondary rotational airflow.
In addition, to verify that the separation performance and particle trajectories of the cyclone separator conformed to the design expectations, an EDEM–Fluent coupled simulation was conducted. The model parameters used in EDEM are summarized in Table 8 and Table 9.
A CFD–DEM two-way coupling approach was employed to evaluate the solid–gas separation effectiveness within the separator. In Figure 19, wheat particles are shown in purple and straw particles in green. The particle trajectories demonstrate that under the combined effects of centrifugal force and gravity, the residues are effectively discharged into the collector, while the airflow exits through the exhaust port. The simulation results confirm the rationality of the separation mechanism and validate the flow field design of the cyclone separator.
3.7. Grain Bin Cleaning Experiment
A field cleaning experiment was carried out using the grain bin of a Lovol RG70 combine harvester, which measured 1507 × 928 × 1634 mm. During the experiment, the cleaning device was installed at the center of the grain bin’s side wall, enabling coverage of the entire area beneath the auger, where residues are most densely accumulated.
To evaluate the cleaning performance of the system, ten cleaning trials were conducted inside the grain tank. The experimental procedure was standardized as follows: First, a fixed mass of 1.0 kg of residues was prepared for each trial, with a mass ratio of kernels, straw, and broken ears controlled at 8:1:1. These residues were poured into the grain bin and randomly distributed to simulate natural accumulation. Subsequently, the cleaning device was activated. To ensure effective suction, the system employed a 1200 W vacuum motor connected via a 3 m long flexible corrugated hose with a 32 mm inner diameter. Upon completion, the residues collected in the bin were weighed. The cleaning rate was defined as the ratio of the mass of removed residues to the initial mass. The experiments yielded an average cleaning rate of 92.6% with a standard deviation of 1.85%, ranging from 89.1% to 95.8%. The average cleaning time was 56 s with a standard deviation of 1.72 s, varying between 53 s and 59 s. The field experiment setup is illustrated in Figure 20.
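The cleaning-rate definition used above can be written explicitly. The symbols are introduced here for illustration only and do not appear elsewhere in the paper: the cleaning rate is denoted by η, the mass of removed residues by m_removed, and the initial residue mass by m_initial.

```latex
\eta = \frac{m_{\mathrm{removed}}}{m_{\mathrm{initial}}} \times 100\%,
\qquad
\bar{m}_{\mathrm{removed}} \approx 0.926 \times 1.0\ \mathrm{kg} = 0.926\ \mathrm{kg}
```

With the fixed initial mass of 1.0 kg per trial, the reported average cleaning rate of 92.6% therefore corresponds to roughly 0.926 kg of residue removed per trial on average, and the reported range of 89.1% to 95.8% to roughly 0.891 kg to 0.958 kg.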
4. Discussion
4.1. System Performance Analysis
Harvesting, as a critical stage in large-scale seed production, directly influences final seed quality. To address the issue of varietal admixture caused by harvest residues remaining in the grain tank, this study proposes a vision-guided robotic cleaning system for seed-production wheat harvesters, enabling automated detection and removal of post-harvest residues. The system integrates a lightweight deep learning detection model, a time-optimal motion planning algorithm, and a dedicated cleaning mechanism.
In terms of visual perception, the improved YOLOv11-SMASE model exhibits strong feature extraction capability under the complex and cluttered background conditions inside the grain tank. By incorporating the CGLU and SEFFN modules, the model effectively enhances the representation and recognition of fine-grained features associated with residual materials. Experimental results demonstrate that the proposed model not only surpasses the baseline in detection accuracy but also maintains real-time inference performance on resource-constrained edge computing platforms, thereby providing a reliable perceptual foundation for subsequent precise cleaning operations.
With respect to motion planning, the proposed Improved Fireworks–Particle Swarm Hybrid Algorithm (IFPHA) significantly improves the operational efficiency of the system. By integrating the global exploration capability of the Fireworks Algorithm (FWA) with the fast convergence characteristics of Particle Swarm Optimization (PSO), the hybrid algorithm effectively balances global exploration and local exploitation, mitigating the tendency of conventional optimization methods to become trapped in local optima. The optimized manipulator trajectories achieve time optimality while substantially reducing the cleaning cycle duration. At the same time, the generated motion profiles remain smooth and continuous, which helps to reduce mechanical impact, minimize wear, and extend the service life of the cleaning device.
Furthermore, the cyclone-based dust collection unit, whose effectiveness was verified through gas–solid two-phase coupled simulations, successfully prevents impurities from being drawn into the suction motor while enabling centralized collection of residual materials. This design not only ensures thorough cleaning but also avoids secondary contamination and contributes to improving the durability and reliability of the overall system.
4.2. Limitations and Future Work
Although the proposed system demonstrates effective cleaning performance in experimental validation, certain limitations remain when considering complex real-world agricultural environments, which also indicate directions for future research.
The dataset used in this study was primarily collected from the grain tank of a Lovol RG70 combine harvester. Although data augmentation techniques were applied to increase sample diversity, the background characteristics remain highly dependent on the specific harvester model. Variations in material properties, color, and structural design among different harvester brands or models may affect the cross-platform generalization capability of the detection model. Future work will focus on expanding the dataset to include images acquired from a wider range of harvester types and brands, thereby further validating and enhancing the general applicability of the proposed system.
In addition, real harvesting operations are often accompanied by high dust concentrations and severe illumination variations. Although image enhancement techniques alleviate some of these disturbances, the detection accuracy of a single vision-based perception system may still fluctuate under extreme operating conditions, potentially resulting in missed detections of residual materials. Future research will explore the integration of LiDAR and vision-based sensing to establish a multi-sensor fusion perception framework with improved robustness against environmental interference.
Moreover, the current validation primarily focuses on functional feasibility and short-term performance. The cumulative effects of prolonged vibration during harvesting operations and long-term dust accumulation on sensor accuracy and actuator lifespan have not yet been systematically evaluated. Future studies will therefore involve long-duration field tests to comprehensively assess the durability, stability, and long-term reliability of the proposed system throughout its operational lifecycle.
5. Conclusions
An automated cleaning system that integrates machine vision with a robotic manipulator is developed in this study to address the problem of seed admixture caused by post-harvest residues in the grain tank. Within this system, to achieve accurate recognition of residues under complex background conditions, an improved object detection algorithm, YOLOv11-SMASE, is proposed. With the incorporation of the LAMP strategy, the model’s computational requirements are significantly reduced, thereby enabling its successful deployment on resource-constrained edge-computing devices and demonstrating that, even without high-performance hardware, it can accurately identify residue features such as straw and wheat ears, thus meeting the detection requirements of the cleaning system. An improved particle swarm–fireworks hybrid algorithm is developed to increase the motion efficiency of the manipulator, yield smoother trajectories, and effectively reduce non-working time during the cleaning process. A cyclone-type dust extraction unit compatible with the cleaning system is designed, and EDEM–Fluent coupled simulations are conducted to verify its favorable gas–solid separation performance, ensuring that residues fall into the collector rather than entering the dust extraction motor, thus avoiding secondary contamination and motor damage. Field test results indicate that the developed system achieves an average cleaning rate of 92.6%, thereby significantly reducing the risk of varietal admixture and improving the harvest purity of seed-production wheat. 
However, in certain complex scenarios, such as excessive residue accumulation and highly variable illumination, recognition stability and cleaning thoroughness are adversely affected; future work will therefore focus on expanding multi-scenario datasets to enhance the model’s generalization capability and on investigating visual–servo-based adaptive adsorption strategies to further improve system stability under complex operating conditions.