Article

Mobile AI-Powered Impurity Removal System for Decentralized Potato Harvesting

1 Research Center for Agricultural Robotics, National Agricultural and Food Research Organization, 2-1-2 Kannondai, Tsukuba 305-0856, Ibaraki, Japan
2 Hokkaido Agricultural Research Center, National Agricultural and Food Research Organization, 9-4 Shinsei-Minami, Memuro, Kasai 082-0081, Hokkaido, Japan
3 Toyo Agricultural Machinery Manufacturing Co., Ltd., 2-5, Kita 1, Nishi 22, Obihiro 080-2462, Hokkaido, Japan
* Author to whom correspondence should be addressed.
Agronomy 2026, 16(3), 383; https://doi.org/10.3390/agronomy16030383
Submission received: 23 December 2025 / Revised: 23 January 2026 / Accepted: 28 January 2026 / Published: 5 February 2026
(This article belongs to the Collection AI, Sensors and Robotics for Smart Agriculture)

Abstract

An advanced artificial intelligence (AI)-powered mobile automated impurity removal system was developed and integrated into potato harvesting machinery for decentralized agricultural environments in Japan. As opposed to existing stationary AI systems in centralized processing facilities, this mobile prototype enables on-field impurity removal in real time through a systematic dual-evaluation methodology. The system integrates the YOLOX-small architecture with precision pneumatic actuators and achieves 40–50 FPS processing under dynamic field conditions. Algorithm validation across 10 morphologically diverse potato varieties (Danshaku, Harrow Moon, Hokkaikogane, Kitaakari, Kitahime, May Queen, Sayaka, Snowden, Snow March, and Toyoshiro) using count-based analysis showed exceptional recognition, with potato misclassification rates of 0.08 ± 0.03% (range: 0.01–0.32%) and impurity detection rates of 89.99 ± 1.25% (range: 80.00–93.30%). Cross-farm validation across seven commercial farms in Hokkaido confirmed robust algorithm consistency (PMR: 0.08 ± 0.03%, IDR: 90.56 ± 0.82%) without farm-specific calibration, establishing variety-independent and environment-independent operation. Field validation using weight-based analysis during actual harvesting at 1–4 km/h confirmed successful AI-to-field translation, with 0.22–0.42% potato misclassification and adaptive impurity removal of 71.43–85.29%. The system adapted intelligently, employing conservative sorting under high-impurity loads (71.43% removal, 0.33% misclassification) to prioritize potato preservation while maximizing efficiency under standard conditions (85.29% removal, 0.30% misclassification). The dual-evaluation framework successfully bridged the gap between AI accuracy in laboratory settings and effectiveness in agricultural operations. The proposed AI algorithm surpassed project targets for all tested conditions (>60% impurity removal, <1% potato misclassification).
This successful integration demonstrates technical feasibility and commercial viability for widespread agricultural automation, with a validated 50% reduction in labor (four workers to two workers). This implementation provides a comprehensive validation methodology for next-generation autonomous harvesting systems.

1. Introduction

1.1. Background and Motivation

The global agricultural sector faces critical challenges driven by a declining and aging workforce. This issue has been identified as a significant threat to food production in the United Nations Sustainable Development Goals [1]. In particular, Japan has experienced a drastic 34% decrease in its core agricultural workforce, from 1.757 million in 2015 to 1.164 million in 2023. Compounding this issue is the fact that, as of 2023, 71% of these workers were over 65 years old. According to the Ministry of Agriculture, Forestry and Fisheries of Japan, labor shortages result in annual economic losses of approximately one trillion yen owing to declining productivity and abandoned farmlands [2]. Among the major crops, potatoes play a crucial role in global food sustainability owing to their high nutritional value and climate adaptability [3,4]; however, their harvesting process remains significantly more labor intensive than that of wheat, rice, or corn. Traditional harvesting operations typically require three to four workers simultaneously—two dedicated to impurity removal and two to quality control—making it the most labor-intensive phase of production [5].
Whereas large-scale operations in Europe and the United States have adopted centralized processing facilities to leverage economies of scale [6], the decentralized agricultural structure of Japan, which is characterized by hilly terrain, narrow fields, and diverse microclimatic conditions, demands on-site processing during harvesting [7,8]. Although centralized facilities utilize advanced artificial intelligence (AI) technologies for automated sorting [9,10,11,12,13], adapting these stationary solutions to Japan’s unique mobile context remains a significant hurdle. Thus, recent advancements in agricultural automation have shifted the focus from full autonomy to human–machine collaboration systems, which offer practical and economically viable solutions by leveraging the complementary strengths of human decision-making and machine precision [14].

1.2. Key Challenges in Mobile AI Implementation

This study addresses two primary challenges, which have previously been treated as separate constraints.
The first challenge involves ensuring reliable AI recognition under dynamic field conditions. Adapting AI systems to diverse environmental conditions in outdoor farming locations presents significant challenges [15]. It requires a model capable of precise identification across various conditions, such as variable lighting, mechanical vibrations, and diverse potato varieties. The second challenge concerns the creation of a scalable framework for system expansion, necessitating systematic processes for onsite data collection and continuous model adaptation as the system is deployed at new farming locations.

1.3. Objectives and Contributions

1.3.1. Objectives and System Description

To address these challenges, this study developed a mobile AI-powered impurity removal system that was integrated into the TOP1e series potato harvester, designed specifically for Japan’s decentralized farming environment. As shown in Figure 1, the system operates on the first sorting conveyor alongside human sorters, with the aim of reducing the labor requirements from four to two workers while maintaining the harvest throughput. Building on preliminary research from the Strategic Innovation Creation Program [16], this study pursued two primary objectives: (1) ensuring reliable AI recognition under dynamic field conditions and (2) establishing a scalable framework for systematic adaptation as deployment expands to new farming locations.

1.3.2. Three Key Contributions

This study makes three key contributions to agricultural automation: (1) Development of a mobile AI system: We developed the first mobile AI-powered impurity removal system that can be integrated directly into commercial potato harvesters, demonstrating real-time processing (30–40 FPS) and a 50% labor reduction capability. (2) Establishment of a dual-evaluation framework: We established a methodology that combines count-based AI algorithm metrics with weight-based field assessments, bridging the gap between laboratory accuracy and operational agricultural effectiveness. (3) Cross-environment validation: We confirmed robust system operation across 10 morphologically diverse potato varieties and seven commercial farms without variety-specific calibration, establishing scalability for decentralized agricultural operations. This restructuring clearly delineates the essential differences between our work and existing stationary facility-based systems.

2. Proposition

The proposed system comprises three interconnected components to enable efficient onsite implementation and adaptation: a standalone data-collection system for base AI model training, an adaptive machine-learning strategy for continuous improvement, and a real-time AI-based impurity removal device.

2.1. Standalone Data-Collection System

The standalone data-collection system shown in Figure 2 enables seamless integration with the harvesters operating in actual farming environments for base AI model training data acquisition. The system design prioritizes two key aspects: consistent state image acquisition and non-interference with regular harvesting operations.
This implementation incorporates a Jetson Nano computer (NVIDIA, Santa Clara, CA, USA), which was selected for cost-effectiveness in the camera control system integration on an Ubuntu 20.04 LTS operating system. The internal configuration of the control box, depicted in Figure 2a, includes embedded system components with an uninterruptible power supply to provide reliable data-storage capabilities. The image acquisition system consists of a Basler acA2040-55uc camera (Basler, Ahrensburg, Germany) coupled with an H0514-MP2 low-distortion wide-angle lens (Computar, Tokyo, Japan). This industrial-grade camera configuration enables precise control of the brightness, exposure time, gain, and region of interest. The lens specifications ensure an optimal focal length and minimal distortion at a working distance of approximately 60 cm from the conveyor belt.
This system incorporates mechanisms for controlling the exposure time and gain to maintain a consistent image quality under changing outdoor conditions. A key component is the 18% reflectance gray panel, which serves as a reference area for dynamic exposure adjustment and allows for adaptation to changing natural light. This approach is particularly crucial because of the computational limitations of implementing complex AI architectures that can handle widely varying lighting.
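The gray-panel exposure control described above can be illustrated with a minimal sketch. The function below is a hypothetical reconstruction, not the production firmware: it samples the mean luminance of the 18% gray panel's region of interest and scales the exposure time proportionally toward a target value. The target luminance, exposure limits, and function names are assumptions for illustration.

```python
import numpy as np

TARGET_LUMA = 118          # assumed 8-bit target for an 18% reflectance gray panel
EXPOSURE_MIN_US = 100      # assumed camera exposure limits (microseconds)
EXPOSURE_MAX_US = 10000

def adjust_exposure(frame: np.ndarray, panel_roi, current_exposure_us: float) -> float:
    """Scale exposure so the gray-panel ROI reaches the target luminance.

    frame: HxWx3 uint8 image; panel_roi: (x, y, w, h) of the reference panel.
    Sensor response is assumed roughly linear in exposure below saturation.
    """
    x, y, w, h = panel_roi
    panel = frame[y:y + h, x:x + w]
    luma = panel.mean()                        # average brightness of the panel
    if luma <= 0:
        return current_exposure_us             # guard against fully black frames
    new_exposure = current_exposure_us * (TARGET_LUMA / luma)
    return float(np.clip(new_exposure, EXPOSURE_MIN_US, EXPOSURE_MAX_US))
```

In practice the computed value would be written back to the camera between frames, so the panel continuously anchors the image brightness as natural light changes.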
Furthermore, the automated data-collection process, which is synchronized with the tractor power, enables automatic recording initiation, data preservation, and system shutdown while minimizing operator intervention. Video-format recordings are saved on external SD cards, with the data collected twice weekly for optimized storage. During experimental implementation spanning 50 d, four operational systems successfully acquired 200,000 high-quality training images from seven different Hokkaido farms. The quality and effectiveness of this approach can be observed in Figure 2b, which shows typical field images captured by the Basler camera during actual operation. The field deployment implementation, shown in Figure 2c, validated the seamless integration with commercial harvesting operations. The compact control box design enables direct mounting on existing harvesters without modifying the standard operating procedures, whereas the weatherproof enclosure ensures reliable operation under field conditions. This integration approach facilitates data collection across multiple farming operations simultaneously, maximizing dataset diversity while minimizing operational disruption to commercial harvesting activities.

2.2. Adaptive Machine-Learning Strategy for Continuous Improvement

To address the challenge of adapting AI systems to diverse environmental conditions in outdoor farming locations, the proposed adaptive machine-learning strategy enables the rapid incorporation of field data and continuous adaptation to varying conditions, as illustrated in the comprehensive workflow diagram (Figure 3).
This strategy consists of two main components: base model preparation (Figure 3b, upper section) and data feedback for adaptation (Figure 3b, lower section). The overall annotation efficiency is achieved through the semi-automatic workflow shown in Figure 3a.
Semi-automatic annotation workflow: The annotation process begins with automated video collection to produce approximately 100,000 raw images from the standalone data-collection system (Figure 2). This large corpus represents diverse field conditions that are encountered during actual harvesting operations. From these 100,000 images, 500 representative samples are manually annotated by trained agricultural experts to create the initial training dataset. These manually annotated samples are carefully selected to represent varying soil moisture levels, lighting conditions, and impurity compositions. The manual annotation establishes ground-truth labels for potatoes, soil clumps, and stones, with precise bounding box coordinates and class assignments. The annotation process uses an in-house-developed semi-automatic annotation program based on the YOLOv5 architecture, which is specifically designed for agricultural field data processing. This custom tool generates initial bounding-box predictions for detected objects, which are then systematically reviewed and corrected by trained agricultural experts. The YOLOv5-based annotation tool serves as a labeling assistant to accelerate the dataset preparation phase, whereas the final object detection system deployed on the harvester employs the YOLOX-small architecture that is optimized for real-time edge computing performance.
The initial 500 annotated images (labeled using the YOLOv5-based annotation tool with manual verification) are used to train the base YOLOX model following the training procedure described in Section 3.3. This base model is then deployed to perform inference on an expanded dataset of 2000 + n images automatically, generating predicted bounding boxes and class labels. The inference results undergo systematic manual revision by trained experts who compare the predicted annotations against the actual object identities in the images. Corrections are made to misclassified objects, missed detections, and incorrectly positioned bounding boxes. This manually revised dataset is fed back into the training pipeline, creating an improved model through iterative machine-learning cycles that are repeated n times until the detection accuracy stabilizes.
This iterative refinement typically requires 3–5 cycles to achieve satisfactory accuracy for new field conditions. Each iteration expands the annotated dataset while progressively improving the model capability for environment-specific characteristics such as local soil color variations and lighting patterns. The semi-automatic approach dramatically reduces the annotation workload: processing 2000+ images requires approximately 12 h including all revision cycles, compared with an estimated 200+ hours for fully manual annotation of an equivalent quantity.
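The train-infer-revise cycle described above can be summarized as a short control loop. This is an illustrative sketch, not the authors' code: the `train`, `infer`, and `revise` callables stand in for YOLOX training, batch inference, and expert correction, and the stopping rule (accuracy change below a tolerance, capped at five cycles) mirrors the "repeated n times until the detection accuracy stabilizes" description.

```python
def semi_automatic_annotation(initial_labels, pool, train, infer, revise,
                              max_cycles=5, tol=0.005):
    """Iteratively grow an annotated set: train -> auto-label -> expert revision.

    initial_labels: manually annotated seed samples (e.g., the 500 images).
    pool: unlabeled images to auto-annotate (e.g., the 2000+n image set).
    Stops when validation accuracy changes by less than `tol` between cycles.
    """
    dataset = list(initial_labels)
    prev_acc = 0.0
    for cycle in range(max_cycles):
        model = train(dataset)                 # retrain on current annotations
        predictions = infer(model, pool)       # auto-label the unlabeled pool
        dataset = revise(dataset, predictions) # experts correct the auto labels
        acc = model["accuracy"]
        if abs(acc - prev_acc) < tol:
            break                              # accuracy has stabilized
        prev_acc = acc
    return dataset, cycle + 1
```

The loop returns both the enlarged dataset and the number of cycles used, matching the reported 3–5 iterations in typical deployments.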
Base model preparation phase (Figure 3b, upper): In this initial phase, the data collected from the standalone data-acquisition system undergo AI-assisted annotation following the semi-automatic workflow described above. The resulting Annotation Dataset #1 combines manually annotated samples with iteratively refined predictions to train the base AI model. This base model incorporates representative data from multiple farms, emphasizing soil moisture conditions, surface adherence patterns, and structural features of the conveyor belt.
Data feedback for adaptation phase (Figure 3b, lower): Once the system is deployed in the AI potato harvester, it continuously collects operational data that undergo quality control through the data uniformizer. This uniformizer maintains a consistent image quality by filtering out problematic frames with extreme lighting conditions, motion blur, or mechanical occlusions. High-quality data from harvester operations then undergo secondary AI-assisted annotation (AI-assisted annotation *2), generating Annotated Dataset #2 that captures the environment-specific characteristics of each deployment site. The updated dataset is fed back into the AI model, enabling continuous refinement and adaptation to local field conditions.
Environment-specific optimization: This strategy focuses on selecting data with distinct class characteristics while optimizing the model for class object distribution, soil-related surface characteristics, and conveyor belt structural features that are specific to each deployment site. The rapid adaptation capability of the system enables same-day model customization when it is deployed to new farms, requiring only a representative image collection and targeted manual revision of model predictions, rather than complete dataset re-annotation. This systematic approach ensures reliable results across different farming environments while efficiently utilizing the limited training data and time required for expert annotation.

2.3. Real-Time AI-Based Impurity Removal System

The mechanical configuration depicted in Figure 4 integrates the impurity removal system within the TOP1e series offset-type potato harvester (type: TOP1eCVWHZ, manufactured by Toyo Agricultural Machinery Manufacturing Co., Ltd., Obihiro, Japan), a commercial harvesting platform designed for Japanese agricultural conditions. The sorting unit incorporates a high-speed pneumatic finger-flip mechanism powered by an air cylinder. The conceptual design and system architecture, which build upon our previous work [17], are shown in Figure 4a, whereas the actual implementation on the TOP1e potato harvester is illustrated in Figure 4b. A Basler RGB camera (Basler, Ahrensburg, Germany) mounted above the main conveyor captures images for real-time processing. The production system uses an NVIDIA Jetson AGX Orin 64 GB Developer Kit (NVIDIA, Santa Clara, CA, USA) as the AI processing unit (AIPU), featuring Ampere GPU architecture with 2048 CUDA cores and AI capabilities up to 275 TOPS. The same Basler acA2040-55uc camera with an IMX265 CMOS sensor providing 2048 × 1536-pixel resolution is employed for the image acquisition to maintain consistency with the configuration of the data-collection system. The 18% reflectance standard gray reference panel enables automatic brightness correction through continuous luminance monitoring under a 10 ms exposure range, ensuring a consistent image quality independent of field lighting. The AI analyzes images to detect objects and sends class and position information to the programmable logic controller (PLC). The PLC controls the finger-flip component based on the received information. The system employs a dual-conveyor configuration to facilitate efficient material separation and sorting, as illustrated in the inside view in Figure 4c. The potatoes continue to pass naturally onto the potato conveyor belt through inertial motion, and the finger-flip mechanism actively diverts the identified impurities onto a separate impurity conveyor belt.
The implemented program architecture consists of two primary functional blocks (Figure 5), namely the AIPU and PLC, which operate the integrated AI-driven potato sorting system. This optimized architecture establishes a streamlined data flow and processing pipeline, supporting both real-time operation and subsequent analysis capabilities. The operational workflow is initiated by the Basler camera capturing images for AI-based object detection using YOLOX inference to generate class information and coordinate the data. The AIPU processes this information and generates appropriate control signals for the PLC system within approximately 20 ms from image input to control signal generation.
The architecture employs an asynchronous communication strategy using queue-based systems to optimize system efficiency and minimize processing delays. Recent advances in IoT communication protocols have emphasized the importance of optimized data transmission strategies for agricultural applications [18]. Whereas TCP/IP communication between the AIPU and PLC requires only 1–2 ms, synchronous processing introduces CPU/GPU idle time, degrading the overall efficiency from an optimal 15 ms processing time to approximately 25 ms. The implemented queue-based asynchronous processing eliminates this idle time by decoupling the control signal generation from the communication tasks. The asynchronous architecture maintains practical synchronous operation owing to the significant time difference between control signal generation (15 ms) and communication latency (1–2 ms). This design ensures that the control signals are transmitted to the PLC without affecting the core AI processing pipeline, while the PLC system actuates the cylinder-based finger-flip mechanism based on the received control signals to enable precise impurity removal, as illustrated in the system architecture flow diagram (Figure 5).
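The queue-based decoupling described above can be sketched with standard Python threading primitives. This is a simplified illustration under stated assumptions, not the deployed AIPU code: `detect` stands in for the ~15 ms YOLOX inference step and `send` for the 1–2 ms TCP/IP transmission to the PLC, with a bounded queue absorbing the timing difference so the inference thread never idles on communication.

```python
import queue
import threading

# Bounded hand-off buffer between the inference (producer) and
# PLC-communication (consumer) threads; size 64 is an assumption.
signal_queue: "queue.Queue" = queue.Queue(maxsize=64)

def inference_loop(frames, detect):
    """Producer: generate control signals without waiting on the PLC link."""
    for frame in frames:
        signal = detect(frame)       # ~15 ms of AI processing per frame
        signal_queue.put(signal)     # enqueue; does not block on TCP latency
    signal_queue.put(None)           # sentinel: no more frames

def plc_sender(send):
    """Consumer: drain the queue and transmit over the fast TCP/IP link."""
    delivered = []
    while True:
        signal = signal_queue.get()
        if signal is None:
            break                    # shut down cleanly on the sentinel
        send(signal)                 # e.g., socket.sendall(serialized signal)
        delivered.append(signal)
    return delivered
```

Because dequeuing and transmission together take only a couple of milliseconds, the consumer keeps pace with the producer and the system behaves as effectively synchronous while avoiding CPU/GPU idle time.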
The data logging system operates on an external M.2 solid-state drive storage with dual-purpose data collection: raw camera images are saved at 10 FPS specifically for annotation dataset development, whereas the YOLOX inference results with annotated detection boxes are saved at 30 FPS for verification purposes. The PLC communication logs are analyzed to validate the actual control execution, thereby ensuring system reliability and providing data for optimization.

3. Materials and Methods

3.1. Base Model Preparation and Model Optimization

Developing accurate AI models for potato harvesting requires careful balancing of the training datasets to prevent recognition bias [19]. Artificial soil clumps were generated and incorporated into the datasets to address the data imbalance observed at the harvesting sites (90% potatoes, 10% impurities). In addition, lower-resolution footage from previous RealSense cameras (Intel, Santa Clara, CA, USA) [20], with inferior resolution compared with Basler cameras (Basler, Ahrensburg, Germany), was integrated into the training data. This deliberate inclusion of lower-resolution images helped to reduce model bias and enhance the accuracy by exposing the model to varying image qualities [20]. Including conveyor belt structure data was particularly crucial owing to the belt design of the TOP1e harvester (Toyo Agricultural Machinery Manufacturing Co., Ltd., Obihiro, Japan) featuring protrusions after every two rows. These protrusions occasionally obstructed the camera visibility at the dropping points, posing challenges for accurate object detection.
An incremental strategy was adopted for final training to provide a lightweight base model that could be enhanced with environment-specific data while maintaining real-time capabilities on the edge devices. The base model incorporated representative data from multiple farms, emphasizing soil moisture conditions and surface adherence patterns. A 6:4 ratio was maintained between the potato and foreign-object classes to minimize potato loss from false detections.

3.2. Variety Selection and Morphology Classification

Ten morphologically diverse potato varieties were selected for validation based on their commercial production significance in Hokkaido: Danshaku, Harrow Moon, Hokkaikogane, Kitaakari, Kitahime, May Queen, Sayaka, Snowden, Snow March, and Toyoshiro. For cross-variety analysis, each variety was evaluated using 200 standardized field images that were captured during normal harvesting operations. A total of 2000 images were analyzed. Morphology-based classification was not employed, as the preliminary analysis indicated that the system operation was independent of the potato shape characteristics. This approach enabled variety-independent operation without morphological categorization or variety-specific parameter adjustment.

3.3. AI Model Architecture and Optimization

Prior to selecting the final architecture, several object detection models were evaluated, including the Single Shot Detector (SSD) and various YOLO-series variants. Comparative testing revealed that the YOLO architectures outperformed the SSD in terms of both processing speed and detection accuracy, making them more suitable for real-time applications in dynamic field environments [21,22,23]. Specifically, YOLO achieved superior true positive detection and positional accuracy in outdoor settings, as observed in urban surveillance and agricultural applications [22]. Among the YOLO series, YOLOX was selected owing to its permissive Apache-2.0 license, as well as its anchor-free design and advantages in mobile deployment speed, which are critical for edge computing on agricultural machinery [24,25,26]. The anchor-free architecture of YOLOX simplifies the detection and enhances the speed, making it ideal for resource-constrained environments such as Jetson AGX Orin (NVIDIA, Santa Clara, CA, USA).
The detailed characteristics and evaluation of the YOLOX-small model used in this study were comprehensively documented in our previous work [17]. Recent comparisons with YOLOv12 showed that YOLOX exhibits superior processing speed (107 FPS vs. 45 FPS, p < 0.01) and energy efficiency (0.58 J/frame vs. 0.75 J/frame) on the Jetson AGX Orin platform, where medium and large model variants lag owing to hardware constraints and increased computational demands. These findings align with the size-specific precision and recall analyses in our previous work [17], in which YOLOX maintained balanced performance across potato, soil clump, and stone classes. In addition, the selection of the YOLOX-small model over the medium and large variants was based on its faster initialization and processing speeds on edge devices, as larger models are impractical owing to their higher computational demands [27,28,29]. Recent studies have confirmed that YOLOX architectures maintain superior performance-to-resource ratios on edge computing platforms across diverse real-time applications [27], thereby validating our selection for resource-constrained mobile agricultural equipment.
Selecting default hyperparameters ensured reproducibility and leveraged proven optimization settings from the original YOLOX implementation. TensorRT acceleration provides necessary inference speed optimization for real-time processing while maintaining model accuracy. YOLOX-small, with approximately 9.8 million parameters, yields a superior performance-to-resource ratio compared with larger variants while maintaining sufficient accuracy for agricultural applications [28]. The effectiveness of the YOLOX architecture has been validated across diverse applications including autonomous systems [30], confirming its suitability for real-time agricultural applications. The lightweight nature of the selected architecture enables deployment on mobile agricultural equipment, where computational resources and power consumption are constrained [28,29].
Training dataset configuration: The YOLOX-small model was trained using the official YOLOX repository (version 0.1.1, Megvii-BaseDetection, Beijing, China) with the following specifications. The dataset was split into 70% training, 20% validation, and 10% testing sets to ensure robust model evaluation. Training was conducted for 300 epochs without early stopping criteria to allow for complete convergence. The batch size was set to 24, optimized for available training hardware consisting of an Intel Core i9-14900KF processor (Intel, Santa Clara, CA, USA) and an NVIDIA RTX 4090 GPU (NVIDIA, Santa Clara, CA, USA). All other hyperparameters, including the SGD optimizer with a momentum of 0.9, an initial learning rate of 0.01 with a cosine annealing schedule, and a weight decay of 5 × 10−4, followed the default YOLOX-small configuration to ensure reproducibility. Standard data augmentation techniques were applied during training, including random horizontal flipping, mosaic augmentation, and MixUp with an alpha parameter of 0.5. The training process required approximately 14 h on the specified hardware configuration.
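The 70/20/10 split described above can be reproduced with a few lines of code. This is a generic sketch of the partitioning step (the seed and function name are illustrative, not from the paper); shuffling before splitting avoids ordering bias from images collected farm by farm.

```python
import random

def split_dataset(image_ids, seed=0):
    """Shuffle and partition image IDs into 70% train / 20% val / 10% test.

    A fixed seed makes the split reproducible across training runs.
    """
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)         # deterministic shuffle
    n = len(ids)
    n_train = int(0.7 * n)
    n_val = int(0.2 * n)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]             # remainder (~10%) for testing
    return train, val, test
```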
Model configuration: YOLOX-small with default hyperparameters was used for the architecture. The input resolution was set to 640 × 640 pixels. The inference optimization employed TensorRT (version 8.5.2) acceleration on NVIDIA Jetson AGX Orin. The target processing speed exceeded 30 FPS. Standard TensorRT optimization without additional compression was applied for model compression.
Training strategy: The base model training incorporated representative data from multiple farms, with an emphasis on soil moisture conditions and surface adherence patterns. The data ratio was carefully optimized with 6:4 balance between the potato and foreign-object classes to minimize potato loss from false detections. The incremental learning approach enables rapid adaptation to new field conditions within 12 h, providing a lightweight base model that can be enhanced with environment-specific data while maintaining real-time capability on edge devices.
The field-specific adapted model exhibited excellent results, with the potato classification process achieving 0.997 precision and 0.998 recall, and impurity detection achieving 0.880 precision and 0.970 recall. The entire process, from data collection to model deployment, was completed within approximately 14 h, demonstrating a rapid environmental adaptation capability. The development of standardized agricultural terminology and entity recognition systems supports model consistency across different farming environments [31].

Detection Post-Processing Pipeline

YOLOX detection output undergoes systematic post-processing prior to pneumatic actuation to ensure reliable object classification under field conditions. Non-maximum suppression with an IoU threshold of 0.5 eliminates overlapping detections, retaining only the bounding box with the highest confidence score. Class-specific confidence thresholds were established through field validation: 0.65 for potatoes, 0.55 for soil clumps, and 0.60 for stones. These asymmetric thresholds reflect the operational priority of minimizing potato loss while maintaining effective impurity removal. The pneumatic actuation logic employs object-class-specific strategies that are designed to minimize false rejections. For potato-class objects, the system implements conservative actuation rules in which the pneumatic fingers remain retracted even when the detection boxes partially overlap with the finger mechanism boundaries. This ensures that no potatoes are inadvertently ejected owing to edge detection uncertainty. Conversely, for impurity-class objects (soil clumps and stones), the system requires complete coverage of the finger width by detecting the bounding box before triggering actuation. This full-coverage criterion prevents unnecessary actuations for small debris while ensuring effective removal of substantial impurities. This size-selective actuation strategy naturally handles small impurities (approximately <4 cm) without explicit filtering. Such small objects fail to meet the full-coverage criterion and are consequently not actuated by the pneumatic system. However, these small impurities do not reach downstream sorting stations as they fall through gaps in the rolling conveyor mechanism, which naturally sieves small debris before materials reach the manual sorting areas. This synergy between the AI detection thresholds and the mechanical conveyor design ensures comprehensive impurity management without unnecessary pneumatic interventions for negligible contaminants.
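The actuation decision rules above can be condensed into a single predicate. This is a minimal sketch assuming NMS has already run upstream; the field names, coordinate convention (box and finger extents along the belt axis), and function signature are illustrative, while the per-class thresholds and the full-coverage criterion follow the values stated in the text.

```python
# Class-specific confidence thresholds established through field validation.
CONF_THRESH = {"potato": 0.65, "soil_clump": 0.55, "stone": 0.60}

def should_actuate(det, finger_x0, finger_x1):
    """Decide whether the pneumatic finger fires for one post-NMS detection.

    det: dict with 'cls', 'score', 'x0', 'x1' (bounding box along the belt axis).
    finger_x0, finger_x1: extent of the finger mechanism in the same coordinates.
    """
    if det["score"] < CONF_THRESH[det["cls"]]:
        return False                 # reject low-confidence detections per class
    if det["cls"] == "potato":
        return False                 # conservative rule: never eject potatoes
    # Full-coverage criterion: the impurity box must span the entire finger
    # width, which implicitly ignores small debris (approximately < 4 cm).
    return det["x0"] <= finger_x0 and det["x1"] >= finger_x1
```

Note how the asymmetry is encoded: a potato detection can only suppress actuation, whereas an impurity must both clear its confidence threshold and fully cover the finger before the cylinder is triggered.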

3.4. Performance Evaluation

Two distinct evaluation approaches were employed to address the different validation requirements.
AI model performance evaluation (count-based): The AI model accuracy was assessed using standard computer vision metrics that compare AI object counts with ground-truth annotations. This approach is methodologically required for AI validation because ground-truth datasets inherently consist of discrete labeled objects (potatoes, soil clumps, and stones), regardless of their physical dimensions. Count-based evaluation enables a direct comparison between the predicted and actual classifications, providing fundamental measures of detection precision and recall across different object categories. Thus, the object detection accuracy was calculated on a count basis by measuring the percentage of correctly identified potatoes and impurities across 200,000 images collected from five farms. This evaluation isolated the AI algorithm accuracy from mechanical and environmental factors.
Field performance evaluation (weight-based): The operational effectiveness was measured during actual harvesting operations using weight-based metrics. Materials removed by the sorting mechanism were collected and weighed to calculate the actual removal efficiency and misclassification rates by mass. This approach captured the combined results of AI detection, mechanical actuation, and environmental adaptation under actual agricultural conditions.
This methodology addresses practical agricultural requirements in which economic value, sorting efficiency, and commercial viability are determined by mass rather than individual item counts. Weight-based evaluation provides a direct correlation with farm-gate economic impact, as agricultural sales and quality assessments are fundamentally mass dependent. Similar weight-based assessment approaches have proven effective in automated agricultural sorting systems [32].
The dual-evaluation approach addresses the critical gap between laboratory AI results and the reality of field operations. Count-based metrics validate the algorithm accuracy, whereas weight-based metrics assess the commercial viability under actual working conditions.
The methodological transition from count-based to weight-based evaluation also addresses a practical limitation that is observed during field operations. Soil clumps frequently fragment upon mechanical impact during sorting, creating multiple smaller pieces from single original objects. Count-based metrics would register these fragments as separate detection events, potentially inflating the apparent effectiveness. Weight-based assessment captures the actual mass removed, regardless of fragmentation, thereby providing a more accurate representation of the sorting effectiveness under actual operational conditions.
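The two metric families can be expressed compactly. The paper does not state explicit formulas for the PMR and IDR, so the definitions below are plausible formulations consistent with the descriptions above; note how the weight-based variant is insensitive to soil-clump fragmentation, since only the total removed mass enters the calculation.

```python
# Hedged sketch of the count-based vs. weight-based metrics.
# These formulas are assumed definitions, not quoted from the paper.

def count_based_metrics(potatoes_total, potatoes_misclassified,
                        impurities_total, impurities_detected):
    """PMR/IDR (%) from object counts against ground-truth annotations."""
    pmr = 100.0 * potatoes_misclassified / potatoes_total
    idr = 100.0 * impurities_detected / impurities_total
    return pmr, idr

def weight_based_metrics(potato_mass_total_kg, potato_mass_ejected_kg,
                         impurity_mass_total_kg, impurity_mass_ejected_kg):
    """PMR/IDR (%) by mass; robust to a soil clump fragmenting into
    several pieces, because fragments contribute only their mass."""
    pmr = 100.0 * potato_mass_ejected_kg / potato_mass_total_kg
    idr = 100.0 * impurity_mass_ejected_kg / impurity_mass_total_kg
    return pmr, idr
```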

3.5. Cross-Farm Validation: Dual-Layer Approach

Field validation was conducted across seven commercial potato farms in Shikaoi Town, Hokkaido, during the 2024 harvest season (September–October), as illustrated in Figure 6. The field validation setup in Figure 6a illustrates the TOP1e potato harvester with the integrated AI-driven impurity removal system operating under actual field conditions, whereas Figure 6b presents representative AI object detection results during field testing. A dual-layer validation approach was employed to comprehensively assess both the consistency of the AI algorithm and the integrated field operation.
  • Layer 1: AI Detection Performance Assessment (All Seven Farms)
All seven farms were equipped with identical automated data collection systems to evaluate the consistency of the AI algorithm across diverse field conditions. The camera setup comprised a Basler acA2040-55uc industrial RGB camera (Basler, Ahrensburg, Germany) with an H0514-MP2 lens (Computar, Tokyo, Japan). Brightness control employed an 18% standard reflectance gray panel for automatic exposure adjustment. Data collection was automated via video recording synchronized with the tractor operation for hands-free recording. The YOLOX model was applied to the collected videos to compute the count-based metrics, namely the potato misclassification rate (PMR) and impurity detection rate (IDR).
This standardized setup enabled consistent AI assessment without farm-specific parameter adjustment. All seven farms provided AI detection data through this automated video collection and analysis protocol.
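A gray-panel-driven auto-exposure loop of the kind described can be sketched as a simple proportional controller: measure the mean intensity over the panel region and scale the exposure toward a fixed target. The target value, control form, and exposure limits here are illustrative assumptions; the actual camera control parameters are not published.

```python
# Illustrative auto-exposure controller keyed to a reference gray panel.
# TARGET_MEAN and the exposure limits are assumed values, not specs.

TARGET_MEAN = 118.0   # assumed 8-bit target for an 18% reflectance panel
EXPOSURE_MIN_US, EXPOSURE_MAX_US = 50.0, 20000.0

def panel_mean(frame, roi):
    """Mean intensity of the gray-panel region; frame is a 2D list of
    pixel values, roi = (row0, row1, col0, col1), end-exclusive."""
    r0, r1, c0, c1 = roi
    vals = [frame[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    return sum(vals) / len(vals)

def next_exposure(current_us, measured_mean):
    """Scale the exposure time so the panel mean moves toward the
    target, clamped to the camera's valid exposure range."""
    proposed = current_us * TARGET_MEAN / max(measured_mean, 1.0)
    return min(EXPOSURE_MAX_US, max(EXPOSURE_MIN_US, proposed))
```

For example, a frame in which the panel reads half the target intensity would have its exposure doubled on the next cycle, keeping detection inputs photometrically stable across farms and lighting conditions.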
  • Layer 2: Field Performance Validation (Sasagawahokuto Farm)
Additional intensive field experiments were conducted at the Sasagawahokuto farm to validate that the AI detection metrics translate to actual field operations. Harvesting took place at operational speeds (1–4 km/h) with the pneumatic actuation system engaged. The ejected materials and the potatoes remaining on the conveyor belt were collected separately and weighed using calibrated agricultural scales. The weight-based PMR and IDR were then calculated from the measured masses.
Extensive material handling, precision weighing equipment, and operational coordination were required for this intensive validation. Resource constraints limited the weight-based validation to one representative farm, whereas the AI detection was assessed across all seven sites to confirm the algorithm consistency.
  • Validation Rationale
The dual-layer approach provides comprehensive validation with efficient resource allocation. The AI algorithm consistency was confirmed across seven diverse farm conditions (Layer 1). AI-to-field translation was validated through intensive weight-based measurements at a representative farm (Layer 2).
The AI detection results for Sasagawahokuto (Layer 1) were compared with its field experiment results (Layer 2) to confirm that the count-based AI metrics successfully predicted the weight-based operational effectiveness.
  • Experimental Protocol
The field experiments at Sasagawahokuto involved a 15 m long test field with 0.75 m furrow widths. Three replicate passes were conducted at operational speeds of 1–4 km/h to evaluate the real-world capabilities. The system implemented specific actuation rules that maintained retracted positions for potato-classified objects and limited actuation based on detected impurities to meet the target metrics (<1% potato misclassification, >60% impurity removal). Testing beyond 4 km/h was not pursued because the downstream manual quality control operators (who are responsible for removing damaged and green potatoes) could not maintain processing efficiency at higher speeds, representing a system-level constraint that is independent of the AI performance. This speed range (1–4 km/h) reflects actual commercial harvesting conditions under which the AI system must operate alongside the remaining human workers in the current implementation.

4. Results

4.1. Proof-of-Concept Performance Validation

This proof-of-concept study successfully validated the technical feasibility and commercial viability of mobile AI-based impurity removal for decentralized potato harvesting applications. A comprehensive dual-evaluation methodology combining count-based AI algorithm metrics with weight-based field assessment was used for the validation. The count-based evaluations across ten potato varieties and seven commercial farms showed exceptional AI detection consistency (PMR: 0.08 ± 0.03%, IDR: 89.99 ± 1.25%), whereas the weight-based field validation during actual harvesting operations confirmed successful translation to operational effectiveness (PMR: 0.22–0.42%, IDR: 71.43–85.29% depending on field conditions). The prototype system consistently exceeded project targets (>60% impurity removal, <1% potato misclassification) across all tested conditions, validating the fundamental approach to AI-centric mobile agricultural automation.

4.2. Variety-Specific Detection Performance

The capability of the AI algorithm was evaluated across 10 morphologically diverse potato varieties representing primary cultivars used in Hokkaido commercial production. Each variety was assessed using approximately 200 standardized field images that were captured during normal harvesting operations, totaling 2000 images for cross-variety analysis. Automated video collection systems installed across multiple commercial farms were employed for validation to ensure representative sampling under authentic field conditions. No morphology-based classification was employed, which allowed the system to process all varieties using identical detection parameters without cultivar-specific calibration.
As shown in Figure 7 and detailed in Table 1, all 10 varieties showed consistent AI detection results. The cross-variety statistics revealed PMRs of 0.08 ± 0.03% (range: 0.01–0.32%) and IDRs of 89.99 ± 1.25% (range: 80.00–93.30%), with all varieties achieving the required thresholds (<1% PMR, >60% IDR). The minimal cross-variety standard errors (0.03% for PMR, 1.25% for IDR) demonstrate that the system operation is fundamentally independent of the potato variety and morphological characteristics.
Notably, the May Queen variety, which exhibits an elongated morphology, achieved a 0.10% PMR and 92.31% IDR, which are comparable results to those of round varieties such as Kitahime (0.01% PMR, 93.30% IDR) and Toyoshiro (0.02% PMR, 92.44% IDR). This variety-independent operation confirms that the system maintains consistent effectiveness across diverse potato shapes without morphology-based classification or variety-specific parameter adjustment, establishing a key advantage for commercial deployment in which multiple cultivars may be processed sequentially, without the need for system reconfiguration.

4.3. Cross-Farm AI Detection Consistency

Cross-farm validation was conducted across seven commercial potato farms in Shikaoi Town, Hokkaido, during the 2024 harvest season (September–October) to evaluate the robustness of the AI algorithm under naturally varying field conditions. All seven farms were equipped with identical automated data collection systems featuring Basler acA2040-55uc industrial RGB cameras with H0514-MP2 lenses and 18% standard reflectance gray panels for automatic brightness control. Video recordings were collected during normal harvesting operations and subsequently analyzed using the trained YOLOX model for count-based detection assessment without any farm-specific parameter adjustments.
As shown in Table 2, the AI detection achieved robust consistency across all seven farms. The cross-farm statistics revealed PMRs of 0.08 ± 0.03% (range: 0.02–0.24%) and IDRs of 90.56 ± 0.82% (range: 87.00–94.00%), with all farms meeting both required thresholds (<1% PMR, >60% IDR). The minimal cross-farm standard errors indicate that the AI detection was consistent across naturally occurring field condition variations, including soil types, moisture levels, lighting conditions, and impurity compositions. No farm-specific calibration was required, thereby validating the robustness of the base model training strategy and effectiveness of the automated brightness control system in adapting to diverse environmental conditions.
The Sasagawahokuto farm, which exhibited AI detection that was consistent with the cross-farm average (0.17% PMR, 91.00% IDR), was selected for additional intensive field validation experiments involving actual harvesting operations with a fully integrated pneumatic actuation system. As detailed in Section 3.5, weight-based assessment was conducted at operational speeds of 1–4 km/h. All ejected materials and remaining potatoes were collected and weighed to validate the translation from count-based AI detection metrics to weight-based operational effectiveness.
This cross-farm validation yielded three critical findings: (1) the robustness of the AI algorithm across diverse farm environments without site-specific tuning, (2) the predictability of field results based on AI detection metrics, and (3) commercial deployment readiness with consistent effectiveness across multiple operational sites.

4.4. Weight-Based Performance Validation Across Operational Speeds

Real-world harvesting validation revealed critical differences between the capabilities of the AI algorithm alone and the operation of the integrated system under actual field conditions, as shown in Figure 8. The weight-based analysis during actual harvesting operations showed PMRs of 0.22–0.42%, slightly higher than the pure AI results owing to mechanical factors, vibration, and dynamic field conditions; however, the <1% target was consistently maintained.
Adaptive field condition operation: The IDRs showed intelligent system adaptation to varying field conditions. The standard conditions at 1–2 km/h yielded 85.04 ± 0.89% and 85.29 ± 0.77% removal, respectively. Extended testing at 3 km/h (71.43 ± 3.12%) was conducted at farms with significantly higher impurity loads, where conservative sorting parameters were employed to prioritize potato preservation over maximum removal rates. This adaptive strategy successfully maintained misclassification rates of 0.33 ± 0.22% while handling challenging field conditions.
AI-to-field translation: A comparison of the results for the AI algorithm (Figure 7) and field system operation (Figure 8) revealed successful translation from laboratory conditions to agricultural reality. Although the field misclassification rates (0.22–0.42%) exceeded the pure AI rates (0.01–0.32%), both remained well within commercial specifications, validating the robustness of the integrated system design.

5. Discussion

The deployment of real-time AI processing under dynamic field conditions establishes new possibilities for developing autonomous agricultural machinery. As opposed to existing stationary systems that operate in controlled indoor environments, this mobile approach demonstrates the viability of AI in field conditions characterized by varying lighting, mechanical vibrations, and power constraints. The integration of AI-powered impurity removal directly into mobile potato harvesting machinery addresses the unique requirements of decentralized agricultural operations in which centralized processing facilities are not economically viable.

5.1. Prototype Validation and Commercial Development Pathway

The real-time AI processing deployed under dynamic field conditions demonstrated significant technical feasibility, marking a departure from existing stationary facility-based systems [33]. The AIPU achieved a consistent processing time of 20 ms from image capture to control signal generation, with the end-to-end system response averaging 40 FPS (reaching 50 FPS at 2 km/h). Recent advances in IoT communication protocols have emphasized the importance of optimized data transmission strategies [15]. Accordingly, the asynchronous queue-based communication architecture successfully eliminated CPU/GPU idle time, ensuring that the 50 ms pneumatic cylinder response time provided adequate margins at all tested speeds. These specifications thus support reliable field operation, consistent with requirements reported for mobile agricultural robotics [34].
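The quoted timing figures can be checked with a back-of-the-envelope latency budget: the distance an object travels on the conveyor during the detection-to-ejection latency must stay below the camera-to-finger spacing. The spacing used below is a hypothetical illustrative value (the actual geometry is not published), and the sketch assumes the transport speed roughly tracks the ground speed.

```python
# Latency-budget check for the figures quoted above: 20 ms inference
# plus 50 ms pneumatic response. The camera-to-finger distance is an
# assumed illustrative value, not a published specification.

INFERENCE_S = 0.020   # AIPU capture-to-command time
PNEUMATIC_S = 0.050   # pneumatic cylinder response time

def travel_during_latency_cm(speed_kmh):
    """Distance (cm) an object moves during the total latency,
    assuming the conveyor transport speed tracks the ground speed."""
    speed_ms = speed_kmh / 3.6
    return speed_ms * (INFERENCE_S + PNEUMATIC_S) * 100.0

def margin_ok(speed_kmh, camera_to_finger_cm=20.0):
    """True if the object is still upstream of the fingers when the
    ejection command completes (camera_to_finger_cm is hypothetical)."""
    return travel_during_latency_cm(speed_kmh) < camera_to_finger_cm
```

Even at the 4 km/h upper operating limit, the 70 ms budget corresponds to under 8 cm of travel, which is consistent with the paper's claim of adequate margins across all tested speeds.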
Despite robust computational performance, a detailed analysis of 100 misclassification events revealed that 90% of errors stemmed from mechanical design limitations, rather than algorithmic failures. The primary causes were identified as small potatoes becoming entrapped between conveyor belt protrusions (45%) and rebounding owing to upper conveyor rotation (25%). These findings suggest clear engineering pathways for enhancement, such as refining of the conveyor mechanism to minimize entrapment points. Furthermore, to address infrequent USB 3.0 disconnections caused by vibration, future commercial iterations will incorporate industrial-grade locking connectors (e.g., GigE Vision with M12 connectors) and enhanced vibration-dampening mounts to ensure the continuous operation required for commercial harvesting seasons.

5.2. Technical Achievement and Performance Validation

The dual-evaluation methodology confirmed that the system consistently exceeded project targets (>60% impurity removal, <1% potato misclassification) across all test conditions. One critical finding was the ability of the system to maintain exceptional recognition capabilities across 10 morphologically diverse potato varieties, with misclassification rates ranging from 0.01% to 0.32%. The slightly elevated rate observed for the Snowden variety (0.32%) was attributed to specific environmental lighting conditions rather than to morphological characteristics, confirming that environmental control is the primary determinant of system performance. This variety-independent operation validates the robustness of the YOLOX-small architecture and simplifies commercial deployment by eliminating the need for cultivar-specific calibration.

5.3. Adaptive System Intelligence and Field Robustness

The field validation demonstrated intelligent operational adaptation beyond the simple algorithmic processing. The variation in IDRs (71.43–85.29%) reflects a deliberate adaptive strategy: under high-impurity conditions at 3 km/h, the system automatically prioritized crop preservation, accepting a lower detection rate (71.43%) to maintain a low misclassification rate (0.33%). This capability addresses a fundamental challenge in agricultural automation by balancing competing objectives—efficiency versus quality—under highly variable field conditions, thereby ensuring consistent crop quality standards without manual parameter adjustment.

5.4. Methodological Contribution and Economic Impact

This study highlights the necessity of a comprehensive validation methodology for agricultural AI. While the count-based metrics provided essential baselines for algorithm optimization, the weight-based field metrics determined the ultimate commercial viability and operational success.
The validation approach employed industry-established performance targets (>60% impurity removal, <1% potato misclassification) rather than a direct comparison with manual sorting, reflecting fundamental operational differences between completion-oriented manual work and throughput-oriented AI systems. Manual workers adjust the harvester speed to achieve thorough sorting (slowing or stopping as needed), making their performance inherently variable and speed dependent. In contrast, the AI system maintains consistent 71–85% removal rates at fixed operational speeds without requiring the harvester to slow down, enabling predictable throughput that justifies labor reduction from four to two workers. This throughput-maintenance capability, combined with the achievement of commercial performance thresholds, validates the practical labor reduction claim within the decentralized farming context of Japan.
The practical operational advantages of the AI system extend beyond labor reduction to throughput enhancement. Manual sorting operations typically target a 2 km/h harvesting speed as the baseline operational rate. However, under high-impurity field conditions, manual workers frequently require a speed reduction to 1 km/h or temporary harvester stops to maintain sorting completeness, often falling short of the 2 km/h continuous operation target. In contrast, our field validation demonstrated that the AI system enables continuous operation at 3 km/h, even under high-impurity conditions (a 71.43% removal rate at 3 km/h), achieving uninterrupted harvesting without speed reduction. This capability represents a 1.5× throughput improvement (from the typical 2 km/h manual baseline to 3 km/h AI-assisted operation) while simultaneously reducing the crew size from four to two workers. The combined effect, namely a 50% labor reduction and 1.5× throughput enhancement, substantially amplifies the economic value beyond simple wage savings:
Harvest time reduction: For a typical Hokkaido potato farm harvesting 10 hectares, operations are completed in approximately 4 d (at 3 km/h) vs. 6 d (at 2 km/h with manual sorting), representing a 33% time reduction.
Weather risk mitigation: A shorter harvest duration reduces exposure to adverse weather conditions that can damage the crop quality or halt operations.
Seasonal capacity enhancement: Higher throughput enables more farms to be serviced per harvest season or the cultivated area to be expanded without a proportional equipment investment.
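The roughly 4 d versus 6 d comparison above can be reproduced with a simple traversal-length model: total row length is the field area divided by the furrow width, and harvest time is that length divided by the ground speed. The effective daily harvesting window below is an assumed value chosen for illustration; headland turns and stoppages are ignored.

```python
# Hedged reconstruction of the harvest-time comparison. The furrow
# width comes from the experimental protocol; FIELD_HOURS_PER_DAY is
# an assumption, and headland overhead is ignored.

FURROW_WIDTH_M = 0.75       # from the experimental protocol
FIELD_HOURS_PER_DAY = 11.0  # assumed effective harvesting window

def harvest_days(area_ha, speed_kmh):
    """Days to harvest, treating the field as one continuous row of
    length area / furrow width."""
    row_length_km = area_ha * 10000.0 / FURROW_WIDTH_M / 1000.0
    hours = row_length_km / speed_kmh
    return hours / FIELD_HOURS_PER_DAY
```

Under these assumptions, a 10-hectare farm requires about 6 d at the 2 km/h manual baseline but about 4 d at 3 km/h with AI assistance, matching the 33% time reduction stated above (the speed ratio alone fixes the 2:3 time ratio, so the conclusion is insensitive to the assumed daily window).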
Economically, the system is projected to generate annual labor savings of ¥300,000 per harvester by reducing the crew size from four to two workers. Under a conservative economic model, this allows for investment recovery within the 10-year operational lifespan, aligning with broader studies that have demonstrated positive returns on agricultural automation investments [35]. More importantly, in the context of Japan’s severe labor shortage, the system provides value beyond direct financial returns by enabling harvest operations that might otherwise be unfeasible owing to workforce unavailability.

6. Conclusions

This study successfully demonstrated the first mobile AI-powered impurity removal system integrated directly into commercial potato harvesting machinery. Through a rigorous dual-evaluation framework, we validated both the algorithmic accuracy (count-based PMR: 0.08 ± 0.03%) and operational effectiveness (weight-based PMR: 0.22–0.42%) across diverse agricultural environments. Robust, variety-independent performance was confirmed across 10 potato varieties and seven commercial farms, with the system achieving the target 50% labor reduction while consistently surpassing quality thresholds (>60% impurity removal, <1% potato misclassification).
Detailed failure analysis revealed that 90% of misclassifications originated from mechanical limitations rather than AI detection errors, identifying clear pathways for optimization through refinement of the mechanical system. Furthermore, the system exhibited intelligent environmental adaptation, employing conservative sorting strategies under high-impurity loads to prioritize crop preservation. These findings bridge the critical gap between laboratory AI performance and agricultural reality, establishing a validated technical foundation for commercial deployment.
Future work will focus on implementing the identified mechanical optimizations and conducting multi-seasonal validation across expanded geographical regions. The ability to maintain consistent performance at operational speeds (1–4 km/h) while meeting commercial thresholds demonstrates technical feasibility for the target 50% labor reduction in current harvesting systems. By validating both the technical performance and economic viability in decentralized farming environments, this study offers a scalable solution for labor-intensive operations globally, advancing the sustainability of agriculture under changing demographic conditions.

Author Contributions

J.K., AI model development, software development, data analysis, writing—original draft; K.T., S.T. and K.F., system integration, hardware configuration, writing—review and editing; G.K. and R.Y., data collection, image annotation, software development; N.D., statistical analysis, visualization, field coordination, field validation, experimental design; S.T. and Y.M., experiment design, agronomic consultation, variety selection; K.F. and Y.M., mechanical system design, industrial implementation, funding acquisition. All authors read and approved the final version of the manuscript and agreed to be accountable for all aspects of the work.

Funding

This study was supported by the Agricultural Machinery Technology Cluster Project of The Institute of Agricultural Machinery, National Agriculture and Food Research Organization (IAM/NARO).

Data Availability Statement

Performance evaluation data supporting the findings of this study are included in the published figures and tables. The YOLOX model architecture and training methodology are described in the Methods section. The trained model weights, proprietary source code, and raw training datasets contain commercially sensitive information and intellectual property of Toyo Agricultural Machinery Manufacturing Co., Ltd. and are not publicly available. Aggregated datasets and statistical analysis results are available from the corresponding author upon reasonable request, subject to confidentiality agreements and approval from collaborating organizations.

Acknowledgments

The authors acknowledge the foundational research conducted through the Strategic Innovation Creation Program Smart Bio-industry and Agricultural Infrastructure Technology project, which provided valuable insights for the system development. We extend our sincere gratitude to the Tokachi Federation of Agricultural Cooperative for facilitating collaboration with the Hokkaido farms and assisting with the annotation work. We are particularly grateful to Noriyuki Murakami for providing his valuable suggestions during system development, Kainuma Hideo for his extensive support during the field experiments, and Uchimura Yuki for consultation on the data-collection system. In addition, we would like to thank Unseok Lee for providing the semi-automatic annotation program, which significantly facilitated our research process.

Conflicts of Interest

K.F. is affiliated with Toyo Agricultural Machinery Manufacturing Co., Ltd., which provided the harvester platform for this study. The research was funded by public grants. The other authors declare no competing interests.

References

  1. Food and Agriculture Organization. The Future of Food and Agriculture: Trends and Challenges; Food and Agriculture Organization of the United Nations: Rome, Italy, 2017. [Google Scholar]
  2. Ministry of Agriculture, Forestry and Fisheries. The 96th Statistical Yearbook of Ministry of Agriculture, Forestry and Fisheries; Ministry of Agriculture, Forestry and Fisheries of Japan: Tokyo, Japan, 2023.
  3. Wijesinha-Bettoni, R.; Mouillé, B. The contribution of potatoes to global food security, nutrition and healthy diets. Am. J. Potato Res. 2019, 96, 139–149. [Google Scholar] [CrossRef]
  4. Devaux, A.; Goffart, J.P.; Kromann, P.; Andrade-Piedra, J.; Polar, V.; Hareau, G. The potato of the future: Opportunities and challenges in sustainable agri-food systems. Potato Res. 2021, 64, 681–720. [Google Scholar] [CrossRef] [PubMed]
  5. Liu, C.; Wu, F.; Gu, F.; Cao, M.; Yang, H.; Shi, L.; Wang, B.; Wang, B. Recent Research Progress on Key Technologies and Equipment for Mechanized Potato Harvesting. Agriculture 2025, 15, 675. [Google Scholar] [CrossRef]
  6. Charlton, D.; Hill, A.E.; Taylor, J.E. Automation and Social Impacts: Winners and Losers; FAO Agricultural Development Economics Working Paper 22-09; Food and Agriculture Organization of the United Nations: Rome, Italy, 2022. [Google Scholar]
  7. Borsellino, V.; Schimmenti, E.; El Bilali, H. Agri-food markets towards sustainable patterns. Sustainability 2020, 12, 2193. [Google Scholar] [CrossRef]
  8. Ministry of Agriculture, Forestry and Fisheries. Annual Report on Food, Agriculture and Rural Areas in Japan FY2023; Ministry of Agriculture, Forestry and Fisheries of Japan: Tokyo, Japan, 2024.
  9. Krishnan, R.S.; Julie, E.G. Computer aided detection of leaf disease in agriculture using convolution neural network based squeeze and excitation network. Automatika 2023, 64, 1038–1053. [Google Scholar] [CrossRef]
  10. Xia, Y.; Tang, M.; Tang, W. Fine-grained potato disease identification based on contrastive convolutional neural networks. Appl. Artif. Intell. 2023, 37, 2166233. [Google Scholar] [CrossRef]
  11. Javaid, M.; Haleem, A.; Khan, I.H.; Suman, R. Understanding the potential applications of artificial intelligence in agriculture sector. Adv. Agrochem 2023, 2, 15–30. [Google Scholar] [CrossRef]
  12. Korchagin, S.A.; Gataullin, S.T.; Osipov, A.V.; Smirnov, M.V.; Suvorov, S.V.; Serdechnyi, D.V.; Bublikov, K.V. Development of an optimal algorithm for detecting damaged and diseased potato tubers moving along a conveyor belt using computer vision systems. Agronomy 2021, 11, 1980. [Google Scholar] [CrossRef]
  13. Johnson, C.M.; Auat Cheein, F. Machinery for potato harvesting: A state-of-the-art review. Front. Plant Sci. 2023, 14, 1156734. [Google Scholar] [CrossRef]
  14. Duckett, T.; Pearson, S.; Blackmore, S.; Grieve, B.; Smith, M. White Paper—Agricultural Robotics: The Future of Robotic Agriculture. UK-RAS White Paper. UK-RAS Network. 2018. Available online: https://uwe-repository.worktribe.com/output/866226 (accessed on 20 November 2025).
  15. Ghazal, S.; Munir, A.; Qureshi, W.S. Computer vision in smart agriculture and precision farming: Techniques and applications. Artif. Intell. Agric. 2024, 13, 64–83. [Google Scholar] [CrossRef]
  16. Fujita, K.; Kurasiki, K.; Fukao, T.; Murakami, N.; Funabiki, K. An automatic impurity removal system on a potato harvester using deep learning. In Proceedings of the 22nd SICE System Integration Division Annual Conference (SI2021), Online, 15–17 December 2021; Paper No. 1G3-04. [Google Scholar]
  17. Kim, J.; Tokuda, K.; Kim, G.; Yoshitoshi, R. Real-time object detection for edge computing-based agricultural automation: A case study comparing the YOLOX and YOLOv12 architectures and their performance in potato harvesting systems. Sensors 2025, 25, 4586. [Google Scholar] [CrossRef] [PubMed]
  18. Kreković, D.; Krivić, P.; Žarko, I.P.; Kušek, M.; Le-Phuoc, D. Reducing communication overhead in the IoT-edge-cloud continuum: A survey on protocols and data reduction strategies. Internet Things 2025, 31, 101553. [Google Scholar] [CrossRef]
  19. Alif, M.A.R.; Hussain, M. YOLOv1 to YOLOv10: A comprehensive review of YOLO variants and their application in the agricultural domain. arXiv 2024, arXiv:2406.10139. [Google Scholar] [CrossRef]
  20. Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019, 6, 60. [Google Scholar] [CrossRef]
  21. More, G.; Patil, O.; More, O.; More, M.; Suryavanshi, S.; Mali, M. Comparison of object detection algorithms CNN, YOLO and SSD. Int. J. Sci. Res. Technol. 2024, 1, 137–144. [Google Scholar]
  22. Badgujar, C.M.; Poulose, A.; Gan, H. Agricultural object detection with You Look Only Once (YOLO) algorithm: A bibliometric and systematic literature review. arXiv 2024, arXiv:2401.10379. [Google Scholar] [CrossRef]
  23. Morera, A.; Sánchez, A.; Moreno, A.B.; Sappa, A.D.; Vélez, J.F. SSD vs. YOLO for detection of outdoor urban advertising panels under multiple variabilities. Sensors 2020, 20, 4587. [Google Scholar] [CrossRef]
  24. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar] [CrossRef]
  25. Megvii-BaseDetection. YOLOX: YOLOX is a High-Performance Anchor-Free Version of YOLO. GitHub. 2021. Available online: https://github.com/Megvii-BaseDetection/YOLOX (accessed on 2 September 2025).
  26. Bao, X.A.; Zhou, L.Q.; Tu, X.M.; Wu, B.; Zhang, Q.Q.; Jin, Y.T.; Zhang, N. Wildlife target detection based on improved YOLOX-s network. Sci. Rep. 2024, 14, 23608. [Google Scholar] [CrossRef]
  27. Rey, L.; Bernardos, A.M.; Dobrzycki, A.D.; Carramiñana, D.; Bergesio, L.; Besada, J.A.; Casar, J.R. A Performance Analysis of You Only Look Once Models for Deployment on Constrained Computational Edge Devices in Drone Applications. Electronics 2025, 14, 638. [Google Scholar] [CrossRef]
  28. Ma, M.Y.; Shen, S.E.; Huang, Y.C. Enhancing UAV visual landing recognition with YOLO’s object detection by onboard edge computing. Sensors 2023, 23, 8999. [Google Scholar] [CrossRef] [PubMed]
  29. Liang, S.; Wu, H.; Zhen, L.; Hua, Q.; Garg, S.; Kaddoum, G.; Hassan, M.M.; Yu, K. Edge YOLO: Real-time intelligent object detection system based on edge-cloud cooperation in autonomous vehicles. IEEE Trans. Intell. Transp. Syst. 2022, 23, 25345–25360. [Google Scholar] [CrossRef]
  30. He, Q.; Xu, A.; Ye, Z.; Zhou, W.; Cai, T. Object detection based on lightweight YOLOX for autonomous driving. Sensors 2023, 23, 7596. [Google Scholar] [CrossRef] [PubMed]
  31. D’Souza, J. Agriculture Named Entity Recognition—Towards FAIR, reusable scholarly contributions in agriculture. Knowledge 2024, 4, 1–26. [Google Scholar] [CrossRef]
  32. Gaikwad, N.N.; Samuel, D.V.K.; Grewal, M.K.; Manjunatha, M. Development of orange grading machine on weight basis. J. Agric. Eng. 2014, 51, 1–8. [Google Scholar] [CrossRef]
  33. Wang, R.F.; Su, W.H. The application of deep learning in the whole potato production chain: A comprehensive review. Agriculture 2024, 14, 1225. [Google Scholar] [CrossRef]
  34. Wei, W.; Xiao, M.; Duan, W.; Wang, H.; Zhu, Y.; Zhai, C.; Geng, G. Research progress on autonomous operation technology for agricultural equipment in large fields. Agriculture 2024, 14, 1473. [Google Scholar] [CrossRef]
  35. Gong, R.; Zhang, H.; Li, G.; He, J. Edge computing-enabled smart agriculture: Technical architectures, practical evolution, and bottleneck breakthroughs. Sensors 2025, 25, 5302. [Google Scholar] [CrossRef] [PubMed]
Figure 1. TOP1e potato harvester configuration with integrated AI impurity removal system: (a) schematic highlighting the first sorting conveyor (yellow) where the AI system is installed, with four workers at the subsequent sorting stations; arrows indicate the direction of object flow, and (b) field operation photograph showing the harvester with workers and AI system integration.
Figure 2. Standalone data-collection system: (a) internal configuration with Jetson Nano in the control box, (b) field image captured by a Basler camera on the conveyor, and (c) field deployment integrated into commercial harvester operations.
Figure 3. Workflow of adaptive machine-learning strategy for continuous improvement. (a) Semi-automatic annotation process: the initial manual annotation of 500 images from a total of 100,000 images creates the base model, which performs inference on 2000 + n images; the manual revision of the inference results enables iterative refinement through n cycles. (b) System architecture showing the base model preparation phase using AI-assisted annotation (*) to create the initial dataset (Annotation Dataset #1), and data feedback adaptation phase where the field data from AI potato harvester undergoes data uniformization and secondary AI-assisted annotation (*) to generate an updated dataset (Annotated Dataset #2) for continuous model improvement.
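The iterative cycle in Figure 3a (seed annotation, base-model inference, manual revision, retraining) can be sketched as a loop. The paper does not publish its training code, so every function name, dataset size default, and label format below is an illustrative stand-in, with stubs in place of actual YOLOX training and human review:

```python
# Illustrative sketch of the semi-automatic annotation cycle in Figure 3a.
# All function names and label strings are hypothetical stand-ins.

def train(labelled):
    """Stub: returns a 'model' that records how many labels it was trained on."""
    return {"n_train": len(labelled)}

def infer(model, images):
    """Stub: auto-annotates a batch (in practice, YOLOX inference)."""
    return [(img, f"auto-label-{model['n_train']}") for img in images]

def manual_revision(predictions):
    """Stub: a human corrects the model's proposed annotations."""
    return [(img, label.replace("auto", "revised")) for img, label in predictions]

def semi_automatic_annotation(pool, seed_size=500, batch_size=2000, cycles=3):
    # 1. Manually annotate a small seed set and train the base model.
    labelled = [(img, "manual-label") for img in pool[:seed_size]]
    model = train(labelled)
    cursor = seed_size
    # 2. Repeat: infer on the next batch, revise by hand, retrain.
    for _ in range(cycles):
        batch = pool[cursor:cursor + batch_size]
        labelled += manual_revision(infer(model, batch))
        model = train(labelled)
        cursor += batch_size
    return model, labelled
```

With the defaults above, three cycles grow the annotated set from 500 seed images to 500 + 3 × 2000 = 6500, mirroring how the figure's "2000 + n images" per cycle compound into the full dataset.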
Figure 4. Integrated AI-driven potato sorting system: (a) conceptual design and architecture, (b) implementation on TOP1e harvester, and (c) internal view showing conveyor and flip mechanism. Note: Panel (a) is adapted from [17].
Figure 5. System architecture for AI-based object detection with asynchronous communication between the AIPU and PLC.
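The asynchronous AIPU-to-PLC pattern in Figure 5 decouples inference from actuator I/O: the detection loop only enqueues rejection commands, and a separate worker forwards them to the PLC, so inference is never blocked by valve latency. A minimal sketch of this producer/consumer pattern, assuming a hypothetical message format (the paper does not specify its fieldbus protocol or payload):

```python
# Minimal sketch of the asynchronous AIPU -> PLC hand-off in Figure 5.
# The command dict and class names are illustrative assumptions.
import queue
import threading

command_queue = queue.Queue()
fired = []  # records commands the "PLC" side actually executed

def plc_worker():
    """Consumes commands asynchronously (stand-in for real PLC I/O)."""
    while True:
        cmd = command_queue.get()
        if cmd is None:          # sentinel: shut down
            break
        fired.append(cmd)        # in practice: trigger a pneumatic actuator
        command_queue.task_done()

def on_detection(frame_id, cls, lane):
    """Called by the AI inference loop for each detected object."""
    if cls == "impurity":
        # Non-blocking hand-off: inference keeps running at full frame rate.
        command_queue.put({"frame": frame_id, "lane": lane})

worker = threading.Thread(target=plc_worker, daemon=True)
worker.start()
on_detection(1, "potato", lane=0)    # no command issued
on_detection(2, "impurity", lane=2)  # queued for the PLC
on_detection(3, "impurity", lane=1)
command_queue.join()                 # wait until the queue is drained
command_queue.put(None)
worker.join()
```

The queue preserves command order while letting the two sides run at independent rates, which is the essence of the asynchronous communication the figure depicts.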
Figure 6. Field validation: (a) TOP1e harvester with integrated AI system and (b) AI object detection results during field tests; the vertical green lines indicate the boundaries of the conveyor belt, and the horizontal blue line marks the dropping position.
Figure 7. Evaluation of AI algorithm performance across 10 potato varieties using count-based analysis: (a) PMRs showing that <1% error was achieved for all varieties and (b) IDRs that consistently exceeded 87% across all varieties. Error bars represent ±1 standard error. The red dashed lines indicate the project performance targets (<1% for PMR and >60% for IDR).
Figure 8. Performance validation of the integrated system under actual field harvesting conditions using weight-based analysis: (a) PMRs consistently maintained at <1% across all operational speeds and (b) IDRs demonstrating adaptive performance based on field conditions. Error bars represent ±1 standard error.
Table 1. AI detection performance across 10 potato varieties.
| Variety | n (Images) | PMR (%) | IDR (%) |
|---|---|---|---|
| Danshaku | 200 | 0.03 ± 0.01 | 90.24 ± 1.60 |
| Harrow Moon | 200 | 0.04 ± 0.02 | 90.88 ± 3.68 |
| Hokkaikogane | 200 | 0.06 ± 0.02 | 91.22 ± 1.67 |
| Kitaakari | 200 | 0.08 ± 0.01 | 87.00 ± 2.20 |
| Kitahime | 200 | 0.01 ± 0.01 | 91.38 ± 2.48 |
| May Queen | 200 | 0.10 ± 0.002 | 92.31 ± 2.50 |
| Sayaka | 200 | 0.04 ± 0.01 | 90.40 ± 1.10 |
| Snowden | 200 | 0.32 ± 0.17 | 93.30 ± 2.80 |
| Snow March | 200 | 0.06 ± 0.01 | 80.00 ± 1.40 |
| Toyoshiro | 200 | 0.02 ± 0.01 | 93.18 ± 2.59 |
| Mean ± SE | 2000 | 0.08 ± 0.03 | 89.99 ± 1.25 |
PMR = potato misclassification rate, IDR = impurity detection rate. Values represent mean ± standard error.
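As an arithmetic cross-check, the summary row of Table 1 can be reproduced directly from the per-variety values, taking the standard error of the mean as the sample standard deviation divided by the square root of the number of varieties:

```python
# Recomputing the "Mean ± SE" row of Table 1 from the per-variety values.
from statistics import mean, stdev
from math import sqrt

pmr = [0.03, 0.04, 0.06, 0.08, 0.01, 0.10, 0.04, 0.32, 0.06, 0.02]
idr = [90.24, 90.88, 91.22, 87.00, 91.38, 92.31, 90.40, 93.30, 80.00, 93.18]

def mean_se(values):
    # Standard error of the mean: sample SD / sqrt(n).
    return mean(values), stdev(values) / sqrt(len(values))

pmr_mean, pmr_se = mean_se(pmr)   # approximately 0.08 +/- 0.03
idr_mean, idr_se = mean_se(idr)   # approximately 89.99 +/- 1.25
```

Both results match the reported 0.08 ± 0.03% (PMR) and 89.99 ± 1.25% (IDR) after rounding to two decimals.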
Table 2. Dual-layer validation across seven commercial potato farms.
| Farm | PMR (%) | IDR (%) |
|---|---|---|
| Sasagawahokuto | 0.17 | 91 |
| Takahashi | 0.03 | 92 |
| Kunizima | 0.04 | 90 |
| Nagaya | 0.03 | 87 |
| Ono | 0.04 | 94 |
| Sato | 0.24 | 91 |
| Yamada | 0.02 | 89 |
| Mean ± SE | 0.08 ± 0.03 | 91 ± 0.8 |
PMR = potato misclassification rate (target: <1%). IDR = impurity detection rate (target: >60%). Additional weight-based field validation was conducted at the Sasagawahokuto farm (Section 3.5).
Share and Cite

MDPI and ACS Style

Kim, J.; Tokuda, K.; Miho, Y.; Kim, G.; Yoshitoshi, R.; Tsuchiya, S.; Deguchi, N.; Funabiki, K. Mobile AI-Powered Impurity Removal System for Decentralized Potato Harvesting. Agronomy 2026, 16, 383. https://doi.org/10.3390/agronomy16030383
