Location Research and Picking Experiment of an Apple-Picking Robot Based on Improved Mask R-CNN and Binocular Vision

Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This manuscript discusses the development of a vision system for an apple-harvesting robot. The authors' main focus is improving the vision system so that target apples are detected accurately in binocular camera images. For this purpose, the authors employed a deep learning approach based on their modified Mask R-CNN, in particular with an updated backbone and reduced classification. The authors trained this model on their own dataset, which they did not disclose in this manuscript, and assessed the performance of the proposed approach in both apple detection and positioning accuracy using images obtained with a commercially available binocular camera (ZED2i). The authors also implemented this approach in the apple-picking robot they designed, which resulted in successful apple harvesting.
Overall, the manuscript describes the advancement and novelty of the study in this field well. However, several points should be addressed to improve the quality of the manuscript.
- The model architecture should be disclosed to the community in this field to allow fair evaluation and broader use.
- Please explain why the ResNet-DenseNet hybrid network was used. Many other models, including not only other neural networks but also recent transformer architectures, could be quite effective in improving performance.
- Please add detailed information on the computing environments used for training, image processing, and the harvesting robot.
- Please mention generative AI and foundation models for robotics. Driving robots in any field by generating robot motion directly from vision data has recently become quite promising, as shown by the robotics transformers (https://robotics-transformer-x.github.io/), Physical Intelligence's pi series (https://www.physicalintelligence.company/), and several studies in plant science and agriculture (https://doi.org/10.1109/TASE.2023.3312657 and https://doi.org/10.1016/j.compag.2025.110131).
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The manuscript presents a study on the location and picking operation of an apple-harvesting robot using Mask R-CNN and binocular vision.
The objective described in lines 106-120 is confusing. It should be clarified and clearly defined. As stated, the objective does not clearly correspond to the title of the manuscript.
The novelty of the image recognition system has been explained and justified.
The introduction lacks a clear review of previous studies on harvesting robots and on detaching systems in apple robots. The literature review focuses on image recognition systems, but picking systems have not been reviewed.
In the M & M section, the image acquisition procedure and the image analysis methodology have been clearly described. However, the description of the robot and the picking system, which are indicated in the title of the manuscript and in the Results section, is missing. It is necessary to describe the robot, the picking system, and the picking experiment procedure.
Information about the picking test has been included in the Results section (lines 365-376); it should be moved to the M & M section and improved. Crucial information needed to understand the picking experiment is missing.
The results on picking efficiency are very limited (lines 376-377: “The overall harvesting success rate achieved 94.3%, confirming that the proposed method satisfies real-time operational requirements for apple-harvesting robotic systems”). It is necessary to describe the experimental design and to clearly explain the meaning of the 94.3% overall harvesting success rate reported by the authors.
The conclusions are based only on the image analysis results. These results and conclusions do not correspond to the title of the manuscript, which includes the picking operation of the robot (“Location Research and Picking Experiment of Apple Picking Robot…”).
Author Response
Please see the attachment.
Author Response File: Author Response.pdf