Dynamic Task Planning for Multi-Arm Apple-Harvesting Robots Using LSTM-PPO Reinforcement Learning Algorithm
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Review comments on "Dynamic Task Planning for Multi-Arm Apple-Harvesting Robots Using LSTM-PPO Reinforcement Learning Algorithm"
This article proposes a dynamic task planning method for multi-arm apple-picking robots based on the LSTM-PPO reinforcement learning algorithm, aiming to improve task coordination efficiency and real-time decision-making ability in dynamic orchard environments. The research provides technical support for agricultural automation and demonstrates significant advantages in dynamic environmental adaptability, with high practical application value. However, the following issues need further improvement:
(1) Chapter 3.3: The title "Agent" is relatively vague. It is suggested to refine it. Could it be changed to "LSTM-PPO Agent Design and Implementation"?
(2) Table 1 (hyperparameter settings for the training step of the proposed method) in Chapter 4.1 does not explain the basis for the hyperparameter settings.
(3) 4.2. The Comparative Experiments section only compares against the PPO algorithm and does not include other mainstream algorithms, which weakens the argument for the universality of the method. We suggest adding comparative experiments to verify the superiority of LSTM-PPO.
(4) The font of the legend in Figure 8 can be appropriately enlarged; other issues related to the readability of the legends and charts will not be listed individually. Please check and correct them.
(5) 4.4. In the Field Experiments section, no comparison data between the actual motion trajectories of the robotic arms and the simulation results are provided. The performance verification in actual scenarios should be supplemented.
(6) In the conclusion section, the algorithm optimization direction for large-scale scenarios could be added to enhance the adaptability of the algorithm in different scenarios.
(7) Please read the article carefully, resolve any language and logic problems that may exist, and self-check and make the necessary modifications.
(8) Please check the reference format carefully, because the DOI numbers of some references are inconsistent with those of other references.
Comments on the Quality of English Language
Please double-check the correctness of professional vocabulary.
Author Response
Comment 1 Chapter 3.3: The title "Agent" is relatively vague. It is suggested to refine it. Can it be changed to "LSTM-PPO Agent Design and Implementation"
Response 1 Thank you for your valuable feedback. We agree that the title "Agent" is relatively vague, and we have updated it to "LSTM-PPO Agent Design and Implementation" to more clearly reflect the specific method we designed and implemented in this paper. We appreciate your attention to the details of the manuscript.
Comment 2 Table 1 (hyperparameter settings for the training step of the proposed method) in Chapter 4.1 does not explain the basis for the hyperparameter settings.
Response 2 We have clarified in the manuscript that our hyperparameters were selected through empirical testing and tuning.
In the revision, we have added the following content:
Hyperparameter selection involved testing multiple configurations and fine-tuning to achieve optimal performance. We tested different learning rates in the range of 1e-4 to 1e-3, starting with a lower learning rate to avoid model divergence and gradually increasing it to determine the best balance between learning speed and stability. After multiple training iterations, we found that a learning rate of 3e-4 provided a good balance. For the batch size, we experimented with sizes similar to or slightly larger than the network width, ultimately finding that a batch size of 128 provided a good balance. The discount factor was empirically set to 0.99.
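To make the described search concrete, here is a minimal illustrative sketch (not the authors' actual tuning code; the `evaluate` function is a placeholder standing in for a full training run):

```python
from itertools import product

def evaluate(lr, batch_size):
    # Placeholder score peaking at the configuration reported above
    # (lr = 3e-4, batch size = 128); a real study would run a full
    # training session here and return, e.g., the mean episode reward.
    return -abs(lr - 3e-4) * 1e3 - abs(batch_size - 128) / 128

learning_rates = [1e-4, 3e-4, 1e-3]   # range tested in the paper
batch_sizes = [64, 128, 256]          # sizes near the network width
gamma = 0.99                          # discount factor, fixed empirically

# Grid search over all candidate configurations, keeping the best one.
best = max(product(learning_rates, batch_sizes),
           key=lambda cfg: evaluate(*cfg))
print(best)  # (0.0003, 128)
```

In practice each `evaluate` call would be an expensive training run, so the grid is kept small and coarse, matching the manual fine-tuning described above.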
Comment 3 4.2. The Comparative Experiments section only compares the PPO algorithm and does not include other mainstream algorithms, which weakens the argument for the universality of the method. Suggest adding comparative experiments to verify the superiority of LSTM-PPO.
Response 3 Thank you for your valuable feedback on the algorithm comparison. We chose LSTM-PPO and Genetic Algorithm (GA) as the core comparison methods, mainly due to task characteristics and algorithm compatibility considerations. In our previous work, we also attempted to use DQN to solve the problem mentioned in the paper, but DQN performed poorly in this task and failed to learn an effective strategy. Unfortunately, there is currently no other research on the application of reinforcement learning in multi-arm harvesting task planning. However, existing literature shows that in tasks similar to robotic control, PPO-based algorithms significantly outperform DQN and A3C (Zhang, 2024)[1]. Therefore, this study focuses on comparing LSTM-PPO with classical optimization methods (GA) to highlight its unique advantages in complex harvesting scenarios.
[1] Zhang Q, Ma W, Zheng Q, et al. Path planning of mobile robot in dynamic obstacle avoidance environment based on deep reinforcement learning[J]. IEEE Access, 2024.
Comment 4 The font of the legend in Figure 8 can be appropriately enlarged, and other issues related to the readability of the legend and chart will not be repeated. Please check and modify them by the author.
Response 4 Thank you for your valuable feedback. In response to your suggestion, we have emphasized the content of the legend below the figure. Additionally, during our review, we noticed readability issues with Figure 2, and we have made the necessary adjustments. We appreciate your attention to the readability of the figures.
Comment 5 4.4. In the Field Experiments section, there is no comparison data between the actual motion trajectory of the robotic arm and the simulation results provided in the field experiments. It is necessary to supplement the performance verification in actual scenarios.
Response 5 Thank you for your valuable comment. We acknowledge the importance of comparing actual robotic arm motion trajectories with simulation results to further validate our approach. Due to hardware limitations and field experiment constraints, we were unable to capture precise trajectory data in our current study. However, we recognize that such a comparison would provide deeper insights into discrepancies between simulation and real-world performance. As a result, we have outlined this as an area for future work, where we plan to integrate motion tracking sensors to systematically analyze trajectory deviations. We have updated the discussion in Chapter 5 to acknowledge this limitation and outline our future research direction. We appreciate your suggestion, which will guide further improvements in our study.
In the revision, we have added the following content:
In our field experiments, we observed that the actual motion trajectories of the robotic arms exhibited certain deviations from their simulated counterparts. While the simulation environment provides a controlled testing platform, it inevitably simplifies many complex real-world factors, such as sensor noise, actuator response delays, environmental dynamics, and unintended interactions. These discrepancies may impact the transferability of simulation-trained models and planned trajectories to real-world applications. To further bridge the Sim-to-Real gap, our future research will focus on refining the simulation model by incorporating more realistic actuator dynamics, improving sensor noise modeling, and developing advanced adaptive trajectory re-planning mechanisms. Additionally, we plan to integrate motion tracking sensors in field experiments to quantitatively analyze trajectory deviations between simulated and actual execution, thereby enhancing the alignment between simulation and reality and improving the practical applicability of our approach.
Comment 6 In the conclusion section, the algorithm optimization direction for large-scale scenarios can be added to enhance the adaptability of the algorithm in different scenarios.
Response 6 Thank you for your valuable suggestions. We recognize that algorithm generalization is an important research direction, especially in larger-scale scenarios. In this study, we primarily focused on the decision-making problem of orchard harvesting robots under typical fruit distribution conditions and validated the adaptability of our algorithm by simulating different fruit distribution patterns. However, for large-scale scenarios—such as larger orchards or higher fruit density distributions—the generalization performance of the algorithm still requires further exploration.
In the revision, we have added the following content in the Discussion and Conclusions sections:
Discussion: First, this study specifically targets dual-arm robots and their typical operating scenarios. In future work, when considering multiple robotic arms collaborating or applying the approach to larger-scale orchards, computational complexity becomes a crucial issue that requires further optimization. Future research could explore techniques such as pruned neural networks or parallel computing to reduce computational complexity and enhance scalability.
Conclusions: It is important to note that the impact of tree shape and agricultural management practices in this study primarily affects fruit distribution. During interactive training, the study incorporates approximately five types of fruit distribution scenarios to validate the adaptability of the robot planning algorithm to different distributions. By simulating complex distribution patterns such as branch occlusion and fruit clustering, which may lead to picking failures and fruit drops under real-world conditions, the experimental results further support the generalization conclusions.
Comment 7 Please ask the author to read the article carefully, solve any language and logic problems that may exist, and self-check and make necessary modifications.
Response 7 We appreciate your comments. We will carefully review the manuscript to address any language and logical issues that may exist. A thorough self-check will be performed, and any necessary modifications will be made to improve clarity and coherence.
Comment 8 Please check the reference format carefully, because the DOI numbers of some references are inconsistent with those of other references.
Response 8 Thank you for your careful review. We apologize for the inconsistency in the DOI numbers of some references. We will thoroughly check the reference list and ensure that all DOI numbers are formatted correctly and consistently in accordance with the required citation style. The revised manuscript will reflect these corrections.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The work deals with quite interesting issues regarding a dynamic task planning approach for multi-arm apple-picking robots based on a deep reinforcement learning framework incorporating Long Short-Term Memory networks and Proximal Policy Optimization.
However, revision is still needed before this manuscript can be accepted for publication in the journal.
The literature review is comprehensive, but very broad. I would suggest narrowing it down a bit, especially paragraphs 1-3. It would be enough to describe the main ideas about apple picking robots in a more narrow way.
The Introduction ends with "The contributions of this paper are as follows:". However, it would be good to see text explaining why you did this and what was important about this research. The literature review should be summarized and the reasons for your research stated, emphasizing the originality and novelty of the research.
Citations should be consistent according to the journal requirements. Please review one more time. Row 52: “...to traditional orchards (Stavros et al., 52 2023)”
Figure 2 is not of sufficient quality. Please, correct.
“Problem description” would be more appropriate after the Introduction as summarizing the literature analysis.
I would suggest reviewing all the information in Introduction, Preliminary and Method. I would suggest shortening these parts. Please leave the specific information that was necessary for conducting the research in the methodology. Informational knowledge should be in the Introduction.
Author Response
Comment 1 The literature review is comprehensive, but very broad. I would suggest narrowing it down a bit, especially paragraphs 1-3. It would be enough to describe the main ideas about apple picking robots in a more narrow way.
Response 1 Thank you for your valuable suggestion. We appreciate your feedback regarding the breadth of our literature review. In response, we have refined paragraphs 1–3 by narrowing the scope of the discussion, focusing more specifically on key aspects relevant to apple harvesting robots. We have removed some general discussions on agricultural robotics and emphasized the main advancements in apple harvesting robot research, such as structural optimization, fruit recognition, task planning, and path planning. These modifications ensure a more concise and targeted review while maintaining the necessary background information.
Comment 2 The Introduction ends with “The contributions of this paper are as follows:”. However, it will be good to see the text why you did this. What was important of this research. The literature review should be summarized and the reasons for your research should be stated, emphasizing the originality and novelty of the research.
Response 2 Thank you for your insightful feedback. We acknowledge the need to better connect the introduction with the research contributions. In response, we have revised the concluding part of the introduction to provide a more explicit rationale for this study. We have summarized the key findings from the literature review and clearly stated the research gap, emphasizing the importance, originality, and novelty of our work. These modifications ensure a smoother transition to the contributions of this paper, making the motivation and significance of our research more explicit.
In the revision, we have added the following content:
However, existing research mainly focuses on static environments or single-arm robots, lacking in-depth exploration of task planning for multi-arm harvesting robots in dynamic environments. This deficiency makes it difficult for robots to promptly adjust task plans when the environment changes or uncertain factors arise, resulting in idle robotic arms, reduced harvesting efficiency, insufficient task adaptability, and ultimately limiting the overall system performance.
Comment 3 Citations should be consistent according to the journal requirements. Please review one more time. Row 52: “...to traditional orchards (Stavros et al., 52 2023)”
Response 3 Thank you for your valuable feedback. We apologize for the inconsistency in citations. We have carefully reviewed the manuscript and ensured that all citations follow the journal's formatting requirements. Specifically, we have corrected the citation in Row 52 to align with the required style. The revised manuscript now reflects these changes, and we appreciate your attention to detail.
Comment 4 Figure 2 is not of sufficient quality. Please, correct.
Response 4 Thank you for your feedback. In response to your suggestion, we have enlarged the icons in Figure 2 and made the text more prominent to improve its clarity.
Comment 5 “Problem description” would be more appropriate after the Introduction as summarizing the literature analysis.
Response 5 Thank you for your suggestion. We appreciate your perspective on the placement of the "Problem Description" section. However, in this paper, the problem we address is closely related to the dual-arm apple picking robot introduced in the Preliminary section. The preliminary concepts provide essential context for understanding the problem formulation. Therefore, we believe that maintaining the current structure better preserves the logical flow of the paper. We kindly ask for your understanding in keeping the original arrangement.
Comment 6 I would suggest reviewing all the information in Introduction, Preliminary and Method. I would suggest shortening these parts. Please leave the specific information that was necessary for conducting the research in the methodology. Informational knowledge should be in the Introduction.
Response 6 We have reviewed and streamlined these sections by moving general informational content from the Methodology to the Introduction while ensuring that only essential details necessary for conducting the research remain in the Methodology. Additionally, we have condensed redundant explanations to enhance clarity and conciseness. These revisions improve the logical flow of the paper and ensure that each section serves its intended purpose more effectively.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
Your paper presents significant contributions to the field, particularly in the area of robotic apple harvesting. The structure of the paper is well-organized, making it easy to follow your research objectives and findings.
Please find my detailed comments attached in the PDF copy of your paper.
Comments for author File: Comments.pdf
The English could be improved to more clearly express the research.
Author Response
Comment 1 Highlighting gaps in the literature can better justify your approach.
Response 1 Thank you for your valuable suggestion. We acknowledge the importance of clearly highlighting gaps in the literature to better justify our approach. In our revised manuscript, we have explicitly discussed the limitations of existing research in handling task planning for multi-arm harvesting robots in dynamic environments.
In the revision, we have added the following content:
However, existing research mainly focuses on static environments or single-arm robots, lacking in-depth exploration of task planning for multi-arm harvesting robots in dynamic environments. This deficiency makes it difficult for robots to promptly adjust task plans when the environment changes or uncertain factors arise, resulting in idle robotic arms, reduced harvesting efficiency, insufficient task adaptability, and ultimately limiting the overall system performance.
Comment 2 How does the density of fruit affect the robot's ability?
Response 2 We appreciate the reviewer’s insightful question regarding the impact of fruit density on the robot's ability. As fruit density increases, the robot faces several challenges, including increased task planning complexity, a higher risk of mechanical interference, an increased likelihood of accidental contact with surrounding fruits, and reduced harvesting efficiency. We have incorporated a detailed analysis of the impact of fruit density in the Discussion section to further clarify the relationship between fruit density and robotic performance.
In the revision, we have added the following content:
Additionally, densely packed fruit distributions elevate the probability of accidental contact with surrounding fruits during harvesting, which may result in premature fruit detachment and ultimately affect overall harvesting efficiency and yield. Therefore, it is essential to develop more intelligent planning strategies to optimize the performance of the robotic system under such conditions.
Comment 3 Please provide more detailed information about the simulation environment.
Response 3 Thank you for your valuable suggestion. We have added more detailed information about the simulation environment at the beginning of Chapter 4 in the manuscript. The added content has been highlighted in yellow in the PDF version.
Comment 4 Discussing the results, it would be helpful to compare the outcomes of the simulations and experiments with previous studies.
Response 4 Thank you for this suggestion. However, there are currently no other studies on the application of reinforcement learning to multi-arm harvesting task planning, so we could only compare our method with heuristic approaches. We acknowledge this as a limitation of the present work.
Comment 5 Please add any limitations encountered during the simulations and experiments, such as environmental factors; this would provide a more balanced perspective on your findings.
Response 5 Your suggestion has been highly inspiring to us. We realize that the simulation environment simplifies many real-world factors, such as sensor noise, actuator delays, and environmental dynamics, which may affect the transferability of the trained model. In the field experiments, variations in lighting conditions, wind disturbances, and minor mechanical deviations of the robotic arms may introduce uncertainties in task execution. Future research can further enhance the robustness of the proposed method by improving the realism of the simulation environment and incorporating adaptive control strategies. We have added an analysis of the key limitations in the Discussion section.
In the revision, we have added the following content:
In our field experiments, we observed that the actual motion trajectories of the robotic arms exhibited certain deviations from their simulated counterparts. While the simulation environment provides a controlled testing platform, it inevitably simplifies many complex real-world factors, such as sensor noise, actuator response delays, environmental dynamics, and unintended interactions. These discrepancies may impact the transferability of simulation-trained models and planned trajectories to real-world applications. To further bridge the Sim-to-Real gap, our future research will focus on refining the simulation model by incorporating more realistic actuator dynamics, improving sensor noise modeling, and developing advanced adaptive trajectory re-planning mechanisms. Additionally, we plan to integrate motion tracking sensors in field experiments to quantitatively analyze trajectory deviations between simulated and actual execution, thereby enhancing the alignment between simulation and reality and improving the practical applicability of our approach.
Comment 6 Please provide details on the types of metrics measured (time, accuracy, etc.).
Response 6 Thank you for your insightful comment. In our revised manuscript, we have added clarifications regarding the metrics used to evaluate the proposed algorithm.
In the revision, we have added the following content:
In this study, "Mean Episode Length" is used to measure the number of decision steps taken from environment reset until all picking targets are completed or the termination condition is met. Since each decision step typically corresponds to a fixed control time interval, a reduction in episode length directly leads to a shorter task completion time. Additionally, "Mean Reward" is used to evaluate the execution efficiency and success rate of the agent during the picking process. A higher reward value indicates improved picking efficiency and success rate.
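As a minimal sketch of how these two metrics could be computed from logged training episodes (the episode data below is illustrative, not taken from the paper):

```python
def mean_episode_length(episodes):
    """Average number of decision steps per episode."""
    return sum(len(ep) for ep in episodes) / len(episodes)

def mean_reward(episodes):
    """Average total reward accumulated per episode."""
    return sum(sum(ep) for ep in episodes) / len(episodes)

# Each inner list holds the per-step rewards of one episode.
episodes = [
    [1.0, 0.5, 2.0],        # 3 steps, total reward 3.5
    [0.0, 1.5, 1.0, 0.5],   # 4 steps, total reward 3.0
]
print(mean_episode_length(episodes))  # 3.5
print(mean_reward(episodes))          # 3.25
```

Since each decision step corresponds to a fixed control interval, a lower mean episode length translates directly into a shorter task completion time.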
As shown in Figure 8, the proposed LSTM-PPO algorithm consistently achieves a lower average episode length than PPO, indicating that it can complete tasks more quickly in most scenarios. In terms of average reward, LSTM-PPO exhibits faster convergence in the early training phase and achieves a higher and more stable convergence level than PPO in the mid-training phase. Overall, the proposed algorithm outperforms PPO in terms of convergence speed, mean reward, and decision steps, demonstrating higher learning efficiency and shorter final episode length. This not only enhances the algorithm's learning performance in the simulation environment but also effectively reduces task completion time in real-world applications, thereby validating its superiority in dynamic task planning for orchard picking and similar applications.
Author Response File: Author Response.pdf
Reviewer 4 Report
Comments and Suggestions for Authors
The authors proposed an optimization method for a two-arm apple-picking robot. The proposed method uses PPO-based reinforcement learning to select the best route to the fruit for each arm while considering the other arm. Generally, the article is well written and structured and addresses a practical issue. However, the following problems need more clarification.
The authors separately explained different parts of their network. The reader can best understand the method if presented with a general workflow of the network before explaining its components.
The authors should elaborate more on how the robot detects the fruits. What processes are performed on the RGB-D sensor to obtain coordinates?
The authors should give more details about the training data and the used simulations.
Please double-check the inline notations’ fonts and spaces.
Author Response
Comment 1 The authors should elaborate more on how the robot detects the fruits. What processes are performed on the RGB-D sensor to obtain coordinates?
Response 1 Thank you for your valuable comment. Our harvesting robot detects and locates fruits through the vision system, which includes an RGB-D sensor and a central processing unit. First, the RGB-D sensor captures the color and depth images at the current position. Then, the YOLOv8 vision detection model is used to detect apples in the RGB image, generating 2D bounding boxes and masks. Based on the color and depth images, we generate a point cloud, and the mask is used to extract the corresponding portion of the point cloud in order to obtain the 3D information of the apples. We will provide a more detailed description of this process in the revised manuscript.
In the revision, we have added the following content:
The visual sensing system primarily consists of an RGB-D camera and an edge computing module. First, the RGB-D camera captures the color and depth images at the current position. The YOLOv8 vision detection model is then applied to detect apples in the RGB image, generating 2D bounding boxes and masks. A point cloud is generated by fusing the color and depth images, and the mask is used to extract the corresponding portion of the point cloud in order to obtain the 3D information of the apples.
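A minimal sketch of the back-projection step described above, assuming a standard pinhole camera model (the function name and toy data are illustrative, not the authors' implementation):

```python
import numpy as np

def apple_centroid(depth, mask, fx, fy, cx, cy):
    """Return the 3-D centroid (in metres) of the masked depth pixels."""
    v, u = np.nonzero(mask)          # pixel rows/columns inside the mask
    z = depth[v, u]                  # depth values at those pixels
    x = (u - cx) * z / fx            # back-project with the intrinsics
    y = (v - cy) * z / fy
    return np.array([x.mean(), y.mean(), z.mean()])

# Toy example: a flat 4x4 depth patch 1 m from the camera, with the
# detection mask covering a 2x2 block of pixels.
depth = np.full((4, 4), 1.0)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
centroid = apple_centroid(depth, mask, fx=500.0, fy=500.0, cx=2.0, cy=2.0)
print(centroid)  # approximately [-0.001, -0.001, 1.0]
```

In the described pipeline, `mask` would come from the YOLOv8 segmentation output and `depth` from the RGB-D sensor, with the centroid serving as the apple's 3-D picking target.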
Comment 2 The authors should give more details about the training data and the used simulations.
Response 2 We appreciate the reviewer’s suggestion. As recommended, we have added more details in the beginning of Chapter 4. The relevant information has been highlighted in yellow in the PDF version of the manuscript.
Comment 3 Please double-check the inline notations’ fonts and spaces.
Response 3 Thank you for your feedback. We have carefully reviewed the inline notations and spaces in the manuscript to ensure that the formatting adheres to the required standards. If there are any inconsistencies, we will make further adjustments to ensure the manuscript meets the journal's formatting requirements.
Author Response File: Author Response.pdf
Reviewer 5 Report
Comments and Suggestions for Authors
The research paper "Dynamic Task Planning for Multi-Arm Apple Harvesting Robots Using LSTM-PPO Reinforcement Learning Algorithm" is well structured and addresses a relevant, fast-evolving area of research. I would, however, propose these major revisions to strengthen the manuscript further:
- The GA comparison is well organized, but the paper doesn't explicitly declare computational complexity for both methods. Including a complexity analysis (e.g., Big-O) for LSTM-PPO and GA would provide clarification of differences in performance.
- The research points out the environmental adaptability of LSTM-PPO, like the vanishing of fruits. Quantitative measures of adaptability (i.e., rate of success of re-picking operations) must be incorporated for a more accurate assessment.
- The field experiments are well-designed, but they involve a limited dataset (20 apples). Testing the algorithm on a larger scale (e.g., 100+ apples) and across different tree structures would be necessary to confirm real-world applicability.
- The experiment combines real and simulated environments to enhance consistency. Nevertheless, the simulation arrangement should be explained more clearly, such as parameters of fruit movement probability, wind impact, and variations in lighting.
- The selected performance measures (total picking time, average picking time, planning time) are suitable, but other measures such as energy consumption, success rate, and grasping accuracy should be included to give a complete assessment.
- The work compares LSTM-PPO to GA but fails to benchmark with other AI-based methods, e.g., Deep Q-Networks (DQN) or Advantage Actor-Critic (A3C). Comparison with at least one other AI-based method is suggested.
- The work doesn't describe the computing hardware utilized in training and running. Providing processor model, GPU type, and per decision step execution time details would enhance replicability.
- The new method is tested on a small dataset of 20 apples. It should be tested on larger datasets (e.g., across multiple trees or fruit categories) to confirm generalization and scalability. The paper suggests using LiDAR or hyperspectral imaging for improved robustness, but no preliminary results or feasibility analysis are provided. A brief discussion of the expected benefits and possible integration strategies is recommended.
Author Response
Comment 1 The GA comparison is well organized, but the paper doesn't explicitly declare computational complexity for both methods. Including a complexity analysis (e.g., Big-O) for LSTM-PPO and GA would provide clarification of differences in performance.
Response 1 Thank you for your suggestion. We will include an analysis of the computational complexity of LSTM-PPO and Genetic Algorithm (GA), and add the Big-O notation for both time and space complexity in the methodology section to clarify the fundamental differences in training efficiency and real-time performance. We have added Table 4 to the manuscript and highlighted the changes in yellow in the PDF version.
Comment 2 The research points out the environmental adaptability of LSTM-PPO, like the vanishing of fruits. Quantitative measures of adaptability (i.e., rate of success of re-picking operations) must be incorporated for a more accurate assessment.
Response 2 Thank you for your valuable suggestion. Based on the actual operating conditions and the real success rate of the dual-arm picking robot mentioned in the paper, we set the picking failure probability to 10% to evaluate the performance of different strategies under uncertainty.
In the revision, we have added the following content:
In the dynamic experiment, we further validated the adaptability of the proposed LSTM-PPO algorithm in real-world scenarios. Unlike static environments, dynamic conditions simulate potential failures in fruit picking caused by fruit position deviations, robotic arm execution inaccuracies, and trunk vibrations during orchard harvesting. To better reflect real operational conditions, we assumed an approximate 10% failure probability in the experiments to evaluate the performance of different strategies under uncertainty.
Comment 3 The field experiments are well-designed, but they involve a limited dataset (20 apples). Testing the algorithm on a larger scale (e.g., 100+ apples) and across different tree structures would be necessary to confirm real-world applicability.
Response 3 Thank you for your attention to the generalization of the experiments. This study covers approximately 5 types of fruit distribution scenarios in interactive training (ranging from 15 to 25 fruits), with a focus on validating the robot path planning algorithm’s adaptability to different distributions. Since the fruit density in actual orchard working areas typically does not exceed 25 fruits per working unit, the current scene design fully covers typical needs, and experimental results show that the algorithm’s planning success rate remains stable in random distribution tests (as shown in Table 2 and Table 3). Furthermore, tree shape differences in this study only affect the spatial distribution features of the fruits (not the grasping posture), and the training data has simulated complex distribution patterns such as branch occlusion and cluster aggregation, which effectively supports the generalization conclusions.
In the revision, we have added the following content:
It is important to note that tree shape and agricultural management practices in this study primarily affect fruit distribution. During interactive training, the study incorporates approximately five types of fruit distribution scenarios to validate the adaptability of the robot planning algorithm to different distributions. By simulating complex distribution patterns such as branch occlusion and fruit clustering, which may lead to picking failures and fruit drops under real-world conditions, the experimental results further support the generalization conclusions.
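A randomized scenario of the kind described above (15 to 25 fruits, clustered to mimic branch occlusion and fruit aggregation) could be generated along the following lines. This is a hedged sketch under our own assumptions; the cluster count, spread, and all names (`sample_fruit_positions`, `n_clusters`, `spread`) are illustrative rather than taken from the paper.

```python
import random


def sample_fruit_positions(n_min=15, n_max=25, n_clusters=3,
                           spread=0.15, seed=None):
    """Sample one random fruit-distribution scenario.

    Fruit positions are drawn around a few random cluster centres in a
    unit cube, a simple stand-in for the clustered, partially occluded
    distributions observed on real trees.
    """
    rng = random.Random(seed)
    n_fruits = rng.randint(n_min, n_max)
    centres = [(rng.uniform(0, 1), rng.uniform(0, 1), rng.uniform(0, 1))
               for _ in range(n_clusters)]
    fruits = []
    for _ in range(n_fruits):
        cx, cy, cz = rng.choice(centres)  # pick a cluster, then jitter around it
        fruits.append((cx + rng.gauss(0, spread),
                       cy + rng.gauss(0, spread),
                       cz + rng.gauss(0, spread)))
    return fruits
```

Varying the seed yields the family of random-distribution test cases on which planning success rates can be checked for stability.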
Comment 4 The experiment combines real and simulated environments to enhance consistency. Nevertheless, the simulation arrangement should be explained more clearly, such as parameters of fruit movement probability, wind impact, and variations in lighting.
Response 4 Thank you for your attention to the realism of the simulation. The core goal of this paper is to verify the effectiveness of task allocation and path planning strategies. Therefore, the design of the simulation environment focuses on key dynamic factors (such as random disappearance of fruit and the probability of grasping failure), rather than fully replicating real-world physical details (such as continuous variations in wind or lighting). Although the current parameter model simplifies physical interactions, it is sufficient to support the robustness validation of the algorithm under typical disturbance scenarios. The fine-grained physical modeling you suggested is of great value for agricultural robot simulations. In future work, we plan to incorporate fluid dynamics models (such as CFD wind field simulation) and lighting gradient sensor data to enhance the physical credibility of the environmental simulation.
In the revision, we have added the following content:
In our field experiments, we observed that the actual motion trajectories of the robotic arms exhibited certain deviations from their simulated counterparts. While the simulation environment provides a controlled testing platform, it inevitably simplifies many complex real-world factors, such as sensor noise, actuator response delays, environmental dynamics, and unintended interactions. These discrepancies may impact the transferability of simulation-trained models and planned trajectories to real-world applications. To further bridge the Sim-to-Real gap, our future research will focus on refining the simulation model by incorporating more realistic actuator dynamics, improving sensor noise modeling, and developing advanced adaptive trajectory re-planning mechanisms. Additionally, we plan to integrate motion tracking sensors in field experiments to quantitatively analyze trajectory deviations between simulated and actual execution, thereby enhancing the alignment between simulation and reality and improving the practical applicability of our approach.
Comment 5 The selected performance measures (total picking time, average picking time, planning time) are suitable, but other measures such as energy consumption, success rate, and grasping accuracy should be included to give a complete assessment.
Response 5 Thank you for your valuable suggestion. We acknowledge that a comprehensive evaluation of the algorithm should include multiple performance metrics. In this study, our primary focus is on task planning and scheduling efficiency rather than low-level control optimization; we therefore selected total picking time, average picking time, and planning time as the core evaluation criteria, as these metrics directly reflect the effectiveness of the proposed method. Regarding energy consumption, its value depends on the robot's hardware configuration, motor type, and control strategy, which can vary significantly across robotic platforms; moreover, since energy consumption is closely related to execution time and path efficiency, the existing metrics already reflect it indirectly. To make the evaluation more comprehensive, we have added task success rate (the proportion of successful picking attempts) and picking failure rate (the probability of failure in a single picking attempt) to the revised manuscript, as these metrics provide a more direct assessment of the algorithm's reliability.
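The two added metrics follow directly from a log of per-attempt outcomes. As a minimal sketch (function name and input format are our own assumptions, not the manuscript's):

```python
def picking_metrics(attempts):
    """Compute task success rate and per-attempt failure rate.

    `attempts` is a list of booleans, one per picking attempt
    (True = fruit successfully picked). Task success rate is the
    proportion of successful attempts; picking failure rate is its
    complement for a single attempt.
    """
    n = len(attempts)
    if n == 0:
        return 0.0, 0.0
    success_rate = sum(attempts) / n
    return success_rate, 1.0 - success_rate
```

For example, three successes out of four attempts give a 0.75 success rate and a 0.25 failure rate.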
Comment 6 The work compares LSTM-PPO to GA but fails to benchmark with other AI-based methods, e.g., Deep Q-Networks (DQN) or Advantage Actor-Critic (A3C). Comparison with at least one other AI-based method is suggested.
Response 6 Thank you for this good suggestion. After a further literature search, we found no other recent studies applying reinforcement learning to multi-arm harvesting task planning, which limited the baselines available for comparison. We acknowledge this limitation. The issue you raise is precisely one of our future research directions: we aim to develop additional reinforcement learning-based approaches for multi-arm coordination in harvesting tasks and compare them with other optimization techniques to further validate their effectiveness in real-world applications.
Comment 7 The work doesn't describe the computing hardware utilized in training and running. Providing processor model, GPU type, and per decision step execution time details would enhance replicability.
Response 7 Thank you for your suggestion. We have added the hardware details. They can be found in Chapter 4 of the manuscript and are highlighted in yellow in the PDF version.
Comment 8 The new method is tested on a small dataset of 20 apples. It should be tested in larger datasets (e.g., across multiple trees or fruit categories) to confirm generalization and scalability. The paper suggests using LiDAR or hyperspectral imaging for improved robustness, but no preliminary results or feasibility analysis are provided. A brief discussion on expected benefits and possible integration strategies is recommended.
Response 8 Thank you for your suggestion. We have added this analysis in the "Conclusion" section. Details can be found in Chapter 6 of the manuscript and are highlighted in yellow in the PDF version.
In the revision, we have added the following content:
For different types of fruit trees, the core challenges lie primarily in the design of the visual localization system and the gripper strategies. This paper focuses on the collaborative strategies in the task execution. By integrating multiple sensors, the accuracy and real-time performance of fruit distribution information acquisition can be significantly improved, which is crucial for task planning algorithms and multi-arm coordination, serving as the foundation and guarantee for efficient collaboration. Future research could further enhance the perception system’s accuracy in acquiring fruit distribution information under various operational conditions by combining stereo vision with LiDAR or hyperspectral imaging, thereby providing more precise data support for the robot’s task decision-making system and improving overall execution efficiency and task completion quality.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for AuthorsThe author has made good revisions, and I agree to accept it according to the current version.