 
 
Article
Peer-Review Record

Learning Trajectory Tracking for an Autonomous Surface Vehicle in Urban Waterways

Computation 2023, 11(11), 216; https://doi.org/10.3390/computation11110216
by Toma Sikora 1,*, Jonathan Klein Schiphorst 2 and Riccardo Scattolini 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 31 August 2023 / Revised: 18 October 2023 / Accepted: 25 October 2023 / Published: 2 November 2023
(This article belongs to the Special Issue Applications of Statistics and Machine Learning in Electronics)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Reinforcement learning is used to control the ASV in urban waterways. However, the contribution seems minor, and the method used in this paper is not clearly described. The comments are as follows:

1) What are the contributions and innovations of this paper?

2) The structure of the whole system should be given. 

3) The structure of the network should be given and a more detailed description of the method proposed in this paper is needed t

4)

Comments on the Quality of English Language

The writing is good.

Author Response

Reviewer 1

 

Comment 1: What are the contributions and innovations of this paper?

The contributions and innovations of this paper are the following:

  • Development of a comprehensive physical simulation for an ASV in urban waterways, including the modeling of the various disturbances and uncertainties affecting the platform, based on a review of the relevant literature.
  • Development of a deep reinforcement learning wrapper around the physical simulation, exposing it as a standard learning environment, together with an end-to-end learning procedure designed specifically for trajectory tracking in urban waterways.
  • A rigorous testing procedure and a performance comparison between the current NMPC and the trained RL-based controller, both in simulation and in real-world scenarios. When subjected to disturbances and uncertainties, the RL-based controller tracked trajectories with a lower error than the current NMPC in simulation, though it fell short in the real-world scenario.

The introduction has been expanded with this information (77-89).


Comment 2: The structure of the whole system should be given.

The manuscript has been updated with a more thorough description of the system (203-207, Figure 2).


Comment 3: The structure of the network should be given and a more detailed description of the method proposed in this paper is needed t

If I am not mistaken, the structure of the network is given at the end of the second chapter (255-259). Unfortunately, I cannot read the end of the comment; however, changes have been made to make the method clearer (161-194). More information was added about the operation of the PPO algorithm and about how the learned agent fits into the global picture of the system.

Thank you for your comments, please let me know if the changes made to the manuscript are satisfactory.

Reviewer 2 Report

Comments and Suggestions for Authors

This paper presents the development of a learning-based controller for the Roboat platform. When subjected to model uncertainty or external disturbances, the proposed controller can track trajectories with smaller errors than the nonlinear model predictive controller (NMPC); the authors compare it with the current control strategy in simulation and in the real world. In my opinion, this article makes certain contributions. Before it is accepted, the following issues should be addressed.

1. The format of the article needs to be modified.

2. The language of the article needs to be further modified.  

3. For first-time readers, the meanings represented by parameters such as x and y can be further explained.  

4. The advantage of the article is that it reduces tracking error, but relatively speaking, the required computational power also increases, and the increase is quite large. Have you considered this issue in practical applications?

Comments on the Quality of English Language

The language of the article needs to be further modified.

Author Response

Reviewer 2

 

Comment 1: The format of the article needs to be modified.

I reached out to the editor regarding this topic, as the manuscript was written in accordance with the MDPI Computation guidelines; however, I am still waiting for a reply.


Comment 2: The language of the article needs to be further modified.

Upon consulting with multiple colleagues, minor changes have been made to the manuscript. Please let me know if it is satisfactory now.


Comment 3: For first-time readers, the meanings represented by parameters such as x and y can be further explained.

I went through the manuscript and updated the description of parameters where it seemed necessary (106, 107, 108, 124, eq. after 222).

Comment 4: The advantage of the article is that it reduces tracking error, but relatively speaking, the required computational power also increases, and the increase is quite large. Have you considered this issue in practical applications?

Indeed, I have considered the computational power required in practical applications, and it was in fact found to be significantly reduced. This is due to the nature of the trained neural network, which is composed of only two fully connected 64-neuron layers. Obtaining a control command from the learned controller consists of simply evaluating the neural network on the observation vector values. The NMPC, on the other hand, involves more computation-heavy operations, such as integration, to obtain the control command, leading to slower computation.
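To illustrate why this evaluation is cheap, here is a minimal sketch of such a forward pass. The dimensions, activations, and weights are hypothetical placeholders for illustration, not the actual Roboat controller; the only detail taken from the response is the two fully connected 64-neuron layers.

```python
import numpy as np

def mlp_forward(obs, W1, b1, W2, b2, W_out, b_out):
    """Forward pass through a two-layer, 64-neuron fully connected
    network: just two hidden matrix-vector products and a linear
    output layer -- no integration or iterative optimization."""
    h1 = np.tanh(W1 @ obs + b1)   # first 64-neuron hidden layer
    h2 = np.tanh(W2 @ h1 + b2)    # second 64-neuron hidden layer
    return W_out @ h2 + b_out     # control command vector

# Hypothetical sizes: a 6-element observation, 4 thruster commands.
rng = np.random.default_rng(0)
obs = rng.standard_normal(6)
W1, b1 = rng.standard_normal((64, 6)), np.zeros(64)
W2, b2 = rng.standard_normal((64, 64)), np.zeros(64)
W_out, b_out = rng.standard_normal((4, 64)), np.zeros(4)

u = mlp_forward(obs, W1, b1, W2, b2, W_out, b_out)
print(u.shape)  # (4,)
```

The entire control-command computation reduces to a handful of small matrix products, which is why it runs much faster than an NMPC solve.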

 

Thank you for your comments, please let me know if the changes are satisfactory.

Reviewer 3 Report

Comments and Suggestions for Authors

This paper is of value in some sense and can be accepted by this journal if the following aspects are considered carefully.

1. The introduction is not clear enough, and some related results should be stated.

2. The figures in this paper are not clear enough.

3. Some of the latest references should be cited.

Author Response

Reviewer 3

Comment 1: The introduction is not clear enough, and some related results should be stated.

The introduction has been modified according to your suggestions (26-34, 77-89).

Comment 2: The figures in this paper are not clear enough.

The figures have been enlarged in the manuscript to improve clarity. I have also reached out to the editor for further suggestions on the issue, as the formatting of the figures could still be improved. I hope they are clearer now.

Comment 3: Some of the latest references should be cited.

The manuscript has been updated with more recent references; you can find them marked in the LaTeX file (23, 44, 57-70).

 

Thank you for your comments, please let me know if the changes are satisfactory.

Reviewer 4 Report

Comments and Suggestions for Authors

1. For lines 26 to 28, why do traditional control approaches such as NMPC need reliable measurements of model parameters when encountering significant uncertainties and disturbances? Explain the reason in detail.

2. This study proposed a learning-based controller. To guarantee fairness of comparison, the authors should compare against another learning-based controller instead of the NMPC.

3. For the proposed RL controller, the authors should show its principle clearly. Some critical equations seem to be missing.

4. Lines 18 to 21 lack concrete evidence or references to support the examples provided (marine exploration, search and rescue, environmental monitoring, etc.). It is recommended to include references to real-world cases where autonomous surface vessels (ASVs) have been used for these purposes.

5. It would be advisable to explicitly indicate f_i as f_i (i=1,2,3,4) in line 93 for clarity.

6. If you simply mention that the parameter settings in Table 1 were found through repeated iterations, it may indeed lack sufficient justification for how the parameter values were determined. Is there no algorithm used to find these parameters?

7. In Figure 10, there seems to be a noticeable difference, but the description for Figure 6 indicates that NMPC had an average positional error of 0.3018 m while RL had 0.2836 m, a difference of only 0.0182 m. Do you consider this to be a significant difference? Furthermore, since there are no comparisons with other methods, there may be insufficient evidence to assert that this difference is notably substantial.

Comments on the Quality of English Language

The English is quite marginal. The manuscript should be checked again for grammar and clear expression.

Author Response

Reviewer 4

 

Comment 1: For lines 26 to 28, why do traditional control approaches such as NMPC need reliable measurements of model parameters when encountering significant uncertainties and disturbances? Explain the reason in detail.

The manuscript has been updated with the reasoning behind the necessity of reliable measurements of model parameters (26-34).

Comment 2: This study proposed a learning-based controller. To guarantee fairness of comparison, the authors should compare against another learning-based controller instead of the NMPC.

Indeed, the study proposed a learning-based controller in order to improve the behavior under disturbances and uncertainties with respect to the NMPC. PPO was chosen to represent learning-based controllers due to its versatility and proven effectiveness on similar problems. I agree that such a comparison would be interesting; however, I believe a comparison among learning-based controllers should be a topic of its own.
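For reference, the mechanism that gives PPO its robustness on problems like this is its clipped surrogate objective, which can be sketched as follows. The clipping parameter value below is the common default, not necessarily the one used in the study.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate objective for one sample:
    min(r * A, clip(r, 1 - eps, 1 + eps) * A),
    where r is the probability ratio pi_new(a|s) / pi_old(a|s)
    and A is the advantage estimate."""
    return np.minimum(ratio * advantage,
                      np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)

# A large ratio with positive advantage is clipped at 1 + eps,
# which is what keeps each policy update conservative.
print(ppo_clip_objective(1.5, 2.0))  # 2.4  (clipped: 1.2 * 2.0)
print(ppo_clip_objective(0.9, 2.0))  # 1.8  (unclipped: 0.9 * 2.0)
```

The clipping removes the incentive to move the policy far from the data-collecting policy in a single update, which is one reason PPO trains stably across many control tasks.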

Comment 3: For proposed RL controller, authors should show its principle clearly. Some critical Equations seem to be missed.

The manuscript has been updated with a further explanation of the RL algorithm; please let me know if it is satisfactory (161-194).

Comment 4: Lines 18 to 21 lack concrete evidence or references to support the examples provided (marine exploration, search and rescue, environmental monitoring, etc.). It is recommended to include references to real-world cases where autonomous surface vessels (ASVs) have been used for these purposes.

The manuscript has been updated with a number of references to real-world cases mentioned in the introduction (20).

Comment 5: It would be advisable to explicitly indicate f_i as f_i (i=1,2,3,4) in line 93 for clarity.

Yes, I agree, the text has been updated (124).

Comment 6: If you simply mention that the parameter settings in Table 1 were found through repeated iterations, it may indeed lack sufficient justification for how the parameter values were determined. Is there no algorithm used to find these parameters?

Yes, there are algorithms that could have been used to find the parameters optimally; however, due to the very high computational cost of training the reinforcement learning algorithm, using them was not feasible. Instead, the tuning process was guided by a heuristic based on the trade-off between smoothness and precision. The original phrasing may therefore have been inaccurate, since the repeated iterations were not random but heuristically guided. The manuscript was changed accordingly (240-244); please let me know if it is fine now.

Comment 7: In Figure 10, there seems to be a noticeable difference, but the description for Figure 6 indicates that NMPC had an average positional error of 0.3018 m while RL had 0.2836 m, a difference of only 0.0182 m. Do you consider this to be a significant difference? Furthermore, since there are no comparisons with other methods, there may be insufficient evidence to assert that this difference is notably substantial.

I agree: in the baseline setting (without uncertainties and disturbances), the difference between the two is marginal at best. This is in line with the aim of the article, as it was never intended to prove that the RL controller outperforms the NMPC in this setting, where the NMPC performs remarkably well. The aim was to reduce this error when subjected to uncertainties and disturbances.

 

Thank you for your comments, please let me know if the changes are satisfactory.

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

The authors have answered all my concerns. So, the paper can be accepted for publication now.

Comments on the Quality of English Language

The authors have answered all my concerns. So, the paper can be accepted for publication now.

Author Response

Thank you very much.

Reviewer 4 Report

Comments and Suggestions for Authors

The overall quality and completeness of the manuscript has been greatly improved compared to the previous submission. 

 

Comments on the Quality of English Language

The manuscript seems to be fine. However, some sentences and expressions still need to be checked again.

 

Author Response

Thank you very much. The language of the manuscript was checked and a few improvements were made.
