Article
Peer-Review Record

Goal-Oriented Obstacle Avoidance with Deep Reinforcement Learning in Continuous Action Space

Electronics 2020, 9(3), 411; https://doi.org/10.3390/electronics9030411
by Reinis Cimurs 1, Jin Han Lee 2 and Il Hong Suh 2,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 20 January 2020 / Revised: 24 February 2020 / Accepted: 26 February 2020 / Published: 28 February 2020
(This article belongs to the Section Artificial Intelligence)

Round 1

Reviewer 1 Report

The contribution of this work is interesting and deserves publication. The title is descriptive. The abstract clearly indicates the scope. The paper is well organized and logically written; nevertheless, the English language of the contribution should be improved by a native speaker.

Appropriate research goals are chosen in this contribution, which shows that the authors have a high level of understanding of current research within the field. I suggest that the authors explain more in depth the choice of the. The presentation of the results in terms of the research objectives has been made; nevertheless, there should be a deeper clarification.

The authors have been able to draw logical conclusions from the results. 

The quality of pictures and figures is good.

Nevertheless, please clarify the learning framework further.

@INPROCEEDINGS{8765910,
  author    = {Ribeiro, J. M. S. and Silva, M. F. and Santos, M. F. and Vidal, V. F. and Honório, L. M. and Silva, L. A. Z. and Rezende, H. B. and Santos Neto, A. F. and Mercorelli, P. and Pancoti, A. A. N.},
  title     = {Ant Colony Optimization Algorithm and Artificial Immune System Applied to a Robot Route},
  booktitle = {2019 20th International Carpathian Control Conference (ICCC)},
  year      = {2019},
  pages     = {1-6},
}

@INPROCEEDINGS{Ivanov20183187,
  author    = {Ivanov, M. and Kartashov, V. and Sergiyenko, O. and Hernandez, W. and Tyrsa, V. and Sheiko, S. and Mercorelli, P. and Kolendovska, M.},
  title     = {Individual scans fusion in virtual knowledge base for navigation of mobile robotic group with 3D TVS},
  booktitle = {IECON 2018 - 44th Annual Conference of the IEEE Industrial Electronics Society},
  year      = {2018},
  pages     = {3187-3192},
}

@ARTICLE{Garcia-Cruz2014141,
  author  = {Garcia-Cruz, X. M. and Sergiyenko, O. Yu. and Tyrsa, V. and Rivas-Lopez, M. and Hernandez-Balbuena, D. and Rodriguez-Quiñonez, J. C. and Basaca-Preciado, L. C. and Mercorelli, P.},
  title   = {Optimization of 3D laser scanning speed by use of combined variable step},
  journal = {Optics and Lasers in Engineering},
  year    = {2014},
  volume  = {54},
  pages   = {141-151},
}

 

Author Response

We would like to thank the reviewer for his time and the review of our paper. Additionally, thank you for the suggested additional papers. We have added them as references in our paper, and they are introduced in Section 2.

 

The contribution of this work is interesting and deserves publication. The title is descriptive. The abstract clearly indicates the scope. The paper is well organized and logically written; nevertheless, the English language of the contribution should be improved by a native speaker.

We have updated the contributions section of our paper with, hopefully, improved English.

 

Appropriate research goals are chosen in this contribution, which shows that the authors have a high level of understanding of current research within the field. I suggest that the authors explain more in depth the choice of the. The presentation of the results in terms of the research objectives has been made; nevertheless, there should be a deeper clarification.

We have added clarifications of the results obtained from the experiments in the real environment and an explanation of the differences between the compared approaches in Section 5.2. The resulting output of the network in a real environment has been added in Fig. 6(b). This shows the network performance with real depth camera input.

 

Nevertheless, please clarify the learning framework further.

We have clarified the learning framework by explaining the technical details of learning and optimization in Section 4. Additionally, learning parameters are described in Table 1.

Reviewer 2 Report

The authors deliver a well-written example of a goal-oriented obstacle avoidance navigation system based on deep reinforcement learning. Starting from the introduction, the problems of implementation are mentioned and, even without numerous references, explained in detail. Sections 2 and 3 are well-founded reviews. The experimental documentation has a high quality, especially through the figures used, in particular Figures 4, 7 and 8. Section 6 addresses all discovered issues of the experiments. All in all, an acceptable contribution.

Author Response

We would like to thank the reviewer for his time and the review of our paper. Additionally, we have slightly expanded the related works section and referred to a wider selection of related research.


Reviewer 3 Report

The abstract needs to clearly state what was achieved.

The state of the art section does not explore how good other approaches are and why your approach is needed.

I am not sure why the approach is that novel. It just feels like a long presentation with an actor-critic on the end. The actor-critic model has been used for a long time with this type of approach.

Why is the model unsupervised when you are giving it a reward? Yes, there is no label, but I would consider this supervision.

There is a need for more details on the model: the architecture of the actor-critic model, the training time and how fast it learns, and the parameters used.

The environment does not seem that complicated. Would your model handle a more complex environment with moving objects?

I would have liked to see the actual representation produced by the deep learning part.  

You should have explored more whether the robot is modelling the room or simply doing obstacle avoidance. Reactive obstacle avoidance is not that good. If it is planning in some form, that would be interesting. Does it create the optimum route around the room?

I would like to see when the model breaks down and performs badly,  and why this is. 

More details are needed on future work.

 

Author Response

We would like to thank the reviewer for his time and the review of our paper.

 

The abstract needs to clearly state what was achieved.

The abstract has been updated to reflect the results obtained from the experiments.

 

The state of the art section does not explore how good other approaches are and why your approach is needed.

We have updated the Related Works section to explicitly state why similar state-of-the-art approaches do not always work in unknown environments. When comparing similar deep-learning-based approaches, their limitations are described. Generally, the existing depth-image approaches do not consider goal orientation in their obstacle avoidance tasks and produce actions only in discrete space. In contrast, the continuous-action obstacle avoidance approaches use very limited input data that combines laser and goal information. These approaches do not consider mixed-input networks and have limitations, as shown by the experiments.

 

I am not sure why the approach is that novel. It just feels like a long presentation with an actor-critic on the end. The actor-critic model has been used for a long time with this type of approach.

While actor-critic methods have been used for a variety of implementations, their use for learning end-to-end robot navigation in the real environment is an emerging field. Our approach improves upon previously examined ones by dealing with significantly larger input data, as previous uses of the DDPG network rely on simplified inputs. Additionally, a mixed-input approach is introduced for separate feature extraction from the different types of data.
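As a purely illustrative sketch (not the exact architecture used in the paper; the layer sizes, names, and input dimensions below are assumptions), a mixed-input actor in PyTorch could extract features from the depth image and the goal information in separate branches before merging them into continuous action outputs:

import torch
import torch.nn as nn

class MixedInputActor(nn.Module):
    # Hypothetical mixed-input actor: separate feature extraction for the
    # depth image (convolutional branch) and the goal data (fully connected branch).
    def __init__(self, goal_dim=2, action_dim=2):
        super().__init__()
        self.depth_branch = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),
        )
        self.goal_branch = nn.Sequential(nn.Linear(goal_dim, 32), nn.ReLU())
        # Merged head maps the combined features to continuous actions in [-1, 1].
        self.head = nn.Sequential(
            nn.Linear(32 * 4 * 4 + 32, 128), nn.ReLU(),
            nn.Linear(128, action_dim), nn.Tanh(),
        )

    def forward(self, depth_image, goal):
        features = torch.cat(
            [self.depth_branch(depth_image), self.goal_branch(goal)], dim=1
        )
        return self.head(features)

# Example with dummy inputs: a batch of one 64x64 depth image and a 2D goal vector.
actor = MixedInputActor()
action = actor(torch.zeros(1, 1, 64, 64), torch.zeros(1, 2))

In a DDPG setting, a critic network would additionally take the action as an input alongside the same separately extracted features.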

Why is the model unsupervised when you are giving it a reward? Yes, there is no label, but I would consider this supervision.

As correctly pointed out, supervised learning requires a dataset of correct labels that the input is then compared to. Supervision relates to learning from designated correct answers to an input. For an obstacle avoidance task, that would mean that a human operator first performs the action and the network learns by comparing its own action to that of the human, or a path is first obtained with a path planner that has full information about the environment and the network output is then compared to it. Here, the human action or the planned path is labeled as the correct solution, and the task of the network is to be as close to it as possible. In reinforcement learning, however, no such dataset is available and no designated way of achieving the goal exists; there is no correct answer to an input. There is a described reward for certain outcomes, but how to arrive at the goal needs to be learned through trial and error. The network scores its own performance and derives a policy by comparing its performance to its previous experiences. While we understand that describing a reward for preferred actions might seem to imply supervision, the fact that the network does not learn by comparing its output to designated correct actions means it is not using supervised learning. To avoid such confusion, however, we have removed references to unsupervised and supervised learning and refer to our method only as using deep reinforcement learning.
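To make the distinction concrete, the following minimal sketch (hypothetical function names, not code from the paper) contrasts a supervised loss, which needs a labeled correct action, with a DDPG-style critic target, which is built only from a scalar reward and the critic's own bootstrapped value estimate:

import torch
import torch.nn.functional as F

# Supervised learning: requires a dataset of (observation, correct_action) labels.
def supervised_loss(actor, observation, correct_action):
    return F.mse_loss(actor(observation), correct_action)

# Reinforcement learning (DDPG-style): no labeled action exists; the learning
# signal is a scalar reward combined with the target critic's value of the next state.
def ddpg_critic_target(target_critic, target_actor, next_obs, reward, done, gamma=0.99):
    with torch.no_grad():
        next_q = target_critic(next_obs, target_actor(next_obs))
        return reward + gamma * (1.0 - done) * next_q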

There is a need for more details on the model: the architecture of the actor-critic model, the training time and how fast it learns, and the parameters used.

We have clarified the learning framework by explaining the technical details of learning and optimization in Section 4. Additionally, learning parameters are added in Table 1.


The environment does not seem that complicated. Would your model handle a more complex environment with moving objects?

Additional experiments were performed with multiple moving humans in a narrow setting. The robot is tasked with navigating between two points in a hallway at a distance of 5 meters, with three moving people used as obstacles. The experiment is shown in the supplementary video material uploaded on YouTube at https://youtu.be/nNWoabjKxIA (the additional experiment begins at 4:57). A reference to the video has been added to the article.

I would have liked to see the actual representation produced by the deep learning part.  

The resulting output of the network in a real environment has been added in Fig. 6(b). This shows the network performance with real depth camera input.

You should have explored more whether the robot is modelling the room or simply doing obstacle avoidance. Reactive obstacle avoidance is not that good. If it is planning in some form, that would be interesting. Does it create the optimum route around the room?

As mentioned in the introduction and related works, in order to perform path planning, it is necessary to build a map of the environment in some capacity. The sensors need to be reliable and the information trustworthy. Unfortunately, this is not always possible, especially if navigation is performed in an unknown environment. For optimal planning, the full layout of the environment is required. Therefore, the goal of our research is not to obtain optimal navigation around the room; the goal is to avoid obstacles and arrive at the destination based only on locally available information. We agree that for fully autonomous navigation in known or semi-known environments, some pre-planning of the robot's route should be performed, and our method could be integrated into such a system to deal with situations where planning is unreliable or difficult.

I would like to see when the model breaks down and performs badly,  and why this is. 

The failure states of our proposed approach are discussed in Section 5.1, which describes the experimental situations depicted in Fig. 4(i) and Fig. 4(j). We have added a more explicit explanation in the Summary section of the paper.

More details are needed on future work.

The Summary and Discussions section has been updated to include more details of the proposed future work. Probable approaches and their implementation ideas have been explained.

Round 2

Reviewer 3 Report

You have made some nice changes.
