Article
Peer-Review Record

An Improved Multi-Objective Deep Reinforcement Learning Algorithm Based on Envelope Update

Electronics 2022, 11(16), 2479; https://doi.org/10.3390/electronics11162479
by Can Hu 1, Zhengwei Zhu 1, Lijia Wang 1, Chenyang Zhu 2,* and Yanfei Yang 2
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 12 July 2022 / Revised: 30 July 2022 / Accepted: 5 August 2022 / Published: 9 August 2022

Round 1

Reviewer 1 Report

- The need for the discounting factor \gamma should be explained together with Eq. 6.

- The network in Fig. 2 acts, to some degree, like a critic network. It would be helpful if the authors discussed its benefits over the networks proposed in previous works, such as 10.1080/16168658.2021.1943887 or similar, in the introduction. That would help readers appreciate this contribution.

- The convergence of the update law in Eq. 9 should be clarified in more detail, i.e., the behavior of the sequence \theta^{-} given a positive parameter \tau < 1.

- To clarify the results in Fig. 5, all loss function curves could be plotted together or on the same y-axis scale.

- Section 5 should be improved so that it better highlights the advantages of the proposed scheme.

- The conclusion, as well as the abstract, should be rewritten somewhat to emphasize the results in light of the main contributions of this work.
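
For reference, the soft target-network update the reviewer asks about in connection with Eq. 9 is typically Polyak averaging. A minimal NumPy sketch, assuming Eq. 9 has the usual form \theta^{-} \leftarrow \tau\theta + (1-\tau)\theta^{-} (the parameter vectors and the value of \tau below are illustrative, not taken from the manuscript):

```python
import numpy as np

# Illustrative parameter vectors standing in for the online network
# parameters theta and the target network parameters theta_minus.
rng = np.random.default_rng(0)
theta = rng.normal(size=4)   # online network parameters (held fixed here)
theta_minus = np.zeros(4)    # target network parameters

TAU = 0.01  # soft-update rate, required to satisfy 0 < tau < 1

def soft_update(theta, theta_minus, tau=TAU):
    """Polyak averaging: the target slowly tracks the online parameters."""
    return tau * theta + (1.0 - tau) * theta_minus

# For a fixed theta, the gap |theta - theta_minus| shrinks by a factor
# (1 - tau) per step, so theta_minus converges to theta geometrically.
for _ in range(1000):
    theta_minus = soft_update(theta, theta_minus)

print(np.allclose(theta_minus, theta, atol=1e-3))  # True
```

This geometric contraction is why a positive \tau < 1 stabilizes training: the target network changes slowly, so the bootstrapped Q-targets drift smoothly rather than jumping at every update.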

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Multi-objective reinforcement learning (MORL) aims to uniformly approximate the Pareto frontier in multi-objective decision-making problems, but suffers from insufficient exploration and unstable convergence. The authors propose a multi-objective deep reinforcement learning algorithm (Envelope with Dueling structure, Noisynet and soft update, EDNs) to improve the agent's ability to learn optimal multi-objective strategies. First, the EDNs algorithm uses neural networks to approximate the value function and updates the parameters based on the convex envelope of the solution boundary. The DQN network structure is then replaced with the dueling structure, and the state-action value function is split into an advantage function and a value function to speed up convergence. Next, the Noisynet method adds exploration noise to the neural network parameters, giving the agent more efficient exploration. Finally, the soft update method updates the target network parameters to stabilize the training procedure. Using the DST environment as a case study, the experimental results show that the EDNs algorithm has better stability and exploration capability than the EMODRL algorithm: over 1000 episodes, EDNs improves coverage by 5.39% and reduces the adaptation error by 36.87%. The following references should be added in the reference section:

1. Vandana, R. Dubey, Deepmala, L.N. Mishra, V.N. Mishra, Duality relations for a class of a multiobjective fractional programming problem involving support functions, American J. Operations Research, Vol. 8, 2018, pp. 294-311. DOI: 10.4236/ajor.2018.84017.

2. R. Dubey, Deepmala, V.N. Mishra, Higher-order symmetric duality in nondifferentiable multiobjective fractional programming problem over cone constraints, Stat., Optim. Inf. Comput., Vol. 8, March 2020, pp. 187-205. DOI: 10.19139/soic-2310-5070-601.
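
For context, the dueling decomposition described in the summary above, splitting the Q-function into a state-value stream and an advantage stream, is conventionally recombined as Q(s, a) = V(s) + A(s, a) - mean_a A(s, a). A minimal NumPy sketch with illustrative numbers (the head outputs below are invented for demonstration, not taken from the manuscript):

```python
import numpy as np

def dueling_q(value, advantage):
    """Combine the value and advantage streams into Q-values.

    Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
    Subtracting the mean advantage keeps the decomposition identifiable:
    otherwise any constant could be shifted between V and A.
    """
    return value + advantage - advantage.mean(axis=-1, keepdims=True)

# Illustrative outputs of the two heads for a batch of 2 states, 3 actions.
value = np.array([[1.0], [2.0]])           # V(s), shape (2, 1)
advantage = np.array([[0.5, -0.5, 0.0],
                      [1.0,  1.0, -2.0]])  # A(s, a), shape (2, 3)

q = dueling_q(value, advantage)
print(q)  # [[1.5, 0.5, 1.0], [3.0, 3.0, 0.0]]
```

Because the value stream is updated on every step regardless of the action taken, this split tends to converge faster than a single monolithic Q-head, which matches the motivation given in the summary.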

Recommendation: Based on the above revision, the manuscript can be accepted in this journal after minor revision.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

MERITS

- An accurate presentation of MORL methods is provided, based on a detailed investigation of related works from the literature, with both mathematical support and a technical description.

- A novel MORL algorithm is proposed and tested against other ones, showing higher performance on some particular multi-objective optimization problems.

- The algorithm's pseudo-code is presented.

- The new algorithm's performance is demonstrated clearly, both by graphs and by numerical values.

CRITICS

- The first part of the fifth section, Related Work, since it is a presentation of other authors' works, should be placed at the beginning of the paper.

ERRORS

- There are a few editing errors: capitalized "the", "on" and "and" in the titles of sub-sections 3.3 and 4.1.

- The text "In this paper, with the number of training episodes as the horizontal coordinate and the value of loss function as the vertical coordinate. We compare the loss function curves of EMODRL, Envelope-Dueling, Envelope-Noise, Envelope-soft, Envelope-Dueling-…" seems fuzzy.

- Some figures are over-sized

RECOMMENDATIONS

- Correct the errors

- Try to reduce the size of figures 1, 2, 3 and 4

- The references should be cited in the text in increasing numerical order.

- If possible, remove the fifth section, merging its content into the Background section.

Comments for author File: Comments.pdf

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The authors have already done good work revising the article.
