Article
Peer-Review Record

Research on Sustainable Scheduling of Material-Handling Systems in Mixed-Model Assembly Workshops Based on Deep Reinforcement Learning

Sustainability 2024, 16(22), 10025; https://doi.org/10.3390/su162210025
by Beixin Xia, Yuan Li, Jiayi Gu and Yunfang Peng *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Reviewer 4: Anonymous
Submission received: 17 October 2024 / Revised: 15 November 2024 / Accepted: 16 November 2024 / Published: 17 November 2024
(This article belongs to the Special Issue Sustainability in Industrial Engineering and Engineering Management)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The paper presents an innovative approach to sustainable scheduling in mixed-model assembly workshops using a Deep Reinforcement Learning (DRL) model, specifically an improved Deep Q-Network (DQN) with Prioritized Experience Replay (PER-DQN). The model aims to dynamically respond to changes in the workshop environment and optimize material handling efficiency, balancing production speed and energy consumption. This is a significant contribution to the field, especially considering the increasing complexity and sustainability requirements in manufacturing industries.

It is recommended to implement the following suggestions to improve the paper:

1.     The literature review needs to be more thorough, particularly regarding recent advancements in reinforcement learning and sustainable scheduling. For example, more recent studies (post-2020) focusing on hybrid learning methods and their applications in manufacturing could provide valuable context. Relevant papers: A smart algorithm for multi-criteria optimization of model sequencing problem in assembly lines; multi-policy deep reinforcement learning for multi-objective multiplicity flexible job shop scheduling; Multi-Objective Optimization for Models Sequencing in Mixed-Model Assembly Lines

2.     It is suggested to add the research gap and explain the motivation behind this study.

3.     Remove heading 3.1 subsection DQN algorithm

4.     Why was the DQN algorithm chosen over other state-of-the-art algorithms? What is the complexity of this algorithm?

5.     How are features extracted from the current state?

6.     Provide high-quality versions of Figures 1 and 2.

7.     How are the optimal parameters selected?

8.     Add future research directions to the conclusion.

9.     The manuscript requires thorough proofreading for language and grammar. A detailed language check is recommended before resubmitting.

Comments on the Quality of English Language

   The manuscript requires thorough proofreading for language and grammar. It is recommended to perform a detailed language check and proofreading before resubmitting

Author Response

Comments 1:  The literature review needs to be more thorough, particularly regarding recent advancements in reinforcement learning and sustainable scheduling. For example, more recent studies (post-2020) focusing on hybrid learning methods and their applications in manufacturing could provide valuable context. Relevant papers: A smart algorithm for multi-criteria optimization of model sequencing problem in assembly lines; multi-policy deep reinforcement learning for multi-objective multiplicity flexible job shop scheduling; Multi-Objective Optimization for Models Sequencing in Mixed-Model Assembly Lines. 

Response 1:  Thank you for your suggestions. We have added a literature review on the latest developments in reinforcement learning and sustainable scheduling in the introduction section (page 1 line 37-43; page 2 line 56-63).

Comments 2:  It is suggested to add the research gap and explain the motivation behind this study.

Response 2:  Your suggestions are very helpful. We have elaborated on the research gap in the introduction section and further detailed the motivation of this study (page 2 line 72-84).

Comments 3:  Remove heading 3.1 subsection DQN algorithm

Response 3:  Thank you for your suggestions. We found that section 3.1 indeed overlaps with section 4, and after consideration, we have removed section 3.1 on DQN and merged some of its content into section 4.

Comments 4:  Why was the DQN algorithm chosen over other state-of-the-art algorithms? What is the complexity of this algorithm?

Response 4:  The DQN algorithm approximates the Q-function using deep neural networks, enabling it to handle continuous action and state spaces. The workshop environment is complex and dynamic, with a wide range of possible actions. In such a complex and dynamic production environment for material handling systems, the DQN algorithm can leverage its computational advantages. The complexity of the DQN algorithm is high, mainly due to the high-dimensional state and action spaces in complex environments, which require substantial data and computational resources for training. However, experience replay can break data correlations and improve training efficiency.
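
As a minimal sketch of the experience-replay idea mentioned in Response 4, and not the authors' implementation, the following Python snippet stores transitions in a fixed-size buffer and draws uniform random minibatches, so that consecutive (and therefore correlated) transitions are unlikely to be trained on together. The class name `ReplayBuffer`, the transition layout, and the capacity are illustrative assumptions.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # A deque with maxlen discards the oldest transition once it is full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement mixes old and new transitions.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


# Hypothetical usage with integer placeholders instead of real workshop states.
buffer = ReplayBuffer(capacity=1000)
for t in range(1500):
    buffer.push(state=t, action=t % 4, reward=-1.0, next_state=t + 1, done=False)

batch = buffer.sample(batch_size=32)
```

After 1500 pushes into a buffer of capacity 1000, only the most recent 1000 transitions remain available for sampling.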

Comments 5:  How are features extracted from the current state?

Response 5:  In this study, the state is used as a feature. In more complex systems, feature engineering may be required.

Comments 6:  Provide high-quality versions of Figures 1 and 2.

Response 6:  We have revised and enhanced Figure 2 and Figure 3 (originally Figure 1 and Figure 2).

Comments 7:  How are the optimal parameters selected?

Response 7:  Future work will involve hyperparameter optimization.

Comments 8:  Add future research directions to the conclusion.

Response 8:  Thank you for your suggestions. We have added future research directions in the conclusion (page 15 line 437-438).

Comments 9:  The manuscript requires thorough proofreading for language and grammar. It is recommended to perform a detailed language check and proofreading before resubmitting.

Response 9:  Thank you for your suggestions. We have proofread and adjusted the language and grammar of the paper.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Dear authors,

 

The paper is interesting, but it must be improved. My requests are as follows:

 

1) There are undefined acronyms in the text;

 

2) What does "terminal state" mean in equation (1)?

 

3) Define all parameters of all equations after their first appearance. Equations (1) and (2) are missing all definitions;

 

4) Figures 1, 2, 4 and 5 are of very poor quality. Please recreate them at high resolution;

 

5) Include a figure with the architecture of the implemented neural network;

 

6) Include a figure of the simulated environment;

 

7) Explain what the Ps and Ms are in table 2;

 

8) Improve Figure 3-B. The numbers are merged;

 

9) Important: although the paper's aim is achieved at the end of the manuscript, it lacks a strong relationship with sustainability. Just putting this word in the manuscript title is not enough to fall within the scope of this journal. Therefore, the overall paper must be revised to explain the importance of the proposed solution and the benefits it brings to this topic. The conclusions must all reinforce the benefits of this solution for the advancement of sustainability.

Author Response

Comments 1:  There are undefined acronyms in the text;

Response 1: Thank you for your comments. We have defined the corresponding acronyms in the text (page 2 line 65).

Comments 2:  What does "terminal state" mean in equation (1)?

Response 2:  Thank you for your comment. The terminal state is the last state of an episode: because the Q value is updated continually, an end point must be defined, which is reached when r and s belong to the terminal state. We have also added a corresponding explanation in the article (page 7 line 260-261).

Comments 3:  Define all parameters of all equations after their first appearance. Equations (1) and (2) are missing all definitions;

Response 3:  Thank you for your comments. We have redefined all parameters in the formulas.

Comments 4:  Figures 1, 2, 4 and 5 are of very poor quality. Please recreate them at high resolution;

Response 4:  Thank you for your comments. We have recreated the corresponding figures.

Comments 5:  Include a figure with the architecture of the implemented neural network;

Response 5:  Thank you for your comments. Since Figure 2 already includes the neural network structure, we have adjusted Figure 2 to clearly show the workflow of the neural network.

Comments 6:  Include a figure of the simulated environment;

Response 6:  Thank you for your comments. We have added a simulation environment diagram in Section 2 of the text to better illustrate the scenario.

Comments 7:  Explain what the Ps and Ms are in table 2;

Response 7:  Thank you for your comments. In Table 2, which mentions the bill of materials, Ps represents part types, and Ms represents product types. We have also added explanations in the text (page 9 line 289).

Comments 8:  Improve Figure 3-B. The numbers are merged;

Response 8:  Thank you for your comments. We have adjusted Figure 4 (originally Figure 3).

Comments 9:  Important: although the paper's aim is achieved at the end of the manuscript, it lacks a strong relationship with sustainability. Just putting this word in the manuscript title is not enough to fall within the scope of this journal. Therefore, the overall paper must be revised to explain the importance of the proposed solution and the benefits it brings to this topic. The conclusions must all reinforce the benefits of this solution for the advancement of sustainability.

Response 9:  Thank you for your advice. We have strengthened the connection with sustainability in the abstract, introduction, and conclusion, illustrating the benefits of this research (page 1 line 17-20 & 25-33; page 14 line 421-427; page 19 line 429-438).

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

1. There are no references to formulas in the text, formulas 11, 12 repeat 1 and 2 respectively.

2. The value of the difference δ is introduced in the text (line 103), but it is not used further; the value of the parameter 𝛾 is not described (expression 1).

3. The designations of the quantities in the text and in the formulas are written differently, which makes them difficult to understand (for example, the designation of the reward r in the text (line 98) differs from that in the formulas).

4. The introduction of the loss function L(𝜃) in explicit form occurs only on line 239.

5. The quality of Figures 1 and 2 does not allow you to see anything on them, units of measurement are not indicated in Figures 3 and 4 (quality of them can be improved too).

6. In formula 13, the value of TD-error is read as "TD minus error" and the Q-function does not depend on 𝜃.

7. What is P in the header of Table 2? In terms of quantity, it is similar to the number of parts, but Table 3 also uses the P parameter and there it is the number of product models or the number of parts depending on the presence of the index.

8. The results in Tables 4 and 5 repeat Figure 3.

9. Usually, the TD-error parameter takes into account 2 parameters: 𝛾 and α [https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf, Figure 6.12]. What is the reason for the absence of the α coefficient or its equality to one, which is the same thing?

10. Based on the simulation results (Figures 3 and 4), we can say that the differences between the GA, RL and PER-DQN algorithms in this case are insignificant and can be related to the magnitude of the error. Therefore, the advantage of PER-DQN is questionable.

11. It is unclear how the computation time is measured (Table 7, Figure 5) and why MBS and GA are excluded from consideration there? According to Figure 5, the difference in the average computation time can also be explained by the measurement error (which is confirmed by the amount of noise in the graph) and their computation speed is approximately the same.

12. Why was PER-DQN chosen as the model under study for solving the problem, and not their existing modifications? For example, those described here [https://arxiv.org/pdf/1710.02298] give better results in the context of solving similar problems.

Author Response

Comments 1:  There are no references to formulas in the text, formulas 11, 12 repeat 1 and 2 respectively.

Response 1:  Thank you for your comments. We have added all parameter definitions for the corresponding formulas in the text. We indeed found that formulas 11 and 12 are repetitions of 1 and 2, and the content of section 3.1 overlaps with section 4. Therefore, we deleted the original section 3.1 and merged it into section 4 (page 7).

Comments 2:  The value of the difference δ is introduced in the text (line 103), but it is not used further; the value of the parameter γ is not described (expression 1).

Response 2:  Thank you for your comments. The difference δ is the TD error; since it was not used later in the text, we have removed that expression. We have adjusted the parameter notation in the paper (page 7 line 259).

Comments 3:  The designations of the quantities in the text and in the formulas are written differently, which makes them difficult to understand (for example, the designation of the reward r in the text (line 98) differs from that in the formulas).

Response 3:  Thank you for your comments. We have adjusted the parameter notation in the paper.

Comments 4:  The introduction of the loss function L(θ) in explicit form occurs only on line 239.

Response 4:  We agree with this comment and have added the description of L(θ) in the paper (page 7 line 253).

Comments 5:  The quality of Figures 1 and 2 does not allow you to see anything on them, units of measurement are not indicated in Figures 3 and 4 (quality of them can be improved too).

Response 5:  Thank you for your comments. We have adjusted Figures 2, 3, 4, and 5 (originally Figures 1, 2, 3, and 4), beautified them, and produced them in higher resolution, and added units of measurement.

Comments 6:  In formula 13, the value of TD-error is read as "TD minus error" and the Q-function does not depend on θ.

Response 6:  Your comments are very helpful. We have modified the formula for TD error (page 7 formula 11).

Comments 7:  What is P in the header of Table 2? In terms of quantity, it is similar to the number of parts, but Table 3 also uses the P parameter and there it is the number of product models or the number of parts depending on the presence of the index.

Response 7:  Thank you for your comments. Table 2 mentions the bill of materials, where P represents the type of parts, and M represents the type of products. We have added explanations in the text. We found that the parameter usage in Table 3 overlaps with Table 2, so we adjusted the parameters in Table 3. NP (originally P) in Table 3 represents the number of parts, which is a constant (page 9).

Comments 8:  The results in Tables 4 and 5 repeat Figure 3.

Response 8:  Thank you for your comments. Tables 4 and 5 indeed overlap with Figure 3 in some content, but we want to provide specific data while also presenting the data intuitively, and Table 5 contains more information, including the number of stockouts (NS).

Comments 9:  Usually, the TD-error parameter takes into account 2 parameters: γ and α [https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf, Figure 6.12]. What is the reason for the absence of the α coefficient or its equality to one, which is the same thing?

Response 9:  Thank you very much for your comments. The part you mentioned is about updating the Q function, and the TD-error is just the part after α, so we did not include the α coefficient.
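
The distinction drawn in Response 9 can be written out using the standard tabular Q-learning update from the textbook the reviewer cites. This is the generic form, not necessarily the exact notation of the revised manuscript. The TD error δ contains the discount factor γ, while the learning rate α only scales how much of δ is applied to the Q value:

```latex
\delta_t = r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t),
\qquad
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \, \delta_t
```

In a DQN-style method, the role of α is played by the optimizer's learning rate when the network parameters are updated from the loss, so α does not appear in δ itself.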

Comments 10:  Based on the simulation results (Figures 3 and 4), we can say that the differences between the GA, RL and PER-DQN algorithms in this case are insignificant and can be related to the magnitude of the error. Therefore, the advantage of PER-DQN is questionable.

Response 10:  Thank you for your comments. Our research simulation time is 100 hours, with scheduling every 72 seconds, resulting in 5000 scheduling events, which can reflect statistical characteristics. We also conducted 100-hour simulations under nine different ratios, and PER-DQN achieved the shortest distance and the lowest total cost among all algorithms, demonstrating its advantages.
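
The event count quoted in Response 10 follows directly from the two figures given there (100 hours of simulated time, one scheduling decision every 72 seconds). A short check, with variable names chosen here for illustration:

```python
SIMULATION_HOURS = 100        # simulated time per run, from Response 10
SCHEDULING_INTERVAL_S = 72    # seconds between scheduling decisions, from Response 10

# Convert hours to seconds, then count how many intervals fit in one run.
events_per_run = SIMULATION_HOURS * 3600 // SCHEDULING_INTERVAL_S
print(events_per_run)  # 5000
```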

Comments 11:  It is unclear how the computation time is measured (Table 7, Figure 5) and why MBS and GA are excluded from consideration there? According to Figure 5, the difference in the average computation time can also be explained by the measurement error (which is confirmed by the amount of noise in the graph) and their computation speed is approximately the same.

Response 11:  Thank you for your comments. Our computation time is measured based on the code execution time. Based on the previous experimental results, the overall performance of our RL algorithm is better, so we compared it with the RL algorithm. Since our data are the average of 5000 computations over 100 hours of simulation, they have statistical significance and can explain the results.
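
Response 11 does not state how the code execution time was recorded. One common way to do it, sketched here under that caveat, is to time each scheduling call with `time.perf_counter` and report the spread alongside the mean, which speaks to the reviewer's concern about measurement noise. The function `dummy_schedule` is a hypothetical stand-in for one scheduling decision, and the call count is arbitrary.

```python
import statistics
import time


def time_calls(fn, n_calls):
    """Return per-call wall-clock durations in seconds for n_calls invocations."""
    durations = []
    for _ in range(n_calls):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def dummy_schedule():
    # Hypothetical placeholder workload, not the authors' scheduler.
    return sum(i * i for i in range(1000))


durations = time_calls(dummy_schedule, n_calls=200)
mean_s = statistics.mean(durations)
stdev_s = statistics.stdev(durations)
print(f"mean = {mean_s:.6f} s, stdev = {stdev_s:.6f} s over {len(durations)} calls")
```

Reporting the standard deviation (or a confidence interval) next to the mean makes it possible to judge whether a difference between two algorithms exceeds run-to-run variation.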

Comments 12:  Why was PER-DQN chosen as the model under study for solving the problem, and not their existing modifications? For example, those described here [https://arxiv.org/pdf/1710.02298] give better results in the context of solving similar problems.

Response 12:  Thank you for your comments. The DQN learning algorithm may ignore some low-frequency but high-value samples during experience extraction, leading to overestimation issues. Prioritized experience replay samples experiences with larger TD-errors, reducing the impact of overestimation and improving the stability and performance of the algorithm. PER-DQN can give higher sampling probabilities to important experiences, making them more frequently used, thus improving learning efficiency and optimizing data utilization, making it more suitable for our research object—the material handling system in a mixed-flow assembly workshop.
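
To make the sampling mechanism described in Response 12 concrete, here is a minimal sketch of proportional prioritized sampling, in which a transition's sampling probability grows with the magnitude of its TD error. It is an illustration under assumed settings, not the authors' implementation: the exponent `alpha`, the offset `eps`, and the TD-error values are made up, and the importance-sampling correction that usually accompanies prioritized replay is omitted for brevity.

```python
import random


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritization: P(i) = p_i**alpha / sum_k p_k**alpha,
    with priority p_i = |td_error_i| + eps."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_indices(td_errors, batch_size, alpha=0.6, rng=random):
    probs = per_probabilities(td_errors, alpha)
    # Weighted sampling with replacement over transition indices.
    return rng.choices(range(len(td_errors)), weights=probs, k=batch_size)


# Made-up TD errors: three ordinary transitions and one surprising one.
td_errors = [0.1, 0.1, 0.1, 5.0]
probs = per_probabilities(td_errors)

rng = random.Random(0)  # seeded for repeatability
counts = [0] * len(td_errors)
for i in sample_indices(td_errors, batch_size=10_000, rng=rng):
    counts[i] += 1
```

With these values, the transition with the large TD error is drawn far more often than each of the others, which is the behaviour Response 12 relies on. Setting `alpha=0` recovers uniform sampling.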

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

  

The manuscript entitled “Research on Sustainable Scheduling of Material Handling System in Mixed-model Assembly Workshop Based on Deep Reinforcement Learning” provides some good results. Therefore, the current manuscript could be accepted for publication, but after going through a major revision.      

1.     Keywords should be more eye-catching.

2.     Quality of figures 1, 2, 4 and 5 must be improved.

3.     Any abbreviations must be defined in their first appearance in the manuscript such as “DQN” and “PER-DQN” in the abstract.

4.     Other algorithms must be discussed and compared to the one (DQN) used in the current study. Advantages and disadvantages of using these algorithms should be tabulated.

5.     The main aim of the manuscript is not clear. It must be clarified at the end of the introduction.

6.     There must be a further explanation of the terms used in equations 1, 2, 11, 12, 13, 14, 15, and 16.

7.     The units of all quantities/terms used in equations 3, 4, 5, 6, 7, 8, and 9 must be mentioned.

8.     The title of tables 3 and 6 should be more informative.

9.     All the captions of the figures must be rewritten in a much more informative way.

10.  All the panels of figures 3 and 4 must be mentioned in the figure caption.  

11.  English language should be improved.

Author Response

Comments 1:  Keywords should be more eye-catching.

Response 1:  Thank you for your suggestion. We have made necessary adjustments to the keywords.

Comments 2:   Quality of figures 1, 2, 4 and 5 must be improved.

Response 2:  Thank you for your comment. We have remade the corresponding figures, and the original Figures 1, 2, 4, and 5 are now Figures 2, 3, 5, and 6.

Comments 3:  Any abbreviations must be defined in their first appearance in the manuscript such as “DQN” and “PER-DQN” in the abstract.

Response 3:  Thank you for your comment. We have redefined the corresponding acronyms in the text.

Comments 4:  Other algorithms must be discussed and compared to the one (DQN) used in the current study. Advantages and disadvantages of using these algorithms should be tabulated.

Response 4:  Thank you for your comment. We have added a section discussing the advantages and disadvantages of each algorithm (page 10 line 317-328).

Comments 5:  The main aim of the manuscript is not clear. It must be clarified at the end of the introduction.

Response 5:  Your suggestion is very helpful. We have added an explanation of the research gap in the introduction and detailed the motivation of this study again (page 2 line 72-84).

Comments 6:  There must be a further explanation of the terms used in equations 1, 2, 11, 12, 13, 14, 15, and 16.

Response 6:  Thank you for your comment. We have added definitions and descriptions of all the corresponding parameters in the text.

Comments 7:  The units of all quantities/terms used in equations 3, 4, 5, 6, 7, 8, and 9 must be mentioned.

Response 7:  Thank you for your comment. We have added the units of the corresponding parameters in the paper.

Comments 8:  The title of tables 3 and 6 should be more informative.

Response 8:  Thank you for your comment. We have modified the titles of Tables 3 and 6, adding more content to make them more specific.

Comments 9:  All the captions of the figures must be rewritten in a much more informative way.

Response 9:  Thank you for your comment. We have adjusted the titles of all figures and tables in the text.

Comments 10:  All the panels of figures 3 and 4 must be mentioned in the figure caption.

Response 10:  Thank you for your comment. We have added all sections in the titles of Figures 4 and 5 (the original Figures 3 and 4).

Comments 11:  English language should be improved.

Response 11:  Thank you for your comment. We have proofread and adjusted the language and grammar of the paper.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have addressed all the comments.
However, line 41 on page 1 needs to be fixed; something is missing.

Comments on the Quality of English Language

It’s ok 

Author Response

Comments 1:  The authors have addressed all the comments. However, line 41 on page 1 needs to be fixed; something is missing.

Response 1:  Thank you for pointing this out. We agree with this comment. Therefore, we have modified the serial number of the literature (line 41).

Reviewer 2 Report

Comments and Suggestions for Authors

Dear authors,

Thank you for updating the paper with my suggestions. I have no more questions. Good luck!

Author Response

Thank you very much for your valuable advice.

Reviewer 3 Report

Comments and Suggestions for Authors

The authors have done a good enough job of addressing the comments; the article can be published.

Author Response

Thank you very much for your valuable advice.

Reviewer 4 Report

Comments and Suggestions for Authors

The manuscript has been improved. Therefore, it could be accepted for publication, but the authors must make sure that:

All the panels such as a, b, c, d, and e of figures 4 and 5 must be mentioned in the figure caption.

Author Response

Comments 1:  All the panels such as a, b, c, d, and e of figures 4 and 5 must be mentioned in the figure caption.

Response 1:  Thank you for pointing this out. We agree with this comment. Therefore, we have adjusted the headings in Figures 4 and 5 (page 11 line 340; page 13 line 371).
