Article
Peer-Review Record

Improving Text-to-Code Generation with Features of Code Graph on GPT-2

by Incheon Paik 1,* and Jun-Wei Wang 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Electronics 2021, 10(21), 2706; https://doi.org/10.3390/electronics10212706
Submission received: 17 September 2021 / Revised: 26 October 2021 / Accepted: 2 November 2021 / Published: 5 November 2021
(This article belongs to the Special Issue Advances in Data Mining and Knowledge Discovery)

Round 1

Reviewer 1 Report

This paper describes a new approach for improving text-to-code generation. Models pre-trained from scratch and fine-tuned GPT-2 models lead to the best performance when compared with code graph models trained with enough data.

The authors should do another round of proofreading to make sure all typos are fixed and, more importantly, that the language is improved as much as possible. The work would otherwise be recommended for publication, but I feel it needs some language revisions before that can be considered. I have included a small list of potential language improvements that the authors should use as a guide for their revision. The methodology and results are otherwise very good, and it would be a shame if the paper were not published because of language.

- consider rephrasing "a very hot deep learning application" 
- consider improving the clarity of the sentence "In this study, we investigate the improvement efficiency of code graphs with several variances on GPT-2 and refer to the abstract semantic tree used to collect the features (variables and variable type status) in the code." 
- consider rephrasing "pre-training from the beginning"
- consider rephrasing "If BERT wants to generate code, it needs to 42 be matched with the seq2seq architecture." 
- the list goes on...

Author Response

Comments by Reviewer#1: The authors should do another round of proofreading to make sure all typos are fixed and, more importantly, that the language is improved as much as possible. The work would otherwise be recommended for publication, but I feel it needs some language revisions before that can be considered. I have included a small list of potential language improvements that the authors should use as a guide for their revision. The methodology and results are otherwise very good, and it would be a shame if the paper were not published because of language. - consider rephrasing "a very hot deep learning application" / - consider improving the clarity of the sentence "In this study, we investigate the improvement efficiency of code graphs with several variances on GPT-2 and refer to the abstract semantic tree used to collect the features (variables and variable type status) in the code." / - consider rephrasing "pre-training from the beginning" / - consider rephrasing "If BERT wants to generate code, it needs to be matched with the seq2seq architecture." / - the list goes on...

Author response: Thank you very much for your constructive and careful comments. We have revised all the points mentioned and have additionally checked the English of the entire manuscript with the help of a commercial English editing service.
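For readers who want a concrete picture of the feature collection described in the quoted sentence (variables and variable type status gathered from an abstract semantic tree), here is a minimal illustrative sketch using Python's standard `ast` module. The paper's actual parser, target language, and feature set are not reproduced here; every name in the snippet is an assumption for illustration.

```python
# Illustrative sketch only: collect (variable name, type hint) pairs from
# a syntax tree, in the spirit of the feature extraction the paper describes.
import ast

def collect_variable_features(source: str):
    """Walk the AST and record (variable name, declared type or 'unknown')."""
    features = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # Annotated assignments carry an explicit type, e.g. `x: int = 3`.
        if isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
            features.append((node.target.id, ast.unparse(node.annotation)))
        # Plain assignments only tell us that a variable exists.
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    features.append((target.id, "unknown"))
    return features

print(collect_variable_features("x: int = 3\ny = x + 1"))
# [('x', 'int'), ('y', 'unknown')]
```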

Author Response File: Author Response.pdf

Reviewer 2 Report

  1. The introduction needs significant improvement in highlighting the motivation and the need for carrying out this research. Major refinements are needed to better highlight its contribution and novelty.
  2. A number of recent works related to the problem have not been referenced. The literature presented fails to establish a strong background and motivation for carrying out this work. Gaps in existing approaches are not addressed.
  3. The architectural description is weak and fails to answer “why” the work is carried out in this particular fashion. All diagrams involve a number of elements, each of which should be explained in the text or in tabular form for better clarity.
  4. This paper presents no empirical evidence to show that this method is better than other existing methods, making it unclear whether the new approach is accurate, effective, or efficient. This aspect should be discussed, and comparisons with other approaches should be provided where possible. Simply stating that the work was done this way is not convincing. This is important for emphasizing the novelty of the work.

Author Response

Reviewer#2, Point #1: The introduction needs significant improvement in highlighting the motivation and the need for carrying out this research. Major refinements are needed to better highlight its contribution and novelty.

Author response: Thank you so much for pointing out this issue; addressing it improves the quality of our manuscript. We have revised the introduction entirely to highlight the motivation.

Reviewer#2, Concern #2: A number of recent works related to the problem have not been referenced. The literature presented fails to establish a strong background and motivation for carrying out this work. Gaps in existing approaches are not addressed.

Author response: Thank you so much for pointing this out. We have added descriptions of recent related work, including work on code modeling using the GPT architecture. The gaps in existing approaches are now described in the related work section and the references.

Reviewer#2, Concern #3: The architectural description is weak and fails to answer “why” the work is carried out in this particular fashion. All diagrams involve a number of elements, each of which should be explained in the text or in tabular form for better clarity.

Author response: Thank you for your comments. There have been proposals to modify Transformer-based pre-trained architectures (for example, specific attention mechanisms or loss functions introduced for particular research goals) to obtain better performance. In code generation, some researchers have suggested such modifications, but as far as we know, no clear explanation of their results or effects has been given; they remain largely unexplainable black boxes. In our proposal, we used the GPT-2 architecture exactly as it is, and we still obtained good experimental results, as reported in the manuscript. (The motivation of our work is, of course, not to propose new architectural details for improvement.)

We have also improved Figures 5 and 6 for better clarity in the revised version and changed Figure 1 to a table.
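To make the point about using an unmodified GPT-2 concrete, the following is a minimal sketch of fine-tuning the stock architecture on a flattened (description, code) pair, assuming the HuggingFace `transformers` library. The `<SEP>` separator and the single training step are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch: causal-LM fine-tuning of an unmodified GPT-2 on a
# text-to-code pair. Separator token and hyperparameters are illustrative.
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")  # architecture used as is

# A (description, code) pair is flattened into one sequence for LM training.
example = "sum two numbers <SEP> def add(a, b):\n    return a + b"
inputs = tokenizer(example, return_tensors="pt")

# Standard language-modeling loss: the labels are the input ids themselves.
outputs = model(**inputs, labels=inputs["input_ids"])
outputs.loss.backward()  # an optimizer step would follow in a real loop
```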

Reviewer#2, Concern #4: This paper presents no empirical evidence to show that this method is better than other existing methods, making it unclear whether the new approach is accurate, effective, or efficient. This aspect should be discussed, and comparisons with other approaches should be provided where possible. Simply stating that the work was done this way is not convincing. This is important for emphasizing the novelty of the work.

Author response: We agree with the comment and thank you for these valuable remarks. Our basic research motivation is the observation of code generation with code graph features using GPT-2, and part of the underlying idea is based on GraphCodeBERT. Our current goal is not a direct comparison with other models such as BERT. However, because we add code graphs to the existing training set for GPT-2, we expect our approach to outperform the alternatives. For this reason, we have not shown quantitative comparison results in this manuscript; instead, we focused on an internal observation of the results from the GPT-2 model. We are now carrying out further extensive experiments with several detailed architectures for code generation, conducted by another group member, as the next step of this research.
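As a purely hypothetical illustration of the recipe described above (augmenting the training data with code-graph features rather than modifying the architecture), one might serialize the extracted (variable, type) pairs into the training text. The `<graph>`, `<var>`, `<type>`, and `<code>` markers below are invented for this sketch and are not the authors' encoding.

```python
# Hypothetical sketch: flatten code-graph features into the training string
# so GPT-2 can condition on them without any architectural change.
def build_training_sample(description, code, graph_features):
    """Combine a (description, code) pair and graph features into one sample."""
    # e.g. graph_features = [("x", "int"), ("y", "unknown")]
    feature_text = " ".join(f"<var> {name} <type> {typ}"
                            for name, typ in graph_features)
    return f"{description} <graph> {feature_text} <code> {code}"

sample = build_training_sample(
    "sum two numbers",
    "def add(a, b):\n    return a + b",
    [("a", "unknown"), ("b", "unknown")],
)
print(sample)
# sum two numbers <graph> <var> a <type> unknown <var> b <type> unknown <code> def add(a, b): ...
```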

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Thanks for making various improvements to the manuscript! It is now easier to read and understand what you have done! Just a generic comment about the use of the word "code" or "codes": we understand what it means, but you could consider writing "source code" a few times instead, to make it a bit more obvious to those who might not completely understand what is being described.

Just a number of minor fixes required:

Line 27: should read "resources for application"

Line 50: potentially should read "with a unique variable concept"

Line 58: should read "can improve the performance"

Line 65: should read " several features of variables"

Line 67: should read "their effect is observed"

Line 70: consider rephrasing to something like "Code generation is a relatively new topic in the deep learning area that has gained some attention over the last few years."

Line 107: I think it should read "a large set of language models"

Line 115: should read "graph neural networks"

Line 293: potentially should read "has a very powerful"

Line 349: needs rewording to something like "In this work, we mainly used the data type..."

Author Response

Reviewer#1: Thanks for making various improvements to the manuscript! It is now easier to read and understand what you have done! Just a generic comment about the use of the word "code" or "codes": we understand what it means, but you could consider writing "source code" a few times instead, to make it a bit more obvious to those who might not completely understand what is being described.

Author response: Thank you so much for your kind and valuable comments, which helped improve our manuscript. We have addressed all the points raised.


Author Response File: Author Response.pdf

Reviewer 2 Report

The authors have tried to fill in the gaps, but improvement is still required:

  1. The need for this research still has to be established. The authors need to highlight the gaps and state which gaps they are closing. Simply mentioning that some author did this or that is not convincing.
  2. The main aim of the manuscript is to "investigate the improvement of efficiency by code graphs with several variances on GPT-2". A theoretical investigation alone is not acceptable; it needs to be validated against existing results. Mentioning "We are now carrying out further extensive experiments with several detailed architectures for code generation, conducted by another group member, as the next step of this research" is not acceptable. Please address the following concern. Concern #4: This paper presents no empirical evidence to show that this method is better than other existing methods, making it unclear whether the new approach is accurate, effective, or efficient. This aspect should be discussed, and comparisons with other approaches should be provided where possible. Simply stating that the work was done this way is not convincing. This is important for emphasizing the novelty of the work.

Author Response

Reviewer#2: The need for this research still has to be established. The authors need to highlight the gaps and state which gaps they are closing. Simply mentioning that some author did this or that is not convincing.

The main aim of the manuscript is to "investigate the improvement of efficiency by code graphs with several variances on GPT-2". A theoretical investigation alone is not acceptable; it needs to be validated against existing results. Mentioning "We are now carrying out further extensive experiments with several detailed architectures for code generation, conducted by another group member, as the next step of this research" is not acceptable. Please address the following concern. Concern #4: This paper presents no empirical evidence to show that this method is better than other existing methods, making it unclear whether the new approach is accurate, effective, or efficient. This aspect should be discussed, and comparisons with other approaches should be provided where possible. Simply stating that the work was done this way is not convincing. This is important for emphasizing the novelty of the work.

Author response: Thank you so much for your valuable comments. We fully understand and respect the points you raise. We are sorry, but your request requires additional, lengthy experiments, which makes it impossible to produce the comparison results within the 5-7 days requested by the MDPI office. If this is required and the MDPI office allows more time (more than one month), we will consider doing it, provided we can find another researcher to carry out the experiments, because Junwei has returned to his country.

Author Response File: Author Response.pdf
