Article
Peer-Review Record

Joint Embedding of Semantic and Statistical Features for Effective Code Search

Appl. Sci. 2022, 12(19), 10002; https://doi.org/10.3390/app121910002
by Xianglong Kong 1,*, Supeng Kong 1, Ming Yu 2 and Chengjie Du 1
Reviewer 1:
Reviewer 2:
Reviewer 3: Anonymous
Reviewer 4:
Submission received: 12 August 2022 / Revised: 26 September 2022 / Accepted: 28 September 2022 / Published: 5 October 2022
(This article belongs to the Special Issue Challenges in Using Machine Learning to Support Software Engineering)

Round 1

Reviewer 1 Report

The paper proposes a joint embedding model of semantic and statistical features to improve the effectiveness of code search. The authors implemented JessCS based on the joint embedding model. The paper is well written and organized.

The authors should justify the proposed deep learning architecture and the setting of hyperparameters.

The experiments are well presented, and the results are useful for answering the stated research questions.

Author Response

Response to Review Comments

Dear editor and reviewer:

Thank you for taking time out of your busy schedule to review our manuscript entitled “Joint Embedding of Semantic and Statistical Features for Effective Code Search”. We really appreciate your constructive remarks and useful suggestions, which have significantly raised the quality of the manuscript and enabled us to improve it. We have uploaded the revision into the system; all the refinements are marked in blue font, and in the following we respond to the reviewers' comments point by point.

 

Comment from Reviewer 1:

Point 1: The authors should justify the proposed deep learning architecture and the setting of hyperparameters.

Response 1: Thank you for your rigorous consideration. We use an open-source deep learning framework, i.e., Keras, in our study. We have analyzed the impacts of two parameters, i.e., code features and embedding vector dimension, in the answer to RQ2. The other hyperparameters are determined according to commonly used settings and our experience. We have added the architecture and settings in Section 3.3 (Page 9-10, Line 326-334), and discussed the possible threats related to hyperparameters in Section 5 (Page 16, Line 487-494).
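For illustration only, a minimal Keras sketch of a joint embedding network of this kind is shown below. The layer choices, vocabulary size, embedding dimension, sequence length, pooling, fusion layer and loss are placeholder assumptions, not the architecture or hyperparameter settings reported in the paper; the sketch only shows where such hyperparameters enter a Keras model.

import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB, EMB_DIM, SEQ_LEN = 10000, 128, 30  # assumed hyperparameters, not the paper's values

def embed_branch(name):
    """Embed a padded token sequence and mean-pool it into one fixed-size vector."""
    inp = layers.Input(shape=(SEQ_LEN,), name=name)
    x = layers.Embedding(VOCAB, EMB_DIM, mask_zero=True)(inp)
    x = layers.GlobalAveragePooling1D()(x)
    return inp, x

name_in, name_vec = embed_branch("method_name")
api_in, api_vec = embed_branch("api_sequence")
tok_in, tok_vec = embed_branch("code_tokens")
qry_in, qry_vec = embed_branch("query_tokens")

# Fuse the heterogeneous code-side features into one joint code vector.
code_vec = layers.Dense(EMB_DIM, activation="tanh")(
    layers.Concatenate()([name_vec, api_vec, tok_vec]))

# Compare the two sides in the shared space via cosine similarity.
similarity = layers.Dot(axes=1, normalize=True)([code_vec, qry_vec])

model = Model([name_in, api_in, tok_in, qry_in], similarity)
model.compile(optimizer="adam", loss="mse")  # placeholder loss; a ranking loss would be used in practice
model.summary()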

 

Once again, we thank you for the time you put in reviewing our manuscript and look forward to meeting your expectations. We hope that the revised manuscript is accepted for publication in Applied Sciences.

Yours sincerely,

Xianglong Kong

Supeng Kong

Ming Yu

Chengjie Du

Reviewer 2 Report

First, I would like to thank the authors and editors for having the opportunity to review a paper dealing with an important subject: deep code search. 

 

The paper focuses on an essential research topic for other subjects, such as bug location and tracking requirements, supporting software development and maintenance. 

 

I am going to point out what are, in my opinion, the main concerns, considering that I suggest another review round (this would be a major revision).

1) The authors have used references from 1997, 2003, 2011, and there are recent papers closely related to this subject, one of them published in ICSE-2018. I suggest a review to allow the comparison with recent results in the literature (up to five years ago). 

 

2) In Section 3.1 the authors mention: "we extract information from the codebase and query descriptions to build the code embedding network and natural language embedding network". To extract information from the source code is ok for me. From codebase might be, but more information is necessary to understand why and what sort of information. Information from the query description is not clear. (lines 125-129)

 

3) Section 3.1 shows a lot of statements that can impact the results. Here, it is necessary to discuss how they affect the results and possible threats to validity.

 

4) In Section 3.2.1: why NNM? What did you consider to choose it?

 

5) In Figure 3 and Figure 6: there are two "Attention". Is that correct? If so, it is not clear what they represent.

 

6) Figures 7 and 8 represent a comparison between different metrics, right? Does that make any sense?

 

7) RQ1 focuses on validity. However, the authors present a comparison between their proposed approach and another one. First, is validity the point? Second, "the model converges when the iteration reaches the 700th round, and then tends to be stable", and UNIF-based 755th: Does the round influence the metrics? 

 

8) "Finding 1" focuses not on validity but effectiveness and accuracy. 

 

9) similarly, RQ2 was not answered.

 

10) Section 5 must be improved, as pointed out.

 

As a minor point to review, in the introduction, "Balachandran et al.... [5]" refers to a single author (no "et al" at all). MRR appears in the abstract and introduction, but the reader only learns what it means in Section 4.3.

 

Author Response

Response to Review Comments

Dear editor and reviewer:

Thank you for taking time out of your busy schedule to review our manuscript entitled “Joint Embedding of Semantic and Statistical Features for Effective Code Search”. We really appreciate your constructive remarks and useful suggestions, which have significantly raised the quality of the manuscript and enabled us to improve it. We have uploaded the revision into the system; all the refinements are marked in blue font, and in the following we respond to the reviewers' comments point by point.

 

Comments from Reviewer 2:

Point 1: The authors have used references from 1997, 2003, 2011, and there are recent papers closely related to this subject, one of them published in ICSE-2018. I suggest a review to allow the comparison with recent results in the literature (up to five years ago).

Response 1: We are grateful for the suggestion and thank you so much for your careful check. We have removed some irrelevant or early references and added some works that focus on deep code search. The UNIF-based approach has been shown to be more effective than DeepCS (ICSE 2018), and this is the main reason we selected the UNIF-based approach in our experiments. We have compared the frameworks of DeepCS, JessCS and the UNIF-based approach in Section 3.3 (Page 10, Line 335-340).

 

Point 2: In Section 3.1 the authors mention: "we extract information from the codebase and query descriptions to build the code embedding network and natural language embedding network". To extract information from the source code is ok for me. From codebase might be, but more information is necessary to understand why and what sort of information. Information from the query description is not clear. (lines 125-129)

Response 2: Thank you for this valuable comment. Both the source code and the query description provide important information for our model. We extract a bag of tokens from the query description by tokenizing the identifiers according to the camel-case naming rule. The tokens are used to mine the relationships between user requirements and code snippets. We have added the details in Section 3 (Page 3, Line 115-118).
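As a small illustration of the camel-case tokenization mentioned above (a simple sketch; the actual tokenizer in our implementation may differ in its details):

import re

def camel_case_tokens(identifier):
    """Split an identifier into lower-cased tokens by camel case and separators."""
    # Break on underscores and other non-alphanumeric separators first.
    parts = re.split(r"[^A-Za-z0-9]+", identifier)
    tokens = []
    for part in parts:
        # Split camelCase / PascalCase boundaries, keeping acronyms together.
        tokens += re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z0-9]*|[a-z0-9]+", part)
    return [t.lower() for t in tokens if t]

print(camel_case_tokens("readFileToString"))    # ['read', 'file', 'to', 'string']
print(camel_case_tokens("HTTPRequestHandler"))  # ['http', 'request', 'handler']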

 

Point 3: Section 3.1 shows a lot of statements that can impact the results. Here, it is necessary to discuss how they affect the results and possible threats to validity.

Response 3: Thank you for this valuable comment. We extract features at the method level and class level to represent code snippets. We have analyzed the impacts of two parameters, i.e., code features and embedding vector dimension, in the answer to RQ2. We have also discussed the possible threats to validity caused by other factors in Section 5 (Page 16, Line 487-494).

 

Point 4: In Section 3.2.1: why NNM? What did you consider to choose it?

Response 4: Thank you for your nice comment. We are inspired by some existing works (e.g., DeepCS and the UNIF-based approach) to apply a deep learning model in our work. The features of method name, API invocation and tokens are not in the same dimension, so they are not easy to use together in traditional methods. A deep learning model can be used to mine the hidden relationships between semantic and statistical features and help us to improve the effectiveness of code search techniques. We have added the detailed reason in Section 3.2.1 (Page 5, Line 198-201).
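As a small illustration of how the joint code vectors and a query vector are compared at search time, a cosine-similarity ranking over precomputed embeddings might look as follows (placeholder random data, not our trained vectors):

import numpy as np

def rank_by_cosine(code_vectors, query_vector, top_n=10):
    """Return the indices and scores of the top_n code vectors most similar to the query."""
    code = code_vectors / np.linalg.norm(code_vectors, axis=1, keepdims=True)
    query = query_vector / np.linalg.norm(query_vector)
    scores = code @ query                 # cosine similarity per code snippet
    order = np.argsort(-scores)[:top_n]   # best matches first
    return order, scores[order]

# Hypothetical usage with random placeholder embeddings.
codebase = np.random.rand(1000, 128)      # 1,000 code snippet vectors
query = np.random.rand(128)               # one query description vector
idx, sims = rank_by_cosine(codebase, query, top_n=5)
print(idx, sims)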

 

Point 5: In Figure 3 and Figure 6: there are two "Attention". Is that correct? If so, it is not clear what they represent.

Response 5: Thank you for pointing out this problem in the manuscript. We have removed one "Attention" in Figure 3 (Page 6) and Figure 6 (Page 9).

 

Point 6: Figures 7 and 8 represent a comparison between different metrics, right? Does that make any sense?

Response 6: Thank you for this valuable comment. Figures 7 and 8 show how the TOP-1 precision and MRR values change as the number of iterations increases. They are used to support the results in Table 2. We compare the effectiveness of JessCS and the UNIF-based approach according to the information in Figure 7 and Figure 8; the focus is a comparison between two models, not between two metrics. The reason we present the TOP-1 precision and MRR values in one figure is that their trends are close, and the similar trends help us to determine the optimal iteration round. We have rewritten the discussion in Section 4.4.1 (Page 12-13, Line 424-429).

 

Point 7: RQ1 focuses on validity. However, the authors present a comparison between their proposed approach and another one. First, is validity the point? Second, "the model converges when the iteration reaches the 700th round, and then tends to be stable", and UNIF-based 755th: Does the round influence the metrics?

Response 7: Thank you for pointing out this problem in the manuscript. Our initial idea of RQ1 focuses on effectiveness instead of validity. We have changed the expression of RQ1 in the revision. The iteration round is determined according to the trends in Figure 7 and Figure 8. It directly impacts the values of TOP-1 precision and MRR at the beginning, and the values become stable after the model has been trained for a certain number of rounds. In our work, the JessCS and UNIF-based models converge at the 775th and 755th rounds, respectively.

 

Point 8: "Finding 1" focuses not on validity but effectiveness and accuracy.

Response 8: We are sorry for this mistake. Our initial idea of RQ1 focuses on effectiveness instead of validity. We have changed the expression of RQ1 and Finding 1 in the revision (Page 13-14).

Point 9: RQ2 was not answered.

Response 9: We totally understand the reviewer's concern. We have reconstructed the contents in RQ2 and Finding 2 (Page 15, Line 473-477).

 

Point 10: Section 5 must be improved, as pointed out.

Response 10: We gratefully appreciate your valuable comment, which helps enrich our manuscript. We have rewritten the possible threats to validity in Section 5 (Page 15-16).

 

Point 11: As a minor point to review, in the introduction, "Balachandran et al.... [5]", it is a single author (no "et al" at all).

Response 11: Thank you for pointing out this problem in the manuscript. We have revised it (Page 1, Line 19-20).

 

Point 12: MRR appears in the abstract and introduction, but the reader knows what it means only in Section 4.3.

Response 12: Thank you for this valuable comment. We have added explanation of MRR in Section 1 (Page 2, Line 40-51).

Once again, we thank you for the time you put in reviewing our manuscript and look forward to meeting your expectations. We hope that the revised manuscript is accepted for publication in Applied Sciences.

Yours sincerely,

Xianglong Kong

Supeng Kong

Ming Yu

Chengjie Du

Reviewer 3 Report

SUMMARY

The paper proposes applying statistical and machine learning methods to deal with the source code search problem, which is a significant and complex problem, although it is also very specific. The problem complexity comes from the fact that source code queries are usually formulated in natural language, with their intuitive informal semantics. In contrast, source code has essentially different (formal) semantics. The paper develops machine learning models and statistical analysis methods to establish an embedding of the semantics of queries formulated in English into the semantics of Java programs. The authors argue that the developed tool is relatively more effective than a third-party tool. The paper requires many improvements to clarify some study premises and decisions, make the reported results more transparent, and improve its presentation.

 

ANALYSIS

 

Thank you for submitting your research for publication. The reported research is interesting and describes a step towards more effective and efficient code search tools. Compared to more general work performed under the text mining label, adopting word vectors to capture the context and sequence of code fragments in a tractable way appears to be a strength of the reported research. Combining machine learning and statistical methods is welcome as an alternative way to deal with code search and its underlying formal semantics. There are, however, many issues related to the study premises and its design decisions, the transparency of the conducted study and presentation aspects that must be treated to ensure a quality publication. These issues are detailed below:

 

a) Some clarification is needed in study premises, its design decisions and research conduct:

 

Figure 4 describes the multiple-layer neural network model adopted in the reported research. It is a well-known problem that multi-layer neural architectures are complex and may introduce cascading precision errors. However, this is not discussed in the paper nor addressed in the threats to validity section. A refined discussion is also needed in Section 4.4.1, where the authors describe model convergence, since readers are left wondering whether or not absolute or relative mean-error analysis could be presented. Both issues must be addressed in the paper.

 

The research questions on Page 10 and in the subsequent sections appear to require some restructuring. Please clarify in RQ1 what you understand by valid in connection to the metrics presented in the following subsection. Moreover, check whether or not the findings are connected to the research questions in the correct order.

 

The research results should be pondered in a more refined way. Please expand the Threats to Validity section to analyze the formulation of the research questions and their depending constructs (internal validity), particularly considering the issues mentioned above, as well as considering alternative paths that could eventually be used to generalize the findings (external validity). Concerning external validity, it would be reasonable to consider comparisons of the developed tool with others apart from the UNIF-based one and extract from these comparisons ways in which the reported findings can be generalized.

 

b) The performed research should be reported in more transparent ways:

 

Some important details are missing in the text. For instance, on line 321, the authors mention the adoption of a large-scale corpus for model training but do not provide details or a characterization. Please mention explicitly and present a description of the adopted corpus therein.

 

In Section 4.3, three indicators are introduced as commonly used in the field of information retrieval. Still, just a single reference to a paper written by one of the authors is provided. Please mention therein the primary references that justify this statement. Moreover, in subsection 4.3.2, please provide more information that explains setting N = 1.

 

Inconsistencies should be removed from the text. For instance, the reference to the TOP-1 Precision and MRR indicators on line 53 should be checked against similar data in the Abstract. The distinctions among text, sequence, words and source code should be clarified at the end of Section 2.1.

 

c) Many improvements are required in the paper presentation:

 

The Introduction must be rewritten to provide essential information regarding the studied problem and its context, and to introduce the methods adopted and the solution proposed by the authors. Please briefly describe the code search problem intuitively (as it appears at the beginning of Section 2), define what you mean by semantic and statistical features and introduce joint embeddings in the first three paragraphs, respectively. In doing so, correlate your work with more general text mining techniques. On the other hand, the discussion concerning the selection of code search methods based on their granularity can be moved to different sections.

 

An early description of the joint embedding technique is required in the text. Perhaps the authors could consider the presentation of Figure 5 and its subsequent paragraph already in Section 2.2 to facilitate readers' understanding.

 

Examples are missing in many places to illustrate definitions and clarify research development. In Section 3.1.1, an example is given in the first bullet, but they should also be provided in the second and third bullet to clarify the development reported therein. Again in Section 4.4.1, the authors mention high-frequency query sets, but examples of such queries are required for understanding subsequent tables and findings. 

 

DETAILS: 

Below, each text between [] contains a suggestion of exclusion and text between {} contains an inclusion suggestion 

Throughout the whole text: Use Java,  not JAVA;

Line 3: Semantic and statistical information [has]{have} hidden relationships;

Line 18: Balachandran et al conduct{ed} instance queries;

Line 41: Please clarify which Java project source code was downloaded from GitHub and BigQuery. Are they those rated with two stars from 2016 to 2019?

Line 67: Section 6 does not present background information;

Line 330: Figure 6 cannot describe at the same time model structure and training;

Line 363: Maybe the reference that "statements not closely related to code functions" can be extracted out of the enumeration;

Line 376: It appears that item (g) should be presented as an additional bullet;

Line 389: Please clarify what you mean by a functional function;

Line 485: In the conclusion, the second paragraph needs some restructuring/rewriting;

Figures 7 and 8 are misplaced

 

Author Response

Response to Review Comments

Dear editor and reviewer:

Thank you for taking time out of your busy schedule to review our manuscript entitled “Joint Embedding of Semantic and Statistical Features for Effective Code Search”. We really appreciate your constructive remarks and useful suggestions, which have significantly raised the quality of the manuscript and enabled us to improve it. We have uploaded the revision into the system; all the refinements are marked in blue font, and in the following we respond to the reviewers' comments point by point.

 

Comments from Reviewer 3:

Point 1: Some clarification is needed in study premises, its design decisions and research conduct: Figure 4 describes the multiple-layer neural network model adopted in the reported research. It is a well-known problem that multi-layer neural architectures are complex and may introduce cascading precision errors. However, this is not discussed in the paper nor addressed in the threats to validity section. A refined discussion is also needed in Section 4.4.1, where the authors describe model convergence, since readers are left wondering whether or not absolute or relative mean-error analysis could be presented. Both issues must be addressed in the paper.

Response 1: Thank you for this valuable comment. We have addressed cascading errors and mean-error analysis in Section 5 (Page 15-16). We have also refined the discussion in Section 4.4.1 (Page 12-14); model convergence is visualized in terms of the TOP-1 precision and MRR values in Figures 7 and 8. Both metrics show that the selection of the iteration round is reasonable. For example, the TOP-1 precision and MRR of JessCS become stable after the 700th round and reach their optimal values at the 775th round.
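To illustrate how such a convergence point can be read off the curves in Figures 7 and 8, a simple plateau check over recorded per-round metric values could look like the following (a sketch with a synthetic curve, not our actual training log):

def converged_round(metric_per_round, window=50, tolerance=1e-4):
    """Return the round after which the metric no longer improves by more than
    `tolerance` for `window` consecutive rounds, or None. Rounds are 1-based."""
    best = float("-inf")
    stable_since = None
    for i, value in enumerate(metric_per_round, start=1):
        if value > best + tolerance:
            best = value
            stable_since = i
        if stable_since is not None and i - stable_since + 1 >= window:
            return stable_since
    return None

# Hypothetical MRR curve that keeps improving until round 700, then flattens.
curve = [0.5 + 0.2 * min(r, 700) / 700 for r in range(1, 801)]
print(converged_round(curve))  # 700 for this synthetic curve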

 

Point 2: The research questions on Page 10 and in the subsequent sections appear to require some restructuring. Please clarify in RQ1 what you understand by valid in connection to the metrics presented in the following subsection. Moreover, check whether or not the findings are connected to the research questions in the correct order.

Response 2: Thank you for pointing out this problem in the manuscript. Our initial idea of RQ1 focuses on effectiveness instead of validity. We have changed the expression of RQ1 and RQ2 in the revision (Page 10, Line 346-352). The two findings and supporting contents have also been revised according to your comments (Page 12-14).

 

Point 3: The research results should be pondered in a more refined way. Please expand the Threads to Validity section to analyze the formulation of the research questions and their depending constructs (internal validity), particularly considering the issues mentioned above, as well as considering alternative paths that could eventually be used to generalize the findings (external validity). Concerning external validity, it would be reasonable to consider comparisons of the developed tool with others apart from UNIF-based and extract from these comparisons ways in which the reported findings can be generalized.

Response 3: Thank you for this valuable comment. We have completely rewritten the possible threats to validity in Section 5 (Page 15-16). We have discussed the influencing factors of effectiveness, formulation, cascading errors, and some other threats. We also present the reason we selected the UNIF-based approach in our experiments: it has been shown to be more effective than DeepCS, and its framework is similar to ours.

 

Point 4: The performed research should be reported in more transparent ways: Some important details are missing in the text. For instance, on line 321, the authors mention the adoption of a large-scale corpus for model training but do not provide details or a characterization. Please mention explicitly and present a description of the adopted corpus therein.

Response 4: We are grateful for the suggestion. The corpus was initially collected by Google and is widely used in the software engineering area. We collected some projects from the initial corpus by building a filter with several conditions (a-f, in Section 4.2). The filter is implemented by generating an SQL expression for Google BigQuery. We have added some details of the corpus in Section 1 and Section 4.2.

  

Point 5: In Section 4.3, three indicators are introduced as commonly used in the field of information retrieval. Still, just a single reference to a paper written by one of the authors is provided. Please mention therein the primary references that justify this statement. Moreover, in subsection 4.3.2, please provide more information that explains setting N = 1.

Response 5: Thank you for this valuable comment. We have added some references in Sections 4.3.1, 4.3.2, and 4.3.3 to show the universality of the selected metrics. In the standard calculation of the TOP-N metric, ranking a relevant result 1st or Nth has the same impact on the precision. However, developers mainly focus on the first result in real development, so we set N = 1 in our experiments. We have added the detailed reason and a reference in the revision (Page 11, Line 404-406).
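For clarity, the two metrics can be computed from the rank of the first relevant result per query, as in the following sketch (standard definitions with hypothetical ranks, not our experimental data):

def top_n_precision(first_relevant_ranks, n=1):
    """Fraction of queries whose first relevant result is ranked within the top n.
    Ranks are 1-based; None means no relevant result was returned."""
    hits = sum(1 for r in first_relevant_ranks if r is not None and r <= n)
    return hits / len(first_relevant_ranks)

def mean_reciprocal_rank(first_relevant_ranks):
    """Average of 1/rank of the first relevant result (0 if none was found)."""
    rr = [1.0 / r if r is not None else 0.0 for r in first_relevant_ranks]
    return sum(rr) / len(rr)

# Hypothetical ranks of the first correct snippet for five queries.
ranks = [1, 3, None, 2, 1]
print(top_n_precision(ranks, n=1))   # 0.4  -> TOP-1 precision
print(mean_reciprocal_rank(ranks))   # (1 + 1/3 + 0 + 1/2 + 1) / 5 ≈ 0.567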

 

Point 6: Inconsistencies should be removed from the text. For instance, the reference to the TOP-1 Precision and MRR indicators on line 53 should be checked against similar data in the Abstract. The distinctions among text, sequence, words and source code should be clarified at the end of Section 2.1.

Response 6: We are grateful for the suggestion. We have improved the consistency of expressions and discussed the distinctions in the revision.

 

Point 7: Many improvements are required in the paper presentation: The Introduction must be rewritten to provide essential information regarding the studied problem and its context, and introduce the methods adopted and the solution proposed by the authors. Please briefly describe the code search problem intuitively (as it appears at the beginning of Section 2), define what you mean by semantic and statistical features and introduce joint embeddings in the first three paragraphs, respectively. In doing so, correlate your work with more general text mining techniques. On the other hand, the discussion concerning the selection of code search methods based on their granularity can be moved to different sections.

Response 7: Thank you for this valuable comment. We have refined the Introduction, described the key points of the code search problem and the potential disadvantages of existing works. We also briefly presented the technical process of JessCS in Section 1 and showed the basic features of semantic and statistical information (Page 1-2).

 

Point 8: An early description of the joint embedding technique is required in the text. Perhaps the authors could consider the presentation of Figure 5 and its subsequent paragraph already in Section 2.2 to facilitate readers' understanding.

Response 8: Thank you for this valuable comment. We have carefully refined the contents in Section 2.2 (Page 2-3) and Section 3.2.3 (Page 8-9), to make them more understandable for readers.

 

Point 9: Examples are missing in many places to illustrate definitions and clarify research development. In Section 3.1.1, an example is given in the first bullet, but they should also be provided in the second and third bullet to clarify the development reported therein. Again in Section 4.4.1, the authors mention high-frequency query sets, but examples of such queries are required for understanding subsequent tables and findings.

Response 9: We are grateful for the suggestion. We have added some examples for the API invocation and token collection in Section 3.1.1 (Page 4-5, Line 143-159), and added the collected query descriptions in Table 2 (Page 13).

 

Point 10: Some detailed comments. Throughout the whole text: Use Java, not JAVA;

Line 3: Semantic and statistical information [has]{have} hidden relationships;

Line 18: Balachandran et al conduct{ed} instance queries;

Line 41: Please clarify which Java project source code was downloaded from GitHub and BigQuery. Are they those rated with two stars from 2016 to 2019?

Line 67: Section 6 does not present background information;

Line 330: Figure 6 cannot describe at the same time model structure and training;

Line 363: Maybe the reference that "statements not closely related to code functions" can be extracted out of the enumeration;

Line 376: It appears that item (g) should be presented as an additional bullet;

Line 389: Please clarify what you mean by a functional function;

Line 485: In the conclusion, the second paragraph needs some restructuring/rewriting;

Figures 7 and 8 are misplaced.

Response 10: We are grateful for the suggestions and addressed all of them in the revision. We have modified the expression of Java and some other typos. We described the process of codebase construction in Section 4.2. We carefully refined the contents in Abstract and Conclusion. We also revised the location of Figures 7 and 8 in Section 4.4.1.

Once again, we thank you for the time you put in reviewing our manuscript and look forward to meeting your expectations. We hope that the revised manuscript is accepted for publication in Applied Sciences.

Yours sincerely,

Xianglong Kong

Supeng Kong

Ming Yu

Chengjie Du

Reviewer 4 Report

The author implemented a code search engine using the joint embedding model. Overall, the topic of the paper and the supporting experimental results are well presented. 

Author Response

Response to Review Comments

Dear editor and reviewer:

Thank you for taking time out of your busy schedule to review our manuscript entitled “Joint Embedding of Semantic and Statistical Features for Effective Code Search”. We really appreciate your constructive remarks and useful suggestions, which have significantly raised the quality of the manuscript and enabled us to improve it. We have uploaded the revision into the system; all the refinements are marked in blue font, and in the following we respond to the reviewers' comments point by point.

 

Comment from Reviewer 4:

Point 1: The author implemented a code search engine using the joint embedding model. Overall, the topic of the paper and the supporting experimental results are well presented.

Response 1: We gratefully thank the reviewer for the precious time spent making constructive remarks. We have carefully refined the manuscript according to the useful suggestions.

 

Once again, we thank you for the time you put in reviewing our manuscript and look forward to meeting your expectations. We hope that the revised manuscript is accepted for publication in Applied Sciences.

Yours sincerely,

Xianglong Kong

Supeng Kong

Ming Yu

Chengjie Du

Round 2

Reviewer 3 Report

Thank you for producing a revised version of your paper.

 

The research premises, design decisions and research conduct have been clarified. However, there are still important aspects that should be addressed in the paper. 

 

Concerning the hypotheses and findings, from line 475 onwards, the authors claim that "The joint embedding of method name and API invocation can significantly improve the effectiveness of JessCS". However, the formulation of the finding is still unclear since no significance levels, p-values or standard errors are presented in the paper. It is essential to cast the finding relatively, and, whether it has been statistically determined or not, this should be made explicit in the paper.

 

Moreover, it is the opinion of this referee that, even though the reported experiment is based on a publicly available tool (Keras) and code collected by Google, the characterization of the corpus that supports model training should be explicitly presented in the paper. This can be done in many ways, for example, by presenting the respective statistical profile. Anyhow, the authors should ensure the inclusion in the paper of sufficient details to enable research repeatability.

 

The presentation of the research results has been substantially improved. Nevertheless, the following details should receive the authors' attention:

 

Line 25: "The existing works already prove that either statistical features [8] or semantic features [9–11] can be used to build a code search model."

-> Maybe you mean that existing works provide (strong or weak statistically significant) evidence of this?

 

Line 42:

"The codebase is conducted based on a project filter"

-> Do you mean codebase search?

Author Response

Dear editor and reviewer:

Thank you for taking time out of your busy schedule to review our manuscript for the second round. We really appreciate your constructive remarks and useful suggestions, which have significantly raised the quality of the manuscript and enabled us to improve it. We have uploaded the revision into the system; all the refinements are marked in blue font, and in the following we respond to the reviewer's comments point by point.

 

Comment from Reviewer 3:

Point 1: Concerning the hypotheses and findings, from line 475 onwards, the authors claim that "The joint embedding of method name and API invocation can significantly improve the effectiveness of JessCS". However, the formulation of the finding is still unclear since no significance levels, p-values or standard errors are presented in the paper. It is essential to cast the finding relatively, and, whether it has been statistically determined or not, this should be made explicit in the paper.

Response 1: Thank you for your rigorous consideration. We have added significance levels, p-values and standard errors in Section 4.3 (Line 436-442), and a discussion of these statistical indicators in Section 4.4.1 (Line 460-466). The results show that the TOP-1 precision and MRR values are statistically determined.
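As an illustration of how such a paired significance test and standard error can be computed over per-query results, a sketch is shown below; the scores are hypothetical and the Wilcoxon signed-rank test is only one common option, not necessarily the exact procedure reported in the revision.

from scipy import stats
import statistics

# Hypothetical per-query scores for the two compared models on the same queries.
jesscs_scores = [0.71, 0.64, 0.80, 0.55, 0.69, 0.73, 0.62, 0.77]
unif_scores   = [0.65, 0.60, 0.74, 0.52, 0.66, 0.70, 0.58, 0.71]

# Paired non-parametric significance test over the shared query set.
stat, p_value = stats.wilcoxon(jesscs_scores, unif_scores)
print(f"Wilcoxon statistic = {stat:.3f}, p-value = {p_value:.4f}")

# Standard error of the per-query score differences, for reporting alongside p.
diffs = [a - b for a, b in zip(jesscs_scores, unif_scores)]
std_err = statistics.stdev(diffs) / len(diffs) ** 0.5
print(f"mean difference = {statistics.mean(diffs):.4f} ± {std_err:.4f} (SE)")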

 

Point 2: Moreover, it is the opinion of this referee that, even though the reported experiment is based on a publically available tool (Keras) and code collected by Google, the characterization of the corpus that supports model training should be explicitly presented in the paper. This can be done in many ways, for example, by presenting the respective statistical profile. Anyhow, authors should ensure the inclusion in the paper of sufficient details to enable research repeatability.

Response 2: Thank you for this valuable comment. We have added the SQL statement used in code extraction in Figure 7 (Page 11, Line 382-404). 63,015 Java projects can be extracted based on this SQL statement through Google BigQuery. Then we applied the filter to the extracted 7,190,099 Java files to collect the 1,263,974 sets of code snippets and corresponding functional descriptions. The construction of the codebase is repeatable according to the steps in Section 4.2 (Line 358-404).
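For readers who want a feel for this extraction step, the sketch below shows the general shape of such a query against the public GitHub snapshot hosted on BigQuery; the table choice, path filter and content check are placeholder assumptions, not the actual SQL statement given in Figure 7 or the conditions (a-f) of our filter.

from google.cloud import bigquery

client = bigquery.Client()  # requires configured Google Cloud credentials

sql = """
SELECT f.repo_name, f.path, c.content
FROM `bigquery-public-data.github_repos.files` AS f
JOIN `bigquery-public-data.github_repos.contents` AS c
  ON f.id = c.id
WHERE f.path LIKE '%.java'
LIMIT 1000
"""

def looks_usable(java_source):
    """Placeholder for the real filter: keep files containing a documented public method."""
    return "/**" in java_source and "public " in java_source

rows = client.query(sql).result()
snippets = [(r.repo_name, r.path, r.content)
            for r in rows if r.content and looks_usable(r.content)]
print(f"kept {len(snippets)} candidate files")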

 

Point 3: Line 25: "The existing works already prove that either statistical features [8] or semantic features [9–11] can be used to build a code search model." Maybe you mean that existing works provide (strong or weak statistically significant) evidence of this?

Response 3: Thank you for this valuable comment. The existing works prove that they can obtain effective code search results based on their experiments. The training data used in their experiments are extracted from either statistical features or semantic features. They have not provided statistically significant evidence in their work.

 

Point 4: Line 42: "The codebase is conducted based on a project filter". Do you mean codebase search?

Response 4: We totally understand the reviewer's concern. It is not a codebase search. We extracted 63,015 Java projects based on the SQL statement through Google BigQuery. There were 7,190,099 Java files in these projects. Then we implemented a filter (Line 358-404) to collect the usable code snippets from the extracted Java files. The codebase is built from the collected 1,263,974 sets of code snippets and corresponding functional descriptions (code comments).

 

Once again, we thank you for the time you put in reviewing our manuscript and look forward to meeting your expectations. We hope that the revised manuscript is accepted for publication in Applied Sciences.

Yours sincerely,

Xianglong Kong

Supeng Kong

Ming Yu

Chengjie Du
