Automatic Code Review by Learning the Structure Information of Code Graph
Abstract
1. Introduction
2. Related Works
2.1. Code Feature Representation
2.2. Code Review
3. Background
3.1. Program Dependence Graph (PDG)
3.2. Minimum DFS Encoding
4. The Algorithm Model
4.1. Problem Description
4.2. PDG2Seq Algorithm
| Algorithm 1 Program dependence graph serialization algorithm PDG2Seq |
|---|
| Input: graph G with node set V and edge set E |
| Output: minimum graph serialization encoding seq |
| 1: initialize seq = (), i = 0, currentNode = v0; |
| 2: for v in V do |
| 3: v.visited = −1; |
| 4: end for |
| 5: DFSSearch (G, currentNode, seq); |
| 6: for e in E do |
| 7: find the insertion position in seq according to the rule, then insert the backtracking edge e into seq; |
| 8: end for |
| 9: return seq |
| 10: Subprocedure 1 DFSSearch (G, currentNode, seq) |
| 11: currentNode.visited = i; |
| 12: for nei in currentNode.neighbors do |
| 13: array = (); |
| 14: for e in E do |
| 15: if e.begin == currentNode then |
| 16: array.add (e); |
| 17: end if |
| 18: end for |
| 19: e = min (Sort (array)); |
| 20: if q.visited == −1 then |
| 21: i += 1; |
| 22: E.remove (e); |
| 23: seq.add (e); |
| 24: DFSSearch (G, q, seq); |
| 25: end if |
| 26: end for |
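
The serialization in Algorithm 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: edges are 5-tuples (begin node, end node, begin label, edge label, end label), "smallest edge" is taken to mean plain tuple order, `q` is read as the end node of the chosen edge, and the insertion rule for backtracking edges (not recoverable from the pseudocode) is replaced by a simple stand-in that places each leftover edge right after the last edge touching its begin node. The function name `pdg2seq` and the six-node example graph are hypothetical.

```python
from typing import Dict, List, Tuple

# (begin node, end node, begin label, edge label, end label)
Edge = Tuple[str, str, str, str, str]


def pdg2seq(nodes: List[str], edges: List[Edge], start: str) -> List[Edge]:
    """Serialize a directed graph into an edge sequence with a smallest-edge-first DFS."""
    visited: Dict[str, int] = {v: -1 for v in nodes}  # -1 means "not visited yet"
    remaining: List[Edge] = list(edges)               # edges not yet placed in seq
    seq: List[Edge] = []
    counter = 0

    def dfs(current: str) -> None:
        nonlocal counter
        visited[current] = counter
        # Try the outgoing edges of `current`, smallest first (assumed: tuple order).
        for e in sorted(x for x in remaining if x[0] == current):
            target = e[1]  # assumed meaning of `q` in the pseudocode
            if visited[target] == -1:
                counter += 1
                remaining.remove(e)
                seq.append(e)
                dfs(target)

    dfs(start)

    # Assumed insertion rule: place each leftover (backtracking) edge right after
    # the last edge in seq that starts at, or arrives at, its begin node.
    for e in sorted(remaining, key=lambda x: (visited[x[0]], visited[x[1]])):
        pos = 0
        for k, f in enumerate(seq):
            if f[0] == e[0] or f[1] == e[0]:
                pos = k + 1
        seq.insert(pos, e)
    return seq


# Hypothetical six-node example graph.
example_nodes = ["v0", "v1", "v2", "v3", "v4", "v5"]
example_edges = [
    ("v0", "v1", "a", "0", "b"),
    ("v0", "v5", "a", "0", "c"),
    ("v1", "v2", "b", "n", "d"),
    ("v2", "v3", "d", "0", "e"),
    ("v2", "v4", "d", "0", "f"),
    ("v4", "v2", "f", "i", "d"),
    ("v4", "v3", "f", "i", "e"),
    ("v5", "v2", "c", "0", "d"),
]
for edge in pdg2seq(example_nodes, example_edges, "v0"):
    print(edge)
```

The DFS pass alone yields the forward edges in discovery order; the second loop then weaves the remaining edges back in, so every edge of the graph appears exactly once in the output sequence.
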
4.3. crBERT Model
5. Experiment and Analysis
5.1. Description of Experimental Dataset
5.2. Evaluation Metrics
5.3. Experimental Parameter Settings
5.4. Experimental Results and Analysis
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest

| Edge | (b) | (c) | (d) |
|---|---|---|---|
| 0 | (v0,v1,A,a,B) | (v0,v1,B,a,A) | (v0,v1,B,d,A) | 
| 1 | (v1,v2,B,d,A) | (v1,v2,A,a,C) | (v1,v2,A,c,C) | 
| 2 | (v2,v3,A,c,C) | (v2,v0,C,b,B) | (v2,v0,C,b,B) | 
| 3 | (v3,v0,C,a,A) | (v2,v3,C,c,A) | (v2,v3,C,a,A) | 
| 4 | (v3,v1,C,b,B) | (v3,v0,A,d,B) | (v3,v0,A,d,B) | 
| 5 | (v2,v4,A,b,D) | (v3,v4,A,b,D) | (v1,v4,A,b,D) | 

| Edge | Minimum Forward Edge Coding | Minimum Graph Sequence Coding |
|---|---|---|
| 0 | (v0,v1,a,0,b) | (v0,v1,a,0,b) |
| 1 | (v1,v2,b,n,d) | (v1,v2,b,n,d) |
| 2 | (v2,v3,d,0,e) | (v2,v3,d,0,e) |
| 3 | (v2,v4,d,0,f) | (v2,v4,d,0,f) |
| 4 | (v0,v5,a,0,c) | (v4,v2,f,i,d) |
| 5 | | (v4,v3,f,i,e) |
| 6 | | (v0,v5,a,0,c) |
| 7 | | (v5,v2,c,0,d) |

| Model | Beam1 | Beam3 | Beam5 | Beam10 |
|---|---|---|---|---|
| 1-encoder | 0.692 | 0.769 | 0.791 | 0.814 |
| crBERT1 | 0.78 | 0.83 | 0.846 | 0.862 |

| Model | Levenshtein Distance | ROUGE-L |
|---|---|---|
| 1-encoder | 0.254 | 0.905 |
| crBERT1 | 0.183 | 0.927 |
| 2-encoders | 0.202 | 0.926 |
| crBERT2 | 0.167 | 0.935 |
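
The two metrics in the table above can be illustrated with a short Python sketch. It rests on assumptions that this text does not confirm: both metrics are computed over token sequences, the Levenshtein distance is normalized by the length of the longer sequence (which would explain values between 0 and 1), and ROUGE-L is reported as the balanced F-score of LCS-based precision and recall. All function names are hypothetical.

```python
from typing import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance (insert, delete, substitute) between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,              # delete x
                curr[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),   # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def normalized_levenshtein(a: Sequence[str], b: Sequence[str]) -> float:
    """Edit distance divided by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: Sequence[str], reference: Sequence[str]) -> float:
    """Balanced F-score of LCS precision and recall (assumed ROUGE-L variant)."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical revised-code tokens: the prediction differs in one operator.
reference = "int x = a + b ;".split()
candidate = "int x = a - b ;".split()
print(normalized_levenshtein(candidate, reference))
print(rouge_l_f1(candidate, reference))
```

Under these assumptions a lower Levenshtein value and a higher ROUGE-L value both indicate that the generated code is closer to the reference revision.
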

| Model | Task1 | Task2 |
|---|---|---|
| seq2seq | 0.767 | 0.775 |
| seq2seq+gs | 0.773 | 0.777 |
| Transformer | 0.692 | 0.763 |
| Transformer+gs | 0.753 | 0.772 |
| CodeBERT | 0.769 | 0.789 |
| CodeBERT+gs | 0.78 | 0.799 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yin, Y.; Zhao, Y.; Sun, Y.; Chen, C. Automatic Code Review by Learning the Structure Information of Code Graph. Sensors 2023, 23, 2551. https://doi.org/10.3390/s23052551