RioCC: Efficient and Accurate Class-Level Code Recommendation Based on Deep Code Clone Detection
Abstract
1. Introduction
- We introduce a class-level code recommendation framework that bridges the gap between clone detection and practical recommendation. Unlike method-level approaches, it retrieves relevant yet structurally diverse code snippets, effectively enabling query expansion and providing developers with a broader and more useful set of recommendations.
- We employ a deep forest model to enhance representation learning for clone detection. This model captures both context and structure, improving the detection of complex (Type-3 and Type-4) clones.
- We integrate a quick search module based on matrix computations to efficiently filter out irrelevant candidates, significantly reducing the time complexity of the recommendation process.
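As a rough illustration of the quick search idea in the third contribution, the sketch below scores every candidate in a pool against a query with one matrix-vector product and keeps only the top-k. The vector representation, the cosine scoring, and the names `quick_search`, `pool`, and `k` are assumptions made for illustration; the text states only that the module relies on matrix computations to filter candidates.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def quick_search(query, pool, k=2):
    """Return indices of the k pool rows most similar to the query.

    Assumption: each code fragment is already embedded as a numeric vector.
    With unit-length rows, the matrix-vector product of the pool and the
    query gives the cosine similarity of every candidate in one pass.
    """
    q = normalize(query)
    matrix = [normalize(row) for row in pool]
    scores = [sum(a * b for a, b in zip(row, q)) for row in matrix]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

# Hypothetical 3-dimensional embeddings for four candidate classes.
pool = [
    [1.0, 0.0, 0.0],   # candidate 0
    [0.9, 0.1, 0.0],   # candidate 1
    [0.0, 1.0, 0.0],   # candidate 2
    [0.0, 0.0, 1.0],   # candidate 3
]
survivors = quick_search([0.8, 0.2, 0.0], pool, k=2)
print(survivors)
```

Only the surviving indices would then be passed to the more expensive clone detection step, which is where the time saving would come from under this reading.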
2. Approach
2.1. Overview
2.2. Quick Search
2.3. Clone Detection
- Type-1 (Textual Similarity): Two code fragments are identical except for differences in spaces, comments, and layout. This type is also referred to as an “exact clone”.
- Type-2 (Lexical or Token-Based Similarity): These clones differ in identifier names, variable names, type names, and function names but retain the same structure. This category is also known as a “renamed/parameterized clone”.
- Type-3 (Syntactic Similarity): Code fragments exhibit insertions or deletions of statements while still maintaining similar syntactic structures. Additionally, differences may exist in identifiers, types, spaces, layout, and comments. This type is also referred to as a “near-miss clone” or “gapped clone”.
- Type-4 (Semantic Similarity): Two code fragments are syntactically dissimilar but functionally equivalent. This type is also known as a “semantic clone”.
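The first two definitions can be made concrete with a small normalization sketch: stripping comments and layout exposes Type-1 clones, and additionally replacing identifier, type, and function names with a placeholder exposes Type-2 clones. The regex tokenizer, the short control-keyword list, and the function names are illustrative assumptions, not the detection method used by RioCC. Type-3 and Type-4 pairs need a learned or structural similarity measure, so this sketch only reports them as unresolved.

```python
import re

# Control keywords kept verbatim; every other word (including type names)
# is treated as renameable. This short list is an illustrative assumption.
CONTROL = {"return", "if", "else", "for", "while"}
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")

def tokens(code):
    """Drop // and /* */ comments, then split into word, number, and symbol tokens."""
    code = re.sub(r"//[^\n]*|/\*.*?\*/", " ", code, flags=re.S)
    return TOKEN.findall(code)

def abstract(toks):
    """Replace every non-control word with the placeholder ID."""
    return ["ID" if re.fullmatch(r"[A-Za-z_]\w*", t) and t not in CONTROL else t
            for t in toks]

def clone_type(a, b):
    ta, tb = tokens(a), tokens(b)
    if ta == tb:
        return "Type-1"          # identical up to layout and comments
    if abstract(ta) == abstract(tb):
        return "Type-2"          # identical up to renaming
    return "unresolved"          # Type-3, Type-4, or not a clone

a = "int sum(int a, int b) { return a + b; }"
b = """int sum(int a, int b) {
    // add two numbers
    return a + b;
}"""
c = "long add(long x, long y) { return x + y; }"
d = "int sum(int a, int b) { int r = a; r += b; return r; }"

print(clone_type(a, b), clone_type(a, c), clone_type(a, d))
```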
2.3.1. Data Preprocessing
2.3.2. GcForest Building
- (1) Multi-Grained Scanning
- (2) Cascade Forest
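The multi-grained scanning step can be sketched as a sliding window over a raw feature vector, following the general deep forest design of Zhou and Feng listed in the references. The window sizes, the stride, and the function names below are illustrative assumptions. In gcForest, each window instance is then passed to forests whose class-probability outputs are concatenated and fed to the cascade; that part is omitted here.

```python
def sliding_windows(features, window, stride=1):
    """Return every contiguous sub-vector of length `window`, stepping by `stride`."""
    if window > len(features):
        return []
    return [features[i:i + window]
            for i in range(0, len(features) - window + 1, stride)]

def multi_grained_scan(features, window_sizes):
    """Scan one feature vector at several granularities (one entry per window size)."""
    return {w: sliding_windows(features, w) for w in window_sizes}

# Hypothetical 8-dimensional feature vector for one code fragment.
raw = [0.1, 0.4, 0.3, 0.9, 0.5, 0.2, 0.7, 0.6]
scanned = multi_grained_scan(raw, window_sizes=[2, 4])
for w, instances in scanned.items():
    print(w, len(instances))
```

A d-dimensional vector scanned with window size w and stride 1 yields d - w + 1 instances, so the 8-dimensional example gives 7 instances for w = 2 and 5 for w = 4.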
2.4. Recommendation Presentation
3. Experimental Setup
3.1. Research Questions
- RQ1: How does RioCC perform in detecting clone pairs compared to state-of-the-art methods?
- RQ2: How well does RioCC recommend real-world code fragments?
- RQ3: What is the time consumption of RioCC for code recommendation?
3.2. Subjects
3.2.1. Dataset for Code Clone
- NT1: T1 and T2 clones.
- NT2: VST3 and ST3 clones.
- NT3: MT3 and WT3/4 clones.
- NT4: Non-clone pairs.
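The grouping above amounts to a lookup from a clone-type label to one of the four classes. The sketch below encodes that mapping; the label strings and the use of `None` for a non-clone pair are assumptions about how labels might be stored, not a format given in the text.

```python
# Clone-type label -> NT class, as listed in the grouping above.
NT_CLASSES = {
    "T1": "NT1", "T2": "NT1",
    "VST3": "NT2", "ST3": "NT2",
    "MT3": "NT3", "WT3/4": "NT3",
}

def to_nt_class(clone_label):
    """Map a clone-type label to its NT class; None (no clone relation) maps to NT4."""
    if clone_label is None:
        return "NT4"
    return NT_CLASSES[clone_label]

pairs = ["T1", "ST3", "WT3/4", None, "T2"]
labels = [to_nt_class(p) for p in pairs]
print(labels)
```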
3.2.2. Dataset for Code Pool
3.3. Metrics and Baseline
3.3.1. Metrics
3.3.2. Baseline
- CCLearner: Extracts tokens from source code clones to train a DNN model for classification.
- Oreo: Employs a Siamese neural network to train the clone detection model.
- RSharer: Uses a CNN for the classification task.
3.4. Experimental Setting
4. Experimental Results
4.1. Performance of RioCC in Clone Pair Detection (RQ1)
4.1.1. Clone Detection on BigCloneBench (RQ1-1)
4.1.2. Clone Detection on the Code Pool (RQ1-2)
4.2. The Performance of RioCC in Recommending Real-World Code Fragments (RQ2)
4.3. Time Consumption of RioCC for Code Recommendation (RQ3)
5. Limitations and Validity Concerns
5.1. Limited Dataset
5.2. Manual Evaluation Bias
5.3. Limited Availability
6. Discussion
- Advantages of RioCC: RioCC excels in program recommendation by leveraging structured feature extraction, multi-grained scanning, and gcForest-based classification. Unlike deep learning models that require extensive labeled data, RioCC benefits from an ensemble-based learning approach, reducing dependence on large-scale pretraining. Additionally, the quick search module significantly improves efficiency by narrowing down the search space, making RioCC highly scalable for large codebases. These characteristics ensure that RioCC provides precise and computationally efficient code recommendations, particularly for structured clone detection tasks.
- Advantages of LLM-based Approaches: Recent advancements in large language models (LLMs), such as GPT [34] and CodeBERT [21], have revolutionized program recommendation by capturing deep contextual and semantic relationships in source code [35,36,37]. LLMs can generate meaningful recommendations even for unseen code structures, generalizing well across different programming paradigms. Their ability to learn from vast corpora enables them to recommend code fragments that align with developers’ intent, making them particularly effective in open-ended, generative tasks like code synthesis, refactoring suggestions, and intent-driven search.
- Scenarios Where RioCC Remains Advantageous: Despite the strengths of LLMs, RioCC remains highly effective in specific scenarios. In structured clone detection tasks where precise similarity measurement is crucial, RioCC offers deterministic and explainable results, which LLMs may struggle with due to their probabilistic nature. Additionally, for domains requiring strict control over training data and interpretability—such as safety-critical software or enterprise applications—RioCC’s structured, feature-driven approach remains preferable. Moreover, RioCC is computationally lightweight compared to LLMs, making it more suitable for real-time recommendation tasks with limited computing resources.
7. Related Work
7.1. Code Recommendation
7.2. Code Clone Detection
7.3. LLM-Based Code Engineering
8. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Krueger, C.W. Software Reuse. ACM Comput. Surv. 1992, 24, 131–183.
- Luan, S.; Yang, D.; Barnaby, C.; Sen, K.; Chandra, S. Aroma: Code Recommendation via Structural Code Search. Proc. ACM Program. Lang. 2019, 3, 1–28.
- Kim, K.; Kim, D.; Bissyandé, T.F.; Choi, E.; Li, L.; Klein, J.; Le Traon, Y. FaCoY: A Code-to-Code Search Engine. In Proceedings of the 40th International Conference on Software Engineering; ACM: New York, NY, USA, 2018; pp. 946–957.
- Krugler, K. Krugle Code Search Architecture. In Finding Source Code on the Web for Remix and Reuse; Springer: Berlin/Heidelberg, Germany, 2013; pp. 103–120.
- Chan, W.-K.; Cheng, H.; Lo, D. Searching Connected API Subgraph via Text Phrases. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering; ACM: New York, NY, USA, 2012; pp. 1–11.
- Martie, L.; LaToza, T.D.; van der Hoek, A. CodeExchange: Supporting Reformulation of Internet-Scale Code Queries in Context. In Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE); IEEE: Piscataway, NJ, USA, 2015; pp. 24–35.
- Sachdev, S.; Li, H.; Luan, S.; Kim, S.; Sen, K.; Chandra, S. Retrieval on Source Code: A Neural Code Search. In Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages; ACM: New York, NY, USA, 2018; pp. 31–41.
- Durai, A.D.; Ganesh, M.; Mathew, R.M.; Anguraj, D.K. A Novel Approach with an Extensive Case Study and Experiment for Automatic Code Generation from the XMI Schema of UML Models. J. Supercomput. 2022, 78, 7677–7699.
- Sajnani, H.; Saini, V.; Svajlenko, J.; Roy, C.K.; Lopes, C.V. SourcererCC: Scaling Code Clone Detection to Big-Code. In Proceedings of the 38th International Conference on Software Engineering; ACM: New York, NY, USA, 2016; pp. 1157–1168.
- Guo, C.; Huang, D.; Dong, N.; Ye, Q.; Xu, J.; Fan, Y.; Yang, H.; Xu, Y. Deep Review Sharing. In Proceedings of the IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER); IEEE: Piscataway, NJ, USA, 2019; pp. 61–72.
- Abid, S. Recommending Related Functions from API Usage-Based Function Clone Structures. In Proceedings of the 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE); ACM: New York, NY, USA, 2019; pp. 1193–1195.
- Martinez-Gil, J. Advanced Detection of Source Code Clones via an Ensemble of Unsupervised Similarity Measures. In Proceedings of the 17th International Conference on Software Quality (SWQD 2025); Springer: Cham, Switzerland, 2025; pp. 72–90.
- Quradaa, F.H.; Shahzad, S.; Almoqbily, R.S. A Systematic Literature Review on the Applications of Recurrent Neural Networks in Code Clone Research. PLoS ONE 2024, 19, e0296858.
- Alrubaye, H.; Mkaouer, M.W.; Khokhlov, I.; Reznik, L.; Ouni, A.; McGoff, J. Learning to Recommend Third-Party Library Migration Opportunities at the API Level. Appl. Soft Comput. 2020, 90, 106140.
- Ma, Z.; An, S.; Xie, B.; Lin, Z. Compositional API Recommendation for Library-Oriented Code Generation. In Proceedings of the IEEE/ACM International Conference on Program Comprehension; IEEE: Piscataway, NJ, USA, 2024; pp. 87–98.
- Dotzler, G.; Kamp, M.; Kreutzer, P.; Philippsen, M. More Accurate Recommendations for Method-Level Changes. In Proceedings of the 11th Joint Meeting on Foundations of Software Engineering; ACM: New York, NY, USA, 2017; pp. 798–808.
- Sheneamer, A.; Kalita, J. A Survey of Software Clone Detection Techniques. Int. J. Comput. Appl. 2016, 137, 1–21.
- Li, L.; Feng, H.; Zhuang, W.; Meng, N.; Ryder, B. CCLearner: A Deep Learning-Based Clone Detection Approach. In Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME); IEEE: Piscataway, NJ, USA, 2017; pp. 249–260.
- Saini, V.; Farmahinifarahani, F.; Lu, Y.; Baldi, P.; Lopes, C.V. Oreo: Detection of Clones in the Twilight Zone. In Proceedings of the 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE); ACM: New York, NY, USA, 2018; pp. 354–365.
- Alon, U.; Zilberstein, M.; Levy, O.; Yahav, E. code2vec: Learning Distributed Representations of Code. Proc. ACM Program. Lang. 2019, 3, 40.
- Feng, Z.; Guo, D.; Tang, D.; Duan, N.; Feng, X.; Gong, M.; Shou, L.; Qin, B.; Liu, T.; Jiang, D. CodeBERT: A Pre-Trained Model for Programming and Natural Languages. arXiv 2020, arXiv:2002.08155.
- Guo, D.; Ren, S.; Lu, S.; Feng, Z.; Tang, D.; Liu, S.; Zhou, L.; Duan, N.; Svyatkovskiy, A.; Fu, S.; et al. GraphCodeBERT: Pre-training Code Representations with Data Flow. arXiv 2020, arXiv:2009.08366.
- Svajlenko, J.; Islam, J.F.; Keivanloo, I.; Roy, C.K.; Mia, M.M. Towards a Big Data Curated Benchmark of Inter-Project Code Clones. In Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME); IEEE: Piscataway, NJ, USA, 2014; pp. 476–480.
- Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.; Dean, J. Distributed Representations of Words and Phrases and Their Compositionality. In Advances in Neural Information Processing Systems; Curran Associates: Red Hook, NY, USA, 2013; pp. 3111–3119.
- Goldberg, Y.; Levy, O. word2vec Explained: Deriving Mikolov et al.’s Negative-Sampling Word-Embedding Method. arXiv 2014, arXiv:1402.3722.
- Zhou, Z.-H.; Feng, J. Deep Forest: Towards an Alternative to Deep Neural Networks. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI); IJCAI Organization: Melbourne, Australia, 2017; pp. 3553–3559.
- Ragkhitwetsagul, C.; Krinke, J.; Clark, D. A Comparison of Code Similarity Analysers. Empir. Softw. Eng. 2018, 23, 2464–2519.
- Parsa, S.; Zakeri-Nasrabadi, M.; Ekhtiarzadeh, M.; Ramezani, M. Method Name Recommendation Based on Source Code Metrics. J. Comput. Lang. 2023, 74, 101177.
- Zakeri-Nasrabadi, M.; Parsa, S.; Ramezani, M.; Roy, C.K.; Ekhtiarzadeh, M. A Systematic Literature Review on Source Code Similarity Measurement and Clone Detection: Techniques, Applications, and Challenges. J. Syst. Softw. 2023, 204, 111796.
- Okutan, A. Use of Source Code Similarity Metrics in Software Defect Prediction. arXiv 2018, arXiv:1808.10033.
- Svajlenko, J.; Roy, C.K. Evaluating Clone Detection Tools with BigCloneBench. In Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME); IEEE: Piscataway, NJ, USA, 2015; pp. 131–140.
- Roy, C.K.; Cordy, J.R. NICAD: Accurate Detection of Near-Miss Intentional Clones Using Flexible Pretty-Printing and Code Normalization. In Proceedings of the 16th IEEE International Conference on Program Comprehension; IEEE: Piscataway, NJ, USA, 2008; pp. 172–181.
- Jiang, L.; Misherghi, G.; Su, Z.; Glondu, S. Deckard: Scalable and Accurate Tree-Based Detection of Code Clones. In Proceedings of the 29th International Conference on Software Engineering (ICSE); IEEE: Piscataway, NJ, USA, 2007; pp. 96–105.
- OpenAI. ChatGPT. Available online: https://openai.com (accessed on 5 December 2025).
- Tufano, R.; Dabić, O.; Mastropaolo, A.; Ciniselli, M.; Bavota, G. Code Review Automation: Strengths and Weaknesses of the State of the Art. IEEE Trans. Softw. Eng. 2024, 50, 338–353.
- Guo, Q.; Cao, J.; Xie, X.; Liu, S.; Li, X.; Chen, B.; Peng, X. Exploring the Potential of ChatGPT in Automated Code Refinement: An Empirical Study. In Proceedings of the IEEE/ACM International Conference on Software Engineering (ICSE); IEEE: Piscataway, NJ, USA, 2024; pp. 1–13.
- Jiang, J.; Wang, F.; Shen, J.; Kim, S.; Kim, S. A Survey on Large Language Models for Code Generation. arXiv 2024, arXiv:2406.00515.
- Baker, B.S. A Program for Identifying Duplicated Code. In Proceedings of the Computing Science and Statistics: 24th Symposium on the Interface, College Station, TX, USA, 18–21 March 1992; pp. 49–57.
- Ducasse, S.; Rieger, M.; Demeyer, S. A Language Independent Approach for Detecting Duplicated Code. In Proceedings of the IEEE International Conference on Software Maintenance (ICSM); IEEE: Piscataway, NJ, USA, 1999; pp. 109–118.
- Johnson, J.H. Substring Matching for Clone Detection and Change Tracking. In Proceedings of the IEEE International Conference on Software Maintenance (ICSM); IEEE: Piscataway, NJ, USA, 1994; pp. 120–126.
- Kamiya, T.; Kusumoto, S.; Inoue, K. CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large-Scale Source Code. IEEE Trans. Softw. Eng. 2002, 28, 654–670.
- Gabel, M.; Jiang, L.; Su, Z. Scalable Detection of Semantic Clones. In Proceedings of the 30th International Conference on Software Engineering; ACM: New York, NY, USA, 2008; pp. 321–330.
- Krinke, J. Identifying Similar Code with Program Dependence Graphs. In Proceedings of the Eighth Working Conference on Reverse Engineering; IEEE: Piscataway, NJ, USA, 2001; pp. 301–309.
- Chen, K.; Liu, P.; Zhang, Y. Achieving Accuracy and Scalability Simultaneously in Detecting Application Clones on Android Markets. In Proceedings of the 36th International Conference on Software Engineering; ACM: New York, NY, USA, 2014; pp. 175–186.
- Pham, N.H.; Nguyen, H.A.; Nguyen, T.T.; Al-Kofahi, J.M.; Nguyen, T.N. Complete and Accurate Clone Detection in Graph-Based Models. In Proceedings of the IEEE 31st International Conference on Software Engineering; IEEE: Piscataway, NJ, USA, 2009; pp. 276–286.
- Papamichail, M.D.; Diamantopoulos, T.; Symeonidis, A.L. Measuring the Reusability of Software Components Using Static Analysis Metrics and Reuse Rate Information. J. Syst. Softw. 2019, 158, 110423.
- Alon, U.; Brody, S.; Levy, O.; Yahav, E. code2seq: Generating Sequences from Structured Representations of Code. arXiv 2018, arXiv:1808.01400.
- Allen, F.E. Control Flow Analysis. ACM SIGPLAN Not. 1970, 5, 1–19.
- Naveed, H.; Khan, A.U.; Qiu, S.; Saqib, M.; Anwar, S.; Usman, M.; Akhtar, N.; Barnes, N.; Mian, A. A Comprehensive Overview of Large Language Models. ACM Trans. Intell. Syst. Technol. 2025, 16, 106.
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of NAACL-HLT; Association for Computational Linguistics: Minneapolis, MN, USA, 2019; pp. 4171–4186.
- Zhang, Z.; Saber, T. Exploring the Boundaries between LLM Code Clone Detection and Code Similarity Assessment on Human and AI-Generated Code. Big Data Cogn. Comput. 2025, 9, 41.
- Feng, S.; Chen, C. Prompting Is All You Need: Automated Android Bug Replay with Large Language Models. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (ICSE); IEEE: Piscataway, NJ, USA, 2024; pp. 1–13.
- Deng, Y.; Xia, C.S.; Peng, H.; Yang, C.; Zhang, L. Large Language Models Are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models. In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis; ACM: New York, NY, USA, 2023; pp. 423–435.
- Han, X.; Yang, Q.; Chen, X.; Chu, X.; Zhu, M. Generating and Evolving Reward Functions for Highway Driving with Large Language Models. In Proceedings of the 2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC); IEEE: Edmonton, AB, Canada, 2024; pp. 831–836.
- Nichols, D.; Polasam, P.; Menon, H.; Marathe, A.; Gamblin, T.; Bhatele, A. Performance-Aligned LLMs for Generating Fast Code. arXiv 2024, arXiv:2404.18864.
- Dou, S.; Shan, J.; Jia, H.; Deng, W.; Xi, Z.; He, W.; Wu, Y.; Gui, T.; Liu, Y.; Huang, X. Towards Understanding the Capability of Large Language Models on Code Clone Detection: A Survey. arXiv 2023, arXiv:2308.01191.
- Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.-A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. LLaMA: Open and Efficient Foundation Language Models. arXiv 2023, arXiv:2302.13971.
- Touvron, H.; Martin, L.; Stone, K.; Albert, P.; Almahairi, A.; Babaei, Y.; Bashlykov, N.; Batra, S.; Bhargava, P.; Bhosale, S.; et al. LLaMA 2: Open Foundation and Fine-Tuned Chat Models. arXiv 2023, arXiv:2307.09288.
- Taori, R.; Gulrajani, I.; Zhang, T.; Dubois, Y.; Li, X.; Guestrin, C.; Liang, P.; Hashimoto, T.B. Alpaca: A Strong, Replicable Instruction-Following Model. arXiv 2023, arXiv:2305.14233.
- Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. GPT-4 Technical Report. arXiv 2023, arXiv:2303.08774.
- Gong, J.; Voskanyan, V.; Brookes, P.; Wu, F.; Jie, W.; Xu, J.; Giavrimis, R.; Basios, M.; Kanthan, L.; Wang, Z. Language Models for Code Optimization: Survey, Challenges and Future Directions. arXiv 2025, arXiv:2501.01277.
- Huang, D.; Dai, J.; Weng, H.; Wu, P.; Qing, Y.; Cui, H.; Guo, Z.; Zhang, J. EffiLearner: Enhancing Efficiency of Generated Code via Self-Optimization. Adv. Neural Inf. Process. Syst. 2024, 37, 84482–84522.
- van Stein, N.; Vermetten, D.; Bäck, T. In-the-Loop Hyper-Parameter Optimization for LLM-Based Automated Design of Heuristics. arXiv 2024, arXiv:2410.16309.
- Cummins, C.; Seeker, V.; Grubisic, D.; Elhoushi, M.; Liang, Y.; Rozière, B.; Gehring, J.; Gloeckle, F.; Hazelwood, K.; Synnaeve, G.; et al. Large Language Models for Compiler Optimization. arXiv 2023, arXiv:2309.07062.
- Grubisic, D.; Seeker, V.; Synnaeve, G.; Leather, H.; Mellor-Crummey, J.; Cummins, C. Priority Sampling of Large Language Models for Compilers. In Proceedings of the Workshop on Machine Learning and Systems; ACM: New York, NY, USA, 2024; pp. 91–97.
- Li, K.; Hu, Q.; Zhao, J.X.; Chen, H.; Xie, Y.; Liu, T.; Shieh, M.; He, J. InstructCoder: Instruction Tuning Large Language Models for Code Editing. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop); ACL: Bangkok, Thailand, 2024; pp. 473–493.
- Xu, J.; Li, J.; Liu, Z.; Suryanarayanan, N.A.V.; Zhou, G.; Guo, J.; Iba, H.; Tei, K. Large Language Models Synergize with Automated Machine Learning. arXiv 2024, arXiv:2405.03727.
- Zhang, K.; Li, G.; Dong, Y.; Xu, J.; Zhang, J.; Su, J.; Liu, Y.; Jin, Z. CodeDPO: Aligning Code Models with Self-Generated and Verified Source Code. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); ACL: Bangkok, Thailand, 2025; pp. 15854–15871.








| Dataset | NT1: T1 | NT1: T2 | NT2: VST3 | NT2: ST3 | NT3: MT3 | NT3: WT3/4 | NT4: Non-Clone Pairs |
|---|---|---|---|---|---|---|---|
| Training | 12,800 | 2880 | 1600 | 8000 | 32,000 | 32,000 | 64,000 |
| Testing | 3200 | 720 | 400 | 2000 | 8000 | 8000 | 16,000 |

| Method | NT1D | NT2D | NT3D | NT4D | Total |
|---|---|---|---|---|---|
| CCLearner | 3845 | 102 | 87 | 63 | 4097 |
| | 42 | 2215 | 97 | 91 | 2445 |
| | 20 | 67 | 8192 | 5434 | 13,713 |
| | 13 | 16 | 7624 | 10,412 | 18,065 |
| Oreo | 3853 | 98 | 82 | 54 | 4087 |
| | 39 | 2236 | 84 | 56 | 2415 |
| | 23 | 48 | 8688 | 4217 | 12,976 |
| | 16 | 18 | 7146 | 11,673 | 18,853 |
| RSharer | 3885 | 71 | 64 | 53 | 4073 |
| | 24 | 2265 | 81 | 77 | 2447 |
| | 11 | 42 | 10,144 | 4167 | 14,364 |
| | 0 | 22 | 5711 | 11,708 | 17,441 |
| RioCC | 3892 | 67 | 59 | 53 | 4071 |
| | 17 | 2285 | 68 | 62 | 2432 |
| | 11 | 33 | 11,921 | 3621 | 15,586 |
| | 0 | 15 | 3952 | 12,264 | 16,231 |
| Total | 3920 | 2400 | 16,000 | 16,000 | 38,320 |

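
As a worked reading of the table above, the sketch below derives per-class precision and recall from the four RioCC rows. It assumes, as the column sums suggest, that columns hold the ground-truth class and the four rows per method hold the predicted classes NT1 to NT4 in order. Precision and recall are the standard definitions, used here for illustration; the paper's own metric definitions are not reproduced in this extract.

```python
CLASSES = ["NT1", "NT2", "NT3", "NT4"]

# RioCC block of the table. Assumed layout: rows = predicted class,
# columns = actual class (NT1D..NT4D).
RIOCC = [
    [3892,   67,    59,    53],
    [  17, 2285,    68,    62],
    [  11,   33, 11921,  3621],
    [   0,   15,  3952, 12264],
]

def per_class_metrics(matrix):
    """Return {class: (precision, recall)} for a predicted-by-actual count matrix."""
    metrics = {}
    for i, name in enumerate(CLASSES):
        true_pos = matrix[i][i]
        predicted = sum(matrix[i])                # row total
        actual = sum(row[i] for row in matrix)    # column total
        metrics[name] = (true_pos / predicted, true_pos / actual)
    return metrics

for name, (precision, recall) in per_class_metrics(RIOCC).items():
    print(f"{name}: precision={precision:.3f} recall={recall:.3f}")
```

Under this reading, NT3 recall for RioCC is 11,921 / 16,000, or about 0.745.
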
| Method | NT1D | NT2D | NT3D | NT4D | Total |
|---|---|---|---|---|---|
| CCLearner | 89 | 3 | 2 | 0 | 94 |
| | 7 | 87 | 4 | 4 | 102 |
| | 4 | 6 | 46 | 32 | 88 |
| | 0 | 4 | 48 | 64 | 116 |
| Oreo | 86 | 4 | 1 | 0 | 91 |
| | 10 | 84 | 11 | 1 | 106 |
| | 3 | 7 | 48 | 28 | 86 |
| | 1 | 5 | 40 | 71 | 117 |
| RSharer | 86 | 2 | 2 | 0 | 90 |
| | 8 | 85 | 12 | 3 | 108 |
| | 5 | 11 | 60 | 25 | 101 |
| | 0 | 2 | 36 | 72 | 110 |
| RioCC | 92 | 4 | 1 | 0 | 97 |
| | 7 | 88 | 5 | 2 | 102 |
| | 1 | 6 | 74 | 19 | 100 |
| | 0 | 2 | 20 | 77 | 99 |
| Total | 100 | 100 | 100 | 100 | 400 |

| Model | NT1 | NT2 | NT3 | NT4 |
|---|---|---|---|---|
| RioCC | 4 | 3.1 | 1.8 | 1.1 |
| CCLearner | 4 | 3 | 1.2 | 1.8 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Gao, H.; Guo, C.; Yang, H. RioCC: Efficient and Accurate Class-Level Code Recommendation Based on Deep Code Clone Detection. Entropy 2026, 28, 223. https://doi.org/10.3390/e28020223
