ODEL: An Experience-Augmented Self-Evolving Framework for Efficient Python-to-C++ Code Translation
Abstract
1. Introduction
1. An experience-driven, on-demand enhancement translation framework, ODEL, that significantly outperforms existing baseline methods. By invoking a high-performance external model on demand for deep error analysis and structured experience distillation, the framework systematically improves the translation accuracy of the lightweight internal model. On the HumanEval-X benchmark, ODEL raises Pass@1 from 71.82% to 81.10% and Pass@10 from 74.30% to 89.02%, demonstrating the effectiveness of its experience accumulation and reuse mechanism in improving code translation quality.
2. A sustainable experience accumulation and self-evolution mechanism that enables long-term performance improvement across successive tasks. Through a multi-phase translation experiment, we show that ODEL achieves sustained self-optimization across successive translation phases in a continuous task stream, without external intervention.
3. A teacher–student collaborative experience distillation mechanism that improves both the quality of accumulated experience and the performance ceiling of the system. By introducing a high-performance external model (DeepSeek 685B) as the "teacher" for deep error diagnosis and structured experience generation, ODEL produces higher-quality and more generalizable experience units. Experiments show that, compared to using only the internal model for experience generation, the external experience mechanism improves Pass@1 by a further 6.35 percentage points, underscoring the critical role of high-quality external experience in pushing the system's performance boundaries.
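The experience accumulation, retrieval, and on-demand teacher escalation described above can be sketched in Python. The `ExperienceUnit` fields, the keyword-overlap retrieval, and the escalation threshold are illustrative assumptions, not the paper's exact design:

```python
from dataclasses import dataclass


@dataclass
class ExperienceUnit:
    """A structured experience distilled from a failed translation.
    Field names are illustrative, not the paper's exact schema."""
    error_signature: str  # e.g. compiler error class or failed-test pattern
    diagnosis: str        # root cause of the Python-to-C++ mismatch
    guideline: str        # reusable translation rule for future prompts


class ExperienceStore:
    """Accumulates experience units and retrieves the most relevant ones
    to augment later translation prompts (naive keyword overlap here;
    a real system might use embedding-based retrieval instead)."""

    def __init__(self) -> None:
        self.units: list[ExperienceUnit] = []

    def add(self, unit: ExperienceUnit) -> None:
        self.units.append(unit)

    def retrieve(self, error_text: str, k: int = 3) -> list[ExperienceUnit]:
        # Score each stored unit by how many of its signature words
        # appear in the new error text; return the top-k matches.
        scored = [
            (sum(word in error_text for word in u.error_signature.split()), u)
            for u in self.units
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [u for score, u in scored[:k] if score > 0]


def distill_on_demand(error_text, internal_analyze, external_analyze,
                      store, attempts_failed, escalate_after=2):
    """On-demand enhancement: the lightweight internal model handles
    routine failures; only persistent ones are escalated to the
    high-performance external teacher (DeepSeek 685B in the paper).
    Both analyzers are callbacks returning an ExperienceUnit."""
    if attempts_failed >= escalate_after:
        unit = external_analyze(error_text)  # deep teacher diagnosis
    else:
        unit = internal_analyze(error_text)  # cheap internal reflection
    store.add(unit)
    return unit
```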
2. Related Work
2.1. Methods Based on Foundation Models
2.2. Methods Based on System-Level Frameworks
2.3. Multi-Agent Collaboration Methods
3. ODEL Framework
3.1. Architecture and Workflow of the Framework
3.2. Logic Interpreter
3.3. Code Generator
3.4. Code Validator
Algorithm 1 Code Validator
Require: code ← output from Code Generator; test_cases ← output from Logic Interpreter
Ensure: Validated code (success) or failure after max retries
 1: Set max_attempts
 2: Set attempt
 3: while attempt ≤ max_attempts do
 4:   Step 1: Perform quick syntactic checks
 5:   if syntactic check fails then
 6:     Record error information
 7:     Send error feedback to Code Generator
 8:     attempt ← attempt + 1
 9:     continue
10:   end if
11:   Step 2: Invoke compiler for syntax validation
12:   if compilation fails then
13:     Record error information
14:     Send error feedback to Code Generator
15:     attempt ← attempt + 1
16:     continue
17:   end if
18:   Step 3: Construct complete executable test program
19:   Step 4: Write to temporary file and compile
20:   if compilation fails then
21:     Record error information
22:     Send error feedback to Code Generator
23:     attempt ← attempt + 1
24:     continue
25:   else
26:     return code {All steps passed}
27:   end if
28: end while
29: return code {Max retries exhausted}
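The validation loop of Algorithm 1 can be sketched in Python. The heuristic quick checks, the g++ invocation, the retry budget, and the `regenerate` callback standing in for the Code Generator are all illustrative assumptions:

```python
import os
import subprocess
import tempfile
from typing import Optional

MAX_ATTEMPTS = 3  # assumed retry budget; the paper's value may differ


def quick_syntactic_check(code: str) -> Optional[str]:
    """Step 1: cheap heuristic checks before invoking the compiler.
    Returns an error message, or None if the code looks plausible."""
    if code.count("{") != code.count("}"):
        return "unbalanced braces"
    if ";" not in code:
        return "no statements found"
    return None


def compile_cpp(source: str, syntax_only: bool = False) -> Optional[str]:
    """Steps 2/4: compile with g++; return stderr on failure, None on success.
    Step 2 checks syntax only; the full test program is also linked."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "prog.cpp")
        with open(src, "w") as f:
            f.write(source)
        cmd = ["g++", "-std=c++17", src]
        cmd += ["-fsyntax-only"] if syntax_only else ["-o", os.path.join(tmp, "prog")]
        result = subprocess.run(cmd, capture_output=True, text=True)
        return None if result.returncode == 0 else result.stderr


def validate(code: str, test_harness: str, regenerate) -> str:
    """Mirror of Algorithm 1: retry up to MAX_ATTEMPTS, feeding each
    error back to the Code Generator via the `regenerate` callback."""
    attempt = 1
    while attempt <= MAX_ATTEMPTS:
        err = quick_syntactic_check(code)                        # Step 1
        if err is None:
            err = compile_cpp(code, syntax_only=True)            # Step 2
        if err is None:
            err = compile_cpp(code + "\n" + test_harness)        # Steps 3-4
        if err is None:
            return code  # all steps passed
        code = regenerate(code, err)  # error feedback to Code Generator
        attempt += 1
    return code  # max retries exhausted
```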
3.5. Sustainable Experience Integration Mechanism
3.6. Capability Alignment
4. Experiments and Results Analysis
1. RQ1 (Accuracy Improvement): To what extent does the ODEL framework improve the functional correctness and overall accuracy of automated code translation?
2. RQ2 (Performance Evolution Across Translation Phases): Can the ODEL system achieve stable performance improvement over successive translation trials through its experience accumulation and reuse mechanism, thereby demonstrating self-evolution and sustained learning ability?
3. RQ3 (Impact of the External Model): How does leveraging a more capable external model for error analysis and experience distillation contribute to further performance gains?
4.1. Experimental Setup
4.2. Baselines
- Qwen2.5-Coder (Single-Agent): Direct translation using Qwen2.5-Coder (32B) [31].
- UniTrans Method [32]: UniTrans is a multi-agent framework built on large language models, where multiple specialized agents cooperate through test generation, translation, execution feedback, and iterative refinement to improve automated code translation accuracy.
- Multi-Agent (No Experience): The framework’s main body uses Qwen2.5-Coder (32B) for translation and iteration, with the experience mechanism disabled.
- ODEL (Internal Experience): The framework’s main body uses Qwen2.5-Coder (32B) for translation and iteration, with its own "reflection" module responsible for error analysis and internal experience generation.
- ODEL (External Experience): The framework uses Qwen2.5-Coder (32B) for translation and iteration, but calls DeepSeek 685B as an external analyzer for in-depth error diagnosis and high-quality experience generation.
- DeepSeek-Coder (Single-Agent): Direct translation using DeepSeek-Coder (33B) [33].
- DeepSeek-Coder ODEL (External Experience): The framework uses DeepSeek-Coder (33B) for translation and iteration, but calls DeepSeek 685B as an external analyzer for in-depth error diagnosis and high-quality experience generation.
4.3. Effectiveness of the ODEL
4.4. Performance Evolution Across Translation Phases
4.5. Impact of Experience Source on Translation Performance (Ablation Study)
1. Even in the absence of experience learning, the multi-agent architecture itself yields a measurable improvement over the single-agent baseline (Pass@10: 81.65% vs. 74.30%).
2. Introducing experience learning with the internal model (Qwen 32B) further elevates Pass@10 to 84.70%, confirming the intrinsic value of the experience-driven paradigm.
3. Most notably, when the powerful external model (DeepSeek 685B) is employed for deep error diagnosis and structured experience distillation, performance reaches its peak: Pass@1 of 81.10% and Pass@10 of 89.02%, a gain of 6.35 percentage points in Pass@1 and 4.32 percentage points in Pass@10 over the internal-model experience configuration.
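The Pass@1 and Pass@10 figures above follow the standard unbiased pass@k estimator introduced with HumanEval (Chen et al. [14]); a minimal implementation:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al.): the probability that at
    least one of k samples drawn from n generations (c of them correct)
    passes, computed as 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every k-subset must contain at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with n = 10 samples per problem of which c = 2 pass, pass@1 = 0.2 while pass@10 = 1.0, which is why Pass@10 sits well above Pass@1 in every configuration.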
4.6. DeepSeek-Coder (33B)
4.7. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Harman, M.; Jia, Y.; Zhang, Y. Search-Based Software Engineering: Trends, Techniques, and Applications. ACM Comput. Surv. 2012, 45, 11. [Google Scholar] [CrossRef]
- Roziere, B.; Lachaux, M.-A.; Chanussot, L.; Lample, G. Unsupervised Translation of Programming Languages. In Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS ’20), Vancouver, BC, Canada, 6–12 December 2020; pp. 20601–20611. [Google Scholar]
- Nguyen, A.T.; Nguyen, T.T.; Nguyen, N.T. Lexical Statistical Machine Translation for Language Migration. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2013), Saint Petersburg, Russia, 18–26 August 2013; pp. 651–654. [Google Scholar] [CrossRef]
- Chen, X.; Liu, C.; Song, D. Tree-to-Tree Neural Networks for Program Translation. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS ’18), Montreal, QC, Canada, 3–8 December 2018; pp. 2552–2562. [Google Scholar]
- Karaivanov, S.; Raychev, V.; Vechev, M. Phrase-Based Statistical Translation of Programming Languages. In Proceedings of the 2014 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming & Software (Onward! 2014), Portland, OR, USA, 20–24 October 2014; pp. 173–184. [Google Scholar] [CrossRef]
- Nijkamp, E.; Pang, B.; Hayashi, H.; Tu, L.; Wang, H.; Zhou, Y.; Savarese, S.; Xiong, C. CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis. arXiv 2022, arXiv:2203.13474. [Google Scholar] [CrossRef]
- Zheng, Z.; Ning, K.; Wang, Y.; Zhang, J.; Zheng, D.; Ye, M.; Chen, J. A Survey of Large Language Models for Code: Evolution, Benchmarking, and Future Trends. arXiv 2023, arXiv:2311.10372. [Google Scholar] [CrossRef]
- Fedus, W.; Zoph, B.; Shazeer, N. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. J. Mach. Learn. Res. 2022, 23, 5232–5270. [Google Scholar]
- Ahmad, W.; Chakraborty, S.; Ray, B.; Chang, K.W. Unified Pre-training for Program Understanding and Generation. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021. [Google Scholar] [CrossRef]
- de Masson d’Autume, C.; Ruder, S.; Kong, L.; Yogatama, D. Episodic Memory in Lifelong Language Learning. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; pp. 13132–13141. [Google Scholar]
- Shinn, N.; Cassano, F.; Berman, E.; Gopinath, A.; Narasimhan, K.; Yao, S. Reflexion: Language Agents with Verbal Reinforcement Learning. arXiv 2023, arXiv:2303.11366. [Google Scholar] [CrossRef]
- Madaan, A.; Tandon, N.; Gupta, P.; Hallinan, S.; Gao, L.; Wiegreffe, S.; Alon, U.; Dziri, N.; Prabhumoye, S.; Yang, Y.; et al. Self-Refine: Iterative Refinement with Self-Feedback. arXiv 2023, arXiv:2303.17651. [Google Scholar] [CrossRef]
- Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.; Rocktäschel, T.; et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. In Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS ’20), Vancouver, BC, Canada, 6–12 December 2020; pp. 9459–9474. [Google Scholar]
- Chen, M.; Tworek, J.; Jun, H.; Yuan, Q.; de Oliveira Pinto, H.P.; Kaplan, J.; Edwards, H.; Burda, Y.; Joseph, N.; Brockman, G.; et al. Evaluating Large Language Models Trained on Code. arXiv 2021, arXiv:2107.03374. [Google Scholar] [CrossRef]
- Li, Y.; Choi, D.; Chung, J.; Kushman, N.; Schrittwieser, J.; Leblond, R.; Eccles, T.; Keeling, J.; Gimeno, F.; Lago, A.D.; et al. Competition-Level Code Generation with AlphaCode. Science 2022, 378, 1092–1097. [Google Scholar] [CrossRef] [PubMed]
- Rozière, B.; Gehring, J.; Gloeckle, F.; Sootla, S.; Gat, I.; Tan, X.E.; Adi, Y.; Liu, J.; Sauvestre, R.; Remez, T.; et al. Code Llama: Open Foundation Models for Code. arXiv 2023, arXiv:2308.12950. [Google Scholar] [CrossRef]
- Bai, J.; Bai, S.; Chu, Y.; Cui, Z.; Dang, K.; Deng, X.; Fan, Y.; Ge, W.; Han, Y.; Huang, F.; et al. Qwen Technical Report. arXiv 2023, arXiv:2309.16609. [Google Scholar] [CrossRef]
- Wang, Y.; Le, H.; Gotmare, A.D.; Bui, N.D.Q.; Li, J.; Hoi, S.C.H. CodeT5+: Open Code Large Language Models for Code Understanding and Generation. arXiv 2023, arXiv:2305.07922. [Google Scholar] [CrossRef]
- Li, R.; Allal, L.B.; Zi, Y.; Muennighoff, N.; Kocetkov, D.; Mou, C.; Marone, M.; Akiki, C.; Li, J.; Chim, J.; et al. StarCoder: May the Source Be with You! arXiv 2023, arXiv:2305.06161. [Google Scholar] [CrossRef]
- Austin, J.; Odena, A.; Nye, M.; Bosma, M.; Michalewski, H.; Dohan, D.; Jiang, E.; Cai, C.; Terry, M.; Le, Q.; et al. Program Synthesis with Large Language Models. arXiv 2021, arXiv:2108.07732. [Google Scholar] [CrossRef]
- Wang, D.; Li, L. Learning from Mistakes via Cooperative Study Assistant for Large Language Models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023), Singapore, 6–10 December 2023; Association for Computational Linguistics: Stroudsburg, PA, USA, 2023; pp. 10667–10685. [Google Scholar]
- Zheng, Z.; Yin, P.; Lu, C.T.; Huang, X. CodeTrans: Towards Cracking the Language Barrier in Code Translation. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Virtual Event, 1–5 November 2021; pp. 2940–2944. [Google Scholar]
- Wang, Z.; Zhou, S.; Li, X.; Zhang, Y. Continual Learning for Code Generation. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, Lisbon, Portugal, 14–20 April 2024. [Google Scholar]
- Chen, Z.; Kommrusch, S.J.; Monperrus, M. CERT: Continual Pre-training on Sketches for Library-Oriented Code Generation. In Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, Pittsburgh, PA, USA, 16–17 May 2022; pp. 73–84. [Google Scholar]
- Cassano, F.; Gouwar, J.; Nguyen, D.; Nguyen, S.; Phipps-Costin, L.; Pinckney, D.; Yee, M.; Zi, Y.; Anderson, C.J.; Feldman, M.Q.; et al. MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation. arXiv 2022, arXiv:2208.08227. [Google Scholar] [CrossRef]
- Parvez, M.R.; Chakraborty, S.; Ray, B.; Chang, K.-W. Retrieval Augmented Code Generation and Repair. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Virtual Event, 7–11 November 2021; pp. 2119–2134. [Google Scholar]
- Lu, S.; Xu, D.; Alon, U.; Neubig, G.; Hellendoorn, V. ReACC: Retrieval-Augmented Code Completion. In Proceedings of the IEEE/ACM 45th International Conference on Software Engineering (ICSE), Melbourne, Australia, 14–20 May 2023. [Google Scholar]
- Karanjai, R.; Blackshear, S.; Xu, L.; Shi, W. Collaboration is all you need: LLM Assisted Safe Code Translation. In Proceedings of the 33rd ACM International Conference on the Foundations of Software Engineering, Trondheim, Norway, 23–28 June 2025; pp. 671–675. [Google Scholar] [CrossRef]
- Athiwaratkun, B.; Gouda, S.K.; Wang, Z.; Li, X.; Tian, Y.; Tan, M.; Ahmad, W.; Wang, S.; Sun, Q.; Shang, M.; et al. Multi-lingual Evaluation of Code Generation Models. arXiv 2022, arXiv:2210.14868. [Google Scholar] [CrossRef]
- Ellis, K.; Wong, C.; Nye, M.I.; Sablé-Meyer, M.; Morales, L.; Hewitt, L.B.; Cary, L.; Solar-Lezama, A.; Tenenbaum, J.B. DreamCoder: Bootstrapping Inductive Program Synthesis with Wake-Sleep Library Learning. In Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI ’21), Online, 20–25 June 2021; pp. 835–850. [Google Scholar] [CrossRef]
- Hui, B.; Yang, J.; Cui, Z.; Yang, J.; Liu, D.; Zhang, L.; Liu, T.; Zhang, J.; Yu, B.; Lu, K.; et al. Qwen2.5-Coder Technical Report. arXiv 2024, arXiv:2409.12186. [Google Scholar] [CrossRef]
- Yang, Z.; Liu, F.; Yu, Z.; Keung, J.W.; Li, J.; Liu, S.; Hong, Y.; Ma, X.; Jin, Z.; Li, G. Exploring and unleashing the power of large language models in automated code translation. Proc. ACM Softw. Eng. 2024, 1, 1585–1608. [Google Scholar] [CrossRef]
- Guo, D.; Zhu, Q.; Yang, D.; Xie, Z.; Dong, K.; Zhang, W.; Chen, G.; Bi, X.; Wu, Y.; Li, Y.K.; et al. DeepSeek-Coder: When the Large Language Model Meets Programming—The Rise of Code Intelligence. arXiv 2024, arXiv:2401.14196. [Google Scholar] [CrossRef]
Overall translation accuracy on HumanEval-X (Python→C++):

| Method | Pass@1 | Pass@10 |
|---|---|---|
| Qwen2.5-Coder (Single-Agent) | 71.82% | 74.30% |
| UniTrans Method | 72.16% | 76.64% |
| ODEL (External Experience) | 81.10% | 89.02% |
Ablation on the experience source:

| Method | Pass@1 | Pass@10 |
|---|---|---|
| Multi-Agent (No Experience) | 72.56% | 81.65% |
| ODEL (Internal Experience) | 74.75% | 84.70% |
| ODEL (External Experience) | 81.10% | 89.02% |
Results with DeepSeek-Coder (33B) as the internal model:

| Method | Pass@1 | Pass@10 |
|---|---|---|
| DeepSeek-Coder (Single-Agent) | 60.37% | 60.37% |
| DeepSeek-Coder ODEL (External Experience) | 69.09% | 86.59% |
| Qwen2.5-Coder (Single-Agent) | 71.82% | 74.30% |
| UniTrans Method | 72.16% | 76.64% |
| ODEL (External Experience) | 81.10% | 89.02% |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Feng, K.; Peng, F.; Wu, J. ODEL: An Experience-Augmented Self-Evolving Framework for Efficient Python-to-C++ Code Translation. Appl. Sci. 2026, 16, 1506. https://doi.org/10.3390/app16031506