Comparative Evaluation of Gemini and DeepSeek for LLM-Generated Code Quality and Architectural Robustness in Backend Software Engineering
Abstract
1. Introduction
- Evaluate the effectiveness and practical applicability of selected foundational LLMs for server-side code generation within the vibe-coding context, using a representative scenario commonly encountered by novice or junior software developers.
- Compare two popular but distinct LLMs: a proprietary cloud-based model and an open-weight model suitable for local use.
2. Related Work
2.1. What Is Vibe-Coding?
2.2. AI-Assisted Software Development
3. Methodology
3.1. Experimental Setup
3.2. Target SQL Schema
3.3. Prompting Protocol
- Prompting step 1: Generation of domain entity classes.
- Prompting step 2: Implementation of CRUD (Create–Read–Update–Delete) layers.
- Prompting step 3: Implementation of advanced functionalities.
- Prompting step 4: Configuration and test generation.
3.4. Evaluation Framework for Static and Dynamic Code Analysis
3.5. Evaluation of Generated SQL Schema
4. Results
4.1. Google Gemini Model Evaluation
4.2. DeepSeek Model Evaluation
4.3. Vibe-Coding Efficiency and Human Intervention Rates
4.4. Test Performance and Code Coverage
4.5. System Performance and API Call Latency
4.6. Code Accuracy and Completeness
4.7. System Architecture Analysis
4.8. Static Code Quality Analysis
4.9. SQL Schema Conceptualization and Database Entities Development
5. Discussion
Research Limitations
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A

    -- 1. GENERATE Customers
    INSERT INTO "customers" (name, email)
    SELECT
        -- Name in the format "Kupac 1", "Kupac 2", ..., "Kupac 5000"
        -- (in English: "Customer 1", "Customer 2", etc.)
        'Kupac ' || n,
        'kupac.' || n || '@example.com'
    FROM generate_series(1, 5000) AS n;

    -- 2. GENERATE Products
    INSERT INTO "products" (name, description, price, category)
    SELECT
        -- Name in the format "Proizvod 1", "Proizvod 2", ..., "Proizvod 20000"
        -- (in English: "Product 1", "Product 2", etc.)
        'Proizvod ' || n,
        -- Description in the format "Detaljan opis za proizvod 1", ...
        -- (in English: "Detailed description for product 1", etc.)
        'Detaljan opis za proizvod ' || n,
        -- Price between 10.00 and 1010.00
        (random() * 1000 + 10)::numeric(10, 2),
        -- Randomly select one of the 5 categories with equal probability
        -- (in English: Electronics, Clothing, Books, Home, Sports)
        (ARRAY['Elektronika', 'Odjeća', 'Knjige', 'Kućanstvo', 'Sport'])[floor(random() * 5 + 1)]
    FROM generate_series(1, 20000) AS n;

    -- 3. GENERATE Orders
    INSERT INTO "orders" ("order_date", "customer_id")
    SELECT
        -- Random date and time in the last 5 years
        NOW() - (random() * 365 * 5) * '1 day'::interval,
        -- Random customer ID between 1 and 5000
        floor(random() * 5000 + 1)::bigint
    FROM generate_series(1, 50000) AS n;

    -- 4. GENERATE OrderItems
    DO $$
    DECLARE
        order_id bigint;
        num_items int;
        i int;
    BEGIN
        -- Loop through each order that was just created
        FOR order_id IN SELECT id FROM "orders" LOOP
            -- Each order will have between 1 and 7 items (randomly determined)
            num_items := floor(random() * 7 + 1);
            FOR i IN 1..num_items LOOP
                INSERT INTO "order_items" ("order_id", "product_id", "quantity")
                VALUES (
                    order_id,
                    -- Random product ID between 1 and 20000
                    floor(random() * 20000 + 1)::bigint,
                    -- Quantity between 1 and 5
                    floor(random() * 5 + 1)
                );
            END LOOP;
        END LOOP;
    END $$;
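For orientation, the columns populated by the seeding script above can be mirrored in a small plain-Java domain model. The following is a minimal sketch and not the JPA entities produced by either model: the record names, the Java field types, the `id` component of `OrderItem`, and the `totalQuantity` helper are assumptions made for illustration. Only the column names are taken from the script.

```java
import java.math.BigDecimal;
import java.time.LocalDateTime;
import java.util.List;

// Column names follow the seeding script; the Java types are assumed.
record Customer(long id, String name, String email) {}

record Product(long id, String name, String description, BigDecimal price, String category) {}

record Order(long id, LocalDateTime orderDate, long customerId) {}

// The id component is an assumption; the script only sets order_id, product_id and quantity.
record OrderItem(long id, long orderId, long productId, int quantity) {}

final class DomainSketch {

    // Hypothetical helper: sums the quantities of all items that belong to the given order.
    static int totalQuantity(Order order, List<OrderItem> items) {
        int total = 0;
        for (OrderItem item : items) {
            if (item.orderId() == order.id()) {
                total += item.quantity();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Sample values in the style of the generated data.
        Customer customer = new Customer(1L, "Kupac 1", "kupac.1@example.com");
        Product product = new Product(1L, "Proizvod 1", "Detaljan opis za proizvod 1",
                new BigDecimal("10.00"), "Knjige");
        Order order = new Order(1L, LocalDateTime.of(2025, 1, 1, 12, 0), customer.id());
        List<OrderItem> items = List.of(
                new OrderItem(1L, order.id(), product.id(), 2),
                new OrderItem(2L, order.id(), product.id(), 3));
        System.out.println(totalQuantity(order, items));
    }
}
```

Records are used here only to keep the sketch short. A persistence layer would typically use mutable entity classes with mapping annotations instead.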

| Vibe-Coding Success Metric | Gemini | DeepSeek |
|---|---|---|
| Total number of errors | 16 | 22 |
| Total number of corrective LLM prompts | 4 | 8 |
| Total number of manual code corrections | 20 | 23 |

| Test Metric | Gemini | DeepSeek |
|---|---|---|
| Total number of generated tests | 23 | 27 |
| Successful test rate (%) | 95.7 | 74.1 |
| Code test coverage (%) | 55.9 | 45.6 |
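The pass rates above can be related back to absolute test counts. The following is a minimal sketch, assuming that the successful test rate is the number of passing tests divided by the number of generated tests, rounded to one decimal place. The class and method names are illustrative and do not come from the study.

```java
final class TestRate {

    // Number of passing tests implied by a reported pass rate (in percent).
    static long impliedPasses(int totalTests, double passRatePercent) {
        return Math.round(totalTests * passRatePercent / 100.0);
    }

    // Pass rate in percent, rounded to one decimal place as in the table.
    static double passRatePercent(long passed, int totalTests) {
        return Math.round(1000.0 * passed / totalTests) / 10.0;
    }

    public static void main(String[] args) {
        long geminiPassed = impliedPasses(23, 95.7);
        long deepSeekPassed = impliedPasses(27, 74.1);
        System.out.println("Gemini: " + geminiPassed + " of 23 passed");
        System.out.println("DeepSeek: " + deepSeekPassed + " of 27 passed");
    }
}
```

Under this assumption, the reported rates correspond to 22 of 23 passing tests for Gemini and 20 of 27 for DeepSeek, so DeepSeek generated more tests, but fewer of them passed.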

| No. | API Call Test Scenario | Gemini Latency (ms) | DeepSeek Latency (ms) |
|---|---|---|---|
| 1 | Retrieval of a specific product by ID | 10 | 11 |
| 2 | Frequent category filter | 35 | 27 |
| 3 | Complex search (filter + sort) | 23 | 30 |
| 4 | Retrieval of the top 5 best-selling products | 175 | 170 |
| 5 | Retrieval of all orders for a specific user | 620 | 15 ¹ |

Evaluator 1

| Functional Category | Functional Unit | Gemini | DeepSeek |
|---|---|---|---|
| Domain model | Implementing entities and JPA relations | 1 | 1 |
| Domain model | Applying validation annotations | 1 | 1 |
| CRUD architecture | Generating complete layers (repo, service, controller) | 1 | 1 |
| CRUD architecture | Implementing CRUD endpoint for Customer and Order entities | 1 | 1 |
| CRUD architecture | Implementing CRUD endpoint for Product and OrderItem entities | 1 | 1 |
| Advanced functionality | Implementing filtering and product sorting | 1 | 0.5 |
| Advanced functionality | Implementing aggregation (top 5 products’ retrieval) | 1 | 0.83 |
| Advanced functionality | Implementing order retrieval by user | 0.5 | 0 |
| Configuration | Correct configuration for PostgreSQL database | 1 | 0.83 |
| Testing | Generating unit tests (service) | 1 | 0.67 |
| Testing | Generating integration tests (repo) | 0.83 | 0.17 |
| Testing | Generating integration tests (controller) | 0.5 | 0.17 |
| Total points | | 10.83/12 | 8.17/12 |
| Average | | 0.90 | 0.68 |
| Std. dev. | | 0.19 | 0.36 |
| Success ratio (%) | | 90.25 | 68.08 |
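The summary rows of the table above can be reproduced from the per-unit scores. The sketch below is illustrative only: the class and method names are hypothetical, and it assumes that the reported standard deviation is the population form (dividing by n). That assumption matches the reported 0.36 for DeepSeek, whereas the sample form (dividing by n - 1) would give approximately 0.38.

```java
import java.util.Locale;

/** Recomputes the summary rows of the accuracy table from the per-unit scores. */
final class ScoreSummary {

    // Per-unit scores copied from the table above, in row order.
    static final double[] GEMINI = {1, 1, 1, 1, 1, 1, 1, 0.5, 1, 1, 0.83, 0.5};
    static final double[] DEEPSEEK = {1, 1, 1, 1, 1, 0.5, 0.83, 0, 0.83, 0.67, 0.17, 0.17};

    static double total(double[] scores) {
        double sum = 0.0;
        for (double s : scores) {
            sum += s;
        }
        return sum;
    }

    static double mean(double[] scores) {
        return total(scores) / scores.length;
    }

    // Population standard deviation (divides by n, not n - 1).
    static double populationStdDev(double[] scores) {
        double m = mean(scores);
        double squaredDeviations = 0.0;
        for (double s : scores) {
            squaredDeviations += (s - m) * (s - m);
        }
        return Math.sqrt(squaredDeviations / scores.length);
    }

    // Success ratio in percent: achieved points over the maximum of one point per unit.
    static double successRatioPercent(double[] scores) {
        return 100.0 * mean(scores);
    }

    static String summarize(String model, double[] scores) {
        return String.format(Locale.ROOT,
                "%s: total=%.2f/%d mean=%.2f sd=%.2f success=%.2f%%",
                model, total(scores), scores.length, mean(scores),
                populationStdDev(scores), successRatioPercent(scores));
    }

    public static void main(String[] args) {
        System.out.println(summarize("Gemini", GEMINI));
        System.out.println(summarize("DeepSeek", DEEPSEEK));
    }
}
```

Because every functional unit is worth at most one point, the success ratio is simply the mean score expressed as a percentage.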

| Quality Metric | Gemini | DeepSeek |
|---|---|---|
| Code smells | 15 | 10 |
| Technical debt (estimated time) | 2 h 20 min | 1 h 32 min |

| Schema Aspect | Gemini | DeepSeek |
|---|---|---|
| Number of tables created | 4 | 3 |
| Primary key | BIGSERIAL in PostgreSQL (Long in Java) | BIGSERIAL in PostgreSQL (Long in Java) |
| Attributes | Customer: email, with additional first_name and last_name; Product: name and price, with description and category omitted; Orders: correct (customerId, orderDate); OrderItems: correct (orderId, productId, quantity) | Customer: email, with additional first name, last name, phone number and createdAt; Product: name, price, description, but omitted category, stock quantity and createdAt; Orders: customerId and orderDate, with additional total amount and status; OrderItems: entirely omitted |
| Naming consistency | Used snake_case naming, but corrected it to camelCase when prompted; used snake_case in SQL queries | Used snake_case naming, but corrected it to camelCase when prompted; correctly used camelCase in SQL queries |
| Number of additional prompts | 3 | 3 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Horvat, M.; Ursić, I.; Krmpotić, K. Comparative Evaluation of Gemini and DeepSeek for LLM-Generated Code Quality and Architectural Robustness in Backend Software Engineering. Electronics 2026, 15, 2805. https://doi.org/10.3390/electronics15132805

