An Automatic Code Generation Tool Using Generative Artificial Intelligence for Element Fill-in-the-Blank Problems in a Java Programming Learning Assistant System
Abstract
1. Introduction
2. Related Works
2.1. Programming Education
2.2. Generative AI
2.3. Prompt Engineering
3. Preliminary Works
3.1. Java Programming Learning Assistant System (JPLAS)
3.2. Element Fill-in-the-Blank Problem
3.2.1. Definition of an Element
- Reserved words are predefined sequences of characters that serve specific functions, such as “private” or “public”.
- Identifiers are names defined by the programmer to represent variables, classes, or methods.
- Control symbols include punctuation marks used in the syntax, such as “.” (dot), “:” (colon), “;” (semicolon), “()” (parentheses), and “{}” (curly brackets).
- Operators are used in conditional expressions to define logical conditions, such as “<” and “&&”.
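To make the four element types concrete, the following short fragment (our own illustrative example, not taken from the paper) marks each element type in the comments:

```java
// Illustrative fragment: element types per the definitions above.
public class ElementDemo {                 // "public", "class": reserved words; "ElementDemo": identifier
    static int demo() {
        int count = 0;                     // "int": reserved word; "count": identifier; "=": operator; ";": control symbol
        if (count < 10 && count >= 0) {    // "<", "&&", ">=": operators; "()", "{}": control symbols
            count++;                       // "++": operator
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(demo());        // "." (dot): control symbol
    }
}
```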
3.2.2. Blank Element Selection Algorithm
- Vertex generation for the constraint graph: each potential blank element is selected from the source code and represented as a vertex in the constraint graph.
- Edge generation for the constraint graph: an edge is added between any two vertices that should not be blanked simultaneously to ensure uniqueness.
- Compatibility graph construction: the complement of the constraint graph is taken to create the compatibility graph, which represents pairs of elements that can be blanked together.
- Clique extraction: a maximal clique is identified using a simple greedy algorithm to select the largest possible set of blank elements with unique answers with the following steps:
- Select the vertex with the highest degree in the compatibility graph and add it to the clique.
- Remove this vertex and all vertices not adjacent to it from the graph.
- Repeat until no vertices remain.
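The greedy extraction above can be sketched as follows. The adjacency-matrix representation and the method names are our own assumptions for illustration, not the paper's implementation:

```java
import java.util.*;

public class GreedyClique {
    // adj[i][j] == true when elements i and j may be blanked together
    // (i.e., they are adjacent in the compatibility graph).
    static List<Integer> maximalClique(boolean[][] adj) {
        int n = adj.length;
        Set<Integer> candidates = new HashSet<>();
        for (int i = 0; i < n; i++) candidates.add(i);
        List<Integer> clique = new ArrayList<>();
        while (!candidates.isEmpty()) {
            // Pick the candidate with the highest degree inside the candidate set.
            int best = -1, bestDeg = -1;
            for (int v : candidates) {
                int deg = 0;
                for (int u : candidates) if (u != v && adj[v][u]) deg++;
                if (deg > bestDeg) { bestDeg = deg; best = v; }
            }
            clique.add(best);
            // Keep only vertices adjacent to the chosen one.
            final int chosen = best;
            candidates.removeIf(u -> u == chosen || !adj[chosen][u]);
        }
        return clique;
    }
}
```

On a compatibility graph where vertices 0, 1, and 2 are mutually adjacent and vertex 3 is isolated, this returns the clique {0, 1, 2}.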
3.2.3. Coding Rule Check Function
- Naming rules: These rules help identify naming errors in source code. We adopt camel case as the standard Java naming convention:
- Variables, methods, and method arguments: the first letter should be lowercase, with each subsequent word capitalized.
- Classes: the first letter of each word should be uppercase.
- Constants: all letters should be uppercase.
- Identifiers must be meaningful English words; Japanese or Romanized Japanese should not be used.
- Coding styles: Coding style rules ensure a consistent code layout by checking elements such as indentation, bracket placement, and spacing. Following these rules improves code clarity and uniformity, making it easier to read and maintain.
- Potential problems: Potential issues refer to code segments that can be compiled successfully but may introduce functional errors or bugs. These include:
- Dead code: portions of the code that are never executed.
- Overlapping code: multiple code segments with similar structures and functions.
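A small class (our own illustrative example) that satisfies the naming rules above:

```java
// Illustrative example following the naming rules described above.
public class StudentRecord {              // class: upper camel case
    static final int MAX_SCORE = 100;     // constant: all uppercase

    private int totalScore;               // variable: lower camel case

    int addScore(int newScore) {          // method and argument: lower camel case
        totalScore = Math.min(totalScore + newScore, MAX_SCORE);
        return totalScore;
    }
}
```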
3.2.4. EFP Generation Steps
- Select a source code that covers the syntactic topics to be studied.
- Apply the coding rule check function to detect and fix the issues with naming rules, coding styles, or potential problems.
- Apply JFlex and Jay [32] to tokenize the source code into a sequence of lexical units or elements, and classify each element type.
- Apply the blank element selection algorithm to choose the blank elements that have grammatically correct and unique answers.
- Upload the generated EFP instance to the JPLAS server.
4. Methodology for Proper Code Generation
4.1. Adopted Approach for AI
4.2. Code Quality Assessment
- Code accuracy;
- Code relevance to the topic;
- Code difficulty;
- Feasibility of problem generation.
4.2.1. Code Accuracy
- The first component is the score obtained from the coding rule check function, calculated as the proportion of passed checks over total checks.
- The second component is the score from the JUnit test results, calculated as the proportion of passed test cases.
- A tunable parameter adjusts the relative importance of style versus functionality; it is set here to emphasize functional correctness.
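As a sketch, the accuracy score can be computed as a weighted sum of the two proportions. The names and the value of ALPHA below are our assumptions (the paper's symbols and weight value are not reproduced here); a small ALPHA emphasizes functional correctness, as described above:

```java
public class AccuracyScore {
    // ALPHA is an illustrative assumption, not the paper's value;
    // a value below 0.5 weights the JUnit score more than the style score.
    static final double ALPHA = 0.3;

    static double accuracy(int passedChecks, int totalChecks,
                           int passedTests, int totalTests) {
        double style = (double) passedChecks / totalChecks; // coding rule check score
        double func  = (double) passedTests  / totalTests;  // JUnit test score
        return ALPHA * style + (1 - ALPHA) * func;
    }
}
```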
4.2.2. Java Learning Topics
- Primary stage: Java basic grammar learning.
- Variables and data types;
- Operators (arithmetic, logical, bitwise, assignment);
- Control flow (if-else, switch-case, for-loop, while-loop, do-while-loop, break, continue);
- Arrays (one-dimensional array, multi-dimensional array, array iteration);
- Methods (method declaration, parameters and return values, overloading);
- Classes and objects (class declaration, object instantiation, constructor, this, encapsulation).
- Intermediate stage: object-oriented and API learning.
- Inheritance (extends, super, override);
- Polymorphism (upcast, downcast);
- Abstract classes and interfaces (abstract, interface, default method);
- Inner classes (member inner class, local inner class, anonymous inner class);
- String handling (immutability, StringBuilder, StringBuffer, charAt, substring, indexOf, split);
- Exception handling (try-catch-finally, throws, throw, custom exception);
- Collections framework (List, Set, Map, Iterator);
- I/O streams (File, InputStream, OutputStream, Reader, Writer, BufferedReader, BufferedWriter).
- Advanced stage: advanced syntax programming learning.
- Generics (generic class, generic method, wildcards);
- Lambda expressions (functional interface, consumer, function, predicate);
- Stream API (stream creation, intermediate operations, terminal operations);
- Reflection;
- Multithreading (thread, runnable, synchronized, lock, executor).
4.2.3. Code Relevance to the Topic
- the number of appearances of the selected syntax elements;
- the number of appearances of all syntax elements;
- the weights of the selected syntax elements and of all syntax elements, respectively;
- the original relevance index;
- the number of syntax elements that should not appear at the current learning stage;
- the total number of syntax elements in the code;
- the penalty weight, controlling the influence of inappropriate content; it is set separately for the primary and intermediate stages.
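A possible reading of these definitions is a weighted ratio of on-topic element appearances, reduced by a penalty for out-of-stage elements. The formula and all parameter names below are our assumptions, since the original equations are not reproduced here:

```java
public class RelevanceScore {
    // Assumed form: (weighted selected appearances) / (weighted total appearances),
    // minus a penalty proportional to the share of inappropriate elements.
    static double relevance(double selCount, double allCount,
                            double selWeight, double allWeight,
                            int inappropriate, int total, double penalty) {
        double base = (selWeight * selCount) / (allWeight * allCount);
        return base - penalty * ((double) inappropriate / total);
    }
}
```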
4.2.4. Code Difficulty
- Syntax Topics with Weight 1:
- Variable;
- Access Modifier;
- Primitive Data Type;
- Wrapper Class;
- Operator;
- Control Statement;
- Array;
- Common Word;
- Code Block.
- Syntax Topics with Weight 2:
- String Functions;
- Exception;
- Package;
- I/O.
- Syntax Topics with Weight 3:
- Class;
- Interface;
- Regular Expression;
- Recursion;
- Collections Framework.
- D represents the overall code difficulty.
- S represents the number of syntax elements.
- Each syntax element carries a weight, as listed above.
- M represents the number of methods.
- L represents the total lines of code.
- I represents the depth of inheritance.
- N represents the number of nested structures.
- A weight is assigned to each of these factors; the values vary by learning stage.
- Primary stage: the weights are set to emphasize basic syntax comprehension while minimizing structural complexity.
- Intermediate stage: the weights are balanced across syntax, method usage, code length, and structural complexity as students build deeper understanding.
- Advanced stage: the weights place greater emphasis on structural complexity and advanced programming concepts.
- Rationale for weight selection: The weight distribution was determined based on an analysis of standard teaching materials and historical student learning performance. We referred to widely used introductory programming textbooks and teaching materials to identify which elements are most emphasized at different stages of learning. Additionally, we reviewed past records of student progress and performance to align the weights with the actual learning challenges encountered at different proficiency levels. While the weighting process inherently involved some subjectivity, it was grounded in established educational practices and designed to align with the cognitive progression of novice to advanced learners.
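The factors above suggest a linear combination; the sketch below assumes such a form, with all weight values purely illustrative since the stage-specific values are not reproduced here:

```java
public class DifficultyScore {
    // Assumed linear form over the factors defined above:
    // weighted syntax sum, method count, line count, inheritance depth, nesting.
    static double difficulty(double weightedSyntaxSum, int methods, int lines,
                             int inheritanceDepth, int nesting,
                             double wS, double wM, double wL,
                             double wI, double wN) {
        return wS * weightedSyntaxSum + wM * methods + wL * lines
             + wI * inheritanceDepth + wN * nesting;
    }
}
```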
4.2.5. Feasibility of Problem Generation
- B represents the number of blank elements selected for problem generation.
- T represents the total number of tokens in the source code.
4.2.6. Objective Function
- the weight for accuracy;
- the weight for feasibility;
- the weight for relevance;
- the weight for difficulty.
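The objective function combines the four scores with these weights; a weighted-sum sketch is shown below. The exact functional form and weight values are not reproduced here, so this is an assumed reading:

```java
public class ObjectiveFunction {
    // Assumed weighted-sum form of the objective function F over the four
    // criteria: accuracy, feasibility, relevance, and difficulty.
    static double score(double accuracy, double feasibility,
                        double relevance, double difficulty,
                        double wA, double wF, double wR, double wD) {
        return wA * accuracy + wF * feasibility + wR * relevance + wD * difficulty;
    }
}
```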
4.3. Generative AI
AI Model Introduction
4.4. Prompt Engineering
- Clarity: prompts should be clear, specific, and avoid ambiguity.
- Context: providing relevant background information improves accuracy.
- Instruction-based: the model is directly instructed on what to do, such as “list the steps” or “summarize briefly.”
- Examples (few-shot learning): providing examples helps guide AI to generate responses in the expected format.
- Constraints: setting limits on word count, format, or style ensures the output meets specific requirements.
- Temperature and top-p: the temperature parameter controls the randomness of the output (lower values result in more deterministic responses), while the top-p parameter governs nucleus sampling, a technique that introduces controlled randomness into the model’s output.
- Zero-shot and few-shot: Zero-shot and few-shot are two key prompting techniques used in LLMs to control how they generate responses. Zero-shot means that the model is given a prompt without any example and is expected to generate a correct response based on its pre-trained knowledge. It relies entirely on the AI’s pre-existing understanding of language and concepts. Few-shot, on the other hand, means that the model is provided with a few examples in the prompt before being asked to generate a response. The examples help the model understand the pattern and produce more accurate results. Example:
- Zero-shot: “What is the capital of France?” The model will respond, “Paris.”
- Few-shot: “The capital of Italy is Rome. The capital of Germany is Berlin. What is the capital of France?” The model will respond, “Paris.”
- Chain-of-thought (CoT): CoT prompting is a technique used in LLMs to improve reasoning and problem solving by encouraging the model to break down its thought process step by step. Instead of directly generating an answer, the model explains its reasoning before reaching a conclusion. By mimicking human thought processes, CoT prompting helps AI produce more accurate and explainable responses. Example: when asked, “What is 37 multiplied by 6?”, the model might respond with: “First, I multiply 30 by 6, which gives me 180. Then, I multiply 7 by 6, which gives me 42. Finally, I add 180 and 42, which gives me 222.” This step-by-step breakdown demonstrates reasoning before arriving at the final answer.
- Black-box prompt optimization (BPO): BPO is a technique for improving prompts used in LLMs without knowing the internal workings of the model. Since the LLM is treated as a “black box”, the optimization is achieved by evaluating different prompts based on output performance rather than analyzing the model’s structure or gradients. It presents a practical way of refining AI interactions without accessing the model’s internal mechanisms. Example:
- Initial prompt: “Tell me the benefits of regular exercise.”
- Revised prompt: “What are the top three benefits of daily physical activity for mental health?”
The model’s response to the initial prompt might not fully meet the expectations in terms of detail or relevance. By evaluating the response, the prompt is refined to focus more specifically on the mental health benefits. This optimization continues iteratively based on the quality of the generated responses to each version of the prompt.
5. Experiments
5.1. Prompt Strategies
- Zero-shot:
- Input:
- Please help me generate a piece of code using java, the subject is: for loop.
- Chain-of-thought (CoT):
- Input:
- I need to generate a piece of Java code related to the topic: for loop. Before generating the code, let us break down the process step by step:
  1. Understanding the topic:
     - The subject is “For Loop”, so the generated code must showcase a for loop in a meaningful way.
     - The code should be correct and syntactically valid in Java.
  2. Evaluating difficulty and structure:
     - The code should not be too simple (e.g., just printing numbers in a loop) but also not too complex.
     - It should demonstrate a common programming pattern using a for loop.
  3. Ensuring questionability:
     - The code should be suitable for generating programming exercises.
     - It should have elements that can be modified or extended for students to practice.
  4. Code generation:
     - Now, generate a Java program that meets these requirements.
     - Ensure the code compiles correctly.
- Black-box prompt optimization (BPO): Figure 1 shows a flow chart of BPO processing.
- Input: Your task is to generate a Java program related to the topic: for loop. After generating the code, the program calls the objective function F to evaluate it. It systematically checks whether the code meets the four evaluation criteria. If any criterion is not met, the program provides feedback along with the previously generated code, guiding the AI to refine and improve it. Below are some example prompts for guiding the AI in making targeted modifications.
  - Possible input: “The above is the for loop code you just generated. I noticed that the class name does not follow the required naming convention. Please correct it.”
  - Possible input: “The above is the for loop code you just generated. I noticed that it is not well-suited for fill-in-the-blank questions. Please revise it accordingly.”
  - Possible input: “The above is the for loop code you just generated. I think the difficulty is a bit too high for beginners. Please simplify it.”
  - Possible input: “The above is the for loop code you just generated. I think it contains some advanced concepts beyond the intended scope. Please remove them.”
  - Possible input: “The above is the for loop code you just generated. I think it is not closely aligned with the topic. Please revise it to better fit the theme.”
5.2. Experiment Design
5.2.1. Objective
5.2.2. Variables
- AI Models: ChatGPT-4o, DeepSeek-R1-7B, Llama3.2-1B.
- Prompt strategies:
  - Zero-shot: direct generation without additional reasoning or optimization.
  - CoT (chain of thought): applying step-by-step reasoning during generation.
  - BPO (black-box prompt optimization): performing up to 3 optimization calls per generation.
  - CoT + BPO: first applying CoT reasoning, then optimizing with BPO (maximum 3 calls).
- Difficulty levels: primary, intermediate, advanced.
5.2.3. Procedure
- For each combination of an AI model, a prompt strategy, and a topic difficulty level, generate outputs.
- For BPO-related strategies, continue optimization until either
- The score does not improve compared to the previous attempt; or
- The maximum limit of 3 calls is reached.
- Repeat each experimental condition 20 times independently to obtain statistically reliable results.
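The BPO stopping rules above (stop when the score does not improve, or when the call limit is reached) can be sketched as a loop. The generate, evaluate, and refine functions are placeholders standing in for the AI call, the objective function F, and the feedback prompt:

```java
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

public class BpoLoop {
    // Sketch of the BPO iteration with the two stopping rules described above.
    // `generate`, `evaluate`, and `refine` are placeholders, not real APIs.
    static double optimize(Supplier<String> generate,
                           Function<String, Double> evaluate,
                           UnaryOperator<String> refine, int maxCalls) {
        String code = generate.get();
        double best = evaluate.apply(code);
        for (int call = 0; call < maxCalls; call++) {
            String revised = refine.apply(code);
            double score = evaluate.apply(revised);
            if (score <= best) break;   // stop: no improvement over previous attempt
            best = score;
            code = revised;
        }
        return best;
    }
}
```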
5.2.4. Evaluation Metrics
5.2.5. Statistical Analysis
- Calculate the average score and the average number of BPO calls across the 20 trials.
- Use these averages as the final performance indicators for comparison among the different strategies and models.
5.3. Result and Analysis
Summary
- For top-tier models (e.g., ChatGPT-4o), prompt strategy has a limited influence on simple tasks but becomes critical in complex tasks.
- For mid-range models (e.g., DeepSeek-R1-7B), combining CoT and BPO yields the best performance across all difficulty levels.
- For lightweight models (e.g., Llama3.2-1B), BPO is necessary but insufficient to fully close the performance gap.
- BPO iterations should be carefully managed: unlimited iterations can significantly increase computation cost with diminishing returns, especially for small models.
5.4. Student Participation Testing and Evaluation
- EFP with source code manually selected by instructors;
- EFP with source code generated by different AI models with the highest scores from previous experiments.
- Difficulty: manually selected exercises were rated slightly higher (Cohen’s d), indicating a potentially noticeable difference in perceived difficulty.
- Correctness: both groups assigned identical average scores with overlapping confidence intervals, suggesting no meaningful difference (Cohen’s d).
- Topic relevance: AI-generated exercises performed slightly better (Cohen’s d), reflecting a moderate positive effect.
- Helpfulness: manually selected exercises had a moderate advantage (Cohen’s d).
5.5. Functional Testing and Evaluation
6. Application to Element Fill-in-the-Blank Problem Creation
6.1. Adopted Open Source Software
- Spring Boot: Spring Boot is an open-source framework used to simplify the development of Java-based applications. It provides a set of conventions and tools for building production-ready, stand-alone, and microservice-based applications. Spring Boot allows developers to focus on business logic, while it handles the setup, configuration, and dependencies of the application. It includes embedded servers like Tomcat, which means developers can run Spring Boot applications directly without needing external servers.
- jQuery: jQuery is a fast, lightweight, and feature-rich JavaScript library. It simplifies HTML document traversal and manipulation, event handling, and animation, making it easier to work with JavaScript. jQuery provides an intuitive syntax for tasks like DOM manipulation, Ajax requests, and cross-browser compatibility. Widely used in web development, jQuery allows developers to create interactive and dynamic websites quickly and efficiently.
- Ollama: Ollama is a platform designed for deploying and running LLMs locally on personal computers. It enables developers to utilize LLMs in a wide range of applications without relying on cloud-based solutions, ensuring better data privacy and control over the models. Ollama supports various models, including Llama3, and allows for integration with AI-driven systems in multiple industries, including education, healthcare, and more.
- Docker: Docker is a platform that allows developers to package applications and their dependencies into containers, ensuring that the application works seamlessly across different environments. Containers are lightweight, portable, and consistent, which makes deploying and managing applications much easier. Docker simplifies software distribution, improves scalability, and provides isolation, making it an essential tool for modern DevOps practices and microservice architectures.
6.2. Software Architecture
- Frontend (jQuery) handles the user interactions, such as selecting a topic, modifying an AI-generated code, and managing problems.
- Backend (Spring Boot) manages API requests, processes user inputs, communicates with the AI model, and provides the necessary logic for evaluations and refinements.
- AI component (Ollama) generates a Java source code based on the selected topic and refines it according to user instructions.
- Containerization (Docker) packages the entire system into a Docker container, allowing for the easy deployment and installation with a single command across different environments.
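The backend-to-AI call can be sketched with the JDK's built-in HTTP client instead of the full Spring Boot stack. The endpoint and request shape follow Ollama's local REST API (POST to /api/generate on port 11434 with model, prompt, and stream fields); the model name and helper names are illustrative assumptions:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OllamaClient {
    // Builds the JSON body for Ollama's /api/generate endpoint.
    // (Naive string concatenation; a real backend would use a JSON library.)
    static String buildRequestBody(String model, String prompt) {
        return "{\"model\":\"" + model + "\",\"prompt\":\"" + prompt
             + "\",\"stream\":false}";
    }

    // Sends a generation request to a locally running Ollama instance.
    static String generate(String model, String prompt) throws Exception {
        HttpRequest req = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:11434/api/generate"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(buildRequestBody(model, prompt)))
            .build();
        return HttpClient.newHttpClient()
            .send(req, HttpResponse.BodyHandlers.ofString())
            .body();
    }
}
```

The generate method requires a running Ollama server; buildRequestBody can be exercised on its own.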
6.3. Functional Overview
- Code generation based on topic selection:
- The user selects a Java programming topic.
- The back end generates the optimal prompt for the selected topic and calls Ollama AI to generate a Java code.
- AI-guided code modification:
- The user can provide specific modification requirements, such as simplifying the code or aligning it better with a given topic.
- The system sends the existing code along with the modification instructions to Ollama, which generates a refined version.
- This iterative process helps improve the quality and suitability of the generated code.
- Automated question generation:
- Once a valid Java code is generated, the system automatically creates an EFP instance using the blank element selection algorithm.
- Problem downloading:
- The user can download the generated source code and EFP instance for future use.
7. Conclusions
- Cross-model adaptability: we will investigate the applicability of the optimal prompt strategy across different AI models to develop a more general and robust optimization method.
- Multi-language code generation: we will explore the effectiveness of this approach in other programming languages, such as Python and C++, to further validate its applicability.
- Large-scale user testing: we will deploy the system in real educational settings, collect student usage data, and analyze the impacts of different prompt strategies on learning outcomes to further refine AI-assisted programming education.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Netguru. Is Java Still Used? Current Trends and Market Demand in 2025. Available online: https://www.netguru.com/blog/is-java-still-used-in-2025 (accessed on 18 February 2025).
2. Ishihara, N.; Funabiki, N.; Kuribayashi, M.; Kao, W.-C. A software architecture for Java programming learning assistant system. Int. J. Comput. Softw. Eng. 2017, 2, 116.
3. Aung, S.T.; Funabiki, N.; Syaifudin, Y.W.; Kyaw, H.H.S.; Aung, S.L.; Dim, N.K.; Kao, W.-C. A Proposal of Grammar-Concept Understanding Problem in Java Programming Learning Assistant System. J. Adv. Inf. Technol. 2021, 12, 342–350.
4. Funabiki, N.; Zaw, K.K.; Ishihara, N.; Kao, W.C. A Graph-Based Blank Element Selection Algorithm for Fill-in-Blank Problems in Java Programming Learning Assistant System. IAENG Int. J. Comput. Sci. 2017, 44, 247–260.
5. OpenAI. ChatGPT. Available online: https://openai.com/index/chatgpt/ (accessed on 15 March 2025).
6. Chen, B.; Zhang, Z.; Langrené, N.; Zhu, S. Unleashing the Potential of Prompt Engineering in Large Language Models: A Comprehensive Review. arXiv 2023, arXiv:2310.14735.
7. OpenAI. GPT-4o: OpenAI’s Newest Model. Available online: https://openai.com/index/hello-gpt-4o/ (accessed on 15 March 2025).
8. DeepSeek. DeepSeek-R1 Model on Ollama. Available online: https://ollama.com/library/deepseek-r1 (accessed on 15 March 2025).
9. Meta. LLaMA 3.2 Model on Ollama. Available online: https://ollama.com/library/llama3.2 (accessed on 15 March 2025).
10. Spring. Spring Boot. Available online: https://spring.io/projects/spring-boot (accessed on 15 March 2025).
11. jQuery. jQuery: The Write Less, Do More, JavaScript Library. Available online: https://jquery.com/ (accessed on 15 March 2025).
12. Ollama. Ollama: Run AI Models Locally. Available online: https://ollama.com/ (accessed on 15 March 2025).
13. Docker. Docker: Empowering Developers to Build, Share, and Run Applications. Available online: https://www.docker.com/ (accessed on 15 March 2025).
14. McGill, T.; Volet, S. A Conceptual Framework for Analyzing Students’ Knowledge of Programming. J. Res. Comput. Educ. 1997, 29, 276–297.
15. Altadmri, A.; Brown, N.C. 37 Million Compilations: Investigating Novice Programming Mistakes in Large-Scale Student Data. In Proceedings of the 46th ACM Technical Symposium on Computer Science Education, Kansas City, MO, USA, 4–7 March 2015; pp. 522–527.
16. Gomes, A.; Mendes, A.J. An Environment to Improve Programming Education. In Proceedings of the 2007 International Conference on Computer Systems and Technologies, Rousse, Bulgaria, 14–15 June 2007; Volume 88, pp. 1–6.
17. Sorva, J.; Karavirta, V.; Malmi, L. A Review of Generic Program Visualization Systems for Introductory Programming Education. ACM Trans. Comput. Educ. 2013, 13, 15.
18. Medeiros, R.P.; Ramalho, G.L.; Falcão, T.P. A Systematic Literature Review on Teaching and Learning Introductory Programming in Higher Education. IEEE Trans. Educ. 2019, 62, 77–90.
19. Lindberg, R.S.; Laine, T.H.; Haaranen, L. Gamifying Programming Education in K–12: A Review of Programming Curricula in Seven Countries and Programming Games. Br. J. Educ. Technol. 2019, 50, 1979–1995.
20. Olsson, M.; Mozelius, P.; Collin, J. Visualisation and Gamification of E-Learning and Programming Education. Electron. J. E-Learn. 2015, 13, 452–465.
21. Luckin, R.; Holmes, W. Intelligence Unleashed: An Argument for AI in Education; Pearson: London, UK, 2016.
22. Chen, L.; Chen, P.; Lin, Z. Artificial Intelligence in Education: A Review. IEEE Access 2020, 8, 75264–75278.
23. Baidoo-Anu, D.; Ansah, L.O. Education in the Era of Generative Artificial Intelligence (AI): Understanding the Potential Benefits of ChatGPT in Promoting Teaching and Learning. J. AI 2023, 7, 52–62.
24. Coding Rooms. Coding Rooms: Developer Training & Enablement Platform. Available online: https://www.codingrooms.com/ (accessed on 15 March 2025).
25. Khan Academy. Khan Academy: For Every Student, Every Classroom. Real Results. Available online: https://www.khanacademy.org/ (accessed on 15 March 2025).
26. Khanmigo. Khanmigo: An AI-Powered Tutor and Teaching Assistant. Available online: https://www.khanmigo.ai/ (accessed on 15 March 2025).
27. GitHub Copilot. GitHub Copilot: Your AI Pair Programmer. Available online: https://github.com/features/copilot (accessed on 15 March 2025).
28. Cheng, J.; Liu, X.; Zheng, K.; Ke, P.; Wang, H.; Dong, Y.; Huang, M. Black-Box Prompt Optimization: Aligning Large Language Models Without Model Training. arXiv 2023, arXiv:2311.04155.
29. Agarwal, E.; Singh, J.; Dani, V.; Magazine, R.; Ganu, T.; Nambi, A. PromptWizard: Task-Aware Prompt Optimization Framework. arXiv 2024, arXiv:2405.18369.
30. Fernando, C.; Banarse, D.; Michalewski, H.; Osindero, S.; Rocktäschel, T. PromptBreeder: Self-Referential Self-Improvement via Prompt Evolution. arXiv 2023, arXiv:2309.16797.
31. Wint, S.S.; Funabiki, N. A proposal of recommendation function for element fill-in-blank problems in Java programming learning assistant system. Int. J. Web Inf. Syst. 2021, 17, 140–152.
32. JFlex. JFlex: A Lexical Analyzer Generator for Java. Available online: https://www.jflex.de/ (accessed on 18 February 2025).
33. JUnit. JUnit. Available online: https://github.com/junit-team/junit5/ (accessed on 18 February 2025).
34. EvoSuite. EvoSuite: Automated Test Suite Generation for Java. Available online: https://www.evosuite.org/ (accessed on 27 April 2025).
35. Niemeyer, P.; Knudsen, J. Learning Java; O’Reilly Media, Inc.: Newton, MA, USA, 2005.
36. JavaParser. JavaParser: The Most Popular Parser for the Java Language. Available online: https://javaparser.org (accessed on 18 February 2025).
37. Cao, Y.; Li, S.; Liu, Y.; Yan, Z.; Dai, Y.; Yu, P.S.; Sun, L. A comprehensive survey of AI-generated content (AIGC): A history of generative AI from GAN to ChatGPT. arXiv 2023, arXiv:2303.04226.
38. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008.
39. React. React. Available online: https://reactjs.org/ (accessed on 18 February 2025).
40. Vue. Vue.js. Available online: https://vuejs.org/ (accessed on 18 February 2025).
Prompt Strategy | Difficulty | Avg. Score | BPO Calls |
---|---|---|---|
Zero-shot | Primary | 79.6 | - |
Zero-shot | Intermediate | 75.1 | - |
Zero-shot | Advanced | 65.6 | - |
CoT | Primary | 79.4 | - |
CoT | Intermediate | 78.6 | - |
CoT | Advanced | 66.3 | - |
BPO | Primary | 83.8 | 0.8 |
BPO | Intermediate | 79.5 | 1.5 |
BPO | Advanced | 71.9 | 2.5 |
CoT+BPO | Primary | 82.3 | 0.6 |
CoT+BPO | Intermediate | 81.4 | 1.6 |
CoT+BPO | Advanced | 73.5 | 2.3 |
Prompt Strategy | Difficulty | Avg. Score | BPO Calls |
---|---|---|---|
Zero-shot | Primary | 73.3 | - |
Zero-shot | Intermediate | 63.7 | - |
Zero-shot | Advanced | 62.1 | - |
CoT | Primary | 75.6 | - |
CoT | Intermediate | 64.2 | - |
CoT | Advanced | 63.9 | - |
BPO | Primary | 76.1 | 1.8 |
BPO | Intermediate | 66.3 | 1.9 |
BPO | Advanced | 64.7 | 3 |
CoT+BPO | Primary | 74.9 | 1.5 |
CoT+BPO | Intermediate | 67.6 | 1.6 |
CoT+BPO | Advanced | 65.8 | 2.9 |
Prompt Strategy | Difficulty | Avg. Score | BPO Calls |
---|---|---|---|
Zero-shot | Primary | 70.7 | - |
Zero-shot | Intermediate | 57.4 | - |
Zero-shot | Advanced | 44.3 | - |
CoT | Primary | 71.5 | - |
CoT | Intermediate | 59.5 | - |
CoT | Advanced | 51.8 | - |
BPO | Primary | 72.2 | 2.2 |
BPO | Intermediate | 61.5 | 3 |
BPO | Advanced | 55.8 | 3 |
CoT+BPO | Primary | 71.3 | 1.9 |
CoT+BPO | Intermediate | 61.8 | 3 |
CoT+BPO | Advanced | 58.3 | 3 |
AI Model | Task Difficulty | Score and Iterations |
---|---|---|
ChatGPT-4o | Primary | Score: 82.6, Iterations: 0.7 |
ChatGPT-4o | Intermediate | Score: 79.3, Iterations: 1.5 |
ChatGPT-4o | Advanced | Score: 72.5, Iterations: 2.3 |
DeepSeek-R1-7B | Primary | Score: 75.8, Iterations: 1.6 |
DeepSeek-R1-7B | Intermediate | Score: 66.7, Iterations: 1.8 |
DeepSeek-R1-7B | Advanced | Score: 66.9, Iterations: 3.8 |
Llama3.2-1B | Primary | Score: 71.2, Iterations: 1.9 |
Llama3.2-1B | Intermediate | Score: 65.3, Iterations: 4.3 |
Llama3.2-1B | Advanced | Score: 62.8, Iterations: 6.7 |
Evaluation Aspect | AI-Generated Exercises | Manually Selected Exercises |
---|---|---|
Difficulty | 4.00 [4.00, 4.00] | 4.25 [3.90, 4.60] |
Correctness | 4.75 [4.40, 5.10] | 4.75 [4.40, 5.10] |
Topic Relevance | 4.50 [4.00, 5.00] | 4.25 [3.90, 4.60] |
Helpfulness | 4.00 [3.50, 4.50] | 4.25 [3.90, 4.60] |
Java Topic | CPU Time (s) | Lines of Code | BPO Iterations | EFP Blanks | Difficulty Score |
---|---|---|---|---|---|
Variable | 2.2 | 16 | 0 | 5 | 19 |
Control Statement | 3.4 | 22 | 1 | 8 | 32 |
Class | 4.3 | 35 | 2 | 7 | 35 |
Exception Handling | 5.8 | 40 | 2 | 9 | 60 |
Operators | 2.8 | 32 | 0 | 7 | 40 |
Collections Framework | 6.1 | 37 | 2 | 10 | 56 |
I/O Operations | 5.5 | 20 | 2 | 6 | 42 |
Arrays | 3.8 | 22 | 1 | 5 | 30 |
String Manipulation | 4.5 | 30 | 2 | 8 | 48 |
Interface | 5.1 | 44 | 2 | 13 | 70 |
Zhu, Z.; Funabiki, N.; Mentari, M.; Aung, S.T.; Kao, W.-C.; Lee, Y.-F. An Automatic Code Generation Tool Using Generative Artificial Intelligence for Element Fill-in-the-Blank Problems in a Java Programming Learning Assistant System. Electronics 2025, 14, 2261. https://doi.org/10.3390/electronics14112261