Article

From Manual to AI-Driven: Methods for Generating Mathematics and Programming Exercises in Interactive Educational Platforms

1 Łukasiewicz Research Network—Institute of Artificial Intelligence and Cybersecurity, Leopolda 31, 40-189 Katowice, Poland
2 Faculty of Applied Mathematics, Silesian University of Technology, Kaszubska 23, 44-100 Gliwice, Poland
3 Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
* Author to whom correspondence should be addressed.
Appl. Syst. Innov. 2025, 8(6), 174; https://doi.org/10.3390/asi8060174
Submission received: 3 October 2025 / Revised: 5 November 2025 / Accepted: 12 November 2025 / Published: 18 November 2025
(This article belongs to the Special Issue AI-Driven Educational Technologies: Systems and Applications)

Abstract

The paper presents methods of applying AI to generate mathematical and programming exercises for the purpose of creating courses on an educational platform. Various challenges and advantages are highlighted and discussed in the context of a new interactive platform—Compass. The proposed learning methods based on user–platform interaction are described, along with the results of evaluations conducted among university students who learned with Compass.

1. Introduction

Contemporary education at secondary and higher levels requires supplementing traditional methods with modern, engaging, and student-friendly solutions. These solutions should integrate and organize knowledge and adapt the educational process to individual needs. In mathematics and programming education, students’ independent work is crucial. Increasing difficulties in knowledge acquisition [1] indicate the need to create didactic tools that provide broad access to educational materials, enable effective learning, and encourage interactions that deepen understanding of the material being studied. Education based on Internet technologies allows for greater flexibility and individualization of the teaching process, while also offering the opportunity to overcome time and social barriers to access high-quality education [2,3].
The aim of this work is to present methods for constructing solutions to mathematics and programming exercises for the remote education process. The process is developed and described within the framework of the new Compass educational platform (https://compass-edu.pl, accessed on 11 November 2025). A cognitive approach, focusing on understanding concepts and repeating exercises [4,5,6,7,8], was applied to improve educational practice and identify optimal learning methods. This approach stimulates three key mental activities to perform mathematical and programming tasks: thinking, remembering, and learning.
The paper presents an effective strategy for university-level education in mathematics and programming. This scenario-based approach integrates expert-guided interactive exercises, generated exercises supported by artificial intelligence (AI), theoretical components, and knowledge tests. The innovative interactive exercises support solving mathematical problems in multiple ways, offering contextual hints and error analysis based on user–platform interaction, stimulating students’ independent thinking. For selected exercises, AI generates new variations. The effectiveness of this learning support has been confirmed by exam results and student feedback collected through surveys.
We also propose a systematic method for interactive generation of programming exercises using large language models (LLMs). In contrast to previous studies [9,10,11,12], which prompted the models to generate exercises characterized by broad concepts such as “conditionals” or “loops”, our approach focuses on creating exercises closely related to specific units of a programming course. For that purpose, we leverage the potential of one-shot learning using preexisting human-written exercises as examples for the model. We ensure the relevance, difficulty, and quality of the generated problems through the human-in-the-loop strategy: the output of the model is reviewed by the instructor, who can accept the result or provide feedback to the model. Thus, we utilize the conversational capabilities of the LLM to iteratively improve its output.
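The one-shot, human-in-the-loop review loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's actual implementation: `call_llm` is a hypothetical stand-in for whatever chat-completion API is used, stubbed here with a canned reply so the sketch is self-contained and runnable.

```python
from typing import Callable, Optional

def call_llm(messages: list[dict]) -> str:
    """Hypothetical stand-in for any chat-completion API. Here it returns
    a canned draft whose revision number grows with each feedback round,
    purely so the sketch can be executed."""
    revision = sum(1 for m in messages if m["role"] == "assistant")
    return f"Generated exercise (revision {revision})"

def generate_exercise(example: str, unit: str,
                      review: Callable[[str], Optional[str]],
                      max_rounds: int = 5) -> str:
    """One-shot prompting plus human-in-the-loop refinement: the prompt
    embeds a preexisting human-written exercise from the course unit;
    review() either accepts a draft (returns None) or returns instructor
    feedback, which is appended to the conversation for the next attempt."""
    messages = [
        {"role": "system", "content": "You write programming exercises."},
        {"role": "user", "content":
            f"Here is an exercise from the unit '{unit}':\n{example}\n"
            f"Write a new exercise of similar scope and difficulty."},
    ]
    draft = call_llm(messages)
    for _ in range(max_rounds):
        feedback = review(draft)            # instructor reviews the draft
        if feedback is None:                # accepted as-is
            return draft
        messages += [{"role": "assistant", "content": draft},
                     {"role": "user", "content": feedback}]
        draft = call_llm(messages)          # model revises its output
    return draft
```

The key design point is that the full conversation, including rejected drafts and instructor feedback, is retained between rounds, so the model's conversational capabilities are used to refine rather than regenerate from scratch.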
In this study, we address two research questions. First, we investigate the possibility of automating the generation of exercise content and corresponding solutions using AI (RQ1). Second, we examine whether the proposed e-learning approach serves as an effective tool for improving mathematics education among university students (RQ2).
This study’s unique contributions are an effective strategy for university-level mathematics and programming education, and a new method for the interactive generation of programming exercises in which an LLM is prompted with preexisting human-written exercises tied to specific course units, rather than with broad programming concepts such as “conditionals” or “loops”.
The paper is organized as follows. Section 2 provides an overview of the existing interactive educational platforms for teaching mathematics and programming, including a comparative analysis of their features and functionalities. Section 3 reviews related work on the automatic generation of educational exercises. Section 4 describes in detail the structure and content of the Compass platform, encompassing both mathematics and programming courses, along with their respective interactive and generative components. Section 5 presents the evaluation of the generated programming exercises, while Section 6 discusses the results of the effectiveness assessment based on student performance and feedback. Finally, Section 7 concludes the paper and outlines the directions for future work.

2. Landscape of Interactive Educational Platforms for Mathematics and Programming

Over the past two decades, there has been a dynamic development of tools that support the educational process, including the teaching of mathematics and programming, at different levels of education, and across a wide range of topics. Numerous educational platforms have emerged that not only facilitate the understanding of theoretical concepts but also support the practical application of the acquired knowledge.

2.1. Educational Platforms for Mathematics

In the context of secondary and academic mathematics education, many of the currently available online educational platforms offer video lectures (e.g., Khan Academy (https://khanacademy.org, accessed on 22 September 2025), MITx Courses (https://edx.org/school/mitx, accessed on 20 September 2025), Coursera (https://coursera.org, accessed on 20 September 2025), OZE PWr (https://oze.pwr.edu.pl/, accessed on 21 September 2025)) as well as various text materials (e.g., Ważniak (https://wazniak.mimuw.edu.pl, accessed on 20 September 2025), ForMath (https://4math.ms.polsl.pl, accessed on 22 September 2025), Paul’s Online Notes (https://tutorial.math.lamar.edu, accessed on 21 September 2025)). Users also have access to exercises with complete solutions (Paul’s Online Notes, Obliczone.pl (https://obliczone.pl, accessed on 22 September 2025)), and some platforms offer additional hints and partial solutions on demand (e.g., Paul’s Online Notes, Ważniak). In addition, interactive exercises tailored to individual learning needs are available on platforms such as Brilliant (https://brilliant.org, accessed on 21 September 2025), Khan Academy, MITx Courses, and ForMath. An added benefit is the availability of tests and quizzes to assess the mastery of the material, allowing learners to track their progress over time and identify areas that may require further practice (e.g., Khan Academy, MITx Courses, ForMath). Finally, some of these platforms use AI algorithms to personalize the learning process by assessing users’ skills and adjusting the difficulty of exercises to match their individual abilities (e.g., Khan Academy, Brilliant).
In analyzing the platforms that are functionally closest to the Compass system, it is worth considering Khan Academy [13], MITx Courses [14], and ForMath [15]. The first two offer courses in a variety of fields, including mathematics, computer science, and science at levels ranging from elementary to academic. ForMath, on the other hand, focuses specifically on mathematical analysis at the technical university level.
Khan Academy courses are available for free, while MITx on edX offers some materials for free, with full access requiring a subscription. ForMath, on the other hand, is available exclusively to students at the Silesian University of Technology in Gliwice, Poland. The language of instruction for MITx on edX is English, while Khan Academy primarily offers content in English, but also offers courses in other languages, including Polish.
In terms of presentation of theoretical content, Khan Academy and MITx on edX provide video lectures and text materials. Some Khan Academy courses are enhanced with animations and interactive components such as graphs. Both platforms integrate theoretical content with examples and exercises for independent practice. Answer-checking mechanisms include multiple-choice selections, input fields for numerical results, and interactive manipulation of elements with immediate feedback from the system. If the answer is correct, users can review the solution; if it is incorrect, the system signals an error and offers either full solutions or detailed hints (as in Khan Academy) or general guidance with the option to try again (as in MITx on edX). Full solutions on MITx on edX are only available in the paid version. Both platforms offer quizzes and tests to assess mastery of the material. Khan Academy uses AI to adaptively adjust the difficulty of problems and suggest additional practice, providing significant support for personalized learning.
ForMath (4Math) differs from the previously discussed platforms mainly in its “check, correct, and learn” approach in the problem-solving module. Its goal is to teach step-by-step, checking the correctness of each step of the solution, rather than just the final result, as in other popular systems.
The platform consists of two main components: theoretical materials in the form of e-books and an interactive problem-solving module. The core of the platform is the problem-solving section where, after reviewing the task, users can choose between two options.
In the first option, for those who have solved the problem independently, the user enters the result manually or selects it from a list of available choices. If the answer is incorrect, the system provides immediate feedback with a comment that analyzes the potential error. This allows the user to correct the answer independently. If the answer is correct, it is recommended to review the step-by-step solution using the second option.
The second option, for those who need help, guides the user through successive steps of the solution, each step requiring independent calculation or decision making, followed by selection of the appropriate option from a list or entry of the result into a designated field. Each step provides access to theoretical content in the form of hints and the ability to jump directly to the relevant e-book chapter. Through error analysis, hints, and well-structured comments, the platform maximizes user participation in problem solving and promotes the development of analytical and mathematical skills.
The Compass platform has been designed on the basis of years of experience in teaching mathematics and the knowledge gained by its creators during the development and use of the ForMath platform. It incorporates the benefits of ForMath while introducing a new framework and course components. A detailed description of the Compass platform can be found later in this article.

2.2. Educational Platforms for Programming

The demand for programming skills, in both educational and professional environments, has contributed to the large number of programming courses available online [16]. Historically, a large share of them has been developed, curated, and hosted by educational institutions. Since the 1990s, such courses have frequently offered innovative features, such as an online compiler, automated solution assessment, and feedback [17,18,19,20]. Following the recent developments in LLMs, there has been much interest in applying AI technology to the teaching of computer programming. The literature describes several custom learning applications in which AI was used to explain code [21], present step-by-step solutions [22], assist the learner [23], and generate exercises [12,24].
In the 2010s, massive open online course (MOOC) platforms, such as Coursera or edX, gained considerable popularity as versatile tools for providing educational content, including computer programming courses [16]. In the past, it was observed that these platforms were insufficient for teaching programming because they lacked features such as an online editor and compiler [16,19]. However, as the popularity of online education grew, the capabilities of online platforms also expanded. In this section, we present a brief overview of some popular online platforms that offer programming courses, with a focus on interactive features and the application of AI.
The general-purpose platform Coursera (https://www.coursera.org/, accessed on 11 November 2025), in addition to the usual educational content, such as videos and quizzes, also offers features designed specifically for programming courses, such as coding in the browser. Coursera Labs offers three types of workspace (Jupyter Notebook, RStudio, or Visual Studio Code) for writing and running programming exercises. The workspaces support assessment of the learner’s coding ability via Graded Labs, as well as a Lab Sandbox for private practice. In 2023, Coursera introduced an AI assistant named Coursera Coach, designed to provide personal guidance. Learners can use the assistant to ask for clarification of the course material or engage in practice sessions with interactive questions and feedback. In addition, AI can be used to design immersive, dialogue-like course activities and to grade assignments. Coursera also supports course creators through Course Builder, which uses an AI assistant to help instructors set up their courses. The tool generates a course structure based on the author’s input and offers step-by-step guidance throughout the process.
Udemy (https://www.udemy.com/, accessed on 26 September 2025) is another general-purpose learning platform that supports teaching programming. Similarly to Coursera, Udemy courses include coding exercises and a built-in browser code editor where learners can write code, execute it, and run tests to determine whether their solution is correct. In 2023, Udemy announced a set of AI features dedicated to both learners and course authors. The Udemy AI Assistant provides real-time guidance for learners, such as summarizing course content, answering questions, providing step-by-step explanations, and finding new courses. In particular, the assistant also offers help with programming exercises by generating code examples and assisting with debugging. The platform also offers AI assistance for instructors who create programming activities. Based on the task description typed by the instructor, the assistant can generate starter code, a complete solution, and evaluation code. The instructor can then adjust the generated content according to their judgment.
Codecademy (https://www.codecademy.com/, accessed on 26 September 2025) specializes in information technology courses, such as programming, data science, and cybersecurity. Like the platforms mentioned above, it offers an interactive learning environment with code editor, which has been recently enhanced by the AI Learning Assistant. Similarly to the other solutions, the assistant can explain course content and the code, debug the learner’s code, provide hints, and answer follow-up questions. The assistant is closely integrated with the environment and highly contextual, which allows it to generate relevant answers without the need for much prompting on the learner’s part.
This brief overview of commercial online learning platforms reveals that the most popular applications of AI are conversational assistants providing personalized help to the learner in the form of explanations, hints, bug fixes, and general support related to the learner’s educational path. Fewer platforms offer AI-backed tools for instructors, especially in the context of computer programming, with the notable exception of Udemy. Moreover, the use of LLM-based tools for task generation is mostly limited to smaller-scale university solutions [12,24].

3. Related Work on Methods for Generating Mathematical and Programming Exercises

Creating a large number of programming or mathematical exercises can be challenging for teachers; therefore, there have been attempts to automate the process. The early solutions were based on templates filled with different sets of data [25,26]. The motivation behind this approach was usually to prevent students from copying solutions from each other [25] rather than to provide a richer set of exercises for a single user.
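The early template-based approach can be illustrated with a short sketch: a fixed exercise template whose blanks are filled from predefined data sets, so each student receives a different variant of the same problem. The template and data below are invented for illustration only.

```python
import random

# Hypothetical template and data sets, purely illustrative.
TEMPLATE = "Write a program that reads {n} integers and prints their {stat}."
DATA = {"n": [5, 10, 20], "stat": ["sum", "average", "maximum"]}

def instantiate(template: str, data: dict, rng: random.Random) -> str:
    """Fill each placeholder with a randomly chosen value from its data set."""
    return template.format(**{key: rng.choice(values) for key, values in data.items()})
```

Seeding the generator per student would make each student's variant reproducible, which fits the stated motivation of such systems: preventing copied solutions while keeping the variants checkable.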
The development of LLMs has opened up new possibilities in both computer science and mathematics education. LLMs have demonstrated good knowledge of both natural and formal languages, making them effective tools not only for generating and explaining code, but also for producing new exercises [27]. In 2022, Sarsa et al. [9] explored the capabilities of the OpenAI Codex model for generating programming exercises and the corresponding explanations. They found that “the majority of the programming exercises created by Codex were sensible, novel, and included an appropriate sample solution” and that “the explanations created by Codex cover a majority (90%) of the code, although contain some inaccuracies (67.2% of explanation lines were correct)”. Jordan et al. [10] conducted a similar study using GPT-3.5 to generate programming exercises in multiple languages and found that the quality of the problems generated in English, Spanish, and Vietnamese was generally high, while the problems in Tamil were mostly nonsensical.
Speth et al. [24] described the generation of programming exercises using ChatGPT, in which the instructor interacted with the model and provided feedback to improve the quality of the responses. They found that using ChatGPT sped up the process of creating exercises, but manual changes to the generated problems were usually needed. Del Carpio Gutierrez et al. [11] used GPT-4 to generate and evaluate 500 contextualized programming exercises. They concluded that the generated problems “generally consisted of complete and sensible problem statements and accurate code solutions with good style”. The quality of personalized problems generated by GPT-4 was further examined by Logacheva et al. [12], who found that the quality was highly rated by both instructors and students.
In the domain of mathematics education, LLMs have similarly been explored for their potential to generate exercises and support learning. Liang et al. [28] introduced a method in which GPT-3 acts as a math tutor, generating customized word problems tailored to the current understanding of the student. This approach assesses the student’s performance and adjusts the difficulty and content of subsequent exercises accordingly, demonstrating improved learning outcomes compared to static problem sets. However, it requires meticulous prompt design to generate exercises, which inevitably requires human intervention.
To address the scarcity of large-scale mathematical datasets, Zhang [29] proposed Template-based Data Generation (TDG), using GPT-4 to create parameterized metatemplates. This approach enabled the synthesis of a large number of grade-school math problems with accompanying solutions, facilitating the training and evaluation of LLMs in mathematical reasoning tasks. However, this approach does not involve interaction with a user.
The generation of effective hints for math problems has also been a focus of research. Tonga et al. [30] explored the use of LLMs such as GPT-4o and Llama-3 to produce pedagogically sound hints within Intelligent Tutoring Systems. By simulating student errors and tailoring prompts to address specific misconceptions, the study demonstrated that AI-generated hints could significantly aid students in self-correcting their mistakes. However, the authors only used four exercises from different modules, which is not sufficient for a comprehensive analysis, even though each exercise was solved 40 times. The authors themselves noticed that the results may differ with other variants of the exercises within one module. Moreover, the lack of qualitative analysis of the generated hints is another limitation, as such an analysis is necessary to improve the overall assessment of the quality of the hints.
All of the aforementioned authors consider LLMs a useful tool for educators. However, despite advances, many challenges remain. Issues such as the generation of nonsensical or poorly formulated problems [9,10,12], errors in the proposed solution [9,10,12], and incorrect or missing code explanations [9] persist. In addition, concerns have been raised about students’ overreliance on AI-generated solutions, which could hinder the development of critical thinking and independent problem-solving skills [31]. Consequently, human oversight and intervention are deemed essential to ensure the quality and educational value of AI-generated exercises.
While LLMs have demonstrated strong capabilities in generating textual and code-based learning materials, other AI architectures are also proving effective at uncovering complex data patterns. For example, the study [32] illustrates how neural networks can capture higher-order relationships and temporal dependencies. Such approaches highlight the potential of advanced AI models to understand complex dependencies—an ability that could, in the future, help create more adaptive and data-driven educational content.
Our approach focuses on generating exercises of varying difficulty levels tailored to a specific course (thematic scope), including both interactive tasks (with contextual hints and teacher-supervised error analysis) and practice problems (created by analogy to the interactive ones), all accompanied by answers. The quality of the exercises is verified by experts.

4. Methods

The methods described in this paper are presented within the framework of the innovative and interactive educational platform Compass. The platform was created at the Łukasiewicz Research Network—Institute of Artificial Intelligence and Cybersecurity (formerly: Łukasiewicz Research Network—Institute of Innovative Technologies EMAG) in Katowice, Poland, in cooperation with scientists from the Silesian University of Technology in Gliwice, Poland. Compass is a comprehensive, engaging, and student-friendly teaching tool that supports remote learning and complements traditional teaching methods. The platform offers eight mathematics courses and one broad C++ programming course, each available in two language versions: Polish and English. The courses are designed for high school students, university applicants, and university students, as well as anyone interested in mathematics and computer science who wishes to expand their knowledge. The courses have been developed by university lecturers with extensive teaching experience. The authors of this paper are the academic creators of the substantive part of this solution.

4.1. Mathematics Courses

The eight mathematics courses cover topics related to both calculus and linear algebra. A thematic overview of the mathematics courses, divided into fields, is presented in Table 1.

4.1.1. Course Structure in Compass

The Compass course structure was developed following an in-depth analysis of how students acquire mathematical knowledge. This analysis was based on the results of the students who used the ForMath platform, as well as on the many years of teaching experience of the authors. ForMath has been designed as a collection of exercises with a theory component (e-books). It offers exercises of varying difficulty within different branches of calculus. ForMath was found to be a helpful learning tool that promotes good educational outcomes. However, our experience and analysis show that the best results in mathematics learning are achieved when, after becoming familiar with a given method through interactive exercises, students reinforce their skills by solving several or dozens of exercises of the same type on their own, to consolidate the method and the underlying concepts [33,34,35]. In response to these challenges, the Compass platform has been developed with a completely different course structure.
Each course in Compass consists of four sections. The first is clearly presented theory that helps users learn or review material related to the given topic. Next, interactive exercises of varying difficulty levels are offered. While solving interactive exercises, two options are available: checking the result, or solving the exercise together with the system, with the help of contextual hints, tips, and error analysis. To reinforce acquired skills, the third section includes generated exercises for self-practice. The last section contains tests that allow users to assess their understanding of the material covered and their progress in the subject. Such an educational cycle ensures much better results, as will be demonstrated in Section 6.

4.1.2. Interactive Exercise Schema

After accessing the Interactive Exercises section of a given math course and selecting a specific exercise, the user has two options: either solve an exercise independently and check the result on the Check the result path, or solve it step by step with the help of the system on the Solve path (see Figure 1).
If the user chooses Check the result path, they will be presented with a screen showing a list of possible answers or a field to manually enter their answer. After making a selection or entering a result, the system provides immediate feedback. If the answer is correct, the user receives an appropriate comment and a suggestion to click the Solve button to verify the solution by analyzing it step by step, or to select the option New exercise and proceed to the next exercise. If the answer is incorrect, the system generates a comment that analyzes the likely source of the error, allowing the user to correct the error independently. The list of available answers is tailored to the most common errors, allowing the system to suggest to which aspects of the solution the user should pay special attention. The list usually also includes the answer “other than the above”. The comment associated with this option helps to focus the user’s attention on potential problem areas in the solution where errors frequently occur and highlights elements that require further review. If multiple attempts do not result in the correct answer, the user can switch to the Solve path, where the system guides the user step-by-step through the solution.
The solution of the exercise on the Solve path is divided into steps, each of which corresponds to a separate screen. As the user progresses through each step, each new screen displays the exercise content, along with the parts of the solution that have been completed so far and additional explanations on demand. The screens also offer contextual hints on demand, including suggestions, theorems, and definitions. The incorporation of these theoretical components and additional explanations is crucial, as the objective is not only to demonstrate the method for solving an exercise but also to facilitate the user’s comprehension of the rationale behind the correctness of a specific method. At the end of each screen, the interaction is mandatory: the user enters or selects the result of partial calculations. In a manner analogous to the Check the result path, the list of available answers includes common incorrect solutions, so that immediate feedback allows the user to recognize and correct errors right away and solve the exercise as independently as possible. In the event that the user repeatedly enters an incorrect result, despite the comments and hints available, the user is taken to the next screen where detailed calculations are provided. If, at any point while solving the exercise, the user feels confident in solving it independently, they can use the Check the result button to proceed to the result verification screen. If the answer is incorrect, the user can return to the previous steps and continue solving the exercise with the help of the system. Some exercises or their parts may have more than one solution method. In such cases, the system allows the user to choose the method, and the solution path branches accordingly to accommodate the chosen approach.
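As an illustration of this feedback mechanism, the following sketch shows how a single step might map anticipated wrong answers to tailored comments, with a generic fallback for the “other than the above” case. The data structures and the sample exercise are hypothetical and simplified, not the platform's actual implementation.

```python
# Hypothetical step definition: one correct answer, tailored comments for
# anticipated common errors, and a generic fallback hint for anything else.
STEP = {
    "prompt": "Compute the discriminant of x^2 - 5x + 6.",
    "correct": "1",
    "common_errors": {
        "-1": "Check the sign: the discriminant is b^2 - 4ac, not 4ac - b^2.",
        "49": "It looks like you added 4ac instead of subtracting it.",
    },
    "fallback": "Recheck your substitution of a, b, and c into b^2 - 4ac.",
}

def check_step(step: dict, answer: str) -> tuple[bool, str]:
    """Return (is_correct, feedback) for the user's answer at this step."""
    if answer.strip() == step["correct"]:
        return True, "Correct. Proceed to the next step."
    # Tailored comment for an anticipated error, generic hint otherwise.
    return False, step["common_errors"].get(answer.strip(), step["fallback"])
```

Because the anticipated errors are authored by the instructor per step, the feedback can point at the likely source of the mistake rather than merely flagging the answer as wrong, which is the essence of the error analysis described above.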
In the following section, details of the various actions taken by the user and the responses generated by the system are presented, based on a specific exercise.

4.1.3. Student Interactive Exercise Interface

This section presents the student interface for interactive exercises. The selected screens that the user sees while solving an interactive exercise will be presented using a chosen example from the course Equations and Inequalities. Figure 2 shows the first screen with the problem statement, with the option to choose: Solve or Check the result.
Figure A1 (Figures A1–A7 are included in Appendix A) shows an example screen on the Check the result path, with the incorrect answer highlighted and a comment. The incorrect answer is marked in red, while the correct one is marked in green.
The following figures present screens on the Solve path, displaying selected hints, tooltips, and comments. In Figure A2, we see the first step of the Solve path with a hint displayed in the selected exercise.
In the second step shown in Figure A3 and Figure A4, the user selects an item from the dropdown list and fills in the empty field by entering an expression using a special mathematical toolbox that facilitates the input of formulas and symbols. Figure A4 shows a comment on an incorrect answer with a dropdown list prepared for reselection.
Figure A5 shows the step in which the user can choose how to continue solving the exercise.
In the next step (Figure A6), the user independently determines the values, which should then be entered into the blanks. The user can use the information provided in the tooltip (by clicking on the underlined word—the tooltip anchor, a tooltip appears on the screen) and in the Hint.
The last screen (Figure A7) contains the complete solution to the exercise using the chosen method. The user can see another way to solve the task (in addition to the two paths shown in Figure A5) by clicking on Solution or selecting a different exercise from the list of exercises by clicking on New exercise. The New exercise button directs the user to the list of all interactive exercises available in the given course.

4.1.4. Manual and Semi-Automatic Exercise Generation

Interactive exercises support the understanding of problem-solving methods within a specific topic by highlighting best practices and helping to avoid the most common mistakes. To reinforce the skills acquired, the student should independently solve at least a few exercises of a given type. Therefore, in all mathematics courses, a Generated Exercises section has been created that covers several to a dozen types of exercise, most of which correspond to those discussed in the Interactive Exercises section. Each exercise type includes a set of dozens of examples, which are randomly selected at the user’s request.
Templates for the interactive exercises were prepared in the LaTeX environment. The TeX files of the tasks were used to create JSON files, which were then imported into the edX platform. The interactive exercises were created entirely manually, although attempts were made to incorporate AI into the process. Unfortunately, the versions of ChatGPT-4 available at that time did not meet quality standards. Too many errors appeared in the solutions to more difficult exercises and in interactions with the user. For example, the study [28] presents the application of LLMs to very simple mathematical tasks, demonstrating the potential of this approach. However, based on our attempts, it appears that at the current stage of AI development, it is still not possible to obtain fully reliable results for the proposed multi-step interactive tasks involving real-time error analysis and contextual hints at the level of higher mathematics. Given the rapid development of this technology, such attempts will certainly resume in the near future.
However, AI was used to create the Generated Exercises sections. Templates for the generated exercises were prepared in a similar way to those for the interactive ones. These templates include parameters that are replaced with specific values from a predefined set upon the user’s request. For this purpose, a database was created that contains task data for each type of generated exercise. These data serve as a base from which sets of parameters are randomly selected to generate additional examples for the user when they click the Generate exercise button. Some of these data (for the calculus part) were prepared using ChatGPT. The chatbot generated examples, and a human verified that the coefficients were reasonable enough to make the solutions “student-friendly”. An exemplary list of types of generated exercises in the Inverse Function course is presented in Figure 3.
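The parameter-substitution mechanism described above can be sketched as follows. This is a simplified Python stand-in: the actual Compass templates are prepared in LaTeX and imported as JSON files, and the template text and parameter sets below are hypothetical.

```python
import random
from string import Template

# Hypothetical template for a generated exercise in the Inverse Function course;
# the real Compass templates are LaTeX-based and stored as JSON.
TEMPLATE = Template("Find the inverse of the function f(x) = ($a*x + $b)/($c*x + $d).")

# Predefined, human-verified parameter sets that keep the solutions "student-friendly".
PARAMETER_SETS = [
    {"a": 2, "b": 1, "c": 1, "d": -3},
    {"a": 1, "b": -4, "c": 2, "d": 5},
    {"a": 3, "b": 2, "c": 1, "d": 1},
]

def generate_exercise(rng: random.Random) -> str:
    """Randomly select a parameter set and substitute it into the template,
    as happens when the user clicks the Generate exercise button."""
    return TEMPLATE.substitute(rng.choice(PARAMETER_SETS))

print(generate_exercise(random.Random(0)))
```

Because the parameter sets are finite and pre-verified, every generated statement is guaranteed to have a solution a human has already judged reasonable.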
Each type of generated exercise includes a set of several dozen examples, randomly generated upon the user’s request. These sets were developed by the academic teachers involved in creating the courses, using different methods depending on the type of exercise. The generation of examples (both statements and solutions) for exercises in calculus (5 courses) was supported by AI. Most of the algebra tasks (3 courses) were generated in Wolfram Mathematica using specially designed scripts. However, AI-generated examples often contain data that are “unfriendly” to the user. When developing problem sets and academic textbooks, authors place great emphasis on creating examples with numerical data within the range of rational numbers. The goal is to practice problem-solving methods rather than perform complex calculations. Therefore, in the next step, the examples proposed by AI were manually verified or checked in Wolfram Alpha and, in many cases, corrected. This semi-automatic approach to generating examples yielded the best results in the creation of math courses. To further verify the correctness of the complete data sets, symbolic computation software was also used.
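Part of the manual verification amounts to checking that the coefficients keep the solutions within the rational numbers. For a quadratic task, for example, such a "student-friendly" filter could be automated as in the following sketch (an illustration of the idea, not the scripts actually used):

```python
from math import isqrt

def has_rational_roots(a: int, b: int, c: int) -> bool:
    """For integer coefficients (a != 0), the roots of a*x^2 + b*x + c are
    rational exactly when the discriminant is a non-negative perfect square."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return False
    r = isqrt(disc)
    return r * r == disc

# x^2 - 3x + 2 factors as (x - 1)(x - 2): student-friendly
print(has_rational_roots(1, -3, 2))   # True
# x^2 - 2 has the irrational roots +/- sqrt(2): rejected
print(has_rational_roots(1, 0, -2))   # False
```

An AI-proposed coefficient set failing such a check would be corrected or discarded before being added to the exercise database.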

4.2. Programming Course

4.2.1. Aspects of Programming Education

Teaching computer programming at the introductory level has been described as a complex task due to the wide range of skills that students need to acquire. Students are expected to simultaneously learn the features of the programming language and develop general problem-solving skills. From the pedagogical perspective, the latter seem more perplexing: while some have proposed teaching problem solving explicitly by presenting useful schemata, others suggest that those skills are learned mostly in practice. In particular, frequent and relatively simple programming exercises have been proposed as a way to put programming knowledge into practice, instill problem-solving skills, provide feedback, and encourage students [36,37,38].
Those considerations apply to both traditional and online programming courses, including MOOCs, whose creation has been facilitated by platforms such as Coursera or edX. Early versions of those platforms were sometimes described as inefficient for the proposed hands-on approach to teaching programming, as they did not support running and testing code. For that reason, an integrated online editor was proposed in many university courses [16,19] and later adopted by mainstream platforms. The advantage of the online editor is that students do not need to set up the environment on their own machines. Another convenience is that the student’s code can be immediately checked by the system.
Proper automated assessment of a student’s program is not a trivial task. Pieterse [39] mentions the following factors that contribute to successful automatic assessment in MOOCs: high-quality assignments, clear formulation of the problem, well-chosen test data, good feedback, unlimited submissions, and others. One of the problems occurs when the task is ambiguous: the student may interpret it differently, which can make an automated system reject an otherwise valid program. Therefore, problems must be specified carefully to avoid ambiguity [16,39].
Several methods for the automated evaluation of student code have been proposed. One of the simplest is I/O testing: the program is given example input data, and its output is compared to the expected output. Advantages of this approach include simplicity, versatility, and independence from the programming language. Its drawback is that it does not consider qualities of the solution other than its output [16,25]. Some have proposed assessment using industrial-grade testing tools such as xUnit, or acceptance-testing frameworks such as Cucumber or FitNesse [16,40]. In addition, assessing the quality of the student’s code itself, e.g., its readability and complexity, has also been proposed [41].

4.2.2. Selection of Programming Language and Course Scope

Firstly, we needed to decide on the programming language used throughout the course. One of the reasons for selecting a particular language may be its popularity. According to the TIOBE index, the most common language in 2025 was Python, followed by C++ and Java [42]. According to IEEE Spectrum 2024 ranking, Python also took the top spot, with Java, JavaScript, and C++ coming next [43].
Another criterion for selecting a language is its suitability for beginners. It is suggested that the first language be easy to understand and not overload the learner with syntax. For this reason, Python is nowadays often regarded as the most suitable choice for an introductory course among general-purpose languages [44]. Its advantages include simple and concise syntax, code readability enforced by mandatory indentation, and easy-to-use advanced data types such as lists, sets, and dictionaries [45]. On the other hand, a recent study compared the behavior of students solving the same lab tasks in analogically structured C++, Java, and Python courses and found no difference in the time spent or the number of wrong attempts between the languages [46]. In general, there is no consensus on the most suitable language for beginners, and the reasons for choosing one language over another are often described as subjective [38].
Notwithstanding the recent trends, our platform was primarily addressed to the students of Polish universities; therefore, we had to consider the most popular languages taught there during the first semesters. We conducted a survey of the curricula of introductory programming courses required in majors related to computer science and mathematics at the following universities: University of Warsaw, Jagiellonian University in Kraków, University of Wrocław, Adam Mickiewicz University in Poznań, University of Łódź, Silesian University of Technology, AGH University of Kraków, Wrocław University of Science and Technology and Warsaw University of Technology. The results are presented in Table 2.
Our findings suggest that C and C++ still dominate introductory courses at Polish universities. Because of that, we chose C++ for our course. One advantage of C++ is compilation to machine code, which makes the language a popular choice whenever high performance is required. This also makes it suitable for embedded systems, which are often part of the curriculum at more engineering-oriented universities. At the same time, C++ (as opposed to C) supports multiple paradigms, such as procedural programming, object-oriented programming, and meta-programming with templates [47].
Since the introduction of C++ in the 1980s, numerous features have been added to the language that have changed recommended coding practices. The most important change from the beginner’s point of view concerns memory management. In the past, it was customary to use low-level features such as C-style arrays and strings, raw pointers, and the new and delete keywords. Today, such constructs are discouraged in favor of STL containers, the std::string type, and smart pointers [48]. These changes in coding practice are reflected in proposed modern curricula [47,49].
At the same time, the aforementioned low-level features are sometimes difficult to avoid. For example, the second argument of the main function is an array of C-style strings: int main(int argc, char** argv). Such interfaces can also be encountered in system functions and libraries such as OpenCV, OpenGL, or Qt. As a compromise, the authors suggest introducing the recommended modern practices first and showing the low-level mechanisms later, with the necessary context [47,49].
With those considerations in mind, we designed a course aimed at C++ novices. The course is divided into small units, each consisting of a theoretical introduction with code examples followed by a programming exercise that requires writing a program or a part of one. We kept the theory part of each unit brief but informative, and we designed the exercises to cover most of the issues from the introduction.
The first exercises on each topic focus on new language features and syntax; later problems require more algorithmic thinking. The exercises often build on knowledge from previous units, so the course should be completed in the prescribed order. The units of the course are grouped into the following chapters:
  • Program structure: main function, standard output stream, comments.
  • Variables, types, and operators: int and double types, representation of variables in the memory, arithmetic operators, standard input stream.
  • Conditional instructions: if, else and else if, bool type and boolean operators.
  • Loops: while, for, assignment operators like +=, computations with loops (Fibonacci sequence, factorial).
  • Arrays: std::vector, indexing operator, list-initialization, range-based for, selected std::vector member functions, manual implementation of searching and sorting.
  • Functions: function definition, function call, return statement, pass-by-value vs pass-by-reference, program modularization with functions, recursion.
  • Strings: std::string and char types, similarities between string and array types, string operators, and member functions.
  • Structs: creating structural types, object initialization, and member access.
  • I/O Streams: std::ostream and std::istream base types, file streams, output formatting, handling of invalid input.
  • Algorithms: functions from the <algorithm> library, lambda expressions.
  • Pointers: only in this chapter are the following features introduced: pointers, C-style arrays and strings, new and delete; memory management issues like dangling pointers and memory leaks; comparison with modern practices.
  • Introduction to object oriented programming: starting from the plain structs we introduce the following OOP concepts: member functions, private and public members, constructors, destructors and the RAII idiom.

4.2.3. Exercise Schema

Figure 4 shows the typical flow of a student’s interactions in a single unit of the course. At the beginning, the student is shown the theoretical introduction with examples. After reading that part, the student proceeds to the programming task. If the student knows how to solve the task, they write the solution in the editor integrated with the platform.
The student can try out their program by clicking the Run button, which compiles the program, runs it, and presents the program’s output. When the student decides that the solution is ready, they click Check the solution. The platform then runs tests to verify whether the program is correct.
Most of the tasks are checked using input/output tests: the program is provided with input data, and then its output is compared with the expected one. On average, about 3 test cases are prepared for each unit. The input data in all cases is valid and unambiguous. Invalid input could distract students from the main focus of the task; therefore, test cases do not include such data. We also avoid ambiguous edge cases where the expected output is not immediately clear from the task description or where different equivalent results are possible. Sometimes, the code is also checked to verify whether expected constructs were used. When all checks are successful, the unit is marked as solved. In case of an error, the system provides feedback with examples of failed test data.
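The I/O checking described above can be sketched as a generic test runner. This is a simplified illustration, not the platform code: on the server, the compiled C++ binary of the student's program would be executed, while here the Python interpreter stands in for it, and the whitespace normalization is an assumed detail.

```python
import subprocess
import sys

def run_program(cmd, stdin_text, timeout=5):
    """Run a command, feed it the given standard input, and return its stdout."""
    result = subprocess.run(cmd, input=stdin_text, capture_output=True,
                            text=True, timeout=timeout)
    return result.stdout

def check_io(cmd, test_cases):
    """Compare program output with the expected output for each test case,
    ignoring leading/trailing whitespace; return the list of failed cases."""
    failed = []
    for case in test_cases:
        out = run_program(cmd, case["input"])
        if out.strip() != case["expected"].strip():
            failed.append(case)
    return failed

# A tiny "student program" run by the Python interpreter as a stand-in
# for the compiled C++ solution: it doubles the number read from stdin.
student_program = [sys.executable, "-c", "print(int(input()) * 2)"]
tests = [{"input": "3\n", "expected": "6\n"},
         {"input": "10\n", "expected": "20\n"}]
print(check_io(student_program, tests))  # [] when all tests pass
```

When the returned list is non-empty, its entries are exactly the failed test data that the platform reports back to the student as feedback.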
If the student needs help, they can display hints that offer step-by-step guidance at any time. Ultimately, if the student still cannot solve the task, they can click Show solution, which displays an example solution along with an additional explanation.

4.2.4. Student Exercise Interface

Figure 5 presents the student’s interface. The left column initially has two tabs: Introduction with the theoretical introduction and Task with the task content. The Task tab includes a Show hint button, which opens another tab with hints. The right column contains a code editor at the top. It is initially filled with the necessary #include directives and an empty main function. It may also include some parts of the solution in the case of more complex tasks.
The Run button sends the program to the server, which compiles the program, runs it, and responds with the output of the program. The tabs at the bottom are used to present information related to the input and output of the student’s program. The STDIN tab contains a multiline text input that is used as the program’s standard input after clicking Run. The standard output of the program is displayed in the Program Output tab, while the Compilation tab shows the compiler’s output, including any compilation errors.
After clicking the Check solution button, the system checks the student’s solution. The results are presented in the Solution validation tab. In particular, it contains the test cases used during I/O testing, with the input and expected output. The failed tests are colored red.
The Hints tab includes a Show Solution button, which causes the solution to appear in the editor on the right. It also adds a new tab to the left column with the explanation of the solution.

4.2.5. Automatic Exercise Generation

We used LLMs in our course to generate additional exercises for some of the units. We excluded units in which new concepts were still being introduced, as the generated exercises would likely involve issues not yet addressed in the course. As described in Section 4.2.3, the basic unit consists of the following parts: theoretical introduction, problem statement, hints, example solution, explanation of that solution, and test data. The generated exercises share the introduction, while all the other elements are generated separately for each generated exercise.
For generation, we used the GPT-4 model via the OpenAI chat completions API. Our approach to prompt engineering can be described as one-shot learning [50]: for example, when generating a problem statement, we provided the corresponding part of a human-created exercise as an example. To facilitate the generation, we wrote a command-line tool in Python that automatically reads the examples, constructs the messages, saves the responses, and allows the user to converse with the model. The process of generating tasks using this program is presented in Figure 6. Table 3 shows the initial messages sent to the API while generating new problem statements. The left column presents the role of each message, which can be one of the following:
  • system (or developer in newer models): messages instructing the model how to respond to messages with the user role.
  • user: messages from a user of the system.
  • assistant: messages assumed to be generated by the model. They can be actual previous responses of the model or examples utilized in few-shot learning.
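Building a one-shot prompt from these roles can be sketched as follows. The message contents are hypothetical placeholders standing in for the actual prompts shown in Table 3; in practice, the resulting list would be passed to the chat completions endpoint (e.g., `client.chat.completions.create(model="gpt-4", messages=messages)`).

```python
def build_one_shot_messages(system_prompt, example_request,
                            example_response, new_request):
    """Construct a one-shot prompt: a human-written exercise element serves
    as an assistant example placed before the actual request."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": example_request},
        {"role": "assistant", "content": example_response},
        {"role": "user", "content": new_request},
    ]

# Hypothetical contents illustrating the structure of Table 3:
messages = build_one_shot_messages(
    "You write programming exercises for an introductory C++ course.",
    "Write a problem statement about while loops.",
    "Write a program that reads numbers until 0 is entered and prints their sum.",
    "Write a new, different problem statement about while loops.",
)
```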
The API responds with the problem statement of a new task. The interactive tool saves the response to a file for manual examination by the instructor, who can accept the problem or ask the model to change something. In the latter case, the instructor writes a message to the model explaining what should be changed about the problem. The problem statement previously generated by the model and the instructor’s answer are added to the message list presented in Table 3, and all messages are sent back to the model. This enables an interactive conversation with the model in the style of ChatGPT. Table 4 shows the messages that could be exchanged with the model in such a conversation. The first four messages form the initial prompt presented in Table 3; the following assistant messages are the responses of the model, and the user messages are typed by the instructor.
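This accept-or-revise loop amounts to appending the model's previous answer and the instructor's remark to the conversation before resending it, which can be sketched as follows (the message texts are hypothetical; the real conversations are illustrated in Table 4):

```python
def refine(messages, model_reply, instructor_remark):
    """Append the model's previous answer and the instructor's feedback,
    producing the message list for the next API call."""
    return messages + [
        {"role": "assistant", "content": model_reply},
        {"role": "user", "content": instructor_remark},
    ]

# Hypothetical conversation: the initial prompt, a model reply, and feedback.
history = [
    {"role": "system", "content": "You write exercises for a C++ course."},
    {"role": "user", "content": "Write a task about for loops."},
]
history = refine(history,
                 "Task: print the numbers from 1 to 100.",
                 "Too trivial; require a nested loop instead.")
print(len(history))  # 4
```

Because each round keeps the full history, the model sees both its earlier attempt and the instructor's objection when producing the revised statement.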
The other parts of the task (solution, test data, hints, and solution explanation) are also generated with one-shot learning, using the corresponding elements of the base task as examples. Table 5 shows an example of the generation of a task solution. As in the previous example, the first four messages are sent at once when the generation starts; the subsequent messages are model responses and instructor remarks.
The flow of communication when generating the test data is analogous to generating the solution, but the model is asked to produce the test data in JSON format, and an example of the expected output format is provided in the first assistant message. When the solution and test data are ready, the interactive tool can check whether running the solution on the provided test input gives the expected output. The results of this validation are presented to the instructor. In case of a failed check, the instructor must examine both the solution and the test data and either correct the error manually or regenerate the faulty element.
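The cross-check between the generated solution and the generated JSON test data can be sketched as below. The JSON field names and the example data are hypothetical, and a plain Python callable stands in for running the generated C++ solution:

```python
import json

# Hypothetical example of JSON test data as requested from the model;
# the exact schema used by the tool may differ.
generated_tests = json.loads("""
[
  {"input": "2 3", "expected_output": "5"},
  {"input": "10 -4", "expected_output": "6"}
]
""")

def validate(solution, tests):
    """Run the solution on each test input and collect mismatches
    so they can be reported to the instructor."""
    failures = []
    for t in tests:
        actual = solution(t["input"])
        if actual.strip() != t["expected_output"].strip():
            failures.append({"case": t, "actual": actual})
    return failures

# Stand-in for the generated solution: sum the two integers on the input line.
solution = lambda inp: str(sum(map(int, inp.split())))
print(validate(solution, generated_tests))  # [] means solution and tests agree
```

A non-empty result tells the instructor which element to fix, although it cannot tell whether the solution or the test data is at fault.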
To generate relevant hints or explanations of the solution, both the problem statement and solution need to be generated first. The reason is that generating appropriate hints requires knowledge of both the problem and the expected final result. The same is also true for the generation of solution explanations. Table 6 presents the messages exchanged during the generation of hints.
To ensure consistency between the different language versions of the course, all elements were generated in English and then translated into Polish. That task was also carried out by the GPT-4 model. The first, system-role message instructed the model to translate the element into Polish and the following user-role message was the element to translate. Those two messages were sent to the model, which responded with the translated text.
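The two-message translation scheme can be sketched as follows; the instruction text is a hypothetical paraphrase of the prompt actually used, and the clause about keeping output messages in English reflects the intended (though occasionally violated) behavior discussed in Section 5:

```python
def build_translation_messages(element_text):
    """Two-message prompt for translating a generated element into Polish."""
    return [
        {"role": "system",
         "content": "Translate the following course element into Polish. "
                    "Keep code, identifiers, and program output messages "
                    "in English."},
        {"role": "user", "content": element_text},
    ]

msgs = build_translation_messages("Write a program that prints Hello.")
```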

5. Evaluation of Generated Programming Exercises (RQ1)

We have generated programming exercises using the method described in Section 4.2.5 for selected units from all chapters of the course except “Program structure” and “Pointers”. Some tasks in each unit were evaluated by an independent expert. Table 7 presents the total number of exercises evaluated for each subject.
Because exercises can be iteratively improved, the final quality depends heavily on the participation of the instructor who supervises the generation. To evaluate the model’s capabilities, we tried to keep the human role minimal; nevertheless, in some areas human guidance was indispensable.
Firstly, the instructor was actively involved in the generation of problem statements, since the model frequently generated unsuitable problems that were trivial, too difficult, or required concepts not yet introduced in the course. The instructor skimmed each generated problem statement for the aforementioned issues and asked the model to modify the problem when necessary. However, the instructor did not specifically check the output for other errors, such as syntax errors or wrong examples of expected output.
Secondly, the instructor also briefly examined the generated solutions to ensure they did not include constructs that could be unfamiliar to the student. When such a situation occurred, the instructor asked the model to produce simpler code.
Lastly, the instructor also intervened when the automatic check of the generated solution and test data failed, i.e., when the program did not produce the expected output for one of the test inputs. In this scenario, the instructor manually fixed the code or the test data.
Apart from that, the instructor kept the content as generated by the model. The generated tasks were then sent for evaluation to an expert who did not participate in the generation. The expert examined the problem statement, solution, and hints and, in case of issues, wrote free-form remarks on each part. Based on those responses, we categorized the quality of the elements according to the following criteria:
  • correct: the expert found no issues other than language errors that do not impact the understanding of the text.
  • partially correct: the element had some errors that could be fixed without substantial changes to the text. Examples include syntax errors in code examples, incorrect examples of program output, confusing instructions or hints, and discrepancies between the element and other parts of the task.
  • incorrect: the text was mostly erroneous or nonsensical, or was something other than expected, e.g., the model wrote program code when asked to generate a problem statement.
Table 8 presents the number of correct, partially correct, and incorrect task elements in each language. In terms of problem statements, all statements in English were sensible, which can be attributed to manual supervision by the instructor. However, eight of them contained errors, such as syntax errors in code snippets, wrong examples of expected output, or confusing instructions.
The number of errors in the problem statement increased to 14 after translating into Polish. Most of those additional errors were discrepancies between the instruction and the solution caused by the model translating the English words in the expected output to Polish. Our intention was to keep the output messages the same regardless of the language, and we included that instruction in the model prompt; nevertheless, the model occasionally translated those words anyway. One of the translated problems was classified as incorrect because the model wrote the code in Python instead of translating it.
The solution code was valid in all cases. The correctness of this part was also ensured by the generation procedure described previously. One of the solutions was classified as partially correct because it included unnecessary variables that were manipulated in the code but finally did not influence the program output.
In contrast, the generation of hints was not supervised by the instructor; nevertheless, most of them were correct as well. The hints deemed partially correct had issues such as different function names than in the solution, errors in code snippets, or otherwise misleading instructions. The one hint described as incorrect contained an excessive amount of code snippets that together formed the entire solution.
The evaluation of the generated elements (Table 8) indicates that creating exercises using an LLM supported by selective expert intervention enables the production of appropriate educational content (87.5% correct problem statements, 98.4% correct code solutions, and over 90% correct hints in the English version). These results support research question RQ1, confirming that LLMs, when combined with expert oversight, can effectively assist in the automation of exercise content generation. At the same time, a slightly lower quality was observed for the Polish version compared to the English one (76.6% correct Polish problem statements versus 87.5% in English). The weaker performance of the Polish version was mainly due to translation errors, including the unintended translation of code elements and output messages. Therefore, additional translation validation and refinement of translation prompts may be necessary to ensure consistent quality across languages.

6. Assessment of Effectiveness (RQ2)

The Compass platform has proven to be a very effective educational tool for supporting student learning. We analyzed the results of first-semester exams in three fields of study: Automatic Control and Robotics, Electronics and Telecommunication (with lectures in Polish), and Control, Electronics and Information Engineering (MAKRO, with lectures in English); the authors would like to express their gratitude to the academic teachers Iwona Nowak, Ewa Łobos, and Marek Żabka for providing the data for the analysis. Depending on the field of study, about 80–90% of the evaluated students (256 students in total across the three fields) used the Compass platform, which was confirmed by surveys conducted among the students. The exams covered topics developed on Compass, focusing on mathematical analysis and linear algebra. In all cases, the results for the academic year 2024/2025 were higher than in the previous year, when only Formath was available to students. Table 9 shows the pass rates for exams in these years. The results are all the more promising given that, for the control group of Automatic Control and Robotics students taking the algebra course, the exam results were lower than in the previous academic year (85.1% and 81.11%, respectively). This indicates a weaker cohort this year, a fact also confirmed by the lower admission thresholds for the mentioned fields of study in the university admission process for the academic year 2024/2025.
Furthermore, the results of the survey show that the students willingly use the Compass platform (Table 10). The students found the platform to be an excellent supplementary learning aid, particularly helpful in preparing for class tests and exams (Table 11). Opinions about the platform are overwhelmingly positive, for example, “I hope it is available in as many courses as possible and from every subject”, “If the materials for each subject were as well prepared as they are on Compass, learning would be much more effective and enjoyable”, “The platform is excellently designed and very helpful in mastering new material”, “More subjects in studies should use this platform”. Some suggestions were also made for possible improvements, for example, adding an autocomplete function for input fields or making it easier to navigate between courses. Additionally, many suggestions were made for additional courses on the platform—integrals, partial derivatives, differential equations, linear spaces, and dynamics of systems. In response to these expectations, a course on indefinite integration is currently being developed.
To assess whether the overall improvement in student performance between the academic years 2023/2024 and 2024/2025 was statistically significant, a one-tailed difference-in-proportions test (one-tailed Z-test) was conducted using the aggregated pass rates presented in Table 9. This analysis combined data from all examined courses and fields of study to evaluate the general year-to-year trend in examination outcomes. The resulting statistics are summarized in Table 12, which presents the pass rates for both years, the absolute difference, the 95% confidence interval (CI), and the corresponding significance test values. The results confirm a statistically significant overall improvement in pass rates (z ≈ 2.06, p ≈ 0.019), indicating that students in 2024/2025 performed better on average than those in 2023/2024. This finding supports research question RQ2, indicating that the implementation of the Compass platform, serving as an additional educational tool, contributed to improved academic outcomes.
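The test above can be reproduced with a short pooled-proportion calculation; the sketch below uses illustrative counts only (the actual aggregated counts underlying Table 12 are not repeated here):

```python
from math import sqrt, erf

def one_tailed_two_proportion_z(passed_new, n_new, passed_old, n_old):
    """One-tailed Z-test for H1: the pass rate in the new year is higher.
    Returns (z, p_value) using the pooled-proportion standard error."""
    p_new, p_old = passed_new / n_new, passed_old / n_old
    pooled = (passed_new + passed_old) / (n_new + n_old)
    se = sqrt(pooled * (1 - pooled) * (1 / n_new + 1 / n_old))
    z = (p_new - p_old) / se
    p_value = 0.5 * (1 - erf(z / sqrt(2)))  # upper-tail normal probability
    return z, p_value

# Illustrative counts only, not the actual Table 12 data:
z, p = one_tailed_two_proportion_z(210, 256, 190, 256)
print(round(z, 2), round(p, 3))
```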

7. Conclusions and Future Work

This paper has described an effective strategy for university-level education in mathematics and programming. Our approach integrates expert-guided interactive exercises, AI-supported generated exercises, theoretical components, and evaluation tests. The innovative interactive exercises support solving mathematical problems in multiple ways, offering contextual hints and error analysis based on user–platform interaction and stimulating students’ independent thinking.
The improved results of the students who used the new educational platform highlight the importance of fostering independent work in mathematics education while providing support when needed; simply offering ready-made solutions does not create such opportunities. Naturally, similar conclusions arise from the authors’ many years of teaching experience. The approach to teaching mathematics adopted on the Compass platform was inspired by the constructivist theory of learning, which emphasizes student engagement and motivation, as well as the principle that “the more we know, the more we can learn”, implemented through careful selection of tasks and a gradual increase in their difficulty. Our study shows that courses including generated exercises are received positively by students, confirming the findings of Logacheva et al. [12]. The results of an aggregated statistical comparison of pass rates between the academic years 2023/2024 and 2024/2025 confirm a statistically significant overall improvement in student performance, supporting research question RQ2, which posits that the introduction of the Compass platform contributed to improved exam outcomes. Naturally, the statistics presented in Table 12 illustrate the observed phenomenon in a simplified manner and warrant further investigation. More in-depth statistical analyses are planned after the platform has been in use for several years, once larger datasets become available.
The research carried out has also shown that AI is an effective tool for generating new exercises in both programming and mathematics. For this purpose, the one-shot learning technique was used, employing existing exercises written by specialists as examples for the model. Then, using a human-in-the-loop strategy, an expert verified the model results, either accepting the result or providing feedback to the AI. The entire process was iterative, gradually improving the generated exercise. These findings address research question RQ1, demonstrating that the automation of exercise generation using AI is feasible when supported by expert supervision and iterative refinement. The involvement of an expert in the given field remains a crucial element of this process. The main problems with the AI-generated content include generating problems that were too easy, too difficult, or insufficiently focused on the subject; occasional errors in the proposed examples or solutions; and minor translation issues affecting the consistency of non-English versions.
Despite the need for manual verification, the use of AI substantially accelerates the process of generating exercises, their solutions, and explanations, as noted previously by Denny et al. [27]. For that reason, the developed solution can be of great value to educators who would otherwise be unable to prepare detailed teaching resources because of limited time. We acknowledge that the assessment of the proposed generation tools is limited due to the small number of instructors and testers involved in the process. Future experiments involving more educators could provide additional insights into the usefulness of the solution.
Future work will proceed in two directions. First, new courses will be created for the Compass platform and additional exercises will be added to existing ones; a course on the indefinite integral is currently under development. Second, we plan more detailed research on the effectiveness of AI-generated exercises, involving a greater number of instructors interacting with the model and experts evaluating the generated exercises. This would enable gathering more robust statistics and more diverse insights into the generation and evaluation process, such as the average number of interactions with the model, the number of manual corrections, inter-rater agreement, and the most common issues with the generated content. The quality of the generated content could be further improved by refining prompt engineering, adopting more advanced models, and developing tools for automated validation. Higher-quality automatic generation would, in turn, enable other research directions, such as generating personalized exercises based on student usage data.

Author Contributions

Conceptualization, B.S. and Ł.W.; methodology, B.S. and Ł.W.; software, D.B.; formal analysis, D.B. and B.S.; investigation, D.B.; resources, D.B.; writing—original draft preparation, D.B., J.M., B.S. and Ł.W.; writing—review and editing, D.B., B.S. and Ł.W.; visualization, D.B. and B.S.; supervision, B.S. and Ł.W.; funding acquisition, B.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Silesian University of Technology, Poland, under grant number BK231/RMS3/2025. The Compass platform has been created under agreement No. MEiN/2023/DPI/1869 financed by the Ministry of Education and Science, Poland.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The described courses are openly available on the Compass platform at https://compass-edu.pl (accessed on 11 November 2025). The code used for generation, example data, and logs from generation are available at https://github.com/Lukasiewicz-EMAG/compass-edu-ai (accessed on 11 November 2025).

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A

Figures for Section 4.1.3.
Figure A1. User interface of the math interactive exercise—answer selection on the Check the result path.
Figure A2. User interface of the math interactive exercise—the first step of the Solve path with a hint displayed.
Figure A3. User interface of the math interactive exercise—the second step of the Solve path with a math toolbox displayed.
Figure A4. User interface of the math interactive exercise—the second step of the Solve path with a dropdown list displayed.
Figure A5. User interface of the math interactive exercise—the option to choose the method of continuing to solve the exercise.
Figure A6. User interface of the math interactive exercise—the step of filling in the blanks and displaying the tooltip.
Figure A7. User interface of the math interactive exercise—the last screen of the exercise with the full solution.

References

  1. Łobos, E.; Macura, J. Mathematical competencies of engineering students. In Proceedings of the International Conference on Engineering Education ICEE-2010, Gliwice, Poland, 18–22 July 2010. [Google Scholar]
  2. Schwerter, J.; Dimpfl, T.; Bleher, J.; Murayama, K. Benefits of additional online practice opportunities in higher education. Internet High. Educ. 2022, 53, 100834. [Google Scholar] [CrossRef]
  3. Zou, Y.; Kuek, F.; Feng, W.; Cheng, X. Digital learning in the 21st century: Trends, challenges, and innovations in technology integration. Front. Educ. 2025, 10, 1562391. [Google Scholar] [CrossRef]
  4. Pea, R.D. Cognitive Technologies for Mathematics Education. In Cognitive Science and Mathematics Education; Taylor & Francis Inc.: Hoboken, NJ, USA, 1987. [Google Scholar]
  5. Ritter, S.; Anderson, J.R.; Koedinger, K.R.; Corbett, A. Cognitive Tutor: Applied research in mathematics education. Psychon. Bull. Rev. 2007, 14, 249–255. [Google Scholar] [CrossRef] [PubMed]
  6. Schoenfeld, A.H. Research methods in (mathematics) education. In Handbook of International Research in Mathematics Education; Routledge: New York, NY, USA, 2008. [Google Scholar]
  7. Sweller, J. Cognitive Load During Problem Solving: Effects on Learning. Cogn. Sci. 1988, 12, 257–285. [Google Scholar] [CrossRef]
  8. Kapur, M. Productive failure. Cogn. Instr. 2008, 26, 379–424. [Google Scholar] [CrossRef]
  9. Sarsa, S.; Denny, P.; Hellas, A.; Leinonen, J. Automatic Generation of Programming Exercises and Code Explanations Using Large Language Models. In Proceedings of the ACM Conference on International Computing Education Research, Lugano, Switzerland, 7–11 August 2022. [Google Scholar] [CrossRef]
  10. Jordan, M.; Ly, K.; Soosai Raj, A.G. Need a Programming Exercise Generated in Your Native Language? ChatGPT’s Got Your Back: Automatic Generation of Non-English Programming Exercises Using OpenAI GPT-3.5. In Proceedings of the 55th ACM Technical Symposium on Computer Science Education, Portland, OR, USA, 20–23 March 2024. [Google Scholar] [CrossRef]
  11. Del Carpio Gutierrez, A.; Denny, P.; Luxton-Reilly, A. Evaluating Automatically Generated Contextualised Programming Exercises. In Proceedings of the 55th ACM Technical Symposium on Computer Science Education, Portland, OR, USA, 20–23 March 2024. [Google Scholar] [CrossRef]
  12. Logacheva, E.; Hellas, A.; Prather, J.; Sarsa, S.; Leinonen, J. Evaluating Contextually Personalized Programming Exercises Created with Generative AI. In Proceedings of the ACM Conference on International Computing Education Research, Melbourne, Australia, 13–15 August 2024. [Google Scholar] [CrossRef]
  13. Kelly, D.P.; Rutherford, T. Khan Academy as Supplemental Instruction: A Controlled Study of a Computer-Based Mathematics Intervention. Int. Rev. Res. Open Distrib. Learn. 2017, 18, 1–8. [Google Scholar] [CrossRef]
  14. French, J.; Miller, H.; Roy, A. Computer manipulatives and student engagement in an online mathematics course. In Active Learning in College Science; Mintzes, J.J., Walter, E.M., Eds.; Springer Nature: Cham, Switzerland, 2020; pp. 603–619. [Google Scholar] [CrossRef]
  15. Brzoza, P.; Łobos, E.; Macura, J.; Sikora, B.; Żabka, M. ForMath intelligent tutoring system in mathematics. In Proceedings of the 4th International Conference on Computer Supported Education, CSEDU 2012, Porto, Portugal, 16–18 April 2012. [Google Scholar] [CrossRef]
  16. Staubitz, T.; Klement, H.; Renz, J.; Teusner, R.; Meinel, C. Towards practical programming exercises and automated assessment in Massive Open Online Courses. In Proceedings of the 2015 IEEE International Conference on Teaching, Assessment, and Learning for Engineering (TALE), Zhuhai, China, 10–12 December 2015. [Google Scholar] [CrossRef]
  17. Hitz, M.; Kögeler, S. Teaching C++ on the WWW. SIGCSE Bull. 1997, 29, 11–13. [Google Scholar] [CrossRef]
  18. Đanić, M.; Radošević, D.; Orehovački, T. Evaluation of Student Programming Assignments in Online Environments. In Proceedings of the 22nd Central European Conference on Information and Intelligent Systems (Ceciis 2011), Varaždin, Croatia, 21–23 September 2011. [Google Scholar]
  19. Neuhaus, C.; Feinbube, F.; Polze, A. A platform for interactive software experiments in massive open online courses. J. Integr. Des. Process Sci. 2014, 18, 69–87. [Google Scholar] [CrossRef]
  20. Robinson, P.E.; Carroll, J. An online learning platform for teaching, learning, and assessment of programming. In Proceedings of the 2017 IEEE Global Engineering Education Conference (EDUCON), Athens, Greece, 25–28 April 2017. [Google Scholar] [CrossRef]
  21. MacNeil, S.; Tran, A.; Hellas, A.; Kim, J.; Sarsa, S.; Denny, P.; Bernstein, S.; Leinonen, J. Experiences from Using Code Explanations Generated by Large Language Models in a Web Software Development E-Book. In Proceedings of the 54th ACM Technical Symposium on Computer Science Education, Toronto, ON, Canada, 15–18 March 2023. [Google Scholar] [CrossRef]
  22. Jury, B.; Lorusso, A.; Leinonen, J.; Denny, P.; Luxton-Reilly, A. Evaluating LLM-generated Worked Examples in an Introductory Programming Course. In Proceedings of the 26th Australasian Computing Education Conference, Sydney, Australia, 29 January–2 February 2024; Association for Computing Machinery: New York, NY, USA, 2024. [Google Scholar] [CrossRef]
  23. Liffiton, M.; Sheese, B.E.; Savelka, J.; Denny, P. CodeHelp: Using Large Language Models with Guardrails for Scalable Support in Programming Classes. In Proceedings of the 23rd Koli Calling International Conference on Computing Education Research, Koli, Finland, 13–18 November 2024; Association for Computing Machinery: New York, NY, USA, 2024. [Google Scholar] [CrossRef]
  24. Speth, S.; Meißner, N.; Becker, S. Investigating the Use of AI-Generated Exercises for Beginner and Intermediate Programming Courses: A ChatGPT Case Study. In Proceedings of the 35th IEEE International Conference on Software Engineering Education and Training (CSEE&T), Tokyo, Japan, 7–9 August 2023. [Google Scholar] [CrossRef]
  25. Radošević, D.; Orehovački, T.; Stapić, Z. Automatic on-line generation of student’s exercises in teaching programming. In Proceedings of the Central European Conference on Information and Intelligent Systems, CECIIS, Varaždin, Croatia, 22–24 September 2010. [Google Scholar]
  26. Wakatani, A.; Maeda, T. Automatic generation of programming exercises for learning programming language. In Proceedings of the 14th IEEE/ACIS International Conference on Computer and Information Science (ICIS), Las Vegas, NV, USA, 28 June–1 July 2015. [Google Scholar] [CrossRef]
  27. Denny, P.; Prather, J.; Becker, B.A.; Finnie-Ansley, J.; Hellas, A.; Leinonen, J.; Luxton-Reilly, A.; Reeves, B.N.; Santos, E.A.; Sarsa, S. Computing Education in the Era of Generative AI. Commun. ACM 2024, 67, 56–67. [Google Scholar] [CrossRef]
  28. Liang, Z.; Yu, W.; Clark, P.; Zhang, X.; Kalyan, A. Let GPT be a Math Tutor: Teaching Math Word Problem Solvers with Customized Exercise Generation. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, 6–10 December 2023; Association for Computational Linguistics: Stroudsburg, PA, USA, 2023. [Google Scholar] [CrossRef]
  29. Zhang, Y. Training and Evaluating Language Models with Template-based Data Generation. arXiv 2024, arXiv:2411.18104. [Google Scholar] [CrossRef]
  30. Tonga, J.C.; Clement, B.; Oudeyer, P.Y. Automatic Generation of Question Hints for Mathematics Problems using Large Language Models in Educational Technology. arXiv 2025, arXiv:2411.03495. [Google Scholar]
  31. Zhai, C.; Wibowo, S.; Li, L.D. The effects of over-reliance on AI dialogue systems on students’ cognitive abilities: A systematic review. Smart Learn. Environ. 2024, 11, 28. [Google Scholar] [CrossRef]
  32. Zhu, P.; Chen, X.; Zhang, Z.; Li, P.; Cheng, X.; Dai, Y. AI-driven hypergraph neural network for predicting gasoline price trends. Energy Econ. 2025, 151, 108895. [Google Scholar] [CrossRef]
  33. Anderson, J.R.; Reder, L.M.; Simon, H.A. Situated Learning and Education. Educ. Res. 1996, 25, 5–11. [Google Scholar] [CrossRef]
  34. Mazur, E. Peer Instruction: A User’s Manual; Prentice Hall: Upper Saddle River, NJ, USA, 1997. [Google Scholar]
  35. Bransford, J.D.; Brown, A.L.; Cocking, R.R. How People Learn Brain, Mind, Experience, and School; National Academy Press: Washington, DC, USA, 2000. [Google Scholar]
  36. Robins, A.; Rountree, J.; Rountree, N. Learning and Teaching Programming: A Review and Discussion. Comput. Sci. Educ. 2003, 13, 137–172. [Google Scholar] [CrossRef]
  37. Medeiros, R.P.; Ramalho, G.L.; Falcão, T.P. A Systematic Literature Review on Teaching and Learning Introductory Programming in Higher Education. IEEE Trans. Educ. 2019, 62, 77–90. [Google Scholar] [CrossRef]
  38. Luxton-Reilly, A.; Simon; Albluwi, I.; Becker, B.A.; Giannakos, M.; Kumar, A.N.; Ott, L.; Paterson, J.; Scott, M.J.; Sheard, J.; et al. Introductory programming: A systematic literature review. In Proceedings of the 23rd Annual ACM Conference on Innovation and Technology in Computer Science Education, Larnaca, Cyprus, 2–4 July 2018; Association for Computing Machinery: New York, NY, USA, 2018. [Google Scholar] [CrossRef]
  39. Pieterse, V. Automated Assessment of Programming Assignments. In Proceedings of the 3rd Computer Science Education Research Conference on Computer Science Education Research, Heerlen, The Netherlands; Open Universiteit: Heerlen, The Netherlands, 2013. [Google Scholar]
  40. Ihantola, P.; Ahoniemi, T.; Karavirta, V.; Seppälä, O. Review of recent systems for automatic assessment of programming assignments. In Proceedings of the 10th Koli Calling International Conference on Computing Education Research, Koli, Finland, 28–31 October 2010; Association for Computing Machinery: New York, NY, USA, 2010. [Google Scholar] [CrossRef]
  41. Ala-Mutka, K.; Uimonen, T.; Järvinen, H.M.; Knight, L. Supporting Students in C++ Programming Courses with Automatic Program Style Assessment. J. Inf. Technol. Educ. Res. 2004, 3, 245–262. [Google Scholar] [CrossRef] [PubMed]
  42. TIOBE Software BV. TIOBE Index for February 2025. 2025. Available online: https://www.tiobe.com/tiobe-index/ (accessed on 26 February 2025).
  43. Cass, S. The Top Programming Languages 2024. IEEE Spectrum. 2024. Available online: https://spectrum.ieee.org/top-programming-languages-2024 (accessed on 26 February 2025).
  44. Mészárosová, E. Is Python an appropriate programming language for teaching programming in secondary schools. Int. J. Inf. Commun. Technol. Educ. 2015, 4, 5–14. [Google Scholar] [CrossRef]
  45. Ateeq, M.; Habib, H.; Umer, A.; Rehman, M.U. C++ or Python? Which One to Begin with: A Learner’s Perspective. In Proceedings of the 2014 International Conference on Teaching and Learning in Computing and Engineering, Washington, DC, USA, 11–13 April 2014. [Google Scholar] [CrossRef]
  46. Gordon, C.; Lysecky, R.; Vahid, F. Programming learners struggle as much in Python as in C++ or Java. In Proceedings of the ASEE Annual Conference & Exposition, ASEE Conferences, Minneapolis, MN, USA, 26–29 June 2022; Available online: https://peer.asee.org/41410 (accessed on 11 November 2025).
  47. Cyganek, B. Modern C++ in the era of new technologies and challenges—Why and how to teach modern C++? In Proceedings of the 17th Conference on Computer Science and Intelligence Systems (FedCSIS), Sofia, Bulgaria, 4–7 September 2022. [Google Scholar] [CrossRef]
  48. Standard C++ Foundation. How Do I Deal with Memory Leaks? 2025. Available online: https://isocpp.org/wiki/faq/freestore-mgmt#memory-leaks (accessed on 9 January 2025).
  49. Raj, A.G.S.; Naik, V.; Patel, J.M.; Halverson, R. How to teach “modern C++” to someone who already knows programming? In Proceedings of the 20th Australasian Computing Education Conference, Brisbane, QLD, Australia, 30 January–2 February 2018; Association for Computing Machinery: New York, NY, USA, 2018. [Google Scholar] [CrossRef]
  50. Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems; Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H., Eds.; Curran Associates: Red Hook, NY, USA, 2020. [Google Scholar]
Figure 1. Mathematical exercise schema.
Figure 2. User interface of the math interactive exercise—screen with the example problem statement.
Figure 3. The list of types of generated exercises in the course Inverse Function.
Figure 4. A flowchart of the student’s interaction with a single programming exercise. The ellipses are terminal nodes, rectangles show actions of the student and the system, and diamonds are decision nodes.
Figure 5. User interface of the programming course. The left column includes tabs with an introduction to the unit, the programming task and hints. The right column consists of the code editor, buttons for running and checking the solution, and tabs with program input and output at the bottom.
Figure 6. A flowchart presenting the process of generating programming tasks using our human-in-the-loop interactive tool.
Table 1. Overview of mathematics courses on the Compass platform.
Field          | Course
Calculus       | Equations and Inequalities
               | Inverse Function
               | Limit of a Sequence
               | Limits of Functions
               | Derivative and Its Applications
Linear algebra | Complex Numbers
               | Matrices and Systems of Equations
               | Vector Calculus
Table 2. Number of introductory courses teaching each programming language in selected Polish universities.
Language    | Number of Courses
C or C++    | 35
Python      | 11
Java        | 4
Other       | 4
Unspecified | 2
Table 3. Initial messages sent to the model API during problem generation.
Role      | Message Description
system    | A prompt like “The user will send a unit of programming course and your job is to generate a programming exercise related to that unit”.
user      | Theoretical introduction to the unit.
assistant | Problem statement of the base task (created by human).
user      | A prompt like “Very good task, generate another task related to this unit”.
Table 4. Example of messages exchanged with the model during generation of a task.
Role      | Message Description
system    | A prompt like “The user will send a unit of programming course and your task is to generate a programming exercise related to that unit”.
user      | Theoretical introduction to a unit about if and else statements.
assistant | Problem statement of the base task (created by human).
user      | A prompt like “Very good task, generate another task related to this unit”.
assistant | A problem that requires loops (not introduced yet in the course).
user      | “Your task requires using loops, which are not introduced yet. Generate a task that requires using only if and else control statements.”
assistant | A good problem which only requires using if and else.
user      | “Very good task, generate another task related to this unit.”
assistant | Another good problem.
Table 5. Example of messages exchanged with the model during generation of a task solution.
Role      | Message Description
system    | A prompt like “The user will send a programming task and you have to write a solution in C++.”
user      | The problem statement of the base task (created by human).
assistant | The solution to the base task (created by human).
user      | The problem statement generated previously by the model.
assistant | A solution which uses features not introduced yet in the course, e.g., functions.
user      | “Your solution uses functions, which are not introduced yet. Generate a solution using only the main function.”
assistant | A solution without user-defined functions.
Table 6. Example of messages exchanged with the model during generation of hints.
Role      | Message Description
system    | A prompt like “The user will send a programming task and its solution. You have to write hints that will help the student solve the task”.
user      | The problem statement and solution of the base task (created by human).
assistant | The hints to the base task (created by human).
user      | The problem statement and solution of a task generated previously by the model.
assistant | Hints to the generated task.
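The message sequence of Table 6 can be assembled programmatically before calling the model. The helper below is an illustrative sketch (the function name and the concatenation of task and solution into one user message are our assumptions); it shows how the human-written example pair precedes the newly generated task.

```python
from typing import Dict, List

def hint_messages(base_task: str, base_solution: str, base_hints: str,
                  new_task: str, new_solution: str) -> List[Dict[str, str]]:
    """Build the one-shot message sequence of Table 6 for hint generation."""
    return [
        {"role": "system",
         "content": "The user will send a programming task and its solution. "
                    "You have to write hints that will help the student solve the task."},
        # One-shot example: human-written task/solution as input, human-written hints as output.
        {"role": "user", "content": base_task + "\n\n" + base_solution},
        {"role": "assistant", "content": base_hints},
        # The generated task whose hints the model should now produce.
        {"role": "user", "content": new_task + "\n\n" + new_solution},
    ]
```

Sending this list to a chat-completions endpoint would yield the hints for the generated task as the next assistant message.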
Table 7. Number of units with generated exercises and total number of evaluated exercises for each subject in the programming course.
Subject                        | Units | Evaluated Tasks
Variables, types and operators | 3     | 6
Conditional instructions       | 3     | 9
Loops                          | 4     | 16
Arrays                         | 2     | 6
Functions                      | 3     | 9
Strings                        | 3     | 5
Structs                        | 1     | 3
I/O Streams                    | 2     | 2
Algorithms                     | 2     | 6
Introduction to OOP            | 2     | 6
Total                          | 25    | 68
Table 8. Number of correct, partially correct and incorrect task elements of each type.
Element              | Correct    | Partially Correct | Incorrect
EN problem statement | 56 (87.5%) | 8 (12.5%)         | 0 (0.0%)
PL problem statement | 49 (76.6%) | 14 (21.9%)        | 1 (1.6%)
Solution code        | 63 (98.4%) | 1 (1.6%)          | 0 (0.0%)
EN hints             | 59 (92.2%) | 4 (6.2%)          | 1 (1.6%)
PL hints             | 58 (90.6%) | 5 (7.8%)          | 1 (1.6%)
Table 9. Percentage Pass Rates of Midterm Exams.
Field of Study                    | Course   | 2023/2024 Size | 2023/2024 Pass | 2024/2025 Size | 2024/2025 Pass
Automatic Control and Robotics    | Calculus | 158            | 63.41%         | 191            | 66.88%
Electronics and Telecommunication | Calculus | 56             | 43.16%         | 37             | 45.59%
Electronics and Telecommunication | Algebra  | 53             | 52.48%         | 35             | 64.08%
MACRO                             | Calculus | 33             | 52.23%         | 28             | 57.7%
MACRO                             | Algebra  | 31             | 49.88%         | 27             | 72.81%
Table 10. Platform Usage Frequency (%).
Frequency                                 | Average
Occasionally (e.g., before a test or exam)| 39%
Rarely (about once a month)               | 3%
Several times a month                     | 39%
Frequently (several times a week)         | 19%
Table 11. Percentage breakdown of responses to the question of whether the platform is an excellent learning tool.
Learning Objective                             | Average
Preparing for in-class tests and midterm exams | 81%
Preparing for theoretical exams                | 65%
Acquiring new knowledge                        | 32%
Additional learning support                    | 84%
Not useful                                     | 0%
Table 12. Aggregated statistical comparison of pass rates between academic years 2023/2024 and 2024/2025.
Pass Rate 2023/2024 | Pass Rate 2024/2025 | Difference | 95% CI           | z-Score | p-Value | Significant (α = 0.05)
56.7%               | 64.6%               | +7.9%      | [−15.5%, −0.4%]  | −2.06   | 0.019   | Yes
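The figures in Table 12 can be reproduced with a standard two-proportion z-test, using cohort sizes summed from Table 9 (331 students in 2023/2024 and 318 in 2024/2025). The specific choices below (a pooled-variance z statistic, a one-sided p-value, and an unpooled Wald confidence interval for the difference) are our assumptions about how the aggregation was performed; they match the reported values.

```python
import math

# Aggregated pass rates from Table 12; cohort sizes summed from Table 9.
p1, n1 = 0.567, 331   # 2023/2024
p2, n2 = 0.646, 318   # 2024/2025

# Pooled two-proportion z-test for H0: p1 == p2.
pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se_pooled                        # about -2.06

# One-sided p-value via the standard normal CDF, Phi(z) = 0.5 * erfc(-z / sqrt(2)).
p_value = 0.5 * math.erfc(-z / math.sqrt(2))     # about 0.02

# 95% Wald confidence interval for p1 - p2 (unpooled standard error).
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
ci = (p1 - p2 - 1.96 * se, p1 - p2 + 1.96 * se)  # about (-0.154, -0.004)
```

The interval and z-score are negative because they refer to the 2023/2024 rate minus the 2024/2025 rate, while the "Difference" column reports the improvement in the opposite direction.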
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Borys, D.; Macura, J.; Sikora, B.; Wróbel, Ł. From Manual to AI-Driven: Methods for Generating Mathematics and Programming Exercises in Interactive Educational Platforms. Appl. Syst. Innov. 2025, 8, 174. https://doi.org/10.3390/asi8060174

