Abstract
The integration of AI tools like ChatGPT into educational assessments, particularly in the context of Multivariable Calculus, represents a transformative approach to personalized and scalable learning. This study examines the Exams as a Service (EaaS)-Flipped Chatbot Test (FCT) framework, implemented through the AIQuest platform, to explore how chatbots can support assessment processes while addressing risks related to automation and academic integrity. The methodology combines static and dynamic assessment modes within a cloud-based environment that generates, evaluates, and provides feedback on student responses. Quantitative survey data and qualitative written reflections were analyzed using a mixed-methods approach, incorporating Grounded Theory to identify emerging cognitive patterns. The results reveal differences in students’ engagement, performance, and reasoning patterns between AI-assisted and non-AI assessment conditions, highlighting the role of structured AI-generated feedback in supporting reflective and metacognitive processes. Quantitative results indicate higher and more homogeneous performance under the reverse evaluation, while survey responses show generally positive perceptions of feedback usefulness and task appropriateness. This study contributes integrated quantitative and qualitative evidence on the design of AI-assisted evaluation frameworks as formative and diagnostic tools, offering guidance for educators to implement AI-based evaluation systems.
1. Introduction
In the digital era, higher education faces several challenges in designing and evaluating assessments that are efficient, scalable, and aligned with students’ learning needs (Adeshola & Adepoju, 2023). The adoption of online platforms and advanced technologies has driven the development of innovative models that integrate automation, artificial intelligence (AI), and cloud computing into assessment processes (Jo, 2024; Thavi et al., 2024). In this context, the Exams as a Service (EaaS) model has emerged as an approach that combines AI-based tools, such as ChatGPT (OpenAI, 2024), with cloud infrastructures to automate question generation, answer evaluation, and personalized feedback (Chan et al., 2023; Navas et al., 2024b). This paradigm makes it possible to create dynamic and adaptive exam experiences while supporting teaching staff in reducing workload and improving learning outcomes (Jo, 2024).
Chatbots and generative AI models have changed educational practices (Jusoh & Kadir, 2025) in a wide variety of disciplines, including Mathematics (Ergene & Ergene, 2024; Navas et al., 2024a), Physics (Navas et al., 2025), Engineering, Medicine (Saleem et al., 2024), Business Education (Maslov et al., 2026), and Social Sciences (El Mourabit et al., 2025). These tools can offer step-by-step explanations, generate multiple variations of problems, facilitate real-time interaction, and provide immediate feedback (Calonge et al., 2023; Labadze et al., 2023). Previous studies have shown that AI-based tutoring can improve students’ self-efficacy and deepen their conceptual understanding in STEM areas (Lademann et al., 2024; Trubljanin et al., 2025; Zhuang, 2025). Furthermore, recent research shows that chatbots can adapt to each student’s level and pace, offering explanations adapted to their needs and responding immediately to their questions (Camacho Leal et al., 2025; Lee et al., 2024; Lee & Yeo, 2022).
However, this rapid integration of AI in education has also raised concerns about the risks associated with automation. Chatbots can produce inaccurate or incomplete solutions, present difficulties with symbol manipulation or mathematical proofs, and generate misconceptions that students might accept without critical analysis (Labadze et al., 2023). Excessive reliance on automated tools can inhibit critical thinking, reduce active learning, and create dependency patterns that weaken long-term learning outcomes (Lademann et al., 2024).
In this context, challenges also persist in the development of assessments for groups. The traditional design of questions and manual grading are labor-intensive and costly processes for instructors, especially in courses with large groups or high mathematical complexity (Navas et al., 2024b). In addition, ambiguity in problem statements or a lack of alignment between instruction and assessment design can reinforce misconceptions instead of resolving them.
Evidence from non-mathematics disciplines—including engineering, health sciences, and professional education—further supports this interpretation. In these contexts, AI-assisted assessment has shown greater pedagogical value when it is aligned with structured learning objectives, explicit feedback mechanisms, and opportunities for student reflection, rather than functioning as a purely automated evaluative tool. These findings are particularly relevant to mathematically intensive courses (Udias et al., 2024), where conceptual misunderstandings and procedural errors are frequent, and where feedback must foster analytical reasoning rather than superficial correctness. Consequently, insights from these disciplines reinforce the importance of designing AI-supported assessments that balance adaptability with guided cognitive engagement in mathematics education.
To address these opportunities and challenges, Navas et al. (2024b) developed the Exams as a Service (EaaS) model through the AIQuest platform. Building on this architecture, the present study introduces the Flipped Chatbot Test (FCT) (Navas et al., 2025), an approach inspired by the flipped classroom model (Bishop & Verleger, 2013; Li et al., 2018; Lujan & DiCarlo, 2014; Zou et al., 2022) and implemented within AIQuest for the teaching and assessment of Multivariable Calculus.
Despite recent advances in the integration of AI and chatbots in higher education, significant gaps remain in the literature: (1) limited understanding of how students cognitively process AI-generated feedback; (2) the absence of prior work combining automated assessment systems with dynamic question variations generated by chatbots; and (3) the lack of studies analyzing, through qualitative methods, how these AI-based assessment modes influence students’ reasoning patterns. This study addresses these gaps by analyzing static and dynamic assessments implemented through AIQuest in Multivariable Calculus using a mixed (quantitative and qualitative) methodology.
Accordingly, the purpose of the present study is to examine the pedagogical value of an AI-assisted assessment framework in Multivariable Calculus, focusing on students’ performance under different assessment modalities, their interaction with automated feedback, and their perceptions of the AIQuest system.
To guide the empirical analysis, the study is structured around the following research questions:
RQ1. How does student performance differ between a traditional assessment conducted without AI support (control condition) and an AI-assisted reverse evaluation implemented through the AIQuest platform?
RQ2. How do students perceive the usefulness, appropriateness, and level of difficulty of AI-assisted assessment and feedback provided by the AIQuest system?
RQ3. What qualitative learning patterns and feedback engagement behaviors emerge from students’ written responses during AI-assisted assessment?
2. Background
Educational research on artificial intelligence, automated assessment, and learning with chatbots has grown rapidly in recent years. This section reviews the foundational literature on AI-assisted assessment, synthesizes theories that support the pedagogical use of chatbots, and situates the present study within this body of work.
2.1. Chatbots and AI in Education
Chatbots have been adopted in STEM areas to provide explanations, generate practice exercises, and support interactive problem solving (Calonge et al., 2023; Labadze et al., 2023; Lee et al., 2024; Navas et al., 2024b, 2025). These systems provide individualized tutoring, offering students clarification and guided learning (Zhuang, 2025). Their applications extend beyond mathematics to fields such as medicine, engineering education, data science, chemistry, and law, demonstrating their interdisciplinary relevance.
Research has identified several benefits: improvements in conceptual understanding (Lademann et al., 2024), increased self-efficacy, higher motivation, and more frequent engagement in problem-solving tasks (Zhuang, 2025). However, the literature also highlights limitations such as the generation of incorrect content, errors in reasoning and symbol processing, and difficulties in verifying the reliability of AI-generated solutions (Labadze et al., 2023). These issues underscore the need for supervision and the design of robust pedagogical frameworks.
2.2. AI Risks
The increasing dependence of students on AI tools has become a critical issue. Studies indicate that excessive use can inhibit metacognition, reduce engagement in learning, and weaken the development of analytical skills (Lademann et al., 2024). Other risks identified in the literature include privacy issues, ethical considerations, lack of transparency in AI reasoning, and threats to academic integrity.
In this context, these concerns are not limited to the classroom but are reflected in international debates on the responsible use of AI in education. Several countries have issued guidelines on the use of generative AI. Comparisons have been drawn between Canada, the UK, and the USA regarding their policies (Orfanidis, 2025). Likewise, universities in Singapore have introduced frameworks that emphasize transparency and human oversight to address ethical concerns (Tan et al., 2025). In the European Union, the legal and pedagogical implications of regulatory frameworks on the use of artificial intelligence in digital education have been analyzed, with a specific focus on IT curricula (Ivković et al., 2025). Finally, in Latin America, institutional policies on academic integrity related to the use of these technologies have been developed (Meza et al., 2025).
Some of these contexts prioritize formal regulation, transparency, and risk mitigation through explicit policy frameworks (Ivković et al., 2025; Meza et al., 2025; Orfanidis, 2025), while others adopt more flexible institutional guidelines that emphasize formative use, instructor oversight, and pedagogical autonomy (Meza et al., 2025; Tan et al., 2025). Despite these differences, a shared concern emerges across regions: the need to prevent student over-reliance on AI systems and to preserve academic integrity. This convergence suggests that effective AI-assisted assessment requires hybrid models in which automated processes are systematically embedded within human-centered instructional decision-making.
2.3. AI Assessment and Prompt Engineering
Automated assessment systems have evolved significantly, integrating Natural Language Processing (NLP) techniques to evaluate free-text responses, generate feedback, and produce variations of problem statements. The EaaS model (Navas et al., 2024b) represents a recent development that incorporates cloud-based microservices to automate and personalize assessment workflows (Amaral et al., 2015). Prompt engineering plays a key role in this context. The literature emphasizes that carefully designed prompts can reduce errors, improve the clarity of feedback, and ensure that AI-generated content aligns with pedagogical objectives (Camacho Leal et al., 2025; Park et al., 2023). As generative AI becomes more prevalent, prompt engineering is increasingly recognized as a fundamental digital literacy for both instructors and students.
2.4. Theoretical Foundations
The theoretical foundation of this study integrates established educational theories—Social Cognitive Theory (Bandura et al., 2009), constructivism, and metacognitive theory—with conceptual and technological frameworks central to AI-assisted assessment, including the Exams as a Service (EaaS) model, the AIQuest platform, the Flipped Chatbot Test (FCT), and the methodological foundations of Grounded Theory (GT).
The pedagogical implications of AI-based assessment are grounded in three main theoretical frameworks. First, Social Cognitive Theory (Bandura et al., 2009) emphasizes the role of self-efficacy in students’ motivation and performance; timely and guided feedback, such as that generated by chatbots, can strengthen confidence when facing complex tasks. Second, constructivist theory (Bada & Olusegun, 2015) highlights the need for active participation and knowledge construction; AI-generated explanations and dynamic task variations can support or hinder this process depending on the degree of student reflection and autonomy. Third, metacognitive theory (Flavell, 1979; Lai, 2011) underscores the importance of monitoring, error analysis, and reflective thinking to improve learning.
Complementing these educational theories, Grounded Theory (GT) (Glaser, 1998; Glaser & Strauss, 1965) provides the methodological approach used to analyze students’ cognitive and interpretive processes, following the three classical phases of open coding, selective coding, and theoretical coding through constant comparison (Navas & Yagüe, 2022, 2023). Rather than imposing predefined categories, GT allows patterns to emerge organically from students’ written explanations and reflections. This makes it particularly suited to examining how learners interpret AI-generated feedback, negotiate ambiguity, and develop their reasoning in the context of automated assessment.
On the technological and structural side, the Exams as a Service (EaaS) model conceptualizes assessment as a scalable and automated service leveraging cloud computing, workflow orchestration, and generative AI. EaaS supports the generation of dynamic problem variations, automated grading, and personalized feedback at scale. Within this model, AIQuest serves as the operational platform that implements EaaS through a microservices architecture. AIQuest manages user roles, question banks, prompts, responses, feedback, and both static and dynamic assessment modalities, providing a controlled environment in which pedagogical strategies can be encoded into assessment workflows.
The Flipped Chatbot Test (FCT) introduces the pedagogical mechanism that integrates these concepts. In the FCT, students first evaluate a chatbot-generated solution—correcting errors, analyzing reasoning, or identifying misconceptions—before solving a final problem. This structure operationalizes Social Cognitive Theory by supporting self-efficacy through guided feedback; enacts constructivist principles by requiring active knowledge construction; promotes metacognition through structured reflection; and generates rich qualitative data that can be analyzed through Grounded Theory.
Together, these theoretical and methodological frameworks provide a coherent foundation for the present study. Social Cognitive Theory, constructivist theory, and metacognitive theory inform the pedagogical design of the EaaS and FCT frameworks. At the same time, Grounded Theory supports an analysis of students’ interactions with AI-generated feedback. This perspective allows examination of how automated assessments shape learning processes in Multivariable Calculus.
The research gaps motivating this study were outlined in the introduction and are specified here through the theoretical framework. While prior studies have examined AI-assisted feedback, formative assessment, and metacognition in isolation, there remains a lack of empirical research addressing how specific assessment designs structure students’ engagement with AI-generated feedback in mathematically intensive contexts. In particular, the existing literature provides limited insight into how reverse evaluation tasks, in which students are required to identify and correct errors, can support reflective and metacognitive learning processes when mediated by AI systems. Grounded in constructivist, social cognitive, and metacognitive theories, this study addresses this gap by operationalizing reverse evaluation as a pedagogically informed assessment strategy.
3. Methodology
3.1. Research Design
This study follows a mixed-methods research design (Agarwal et al., 2025), combining quantitative and qualitative approaches to analyze student performance, perceptions, and cognitive processes during AI-assisted assessments in Multivariable Calculus. The mixed-methods approach enables triangulation between numerical assessment results and qualitative evidence derived from students’ written explanations and feedback interactions.
Quantitative data were obtained from the numerical scores generated by the AIQuest platform across multiple attempts, as well as from responses to a post-exam Likert-scale questionnaire.
Qualitative data came from the written explanations provided by students when solving the problems, and from the feedback generated by the GPT-4-turbo model. Both were analyzed using the Grounded Theory (GT) methodology in its Glaserian version (Glaser, 1998; Navas & Yagüe, 2022), in order to identify thinking patterns, conceptual challenges, and learning strategies. Grounded Theory analysis followed an open, selective, and theoretical coding process with constant comparison. The qualitative analysis was conducted by a single researcher; in line with the Glaserian tradition, interrater reliability metrics such as Cohen’s kappa were not applied, as GT emphasizes theoretical sensitivity, constant comparison, and analytic rigor over coder agreement.
3.2. Participants and Setting
The study was conducted during the March–September 2024 academic term at Universidad Politécnica Salesiana (Ecuador) in a supervised computer laboratory environment. The sample consisted of 19 undergraduate Electrical, Mechanical, Civil, Electronics, Mechatronic, and Computer Engineering students enrolled in the Multivariable Calculus course. Of the 19 students, 12 were male and 7 were female. A non-probabilistic convenience sampling strategy was used, since all students in the course were invited to participate as part of their regular learning activities. Participation was voluntary, anonymity was guaranteed, and the data were used exclusively for academic and research purposes.
The students worked individually, connected to the AIQuest platform. The assessment sessions lasted approximately 60 min and were supervised by the course professor. The same instructions, conditions, and constraints were applied to the entire group, and the use of additional AI tools was not allowed.
3.3. Instruments and Platform
The implementation of the “Exams as a Service” (EaaS) system integrates advanced artificial intelligence technologies and cloud services to offer a scalable educational platform. The proposed architecture is deployed on Google Cloud Platform (GCP) using a virtual machine running Ubuntu Server, and it is organized into Docker containers that host both the backend (developed in Python 3.12.3 and Flask 2.1.3) and the frontend (in JavaScript using VueJS). The system features a Single Page Application (SPA) interface that provides a smooth user experience and supports mathematical formulas via the KaTeX library. All generated data—including questions, student responses, feedback, and grades—is stored in a PostgreSQL database.
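As a rough illustration of how such a containerized Flask backend can expose an assessment endpoint, a minimal sketch is provided below; the route name, payload fields, and stub response are assumptions introduced here for clarity and do not reproduce AIQuest's actual code.

```python
# Minimal sketch of a grading endpoint in a Flask backend such as the one
# described above. Route, payload fields, and the stub result are hypothetical.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/api/assessments/<int:assessment_id>/attempts", methods=["POST"])
def submit_attempt(assessment_id: int):
    payload = request.get_json(force=True)
    student_answer = payload.get("answer", "")

    # In the real platform, the answer would be combined with the stored prompt,
    # sent to the GPT-4-turbo API for grading, and persisted in PostgreSQL.
    result = {
        "assessment_id": assessment_id,
        "student_answer": student_answer,
        "score": None,          # produced by the AI evaluation step
        "feedback": "pending",  # textual feedback returned by the model
    }
    return jsonify(result), 201

if __name__ == "__main__":
    app.run(debug=True)
```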
From a research perspective, the AIQuest platform functioned as both an assessment instrument and a data collection tool, recording students’ responses, scores, attempts, and AI-generated feedback for subsequent quantitative and qualitative analysis.
3.4. Assessment Framework
This methodology is framed within the Exams as a Service Flipped Chatbot Test (EaaS-FCT) model, which combines the architecture of “Exams as a Service” (EaaS) with the flipped chatbot test (FCT) to transform traditional assessment into a dynamic, formative learning experience by clearly separating pedagogical and technical responsibilities. The model was applied in Multivariable Calculus—particularly in gradients and directional derivatives, a commonly challenging topic—to evaluate the pedagogical value of static and dynamic assessments, the clarity of chatbot-generated feedback, and students’ perceptions of learning effectiveness, while also exploring the cognitive patterns emerging during interactions with AI-generated feedback.
The EaaS-FCT framework defines three roles:
- Student: interacts directly with static and dynamic assessments on the AIQuest platform. They receive AI-generated feedback in real time on their submissions and may make multiple attempts to refine their understanding.
- Professor: acts as the strategist of the learning process. They select the topics, design the exercises and their solutions, review and adjust the problem statements, define the evaluation criteria for the AI, and use the platform’s reports to identify frequent misconceptions.
- Prompt Manager: serves as the technical link between pedagogical design and system implementation. They implement the instructor’s requirements, format the prompts in the required JSON structure, manage the database of problems and solutions (promptDB), and ensure the proper operation of the system.
Each of these roles contributes to the design and execution of both static and dynamic assessments, ensuring a robust and effective process; their interaction is illustrated in Figure 1.
Figure 1.
State diagram of AIQuest process.
The EaaS-FCT framework operationalizes formative assessment principles and provides conditions for observing metacognitive and reflective learning processes during AI-assisted problem solving.
3.5. Assessment Modes
This study implements the two types of evaluation defined by Navas et al. (2024b): static and dynamic assessment modes. Both rely on the GPT-4-turbo API, but at different stages of the assessment process.
3.5.1. Static Assessment Mode
The process begins when the student logs into the AIQuest platform and accesses the assessment. Prior to this stage, the professor plays an important role by reviewing, editing, and defining the conditions of the question that will serve as the basis for evaluation. This validated question constitutes the promptDB and is tested using the GPT-4-turbo API to ensure correctness and alignment with the learning objectives. The Prompt Manager then implements the professor’s specifications by formatting the prompt in JSON format and storing the finalized version in the database. When the student accesses the test, AIQuest retrieves the preprocessed promptDB, which remains fixed and does not require additional API processing at this stage. The student submits their response, generating the promptStu. The system then combines promptDB and promptStu into a composite prompt (promptSys), which is sent to the GPT-4-turbo API for evaluation. The API returns a numerical score (on a 0–10 scale) along with detailed textual feedback. These outputs are combined into promptAPIGrade and stored together with the student’s response in promptDBStorage, also in JSON format. The feedback is immediately presented to the student, who may attempt the assessment up to five times, fostering an iterative learning process. The platform also displays the average score across attempts, supporting self-regulation and reflective learning (Navas et al., 2024b, 2025).
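To make this flow concrete, the sketch below shows how the composite prompt (promptSys) could be assembled and sent to the GPT-4-turbo API; the prompt wording, field names, and the evaluate_attempt helper are assumptions rather than AIQuest's actual implementation, while the client call follows the publicly documented OpenAI Python interface.

```python
# Illustrative sketch of the static-mode evaluation cycle:
# promptDB + promptStu -> promptSys -> GPT-4-turbo -> promptAPIGrade.
# Field names and prompt wording are hypothetical, not AIQuest's actual schema.
import json
from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

def evaluate_attempt(prompt_db: dict, student_answer: str) -> dict:
    """Combine the stored prompt with the student's answer and grade it."""
    prompt_sys = (  # the composite prompt sent to the model
        f"Problem statement:\n{prompt_db['statement']}\n\n"
        f"Reference solution:\n{prompt_db['solution']}\n\n"
        f"Evaluation criteria:\n{prompt_db['criteria']}\n\n"
        f"Student answer:\n{student_answer}\n\n"
        "Return a JSON object with the fields 'score' (0-10) and 'feedback'."
    )
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {"role": "system", "content": "You are a calculus grading assistant."},
            {"role": "user", "content": prompt_sys},
        ],
        response_format={"type": "json_object"},
    )
    grade = json.loads(response.choices[0].message.content)
    # promptAPIGrade: score and feedback stored together with the student response
    return {"score": grade["score"], "feedback": grade["feedback"],
            "student_answer": student_answer}
```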
3.5.2. Dynamic Assessment Mode
Similar to the static mode, the process begins with the professor, who reviews the base prompt and defines the dynamic behavior to be incorporated into the promptDB. This includes specifying the types of variations or intentional modifications that the AI may introduce. The Prompt Manager incorporates these requirements, ensuring that the dynamic logic is correctly implemented, that the prompt adheres to the required JSON structure, and that it is stored in the database. Once the student accesses the assessment, AIQuest retrieves the promptDB and submits it to the GPT-4-turbo API. At this stage, the API introduces the predefined dynamic modifications, which may include adjusting numerical parameters, altering a formula or equation, or embedding an intentional conceptual error in one of the problem statements. The result of this transformation is the dynamically generated prompt (promptAPIQ), which is delivered to the student (Spasic & Jankovic, 2023; Navas et al., 2024b, 2025).
The student is then required to respond according to the modified task, identifying, correcting, or addressing the introduced changes based on the defined evaluation criteria. This dynamic mechanism allows each student to face a personalized assessment while preserving the underlying mathematical structure of the problem.
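For illustration, a dynamic-mode promptDB record might be structured as in the sketch below; the field names and rule wording are assumptions that merely mirror the variation types listed above, not AIQuest's actual JSON schema. A request built from such a record would be sent to the GPT-4-turbo API (as in the static-mode sketch) to produce promptAPIQ, the personalized statement delivered to the student.

```python
# Hypothetical dynamic-mode promptDB record; keys and rules are illustrative only.
prompt_db_dynamic = {
    "topic": "gradients and directional derivatives",
    "statement": "A hiker climbs a hill whose height is given by f(x, y) = ...",
    "solution": "Step-by-step reference solution provided by the professor ...",
    "dynamic_rules": [
        "Adjust the numerical parameters of f within a predefined range.",
        "Optionally alter a formula or equation in the statement.",
        "Embed one intentional conceptual error in one solved part.",
    ],
    "criteria": "Score 0-10; the student must identify and correct the introduced error.",
}
```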
3.6. Procedure
The core process of the EaaS-FCT framework was developed based on the methodological steps proposed in [2] and [4], and adapted to the context of the Multivariable Calculus course as follows:
1. Definition of the knowledge domain: The professor selected the subject area (Multivariable Calculus) and identified subtopics that have historically been challenging for students (for example, gradient and directional derivatives).
2. Selection and solution of exercises: The professor selected and solved a set of relevant exercises. These step-by-step solutions served as a reference for the chatbot to subsequently evaluate the students’ work.
3. Configuration of the promptDB and evaluation rules: The exercises and their solutions were stored in AIQuest’s PostgreSQL database, defining the promptDB (repository of problems and correct solutions) and the Base Prompt with the dynamic rules governing the generation and evaluation of tasks by the AI.
4. Initial student training: Before the formal assessment, students participated in a familiarization session with the platform through a static assessment, where the question format and the use of AI-generated feedback were introduced.
5. Reverse evaluation: During the formal assessment session, students submitted their answers (Student Prompt), which were combined with the correct solution and the evaluation rules to build a combined prompt sent to the GPT-4-turbo API. The system generated a numerical score and detailed textual feedback; up to five attempts per exercise were allowed to adopt an iterative learning cycle.
6. Post-assessment survey: At the end of the assessment session, students completed a questionnaire to provide quantitative and qualitative data about their experience, the perceived usefulness of the AI feedback, and the challenges they encountered. The questionnaire consisted of four Likert-scale items and one open-ended question addressing perceived difficulty, feedback usefulness, and overall learning experience.
Data generated throughout these stages were analyzed using descriptive statistics for the quantitative variables and Grounded Theory procedures for the qualitative data.
4. Case Study
This case study illustrates the practical implementation of the EaaS-FCT framework in an undergraduate Multivariable Calculus course. Its purpose is to examine how static and dynamic assessment modes operate in real classroom conditions and how students interact with AI-generated feedback during iterative assessment cycles.
This configuration was developed through collaborative efforts between course instructors and the AIQuest development team, with active student participation during assessments.
Professors selected the subject area and relevant subtopics, providing exercises tailored to their course content. These exercises had to be clearly defined and suitable for accurately evaluating student knowledge. Additionally, instructors supplied the correct solutions for each problem. The development team compiled the provided exams and conducted interviews with the professor to define functional requirements and initial conditions. Based on this input, the prompt manager configured prompts that governed exam operation, including the statement presented to students and presentation format. Once configured, the instructors were granted access to an alpha version of the platform to evaluate and provide feedback to the development team. Before student access was enabled, internal acceptance testing (IAT) was conducted to verify system functionality. This process involved uploading test cases and generating prompts to confirm seamless integration among system components, including the database, question generation engine, and grading subsystem. Instructor feedback was then incorporated through prompt modifications.
Students participated in two assessments implemented on the platform: static and dynamic. In the static mode, used as a training exercise, all students received the same standardized question. In contrast, the dynamic mode, used for the formal evaluation, relied on GPT-4 to generate personalized variations of each question, introducing specific changes in each instance.
Students were granted up to five opportunities to complete each assessment, receiving a grade and detailed feedback after each attempt. This iterative approach was designed to reinforce learning through reflection and correction, promoting deeper understanding of the subject matter.
Calculus Application
AIQuest is designed to be a domain-agnostic assessment system and has been tested across various disciplines. This study focuses on its application in Multivariable Calculus, a standard subject in many undergraduate engineering programs, including Mechanical, Electronics, Telecommunications, Mechatronics, and Civil Engineering.
The following example follows the procedure described in Section 3.6 and integrates both static and dynamic assessment modes as defined in Section 3.5.
The subtopic selected for assessment was gradients and directional derivatives, identified as particularly challenging for students in past courses. Exercises were curated and solved by the instructors, with each problem requiring interpretation of the gradient (Equation (1)) and the directional derivative of bivariate functions (Equation (2)) (Zill & Wright, 2011).
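For reference, the standard definitions that Equations (1) and (2) presumably denote, namely the gradient of a bivariate function and its directional derivative along a unit vector u, are:

\[
\nabla f(x,y) = \frac{\partial f}{\partial x}\,\mathbf{i} + \frac{\partial f}{\partial y}\,\mathbf{j},
\qquad
D_{\mathbf{u}} f(x,y) = \nabla f(x,y)\cdot\mathbf{u}, \quad \lVert\mathbf{u}\rVert = 1.
\]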
Each exercise was uploaded to AIQuest in both modes, specifying a problem statement divided into four parts, along with a corresponding solution. Instructors reviewed this initial version to identify errors and ensure alignment with pedagogical objectives. Several adjustments were made, such as reducing the number of parts and clarifying directional conventions (e.g., assigning the positive X-axis to the East and the Y-axis to the North), to simplify interpretation and minimize confusion.
Two test sessions were conducted. The first served as a training session using the static mode, shown in Figure 2. The second followed the dynamic mode and included two modified exercises. In the training session, students received the original statement with intentional errors introduced by the ChatGPT API (OpenAI, 2025; Wu et al., 2023), and were tasked with identifying and describing these inaccuracies. In the second session, students received dynamically generated problem statements by GPT-4, which they had to solve independently.
Figure 2.
First lesson: Problem statement.
This activity was defined as reverse evaluation; instead of directly solving the problem, students are required to evaluate an AI-generated solution, identify conceptual or procedural errors, and propose corrections before producing their own answer.
The general form of the bivariate function used in these problems followed the structure of Equation (3). Figure 2, Figure 3, Figure 4 and Figure 5 illustrate examples of these exercises, the system-generated feedback, and student responses.
Figure 3.
First lesson: Feedback and the best answer.
Figure 4.
Second lesson: Problem statement.
Figure 5.
Second lesson: Feedback and the best answer.
A version modified by the ChatGPT API is presented in Figure 2, which corresponds to the static interaction mode. The conceptual error is located in step 3, specifically in the final paragraph. In this paragraph, it is incorrectly asserted that the direction of steepest ascent is opposite to the gradient direction. This statement is inaccurate, as the gradient vector, by definition, points in the direction of the greatest rate of increase of a scalar bivariate function. For clarity and emphasis, the paragraph containing the error in Figure 2 has been intentionally highlighted.
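A standard argument, included here for completeness rather than drawn from the assessment materials, makes the correction explicit: since

\[
D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u} = \lVert \nabla f \rVert \cos\theta, \quad \lVert\mathbf{u}\rVert = 1,
\]

the directional derivative is maximized when θ = 0, that is, when u points in the same direction as the gradient; the direction opposite to the gradient instead yields the steepest descent.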
In the second lesson, the problem statement presents the issue without a solution, and the student is asked to solve two parts of the problem.
AIQuest modified the numerical parameters of the function within a set range of values, as shown in Figure 2. The rest of the problem statement comprises four required parts, and the solutions to these parts are provided, with an error introduced in one of them. Initially, it was established that there could be one or more errors, but multiple errors would have increased the difficulty for the students who encountered them.
The best solution a student provided to the example problem statement in Figure 2 can be found in Figure 3, where the feedback previously given by the system and the corresponding response can be observed.
AIQuest modified the context of the problem statement completely, even altering the base story on its own. As in the statement of the first lesson, the general form of the bivariate function follows Equation (3), with values drawn from a set range, as shown in Figure 4. The rest of the problem statement requires two parts to be answered by the student. The feedback and the best solution for the example in Figure 4 can be found in Figure 5.
These examples illustrate how AI-generated feedback, when combined with reverse evaluation, encourages students to reflect on their reasoning process, identify misconceptions, and improve their understanding.
5. Results
This section presents the quantitative and qualitative findings. These results are organized to distinguish between Reverse Evaluation analysis, self-reported perceptions, and qualitative patterns identified through Grounded Theory analysis.
5.1. Quantitative Analysis
5.1.1. Reverse Evaluation Analysis
Two assessments were compared: (i) Control, conducted without AI support, and (ii) Reverse Evaluation, conducted with AI through the AIQuest platform. The aim of the analysis was to contrast participants’ performance (Figure 6).
Figure 6.
Boxplot comparing participants’ performance in the Control and Reverse Evaluation assessments.
In the Control group, the mean was 1.33 and the median was 0, indicating that more than half of the students obtained a score of zero. In contrast, in the reverse evaluation, the mean reached 4.71 and the median 4.53.
Regarding dispersion, the Control assessment showed a standard deviation of 2.29, higher than that observed in Reverse Evaluation (0.97), suggesting more variable results in the non-AI condition.
The box-and-whisker plot (Figure 6) shows that in the Control assessment, the box is concentrated at the lower end. In Reverse Evaluation, the box is centered around 4–5 and displays a smaller spread, indicating more homogeneous performance.
5.1.2. Survey Results: Closed-Ended Questions
This subsection presents the quantitative analysis of the closed-ended items from the questionnaire. The analysis is based on the Likert-scale responses, including frequency distributions and percentage-based trends across participants.
The first question of the survey was: What is the level of difficulty experienced when solving the exercise using this method? Among the five possible options on the Likert scale, students selected only two. The most frequent response was “Neutral”: a majority of students (75%) evaluated the exercises as neutral, while only 25% reported a moderate level of difficulty (see Figure 7).
Figure 7.
First question: Level of difficulty.
The second question of the survey was: Do you consider this method appropriate? The overall learning experience was positively evaluated by half of the participants (50%), while 41.7% reported a neutral perception (see Figure 8).
Figure 8.
Second question: Appropriate method.
The third question of the survey was: Has the feedback provided by the AIQuest system been useful during the session? Students’ perceptions of the AIQuest feedback system were largely positive. The majority of participants rated the feedback as either Very useful (58.3%) or Extremely useful (8.3%). A smaller proportion of students selected the midpoint of the scale, Somewhat useful (8.3%), and 25% of the participants rated the feedback as Slightly useful. No student selected the lowest category, indicating the absence of strongly negative perceptions (see Figure 9).
Figure 9.
AIQuest usefulness.
The fourth question of the survey was: Was the session time appropriate? Most students (75%) responded Strongly Agree or Agree, 8.3% gave a neutral rating, and 16.7% disagreed, stating that the allocated time was insufficient (see Figure 10).
Figure 10.
Was the session time appropriate?
Also, a descriptive comparison by gender was conducted to explore potential trends. Male students (n = 12) and female students (n = 7) showed similar patterns in overall performance and survey responses, with no systematic differences observed across assessment modes. Given the small and unbalanced sample, no inferential statistical tests were performed, and results are reported descriptively.
5.2. Qualitative Analysis
This section is organized into two parts. The first part reports the findings from the survey’s open-ended question. The second part focuses on qualitative analysis of Reverse Evaluation using GT.
5.2.1. Survey Results: Open-Ended Question
The open-ended question analyzed was: How would you evaluate your learning experience under this modality? The analysis suggests that this modality is perceived as a formative environment that supports practice and the application of concepts. The responses indicate an overall positive assessment of the AIQuest experience, although accompanied by remarks related to feedback.
First, a dominant theme emerged around “reinforcement and practice of what was learned”, where several students described the modality as appropriate and reported that it helped them consolidate content covered in class and put previously acquired knowledge into practice.
A second recurring theme relates to the “development of analysis and step-by-step reasoning”. Participants reported that the modality requires them to “think through all aspects” of an answer and “detail the steps” taken, highlighting the role of reflection and procedural evaluation. However, this same feature appeared with an ambivalent perception: while some valued it as beneficial for analyzing the exercise, others associated it with greater difficulty in solving tasks or acknowledged the need to strengthen their own analytical skills, for example: “I need to analyze the problem more to solve it.” Taken together, these findings suggest that the modality promotes metacognitive processes, although not all students feel equally prepared for it.
Additionally, a theme concerning the “novelty of the experience” emerged, characterizing the modality as different from conventional approaches. These references point to a perception of innovation that may contribute to motivation, but may also require gradual adaptation by the students.
In contrast, specific critical comments were identified regarding the “need for guidance and feedback”, particularly in relation to error identification. These findings highlight opportunities for improvement related to the evaluation design and to framing errors as learning opportunities.
5.2.2. Reverse Evaluation Analysis Using GT
This subsection presents a qualitative examination of students’ written solutions and their iterative interactions with AI-generated feedback during the reverse evaluation process. The analysis, developed using a Grounded Theory perspective, focuses on identifying recurrent patterns in reasoning, persistent conceptual difficulties, and observable changes across successive attempts, providing a more detailed view of how students engaged with multivariable calculus concepts within the AI-assisted assessment environment.
Open Coding: Concept Identification
All student responses, including their original answers, the question prompts, and the feedback generated by the ChatGPT API, were coded line by line. This phase revealed the following concepts:
- Partial derivative confusion: Differentiating with respect to the wrong variable (e.g., computing ∂f/∂y instead of ∂f/∂x).
- Gradient misinterpretation: Incorrect understanding of the gradient’s direction, leading to errors in determining ascent or descent.
- Narrative over formalism: Several responses focused on descriptive storytelling (“Frodo walks east…”) rather than mathematical reasoning.
- Feedback neglect or misapplication: Some students failed to appropriately interpret or apply the feedback provided, either ignoring it or incorporating it incorrectly.
- Error persistence: Certain misconceptions or procedural errors remained stable across multiple attempts.
- Effective contextual-to-formal translation: Some students correctly translated the narrative context into mathematical expressions and vector operations.
- Progressive refinement: A few students demonstrated clear improvement by incorporating feedback iteratively.
Selective Coding: Category Formation
During selective coding, the open concepts were grouped into higher-level categories that captured recurring patterns in how students interpreted the task, engaged with AI feedback, and evolved (or failed to evolve) across attempts. Four main categories emerged, organized around a central theme.
- Participation in learning supported by feedback: Students’ performance was primarily explained by how they engaged with feedback. Two contrasting patterns were observed: (i) effective application (reflection, revision, and integration of suggestions), and (ii) ineffective application (ignoring, copying, or misapplying feedback).
- Conceptual obstacles in multivariable calculus: Persistent errors clustered around conceptual misunderstandings, particularly the interpretation of partial derivatives and gradients. These obstacles blocked progress even when feedback was available.
- Narrative-to-formal modeling: A key differentiator was the ability to translate the story-based context into formal mathematical objects (vectors, gradients, directional change). Students who remained at the narrative level tended to produce non-operational answers, while successful students made an explicit shift to formalism.
- Iterative improvement vs. no observable change: Across attempts, students followed either a refinement trajectory (incremental correction and improved precision) or a no-change trajectory (repeated miscalculations and stable misconceptions).
Theoretical Coding: Emergent Propositions
From the categories above, a set of theoretical propositions emerged to explain students’ interaction with the AI-assisted assessment environment:
1. Feedback usefulness is supported by students’ participation in learning. AI-generated feedback supports improvement only when students engage in effective application. Ineffective application is associated with persistent errors and minimal score gains.
2. Conceptual obstacles in multivariable calculus act as a bottleneck to iterative progress. Misconceptions about partial derivatives and gradients frequently persist across attempts, limiting the impact of feedback unless these concepts are explicitly addressed.
3. Change from narrative description to formal modeling predicts successful solutions. Narrative-over-formalism responses tend to remain non-operational, while effective contextual-to-formal translation enables students to produce mathematically actionable reasoning and correct procedures.
4. Students follow two dominant trajectories: iterative improvement or no observable change. Progressive refinement emerges when feedback is internalized and mapped onto correct conceptual structures; by contrast, stable misconceptions and repeated miscalculations characterize a no-change trajectory despite repeated exposure to feedback.
5. Systematic error persistence indicates that feedback alone is insufficient. Recurring partial-derivative and gradient errors suggest the need for complementary supports to help students convert feedback into conceptual change and procedural accuracy.
6. Discussion
Metacognition emerged not as a predefined variable but as an interpretive construct inferred from multiple indicators, including students’ perceptions of task difficulty, their evaluation of feedback usefulness, and the qualitative categories derived from the Grounded Theory analysis. These indicators captured different ways in which students monitored their understanding, engaged with feedback, and reflected on their own errors. The results suggest that AI-assisted assessment environments can foster metacognitive behaviors when they are designed to promote reflection and iterative engagement.
The convergence of quantitative and qualitative evidence suggests that the assessment design supported forms of self-regulated learning. Students who demonstrated progressive improvement across attempts tended to engage more actively with feedback, revisiting their reasoning and adjusting strategies rather than simply reproducing procedures.
An important implication of the mixed evidence is that learning gains appear to depend less on the presence of AI feedback and more on the quality of learners’ uptake of that feedback. The Grounded Theory results point to two contrasting trajectories: students who actively integrated suggestions and refined their reasoning across attempts versus those who ignored, copied, or misapplied feedback. Reverse Evaluation produced higher central tendency and lower dispersion than the control assessment, suggesting not only improvement but also more homogeneous outcomes when feedback was effectively applied.
At the same time, the analysis revealed that exposure to feedback alone was insufficient to guarantee conceptual change. Several students exhibited repeated errors across attempts, indicating difficulties in translating explanations into corrective action.
These patterns suggest that feedback should be treated as necessary but not sufficient for conceptual change. When students face bottleneck concepts, feedback may remain descriptive and fail to translate into corrective action.
This finding highlights an important limitation of automated feedback systems and reinforces the need for complementary instructional support, such as worked examples, reflection prompts, or professor guidance.
Another relevant element is the tension between narrative engagement and mathematical formalism. While the story-based context can increase interest, the qualitative coding indicates that some students remained at the narrative level, producing non-operational explanations instead of mapping the context to vectors, gradients, and directional change.
Finally, these results should be interpreted within the broader academic-integrity debate around generative AI in assessment. Although AI-assisted systems can enhance learning, they also introduce risks such as overreliance and reduced critical thinking. The EaaS approach partially mitigates some of these risks through structured workflows, role separation (professor/prompt manager/student), and the use of dynamic variants; however, responsible deployment still requires transparent policies, supervision strategies, and assessment designs that reward explanation quality and conceptual justification over final answers alone.
7. Conclusions
The integration of chatbots such as ChatGPT in mathematics education, particularly in multivariable calculus, presents both opportunities and challenges. While AI-driven systems can enhance learning experiences and support conceptual understanding, their educational value depends on careful pedagogical design, prompt clarity, and active student engagement. Customization and instructor supervision therefore emerge as key factors for effective implementation.
Within this context, the present study provides empirical evidence on the implementation of an AI-assisted assessment framework based on the EaaS-FCT model, illustrating how automated evaluation and feedback can be integrated into formal assessment while preserving formative intent and pedagogical control.
An important contribution of this study lies in its mixed-methods approach. Quantitative data derived from grades and surveys offered insights into students’ perceptions of task difficulty, feedback usefulness, and overall performance, while qualitative analysis based on Grounded Theory revealed cognitive patterns in students’ interactions with the system. These patterns included error repetition, reliance on narrative reasoning versus mathematical formalism, and refinement through feedback engagement. Together, these complementary data sources allowed for a more comprehensive understanding of learning processes, linking observable outcomes with students’ reflective behaviors during AI-assisted assessment.
The quantitative comparison between the control assessment and the reverse evaluation reveals a substantial difference in students’ performance when AI support is incorporated. In the control condition, the mean score was 1.33 and the median was 0, indicating that more than half of the participants were unable to successfully complete the task. In contrast, the reverse evaluation produced a higher mean score of 4.71 and a median of 4.53, along with lower dispersion (SD = 0.97 versus 2.29 in the control condition), reflecting both improved performance and more homogeneous outcomes.
Survey results further contextualize these performance differences. Most students perceived the exercises as challenging but manageable, with 75% selecting a neutral difficulty level. Half of the participants considered the method appropriate, while 41.7% expressed a neutral evaluation. The perceptions of feedback were predominantly positive, as 58.3% rated the AIQuest feedback as very useful and 8.3% as extremely useful. These findings suggest that while automated feedback can support deeper reasoning, its effectiveness depends on students’ ability to translate explanations into corrective actions, highlighting the need for complementary professor support.
Qualitative analysis revealed persistent conceptual difficulties, particularly in the interpretation of gradients and directional derivatives. Some students demonstrated repetitive errors across attempts, indicating that feedback was not always internalized, while others showed progressive conceptual refinement through iterative engagement. This contrast underscores the central role of metacognition and self-regulated learning behaviors in AI-assisted environments, as learning improvements were associated not only with feedback but with students’ capacity for monitoring, reflection, and strategy adjustment.
Beyond its technical implementation, AIQuest functioned as a formative pedagogical tool capable of reshaping the professor’s role. Rather than acting exclusively as evaluators, professors could use platform analytics to identify conceptual bottlenecks, track learning trajectories, and redesign instructional materials.
In this sense, the EaaS model operates as a diagnostic instrument that supports evidence-based instructional decision-making across assessment cycles.
Prompt engineering emerged as a relevant design factor. While narrative-based prompts increased engagement, they occasionally introduced ambiguity; in contrast, more precise and structured prompts enhanced conceptual clarity and mathematical rigor.
Overall, this study demonstrates that AI-driven assessment systems can support conceptual understanding in multivariable calculus when pedagogical strategies, prompt design, and instructor oversight are well integrated. The EaaS-FCT framework represents a new paradigm for digital assessment that may extend beyond mathematics into other STEM domains.
From an implementation perspective, this study offers practical recommendations for educators aiming to adopt AI-assisted assessment systems. Iterative feedback cycles should be intentionally limited and structured to encourage reflection rather than trial-and-error responses, supported by continuous prompt engineering through close collaboration between the professor and the prompt manager. Additionally, professors should actively review feedback logs to identify recurring misconceptions and refine instructional guidance accordingly.
8. Limitations and Future Research
Several limitations must be acknowledged. The study involved a small sample size, limiting the generalizability of the findings. Additionally, the absence of inferential statistical testing and direct measures of professor workload or learning improvements limits the strength of causal interpretations. Metacognition, although central to the analysis, was inferred from behavioral and qualitative indicators rather than assessed through standardized instruments.
Ethical considerations are also central to the use of AI-assisted evaluation systems. Although AIQuest was designed to preserve professor guidance and to use automated grading as a formative rather than decisive mechanism, risks remain relevant, including the possibility of student over-reliance on AI-generated feedback, unequal interpretation of automated responses, and limited transparency in algorithmic evaluation processes. These risks may affect students’ autonomy, fairness in assessment, and trust in evaluation outcomes. These considerations reinforce the need for clear pedagogical guidelines, human oversight, and responsible data governance when integrating AI into educational contexts.
Future research should explore larger and more diverse populations, incorporate explicit performance and workload metrics, and investigate adaptive feedback strategies aligned with students’ cognitive profiles.
Author Contributions
Conceptualization, G.N.; methodology, G.N.; software, G.E.N.-R. and J.P.-O.; validation, G.E.N.-R. and G.N.-R.; formal analysis, G.E.N.-R., G.N.-R., R.O. and J.P.-O.; investigation, G.N.; resources, R.O.; data curation, R.O.; writing—original draft preparation, G.N., G.E.N.-R. and G.N.-R.; writing—review and editing, J.P.-O., G.E.N.-R. and G.N.-R.; visualization, J.P.-O.; supervision, G.N.; project administration, G.N.; funding acquisition, G.N. and J.P.-O. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by Universidad Politécnica Salesiana.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data used in this study are accessible at https://data.mendeley.com/datasets/4ybbnppvs5/1, accessed on 20 September 2025.
Acknowledgments
We would like to express our gratitude to the IDEIAGEOCA Research Group of the Universidad Politécnica Salesiana, as well as Elena Reascos-Peñafiel, who provided invaluable assistance throughout this investigation.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Adeshola, I., & Adepoju, A. P. (2023). The opportunities and challenges of ChatGPT in education. Interactive Learning Environments, 32(10), 6159–6172. [Google Scholar] [CrossRef]
- Agarwal, V., Verma, P., & Ferrigno, G. (2025). Education 5.0 challenges and sustainable development goals in emerging economies: A mixed-method approach. Technology in Society, 81, 102814. [Google Scholar] [CrossRef]
- Amaral, M., Polo, J., Carrera, D., Mohomed, I., Unuvar, M., & Steinder, M. (2015, September 28–30). Performance evaluation of microservices architectures using containers. 2015 IEEE 14th International Symposium on Network Computing and Applications (pp. 27–34), Cambridge, MA, USA. [Google Scholar]
- Bada, S. O., & Olusegun, S. (2015). Constructivism learning theory: A paradigm for teaching and learning. Journal of Research & Method in Education, 5(6), 66–70. [Google Scholar]
- Bandura, A., Azzi, R. G., & Polydoro, S. A. (2009). Teoria social cognitiva: Conceitos básicos. Artmed Editora. [Google Scholar]
- Bishop, J., & Verleger, M. (2013, October 23–26). Testing the flipped classroom with model-eliciting activities and video lectures in a mid-level undergraduate engineering course. 2013 IEEE Frontiers in Education Conference (FIE), Oklahoma City, OK, USA. [Google Scholar]
- Calonge, D. S., Smail, L., & Kamalov, F. (2023). Enough of the chit-chat: A comparative analysis of four AI chatbots for calculus and statistics. Journal of Applied Learning and Teaching, 6(2), 346–357. [Google Scholar] [CrossRef]
- Camacho Leal, D., Capetillo, A., & Güemes Castorena, D. (2025, June 26–27). Prompting literature review: The impact of transformer models and ChatGPT on prompt engineering and AI research publications. 2025 Institute for the Future of Education Conference (IFE) (pp. 1–6), Online. [Google Scholar]
- Chan, W., An, A., & Davoudi, H. (2023, December 15–18). A case study on ChatGPT question generation. 2023 IEEE International Conference on Big Data (BigData) (pp. 1647–1656), Sorrento, Italy. [Google Scholar]
- El Mourabit, I., Jai Andaloussi, S., Ouchetto, O., & Miyara, M. (2025). Enhancing self-regulated learning through an emotional chatbot: A conceptual framework. Lecture Notes in Networks and Systems, 1310, 511–523. [Google Scholar]
- Ergene, O., & Ergene, B. C. (2024). AI ChatBots’ solutions to mathematical problems in interactive e-textbooks: Affordances and constraints from the eyes of students and teachers. Education and Information Technologies, 29(11), 13329–13364. [Google Scholar] [CrossRef]
- Flavell, J. H. (1979). Metacognition and cognitive monitoring: A new area of cognitive–developmental inquiry. American Psychologist, 34(10), 906. [Google Scholar] [CrossRef]
- Glaser, B. G. (1998). Doing grounded theory: Issues and discussions. Sociology Press. [Google Scholar]
- Glaser, B. G., & Strauss, A. L. (1965). Awareness of dying. Routledge. [Google Scholar]
- Ivković, R., Vučković, D., Ječmenić, M., Ananieva, D., Stanković, M., Petković, M., & Janković, N. (2025). The EU AI act and its contractual and educational implications: AI, education, and the law of obligations. International Journal of Cognitive Research in Science, Engineering and Education, 13(2), 123–145. [Google Scholar] [CrossRef]
- Jo, H. (2024). From concerns to benefits: A comprehensive study of ChatGPT usage in education. International Journal of Educational Technology in Higher Education, 21(1), 35. [Google Scholar] [CrossRef]
- Jusoh, S., & Kadir, R. A. (2025). Chatbot in education: Trends, personalisation, and techniques. Multimedia Tools and Applications, 84, 36919–36942. [Google Scholar] [CrossRef]
- Labadze, L., Grigolia, M., & Machaidze, L. (2023). Role of AI chatbots in education: Systematic literature review. International Journal of Educational Technology in Higher Education, 20(1), 1–32. [Google Scholar] [CrossRef]
- Lademann, J., Henze, J., & Becker-Genschow, S. (2024). Building bridges: AI custom chatbots as mediators between mathematics and physics. arXiv, arXiv:2412.15747. [Google Scholar] [CrossRef]
- Lai, E. R. (2011). Metacognition: A literature review (Pearson research report). Pearson Education. [Google Scholar]
- Lee, D., Son, T., & Yeo, S. (2024). Impacts of interacting with an AI chatbot on preservice teachers’ responsive teaching skills in math education. Journal of Computer Assisted Learning, 40(2), 543–556. [Google Scholar] [CrossRef]
- Lee, D., & Yeo, S. (2022). Developing an AI-based chatbot for practicing responsive teaching in mathematics. Computers and Education, 191, 104646. [Google Scholar] [CrossRef]
- Li, Y., Luo, W., & Zhao, X. (2018, August 8–11). Flipped classroom teaching model for engineering education based on CDIO. 2018 13th International Conference on Computer Science & Education (ICCSE) (pp. 1–4), Colombo, Sri Lanka. [Google Scholar]
- Lujan, H. L., & DiCarlo, S. E. (2014). The flipped exam: Creating an environment in which students discover for themselves the concepts and principles we want them to learn. Advances in Physiology Education, 38(4), 339–342. [Google Scholar] [CrossRef]
- Maslov, I., Poelmans, S., Wautelet, Y., & Rosenthal, K. (2026). Generative AI’s aid in business process modeling instructional design: A case study. Lecture Notes in Business Information Processing, 565, 343–357. [Google Scholar]
- Meza, J., Linzan Saltos, M. F., Velastegui Campoverde, E. U., Navarro Cejas, M. C., & Arguello Castro, V. C. (2025, June 18–20). AI regulation for Ecuador. International Conference on eDemocracy and eGovernment, ICEDEG (pp. 311–316), Bern, Switzerland. [Google Scholar]
- Navas, G., Navas-Reascos, G., Navas-Reascos, G. E., & Guaño, S. (2025). Innovating engineering education with AI: A case study on ChatGPT 4.0’s role in statics-physics. Cogent Education, 12(1), 2539545. [Google Scholar] [CrossRef]
- Navas, G., Navas-Reascos, G., Navas-Reascos, G. E., & Proaño-Orellana, J. (2024a). Exploring the effectiveness of advanced chatbots in educational settings: A mixed-methods study in statistics. Applied Sciences, 14(19), 8984. [Google Scholar] [CrossRef]
- Navas, G., Proaño-Orellana, J., Orizondo, R., & Terreros, A. (2024b, December 2–4). Exams as a service: Synergies between ChatGPT and cloud computing for education. Smart Technologies, Systems and Applications. SmartTech-IC 2024 (pp. 165–175), Quito, Ecuador. [Google Scholar]
- Navas, G., & Yagüe, A. (2022, April 25–26). Glaserian systematic mapping study: An integrating methodology. 17th International Conference on Evaluation of Novel Approaches to Software Engineering (pp. 519–527), Virtual Event. [Google Scholar]
- Navas, G., & Yagüe, A. (2023). A new way of cataloging research through grounded theory. Applied Sciences, 13(10), 5889. [Google Scholar] [CrossRef]
- OpenAI. (2024). OpenAI company. Available online: https://openai.com/ (accessed on 1 February 2025).
- OpenAI. (2025). Chat completions|OpenAI API reference. Available online: https://platform.openai.com/docs/api-reference/chat (accessed on 1 September 2025).
- Orfanidis, C. (2025). Moral diversity in institutional policies governing the student usage of generative AI: An international comparison. Higher Education Quarterly, 79(4), e70051. [Google Scholar] [CrossRef]
- Park, D., An, G. T., Kamyod, C., & Kim, C. G. (2023). A study on performance improvement of prompt engineering for generative AI with a large language model. Journal of Web Engineering, 22(8), 1187–1206. [Google Scholar] [CrossRef]
- Saleem, N., Mufti, T., Sohail, S. S., & Madsen, D. O. (2024). ChatGPT as an innovative heutagogical tool in medical education. Cogent Education, 11(1), 2332850. [Google Scholar] [CrossRef]
- Spasic, A. J., & Jankovic, D. S. (2023, June 29–July 1). Using ChatGPT standard prompt engineering techniques in lesson preparation: Role, instructions and seed-word prompts. 2023 58th International Scientific Conference on Information, Communication and Energy Systems and Technologies, ICEST 2023-Proceedings (pp. 47–50), Nis, Serbia. [Google Scholar]
- Tan, M. X. Y., Qu, Y., & Wang, J. (2025). Student perceptions of generative artificial intelligence regulations: A mixed-methods study of higher education in Singapore. Higher Education Quarterly, 79(3), e70038. [Google Scholar] [CrossRef]
- Thavi, R., Jhaveri, R., Narwane, V., Gardas, B., & Jafari Navimipour, N. (2024). Role of cloud computing technology in the education sector. Journal of Engineering, Design and Technology, 22(1), 182–213. [Google Scholar] [CrossRef]
- Trubljanin, E., Taruh, E., Čakić, S., Popović, T., & Filipović, L. (2025, March 19–21). Transforming matrix problem solving with intelligent tutoring systems. 2025 24th International Symposium INFOTEH-JAHORINA, INFOTEH 2025-Proceedings, Piscataway, NJ, USA. [Google Scholar]
- Udias, Á., Alonso-Ayuso, A., Alfaro, C., Algar, M. J., Cuesta, M., Fernández-Isabel, A., Gómez, J., Lancho, C., Cano, E. L., Martín de Diego, I., & Ortega, F. (2024). ChatGPT’s performance in university admissions tests in mathematics. International Electronic Journal of Mathematics Education, 19(4), em0795. [Google Scholar] [CrossRef] [PubMed]
- Wu, T., He, S., Liu, J., Sun, S., Liu, K., Han, Q.-L., & Tang, Y. (2023). A brief overview of ChatGPT: The history, status quo and potential future development. IEEE/CAA Journal of Automatica Sinica, 10(5), 1122–1136. [Google Scholar] [CrossRef]
- Zhuang, Y. (2025). Lessons from using ChatGPT in calculus: Insights from two contrasting cases. Journal of Formative Design in Learning, 9, 25–35. [Google Scholar] [CrossRef]
- Zill, D. G., & Wright, W. S. (2011). Cálculo de varias variables. McGraw Hill. [Google Scholar]
- Zou, D., Luo, S., Xie, H., & Hwang, G. J. (2022). A systematic review of research on flipped language classrooms: Theoretical foundations, learning activities, tools, research topics and findings. Computer Assisted Language Learning, 35(8), 1811–1837. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.