Article

Enhancing the Learning Experience with AI

1 Engineering Faculty, Constantin Brancusi University, 210160 Targu-Jiu, Romania
2 National Bank of Romania, 030031 Bucharest, Romania
3 Computer Science Laboratory, Ecole Polytechnique, 91120 Palaiseau, France
4 AI Research Laboratory, 031704 Bucharest, Romania
5 Gavanescu Laviniu PFA, 030804 Bucharest, Romania
6 Faculty of Mathematics and Informatics, University of Bucharest, 010014 Bucharest, Romania
7 Faculty of Entrepreneurship, Business Engineering and Management, Polytechnic University of Bucharest, 060042 Bucharest, Romania
* Authors to whom correspondence should be addressed.
Information 2025, 16(5), 410; https://doi.org/10.3390/info16050410
Submission received: 25 March 2025 / Revised: 26 April 2025 / Accepted: 6 May 2025 / Published: 16 May 2025
(This article belongs to the Section Artificial Intelligence)

Abstract

The exceptional progress in artificial intelligence is transforming the landscape of technical jobs and the educational requirements associated with them. The purpose of this study is to present and evaluate an intuitive open-source framework that transforms existing courses into interactive, AI-enhanced learning environments. Our team performed a study of the proposed method's advantages in a pilot population of teachers and students, who assessed it as "involving, trustworthy and easy to use". Furthermore, we evaluated the AI components on standard large language model (LLM) benchmarks. This free, open-source, AI-enhanced educational platform can be used to improve the learning experience in all existing secondary and higher education institutions, with the potential of reaching the majority of the world's students.

1. Introduction

Artificial intelligence has developed in recent years at a remarkable pace, a growth that has influenced and continues to influence many fields, education being no exception. In this article, we highlight the practical potential of AI to revolutionize secondary and higher education, modifying and improving existing pedagogical approaches by creating motivating, effective, and engaging learning environments.
In Section 2, we study the current advances in teaching and evaluation methods, assessing the benefits of these enhanced methods and their reach in the real world as a proportion of the total student population. We found that less than 10% of the global education system is benefiting from these advanced techniques (see Section 2.3).
We emphasize a major issue in education: most educational institutions lag in adopting new technologies in their learning methods. While top universities are able to pioneer the newest educational tools (for example, most Ivy League universities take advantage of online courses, laboratories, and AI-assisted student assistants), this is not the case for the majority of the 1.5 million secondary-level [1] and 25,000 higher education institutions in the world. In this large majority of educational institutions, which serve 90% of the world's students, teaching and evaluation methods are still lagging.
At the same time, the majority of students nowadays prefer electronic formats for course distribution; in plain words, students would rather learn on a phone, tablet, or computer (see Section 2) [2,3,4]. Currently, however, most course materials—main courses, laboratory courses, seminars, and exam questions—are available as PDF files, Word documents, or printed books, formats that are not well suited for direct use in computer- or AI-assisted learning methods.
We assess the main causes of and barriers to the introduction of these advanced methods, including low funding, the lack of technical experience of the existing teaching corps, and low student access to computing resources.
To address these major issues, we introduce in Section 3 an open-source, one-click platform that enables teachers across all these universities to transform their traditional courses into an AI-enhanced learning experience, regardless of their technical expertise. Additionally, this platform is accessible to students from any browser on a computer, tablet or smartphone.
In Section 4, we evaluate the proposed platform through two experiments: (a) a pilot study on a population of teachers and students to assess application usability and correctness; (b) human and automated tests on standard LLM benchmarks to assess the AI components’ performances.
In Section 5, we discuss the solution’s benefits and the issues it solves, before presenting the closing thoughts in Section 6.

2. Review of New Methods Used in Education

2.1. Computer-Aided Methods in Education

Computer-Assisted Methods in Education (CAME) refer to instructional technologies and strategies that leverage computers to enhance teaching and learning. Emerging in the 2000s with the advent of the internet, these methods have since been extensively developed and widely adopted. They offer several benefits, including individualized learning, increased accessibility for students with disabilities, enhanced engagement, the opportunity for self-paced study, and the provision of immediate feedback [5,6].
CAME uses different teaching techniques and technological tools that seek to improve the educational process, of which we discuss the most important: gamification, microlearning, Virtual Reality and, more recently, artificial intelligence (AI).
Virtual Reality (VR) is combined with Augmented Reality (AR) in teaching to completely transform educational experiences by providing dynamic and attractive contexts. AR enhances real-world experiences by superimposing digital data over the surrounding world, while VR lets students explore 3D worlds.
Another computer-based learning method is gamification, which integrates specific game elements into the educational context. This approach makes learning activities more participatory and engaging. The game-like competition that gives students a sense of satisfaction is built from simple elements such as leaderboards, points, and badges [7]. The market for gamification offerings in education has grown significantly, from USD 93 million in 2015 to almost USD 11.5 billion by 2020, demonstrating its growing acceptance in the industry [6]. This upward trajectory has continued, with the market reaching approximately USD 13.50 billion by 2024, and it is projected to grow to USD 42.39 billion by 2034 [8].
Microlearning is becoming a major trend in the educational landscape, standing out by providing solutions to problems such as cognitive overload and teacher fatigue. This method provides students with short, specific content when it suits them. According to some studies [6,9], microlearning segments of between two and five minutes are very effective in terms of engagement and retention of new information. Microlearning is also well suited to mobile devices, as it allows learners to interact "on the go" and access knowledge at the right moments [6].
In recent years, artificial intelligence has been added to these modern methods of education, especially in adaptive learning systems that provide content tailored to the needs of each student; this personalization improves the overall learning process and increases its effectiveness [9,10].
AI already plays a significant role in education and is continually developing, resulting in intelligent learning guidance programs that adapt to the needs of each student and in complex adaptive learning systems [9,10].

2.2. AI in Education

AI has significant potential to transform both teaching and learning in education. AI solutions can automate administrative duties, tailor customized learning pathways, provide adaptive feedback, and overall create more engaging and efficient educational experiences.
In general education, AI has been used to develop personalized learning platforms, intelligent tutoring systems, and even automated essay scoring, though the latter is still at an incipient stage, as in Pearson's AI study tool [11].
From kindergarten to university level, institutions face the challenge of adapting to diverse student populations with varying learning styles and paces. Here, AI can help by providing the following:
  • Personalized Learning/Adaptive Feedback: AI algorithms can personalize learning by assessing student performance, providing instant feedback, and adjusting the pace, content, and evaluation to cater to each learner’s needs [4,5].
  • 100% availability for answers from courses: Similar to a teacher providing answers to students during the main class, an AI solution can provide answers from the course materials to an unlimited number of students 24/7.
  • Automated Grading: AI can automate tasks such as assignment grading and providing feedback to student inquiries, thus liberating faculty time for more in-depth student interactions.
  • Enhanced Research: In higher education, AI tools assist researchers [6] in analyzing large datasets and phenomena, being an effective companion in extracting patterns and generating new ideas [7,9].
  • Improved Accessibility: AI-powered solutions can be enhanced with text-to-speech and speech-to-text features, providing support to students with disabilities.
There are five main directions in the domain of artificial intelligence (AI) application in education: assessment/evaluation, prediction, AI assistants, intelligent tutoring systems (ITSs), and the management of student learning [12]; each demonstrates potential for innovation in the education sector. We discuss below the ITSs, AI assistants and AI evaluators.
An ITS is an AI system which provides personalized adaptive instruction, real-time feedback, and tailored learning experiences to support student progress [13,14]. An AI assistant is generally a simple chatbot which can answer student questions about the class much as a teaching assistant would. The assessment/evaluation AI grades students' test answers.
We can find in the existing literature different studies that present the benefits and use-cases for AI across various educational domains such as the social sciences [15], engineering [16], science [17], medicine [18], the life sciences [19], and language acquisition [20], among others [21,22].
In recent years, the applications of artificial intelligence (AI) in the educational sector have been extensively explored, focusing on chatbots [23], programming assistance [24,25], language models [26,27], and natural language processing (NLP) tools [28]. More recently, the introduction of OpenAI’s Generative artificial intelligence (AI) chatbot, ChatGPT 4.0 [28] and competitor models such as Gemini [29], Claude [30] or LLAMA [31] has attracted considerable interest [32,33]. These chatbots are based on large language models (LLMs), trained on datasets of 100 billion words [25], and have the capability to process, generate and reason in natural languages [26,27,28] at human-expert-equivalent level. As a result of their high performance, AI chatbots have gained widespread popularity [34,35], and they are now starting to benefit diverse fields, including research [36,37,38] and education [39].
In conclusion, both traditional algorithmic tools and AI technologies are used to improve the quality of the education process, including learning and evaluation.

2.3. Estimation of the Usage of New Methods in Education

Research on students’ attitudes and behavior about paper vs. digital learning has become a fascinating area of study. In this section, we present an overview of the current split of the supports and tools used in education and its evolution over time.
  • Paper-Based Materials (Textbooks and Print Handouts): Traditionally, nearly all instructional supports were paper-based. Meta-analyses in education [40] indicate that, in earlier decades, printed textbooks and handouts could represent 70–80% of all learning materials. Over the past two decades, however, this share has steadily decreased (now roughly 40–50%) as digital tools have been integrated into teaching.
  • Computer-Based Supports (Websites, PDFs, and Learning Management Systems): Research [41,42] in the COVID-19 pandemic period demonstrates that Learning Management Systems (LMSs) and other computer-based resources (including websites and PDFs) have increased from practically 10% to about 40–50% of the educational supports in some settings. This evolution reflects both improved digital infrastructure and shifts in teaching practices.
  • Smartphones and Mobile Apps: Studies [43] in the early 2010s reported very limited in-class smartphone use. Over time, however, as smartphones became ubiquitous, more recent research [44] shows that these devices have now grown to roughly 20–30% of learning interactions. This growth reflects both increased mobile connectivity and the rising popularity of educational apps.
  • Interactive Digital Platforms (Websites, Multimedia, and Collaborative Tools): Parallel to the growth in LMS and mobile use, digital platforms that incorporate interactive multimedia and collaborative features have also expanded. Meta-analyses [45] indicate that while early-2000s classrooms saw digital tool usage in the order of 10–20%, today, these platforms now comprise roughly 30–40% of the overall learning support environment. This trend underscores the increasing importance of online content and real-time collaboration in education.
These studies show an evolution from a paper-dominant model toward a blended environment where computer-based resources and mobile devices have grown significantly over the past two decades. Still, each mode of support plays a complementary role in modern education, and many studies also show that paper is still a preferred medium, especially from the point of view of reading experience [46].
For example, large-scale international surveys (10,293 respondents from 21 countries [47] and 21,266 participants from 33 countries [48]) have consistently indicated that most college students prefer to read academic publications in print. These same studies found a correlation between students’ age and their preferred reading modes, with younger students favoring printed materials. A qualitative analysis of student remarks reported in [49] indicates that students’ behavior is flexible. Students usually learn better when using printed materials, albeit this relies on several criteria, including length, convenience, and the importance of the assignments [50].

2.4. The Shift Towards Using AI Tools

While the above papers show that a substantial percentage of students still prefer paper-based materials, in the last one to two years we have seen a large shift towards AI-based tools, especially for school assignments and in countries with greater access to technology.
Based on several recent studies and surveys, we can estimate that up to 40% of US students—across both secondary and higher education—use AI-based educational tools. To illustrate this shift more concretely, several key surveys are highlighted below: (a) Global trends: A global survey conducted by the Digital Education Council (reported by Campus Technology, 2024) [2] found that 86% of students use AI for their studies. This study, spanning multiple countries including the US, Europe, and parts of Asia, highlights widespread global adoption. (b) United States surveys: In the United States, an ACT survey [3] of high school students (grades 10–12) reported that 46% have used AI tools (e.g., ChatGPT) for school assignments. A survey by Quizlet (USA, 2024) [51] indicates that adoption is even higher in higher education, with about 80–82% of college students reporting that they use AI tools to support their learning. (c) Additional studies: A quantitative study [4] involving global higher education institutions found that nearly two-thirds (approximately 66%) of students use AI-based tools for tasks such as research, summarization, and brainstorming.
Together, these findings suggest that while usage rates vary by education level and region (with higher rates in the US), there is a continuing global trend towards integrating AI-based educational tools in schools and universities.

2.5. Chatbots in Education

2.5.1. Chatbots: Definition and Classification

A chatbot application is, in simple terms, any application, usually web-based, that can chat with a person in a way similar to how a human does, answering user questions and following the history of the conversation.
Chatbot applications can be classified [52] based on distinct attributes such as the knowledge domain, the service provided, the objective (goal), and the response-generation method [53], as summarized in Table 1. Each classification highlights specific characteristics that determine how a chatbot operates and interacts with users.
Based on the knowledge domain the apps can access, we classify chatbots as (a) open-knowledge-domain (general-purpose) and (b) closed-knowledge-domain (domain-specific) bots. Open chatbot applications address general topics and answer general questions (like Siri or Alexa). Closed chatbots, on the other hand, rely on a specific knowledge domain and answer questions only from that domain [53,54].
Based on the service provided, we can classify the bots as (a) informational, when they merely provide known information; (b) transactional, when they can handle actions like bookings; or (c) assistants, when their role is to provide assistance in a way similar to a support person [53].
Based on their fulfilled objective/goal, bots can be designed to either complete specific user tasks (task-oriented), engage in a human-like dialogue (conversational), or be specialized in learning/training.
Finally, based on how the chatbot generates the responses from the input [55], the chatbots can be rule-based, can leverage a full AI architecture, or use a combination of the two (hybrid) [53].

2.5.2. Chatbots: Structure and Role in Education

A chatbot is an application, implemented programmatically or leveraging Generative AI [55,56], that understands and answers questions from human users (or chatbots) in natural language [57] on a particular topic or general subject, by text or voice [58,59].
Figure 1 illustrates the general workflow for interacting with a chatbot that integrates natural language processing (NLP) with a knowledge base retrieval system. The process begins when the chatbot receives input from the user, whether as text, voice, or both. This input is converted into text and forwarded to the NLP component, which processes and comprehends the query. The response-generation component then uses different algorithms to process the existing knowledge base and provides a set of candidate responses to the response selector. In this step, the answer selector uses machine learning and artificial intelligence algorithms to choose the most appropriate answer for the input [60]. The current trend is to move towards a more streamlined system that consolidates the process into fewer steps (Question–AI Model–Response), albeit at a higher cost per question, as shown in Figure 1b.
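To make the two flows concrete, the following is a minimal Python sketch of the pipelines in Figure 1a,b; all function names are hypothetical placeholders, not components of a specific chatbot framework.

```python
from typing import Callable, List

def retrieval_pipeline(user_input: str,
                       parse_query: Callable[[str], str],
                       search_knowledge_base: Callable[[str], List[str]],
                       select_answer: Callable[[str, List[str]], str]) -> str:
    """Flow of Figure 1a: NLP parsing, knowledge-base search, answer selection."""
    query = parse_query(user_input)              # NLP component
    candidates = search_knowledge_base(query)    # candidate responses
    return select_answer(query, candidates)      # ML-based response selector

def consolidated_pipeline(user_input: str,
                          ask_llm: Callable[[str], str]) -> str:
    """Flow of Figure 1b: a single LLM call replaces the intermediate steps,
    at a higher cost per question."""
    return ask_llm(user_input)
```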

2.5.3. Educational Chatbots Survey

In the subclass of specialized service-oriented applications [57], we can include educational chatbots. We present a short review of the most common educational chatbots in Table 2.
As seen above, the reach of AI in education is high and increasing. Still, we face major barriers, such as the following:
  • Insufficient training and digital literacy among educators. For instance, ref. [4] found that many higher education teachers across several countries felt unprepared to fully leverage AI due to inadequate institutional training and support.
  • High implementation costs, especially in institutions with lower resources [68].
  • Lack of clearly defined and easily applicable policies for the ethical adoption of AI components as well as concerns about data privacy and fears of algorithmic bias.
These obstacles foster skepticism, and they need to be addressed for the effective introduction of AI tools in education.
We reviewed the split of different supports in education and presented the benefits and challenges of computer-aided AI solutions for education. These limitations can be overcome by the AI-enabled open-source framework we propose in Section 3.

3. Materials and Methods

3.1. Proposed Solution Description

Traditional educational approaches often struggle to address the diverse needs of individual learners, particularly in the science, technology, engineering, and mathematics (STEM) domain. Limited instructor availability, inconsistent feedback, and a lack of personalized learning experiences can hinder student progress and engagement [69]. To address these challenges, we developed an AI-powered teaching assistant designed to improve the learning process in university-level programming courses. This framework is open-source and can be used by any instructor and student without any technical skills required.

3.2. High-Level Description

Our application consists of four student modules and two supervisor/teacher modules.
Student-facing modules
Module 1—AI Teaching Assistant. In this module, the student has access to the course material and to an AI assistant (chatbot) who can answer questions related to the contents of the course material.
Module 2—Practice for Evaluation. In this module, students can prepare for the final assessment using “mock assessments”, where the questions generated by the AI are similar to but different from those of the final assessment. The student is tested on multiple-choice questions, open-text questions, and specific tasks (for example, computer programming). The AI evaluator provides immediate feedback on the student’s answers, suggesting improvements and providing explanations for correct answers. It is worth mentioning that the AI evaluator can also redirect students to Module 1 to review the relevant course information and seek further clarification from the AI assistant.
Module 3—Final Evaluation. After students complete several practice evaluations, they have the option to take the final evaluation. The module is similar to the practice for evaluation module, but it represents the “real exam”; questions are created by the teacher (or generated by AI and validated by the teacher) and graded by the AI evaluator, and the final grades are transcribed in the official student file. The feedback and grades can be provided instantly or be delayed until the teacher double-checks the evaluation results.
Module 4—Feedback. This module allows students to provide optional feedback related to their experience using this AI application. This feedback is essential for further development and improvement of this framework.
Supervisor modules
Module 5—Setup. Teachers are provided with a simple interface allowing them to carry out the following:
  • Add a new course;
  • Drag-and-drop the course chapter documents (in PDF, DOC or OPT format);
  • Drag-and-drop a document containing exam questions or opt for automatic exam question generation;
  • Optionally upload Excel files with the previous year's student results for statistical analysis.
Module 6—Statistics. This module extracts statistics on the year-on-year variation in students’ results. We use it to evaluate the efficiency of this teaching method vs. the previous year’s results in the classical teaching system (See Section 4.1.3).

3.3. Description of the User Interface

The user interface of our application is presented below in Figure 2:
The UI is divided into three frames, as shown in Figure 2. The left frame is for navigation, the center frame displays the course material (PDFs) or exam questions, and the right frame gives access to the AI assistant or AI evaluator.
Depending on the module they choose, the UI provides the following functionalities:
- Module 1—AI Assistant: In this module, the student selects in the left frame one of the course chapters, which is displayed in the center frame. Then, the student can ask the AI assistant questions about the selected chapter in the right frame.
- Modules 2 and 3—Preparation for Evaluation/Final Evaluation: These modules share a similar UI and present students with exercises, questions, and programming tasks in the middle frame. In the right frame, the chatbot provides feedback on student responses or links to Module 1 so the student can review the course documentation.
- Module 4—Feedback Survey: The center frame displays a form for evaluating the application.

3.4. Architecture Details of the Platform

In this section, we provide a detailed description of the architecture of our solution (See Section 3.4.1) and details of the AI components’ architecture (See Section 3.4.2).

3.4.1. General Software Application Architecture and Flow

The application leverages a multi-layered architecture comprising a Frontend, a Backend, and external services, orchestrated within a containerized environment.
The architecture is highly modular, with clear separation of roles between Frontend, Backend, and LLM components. This allows for easy extension and modification (e.g., changing the LLM).
The solution can be deployed via Docker on any server; in our case, it is deployed on a Google Cloud Run [70] instance. Docker is a containerization technology which simplifies deployment and ensures consistent behavior across different environments.
To understand the architecture of the application, we illustrate in Figure 3 the main components of the application and the usual dataflow in the case of Module 1. Briefly, the student asks a question through the Frontend (UI) component of the application, which runs in a Docker container in Google Cloud. The question is processed by the Backend, augmented with the course-relevant context, and then processed by an external LLM, and the answer is sent back to the student.

3.4.2. AI Components/Modules’ Architecture

The AI components are implemented in Modules 1, 2, and 3 of the Backend components.
Module 1—AI Assistant
To maintain its independence from any LLM provider, the Backend architecture for Module 1 is centered around three base concepts:
  • LLM Protocol: This is an interface which describes the minimal conditions that an LLM needs to implement to be usable; in this case, it should be able to answer a question.
  • RAG Protocol: This is an interface which describes what the Retrieval Augmented Generation (RAG) pattern should implement. The main idea [71] is to use a vector database to select possible candidate passages from the documentation (PDF course support), which are provided as context to an LLM so it can answer the student's question. The objects implementing the RAG Protocol provide functions for the following:
    • Context Retrieval—given a question, retrieves the candidate context from the vector DB.
    • Embedding Generation—a helper function to convert (embed) text into a vector representation.
    • Similarity Search—performs similarity searches within the vector store database to find the most relevant chunks.
  • LLM RAG (Figure 4) is a class which contains the following:
    • An object ‘rag’ which implements the RAG Protocol (for example, RAGChroma) to store and retrieve relevant document chunks (content) for the question.
    • A function to augment the question with the context recovered from the ‘rag’ object.
    • An object ‘llm’ which implements the LLM Protocol (for example, LLMGemini) to answer the augmented question.
The flow of a question in LLM-RAG is shown in Figure 5. The question from a student is sent to LLM_RAG. In turn, LLM_RAG calls the RAGChroma component for context. LLM_RAG then creates an augmented question in a format similar to "Please answer {question} in the context {context extracted by rag}". This question is sent to the LLMGemini component, and the answer, optionally enhanced with snippets from the course material, is sent back to the student.
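The following is a minimal sketch of how the three concepts above could be expressed in Python; the class and method names (LLMProtocol, RAGProtocol, LLMRAG, retrieve_context, answer) are illustrative and do not necessarily match the repository's actual interfaces.

```python
from typing import List, Protocol

class LLMProtocol(Protocol):
    def answer(self, question: str) -> str: ...        # minimal LLM capability

class RAGProtocol(Protocol):
    def embed(self, text: str) -> List[float]: ...                           # Embedding Generation
    def retrieve_context(self, question: str, k: int = 4) -> List[str]: ...  # Similarity Search

class LLMRAG:
    """Combines a RAG implementation (e.g., a Chroma-backed class) with an LLM
    implementation (e.g., a Gemini wrapper), as described in the text."""

    def __init__(self, rag: RAGProtocol, llm: LLMProtocol) -> None:
        self.rag = rag
        self.llm = llm

    def augment(self, question: str) -> str:
        # Build the augmented prompt of the form shown in Figure 5.
        context = "\n".join(self.rag.retrieve_context(question))
        return (f"Please answer the question: {question}\n"
                f"Use only the following context:\n{context}\n"
                "If the answer cannot be deduced from the context, say so.")

    def answer(self, question: str) -> str:
        return self.llm.answer(self.augment(question))
```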
Module 2—AI Practice for Evaluation and Module 3—AI Evaluation
Both Modules 2 and 3 are implemented around the evaluator protocol concept: an interface which describes how an evaluator (judge) should grade a question and the feedback it should provide to the student, similar to the LLM-as-a-judge concept used in HELM and Ragas [72,73].
The evaluator protocol implementations provide functions for evaluating user responses, including the following (a minimal sketch follows this list):
  • Evaluate Answer: Compares the user's free-text or single-choice answers with the correct answers.
  • Evaluate Code Answer: Executes user-provided code snippets and compares the results against the correct code.
  • Calculate Score: Calculates the overall score based on a list of responses.
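Below is a minimal sketch of such an evaluator, assuming illustrative method names; a production implementation would replace the literal string comparison with an LLM-as-a-judge prompt and sandbox the code execution.

```python
import subprocess
import sys
import tempfile
from typing import List, Protocol, Tuple

class EvaluatorProtocol(Protocol):
    def evaluate_answer(self, student_answer: str, reference: str) -> Tuple[float, str]: ...
    def evaluate_code_answer(self, student_code: str, expected_output: str) -> Tuple[float, str]: ...
    def calculate_score(self, question_scores: List[float]) -> float: ...

class SimpleEvaluator:
    """Toy evaluator: exact matching stands in for the LLM judge used in Modules 2 and 3."""

    def evaluate_answer(self, student_answer: str, reference: str) -> Tuple[float, str]:
        if student_answer.strip().lower() == reference.strip().lower():
            return 1.0, "Correct."
        return 0.0, f"Incorrect; expected something like: {reference}"

    def evaluate_code_answer(self, student_code: str, expected_output: str) -> Tuple[float, str]:
        # Run the submitted snippet and compare its printed output with the expected one.
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(student_code)
            path = f.name
        result = subprocess.run([sys.executable, path], capture_output=True, text=True, timeout=10)
        if result.stdout.strip() == expected_output.strip():
            return 1.0, "Output matches the reference."
        return 0.0, f"Output differs; got: {result.stdout.strip()!r}"

    def calculate_score(self, question_scores: List[float]) -> float:
        return sum(question_scores) / len(question_scores) if question_scores else 0.0
```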

3.5. Technical Implementation of the Platform

We present the details of the technical implementation of our platform. As outlined in the architecture description in Section 3.4, the application is divided into a Frontend and a Backend, which are packaged in a Docker image deployed on Google Cloud Run. We use an instance with 8 GB of memory for tests, but 2 GB should be enough for normal usage.
All components are a combination of standard Python code and LLMs specifically prompted for question/answer, evaluator, or assistant mode.

3.5.1. Frontend Implementation

The Frontend is based on Streamlit, an open-source Python framework (running on Python 3.11) used for building and deploying data-driven web applications.
The Frontend component implements the following:
  • Session Management: Manages user sessions and state. This includes handling Google OAuth 2.0 authentication.
  • User Interface: Provides the user interface for interaction, including chapter selection, course navigation, dialog with the AI, evaluation, and feedback surveys:
    • Navigation: Uses a sidebar for primary navigation, allowing users to select chapters/units and specific course materials.
    • Dialog Interaction: Renders a dialog zone where users interact with the AI assistant. This includes input fields and the display of AI responses.
    • Evaluation Display: Presents evaluation results to the user.
    • Styling: Streamlit themes and custom CSS.
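As an illustration, the sub-components listed above map naturally onto a Streamlit page of the following shape; the chapter names and the placeholder answer are purely illustrative, and the real application wires the question input to the Backend LLM-RAG call.

```python
import streamlit as st

st.set_page_config(page_title="AI Teaching Assistant", layout="wide")

# Sidebar navigation: chapter/unit selection (left frame).
chapter = st.sidebar.selectbox("Chapter", ["1. Introduction to SQL", "2. Joins", "3. PL/SQL"])

course_frame, chat_frame = st.columns([2, 1])

with course_frame:  # center frame: course material
    st.header(chapter)
    st.write("The selected chapter's PDF content would be rendered here.")

with chat_frame:    # right frame: dialog with the AI assistant
    st.subheader("AI assistant")
    if "history" not in st.session_state:   # session management
        st.session_state.history = []
    with st.form("question_form", clear_on_submit=True):
        question = st.text_input("Ask a question about this chapter")
        submitted = st.form_submit_button("Ask")
    if submitted and question:
        answer = "placeholder answer"        # real app: call the Backend (LLM-RAG)
        st.session_state.history.append((question, answer))
    for q, a in st.session_state.history:
        st.chat_message("user").write(q)
        st.chat_message("assistant").write(a)
```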

3.5.2. Backend Implementation

The Backend is modular and implements all the components described in Section 3.2, with differences between Modules 1–3 (AI) and Modules 4–6 (Feedback, Setup, and Statistics).
Modules 1-3 implement the AI components. To implement the LLM Protocol, we used as backbone mainly Gemini versions 1.0, 1.5 Flash, 1.5 Pro, 2.0 Flash and 2.0 Pro [74,75,76] from Google Vertex AI [77], but we also tested ChatGPT and Claude Sonnet [78,79].
For the LLMRAG, we used a Chroma DB implementation, but Microsoft Azure AI Search [79] or Vertex AI Search [80] could be substituted based on preference.
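As an illustrative sketch (not the repository's exact code), indexing and querying course chunks with Chroma could look as follows; the collection name, document IDs, and texts are placeholders.

```python
import chromadb

client = chromadb.PersistentClient(path="./course_db")          # on-disk vector store
collection = client.get_or_create_collection("course_chapters")

# Setup phase: index the course chunks produced from the uploaded PDFs.
collection.add(
    ids=["ch1-chunk1", "ch1-chunk2"],
    documents=["A primary key uniquely identifies each row of a table...",
               "A foreign key references the primary key of another table..."],
    metadatas=[{"chapter": 1}, {"chapter": 1}],
)

# Question phase: retrieve the most relevant chunks as context for the LLM.
results = collection.query(query_texts=["What is a foreign key?"], n_results=2)
print(results["documents"][0])
```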
The implementation of the evaluator protocol was also a custom-made class, similar to LLM-as-a-judge [81], using Gemini 2.0 as a backbone.
Modules 4-5-6 (Feedback, Setup, and Statistics) were implemented in Python, using the SQLite database for storing the course content, exam questions, student list, grades, and feedback. Statistics graphs were generated using the plotly [74] and pandas [75] packages. Feedback was implemented with Google Forms, although any other similar option can be integrated.

4. Experiments and Results

4.1. Description of the Experiments

We performed two types of experiments: (a) in the first set of experiments, a cohort of students enrolled in four pilot courses (see Section 4.1.1) and instructors (see Acknowledgements) assessed the quality of the platform as an AI assistant and evaluator on a range of criteria described in Section 4.2; (b) in the second set of experiments, we evaluated the correctness and faithfulness of the AI components’ answers on a set of classic LLM metrics (see Section 4.3).
The statistics extracted from the results of the cohort of students will constitute a third experiment, which will be reported in a new paper at the end of the courses.

4.1.1. Pilot Courses Evaluated on Our Solution

For our study, we used the four courses presented in Table 3.
These courses are part of the Undergraduate program in Automatics and Applied Informatics offered by the Faculty of Engineering of the Constantin Brancusi University (CBU) of Targu-Jiu, located in Romania.
The first two courses are linked, as the first covers introductory SQL topics in the field of database design and administration, and the second course extends to techniques for designing applications that process databases in PL/SQL. In CBU, these are the students’ first introduction to databases. In the third course, students learn the Java programming language, and in the fourth course, students focus on applied programming techniques for software development.
Throughout these courses, students receive both theoretical and practical learning materials and participate in practical laboratory activities where they work on hands-on tasks.
Our research presents the usability and reliability of the proposed AI framework when applied to these courses.

4.1.2. Sample

The cohorts of students enrolled in the pilot courses in the new academic year and their distribution are summarized in Table 4 below.
Students who did not pass these courses in previous years were not included in this research, ensuring that the students in the sample had no prior knowledge of the subject.

4.1.3. Description of the Classical Teaching Process

To establish a baseline, we will describe the traditional teaching process for the courses involved in this study.
Each course consisted of weekly lectures and practical laboratory sessions, both lasting 2 hours over a 14-week period. After each lecture, in which theoretical notions were presented with useful examples, students participated in practical laboratory activities. During these sessions in the laboratory, they individually executed the code sequences demonstrated in the lecture and then solved additional tasks based on the presented concepts.
After 7 weeks, students took a 60 min mid-term assessment, with the goal of keeping them motivated to learn continuously and of identifying struggling students at an early stage.
The mid-term assessment had two parts: (a) 15 min for 10 single-choice questions; (b) 45 min allotted for 2 coding exercises directly on the computer.
The final exam, at the end of the 14 weeks of the course, took 120 min and was structured similarly: (a) 40 min for 20 single-choice questions; (b) 80 min for 2 coding exercises.
Following the final exam, students were asked to fill in a questionnaire on how they used the teaching materials and also to evaluate the teaching assistant. In this way, the effects of the traditional teaching resources, such as textbooks, books, and problem books, both in print and on the web, could be highlighted.
It is worth mentioning that during this final assessment, students were monitored by the lecturer and the teaching assistant and advised to use only the materials previously presented in the lectures and in the laboratory activities. The use of messaging applications and AI tools, such as ChatGPT, was not allowed during the classical assessment.

4.1.4. Description of the AI-Enhanced Teaching Process (Pilot)

In this pilot experiment, the AI framework was added as additional support to the existing classical teaching process. So, in addition to the lectures and laboratory sessions, the students had access to our platform. They could use it to reread the course, ask questions of the AI assistant (Module 1), and prepare for the evaluations in Module 2—Practice for Evaluation. The mid-term and final evaluations were taken and graded in Module 3—Final Evaluation, and the student feedback was collected in Module 4—Feedback.

4.2. Results: Evaluation of the Platform by Instructors and Students

The experiment was launched this year with the students enrolled in the four pilot courses described in Section 4.1.1 and a cohort of seven high school and ten university teachers, in order to assess the quality of the proposed learning platform. We present a summary of the assessments of the students and instructors involved.
The evaluation criteria were derived by synthesizing the most prevalent qualitative metrics relevant to our application from the existing literature on computer- and AI-assisted educational technologies [16,17,23,41]. These criteria facilitate an assessment of perceived quality from both student and teacher perspectives regarding the responses provided by the AI assistant, as well as the overall usability and effectiveness of the educational application.

4.2.1. Perceived Advantages and Disadvantages for Instructors

In the first phase, the framework and the web application were evaluated by the instructors mentioned in the Acknowledgements section. They assessed the application based on the criteria listed in Table 5 and were also asked to provide open-ended feedback.
Below is the prevalent free-form feedback recorded from the teachers:
  • This application greatly simplifies the migration of their existing course material to an online/AI-enhanced application, an obstacle which was, in their opinion, insurmountable before being presented with this framework.
  • The ability to deploy the application on a university server or cloud account avoids many of the issues related to student confidentiality.
  • They appreciated the reduction in time spent on simple questions and grading which permits them to focus on more difficult issues.

4.2.2. Perceived Advantages for the Students

We used the feedback form to obtain initial student feedback to the questions in Table 6.
Additionally, we extracted the following free-form feedback:
  • Students consider a major benefit of this platform to be that they can ask any question they might hesitate to ask during class (so-called “stupid-questions”) while having the same confidence in the answer as if they were asking a real teacher.
  • They appreciate that each answer highlights the relevant sections in the text, which increases their confidence in the AI assistant’s answer.
  • They appreciate that the application can be used on mobile phones, for example, during their commutes or small breaks.

4.3. Results: Testing of the AI Components of the Platform

To test the performance of the AI modules, we used a dataset composed of the following (Table 7):
  • A total of 16 single-choice questions from previous exams.
  • A total of 40 free-answer questions:
    - A total of 16 questions from previous exams (Manual Test 1 and Test 2), the same as the single-choice questions above but with the possible answers removed, so the AI had to answer in free form;
    - A total of 24 questions generated with o3-mini-high at low, medium, and high difficulty settings.
For single-choice answers, 100% answer correctness was obtained whenever the context was properly extracted by the RAG, so the rest of the analysis focuses on the more difficult free-answer questions.
We present the results on the AI assistant in Section 4.3.1 and on the AI evaluator in Section 4.3.2; the common results are presented in Section 4.3.3 and summarized AI results in Section 4.3.4.

4.3.1. AI Assistant (Module 1) Assessment

The assistant was graded both manually and using Ragas [81,82], a specialized library for evaluation of RAG-type specialized assistants.
Manual tests. For the manual tests, we evaluated only the final answer, using two human experts who were both familiar with the course material. We evaluated a single metric, "answer_correctness", in binary mode (correct or incorrect). Incomplete answers were labeled as incorrect. Due to the inherent subjectivity in interpreting answers, as well as human error when handling large sets of data (250 rows), the initial evaluations of the same questions differed in about 5% of cases (95% consistency). These inconsistencies were discussed, and the agreed answers were considered correct.
Automated tests. The assistant was evaluated automatically against two types of metrics [81,82]:
  • Retrieval (Contextual) Metrics, i.e., whether the system “finds” the right information from an external knowledge base before the LLM generates its answer. The metrics used were as follows:
    • Context Precision—measures whether the most relevant text “chunks” are ranked at the top of the retrieved list.
    • Context Recall—evaluates whether all relevant information is retrieved.
  • Generation Metrics, i.e., whether the LLM's answer is not only fluent but also grounded in the retrieved material. The metrics we employed were as follows (a minimal ragas usage sketch is given after this list):
    • Answer Relevancy—how well the generated answer addresses the user's question and uses the supplied context. It penalizes incomplete answers or unnecessary details.
    • Answer Faithfulness—whether the response is factually based on the retrieved information, minimizing “hallucinations”, estimated either with ragas or human evaluation.
    • Answer Text Overlap Scores (conventional text metrics BLEU, ROUGE, F1 [82])—compare generated answers against reference answers.
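Below is a minimal sketch of how such an automated evaluation can be run with the ragas library; the example data are invented, the column names follow recent ragas versions, and an LLM judge (with the corresponding API key) must be configured for the LLM-based metrics to run.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, context_recall, faithfulness

# One toy record: question, retrieved contexts, generated answer, and reference answer.
data = {
    "question": ["What is a foreign key?"],
    "contexts": [["A foreign key is a column that references the primary key of another table."]],
    "answer": ["A foreign key is a column referencing another table's primary key."],
    "ground_truth": ["A column that references the primary key of another table."],
}

dataset = Dataset.from_dict(data)
scores = evaluate(dataset, metrics=[context_precision, context_recall, faithfulness, answer_relevancy])
print(scores)   # per-metric averages over the dataset
```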
We compared the results using five different LLM backbones, from Gemini 1.0 to Gemini 2.0 Pro. All LLMs performed well in terms of answer correctness, matching or surpassing the human experts (Table 8).
Furthermore, we present in Figure 6 a split on question difficulty and the question generation (AI or manual).
The analysis of the results (Figure 6) leads to these main conclusions:
  • Correctness is very high for all LLMs, with results on par with one human expert.
  • The answer relevancy results are very promising as well, with most scores above 80% relevancy, in line with the levels observed by human raters in the HELM study [83].
  • Context retrieval is very important; results are better when more context is provided, which is expected and natural [82].
  • For faithfulness, we extracted two trends: (a) faithfulness is better for higher-difficulty questions; (b) faithfulness increases for newer LLMs, with Gemini 2.0 Pro being the best. Gemini 1.0 and 1.5 would sometimes ignore the instruction to answer only from the context.
  • Older metrics are not relevant: non-LLM NLP metrics like ROUGE, BLEU, and factual correctness are no longer suited for evaluating assistant performance (see Appendix A.1 for full results and [83]). The main explanation is that two answers can correctly explain the same idea and obtain a high answer relevancy while using very different words, which causes the BLEU, ROUGE, and factual correctness scores to be very low.
We provide below a detailed discussion of the correctness, relevancy, faithfulness, and context retrieval in the context of the AI assistant.
Correctness and Relevancy. The correctness of the answers is on par with or better than expert level, and the relevancy is also on par with HELM [83]; this matches what has been widely observed in recent months, namely that LLM solutions are now on par with or better than human experts in most domains.
Faithfulness. The analysis of faithfulness helped us understand an initially puzzling result in the raw data, in which Gemini 1.0 gave better results than Gemini 2.0, although only by a very small margin. After observing the faithfulness graphs, we noted that Gemini 1.0 and 1.5 generations of models were not as faithful as expected, the main reason being that they did not respect the instruction given in the prompt “Please do not answer if the answer cannot be deduced from the context received”, while Gemini 2.0 was much better at reasoning and respecting instructions. After closer analysis of the cases where Gemini 1.0 and 1.5 answered correctly but Gemini 2.0 did not provide a response, we found that the information was not present in the retrieved context, and Gemini 1.0 and 1.5 were responding from their own knowledge, without respecting the prompt to answer “only from the provided context”. Thus, the actual “correct” response in the given context was provided by Gemini 2.0.
We retested and confirmed this hypothesis by adding a set of questions which were not related to the given class document. While the RAG extracted no context, Gemini 1.0 and 1.5 gave answers to more than 60% of the questions which should not have been answered, while Gemini 2.0 correctly responded that "I am not able to answer in the context of this class". We removed such cases from the rest of the analysis, but these cases are saved and can be found in the raw data (link in Appendix A.2).
Context retrieval. Context retrieval is very important for RAG LLMs, so we dedicate a short section to this result. Good retrieval makes utilization in specific contexts possible, and it reduces costs because only the relevant context is included in the prompt sent to the LLM.
We tested this with two methods: one with chunks limited to 3000 tokens, and one with pages, usually limited to 500 tokens.
In our case, extracting the context with chunks of 3000 tokens and an overlap of 300 tokens was always superior to extraction by pages. This outcome is most likely because the chunks provide roughly six times more context and because ideas are sometimes split across two or more pages.
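The chunking itself is straightforward; the sketch below, which approximates token counts by whitespace-separated words, illustrates the 3000-token chunks with a 300-token overlap used above.

```python
from typing import List

def chunk_text(text: str, chunk_size: int = 3000, overlap: int = 300) -> List[str]:
    """Split a document into overlapping chunks.

    Sizes are given in tokens in the paper; here whitespace-separated words
    are used as a rough token approximation."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words) - overlap, 1), step)]
```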
The results in Figure 7 show the following: (a) the correct answers of the system drop by 5–7%, with a higher drop on more difficult questions; (b) the metric “context recall” measured by ragas drops drastically with smaller context.
We are aware that our RAG framework has room for improvement, and we will continue to update it in the future. By improving it, the results will be more satisfactory, and the costs will be reduced, as less but more relevant context is sent to the LLM.

4.3.2. AI Evaluator Assessment (Module 2 and 3)

To assess our AI evaluator, similar to [84] and following the best practices mentioned in [85], we used the same 16 single-choice and 40 free-answer questions, to which we added reference answers (ground truth) and student answers. With this setup in place, we performed manual and automatic evaluations.
The results of the evaluator grading are as follows:
In manual mode (using two human experts):
  • Evaluator grading of free-form answers: correctness 90%.
  • Evaluator grading of single-choice answers: correctness 100%.
  • Relevancy of evaluator suggestions for incorrect answers: relevance 99%.
In automatic mode (by using ChatGPT O1 as a judge):
  • Comprehensiveness (whether the response covers all the key aspects that the reference material would demand): 75%.
  • Readability (whether the answer is well-organized and clearly written): 90%.

4.3.3. Common Benchmarks

Certain considerations, such as stability and the effect of language, apply to both the assistant and the evaluator, so we present them together in this section.
Stability and faithfulness. The platform was configured and empirically adjusted to optimize the user experience in the context of AI-enhanced learning: (a) answer quality—the model is instructed to respond "I am not able to answer this question in the class context" if it is unsure about its answer; (b) stability and consistency—we set the LLM temperature to 0 and steps to 0 to reduce the variability in the LLM answers [86].
We evaluated the stability by rerunning the same 10 questions 10 times. As all answers were equivalent, we estimated the instability to be below 1%.
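As an illustration, the stability check can be scripted as below; `ask` stands for any callable that queries the assistant, and exact string equality is used as a stricter stand-in for the manual equivalence judgment described above.

```python
from collections import Counter
from typing import Callable, List

def measure_stability(ask: Callable[[str], str], questions: List[str], repeats: int = 10) -> float:
    """Re-ask each question `repeats` times and return the average share of runs
    that produce the most common answer (1.0 means perfectly stable)."""
    rates = []
    for q in questions:
        answers = [ask(q) for _ in range(repeats)]
        modal_count = Counter(answers).most_common(1)[0][1]
        rates.append(modal_count / repeats)
    return sum(rates) / len(rates)

# Example: stability = measure_stability(assistant.answer, sample_questions)
```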
Translation Effect. We observed a small effect when changing the language (our first tests were on a class taught in Romanian). Still, this effect was almost nonexistent for newer LLM backbones (almost unmeasurable for Gemini 2.0), as these backbones have improved multilanguage abilities [87].
However, there are two important effects on the RAG component: (a) First, a multilanguage embedding model is needed. We used distiluse-base-multilingual-cased-v2 embeddings [88] to accommodate content in all languages. (b) Second, if the course documentation is in English and the question is in French, the vector store may be unable to retrieve any relevant context.
To address this, we have a few options which can be implemented: (1) require that the questions are in a fixed language (usually the same language as the course documentation), configured by default; (2) translate each question to the language the class documentation is in; (3) have all the class documentation in the database translated in a few common languages at the setup phase.
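To decide whether one of these mitigations is needed for a given course, the cross-language behavior of the chosen embedding model can be checked directly; the sketch below assumes the sentence-transformers package and uses an invented English passage and question pair.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("distiluse-base-multilingual-cased-v2")

doc_en = "A foreign key is a column that references the primary key of another table."
question_en = "What is a foreign key?"
question_fr = "Qu'est-ce qu'une clé étrangère ?"

emb = model.encode([doc_en, question_en, question_fr], convert_to_tensor=True)
print("EN question vs. EN passage:", util.cos_sim(emb[1], emb[0]).item())
print("FR question vs. EN passage:", util.cos_sim(emb[2], emb[0]).item())
# If the cross-language similarity is much lower than the same-language one,
# retrieval will miss the relevant context and one of the options above is needed.
```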

4.3.4. Summary of LLM Results

We obtained 100% correct answers on single-choice questions and 95–100% correct answers on free-form questions, which surpasses human-expert level. We observed that the performance is strongly influenced (>10%) by the context retrieval performance.
Based on the reported results of the application performance, we consider that this application can be used in the current state for high school and university level.
Going forward, we can focus our improvements on three directions: (1) improve RAG performance to ensure that the LLM receives all relevant context for the questions; (2) reduce LLMs costs by providing only the relevant context; (3) upgrade to better-performing LLMs.

5. Discussion

From our review study, we found that AI assistants and AI evaluators are a useful and needed addition to the classical teaching methods (See Section 2).
The implementation’s accuracy/faithfulness of our proposed solution (See Section 3) was more than satisfactory with current LLMs (See Section 4.3), and its usefulness and ease of use was evaluated as excellent by both instructors and students (See Section 4.2).
While the introduction of this framework as an extension to classical courses seems both beneficial and needed, we still have to consider the obstacles to adoption (See Section 2.5), mainly the technical adoption barrier, costs, competing solutions, and legal bureaucracy.

5.1. Technical Adoption Barrier

This application was designed to be very easy to use and adapted for non-technical users, in particular because its deployment is a "one-click" process and its UI is designed with intuitiveness in mind. The feedback from the instructors and students (See Section 4.2) confirmed that we achieved this goal and that the application is easily accessible even to the least technical users.

5.2. Cost Analysis

This app is open-source and free for any university to use. Still, there are two main costs: hosting the platform and LLM usage.
As shown in Section 3.5, the app requires a system with at least 2 GB of memory. Such a system can be found in almost any university and is usually offered in the free tier of most cloud providers.
To estimate the LLM cost, we consider a standard STEM course with 15 chapters of lectures and 3 evaluations per chapter, and roughly 50 students who might ask 10 questions per chapter, adding up to ~10 k questions. To answer each question, the RAG augments the question (originally around 50 tokens) to around 1000 tokens, and the LLM provides an answer around 50 tokens long, resulting in ~10 M input tokens and ~45 k output tokens.
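For transparency, the arithmetic behind this estimate is shown below; the per-million-token prices are hypothetical placeholders (actual prices vary by provider and model, see Table 9), chosen so that the result lands near the USD 1 per course figure reported for Gemini 2.0.

```python
# Hypothetical prices, for illustration only; see Table 9 for the models compared.
PRICE_IN_PER_M_TOKENS = 0.10    # USD per 1M input tokens (assumption)
PRICE_OUT_PER_M_TOKENS = 0.40   # USD per 1M output tokens (assumption)

questions_per_course = 10_000         # ~50 students x 15 chapters x ~10 questions, rounded up
input_tokens = questions_per_course * 1_000   # each question augmented with RAG context
output_tokens = 45_000                        # output volume figure used in the text

cost = (input_tokens / 1e6) * PRICE_IN_PER_M_TOKENS + (output_tokens / 1e6) * PRICE_OUT_PER_M_TOKENS
print(f"Estimated LLM cost per course: USD {cost:.2f}")   # ~USD 1 with these prices
```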
In Table 9, we present a detailed comparison table of the LLMs’ related cost computed for the main models on the market for the above case (1 course/15 chapters).
We observed that one of the best solutions we tested (Gemini 2.0) gave a cost per course of only USD 1. While this might still be a barrier in some demographics, these costs are dropping exponentially, with the cost per token roughly halving every 6 months [84]. Furthermore, we plan to establish collaborations which will sponsor at least some of these costs.

5.3. Competing Solutions

We investigated whether this solution can bring any benefit with respect to the existing solutions. As a solution providing online courses enhanced with an AI assistant to students, our open educational platform can be seen as both an extension of and complement to Massive Open Online Courses (MOOCs). In Table 10, we compare learning platforms which implement some form of computer- or AI-assisted teaching, ranging from purely academic MOOCs (Stanford Online, edX) to professional-training platforms (Coursera, Udemy).
While not a MOOC in the traditional sense, our platform fills a specific gap in the current educational-technology landscape. Whereas platforms like Coursera, edX, or Udemy focus on centralized course delivery, our framework is decentralized and provides advantages such as low cost, ease of setup, ease of use, and the integration of an AI assistant/evaluator. This space of low-cost, open-source AI educational solutions, which our framework targets, is practically not addressed by any of the existing applications, which makes us believe that the launch of our platform is both needed and beneficial.

5.4. Legal and Governance Issues

There are still gaps in legislation and policies related to the usage of AI in student education and the confidentiality of student data. These gaps are being addressed in different countries in recent years and should be reduced progressively in the near future. Still, we consider that most of these impediments are avoided in our application because the students are already enrolled in the high school or university courses.
Therefore, we think our application has an advantage over all existing educational platforms, whether in cost, technical adoption, policy barriers or possible reach. We estimate that this framework can reach more than 90% of the world’s students and instructors, including demographics otherwise unreachable by existing solutions.
As a next step, we propose to give as much exposure as possible to our proposed application, most probably in the form of a collaboration with public and private institutions, to make it available for free in any high school and university. Results obtained at the end of the pilot phase will help us better quantify the effect on student results (improvement in grades, time spent learning, etc.) and will contribute to the adoption of the platform.

6. Conclusions

We evaluated the current status of computer-aided methods in education, including AI approaches. The AI methods offer significant benefits, but there are major barriers to their adoption related to costs and technical literacy of the instructors.
To address these challenges, we created an easy-to-use AI framework, composed of an AI assistant and AI evaluator. This platform enables instructors to migrate existing courses with a simple drag-and-drop operation, effectively overcoming the “technical literacy” barrier. It provides a wide range of advantages (See Section 4.2) such as near 100% accuracy (See Section 4.3), high consistency, low costs (estimated at USD 1/year/class), and fewer policy barriers as it is an open-source solution which can be fully controlled by the educational institution. From the student perspective, it has significant advantages such as 24/7 availability enabling a flexible learning schedule, mobile device accessibility, increased answer accuracy and consistency, and a lowered teacher–student barrier.
Our solution compares positively with all existing solutions. The combination of the AI-enhanced learning experience, low-cost maintenance, open-source licensing, and excellent performance makes us strongly believe that this application can see widespread adoption in the coming years, contributing significantly to the democratization of the educational system.

Author Contributions

Conceptualization, A.R. and A.B. (Adrian Balan); methodology, A.R. and A.B. (Adrian Balan); software, A.R., A.B. (Adrian Balan), and L.G.; validation, A.R., A.B. (Adrian Balan), and L.G.; formal analysis, A.R. and A.B. (Adrian Balan); investigation, A.R., A.B. (Adrian Balan), M.-M.N., C.C., and L.G.; resources, A.R., A.B. (Adrian Balan), and L.G.; data curation, A.R., A.B. (Adrian Balan), and L.G.; writing—original draft preparation, A.R., A.B. (Adrian Balan), I.B., L.G., and A.B. (Aniela Balacescu); writing—review and editing, A.R., A.B. (Adrian Balan), I.B., L.G., M.-M.N., C.C., and A.B. (Aniela Balacescu); visualization, A.R., A.B. (Adrian Balan), M.-M.N., C.C., and L.G.; supervision, A.R., A.B. (Adrian Balan), I.B., L.G., and A.B. (Aniela Balacescu); project administration, A.R., A.B. (Adrian Balan), and L.G.; funding acquisition, A.R., A.B. (Adrian Balan), I.B., L.G., M.-M.N., C.C., and A.B. (Aniela Balacescu). All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The code used in this paper is available at: https://github.com/intelpen/teaching_helper (accessed on 26 February 2025). A demo of the deployed application used in this paper is available at: https://teaching-helper-612652291913.europe-west3.run.app/ (accessed on 26 February 2025). The app is protected with OAuth 2.0. Please request to be added to the demo by sending an email to adrian.balan@airl.ro. Data used for evaluating the LLM, RAG, and evaluator components can be found at https://github.com/intelpen/teaching_helper/tree/main/tests/results (accessed on 26 February 2025).

Acknowledgments

We thank the following instructors for their contributions to the evaluation of the framework: A.B. (Adrian Balan), A.R., L.G., A.B. (Aniela Balacescu), I.B., M.-A.R., L.-G.L., F.G., A.L., G.G., M.I., M.-M.N., R.S., M.R., A.I., A.L. We thank the students involved in the four pilots described in Section 4.1.2. We acknowledge the help of Emanuel Aldea (Paris Sud University, FR) for his review of, and improvement suggestions on, several versions of this manuscript.

Conflicts of Interest

Author Laviniu Gavanescu was employed by the company Laviniu Gavanescu PFA. Author Adrian Balan was employed by the company AI Research Laboratory. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial intelligence
SQL: Structured Query Language
LLM: Large language model
CAME: Computer-Assisted Methods in Education
CAI: Computer-Assisted Instruction
CEI: Computer-Enhanced Instruction
VR: Virtual Reality
AR: Augmented Reality
LMS: Learning Management System
NLP: Natural Language Processing
ChatGPT: Chat Generative Pre-Trained Transformer, developed by OpenAI
Gemini: Generative artificial intelligence chatbot developed by Google
Claude: A family of large language models developed by Anthropic
LLAMA: A family of large language models (LLMs) released by Meta AI
LLMProtocol: Large Language Model Protocol
RAG: Retrieval-Augmented Generation
RAGProtocol: Retrieval-Augmented Generation Protocol
LLMGemini: Large Language Model Gemini
Claude Sonnet: Claude 3.5 Sonnet
VertexAI: Vertex AI Platform
ECTS: European Credit Transfer System

Appendix A

Appendix A.1

Table A1. Full results from the Module 1 evaluation of the chunks and pages RAG strategies.
Row Labels | Average of Correct | Context Recall | Faithfulness | Answer Relevancy | BLEU Score | ROUGE Score | Factual Correctness
chunks
gemini-1.0-pro | 95.00% | 81.25% | 71.85% | 86.39% | 8.34% | 22.99% | 56.00%
gemini-1.5-flash | 97.50% | 82.50% | 74.13% | 87.09% | 14.85% | 31.00% | 57.11%
gemini-1.5-pro | 97.50% | 83.75% | 74.38% | 83.31% | 13.40% | 35.69% | 43.77%
gemini-2.0-flash | 100.00% | 77.50% | 82.92% | 84.82% | 15.08% | 35.93% | 46.85%
gemini-2.0-pro | 97.50% | 82.50% | 87.61% | 85.12% | 10.01% | 24.76% | 44.40%
pages
gemini-1.0-pro |  | 47.08% | 74.05% | 45.12% | 3.09% | 32.04% | 21.85%
gemini-1.5-flash |  | 47.08% | 57.83% | 74.76% | 7.81% | 20.64% | 47.13%
gemini-1.5-pro |  | 47.08% | 55.26% | 83.79% | 5.70% | 16.87% | 39.73%
gemini-2.0-flash |  | 47.08% | 88.79% | 69.63% | 4.35% | 17.42% | 33.75%
gemini-2.0-pro |  | 47.08% | 87.78% | 78.15% | 3.83% | 14.45% | 39.45%
Grand Total | 97.50% | 64.29% | 75.47% | 77.82% | 8.65% | 25.18% | 43.28%
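The lexical-overlap columns in Table A1 (BLEU and ROUGE) can be reproduced with standard open-source scorers. The snippet below is a minimal sketch using the nltk and rouge-score Python packages on a hypothetical reference/candidate pair; it illustrates the kind of metric computation behind the table, not the exact evaluation harness, whose full scripts are available in the repository linked in Appendix A.2.
```python
# Minimal sketch (not the project's evaluation harness): scoring one answer
# with the two lexical-overlap metrics reported in Table A1.
# Assumes the open-source packages nltk and rouge-score are installed;
# the reference/candidate strings below are hypothetical examples.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

reference = "A primary key uniquely identifies each row in a table."
candidate = "A primary key is a column that uniquely identifies every row of a table."

# BLEU: n-gram precision of the candidate against the reference (smoothed for short texts)
bleu = sentence_bleu([reference.split()], candidate.split(),
                     smoothing_function=SmoothingFunction().method1)

# ROUGE-L: longest-common-subsequence overlap, reported here as the F-measure
rouge_l = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True) \
    .score(reference, candidate)["rougeL"].fmeasure

print(f"BLEU: {bleu:.2%}  ROUGE-L: {rouge_l:.2%}")
```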

Appendix A.2

Full data used for tests can be found at: https://github.com/intelpen/teaching_helper/tree/main/tests/results (accessed on 26 February 2025).

References

  1. Incheon Declaration and Framework for Action, for the Implementation of Sustainable Development Goal 4. Available online: https://uis.unesco.org/sites/default/files/documents/education-2030-incheon-framework-for-action-implementation-of-sdg4-2016-en_2.pdf (accessed on 26 February 2025).
  2. Survey: 86% of Students Already Use AI in Their Studies. Available online: https://campustechnology.com/articles/2024/08/28/survey-86-of-students-already-use-ai-in-their-studies.aspx (accessed on 26 February 2025).
  3. Half of High School Students Already Use AI Tools. Available online: https://leadershipblog.act.org/2023/12/students-ai-research.html (accessed on 26 February 2025).
  4. Ravšelj, D.; Keržič, D.; Tomaževič, N.; Umek, L.; Brezovar, N.; AIahad, N.; Abdulla, A.A.; Akopyan, A.; Aldana Segura, M.W.; AlHumaid, J.; et al. Higher education students’ perceptions of ChatGPT: A global study of early reactions. PLoS ONE 2025, 16, e0245832. [Google Scholar] [CrossRef]
  5. How to Use the ADDIE Instructional Design Model–SessionLab. Available online: https://www.sessionlab.com/blog/addie-model-instructional-design/ (accessed on 26 February 2025).
  6. 2023 Learning Trends and Beyond-eLearning Industry. Available online: https://elearningindustry.com/2023-learning-trends-and-beyond (accessed on 26 February 2025).
  7. Learning Management System Trends to Stay Ahead in 2023-LinkedIn. Available online: https://www.linkedin.com/pulse/learning-management-system-trends-stay-ahead-2023-greenlms (accessed on 26 February 2025).
  8. Gamification Education Market Overview Source. Available online: https://www.marketresearchfuture.com/reports/gamification-education-market-31655?utm_source=chatgpt.com (accessed on 26 February 2025).
  9. eLearning Trends And Predictions For 2023 and Beyond-eLearning Industry. Available online: https://elearningindustry.com/future-of-elearning-trends-and-predictions-for-2023-and-beyond (accessed on 26 February 2025).
  10. AI Impact on Education: Its Effect on Teaching and Student Success. Available online: https://www.netguru.com/blog/ai-in-education (accessed on 26 February 2025).
  11. Introducing AI-Powered Study Tool. Available online: https://www.pearson.com/en-gb/higher-education/products-services/ai-powered-study-tool.html (accessed on 26 February 2025).
  12. Crompton, H.; Burke, D. Artificial Intelligence in Higher Education: The State of the Field. Int. J. Educ. Technol. High. Educ. 2023, 20, 22. [Google Scholar] [CrossRef]
  13. Xu, W.; Ouyang, F. The application of AI technologies in STEM education: A systematic review from 2011 to 2021. Int. J. STEM Educ. 2022, 9, 59. [Google Scholar] [CrossRef]
  14. Hadzhikoleva, S.; Rachovski, T.; Ivanov, I.; Hadzhikolev, E.; Dimitrov, G. Automated Test Creation Using Large Language Models: A Practical Application. Appl. Sci. 2024, 14, 9125. [Google Scholar] [CrossRef]
  15. Nurhayati, T.N.; Halimah, L. The Value and Technology: Maintaining Balance in Social Science Education in the Era of Artificial Intelligence. In Proceedings of the International Conference on Applied Social Sciences in Education, Bangkok, Thailand, 14–16 November 2024; Volume 1, pp. 28–36. [Google Scholar]
  16. Nunez, J.M.; Lantada, A.D. Artificial intelligence aided engineering education: State of the art, potentials and challenges. Int. J. Eng. Educ. 2020, 36, 1740–1751. [Google Scholar]
  17. Darayseh, A.A. Acceptance of artificial intelligence in teaching science: Science teachers’ perspective. Comput. Educ. Artif. Intell. 2023, 4, 100132. [Google Scholar] [CrossRef]
  18. Briganti, G.; Le Moine, O. Artificial intelligence in medicine: Today and tomorrow. Front. Med. 2020, 7, 27. [Google Scholar] [CrossRef]
  19. Kandlhofer, M.; Steinbauer, G.; Hirschmugl-Gaisch, S.; Huber, P. Artificial intelligence and computer science in education: From kindergarten to university. In Proceedings of the 2016 IEEE Frontiers in Education Conference (FIE), Erie, PA, USA, 12–15 October 2016. [Google Scholar]
  20. Edmett, A.; Ichaporia, N.; Crompton, H.; Crichton, R. Artificial Intelligence and English Language Teaching: Preparing for the Future. British Council 2023. [Google Scholar] [CrossRef]
  21. Hajkowicz, S.; Sanderson, C.; Karimi, S.; Bratanova, A.; Naughtin, C. Artificial intelligence adoption in the physical sciences, natural sciences, life sciences, social sciences and the arts and humanities: A bibliometric analysis of research publications from 1960–2021. Technol. Soc. 2023, 74, 102260. [Google Scholar] [CrossRef]
  22. Rahman, M.M.; Watanobe, Y.; Nakamura, K. A bidirectional LSTM language model for code evaluation and repair. Symmetry 2021, 13, 247. [Google Scholar] [CrossRef]
  23. Wollny, S.; Schneider, J.; Di Mitri, D.; Weidlich, J.; Rittberger, M.; Drachsler, H. Are we there yet?-A systematic literature review on chatbots in education. Front. Artif. Intell. 2021, 4, 654924. [Google Scholar] [CrossRef]
  24. Rahman, M.M.; Watanobe, Y.; Rage, U.K.; Nakamura, K. A novel rule-based online judge recommender system to promote computer programming education. In Proceedings of the Advances and Trends in Artificial Intelligence. From Theory to Practice: 34th International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2021, Kuala Lumpur, Malaysia, 26–29 July 2021; pp. 15–27. [Google Scholar]
  25. Rahman, M.M.; Watanobe, Y.; Nakamura, K. Source code assessment and classification based on estimated error probability using attentive LSTM language model and its application in programming education. Appl. Sci. 2020, 10, 2973. [Google Scholar] [CrossRef]
  26. Rahman, M.M.; Watanobe, Y.; Kiran, R.U.; Kabir, R. A stacked bidirectional lstm model for classifying source codes built in mpls. In Proceedings of the Machine Learning and Principles and Practice of Knowledge Discovery in Databases: International Workshops of ECML PKDD 2021, Virtual Event, 13–17 September 2021; pp. 75–89. [Google Scholar]
  27. Litman, D. Natural language processing for enhancing teaching and learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Volume 30. [Google Scholar]
  28. ChatGPT. Available online: https://chatgpt.com (accessed on 26 February 2025).
  29. Gemini. Available online: https://gemini.google.com/app (accessed on 26 February 2025).
  30. Claude. Available online: https://claude.ai/ (accessed on 26 February 2025).
  31. LLAMA. Available online: https://www.llama.com (accessed on 26 February 2025).
  32. Tian, S.; Jin, Q.; Yeganova, L.; Lai, P.-T.; Zhu, Q.; Chen, X.; Yang, Y.; Chen, Q.; Kim, W.; Comeau, D.C.; et al. Opportunities and challenges for ChatGPT and large language models in biomedicine and health. Brief. Bioinform. 2024, 25, bbad493. [Google Scholar] [CrossRef]
  33. Gill, S.S.; Xu, M.; Patros, P.; Wu, H.; Kaur, R.; Kaur, K.; Fuller, S.; Singh, M.; Arora, P.; Parlikad, A.K.; et al. Transformative effects of ChatGPT on modern education: Emerging Era of AI Chatbots. Internet Things Cyber-Phys. Syst. 2024, 4, 19–23. [Google Scholar] [CrossRef]
  34. Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. Gpt-4 technical report. arXiv 2023, arXiv:2303.08774. [Google Scholar]
  35. Nan, D.; Sun, S.; Zhang, S.; Zhao, X.; Kim, J.H. Analyzing behavioral intentions toward Generative Artificial Intelligence: The case of ChatGPT. Univers. Access Inf. Soc. 2024, 24, 885–895. [Google Scholar] [CrossRef]
  36. Argyle, L.P.; Busby, E.C.; Fulda, N.; Gubler, J.R.; Rytting, C.; Wingate, D. Out of one, many: Using language models to simulate human samples. Political Anal. 2023, 31, 337–351. [Google Scholar] [CrossRef]
  37. Rice, S.; Crouse, S.R.; Winter, S.R.; Rice, C. The advantages and limitations of using ChatGPT to enhance technological research. Technol. Soc. 2024, 76, 102426. [Google Scholar] [CrossRef]
  38. Hämäläinen, P.; Tavast, M.; Kunnari, A. Evaluating Large Language Models in Generating Synthetic Hci Research Data: A Case Study. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, Hamburg Germany, 23–28 April 2023; pp. 1–19. [Google Scholar]
  39. Wang, S.; Xu, T.; Li, H.; Zhang, C.; Liang, J.; Tang, J.; Yu, P.S.; Wen, Q. Large Language Models for Education: A Survey and Outlook, arXiv 2024, arXiv:2403.18105. Available online: https://arxiv.org/html/2403.18105v1 (accessed on 26 February 2025).
  40. Hattie, J. Visible Learning: A Synthesis of Over 800 Meta-Analyses Relating to Achievement; Routledge: London, UK, 2009. [Google Scholar] [CrossRef]
  41. Alzahrani, L.; Seth, K.P. Factors influencing students’ satisfaction with continuous use of learning management systems during the COVID-19 pandemic: An empirical study. Educ. Inf. Technol. 2021, 26, 6787–6805. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
  42. Learning Management System, EN.WIKIPEDIA.ORG. Available online: https://en.wikipedia.org/wiki/Learning_management_system (accessed on 26 February 2025).
  43. Studies on the Impact of Cellphones on Academics, CACSD.ORG. Available online: https://www.cacsd.org/article/1698443 (accessed on 26 February 2025).
  44. Lepp, A.; Barkley, J.E.; Karpinski, A.C. The relationship between cell phone use and academic performance in a sample of U.S. college students. Comput. Hum. Behav. 2015, 31, 343–350. [Google Scholar] [CrossRef]
  45. Junco, R. In-class multitasking and academic performance. Comput. Hum. Behav. 2012, 28, 2236–2243. [Google Scholar] [CrossRef]
  46. Clinton, V. Reading from paper compared to screens: A systematic review and meta-analysis. J. Res. Read. 2019, 42, 288–325. [Google Scholar] [CrossRef]
  47. Mizrachi, D.; Salaz, A.M.; Kurbanoglu, S.; Boustany, J. Academic reading format preferences and behaviors among university students worldwide: A comparative survey analysis. PLoS ONE 2018, 13, e0197444. [Google Scholar] [CrossRef]
  48. Mizrachi, D.; Salaz, A.M.; Kurbanoglu, S.; Boustany, J. The Academic Reading Format International Study (ARFIS): Final results of a comparative survey analysis of 21,265 students in 33 countries. Ref. Serv. Rev. 2021, 49, 250–266. [Google Scholar] [CrossRef]
  49. Mizrachi, D.; Salaz, A.M. Beyond the surveys: Qualitative analysis from the academic reading format international study (ARFIS). Coll. Res. Libr. 2020, 81, 808. [Google Scholar] [CrossRef]
  50. Welsen, S.; Wanatowski, D.; Zhao, D. Behavior of Science and Engineering Students to Digital Reading: Educational Disruption and Beyond. Educ. Sci. 2023, 13, 484. [Google Scholar] [CrossRef]
  51. Quizlet’s State of AI in Education Survey Reveals Higher Education is Leading AI Adoption. Available online: https://www.prnewswire.com/news-releases/quizlets-state-of-ai-in-education-survey-reveals-higher-education-is-leading-ai-adoption-302195348.html (accessed on 26 February 2025).
  52. Sandu, N.; Gide, E. Adoption of AI-Chatbots to Enhance Student Learning Experience in Higher Education in India. In Proceedings of the 2019 18th International Conference on Information Technology Based Higher Education and Training (ITHET), Magdeburg, Germany, 26–27 September 2019; pp. 1–5. [Google Scholar] [CrossRef]
  53. Bruner, J.; Barlow, M.A. What are Conversational Bots?: An Introduction to and Overview of AI-driven Chatbots; O’Reilly Media: Sebastopol, CA, USA, 2016. [Google Scholar]
  54. Shevat, A. Designing Bots: Creating Conversational Experiences; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2017. [Google Scholar]
  55. Thomas, D. The AI Advantage: How to Put the Artificial Intelligence Revolution to Work; The MIT Press: Cambridge, MA, USA, 2018. [Google Scholar] [CrossRef]
  56. Verleger, M.; Pembridge, J. A Pilot Study Integrating an AI-driven Chatbot in an Introductory Programming Course. In Proceedings of the Frontiers in Education Conference, San Jose, CA, USA, 3–6 October 2019. [Google Scholar]
  57. Duncker, D. Chatting with chatbots: Sign making in text-based human-computer interaction. Sign Syst. Stud. 2020, 48, 79–100. [Google Scholar] [CrossRef]
  58. Smutny, P.; Schreiberova, P. Chatbots for learning: A review of educational chatbots for the Facebook Messenger. Comput. Educ. 2020, 151, 103862. [Google Scholar] [CrossRef]
  59. Miklosik, A.; Evans, N.; Qureshi, A.M.A. The Use of Chatbots in Digital Business Transformation: A Systematic Literature Review. IEEE Access 2021, 9, 106530–106539. [Google Scholar] [CrossRef]
  60. Singh, S.; Thakur, H.K. Survey of Various AI Chatbots Based on Technology Used. In Proceedings of the ICRITO 2020-IEEE 8th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions), Noida, India, 4–5 June 2020; pp. 1074–1079. [Google Scholar]
  61. Okonkwo, C.W.; Ade-Ibijola, A. Python-Bot: A Chatbot for Teaching Python Programming. Int. J. Adv. Comput. Sci. Appl. 2021, 12, 202–208. [Google Scholar]
  62. Dan, Y.; Lei, Z.; Gu, Y.; Li, Y.; Yin, J.; Lin, J.; Ye, L.; Tie, Z.; Zhou, Y.; Wang, Y.; et al. EduChat: A Large-Scale Language Model-based Chatbot System for Intelligent Education. arXiv 2023, arXiv:2308.02773. [Google Scholar]
  63. GPTeens. 2024. Available online: https://en.wikipedia.org/wiki/GPTeens?utm_source=chatgpt.com (accessed on 26 February 2025).
  64. Li, Y.; Qu, S.; Shen, J.; Min, S.; Yu, Z. Curriculum-Driven EduBot: A Framework for Developing Language Learning Chatbots Through Synthesizing Conversational Data. arXiv 2023, arXiv:2309.16804. [Google Scholar]
  65. BlazeSQL. Available online: https://www.blazesql.com/ (accessed on 26 February 2025).
  66. OpenSQL-From Questions to SQL. Available online: https://web.archive.org/web/20250115110920/http://www.opensql.ai/ (accessed on 5 May 2025).
  67. Chat with SQL Databases Using AI. Available online: https://www.askyourdatabase.com/?utm_source=chatgpt.com (accessed on 26 February 2025).
  68. Mahapatra, S. Impact of ChatGPT on ESL Students’ Academic Writing Skills: A Mixed Methods Intervention Study. Smart Learn. Environ. 2024, 11, 9. [Google Scholar] [CrossRef]
  69. Pros And Cons Of Traditional Teaching: A Detailed Guide. Available online: https://www.billabonghighschool.com/blogs/pros-and-cons-of-traditional-teaching-a-detailed-guide/ (accessed on 26 February 2025).
  70. Google Cloud Run. Available online: https://cloud.google.com/run#features (accessed on 26 February 2025).
  71. Gao, Y.; Xiong, Y.; Gao, X.; Jia, K.; Pan, J.; Bi, Y.; Dai, Y.; Sun, J.; Wang, M.; Wang, H. Retrieval-Augmented Generation for Large Language Models: A Survey. arXiv 2024, arXiv:2312.10997. [Google Scholar]
  72. Zheng, L.; Chiang, W.L.; Sheng, Y.; Zhuang, S.; Wu, Z.; Zhuang, Y.; Lin, Z.; Li, Z.; Li, D.; Xing, E.; et al. Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. Adv. Neural Inf. Process. Syst. 2023, 36, 46595–46623. [Google Scholar]
  73. Liang, P.; Bommasani, R.; Lee, T.; Tsipras, D.; Soylu, D.; Yasunaga, M.; Zhang, Y.; Narayanan, D.; Wu, Y.; Kumar, A.; et al. Holistic Evaluation of Language Models. arXiv 2023, arXiv:2211.09110. [Google Scholar]
  74. Gemini Team. Gemini: A Family of Highly Capable Multimodal Models. arXiv 2023, arXiv:2312.11805. [Google Scholar]
  75. Gemini Team. Gemini 1.5: Unlocking Multimodal Understanding Across Millions of Tokens of Context. arXiv 2024, arXiv:2403.05530. [Google Scholar]
  76. Akter, S.N.; Yu, Z.; Muhamed, A.; Ou, T.; Bäuerle, A.; Cabrera, Á.A.; Dholakia, K.; Xiong, C.; Neubig, G. An In-depth Look at Gemini’s Language Abilities. arXiv 2023, arXiv:2312.11444. [Google Scholar]
  77. Google Cloud Vertex AI. Available online: https://cloud.google.com/vertex-ai (accessed on 26 February 2025).
  78. Ouyang, L.; Wu, J.; Jiang, X.; Almeida, D.; Wainwright, C.; Mishkin, P.; Lowe, R. Training language models to follow instructions with human feedback. arXiv 2022, arXiv:2203.02155. [Google Scholar]
  79. Bai, Y.; Kadavath, S.; Kundu, S.; Askell, A.; Kernion, J.; Jones, A.; Chen, A.; Goldie, A.; Mirhoseini, A.; McKinnon, C. Constitutional AI: Harmlessness from AI Feedback. arXiv 2022, arXiv:2212.08073. [Google Scholar]
  80. Microsoft Search AI. Available online: https://learn.microsoft.com/en-us/rest/api/searchservice/ (accessed on 26 February 2025).
  81. Yang, Y.; Li, Z.; Dong, Q.; Xia, H.; Sui, Z. Can Large Multimodal Models Uncover Deep Semantics Behind Images? arXiv 2024, arXiv:2402.11281. [Google Scholar]
  82. RAG Evaluation. Available online: https://docs.confident-ai.com/guides/guides-rag-evaluation (accessed on 26 February 2025).
  83. Zhang, Y.; Mai, Y.; Roberts, J.S.R.; Bommasani, R.; Dubois, Y.; Liang, P. HELM Instruct: A Multidimensional Instruction Following Evaluation Framework with Absolute Ratings. 2024. Available online: https://crfm.stanford.edu/2024/02/18/helm-instruct.html (accessed on 26 February 2025).
  84. Jauhiainen, J.S.; Guerra, A.G. Evaluating Students’ Open-ended Written Responses with LLMs: Using the RAG Framework for GPT-3.5, GPT-4, Claude-3, and Mistral-Large. arXiv 2024. [Google Scholar]
  85. Best Practices for LLM Evaluation of RAG Applications-A Case Study on the Databricks Documentation Bot. Available online: https://www.databricks.com/blog/LLM-auto-eval-best-practices-RAG (accessed on 26 February 2025).
  86. Renze, M.; Guven, E. The Effect of Sampling Temperature on Problem Solving in Large Language Models. 2024. Available online: https://arxiv.org/html/2402.05201v1 (accessed on 26 February 2025).
  87. Reimers, N.; Gurevych, I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv 2019, arXiv:1908.10084. [Google Scholar]
  88. Reimers, N.; Gurevych, I. Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP); pp. 4512–4525. Available online: https://arxiv.org/abs/2004.09813 (accessed on 26 February 2025).
Figure 1. Chatbot workflow diagram. (a) General NLP knowledge base retrieval + AI. (b) Simplified AI-only system.
Figure 2. Application user interface: left—navigation, middle—content, right—AI assistant.
Figure 3. Technical diagram of components and flow of our application.
Figure 4. Diagram of the RAG interfaces.
Figure 5. Flow of a question through the system.
Figure 6. Answer correctness (top), relevancy (middle), and faithfulness (bottom) split by question difficulty and the question generation mode (AI or manual) for all LLM backends.
Figure 7. Comparison of the two splitting methods (chunks and pages) for the metrics context recall (left) and answer correctness (right).
Table 1. Chatbot classification.
Classification | Type | Description | Example
Knowledge Domain | Domain-Specific | Specialized in a specific field or industry. | Python-Bot, Babylon Health
Knowledge Domain | General-Purpose | Broader functionality across multiple domains. | Siri, Alexa
Service Provided | Informational | Provides facts, updates, and general information. | FIT-EBot
Service Provided | Transactional | Handles tasks like booking, purchasing, or transactions. | Erica by BOA
Service Provided | Support/Assistant | Assists with troubleshooting or performing complex tasks. | IT support bots
Goals | Task-Oriented | Designed to complete specific user-defined tasks. | BlazeSQL AI
Goals | Conversational | Focused on engaging in natural dialogue with users. | ChatGPT
Goals | Learning/Training | Educates or trains users in specific domains. | Percy
Input/Response | Rule-Based | Operates on pre-written scripts and decision logic. | ELIZA
Input/Response | AI-Powered | Leverages AI and NLP to provide dynamic, context-aware responses. | OpenSQL.ai
Input/Response | Hybrid | Combines rule-based and AI features for versatile interaction. | Many enterprise chatbots
Table 2. Most common educational chatbots.
Chatbot Name | Purpose | Features
Python-Bot [61] | Python-Bot is a learning assistant chatbot designed to assist novice programmers in understanding the Python programming language. | User-Friendly Interface: Offers an intuitive platform for students to interact with, making the learning process more accessible.
Educational Focus: Provides detailed explanations of Python concepts, accompanied by practical examples to reinforce learning.
Interactive Learning: Engages users in a conversational manner, allowing them to ask questions and receive immediate feedback.
EduChat [62] | EduChat is an LLM-based chatbot system designed to support personalized, fair, and compassionate intelligent education for teachers, students, and parents. | Educational Functions: Enhances capabilities such as open question answering, essay assessment, Socratic teaching, and emotional support.
Domain-Specific Knowledge: Pre-trained on educational formats to provide accurate and relevant information.
Tool Integration: Fine-tuned to utilize various tools, enhancing its educational support capabilities.
GPTeens [63] | GPTeens is an AI-based chatbot developed to provide educational materials aligned with school curricula for teenage users. | Interactive Format: Utilizes natural language processing to support conversational interactions with learners.
Age-Appropriate Design: Delivers responses suitable for teenage users.
Curriculum Integration: Trained on educational materials aligned with the South Korean national curriculum.
EduBot [64] | Curriculum-Driven EduBot is a framework for developing language learning chatbots for assisting students. | Topic Extraction: Extracts pertinent topics from textbooks to generate dialogues related to these topics.
Conversational Data Synthesis: Uses large language models to generate dialogues, which are then used to fine-tune the chatbot.
User Adaptation: Adapts its dialogue to match the user’s proficiency level, providing personalized conversation practice.
BlazeSQL [65] | BlazeSQL is designed to transform natural language questions into SQL queries, enabling users to extract data insights from their databases without extensive SQL knowledge. | AI-Driven Query Generation: Utilizes advanced AI technology to comprehend database schemas and generate SQL queries based on user input in plain English.
Broad Database Compatibility: Supports various databases, including MySQL, PostgreSQL, SQLite, Microsoft SQL Server, and Snowflake.
Privacy-Focused Operation: Operates locally on the user’s desktop, ensuring that sensitive data remain on the user’s machine.
No-Code Data Visualization: Generates dashboards and visualizations directly from query results, simplifying data presentation.
OpenSQL.ai [66] | OpenSQL.ai aims to simplify the process of SQL query generation by allowing users to interact with their databases through conversational language. | Text-to-SQL Conversion: Transforms user questions posed in plain English into precise SQL code, facilitating data retrieval without manual query writing.
User-Friendly Interface: Designed for both technical and non-technical users, making database interactions more accessible.
Efficiency Enhancement: Streamlines data tasks by reducing the need for complex SQL coding, thereby increasing productivity.
AskYourDatabase [67] | AskYourDatabase is an AI-powered platform that enables users to interact with their SQL and NoSQL databases using natural language inputs, simplifying data querying and analysis. | Natural Language Interaction: Allows users to query, visualize, manage, and analyze data by asking questions in plain language, eliminating the need for SQL expertise.
Data Visualization: Instantly converts complex data into clear, engaging visuals without requiring coding skills.
Broad Database Support: Compatible with popular databases such as MySQL, PostgreSQL, MongoDB, and SQL Server.
Self-Learning Capability: The AI learns from data and user feedback, improving its performance over time.
Access Control and Embeddability: Offers fine-grained user-level access control and can be embedded as a widget on websites.
Table 3. Course description.
Course | Semester | ECTS | Hours | Year
Databases (DBs) | 4th | 4 | 56 | 2020–2025
Database Programming Techniques (DBPTs) | 5th | 5 | 56 | 2020–2025
Object-Oriented Programming (OOP) | 3rd | 6 | 70 | 2020–2025
Designing Algorithms (DAs) | 2nd | 4 | 56 | 2021–2024
Table 4. Pilot courses evaluated on our application.
Course | Students | % Male | % Female
Databases (DBs) | 37 | 89% | 11%
Database Programming Techniques (DBPTs) | 36 | 91% | 8%
Object-Oriented Programming (OOP) | 37 | 89% | 11%
Designing Algorithms (DAs) | 49 | 83% | 16%
Table 5. Teacher feedback, grading 1 min–5 max.
Criteria | Question | Eval
1. Ease of Application Setup | How would you rate the ease of setting up the application, including adding courses, creating exam questions, and generating exam questions automatically? | 5/5
2. Chatbot Answer Quality | How do you rate the accuracy and relevance of the chatbot’s responses to questions in Module 1? | 4.8/5
3. Evaluator’s Judgment Accuracy | How do you rate the quality and fairness of the evaluator’s assessment of student answers? | 4.1/5
4. Evaluator Hints for Wrong Answer | How useful are the evaluator’s hints when a student selects multiple answers in a question? | 3.2/5
Table 6. Student feedback.
Criteria | Question | Eval
1. Usability | How would you rate the ease of using and accessing Modules 1/2/3? | 4.2/5
2. Chatbot Answers Clarity | How easy to understand are the chatbot answers and suggestions? | 4.9/5
3. Chatbot Answers Usefulness | How often does the chatbot fully answer your question? | 5/5
4. Bugs | How often was the application unresponsive, or how often did it crash? | 2.9/5
Table 7. Test data.
Source | Difficulty | Questions No. | Type
Manual Test | 1 (2023 Exam) | 18 | Single-choice
Manual Test | 2 (2023 Exam) | 18 | Single-choice
O3-mini-high | 1 | 6 | Free-answer
O3-mini-high | 2 | 6 | Free-answer
O3-mini-high | 3 | 6 | Free-answer
Table 8. Results: overall correctness for the five backbones.
LLM | Gemini-1.0-Pro | Gemini-1.5-Flash | Gemini-1.5-Pro | Gemini-2.0-Flash | Gemini-2.0-Pro
Correct % | 0.95 | 0.975 | 0.975 | 1 | 0.975
Table 9. Estimated costs for existing LLMs for 1 course (10 M input tokens, 45 K output tokens).
Model Variant | Cost per 1M Input Tokens | Cost per 1M Output Tokens | Cost for 10 M Input Tokens | Cost for 45 K Output Tokens | Total Estimated Cost
Gemini 1.5 Flash | USD 0.15 | USD 0.60 | USD 1.50 | USD 0.03 | USD 1.53
Gemini 1.5 Pro | USD 2.50 | USD 10.00 | USD 25.00 | USD 0.45 | USD 25.45
Gemini 2.0 Flash | USD 0.10 | USD 0.40 | USD 1.00 | USD 0.02 | USD 1.02
Claude 3.5 Sonnet | USD 3.00 | USD 15.00 | USD 30.00 | USD 0.68 | USD 30.68
Chat GPT-4o | USD 2.50 | USD 20.00 | USD 25.00 | USD 0.20 | USD 25.20
DeepSeek (V3) | USD 0.14 | USD 0.28 | USD 1.40 | USD 0.01 | USD 1.41
Mistral (NeMo) | USD 0.15 | USD 0.15 | USD 1.50 | USD 0.01 | USD 1.51
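The per-course totals in Table 9 follow directly from the per-million-token prices. The short sketch below is a generic helper written for illustration, not part of the released codebase; its default token volumes match the assumptions stated in the table caption (10 M input tokens and 45 K output tokens per course).
```python
# Minimal sketch of the cost arithmetic behind Table 9 (illustrative helper, not project code).
def estimated_course_cost(usd_per_1m_input: float, usd_per_1m_output: float,
                          input_tokens: int = 10_000_000,
                          output_tokens: int = 45_000) -> float:
    """Return the estimated USD cost of serving one course with a given model."""
    return (input_tokens / 1_000_000) * usd_per_1m_input \
         + (output_tokens / 1_000_000) * usd_per_1m_output

# Example: the Gemini 2.0 Flash row -> 10 x 0.10 + 0.045 x 0.40, approx. USD 1.02
print(round(estimated_course_cost(0.10, 0.40), 2))
```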
Table 10. Comparison to existing solutions.
Criteria | Current App | Coursera [47] | Stanford [46] | Udemy [47] | edX [46]
Cost for Student | Free | Approx. EUR 40 per course per month | Tuition-based, varies by program | Varies (EUR 10–200 per course) | Free courses, paid certificates
Cost for Teacher/University | Free (LLM tokens < USD 1/year) | Free for universities; revenue-sharing for instructors | Salary-based | Instructors can set prices or offer courses for free | Free for universities
Ease of Use | High | High | High | High | High
Video Content | Real classes available | Yes | Yes | Yes | Yes
AI Assistant | Yes | Yes | No | No | No
AI Evaluator | Yes | Yes | No | No | Yes
Possible Reach (Students) | 1 billion | Approx. 148 million registered learners | Enrolled students, ~20 k/year | Over 57 million learners | Over 110 million learners
Possible Reach (Teachers) | 97 million total | Thousands of instructors | Limited to faculty members | Open to anyone interested in teaching | University professors
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
