Article

Enhancing Online Learning Through Multi-Agent Debates for CS University Students

1 Department of Media and Communication, Kangwon National University, Chuncheon 24341, Republic of Korea
2 Jing Hengyi School of Education, Hangzhou Normal University, Hangzhou 311121, China
3 School of Information Science and Technology, Hangzhou Normal University, Hangzhou 311121, China
4 Center for Engineering and Scientific Computation, Zhejiang University, Hangzhou 310058, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(11), 5877; https://doi.org/10.3390/app15115877
Submission received: 4 April 2025 / Revised: 8 May 2025 / Accepted: 20 May 2025 / Published: 23 May 2025

Abstract

As recent advancements in large language models (LLMs) enhance reasoning across various domains, educators are increasingly exploring their use in conversation-based tutoring systems. However, because LLMs are black-box models to users and lack human-like problem-solving strategies, users often find it difficult to trust the answers they provide. This lack of trust can undermine the effectiveness of learning in educational scenarios. To address these issues, we introduce a novel approach that integrates multi-agent debates into a lecture video Q&A system, aiming to assist computer science (CS) university students in self-learning: LLMs simulate debates between affirmative and negative debaters, a judge reaches a final answer, and the entire process is presented to users for review. This approach is expected to lead to better learning outcomes and to improve students’ critical thinking. To validate the effectiveness of this approach, we carried out a user study through a prototype system and conducted preliminary experiments based on video lecture learning involving 90 CS students from three universities. The study compared different conditions and demonstrated that students who had access to a combination of video-based Q&A and multi-agent debates performed significantly better on quizzes than those who had access only to the video or to video-based Q&A. These findings indicate that integrating multi-agent debates with lecture videos can substantially enhance the learning experience and also benefit the development of students’ higher-order thinking abilities.

1. Introduction

With the increasing availability of online courses, a growing number of undergraduate and graduate students are turning to these resources for learning. Millions now access free educational videos on platforms like Khan Academy, Coursera, edX, Udacity, MIT OpenCourseWare, YouTube, and Chinese MOOC platforms. This trend underscores how Internet technology is enhancing educational access and broadening the range of available courses, particularly serving as a valuable alternative for students unable to attend in-person classes during the COVID-19 pandemic [1]. However, most current online courses still rely primarily on one-way lectures from an instructor, supplemented by a few post-lecture quizzes or self-regulated learning prompts [2,3]. There is a lack of necessary interaction and discussion, making them not significantly different from traditional, teacher-centered instruction, except that they are delivered through the Internet. Previous research on human–computer interaction has focused on enhancing educational videos through various techniques, such as interactive transcripts, word clouds, keyword search, and storyboard highlights [4]. Other studies have segmented videos to display key moments alongside snapshots and transcripts [5]. Additionally, some researchers have developed video digests structured in a textbook-like format, organized into chapters, titles, and sections with accompanying text summaries [6]. Moreover, Shimada et al. [7] found that students who used summarized slide previews achieved higher pre-quiz scores and required less time compared to those who previewed the original learning materials. Therefore, using large language models (LLMs) to summarize video content and subsequently perform retrieval-augmented generation (RAG) may also provide an effective human–computer interaction (HCI) approach to enhance students’ learning efficiency.
The advent of LLMs, such as Generative Pre-trained Transformer 4 (GPT-4) [8], has provided significant technological advancements in various educational activities, including engineering education [9], language teaching and learning [10], and personalized learning [11]. Despite their potential, LLMs still face challenges [12,13] such as hallucination, reliance on static training data, lack of real-time knowledge, limited problem-solving capabilities, memory continuity issues, ethical and safety concerns, and insufficient personalization. Using LLMs carefully therefore poses challenges for both teachers and learners [14]. However, these limitations can potentially be mitigated by techniques such as prompt engineering [15], chain-of-thought (CoT) prompting [16], self-consistency (SC) [17], and retrieval-augmented generation (RAG) [18]. Ongoing advancements in generative AI, including RAG, are progressively enhancing the factual reliability and responsiveness of LLMs. Autonomous agents have long been a central research area in both academia and industry, and LLMs have been applied as autonomous agents in many fields, especially in educational assistance [19,20]. This progress suggests a promising pathway for implementing multi-agent systems based on LLMs, enabling novel online teaching and learning experiences.
Accordingly, we present an online learning-assisted scheme that offers an intuitive and user-friendly solution. This scheme supports both question-and-answer sessions and topic discussions through video-based multi-agent debates. The purpose of this paper is to explore the role of open-source LLM-based agents in facilitating self-learning among undergraduate students, particularly in the field of computer science.
The research question for this study is the following:
  • RQ. How can video-based multi-agent debates enhance online learning outcomes for CS students?
To address this research question, we propose the following hypotheses on the effects of multi-agent debates on students:
  • H1. Video-based multi-agent debates are more effective than video-only and video-based Q&A approaches for online learning among CS students in terms of objective questions.
  • H2. Video-based multi-agent debates are more effective than video-only and video-based Q&A approaches for online learning among CS students in terms of enriching the answer dimensions of open-ended questions.

2. Literature Review

In this section, we review the relevant literature. Before conducting this review, we carried out a literature survey on Clarivate Analytics’ Web of Science using the keywords “online learning and video”, which returned 3057 documents published since 2012. Keyword analysis reveals that multi-agent debates are rarely used to enhance online learning in a video-based environment, as shown in Table 1. We performed a topic analysis on the abstracts of the relevant literature, as illustrated in Figure 1. The findings reveal that cutting-edge technologies such as “virtual reality” and “artificial intelligence” have garnered significant attention in recent years, while research on online courses and online learning environments has consistently remained a central focus in the field. The contribution of this paper lies in presenting a novel approach to online learning: it leverages LLMs as multiple debaters to generate answers and, through this process, aims to inspire students’ higher-order thinking abilities. The aforementioned literature highlights the application and transformative impact of “online learning and video” in education. Therefore, we summarize the prior literature on online learning from three perspectives: massive open online courses (MOOCs), online learning environments, and LLM-based multi-agents.

2.1. MOOCs

MOOCs represent a widely studied innovation in distance education, first introduced in 2008 [21]. They gained prominence as a popular learning format in 2012, a year often referred to as the “Year of the MOOC” [22]. MOOCs are distinguished by their openness, scalability, and ability to cater to diverse audiences. Openness facilitates free enrollment and unrestricted access to course materials, thereby advancing the democratization of education [23]. Scalability allows instructors to interact with thousands of learners simultaneously, surpassing the constraints of traditional classroom settings [24]. The global pandemic of COVID-19 underscored the role of MOOCs in delivering educational and learning opportunities to a wide audience [25,26,27]. A particularly appealing feature of MOOCs for adult learners is the high degree of control and autonomy they offer over the learning process [28,29]. However, despite their significant potential, MOOCs continue to face challenges related to high dropout rates and low completion rates, which hinder their broader adoption [30].
Despite offering convenient online learning, MOOCs face notable limitations. Completion rates typically range between 5% and 15%, with longer course durations correlating with lower completion rates [31]. Margaryan et al. [32] highlighted shortcomings in traditional MOOCs’ learning outcomes, attributing them to a lack of personalized support, limited feedback mechanisms, and insufficient opportunities for deep learning. The one-size-fits-all approach of MOOCs often fails to address the diverse learning styles and needs of participants [33]. Additionally, the absence of personalized feedback and direct interaction with instructors remains a critical challenge [34].
To address the aforementioned issues, educators and technologists have explored various interactive teaching strategies. The integration of interactive elements into MOOCs has been shown to significantly enhance learning outcomes. Interactive approaches in MOOCs include discussion forums, gamification, and interactive videos. MOOC discussion forums serve as a vital channel for communication between instructors and learners, with their effectiveness shaped by various factors. Wei et al. [35] demonstrated that small group discussions, particularly those with clear objectives and assessment criteria, enhance learner engagement. Similarly, Neha et al. [36] found that structured topics encourage participation, while immediate feedback helps sustain engagement. Galikyan et al. [37] highlighted the dual role of forums in facilitating knowledge exchange and fostering social connections, contributing to deeper learning when instructors actively guide discussions and incorporate peer review. Gamification in MOOCs has demonstrated significant positive impacts. Freitas et al. [38] found that gamification enhances motivation and engagement, with achievement badges serving as powerful incentives. Similarly, Meng et al. [39] showed that a well-designed scoring system promotes skill development and multidimensional engagement but cautioned against the risk of goal misalignment if gamification is overused. Maintaining clear rules and mechanisms is essential to ensure that gamification remains aligned with educational objectives. MOOCs incorporate interactive videos with various elements to enrich the learning experience. Embedded quizzes, implemented by platforms such as Coursera and Udacity [40], serve as checkpoints for self-assessment of knowledge. Furthermore, Chen et al. [41] introduced time-anchored discussions, enabling learners to engage in conversations tied to specific video timestamps, thereby fostering interaction and enhancing content understanding. Some platforms have further innovated by combining linear discussions with live comments, providing more flexible and dynamic interaction opportunities.
Integrating large language models (LLMs) into MOOCs represents a transformative development. These models enable personalized learning experiences, as exemplified by Khan Academy’s Khanmigo, which supports students through adaptive, progressive questioning. Similarly, the Massive AI-Empowered Course (MAIC) [42] establishes an intelligent classroom environment, leveraging AI-driven roles to create personalized learning paths that were previously unattainable in traditional MOOCs.

2.2. Online Learning Environment

The online learning environment is a diverse and digital educational ecosystem that provides learners with flexible educational platforms that transcend geographical, temporal, and cultural boundaries through the Internet, cloud computing, and advanced digital interaction technologies [43]. This complex learning ecosystem is not merely a technological medium but an innovative paradigm that integrates education, cognitive science, and information technology, with the core value of offering highly personalized, adaptive, and inclusive learning experiences [44]. Educational theories provide the essential theoretical foundation and normative guidance for the cognitive framework and practical paradigms of online learning. Self-Determination Theory (SDT) extensively analyzes the inherent mechanisms of learners’ basic psychological needs—autonomy, competence, and relatedness—regarding learning motivation and performance, offering significant theoretical insights into learner agency in online learning contexts [45]. The Cognitive Affective Model of Immersive Learning (CAMIL) deconstructs the immersive experience of learners in digital learning environments in various dimensions, elucidating the complex dialectical relationships between cognitive engagement, emotional involvement, and interaction experiences throughout the learning process [46]. The Inquiry-Based Learning (IBL) framework provides a structured methodology for online learning, emphasizing the cultivation of learners’ autonomy in exploration, critical thinking, and problem-solving skills within the knowledge construction process [47]. Self-Regulated Learning (SRL) strategies reveal the complex pathways through which learners regulate their behaviors and achieve goals during autonomous learning [48,49]. Deeper investigations indicate that learners’ emotional trajectories and levels of interaction critically influence learning outcomes, not only impacting cognitive knowledge acquisition but also reflecting the development of learners’ metacognitive skills and learning adaptability [50]. A study of online educational settings in China demonstrates that cognitive presence, social presence, and teaching presence are significantly correlated with autonomy, competence, relatedness, and perceived learning [51].
The COVID-19 pandemic, in particular, has accelerated the application of digital technologies in education, leading to a growing number of technological innovations in online learning platforms. For example, virtual reality (VR) technology provides a revolutionary learning paradigm for disciplines that require high levels of practicality and interactivity by creating highly immersive, interactive learning environments [46,52]. At the same time, artificial intelligence (AI) optimizes learning processes and dynamically adapts learning paths through intelligent algorithms tailored to the individual needs of learners [53].
By breaking the structural limitations of traditional education, online learning environments achieve global sharing of educational resources, promote learner autonomy, and implement the concept of lifelong learning, thereby advancing educational equity and knowledge democratization in a globalized context. Despite advancements in technology, online education environments still face significant structural challenges. First, traditional online courses primarily rely on one-way lecturing models, which closely mirror conventional classrooms by simply transferring content to the Internet, thus lacking substantial pedagogical innovation. As a result, students remain in a highly passive state during the learning process, hindering their ability to think independently and explore knowledge actively. Second, the lack of interactivity leads to substantial declines in student satisfaction and learning outcomes due to insufficient opportunities for immediate questioning and deep communication [54]. Existing online learning platforms often fail to adapt precisely to learners’ cognitive characteristics and learning paces, resulting in a general lack of personalized learning paths and, consequently, low efficiency and engagement and poor learning outcomes [55]. More critically, these platforms tend to overlook the multidimensional nature of learning, failing to adequately stimulate learners’ cognitive autonomy and emotional engagement, which causes online education to become a mechanical and soulless tool for knowledge transmission. These inherent limitations severely restrict the developmental potential of online education and hinder the genuine transformation of the digital learning ecosystem.

2.3. LLM-Based Learning Systems

LLMs have recently demonstrated remarkable potential, achieving reasoning and planning capabilities comparable to those of humans. This aligns precisely with human expectations for autonomous agents capable of perceiving their surroundings, making decisions, and taking appropriate actions in response [56,57]. Building on the impressive capabilities of single LLM-based agents, LLM-based multi-agent systems have been proposed to harness collective intelligence and specialized skills across multiple agents. Compared to systems employing a single LLM-powered agent, multi-agent systems offer enhanced functionalities by (1) specializing LLMs into distinct agents, each with unique capabilities, and (2) facilitating interactions among these agents to effectively simulate complex real-world environments. Recent studies [58] have shown promising results in leveraging LLM-based multi-agents for diverse tasks, such as software development, multi-robot systems, societal simulations, policy modeling, and game simulations, with particular success in simulating human behavior. Notable examples include Generative Agents [59], Ghost in the Minecraft [60], and GPT-Bargaining [61].
As LLMs continue to advance, intense discussions have arisen regarding methodologies in this field [62,63]. A key focus lies in fully harnessing the capabilities of large models to simulate real classrooms with multiple agents for automated teaching. LLMs have already been applied to various intelligent educational tasks to support teaching. Ref. [64] introduced a multi-agent classroom simulation framework involving user participation, demonstrating that LLMs can effectively replicate traditional classroom interaction patterns while enhancing the user experience. This work pioneered the use of LLM-powered multi-agent systems in virtual classroom teaching.
In summary, how to effectively utilize AI technology to advance education has long been a key research focus, with scholars exploring this topic from various perspectives [9]. For example, Huber et al. [65] explored the use of LLMs through playful and game-based learning methods, while [66,67] implemented LLM prompt-based personalized tutoring. This study approaches the topic from the perspective of video-assisted online learning. AI may negatively impact independent thinking by fostering student dependency, leading to a lack of initiative in problem-solving. This can hinder deeper memory retention of knowledge and the development of critical thinking skills. Mohamed et al. [68] examined the benefits and drawbacks of integrating AI tools into language instruction, while [69] explored various issues caused by ChatGPT (GPT-3) in a broader range of educational scenarios. Zhang et al. [70] discussed students’ dependency on AI and identified its main negative effects, including increased laziness, the spread of misinformation, decreased creativity, and reduced critical and independent thinking. Efficient, reliable, and valid assessment for online learning is also a topic that warrants further in-depth investigation [71]. Finally, and most importantly, the use of artificial intelligence tools like ChatGPT in education raises ethical dilemmas and challenges, including concerns about data privacy, over-reliance on technology, intrinsic biases, and the integrity of student outputs [72]. Achieving a rational and ethically balanced integration of LLMs and conversable agents in educational settings remains a significant challenge for researchers leveraging AI to enhance education. Although there have been quite a few attempts to integrate LLMs into learning systems, the design of such models for personalized learning still has room for further enhancement. Future research could explore the use of LLMs to achieve personalized learning through a conversation-based tutoring system enhanced with more interactive elements such as quizzes or debates that inspire higher-order thinking.

3. Materials and Methods

3.1. Design of Evaluation System

After defining the research questions and hypotheses, the study was conducted in four steps, as outlined in Figure 2. Step 1 focused on obtaining initial scores through a pre-video test, where participants answered questions via an online quiz. Step 2 involved having participants watch a lecture video and interact with a software interface. This step incorporated a learn-as-you-go approach, supported by multi-agent debates to facilitate understanding of key concepts. In Step 3, a post-video test was conducted, which also included answering an online quiz. Step 4 analyzed learning outcomes and subjective evaluations by comparing pre-test and post-test scores, along with responses from a follow-up questionnaire. The results of the questionnaire were analyzed using SPSS software (29.0.1.0(171)).

3.1.1. LLM RAG-Based Video Understanding Function

Videos inherently exhibit multimodal properties, encompassing both visual and audio features. The visual features can be categorized into image captioning, object detection, and optical character recognition (OCR) tasks. We extracted keyframes from the video using CLIP features and used the BLIP model to generate image captions, while employing GRiT and RAM++ for object detection. Additionally, we utilized Whisper to transcribe the audio into text. These multimodal features provide the foundational data for subsequent QA tasks, as shown in Figure 3.
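As a concrete illustration of this extraction stage, the following simplified Python sketch transcribes the audio with Whisper, selects keyframes via CLIP-embedding similarity, and captions them with BLIP. It is not the system’s exact implementation: the model checkpoints, the one-frame-per-second sampling, and the similarity threshold are illustrative assumptions, and the GRiT/RAM++ object detection and OCR steps are omitted.

import cv2
import torch
import whisper
from PIL import Image
from transformers import (CLIPModel, CLIPProcessor,
                          BlipProcessor, BlipForConditionalGeneration)

VIDEO = "lecture.mp4"  # hypothetical input path

# 1. Transcribe the audio track with Whisper.
asr = whisper.load_model("base")
transcript = asr.transcribe(VIDEO)["text"]

# 2. Sample one frame per second and keep frames whose CLIP embedding
#    differs enough from the previously kept frame (a simple keyframe heuristic).
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_embed(img):
    inputs = clip_proc(images=img, return_tensors="pt")
    with torch.no_grad():
        feat = clip_model.get_image_features(**inputs)
    return feat / feat.norm(dim=-1, keepdim=True)

cap = cv2.VideoCapture(VIDEO)
step = max(int(cap.get(cv2.CAP_PROP_FPS)), 1)
keyframes, prev, idx = [], None, 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if idx % step == 0:
        img = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        emb = clip_embed(img)
        if prev is None or (emb @ prev.T).item() < 0.9:  # 0.9: illustrative threshold
            keyframes.append(img)
            prev = emb
    idx += 1
cap.release()

# 3. Caption each keyframe with BLIP (object detection and OCR omitted here).
blip_proc = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
captions = []
for img in keyframes:
    out = blip.generate(**blip_proc(images=img, return_tensors="pt"), max_new_tokens=30)
    captions.append(blip_proc.decode(out[0], skip_special_tokens=True))

# The transcript and captions (plus OCR text and object tags in the full system)
# are the documents handed to the RAG module for question answering.
print(transcript[:200], captions[:3])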
The method proposed in this paper is compared with the current state-of-the-art approaches, including VideoChat2 (https://github.com/OpenGVLab/Ask-Anything, accessed on 5 September 2024) and the open-source VLog project. Unlike VideoChat2, which primarily utilizes visual features, our approach incorporates additional modalities, such as audio and OCR features. As a result, our method demonstrates a superior ability to capture finer details in video content. Figure 4a shows that current state-of-the-art methods for video understanding, such as VideoChat2 [73], can only answer questions related to the visual content of the video but struggle with finer details. In contrast, the multi-modal video analysis method based on the VLog project (https://github.com/showlab/VLog, accessed on 27 August 2024) can answer detailed questions about the video, though not always with full accuracy. The results presented in Figure 4b build on the VLog project, where ChatGPT is replaced with the free online “WizardLM-2” LLM API by Together.ai (https://api.together.ai/, accessed on 27 August 2024), along with English-to-Chinese translation using the pre-trained “Helsinki-NLP/opus-mt-en-zh” model. WizardLM-2’s capabilities are reportedly close to those of cutting-edge proprietary models such as GPT-4-1106-preview and ahead of other open-source models. However, as shown in Figure 4c, our system, enhanced with OCR recognition, significantly improves accuracy in capturing detailed information within the video content. Compared to existing systems, our system extracts video information more accurately and then provides it to the RAG module for answer generation.
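The answer-generation step can be sketched as follows, assuming Together.ai’s OpenAI-compatible chat completions endpoint and the Hugging Face translation pipeline. The WizardLM-2 model identifier, the prompt wording, and the retrieval step (represented here by a pre-assembled context string) are assumptions for illustration rather than the exact production code.

import os
import requests
from transformers import pipeline

def answer(question, retrieved_context):
    # Call Together.ai's OpenAI-compatible chat completions endpoint.
    resp = requests.post(
        "https://api.together.xyz/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
        json={
            "model": "microsoft/WizardLM-2-8x22B",  # assumed WizardLM-2 identifier
            "messages": [
                {"role": "system",
                 "content": "Answer the question using only the provided lecture context."},
                {"role": "user",
                 "content": f"Context:\n{retrieved_context}\n\nQuestion: {question}"},
            ],
        },
        timeout=60,
    )
    return resp.json()["choices"][0]["message"]["content"]

# Translate the English answer to Chinese with the pre-trained MarianMT model.
translate = pipeline("translation", model="Helsinki-NLP/opus-mt-en-zh")

context = "...retrieved transcript, caption, and OCR chunks..."  # placeholder
english_answer = answer("What does L2 regularization add to the loss function?", context)
print(translate(english_answer)[0]["translation_text"])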

3.1.2. Multi-Agent Debate-Assisted Learning Function

Using large language models as agents to assist users in completing various tasks is increasingly becoming a trend. In particular, multi-agent collaboration can address the limitations of a single model, resulting in improved accuracy and robustness.
The study on Multi-Agent Debate (MAD) [74] demonstrates that reflection-based methods are prone to the Degeneration-of-Thought (DoT) problem. Once a large language model becomes confident in its initial solution, it struggles to generate new ideas through further reflection, even if the initial stance is incorrect. Multi-Agent Debate, inspired by another fundamental strategy of human-like problem solving, employs two agents to express their own arguments in a “tit-for-tat” state. A judge then monitors and manages the debate process to generate a final answer. The core idea of MAD is outlined as follows: (1) Through debate, the distorted thinking of one agent can be corrected by the others. (2) Through debate, the resistance of one party can be counterbalanced by the support of other members participating in the debate. (3) The answers of each agent can serve as context for mutual feedback. Therefore, MAD is less vulnerable to the occurrence of the DoT problem and can explore divergent chains of thought (CoTs) to reduce hallucinations. As shown in Table 2, for questions that require a more comprehensive understanding through multiple perspectives and dialectical discussion, the debate function is clearly more suitable than the chat function. Multi-agent debate can be applied not only to STEM (Science, Technology, Engineering, and Mathematics) education but also to humanities and history studies.
We followed the idea from the work on divergent thinking based on MAD, employing three LLM-based agents to conduct multi-round debates. We utilized three roles, which are elaborated as follows: moderator, affirmative debater, and negative debater (Table 3). Based on the given topics, both sides present their arguments and provide supporting reasons. After each round, the moderator evaluates the responses and determines the winning side. After three rounds of debate, the moderator provides a final summary evaluation, along with the corresponding rationale. The details of the debate result shown in Figure 5 can be found in Table 2, which presents an example of a specific debating process. The debate surfaces the reasons behind each argument, which can inspire students’ dialectical thinking. The Q&A answers rely on LLM and RAG technologies and are generated from the speech transcript, the OCR text, and the object detection results extracted from the video frames. Text generation in the debate, by contrast, relies entirely on the knowledge encoded in the pre-trained large language model we use.
Regarding the implementation of multi-agents, we referred to the open-source project available at https://github.com/Skytliang/Multi-Agents-Debate, accessed on 6 August 2024. Specifically, we used the free online “Qwen2.5-72B-Instruct-Turbo” LLM API provided by Together.ai. Each agent is an instance of a large language model, and different responsibilities are assigned through different role prompts. Each agent has its own memory. The output is stored in the memory as the content of historical conversations, and its own memory is referred to when generating new responses. The moderator controls the advancement of the rounds, and its output is broadcast to everyone, indirectly affecting the next-round input of other agents. We preset the meta prompt (role setting) used by each role, the dynamic prompt template required for the moderator’s judging, and the alternative judgment prompt in case of an inability to make a ruling. We also provide the detailed step-by-step procedure of MAD in Algorithm 1.
Algorithm 1 Multi-Agent Debate Process
Input: The debate topic provided by the user and the number of rounds (n).
Output: The debate topic and a complete debate process text.
Step 1: Initialize Agents
 Create three agents: Affirmative, Negative, and Moderator
 Inject meta-prompts for each agent role
 Embed the debate topic into the preset prompt templates

Step 2: Commence the Debate
 Affirmative agent presents the initial argument
 Negative agent reads the affirmative argument and generates a refutation
 Moderator evaluates both arguments and broadcasts a neutral judgment

Step 3: Judging
  while fewer than n debate rounds have been completed do
     if the Moderator identifies a clearly preferred side then
          Terminate the debate, summarize the reasoning and final decision, and go to Step 4
     else
          Proceed to the next round: the Affirmative and Negative agents update their memory
          and perform turn-based rebuttals
     end if
  end while
 If no side has prevailed after n rounds, invoke the alternative judgment prompt for the Moderator:
       a. Extract all answer candidates
       b. Summarize the reasoning and produce the final decision

Step 4: Finish the Debate
 Return the debate topic, arguments from both sides, full speech log, final decision, and reasoning
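For illustration, a minimal Python sketch of the debate loop in Algorithm 1 is given below, assuming the same OpenAI-compatible Together.ai endpoint and the Qwen2.5-72B-Instruct-Turbo model mentioned above. The role prompts, the “WINNER”/“CONTINUE” convention for detecting a ruling, and the two-rebuttal round structure are simplified assumptions rather than the exact prompt templates used in the study.

import os
import requests

MODEL = "Qwen/Qwen2.5-72B-Instruct-Turbo"  # assumed Together.ai model identifier

def chat(messages):
    # Single call to Together.ai's OpenAI-compatible chat completions endpoint.
    resp = requests.post(
        "https://api.together.xyz/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
        json={"model": MODEL, "messages": messages},
        timeout=60,
    )
    return resp.json()["choices"][0]["message"]["content"]

class Agent:
    """One debater or moderator: a role prompt plus its own conversation memory."""
    def __init__(self, system_prompt):
        self.system_prompt = system_prompt
        self.memory = []

    def speak(self, user_msg):
        messages = ([{"role": "system", "content": self.system_prompt}]
                    + self.memory + [{"role": "user", "content": user_msg}])
        reply = chat(messages)
        self.memory += [{"role": "user", "content": user_msg},
                        {"role": "assistant", "content": reply}]
        return reply

def debate(topic, n_rounds=3):
    aff = Agent(f"You are the affirmative debater. Argue in favour of: {topic}")
    neg = Agent(f"You are the negative debater. Argue against: {topic}")
    mod = Agent("You are the moderator. After each round reply 'WINNER: <side>' only "
                "if one side clearly prevails; otherwise reply 'CONTINUE'.")

    # Step 2: opening argument and first refutation.
    aff_arg = aff.speak("Present your opening argument.")
    neg_arg = neg.speak(f"Your opponent argued:\n{aff_arg}\nRefute it.")

    # Step 3: judging loop with turn-based rebuttals.
    for rnd in range(1, n_rounds + 1):
        verdict = mod.speak(f"Round {rnd}.\nAffirmative: {aff_arg}\nNegative: {neg_arg}")
        if "WINNER" in verdict:  # a clearly preferred side ends the debate early
            return mod.speak("Summarize the reasoning and give the final decision.")
        aff_arg = aff.speak(f"Rebut your opponent:\n{neg_arg}")
        neg_arg = neg.speak(f"Rebut your opponent:\n{aff_arg}")

    # No ruling after n rounds: fall back to the alternative judgment prompt (Step 4).
    return mod.speak("Extract all candidate answers from the debate, summarize the "
                     "reasoning, and produce a final decision.")

print(debate("Is L1 regularization obsolete now that L2 regularization is widely used?"))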

3.2. Participants and Design

In this section, we present a controlled study conducted to evaluate the potential benefits of providing students with video lectures, supplemented by video chat and LLM agents’ debates. Below, we outline the study’s participants, the testing conditions, and the results.
Participants. We recruited 197 undergraduate and postgraduate students to participate in the study. These students, specializing in computer science, were from Hangzhou Normal University, Nanjing University of Information Science and Technology, and Hebei University of Technology. Apart from the pre-test and post-test quiz, we also provided a follow-up questionnaire. Due to the time-consuming nature of the questionnaire, a total of 90 valid responses were ultimately returned. Undergraduate students accounted for 44.4%, and postgraduate students accounted for 55.6%.
Study design. We evaluated student performance on materials related to two machine learning topics. These topics were covered in two short online lecture videos by Andrew Ng: one on regularization (approximately 9 min) and the other on numerical approximation of gradients (approximately 6 min). For each topic, we assessed the following three learning conditions.
Testing conditions. (1) Watching only the lecture video; (2) watching the lecture video with the ability to ask questions via video-based Q&A; (3) watching the lecture video with the ability to ask questions via video-based Q&A and starting a multi-agent debate on specific topics.
All students were randomly assigned to a different learning condition for each topic. Prior to watching the lecture videos, students completed pre-test quizzes consisting of five single-choice questions and one open-ended question per topic to assess their initial understanding of the concepts. Once students had watched the video and indicated that they were ready, we administered the post-test quiz. The post-test quiz comprised the six questions from the pre-test, along with four additional, previously unseen single-choice questions. Each correct answer earned 1 point, while incorrect answers received 0 points. All the test questions were designed by teachers who teach relevant courses. Moreover, the difficulty level was determined through tests carried out by an undergraduate and a graduate student who are beginners in this area. Most students completed the test through WeChat groups, while a smaller number participated in face-to-face testing in the laboratory. To mitigate the biases introduced by heterogeneous assessment conditions, all participants, whether recruited via WeChat groups or undergoing face-to-face testing in the laboratory, were required to complete the quiz on our runnable system (https://huggingface.co/spaces/dj86/VLog4CustomLLMsPlusDebate, accessed on 1 September 2024). Our runnable system can be considered a prototype for LLM-based online video learning. It is equipped with features such as question generation based on retrieval-augmented generation and multi-agent debate functionality.
Measures. Due to differences in students’ prior knowledge, we assessed the effectiveness of the learning methods by measuring the relative improvement in test scores before and after the learning videos. We calculated the relative improvement as follows:
Δ = (Post score (%) − Pre score (%)) / Pre score (%)
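For example, a student who scores 60% on the pre-test and 80% on the post-test has a relative improvement of (80 − 60)/60 ≈ 33.3%. The short snippet below, with hypothetical scores, makes the computation explicit.

def relative_improvement(pre_pct, post_pct):
    """Delta = (post - pre) / pre, with both scores expressed as percentages."""
    return (post_pct - pre_pct) / pre_pct

print(relative_improvement(60.0, 80.0))  # 0.333..., i.e. a 33.3% relative gain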

4. Results

4.1. Data Analysis

According to the pre-test and post-test results, as shown in Table 4, the mean score on the pre-test questions was below 70%. We limited our analysis to the single- and multiple-choice questions, as grading the open-ended questions proved more challenging. Instead, we employed a word cloud analysis to examine the open-ended questions. After reviewing the lecture videos, students under all conditions demonstrated learning effectiveness. The learning method using MAD achieves the greatest improvement in outcomes. We employed a paired t-test and Cohen’s d [75] to analyze test correctness scores before and after “video-only” [t(29) = 3.946, p < 0.001, 95% CI [−22.724, −7.210], Cohen’s d = 0.720], “video-based Q&A” [t(30) = 2.765, p = 0.005, 95% CI [−16.320, −2.454], Cohen’s d = 0.497], and “video-based Q&A + MAD” [t(28) = 3.260, p = 0.001, 95% CI [−31.108, −7.099], Cohen’s d = 0.605] interventions. These findings indicate that the “video-based Q&A + MAD” approach is more effective than “video-only” and “video-based Q&A” methods for online learning. These results support H1. Question samples and SPSS 29.0.1.0 results can be found at https://osf.io/4vt9h/, accessed on 26 November 2024.
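The same paired comparison can be reproduced outside SPSS. The sketch below uses SciPy on hypothetical per-student percentage scores, with Cohen’s d computed on the paired differences; the study may have used a different d convention, so the numbers are illustrative only.

import numpy as np
from scipy import stats

pre = np.array([55.0, 60.0, 70.0, 45.0, 65.0])   # hypothetical per-student pre-test % scores
post = np.array([75.0, 70.0, 85.0, 60.0, 80.0])  # hypothetical per-student post-test % scores

t_stat, p_value = stats.ttest_rel(pre, post)      # paired (dependent-samples) t-test
diff = post - pre
cohens_d = diff.mean() / diff.std(ddof=1)         # d computed on the paired differences
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, d = {cohens_d:.3f}")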
Those in condition 2 exhibited the smallest improvement, whereas those in condition 3 showed the greatest improvement. Overall, the use of large language models appears to offer an advantage over traditional video-watching methods. Table 4 indicates that the post-test performance of students in condition 3 shows a relative improvement of 29.78%, compared to 14.70% for students in condition 2 and 23.39% for students in condition 1. The quiz questions consisted of a mix of true/false questions and single- and multiple-choice questions. A possible explanation is that the video-based Q&A approach might encourage students to take shortcuts, as they may not watch the video attentively and, instead, rely on asking questions directly to learn. This behavior could result in overlooking some video content, thereby leading to lower scores. In contrast, the “video-based Q&A + MAD” approach likely enhances students’ interest in learning through debates. The presentation of affirmative viewpoints may prompt students to question specific knowledge points, extending their study time and fostering a more comprehensive understanding of the video content. The pre-test and post-test questions are presented in Appendix A. As shown in Figure 6, we used violin plots to present the score distributions of the three groups in the pre-test and post-test. It can be observed that the pre-test and post-test score distributions of the video-based Q&A + MAD method differ markedly, and these differences are much more pronounced than those of the other two groups.
Therefore, we conducted a further analysis from the perspective of students’ video-viewing duration. Specifically, we examined the variations in the duration of students’ engagement with online lecture videos under different scenarios, with and without the assistance of large language models. By comparing these durations in Figure 7, we can discern students’ interest in utilizing large language models and determine whether the use of such models can enhance students’ interest in learning videos. As shown in Figure 7, there is significant variation in the time students spent watching the video in the “traditional learning” condition. Some students spent only a few minutes, while others spent over an hour. One possible factor is that the learning material is in English, which can be challenging for Chinese students. As a result, some students may quickly skim through the video, while others might repeatedly listen or pause to focus on the video visuals. In contrast, the other two learning conditions, with the intervention of large language models, were more effective in engaging students’ interest. However, it is inevitable that some students, seeking convenience, may skip watching the video and, instead, rely solely on asking questions to learn from it. The “video-based Q&A + MAD” approach encouraged students to ask questions and initiate debates, helping them better understand the video content. These findings support Hypothesis (H2), which suggests that using video-based multi-agent debates helps students stay more focused on learning through videos.
The word frequency analysis presented in Figure 8 and Figure 9 indicates notable differences in responses to open-ended questions before and after watching the video. Post-test responses demonstrate a noticeably richer set of keywords. Additionally, inter-group comparisons reveal that under the third testing condition (“video-based Q&A + MAD”), the keywords are significantly more diverse. These findings suggest that MAD has a positive impact on online learning.
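The keyword comparison behind these word clouds can be approximated with a simple frequency count. The sketch below assumes the Chinese open-ended answers are available as plain strings and uses jieba for word segmentation; the exact tokenization and stop-word handling used in the study are not specified, so this is illustrative only.

from collections import Counter
import jieba  # Chinese word segmentation, since the answers were written in Chinese

def top_keywords(answers, k=20):
    # Tokenize every answer and keep words longer than one character.
    tokens = [w for a in answers for w in jieba.lcut(a) if len(w.strip()) > 1]
    return Counter(tokens).most_common(k)

pre_answers = ["...pre-test open-ended answers..."]   # placeholders for the collected text
post_answers = ["...post-test open-ended answers..."]
print(top_keywords(pre_answers))
print(top_keywords(post_answers))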
We present representative quotes from student responses to the open-ended question for each group during the post-test on the topic of “Numerical Approximation of Gradients”:
Response using the “video-only” condition (English translation): One-sided difference refers to performing difference operations in one direction, typically used to eliminate errors in that direction. Two-sided difference involves difference operations in both directions, usually employed for further error elimination.
Response using the “video-based Q&A” condition (English translation): One-sided difference approximates derivatives based on changes in the function between x and x + h, while two-sided difference uses changes between x and x − h for a more precise approximation with smaller errors.
Response using the “video-based Q&A + MAD” condition (English translation): One-sided difference focuses on the deviation of a single data point from the reference point, often used for preliminary data processing. Two-sided difference, by comparing two related data points, effectively reduces noise, highlights features, and optimizes model performance.
Similarly, representative quotes from student responses to open-ended questions for each group during the post-test on the topic of “regularization” are presented as follows:
Response using the “video only” (English translation): Although L2 regularization is more commonly used in deep learning, its widespread application does not render L1 regularization obsolete. L1 regularization offers unique advantages in feature selection and generating sparse models, making it highly effective in specific scenarios. Therefore, depending on the requirements of a particular task, both L1 and L2 regularization have their irreplaceable roles.
Response using the “video-based Q&A” (English translation): In machine learning and deep learning, L1 and L2 regularization are two commonly used techniques to prevent model overfitting. L1 regularization penalizes large weights by adding the absolute values of the weights to the loss function, while L2 regularization achieves this by adding the squared values of the weights.
Response using the “video-based Q&A + MAD” (English translation): Preventing overfitting is one of the primary objectives of regularization. While both L1 and L2 regularization can mitigate overfitting, they achieve this through different mechanisms. L1 regularization reduces the number of non-zero weights, whereas L2 regularization minimizes the magnitude of the weights. This diversity allows for the selection of the most suitable regularization method based on the specific problem and dataset. L1 regularization offers an alternative strategy for combating overfitting, complementing L2 regularization rather than opposing it.
These responses illustrate that the “video-only” approach enables students to acquire basic and accurate knowledge, the “video-based Q&A” approach allows for more precise answers, and the “video-based Q&A + MAD” approach further enhances the breadth of student responses. In other words, there is an increasing depth and breadth of understanding as the learning condition becomes more interactive. These findings are also consistent with the research results of previous papers. Through AI-assisted learning technologies, high-quality personalized learning can be achieved, improving students’ abstract thinking and critical thinking [42].

4.2. Student Feedback Analysis

To further analyze the role of “video-based Q&A + MAD” in students’ learning, we administered a follow-up questionnaire. The questionnaire included eight questions, five of which were rating-based using a five-point Likert scale, while the remaining three questions solicited users’ suggestions for improvement. Table 5 presents the detailed ratings from the follow-up questionnaire. Each item, representing an evaluation question in the follow-up questionnaire, was rated on a scale from 1 (lowest) to 5 (highest). Users who experienced the MAD functionality generally gave high scores, indicating a positive learning experience and supporting Hypothesis H2. We conducted a word cloud analysis of user improvement suggestions. The questionnaire focused on the “video-based Q&A” function, the MAD function, and the additional features that users most desired to include in future updates. According to the word cloud analysis in Figure 10, students evaluated the “video-based Q&A” function as accurate, fast, and effective in addressing the content of the videos. The multi-agent debate feature was rated as helpful, offering different viewpoints for understanding the learning material. The most requested additional features were video voice translation (as many Chinese students prefer not to listen to lengthy English lectures) and summaries of key points. These suggestions reflect users’ subjective experiences and align closely with the responses to the earlier open-ended questions.

5. Discussion

Interaction between learners and instructors, as well as among peers, plays a crucial role in helping learners construct meaning from educational materials by articulating their ideas, engaging with diverse perspectives, and refining their understanding through feedback. However, facilitating such interaction in MOOCs presents a considerable challenge. The use of LLM-driven agents to support both teaching and learning in online educational environments is expected to become a significant trend, as highlighted in various studies [42,63,64]. Experiments demonstrate that introducing multi-agent systems into online learning environments can inspire students to think critically and help deepen their understanding of course content. Although this research focuses on lecture videos in computer science, the methodology is applicable to other engineering courses.
This study does not focus on the impact of video lecture summaries on learning, as existing research [76] has already concluded that incorporating automatically generated summaries with videos can enhance students’ overall learning experience. Chen et al. [77] also emphasized that designing and implementing an interactive video lesson alone is insufficient. They highlighted that incorporating summarizing strategies and prompts can enhance video-based learning by improving learning outcomes, reducing cognitive load, increasing intrinsic motivation, fostering metacognition, and further engaging learners.
Large language models have already started to drive the transformation of the teaching model from the “teacher–student” binary structure to the “teacher–machine–student” ternary structure [78]. This structural change propels teaching from a “teaching-centered” approach to a “learning-centered” one. The educational paradigm will shift from supply-dominated to demand-dominated, reshaping and updating the role functions and competency requirements of teachers. Compared with the traditional binary structure, the technology-empowered “teacher–machine–student” ternary structure liberates teachers from inefficient, repetitive, and onerous knowledge imparting, allowing them to return to the core work of educating. Teachers can then focus on the more creative and affective dimensions of education, emphasizing the cultivation of students’ autonomous learning abilities and creativity. Singapore is the first country globally to publicly support the use of ChatGPT in the education system. Its education authorities will guide teachers to effectively use intelligent tools like ChatGPT to enhance learning [79].
Our user study evaluates students using two short, pre-recorded, and carefully edited lecture videos. Future research should explore the effectiveness of multi-modal LLMs under broader conditions, including prerecorded versus live lectures, varying lengths, and diverse topics. While our experiments compared three conditions, additional conditions could provide further insights into the value of multi-agent chat and debate. For instance, limiting the time students have to answer questions, tracking the frequency of their use of agents, and surveying their willingness to use the tool before attempting the questions could yield valuable findings. It is worthwhile to explore the possibility of customizing the behavior of multi-agent systems according to students’ knowledge levels or their behaviors when watching video materials. Designing a closed-loop system to generate dynamic feedback is of great significance. Multi-agents integrated into formal educational settings, such as online lectures, should be able to support learners for long-term use. The design principles of digital tutors and the technologies of AI-augmented classrooms shed light on our future research [42,80]. Additionally, exposing students to a different condition, analyzing their changes by comparing their results and feedback, and collecting students’ questions to analyze the patterns of their inquiries will all be highly meaningful future work. Moreover, this study focuses on students’ immediate performance after learning. However, it would be more valuable to evaluate long-term memory retention, changes in learning habits, or the impact on metacognition. Therefore, we plan to incorporate the evaluation of long-term memory retention, changes in learning habits, and the influence on metacognition in our future research.
Additionally, due to GPU hardware limitations, the sample size of participants was relatively small, and most assessments were conducted as independent online tests. In future studies, we will focus on improving the experimental design and selecting model parameters that match the available GPU computing power. On one hand, we can integrate regular assignments in computer science teaching to expand the sample size of the questionnaire. On the other hand, we can rely on the university’s servers to deploy open-source models with relatively low hardware requirements, such as the recently popular DeepSeek [81], to overcome GPU computing-power limitations. To date, many large language model families have released small 7B and 14B models, as well as distilled variants. Combined with RAG technology, these models should be sufficient to handle answer generation for knowledge-based questions in most scenarios.

6. Conclusions

This study focuses on video-based online learning, comparing the traditional “video-only” learning approach with methods that incorporate large language models, specifically the “video-based Q&A” and “video-based multi-agent debate” approaches. These methods represent some of the latest trends in LLM applications, offering a more interactive learning experience compared to the traditional unidirectional MOOC format. Additionally, generative responses based on retrieval-augmented generation technology provide students with more comprehensive knowledge summaries. In summary, the method proposed in this paper utilizes multi-agent debates to provide students with reference answers from different perspectives, aiming to deepen students’ understanding of knowledge and cultivate their critical thinking. However, compared with in-person teaching, it falls short in cultivating other higher-order thinking abilities, such as deep and abstract thinking. It also cannot offer the reflections on life, inspiration, and insights beyond the realm of artificial intelligence that human teachers can provide. This is a common drawback of using AI tools for learning.

Author Contributions

Conceptualization, F.L. and J.D.; methodology, G.X. and J.D.; software, J.D.; validation, J.D.; formal analysis, J.D.; investigation, J.D.; resources, D.Z.; data curation, W.L. and J.D.; writing—original draft preparation, F.L. and J.D.; writing—review and editing, F.L.; visualization, J.D.; supervision, F.L.; project administration, F.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Project of Chinese National Educational Science Planning “Research on the Paths and Mechanisms of Enhancing Middle School Students’ Scientific Inquiry Practices with Large Language Models” (BCA240049).

Institutional Review Board Statement

The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board of the School of Information Science and Technology, Hangzhou Normal University (protocol code 2024-010, date of approval: 9 September 2024).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The collected data are publicly available at the following link: https://osf.io/4vt9h/files/osfstorage (accessed on 26 November 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Pre-Test and Post-Test Questions

All items were originally presented in Chinese in the study and translated into English for understanding.

Appendix A.1. Regularization Pre-Test Questions

1.
Overfitting usually manifests itself as a high variance problem. (T/F)
2.
The regularization parameter λ serves to increase model complexity. (T/F)
3.
In machine learning and deep learning, L2 regularization is achieved by adding a regularization term to the loss function, which is
(a)
The cube of the L2 norm of the model parameters (usually referred to as weights w)
(b)
The square of the L2 norm of the model parameters (usually referred to as weights w)
(c)
Minus the cube of the L2 norm of model parameters (usually referred to as weights w)
(d)
Minus the square of the L2 norm of model parameters (usually referred to as weights w)
4.
Which of the following options are correct in the description of L2 regularization in neural networks (Multiple choice)?
(a)
L2 regularization is achieved by adding a term to the loss function that is proportional to the square of the weights
(b)
L2 regularization terms usually do not include bias terms
(c)
During gradient descent, L2 regularization causes the weights to shrink towards zero, thus preventing overfitting
(d)
The larger the L2 regularization factor (λ), the weaker the effect of regularization
(e)
L2 regularization is achieved by directly modifying the weights during the forward propagation of the neural network
5.
What is the main difference between L1 regularization and L2 regularization in deep learning?
(a)
L1 regularization takes into account the square of the weights, while L2 regularization does not take into account the square of the weights.
(b)
There is no difference in effect between L1 regularization and L2 regularization.
(c)
L1 regularization focuses on preventing overfitting, while L2 regularization is usually used for feature selection.
(d)
L1 regularization tends to produce sparse weight matrices, while L2 regularization makes the weights smoother.
6.
There is a growing tendency to use L2 regularization when training networks to better avoid overfitting. Does this mean that L1 regularization has gradually lost its usefulness? Please write down what you think.

Appendix A.2. Regularization Post-Test Questions

1.
Overfitting usually manifests itself as a high variance problem. (T/F)
2.
The regularization parameter λ serves to increase model complexity. (T/F)
3.
Why is L2 regularization usually performed only for the parameter w and not for the parameter “b”?
(a)
“b” is one of many parameters that have a small effect on model complexity.
(b)
The dimension of “b” is usually higher than that of “w”.
(c)
The range of values for “b” is usually larger than for “w”, which does not lend itself easily to regularization.
(d)
The update of “b” has no significant effect on model performance.
4.
What is the main purpose of regularization techniques in deep learning?
(a)
Improving model accuracy
(b)
Accelerating the model training process
(c)
Reducing model overfitting
(d)
Simplifying model complexity
5.
In machine learning and deep learning, L2 regularization is achieved by adding a regularization term to the loss function, which is
(a)
The cube of the L2 norm of the model parameters (usually referred to as weights w)
(b)
The square of the L2 norm of the model parameters (usually referred to as weights w)
(c)
Minus the cube of the L2 norm of model parameters (usually referred to as weights w)
(d)
Minus the square of the L2 norm of model parameters (usually referred to as weights w)
6.
How can Dropout help improve model performance in neural networks (Multiple choice)?
(a)
Reducing overfitting
(b)
Increasing the complexity of the model
(c)
Improving the generalization of models
(d)
Accelerating the training process
7.
What are the correct statements about the regularization term λ/(2m) × ||W||_F² (the squared Frobenius norm) (Multiple choice)?
(a)
It sums the squares of the entries of the weight matrices W of all layers
(b)
It helps to reduce the norm of the weight matrices, thus avoiding overfitting
(c)
It has the same effect as L2 regularization in logistic regression
(d)
It only applies to fully connected layers, not to convolutional layers
8.
Which of the following options are correct in the description of L2 regularization in neural networks (Multiple choice)?
(a)
L2 regularization is achieved by adding a term to the loss function that is proportional to the square of the weights
(b)
L2 regularization terms usually do not include bias terms
(c)
During gradient descent, L2 regularization causes the weights to shrink towards zero, thus preventing overfitting
(d)
The larger the L2 regularization factor (λ), the weaker the effect of regularization
(e)
L2 regularization is achieved by directly modifying the weights during the forward propagation of the neural network
9.
What is the main difference between L1 regularization and L2 regularization in deep learning?
(a)
L1 regularization takes into account the square of the weights, while L2 regularization does not take into account the square of the weights.
(b)
There is no difference in effect between L1 regularization and L2 regularization.
(c)
L1 regularization focuses on preventing overfitting, while L2 regularization is usually used for feature selection.
(d)
L1 regularization tends to produce sparse weight matrices, while L2 regularization makes the weights smoother.
10.
There is a growing tendency to use L2 regularization when training networks to avoid overfitting. Does this mean that L1 regularization has gradually lost its usefulness? Please write down your thoughts.
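For readers who want to connect the regularization items above to concrete computation, the following minimal NumPy sketch (an illustration added for this appendix, not part of the test materials given to students) shows one common way the L2 penalty λ/(2m) · ||W||²_F and inverted dropout are implemented. The weight shapes, the number of training examples m = 100, and the λ values are arbitrary assumptions.

```python
import numpy as np

def l2_penalty(weights, lam, m):
    """L2 (weight decay) term: lambda/(2m) times the sum of squared Frobenius norms
    of all weight matrices. Bias vectors are deliberately excluded, mirroring common practice."""
    return (lam / (2 * m)) * sum(np.sum(W ** 2) for W in weights)

def regularized_grad(dW, W, lam, m):
    """Gradient of the penalized loss w.r.t. one weight matrix: the unregularized
    gradient plus (lambda/m) * W, which shrinks the weights toward zero."""
    return dW + (lam / m) * W

def inverted_dropout(a, keep_prob, rng):
    """Inverted dropout: randomly zero activations and rescale by 1/keep_prob so the
    expected activation is unchanged (applied at training time only)."""
    mask = rng.random(a.shape) < keep_prob
    return (a * mask) / keep_prob

# Illustrative numbers only: a larger lambda gives a larger penalty and stronger shrinkage.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
m = 100  # assumed number of training examples
for lam in (0.0, 0.1, 1.0):
    print(f"lambda={lam}: penalty={l2_penalty([W1, W2], lam, m):.4f}")
```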

Appendix A.3. Numerical Approximation of Gradients Pre-Test Questions

1.
What is the main purpose of gradient numerical approximation?
(a)
Directly calculate the derivative of a function at a certain point
(b)
Test the accuracy of the derivative by approximation method
(c)
Taylor series expansion of the solution function
(d)
Computational efficiency of optimization function
2.
What kind of function is the gradient numerical approximation method mainly suitable for?
(a)
Differentiable continuous functions
(b)
Functions that are not differentiable
(c)
Discrete function
(d)
Any function
3.
Which of the following formulas is the standard formula for the two-sided difference method?
(a)
(f(x + h) - f(x))/h
(b)
(f(x) - f(x - h))/h
(c)
(f(x + h) - f(x - h))/(2h)
(d)
(f(x + h) + f(x - h))/(2h)
4.
What is the main difference between unilateral difference and bilateral difference in gradient numerical approximation?
(a)
Computational complexity
(b)
The magnitude of the approximation error
(c)
Type of function used
(d)
Whether a derivation formula is needed
5.
In the gradient test, which one is more effective?
(a)
Unilateral difference
(b)
Bilateral difference
6.
Please briefly describe the basic principles of, and the differences between, the unilateral and bilateral difference methods.
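As a companion to the questions above (not part of the test itself), the short sketch below contrasts the unilateral difference (f(x + h) - f(x))/h with the bilateral difference (f(x + h) - f(x - h))/(2h) on a function whose derivative is known in closed form. The function and step size are arbitrary choices for illustration; the bilateral error is typically on the order of h², versus h for the unilateral form.

```python
# Compare unilateral and bilateral difference approximations of f'(x) for f(x) = x^3.
f = lambda x: x ** 3
x, h = 2.0, 1e-3
true_grad = 3 * x ** 2          # exact derivative: 12.0

unilateral = (f(x + h) - f(x)) / h
bilateral = (f(x + h) - f(x - h)) / (2 * h)

print("unilateral error:", abs(unilateral - true_grad))  # about 6e-3 (order h)
print("bilateral error:", abs(bilateral - true_grad))    # about 1e-6 (order h^2)
```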

Appendix A.4. Numerical Approximation of Gradients Post-Test Questions

1.
What is the main purpose of gradient numerical approximation?
(a)
Directly calculate the derivative of a function at a certain point
(b)
Test the accuracy of the derivative by approximation method
(c)
Taylor series expansion of the solution function
(d)
Computational efficiency of optimization function
2.
What kind of function is the gradient numerical approximation method mainly suitable for?
(a)
Differentiable continuous functions
(b)
Functions that are not differentiable
(c)
Discrete function
(d)
Any function
3.
Which of the following formulas is the standard formula for the two-sided difference method?
(a)
(f(x + h) - f(x))/h
(b)
(f(x) - f(x - h))/h
(c)
(f(x + h) - f(x - h))/(2h)
(d)
(f(x + h) + f(x - h))/(2h)
4.
What is the main difference between unilateral difference and bilateral difference in gradient numerical approximation?
(a)
Computational complexity
(b)
The magnitude of the approximation error
(c)
Type of function used
(d)
Whether a derivation formula is needed
5.
In the gradient test, which one is more effective?
(a)
Unilateral difference
(b)
Bilateral difference
6.
In a gradient test, the unilateral difference mainly computes
(a)
Gradient caused by a large change in a function at a certain point in a certain direction
(b)
An approximation of the gradient caused by a small change in the function at a point in a certain direction
(c)
The average value of the gradient caused by the change of the function in all directions at one point
(d)
The gradient difference caused by a change in the opposite direction of a function at a certain point
7.
The main difference between two-sided difference and one-sided difference in calculating the gradient approximation is
(a)
Bilateral difference only takes into account changes in one direction
(b)
Bilateral differences consider changes in two opposite directions
(c)
Bilateral difference does not consider the change of the function, only the gradient
(d)
There is no difference in calculation between bilateral difference and unilateral difference
8.
In practical applications, what factors may affect the choice of unilateral difference or bilateral difference for gradient testing?
(a)
Derivability of the function
(b)
Limitations on computing resources
(c)
Requirements for accuracy
(d)
All of the above factors
9.
In practical applications, which strategies might be adopted to balance computational efficiency and accuracy of gradient tests?
(a)
Select the difference method according to the function characteristics
(b)
Always use bilateral differences
(c)
Always use unilateral difference
(d)
No gradient test
10.
Please briefly describe the basic principles of, and the differences between, the unilateral and bilateral difference methods.
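The sketch below (again only an illustration, not part of the test) shows how the bilateral difference is typically used in a gradient check: perturb each parameter in turn, compare the numerical gradient with the analytic one, and report a relative error. The quadratic loss and its analytic gradient are assumptions made for the example, not material from the lecture.

```python
import numpy as np

def numerical_grad(loss, theta, eps=1e-7):
    """Bilateral-difference approximation of the gradient of `loss` at `theta`."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        plus, minus = theta.copy(), theta.copy()
        plus[i] += eps
        minus[i] -= eps
        grad[i] = (loss(plus) - loss(minus)) / (2 * eps)
    return grad

# Illustrative quadratic loss 0.5 * theta^T A theta, whose analytic gradient is A @ theta.
A = np.diag([1.0, 2.0, 3.0])
loss = lambda t: 0.5 * t @ A @ t
theta = np.array([1.0, -2.0, 0.5])

num = numerical_grad(loss, theta)
ana = A @ theta
rel_error = np.linalg.norm(num - ana) / (np.linalg.norm(num) + np.linalg.norm(ana))
print("relative error:", rel_error)  # roughly 1e-9 or less, so the analytic gradient passes the check
```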

Appendix B. Follow-Up Questionnaire

1.
How satisfied are you with the system overall?
(a)
Very satisfied (5 points)
(b)
Fairly satisfied (4 points)
(c)
Neutral (3 points)
(d)
Fairly dissatisfied (2 points)
(e)
Very dissatisfied (1 point)
2.
Do you find the UI design of the system simple and easy to understand?
(a)
Very satisfied (5 points)
(b)
Fairly satisfied (4 points)
(c)
Neutral (3 points)
(d)
Fairly dissatisfied (2 points)
(e)
Very dissatisfied (1 point)
3.
How would you rate the system's response speed?
(a)
Very fast (5 points)
(b)
Fairly fast (4 points)
(c)
Neutral (3 points)
(d)
Fairly slow (2 points)
(e)
Very slow (1 point)
4.
Are you satisfied with the effectiveness of the system’s video-based Q&A function?
(a)
Very satisfied (5 points)
(b)
Fairly satisfied (4 points)
(c)
Neutral (3 points)
(d)
Fairly dissatisfied (2 points)
(e)
Very dissatisfied (1 point)
5.
Please briefly explain the reason for your rating of the effectiveness of the system’s video-based Q&A function.
6.
Are you satisfied with the effectiveness of the system’s video-based MAD function (If applicable)?
(a)
Very satisfied (5 points)
(b)
Fairly satisfied (4 points)
(c)
Neutral (3 points)
(d)
Fairly dissatisfied (2 points)
(e)
Very dissatisfied (1 point)
7.
Please briefly explain the reason for your rating of the effectiveness of the system’s video-based MAD function (if applicable).
8.
What feature would you most like to see added to the current system? Please provide examples if possible.

References

  1. Aristovnik, A.; Karampelas, K.; Umek, L.; Ravšelj, D. Impact of the COVID-19 Pandemic on Online Learning in Higher Education: A Bibliometric Analysis. Front. Educ. 2023, 8, 1225834. [Google Scholar] [CrossRef]
  2. Van Alten, D.C.; Phielix, C.; Janssen, J.; Kester, L. Self-regulated Learning Support in Flipped Learning Videos Enhances Learning Outcomes. Comput. Educ. 2020, 158, 104000. [Google Scholar] [CrossRef]
  3. Moos, D.; Bonde, C. Flipping the Classroom: Embedding Self-Regulated Learning Prompts in Videos. Technol. Knowl. Learn. 2016, 21, 225–242. [Google Scholar] [CrossRef]
  4. Kim, J.; Guo, P.J.; Cai, C.J.; Li, S.W.; Gajos, K.Z.; Miller, R.C. Data-driven Interaction Techniques for Improving Navigation of Educational Videos. In Proceedings of the 27th Annual ACM Symposium on User Interface Software and Technology, Honolulu, HI, USA, 5–8 October 2014; pp. 563–572. [Google Scholar]
  5. Yang, S.; Yim, J.; Kim, J.; Shin, H.V. CatchLive: Real-time Summarization of Live Streams with Stream Content and Interaction Data. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, New Orleans, LA, USA, 29 April–5 May 2022. [Google Scholar]
  6. Pavel, A.; Reed, C.; Hartmann, B.; Agrawala, M. Video Digests: A Browsable, Skimmable Format for Informational Lecture Videos. In Proceedings of the 27th Annual ACM Symposium on User Interface Software and Technology, Honolulu, HI, USA, 5–8 October 2014; pp. 573–582. [Google Scholar] [CrossRef]
  7. Shimada, A.; Okubo, F.; Yin, C.; Ogata, H. Automatic Summarization of Lecture Slides for Enhanced Student Preview–Technical Report and User Study. IEEE Trans. Learn. Technol. 2018, 11, 165–178. [Google Scholar] [CrossRef]
  8. OpenAI. GPT-4 Technical Report. arXiv 2024, arXiv:2303.08774. [Google Scholar]
  9. Bernabei, M.; Colabianchi, S.; Falegnami, A.; Costantino, F. Students’ Use of Large Language Models in Engineering Education: A Case Study on Technology Acceptance, Perceptions, Efficacy, and Detection Chances. Comput. Educ. Artif. Intell. 2023, 5, 100172. [Google Scholar] [CrossRef]
  10. Kohnke, L.; Moorhouse, B.; Zou, D. ChatGPT for Language Teaching and Learning. RELC J. 2023, 54, 537–550. [Google Scholar] [CrossRef]
  11. Sallam, M. ChatGPT Utility in Healthcare Education, Research, and Practice: Systematic Review on the Promising Perspectives and Valid Concerns. Healthcare 2023, 11, 887. [Google Scholar] [CrossRef]
  12. Kaddour, J.; Harris, J.; Mozes, M.; Bradley, H.; Raileanu, R.; McHardy, R. Challenges and Applications of Large Language Models. arXiv 2023, arXiv:2307.10169. [Google Scholar]
  13. Bender, E.; Gebru, T.; McMillan-Major, A.; Shmitchell, S. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, Virtual Event Canada, 3–10 March 2021; pp. 610–623. [Google Scholar]
  14. Kasneci, E.; Sessler, K.; Küchemann, S.; Bannert, M.; Dementieva, D.; Fischer, F.; Gasser, U.; Groh, G.; Günnemann, S.; Hüllermeier, E.; et al. ChatGPT for Good? On Opportunities and Challenges of Large Language Models for Education. Learn. Individ. Differ. 2023, 103, 102274. [Google Scholar] [CrossRef]
  15. Sahoo, P.; Singh, A.; Saha, S.; Jain, V.; Mondal, S.; Chadha, A. A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications. arXiv 2024, arXiv:2402.07927. [Google Scholar]
  16. Kojima, T.; Gu, S.; Reid, M.; Matsuo, Y.; Iwasawa, Y. Large Language Models Are Zero-shot Reasoners. In Proceedings of the 36th International Conference on Neural Information Processing Systems, New Orleans, LA, USA, 28 November–9 December 2022. [Google Scholar]
  17. Wang, X.; Wei, J.; Schuurmans, D.; Le, Q.V.; Chi, E.H.; Narang, S.; Chowdhery, A.; Zhou, D. Self-Consistency Improves Chain of Thought Reasoning in Language Models. arXiv 2022, arXiv:2203.11171. [Google Scholar]
  18. Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.; Rocktäschel, T.; et al. Retrieval-augmented Generation for Knowledge-intensive NLP Tasks. In Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6–12 December 2020. [Google Scholar]
  19. Wang, L.; Ma, C.; Feng, X.; Zhang, Z.; Yang, H.; Zhang, J.; Chen, Z.; Tang, J.; Chen, X.; Lin, Y.; et al. A Survey on Large Language Model based Autonomous Agents. Front. Comput. Sci. 2024, 18, 3. [Google Scholar] [CrossRef]
  20. Wu, Q.; Bansal, G.; Zhang, J.; Wu, Y.; Li, B.; Zhu, E.; Jiang, L.; Zhang, X.; Zhang, S.; Liu, J.; et al. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. arXiv 2023, arXiv:2308.08155. [Google Scholar]
  21. McAuley, A.; Stewart, B.; Siemens, G.; Cornier, D. The MOOC Model for Digital Practice: Massive Open Online Courses: Digital Ways of Knowing and Learning; University of Prince Edward Island: Charlottetown, PE, Canada, 2010; Available online: https://books.google.com.hk/books?id=VIYSzgEACAAJ (accessed on 27 April 2025).
  22. Pappano, L. The Year of the MOOC. The New York Times, 13 November 2012. [Google Scholar]
  23. Yuan, L.; Powell, S. MOOCS and Open Education: Implications for Higher Education; JISC Cetis; University of Bolton: Bolton, UK, 2013. [Google Scholar]
  24. Laurillard, D. The Educational Problem that MOOCs Could Solve: Professional Development for Teachers of Disadvantaged Students. Res. Learn. Technol. 2016, 24, 29369. [Google Scholar] [CrossRef]
  25. Khan, M.; Vivek; Nabi, M.; Khojah, M.; Tahir, M. Students’ Perception towards E-Learning during COVID-19 Pandemic in India: An Empirical Study. Sustainability 2021, 13, 57. [Google Scholar] [CrossRef]
  26. Moore, R. Developing Lifelong Learning with Heutagogy: Contexts, Critiques, and Challenges. Distance Educ. 2020, 41, 381–401. [Google Scholar] [CrossRef]
  27. Suanpang, P.; Netwong, T.; Manisri, T.; Duantrakoonsi, W. The Factors Affecting Learning Outcome Intention of MOOCs for an Online Learning Platform. Psychol. Educ. 2021, 58, 3922–3930. [Google Scholar]
  28. Jansen, R.; Van Leeuwen, A.; Janssen, J.; Conijn, R.; Kester, L. Supporting Learners’ Self-regulated Learning in Massive Open Online Courses. Comput. Educ. 2020, 146, 103771. [Google Scholar] [CrossRef]
  29. Lu, Y.; Wang, B.; Lu, Y. Understanding Key Drivers of MOOC Satisfaction and Continuance Intention to Use. J. Electron. Commer. Res. 2019, 20, 105–117. [Google Scholar]
  30. Aldowah, H.; Al-Samarraie, H.; Alzahrani, A.; Alalwan, N. Factors Affecting Student Dropout in MOOCs: A Cause and Effect Decision-making Model. J. Comput. High. Educ. 2020, 32, 429–454. [Google Scholar] [CrossRef]
  31. Jordan, K. Initial Trends in Enrollment and Completion of Massive Open Online Courses. Int. Rev. Res. Open Distrib. Learn. 2014, 15, 133–160. [Google Scholar]
  32. Margaryan, A.; Bianco, M.; Littlejohn, A. Instructional Quality of Massive Open Online Courses (MOOCs). Comput. Educ. 2015, 80, 77–83. [Google Scholar] [CrossRef]
  33. Kizilcec, R.; Piech, C.; Schneider, E. Deconstructing Disengagement: Analyzing Learner Subpopulations in Massive Open Online Courses. In Proceedings of the Third International Conference on Learning Analytics and Knowledge, Leuven, Belgium, 8–13 April 2013; pp. 170–179. [Google Scholar]
  34. Hew, K.; Cheung, W. Students’ and Instructors’ Use of Massive Open Online Courses (MOOCs): Motivations and Challenges. Educ. Res. Rev. 2014, 12, 45–58. [Google Scholar] [CrossRef]
  35. Wei, W.; Liu, J.; Xu, X.; Kolletar-Zhu, K.; Zhang, Y. Effective Interactive Engagement Strategies for MOOC Forum Discussion: A Self-efficacy Perspective. PLoS ONE 2023, 18, e0293668. [Google Scholar] [CrossRef]
  36. Neha; Kim, E. Designing Effective Discussion Forum in MOOCs: Insights from Learner Perspectives. Front. Educ. 2023, 8, 1223409. [Google Scholar] [CrossRef]
  37. Galikyan, I.; Admiraal, W.; Kester, L. MOOC Discussion Forums: The Interplay of the Cognitive and the Social. Comput. Educ. 2021, 165, 104133. [Google Scholar] [CrossRef]
  38. Freitas, M.; Silva, M. Systematic Literature Review about Gamification in MOOCs. Open Learn. J. Open Distance-Learn. 2023, 38, 73–95. [Google Scholar] [CrossRef]
  39. Meng, C.; Zhao, M.; Pan, Z.; Pan, Q.; Bonk, C. Investigating the Impact of Gamification Components on Online Learners’ Engagement. Smart Learn. Environ. 2024, 11, 47. [Google Scholar] [CrossRef]
  40. Chauhan, J.; Goel, A. An Analysis of Video Lecture in MOOC. In Proceedings of the International Conference on Information and Communication Technologies in Education, Research, and Industrial Applications, Lviv, Ukraine, 14–16 May 2015; Available online: https://ceur-ws.org/Vol-1356/paper_16.pdf (accessed on 27 April 2025).
  41. Chen, Y.; Gao, Q.; Yuan, Q.; Tang, Y. Facilitating Students’ Interaction in MOOCs through Timeline-Anchored Discussion. Int. J. Human–Computer Interact. 2019, 35, 1781–1799. [Google Scholar] [CrossRef]
  42. Yu, J.; Zhang, Z.; Zhang, D.; Tu, S.; Hao, Z.; Li, R.; Li, H.; Wang, Y.; Li, H.; Gong, L.; et al. From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents. arXiv 2024, arXiv:2409.03512. [Google Scholar]
  43. Reyna, J. Digital Teaching and Learning Ecosystem (DTLE): A Theoretical Approach for Online Learning Environments. In Proceedings of the Ascilite Hobart 2011, Hobart, TAS, Australia, 4–7 December 2011; pp. 1083–1088. Available online: https://ascilite.org/conferences/hobart11/downloads/papers/Reyna-concise.pdf (accessed on 27 April 2025).
  44. Thorneycroft, S. Defining a Next-generation Ecosystem for Online Learning: From Changing the Platform to Shifting the Paradigm. In Proceedings of the 39th International Conference on Innovation, Practice and Research in the Use of Educational Technologies in Tertiary Education, Sydney, Australia, 4–7 December 2022; pp. 1–6. [Google Scholar] [CrossRef]
  45. Hsu, H.; Wang, C.; Levesque-Bristol, C. Reexamining the Impact of Self-determination Theory on Learning Outcomes in the Online Learning Environment. Educ. Inf. Technol. 2019, 24, 2159–2174. [Google Scholar] [CrossRef]
  46. Petersen, G.; Petkakis, G.; Makransky, G. A Study of How Immersion and Interactivity Drive VR Learning. Comput. Educ. 2022, 179, 104429. [Google Scholar] [CrossRef]
  47. Pedaste, M.; Mäeots, M.; Siiman, L.; Jong, T.; Riesen, S.; Kamp, E.; Manoli, C.; Zacharia, Z.; Tsourlidaki, E. Phases of Inquiry-based Learning: Definitions and The Inquiry Cycle. Educ. Res. Rev. 2015, 14, 47–61. [Google Scholar] [CrossRef]
  48. Kizilcec, R.; Pérez-Sanagustín, M.; Maldonado, J. Self-regulated Learning Strategies Predict Learner Behavior and Goal Attainment in Massive Open Online Courses. Comput. Educ. 2017, 104, 18–33. [Google Scholar] [CrossRef]
  49. Xu, Z.; Zhao, Y.; Liew, J.; Zhou, X.; Kogut, A. Synthesizing Research Evidence on Self-regulated Learning and Academic Achievement in Online and Blended Learning Environments: A Scoping Review. Educ. Res. Rev. 2023, 39, 100510. [Google Scholar] [CrossRef]
  50. Huang, C.; Han, Z.; Li, M.; Wang, X.; Zhao, W. Sentiment Evolution with Interaction Levels in Blended Learning Environments: Using Learning Analytics and Epistemic Network Analysis. Australas. J. Educ. Technol. 2021, 37, 81–95. [Google Scholar] [CrossRef]
  51. Li, Y.; Meng, N.; Ye, X.; Syed Shah, A. Self-determination and Perceived Learning in Online Learning Communities. Sci. Rep. 2024, 14, 24538. [Google Scholar] [CrossRef]
  52. Zhang, L.; Carter, R.A., Jr.; Qian, X.; Yang, S.; Rujimora, J.; Wen, S. Academia’s Responses to Crisis: A Bibliometric Analysis of Literature on Online Learning in Higher Education during COVID-19. Br. J. Educ. Technol. 2022, 53, 620–646. [Google Scholar] [CrossRef] [PubMed]
  53. Ng, D.; Leung, J.; Su, J.; Ng, R.; Chu, S. Teachers’ AI Digital Competencies and Twenty-first Century Skills in the Post-pandemic World. Educ. Technol. Res. Dev. 2023, 71, 137–161. [Google Scholar] [CrossRef]
  54. Wang, P.; Wang, F.; Li, Z. Exploring the Ecosystem of K-12 Online Learning: An Empirical Study of Impact Mechanisms in the Post-pandemic Era. Front. Psychol. 2023, 14, 1241477. [Google Scholar] [CrossRef]
  55. Ayeni, O.; Hamad, N.; Chisom, O.; Osawaru, B.; Adewusi, E. AI in Education: A Review of Personalized Learning and Educational Technology. GSC Adv. Res. Rev. 2024, 18, 261–271. [Google Scholar] [CrossRef]
  56. Wooldridge, M.; Jennings, N. Intelligent Agents: Theory and Practice. Knowl. Eng. Rev. 1995, 10, 115–152. [Google Scholar] [CrossRef]
  57. Xi, Z.; Chen, W.; Guo, X.; He, W.; Ding, Y.; Hong, B.; Zhang, M.; Wang, J.; Jin, S.; Zhou, E.; et al. The Rise and Potential of Large Language Model Based Agents: A Survey. arXiv 2023, arXiv:2309.07864. [Google Scholar]
  58. Guo, T.; Chen, X.; Wang, Y.; Chang, R.; Pei, S.; Chawla, N.; Wiest, O.; Zhang, X. Large Language Model based Multi-Agents: A Survey of Progress and Challenges. arXiv 2024, arXiv:2402.01680. [Google Scholar]
  59. Park, J.; O’Brien, J.; Cai, C.; Morris, M.; Liang, P.; Bernstein, M. Generative Agents: Interactive Simulacra of Human Behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST ’23), San Francisco, CA, USA, 29 October–1 November 2023. [Google Scholar]
  60. Zhu, X.; Chen, Y.; Tian, H.; Tao, C.; Su, W.; Yang, C.; Huang, G.; Li, B.; Lu, L.; Wang, X.; et al. Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory. arXiv 2023, arXiv:2305.17144. [Google Scholar]
  61. Fu, Y.; Peng, H.; Khot, T.; Lapata, M. Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback. arXiv 2023, arXiv:2305.10142. [Google Scholar]
  62. Extance, A. ChatGPT has Entered the Classroom: How LLMs Could Transform Education. Nature 2023, 623, 474–477. [Google Scholar] [CrossRef]
  63. Yue, M.; Mifdal, W.; Zhang, Y.; Suh, J.; Yao, Z. MathVC: An LLM-Simulated Multi-Character Virtual Classroom for Mathematics Education. arXiv 2024, arXiv:2404.06711. [Google Scholar]
  64. Zhang, Z.; Zhang-Li, D.; Yu, J.; Gong, L.; Zhou, J.; Hao, Z.; Jiang, J.; Cao, J.; Liu, H.; Liu, Z.; et al. Simulating Classroom Education with LLM-Empowered Agents. arXiv 2024, arXiv:2406.19226. [Google Scholar]
  65. Huber, S.; Kiili, K.; Nebel, S.; Ryan, R.; Sailer, M.; Ninaus, M. Leveraging the Potential of Large Language Models in Education Through Playful and Game-Based Learning. Educ. Psychol. Rev. 2024, 36, 25. [Google Scholar] [CrossRef]
  66. Baillifard, A.; Gabella, M.; Lavenex, P.; Martarelli, C. Effective Learning with a Personal AI Tutor: A Case Study. Educ. Inf. Technol. 2025, 30, 297–312. [Google Scholar] [CrossRef]
  67. Park, M.; Kim, S.; Lee, S.; Kwon, S.; Kim, K. Empowering Personalized Learning through a Conversation-based Tutoring System with Student Modeling. In Proceedings of the CHI EA ’24: Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, 11–16 May 2024; pp. 1–10. [Google Scholar]
  68. Mohamed, A. Exploring the Potential of an AI-based Chatbot (ChatGPT) in Enhancing English as a Foreign Language (EFL) Teaching: Perceptions of EFL Faculty Members. Educ. Inf. Technol. 2024, 29, 3195–3217. [Google Scholar] [CrossRef]
  69. Tlili, A.; Shehata, B.; Adarkwah, M.; Bozkurt, A.; Hickey, D.; Huang, R.; Agyemang, B. What if the Devil Is My Guardian Angel: ChatGPT as a Case Study of Using Chatbots in Education. Smart Learn. Environ. 2023, 10, 15. [Google Scholar] [CrossRef]
  70. Zhang, S.; Zhao, X.; Zhou, T.; Kim, J. Do You Have AI Dependency? The Roles of Academic Self-efficacy, Academic Stress, and Performance Expectations on Problematic AI Usage Behavior. Int. J. Educ. Technol. High. Educ. 2024, 21, 2–14. [Google Scholar] [CrossRef]
  71. Siu, O.; Lui, K.; Huang, Y.; Ng, T.; Yeung, W. An Efficient, Reliable and Valid Assessment for Affective States during Online Learning. Sci. Rep. 2024, 14, 15768. [Google Scholar] [CrossRef]
  72. Almarzouqi, A.; Aburayya, A.; Alfaisal, R.; Elbadawi, M.A.; Salloum, S.A. Ethical Implications of Using ChatGPT in Educational Environments: A Comprehensive Review. In Artificial Intelligence in Education: The Power and Dangers of ChatGPT in the Classroom; Springer Nature: Cham, Switzerland, 2024; pp. 185–199. [Google Scholar] [CrossRef]
  73. Li, K.; He, Y.; Wang, Y.; Li, Y.; Wang, W.; Luo, P.; Wang, Y.; Wang, L.; Qiao, Y. VideoChat: Chat-Centric Video Understanding. arXiv 2023, arXiv:2305.06355. [Google Scholar]
  74. Liang, T.; He, Z.; Jiao, W.; Wang, X.; Wang, Y.; Wang, R.; Yang, Y.; Shi, S.; Tu, Z. Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate. arXiv 2023, arXiv:2305.19118. [Google Scholar]
  75. Cohen, J. A Power Primer. Psychol. Bull. 1992, 112, 155–159. [Google Scholar] [CrossRef]
  76. Gonzalez, H.; Li, J.; Jin, H.; Ren, J.; Zhang, H.; Akinyele, A.; Wang, A.; Miltsakaki, E.; Baker, R.; Callison-Burch, C. Automatically Generated Summaries of Video Lectures May Enhance Students’ Learning Experience. In Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023), Toronto, ON, Canada, 13 July 2023; pp. 382–393. Available online: https://aclanthology.org/2023.bea-1.31 (accessed on 27 April 2025).
  77. Chen, W.; Lin, L.; Pang, W. Learning via Video: The Effects of Summarizing Strategies and Self-explanation Prompts. Educ. Inf. Technol. 2025, 30, 3541–3572. [Google Scholar] [CrossRef]
  78. Yang, Z.; Wang, J.; Wu, D.; Chen, X. Exploring the Impact of ChatGPT/AIGC on Education and Strategies for Response. J. East China Norm. Univ. (Educ. Sci.) 2023, 41, 26–35. [Google Scholar]
  79. Ministry of Education Singapore. Managing the Use of Artificial Intelligence (AI) Bots such as ChatGPT in Schools. 2023. Available online: www.moe.gov.sg (accessed on 27 April 2025).
  80. Hobert, S.; Berens, F. Developing a Digital Tutor as an Intermediary Between Students, Teaching Assistants, and Lecturers. Educ. Technol. Res. Dev. 2024, 72, 797–818. [Google Scholar] [CrossRef]
  81. DeepSeek-AI. DeepSeek-V3 Technical Report. arXiv 2025, arXiv:2412.19437.
Figure 1. A topic analysis on the abstracts of relevant literature from 3057 documents retrieved from Clarivate Analytics’ Web of Science. The figure was generated using Biblioshiny, a tool within the Bibliometrix package, by importing bibliographic data from Clarivate Analytics’ Web of Science.
Figure 2. Overview of the proposed research method.
Figure 3. Experimental procedure. From left to right, the diagram illustrates the feature processing, RAG workflow, and debate workflow of the AI-empowered video-based online learning system. The multi-agent debate module operates independently from the QA system while also complementing it, both of which serve the purpose of supporting students in self-learning.
Figure 4. Evaluation of results from different video understanding methods. (a) The result of VideoChat2. (b) The result of VLog. (c) The result of our video-MAD system.
Figure 5. An illustration of our multi-agent debate-assisted learning system. The user interface includes video upload, video information extraction, Q&A, and debate functionalities. After the user uploads a video, the system extracts video information, which is utilized in both the Q&A and multi-agent debate processes. The green input boxes are the places where students input questions and debate topics. The red text box is the area where the answers from the LLM, as well as the texts and conclusions of the multi-agent debate process, are displayed.
Figure 6. Score differences between pre-tests and post-tests under the three learning conditions: video only, video-based Q&A, and video-based Q&A plus multi-agent debate.
Figure 7. Time differences across the three groups under the three learning conditions: video only, video-based Q&A, and video-based Q&A plus multi-agent debate.
Figure 8. Word cloud analysis of open-ended question responses provided by students for the course titled “Numerical Approximation of Gradients”. (a) Responses in the pre-test under the video-only testing condition. (b) Responses in the post-test under the video-only testing condition. (c) Responses in the pre-test under the video-based Q&A testing condition. (d) Responses in the post-test under the video-based Q&A testing condition. (e) Responses in the pre-test under the video-based Q&A+MAD testing condition. (f) Responses in the post-test under the video-based Q&A+MAD testing condition.
Figure 9. Word cloud analysis of open-ended question responses provided by the students for the course titled “Regularization”. (a) Responses in the pre-test under the video-only testing condition. (b) Responses in the post-test under the video-only testing condition. (c) Responses in the pre-test under the video-based Q&A testing condition. (d) Responses in the post-test under the video-based Q&A testing condition. (e) Responses in the pre-test under the video-based Q&A+MAD testing condition. (f) Responses in the post-test under the video-based Q&A+MAD testing condition.
Figure 10. Word cloud analysis of the feedback provided by the students in follow-up questionnaires. (a) Evaluation of the effectiveness of the video-based Q&A function. (b) Evaluation of the effectiveness of the MAD function. (c) The function you would most like to add to the current online learning system.
Table 1. The top ten most relevant keywords were extracted from 3057 documents retrieved from Clarivate Analytics’ Web of Science using the keywords “online learning and video”, constrained to the educational research field.
Rank | Author Keywords | Articles | Keywords Plus | Articles
1 | ONLINE LEARNING | 270 | EDUCATION | 264
2 | LEARNING | 227 | STUDENTS | 231
3 | E-LEARNING | 189 | ONLINE | 186
4 | EDUCATION | 172 | PERFORMANCE | 145
5 | FLIPPED CLASSROOM | 137 | IMPACT | 141
6 | HIGHER EDUCATION | 135 | TECHNOLOGY | 131
7 | BLENDED LEARNING | 121 | VIDEO | 114
8 | ONLINE | 121 | ENGAGEMENT | 113
9 | VIDEO | 112 | MOTIVATION | 95
10 | TECHNOLOGY | 103 | PERCEPTIONS | 95
Table 2. The MAD process for an example in Andrew Ng’s lecture video on regularization (original text is in Chinese).
〈debate topic〉: Is regularization always necessary?
Affirmative Argument: As a debater for the affirmative side, I believe that in many cases, regularization is essential.
Here are several points to support this argument:
1. Preventing Overfitting: The primary goal of regularization is to prevent the model from overfitting the training data. Overfitting occurs when a model becomes too complex and overly learns the details and noise of the training data, leading to poor performance on unseen data. By introducing a regularization term, we can constrain the model parameters, thus enhancing its generalization ability.
2. Resource Optimization: Regularization helps strike a balance between model complexity and performance. By controlling the complexity of the model, we can avoid building overly complex models, which may consume excessive computational resources and time.
3. Knowledge Transfer: In transfer learning and multi-task learning, regularization promotes knowledge sharing and transfer between tasks. It prevents each task from learning independently, thus improving overall performance.
4. Algorithm Stability: Regularization can improve the stability of the algorithm, making the model less sensitive to small changes in the data and increasing its robustness.
5. Parameter Constraints: In fields such as medicine or finance, it is sometimes important for model parameters to have physical or economic significance. Regularization can help constrain these parameters within reasonable bounds.
However, this does not imply that regularization is necessary in all cases. In situations where the dataset is very large, the relationship between features and the target is simple, or where there is no strict need to limit model complexity, it may be feasible to forgo regularization. Overall, considering the risk of overfitting and resource efficiency, regularization is a valuable tool in most machine learning problems and should be regarded as an important step.
Negative Argument: As the opposing debater, I understand the affirmative side’s points, but I believe that regularization is not always necessary in every situation. Here are my arguments:
1. Data Quality: If the training data is large enough and evenly distributed, without noise or outliers, the model may generalize well without regularization. In such cases, the model can naturally learn the main patterns in the data without overfitting.
2. Simple Problems: For simple linear problems or problems with a clear structure, such as linear regression, the model might already have good generalization ability and may not need additional regularization to prevent overfitting.
3. Preprocessing and Feature Selection: Effective data preprocessing and feature selection can reduce noise and irrelevant features, thereby lowering the risk of overfitting, making regularization less necessary.
4. Non-Parametric Methods: Certain non-parametric methods, such as decision trees and random forests, have inherent regularization properties (e.g., limiting tree depth or using bagging) and may not require additional regularization.
5. Hyperparameter Tuning: By carefully adjusting hyperparameters like model complexity and learning rate, it is possible to find an appropriate model without the need for regularization. This is particularly feasible for small or medium-sized datasets.
6. Real-Time and Online Learning: In environments where data streams are continuously updated, regularization may not be required since the model constantly adapts to new data and doesn’t overfit early data points.
In conclusion, while regularization is an effective tool for improving generalization in many cases, it is not always necessary. Whether regularization is needed depends on the nature of the data, the complexity of the problem, and the available computational resources. In some specific situations, satisfactory results can be achieved without regularization.
After three rounds of debates…
Arbitration opinion: Regularization is an important tool in machine learning, but its necessity depends on the specific application scenario. In many cases, regularization can improve the generalization ability and stability of the model.
However, in other cases, it may be necessary to combine with other methods to achieve the best effect.
Arbitration reason: In the third round of debates, both debaters further deepened their understanding of the role of regularization in machine learning, emphasizing its importance and complementarity with other technologies. Both sides agreed that regularization is a valuable tool, but not the only solution in all cases. They jointly pointed out that the decision on whether and how to use regularization should be made according to specific problems, data, and model selection.
Table 3. Design of Multi-Agent Debate. We assigned different roles to the agents with the aim of creating an atmosphere of debate. The prompts used in our system are all in Chinese.
Roles | Prompt (English Translation)
Moderator | You are the moderator. Two debaters will participate in the debate. They will present their answers and discuss their respective viewpoints on the following topic: 〈debate topic〉. At the end of each round, you will evaluate the answers and determine which one is correct.
Affirmative Debater | You are the affirmative side. Please express your viewpoints.
Negative Debater | You are the negative side. You disagree with the affirmative side’s points. Provide your reasons and answer.
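To make the workflow implied by these role prompts more concrete, the following is a minimal sketch of a debate loop. The `chat` function is a stand-in for whatever LLM API the deployed system actually calls (its name, signature, and the English prompts here are assumptions made for illustration; the real prompts are in Chinese), and the three-round structure mirrors the example in Table 2.

```python
def chat(role_prompt: str, transcript: list[str]) -> str:
    """Stand-in for an LLM call; a real system would send role_prompt plus the
    running transcript to a chat model and return its reply."""
    return f"[reply of agent prompted with: {role_prompt[:30]}...]"

def multi_agent_debate(topic: str, video_context: str, rounds: int = 3) -> str:
    moderator = ("You are the moderator. Two debaters will discuss: " + topic +
                 ". Context from the lecture video: " + video_context +
                 " At the end of each round, evaluate the answers.")
    affirmative = "You are the affirmative side. Please express your viewpoints on: " + topic
    negative = "You are the negative side. You disagree with the affirmative side. Give your reasons."

    transcript: list[str] = []
    for _ in range(rounds):
        transcript.append("Affirmative: " + chat(affirmative, transcript))
        transcript.append("Negative: " + chat(negative, transcript))
        transcript.append("Moderator: " + chat(moderator, transcript))
    # The final moderator turn plays the role of the arbitration opinion and reason.
    return chat(moderator + " Summarize the debate and give a final arbitration opinion.", transcript)

# Example run with the stand-in model:
print(multi_agent_debate("Is regularization always necessary?", "summary of the regularization lecture"))
```

In this reading, the moderator's closing turn corresponds to the arbitration opinion and reason shown in Table 2, and the full transcript is what the system displays to the student for review.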
Table 4. Student performance evaluated through pre-tests and post-tests administered before and after watching the lecture videos under each of the three learning conditions: video only, video-based Q&A, and video-based Q&A plus multi-agent debate.
Condition | N | Pre-Test (μ, σ) | Post-Test (μ, σ) | Δ
V | 30 | 64.00%, 27.50 | 78.97%, 18.83 | 23.39%
V+Q&A | 31 | 63.87%, 22.76 | 73.26%, 22.00 | 14.70%
V+Q&A+MAD | 29 | 64.14%, 25.29 | 83.24%, 20.88 | 29.78%
Note: Δ is the relative improvement of the post-test mean over the pre-test mean.
Table 5. Ratings in the follow-up questionnaire.
General Satisfaction | Easy to Use | System Response Speed | Eval of Q&A | Eval of MAD
4.31 | 4.43 | 3.96 | 4.13 | 4.34
