Article

Effects of Pedagogical Agent-Generated Summaries on Video-Based Learning: Evidence from Eye-Tracking and EEG

by Lei Yuan 1,2, Jiyuan Xu 2 and Zehui Zhan 3,*
1 Key Laboratory of Education Blockchain and Intelligent Technology of Ministry of Education, Guilin 541004, China
2 Faculty of Education, Guangxi Normal University, Guilin 541004, China
3 School of Information Technology in Education, South China Normal University, Guangzhou 510631, China
* Author to whom correspondence should be addressed.
Educ. Sci. 2026, 16(1), 39; https://doi.org/10.3390/educsci16010039 (registering DOI)
Submission received: 28 September 2025 / Revised: 11 December 2025 / Accepted: 26 December 2025 / Published: 29 December 2025

Abstract

As an emerging learning support technology, large language model-powered pedagogical agents demonstrate significant potential in enhancing video learning effectiveness, yet the underlying cognitive mechanisms remain inadequately elucidated. This study employed a multimodal approach combining EEG and eye-tracking to investigate the effects of AI-generated mind maps and text summaries on learning performance and cognitive processing. Education majors were randomly assigned to three groups; following data screening, 80 valid datasets were retained: mind map summary (PA-MMS, n = 27), text summary (PA-TS, n = 28), and control (NPA, n = 25). Results showed that both experimental groups achieved significantly higher posttest scores than controls, with PA-MMS demonstrating the strongest performance (d = 3.78). EEG evidence indicated that pedagogical agents reduced Theta activity (decreased working memory load), while PA-MMS enhanced Alpha activity (superior attention control). Eye-tracking revealed differentiated strategies: PA-MMS exhibited networked fixation patterns facilitating integration, whereas PA-TS demonstrated linear scanning. Delayed testing showed PA-MMS achieved the highest retention (96.8%). Correlation analyses confirmed that posttest scores correlated negatively with Theta (r = −0.46) and extraneous load (r = −0.61) and positively with germane load (r = 0.54). Mind maps reduced extraneous load (d = 1.26) while enhancing germane processing (d = 1.15), representing a shift from static scaffolds to AI-mediated generative support.

1. Introduction

With the rapid development of digital technology, online video learning has become an integral component of education. Research indicates that well-designed online learning can achieve superior outcomes compared to traditional face-to-face instruction (Means et al., 2013). However, traditional video-based learning often faces challenges such as information overload, difficulty in grasping key points, and low learning efficiency. In recent years, generative artificial intelligence technologies, exemplified by ChatGPT, have brought new opportunities for addressing these challenges.
Pedagogical Agents (PAs), also known as instructional agents, are intelligent learning support systems built on artificial intelligence technology. Unlike traditional rule-based summarization that relies on predetermined templates, modern pedagogical agents such as the one employed in this study (Doubao) utilize large language model architecture to perform context-aware content analysis, dynamically extracting conceptual hierarchies and generating adaptive knowledge structures tailored to specific instructional material (Kasneci et al., 2023; Baidoo-Anu & Ansah, 2023). This represents a paradigm shift from static cognitive scaffolding to generative cognitive support. These pedagogical agents can understand, analyze, and summarize video content, generate structured summaries in real-time, and present core knowledge points through mind maps or text summaries, providing cognitive scaffolding for learners (Johnson & Lester, 2016). As an emerging learning support technology, generative AI-powered pedagogical agents have demonstrated significant potential in enhancing learning outcomes and reducing cognitive load (Armando et al., 2022).
Although pedagogical agents are increasingly prevalent in educational practice, the underlying mechanisms of their impact on the learning process remain to be systematically investigated. Existing research primarily relies on behavioral data and subjective reports to assess learning outcomes, lacking direct measurement of physiological indicators (Mutlu-Bayraktar et al., 2019). Single-method approaches provide only partial insights into the complex cognitive mechanisms underlying AI-assisted learning. Cognitive Load Theory posits that cognitive resources during learning are limited, and effective instructional design should optimize the allocation of cognitive resources (Sweller et al., 2011). However, whether the supplementary content generated by pedagogical agents genuinely reduces cognitive load and the differential effects of various presentation formats on cognitive processing still require more objective evidence through integrated measurement of behavioral, subjective, and neurophysiological indices.
Electroencephalography (EEG) technology, as a non-invasive neurophysiological measurement method, can provide objective indicators of cognitive states during learning (Antonenko et al., 2010). Brain wave activity in different frequency bands is closely associated with specific cognitive functions, providing neurophysiological evidence for assessing cognitive load, attention levels, and depth of information processing (Klimesch, 1999). Meanwhile, eye-tracking technology can precisely record learners’ visual attention allocation patterns, reflecting information acquisition strategies (Lai et al., 2013). The convergence of these behavioral and neurophysiological measures enables triangulated validation of theoretical assumptions about pedagogical agent effects. Combining these two technologies synchronously enables a comprehensive understanding of the cognitive mechanisms of pedagogical agent-assisted learning from both behavioral and neural perspectives.
Based on the aforementioned background, this study aims to investigate the following core questions: First, can pedagogical agent assistance effectively enhance video learning performance, and is this improvement accompanied by optimization of cognitive load? Second, do different forms of pedagogical agent-generated supplementary content differ in promoting learning and regulating cognitive load? Third, how are learners’ visual attention allocation patterns influenced by pedagogical agent assistance, and what is the relationship between this influence and learning outcomes? Fourth, from a neurophysiological perspective, what are the characteristics of EEG activity during pedagogical agent-assisted learning processes?
This study employs a multimodal data collection approach, synchronously recording eye-tracking and EEG data while learners watch instructional videos. The research establishes three experimental conditions: pedagogical agent-generated mind map summary assistance, pedagogical agent-generated text summary assistance, and no pedagogical agent assistance (control group). By comparing learning performance, cognitive load, eye movement patterns, and EEG characteristics across different conditions, this study aims to reveal the cognitive-neural mechanisms and effects of pedagogical agent-assisted video learning through triangulated evidence from behavioral, subjective, and neurophysiological levels, providing theoretical foundations and empirical support for the design and application of intelligent educational technologies.

2. Literature Review

2.1. Cognitive Load Theory’s Guidance for Online Video Learning Design

Cognitive Load Theory posits that during the learning process, learners must process new information in working memory and integrate it into knowledge structures in long-term memory. When cognitive processing demands exceed working memory capacity, cognitive overload occurs, impairing learning outcomes (Sweller et al., 2011). Understanding the composition and characteristics of cognitive load provides the theoretical foundation for optimizing video learning design.
To better understand and measure cognitive load during learning, Leppink et al. (2013) developed a three-dimensional model of cognitive load. Intrinsic Cognitive Load (ICL) stems from the inherent complexity of the learning task itself, determined by the number of elements in the learning material and their degree of interaction. Extraneous Cognitive Load (ECL) is caused by inappropriate information presentation methods, such as confusing layouts or redundant information, which contribute nothing to learning. Germane Cognitive Load (GCL) represents meaningful processing for schema construction, involving learners’ cognitive efforts to actively integrate new and existing knowledge. Effective instructional design should manage ICL while minimizing ECL and optimizing GCL, providing clear direction for video learning design.
Building on the three-dimensional cognitive load model, the Cognitive Theory of Multimedia Learning further elaborates on managing different types of cognitive load in video learning. Mayer (2017) systematically summarized design principles for reducing ECL. The spatial contiguity principle requires placing text adjacent to corresponding graphics to reduce visual search burden. The temporal contiguity principle emphasizes that narration and corresponding graphics should be presented simultaneously to avoid maintaining information in working memory while awaiting integration. The redundancy principle indicates that when graphics are accompanied by narration, simultaneously presenting identical on-screen text creates unnecessary cognitive processing. Conversely, attention-guiding strategies that direct learners’ attention to key information through visual signals or auditory cues can optimize cognitive resource allocation.
Beyond multimedia design principles, information organization is crucial for cognitive load management. Based on Cognitive Load Theory, Leutner et al. (2009) found that when learners face high cognitive load, they adopt load-reduction strategies, reducing processing of textual details to focus on core concepts and key information. Among these, extracting complex concepts, segmenting information, and employing visual hierarchical structures are effective load-reduction methods that can substantially decrease learners’ cognitive burden and enhance learning outcomes. For instance, Zhang et al. (2006) found that compared to traditional linear text, online video materials with visual structures more effectively guide learners in organizing and integrating knowledge. Furthermore, Schneider and Preckel’s (2017) research demonstrated that online video courses employing hierarchical information presentation significantly improved learning outcomes while reducing learners’ cognitive burden.
The transient information effect in video learning is another significant factor contributing to increased learner cognitive load. A. Wong et al. (2012) experimentally demonstrated that video content presented dynamically possesses transient characteristics, requiring learners to simultaneously retain past and current information in working memory, readily leading to cognitive overload. This effect is more pronounced when presenting lengthy segments or complex information, making it difficult for learners to effectively process and retain knowledge. Leahy and Sweller (2011) further found that when auditory information is lengthy or complex, its transience significantly increases working memory burden, as learners cannot repeatedly review content as they would with static text. Singh et al.’s (2012) research indicated that providing fixed reference materials during video learning can effectively reduce the cognitive burden of processing transient information, particularly when dealing with complex content.
From the three-dimensional cognitive load model to multimedia learning principles, from transient information effects to load-reduction strategies, these theoretical and empirical findings collectively constitute the knowledge foundation for optimizing video learning. However, traditional design principles primarily rely on static, predetermined optimization strategies, struggling to accommodate individual learner differences and dynamically changing cognitive states. How to achieve more intelligent and personalized cognitive load management has become a new challenge facing current video learning research. This necessitates introducing new technological approaches, particularly pedagogical agents capable of real-time learning state perception and dynamic support strategy adjustment, to transcend the limitations of traditional design.

2.2. Forms and Effects of Pedagogical Agent-Assisted Video Learning

Pedagogical agents demonstrate complex and powerful cognitive support functions in video learning. According to Wooldridge and Jennings’s (1995) classic definition, an agent is essentially a computational entity capable of perceiving its environment, reasoning autonomously, and executing goal-directed behaviors. Recent advancements in generative artificial intelligence have fundamentally transformed the capabilities of pedagogical agents. Unlike traditional rule-based systems that rely on predetermined templates and fixed algorithms, modern pedagogical agents powered by large language models (LLMs) can perform context-aware content analysis and generate adaptive knowledge structures tailored to specific instructional materials (Kasneci et al., 2023). This represents a paradigm shift from static cognitive scaffolding to generative cognitive support, where the pedagogical agent acts not merely as a content presenter but as an intelligent cognitive partner capable of real-time adaptation (Baidoo-Anu & Ansah, 2023). In educational contexts, these new-generation agents are viewed as systems with high cognitive adaptability, whose core value lies in providing dynamic, precise, and personalized learning support for learners. The new generation of pedagogical agents based on large language models achieves environmental perception, cognitive model reconstruction, and dynamic adjustment of learning strategies through nonlinear interaction, self-organization, and emergence mechanisms, exhibiting adaptive and evolutionary characteristics similar to living organisms.
In terms of functional dimensions, pedagogical agents play multiple roles in video learning. Contemporary LLM-powered pedagogical agents demonstrate unprecedented capabilities in understanding and processing educational video content. For instance, ChatGPT and similar tools have shown the capacity to dynamically extract conceptual hierarchies from multimodal instructional materials and create personalized summaries that align with individual learners’ cognitive needs (Sullivan et al., 2023). These systems can analyze both verbal and visual information streams in videos, identifying key concepts and their relationships to generate structured knowledge representations (Dai et al., 2023). Ma et al. (2024) developed the HypoCompass agent based on GPT-4, which significantly enhanced students’ programming skills and learning engagement by providing real-time programming guidance and conceptual explanations. Liu et al. (2024) designed the CoQuest agent, combining GPT-3.5, the ReAct reasoning framework, and academic databases, demonstrating unique advantages in assisting students with research question formulation and complex concept comprehension. Empirical evidence suggests that LLM-based agents can reduce cognitive load by adaptively adjusting the granularity and complexity of generated summaries based on learner characteristics and content difficulty (AlShaikh et al., 2024). These agents not only answer questions and provide explanations but also adjust support strategies based on learning progress, achieving a transformation from static information delivery to dynamic cognitive support.
The cognitive support mechanisms of agents embody a modern interpretation of Vygotsky’s Zone of Proximal Development theory. Pedagogical agents can be viewed as dynamic cognitive scaffolds, supporting learners’ cognitive development through precise prompting. Scheiter and Eitel (2015) indicated that cueing elements in educational videos, such as underlined text, arrows, headings, and explanatory text, can effectively reduce learners’ cognitive load and guide attention allocation. Jamet (2014) validated this cueing effect through eye-tracking research, confirming that incorporating appropriate cues in educational videos significantly enhances learning outcomes. Agents transform traditional static design into dynamic adaptive support by automatically generating and presenting these cues.
In specific applications of video learning, agent-generated summaries have become a key functionality. Rahman et al. (2024) developed an AI-driven video summarization system capable of reorganizing video content into visual and textual summaries. Experimental evaluations showed that AI-generated keyword summaries and visual summaries demonstrated superior performance in both accuracy and relevance, significantly outperforming random selection and traditional clustering algorithms. Recent studies further reveal that generative AI can produce hierarchically organized summaries that match learners’ prior knowledge levels, potentially reducing cognitive overload in complex learning materials (Chromik et al., 2019). The temporal and multimodal nature of video content particularly demands flexible summarization strategies that can capture both visual and verbal information streams—a capability where LLM-based agents demonstrate distinct advantages over conventional static summarization methods (Mangaroska et al., 2022).
However, the effects of agent assistance present complexity and context-dependency. Hao and Cukurova (2023) employed the Wizard of Oz research method to investigate the impact of AI-generated discussion summaries, finding differential effects of automatically generated summaries across different learner groups. Particularly for students with lower engagement levels, merely providing summaries did not significantly increase their interaction levels. The research also indicated that the timing of summary presentation is a critical factor affecting outcomes; timely summaries can serve as participation reminders, promoting learners’ active engagement. L. H. Wong and Viberg’s (2024) research further confirmed that agent assistance effects are jointly moderated by learner characteristics, task nature, and learning phases.
Regarding cognitive load management, agents demonstrate unique advantages. By comparing human-human and human-machine collaboration modes, Li et al. (2024) found that large language models such as ChatGPT excel in improving task efficiency, cultivating systematic thinking, and reducing cognitive load. Agents can decompose complex information into comprehensible segments, provide hierarchical explanations, and adjust information density based on learners’ comprehension levels. A series of studies by Yilmaz and Karaoglan Yilmaz (2023) confirmed that generative AI tools have positive effects on students’ computational thinking abilities, programming self-efficacy, and learning motivation, with these improvements closely related to effective cognitive load management.
Pedagogical agents provide support for video learning through various forms, including cognitive scaffolding and intelligent summarization. However, existing research lacks in-depth exploration of how generative AI-powered agents systematically manage cognitive load in video learning, particularly the coordination mechanisms between dynamically generated summary structures and transient information characteristics. Moreover, while behavioral assessments can measure learning outcomes, they cannot reveal how learners cognitively process adaptively generated content in real-time. The integration of neurophysiological measures such as eye-tracking and EEG becomes particularly crucial when evaluating LLM-generated instructional materials, as these tools can detect moment-to-moment shifts in attention allocation and cognitive engagement. Therefore, investigating the impact mechanisms of agent-assisted strategies on cognitive load through multimodal behavioral and neurophysiological evidence holds significant importance.

3. Research Questions and Hypotheses

This study employs the presentation format of video content summaries generated by pedagogical agents as the independent variable, systematically examining the effectiveness of pedagogical agent assistance in online video learning through experimental methods. Three groups (two experimental and one control) are established:
  • Pedagogical Agent-Generated Mind Map Summary Group (PA-MMS): Presenting knowledge relationships in a graphical format;
  • Pedagogical Agent-Generated Text Summary Group (PA-TS): Presenting core knowledge points following principles of conciseness;
  • Non-Pedagogical Agent Summary Generation Group (NPA): Presenting only original video content.
Based on Cognitive Load Theory and Cognitive Theory of Multimedia Learning, this study aims to explore the impact mechanisms of pedagogical agent assistance on video learning. Existing research indicates that, in high cognitive load scenarios, learners tend to offload task demands to intelligent systems to reduce cognitive load (Wahn et al., 2023). Mind maps, as dual-channel cueing tools, can present knowledge relationships through graphical means, helping learners establish conceptual connections (Novak, 2010). Based on this foundation, the following research questions and hypotheses are proposed:
RQ1: Can pedagogical agent assistance effectively enhance video learning performance, and is this enhancement accompanied by optimization of cognitive load?
H1. 
Summaries generated by pedagogical agents will significantly affect learning performance and cognitive load.
H1a. 
Learning performance in the PA-MMS and PA-TS groups will be significantly higher than in the NPA group.
H1b. 
Cognitive load in the PA-MMS and PA-TS groups will be significantly lower than in the NPA group.
H1c. 
Improvements in learning performance will be associated with reductions in cognitive load; that is, learning performance will be negatively correlated with cognitive load.
RQ2: Do different forms of pedagogical agent-assisted content exhibit differences in promoting learning and regulating cognitive load?
H2. 
Mind maps and text summaries will differ in learning effectiveness and cognitive load regulation.
H2a. 
Learning performance in the PA-MMS group will be superior to that in the PA-TS group.
H2b. 
Cognitive load in the PA-MMS group will be lower than in the PA-TS group.
H2c. 
Mind maps will be more effective in reducing extraneous cognitive load, while text summaries may induce higher cognitive processing intensity.
RQ3: How are learners’ visual attention allocation patterns influenced by pedagogical agent assistance, and what is the relationship between this influence and learning outcomes?
H3. 
Pedagogical agent assistance will alter attention allocation patterns and correlate with learning outcomes.
H3a. 
Fixation distributions in the PA-MMS and PA-TS groups will be more concentrated on key information areas.
H3b. 
The PA-MMS group will exhibit network-like fixation distribution patterns, while the PA-TS group will display linear fixation scanning patterns.
RQ4: From a neurophysiological perspective, what are the characteristics of EEG activity during pedagogical agent-assisted learning processes?
H4. 
Pedagogical agent assistance will induce specific changes in EEG activity.
H4a. 
Theta wave power in the PA-MMS and PA-TS groups will be lower than in the NPA group, reflecting reduced cognitive load.
H4b. 
Alpha wave suppression in the PA-MMS group will be greater than in the PA-TS group, reflecting stronger visuospatial processing.

4. Method

4.1. Participants

This study randomly recruited 88 education major students from the Faculty of Education at G Normal University, including both undergraduate and graduate students. The sample comprised 34 males (38.6%) and 54 females (61.4%), with a mean age of 21.4 years (SD = 2.3).
The 88 participants were allocated to three groups through seeded complete randomization: 30 participants in the Pedagogical Agent-Generated Mind Map Summary group (PA-MMS group), 29 participants in the Pedagogical Agent-Generated Text Summary group (PA-TS group), and 29 participants in the Non-Pedagogical Agent Summary Generation group (NPA group). All participants were proficient in computer operations, possessed online learning experience, had normal hearing, and had normal or corrected-to-normal vision. Additionally, none of the participants had previously systematically studied the Soil Mechanics course used as the test material.
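The allocation step can be illustrated with a minimal sketch, assuming that "seeded complete randomization" means shuffling the participant list with a fixed random seed and then cutting it into groups of predetermined size. The participant IDs, the seed value, and the function name are hypothetical; only the group sizes (30, 29, 29) come from the text.

```python
import random


def complete_randomization(participant_ids, group_sizes, seed):
    """Shuffle IDs with a fixed seed, then cut the list into fixed-size groups."""
    if sum(group_sizes.values()) != len(participant_ids):
        raise ValueError("group sizes must sum to the number of participants")
    rng = random.Random(seed)  # local generator so the allocation is reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    allocation = {}
    start = 0
    for group, size in group_sizes.items():
        for pid in shuffled[start:start + size]:
            allocation[pid] = group
        start += size
    return allocation


# Hypothetical IDs and seed; group sizes are those reported above.
ids = [f"P{i:02d}" for i in range(1, 89)]
sizes = {"PA-MMS": 30, "PA-TS": 29, "NPA": 29}
allocation = complete_randomization(ids, sizes, seed=2025)
```

Fixing the seed makes the allocation auditable: rerunning the function with the same inputs returns the same assignment.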
Pretest results revealed that the three groups’ mean score on the soil mechanics basic concepts test (15 items) was 3.80 (SD = 1.81), with an average accuracy rate of 25.3%, which was not significantly different from the theoretical chance level of 25% for four-option random guessing (t(87) = 0.17, p = 0.87, d = 0.02). Between-group comparisons showed no significant differences in prior knowledge levels across groups (F(2, 85) = 0.06, p = 0.94, η² < 0.001), balanced gender distribution (χ² = 0.31, p = 0.86, Cramér’s V = 0.04), and similar age composition (F(2, 85) = 0.04, p = 0.96, η² < 0.001), indicating good baseline homogeneity among the three groups and satisfying the initial equivalence premise for the experiment.

4.2. Measurement Indicators

To investigate the effects of different presentation formats of video summaries generated by pedagogical agents on learning, this study constructed a multidimensional measurement system encompassing behavioral, subjective, and physiological levels. Learning performance, cognitive load, eye-tracking indicators, and EEG indicators were selected as dependent variables, aiming to comprehensively and deeply capture the cognitive mechanisms during the learning process.
The first category of dependent variable was learning performance. In this study, we designed a knowledge test questionnaire comprising 15 multiple-choice questions, all derived from the “Soil Mechanics” video course content, covering core knowledge points such as physical properties and engineering classification of soil. Each question was worth 1 point, with a maximum score of 15 points. The questionnaire was used for pretests, posttests, and retention tests, with option orders randomized to avoid memory effects. The questionnaire design was validated by domain experts and pilot-tested with 40 students to ensure appropriate difficulty and good discrimination. The final item difficulty coefficients ranged from 0.43 to 0.72, and discrimination coefficients ranged from 0.32 to 0.58, indicating good psychometric properties of the test. By comparing test score changes before and after learning, this study could scientifically evaluate the differential impacts of various pedagogical agent summary presentation formats on knowledge acquisition, comprehension, and retention.
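The item statistics reported above can be computed as in the following sketch. It assumes classical test theory definitions: difficulty as the proportion of correct answers, and discrimination as the difference in that proportion between the top and bottom 27% of scorers. The text does not state which discrimination index was used, so the upper-lower method, the function name, and the toy data are assumptions.

```python
def item_statistics(responses, tail_fraction=0.27):
    """responses: one list of 0/1 item scores per student (1 = correct).

    Returns (difficulty, discrimination), each a list with one value per item.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    # Rank students by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(n_students * tail_fraction))
    upper, lower = ranked[:k], ranked[-k:]
    difficulty, discrimination = [], []
    for j in range(n_items):
        p_all = sum(r[j] for r in responses) / n_students
        p_upper = sum(r[j] for r in upper) / k
        p_lower = sum(r[j] for r in lower) / k
        difficulty.append(p_all)
        discrimination.append(p_upper - p_lower)
    return difficulty, discrimination


# Toy data: 4 hypothetical students x 3 items, not the study's pilot data.
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
difficulty, discrimination = item_statistics(toy, tail_fraction=0.25)
```

Under this convention, higher difficulty values indicate easier items, so the reported range of 0.43 to 0.72 would correspond to items of moderate difficulty.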
The second category of dependent variable was cognitive load. This study employed a specifically developed Video Learning Cognitive Load Assessment Scale, constructed based on the three-dimensional model of cognitive load and the Cognitive Theory of Multimedia Learning. The scale contained 20 items using a 5-point Likert rating method and measured five dimensions: Intrinsic Cognitive Load (ICL), reflecting the inherent complexity of the learning content itself; Extraneous Cognitive Load (ECL), reflecting additional cognitive burden from information presentation methods; Germane Cognitive Load (GCL), reflecting beneficial cognitive investment in knowledge schema construction and integration processes; Attention Allocation (ATT), assessing learners’ subjective gaze distribution patterns across different video regions; and Transient Information Pressure (TIME), measuring perceived temporal pressure from the fleeting nature of video information. Through comprehensive measurement of these five dimensions, this study could precisely identify the specific impact mechanisms of different pedagogical agent summary formats on learners’ cognitive resource allocation.
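Scoring of such a scale can be sketched as below. The text does not give the item-to-dimension mapping, so the sketch assumes that the 20 items are split evenly (four consecutive items per dimension) and that each dimension score is the mean rating of its items. Both are illustrative assumptions, as are the names used.

```python
DIMENSIONS = ["ICL", "ECL", "GCL", "ATT", "TIME"]
ITEMS_PER_DIMENSION = 4  # assumption: 20 items split evenly; the item map is not reported


def score_scale(ratings):
    """ratings: 20 integers from 1 to 5 in questionnaire order.

    Returns the mean rating per dimension.
    """
    expected = len(DIMENSIONS) * ITEMS_PER_DIMENSION
    if len(ratings) != expected:
        raise ValueError(f"expected {expected} ratings, got {len(ratings)}")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ratings must be integers on a 5-point scale")
    scores = {}
    for i, dimension in enumerate(DIMENSIONS):
        start = i * ITEMS_PER_DIMENSION
        block = ratings[start:start + ITEMS_PER_DIMENSION]
        scores[dimension] = sum(block) / len(block)
    return scores


# Hypothetical response sheet for one participant.
sheet = [2, 2, 2, 2, 1, 1, 1, 1, 5, 5, 5, 5, 4, 4, 4, 4, 3, 3, 3, 3]
scores = score_scale(sheet)
```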
The third category of dependent variable was eye-tracking indicators. Eye movement data were recorded with sub-pixel accuracy at a sampling rate of 1000 Hz. Among various eye-tracking metrics, this study focused on fixation duration as a core indicator, representing the cumulative dwell time of gaze within specific Areas of Interest. Fixation duration is widely recognized as a reliable indicator of cognitive processing depth, with longer fixation durations typically indicating more thorough and detailed information processing of content in that area (Szarkowska et al., 2024). By comparing the distribution of learners’ fixation durations across different Areas of Interest, this study could reveal how pedagogical agent summaries influence learners’ visual attention allocation strategies and the intrinsic relationship between this influence and learning outcomes.
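Cumulative dwell time per Area of Interest can be computed as in the following sketch, assuming rectangular, non-overlapping AOIs and fixations already detected by the eye tracker's software. The AOI coordinates (video on the left, summary on the right of a 1920 × 1080 display) and the fixation values are hypothetical, chosen only to mirror the interface layout described in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AOI:
    """Axis-aligned rectangular Area of Interest in screen pixels."""
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x, y):
        return self.left <= x < self.right and self.top <= y < self.bottom


def dwell_time_by_aoi(fixations, aois):
    """fixations: iterable of (x, y, duration_ms). Returns total ms per AOI name."""
    totals = {aoi.name: 0.0 for aoi in aois}
    for x, y, duration_ms in fixations:
        for aoi in aois:
            if aoi.contains(x, y):
                totals[aoi.name] += duration_ms
                break  # AOIs are assumed not to overlap
    return totals


# Hypothetical layout and fixations; the last fixation falls outside every AOI.
aois = [AOI("video", 0, 0, 1280, 1080), AOI("summary", 1280, 0, 1920, 1080)]
fixations = [(400, 500, 250), (1500, 300, 180), (1600, 700, 320), (2000, 100, 90)]
totals = dwell_time_by_aoi(fixations, aois)
```

Fixations that land outside all AOIs are ignored, which keeps off-screen samples from inflating either region's total.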
The fourth category of dependent variable was EEG indicators. This study used portable EEG equipment to collect real-time neurophysiological signals during the learning process, analyzing power ratios in different frequency bands and cognitive state indicators. EEG frequency band analysis included: Delta waves (0.5–4 Hz), reflecting the brain’s relaxation or sleep state; Theta waves (4–8 Hz), closely associated with memory encoding and information integration; Alpha waves (8–12 Hz), reflecting attention regulation and cognitive inhibition; Beta waves (12–30 Hz), related to active cognitive processing and alertness; and Gamma waves (30–70 Hz), associated with higher-order cognitive functions and consciousness integration (Ursutiu et al., 2018). By analyzing between-group differences in these EEG indicators, this study could reveal the regulatory mechanisms of different pedagogical agent assistance methods on cognitive load from a neurophysiological perspective, providing objective physiological evidence for understanding their cognitive impacts.
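The band power ratios can be illustrated with a minimal sketch that uses a plain discrete Fourier transform and the band limits listed above. The sampling rate, the synthetic signal, the half-open band boundaries, and the function name are assumptions. The text does not describe the device's processing pipeline, and practical analyses typically add filtering, artifact rejection, and a windowed estimator such as Welch's method.

```python
import cmath
import math

# Band limits in Hz as listed in the text; intervals are treated as [low, high).
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 12.0),
    "beta": (12.0, 30.0),
    "gamma": (30.0, 70.0),
}


def relative_band_power(signal, sampling_rate_hz):
    """Return each band's share of total 0.5-70 Hz power from a DFT periodogram."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the DC offset
    power = {name: 0.0 for name in BANDS}
    for k in range(1, n // 2 + 1):
        freq = k * sampling_rate_hz / n
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        for name, (low, high) in BANDS.items():
            if low <= freq < high:
                power[name] += abs(coeff) ** 2
                break
    total = sum(power.values())
    return {name: (p / total if total > 0 else 0.0) for name, p in power.items()}


# Synthetic 2 s signal at an assumed 256 Hz: a 10 Hz (alpha) component with
# amplitude 2 plus a 6 Hz (theta) component with amplitude 1.
fs = 256
signal = [
    2.0 * math.sin(2 * math.pi * 10 * t / fs) + 1.0 * math.sin(2 * math.pi * 6 * t / fs)
    for t in range(2 * fs)
]
ratios = relative_band_power(signal, fs)
```

Because power scales with the square of amplitude, the alpha component in this synthetic signal carries four times the power of the theta component, giving ratios of about 0.8 and 0.2.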

4.3. Experimental Procedure

4.3.1. Experimental Materials and Pedagogical Agent System Setup

To control for the potential influence of participants’ prior knowledge on experimental results, this study selected the chapter “Physical Properties and Engineering Classification of Soil” from the national quality course “Soil Mechanics” on the China University MOOC platform as the uniform experimental material. The video was 9 min and 9 s in length, in MP4 format with a resolution of 1920 × 1080, covering core knowledge points including the three-phase composition of soil, physical property indicators, and engineering classification systems.
This study employed “Doubao” as the pedagogical agent, which automatically generated two forms of summaries based on the video content: mind map summaries and text summaries. The content accuracy and completeness of both summary types were validated through independent assessment by three subject matter experts to ensure information equivalence. In terms of interface design, both summary types occupied the same display area and were presented in real-time on the right side of the video playback interface (Figure 1).
Based on the above materials, this study established three experimental conditions (Figure 2):
  • PA-MMS Group: While participants watched the video, the right side of the interface synchronously displayed the pedagogical agent-generated mind map summary, visualizing logical relationships between knowledge points through hierarchical structures and node connections (Figure 2a).
  • PA-TS Group: While participants watched the video, the right side of the interface synchronously displayed the pedagogical agent-generated text summary, presenting the same knowledge points in a linear paragraph format (Figure 2b).
  • NPA Group: Participants only watched the video content, with the right side of the interface remaining blank, providing no supplementary summary.

4.3.2. Experimental Implementation and Data Collection

Before the experiment began, participants logged into the online experimental platform and read the experimental instructions to understand the task requirements. Once they entered the video learning interface, the system automatically played the “Soil Mechanics” course video; fast-forward, rewind, and pause functions were disabled to ensure the same learning duration for all participants. Immediately after the video ended, participants entered the testing phase, in which they completed the cognitive load assessment scale and the knowledge test items. The experiment followed a linear progression: participants could not return to modify completed content, which kept the experimental conditions strictly controlled. To assess long-term knowledge retention, a delayed test questionnaire was distributed to participants one week after the experiment. The entire process was automatically recorded and managed by the system, ensuring standardized and objective data collection (Figure 3).

5. Results

Following data screening, the present study retained 80 valid datasets (PA-MMS group: n = 27; PA-TS group: n = 28; NPA group: n = 25). To control for Type I error inflation due to multiple comparisons, all primary indices were subjected to Benjamini–Hochberg false discovery rate (FDR) correction at α = 0.05. The analysis encompassed 17 core indices across four dimensions: learning performance (6 indices), cognitive load (5 indices), EEG activity (4 indices), and eye-tracking behavior (2 indices). Results for each dimension are presented in the corresponding tables.
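For readers unfamiliar with the correction, the Benjamini–Hochberg procedure can be sketched as follows. This is an illustrative implementation rather than the software actually used for the analysis; the function name `benjamini_hochberg` and the four example p-values are hypothetical and are not drawn from the study’s tables.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted q-values in the same order as the input p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each q-value is the minimum
    # of p * m / rank over its own rank and all higher ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical p-values for four indices (not taken from the study).
p_example = [0.01, 0.04, 0.03, 0.005]
q_example = benjamini_hochberg(p_example)
significant = [q <= 0.05 for q in q_example]
```

An index is declared significant when its q-value does not exceed the chosen α (0.05 here). Applied to the 17 indices described above, `m` would equal 17.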

5.1. Learning Performance

One-way ANOVA revealed significant differences among the three groups across all six learning performance indices (all qFDR < 0.01), with detailed statistics presented in Table 1.
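As a reminder of what the one-way ANOVA statistic measures, the sketch below computes F from between-group and within-group sums of squares. The function name `one_way_anova` and the three small score lists are hypothetical and serve only to illustrate the calculation; they are not the study’s data.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of per-group score lists."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores for three small groups (illustration only).
f_stat, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

With three groups and 80 participants, the degrees of freedom are 2 and 77, matching the F(2, 77) values reported throughout this section. Obtaining the p-value additionally requires the F distribution, which is left to a statistics package.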
Regarding immediate learning outcomes, both experimental groups equipped with pedagogical agents significantly outperformed the control group on posttest scores (F(2, 77) = 42.37, p < 0.001, η2 = 0.524), with the PA-MMS group demonstrating the highest performance. The contrast effect size between the PA-MMS and NPA groups reached a large effect level (d = 3.78 [3.05, 4.51]). Between-group differences in learning gains exhibited a similar pattern (η2 = 0.489), with the PA-MMS group significantly surpassing the PA-TS group (p < 0.05).
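For a one-way design, η2 is the between-group sum of squares divided by the total sum of squares, which can be rewritten in terms of the F statistic and its degrees of freedom as η2 = (F × df_between) / (F × df_between + df_within). The short sketch below applies this identity to F values reported in Section 5 as a consistency check; the function name `eta_squared_from_f` is introduced here for illustration only.

```python
def eta_squared_from_f(f_stat, df_between, df_within):
    """Recover eta squared from a one-way ANOVA F statistic."""
    return (f_stat * df_between) / (f_stat * df_between + df_within)


# (F, reported eta squared) pairs quoted in Section 5, all with df = (2, 77).
reported = {
    "posttest score": (42.37, 0.524),
    "theta relative power": (10.28, 0.211),
    "alpha relative power": (7.84, 0.169),
    "AOI A2 fixation proportion": (43.90, 0.533),
}

recomputed = {name: round(eta_squared_from_f(f, 2, 77), 3)
              for name, (f, _) in reported.items()}
```

Each recomputed value matches the reported η2 to three decimal places.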
Results from the delayed test administered one week later indicated that between-group differences persisted in the long-term retention phase (F(2, 77) = 42.80, p < 0.001). The PA-MMS group achieved a retention rate of 96.8%, significantly higher than the other two groups (qFDR = 0.001), while simultaneously exhibiting the lowest forgetting score (M = 0.55, SD = 0.83). These findings align with the expectations of hypotheses H1a and H2a, suggesting that pedagogical agent assistance not only facilitates immediate learning but also contributes to long-term knowledge retention.
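The exact scoring formulas for the retention rate and the forgetting score are not restated in this section. One common operationalization, shown here purely as an assumption, takes the retention rate as the delayed score expressed as a percentage of the posttest score and the forgetting score as the difference between the two. The function name `retention_metrics` and the example scores are hypothetical.

```python
def retention_metrics(posttest, delayed):
    """Return (retention rate in percent, forgetting score) for one learner.

    Assumed definitions (not confirmed by the source):
      retention rate   = delayed / posttest * 100
      forgetting score = posttest - delayed
    """
    if posttest <= 0:
        raise ValueError("posttest score must be positive to form a ratio")
    retention_rate = 100.0 * delayed / posttest
    forgetting = posttest - delayed
    return retention_rate, forgetting


# Hypothetical learner who scored 16 immediately and 15 one week later.
rate, forgot = retention_metrics(posttest=16, delayed=15)
```

Under these assumed definitions, a high retention rate and a low forgetting score describe the same outcome from two angles, which is consistent with the PA-MMS group leading on both measures.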

5.2. Cognitive Load Analysis

Table 2 presents the subjective cognitive load ratings across five dimensions along with their statistical test results. ANOVA revealed that the experimental condition had a significant effect on all load types (all qFDR ≤ 0.008).
The PA-MMS group exhibited the lowest ratings on both intrinsic cognitive load (ICL) and extraneous cognitive load (ECL) dimensions (ICL: η2 = 0.160; ECL: η2 = 0.214). Between-group differences in extraneous cognitive load displayed a gradient distribution, with the PA-MMS group scoring lowest, the NPA group highest, and the PA-TS group intermediate, yielding a contrast effect size of d = 1.26 [0.70, 1.82] between the PA-MMS and NPA groups. This result validated the expectation of hypothesis H1b regarding cognitive load optimization.
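The contrast effect sizes reported in this section are Cohen’s d values with 95% confidence intervals. The sketch below shows one standard way to obtain such a value from group summary statistics, using the pooled standard deviation and a large-sample approximation for the standard error. The interval formula is an assumption and may differ from the one used to produce the intervals reported here; the function name `cohens_d_with_ci` and the means and standard deviations are hypothetical, while the group sizes (25 and 27) follow the NPA and PA-MMS groups.

```python
import math


def cohens_d_with_ci(mean1, sd1, n1, mean2, sd2, n2, z=1.96):
    """Return (d, lower, upper) using a pooled SD and a normal-approximation CI."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                          / (n1 + n2 - 2))
    d = (mean1 - mean2) / pooled_sd
    # Large-sample approximation to the standard error of d (an assumption here).
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d, d - z * se, d + z * se


# Hypothetical extraneous-load ratings: NPA (n = 25) versus PA-MMS (n = 27).
d, lower, upper = cohens_d_with_ci(3.5, 0.5, 25, 3.0, 0.5, 27)
```

A confidence interval that excludes zero, as in this example, indicates that the direction of the group difference is reliably estimated; an interval that crosses zero calls for cautious interpretation.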
Notably, germane cognitive load (GCL) exhibited a pattern opposite to the previous two dimensions (F(2, 77) = 8.21, p = 0.001). The PA-MMS group achieved the highest rating on this dimension (M = 3.68, SD = 0.52), significantly exceeding the NPA group (d = 1.15 [0.59, 1.71]). This indicates that pedagogical agent assistance, while reducing extraneous cognitive load, promoted learners’ engagement in deep cognitive processing.
On the time pressure (TIME) dimension, the PA-MMS group perceived the lowest instantaneous information pressure (M = 2.88), significantly lower than both the PA-TS group (M = 3.34) and the NPA group (M = 3.65), with a contrast effect size of d = 1.08 [0.52, 1.64]. This finding reveals the alleviating effect of structured summaries on the inherent time pressure in video-based learning.
Taken together, the three groups demonstrated a systematic distributional pattern in cognitive load: reduced extraneous load accompanied by elevated germane load. This load redistribution pattern aligns with the theoretical expectations of hypothesis H1c concerning optimized allocation of cognitive resources.

5.3. Neurophysiological Indices

Table 3 and Table 4 present EEG activity and eye-tracking behavioral data, respectively, revealing the cognitive mechanisms underlying pedagogical agent assistance from a neurophysiological perspective.

5.3.1. EEG Activity Patterns

Analysis of theta relative power revealed significant between-group differences (F(2, 77) = 10.28, p < 0.001, η2 = 0.211). The NPA group exhibited the highest theta power (M = 18.92%), while the PA-MMS group showed the lowest (M = 14.35%), with a contrast effect size of d = 1.28 [0.71, 1.85] between the two groups. Given the close association between theta activity and working memory load, this result provided neurophysiological evidence for hypothesis H4a, indicating that pedagogical agent assistance indeed reduced learners’ cognitive burden.
Alpha relative power displayed a trend opposite to that of theta (F(2, 77) = 7.84, p = 0.001, η2 = 0.169). The PA-MMS group demonstrated the highest alpha power (M = 18.76%), significantly exceeding the NPA group (M = 14.38%, d = 1.10 [0.54, 1.66]), supporting hypothesis H4b. Elevated alpha activity is typically regarded as an indicator of enhanced selective attention regulation capacity.
Beta and gamma bands exhibited the highest relative power in the PA-TS group (Beta: M = 17.68%; Gamma: M = 3.50%). However, the contrast effect sizes between the PA-MMS and NPA groups in these two frequency bands were small (Beta: d = 0.42; Gamma: d = 0.25), with confidence intervals crossing zero, suggesting that the substantive significance of these differences requires cautious interpretation. The elevation in beta and gamma power may reflect stronger semantic integration processing activated by text-based summaries, although this inference awaits further verification.

5.3.2. Visual Attention Allocation

Eye-tracking data revealed pronounced differences among the three groups in the proportion of fixation time on AOI A2 (summary area) (F(2, 77) = 43.90, p < 0.001, η2 = 0.533). The PA-TS and PA-MMS groups allocated 14.80% and 12.40% of their fixations to the summary area, respectively, whereas the NPA group allocated only 0.80%. The contrast effect size between the PA-MMS and NPA groups reached d = 2.56 [1.89, 3.23]. The difference between the two experimental groups was not significant (MD = −2.40%, p > 0.05), indicating that different summary formats exerted comparable effects in attracting overall attentional resources, a finding that supports hypothesis H3a.
Analysis of total fixation duration on AOI A2 exhibited a similar pattern (F(2, 77) = 38.45, p < 0.001). The PA-TS group’s cumulative fixation time (M = 65,129 ms) was slightly higher than that of the PA-MMS group (M = 56,318 ms), with both significantly exceeding the NPA group (M = 3,320 ms, d = 2.43 [1.77, 3.09]). The longer fixation duration in the text group may reflect the word-by-word processing characteristics inherent in linear reading.
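The two AOI measures used above, the proportion of fixation time and the total fixation duration, can both be derived from a list of fixation records. The following is a minimal sketch under stated assumptions: the AOI rectangles are hypothetical pixel boundaries on a 1920 × 1080 screen with the summary on the right, and the `Fixation` record, the function name `aoi_metrics`, and the sample fixations are invented for illustration rather than taken from the eye tracker’s export format.

```python
from collections import namedtuple

Fixation = namedtuple("Fixation", ["x", "y", "duration_ms"])

# Hypothetical AOI rectangles (left, top, right, bottom) in pixels.
AOIS = {
    "A1_video": (0, 0, 1280, 1080),
    "A2_summary": (1280, 0, 1920, 1080),
}


def aoi_metrics(fixations, aois=AOIS):
    """Return {aoi: (total fixation duration in ms, share of all fixation time)}."""
    totals = {name: 0 for name in aois}
    for fix in fixations:
        for name, (left, top, right, bottom) in aois.items():
            if left <= fix.x < right and top <= fix.y < bottom:
                totals[name] += fix.duration_ms
                break
    overall = sum(f.duration_ms for f in fixations)
    return {name: (total, total / overall if overall else 0.0)
            for name, total in totals.items()}


# Three hypothetical fixations: two on the video, one on the summary.
sample = [Fixation(400, 500, 300), Fixation(900, 300, 500), Fixation(1500, 400, 200)]
metrics = aoi_metrics(sample)
```

Here the summary area receives 200 ms of 1,000 ms of fixation time, that is, a proportion of 20%. Because the proportion is normalized by each participant’s total fixation time, it can diverge from the ranking implied by raw durations when participants differ in overall fixation time.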
Qualitative analysis of fixation heatmaps (Figure 4) revealed differentiated visual scanning patterns. The PA-MMS group displayed a network-like fixation distribution, with high-density hotspots concentrated on the core nodes of the mind map and their branching connections, reflecting learners’ active construction of inter-conceptual relationships. In contrast, the PA-TS group exhibited a typical linear scanning trajectory characteristic of text reading, with fixation hotspots distributed sequentially from top to bottom. The NPA group’s fixations primarily focused on the video content area (A1 region), with virtually no heat signals in the summary area. These spatial distribution differences validated hypothesis H3b, indicating that mind maps guided networked visual scanning to facilitate conceptual integration, whereas text-based summaries supported linear deep processing.

5.4. Multivariate Correlation Analysis

Table 5 presents the Pearson correlation coefficient matrix among primary variables (N = 80), revealing systematic associations among learning performance, cognitive load, and neurophysiological indices.
Correlation analysis indicated that posttest scores were negatively correlated with intrinsic cognitive load (r = −0.52, p < 0.001) and extraneous cognitive load (r = −0.61, p < 0.001) at moderate strength, while positively correlated with germane cognitive load (r = 0.54, p < 0.001). This pattern suggests that improvements in learning outcomes were accompanied by reductions in irrelevant cognitive load and enhancements in beneficial cognitive processing, providing convergent evidence for hypothesis H1c.
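For reference, the Pearson coefficient and the t statistic commonly used to test it can be sketched as follows. The function names `pearson_r` and `t_statistic` and the five paired observations are hypothetical; only the values r = −0.61 and N = 80 in the final line come from the analysis above.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """Return the t statistic for testing r against zero, with n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical paired observations: extraneous load rating vs. posttest score.
load = [1, 2, 3, 4, 5]
score = [10, 8, 7, 5, 2]
r_example = pearson_r(load, score)

# t statistic for the reported extraneous-load correlation (r = -0.61, N = 80).
t_reported = t_statistic(-0.61, 80)
```

With 78 degrees of freedom, a t statistic of roughly −6.8 lies far in the tail of the distribution, in line with the reported p < 0.001.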
Pronounced coupling relationships existed between neurophysiological indices and cognitive states. Theta power was negatively correlated with posttest scores (r = −0.46, p < 0.001) and positively correlated with intrinsic cognitive load (r = 0.58, p < 0.001), confirming its validity as an objective indicator of working memory load. Alpha power, in turn, was positively correlated with posttest scores (r = 0.39, p < 0.01) and germane cognitive load (r = 0.47, p < 0.001), supporting the theoretical assumption that it reflects active attentional control. Notably, theta and alpha power were negatively correlated (r = −0.52, p < 0.001), suggesting potential competitive modulation between these two neural mechanisms.
The correlation pattern of eye-tracking metrics was further consistent with a mediating role of visual attention allocation. The proportion of fixation time on AOI A2 exhibited strong positive correlations with learning outcome measures (posttest scores: r = 0.68; learning gains: r = 0.62, both p < 0.001), while demonstrating negative correlations with intrinsic load (r = −0.45) and extraneous load (r = −0.53), and a positive correlation with germane load (r = 0.56, all p < 0.001). These associations suggest that strategic reallocation of visual attention may constitute a key mediating mechanism through which pedagogical agents exert their effects.
Analysis of predictors for long-term retention revealed that retention test scores were positively correlated with germane cognitive load (r = 0.58, p < 0.001) and alpha power (r = 0.42, p < 0.01), implying the importance of deep cognitive processing and attentional control for knowledge consolidation. The positive correlation between forgetting scores and theta power (r = 0.51, p < 0.001) indicates that working memory load during the initial learning phase may compromise long-term memory stability.
Taken together, the directionality of the aforementioned cross-index correlation network was highly consistent with the theoretical expectations of hypotheses H2, H3, and H4, providing triangulated support for the integrated interpretation of multimodal data.

6. Discussion

6.1. Mechanisms of Pedagogical Agents’ Impact on Learning Performance

The findings demonstrated that the incorporation of pedagogical agents contributed to enhanced video learning performance, with the underlying mechanism primarily manifested through optimized cognitive resource allocation via structured information presentation. This aligns with the findings of Du et al. (2025), who compared three types of text prompts (subtitle-based, keyword-based, and structured text prompts) in elementary students’ learning from educational animations and found that structured text prompts (CTC) performed best in learning achievement, knowledge retention, and self-efficacy while producing the lowest cognitive load. In the present study, agent-generated mind maps and text summaries act as structured scaffolds that externalize key concepts and logical relationships, thereby reducing cognitive load and enabling deeper understanding.
In video learning, information presentation is characterized by transience and continuity, requiring learners to simultaneously process and integrate content from different temporal points. Agent-generated summaries serve as external cognitive scaffolds that help learners organize key information, thus freeing limited working memory to focus on understanding and integration. Compared to text summaries, mind maps demonstrate unique advantages in knowledge visualization by encoding conceptual hierarchies through spatial positioning, which may facilitate dual-channel processing, although the correlational design precludes causal claims about this mechanism. For instance, when learners observe “three-phase composition of soil” positioned centrally with three branches extending outward in the graphical structure, the visual system processes spatial relationships while the auditory system processes explanatory content, with both working synergistically to expand cognitive capacity.
In delayed testing, the PA-MMS group achieved 96.8% retention, significantly higher than other groups. This advantage may stem from the hierarchical visual structure providing optimized retrieval cues for long-term memory consolidation.
Additionally, this study complements findings by Rahman et al. (2024). While Rahman et al.’s research primarily demonstrated the accuracy advantages of AI-generated summaries, the present study reveals differences in learning effectiveness across different summary formats, particularly in long-term knowledge retention. Mind maps can provide intuitive cues about video content progression, enabling learners to grasp the overall knowledge framework while following detailed content development. This progressive cognitive guidance may underlie their advantages.
From an applied perspective, these results hold potential value for intelligent educational system design. In large-scale online education contexts such as MOOCs, pedagogical agents can automatically generate structured content summaries, helping learners understand video knowledge more efficiently. For populations lacking autonomous learning strategies, such tools can particularly alleviate cognitive burden and enhance the learning experience.

6.2. Optimization Mechanisms and Differentiated Regulation of Cognitive Load

The cognitive load patterns revealed in this study provide crucial evidence for understanding the mechanisms of pedagogical agents. Mind maps showed a clear advantage in reducing extraneous cognitive load, potentially because their spatial structure enables learners to integrate information efficiently, whereas linear text tends to introduce additional interference. The higher Beta wave activity observed with text summaries may indicate enhanced semantic processing and attention allocation, although, as noted in Section 5.3.1, this inference awaits further verification. Together, these results suggest that the summary formats differ in how they regulate working memory burden and information processing efficiency.
Building on these patterns, cognitive load optimization was primarily manifested through load type transfer. The PA-MMS group showed the lowest extraneous load while maintaining the highest germane load. This pattern differed from Kalyuga’s (2011) expertise reversal effect; for novice learners, high-quality AI scaffolds did not create redundancy but rather promoted deep processing. This may be because AI generates more adaptive knowledge structures based on content features rather than relying on fixed templates.
The NPA group’s highest Theta power is consistent with Klimesch’s (1999) framework linking Theta to working memory demands, though individual baseline differences warrant interpretive caution. NPA group learners had to process the video and extract its key points simultaneously, leading to a heavier working memory burden, whereas both agent-assisted groups could rely on ready-made frameworks to reduce cognitive load. This study also revealed format-specific effects: the PA-MMS group showed lower Theta power and higher Alpha power, indicating reduced working memory burden alongside sustained attention. This aligns with the multimedia learning findings of Antonenko et al. (2010) and extends them to the AI-assisted domain. Although text summaries reduced cognitive load, they were accompanied by higher Beta and Gamma activity, suggesting processing pathways oriented more toward sustained semantic integration. This resembles Bastiaansen et al.’s (2012) findings on the neural oscillatory characteristics of language comprehension. Text summaries facilitate global framework capture but increase integration pressure because of their linear characteristics, while mind maps enable understanding to unfold progressively, thereby alleviating synchronous processing demands.
Integration of multimodal measures revealed systematic associations between neurophysiological and behavioral indicators. Theta power negatively correlated with posttest scores (r = −0.46, p < 0.001) and positively with intrinsic load (r = 0.58, p < 0.001), consistent with its proposed role as a memory load indicator. Fixation time on summaries (AOI A2) negatively correlated with both Theta (r = −0.48, p < 0.001) and extraneous load (r = −0.53, p < 0.001), suggesting strategic attention reallocation may mediate load reduction. However, cross-sectional correlations cannot establish directionality.
The mitigation of transient information effects was also supported. The fleeting nature of video information readily induces overload (Ng et al., 2013), yet AI-generated summaries significantly reduced this pressure. Time pressure ratings decreased from 3.65 in the NPA group to 2.88 in the PA-MMS group, a statistically significant difference that also aligns with the expectations of distributed cognition theory (Hutchins, 1995): learners need not memorize content verbatim but can instead use external scaffolds for selective processing.
Compared to Li et al. (2024), who emphasized that AI task-sharing reduces cognitive load, this study found that different AI support formats trigger different cognitive reorganization. Mind maps do not replace learner processing but rather optimize information pathways by enhancing knowledge accessibility and organization, embodying the augmentation effect of AI-assisted learning.
At the educational practice level, this study emphasizes that AI support requires fine-tuned matching according to task characteristics. Not all AI-generated content optimizes cognitive load. For structurally complex learning content with dense conceptual relationships, the spatial organization of mind maps offers greater advantages, whereas, in rapid browsing contexts, the linear structure of text summaries proves more appropriate.

6.3. Attention Allocation Patterns and Information Processing Strategies

Based on eye-tracking evidence, this study revealed the restructuring effects of pedagogical agent assistance on visual attention allocation. The PA-MMS and PA-TS groups allocated approximately 12% and 15% of their fixation time to the summary area, respectively (12.40% and 14.80%). This attention migration represented not only a quantitative change in resource allocation but also pointed to a qualitative transformation in information processing strategies.
The spatial fixation differences presented in the heat maps carry important theoretical implications. The PA-MMS group exhibited a networked fixation distribution resembling ‘global-local’ scanning patterns (Jarodzka et al., 2013), with gaze alternating between core nodes, although without concurrent process-tracing it remains speculative whether this reflects deliberate integration or passive exploration. This pattern is consistent with the function of mind maps as progress-cuing tools for video content: learners can quickly scan to locate the current explanation within the overall knowledge map and thereby construct knowledge progressively. In contrast, the PA-TS group primarily employed linear scanning. This top-down reading approach helps learners grasp the video’s macro structure rapidly but demands a higher level of sustained attention, potentially weakening the capture of detailed information. This finding aligns with Van Gog and Scheiter’s (2010) conclusions regarding the relationship between eye movement patterns and learning strategies.
The temporal characteristics of attention allocation further elucidated the cognitive effects of different assistance formats. The linear structure of text summaries prompts learners to concentrate investment during the initial video stages to quickly form an overall framework through overview scanning, while mind maps support gradual branch exploration as content progresses. This differentiated deployment aligned with the “immediate processing-lookback integration” model proposed by Hyönä et al. (2003), also explaining why the PA-TS group, despite higher fixation percentages in the summary area, achieved slightly lower learning performance than the PA-MMS group.
The interaction between attention allocation and cognitive load challenged traditional cognitive perspectives. The PA-MMS group achieved efficient attention allocation under lower extraneous cognitive load, presenting a “low load-high efficiency” state consistent with Wickens’s (2002) multiple resource theory. The spatial organization of mind maps reduced information integration costs, such that moderate attention dispersion not only avoided overload but promoted parallel processing through synergy between visuospatial and auditory-verbal channels.
Eye-tracking data also revealed changes in attention control during self-regulated learning (Azevedo et al., 2013). Learners’ fixations on summary areas during video learning demonstrated strategic and context-dependent characteristics. Specifically, agent-generated summaries provided external tools for metacognitive regulation, enabling learners to dynamically adjust attention allocation based on comprehension states.
The attention allocation patterns across three learning conditions revealed key characteristics of pedagogical agent-assisted MOOC learning. Under no-agent conditions, while learners maintained high focus on video content, the absence of external cognitive scaffolds potentially increased cognitive load and affected learning outcomes. Text summaries provided knowledge frameworks, but their linear reading mode may have interfered to some extent with the continuous processing of video content, yielding relatively limited improvement. In contrast, mind maps, through moderate attention proportion combined with networked scanning patterns, provided cognitive scaffolding while maintaining adequate focus on main content, demonstrating superior learning outcomes and lower cognitive load. Therefore, pedagogical agent assistance formats need to strike a balance between cognitive support and learning focus, avoiding extremes of either excessive guidance or insufficient support.

7. Conclusions

This study provides converging multimodal evidence that LLM-powered pedagogical agents can optimize cognitive resource allocation in video-based learning. Triangulated findings from behavioral assessments, subjective reports, EEG, and eye-tracking suggest AI-generated mind maps are associated with reduced extraneous load (d = 1.26), enhanced germane load (d = 1.15), and superior retention (96.8% vs. 87.0% after one week). Neurophysiological patterns indicate differential processing: PA-MMS exhibited lower Theta and higher Alpha, while PA-TS showed elevated Beta/Gamma.
This research extends Cognitive Load Theory by demonstrating that modern LLM-based agents differ fundamentally from static scaffolds. Unlike predetermined designs, the Doubao agent employed context-aware analysis to generate adaptive structures, representing a shift from designer-centered optimization to AI-mediated dynamic support. The simultaneous reduction of extraneous load and enhancement of germane load suggests generative agents may approximate human designers’ capacity for cognitive optimization, particularly for large-scale contexts where manual design is infeasible.
However, critical constraints apply. Unlike Du et al.’s (2025) controlled comparison of predetermined structured prompts, the present study cannot isolate whether advantages stem from the AI’s generative adaptation capability or merely from the structural properties of mind map formatting itself, as the Doubao agent dynamically generated context-aware summaries that confound adaptive intelligence with visual-spatial affordances. Correlational multimodal data support associations rather than causal pathways, and single-trial assessments do not address whether AI assistance fosters skill internalization or creates dependency. The homogeneous sample (education majors, N = 80) and single-domain content (Soil Mechanics) restrict generalizability. Additionally, expertise reversal effects suggest experts may benefit more from text summaries, yet stratification by prior knowledge was infeasible. Chinese-language materials limit conclusions to similar linguistic contexts, and without concurrent process-tracing, inferred cognitive strategies from eye movements remain speculative.
For practice, integrating generative AI into video platforms could mitigate information transience in MOOCs, though format-task alignment requires consideration. For research, controlled comparisons isolating AI’s generative contribution from format effects, expertise-stratified designs, and transfer tasks assessing deep understanding remain essential. This exploratory study establishes theoretical plausibility for LLM-based agents as adaptive scaffolds, though robust conclusions require systematic replication.

Author Contributions

Conceptualization, L.Y. and J.X.; methodology, L.Y. and J.X.; validation, L.Y., J.X., and Z.Z.; formal analysis, J.X.; investigation, L.Y. and J.X.; resources, L.Y.; data curation, J.X.; writing—original draft preparation, J.X.; writing—review and editing, Z.Z.; visualization, J.X.; supervision, Z.Z.; project administration, L.Y.; funding acquisition, L.Y. and Z.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (62567001; 62277018; 62237001) and the Degree and Graduate Education Reform Research Project in Guangdong (2023JGXM046).

Institutional Review Board Statement

This study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of the School of Education, Guangxi Normal University (protocol code GXNU-FE-2025-001 and date of approval: 19 January 2025).

Informed Consent Statement

Informed consent was obtained from all subjects involved in this study.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to privacy and ethical restrictions. The data contain sensitive personal information from participants that could compromise participant privacy if shared publicly.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. AlShaikh, R., Al-Malki, N., & Almasre, M. (2024). The implementation of the cognitive theory of multimedia learning in the design and evaluation of an AI educational video assistant utilizing large language models. Heliyon, 10(3), e25361.
  2. Antonenko, P., Paas, F., Grabner, R., & Van Gog, T. (2010). Using electroencephalography to measure cognitive load. Educational Psychology Review, 22(4), 425–438.
  3. Armando, M., Ochs, M., & Régner, I. (2022). The impact of pedagogical agents’ gender on academic learning: A systematic review. Frontiers in Artificial Intelligence, 5, 862997.
  4. Azevedo, R., Harley, J., Trevors, G., Duffy, M., Feyzi-Behnagh, R., Bouchet, F., & Landis, R. (2013). Using trace data to examine the complex roles of cognitive, metacognitive, and emotional self-regulatory processes during learning with multi-agent systems. In R. Azevedo, & V. Aleven (Eds.), International handbook of metacognition and learning technologies (pp. 427–449). Springer.
  5. Baidoo-Anu, D., & Ansah, L. O. (2023). Education in the era of generative artificial intelligence (AI): Understanding the potential benefits of ChatGPT in promoting teaching and learning. Journal of AI, 7(1), 52–62.
  6. Bastiaansen, M., Magyari, L., & Hagoort, P. (2012). Syntactic unification operations are reflected in oscillatory dynamics during on-line sentence comprehension. Journal of Cognitive Neuroscience, 22(7), 1333–1347.
  7. Chromik, M., Eiband, M., Völkel, S. T., & Buschek, D. (2019, March 20). Dark patterns of explainability, transparency, and user control for intelligent systems. IUI Workshops (Vol. 2327), Los Angeles, CA, USA.
  8. Dai, Y., Liu, A., & Lim, C. P. (2023). Reconceptualizing ChatGPT and generative AI as a student-driven innovation in higher education. Procedia CIRP, 119, 84–90.
  9. Du, L., Tang, X., & Wang, J. (2025). Different types of textual cues in educational animations: Effect on science learning outcomes, cognitive load, and self-efficacy among elementary students. Education and Information Technologies, 30(3), 3573–3596.
  10. Hao, X., & Cukurova, M. (2023). Exploring the effects of “AI-generated” discussion summaries on learners’ engagement in online discussions. In Proceedings of the International Conference on Artificial Intelligence in Education, Tokyo, Japan, July 3–7 (pp. 155–161). Springer Nature.
  11. Hutchins, E. (1995). Cognition in the wild. MIT Press.
  12. Hyönä, J., Lorch, R. F., Jr., & Rinck, M. (2003). Eye movement measures to study global text processing. In J. Hyönä, R. Radach, & H. Deubel (Eds.), The mind’s eye: Cognitive and applied aspects of eye movement research (pp. 313–334). North-Holland.
  13. Jamet, E. (2014). An eye-tracking study of cueing effects in multimedia learning. Computers in Human Behavior, 32, 47–53.
  14. Jarodzka, H., Van Gog, T., Dorr, M., Scheiter, K., & Gerjets, P. (2013). Learning to see: Guiding students’ attention via a model’s eye movements fosters learning. Learning and Instruction, 25, 62–70.
  15. Johnson, W. L., & Lester, J. C. (2016). Face-to-face interaction with pedagogical agents, twenty years later. International Journal of Artificial Intelligence in Education, 26(1), 25–36.
  16. Kalyuga, S. (2011). Cognitive load theory: How many types of load does it really need? Educational Psychology Review, 23(1), 1–19.
  17. Kasneci, E., Seßler, K., Küchemann, S., Bannert, M., Dementieva, D., Fischer, F., Gasser, U., Groh, G., Günnemann, S., Hüllermeier, E., Krusche, S., Kutyniok, G., Michaeli, T., Nerdel, C., Pfeffer, J., Poquet, O., Sailer, M., Schmidt, A., Seidel, T., … Kasneci, G. (2023). ChatGPT for good? On opportunities and challenges of large language models for education. Learning and Individual Differences, 103, 102274.
  18. Klimesch, W. (1999). EEG alpha and theta oscillations reflect cognitive and memory performance: A review and analysis. Brain Research Reviews, 29(2–3), 169–195.
  19. Lai, M. L., Tsai, M. J., Yang, F. Y., Hsu, C. Y., Liu, T. C., Lee, S. W. Y., Lee, M. H., Chiou, G. L., Liang, J. C., & Tsai, C. C. (2013). A review of using eye-tracking technology in exploring learning from 2000 to 2012. Educational Research Review, 10, 90–115.
  20. Leahy, W., & Sweller, J. (2011). Cognitive load theory, modality of presentation and the transient information effect. Applied Cognitive Psychology, 25(6), 943–951.
  21. Leppink, J., Paas, F., Van der Vleuten, C. P. M., Van Gog, T., & Van Merriënboer, J. J. G. (2013). Development of an instrument for measuring different types of cognitive load. Behavior Research Methods, 45(4), 1058–1072.
  22. Leutner, D., Leopold, C., & Sumfleth, E. (2009). Cognitive load and science text comprehension: Effects of drawing and mentally imagining text content. Computers in Human Behavior, 25(2), 284–289.
  23. Li, T., Ji, Y., & Zhan, Z. (2024). Expert or machine? Comparing the effect of pairing student teachers with in-service teachers and ChatGPT on their critical thinking, learning performance, and cognitive load in an integrated STEM course. Asia Pacific Journal of Education, 44(1), 45–60.
  24. Liu, Y., Chen, S., Cheng, H., Liu, Y., & Zhang, Z. (2024, May 11–16). How AI processing delays foster creativity: Exploring research question co-creation with an LLM-based agent. 2024 CHI Conference on Human Factors in Computing Systems (pp. 1–25), Honolulu, HI, USA. [Google Scholar]
  25. Ma, Q., Shen, H., Koedinger, K., & Tomkins, S. (2024). How to teach programming in the AI era? Using LLMs as a teachable agent for debugging. In Proceedings of the international conference on artificial intelligence in education (pp. 265–279). Springer Nature Switzerland. [Google Scholar]
  26. Mangaroska, K., Sharma, K., Gašević, D., & Giannakos, M. (2022). Exploring students’ cognitive and affective states during problem solving through multimodal data: Lessons learned from a programming activity. Journal of Computer Assisted Learning, 38(1), 40–59. [Google Scholar] [CrossRef]
  27. Mayer, R. E. (2017). Using multimedia for e-learning. Journal of Computer Assisted Learning, 33(5), 403–423. [Google Scholar] [CrossRef]
  28. Means, B., Toyama, Y., Murphy, R., Bakia, M., & Jones, K. (2013). Evaluation of evidence-based practices in online learning: A meta-analysis and review of online learning studies. U.S. Department of Education.
  29. Mutlu-Bayraktar, D., Cosgun, V., & Altan, T. (2019). Cognitive load in multimedia learning environments: A systematic review. Computers & Education, 141, 103618. [Google Scholar] [CrossRef]
  30. Ng, H. K., Kalyuga, S., & Sweller, J. (2013). Reducing transience during animation: A cognitive load perspective. Educational Psychology, 33(7), 755–772. [Google Scholar] [CrossRef]
  31. Novak, J. D. (2010). Learning, creating, and using knowledge: Concept maps as facilitative tools in schools and corporations. Routledge. [Google Scholar]
  32. Rahman, M. R., Koka, R. S., Shah, S. K., & Subramanian, V. (2024). Enhancing lecture video navigation with AI generated summaries. Education and Information Technologies, 29(6), 7361–7384. [Google Scholar] [CrossRef]
  33. Scheiter, K., & Eitel, A. (2015). Signals foster multimedia learning by supporting integration of highlighted text and diagram elements. Learning and Instruction, 36, 11–26. [Google Scholar] [CrossRef]
  34. Schneider, M., & Preckel, F. (2017). Variables associated with achievement in higher education: A systematic review of meta-analyses. Psychological Bulletin, 143(6), 565–600. [Google Scholar] [CrossRef]
  35. Singh, A. M., Marcus, N., & Ayres, P. (2012). The transient information effect: Investigating the impact of segmentation on spoken and written text. Applied Cognitive Psychology, 26(6), 848–853. [Google Scholar] [CrossRef]
  36. Sullivan, M., Kelly, A., & McLaughlan, P. (2023). ChatGPT in higher education: Considerations for academic integrity and student learning. Journal of Applied Learning & Teaching, 6(1), 31–40. [Google Scholar] [CrossRef]
  37. Sweller, J., Ayres, P., & Kalyuga, S. (2011). Cognitive load theory. Springer. [Google Scholar]
  38. Szarkowska, A., Ragni, V., Szkriba, S., & Gerber-Morón, O. (2024). Watching subtitled videos with the sound off affects viewers’ comprehension, cognitive load, immersion, enjoyment, and gaze patterns: A mixed-methods eye-tracking study. PLoS ONE, 19(10), e0306251. [Google Scholar] [CrossRef]
  39. Ursutiu, D., Samoilă, C., Drăgulin, S., & Auer, M. E. (2018). Investigation of music and colours influences on the levels of emotion and concentration. In M. E. Auer, & D. G. Zutin (Eds.), Online engineering & internet of things (pp. 910–918). Springer. [Google Scholar]
  40. Van Gog, T., & Scheiter, K. (2010). Eye tracking as a tool to study and enhance multimedia learning. Learning and Instruction, 20(2), 95–99. [Google Scholar] [CrossRef]
  41. Wahn, B., Schmitz, L., Gerster, F. N., & Weiss, M. (2023). Offloading under cognitive load: Humans are willing to offload parts of an attentionally demanding task to an algorithm. PLoS ONE, 18(5), e0286102. [Google Scholar] [CrossRef]
  42. Wickens, C. D. (2002). Multiple resources and performance prediction. Theoretical Issues in Ergonomics Science, 3(2), 159–177. [Google Scholar] [CrossRef]
  43. Wong, A., Leahy, W., Marcus, N., & Sweller, J. (2012). Cognitive load theory, the transient information effect and e-learning. Learning and Instruction, 22(6), 449–457. [Google Scholar] [CrossRef]
  44. Wong, L. H., & Viberg, O. (2024). Unpacking students’ interaction patterns in asynchronous online discussions: A learning analytics approach. Computers & Education, 199, 104785. [Google Scholar]
  45. Wooldridge, M., & Jennings, N. R. (1995). Intelligent agents: Theory and practice. The Knowledge Engineering Review, 10(2), 115–152. [Google Scholar] [CrossRef]
  46. Yilmaz, R., & Karaoglan Yilmaz, F. G. (2023). The effect of generative artificial intelligence (AI)-based tool use on students’ computational thinking skills, programming self-efficacy and motivation. Computers and Education: Artificial Intelligence, 4, 100147. [Google Scholar] [CrossRef]
  47. Zhang, D., Zhao, J. L., Zhou, L., & Nunamaker, J. F., Jr. (2006). Can e-learning replace classroom learning? Communications of the ACM, 47(5), 75–79. [Google Scholar] [CrossRef]
Figure 1. Page layout of course video.
Figure 2. Learning interfaces for the three experimental conditions. (a) PA-MMS group with mind map summary; (b) PA-TS group with text summary.
Figure 3. Experimental procedure flowchart.
Figure 4. Comparative heat map analysis of eye movements across three experimental groups. (a) PA-MMS group showing network-like fixation patterns concentrated on mind map nodes and conceptual connections; (b) PA-TS group displaying linear top-to-bottom scanning patterns characteristic of text reading; (c) NPA group exhibiting dispersed fixations focused primarily on video content area.
Table 1. Learning performance by condition.
| Category | PA-MMS Group (M ± SD) | PA-TS Group (M ± SD) | NPA Group (M ± SD) | F(2, 77) | p | qFDR | η² | Cohen’s d [95% CI] |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Posttest Score | 11.85 ± 1.09 | 10.39 ± 1.23 | 7.10 ± 1.41 | 42.37 | <0.001 | <0.001 | 0.524 | 3.78 [3.05, 4.51] |
| Learning Gain | 8.15 ± 1.52 | 6.56 ± 1.68 | 3.24 ± 1.94 | 36.85 | <0.001 | <0.001 | 0.489 | 2.81 [2.18, 3.44] |
| Posttest Accuracy Rate (%) | 84.60 ± 7.80 | 74.20 ± 8.80 | 50.70 ± 10.10 | 50.14 | <0.001 | <0.001 | 0.566 | 3.78 [3.05, 4.51] |
| Retention Test Score | 12.30 ± 0.80 | 10.20 ± 1.40 | 8.30 ± 1.50 | 42.80 | <0.001 | <0.001 | 0.526 | 3.21 [2.53, 3.89] |
| Retention Rate (%) | 96.80 ± 6.00 | 92.20 ± 8.00 | 87.00 ± 9.00 | 6.52 | 0.002 | 0.005 | 0.145 | 1.28 [0.71, 1.85] |
| Forgetting Amount | 0.55 ± 0.83 | 1.18 ± 1.32 | 1.92 ± 1.48 | 21.40 | <0.001 | <0.001 | 0.357 | 1.09 [0.53, 1.65] |
Note. qFDR represents the false discovery rate after Benjamini–Hochberg correction.
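As a rough check on how the Cohen’s d column of Table 1 can be read, the sketch below computes a pooled-SD Cohen’s d from summary statistics. The table does not state which pair of groups d refers to, so the choice of PA-MMS versus NPA is an assumption, as is the pooled-variance formula; the function name `cohens_d` is introduced here for illustration. Group sizes (n = 27 and n = 25) come from the abstract. Because the published means and SDs are rounded, the result should land near the reported value rather than match it exactly.

```python
import math


def cohens_d(m1: float, sd1: float, n1: int, m2: float, sd2: float, n2: int) -> float:
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)


# Posttest Score row of Table 1; the PA-MMS vs. NPA pairing is an assumption.
d_posttest = cohens_d(11.85, 1.09, 27, 7.10, 1.41, 25)
print(f"Posttest Score, PA-MMS vs. NPA: d = {d_posttest:.2f}")
```

Under these assumptions the Posttest Score row comes out close to the reported d = 3.78; other rows may deviate more because of rounding in the published summary statistics.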
Table 2. Cognitive load by condition.
| Category | PA-MMS Group (M ± SD) | PA-TS Group (M ± SD) | NPA Group (M ± SD) | F(2, 77) | p | qFDR | η² | Cohen’s d [95% CI] |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ICL | 2.85 ± 0.71 | 3.18 ± 0.68 | 3.52 ± 0.63 | 7.32 | 0.001 | 0.003 | 0.160 | 0.99 [0.44, 1.54] |
| ECL | 2.42 ± 0.59 | 2.78 ± 0.64 | 3.25 ± 0.72 | 10.47 | <0.001 | 0.001 | 0.214 | 1.26 [0.70, 1.82] |
| GCL | 3.68 ± 0.52 | 3.45 ± 0.58 | 3.02 ± 0.61 | 8.21 | 0.001 | 0.003 | 0.176 | 1.15 [0.59, 1.71] |
| ATT | 3.75 ± 0.48 | 3.82 ± 0.51 | 3.31 ± 0.55 | 4.12 | 0.020 | 0.008 | 0.097 | 0.81 [0.26, 1.36] |
| TIME | 2.88 ± 0.66 | 3.34 ± 0.71 | 3.65 ± 0.78 | 9.85 | <0.001 | 0.001 | 0.204 | 1.08 [0.52, 1.64] |
Note. qFDR represents the false discovery rate after Benjamini–Hochberg correction.
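For a one-way ANOVA, η² can be recovered from the F statistic and its degrees of freedom as η² = (F × df_between) / (F × df_between + df_within). The following minimal sketch applies that identity to the F(2, 77) values in Table 2; the function name `eta_squared` and the dictionary layout are illustrative choices, not part of the original analysis.

```python
def eta_squared(f_value: float, df_between: int, df_within: int) -> float:
    """Eta squared for a one-way ANOVA, recovered from F and its degrees of freedom."""
    return (f_value * df_between) / (f_value * df_between + df_within)


# F(2, 77) values copied from Table 2.
table2_f = {"ICL": 7.32, "ECL": 10.47, "GCL": 8.21, "ATT": 4.12, "TIME": 9.85}

recomputed = {name: round(eta_squared(f, 2, 77), 3) for name, f in table2_f.items()}
for name, value in recomputed.items():
    print(f"{name}: eta squared = {value:.3f}")
```

The recomputed values agree with the η² column of Table 2 to three decimal places, which is a quick way to confirm that the columns were separated correctly.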
Table 3. EEG frequency band power by condition.
| Category | PA-MMS Group (M ± SD) | PA-TS Group (M ± SD) | NPA Group (M ± SD) | F(2, 77) | p | qFDR | η² | Cohen’s d [95% CI] |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Theta Wave Ratio (%) | 14.35 ± 3.28 | 15.87 ± 3.51 | 18.92 ± 3.86 | 10.28 | <0.001 | 0.001 | 0.211 | 1.28 [0.71, 1.85] |
| Alpha Wave Ratio (%) | 18.76 ± 4.15 | 16.42 ± 3.89 | 14.38 ± 3.72 | 7.84 | 0.001 | 0.003 | 0.169 | 1.10 [0.54, 1.66] |
| Beta Wave Ratio (%) | 15.23 ± 3.42 | 17.68 ± 3.78 | 13.85 ± 3.21 | 6.54 | 0.002 | 0.005 | 0.145 | 0.42 [−0.11, 0.95] |
| Gamma Wave Ratio (%) | 2.58 ± 1.12 | 3.50 ± 1.35 | 2.31 ± 0.98 | 5.89 | 0.004 | 0.008 | 0.133 | 0.25 [−0.28, 0.78] |
Note. qFDR represents the false discovery rate after Benjamini–Hochberg correction.
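The Benjamini–Hochberg step-up procedure named in the table notes can be sketched as follows. This is a generic illustration, not the authors’ analysis script: the function name `benjamini_hochberg` and the p-values are hypothetical, and the sketch does not attempt to reproduce the qFDR column, since the set of tests that was corrected together is not stated here.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted q-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down: each q is the smallest scaled
    # p-value (p * m / rank) at its own rank or any higher rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical p-values, chosen only to illustrate the procedure.
example_p = [0.041, 0.001, 0.039, 0.008, 0.042]
example_q = benjamini_hochberg(example_p)
for p, q in zip(example_p, example_q):
    print(f"p = {p:.3f} -> q = {q:.3f}")
```

A test is then declared significant at a false discovery rate of 0.05 when its q-value is at most 0.05.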
Table 4. Eye-tracking metrics on AOI A2 by conditions.
| Category | PA-MMS Group (M ± SD) | PA-TS Group (M ± SD) | NPA Group (M ± SD) | F(2, 77) | p | qFDR | η² | Cohen’s d [95% CI] |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AOI A2 Fixation Time Percentage (%) | 12.40 ± 5.70 | 14.80 ± 6.90 | 0.80 ± 0.90 | 43.90 | <0.001 | <0.001 | 0.533 | 2.56 [1.89, 3.23] |
| AOI A2 Fixation Time (ms) | 56,318 ± 25,914 | 65,129 ± 31,206 | 3320 ± 3710 | 38.45 | <0.001 | <0.001 | 0.500 | 2.43 [1.77, 3.09] |
Note. qFDR represents the false discovery rate after Benjamini–Hochberg correction.
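Table 5. Correlation matrix of key variables.
| Variables | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1. Posttest Score | | | | | | | | | | | |
| 2. Learning Gain | 0.95 *** | | | | | | | | | | |
| 3. Retention Test Score | 0.89 *** | 0.82 *** | | | | | | | | | |
| 4. Forgetting Amount | −0.76 *** | −0.68 *** | −0.84 *** | | | | | | | | |
| 5. ICL | −0.52 *** | −0.48 *** | −0.45 *** | 0.41 ** | | | | | | | |
| 6. ECL | −0.61 *** | −0.58 *** | −0.52 *** | 0.48 *** | 0.68 *** | | | | | | |
| 7. GCL | 0.54 *** | 0.51 *** | 0.58 *** | −0.49 *** | −0.42 ** | −0.51 *** | | | | | |
| 8. Theta Wave Ratio (%) | −0.46 *** | −0.43 *** | −0.41 ** | 0.51 *** | 0.58 *** | 0.64 *** | −0.38 ** | | | | |
| 9. Alpha Wave Ratio (%) | 0.39 ** | 0.35 ** | 0.42 ** | −0.36 ** | −0.35 ** | −0.41 ** | 0.47 *** | −0.52 *** | | | |
| 10. AOI A2 Fixation Time (ms) | 0.68 *** | 0.62 *** | 0.59 *** | −0.54 *** | −0.45 *** | −0.53 *** | 0.56 *** | −0.48 *** | 0.44 *** | | |
| 11. TIME | −0.55 *** | −0.51 *** | −0.48 *** | 0.43 *** | 0.62 *** | 0.71 *** | −0.44 *** | 0.59 *** | −0.38 ** | −0.49 *** | |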
Note. ** p < 0.01. *** p < 0.001.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Yuan, L.; Xu, J.; Zhan, Z. Effects of Pedagogical Agent-Generated Summaries on Video-Based Learning: Evidence from Eye-Tracking and EEG. Educ. Sci. 2026, 16, 39. https://doi.org/10.3390/educsci16010039

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.