Article

System Design and Evaluation of RAG-Enhanced Digital Humans in Design Education: Analyzing Cognitive Load and Instructional Efficiency

1 School of Art and Design, Dalian Art College, Dalian 116600, China
2 Department of Interior Architecture, Daegu University, Gyeongsan 38453, Republic of Korea
3 School of Architecture and Fine Art, Dalian University of Technology, Dalian 116024, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Appl. Sci. 2026, 16(2), 1068; https://doi.org/10.3390/app16021068
Submission received: 29 December 2025 / Revised: 16 January 2026 / Accepted: 19 January 2026 / Published: 20 January 2026
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

Design education involves complex historical knowledge structures that often impose a high extraneous cognitive load on students. This study proposes and evaluates an intelligent instructional system that integrates Retrieval-Augmented Generation (RAG) with anthropomorphic digital humans to function as scalable cognitive scaffolding. We developed a locally deployed architecture utilizing the Qwen3-30B Large Language Model (LLM) for reasoning, BGE-Large-Zh for high-precision semantic embedding, and LiveTalking for real-time audiovisual generation. To validate the system's pedagogical efficacy, a multi-center randomized controlled trial (RCT) was conducted across three universities (N = 150). The experimental group utilized the RAG-enhanced digital human system, while the control group received traditional instruction. Quantitative results demonstrate that the system significantly improved learning outcomes (p < 0.001, Cohen's d = 1.14) and classroom engagement (p < 0.001, d = 1.39). Crucially, measurements using the Paas Mental Effort Rating Scale revealed a significant reduction in mental effort (p < 0.001, d = 1.71) for the experimental group. Instructional efficiency analysis (E) confirmed that the system successfully converted reduced extraneous load into germane learning gains (Experimental E = +0.72 vs. Control E = −0.68). These findings validate the technical feasibility and educational value of combining localized RAG architectures with embodied AI, offering a replicable framework for reducing cognitive load in intensive learning environments.

1. Introduction

1.1. Research Background and Significance

In the context of the global digital transformation of higher education, design disciplines face significant challenges in integrating emerging technologies to enhance pedagogical quality [1,2]. Traditional instructional models often struggle to keep pace with the rapid evolution of knowledge and the increasing diversity of learner needs [3,4]. These challenges are particularly acute in design education, a field characterized by dynamic knowledge structures. Within this context, students frequently demonstrate underdeveloped information retrieval skills [5,6], while traditional interaction modes remain limited, potentially inhibiting the development of practical skills and innovative capacity [7].
Artificial intelligence (AI), specifically RAG, offers a promising solution by synthesizing pre-trained language models with external, verifiable knowledge bases to generate domain-specific content [8,9,10]. RAG facilitates the creation of intelligent platforms capable of delivering tailored resources and personalized instruction. Concurrently, “digital humans”—embodied pedagogical agents characterized by realistic visualization and naturalistic interaction—show increasing potential in educational settings [11,12,13]. These agents can simulate complex scenarios, facilitate immersive role-play, provide software guidance, and foster creativity within design contexts [14,15]. Consequently, this study explores the integration of RAG technology with digital humans in design education, proposing a framework that leverages RAG to construct robust knowledge graphs and embeds digital humans to elevate instructional efficiency and interaction quality.

1.2. Research Questions and Motivation

Despite the growing adoption of AI in education, the convergence of RAG technology and digital human interfaces remains an underexplored area of inquiry, particularly within practice-based disciplines such as design [16]. Principal challenges in this domain include constructing effective domain-specific knowledge bases, defining optimal pedagogical roles and interaction modalities for digital humans, and achieving the seamless integration of RAG-enabled knowledge retrieval with real-time interactive feedback. To address these gaps, this study investigates the development and pedagogical impact of an integrated RAG–digital human teaching system. Specifically, the research addresses the following questions:
(1) System Design and Integration (RQ1): How can RAG technology and digital human interfaces be effectively synthesized to create a responsive, intelligent teaching system tailored to the specific requirements of design education?
(2) Educational Impact (RQ2): To what extent does this integrated system influence university design students' learning outcomes and classroom engagement compared to traditional instructional methods?
(3) Mechanism of Support (RQ3): In what ways does the RAG component enhance the digital human's capacity to address domain-specific inquiries and provide context-aware scaffolding in design learning scenarios?
Grounded in Cognitive Load Theory and prior work on Intelligent Tutoring Systems, we advance three hypotheses. First, students who use the RAG-enhanced digital human system will attain significantly higher academic performance than peers taught through traditional instruction (H1). Second, the experimental group will experience significantly lower extraneous cognitive load—and corresponding mental effort—than the control group, with this reduction facilitating more efficient conversion into germane load (H2). Third, combining an anthropomorphic interface with accurate RAG-based retrieval will elicit significantly higher levels of behavioral engagement relative to baseline (H3).

2. Literature Review

2.1. Potential and Applications of RAG Technology in Education

The integration of Large Language Models (LLMs) into educational frameworks has expanded rapidly, demonstrating significant utility in personalized learning and instructional scaffolding [17,18,19]. However, standard LLMs frequently exhibit limitations regarding domain-specific precision, occasionally resulting in “hallucinations” or factual inaccuracies when processing specialized knowledge [20,21]. RAG addresses these cognitive deficits by anchoring generative models to external, verifiable knowledge bases, thereby enhancing the validity and reliability of the output.
Early research established the technical foundations of RAG within natural language processing [10], while subsequent scholarship has focused on its pedagogical applications [22]. Research [23] demonstrated RAG’s efficacy in developing curriculum-aligned question-and-answer systems, while other studies [24] reported significant gains in learning outcomes through RAG-enabled adaptive systems that deliver personalized content. Furthermore, prior work [25] validated the utility of RAG in instructional resource development, noting that such systems reduce instructor preparation time while simultaneously increasing student engagement.
Recent domain-specific inquiries [26,27], such as in computer science education, highlight RAG’s capacity to deliver precise, context-aware support by incorporating authoritative sources for real-time assistance. Collectively, these findings position RAG as a robust framework for enhancing domain-specific educational interventions and ensuring the fidelity of personalized learning experiences.

2.2. Applications of Digital Humans in Design Education

Digital humans represent an emerging paradigm in HCI, introducing innovative modalities for learning and instruction, particularly within design disciplines [26]. Initial research largely focused on virtual avatars within Virtual and Augmented Reality (VR/AR) environments, primarily emphasizing basic interaction mechanics and immersive presence [27,28,29].
Technological advancements have since evolved these entities into “digital mentors” capable of offering personalized guidance and real-time feedback. These virtual agents facilitate simulated professional scenarios, allowing students to cultivate essential soft skills—such as communication and collaboration—through interactions with simulated clients and team members. From a behavioral perspective, the anthropomorphic nature of digital humans can enhance social presence, potentially fostering deeper emotional connection and engagement than text-based interfaces.
Despite these advancements, significant implementation challenges persist. Technical limitations continue to hinder the naturalism of language processing and interaction fluidity, while pedagogical concerns exist regarding the potential displacement of human-to-human engagement [30]. Furthermore, ethical considerations, including the risk of algorithmic bias transmission [31] and data privacy vulnerabilities [32], require rigorous oversight. Integrating digital humans into design education, therefore, necessitates a balanced approach, positioning them as complementary cognitive tools rather than substitutes for human instruction.

2.3. Research Gaps and Contributions of This Study

While prior research has examined RAG technology and digital humans in isolation, significant gaps remain regarding their integrated application, particularly within practice-based design disciplines. Existing literature validates RAG’s strengths in intelligent information retrieval and personalization [33]; however, scant attention has been paid to how RAG can augment the cognitive capabilities of digital humans, specifically in managing complex, design-specific knowledge tasks.
Although current digital humans effectively simulate visual presence and deliver basic guidance, their content generation capabilities remain constrained. Most rely on static, pre-programmed knowledge bases, lacking the dynamic retrieval capacity required to address novel student inquiries or critique evolving design proposals. Moreover, empirical evidence regarding the joint deployment of these technologies in authentic educational settings is scarce.
Furthermore, educational technology scholars necessitate a critical distinction between sustained pedagogical engagement and the “Novelty Effect,” where initial student interest stems primarily from the technology’s newness [34]. Studies suggest that engagement with superficial avatars typically decays once the “wow factor” dissipates, unless the agent provides sustained, high-value utility. Therefore, a critical research gap lies in determining whether equipping digital humans with RAG-based intelligence can transform them from transient novelties into enduring “epistemic partners” capable of supporting long-term inquiry.
This research seeks to address these gaps by developing and locally deploying an instructional system that integrates RAG architecture with a digital human interface, utilizing frameworks such as Ollama and LiveTalking [35] to enable dynamic knowledge retrieval within an embodied interaction model. Through a mixed-methods randomized controlled trial conducted within an authentic design course, the study evaluates the system’s influence on learning outcomes and behavioral engagement while controlling for prior knowledge. The findings suggest that equipping digital humans with RAG capabilities may enhance their pedagogical utility by allowing for more accurate, context-aware responses to complex domain inquiries, thereby potentially mitigating the knowledge limitations of traditional virtual agents and supporting student academic performance.

3. Research Methodology

3.1. System Design and Implementation: Synthesizing RAG and Digital Humans (RQ1)

The primary objective of this research is to develop an intelligent system that effectively supports teaching in design disciplines. This system integrates RAG technology with digital human technology to provide personalized and interactive learning experiences. Its design and implementation involve three main components: RAG model selection and optimization, digital human development, and system integration with local deployment.

3.1.1. RAG Model Selection and System Construction

For the underlying LLM, this study prioritizes open-source architectures over closed-source alternatives. Open-source models offer superior transparency, robust community-driven optimization, and the flexibility required for domain-specific fine-tuning [36]. Furthermore, they enable local deployment, a critical factor for mitigating data privacy concerns and ensuring system sustainability in resource-constrained educational environments.
To determine the optimal foundation model, we conducted a comparative evaluation of three leading open-source candidates: Qwen3-30B-A3B, Llama 3-8B, and GLM-4-9B. The selection criteria focused on performance across five key dimensions relevant to educational scaffolding: classification accuracy, information extraction capability, reading comprehension, instruction-following fidelity, and computational resource efficiency for local deployment. Table 1 provides a comparative summary of these models based on established benchmark data.
To justify the selection criteria, we utilized two industry-standard benchmarks relevant to educational AI agents. The MMLU (Massive Multitask Language Understanding) benchmark evaluates the model's reasoning capabilities across 57 subjects (STEM, humanities, etc.), serving as a proxy for the "epistemic breadth" required to handle interdisciplinary design history inquiries. The IFEval (Instruction Following Evaluation) benchmark measures the model's strict adherence to formatting and constraints (e.g., "reply in under 50 words"). This is pedagogically critical for a digital human system, as the generated script must align with specific speech-timing windows and persona constraints to prevent synchronization errors.
As detailed in Table 1, Qwen3 leads in classification and reading comprehension (89.3% on MMLU) while maintaining strong instruction-following (84.7% on IFEval). Llama 3-8B-Instruct achieves higher instruction-following scores (~89% on IFEval) but trails significantly in comprehension (68.4% on MMLU). GLM-4-9B (77.87% on MMLU) demonstrates weaker information extraction and instruction adherence (~88% of GPT-4 on Chinese IFEval), alongside prohibitive resource demands. Consequently, Qwen3 was selected as the core LLM. Its Mixture-of-Experts (MoE) architecture (30B total; ~3B active parameters) balances efficacy with efficiency, enabling the cost-effective processing of complex Chinese design inquiries.
RAG efficacy further depends on the embedding model, which encodes data into dense vectors to capture semantic nuances. A robust model aligns queries and documents within a vector space to ensure precise retrieval [40].
We assessed prominent open-source embedding models, factoring dimensions (for semantic detail), benchmark recall rates (for accuracy), and size (for efficiency). Table 2 outlines key model traits.
As shown in Table 2, both BGE-Large-Zh and E5-Large-v2-CN exhibit high embedding dimensions and superior average recall rates on Chinese benchmarks, indicating robust semantic representation. BGE-Large-Zh, optimized for Chinese text, is especially apt for design education. Conversely, Ganymede-Base, though compact, offers lower dimensions and moderate recall, implying constraints in retrieval precision—critical for accurate educational information delivery.
For seamless RAG-digital human integration, we implemented local deployment via Ollama and OpenWebUI [44]. Ollama enables efficient local execution of models like Qwen3, bypassing external Application Programming Interfaces (APIs) to safeguard data privacy and minimize latency. OpenWebUI provides an intuitive interface for model interactions.
System implementation began with the construction of a curated design history knowledge base, encompassing digital textbooks, academic articles, and architectural case studies. To ensure semantic continuity—critical for maintaining the narrative flow of historical events—data processing employed a recursive character text splitting algorithm.
The text was segmented into fixed-size chunks of 512 tokens with a sliding window overlap of 50 tokens. This overlap strategy was specifically selected to prevent ‘context fragmentation,’ ensuring that definition-heavy concepts (e.g., the tenets of the Bauhaus movement) remained intact across chunk boundaries, thereby optimizing the BGE-Large-Zh embedding model’s retrieval accuracy.
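For concreteness, this sliding-window segmentation can be sketched in a few lines of Python. This is a minimal illustration assuming whitespace tokenization as a stand-in for the actual tokenizer (which is not specified above), not the production preprocessing code:

CHUNK_SIZE = 512   # tokens per chunk, per the configuration above
OVERLAP = 50       # tokens shared between consecutive chunks

def chunk_document(text: str) -> list[str]:
    """Split a document into fixed-size chunks with a sliding-window overlap,
    so that concepts spanning a boundary survive intact in at least one chunk."""
    tokens = text.split()  # placeholder tokenizer (assumption)
    step = CHUNK_SIZE - OVERLAP
    chunks = []
    for start in range(0, max(len(tokens), 1), step):
        chunks.append(" ".join(tokens[start:start + CHUNK_SIZE]))
        if start + CHUNK_SIZE >= len(tokens):
            break
    return chunks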
The digital human interface sends user queries as HTTP requests to a local URL exposing the RAG pipeline—the frontend-backend bridge. The RAG system handles queries and delivers responses for real-time student feedback.
This setup yields a decoupled, integrated architecture: digital humans oversee interactions, while the RAG core—driven by Qwen3 and BGE-Large-Zh—manages retrieval and generation. Local URL exchanges promote efficient, secure operations, fostering a responsive educational platform.
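A minimal sketch of this frontend-to-backend exchange is shown below. The endpoint path and JSON field names are illustrative assumptions; the text specifies only that a local URL exposes the RAG pipeline:

import requests

RAG_ENDPOINT = "http://localhost:8000/rag/query"  # hypothetical local URL

def ask_rag(question: str) -> str:
    """POST a student query to the locally deployed RAG pipeline and
    return the generated answer."""
    resp = requests.post(RAG_ENDPOINT, json={"query": question}, timeout=15)
    resp.raise_for_status()
    return resp.json()["answer"]  # response field name assumed for illustration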

3.1.2. Digital Human Design: Implementation Based on LiveTalking

With the RAG backend established, the next phase focused on constructing the “frontend” embodiment—an anthropomorphic interface capable of delivering the retrieved content with high visual fidelity and low latency. To identify the optimal rendering framework, we evaluated mainstream platforms based on realism, interactivity, and cost (Table 3).
As detailed in Table 3, while commercial solutions like Synthesia offer high realism, they lack the low-level API access required for real-time, low-latency interaction. Consequently, we adopted the LiveTalking framework, an open-source solution that allows for local deployment and granular control over audiovisual synchronization.
The development of the digital human interface via LiveTalking entailed specific procedural stages. First, model selection prioritized the balance between high-fidelity rendering (e.g., Enerf [45]) and rapid generation (e.g., Wav2Lip [46]) to address pedagogical requirements. Subsequently, voice cloning technologies trained on expert educator audio imbued the agent with a natural, professional vocal timbre [47].
LiveTalking integrates with the RAG architecture to drive real-time speech and facial animation. Through the LiveTalking API, text outputs are converted into audiovisual directives. Concurrently, behavioral orchestration protocols configure non-verbal cues appropriate for instruction, such as presenting visual aids during case studies or adopting listening postures during Q&A sessions [48].
To ensure interaction fluidity, interruption mechanisms enable immediate transitions to Q&A modes upon detecting student input. This integration yields a responsive system capable of effective content delivery, case analysis, and dynamic dialog (Figure 1).
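The interruption behavior can be illustrated with a simple state machine. The sketch below is our own abstraction and does not reflect the actual LiveTalking API:

from enum import Enum, auto

class AgentState(Enum):
    LECTURING = auto()
    LISTENING = auto()
    ANSWERING = auto()

class InteractionController:
    """Toy controller: detected student speech aborts the current utterance
    and switches the agent into a listening posture for Q&A."""
    def __init__(self):
        self.state = AgentState.LECTURING

    def on_student_speech_detected(self):
        if self.state is AgentState.LECTURING:
            self.stop_current_utterance()
            self.state = AgentState.LISTENING

    def stop_current_utterance(self):
        # Placeholder: the real system would cancel the active TTS/video stream.
        pass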

3.1.3. System Integration and Local Deployment

The architecture employs a modular design, allowing independent maintenance of RAG and digital human components via a lightweight local HTTP API. Upon receiving a student query, the frontend forwards it to the RAG module. Here, the BGE-Large-Zh model encodes the query for similarity searches against the knowledge base, retrieving relevant fragments to augment the Qwen3 generation process. Retrieval precision relies on Cosine Similarity (Equation (1)).
$$\mathrm{Similarity}(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_i A_i B_i}{\sqrt{\sum_i A_i^2} \, \sqrt{\sum_i B_i^2}}$$
where A and B represent the query and knowledge fragment vectors. To ensure epistemic accuracy and mitigate hallucinations, we implemented a dual-stage filter: extracting the top three matches (Top-K = 3) and applying a similarity threshold (τ = 0.60). Fragments scoring below this threshold trigger a fallback response ("insufficient evidence found"), prioritizing academic rigor over generative fluency.
To ensure the epistemic accuracy of the retrieved content, the system implements a conditional retrieval function R ( q , D ) . Unlike standard similarity searches, we define a validity function to filter ‘hallucination risks’. For a user query q and a document chunk d i in the knowledge base D , the retrieval score S ( q , d i ) is calculated as (Equation (2)).
$$S(q, d_i) = \begin{cases} \mathrm{sim}(q, d_i), & \text{if } \mathrm{sim}(q, d_i) \geq \tau \\ 0, & \text{if } \mathrm{sim}(q, d_i) < \tau \end{cases}$$
where sim(q, d_i) denotes the cosine similarity between the embedding vectors of the query and the chunk, and τ represents the predefined similarity threshold (set to 0.60 in this study). The final context input C for the LLM comprises the concatenation of the top-k chunks where S(q, d_i) > 0.
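Expressed in code, Equations (1) and (2) reduce to a threshold-and-rank step. The NumPy sketch below assumes precomputed embedding vectors and mirrors the Top-K = 3, τ = 0.60 configuration:

import numpy as np

TAU, TOP_K = 0.60, 3
FALLBACK = "insufficient evidence found"

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    # Equation (1): cosine similarity between query and chunk vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec, chunk_vecs, chunk_texts):
    """Return the top-k chunks with S(q, d_i) > 0, or the fallback string."""
    scores = [cosine_sim(query_vec, v) for v in chunk_vecs]
    # Equation (2): scores below tau are zeroed out (i.e., discarded)
    valid = [(s, t) for s, t in zip(scores, chunk_texts) if s >= TAU]
    if not valid:
        return FALLBACK
    valid.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in valid[:TOP_K]]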
Retrieved context drives the LiveTalking framework via a high-quality Text-to-Speech (TTS) engine, synchronizing speech with animation [49,50,51]. The system operates on a local server equipped with dual NVIDIA RTX 4090 GPUs (24 GB VRAM). Qwen3-30B was deployed using 4-bit quantization (AWQ) via Ollama to balance memory efficiency with reasoning performance, while Sentence-Transformers and FAISS managed vector indexing. To address inference latency, the system employs a “latency masking” strategy: non-verbal behaviors (nodding, gaze shifts) during processing maintain social presence and interaction continuity.
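The latency-masking strategy can be sketched as two concurrent tasks: one looping the idle animation, one awaiting the slow retrieval/generation call. The function names here (play_idle_gesture, generate_answer) are placeholders rather than system APIs:

import asyncio

async def play_idle_gesture():
    """Loop non-verbal 'thinking' cues (nods, gaze shifts) on the avatar."""
    while True:
        await asyncio.sleep(1.0)  # placeholder for triggering an animation clip

async def respond_with_masking(query: str, generate_answer) -> str:
    """Keep social presence alive while the ~5 s retrieval/generation runs."""
    idle = asyncio.create_task(play_idle_gesture())
    try:
        return await generate_answer(query)  # the RAG + TTS/video pipeline
    finally:
        idle.cancel()  # stop the thinking animation once the answer is ready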
Additionally, the system supports automated lecturing: the RAG module segments content into scripts which the digital human renders. An interactive interface allows self-directed learning control. Figure 2 illustrates the architecture; implementation details are in Appendix A.

3.1.4. Pedagogical Framework and Curriculum Integration

The efficacy of the RAG-enhanced digital human extends beyond technical architecture, relying on its integration into a pedagogical framework tailored to the Modern World Design History curriculum. Grounded in social constructivism and Vygotsky’s Zone of Proximal Development (ZPD) [52], the system functions not merely as an information retrieval tool but as a “More Knowledgeable Other” (MKO) delivering personalized scaffolding [53].
Drawing on Cognitive Load Theory and Mayer’s Cognitive Theory of Multimedia Learning (CTML) [54], the intervention addresses the high extraneous load novice students face when navigating dense historical archives. Specifically, the system design adheres to the Modality Principle, which posits that learning is enhanced when verbal information is presented as speech rather than on-screen text. By delivering historical narratives via the digital human’s auditory channel, the system offloads the visual channel, allowing students to focus their visual attention on design artifacts and the agent’s non-verbal scaffolding cues. This multimodal distribution prevents the “Split-Attention Effect” often observed in text-heavy interfaces. By offloading retrieval tasks to the RAG architecture, the system preserves cognitive resources for Germane Load—facilitating schema construction and conceptual synthesis. This shifts cognitive focus from the logistics of search to the logic of analysis. Concurrently, the digital human interface enhances behavioral engagement through conversational delivery and social presence, optimizing content structuring for cognitive efficiency.
Operationally, the system serves as a supplementary intelligent assistant, complementing rather than supplanting instruction. It supports personalized Q&A, interactive case studies, and self-paced review. This symbiotic integration leverages RAG for epistemic precision and digital humans for immersive interaction, empowering students while allowing instructors to prioritize critical discourse and higher-order cognitive development.

3.2. Experimental Design and Implementation

3.2.1. Experimental Subjects and Setting

The study enrolled 150 third-year environmental design undergraduates from three universities in Dalian, China. To ensure cross-institutional validity, two distinct sections of the "Modern World Design History" course were selected from each institution, totaling six classes (n = 25 per class). To minimize instructor bias, a cluster randomization design assigned one class per university to the experimental group (n = 75) and the other to the control group (n = 75).
An a priori power analysis was conducted using G*Power 3.1 to determine the requisite sample size. Assuming a medium effect size ( d = 0.5 ), an alpha level of 0.05, and a power of 0.80 for a two-tailed independent t-test, the calculation indicated a minimum total sample of 128 participants. Therefore, the recruited sample of 150 ensures adequate statistical power.
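The same calculation can be reproduced in Python; this statsmodels call mirrors the G*Power parameters and yields approximately 64 participants per group (128 in total):

import math
from statsmodels.stats.power import TTestIndPower

# d = 0.5, alpha = 0.05, power = 0.80, two-tailed independent t-test
n_per_group = TTestIndPower().solve_power(
    effect_size=0.5, alpha=0.05, power=0.80, alternative="two-sided")
print(math.ceil(n_per_group) * 2)  # -> 128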
Cluster randomization was performed by an independent research assistant using a computer-generated random number sequence to prevent allocation bias. While the nature of the intervention (AI avatar) precluded blinding participants to their condition, a single-blind protocol was implemented for assessment: all subjective design artifacts and essay responses were graded by two external faculty members who were blinded to the group allocations. Inter-rater reliability was high (Cohen's κ = 0.88).
All participants possessed comparable academic backgrounds and had completed identical prerequisite coursework. This specific curriculum, characterized by dense historical narratives and complex visual analysis, was selected as an optimal context for evaluating RAG–digital human integration. The intervention occurred in multimedia classrooms equipped with high-performance workstations and robust network infrastructure, ensuring low-latency interaction with the intelligent system via personal devices. Demographic characteristics and baseline equivalence are shown in Table 4.

3.2.2. Group Assignment and Control

To ensure methodological rigor, the study maintained strict condition separation. The experimental group utilized the RAG-enhanced digital human system for inquiry-based learning, engaging in real-time knowledge retrieval, case analysis, and personalized feedback exercises. Conversely, the control group received traditional didactic instruction covering identical curriculum content via standard lectures and printed resources, without access to the AI-augmented platform.

3.2.3. Experimental Procedure and Data Collection

A one-week preparatory phase preceded the formal study, encompassing participant training, system calibration, and content standardization to ensure procedural fidelity.
The five-week intervention featured biweekly sessions covering core Modern World Design History modules. The experimental cohort engaged with the RAG-enhanced digital human for dynamic content interaction, while the control group received traditional instruction. Immediate feedback on system interaction was collected via post-session questionnaires, and all classroom activities were video-recorded for behavioral analysis.
Performance data, including unit test scores and participation metrics, were recorded after each session. The experimental group also engaged in periodic semi-structured interviews to provide qualitative insights into the intelligent assistance.
Strict data privacy protocols were enforced throughout the study. Given the sensitive nature of classroom recordings and query logs, all digital data were stored on local machines to ensure data sovereignty. Personal identifiers were replaced with alphanumeric codes prior to analysis. Access to raw video footage was restricted to the principal investigators, and all multimedia files were scheduled for permanent deletion six months post-analysis in compliance with institutional data retention policies.
Following the experiment, quantitative assessment included cumulative performance metrics and the Paas Mental Effort Rating Scale [55]. To empirically assess cognitive load, students rated their mental effort immediately post-test on a validated 9-point Likert scale (1 = ‘Very, very low’ to 9 = ‘Very, very high’). Under Cognitive Load Theory, this subjective rating serves as a robust proxy for the total cognitive load experienced during the learning task.

3.3. Data Analysis Methods

Statistical analyses were conducted using SPSS (Version 27). The Shapiro–Wilk test verified the normality of pre-test scores, post-test outcomes, and engagement metrics. Baseline equivalence was assessed via independent samples t-tests, calculated as (Equation (3)).
$$t = \frac{M_1 - M_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
where M, s, and n represent the mean, standard deviation, and sample size, respectively. A contingency was established to utilize Analysis of Covariance (ANCOVA) should significant baseline heterogeneity be detected.
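As a sketch, this baseline check maps directly onto SciPy; equal_var=False matches the unpooled-variance form of Equation (3):

from scipy import stats

def baseline_equivalence(pre_exp, pre_ctrl, alpha=0.05):
    """Independent samples t-test on pre-test scores (Equation (3))."""
    t_stat, p_value = stats.ttest_ind(pre_exp, pre_ctrl, equal_var=False)
    return t_stat, p_value, p_value >= alpha  # True -> baselines comparable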
Post-test scores served as the primary outcome, providing a standardized metric of conceptual mastery in modern design history. While not exhaustive, such assessments offer a robust benchmark for evaluating intervention efficacy [56]. Classroom participation, the secondary outcome, was quantified via a composite index combining query frequency and interaction duration, thereby operationalizing the link between active engagement and learning depth.
Comparative analyses employed independent samples t-tests for outcome variables. Pearson correlation examined the association between participation and performance. Additionally, multiple linear regression—controlling for pre-test variance—was conducted to isolate the specific predictive effect of the RAG intervention. Effect sizes for significant comparisons were reported using Cohen’s d, as (Equation (4)).
$$d = \frac{M_1 - M_2}{SD_{\mathrm{pooled}}}$$
Statistical significance was set at α = 0.05. To control for the Family Wise Error Rate (FWER) across multiple secondary outcome comparisons, the Holm-Bonferroni correction was applied where appropriate. Confidence Intervals (95% CI) were calculated for all mean differences to report effect precision alongside significance levels.
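The effect-size and correction steps translate directly into code; the p-values fed to the Holm procedure below are placeholders, not study results:

import numpy as np
from statsmodels.stats.multitest import multipletests

def cohens_d(x: np.ndarray, y: np.ndarray) -> float:
    """Equation (4): mean difference scaled by the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_sd = np.sqrt(((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1))
                        / (nx + ny - 2))
    return (x.mean() - y.mean()) / pooled_sd

# Holm-Bonferroni over the secondary-outcome p-values (placeholder values)
reject, p_adjusted, _, _ = multipletests([0.003, 0.020, 0.041],
                                         alpha=0.05, method="holm")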
To evaluate the integrated relationship between learning performance and mental effort, we calculated the Instructional Efficiency ( E ) metric using the formula proposed by Paas. This metric quantifies the performance return on cognitive investment as (Equation (5)).
$$E = \frac{Z_{\mathrm{Performance}} - Z_{\mathrm{Effort}}}{\sqrt{2}}$$
where Z_Performance represents the standardized z-score of the post-test results, and Z_Effort represents the standardized z-score of the Paas Mental Effort Rating. A positive E value indicates higher instructional efficiency (high performance with low mental effort), while a negative value suggests lower efficiency (high effort yielding low performance). This composite metric provides a more rigorous validation of the intervention's efficacy than analyzing scores or effort in isolation.
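Equation (5) likewise reduces to a few lines of NumPy, with standardization performed across the pooled sample as in Paas's procedure:

import numpy as np

def instructional_efficiency(scores: np.ndarray, effort: np.ndarray) -> np.ndarray:
    """Per-participant E; positive values indicate efficient instruction."""
    z_perf = (scores - scores.mean()) / scores.std(ddof=1)
    z_effort = (effort - effort.mean()) / effort.std(ddof=1)
    return (z_perf - z_effort) / np.sqrt(2)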

4. Results

4.1. Quantitative Analysis: Assessing Educational Impact (RQ2)

To answer research question 2, quantitative data were collected from 150 participants across three universities. The sample consisted of an experimental group ( n = 75 , utilizing RAG-enhanced digital human-assisted teaching) and a control group ( n = 75 , traditional teaching). Prior to conducting inferential statistical tests, normality assumptions for the dependent variables (pre-test scores, post-test scores, and classroom engagement scores) were assessed within each group using the Shapiro–Wilk test (Figure 3).
To empirically justify the selection of the similarity threshold ( τ = 0.60 ) for the RAG retrieval module, a post hoc sensitivity analysis was conducted using a validation set of 200 standard design history queries. We measured Precision (relevance of retrieved chunks) and Recall (coverage of required facts) across thresholds ranging from 0.40 to 0.80. As illustrated in the sensitivity analysis (Figure 4), lower thresholds ( τ < 0.50 ) maximized Recall but introduced significant noise, leading to hallucinated synthesis. Conversely, higher thresholds ( τ > 0.70 ) severely penalized Recall, causing the agent to miss critical context. The value of τ = 0.60 yielded the optimal F1-score (0.84), effectively balancing the trade-off between epistemic accuracy and content availability.
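The sweep itself can be sketched as follows; evaluate_at_threshold is a placeholder for re-running retrieval on the 200-query validation set at a given τ and scoring precision and recall against the gold annotations:

import numpy as np

def f1(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

def sweep_thresholds(evaluate_at_threshold, taus=np.arange(0.40, 0.85, 0.05)):
    """Return the best (tau, F1) pair plus the full curve; the study reports
    a peak F1 of 0.84 at tau = 0.60."""
    curve = []
    for tau in taus:
        p, r = evaluate_at_threshold(float(tau))  # precision, recall at tau
        curve.append((round(float(tau), 2), f1(p, r)))
    return max(curve, key=lambda pair: pair[1]), curve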
The results of the Shapiro–Wilk tests are summarized in Table 5.
As detailed in Table 5, Shapiro–Wilk tests yielded non-significant results (p > 0.05) across all variables, confirming the normality assumption for pre-test scores, post-test outcomes, and classroom engagement within both cohorts. Accordingly, descriptive statistics for the experimental and control groups are summarized in Table 6.
As shown in Table 6, the experimental group demonstrated higher mean scores in both post-test performance and classroom participation compared to the control group.
Specifically, the experimental group (M = 4.32, SD = 1.65), which used the RAG-enhanced digital human, reported significantly lower mental effort during the post-test assessment than the control group (M = 7.15, SD = 1.78), as confirmed by an independent samples t-test (t(148) = 10.09, p < 0.001, d = 1.65). This finding provides direct empirical support for the hypothesis that the intelligent system effectively reduces the extraneous cognitive load experienced by students during the learning task.
To verify baseline equivalence, an independent samples t-test was conducted on pre-test scores (Table 7).
As shown in Table 7, the independent samples t-test for pre-test scores indicated no statistically significant difference. This demonstrates that both groups had comparable foundational knowledge, satisfying the requirement of between-group equivalence.
To assess the impact of the RAG-enhanced digital human system, an independent samples t-test was conducted on post-test scores (Table 8).
The results in Table 8 indicate that the experimental group's post-test scores were significantly higher than those of the control group (t(148) = 7.01, p < 0.001). Cohen's d of 1.14 suggests a very large intervention effect. These findings strongly support the hypothesis that digital human-assisted teaching based on RAG technology significantly enhances learning outcomes. Additionally, a One-Way ANOVA confirmed no significant performance variance across the three participating universities (F(2, 72) = 1.15, p = 0.32), validating that the intervention's efficacy was consistent across diverse institutional settings.
To examine the impact on classroom engagement, an independent samples t-test was conducted (Table 9).
Table 9 shows that the experimental group exhibited significantly higher classroom engagement (t(148) = 8.54, p < 0.001). This composite metric, derived from both system interaction logs (query frequency) and video-based behavioral coding (duration of focused attention), confirms that the RAG-based digital human system effectively stimulates student interest.
To examine the relationship between classroom engagement and learning outcomes, a Pearson correlation analysis was conducted (Table 10).
Furthermore, to quantify the combined effect of performance gains and cognitive cost, we calculated the Instructional Efficiency (E) for each participant using Equation (5). An independent samples t-test revealed a statistically significant difference in efficiency (t(148) = 12.35, p < 0.001). The experimental group demonstrated high positive efficiency (M = +0.72, SD = 0.55), whereas the control group exhibited negative efficiency (M = −0.68, SD = 0.62). This substantial divergence (d = 2.39) empirically verifies that the system successfully converts reduced extraneous load into germane learning outcomes (Figure 5).
To further assess the predictive effects of group assignment (experimental vs. control) and pre-test scores on post-test performance, a multiple linear regression analysis was conducted. Post-test scores served as the dependent variable, while group assignment (coded as exp = 1, ctrl = 0) and pre-test scores were the independent variables. The regression results are presented in Table 11.
The regression model (Table 11) accounts for a substantial proportion of variance (R² = 0.68). Consistent with educational literature, prior knowledge (Pre-test Score) emerged as the dominant predictor (β = 0.582, p < 0.001), reflecting the stability of baseline proficiency. However, even after controlling for this covariate, group assignment remained a statistically significant predictor (β = 0.465, p < 0.001). This demonstrates that while prior knowledge is a primary performance driver, the RAG intervention yielded a distinct, measurable incremental gain independent of initial aptitude.
Beyond student outcomes, system log analysis of 2250 queries validated the RAG architecture’s technical efficacy. Application of the similarity threshold ( τ = 0.60 ) yielded a 92.4% retrieval acceptance rate, while the 7.6% fallback activation correctly filtered irrelevant inquiries (confirmed via manual review). Crucially, expert evaluation of 500 sampled responses demonstrated a 96.2% factual accuracy rate with a hallucination rate of only 3.8%—significantly outperforming standard LLM baselines—thereby confirming that the Qwen3 and BGE-Large-Zh configuration effectively grounded generation in the curriculum database.
Regarding system responsiveness, Total Turn-Around Time (TTAT) was measured over 200 interaction cycles. Latency was defined as the interval between the cessation of student speech and the onset of the agent’s response. On the dual-GPU configuration, average TTAT for complex queries was 5.2 ± 1.1 s. The latency budget comprised: Automatic Speech Recognition (ASR) (0.6 s), RAG Retrieval/Reranking (0.8 s), LLM Time-to-First-Token (1.5 s), and TTS/Video Synthesis (2.3 s). While exceeding sub-second conversational norms, 4-bit quantization ensured viable generation speeds (Figure 6).
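For reference, the reported stage timings reconstruct the mean TTAT exactly:

# Latency budget taken verbatim from the measurements above (seconds)
LATENCY_BUDGET = {
    "ASR": 0.6,
    "RAG retrieval/reranking": 0.8,
    "LLM time-to-first-token": 1.5,
    "TTS + video synthesis": 2.3,
}
assert abs(sum(LATENCY_BUDGET.values()) - 5.2) < 1e-9  # mean TTAT = 5.2 s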
Critically, the “idle” animation state during this 5 s interval was designed to emulate a human tutor’s “cognitive processing” pause. This intentional latency design was key to managing student expectations.

4.2. Qualitative Analysis: Unpacking the Mechanism of Support (RQ3)

While quantitative metrics reveal the magnitude of improvement, they do not explain the underlying pedagogical mechanics. To address research question 3, semi-structured interviews were conducted with a stratified random sample of 30 users of the RAG-enhanced digital human system, drawn from the experimental group (10 students from each of the three participating universities). Using NVivo 12 software, a thematic analysis was performed on the interview transcripts. The coding process reached data saturation, revealing four distinct themes regarding the system's utility, engagement, psychological impact, and technical limitations (Figure 7).
Theme 1: Cognitive Load Reduction through Efficient Retrieval
Consistent across all three campuses, 90% of interviewees cited the efficiency of information retrieval as the system’s primary benefit. Students reported that the RAG capability acted as an effective filter, reducing the extraneous cognitive load associated with sifting through dense design history textbooks.
“Instead of flipping through pages for ages to find the definition of ‘Art Nouveau,’ I got a concise, context-aware answer instantly. It saved my brain power for actually understanding the concept rather than just searching for it.”
(Student C, University A)
Theme 2: Enhanced Engagement via Social Presence and Novelty
Participants described the digital human as having a stronger “social presence” than static text. The conversational interface transformed passive reading into active dialog. However, some students acknowledged a “Novelty Effect,” noting that the excitement of interacting with an AI avatar drove their initial engagement.
“It definitely made studying less boring. Having the avatar ‘look’ at you and respond felt like a real tutoring session. It pushed me to ask more questions just to see how it would react.”
(Student F, University B)
Theme 3: Psychological Safety and Personalized Scaffolding
A critical finding was the reduction in learning inhibition. Students felt psychologically safe to ask “basic” or “repetitive” questions to the digital human that they would feel embarrassed asking a professor. This aligned with the ZPD, allowing self-paced remediation.
“In class, I’m afraid of looking stupid if I ask about a simple term. With the system, I could ask the same definition three times until I truly got it. It judged my questions, but it didn’t judge me.”
(Student D, University C)
Theme 4: Pedagogical and Technical Limitations
Despite the positive reception, limitations were noted regarding the depth of reasoning and technical latency. Students distinguished between "factual retrieval" (where the AI excelled) and "interpretive critique" (where it lacked nuance). Additionally, the latency (discussed in Section 4.1) was occasionally cited as a disruption to the conversational flow.
“It’s great for facts, but less for interpretation. If I asked ‘Why does this design feel sad?’, the answer was a bit generic. Also, sometimes the 5-s thinking pause made me wonder if it crashed.”
(Student H, University A)
Table 12 summarizes the coding structure and frequency.
In synthesis, the qualitative evidence corroborates the quantitative findings, highlighting the system’s efficacy in streamlining information retrieval, fostering behavioral engagement, and providing personalized scaffolding. Crucially, however, participants conceptualized the technology as a complementary pedagogical tool rather than a substitute for human instruction—specifically regarding tasks that demand interpretive nuance and interpersonal connection. These subjective perspectives provide essential explanatory context for the significant statistical improvements observed in learning outcomes and engagement within the experimental group.

5. Discussion

5.1. Interpreting the Findings Through Theoretical Lenses

The observed enhancement in learning outcomes aligns with Constructivist Learning Theory [54]. A prevalent critique of generative AI in education posits that immediate information retrieval might function as a “behaviorist shortcut,” potentially circumventing the “productive struggle” essential for deep knowledge construction. However, our findings indicate that the RAG-digital human system facilitated a process of “scaffolded inquiry” rather than passive consumption.
Interaction logs reveal that students engaged in iterative dialectic processes—posing initial queries, evaluating the agent's output, and refining follow-up inquiries (e.g., exploring thematic connections to Art Deco). This iterative cycle mirrors hypothesis testing, a fundamental constructivist mechanism wherein learners actively recalibrate mental models based on feedback. By delivering "just-in-time" contextual knowledge (via Top-k chunk injection), the system enabled students to bridge the gap between current competency and complex historical narratives without cognitive overload, effectively operationalizing the ZPD within a digital modality [5].
The efficacy of this scaffolding is empirically substantiated by the Instructional Efficiency ( E ) analysis calculated in this study. The pronounced divergence in efficiency scores ( d = 2.39 ) between the experimental group ( M = + 0.72 ) and the control group ( M = 0.68 ) provides a quantitative physiological basis for the observed learning gains. The negative efficiency observed in the control group suggests a “high-effort, low-performance” state, likely driven by the high Extraneous Cognitive Load inherent in traditional manual information retrieval. Conversely, the high positive efficiency of the experimental group confirms that the RAG system successfully liberated working memory resources from these low-level retrieval tasks. This reallocation allowed students to direct their cognitive effort toward germane load—the schematic construction and conceptual synthesis required for deep learning—thereby validating Paas’s framework in an AI-integrated context.
While the RAG architecture optimized cognitive resource allocation, it introduced an inherent interaction latency (~5 s), primarily attributable to the rendering overhead of the LiveTalking module. From the standpoint of Cognitive Load Theory, such temporal gaps risk disrupting the “flow” state and reintroducing extraneous load. However, within the specific pedagogical context of design education—which prioritizes reflective synthesis over rapid-fire recall—observational data suggest that this latency was behaviorally tolerated. Critically, the implementation of “latency masking” strategies (e.g., nodding or consulting virtual notes) reframed this delay as “thinking time.” Research in human-agent interaction suggests that such “idle” behaviors function as conversational fillers, significantly mitigating the negative impact of latency on user satisfaction [57]. Unlike a static loading spinner, which signals a system state, the digital human’s thoughtful gaze signals a social state of “active processing.” This anthropomorphic signaling maintains the suspension of disbelief, effectively convincing students that the delay is a necessary period of cognitive exertion by the tutor, thereby preserving trust and patience during the retrieval interval. Interview data corroborated this, with students perceiving the delays as evidence that the agent was providing “thoughtful” responses rather than robotic outputs, a finding consistent with established expectations in academic mentorship interactions [30].

5.2. Theoretical and Empirical Contributions

This study bridges a critical gap regarding the integrated implementation and empirical validation of RAG and digital humans in design education. While prior scholarship has explored RAG’s utility in domain-specific Q&A and personalized learning [23,24,27], our results provide direct empirical evidence of its efficacy within Design History—a discipline characterized by visual, conceptual, and historical complexity. The integration of Qwen3 and BGE-Large-Zh into the RAG pipeline offers a robust mechanism to mitigate common Large Language Model limitations, such as factual hallucinations, by anchoring generative outputs in verifiable external sources.
In contrast to earlier digital human applications that prioritized immersion or scripted interactions [12,13], this system leverages RAG to enable real-time knowledge retrieval and synthesis, thereby transcending the static constraints of conventional virtual tutors. The observed increase in student engagement aligns with previous research on virtual agents but extends these findings into a knowledge-intensive, AI-augmented context. Furthermore, the local deployment architecture (via Ollama and LiveTalking) effectively addresses institutional concerns regarding data privacy and security.
Finally, the strong correlation between engagement and academic outcomes reinforces established educational theory [53]. Regression analysis confirmed that the intervention’s positive impact on learning persisted even after controlling for prior knowledge, suggesting the system confers cognitive benefits distinct from purely motivational effects. These findings offer robust empirical support for the synergistic integration of RAG and digital humans—an area previously underexplored in educational scholarship.

5.3. Practical Strategies for Using RAG-Enabled Digital Humans in Education

These findings offer salient implications for design educators and institutions evaluating AI-enabled pedagogies. RAG architectures demonstrate significant utility in constructing discipline-specific knowledge bases that are both dynamically responsive and epistemically precise [58]. Institutions are encouraged to deploy RAG-enhanced platforms leveraging advanced Large Language Models and domain-specific embeddings to facilitate adaptive instructional scaffolding. Concurrently, digital human interfaces function as effective drivers of learner engagement, showing particular promise in interactive case studies, automated inquiry systems, and personalized feedback loops. Crucially, however, these technologies must be positioned to augment rather than supplant human instruction, preserving the instructor’s essential role in fostering interpretive nuance, emotional resonance, and higher-order critical thinking.
Successful integration requires rigorous alignment with explicit pedagogical objectives, as evidenced by the synchronization between system capabilities and the Design History curriculum in this study. Future adoption strategies should prioritize instructional design aligned with learning outcomes over technology-driven determinism. To facilitate this, sustained professional development in AI literacy is essential to enhance faculty implementation capacity. Finally, ethical governance—encompassing bias mitigation, algorithmic fairness, and data privacy—must remain a central priority throughout the design and deployment lifecycle.

5.4. Research Limitations

While the experimental setup utilized high-end dual NVIDIA RTX 4090 GPUs to ensure zero-compromise inference speeds for research validation, we acknowledge that this hardware footprint presents a barrier for widespread adoption in resource-constrained schools. To address this, practical deployment can follow a “Thin-Client” architecture, where a single centralized GPU server supports multiple low-end classroom terminals via a local network, significantly amortizing the hardware cost per student. Furthermore, our use of 4-bit quantization (AWQ) demonstrates that the Qwen3-30B model can theoretically run on consumer-grade hardware (e.g., RTX 4060 with aggressive offloading) with only marginal increases in latency. Future iterations will explore Small Language Models (SLMs) specifically distilled for design history, potentially enabling edge deployment on standard faculty laptops.
The experimental design necessitates a cautious interpretation regarding the potential influence of the “Novelty Effect.” Consistent with the Media Equation, users frequently exhibit heightened engagement with emerging technologies attributable to their unprecedented nature rather than their intrinsic utility. Given the five-week duration of this intervention, it is plausible that the observed engagement metrics were partially driven by the initial excitement of interacting with an AI avatar. Future longitudinal studies are required to determine whether these engagement levels persist once the technology becomes familiar.
A significant methodological limitation lies in the conflation of the delivery medium (Digital Human) with the instructional mechanism (RAG-based Retrieval). The current binary comparison (AI Avatar vs. Traditional Instruction) precludes the isolation of the anthropomorphic interface’s specific contribution distinct from the efficiency of the intelligent search algorithm. Consequently, it remains unclear whether the benefits stem primarily from social presence or improved information access. Future research should employ a factorial design, incorporating a “Text-Only RAG” control arm, to decouple the pedagogical effects of the interface from the underlying retrieval logic.
Furthermore, the reliance on standardized post-tests as the primary outcome measure may not fully capture the multidimensional nature of design competency. While adequate for assessing historical knowledge, this metric may lack construct validity for evaluating practical skills, creative problem-solving, and design thinking. The absence of diverse assessment modalities—such as design critiques, portfolio reviews, or peer evaluations—limits the study’s ability to provide a comprehensive evaluation of holistic learning.
Finally, the study’s contextual specificity—conducted within a Design History curriculum at a specific geographic locale—may limit transferability to studio-based design disciplines or diverse educational settings. Future investigations should expand the scope to varying subdisciplines and demographic populations to establish broader generalizability.

6. Conclusions

This multi-center randomized controlled trial ( N = 150 ) provides robust empirical evidence supporting the integration of RAG with digital human interfaces in design pedagogy, offering clear answers to our research questions and validating the proposed hypotheses.
First, regarding H1 (Learning Achievement), the results confirm that the RAG-enhanced system significantly improves academic performance, as evidenced by the substantial effect size in post-test scores (Cohen's d = 1.14) favoring the experimental group. Second, H3 (Engagement) is fully supported, with behavioral engagement metrics (d = 1.39) demonstrating that the anthropomorphic interface successfully sustained student interest. Third, validating H2 (Cognitive Load), the Paas Mental Effort Rating Scale revealed a significant reduction in extraneous cognitive load (d = 1.71) for the experimental group. This confirms that the system effectively offloads information retrieval tasks to the RAG architecture, thereby optimizing Instructional Efficiency (E) and freeing cognitive resources for higher-order schema construction and synthesis.
Theoretically, these findings validate a synergistic model wherein the RAG component functions as an epistemic scaffold—ensuring content accuracy and depth—while the anthropomorphic interface provides the social presence necessary to sustain learner motivation. This dual mechanism effectively operationalizes Social Constructivist principles within an AI-driven environment, bridging the gap between static knowledge bases and interactive, personalized learning.
From a practical perspective, this study confirms the viability of locally deployed architectures (utilizing quantized models like Qwen3-Int4) to deliver high-performance tutoring while adhering to strict data privacy standards. However, we acknowledge that the observed efficacy may be partially influenced by the “Novelty Effect” or the inherent efficiency of the search mechanism. Consequently, future scholarship should employ factorial designs (e.g., distinguishing “Text-Only RAG” from “Avatar RAG”) to decouple the specific pedagogical contributions of anthropomorphism from intelligent retrieval. Longitudinal inquiry is further required to assess whether the observed reduction in cognitive load translates into the sustained, long-term retention of complex design history concepts.

Author Contributions

Conceptualization, X.Z. and S.Z.; methodology, X.Z. and S.Z.; software, X.Z.; validation, P.W.; formal analysis, X.Z. and S.Z.; investigation, X.Z., S.Z. and Y.C.; resources, P.W. and Y.C.; data curation, P.W. and Y.C.; writing—original draft preparation, X.Z., S.Z. and P.W.; writing—review and editing, X.Z., S.Z., P.W. and Y.C.; visualization, P.W.; supervision, P.W.; project administration, X.Z. and S.Z.; funding acquisition, X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the “2025 Planning Research Topics of the Association of Employment for University Graduates (China)” under Grant GJXY2025N057; “2024 University Basic Research of Liaoning Provincial Department of Education (China)” under Grant number LJ132413599003; “Liaoning Cultural and Creative Industry Collaborative Innovation Research Center (China)” under Grant number WH2024002.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of Dalian Art College (Protocol Code DAC-2024-07-18, approved on 18 July 2024) for studies involving human subjects.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The datasets and models used in this study are not publicly available due to institutional confidentiality protocols. However, requests for access to relevant materials may be directed to the Landscape Teaching and Research Office at Dalian Art College (E-mail: DACdesign319@163.com).

Acknowledgments

We would like to express our heartfelt gratitude to the faculty and students of Dalian Art College, Dalian University of Technology, Dalian Neusoft University of Information, and Daegu University for their support and contributions to this research. Their assistance was invaluable in the completion of this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A
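The listing below is the prototype Streamlit application that wires together the components described in the main text: BGE-Large-Zh embeddings with a FAISS index for retrieval, the OpenWebUI-hosted LLM for generation, optional Stable Diffusion image illustration, and the LiveTalking video stream for the digital human.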

import json

import faiss
import numpy as np
import requests
import streamlit as st
from sentence_transformers import SentenceTransformer

# --- Configuration ---
# Names/paths of the key models and data artifacts.
EMBEDDING_MODEL_NAME = "BAAI/bge-large-zh"  # Embedding model
KNOWLEDGE_INDEX_PATH = "design_knowledge_index.faiss"  # FAISS index
KNOWLEDGE_EMBEDDINGS_PATH = "design_knowledge_embeddings.npy"  # Knowledge-base embeddings
KNOWLEDGE_TEXTS_PATH = "design_knowledge_texts.npy"  # Knowledge-base texts

OPENWEBUI_API_URL = "http://localhost:8080/api/v1/chat"  # OpenWebUI API endpoint
OPENWEBUI_MODEL_NAME = "qwen"  # Language-model name (must match OpenWebUI)
STABLE_DIFFUSION_API_URL = "http://localhost:7860/sdapi/v1/txt2img"  # Stable Diffusion API (optional image generation)
LIVETALKING_STREAM_URL = "http://localhost:8000/video_feed"  # LiveTalking video stream

# --- Example COURSE_OUTLINE (abridged; extend with the full syllabus) ---
COURSE_OUTLINE = [
    "Introduction to Modern World Design",
    "The Bauhaus Movement",
]

# --- Load Models and Data ---
@st.cache_resource  # Cache so the models are loaded only once per session
def load_models():
    """Loads the embedding model, FAISS index, and knowledge base."""
    embedding_model = SentenceTransformer(EMBEDDING_MODEL_NAME)
    index = faiss.read_index(KNOWLEDGE_INDEX_PATH)
    knowledge_base = np.load(KNOWLEDGE_EMBEDDINGS_PATH)
    knowledge_texts = np.load(KNOWLEDGE_TEXTS_PATH)
    return embedding_model, index, knowledge_base, knowledge_texts

embedding_model, index, knowledge_base, knowledge_texts = load_models()

# --- Session-State Initialization (before any widget reads the state) ---
if "lecture_mode" not in st.session_state:
    st.session_state.lecture_mode = False
if "lecture_index" not in st.session_state:
    st.session_state.lecture_index = 0
if "lecture_paused" not in st.session_state:
    st.session_state.lecture_paused = False

# --- Functions ---
def rag_generate_with_openwebui(prompt):
    """Sends a prompt to the locally hosted LLM through the OpenWebUI API."""
    headers = {"Content-Type": "application/json"}
    data = {
        "model": OPENWEBUI_MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    try:
        response = requests.post(OPENWEBUI_API_URL, headers=headers,
                                 data=json.dumps(data), timeout=10)
        response.raise_for_status()  # Raise HTTPError for bad responses (4xx or 5xx)
        return response.json()["choices"][0]["message"]["content"]
    except requests.exceptions.RequestException as e:
        st.error(f"Failed to communicate with the language model: {e}")
        return None

def rag_pipeline(query):
    """Retrieval-Augmented Generation pipeline: retrieve context, then generate."""
    query_embedding = embedding_model.encode(query)  # Encode the query
    query_embedding = np.array([query_embedding]).astype("float32")  # FAISS expects float32
    D, I = index.search(query_embedding, k=3)  # Top-3 most relevant chunks
    context = [knowledge_texts[i] for i in I[0]]  # Look up the context texts
    prompt = ("Answer the question based on the following background knowledge:\n"
              f"{' '.join(context)}\nQuestion: {query}")
    return rag_generate_with_openwebui(prompt)

def generate_lecture_content(topic):
    """Generates lecture content for a given topic via the RAG pipeline."""
    return rag_pipeline(f"Explain the following design topic in detail: {topic}")

def generate_image(prompt):
    """Generates an illustrative image via Stable Diffusion (optional)."""
    headers = {"Content-Type": "application/json"}
    payload = {
        "prompt": prompt,
        "negative_prompt": "ugly, deformed, distorted",
        "steps": 20,
        "sampler_index": "DPM++ 2M Karras",
    }
    try:
        response = requests.post(STABLE_DIFFUSION_API_URL, headers=headers,
                                 data=json.dumps(payload), timeout=60)
        response.raise_for_status()
        images = response.json().get("images", [])
        if images:
            return f"data:image/png;base64,{images[0]}"  # First generated image
        return None
    except requests.exceptions.RequestException as e:
        st.error(f"Failed to generate image: {e}")
        return None

# --- Streamlit App ---
st.title("AI-Powered Design Teaching Assistant")

# --- Sidebar Controls ---
with st.sidebar:
    st.header("System Controls")
    if st.button("Start Automated Lecture"):
        st.session_state.lecture_mode = True
        st.session_state.lecture_index = 0
        st.session_state.lecture_paused = False
    if st.session_state.lecture_mode:
        col1, col2 = st.columns(2)
        if col1.button("Pause Lecture"):
            st.session_state.lecture_paused = True
        if col2.button("Resume Lecture"):
            st.session_state.lecture_paused = False
        col3, col4 = st.columns(2)
        if col3.button("Skip Topic"):
            st.session_state.lecture_index += 1
            st.session_state.lecture_paused = False
        if col4.button("Repeat Topic"):
            # Unpausing regenerates the current topic on the next rerun
            st.session_state.lecture_paused = False
        if st.session_state.lecture_index > 0 and st.button("Previous Topic"):
            st.session_state.lecture_index -= 1
            st.session_state.lecture_paused = False
        if st.button("End Lecture"):
            st.session_state.lecture_mode = False
            st.session_state.lecture_index = 0
            st.session_state.lecture_paused = False

# --- Main Interface ---
col_video, col_content = st.columns([0.7, 0.3])
with col_video:
    st.header("Digital Human Instructor")
    st.markdown(
        f'<iframe src="{LIVETALKING_STREAM_URL}" height="480" width="640" '
        'frameborder="0" scrolling="no"></iframe>',
        unsafe_allow_html=True,
    )
with col_content:
    st.header("Interaction & Feedback")
    query = st.text_input("Ask a question:")
    if st.button("Ask") and query:
        with st.spinner("Thinking…"):
            response = rag_pipeline(query)
            if response:
                st.write(f"**Instructor:** {response}")
    st.subheader("Provide Feedback")
    feedback = st.text_area("Your feedback:")
    if st.button("Submit Feedback") and feedback:
        # TODO: Implement feedback persistence (e.g., store in database, send via email)
        st.success("Feedback submitted!")

# --- Automated Lecture ---
st.header("Automated Lecture")
if st.session_state.lecture_mode:
    if st.session_state.lecture_index < len(COURSE_OUTLINE):
        current_topic = COURSE_OUTLINE[st.session_state.lecture_index]
        st.subheader(f"Current Topic: {current_topic}")
        if not st.session_state.lecture_paused:
            with st.spinner(f"Lecturing on: {current_topic}"):
                lecture_script = generate_lecture_content(current_topic)
                if lecture_script:
                    st.write(f"**Instructor:** {lecture_script}")
                    image_url = generate_image(f"design illustration for {current_topic}")
                    if image_url:
                        st.image(image_url, caption=f"Image related to {current_topic}",
                                 use_column_width=True)
            # Advance; the next topic is rendered on the following rerun
            st.session_state.lecture_index += 1
        else:
            st.info("Lecture Paused")
    else:
        st.success("Lecture Completed!")
        st.session_state.lecture_mode = False

References

  1. Cross, N. Design Thinking: Understanding How Designers Think and Work; Berg: Oxford, UK, 2011; pp. 1–234.
  2. Almufarreh, A.; Arshad, M. Promising emerging technologies for teaching and learning: Recent developments and future challenges. Sustainability 2023, 15, 6917.
  3. Laurillard, D. Teaching as a Design Science: Building Pedagogical Patterns for Learning and Technology; Routledge: London, UK, 2012; pp. 1–224.
  4. Waldrop, M.M. The science of teaching science. Nature 2015, 523, 272.
  5. Neshaei, S.P.; Tashkovska, M.; Mejia-Domenzain, P.; Wambsganss, T.; Käser, T. User-centric Reflective Writing Assistance: Leveraging RAG for Enhanced Personalized Support. In Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, Yokohama, Japan, 26 April–1 May 2025; pp. 1–8.
  6. Yu, Y.; Liang, M.; Yin, M.; Lu, K.; Du, J.; Xue, Z. Unsupervised Multimodal Graph Contrastive Semantic Anchor Space Dynamic Knowledge Distillation Network for Cross-Media Hash Retrieval. In Proceedings of the 2024 IEEE 40th International Conference on Data Engineering (ICDE), Utrecht, The Netherlands, 13–16 May 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 4699–4708.
  7. Kolb, D.A. Experiential Learning: Experience as the Source of Learning and Development, 2nd ed.; FT Press: Upper Saddle River, NJ, USA, 2015; pp. 1–400.
  8. Zhou, X.; Kim, S.; Chen, Y. Generative AI visual creativity system combined with knowledge retrieval. J. Comput. Methods Sci. Eng. 2025, 14727978251346065.
  9. Notarangelo, L.D.; Kim, M.S.; Walter, J.E.; Lee, Y.N. Human RAG mutations: Biochemistry and clinical implications. Nat. Rev. Immunol. 2016, 16, 234–246.
  10. Arslan, M.; Ghanem, H.; Munawar, S.; Cruz, C. A Survey on RAG with LLMs. Procedia Comput. Sci. 2024, 246, 3781–3790.
  11. Bailenson, J.N. Virtual interpersonal touch: Haptic interaction and social presence in immersive virtual environments. Presence Teleoper. Virtual Environ. 2006, 15, 586–603.
  12. Sung, E.C.; Han, D.I.D.; Bae, S.; Kwon, O. What drives technology-enhanced storytelling immersion? The role of digital humans. Comput. Hum. Behav. 2022, 132, 107246.
  13. Demirel, H.O.; Ahmed, S.; Duffy, V.G. Digital human modeling: A review and reappraisal of origins, present, and expected future methods for representing humans computationally. Int. J. Hum.–Comput. Interact. 2022, 38, 897–937.
  14. Mikropoulos, T.A.; Natsis, A. Educational virtual environments: A ten-year review of empirical research (1999–2009). Comput. Educ. 2011, 56, 769–780.
  15. Woodgate, D. Immersive spatial narratives as a framework for augmenting creativity in foresight-based learning systems. Horiz. Int. J. Learn. Futures 2019, 27, 57–71.
  16. Marinelli, A.; Iannacci, F.; Papile, F.; Diamanti, M.V.; Sponchioni, M.; Del Curto, B. Higher education classroom of the future: An EU-funded project integrating physical lectures, virtual reality, and artificial intelligence. In Proceedings of the 13th International Materials Education Symposium, Cambridge, UK, 4–5 April 2024; p. 38.
  17. Pedro, F.; Subosa, M.; Rivas, A.; Valverde, P. Artificial Intelligence in Education: Challenges and Opportunities for Sustainable Development; United Nations Educational, Scientific and Cultural Organization (UNESCO): Paris, France, 2019; pp. 1–59.
  18. Lang, Q.; Wang, M.; Yin, M.; Liang, S.; Song, W. Transforming education with generative AI (GAI): Key insights and future prospects. IEEE Trans. Learn. Technol. 2025, 18, 230–242.
  19. Filipović, A.M.; Fastić-Pajk, I.; Puljiz, H.; Šabić, I. AI-Driven Customization of Adaptive Learning Content: Lessons from Applying LLMs. In Proceedings of the Digital Transformation in Education and Artificial Intelligence Application: Third International Conference, MoStart 2025, Mostar, Bosnia and Herzegovina, 23–25 April 2025; Volume 202, p. 67.
  20. Ouyang, L.; Wu, J.; Jiang, X.; Almeida, D.; Wainwright, C.L.; Mishkin, P.; Lowe, R. Training language models to follow instructions with human feedback. Adv. Neural Inf. Process. Syst. 2022, 35, 27730–27744.
  21. Sadat, M.; Zhou, Z.; Lange, L.; Araki, J.; Gundroo, A.; Wang, B.; Menon, R.; Parvez, R.; Feng, Z. DelucionQA: Detecting hallucinations in domain-specific question answering. In Findings of the Association for Computational Linguistics: EMNLP 2023; Association for Computational Linguistics: Singapore, 2023; pp. 822–835.
  22. Izacard, G.; Lewis, P.; Grave, E.; Martinet, L.; Kessler, G.; Usunier, N.; Joulin, A. Distilling knowledge from reader to retriever for question answering. arXiv 2021, arXiv:2101.00408.
  23. Henkel, O.; Levonian, Z.; Li, C.; Postle, M. Retrieval-augmented generation to improve math question-answering: Trade-offs between groundedness and human preference. In Proceedings of the 17th International Conference on Educational Data Mining, Paris, France, 1–4 July 2024; pp. 315–320.
  24. Thway, M.; Recatala-Gomez, J.; Lim, F.S.; Hippalgaonkar, K.; Ng, L.W. Harnessing GenAI for Higher Education: A Study of a Retrieval Augmented Generation Chatbot’s Impact on Human Learning. arXiv 2024, arXiv:2406.07796.
  25. Chen, L.; Chen, P.; Lin, Z. Artificial intelligence in education: A review. IEEE Access 2020, 8, 75264–75278.
  26. Ali, S.; Fatima, F.; Hussain, J.; Qureshi, M.I.; Fatima, S.; Zahoor, A. Exploring Student’s Experiences and Problems in Online Teaching and Learning During COVID-19 and Improvement of Current LMS Through Human-Computer Interaction (HCI) Approaches. Int. J. Interact. Mob. Technol. 2023, 17, 4–21.
  27. Kahl, S.; Löffler, F.; Maciol, M.; Ridder, F.; Schmitz, M.; Spanagel, J.; Schilling, M. Enhancing AI Tutoring in Robotics Education: Evaluating the Effect of Retrieval-Augmented Generation and Fine-Tuning on Large Language Models; University of Münster: Münster, Germany, 2024.
  28. Holly, M.; Pirker, J.; Resch, S.; Brettschuh, S. Designing VR experiences–expectations for teaching and learning in VR. Educ. Technol. Soc. 2021, 24, 107–119.
  29. Dede, C. Immersive interfaces for engagement and learning. Science 2009, 323, 66–69.
  30. Selwyn, N. Should Robots Replace Teachers? AI and the Future of Education; Polity Press: Cambridge, UK, 2021; pp. 1–232.
  31. Zhou, X.; Kim, S.; Wang, Y.; Zhang, K. Beyond sparsity: An empirical study of structured collaboration in modular AI. Neurocomputing 2025, 657, 131616.
  32. Zuboff, S. The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power; PublicAffairs: New York, NY, USA, 2019; pp. 1–704.
  33. Yao, Y.; González-Vélez, H. AI-Powered System to Facilitate Personalized Adaptive Learning in Digital Transformation. Appl. Sci. 2025, 15, 4989.
  34. Elston, D.M. The novelty effect. J. Am. Acad. Dermatol. 2021, 85, 565–566.
  35. Lipku. LiveTalking. Available online: https://github.com/lipku/LiveTalking (accessed on 19 December 2023).
  36. Luccioni, A.S.; Akerman, G.; Vishnyakova, O.; Akimov, A.; Astakhov, D.; Obukhov, V. Open LLM leaderboard. Hugging Face. Available online: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard (accessed on 1 July 2024).
  37. Yang, A.; Li, A.; Yang, B.; Zhang, B.; Hui, B.; Zheng, B.; Yu, B.; Gao, C.; Huang, C.; Lv, C.; et al. Qwen3 technical report. arXiv 2025, arXiv:2505.09388.
  38. Grattafiori, A.; Dubey, A.; Jauhri, A.; Pandey, A.; Kadian, A.; Al-Dahle, A.; Letman, A.; Mathur, A.; Schelten, A.; Vaughan, A.; et al. The llama 3 herd of models. arXiv 2024, arXiv:2407.21783.
  39. Yang, M.; Diao, M.; Luo, J.; Shen, W.; Zhang, C. GLM-4 Based Method for Automatic Construction of Content Graph. IEEE Access 2025, 13, 197300–197311.
  40. Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space. arXiv 2013, arXiv:1301.3781.
  41. Guan, K.; Cao, Q.; Sun, Y.; Wang, X.; Song, R. BSharedRAG: Backbone shared retrieval-augmented generation for the E-commerce domain. arXiv 2024, arXiv:2409.20075.
  42. Tang, S.; Liu, A.T.; Xie, J.; Zhou, Z.; Dong, L.; Smith, N.A.; Wei, F. E5: A Self-supervised Framework for Aligning Embeddings across Languages. arXiv 2024, arXiv:2212.03533.
  43. GanymedeNil. Text2vec-Base-Chinese. Available online: https://huggingface.co/GanymedeNil/text2vec-base-chinese (accessed on 25 June 2024).
  44. Baek, J.; Hussain, A.; Liu, D.; Vincent, N.; Kim, L.H. Open WebUI: An Open, Extensible, and Usable Interface for AI Interaction. arXiv 2025, arXiv:2510.02546.
  45. Li, J.; Zhang, J.; Bai, X.; Zheng, J.; Zhou, J.; Gu, L. Er-nerf++: Efficient region-aware neural radiance fields for high-fidelity talking portrait synthesis. Inf. Fusion 2024, 110, 102456.
  46. Liang, C.; Wang, Q.; Chen, Y.; Tang, M. Wav2Lip-HR: Synthesising clear high-resolution talking head in the wild. Comput. Animat. Virtual Worlds 2024, 35, e2226.
  47. Dong, Z.; Liu, X.; Chen, B.; Polak, P.; Zhang, P. Musechat: A conversational music recommendation system for videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 12775–12785.
  48. An, L. Ultralight-Digital-Human. Available online: https://github.com/anliyuan/Ultralight-Digital-Human (accessed on 10 October 2024).
  49. Mandal, S.; Ghosh, B.; Chakraborty, S.; Naskar, R. Can Deepfakes Mimic Human Emotions? A Perspective on Synthesia Videos. In Proceedings of the TENCON 2024-2024 IEEE Region 10 Conference (TENCON), Singapore, 1–4 December 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 306–309.
  50. Contreras, I.; Hossfeld, S.; de Boer, K.; Wiedler, J.T.; Ghidinelli, M. Revolutionising faculty development and continuing medical education through AI-generated videos. J. CME 2024, 13, 2434322.
  51. Popov, V.; Vovk, I.; Gogoryan, V.; Sadekova, T.; Kudinov, M. Grad-tts: A diffusion probabilistic model for text-to-speech. In International Conference on Machine Learning; PMLR: New York, NY, USA, 2021; pp. 8599–8608.
  52. Ferguson, C.; van den Broek, E.L.; van Oostendorp, H. AI-induced guidance: Preserving the optimal zone of proximal development. Comput. Educ. Artif. Intell. 2022, 3, 100089.
  53. Waller, R.; Dahle-Huff, K. More Knowledgeable Others: Exploring Professional Development of Rural Reading Specialists. Educ. Res. Theory Pract. 2023, 34, 68–74.
  54. Mayer, R.E. The past, present, and future of the cognitive theory of multimedia learning. Educ. Psychol. Rev. 2024, 36, 8.
  55. Sweller, J.; van Merriënboer, J.J.; Paas, F. Cognitive architecture and instructional design: 20 years later. Educ. Psychol. Rev. 2019, 31, 261–292.
  56. Vygotsky, L.S. Mind in Society: The Development of Higher Psychological Processes; Harvard University Press: Cambridge, MA, USA, 1978; pp. 1–159.
  57. Pelikan, H.; Hofstetter, E. Managing Delays in Human-Robot Interaction. ACM Trans. Comput.-Hum. Interact. 2023, 30, 1–42.
  58. Radford, A.; Kim, J.W.; Xu, C.; McNeil, G.; Sridhar, A.; Sutskever, I.; Tieleman, T. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 8748–8763.
Figure 1. Digital Human Teacher Interface.
Figure 2. The system architecture.
Figure 3. Quantitative Analysis Framework.
Figure 4. Sensitivity Analysis of RAG Retrieval Threshold (τ).
Figure 5. Instructional Efficiency Matrix.
Figure 6. Latency Decomposition of the RAG-Digital Human Pipeline.
Figure 7. Qualitative Analysis Framework.
Table 1. Comparative Analysis of Open-Source Large Language Models [37,38,39].
Model   | MMLU  | IFEVAL | Resource Usage
Qwen3   | 81.38 | 82.3   | Medium
Llama-3 | 68.4  | 76.8   | Medium
GLM-4   | 86.5  | 87.6   | High
Table 2. Comparative Analysis of Embedding Models [41,42,43].
Model          | Embedding Dimensions | Average Recall Rate | Model Size
BGE-Large-Zh   | 1024 | High (80%+)   | Large (1.3 GB)
E5-Large-v2-CN | 1024 | High (78%+)   | Large (1.3 GB)
Ganymede Base  | 768  | Medium (70%+) | Medium (500 MB)
Table 3. Comparative Analysis of LiveTalking and Mainstream Digital Human Platforms.
Platform    | Realism                | Customization Level | Interactivity | Integration        | Cost
LiveTalking | High (model-dependent) | Very High           | Medium        | High (RTMP/WebRTC) | Low (open-source)
Synthesia   | Medium                 | Medium              | Low           | Medium (API)       | High (subscription)
Hour One    | Medium                 | Medium              | Low           | Medium (API)       | High (subscription)
Table 4. Demographic Characteristics and Baseline Equivalence.
Characteristic         | Experimental (n = 75) | Control (n = 75) | Test Statistic | p-Value
Age (Mean ± SD)        | 21.4 ± 1.2  | 21.2 ± 1.3  | t = 0.98  | 0.328
Gender (Female %)      | 48 (64.0%)  | 45 (60.0%)  | χ² = 0.26 | 0.612
Prior GPA (0–4.0)      | 3.42 ± 0.35 | 3.39 ± 0.38 | t = 0.51  | 0.614
Tech Familiarity (1–5) | 3.8 ± 0.7   | 3.7 ± 0.8   | t = 0.82  | 0.413
Note: Tech Familiarity was self-reported on a 5-point Likert scale. No significant differences were found (p > 0.05).
Table 5. Shapiro–Wilk Test Results for Normality.
Group | Variable             | n  | Shapiro–Wilk Statistic (W) | p-Value
Exp   | Pre-test Score       | 75 | 0.978 | 0.245
Ctrl  | Pre-test Score       | 75 | 0.982 | 0.355
Exp   | Post-test Score      | 75 | 0.975 | 0.158
Ctrl  | Post-test Score      | 75 | 0.969 | 0.062
Exp   | Classroom Engagement | 75 | 0.970 | 0.633
Ctrl  | Classroom Engagement | 75 | 0.961 | 0.430
Table 6. Descriptive Statistics for Experimental and Control Groups.
Indicator            | Group | n  | Mean (M) | Standard Deviation (SD)
Pre-test Score       | Exp   | 75 | 70.15 | 8.42
                     | Ctrl  | 75 | 69.88 | 7.95
Post-test Score      | Exp   | 75 | 86.42 | 6.85
                     | Ctrl  | 75 | 78.10 | 7.64
Classroom Engagement | Exp   | 75 | 15.84 | 3.41
                     | Ctrl  | 75 | 11.20 | 4.02
Mental Effort Rating | Exp   | 75 | 4.32  | 1.65
                     | Ctrl  | 75 | 7.15  | 1.78
Table 7. Independent Samples t-test Results of Pre-test Scores.
Indicator      | t     | df  | p     | Mean Difference | Standard Error
Pre-test Score | 0.201 | 148 | 0.841 | 0.27            | 2.353
Table 8. Independent Samples t-test Results for Post-test Scores.
Measure         | t    | df  | p      | Mean Diff. | 95% CI of Diff. | Standard Error | Cohen's d
Post-test Score | 7.01 | 148 | <0.001 | 8.32       | [6.01, 10.63]   | 1.18           | 1.14
Table 9. Independent Samples t-test Results for Classroom Engagement.
Measure              | t    | df  | p      | Mean Diff. | 95% CI of Diff. | Standard Error | Cohen's d
Classroom Engagement | 8.54 | 148 | <0.001 | 4.64       | [3.58, 5.70]    | 0.54           | 1.39
Table 10. Correlation Analysis Results between Classroom Engagement and Post-test Scores.
Measure              | Post-Test Score
Classroom Engagement | 0.587 **
** p < 0.01, N = 150.
Table 11. Multiple Linear Regression Analysis Results.
Model | Variables      | B      | SE    | β     | t    | p
1     | (Constant)     | 22.885 | 5.89  |       | 4.21 | <0.001
      | Group          | 6.073  | 0.77  | 0.465 | 7.82 | <0.001
      | Pre-test Score | 0.546  | 0.054 | 0.582 | 9.94 | <0.001
(R² = 0.68).
Table 12. Summary of Thematic Analysis (n = 30).
Primary Theme | Sub-Themes/Codes | Frequency | Representative Quote
1. Information Accessibility (Cognitive Load) | Rapid Retrieval; Contextual Relevance; Reduced Search Effort | 27 (90%) | “It acted like a smart index. I didn’t have to manage a dozen tabs; the relevant knowledge was just delivered to me.”
2. Engagement & Motivation (Social Presence) | Interactivity; Visual Appeal/Novelty; Anthropomorphism | 25 (83%) | “Having a ‘face’ to talk to changed the vibe. It felt like studying with a partner rather than alone.”
3. Personalized Support (Psychological Safety) | Low Inhibition/No Judgment; Self-Paced Review; Tailored Explanation | 23 (76%) | “I could fix my knowledge gaps privately without disrupting the whole class.”
4. System Limitations (Technical/Pedagogical) | Lack of Nuance/Depth; Latency/Delays; Robotic Intonation | 18 (60%) | “The answers were technically correct but lacked the ‘soul’ or unique insight a real professor gives.”
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
