Article

Leveraging Multimodal Information for Web Front-End Development Instruction: Analyzing Effects on Cognitive Behavior, Interaction, and Persistent Learning

College of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou 325035, China
*
Author to whom correspondence should be addressed.
Information 2025, 16(9), 734; https://doi.org/10.3390/info16090734
Submission received: 28 July 2025 / Revised: 22 August 2025 / Accepted: 25 August 2025 / Published: 26 August 2025
(This article belongs to the Special Issue Digital Systems in Higher Education)

Abstract

This study focuses on the mechanisms of behavior and cognition, providing a comprehensive analysis of how multimodal learning theory can be applied in the teaching practice of the “Web Front-end Development” course. It integrates different sensory modalities, such as vision, hearing, and haptic feedback, with the core objective of exploring how this multi-sensory integration affects students’ cognitive engagement, classroom interaction styles, and long-term learning behavior. We employed a mixed-methods approach: a quasi-experiment involving 120 undergraduate students, complemented by behavioral coding, in-depth interviews, and longitudinal tracking. Results show that multimodal teaching significantly reduces cognitive load (a 34.9% reduction measured by NASA-TLX), increases the frequency of collaborative interactions (2.3 times per class), and extends voluntary practice time (8.5 h per week). Mechanistically, these effects are mediated by enhanced embodied cognition (strengthening motor–sensory memory), optimized cognitive load distribution (reducing extraneous mental effort), and the fulfillment of intrinsic motivational needs (autonomy, competence, relatedness) as framed by self-determination theory. By bridging educational technology and behavioral science, this study contributes a comprehensive framework that offers practical guidance for designing technology-enhanced learning environments in which learners not only master technical skills more smoothly but also sustain their enthusiasm for, and engagement in, learning over the long term.


1. Introduction

In the digital age, web front-end development is no longer a single technical skill. It has gradually evolved into a complex behavioral process that requires cognitive reasoning and analysis, coordinated visual and manual operations, and collaboration with others to solve problems [1]. As the link between users and digital products, web front-end development requires not only proficiency in technologies such as HTML, CSS, and JavaScript, but also the ability to transform abstract code into intuitive interfaces, a process inseparable from sustained focus, repeated debugging, and efficient communication with peers [2]. However, the traditional teaching model in this field often focuses too heavily on the output of technological artifacts while neglecting the behavioral foundation of learning, resulting in a significant disconnect [3].
Firstly, passive cognitive participation has become mainstream. Lectures and code demonstrations focus on one-way knowledge transmission [4], resulting in short attention spans (averaging 15–20 min per session) and a superficial understanding of complex concepts, such as responsive layouts or asynchronous programming. Students rarely exhibit active exploratory behaviors, like self-initiated debugging or creative problem-solving.
Second, interactive behaviors are stunted—teacher-centered instruction limits opportunities for peer collaboration or adaptive feedback. Observational studies show that students in traditional settings engage in technical discussions fewer than 1.5 times per task, hindering the development of teamwork skills essential for real-world web front-end projects [5].
Third, persistent learning behaviors are lacking. Outcome-oriented evaluation (e.g., grading based solely on final code) fails to reinforce voluntary practice. Surveys indicate that students spend an average of only 5.7 h weekly on self-directed skill refinement, with minimal engagement in advanced learning activities [6].
These behavioral shortcomings expose a crucial issue: technical education cannot focus only on what students have learned, but must also attend to how they learn, including where their attention is directed, how they interact with others, and what supports their persistence in learning. Multimodal learning theory proposes that knowledge construction originates from the fusion of multiple sensory inputs, such as vision, hearing, and haptic feedback, providing a promising framework for changing these behaviors [7]. Learning web front-end development relies on the coordination of visual–spatial reasoning (layout design), haptic operation (code input), and auditory feedback (error prompts), making it a typical scenario for verifying multimodal learning theory [8]. Compared with multimodal applications that focus on symbolic operations in mathematics education (such as VR geometric modeling) or multimedia technologies that focus on situational reconstruction in history education (such as virtual museums), the technical features of web front-end development (real-time rendering, interactive debugging) demand greater integration of information across sensory channels. This provides a unique research scenario for exploring the impact of multimodal inputs on abstract-to-concrete transformation ability [9,10]. Given that web front-end development has a strong sensory–motor component, with coding, design, and user interaction being essentially sensory–motor activities, multimodal teaching methods are expected to enhance learners’ cognitive engagement, promote collaborative interaction, and sustain learning motivation [11].
Although attention in related fields is constantly increasing, research on multimodal learning in teaching web front-end development remains scattered and fragmented. Existing research primarily focuses on technical achievements (such as code accuracy). However, the problem is that it pays little attention to the behavioral mechanisms that can drive improvement, such as whether haptic feedback affects the persistence of people when debugging code [12]. This research gap makes it difficult for us to develop empirically supported strategies to cultivate the most basic behaviors in professional settings, such as adaptability and collaboration skills [13].
Accordingly, this study aims to clarify three core questions:
(1) How does multimodal learning influence cognitive behaviors, including attention duration, memory retention, and problem-solving strategies in web front-end development tasks?
(2) What impact does it have on interactive behaviors, such as peer collaboration frequency, teacher–student communication, and feedback utilization?
(3) Through what behavioral mechanisms (e.g., cognitive load regulation, motivational need fulfillment) do these effects operate?
By linking multimodal teaching to theories of embodied cognition [14], cognitive load theory [15], and self-determination theory [16], this research bridges the gap between educational technology and behavioral science. This study aims to explore the mechanisms by which sensory–motor input affects learning behavior in technological learning scenarios, providing practical guidance for optimizing web front-end development teaching, developing more efficient and targeted teaching plans, and meeting actual teaching needs.

2. Related Research

2.1. Theoretical Foundations of Multimodal Learning and Behavior

The core idea of multimodal learning theory is that learning is not just about receiving information [17], but about actively integrating sensory inputs, bodily movements, and various cues from the surrounding environment [18]. In this section, we will discuss three important behavioral science frameworks that can help us understand the relationship between multimodal input and learning behavior, namely embodied cognition, cognitive load theory, and self-determination theory [19].

2.1.1. Embodied Cognition: How Sensory–Motor Integration Shapes Cognitive Behavior

Embodied cognition theory departs from traditional cognitivist views [20]. Whereas the mind was long treated as an “invisible processor” dedicated to manipulating abstract symbols, embodied cognition holds that human mental activity is fundamentally inseparable from, and deeply rooted in, the interaction between the body and its environment [21]. In the context of learning web front-end development, this means that knowledge such as code syntax, UI design, and user interaction is not stored only as abstract rules; it is intertwined with sensory experiences (such as the haptic sensation of typing on a keyboard or the visual effect of a web page rendering) and bodily movements (such as debugging with a mouse).
The core mechanisms that link embodied cognition with learning behavior are as follows:
(1) Repeated pairing of actions (such as entering flex: 1 in CSS) with sensory feedback (such as seeing a div element expand) strengthens neural connections, making procedural skills such as layout design more automatic [22]. This reduces the cognitive effort required for routine tasks, freeing resources for problem-solving.
(2) Visual modalities (e.g., wireframes, live previews) combined with motor manipulation (e.g., dragging UI elements) improve the ability to rotate or transform objects mentally, critical for understanding responsive design, where layouts adapt to screen sizes [23].
(3) Auditory signals, such as the buzzing sound when syntax errors occur, or haptic feedback, such as keyboard vibrations, can transform abstract code rules, like “missing semicolons in JavaScript can cause errors”, into tangible experiences, making these rules easier to remember and apply [24].
In the teaching of web front-end development, these mechanisms imply that when students engage with code through multiple senses, such as writing code while listening to error prompts, their problem-solving accuracy and attention span exceed those of students who rely solely on text [25].

2.1.2. Cognitive Load Theory: Optimizing Information Processing for Sustained Attention

Cognitive load theory proposes that learning is constrained by the limited capacity of working memory [26]. When information is excessive or poorly structured, this capacity can be exceeded. Multimodal learning can distribute information across different sensory channels, effectively addressing this issue by regulating the three types of cognitive load that directly affect learning behavior [27].
(1) External load [28]: Caused by irrelevant information or poorly presented content (such as using dense text paragraphs to explain CSS grids without accompanying charts). This diverts attention from core tasks, leading to distraction and frustration [29].
(2) Intrinsic load [30]: Inherent to task complexity (e.g., learning to nest JavaScript promises). This is unavoidable but can be managed by breaking tasks into multimodal subtasks (e.g., watching a video on promise logic while annotating a visual flowchart).
(3) Germane load [31]: Effort invested in meaningful learning (e.g., linking grid-template-columns syntax to a rendered layout). Multimodal inputs (e.g., visual examples and verbal explanations) enhance germane load by encouraging deeper processing [32].
For learners in web front-end development, cognitive load regulation is crucial. Tasks like debugging responsive layouts require simultaneous processing of code, visual output, and user requirements—easily overwhelming working memory. When information is presented across multiple channels, such as visual (e.g., real-time preview), auditory (e.g., explanation and narration), and haptic (e.g., keyboard operation), multimodal design can reduce the additional cognitive burden, allowing students to concentrate and engage in more in-depth exercises [33].

2.1.3. Self-Determination Theory: Motivational Drivers of Persistent and Social Behavior

Self-determination theory (SDT) [34] holds that three core psychological needs drive intrinsic motivation and the sustainability of behavior: autonomy (a sense of control over one’s actions), competence (feeling capable of completing tasks effectively), and relatedness (establishing connections with others). Combining multimodal learning with this theory directly affects the quality of interaction and the persistence of learning behavior in web front-end development teaching.
(1) Allowing students to choose modalities (e.g., learning via video tutorials vs. interactive code editors) fosters a sense of ownership. This predicts higher engagement in voluntary practice, as students feel their learning aligns with personal preferences [35].
(2) Instant multimodal feedback, such as the green checkmark corresponding to the correct code and the beep sound, can demonstrate learning progress and enhance learners’ self-efficacy. In the context of technical learning, this feedback method helps alleviate “debugging fatigue” and motivates students to attempt more complex tasks [36].
(3) Collaborative multimodal tools (e.g., shared visual boards and voice chat) create opportunities for joint problem-solving, fulfilling the need for social connection [37]. This enhances the quality of peer interaction, as ideas are communicated through complementary modalities (e.g., sketching a layout while explaining logic verbally).
In web front-end development teaching, where collaboration on UI/UX design is central, SDT explains why multimodal group projects increase interaction frequency and feedback utilization: they satisfy the social and motivational needs that drive sustained engagement.

2.2. Multimodal Learning in Technical Education: Behavioral Gaps

Although multimodal learning is increasingly studied in technical fields such as programming and engineering, one crucial aspect remains under-examined: its impact on learning behavior, particularly in web front-end development.

2.2.1. Progress in STEM and General Programming Education

In STEM fields, multimodal interventions have demonstrated behavioral benefits:
(1) Engineering students using 3D models and verbal explanations show 35% longer focus on spatial reasoning tasks compared to text-only groups [38];
(2) Computer science students collaborating via multimodal tools (e.g., screen-sharing and voice chat) exhibit 40% more idea contributions than those using text-only platforms [39];
(3) In programming learning, learners who receive auditory error feedback (such as tones that distinguish syntax from logic errors) correct errors 29% faster than learners who rely solely on text prompts [40].
These research findings strongly confirm the potential value of multimodal learning in shaping cognitive patterns, interactive styles, and sustained behavior. However, these studies mainly focus on exploring general technical skills and have not yet conducted specialized research on the unique needs of web front-end development. In contrast to STEM fields, web front-end education faces unique challenges in aligning code with UI outputs.
While prior studies in STEM have demonstrated the benefits of multimodal learning [38], their focus on spatial reasoning tasks overlooks the unique code-UI translation demands of web front-end development [1]. This gap underscores the need for targeted research on how multimodal inputs affect coding-specific behaviors.

2.2.2. Underexplored Frontiers in Web Front-End Education

There are some unique behavioral challenges in web front-end development courses that have not yet been addressed in existing research.
Web front-end tasks require a close relationship between code (at an abstract level) and UI output (in a concrete presentation). For example, writing media queries to adapt a layout demands translating logical rules into visual outcomes—yet no studies have explored how multimodal inputs (e.g., haptic feedback for breakpoint adjustments) influence this translation. Research has not examined how multimodal feedback (e.g., visual previews and haptic “click” simulations) shapes persistence in such open-ended tasks. Web front-end development teams frequently collaborate on visual design (e.g., color schemes) and interactive logic (e.g., form validation) to ensure a seamless user experience. However, no studies have quantified how multimodal tools (e.g., shared Figma boards and voice chat) affect the frequency or quality of technical discussions.
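As a concrete illustration of this code-to-UI translation, the logical rule behind a media query can be written out as a plain function. This is a hypothetical sketch: the breakpoint values and the `layoutFor` helper are our own, not from any course materials discussed above.

```javascript
// Illustrative sketch: the decision logic a pair of CSS media queries
// expresses, e.g.
//   @media (max-width: 768px)  { /* mobile layout */ }
//   @media (max-width: 1024px) { /* tablet layout */ }
// Breakpoint values are hypothetical.
const BREAKPOINTS = { mobile: 768, tablet: 1024 }; // max-widths in px

function layoutFor(viewportWidth) {
  if (viewportWidth <= BREAKPOINTS.mobile) return "mobile";
  if (viewportWidth <= BREAKPOINTS.tablet) return "tablet";
  return "desktop";
}

console.log(layoutFor(375));  // → "mobile"
console.log(layoutFor(1440)); // → "desktop"
```

Making this rule explicit is exactly the translation step the text describes: a learner must map a logical condition on viewport width to a concrete visual outcome.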
Additionally, existing research prioritizes outcomes (e.g., code correctness) over processes (e.g., how students allocate attention or adjust strategies). This limits understanding of why multimodal learning works, a gap this study addresses by focusing on behavioral mechanisms.

2.3. Conceptual Framework: How Multimodal Input Shapes Learning Behaviors

Drawing on the theories above, we propose a conceptual framework as shown in Figure 1, linking multimodal teaching to three core behavioral domains in web front-end education, mediated by key mechanisms:
(1) Cognitive behaviors are enhanced by embodied cognition (sensory–motor memory strengthens skill automaticity) and reduced extraneous cognitive load (multimodal distribution of information frees working memory).
(2) Interactive behaviors are fostered by meeting SDT’s relatedness need (multimodal tools facilitate richer social exchange) and more precise feedback (multimodal cues make suggestions more actionable).
(3) Persistent behaviors are driven by meeting SDT’s autonomy (modality choice) and competence (immediate sensory feedback) needs, which boost intrinsic motivation.
This framework defines multimodal learning as a behavioral intervention that not only changes students’ learning content but also reshapes their interaction patterns with learning materials. This characteristic makes it particularly valuable for addressing behavioral issues in traditional face-to-face education.

3. Research Design and Methods

To address these research questions, we employed a mixed-methods strategy [41], specifically a quasi-experimental design that combined quantitative behavioral measurement data with qualitative analysis insights [42]. This method enables the rigorous examination of the impact of multimodal teaching on learning behavior, while also utilizing in-depth data to explore the underlying mechanisms. The study was conducted over 16 weeks (2 weeks of pre-testing, 12 weeks of intervention, and 2 weeks of follow-up) at a university in Zhejiang Province, China. This study is part of a university teaching reform research project, and its implementation was integrated into routine teaching activities. The entire research process strictly adhered to ethical norms. Informed consent was obtained from all individual participants included in the study. Participants were informed that their participation was voluntary, that they could withdraw at any time without penalty, and that their data would be anonymized and encrypted to ensure confidentiality. Written informed consent was collected from each participant prior to the commencement of data collection. This research was conducted in accordance with the Declaration of Helsinki and approved by the Experimental Animal Welfare and Ethics Committee of Wenzhou University (Approval No.: WZU-2025-102). All participants were provided with detailed information about the study objectives, procedures, potential risks, and benefits before enrollment.

3.1. Study Design

This study employed a pre-test and post-test control group design to compare the results of the control group (using traditional web front-end teaching mode) with those of the experimental group (using a multi-mode web front-end teaching mode). The selection of this design is mainly based on the following three key reasons:
(1) By holding technical content, instructional duration, and instructor expertise constant across groups, the design isolated the impact of multimodal elements (visual, auditory, haptic) on learning behaviors.
(2) Repeated measurements at four time points (pre-test: Week 0; mid-test: Week 6; post-test: Week 12; follow-up: Week 14) allowed for the assessment of behavioral changes over time and the sustainability of effects.
(3) Quantitative data (such as cognitive load ratings and interaction frequency counts) and qualitative data (such as interview transcripts and learner logs) complement each other, serving both to validate research findings and to explore underlying mechanisms, thereby enhancing the internal validity of the study [43].
Deviations from a randomized controlled trial were due to practical constraints: students were enrolled in intact classes, and randomization within classes risked disrupting instructional continuity. To mitigate selection bias, baseline equivalence [44] between groups was verified through statistical tests (see Section 3.2.2).

3.2. Participants

The participants in this study were 120 undergraduate students, all second-year students majoring in computer science or information technology and enrolled in the compulsory course “Web Front-end Development”. This group was chosen because they are novice learners who have not received systematic web front-end training, which ensures a consistent starting point for skill learning.

3.2.1. Inclusion and Exclusion Criteria

(1) Inclusion: No prior formal training in web front-end development (verified via a baseline technical proficiency test), access to a computer with internet, and ability to commit to the 16-week study period.
(2) Exclusion: Diagnosed sensory or cognitive disorders (e.g., visual agnosia, dyslexia) that could affect engagement with multimodal stimuli; concurrent enrollment in other web front-end-related courses, to avoid confounding learning effects.

3.2.2. Sample Characteristics

Participants were stratified to ensure balanced representation across groups. Independent-samples t-tests (for continuous variables) [45] and chi-square tests (for categorical variables) [46] were used to compare demographic and baseline characteristics between groups; no significant differences were found. The sample composition is shown in Table 1, which demonstrates baseline equivalence between the control and experimental groups in key demographic characteristics and learning foundations. Specifically, gender ratios were similar; average ages were comparable; major distributions were consistent; and there were no statistically significant differences in prior programming experience, learning styles, or baseline technical proficiency. This balanced grouping eliminated interference from baseline differences in subsequent results, laying the foundation for analyzing causal relationships between multimodal teaching interventions and changes in learning behavior, and enhancing the reliability of the findings. Because the sample targets novice learners in web front-end development, the results should apply to similar educational scenarios.

3.3. Intervention Procedures

Both the control group and the experimental group of learners participated in a 12-week web front-end development course, which consisted of 2 h of theoretical lectures and 2 h of laboratory practice per week. The technical content was identical for both groups, covering HTML5 semantics, CSS layout (flexbox and grid), JavaScript basics (variables, functions, and DOM manipulation), and responsive design.
The key distinction lay in the instructional methods: the control group received traditional teacher-centered instruction, while the experimental group engaged with a multimodal intervention designed to activate visual, auditory, and haptic sensory channels. Figure 2 visually presents the core differences between the two teaching modes in terms of sensory input, interactive forms, and feedback mechanisms.
Figure 2 compares the traditional teaching process with the multimodal teaching intervention process, clarifies the practical logic of multimodal teaching, and highlights key innovative points.

3.3.1. Control Group: Traditional Web Front-End Teaching

As shown in Figure 2, the control group followed the traditional teaching mode, which remains the mainstream approach to web front-end development instruction in higher education. The control group’s materials and methods were designed to mirror this mode as closely as possible: one-way knowledge transmission and rote memorization, with limited attention to learners’ active participation and sensory integration.
Lectures were delivered in a teacher-centered format, utilizing static PowerPoint slides and live code demonstrations projected onto a classroom screen. Instructors began each session by reviewing key concepts from the previous week (e.g., “Last time, we covered HTML structure; today we will move to CSS styling”) and then introduced new content through verbal explanations paired with text-heavy slides. For example, when teaching CSS flexbox, the instructor displayed a slide with bullet points listing properties (justify-content, align-items) and their definitions, followed by a live demonstration of code execution, typing “display: flex; justify-content: center;” into a text editor and projecting the resulting centered “<div> </div>”. Students were instructed to copy code snippets into their notebooks for later reference, with minimal opportunity for questions or interaction during the lecture.
Lab time was dedicated to individual practice, with students working through exercises from the textbook “Web Design with HTML, CSS, and JavaScript” [48]. Each task was broken down into replicable, step-by-step instructions, such as “create a static navigation bar using unordered lists and basic CSS” or “style the form with a blue background and white text”. Students worked independently, referring to printed lecture notes that summarized syntax rules (such as “text-align: center” for centering text). The instructor circulated around the classroom to answer questions but provided feedback only when students explicitly sought help, and that feedback was often brief, consisting of error-correction prompts (such as “missing semicolon”).
The materials used in teaching include printed course packages, PDF tutorials, and textbooks. These resources are primarily based on textual content, with minimal visual aids, including only simple screenshots of the final output effect. For instance, a tutorial on responsive design included a paragraph explaining media queries, accompanied by a single screenshot of a webpage in both desktop and mobile views, with no interactive elements or annotations.
Feedback on student work was delayed and text-based. Students submitted their weekly coding assignments through the Learning Management System (LMS), and instructors returned graded assignments with written comments within 7 days. These comments mainly identified errors (such as “missing closing </div> tag on line 15” or “incorrect value for flex-direction”) but rarely explained why the errors occurred or how to avoid them in the future. No real-time feedback was available during coding; students often spent hours debugging without guidance, relying on trial and error to resolve issues.
Student performance was evaluated through two components: a final project (50%) and a written exam (50%). The final project required students to build a static webpage (such as a personal portfolio or restaurant menu page) using HTML and CSS, graded on code accuracy and adherence to design specifications. The written exam included multiple-choice questions (such as “Which CSS property is used to control text size?”) and short-answer questions testing syntax recall (such as “Write the HTML code to create a bulleted list”).

3.3.2. Experimental Group: Multimodal Web Front-End Teaching

The technical content learned by the experimental group was identical to that of the control group; however, multiple sensory modes, including visual, auditory, and haptic, were intentionally integrated into the teaching process. The multimodal intervention was divided into three phases, each aligned with the conceptual framework of learning behavior: the first phase focuses on cognitive engagement, the second on interactive collaboration, and the third on the practical application of skills, as depicted in Figure 2.
  • Phase 1: Multimodal Input (Weeks 1–4)
The key aim of this phase is to apply the principles of embodied cognition so that abstract coding concepts and sensory experiences genuinely blend. This serves two purposes: alleviating cognitive pressure and enhancing memory performance. Concretely, code syntax is linked to visual, auditory, and haptic signals so that rules that can be neither seen nor touched, such as “CSS grid defines layout columns”, become easier to understand and remember.
(1) Visual modalities
Students used the digital design tool Figma [49] to create low-fidelity UI prototypes. For example, when learning HTML structure, students visually expressed the hierarchical structure of a page by dragging and dropping modules such as “header”, “main”, and “footer”, and then exported the corresponding HTML code snippets directly from the Figma file. This approach allowed them to recognize the correspondence between the visual layout and semantic tags.
Coding exercises were completed in CodePen [50], a browser-based editor with real-time rendering. As students typed CSS (e.g., grid-template-columns: 1fr 1fr), the preview pane immediately updated to show a two-column layout, creating an instant visual feedback loop. Instructors emphasized comparing code changes to visual outcomes (e.g., Notice how adding “gap: 20px;” increases space between columns).
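The code-to-preview loop described above can be made explicit with a small helper that builds the style object the preview pane reflects. This is a hypothetical sketch; `gridStyle` is not part of CodePen or the course materials.

```javascript
// Hypothetical helper: constructs the CSS-in-JS style object that a
// live preview renders for a simple grid, making the link between the
// typed code and the visual outcome explicit.
function gridStyle(columns, gapPx) {
  return {
    display: "grid",
    // e.g. 2 columns -> "1fr 1fr", matching the exercise above
    gridTemplateColumns: Array(columns).fill("1fr").join(" "),
    gap: `${gapPx}px`,
  };
}

const style = gridStyle(2, 20);
console.log(style.gridTemplateColumns); // → "1fr 1fr"
console.log(style.gap);                 // → "20px"
```

Changing the `gapPx` argument and re-rendering corresponds directly to the instructor's prompt to notice how “gap: 20px;” increases the space between columns.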
(2) Auditory modalities
Short (5-min) audio clips explained complex ideas using analogies. For example, JavaScript event bubbling was described as “a ripple in a pond: when you click a button (the drop), the event ripples outward to its parent elements (the water), affecting everything in its path”. These clips were played at the start of lectures and were available for download via the Learning Management System (LMS).
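The ripple analogy can be made concrete with a minimal, simplified simulation. This is illustrative only: the node structure and the `bubble` function are our own, and real DOM bubbling also involves capture phases and `stopPropagation()`, which are omitted here.

```javascript
// Simplified model of DOM event bubbling: an event fired on a node
// "ripples" up through its ancestors, as in the pond analogy above.
function bubble(node, visited = []) {
  visited.push(node.name);                       // handler fires here
  if (node.parent) bubble(node.parent, visited); // then ripples outward
  return visited;
}

// Hypothetical element tree: <body> > <form> > <button>
const body = { name: "body", parent: null };
const form = { name: "form", parent: body };
const button = { name: "button", parent: form };

console.log(bubble(button)); // logs: [ 'button', 'form', 'body' ]
```

Clicking the button (the drop) triggers handlers on the form and body (the water), in exactly the outward order the audio clip describes.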
Custom audio cues were triggered by code actions: a high-pitched “ping” (440 Hz) for successful execution (e.g., a JavaScript function returning a value) and a low-pitched “buzz” (100 Hz) for syntax errors. Students reported that these sounds became “conditioned cues”—the “buzz” prompted them to check for typos, while the “ping” reinforced correct syntax.
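One way such cues can be implemented is to map each execution result to tone parameters and synthesize the tone with the browser's Web Audio API. The frequencies below are those reported above (440 Hz ping, 100 Hz buzz); the durations and function names are illustrative assumptions, and the playback helper only has an effect in a browser.

```javascript
// Map a code-execution result to the tone described in the text:
// success -> high "ping" (440 Hz), syntax error -> low "buzz" (100 Hz).
// Durations are illustrative, not taken from the study.
function cueFor(result) {
  return result === "success"
    ? { freqHz: 440, durationMs: 150 }
    : { freqHz: 100, durationMs: 300 };
}

// Browser-only sketch using the Web Audio API (a no-op under Node).
function playCue(result) {
  if (typeof AudioContext === "undefined") return; // not in a browser
  const { freqHz, durationMs } = cueFor(result);
  const ctx = new AudioContext();
  const osc = ctx.createOscillator();
  osc.frequency.value = freqHz;          // set the cue's pitch
  osc.connect(ctx.destination);
  osc.start();
  osc.stop(ctx.currentTime + durationMs / 1000);
}
```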
(3) Haptic modalities
Using the Web Vibration API, the browser triggered vibration feedback for code errors: a 200-millisecond pulse for minor issues (such as a missing semicolon) and a 500-millisecond vibration for major logical errors (such as an infinite loop). This physical reminder helped students associate specific errors with haptic signals; for example, they learned that a long vibration meant "I forgot to close the loop".
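The Vibration API itself exposes a single call, navigator.vibrate(ms). A minimal sketch of the mapping described above might look as follows; the severity classification and function names are hypothetical, not the study's actual implementation.

```javascript
// Map error severity to the vibration durations described in the text:
// minor issues (e.g., a missing semicolon)  -> 200 ms pulse,
// major logic errors (e.g., infinite loop)  -> 500 ms vibration.
function vibrationMsFor(error) {
  return error.severity === "major" ? 500 : 200;
}

// Browser-only: navigator.vibrate is part of the Web Vibration API.
function signalError(error) {
  const ms = vibrationMsFor(error);
  if (typeof navigator !== "undefined" && navigator.vibrate) {
    navigator.vibrate(ms); // silently ignored on unsupported devices
  }
  return ms;
}
```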
The Logitech G300s mouse used by the students features adjustable pressure sensitivity. When students click UI elements in CodePen (such as buttons in the preview pane), the mouse produces a slight vibration (lasting 100 milliseconds), strengthening the association between the operational action (clicking) and the interface feedback (button activation).
  • Phase 2: Multimodal Collaboration (Weeks 5–8)
This phase shifted to social interaction, using multimodal tools to enhance peer collaboration and feedback exchange. By integrating visual, auditory, and haptic cues into group work, the intervention aimed to fulfill SDT’s “relatedness” need, fostering more frequent and meaningful interactions.
(1) Group projects
4-person teams designed a responsive “online portfolio” (requiring HTML, CSS grid, and basic JavaScript). Teams co-created UI designs, featuring elements such as “voice comments” (recording verbal feedback directly on design elements) and “version history” (visual timelines of changes). For example, a student might leave a voice note on a navigation bar: “Let us make this collapse on mobile—here is a sketch of the hamburger menu”.
Using VS Code Live Share, teams edited code in real time. A haptic alert (300 ms mouse vibration) notified members when a teammate modified their code (e.g., "Li edited your CSS for the header"). This reduced conflicts and encouraged verbal check-ins ("Hey, I changed the color—does that work for you?").
(2) Multimodal peer review
Teams provided feedback using three channels: video walkthroughs (screen-recording UI critiques with narration), annotated code snippets (PDFs with yellow highlights on problematic lines), and 10-min Zoom calls to discuss revisions.
(3) Teacher–student interaction
Instructors used a “multimodal feedback matrix” during labs, pointing to visual errors on a shared screen, explaining logic verbally, and demonstrating fixes by typing code (haptic for the student observing). For example, when a student struggled with media queries, the instructor: (1) showed a split-screen of desktop/mobile views (visual), (2) explained, “The breakpoint at 768px is not triggering because of a typo” (auditory), (3) corrected max-width 768 to max-width 768px while the student watched (haptic observation).
  • Phase 3: Multimodal Assessment (Weeks 9–12)
The evaluation method embodies the teaching philosophy of reinforcing learning behavior while alleviating evaluation anxiety [51]. Students could demonstrate their skills through visual, auditory, and haptic channels, aligning assessment with the way the knowledge was acquired: because learning drew on multiple senses, testing skills through the same channels follows naturally.
(1) Formative quizzes
Weekly 20-min assessments included:
Visual debugging: Identifying layout flaws in screenshots (e.g., “Why is the text overlapping on mobile?”) and drag-and-dropping CSS fixes (e.g., overflow: hidden).
Auditory analysis: Listening to a code narration ("I wrote flex-direction: column-reverse—what will the layout look like?") and selecting the correct visual outcome from options.
Haptic coding: Writing JavaScript functions with real-time feedback: vibrations for syntax errors and “pings” for valid logic, with a progress bar filling up as the code neared completion.
(2) Final project presentation
Teams showcased their portfolios in 15-min sessions using:
Visual demos: Screen-sharing to highlight responsive design (e.g., “Watch how the grid reflows from 3 columns on desktop to 1 on mobile”).
Auditory explanations: Narrating technical choices (e.g., We used “position: sticky” for the header so it stays visible when scrolling—here is why that improves the user experience).
Haptic interaction: Demonstrating functionality (e.g., clicking a “filter” button to sort portfolio items) using the haptic mouse, with vibrations confirming successful interactions.

3.4. Measures

To comprehensively capture the behavioral changes produced by multimodal teaching, the study used a battery of quantitative and qualitative measures, collecting data at four time points: pre-test (week 0), mid-test (week 6), post-test (week 12), and follow-up (week 14). All instruments were selected for their established validity in educational and behavioral research. To suit the cultural background of the target population (Chinese undergraduate students) and to ensure clear wording, an additional pilot test was conducted (n = 20). Below we detail each instrument, including its administration, psychometric properties, and scoring criteria.

3.4.1. Cognitive Behavior Measures

Cognitive behavior encompasses processes related to attention, memory, and problem-solving [52]. Three validated instruments were used to assess it, each targeting a different dimension of cognitive engagement.
(1) Cognitive Load
Cognitive load was measured using the NASA Task Load Index (NASA-TLX), a widely used tool in educational and human factors research [53]. The scale quantifies perceived mental effort across seven dimensions, chosen for their relevance to complex technical tasks like coding:
Mental demand: “How mentally demanding was the task of writing responsive CSS?”
Physical demand: “How physically tiring was typing and debugging code?”
Temporal demand: “How hurried or rushed was the pace of completing the coding exercise?”
Performance: “How successful were you in achieving your coding goals?”
Effort: “How hard did you have to work to complete the task?”
Frustration: “How irritated or stressed did you feel while debugging?”
Perceived success: “How satisfied were you with your final code?”
Each item was rated on a 10-point Likert scale (1 = "very low", 10 = "very high") [54]. The performance and perceived success items were reverse-scored (higher scores indicating lower performance/satisfaction) to align with the scale's convention that higher total scores reflect greater load. Students completed the NASA-TLX scale [55] immediately after each experimental class (once a week) and at the main evaluation stages (mid-test and post-test).
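As an illustration of the scoring convention, reverse-scored items on a 10-point scale can be folded as 11 − x before summing. The snippet below is a simplified, unweighted sketch of that rule with hypothetical item names; it is not the official pairwise-weighted NASA-TLX procedure.

```javascript
// Simplified NASA-TLX total: sum seven 10-point ratings, reverse-scoring
// "performance" and "perceivedSuccess" (11 - x) so a higher total always
// means a higher perceived load. Unweighted sketch, not the official
// pairwise-weighted procedure; item names are hypothetical.
const REVERSED = new Set(["performance", "perceivedSuccess"]);

function tlxTotal(ratings) {
  return Object.entries(ratings).reduce(
    (sum, [item, score]) => sum + (REVERSED.has(item) ? 11 - score : score),
    0
  );
}

const example = {
  mentalDemand: 7, physicalDemand: 3, temporalDemand: 6,
  performance: 8,       // rated high success -> contributes only 11 - 8 = 3
  effort: 7, frustration: 5,
  perceivedSuccess: 9,  // reversed to 11 - 9 = 2
};
console.log(tlxTotal(example)); // 7+3+6+3+7+5+2 = 33
```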
(2) Attention Duration
Attention duration [56] was operationalized as the time students spent engaged in task-related activities (such as coding, listening to instruction, and asking questions) relative to the time spent in distracted behavior (such as using smartphones or holding unrelated conversations).
Trained research assistants conducted live observations during 10 randomly selected 5-min intervals per class session (50 min of observation per week). Observers used a standardized checklist to code each minute as:
On-task: Engaged in coding, following instructor demonstrations, taking notes, asking task-related questions, or collaborating with peers on technical problems.
Distracted: Looking at non-course materials (e.g., social media), talking about non-technical topics, or passively staring at the screen without interaction.
Observers were blinded to group assignment to avoid bias and underwent 10 h of training to ensure consistency. Training included practice coding videos of pilot classes, with feedback from the principal investigator until inter-rater reliability exceeded κ = 0.80.
For the main study, 20% of observations were double-coded, yielding an intraclass correlation coefficient (ICC) of 0.82 (95% CI [0.76, 0.88]), indicating strong agreement. Disagreements were resolved in consensus meetings. Attention duration was calculated as the percentage of observed minutes coded as on-task, averaged across all intervals at each time point (pre-test, mid-test, post-test).
(3) Problem-Solving Accuracy
We designed a standardized set of debugging tasks to evaluate students' problem-solving accuracy [57]: whether they can identify and correct technical errors in web front-end code. The task set contains 10 code snippets (3 HTML, 4 CSS, and 3 JavaScript) containing errors commonly encountered in web front-end development, such as:
HTML: Missing closing tags (e.g., a <div> without its corresponding </div>) and improper use of semantic elements.
CSS: Invalid property values (e.g., flex-direction: horizontal), misused selectors (e.g., styling a class with '#').
JavaScript: Undefined variables, logic errors in event handlers (e.g., a button click failing to trigger a function).
Each snippet was paired with a brief description of the intended outcome (e.g., "This code should display a red button that alerts 'Hello' when clicked—why isn't it working?"). Students were given 30 min to identify and fix as many errors as possible, with access to their notes but not to external resources (e.g., Baidu, textbooks). The task was piloted with 20 non-participating students to ensure difficulty equivalence across time points (pre-test, mid-test, post-test). Item analysis confirmed that no single error was trivial (≥80% correct) or overly difficult (≤20% correct), with an overall Cronbach's α = 0.87, indicating good internal consistency. Accuracy was calculated as the percentage of correctly resolved errors: (number of fixed errors/10) × 100. A solution counted as correct only if the error was fully corrected and the intended behavior was achieved (for example, the button changes color and triggers the alert); partial repairs (such as fixing one problem while introducing another) were not counted.
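The scoring rule can be expressed compactly: an attempt earns credit only when both conditions hold, and partial repairs earn nothing. The field names below are hypothetical, introduced purely for illustration.

```javascript
// Debugging-task accuracy: (fully correct fixes / 10) * 100.
// A fix counts only if the error is fully corrected AND the intended
// behavior works; partial repairs receive no credit.
// Field names are hypothetical.
function debugAccuracy(attempts, totalErrors = 10) {
  const correct = attempts.filter(
    a => a.errorFixed && a.intendedBehaviorWorks
  ).length;
  return (correct / totalErrors) * 100;
}

const attempts = [
  { errorFixed: true,  intendedBehaviorWorks: true  }, // counts
  { errorFixed: true,  intendedBehaviorWorks: false }, // partial: no credit
  { errorFixed: false, intendedBehaviorWorks: false },
];
console.log(debugAccuracy(attempts)); // 10 (1 of 10 errors fully resolved)
```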

3.4.2. Interactive Behavior Measures

Interactive behavior was measured along multiple dimensions [58], including peer collaboration, teacher–student communication, and feedback utilization, combining observational coding with objective tracking to ensure the ecological validity of the assessment of social participation.
(1) Peer Collaboration Frequency
Peer collaboration [59] was defined as verbal or digital exchanges focused on technical problem-solving, including discussions, idea contributions, and code reviews. Four lab sessions per group (16 h total) were video-recorded using stationary cameras positioned to capture group interactions without obstructing students’ work. Recordings were anonymized by blurring faces and removing audio of non-technical conversations. Two trained coders (blinded to group) used a structured coding manual to identify three types of collaborative interactions:
Technical discussions: Verbal or chat-based exchanges about code logic (e.g., “How do we make this div responsive on mobile?”).
Idea contributions: Proposing solutions or design choices (e.g., “Let us use grid-template-areas instead of flexbox for this layout”).
Code reviews: Providing feedback on peers’ work (e.g., “Your JavaScript function is missing a return statement” or “This CSS selector could be more specific”).
Each interaction was timestamped, and coders noted the modality used (e.g., in-person speech, voice chat, text comments, screen-sharing). Twenty percent of recordings were double-coded, yielding a Cohen’s κ = 0.85 (95% CI [0.79, 0.91]) [60], indicating strong agreement. Discrepancies were resolved via discussion with a third coder. Collaboration frequency was calculated as the number of interactions per task (e.g., per hour of group work), averaged across all recorded sessions for each time point.
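Cohen's κ corrects raw inter-coder agreement for the agreement expected by chance: κ = (p_o − p_e)/(1 − p_e). A minimal sketch of that computation for categorical codes follows; the label arrays are toy values, not the study's coding data.

```javascript
// Cohen's kappa for two raters coding the same items:
// kappa = (observed agreement - chance agreement) / (1 - chance agreement).
// The label arrays below are toy values, not study data.
function cohensKappa(a, b) {
  const n = a.length;
  const labels = [...new Set([...a, ...b])];
  const po = a.filter((v, i) => v === b[i]).length / n; // observed agreement
  let pe = 0;                                           // chance agreement
  for (const l of labels) {
    const pa = a.filter(v => v === l).length / n;
    const pb = b.filter(v => v === l).length / n;
    pe += pa * pb;
  }
  return (po - pe) / (1 - pe);
}

const coder1 = ["discussion", "review", "discussion", "idea"];
const coder2 = ["discussion", "review", "idea",        "idea"];
console.log(cohensKappa(coder1, coder2)); // ≈ 0.64 for these toy labels
```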
(2) Teacher–Student Interaction
Teacher–student interaction was operationalized as the frequency of student-initiated questions and feedback requests per class, measured to assess engagement with instructors. Research assistants attended all lectures and lab sessions, recording each student-initiated interaction in a digital log. Entries included:
The type of interaction (question vs. feedback request).
The complexity of the question (basic: e.g., “What is the syntax for a media query?”; advanced: e.g., “Why does position: fixed behave differently in Safari?”).
The instructor’s response (e.g., verbal explanation, code demonstration).
Total interaction frequency was calculated as the number of questions and feedback requests per class session, averaged across weeks for each time point. Separate analyses were also conducted for basic vs. advanced questions to assess changes in cognitive engagement.
(3) Feedback Utilization
Feedback utilization was defined as the percentage of teacher or peer suggestions incorporated into revised code, measuring students' ability to act on external input; implementation was verified via version logs (Git v2.50.1). All code submissions (weekly assignments, mid-test projects) were tracked using GitHub Classroom (v0.0.4), which records version history and allows comparison of original and revised files.
For each submission, researchers counted two figures: the number of feedback suggestions provided and the number actually incorporated into the revised code. To ensure accurate counts, two researchers independently reviewed 20% of the submissions; disagreements were resolved through joint discussion. Suggestions were considered "implemented" only if they were fully adopted without introducing new errors. The utilization rate was calculated as (number of implemented suggestions/total suggestions) × 100.

3.4.3. Persistent Behavior Measures

Persistent behaviors, including voluntary practice, skill extension, and intrinsic motivation, were measured to assess long-term engagement beyond required coursework.
(1) Post-Class Practice Time
Post-class practice time was defined as the number of hours students spent on voluntary coding activities outside required lectures and labs (e.g., personal projects, online tutorials, coding challenges).
Students self-reported weekly practice time via a secure online form, with questions such as: "How many hours did you spend coding for fun or learning new skills (not for assignments) this week?" Self-reports were validated against GitHub commit logs, which record the frequency and duration of coding activity in public repositories; the two measures correlated strongly (r = 0.83, p < 0.001), supporting the accuracy of the self-reports. For students without public repositories (n = 12), practice time was verified through interviews (e.g., "What projects did you work on this week? How long did they take?"). Practice time was averaged across weeks for each time point (pre-test, mid-test, post-test), with follow-up data collected at Week 14 to assess sustainability.
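The validation described above rests on a Pearson correlation between self-reported hours and log-derived hours. A minimal sketch of that computation follows; the sample arrays are made-up toy values, not study data.

```javascript
// Pearson correlation coefficient between two equal-length samples,
// illustrating how self-reported hours could be checked against
// log-derived hours. The arrays below are toy values only.
function pearson(x, y) {
  const n = x.length;
  const mean = a => a.reduce((s, v) => s + v, 0) / n;
  const mx = mean(x), my = mean(y);
  let num = 0, dx2 = 0, dy2 = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - mx, dy = y[i] - my;
    num += dx * dy;   // covariance numerator
    dx2 += dx * dx;   // variance terms
    dy2 += dy * dy;
  }
  return num / Math.sqrt(dx2 * dy2);
}

const selfReported = [2, 4, 6, 8]; // toy weekly hours
console.log(pearson(selfReported, [1, 2, 3, 4])); // 1 (perfectly linear)
```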
(2) Skill Extension
Skill extension was measured as participation in optional workshops on advanced web front-end topics, designed to assess motivation to expand technical competence. Five workshops were offered biweekly during the intervention: "Introduction to React Hooks", "Advanced CSS Animations", "Responsive Design for Accessibility", "JavaScript Async/Await Patterns", and "UI/UX Testing with Figma".
Workshops were advertised equally to both groups via course emails and posters, with no extra credit offered to avoid coercing participation. Attendance was tracked through sign-in sheets, where students provided their names and student IDs. Researchers cross-referenced sign-in data with course rosters to confirm participation. Skill extension was operationalized as a binary variable (attended vs. did not attend) for each workshop, with an overall participation rate calculated as (number of workshops attended/5) × 100.
(3) Intrinsic Motivation
Intrinsic motivation refers to engagement driven by interest or enjoyment [61]. It was measured using a 12-item scale based on self-determination theory [16], adapted for technology learning contexts [62]. Sample items included: "I enjoy coding web front-end projects, even when they are challenging", "I look forward to learning new HTML/CSS/JavaScript skills", and "I would code web front-end projects in my free time, even if I did not have to". Items were rated on a 5-point Likert scale (1 = "strongly disagree", 5 = "strongly agree"). Students completed the scale at pre-test, mid-test, and post-test, with instructions to rate their feelings "about coding web front-end projects in general".

3.4.4. Qualitative Measures

Qualitative data enable a deeper exploration of students' learning experiences, illuminate the mechanisms behind behavioral changes that are not readily observable, and supplement the quantitative results with rich contextual information [63].
(1) Semi-Structured Interviews
After the post-test, the researchers conducted semi-structured interviews with 20 students (10 from each group), focusing on their perspectives on the teaching methods, changes in their learning behavior, and the factors and challenges that affected their engagement.
The interview outline includes a series of open-ended questions, such as:
“How would you describe the process of learning web front-end development in this course?”
“Have you noticed any changes in the way you learn or code outside of class? If so, what are the reasons behind it?”
“What aspects of teaching methods have been helpful for your learning, and what obstacles have arisen? What are the reasons?”
“What impact does feedback from teachers or classmates have on your motivation to write code?”
Each interview lasted 30 to 45 min, was recorded with the interviewee's consent, and was conducted in a private room so that interviewees could relax and speak freely. Probing questions (such as "Can you give a specific example?") were used to elicit more detailed answers. Recordings were transcribed verbatim and anonymized (for example, experimental group interviewees were labeled "Student E7" and control group interviewees "Student C3"). Transcripts were imported into NVivo 12 for thematic analysis [64], with coding focused on identifying patterns related to multimodal experiences, cognitive load, and learning motivation.
(2) Learning Logs
Students wrote a weekly digital log documenting their learning activities, the difficulties they encountered, and the solutions they used, providing a longitudinal view of their day-to-day behavior. The log data also enabled modality-specific analysis, revealing that students perceived different sensory feedback differently; for example, students who received haptic feedback for code errors were more likely to correct those errors immediately. Logs recorded the time spent each week and the methods and strategies used (such as "using visual preview to fix CSS grid problems"), and each entry followed structured prompts, such as "How much time did you spend on web front-end coding today? Please break it down by task type (such as homework, exercises, tutorials, etc.)", "What learning methods or tools did you use (such as videos, textbooks, code editors with a real-time preview function, etc.)?", "What was your biggest challenge, and how was it solved?", and "What made you feel successful or frustrated today? What is the reason?".
Log content was coded and classified, covering the learning modes mentioned (such as "using visual preview" and "listening to coding podcasts"), emotional states (such as "feeling frustrated while debugging" and "feeling excited when code runs"), and strategies adopted (such as "seeking help from classmates" and "reviewing class notes"). This allowed researchers to track the dynamic trajectory of behavior over time. Together, these measures constitute a comprehensive examination of cognitive activity, interactive processes, and persistent behavior, rigorously testing the effectiveness of multimodal teaching while probing its underlying mechanisms. By combining the rigor of quantitative research with the depth of qualitative analysis, the study reveals both what changed and why, strengthening the credibility of the results.

4. Results

This section presents the results of a quasi-experimental study that focuses on three core behavioral domains: “cognition”, “interaction”, and “persistence”, and draws complementary insights by combining mediation analysis and qualitative data. All quantitative results achieved a significance level of p < 0.05, and the effect sizes (η2, Cohen’s d, φ) were reported to demonstrate the practical significance of the observed differences.

4.1. Cognitive Behavior Outcomes

Cognitive behavior was assessed through cognitive load, attention duration, and problem-solving accuracy at three time points (pre-test, mid-test, and post-test). Repeated-measures analyses of variance revealed significant group × time interaction effects for all three outcomes, indicating that multimodal teaching distinctively altered students' cognitive engagement over the 12-week intervention.

4.1.1. Cognitive Load

Cognitive load showed a significant group × time interaction effect (F (2, 238) = 18.73, p < 0.001, η2 = 0.14), reflecting different patterns of change between the control and experimental groups. The cognitive load scores of the two groups at each time point are presented in Table 2.
From the data in Table 2, it is evident that the two groups exhibited comparable cognitive load at the pre-test, confirming that before the intervention, students in both groups experienced similar levels of mental effort on web front-end tasks such as writing basic HTML or debugging simple CSS. By the mid-test, the experimental group showed a 19.3% reduction, whereas the control group remained stable. This early divergence suggests that multimodal inputs, including live code previews and error-specific sound cues, quickly alleviate mental effort, particularly for tasks such as styling layouts with CSS flexbox. As one experimental student noted, "It is less confusing when I can see changes instantly" (Student E5). At the post-test, the experimental group's load had decreased by 34.9% from the pre-test, whereas the control group showed no significant change. ANCOVA controlling for pre-test scores [65] confirmed the experimental group's lower post-test load (F (1, 119) = 78.24, p < 0.001, η2 = 0.39), indicating that multimodal teaching reduced perceived mental effort.

4.1.2. Attention Duration

The study found a significant group × time interaction effect on attention duration (F (2, 238) = 21.46, p < 0.001, η2 = 0.15), which is manifested in the experimental group students’ attention maintenance duration, showing a longer trend over time. The attention duration data for the two groups of students at different time points are presented in Table 3.
From the data in Table 3, the two groups differed little in attention duration at the pre-test: within 20 min of the start of class, students in both groups frequently checked their phones or engaged in off-topic conversation. At the mid-test, the experimental group's sustained attention increased to 22.3 min, while the control group remained at 16.1 min. Observers noted that experimental students were more likely to self-correct distractions: "When students' phones light up, they take a glance, but soon start coding again after receiving a vibration alert for a syntax error" (Observer 2). At the post-test, the experimental group sustained attention for 28.6 min, an 80.7% increase over the pre-test, whereas the control group remained at 15.9 min. This large effect size (d > 0.8) reflects a meaningful difference: experimental group students sustained focus for nearly twice as long as their control group peers.
Notably, observational notes linked this to multimodal feedback: "Students in the experimental group rarely checked phones during coding; they were focused on resolving errors flagged by visual highlights, sounds, and vibrations" (Observer 1). In the experimental group, attention duration correlated negatively with cognitive load (r = −0.67, p < 0.001), suggesting that a reduced mental burden frees cognitive resources that help students maintain focus. In contrast, the control group showed no significant correlation (r = −0.12, p = 0.31), consistent with a persistently high cognitive load hindering sustained attention.

4.1.3. Problem-Solving Accuracy

There was a significant group × time interaction effect in terms of debugging accuracy (F (2, 238) = 32.17, p < 0.001, η2 = 0.21), with the experimental group showing faster improvement. Problem-solving accuracy across time points by group is shown in Table 4.
At the pre-test, both groups struggled most with JavaScript logic errors and responsive design flaws, with fewer than 50% of students resolving these issues correctly, and accuracy was comparable across groups. At the mid-test, the experimental group achieved an accuracy rate of 68.5%, whereas the control group improved to 54.7%. The largest gap occurred in CSS grid debugging, where experimental group students solved 72% of grid-related errors, compared to 49% for the control group, likely because the experimental group used the visual grid preview and haptic feedback to correct misalignments. At the post-test, the experimental group reached 82.4% accuracy, a 59.1% increase from the pre-test, significantly outperforming the control group. This large effect size indicates that multimodal learning nearly doubled the experimental group's ability to resolve complex errors.
In summary, the cognitive behavioral indicators of the experimental group, including “cognitive load”, “attention duration”, and “problem-solving accuracy”, all showed significant changes over time, while the control group remained relatively stable. The intergroup differences in cognitive behavioral indicators are shown in Figure 3.
The differing slopes of the lines in Figure 3 show that multimodal teaching improved cognitive behavior far more than traditional teaching.

4.2. Interactive Behavior Outcomes

When analyzed from the perspective of interactive behavior, significant differences are observed in the frequency of peer collaboration, teacher–student interaction, and feedback utilization between the two student groups, and these differences are closely related to the experimental group’s use of multimodal collaboration tools [66].

4.2.1. Peer Collaboration Frequency

A significant group × time interaction was found (F (2, 238) = 19.82, p < 0.001, η2 = 0.14), indicating that the experimental group engaged in more technical interactions. The frequency of peer collaboration across time points by group is shown in Table 5.
In the pre-test, collaboration frequency was similar between groups; most interactions were brief (e.g., "Where is the CSS file?") and focused on logistics rather than problem-solving. In the mid-test, the experimental group averaged 2.4 interactions per task, compared to 1.5 in the control group. Video coding revealed that experimental group interactions were more technical: "Teams debated flex-wrap vs. flex-shrink while sharing a Figma preview, with one student drawing layout ideas on the board and another explaining the code" (Coder 1). In the post-test, the experimental group collaborated 3.2 times per task, a 113.3% increase from the pre-test, whereas the control group remained at 1.5 times per task. This large effect size reflects a shift from passive individual work to active collaboration: experimental students spent 42% of lab time in technical discussions, compared to 18% in the control group.
Notably, video coding revealed that the experimental group used mixed modalities: teams alternated between drawing layouts in Figma (visual), explaining logic over voice chat (auditory), and passing the keyboard to debug (haptic). The quality of cooperation also differed between groups: experimental group interactions often combined multiple forms (such as verbal explanations with visual sketches), whereas control group interaction was mainly text-based, such as sharing code snippets via email, and discussions were comparatively brief.

4.2.2. Teacher–Student Interaction

A significant group × time interaction was observed (F (2, 238) = 24.19, p < 0.001, η2 = 0.17), with the experimental group raising more questions and engaging in more in-depth discussions. The frequency of teacher–student interaction across time points by group is shown in Table 6.
In the pre-test, most questions in both groups focused on basic syntax (e.g., "How do I link a CSS file?"), and interaction frequency was comparable. In the mid-test, the experimental group asked 9.2 questions per class, versus 5.4 in the control group, and their questions shifted toward conceptual issues (e.g., "Why does position behave differently in Chrome and Firefox?"), reflecting deeper engagement with technical subtleties. In the post-test, the experimental group's questions increased to 12.6, a 129.1% increase over the pre-test, while the control group remained at 5.5. Instructors noted that experimental group questions often concerned the multimodal feedback itself: "Students ask, 'Can vibration intensity be adjusted for different types of errors?', indicating that they are actively exploring the tool" (Teacher 1). Instructors also noted a shift in question complexity: "Experimental group students asked, 'Why does the haptic feedback lag in Chrome?' instead of basic syntax questions" (Instructor 3).

4.2.3. Feedback Utilization

A significant group × time interaction was observed (F (2, 238) = 31.74, p < 0.001, η2 = 0.21), with the experimental group implementing a higher proportion of suggested revisions in their code. Feedback utilization rates across time points by group are shown in Table 7.
In the pre-test, both groups struggled to apply feedback: fewer than half of the suggested revisions (e.g., “adding alt text to images”) were implemented, and utilization rates were similar. In the mid-test, experimental students cited the clarity of multimodal feedback: “The video walkthrough showed exactly where my grid layout was breaking, so I knew how to fix it” (Student E23). Feedback utilization reached 62.5% in the experimental group versus 41.0% in the control group. In the post-test, GitHub logs confirmed that experimental students were 2.3 times more likely to act on feedback within 24 h, whereas control students took an average of 3 to 5 days. The utilization rate of the experimental group reached 78.3%, a 90.0% increase over the pre-test, while the control group remained at 41.2%.
In summary, the interactive behavior indicators of the experimental group, including “peer collaboration frequency”, “teacher–student interaction”, and “feedback utilization”, all showed significant changes over time, while the control group remained relatively stable. The intergroup differences in interactive behavioral indicators are shown in Figure 4.
The steeper slopes in Figure 4 show that multimodal teaching yields a significantly greater improvement in interactive behavior than traditional teaching.

4.3. Persistent Behavior Outcomes

Persistent behaviors, including voluntary practice, skill extension, and intrinsic motivation, were enhanced in the experimental group, with effects sustained at follow-up.

4.3.1. Post-Class Practice Time

There was a significant group × time interaction (F (2, 238) = 24.61, p < 0.001, η2 = 0.17), with the experimental group spending more time on voluntary coding. Post-class practice time across time points by group is shown in Table 8.
In the pre-test, practice in both groups focused on completing assignments, with little exploration of advanced topics, and practice time was similar. In the mid-test, experimental students’ logs described self-initiated projects: “We built a responsive recipe page using the knowledge we learned and added interesting animations” (Student E9). The experimental group practiced 9.8 h per week, versus 5.7 h in the control group. In the post-test, GitHub data confirmed that experimental students made 3.2 times more repository commits than control students, with projects ranging from interactive calculators to portfolio websites. The experimental group practiced 14.2 h per week, a 149.1% increase from the pre-test, while the control group remained at 5.7 h.

4.3.2. Skill Extension

At follow-up (Week 14), a chi-square test revealed that 87% of the experimental group attended advanced workshops (e.g., React, Vue.js), compared to 41% of the control group (χ2 (1) = 29.4, p < 0.001, φ = 0.49). This moderate-to-large effect size suggests that multimodal learning promotes motivation to apply skills beyond the course. Interview data linked workshop attendance to confidence: “Nailing the final project with multimodal tools made me think that I can also learn React” (Student E23). In contrast, control group students cited self-doubt: “I barely passed the course—why try something harder?” (Student C7). Motivation scores for skill extension across time points by group, analyzed with t-tests, are shown in Table 9.
The data in Table 9 indicate that multimodal teaching can significantly enhance students’ motivation to extend their skills in web front-end development courses, that this improvement increases over time, and that the differences are of substantial practical significance.
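As a sanity check, statistics of this kind can be recomputed directly from a 2 × 2 contingency table. The following minimal JavaScript sketch uses illustrative counts (52 and 25 attendees out of 60 per group) inferred from the reported 87% and 41%; because the exact cell counts are not reported, the result (χ² ≈ 26.4, φ ≈ 0.47) is close to, but not identical to, the published values.

```javascript
// Chi-square test of independence for a 2x2 table, plus the phi effect size.
// Cell counts [attended, did not attend] per group are illustrative
// assumptions reconstructed from the reported percentages, not study data.
function chiSquare2x2([[a, b], [c, d]]) {
  const n = a + b + c + d;
  // Shortcut formula for 2x2 tables: n(ad - bc)^2 / (row and column totals)
  const chi2 =
    (n * (a * d - b * c) ** 2) /
    ((a + b) * (c + d) * (a + c) * (b + d));
  const phi = Math.sqrt(chi2 / n); // effect size for a 2x2 table
  return { chi2, phi };
}

const { chi2, phi } = chiSquare2x2([
  [52, 8],  // experimental group: ~87% of 60 attended workshops
  [25, 35], // control group: ~41% of 60 attended workshops
]);
console.log(chi2.toFixed(2), phi.toFixed(2));
```

The shortcut formula avoids computing expected frequencies explicitly; for larger tables the general form Σ(O − E)²/E would be used instead.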

4.3.3. Intrinsic Motivation

A significant group × time interaction was found (F (2, 238) = 28.36, p < 0.001, η2 = 0.19), and the experimental group reported higher motivation. Intrinsic motivation scores across time points by group are shown in Table 10.
In the pre-test, both groups primarily cited “course requirements” as their reason for coding, and motivation scores were comparable. In the mid-test, experimental students often mentioned enjoyment: “I am coding now to see what I can build, and it is fun, not just for the grades” (Student E15). The experimental group scored 3.5, versus 2.8 in the control group. In the post-test, the experimental group’s motivation reached 4.2, a 50.0% increase over the pre-test, whereas the control group remained at 2.8. This large effect reflects a shift from extrinsic to intrinsic motivation, with experimental students listing “enjoyment” and “curiosity” as their primary reasons for coding.
In summary, the persistent behavior indicators of the experimental group, including “post-class practice time”, “skill extension”, and “intrinsic motivation”, all showed significant changes over time, while the control group remained relatively stable. The intergroup differences in persistent behavioral indicators are shown in Figure 5.
The steeper slopes in Figure 5 show that multimodal teaching yields a significantly greater improvement in persistent behavior than traditional teaching.

4.4. Mediation Analyses

Mediation analyses examined whether cognitive load and intrinsic motivation accounted for the relationship between multimodal teaching and key outcomes [67]. The two mechanisms are independent and complementary. Figure 6 illustrates how multimodal teaching operates through two paths: “cognitive optimization” (Model 1) and “motivation-driven” (Model 2). It improves immediate task performance by reducing cognitive load, and it promotes long-term learning behavior by enhancing intrinsic motivation; together, these paths provide stable support for the effectiveness of multimodal teaching. All path effects were validated with 95% confidence intervals excluding zero. These mediating mechanisms are not incidental results but represent the key logic behind the effectiveness of multimodal teaching. The path diagram translates the abstract mediation results into an intuitive causal chain, giving educators clear directions for intervention (e.g., designing low-load teaching activities and strengthening motivational support).
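For orientation, the standard single-mediator decomposition underlying such analyses can be written as follows, where X is the teaching condition, M the mediator (cognitive load or intrinsic motivation), and Y the outcome; the b coefficients are generic placeholders, since the article reports only the a, c, and c′ paths:

```latex
\begin{aligned}
M &= i_1 + aX + e_1 \\
Y &= i_2 + c'X + bM + e_2 \\
c &= c' + ab, \qquad P_M = \frac{ab}{c} = \frac{c - c'}{c}
\end{aligned}
```

Here c is the total effect, c′ the direct effect after including the mediator, ab the indirect effect, and P_M the proportion of the total effect carried through the mediator.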

4.4.1. Cognitive Load as a Mediator of Problem-Solving Accuracy

Cognitive load plays a significant mediating role in the impact of multimodal teaching on problem-solving accuracy. As shown in Figure 6, the effect value of multimodal teaching on cognitive load is β = −0.32 (95% confidence interval [−0.45, −0.19]). The total effect of multimodal teaching on problem-solving accuracy is β = 0.58 (95% confidence interval [0.42, 0.74]). After incorporating cognitive load, the direct effect is β = 0.26 (95% confidence interval [0.10, 0.42]), indicating that the decrease in cognitive load can explain approximately 32% of the improvement in accuracy.
This suggests that multimodal teaching enhances problem-solving skills by reducing psychological effort and freeing working memory to focus on error detection and resolution, rather than dealing with information overload.

4.4.2. Intrinsic Motivation as a Mediator of Post-Class Practice Time

Intrinsic motivation plays a significant mediating role in the impact of multimodal teaching on the time spent on voluntary practice. As shown in Figure 6, the effect value of multimodal teaching on intrinsic motivation is β = 0.41 (95% confidence interval [0.28, 0.54]). The total effect of multimodal teaching on voluntary practice time is β = 0.67 (95% confidence interval [0.52, 0.82]). After incorporating intrinsic motivation, the direct effect is β = 0.26 (95% confidence interval [0.09, 0.43]), indicating that the enhancement of intrinsic motivation can explain approximately 41% of the increase in practice time.
This is consistent with self-determination theory: multimodal teaching enhances students’ intrinsic motivation to engage in voluntary practice by meeting their needs for autonomy (e.g., choosing learning modalities) and competence (e.g., receiving immediate feedback).

4.5. Qualitative Findings

Thematic analysis of interview content and study logs identified four key themes, which corroborate the quantitative findings.

4.5.1. Embodied Memory Enhances Problem-Solving

Experimental group students linked haptic/auditory inputs to code recall: “I can feel when I miss a semicolon, the keyboard vibrates, and my fingers automatically fix it” (Student E7). This mirrors the quantitative findings of improved debugging accuracy.

4.5.2. Multimodal Feedback Reduces Frustration

Students reported lower frustration with multimodal cues: “Red highlights and buzzes pinpoint errors instantly. With text-only feedback, I would waste 10 min guessing” (Student E12), which explains the reduced cognitive load.

4.5.3. Collaboration Feels “More Natural” with Mixed Modalities

Experimental group students described multimodal collaboration as engaging: “Talking over voice while drawing on Figma feels like working in the same room” (Student E23), which supports a higher frequency of interaction.

4.5.4. Autonomy and Competence Drive Persistence

Students cited modality choice and feedback as motivating: “I chose videos to learn flexbox, so I felt in control. When the code ‘pinged’ successfully, I wanted to keep coding” (Student E9), aligning with increased practice time.

4.5.5. Single Sensory Effect Analysis

Further analysis of the learning logs shows that the sensory modalities contributed differently: visual feedback (real-time preview) contributed most to CSS layout tasks (accuracy improved by 42%), auditory cues (syntax-error sounds) accelerated JavaScript debugging (average time reduced by 28%), and tactile feedback (vibration) performed best in complex logic tasks (error correction rate improved by 35%). This indicates that multimodal teaching is not a simple superposition of channels but a complementary enhancement of the senses for specific tasks.
In summary, these research results indicate that multimodal teaching has changed students’ cognition, interaction, and sustained behavior in web front-end development teaching. The effectiveness of this approach benefits from the reduction of cognitive load, the enhancement of intrinsic motivation, and the strengthening of embodied memory. The consistency between quantitative and qualitative research results further enhances the credibility of these conclusions, highlighting the important value of multimodal teaching in promoting the transformation of learning from passive acceptance to active participation.

5. Discussion

This study examines the impact of multimodal learning theory on cognitive, interactive, and sustained learning behaviors in web front-end development education. The research results indicate that integrating visual, auditory, and haptic modes can significantly enhance learning outcomes, achieved through embodied cognition, reduced cognitive load, and the fulfillment of motivational needs. Next, we will discuss these findings in conjunction with behavioral science theory, practical value, research limitations, and future research directions.

5.1. Multimodal Learning Shapes Cognitive Behaviors Through Embodied Cognition and Reduced Load

The experimental group demonstrated improved problem-solving accuracy, extended attention duration, and reduced cognitive load, consistent with embodied cognition theory [14] and cognitive load theory [15]. Haptic feedback (e.g., keyboard vibrations triggered by syntax errors) and auditory cues (e.g., the chime when code runs successfully) jointly construct a “motor-sensory memory” that associates physical operations with abstract coding rules. As the students described, these sensory inputs transform abstract concepts such as JavaScript event bubbling into perceptible experiences (“like the feeling of dominoes falling one after another”). This aligns with Lakoff’s assertion [14] that knowledge is grounded in bodily interactions, which explains why the experimental group exhibited 59.1% higher debugging accuracy: sensory–motor integration strengthened memory for code logic and error patterns.
By distributing information across visual (live previews), auditory (narration), and haptic (vibration) channels, multimodal teaching reduced extraneous load by 34.9%. For example, explaining CSS grid through diagrams and verbal explanations prevented visual overload, freeing working memory for problem-solving. This reduction in mental effort directly mediated improved problem-solving accuracy (32% of the effect), confirming that cognitive load is a critical mechanism in technical skill acquisition.
The experimental group’s 80.7% longer attention duration likely stemmed from the dynamic, multi-sensory nature of the intervention. Unlike static lectures, multimodal feedback (e.g., a red highlight and buzz for errors) continuously re-engaged students, preventing attention decay. This finding is consistent with previous research demonstrating that varied sensory inputs sustain arousal during complex tasks [22].

5.2. Multimodal Collaboration Fulfills Relatedness Needs, Enhancing Interactive Behaviors

The experimental group’s increased frequency of collaboration, teacher–student interaction, and feedback utilization reflects the role of multimodal tools in meeting the self-determination theory’s (SDT) need for relatedness.
Multimodal collaboration tools (shared Figma boards, voice chat) created a “shared cognitive space” where ideas were communicated through complementary modalities, including visual sketches for layout, verbal explanations for logic, and haptic code edits. This aligns with Niemiec and Ryan’s observation [37] that social connection is strengthened when interactions leverage multiple communication channels, explaining why the experimental group collaborated 2.3 times more frequently.
Feedback utilization nearly doubled in the experimental group, as multimodal cues (e.g., video walkthroughs and annotated code) made suggestions more actionable than text-only comments. This supports the idea that feedback effectiveness depends not just on content but on delivery—sensory-rich feedback is easier to interpret and apply, particularly in technical domains where spatial and logical reasoning intersect [27].
Deeper teacher–student interaction: The shift from basic to complex questions in the experimental group (e.g., “Why does haptic feedback differ by browser?”) indicates higher cognitive engagement. Multimodal teaching likely empowered students to explore nuances they might otherwise overlook, thereby fostering a more iterative and inquiry-driven learning environment.

5.3. Persistent Behaviors Are Driven by Autonomy and Competence Needs

The experimental group’s extended practice time, workshop participation, and higher intrinsic motivation highlight the role of multimodal learning in meeting SDT’s autonomy and competence needs. Students reported that choosing modalities (e.g., video tutorials vs. interactive editors) fostered a sense of control, aligning with Reeve’s finding [35] that autonomy predicts voluntary engagement. This explains why the experimental group spent 149.1% more time on post-class practice: learning felt self-directed rather than imposed. Immediate, multi-sensory feedback (e.g., a green checkmark and “ping” for valid code) provided clear evidence of progress, boosting self-efficacy. As one student noted, “That sound meant ‘I got it’ and I wanted to hear it again”. This aligns with Ryan and Deci’s assertion [36] that competence feedback fuels intrinsic motivation, which mediated 41% of the increase in practice time. Eighty-seven percent of the experimental group attended advanced workshops, reflecting sustained motivation to build on their successes. This suggests that multimodal learning not only enhances immediate performance but also cultivates a growth mindset, critical for long-term professional development in rapidly evolving fields like web development.

5.4. Practical Implications

The findings offer actionable strategies for web front-end development teaching and, more broadly, for technical skill training:
(1)
For educators: Design multimodal sequences that pair abstract concepts with sensory inputs (e.g., JavaScript promises with timeline visuals and countdown sounds). Prioritize real-time, multi-sensory feedback to reduce frustration and cognitive load. Structure collaborative tasks to leverage complementary modalities (e.g., shared whiteboards and voice chat).
(2)
For curriculum designers: Replace static materials (e.g., textbooks) with interactive, multimodal resources (e.g., video tutorials with embedded code editors). Align assessments with behavioral processes (e.g., evaluating how students strategically utilize modalities) rather than just focusing on outcomes.
(3)
For tool developers: Integrate low-cost, multimodal features into coding platforms (e.g., browser-based vibration APIs for error detection, customizable sound cues). Support modality customization to accommodate diverse needs (e.g., visual alternatives for deaf learners).
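The tool-developer recommendation above can be prototyped with standard browser APIs (the Vibration API and the Web Audio API). The following is a minimal sketch, not the study’s actual tool: the error categories, pattern values, and tone frequency are illustrative assumptions, and calls are feature-detected so the code degrades gracefully on devices without haptics.

```javascript
// Illustrative mapping from error category to a vibration pattern
// (milliseconds of vibrate/pause). Categories and values are assumptions.
const VIBRATION_PATTERNS = {
  syntax: [100],           // one short pulse for a syntax error
  runtime: [100, 50, 100], // double pulse for a runtime error
  logic: [300],            // one long pulse for a failed test / logic error
};

// Pure helper: resolve a pattern, falling back to a short pulse.
function patternFor(errorType) {
  return VIBRATION_PATTERNS[errorType] ?? [100];
}

// Fire haptic and auditory feedback for an error, if the APIs exist.
function signalError(errorType) {
  // Haptic channel: navigator.vibrate is widely supported on mobile browsers.
  if (typeof navigator !== "undefined" && navigator.vibrate) {
    navigator.vibrate(patternFor(errorType));
  }
  // Auditory channel: a brief low tone via the Web Audio API.
  if (typeof AudioContext !== "undefined") {
    const ctx = new AudioContext();
    const osc = ctx.createOscillator();
    osc.frequency.value = 220; // low "error" tone
    osc.connect(ctx.destination);
    osc.start();
    osc.stop(ctx.currentTime + 0.15); // 150 ms beep
  }
}
```

Keeping the error-to-pattern mapping as a plain object also supports the customization recommendation: swapping in a user-defined table (e.g., visual alternatives for deaf learners) requires no change to the signaling logic.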

5.5. Limitations and Future Directions

(1)
Sample and context constraints: The study included undergraduates from China, limiting generalizability to other cultures or educational levels (e.g., vocational training). Future research should test multimodal learning in diverse settings, including online environments.
(2)
Modality specificity: We did not isolate the effects of individual modality combinations (e.g., visual and haptic vs. auditory and haptic). Factorial designs could identify optimal pairings for specific tasks (e.g., CSS layout vs. JavaScript logic).
(3)
Long-term retention: Follow-up was limited to 2 weeks. Longer tracking (e.g., 6 months) is needed to assess whether multimodal-induced behaviors persist in professional settings.
(4)
Technology access: Multimodal tools require devices with sensory capabilities, which may be inaccessible in resource-limited contexts. Future work should develop low-cost alternatives (e.g., text-to-speech for auditory feedback).

6. Conclusions

This study systematically examines the behavioral mechanisms through which multimodal learning theory transforms experimental teaching in “Web Front-end Development”. By integrating visual, auditory, and haptic modalities, we demonstrate that such interventions reshape cognitive engagement, interactive dynamics, and persistent learning behaviors, with effects rooted in core principles of behavioral science. The findings confirm that multimodal teaching enhances cognitive behaviors by leveraging embodied cognition, thereby strengthening motor-sensory memories that link physical actions (e.g., coding) to abstract concepts (e.g., syntax rules), and reducing cognitive load by distributing information across multiple sensory channels. This explains why the experimental group exhibited 34.9% lower perceived mental effort, 80.7% longer attention duration, and 59.1% higher problem-solving accuracy compared to the control group. These outcomes align with embodied cognition theory [14] and cognitive load theory [15], highlighting how sensory–motor integration optimizes information processing in technical skill acquisition.
In terms of interactive behaviors, multimodal collaboration tools (e.g., shared visual boards, voice chat) fulfilled the relatedness need posited by self-determination theory (SDT) [34], resulting in more frequent peer discussions (a 2.3-fold increase) and deeper teacher–student interactions. Feedback utilization nearly doubled in the experimental group, as multimodal cues (e.g., video annotations and verbal explanations) made guidance more actionable, underscoring that how feedback is delivered matters as much as its content. Persistent behaviors, including voluntary practice (149.1% increase) and advanced skill exploration (87% workshop participation), were driven by SDT’s autonomy and competence needs. Modality choice empowered students to take ownership of their learning, while immediate sensory feedback (e.g., “ping” sounds for successful code) reinforced feelings of mastery, boosting intrinsic motivation. Mediation analyses confirmed that 41% of the increase in practice time was explained by higher motivation, validating the role of motivational mechanisms in sustaining engagement.
Collectively, these results bridge the gap between educational technology and behavioral science, offering a nuanced framework for technical education. By framing web front-end learning as a behavioral process focused on how students attend, interact, and persist, this research challenges the traditional focus on technical outcomes alone. Instead, it highlights the value of designing learning environments that align with natural sensory–motor and motivational tendencies.
Although there are similarities in the application of multimodal learning across fields such as programming, mathematics, and history education, web front-end development has its own unique challenges and requirements, and our research provides new insights into educational technology in this specific field. Future work should explore the generalizability of these findings across cultures, educational levels, and technical domains (e.g., backend development, data science). Additionally, long-term tracking (more than 6 months) is needed to assess whether multimodal-induced behaviors persist in professional settings. Ultimately, developing low-cost, multimodal tools (e.g., browser-based vibration APIs) could expand access to these benefits in resource-constrained contexts.
In summary, multimodal learning theory offers a powerful approach to cultivating the adaptive, collaborative, and persistent behaviors essential for success in web front-end development and beyond. By nurturing these behaviors, educators and tool developers can prepare learners not only to write code but also to thrive as lifelong problem solvers in an increasingly digital world.

Author Contributions

Conceptualization, M.L.; methodology, M.L.; software, M.L.; validation, M.L.; formal analysis, M.L.; investigation, M.L.; resources, M.L. and Z.H.; data curation, M.L. and Z.H.; writing—original draft preparation, M.L.; writing—review and editing, M.L.; visualization, M.L.; supervision, M.L.; project administration, Z.H.; funding acquisition, M.L. All authors have read and agreed to the published version of the manuscript.

Funding

M.L. has been awarded funding from two projects in China: the Industry-University Cooperation and Collaborative Education Project of the Ministry of Education (Project No.: 202102642004) and the Teaching Research and Reform Project of Wenzhou University (Project No.: jg2024046). The funds received are used to cover the article processing charge (APC) and the English manuscript revision.

Institutional Review Board Statement

This study was conducted in accordance with the Declaration of Helsinki and approved by the Experimental Animal Welfare and Ethics Committee of Wenzhou University (Approval No.: WZU-2025-102). All participants were provided with detailed information about the study objectives, procedures, potential risks, and benefits before enrollment.

Informed Consent Statement

Informed consent was obtained from all individual participants included in the study. Participants were informed that their participation was voluntary, that they could withdraw at any time without penalty, and that their data would be anonymized and encrypted to ensure confidentiality. Written informed consent was collected from each participant prior to the commencement of data collection.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to personal privacy and ethical policies.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ferrari, C.; Hurst, A. Accessible Web Development: Opportunities to Improve the Education and Practice of web Development with a Screen Reader. ACM Trans. Access. Comput. 2021, 14, 8. [Google Scholar] [CrossRef]
  2. Martinez, C. Developing 21st century teaching skills: A case study of teaching and learning through project-based curriculum. Cogent Educ. 2022, 9, 2024936. [Google Scholar] [CrossRef]
  3. Li, X.; Di, R.; Cai, J. Online Small Sample Learner Modeling and Curriculum Recommendation with Healthy Emotional Factors of College Students. Comput. Intell. Neurosci. 2022, 2022, 8247203. [Google Scholar] [CrossRef] [PubMed]
  4. Xu, X.; Yu, S.; Pang, N.; Dou, C.; Li, D. Review on A big data-based innovative knowledge teaching evaluation system in universities. J. Innov. Knowl. 2022, 7, 100197. [Google Scholar] [CrossRef]
  5. Holly, M.; Hildebrandt, J.; Pirker, J. A Computer-Supported Collaborative Learning Environment for Computer Science Education. In Proceedings of the 10th International Conference on Immersive Learning, Glasgow, UK, 10–13 June 2024. [Google Scholar] [CrossRef]
  6. Zhu, G.; Raman, P.; Xing, W.; Jim, S. Curriculum design for social, cognitive and emotional engagement in Knowledge Building. Int. J. Educ. Technol. High. Educ. 2021, 18, 37. [Google Scholar] [CrossRef]
  7. Tian, F.; Wang, Q.; Li, X.; Sun, N. Heterogeneous multimedia cooperative annotation based on multimodal correlation learning. J. Vis. Commun. Image Represent. 2019, 58, 544–553. [Google Scholar] [CrossRef]
  8. Ojonuba, S.E.; Türkmen, G.; Toker, S. Enhancing Web Development Education with Game-Based and Gamification Learning: A Study of Engagement, Motivation, and Performance. IEEE Access 2025, 13, 137048–137066. [Google Scholar] [CrossRef]
  9. Garcia, M.; Yousef, A. Cognitive and affective effects of teachers’ annotations and talking heads on asynchronous video lectures in a web development course. Res. Pract. Technol. Enhanc. Learn. 2023, 18, 020. [Google Scholar] [CrossRef]
  10. Chen, Y.; Zhang, K. Online course in web development: The case of Chinese universities. Interact. Learn. Environ. 2024, 32, 3377–3387. [Google Scholar] [CrossRef]
  11. Liu, Y.; Ma, W.; Guo, X.; Lin, X.; Wu, C.; Zhu, T. Impacts of Color Coding on Programming Learning in Multimedia Learning: Moving Toward a Multimodal Methodology. Front. Psychol. 2021, 12, 773328. [Google Scholar] [CrossRef]
  12. Ling, H.C.; Chiang, H.S. Learning Performance in Adaptive Learning Systems: A Case Study of Web Programming Learning Recommendations. Front. Psychol. 2022, 13, 770637. [Google Scholar] [CrossRef]
  13. Abdoulqadir, C.; Loizides, F. Interaction, Artificial Intelligence, and Motivation in Children’s Speech Learning and Rehabilitation Through Digital Games: A Systematic Literature Review. Information 2025, 16, 599. [Google Scholar] [CrossRef]
  14. Lakoff, G.; Johnson, M. The metaphorical structure of the human conceptual system. Cogn. Sci. 1980, 4, 195–208. [Google Scholar] [CrossRef]
  15. Sweller, J. Cognitive load during problem solving: Effects on learning. Cogn. Sci. 1988, 12, 257–285. [Google Scholar] [CrossRef]
  16. Ryan, R.M.; Deci, E.L. Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. Am. Psychol. 2000, 55, 68–78. [Google Scholar] [CrossRef]
  17. Hsieh, P.-C.; Fang, T.-L.; Jin, S.; Wang, Y.; Funabiki, N.; Fan, Y.-C. A Verilog Programming Learning Assistant System Focused on Basic Verilog with a Guided Learning Method. Future Internet 2025, 17, 333. [Google Scholar] [CrossRef]
  18. Giannakos, M.; Cukurova, M. The role of learning theory in multimodal learning analytics. Br. J. Educ. Technol. 2023, 54, 1246–1267. [Google Scholar] [CrossRef]
  19. Kozan, K. The incremental predictive validity of teaching, cognitive and social presence on cognitive load. Internet High. Educ. 2016, 31, 11–19. [Google Scholar] [CrossRef]
  20. Chen, W.; Ping, X.; Dong, D. Filling in the vacuous flesh: Embodiment, constitution, and interoception. Theory Psychol. 2023, 33, 515–534. [Google Scholar] [CrossRef]
  21. Lakoff, G. Explaining Embodied Cognition Results. Top. Cogn. Sci. 2012, 4, 773–785. [Google Scholar] [CrossRef]
  22. Wilson, M. Six views of embodied cognition. Psychon. Bull. Rev. 2002, 9, 625–636. [Google Scholar] [CrossRef]
  23. Simon, C.; Hacene, M.B.; Otmane, S.; Chellali, A. Study of communication modalities to support teaching tool manipulation skills in a shared immersive environment. Comput. Graph. 2023, 117, 31–41. [Google Scholar] [CrossRef]
  24. Nambiar, K.; Bhargava, P. An Exploration of the Effects of Cross-Modal Tasks on Selective Attention. Behav. Sci. 2023, 13, 51. [Google Scholar] [CrossRef] [PubMed]
  25. Czok, V.; Krug, M.; Müller, S.; Huwer, J.; Weitzel, H. Learning Effects of Augmented Reality and Game-Based Learning for Science Teaching in Higher Education in the Context of Education for Sustainable Development. Sustainability 2023, 15, 15313. [Google Scholar] [CrossRef]
  26. El Hajji, M.; Ait Baha, T.; Berka, A.; Ait Nacer, H.; El Aouifi, H.; Es-Saady, Y. An Architecture for Intelligent Tutoring in Virtual Reality: Integrating LLMs and Multimodal Interaction for Immersive Learning. Information 2025, 16, 556. [Google Scholar] [CrossRef]
  27. Zhang, Y.; Song, Y. The Effects of Sensory Cues on Immersive Experiences for Fostering Technology-Assisted Sustainable Behavior: A Systematic Review. Behav. Sci. 2022, 12, 361. [Google Scholar] [CrossRef]
  28. Kirschner, P.A. Cognitive load theory: Implications of cognitive load theory on the design of learning. Learn. Instr. 2002, 12, 1–10. [Google Scholar] [CrossRef]
  29. Paas, F.; Renkl, A.; Sweller, J. Cognitive load theory and instructional design: Recent developments. Educ. Psychol. 2003, 38, 1–4. [Google Scholar] [CrossRef]
  30. Ryan, R.M.; Deci, E.L. Intrinsic and extrinsic motivation from a self-determination theory perspective: Definitions, theory, practices, and future directions. Learn. Instr. 2020, 61, 101860. [Google Scholar] [CrossRef]
  31. Sewell, J.L.; Young, J.Q.; Boscardin, C.K.; ten Cate, O.; O’Sullivan, P.S. Trainee perception of cognitive load during observed faculty staff teaching of procedural skills. Med. Educ. 2019, 53, 925–940. [Google Scholar] [CrossRef]
  32. van Merriënboer, J.J.G.; Sweller, J. Cognitive load theory in health professional education: Design principles and strategies. Med. Educ. 2010, 44, 85–93. [Google Scholar] [CrossRef]
  33. Du, X.; Dai, M.; Tang, H.; Hung, J.L.; Li, H.; Zheng, J. A multimodal analysis of college students’ collaborative problem solving in virtual experimentation activities: A perspective of cognitive load. J. Comput. High. Educ. 2023, 35, 272–295. [Google Scholar] [CrossRef]
  34. Deci, E.L.; Ryan, R.M. The “what” and “why” of goal pursuits: Human needs and the self-determination of behavior. Psychol. Inq. 2000, 11, 227–268. [Google Scholar] [CrossRef]
  35. Reeve, J. Motivating Students to Learn, 2nd ed.; Pearson: London, UK, 2013. [Google Scholar]
  36. Ryan, R.M.; Deci, E.L. Self-determination theory: Basic psychological needs in motivation, development, and wellness. Handb. Self-Determ. Res. 2017, 2, 3–33. [Google Scholar]
  37. Niemiec, C.P.; Ryan, R.M. Autonomy, competence, and relatedness in the classroom: Applying self-determination theory to educational practice. Theory Pract. 2009, 48, 237–245. [Google Scholar] [CrossRef]
  38. Retnowati, E.; Ayres, P.; Sweller, J. Collaborative learning effects when students have complete or incomplete knowledge. Appl. Cogn. Psychol. 2018, 32, 681–692. [Google Scholar] [CrossRef]
  39. Pérez-Marín, D.; Hijón-Neira, R.; Pizarro, C. A First Approach to Co-Design a Multimodal Pedagogic Conversational Agent with Pre-Service Teachers to Teach Programming in Primary Education. Computers 2024, 13, 65. [Google Scholar] [CrossRef]
  40. Tuo, M.; Long, B. Construction and Application of a Human-Computer Collaborative Multimodal Practice Teaching Model for Preschool Education. Comput. Intell. Neurosci. 2022, 2022, 2973954. [Google Scholar] [CrossRef]
  41. Skliarova, I.; Meireles, I.; Martins, N.; Tchemisova, T.; Cação, I. Enriching Traditional Higher STEM Education with Online Teaching and Learning Practices: Students’ Perspective. Educ. Sci. 2022, 12, 806. [Google Scholar] [CrossRef]
  42. Braun, V.; Clarke, V. Using thematic analysis in psychology. Qual. Res. Psychol. 2006, 3, 77–101. [Google Scholar] [CrossRef]
  43. Creswell, J.W.; Clark, V.L.P. Designing and Conducting Mixed Methods Research, 3rd ed.; Sage Publications: Thousand Oaks, CA, USA, 2017. [Google Scholar]
  44. Ayres-Pereira, V.; Arntzen, E. Effect of Presenting Baseline Probes During or After Emergent Relations Tests on Equivalence Class Formation. Psychol. Rec. 2019, 69, 193–204. [Google Scholar] [CrossRef]
  45. Jankowski, K.R.B.; Flannelly, K.J.; Flannelly, L.T. The t-test: An Influential Inferential Tool in Chaplaincy and Other Healthcare Research. J. Health Care Chaplain. 2017, 24, 30–39. [Google Scholar] [CrossRef] [PubMed]
  46. Chen, Y.T.; Chen, M.C. Using chi-square statistics to measure similarities for text categorization. Expert Syst. Appl. 2011, 38, 3085–3090. [Google Scholar] [CrossRef]
  47. A Guide to Learning Styles. Available online: https://vark-learn.com/ (accessed on 8 April 2025).
  48. Duckett, J. Web Design with HTML, CSS, JavaScript and jQuery Set; John Wiley & Sons: Hoboken, NJ, USA, 2014. [Google Scholar]
  49. Staiano, F. Designing and Prototyping Interfaces with Figma; Packt Publishing: Birmingham, UK, 2023. [Google Scholar]
  50. Holman, T. The best of CodePen. Net 2016, 283, 76–81. [Google Scholar]
  51. Skliarova, I. Project-Based Learning and Evaluation in an Online Digital Design Course. Electronics 2021, 10, 646. [Google Scholar] [CrossRef]
  52. Liang, Z.; Suntrayuth, S.; Sun, X.; Su, J. Positive Verbal Rewards, Creative Self-Efficacy, and Creative Behavior: A Perspective of Cognitive Appraisal Theory. Behav. Sci. 2023, 13, 229. [Google Scholar] [CrossRef]
  53. Hart, S.G.; Staveland, L.E. Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. Adv. Psychol. 1988, 52, 139–183. [Google Scholar] [CrossRef]
  54. Matosas-López, L.; Leguey-Galán, S.; Regaña, C.B.; Piris, N.P. University and Quality Systems. Evaluating faculty performance in face-to-face and online programs: A comparison of Likert and BARS instruments. Int. J. Educ. Res. Innov. 2024, 22, 18. [Google Scholar] [CrossRef]
  55. Morales, A.F.C.; Arellano, J.L.H.; Muñoz, E.L.G.; Macías, A.M. Development of the NASA-TLX Multi Equation Tool to Assess Workload. Int. J. Combin. Optim. Probl. Inform. 2020, 11, 50–58. [Google Scholar]
  56. Ogden, R.S.; Turner, F.; Pawling, R. Exploring the Role of Overt Attention Allocation During Time Estimation: An Eye Movement Study. Timing Time Percept. 2022, 10, 17–39. [Google Scholar] [CrossRef]
  57. Boomgaarden, A.; Loibl, K.; Leuders, T. The trade-off between complexity and accuracy. Preparing for computer-based adaptive instruction on fractions. Interact. Learn. Environ. 2023, 31, 6379–6394. [Google Scholar] [CrossRef]
  58. Wang, T.J.; Xia, X. The Study of Hierarchical Learning Behaviors and Interactive Cooperation Based on Feature Clusters. Sage Open 2023, 13, 21582440231166593. [Google Scholar] [CrossRef]
  59. Burns, S.; Brathwaite, L.; Hoan, E.; Yu, E.; Yasiniyan, S.; White, L.; Dhuey, E.; Perlman, M. A scoping review and network analysis of the characteristics of peer collaboration in early educational settings from studies using diverse methodologies. Educ. Rev. 2025, 77, 1649–1671. [Google Scholar] [CrossRef]
  60. Yang, J.; Chinchilli, V.M. Fixed-effects modeling of Cohen’s weighted kappa for bivariate multinomial data. Comput. Stat. Data Anal. 2011, 55, 1061–1070. [Google Scholar] [CrossRef]
  61. Sklyarov, V.; Skliarova, I. Teaching reconfigurable systems: Methods, tools, tutorials, and projects. IEEE Trans. Educ. 2005, 48, 290–300. [Google Scholar] [CrossRef]
  62. Liu, Y.; Hau, K.; Zheng, X. Does instrumental motivation help students with low intrinsic motivation? Comparison between Western and Confucian students. Int. J. Psychol. 2020, 55, 182–191. [Google Scholar] [CrossRef] [PubMed]
  63. Faems, D. Moving forward quantitative research on innovation management: A call for an inductive turn on using and presenting quantitative research. R&D Manag. 2020, 50, 352–363. [Google Scholar] [CrossRef]
  64. Brorsen, B.W.; Lin, H.; Larzelere, R. Critique of enhanced power claimed for Quasi-ANCOVA and Dual-Centered ANCOVA. PLoS ONE 2025, 20, e0317860. [Google Scholar] [CrossRef]
  65. Guo, X. Relationship between Parents’ Educational Expectations and Children’s Growth Based on NVivo 12.0 Qualitative Software. Sci. Program. 2022, 2022, 9896291. [Google Scholar] [CrossRef]
  66. Riyantoko, P.A.; Funabiki, N.; Brata, K.C.; Mentari, M.; Damaliana, A.T.; Prasetya, D.A. A Fundamental Statistics Self-Learning Method with Python Programming for Data Science Implementations. Information 2025, 16, 607. [Google Scholar] [CrossRef]
  67. Hayes, A. Introduction to mediation, moderation, and conditional process analysis. J. Educ. Meas. 2013, 51, 335–337. [Google Scholar] [CrossRef]
Figure 1. Conceptual framework: mechanisms linking multimodal input to learning behaviors in web front-end development.
Figure 2. Comparison diagram of intervention processes between traditional teaching and multimodal teaching.
Figure 2. Comparison diagram of intervention processes between traditional teaching and multimodal teaching.
Information 16 00734 g002
Figure 3. Line graph of intergroup differences in cognitive behavioral indicators.
Figure 3. Line graph of intergroup differences in cognitive behavioral indicators.
Information 16 00734 g003
Figure 4. Line graph of intergroup differences in interactive behavioral indicators.
Figure 4. Line graph of intergroup differences in interactive behavioral indicators.
Information 16 00734 g004
Figure 5. Line graph of intergroup differences in persistent behavioral indicators.
Figure 5. Line graph of intergroup differences in persistent behavioral indicators.
Information 16 00734 g005
Figure 6. Path diagram of the mediation effect model.
Figure 6. Path diagram of the mediation effect model.
Information 16 00734 g006
Table 1. Learner sample characteristics.
Characteristic | Control Group | Experimental Group
Gender | Male: 32; Female: 28 | Male: 32; Female: 28
Mean age | 19.2 ± 0.8 years | 19.1 ± 0.7 years
Major | Computer Science: 42; Information Technology: 18 | Computer Science: 42; Information Technology: 18
Prior programming experience (self-reported, including basic Python or C#) | 12.3 ± 4.1 h/week | 11.9 ± 3.8 h/week
Learning styles (VARK questionnaire) | Visual: 42%; Auditory: 28%; Kinesthetic: 30% | Visual: 40%; Auditory: 30%; Kinesthetic: 30%
Baseline technical proficiency (pre-test score, 0–100) | 56.2 ± 8.7 | 55.8 ± 9.1
Note: Independent samples t-tests and chi-square tests confirmed no significant differences between groups at baseline (all p > 0.05), indicating group equivalence before the intervention [47].
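The baseline equivalence checks reported in the note can be reproduced from the summary statistics alone. The sketch below is an illustration, not the authors' code: it assumes 60 students per group (inferred from the 120-student sample) and a pooled-variance Student's t-test; function names are ours.

```python
import math

def t_from_summary(m1, s1, n1, m2, s2, n2):
    """Pooled-variance Student's t for two independent groups, from summary stats."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2  # t statistic and degrees of freedom

def chi_square(table):
    """Pearson chi-square statistic for a 2x2 contingency table [[a, b], [c, d]]."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    stat = 0.0
    for i in (0, 1):
        for j in (0, 1):
            expected = rows[i] * cols[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

# Baseline pre-test scores from Table 1 (assuming n = 60 per group)
t, df = t_from_summary(56.2, 8.7, 60, 55.8, 9.1, 60)
# Gender counts are identical in both groups, so the chi-square statistic is 0
chi2 = chi_square([[32, 28], [32, 28]])
```

With two groups of 60, the pooled-variance test has 118 degrees of freedom; the resulting |t| is well below any conventional critical value, consistent with the reported group equivalence.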
Table 2. Cognitive load scores across time points by group.
Time Point | Control Group (M ± SD) | Experimental Group (M ± SD) | Group Comparison Statistics
Pre-test | 64.9 ± 9.2 | 65.3 ± 8.9 | t (120) = 0.25, p = 0.80
Mid-test (Week 6) | 65.1 ± 9.5 | 52.7 ± 8.1 | t (120) = 7.21, p < 0.001
Post-test (Week 12) | 65.7 ± 10.2 | 42.3 ± 8.7 | t (120) = 12.83, p < 0.001
Note: Cognitive load was measured using the NASA-TLX scale.
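The NASA-TLX [53] aggregates six subscale ratings into one workload score. The paper does not state which scoring variant was used; the sketch below shows the two standard options — the raw (unweighted) mean of the six 0–100 ratings, and the weighted form using tallies from the instrument's 15 pairwise comparisons. Names are illustrative.

```python
# The six NASA-TLX workload dimensions [53]
DIMENSIONS = ("mental", "physical", "temporal", "performance", "effort", "frustration")

def raw_tlx(ratings):
    """Raw TLX (RTLX): unweighted mean of the six 0-100 subscale ratings."""
    return sum(ratings[d] for d in DIMENSIONS) / len(DIMENSIONS)

def weighted_tlx(ratings, tallies):
    """Weighted TLX: each dimension's weight is the number of times it was
    chosen across the 15 pairwise comparisons; the tallies must sum to 15."""
    assert sum(tallies.values()) == 15
    return sum(ratings[d] * tallies[d] for d in DIMENSIONS) / 15
```

Either variant yields a 0–100 composite, matching the score range seen in this table.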
Table 3. Attention duration across time points by group.
Time Point | Control Group (M ± SD, min) | Experimental Group (M ± SD, min) | Group Comparison Statistics
Pre-test | 16.2 ± 4.3 | 15.9 ± 4.1 | t (120) = 0.41, p = 0.68
Mid-test | 16.1 ± 4.2 | 22.3 ± 4.8 | t (120) = 7.59, p < 0.001
Post-test | 15.9 ± 4.0 | 28.6 ± 5.3 | t (120) = 14.21, p < 0.001
Note: Attention duration is the average time (in minutes) students spent on-task during 10 random 5-min observation intervals per class. The effect size (d = 2.58) at the post-test indicates a very large practical difference between the groups.
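The effect sizes quoted in these notes can be computed directly from each table's summary statistics. A minimal sketch, assuming the pooled-SD form of Cohen's d and equal groups of 60 (the paper does not state which variant or correction it used):

```python
import math

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    pooled_sd = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (m2 - m1) / pooled_sd

# Post-test attention duration from Table 3
d = cohens_d(15.9, 4.0, 60, 28.6, 5.3, 60)
```

By the usual benchmarks (0.2 small, 0.5 medium, 0.8 large), any d above 2 is an extremely large effect, whichever variant is applied.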
Table 4. Problem-solving accuracy across time points by group.
Time Point | Control Group (M ± SD) | Experimental Group (M ± SD) | Group Comparison Statistics
Pre-test | 52.3 ± 9.7 | 51.8 ± 10.2 | t (120) = 0.26, p = 0.79
Mid-test | 54.7 ± 10.1 | 68.5 ± 8.9 | t (120) = 7.83, p < 0.001
Post-test | 57.6 ± 11.3 | 82.4 ± 9.1 | t (120) = 12.05, p < 0.001
Note: Problem-solving accuracy is the correct completion rate on standardized debugging tasks. The experimental group's accuracy improved by 59.1% from pre-test to post-test, while the control group's progress was minimal. The effect size (d = 2.18) at the post-test indicates a very large practical difference between the groups.
Table 5. Peer collaboration frequency across time points by group.
Time Point | Control Group (M ± SD) | Experimental Group (M ± SD) | Group Comparison Statistics
Pre-test | 1.4 ± 0.7 | 1.5 ± 0.6 | t (120) = 0.87, p = 0.38
Mid-test | 1.5 ± 0.6 | 2.4 ± 0.7 | t (120) = 8.02, p < 0.001
Post-test | 1.5 ± 0.6 | 3.2 ± 0.8 | t (120) = 13.17, p < 0.001
Note: Collaboration frequency is the average number of technical interactions per coding task. The experimental group showed a 113.3% increase from pre-test to post-test, while the control group remained stable. The effect size (d = 2.37) at the post-test indicates a very large practical difference between the groups.
Table 6. Teacher–student interaction frequency across time points by group.
Time Point | Control Group (M ± SD) | Experimental Group (M ± SD) | Group Comparison Statistics
Pre-test | 5.3 ± 2.4 | 5.5 ± 2.2 | t (120) = 0.51, p = 0.61
Mid-test | 5.4 ± 2.3 | 9.2 ± 2.8 | t (120) = 8.76, p < 0.001
Post-test | 5.5 ± 2.2 | 12.6 ± 3.1 | t (120) = 14.03, p < 0.001
Note: Interaction frequency is the average number of student-initiated questions per class session. The experimental group showed a 129.1% increase from pre-test to post-test, while the control group remained stable. The effect size (d = 2.55) at the post-test indicates a very large practical difference between the groups.
Table 7. Feedback utilization rates across time points by group.
Time Point | Control Group (M ± SD, %) | Experimental Group (M ± SD, %) | Group Comparison Statistics
Pre-test | 40.8 ± 12.3 | 41.2 ± 11.9 | t (120) = 0.18, p = 0.86
Mid-test | 41.0 ± 12.1 | 62.5 ± 10.7 | t (120) = 10.15, p < 0.001
Post-test | 41.2 ± 11.8 | 78.3 ± 10.4 | t (120) = 16.39, p < 0.001
Note: Feedback utilization is the percentage of teacher/peer suggestions incorporated into revised code. The experimental group showed a 90.0% increase from pre-test to post-test, while the control group remained stable. The effect size (d = 2.97) at the post-test indicates a very large practical difference between the groups.
Table 8. Post-class practice time across time points by group.
Time Point | Control Group (M ± SD, h/week) | Experimental Group (M ± SD, h/week) | Group Comparison Statistics
Pre-test | 5.6 ± 2.2 | 5.7 ± 2.1 | t (120) = 0.26, p = 0.79
Mid-test | 5.7 ± 2.3 | 9.8 ± 2.7 | t (120) = 9.24, p < 0.001
Post-test | 5.7 ± 2.2 | 14.2 ± 3.2 | t (120) = 16.72, p < 0.001
Note: Post-class practice time is the average number of hours per week spent on voluntary coding activities (e.g., personal projects, self-directed learning). The experimental group showed a 149.1% increase from pre-test to post-test, while the control group remained stable. The effect size (d = 3.02) at the post-test indicates a very large practical difference between the groups.
Table 9. Skill extension motivation scores across time points by group.
Time Point | Control Group (M ± SD) | Experimental Group (M ± SD) | Group Comparison Statistics
Pre-test | 3.1 ± 0.9 | 3.2 ± 0.8 | t (118) = 0.52, p = 0.60
Mid-test | 3.2 ± 0.8 | 4.0 ± 0.7 | t (118) = 6.83, p < 0.001
Post-test | 3.3 ± 0.9 | 4.8 ± 0.6 | t (118) = 10.72, p < 0.001
Note: Two students did not fully complete the scale, leaving a valid sample of 118 (missing rate 1.7%). The missingness was random and unrelated to group assignment (p > 0.05), so complete-case analysis was used. The effect size (d = 1.96) at the post-test indicates a very large practical difference between the groups.
Table 10. Intrinsic motivation scores across time points by group.
Time Point | Control Group (M ± SD) | Experimental Group (M ± SD) | Group Comparison Statistics
Pre-test | 2.7 ± 0.8 | 2.8 ± 0.7 | t (120) = 0.68, p = 0.50
Mid-test | 2.8 ± 0.7 | 3.5 ± 0.6 | t (120) = 7.91, p < 0.001
Post-test | 2.8 ± 0.8 | 4.2 ± 0.6 | t (120) = 12.56, p < 0.001
Note: Intrinsic motivation was measured using a 12-item scale based on self-determination theory (range: 1–5, with higher scores indicating greater motivation). The experimental group showed a 50.0% increase from pre-test to post-test, while the control group remained stable. The effect size (d = 2.26) at the post-test indicates a very large practical difference between the groups.
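The mediation paths in Figure 6 are typically estimated with the bootstrap approach of Hayes [67]: the indirect effect a·b (group → mediator → outcome) is resampled to obtain a percentile confidence interval. The sketch below is a generic stdlib illustration with synthetic variables, not the study's actual model or data.

```python
import random
import statistics

def ols2(x, m, y):
    """OLS slopes b1, b2 for y = b0 + b1*x + b2*m via the 2x2 normal equations."""
    mx, mm, my = statistics.fmean(x), statistics.fmean(m), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    smm = sum((mi - mm) ** 2 for mi in m)
    sxm = sum((xi - mx) * (mi - mm) for xi, mi in zip(x, m))
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    smy = sum((mi - mm) * (yi - my) for mi, yi in zip(m, y))
    det = sxx * smm - sxm ** 2
    return (smm * sxy - sxm * smy) / det, (sxx * smy - sxm * sxy) / det

def indirect_effect(x, m, y):
    """a*b: a is the slope of m ~ x, b the slope of m in y ~ x + m."""
    mx = statistics.fmean(x)
    a = sum((xi - mx) * mi for xi, mi in zip(x, m)) / sum((xi - mx) ** 2 for xi in x)
    _, b = ols2(x, m, y)
    return a * b

def bootstrap_ci(x, m, y, reps=2000, seed=1):
    """Percentile-bootstrap 95% CI for the indirect effect [67]."""
    rng = random.Random(seed)
    n = len(x)
    boots = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        boots.append(indirect_effect([x[i] for i in idx],
                                     [m[i] for i in idx],
                                     [y[i] for i in idx]))
    boots.sort()
    return boots[int(0.025 * reps)], boots[int(0.975 * reps)]
```

A mediation effect is supported when the bootstrap interval for a·b excludes zero; with several mediators (embodied cognition, load distribution, need satisfaction), each indirect path is tested this way.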

Citation: Lu, M.; Hu, Z. Leveraging Multimodal Information for Web Front-End Development Instruction: Analyzing Effects on Cognitive Behavior, Interaction, and Persistent Learning. Information 2025, 16, 734. https://doi.org/10.3390/info16090734
