1. Introduction
As digital innovation continues to redefine education, the deployment of advanced technological solutions has become a prerequisite for developing 21st-century competencies [1], transitioning from optional integration to an essential practice for effective instruction [2]. Nevertheless, instructional practice in Mexico remains largely grounded in traditional lecture-based approaches [3].
Fostering computational thinking and programming skills remains a substantial challenge for novice learners worldwide [4], as evidenced by persistent failure rates. This challenge is partly attributable to limitations in traditional instructional delivery formats. While traditional teaching methods and static materials such as books, notes, and slides can communicate introductory concepts, they fall short of effectively teaching computer programming, reinforcing the need for formats that allow learners to visualize dynamic program behavior and data flow [5,6]. In response to these limitations, video-based learning (VBL) has emerged as a promising approach to knowledge acquisition [7], proving particularly effective in education by addressing the specific need for dynamic visualization [8].
Important instructional delivery formats for leveraging VBL include flipped classrooms and massive open online courses (MOOCs) [9]. Flipped classroom approaches have been widely studied and integrated into courses. However, reported findings remain inconsistent, with many studies indicating statistically insignificant [10,11] or marginal gains [12,13]. MOOCs have been widely adopted at scale since their emergence in 2012 [14,15]. The global user base reached 220 million learners in 2021 [16], with computer science courses consistently ranking among the most popular offerings [17]. Despite this extensive reach, completion rates remain substantially lower than in traditional courses [18], often falling below 10% of total enrollments [19].
Simultaneously, the role of AI in educational settings is shifting from purely adaptive instruction to personalized content generation. For decades, research on artificial intelligence in education has largely focused on personalized and adaptive learning technologies, such as Intelligent Tutoring Systems, knowledge-tracing models, and recommendation systems, to tailor instruction to individual learner needs [20]. The emergence of generative AI (GenAI), however, has significantly expanded this scope [21]. Recent empirical data indicate that educators are predominantly adopting these technologies as powerful authoring tools to streamline workflows, with educational content creation and curriculum planning emerging as the most prominent use cases [22].
2. Related Works
The literature on blended learning and VBL in computer science education can be approached from two complementary perspectives: theoretical frameworks and empirical validation. While some works primarily focus on one of these dimensions, this section reviews studies across both perspectives, examining instructional design principles alongside measured learning outcomes and student attitudes.
Regarding video pedagogical characteristics, a prior meta-analysis [6] of 257 articles from computer science-related databases provides a holistic perspective on VBL. The study identifies two complementary taxonomies: one characterizes instructional video features, and the other structures video-related tools, learning tasks, and enabling technologies. Together, these taxonomies provide insights into integrating artificial intelligence methods into video-assisted learning environments. Among the identified features, the most impactful include interactive quizzes, in-video annotations, and animations.
Regarding the effectiveness of VBL in computer science, research reported in [23] considered the effects of live-coding-based instruction combined with student reflection annotations on program comprehension and coding performance, as well as whether such instruction influences student attitudes toward learning programming. The experiment found that observing the flowcharting and coding processes of experts significantly improved students' coding skills. Furthermore, reflective engagement with expert programming activities proved critical in supporting programming learning.
Complementing these findings, a similar study [24] evaluated a redesigned, blended-learning computer science course for electrical engineering over two semesters. The authors used a combination of inverted classrooms and graded hackathons to maintain the course pace and level students' progress. The results show that students reported higher levels of engagement, lower dropout rates, and improved examination performance.
A different study [25] explored the pedagogical quality of publicly available educational videos for teaching computing and programming concepts, including abstraction, algorithms, data structures, and programming paradigms. Custom-developed artificial intelligence tools were used to transcribe, index, and categorize videos. The selection criteria included platform, educational level, engagement indicators, and didactic structure. The analysis revealed that most videos are effective at teaching programming tools but often lack comprehensive explanations of foundational theories, prioritizing practical implementation over conceptual understanding. In contrast, videos that employ visual metaphors, structured explanations, and step-by-step problem-solving approaches were shown to better support conceptual learning.
Student engagement behaviors were also examined [26] in VBL by introducing a theory-grounded framework of active viewing, implemented through a video player designed to support active interaction. The study analyzed video interaction data collected from 460 undergraduate students. The findings indicate that students can develop active viewing strategies, such as rewinding, fast-forwarding, and transcript highlighting, to regulate their engagement.
Regarding the use of GenAI, most of the existing literature focuses on curriculum content generation [27], particularly instructional videos in the context of this study, with other work studying conversational uses of GenAI, such as educational chatbots. For example, a comparative study [28] of videos presented by a human instructor versus a synthetic virtual instructor, using pre- and post-learning assessments, reported no statistically significant differences in learning gains.
Similarly, a within-subjects study [29] in science teacher education evaluated the impact of AI-generated instructional videos, using pre-, post-, and transfer assessments to measure learning outcomes, task performance, and self-efficacy across two video formats: one with an embedded preview feature and one without. The results indicated that both video formats effectively supported self-efficacy, task performance, and knowledge retention; however, no statistically significant differences were observed between videos with and without preview features.
To reduce the instructor workload associated with creating instructional videos in university courses, a GenAI-based workflow was proposed [30] to automatically generate short introductory course videos from existing course descriptions. The workflow combines AI-generated scripts and visuals with text-to-speech narration and was evaluated through a field survey in which engineering instructors reviewed AI-generated videos for their own courses. Outcomes demonstrated improved efficiency in video production, while also revealing limitations in voice-narration naturalness and the risk of AI-generated misinformation, indicating the need for human oversight.
To improve VBL, a study [31] examined the impact of AI-based chatbot feedback and peer feedback integrated into online instructional videos on 144 pre-service teachers' learning performance and intrinsic motivation. The study used a pre- and post-learning assessment quasi-experimental design with two experimental groups (chatbot-based immediate feedback and delayed peer feedback) and a control group. The results indicated that both chatbot-based and peer feedback conditions led to higher learning performance and intrinsic motivation compared to traditional instructional video use, highlighting the potential of AI-powered feedback mechanisms to enhance VBL.
Recent work [32] has also explored adaptive instructional video systems for programming education that account for individual learner preferences. To evaluate the effectiveness of adaptive video-based instruction, a decision-tree-driven learning style model was implemented within an adaptive e-learning environment. A controlled experiment involving 195 first-year undergraduate students evaluated learning outcomes across three instructional conditions: no video instruction (control), traditional video-based learning, and adaptive VBL. Over a six-month semester, student performance scores and learner feedback were collected as evaluation metrics. Results indicated that students in the adaptive video-based condition achieved higher performance scores and reported more positive learning experiences than those in the traditional video and no-video conditions.
COVIA’s distinctive contribution lies in integrating and evaluating three complementary approaches: VBL, blended learning through coding exercises, and spatial contiguity design within a unified educational platform. The studies presented on VBL highlight various environments and platforms for teaching programming, reflecting the remarkable growth of this pedagogical approach; however, they do not offer a single environment for creating and running code while simultaneously presenting educational videos.
Table 1 highlights the main elements present in both the related works and in COVIA. In contrast to the reviewed studies, COVIA integrates four key components: VBL, blended learning strategies, a coding notebook, and the use of GenAI to create educational materials. The integration of these four elements underscores COVIA's distinctive contribution, since related works typically employ only a subset of them.
3. Materials and Methods
3.1. Instructional Design and Delivery Modality
The COVIA design is grounded in instructional design theory, specifically constructive engagement [33], cognitive load theory, and the spatial contiguity principle, optimizing both learner engagement and information processing. This approach employs a blended learning modality in which VBL serves as an additional home-based component, complementing in-person lectures and supporting self-paced learning.
While these pedagogical constructs are well established, their comprehensive integration into a single computational interface remains underexplored within programming education. Traditional VBL environments often restrict learners to passive or active modes within the ICAP framework [33], typically limiting them to watching content or controlling playback [26]. To address this gap, COVIA unifies these pedagogical constructs within a constructive VBL environment that explicitly maps interface elements to distinct engagement modes within a single visual field, thereby increasing engagement and mitigating split-attention effects.
3.1.1. Constructive Engagement
Learning is more effective when students are cognitively engaged. The ICAP framework [33] formalizes this principle by defining four levels of cognitive engagement: Interactive, Constructive, Active, and Passive [26].
The workspace is designed to incrementally advance learners through progressively deeper levels of cognitive engagement by integrating complementary instructional media within a unified interface where each component fulfills a distinct pedagogical function:
Passive Engagement (Static Markdown): Serves as an efficient medium for static information delivery through the integration of formatted technical text, diagrams, and syntax-highlighted code snippets.
Active Engagement (Instructional Video): Supports dynamic visualization and self-regulated learning [34] by allowing learners to interact with the video player through pausing, replaying, and verification of key logical segments [35]. This process eases the identification of essential algorithmic steps and supports internalization of code structure before implementation.
Constructive Engagement (Code Notebook): Enables the generation of new outputs and supports automatic validation using unit tests. Students should synthesize the information from the Markdown and Video modules to write, execute, and debug functional Java solutions.
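To make this constructive layer concrete, the following is a minimal sketch of how a notebook exercise might be validated with unit tests. It is illustrative only, not COVIA's actual test harness, and the exercise, class, and method names are hypothetical:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

// Hypothetical student solution: sum the first n positive integers.
class SumExercise {
    static int sumUpTo(int n) {
        int total = 0;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }
}

// Automated checks of the kind run when a student submits the exercise.
class SumExerciseTest {
    @Test
    void sumOfFirstFiveIntegersIsFifteen() {
        assertEquals(15, SumExercise.sumUpTo(5));
    }

    @Test
    void sumOfZeroTermsIsZero() {
        assertEquals(0, SumExercise.sumUpTo(0));
    }
}
```

Under this scheme, a submission is marked as completed only when all assertions pass, mirroring the automatic validation described above.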
3.1.2. Spatial Contiguity Design
COVIA aligns with the Spatial Contiguity Principle [36], which suggests that learning is more effective when corresponding words and images are presented in close spatial proximity rather than being separated on the screen [37]. In contrast, traditional programming education frequently violates this principle by requiring learners to constantly switch between multiple standalone tools and windows, such as a video player and an integrated development environment (IDE), thereby inducing a split-attention effect [38] and increasing unnecessary cognitive load. COVIA addresses this limitation by providing a unified workspace, illustrated in Figure 1, that integrates instructional sources (Markdown and video) with the constructive environment (a coding notebook) within a single visual field.
3.1.3. Complementary VBL
Evidence suggests that the effectiveness of VBL depends on how well the selected media form aligns with the type of knowledge being conveyed [39] and on the disciplinary domain in which the content is delivered. To maximize instructional effectiveness, COVIA uses VBL as an additional instructional modality rather than a replacement for traditional lectures, following cross-disciplinary analyses based on Biglan's hard–soft taxonomy [40]. Specifically, these findings suggest that while soft disciplines such as the humanities, psychology, and sociology tend to benefit when video replaces instruction, hard disciplines such as computer science and physics demonstrate greater learning gains when video is used in a supplemental role [41].
3.2. Curricular Design
The COVIA curriculum was developed through a comparative synthesis [42] of introductory programming curricula from three local higher-education institutions: Universidad Autónoma de Sinaloa (UAS), Universidad Autónoma de Occidente (UADEO), and Tecnológico Nacional de México, Instituto Tecnológico de Culiacán (TecNM-Culiacán). A systematic analysis was conducted to identify shared core topics, learning objectives, and pedagogical sequencing, resulting in a consolidated curricular baseline of 33 lessons organized into three thematic blocks, aligned with regional academic standards and designed to support transferability across institutional contexts.
COVIA lessons are structured instructional units composed of three main components:
Markdown Content: Static resources including structured explanatory text, diagrams, and syntax-highlighted code snippets.
Short Instructional Video: Recorded instructional videos incorporating diagrams, animations, and code, designed with a target duration of approximately 6 min, aligned with prior research indicating that shorter videos maximize learner engagement [43,44].
Coding Exercise: A formal problem specification accompanied by automated unit tests used to validate student solutions. Each lesson incorporates a dedicated exercise; consequently, the total number of exercises integrated into the COVIA platform amounts to 33.
Markdown content was generated with the support of GenAI systems, using existing instructional materials as contextual input and prompt guidance. Videos were developed with HeyGen (June 2025) following an outline derived from the generated Markdown. Additionally, AI-generated virtual tutors were included. This workflow aligns with emerging trends in the use of GenAI in education [21] as a tool for streamlining instructional design workflows and supporting content authoring at scale [22].
Figure 2 shows a clip from a video featuring a virtual tutor created using GenAI.
Figure 3 shows an exercise related to language elements called “Analyzing a Sentence.” The exercise consists of obtaining a sentence through keyboard input and analyzing it using methods from the Java String class. The instructions are divided into steps so that the student can work incrementally through the coding required to complete the exercise and pass the automated tests.
Each COVIA exercise includes input/output examples, so that the student can see exactly what the program is expected to display for the exercise to be marked as completed.
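As an illustration, a solution to “Analyzing a Sentence” might look like the following sketch, which reads a sentence from the keyboard and applies standard Java String methods; the exact output format required by the automated tests is the one specified in Figure 3, so the labels printed below are hypothetical:

```java
import java.util.Scanner;

// Sketch of a possible solution to "Analyzing a Sentence".
public class AnalyzeSentence {
    public static void main(String[] args) {
        Scanner input = new Scanner(System.in);
        String sentence = input.nextLine();                // Step 1: read the sentence

        // Further steps: analyze the sentence with String methods.
        System.out.println("Length: " + sentence.length());
        System.out.println("Words: " + sentence.trim().split("\\s+").length);
        System.out.println("Uppercase: " + sentence.toUpperCase());
        System.out.println("Contains 'Java': " + sentence.contains("Java"));
    }
}
```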
The exercises are designed for students who are beginning their studies in programming; therefore, the level of difficulty corresponds to an introductory or beginner stage. Appendix A contains additional exercises.
While the use of GenAI within the COVIA workflow significantly accelerates content creation, it also introduces inherent risks that require strict validation. Primary concerns include the potential for AI-generated misinformation or hallucinations, in which outputs may appear syntactically correct yet contain subtle logical or technical errors, as well as algorithmic bias that can affect content neutrality. Additionally, over-reliance on automated generation may diminish the naturalness of content and weaken the emotional resonance of instructional delivery. To mitigate these challenges, COVIA adopts a human-in-the-loop process for iterative refinement and validation, as shown in Figure 4, in which subject-matter experts (such as introductory programming professors) serve as the final judges of content quality and accuracy.
The lesson validation and development process was conducted by groups of varying academic levels, as detailed in Table 2. The first group consists of individuals with a bachelor's degree; this group handled the initial tasks in the process and compiled the information used to create the prompts. The second group consists of professors with more than 20 years of experience teaching programming. Their main activities included reviewing and submitting the prompt, conducting pedagogical and technical reviews, approving the lessons, and resubmitting prompts that did not meet the technical and pedagogical review criteria. The group of experts holding doctoral degrees, including professors with more than 30 years of experience, was responsible for reviewing and refining the approved lessons. A total of eight individuals participated in the entire process. For each stage of the process, review documents were created that included a series of checklists allowing the experts to verify the information necessary for the proper development of the lesson content.
The LLMs used to create content in COVIA were GPT-4.1, GPT-5, and Gemini-2.5-Pro. A prompt was designed to create the exercises. The prompt consists of three main parts: General Description, Lesson Context, and Instructions. The lesson context from which the exercise will be generated consists of the lesson title and an introduction.
Figure 5 shows the complete prompt to create an exercise for the lesson on random numbers.
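Schematically, the three-part structure can be assembled as in the sketch below; the wording of each part is a hypothetical placeholder, and the actual prompt text is the one shown in Figure 5:

```java
// Schematic assembly of the three-part exercise-generation prompt.
// The strings are illustrative placeholders, not COVIA's actual wording.
public class ExercisePromptBuilder {
    public static String build(String lessonTitle, String lessonIntro) {
        String generalDescription =
                "You are an instructional designer creating a beginner Java coding exercise.";
        String lessonContext =
                "Lesson title: " + lessonTitle + "\nIntroduction: " + lessonIntro;
        String instructions =
                "Produce a problem statement with step-by-step instructions, "
                + "input/output examples, and unit tests that validate the solution.";
        return generalDescription + "\n\n" + lessonContext + "\n\n" + instructions;
    }

    public static void main(String[] args) {
        System.out.println(build("Random Numbers",
                "Generating pseudo-random values in Java."));
    }
}
```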
COVIA Teaching Process
The teaching process of COVIA consists of a series of steps presented below:
The student selects the lesson.
The student watches the lesson video, which includes definitions, syntax, code examples, and short exercises.
The student subsequently uses the notebook to implement the code corresponding to the examples and exercises covered in the lesson.
The environment also provides additional examples, notes, and links to relevant resources for further consultation.
At this stage of the learning process, students are required to implement the lesson exercise through coding. Each exercise provides a detailed specification of the elements that need to be developed.
Once the student has completed the coding exercise, the solution is submitted for evaluation through automated tests, after which the lesson is marked as completed.
In addition to watching the lesson video, students can practice by coding the examples presented both in the videos and in the lesson description. Completing the lesson exercise is mandatory for the lesson to be marked as completed, allowing the student to progress to subsequent lessons. In this way, COVIA enables learners to engage with simple, practical examples, thereby reinforcing the teaching process through hands-on practice.
3.3. Selection Process and Criteria for Participant Groups
After completing the admissions process for new students in the computer systems engineering program, a list of accepted students from various high schools across the state is compiled. These students have taken elective courses across different fields of study, demonstrating a diverse set of skills and competencies. From this list of accepted students, groups are formed, and students are assigned to these groups at random. Students are distributed evenly so that, whenever possible, each group has the same number of students.
Following the creation of the groups, instructors were assigned to the courses according to methodological criteria, taking into account each teacher’s schedule and availability.
During the August–December 2025 semester, ten groups were formed for the Introduction to Programming course. For the present study, five of these groups were selected. The selection considered instructors who met two main criteria: (1) holding a postgraduate degree in the field of computing and (2) having several years of experience teaching programming courses. The study was conducted with these five groups, randomly assigning three to the experimental group and two to the control group. The participants in the experiment were between 17 and 20 years old and enrolled in the Introduction to Programming course, the first in the school's programming curriculum.
Figure 6 illustrates the experimental design adopted in the study. Experimental groups (EG1–EG3) used the COVIA platform throughout the instructional units, whereas control groups (CG1–CG2) followed a traditional instructional approach. Each curricular unit included a pre-test and a post-test to assess student performance.
3.4. Experimental Design and Evaluation Methodology
To validate the effectiveness of the COVIA environment, this study uses a longitudinal quasi-experimental design to measure learning gains and a survey-based analysis to assess technology acceptance, as illustrated in Figure 7. The evaluation was conducted over a full academic semester and involved undergraduate students enrolled in the Computer Systems Engineering program.
Three targeted interventions were implemented, each aligned with one of the core modules of the introductory Java curriculum: (1) Language Elements, (2) Selective Structures, and (3) Iterative Structures. For each intervention, a 10-item multiple-choice pre-test and post-test were designed to establish baseline knowledge and quantify learning gains.
To evaluate user experience and potential adoption, the study employed two complementary instruments applied at the final stage of the evaluation. Technology acceptance was measured using a Likert-scale questionnaire based on the Technology Acceptance Model (TAM) [45], assessing perceived usefulness (PU), perceived ease of use (PEU), perceived enjoyment (PE), attitude toward use (ATU), and intention to use (ITU).
Table 3 details the specific instruments employed: multiple-choice assessments were performed before and after each intervention to quantify learning gains in syntax and logic. Meanwhile, a TAM questionnaire and an open-ended survey were deployed at the conclusion of the study to capture student perceptions.
The study involved 147 students from TecNM-Culiacán enrolled in five introductory programming courses. Participants were assigned to two primary experimental conditions to evaluate the platform’s efficacy relative to traditional instructional baselines:
Experimental Groups (COVIA): Three groups (Groups 1, 2, and 3) engaged with the full COVIA learning environment, including instructional text, interactive video content, and code development with automated unit-test-based validation.
Control Groups (Traditional): Two groups (Groups 4 and 5) served as the control baseline and received conventional instruction using standard teaching materials, such as lecture slides, textbooks, and standalone IDEs.
4. Results
This section presents the quantitative results of the study. Data were collected from five groups of engineering students: three experimental groups (EG) received traditional and video-based lectures via COVIA in a blended learning setting, and two control groups (CG) received only conventional instruction. Reported scores represent group averages on a 0–100 scale. Cases with incomplete paired data (i.e., students who had only a pre-test score or only a post-test score) were excluded from the analysis to ensure accurate paired comparisons. To evaluate the effectiveness of the COVIA environment as compared to traditional instruction, statistical analyses were conducted across three intervention phases: Phase 1 (Language Elements), Phase 2 (Selective Structures), and Phase 3 (Iterative Structures), comparing learning outcomes between the experimental and control groups.
4.1. Baseline Equivalence
To assess baseline equivalence, a Mann–Whitney U test was performed on the initial pre-test scores. To account for the frequency of identical scores inherent in a 10-item assessment, the p-value was adjusted for ties. The results indicated no statistically significant difference between the Control and Experimental groups (p > 0.05), confirming equivalent foundational knowledge at the start of the study.
4.2. Normality Testing and Selection of Analytical Tools
To ensure the reliability of the findings, normality assumptions were first evaluated using the Shapiro–Wilk test, which revealed significant deviations in the pre-test scores across all three phases (p < 0.05). Given these violations and the unequal sample sizes, non-parametric tests were selected as the primary analytical tools. To assess intra-group improvements (pre-test vs. post-test), the Wilcoxon Signed-Rank Test was used. To evaluate differences in learning gains between the two independent groups, the Mann–Whitney U Test was employed.
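For illustration, the two non-parametric tests can be reproduced as in the sketch below, which uses the Apache Commons Math library; the study does not specify its statistical software, and the score arrays here are hypothetical placeholders rather than the collected data:

```java
import org.apache.commons.math3.stat.inference.MannWhitneyUTest;
import org.apache.commons.math3.stat.inference.WilcoxonSignedRankTest;

// Sketch of the non-parametric analysis pipeline (Apache Commons Math 3).
public class NonParametricAnalysis {
    public static void main(String[] args) {
        // Hypothetical paired scores for one group in one phase (0-100 scale).
        double[] pre  = {40, 55, 50, 60, 45, 70, 65, 50};
        double[] post = {55, 60, 70, 75, 50, 85, 80, 65};

        // Hypothetical per-student learning gains (post-test minus pre-test).
        double[] controlGains      = {5, 0, 10, -5, 5, 0};
        double[] experimentalGains = {15, 10, 20, 5, 25, 10};

        // Intra-group improvement: Wilcoxon Signed-Rank on paired scores
        // (false = normal approximation rather than the exact p-value).
        double pWithin = new WilcoxonSignedRankTest()
                .wilcoxonSignedRankTest(pre, post, false);

        // Inter-group comparison: Mann-Whitney U on independent gains.
        double pBetween = new MannWhitneyUTest()
                .mannWhitneyUTest(controlGains, experimentalGains);

        System.out.printf("Wilcoxon p = %.4f, Mann-Whitney p = %.4f%n",
                pWithin, pBetween);
    }
}
```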
Finally, to examine interaction effects and corroborate the non-parametric results, a Mixed ANOVA was conducted. This complementary analytical strategy was employed to ensure robust, consistent findings across different statistical models, thereby reinforcing the validity of the conclusions despite distributional limitations.
4.3. Intra-Group Learning Assessment
To assess whether each instructional method produced significant learning improvements within its own group, the Wilcoxon Signed-Rank Test was applied to pre-test and post-test scores for each phase; the results are presented in Table 4.
The Experimental Group (COVIA) was the only group to demonstrate significant improvement across all three phases (p < 0.05 in Phases 1, 2, and 3), with medium-to-large effect sizes in each phase. Notably, the Experimental Group achieved a mean score increase exceeding 4 points (on a 0–100 scale) across all three phases.
In contrast, the Control Group stagnated in Phase 1 and Phase 3 (p > 0.05), failing to achieve statistically significant growth and exhibiting small to minimal effect sizes despite slight increases in mean scores. The Control Group demonstrated a significant improvement only during Phase 2 (p < 0.05), though the effect size remained below the medium threshold. Furthermore, the magnitude of improvement for the Control Group never exceeded 10 points in any phase, remaining below a five-point gain in Phases 1 and 3.
4.4. Inter-Group Learning Assessment
To assess differences between the control and experimental groups, the Mann–Whitney U Test was applied to compare the learning gains (calculated as the post-test score minus the pre-test score) from each phase between the Experimental and Control groups. The results are presented in Table 5 and in Figure 8 and Figure 9, which show the average pre-test and post-test scores, respectively.
In Figure 8, both groups started in Phase 1 with relatively homogeneous scores and no statistically significant differences, as outlined in Section 4.1, indicating a comparable initial understanding of the topics. In contrast, Figure 9 highlights the divergence in post-test performance. As the semester progressed, a distinct performance gap emerged, with the Experimental group consistently outperforming the Control group. This trend became most pronounced in the final phase, covering iterative structures.
In Phase 1 (Language Elements), the Mann–Whitney U test revealed a statistically significant difference in learning gains between the instructional groups (p < 0.05), characterized by a small-to-medium effect size, with the Experimental Group recording a notably higher mean rank and mean gain than the Control Group.
Regarding Phase 2 (Selective Structures), the statistical analysis showed no significant difference in the learning gains between the two groups (Z = −1.34, p > 0.05).
For Phase 3 (Iterative Structures), a statistically significant difference was identified (p < 0.05), accompanied by a near-medium effect size, with the Experimental Group obtaining a significantly higher mean rank than the Control Group.
4.5. Mixed ANOVA: Interaction Effects and Global Analysis
A 3 (Time: Phase 1, Phase 2, Phase 3) × 2 (Assessment: pre-test, post-test) × 2 (Group: CG, EG) Mixed ANOVA indicated significant main effects for Time, Assessment, and Teaching Modality (all p < 0.05), indicating that scores improved over time and that the Experimental group achieved higher average performance. Crucially, a significant Assessment × Modality interaction (p < 0.05) was identified, confirming that the magnitude of learning gain depended on the instructional method, with the Experimental group showing the largest increase. No statistically significant interactions were found for Time × Modality or Time × Assessment × Modality (p > 0.05), indicating that the instructional advantage established early was stable across all phases.
4.6. Technology Acceptance Model
To validate the internal consistency of the survey instrument and assess user perceptions, a reliability analysis was conducted, along with descriptive statistics for each construct.
Table 6 presents the five constructs, along with the three questions applied to each, together with the mean and standard deviation (SD) of each question. The mean values range between 3.35 and 4.24, while the standard deviations lie between 0.70 and 1.21. From these data, it can be inferred that the highest means (values greater than or very close to 4) are found in Perceived Usefulness, Perceived Ease of Use, and Attitude Toward Use, indicating that COVIA is perceived as useful and easy to use, and that students hold a positive attitude toward COVIA as a support tool. The mean for Perceived Enjoyment is lower than 4 but higher than 3.5, suggesting that COVIA is motivating; however, not all students consider it more enjoyable than traditional methods, given that the standard deviation is greater than 1. The Intention to Use construct shows mean values lower than 3.5 in two of its questions and standard deviations greater than 1 in two of its questions, which indicates that although students value the use of COVIA, not all plan to continue using it or to recommend its use.
The results, detailed in Table 7, demonstrate robust internal consistency, with both McDonald's Omega (ω) and Cronbach's Alpha (α) coefficients exceeding the recommended threshold of 0.70 across all dimensions. Descriptive findings indicate a generally positive acceptance of the platform; specifically, Perceived Ease of Use (PEU) attained the highest mean score, suggesting that the unified interface effectively minimized technical friction for novice learners. Similarly, Perceived Usefulness (PU) and Attitude Toward Use (ATU) showed strong positive agreement, indicating that students consciously recognized the pedagogical value of the constructive environment. While Intention to Use (ITU) yielded a slightly lower mean compared to the other constructs, the overall data reflect a favorable reception of the proposed blended learning model.
To evaluate the relationship between the constructs and the hypotheses, the Pearson correlation coefficient was calculated for each hypothesis (Figure 10). The highest coefficient is found in H4 at 0.852, followed by H3 at 0.843, indicating that perceived usefulness and perceived enjoyment positively influence attitude toward use: students who enjoy COVIA and find it useful hold a more positive attitude toward it. The next highest values are found in H7 (0.706), H6 (0.685), and H5 (0.682); these three values indicate moderate to strong positive correlations, meaning that perceived enjoyment, perceived usefulness, and attitude toward use are associated with usage intention, which suggests that users who enjoy COVIA and find it useful are more likely to continue using it. Finally, hypotheses H2 (0.383), H1 (0.283), and H8 (0.260) show weak correlations. In response, COVIA is being adapted to better tailor instruction, and further studies are planned to strengthen the relationships between these constructs.
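As a sketch of how such coefficients are obtained, the following computes Pearson's r between two construct scores using the Apache Commons Math library; the per-student values are hypothetical placeholders, not the survey data behind Figure 10:

```java
import org.apache.commons.math3.stat.correlation.PearsonsCorrelation;

// Sketch: Pearson correlation between two TAM construct scores.
public class TamCorrelation {
    public static void main(String[] args) {
        // Hypothetical per-student construct means (1-5 Likert scale).
        double[] perceivedUsefulness = {4.3, 3.7, 4.0, 4.7, 3.3, 4.0};
        double[] attitudeTowardUse   = {4.0, 3.3, 4.3, 4.7, 3.0, 4.3};

        double r = new PearsonsCorrelation()
                .correlation(perceivedUsefulness, attitudeTowardUse);
        System.out.printf("Pearson r (e.g., PU -> ATU) = %.3f%n", r);
    }
}
```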
5. Discussion
This study examines the impact of an integrated learning environment, comprising video-based instruction, blended learning through coding exercises, and a spatial contiguity design, used as an addition to traditional instructional methods in introductory programming. In addition, GenAI was used to create videos and educational content.
Phase-level analysis across the three instructional interventions reveals differences in learning outcomes between the Experimental and Control groups. The Experimental group achieved consistent and statistically significant gains across all curricular topics, whereas the Control group’s progress was more variable, with improvements confined to specific types of content.
Longitudinal data suggest that COVIA helped during the initial acquisition of syntax. In the first phase, covering language elements, novice learners in the Experimental group, using a blended approach that integrated COVIA with traditional instruction, overcame initial barriers significantly better than the Control group. While the Control group showed no statistically significant improvement between pre- and post-tests (p > 0.05), the Experimental group achieved highly significant gains. This divergence was confirmed by the Mann–Whitney U test (r = 0.22), suggesting that integrating COVIA into the learning process may lower the barrier to introductory language concepts.
As the curriculum advanced to selective structures in Phase 2, the performance gap narrowed, representing the only point of convergence between the two methodologies. The Control group achieved its only significant intra-group improvement of the semester during this phase (p < 0.05), although the Experimental group maintained a higher level of significance and a larger effect size.
However, the Phase 2 inter-group comparison revealed no significant difference in learning gains between the blended and traditional-only models (p > 0.05). This suggests that traditional instructional methods remain adequate for teaching logical branching. The topic may be intuitive enough for novice learners, meaning that standard instructional methods could be as effective as specialized visual aids at this particular curricular stage.
The difference in performance became most apparent in the final phase, which covered iterative structures, a topic historically identified as difficult for novices. Here, the Control group stagnated, failing to achieve significant learning gains (p > 0.05), implying that traditional instruction could not support the cognitive leap required for understanding iteration. In contrast, the Experimental group sustained its trajectory of significant improvement (p < 0.05), with a significant inter-group difference (p < 0.05). These findings may suggest that adding the COVIA environment supports learning as curricular complexity increases.
These quantitative learning gains are further contextualized by the TAM results, where high scores in Perceived Usefulness (PU) and Perceived Ease of Use (PEU) suggest that students consciously recognized the value of the platform. In five of the eight hypotheses, correlation values of approximately 0.7 or higher were found, whereas the remaining three ranged between 0.26 and 0.38.
Limitations and Future Work
The interpretation and generalization of the findings must be approached with caution due to several limitations. First, the experiments were conducted with pre-existing groups that displayed a variety of prior knowledge backgrounds. This limits the ability to determine whether the different pre-existing conditions caused the observed effects and requires caution when interpreting group differences. Although similar results were obtained in the pre-tests between the control and experimental groups, future interventions should consider identifying groups with prior knowledge in programming.
Second, it is pertinent to note how the assessments were conducted. Although multiple-choice examinations with code examples were administered, this strategy may not be sufficient to accurately reflect learning gains, as coding practice constitutes a more appropriate means of evaluating programming competencies. Although students did engage in practical exercises, future experiments should incorporate assessments that include code writing to more accurately evaluate programming competencies.
Third, the experiments were conducted within a single semester and at only one institution, which may limit the ability to generalize the results to other contexts, as the specific characteristics of the institution (student profile, academic environment, available resources) may have influenced the findings. Since the Tecnológico Nacional de México is an institution with nationwide presence, future interventions should integrate groups from other technological institutes across the country and, if possible, extend to other higher education institutions, given that the lessons were designed with the thematic content of several universities in mind. In addition, experiments may extend beyond a single semester.
Fourth, it must be considered that a “teaching-effect advantage” could arise, since instructors may influence the results obtained. To limit this effect, the process described in Section 3.3 was used to select the groups and participating instructors for the experiment. Notably, the instructors have access to standardized course content, including exercises, practice activities, evaluation rubrics, and bibliographic references, such that courses are delivered using this standardized material.
Finally, although generative AI facilitates the rapid creation of educational content, such content tends to lack critical reflection and pedagogical depth. This issue raises concerns regarding the accuracy of information, the development of higher-order competencies, and the potential for excessive dependence. Therefore, it is imperative to address these challenges through meticulous instructional design and effective human oversight.
6. Conclusions
This paper presented COVIA, a constructive VBL environment grounded in the ICAP framework and Cognitive Load Theory, designed to foster programming competencies within a blended learning context. Evidence suggests a positive impact of integrating constructive VBL as a supportive tool within a blended instructional framework for introductory programming education.
Compared with related experimental studies that emphasize lower levels of learner engagement, such as active [35] or passive [46] behaviors, COVIA adopts a constructive (coding) learning approach, which likely accounts for the observed differences in pre- and post-assessment learning gains [35,46]. In addition, evidence based on the TAM indicates that student perceptions remained broadly positive, underscoring the acceptance and perceived usability of the tool.
In addition, the use of GenAI within a human-in-the-loop content authoring workflow significantly accelerated the development of curricular materials, including instructional markdown files, video outlines, and video content, which were subsequently validated by programming domain experts.
While traditional instruction may be sufficient for intermediate logical concepts, it provides weaker support for initial syntax acquisition and complex algorithmic thinking, gaps that COVIA addresses by providing a constructive environment that promotes improved learning progress throughout the introductory curriculum.
Future development of COVIA will focus on enhancing learner engagement by incorporating in-video interaction mechanisms and improving the integration of GenAI components. One planned addition is a video annotation overlay, designed to further mitigate the split-attention effect by embedding contextual notes and visual highlights directly within the video frame [23]. In addition, formative video quizzes will be incorporated at strategic points in the instructional sequence to serve as conceptual checkpoints. These quizzes will gate progression until sufficient understanding is demonstrated, encouraging active viewing and enabling the early identification of recurring misconceptions, an approach commonly adopted in MOOC platforms.
The most substantial evolution of COVIA involves advancing from a Constructive to an Interactive level within the ICAP framework [33] by introducing LLM-based pedagogical conversational agents. These agents will gather context from the learning environment, such as video interactions and the state of the coding exercise, to provide Socratic support [46], engaging in collaborative dialogue [47] rather than providing direct solutions.