Evaluating the Effectiveness of AI-Supported Digital Training: Implications for Organizational Learning and Decision-Making

Kašiković, Nemanja; Dedijer, Sandra; Zeljković, Željko; Glušac, Dragana; Premčevski, Velibor; Anđelković, Aleksandar S.; Tasić, Nemanja

doi:10.3390/admsci16060246


by Nemanja Kašiković ¹, Sandra Dedijer ¹, Željko Zeljković ¹,*, Dragana Glušac ², Velibor Premčevski ², Aleksandar S. Anđelković ¹ and Nemanja Tasić ¹

¹ Faculty of Technical Sciences, University of Novi Sad, 21000 Novi Sad, Serbia
² Technical Faculty “Mihailo Pupin”, University of Novi Sad, 23000 Zrenjanin, Serbia
* Author to whom correspondence should be addressed.

Adm. Sci. 2026, 16(6), 246; https://doi.org/10.3390/admsci16060246

Submission received: 24 April 2026 / Revised: 16 May 2026 / Accepted: 18 May 2026 / Published: 22 May 2026


Abstract

In contemporary organizations, digital learning environments and AI-supported instructional modalities play an increasingly important role in workforce upskilling and operational efficiency. Despite growing investments in video-based learning and AI-generated instructional agents, empirical evidence on their effectiveness remains inconclusive. This study examines whether different digital learning modalities influence skill acquisition, task performance, retention, and user perceptions in a simulated work-related context. An experimental study was conducted with 65 participants assigned to one of three learning conditions: static instructional material, video-based instruction with human narration, and video-based instruction with an AI-generated avatar. Performance was assessed through a pretest–posttest design, a practical task simulating a typical data-processing activity, and a delayed retention test after seven days. Participants also evaluated the learning experience in terms of clarity, engagement, and overall effectiveness. The results revealed no statistically significant differences between instructional modalities in knowledge acquisition, task performance, or retention. Similarly, no statistically significant differences were observed in participants’ self-reported ratings. However, qualitative findings suggested that some participants perceived the AI-generated avatar as somewhat distracting, despite generally positive evaluations of the video-based formats. These findings did not provide evidence that more technologically advanced and resource-intensive learning formats led to superior performance outcomes in the present sample. The findings highlight the importance of instructional design quality over technological complexity and point to a potential mismatch between user preferences and actual performance. From a management perspective, the results raise relevant questions regarding the cost-effectiveness of AI-supported learning solutions and provide evidence-based insights for decision-making in organizational learning and digital transformation strategies.

Keywords: digital training; AI-supported instruction; decision-making; cost-effectiveness


1. Introduction

In contemporary organizational environments, continuous learning and digital upskilling are essential for maintaining competitiveness and adapting to rapidly evolving technological demands. Organizations increasingly rely on digital learning solutions to enhance employee competencies, improve productivity, and support operational efficiency. Recent organizational training research additionally emphasizes the importance of responsible and evidence-based implementation of AI-supported learning systems, particularly regarding their practical effectiveness, transparency, and managerial implications (Chen, 2024). Among these solutions, video-based instruction and, more recently, artificial intelligence (AI)-supported learning formats have attracted significant attention as scalable and flexible approaches to workforce development (Wiafe et al., 2025; Park, 2024; Bautista et al., 2024; Lin & Yu, 2024; Brynjolfsson et al., 2023; W. Xu & Ouyang, 2022; Cavanagh & Kiersch, 2022).

The growing adoption of these technologies is largely driven by the assumption that more advanced and interactive learning formats lead to better learning outcomes, higher engagement, and improved performance. In particular, video-based instruction is often considered more effective than static materials because it can demonstrate procedures dynamically and support the acquisition of practical skills (Ghilay, 2025; Dipon & Dio, 2024). Similarly, AI-generated instructional agents are promoted as innovative tools capable of enhancing learner interaction and creating a sense of social presence within digital learning environments (Atabekova et al., 2026; Alam & Mohanty, 2023).

AI-powered virtual reality (VR) and augmented reality (AR) learning environments offer immersive, practice-oriented learning opportunities (Al-Ansi et al., 2023; Xie et al., 2021), yet their high implementation costs, technical requirements, and limited accessibility raise concerns regarding their scalability and cost-effectiveness in organizational contexts (Blackburn, 2023).

Despite these trends, empirical findings regarding the effectiveness of different instructional modalities remain inconsistent. While some studies report positive effects of video-based learning on academic performance and engagement (Shen, 2024; Noetel et al., 2021), others suggest that such benefits may be context-dependent and influenced by factors such as task complexity, learner characteristics, and instructional design quality (Navarrete et al., 2025; Trenholm & Marmolejo-Ramos, 2024). This inconsistency raises a relevant question from a management perspective: do more resource-intensive and technologically advanced learning formats actually lead to improved performance outcomes, or do they primarily enhance users’ subjective experience without measurable gains in effectiveness?

The integration of AI into digital learning environments adds further complexity. AI-generated avatars and virtual instructors are increasingly used to simulate human presence in instructional materials, aiming to improve engagement and perceived interactivity. Research suggests that such features may influence learning through social and cognitive mechanisms, including increased motivation and perceived clarity (T. Xu et al., 2025; Tan, 2024; Beege et al., 2023). However, other findings indicate that AI-generated instructors may produce similar learning outcomes to human instructors, while potentially reducing perceived naturalness or increasing cognitive distraction (Lin & Yu, 2025; Arkün-Kocadere & Çağlar-Özhan, 2024). This indicates that the role of AI in instructional design remains insufficiently understood.

From an organizational perspective, this presents a significant challenge. Investments in digital learning are often justified by expectations of improved performance and efficiency, yet the actual return on these investments remains uncertain. In particular, the potential mismatch between perceived effectiveness and objective performance outcomes may lead to suboptimal decision-making in the selection and implementation of learning strategies. Understanding this relationship is therefore essential for developing evidence-based approaches to organizational learning and digital transformation. Recent reviews further suggest that AI is increasingly reshaping organizational learning and knowledge-management processes, requiring organizations to critically evaluate how AI-supported systems influence learning effectiveness and strategic decision-making (Litvinenko, 2026).

In response to these challenges, this study examines the effects of three instructional modalities—static instructional materials, video-based instruction with human narration, and video-based instruction with an AI-generated avatar—on knowledge acquisition, task performance, and retention in a controlled experimental setting. In addition to objective performance measures, the study includes subjective evaluations of the learning experience, allowing for analysis of potential discrepancies between perceived and actual effectiveness.

Accordingly, this study addresses the following research questions:

RQ1. Do different digital instructional modalities influence immediate knowledge acquisition in a task-oriented learning context?
RQ2. Do different instructional modalities influence procedural task performance and knowledge retention over time?
RQ3. What is the relationship between subjective evaluations of instructional effectiveness and objective performance outcomes?
RQ4. How do participants qualitatively perceive engagement, clarity, and usability across different instructional modalities?

The contribution of this study is threefold. First, it provides empirical evidence on the comparative effectiveness of different digital learning formats in a task-oriented context relevant to workplace activities. Second, it identifies boundary conditions under which more advanced and resource-intensive learning modalities may not yield measurable performance benefits. Third, it offers practical implications for organizational decision-making by highlighting a potential divergence between perceived and actual effectiveness of learning solutions, thereby informing more cost-effective and evidence-based approaches to digital learning and AI adoption.

2. Theoretical Framework and Hypothesis Formulation

2.1. The Cognitive Theory of Multimedia Learning

The rapid development of digital technologies has transformed educational practices, shifting learning environments from traditional static knowledge transmission toward multimedia-based instructional approaches. Multimedia learning occurs when learners construct meaningful mental representations by integrating verbal and visual information presented through words, images, animations, or other multimodal instructional formats (Mayer, 2002a). The cognitive theory of multimedia learning (CTML), which seeks to explain how people learn academic material from words and graphics (Mayer, 2024; Park, 2024), posits that learning from multimedia materials is governed by three core assumptions: dual channels, limited cognitive capacity, and active processing (AlShaikh et al., 2024; Cavanagh & Kiersch, 2022). Learners process verbal and visual information through separate cognitive channels, each with limited capacity, and meaningful learning occurs when learners actively select, organize, and integrate information (Shamim, 2018).

The modality principle (Mayer, 2024) suggests that presenting verbal information as spoken narration rather than on-screen text can reduce cognitive overload in the visual channel and enhance learning outcomes. When textual explanations are delivered visually alongside graphics, both streams compete for processing within the visual channel, potentially increasing extraneous cognitive load. In contrast, distributing information across auditory and visual channels, such as combining narrated explanations with dynamic screen demonstrations, should facilitate more efficient processing and integration (Mayer, 2024).

Beyond modality effects, the social agency theory of multimedia learning proposes that socially engaging cues such as conversational tone, human voice, or embodied agents can foster greater cognitive engagement and generative processing (Mayer, 2024). Bechtold (2023), in research within the field of educational psychology, suggests that multimedia materials combining speech and visual elements, together with the use of social cues in communication, may contribute to a better understanding of educational content and greater satisfaction with learning.

It is important to emphasize that CTML recognizes a clear distinction between immediate performance and long-term retention. While multimedia advantages may appear in immediate post-tests, retention after a delay reflects the durability of mental representations and the stability of knowledge integration (Mayer, 2002a, 2002b). From an organizational perspective, these distinctions are particularly relevant because decisions regarding digital learning investments are often based on assumptions about training efficiency, long-term knowledge retention, and workforce performance outcomes. Consequently, instructional design choices represent not only pedagogical decisions but also organizational decisions related to resource allocation and training effectiveness.

2.2. Video-Based Learning Materials Effectiveness

Research shows that instructional videos can effectively support the learning process and, in certain contexts, lead to better learning outcomes compared to traditional approaches or static textual materials (Ashrafi & Hosna, 2025; Noetel et al., 2021). Their effectiveness is often explained by the possibility of presenting authentic demonstrations of procedural skills, allowing learners to control the pace of learning, and enabling instructional content to be designed in accordance with multimedia learning principles. Video materials also allow for a more flexible and scalable approach to learning content across different temporal and technological contexts (Noetel et al., 2021).

An analysis of empirical studies published between 2019 and 2023 shows that video instruction in science and mathematics achieves a large positive effect (Dipon & Dio, 2024). Similar results have been obtained within the flipped classroom model, where video materials can support more active and deeper learning (Bobkina et al., 2025; Shen, 2024). Bland et al. (2024) provide strong empirical support for applying the principles of the cognitive theory of multimedia learning to the design of instructional materials in medicine: learners who used materials designed according to these principles achieved better results on knowledge tests and showed higher levels of situational interest during learning.

In software-learning environments, video materials can facilitate understanding of procedural operations compared to static textual explanations, contribute to greater cognitive engagement, and support the formation of mental models required for applying knowledge in practical tasks (Ghilay, 2025).

Truss et al. (2024) show that instructor-generated video content can contribute to various forms of engagement, including behavioral, cognitive, and affective dimensions of learning. At the same time, analyses of educational practices indicate that engagement largely depends on contextual factors such as schedules of obligations, cognitive demands of instructional content, individual characteristics, the way video materials are implemented, educational level, and the broader educational context (Truss et al., 2024; Lin & Yu, 2024; Trenholm & Marmolejo-Ramos, 2024).

The technological characteristics of video content can foster a sense of social presence and emotional engagement and contribute to increased motivation for learning and the development of practical skills (Lin & Yu, 2024), but they may also lead to decreased interest if not adequately aligned with learners’ needs (Truss et al., 2024; Khilya et al., 2024).

An analysis of numerous studies has identified key characteristics of video materials that may influence learning effectiveness, including audio and visual elements, textual components, instructor behavior, learners’ activities during viewing, interactive features, production style, and instructional design principles (Navarrete et al., 2025).

Recent studies comparing different digital technologies in education indicate that their effects on learning outcomes are not always consistent. Studies analyzing the influence of platforms such as YouTube, as well as immersive technologies such as augmented reality (AR) and virtual reality (VR), show that more advanced technologies do not necessarily lead to better learning outcomes compared to simpler digital media (Wiafe et al., 2025; Çeken & Taşkın, 2025).

Research grounded in cognitive load theory also points to potential advantages of video instruction in the learning process. Empirical findings show that video tutorials may reduce intrinsic cognitive load while simultaneously increasing germane cognitive load compared to traditional instructional approaches (Fan et al., 2024). Such redistribution of cognitive resources may facilitate deeper information processing and more effective knowledge acquisition, particularly in situations involving demonstrations of procedural steps and work in digital environments (Fan et al., 2024). Previous findings indicate that video tutorials may lead to greater engagement of cognitive resources compared to traditional instructional methods without video materials (Fan et al., 2026a). Furthermore, video-based instruction may enable a more stable level of cognitive activity during task performance (Fan et al., 2026b).

One important aspect of instructional video design concerns the presence of the instructor and the use of social cues in video content (Beege et al., 2023). Research shows that the presence of a visible instructor in video materials can increase the sense of social presence, motivation, and emotional engagement during learning and produce a statistically significant positive effect on knowledge retention, although its influence on knowledge transfer is not always confirmed (Beege et al., 2023).

However, despite these reported advantages, empirical findings remain inconsistent, suggesting that the effectiveness of video-based instruction may depend on contextual and design-related factors rather than modality alone. These inconsistencies create an important challenge for organizational decision-making, as investments in technologically advanced training solutions may not necessarily translate into measurable improvements in employee performance or learning transfer.

2.3. AI-Supported Learning Materials

With the development of artificial intelligence, digital representations of instructors are increasingly appearing, such as AI-generated avatars that can mediate the presentation of instructional content in video materials. Their introduction requires an interdisciplinary approach that connects pedagogical, technological, and ethical aspects of contemporary education (Atabekova et al., 2026; Alam & Mohanty, 2023). Research indicates that characteristics of these digital instructors, such as the level of similarity to human behavior and the presence of visual cues, may significantly influence the learning process (T. Xu et al., 2025).

Although some findings highlight limitations regarding the naturalness of digital avatars, others suggest that they may serve as an effective alternative to traditional text-based communication, contribute to greater engagement, and reduce extraneous cognitive load during the learning process (Yusuf et al., 2025; Lin & Yu, 2025; Zhang et al., 2024; Tan, 2024). Empirical evidence indicates that the use of avatars with generated speech may achieve similar results in knowledge acquisition compared to video materials presented by a real instructor, while potentially resulting in lower levels of technology acceptance and satisfaction with learning (Arkün-Kocadere & Çağlar-Özhan, 2024).

These findings suggest that AI-supported instructional modalities may influence user perception more strongly than objective learning outcomes, highlighting the need for empirical validation in task-oriented contexts. For organizations, this raises important questions regarding the cost-effectiveness, scalability, and practical value of AI-supported instructional modalities in professional learning environments.

2.4. Hypothesis Formulation

Although prior research grounded in the Cognitive Theory of Multimedia Learning suggests potential advantages of video-based and AI-supported instructional modalities, empirical findings remain inconsistent, particularly in task-oriented and applied learning contexts. These inconsistencies indicate that the effectiveness of instructional modalities may be conditional and influenced by contextual factors such as task complexity, cognitive demands, and instructional design quality. The present study formulates a set of hypotheses addressing both objective performance outcomes and subjective evaluations of instructional effectiveness. The proposed hypotheses were formulated to operationalize different dimensions of the research questions, including objective learning outcomes, subjective evaluations, and exploratory qualitative perceptions across instructional modalities.

The conceptual relationships examined in this study are illustrated in Figure 1.

H1. Participants exposed to video-based instructional modalities (human narration and AI-generated avatar) are expected to achieve higher levels of immediate knowledge acquisition compared to those exposed to static instructional materials.
H2. Participants in video-based instructional conditions are expected to demonstrate better performance in a practical task, as reflected in higher accuracy, shorter completion time, and fewer errors, compared to participants in the static condition.
H3. Participants exposed to video-based instructional modalities are expected to demonstrate better knowledge retention over time compared to those exposed to static instructional materials.
H4. Subjective evaluations of instructional effectiveness are expected to demonstrate weak associations with objective performance outcomes.
H5. Exploratory qualitative analyses are expected to indicate differences in perceived engagement, clarity, and usability across instructional modalities.

3. Materials and Methods

3.1. Research Design

The study was a controlled experimental investigation designed to examine the effectiveness of different digital learning modalities in a task-oriented learning context relevant to workplace activities. The initial sample consisted of 87 participants (G1 = 27, G2 = 28, G3 = 32). The final analyzed sample included 65 participants (G1 = 25, G2 = 21, G3 = 19), corresponding to an overall attrition rate of 25.3% (see Table 1). Attrition was primarily associated with participant absence during scheduled experimental phases, particularly the delayed retention phase, incomplete or unsubmitted questionnaires, incorrectly entered identification information preventing response matching across phases, and failure to save or submit practical task files required for evaluation. The larger reduction observed in group 3 appeared to be primarily related to attendance-related factors during later phases of data collection rather than performance-related exclusion. Only participants with complete and evaluable datasets across all phases were retained in the final analysis.
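The attrition figures above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the group sizes are those reported in the text, while the function and variable names are illustrative assumptions rather than part of the study's materials.

```python
# Group sizes as reported in the text (initial and final analyzed samples).
initial = {"G1": 27, "G2": 28, "G3": 32}
final = {"G1": 25, "G2": 21, "G3": 19}


def attrition_rate(n_start: int, n_end: int) -> float:
    """Return the share of participants lost between two counts, in percent."""
    return 100.0 * (n_start - n_end) / n_start


# Overall attrition across all groups, then the same figure per group.
overall = attrition_rate(sum(initial.values()), sum(final.values()))
per_group = {g: attrition_rate(initial[g], final[g]) for g in initial}

print(f"Overall attrition: {overall:.1f}%")
for group, rate in per_group.items():
    print(f"{group}: {rate:.1f}%")
```

The overall value reproduces the reported 25.3% (22 of 87 participants). The per-group rates, roughly 7%, 25%, and 41% for G1, G2, and G3, make the uneven reduction in Group 3 explicit.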

The final sample included 22 male and 43 female participants enrolled in the first year of the Graphic Engineering and Design program at the Faculty of Technical Sciences, University of Novi Sad, Serbia. Participants were predominantly between 18 and 21 years of age. All participants were novice users without prior formal experience with the spreadsheet functions addressed in the study.

The use of a student sample was methodologically justified by the study’s objectives. Specifically, participants were considered representative of novice users in a digital learning context. In organizational settings, similar conditions often arise when employees are introduced to new digital tools, software systems, or data-processing tasks for which they have little or no prior experience. In such situations, early learning phases are characterized by the acquisition of fundamental procedural knowledge under structured instructional conditions, making student participants a suitable proxy for entry-level workforce training.

Additionally, the relatively homogeneous composition of the sample enhanced the internal validity of the study by reducing variability associated with prior knowledge, experience, and professional background. This allowed for a more precise examination of the effects of instructional modality, as potential confounding influences were minimized.

Participants were randomly assigned to one of three experimental groups using a random allocation procedure. Group 1 received static instructional materials consisting of text and images. Group 2 was exposed to a screen-recorded video accompanied by human voice narration, while Group 3 received a comparable video-based instructional format supplemented with an AI-generated avatar delivering the narration. The allocation procedure was intended to minimize systematic bias across conditions.
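The text does not specify how the random allocation was implemented. As one possible illustration, the sketch below shows simple randomization in Python, in which each participant is assigned independently to one of the three groups. The function name, the participant identifiers, and the seed are hypothetical; simple randomization is an assumption here, chosen because it can naturally produce unequal group sizes such as those reported for the initial sample.

```python
import random
from collections import Counter


def assign_groups(participant_ids, groups=("G1", "G2", "G3"), seed=None):
    """Assign each participant independently and uniformly at random to a group."""
    rng = random.Random(seed)  # a seeded generator makes the allocation reproducible
    return {pid: rng.choice(groups) for pid in participant_ids}


# Hypothetical participant identifiers; 87 matches the initial sample size.
ids = [f"P{i:03d}" for i in range(1, 88)]
allocation = assign_groups(ids, seed=2025)
group_sizes = Counter(allocation.values())
print(dict(group_sizes))
```

A design that requires equal group sizes would instead shuffle the participant list and deal it out in turn, which is a different procedure from the one sketched here.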

3.2. Instructional Materials and Experimental Procedure

The instructional content was designed to simulate a typical task-oriented digital learning scenario relevant to workplace environments. The learning material focused on the application of fundamental spreadsheet functions (AVERAGE, MIN, MAX, MEDIAN, and SUM), as well as formula replication across multiple cells. These operations reflect common data-processing tasks in administrative and business contexts.
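For readers more familiar with code than with spreadsheet formulas, the five functions and the idea of formula replication can be mirrored in Python roughly as follows. The data values and variable names are hypothetical and are not taken from the study materials.

```python
from statistics import mean, median

# Hypothetical performance scores for five entities (illustrative values only).
scores = [72, 85, 64, 90, 79]

# Each entry corresponds to the spreadsheet function of the same name.
summary = {
    "AVERAGE": mean(scores),
    "MIN": min(scores),
    "MAX": max(scores),
    "MEDIAN": median(scores),
    "SUM": sum(scores),
}

# "Formula replication": the same function is applied to every column,
# as when a formula is filled across adjacent cells in a spreadsheet.
table = {"Q1": [10, 12, 9], "Q2": [14, 11, 13]}
column_sums = {name: sum(values) for name, values in table.items()}

print(summary)
print(column_sums)
```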

Across all three experimental conditions, the instructional content was identical in structure, sequence, and conceptual scope, differing only in the mode of presentation. In the static condition, content was delivered through written explanations supported by screenshots illustrating the workflow (see Figure 2). In the video-based conditions, the same instructional sequence was presented through a real-time screen recording of the task execution (see Figure 3). In the human narration condition, explanations were provided through a recorded voice-over, while in the AI condition, narration was delivered by an AI-generated avatar positioned in the lower corner of the screen (see Figure 4). The textual script used in the static condition corresponded directly to the narration used in both video conditions, ensuring content equivalence. These materials are publicly available on the website https://www.asking.edu.rs/ (accessed on 1 December 2025) (Kašiković et al., 2025).

In the AI-avatar condition, the instructional video was created using Studio D-ID (Creative Reality Studio 3.0), which enables the generation of talking avatars synchronized with pre-recorded narration. The avatar was selected from the platform’s built-in avatar database and was presented as a human-like digital instructor positioned in the lower right corner of the screen throughout the instructional sequence. It depicted a middle-aged male instructor with neutral visual characteristics, rendered in a semi-realistic visual style with facial animation, subtle head movements, and automatic lip synchronization generated through the platform’s AI voice engine. The narration was delivered using AI-generated synthetic speech in English, synchronized automatically with facial movements and mouth articulation. The avatar remained visually present during the entire video lesson and did not interact dynamically with the learner beyond scripted narration. No additional gestures, adaptive feedback, or interactive elements were included.

The overall sequence of the experimental procedure is presented in Figure 5.

The experimental procedure comprised five phases. First, participants completed a pretest to assess baseline knowledge. The test included eight items—multiple-choice and true/false questions—with a maximum score of eight points. The pretest measured prior knowledge and served as a control variable in subsequent analyses.

The second phase was the learning session, during which participants engaged with instructional materials corresponding to their assigned condition. This phase lasted 20 min for all groups. During the learning phase, participants in all conditions were allowed to revisit the instructional material within the allocated 20 min period. In the static condition, participants could scroll through the instructional content freely, while in the video-based conditions, participants could pause, rewind, and replay segments of the instructional videos. Although the navigation format differed according to the instructional medium, all groups had identical exposure duration and equal access to the learning material within the same fixed time frame.

Immediately after the learning phase, participants completed a post-test to assess comprehension of the instructional content. The post-test mirrored the pretest in structure, comprising multiple-choice, true/false, and short-answer items, with a maximum score of eight points. Although the pretest and post-test assessed the same instructional content and learning objectives, the items were not identical. The tests followed the same structural format and covered equivalent procedural concepts, but variations in wording and item presentation were introduced in order to reduce simple recall effects and minimize potential practice effects across testing phases.

The third phase was a practical task simulating a typical data-processing activity. Participants were required to apply the learned spreadsheet functions to a dataset representing performance metrics across multiple entities. Task performance was evaluated using three indicators: accuracy (number of correct outputs), completion time (in seconds), and number of errors. The maximum achievable score was eight points, with one point awarded for each correctly calculated output.
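The three indicators can be derived from a participant's submitted outputs and an answer key. The sketch below is one way to do this in Python and is not the authors' scoring procedure. It assumes that an error is any output that is wrong or missing, which the text does not define; the class and function names, the tolerance, and the data are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TaskResult:
    accuracy: int             # correct outputs, one point each
    errors: int               # wrong or missing outputs (assumed definition)
    completion_time_s: float  # time on task in seconds


def score_task(
    submitted: Sequence[Optional[float]],
    expected: Sequence[float],
    completion_time_s: float,
    tolerance: float = 1e-6,
) -> TaskResult:
    """Compare submitted cell values with the answer key, one point per match."""
    if len(submitted) != len(expected):
        raise ValueError("submitted and expected must have the same length")
    correct = sum(
        1
        for got, want in zip(submitted, expected)
        if got is not None and abs(got - want) <= tolerance
    )
    return TaskResult(correct, len(expected) - correct, completion_time_s)


# Hypothetical answer key with eight outputs and one participant's submission.
key = [78.0, 64.0, 90.0, 79.0, 390.0, 31.0, 38.0, 12.5]
answers = [78.0, 64.0, 90.0, 80.0, 390.0, None, 38.0, 12.5]
result = score_task(answers, key, completion_time_s=412.0)
print(result)
```

In this example the participant has one wrong value and one missing value, so the sketch yields an accuracy of 6 out of 8 and 2 errors.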

The fourth phase involved a subjective evaluation of the learning experience. Participants completed a questionnaire with Likert-scale items assessing perceived clarity, engagement, and overall effectiveness of the instructional format, as well as open-ended questions for qualitative feedback.

The fifth phase took place seven days after the initial session and consisted of a retention test in the form of a practical task. This task followed the same structure and evaluation criteria as the initial practical task, but with a reduced maximum score of four points and modified datasets designed to assess equivalent procedural knowledge while reducing direct repetition effects. The retention phase enabled the assessment of longer-term knowledge stability and procedural task performance.

3.3. Measures

The study incorporated both objective and subjective measures to capture different dimensions of learning effectiveness.

Objective performance was assessed using pre-test and post-test scores, task accuracy, task completion time, number of errors, and retention performance. These measures reflect key indicators of learning effectiveness in organizational contexts, including knowledge acquisition, procedural execution, efficiency, and error minimization.

Subjective perceptions were measured with composite scales derived from Likert-type items. Three dimensions were assessed: perceived clarity of instruction, engagement during the learning process, and overall evaluation of the instructional format. Reliability analysis indicated high internal consistency for all scales, supporting their use as aggregated measures.
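The text does not name the reliability statistic. Internal consistency of Likert-type scales is commonly summarized with Cronbach's alpha, so the sketch below assumes that coefficient for illustration. The function name and the ratings are hypothetical.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; items[i][j] is the rating of respondent j on item i."""
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    # Total scale score per respondent (sum across items).
    totals = [sum(ratings) for ratings in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical 5-point ratings from five respondents on a three-item scale.
clarity_items = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [5, 5, 3, 4, 1],
]
alpha = cronbach_alpha(clarity_items)
print(f"alpha = {alpha:.2f}")
```

Values close to 1 indicate that the items vary together across respondents, which is the usual justification for averaging them into a composite score.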

Additionally, qualitative responses were collected to provide deeper insight into participants’ experiences with different learning modalities, particularly regarding perceived usability and potential limitations of the instructional formats.

3.4. Data Analysis

Statistical analysis was performed using IBM SPSS Statistics 20. Descriptive statistics were examined to assess central tendency and variability across all measured variables. To test for differences between experimental groups, one-way analysis of variance (ANOVA) was applied to normally distributed variables, while analysis of covariance (ANCOVA) was used when it was necessary to control for prior knowledge or immediate learning outcomes. When assumptions of normality were not met, non-parametric tests (Kruskal–Wallis) were used. This combination of statistical procedures enabled a robust evaluation of differences between learning modalities, considering the characteristics of the data and ensuring the validity of the results.
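To make the between-group comparison concrete, the sketch below computes the F statistic and eta squared for a one-way design using only the Python standard library. It is an illustration of the technique, not a reproduction of the SPSS analysis: the function name and the scores are hypothetical, and no p-value is computed because the standard library has no F distribution. In practice, functions such as `scipy.stats.f_oneway` and `scipy.stats.kruskal` provide the parametric and non-parametric tests with p-values.

```python
from statistics import mean
from typing import Sequence, Tuple


def one_way_anova(groups: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (F, eta_squared) for a one-way between-subjects design."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, eta_squared


# Hypothetical post-test scores for three small groups (not the study data).
g1, g2, g3 = [5, 6, 7], [6, 7, 8], [4, 5, 6]
f_stat, eta_sq = one_way_anova([g1, g2, g3])
print(f"F(2, 6) = {f_stat:.2f}, eta squared = {eta_sq:.2f}")
```

The sketch assumes some within-group variation; if every score equals its group mean, the within-group sum of squares is zero and F is undefined.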

Because the study addressed a theoretically integrated set of hypotheses examining related dimensions of learning effectiveness, formal family-wise error corrections were not applied across all statistical tests. Instead, emphasis was placed on effect sizes, consistency of patterns across analyses, and theoretical interpretation of the findings, in line with recommendations for exploratory and multifactorial educational research contexts. In the present study, the majority of analyses were non-significant, reducing the likelihood that the overall interpretation of the findings was substantially influenced by Type I error inflation.
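To make the Type I error argument concrete, the following minimal sketch shows what a Holm–Bonferroni step-down correction would do to a selection of the p-values reported in Section 4 and Appendix A. The authors did not apply this correction; the function name `holm_adjust` and the choice of which five tests form the "family" are illustrative assumptions only.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiply the rank-th smallest p-value by (m - rank), cap at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity of the adjusted values.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Illustrative family of tests (assumed grouping; values as reported in the article).
reported = {
    "posttest ANCOVA (group)": 0.411,
    "task accuracy ANOVA": 0.576,
    "task completion time (Kruskal-Wallis)": 0.056,
    "task errors (Kruskal-Wallis)": 0.621,
    "retention ANCOVA (group)": 0.634,
}

adjusted = dict(zip(reported, holm_adjust(list(reported.values()))))
for name, p_adj in adjusted.items():
    print(f"{name}: adjusted p = {p_adj:.3f}")
```

Under this assumed family, the smallest unadjusted value (p = 0.056) moves further from the 0.05 threshold after adjustment, so a correction would not change the qualitative conclusions.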

Qualitative responses from the open-ended questionnaire items were analyzed using a descriptive thematic approach. Participant comments were independently reviewed by two researchers and grouped into recurring thematic categories related to clarity, engagement, usability, and perceived limitations of the instructional formats. Any differences in interpretation were resolved through discussion and consensus. Given the exploratory and supplementary role of the qualitative data, the analysis was intended primarily to support interpretation of the quantitative findings rather than to provide a fully formalized qualitative analytical framework.

4. Results

4.1. Descriptive Analysis

An initial understanding of the results was gained through descriptive statistics and graphical representation of mean scores across all experimental phases. Visual inspection of pretest and post-test results indicated an overall improvement in performance following the learning intervention (Figure 6). The highest mean post-test scores were observed in the group exposed to video-based instruction with human narration (G2), while the lowest scores were recorded in the AI avatar condition (G3). Participants in G1 generally demonstrated intermediate performance patterns across the analyzed phases, with results remaining comparable to those observed in the two video-based conditions.

A similar pattern was observed in the practical task immediately after the learning phase (Figure 7), with participants in the video with human narration condition achieving the highest scores, followed by the AI avatar group, and the lowest performance recorded in the static condition. Error rates (Figure 7) followed the same pattern. In the delayed retention task (Figure 8), the video with human narration condition again showed the most favorable outcomes, while the AI avatar condition showed the lowest performance and the highest number of errors.

With regard to subjective evaluations (Figure 9), the results indicated that both video modalities were rated more positively than the static format, with a slightly more favorable pattern again observed for the video lesson with a human narrator. Descriptive patterns appeared broadly consistent with theoretical expectations derived from the Cognitive Theory of Multimedia Learning, which suggests that the combination of visual and auditory channels can contribute to more efficient processing of content than presentation primarily based on static text and images.

These descriptive trends may indicate a potential advantage of video-based formats in facilitating procedural understanding and initial task execution. However, descriptive differences alone are insufficient for evaluating learning effectiveness, which must be assessed using statistically validated performance indicators.

4.2. Immediate Knowledge Acquisition

The effect of instructional modality on immediate knowledge acquisition was examined using a one-way analysis of covariance, controlling for prior knowledge. This analysis was conducted to test H1. The results indicated no statistically significant difference between the three groups in post-test performance, F(2, 61) = 0.903, p = 0.411, partial eta squared = 0.029 (see Appendix A.2). The observed effect size was small, indicating limited practical significance of modality differences.

Although descriptive patterns suggested slightly higher performance in the video-based conditions, these differences were not sufficiently pronounced to be considered statistically reliable. Overall, the analysis did not provide evidence that one instructional format produced superior immediate knowledge acquisition compared to the others. Therefore, H1 was not supported.

In the context of organizational training research, this result aligns with the view that learning outcomes are influenced by multiple interacting factors, including instructional design, learner characteristics, and task structure, rather than delivery format alone (Salas et al., 2012).

4.3. Practical Task Performance

To test H2, practical task performance was compared across groups. The analysis showed no statistically significant differences in task accuracy, F(2, 62) = 0.557, p = 0.576, and the effect size was small, indicating minimal practical differences between conditions (see Appendix A.3).

Similarly, no statistically significant differences were found in task completion time or number of errors (see Appendix A.4 and Appendix A.5). Although the Kruskal–Wallis test for completion time approached statistical significance (p = 0.056), this result did not meet conventional thresholds for significance. Descriptively, participants in the video conditions completed the task more quickly immediately after learning, but this advantage was not maintained in the delayed condition.

Thus, H2 was not supported.

In practical terms, this suggests that differences in learning modality may not translate into meaningful differences in task execution performance, even when small efficiency trends are present. This is particularly relevant for decision-making in learning design, as it indicates that perceived improvements in efficiency may not reflect stable or generalizable performance gains.

Consistent with prior research on training transfer, the ability to apply learned knowledge in practical contexts depends on factors beyond initial exposure, including task characteristics and opportunities for reinforcement (Baldwin & Ford, 1988).

4.4. Knowledge Retention

H3 was tested by analyzing delayed knowledge retention. The analysis revealed no statistically significant differences between the groups, F(2, 61) = 0.458, p = 0.634, partial eta squared = 0.015 (see Appendix A.6). This small effect size further suggests minimal practical impact of instructional modality on knowledge retention. Neither task completion time (see Appendix A.7) nor number of errors (see Appendix A.8) showed significant variation across conditions in the delayed phase. Accordingly, H3 was not supported.

These findings provide no evidence that instructional modality influenced the durability of learning over time. Theoretically, this suggests that any potential advantages of multimedia or AI-supported instructional modalities during initial exposure do not necessarily lead to more stable knowledge representations.

In organizational training contexts, retention and transfer are considered critical indicators of effectiveness, as they determine whether acquired knowledge can be applied in real work situations (Salas et al., 2012). The absence of differences in retention performance, therefore, indicates that more advanced instructional formats may not provide additional value in supporting long-term learning outcomes in comparable task conditions.

4.5. Perception–Performance Relationship

To examine H4, relationships between subjective evaluations and objective performance outcomes were analyzed, and no statistically significant correlations were found (see Appendix A.10). One-way ANOVA analyses additionally indicated no statistically significant differences between instructional modalities regarding perceived clarity, engagement, or overall system evaluation (see Appendix A.10), despite descriptively more positive evaluations observed for the video-based formats. Perceived clarity was not associated with post-test performance (r = 0.105, p = 0.404), engagement was not associated with task performance (r = 0.078, p = 0.536), and overall modality evaluation was not associated with retention performance (r = 0.049, p = 0.698) (see Appendix A.11). These findings are consistent with H4, indicating weak and statistically non-significant associations between subjective evaluations and objective performance outcomes.
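As a rough illustration of how precisely these correlations are estimated, the sketch below computes approximate 95% confidence intervals using the Fisher z-transformation. It assumes that all 65 participants contributed to each correlation (the sample size stated in the abstract); the function name `fisher_ci` is hypothetical, and the intervals are not reported in the article.

```python
import math


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate confidence interval for a Pearson r via Fisher's z."""
    z = math.atanh(r)              # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)    # standard error on the z scale
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper


N = 65  # assumed sample size for every correlation

correlations = {
    "clarity vs. post-test": 0.105,
    "engagement vs. task performance": 0.078,
    "overall evaluation vs. retention": 0.049,
}

for label, r in correlations.items():
    lo, hi = fisher_ci(r, N)
    print(f"{label}: r = {r:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Under these assumptions every interval spans zero but also extends to moderate positive values, which is in line with the later caution that non-significant results should not be read as evidence of no association.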

These findings indicate a divergence between perceived and actual effectiveness of learning modalities. From a management perspective, this finding is particularly relevant, as it suggests that user satisfaction and engagement may not be reliable indicators of performance improvement.

These results emphasize the need to evaluate learning effectiveness across multiple dimensions, including cognitive, behavioral, and performance outcomes. Relying solely on subjective feedback may therefore lead to suboptimal decisions in the selection and implementation of learning strategies.

4.6. Qualitative Analysis of Participants’ Experiences Across Instructional Modalities

Qualitative responses provided additional insight into how participants in the study experienced the different forms of instructional presentation. The qualitative findings should be interpreted as exploratory and illustrative, complementing the quantitative analyses rather than serving as independently validated qualitative conclusions.

In the group that learned through the static presentation, responses most frequently emphasized the clarity, conciseness and transparency of the material, as well as the visual support provided through images and the step-by-step representation. This format was described as “short and clear”, “simply explained” and “easy to follow”. The main advantages highlighted were the structured organization of the material, the possibility of easily returning to the content, and the feeling of independent learning. As potential improvements, greater interactivity and stronger visual emphasis on key information were suggested. Overall, the static format was perceived as a clear and understandable way of initially introducing new content.

In the group exposed to the video demonstration with tutor narration, it was emphasized that “watching how someone performs the task while explaining it at the same time” helped understand the procedure and apply the learned steps more easily. Particularly highlighted were the detailed explanations, repetition of key steps, and the overall efficiency of this approach. Suggestions for improvement mainly referred to technical aspects such as sound quality and a slower pace of explanation, rather than to the concept of the instructional format itself. Overall, this modality was perceived positively and described as useful, engaging, and easy to follow.

In the group exposed to the video demonstration with the AI avatar, participants recognized the advantages of the video format, particularly the possibility of revisiting the content, following the procedure step by step, the clearly structured sequence of actions, and the practical nature of the format. However, unlike the tutor video group, a clear reservation towards the presence of the AI avatar emerged. Several participants subjectively perceived the visual representation of the avatar as somewhat distracting or less natural, which may have influenced attention during task performance. As possible improvements, participants suggested removing the avatar from the frame, slowing down the pace of explanation, and providing stronger visual highlighting of formulas or instructions on the screen. Thus, although the basic video format was perceived as useful and practical, the personalized AI component itself was not uniformly experienced as an advantage.

The qualitative findings suggested descriptive differences in participants’ subjective experiences across instructional modalities, particularly regarding engagement and perceived usability. However, these perceptions were not reflected in statistically significant quantitative differences, further reinforcing the divergence between perceived and actual effectiveness.

5. Discussion

5.1. Instructional Modality and Effectiveness

Addressing RQ1, the findings related to immediate knowledge acquisition indicated that video-based instructional modalities did not produce statistically superior outcomes compared to the static instructional condition. More broadly, the results suggest that the choice of instructional modality did not significantly influence learning outcomes within the present task-oriented learning context. The non-support of H1–H3 further indicates that the expected advantages of richer instructional modalities may not emerge in relatively simple, well-structured procedural tasks. The consistently small effect sizes across analyses additionally support the conclusion that instructional modality had limited practical significance in this context. This challenges the common assumption that more technologically advanced learning formats, such as video-based instruction or AI-supported delivery, inherently lead to superior performance. Instead, the present findings did not show statistically reliable evidence that video-based or AI-supported delivery produced superior knowledge acquisition or application compared to static materials, when instructional content is clearly structured and cognitively manageable. This interpretation aligns with recent findings by Zhang et al. (2024), who showed that although AI-generated video materials were more positively perceived and more readily accepted by learners, these advantages did not result in improved learning outcomes compared to traditional paper-based materials. This divergence reinforces the notion that learner acceptance and perceived effectiveness are not necessarily reliable indicators of actual performance gains.

From a managerial standpoint, these findings raise important questions about the return on investment (ROI) of advanced digital learning solutions. If higher levels of technological sophistication do not produce measurable improvements in performance, organizations may risk overestimating the effectiveness of formats that primarily enhance user perception without corresponding evidence of improved performance. This has direct implications for organizational learning strategies, indicating that effectiveness may depend more on instructional design quality than on technological sophistication. In practical terms, organizations may achieve comparable performance outcomes using simpler and less resource-intensive instructional solutions, particularly for routine procedural tasks.

5.2. Cost–Effectiveness and Resource Allocation

Regarding RQ2, the findings similarly indicated no statistically significant differences between instructional modalities in procedural task performance, including task accuracy, completion time, and error frequency. These results have important practical implications from a cost-effectiveness perspective, particularly considering that advanced instructional formats, such as professionally produced videos and AI-generated instructional agents, typically require substantial financial, technical, and time-related resources. However, the lack of measurable performance advantages in this study suggests that such investments may not always lead to improved outcomes. Organizations may need to reconsider the assumption that more complex and visually engaging formats justify their higher costs. Instead, the findings support a more strategic approach, selecting modalities based on task requirements and expected performance gains rather than technological appeal or perceived innovation.

5.3. Perception–Performance Gap

With respect to RQ3, the retention findings suggested that the instructional modalities did not differ substantially in supporting longer-term knowledge retention within the present learning context. In addition to retention outcomes, an important contribution of this study lies in the identification of a divergence between subjective perceptions and objective performance outcomes. Although participants tended to evaluate video-based formats as more engaging and effective, these perceptions were not associated with improved task performance or knowledge retention. This perception–performance gap has important implications for organizational decision-making. Learning programs are often evaluated based on user satisfaction and engagement metrics, which may not accurately reflect their impact on actual performance. The results of this study suggest that relying solely on subjective evaluations may lead organizations to overestimate the effectiveness of certain formats, particularly those perceived as more modern or technologically advanced.

The findings related to AI-generated avatars provide further insight into the implementation of emerging technologies in organizational learning environments. Although AI-supported instructional modalities are often promoted as scalable and innovative solutions, the results suggest that their impact on learning effectiveness may be limited in certain contexts.

Moreover, qualitative feedback from some participants suggests that the presence of an AI-generated avatar may introduce unintended effects, such as perceived distraction or reduced naturalness of the interaction. These factors may negatively influence user experience without providing measurable benefits in performance. From an organizational perspective, the integration of such technologies should not be driven solely by their novelty or perceived innovation, but should be carefully evaluated in terms of their actual contribution to performance outcomes and the quality of user interaction.

5.4. Boundary Conditions of Digital Learning Effectiveness

In relation to RQ4, subjective evaluations and qualitative perceptions revealed descriptive differences in perceived engagement, usability, and overall learning experience across instructional modalities, despite the absence of corresponding objective performance differences. More specifically, participants generally evaluated video-based formats more positively, while these perceptions were not associated with improved learning outcomes. These findings contribute to a more nuanced understanding of the conditions under which advanced instructional modalities may or may not provide measurable benefits. In this context, the absence of statistically significant modality effects may be interpreted as a potential boundary condition, suggesting that the benefits of multimedia and AI-supported instructional modalities may depend on contextual factors such as task complexity, cognitive demands, and learner characteristics.

In relatively simple and well-defined task environments, where cognitive load is limited and procedural steps are straightforward, different instructional formats may be equally effective. In such conditions, the added value of multimodal or AI-enhanced presentation may be reduced. This perspective aligns with the view that the effectiveness of digital learning is conditional rather than universal. For organizations, this implies that the selection of learning modalities should be aligned with the specific characteristics of the task and the learning objectives, rather than based on general assumptions about the superiority of certain technologies.

6. Conclusions

This study examined the effectiveness of different digital learning modalities in a task-oriented learning context relevant to organizational environments. The results showed that instructional modality did not produce statistically significant differences in knowledge acquisition, task performance, or retention. Across all three conditions—static materials, video-based instruction with human narration, and video-based instruction with an AI-generated avatar—no statistically significant differences were detected across the objective performance indicators. Subjective evaluations of the learning experience also did not differ significantly between formats.

From a management perspective, these findings have important implications for the design and implementation of digital learning strategies. In particular, the results challenge the widespread assumption that more technologically advanced and resource-intensive learning formats inherently lead to superior performance outcomes. This might suggest that technological sophistication alone is not a reliable predictor of learning effectiveness.

A key implication concerns the cost-effectiveness of organizational investments in digital learning. As advanced learning formats typically require greater financial, technical, and production resources, their adoption should be carefully evaluated in relation to expected performance gains. The findings of this study suggest that, in relatively simple and well-structured task environments, comparable outcomes can be achieved using less complex and more cost-efficient instructional approaches. This suggests that organizations may be able to optimize learning investments in comparable introductory procedural learning contexts by prioritizing instructional clarity and alignment with task requirements over technological novelty.

The study also identified a divergence between subjective perceptions and objective performance outcomes. Although participants tended to perceive video-based formats as more engaging and effective, these perceptions were not associated with improved performance. This perception–performance gap highlights a potential risk in organizational decision-making, where learning formats may be selected based on user preference or perceived modernity rather than evidence of effectiveness. Accordingly, organizations should complement subjective evaluation metrics with objective performance indicators when assessing learning outcomes.

The findings related to AI-generated instructional agents provide further insight into the adoption of emerging technologies in organizational learning. While AI-based formats are often positioned as innovative and scalable solutions, the results suggest that their contribution to performance may be limited in certain contexts. Moreover, qualitative feedback suggests that design-related factors, such as perceived naturalness and potential distraction, may influence user experience without necessarily translating into measurable performance benefits. These findings underscore the importance of critically evaluating AI-based learning solutions beyond their technological appeal.

Overall, the study findings did not support the expected advantages of video-based and AI-supported instructional modalities (H1–H3). Hypothesis 4 was supported, indicating no significant association between subjective evaluations and objective performance outcomes, while H5 received only partial qualitative support.

Despite its contributions, this study has several limitations. First, the relatively small sample size and unequal group sizes may have limited the statistical power to detect small-to-moderate effects between instructional modalities. Therefore, the non-significant findings should not be interpreted as definitive evidence of equivalence between the conditions. Rather, they indicate that no statistically reliable evidence of superiority of one instructional modality over another was detected within the present sample. Future studies with larger samples should consider equivalence testing approaches, such as two one-sided tests (TOST), or Bayesian analyses, in order to evaluate more directly whether instructional formats can be considered statistically equivalent. Additionally, unequal attrition across experimental conditions, particularly in the AI-avatar group, should be considered when interpreting qualitative observations related to this modality.

Second, the study was conducted in a controlled academic setting with a student sample, which limits the direct generalizability of the findings to real workplace populations and professional training environments. Although the study was positioned within an organizational learning perspective, the participants represented novice users engaged in introductory procedural learning tasks rather than actual organizational employees. Consequently, managerial implications should be interpreted cautiously and primarily as indicative rather than directly transferable to organizational settings.

An additional limitation concerns the relatively limited dispersion observed in some performance measures. Although the obtained scores did not indicate a strict ceiling effect, post-test and task performance results were concentrated within a relatively narrow range. This may have reduced the sensitivity of the measures to detect subtle differences between instructional conditions, particularly given the relatively simple procedural nature of the task and the novice level of participants.

Furthermore, specific design characteristics of the AI-generated avatar, including voice naturalness, visual appearance, and lip-synchronization quality, may have influenced participants’ perceptions and should be examined more systematically in future research.

Future studies should therefore examine more complex and cognitively demanding tasks, diverse employee populations, and real organizational settings in order to provide deeper insight into the conditions under which different instructional modalities may yield meaningful advantages for organizational learning and digital training.

The study contributes to a more nuanced understanding of digital learning effectiveness by demonstrating that the benefits of advanced instructional technologies are conditional rather than universal. For organizations, this implies that effective learning strategies should be guided by task characteristics, instructional design principles, and evidence-based evaluation, rather than assumptions about the inherent superiority of specific technologies.

Author Contributions

Conceptualization, N.K., D.G., V.P., A.S.A. and N.T.; methodology, N.K. and S.D.; questionnaire and test development, N.K., S.D. and Ž.Z.; formal analysis and data collection, Ž.Z.; statistical analysis and validation, S.D.; writing—original draft preparation, S.D., Ž.Z. and N.K.; writing—review and editing, all authors; supervision, N.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research has been supported by the Ministry of Science, Technological Development and Innovation (Contract No. 451-03-34/2026-03/200156) and the Faculty of Technical Sciences, University of Novi Sad through project “Scientific and Artistic Research Work of Researchers in Teaching and Associate Positions at the Faculty of Technical Sciences, University of Novi Sad 2026” (No. 01-3609/1).

Institutional Review Board Statement

The study was conducted in an educational context using anonymized academic data and did not involve sensitive personal information. All questionnaires and tests were approved prior to testing by the Ethics Committee of the Faculty of Technical Sciences, University of Novi Sad, Serbia (protocol code 01-394/2, 26 January 2026).

Informed Consent Statement

Participation was voluntary and students were informed about the use of anonymized data for research purposes.

Data Availability Statement

The data are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial Intelligence
CTML	Cognitive Theory of Multimedia Learning
ANOVA	Analysis of Variance
ANCOVA	Analysis of Covariance

Appendix A

Appendix A.1

The tables in Appendix A.1 present the results of the one-way ANOVA conducted on the pretest scores to establish whether the three experimental groups were equivalent in terms of prior knowledge. There was no statistically significant difference between the groups in pretest performance: G1 (M = 4.32, SD = 1.249), G2 (M = 4.29, SD = 0.902), and G3 (M = 4.32, SD = 1.493), F(2, 62) = 0.005, p = 0.995. The effect size, expressed as partial eta squared, was 0.0002, a negligible value indicating that the actual differences between the groups at baseline were minimal. Based on these results, it can be stated that the randomization procedure was successful and that the groups entered the experiment with comparable levels of prior knowledge. Because the homogeneity of variance assumption was violated for the pretest score (Levene’s test, p = 0.043), additional robust tests of equality of means were examined. Welch and Brown–Forsythe tests both confirmed that there were no statistically significant baseline differences between the groups, supporting the robustness of the equivalence findings.

This finding is methodologically important for at least two reasons. First, it confirms the internal validity of the experimental design, since any differences that might appear in later phases could not be attributed to initial differences between participants. Second, confirmation of baseline equivalence justified the use of analysis of covariance in the evaluation of posttest and retention results, where the pretest and posttest scores were used as covariates to further control for individual differences in initial and immediately acquired knowledge.

Table A1. Descriptives (pretest score).

	N	Mean	Std. Deviation	Std. Error	95% CI Lower Bound	95% CI Upper Bound	Minimum	Maximum
G1	25	4.32	1.249	0.250	3.80	4.84	2	7
G2	21	4.29	0.902	0.197	3.87	4.70	3	6
G3	19	4.32	1.493	0.342	3.60	5.04	2	7
Total	65	4.31	1.211	0.150	4.01	4.61	2	7

Table A2. Test of Homogeneity of Variances (pretest score).

Levene Statistic	df1	df2	p
3.306	2	62	0.043

Table A3. ANOVA (pretest score).

	Sum of Squares	df	Mean Square	F	p
Between Groups	0.015	2	0.008	0.005	0.995
Within Groups	93.831	62	1.513
Total	93.846	64

Table A4. Robust Tests of Equality of Means (pretest score).

	Statistic ^a	df1	df2	p
Welch	0.007	2	38.323	0.993
Brown-Forsythe	0.005	2	49.739	0.995

^a Asymptotically F distributed.

Appendix A.2

The tables in Appendix A.2 present the results of the one-way analysis of covariance (ANCOVA) conducted to test the hypothesis on immediate knowledge acquisition. The posttest score was the dependent variable, group was the factor, and the pretest score was the covariate. Preliminary analysis indicated that the assumption of homogeneity of regression slopes was not violated, allowing interpretation of the model.

Table A5. Levene’s Test of Equality of Error Variances (immediate posttest score) ^a.

F	df1	df2	p
0.752	2	62	0.476

Tests the null hypothesis that the error variance of the dependent variable is equal across groups. ^a Design: Intercept + pretest_score + group.

Table A6. ANCOVA results for immediate posttest performance controlling for pretest scores.

Source	Type III Sum of Squares	df	Mean Square	F	p	η²
Corrected Model	4.184 ^a	3	1.395	1.288	0.286	0.060
Intercept	117.310	1	117.310	108.371	<0.001	0.640
pretest_score	2.275	1	2.275	2.101	0.152	0.033
group	1.955	2	0.978	0.903	0.411	0.029
Error	66.032	61	1.082
Total	2165.000	65
Corrected Total	70.215	64

^a R Squared = 0.060 (Adjusted R Squared = 0.013).
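The effect size and F ratio for the group factor can be recovered directly from the sums of squares in Table A6. The worked equations below are a reader's check using the standard definitions of partial eta squared and the F ratio; the subscripted symbols are notation introduced here for illustration, not taken from the article.

```latex
\eta_p^2(\text{group})
  = \frac{SS_{\text{group}}}{SS_{\text{group}} + SS_{\text{error}}}
  = \frac{1.955}{1.955 + 66.032}
  \approx 0.029,
\qquad
F(2, 61)
  = \frac{SS_{\text{group}} / df_{\text{group}}}{SS_{\text{error}} / df_{\text{error}}}
  = \frac{1.955 / 2}{66.032 / 61}
  \approx 0.903 .
```

The same definition applied to the covariate gives 2.275 / (2.275 + 66.032) ≈ 0.033, matching the tabulated value for pretest_score.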

Table A7. Estimated marginal means (immediate posttest scores adjusted for pretest performance).

Group	Mean	Std. Error	95% CI Lower Bound	95% CI Upper Bound
G1	5.638 ^a	0.208	5.222	6.054
G2	5.908 ^a	0.227	5.454	6.362
G3	5.472 ^a	0.239	4.995	5.950

^a Covariates appearing in the model are evaluated at the following values: pretest_score = 4.31.

Appendix A.3

The tables in Appendix A.3 present the results of the ANOVA on students’ ability to apply the acquired knowledge to a practical Excel task immediately after the learning phase.

Table A8. Descriptives (task score).

	N	Mean	Std. Deviation	Std. Error	95% CI for Mean, Lower Bound	95% CI for Mean, Upper Bound	Minimum	Maximum
G1	25	5.72	3.273	0.655	4.37	7.07	0	8
G2	21	6.62	2.500	0.545	5.48	7.76	0	8
G3	19	6.00	2.828	0.649	4.64	7.36	0	8
Total	65	6.09	2.892	0.359	5.38	6.81	0	8

Table A9. Test of Homogeneity of Variances (task score).

Levene Statistic	df1	df2	p
2.067	2	62	0.135

Table A10. ANOVA (task score).

	Sum of Squares	df	Mean Square	F	p
Between Groups	9.454	2	4.727	0.557	0.576
Within Groups	525.992	62	8.484
Total	535.446	64
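Since Table A8 reports the size, mean, and standard deviation of each group, the F ratio in Table A10 can be approximately reconstructed without the raw data. The Python sketch below is an illustrative check under that assumption; the function name is hypothetical, and small discrepancies from the table are expected because the descriptives are rounded.

```python
# (n, mean, standard deviation) per group, copied from Table A8.
GROUPS = {
    "G1": (25, 5.72, 3.273),
    "G2": (21, 6.62, 2.500),
    "G3": (19, 6.00, 2.828),
}


def anova_from_summary(groups):
    """Return (SS_between, SS_within, F) for a one-way ANOVA."""
    n_total = sum(n for n, _, _ in groups.values())
    grand_mean = sum(n * m for n, m, _ in groups.values()) / n_total
    # Between-groups variation: group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
    # Within-groups variation: recovered from each sample variance.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups.values())
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_stat


ss_b, ss_w, f_stat = anova_from_summary(GROUPS)
print(f"SS between = {ss_b:.3f}, SS within = {ss_w:.3f}, F = {f_stat:.3f}")
```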

Table A11. Robust Tests of Equality of Means (task score).

	Statistic ^a	df1	df2	p
Welch	0.600	2	40.562	0.554
Brown-Forsythe	0.574	2	60.567	0.567

^a Asymptotically F distributed.
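The Welch statistic in Table A11 can likewise be approximated from the group summaries in Table A8. The sketch below implements the standard Welch heteroscedastic one-way test from summary statistics; it is an illustration with hypothetical names, not the authors' procedure, and rounding in the inputs limits the agreement to roughly three significant digits.

```python
# (n, mean, standard deviation) per group, copied from Table A8.
GROUPS = [(25, 5.72, 3.273), (21, 6.62, 2.500), (19, 6.00, 2.828)]


def welch_anova(groups):
    """Welch's one-way test from (n, mean, sd) summaries.

    Returns (F, df1, df2).
    """
    k = len(groups)
    # Each group is weighted by the inverse variance of its mean.
    weights = [n / sd ** 2 for n, _, sd in groups]
    w_sum = sum(weights)
    weighted_mean = sum(w * m for w, (_, m, _) in zip(weights, groups)) / w_sum
    numerator = sum(
        w * (m - weighted_mean) ** 2 for w, (_, m, _) in zip(weights, groups)
    ) / (k - 1)
    lam = sum(
        (1 - w / w_sum) ** 2 / (n - 1) for w, (n, _, _) in zip(weights, groups)
    )
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    df2 = (k ** 2 - 1) / (3 * lam)
    return numerator / denominator, k - 1, df2


welch_f, df1, df2 = welch_anova(GROUPS)
print(f"Welch F = {welch_f:.3f}, df1 = {df1}, df2 = {df2:.3f}")
```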

Appendix A.4

The tables in Appendix A.4 present the results of the non-parametric Kruskal–Wallis test used to analyze task completion time. The variable task_time was positively skewed, so a log transformation was applied. Inspection of the Q–Q plot indicated that the transformation did not improve normality sufficiently; therefore, the non-parametric Kruskal–Wallis test was used to compare the groups.

Table A12. Ranks (task completion time).

	Group	N	Mean Rank
task_time_log	G1	25	39.74
	G2	21	30.95
	G3	19	26.39
	Total	65

Table A13. Test Statistics (task completion time) ^a,b.

	task_time_log
χ²	5.774
df	2
p	0.056

^a Kruskal–Wallis Test. ^b Grouping Variable: group.
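The test statistic in Table A13 can be cross-checked against the mean ranks in Table A12. The sketch below is illustrative only, and its function names are hypothetical. It computes the Kruskal–Wallis H statistic without a tie correction, so a value slightly below the reported one is expected if the data contain ties. For 2 degrees of freedom, the upper-tail chi-square probability has the closed form exp(−x/2), which avoids any external library.

```python
import math

# (n, mean rank) per group, copied from Table A12.
RANKS = [(25, 39.74), (21, 30.95), (19, 26.39)]


def kruskal_wallis_h(ranks):
    """H statistic from group sizes and mean ranks (no tie correction)."""
    n_total = sum(n for n, _ in ranks)
    expected = (n_total + 1) / 2  # mean rank under the null hypothesis
    spread = sum(n * (r - expected) ** 2 for n, r in ranks)
    return 12 / (n_total * (n_total + 1)) * spread


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with 2 df."""
    return math.exp(-x / 2)


h_uncorrected = kruskal_wallis_h(RANKS)
p_reported = chi2_sf_df2(5.774)  # statistic as reported in Table A13
print(f"H (uncorrected) = {h_uncorrected:.3f}")
print(f"p for reported statistic = {p_reported:.3f}")
```

The same two functions apply unchanged to the other Kruskal–Wallis tables in this appendix, since all of them compare three groups.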

Appendix A.5

The tables in Appendix A.5 present the results of the non-parametric Kruskal–Wallis test used to analyze the number of errors on the practical task.

Table A14. Ranks (task errors).

	Group	N	Mean Rank
task_errors	G1	25	33.40
	G2	21	30.31
	G3	19	35.45
	Total	65

Table A15. Test Statistics (task errors) ^a,b.

	task_errors
χ²	0.953
df	2
p	0.621

^a Kruskal–Wallis Test. ^b Grouping Variable: group.

Appendix A.6

The tables in Appendix A.6 present the results of the one-way ANCOVA on the retention score, with the posttest score as the covariate. Preliminary checks indicated that the assumption of homogeneity of regression slopes was not violated. Because Levene’s test indicated a violation of homogeneity of variance (Table A16), additional robustness analyses were conducted. Welch and Brown–Forsythe robust tests of equality of means showed no statistically significant differences between the groups (Welch: p = 0.582; Brown–Forsythe: p = 0.663). A non-parametric Kruskal–Wallis test likewise showed no statistically significant group differences in retention performance, χ²(2) = 1.164, p = 0.559. These analyses support the stability of the original findings despite the violation of variance homogeneity.

Table A16. Levene’s Test of Equality of Error Variances (retention score) ^a.

F	df1	df2	p
6.621	2	62	0.002

Tests the null hypothesis that the error variance of the dependent variable is equal across groups. ^a Design: Intercept + posttest_score + group.

Table A17. ANCOVA results for retention performance controlling for posttest scores.

Source	Type III Sum of Squares	df	Mean Square	F	p	η²
Corrected Model	1.896 a	3	0.632	0.377	0.770	0.018
Intercept	19.130	1	19.130	11.405	0.001	0.158
posttest_score	0.613	1	0.613	0.365	0.548	0.006
group	1.538	2	0.769	0.458	0.634	0.015
Error	102.319	61	1.677
Total	518.000	65
Corrected Total	104.215	64

a. R Squared = 0.018 (Adjusted R Squared = −0.030).

Table A18. Estimated Marginal Means for retention task scores adjusted for posttest performance.

Group	Mean	Std. Error	95% CI Lower Bound	95% CI Upper Bound
G1	2.557 a	0.259	2.038	3.075
G2	2.688 a	0.285	2.119	3.258
G3	2.297 a	0.299	1.699	2.894

a. Covariates appearing in the model are evaluated at the following values: posttest_score = 5.68.

Table A19. Robust Tests of Equality of Means (retention task score).

retention_score
	Statistic ^a	df1	df2	p
Welch	0.548	2	41.332	0.582
Brown-Forsythe	0.413	2	59.706	0.663

^a Asymptotically F distributed.

Table A20. Ranks (retention task score).

	Group	N	Mean Rank
retention_score	G1	25	33.94
	G2	21	35.10
	G3	19	29.45
	Total	65

Table A21. Test Statistics (retention task score) ^a,b.

	retention_score
χ²	1.164
df	2
p	0.559

^a Kruskal–Wallis Test. ^b Grouping Variable: group.

Appendix A.7

The tables in Appendix A.7 present the results of the non-parametric Kruskal–Wallis test used to analyze the time needed to complete the retention task.

Table A22. Ranks (retention task time).

	Group	N	Mean Rank
retention_time	G1	25	29.14
	G2	21	35.95
	G3	19	34.82
	Total	65

Table A23. Test Statistics (retention task time) ^a,b.

	retention_time
χ²	1.742
df	2
p	0.419

^a Kruskal–Wallis Test. ^b Grouping Variable: group.

Appendix A.8

The tables in Appendix A.8 present the results of the non-parametric Kruskal–Wallis test used to analyze the number of errors on the retention task.

Table A24. Ranks (retention task errors).

	Group	N	Mean Rank
retention_errors	G1	25	32.36
	G2	21	31.00
	G3	19	36.05
	Total	65

Table A25. Test Statistics (retention task errors) ^a,b.

	retention_errors
χ²	0.876
df	2
p	0.645

^a Kruskal–Wallis Test. ^b Grouping Variable: group.

Appendix A.9

The table in Appendix A.9 reports reliability statistics for the clarity, engagement, and system perception scales. Before these scales were constructed for the subjective evaluation of the learning experience, the internal consistency of their items was examined. All three scales demonstrated high reliability (Cronbach’s alpha from 0.874 to 0.957). These coefficients indicate that the grouped items formed sufficiently homogeneous sets of indicators and could therefore be treated as composite scales.

The subjective evaluation questionnaire contained a total of 12 Likert-type items. However, only 11 items were included in the construction of the composite scales reported in Appendix A.9. One standalone evaluative item was analyzed descriptively and presented in Figure 9 but was not included in the reliability analysis or scale construction because it did not conceptually align with the three predefined perceptual dimensions.

Table A26. Reliability Statistics (perceptual dimensions).

	Cronbach’s Alpha	Cronbach’s Alpha Based on Standardized Items	N of Items
clarity	0.936	0.938	3
engagement	0.874	0.880	3
system perception	0.957	0.957	5
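To give a sense of what the coefficients in Table A26 imply, the standardized alpha can be inverted to obtain the mean inter-item correlation, using the standard relation α_std = k·r̄ / (1 + (k − 1)·r̄). The Python sketch below is a back-calculation for illustration only; the implied correlations are not reported in the study, and the function names are hypothetical.

```python
# (standardized alpha, number of items) per scale, copied from Table A26.
SCALES = {
    "clarity": (0.938, 3),
    "engagement": (0.880, 3),
    "system perception": (0.957, 5),
}


def standardized_alpha(mean_r, k):
    """Standardized Cronbach's alpha from the mean inter-item correlation."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def implied_mean_correlation(alpha, k):
    """Invert the formula above to recover the mean inter-item correlation."""
    return alpha / (k - alpha * (k - 1))


implied = {
    name: implied_mean_correlation(alpha, k) for name, (alpha, k) in SCALES.items()
}
for name, mean_r in implied.items():
    print(f"{name}: implied mean inter-item r = {mean_r:.3f}")
```

Working with the standardized coefficient keeps the calculation independent of the item variances, which the table does not report.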

Appendix A.10

The table in Appendix A.10 presents the one-way ANOVA results for differences between the groups on the three perceptual dimensions (clarity, engagement, and system perception).

Table A27. ANOVA (perceptual dimensions).

		Sum of Squares	df	Mean Square	F	p
clarity_scale	Between Groups	3.405	2	1.702	0.834	0.439
	Within Groups	126.534	62	2.041
	Total	129.938	64
engagement_scale	Between Groups	1.881	2	0.940	0.663	0.519
	Within Groups	87.873	62	1.417
	Total	89.754	64
system_eval_scale	Between Groups	2.179	2	1.090	0.545	0.582
	Within Groups	123.882	62	1.998
	Total	126.062	64

Appendix A.11

The tables in Appendix A.11 present the analysis of the relationship between subjective evaluations of the learning experience and objective performance outcomes through three Pearson correlations: clarity scale with posttest score, engagement scale with task score, and modality evaluation scale (system_eval_scale) with retention score.

Table A28. Correlations (clarity scale and posttest score).

		clarity_scale	posttest_score
clarity_scale	Pearson Correlation	1	0.105
	p (2-tailed)		0.404
	N	65	65
posttest_score	Pearson Correlation	0.105	1
	p (2-tailed)	0.404
	N	65	65

Table A29. Correlations (engagement scale and task score).

		engagement_scale	task_score
engagement_scale	Pearson Correlation	1	0.078
	p (2-tailed)		0.536
	N	65	65
task_score	Pearson Correlation	0.078	1
	p (2-tailed)	0.536
	N	65	65

Table A30. Correlations (modality evaluation scale and retention score).

		system_eval_scale	retention_score
system_eval_scale	Pearson Correlation	1	0.049
	p (2-tailed)		0.698
	N	65	65
retention_score	Pearson Correlation	0.049	1
	p (2-tailed)	0.698
	N	65	65
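The significance levels in Tables A28 to A30 can be approximated from the correlation coefficients and the sample size alone. The sketch below uses Fisher's z-transformation with a normal approximation, which is an assumption made here for simplicity; the reported p-values come from the statistical software and are presumably based on the t distribution, so agreement is only approximate. The function name and the 95% interval are illustrative additions, not results reported in the study.

```python
import math

N = 65
# (Pearson r, reported two-tailed p), copied from Tables A28-A30.
CORRELATIONS = {
    "clarity vs posttest": (0.105, 0.404),
    "engagement vs task": (0.078, 0.536),
    "system_eval vs retention": (0.049, 0.698),
}


def fisher_z_test(r, n):
    """Approximate two-tailed p and 95% CI for r via Fisher's z-transform."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    # Two-tailed normal probability: erfc(x / sqrt(2)) = 2 * (1 - Phi(x)).
    p = math.erfc(abs(z) / se / math.sqrt(2))
    ci = (math.tanh(z - 1.96 * se), math.tanh(z + 1.96 * se))
    return p, ci


results = {name: fisher_z_test(r, N) for name, (r, _) in CORRELATIONS.items()}
for name, (p, (low, high)) in results.items():
    print(f"{name}: p ~ {p:.3f}, 95% CI ~ [{low:.3f}, {high:.3f}]")
```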

References

Alam, A., & Mohanty, A. (2023). Facial analytics or virtual avatars: Competencies and design considerations for student-teacher interaction in AI-powered online education for effective classroom engagement. In R. S. Tomar, S. Verma, B. K. Chaurasia, V. Singh, J. H. Abawajy, S. Akashe, P.-A. Hsiung, & R. Prasad (Eds.), Communication, networks and computing (CNC 2022). Springer. [Google Scholar] [CrossRef]
Al-Ansi, A. M., Jaboob, M., Garad, A., & Al-Ansi, A. (2023). Analyzing augmented reality (AR) and virtual reality (VR) recent development in education. Social Sciences & Humanities Open, 8(1), 100532. [Google Scholar] [CrossRef]
AlShaikh, R., Al-Mali, N., & Almasre, M. (2024). The implementation of the cognitive theory of multimedia learning in the design and evaluation of an AI educational video assistant utilizing large language models. Heliyon, 10(3), e25361. [Google Scholar] [CrossRef]
Arkün-Kocadere, S., & Çağlar-Özhan, Ş. (2024). Video lectures with AI-generated instructors: Low video engagement, same performance as human instructors. International Review of Research in Open and Distributed Learning, 25(3), 350–369. [Google Scholar] [CrossRef]
Ashrafi, A. M., & Hosna, R. (2025). Comparative study of the use of video-based learning media and textbooks on fiqh learning outcomes. Urwatul Wutsqo: Jurnal Studi Kependidikan dan Keislaman, 14(3), 859–873. [Google Scholar] [CrossRef]
Atabekova, A., Atabekov, A., & Shoustikova, T. (2026). AI-facilitated lecturers in higher education videos as a tool for sustainable education: Legal framework, education theory and learning practice. Sustainability, 18(1), 40. [Google Scholar] [CrossRef]
Baldwin, T. T., & Ford, J. K. (1988). Transfer of training: A review and directions for future research. Personnel Psychology, 41(1), 63–105. [Google Scholar] [CrossRef]
Bautista, C., Karlo, L. J., Osco, M., Elias, F., Alarcon Vasquez, S. F., & Miguel Angeles Suazo, J. (2024, October 11–13). Machine learning for corporate learning in virtual environments: A systematic review [Conference session]. 2024 14th International Conference on Dependable Systems, Services and Technologies (DESSERT) (pp. 1–8), Athens, Greece. [Google Scholar] [CrossRef]
Bechtold, S. W. (2023). The cognitive theory of multimedia learning: The impact of social cues. In J. M. Spector, B. B. Lockee, & M. D. Childress (Eds.), Learning, design, and technology (pp. 561–574). Springer. [Google Scholar] [CrossRef]
Beege, M., Schroeder, N. L., Heidig, S., Rey, G. D., & Schneider, S. (2023). The instructor presence effect and its moderators in instructional video: A series of meta-analyses. Educational Research Review, 41, 100564. [Google Scholar] [CrossRef]
Blackburn, G. (2023). AR/VR technologies in eLearning: Opportunities, challenges, and future possibilities. Available online: https://elearningindustry.com/ar-vr-technologies-in-elearning-opportunities-challenges-and-future-possibilities (accessed on 1 April 2026).
Bland, T., Guo, M., & Dousay, T. A. (2024). Multimedia design for learner interest and achievement: A visual guide to pharmacology. BMC Medical Education, 24, 113. [Google Scholar] [CrossRef] [PubMed]
Bobkina, J., Baluyan, S., & Dominguez Romero, E. (2025). Tech-enhanced vocabulary acquisition: Exploring the use of student-created video learning materials in the tertiary-level EFL flipped classroom. Education Sciences, 15(4), 450. [Google Scholar] [CrossRef]
Brynjolfsson, E., Li, D., & Raymond, L. R. (2023). Generative AI at work. National Bureau of Economic Research. (31161). Available online: https://www.nber.org/system/files/working_papers/w31161/w31161.pdf (accessed on 1 April 2026).
Cavanagh, T. M., & Kiersch, C. (2022). Using commonly-available technologies to create online multimedia lessons through the application of the Cognitive Theory of Multimedia Learning. Educational Technology Research and Development, 71, 1033–1053. [Google Scholar] [CrossRef]
Chen, Z. (2024). Responsible AI in organizational training: Applications, implications, and recommendations for future development. Human Resource Development Review, 23(4), 498–521. [Google Scholar] [CrossRef]
Çeken, B., & Taşkın, N. (2025). Examination of multimedia learning principles in augmented reality and virtual reality learning environments. Journal of Computer Assisted Learning, 41(1), e13097. [Google Scholar] [CrossRef]
Dipon, C. H., & Dio, R. V. A. (2024). A meta-analysis of the effectiveness of video-based instruction on students’ academic performance in science and mathematics. International Journal on Studies in Education, 6(4), 732–746. [Google Scholar] [CrossRef]
Fan, E., Bower, M., & Siemon, J. (2024). Video tutorials in the traditional classroom: The effects on different types of cognitive load. Technology, Knowledge and Learning, 29, 2017–2036. [Google Scholar] [CrossRef]
Fan, E., Bower, M., & Siemon, J. (2026a). Comparing cognitive load during video versus traditional classroom instruction based on heart rate variability measures. Computers & Education, 241, 105487. [Google Scholar] [CrossRef]
Fan, E., Bower, M., & Siemon, J. (2026b). From heartbeats to actions: Multimodal learning analytics of cognitive and behavioral engagement in real classrooms. Learning and Instruction, 103, 102325. [Google Scholar] [CrossRef]
Ghilay, Y. (2025). Comparison of the effectiveness of distance learning for software courses in higher education: Videos vs. texts. Higher Education Studies, 15(1), 30–40. [Google Scholar] [CrossRef]
Kašiković, N., Glušac, D., Premčevski, V., Anđelković, A., & Tasić, N. (2025, November 7–9). Using studio D-ID (creative reality studio) in educational platform asking [Conference session]. 2nd International Conference on Digital Education Trends and Challenges (p. 15), Novi Sad, Serbia. [Google Scholar]
Khilya, A., Kukharchuk, H., Sabadosh, Y., & Korol, A. (2024). Principles of creating and designing video content for asynchronous learning. Environment Technology Resources. Proceedings of the International Scientific and Practical Conference, 2, 405–409. [Google Scholar] [CrossRef]
Lin, Y., & Yu, Z. (2024). A meta-analysis evaluating the effectiveness of instructional video technologies. Technology, Knowledge and Learning, 29, 2081–2115. [Google Scholar] [CrossRef]
Lin, Y., & Yu, Z. (2025). Learner perceptions of artificial intelligence-generated pedagogical agents in language learning videos: Embodiment effects on technology acceptance. International Journal of Human–Computer Interaction, 41(2), 1606–1627. [Google Scholar] [CrossRef]
Litvinenko, A. (2026). The knowledge collaboration framework: A review of artificial intelligence in organisational learning and knowledge management. Journal of Decision Systems, 35(1), 2616703. [Google Scholar] [CrossRef]
Mayer, R. E. (2002a). Multimedia learning. Psychology of Learning and Motivation, 41, 85–139. [Google Scholar] [CrossRef]
Mayer, R. E. (2002b). Multimedia learning. The Annual Report of Educational Psychology in Japan, 41, 27–29. [Google Scholar] [CrossRef]
Mayer, R. E. (2024). The past, present, and future of the cognitive theory of multimedia learning. Educational Psychology Review, 36(8), 1–25. [Google Scholar] [CrossRef]
Navarrete, E., Nehring, A., Schanze, S., Ewerth, R., & Hoppe, A. (2025). A closer look into recent video-based learning research: A comprehensive review of video characteristics, tools, technologies, and learning effectiveness. International Journal of Artificial Intelligence in Education, 35, 1631–1694. [Google Scholar] [CrossRef]
Noetel, M., Griffith, S., Delaney, O., Sanders, T., Parker, P., Del Pozo Cruz, B., & Lonsdale, C. (2021). Video improves learning in higher education: A systematic review. Review of Educational Research, 91(2), 204–236. [Google Scholar] [CrossRef]
Park, J. J. (2024). Unlocking training transfer in the age of artificial intelligence. Business Horizons, 67(3), 263–269. [Google Scholar] [CrossRef]
Salas, E., Tannenbaum, S. I., Kraiger, K., & Smith-Jentsch, K. A. (2012). The science of training and development in organizations: What matters in practice. Psychological Science in the Public Interest, 13(2), 74–101. [Google Scholar] [CrossRef]
Shamim, M. (2018). Application of cognitive theory of multimedia learning in under graduate surgery course. International Journal of Surgery Research and Practice, 5(2), 065. [Google Scholar] [CrossRef]
Shen, Y. (2024). Examining the efficacies of instructor-designed instructional videos in flipped classrooms on student engagement and learning outcomes: An empirical study. Journal of Computer Assisted Learning, 40(4), 1791–1805. [Google Scholar] [CrossRef]
Tan, S. F. (2024). Perceptions of students on artificial intelligence-generated content avatar utilization in learning management systems. Asian Association of Open Universities Journal, 19(2), 170–185. [Google Scholar] [CrossRef]
Trenholm, S., & Marmolejo-Ramos, F. (2024). When video improves learning in higher education. Education Sciences, 14, 311. [Google Scholar] [CrossRef]
Truss, A., McBride, K., Porter, H., Anderson, V., Stilwell, G., Philippou, C., & Taggart, A. (2024). Learner engagement with instructor-generated video. British Journal of Educational Technology, 55, 2192–2211. [Google Scholar] [CrossRef]
Wiafe, I., Ekpezu, A. O., Gyamera, G. O., Winful, F. B. P., Atsakpo, E. D., Nutropkor, C., & Gulliver, S. (2025). Comparative evaluation of learning technologies using a randomized controlled trial: Virtual reality, augmented reality, online video platforms, and traditional classroom learning. Education and Information Technologies, 30, 11775–11795. [Google Scholar] [CrossRef]
Xie, B., Liu, H., Alghofaili, R., Zhang, Y., Jiang, Y., Lobo, F., Li, C., Li, W., Huang, H., Akdere, M., Mousas, C., & Yu, L. F. (2021). A review on virtual reality skill training applications. Frontiers in Virtual Reality, 2, 645153. [Google Scholar] [CrossRef]
Xu, T., Chen, Q., Zhang, Z., Dong, B., Zhang, H., Bai, J., & Zhou, Y. (2025). Maximizing effectiveness of AI-generated instructors through human-like behavior and dynamic visual cues in instructional videos: Evidence from an eye-tracking study. The Internet and Higher Education, 67, 101034. [Google Scholar] [CrossRef]
Xu, W., & Ouyang, F. (2022). The application of AI technologies in STEM education: A systematic review from 2011 to 2021. International Journal of STEM Education, 9(1), 59. [Google Scholar] [CrossRef]
Yusuf, H., Money, A., & Daylamani-Zad, D. (2025). Pedagogical AI conversational agents in higher education: A conceptual framework and survey of the state of the art. Educational Technology Research and Development, 73, 815–874. [Google Scholar] [CrossRef]
Zhang, Y., Lucas, M., Bem-Haja, P., & Pedro, L. (2024). The effect of student acceptance on learning outcomes: AI-generated short videos versus paper materials. Computers and Education: Artificial Intelligence, 7, 100286. [Google Scholar] [CrossRef]

Figure 1. Conceptual framework of the study.

Figure 2. The instructional content—static condition (screen shot).

Figure 3. The instructional content—video condition (screen shot).

Figure 4. The instructional content—AI condition (screen shot).

Figure 5. Sequential overview of the experimental procedure.

Figure 6. Mean scores on the pre-test and post-test across the three instructional modalities.

Figure 7. Mean scores on the practical task and error rates immediately after the learning phase (task score and task errors) and after seven days (retention score and retention errors).

Figure 8. Mean task completion time for the practical task immediately after the learning phase and seven days later.

Figure 9. Mean values of the Likert scale responses evaluating satisfaction and perceived knowledge gained across the three instructional modalities.

Table 1. Participant attrition across experimental conditions.

Group	Initial Sample	Final Sample	Attrition (%)
G1	27	25	7.4
G2	28	21	25
G3	32	19	40.6
Total	87	65	25.3
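The attrition percentages in Table 1 follow directly from the initial and final sample sizes. The short Python sketch below recomputes them as an arithmetic check; the function name is hypothetical.

```python
# (initial sample, final sample) per group, copied from Table 1.
SAMPLES = {"G1": (27, 25), "G2": (28, 21), "G3": (32, 19)}


def attrition_pct(initial, final):
    """Share of initially enrolled participants who did not finish, in %."""
    return 100 * (initial - final) / initial


rates = {g: round(attrition_pct(i, f), 1) for g, (i, f) in SAMPLES.items()}
total_initial = sum(i for i, _ in SAMPLES.values())
total_final = sum(f for _, f in SAMPLES.values())
rates["Total"] = round(attrition_pct(total_initial, total_final), 1)
print(rates)
```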

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

