1. Introduction
In contemporary organizational environments, continuous learning and digital upskilling are essential for maintaining competitiveness and adapting to rapidly evolving technological demands. Organizations increasingly rely on digital learning solutions to enhance employee competencies, improve productivity, and support operational efficiency. Recent organizational training research additionally emphasizes the importance of responsible and evidence-based implementation of AI-supported learning systems, particularly regarding their practical effectiveness, transparency, and managerial implications (Chen, 2024). Among these solutions, video-based instruction and, more recently, artificial intelligence (AI)-supported learning formats have attracted significant attention as scalable and flexible approaches to workforce development (Wiafe et al., 2025; Park, 2024; Bautista et al., 2024; Lin & Yu, 2024; Brynjolfsson et al., 2023; W. Xu & Ouyang, 2022; Cavanagh & Kiersch, 2022).
The growing adoption of these technologies is largely driven by the assumption that more advanced and interactive learning formats lead to better learning outcomes, higher engagement, and improved performance. In particular, video-based instruction is often considered more effective than static materials because it can demonstrate procedures dynamically and support the acquisition of practical skills (Ghilay, 2025; Dipon & Dio, 2024). Similarly, AI-generated instructional agents are promoted as innovative tools capable of enhancing learner interaction and creating a sense of social presence within digital learning environments (Atabekova et al., 2026; Alam & Mohanty, 2023).
AI-powered virtual reality (VR) and augmented reality (AR) learning environments offer immersive, practice-oriented learning opportunities (Al-Ansi et al., 2023; Xie et al., 2021), yet their high implementation costs, technical requirements, and limited accessibility raise concerns regarding their scalability and cost-effectiveness in organizational contexts (Blackburn, 2023).
Despite these trends, empirical findings regarding the effectiveness of different instructional modalities remain inconsistent. While some studies report positive effects of video-based learning on academic performance and engagement (Shen, 2024; Noetel et al., 2021), others suggest that such benefits may be context-dependent and influenced by factors such as task complexity, learner characteristics, and instructional design quality (Navarrete et al., 2025; Trenholm & Marmolejo-Ramos, 2024). This inconsistency raises a relevant question from a management perspective: do more resource-intensive and technologically advanced learning formats actually lead to improved performance outcomes, or do they primarily enhance users’ subjective experience without measurable gains in effectiveness?
The integration of AI into digital learning environments adds further complexity. AI-generated avatars and virtual instructors are increasingly used to simulate human presence in instructional materials, aiming to improve engagement and perceived interactivity. Research suggests that such features may influence learning through social and cognitive mechanisms, including increased motivation and perceived clarity (T. Xu et al., 2025; Tan, 2024; Beege et al., 2023). However, other findings indicate that AI-generated instructors may produce similar learning outcomes to human instructors, while potentially reducing perceived naturalness or increasing cognitive distraction (Lin & Yu, 2025; Arkün-Kocadere & Çağlar-Özhan, 2024). This indicates that the role of AI in instructional design remains insufficiently understood.
From an organizational perspective, this presents a significant challenge. Investments in digital learning are often justified by expectations of improved performance and efficiency, yet the actual return on these investments remains uncertain. In particular, the potential mismatch between perceived effectiveness and objective performance outcomes may lead to suboptimal decision-making in the selection and implementation of learning strategies. Understanding this relationship is therefore essential for developing evidence-based approaches to organizational learning and digital transformation. Recent reviews further suggest that AI is increasingly reshaping organizational learning and knowledge-management processes, requiring organizations to critically evaluate how AI-supported systems influence learning effectiveness and strategic decision-making (Litvinenko, 2026).
In response to these challenges, this study examines the effects of three instructional modalities—static instructional materials, video-based instruction with human narration, and video-based instruction with an AI-generated avatar—on knowledge acquisition, task performance, and retention in a controlled experimental setting. In addition to objective performance measures, the study includes subjective evaluations of the learning experience, allowing for analysis of potential discrepancies between perceived and actual effectiveness.
Accordingly, this study addresses the following research questions:
RQ1. Do different digital instructional modalities influence immediate knowledge acquisition in a task-oriented learning context?
RQ2. Do different instructional modalities influence procedural task performance and knowledge retention over time?
RQ3. What is the relationship between subjective evaluations of instructional effectiveness and objective performance outcomes?
RQ4. How do participants qualitatively perceive engagement, clarity, and usability across different instructional modalities?
The contribution of this study is threefold. First, it provides empirical evidence on the comparative effectiveness of different digital learning formats in a task-oriented context relevant to workplace activities. Second, it identifies boundary conditions under which more advanced and resource-intensive learning modalities may not yield measurable performance benefits. Third, it offers practical implications for organizational decision-making by highlighting a potential divergence between perceived and actual effectiveness of learning solutions, thereby informing more cost-effective and evidence-based approaches to digital learning and AI adoption.
3. Materials and Methods
3.1. Research Design
The study was a controlled experimental investigation designed to examine the effectiveness of different digital learning modalities in a task-oriented learning context relevant to workplace activities. The initial sample consisted of 87 participants (G1 = 27, G2 = 28, G3 = 32). The final analyzed sample included 65 participants (G1 = 25, G2 = 21, G3 = 19), corresponding to an overall attrition rate of 25.3% (see Table 1). Attrition was primarily associated with participant absence during scheduled experimental phases, particularly the delayed retention phase, incomplete or unsubmitted questionnaires, incorrectly entered identification information preventing response matching across phases, and failure to save or submit practical task files required for evaluation. The larger reduction observed in group 3 appeared to be primarily related to attendance-related factors during later phases of data collection rather than performance-related exclusion. Only participants with complete and evaluable datasets across all phases were retained in the final analysis.
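The attrition figures above can be reproduced directly from the reported group sizes. The following is a minimal sketch that uses only the counts given in the text; the function and variable names are illustrative and not part of the study materials.

```python
# Group sizes as reported in the text (initial and final analyzed samples).
initial = {"G1": 27, "G2": 28, "G3": 32}
final = {"G1": 25, "G2": 21, "G3": 19}


def attrition_rate(n_initial: int, n_final: int) -> float:
    """Share of initially enrolled participants missing from the final sample."""
    return (n_initial - n_final) / n_initial


# Per-group and overall attrition, expressed as proportions.
per_group = {g: attrition_rate(initial[g], final[g]) for g in initial}
overall = attrition_rate(sum(initial.values()), sum(final.values()))

for group, rate in per_group.items():
    print(f"{group}: {rate:.1%}")
print(f"Overall: {overall:.1%}")
```

Computed this way, the overall rate matches the reported 25.3%, and the per-group values (about 7% in G1, 25% in G2, and 41% in G3) make the uneven loss across conditions explicit.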
The final sample included 22 male and 43 female participants enrolled in the first year of the Graphic Engineering and Design program at the Faculty of Technical Sciences, University of Novi Sad, Serbia. Participants were predominantly between 18 and 21 years of age. All participants were novice users without prior formal experience with the spreadsheet functions addressed in the study.
The use of a student sample was methodologically justified by the study’s objectives. Specifically, participants were considered representative of novice users in a digital learning context. In organizational settings, similar conditions often arise when employees are introduced to new digital tools, software systems, or data-processing tasks for which they have little or no prior experience. In such situations, early learning phases are characterized by the acquisition of fundamental procedural knowledge under structured instructional conditions, making student participants a suitable proxy for entry-level workforce training.
Additionally, the relatively homogeneous composition of the sample enhanced the internal validity of the study by reducing variability associated with prior knowledge, experience, and professional background. This allowed for a more precise examination of the effects of instructional modality, as potential confounding influences were minimized.
Participants were randomly assigned to one of three experimental groups using a random allocation procedure. Group 1 received static instructional materials consisting of text and images. Group 2 was exposed to a screen-recorded video accompanied by human voice narration, while Group 3 received a comparable video-based instructional format supplemented with an AI-generated avatar delivering the narration. The allocation procedure was intended to minimize systematic bias across conditions.
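The text does not specify how the random allocation was carried out. As a purely illustrative sketch, simple (unrestricted) randomization could look as follows; the condition labels, participant codes, and seed are hypothetical, and the choice of simple randomization is an assumption rather than the documented procedure.

```python
import random

# Hypothetical labels for the three conditions described in the text.
CONDITIONS = ("G1_static", "G2_human_video", "G3_avatar_video")


def allocate(participant_ids, seed=None):
    """Independently assign each participant to one condition at random.

    Simple randomization is an assumption here; it does not force equal
    group sizes, so the resulting groups will usually differ in size.
    """
    rng = random.Random(seed)
    return {pid: rng.choice(CONDITIONS) for pid in participant_ids}


# Hypothetical participant codes; 87 matches the reported initial sample size.
ids = [f"P{i:03d}" for i in range(1, 88)]
assignment = allocate(ids, seed=2025)
sizes = {c: sum(1 for v in assignment.values() if v == c) for c in CONDITIONS}
print(sizes)
```

Fixing the seed makes the allocation reproducible for auditing. A blocked or stratified scheme would be the alternative if equal group sizes were required.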
3.2. Instructional Materials and Experimental Procedure
The instructional content was designed to simulate a typical task-oriented digital learning scenario relevant to workplace environments. The learning material focused on the application of fundamental spreadsheet functions (AVERAGE, MIN, MAX, MEDIAN, and SUM), as well as formula replication across multiple cells. These operations reflect common data-processing tasks in administrative and business contexts.
Across all three experimental conditions, the instructional content was identical in structure, sequence, and conceptual scope, differing only in the mode of presentation. In the static condition, content was delivered through written explanations supported by screenshots illustrating the workflow (see Figure 2). In the video-based conditions, the same instructional sequence was presented through a real-time screen recording of the task execution (see Figure 3). In the human narration condition, explanations were provided through a recorded voice-over, while in the AI condition, narration was delivered by an AI-generated avatar positioned in the lower corner of the screen (see Figure 4). The textual script used in the static condition corresponded directly to the narration used in both video conditions, ensuring content equivalence. These materials are publicly available on the website https://www.asking.edu.rs/ (accessed on 1 December 2025) (Kašiković et al., 2025).
In the AI-avatar condition, the instructional video was created using Studio D-ID (Creative Reality Studio 3.0), which enables the generation of talking avatars synchronized with pre-recorded narration. The avatar was selected from the platform’s built-in avatar database and was presented as a human-like digital instructor positioned in the lower right corner of the screen throughout the instructional sequence. It depicted a middle-aged male instructor with neutral visual characteristics, rendered in a semi-realistic visual style with facial animation, subtle head movements, and automatic lip synchronization generated through the platform’s AI voice engine. The narration was delivered using AI-generated synthetic speech in English, synchronized automatically with facial movements and mouth articulation. The avatar remained visually present during the entire video lesson and did not interact dynamically with the learner beyond scripted narration. No additional gestures, adaptive feedback, or interactive elements were included.
The overall sequence of the experimental procedure is presented in Figure 5.
The experimental procedure comprised five phases. First, participants completed a pretest to assess baseline knowledge. The test included eight items—multiple-choice and true/false questions—with a maximum score of eight points. The pretest measured prior knowledge and served as a control variable in subsequent analyses.
The second phase was the learning session, during which participants engaged with instructional materials corresponding to their assigned condition. This phase lasted 20 min for all groups. During the learning phase, participants in all conditions were allowed to revisit the instructional material within the allocated 20 min period. In the static condition, participants could scroll through the instructional content freely, while in the video-based conditions, participants could pause, rewind, and replay segments of the instructional videos. Although the navigation format differed according to the instructional medium, all groups had identical exposure duration and equal access to the learning material within the same fixed time frame.
Immediately after the learning phase, participants completed a post-test to assess comprehension of the instructional content. The post-test mirrored the pretest in structure, comprising multiple-choice, true/false, and short-answer items, with a maximum score of eight points. Although the pretest and post-test assessed the same instructional content and learning objectives, the items were not identical. The tests followed the same structural format and covered equivalent procedural concepts, but variations in wording and item presentation were introduced in order to reduce simple recall effects and minimize potential practice effects across testing phases.
The third phase was a practical task simulating a typical data-processing activity. Participants were required to apply the learned spreadsheet functions to a dataset representing performance metrics across multiple entities. Task performance was evaluated using three indicators: accuracy (number of correct outputs), completion time (in seconds), and number of errors. The maximum achievable score was eight points, with one point awarded for each correctly calculated output.
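The scoring rule described above (one point per correctly calculated output) can be sketched in a few lines. The dataset, the tolerance, and the rule that unanswered outputs are not counted as errors are all assumptions made for illustration; the study's task had eight outputs, whereas this illustrative answer key has five, one per spreadsheet function.

```python
import math
import statistics

# Hypothetical dataset standing in for the performance metrics in the task file.
values = [12.0, 7.5, 9.0, 15.5, 9.0, 11.0]

# Answer key built from Python equivalents of the five spreadsheet functions.
answer_key = {
    "AVERAGE": statistics.mean(values),
    "MIN": min(values),
    "MAX": max(values),
    "MEDIAN": statistics.median(values),
    "SUM": sum(values),
}


def score_task(submitted, key, tol=0.01):
    """Return (accuracy, errors) for one participant.

    accuracy = number of outputs matching the key within `tol`;
    errors   = number of submitted outputs that do not match.
    Treating unanswered outputs as neither is an assumption of this sketch.
    """
    accuracy = 0
    errors = 0
    for name, expected in key.items():
        answer = submitted.get(name)
        if answer is None:
            continue  # unanswered: no point awarded, not counted as an error here
        if math.isclose(answer, expected, rel_tol=0.0, abs_tol=tol):
            accuracy += 1
        else:
            errors += 1
    return accuracy, errors


# Hypothetical submission: rounded average, wrong median, SUM left blank.
submission = {"AVERAGE": 10.67, "MIN": 7.5, "MAX": 15.5, "MEDIAN": 9.0}
result = score_task(submission, answer_key)
print(result)
```

Completion time would be recorded separately (in seconds), since it cannot be derived from the submitted file contents alone.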
The fourth phase involved a subjective evaluation of the learning experience. Participants completed a questionnaire with Likert-scale items assessing perceived clarity, engagement, and overall effectiveness of the instructional format, as well as open-ended questions for qualitative feedback.
The fifth phase took place seven days after the initial session and consisted of a retention test in the form of a practical task. This task followed the same structure and evaluation criteria as the initial practical task, but with a reduced maximum score of four points and modified datasets designed to assess equivalent procedural knowledge while reducing direct repetition effects. The retention phase enabled the assessment of longer-term knowledge stability and procedural task performance.
3.3. Measures
The study incorporated both objective and subjective measures to capture different dimensions of learning effectiveness.
Objective performance was assessed using pre-test and post-test scores, task accuracy, task completion time, number of errors, and retention performance. These measures reflect key indicators of learning effectiveness in organizational contexts, including knowledge acquisition, procedural execution, efficiency, and error minimization.
Subjective perceptions were measured with composite scales derived from Likert-type items. Three dimensions were assessed: perceived clarity of instruction, engagement during the learning process, and overall evaluation of the instructional format. Reliability analysis indicated high internal consistency for all scales, supporting their use as aggregated measures.
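The text does not name the reliability coefficient that was used. One common index of internal consistency for Likert-type composites is Cronbach's alpha, sketched below under that assumption; the response matrix is hypothetical and serves only to show the calculation.

```python
import statistics


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [statistics.variance([r[i] for r in rows]) for i in range(k)]
    total_var = statistics.variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical 5-point Likert responses (rows = respondents, columns = items).
clarity_items = [
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
]
alpha = cronbach_alpha(clarity_items)
print(round(alpha, 3))
```

Values closer to 1 indicate that the items vary together, which is the usual justification for averaging them into a single composite score.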
Additionally, qualitative responses were collected to provide deeper insight into participants’ experiences with different learning modalities, particularly regarding perceived usability and potential limitations of the instructional formats.
3.4. Data Analysis
Statistical analysis was performed using IBM SPSS Statistics 20. Descriptive statistics were examined to assess central tendency and variability across all measured variables. To test for differences between experimental groups, one-way analysis of variance (ANOVA) was applied to normally distributed variables, while analysis of covariance (ANCOVA) was used when it was necessary to control for prior knowledge or immediate learning outcomes. When assumptions of normality were not met, non-parametric tests (Kruskal–Wallis) were used. This combination of statistical procedures enabled a robust evaluation of differences between learning modalities, considering the characteristics of the data and ensuring the validity of the results.
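The analyses in the study were run in SPSS. For readers who want to see what the two between-group tests compute, the following standard-library Python sketch derives the one-way ANOVA F statistic and the tie-corrected Kruskal–Wallis H statistic. The scores are hypothetical, and p-values are omitted because they require F and chi-square distribution functions that a statistics package would supply.

```python
from itertools import chain


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = list(chain.from_iterable(groups))
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def average_ranks(values):
    """Rank values 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        tie_sizes.append(end - start + 1)
        start = end + 1
    return ranks, tie_sizes


def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic."""
    all_values = list(chain.from_iterable(groups))
    n = len(all_values)
    ranks, tie_sizes = average_ranks(all_values)
    total, offset = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[offset:offset + len(g)])
        total += rank_sum ** 2 / len(g)
        offset += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are tied; H is undefined")
    return h / correction


# Hypothetical task-accuracy scores for three small groups.
scores = [[6, 7, 8, 7], [7, 8, 8, 6], [5, 7, 6, 8]]
print(one_way_anova_f(scores))
print(kruskal_wallis_h(scores))
```

ANOVA compares group means on the raw scale, whereas Kruskal–Wallis works on ranks, which is why it serves as the fallback when the normality assumption does not hold.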
Because the study addressed a theoretically integrated set of hypotheses examining related dimensions of learning effectiveness, formal family-wise error corrections were not applied across all statistical tests. Instead, emphasis was placed on effect sizes, consistency of patterns across analyses, and theoretical interpretation of the findings, in line with recommendations for exploratory and multifactorial educational research contexts. In the present study, the majority of analyses were non-significant, reducing the likelihood that the overall interpretation of the findings was substantially influenced by Type I error inflation.
Qualitative responses from the open-ended questionnaire items were analyzed using a descriptive thematic approach. Participant comments were independently reviewed by two researchers and grouped into recurring thematic categories related to clarity, engagement, usability, and perceived limitations of the instructional formats. Any differences in interpretation were resolved through discussion and consensus. Given the exploratory and supplementary role of the qualitative data, the analysis was intended primarily to support interpretation of the quantitative findings rather than to provide a fully formalized qualitative analytical framework.
4. Results
4.1. Descriptive Analysis
An initial understanding of the results was gained through descriptive statistics and graphical representation of mean scores across all experimental phases. Visual inspection of pretest and post-test results indicated an overall improvement in performance following the learning intervention (Figure 6). The highest mean post-test scores were observed in the group exposed to video-based instruction with human narration (G2), while the lowest scores were recorded in the AI avatar condition (G3). Participants in G1 generally demonstrated intermediate performance patterns across the analyzed phases, with results remaining comparable to those observed in the two video-based conditions.
In the practical task completed immediately after the learning phase (Figure 7), participants in the video with human narration condition again achieved the highest scores, followed by the AI avatar group, with the lowest performance recorded in the static condition. Error rates (Figure 7) followed the same pattern. In the delayed retention task (Figure 8), the video with human narration condition again showed the most favorable outcomes, while the AI avatar condition showed the lowest performance and the highest number of errors.
With regard to subjective evaluations (Figure 9), the results indicated that both video modalities were rated more positively than the static format, with a slightly more favorable pattern again observed for the video lesson with a human narrator. Descriptive patterns appeared broadly consistent with theoretical expectations derived from the Cognitive Theory of Multimedia Learning, which suggests that the combination of visual and auditory channels can contribute to more efficient processing of content than presentation primarily based on static text and images.
These descriptive trends may indicate a potential advantage of video-based formats in facilitating procedural understanding and initial task execution. However, descriptive differences alone are insufficient for evaluating learning effectiveness, which must be assessed using statistically validated performance indicators.
4.2. Immediate Knowledge Acquisition
The effect of instructional modality on immediate knowledge acquisition was examined using a one-way analysis of covariance, controlling for prior knowledge. This analysis was conducted to test H1. The results indicated no statistically significant difference between the three groups in post-test performance, F(2, 61) = 0.903, p = 0.411, partial eta squared = 0.029 (see Appendix A.2). The observed effect size was small, indicating limited practical significance of modality differences.
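The reported effect size and p-value can be recovered from the F statistic and its degrees of freedom. The sketch below uses the standard identity for partial eta squared and a closed-form upper-tail probability that holds only when the numerator degrees of freedom equal 2, as they do for a three-group comparison; the function names are illustrative.

```python
def partial_eta_squared(f_stat, df_effect, df_error):
    """Partial eta squared recovered from F and its degrees of freedom."""
    return (f_stat * df_effect) / (f_stat * df_effect + df_error)


def p_value_df1_equals_2(f_stat, df_error):
    """Upper-tail p-value of an F statistic; closed form valid only for df1 == 2."""
    return (1 + 2 * f_stat / df_error) ** (-df_error / 2)


# Values reported in the text for the ANCOVA on post-test scores.
eta = partial_eta_squared(0.903, 2, 61)
p = p_value_df1_equals_2(0.903, 61)
print(round(eta, 3), round(p, 3))
```

With 65 participants, three groups, and one covariate, the error degrees of freedom are 65 - 3 - 1 = 61, which is consistent with the reported F(2, 61).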
Although descriptive patterns suggested slightly higher performance in the video-based conditions, these differences were not sufficiently pronounced to be considered statistically reliable. Overall, the analysis did not provide evidence that one instructional format produced superior immediate knowledge acquisition compared to the others. Therefore, H1 was not supported.
In the context of organizational training research, this result aligns with the view that learning outcomes are influenced by multiple interacting factors, including instructional design, learner characteristics, and task structure, rather than delivery format alone (Salas et al., 2012).
4.3. Practical Task Performance
To test H2, practical task performance was analyzed. No statistically significant differences between the groups were found in task accuracy, F(2, 525.992) = 0.557, p = 0.576, with small effect sizes indicating minimal practical differences between conditions (see Appendix A.3).
Similarly, no statistically significant differences were found in task completion time or number of errors (see Appendix A.4 and Appendix A.5). Although the Kruskal–Wallis test for completion time approached statistical significance (p = 0.056), this result did not meet conventional thresholds for significance. Descriptively, participants in the video conditions completed the task more quickly immediately after learning, but this advantage was not maintained in the delayed condition.
Thus, H2 was not supported.
In practical terms, differences in learning modality may not translate into meaningful differences in task execution performance, even when small efficiency trends are present. This is particularly relevant for decision-making in learning design, as it indicates that perceived improvements in efficiency may not reflect stable or generalizable performance gains.
Consistent with prior research on training transfer, the ability to apply learned knowledge in practical contexts depends on factors beyond initial exposure, including task characteristics and opportunities for reinforcement (Baldwin & Ford, 1988).
4.4. Knowledge Retention
H3 was tested by analyzing delayed knowledge retention. The analysis revealed no statistically significant differences between the groups, F(2, 61) = 0.458, p = 0.634, partial eta squared = 0.015 (see Appendix A.6). This small effect size further suggests minimal practical impact of instructional modality on knowledge retention. Neither task completion time (see Appendix A.7) nor number of errors (see Appendix A.8) showed significant variation across conditions in the delayed phase. Accordingly, H3 was not supported.
These findings indicate that instructional modality did not influence the durability of learning over time. Theoretically, this suggests that any potential advantages of multimedia or AI-supported instructional modalities during initial exposure do not necessarily lead to more stable knowledge representations.
In organizational training contexts, retention and transfer are considered critical indicators of effectiveness, as they determine whether acquired knowledge can be applied in real work situations (Salas et al., 2012). The absence of differences in retention performance, therefore, indicates that more advanced instructional formats may not provide additional value in supporting long-term learning outcomes in comparable task conditions.
4.5. Perception–Performance Relationship
To examine H4, relationships between subjective evaluations and objective performance outcomes were analyzed, and no statistically significant correlations were found (see Appendix A.10). One-way ANOVA analyses additionally indicated no statistically significant differences between instructional modalities regarding perceived clarity, engagement, or overall system evaluation (see Appendix A.10), despite descriptively more positive evaluations observed for the video-based formats. Perceived clarity was not associated with post-test performance (r = 0.105, p = 0.404), engagement was not associated with task performance (r = 0.078, p = 0.536), and overall modality evaluation was not associated with retention performance (r = 0.049, p = 0.698) (see Appendix A.11). These findings are consistent with H4, indicating weak and statistically non-significant associations between subjective evaluations and objective performance outcomes.
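To illustrate how weak these associations are, the sketch below computes Pearson's r from paired scores and converts a correlation into its t statistic with n - 2 degrees of freedom. It assumes the correlations were computed on the full final sample of 65 participants, which the text does not state explicitly; the paired scores are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def t_from_r(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical paired scores: perceived clarity rating and post-test points.
clarity = [4.0, 3.5, 5.0, 2.5, 4.5, 3.0]
post_test = [6, 7, 6, 5, 8, 6]
print(round(pearson_r(clarity, post_test), 3))

# Reported correlations, with n = 65 assumed (final analyzed sample).
for r in (0.105, 0.078, 0.049):
    print(r, round(t_from_r(r, 65), 3))
```

Under that assumption, all three t values fall well below 2, the approximate two-tailed 5% critical value at 63 degrees of freedom, which agrees with the non-significant p-values reported.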
Taken together, the results indicate a divergence between perceived and actual effectiveness of learning modalities. From a management perspective, this is particularly relevant, as it suggests that user satisfaction and engagement may not be reliable indicators of performance improvement.
These results emphasize the need to evaluate learning effectiveness across multiple dimensions, including cognitive, behavioral, and performance outcomes. Relying solely on subjective feedback may therefore lead to suboptimal decisions in the selection and implementation of learning strategies.
4.6. Qualitative Analysis of Participants’ Experiences Across Instructional Modalities
Qualitative responses provided additional insight into how participants in the study experienced the different forms of instructional presentation. The qualitative findings should be interpreted as exploratory and illustrative, complementing the quantitative analyses rather than serving as independently validated qualitative conclusions.
In the group that learned through the static presentation, responses most frequently emphasized the clarity, conciseness and transparency of the material, as well as the visual support provided through images and the step-by-step representation. This format was described as “short and clear”, “simply explained” and “easy to follow”. The main advantages highlighted were the structured organization of the material, the possibility of easily returning to the content, and the feeling of independent learning. As potential improvements, greater interactivity and stronger visual emphasis on key information were suggested. Overall, the static format was perceived as a clear and understandable way of initially introducing new content.
In the group exposed to the video demonstration with tutor narration, participants emphasized that “watching how someone performs the task while explaining it at the same time” helped them understand the procedure and apply the learned steps more easily. They particularly highlighted the detailed explanations, the repetition of key steps, and the overall efficiency of this approach. Suggestions for improvement mainly referred to technical aspects such as sound quality and a slower pace of explanation, rather than to the concept of the instructional format itself. Overall, this modality was perceived positively and described as useful, engaging, and easy to follow.
In the group exposed to the video demonstration with the AI avatar, participants recognized the advantages of the video format, particularly the possibility of revisiting the content, following the procedure step by step, the clearly structured sequence of actions, and the practical nature of the format. However, unlike the tutor video group, this group expressed a clear reservation towards the presence of the AI avatar. Several participants perceived the visual representation of the avatar as somewhat distracting or less natural, which may have influenced attention during task performance. Suggested improvements included removing the avatar from the frame, slowing down the pace of explanation, and providing stronger visual highlighting of formulas or instructions on the screen. Thus, although the basic video format was perceived as useful and practical, the personalized AI component itself was not uniformly experienced as an advantage.
The qualitative findings suggested descriptive differences in participants’ subjective experiences across instructional modalities, particularly regarding engagement and perceived usability. However, these perceptions were not reflected in statistically significant quantitative differences, further reinforcing the divergence between perceived and actual effectiveness.
5. Discussion
5.1. Instructional Modality and Effectiveness
Addressing RQ1, the findings related to immediate knowledge acquisition indicated that video-based instructional modalities did not produce statistically superior outcomes compared to the static instructional condition. More broadly, the results suggest that the choice of instructional modality did not significantly influence learning outcomes within the present task-oriented learning context. The non-support of H1–H3 further indicates that the expected advantages of richer instructional modalities may not emerge in relatively simple, well-structured procedural tasks. The consistently small effect sizes across analyses additionally support the conclusion that instructional modality had limited practical significance in this context. This challenges the common assumption that more technologically advanced learning formats, such as video-based instruction or AI-supported delivery, inherently lead to superior performance. When instructional content is clearly structured and cognitively manageable, the present findings provide no statistically reliable evidence that video-based or AI-supported delivery produces superior knowledge acquisition or application compared to static materials. This interpretation aligns with recent findings by Zhang et al. (2024), who showed that although AI-generated video materials were more positively perceived and more readily accepted by learners, these advantages did not result in improved learning outcomes compared to traditional paper-based materials. This divergence reinforces the notion that learner acceptance and perceived effectiveness are not necessarily reliable indicators of actual performance gains.
From a managerial standpoint, these findings raise important questions about the return on investment (ROI) of advanced digital learning solutions. If higher levels of technological sophistication do not produce measurable improvements in performance, organizations may risk overestimating the effectiveness of formats that primarily enhance user perception without corresponding evidence of improved performance. This has direct implications for organizational learning strategies, indicating that effectiveness may depend more on instructional design quality than on technological sophistication. In practical terms, organizations may achieve comparable performance outcomes using simpler and less resource-intensive instructional solutions, particularly for routine procedural tasks.
5.2. Cost–Effectiveness and Resource Allocation
Regarding RQ2, the findings similarly indicated no statistically significant differences between instructional modalities in procedural task performance, including task accuracy, completion time, and error frequency. These results have important practical implications from a cost-effectiveness perspective, particularly considering that advanced instructional formats, such as professionally produced videos and AI-generated instructional agents, typically require substantial financial, technical, and time-related resources. However, the lack of measurable performance advantages in this study suggests that such investments may not always lead to improved outcomes. Organizations may need to reconsider the assumption that more complex and visually engaging formats justify their higher costs. Instead, the findings support a more strategic approach, selecting modalities based on task requirements and expected performance gains rather than technological appeal or perceived innovation.
5.3. Perception–Performance Gap
With respect to RQ3, the retention findings suggested that the instructional modalities did not differ substantially in supporting longer-term knowledge retention within the present learning context. In addition to retention outcomes, an important contribution of this study lies in the identification of a divergence between subjective perceptions and objective performance outcomes. Although participants tended to evaluate video-based formats as more engaging and effective, these perceptions were not associated with improved task performance or knowledge retention. This perception–performance gap has important implications for organizational decision-making. Learning programs are often evaluated based on user satisfaction and engagement metrics, which may not accurately reflect their impact on actual performance. The results of this study suggest that relying solely on subjective evaluations may lead organizations to overestimate the effectiveness of certain formats, particularly those perceived as more modern or technologically advanced.
The findings related to AI-generated avatars provide further insight into the implementation of emerging technologies in organizational learning environments. Although AI-supported instructional modalities are often promoted as scalable and innovative solutions, the results suggest that their impact on learning effectiveness may be limited in certain contexts.
Moreover, qualitative feedback from some participants suggests that the presence of an AI-generated avatar may introduce unintended effects, such as perceived distraction or reduced naturalness of the interaction. These factors may negatively influence user experience without providing measurable benefits in performance. From an organizational perspective, the integration of such technologies should not be driven solely by their novelty or perceived innovation, but should be carefully evaluated in terms of their actual contribution to performance outcomes and the quality of user interaction.
5.4. Boundary Conditions of Digital Learning Effectiveness
In relation to RQ4, subjective evaluations and qualitative perceptions revealed descriptive differences in perceived engagement, usability, and overall learning experience across instructional modalities, despite the absence of corresponding objective performance differences. More specifically, participants generally evaluated video-based formats more positively, while these perceptions were not associated with improved learning outcomes. These findings contribute to a more nuanced understanding of the conditions under which advanced instructional modalities may or may not provide measurable benefits. In this context, the absence of statistically significant modality effects may be interpreted as a potential boundary condition, suggesting that the benefits of multimedia and AI-supported instructional modalities may depend on contextual factors such as task complexity, cognitive demands, and learner characteristics.
In relatively simple and well-defined task environments, where cognitive load is limited and procedural steps are straightforward, different instructional formats may be equally effective. In such conditions, the added value of multimodal or AI-enhanced presentation may be reduced. This perspective aligns with the view that the effectiveness of digital learning is conditional rather than universal. For organizations, this implies that the selection of learning modalities should be aligned with the specific characteristics of the task and the learning objectives, rather than based on general assumptions about the superiority of certain technologies.
6. Conclusions
This study examined the effectiveness of different digital learning modalities in a task-oriented learning context relevant to organizational environments. The results showed that instructional modality did not produce statistically significant differences in knowledge acquisition, task performance, or retention. Across all three conditions—static materials, video-based instruction with human narration, and video-based instruction with an AI-generated avatar—no statistically significant differences were detected across the objective performance indicators. Subjective evaluations of the learning experience also did not differ significantly between formats.
From a management perspective, these findings have important implications for the design and implementation of digital learning strategies. In particular, the results challenge the widespread assumption that more technologically advanced and resource-intensive learning formats inherently lead to superior performance outcomes. They suggest instead that technological sophistication alone is not a reliable predictor of learning effectiveness.
A key implication concerns the cost-effectiveness of organizational investments in digital learning. As advanced learning formats typically require greater financial, technical, and production resources, their adoption should be carefully evaluated in relation to expected performance gains. In relatively simple and well-structured task environments, the findings of this study indicate that less complex and more cost-efficient instructional approaches may achieve comparable outcomes. Organizations may therefore be able to optimize learning investments in comparable introductory procedural learning contexts by prioritizing instructional clarity and alignment with task requirements over technological novelty.
The study also identified a divergence between subjective perceptions and objective performance outcomes. Although participants tended to perceive video-based formats as more engaging and effective, these perceptions were not associated with improved performance. This perception–performance gap highlights a potential risk in organizational decision-making, where learning formats may be selected based on user preference or perceived modernity rather than evidence of effectiveness. Accordingly, organizations should complement subjective evaluation metrics with objective performance indicators when assessing learning outcomes.
The findings related to AI-generated instructional agents provide further insight into the adoption of emerging technologies in organizational learning. While AI-based formats are often positioned as innovative and scalable solutions, the results suggest that their contribution to performance may be limited in certain contexts. Moreover, qualitative feedback suggests that design-related factors, such as perceived naturalness and potential distraction, may influence user experience without necessarily translating into measurable performance benefits. These findings underscore the importance of critically evaluating AI-based learning solutions beyond their technological appeal.
Overall, the study findings did not support the expected advantages of video-based and AI-supported instructional modalities (H1–H3). H4 was supported, indicating no significant association between subjective evaluations and objective performance outcomes, while H5 received only partial qualitative support.
Despite its contributions, this study has several limitations. First, the relatively small sample size and unequal group sizes may have limited the statistical power to detect small-to-moderate effects between instructional modalities. Therefore, the non-significant findings should not be interpreted as definitive evidence of equivalence between the conditions. Rather, they indicate that no statistically reliable evidence of superiority of one instructional modality over another was detected within the present sample. Future studies with larger samples should consider equivalence testing approaches, such as two one-sided tests (TOST), or Bayesian analyses, in order to evaluate more directly whether instructional formats can be considered statistically equivalent. Additionally, unequal attrition across experimental conditions, particularly in the AI-avatar group, should be considered when interpreting qualitative observations related to this modality.
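As an illustration of the equivalence testing mentioned above, the following is a minimal sketch of the TOST procedure for two independent groups. It is not the analysis used in this study. The function name `tost_equivalence`, the equivalence margin, and the example scores are hypothetical, and the sketch relies on a large-sample normal approximation; with small samples such as the present one, a t-distribution with appropriate degrees of freedom would be the more suitable reference distribution.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def tost_equivalence(group_a, group_b, margin, alpha=0.05):
    """Two one-sided tests (TOST) for equivalence of two independent means.

    Assumption: large-sample normal approximation (z-based), unequal
    variances allowed. `margin` is the smallest difference in means that
    would be considered practically meaningful (chosen by the researcher).
    """
    diff = mean(group_a) - mean(group_b)
    # Standard error of the difference, using unpooled sample variances
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    z = NormalDist()

    # Test 1: H0 is diff <= -margin; a small p-value rejects it
    p_lower = 1 - z.cdf((diff + margin) / se)
    # Test 2: H0 is diff >= +margin; a small p-value rejects it
    p_upper = z.cdf((diff - margin) / se)

    # Equivalence is claimed only if BOTH one-sided tests reject,
    # so the overall p-value is the larger of the two
    p_value = max(p_lower, p_upper)
    return diff, p_value, p_value < alpha


# Hypothetical post-test scores for two instructional conditions
static_scores = [78, 82, 75, 80, 79, 81, 77, 83]
video_scores = [79, 80, 76, 81, 78, 82, 77, 80]

diff, p_value, equivalent = tost_equivalence(static_scores, video_scores, margin=5)
print(f"difference = {diff:.2f}, TOST p = {p_value:.4f}, equivalent = {equivalent}")
```

The outcome depends directly on the chosen margin: the same data may support equivalence under a wide margin and fail to do so under a narrow one, so the margin should be justified in advance on substantive grounds.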
Second, the study was conducted in a controlled academic setting with a student sample, which limits the direct generalizability of the findings to real workplace populations and professional training environments. Although the study was positioned within an organizational learning perspective, the participants represented novice users engaged in introductory procedural learning tasks rather than actual organizational employees. Consequently, managerial implications should be interpreted cautiously and primarily as indicative rather than directly transferable to organizational settings.
An additional limitation concerns the relatively limited dispersion observed in some performance measures. Although the obtained scores did not indicate a strict ceiling effect, post-test and task performance results were concentrated within a relatively narrow range. This may have reduced the sensitivity of the measures to detect subtle differences between instructional conditions, particularly given the relatively simple procedural nature of the task and the novice level of participants.
Furthermore, specific design characteristics of the AI-generated avatar, including voice naturalness, visual appearance, and lip-synchronization quality, may have influenced participants’ perceptions and should be examined more systematically in future research.
Future studies should therefore examine more complex and cognitively demanding tasks, diverse employee populations, and real organizational settings in order to provide deeper insight into the conditions under which different instructional modalities may yield meaningful advantages for organizational learning and digital training.
The study contributes to a more nuanced understanding of digital learning effectiveness by demonstrating that the benefits of advanced instructional technologies are conditional rather than universal. For organizations, this implies that effective learning strategies should be guided by task characteristics, instructional design principles, and evidence-based evaluation, rather than assumptions about the inherent superiority of specific technologies.