1. Introduction
Narratives play a central role in serious game design because they can transform trivial tasks into enjoyable activities. By embedding players' actions in a narrative, designers lead players to perceive those actions as part of a larger unfolding story rather than as a bare set of mechanics [1,2]. This storytelling technique creates emotional investment and motivation, keeping players engaged for longer. Even simple narrative elements, such as those in Space Invaders, can turn basic gameplay into a series of emotional moments, while the more complex stories of role-playing games such as The Elder Scrolls can draw players into a world of remarkable depth and scope [1].
Studies have identified several core principles of successful narrative design: decentralized narratives that allow alternate engagement routes [3,4]; seamless integration, in which tasks and narrative are tightly interwoven [5,6,7,8]; relatable characters that enhance identification and empathy [9,10,11,12]; and dynamic narratives that respond to player choices and environments [3,7,13,14,15]. All of these factors contribute to participation, motivation, and better learning outcomes.
Although previous research has shown that storytelling can enhance engagement and learning in educational and serious games [16,17], its role in Games-with-a-Purpose (GWAPs), especially those focused on linguistic annotation and on differences in text style or content, is not well studied. GWAPs typically aim to gather descriptive language data across various text types to support computational language modeling. In our research, we focus on narrative texts to determine how their connection to the game world influences the player experience. It is not yet clear whether the format of a narrative text, i.e., whether it is scene-based (associated with the game world) or non-scene-based (unrelated to the game world), affects player engagement and cognitive load. To fill this gap, our study explores how narrative alignment, operationalized through GPT-generated texts incorporated into two game versions (one scene-based, one non-scene-based), impacts player experience in a coreference annotation GWAP, using both quantitative and qualitative methods.
4. Methods
4.1. Experimental Framework
This study investigates the impact of aligning narrative texts with the game environment on user engagement and cognitive load in a gamified annotation task. To collect Arabic coreference annotations, a three-dimensional virtual world game was created, set in a desert-cave environment that draws on the aesthetics of a Middle Eastern old town. Participants responded to thematic matching questions embedded in short narrative passages, allowing us to evaluate the extent to which thematic alignment shapes the experience. The visual space, comprising a desert setting, an old marketplace, and a cave, was kept identical across all versions of the game to control for everything except story matching. With thematic relevance of the narrative text as the only difference between versions, any difference in engagement or cognitive load can be attributed to the text condition, creating a controlled framework for probing the effects of narrative-driven gamification.
Participants received standardized instructions defining the annotation task, e.g., finding words or phrases in a text that describe the same entity (for instance, connecting the ancient city and Al-Zahra). The game interface also featured tutorials familiarizing players with the navigation and annotation mechanics, so that the interface was accessible to both gamers and non-gamers. This methodological setup enabled us to focus on the effect of narrative presentation on player engagement and cognitive effort in a GWAP setting.
4.2. Experimental Design
The research used a between-subjects design, with participants randomly assigned to one of two versions of the game:
Scene-based condition: The annotation tasks were embedded in narrative passages that reflected the desert-cave setting, including tales of a trip across a sandy desert wasteland or a visit to a secret cave.
Non-scene-based condition: The annotation tasks were embedded in narratives with no connection to the environment, such as a story set in a forest or a medieval tower, presented in the same desert-cave setting.
The independent variable was narrative alignment (scene-based versus non-scene-based), and the dependent variables were user engagement (self-reported focus, perceived reward, and other measures) and cognitive load (measured by subjective ratings and task performance indicators). To enable a strong comparison, an A/B test method was employed, which supports causal inference because all other aspects of the game, including visuals, interface, and task difficulty, remained unchanged [35]. The A/B framework is widely recognized as providing statistically valid estimates of the impact of a single design factor on user experience [35]. Participants were randomly assigned to conditions, and each group received the same number of annotation tasks so that comparable datasets were obtained.
4.3. Stroll with a Scroll: A 3D Virtual World Game
Stroll with a Scroll is a virtual world game created specifically to support Arabic NLP work; it uses a treasure-hunt theme to involve players in coreference annotation. The task is to find words or phrases that refer to the same entity (such as the relationship between the city of Cairo and the capital) in narrative texts. The game uses a third-person camera, and players steer an on-screen avatar (arrow keys to move, shift key to change speed) through a three-dimensional environment comprising a fictional ancient town, a bustling bazaar full of colorful stalls, and a desert cave (see Figure 2). Avatars wearing culturally inspired Middle Eastern attire help increase immersion. A red, yellow, and green navigation system signals proximity to concealed chests, guiding players through the world.
Upon reaching a chest, players opened an in-game scroll UI that displayed a narrative passage with an embedded annotation task (Arabic excerpt) and the labeling interface. Each play session presented two annotation passages in sequence, with all passages generated and curated as detailed in Section 4.4. The first passage appeared on a scroll in the ancient town. After completing it, participants followed the color-coded navigation system through the bazaar to the cave, where the second passage was presented on a new scroll. These passages appear in two forms: scene-based texts, which reference the in-game environment (e.g., describing a desert storm or cave exploration), and non-scene-based texts, which present the same annotation tasks without ties to the visual setting (e.g., describing a forest journey).
The visual environments and interaction loop (navigate → open chest → read scroll → annotate) remained identical across conditions. Players complete the assigned task by identifying coreferential expressions, followed by further exploratory actions. The design combines a treasure-hunt mechanic with linguistic annotation, using visually salient stimuli, in this case illuminated chests, to hold players' interest and increase their motivation.
4.4. LLM-Driven Story Generation and Preprocessing
A large language model (LLM) provided through the OpenAI API, namely GPT-4o (client library version 1.26.0, released and accessed in May 2024), was used to generate the narrative texts because of its strong ability to produce coherent and contextually appropriate Arabic. A Python 3.12.3 script automated the generation process, using a single prompt template that specified the model's role and task together with a detailed input structure. The prompt inputs were: story theme or title, text length, narration perspective, and writing style. These inputs were identical for every generated text: between 400 and 800 words, second-person perspective, a short descriptive style suited to a video game, and a fixed structure (exploration → obstacles → travel journey → reflective ending). The two prompt types differed only in thematic alignment. The first type produced stories in Modern Standard Arabic set in the imaginary city of Al-Zahra (The Radiant City); these narratives portrayed a second-person journey through a desert to a prehistoric cave with two exits and ended with a trip back to the town, and the prompt explicitly required thematic alignment with the desert-cave environment. The second type produced contrasting stories that intentionally departed from the game environment, for example a sea voyage to the Emerald Forest, with rivers, lush foliage, and towers of stone, sharing no thematic overlap with the desert-cave setting.
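The automated generation step can be sketched as follows. The template wording, function names, and defaults are illustrative assumptions rather than the study's exact script; only the stated inputs (theme, length, perspective, style, structure) and the sampling parameters reported below are taken from the text.

```python
# Sketch of the automated story-generation step (hypothetical prompt wording;
# the study's exact template is not reproduced here).

PROMPT_TEMPLATE = (
    "You are a narrative writer for a video game.\n"
    "Write a story in Modern Standard Arabic.\n"
    "Theme/title: {theme}\n"
    "Length: {min_words}-{max_words} words\n"
    "Perspective: {perspective}\n"
    "Style: {style}\n"
    "Structure: exploration -> obstacles -> travel journey -> reflective ending"
)

def build_prompt(theme: str, min_words: int = 400, max_words: int = 800,
                 perspective: str = "second person",
                 style: str = "short, descriptive, video-game-like") -> str:
    """Fill the shared template with the per-story inputs."""
    return PROMPT_TEMPLATE.format(theme=theme, min_words=min_words,
                                  max_words=max_words,
                                  perspective=perspective, style=style)

def generate_story(client, theme: str) -> str:
    """Call GPT-4o with controlled sampling parameters (requires an API key)."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": build_prompt(theme)}],
        temperature=0.7,   # balance creativity and coherence
        top_p=0.9,         # lexical diversity without excess randomness
    )
    return response.choices[0].message.content
```

Keeping the template fixed and varying only the theme input mirrors the controlled design: the two prompt types share everything except thematic content.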
The generated XML file contains several layers: to create <baseLayer>, we tokenized the text; for <markableLayer>, we made an additional API call with a prompt asking the model to extract all nouns and places; and for <anaphoraLayer>, we asked the model to extract all coreferent mentions linked to the mentions in the markable layer.
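A minimal sketch of such a layered file might look like the following; the element and attribute names are hypothetical, since the study's exact schema is not shown.

```xml
<!-- Hypothetical sketch of the layered XML structure; element and attribute
     names are illustrative, not the study's exact schema. -->
<document>
  <baseLayer>
    <token id="t1">المدينة</token>
    <token id="t2">القديمة</token>
  </baseLayer>
  <markableLayer>
    <markable id="m1" span="t1..t2" type="place"/>
  </markableLayer>
  <anaphoraLayer>
    <link anaphor="m2" antecedent="m1" relation="coref"/>
  </anaphoraLayer>
</document>
```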
The parameters of both prompt types were held constant: the temperature was set to 0.7 to balance creativity and coherence, and the top-p value was set to 0.9 to maintain lexical diversity without excessive randomness. The generated texts, presented in Figure 3, were stored in XML format containing both the raw narrative text and additional layers, including word segmentation, named entity recognition (NER), and coreference links. Both narratives underwent a thorough manual review against four criteria: (1) thematic consistency of the aligned text with the desert-cave environment; (2) narrative consistency across sentences; (3) linguistic clarity in Arabic; and (4) suitability for accurate mention extraction and coreference annotation. Texts that met all four requirements were then manually checked by the main author and truncated to ensure similarity on measurable attributes such as word count and sentence length, as presented in Table 1. Truncation was performed by removing descriptive passages until both narratives were equal in length while preserving their cohesion and meaning.
To obtain high-quality annotations, several pretrained Arabic Natural Language Processing (NLP) models were tested, including Hugging Face models [36], Gemini, Stanford CoreNLP-Arabic, asafaya/bert-base-arabic, and hatmimoha/arabic-ner, with results obtained through the Hugging Face Inference API. These models performed tokenization, named entity recognition, and coreference resolution on the Arabic texts. However, their outputs were less accurate and less contextually consistent than GPT-4o's when identifying entity mentions in complex stories. GPT-4o was therefore chosen as the main model for both story generation and annotation processing. A Python script was written to automate the pipeline, converting GPT-4o outputs into the XML format and streamlining the subsequent components, tokenization, named entity extraction, and coreference resolution on the Arabic text, together with markable layer generation and anaphora detection. This automation provided uniformity and scalability in preparing the narrative texts for the game.
4.5. Ethical Approval
This study received ethical approval from the Research Ethics Committee at Queen Mary University of London under approval reference number QMERC20.565.DSEECS23.010. All participant data were anonymized to ensure confidentiality; participants were informed about the purpose of the study, and informed consent was obtained from all individuals prior to participation.
5. Experiment 1: A Quantitative Study on Scene-Based vs. Non-Scene-Based Texts and Their Relationship to Engagement and Cognitive Load
In this section, we present our quantitative study comparing scene-based and non-scene-based texts. This study examines the effects of these text types within an in-game environment, focusing on their impact on player experience and engagement. The following sections detail the experimental design, materials, and procedures used to evaluate these text types.
5.1. Participants
A power analysis was performed using G*Power to calculate the sample size required for a t-test with α = 0.05, power of 80%, and a large effect size [33]; a sample of 52 was required to achieve this statistical power.
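As a rough check on this figure, a normal-approximation sketch of the same calculation (our own simplification; G*Power uses the exact noncentral-t computation, which yields 26 per group and hence 52 in total) is:

```python
import math

# Standard normal quantiles (hard-coded to avoid external dependencies)
Z_ALPHA_2 = 1.959964  # two-sided alpha = 0.05
Z_BETA = 0.841621     # power = 0.80

def n_per_group(d: float) -> int:
    """Approximate per-group n for a two-sided independent-samples t-test."""
    return math.ceil(2 * ((Z_ALPHA_2 + Z_BETA) / d) ** 2)

# Cohen's "large" effect size, d = 0.8: this approximation gives 25 per
# group; G*Power's exact noncentral-t answer is 26 per group (52 total).
```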
The final sample size was 80 participants, with 53.8% females and 46.3% males, with the majority in the 25–34-year age group (41.3%) and then 28.7% in the 35–44 age group, 17.5% in the 18–24 age group, 10% in the 45–54 age group, and only 2.5% aged 55 years or above.
5.2. Materials and Measures
The User Engagement Scale (UES) was selected to measure engagement [37], as it is one of the most widely validated instruments for assessing engagement with digital systems. We used the short form (UES-SF) [37], which retains the validated factor structure of the original while reducing participant burden through a shorter questionnaire. Each subscale comprises three items rated on a five-point Likert scale. The UES-SF measures engagement along four major dimensions: Focused Attention (FA), an indicator of immersion and concentration; Perceived Usability (PU), an indicator of ease of use and frustration levels; Aesthetic Appeal (AE), an indicator of visual and sensory attractiveness; and Reward (RW), an indicator of perceived benefit or satisfaction with the experience. For hypothesis testing, a single summative Engagement score was calculated by averaging all UES-SF items across these subscales; this summative score served as the primary metric for evaluating H1. Individual subscale analyses were conducted for supplementary insight only; they were considered exploratory, were not treated as independent hypothesis tests, and were not adjusted for multiple comparisons.
Internal consistency was good for every subscale: FA (ω = 0.82), PU (ω = 0.86), AE (ω = 0.84), and RW (ω = 0.81), and the scale has demonstrated construct and face validity [37]. Each subscale is scored as the mean of its items, and the overall UES score is the mean of the subscale scores.
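The scoring scheme just described can be sketched as follows; the item values are hypothetical.

```python
# Sketch of UES-SF scoring: each subscale is the mean of its three items,
# and the summative Engagement score (used for H1) is the mean of the four
# subscale scores. Item values below are hypothetical.
from statistics import mean

responses = {               # 5-point Likert items per subscale
    "FA": [4, 3, 4],        # Focused Attention
    "PU": [5, 4, 4],        # Perceived Usability
    "AE": [3, 3, 4],        # Aesthetic Appeal
    "RW": [4, 4, 5],        # Reward
}

subscale_scores = {k: mean(v) for k, v in responses.items()}
engagement = mean(subscale_scores.values())
```

Because every subscale has the same number of items, the mean of the subscale means equals the mean over all twelve items, so both descriptions of the summative score coincide.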
Subjective cognitive load was measured with NASA-TLX [38,39], which assesses six dimensions: mental demand (MD), physical demand (PD), temporal demand (TD), performance (P), effort (E), and frustration (F). Each dimension is scored from 0 to 100 in increments of 5, indicating the physical and mental load the participant may have experienced [38]. These dimensions cover the different sources of cognitive workload across a wide range of tasks and have been validated repeatedly [39,40,41]. For hypothesis testing (H2), a single overall workload score was calculated by averaging the scores across all six dimensions; this summative score served as the primary measure of subjective cognitive load. Individual subscale results were examined for descriptive context only; they were considered exploratory, were not treated as separate hypothesis tests, and were not adjusted for multiple comparisons.
This study used the unweighted version of NASA-TLX. The decision was informed by a previous study [42] reporting a high correlation (r = 0.94) between weighted and unweighted TLX scores, with no significant difference between the two procedures. These results are consistent with earlier reports, including Hill et al. [43], who found that weighted scoring is unnecessary in certain circumstances. Moreover, Moroney et al. [42] showed that a 15 min delay in the provision of ratings produces results similar to those obtained in previous studies.
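Raw (unweighted) TLX scoring as used here reduces to a simple mean of the six ratings; a sketch with hypothetical ratings:

```python
# Sketch of unweighted (raw) NASA-TLX scoring: each of the six dimensions is
# rated 0-100 in steps of 5, and the overall workload score (used for H2) is
# their unweighted mean. Ratings below are hypothetical.
from statistics import mean

tlx = {"MD": 40, "PD": 10, "TD": 25, "P": 20, "E": 35, "F": 15}

# Validate the rating scale before aggregating
assert all(0 <= v <= 100 and v % 5 == 0 for v in tlx.values())

overall_workload = mean(tlx.values())   # raw TLX: all dimensions weighted equally
```

The weighted variant would instead multiply each rating by a pairwise-comparison weight; as noted above, the two procedures correlate at r = 0.94, which motivates the simpler raw form.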
This study relied on participants' subjective responses; no objective measures were used. Future studies might therefore examine, for example, time on task or annotation accuracy against gold-standard data alongside subjective measures to aid interpretation.
5.3. Procedure
Participants were recruited through Prolific; the screening criteria required Arabic as the first language and fluency in English. Before the study began, participants read and signed a consent form confirming their understanding of the study's purpose and the voluntary nature of their participation. Unique identifiers in the data tracking ensured anonymity. Participants were paid at the UK minimum wage and could withdraw at any time.
The survey collected demographic data to provide background for the analysis. User engagement and cognitive load were assessed with standardized measures: the User Engagement Scale (Short Form) and NASA-TLX. The procedure followed ethical guidelines and provided a holistic view of participants' experiences.
5.4. Data Analysis and Design
IBM SPSS Statistics (version 29) was used to analyze the data. Descriptive statistics, the mean, mode, median, and range, were calculated for the overall sample and for each participant group. Normality was checked with the Shapiro–Wilk test, and, where required by specific tests, Levene's test was used to assess homogeneity of variances. To compare engagement and cognitive load between the scene-based and non-scene-based groups, the independent t-test was used for normally distributed data. For variables where equal variances could not be assumed, particularly non-normally distributed data, the Mann–Whitney rank sum test was used. For all tests, the significance level was set at 0.05; any p-value below this threshold was considered significant. This combination of tests ensured comprehensive coverage and reliable conclusions.
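The same decision logic can be sketched in Python with scipy (the study itself used SPSS 29); the data below are synthetic and the function name is our own.

```python
# Sketch of the normality-then-test decision logic; synthetic data only.
import random
from scipy import stats

random.seed(0)
scene = [random.gauss(3.4, 0.4) for _ in range(40)]
non_scene = [random.gauss(3.3, 0.4) for _ in range(40)]

def compare_groups(a, b, alpha=0.05):
    """Shapiro-Wilk for normality, then independent t-test or Mann-Whitney."""
    normal = (stats.shapiro(a).pvalue > alpha and
              stats.shapiro(b).pvalue > alpha)
    if normal:
        _ = stats.levene(a, b)              # check variance homogeneity
        result = stats.ttest_ind(a, b)      # independent-samples t-test
        test_name = "t-test"
    else:
        result = stats.mannwhitneyu(a, b)   # rank-based alternative
        test_name = "Mann-Whitney"
    return test_name, result.pvalue

test_name, p = compare_groups(scene, non_scene)
```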
5.5. Results
The collected dataset was first preprocessed to enhance reliability. No data-entry problems were found; all records were complete, with no missing values. Boxplots of each variable were visually examined for outliers, and none were found, validating the dataset for the statistical analysis that followed the exploratory phase. These measures supported the methodological and statistical soundness of the results and guarded against biased conclusions.
Table 2 provides the descriptive statistics of all variables. The initial four variables are related to the User Engagement Scale, which involves different aspects of user engagement, and the six additional variables are associated with the NASA-TLX dimensions that measure cognitive load in a variety of task factors. This table provides a summary of the most important measures discussed in the research.
5.6. Scene-Based Analysis
This section presents the scene-based analysis results and the relationship between thematic alignment and both user engagement and cognitive load.
Table 3 presents descriptive statistics for all study variables, separated into the scene-based and non-scene-based groups for clear comparison. The Shapiro–Wilk test was used to check normality: FA (p = 0.653), AE (p = 0.206), and Mental Demand (p = 0.154) are normally distributed and satisfy the equality-of-variances assumption, whereas all other variables are skewed (p < 0.05).
In Table 4, independent t-tests compared the engagement subscales Focused Attention and Aesthetic Appeal, as well as the cognitive load dimension Mental Demand, between the scene-based and non-scene-based groups. These comparisons were treated as exploratory, were not used to directly test the hypotheses, and were therefore not adjusted for multiple comparisons. The Focused Attention score was significantly higher in the scene-based group (M = 3.44, SD = 0.78) than in the non-scene-based group (M = 3.03, SD = 0.63), t(78) = 2.58, p = 0.012, Cohen's d = 0.58. Neither Aesthetic Appeal nor Mental Demand differed significantly between the groups, indicating that thematic alignment did not affect these variables.
Mann–Whitney tests for the skewed variables are shown in Table 5; these too were treated as exploratory, were not used to directly test the hypotheses, and were not adjusted for multiple comparisons. They showed no significant differences between the scene-based and non-scene-based groups, except for the Reward variable, where thematic alignment corresponded to significantly higher perceived satisfaction: the RW score was higher in the scene-based group (Mdn = 3.33, IQR = 1.33) than in the non-scene-based group (Mdn = 2.5, IQR = 1.33), U = 1011.5, p = 0.040, with effect size r = 0.26.
An independent-samples t-test was conducted (Table 6) to compare engagement and cognitive load between the scene-based and non-scene-based conditions and to test our two hypotheses. For engagement, participants in the scene-based condition (M = 3.47, SD = 0.37) reported significantly higher scores than those in the non-scene-based condition (M = 3.26, SD = 0.41), t(78) = 2.43, p = 0.017, 95% CI [0.039, 0.386], indicating a moderate effect (Cohen's d = 0.54). This suggests that scene-based content, thematically aligned with the game, enhanced users' engagement during the task. In contrast, for cognitive load, the difference between the scene-based (M = 28.69, SD = 6.95) and non-scene-based (M = 31.63, SD = 7.38) conditions was not statistically significant, t(78) = −1.83, p = 0.071, 95% CI [−6.13, 0.25]. The effect size, Cohen's d = −0.41, indicates a small-to-moderate effect, with the scene-based group reporting lower load than the non-aligned group.
While there was a trend toward lower cognitive load in the scene-based condition, this difference did not reach significance.
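The reported effect sizes can be recovered from the t statistics. Assuming the 80 participants split evenly (40 per group, consistent with df = 78), Cohen's d for an independent-samples t-test is d = t·√(1/n₁ + 1/n₂):

```python
import math

def cohens_d_from_t(t: float, n1: int, n2: int) -> float:
    """Cohen's d recovered from an independent-samples t statistic."""
    return t * math.sqrt(1 / n1 + 1 / n2)

# With n1 = n2 = 40 (assumed equal split of the 80 participants):
# Engagement:        t(78) =  2.43 -> d ≈  0.54
# Focused Attention: t(78) =  2.58 -> d ≈  0.58
# Cognitive load:    t(78) = -1.83 -> d ≈ -0.41
```

The agreement with the reported values (0.54, 0.58, −0.41) supports the assumed equal group sizes.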
5.7. Discussion
The results show that scene-based texts significantly increased engagement compared to non-scene-based texts. Participants in the scene-based condition gave higher Reward scores, reflecting greater enjoyment and intrinsic motivation, consistent with previous studies emphasizing the relevance of context to immersion [30,31,32]. The scene-based texts also yielded higher Focused Attention scores, suggesting that players were more absorbed when the narrative was congruent with the game's desert-cave setting. However, the effect sizes were small to medium, meaning that thematic alignment can improve engagement, but only modestly. Such modest gains can still help sustain attention in large-scale annotation tasks, but designers should also consider other factors affecting attention and cognitive load. In addition, the observed engagement effects are moderate in size and context-dependent, and results from short-session annotation tasks should not be overgeneralized to longer narrative serious game experiences.
Even though participants in the scene-based condition reported a lower mean cognitive load, including lower mental effort and frustration, the difference did not reach statistical significance (p = 0.071). We therefore cannot yet conclude that scene-based texts decrease cognitive load.
These findings accord with [20], which suggests that a lack of coherence or realism acts as a barrier to immersion: matching the text with the visual location likely minimized the need to re-orient, making it easier to remain engaged. Consistent with [21], participants' descriptions corresponded to the immersion indicators of Focused Attention and involvement, and this pattern is echoed in our quantitative results (higher Focused Attention and Reward in the aligned condition).
6. Experiment 2: A Qualitative Study on Scene-Based vs. Non-Scene-Based Texts and Their Relationship with Engagement and Cognitive Load
The quantitative study showed that thematic alignment contributes to the user experience. To explore this further, a qualitative follow-up study was designed to examine these relationships more closely.
6.1. Participants
Eight subjects were recruited, including four in the scene-based (Players 3, 4, 5, and 6) and four in the non-scene-based (Players 1, 2, 7, and 8) condition.
6.2. Measures and Procedure
This part outlines the open-ended questions used to capture qualitative information about engagement, cognitive load, and text-specific feedback within the gamified annotation exercise. The questions aim to capture participants' experiences of reading scene-based and non-scene-based texts, with a focus on their emotional reactions, perceived complexity, and narrative style preferences.
6.2.1. Engagement
What was your experience during the process of reading and annotating the text?
Follow up: Did you feel engaged, disengaged, or indifferent? What affected your emotions?
Was the type of text interesting to you?
Follow-up: What did you find attractive or unattractive about it (story content, style of writing, etc.)?
Would you have the motivation to play the game, using this type of text?
Follow-up: What prompts you to continue or not?
6.2.2. Cognitive Load
How difficult was it to annotate the text?
Follow-up: What text characteristics made the task easier or more challenging?
What was the extent of mental effort needed to comprehend and label the text?
Did you feel overwhelmed or tired at any time?
Follow-up: What in the text do you think evoked that feeling?
6.2.3. Text-Specific Feedback
Did the text affect how you carried out the annotation exercise?
What was your favorite or least favorite thing about the style and the content of the text?
Follow-up: What is the influence of these factors on your experience?
How might the text be shortened or edited to make it more interesting or easier to annotate?
Follow-up: Why do you think these changes would improve your experience?
6.3. Results
To explore participants’ experiences and perceptions, we conducted a thematic analysis of the think-aloud protocols and interview data, following Braun and Clarke’s reflexive thematic analysis framework [
43,
44,
45].
The data were analyzed by the author alone, without inter-coder reliability checks; while this does not invalidate the findings, it should be acknowledged as a limitation. The data were first transcribed and then coded into an affinity diagram, from which themes emerged; quotes were translated into English where necessary. Sessions were conducted face-to-face and synchronously. After sessions with two participants in each group, three recurring patterns and insights were identified. No additional themes emerged from the remaining participants, indicating thematic saturation; no new perspectives appeared in the last four interviews, which was considered sufficient for the study. This process led to three central themes that capture the key aspects of user engagement and task interaction within the game. These themes are presented below, each supported by illustrative quotes from the participants.
6.3.1. Theme 1: Contextual Relevance Enhances Engagement
Respondents reported greater engagement when the text matched the gaming environment. Participants in the scene-based condition were highly interested; one said, “I lived the story by going to the locations where it was set and behaving like the protagonist of the narrative, in which case I traveled through the town to the treasure location in the cave”. Similarly, Player 5 liked the combination of text and environment, remarking, “You play the story in a very rich environment… I wasn’t bored, the story was interesting.” Such reactions indicate that scene-congruent narratives may contribute to a more engaging, game-like reading process. Clarity of language and vivid description also supported the immersive experience; in the words of Player 3, “The language was simple, and the description of the place made it intriguing… it’s thrilling to read similar stories to know how they would end.”
By contrast, participants in the non-scene-based group often reported disengagement, especially when the subject matter seemed irrelevant to the game world. Player 1 mentioned, “I was bored… It took me time to understand it.” Player 2 had a similar opinion: “I liked visiting places in the game, but the content wasn’t engaging,” expressing clear dissatisfaction with the text content. Player 7, who read a non-scene-based text, was interested at first but noted, “I was interested at the start, but then I got bored.” This decrease in engagement seems to stem not only from the lack of thematic relevance but also from the design of the experience. The game presented the story in two different places, and to continue reading, participants had to physically move to the second location. Such a break in the flow of the story, combined with a narrative irrelevant to the game setting, could have disrupted attention and recall of earlier information; the second part of the story then felt disconnected and less interesting, so the player may have lost emotional involvement and found it harder to become immersed, leading to boredom. In addition, Player 8 felt that the disconnection between text and context restricted narrative engagement.
6.3.2. Theme 2: Scene Integration Eases Cognitive Processing
Scene-based texts were consistently described as less mentally demanding. Participants found the reading and labelling activities smooth and intuitive. Player 3 stated, “It was easy… the story was exciting, and I was focused,” emphasizing how the narrative pulled attention in and made comprehension almost effortless. Player 4 expressed similar sentiments, saying, “The answers were very clear… I didn’t feel overwhelmed because the text was very clear.” Player 5 reported minimal cognitive strain, saying, “Almost no effort, it was very easy,” and even noted that the coherence of the text helped him stay on task: “The text is very connected and to the point, so I didn’t get distracted.”
Unlike the scene-based texts, non-scene-based texts tended to interfere with attentional focus and create friction. Player 1 described the effort involved as moderate and reported getting distracted while reading: “I am easily distracted when reading. I was merely responding and scanning it rather than reading it comprehensively.” Although Player 2 described the text as easy to understand, he noted a lack of motivation: “You work hard when you like the thing… I liked the game; however, the text was distracting”. Fatigue also emerged for some participants: Player 2 admitted feeling “mentally tired at the end,” and Player 7 recalled “feeling overwhelmed at the end of the story.” While the texts may have been structurally simple, this suggests that the lack of thematic connection may have contributed to reduced focus.
6.3.3. Theme 3: Narrative Coherence Shapes Labelling Approach
Participants’ approaches to the labelling tasks differed according to how clearly the text conveyed its meaning and how well it fit the game environment. Respondents in the scene-based condition felt that the stories supported their labelling decisions by making the content smoother to read. Player 3 reported that the story was exciting and kept them concentrated, which points to high engagement predisposing them to be attentive during annotation. Likewise, Player 5 explained that the text was “very connected and to the point,” so they did not get distracted, suggesting that combining game context with narrative clarity helps minimize the mental friction associated with switching between reading and annotation. Player 4 likewise asserted that the task was very clear, making the labelling almost automatic.
Conversely, participants who dealt with non-scene-based texts tended to become less attentive. Player 1 confessed to skimming and answering without reading the text through, indicating a lack of comprehension and interest. Though Player 8 found the language simple, they also noted that the game interrupted the reading process by requiring them to move to a different in-game location in order to resume the narrative. Disengagement during the annotation process was often associated with the absence of thematic coherence and a sense of narrative continuity in the non-scene-based texts.
6.4. Discussion
The qualitative findings suggest that when the story matched the scene (scene-based), players may have felt more absorbed and focused; when it did not (non-scene-based), some reported boredom and frustration. We treat these as impressions rather than proof of mechanism and consider possible alternative explanations (for example, topic interest, fatigue, or breaks in narrative flow between locations).
7. Discussion
In this section, we discuss the findings and address our research hypotheses based on the results obtained.
H1: Scene-based text will significantly enhance user engagement compared to non-scene-based text by creating immersive and contextually relevant narratives.
To test this hypothesis, an overall Engagement score was calculated by summing all twelve User Engagement Scale—Short Form (UES-SF) item scores—spanning Focused Attention, Perceived Usability, Aesthetic Appeal, and Reward Factor—and dividing by twelve. The subscales are treated as exploratory and are not used to directly test the hypotheses.
Because the Engagement scores were normally distributed, an independent-samples t-test was conducted; it yielded p = 0.017. Since this p-value is below the standard significance threshold of 0.05, the null hypothesis was rejected, indicating a statistically significant difference in overall engagement between conditions.
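As an illustration, the scoring and test-selection procedure described above could be implemented as in the following sketch. The function names and the explicit Shapiro–Wilk normality check are our assumptions; the paper states only that the values were normally distributed and a t-test was used.

```python
import numpy as np
from scipy import stats


def overall_engagement(item_scores):
    """Average the 12 UES-SF item scores (Focused Attention, Perceived
    Usability, Aesthetic Appeal, Reward Factor) into one Engagement score."""
    scores = np.asarray(item_scores, dtype=float)
    assert scores.shape[-1] == 12, "UES-SF short form has 12 items"
    return scores.mean(axis=-1)


def compare_groups(scene, non_scene, alpha=0.05):
    """Use an independent-samples t-test when both groups pass a
    Shapiro-Wilk normality check; otherwise fall back to Mann-Whitney U."""
    normal = (stats.shapiro(scene).pvalue > alpha
              and stats.shapiro(non_scene).pvalue > alpha)
    if normal:
        return "t-test", stats.ttest_ind(scene, non_scene).pvalue
    return "mann-whitney", stats.mannwhitneyu(scene, non_scene).pvalue
```

For example, `overall_engagement([3] * 12)` returns `3.0`, and `compare_groups` would be called once per outcome with the two groups' per-participant scores.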
The observed significance in overall engagement was driven mainly by significant effects in several subscales of the User Engagement Scale. Specifically, significant differences were found in Focused Attention and Reward Factor, which contributed substantially to the overall engagement result. Focused Attention was significantly higher in the scene-based group (M = 3.44, SD = 0.78) than in the non-scene-based group (M = 3.03, SD = 0.63), t(78) = 2.58, p = 0.012, Cohen’s d = 0.58. The Reward Factor (RW) was also significantly higher in the scene-based group (Mdn = 3.33, IQR = 1.33) than in the non-scene-based group (Mdn = 2.5, IQR = 1.33), U = 1011.5, p = 0.040, with an effect size of 0.26.
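For transparency, the two effect sizes above can be recomputed from the reported summary statistics alone. The sketch below assumes equal groups of n = 40 each (inferred from t(78) with two groups) and uses the standard pooled-SD formula for Cohen's d and the rank-biserial correlation for the Mann-Whitney U effect size; the function names are ours.

```python
import math


def cohens_d(m1, sd1, m2, sd2):
    """Cohen's d from group means and SDs, using the pooled SD
    (equal group sizes assumed)."""
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (m1 - m2) / pooled_sd


def rank_biserial(u, n1, n2):
    """Rank-biserial correlation from a Mann-Whitney U statistic."""
    return 1 - (2 * u) / (n1 * n2)


# Focused Attention: scene-based (M=3.44, SD=0.78) vs non-scene-based (M=3.03, SD=0.63)
print(round(cohens_d(3.44, 0.78, 3.03, 0.63), 2))    # 0.58
# Reward Factor: U = 1011.5 with n = 40 per group
print(round(abs(rank_biserial(1011.5, 40, 40)), 2))  # 0.26
```

Both values match the statistics reported in the text, which supports the equal-groups reading of t(78).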
For Aesthetic Appeal, the overall outcome was not significantly different, although there was a slight leaning towards the scene-based group in the mean score of all three items (see Table 3) and in the score of each item separately. This implies that theme-based text reading could have produced this minor variation, although the visual scene remained constant across conditions. The lack of a significant difference is plausible, since aesthetic appeal relates mostly to the game’s visual and sensory elements rather than the text. Perceived Usability likewise showed no statistically significant difference between the scene-based and non-scene-based groups, indicating that usability perceptions were comparable across conditions.
Participants who were presented with scene-based texts consistently reported greater engagement, describing the stories as immersive and consistent with the game environment. Their reactions underscored the importance of environmental relatability for maintaining interest and involvement. Several participants expressed interest in the plot and, consequently, a desire to engage further with the story, which encouraged them to read more and to spend more time reading. These results indicate that narrative coherence and topical relevance are important factors in user experience and interaction. In contrast, participants who received non-scene-based texts reported disengagement and dissatisfaction. Many saw the content as irrelevant to the context of the game, which may be one reason why attention and the desire to continue decreased. Although a few admitted that they read the texts, they did not engage meaningfully or sustain their reading. Suggestions for improvement included incorporating visuals (context) or more relevant information, supporting the idea that narrative isolation may be counterproductive for the overall user experience.
H2: Scene-based text is expected to reduce cognitive load by providing thematic coherence and minimizing extraneous mental effort compared to non-scene-based text.
An overall subjective workload score was calculated by summing all NASA Task Load Index (NASA-TLX) item scores—spanning Mental Demand, Physical Demand, Temporal Demand, Performance, Effort, and Frustration—and dividing by six. Because the values were normally distributed, a t-test was conducted; it yielded p = 0.071. Since this p-value exceeds the conventional significance threshold of 0.05, we fail to reject the null hypothesis, suggesting that the results do not provide sufficient evidence for a difference between the conditions.
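This unweighted averaging of the six subscales corresponds to the common "Raw TLX" variant of the instrument; a minimal sketch (the function name is ours):

```python
import numpy as np


def overall_workload(tlx_ratings):
    """Unweighted ("Raw TLX") workload score: average the six NASA-TLX
    subscale ratings (Mental Demand, Physical Demand, Temporal Demand,
    Performance, Effort, Frustration)."""
    ratings = np.asarray(tlx_ratings, dtype=float)
    assert ratings.shape[-1] == 6, "NASA-TLX has six subscales"
    return ratings.mean(axis=-1)


# Hypothetical example: one participant's six subscale ratings.
print(overall_workload([30, 20, 40, 10, 50, 30]))  # 30.0
```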
The Mental Demand subscale of the NASA-TLX showed no significant difference (p = 0.060). However, the mean mental load score for the scene-based group was 30.63, compared to 37.25 for the non-scene-based group, hinting that scene-based texts may reduce cognitive demands.
The Frustration subscale of the NASA-TLX yielded a p-value of 0.051, which is not statistically significant. These results are in line with PU1 of the UES-SF, the item that specifically measures frustration: the average score in the scene-based condition was 1.8, compared with 2.76 in the non-scene-based condition. PU1’s p-value of 0.077 likewise suggests that scene-based text may help create a less straining user experience, although this difference is also not statistically significant.
The NASA-TLX subscales also showed no significant differences in Physical Demand (p = 0.961), Temporal Demand (p = 0.625), or Effort, indicating that perceived effort did not differ between the scene-based and non-scene-based text conditions. For Performance, where smaller scores imply better performance, the scene-based group had a lower mean rank (25.00) than the non-scene-based group (35.00), with p = 0.075, indicating no significant difference in perceived performance.
These results are further illuminated by the qualitative responses. Participants who were shown the scene-based texts found the task easy and straightforward, and many stated that they remained focused because of the relevance and clarity of the content. Fatigue and confusion were rarely reported. For example, one respondent stated that they were not distracted, and another felt motivated to carry on.
Conversely, participants who read non-scene-based texts reported instances of boredom and lapses in concentration, such that they had to exert more mental effort and repeatedly refocus. Some reported an energy decline or loss of concentration, particularly when the texts did not seem related to the game. One participant mentioned that having to move to new locations interrupted their reading of the unrelated text. Others expressed no interest in putting effort in and said that the material was not interesting enough to hold their focus.
Although overall cognitive load did not differ significantly between conditions, certain NASA-TLX subscales, in particular Mental Demand and Frustration, showed noteworthy trends in mean values. The scene-based group scored lower on these subscales, which may imply that thematic coherence and contextual integration help reduce mental effort. The qualitative feedback strongly supports these trends, as participants in many instances reported that scene-based tasks were easier and kept them more focused. Even though the overall difference in cognitive load was not significant, these subscale patterns may shed light on ways in which scene-based content can alleviate certain aspects of mental load.
8. Conclusions
To test how different narrative alignments influenced players’ engagement and cognitive experience in a game environment, a customized 3D gaming environment was created with distinct zones and two narrative conditions embedded in the desert-cave world: a scene-based narrative aligned with the environment and a non-aligned narrative. We found that the scene-based narrative significantly improved user engagement, especially Focused Attention and Reward, implying that players felt more immersed and motivated when narratives corresponded to the visual setting. These statistically significant but small-to-medium effects suggest that thematic alignment provides a meaningful, though not transformative, engagement benefit that can help sustain attention during large-scale annotation tasks. This implies that narrative compatibility is one of a number of factors affecting engagement.
Although there were no statistically significant differences in overall cognitive load between the scene- and non-scene-based conditions, trends in Mental Demand and Frustration favored the scene-based texts. This was echoed by the qualitative responses, in which participants described the scene-based narratives as more engaging and easier to follow and felt that they created a sense of participation in the story. By contrast, non-scene-based texts were perceived as fragmented and less interesting, which decreased engagement. These findings highlight the importance of a follow-up study, as our short experimental session and limited stimuli may have reduced our ability to detect significant effects. Future work could have participants spend longer on the task, since extended time on task is essential for complex linguistic processes such as coreference resolution, and could test more clearly unrelated text types, e.g., news articles or research documents. In addition, variables such as prior gaming experience and familiarity with annotation were not investigated, so future research should examine the effect of such factors on engagement and cognitive workload. Such studies may also refine how narratives are used to gamify linguistic annotation activities.