4. Results
The serious game prototype evaluated in this study was developed using Unity (version 2022.3.3f1) and incorporates key learning mechanisms to support literacy development. Scaffolding is achieved through the progressive sequencing of tasks across the three themed worlds, while spaced repetition and active recall are embedded through the structured revisits of previously learned graphemes, syllables, and words—both within the Reading Club and through retrieval-based mini-games. The game was designed for low- to mid-range devices to ensure accessibility and feasibility across evaluation contexts. Minimum hardware specifications included the following: Windows 10 operating system, Intel Core i3 processor (or equivalent), 4 GB RAM, integrated GPU (e.g., Intel HD Graphics 4000), and 1 GB of available storage.
For testing purposes, all expert reviewers accessed the downloadable build on desktop computers, while the hospital-based pilot was implemented on a 2-in-1 convertible device (tablet-laptop hybrid) operating in tablet mode to simulate the touchscreen tablets intended for clinical use in hospital classrooms.
Following the technical deployment of the prototype, all qualitative data analyses were conducted using Atlas.ti software (Atlas.ti v.24, ATLAS.ti Scientific Software Development GmbH, Berlin, Germany). Qualitative findings indicated that hospital-based teachers anticipated strong emotional engagement and perceived the serious game as a potential motivator for children to explore and participate actively. According to both the focus group with six hospital-based teachers and the individual interviews conducted with 30 design experts, the game was considered to be well-aligned with the intended learning objectives and appropriate for the hospital context. The expert sample included five specialists from each of six key fields involved in serious games design of this nature: educational game design; educational technology; psycholinguistics and reading processes applied to digital environments; computer engineering, programming and development; hospital education and health-related technologies; and accessibility in reading-focused games.
In the focus group discussion (N = 6), participants were invited to share their experiences with the game and assess its design, narrative, pedagogy, and challenges in relation to the intended educational outcomes. There was broad consensus that the game was well-suited to the needs of children undergoing cancer treatment. Specifically, the participants agreed that the objectives, activities and tasks, theoretical and didactic underpinnings, language, content, and presentation were all appropriate for the paediatric oncology context. From the perspective of hospital-based teachers, the alignment of the game with the clinical and emotional trajectories of paediatric cancer patients was regarded as one of its key strengths, contributing not only to literacy development but also to children’s social, emotional, and curricular growth.
Participants also highlighted the game’s visual and musical aesthetics as particularly effective in capturing children’s attention. The technological nature of the tool was seen as an advantage over traditional paper-and-pencil methods, increasing learner engagement and making the learning experience more enjoyable. Additional strengths cited included the game’s intuitiveness and ease of use, its accessibility and inclusivity, and the originality and challenge level of the embedded activities.
Nonetheless, several design and technical challenges emerged during the evaluation. Teachers reported difficulties in exiting the game and returning to the start menu and noted that the time required to access interactive segments or mini-games could be improved. These points of feedback were addressed in the subsequent iteration of the game prototype.
In addition to the focus group with hospital teachers, individual interviews were conducted with the 30 design experts previously described, representing complementary domains involved in serious game design and development, with gender representation being relatively balanced. These experts affirmed the strength of the theoretical foundation underlying the prototype, its contextual suitability for paediatric hospital settings, and its potential to foster intrinsic motivation among young patients. They also praised the integration of playful and educational elements, the clarity and simplicity of the narrative, the language, which was deemed developmentally appropriate for the target audience, and the overall usability of the interactive environment. The visual and auditory quality of the game and its feasibility for implementation in real hospitalisation conditions were also highlighted as key advantages.
Areas identified for further development included the need to enhance technical performance, diversify the range of tasks, and provide clearer, more accessible instructions to support user autonomy. These suggestions were implemented in the final version of the video game to improve functionality, educational effectiveness, and alignment with the hospital learning environment.
Concerning the QUAN investigations, 274 experts (N = 274) participated in the evaluation (Table 1), with over 50% of participants being male and middle-aged. Participants received access to the serious game prototype through a downloadable build provided via Google Drive, which they explored prior to completing the online questionnaire. All statistical analyses were conducted using SPSS software (SPSS v.27, IBM Corporation, Armonk, NY, USA).
This study also explored participants’ views of serious game design (Table 2), with the experts’ responses in the distributed instrument being highly positive and internally consistent (Cronbach’s alpha = 0.86) across all the examined constructs.
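As an illustration of how an internal-consistency coefficient such as the Cronbach’s alpha reported above is obtained, the following minimal sketch computes alpha from a respondents-by-items matrix using the standard formula. The study's analyses were run in SPSS; the function name `cronbach_alpha` and the small response matrix are hypothetical and are not study data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    items = list(zip(*responses))                      # item scores, column-wise
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical 5-point Likert responses (4 respondents x 3 items), NOT study data.
demo_responses = [
    [4, 5, 4],
    [3, 4, 3],
    [5, 5, 4],
    [2, 3, 3],
]

alpha = cronbach_alpha(demo_responses)
print(f"Cronbach's alpha = {alpha:.3f}")
```

Values closer to 1 indicate that the items vary together across respondents, which is the sense in which the questionnaire responses are described as consistent.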
Furthermore, item-level scores were analysed within each of the three evaluated dimensions.
Table 3 presents the mean values and standard deviations for the dimension titled “Quality of Design in the Adaptation of Content to Context.”
The mean rating for all items within this construct was M = 4.28 (SD = 0.396), with individual item scores ranging from 4.15 to 4.44. These results reflect a high overall assessment of the prototype’s quality in adapting content to the specific educational context. Furthermore, the experts indicated that the prototype is grounded in a robust theoretical framework and makes effective use of language and methodology that are well-aligned with the needs of the target learner population and the characteristics of the hospital-based learning environment. Additionally, the game demonstrated adequate responsiveness and featured well-balanced auditory and visual aesthetics. Its development also accounted for key factors necessary for successful implementation within the intended educational context.
The highest-rated item was the theoretical soundness and internal coherence of the prototype (Item 1; M = 4.44, SD = 0.604), followed by its contextual and temporal adaptability (Item 7; M = 4.35, SD = 0.690). The lowest score within this dimension was assigned to the prototype’s capacity to facilitate the intended learning outcomes (Item 3; M = 4.15, SD = 0.673). Nevertheless, this score still reflects a clearly positive expert perception of the didactic methodology employed.
Overall, the aforementioned results suggest a high level of agreement among respondents, as demonstrated by the relatively low standard deviations. The strongest consensus was observed for the item on theoretical coherence (Item 1), while slightly more variation in responses was found for the item addressing the prototype’s responsiveness to player actions (Item 4; M = 4.16, SD = 0.729).
Table 4 presents the mean scores and standard deviations for the dimension “Pedagogical Quality of Design.”
The mean rating for all items within the Pedagogical Quality of Design construct was M = 4.16 (SD = 0.444), with individual item scores ranging from 4.09 to 4.39. These results reflect a high perceived pedagogical quality in the design of the serious game prototype. According to expert evaluations, the learning objectives were considered appropriate, the instructional activities relevant and sufficient, and the user support mechanisms effective in facilitating learning. Additionally, the content was viewed as varied and engaging, with the potential to stimulate learners’ interest.
The highest-rated elements within this dimension were those related to the game’s educational objectives and instructional activities. Experts regarded the objectives as specific, pedagogically grounded, and relevant for the target audience (Item 11; M = 4.39, SD = 0.667). Similarly, the activities were rated as both pertinent and sufficient for achieving intended learning outcomes (Item 12; M = 4.13, SD = 0.650). Conversely, the lowest-rated aspects were the variety of content (Item 10; M = 4.09, SD = 0.605) and the prototype’s capacity to generate learner interest (Item 13; M = 4.09, SD = 0.611). Despite these lower ratings, the scores remain within a high and favourable range.
Low variability in responses across most items suggests a strong consensus among experts. Notably, there was high agreement regarding the level of training and support provided by the game (Item 14; M = 4.10, SD = 0.597). The greatest variation in expert opinion occurred with respect to the adequacy and relevance of the educational objectives (Item 11; SD = 0.667), though the mean rating remained the highest within the construct.
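The item-level figures reported for this dimension can be cross-checked against the construct mean with simple arithmetic. The sketch below assumes that the dimension comprises exactly Items 10 to 14 and that the construct mean is the unweighted average of the item means (which holds when every respondent answered every item); both are assumptions, since the full item list appears only in Table 4.

```python
from statistics import mean

# Item means and standard deviations as reported in the text for Items 10-14.
# Item 12's SD is reported as 0.650; membership of the dimension is an assumption.
item_means = {10: 4.09, 11: 4.39, 12: 4.13, 13: 4.09, 14: 4.10}
item_sds = {10: 0.605, 11: 0.667, 12: 0.650, 13: 0.611, 14: 0.597}

construct_mean = round(mean(list(item_means.values())), 2)
highest_item = max(item_means, key=item_means.get)
most_dispersed_item = max(item_sds, key=item_sds.get)
least_dispersed_item = min(item_sds, key=item_sds.get)

print(construct_mean, highest_item, most_dispersed_item, least_dispersed_item)
```

Under these assumptions the averaged item means reproduce the reported construct mean of 4.16, with Item 11 both the highest rated and the most dispersed, and Item 14 the least dispersed.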
Table 5 reports the mean scores and standard deviations for the dimension “Technical Quality of Design.”
The technical quality of the prototype was likewise positively assessed, with an overall mean of M = 4.27 (SD = 0.433). Individual item means ranged from 4.13 to 4.41, indicating consistently strong expert approval. Experts affirmed that the prototype demonstrated adequate technical performance, responsive interaction, intuitive controls, and satisfying feedback mechanisms. They also emphasised the importance of the prototype’s evaluability, noting that the system architecture and content components lend themselves well to assessment and future improvements. In this context, evaluability refers to the extent to which the prototype’s structure, functionalities, and pedagogical elements are clearly defined, observable, and measurable, enabling systematic assessment and iterative refinement. Furthermore, the clarity of the information provided about the design was seen as a strength, supporting the potential for iterative refinements based on user feedback and contextual requirements.
The highest technical ratings were associated with the evaluability of the prototype’s content and components (Item 18; M = 4.41, SD = 0.716), followed by its accessibility, stability, and overall system performance (Item 15; M = 4.31, SD = 0.631). The lowest-rated technical aspect, though still favourably evaluated, was the ease of interaction with the prototype to receive pleasant feedback (Item 17; M = 4.13, SD = 0.565).
Although responses across this construct remained relatively consistent, variability was slightly higher than in the previous two dimensions. Greater dispersion was observed, particularly in items related to the availability of information for prototype improvement (Item 19; M = 4.20, SD = 0.736) and the evaluability of content and components (Item 18; M = 4.41, SD = 0.716).
Taken together, these findings indicate that the prototype’s content is appropriate and relevant for the hospital classroom context. The educational approach is well-structured, coherent, and tailored to the developmental and emotional needs of young learners in clinical settings. Furthermore, the technical design demonstrates a high level of robustness, contributing to a smooth and accessible user experience. Collectively, these results validate the prototype’s promise as an effective educational resource in hospital-based learning environments.
In addition to analysing expert responses for each construct, correlations among the three dimensions of the questionnaire were also examined using Spearman’s rank-order correlation coefficient. The analysis revealed statistically significant positive correlations among all constructs (p < 0.001). The strongest association was observed between content-context adaptation and technical quality of design (r = 0.63), followed by the correlation between content-context adaptation and pedagogical quality of design (r = 0.52). The correlation between technical and pedagogical quality was also significant, though more moderate (r = 0.45). These relationships are visually illustrated in Figure 15.
These correlation patterns suggest that expert ratings of contextual adaptation were strongly linked to perceptions of both pedagogical and technical quality. Additionally, positive evaluations of pedagogical design were often accompanied by favourable assessments of technical performance. The observed differences in correlation strength highlight the centrality of contextual alignment in expert evaluations, indicating that well-contextualised design contributes meaningfully to both pedagogical coherence and technical functionality.
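For readers unfamiliar with the rank-order coefficient used here, the following sketch shows one way to compute Spearman’s correlation: each variable is converted to ranks (ties receive the mean of their ranks) and the Pearson formula is applied to the ranks. The helper names and the per-respondent construct scores are hypothetical and are not study data; the reported coefficients were obtained in SPSS.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1        # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank-order correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    covariance = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    spread_x = sum((a - mx) ** 2 for a in rx) ** 0.5
    spread_y = sum((b - my) ** 2 for b in ry) ** 0.5
    return covariance / (spread_x * spread_y)


# Hypothetical per-respondent construct scores, NOT study data.
context_scores = [4.1, 4.4, 3.9, 4.6, 4.2, 4.8]
technical_scores = [4.0, 4.5, 4.1, 4.4, 4.3, 4.7]

rho = spearman(context_scores, technical_scores)
print(f"Spearman correlation = {rho:.3f}")
```

Because the coefficient depends only on rank order, it suits Likert-derived scores whose intervals may not be equal.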
The experts who participated in the quantitative phase of the study represented two primary professional backgrounds: individuals from the educational field (e.g., professors and researchers) and professionals from the video game design sector. Given the differing domains of expertise, it was hypothesised that their evaluations of the prototype might also diverge, particularly with regard to aspects most aligned with their respective areas of specialisation. For example, it was anticipated that designers might be more critical when evaluating technical features, while educators might apply more stringent criteria to the assessment of pedagogical elements.
To examine potential differences in evaluation between the two groups, the Mann–Whitney U test was conducted, as it is appropriate for ordinal data and does not assume normally distributed scores. Bonferroni correction was applied to adjust for multiple comparisons and reduce the risk of Type I error.
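The procedure described above can be sketched as follows: a Mann–Whitney U statistic is computed for one item, a two-sided p-value is obtained, and the significance threshold is divided by the number of comparisons. This is a minimal illustration, not the SPSS procedure used in the study: it relies on the tie-corrected normal approximation without continuity correction, the ratings are hypothetical, and the figure of 19 comparisons assumes one test per questionnaire item (Items 1 to 19), which the text does not state explicitly.

```python
from collections import Counter
from math import erfc, sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U for group a, two-sided p) via the tie-corrected normal approximation."""
    pooled = list(a) + list(b)
    n1, n2, n = len(a), len(b), len(pooled)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:                       # all ratings identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / sqrt(var_u)
    return u1, erfc(abs(z) / sqrt(2))    # two-sided p under the standard normal


def bonferroni_alpha(alpha, n_comparisons):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_comparisons


# Hypothetical ratings of a single item by the two expert groups, NOT study data.
educators = [4, 5, 4, 4, 3, 5, 4, 5]
designers = [4, 4, 5, 3, 4, 5, 4, 4]

u_stat, p_value = mann_whitney_u(educators, designers)
threshold = bonferroni_alpha(0.05, 19)   # 19 comparisons is an assumption
significant = p_value < threshold
print(u_stat, round(p_value, 3), round(threshold, 5), significant)
```

Dividing the threshold by the number of comparisons makes each individual test stricter, so a difference must be considerably more pronounced to be declared significant.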
The results revealed no statistically significant differences in the item-level ratings between the two groups (p > 0.05 across all comparisons). As shown in Table 6, both educational experts and video game designers provided consistently high ratings across all dimensions of the questionnaire. This convergence in responses despite professional background differences further reinforces the perceived overall quality and coherence of the serious game prototype.
Taken together, the results of the quantitative evaluation indicate a highly favourable appraisal of the prototype across its three core dimensions: content-context adaptation, pedagogical quality, and technical quality. Importantly, these positive assessments were consistent regardless of the respondent’s area of expertise. That is, professors and researchers were not disproportionately critical of pedagogical aspects, nor were game designers more exacting in their assessment of technical components.
After integrating and triangulating these findings with the qualitative data, it can be concluded that the educational game prototype, Yuki’s Adventure: Hidden Words, demonstrates strong contextual alignment, pedagogical soundness, and technical robustness. These characteristics collectively support its potential application within paediatric oncology classroom settings. Specifically, the game offers promising benefits for fostering rapid naming, initial decoding, and automatization in the recognition of syllables, words, and basic sentences, core components in early literacy development for hospitalised learners.
The mixed-methods integration was a central phase in the evaluation of the prototype, consistent with the exploratory sequential design adopted for this study. The integration process was not limited to surface comparison but involved a systematic cross-analysis to identify convergences, logical divergences due to iterative development, and expansions that enriched the interpretation of results. To facilitate this process, a joint display strategy was employed, aligning analytical dimensions across both strands using MAXQDA 24 software (MAXQDA 2024, version 24.4.1, VERBI Software, Berlin, Germany). Qualitative findings from expert interviews and the focus group guided iterative refinements of the prototype, and these changes were subsequently validated through quantitative ratings. This developmental logic resulted in patterns of partial convergence, logical divergences by development, and confirmatory expansion across constructs.
Three joint displays were constructed, as can be seen in Table 7, corresponding to the study’s core constructs: contextual adaptation, pedagogical quality, and technical quality. These displays demonstrated high coherence across data strands. Experts’ initial concerns during the qualitative phase, such as clarity of instruction, progression, or visual consistency, were addressed in the redesign, and high quantitative ratings (all means above 4 on a 5-point Likert scale) confirmed the success of these adjustments.
The integration process yielded the following outcomes:
Convergence: Both data strands confirmed the strengths of the prototype in terms of contextual relevance, ease of use, and motivational appeal. For example, the prototype’s alignment with paediatric oncology conditions (e.g., short sessions, reduced load) was consistently praised across methods.
Expansion: Qualitative narratives provided depth to quantitative trends. While visual and auditory aesthetics received high scores, experts elaborated on specific tensions (e.g., background music monotony, visual overstimulation) that informed future improvements.
Divergence by development: In areas such as feedback mechanisms or instructional clarity, qualitative critiques prompted design revisions. These refinements were later validated by positive quantitative ratings, illustrating a cycle of development-confirmation that aligns with Design-Based Research principles.
Cross-construct coherence: The analysis revealed interdependencies among the three core constructs. For instance, high technical quality was necessary to deliver pedagogical content effectively, and the contextual adaptation of the interface supported both usability and didactic effectiveness. Adaptation to context emerged as the central axis reinforcing both pedagogical and technical quality.
Meta-inferences: The integrated results validated the relevance of Design-Based Research as a methodological framework for creating digital learning tools tailored to clinical-educational environments. The iterative integration of expert input contributed to a high-fidelity prototype evaluated as pedagogically sound, technically functional, and contextually appropriate for hospital classrooms.
Additionally, the analysis revealed strong interdependence among the constructs. For instance:
The technical performance of the prototype was directly linked to its contextual feasibility in hospital settings. Stability and ease of interaction were essential to avoid frustration in vulnerable learners.
The pedagogical strength of the game depended on its contextual adaptation. Tailored pacing, language, and representation of content were crucial to addressing the cognitive and emotional needs of children undergoing cancer treatment.
Technical and pedagogical dimensions were mutually reinforcing. Experts noted that intuitive interaction design supported educational understanding, while visual and auditory coherence facilitated cognitive engagement.
Finally, this phase enabled the formulation of further meta-inferences about the success of the prototype:
The presence of multiple logical divergences by development confirmed that initial qualitative criticisms triggered meaningful improvements, validated later through high Likert ratings (all means > 4).
The most consistent point of integration was the adaptation to the paediatric oncology context, which acted as a central pillar in enhancing both pedagogical relevance and technical feasibility.
The concept of confirmation with expansion emerged repeatedly: quantitative validation was complemented by rich qualitative insight that explained why the design was perceived as effective.
This integrated analysis highlighted how iterative refinement, grounded in expert feedback, resulted in a final prototype that is pedagogically, contextually, and technically sound. The mixed-methods process validated the logic of the design-based research (DBR) framework, where expert-informed adjustments during the qualitative phase were corroborated by positive quantitative evaluation scores. The analysis also confirmed the multidimensional strength of the final prototype, supporting its pedagogical, contextual, and technical adequacy for use in hospital-based pre-fluency interventions targeting young children with central nervous system tumours.
In light of the highly positive expert evaluations and the lack of statistically significant differences across professional backgrounds, an exploratory assessment of the prototype’s potential impact and feasibility in real-world clinical-educational settings was considered appropriate. This initiative was encouraged by a teacher working in an oncology hospital classroom in the Canary Islands, who recognised the educational promise of the prototype and advocated for its integration into daily instructional practice. Notably, this application took place despite the prototype still being under development and not yet finalised.
Given the early educational stage of the participants (ages 5–6, enrolled in the final year of Infant Education) and the limited availability of validated literacy assessment tools designed for this age group in the Spanish context, careful methodological consideration was required for instrument selection. In Spain, the official Infant Education curriculum promotes a global, exploratory introduction to literacy, with formal reading instruction typically beginning in Primary Education. As such, administering comprehensive standardized reading tests at this stage would be both pedagogically inappropriate and psychometrically unreliable.
Moreover, the exploratory nature of this pilot, implemented as a single-group post-test design, was not intended to establish causal relationships or generate generalizable findings. Instead, it aimed to offer preliminary insights into the prototype’s educational potential. To achieve this, three validated instruments were selected for their alignment with foundational literacy skills: the Rapid Naming Test (TDR), the reading subtest of the Spanish Test of Basic Instrumental Aspects in Language and Mathematics (PAIB-1), and section 3A of the Spanish Reading and Writing Test (LEE), focusing on basic sentence-level reading. These tools specifically targeted rapid naming, initial decoding, and early indicators of fluency, thus aligning with both developmental appropriateness and curricular expectations.
In addition, fluency performance was assessed using the Scale of Reading Fluency in Spanish (SRFS; Escala de Fluidez Lectora en Español, EFLE) [185]. The TDR measured rapid automatized naming, widely acknowledged as a strong predictor of reading fluency. The PAIB-1 captured syllable and word-level reading skills in Spanish-speaking preschoolers, while the selected LEE items assessed sentence-level decoding. The SRFS was applied across both PAIB-1 and LEE tasks, offering a multicomponential fluency evaluation encompassing reading speed, accuracy, prosody (including volume, intonation, pauses, and phrasing), and an additional component (reading quality) to provide a holistic perspective. Each component was rated using a four-point scale (1 = lowest, 4 = highest), with standardised descriptors outlined in Appendix 1 of the EFLE scoring scale [185].
Four children (1 male, 3 female) participated in this pilot implementation, selected according to strict inclusion criteria: age (5–6 years), enrolment in preschool, no prior formal literacy instruction, ongoing treatment for oncology conditions, and diagnosis involving the central nervous system.
As presented in Table 8, results from the Rapid Naming Test indicated that two participants (P1 and P4) exhibited high naming speed (≥80th percentile), while the remaining two (P2 and P3) demonstrated moderate performance, falling between the 40th and 50th percentiles. The administered subtests included object and colour naming, standardised for this age group, and letter and number naming, which were used qualitatively and interpreted using first-grade reference norms due to the participants’ pre-literacy stage.
In the PAIB-1, designed to assess isolated word reading, three out of four participants performed within the medium range on all indicators (direct score, centile, standardised score, and T score). One participant (P3) fell into the low range, indicating difficulties in early reading processes (Table 9).
For the assessment of sentence-level reading fluency, two sentences from Section 3A of the LEE (Test de Lectura y Escritura en Español) were selected. Although the LEE is standardised for students in Grades 1 to 4 of Primary Education, the chosen items were specifically selected for their brevity and syntactic simplicity, rendering them suitable for the developmental level of the participants: five-year-old children undergoing oncology treatment who had not yet commenced formal literacy instruction.
Given these developmental considerations, the administration of full subtests or longer passages would have been inappropriate, both pedagogically and methodologically. Moreover, since the game prototype was still in its early stages and not designed to produce significant gains in complex reading comprehension, these simplified sentences served as controlled indicators of emergent sentence-level fluency. To complement this, word-level reading was independently assessed using the PAIB-1 test, which includes isolated lexical items appropriate for learners at the pre-reading stage.
The adaptation of the LEE—limiting it to two brief sentences—was both ethically justified and methodologically necessary to ensure developmental validity, reduce cognitive burden, and maintain alignment with the study’s exploratory aims.
Based on the application of the SRFS-EFLE (Escala de Fluidez Lectora en Español) across performance on the PAIB-1 and LEE assessments, 75% of participants (3 out of 4) achieved an overall fluency score equal to or above the group mean (M = 3.0). These students (Students 1, 2, and 4) obtained ratings of 3 or higher in at least three of the four assessed components: reading speed, accuracy, prosody, and reading quality. In contrast, Student 3 showed below-average performance, particularly in speed, prosody, and reading quality, which is consistent with their lower scores on the PAIB-1 subtest, as detailed in Table 10 and Table 11.
Given the limited sample size (N = 4), which reflects the exploratory nature of the pilot within a hospital classroom setting, the analysis was restricted to descriptive statistics, without inferential testing. The results suggest a generally favourable trend in early reading fluency development following the intervention. Specifically, three out of four participants achieved scores at or above the group mean in overall initial fluency, as measured by the SRFS-EFLE scale.
The strongest performance was recorded in the accuracy component, where all participants received an identical score (M = 3.0, SD = 0.00), indicating a high and consistent ability to decode words or sentences correctly. In contrast, greater variability was observed in the components of reading speed and reading quality (SD = 0.82). While not designed to address effectiveness, this pilot study aimed to explore feasibility and inform final development; subsequent research should, therefore, incorporate larger samples and control group designs where feasible.
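To make the descriptive statistics above concrete, the sketch below computes the mean and sample standard deviation for a four-participant rating set. The individual ratings are hypothetical: they are one pattern on the four-point scale that happens to yield a mean of 3.0 and an SD of 0.82, and they should not be read as the actual scores, which are reported in Tables 10 and 11. The sketch also assumes that the reported SD is the sample standard deviation (n - 1 denominator).

```python
from statistics import mean, stdev

# Hypothetical four-point ratings for N = 4 participants, NOT the study's raw scores.
accuracy_ratings = [3, 3, 3, 3]   # identical ratings give zero dispersion
speed_ratings = [2, 3, 3, 4]      # one pattern that gives a mean of 3.0 and SD of 0.82

accuracy_summary = (mean(accuracy_ratings), round(stdev(accuracy_ratings), 2))
speed_summary = (mean(speed_ratings), round(stdev(speed_ratings), 2))

print("accuracy: M = %.1f, SD = %.2f" % accuracy_summary)
print("speed:    M = %.1f, SD = %.2f" % speed_summary)
```

With only four observations, a single rating one point above or below the others changes the SD substantially, which is one reason the analysis is restricted to description.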
5. Discussion, Conclusions and Future Research
This study presented the instructional design, development, and evaluation of a serious game prototype aimed at fostering rapid naming, decoding, and reading automatization in preschool-aged children undergoing long-term hospitalisation in oncology settings.
The game was grounded in an extensive three-year process of literature review and iterative development, informed by evidence-based pedagogical practices, principles of game design, and the experiential insights of hospital educators, alongside experts in educational game design; educational technology; computer engineering, programming and development; psycholinguistics and reading processes applied to video games; hospital education and health with technology; and accessibility in reading games.
Drawing on evidence-based practices from paediatric oncology classrooms, the resulting prototype was designed to merge playful engagement with targeted literacy outcomes. Its evaluation revealed that the game aligns well with the educational and clinical needs of its intended context. This alignment was further validated through highly favourable feedback from a diverse group of experts—including preschool hospital teachers and game design experts and professionals—who praised its pedagogical coherence, usability, and contextual relevance. The convergence of insights from these varied perspectives lends strong credibility to the qualitative assessment and reinforces the prototype’s potential as a valuable supplementary tool for supporting instructional goals in paediatric hospital classrooms.
Quantitative results mirrored this positive perception: across all three evaluated dimensions (contextual adaptation, pedagogical quality, and technical quality), mean scores ranged from 4.16 to 4.28 out of 5, indicating uniformly high levels of approval among 274 expert respondents. These data corroborate the prototype’s potential to bridge gaps in stimulating pre-reading fluency processes under clinical and developmental constraints.
Although the post-test pilot sample was limited to four children—reflecting the exploratory nature of the study—the primary aim was not to evaluate effectiveness, but rather to assess the feasibility of implementing the prototype within real hospital classroom settings and to inform its future development. Preliminary findings suggest that the game may support early gains in reading fluency, particularly in naming accuracy and basic decoding skills. These encouraging results highlight the need for further research with larger and more diverse clinical samples to evaluate the game’s generalizability and its potential for measurable educational impact.
Importantly, the small pilot cohort was not pre-planned but emerged from a contextual opportunity, encouraged by a teacher who saw value in the game’s educational potential even at a pre-final stage. Ethical considerations, especially the participants’ vulnerability due to age, health condition, and treatment status, necessitated a cautious, non-intrusive exploratory approach, consistent with the British Educational Research Association (BERA) Risk–Benefit Guidelines. As such, the study was exploratory and descriptive in nature, with a focus on validating design quality rather than establishing statistical significance in learning outcomes. Although children’s subjective experiences and motivational indicators were not systematically assessed using formal instruments, informal teacher feedback indicated a generally positive reception, with boys showing particular enthusiasm. This anecdotal input, though not empirically analysed, offers initial insight into engagement and will inform future research.
Furthermore, this approach facilitated the extraction of evidence-informed design principles while identifying key areas for improvement (accessibility, interface usability, narrative coherence, and motivational appeal), all of which have been addressed in the most recent version of the prototype. Building on these insights, this research calls for a re-examination of traditional educational design paradigms, advocating for pedagogical innovation through interactive and inclusive technologies. In doing so, it also surfaces important epistemological, methodological, and ethical considerations that future studies must address to establish a rigorous framework for developing serious games in sensitive clinical contexts.
In addition to its methodological contribution, this study presents a serious game whose design embodies a set of distinctive features that clearly differentiate it from existing literacy-oriented educational games. First, unlike general-purpose applications, the prototype was conceived specifically for preschool children undergoing oncological treatment for CNS tumours, a population that frequently exhibits neurocognitive vulnerabilities in rapid naming, processing speed, and fluency. The game is not intended as a vehicle for formal reading instruction; rather, it serves as a pre-reading cognitive-linguistic training tool, focusing on foundational processes such as syllable segmentation, rapid naming, and decoding automatization, which are critical precursors to fluent reading. Importantly, these tasks are not presented in isolation, but are embedded in a high-fidelity, story-driven role-playing environment that substitutes traditional drill-based approaches with quests, virtual classrooms and interactive scenarios. This integration of pedagogy and gameplay is supported by an eclectic instructional model that aligns each learning objective with specific mini-games and narrative mechanics (see Appendix A, Appendix B and Appendix C). Furthermore, the aesthetic design, including neutral characters, adaptive pacing, and emotionally safe audio-visual elements, was intentionally developed for hospitalised contexts, where children often face attentional variability, fatigue, and emotional vulnerability. Finally, the prototype underwent expert validation involving both clinical-educational professionals and game design specialists, ensuring its feasibility and relevance in paediatric oncology settings. Taken together, these characteristics position Yuki’s Adventures: Hidden Words as a context-specific, theory-informed, and clinically sensitive innovation that extends current research in educational technology and provides a foundation for future studies on fluency development in vulnerable paediatric populations.
Building on these characteristics, the primary contribution of this work lies in the detailed design of an adaptable, pedagogically robust serious game that harmonises educational content with engaging gameplay for children facing extraordinary learning barriers. The resulting design patterns offer a valuable reference point for educators, researchers, and developers seeking to create meaningful, skill-transfer-driven serious games in similarly constrained or high-stakes learning environments.
Beyond its practical design value, this study also advances current literature in several key areas. First, it addresses a significant gap by designing and evaluating a serious game tailored to hospitalised preschool children with CNS tumours, an underrepresented population in educational game research. Second, it offers an initial validation of the preliminary prototype’s design, grounded in principles of neuroeducation, inclusive pedagogy, and accessible game design, all tailored to a clinical-educational context. This validation serves as a foundational step towards developing a fully functional final version of the game, ensuring that pedagogical coherence, contextual relevance, and usability are achieved prior to broader implementation. In this context, if a fully developed final version of the game were pursued, a 24-week implementation period, double the 12-week period of the prototype, would be more suitable to foster deeper engagement and sustained learning. Third, the study makes a methodological contribution by applying a Design-Based Research model within a highly sensitive clinical setting, integrating expert validation with exploratory pilot testing. These elements set the work apart from previous studies and provide a replicable framework for advancing serious game design for vulnerable learner populations.
In light of these contributions, the present study provides both an immediate educational resource and a foundation for future advancements in the design of serious games for vulnerable learners. Specifically, this study not only offers a validated tool tailored to the needs of children in paediatric oncology but also proposes a methodological model for developing and evaluating serious games in complex learning contexts. In doing so, it highlights the transformative potential of educational games when they are grounded in learner-centred pedagogy, guided by empathy, and developed with scientific rigour. At the same time, it challenges the traditional view of video games as mere entertainment, reframing them instead as powerful cognitive and emotional mediators within fragile and discontinuous educational environments such as hospital classrooms.
Within this scope, the present study did not integrate extended user-centred data such as systematic child feedback or long-term gameplay analytics. While these forms of evidence would certainly enrich final development, their collection lies beyond the objectives of the present exploratory phase. Instead, the prototype offers a conceptual and technical foundation that can serve as a reference point for subsequent projects or collaborations in which user-centred evidence might be incorporated more extensively.
Moreover, no further iterations are planned at this stage: the final prototype design has been completed and validated, and the exploratory research line has concluded. The study has nonetheless produced a validated design and a set of design principles that can inform the development of a full-scale serious game in the future, contingent on adequate funding and resources. Looking ahead, research exploring learner motivation and engagement in similar hospital-based serious games could benefit from validated, developmentally appropriate instruments tailored to preschool populations. The Leuven Scale of Involvement [186], for instance, provides a structured observational framework for assessing behavioural engagement and emotional well-being, while the Fun Toolkit [187] offers child-friendly self-report methods such as the Smileyometer and Fun Sorter to capture affective responses to digital learning. These instruments are well aligned with the developmental and cognitive profiles of hospitalised preschoolers and could be applied effectively in clinical-educational contexts in future studies within the field.
Evaluating motivation and engagement in this highly specific population (five-year-old children hospitalised with CNS tumours) would also require an ethically sensitive and contextually appropriate design. A complementary concurrent triangulation mixed-methods approach [188], prioritising quantitative measures while drawing on qualitative observations for additional nuance, could offer a robust framework. Such an approach would acknowledge the multifactorial nature of engagement, which is influenced not only by the game itself but also by mediators such as teacher facilitation [189], environmental conditions, and the impact of illness [190].
Ultimately, this study marks an important step forward while also opening several avenues for future research. Building on the present findings will require studies with larger samples, control groups, and both baseline and post-intervention assessments using validated, age-appropriate instruments, complemented by longitudinal follow-up. In parallel, integrating in-game analytics, AI-based scaffolding, and speech recognition will be crucial to enable real-time adaptation and support personalised learning trajectories. Finally, examining the transferability of this framework to other sensitive educational contexts could extend its social impact and enhance its broader systemic relevance.