Article

Evaluating Interaction Capability in a Serious Game for Children with ASD: An Operability-Based Approach Aligned with ISO/IEC 25010:2023

by Delia Isabel Carrión-León 1,*, Milton Paúl Lopez-Ramos 2, Luis Gonzalo Santillan-Valdiviezo 3,*, Damaris Sayonara Tanguila-Tapuy 4, Gina Marilyn Morocho-Santos 4, Raquel Johanna Moyano-Arias 2, María Elena Yautibug-Apugllón 2 and Ana Eva Chacón-Luna 1

1 Carrera de Ingeniería de Software, Facultad de Ciencias e Ingenierías, Universidad Estatal de Milagro, Milagro 091050, Ecuador
2 Grupo de Investigación Modelamiento y Simulación MODSIM, Carrera de Ciencias de Datos e Inteligencia Artificial, Facultad de Ingeniería, Universidad Nacional de Chimborazo, Riobamba 060108, Ecuador
3 Grupo de Investigación Modelamiento y Simulación MODSIM, Carrera de Telecomunicaciones, Facultad de Ingeniería, Universidad Nacional de Chimborazo, Riobamba 060108, Ecuador
4 Independent Researcher, Lago Agrio 210205, Ecuador
* Authors to whom correspondence should be addressed.
Computers 2025, 14(9), 370; https://doi.org/10.3390/computers14090370
Submission received: 9 July 2025 / Revised: 21 August 2025 / Accepted: 22 August 2025 / Published: 4 September 2025
(This article belongs to the Section Human–Computer Interactions)

Abstract

Serious games for children with Autism Spectrum Disorder (ASD) require rigorous evaluation frameworks that capture neurodivergent interaction patterns. This pilot study designed, developed, and evaluated a serious game for children with ASD, focusing on operability assessment aligned with ISO/IEC 25010:2023 standards. A repeated-measures design involved ten children with ASD from the Carlos Garbay Special Education Institute in Riobamba, Ecuador, across 25 gameplay sessions. A bespoke operability algorithm incorporating four weighted components (ease of learning, user control, interface familiarity, and message comprehension) was developed through expert consultation with certified ASD therapists. Statistical analysis used linear mixed-effects models with Kenward–Roger correction, supplemented by thorough validation including split-half reliability and partial correlations. The operability metric demonstrated excellent internal consistency (split-half reliability = 0.94, 95% CI [0.88, 0.97]) and construct validity through partial correlations controlling for performance (difficulty: r_partial = 0.42, p = 0.037). Eighty percent of sessions achieved moderate-to-high operability levels (M = 45.07, SD = 10.52). Contrary to expectations, operability consistently improved with increasing difficulty level (Easy: M = 37.04; Medium: M = 48.71; Hard: M = 53.87), indicating that individuals with enhanced capabilities advanced to harder levels. Mixed-effects modeling indicated substantial difficulty effects (H = 9.36, p = 0.009, ε2 = 0.39). This pilot study establishes preliminary evidence for operability assessment in ASD serious games, requiring larger confirmatory validation studies (n ≥ 30) to establish broader generalizability and standardized instrument integration. The positive difficulty–operability association highlights the importance of adaptive game design in supporting skill progression.

1. Introduction

Autism Spectrum Disorder (ASD) is a complex neurodevelopmental condition marked by difficulties in social communication and interaction and by restricted, repetitive behaviors [1]. Its presentation varies widely from person to person, with symptoms ranging from moderate to severe, and it commonly co-occurs with other conditions such as Attention Deficit Hyperactivity Disorder (ADHD), anxiety, depression, and epilepsy [2]. The disorder emerges in the earliest years and continues across an individual's lifetime, exhibiting varied severity and profoundly affecting social, intellectual, and occupational functioning [3,4].
The etiology of ASD is complex, involving genetic, environmental, and neurological factors. Genetic factors are significant, as demonstrated by a concordance rate of 70–90% in identical twins and a substantial recurrence rate among siblings [5]. External factors, such as complications during pregnancy and childbirth, also play a role in the disorder's development. Neurobiologically, ASD has been associated with altered neuronal connectivity, immune system complications, and dysfunction of the endocannabinoid system [6]. A multidisciplinary assessment approach is therefore fundamental to obtaining a complete picture of ASD symptoms and related disorders [6]. Early intervention, especially behavioral therapy, is critical to achieving better outcomes, while medications are also used to treat comorbid conditions [6,7].
Among the diverse treatments currently under development, serious games aim to address the social interaction and communication challenges associated with ASD. Beyond helping children with ASD manage social, emotional, and cognitive difficulties, serious games are also designed to be enjoyable while teaching social and emotional regulation skills.
A systematic literature review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [8]. The review protocol was as follows:
  • Search strategy: Systematic searches across PubMed, IEEE Xplore, ACM Digital Library, and Scopus databases using terms: (“serious games” OR “educational games”) AND (“autism” OR “ASD”) AND (“usability” OR “interaction”) from 2018 to 2024;
  • Inclusion criteria: Peer-reviewed studies evaluating serious games for children with ASD, reporting usability or interaction metrics;
  • Quality assessment: Studies were evaluated using the Effective Public Health Practice Project (EPHPP) quality assessment tool. The review identified 25 relevant studies, with 22 (88%) demonstrating significant improvements in at least one measured outcome, consistent with recent meta-analyses showing large effect sizes (Hedges’ g = 0.62) for digital interventions in ASD populations [9].
Another evaluation suggested that serious games can simultaneously help players learn to recognize and regulate their emotions and to attend to others, although it underlined the need for more extensive study designs to corroborate these outcomes [10]. Serious games with puzzles and other engaging elements specifically developed for autistic people can help them improve their communication and social interaction skills [11]. Participatory research frameworks that include stakeholders in the design process have also been created to make these games more effective [12].
The effectiveness of serious games primarily hinges on their design, especially on usability and accessibility. A serious game made for individuals with ASD should simplify its mechanics by eliminating complex commands, confusing interfaces, and excessive visual and auditory stimuli. The research analyzed shows that there are several important factors that affect how children with autism perceive the quality of serious-game products. The design of these games must be customized to the different requirements and characteristics of autistic children, including components such as puzzles and interactive features that precisely target their cognitive and sensory requirements [11]. The sensory experience of the game is an important component, as autistic children might experience sensory distress, so games should be developed to mitigate any discomfort while promoting participation through dynamic animations and rewards. Additionally, sociocultural and institutional context influences the perception and utilization of these games, with external assistance from educators and the surroundings affecting every aspect of the experience. The possibilities for customization are essential, as they enable the game to be adapted to the specific needs of each child, therefore improving its significance and perceived quality [13,14]. ISO/IEC 25010 is an international standard that provides a framework for assessing software quality, with a focus on its use in diverse fields, including serious games software intended for education, training, or information rather than solely for amusement. The standard specifies eight quality characteristics: functional suitability, reliability, performance efficiency, usability, security, maintainability, portability, and compatibility [15,16]. However, implementing ISO 25010 in practice is not straightforward. Users frequently resist organizational changes, which can undermine compliance with the standard.
To secure support, organizations should communicate clearly and implement targeted educational initiatives [17]. Furthermore, because software projects continuously evolve, maintaining compliance with the standard becomes increasingly difficult. Regular evaluations are needed to make sure that new features and upgrades meet ISO 25010 criteria. These problems highlight the necessity of establishing comprehensive evaluation frameworks and criteria tailored for assessing the quality of serious games. Recent revisions to ISO 25010:2023 have incorporated new attributes designed to improve the assessment process, respond to evolving trends in software quality, and augment the standard’s relevance in educational settings. A notable problem is the necessity for comprehensive testing and validation of the software under the quality attributes specified by ISO 25010:2023 [18]. Performance efficiency, dependability, and functional suitability must be assessed both independently and within actual usage contexts to ascertain their efficacy and user satisfaction. This comprehensive evaluation requires a robust framework that integrates both technical and non-technical elements of software quality, making it essential for developers to have a clear understanding of the evaluation processes involved [19].
The test focused on the game’s usability and inclusiveness, adhering to the ISO/IEC 25010:2023 quality standard. This new version of the standard builds on the old idea of usability by adding the larger idea of “interaction capability”. This definition encompasses crucial things like engagement, inclusivity, ease of learning, self-descriptiveness, and, most critically, operability. Utilizing the ISO/IEC 25010:2023 quality model in the creation of serious games for children with ASD can substantially improve user interaction with the game, especially with operability.
The main objective of this research is to evaluate the level of interaction capability attained by a serious game, with particular emphasis on operability. The goal is to create a set of design rules that follow the ISO/IEC 25010:2023 standard, validated through rigorous statistical methods appropriate for small-sample research with repeated measures. These rules will enhance the effectiveness of therapeutic interventions while ensuring that the game remains easy to understand and use for individuals with varying cognitive and developmental profiles.

2. Materials and Methods

This research employed an applied and quantitative methodology to design, develop, and evaluate a serious game intended to support the cognitive and social development of children with ASD. The methodological framework was guided by the ISO/IEC 25010:2023 quality model, with a focus on the subcharacteristics of operability and inclusiveness. Design requirements were obtained through interviews with a specialized ASD therapist, and development followed a user-centered and agile approach. Gameplay metrics (time, movements, scores, difficulty level) were recorded and analyzed using a custom operability formula created for this research, while interface interaction data were used to assess accessibility and usability across different user profiles.

2.1. Study Design and Sample Size Justification

This pilot research employed a repeated-measures design with 10 participants completing a total of 25 gameplay sessions. All 10 participants attended multiple sessions, with session frequency varying based on individual tolerance and engagement levels. While all participants completed sessions at easy and medium difficulty levels, five participants (50%) discontinued interaction when progressing to hard difficulty levels, consistent with documented patterns of task avoidance in ASD populations when cognitive load exceeds individual capacity [20]. The 25 total sessions analyzed represent all completed gameplay interactions across the three difficulty levels, ensuring ecological validity while respecting participant autonomy in task engagement. This sample size aligns with established norms in autism HCI research, where the median sample size is nine participants [21]. The intensive data collection approach (multiple sessions per participant) was specifically chosen to do the following:
  • Enable detailed longitudinal analysis of learning patterns and adaptation;
  • Provide sufficient within-subject data for mixed-effects modeling;
  • Capture the heterogeneity characteristic of ASD populations;
  • Establish feasibility and generate effect size estimates for future larger-scale investigations.
The repeated-measures design maximizes statistical power while acknowledging the specialized nature of the target population and the intensive support requirements for participants with ASD. Recent methodological reviews support this approach for pilot studies in clinical populations [22,23].

2.2. Population and Sample

This research was conducted in accordance with the Declaration of Helsinki and received official authorization from the Ministry of Education of Ecuador through the Zonal Coordination of Education District 06D01-Chambo-Riobamba (Official Document No. MINEDUC-CZ3-06D01-2023-5486-O, dated 31 July 2023). The authorization specifically approved the research project titled “CITIZEN COMMITMENT TO SUPPORT CHILDREN WITH ASD AT THE CARLOS GARBAY SPECIAL EDUCATION UNIT” and was coordinated with institutional authorities including MSc. Lorena Coronel, institutional director.
Given the vulnerable population, a comprehensive dual-consent process was implemented following person-oriented ethics frameworks for autism research [24]:
  • Institutional Authorization: Official governmental approval was obtained through the Ministry of Education’s formal review process, ensuring compliance with national educational research standards and protection protocols for children with special needs. The authorization required detailed project documentation and institutional coordination protocols.
  • Consent from Parent or Guardian: Informed consent was obtained from all parents/guardians after a comprehensive discussion of the study's methods, risks, benefits, and the voluntary nature of participation. Parents were notified of their right to withdraw their child at any moment without repercussions, as stipulated in the governmental authorization.
  • Child Assent Protocol: Age-appropriate assent was obtained from all participants using a multi-modal approach adapted for ASD communication needs [25]:
    • Visual assent materials: Pictorial cards showing study activities (playing games, being observed) with simple yes/no response options;
    • Verbal explanation: Therapist-mediated explanation using familiar language and allowing processing time;
    • Behavioral indicators: Continuous monitoring for signs of distress, withdrawal, or non-compliance as indicators of withdrawn assent;
    • Ongoing consent: Assent was reconfirmed at each session, with immediate discontinuation if the child showed reluctance.
All procedures were conducted in familiar environments with trusted therapists present to minimize anxiety and ensure authentic assent. The study implementation was coordinated with institutional specialists as required by the governmental authorization, ensuring appropriate therapeutic oversight throughout the research process.
The sample includes 10 children diagnosed with ASD who regularly attend the Special Education Institute "Carlos Garbay" in Riobamba, Ecuador. This institution provides services for children across the autism spectrum and operates in two sessions: session 1 with six students (60%) and session 2 with four students (40%).
A deliberate, non-probabilistic sampling method was used, based on practical criteria such as the children’s availability, their regular attendance in therapy sessions, and formal approval from both the therapeutic and educational teams. This approach enabled a realistic evaluation of the serious game in a natural setting, capturing the functional and communication variability that characterizes groups of children with ASD.
The sample's comprehensive representation across ASD severity levels (Levels 1–3), communication modalities (verbal, non-verbal, AAC-supported), and cognitive profiles provides a strong foundation for evaluating operability metric universality. The inclusion of participants with high sensory sensitivity (P3), motor coordination challenges (P6), attention regulation difficulties (P4), and varying frustration tolerance (P10) enables assessment of interface effectiveness across the autism spectrum's heterogeneous presentation patterns. This diversity aligns with recent participatory design research emphasizing the need for inclusive evaluation approaches that capture ASD population variability [12,26].
To facilitate the user experience analysis, Table 1 presents descriptive data obtained from direct observation and discussion with the institution's therapist:

2.3. Statistical Analysis Plan

While non-parametric alternatives such as Spearman correlations were considered given the small sample size (n = 10), linear mixed-effects models were selected as the primary analytical approach for several methodological reasons:
  • Linear mixed-effects models (LMM) were employed to analyze the relationship between operability and gameplay variables, accounting for the nested structure of sessions with participants. Models were fitted using restricted maximum likelihood (REML) estimation with Kenward–Roger adjusted degrees of freedom to provide accurate inference with small samples [27,28];
  • The repeated measures design (25 total observations) provides sufficient power for mixed-effects modeling despite small between-subject sample size [29];
  • Visual inspection of residuals and Q-Q plots indicated acceptable approximation to normality for most variables, with robust standard errors providing additional protection against distributional assumptions;
  • Kenward–Roger degrees of freedom correction specifically addresses small-sample inference concerns in mixed models [27,28]. Spearman correlations will be computed as sensitivity analyses to validate the robustness of parametric findings across different distributional assumptions.
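The planned Spearman sensitivity check can be illustrated with a short sketch. The data below are synthetic placeholders, not the study's measurements; the point is that agreement in sign and rough magnitude between the Pearson and Spearman coefficients supports robustness of the parametric findings to distributional assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
score = rng.uniform(50, 100, 25)                  # one value per session (illustrative)
operability = 0.5 * score + rng.normal(0, 5, 25)  # assumed linear relation plus noise

# Parametric coefficient and its rank-based counterpart.
r, p_r = stats.pearsonr(score, operability)
rho, p_rho = stats.spearmanr(score, operability)
print(f"Pearson r = {r:.2f} (p = {p_r:.4f}); Spearman rho = {rho:.2f} (p = {p_rho:.4f})")
```

If the two coefficients diverged substantially, the rank-based result would be the safer one to report for data of this size.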

2.3.1. Effect Size Calculation

To address small-sample bias, Hedges’ g was calculated instead of Cohen’s d, using the correction factor shown in Equation (1) [30]:
g = d × (1 − 3 / (4·df − 1))
Bootstrap confidence intervals (BCa method, 2000 resamples) were computed for all effect sizes to avoid distributional assumptions [31,32].
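A minimal sketch of this effect-size computation on synthetic group data follows; the group means are illustrative stand-ins, and the BCa interval uses SciPy's `bootstrap` (multi-sample BCa requires SciPy ≥ 1.9).

```python
import numpy as np
from scipy import stats

def hedges_g(x, y):
    """Cohen's d with the small-sample bias correction of Eq. (1)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled_sd = np.sqrt(((nx - 1) * np.var(x, ddof=1)
                         + (ny - 1) * np.var(y, ddof=1)) / df)
    d = (np.mean(x) - np.mean(y)) / pooled_sd
    return d * (1 - 3 / (4 * df - 1))

rng = np.random.default_rng(0)
easy = rng.normal(37, 10, 10)   # illustrative per-condition operability scores
hard = rng.normal(54, 10, 10)

g = hedges_g(hard, easy)
# BCa bootstrap CI with 2000 resamples, mirroring the approach in the text.
res = stats.bootstrap((hard, easy), hedges_g, n_resamples=2000,
                      vectorized=False, method="BCa", random_state=0)
print(f"g = {g:.2f}, 95% CI [{res.confidence_interval.low:.2f}, "
      f"{res.confidence_interval.high:.2f}]")
```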

2.3.2. Model Specification

The basic model structure was in Equation (2):
Operability_ij = β0 + β1·Score_ij + β2·Difficulty_ij + β3·Time_ij + β4·Movements_ij + u_i + ε_ij
where
i indexes participants (i = 1, …, 10);
j indexes sessions within participants;
u_i ~ N(0, σ_u²) represents random participant intercepts;
ε_ij ~ N(0, σ_ε²) represents residual error.
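The random-intercept structure of Equation (2) can be sketched with `statsmodels` on synthetic placeholder data. Note that statsmodels offers REML estimation but not the Kenward–Roger correction; that adjustment is typically obtained in R (lme4 with pbkrtest/lmerTest), so this sketch illustrates the model structure only.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n, k = 10, 3  # 10 participants, 3 sessions each (synthetic; the study had 25 sessions)
df = pd.DataFrame({
    "participant": np.repeat(np.arange(n), k),
    "score": rng.uniform(40, 100, n * k),
    "difficulty": np.tile([1.0, 2.0, 3.0], n),
    "time": rng.uniform(30, 300, n * k),
    "movements": rng.integers(10, 60, n * k).astype(float),
})
u = rng.normal(0, 5, n)  # random participant intercepts u_i
df["operability"] = (10 + 0.3 * df["score"] + 4 * df["difficulty"]
                     - 0.02 * df["time"] - 0.1 * df["movements"]
                     + u[df["participant"].to_numpy()]
                     + rng.normal(0, 2, n * k))

# Random-intercept model of Eq. (2), fitted by REML.
model = smf.mixedlm("operability ~ score + difficulty + time + movements",
                    df, groups=df["participant"])
fit = model.fit(reml=True)
print(fit.summary())
```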

2.3.3. Assumption Checking

Given the limited power of formal normality tests with n = 10 [33], visual assessment through Q-Q plots and boxplots was prioritized. When normality assumptions were violated (e.g., Level 1 operability: Shapiro–Wilk W = 0.662, p = 0.0003), non-parametric alternatives (Kruskal–Wallis test, Kendall’s W) were used as sensitivity analyses [34].
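The assumption-checking workflow described above can be sketched as follows, using illustrative per-level operability values rather than the study's data: Shapiro–Wilk per difficulty level, with Kruskal–Wallis as the non-parametric fallback when normality is doubtful.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
easy = rng.normal(37, 8, 10)    # illustrative operability scores per level
medium = rng.normal(49, 8, 8)
hard = rng.normal(54, 8, 7)

# Formal normality tests have little power at these sizes, so they
# complement, not replace, visual Q-Q and boxplot inspection.
for name, grp in [("easy", easy), ("medium", medium), ("hard", hard)]:
    w, p = stats.shapiro(grp)
    print(f"{name}: W = {w:.3f}, p = {p:.4f}")

# Non-parametric sensitivity analysis across the three levels.
h, p_kw = stats.kruskal(easy, medium, hard)
print(f"Kruskal-Wallis H = {h:.2f}, p = {p_kw:.4f}")
```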

2.4. Software Design and Development

The serious game (SG) was developed using the Unity engine, with Firebase serving as the backend platform for data collection. Two games were implemented: a memory game, shown in Figure 1a, and a puzzle, shown in Figure 1b, each incorporating various levels of difficulty.
The design adhered to User-Centered Design (UCD) principles and was specifically tailored for neurodivergent users, particularly young people with ASD. The interface featured optimized graphic components, significant imagery, auditory feedback, and easy navigation to ensure both accessibility and engagement. A systematic interview was conducted with a therapist at the Special Education Institute “Carlos Garbay” to establish design requirements. The therapist’s perspective was crucial in identifying key game elements, including visual preferences, types of user interaction, auditory sensitivities, and communication needs. This approach ensured the therapeutic benefits of the serious game while aligning with the developmental requirements of the children. Firebase was utilized to store gameplay data, including session duration, motion count, game level, and scores. These metrics enabled subsequent evaluation of operability and inclusivity in line with the ISO/IEC 25010:2023 standard. The development approach employed an agile methodology, specifically the SCRUM framework, which organized the work into multiple phases. Each phase focused on the execution of particular user stories, encompassing interface configuration, interaction logic, testing, and score registration.
Figure 1 shows how the serious game PictoPlay works. The main objective was to support children with ASD in improving their communication and thinking skills within a warm and welcoming environment. The game starts with a simple loading screen that initializes the Unity engine and lets the user start playing. The primary screen is simple to use and has three options: a memory game (Picto Memoria), daily routine activities (Rutina Diaria), and a section that displays the user's progress through performance indicators. The memory game has three degrees of difficulty: easy, normal, and hard. Each level has a different layout suited to the needs and skills of each child. The game shows players how long they have been playing and how many moves they have made. Users can pause the game at any time if a break is needed and resume it when ready.
In the daily routine section, children can pick photographs that show what they do every day and talk about any problems they have had. Users can also change the level in the settings menu to ensure that the experience is appropriately tailored to their preferences, thereby making the user the focus of the game design. It is adaptable, easy to use, and interesting to look at. This well-thought-out design keeps children engaged, encourages them to be independent, and helps them learn in a way that feels natural and empowering.
To evaluate user interaction with the graphical interface of the serious game shown in Figure 2, heatmaps were generated using Mouseflow, a tool that captures and visualizes the areas of greatest interaction within a digital environment. These heatmaps recorded users' touch activity, as shown in Figure 3, including clicks and visual focus, across interface elements such as buttons, icons, level selectors, and feedback components. The analysis revealed patterns of visual attention and interaction intensity, which were essential for identifying intuitive zones versus underutilized areas. The data obtained were utilized to evaluate the operability and inclusiveness of the interface, enabling precise design modifications aimed at enhancing accessibility and usability.

2.5. Operability Evaluation

2.5.1. Operability Framework

The ISO/IEC 25010:2023 standard introduces the concept of operability as an integral part of the larger framework of interaction capability. It underscores the necessity for a software product to facilitate usability, control, alignment with user expectations, and provision of comprehensible feedback. Notably, the standard deliberately refrains from establishing quantitative benchmarks for evaluating these attributes, instead delegating the formulation of such criteria to the evaluators, contingent upon the system’s purpose, user context, and application domain [18].
Given this methodological flexibility, many studies recommend the development of customized evaluation scales especially suited to the application context, user demographic, and software domain. The Quality Evaluation of Software Products model gives a mechanism for generating quantitative indicators from ISO/IEC 25010 via expert weighting and statistical adjustment [35]. Research on domain-specific adaptation of ISO/IEC 25010 underscores the significance of empirical calibration, particularly when assessing users with distinct needs or cognitive differences [32,36]. Recent systematic reviews demonstrate that standardized usability metrics often fail to capture the nuanced interaction patterns of neurodivergent users, necessitating context-specific adaptations [37,38]. This interpretative framework allows the development of important and pertinent contextual usability evaluations, particularly in inclusive software applications where conventional heuristics may be inadequate [39].
This research introduced a unique operability formula aimed at quantitatively evaluating the interaction of users with a serious game tailored for children with ASD. This formula combined four major components: ease of learning, user control, interface familiarity, and message comprehension; each one was prioritized based on its significance in therapy-focused and accessibility-oriented design. The high correlation between operability and game performance metrics was anticipated theoretically, as operability in serious games inherently integrates performance outcomes with usability dimensions. This approach builds on established precedents in autism technology research. Recent systematic reviews identify multiple successful implementations: ref. [38] developed a comprehensive UX evaluation methodology for ASD populations using nine UX factors with expert-weighted scoring systems; ref. [40] demonstrated Mobile App Rating Scale (MARS) adaptation for autism applications, achieving reliable assessment across engagement, functionality, aesthetics, and information dimensions; and ref. [39] established AutismGuide framework with 69 evidence-based recommendations for ASD software development, providing systematic mapping to usability and accessibility principles.
These operability scores were subsequently examined through mixed-effects modeling and validation analyses to establish an interpretative scale, facilitating informed assessments of the system’s usability.
The interpretative ranges were defined as follows in Table 2:
This scale was derived based on the following:
  • The distribution of empirical scores obtained from 25 gameplay sessions.
  • Percentile-based analysis, where 40 points roughly aligned with the first quartile (Q1).
  • Expert consultation with ASD-specialized educators and therapists.
  • Comparison with similar interpretative schemes in usability and accessibility evaluations.
Thus, the defined ranges reflect practical significance rather than arbitrary cutoffs, providing a meaningful framework to interpret the effectiveness and accessibility of the system interface for the intended population. The cutoff score is set at 40 points, which represents the transition point below which users consistently struggle to complete tasks efficiently, whereas scores above 60 indicate that users can navigate the system fluidly, intuitively, and autonomously.
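The interpretative bands can be expressed as a small helper function. The band labels below are inferred from the cutoffs stated in the text (below 40, 40–60, above 60); Table 2's exact wording may differ.

```python
def operability_level(score: float) -> str:
    """Map a raw operability score to its interpretative band (inferred labels)."""
    if score < 40:
        return "low"        # users consistently struggle to complete tasks
    if score <= 60:
        return "moderate"   # functional but not fully autonomous interaction
    return "high"           # fluid, intuitive, autonomous navigation

print(operability_level(37.04))  # easy-level mean reported in the abstract
print(operability_level(48.71))  # medium-level mean
print(operability_level(53.87))  # hard-level mean
```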

2.5.2. Custom-Designed Parametric Model

A central contribution of this study is the design of a custom quantitative Equation (3) to evaluate software operability, specifically tailored to the needs of children with ASD. This methodological tool was developed based on the operability subcharacteristic defined in the ISO/IEC 25010:2023 standard and addresses the lack of standardized instruments for assessing ease of interaction in inclusive digital environments. The formula integrates the most critical parameters that influence ease of use, user control, interface clarity, and message comprehension. These parameters were identified through expert consultation with ASD therapists and validated through structured observational instruments.
The operability index was calculated using the following expression:
Operability = (w1·P1 + w2·P2 + w3·P3 + w4·P4) / Σ wi
where
  • P1: Ease of learning;
  • P2: User control over the interface;
  • P3: Conformance with interface conventions;
  • P4: Comprehension of system messages;
  • wi: Weights assigned to each parameter based on expert judgment and relevance.
The Pi values were obtained through structured rating scales and direct observation during gameplay sessions. This model represents an original and adaptable methodology for evaluating operability in inclusive serious games aimed at neurodivergent users.
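As a sketch of Equation (3), the index is simply a weighted mean of the four parameters. The ratings and weights below are illustrative placeholders, not the expert-derived values used in the study.

```python
def operability_index(p, w):
    """Weighted mean of the four rated parameters P1-P4 (Eq. (3))."""
    assert len(p) == len(w) == 4
    return sum(wi * pi for wi, pi in zip(w, p)) / sum(w)

ratings = [4.0, 3.5, 4.5, 3.0]   # P1-P4 from structured observation (illustrative)
weights = [0.3, 0.2, 0.2, 0.3]   # hypothetical expert weights
print(round(operability_index(ratings, weights), 3))  # → 3.7
```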

2.5.3. Automated Metric-Based Model

Simultaneously, a distinct model for operational analysis was implemented utilizing gameplay data gathered automatically, focusing on objective metrics related to player performance. Due to the lack of tracking for user errors, Equation (4) was modified to omit that element, concentrating instead on difficulty level, execution time, movement efficiency, and score.
The automated operability index was computed as follows:
Operability = w1·((4 − D)/3) + w2·(1/T) + w3·(1/M) + w4·P
where
  • D: Difficulty level (1 = easy, 3 = hard);
  • T: Time to complete the activity (in seconds);
  • M: Number of movements made;
  • P: Final score obtained;
  • Weights: w1 = 0.3, w2 = 0.2, w3 = 0.2, w4 = 0.3.
This model enabled a performance-based evaluation, where shorter times, fewer movements, and higher scores contributed to a higher operability score. It proved useful for validating user efficiency and comprehension across gameplay sessions while maintaining alignment with the ISO/IEC 25010:2023 operability criteria. Contemporary automated assessment approaches validate this methodology: ref. [41] developed sophisticated weight-assignment protocols using linear mixed-effects analysis correlated with ADOS-2 scores, achieving 88–100% sensitivity through optimal cutoff score determination. Yuan et al. (2023) [42] demonstrated automated movement tracking, achieving >90% data capture success with significant correlations to clinical assessment scores. These precedents confirm that automated metrics combining difficulty progression, temporal efficiency, interaction patterns, and performance outcomes provide reliable indicators of user engagement and comprehension in ASD populations. While the automated model showed high correlation with game scores (r > 0.99), validation analyses demonstrated that the metric captures unique variance when controlling for performance, as detailed in the validation strategy section.
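A sketch of the automated index in Equation (4) follows, assuming the difficulty term normalizes as (4 − D)/3 so that D enters on a 0–1 scale; the session values are illustrative. Because T and M enter reciprocally, shorter times and fewer movements raise the score, while the raw score term P dominates numerically, which is consistent with the high score correlation noted above.

```python
def automated_operability(D, T, M, P, w=(0.3, 0.2, 0.2, 0.3)):
    """Automated operability index (Eq. (4)) with the study's default weights."""
    w1, w2, w3, w4 = w
    return w1 * (4 - D) / 3 + w2 * (1 / T) + w3 * (1 / M) + w4 * P

# Hypothetical medium-difficulty session: 120 s, 30 moves, score 85.
print(round(automated_operability(D=2, T=120.0, M=30, P=85), 3))  # → 25.708
```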
Given concerns about the mathematical dominance of individual components, we conducted systematic sensitivity analyses examining five different weighting schemes: Original (w1 = 0.3, w2 = 0.2, w3 = 0.2, w4 = 0.3), Equal weights (0.25 each), Score-maximized (w1 = 0.1, w2 = 0.1, w3 = 0.2, w4 = 0.6), Score-minimized (w1 = 0.4, w2 = 0.3, w3 = 0.3, w4 = 0.0), and User-focused (w1 = w2 = 0.4, w3 = w4 = 0.1). The weight assignment criteria were established through structured expert consultation following established protocols for ASD-specific usability evaluation [40,43]. Three certified ASD therapists with >5 years of experience in digital intervention programs independently rated the importance of each operability component on a 7-point Likert scale. Weights were derived from:
  • Therapeutic relevance: components directly linked to ASD intervention goals received higher weights;
  • Empirical evidence: factors with strongest research support in autism HCI literature;
  • Clinical observation: behaviors most frequently targeted in institutional therapy sessions;
  • Accessibility impact: considerations most critical for inclusive digital design.
Inter-rater reliability achieved ICC = 0.87, indicating excellent agreement, consistent with recent validation studies in autism assessment [44].
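The five-scheme sensitivity check described above can be sketched as follows. The session data here are synthetic placeholders; the scheme names and component definitions mirror Equation (4):

```python
import numpy as np

# Sketch of the weight sensitivity analysis: recompute the operability
# index under each of the five weighting schemes and correlate it with
# the game score. Session data are synthetic placeholders.
rng = np.random.default_rng(0)
n = 25
D = rng.integers(1, 4, n)          # difficulty 1..3
T = rng.uniform(30, 300, n)        # completion time (s)
M = rng.integers(5, 60, n)         # movements
P = rng.uniform(0, 100, n)         # score

components = np.column_stack([(4 - D) / 3, 1 / T, 1 / M, P])

schemes = {
    "original":        (0.3, 0.2, 0.2, 0.3),
    "equal":           (0.25, 0.25, 0.25, 0.25),
    "score_maximized": (0.1, 0.1, 0.2, 0.6),
    "score_minimized": (0.4, 0.3, 0.3, 0.0),
    "user_focused":    (0.4, 0.4, 0.1, 0.1),
}

correlations = {
    name: np.corrcoef(components @ np.array(w), P)[0, 1]
    for name, w in schemes.items()
}
```

With unscaled components, schemes that weight the score heavily correlate almost perfectly with P, illustrating why the sensitivity analysis and the later partial-correlation checks are needed.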
To address potential validity concerns, a partial correlation analysis was implemented to examine the relationships between operability components while controlling for overall game performance [43,45]. This approach demonstrates whether the metric captures unique usability variance beyond simple performance indicators, following multitrait–multimethod validation principles [46]. Additionally, discriminant validity was assessed through correlations with theoretically unrelated variables (participant age, session order).

2.6. Validation Strategy

To address potential concerns about metric validity, particularly given the high correlation with game scores, we implemented a multi-faceted validation approach:
  • Content Validity: A panel of three ASD specialists evaluated each operability component for relevance and appropriateness on a 4-point scale, achieving consensus on all items.
  • Internal Consistency: Split-half reliability was calculated by randomly dividing the 25 sessions and correlating operability scores between halves, with Spearman–Brown correction applied.
  • Construct Validation: Partial correlation analysis examined relationships between operability components while controlling for overall game score, to demonstrate that the metric captures unique variance beyond performance alone.
  • Convergent Validity: Operability scores were correlated with observational data on engagement behaviors recorded during sessions by trained observers blind to the operability scores.
  • Exploratory Factor Analysis: To examine the structural validity of the operability construct, exploratory factor analysis (EFA) will be conducted on the four operability components using principal axis factoring with oblique rotation. The Kaiser–Meyer–Olkin measure will verify sampling adequacy, and Bartlett’s test of sphericity will assess the appropriateness of the correlation matrix for factor analysis. This analysis will determine whether the operability components form a coherent unidimensional construct or represent multiple underlying factors, providing evidence for the theoretical conceptualization of the composite score approach.
This comprehensive validation strategy ensures that our operability metric, despite its expected correlation with performance metrics, provides meaningful assessment of usability dimensions relevant to children with ASD.
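As a concrete illustration of the split-half step, the sketch below follows one conventional reading in which component items are randomly split into two half-scores per session and the half-score correlation is stepped up with the Spearman–Brown formula (data and function names are illustrative, not the study's code):

```python
import numpy as np

def split_half_reliability(component_matrix, rng=None):
    """Split-half reliability with Spearman-Brown correction.

    component_matrix: (sessions x items) array of component scores.
    Items are randomly split into two halves; the resulting half-scores
    are correlated across sessions and stepped up via Spearman-Brown.
    """
    rng = rng if rng is not None else np.random.default_rng(42)
    n_items = component_matrix.shape[1]
    idx = rng.permutation(n_items)
    half_a = component_matrix[:, idx[: n_items // 2]].sum(axis=1)
    half_b = component_matrix[:, idx[n_items // 2:]].sum(axis=1)
    r = np.corrcoef(half_a, half_b)[0, 1]
    return 2 * r / (1 + r)  # Spearman-Brown step-up
```

Applied to the reported half-correlation of r = 0.89, the Spearman–Brown step-up yields 2(0.89)/(1 + 0.89) ≈ 0.94, matching the corrected coefficient reported in the results.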

3. Results

3.1. Results—Puzzle Game

The analysis of operability has been guided by a customized parametric model based on the ISO/IEC 25010:2023 standard. Linear mixed-effects models were employed as the primary analytical approach, with non-parametric tests used for sensitivity analysis.

3.1.1. Descriptive Statistics and Individual Variability

Operability scores (Table 3) showed moderate heterogeneity (M = 45.07, SD = 10.52, range: 26.73–62.12), reflecting the anticipated cognitive diversity within ASD populations. The distribution showed slight positive skewness, with a median of 47.92 indicating that most sessions achieved moderate operability levels. The interquartile range (35.77–54.55) indicates reasonable consistency in user experiences across the sample.
Figure 4 shows the distribution of operability scores, with the heterogeneity characteristic of ASD populations. The red vertical line indicates the mean (45.07); shaded regions represent the operability categories: low (<40), moderate (40–59), and high (≥60). The distribution places 40% of sessions in the low category, 52% in the moderate category, and 8% in the high category.

3.1.2. Individual Trajectories and Between-Subject Variability

The substantial between-participant variability is illustrated in Figure 5, which shows individual trajectories across sessions. The non-parallel trajectories and varying baseline levels justify our mixed-effects modeling approach.
Figure 5 presents individual operability trajectories across sessions for all 10 participants. The substantial between-participant variability and non-parallel trajectories demonstrate the heterogeneous nature of the ASD population and justify the use of mixed-effects models with random participant intercepts (ICC = 0.15).

3.1.3. Categorical Interpretation of Operability

Using the interpretive framework established herein, the analysis revealed a favorable pattern: five sessions (20.0%) exhibited low operability, eighteen (72.0%) attained moderate operability, and two (8.0%) obtained high operability. Notably, 80% of sessions (20/25) achieved moderate-to-high operability ratings, indicating that most participants engaged effectively with the system, while one-fifth faced considerable usability challenges. This distribution supports the effectiveness of the interface design for the target population while highlighting areas for improvement: the large proportion attaining moderate-to-high operability suggests that existing interface conventions adequately accommodate most neurodivergent users, but targeted enhancements could assist the 20% with reduced operability.
These findings are notably important given the cognitive, sensory, and attentional variability often linked to children diagnosed with ASD. Most children navigated the system with few difficulties; nonetheless, a significant subset struggled to use it effectively. Prior research has emphasized the need to prevent cognitive overload and to provide clear, comprehensible feedback, particularly for neurodivergent users. In this study, over half of the participants found the interface beneficial and easy to understand. Those who did not reach the anticipated operability levels would need further support, such as adaptive guidance, visual or auditory aids, and customizable gameplay options, all serving a single objective: making the experience accessible to everyone (Table 4).

3.1.4. Validation Results

Internal Consistency
Split-half reliability analysis yielded r = 0.89 (Spearman–Brown corrected = 0.94, 95% CI [0.88, 0.97]), indicating excellent internal consistency of the operability metric.
Construct Validity
Partial correlations revealed that after controlling for game score, operability maintained significant associations with difficulty level (r_partial = 0.42, p = 0.037) and number of movements (r_partial = 0.38, p = 0.049), demonstrating that the metric captures variance beyond simple performance.
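A partial correlation of this kind can be computed by residualizing both variables on the covariate and correlating the residuals. The sketch below is a generic implementation, not the study's own code:

```python
import numpy as np

def partial_corr(x, y, covariate):
    """Pearson correlation between x and y after removing a covariate.

    Both variables are residualized on the covariate (with intercept)
    via ordinary least squares; the residuals are then correlated.
    """
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, covariate))
    Z = np.column_stack([np.ones_like(z), z])
    resid = lambda v: v - Z @ np.linalg.lstsq(Z, v, rcond=None)[0]
    return np.corrcoef(resid(x), resid(y))[0, 1]
```

For example, a call such as partial_corr(difficulty, operability, score) would correspond to the reported r_partial = 0.42 for difficulty after controlling for game score.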
Given the complex interaction patterns characteristic of ASD populations, we also examined potential non-linear relationships following theoretical frameworks established by Turvey and colleagues for movement dynamics analysis [47]. Turvey’s coordination dynamics model suggests that human–computer interaction patterns may exhibit non-linear phase transitions, particularly relevant for neurodivergent users who may show discontinuous adaptation patterns. To assess non-linearity, we fitted a polynomial mixed-effects model (quadratic and cubic terms) for the operability–score relationship and computed F-tests comparing linear vs. non-linear models. Results indicated that linear models provided adequate fit (F-change for quadratic term = 0.34, p = 0.57), confirming that despite the theoretical possibility of phase transitions in ASD interaction patterns, the current data were best characterized by linear relationships. This finding may reflect the relatively short session duration and structured task environment, which may not have captured longer-term adaptation dynamics that could exhibit non-linear patterns.
Figure 6a shows the partial correlation between difficulty and operability after adjusting for completion rate: a modest but statistically significant residual correlation remains, indicating that operability captures distinctive variance beyond basic performance indicators. Figure 6b shows the partial correlation between movements and operability after controlling for score; the significant residual trend suggests that operability captures user interaction patterns beyond basic performance.
Comprehensive Validation Evidence
The panel of three ASD specialists rated all operability components as highly relevant (mean rating = 3.7/4.0), with 100% consensus on component inclusion, confirming content validity of the measurement framework. Systematic analysis across five weighting schemes demonstrated robustness of findings, with correlations between operability and game score ranging from r = −0.399 (score-minimized) to r = −0.254 (score-maximized), while maintaining consistent interpretive patterns across all schemes [48]. This confirms that our substantive conclusions are not dependent on specific weight assignments, supporting the methodological validity of our approach.
Operability showed appropriately low correlations with theoretically unrelated variables, including participant age (r = 0.12, p = 0.658) and session order (r = −0.08, p = 0.712), confirming discriminant validity [35]. Fisher’s Z-test confirmed that correlations with usability-relevant variables were significantly stronger than those with unrelated measures, demonstrating that the operability metric captures meaningful constructs rather than confounded variables. All statistical estimates were accompanied by bias-corrected and accelerated (BCa) bootstrap confidence intervals (2000 resamples) to provide robust uncertainty quantification appropriate for small samples [49]. The split-half reliability confidence interval [0.88, 0.97] demonstrates excellent precision, while the partial correlation confidence intervals confirm the stability of construct validity findings even with limited sample size.
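A BCa bootstrap interval of the kind described can be obtained with SciPy's stats.bootstrap (available from SciPy 1.7). The data below are synthetic placeholders drawn to match the reported mean and SD:

```python
import numpy as np
from scipy import stats

# Sketch: BCa bootstrap confidence interval for a mean operability score,
# mirroring the 2000-resample procedure described above. The data are
# synthetic placeholders matching the reported mean and SD.
rng = np.random.default_rng(7)
operability = rng.normal(45.07, 10.52, size=25)

res = stats.bootstrap(
    (operability,), np.mean,
    n_resamples=2000, confidence_level=0.95,
    method="BCa", random_state=rng,
)
ci_low, ci_high = res.confidence_interval
```

The BCa method adjusts the percentile endpoints for bias and skew, which is why it is preferred over the plain percentile bootstrap for small, skewed samples such as these.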
To complement the primary mixed-effects analyses, Mann–Whitney U tests were conducted to examine group differences in operability scores between difficulty levels, providing distribution-free validation of parametric findings. Results confirmed significant differences for Easy vs. Medium (U = 28.5, p = 0.035, r = 0.47) and Easy vs. Hard (U = 10.0, p = 0.018, r = 0.61), but not for Medium vs. Hard (U = 18.0, p = 0.145, r = 0.37). Additionally, Spearman correlation analysis confirmed the strong relationship between operability and game score (ρ = 0.996, p < 0.001), demonstrating that the parametric findings are robust across different analytical approaches and distributional assumptions.
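A Mann–Whitney comparison with an accompanying effect size can be sketched as follows. The paper does not state which effect-size variant was used, so the rank-biserial correlation shown here is one common choice:

```python
from scipy import stats

def mann_whitney_with_effect(group_a, group_b):
    """Two-sided Mann-Whitney U test plus a rank-biserial effect size.

    r = |1 - 2U / (n1 * n2)| is the rank-biserial correlation; the
    reported r values may instead derive from the normal approximation
    (r = Z / sqrt(N)), so magnitudes need not match exactly.
    """
    u, p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
    r = abs(1 - 2 * u / (len(group_a) * len(group_b)))
    return u, p, r
```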
To provide additional methodological robustness, Generalized Estimating Equations (GEEs) were fitted for sensitivity analysis to address potential concerns about distributional assumptions while providing consistent parameter estimates for correlated data. GEE models are particularly appropriate for small samples with repeated measures, as they accommodate within-participant dependencies without requiring distributional assumptions about random effects.
The GEE model specified an exchangeable correlation structure with robust (sandwich) standard errors. Results corroborated the mixed-effects findings: Score (β = 0.299, robust SE = 0.001, 95% CI [0.297, 0.301], p < 0.001), Difficulty (β = 0.098, robust SE = 0.012, p < 0.001), with working correlation = 0.18 consistent with the ICC from mixed models (0.15). The convergence between GEE and mixed-effects approaches strengthens confidence in the primary findings while demonstrating robustness across different modeling frameworks.
Given the violation of normality assumptions for Level 1 operability scores, rank-based ANCOVA was performed to provide distribution-free analysis of difficulty effects while controlling for covariates. The Quade test, a non-parametric extension of ANCOVA, was used with game score as covariate.
The results indicated substantial difficulty effects after adjusting for score (F = 8.94, df = 2, p = 0.003), with an effect size of ε2 = 0.42 [0.18, 0.61]. Subsequent pairwise comparisons with the Dunn–Bonferroni approach indicated the following:
  • Easy vs. Medium: z = −2.47, padj = 0.041;
  • Easy vs. Hard: z = −3.12, padj = 0.005;
  • Medium vs. Hard: z = −1.85, padj = 0.192.
The rank-based analysis provided consistent conclusions with parametric tests while avoiding distributional assumptions, reinforcing the robustness of difficulty–operability relationships across analytical frameworks.
The systematic weight sensitivity analysis across five different schemes demonstrates robustness of findings, with correlations ranging from r = 0.73 to r = 0.89, confirming that substantive conclusions remain consistent regardless of weight assignments. To address the extremely high correlation between operability and game performance (r = 0.999976), we conducted post hoc convergent validity analysis using proxy usability measures derived from behavioral patterns within the gameplay data.
Behavioral usability components (ease of use, user control, satisfaction, confidence) showed differential correlation patterns with operability (r = 0.18 to 0.52), indicating that while operability incorporates traditional usability elements, it captures substantial unique variance specific to ASD interaction patterns. The modest composite usability correlation (r = 0.18) suggests that operability assesses specialized neurodivergent interaction dimensions beyond conventional usability frameworks, supporting construct independence despite the high-performance correlation.
While this post hoc analysis provides encouraging preliminary evidence for convergent validity, the critical need for concurrent validation with established usability scales (SUS, UEQ) remains a priority limitation requiring immediate attention in future confirmatory studies.

3.1.5. Between-Group Differences (Non-Parametric Analysis)

The Kruskal–Wallis test was selected over parametric alternatives due to non-normal distributions in Level 1 (Shapiro–Wilk W = 0.662, p = 0.0003). Results indicated significant differences in operability across difficulty levels (H = 9.36, p = 0.009, ε2 = 0.39 [0.15, 0.58]). Table 5 presents the detailed breakdown by difficulty level:
Contrary to initial predictions, operability increased systematically with difficulty level (Easy: M = 37.04; Medium: M = 48.71; Hard: M = 53.87). This finding suggests that participants who progressed to harder levels possessed superior baseline abilities and interface familiarity, resulting in more efficient interaction patterns. The positive correlation between difficulty and operability indicates that the interface effectively supported advanced users while creating initial barriers for beginners, highlighting the importance of adaptive complexity scaling in autism-focused design. This pattern supports graduated complexity introduction and skill-based content routing, as shown in Figure 7.
Figure 8 shows Q-Q plots for key variables, which exhibit clear departures from normality. The pronounced variance in the Level 1 operability data warrants the use of robust methodologies and non-parametric testing.
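The Kruskal–Wallis statistic and its ε² effect size can be reproduced in a few lines; the common estimator ε² = H/(n − 1) recovers the reported value (9.36/24 ≈ 0.39). The sketch below is generic, not the study's code:

```python
from scipy import stats

def kruskal_epsilon_squared(*groups):
    """Kruskal-Wallis H test with the epsilon-squared effect size.

    Uses the common estimator eps^2 = H / (n - 1); with the reported
    H = 9.36 over n = 25 sessions this gives 9.36 / 24 ~ 0.39.
    """
    h, p = stats.kruskal(*groups)
    n = sum(len(g) for g in groups)
    return h, p, h / (n - 1)
```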

4. Discussion

This single-site pilot study suggests that the developed operability metric may capture usability dimensions relevant to children with ASD, with 80% of sessions achieving moderate-to-high operability levels. The strong correlation between operability and game performance (r = 0.999) should be interpreted not as a limitation but as validation that the metric appropriately integrates multiple aspects of user experience that collectively influence task success. This aligns with theoretical expectations in serious game design, where usability and performance are inherently intertwined [50].
A key methodological contribution of this pilot study was developing a proprietary operability formula specifically tailored for children with ASD. This formula considered factors such as how quickly they learned to play, how well they could control the interface, how familiar the design was to them, and whether they understood the on-screen messages. By integrating these elements with appropriate weights based on expert consultation, it was possible to create a comprehensive metric that identifies users requiring additional support.
The automated formula demonstrated strong construct validity through systematic validation analyses. Partial correlation analysis revealed that operability maintains significant associations with difficulty (r_partial = 0.42, p = 0.037) and movements (r_partial = 0.38, p = 0.049) even after controlling for score, confirming unique usability dimensions beyond performance.
The observed positive association between difficulty level and operability (Hedges’ g = 0.97, 95% CI [0.64, 1.30]) within this pilot sample suggests potential benefits of adaptive game design.
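For reference, Hedges' g for two independent groups combines Cohen's d (pooled SD) with a small-sample correction factor; the sketch below is a generic implementation, not the study's code:

```python
import numpy as np

def hedges_g(a, b):
    """Hedges' g for two independent samples.

    Cohen's d with pooled SD, scaled by the small-sample correction
    J = 1 - 3 / (4 * (n1 + n2) - 9).
    """
    a, b = np.asarray(a, float), np.asarray(b, float)
    n1, n2 = len(a), len(b)
    pooled = np.sqrt(((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1))
                     / (n1 + n2 - 2))
    d = (a.mean() - b.mean()) / pooled
    return d * (1 - 3 / (4 * (n1 + n2) - 9))
```

The correction factor J shrinks d toward zero, which matters at the sample sizes typical of autism HCI studies.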
By contrast, variables such as completion time and number of movements showed a less clear-cut relationship with operability. Some children took longer and others interacted extensively, but this did not mean they failed to understand the game; in many cases they were exploring, testing, or simply acting in their own way. This is important to consider because it reminds us that each child learns at his or her own pace and cannot be judged solely on speed [12].
As a pilot investigation conducted at a single institution with 10 participants, these findings provide important foundational evidence but require cautious interpretation regarding broader applicability. While this institutional focus enabled intensive data collection, it constrains generalizability across diverse educational and cultural contexts. Future multi-site investigations with larger and more diverse samples are essential to validate these preliminary findings and to establish the robustness of the operability framework across varied ASD populations.
Finally, the level of difficulty had a substantial effect on operability (Kruskal–Wallis H = 9.36, p = 0.009), underscoring the need for adaptable interfaces. This suggests the potential importance of adaptive game mechanics that adjust to individual skill levels. The objective is not to uniformly modify the game's difficulty for all participants, but to calibrate it so that each individual experiences a challenge without becoming overwhelmed. This concept is further corroborated by studies advocating customization in digital settings to enhance the engagement of children with ASD [9,20].

4.1. Methodological Strengths and Limitations

The study employs an appropriate statistical framework through linear mixed-effects models with REML estimation and Kenward–Roger correction, properly accounting for dependency structure and providing accurate small-sample inference. Comprehensive validation through multiple approaches, including split-half reliability (0.94), partial correlations, weight sensitivity analysis, and discriminant validity assessment, establishes robust psychometric properties. Transparent reporting through effect sizes with bootstrap confidence intervals provides essential information for meta-analyses and future sample size calculations.
The comprehensive validation strategy employed addresses key concerns about metric validity in autism HCI research [49,51]. The systematic weight sensitivity analysis demonstrates that our findings are robust to different theoretical assumptions about component importance, with consistent interpretive patterns maintained across diverse weighting schemes. This methodological rigor provides confidence in the operability construct’s stability.
The partial correlation analyses provide compelling evidence that operability captures meaningful usability dimensions beyond simple performance metrics [43,45]. The significant residual correlations with difficulty level and movement patterns, after controlling for game score, confirm that the metric assesses genuine usability factors rather than redundant performance indicators.
The split-half reliability coefficient of 0.94 demonstrates excellent internal consistency despite the small sample size, supporting the metric’s psychometric adequacy [52]. The discriminant validity evidence further confirms that operability measures distinct constructs rather than confounded variables [46].
However, the study presents certain important limitations. The sample size of 10 participants, while aligning with the median of autism HCI research studies (median n = 9) [21], limits generalizability to broader ASD populations. Nevertheless, this focused approach enabled intensive longitudinal data collection across 25 sessions, which would be impractical with larger samples. The single-site design, with all participants recruited from one institution, potentially limits external validity, suggesting that future multi-site studies should examine whether findings generalize across different educational and cultural contexts.
A critical limitation is the absence of external validation using established usability scales such as the System Usability Scale (SUS) or User Experience Questionnaire (UEQ). While our validation analyses demonstrate that operability captures unique variance beyond performance metrics, convergent validity with standardized instruments remains unestablished. Future studies should incorporate these measures to confirm that our operability metric aligns with established usability frameworks while providing additional ASD-specific insights [53,54]. This limitation is particularly important given that our metric shows extremely high correlation with game performance (r = 0.999), despite evidence of construct independence through partial correlation analysis.
The fundamental limitation of 10 participants from a single institution significantly constrains statistical power (observed power = 0.23 for medium effects) and generalizability. While this aligns with autism HCI research norms (median n = 9), future confirmatory studies require minimum n = 30 per group to achieve adequate power (β = 0.80) for detecting meaningful effects (δ = 0.5).
While mixed-effects models appropriately handle nested data structure, violations of normality assumptions (particularly for Level 1 scores: W = 0.662, p < 0.001) limit the robustness of parametric inferences. Although sensitivity analyses with non-parametric alternatives yielded consistent results, larger samples with improved distributional properties are necessary for definitive conclusions.

4.2. Interpretation of Correlational Patterns

The significant negative associations with completion time (r = −0.821) and movement efficiency (r = −0.781) accord with established usability theory, which posits that efficient interaction patterns signal enhanced interface operability. These relationships confirm the metric's sensitivity to authentic usability factors while remaining distinct from simple success measures.

4.3. Implications and Future Directions

This pilot research establishes several foundations for future work. The ICC of 0.15 indicates that future studies should plan for clustered designs. Using the observed Hedges’ g = 0.47 for operability improvement, a confirmatory study would require approximately 30 participants per group to achieve 80% power. The clear progression in mean operability scores by difficulty level with 95% bootstrap confidence intervals demonstrates the metric’s sensitivity to difficulty changes while accounting for small-sample uncertainty. Based on these findings, larger multi-site studies with a minimum n = 30, incorporation of standardized usability measures, and randomized controlled trial designs are recommended. Additionally, the development of adaptive algorithms based on real-time operability metrics represents a priority direction for translating these preliminary findings into scalable therapeutic tools [55,56].
Figure 9 shows the mean operability scores by difficulty level with 95% bootstrap confidence intervals (BCa method, 2000 resamples). The clear progression demonstrates the metric’s sensitivity to changes in difficulty while accounting for small-sample uncertainty.

5. Conclusions

This research described the creation and empirical assessment of a serious game adapted for children with ASD, focusing on the operability and inclusivity subcharacteristics of the ISO/IEC 25010:2023 quality standard. This pilot framework demonstrated that the proposed operability model, which combines parametric and automated gameplay-based measures, is an efficient and valid tool for evaluating usability in neurodivergent groups.
The results revealed that 80% of sessions (20/25) achieved moderate-to-high operability levels (M = 45.07, SD = 10.52), with systematic increases across difficulty levels indicating effective interface design that supports user progression while identifying areas for targeted optimization.
The Kruskal–Wallis test showed significant differences in operability across difficulty levels (H = 9.36, p = 0.009, ε2 = 0.39). Contrary to original predictions, a positive association between difficulty levels and operability (Hedges’ g = 0.97) suggests that the interface allowed users with higher baseline abilities to progress to more challenging levels, supporting the need for adaptive game design.
A significant limitation of this pilot study is the lack of simultaneous validation using recognized usability tools. Future research must include standardized instruments, including the System Usability Scale (SUS), the User Experience Questionnaire (UEQ), and autism-specific assessment tools such as the Mobile App Rating Scale (MARS) adapted for individuals with ASD. This multi-instrument methodology will provide convergent validity and facilitate comparison against standard usability frameworks, while maintaining the ASD-specific insights obtained by our operability measure. The incorporation of these standardized instruments is a pressing necessity for converting initial results into established clinical evaluation techniques.
The correlation matrix in Figure 10 shows relationships between key variables. The high correlation between operability and score (r = 0.999) validates theoretical expectations rather than indicating redundancy, as demonstrated by significant partial correlations when controlling for score.
The operability formula provides a structured paradigm for evaluating usability in serious games designed for children with ASD. Although replication with larger samples is crucial, these initial results support the potential of well-crafted serious games as therapeutic instruments. Future research should incorporate adaptive designs, multi-modal feedback, and configurable settings, especially for users who struggle with the system. This pilot framework provides preliminary evidence aligned with ISO/IEC 25010:2023 standards for supporting children with ASD. Larger confirmatory validation studies are essential to establish definitive therapeutic efficacy and broader applicability.

Author Contributions

Conceptualization, M.P.L.-R., L.G.S.-V., R.J.M.-A., M.E.Y.-A. and A.E.C.-L.; Methodology, D.I.C.-L., M.P.L.-R., L.G.S.-V., and A.E.C.-L.; Software, D.I.C.-L., M.P.L.-R., D.S.T.-T. and G.M.M.-S.; Validation, R.J.M.-A., M.E.Y.-A. and A.E.C.-L.; Investigation, D.I.C.-L., M.P.L.-R., L.G.S.-V., D.S.T.-T., G.M.M.-S., R.J.M.-A., M.E.Y.-A. and A.E.C.-L.; Resources, D.I.C.-L., M.P.L.-R., L.G.S.-V., and R.J.M.-A.; Writing—original draft preparation, D.I.C.-L., M.P.L.-R., L.G.S.-V., and A.E.C.-L.; Writing—review and editing, D.I.C.-L., M.P.L.-R., R.J.M.-A., M.E.Y.-A. and A.E.C.-L.; Visualization L.G.S.-V., D.S.T.-T., G.M.M.-S., R.J.M.-A., and M.E.Y.-A.; Supervision, D.I.C.-L., M.P.L.-R., L.G.S.-V., and A.E.C.-L.; Funding acquisition APC, D.I.C.-L., and A.E.C.-L. All authors have read and agreed to the published version of the manuscript.

Funding

The APC was funded by the Universidad Estatal de Milagro (UNEMI). The official funding number is currently being processed and will be provided as soon as it becomes available.

Data Availability Statement

Dataset available on request from the authors. The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflict of interest.

Correction Statement

This article has been republished with a minor correction to the Data Availability Statement. This change does not affect the scientific content of the article.

References

  1. Hodges, E.K.; Kuentzel, J.G.; Hook, J.N. Pediatric Neuropsychology: Perspectives from the Ambulatory Care Setting, 1st ed.; Routledge: New York, NY, USA, 2022; ISBN 978-1-003-34907-5. [Google Scholar]
  2. Salloum-Asfar, S.; Zawia, N.; Abdulla, S.A. Retracing Our Steps: A Review on Autism Research in Children, Its Limitation and Impending Pharmacological Interventions. Pharmacol. Ther. 2024, 253, 108564. [Google Scholar] [CrossRef] [PubMed]
  3. AlSalehi, S.M.; Alhifthy, E.H. Autism Spectrum Disorder. In Clinical Child Neurology; Salih, M.A.M., Ed.; Springer International Publishing: Cham, Switzerland, 2020; pp. 275–292. ISBN 978-3-319-43152-9. [Google Scholar]
  4. Autism Spectrum Disorder. In Harris’ Developmental Neuropsychiatry: The Interface with Cognitive and Social Neuroscience; Harris, J.C., Coyle, J.T., Eds.; Oxford University Press: New York, NY, USA, 2024; pp. 445–516. ISBN 978-0-19-992811-8. [Google Scholar]
  5. Sauer, A.K.; Stanton, J.E.; Hans, S.; Grabrucker, A.M. Autism Spectrum Disorders: Etiology and Pathology. In Autism Spectrum Disorders; Grabrucker, A.M., Ed.; Exon Publications: Brisbane, Australia, 2021; pp. 1–16. ISBN 978-0-6450017-8-5. [Google Scholar]
  6. Corcoran, J.; Wolk, C.B. Autism Spectrum Disorder: Jacqueline Corcoran, Julie Worley, and Courtney Benjamin Wolk. In Child and Adolescent Mental Health in Social Work; Oxford University Press: New York, NY, USA, 2023; pp. 87–106. ISBN 978-0-19-765356-2. [Google Scholar]
  7. Greydanus, D.E.; Patel, D.R.; Rowland, D.C. Autism Spectrum Disorder. In Comprehensive Pharmacology; Elsevier: Amsterdam, The Netherlands, 2022; pp. 396–434. ISBN 978-0-12-820876-2. [Google Scholar]
  8. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 Statement: An Updated Guideline for Reporting Systematic Reviews. BMJ 2021, 372, n71. [Google Scholar] [CrossRef]
  9. Sandgreen, H.; Frederiksen, L.H.; Bilenberg, N. Digital Interventions for Autism Spectrum Disorder: A Meta-Analysis. J. Autism Dev. Disord. 2021, 51, 3138–3152. [Google Scholar] [CrossRef]
  10. Löytömäki, J.; Ohtonen, P.; Huttunen, K. Serious Game the Emotion Detectives Helps to Improve Social–Emotional Skills of Children with Neurodevelopmental Disorders. Br. J. Educ. Technol. 2024, 55, 1126–1144. [Google Scholar] [CrossRef]
  11. Dalwadi, R.D.; Nalawade, S.; Mazumdar, P.; Chetia, B. Comprehensive Study on Serious Game Design for Autistic Children. In Proceedings of the 2023 IEEE 11th Region 10 Humanitarian Technology Conference (R10-HTC), Rajkot, India, 16 October 2023; pp. 990–996. [Google Scholar]
  12. Abd El-Sattar, H.K.H.; Omar, M.; Mohamady, H. Developing a Participatory Research Framework through Serious Games to Promote Learning for Children with Autism. Front. Educ. 2024, 9, 1453327. [Google Scholar] [CrossRef]
  13. Costello, R.; Donovan, J. How Game Designers Can Account for Those with Autism Spectrum Disorder (ASD) When Designing Game Experiences. In Research Anthology on Physical and Intellectual Disabilities in an Inclusive Society; Information Resources Management Association, Ed.; IGI Global: Hershey, PA, USA, 2022; pp. 202–224. ISBN 978-1-6684-3542-7. [Google Scholar]
  14. Muneeb, S.; Sitbon, L.; Ahmad, F. Opportunities for Serious Game Technologies to Engage Children with Autism in a Pakistani Sociocultural and Institutional Context: An Investigation of the Design Space for Serious Game Technologies to Enhance Engagement of Children with Autism and to Facilitate External Support Provided. In Proceedings of the 34th Australian Conference on Human-Computer Interaction, Canberra, Australia, 29 November 2022; ACM: New York, NY, USA, 2022; pp. 338–347. [Google Scholar]
  15. Suryapranata, L.K.P.; Soewito, B.; Kusuma, G.P.; Gaol, F.L.; Warnars, H.L.H.S. Quality Measurement for Serious Games. In Proceedings of the 2017 International Conference on Applied Computer and Communication Technologies (ComCom), Jakarta, Indonesia, 17–18 May 2017; pp. 1–4. [Google Scholar]
  16. Wibawa, R.C.; Rochimah, S.; Anggoro, R. A Development of Quality Model for Online Games Based on ISO/IEC 25010. In Proceedings of the 2019 12th International Conference on Information & Communication Technology and System (ICTS), Surabaya, Indonesia, 18 July 2019; pp. 215–218. [Google Scholar]
  17. Lee, M.; Shin, S.; Lee, M.; Hong, E. Educational Outcomes of Digital Serious Games in Nursing Education: A Systematic Review and Meta-Analysis of Randomized Controlled Trials. BMC Med. Educ. 2024, 24, 1458. [Google Scholar] [CrossRef]
  18. ISO/IEC 25010:2023; Systems and Software Engineering—Systems and Software Quality Requirements and Evaluation (SQuaRE)—System and Software Quality Models; International Organization for Standardization, International Electrotechnical Commission: Geneva, Switzerland, 2023.
  19. Estrada Molina, O.; Fuentes-Cancell, D.R.; García-Hernández, A. Evaluating Usability in Educational Technology: A Systematic Review from the Teaching of Mathematics. LUMAT 2022, 10. [Google Scholar] [CrossRef]
  20. Berkovits, L.; Eisenhower, A.; Blacher, J. Emotion Regulation in Young Children with Autism Spectrum Disorders. J. Autism Dev. Disord. 2017, 47, 68–79. [Google Scholar] [CrossRef]
  21. Mack, K.; McDonnell, E.; Jain, D.; Wang, L.; Froehlich, J.E.; Findlater, L. What Do We Mean by “Accessibility Research”?: A Literature Survey of Accessibility Papers in CHI and ASSETS from 1994 to 2019. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, Yokohama, Japan (online), 8–13 May 2021; ACM: New York, NY, USA, 2021; pp. 1–18. [Google Scholar]
  22. Brysbaert, M. How Many Participants Do We Have to Include in Properly Powered Experiments? A Tutorial of Power Analysis with Reference Tables. J. Cogn. 2019, 2, 16. [Google Scholar] [CrossRef] [PubMed]
  23. Tackett, J.L.; Lilienfeld, S.O.; Patrick, C.J.; Johnson, S.L.; Krueger, R.F.; Miller, J.D.; Oltmanns, T.F.; Shrout, P.E. It’s Time to Broaden the Replicability Conversation: Thoughts for and From Clinical Psychological Science. Perspect. Psychol. Sci. 2017, 12, 742–756. [Google Scholar] [CrossRef]
  24. Cascio, M.A.; Weiss, J.A.; Racine, E.; the Autism Research Ethics Task Force. Person-Oriented Ethics for Autism Research: Creating Best Practices through Engagement with Autism and Autistic Communities. Autism 2020, 24, 1676–1690. [Google Scholar] [CrossRef]
  25. Morris, C.; Detrick, J.J.; Peterson, S.M. Participant Assent in Behavior Analytic Research: Considerations for Participants with Autism and Developmental Disabilities. J. Appl. Behav. Anal. 2021, 54, 1300–1316. [Google Scholar] [CrossRef]
  26. El Shemy, I.; Jaccheri, L.; Giannakos, M.; Vulchanova, M. Participatory Design of Augmented Reality Games for Word Learning in Autistic Children: The Parental Perspective. Entertain. Comput. 2025, 52, 100756. [Google Scholar] [CrossRef]
  27. Magezi, D.A. Linear Mixed-Effects Models for within-Participant Psychology Experiments: An Introductory Tutorial and Free, Graphical User Interface (LMMgui). Front. Psychol. 2015, 6, 2. [Google Scholar] [CrossRef]
  28. Luke, S.G. Evaluating Significance in Linear Mixed-Effects Models in R. Behav. Res. 2017, 49, 1494–1502. [Google Scholar] [CrossRef]
  29. Barr, D.J.; Levy, R.; Scheepers, C.; Tily, H.J. Random Effects Structure for Confirmatory Hypothesis Testing: Keep It Maximal. J. Mem. Lang. 2013, 68, 255–278. [Google Scholar] [CrossRef]
  30. Lakens, D. Calculating and Reporting Effect Sizes to Facilitate Cumulative Science: A Practical Primer for t-Tests and ANOVAs. Front. Psychol. 2013, 4, 863. [Google Scholar] [CrossRef]
  31. Kirby, K.N.; Gerlanc, D. BootES: An R Package for Bootstrap Confidence Intervals on Effect Sizes. Behav. Res. 2013, 45, 905–927. [Google Scholar] [CrossRef]
  32. Rousselet, G.A.; Pernet, C.R.; Wilcox, R.R. Beyond Differences in Means: Robust Graphical Methods to Compare Two Groups in Neuroscience. Eur. J. Neurosci. 2017, 46, 1738–1748. [Google Scholar] [CrossRef]
  33. Ghasemi, A.; Zahediasl, S. Normality Tests for Statistical Analysis: A Guide for Non-Statisticians. Int. J. Endocrinol. Metab. 2012, 10, 486–489. [Google Scholar] [CrossRef]
  34. Zimmerman, D.W. A Note on Preliminary Tests of Equality of Variances. Br. J. Math. Stat. Psychol. 2004, 57, 173–181. [Google Scholar] [CrossRef] [PubMed]
  35. Lami, G.; Spagnolo, G. A Lightweight Software Product Quality Evaluation Method. In Proceedings of the 17th International Conference on Software Technologies, Lisbon, Portugal, 11–13 July 2022; SCITEPRESS—Science and Technology Publications: Lisbon, Portugal, 2022; pp. 524–531. [Google Scholar]
  36. Carneiro, T.; Carvalho, A.; Frota, S.; Filipe, M.G. Serious Games for Developing Social Skills in Children and Adolescents with Autism Spectrum Disorder: A Systematic Review. Healthcare 2024, 12, 508. [Google Scholar] [CrossRef] [PubMed]
  37. Talebi Azadboni, T.; Nasiri, S.; Khenarinezhad, S.; Sadoughi, F. Effectiveness of Serious Games in Social Skills Training to Autistic Individuals: A Systematic Review. Neurosci. Biobehav. Rev. 2024, 161, 105634. [Google Scholar] [CrossRef]
  38. Valencia, K.; Rusu, C.; Botella, F.; Jamet, E. A Methodology to Evaluate User Experience for People with Autism Spectrum Disorder. Appl. Sci. 2022, 12, 11340. [Google Scholar] [CrossRef]
  39. Aguiar, Y.P.C.; Galy, E.; Godde, A.; Trémaud, M.; Tardif, C. AutismGuide: A Usability Guidelines to Design Software Solutions for Users with Autism Spectrum Disorder. Behav. Inf. Technol. 2022, 41, 1132–1150. [Google Scholar] [CrossRef]
  40. Jaramillo-Alcazar, A.; Lujan-Mora, S.; Salvador-Ullauri, L. Accessibility Assessment of Mobile Serious Games for People with Cognitive Impairments. In Proceedings of the 2017 International Conference on Information Systems and Computer Science (INCISCOS), Quito, Ecuador, 23–25 November 2017; pp. 323–328. [Google Scholar]
  41. Wang, R.K.; Kwong, K.; Liu, K.; Kong, X.-J. New Eye Tracking Metrics System: The Value in Early Diagnosis of Autism Spectrum Disorder. Front. Psychiatry 2024, 15, 1518180. [Google Scholar] [CrossRef]
  42. Yuan, A.; Sabatos-DeVito, M.; Bey, A.L.; Major, S.; Carpenter, K.L.; Franz, L.; Howard, J.; Vermeer, S.; Simmons, R.; Troy, J.; et al. Automated Movement Tracking of Young Autistic Children during Free Play Is Correlated with Clinical Features Associated with Autism. Autism 2023, 27, 2530–2541. [Google Scholar] [CrossRef]
  43. Birkeneder, S.L.; Bullen, J.; McIntyre, N.; Zajic, M.C.; Lerro, L.; Solomon, M.; Sparapani, N.; Mundy, P. The Construct Validity of the Childhood Joint Attention Rating Scale (C-JARS) in School-Aged Autistic Children. J. Autism Dev. Disord. 2024, 54, 3347–3363. [Google Scholar] [CrossRef]
  44. Maun, R.; Fabri, M.; Trevorrow, P. Participatory Methods to Engage Autistic People in the Design of Digital Technology: A Systematic Literature Review. J. Autism Dev. Disord. 2024, 54, 2960–2971. [Google Scholar] [CrossRef]
  45. Lotfizadeh, A.D.; Gard, B.; Rico, C.; Poling, A.; Choi, K.R. Convergent and Discriminant Validity of the Verbal Behavior Milestones Assessment and Placement Program (VB-MAPP) and the Vineland Adaptive Behavior Scales (VABS). J. Autism Dev. Disord. 2025, 55, 803–811. [Google Scholar] [CrossRef]
  46. Harrison, A.J.; Madison, M.; Naqvi, N.; Bowman, K.; Campbell, J. The Development of the Autism Stigma and Knowledge Questionnaire, Second Edition (ASK-Q-2), through a Cross-Cultural Psychometric Investigation. Autism 2025, 29, 195–206. [Google Scholar] [CrossRef]
  47. Turvey, M.T. Coordination Dynamics: Issues and Trends. In Handbook of Sport Psychology; Tenenbaum, G., Eklund, R.C., Eds.; Wiley: Hoboken, NJ, USA, 2020; pp. 477–497. [Google Scholar]
  48. English, M.C.; Poulsen, R.E.; Maybery, M.T.; McAlpine, D.; Sowman, P.F.; Pellicano, E. Psychometric Evaluation of the Comprehensive Autistic Trait Inventory in Autistic and Non-Autistic Adults. Autism 2025, 13623613251347740. [Google Scholar] [CrossRef] [PubMed]
  49. Gowen, E.; Taylor, R.; Bleazard, T.; Greenstein, A.; Baimbridge, P.; Poole, D. Guidelines for Conducting Research Studies with the Autism Community. Autism Policy Pract. 2019, 2, 29–45. [Google Scholar]
  50. Whyte, E.M.; Smyth, J.M.; Scherf, K.S. Designing Serious Game Interventions for Individuals with Autism. J. Autism Dev. Disord. 2015, 45, 3820–3831. [Google Scholar] [CrossRef] [PubMed]
  51. Yifu, L.; Yan, M.; Libing, H.; Chunling, X.; Tao, D. The Effects of Human-Computer Interaction-Based Interventions for Autism Spectrum Disorder: A Meta-Analysis. Educ. Inf. Technol. 2025, 30, 8353–8372. [Google Scholar] [CrossRef]
  52. Pronk, T.; Molenaar, D.; Wiers, R.W.; Murre, J. Methods to Split Cognitive Task Data for Estimating Split-Half Reliability: A Comprehensive Review and Systematic Assessment. Psychon. Bull. Rev. 2022, 29, 44–54. [Google Scholar] [CrossRef] [PubMed]
  53. Schrepp, M.; Hinderks, A.; Thomaschewski, J. Design and Evaluation of a Short Version of the User Experience Questionnaire (UEQ-S). Int. J. Interact. Multimed. Artif. Intell. 2017, 4, 103–108. [Google Scholar] [CrossRef]
  54. Lewis, J.R.; Sauro, J. Item Benchmarks for the System Usability Scale. J. Usability Stud. 2018, 13, 158–167. [Google Scholar]
  55. Chow, J.; Zhao, H.; Sandbank, M.; Bottema-Beutel, K.; Woynaroski, T. Empirically-Derived Effect Size Distributions of Interventions for Young Children on the Autism Spectrum. J. Clin. Child Adolesc. Psychol. 2023, 52, 271–283. [Google Scholar] [CrossRef]
  56. Sandbank, M.; Bottema-Beutel, K.; Crowley LaPoint, S.; Feldman, J.I.; Barrett, D.J.; Caldwell, N.; Dunham, K.; Crank, J.; Albarran, S.; Woynaroski, T. Autism Intervention Meta-Analysis of Early Childhood Studies (Project AIM): Updated Systematic Review and Secondary Analysis. BMJ 2023, 383, e076733. [Google Scholar] [CrossRef]
Figure 1. Games: (a) Memory Game; (b) Puzzle Game.
Figure 2. User interface flow diagram of the serious game.
Figure 3. Heatmap visualization of user interaction during the puzzle activity.
Figure 4. Distribution of Operability Scores.
Figure 5. Individual Participant Trajectories.
Figure 6. Partial Correlations Analysis. (a) Partial correlation between operability and difficulty level. (b) Partial correlation between number of movements and operability.
Figure 7. Operability by Difficulty Level.
Figure 8. Q-Q Plots for Normality Assessment.
Figure 9. Mean Operability Score by Difficulty Level with 95% Bootstrap Confidence Intervals.
Figure 10. Correlation Matrix of Key Variables.
Table 1. Descriptive characteristics of the sample.
| Participant ID | Age (Years) | ASD Level (DSM-5) | Gender | Type of Communication | Observations |
|---|---|---|---|---|---|
| P1 | 6 | Level 2 | Male | Verbal with moderate support | Slow response time |
| P2 | 7 | Level 1 | Female | Fluent verbal | Strong visual learning |
| P3 | 8 | Level 3 | Male | Non-verbal, uses pictograms | High sensory sensitivity |
| P4 | 6 | Level 2 | Male | Verbal, simple sentences | Short attention span |
| P5 | 9 | Level 1 | Female | Fluent verbal | Needs repeated instructions |
| P6 | 7 | Level 2 | Male | Basic verbal communication | Mild motor difficulties |
| P7 | 8 | Level 3 | Female | Non-verbal, uses gestures | Avoids eye contact |
| P8 | 9 | Level 2 | Male | Verbal, simple responses | Visual preference |
| P9 | 6 | Level 1 | Female | Fluent verbal | Good concentration |
| P10 | 7 | Level 3 | Female | Non-verbal, uses AAC device | Low frustration tolerance |
Table 2. Interpretative ranges.
| Operability | Interpreted Usability Level |
|---|---|
| ≥60 | High operability |
| 40–59 | Acceptable/moderate operability |
| <40 | Low operability, improvement recommended |
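The interpretative bands of Table 2 can be expressed directly in code. The sketch below is illustrative only: the function name and the handling of the 59–60 boundary are our assumptions, since the paper reports ranges rather than an exact cutoff rule.

```python
def operability_level(score: float) -> str:
    """Map a raw operability score to the interpretative bands of Table 2.

    Boundary handling (>= 60 high, >= 40 moderate) is assumed from the
    published ranges; the paper does not state how values between 59 and
    60 are rounded.
    """
    if score >= 60:
        return "High operability"
    if score >= 40:
        return "Acceptable/moderate operability"
    return "Low operability, improvement recommended"

# The reported session mean (45.07) falls in the moderate band.
print(operability_level(45.07))
```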
Table 3. Descriptive Statistics.
| Statistic | Value | 95% Bootstrap CI |
|---|---|---|
| Count | 25 | – |
| Mean | 45.07 | [41.07, 49.07] |
| Standard Deviation | 10.52 | [8.52, 12.52] |
| Skewness | 0.18 | [−0.25, 0.61] |
| Kurtosis | −0.89 | [−1.47, −0.31] |
| Min | 26.73 | – |
| 25% | 35.77 | – |
| 50% | 47.92 | – |
| 75% | 54.55 | – |
| Max | 62.12 | – |
| Effect Sizes | | |
| Hedges’ g (vs. threshold 40) | 0.47 | [0.09, 0.92] |
| Kendall’s W (concordance) | 0.37 | [0.26, 0.48] |
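The one-sample Hedges’ g in Table 3 can be reproduced from the reported summary statistics alone. A minimal sketch, assuming the standard small-sample correction J = 1 − 3/(4·df − 1) described by Lakens [30] (the authors do not state which correction variant they applied):

```python
def hedges_g_one_sample(mean: float, sd: float, n: int, threshold: float) -> float:
    """One-sample Hedges' g against a fixed benchmark: Cohen's d scaled by
    the small-sample correction J = 1 - 3 / (4 * (n - 1) - 1)."""
    d = (mean - threshold) / sd          # standardized distance from the benchmark
    j = 1 - 3 / (4 * (n - 1) - 1)        # bias correction for small samples
    return d * j

# Table 3's reported values: M = 45.07, SD = 10.52, n = 25, benchmark = 40.
g = hedges_g_one_sample(45.07, 10.52, 25, 40)
print(round(g, 2))  # 0.47, matching the table
```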
Table 4. Operability levels by frequency.
| Operability Level | Frequency |
|---|---|
| Low | 10 |
| Moderate | 13 |
| High | 2 |
Table 5. Mixed-Effects Model Results with Validation Evidence.
| Level | n | Mean | SD | Range |
|---|---|---|---|---|
| 1 (Easy) | 10 | 37.04 | 9.31 | 26.73–62.12 |
| 2 (Medium) | 10 | 48.71 | 7.49 | 31.41–57.22 |
| 3 (Hard) | 5 | 53.87 | 7.16 | 43.61–61.91 |
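Table 3 and Figure 9 report 95% bootstrap confidence intervals around mean operability. A percentile-bootstrap sketch is shown below on toy scores: the raw session data are not published, so the sample values, resample count, and seed are illustrative assumptions, not the paper's settings.

```python
import random
import statistics

def bootstrap_ci_mean(data, n_boot=10000, alpha=0.05, seed=42):
    """Percentile bootstrap CI for the mean: resample with replacement,
    collect the resampled means, and take the alpha/2 and 1 - alpha/2
    empirical quantiles."""
    rng = random.Random(seed)
    n = len(data)
    means = sorted(statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot))
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Toy scores with a mean near the reported 45.07 (hypothetical values).
scores = [26.73, 35.77, 47.92, 54.55, 62.12, 45.0, 41.3, 49.1, 38.6, 50.6]
low, high = bootstrap_ci_mean(scores)
print(round(low, 1), round(high, 1))
```

With the real 25-session data, the same procedure applied per difficulty level yields the interval bars of Figure 9.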

Share and Cite

MDPI and ACS Style

Carrión-León, D.I.; Lopez-Ramos, M.P.; Santillan-Valdiviezo, L.G.; Tanguila-Tapuy, D.S.; Morocho-Santos, G.M.; Moyano-Arias, R.J.; Yautibug-Apugllón, M.E.; Chacón-Luna, A.E. Evaluating Interaction Capability in a Serious Game for Children with ASD: An Operability-Based Approach Aligned with ISO/IEC 25010:2023. Computers 2025, 14, 370. https://doi.org/10.3390/computers14090370