1. Introduction
Autism Spectrum Disorder (ASD) is a neurodevelopmental condition and a recognized form of neurodiversity. While clinical frameworks define ASD through diagnostic criteria and its functional impact, the neurodiversity perspective regards it as a natural variation in human cognition, associated with both strengths and challenges. It is typically characterized by differences in social communication and interaction, alongside varying levels of restricted and repetitive patterns of behavior, interests, or activities [1]. People with ASD can therefore benefit from several kinds of interventions that provide tools for developing skills that ease everyday life and social interaction. These interventions can take different forms, including training based on the repetition of a task, e.g., the whole process of going to a cafe, ordering food, paying, and leaving. We refer to this approach as experiential training [2], i.e., the possibility of repeating a task several times in a controlled experimental environment until the person feels ready to face the situation in real life.
Virtual reality (VR) has been shown to be a solid and promising intervention tool for individuals with ASD, due to its ability to provide controlled, engaging, and repeatable experiences [3], which converge in experiential training opportunities [4,5]. VR environments can be customized to minimize sensory overload [6] and tailored to address individual learning needs [7]. Moreover, the strong visual–spatial learning preferences common in ASD align well with the immersive nature of VR, while the generally high affinity for technology among people on the spectrum [8] supports acceptance.
People on the spectrum present specific support needs and functional profiles. According to the Diagnostic and Statistical Manual of Mental Disorders (DSM), published by the American Psychiatric Association, the intensity of support required by people on the spectrum can be defined on a three-level scale [1]. The current edition of the manual is DSM-5 (published in 2013, with the text revision DSM-5-TR in 2022), which classifies the levels as shown in Table 1.
Day Activity Centers (DACs) are non-residential facilities offering structured, supervised daytime programs aimed at enhancing social, vocational, and life skills for individuals with disabilities or special needs. These centers provide essential opportunities, particularly for people with ASD, as activities are tailored to individual needs and adapted according to severity levels. The current use case is related to the DAC “Il Margine” (https://www.ilmargine.it/, accessed on 25 October 2025) in Torino, Italy, targeting individuals with ASD at DSM-5 Level 1, as their cognitive and mobility skills make them suitable potential users of VR. Initial visits and transitions to new daily routines, however, often elicit significant distress and discomfort [9], impeding smooth and effective adaptation. To facilitate transitions for new attendees, we developed a VR system that replicates the DAC, allowing users to virtually familiarize themselves with the center. The project, “Knock Knock. It’s open.”, implements an experiential training approach, enabling repeated and safe exploration before in-person visits.
Our approach has two key features that, to the best of our knowledge, are novel. (i) The digital twin represents the specific center and not a generic DAC. This way, users already know where the kitchen will be and where to go for a snack, or where they will find the restroom. The system also includes a set of interactive activities that are available in the specific center, such as playing table football or printing graphics on t-shirts using a screen-printing machine. This allows users to virtually explore the center and become familiar with the environment before visiting in person. (ii) The multiplayer sessions allow users to enter the virtual center together with their personal caregiver, just as will happen in person. The user is guided through the virtual visit by an avatar embodied by the participant’s actual caregiver, introducing a familiar and trusted presence, which may enhance engagement and reduce stress during the session. Throughout the session, the two can freely navigate the virtual environment, talk to each other, see each other’s avatar, and share the virtual space, allowing the caregiver to tailor the experience to the participant’s needs.
A fundamental goal in the development of a virtual reality experiential training approach is validating an asset that is appropriate for people with ASD, which is the aim of the current work. We present a user-centered, co-designed system validation framework to assess usability and suitability for autistic people (in particular, those at DSM-5 Level 1). The validation was conducted following a co-design approach with current center attendees, allowing us to gather feedback from individuals with needs and cognitive skills akin to those of future newcomers who will benefit from the approach prior to entering the center.
In a previous work, we presented a preliminary validation of the VR system with neuro-typical users [10], leveraging standard assessment methods and showing that the system is suitable and appropriate for a general audience and that the presence of an avatar companion was perceived as enjoyable. However, at that stage, we did not address its usability for individuals with ASD. The present work investigates the system’s suitability for people with autism and introduces a methodology for evaluation tailored to their needs. An additional finding of the previous work was that the system elicits a generally high Sense of Presence. However, in the current work, we chose not to investigate this aspect with individuals on the spectrum, as the concept is too abstract for our target population.
In this scenario, we formulated the following hypotheses:
Hypothesis 1 (H1). A virtual reality application that represents a Day Activity Center can be usable for people diagnosed with Autism Spectrum Disorder at DSM-5 Level 1;
Hypothesis 2 (H2). A virtual reality reproduction of a Day Activity Center is a suitable and appropriate experience for people diagnosed with Autism Spectrum Disorder at DSM-5 Level 1;
Hypothesis 3 (H3). The presence of a caregiver in the virtual world provides a sense of enjoyment and support.
We conducted a preliminary user study with individuals diagnosed with ASD at DSM-5 Level 1 to validate the acceptability of the asset and gather insights on assessment methods for users with ASD. We then ran the main experiment with participants from the same population. The results indicate that the system is both usable and suitable for the target group. The presence of a caregiver, represented by an avatar, was perceived as a significant source of support. These findings confirm that our approach provides a valid and appropriate framework for validating an experiential training system designed for individuals with autism.
2. Related Works
The literature search focused on 2017–2025 studies in Scopus (https://www.scopus.com/, accessed on 25 October 2025) using keywords on Autism Spectrum Disorder, virtual reality, social skills training, and transfer to real-world contexts. Peer-reviewed English-language studies were prioritized to enable contextual synthesis rather than a systematic review. Several works suggested that experiential training in VR can be beneficial for individuals with ASD in improving daily life social skills (see Mosher et al. [11] and Chiappini et al. [12] for reviews). It is important to note that studies involving participants with ASD often involve small samples due to practical and ethical recruitment constraints [13,14,15]; this occurs in both system validation studies and efficacy-oriented investigations.
Due to their limited generalization abilities, people with ASD often struggle to transfer the skills acquired through conventional therapies to real-world scenarios [16]. Yet, several works suggested that VR-based interventions can facilitate this transfer to everyday contexts [13,17], with several projects specifically targeting social skill development. Ip et al. investigated the potential of VR to enhance affective expression and social reciprocity skills in children with ASD [18]. A total of 176 children participated in the study, which included 30 VR training sessions over 15 weeks, targeting social norms, communication, conflict resolution, emotion recognition, and reciprocal conversation. Results showed significant improvements in both affective expression and social reciprocity compared to the control group. Other studies explored how VR can reduce stress and help individuals with ASD become more familiar with real-world scenarios. Dixon et al. used VR to teach individuals with ASD to assess safe street-crossing conditions [13]. Through multiple 360° video sessions with adjustable difficulty and environmental distractions, all three participants learned to identify safe crossing situations in both virtual and real-world scenarios. Yet, the training did not include the act of crossing itself. Soccini et al. proposed a VR experiential training system that allows users to try an airport scenario several times before facing it in real life, to improve their understanding of airport procedures and reduce the associated distress [2]. The study aimed to compare the effectiveness of the VR intervention with a commonly used step-by-step textual guide. Miller et al. conducted a feasibility and preliminary learning-outcomes evaluation of a low-cost mobile VR air-travel training for individuals with ASD [14]. Seven participants completed a non-interactive narrated VR airport tour. All participants tolerated and accepted the VR setup, and mean comprehension-retell scores improved from baseline. However, feasibility and acceptability were evaluated primarily through clinician observation and brief adverse-effects check-ins rather than standardized participant-reported measures. Simões et al. [19] proposed a VR serious game to help individuals with ASD familiarize themselves with the process of taking a bus. The experiment included 10 neuro-atypical participants and consisted of several tasks involving boarding a bus and reaching a designated destination. Results suggested an increase in the knowledge related to taking a bus and a reduction in anxiety during the experience. Kim et al. used VR to enhance self-efficacy and improve social skills in individuals with ASD [20]. A total of 14 participants completed four scenarios of increasing difficulty, simulating typical work-related social interactions such as making coffee, using a cash machine, and holding conversations. Results showed a significant increase in perceived self-efficacy and greater self-awareness of both emotions and social behaviors. Schmidt et al. conducted a usability and learner-experience evaluation of Virtuoso, an immersive intervention for public-transport training in adults with ASD [15]. Five participants completed two brief sessions, a 360° video walkthrough of the shuttle procedure and a coach-guided rehearsal in an interactive VR environment, followed by standardized usability ratings. Both modules achieved above-average usability on the System Usability Scale, and interviews indicated high engagement in the interactive, coach-guided VR session, despite some controller-mapping and stability issues. The study emphasized design feedback over efficacy or real-world transfer outcomes.
The presented works have been shown to be effective in training particular social skills and familiarizing users with real-world scenarios. However, most interventions involved solo experiences, which may cause stress or discomfort for participants [21]. To address this issue, some recent projects explored the use of conversational agents, powered by large language models (LLMs), to guide participants during VR experiences [22,23,24]. While promising, conversational agents embodied within virtual avatar guides might resemble unfamiliar individuals, potentially causing discomfort or distress among participants due to the absence of a familiar and trusted presence.
3. Preliminary Study: Co-Design of the Experience and Methods Evaluation
The inclusion criteria comprised attendees with a formal diagnosis of ASD at DSM-5 Level 1. A preliminary experiment was conducted with four volunteers (3 male, 1 female; mean age = 38 years, range = 23–50), none of whom had previous experience with VR. The gender distribution of participants reflects the demographic composition of DAC attendees and aligns with the higher proportion of males typically observed among individuals with ASD [25,26]. The purpose of this preliminary study was to evaluate and refine the investigation methods in VR with the specific target population, namely individuals with ASD at DSM-5 Level 1, who are capable of developing autonomy in everyday tasks but still require structured support through attendance at a DAC.
We defined our methodological research questions:
RQp1: Does our population accept wearing a Head-Mounted Display (HMD)?
RQp2: Are the tasks in virtual reality adequate?
RQp3: Are questionnaires suitable for our population and in line with their cognitive level?
RQp4: Is a 7-point Likert scale a suitable way to provide a response to questions mainly related to user experience?
RQp5: How do users rate the overall experience?
3.1. Experimental Design
Before starting the experiment, the participants were told by the caregivers that they would try a virtual reality experience representing the center, developed by friendly scientists who would be in the room for the whole session. Caregivers instead received specific information and instructions on how to play the serious game and guide the participant. Every participant, together with their caregiver, was welcomed by the operator in a room at the center. Both the participant and the caregiver were asked to wear an HMD and freely explore the application. During their experience, we observed their behavior, focusing on the research questions RQp1 and RQp2. At the end of the experience, the caregiver asked every user about the experience in a free-form dialogue and also proposed a set of six questions, reported in Table 2, whose responses were collected on a Likert scale of either 1 to 7 or 1 to 5, depending on the preferences and capabilities of the user. This provided insight into the methodological questions RQp3 and RQp4. The results of the questionnaire provided the users’ ratings of the experience, which address RQp5.
Figure 1 summarizes the co-design workflow and the preliminary experiment design.
3.2. Insights on Methods
All the participants wore the HMD with no issues (RQp1) and understood the basics of navigation and interaction in the virtual environment in a timely manner, similarly to what we would expect from a neuro-typical population (RQp2). While the questions were, overall, considered appropriate, the presence and support of the caregiver in the process was fundamental to provide precise explanations of the concepts asked about. Without the caregivers, the administration of the questionnaires would not have been feasible (RQp3). The caregivers underlined that responses on a 7-point Likert scale required a strong sense of abstraction that resulted in difficulty and stress for the participants. Instead, they suggested using a 5-point Likert scale, which was perceived as easier and friendlier and was therefore adopted. Also, to aid in the evaluation, we associated each number with an image depicting sad, neutral, or happy emoticons, allowing users to select their response based on these. The caregivers supported this approach and users appreciated it (RQp4).
3.3. Quantitative Results
As mentioned, responses were provided on a Likert scale from 1 to 5, where 1 corresponds to “Very Little” and 5 to “Very Much”. The total score, calculated as the sum of the six individual responses, could range from 6 to 30, with question 4 representing a negative effect and being treated accordingly. Overall, the questionnaire results align with our observations: mean values indicate that participants generally felt comfortable and safe within the simulation (see Table 3 and Figure 2), which addresses RQp5. However, participants reported varying levels of distress and differing appreciation of the virtual experience, particularly for questions 3 and 6.
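The total-score computation described above can be sketched as follows. This is a minimal illustration, not the authors' scoring script: it assumes that "treated accordingly" means reverse-scoring question 4 (1 becomes 5, 2 becomes 4, and so on), and the names `total_score` and `NEGATIVE_ITEMS` are hypothetical.

```python
from typing import Sequence

LIKERT_MIN, LIKERT_MAX = 1, 5
# 1-based numbers of negatively worded items. Assumption: only question 4,
# and "treated accordingly" is read here as reverse-scoring.
NEGATIVE_ITEMS = {4}


def total_score(responses: Sequence[int]) -> int:
    """Sum six 1-5 Likert responses, reverse-scoring the negative items."""
    if len(responses) != 6:
        raise ValueError("expected exactly six responses")
    total = 0
    for number, value in enumerate(responses, start=1):
        if not LIKERT_MIN <= value <= LIKERT_MAX:
            raise ValueError(f"response {value} is outside the 1-5 scale")
        if number in NEGATIVE_ITEMS:
            # Reverse-score: 1 <-> 5, 2 <-> 4, 3 stays 3.
            value = LIKERT_MIN + LIKERT_MAX - value
        total += value
    return total
```

Under this assumption, the most favorable answer pattern (5 on every item except a 1 on question 4) yields the maximum of 30, and the least favorable pattern yields the minimum of 6, matching the stated range.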
3.4. Lessons Learned
Thanks to the preliminary study, we learned that:
- Users with Autism at DSM-5 Level 1 seem to have a positive attitude towards wearing an HMD and experiencing virtual reality.
- Providing a high abstraction of the tasks in virtual reality is suitable, as users were able to match their virtual activities with real-world tasks (e.g., throwing a ball, playing table football, printing a t-shirt). In that regard, there is no need to force users towards learning combinations of buttons or fine gestures.
- Users were able to provide responses to the questionnaires, but several aspects need to be considered:
- (i) Caregiver support is crucial for verbalizing feedback and recording responses on a scale;
- (ii) Questions investigating physical discomfort should be avoided, as they tend to elicit unease;
- (iii) Response options should be limited to five, as seven options are more difficult to manage;
- (iv) The number of questionnaires and items should be kept to the minimum necessary, since users otherwise tend to become distracted or irritated.
5. Results
Overall, participants provided positive feedback across all evaluated parameters, supporting our hypotheses. The SUS results confirmed the good usability of the VR system (H1), as shown in Table 7 and Figure 6, and the statistical analysis further supported this outcome. A post hoc power analysis for a one-sample t-test comparing the observed SUS mean to the acceptability benchmark of 50, with six participants and an effect size derived from the sample mean and standard deviation, indicated that statistical power was adequate (power = 0.85), supporting the sufficiency of the sample for this validation.
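The quantities involved in this comparison can be sketched in a few lines. The example below assumes the standard ten-item SUS scoring rule (odd items contribute the response minus 1, even items contribute 5 minus the response, and the sum is multiplied by 2.5); the questionnaire administered in the study may have been adapted, so this is an illustration rather than the authors' procedure. The effect size is computed as a one-sample Cohen's d against a benchmark, and the function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Standard SUS score (0-100) from ten responses on a 1-5 scale."""
    if len(responses) != 10:
        raise ValueError("the standard SUS has exactly ten items")
    contributions = []
    for number, value in enumerate(responses, start=1):
        if not 1 <= value <= 5:
            raise ValueError(f"response {value} is outside the 1-5 scale")
        # Odd items are positively worded, even items negatively worded.
        contributions.append(value - 1 if number % 2 == 1 else 5 - value)
    return sum(contributions) * 2.5


def cohens_d(scores: Sequence[float], benchmark: float) -> float:
    """One-sample effect size: (sample mean - benchmark) / sample SD."""
    return (mean(scores) - benchmark) / stdev(scores)
```

For instance, a participant answering 3 to every item obtains a SUS score of 50, which coincides with the acceptability benchmark used in this work. Turning the effect size into a power estimate additionally requires the noncentral t distribution, which is left to a statistics package.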
Participants reported high ratings on the VRNQ (H2), as presented in Table 8 and Figure 7, and the results were statistically significant. The power analysis for the one-sample t-test, with the effect size computed from the sample mean and standard deviation, indicated sufficient power (power = 0.99).
Finally, participants also provided high ratings on the Avatar Guide questionnaire (H3), reported in Table 9 and Figure 8. Scores were significantly above the threshold of 10.5. A one-sample t-test power check, with the effect size computed from the sample mean and standard deviation, indicated high power (power = 0.97).
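The Discussion names the significance test used for the three total scores as a one-sample Wilcoxon signed-rank test against a fixed threshold. A minimal sketch of an exact version of that test follows, suitable for very small samples. It assumes a one-sided alternative (scores above the threshold), drops scores equal to the threshold, and averages ranks over ties; the function name is hypothetical, and the study's actual software and settings are not stated in the text.

```python
from itertools import product
from typing import Sequence, Tuple


def wilcoxon_one_sample(scores: Sequence[float], threshold: float) -> Tuple[float, float]:
    """Return (W+, exact one-sided p-value) for H1: scores exceed threshold."""
    # Differences from the threshold; zero differences are discarded.
    diffs = [s - threshold for s in scores if s != threshold]
    if not diffs:
        raise ValueError("all scores equal the threshold")
    # Rank absolute differences (1-based), averaging ranks over ties.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[start]])):
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    # W+ is the sum of ranks attached to positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Exact null distribution: each sign pattern is equally likely.
    as_extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        if sum(r for r, s in zip(ranks, signs) if s) >= w_plus:
            as_extreme += 1
    return w_plus, as_extreme / 2 ** len(ranks)
```

With six participants and no ties, the smallest one-sided exact p-value this test can produce is 1/64 (about 0.016), reached when every score lies above the threshold. This is a useful bound to keep in mind when interpreting results from a sample of this size.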
During the VR experience, we observed participants’ behavior, focusing on their emotional responses and bodily cues. Overall, all six participants appeared enthusiastic and engaged, supporting H3. At the beginning of the virtual visit, caregivers prompted participants to perform certain activities, and all followed these suggestions with interest and motivation. After a few minutes, participants began exploring the virtual center independently, without further guidance. Most showed strong curiosity, actively requesting to interact with virtual objects and explore all available rooms.
To apply the Benjamini–Hochberg correction, the three p-values were ranked in ascending order as $p_{(1)} \le p_{(2)} \le p_{(3)}$ and compared to the corresponding Benjamini–Hochberg thresholds, calculated, respectively, as $\frac{1}{m}\alpha$, $\frac{2}{m}\alpha$, and $\frac{3}{m}\alpha$, where $m = 3$ is the number of tests and $\alpha$ is the significance level. Since all raw p-values were below their thresholds, the null hypotheses were rejected and the three tests remained significant after Benjamini–Hochberg correction.
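The Benjamini–Hochberg step-up procedure can be sketched as below. The default significance level of 0.05 is an assumption (the value used in the study is not recoverable from the text), and the function name is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return, for each input p-value, whether its null hypothesis is rejected."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            largest_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= largest_rank:
            rejected[index] = True
    return rejected
```

Note that the procedure is step-up: a p-value that misses its own threshold is still rejected when a larger-ranked p-value meets its threshold. For example, with the illustrative inputs 0.02, 0.03, and 0.04 at a level of 0.05, the smallest value exceeds its threshold of 0.0167, yet all three hypotheses are rejected because the largest value is below 0.05.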
Finally, we observed a strong and statistically significant negative Pearson correlation between the VRNQ and the Avatar Guide Questionnaire total scores. Pearson correlations between SUS and VRNQ and between SUS and the Avatar Guide Questionnaire were instead weak and not statistically significant. The results are reported in Table 10.
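For reference, the Pearson coefficient used in this comparison can be computed directly from two lists of total scores, as in the minimal sketch below. The function name is hypothetical and the computation of the associated p-value is omitted.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the centered values.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cross / (spread_x * spread_y)
```

A coefficient close to -1 means that higher totals on one questionnaire go together with lower totals on the other. With only six paired observations, a single participant can change the coefficient substantially, which is why the result is treated as exploratory.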
6. Discussion
This study evaluated a virtual reality experiential training system designed to support individuals with ASD at DSM-5 Level 1 in adapting to a new daily environment, specifically a Day Activity Center (DAC). The positive usability and engagement outcomes observed in the study align with the core hypotheses that such a location-specific digital twin VR system can be both usable and suitable for this population, especially when combined with caregiver guidance. To test these assumptions, the evaluation targeted three hypotheses: usability (H1), suitability (H2), and the importance of the caregivers’ presence in providing a sense of enjoyment and perceived support (H3).
In the main study with six adult participants diagnosed with ASD at DSM-5 Level 1 (all male; 18–50 years), both standardized questionnaires and observational data converged in supporting these hypotheses. The SUS total score exceeded the acceptability benchmark, and the difference was statistically significant (one-sample Wilcoxon signed-rank test vs. 50), indicating good usability of the system for the target population (H1). The adapted VRNQ total score was also significantly above the rescaled acceptability threshold (one-sample Wilcoxon signed-rank test), supporting suitability (H2). Beyond total scores, the subscales help interpret which experience factors worked well. Participants reported high enjoyment and perceived technology quality (User Experience subscale), solid ease of navigation and manipulation (Game Mechanics subscale), and particularly strong ratings for guidance and onboarding (In-Game Assistance subscale), suggesting that tutorials and caregiver-supported scaffolding effectively lowered the entry barrier to the experience. Participants’ feedback on the Avatar Guide questionnaire suggested that the presence of the caregiver, embodied in a virtual avatar, contributed to the overall enjoyment of the VR experience and provided meaningful support to participants, with scores significantly greater than the threshold (one-sample Wilcoxon signed-rank test vs. 10.5), supporting H3. All three tests remained significant after Benjamini–Hochberg correction, which controls the expected proportion of false positives among the rejected hypotheses, indicating that the results retain statistical significance despite the limited sample. It is worth noting that the primary goal of the present study is system validation rather than inferential hypothesis testing; therefore, given the small sample size, the statistical analysis is meant to support the immediate observations from user interaction rather than broad generalizations.
Observationally, all participants displayed engagement and curiosity, transitioning from caregiver-prompted actions to increasingly autonomous exploration of the digital twin. This behavioral pattern towards adaptation aligns with the intent of the experiential training approach, and further supports H3. While one participant wore the headset only briefly, the group as a whole showed motivated exploration and interest in interacting with the modeled activities, such as those mirroring real objects and routines in the center.
Interestingly, we found a strong and significant negative correlation between the VRNQ and the Avatar Guide Questionnaire scores. Although highly preliminary and exploratory, this result might suggest that participants who experienced some difficulties with the virtual reality application (lower VRNQ scores) found caregiver support particularly helpful (high Avatar Guide Questionnaire score). On the other hand, participants who rated the VR experience positively (high VRNQ score) may have perceived the caregiver as less essential (low Avatar Guide Questionnaire score). This relationship will be investigated in more depth in future work.
The presented outcomes are consistent with the co-design lessons learned in the preliminary study, which led us, together with caregivers, to minimize cognitive load in measurement (5-point Likert with visual supports), remove VRISE items from VRNQ, and keep the assessment concise; they also reaffirmed the practical necessity of caregiver assistance when administering questionnaires in this population. We can consider these findings as guidelines for the design of future system validations for this population. Taken together, these findings suggest that reproducing the specific DAC rather than a generic facility, including its real activities, and allowing the familiar caregiver to accompany the user as an avatar may be critical to usability, perceived appropriateness, and engagement. The design choice to validate directly with current attendees, whose needs and skills resemble those of future newcomers, further grounds the ecological relevance of the approach for transition support.
6.1. Limitations
A limitation of this study is the small sample size, which was due to restrictive inclusion criteria (individuals formally diagnosed with ASD at DSM-5 Level 1 and currently attending the center), as well as the logistical complexity involved in organizing and executing the experiments on site. Nevertheless, the power analysis indicates that, despite the small sample, the study was sufficiently powered to detect scores above the acceptability thresholds, and all three results remain significant after Benjamini–Hochberg correction. Moreover, the sample size is in line with the existing literature, as many studies involving participants with ASD are based on small groups. The findings can therefore be regarded as a meaningful contribution to the field, particularly given the exploratory and preliminary nature of the present work.
A second limitation is that the presented findings may not apply to all autistic individuals, as users at different DSM-5 levels may need different design practices for virtual experiences and alternative reporting methods. The system was intentionally developed to serve individuals with ASD at DSM-5 Level 1 and is not intended for those who are nonverbal or have limited autonomous mobility. Although this targeted approach restricts the applicability of the results to other ASD populations, it represents a conscious methodological decision guiding the scope of this study. By focusing on a specific group, the investigation aims to ensure that the system is appropriately tailored to the abilities and needs of its intended users, acknowledging that broader generalizations should be made with caution. However, the co-design framework presented here may facilitate future identification of the most appropriate methods for individuals at different DSM-5 levels, while participant feedback will offer valuable insights regarding the suitability of both the system and the assessment methods.
Third, the intervention and assessment were conducted in a single session, limiting the evaluation of longer-term retention of familiarity with the technology, as well as sustained engagement and enthusiasm toward the experience. While this was not essential for system validation, investigating it in future studies may offer valuable insights into mid- and long-term acceptance among individuals with ASD. Accordingly, this work should be interpreted as a pilot validation study conducted within a specific target population, with an emphasis on evaluating the proposed framework.
6.2. Future Works
Although discussed as contextual background, the effectiveness of the system in facilitating users’ familiarization with the virtual center prior to an in-person visit was not evaluated in this study and will be addressed in future research, involving upcoming attendees of the center. We plan to initially measure affective states such as distress, unease, and enthusiasm before and after the VR experiential training and the initial in-person visits, using subjective reports and speech sentiment analysis. Such multimodal evaluations are essential to determine whether observed usability, suitability, and the supportive role of the caregiver avatar translate into sustained benefits during the transition to a DAC. Finally, we will assess the overall efficacy of the methodology by comparing adaptation success rates with those of individuals adapting without the support of the VR system. These investigations will be essential to quantify long-term adaptation and verify real-world transfer.