USEQ: A Short Questionnaire for Satisfaction Evaluation of Virtual Rehabilitation Systems

New emerging technologies have proven their efficacy in aiding people in their rehabilitation. The tests that are usually used to evaluate usability (in general) or user satisfaction (in particular) of this technology are not specifically focused on virtual rehabilitation and patients. The objective of this contribution is to present and evaluate the USEQ (User Satisfaction Evaluation Questionnaire). The USEQ is a questionnaire that is designed to properly evaluate the satisfaction of the user (which constitutes part of usability) in virtual rehabilitation systems. Forty patients with balance disorders completed the USEQ after their first session with ABAR (Active Balance Rehabilitation), which is a virtual rehabilitation system that is designed for the rehabilitation of balance disorders. Internal consistency analysis and exploratory factor analysis were carried out to identify the factor structure of the USEQ. The six items of USEQ were significantly associated with each other, and the Cronbach alpha coefficient for the questionnaire was 0.716. In an analysis of the principal components, a one-factor solution was considered to be appropriate. The findings of the study suggest that the USEQ is a reliable questionnaire with adequate internal consistency. With regard to patient perception, the patients found the USEQ to be an easy-to-understand questionnaire with a convenient number of questions.


Usability
Usability is an important quality attribute of a user's experience when interacting with a system or tool, and it is also an important attribute in helping users to achieve the suggested goals [1]. With regard to HCI (Human-Computer Interface) and usability, Bevan states in [2] that standards related to usability can be categorized as being primarily concerned with the use of the product (effectiveness, efficiency, and satisfaction in a specific context of use).
The categorization of Bevan is coherent with the ISO 9241-11 standard [3][4][5], which describes a widely accepted definition of usability. This standard indicates the rules that are needed in terms of ergonomics, hardware, software, and environments in order to obtain good usability for a product or system. Section 8.1 describes the term usability as "the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use".

Usability in Virtual Rehabilitation
One of the promising and emerging fields within rehabilitation therapies for different pathologies is virtual rehabilitation (VRh) [6][7][8][9]. VRh systems are designed to assist clinical specialists and patients in the rehabilitation process [10]. The use of ground-breaking technologies, together with the emergence of entertaining and playful virtual environments (VE), has shown promising results in the rehabilitation process [11][12][13][14] and has improved adherence to treatments [12]. However, these systems should be tested regarding important aspects such as usability.
Currently, there are different questionnaires that are designed to evaluate usability in general-purpose systems. The best-known usability questionnaire is the System Usability Scale (SUS) [15,16], which measures the users' perception of usability when using computer systems. It is composed of 10 questions with a five-point Likert attitude scale (from strongly disagree to strongly agree). This questionnaire has been used in different domains such as security software [17], mobile phones [18,19], PDAs [20], social network sites [21,22], wiki sites [23], serious games [24], and robotics [25]. Even though the SUS questionnaire is not specifically designed for VRh systems, it has also been used for rehabilitation purposes due to the lack of questionnaires that focus on VRh systems. Meldrum et al. [26] tested balance in patients with vestibular and other neurological diseases using VRh and quantified the usability of the Nintendo Wii Fit Plus®. Duvinage et al. [27] assessed the usability of a P300 system (using Brain-Computer interfaces) for lower-limb rehabilitation purposes. One considerable advantage of the SUS questionnaire is the reasonable number of questions that are to be answered at the end of the first session. However, the concepts of this questionnaire are too generic (computers, PDAs, websites, etc.). The main drawback of the SUS questionnaire is that it does not include questions about specific items related to virtual rehabilitation.
Another well-known usability questionnaire is VRUSE [28]. Fitzgerald et al. [29] assessed the usability of the E-Yoga system using VRUSE, with the goal of improving postural control and biomechanical alignment of the subjects in a rehabilitation process. The VRUSE evaluates a wide range of concepts: functionality, user input, system output (display), user guidance and help, consistency, flexibility, simulation fidelity, error correction/handling and robustness, sense of immersion/presence, and overall system usability. The main drawback of this test is the large number of questions that the patients are required to answer [28]: the complete questionnaire has 100 questions. This drawback is especially important if the patients involved in a rehabilitation process have neurological and/or cognitive disorders. Other simplified usability questionnaires for VRh with reasonable outcomes are described in [30][31][32][33], but the drawback of these questionnaires is that the internal consistency has not yet been validated.
Kizony et al. [34] published the Short Feedback Questionnaire (SFQ), which is a questionnaire that is related to Witmer and Singer's Presence Questionnaire [35]. It is composed of eight questions with a five-point Likert attitude scale, and it has been used in virtual reality environments [36][37][38]. The SFQ questionnaire evaluates the user's sense of presence, perceived difficulty of the task, and any discomfort that users may have felt during the experience. This questionnaire does not focus on VRh systems.
To our knowledge, there are no validated questionnaires for testing usability or satisfaction of virtual rehabilitation systems. A questionnaire for this purpose must have a reasonable number of questions and internal consistency reliability.
Following the definitions of usability in [2][3][4][5], usability can be divided into three components: efficiency, effectiveness, and satisfaction. In VRh, efficiency and effectiveness can usually be measured through a clinical trial. With a classical clinical trial, we can compare an experimental group (using a VRh system) with a control group (following a traditional rehabilitation program) and evaluate effectiveness by comparing the recovery level of the two groups. With regard to efficiency, we can measure, for instance, the number of sessions that each group needs to reach a certain level. However, the third component of usability, satisfaction, cannot be evaluated in the same way as efficiency and effectiveness: a reliable and consistent questionnaire (with an adequate number of questions) is necessary to measure the satisfaction of the users.
The aim of the present study is to introduce the USEQ, a user satisfaction questionnaire that is specifically designed to evaluate satisfaction with virtual rehabilitation systems, and to validate its reliability by analyzing its internal consistency.

SEQ: The Suitability Evaluation Questionnaire
In [39], the SEQ was introduced as a 14-question questionnaire that is designed to test items such as satisfaction, acceptance, and security of use in virtual rehabilitation systems. The SEQ was designed by a multidisciplinary team of clinical and technical experts. Factors such as the length of the questionnaire, the type of questions to be asked and what to ask were taken into account in the design of the questionnaire. For the length of the questionnaire, the clinical experts that collaborated in the design of the SEQ estimated that a maximum of 15 questions would be an acceptable length for patients.
For the type of questions, the designers of the SEQ considered 13 questions with a five-point Likert scale, plus an open-ended question offering patients the possibility to add comments if necessary. The SEQ uses five-point Likert scale questions (instead of other options such as seven-point Likert scale questions) because the authors considered five answer options to be sufficient and because this choice is coherent with the main usability questionnaires that are currently being used: SUS [15], VRUSE [28], and SFQ [34] also use five-point Likert scale questions.
For what to ask about, the designers of the SEQ composed the questions taking into account the usability questionnaires available and their own experience, both in the technical and in the clinical field.
A previous study evaluating the suitability of virtual rehabilitation for the elderly was carried out using the SEQ [40]. The SEQ was used to evaluate the ABAR (Active Balance Rehabilitation) system, the VRh system that is used in this study. The study presented in [40] allowed the evaluation of the perceived length and difficulty of the SEQ. In [40], the patients completed the questionnaire without any problems. None of the patients considered the questionnaire to be too long. The main drawback of the SEQ is that it is composed of different dimensions; therefore, it is not possible to evaluate its internal consistency.

USEQ: Questions
The USEQ questionnaire is composed of the set of questions in the SEQ that evaluate satisfaction. The USEQ has six questions with a five-point Likert Scale. The questions and their scores are shown in Table 1.

Table 1. USEQ questions and response scale (1 = Not at all, 5 = Very much).

| Question | Response |
|---|---|
| Q1. Did you enjoy your experience with the system? | 1 2 3 4 5 |
| Q2. Were you successful using the system? | 1 2 3 4 5 |
| Q3. Were you able to control the system? | 1 2 3 4 5 |
| Q4. Is the information provided by the system clear? | 1 2 3 4 5 |
| Q5. Did you feel discomfort during your experience with the system? | 1 2 3 4 5 |
| Q6. Do you think that this system will be helpful for your rehabilitation? | 1 2 3 4 5 |

The total score of the USEQ questionnaire ranges from 6 (poor satisfaction) to 30 (excellent satisfaction). To calculate this total score, we consider all of the questions to be positive, except for Q5, which is considered to be a negative question. The numerical value of the positive questions is used to calculate the score (for instance, if the patient selects 4 in Q1, then 4 is added to the total score). The negative question subtracts the numerical value of the response from 6 and then adds this result to the total score (for instance, if the patient selects 2 in Q5, then 4 is added to the total score).
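The scoring rule can be illustrated with a minimal Python sketch. The function name `useq_total`, the dictionary layout, and the example answers are assumptions made for illustration only; they are not part of the questionnaire or of the study's tooling.

```python
def useq_total(responses):
    """Return the USEQ total score (6-30) from six answers on a 1-5 scale.

    `responses` maps the labels "Q1".."Q6" to the selected value.
    Q5 (discomfort) is the only negative item, so it contributes 6 - value.
    """
    questions = ["Q1", "Q2", "Q3", "Q4", "Q5", "Q6"]
    negative = {"Q5"}
    total = 0
    for q in questions:
        value = responses[q]
        if value not in (1, 2, 3, 4, 5):
            raise ValueError(f"{q} must be an integer from 1 to 5, got {value!r}")
        total += (6 - value) if q in negative else value
    return total


# Hypothetical answers from one patient (not data from the study).
example = {"Q1": 4, "Q2": 5, "Q3": 4, "Q4": 5, "Q5": 2, "Q6": 5}
print(useq_total(example))  # 4 + 5 + 4 + 5 + (6 - 2) + 5 = 27
```

Reverse-scoring Q5 before summing keeps the direction of every item aligned, so a higher total always means higher satisfaction.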

Subjects
Patients who had balance disorders and were attending a VRh program were the potential participants in this study. The diagnoses of these patients include stroke, multiple sclerosis, meningioma, subdural hematoma, cervical myelopathy, Guillain-Barré syndrome, Parkinson disease, brain tumors, and vestibular pathology.

Study Interventions
In the study, the USEQ is used to test satisfaction with the ABAR system [42]. The ABAR system is a VRh system that is specifically designed to recover balance (Figure 1). ABAR integrates the Nintendo® Wii Balance Board (WBB) (Nintendo, Kyoto, Japan) for the interaction of the patient. The WBB is a low-cost, widely available device that allows the center-of-pressure of the patient to be obtained.
Five different games can be selected in ABAR to recover balance, in both sitting and standing positions.

Study Procedures
The study was conducted in a specialized rehabilitation facility of a hospital under clinical supervision. Each patient completed the USEQ after the first session with ABAR. Each session with the system lasted 30 min; each session mixed periods of playing and resting, according to the specialist's indications. A member of our team was with the patients while they were answering the questionnaire.

Outcome Measures
The primary outcome measures were provided by the questionnaire. The scores for the questions of the USEQ allowed us to carry out the statistical analysis as described in Section 4.
Secondary measures were obtained through informal conversations with the patients about the USEQ after they had completed the questionnaire. Although these conversations are a subjective source of information, they provided us with responses to questions about perceived questionnaire length and perceived questionnaire difficulty.

Data Analysis
Data analysis was carried out with SPSS for Windows, version 15 (SPSS Inc., Chicago, IL, USA) on a standard PC. To test the internal consistency reliability, we used Cronbach's alpha [43]. Cronbach's alpha is a coefficient of internal consistency that is commonly used to estimate the reliability of a test.
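As an illustration of the coefficient, the following Python sketch computes Cronbach's alpha with the usual formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). It is not the SPSS procedure used in the study; the function name and the example data are hypothetical, and the sketch assumes that negative items have already been reverse-scored.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    items = list(zip(*rows))               # one tuple of scores per item
    totals = [sum(row) for row in rows]    # total score per respondent
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Hypothetical answers from five respondents to three items (not study data).
data = [
    [4, 5, 4],
    [3, 4, 3],
    [5, 5, 4],
    [2, 3, 3],
    [4, 4, 5],
]
print(round(cronbach_alpha(data), 3))
```

The sketch uses sample variances (n - 1 in the denominator) for both the items and the totals; using population variances for both would give the same ratio.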
For sampling adequacy, the Kaiser-Meyer-Olkin (KMO) index and Bartlett's test of sphericity were calculated. The KMO index ranges from 0 to 1, and sampling adequacy is accepted for values over 0.5. For factor analysis to be considered suitable, the p-value of Bartlett's test of sphericity must be less than 0.05.
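For reference, the two statistics are commonly written as follows. The notation is ours, not the study's: R is the correlation matrix of the p items observed on n respondents, r_ij are its off-diagonal entries, and a_ij are the corresponding partial correlations.

```latex
\mathrm{KMO} = \frac{\sum_{i \neq j} r_{ij}^{2}}
                    {\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} a_{ij}^{2}},
\qquad
\chi^{2} = -\left( n - 1 - \frac{2p + 5}{6} \right) \ln \lvert R \rvert,
\qquad
\mathrm{df} = \frac{p(p-1)}{2}
```

Under these definitions, a six-item scale such as the USEQ gives df = 15 for Bartlett's test.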
To identify the factor structure of the USEQ, we carried out an exploratory factor analysis (analysis of principal components), retaining components with eigenvalues greater than 1; in addition, we carried out a scree plot inspection. For the correlations between the items and the factor, we used unrotated factor loadings above 0.3.

Sample Characteristics
In the study period, 198 patients who had balance problems and were attending a rehabilitation program in the clinical facilities were the potential participants in this study. Of these, 108 patients had unresolved acute trauma injuries, 36 patients had cognitive impairment, 5 patients had visual deficit or severe hearing impairment, and 9 patients refused to participate in the study. A final sample of 40 patients fulfilled the inclusion-exclusion criteria and was included in the study. Table 2 shows information summarizing the characteristics of the patients. In the final sample, 19 patients were male and 21 patients were female. Most of the patients were elderly: 80% of the patients were older than 65 years old and the average age was 74.35 (SD 14.59) years old. Based on the post-injury time [44], 10 patients were acute (0-5 months post-injury), 11 patients were post-acute (6-23 months post-injury), and 19 patients were chronic (24 months or more post-injury). Most of the patients came from an urban background (75%), and 25% came from a rural background. With regard to the level of studies, 85% of the patients had completed primary (60%) or secondary (25%) studies, whereas only three patients had completed higher studies and only three patients had not completed any studies.

USEQ Scores
The results corresponding to the USEQ question evaluation are presented in Table 3. The mean USEQ score was 25.80 (SD 3.589). The scale mean if the item is deleted was measured for all the items, ranging from 21.10 to 22.00.

Item-Total Correlation and Cronbach's Alpha
To carry out the item analysis for selecting items for inclusion in the scale, we used the corrected item-total correlation. The corrected version correlates each item with the total of the remaining items. This avoids the bias that arises when an item is correlated with a total that already includes its own value. The corrected item-total correlation values ranged from 0.321 to 0.666 (Table 3).
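A minimal Python sketch of the corrected item-total correlation is given below. It is an illustration rather than the SPSS procedure used in the study; the function names and the example data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(rows):
    """For each item, correlate it with the sum of the other items."""
    k = len(rows[0])
    result = []
    for i in range(k):
        item = [row[i] for row in rows]
        rest = [sum(row) - row[i] for row in rows]  # total without item i
        result.append(pearson(item, rest))
    return result


# Hypothetical answers from five respondents to three items (not study data).
data = [
    [4, 5, 4],
    [3, 4, 3],
    [5, 5, 4],
    [2, 3, 3],
    [4, 4, 5],
]
print([round(r, 3) for r in corrected_item_total(data)])
```

Subtracting the item from the total before correlating is the only difference from the uncorrected item-total correlation.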
The Cronbach's alpha value for the complete scale was 0.716. Cronbach's alpha if the item is deleted was calculated for all six items, and none of the values were above Cronbach's alpha for the complete scale (Table 3).

Factor Structure
The KMO index of sampling adequacy was 0.60. Bartlett's test of sphericity was significant (p < 0.001). The analysis of principal components indicated two components with an eigenvalue greater than 1, which together accounted for 65.777% of the total variance. The first component included all six items (Table 4) and explained 42.869% of the variance; the correlations of its items with the factor ranged from 0.468 to 0.816. The second component had only four items, only two of which had factor loadings greater than 0.5 (Table 4). The scree plot (Figure 2) did not reveal a clear point of inflexion, but the sharpest angle was at the second component.
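The reported percentages can be related to the eigenvalue criterion with a short calculation. The sketch below assumes that the principal components were extracted from the correlation matrix of the six items, in which case the eigenvalues sum to 6; the source does not state this explicitly, so the resulting eigenvalues are implied values, not reported ones.

```python
N_ITEMS = 6          # USEQ items; assumed total variance of a 6x6 correlation matrix
FIRST_PCT = 42.869   # variance explained by the first component (reported)
BOTH_PCT = 65.777    # cumulative variance of the first two components (reported)


def implied_eigenvalue(percent, n_items=N_ITEMS):
    """Eigenvalue implied by a percentage of explained variance."""
    return percent / 100 * n_items


eig1 = implied_eigenvalue(FIRST_PCT)
eig2 = implied_eigenvalue(BOTH_PCT - FIRST_PCT)
print(round(eig1, 3), round(eig2, 3))  # 2.572 1.374
```

Under this assumption both implied eigenvalues exceed 1, which agrees with the statement that two components met the eigenvalue criterion.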

Informal Conversations
The informal interview after the completion of the questionnaire is a subjective source of data, but it is always very interesting to know the opinions of patients. The patients considered that the questionnaire was short and they were not distracted when answering the questions. The patients also perceived the USEQ to be an easy-to-understand questionnaire.

Discussion and Conclusions
Despite the fact that virtual rehabilitation is an emerging field that shows great potential, with many studies in recent years, there are no specific usability or satisfaction questionnaires with validated internal consistency for virtual rehabilitation systems. The USEQ is a questionnaire that is designed to evaluate satisfaction, which is part of usability, in virtual rehabilitation systems. This study has addressed the factor structure and internal consistency of the USEQ. Analysis of item-total correlation (Table 3) suggested that all items correlated well with the overall scale because all of the values were above 0.3. Therefore, the results suggest that the USEQ is a reliable questionnaire.
With regard to the Cronbach's alpha evaluation, Cronbach explains in [43] how this coefficient should be interpreted. Cronbach indicates that alpha values greater than or equal to 0.7 generally indicate good internal consistency. On the other hand, if Cronbach's alpha is too high, it may suggest redundancies because some items are testing the same question but in a different way. Streiner [45] recommends a maximum alpha value of 0.9.
In general, although no accurate ranges exist to classify the Cronbach alpha coefficient, an alpha coefficient ranging between 0.7 and 0.9 is considered to be acceptable. The Cronbach alpha coefficient for the USEQ was 0.716; therefore, this indicates adequate internal consistency.
An increase in Cronbach's alpha when an item is deleted indicates that the item could probably be removed from the scale. For the USEQ, the results of the study showed that the Cronbach alpha values if the item is deleted did not exceed the alpha of the complete scale for any of the six items (Table 3). Therefore, we kept all of the questions of the USEQ.
Although the analysis of the principal components showed two factors with an eigenvalue greater than 1, the features of the second factor (Table 4) and the scree plot inspection (Figure 2) suggest that only the first factor (which includes all six questions) can be considered to be appropriate. Therefore, a one-factor solution was considered to be appropriate, which accounted for 42.869% of the total variance. As shown in Table 4, all items had a correlation greater than 0.4 with the factor, which implies that they are probably meaningful. Thus, the factor under consideration represents 'user satisfaction' because all of the items were designed to measure user satisfaction with the system, and it has a Cronbach alpha coefficient of 0.716.
Other similar studies that evaluate the internal consistency of tests with good results show comparable values.
In [46], the authors evaluated HARUS (Handheld Augmented Reality Usability Scale), a questionnaire composed of 16 statements where users rate their agreement by using a seven-point Likert scale. HARUS has a two-factor structure. Statements 1 to 8 are measures of manipulability, while statements 9 to 16 are measures of comprehensibility. The authors confirm that the separate manipulability and comprehensibility scales have good internal consistency because they obtained alpha values between 0.71 and 0.83 in all their experiments.
Fackrell et al. [47] evaluated the validity and reliability of the Hyperacusis Questionnaire. The authors also specify α > 0.7 as the reliability criterion for the scales evaluated.
Although the Cronbach alpha coefficient in this study is acceptable (0.716), it is just above the lower value of the range of values considered to be adequate (0.7 ≤ α ≤ 0.9). Several factors related to the sample may have influenced this result. We hypothesize that the size of the sample and/or the age of the patients (mostly elderly) could have influenced the result. As we suggest later, further studies with different samples can allow us to check how these factors influence the results.
In the study, the patients observed that the USEQ has a convenient number of questions. They also considered the questions in the USEQ to be clear. This perception of the questionnaire by the patients is especially interesting considering the age of the patients enrolled in the study (most were elderly) and their level of studies (67.50% had only primary studies or less). With younger patients, and/or with patients with a higher level of studies, it is expected that the perception of the questionnaire will be even more favorable.
With regard to the results for the evaluated system (ABAR), the USEQ score was 25.80. Since the scale ranges from 6 (lower satisfaction) to 30 (higher user satisfaction), the score suggests that the satisfaction perceived by patients was very high. With regard to the characteristics of the sample, it is necessary to pay attention to some points. The sample includes a heterogeneous population from the point of view of their disabilities. This is due to the VRh system used for the evaluation: ABAR is a VRh system designed to help in the recovery of balance, and there are many different disabilities that cause balance disorders. Moreover, these disabilities are most common among the elderly and, consequently, the patients of the sample tend to be seniors. Future studies that evaluate the USEQ with a younger population and/or with a homogeneous population with respect to their disabilities would be interesting, in order to compare their results to the results of this study.
With regard to the sample size, the main drawback of the study is that the sample is somewhat small for factor analytic studies. However, it was impossible to add more patients that fulfilled the inclusion-exclusion criteria. In the future, further studies with larger samples will allow the results of this study to be compared.
Based on the results obtained, we do not suggest any changes in the items of the USEQ. From an objective point of view, there are two main reasons for this. First, the results support a one-factor solution and adequate internal consistency. Second, the results indicate that all the items are necessary: the Cronbach alpha values if an item was deleted did not exceed the alpha of the complete scale for any of the six items, and the Cronbach alpha coefficient is below 0.9 (a high value of alpha may suggest redundancies and indicate that the questionnaire should be shortened). From a subjective point of view, the positive perception of the USEQ by the patients supports our suggestion.
In summary, the USEQ presented in this study is an easy-to-understand questionnaire that has an appropriate number of questions that are correlated with each other. The USEQ is a reliable and useful tool for properly evaluating the satisfaction of the user (which is part of usability).