Usability and Acceptance of Exergames Using Different Types of Training among Older Hypertensive Patients in a Simulated Mixed Reality

Abstract: Virtual and augmented reality (VR/AR) exergames are promising tools for increasing training motivation. However, the use of exergames with mixed reality (MR) headsets remains under-researched. Older adults with hypertension could also benefit from the increased training adherence associated with MR. Endurance and strength endurance exercises are recommended for this group to lower blood pressure. The aim of the preliminary study (n = 22) was to compare the usability and acceptance of two exergames, which represent two different training types: strength endurance training (SET) and endurance training (ET). The developed exergame prototypes were applied in "simulated MR" using a VR head-mounted display. We examined the following outcomes: usability (TUI), intention to use (TUI), subjective task load (NASA-TLX), frustration (NASA-TLX), and presence (PQ). The results showed that frustration was significantly greater in the ET than in the SET (p = 0.038). Presence was significantly higher in the SET (p = 0.002). No significant differences in usability and acceptance were found between the exergames. The results indicate that usability and acceptance are not related to the type of training when utilizing MR exergames. Whether the results are transferable to a real MR headset must be determined in further research.


Introduction
According to the results of the German Health Update (GEDA) 2014/2015 European Health Interview Survey (EHIS) study, nearly one in three German adults has medically diagnosed hypertension. Almost two-thirds of adults aged 65 and older (63.8% of women and 65.1% of men) in Germany are known to have high blood pressure [1]. Physical activity may be beneficial for both the prevention and treatment of hypertension [2]. Moderate-intensity endurance and strength endurance exercises are recommended for older adults with hypertension to lower blood pressure. Following the guidelines of the European Society of Cardiology and the European Society of Hypertension (ESC/ESH), hypertensive adults should perform at least 30 minutes (min) of dynamic aerobic exercise of moderate intensity five to seven days per week and additional resistance exercise on two to three days per week [2].
The BewARe project, a three-year research and development project funded by the German Federal Ministry of Education and Research (BMBF), aimed to develop a sensor-supported augmented reality (AR)/mixed reality (MR) system as a form of movement training for seniors with hypertension. The main goal was to enhance motivation and achieve adherence to exercise training since continuity of practice can be an obstacle for many older adults (aged 65 and older). To accomplish these goals, it was necessary to create a user-friendly, adaptive, personalized AR/MR exergame tailored to the target group. For this reason, a user-centered design (UCD) approach was followed in the project. As recommended in a systematic review by Duque et al. [3], it is advisable to involve elderly people from the initial steps onward, including collecting the essential specifications for the framework. The papers of Stamm et al. [4] and Vorwerg et al. [5] described the requirements for the system and movement training. In the current paper, the usability and acceptance of two types of training are compared using a functional VR prototype developed based on the requirements gathered previously. In the requirements analysis, Stamm and Vorwerg showed that AR exergames should focus on the endurance component, as well as other concepts, and adapt training to the user's health profile and fitness level through precise training control via heart rate monitoring. The older adults wanted an interactive exergame with motivating, playful elements. The results suggest that many seniors would prefer an AR system to a VR system because of the lower risk of falls. Furthermore, a requirement of the experts was that four to five 30-min training sessions be carried out per week. A virtual agent was the first solution to the requirement of a multiplayer exergame. Due to the existing limitations of AR/MR headsets, such as the field of view in the HoloLens, the consortium decided to implement a high-fidelity VR prototype that simulates an ideal MR.
Since the definition of MR is ambiguous in the scientific community, it is first necessary to define what is meant by it in this paper. Milgram and Kishino's reality-virtuality continuum [6] introduced the term "mixed reality". They defined MR as a subset of VR-related technologies that involve the merging of real and virtual worlds regardless of the environment. Skarbez et al. [7] revised Milgram and Kishino's reality-virtuality continuum, considering the new technological possibilities, and provided a new taxonomy for describing MR experiences with dimensions including the extent of world knowledge, immersion, and coherence. Speicher et al. [8] conducted expert interviews and a literature review from which six notions of MR were derived. As a consortium, we considered MR to be a "stronger" version of AR involving possible interactions between the user and the virtual objects as well as between the latter and the real environment, as described by Speicher et al. [8,9]. The developed high-fidelity prototype was applied in a room that was exactly replicated in VR, simulating the feeling of reality for the study participants and creating a "simulated MR" with a VR head-mounted display (HMD).
The motivation behind this preliminary study was to ascertain whether different types of training influence how older adults rate the usability and acceptance of an MR exergame. In this way, a better estimation of usability and acceptance can be made for an MR training program that includes both types of training. Based on the previously mentioned exercise guidelines for nonvirtual training in adults with hypertension, an MR training program would probably offer a maximum of one type of training per day to avoid the risk of an exercise-related acute cardiac event. If the usability is significantly lower for a certain training type, there is a risk of low adherence on that day, and the entire MR training program could be abandoned due to the decreasing level of motivation. To analyze the usability and acceptance of two exergame versions representing two different training types (strength endurance training (SET) and endurance training (ET)) in this preliminary study, the following hypotheses were developed:

Hypothesis 1 (H1).
There is no significant difference in usability between the guided SET exergame and the gamified ET exergame.
We tested Hypothesis 1 against the following alternative hypothesis.

Hypothesis 2 (H2).
There is a significant difference in usability between the guided SET exergame and the gamified ET exergame.

Hypothesis 3 (H3).
There is no significant difference in acceptance (intention to use, ITU) between the guided SET exergame and the gamified ET exergame.
We tested Hypothesis 3 against the following alternative hypothesis.

Hypothesis 4 (H4).
There is a significant difference in acceptance (ITU) between the guided SET exergame and the gamified ET exergame.

Hypothesis 5 (H5).
There is no significant difference in subjective task load between the guided SET exergame and the gamified ET exergame.
We tested Hypothesis 5 against the following alternative hypothesis.

Hypothesis 6 (H6).
There is a significant difference in subjective task load between the guided SET exergame and the gamified ET exergame.

Hypothesis 7 (H7).
There is no significant difference in the frustration experienced between the guided SET exergame and the gamified ET exergame.
We tested Hypothesis 7 against the following alternative hypothesis.

Hypothesis 8 (H8).
There is a significant difference in the frustration experienced between the guided SET exergame and the gamified ET exergame.

Hypothesis 9 (H9).
There is no significant difference in the presence perceived between the guided SET exergame and the gamified ET exergame.
We tested Hypothesis 9 against the following alternative hypothesis.

Hypothesis 10 (H10).
There is a significant difference in the presence perceived between the guided SET exergame and the gamified ET exergame.

Technology Acceptance Models
Technology acceptance models are used to explain how people utilize or accept a particular technology. Intention to use can be considered a predictor of behavior, according to Davis' technology acceptance model (TAM) [10]. The TAM is based on Fishbein and Ajzen's theory of reasoned action [11], in which behavioral intention is defined as an individual's subjective probability of performing a particular behavior [11,12]. According to Davis, there are three factors that affect technology acceptance: perceived usefulness, perceived ease of use, and attitude toward using. Over the years, there have been several extensions of the TAM to improve the predictive accuracy of technology use [13]. The most inclusive model is the unified theory of acceptance and use of technology (UTAUT), which integrates elements from eight different technology acceptance models [14]. It is common to adopt the TAM and the UTAUT to assess the intention of older people to adopt new technologies. Cechetti et al. [15], for example, formulated a questionnaire based on the TAM to evaluate a mobile health application for hypertension monitoring in order to improve user engagement. However, relevant psychological factors, as well as theories regarding aging, which are crucial in the actual employment of technologies, were not considered in these technology acceptance models. Kothgassner et al. [16] developed an instrument, the Technology Usage Inventory (TUI), which includes key psychological factors to better reflect accessibility, usability and acceptability and is thus intended for older adults. Longo [17] investigated the relationship between usability and mental workload and found that these are two non-overlapping constructs that can be used together to improve the prediction of human performance.

The Usability and Acceptance of HMD-Based VR Exergames in Older Adults
In a systematic review by Miller et al. evaluating the effectiveness and feasibility of the use of VR/gaming systems by older adults at home to enable physical activity, the authors assessed the evidence as weak with a high risk of bias. In studies analyzing feasibility, strong retention (≥70%) and adherence (≥64%) were reported [18]. Nevertheless, this systematic review must be put into perspective since rapid technical development means that completely different systems are available in the current commercial market. For example, Miller et al. did not consider HMD technology in 2014.
Tuena et al. [19] analyzed the usability, user experience (UX), and feasibility of VR clinical systems with older adults in a more recent systematic review. Despite a number of technical and interactional issues, the included studies showed that clinical VR systems have good usability and are well accepted, appropriate, effective, and useful [19]. However, the findings indicated that non-HMD systems are considered better for older people. Huygelier et al. [20] assessed the acceptance of VR HMDs with 76 older adults. The participants positively accepted and tolerated the VR HMD. The self-reported cybersickness was minimal and was not associated with VR exposure.
Nevertheless, the digital divide between younger and older adults may inhibit older adults from achieving the benefits of immersive technologies [21]. Older people are confronted with individualized challenges when using VR or AR. They vary widely in terms of their use of and attitudes toward technology. Therefore, developers of VR/AR software and hardware should consider individual needs and skills [22,23]. Physical and cognitive abilities can vary among older adults. Thus, mobility impairments, dementia and visual or hearing losses may occur. Especially in VR, these can lead to an increased risk of falls or limited use of the application. Therefore, it is important that these are considered in the development of VR/AR systems [23].
Initial analyses also showed a high acceptance and usability in relation to sports activities with a VR HMD system [24-26]. The Virtual Park, a VR-based system on a cycle-ergometer, aims at improving the performance of elderly patients during endurance exercise training. The training was found to be feasible and positively accepted by elderly people with chronic respiratory diseases. After a single session, the preliminary results for a group of eight patients demonstrated excellent usability and high acceptability [27]. A canoeing game was also tested with older people using a VR HMD. The results indicated that Canoe VR was received very well and can be utilized as an additional fitness tool [28]. Other types of training outside of the endurance category have been considered in only a few investigations involving older adults and HMD-based VR. Høeg et al. employed virtual reality-based high-intensity interval training for pulmonary rehabilitation in their feasibility and acceptability study [29]. The use of VR in their study did not lead to cardiovascular demands, higher dyspnea, tension or any serious adverse events or severe cybersickness.
A few works have examined physical activity over a longer period of time in multiple training sessions. In a pilot study by Shabbir et al. [30], 30 older adults tested nine VR applications that promote physical activities, among other tasks, for 15 min twice a week for six weeks. The older adults perceived VR as useful, easy to use, and an enjoyable experience. In an investigation by Chau et al. [31], training for 30 min three times a week for six weeks in VR was recommended for older individuals with disabilities. Most of the 135 participants accepted the VR training, and 65.2% provided positive comments.
In some works, immersive HMD VR systems have been compared with traditional video exercises. Kruse et al. [32] conducted a study with 25 older adults in order to compare a recorded 2D gymnastics video with an immersive VR exergame. The acceptance observed in the investigation suggests that a VR exergame can be a suitable alternative but not a replacement for 2D video-based exercise activities.

Exergames Using Holographic MR Headsets
Although there are already VR exergames that are used by older adults, it is difficult to identify any publications on exergames for MR headsets. Kunze et al. [33] described a new research field called "Superhuman Sports" as human augmentation, which combines the competitive and physical elements of a sport with technology to overcome the limitations of the real world. Kegeleers et al. [34] developed such a "Superhuman Training" application for AR, which is a multiplayer adventure shooter for the Microsoft HoloLens. Through social interaction, players work together to destroy an energy core. An initial survey showed that 7 out of 10 participants were satisfied with the level of physical exertion of the game. The game League of Lasers by Miedema et al. [35] is a combination of football and Pong in which two teams compete against each other while wearing HoloLens headsets. In a user study with 32 participants, the game was rated as fun to play, intuitive and immersive. Another multiplayer augmented reality game is VRabl [36], which is a dodgeball game based on two Microsoft HoloLenses. The first evaluation indicates that VRabl is also enjoyable and immersive for older adults.
An initial prototype exergame in an MR environment for rehabilitation was evaluated with ataxic patients [37]. The exergame is a HoloLens 2 application involving a pointing and reaching exercise to improve control over upper limbs. Currently, there is a dearth of exergames for holographic MR headsets, which is certainly related to the novelty of the technology and the limitations that still exist with MR headsets. Consequently, there is a lack of studies evaluating MR exergames in the context of rehabilitative interventions.

Data Already Published from the Study
Vorwerg et al. [38] presented a part of the data collected in this study and showed that both VR exergames can lead to a positive short-term effect on lowering blood pressure. Heart rate was measured in the training sessions via smartwatches. Furthermore, the rate of perceived exertion (RPE) and the NASA-TLX subscales of mental, physical, and temporal demand and effort were presented in that work. Stamm et al. [39] compared VR sickness (Simulator Sickness Questionnaire [40]) between a strength endurance-based and an endurance-based VR exergame. Significant differences were found only for the scale "oculomotor", which was higher for the endurance exergame. Buchem et al. [41] described the gamification design of the exergame and investigated the user experience of the endurance-based exergame with the User Experience Questionnaire [42]. The results showed scores of at least above average on all scales compared to the benchmark. The "stimulation" and "novelty" scales were classified as "good". In a further paper, Buchem [43] presented the design for rapport with virtual agents in a simulated mixed-reality environment.

Materials and Methods
The aim of the preliminary study was to evaluate the usability and acceptance of two VR exergame versions, which represent two different training concepts. The first exergame is a strength endurance training (SET) game, and the second is an endurance training (ET) game. Both exergames were developed as part of the BewARe project [44] for the physical training of older patients with hypertension. The VR exergame prototypes used "simulated" MR environments in order to complete an initial pilot study with older hypertensive adults in an ideal environment. Ethical approval was obtained from the Ethics Committee of the Charité-Universitätsmedizin Berlin (No. EA1/019/20), and the study was registered in the German Clinical Trials Register (DRKS-ID: DRKS00022881).

General
The pilot study applied a mixed-method approach and included quantitative and qualitative instruments for data collection. This paper focuses on the quantitative results. The investigation was conducted in 2020 in a VR setting in a mobile laboratory truck called the VITALab.Mobile.

Procedure
The participants were recruited via the internal sample database of the Geriatrics Research Group of the Charité Berlin or by flyers via a university for seniors and a facility for health sports. A personal telephone screening was conducted to clarify which volunteers matched the inclusion criteria. On the first visit, before the volunteers were included, the Tinetti test [45], which measures the risk of falling, was administered. Participation was only possible if there was no increased risk of falling. The following inclusion criteria were applied: (1) ≥65 years, (2) diagnosed with essential hypertension (stage I), (3) independent mobility, and (4) at least one fully completed training session. Participants who did not complete the first training session had to be excluded. The exclusion criteria were: (1) the existence of a legal guardianship, (2) mild cognitive impairment (MCI) or severe cognitive impairment (Telephone Interview for Cognitive Status, TICS < 33 points [46]), (3) dizziness, (4) severe visual impairment, (5) motion sickness, (6) medical conditions associated with a high risk of falling (stroke, Parkinson's disease), and (7) an increased risk of falling (Tinetti test < 24 points). All participants were informed about the study procedure and provided their written consent to participate.
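The screening rules above can be summarized as a small check. The following is only an illustrative sketch: the class and field names are hypothetical and not part of the study software, and inclusion criterion (4), a fully completed first training session, is omitted because it can only be assessed after enrollment.

```python
from dataclasses import dataclass, replace


@dataclass
class Candidate:
    """Hypothetical screening record; field names are illustrative only."""
    age: int
    essential_hypertension_stage_1: bool
    independent_mobility: bool
    legal_guardianship: bool
    tics_score: int                 # Telephone Interview for Cognitive Status
    dizziness: bool
    severe_visual_impairment: bool
    motion_sickness: bool
    high_fall_risk_condition: bool  # e.g., stroke, Parkinson's disease
    tinetti_score: int


def is_eligible(c: Candidate) -> bool:
    """Apply inclusion criteria (1)-(3) and exclusion criteria (1)-(7)."""
    included = (
        c.age >= 65
        and c.essential_hypertension_stage_1
        and c.independent_mobility
    )
    excluded = (
        c.legal_guardianship
        or c.tics_score < 33      # cognitive impairment threshold
        or c.dizziness
        or c.severe_visual_impairment
        or c.motion_sickness
        or c.high_fall_risk_condition
        or c.tinetti_score < 24   # increased risk of falling
    )
    return included and not excluded


# Example with made-up values (not study data).
volunteer = Candidate(
    age=70,
    essential_hypertension_stage_1=True,
    independent_mobility=True,
    legal_guardianship=False,
    tics_score=35,
    dizziness=False,
    severe_visual_impairment=False,
    motion_sickness=False,
    high_fall_risk_condition=False,
    tinetti_score=26,
)
print(is_eligible(volunteer))
print(is_eligible(replace(volunteer, tinetti_score=23)))
```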
The included participants undertook the same procedure on both visits. The only difference was the use of various "simulated" MR applications (ET and SET). First, the participants filled in the Technology Usage Inventory pre-test [16], which was the first part of the questionnaire (pre-testing) and had to be submitted before the intervention. During visit 1, the immersive tendencies questionnaire [47] was conducted before the task-based application. The participants were then briefed about the HMD. After that, the trackers were attached to their hands, and the VR headset was put on. The task-based application of the movement training in VR subsequently followed. All participants performed the training in the same order. On visit 1, they performed the strength endurance training application, and on visit 2, they performed the endurance training application. There was no more than one week between visit 1 and visit 2 for all the participants. Both applications consisted of five different training sequences (introduction, warm-up, training, cool-down, and evaluation). The training sessions lasted about 25 min each. After the applications, the following assessments were completed by the participants during both visits: the TUI post-test [16], the National Aeronautics and Space Administration (NASA) task load index [48] and the presence questionnaire [47]. Other assessments which are not described in this publication were also applied. The remaining assessments are mentioned in Section 2.4 and can be found in the publications named.

Technology Usage Inventory
The Technology Usage Inventory [16] is employed to assess technology-specific and psychological factors that contribute to the actual use of new technology. The instrument is based on the TAM and contains the following eight scales: curiosity, anxiety, interest, ease of use, immersion, usefulness, skepticism and accessibility. In addition, the procedure contains a scale measuring intention to use (ITU). The internal consistencies (Cronbach's alpha) of the eight scales range from α = 0.70 to α = 0.89 [16].
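For readers unfamiliar with the reliability coefficient reported here, Cronbach's alpha for a scale with k items can be computed from the item variances and the variance of the summed score. The sketch below uses the standard textbook formula with made-up ratings; it is not the analysis code behind the values cited from [16], and the function name is an assumption.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item ratings.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    n_respondents = len(item_scores)
    k = len(item_scores[0])
    if n_respondents < 2 or k < 2:
        raise ValueError("need at least two respondents and two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total score has zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up ratings for three respondents on a two-item scale.
ratings = [[1, 1], [2, 3], [3, 2]]
print(round(cronbach_alpha(ratings), 3))
```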

NASA Task Load Index
The NASA task load index (TLX) [48] is a subjective instrument for assessing workload. In our case, we applied the NASA-TLX to evaluate the task load during the use of task-based exergames. The NASA-TLX calculates an overall score based on a weighted average of ratings on six subscales: mental, physical and temporal demand, as well as performance, effort and frustration. The German version of the NASA-TLX showed a high internal consistency (Cronbach's α = 0.84) [49].
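As an illustration of the weighted average mentioned above, the sketch below follows the standard NASA-TLX procedure, in which each subscale is rated on a 0-100 scale and its weight is the number of times it was chosen in the 15 pairwise comparisons of the six subscales. The function name and the example values are assumptions for illustration and are not study data.

```python
SUBSCALES = (
    "mental", "physical", "temporal", "performance", "effort", "frustration"
)


def nasa_tlx_weighted(ratings, weights):
    """Overall NASA-TLX score as a weighted average of six subscale ratings.

    ratings: dict mapping subscale -> rating on a 0-100 scale
    weights: dict mapping subscale -> tally from the 15 pairwise comparisons
    """
    if set(ratings) != set(SUBSCALES) or set(weights) != set(SUBSCALES):
        raise ValueError("ratings and weights must cover all six subscales")
    if any(not 0 <= weights[s] <= 5 for s in SUBSCALES):
        raise ValueError("each weight must lie between 0 and 5")
    if sum(weights.values()) != 15:
        raise ValueError("weights must sum to 15 pairwise comparisons")
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / 15


# Made-up example for one participant.
example_ratings = {
    "mental": 40, "physical": 60, "temporal": 30,
    "performance": 20, "effort": 50, "frustration": 10,
}
example_weights = {
    "mental": 3, "physical": 5, "temporal": 2,
    "performance": 1, "effort": 4, "frustration": 0,
}
print(nasa_tlx_weighted(example_ratings, example_weights))
```

The frustration subscale analyzed separately in this paper corresponds to the single `frustration` rating rather than to the overall score.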

Immersive Tendencies Questionnaire and Presence Questionnaire
The immersive tendencies questionnaire (ITQ) [47] was developed to measure the tendency of individuals to be involved or immersed in virtual environments. The presence questionnaire (PQ), which was also developed by Witmer and Singer, measures the degree to which individuals perceive presence in a virtual environment. The ITQ consists of the following subscales: focus, involvement, emotions and game. The PQ includes subscales measuring realism, possibility to act, quality of interface, possibility to examine, self-evaluation of performance, sounds and haptic perception. The internal consistencies (Cronbach's alpha) for the ITQ and PQ were 0.75 and 0.81, respectively [47].

VITALab.Mobile
The study took place in the Living Lab VITALab.Mobile [50], which is a project associated with the BewARe project. The VITALab.Mobile (Figure 1) is a mobile VR/AR laboratory set up in a truck for case and field studies and clinical trials in extended reality (XR) research. During the evaluation, the VITALab.Mobile was placed on the campus of the Evangelisches Geriatriezentrum Berlin. Since, at the time of the study, the safe and effective use of MR training could not be guaranteed given the limitations of MR headsets, the consortium decided to use a high-fidelity prototype in VR. To achieve this as effectively as possible, the interior of the VITALab.Mobile truck was recreated in VR, here referred to as "simulated MR". Based on a training room (Figure 2), it contains various real training objects, which were also recreated in VR to enhance the feeling of immersion. The participants were supposed to experience the feeling that they had put on an MR headset through which they could see the exact same room.

Equipment
In the study, an HTC Vive Pro was used as a VR HMD in order to simulate an MR system, which will be a future use scenario. In addition to the headset, two different types of hand input devices were utilized. While in the SET exergame, the trackers were attached to the hands of the users, Valve Index controllers were employed in the ET exergame. The trackers in the SET exergame allowed the participants to interact with objects (dumbbells and a chair), which also had attached trackers. The Valve Index controllers, on the other hand, enabled finger tracking and thus grasping, releasing, and throwing in the ET exergame. The Valve Index Base Station 2.0 devices were positioned diagonally to each other under the ceiling of the truck. A Polar M600 smartwatch was utilized to monitor and record the heart rate during the exergame application.
[Figure caption fragment: the images (A,B) were taken independently of each other at different times; thus, e.g., the chair is not in exactly the same position in the images, although it was during the test.]

SET Exergame
The strength endurance training (SET) exergame included strength endurance-based exercises in the main part. Only the main part (excluding the introduction, warm-up, cool-down and training analysis) differed between the two exergames. The SET exergame represented guided instruction-based training. The participants performed five exercises ((A) squat, (B) overhead press, (C) lateral raise, (D) leg raise, and (E) heel raise), during which they remained in a static position (Figure 3). The trainer agent "Anna" instructed and supported the participants during the use of the exergames. First, the participants began the SET with a learning phase. In this stage, the participants were supposed to observe the virtual trainer during each exercise. The trainer demonstrated the execution and explained important hints about the posture for the current exercise. After Anna's demonstration, the participants were asked to repeat the exercises three times each. This was completed with all five exercises. Subsequently, in the exercise phase, a set with 20 repetitions was performed for each of the five exercises (with the exception of the leg raise involving 12 repetitions). Between the exercises, there was a short break of 20 s. During this active break, the participants walked around the room and touched colored bubbles, causing them to burst. To go beyond pure VR and simulate an MR, training objects (chairs and dumbbells) were equipped with trackers in the SET exergame. These could be moved freely in the space after the participants were prompted by Anna.
For time orientation, a time bar was placed behind Anna and advanced at Anna's pace during the exercises. A repetition counter was displayed to the left of Anna and was operated manually. The participants assumed that the repetitions were recognized automatically. In keeping with the Wizard of Oz methodology, this was not disclosed to the participants during the interventions. Later in the project, automatic repetition counting will be enabled through motion detection. When the participants looked at their wrists, they saw a virtual smartwatch in the same place as the one they wore in the real world. This allowed the live-streamed heart rate (HR) to be read. A green display meant that they were within the individual target HR range calculated for them; a blue one meant that they were below it, and a red one meant that they were above it. Behind Anna, the name of the exercise currently being performed was shown. Furthermore, the participants were told that voice control was possible, so they could say "pause" and Anna would recognize this. The Wizard of Oz methodology was again followed: the study staff paused the game when the command was given.
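The color logic of the virtual smartwatch can be sketched as follows. This is only an illustration, not the prototype's code: the function name `hr_zone_color` and the example range of 95 to 115 beats per minute are hypothetical, and treating the range limits as inclusive is an assumption. Only the three-color mapping is taken from the description above.

```python
def hr_zone_color(heart_rate: float, zone_low: float, zone_high: float) -> str:
    """Return the display color for a live heart-rate reading.

    zone_low and zone_high bound the individual target range
    (assumed inclusive; the limits themselves are hypothetical inputs).
    """
    if zone_low > zone_high:
        raise ValueError("zone_low must not exceed zone_high")
    if heart_rate < zone_low:
        return "blue"   # below the individual target range
    if heart_rate > zone_high:
        return "red"    # above the individual target range
    return "green"      # within the individual target range


# Hypothetical target range of 95-115 beats per minute
for bpm in (88, 104, 121):
    print(bpm, hr_zone_color(bpm, 95, 115))
```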

ET Exergame
The endurance training (ET) exergame represented a gamified training session (Figure 4) and included three endurance-based exercises: (A) a ball game, (B) high five and (C) hustle dance. In the ET exergame, the participants actively interacted with the trainer, in contrast to the SET one. During the ball game, the participants threw a virtual ball into a virtual ring held by the agent, Anna. Within the given time, Anna moved the ring into different positions. The participants caught the ball with the Valve Index controllers through a gripping movement of the fingers. By spreading the fingers in combination with a throwing motion, the ball was thrown out of their hands. In the high-five game, Anna held her hands in different positions. The participant had to mirror her movements and touch the circles on Anna's hands with the controllers. In the hustle dance exercise, the participants were taught various dance steps by the trainer, which they tried to imitate first without music and then with music. The participants learned a certain sequence of steps consisting of four dance elements: a basic step, a cross step, the Travolta and a mixer. The ball and high-five games lasted 2–3 min, and the hustle dance lasted 12 min. In the ball game and high-five exercises, a counter provided information about the number of ball hits and touches of the circles. As before, the Wizard of Oz method was used, and an automatic count was simulated. In addition, a time bar was displayed behind Anna during these two exercises. As with the SET exergame, the participants could also look at the smartwatch during the exercises.

Participants
The sample included 22 participants with essential hypertension. There were two dropouts due to motion sickness experienced in the first moments of using the VR headset. The sample was composed of 13 female and 9 male participants. The average age of the participants was 75.4 ± 3.6 years. No participant showed cognitive impairment, as measured with the Telephone Interview for Cognitive Status (TICS; 37.3 ± 2.6 points), or an increased risk of falls, as measured with the Tinetti test (27.6 ± 0.8 points). Ten participants had prior VR experience. Up to the time of the study, the participants performed sports 2.5 times per week on average.

Statistical Analysis
The data were analyzed with descriptive and inductive statistical methods using IBM Statistical Package for the Social Sciences (SPSS) Statistics 27. Descriptive data are presented as means (M), standard deviations (SD), medians (Mdn) and 95% confidence intervals (CI). In the inductive analyses, each dataset was examined for a normal distribution. When the data were normally distributed, a paired t-test was applied. If there was no normal distribution, the Wilcoxon signed-rank test was employed. For all tests, statistical significance was set at an alpha level of 0.05.
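To make the nonparametric branch of this procedure concrete, the following sketch computes a Wilcoxon signed-rank Z statistic with the usual normal approximation. It is not the SPSS implementation: the function name `wilcoxon_signed_rank_z`, the omission of a continuity correction, and the example scores are assumptions made for illustration. The normality check that selects between the two tests is not shown.

```python
import math


def wilcoxon_signed_rank_z(x, y):
    """Return (Z, two-sided p) for paired samples via the normal approximation."""
    # Zero differences carry no sign information and are dropped.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t  # used for the tie correction of the variance
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)  # smaller rank sum, so Z is never positive

    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p


# Hypothetical paired questionnaire scores, not study data
set_scores = [17, 18, 15, 20, 16, 19, 14, 18]
et_scores = [18, 16, 15, 21, 19, 17, 15, 14]
z, p = wilcoxon_signed_rank_z(set_scores, et_scores)
print(f"Z = {z:.3f}, p = {p:.3f}")
```

Because the smaller of the two rank sums is used, Z is zero or negative, which matches the sign convention of the Z values reported in the results.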

Technology Usage Inventory
To assess the level of technology acceptance (intention to use) and its specific aspects for the two exergame versions, the Technology Usage Inventory (TUI) was used. No significant differences were found in any of the subscales (Table 1). A Wilcoxon signed-rank test indicated that the median TUI post-test ranks of the VR-SET in the usability subscale, Mdn = 17.00, were not significantly different from the median TUI post-test ranks of the VR-ET in the same subscale, Mdn = 18.00, Z = −0.445, p = 0.656. Descriptively, the intention to use was higher for the VR-SET exergame version, Mdn = 147.00, than for the VR-ET, Mdn = 132.00. However, this difference was not significant, Z = −0.261, p = 0.794. Thus, acceptance did not differ between the guided instruction VR-SET and the gamified VR-ET exergame.

NASA-TLX
To determine the perceived task load, the NASA-TLX was administered (Figure 5). This paper covers the weighted rating and the adjusted ratings of the two NASA-TLX subscales relating to performance and frustration. The other subscales were reported by Vorwerg et al. [38]. The performance subscale measures how successfully the participants rate their performance of the task. The perceived performance level was significantly higher (p = 0.010) in the VR-ET (M = 139.55, SD = 76.31) than in the VR-SET (M = 102.50, SD = 73.11). The frustration subscale measures how insecure and discouraged (higher score) or secure and content (lower score) the participant felt during the task. The frustration experienced was also significantly higher (p = 0.038) in the VR-ET (M = 62.50, SD = 115.99) than in the VR-SET (M = 3.86, SD = 12.72). The weighted rating did not show any significant differences. A Pearson correlation coefficient was used to assess the linear relationship between the subjective task load (NASA-TLX weighted rating for the SET and ET) and the usability (TUI scale "Usability" for the SET and ET). There was no significant correlation between the NASA-TLX weighted rating and usability for the SET (r(20) = −0.33, p = 0.146) or for the ET (r(20) = 0.00, p = 0.987). Moreover, for neither type of training was a significant correlation found between the TUI scale "Intention to Use" and the NASA-TLX weighted rating, or between "Intention to Use" and the frustration subscale of the NASA-TLX.
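The relationship between the adjusted subscale ratings and the weighted rating can be illustrated with a short sketch. It assumes the standard NASA-TLX scoring procedure (raw ratings from 0 to 100, and weights from 15 pairwise comparisons, so each weight lies between 0 and 5 and the weights sum to 15); the text above does not spell out the scoring, and the function name and participant values are hypothetical.

```python
SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def nasa_tlx_scores(ratings, weights):
    """Return (adjusted rating per subscale, overall weighted rating)."""
    if sum(weights[s] for s in SUBSCALES) != 15:
        raise ValueError("weights must sum to 15 (one point per pairwise comparison)")
    # Adjusted rating: raw rating multiplied by the subscale weight (0-500).
    adjusted = {s: ratings[s] * weights[s] for s in SUBSCALES}
    # Weighted rating: sum of adjusted ratings divided by the 15 comparisons (0-100).
    weighted = sum(adjusted.values()) / 15
    return adjusted, weighted


# Hypothetical participant, not study data
ratings = {"mental": 40, "physical": 60, "temporal": 30,
           "performance": 25, "effort": 55, "frustration": 10}
weights = {"mental": 2, "physical": 5, "temporal": 1,
           "performance": 3, "effort": 4, "frustration": 0}
adjusted, weighted = nasa_tlx_scores(ratings, weights)
print(adjusted["performance"], adjusted["frustration"], weighted)
```

Under this scoring, a subscale that a participant never selects in the pairwise comparisons receives a weight of 0 and therefore an adjusted rating of 0, which is one way a subscale such as frustration can show a low mean together with a large standard deviation.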

Immersive Tendencies Questionnaire
The immersive tendencies questionnaire (ITQ) was administered prior to the task-based application during Visit 1. There were no significant differences between the sample (n = 22) and the reference sample (n = 94) in either the subscales of the ITQ or the total score (Table 2). The mean total score was M = 65.59 (SD = 10.75) out of a maximum score of 126.

Presence Questionnaire
To measure the degree to which the participants experienced presence in the virtual environment, the presence questionnaire (PQ) was used (Table 3). A Wilcoxon signed-rank test indicated that the median PQ ranks of the VR-SET in the total score, Mdn = 117.00, were significantly different from the median PQ ranks of the VR-ET, Mdn = 107.00, Z = −3.114, p = 0.002. Furthermore, significant differences were found in the following subscales of the PQ: realism (p = 0.008), possibility to act (p = 0.001), self-evaluation of performance (p = 0.046) and haptic perception (p < 0.001).
1 Without the "sounds" and "haptic" sub-scores. Results shown as arithmetic mean (standard deviation) and median (95% confidence interval). Total score maximum without "sounds" and "haptic": 126. Inter-group differences were calculated by Wilcoxon signed-rank test.
A Pearson correlation coefficient was computed to assess the linear relationship between the immersive tendencies (total score of ITQ) and presence (total score of PQ for the SET and ET). There was a significant negative correlation between ITQ and PQ for the SET (r(20) = −0.45, p = 0.037) and a weak, non-significant negative correlation between ITQ and PQ for the ET (r(20) = −0.16, p = 0.479). The finding of Witmer and Singer that high ITQ scores are associated with greater presence (positive correlation) could not be replicated with our exergames. Furthermore, a Pearson correlation coefficient was calculated to assess the linear relationship between the usability (TUI scale "Usability" for the SET and ET) and presence (total score of PQ for the SET and ET). There was a significant positive correlation between usability and PQ for the SET (r(20) = 0.48, p = 0.023) and a significant correlation between usability and PQ for the ET (r(20) = −0.56, p = 0.008). However, no significant correlation was found between the TUI scale "Intention to Use" and presence for either type of training.
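The notation r(20) gives the degrees of freedom of the correlation test, n − 2, which equals 20 for the 22 participants. The sketch below shows how r, the degrees of freedom and the associated t statistic are related; the function name `pearson_r` and the example values are hypothetical and are not study data.

```python
import math


def pearson_r(x, y):
    """Return (r, degrees of freedom, t statistic) for paired observations."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equally long samples with at least 3 pairs")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    r = sxy / math.sqrt(sxx * syy)
    df = n - 2  # reported in the text as r(df)
    # t statistic used to test r against zero; infinite for a perfect correlation
    t = r * math.sqrt(df / (1 - r ** 2)) if abs(r) < 1 else math.copysign(math.inf, r)
    return r, df, t


# Hypothetical scores for five participants
itq_total = [1, 2, 3, 4, 5]
pq_total = [2, 1, 4, 3, 5]
r, df, t = pearson_r(itq_total, pq_total)
print(f"r({df}) = {r:.2f}, t = {t:.2f}")
```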

Discussion
The aim of this pilot study was to compare the usability and acceptance of two simulated MR exergames, which represented two training types (strength endurance training and endurance training).

Principal Findings
Usability was measured with a scale derived from the Technology Usage Inventory. According to this scale, there was no significant difference between the SET and ET exergames. Therefore, Hypothesis 1 cannot be rejected, and the alternative Hypothesis 2 must be rejected, which implies that the two exergames are perceived as equally simple and easy to understand. Compared to the reference group in Kothgassner's TUI study, both VR exergames (SET and ET) were rated more highly in terms of usability (>65 years old (y.o.): M = 14.06) [16]. Preliminary investigations with small samples also demonstrated high usability in endurance training with immersive exergames [27]. One of the few papers that examined strength endurance in VR showed that overall, 59.4% of middle-aged adults and 37.5% of older adults were very or moderately willing to perform the VR dumbbell exercise. The group provided positive feedback about the usability [51].
Acceptance was also measured with the TUI, using the intention to use scale. There was no significant difference between the two exergames according to this scale. Hypothesis 3 therefore cannot be rejected, and Hypothesis 4 must be rejected. In Kothgassner's TUI study [16], similar scores were reached in the same age group (>65 y.o.: M = 112.43). Xu et al. [52] evaluated the technology acceptance model with older Chinese adults and VR exergames. The results showed that a more favorable attitude toward VR exergames was evident among older adults who were younger and retired and who had higher education, better financial resources and good health. These characteristics could be collected as participant background data in future research. In an investigation by Khundam [53], who did not examine older adults, the participants preferred aerobic exercise (arm swings) over strength exercise (squats) for a long period of exercise in VR. However, for a short period of exercise, they favored strength exercise.
The subjective task load was examined with the NASA-TLX. There were no significant differences between the two exergames in the weighted rating. For this reason, Hypothesis 5 cannot be rejected. The similar results of the weighted task loads confirm the comparability of the two exergames. This would be useful if the two types of training were applied in a joint exercise program, as a similar task load could then be assumed, and an overload could be avoided. Exercise programs in physiotherapy or sports medicine include a variety of training components, e.g., endurance, strength, mobility and coordination. Thus, both SET and ET could be offered as single courses in one therapy program for hypertensive people. Kruse et al. [32] also used the NASA-TLX but not the weighted and adjusted ratings. They reported the NASA-TLX as the mean value, and by calculating the mean value for our dataset, we obtained values of M: 28.22, SD: 12.41 for ET as well as M: 39.36, SD: 8.41 for SET, which in comparison to those of Kruse et al. (M: 19.77, SD: 11.91) indicates a higher task load for our SET and ET. These results were to be expected considering the exercises in the two studies.
The results for the NASA-TLX frustration subscale were significantly higher for the ET than for the SET exergame. This leads to a rejection of Hypothesis 7, which implies that we fail to reject the alternative Hypothesis 8. One possible explanation is that the gamified nature of the ET allows for a greater motivational impact as well as more action options and therefore more opportunities for potential errors, which may not occur as frequently in a guided instruction setting (SET). This could lead to increased frustration with the gamified ET exergame. According to the results of the qualitative interviews described in the paper by Buchem et al. [41], the dance sequence was too fast for some participants, and the steps were too difficult. This could explain why frustration was rated more highly in the ET. In the future, a practice phase for the dance steps, as well as an application over a longer period of time, are planned. In this target group especially, it is necessary to introduce the users slowly to new technology. Another aspect that may have contributed to the higher frustration in the ET was the difficulty in handling the Valve Index controllers. These recognize finger movements, but handling them requires habituation. The participants complained that they could not see their own hands in the ball game, which was an unfamiliar sensation. To reduce frustration, and thus possibly achieve higher usability, a longer learning period for the technology will be included in future studies. The participants also complained about latency when throwing the ball, which did not correspond to the speed in reality. This also needs to be improved for the future prototype.
The results of the presence questionnaire total score indicated a significantly higher perception of presence in the SET than in the ET exergame. Thus, Hypothesis 9 must be rejected, and we cannot reject the alternative Hypothesis 10. Witmer assumed that a significant positive correlation existed between immersive tendencies (ITQ) and presence (PQ) [47]. This could not be demonstrated with our sample. In the same sample analyzed by Stamm and Vorwerg [39], the participants reported more symptoms of VR sickness or cybersickness in the ET exergame than in the SET one. In their systematic review, Weech et al. [54] showed that the number of studies that found a negative correlation between cybersickness and presence outweighs those that identified a positive one. Our investigation is consistent with this, as presence was higher in the SET than in the ET, and VR sickness was higher in the ET than in the SET. Stamm and Vorwerg [39] assumed that the increased movement in space could be a cause of higher VR sickness in the ET. This could also explain why a lower level of presence was perceived during the ET. Since presence correlates with usability, the main finding for VR exergames would be that it may be beneficial to aim for high presence in exergames with older people, although no causality should be assumed.
Apart from the data collected using the named assessments, a motion analysis via the Microsoft Azure Kinect ran during the training. Motion data were collected for the next development stage, which should enable individual exercise feedback in a future prototype version. One of the purposes of the dashboard used was to determine the individual moderate exercise intensity zone. This dashboard provides the interface to the VR application and the Polar M600 smartwatch used. By recording the real-time heart rate through the smartwatch, the individual intensity zone could be displayed on the virtual smartwatch of the participants. In the dashboard, medical professionals can monitor the heart rate of the participants in retrospect or live on a monitor; the VR application could also be displayed in real time to, e.g., therapists or physicians. The integration of exercise feedback is intended as a further step towards the individualization of the training and may lead to better health-promoting effects in hypertension. The future prototype should be compared with the current study results, as the planned changes may have an impact on usability and acceptance.

Limitations
Technical problems involving the exergame prototypes in the test sessions have not yet been completely avoidable with the current state of the functional prototypes. For example, the base stations lost the tracking of the HTC Vive headset due to participants moving too forcefully in the truck. These technical issues could have had an impact on usability and could also be associated with increased frustration. For future testing, the truck should be supported by construction props. Due to the limited ceiling height in the truck, the base stations, which are responsible for tracking the headset and the trackers/controllers, had to be mounted in quite a low position. With taller participants, a brief loss of headset position was observed, which resulted in a black screen for a few seconds, or a short loss of tracking of the trackers/controllers when they were raised very high. This could potentially have affected usability and frustration assessments, even though the aforementioned phenomena did not occur frequently. Furthermore, the fixed order of the interventions could have had an influence on the results.
The experience with the VR system from the first day of the study could have had an influence on the handling and execution. This could be avoided by testing both training programs several times over a longer period of time or by randomizing the order of the training program type across participants. Since the training types differed in terms of the exercises, comparability was not easy. The training programs were different in content, included various controllers and training tools, and had slightly dissimilar intensity and duration. In addition, because only subjects with stage I hypertension were included, the results are not generalizable to subjects with other hypertension stages.
The Tinetti test, which includes a gait and balance examination, was used to rule out impaired mobility during walking. However, the range of motion of the shoulder joint was a problem in a few participants, which resulted in some tasks not being executed cleanly. For an extension of the prototype, an individual adaptation to the possible range of motion is important; otherwise, use is only possible for people without movement restrictions.

Conclusions
In the preliminary study, we compared the usability and acceptance of two exergames representing two training types, strength endurance training (SET) and endurance training (ET), in older adults with hypertension. The exergame prototypes developed were applied in "simulated MR" using a VR HMD. The results indicate that usability and acceptance are not related to the type of training when using MR exergames with older adults. The subjective task load also did not differ significantly between the two types of training in the simulated MR. Nevertheless, major dissimilarities were found in terms of frustration, which was significantly higher in the gamified ET. There were also large variations in relation to presence, which was perceived to be significantly higher in the SET. Moreover, ideas for future improvements became evident through the investigation. For example, a longer practice phase, especially for ET, would be useful and should lead to less frustration. The external environment is also important, so preventing the wobbling of the truck due to movements should eliminate tracking losses in the future and thus also lower the level of frustration and possibly increase the perception of presence. Whether the results are transferable with a real MR headset must be determined in further research with a larger sample.

Figure 1. The study took place in the VITALab.Mobile, a mobile VR/AR laboratory for case and field studies.

Figure 2. (A) Participant in the VITALab.Mobile with the real training objects: chair and dumbbells; (B) recreated interior of the VITALab.Mobile in VR with the virtual equivalents of the training objects and the virtual trainer agent Anna. Both images (A,B) were taken independently of each other at different times; thus, e.g., the chair is not in exactly the same position in the two images, although it was during the test.

Figure 5. Results of the adjusted ratings of the subscales and the weighted rating of the NASA-TLX. The white columns represent the strength endurance training, and the gray ones represent the endurance training. Significance: * p < 0.05, ** p < 0.01, *** p < 0.001.

Table 1. Results of the Technology Usage Inventory (TUI) for the strength endurance training (SET) and endurance training (ET).

Table 2. Results of the immersive tendencies questionnaire (ITQ) and a reference sample from the Université du Québec en Outaouais (UQO) Cyberpsychology Lab. Reference sample: Canadian participants tested by the UQO Cyberpsychology Lab, Gatineau (Québec). Results shown as arithmetic mean (standard deviation). Inter-group differences were calculated by Wilcoxon signed-rank test. Effect size was determined by Cohen's d.

Table 3. Results of the presence questionnaire (PQ) for the strength endurance training (SET) and endurance training (ET).