Preliminary Study of Efficacy and Safety of Self-Administered Virtual Exposure Therapy for Social Anxiety Disorder vs. Cognitive-Behavioral Therapy

Social anxiety disorder (SAD) is one of the most frequent mental disorders. Exposure in virtual reality (VR) can complement standard cognitive-behavioral therapy (CBT) or can be used as an independent therapeutic tool. The study's objective was to assess the safety and efficacy of self-administered virtual reality exposure vs. CBT and CBT with virtual exposure. We assessed the efficacy of the applied intervention with the Liebowitz Social Anxiety Scale (LSAS). We compared three groups: CBT (n = 25), CBT + VR (n = 29), and self-administered therapy without the aid of a therapist (n = 19). The results indicated that all three groups showed changes on the LSAS. The simple effect analysis showed that there were no differences between experimental conditions at T0 (session 1) and T1 (session 9) and that the only significant difference occurred at T2 (session 14). The pairwise comparisons showed that the participants in the VR condition scored higher on the LSAS at T2 than participants in the CBT condition. Our study has several limitations. This initial study shows that the established CBT methods for social anxiety are effective, while the VR tool for self-therapy requires further research.


Introduction
Social anxiety disorder (SAD) is one of the most frequent mental disorders, with 7-9% of the general population being affected [1]. It most often starts in early adulthood and has a chronic course [2]. The vast majority of people report symptoms before 18 years of age [3,4]. SAD is characterized by a marked and persistent fear of one or more social situations. The anxiety is related to the fear of being criticized and is disproportionate to the real threat posed by the situation. Patients often avoid social stimuli or experience severe anxiety. To be diagnosed with social anxiety, a patient should report the symptoms for longer than 6 months [5]. The disorder is associated with impairment in vital domains of daily life, such as occupational/academic and family functioning, relationships, and social activities [6]. Individuals with social phobia, particularly those with the generalized subtype, often show a high degree of comorbidity with other anxiety and affective disorders [7] as well as with alcohol abuse [8]. Usually, social phobia precedes the onset of comorbid conditions. It is a serious social and clinical problem, and people experiencing its symptoms also report a lower quality of life [9]. Cognitive-behavioral therapy (CBT) is a proven and well-documented treatment for social phobia [10][11][12]. Following the early formulation of cognitive models for social anxiety disorder by Clark and Wells [13] and Rapee and Heimberg [14], a wide range of CBT protocols have been developed. Though most of these have proven effective, the most researched treatment is a combination of exposure and cognitive restructuring [15]. Treatment approaches to SAD include cognitive behavior therapy [16,17], exposure group therapy [18], in vivo exposure therapy [19], and recently virtual reality exposure therapy (VRET) [20].
The use of in vivo exposure is based on models of fear development that implicate the learned nature of particular fears and the instrumental role that avoidance plays in maintaining anxiety. In the exposure treatment of social anxiety disorder, the patient develops an exposure hierarchy, or a list of feared situations, which ranges from situations provoking moderate to extreme anxiety. Using this hierarchy, patients are encouraged to systematically expose themselves to their feared situations and to stay in the situation until their anxiety has subsided [21].
Several published meta-analyses have examined CBT for social anxiety disorder. Some compared exposure plus cognitive restructuring with exposure-only treatments for SAD. In these analyses, the two types of CBT were similarly effective, but a higher number of exposure sessions was related to a better outcome [22,23]. The most recent meta-analysis of CBT for SAD included 32 RCTs with a pooled total of 1479 participants. The authors found that CBT produced better posttreatment outcomes than wait-list, psychological placebo, or pill placebo [24]. There are three approaches to implementing exposure therapy in social anxiety disorders: in vivo exposure, imagery exposure, and virtual reality exposure.
Virtual reality (VR) has become an interesting alternative for the treatment of SAD and can constitute an alternative to in vivo and imagery exposure. VR provides a human-computer interaction that allows patients to feel a sense of presence and immersion in a virtual environment, offering an opportunity to expose clinically anxious individuals to realistic life scenarios, thereby reducing their reactivity to anxiety-provoking cues. The impact of VR technologies is discussed in many studies and meta-analyses [25][26][27][28]. Powers et al. were the first to demonstrate in a healthy sample that a virtual reality conversation task led to a similar increase in feelings of anxiety in participants as an in vivo conversation task [29]. For this and many other reasons, the virtual social environment is being used more and more often. VR exposure can be less time-consuming, requires less work in the organization of an exposure situation, and gives the therapist control over the context and intensity of the exposure [30]. Research results are divergent as to whether virtual exposure is as effective as in vivo exposure [31]. The Chesham meta-analysis reported a significant effect of VRET for SAD in comparison to the waiting list and no difference between VRET and CBT [32]. Wechsler et al. published a meta-analysis on RCTs, specifically comparing the efficacy of VRET to in vivo exposure in anxiety disorders [33]. The comparison revealed a small but nonsignificant effect size favoring in vivo exposure. However, some results suggest the opposite. For example, the study conducted by Bouchard et al., which involved two active conditions (VR and in vivo exposure) and a control group (waiting list), showed that VR exposure was more effective and that the treatment effects were still measurable during the follow-up six months after the completion of the study [34]. The main limitation of the study was the small size of the groups. 
Thus, assessing the efficacy of virtual exposure compared to in vivo exposure requires further investigation.
The patient's involvement in the virtual exposure, and thus the impact of this reality on the habituation process, depends on many factors, including those related to the technological advancement of the virtual environment and factors related to the patients themselves [35]. In the literature to date, there are preliminary studies on the use of therapeutic programs with virtual reality but without the aid of an actual therapist in the treatment of social phobia [36]. Those studies were primarily exploratory research. In one such study, there was a system of home autotherapy in which patients used virtual reality exposure and the support of an e-therapist [37]. Such methods can be treated as innovative approaches to therapy requiring additional efficacy evaluation in clinical studies. The various benefits patients can derive from social exposure in virtual reality as autotherapy are very interesting and helpful in therapeutic practice. We would like to explore the safety, efficacy, and usefulness of the tool for self-administered VR exposure compared to the standard therapeutic approach to a social anxiety disorder.
The aim of this study was to evaluate the effectiveness and safety of the self-exposure tool in virtual reality in patients diagnosed with social phobia in comparison to active groups (CBT and CBT + VR). The efficacy was assessed using the Liebowitz Social Anxiety Scale, and the safety assessment focused on the occurrence of simulator sickness symptoms during exposure. The self-administered VR exposure is a tool that is operated by the patient without the support of a therapist.

Design of the Study
The study was a randomized, open-label, single-blind, controlled trial. The study was preregistered at ClinicalTrials.gov under number NCT03895957 and was approved by the Bioethics Committee of the Regional Medical Chamber in Warsaw (KB/1214/19). We compared the efficacy of the therapeutic effects in three parallel groups:

1. A self-administered VR group (experimental group), where the patients were gradually exposed to social situations in virtual reality. The patients had several exposures at their disposal in VR, which they selected themselves, taking into account the severity of the anxiety. The patients underwent the therapy without any kind of help or intervention from a therapist.
2. A CBT + VR group (experimental group), where virtual reality was used in the CBT protocol. Within this arm, a therapy analogous to that which took place in the CBT group was carried out, while the exposure in virtual reality replaced the exposure in the patient's imagination.
3. A CBT group (active control group) in which work was based on a cognitive-behavioral therapy protocol.
For the purposes of the therapy, a therapeutic protocol was prepared, on the basis of which cognitive-behavioral therapists conducted their sessions. The therapy was based on the Clark and Wells model (1995). In the CBT arm, the protocol assumes that exposure to social situations takes place in the patient's imagination.

Participants
We planned our sample size based on the results of Yoshinaga et al. [38]. We estimated an effect size of 30 points (SD = 30) on the Liebowitz Social Anxiety Scale [39], our main efficacy measure. The results of a power analysis for a repeated-measures analysis of variance conducted in G*Power (v. 3.1; f = 0.4, α = 0.05, power 1 − β = 0.8) showed that we had an 81% chance of correctly rejecting the H0 of no significant effect of the interaction (time × group) with a total sample of 78 participants (26 participants per group). We assumed a 15% dropout rate and estimated our required sample size to be 30 participants per condition.
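The arithmetic behind the recruitment target can be sketched as follows. This is an illustration only: the function names are hypothetical, and the rule of adding the expected dropout share on top of the required n is an assumption that happens to reproduce the reported figure of 30 per condition (dividing by the expected retention, 26 / 0.85, would give 31 instead).

```python
import math


def standardized_difference(mean_diff: float, sd: float) -> float:
    """Expected raw difference expressed in SD units (assumed planning quantity)."""
    return mean_diff / sd


def inflate_for_dropout(n_per_group: int, dropout_rate: float) -> int:
    """Recruitment target per group: required n plus the expected dropout share.

    Assumption: the target is n * (1 + dropout_rate), rounded up.
    """
    return math.ceil(n_per_group * (1 + dropout_rate))


# Values taken from the text: 78 participants in total across 3 groups,
# an expected 30-point difference with SD = 30, and 15% dropout.
n_required_per_group = 78 // 3
d = standardized_difference(30, 30)
target_per_group = inflate_for_dropout(n_required_per_group, 0.15)

print(n_required_per_group, d, target_per_group)
```

The power figure itself (81%) comes from G*Power and is not recomputed here.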
All participants were Polish. The participants were recruited via social media, e-mails, and a dedicated website (www.tomorrow.pro/vrmind, accessed on 12 August 2022), and they were then qualified for the study on the basis of the inclusion and exclusion criteria. The inclusion criteria were: age (18-50 years old), confirmed SAD (diagnosis via DSM-IV-TR criteria) lasting for at least two years, stable pharmacological treatment (no change in pharmacological therapy during the three months prior to the study), and signed informed consent. The exclusion criteria were: psychoses, bipolar affective disorder, mental disability, pregnancy, addictions, attending a therapeutic session under the influence of alcohol, (ongoing) treatment by a neurologist for chronic CNS disease, epilepsy, seizure dizziness, the presence of (current) suicidal thoughts, tendencies, or attempts, and currently undergoing CBT therapy. The participants were randomly assigned to one of the groups (CBT, CBT + VR, or VR) based on block randomization. Randomization took place using computer software. Randomization was carried out after the end of the first visit (T0), which consisted of a mental condition assessment.
See Figure 1 for an overview of the randomization procedure, and for the sample characteristics per condition, see Table 1. Thirty-nine men and fifty-two women participated in the study. There were no age differences within our sample based on gender (F(1,84) = 0.13, p = 0.71, ηp² = 0.00), experimental condition (F(2,84) = 0.04, p = 0.95, ηp² = 0.00), or the interaction of these factors (F(2,84) = 2.55, p = 0.08, ηp² = 0.05) (one person from the VR arm did not provide information about age). The proportion of gender did not differ by condition (χ²(2, N = 91) = 0.01, p = 0.99).
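Block randomization as mentioned above can be illustrated with a minimal sketch. The text does not report the block size, the software, or the seed, so the block size (one slot per arm), the function name, and the seed below are all assumptions made for illustration.

```python
import random

ARMS = ("CBT", "CBT + VR", "VR")


def block_randomize(n_participants, arms=ARMS, slots_per_arm=1, seed=None):
    """Return an allocation list built from shuffled, balanced blocks.

    Each block holds `slots_per_arm` slots for every arm (assumed block
    design), so group sizes stay balanced after every completed block.
    """
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n_participants:
        block = list(arms) * slots_per_arm
        rng.shuffle(block)  # random order within the block
        allocation.extend(block)
    return allocation[:n_participants]


allocation = block_randomize(90, seed=2019)  # seed is arbitrary
counts = {arm: allocation.count(arm) for arm in ARMS}
print(counts)
```

With 90 participants and blocks of three, every arm receives exactly 30 allocations regardless of the seed; only the order within each block varies.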

Measures
We used a Polish adaptation of the Structured Clinical Interview for Axis I Disorders DSM-IV-TR [40] as the main screening tool for confirming SAD. SCID-I is a semi-structured clinical interview aimed at diagnosing Axis I disorders according to the DSM-IV-TR classification.
We used the Clinical Global Impression Scale (CGI) [41] to assess the severity of SAD. CGI is a clinician-completed, single-item, seven-point scale used to evaluate the severity of illness, where one is labelled as "Normal, not at all ill" and seven is labelled as "Among the most extremely ill patients". In addition to the CGI, we also used the Patient Global Impression Scale (PGI), where the patient describes his or her subjective experience of current well-being and functioning (where one is labelled as "Normal" and four is labelled as "Severe"). The Liebowitz Social Anxiety Scale [39] was used as the primary outcome measure. The scale is composed of 24 items depicting various social situations. For each item, participants assess their fear (from 0, "No fear", to 3, "Severe") and avoidance (from 0, "Never", to 3, "Usually"). The LSAS has three scores, summing up the results for the particular items: fear (0-72), avoidance (0-72), and the total score (0-144). The internal consistency of this scale in our study was excellent (Cronbach's α = 0.87-0.97).
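The LSAS scoring described above can be sketched in a few lines. The sketch assumes that each of the 24 items is rated from 0 to 3 on both subscales, which is what the stated subscale ranges (0-72) imply; the function and variable names are hypothetical.

```python
from typing import Dict, Sequence

N_ITEMS = 24     # number of social situations on the scale
MAX_RATING = 3   # assumed per-item maximum (24 items x 3 = 72 per subscale)


def lsas_scores(fear: Sequence[int], avoidance: Sequence[int]) -> Dict[str, int]:
    """Sum item ratings into the fear, avoidance, and total LSAS scores."""
    for name, ratings in (("fear", fear), ("avoidance", avoidance)):
        if len(ratings) != N_ITEMS:
            raise ValueError(f"{name}: expected {N_ITEMS} ratings, got {len(ratings)}")
        if any(r < 0 or r > MAX_RATING for r in ratings):
            raise ValueError(f"{name}: ratings must lie between 0 and {MAX_RATING}")
    fear_total = sum(fear)
    avoidance_total = sum(avoidance)
    return {
        "fear": fear_total,
        "avoidance": avoidance_total,
        "total": fear_total + avoidance_total,
    }


# Made-up ratings for illustration only (not study data).
example = lsas_scores([2] * 24, [1] * 24)
print(example)
```

The range check makes a mis-keyed questionnaire fail loudly instead of silently producing an out-of-range score.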
The widely used Beck Depression Inventory (BDI) [42], Clinician Global Impressions of Improvement (CGI-I), and Patient Global Impression of Change Scale (PGICS) [41] were applied as secondary outcome measures. CGI-I is a clinician-completed scale that measures the change in the severity of symptoms, ranging from 1, labelled as "Very much improved", to 7, labelled as "Very much worse". The PGICS is a patient-completed scale that measures the change in the severity of symptoms, where one is labelled as "No change" and seven is labelled as "Very much improved". Finally, the BDI is a 21-question multiple-choice self-report inventory for measuring the severity of depression. Each answer is scored on a scale value of 0 to 3. A higher total score indicates more severe depressive symptoms. In our sample, the BDI reliability was excellent (Cronbach's α = 0.88-0.93).
To assess the severity and occurrence of simulator sickness, we used the Simulator Sickness Questionnaire (SSQ) [43]. The questionnaire contains 26 items. Participants assess the occurrence and severity of each symptom with four labels (0-"none", 1-"slight", 2-"moderate", and 3-"severe"). Individual scores were corrected for baseline (pre-VR) symptom severity: only items whose rating increased from the pre-VR to the post-VR measurement contributed to the final SSQ score. A total SSQ score equal to or higher than ten was used as a preliminary cut-off for simulator sickness [44].
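The baseline correction described above can be illustrated with a short sketch. It assumes an unweighted sum of item-level increases and a cut-off of ten or more; the function names and the example ratings are hypothetical and not taken from the study data.

```python
from typing import Sequence


def ssq_increase_score(pre: Sequence[int], post: Sequence[int]) -> int:
    """Sum of item-level increases from the pre-VR to the post-VR rating.

    Items whose rating stayed the same or decreased contribute nothing.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must rate the same items")
    return sum(max(after - before, 0) for before, after in zip(pre, post))


def meets_sickness_cutoff(pre: Sequence[int], post: Sequence[int], cutoff: int = 10) -> bool:
    """Flag a session whose baseline-corrected score reaches the cut-off."""
    return ssq_increase_score(pre, post) >= cutoff


# Made-up ratings for four items (0 = none ... 3 = severe).
pre_ratings = [0, 1, 2, 0]
post_ratings = [3, 0, 2, 3]
score = ssq_increase_score(pre_ratings, post_ratings)
print(score, meets_sickness_cutoff(pre_ratings, post_ratings))
```

Here the second item improved and the third did not change, so only the first and fourth items add to the score.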
In addition to the above, we gathered information about the use of the VR system. The computer system recorded information about each VR scenario, including its duration and the participant's speaking time (active input on a microphone during periods when participants were asked to speak). The VR system asked the participant to rate his/her subjective units of distress (SUD; 0-100) before and after each exposure (after the exposure, participants provided feedback on their actual state as well as on the maximal perceived SUD during the exposure). SUD is a subjective measure of perceived fear in a certain situation. The scale range is from 0 ("Totally relaxed") to 100 ("Highest distress/fear/anxiety/discomfort that you have ever felt"). The scales used in the study are presented in Table 2.
Brain Sci. 2022, 12, 1236

Treatment
The study used proprietary software called VR Mind™ and VR Mind+™, which is used with a virtual reality headset. The software consists of nine therapeutic scenarios (job interview, public speaking in an auditorium, speaking at a meeting in a conference room, purchasing a ticket at a railway station, restaurant visit, telephone call in a public place, train compartment, returning goods in a shop, and a social call) and a training scenario (learning controls, movement, etc.). Each therapeutic scenario has three levels of difficulty.
In the VR and CBT + VR arms, the following equipment was used: HTC VIVE VR goggles with 6 degrees of freedom (6-DOF), a viewing angle of over 100 degrees, a viewing width of 110 degrees, and a refresh frequency of 90 Hz. The computer had a GTX 960 graphics card, 8 GB of RAM, Windows 10, a 256 GB disk, and a motherboard compatible with an Intel i7 9600K processor.
The therapy in the CBT and CBT + VR group was conducted by certified cognitive behavioral therapists.
Prior to the study, the therapists received 10 h of training on how to conduct the therapy according to the protocol. Then, throughout their work with the patients, they took part in supervision (twice a month, 4 h each), which served to assess the compliance of the sessions with the protocol. During the study, events and side effects of the interventions (CBT, CBT + VR, and VR) were monitored. In the CBT control group and the experimental CBT + VR group, the intervention was based on a protocol created on the basis of the Clark and Wells model, and therapeutic sessions were held twice a week, each lasting 45 min (slightly shorter than a standard CBT session). In total, the patients had 12 therapeutic sessions (Table 3).
Due to the shorter duration of the therapy, we decided to omit two typical elements of the therapy present in the CBT model: experiments checking social standards and the use of video recordings during the experiments.

Procedure
After entering the laboratory, the participant was asked to become familiar with basic information about the research and to sign the informed consent. The clinical evaluation was performed by an independent clinician, who confirmed the SAD diagnosis with the SCID and filled in the CGI. In the next step, the participant completed a series of questionnaires in a fixed order: LSAS, BDI, PGI, and SSQ-PRE. The physician did not know which group the patient would be assigned to. After the therapy, the patients were re-examined by the clinician, who was not informed which group the patient belonged to. Then, the participant took part in the first VR training session, where he/she became familiar with the VR environment. Shortly after finishing the training session, the participant filled in the SSQ-POST. At the end, the computer software randomly assigned the participant to one of the conditions. The duration of the first session was 90 min. The subsequent sessions, from 2 to 13, were held according to the session plan of the therapeutic protocol for each arm (Table 2). Each session lasted 45 min. In the CBT + VR and VR arms, the SSQ was administered during each session. In the CBT arm, the SSQ was administered during sessions 7, 8, 10, 11, and 12. After the 9th session (T1), patients were re-examined with the Liebowitz Social Anxiety Scale. The last session (T2) was the final assessment of the effectiveness of the interventions. During this meeting, the patients filled in the following questionnaires: LSAS, BDI, PGI-CS, and CGI-I.

Statistical Analysis
The results were calculated with the jamovi statistical package and the GAMLj library [39]. Due to the high dropout rate in the VR group (36%), we decided to switch from a traditional within-subjects ANOVA to mixed models, as they are more suitable for longitudinal data with missing values and for the unbalanced design of our main efficacy measures. We used the Satterthwaite method to estimate the degrees of freedom. The dependent variables followed normal distributions, as did the model residuals [45].

T0-The Severity of SAD Symptoms
Before the main analysis, we checked if there were any differences at T0 between the conditions in the severity of SAD symptoms and the participants' everyday functions. The analysis of variance showed no differences in the CGI and PGI scores (see Table 4 for an overview of the comparisons and descriptive statistics).

Efficacy
We built a mixed model with time of measurement as a fixed factor and a random intercept for the total LSAS score using the Satterthwaite method for degrees of freedom. The model revealed a significant effect of session (F(2,149.31) = 54.71, p < 0.001). The estimated coefficient showed a reduction of −23.73 (95% CI: −28.23 to −19.23) points in the total LSAS score between the first and the last sessions.
In the second step, we added experimental condition as a fixed factor, and in the third step we allowed for the interaction of both variables. The main effect of condition was nonsignificant (F(2,87.96) = 1.04, p = 0.355), but the interaction between time of measurement and condition showed significant differences in the total LSAS score (F(4,144.9) = 2.65, p = 0.035). The significant interaction, jointly with the ratios of the marginal and conditional pseudo-R² values and the AIC criterion, supported the model with main effects of condition and time of measurement and their interaction (see Tables 5 and 6 and Figure 2).
Table 5. Means and standard deviations for time of measurement (T0, T1, T2) of primary and secondary efficacy outcomes.
The simple effect analysis showed that there were no differences between experimental conditions at session 1 (F(2,122.56) = 0.35, p = 0.700) and session 9 (F(2,135.32) = 0.21, p = 0.808) and that the only significant difference occurred at session 14 (F(2,133.9) = 4.15, p = 0.002). The pairwise comparisons showed that the participants in the VR condition scored higher on the LSAS at session 14 by 19.76 (SE = 7.25) points than participants in the CBT condition (t(153.271) = 2.74, p = 0.007). There were no other significant differences between conditions.
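The final model described above can be written out as a random-intercept mixed model. The notation below is introduced purely for illustration and is not taken from the analysis output; in particular, the choice of T0 and CBT as reference levels is an assumption.

```latex
\begin{equation}
\mathrm{LSAS}_{ij} = \beta_0 + u_i
  + \sum_{t=1}^{2} \beta_t \, T_{tj}
  + \sum_{c=1}^{2} \gamma_c \, C_{ci}
  + \sum_{t=1}^{2} \sum_{c=1}^{2} \delta_{tc} \, T_{tj} C_{ci}
  + \varepsilon_{ij},
\qquad
u_i \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\end{equation}
```

Here $\mathrm{LSAS}_{ij}$ is the total score of participant $i$ at measurement $j$, $T_{1j}$ and $T_{2j}$ are indicators for T1 and T2, $C_{1i}$ and $C_{2i}$ are indicators for the CBT + VR and VR conditions, and $u_i$ is the participant-specific random intercept. Dropping the $\gamma_c$ and $\delta_{tc}$ terms gives the first-step model with time of measurement only, and dropping only the $\delta_{tc}$ terms gives the second-step model. The four $\delta_{tc}$ coefficients correspond to the four numerator degrees of freedom of the reported interaction test.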

[Figure legend: CBT, CBT + VR, VR]
In the next step, the LSAS subscales were analyzed to determine the source of differences in the LSAS total score. We built a separate mixed model for each subscale. The anxiety subscale results were similar to the results obtained on the LSAS total score. For the avoidance subscale, the pattern of results was also similar. However, the interaction between time of measurement and condition did not meet the conventional significance threshold (Figure 3).

Safety Assessment
In order to assess safety, we used the SSQ. For each session in which the level of simulator sickness was measured, the difference between the second and the first SSQ measurement was calculated (measurement II minus measurement I). We summed up all positive differences, and sessions with a sum above ten were classified as involving simulator sickness [34]. According to the adopted algorithm, out of 973 therapeutic sessions in which the SSQ measurement was performed, there were 11 results indicating the occurrence of symptoms characteristic of simulator sickness. The results indicated that the symptoms of simulator sickness occurred in three people in the VR arm and one person in the CBT + VR arm. Additionally, four people in the CBT condition reported symptoms similar to simulator sickness. In seven out of eight people, the symptoms of simulator sickness appeared once. One person in the VR group experienced four episodes. However, this did not result in withdrawal from the therapy. One male in the VR condition who experienced an episode of simulator sickness in session #3 did not appear in session #4 and withdrew from the study. Due to the low number of cases with simulator sickness symptoms, no further statistical analysis was conducted.

Secondary Endpoint-Efficacy
According to the assumptions of the study, secondary endpoints were also determined, and these results were obtained using Clinical Global Impression Improvement (CGI-I), Patient Global Impression of Change (PGIC), and the Beck Depression Inventory (BDI).
In the case of the CGI-I and PGIC measurements, we used a one-way analysis of variance with a three-level factor, the experimental condition (CBT vs. CBT + VR vs. VR). The results of the ANOVA for CGI-I (1, "Very much improved", and 7, "Very much worse") showed a significant effect of the experimental condition (F(2,70) = 6.49, p = 0.003; Table 6).
Finally, we checked if the groups differed on the BDI scores before and after the intervention. To this end, we built a mixed model for the BDI total score with session as a fixed factor (session 1 vs. session 14) and a random intercept. The result showed a significant effect of session (F(1, 90) = 56.6, p < 0.001), with an intercept estimate equal to 13.2 (95% CI: from 11.36 to 15.03) and an estimated reduction of −7.5 points (95% CI: from −9.54 to −5.06) between session 1 and session 14. Adding group and the group × session interaction did not reveal any other significant effects. This means that the BDI total score reduction was comparable across experimental conditions.

VR System
In the VR condition, 30 participants launched a total of 654 scenarios. The mean time of a single exposure was 7.42 min. The three most frequently selected scenarios were: public speaking in the auditorium (112), job interview (88), and buying a ticket at a train station (68). The mean speaking time was 1.65 min. The average subjective units of distress were equal to 18.12 before exposure, 18.37 after exposure, and 23.63 at maximum during the exposure. In comparison, in CBT + VR, 31 participants launched a total of 184 scenarios. The mean time of VR exposure was 7.85 min. The most frequent scenarios were public speaking in the auditorium (52), job interview (32), small talk meeting (28), and speaking in a conference room. The average subjective units of distress were equal to 29.70 before exposure, 28.60 after exposure, and 40.65 at maximum during the exposure.

Discussion
Studies on the use of virtual reality in patients with social phobia are increasingly common. The available meta-analyses are ambiguous. One of them indicates that VRET (virtual reality exposure therapy) is an acceptable method of social phobia therapy and brings a long-lasting effect, although the long-term effectiveness of VRET may decrease compared to in vivo and in sensu exposure [46]. In his review, Emmelkamp indicated that there are no significant differences between CBT and VRET, but there is a need for further studies to assess the effectiveness of VRET as a stand-alone therapy [20]. On the other hand, the Chesham study shows that there are no differences between VRET and in vivo and in sensu exposure, which makes these types of exposure equivalent in the treatment of patients with social phobia. The study by Zainal et al. evaluated the effectiveness of VRET vs. the waiting list, and this study favors VRET [47]. Pelissolo's study shows that VRET can be seen as an alternative to CBT and SSRI therapy [48].
In the described study, the results confirmed the lower effectiveness of the self-administered VR therapy compared to the standard CBT and CBT + VR therapy. Similar to Emmelkamp's postulate, we believe that the VR autotherapy tool requires an assessment of efficacy against an inactive control group and then an analysis of the result and an evaluation of the strengths and weaknesses of the tool. Another important issue is the use of exposure as part of standard CBT therapy. Research indicates that many therapists do not have full control over the conducted exposure, whether in sensu or real life, and the intensity of the exposure may be insufficient [49,50]. The therapist does not know whether the patient is practicing the exposure at the time of performing the exercise and cannot fully assess its intensity or the intensity of the emotions experienced by the patient. In addition, many studies show that exposure exercises are not chosen by the therapist as frequently as they should be due to the difficulty in their planning and implementation [51]. Virtual reality allows therapists to be more precise in planning social exposure in terms of the number of people in the virtual reality or the difficulty of the given social situation, which enables them to make use of the interactive nature of the exercise. It is also possible to analyze the level of anxiety through a system of data collection during VR exposure.
A major limitation of the described study is the lack of assessment of the patients' "sense of presence" in virtual reality. "Sense of presence" is defined as the interpretation of the artificial environment as if it were real. Many studies view this construct as a mechanism that can make VR an effective tool for exposure therapy [52]. It is one of the factors that affect the extent to which patients are able to expose themselves to virtual scenes. Our observation of the patients' reactions suggests that some patients did not feel a sufficient sense of presence in the displayed scenes. However, this aspect requires further investigation.
Patients in the VR self-administered therapy group were more likely to drop out of the intervention prematurely than those in the other groups. This prevented us from achieving the research goal of analyzing 30 individuals. At the initial planning stage of the study, we did not anticipate such a high dropout rate in this study condition. Our tentative explanations are: (a) a lack of motivation to stay in the situation of exposure, which may be linked to a lack of therapeutic alliance, (b) excessive anxiety triggered by the scenes, (c) difficulties in overcoming avoidance, and (d) a low level of sense of presence in VR. Dropouts in the VR self-administered therapy group were not associated with aggravated simulator sickness.
The study showed that virtual reality exposure using the HTC VIVE headset is safe. Symptoms of simulator sickness occurred only in a small number of cases and were of low to moderate severity, which did not require interruption of the exposure. An interesting finding is the false-positive results recorded in the pre- and post-session SSQ questionnaires in the CBT group. These results can be interpreted in two ways. On the one hand, it is probable that the applied SSQ scale does not measure the symptoms of simulator sickness accurately enough, and physiological data should be additionally taken into account [53]. On the other hand, the symptoms of anxiety during the exposure could have been interpreted by the patients as symptoms of simulator sickness. Based on these false-positive results, one might be tempted to conclude that the applied scale does not objectively measure the severity of the symptoms of simulator sickness. A recent meta-analysis presented similar conclusions [54]. When interpreting the total SSQ scores, according to Kennedy et al. [43], scores between 10 and 15 indicate significant symptoms, and scores from 15 to 20 indicate severe symptoms, while scores above 20 are indicative of simulator sickness. It should, however, be stressed that the given values were established for military personnel using flight simulators, and the scores may vary in the population at large. Furthermore, SSQ scores tend to be higher in other virtual environments compared to flight simulators [43,55]. Since the SSQ is a self-report scale, participants' responses may not always faithfully reflect the severity of a given symptom. Taking into account all the shortcomings of the scale used, only three people in the VR self-administered therapy group reported symptoms of simulator sickness (one of them confirming the symptoms on four occasions), while in the CBT + VR condition only one participant reported symptoms. This indicates a low severity of symptoms of simulator sickness and confirms the safety of the system used.
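The score bands quoted above from Kennedy et al. [43] can be expressed as a small lookup. The following is only an illustrative sketch, not the scoring procedure used in the study: the function name `interpret_ssq_total` and the label for scores under 10 are hypothetical, and because the quoted bands share their endpoints, the code assumes that a score of exactly 15 falls in the higher band and a score of exactly 20 remains in the "severe" band.

```python
def interpret_ssq_total(score: float) -> str:
    """Map a total SSQ score to the bands quoted in the text.

    Assumptions (not stated in the source): a score of exactly 15 is
    assigned to the higher band, a score of exactly 20 stays in the
    "severe symptoms" band, and scores under 10 get a neutral label.
    """
    if score < 0:
        raise ValueError("SSQ total score cannot be negative")
    if score > 20:
        return "simulator sickness"
    if score >= 15:
        return "severe symptoms"
    if score >= 10:
        return "significant symptoms"
    # The text gives no interpretation for scores below 10.
    return "below the quoted bands"


if __name__ == "__main__":
    # Example scores chosen for illustration only; they are not study data.
    for example_score in (4.0, 12.5, 15.0, 20.0, 26.2):
        print(example_score, "->", interpret_ssq_total(example_score))
```

As noted in the text, these cut-offs were established for military personnel using flight simulators, so any such mapping should be read as a rough orientation rather than a diagnostic rule for the general population.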
Future research should assess the effectiveness of the VR self-therapy tool against an inactive control group in order to verify the effectiveness of the method. We assume that the tool could be used by people who have difficulty accessing a psychotherapist. Moreover, an important direction of research would be to identify the people who could particularly benefit from such a tool. Another issue that requires further research and deeper reflection is the high dropout rate in the group treated with the self-therapy tool. The project also requires further development in order to improve the technical parameters of the virtual reality.

Limitations of the Study
There are three main limitations of this study. The first is the lack of follow-up. It is not clear how robust and long-lasting the therapeutic effects are or whether there are any differences across the conditions. Second, we did not ask participants about their reasons for dropping out, and consequently our results could be biased in either direction. For example, it is possible that our participants discontinued the therapy because of a lack of motivation and positive reinforcement from the therapist, low perceived therapy efficacy, or because they had reached a certain level of habituation (i.e., they no longer experienced anxiety during VR exposure). In future studies, dropout motives should be analyzed in detail. Finally, our sample size was limited to a total of 91 participants. Future research would benefit from larger sample sizes.

Conclusions
The results of the presented study indicate that all three research groups showed a change on the Liebowitz Social Anxiety Scale (primary outcome) between the measurement points T0 and T2. The weakest effect in this respect was obtained in the self-administered VR group, while the results in the other two groups (CBT and CBT + VR) were comparable. Because of the small size of the studied groups and the resulting insufficient statistical power, we treat the results as a trend that should be studied further. The effects in terms of the LSAS avoidance subscale were similar. The VR self-administered therapy group obtained significantly higher scores on the LSAS anxiety subscale. The compared types of therapy can therefore be considered effective in our study group, and the change in exposure type (virtual reality vs. in sensu) did not significantly affect the effectiveness of the intervention in the CBT vs. CBT + VR conditions. The results described in this paper indicate that the use of virtual reality as part of CBT does not significantly affect its efficacy, although it does affect some of its aspects.
Author Contributions: I.S., K.H., P.M., P.B., T.P. and S.M. participated in the study design and the interpretation of data. I.S. participated in the data collection and the interpretation of the data and helped to draft the manuscript. K.H. analyzed the data. All authors have read and agreed to the published version of the manuscript.
Funding: This work was funded by the National Centre for Research and Development and Tomorrow Ltd. within the "Fast Track" program (grant number POIR.01.01.01-00-0636/16-00).
Institutional Review Board Statement: The study was approved by the Bioethics Committee of the Regional Medical Chamber in Warsaw (KB/1214/19). The study participants were informed about the study procedure and potential risks, and gave their consent to participate by signing an informed consent form.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement: The database supporting the conclusions of this article will be made available by the authors upon request, in accordance with institutional regulations.