Exploring the Benefits of Virtual Reality-Assisted Therapy Following Cognitive-Behavioral Therapy for Auditory Hallucinations in Patients with Treatment-Resistant Schizophrenia: A Proof of Concept

Background: Combining cognitive behavioral therapy (CBT) for psychosis with another psychosocial intervention comprising virtual reality (VR)-assisted therapy (VRT) may improve targeted outcomes in treatment-resistant schizophrenia patients. Methods: Ten participants who had completed CBT as part of our clinical trial comparing VRT to CBT were selected at the end of the study because they wished to pursue further improvements with VRT (CBT + VRT). Clinical assessments were administered before/after treatments and at follow-ups. Changes in outcomes were examined using linear mixed-effects models. To gain a more in-depth understanding of CBT + VRT, therapists’ notes and open interviews with a sub-group of patients were qualitatively analyzed. Results: Findings showed that the sequence of both interventions was appreciated by all patients. Several significant improvements were found across time points on auditory verbal hallucinations, beliefs about voices, depressive symptoms, symptoms of schizophrenia and quality of life. Although most of these improvements were in a similar range to those observed in our comparative trial, effects of CBT + VRT on depressive symptoms and symptoms of schizophrenia were larger than those found for either intervention alone. Conclusion: This proof of concept is the first to merge gold-standard CBT with VRT for treatment-refractory voices and to suggest a certain synergistic effect.


Introduction
Schizophrenia, especially treatment-resistant schizophrenia (TRS), is a complex, severe and disabling psychiatric disorder that poses a significant therapeutic challenge, as current treatments show limited efficacy [1]. Notably, TRS contributes to a significant loss in patients' quality of life and is associated with a high economic burden [2]. Among the variety of debilitating symptoms, auditory verbal hallucinations (AVH) are the most reported form of hallucination [3]. Medication is generally helpful to treat these symptoms; however, up to 50% of patients are resistant to first-line Hence, there may be benefits for patients to have gained a set of skills and knowledge with CBT that may then be experientially applied in VRT.
The aim of this proof-of-concept paper is therefore to detail the benefits of CBT for voices followed by VRT (CBT + VRT) as part of our trial comparing the efficacy of VRT to CBT. In this exploratory study, we attempt to investigate the synergistic effects of both approaches in TRS patients on our primary outcome, AVH, and on our secondary outcomes comprising beliefs about voices, psychiatric symptoms, and quality of life. Moreover, we sought to gain insight into patients' perspectives on the individual therapies and on the combination of both approaches.

Participants
Participants were part of our larger clinical trial comparing VRT to CBT for voices (identifier number on Clinicaltrials.gov: NCT03585127). Specifically for this study, 10 participants who had been assigned to the CBT arm and had completed their corresponding follow-up assessment asked whether they could pursue further improvements by following VRT as well. Patients (≥18 years old) with refractory AVH and schizophrenia or schizoaffective disorder were recruited from the Institut Universitaire en Santé Mentale de Montréal (where the therapy was provided) as well as from the community. Patients were recruited if they had been hearing persecutory voices and had not responded to at least two antipsychotic trials. All patients continued to receive standard psychiatric care and agreed to refrain from changing their existing antipsychotic medication over the duration of the therapy sessions. Participants were excluded if they presented a neurological disorder, an unstable and serious physical illness, or a substance use disorder in the past 12 months, or if they had followed another CBTp in the past year (other than the one offered by our team). The trial was conducted in accordance with the Declaration of Helsinki and was approved by the institutional ethical committee (CER IPPM 16-17-06). Written informed consent was obtained from all patients.

Cognitive-Behavioral Therapy for Auditory Verbal Hallucinations
Patients began by following CBT for AVH, which consisted of nine weekly one-hour sessions. These sessions were administered in an individual format by a licensed psychologist trained in CBT by Dr. O'Connor, who had trained 35 psychologists throughout his career [36][37][38][39][40][41][42][43]. The CBT program was derived and adapted from current evidence-based treatments for AVH. This therapy involved a succession of learning modules as well as voice journals (task assignments) and has been described elsewhere [44,45]. The first three sessions consisted of taking patients' anamnesis to set goals and of learning about AVH. More particularly, in line with the cognitive model of hallucinations, voices were understood as being triggered rather than being related to the beliefs they held. Patients completed voice journals to reflect on their positive symptoms and associated triggers. The following three sessions focused on metacognition. In session 4, patients first learned about diverse attributional mechanisms and completed another voice journal to detect the beliefs that were the cause of their ill-being. In sessions 5 and 6, patients were taught to better interpret situations with the use of vignettes. The following two sessions were based on mindfulness exercises, and patients were encouraged to ask for feedback and to learn to observe. Patients learned to put forward alternative explanations to their most common beliefs about their hallucinations. Lastly, the therapy ended with a summary of learnings and relapse prevention.

Virtual Reality (VR)-Assisted Therapy
The therapy was delivered by an experienced clinician (AD) who has around seven years of experience as a psychiatrist and has treated over one thousand patients with major psychiatric disorders including schizophrenia [46][47][48][49][50][51][52][53][54][55]. The therapy consisted, on average, of nine weekly sessions. VRT has been described elsewhere [29,44,45]. Briefly, patients were first asked to create and personalize the face and voice of an avatar best resembling the person or entity believed to be the source of their most distressing voice. The following therapeutic sessions consisted of three phases. Pre-immersion: the therapist discussed the preceding week and determined the objective of the therapy session with the patient. Immersion: patients were immersed in the VR environment and encouraged to enter into a dialogue with their avatar, animated in real time by the therapist. Post-immersion: the therapist debriefed the patient and evaluated their feelings about the immersive experience. In the first immersive sessions, the therapist would confront patients via the avatar by verbalizing personalized distressing utterances and encouraged patients to use their usual coping strategies. The avatar's interaction with the patient became gradually less hostile and more encouraging as sessions of VRT progressed. Patients generally became more empowered in their interaction with the avatar as they developed more assertiveness and higher self-esteem. In the final consolidation sessions, patients had the opportunity to apply what they had previously learned in the experiential sessions and to follow up on their initial objectives. Each therapy session was audio recorded and the recording was offered to patients.

Clinical Assessments
Clinical assessments were administered before and after each intervention and at follow-up periods by trained research psychiatric nurses. Follow-up periods for CBT could be three, six or 12 months depending on when patients wished to undertake VRT. For VRT, the follow-up assessment was conducted at three months.
The primary outcome covered the overall severity of AVH as measured with the Psychotic Symptoms Rating Scale (PSYRATS-AH) [56]. This scale comprises 11 items evaluated by interview, which were divided into four factors (distress, frequency, attribution, and loudness). Secondary outcomes included beliefs about voices, general psychiatric symptoms, and quality of life. Patients' beliefs about their voices as well as the manner in which they cope with them were measured with the Beliefs About Voices Questionnaire-Revised (BAVQ-R) [57]. The BAVQ-R comprises four subscales [58]: two subscales relating to beliefs (persecutory beliefs and benevolence) in addition to two further subscales that measure responses to the voices (resistance and engagement). Depressive symptoms were assessed with the Beck Depression Inventory-II (BDI-II) [59], which consists of a 21-item self-report inventory. Symptoms of schizophrenia were evaluated with the Positive and Negative Syndrome Scale (PANSS) [60]. This scale was separated into five symptom clusters (positive symptoms, negative symptoms, cognitive symptoms, hostility and excitement symptoms, and affective symptoms) [61]. Life satisfaction was evaluated with the Quality of Life Enjoyment and Satisfaction Questionnaire-Short Form (Q-LES-Q-SF) [62,63], which consists of a self-report scale of 14 items.
To gain a more in-depth understanding of the combination of both therapies, we qualitatively examined the therapists' notes for each patient in addition to interviews that were held with a sub-group of patients. For the latter part, six participants who wished to give their opinion were interviewed after having followed both interventions. These were open interviews with a loose guide provided to explore topics of interest. The central questions concerned their perspectives on each individual therapy and on their combination. When appropriate, the evaluator prompted patients to expand on interesting responses. The evolution of patients' voices and their quality of life were also assessed. These interviews were recorded and elements pertaining to the main topics of interest were transcribed. French excerpts were then translated into English.

Analyses
Statistical analyses were performed with SPSS Statistics for Windows (Version 25, IBM, Armonk, NY, United States). Descriptive statistics were conducted on baseline data. Changes in reported outcomes during the assessment periods were assessed using a linear mixed-effects model with maximum-likelihood estimation to handle missing data. Time points were defined as follows: T1 = baseline CBT, T2 = post-CBT, T3 = follow-up CBT/baseline VRT, T4 = post-VRT and T5 = three-month follow-up VRT. Since the follow-ups for the CBT arm differed between patients, we took the overall mean of follow-up periods as the T3 value. The statistical threshold for significance was set at p < 0.05. Effect sizes were categorized as small (0.2), medium (0.5) and large (>0.8) [64].
As for qualitative data, the therapists' notes and patients' verbatim statements were annotated for any elements pertaining to changes in symptomatology, self-reflections, and overall comments on the individual therapies and on their combination. As many headings as necessary were systematically written down to describe all aspects of content. The headings were collected, categorized into related concepts, and grouped under higher order themes to develop an integrated representation of data. Higher order themes were in line with key therapeutic processes of the interventions. For CBT, these themes were based on the topics discussed during the intervention (e.g., normalization of voices, coping strategies) and for VRT, the themes were based on prior qualitative analyses (e.g., emotion regulation, self-perceptions) [34,35].
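The effect-size categories above can be illustrated with a minimal sketch. It assumes that a reported Cohen's d is labeled by its nearest conventional benchmark (0.2, 0.5, 0.8); this nearest-benchmark rule and the function name `label_effect_size` are illustrative assumptions, not a procedure stated by the authors.

```python
# Conventional Cohen's d benchmarks cited in the text: small 0.2, medium 0.5, large 0.8.
BENCHMARKS = {"small": 0.2, "medium": 0.5, "large": 0.8}


def label_effect_size(d: float) -> str:
    """Return the benchmark label closest to |d|.

    Assumption: labeling by nearest benchmark is an illustrative rule,
    not the authors' documented procedure.
    """
    magnitude = abs(d)
    return min(BENCHMARKS, key=lambda name: abs(BENCHMARKS[name] - magnitude))


if __name__ == "__main__":
    # Example values of d; the labels are produced by the rule above.
    for d in (0.331, 0.427, 0.606, 0.806, 1.043):
        print(f"d = {d:.3f} -> {label_effect_size(d)}")
```

Under this assumed rule, values above 0.8 are always labeled large, which is consistent with the ">0.8" category given in the text.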

Sample Characteristics
Sample characteristics are presented in Table 1. Overall, there was a greater proportion of men (80%), all were Caucasian, and most were single (90%) with a mean age of 43.4 years (standard deviation (SD) = 14.6). The mean duration of schooling was 11.2 years (SD = 2.7). Most patients held a diagnosis of schizophrenia (80%) and the rest (20%) held a diagnosis of schizoaffective disorder. The mean duration of illness was 16.4 years (SD = 10.9). All patients were treated with atypical antipsychotics with over two thirds of participants (70%) being prescribed clozapine. There was a mean delay of 6.8 months between the end of CBT and the start of VRT.

Treatment Efficacy
Table 2 summarizes the results across time points and the associated effect sizes of measured outcomes. As for the primary outcome, there was no statistically significant effect for the CBT portion (T1 to T3) on the severity of AVH as assessed with the PSYRATS-AH, although the effect was of moderate magnitude (d = 0.427). Effects were statistically significant for the VRT portion (T3 to T5) for overall AVH, with the most significant reduction found for AVH frequency (p = 0.008). The effect for the severity of AVH was additionally of moderate magnitude (d = 0.606). There were statistically significant reductions across all time points for the combined CBT + VRT treatment on the severity of AVH (p = 0.015), AVH frequency (p = 0.010), attribution (p = 0.019) and loudness (p = 0.047). A trend towards significance was observed for AVH distress (p < 0.1). Effects were notably significant between baseline CBT and follow-up VRT for all AVH outcomes. The effects of CBT + VRT on AVH were large (PSYRATS-AH-Total score d = 1.043; PSYRATS-AH-Distress d = 0.898; PSYRATS-AH-Frequency d = 0.859; PSYRATS-AH-Attribution d = 1.020; PSYRATS-AH-Loudness d = 0.946). Four patients were considered treatment responders to CBT + VRT, defined as a decline of at least 20% on the total score of the PSYRATS-AH.
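The responder criterion reported for the primary outcome (a decline of at least 20% on the PSYRATS-AH total score) can be written out as a short sketch. The function names and the example scores below are hypothetical and serve only to illustrate the rule; they are not patient data from the trial.

```python
def percent_decline(baseline: float, later: float) -> float:
    """Percentage decline from the baseline score to a later score."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (baseline - later) / baseline * 100.0


def is_responder(baseline: float, later: float, threshold_pct: float = 20.0) -> bool:
    """True if the score declined by at least threshold_pct percent of baseline.

    The comparison is written without division so that a decline of exactly
    20% is not affected by floating-point rounding.
    """
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (baseline - later) * 100.0 >= threshold_pct * baseline


if __name__ == "__main__":
    # Hypothetical PSYRATS-AH total scores (baseline, follow-up) for illustration only.
    for baseline, later in [(30, 24), (30, 25), (40, 20)]:
        print(baseline, later, is_responder(baseline, later))
```

Here the decline is computed relative to each patient's own baseline, which is one reasonable reading of the criterion; the text does not specify which two time points were compared for each responder.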
Concerning beliefs about voices measured with the BAVQ-R, there was no significant pre-post-treatment difference on overall beliefs for the CBT portion, with the effect size being of small magnitude. A trend was, however, found for pre- to follow-up CBT (p = 0.089), yielding a moderate effect size (d = 0.408). As for sub-scales, there was a significant pre-post reduction in persecutory beliefs (p = 0.049) and a significant pre- to follow-up reduction in engagement (p = 0.001) for CBT. Regarding the latter outcome, there was then a significant increase in engagement between pre-post as well as pre- to follow-up VRT (p = 0.010 and p = 0.028, respectively). The combined CBT + VRT showed a significant reduction in overall beliefs about voices from baseline CBT to post-VRT (p = 0.037), yielding a moderate effect size between these time points (d = 0.461). Beliefs about voices then increased towards the baseline value at follow-up VRT (T5).
Depressive symptoms measured with the BDI-II significantly diminished over time (p = 0.001). There were pre-post and pre- to follow-up reductions for CBT (p = 0.002 and p = 0.030, respectively), yielding a large effect size post-treatment (d = 0.806) and a medium effect size at follow-up (d = 0.467). Depressive symptoms additionally diminished significantly for the VRT portion, with a significant pre- to follow-up reduction being observed (p = 0.015). The combined therapies showed a significant reduction from baseline CBT to post-VRT (p = 0.004) and to follow-up VRT (p < 0.001), with the effect being of large magnitude (d = 1.020).
Although CBT showed no significant differences for overall symptoms of schizophrenia as measured with the PANSS, the VRT portion did yield significant pre-post-treatment reductions (p = 0.003). A trend was also observed between pre-VRT and follow-up VRT (p = 0.097). As for the sub-scales, significant effects were found for pre-post VRT on negative symptoms (p = 0.024) and excited/hostility symptoms (p = 0.009); a trend was also observed for positive symptoms (p < 0.1). Overall, general symptoms significantly diminished across all time points (p = 0.013). The effect size for overall symptoms from baseline CBT to follow-up VRT was of large magnitude (d = 0.806). Moreover, statistically significant effects were found mainly from baseline CBT to post-VRT for most subscales. There were also significant effects of CBT + VRT across time points for disorganized (p = 0.022) and excited/hostility (p = 0.011) symptoms and a trend for positive symptoms (p < 0.1).
Lastly, CBT showed no significant effects on quality of life. The VRT portion did reach significance from pre-VRT to follow-up VRT (p = 0.045), yielding a moderate effect size (d = 0.451). In addition, there was a trend towards significance for quality of life from baseline CBT to follow-up VRT (p < 0.1). There was an overall small effect for quality of life from baseline CBT to follow-up VRT (d = 0.331).

Qualitative Perspectives on Interventions
To complement and go beyond the statistical findings, the therapists' notes and patient interviews revealed relevant elements for each therapy and for their combination. First, CBT for voices was found to help patients gain better awareness of their illness and talk about their experience to better accept their diagnosis and themselves. For some, this was the first time they had the opportunity to discuss their experience in a non-judgmental environment. Patients noted that the therapist dived deep "into (their) pride," "allowed to shed light in the fog" and "put words on blurry things." The therapy therefore helped to clarify their experience. A large portion of CBT was aimed at learning new strategies such as relaxation and mindfulness to diminish the breadth of voices and better manage/control voices and stressful periods when they arose. Hence, many patients reported being able to better manage their emotions and feeling more confident.
Second, similarities were found with VRT. The therapy helped patients better accept themselves and their voices, while also learning to better manage their emotions. Patients noted that VRT helped to embody their voice and make their experience come to life by enabling a direct discussion with their voice (e.g., "the avatar was truly there" and they "had to face it"). A patient stated that "putting a face to the voice" was one of their favorite parts of the therapy. The therapy helped them not only improve their dialogue with their voices (i.e., self-assertiveness), but these gains also extended to their overall life by making them more open with others, thereby improving interpersonal relations. VRT allowed many patients to forgive themselves and accept past events that occurred in their life (sexual abuse, for instance). Although the initial confrontation sessions were emotionally difficult, several patients noted a feeling of liberation following therapy sessions. One patient even verbalized that the therapy enabled him/her to "chase away the demon inside (him/her)". Lastly, patients discussed their hopes about their voices and life. For some, VRT helped make them ready to undertake new life plans, such as returning to school and work.
Third, when looking back on both therapies, patients described them as complementary and found the sequence to be the best option. As noted, CBT was centered on "questioning" and "comprehension", while VRT was centered on "taking action" and "dialogue". CBT allowed patients to obtain the necessary tools and then apply them in VRT in a more direct manner. Accordingly, one patient stated that "in retrospect, both therapies were necessary and complementary; the sequence was ideal because CBT helped to prepare the tools needed for VRT, and mostly helped with the difficult confrontation sessions of VRT."

Discussion
Given the limited efficacy of CBTp, it has become more established that the combination of gold-standard CBT with another empirically based psychotherapy may be valuable to address the needs of patients with schizophrenia [65], and more so those with treatment resistance. This proof-of-concept study aimed to explore the usefulness and effectiveness of combining CBT for voices and VRT in patients with TRS. The sequence of both low-intensity interventions was found to be appreciated by all patients. CBT + VRT was beneficial for patients, with several significant improvements being found. Although most of these improvements were in a similar range to those observed in our comparative trial [45], effects on certain symptoms such as depressive symptoms were larger than those found for either intervention alone.
Concerning our primary outcome, results on CBT + VRT showed that there were large reductions from baseline to three-month follow-up on overall AVH severity (d = 1.043), including distress, frequency, attribution, and loudness. Moreover, CBT + VRT achieved large effects in line with the effects reported in our comparative trial [45] and prior trials on VR therapies for AVH [28][29][30]. The CBT portion yielded moderate effects (d = 0.427). These results are in line with our hypothesis stating that CBT + VRT may achieve greater efficacy on AVH in comparison to the small-to-moderate effects that have been observed in the literature on generic CBTp [66]. Results were more mixed on beliefs about voices, as observed in our comparative trial [45]. Although not significant, there was a trend towards significance for overall beliefs about voices for CBT and the effect was of moderate magnitude (d = 0.408). More precisely, in line with the therapeutic targets of CBT, the intervention decreased persecutory beliefs as well as engagement. Engagement then increased with VRT. This is in accordance with CBT being aimed at changing beliefs about voices and learning non-relational coping strategies [67] and VRT being aimed at increasing dialogue with voices. In accordance with our comparative trial [45], the effect of CBT + VRT was at best of moderate magnitude. Moreover, although not statistically significant across all time points, there was an effect on quality of life, mostly after having followed VRT. This is consistent with our previous trials showing improvement in quality of life with VRT [29,45] and findings demonstrating that CBT does not generally improve quality of life [68].
Of interest, there were notable improvements observed for depressive symptoms (d = 1.020) and overall symptomatology of schizophrenia (d = 0.806), with most effects for excited/hostility and disorganized symptoms. These effects were larger than those observed in our comparative trial for either of the interventions [45]. Moreover, there were additional benefits to the combination of both interventions, with significant effects being found for positive symptoms, negative symptoms, and disorganized symptoms. This is suggestive of a certain synergistic effect of combining both interventions. Although results are not clear-cut, such synergistic effects of combining both approaches were verbalized by patients as well. These findings are expected given the emphasis of CBT on learning to normalize the psychotic experience and gain new skills to manage stress and emotions, in combination with VRT, which then enables patients to experience strong emotions (e.g., anxiety, fear, and anger) during the dialogue with their voices and directly learn to regulate them.
Our exploratory paper is the first to combine gold-standard CBT with VRT for the treatment of refractory AVH in patients with schizophrenia and has shown some preliminary evidence for a synergistic effect of CBT + VRT, mainly on depressive symptoms (BDI) and symptoms of schizophrenia (PANSS). This proof-of-concept study highlights the possible benefits of having gained knowledge on the psychotic experience and acquired a set of skills with CBT that may then be applied in a more experiential manner within emotion-inducing VRT. Nevertheless, this study has limitations that should be acknowledged. First, the principal limitation is the small sample size, which limits the generalization of results. A larger, fully powered study would be necessary to see whether statistically significant effects remain or whether trends towards significance would become significant. Second, the effectiveness of the combination is likely to be influenced by the timing of each treatment. Since patients were part of our larger comparative trial, patients began VRT at differing intervals. While most began after the three-month CBT follow-up assessment, some began after longer-term follow-ups, which may explain the lessening of the effects of CBT over time. Keeping in mind personalized treatment, patients commenced VRT when they felt CBT treatment gains were consolidated and they were ready to undertake the emotion-inducing sessions of VRT. Third, we did not have a control group consisting of the inverse sequence, that is, patients first following VRT and then CBT. However, as noted in patients' interviews, they found the sequence of CBT followed by VRT to be more complementary as they gained insight into their illness and then could put their knowledge into practice within the VR dialogue. Additionally, we did not have any patients in the trial wishing to follow CBT after their VRT treatment. The lack of a control group was compensated for by the results of our comparative trial on the individual interventions [45].

Conclusions
To conclude, despite the small sample of the study, we found promising preliminary results suggesting the usefulness of following CBT with VRT. The effects of sequencing treatments in this manner may be more than additive since each would enhance the effectiveness of the other. This was observed notably for depressive symptoms and symptoms of schizophrenia. Moreover, these interventions derive from a common rationale and are viewed as complementary, with each having its specific therapeutic targets. Firstly, by beginning with CBT, patients may learn to establish links between their thoughts, feelings, or actions with respect to their symptoms and the accompanying dysfunctions [69]. CBT is therefore considered a more top-down approach, with its goal being to address problematic thinking patterns and core beliefs that may contribute to emotional distress and maladaptive behaviors. CBT for voices thus relies on a considerable degree of abstract self-reflection [70]. In sum, it is aimed at normalizing the psychotic experience, providing a range of alternative explanations, developing a shared understanding of the voices, changing the appraisal of the voices, testing unhelpful beliefs, reducing unhelpful coping strategies and increasing good coping strategies (e.g., mindfulness) [31]. The approach could be enhanced when followed by an experiential psychotherapy, such as VRT, to maximize its effectiveness. VRT uses additional methods for working with voices within the wider context of one's view of oneself, of relationships with others/voices, and self-narratives [22]. Additionally, the visualization of the avatar/the voice may facilitate the process of validating the experience and modifying the flow of dialogue with the voice through sessions, while altering the voice-hearer relationship [69].
As an experiential therapy, VRT primarily focuses on how patients relate to their voices by targeting self-esteem, emotion regulation and acceptance rather than challenging beliefs about voices. In this vein, it is considered a more bottom-up approach. VRT may also allow patients to test/challenge coping mechanisms directly in the VR environment with their avatar, while being encouraged to try new strategies throughout the therapy. Combining treatment components in this way and in this specific order may not only engage multiple treatment targets and possible pathogenic mechanisms but is also likely to have a synergistic effect through the interaction of the individual treatment components [19]. This was supported by the interviews held with patients. Particularly for the combination of CBT with VRT, patients may first gain more effective top-down regulation mechanisms with CBT and then improve their bottom-up mechanisms with VRT. VRT highlights the future of patient-tailored approaches that integrate several processes relevant to potentially improving the effectiveness of CBT for voices. Future research should nevertheless aim to develop the best merged approach to combine CBT and VRT.