Clinical Relevance of Immersive Virtual Reality in the Assessment and Treatment of Addictive Disorders: A Systematic Review and Future Perspective

(1) Background: Virtual reality (VR) has been investigated in a variety of psychiatric disorders, including addictive disorders (ADs); (2) Objective: This systematic review evaluates the current evidence of immersive VR (using head-mounted displays) in the clinical assessment and treatment of ADs; (3) Method: PubMed and PsycINFO were queried for publications up to November 2020; (4) Results: We screened 4519 titles, 114 abstracts and 85 full-texts, and analyzed 36 articles regarding the clinical assessment (i.e., diagnostic and prognostic value; n = 19) and treatment (i.e., interventions; n = 17) of ADs. Though most VR assessment studies (n = 15/19) showed associations between VR-induced cue-reactivity and clinical parameters, only two studies specified diagnostic value. VR treatment studies based on exposure therapy showed no or negative effects. However, other VR interventions like embodied and aversive learning paradigms demonstrated positive findings. The overall study quality was rather poor; (5) Conclusion: Though VR in ADs provides ecologically valid environments to induce cue-reactivity and provide new treatment paradigms, the added clinical value in assessment and therapy remains to be elucidated before VR can be applied in clinical care. Therefore, future work should investigate VR efficacy in randomized clinical trials using well-defined clinical endpoints.


Introduction
Addictive disorders (ADs), including both substance use disorders (SUDs) and behavioral addictions, are among the most prevalent psychiatric conditions with the highest global mental disease burden besides depression [1]. Globally, the prevalence of ADs varies across substances and behaviors: alcohol (4.9%), psychoactive drugs (0.2-3.5%), problematic gambling (1.5%) and tobacco use (22.5%) [2]. Though evidence-based treatments for ADs are available, these are on average only moderately effective, with relapse rates of around 50% despite treatment in clinical practice [3,4]. New treatment modalities are therefore urgently needed, especially for patients who do not benefit from conventional therapies.
A rather novel approach in the treatment of psychiatric disorders, including ADs, is the application of virtual reality (VR) [5]. VR is commonly described as a computer-generated simulation of a three-dimensional environment, which aims to immerse the user using special electronic equipment [6]. Typically, head-mounted displays (HMDs) are used, allowing the user to feel immersed and present in a virtual environment (VE) [7]. It is thought that VR could be of great potential for both the assessment and treatment of psychiatric disorders [6,8].
VR research in the mental health field has focused predominantly on the application in anxiety disorders, such as phobias, social anxiety, post-traumatic stress disorder and obsessive-compulsive disorder, using VR exposure therapy (VRET) [9]. Through VRET, patients are systematically confronted with fear-inducing stimuli to remove the conditioned psychological response. VRET has been found to be as effective as in-vivo exposure, showing the potential of VR technology in anxiety disorders [10]. Generally, the application of VR is reported to be well-tolerated and safe in several target groups [6,11].
Various studies investigated VR in the context of ADs, mainly using cue exposure paradigms similar to those used in anxiety disorders [6,12-16]. In cue exposure paradigms, patients are confronted with substance-related situations and stimuli to elicit cue-reactivity in an ecologically valid manner [17]. Cue-reactivity refers to a conditioned response, such as subjective craving and psychophysiological responses (skin conductance, heart rate and temperature), when exposed to addiction-related stimuli [18]. The level of experienced craving during cue exposure has been linked to the severity of ADs, as well as the risk of relapse after initial abstinence [19].
Although VRET has been proven clinically effective in anxiety disorders, scientific evidence for its effectiveness in ADs is mixed [6,12]. Three recent systematic reviews summarized the evidence for VR applications in ADs [12,15,16]. Ghiţă and colleagues [15] focused on both assessment of craving and treatment possibilities with VR in alcohol use disorders. The authors conclude that there are some promising preliminary results regarding VR for both the assessment and treatment of ADs, but the study quality was found to be poor due to heterogeneity in study samples, small sample sizes and a lack of follow-up data. Trahan and colleagues [16] focused on the effectiveness of VRET in tobacco and alcohol use disorder. The authors also conclude that the number of studies is low, with limited scientific rigor. Segawa and colleagues [12] focused on the assessment of cue-reactivity and treatment of various ADs with VR. They conclude that the VRET studies show heterogeneous results and identify several methodological shortcomings, although no systematic quality assessment was applied. The authors reported positive results in provoking craving through VEs, as well as promising results of learning coping strategies as part of VR cognitive-behavioral therapy (VR-CBT). Though these interventions use VEs to expose patients to AD-related stimuli, their principles are not based on the ET paradigm described above, but rather on providing a more ecologically valid environment in which to train these new skills.
The three systematic reviews on the application of VR in the treatment of addiction cover literature published until March 2019. Since then, several new papers with improved methodology have been published, including larger samples, a control group and the investigation of relevant clinical variables [20-23]. Previous reviews focused predominantly on the assessment of cue-reactivity. Although it has been shown that VR is a suitable environment for inducing and measuring cue-reactivity, these reviews did not address clinical correlates of VR-induced cue-reactivity, and therefore lack insight into the value for clinical use. Furthermore, several publications were not identified in the review by Segawa and colleagues [12], probably because certain databases, such as PsycINFO, were left out of the search strategy.
It is important to note that the VR field evolves rapidly, including various technological advances. All three systematic reviews on VR in ADs included studies with variable types of VR technology, including non-HMD devices. Non-HMD devices, like 3D displays with shutter glasses, are no longer regarded as immersive VR. Furthermore, the review papers cited above do not provide sufficiently detailed descriptions of the VR technical set-up. These issues limit the validity and generalizability of the conclusions in the previous reviews.
Given the limitations of previous reviews and the high speed of development in the VR field, including its application in ADs, the current review aims to evaluate the clinical relevance of VR in the assessment and treatment of ADs. To do so, we reviewed literature investigating VR technology as a clinical assessment or intervention tool in patients with ADs, exclusively incorporating studies using an HMD. Specific research questions include: (1) What is the diagnostic/prognostic value of VR-induced cue-reactivity for the clinical assessment of patients with ADs? and (2) What is the effectiveness of VR in the treatment of patients with ADs?

Method
This systematic review was carried out in accordance with the PRISMA statement for reporting systematic reviews in healthcare [24]. We utilized the PICOS framework to formulate our research questions and identify eligible data for analysis [25].

Eligibility Criteria
The population considered in this systematic review consisted of adolescents or adults with SUD, behavioral addiction or daily/heavy substance use. Only immersive VR applications that utilize an HMD for the assessment or treatment of ADs were included. Given the developmental level of the VR field in ADs, we applied rather broad inclusion criteria and as few exclusion criteria as possible (see Table 1).

Search Strategy
The electronic databases PubMed and PsycINFO were searched and checked by two independent authors (JH and SL) for papers published until November 2020 using the MeSH terms and keywords: (virtual) AND ((addictive) OR (addiction) OR (substance) OR (alcohol) OR (cocaine) OR (cannabis) OR (opioid) OR (tobacco) OR (nicotine) OR (methamphetamine) OR (GHB) OR (crack) OR (gaming) OR (gambling)). In addition, we conducted a backward citation search to identify articles not retrieved through the database search.
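The Boolean structure of the search string can be made explicit with a short script. The following is a minimal sketch that only assembles the query text from the terms listed above; the helper name `build_query` is hypothetical, and the sketch does not submit anything to PubMed or PsycINFO.

```python
def build_query(required: str, alternatives: list[str]) -> str:
    """Combine one required term with an OR-group of alternative terms."""
    or_group = " OR ".join(f"({term})" for term in alternatives)
    return f"({required}) AND ({or_group})"


# Terms copied from the search string reported in the text.
ADDICTION_TERMS = [
    "addictive", "addiction", "substance", "alcohol", "cocaine", "cannabis",
    "opioid", "tobacco", "nicotine", "methamphetamine", "GHB", "crack",
    "gaming", "gambling",
]

query = build_query("virtual", ADDICTION_TERMS)
print(query)
```

Keeping the terms in a list makes it straightforward to document or extend the search when the review is updated.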

Study Selection
Studies were selected in three steps after conducting the literature search by following the PRISMA flow diagram (see Figure 1). All selection steps were conducted by two independent reviewers (JH, SL). First, duplicates were removed and titles were scanned based on the eligibility criteria. Afterwards, the abstracts of remaining articles were scanned to identify potentially eligible articles. In the last step, full texts of the remaining articles were scanned to exclude studies that did not meet the inclusion criteria. Any discrepancies and/or disagreements in the process between the two independent reviewers were resolved by discussion and consultation of a third reviewer (JV), where applicable. Interrater reliability was calculated for the selection steps using Cohen's Kappa.
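As an illustration of the interrater statistic, Cohen's Kappa compares the observed agreement between two reviewers with the agreement expected by chance from each reviewer's label frequencies. The following is a minimal sketch with hypothetical screening decisions; the function name and the example ratings are assumptions for illustration and are not the review's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(rater_a)
    # Observed agreement: share of items given identical labels.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / n**2
    if p_expected == 1:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical decisions of two reviewers on ten records.
reviewer_1 = ["eligible", "eligible", "ineligible", "ineligible", "maybe",
              "ineligible", "eligible", "ineligible", "maybe", "ineligible"]
reviewer_2 = ["eligible", "maybe", "ineligible", "ineligible", "maybe",
              "ineligible", "eligible", "ineligible", "ineligible", "ineligible"]

kappa = cohens_kappa(reviewer_1, reviewer_2)
print(round(kappa, 2))
```

In this hypothetical example the reviewers agree on 8 of 10 records, while chance agreement is 0.40, which yields a Kappa of about 0.67.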

Quality Appraisal
Quality of diagnostic studies was assessed by checking whether (1) the discriminative power of the VR assessment was determined by means of sensitivity, specificity, predictive values or the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve. If so, it was examined whether (2) the populations studied were representative of clinical populations and (3) a comparison with a gold standard was performed.
Quality of effectiveness studies was assessed using the International Working Group Recommendations for Methodology of Virtual Reality Clinical Trials in Healthcare [26]. In this framework, VR1 studies focus on content development, VR2 studies on feasibility, acceptability, tolerability and initial clinical effects and VR3 studies on efficacy. VR3 studies provide the strongest level of evidence. VR1 studies on content development were not within the scope of this paper. Criteria for quality assessment of VR2 effect studies include representativeness of patient population, sample size, selection of clinically relevant Patient-Reported Outcome measures and pre-post measurements [26]. Quality of VR3 studies was assessed based on (1) representativeness of the population, (2) use of an empirically validated treatment comparison, (3) follow-up of clinical outcomes, (4) sample size and power and (5) use of randomization and/or a control group.
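The discriminative-power measures named in the first criterion for diagnostic studies can be sketched as follows. This is a minimal illustration with hypothetical counts and scores, not data from any included study; the function names are assumptions, and the AUC is computed as the probability that a randomly chosen patient scores higher than a randomly chosen control.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity and predictive values from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # patients correctly flagged
        "specificity": tn / (tn + fp),  # controls correctly cleared
        "ppv": tp / (tp + fp),          # flagged cases that are patients
        "npv": tn / (tn + fn),          # cleared cases that are controls
    }


def auc(scores_patients, scores_controls):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    wins = sum(
        (p > c) + 0.5 * (p == c)
        for p in scores_patients
        for c in scores_controls
    )
    return wins / (len(scores_patients) * len(scores_controls))


# Hypothetical 2x2 table: 20 patients and 20 controls.
metrics = diagnostic_metrics(tp=18, fp=3, fn=2, tn=17)
# Hypothetical cue-reactivity scores for four patients and four controls.
example_auc = auc([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.4])

print(metrics)
print(example_auc)
```

Sensitivity and specificity describe the test itself, whereas the predictive values also depend on how common the disorder is in the sample, which is why representativeness of the population is checked separately.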

Data Extraction
The data extraction template was developed based on the Cochrane data extraction sheet for intervention reviews and pilot-tested prior to data extraction [27]. Data were extracted by five reviewers (LDFM, JvM, BD, WM, JV) and checked for accuracy and completeness by a second reviewer (SL). The following information was extracted for each study: (a) publication (author(s), year, country of origin, publication type), (b) methods (aim of study, duration of study, study design), (c) participants (e.g., sample size, control group, dependence severity, age, gender, ethnicity, comorbidities, inclusion/exclusion criteria), (d) assessment/treatment (procedure, setting, provider information, comparators, assessment instrument, follow-up, time-points measured, cues in environment, multisensory, technological aspects), (e) outcome measures of interest (clinical outcomes and secondary outcomes, response-rate, drop-outs), and (f) risk of bias and study quality. In case papers were inconclusive regarding methods and results, the corresponding authors were contacted to elucidate the issues.

Data Synthesis
A narrative approach was used to synthesize the findings because of the heterogeneity in terms of study design, methods, assessment and treatment approaches, as well as (clinical) outcome measures (Tables 2-4). The results are summarized, describing and explaining the study characteristics and outcomes in text and tables. In the Results section, the findings based on this data synthesis are described separately for assessment and treatment studies. A table with definitions and descriptions of concepts in VR in general, VR technology, cue-reactivity in VEs, cues in VEs, and VR treatment approaches can be found in the Supplementary Materials (Table S1).

Study Selection
The electronic database search (Figure 1) identified 5021 records of interest. After removing the duplicates, 4519 records remained, which were screened for eligibility. After screening for title and abstract, 4437 studies were excluded, leaving 82 full-text articles to be assessed for eligibility. Three additional papers were identified through the backward citation search, bringing the total to 85 full texts. Of these, 49 papers were excluded because the study did not use immersive VR through HMDs (n = 20), reported ineligible outcomes (n = 18), employed an ineligible study design (case studies, study protocols; n = 9), included a population without AD or daily/heavy use (n = 1), or was written in a language other than English (n = 1). Cohen's Kappa for the title and abstract screening (eligible, ineligible, maybe) was substantial (K = 0.65, CI = 0.59-0.71; K = 0.62, CI = 0.50-0.75, respectively). A total of 36 studies were included in the review and divided into assessment (n = 19) and treatment (n = 17) studies.
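The counts in the selection flow can be checked with simple arithmetic. The following is a minimal sketch using only the numbers reported above; the variable names are chosen for illustration.

```python
# Counts as reported in the study selection paragraph.
records_identified = 5021
records_after_duplicates = 4519
excluded_title_abstract = 4437
from_backward_citation = 3
full_text_exclusions = {
    "no immersive VR through HMDs": 20,
    "ineligible outcomes": 18,
    "ineligible study design": 9,
    "no AD or daily/heavy use": 1,
    "not written in English": 1,
}
included = {"assessment": 19, "treatment": 17}

# Derived quantities.
duplicates_removed = records_identified - records_after_duplicates
full_texts_from_search = records_after_duplicates - excluded_title_abstract
full_texts_total = full_texts_from_search + from_backward_citation
excluded_full_text = sum(full_text_exclusions.values())
included_total = full_texts_total - excluded_full_text

print(duplicates_removed, full_texts_total, excluded_full_text, included_total)
```

The derived totals agree with the text: 82 full texts from the database search plus 3 from the citation search give 85, and removing the 49 exclusions leaves the 36 included studies (19 assessment and 17 treatment).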

General Description of the Included Studies
We identified 19 papers presenting relevant findings toward VR as a tool in the clinical assessment of ADs (Table 2). All studies examined the relation between cue-reactivity and one or more clinical parameters, thereby providing insight into the diagnostic value of VR-induced cue-reactivity (e.g., in assessing addiction severity). However, only two studies specifically analyzed the discriminative power (e.g., sensitivity, specificity, predictive values), and are therefore considered the most informative regarding the diagnostic possibilities of VR in ADs [23,28].
All publications assessed the ability of (multiple) VEs (substance-related and/or neutral) to induce cue-reactivity using a single session (n = 17) or more sessions (n = 2). Studies reported single-group (n = 7) and between-group (n = 14) designs with moderate sample sizes (n = 11-665; median = 40).
Participants used tobacco (n = 8), alcohol (n = 7), methamphetamine (n = 3), or were gaming participants (n = 1). The reported mean age of subjects ranged from 18 to 43 (m = 31.8), and men (range = 18-100%, m = 63%) were more represented than women. AD criteria were reported in 10 studies, using the screening instruments Fagerström Test for Nicotine Dependence (FTND, n = 4) and Cigarette Dependence Scale (CDS, n = 1) for nicotine dependence, as well as Alcohol Use Disorders Identification Test (AUDIT, n = 3) and Alcohol Dependence Scale (ADS, n = 2) for alcohol dependence. Units per day/week/month were reported in 13 studies. The mean AD severity was heterogeneous across studies.
The VEs varied greatly regarding several aspects, including the cues utilized (proximal, contextual and complex) and the exposure time in the VE (3-150 min). The VEs delivered visual, auditory, olfactory and haptic stimuli (n = 2); visual, auditory and olfactory stimuli (n = 4); or visual and auditory stimuli (n = 12), while possibilities to interact with the VE were reported in 12 papers. The VEs were presented using an HMD (n = 18) or a smartphone-based HMD (n = 1), showing computer-generated environments (n = 16) or 360-degree videos (n = 3).

VR Findings
Most studies (15/19) reported one or more significant associations of VR-cue-reactivity with clinical parameters (see Table 2). Two-thirds of the studies that showed significant associations reported the strength of the associations (n = 10). Eight indicated moderate to strong associations between a single cue-reactivity parameter and clinical status. Correlation coefficients ranged from r = 0.45 to 0.78 (n = 4) and effect sizes from d = 0.35 to 1.46 (n = 4).
Two studies reported statistical parameters on the discriminative power of cue-reactivity measures [23,28]. These two studies combined multiple cue-reactivity parameters, based on ECG (HRV), EEG, and/or skin response (GSR), into one or more classifiers. In the study of Ding et al. [23], the AUC value ranged from 0.95 to 0.97, indicating excellent discrimination between methamphetamine patients and healthy controls. In the study of Wang et al. [28], the best classifier showed positive and negative predictive values of 90% and 83%, respectively, when discriminating between methamphetamine-dependent patients and healthy controls.
Two studies that used the FTND to assess the level of tobacco dependence found significant correlations with cue-reactivity. Most studies on alcohol that used VR-induced craving (5/7) found a significant correlation with clinical status. One study evaluating cue-reactivity in VR versus smartphone found positive results in favor of smartphone use.

Quality of the Studies
VR techniques and procedures varied widely between studies. Comparisons between different techniques or procedures within studies were not made, thereby limiting the ability to make further design decisions based on these studies. Only the studies of Wang et al. and Ding et al. [23,28] tested the discriminative or diagnostic value of VR-induced cue-reactivity (for methamphetamine addiction). However, the included study populations were not representative of the general population, in which the prior probability of substance use will be much lower than the 42-50% found in these samples.
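The dependence of predictive values on the prior probability can be illustrated with Bayes' rule. The following is a minimal sketch; the sensitivity and specificity (both 0.90) and the 5% prevalence are hypothetical values chosen for illustration and are not taken from the included studies, whereas 50% and 42% correspond to the sample proportions mentioned above.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test characteristics, evaluated at three prevalences.
results = {
    prevalence: predictive_values(0.90, 0.90, prevalence)
    for prevalence in (0.50, 0.42, 0.05)
}

for prevalence, (ppv, npv) in results.items():
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

With these assumed test characteristics, the positive predictive value is 0.90 at a prevalence of 50% but falls to roughly 0.32 at 5%, which shows why predictive values obtained in balanced case-control samples cannot be transferred directly to populations in which the disorder is less common.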

Treatment Studies Using a VR Exposure Therapy Paradigm
About half of the studies (n = 10/19) used a VRET approach (Table 3). Half of these used VRET as a stand-alone intervention; the rest used it as an add-on to another intervention (e.g., CBT or mindfulness). Most VRET studies (n = 9/10) provided multiple VRET sessions (range = 5-15, each lasting 20-50 min). VRET studies focused on tobacco use (n = 8), alcohol use (n = 1) and gambling (n = 1).
In the VRET studies, participants were exposed to a combination of discrete or proximal cues (e.g., a lighter or cigarette) in a typical VE (e.g., a café). Mostly, the user could interact with objects or agents in the VE (e.g., refusing a cigarette when offered). The VEs were mostly visual and auditory (n = 8), in some cases supplemented with olfactory or haptic stimuli (n = 2). All VEs were presented using an HMD, showing computer-generated environments (n = 9) or 360-degree videos (n = 1). All VRET studies used either craving or urge to gamble as (one of the) outcome measures.
Of the studies using substance use as an outcome measure (n = 6/10), two showed positive effects of VRET [46,56]. Only Pericot-Valverde et al. [46] reported effect sizes, for the number of cigarettes per day and expired-air CO levels (ηp² = 0.82 and 0.49, respectively). Two studies found no effect on substance use [49,52]. Lee et al. [51] found a reduction of cigarettes smoked during the morning but not on a daily basis. Finally, two studies showed negative effects of VRET on substance use [47,49].
Studies using other addiction-related variables generally showed no effects of VRET on severity scores of dependence or withdrawal [47,51,55,56]. One study found that readiness to quit increased in smokers allocated to the VRET group, compared to a control group [56]. However, the VRET group in this study also received mindfulness, peer-to-peer and conditional support, while the control group only received a smoking cessation manual without any support. Giroux et al. [54] found no change in perceived self-efficacy in gamblers.
Finally, four studies reported on effects of VRET on treatment retention, with two showing positive effects [21,56] and two showing no clear effect [49,55]. Of note, Goldenhersch et al. [56] reported a very high number of completers (93%) in the experimental condition, which they attributed to the use of strategies to enhance adherence, such as SMS text messaging and phone call reminders.

Treatment Studies Using Other VR Paradigms
The other studies (nine studies, reported in seven papers) used a variety of treatment paradigms other than VRET (Table 4). These studies focused on tobacco use (n = 3), alcohol use (n = 2), methamphetamine use (n = 2) and gambling (n = 2). In five of these studies, the VR intervention was a stand-alone intervention, in four VR was provided as an add-on to another intervention (e.g., CBT). Most studies (n = 7) provided multiple VR sessions (range = 2-10, each lasting 6-60 min) and two studies used a single VR session [50,57]. Participants were exposed to a combination of proximal cues, in a fitting contextual VE and mostly (six of the studies) complex VEs, in which the user could interact with objects or agents in the VE. Only two studies applied a passive paradigm [50,58]. Most VEs included multisensory cues (mostly auditory (n = 4) or auditory, olfactory and/or haptics (n = 2)). Studies used computer-generated (n = 6) or 360-degree videos (n = 1).
Most non-VRET studies (n = 8) used several different VEs (range 2-6) to expose the participants to multiple ecologically valid VEs. None of the studies used an individualized hierarchy. One study applied a generic hierarchy in a virtual bar or casino and guided participants progressively, approaching machines where they could gamble whilst applying various CBT techniques at each step [45].
Three studies used complex aversive stimuli (e.g., scenes of vomiting in the subway, police arrest, substance use-related illness) paired with nicotine, alcohol or methamphetamine use, respectively, to motivate participants to reduce unwanted behavior (aversive learning) [22,50,59]. Girard et al. [53] instructed participants in the experimental group to find and crush up to 60 virtual cigarettes in a VE. In contrast, control participants crushed balls instead. We categorized this approach under the term 'embodied learning'. Two studies used VR to train coping skills to deal with respectively nicotine craving and gambling urges in a CBT framework, with gradually increasing difficulty [45,58].
Finally, one study used VR to assess drinking behavior, psychological factors (emotion regulation and self-esteem) and social factors (relational competence and social pressure on drinking behavior) [57]. During immersion, the researcher would ask questions like: "Imagine you have just drunk a glass of wine, how do you feel?; Would you call anyone from your family?", in order to evoke coping-related imagery, negative memories of a relapse or increase motivational status.
Studies using substance use (both tobacco) as an outcome measure showed positive effects of the VR intervention in terms of abstinence rates confirmed by CO measures [53] and the mean number of cigarettes used per day or week at one-, two- and six-month follow-up [58]. Interestingly, the effect of VR seemed to increase over time [53]; however, low retention hampers strong conclusions. Furthermore, Bordnick et al. [58] reported a large effect size (ηp² = 0.14).
The eight studies reporting other addiction-related variables were generally positive, but showed some mixed findings. One study showed positive effects of the VR intervention on nicotine dependence level compared to a control condition [53], but this was not observed in gamblers [45]. Caponnetto et al. [59] found beneficial effects of the active VR intervention on motivation to quit smoking, compared to a passive image or video. Similarly, Spagnoli et al. [57] found beneficial effects of the VR interventions on the readiness to quit alcohol use, compared to those receiving regular care.
Self-efficacy and confidence to resist smoking increased post-intervention and at the 1-, 2-, 3- and 6-month follow-ups, respectively, compared to the control condition, with a medium to large effect size (ηp² = 0.13) [58]. In contrast, gambling-related cognitions were not influenced by the VR intervention in a group of gamblers, compared to the control condition [45]. The one study using HRV reported a significant decrease in several, yet not all, indexes, suggesting that the VR sensitization procedure suppressed cue-induced reactivity in methamphetamine users [22]. Similarly, reductions in implicit alcohol associations were observed after a VR intervention in both high and low social drinkers (η² = 0.14) [50].
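To put the reported (partial) eta-squared values on a common footing, they can be converted to Cohen's f and compared with Cohen's conventional benchmarks (0.01 small, 0.06 medium, 0.14 large). The following is a minimal sketch; the benchmarks were proposed for eta squared and are applied to partial eta squared here only as a rough convention, and the function names are assumptions for illustration.

```python
import math


def cohens_f(eta_squared):
    """Convert (partial) eta squared to Cohen's f."""
    if not 0 <= eta_squared < 1:
        raise ValueError("eta squared must lie in [0, 1)")
    return math.sqrt(eta_squared / (1 - eta_squared))


def benchmark_label(eta_squared):
    """Rough label based on Cohen's conventional thresholds."""
    if eta_squared >= 0.14:
        return "large"
    if eta_squared >= 0.06:
        return "medium"
    if eta_squared >= 0.01:
        return "small"
    return "negligible"


# Effect sizes as reported in the text for the treatment studies.
reported = {
    "cigarettes per day [46]": 0.82,
    "expired-air CO [46]": 0.49,
    "cigarette use [58]": 0.14,
    "self-efficacy/confidence [58]": 0.13,
    "implicit alcohol associations [50]": 0.14,
}

for outcome, eta_sq in reported.items():
    print(f"{outcome}: f = {cohens_f(eta_sq):.2f} ({benchmark_label(eta_sq)})")
```

Under these conventions, 0.13 falls just below the threshold for a large effect, which is consistent with its description as medium to large, while 0.14 sits exactly at that threshold.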
Of note is that drop-out was a major issue in several non-VRET studies. Girard et al. [53] found higher drop-out rates during the control condition, compared to the intervention (49% vs. 22%) and at the end of the 12-week program (71% vs. 50%). Bordnick et al. [58] also reported substantial drop-out in both conditions before (17% vs. 18%) and during treatment (29% vs. 42%). Caponnetto et al. [59] experienced no drop-out, while Wang et al. [22] did report a loss-to-follow up, without further analysis. Some studies, including those using a single VR session [50,57], did not report retention [45].

Quality of the Treatment Studies
The majority of the included intervention papers (n = 14/17) could be regarded as developmental studies (VR2). Only one study was a clear efficacy study (VR3) [49]. Two studies seemed to be intended as VR3 studies, but provided only preliminary evidence for efficacy, due to a limited number of inclusions (lack of power) in comparison with the original protocol publication [55,60], or reporting of pilot data only, without further power analysis or availability of a comparison group [53].
Six papers included either addiction severity or addictive behavior as outcome measure, while three used both, and eight lacked information on addiction severity or addictive behavior. Furthermore, five of the nicotine papers and one of the gambling papers used non-treatment-seeking participants that seemed to resemble clinical populations (based on severity criteria). The remaining papers described interventions for treatment seekers.
Three VRET papers and five papers describing a non-VRET intervention used an active control condition (CBT, Treatment As Usual (TAU), nicotine replacement therapy, imaginal exposure or a form of embodied learning), while seven papers used no control condition, two used a waiting list, one gave access to a self-help manual and one used a crossover design. Eight papers used a randomized design, yet one study did not compare group differences statistically [21] and one lacked a statistics paragraph in the methods section, hindering understanding of its statistical approach [58]. Furthermore, only seven studies reported effect sizes. None of the VR2/3 papers described a power analysis, though Malbos et al. [60] refer to a study protocol describing a power analysis. However, they failed to reach the number of participants described in their study protocol [55].
Most papers lacked follow-up data and only reported effects directly post-intervention. Those with follow-up data applied time frames ranging from seven consecutive days following an intervention [48], to one follow-up assessment at 90 days [56] or six months [53], to multiple follow-up assessments during a six-month [58] or 12-month period [49].
In addition to the criteria mentioned in 2.4, several studies only analyzed treatment completers [22,55,58], though drop-out was significant [55,58]. In addition, Malbos [55] specified only the total number of completers, not the distribution across treatment and control groups.

[Table fragments; the corresponding study rows are not recoverable. Outcome measures: craving; implicit alcohol associations; implicit alcohol eye behavior; implicit alcohol attentional bias. Findings: HSD showed a greater reduction than LD group (p < 0.01); HSD showed a weaker positive association than LD group (p < 0.01); reduced dwell time in both HSD and LD group (p < 0.05, η² = 0.14); reduced reaction times in both HSD and LD group (p < 0.05, η² = 0.14). Subheading: Methamphetamine studies (Wang et al.). Separate quality entry: VR2 studies with focus on integrating VR and CBT (study 2) and preliminary effectiveness (study 3), with pre-post session evaluation and randomized controlled design.]

General Discussion
The present systematic review evaluated (1) the diagnostic/prognostic value of VR-induced cue-reactivity for the clinical assessment of patients with ADs and (2) the effectiveness of VR-delivered treatment in patients with ADs. Though the number of papers on application of VR in ADs has grown over the past decade, study methods and outcome measures, and consequently results, were highly heterogeneous. In addition, most studies lack a clinical focus to demonstrate the (added) value in clinical practice.
Regarding VR-assessment, our findings show that cue-reactivity paradigms might be of diagnostic value in patients with nicotine, alcohol and methamphetamine use disorder, as well as gaming disorder. Despite negative findings in some studies, the majority (n = 15/19) reported one or more significant associations between clinical status (dependence status, dependence severity) and VR-cue-reactivity (craving, psychophysiology and withdrawal).
Regarding VR-treatment, one VRET trial showed a negative effect compared to standard CBT treatment in tobacco use disorder [49], with other VRET pilot studies not showing convincing treatment effects either [21,46-48,51,52,54-56]. Similarly, a gambling study using VRET did not show significant effects of a single session on the urge to gamble or self-efficacy [54]. Likewise, a series of pilot studies using VR-CBT did not show significant added value compared to TAU either [45]. Other VR interventions, such as embodied learning (crushing cigarettes) [53], coping skills training (nicotine) [58] and aversive learning (methamphetamine) [22,50,59] produced encouraging results, with beneficial effects on disease severity and abstinence rates.
Our findings show that clinical assessment studies toward the diagnostic and prognostic potential of VR-induced cue-reactivity are scarce. Previous reviews showed that VR can induce craving in different VEs, with various cue exposure procedures [12-16], but lack insights into the clinical value [18]. We extend this body of evidence by exploring the diagnostic value of VR and reviewing studies that relate cue-reactivity to clinical indices. However, only two discriminative studies were identified, comparing AD patients with healthy controls, using psychophysiological measures during VR-cue-exposure [23,28]. In addition, several studies showed a relationship between VR-cue-reactivity and the severity of various clinical parameters. Given the limited number of discriminative studies and heterogeneity of methods and results, it remains to be elucidated whether VR-assessment can add to current clinical assessment practice. Further research into both discrimination between healthy and AD populations and severity assessment within AD populations is warranted.
Interestingly, while craving has been considered a predictive factor for treatment success or relapse [61-63], we did not identify prognostic studies that investigated the association between reactivity to VR-cue-exposure and treatment outcome. Only one study showed an effect of pharmacological treatment (nicotine lozenge) on VR-cue-reactivity [36]. Furthermore, previous studies using a VR-based assessment in patients with AUD reported greater exhaustion compared to standard clinical interviews [57,64]. Therefore, future studies should assess the feasibility and acceptability of VR assessments, as well as their predictive value in clinical practice [12], and explore potential benefits of VR assessment compared to non-VR-induced cue-reactivity, using easier-to-use and more comprehensive procedures, such as personalized environments or smartphone applications.
Regarding VRET, the clinical value in ADs remains unclear. We only identified a single clinical effect study (level VR3), showing negative effects of the VR intervention [49], and several pilot studies (level VR2) showing limited effectiveness [21,46-48,51,52,54-56]. This is in line with previous reviews on the efficacy of traditional ET in ADs, which also showed no to small effects or even negative effects [65,66]. Several studies in this review showed short-term reductions in cue-reactivity but did not examine long-term effects on extinction (i.e., spontaneous recovery, reinstatement, renewal effect) due to missing follow-ups [66]. It could be argued that long-term outcome might even be worse due to potential reinstatement effects of ET, as observed in studies on face-to-face ET [67]. Considering the limited short-term effects of VRET, it could also be argued that ecological validity might not improve ET efficacy as expected [12,15,16]. Yet, recent advancements in VR technology, such as display technology, multimodality and multisensory stimulation, might increase the ecological validity of VRET and need further study in the treatment of patients with ADs [15].
Other VR-based treatment approaches, such as VR-CBT [45], VR-embodied learning [53], VR-covert sensitization [22,50,59] and VR-cognitive reframing [57], showed somewhat more promising results, though the level of evidence is still limited (mainly pilot studies at level VR2). These findings are in line with previous reviews that suggest the incorporation of CBT-related coping skills training in VR-based treatments to transfer VR-based learning effects to everyday experience [12,15,16]. Using a VE in which the patient is an actor rather than a passive observer may be more beneficial [53,58]. As described by Segawa et al. [12], embodied experiences could empower the patient's self-regulation and self-efficacy to foster sustainable behavioral change and improve coping with cue-reactivity [12,68]. Future studies should disentangle the most effective VR components and procedures to maximize such learning experiences. Likewise, the added value of VR treatment on top of traditional approaches or as an alternative to face-to-face sessions needs to be examined [69].
Throughout the review process, we encountered fundamental methodological shortcomings in many of the identified papers. As mentioned before, VR-assessment techniques, study procedures and instruments varied widely, limiting the comparability between studies and the development of best practices for future research. The included studies were mostly in developmental stages and were not set up to examine diagnostic possibilities for clinical application. To further investigate the diagnostic capabilities, the sensitivity, specificity and predictive values should be examined in random samples and compared to gold standards. Thus, representative clinical populations need to be studied to avoid an over- or underestimation of predictive values.
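The dependence of predictive values on the studied population can be illustrated with a small calculation. The following is a minimal sketch, not an analysis of any included study: the 2x2 counts, the class and function names, and the two prevalence values are hypothetical and chosen only for illustration. It computes sensitivity and specificity from a confusion table and then applies Bayes' rule to obtain positive and negative predictive values at a given prevalence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 table of a hypothetical VR-based test against a gold standard."""
    true_pos: int   # patients with AD, test positive
    false_neg: int  # patients with AD, test negative
    false_pos: int  # controls, test positive
    true_neg: int   # controls, test negative


def sensitivity(c: ConfusionCounts) -> float:
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: ConfusionCounts) -> float:
    return c.true_neg / (c.true_neg + c.false_pos)


def predictive_values(sens: float, spec: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a given prevalence using Bayes' rule."""
    ppv = sens * prevalence / (sens * prevalence + (1 - spec) * (1 - prevalence))
    npv = spec * (1 - prevalence) / (spec * (1 - prevalence) + (1 - sens) * prevalence)
    return ppv, npv


# Hypothetical counts from a balanced case-control design (50 patients, 50 controls).
counts = ConfusionCounts(true_pos=40, false_neg=10, false_pos=10, true_neg=40)
sens = sensitivity(counts)
spec = specificity(counts)

# Same test characteristics, two assumed prevalences.
ppv_case_control, npv_case_control = predictive_values(sens, spec, prevalence=0.50)
ppv_low_prev, npv_low_prev = predictive_values(sens, spec, prevalence=0.05)

print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
print(f"prevalence 50%: PPV={ppv_case_control:.2f}, NPV={npv_case_control:.2f}")
print(f"prevalence  5%: PPV={ppv_low_prev:.2f}, NPV={npv_low_prev:.2f}")
```

Under these assumed numbers, the same test yields a much lower positive predictive value when prevalence is low than in a balanced case-control sample, which is why predictive values estimated in non-representative samples can be misleading.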
Treatment studies were mostly in pilot stages (level VR2) and lacked detailed intervention protocols, predefined primary outcome measures, control conditions, randomization, follow-up data and sufficient sample sizes based on power analyses. Multiple studies failed to clearly communicate the conducted procedures and technological details, resulting in low explanatory power and the inability to replicate findings [16,50]. In addition, protocols delivered in VR were often vaguely defined [12,45,53,58,59]; for instance, one was described as VR-counterconditioning while VR-covert sensitization was applied [22]. Furthermore, VRET approaches must expose participants to a variety of VEs until craving is reduced, without the participants acting upon the elicited craving; otherwise, they may be sensitized instead. Moreover, to be sure of extinction effects, time to extinction or a certain level of reduction in craving needs to be examined, instead of applying standard exposure times regardless of the participant's response. Hence, VR research in ADs should focus on method development and reporting with scientific rigor, including evidence-based protocols, clear clinical endpoints and pre-registration of clinical trials.
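To make the point about sample sizes based on power analyses concrete, the sketch below estimates the number of participants per arm for a two-arm comparison of means. It is an illustration under stated assumptions, not a recommendation for any specific trial: it uses the standard normal approximation for a two-sided test, the function name `n_per_group` is hypothetical, and the effect sizes are Cohen's conventional small, medium and large values rather than estimates from the reviewed studies. An exact calculation based on the t distribution gives slightly larger numbers.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size_d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants per arm for a two-sided, two-sample comparison.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2,
    where d is the standardized mean difference (Cohen's d).
    """
    if effect_size_d <= 0:
        raise ValueError("effect_size_d must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


# Conventional small, medium and large effect sizes (assumed for illustration only).
for d in (0.2, 0.5, 0.8):
    print(f"d={d}: about {n_per_group(d)} participants per group")
```

Even for a medium effect, this approximation calls for dozens of participants per arm, and expected drop-out would need to be added on top.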
A topic that has largely been overlooked in the VR addiction field is the ethics of VR application in a vulnerable population with mental health problems [12,70,71]. Kellmeyer et al. describe two main issues related to the development of VR applications in psychiatric patients [70]. The deceptive illusion and persuasiveness of the VEs might influence the user's behavior and ability to differentiate reality from virtuality. Side-effects of VR therapy, such as cybersickness and discomfort after prolonged use, should be considered thoroughly, but were not reported in the included studies [72]. Likewise, aversive conditioning remains ethically controversial and might be prone to cultural influences. Lastly, data-collecting HMDs threaten the patient's privacy, and might restrict the implementation and benefits of cutting-edge VR hardware in clinical practice [71]. Therefore, ethical guidelines should be established that address the aforementioned issues, for example through a patient-centered design and value alignment when developing future VR systems [70].

Strengths and Limitations
In comparison to previous review papers, first and foremost we focused on clinical applicability of the findings, differentiating between promising new concepts and evidence that can be applied in the clinical context. We were able to identify several additional papers that were either published in recent years [20-23,35,36,42,43,56] or found through an additional search in the PsycINFO database [32,41,48,55]. In addition, we limited our review to studies applying HMDs, excluding other devices which are not state of the art and are more difficult to apply in clinical practice. There are, however, several limitations that should be considered when interpreting the results of this systematic review. We were unable to systematically assess the quality of the studies through standard clinical frameworks due to the early stage of intervention development and the lack of methodological details described in many studies [26]. Other limitations in the field include small sample sizes, heterogeneous methodologies and group characteristics, as well as a lack of validated instruments used to measure clinically relevant outcomes, accompanied by a lack of adequate follow-up periods and control groups. Therefore, a systematic quality appraisal with standardized clinical frameworks or a meta-analysis of data was not possible [73,74]. Another issue is that we excluded two papers from the review because, owing to a lack of detailed methodological reporting, it was unclear whether HMDs were used [75,76]. These methodological shortcomings need to be addressed to advance the VR field and bring VR technology to clinical care for patients with ADs.

Future Outlook
Future research needs to overcome the current methodological shortcomings through scientific rigor with clear, pre-registered clinical endpoints. Assessment studies should investigate the potential of cue-reactivity to diagnose ADs (sensitivity and specificity), discriminate different levels of AD severity, monitor treatment effects and predict treatment outcome or relapse. The combination of multiple parameters into a discriminative model, for instance through machine learning, seems promising. The resulting models need to be tested in representative clinical populations to avoid biased conclusions, and should be compared to regular diagnostic instruments (e.g., DSM-5, ICD-10) as well as alternative, less complex approaches.
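As a rough illustration of what combining multiple parameters into a discriminative model could look like, the sketch below fits a logistic regression on two cue-reactivity features and evaluates it on held-out data. Everything in it is an assumption made for demonstration: the data are synthetic (simulated, standardized "craving" and "heart-rate change" scores with an arbitrary group difference), the function names are hypothetical, and logistic regression is only one of many possible models. It does not reproduce any method or result from the reviewed studies.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x) -> int:
    """Classify as patient (1) if the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


def simulate(n_per_group: int, rng: random.Random):
    """Synthetic data: controls (label 0) centered at 0, patients (label 1) at 2."""
    X, y = [], []
    for label in (0, 1):
        shift = 2.0 * label  # arbitrary group difference, for illustration only
        for _ in range(n_per_group):
            X.append([rng.gauss(shift, 1.0), rng.gauss(shift, 1.0)])
            y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = simulate(100, rng)
X_test, y_test = simulate(100, rng)  # held-out sample, never used for fitting

w, b = fit_logistic(X_train, y_train)
accuracy = sum(predict(w, b, x) == t for x, t in zip(X_test, y_test)) / len(y_test)
print(f"weights={w}, bias={b:.2f}, held-out accuracy={accuracy:.2f}")
```

Evaluating on data that were not used for fitting is the important design choice here; in practice the held-out sample would have to come from a representative clinical population, and the model would be compared against simpler alternatives such as a single craving rating.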
Treatment studies should focus on the implementation of therapeutic elements in the VE design (e.g., coping skills training, mindfulness) and the related development of treatment protocols that entail active (embodied) learning practices. To evaluate the (cost-)effectiveness, relevant RCTs (level VR3), including adequate follow-up periods, need to be conducted. During our literature review, we identified five study protocols that report on planned RCTs and insights into potential new treatment mechanisms [77-81]. The studies focus on approach-avoidance training, mindfulness-based relapse prevention, memory-retrieval extinction and the use of pharmacotherapy (Isradipine) to enhance the effect of VRET on the extinction of craving. However, the interdependence of psychological mechanisms and the technological implementation thereof should receive more attention to foster the identification of effective VR treatment paradigms.

Conclusions
The studies on VR in addiction medicine show that benefits for clinical practice remain to be elucidated. Though we found 19 papers reporting a relation between cue-reactivity and one or more clinical parameters, thereby providing some insight into the potential diagnostic value of VR-induced cue-reactivity, only two studies specifically analyzed the discriminative power of the VR intervention, and are therefore considered to be clinical assessment studies. Regarding VR-treatment, VRET studies showed conflicting results. The application of VR-CBT, VR-embodied learning and VR-covert sensitization represents promising paradigms, but clinical effect studies are lacking up to now. Thus, VR in ADs is not yet an intervention that is ready for clinical application beyond clinical studies. A major issue in this field of research is a general lack of methodological rigor and insufficient quality of reporting of methods. To move the field forward, studies with clear clinical endpoints and scientific quality, including randomized controlled designs and adequate follow-up, are required.