Success Rates of Monitoring for Healthcare Professionals with a Substance Use Disorder: A Meta-Analysis

In the past decades, monitoring programs have been developed for healthcare professionals with substance use disorders. We aimed to explore estimates of abstinence and work retention rates after participation in such monitoring programs. A literature search was performed using PubMed, Embase, PsycINFO, and CINAHL. Twenty-nine observational studies reporting on success rates (abstinence and work retention) of monitoring for healthcare professionals with a substance use disorder were included in the meta-analysis. Quality-effects models calculated pooled success rates and corresponding 95%-Confidence Intervals (CI), with subgroup analyses on monitoring elements and patient characteristics. Pooled success rates were 72% for abstinence (95%-CI = 63–80%) and 77% for work retention (95%-CI = 61–90%). Heterogeneity across studies was partly explained by the starting moment of monitoring, showing higher abstinence rates for studies that started monitoring after treatment completion (79%; 95%-CI = 72–85%) compared to studies that started monitoring with treatment initiation (61%; 95%-CI = 50–72%). About three-quarters of healthcare professionals with substance use disorders participating in monitoring programs are abstinent during follow-up and working at the end of the follow-up period. Due to selection and publication bias, no firm conclusions can be drawn about the effectiveness of monitoring for healthcare professionals with SUD.


Introduction
Substance Use Disorders (SUD) are a major health burden, including among healthcare providers, affecting not only their own health but also their professional image and, potentially, patient safety [1,2]. Although the prevalence of SUD in healthcare professionals is estimated to be similar to that in the general population (about 10%) [1,3], they more often abuse alcohol and addictive medication, such as sedatives and opioids, compared to other SUD patients [4,5].
In the 1970s, the first so-called Physician Health Programs (PHPs) were initiated in the United States. PHPs aim to facilitate early identification and adequate treatment of psychiatric disorders, including SUD, among physicians [6]. Subsequently, health programs were established for other healthcare disciplines and in many more, mainly Western, countries across the globe [7,8]. The content and scope of these health programs vary widely. In the United States (US), professionals are commonly referred to inpatient and/or outpatient treatment in regular care and participate in monitoring provided by the health program [9]. In Europe, some programs mainly provide advice, others provide treatment themselves, and some offer monitoring [7]. A key difference between US health programs and some European programs (e.g., in Norway, Spain, and the United Kingdom (UK)), is that European programs encourage voluntary help seeking by offering free services and have high rates of self-referrals (45-75%) [7]. Additionally, the UK program also guarantees confidentiality by not having any formal links with regulating authorities [10].
Monitoring offers the opportunity to follow the rehabilitation of healthcare professionals with SUD by using biological testing as an objective measure for substance use or abstinence [11]. Monitoring can be started simultaneously with treatment or after successful treatment completion. In addition to biological monitoring of substance use, health programs might also monitor a participant's fitness to practice at work (by an employer or colleague) or require participation in self-help groups. Health programs usually report outcomes of rehabilitation in terms of abstinence or relapse, return to clinical practice, and/or program completion. A systematic review on rehabilitation outcomes for healthcare professionals found a variety of success rates: abstinence rates of 56% to 94% and work retention rates at the end of follow-up of 74% to 90% [12]. Previous research suggests that this variation in success rates might be influenced by both monitoring elements and participant characteristics [12,13]. Unfortunately, success rates in the systematic review were only presented as ranges per outcome and no thorough examination of the actual data was performed.
To date, no meta-analysis has been performed on success rates of monitoring for healthcare professionals with SUD. Therefore, the current meta-analysis aims to explore success rates of monitoring, using biological testing, for healthcare professionals with SUD, in terms of abstinence and work retention. Furthermore, we explored whether specific monitoring elements and/or participant characteristics explained heterogeneity in success rates across studies.

Search Strategy and Selection Criteria
For this meta-analysis, a review protocol was written, but not published or preregistered before the review was conducted. This protocol adopted a broad search strategy in order to maximize identification of potentially relevant papers. The search strategy, including the definition of outcome measures, was based on a set of a priori identified publications on outcomes of PHPs. The search strategy was developed by a multidisciplinary team with expertise in bibliography (medical librarian), epidemiology (P.M.G., S.J.M.v.d.B., F.A.), and addiction studies (B.A.G.D., A.F.A.S.). The search was performed on 8 December 2020 using the following databases: PubMed, Embase, PsycINFO, and CINAHL.
To be eligible, studies were required to (1) aim at adult healthcare professionals with a SUD diagnosis, (2) clearly describe their (biological) monitoring, and (3) use well-defined outcome measures in terms of abstinence (no relapse during the follow-up period) and/or work retention (working at the end of the follow-up period). Studies were excluded if (1) they concerned tobacco use disorder only, (2) no biological testing was applied, (3) the study solely reported on outcomes of care as usual, or (4) when outcomes were assessed by surveying third parties (i.e., a survey distributed among anesthesia program directors). Studies were limited to English-language research articles published in peer-reviewed journals. Details of the search strategy can be found in Table 1.
A flow chart of the study selection procedure is provided in Figure 1. First, duplicates were removed, using Rayyan software (Qatar Computing Research Institute, Doha, Qatar, 2017) for citation screening [14]. Next, three authors (P.M.G., S.J.M.v.d.B., and B.A.G.D.) screened 5907 unique titles and abstracts on the selection criteria mentioned above. Discrepancies in the identified eligible records were discussed until consensus was reached. When in doubt, records moved on to the next phase of assessing the eligibility, based on the full-text articles. Full-text assessment of 94 remaining records was performed by two authors (P.M.G. and S.J.M.v.d.B.). Discrepancies were discussed until consensus was reached. This resulted in 29 studies eligible for the meta-analysis, published in 24 articles. Next, data-extraction was performed by one researcher (P.M.G.). The data of each study was documented in Microsoft Excel 2016, which was subsequently checked by a second researcher (S.J.M.v.d.B). Extracted information included study characteristics: name of first author, year of publication, country (state) of first author, design of the study, time frame of the study, number of included subjects, percentage of males, type of healthcare professional, type of substance use, and source of referral. In addition, characteristics of monitoring were summarized: name of the health program, recommended type of treatment, starting moment of monitoring, type of biological testing, monitoring at work, and additional agreements. Finally, the outcomes of monitoring programs were extracted: percentage of abstinence and work retention specified with the (exact or range of) duration of follow-up. 
Since the information was not always presented in the same manner, we categorized monitoring elements and participant characteristics in order to perform subgroup analyses: program elements (biological, at work, and additional agreements; biological and additional agreements; biological and at work; biological), starting moment of monitoring (before treatment; after treatment; unknown), duration of follow-up (less than 2 years; 2 to 5 years; more than 5 years; other duration), gender (more than 50% males; other or unknown), type of healthcare professional (more than 50% physicians; other or mixed), and type of substance use (more than 50% alcohol; more than 50% opioids; mixed or unknown).
All included studies were assessed on their quality in order to account for study quality in the meta-analysis. The initial assessment was performed by one researcher (P.M.G.), and subsequently checked by a second researcher (S.J.M.v.d.B.). The Health States Quality Index [15] was used to assess study quality. Assessment parameters include a clear definition of the target population and observation period (yes or no), use of diagnostic criteria (diagnostic system or symptom based/not specified), method of case selection (attempting all cases, convenience sampling, or not specified), type of outcome assessment (administered interview, register/case record, or not specified), size of the study area (broad, small, or not specified), and type of prevalence measure (exact follow-up duration, average follow-up duration, or range of follow-up duration). The quality index of each study is calculated as the total quality score of that study divided by the maximum total quality score, see Table A1. The instrument was slightly adjusted for a good fit to our meta-analysis. The higher the score, the higher the study quality. We report our study in accordance with the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) and the proposal for reporting Meta-analyses of Observational Studies in Epidemiology (MOOSE) where applicable, see Table S1 [16,17].

Data-Analysis
Statistical analyses were performed using MetaXL (EpiGear International Pty Ltd., Sunrise Beach, Australia, version 5.3) within Microsoft Excel 2016 [15,18]. For every study, the total number of participants, the number of participants with a successful outcome (abstinence or work retention), and the quality index were entered in MetaXL. Quality-effects models were used in order to address heterogeneity caused by differences in study quality. The quality-effects model is a modified version of the fixed-effects inverse variance method and gives greater weight to the studies that were judged as being of high quality [19]. The models were applied to analyze the data and calculate pooled abstinence and work retention rates, and accompanying 95%-Confidence Intervals (CI).
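As an illustration of the fixed-effects inverse variance method that the quality-effects model modifies, the following minimal Python sketch pools proportions on the raw scale. It is not the MetaXL quality-effects estimator and applies no quality weighting or variance-stabilizing transformation. The function name and the study counts are hypothetical and are not taken from the included studies.

```python
import math


def pool_fixed_effect(studies):
    """Pool proportions with plain inverse-variance (fixed-effect) weights.

    studies: list of (successes, total) tuples.
    Assumes every proportion lies strictly between 0 and 1,
    otherwise the binomial variance is zero and the weight is undefined.
    Returns (pooled proportion, lower 95% limit, upper 95% limit).
    """
    estimates = []
    weights = []
    for successes, total in studies:
        p = successes / total
        variance = p * (1 - p) / total  # binomial variance of a proportion
        estimates.append(p)
        weights.append(1 / variance)    # more precise studies weigh more

    weight_sum = sum(weights)
    pooled = sum(w * p for w, p in zip(weights, estimates)) / weight_sum
    standard_error = math.sqrt(1 / weight_sum)
    return pooled, pooled - 1.96 * standard_error, pooled + 1.96 * standard_error


# Hypothetical counts for illustration only.
example_studies = [(80, 100), (45, 60), (150, 200)]
pooled, ci_low, ci_high = pool_fixed_effect(example_studies)
print(f"pooled = {pooled:.3f}, 95%-CI = {ci_low:.3f} to {ci_high:.3f}")
```

In a quality-effects analysis, these weights would additionally be adjusted by each study's quality index, so that studies judged to be of higher quality contribute more to the pooled rate.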
The heterogeneity assumption was assessed by Cochran's Q-test (which verifies the presence of heterogeneity) and the I² statistic (which shows the amount of heterogeneity between studies). A significant Q-test (p < 0.10) and an I² > 50% indicated the presence of substantial heterogeneity. Subgroup analyses were explored by stratifying the data on monitoring elements (start of monitoring, type of monitoring, and duration of follow-up) and participant characteristics (gender, type of healthcare professional, and type of substance use).
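The relation between the two heterogeneity statistics can be sketched as follows, using the standard definitions of Cochran's Q (weighted squared deviations from the pooled estimate) and I² (the share of Q in excess of its degrees of freedom). The function names are hypothetical. The degrees of freedom used in the worked check (13, that is, 14 studies) are an assumption for illustration, since the number of studies contributing to the work retention estimate is not stated in this section.

```python
def cochran_q(estimates, variances):
    """Cochran's Q: inverse-variance weighted squared deviations
    of the study estimates from their pooled (fixed-effect) mean."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    return sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))


def i_squared(q, df):
    """I² in percent: proportion of Q exceeding its degrees of freedom,
    truncated at zero when Q is smaller than df."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


# Worked check against the reported work retention result (Q = 162.7).
# df = 13 is an assumed value, not reported in the text.
print(round(i_squared(162.7, 13)))  # about 92
```

Under that assumption the computed value agrees with the reported I² of 92% for work retention.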
Publication bias was assessed using the Doi plot and Luis Furuya-Kanamori asymmetry (LFK) index. In the case of a symmetric shape, no publication bias is indicated. In case of an asymmetric shape, publication bias is indicated. An LFK index within −1 and +1 indicates no publication bias, an LFK of −1 to −2 or +1 to +2 minor asymmetry, and an LFK of <−2 or >+2 major asymmetry [15].
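The LFK thresholds above can be written as a small decision rule. The sketch below is illustrative only: the function name is hypothetical, and the handling of values exactly equal to 1 or 2 in absolute value is an assumption, because the text does not specify which category the boundaries belong to.

```python
def classify_lfk(lfk):
    """Map an LFK index to the asymmetry categories described in the text.

    Boundary handling (|LFK| exactly 1 or 2) is an assumption.
    """
    magnitude = abs(lfk)
    if magnitude <= 1:
        return "no asymmetry"
    if magnitude <= 2:
        return "minor asymmetry"
    return "major asymmetry"


# LFK values reported in this meta-analysis.
print(classify_lfk(-1.59))  # abstinence
print(classify_lfk(-2.70))  # work retention
```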
Subgroup analyses on the type of monitoring slightly reduced heterogeneity across studies (Figure A1). Heterogeneity across studies was not significantly reduced by duration of follow-up, gender, type of healthcare professional, or type of substance use (Figures A2-A5). Risk of bias across studies was visualized in a Doi plot, indicating an asymmetric shape for the pooled abstinence rate (Figure A6). The LFK index was −1.59, also indicating minor publication bias.

Work Retention
Work retention rates of the individual studies ranged from 43 to 96%, with substantial heterogeneity across studies (Q = 162.7; p < 0.001; I² = 92%). The overall pooled work retention rate was 77% (95%-CI = 61-90%), with a follow-up duration up to 8 years (Figure 3). Subgroup analyses on type of monitoring and type of substance use slightly reduced heterogeneity across studies (Figures A7 and A11). Subgroup analyses on starting moment of monitoring, duration of follow-up, gender, and type of healthcare professional did not significantly reduce heterogeneity across studies (Figures 3 and A8-A10). Risk of bias across studies was visualized in a Doi plot, indicating an asymmetric shape for the pooled work retention rate (Figure A12). The LFK index was −2.70, also indicating major publication bias.

Discussion
This study aimed to identify the success rate of monitoring for healthcare professionals with SUD, as indexed by abstinence and work retention. Furthermore, possible explanatory variables for heterogeneity were explored. On average, three quarters of the healthcare professionals who engaged in a monitoring program remained abstinent and were working at follow-up. Follow-up duration varied widely, from 0 to 8 years. We identified significant heterogeneity across studies, as well as an indication of publication bias. Heterogeneity within abstinence rates was partly explained by the starting moment of monitoring. Programs that started monitoring after successful initial treatment had better outcomes than those that started monitoring simultaneously with treatment. Duration of follow-up, gender, and type of healthcare professional did not significantly decrease the heterogeneity in success rates.
Unfortunately, none of the included studies used a randomized controlled trial or quasi-experimental design, and due to the naturalistic design of the studies included in this meta-analysis we cannot draw firm conclusions on the effectiveness of monitoring programs for healthcare professionals with SUD. If the actual effectiveness of monitoring turns out to be comparable to the success rates we found, this would be promising. In general, SUD patients show relapse rates over 50% within the first year after treatment initiation, and they remain at increased risk for relapse throughout the early years of recovery [42-44]. Professionals in monitoring were thus about 1.5 times more successful in maintaining abstinence when compared to regular addiction care patients without monitoring. Biological monitoring has also been applied in general SUD patients, showing a one-year abstinence rate of 46% [45,46]. This is far less successful than observed here among healthcare professionals. This may be partly attributed to the starting moment of monitoring (during treatment), but might also be the result of a difference in effectiveness of the intervention. Furthermore, work retention is a major incentive for healthcare professionals, which might apply to a lesser extent in general SUD patients. Indeed, studies on Contingency Management (CM) and Community Reinforcement Approach (CRA) indicate that positive reinforcement increases abstinence rates [47].
We only included studies that applied biological monitoring of substance use. Biological testing is the most reliable and objective measure for abstinence [11]. The studies included in this meta-analysis mostly reported urine toxicology as the method of biological testing. Yet, abstinence rates might be inflated due to false-negative urine toxicology [48]. On the other hand, biological testing might be more effective in promoting abstinence than self-report. Indeed, studies on monitoring without biological testing among healthcare professionals showed somewhat less positive results (i.e., abstinence rates ranging from 13% to 76% and work retention rates ranging from 36% to 89%) [49-58]. This might indicate that monitoring programs should preferably include biological monitoring of substance use.
Heterogeneity in abstinence rates across studies was partially explained by the starting moment of monitoring. This suggests a potential source of selection bias, depending on the timing of monitoring. Participants who start monitoring after successful treatment completion might be strongly motivated to achieve abstinence and have high chances to maintain their good treatment outcome. Moreover, the group who starts monitoring simultaneously with treatment initiation also includes participants who will drop out of treatment, or relapse during treatment. This will lead to lower success rates of monitoring. Indeed, many continuing care studies limited their participants to those who had successfully completed the initial treatment phase, thus introducing selection bias [59]. Other variables included in the subgroup analyses (duration of follow-up, gender, and type of healthcare professional) did not explain a substantial part of the heterogeneity across studies. Unfortunately, the data reported in the included studies did not enable us to perform subgroup analyses on type of initial treatment (inpatient, outpatient, pharmacological intervention, etc.) and on the mandatory status of monitoring.
Several other sources of bias might affect our findings. First, it has been suggested that many physicians who are forced to participate in a PHP might not actually have a SUD [60]. Not all PHPs use diagnostic criteria to assess their participants. Indeed, more than two-thirds of the studies included in our meta-analysis did not specify the diagnostic process of SUD assessment. Secondly, some of the studies we included did not take into account participants who were lost to follow-up in calculating the overall success rate of monitoring. It is unclear how this may have influenced the outcomes. Participants may have become lost to follow-up either because they are doing well and feel they no longer need monitoring or, on the other end of the spectrum, because they have relapsed and cannot be located or do not want to reveal their condition [59]. Thirdly, the duration of follow-up varied widely within and between the included studies, and durations were presented as a range, an average, or an exact follow-up of 0 to 8 years. A follow-up of 0 years meant that some participants had only recently started monitoring, whereas other participants in the same study were followed up for 3 or 5 years. Fourthly, three very small studies showed either high [21,37] or low [22] success rates, thereby possibly skewing the results. Though some success rates changed slightly, the sensitivity analyses showed that the main findings still hold, indicating the robustness of findings. Lastly, our meta-analysis showed asymmetry for both abstinence and work retention, suggesting publication bias. Taken together, this raises concerns of potential overestimation of the effectiveness of monitoring in the current literature [60]. In order to reduce reporting and publication bias, we strongly encourage health programs to systematically assess effectiveness and publish the outcomes of their monitoring.
The current study results should be interpreted in the light of several limitations. First, we identified a considerable amount of heterogeneity between studies, but were able to explain only a small fraction by the starting moment of monitoring. Other potential sources of heterogeneity, such as the severity of the SUD, the presence of comorbidity, a (family) history of SUD, the type of initial treatment (inpatient, outpatient, pharmacological and/or psychological intervention), and the status of monitoring (mandatory or voluntary), could not be analyzed since this information was generally not available across studies [31,33,61]. Secondly, we included only English-language research articles published in peer-reviewed journals. This might have increased bias in our study results, because we did not include foreign language studies, unpublished studies, partially published studies, and studies published in "grey" literature sources [62]. Thirdly, the definition of the abstinence outcome measure (no relapse during follow-up) was quite strict, so some abstinence rates included in the meta-analysis were lower than reported in the conclusions of the individual studies. Furthermore, the overall quality of the included studies was moderate, with 60% of the studies scoring 0.5 or lower on the Quality Index. Thus, future studies with more rigorous designs are needed in order to assess the effectiveness of monitoring for healthcare professionals with SUD. Finally, we focused only on healthcare professionals with SUD. Therefore, we cannot say anything about behavioral addictions or other psychiatric problems among healthcare professionals. Yet, some studies investigated the success of monitoring for other psychiatric problems among healthcare professionals, showing high recovery rates ranging from 88 to 94% and work retention rates ranging from 77 to 90% [12]. The current positive findings may thus indicate good prognosis of mental health issues in general among healthcare professionals.

Conclusions
Three quarters of the healthcare professionals who engaged in monitoring for SUD remained abstinent and were working at follow-up. There was significant heterogeneity across studies, as well as an indication of major publication bias. The heterogeneity in success rates of monitoring was partly explained by the starting moment of monitoring, with studies starting monitoring after treatment completion showing higher success rates than studies starting monitoring at treatment initiation. Given the heterogeneity across studies and the indication of publication bias, no firm conclusions can be drawn about the effectiveness of monitoring for healthcare professionals with SUD. Future studies should apply controlled comparisons, using more rigorous measurements and sufficiently long follow-up periods.

Acknowledgments:
The authors wish to thank Marlies de Rond, from the Royal Dutch Medical Association for facilitating the writing and discussion process.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
Table A1. The Health States Quality Index.