Physician-Related Variability in the Outcomes of an Invasive Treatment for Neck and Back Pain: A Multi-Level Analysis of Data Gathered in Routine Clinical Practice

Neuro-reflexotherapy (NRT) is a proven effective, invasive treatment for neck and back pain. To assess physician-related variability in results, data from post-implementation surveillance of 9023 patients treated within the Spanish National Health Service by 12 physicians were analyzed. Separate multi-level logistic regression models were developed for spinal pain (SP), referred pain (RP), and disability. The models included all patient-related variables predicting response to NRT and physician-related variables. The Intraclass Correlation Coefficient (ICC) and the Median Odds Ratio (MOR) were calculated. Adjusted MOR (95% CI) was 1.70 (1.47; 2.09) for SP, 1.60 (1.38; 1.99) for RP, and 1.65 (1.42; 2.03) for disability. Adjusted ICC (95%CI) values were 0.08 (0.05; 0.15) for SP, 0.07 (0.03; 0.14) for RP, and 0.08 (0.04; 0.14) for disability. In the sensitivity analysis, in which the 6920 patients treated during the physicians’ training period were excluded, adjusted MOR was 1.38 (1.17; 1.98) for SP, 1.37 (1.12; 2.31) for RP, and 1.25 (1.09; 1.79) for disability, while ICCs were 0.03 (0.01; 0.14) for SP, 0.03 (0.00; 0.19) for RP, and 0.02 (0.00; 0.10) for disability. In conclusion, the variability in results obtained by different NRT-certified specialists is reasonable. This suggests that current training standards are appropriate.


Subjects
At the design phase of this study, it was decided to analyze the variability of results obtained by different practitioners once the first 9000 patients who sought care for NP or LBP within the SNHS, and had undergone NRT, had been discharged. All the patients who had been discharged after receiving NRT were included in this study, with no exclusion criteria.
Patients were treated by 12 physicians (two supervisors and 10 trainees), none of whom authored this study.
Patients are referred by primary care physicians to specialized NRT Units in each region following a standardized protocol (Figure 1). Specialists confirm indication criteria and carry out the NRT interventions after patients have signed the written consent forms, which authorize the intervention and the analysis of data gathered during post-implementation surveillance. Twelve weeks later, the surgical material is removed and patients are discharged, unless criteria for repeating the procedure are met (pain improvement after the previous intervention ≥2 VAS points, pain severity still ≥3 VAS points, absence of relevant adverse events, and patient's written consent) [16][17][18][19][20][21][22][23].

Post-Implementation Surveillance
All data analyzed in this study stem from post-implementation surveillance in routine practice. They are gathered through previously validated methods, and introduced into a database [18][19][20][21][22][23].

Data provided by the patient include: gender, date of birth, academic level (less than elementary school, elementary school, high school, university), current pregnancy, employment status ("working"; "not qualifying for receiving financial assistance for NP, TP or LBP", e.g., housewife; or "receiving financial assistance for NP, TP or LBP", e.g., sick leave), involvement in NP, TP or LBP-related employment claims (e.g., requesting disability pension), involvement in NP, TP or LBP-related litigation (e.g., traffic accident), and satisfaction (standardized questionnaire completed anonymously and alone by the patients).
Pain and disability are assessed at each visit to the primary care centers and the specialized NRT Units. Separate 10-cm visual analogue scales (VASs) are used for spinal pain (NP, TP or LBP) and referred pain [27]. The Roland-Morris Questionnaire (RMQ) and the Neck Disability Index (NDI) are used for LBP-related and NP-related disability, respectively [28,29]. Value ranges (from best to worst) are 0-10 for VAS, 0-24 for RMQ, and 0-100 for NDI [27][28][29][30][31][32]. Since the NDI is used for patients with NP and the RMQ is used for patients with LBP, at the analysis phase a "standardized score for disability" is calculated with a value range from 0 to 100 (from best to worst). This score reflects the percentage of the maximum possible score for neck or back pain-related disability (100 and 24 points, respectively). Data on adverse events (detected by patients, referring physicians, or NRT specialists) and use of diagnostic and other therapeutic procedures are also gathered.
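The standardized disability score described above can be illustrated with a minimal sketch. The function name `standardized_disability` and the instrument labels are hypothetical and chosen only for illustration; the maximum scores (24 for RMQ, 100 for NDI) are those stated in the text.

```python
# Maximum possible raw score for each disability instrument (from the text).
MAX_SCORE = {"RMQ": 24.0, "NDI": 100.0}


def standardized_disability(raw_score: float, instrument: str) -> float:
    """Return the raw score as a percentage of the instrument's maximum (0-100)."""
    if instrument not in MAX_SCORE:
        raise ValueError(f"unknown instrument: {instrument!r}")
    maximum = MAX_SCORE[instrument]
    if not 0 <= raw_score <= maximum:
        raise ValueError(f"{instrument} score must lie between 0 and {maximum}")
    return 100.0 * raw_score / maximum


if __name__ == "__main__":
    # A patient with LBP scoring 12 on the RMQ and a patient with NP scoring
    # 35 on the NDI are placed on the same 0-100 scale.
    print(standardized_disability(12, "RMQ"))
    print(standardized_disability(35, "NDI"))
```

Expressing both instruments as a percentage of their maximum lets a single disability model include patients with NP and patients with LBP.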

Training Standards and Certification
Training standards for NRT imply a 3-year education program, during which 850 supervised NRT interventions per year are performed [25]. In order to qualify, trainees must match ≥95% of their supervisor's decisions on patient eligibility for treatment and obtain clinical results in line with those from certified specialists.
After certification, trainees obtain full privileges to perform NRT interventions in solo practice, and receive a code to access the software used for post-implementation surveillance. Until then, they use their supervisors' personal code. Therefore, data from the interventions performed during training are assigned to each supervisor.

Analysis
Outcomes were spinal pain (SP), referred pain (RP), and disability [33]. Changes between the scores at referral and at discharge were calculated for each outcome, and patients who had experienced clinically relevant improvements were distinguished from those who had not. A "clinically relevant improvement" was defined as a score reduction in the corresponding measuring instrument larger than the minimal clinically important change (MCIC). MCIC has been established at 30% of the baseline value, with a minimum value of 1.5 VAS points for SP and RP, 7 NDI points for NP-related disability, and 2.5 RMQ points for LBP-related disability [30][31][32][33]. These definitions made it impossible for patients with a baseline score below the cut-off point for a given variable to show a clinically relevant improvement for that variable. Therefore, these patients were excluded from the analysis on that variable. Patients for whom the score of one of these variables at baseline or at discharge was missing were also excluded from the analysis on that variable.
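The classification rule above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the "cut-off point" is taken to be the minimum MCIC of each instrument, and `None` is used to mark patients who would be excluded from the analysis on that variable.

```python
from typing import Optional

# Minimum MCIC per instrument, in raw points (values taken from the text).
MIN_MCIC = {"VAS": 1.5, "NDI": 7.0, "RMQ": 2.5}


def mcic(baseline: float, instrument: str) -> float:
    """MCIC: 30% of the baseline score, but never less than the instrument minimum."""
    return max(0.30 * baseline, MIN_MCIC[instrument])


def clinically_relevant_improvement(
    baseline: Optional[float], discharge: Optional[float], instrument: str
) -> Optional[bool]:
    """Return True/False for analyzable patients, None for excluded patients."""
    # Excluded: a score missing at baseline or at discharge.
    if baseline is None or discharge is None:
        return None
    # Excluded (assumption): baseline below the minimum MCIC, so no relevant
    # improvement could ever be shown on this variable.
    if baseline < MIN_MCIC[instrument]:
        return None
    # Relevant improvement: reduction strictly larger than the MCIC.
    return (baseline - discharge) > mcic(baseline, instrument)


if __name__ == "__main__":
    print(clinically_relevant_improvement(8.0, 5.0, "VAS"))
    print(clinically_relevant_improvement(4.0, 3.0, "VAS"))
    print(clinically_relevant_improvement(1.0, 0.0, "VAS"))
```

Because the outcome is dichotomized in this way, each of the three models is a logistic model fitted only to the patients who are analyzable for that outcome.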
Results for each outcome were analyzed using multilevel logistic regression models [34], with patients at the first level and physicians at the second level. In the fixed-effects part of the models, the associations between variables and outcomes were appraised using the odds ratio (OR) with a 95% confidence interval (95% CI). Two models were developed for each outcome: an empty model, in which only a random intercept was included, and a full model including, at the patient level, all variables which have been shown to predict improvement in SP, RP, and disability after NRT [21]. These variables are: gender, age (years), baseline scores for SP (VAS points), RP (VAS points) and disability (standardized disability score), reason for referral (NP or LBP), time elapsed since the first pain episode (<1 year, 1-<5 years, 5-<10 years, ≥10 years), duration of the current episode ("subacute": 14 to 89 days; "chronic": 90 to 365 days; "highly chronic": >365 days) [34,35], employment status ("passive" or "working"), type of pain ("radicular pain caused by symptomatic disc protrusion/herniation or lumbar spinal stenosis" vs. "common NP or LBP"), diagnosis of fibromyalgia, other comorbidities, involvement in employment claims, diagnostic tests undertaken until referral to NRT (X-rays, MRI, other), imaging findings, history of spine surgery, and treatments used prior to referral for NRT. At the physician level, the model included the number of years each physician had been performing NRT interventions in solo practice (i.e., after having been certified).
Data from patients treated for thoracic pain were excluded, since predictive models have only been developed for NP and LBP [21]. Only the first pain episode was analyzed for each patient because the number of pain episodes per patient was low, and, when combining "patient" and "physician" in one single level, very few replicates were available to assess variability.
Empirical Bayes' residuals were calculated for each physician [36]. In each model, the variability at the "physician" level was estimated through the Median Odds Ratio (MOR) and the Intraclass Correlation Coefficient (ICC). The latter was adapted to logistic regression by the latent variable method [37,38].
The ICC quantifies the fraction of the total variability in outcomes which is attributable to the physicians. ICC values range from 0 to 1. The higher the ICC value, the greater the variability that can be attributed to physicians. For instance, an ICC of 0.06 indicates that 6% of the total individual variation in the odds of improvement occurs at the physician level and might be attributable to contextual physician factors.
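As a minimal sketch of the latent variable method, the ICC of a two-level logistic model is commonly computed from the physician-level random-intercept variance, with the patient-level variance fixed at π²/3 (the variance of the standard logistic distribution). The function name and the variance value used in the example are hypothetical and are not taken from the study's models.

```python
import math

# Variance of the standard logistic distribution, used as the level-1
# (patient-level) variance in the latent variable method.
LOGISTIC_VARIANCE = math.pi ** 2 / 3


def icc_latent(physician_variance: float) -> float:
    """ICC = physician-level variance / (physician-level variance + pi^2 / 3)."""
    if physician_variance < 0:
        raise ValueError("a variance cannot be negative")
    return physician_variance / (physician_variance + LOGISTIC_VARIANCE)


if __name__ == "__main__":
    # Hypothetical random-intercept variance of 0.25 on the log-odds scale.
    print(round(icc_latent(0.25), 2))
```

Under this formulation, the ICC depends only on the estimated physician-level variance, which is why it rises and falls together with the MOR.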
Conceptually, the MOR reflects the degree of variability in clinical results which stems from the fact that patients are treated by different physicians. A higher MOR reflects a higher variability. The MOR quantifies the difference in results obtained by different physicians by comparing clinical results from two patients treated by two different physicians selected randomly. For instance, considering two patients with the same covariates, selected randomly among cases treated by two different physicians, the MOR is the median odds ratio between the patient with higher odds and the patient with lower odds [38], and can be interpreted as the median increase in the odds of improvement if a given patient were treated by another, better performing physician. A MOR value of 3 would mean that, in the median comparison between two randomly selected physicians, the odds of improving for a patient treated by the better performing physician are 3 times the odds for an identical patient treated by the other physician. In this study, the MOR shows the extent to which the probability that an individual patient will improve (in terms of pain severity and disability) is determined by the treating physician [39].
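A minimal sketch of how the MOR is commonly obtained from the physician-level random-intercept variance of a multilevel logistic model: MOR = exp(√(2σ²) × Φ⁻¹(0.75)), where Φ⁻¹(0.75) ≈ 0.6745 is the 75th percentile of the standard normal distribution. The function names and the variance value in the example are hypothetical; the sketch assumes normally distributed physician effects.

```python
import math
from statistics import NormalDist

# 75th percentile of the standard normal distribution (about 0.6745).
Z_75 = NormalDist().inv_cdf(0.75)


def median_odds_ratio(physician_variance: float) -> float:
    """MOR = exp(sqrt(2 * variance) * z_0.75); equals 1 when there is no variability."""
    if physician_variance < 0:
        raise ValueError("a variance cannot be negative")
    return math.exp(math.sqrt(2.0 * physician_variance) * Z_75)


def variance_from_mor(mor: float) -> float:
    """Invert the MOR formula to recover the physician-level variance."""
    if mor < 1:
        raise ValueError("the MOR is always >= 1")
    return (math.log(mor) / (math.sqrt(2.0) * Z_75)) ** 2


if __name__ == "__main__":
    # Hypothetical physician-level variance of 0.25 on the log-odds scale.
    mor = median_odds_ratio(0.25)
    print(round(mor, 2))
    # Median increase in odds, in percent, when moving to the better physician.
    print(round(100 * (mor - 1)))
```

Read this way, a MOR of 1.70 corresponds to a median increase of 70% in the odds of improvement, which is how MOR values are translated into percentages.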
A sensitivity analysis was performed, in which the models were repeated restricting data to patients treated by the trainees after they were certified as NRT specialists (i.e., data on interventions assigned to codes corresponding to supervisors were eliminated).

Role of the Funding Source and Conflicts of Interest-Associated Biases
This study was promoted by the Spanish Back Pain Research Network, a not-for-profit research organization with no links to the health industry. No external funds were received to fund this study. Only the authors were responsible for the design and conduct of the study; data collection, management, analysis and interpretation; preparation, review and approval of the manuscript; and the decision to submit the article for publication. None of the authors received any payment for this work, and none harbor any conflicts of interest.

Results
Since post-marketing surveillance was applied simultaneously at centers in different geographic locations, when recruitment for this study stopped, 9023 patients had been discharged after having undergone NRT for neck or back pain. Therefore, all 9023 patients were included in this study. There were no losses to follow-up. Figure 2 shows the flow chart of the study.
The identity of the treating physician was missing for 157 (1.8%) patients. Among the 12 physicians who were identified, the median (P25; P75) number of patients treated in solo practice was 180 (44; 649). The number of NRT interventions performed by, or assigned to, the two supervisors was 6763. Two trainees never qualified and are not identified in Table 1; data on the evolution of the 115 patients they had treated appear assigned to their supervisors. The sensitivity analysis included data on the 2103 patients treated by the 10 junior specialists after they were certified (Table 1).
The proportions of patients who presented missing scores for spinal pain, referred pain or disability at baseline or discharge, or had a baseline score below the corresponding MCIC, were 8%, 32%, and 37%, respectively. Additionally, the numbers of patients with missing completely at random (MCAR) scores for any of the independent variables introduced in the full models were 3379 for the model on spinal pain, 2418 for the one on referred pain, and 1537 for the one on disability. Therefore, the model on spinal pain included data from 4791 patients, the one on referred pain included data from 3606 patients, and the one on disability included data from 4061 patients (Figure 2).
Clinical results obtained by each physician are shown in Table 1, while Table 2 shows the distribution of physician and patient characteristics across the patients who did and did not show clinically relevant improvements in SP, RP, and disability after NRT. Tables 6-8 show results from the multilevel models on the strength of the association between predictors and improvement in spinal pain, referred pain, and disability.
Table footnotes: * Frequency (%); ¥ Median (P25; P75); Type of pain: "Radicular pain caused by disc protrusion/herniation or spinal stenosis" if (a) severity of referred pain ≥ local pain, (b) corresponding imaging finding on MRI, and (c) distribution of pain consistent with the nerve root compressed by the corresponding imaging finding; "Non-specific pain" if one or more of these criteria were not met; ¤ Other diagnostic procedures: EMG, CT scan and other; ∞ Other imaging findings: annular tear, loss of cervical lordosis, loss of thoracic kyphosis, loss of lumbar lordosis, horizontalization of the sacrum, lumbarization of S1, sacralization of L5; NRT: Neuroreflexotherapy intervention; SP: Severity of spinal pain; RP: Severity of referred pain; VAS: Visual Analog Scale (range from better to worse; 0-10).
Table footnotes (spinal pain): † Only includes patients whose spinal pain at baseline was higher than the minimal clinically important change, and for whom data on this variable at baseline and discharge were available. £ Restricted to patients treated by physicians after the latter became certified NRT practitioners.
Table 4. Estimates of the inter-physician variability for improvement in referred pain (RP). † Only includes patients whose referred pain at baseline was higher than the minimal clinically important change, and for whom data on this variable at baseline and discharge were available. £ Restricted to patients treated by physicians after the latter became certified NRT practitioners.
The proportion of patients who improved and did not improve was virtually the same in both the empty and the full models for these variables (Table 9). Figure 3 shows the Empirical Bayes' residuals of each physician's variability for each outcome, and Figure 4 shows the number of years of experience for each physician, and the improvement in outcomes.
Table footnotes (disability): † Only includes patients whose disability at baseline was higher than the minimal clinically important change, and for whom data on this variable at baseline and discharge were available. £ Restricted to patients treated by physicians after the latter became certified NRT practitioners.
Figure 4. Years of experience for each physician, and improvement in neck and back pain, referred pain, and disability *. *: Note that the figure shows improvement in pain and disability between 50% and 100% of baseline values (not between 0 and 100%). Therefore, differences across physicians appear to be larger than they are.

Discussion
Previous studies have shown that NRT improves pain and disability in 84-89% of patients with subacute and chronic neck and back pain, and that this improvement is clinically relevant in 72-76% of patients [18][19][20][21][22][23]. Results from the current study suggest that, after having adjusted for patient-related characteristics which predict individual responses to NRT, changing the physician who performs the procedure is associated with a variation of 60-70% in the odds of experiencing a clinically relevant improvement (MOR values for SP, RP and disability 1.70, 1.60, and 1.65, respectively), and that the physician who performs NRT accounts for 7-8% of the variability in patients' evolution (ICC values for SP, RP, and disability 0.08, 0.07, and 0.08, respectively) (Tables 3-5). These figures decrease to 25-38% and 2-3%, respectively, when the analysis is restricted to results obtained by certified specialists (Tables 3-5).
Several factors can account for this reduction. In the complete dataset, the number of observations was higher and the number of years in practice and the number of patients assigned to each physician were skewed. Moreover, in the analysis of the complete data set, results from the interventions for which the worst and best results were to be expected (i.e., those performed by trainees at the beginning of their training period, and by senior experts) were identified with the same codes, which impeded analyzing them separately. All of the above may have contributed to the MOR and ICC values being larger in this analysis than in the analysis in which only data from certified specialists were included.
Senior experts who participated in this study can be seen as those with the maximum level of competency in performing NRT interventions; they had ≥20 years of experience and had been shown to obtain positive clinical results in RCTs and studies conducted in routine practice [14,15,18,19,22]. However, results obtained by recently certified specialists were better than the combined results of these senior experts and the same junior specialists during their 3-year pre-certification training (Tables 1 and 6-8, and Figure 4). At the end of the training period, all physicians obtained improvements in pain ≥60% of baseline value, which is unusually positive for patients with subacute and chronic neck and back pain treated in routine clinical practice. Some physicians obtained better results sooner than others, but, in general, between 3 and 5 years after certification, results across physicians became similar (Figure 4). This suggests that training sharply increases the proficiency of trainees, but that the learning curve for this procedure is long. In fact, the number and specific location of the surgical devices implanted in an NRT intervention are determined through physical examination and subtle manual palpation, vary from one patient to another, and are essential for the procedure to be effective; the insertion of the same number of devices within a 5 cm-radius of the target zone has consistently been shown to have virtually no clinical effect [14,15,17,40].
Concerns have been expressed about the feasibility of accurately assessing the clinical performance of physicians, especially for procedures requiring manual skills, given that awareness that their results are being monitored might lead physicians to change their behavior or to avoid treating patients they feel might have a worse prognosis [41,42]. However, these concerns are not likely to challenge the results of this study, since post-marketing surveillance mechanisms have been in force uninterruptedly since NRT was first implemented in the SNHS and include all patients who have undergone this procedure. Furthermore, physicians treating the more complex cases have no reason to fear that this could penalize data on their performance, since they are aware that results are adjusted for the individual prognosis of each patient [21].
Most reports on learning curves or variability across practitioners for invasive treatments for neck or low back pain focus on interventions other than NRT, many of which lack high quality evidence supporting their efficacy and effectiveness. Moreover, they are based on small samples (e.g., ≤150 patients or ≤3 practitioners), do not use multi-level analyses to adjust results for patient prognosis as established by previously validated models, or have been conducted either retrospectively or outside routine practice conditions (e.g., based on data gathered in randomized controlled trials) [8][9][10][11][12][13]40,41,[43][44][45][46][47][48][49][50][51][52][53][54]. Therefore, it is inappropriate to compare such data with results from this study.
To date, although some invasive procedures used for neck or back pain have been assessed [55][56][57], most lack solid evidence on their efficacy or effectiveness [6,7,58,59], and very few undergo post-marketing surveillance, which makes it difficult to establish any reliable benchmarks. If, as suggested [60,61], all invasive health technologies become subject to more stringent assessment and surveillance processes in the future, it will be possible to compare results from this study with data on other procedures. It is also difficult to compare physician-related variability in results when using NRT with that obtained when using other invasive procedures, since very few other invasive treatments for neck and back pain are subject to systematic and validated post-implementation surveillance mechanisms in routine practice [18][19][20][21][22][23][24][25]. Generalizing such mechanisms to all treatments for neck and back pain will make it possible to compare physician-related variability across treatments.
Data analyzed in this study stem from surveillance mechanisms in routine practice. These mechanisms encompass all patients who undergo NRT within the Spanish National Health Service, with very few losses to follow-up, and use previously validated methods [18][19][20][21][22][23]. Therefore, validity of data is not a major concern. However, this study has several weaknesses. It was impossible to distinguish the results obtained by the trainees during their training period from those obtained by the senior experts who acted as their supervisors. In the future, trainees will be assigned a personal code throughout their entire traineeship period, which will make it possible to analyze the results they obtain separately. Missing data impeded inclusion of data from all patients in the full models on spinal pain, referred pain, and disability (Figure 2). However, it is unlikely that missing data introduced any biases, since the proportions of patients who improved and did not improve were virtually the same in the empty and full models (Table 9). The number of years of solo practice after certification and the number of patients treated by each physician were skewed; this may influence the stability of results from the models. This is due to the fact that, in order to comply with the training standards, new trainees are only recruited when the procedure is implemented in a new territory. Generalizing this procedure will make it possible to resolve this.

Conclusions
In conclusion, results from this study suggest that, at the end of their training period, NRT-certified specialists achieve clinical results that are reasonably similar after adjusting for patient characteristics. This suggests that current training standards are valid for generalizing this technology.

Institutional Review Board Statement:
This study consisted of a clinical audit which did not imply any changes to routine clinical practice, and the data analyzed did not contain any personal information that would reveal the identity of patients. Therefore, according to Spanish law, this study was not subject to approval by an Institutional Review Board.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data analyzed and presented in this study are available upon request from the authors.