3.1. Results
Among the 70 papers identified as retracted RCTs by PubMed, 66 (94.3%) could be evaluated. Four papers were excluded for the following reasons: one appeared in an obscure journal and could not be obtained through the medical library of a major American university; one was a systematic review misclassified as an RCT; one was a psychology study that involved neither patients nor treatments; and one did not involve humans at all. A further three papers were excluded after closer evaluation because they reported not the results of a clinical trial but meta-analyses of data from clinical trials. That 10.0% (seven of 70) of studies indexed as RCTs by PubMed were misclassified suggests that relying on PubMed alone to identify RCTs can be problematic.
Every case was matched to controls by journal and volume, though it was not always possible to match by issue number. Of the 126 control papers, 116 (92.1%) were drawn from the same issue as their case, implying that those control RCTs went through a similar editorial process. Ten controls had to be drawn from a different issue than the matched case, but nine of those ten came from the next consecutive issue. One retracted case appeared in a journal that published so few RCTs that two controls could not be found in a consecutive issue. Nevertheless, 99.2% (125/126) of controls were matched to their cases within approximately one month of publication in the same journal.
We ran all analyses both with and without the block effect; including it made no difference in any case, so for simplicity only analyses without the block effect are reported.
Table 1 contains descriptive statistics and tests of the effect of each potential predictor in a multiple logistic regression model. By this method, only the number of authors was significantly associated with retraction status (p ≤ 0.0296).
Table 1.
Comparison of retracted randomized clinical trials (RCTs) (cases) and non-retracted RCTs (controls). “Logistic Regression” predicts retraction from each of the four predictors one at a time and is significant only for the number of authors. None of the analyses uses a block structure, since blocking added nothing to the predictive ability of the model. “Multiple Logistic Regression” predicts retraction from all four predictors in a single equation; the small disagreement between this analysis and the one-variable logistic regressions suggests some correlation among the predictors, such that when all predictors are used together, each adds only a small increment to the overall prediction. The log-transformed analysis of variance (“Log-Transformed ANOVA”) uses log-transformed predictors to address likely skew in the data. The “Permutation Test” tests the same hypotheses as the ANOVA but makes fewer assumptions; as expected, its p-values are nearly identical to the ANOVA’s, indicating that the ANOVA assumptions are met. These p-values were not corrected for multiple comparisons, because they demonstrate only a mild association that was not useful for prediction. All analyses except the multiple logistic regression were sensitivity analyses. Had we corrected the p-values in the multiple logistic regression, they would not have achieved statistical significance.
| | Retracted Cases (n = 63) | | Non-Retracted Controls (n = 126) | | Logistic Regression (1 Variable) | Multiple Logistic Regression (All Variables) | Log-Transformed ANOVA (1 Variable) | Permutation Test (100,000 Iterations) |
|---|---|---|---|---|---|---|---|---|
| | Mean | SD | Mean | SD | p ≤ | p ≤ | p ≤ | p ≤ |
| Number of authors | 5.0 | 3.2 | 6.7 | 5.8 | 0.0253 | 0.0296 | 0.0032 | 0.0032 |
| Subjects enrolled | 216.8 | 874.5 | 539.6 | 2367.6 | 0.3490 | 0.6310 | 0.6061 | 0.6080 |
| Patients at risk | 198.1 | 875.2 | 253.5 | 1264.5 | 0.7568 | 0.6868 | 0.6816 | 0.6807 |
| Treated patients | 123.6 | 438.4 | 167.2 | 1216.7 | 0.7853 | 0.9957 | 0.0047 | 0.0048 |
Because the apparent relationship between each predictor and retraction could have been influenced by the other variables in the model, we also ran four separate logistic regressions, predicting retraction status from each predictor alone. A small number of authors predicted retraction (p ≤ 0.0253) whether the number of authors was the only predictor in the model or the other predictors were included as well.
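For readers who wish to reproduce this kind of analysis, a minimal Python sketch of a one-variable logistic regression follows. It uses simulated placeholder data matched only to the group sizes and means in Table 1; it is not the study data or the authors’ code, and the variable names are ours.

```python
# Minimal sketch of a one-variable logistic regression of the kind in Table 1,
# using statsmodels. The data are simulated placeholders, not the study data.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "retracted": np.repeat([1, 0], [63, 126]),           # 63 cases, 126 controls
    "n_authors": rng.poisson([5.0] * 63 + [6.7] * 126),  # illustrative counts only
})

# Predict retraction status from a single predictor (plus an intercept term).
X = sm.add_constant(df[["n_authors"]])
fit = sm.Logit(df["retracted"], X).fit(disp=0)
print(fit.pvalues["n_authors"])  # Wald p-value for the number of authors
```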
As an additional sensitivity test, we used ANOVA to fit models with log-transformed versions of each skewed count variable and retraction status as the predictor. Log(0) was taken as log(0.5), a common way of handling zeros under log transformation. Log(number of authors) was significantly related to retraction status (F(1, 187) = 8.95, p ≤ 0.0032), as was log(number of treated patients) (F(1, 187) = 8.19; p ≤ 0.0047). This implies that, while retraction status cannot be “significantly” modeled as a function of the number of treated patients, retracted papers do differ significantly from non-retracted papers in having fewer treated patients. Finally, we assessed the same four variables with permutation tests, drawing 100,000 samples from the permutation distribution for each log-transformed variable. There were significant effects for the log number of authors (p ≤ 0.0032) and the log number of treated patients (p ≤ 0.0048).
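The permutation test itself is simple to implement. The sketch below shuffles group labels 100,000 times and recomputes the difference in mean log counts each time, replacing zeros with 0.5 before taking logs as described above; the input arrays are hypothetical stand-ins for the study data.

```python
# Sketch of a two-sided permutation test on a log-transformed count variable.
# `cases` and `controls` are hypothetical arrays, not the study data.
import numpy as np

rng = np.random.default_rng(0)
cases = rng.poisson(5.0, 63)        # illustrative counts only
controls = rng.poisson(6.7, 126)

def log_counts(x):
    x = np.asarray(x, dtype=float)
    return np.log(np.where(x == 0, 0.5, x))   # log(0) taken as log(0.5)

pooled = np.concatenate([log_counts(cases), log_counts(controls)])
n_cases = len(cases)
observed = pooled[:n_cases].mean() - pooled[n_cases:].mean()

extreme = 0
for _ in range(100_000):                      # samples from the permutation distribution
    perm = rng.permutation(pooled)
    diff = perm[:n_cases].mean() - perm[n_cases:].mean()
    if abs(diff) >= abs(observed):
        extreme += 1
print("two-sided p =", extreme / 100_000)
```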
A plot of the number of authors listed on case and control papers shows that retracted RCTs tend to have fewer named authors than non-retracted control RCTs (Figure 1). It may be noteworthy that 9.5% of retracted RCTs (six of 63) had eight or more authors, whereas 24.6% of control RCTs (31 of 126) did. Conversely, 17.5% of retracted RCTs (11 of 63) had only one or two authors, whereas just 6.3% of control RCTs (eight of 126) had so few.
Figure 1.
The number of authors listed in retracted case randomized clinical trials (RCTs) and non-retracted control RCTs. The category of “10 authors” includes papers with 10 or more authors.
We also determined the number of papers retracted in every field of medicine that had at least one retraction (Table 2). This analysis shows that medical fields differ sharply in the number of retracted RCTs. Anesthesiology had significantly more retracted RCTs than other fields of medicine, even when the comparison is Bonferroni-corrected for 18 multiple comparisons (χ² = 94.48; p < 0.001); a sketch of one such comparison follows Table 2. The impact of “repeat offender” authors was substantial only in anesthesiology, where a single author (Dr. Scott Reuben) published 14 retracted RCTs, accounting for 63.6% of all RCTs retracted in the field. Two additional authors also published retracted fraudulent RCTs in anesthesiology. However, if Reuben is removed from the analysis, anesthesiology is not significantly different from other medical fields.
Table 2.
Retracted randomized clinical trials (RCTs) characterized by field of medicine. “Publications/Retraction” was calculated by dividing “Total Published” by “Total Retracted” in each row. Anesthesiology had proportionally more retractions than every other medical field, even when Bonferroni-corrected for 18 possible comparisons between fields (χ² = 94.48; p < 0.001). Of the 22 retractions in anesthesiology, 14 were first-authored by Scott Reuben.
| Field | Total Retracted | Total Published | Publications/Retraction | Fraud | Error | Fraudulent Authors |
|---|---|---|---|---|---|---|
| Anesthesiology | 22 | 9881 | 449 | 16 | 6 | 3 |
| Gynecology | 7 | 6874 | 982 | 3 | 7 | 3 |
| Surgery | 6 | 66,719 | 11,120 | 2 | 4 | 2 |
| Oncology | 5 | 6562 | 1312 | 2 | 3 | 2 |
| Urology/Nephrology | 4 | 103 | 26 | 4 | 0 | 2 |
| Psychology | 4 | 28,822 | 7206 | 4 | 0 | 4 |
| Cardiology | 3 | 6259 | 2086 | 0 | 3 | 0 |
| Pediatrics | 2 | 6832 | 3416 | 1 | 1 | 1 |
| Rheumatology | 1 | 1604 | 1604 | 1 | 0 | 1 |
| Diabetology | 1 | 11,055 | 11,055 | 1 | 0 | 1 |
| Dentistry | 1 | 9054 | 9054 | 0 | 1 | 0 |
| Emergency medicine | 1 | 1888 | 1888 | 0 | 1 | 0 |
| Gastroenterology | 1 | 4281 | 4281 | 0 | 1 | 0 |
| Hematology | 1 | 1913 | 1913 | 1 | 0 | 1 |
| Orthopedics | 1 | 705 | 705 | 0 | 1 | 0 |
| Pulmonology | 1 | 3129 | 3129 | 1 | 0 | 1 |
| Sports medicine | 1 | 607 | 607 | 0 | 1 | 0 |
| Transplantation | 1 | 6705 | 6705 | 1 | 0 | 1 |
| Overall (sum or average) | 63 | 172,993 | 2746 | 37 | 29 | 22 |
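To make the field comparison concrete, the sketch below reconstructs one plausible version of the anesthesiology test from the counts in Table 2: a 2 × 2 chi-squared test of anesthesiology against all other fields pooled, Bonferroni-corrected for 18 field-wise comparisons. The exact contrast used in the original analysis is not spelled out here, so this is an illustration rather than the authors’ computation.

```python
# 2x2 chi-squared test built from Table 2: anesthesiology vs. all other fields.
from scipy.stats import chi2_contingency

retracted_anesth, published_anesth = 22, 9881
retracted_other = 63 - 22
published_other = 172_993 - 9881

table = [
    [retracted_anesth, published_anesth - retracted_anesth],
    [retracted_other, published_other - retracted_other],
]
chi2, p, dof, expected = chi2_contingency(table, correction=False)  # plain Pearson chi-squared
print(f"chi2 = {chi2:.2f}, uncorrected p = {p:.2e}")
print(f"Bonferroni-corrected p (x18) = {min(1.0, p * 18):.2e}")
```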
More retracted RCTs arose from the United States than from any other country: of the 63 retracted RCTs evaluated (Table 1), 26 (41.3%) were published by first authors whose institution was located in the United States. The next most frequent country of origin was Japan, with five retracted RCTs.
The overall number of retracted RCTs is plotted as a function of year of publication (Figure 2). The number of RCTs published in several fields of medicine is also shown for comparison; diabetology had more RCTs published than did anesthesiology (Figure 2), though anesthesiology had far more RCTs retracted (Table 2). The number of published RCTs increased progressively until recently, with a commensurate rise in the number of retractions. However, there does not appear to be an increase in the proportion of RCTs retracted, nor is there an obvious relationship between the total number of RCTs retracted and the number published within most fields of medicine. Therefore, the high rate of retraction of RCTs in anesthesiology likely cannot be explained by growth in the overall volume of RCTs in anesthesiology (Table 2).
Figure 2.
Randomized clinical trials (RCTs) published per year in several different fields of medicine, together with the number of retracted RCTs (×10) year by year since 1980. The decline in the number of published RCTs after 2008 is striking, especially in diabetology; this may reflect a global economic downturn or a lack of funding for clinical research.
3.2. Discussion
Our results suggest that retracted case RCTs have fewer authors and perhaps fewer treated patients than non-retracted control RCTs (Table 1). Yet the differences between cases and controls are not robust enough to predict which papers are likely to be retracted, even when all four variables of interest are combined in a predictive model. Retraction is a very rare event, and given its low base rate, virtually all papers flagged as likely retractions would be false positives, no matter the decision rule (a worked illustration follows). Overall, the strongest association with retraction is having few listed authors (Figure 1), though the field of medicine may also be associated with retraction (Table 2). Until recently, there was a sharp increase in published RCTs in most fields of medicine (Figure 2), though this increase in publication rate probably cannot explain the large number of retracted RCTs in anesthesiology (Table 2).
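A short worked example makes the base-rate point explicit. The prevalence below comes from Table 2; the sensitivity and specificity of the hypothetical screening rule are assumed values chosen only for illustration.

```python
# Base-rate arithmetic: even a good classifier flags mostly false positives
# when the condition is very rare. Sensitivity and specificity are assumed
# values for illustration; prevalence comes from Table 2.
prevalence = 63 / 172_993        # retracted RCTs per published RCT
sensitivity = 0.90               # assumed
specificity = 0.95               # assumed

true_pos = sensitivity * prevalence
false_pos = (1 - specificity) * (1 - prevalence)
ppv = true_pos / (true_pos + false_pos)
print(f"positive predictive value = {ppv:.4f}")  # ~0.0065: >99% of flags are false positives
```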
Scrupulous evaluation of the data published in retracted RCTs has shown that there can be subtle patterns suggestive of fraud, though evaluating these patterns requires access to raw data. Such patterns can be useful if, for example, a pharmaceutical company is evaluating data from an RCT that has not yet been published. Multi-center clinical trials, in particular, offer an opportunity to check the plausibility of submitted clinical data by comparing data submitted by different centers [6]. In one retracted single-center RCT, the standard deviations of several variables were found to be “unexpectedly” and “unbelievably” low [10]. Furthermore, when p-values were recalculated from the means and standard deviations in the paper’s tables, the recalculated values did not agree with the p-values reported by the authors [10]. Such findings argue that RCT publication should require that a locked copy of the dataset, and of the computer programs used to produce the results, be deposited in journal archives [10], even if these data are never made available to the public.
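Recomputing a p-value from published summary statistics is straightforward. A minimal sketch of such a consistency check follows; the means, standard deviations, and group sizes are hypothetical stand-ins for values read from a paper’s tables.

```python
# Recompute a two-sample t-test p-value from reported summary statistics
# and compare it against the p-value the paper states. Values are hypothetical.
from scipy.stats import ttest_ind_from_stats

stat, p = ttest_ind_from_stats(mean1=12.4, std1=3.1, nobs1=30,
                               mean2=10.8, std2=2.9, nobs2=30)
print(f"recalculated p = {p:.4f}")  # a large mismatch with the reported p is a red flag
```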
Why are retracted case RCTs generally smaller and less ambitious than matched control RCTs? It has been postulated that research fraud represents an effort to obtain the recognition and prominence that a key paper can provide without actually doing the work [3]. Newly developed statistical methods can detect patterns in data that may be indicative of fraud, focusing on outliers, inliers, over-dispersion, under-dispersion, and hidden correlations or the lack thereof [6]. Although such methods require access to raw data (and are likely to produce false positives, because retraction is rare), it can be quite difficult to fabricate plausible data [6]. That difficulty reinforces the case for requiring that the raw data for an RCT be filed with the journal of publication.
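As a simple illustration of the kind of dispersion screening such methods employ (a generic sketch, not the specific procedure of reference [6]): counts that are genuinely Poisson-distributed should have variance close to their mean, so a variance-to-mean ratio far from 1 is worth investigating.

```python
# Variance-to-mean ratio as a crude dispersion screen for count data.
# Generic illustration only; `counts` is a hypothetical column of reported counts.
import numpy as np

counts = np.array([4, 5, 5, 4, 5, 4, 5, 5, 4, 5])  # suspiciously uniform counts
ratio = counts.var(ddof=1) / counts.mean()
print(f"variance/mean = {ratio:.2f}")  # far below 1 suggests under-dispersion
```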
Clinical trials with a single author are more prone to retraction (Figure 1). This result is consistent with anecdotal reports: Dr. R.K. Chandra published 200 papers, many of which were single-author RCTs, and many of these papers are suspected to be fraudulent [11]. Fraudulent authors also tend to have extraordinarily high research output [10]; such was the case with Dr. R.K. Chandra [11], Dr. Scott Reuben [2], and Dr. Joachim Boldt [12]. This may be because it is less time-consuming to generate a fraudulent RCT than to perform a real one.
“Repeat offender” authors exact a heavy toll and may have skewed our results. Dr. Scott Reuben had 14 retractions among the 63 retracted RCTs examined here, 22.2% of all RCTs evaluated (Table 2). Dr. Joachim Boldt had 23 studies retracted in 2011 [12], though only one of his retracted studies is included here (the other 22 were retracted after the PubMed search in February 2011). The Reuben and Boldt cases [13,14] illustrate a common pattern: in-depth investigation inspired by a single retracted paper reveals that a first author has engaged in questionable practices, including other fraudulent papers, sometimes dating back many years [11]. Hence, if research fraud is detected, it is essential that the entire output of the author in question be carefully examined.
We concur with Trikalinos et al., who concluded that “fraudulent articles are not obviously distinguishable from non-fraudulent ones” [15], although we can add nuance to this conclusion. The Trikalinos study found that the number of authors did not differ between fraudulent and non-fraudulent papers [15]; we did find a difference for RCTs, though it was too small to be useful as a predictor. The Trikalinos study also concluded that research fields did not differ in the number of fraudulent papers [15], and we strongly disagree with that conclusion: anesthesiology is the field most prone to retraction (Table 2). Still, without Reuben, anesthesiology would be no more tainted by misconduct than any other medical field.
What harm might be done to patients by clinical research that is eventually retracted? To evaluate this question, consider just one retracted RCT evaluated here. Some years ago, a very controversial issue in medical oncology was the use of high-dose chemotherapy to treat metastatic breast cancer, with autologous stem cell transplantation after ablation of the patient’s immune system [16]. Though several RCTs had tested this therapeutic approach, no study had provided clear-cut evidence that the rigors of treatment were rewarded with improved survival.
Then a landmark study by Dr. Werner Bezwoda claimed to show that high-dose chemotherapy with stem cell rescue was a useful treatment for breast cancer [17]. Bezwoda had already published many papers in the medical literature; he was a well-established scientist with a commensurate reputation, and his “stem-cell rescue” paper was given credence. In the 1990s, breast cancer became the most common disease for which transplant therapy was given [16]. Yet it gradually emerged that the key paper [17] was marred by fraud and that at least nine additional papers were also problematic [18]. Eventually, an exhaustive on-site analysis demonstrated unequivocally that the original study could not have been performed as Bezwoda described [16,18].
A great deal of real damage was done to patients by Bezwoda’s fraud. According to the National Breast Cancer Coalition, approximately 30,000 breast cancer patients worldwide, 16,000 of them in the US, received high-dose chemotherapy with stem cell rescue before the Bezwoda study was discredited [19]. Many terminally ill breast cancer patients, faced with a choice between standard chemotherapy, known to be ineffective in some cases, and an apparently promising new therapy, chose high-dose chemotherapy despite its attendant side effects. Eventually, even women with early-stage but high-risk breast cancer were encouraged to get high-dose chemotherapy, at a cost of $80,000 to $200,000 per treatment [19]. Subsequent clinical trials showed that high-dose treatment with stem cell rescue did little [20] or nothing [21,22,23,24] to extend the survival of breast cancer patients, and such treatment was associated with more frequent adverse events [22], at least in the short term [25]. Hence, a fraudulent RCT did substantial damage to breast cancer patients.