A Case-control Comparison of Retracted and Non-retracted Clinical Trials: Can Retraction Be Predicted?

Does scientific misconduct severe enough to result in retraction disclose itself with warning signs? We test the hypothesis that variables in the results section of randomized clinical trials (RCTs) are associated with retraction, even without access to raw data. We evaluated all English-language RCTs retracted from the PubMed database prior to 2011. Two controls were selected for each case, matching publication journal, volume, issue, and page as closely as possible. The numbers of authors, subjects enrolled, patients at risk, and patients treated were tallied in cases and controls. Among case RCTs, 17.5% had ≤2 authors, while 6.3% of control RCTs had ≤2 authors. Logistic regression shows that having few authors is associated with retraction (p < 0.03), although the numbers of subjects enrolled, patients at risk, and patients treated are not. However, none of the variables singly, nor all of the variables combined, can reliably predict retraction, perhaps because retraction is such a rare event. Exploratory analysis suggests that the retraction rate varies by medical field (p < 0.001). Although retraction cannot be predicted on the basis of the variables evaluated, concern is warranted when there are few authors, enrolled subjects, patients at risk, or treated patients. Ironically, these features urge caution in evaluating any RCT, since they identify studies that are statistically weaker.


Introduction
Factors that potentially promote research misconduct in clinical research are legion: financial gain, personal fame, scientific hubris, and the competitive nature of research funding may all contribute [1]. What should balance these factors is an ethical concern for the wellbeing of the patient, since fraudulent clinical research may put patients at risk [2][3][4][5].
Fabrication and falsification of data have a long history; allegations of misconduct have been made against Ptolemy, Galileo, Newton, Dalton, and Mendel [6]. Recently, a claim was made that certain "warning signs… can be used in most instances to identify… attempts to deceive" [7], though the editorial in which this claim appeared identified no such warning signs. Statistical methods have been devised to detect fraud in clinical questionnaire data [8] and in clinical trial data [6], and a statistical comparison of a retracted randomized clinical trial (RCT) to a non-retracted RCT found that statistical features of the retracted RCT's data were so strongly suggestive of fabrication that other explanations were not plausible [9]. Specifically, differences between treated and control groups at baseline, following "randomization," were large enough to suggest that patients had not actually been allocated randomly [9]. However, there are problems with this approach; given the large number of RCTs and the large number of variables measured at baseline, such discrepancies may be frequent. Furthermore, this type of analysis requires access to raw data, which can be precluded by patient confidentiality.
Here, we test a hypothesis that there are warning signs of retraction that may be available in the submitted manuscript, without access to underlying raw data. We ask: are there features of a newly submitted manuscript that should urge caution for editors and referees?

Methods
We evaluated every RCT noted as retracted from the PubMed database prior to 2011. PubMed was searched on February 1, 2011, with the limits "retracted publication, randomized clinical trial, English language." A total of 70 RCTs were identified, all of which were exported from PubMed and saved as a text file (available upon request).
Each retracted paper was read to verify that the research was actually an RCT, defined as a study involving humans prospectively allocated to competing treatments or to treatment and placebo [2]. Papers that met these criteria were further evaluated using an established analytic approach [3].
Two non-retracted control RCTs were selected to match each retracted case RCT, using a method designed to control for standards that might differ between journals or between medical fields. Controls were identified by selecting non-retracted RCTs that had appeared at the same time and in the same journal as the retracted case RCTs; controls were matched to cases by journal name, volume, and issue, with the journal page number matched as closely as possible. This method assumes that control RCTs underwent the same editorial processes as time-matched case RCTs.
Quantitative analysis of retracted RCTs used parameters available to any editor prior to external review of a paper. Such data were extracted as follows [5]:
• Authors listed: total number of named authors under the title;
• Subjects enrolled: total number of patients and healthy controls;
• Patients at risk: patients with an illness (if only healthy subjects were enrolled, this number could be zero); this is a subset of "Subjects enrolled";
• Patients treated: patients who received a risky intervention (e.g., medication, surgery) in an RCT; blood draws and similarly minor interventions do not count toward this total, nor do patients who received placebo or standard treatment. This number could be zero, as it is a subset of "Patients at risk."
Variables were used in a block-matched case-control study, with each block comprising one case and two controls. The primary analysis was a multiple logistic regression predicting retraction status from all four variables. As a sensitivity analysis, we ran four simple logistic regressions, assessing whether each potential predictor was independently related to retraction status, because correlation among predictors could obscure one or more relationships. Further sensitivity analyses examined the relationship of retraction status to the variables above using analysis of variance (ANOVA) and a randomization test with 100,000 random permutations. We did each analysis with and without a block effect but, since blocks had essentially no effect, reported results are from analyses without a block effect. All statistical analyses used SAS (SAS Institute, Cary, NC, USA).
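As a rough sketch of the randomization test described above, the following Python code (ours, not the study's SAS implementation) permutes case/control labels over synthetic author counts and computes a two-sided permutation p-value on the difference in mean log counts; all counts and distribution parameters are illustrative assumptions, not the study's data:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical author counts (illustrative only -- not the study's data):
# 63 retracted cases and 126 matched controls.
cases = rng.poisson(4, 63) + 1      # retracted RCTs: fewer authors on average
controls = rng.poisson(6, 126) + 1  # non-retracted controls

def permutation_p(a, b, n_perm=100_000, rng=rng):
    """Two-sided permutation test on the difference in mean log counts."""
    pooled = np.log(np.concatenate([a, b]))  # log transform, as in the paper
    observed = abs(pooled[:len(a)].mean() - pooled[len(a):].mean())
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        diff = abs(perm[:len(a)].mean() - perm[len(a):].mean())
        if diff >= observed:
            count += 1
    return count / n_perm

p = permutation_p(cases, controls, n_perm=10_000)  # fewer draws here, for speed
print(p)
```

Because the group labels are exchangeable under the null hypothesis, the fraction of shuffled label assignments producing a difference at least as large as the observed one estimates the p-value directly, with no distributional assumptions.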
The number of retracted articles was broken down by field of medicine, as determined by journal title, to determine whether retraction was related to medical field. The total number of RCTs published in a specific field was determined by searching PubMed using the keywords "randomized clinical trials, English language, (field of medicine)." This search strategy enabled us to determine the number of publications per retraction in fields of medicine that had at least one retraction. By searching each year individually, the number of retracted RCTs in, for example, anesthesiology in 2009 could be compared to the total number of published RCTs in anesthesiology in the same year. We collate the numbers of retractions in each field descriptively. A χ² analysis was used to evaluate the impression that the number of retractions in anesthesiology was higher than the norm shown in pooled data from all other medical fields that had a retraction.
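The χ² comparison of one field against the pooled remainder can be sketched as follows; the published-RCT totals in this table are hypothetical placeholders (only the 22 anesthesiology retractions come from the paper's Table 2), so the resulting statistic is illustrative:

```python
import numpy as np

# Rows: [anesthesiology, all other fields with >= 1 retraction]
# Cols: [retracted RCTs, non-retracted RCTs]
# Published totals are invented for illustration.
obs = np.array([[22, 10_000],
                [41, 150_000]], dtype=float)

# Pearson chi-square: expected counts from row/column marginals
row = obs.sum(axis=1, keepdims=True)
col = obs.sum(axis=0, keepdims=True)
exp = row * col / obs.sum()
chi2 = ((obs - exp) ** 2 / exp).sum()
df_within_table = 1  # (2 - 1) * (2 - 1) for a 2x2 table

# The 5% critical value for chi-square with 1 df is 3.841
print(chi2, chi2 > 3.841)
```

With counts this lopsided the statistic far exceeds the critical value; in the paper's actual data the analogous comparison gave χ² = 94.48.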

Results
Among the 70 papers identified as retracted RCTs by PubMed, a total of 66 papers (94.3%) were actually evaluated. Four papers were excluded for the following reasons: one paper was in an obscure journal that could not be obtained from the medical library of a major American university; one paper was a systematic review misclassified as an RCT; one paper was a psychology study that did not involve patients or treatments; and one study did not involve humans at all. An additional three papers were excluded after further evaluation because they did not report the results of a clinical trial but were rather meta-analyses of data from clinical trials. That approximately 10% of studies identified as RCTs by PubMed (seven of 70) were misclassified suggests that it can be problematic to rely upon PubMed to identify RCTs.
Cases were matched to controls by journal and by volume in every case, though it was not always possible to match cases and controls by issue number. Of 126 control papers, 116 (92.1%) were selected from the same issue as the case, implying that control RCTs went through a similar editorial process. Ten controls had to be selected from a different issue than the matched case, but nine of those 10 came from the next consecutive issue. One retracted case paper appeared in a journal that published so few RCTs that it was impossible to find two controls in a consecutive issue. Nevertheless, 99.2% (125/126) of controls were matched to cases within approximately one month of publication in the same journal.
We did all analyses with and without the block effect, and it made no difference in any case. We therefore dropped the block effect for simplicity, and only analyses without the block effect are reported. Table 1 contains descriptive statistics and tests of the effect of each potential predictor in a multiple logistic regression model. By this method, only the number of authors was significantly associated with retraction status (p = 0.0296).

Table 1.
Comparison of retracted randomized clinical trials (RCTs) (cases) and non-retracted RCTs (controls). "Logistic Regression" predicts retraction from each of the four predictors one at a time and is significant only for the number of authors. All other analyses do not use a block structure, since this added nothing to the overall predictive ability of the model. "Multiple Logistic Regression" tries to predict retraction from all four predictors in a single equation; the small disagreement between this analysis and the simple logistic regressions suggests that there is some correlation among predictors, such that when all predictors are used together, each adds a small increment to the overall prediction. The log-transformed analysis of variance (Log-Transformed ANOVA) uses log-transformed predictors to address likely skew in the data. The "Permutation Test" tests the same hypotheses as the ANOVAs but makes fewer assumptions; as expected, its p-values are nearly identical to those of the ANOVA, indicating that the assumptions of the ANOVA are met. These p-values were not corrected for multiple comparisons, because they demonstrate only a mild association that was not useful for prediction. All analyses except the multiple logistic regression were sensitivity analyses. Had we corrected p-values in the multiple logistic regression, they would not have achieved statistical significance. Because the relationship between predictors of retraction and the fact of retraction could have been influenced by other variables in the model, we did four separate logistic regressions to predict retraction status from each predictor. A small number of RCT authors predicted retraction (p = 0.0253) whether the number of authors was the only predictor in the model or other predictors were included.

As an additional sensitivity test, we used ANOVA to fit models with retraction status as the predictor and a log transformation of each of the skewed count variables as the outcome. Log(0) was taken as log(0.5), a common way of handling zeros when using a log transformation. Log(number of authors) was significantly related to retraction status (F(1, 187) = 8.95, p = 0.0032), as was log(number of treated patients) (F(1, 187) = 8.19; p = 0.0047). This implies that, while it is not possible to "significantly" model retraction status as a function of the number of treated patients, retracted papers differ significantly from non-retracted papers in having fewer treated patients. Finally, we assessed these same four variables using permutation tests, drawing 100,000 samples from the permutation distribution for each of the log-transformed variables. There were significant effects for the log number of authors (p = 0.0032) and for the log number of treated patients (p = 0.0048).
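The log-transformed one-way ANOVA, including the log(0) → log(0.5) convention, can be illustrated with a short sketch; the counts below are synthetic, so the F statistic is from the invented data, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical treated-patient counts (illustrative only):
cases = rng.poisson(20, 63)      # retracted RCTs: fewer treated patients
controls = rng.poisson(40, 126)  # non-retracted controls

def log_with_zeros(x):
    # Replace 0 with 0.5 before taking logs, as described in the paper
    return np.log(np.where(x == 0, 0.5, x))

a, b = log_with_zeros(cases), log_with_zeros(controls)
grand = np.concatenate([a, b]).mean()

# One-way ANOVA: between-group vs within-group sums of squares
ss_between = len(a) * (a.mean() - grand) ** 2 + len(b) * (b.mean() - grand) ** 2
ss_within = ((a - a.mean()) ** 2).sum() + ((b - b.mean()) ** 2).sum()
df_between, df_within = 1, len(a) + len(b) - 2  # 1 and 187 here
F = (ss_between / df_between) / (ss_within / df_within)
print(F)
```

With 63 cases and 126 controls, the denominator degrees of freedom are 189 − 2 = 187, matching the F(1, 187) tests reported above.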
A plot of the number of authors listed for case and control papers shows that retracted RCTs tend to have fewer named authors than non-retracted control RCTs (Figure 1). It may be noteworthy that 9.5% of retracted RCTs (six of 63) had eight or more authors, whereas 24.6% of control RCTs (31 of 126) had eight or more authors. Conversely, 17.5% of retracted RCTs (11 of 63) had one or two authors, whereas just 6.3% of control RCTs (eight of 126) had so few authors. We also determined the number of papers retracted in every field of medicine that had at least one retraction (Table 2). This analysis shows that medical fields differ sharply in the number of retracted RCTs. Anesthesiology had significantly more retracted RCTs than other fields of medicine, even when the comparison is Bonferroni-corrected for 18 multiple comparisons (χ² = 94.48; p < 0.001). The impact of "repeat offender" authors was substantial only in anesthesiology, which had 14 retracted RCTs published by a single author (Dr. Scott Reuben); Reuben thus accounted for 63.6% of all RCTs retracted in anesthesiology. Nevertheless, two additional authors each published a retracted fraudulent RCT in anesthesiology. However, if Reuben is deleted from the analysis, anesthesiology is not significantly different from other medical fields.
Table 2. Retracted randomized clinical trials (RCTs) characterized by field of medicine. "Publications/Retraction" was calculated by dividing "total published" by "total retracted" in each row. Anesthesiology had proportionally more retractions than every other medical field, even when Bonferroni-corrected for 18 possible comparisons between fields (χ² = 94.48; p < 0.001). Of the 22 retractions in anesthesiology, 14 were first-authored by Scott Reuben. Most retracted RCTs arose from the United States; of the 63 RCTs evaluated (Table 1), 26 (41.3%) had first authors whose institutions were located in the United States. The next most frequent country of origin was Japan, with five retracted RCTs.
The overall number of retracted RCTs is plotted as a function of year of publication (Figure 2). The number of RCTs published in several fields of medicine is also shown for comparison; diabetology had more RCTs published than did anesthesiology (Figure 2), though anesthesiology had far more RCTs retracted (Table 2). The number of published RCTs increased progressively until recently, with a commensurate rise in the number of retractions. However, there does not seem to be an increase in the proportion of RCTs retracted, nor is there an obvious relationship between the total number of RCTs retracted and the number published within most fields of medicine. Therefore, the high rate of retraction of RCTs in anesthesiology likely cannot be explained by the increase in the overall volume of RCTs in anesthesiology (Table 2).

Discussion
Our results suggest that retracted case RCTs have fewer authors and perhaps fewer treated patients than non-retracted control RCTs (Table 1). Yet, differences between cases and controls are not robust enough to predict which papers are likely to be retracted, even when all four variables of interest are combined in a predictive model. Retraction is a very rare event, and given its low prevalence, virtually all positive findings would be false positives, no matter the decision rule. Overall, the strongest association with retraction is having few listed authors (Figure 1), though the field of medicine may also be associated with retraction (Table 2). Until recently, there was a sharp increase in published RCTs in most fields of medicine (Figure 2), though this increase in publication rate probably cannot explain the large number of retracted RCTs in anesthesiology (Table 2).
Scrupulous evaluation of the data published in retracted RCTs has shown that there can be subtle patterns suggestive of fraud, though evaluating these patterns requires access to raw data. Such patterns can be useful if, for example, a pharmaceutical company is evaluating data from an RCT that has not

Figure 1.
Figure 1. The number of authors listed in retracted case randomized clinical trials (RCTs) and non-retracted control RCTs. The category of "10 authors" includes papers with 10 or more authors.

Figure 2.
Figure 2. Randomized clinical trials (RCTs) published per year in several different fields of medicine, together with the number of retracted RCTs (multiplied by 10) year-by-year since 1980. The decline in the number of published RCTs after 2008 is striking, especially in diabetology; this may reflect a global economic downturn or a lack of funding for clinical research.