A Novel Rubric for Rating the Quality of Retraction Notices

When a scientific article is found to be either fraudulent or erroneous, one course of action available to both the authors and the publisher is to retract said article. Unfortunately, not all retraction notices properly inform the reader of the problems with a retracted article. This study developed a novel rubric for rating and standardizing the quality of retraction notices, and used it to assess the retraction notices of 171 retracted articles from 15 journals. Results suggest the rubric to be a robust, if preliminary, tool. Analysis of the retraction notices suggests that their quality has not improved over the last 50 years, that it varies both between and within journals, and that it is dependent on the field of science, the author of the retraction notice, and the reason for retraction. These results indicate a lack of uniformity in the retraction policies of individual journals and throughout the scientific literature. The rubric presented in this study could be adopted by journals to help standardize the writing of retraction notices.


Introduction
Science builds on the work of others, so the scientific community must be able to trust the accuracy of published research. Occasionally, a research article must be retracted due to error or fraud. Having an article retracted can ruin a scientific career, and can have severe repercussions for other researchers using concepts or methodology from that article. If the retraction notice does not clearly specify the cause of the retraction, it can be impossible for researchers to know if and to what degree their own work is affected. Consequently, the quality of the notice accompanying a retraction is of great importance. Unfortunately, the quality of these notices may vary considerably.
Retractions have come under greater scrutiny in recent years. The blog Retraction Watch, founded in August 2010, reports on retractions and investigations within the scientific community, and often gives more detail about a retraction than the journal itself. For example, when covering a retraction in Clinical and Translational Allergy, the website obtained further information by emailing one of the collaborators on the paper, in order to better understand what the retraction notice called "unintentional errors in the analysis of the data presented" [1]. The blog also comments on the ways in which journals handle retractions. For example, one post reports on a retraction while expressing frustration with "one of the journal's typically inscrutable retraction notices" [2]. This indicates not only that some retraction notices are more informative than others, but also that journals vary in the quality of their retraction notices.
The Committee on Publication Ethics (COPE) was created in 1997 to help regulate the publishing process. In the wake of retraction research, it created guidelines for writing a retraction notice [3]. Though guidelines such as these are available, they are not always used. In a study of high impact biomedical journals [4], only 18% reported having a policy for retraction. Another study showed that only 36% of editors were aware of COPE's guidelines, and only 17% used them [5].
The recent attention on retractions in the scientific literature has generated research [6][7][8][9][10][11][12][13][14][15][16]. However, very little has been done on the retraction notices themselves. When looking at how a reader was notified about a retraction, one study determined that 31.8% of retracted articles were not noted as such in any way [13]. Another study showed that, where misconduct contributed to a retraction, only 41.2% of notices specified ethics as a problem [10], indicating that instances of fraud may not be properly identified as such in retraction notices. There have not been any studies attempting to quantify the quality of retraction notices.
The purpose of this study was to create a rubric based on the COPE guidelines to quantify the quality of retraction notices, and to use this rubric to analyze the current quality of retraction notices. It was hypothesized that the quality of notices would vary according to many factors, including the journal, its impact, its scientific field, the reason for retraction, and the author of the notice.

Development of a Rubric to Assess Retraction Notices
The two lead authors developed an initial rubric to assess the quality of retraction notices, based on the COPE retraction guidelines [3], and then refined the rubric to better distinguish between ratings by examining a selection of retraction notices. These authors, along with the third author (not involved in the rubric development and thus serving as a blind comparison), read each retraction notice and rated it on a scale of zero to three, based on this refined rubric:
0-No reason for retraction can be discerned from the notice.
1-The reason for retraction can be inferred but is not stated clearly through the naming or definition of a category.
2-The reason for retraction is clearly stated, but explanation is not given as to how the rest of the article was affected by retraction.
3-The reason for retraction is clearly stated and explanation is given for if and how the entirety of the article was affected by the fault.
After the initial separate ratings, the three authors reviewed the retractions with disputed scores. This allowed discussion of exceptions to the rubric, produced a consensus rating of all retraction notices, and increased the clarity of the rubric.
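The rating procedure above can be sketched in a few lines of Python. This is only an illustration, not the authors' tooling: the names `NoticeRating`, `disputed` and `notices_needing_discussion`, and the example ratings, are hypothetical.

```python
from enum import IntEnum


class NoticeRating(IntEnum):
    """The four rubric levels described above (member names are illustrative)."""
    NO_REASON = 0         # no reason for retraction can be discerned
    REASON_INFERRED = 1   # reason can be inferred but is not clearly stated
    REASON_STATED = 2     # reason clearly stated, effect on the article not explained
    REASON_AND_SCOPE = 3  # reason clearly stated and effect on the article explained


def disputed(ratings):
    """Return True when the independent raters did not all give the same score."""
    return len(set(ratings)) > 1


def notices_needing_discussion(independent_ratings):
    """List the notice ids whose scores would go to the consensus discussion."""
    return [notice_id
            for notice_id, ratings in independent_ratings.items()
            if disputed(ratings)]


# Hypothetical ratings from three raters for three made-up notices.
example = {
    "notice-A": (NoticeRating.REASON_STATED,) * 3,
    "notice-B": (NoticeRating.NO_REASON,
                 NoticeRating.REASON_INFERRED,
                 NoticeRating.NO_REASON),
    "notice-C": (NoticeRating.REASON_AND_SCOPE,
                 NoticeRating.REASON_STATED,
                 NoticeRating.REASON_AND_SCOPE),
}
print(notices_needing_discussion(example))
```

Only notices flagged this way would need group review; the consensus score itself comes from discussion and is not something the code can compute.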

Selection of Journals for Assessment
A selection of high and low impact journals was chosen, in a variety of fields, to provide as broad a selection of retraction notices as was practical. The highest impact journals in five scientific fields (biological chemistry, cellular, medical, multidisciplinary, and physics) were chosen from those listed in Web of Knowledge, based on their Eigenfactor™, five-year impact factor, and Article Influence™ score as of 7 March 2012. At least one low impact journal was matched to the high impact journal in a scientific field to ensure multiple retraction notices from both high and low impact journals in that field (Table 1). The Eigenfactor™, five-year impact factor, and Article Influence™ score of the low impact journals chosen within a field had to be less than half of those for the high impact journal. There was a significant difference between the higher and lower impact journals when comparing five-year impact factor (p < 0.05), Eigenfactor™ (p < 0.05), and Article Influence™ score (p < 0.05). For Journal of the Chinese Medical Association, the Article Influence™ score and five-year impact factor were not provided, so only the Eigenfactor™ was used. PubMed and Web of Knowledge were used to find every retraction from each of the resultant 15 journals, yielding 171 retractions from 1960 to February 2012. The year 1960 was chosen because it was the earliest year of publication for which retractions were available, with the earliest retraction notice dating from 1968. In Web of Knowledge, this information was found by searching the journal name under Publication title and retraction in topic, and then refining the search by limiting it to retractions. In PubMed, the corrected/retracted article limit was selected under the limits for a search, and a journal was added under add journal.
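The "less than half" matching criterion can be written as a simple check. The sketch below is an assumption-laden illustration: the class `JournalMetrics`, the function `qualifies_as_low_impact` and all numeric values are hypothetical, and skipping unavailable metrics mirrors how the text describes handling the one journal with missing scores.

```python
from dataclasses import dataclass
from typing import Optional

# The three metrics named in the text (attribute names are illustrative).
METRICS = ("eigenfactor", "five_year_impact_factor", "article_influence")


@dataclass
class JournalMetrics:
    name: str
    eigenfactor: Optional[float] = None
    five_year_impact_factor: Optional[float] = None
    article_influence: Optional[float] = None


def qualifies_as_low_impact(candidate, high_impact):
    """True if every metric available for both journals is less than half of
    the high impact journal's value. Metrics that are None are skipped."""
    compared = 0
    for metric in METRICS:
        low = getattr(candidate, metric)
        high = getattr(high_impact, metric)
        if low is None or high is None:
            continue  # metric not provided, so it cannot be compared
        compared += 1
        if not low < high / 2:
            return False
    return compared > 0  # require at least one usable metric


# Made-up numbers, for illustration only.
high = JournalMetrics("High Impact Journal", 1.0, 30.0, 15.0)
low_full = JournalMetrics("Low Impact Journal", 0.1, 3.0, 1.0)
low_partial = JournalMetrics("Journal With Missing Scores", eigenfactor=0.01)
too_close = JournalMetrics("Mid Impact Journal", 0.6, 3.0, 1.0)

print(qualifies_as_low_impact(low_full, high))
print(qualifies_as_low_impact(low_partial, high))
print(qualifies_as_low_impact(too_close, high))
```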

General Analysis of the Retractions
The following information was collected for each retraction: journal, issue, volume, page, author, title, times cited, date the article was published and retracted, reason for retraction, link to retraction notice, and which entity retracted the article.
The reason for retraction (fraud or error) was determined based solely on the information provided in the retraction notice. Following the conservative example of Steen [6], plagiarism was included within the definition of error. Other examples of error included experimental mistakes, spurious authorships, misinterpretation of results or the inability to reproduce them, plagiarism and self-plagiarism. Fraud was defined as either data fabrication or data falsification [6], as defined by the Office of Research Integrity [17].
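The classification described above amounts to a small lookup. The following sketch is illustrative only: the cause labels, the function `classify_reason`, and the choice to report fraud when a notice mentions both fraud and error are assumptions, not details given in the text.

```python
# Cause labels are hypothetical shorthand for the categories named in the text.
ERROR_CAUSES = {
    "experimental mistake",
    "spurious authorship",
    "misinterpretation of results",
    "irreproducible results",
    "plagiarism",
    "self-plagiarism",
}
FRAUD_CAUSES = {"data fabrication", "data falsification"}


def classify_reason(stated_causes):
    """Map the causes stated in a notice to 'fraud', 'error' or 'unclear'.

    Assumption: if a notice states both fraud and error, it is counted as fraud.
    """
    causes = {cause.strip().lower() for cause in stated_causes}
    if causes & FRAUD_CAUSES:
        return "fraud"
    if causes & ERROR_CAUSES:
        return "error"
    return "unclear"  # nothing recognisable was stated in the notice


print(classify_reason(["Plagiarism"]))
print(classify_reason(["data falsification", "experimental mistake"]))
print(classify_reason([]))
```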

Statistical Analyses
When analyzing retraction notices that retracted multiple articles, each notice was counted once. All data were analyzed using Excel and JMP. Statistical analyses included: t-tests, to show any statistical difference between two populations; ANOVA, to indicate any statistical difference within a group; and Tukey-Kramer tests, in conjunction with ANOVA, to determine if means were significantly different from one another.
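For readers unfamiliar with ANOVA, the sketch below computes the one-way F statistic from first principles using only the Python standard library. It is not the JMP analysis used in the study: the function name `one_way_anova_f` and the ratings are hypothetical, and no p-value or Tukey-Kramer comparison is computed.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    F is the ratio of between-group to within-group mean squares.
    """
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up notice ratings for three hypothetical journals.
ratings_by_journal = [[0, 0, 1], [2, 2, 3], [2, 3, 3]]
f_stat, df_b, df_w = one_way_anova_f(ratings_by_journal)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F indicates that the group means differ by more than the within-group scatter would suggest; deciding which pairs of means differ is the job of a follow-up test such as Tukey-Kramer.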

Analysis of the Rubric
Figure 1 compares the three scientists who rated every retraction notice, an average of their ratings, and a consensus rating of each retraction after discussion by the group. The retraction notices received significantly lower ratings from Researcher 1 when compared with the consensus and Researcher 3 (p < 0.05), suggesting that application of the rubric may vary from person to person. However, the consensus was not significantly different from the average, and the ratings of Researcher 3 (the blind comparison) varied only from Researcher 1 (p < 0.05). These last two findings indicate that the rubric is reasonably robust.

Figure 1. The retraction notice ratings by each scientist, an average of their ratings ("All researchers"), and a consensus rating of each notice after group discussion ("Decided Upon"). Retraction notice ratings from Researcher 1 were significantly lower than consensus and Researcher 3 (p < 0.05), while there was no difference between consensus and average. Ratings from Researcher 3, the blind comparison, were significantly different from Researcher 1 (p < 0.05). Blue bars represent the standard error in the mean.

Analysis of the Retraction Notices
The number of retraction notices rated in each category of the rubric is shown in Figure 2. There are no retraction notices rated above zero where the reason for retraction was unclear, because the rubric states that a notice that does not clearly state the reason for retraction should receive a low rating. The fewest retraction notices received a one, as few notices at the lower end of the scale could have the reason for retraction inferred. Because the rubric was created with the intention of spacing the severity of the infractions into categories, an uneven distribution of notice ratings was not surprising. Articles retracted due to error or fraud had significantly higher notice ratings (p < 0.05) than those whose reason for retraction was unclear (Figure 2). However, notices that dealt with fraud were not statistically different from those addressing error.
Figure 3 represents the percentage of retractions due to error, fraud or that could not be determined for each journal, along with the percentages for all retraction notices in this study. While there is considerable variation between journals, error is the major reason for retraction (62% for all retraction notices), followed by the reason being unclear (22%) and fraud (16%). While these findings compare favorably with the earlier work of Nath and co-authors [7], they are in marked contrast to the recent findings of Fang, Steen and Casadevall, who found only 21% of the articles they studied were retracted for error, with 67% retracted for misconduct [12]. There are two main reasons for this difference. Firstly, their misconduct category consisted of fraud (43%), plagiarism (10%) and duplicate publication (14%) [12], whereas the present study classified plagiarism and self-plagiarism as error. Secondly, the present study determined the reason for a retraction based solely on the retraction notice, while Fang and co-authors used additional resources to make their determinations [12]. While this is certainly a limitation of the current study, and while Fang and co-authors obtained a more accurate picture of the reasons articles are retracted [12], the authors of the current study made this decision to most thoroughly assess the proposed rubric and also because a retraction notice should (according to the COPE retraction guidelines) "state the reason(s) for retraction (to distinguish misconduct from honest error)" [3].
As can be seen in Figure 4, there was considerable variation in the quality of retraction notices both within and between journals. Science and Cell were the only journals whose mean retraction notice ratings were significantly above the grand mean (p < 0.05), while both Annals of the New York Academy of Sciences and Journal of Biological Chemistry were significantly below (p < 0.05). When comparing high impact journals, Science, Cell, New England Journal of Medicine and Physical Review Letters had significantly higher ratings than Journal of Biological Chemistry (p < 0.05). Annals of the New York Academy of Sciences was the only journal to differ within the low impact journals (p < 0.05), though the general lack of variation between these journals may well be due to there being fewer retractions to compare.
Retraction notices were categorized by whether the article was retracted by the journal (17%) or the author (80%). This differs from a previous study [11], with 17% more author retractions. Retraction notices written by the journal had significantly higher ratings than those by the author (p < 0.05), which is interesting given a recent study suggesting that the scientific community is more forgiving of authors who self-report errors in their articles.
When examining retraction notices between the scientific fields represented in the current study (Figure 5), biological chemistry rated significantly lower than the other fields: it was statistically below the grand mean, while multidisciplinary science was statistically above it (p < 0.05). These differences can both be attributed to Journal of Biological Chemistry, which contributed two-thirds of retraction notices in the field of biological chemistry, the majority of which were unclear. When these were removed from the dataset, the mean rating of the field was significantly higher (p < 0.05), and no field was statistically different from the grand mean.
There was no significant difference between the mean retraction notice ratings of the higher (1.7 ± 1.2) and lower impact journals (1.9 ± 0.8), though the higher impact journals exhibited more variation. This is probably due to there being fewer retractions from lower impact journals. The higher impact journals had more notices which rated a zero (many from Journal of Biological Chemistry), while the lower impact journals had fewer which rated a three (Figure 6).
When comparing retraction notices from each rubric category in terms of citations per year of the retracted article, there is a significant increase in citations for the original articles with retraction notices with a rating of 3 (p < 0.05) compared to those rating 0 or 2 (Figure 7). A rating of 1 is significantly different from 0. These results can be interpreted in numerous ways: greater care may be taken with retraction notices of prominent articles; researchers may cite articles from high-impact journals regardless of retraction; researchers might be unaware of the retraction; though retracted, some unaffected parts of an article might still be used and cited; or the result may purely relate to journal impact, as the majority of citations occur before a retraction [15]. Regardless, citations should indicate the number of people who have read the article and are impacted by a retraction.

Figure 6. Percentage of retraction notices rated 0, 1, 2 or 3. For lower impact journals, 10.5% rated 0, 2.6% rated 1, 71.1% rated 2 and 15.8% rated 3. In higher impact journals, 28.7% rated 0, 4.6% rated 1, 34.3% rated 2 and 32.4% rated 3.
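As a quick consistency check, the Figure 6 percentages can be converted back into a mean and spread for each journal group and compared with the values quoted for higher (1.7 ± 1.2) and lower (1.9 ± 0.8) impact journals. The sketch below assumes that the ± values are standard deviations and uses the population formula; neither detail is stated in the text, and the helper name `mean_and_sd` is hypothetical.

```python
from math import sqrt

# Percentage of notices at each rating (0 to 3), taken from the Figure 6 caption.
LOWER_IMPACT = {0: 10.5, 1: 2.6, 2: 71.1, 3: 15.8}
HIGHER_IMPACT = {0: 28.7, 1: 4.6, 2: 34.3, 3: 32.4}


def mean_and_sd(percent_by_rating):
    """Mean and population standard deviation of a rating distribution.

    Dividing by the total rather than by 100 absorbs any rounding in the
    published percentages.
    """
    total = sum(percent_by_rating.values())
    mean = sum(r * p for r, p in percent_by_rating.items()) / total
    mean_sq = sum(r * r * p for r, p in percent_by_rating.items()) / total
    return mean, sqrt(mean_sq - mean ** 2)


for label, dist in (("lower", LOWER_IMPACT), ("higher", HIGHER_IMPACT)):
    m, sd = mean_and_sd(dist)
    print(f"{label} impact: mean {m:.1f}, sd {sd:.1f}")
```

Under these assumptions the recovered values round to the figures quoted for the two groups, which suggests the percentages and the summary statistics describe the same set of consensus ratings.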
Finally, retractions were grouped into five-year increments and analyzed for any significant differences in retraction notice ratings over time (Figure 8). The 2006-2011 period was further broken down to separate out notices published after the release of the COPE retraction guidelines [3]. Interestingly, this time period (2009-2011) contained notices that rated significantly lower than the preceding time periods (p < 0.05), excluding 1968-1990 (probably due to the limited number of notices from that earlier period). This result is not surprising, given the limited time the guidelines had to impact notices (about a year) and previous findings that these guidelines were not widely known to journal editors [5]. A valuable future study would be to investigate the influence of these guidelines on the quality of retraction notices.
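The time binning can be sketched as follows. Only the 1968-1990, 2006-2008 and 2009-2011 periods are named in the text; the three intermediate five-year boundaries used here are an assumption, and the function names and example notices are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# First, fifth and sixth periods are named in the text; the three in between
# are ASSUMED five-year bins and may not match the study's exact boundaries.
PERIODS = [
    (1968, 1990),
    (1991, 1995),
    (1996, 2000),
    (2001, 2005),
    (2006, 2008),  # before the COPE retraction guidelines
    (2009, 2011),  # after the COPE retraction guidelines
]


def period_for(year):
    """Return the 'start-end' label of the bin containing year, or None."""
    for start, end in PERIODS:
        if start <= year <= end:
            return f"{start}-{end}"
    return None


def mean_rating_by_period(notices):
    """Average the ratings of (year, rating) pairs within each period."""
    grouped = defaultdict(list)
    for year, rating in notices:
        label = period_for(year)
        if label is not None:
            grouped[label].append(rating)
    return {label: mean(values) for label, values in grouped.items()}


# Made-up (retraction year, consensus rating) pairs.
sample = [(1985, 2), (2007, 3), (2007, 2), (2010, 0), (2010, 2)]
print(mean_rating_by_period(sample))
```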

Conclusions
Retraction notices are an important tool in correcting the scientific record when an article is retracted, so they must be as accurate and informative as possible. The current study is an initial attempt to create a rubric for assessing the quality of retraction notices. Results suggest the rubric has promise. A preliminary analysis of 171 notices from 15 journals found considerable variation in notice quality within and between journals. The journal, its impact and field, and the author of the notice all play a role in this variation.
The original rubric was refined during initial testing to eliminate ambiguities. For example, if the notice was not clear as to the reason for retraction, but a link in the notice allowed that determination, it was decided that information should also be assessed. For plagiarism, the notice must specify (self-)plagiarism to receive a 2 and must state the information that was copied to receive a 3. Finally, if it was not apparent how much of an article was affected because experiments could not be duplicated, all conclusions were considered null. However, if this was not stated, the notice could not receive a 3.
The variance in retraction notices makes producing a perfect rubric difficult, and the appropriate rating is not clear in every situation. With experience, the rubric shows potential for helping to craft and evaluate retraction notices.
The journals studied here were a mixture of high and low impact in several fields. While done to provide a diverse sample set, it may also have produced weakened correlations. The higher impact journals had more retractions than the lower impact journals. Increased publicity in higher impact journals could motivate scientists to commit fraud in order to gain attention, while the level of scrutiny from the scientific community could be more intense for these journals [18].
Most journals were not consistent in the quality of their retraction notices. Journal of Biological Chemistry was an exception, but its notices were consistent in their lack of information. Journal of Biological Chemistry frequently publishes notices stating: "This article has been retracted by the Publisher." Journal of Biological Chemistry also contributed to the lack of a difference in notice ratings between higher and lower impact journals. Having said this, the publisher of Journal of Biological Chemistry has since hired a manager of publication ethics to help oversee the writing of retraction notices [19], and initial results are promising [20]. Though beyond the scope of this study, it would be interesting to compare notice ratings before and after this appointment.
Future research could include looking at changes in notice quality over time for specific journals, comparing the notice quality for different types of error and fraud, and comparing the current rubric with a direct assessment using the COPE guidelines of the same retraction notices.
To our knowledge, this is the first published study that attempts to quantify retraction notice quality. With cooperation from journal editors, more investigation into the retraction policies for each journal could help standardize and improve the retraction process. A 2012 opinion article in The Scientist by the authors of Retraction Watch called for the creation of a "transparency index" to rate how transparent journals are about the accuracy of articles [21]. They suggest it be formed using numerous criteria, including the journal's use of preventative measures like plagiarism-checking software and "whether corrections and retraction notices are as clear as possible, conforming to accepted publishing ethics guidelines such as those from COPE or the International Committee of Medical Journal Editors." The current attempt to quantify the quality of retraction notices could help in creating such an index. Since retractions are not common, many lower impact journals, which often have smaller readerships and fewer resources to address suspect articles, go through decades without a retraction. This minimizes editor experience with the retraction process, making it vital that journals define a standard policy for retractions and for writing their notices. When an article does need retraction, the focus should be to inform the readers of the problem with as much information as possible. In the process of retraction, transparency should be of the utmost importance.

Figure 2. Retraction notices rated in each category of the rubric by consensus. The notices are further categorized by the reason for the retraction (error, fraud or unclear) as determined from the retraction notice. There was no statistically significant difference between notice ratings for erroneous and fraudulent articles, but both rated significantly higher than those where the reason was unclear (p < 0.05). No unclear retractions rated above 0.

Figure 3. Percentage of retractions due to error, fraud and unclear reasons, for each journal studied and for the total sample set.Error was the major reason for retraction (62%), followed by the reason being unclear (22%) and fraud (16%).

Figure 4. Retraction notice ratings, as an average of the two lead-author ratings, for each journal. A Tukey-Kramer test showed that Journal of Biological Chemistry and Annals of the New York Academy of Sciences had mean ratings below the grand mean (p < 0.05), while Science and Cell rated above (p < 0.05). Of the high impact journals, Science, Cell, New England Journal of Medicine and Biochemical Biophysical Research Communications had significantly higher rated notices than Journal of Biological Chemistry (p < 0.05). Blue bars represent the standard error in the mean. Five journals either had insufficient retractions, or no variation in the ratings for their notices, so blue bars are not visible.

Figure 5. Consensus retraction notice ratings for each scientific field represented in this study. A Tukey-Kramer test showed that biological chemistry was significantly lower and multidisciplinary science significantly higher than the grand mean (p < 0.05). However, when Journal of Biological Chemistry ratings were removed, there were no statistically significant differences between fields. There was a significant difference in overall rating for biological chemistry with and without Journal of Biological Chemistry (p < 0.05). Blue bars represent the standard error in the mean.

Figure 7. Consensus retraction notice rating compared to number of article citations per year (date of publication to February 2011). There is a significant difference for notices with a rating of 3 (p < 0.05) compared to notice ratings of 0 and 2. A rating of 1 is significantly different from 0 (p < 0.05). Blue bars represent the standard error in the mean.

Figure 8. Comparing retraction notice ratings with their date of publication. Notices were binned into 5 year increments, except 1968-1990 (extended to include sufficient notices for analysis) and 2006-2011, which was separated into before (2006-2008) and after (2009-2011) the release of the Committee on Publication Ethics (COPE) retraction guidelines [3]. The 2009-2011 increment contained significantly lower notice ratings than all but the 1968-1990 increment (p < 0.05). No other statistical significance was observed. Blue bars represent the standard error in the mean.

Table 1. Descriptive statistics for the studied journals. Statistics include 5-year Impact factor, Eigen Factor, Article Influence score (as of 7 March 2012), and the percentage of articles retracted out of all the articles each journal has published as of February 2012.