Article
Peer-Review Record

Can Retracted Social Science Articles Be Distinguished from Non-Retracted Articles by Some of the Same Authors, Using Benford’s Law or Other Statistical Methods?

by Walter R. Schumm 1,*, Duane W. Crawford 1, Lorenza Lockett 2, Asma bin Ateeq 3 and Abdullah AlRashed 4
Reviewer 1:
Reviewer 2: Anonymous
Publications 2023, 11(1), 14; https://doi.org/10.3390/publications11010014
Submission received: 28 November 2022 / Revised: 3 February 2023 / Accepted: 24 February 2023 / Published: 3 March 2023

Round 1

Reviewer 1 Report

Dear Authors

First of all, I would like to point out that I find the subject of your study interesting and its approach novel.

After reviewing your manuscript I have some comments:

1.- Please could you explain how the retracted papers were selected?

Which search engine did you use to find the retracted articles? Did you use a bibliographic manager?

Furthermore, regarding the reasons for retraction: since your main objective is to observe whether or not there is a statistical association, did you use Retraction Watch to find the reasons for retraction, and from there randomly select the documents analyzed?

Or did you select the included studies for convenience? If so, what was the selection criterion?

 

2.- I noticed that you have included Google citation counts in Table 1. However, that is not an accepted indicator of scientific quality. You should have considered including citation counts from, for example, Scopus or WoS.

Google Scholar presents the problem that it continues to provide citations to a retracted article even after it has been retracted. You would need to use both a search engine and a suitable scientific database.

3.- Subsection 3.24 is more associated with the Methods section than with the Results section.

4.- Lines 185-187: if you have used a literal quotation, please format it correctly.

 

Author Response

Here are our comments for both reviewers and the editors:

Cover Letter for Revision to Schumm et al.’s “Can Retracted Social Science Articles Be Distinguished from Non-Retracted Articles by Some of the Same Authors, Using Benford’s Law or Other Statistical Methods?”

 

Our Response to Reviewer #1

  1. The selection process is described on lines 163-172 in the revised version. Pickett proposed the questionable articles; that was our source for them. The control articles were selected by using Google Scholar to find articles on criminology by co-authors of Stewart, but not Stewart himself. We did add four more control articles, doubling the original number, which did change our results for the control group.

 

We did check the Retraction Watch website; the six retracted Stewart articles are listed there, and no others. We selected all the documents listed there, but mainly because of Pickett.

 

We wanted to select control articles published since 2000 that dealt with criminology and were authored by co-authors on the retracted Stewart articles.

 

  2. When I checked Scopus and WoS, I had difficulty pulling up citation counts; maybe my internet connection wasn’t good enough. However, it was not our objective to see if citation rates fell off after articles were retracted, but to show to what extent the articles had been cited, for better or worse. Getting into the weeds on the quality of citations wasn’t the main priority of this report.
  3. This point seemed more appropriate to Section 3.21, but we tried to keep or move material so that it would fit more definitively into the introduction, review of literature, methods, results, limitations, or discussion sections.
  4. This is now at line 248. The material comes from two tables in that article, and we hope that citing both tables and their page numbers is sufficient for an accurate quotation. We are not sure how else to quote part of a table.

Thank you for your helpful comments!

Our Response to Reviewer #2

  1. We divided the background into an introduction and a review of literature, keeping the introduction short, as recommended.  We stated the general hypothesis in the introduction.  We checked each section and moved or eliminated content that did not fit. 
  2. To increase the sample size, we doubled the number of control articles. We wanted to add a fifth new control article, but its contents would not allow us to test for enough anomalies or the lack thereof. The idea proposed for obtaining 20 papers and applying the same anomaly tests is interesting, but beyond the scope of this paper, which is focused on the types of anomalies claimed by Pickett, only some of which might apply to retracted research in general. In future research we will expand the sample size to include the whole body of Stewart’s research and perhaps use a larger control group. Beyond that, we may attempt to apply what has been learned to a wider set of social science authors.
  3. Lines 16-17. We deleted this confusing statement, although we did note later that some of the useful programs (GRIM, SPRITE) work best for samples under 200, which is what we were trying to get at.
  4. Line 27: We made this change at line 42 and again later in the Methods section.
  5. Lines 42-43. This sentence was deleted.
  6. Lines 35-92. We split the material into two sections – an introduction and a review of literature, keeping the introduction shorter.  Some material was moved to the discussion section as noted.
  7. Lines 93-126. We now mention the basic hypothesis at lines 57-59 and at 147-151.
  8. It is true that a method that works for Stewart may not work for others. We try to clarify that at lines 484-490 as a limitation of the research, but we see this as a first step for later work. Even if the anomalies don’t generalize (although the binary variables and Benford’s Law should at least), the ideas of developing a specific type of control group and comparing the two groups with parametric and nonparametric statistics provide important guidance for future researchers.
  9. Tables 1 and 2. We reorganized the tables in sequence and content to make for smoother reading and a better logical order.  We also placed them in more appropriate locations.
  10. Lines 149-158 and lines 159-212. We moved content or deleted it in order to achieve better consistency of content for each section.
  11. Lines 281-309. We dropped hypothesis 4 entirely and reduced the number of hypotheses to three. 
  12. Lines 310-468. We reorganized both sections to reduce overlap.
  13. Lines 469-470. We deleted the objectionable material entirely.
  14. Lines 473-474. We discussed this issue a bit more at lines 454-458. It could go either way, but the effects probably either didn’t exist very strongly or they canceled each other out.
  15. Lines 507-515. This material was deleted.
  16. Lines 525-530. We moved this to lines 446-451.
  17. Appendix A. We redid this analysis because of the added control group articles, but we kept the results as an appendix because they are a relatively minor factor.
  18. Thank you so much for your many comments; they were very helpful for improving the paper!

 

To the Editors:

  1. We deleted one reference and added several to improve the relevance of the content.
  2. We set up the document to track changes, but it may not have worked. In any event, there are so many changes to the paper, based on our reduction in hypotheses (from 8 to 3), the addition of new control group articles, and a focus on anomalies in regression tables rather than other material, that most of the material is either new or the references have new numbers.
  3. Our responses to the referees are presented above.
  4. We kept the material in Appendix A, but we don’t feel strongly about moving it if you wish.
  5. We apologize for the delay past ten days. Due to visiting relatives, I did not see the request for major changes until after Christmas, and given the holidays it took extra time to make all the recommended changes, including finding more control articles and rerunning most of the analyses with the new control articles added.
  6. Thank you for re-considering this paper, with best wishes for your New Year!

Reviewer 2 Report

This paper does not abide by the generally accepted rules of how to write a scientific paper.  The Introduction should be short, merely posing a question and clarifying why that question is interesting or important.  The Intro is also the place to state a clear hypothesis.  The Methods should state clearly what was done, not what was found or what it means.  The Results should collate all findings in the same place, without any interpretation at all, since some readers may value the data but disagree with the interpretation given them.  The Discussion should say what the authors believe it all means, without presenting new data or getting distracted into discussing topics for which no data are shown.  This is as stylized as haiku for a reason; without this style, it all seems like an opinion piece.

The greatest flaw of this paper is the tiny sample size.  I know this arises because of the effort to analyze papers by the same set of authors but, at the end of that effort, we don’t know if the rules developed here have any generality.  Rather than focus on a few people, the authors should increase the sample size by several-fold, using these data as a training set, then analyzing a larger test set.  A quick search of the Retraction Watch database (http://retractiondatabase.org/RetractionSearch.aspx) shows that there have been more than 50 retracted papers in the category of “social science OR (SOC) Sociology” alone.  I appreciate that this is quite a lot of additional work, but the authors should choose their best indicators and apply them to at least 20 new papers in a test set.

Line 16-17:  Please clarify, if possible, why “some of those approaches do not work well for results obtained from larger samples”, as this statement is counterintuitive.  I would have thought that large N sample sizes would be far easier to analyze.  It is perhaps best to delete this sentence from the Abstract, as it seems unlikely that clarification could be done in a few words.  But this idea certainly bears some discussion later.

Line 27:  The phrase “Deviations from Benford’s Law” should be replaced with a more descriptive phrase such as “deviations from the predicted distribution of first digits”.  In general, the less jargon the better, especially in the Abstract.
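For context, Benford’s Law predicts that the first significant digit d of many naturally occurring numbers appears with probability log10(1 + 1/d), so 1 leads about 30.1% of the time and 9 only about 4.6%. Below is a minimal sketch of such a first-digit check, using a simple chi-square goodness-of-fit test; this is an illustration only, not necessarily the authors’ exact procedure.

```python
import math
from collections import Counter

from scipy.stats import chisquare

def benford_first_digit_test(values):
    """Chi-square test of observed first significant digits against
    the frequencies predicted by Benford's Law: P(d) = log10(1 + 1/d)."""
    digits = []
    for v in values:
        if v == 0:
            continue                            # zero has no first significant digit
        digits.append(int(f"{abs(v):.6e}"[0]))  # 0.00523 -> "5.230000e-03" -> 5
    counts = Counter(digits)
    observed = [counts[d] for d in range(1, 10)]
    expected = [len(digits) * math.log10(1 + 1 / d) for d in range(1, 10)]
    return chisquare(observed, f_exp=expected)

# Example: test the coefficients from a hypothetical regression table.
stat, p = benford_first_digit_test([0.412, 1.97, 0.023, 3.10, 0.58, 1.20, 2.41])
print(f"chi-square = {stat:.2f}, p = {p:.3f}")
```

In practice such a test would be run over many digits pooled from an article’s tables, since a handful of values gives the chi-square test little power.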

Line 42-43:  I don’t see the relevance of the sentence that reads, “The costs of detecting, confirming, and defending retracted research should be added to the loss in value of grant funding.”

Lines 35-92:  The Introduction is, to my mind, too long.  The material in lines 50-65 may be all that is necessary, since an introduction is meant to merely pose a question and show why that question is interesting.  Lines 66-92 seem more appropriate for the Discussion.

Lines 93-126:  The section on Objectives is not appropriate, as it should be part of the Introduction.  I also strongly believe it is wrong to have an objective, then try to “prove” it.  A good scientist should have a hypothesis, then earnestly try to disprove it.  Only if you fail to disprove it, can you provisionally accept it, but you have not “proven” it.  This may sound like a semantic difference, but it is not; trying earnestly to falsify a hypothesis lends a rigor that cannot be achieved in any other way.  The clearest sentence stating objective is, “The general objective was to answer the research question: do the two groups of articles (retracted vs. controls) differ in statistically discernable ways despite the small sample sizes?”  This can become, “Our hypothesis is that retracted articles differ from control articles in statistically significant ways.”  Half as many words and greater clarity!

Methods:  I think it may be a problem that all retracted articles had Stewart as a co-author.  One could imagine that cheaters have a particular style, so a method worked out to identify fraud by Stewart may not work well for cheaters other than Stewart.  This is a problem of the tiny sample size, but it may also suggest that the authors have an axe to grind with Stewart.

Tables 1 and 2:  Both of these tables are results and should not be in the Methods section. In addition, Table 2 presents results before there is any explanation of how they were obtained, and no explanation of what the various acronyms mean.

Lines 149-158:  This section conflates methods with both results and discussion.  There should be no interpretation in the Methods section.

Lines 159-212:  All these sections conflate methods with both results and discussion.  I find it confusing to thoroughly mix sections like this, and it makes it seem that there’s a lot of circular reasoning.

Lines 281-309:  You cannot possibly test 8 hypotheses in a single paper with any kind of rigor!

Lines 310-468:  This is a confusing morass of Results and Discussion.  A result is a finding, free of any interpretation; all interpretation is done in the Discussion.  The logic is that readers can accept your results without necessarily accepting your interpretation of them.

Lines 469-470:  The first paragraph of the Discussion should summarize what has been found.  Instead, we read, “Articles based on taxpayer funding were no more or less likely to have been retracted.”  This is unrelated to any of the stated hypotheses and has no place here, except perhaps late in the Discussion, as an aside.  The position of prominence given to it is distracting.

Lines 473-474:  The authors state that, “Having more authors for an article didn’t seem to prevent fraud/retraction.”  In fact, many authors on a paper is associated with a greater risk of retraction, as responsibility is diffused across a broad array of people, who may not take that responsibility seriously.  Fraudulent authors will apparently seek careless but prominent co-authors, to increase the likelihood of journal acceptance.

Lines 507-515:  This material has no place here.

Lines 525-530:  Here is what should be the opening paragraph of the Discussion.

Appendix A:  I think this should become part of the paper.

Author Response

Our responses to both reviewers and the editors are presented above, under the Reviewer 1 report.

Round 2

Reviewer 2 Report

The website for this journal states the following:

Article: These are original research manuscripts. The work should report scientifically sound experiments and provide a substantial amount of new information. The article should include the most recent and relevant references in the field. The structure should include an Abstract, Keywords, Introduction, Materials and Methods, Results, Discussion, and Conclusions (optional) sections, with a suggested minimum word count of 4000 words. Please refer to the journal webpages for specific instructions and templates.

That format has not been used.  Instead, a major section has been added called "Literature review", directly after the Introduction.  This is simply a way to respond to my earlier comments with the least possible effort.  All material in the lit review should either fit into the Discussion or be deleted.  Similarly, the section on Research Questions and Hypotheses should be deleted, with the relevant material stated as a single hypothesis in the Introduction.

I note that the current Introduction is probably all that is necessary, in terms of posing the question.  However, I would push back strongly against the statement that "validation becomes much more difficult with larger samples."  There are statistical methods that are only appropriate for large samples (Carlisle, Anaesthesia 2017; 72(8) 944-952), and I cannot see how those methods would be applied to small samples.

I don't understand why Table 5 shows three significant figures, as this does not seem appropriate. Two will be adequate.

I do like the first paragraph of the Discussion, as this is a good summary.  However, the Discussion is ~2,500 words long, which seems long-winded.

I didn't spend much time on this review as it is clear the authors didn't spend much time on their revisions.

Author Response

The material in the introduction and literature review was combined and shortened. We put the general and specific hypotheses at the end of the introduction. We deleted the section on Research Questions and Hypotheses, and the issue with sample sizes was deleted from the introduction, as you suggested.

We did try to clarify that issue in the discussion, though. It is not a matter of statistics but of the web-based calculators GRIM and SPRITE, which cannot handle sample sizes over about 200, as they attempt to reconstruct entire data sets from only the mean, SD, minimum, maximum, and N. If no reconstructable data set matches the reported statistics, that may hint that the reported data are in error. But GRIM and SPRITE cannot be used with very large samples.

We shifted to using two decimals in Table 5. We also cut material from the discussion section.

Thank you for both of your reviews, which were extremely helpful. We will be glad to attempt further improvements, as needed. (We sent in our comments earlier, but they seem to have disappeared; if you received them, this is our attempt to repeat them after the first attempt did not seem to work.)
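To illustrate why such reconstruction tools lose power on large samples: the GRIM test, for example, asks whether a reported mean is even arithmetically possible given integer-valued responses and the stated N, and the set of achievable means becomes dense as N grows. A minimal sketch of a GRIM-style check follows, assuming integer data and Python’s default rounding; real implementations treat rounding conventions (half-up versus half-even) more carefully.

```python
import math

def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
    """GRIM-style check: can any sum of n integer responses produce
    the reported mean after rounding to `decimals` places?

    With two reported decimals the check is only informative for
    roughly n < 100; beyond that nearly every mean is achievable,
    which is why such tools lose power on large samples.
    """
    half = 0.5 * 10 ** -decimals
    lo = math.floor((reported_mean - half) * n)   # smallest candidate total
    hi = math.ceil((reported_mean + half) * n)    # largest candidate total
    return any(round(total / n, decimals) == round(reported_mean, decimals)
               for total in range(lo, hi + 1))

# Example: a mean of 3.57 from 28 integer responses is possible (100/28),
# but a mean of 3.19 from 17 integer responses is not.
print(grim_consistent(3.57, 28))   # True
print(grim_consistent(3.19, 17))   # False
```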

Round 3

Reviewer 2 Report

The Introduction is still too long for my taste, but it is better focused and has become less distracting.  The Hypotheses, in particular, are much improved.

But this paper is just too long for the content!  There are nearly 8 pages of Methods, and the methods are simple.  What we have here is a method meant to reveal fraud in a small selection of papers by a single author, and we already know fraud is a problem in these papers. 

I’d suggest a cut in length of about a third to a half, to align the length with the content.

Since a method is reported here, the sensible thing to do is to use the new method on another, larger sample of fraudulent papers, to see if the method works.  Otherwise, it risks seeming like the authors have a vendetta against Dr. Stewart.

Line 273-274: This is a result in the Methods section, and it does not make apparent sense.  Methods are what you did, Results are what you found.  It’s really that simple.

All Tables belong in the Results, not the Methods! 

There is a serious conflation between sections, with results and even discussion bleeding into the Methods.  Let me illustrate the problem with a particularly egregious example from lines 280-287:    

The original text reads, “Hand calculation was measured by dividing unstandardized regression coefficients by their standard errors [B/SE] and attending to whether the reported t-value was replicated exactly to two or three decimals. Appendix A illustrates the difference between computer-generated results and hand calculation; 20% of the computer-generated t-values differed from the hand calculated values (an earlier analysis had found 80%). Why would someone report hand calculated values when computer generated values are provided automatically with a computer printout? The answer is not clear. Brown and Heathers [39] have provided more details on this issue of hand calculation.” 

The sentence, “20% of the computer-generated t-values differed from the hand calculated values (an earlier analysis had found 80%)” should be in the Results section. 

The sentences, “Why would someone report hand calculated values when computer generated values are provided automatically with a computer printout? The answer is not clear. Brown and Heathers [39] have provided more details on this issue of hand calculation,” should be in the Discussion.   

More than half of the passage above is inappropriately placed! The random admixture of methods, results, and discussion is still alarmingly common throughout the paper and MUST be corrected.
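To make the quoted check concrete, here is a minimal sketch of the B/SE consistency test described in that passage, assuming simple rounding to the reported precision; the authors’ exact rounding rules are not shown in this record.

```python
def t_matches_b_over_se(b: float, se: float, reported_t: float,
                        decimals: int = 2) -> bool:
    """Check whether a reported t-value matches B/SE at the reported
    precision. Statistical software computes t from the unrounded B and
    SE, so an exact match to the ratio of the *rounded* table values can
    suggest the t-value was hand calculated rather than machine generated."""
    return round(b / se, decimals) == round(reported_t, decimals)

# Example with hypothetical table values: B = 0.412, SE = 0.198.
print(t_matches_b_over_se(0.412, 0.198, 2.08))  # True  (0.412 / 0.198 = 2.0808)
print(t_matches_b_over_se(0.412, 0.198, 2.13))  # False
```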

In Table 1, “Case #” apparently does not correspond to the reference number in the reference section.  What does it correspond to?  Where is a reference for the articles under study?

Line 588: Restate Hypothesis 1.  Do the same for the other hypotheses in the Results.

Line 658-665:  These are results in the Discussion.

The Discussion is overly detailed, with very little synthesis, so that the reader can easily lose track of any broad truths among the minutiae.  We do not need a recapitulation of every finding; we need a sense of what it means.  What synthesis there is (Line 770-777) does not seem to relate to this paper.

The section on Conclusions may be all that is necessary for the Discussion.  I’d suggest deleting lines 647-777.

Author Response

Our response concerning changes made and recommendations accepted is attached.

Author Response File: Author Response.docx

Round 4

Reviewer 2 Report

The Abstract is enormously improved!  Much of the rest of the paper is also better focused, but it is still too long for the content.  The Discussion is fine, but the Intro could be shorter, and both the Methods and Results could benefit from a tighter focus.  If a difference is not statistically significant, it’s not a real difference and need not be discussed.  It is easy to get lost in a morass of non-significant and marginally significant findings, when there appear to be findings of real significance.

Line 42:  Solutions to what?  Some ambiguity here.

Line 63:  The style of citation (“As noted by Mistry, Grey, and Bolland [15],”) is literary, rather than scientific, and such name-dropping can become confusing.  I’d suggest simply, “As noted [15],”.  This citation is then followed by a 9-word quote.  Nine words are too few to worry about; just restate in your own words.  These problems occur elsewhere, though I won’t note it.

Table 1: Much improved, but sample size, funding, and Google cites are still Results.  It’s probably best to move this table to the Results section.  Also, I’m a bit confused.  Pickett is co-author on many of the papers, including the retracted ones, so he turned on his co-authors?  Wouldn’t it have been better to call them out before publication?  Why take his word for which articles should be retracted?

Lines 273-279: I find this description of methods confusing, but maybe I’m just sleepy after lunch.  But bear in mind that many of your readers may just have eaten lunch too.

Table 2:  Not sure what these values are; do they represent descriptive statistics for the retracted articles only, for the pooled articles, or for the difference between retracted and control?

Lines 571-578:  This is somewhat indecipherable.  The text states, “A binomial test for one or fewer zeroes out of 524 tests yields a probability of the same level, but z = -7.41.”.  I don’t know what this means, but it seems like a correction for multiple testing would be required.  In general, the statistics are presented in a way that will baffle the statistically naïve; clarity is more important than dogged detail.
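One plausible reading, offered here as an assumption since the record does not show the calculation: if the 524 tests concern terminal digits of reported statistics, which should be approximately uniform so that each has probability 0.1 of being zero, then a binomial test for one or fewer zeroes, approximated by a continuity-corrected normal, reproduces the reported z = -7.41.

```python
import math

from scipy.stats import binom

n, k, p = 524, 1, 0.10   # assumed: 524 terminal digits, <=1 zero, P(zero) = 0.1

exact_prob = binom.cdf(k, n, p)     # exact binomial P(X <= 1), about 6e-23

# Continuity-corrected normal approximation:
mu = n * p                          # expected zeroes = 52.4
sigma = math.sqrt(n * p * (1 - p))  # about 6.87
z = (k + 0.5 - mu) / sigma
print(f"z = {z:.2f}")               # z = -7.41
```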

Lines 580-582: The text states, “20% of the computer generated t-values differed from the hand calculated values (an earlier analysis had found 80%).”  What accounts for the difference in analysis of 20% vs 80%?  This is a large discrepancy to leave unexplained.

Table 6:  Don’t split between two pages because I suspect it may look that way in the final version too.

Lines 652-653:  I think it is improper to say that “All three of our hypotheses received support in terms of effect sizes, with some results not significant statistically.”  If a difference is not significant, it is not a difference, no matter the effect size.

The Discussion requires a paragraph more of synthesis.  Clearly, it is not possible to do these exhaustive tests on every paper submitted to a journal, or even every paper that looks questionable.  What would you suggest as to a quick analysis?  Can you determine the sensitivity and specificity of the chosen measures?
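As a pointer to what such an evaluation would involve: sensitivity is the fraction of truly retracted articles an indicator flags, and specificity is the fraction of control articles it correctly clears. A toy sketch with entirely hypothetical counts, sized to this study’s six retracted and eight control articles:

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical: an indicator flags 5 of 6 retracted articles (1 missed)
# and wrongly flags 1 of 8 control articles.
sens, spec = sensitivity_specificity(tp=5, fn=1, tn=7, fp=1)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")  # 0.83, 0.88
```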

Author Response

See attachment below.  Thank you for your very helpful reviews!

Author Response File: Author Response.docx
