Currently, most major publishers and many universities worldwide use software to measure the similarity between scientific papers or dissertations and the available literature. This is a common step in the current editing process. However, not all journals use such software, which contributes to overlap between papers in their databases. The specific criteria applied by these software tools are also unknown to authors.
In addition, to date, the scientific community has neither a unit of measurement nor a general standard for the allowable rate of similarity. For example, to the knowledge of the present authors, some editors reject research papers with a similarity percentage above 10%, others only above 35%, and the threshold applied to review papers is much higher. Similarity can also be measured at different steps of the review or editing process, for example when a paper is submitted or when it is accepted. Moreover, universities and even governmental science organizations in different countries may have different rules and limits regarding similarity, as Maurer et al. [1] have described with some examples. Consequently, some researchers may be accused of poor citation practice or even plagiarism simply because similarity standards are lacking and the software tool used to measure similarity is not known to them.
To illustrate our concerns, the two preceding paragraphs of this letter, in their original form, were checked with the free versions of three plagiarism checkers: PlagScan [2], CheckText [3] and Plagiarism Checker X [4]. The first reports that “no strongly similar text sources were found on the internet”. The second reports 4% plagiarized words, corresponding to common words such as “currently” or “moreover”. The third reports 31 plagiarized words out of 201, with nine sources identified and an estimated plagiarism percentage of 14%, and recommends optional improvement. Although these tools only check text similarity and are probably less powerful than iThenticate [5], the paid service used by Crossref Similarity Check, this example demonstrates that the detected similarity depends on the software applied.
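Why two checkers can disagree on the same text can be illustrated with a minimal sketch. It is not the algorithm of any of the tools named above, whose criteria are not disclosed; it assumes a simple word n-gram overlap measure, and the function names and the two sample sentences are hypothetical. Changing only the n-gram length changes the reported percentage considerably.

```python
import re


def shingles(text, n):
    """Return the set of word n-grams in `text` (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def similarity_percent(submission, source, n):
    """Percentage of the submission's distinct n-grams that also occur in the source.

    Illustrative measure only: real checkers may use different matching rules.
    """
    sub = shingles(submission, n)
    if not sub:
        return 0.0
    return 100.0 * len(sub & shingles(source, n)) / len(sub)


# Hypothetical sample texts, invented for this illustration.
submission = "the rate of similarity between scientific papers is measured by software"
source = "software is used to detect the rate of similarity between papers"

for n in (1, 2, 3):
    print(f"n={n}: {similarity_percent(submission, source, n):.1f}% similar")
```

With single words (n=1) most of the sentence counts as matched, whereas longer n-grams only match the shared phrase “the rate of similarity between”, so the same pair of texts yields very different percentages.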
Although academic and national boards exist, e.g., JISC [6] (launch of the Jisc Plagiarism Advisory Service) in the United Kingdom [1], a universal standard for the rate of allowable similar and matched content would be desirable in the scientific community, especially with regard to so-called “self-plagiarism” [7] or the copying of general phrases in the introduction section [9], to ensure an ethically fair and equitable treatment of authors. This requires a wide consensus among journals, editors, authors and institutions. With this goal, we recommend the creation of a representative committee to propose appropriate common tools and standards for measuring matched content and similarity rate in scientific documents. In a more sophisticated version, the standard might be adjusted to the field of research, since technical terms are unavoidable in science and the rate of similarity may therefore be inherently higher. This is also why author names, affiliations and the reference list are excluded from the similarity analysis. Furthermore, similarity in some parts of an article, such as the introduction or methods, could be weighted differently from similarity in the results, discussion and conclusions. As Brumfiel discussed [9], open archives or preprint servers such as arXiv [11] are often misused for plagiarism, and authors with limited English proficiency tend to copy phrases from their own earlier work or the work of others. Therefore, previously published papers by the same author or group of authors also have to be evaluated to differentiate between self-plagiarism and the correct re-use of previously published work. While software detection of similar and matched content is quick and useful, it could be coupled with human analysis for better efficiency, as evidenced in [12]. The more sophisticated the results of text analysis software are, the more solid the basis on which the editor makes his or her decision. As Glänzel et al. [13] and the journal Nature stated, a careful human reader cannot be replaced. Automatic rejection based on a single similarity value should therefore not occur.
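The idea of weighting sections differently can be made concrete with a small sketch. It is only one possible reading of the suggestion above: the section names, the per-section percentages and the weights are all hypothetical values chosen for illustration, not a proposed standard, and the result is meant as input to an editor's judgment rather than as a rejection rule.

```python
def weighted_similarity(section_scores, section_weights):
    """Weighted mean of per-section similarity percentages.

    `section_scores` maps section name -> similarity in percent.
    `section_weights` maps section name -> non-negative weight;
    sections without an explicit weight default to 1.0.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for section, score in section_scores.items():
        weight = section_weights.get(section, 1.0)
        weighted_sum += weight * score
        total_weight += weight
    if total_weight == 0:
        raise ValueError("at least one section needs a positive weight")
    return weighted_sum / total_weight


# Hypothetical per-section similarity percentages for one manuscript.
scores = {
    "introduction": 30.0,
    "methods": 40.0,
    "results": 5.0,
    "discussion": 10.0,
    "conclusions": 5.0,
}

# Hypothetical weights: matches in introduction and methods count half as much.
weights = {"introduction": 0.5, "methods": 0.5}

unweighted = sum(scores.values()) / len(scores)
weighted = weighted_similarity(scores, weights)
print(f"unweighted mean: {unweighted:.2f}%, weighted mean: {weighted:.2f}%")
```

In this invented example the down-weighted introduction and methods lower the overall figure, which shows how a field- or section-aware standard could separate unavoidable overlap in standard phrasing from overlap in the results.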
Before this large-scale work is potentially undertaken, our main suggestion for improving the situation and making the process more transparent is that each journal should systematically state in its author guidelines which similarity-checking software it uses. Concrete similarity limits must also be stated.