Preprints in Chemistry: An Exploratory Analysis of Differences with Journal Articles

Abstract: The exploratory analysis of the differences between preprints and the corresponding peer reviewed journal articles for ten studies first published on ChemRxiv and on Preprints, though statistically non-significant, suggests outcomes of relevance for chemistry researchers and educators. The full transition to open science requires new education of doctoral students and young researchers on scholarly communication in the digital age. The preliminary findings of this study will help inform the curriculum of such new courses for young chemists, eventually promoting accelerated innovation in a science that, unique amid all basic sciences, gives rise to a huge industry central to the wealth of nations.


Introduction
Publishing scientific articles in the form of "preprints" (though most preprints will never have a print version [1]), namely freely accessible scientific documents posted on the internet before the peer review process, is rapidly replacing the conventional publishing process in several basic sciences. For instance, the publication rate of https://arxiv.org (arXiv), a website managed by the Library of Cornell University, approached 13,000 preprints per month in 2019 (12,989/month) [2]. Originally aimed at physics, mathematics and computer science scholars, arXiv currently also hosts works from quantitative biology, quantitative finance, statistics, electrical engineering, systems science and economics scholars. Similarly, the number of papers published by https://www.biorxiv.org (bioRxiv), a preprint repository for the life sciences managed by Cold Spring Harbor Laboratory since late 2013, exceeded the 100,000 threshold in October 2020, with a publication rate of 2943 preprints/month in the first eight months of 2020 [3].
In slightly more than three years since its debut in May 2016, https://www.preprints.org (Preprints), the multidisciplinary preprint platform owned by the scientific publisher MDPI, reached the milestone of 10,000 preprints [4]. It then took only 13 months to almost double that number, to 17,000 preprints by late October 2020. Showing the global impact of preprints, the studies at Preprints were co-authored by over 64,000 authors, whereas those at bioRxiv were co-authored by close to 424,000 scholars.
We briefly recall that, in general, prior to publication of the preprint an editor working for the organization owning the preprint server checks the uploaded manuscript for minimum quality and lack of plagiarism. The manuscript, written with no requirements on how to write and structure the article, is then posted online as a PDF (portable document format) file. Certain servers, such as bioRxiv and Authorea, also post it in HTML and XML formats, so that data-rich preprints in HTML are easily discovered by search engines, easily translated and readily "data mined".
Dubbed Chemistry Preprint Server (CPS), the first chemistry preprint server was launched in August 2000 at http://preprint.chemweb.com. Two years later the CPS already hosted 500 preprints in numerous areas of chemistry, from biochemistry to computational chemistry [5], co-authored by scholars based in 51 different countries. Alas, the website chemweb.com was subsequently closed because "changes in search algorithms resulted in a dramatic decline in traffic and a corresponding drop in revenue" [6]. Other attempts by large publishing companies to launch chemistry preprint servers were unsuccessful [7].
Publishing their research work in the most oligopolistic sector of the highly profitable scientific publishing industry [8], chemistry scholars were recently found to be those publishing with the lowest frequency in open access (OA) journals. In detail, the analysis of 100,000 recent articles from all disciplines found that less than 20% of the chemistry papers were freely accessible [9]. In this context, in August 2017 the American Chemical Society, joined by the Royal Society of Chemistry and the German Chemical Society, launched a new chemistry preprint server at https://chemrxiv.org (ChemRxiv, today partly owned also by the Chemical Societies of Japan and of China). By late October 2020, the platform hosted 6422 preprints, with an average publication rate of 324 preprints/month recorded in the first eight months of 2020 [10]. At the same date, Preprints hosted close to 1000 chemistry preprints.
The question of how much papers change between preprint and final published article is important because the answer may help dispel myths surrounding preprints in scientific communities (such as research chemists) still reluctant to adopt them, as well as enhance the trust of scholars in this new form of scientific publishing. One myth identified by Tennant and co-workers, for example, is "the risk of 'scooping' often used to argue against preprints, whereas in reality the opposite is true as a preprint defines precedence and 'ownership' of research" [11]. Surveying 38 stakeholders based in eight European countries (from research funders through unengaged researchers), Chiarelli and collaborators recently reported that trust was "the essential factor in preprint posting" [12], with preprints creating "what might be called a trust barrier" [12].
Yet, in 2016, a first seminal study comparing more than 12,000 arXiv preprints with the corresponding refereed journal articles concluded that few differences exist between the preprints and the peer reviewed articles when considering titles, abstracts and the body of the text (both on the semantic and on the editorial level) [13]. In addition, extending the same statistical analysis to 2500 preprints from bioRxiv revealed very few changes between the final published scientific papers and their preprint versions [14]. Similarly, a recent analysis of 56 preprints published by bioRxiv in 2016 found that, on average, the peer reviewed articles were of "higher quality of reporting" than preprints, but that the difference was small [15].
The following exploratory analysis looks at the differences between preprints and the corresponding peer reviewed journal articles for 10 studies first published as preprints in ChemRxiv and in Preprints.

Methodology
Ten preprints which underwent subsequent publication as peer reviewed articles in international scientific journals were selected, five from ChemRxiv (Table 1) and five from Preprints (Table 2). The preprints were selected because they represent different fields of today's chemical research: spectroscopy and electron microscopy, catalysis, natural products, nanochemistry, green chemistry, and scientific education in the context of the emerging circular economy.
In the following analysis, changes between a preprint and its corresponding peer reviewed article are considered minor when concerning only style, grammar or graphical aspects such as the format of a table. Changes are classified as significant when the peer reviewed article includes new data, new experimental details and new discussion of results not present in the previous preprint.
Each table includes the preprint title, the journal in which the peer reviewed article was eventually published and the current (2019) journal impact factor (JIF), a citation-based metric [16]. The number of unique views of the selected preprints by 23 October 2020 is also included. Comparison between each preprint and the corresponding journal study concerns title, abstract, text and references. In what follows, the preprints are identified by bold numbers rather than by type, in order to generalize the text for non-chemistry experts (i.e., for readers not familiar with chemical terms and processes). Table 3 lists changes across the ten selected studies between final published journal article and preprint. The time between preprint and journal article publication is also listed.
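The comparison in this study was carried out by reading each pair of documents. Purely as an illustration, a first-pass screen of how much a given section (title, abstract, text or references) changed could be sketched in Python with the standard library module difflib. The function name and the sample strings below are assumptions made for this sketch and are not part of the method used here; a similarity score of this kind cannot replace the minor/significant judgement defined above.

```python
from difflib import SequenceMatcher


def section_similarity(preprint_text: str, article_text: str) -> float:
    """Return a 0..1 word-level similarity ratio between two versions of a section.

    Hypothetical helper for illustration; not the procedure used in this study.
    """
    # Compare word sequences rather than characters, so that differences in
    # whitespace or line wrapping do not affect the score.
    preprint_words = preprint_text.split()
    article_words = article_text.split()
    matcher = SequenceMatcher(None, preprint_words, article_words, autojunk=False)
    return matcher.ratio()


# Placeholder strings invented for illustration only.
preprint_title = "Title of a hypothetical preprint"
article_title = "Title of a hypothetical preprint"
print(section_similarity(preprint_title, article_title))  # identical texts give 1.0
```

A low score would only flag a pair for closer reading: a shortened abstract and an abstract with new data could receive similar scores while falling into different categories.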

Results and Discussion
Upon acceptance for publication in different journals following peer review, most journal articles had the same title as the deposited preprints. Only in the case of preprint 8, posted at Preprints, was the title of the corresponding journal article shorter.
The abstracts of the preprints published in ChemRxiv and the corresponding journal articles were the same in three out of five cases. The journal article deriving from preprint 2 specified that the article derived from interaction with the members of the Association of Environmental Engineering and Science Professors in a workshop organized at the 2017 association conference. The journal article published after preprint 4 includes two minor changes in writing style. The abstracts of the preprints published in Preprints and of the final published journal articles were the same in two out of five cases, specifically for preprints 6 and 9 and the corresponding journal articles. In the case of preprint 10, the abstract of the journal article [17] was significantly shorter than that of the preprint. The abstract of the final published article [18] is longer and slightly more informative than that of preprint 7, as is the abstract of the journal article [19] when compared to that of preprint 8.
Few or no differences were found between the texts of the final journal articles and the preprints published in ChemRxiv months or weeks before. Preprint 1 even used the template of the subscription journal in which it was eventually published five months after the preprint. Interestingly, the study made freely accessible as a preprint includes on each page the sentence "Submitted manuscript: confidential" [20].
Preprint 2 makes use of the template of the subscription journal in which it was published two months after publication of the preprint as an open access (OA) document, with a table (Table 1) that is even more readable (thanks to the use of colors) in the preprint [21] than in the peer reviewed article. In the case of preprint 3, the final article published four months after the preprint in an OA journal includes three more references and slightly longer conclusions [22].
Preprints 4 and 5 do not use a journal template, but their content is almost identical to the final published articles. Preprint 4 does not include page numbers [23] but embeds high resolution colored figures and schemes. Downloading the preprint from ChemRxiv, users would also download the Table of Contents graphics and the same 470-page long Supporting Information section found four months later in the final published article. When compared to the text of preprint 5, the peer reviewed article published 16 days after the preprint [24] includes at the end of the article a brief "Post preprint addendum" and five more references.
The latter preprint was uploaded, approved and published on the same day (17 October 2018). The day before, Angewandte Chemie had published a manuscript [25] by a Swiss-German team reporting the invention of a similar method to obtain the molecular structure of microcrystalline molecular compounds via electron diffraction. That manuscript was received by the journal's editorial office on 2 October 2018.
Larger, though still not significant, differences were noted between the selected preprints deposited at Preprints and the published journal articles. When compared to the text of preprint 6, the final published article illustrates concepts through new research in a quickly developing field of chemistry, published in the literature in the long period of time (20 months) between the publication of the preprint and that of the journal article [26]. In the case of preprint 7, the final published article was virtually identical to the preprint, except for a minor mistake in the sequential order of the figures in the preprint that was corrected in the journal article [18].
When compared to preprint 8, the peer reviewed article [19] had a substantially higher number of references (22 vs. 17) and a longer and more informative conclusions section. In comparison to preprint 9, the final published article [27] includes five new schemes and one new figure. The experimental section and the conclusions are identical.
The largest differences in the present analysis were noted between preprint 10 and the corresponding journal article [17]. The latter offers a more succinct presentation, with only four tables in the journal article vs. six in the preprint. Furthermore, the journal article includes as Figure 1 an elegant and highly explanatory image displaying the experimental design, and as Figure 2 an image showing electron microscopy pictures of treated and non-treated orange peels. Both figures were absent in the preprint. Finally, the journal article includes a richer conclusion section.
Published between 2018 and 2020, by 23 October 2020 all selected preprints but one had more than 100 reads (unique views). In general, the number of views was significantly higher for preprints published in ChemRxiv. For comparison, the most viewed preprint at Preprints among those selected herein had 928 views, whereas the most viewed preprint at ChemRxiv had 60,352 views. Across the whole multidisciplinary Preprints server, the most viewed preprint had 5369 views by the same date. The high number of reads for preprints posted at ChemRxiv has been noted since the early days of the preprint server, when a manager of the OA program of the ACS was "pleasantly surprised" [28] by the fact that by June 12, 2018, the 400 preprints posted had about 378,000 downloads/views. The trend continued, and two years later the editor of the online publishing platform remarked how preprints at ChemRxiv had been accessed "more than 10 million times, with upwards of 250,000 visitors to the site each day" [29].

Outlook and Perspectives
The exploratory and statistically non-significant analysis (due to the small sample size) reported in this study offers preliminary evidence that in chemistry, as in physics [13] and in the life sciences [14,15], the differences between preprints and the corresponding articles published after peer review are small. The preprints selected represent widely different fields of today's chemical research, from spectroscopy and electron microscopy through catalysis, natural products, nanochemistry, green chemistry and even scientific education. A statistical analysis including a much larger sample is needed to corroborate or refute these preliminary findings. Though preliminary, and with this limitation in mind, the study offers a few outcomes of relevance to today's chemistry scholars interested in adopting open science practices given that, along with "green" self-archiving of research articles on institutional or personal websites [30], the adoption of preprints is an essential part of the practice of open science [31].
Today, chemists can publish their work in preprint form on several preprint servers including ChemRxiv, Preprints, SSRN, Authorea, ResearchSquare, Zenodo, Beilstein Archives, OSF Preprints, ResearchGate and many others. Though they are the community with the lowest uptake of preprints when compared to life scientists, physicists, computer scientists and mathematicians, chemistry scholars read preprints in very large numbers. For example, by late October 2020 about 6500 preprints posted on ChemRxiv had close to 13.5 million views. For comparison, a highly read OA chemistry journal such as Molecules recorded 13.8 million reads for 26,563 articles published by late December 2020 [32]. Furthermore, preprints deposited at ChemRxiv, which had been cited 85 times in 2018 and 430 times in 2019, started to be cited at a fast rate in 2020, with close to 1050 citations in the first 10 months of the year. The reason why research chemists eagerly read preprints may be that by doing so they learn new outcomes of relevance to their research several months ahead of time. Indeed, even in 2013, when virtually all chemistry journals were published on the internet, the average publication time (from submission to publication) for chemistry manuscripts was nine months (and four and a half months from submission to acceptance) [33].
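The readership figures quoted above can be put on a per-document basis with a short calculation. The following Python sketch uses only the rounded totals given in the text; the function and variable names are assumptions made for illustration. Since the two sources count "views" and "reads" respectively and refer to slightly different dates, the result should be taken as an order-of-magnitude comparison only.

```python
def views_per_item(total_views: float, items: int) -> float:
    """Average number of views (or reads) per document."""
    return total_views / items


# Rounded totals as quoted in the text.
chemrxiv_avg = views_per_item(13.5e6, 6500)     # ChemRxiv preprints, late October 2020
molecules_avg = views_per_item(13.8e6, 26563)   # Molecules articles, late December 2020

print(round(chemrxiv_avg))                      # about 2077 views per preprint
print(round(molecules_avg))                     # about 520 reads per article
print(round(chemrxiv_avg / molecules_avg, 1))   # ratio of roughly 4.0
```

On these figures, an average ChemRxiv preprint attracted roughly four times as many views as an average article in the OA journal used for comparison.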
Following studies and even experiments with reviewers on the peer review process, carried out when he was editor of a prestigious medical journal, Smith concluded in 2006 that peer review "is a flawed process, full of easily identified defects with little evidence that it works" [34]. Hence, rather than striving to publish their work in peer reviewed journals of high impact factor, young chemistry researchers should be aware that the JIF is a poor statistical indicator driven by a very small number of highly cited papers (so that most papers published in high impact factor journals actually receive fewer citations than the JIF indicates) [35]. On the other hand, by making their work freely and immediately accessible on the internet, first in the form of preprints and subsequently in the form of peer reviewed journal articles in OA or paywalled journals, chemistry scholars too will rapidly reap the benefits of open science already demonstrated in closely related disciplines (life sciences and physics) in terms of enhanced citations, media attention, collaborations, and job and funding opportunities [36].
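The argument about the JIF in the paragraph above rests on the fact that the JIF is essentially an average number of citations per article, and an average is strongly affected by a few outliers. A minimal Python sketch can make this concrete. The citation counts below are invented for illustration only and do not refer to any real journal; the variable names are likewise assumptions.

```python
from statistics import mean, median

# Hypothetical citation counts for ten articles in one journal
# (invented numbers: nine modestly cited papers and one highly cited paper).
citations = [0, 1, 1, 2, 2, 3, 3, 4, 4, 180]

average = mean(citations)    # JIF-like average: 200 citations / 10 articles = 20
typical = median(citations)  # middle of the distribution: 2.5
below_average = sum(1 for c in citations if c < average)

print(average, typical, below_average)  # 9 of the 10 papers fall below the average
```

In this invented example a single paper accounts for 90% of the citations, so the average says little about the citations a typical article in the journal receives.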
Additionally, for chemistry researchers the fairer evaluation of scholarship [37] today includes several indicators beyond citations, collectively called alternative metrics ("altmetrics", for which even an international OA journal was established in 2018). The number of reads (views) and downloads of each preprint, for example, is a clear indication of the interest of the scholarly community. The preprint that anticipated the almost concomitant discovery of a new method to obtain the molecular structure of microcrystalline molecular compounds via electron diffraction had received by January 2021 an Altmetric "attention score" of 686 [38], which ranks the preprint in the top 5% of all research outputs scored by Altmetric so far.
A few economic figures may help to explain why chemistry scholars showed reluctance to adopt open science practices, including publishing their work in preprint form after the early successful attempts with the Chemistry Preprint Server [5]. It is enough to ask even a prolific author in the chemical sciences whether she/he knows what her/his institution's library pays to access articles published by a typical subscription-based chemistry journal, and what the market concentration level of the publishing industry in the chemical sciences is. Most often, she/he will be surprised to learn that chemistry has historically recorded the highest average journal serial prices [39], and that in 2020 the average price for chemistry journals was the highest amid all disciplines, exceeding the $6300 threshold (Table 4). For comparison, in 2016 the average price for chemistry journals was $5105 [40]. Similarly, few chemistry scholars are aware that only five publishers control the publishing of more than 70% of chemistry studies [8].
I agree with Polka [41] and with other open science researchers [42] who found that the key challenge for the transition to open science is cultural change. To effectively foster this cultural change, in turn, chemistry scholars and educators need to expand the education of doctoral students and young researchers to include scholarly communication in the digital age [43]. The preliminary findings of this study will help inform the curriculum of such new courses for young chemists, eventually promoting accelerated innovation in chemistry [44] and the associated social, economic and environmental benefits, because chemistry, unique amid all basic sciences, gives rise to a huge global industry central to the wealth of every nation hosting chemical production [45].

Conflicts of Interest:
The author declares no conflict of interest.