Preprints in Scholarly Communication: Re-Imagining Metrics and Infrastructures

Abstract: Digital scholarship and electronic publishing within scholarly communities are changing as metrics and open infrastructures take centre stage in measuring research impact. One of the major developments in scholarly communication has been the growth of preprint repositories over the last three decades as a new model of scholarly publishing. As it unfolds, the landscape of scholarly communication is in transition: as much is being privatized as is being made open, and attention is moving towards alternative metrics, such as social media attention and author-level and article-level metrics. Moreover, the granularity with which new metrics and social media evaluate research impact changes the objective standards for evaluating research performance. Using preprint repositories as a case study, this article situates them in a scholarly web, examining their salient features, benefits, and futures. It discusses how preprints advance building the web as data, moving scholarly web development and publishing towards the semantic and social web with open infrastructures, citations, and alternative metrics. We argue that this will viably demonstrate new metrics and enhance research publishing tools in the scholarly commons, facilitating various communities of practice. However, for preprint repositories to be sustained, scholarly communities and funding agencies should support continued investment in open knowledge, alternative metrics development, and open infrastructures in scholarly publishing.


Introduction
Electronic publishing has brought many benefits for sharing research materials online. Besides mainstream publishing in books, peer reviewed journals, and conference papers, research outputs have grown in many forms (preprints, datasets, multimedia, and software), not only for dissemination but also for reproducibility and replication. Although publication outputs have diversified, preprints stand out for their "accessibility" for early dissemination and their "subject to review" status. Preprints are scientific publications that are published online and publicly accessible, typically in line with definitions of open access, before peer review and formal publication in a journal. Preprints [1] and the repositories that host them are on the rise across disciplines, moving beyond the natural sciences to the social sciences and humanities, though there is widespread skepticism [2] among scholarly communities about accepting and recognizing them for scientific validation. Along with the growing trend of open access publishing [3], preprint repositories have grown and, "while still used for small portion of papers, provided much earlier access to scientific findings" among the scholarly communities [4,5].

Background
In pursuit of open knowledge since the eighteenth century, scientific and scholarly communities exchanged communication without any formally integrated and holistic use of peer review [10]. However, when the body of scholarly literature grew exponentially in the mid- to late-twentieth century with the information explosion, the peer reviewing process became the norm for the dissemination of current knowledge, the archiving of the canonical knowledge base, quality control of published information, and the assignment of priority and credit to authors for their work [11]. Along the way, journals defined the various roles of authorship, levels of contribution, and the rules for publishing research data in the public domain, especially before the paper was released. It was Franz J. Ingelfinger, then editor of The New England Journal of Medicine, whose ideas on "sole distribution", set out in his editorial of September 1969, became popular among scientific communities [12]. Journals thus became the primary mode of communication well before peer review became widespread. With clear guidelines for authorship, academics and researchers began communicating scientific research through peer reviewed journals as the primary venue for publishing their scholarship.
After the 1990s, when the Internet became much more widespread, scholarly publishing was perhaps not disrupted as much as expected, as the same large players that dominated the landscape in the pre-digital age still do so [13]. It was thought that the Web would kill off scholarly journals, because the cost of dissemination would plummet to near zero. However, the large publishers simply shifted the offline system online, which is why we still have things like journals, issues, articles, copyright, and metrics that are designed for a pre-Web era. What this did, importantly, was to emphasize that publishers were in charge, because they manage the metrics (for evaluation and reputation) and the copyrights. Preprints challenge both of these things, and this is perhaps a deeper significance that needs to be explored. Many commercial publishers and open access mega-journals consolidated their positions as large players, even as the ties between open access (OA) and incentives, and the power relationships between politics, publishers, and academies, increased [14]. In addition, geographical heterogeneity and geopolitics play a large part, as countries in both the global North and the global South are attempting to address issues in open access policies and the integration of nonprofit workflows into scholarly publishing. Although preprints are emerging as an equalizer, the landscape is complex. In the past three decades, different regions have pursued distinct goals to promote open access at national, institutional, and sectoral levels, as advocated in Africa, China, and South Asia. The efforts of SciELO in Latin America, Plan S (a radical open access program announced by research funders in Western Europe in 2018), and efforts in the United States of America (USA) are calling for global action towards more inclusive, open, and multilingual scholarship [15,16].
Nonetheless, open source technologies and open access movements necessitated retooling the existing scholarly processes towards openness. Although open access publishing started to grow, some leading publishers, including commercial publishers, learned societies, and university presses, actively opposed the growth of OA [17] and were cautious about taking up the OA model of publishing until they found a way to transform it into a new business model. Consequently, the period 2006-2017 saw high growth of OA mega-journals, which focus on scientific trustworthiness and soundness, eschewing judgement of novelty or importance. The open access journals PLOS ONE and Scientific Reports now dominate this space [18,19].
Breaking conventional boundaries, digital scholarly publishers have flourished innovatively with a wide variety of repository solutions and open journal publishing platforms, testing a range of open access publishing models. These aim to achieve Gold open access (OA at the publishing source), Green open access (self-archiving), and Diamond open access (gold, but explicitly with no article processing charges), as indexed by the Directory of Open Access Journals, which also lists OA journals that charge article processing charges. As open access publishing models, licensing options, and infrastructures grow, their data and resources enrich the web and contribute to building open data. While the complexities of proprietary software, commercial publishers, and paywalled content remain widespread, preprints have entered to disrupt the scholarly communication system, making a vast amount of unpublished data and scholarly content available, regardless of the peer review process. As a leveler, preprints enrich the scholarly web on top of the existing scholarly resources for discoverability. They do so by allowing access to not yet printed versions, timestamping ideas and findings, and adding meaning to interconnect people, concepts, and applications [20]. Therefore, preprints play a larger role in scholarly publishing, strengthening the infrastructures of the web through linked data, scholarly-rich content, and applications.

Rethinking Research Impact Metrics
As countries, institutions, and research communities compete on a global stage to measure and evaluate their national, institutional, and research outputs, various outcome- and metrics-based research frameworks assess different research activities and performance. Key areas include science and technology indicators, patents, bibliometrics, citations, rankings, research and development factors, measurements for innovation, and metrics for assessing the quality of scientific outputs. However, there is an increasing need for research assessment to be as inclusive as possible of research artefacts, going beyond research papers to preprints, software, code, posters, media, and datasets. It should also include scholarly activities, such as teaching and public outreach. It is also widely argued that the benefits of research impact should extend beyond academia to the economy, society, public policy, human development, and the environment. This involves the strategy, resources, and infrastructure supporting the research, as adopted in the UK Research Excellence Framework, which currently assesses the excellence of research in higher education institutions in the United Kingdom (UK) [21]. This matters all the more for understanding what constitutes scholarly impact when literature obsolescence and non-citation are rife, even in journals with an impact factor of five [22,23].
Journal-level metrics (Journal Impact Factor, CiteScore, Scimago Journal Rank, and Source Normalized Impact per Paper) and author-level metrics (h-index, i10-index, and s-index) have determined and built the reputation of scientific productivity and research impact in digital scholarship [24]. However, there is a growing demand for other kinds of metrics, such as article-level and author-level metrics, which have their own merits beyond the Journal Impact Factor, an aggregate of citation counts for the journal in which the work is published [25]. Though academics and scientometricians have developed many metrics to measure scientific output, whether these metrics work, are fair, or are overused needs evaluation, as citation counts account for less than one percent of an article's usage [26]. Many of the existing metrics measure journal quality, which necessitates a paradigm shift towards author-level metrics that capture the citation-related data and the connectivity-related metrics of authors [27]. Many metrics are still focused on published, peer reviewed articles as the primary output. However, preprints and a wider diversity of processes and outputs demand that new metrics, beyond those for traditional outputs, be developed and also applied in a responsible manner. The Web also opens up a whole field of additional context to explore, and hence more 'Contextualized Metrics' are required for measuring it.
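To make the author-level metrics named above concrete, the following is a minimal Python sketch of how the h-index and i10-index are commonly computed from a list of per-paper citation counts. It is an illustration rather than the procedure of any particular indexing service; the function names and the sample citation counts are hypothetical, and the s-index is not covered.

```python
from typing import Sequence


def h_index(citations: Sequence[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations: Sequence[int]) -> int:
    """Number of papers with at least ten citations."""
    return sum(1 for count in citations if count >= 10)


# Hypothetical citation counts for one author's papers (not real data).
example_counts = [25, 12, 10, 8, 3, 0]
print("h-index:", h_index(example_counts))
print("i10-index:", i10_index(example_counts))
```

Both indices depend only on citation counts, which is one reason they cannot reflect outputs such as preprints, datasets, or software unless those outputs are themselves cited and counted.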
Moreover, defining impact in various contexts is extremely challenging at the academic, economic, and societal levels, given that the traditional metrics used for evaluation are deeply flawed [28]. For example, they are misused beyond their original intention (the Journal Impact Factor being one case), are deeply unscientific, are mostly operated by commercial entities, and are often heavily biased in different dimensions. Citation rates, journal ranks, and impact factors are inherently hierarchical, and hence institutionalizing them as a tool for assessing scientific impact has unintended negative consequences [29,30]. It has further been found that the methodological quality and reliability of published research in several fields may decrease with increasing journal rank [30,31]. In supporting new methods in data and scholarly publishing, the open research community must encourage the publishing of null results or failed experiments, in line with a growing body of evidence that questions the conventional forms of impact assessment, which insist on quantifying research outputs and cannot capture diverse, wide-ranging, and inclusive research impact [32]. Moreover, the relevance of citations and the impact factor is widely questioned with respect to their role in problem-solving and societal impact [33]. Open access has seen a great push in academia, policy making, science communication, and beyond, and preprints add to this environment as an additional layer that will further enrich the scholarly ecosystem.
Re-imagining open infrastructures and metrics, this article aims to situate preprints in the emerging research ecosystem, as discipline-centric and public preprint repositories have been on the rise over the last two decades or so. As preprints become mainstream, research publications of high to moderate novelty, reporting incremental, supportive, or confirmatory results, and their supplementary data will benefit from more visibility [34]. Research communication, academic outputs, and scholarly artefacts have diversified in the many ways they are available to various communities of practice, transcending disciplinary boundaries of research. Scholarly communication is evolving and diversifying, and we need to rethink our metrics and evaluation systems accordingly in this rapidly changing landscape. Research outputs are more than journal articles, and so measuring their impact should go beyond them to include pre-publication outputs. Their credibility, impact, and value should be measured through heterogeneous metrics, which calls into question the whole idea of trying to measure scholarship. Are metrics appropriate? Or is qualitative assessment needed? Is such assessment even operationally better than randomness?

Growth of Preprint Repositories: From arXiv to ESSOAr
As exhibited in Table 1, the rapid growth of preprint repositories has prompted scholarly communities to define what constitutes a preprint, as there is no clear consensus on what they are. An examination of definitions by some of the preprint repositories reveals that they are "draft, unpublished, incomplete, or unedited final versions of papers, maybe work in progress and not typeset". In one of the early attempts, in a Guest Editorial in 2000, Gunther [7] distinguished preprints from an electronic publishing and e-print server perspective, calling them "'pre-peer-review' or 'pre-submission' documents". PeerJ Preprints, a preprint repository [35], describes a preprint as "a draft of an article, abstract, or poster that has not yet been peer reviewed for formal publication". Many scholars have attempted to define what exactly a preprint is, distinguishing the preprint as a scholarly item, based on its evaluation status (as in pre- and postprints), from the preprint server as an infrastructure. Neylon [36] proposed a model that distinguishes the preprints by "characteristics of the object, its 'state' from the subjective 'standing' granted to it by different communities". Rittman explains preprints as "a piece of research made publicly available before it has been validated by the research community. That is to say, some output that follows the scientific process has not yet been peer-reviewed for journal publication" [37]. However, Tennant et al. [8] proposed a definition of a preprint based on its peer review status, which is in line with the Sherpa/Romeo description:
• Preprint: Version of a research paper, typically prior to peer review and publication in a journal.
• Postprint: Version of a research paper, subsequent to peer review (and acceptance), but before any type-setting or copy-editing by the publisher. Also sometimes called a 'peer reviewed accepted manuscript'.
Publishers accept preprints for peer review in journals, even when they are available in preprint repositories and submitted in parallel. Authors submit them to preprint repositories without peer review and free of cost to solicit feedback from peers, and often submit them later to a journal for peer review and subsequent publication. arXiv is a preprint repository that was established for high energy physics in 1991. However, other disciplines took more time to realize the potential of preprint repositories and the best practices of early online dissemination of research works to maximize research impact [8]. Figure 1 shows the growth of preprints in the life sciences from 2007 to 2018, which report a high number of submissions. The life sciences established further preprint services, such as arXiv q-bio, a quantitative biology archive that is part of arXiv and has been publishing preprints since September 2003. A nonprofit, Cold Spring Harbor Laboratory, launched bioRxiv, a biology preprint repository, in November 2013. In April 2013, PeerJ Inc. launched PeerJ Preprints, covering biological, medical, and environmental sciences. The Journal Impact Factor indicates the quality of journals through citation metrics, though measuring scholarly and societal impact is more important [38]. Hence, a new paradigm shift is that publications should be judged not on subjective grounds of impact, novelty, and interest, but on objective grounds of scientific and methodological soundness. In other words, many journals have emerged in the expectation that "scientific literature might gradually become less biased against negative or null results and it will be less dominated by the trends and 'hot topics' of the day" [39], for which preprints provide access to check results prior to peer review.
The peer reviewing process of journals with a Journal Impact Factor is found to be excruciatingly slow (typically 85-150 days or longer) and invariably slows decision making, affecting the careers of early researchers, who receive no recognition until the research is published, though such journals foster international collaboration and global reach [40].
1 Record statistics are collected from the respective websites, except for bioRxiv, for which data are collected from OSF Preprints.
Another criticism widely conceded among researchers is that science and knowledge are measured by numerical ranking systems, which makes researchers pursue rankings ahead of the research itself [41]. Muller argues against these counterproductive conventions of performance evaluation and metrics, calling them 'metric fixation' [42]. Preprints break these conventions, giving research publications an advantage on their merits and research impact, as they become openly available as early pre-publication outputs and are independent of any specific journal venue at the point of sharing.
Preprint repositories play a vital role in disseminating research artefacts for impact and in making them visible to audiences. This belongs to the material culture of academic reading and writing, which may be transient in social media communication but calls for reuse, credit, and replication in an open research ecosystem with data, code, citations, and software. Scholarly identity has also grown alongside technological innovations, with technology-influenced scholarship carried out through participatory technologies in the public sphere. Academics, practitioners, and researchers increasingly [43] tend to communicate their research using social media as a utility across the research landscape and lifecycle, as a digital opportunity to learn tools and techniques and apply them to research communication. There is a growing trend of publishing in unrefereed preprint repositories; writing blog posts or field notes online; creating infographics and data visualizations; publishing research data in data journals; and making podcasts, videos, images, photo-essays, and overlay journals, all of which diversify scholarly communication [44]. Furthermore, scholarly communication activities and processes on informal channels boost interaction, collaboration, seeking, citing, publishing, and disseminating in orthodox, moderate, and heterodox use scenarios [45]. A few examples are The Conversation Global [46] and Policyforum.net [47], online independent news platforms run by research communities. Using these platforms, journalists, scientists, academicians, and researchers primarily aim to communicate scholarly information to a lay audience. In this way, preprints help journalism by promoting transparency and science communication for the public.

Methods
For the purpose of this study, a sample of ten preprint repositories was chosen based on their history, popularity, and disciplinary diversity. Their holdings combined preprints (that go on to be published or not), postprints, final published articles, datasets, and working papers, and the repositories were examined for their salient features, disciplinary focus, and number of records available between March and September 2018 (see Table 1). As a case study of preprints, the research was conducted in two stages. The first stage highlighted their principal features, such as system architecture, persistent identifiers and registries, disciplinary focus, research data management, peer reviewing models, infrastructures, and metrics.
The second stage consisted of in-depth analysis using the following indicators: software and open source technologies used, standards and protocols adopted, knowledge organization systems applied, interoperability and open licensing options, indexing and aggregating agencies involved, metrics and peer reviewing processes, community standards, and web 3.0 applications available. Subjects and disciplines of preprints, such as life sciences, technology, engineering, and social sciences, were included in this study. Additionally, management aspects were investigated, such as funding agencies, whether preprint repositories were supported by for-profit corporations or nonprofits, their advisory committees, codes of conduct, management of digital object identifiers, submission guidelines, copyright policies, and publishing workflows. Subsequently, a comparative analysis at the site and record levels was performed to synthesize the results and discussion.

Comparative Features of Preprint Repositories
The results are presented below in eight sections. Key findings are categorized by theme: System architecture; Persistent identifiers and registries; Disciplinary focus and management; Interoperability and open licensing; Indexing and aggregators; Knowledge organization systems, authority control and subject categories; Metrics and open reviews; and Community standards.

System Architecture
System architecture refers to the database structures, hardware, and software used to set up a preprint repository. As shown in Table 2, only a limited number of software solutions were available when preprint repositories were started, so legacy preprint repositories, such as arXiv and RePEc, are migrating to digital repository software (Invenio and EPrints, respectively) to integrate new applications, such as DOIs, ORCIDs, and Altmetric. E-LIS is a public preprint repository in library and information science that runs on DSpace. Though DSpace and EPrints dominate repository development globally, OSF Preprints and Figshare are new entrants among repository solutions. A few preprint repositories are building application programming interfaces (APIs) to build robust features and accommodate services from other programs. Four of the ten preprint repositories have open APIs: RePEc, MDPI Preprints, OSF Preprints, and Figshare. OSF Preprints is an aggregator across almost all of the other servers. It also links to other services, such as Figshare or GitHub, offers local storage, and is virtually unlimited in the scope of what can be 'attached' to preprints. It uses SHARE, a community open-source suite of technologies. Three of the ten preprint repositories evaluated use custom proprietary systems, which could not be identified, as listed under infrastructure in column 2 of Table 2. Managing research data has become an integral part of system architecture: multiple file types are supported for preservation, from word processor documents to datasets, in a variety of formats, such as LaTeX and Zip, and essentially all of the preprint repositories support this [48].

Persistent Identifiers and Registries
Persistent identifiers provide perpetual IDs for digital objects so that they can be identified and retrieved. Most preprint repositories have identifiers, such as article IDs, URIs, and Handle System handles for publications, which make the records unique, identifiable, persistent, and retrievable (see Table 2). Many of them are cross-linked and directed to the DOI of the article, where the latest version of the article is available as a permalink. An example of an arXiv ID is arXiv:hep-th/9603067, where hep-th stands for High Energy Physics-Theory and 9603067 is the unique record number. An example of a RePEc identifier handle is https://ideas.repec.org/p/hhs/cesisp/0277.html, where hhs:cesisp denotes the Centre of Excellence for Science and Innovation Studies, Royal Institute of Technology, Stockholm, Sweden, followed by the unique record number, 0277. Among the ten preprint repositories analyzed, seven are found to use Crossref's DOI services for preprint records; Crossref thus dominates DOI registration among preprints. At OSF Preprints, each project is assigned a globally unique identifier, or GUID, though DOIs are used as well. DOI versioning for version control was found to be unique to ChemRxiv, MDPI Preprints, and PeerJ Preprints. Further, DOIs assigned to supplementary data, files, code, and datasets make them citable as well. Moreover, one important feature found is registries, which record various projects to make them publicly available as crucial content providers and help to avoid the duplication of studies. OSF Registries has 274,910 registrations of research studies of systematic reviews and meta-analyses in clinical psychology and medicine, which are cross-searchable with Research Registries and ClinicalTrials.gov registries.
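As an illustration of how the two identifier examples above decompose into their parts, the following Python sketch splits an old-style arXiv ID and an IDEAS/RePEc paper URL. It handles only the two forms quoted in the text; the function names and the returned field names are assumptions made for this example and are not part of either service's own tooling.

```python
import re
from urllib.parse import urlparse

# Old-style arXiv identifier as quoted in the text, e.g. "arXiv:hep-th/9603067":
# an archive name, a slash, then a seven-digit record number.
ARXIV_OLD_STYLE = re.compile(r"^arXiv:(?P<archive>[a-z-]+)/(?P<number>\d{7})$")


def parse_arxiv_id(identifier: str) -> dict:
    """Split an old-style arXiv ID into archive and record number."""
    match = ARXIV_OLD_STYLE.match(identifier)
    if match is None:
        raise ValueError(f"not an old-style arXiv identifier: {identifier!r}")
    return match.groupdict()


def parse_repec_url(url: str) -> dict:
    """Split an IDEAS URL of the form /p/<archive>/<series>/<number>.html."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 4 or parts[0] != "p" or not parts[3].endswith(".html"):
        raise ValueError(f"unexpected IDEAS paper URL: {url!r}")
    return {
        "archive": parts[1],
        "series": parts[2],
        "number": parts[3][: -len(".html")],
    }


print(parse_arxiv_id("arXiv:hep-th/9603067"))
print(parse_repec_url("https://ideas.repec.org/p/hhs/cesisp/0277.html"))
```

Other identifier forms, such as versioned DOIs or OSF GUIDs, follow different conventions and would need their own patterns.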

Disciplinary Focus and Management
An examination of the history and growth of preprints reveals that disciplinary focus has been one of the major factors in establishing them. Since the need for sharing scholarly research arose in different settings (laboratory, academic, research, and practice), preprints were created and supported by diverse disciplinary areas (see Table 1). This also ties into the social differences and norms between research communities, wherein replication, reproducibility, and methodological approaches vary greatly among domains, especially when preprints have a 'state' distinct from the subjective 'standing' granted to them by different communities of practice [36] (p. 4). arXiv started with physics, but soon expanded to cover quantitative biology, astronomy, computer science, and mathematics. Biology has been quite conventional, but in recent years it has reported a high number of submissions in bioRxiv and PeerJ Preprints (see Figure 1). In the last year, more than 20 discipline-based preprint platforms have emerged, backed by the Centre for Open Science. Country- and region-specific examples are INA-Rxiv, Arabixiv, and AfricArxiv, committed to promoting open science in Indonesia, the Arab states, and Africa, respectively. Moreover, the management of preprint repositories does not rest solely with public institutions or governments; they are also funded by nonprofits and for-profit companies [8]. arXiv is hosted by Cornell University Library; RePEc by Munich University Library and consortia; E-LIS is supported by AIMS, FAO, and the University of Naples Federico II; MDPI Preprints is run by MDPI; bioRxiv is hosted by Cold Spring Harbor Laboratory; and OSF Preprints by the Centre for Open Science. These hosts are largely nonprofits. ChemRxiv, collaboratively managed by the American Chemical Society, the German Chemical Society (GDCh), and the Royal Society of Chemistry, UK, and ESSOAr, managed by the American Geophysical Union, are run by learned societies.
These agencies are backing the growth and development of preprints in key disciplinary areas. However, SSRN was acquired by RELX Group in May 2016, and both PeerJ Preprints and MDPI's preprints.org are services run by commercial publishers, meaning that preprint servers are seen as a key part of business models (see Table 2). This is part of a dangerous move by some publishers towards controlling the entire research workflow and is symptomatic of a highly dysfunctional scholarly publishing market [49].

Interoperability and Open Licensing
Of the ten repositories, arXiv, RePEc, and OSF Preprints are found to be interoperable and to support a whole range of integrated search features, such as cross-searching of content (abstracts and the full text of articles) across multiple repositories owned by different content providers. ChemRxiv runs on Figshare, a proprietary platform, but it has a unique model in which all of the available content, owned and provided by various institutions worldwide, is shown on a single portal at https://figshare.com. Creative Commons licenses are found to be predominantly used by the preprint repositories to allow reuse of content and data. However, the degree of freedom varies across preprint repositories. arXiv uses the following license types, listed from most accommodative to most restrictive: Attribution 1.0 Generic (CC BY 1.0), Attribution 4.0 International (CC BY 4.0), ShareAlike (SA), NonCommercial (NC), and, in some cases, CC BY-NC and CC BY-NC-SA [50,51]. E-LIS, PeerJ Preprints, and MDPI Preprints use Attribution 4.0 International (CC BY 4.0), which allows sharing and adapting of works. This implies that there is an "unrestricted use, distribution and reproduction in any medium and the original work is properly cited" [52]. ChemRxiv allows Attribution-NonCommercial-NoDerivs (CC BY-NC-ND), under which users can "download works and share them with others as long as they are credited, but they can't change them in any way or use them commercially". It also supports embargoes, confidential files, private links, and reserved DOIs, and accepts any file format up to 5 GB. ESSOAr follows the Attribution 1.0 Generic (CC BY 1.0) license. RePEc does not state any of the above licensing options, while SSRN allows papers to be posted by the copyright owner, or with the copyright owner's permission, where permitted under a publishing agreement, the publisher's copyright policies, an institution's license agreement, or a Creative Commons license.
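The differences in "degree of freedom" between the Creative Commons licenses named above follow from the license elements in each code (BY, SA, NC, ND). The Python sketch below is a simplified illustration of that reading, not a legal interpretation; the function name and the keys of the returned dictionary are assumptions made for this example.

```python
# The four Creative Commons licence elements and what each one requires.
ELEMENTS = {
    "BY": "credit must be given to the creator",
    "SA": "adaptations must be shared under the same terms",
    "NC": "only non-commercial use is permitted",
    "ND": "no derivatives or adaptations are permitted",
}


def describe_licence(code: str) -> dict:
    """Translate a code such as 'CC BY-NC-ND' or 'CC BY 4.0' into permissions."""
    tokens = code.upper().split()
    if len(tokens) < 2 or tokens[0] != "CC":
        raise ValueError(f"not a Creative Commons licence code: {code!r}")
    elements = tokens[1].split("-")  # any version number (tokens[2]) is ignored
    unknown = [e for e in elements if e not in ELEMENTS]
    if unknown:
        raise ValueError(f"unknown licence elements: {unknown}")
    return {
        "attribution_required": "BY" in elements,
        "commercial_use_allowed": "NC" not in elements,
        "adaptations_allowed": "ND" not in elements,
        "share_alike_required": "SA" in elements,
    }


for licence in ("CC BY 4.0", "CC BY-NC-SA", "CC BY-NC-ND"):
    print(licence, describe_licence(licence))
```

Read this way, CC BY 4.0 leaves both commercial use and adaptation open, whereas CC BY-NC-ND closes both, which matches the quoted description that works under it cannot be changed or used commercially.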
Preprints to peer reviewed journals portability is worth mentioning here for bioRxiv preprints, which become easy for authors and currently this service is available for 107 biology journals.

Indexing and Aggregators
It was found that all of the preprint repositories are indexed and aggregated by commercial, institutional, and data repositories and by databases that are bibliographic, aggregating, or depositive in nature. RePEc has its own indexing platform, called IDEAS, a comprehensive bibliographic database in economics that is freely available and indexes over 2,700,000 items of research; it is also indexed in EconLit, EconStor, Google Scholar, Inomics, OAIster, OpenAIRE, and EBSCO. E-LIS provides a seek option for references, which can be retrieved in Google Scholar. bioRxiv preprints are indexed by the following services: Google Scholar, Crossref, Meta, and Microsoft Academic Search. MDPI Preprints are indexed by Europe PMC, Google Scholar, Scilit, Academic Karma, SHARE, and PrePubMed. PeerJ Preprints are indexed in Google Scholar. Crossref provides DOIs for preprints, while DataCite primarily provides persistent identifiers for all kinds of research data and is integrated with ChemRxiv. Though many abstracting and indexing services, such as OpenAIRE, ResearchGate, Academia.edu, and OAIster, index preprints, there are no established standards for preprints; hence, no usage statistics are reported, unlike peer reviewed journals, which report COUNTER-compliant usage statistics.

Knowledge Organization Systems, Authority Control and Subject Categories
For authority control of authors, arXiv, bioRxiv, MDPI Preprints, and ESSOAr use the endorsement of authors through ORCID. All of the preprint repositories display author-supplied keywords or tags, and browsing preprints by subject or discipline is prevalent. In addition, many of the preprint repositories display the subject/discipline category, on which the displayed preprint categories are based. For example, at bioRxiv the categories of submitted articles are New Results, Confirmatory Results, or Contradictory Results, in contrast to the conventional paper types (Research, Opinions, Reviews, Technical, Concepts, or Case Studies) used in the social science preprint repositories SSRN and OSF Preprints. PeerJ Preprints, arXiv, bioRxiv, and OSF have advanced features, such as article versioning, adding links, and comments. PeerJ Preprints offers faceted browsing of its collections by manuscript type and filtering of articles by entity (references, questions, answers, and figures), by published date, and by subject. Really Simple Syndication (RSS) is popular among the preprint repositories for syndicated updates on new articles and subject areas, besides social media. RePEc and SSRN use JEL Classification Codes, whereas E-LIS uses the JITA Classification of Library and Information Science to classify the scholarly literature. No standardised metadata schema has been adopted by the preprint repositories, except the Dublin Core metadata schema followed in DSpace at E-LIS; the rest of the preprint repositories use more simplified metadata input formats.
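The gap between a simplified input form and a standard schema can be illustrated with a small crosswalk. The following is a minimal sketch in Python, assuming a hypothetical simplified record: the input field names (title, authors, keywords, and so on) are invented for illustration and do not reflect any particular repository's submission form. Only the target element names are taken from the Dublin Core element set.

```python
# Hypothetical mapping from a simplified submission form to Dublin Core
# elements. The left-hand field names are assumptions made for this sketch.
CROSSWALK = {
    "title": "dc:title",
    "authors": "dc:creator",
    "keywords": "dc:subject",
    "abstract": "dc:description",
    "posted": "dc:date",
    "doi": "dc:identifier",
    "license": "dc:rights",
}


def to_dublin_core(record):
    """Map a simplified record to Dublin Core elements.

    Returns a pair (mapped, unmapped): `mapped` holds a list of string
    values per Dublin Core element, and `unmapped` lists the input fields
    that have no counterpart in the crosswalk.
    """
    mapped = {}
    unmapped = []
    for field, value in record.items():
        element = CROSSWALK.get(field)
        if element is None:
            unmapped.append(field)
            continue
        # Dublin Core elements are repeatable, so every value becomes a list.
        values = value if isinstance(value, list) else [value]
        mapped.setdefault(element, []).extend(str(v) for v in values)
    return mapped, unmapped


if __name__ == "__main__":
    example = {
        "title": "An example preprint",
        "authors": ["Doe, Jane", "Roe, Richard"],
        "keywords": ["open access", "preprints"],
        "internal_note": "not part of the public metadata",
    }
    print(to_dublin_core(example))
```

Reporting the unmapped fields separately makes visible what a simplified form collects that a harvester relying on Dublin Core would never see.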

Metrics and Open Reviews
Since citation data are distributed across various databases according to their journal coverage, they need to be aggregated from multiple platforms, such as Crossref, Scopus, and Web of Science, for use. Google Scholar's citation data is essentially found to be a superset of the Scopus and Web of Science databases [53]. All of the preprint repositories provide citation tools to export references in multiple file formats supported by various reference management software. Among all of the preprint repositories, arXiv uniquely reports subject-wise submission, access, and download details on a daily, monthly, and institution-wise basis. RePEc reports the number of citations, downloads, and abstract views, as well as top-level metrics for institutions, regions, authors, and document types. It also reports statistics by research items, series and journals, authors, and institutions [54]. SSRN reports institutional-level data for downloads, abstract views, and the rank of papers, authors, and organizations, besides integrating PlumX Metrics, an alternative metrics platform of Elsevier. See [55] for an example. PeerJ Preprints reports unique article-level metrics, which are grouped as social referrals from social media and top referrals, which are essentially search engines, bookmarks, URLs, and email alerts. See the example in Figure 2 [56].
The Altmetric platform is integrated with bioRxiv, MDPI Preprints, ChemRxiv, PeerJ Preprints, and ESSOAr, aggregating social media metrics. PeerJ Preprints reports visitors, downloads, and views; OSF Preprints shows download counts; MDPI Preprints shows views and downloads, offers public and private commenting options, and also provides rating options; E-LIS shows monthly and yearly downloads in a graph at article level and repository level, along with other statistics such as the most downloaded items and top authors. ChemRxiv shows views, downloads, and citations; ESSOAr reports download counts. MDPI Preprints allows the viewing of reviewer comments through Publons, a peer-review profile platform, and is the only one to do so among the preprint repositories, while PeerJ Preprints provides open feedback, Q&A, and linking options to engage with readers and reviewers.
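The need to aggregate citation data from several platforms can be sketched as follows. This is a minimal illustration in Python, not a description of how any of the services above work: the source names, DOIs, and counts are invented, and the DOI normalization is deliberately simple. Because the citing works counted by different databases overlap, the sketch does not sum the counts; it reports each source separately and takes the largest single count as a lower bound on the number of distinct citations.

```python
from collections import defaultdict

# Prefixes commonly found in front of a DOI; stripped so that the same
# work is matched across sources.
_DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")


def normalize_doi(doi):
    """Lower-case a DOI and strip a leading resolver or 'doi:' prefix."""
    doi = doi.strip().lower()
    for prefix in _DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def aggregate_citations(source_counts):
    """Merge per-source citation counts keyed by DOI.

    `source_counts` maps a source name to a {doi: count} dictionary.
    The result maps each normalized DOI to its per-source counts and to
    a lower bound on distinct citations (the largest single count).
    """
    merged = defaultdict(dict)
    for source, counts in source_counts.items():
        for doi, count in counts.items():
            key = normalize_doi(doi)
            # Keep the larger value if a source lists the same DOI twice.
            merged[key][source] = max(count, merged[key].get(source, 0))
    return {
        doi: {"by_source": per_source, "lower_bound": max(per_source.values())}
        for doi, per_source in merged.items()
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    counts = {
        "source_a": {"https://doi.org/10.1234/EXAMPLE.1": 12},
        "source_b": {"10.1234/example.1": 9, "doi:10.1234/example.2": 3},
    }
    for doi, summary in aggregate_citations(counts).items():
        print(doi, summary)
```

A true count of distinct citations would require matching the citing works themselves across sources, which is beyond what per-article totals allow.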

Community Standards
Community standards help to develop, integrate, and steer the standards, protocols, and codes of conduct that carry initiatives (systems, software, and programs) to the wider community of committers, developers, and funders, strengthening open access initiatives, open source technologies, and scholarly publishing. This refers to standards that are free and open source software, projects, and communities for interoperability. One important interoperability protocol for metadata harvesting is the Open Archives Initiative Protocol for Metadata Harvesting, version 2.0 (OAI-PMH v2.0). arXiv, RePEc, and E-LIS are compliant with this protocol, supporting the harvesting of records from other digital repositories, and set the trends for community standards in building open archives [57]. For software and operating system standards, arXiv uses the GNU and MIT licenses. RePEc has GNU and the Guildford Protocol. E-LIS adopted the Open Data Commons Open Database License. As much as preprint repositories operate on open community standards, managing them needs advisory boards, funding strategies, and steering committees to take the initiatives forward, which are further discussed in Table 3. None of the preprint repositories explicitly displays a code of conduct.
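To make the harvesting protocol concrete, the sketch below builds an OAI-PMH ListRecords request and parses a response in the oai_dc format using the Python standard library. It is an illustration under stated assumptions: the base URL https://repository.example.org/oai and the sample record are hypothetical, no network request is made, and resumption tokens and error responses, which a real harvester must handle, are left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by OAI-PMH 2.0 and the oai_dc metadata format.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           set_spec=None, from_date=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text):
    """Extract identifier, titles, and creators from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": record.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "titles": [e.text for e in record.iterfind(".//dc:title", NS)],
            "creators": [e.text for e in record.iterfind(".//dc:creator", NS)],
        })
    return records


# Hypothetical response body, shaped like an OAI-PMH ListRecords reply.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1234</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example preprint</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

if __name__ == "__main__":
    print(build_list_records_url("https://repository.example.org/oai"))
    print(parse_records(SAMPLE_RESPONSE))
```

Because every compliant repository answers the same verbs in the same XML structure, one harvester of this kind can, in principle, collect records from any of them, which is what makes the protocol a community standard.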

Preprints for Building Scholarly Infrastructures and Metrics
Preprint repositories are becoming pivotal at the intersection of the scholarly web and open infrastructures, developing their pathways towards a dynamic research ecosystem with the advent of open technologies, such as persistent identifiers, open data harvesting and protocols, integrated data aggregators, and various discovery layers. Since preprints make the content available, building infrastructures around them is central to building, scaling, and measuring them. Interoperability and crosswalking between them are critical for the discoverability and citability of scholarly data. Though some funders, for example the NIH, have guidelines, there are no general standards or established principles for preprint publishing. Such standards are important for researchers, publishers, infrastructures, and service providers to have coherent workflows and to integrate multiple data sources and open infrastructures into unifying platforms to collect evidence of research impact, which will improve the demonstrated reliability [27]. Building novel metrics upon preprint infrastructures helps with the quality assurance of scientific outputs; however, it has its limitations. For example, alternative metrics say little about the quality of a paper and its kinds of impact, but more about its popularity [58]. Hence, the alternative metrics for alternative scholarly infrastructures need to be designed wisely to prevent adverse effects, such as the misuse seen with some conventional metrics, like the Journal Impact Factor.
Embracing the Findable, Accessible, Interoperable, Reusable (FAIR) principles for scientific data management and stewardship focuses on the reuse of scholarly data, specifically enhancing the ability of machines to automate the reuse of data. The potential impact and good practices of using FAIR principles among the UK academic research community have been found to exist and to be continually improving, despite disciplinary differences. However, it is found that there is a lack of understanding of FAIR data and principles; a need for investment in the development of data tools, services, and processes to support open research; and a need to adopt FAIR principles across the broad coordinating activities and policy development at cross-disciplinary, national, and international levels [59,60]. DataCite has been steering persistent identifiers for research data citation, discovery, and accessibility, as well as the measurement of grants and the impact made by funding agencies [61,62]. Hypothesis has been experimenting with open annotation use cases on preprints and has discussed the burden of moderating (editorial and site), identity, and versioning among the preprint repositories [63]. OSF Preprints has been experimenting with open annotations. At the nexus of building open scholarly infrastructures and metrics in the broader scholarly communication system, preprints push for developing and integrating evidence of impact for evaluating research and researchers with these emergent systems, facilitating the measurement of citation data while remaining widely distributed to be discovered on the scholarly web.

Towards Building Sustainable Open Infrastructures with Preprints
Preprints drive demand for new scholarly metrics and infrastructures, having been part of the scholarly outputs, reporting preliminary results. Preprint repositories that are designed with open source software, technologies, and infrastructures become essentially sustainable [64]. Commenting upon the needs of open development as a socio-technological innovation towards open access, Chan [65] noted that "the term is a broad proposition that open models and peer-based production, enabled by pervasive network technologies, non-market based incentive structures and alternative licensing regimes, can result in greater participation, access and collaboration across different sectors . . . A key understanding of 'open development' is that while technologies are not the sole driver of social change, they are deeply embedded in our social, economic and political fabric. We therefore need to understand 'openness' within the context of a complex socio-technical framework". The collective action for scholarly communication necessitates funding for infrastructure services to be interoperable, scalable, open, and community-based for open infrastructures as the potential funders and organizations look for demonstrable community-based services, like preprints supporting open research. Notable are SCOSS and the CoKo Foundation here as promising initiatives in this space.
Hence, developing conceptual frameworks to support investors in infrastructures for open scholarship and in developing community capacity through the OA Sustainability Index becomes important. This is to take on initiatives, like preprints development, in hitherto under-represented disciplines and to extend the frontiers of open knowledge [66]. The sustainability of the research ecosystem, with its research, education, and knowledge production components, is crucial, as the implementation of preprint policies relies on the development of a fully-functioning OA infrastructure [67]. To build resilient open infrastructures as inclusive and sustainable systems, creating, sharing, and disseminating knowledge is important in scholarly publishing for workflow integrations, metadata reuse, and publisher integration with the research lifecycle. In support of open and collaborative science, Chan [65] further argues that "open approaches to knowledge production have the potential to radically increase the visibility, reproducibility, efficiency, transparency, and relevance of scientific research, while expanding the opportunities for a broad range of actors to participate in the knowledge production process . . . openness is not simply about gaining access to knowledge, but about the right to participate in the knowledge production process, driven by issues that are of local relevance, rather than research agendas set elsewhere or from the top down". This is where preprint repositories prove to be a disruptive development towards building public science. Scientific publishers, research enterprises, and funding agencies are at an inflection point where research systems should be built, designed, and disseminated as inherently open, and developing preprint services provides just that opportunity for scientific communities [68].
We need to strengthen and expand the community and institutional role in managing preprints and their development. For that, we should redefine frameworks to overcome barriers and challenges in establishing open infrastructures for scholarly communication networks, so that open research principles are inbuilt in our research ecosystem, production processes, and scientific publishing. The Open Science by Design report released by the US National Academies of Sciences, Engineering, and Medicine is a step towards that [69]. Research is global, and scholarly communities need interoperable hubs, interlinked data, and infrastructures supporting information exchange across repositories with standards, metadata schema, and semantic interoperability, as there is a lack of standards for aggregating data used across platforms [70]. Preprints are disrupting the scholarly communication system, and many leading publishers are slowly participating in the process by supporting, accepting, and archiving in preprint repositories. However, some of the important challenges are inconsistent metadata schema in data harvesting, support for multilingual systems, and a lack of standards in integration and of protocols for aggregating data and implementing them across platforms in version control, deduplication, and digital preservation. In strengthening the open infrastructures and metrics, preprints add to the ever-growing repository types and artefacts, which are indexed and mined by indexers, aggregators, and search engines and built into registries, authority files for authors and organizations, and vocabulary control of subject terminologies. In this, all of the stakeholders (publishers, governments, funders, organizations, authors, and institutions) will shape the growth of preprint repositories as they are accepted, developed, and made available.
According to Johnson and Fosci [67], the key priority areas for immediate action for open infrastructures are listed below; they also resonate for preprint repositories:

• Interoperable, community-led preprints with strong open access initiatives and programmes should adopt sound governance structures with greater representation from funders and policy makers, promoting the wider use of crucial identifiers and standards for preprints with maximum community participation, like open access repositories.
• Ensure the financial sustainability of critical services, particularly the DOAJ, SHERPA and strengthening coalitions and funders, like SCOSS for preprint services, balancing different disciplines, and their representation fairly.
• Take into account the rapid growth of preprints and create an integrated infrastructure for them, based on roadmaps and strategies for mainstreaming them across other modes of scholarly communication.
• Invest strategically in preprint repositories and services in order to create a coherent OA infrastructure that is efficient, integrated, and representative of all stakeholders.

Preprints for Open Science and Public
With their ability to promote open, ethical, and transparent research workflows and processes, preprints promote the building of open infrastructures and symbiotic services as a web of data with reproducibility at its core, mutually supporting and growing along with other research artefacts [71]. As more and more preprint repositories grow, this is going to consolidate the research ecosystem towards a resilient, transparent, and open research environment for the public, promoting scientific temper and awareness as a public good.
Preprint repositories as public good initiatives offer enormous opportunities for researchers to manage the life cycle of research production, data management, access and collaboration control, project analytics, version control, and centralized access in a distributed environment [72]. They allow researchers to disseminate preliminary work or draft papers to a wider global community of researchers to obtain feedback or comments before formally submitting to peer reviewed journals. They also help to speed up the communication of research results and foster collaborations. Currently, many journals accept preprint submissions. Nature and Science have been accepting preprints for a long time, since they publish physics papers. At the American Chemical Society, 20 of the 50 journals accept preprints unconditionally [73]. Fostering scholarly commons, such as preprints, will open up opportunities for scientists and the public to solve some of our pressing problems, from climate change to drug discovery, and this is possible through open science. Without limits or embargoes, preprints pose no greater threat than they would if they remained inaccessible and restricted for the public [74].

Peer Review in Preprints: Revisiting for Present Times
The peer review process exists to enable nominally disinterested experts to assure the quality of academic publications, but preprint servers usually host articles that have not yet been subject to peer review. The question at this juncture, for open science, is whether the scientific communities will accept preprints without peer review, when this process itself has been entangled with a lack of incentives, credits, and recognition for peer reviewers [36,75,76]. Since preprints are not necessarily peer reviewed and are explicit about that, this remains to be discussed. It offers enormous potential for establishing processes like the open review mechanism and new models of peer review.
Open science through preprints promotes transparency and secures the provenance, time, and integrity of scientific data in an open and distributed infrastructure, documenting every step of the research process and its data for the public. As bibliometric measures are not the indicator of achievement, there is a need to evaluate what needs to change in our culture, who is involved, what the most effective ways are, and how change can be measured [77,78]. The challenges of maintaining unbiased review systems without gender bias in authorship and peer review, preserving the diversity of gender, racial, and ethnic communities, and upholding high standards of ethics and transparency call for attention and cultural change in scholarly communication. ASAPbio's initiative is worth mentioning for accelerating scholarly communication in the life sciences through preprints. There is also an equal emphasis on standards, research integrity and ethics, quality, and credibility in navigating the peer review process, with scope for new initiatives, each with potential issues and advantages, to disrupt scholarly communication both as a system and as a process, with incentives in place for fostering open research environments and open access publishing [76,79].
Hence, reforming the scholarly communication system to overcome barriers in the legal framework, information technology infrastructures, business models, indexing services and standards, the academic reward system, marketing, and critical mass, so as to integrate subject-specific, institutional, and data repositories into the main channels of scientific publication, is critical, and preprints development is a key component of it [80]. Long established as a standardized practice with no other viable options for scientific communities, the peer review process is invaluable and unquestionable; for preprints, however, this process calls for openness. Moreover, it should broaden its approaches to accommodate open rewards, incentives, and other non-monetary benefits, as they advance scientific communication [75] to solve social problems, make sense for policy makers, and push forward scholarship for the advancement of humanity.

Conclusions
Preprint repositories are gaining momentum to become active partners of the scholarly research ecosystem and contribute to open scholarship as a new model of scholarly publishing, as discussed in this article. Nevertheless, the dangers of commercialization of preprints do not augur well for open science. This raises questions about the sustainability of preprint repositories and the degree to which commercial business models interfere with open science. Without embargoes, preprints pose no risks to the public understanding of science, and hence imposing limits is against the public interest [74]. Preprints apparently add to the existing complexities in scholarly publishing; however, their plethora of models, scales, and forms gives rise to opportunities to embrace them on the one hand, while on the other hand they may take time to become mainstream in scholarly publishing [81–85]. Nonetheless, what constitutes them and whether they will stand out in the constructs of scholarly communication remain to be seen in the wake of diverse open data, open access publishing models, open infrastructures, and web 3.0 technologies [86]. These factors are central for scholarly communication to enrich and strengthen the scholarly web with search engines, indexing systems, semantic technologies, and social software analytics to maximize research impact and build reputation systems through open infrastructures and metrics for authors and institutions. Going forward, on the landscape of preprints and metrics, overlay systems could perhaps be implemented on repositories using new metrics, as overlay journals emerge. Preprint repositories have emerged as a movement, and they are implemented in different ways and approached in heterogeneous forms; whether they will coexist with conventional journals or fundamentally change the scholarly communication landscape as hubs of early research output carries important caveats for open science [6,37].
However, the trade-offs, such as questions of conflicts of interest, risks, and research ethics under which preprints are published, need to be addressed for the public, in the public domain, and in the public understanding of science [87,88].