Reproducibility: A Researcher-Centered Definition

Abstract: Recent years have introduced major shifts in scientific reporting and publishing. The scientific community, publishers, funding agencies, and the public expect research to adhere to principles of openness, reproducibility, replicability, and repeatability. However, studies have shown that scientists often have neither the right tools nor suitable support at their disposal to meet these modern science challenges. In fact, even the concrete expectations connected to these terms may be unclear and subject to field-specific, organizational, and personal interpretations. Based on a narrative literature review of work that defines characteristics of open science, reproducibility, replicability, and repeatability, as well as a review of recent work on researcher-centered requirements, we find that the bottom-up practices and needs of researchers contrast with the top-down expectations encoded in terms related to reproducibility and open science. We identify and define reproducibility as a central term that concerns the ease of access to scientific resources, as well as their completeness, to the degree required for efficiently and effectively interacting with scientific work. We hope that this characterization helps to create a mutual understanding across science stakeholders, in turn paving the way for suitable and stimulating environments, fit to address the challenges of modern science reporting and publishing.


Introduction
Reproducibility is widely recognized as a cornerstone of modern science that is expected to enable the validation and reuse of published findings. The term reproducibility is closely connected to related calls for open science, repeatability, reusability, and replicability. These terms are not standardized [1], are used interchangeably, carry different meanings across scientific fields, or imply questionable dependencies [2], in turn adding to the complexity of supporting and conducting responsible modern research.
In this paper, we argue that the ambiguity of the definitions and interpretations around responsible modern science principles increases the barriers for both researchers and support staff to conduct and facilitate science that is transparent and reusable. This is all the more problematic given the widely unresolved and far-reaching socio-technical challenges involved in the open sharing of useful scientific resources. Here, key issues include incentivizing and motivating researchers to openly share and document their materials [3,4], training researchers on best practices [5,6], and providing suitable technical infrastructure [7].
In recent years, human-computer interaction (HCI) researchers and practitioners have increasingly studied the requirements and needs of different stakeholders involved in enabling open and reproducible science, advocating for HCI's unique role in supporting the transformation [7]. For example, gamification has been explored as a design tool to create meaningful sharing incentives [8], HCI researchers systematically studied technical and social requirements around sharing [9], and a strong thread of research investigated practices and challenges involved in the design of suitable infrastructure, involving librarians, service developers, and research data managers [10]. However, research has not yet extensively mapped the effects of ambiguous terminologies on the sharing willingness and practices of researchers, or its impact on support staff and service developers.
Based on a narrative literature review on key terminologies and recent HCI research threads, we argue that taking a researcher-centered perspective in defining the terminology around reproducible science can contribute to an increased understanding and acceptance of modern science practices, and can help various stakeholders to tailor both their technical and non-technical infrastructure developments to this shared understanding.
This paper is structured as follows. First, we describe and reflect on the narrative literature review as the principal method underlying this work. Next, we present the results of the literature review, with particular regard to the ambiguity of current terminologies and HCI research in the domain of open and reproducible science. Finally, we present and analyze our researcher-centered definition of reproducibility, as a central term that concerns the ease of access to scientific resources, as well as their completeness, to the degree required for efficiently and effectively interacting with scientific work, and we conclude with a discussion of its practical implications for researchers and the wider science community.

Human-Computer Interaction Perspective
The findings and contributions of this paper stem largely from a human-computer interaction perspective. In this field, the terms 'human-centered' and 'user-centered' are established to describe research and design approaches that place the user at the center of any activity (i.e., mapping requirements, designing frameworks, and evaluating those frameworks). In this paper, we use 'researcher-centered' to describe activities and definitions that reflect a focus on researchers' needs and socio-technical frameworks. Examples of work that focus on researcher-centered contributions include Badiola et al. [11] and Kay et al. [12].

Method
The principal goal of this paper is to define and advocate for researcher-centered definitions around reproducible science. In order to derive such a definition and to introduce the researcher-centered perspective, we report on a narrative literature review that we conducted within two distinct scopes: (1) current terminologies around open science, reproducibility, replicability, and related terms; and (2) HCI research exploring and supporting open and reproducible science practices.
The literature reported as part of this narrative review is largely the result of three years of full-time PhD research conducted and supervised by the authors of this paper. The resulting PhD thesis, Interactive Tools for Reproducible Science: Understanding, Supporting, and Motivating Reproducible Science Practices [4], reported in 2020 on four major in-the-field studies that placed researcher motivation, day-to-day practices, and infrastructure challenges at their core. To this end, the researchers conducted qualitative and mixed-method studies with 45 researchers and data managers in data-intensive particle physics and across a wide range of diverse scientific fields. In addition to the extensive literature reviewed for the mentioned thesis, the authors reflect on literature from both review scopes.

Reflexivity
We recognize the importance of reflecting on our choice of a narrative literature review. We note that the goal of this work is not to present original research systematically mapping the literature of a specific domain. Rather, we want to reflect on the use of terminology related to open and reproducible science practices, as well as researcher-centered work within the wider HCI domain, to present a critical discourse around the mismatch between ambiguous modern science terminology and researchers' practices and needs. A narrative literature review allows for such a critical discourse [13].
Further, we recognize the need to reflect on the increased risk of bias, a key disadvantage of narrative literature reviews as compared to systematic reviews [14]. Here, we note that, unlike common narrative reviews, the literature selection presented in this paper is grounded in a long-running, structured research project. We further argue that the impact of any potential form of bias on this work plays a secondary role, as our main contribution is to advocate for the development of researcher-centered definitions around reproducible science. Any concrete definition, such as the one we provide, will require extensive testing across the diverse scientific landscape.

Results
We present the results of our narrative literature review across three themes: ambiguity of terminology; socio-technical barriers; and researcher-centered strategies.

Ambiguity of Terminology
Dror G. Feitelson is among a growing number of scholars who have stressed that terms such as replicability, reproducibility, and repeatability are not always used consistently, arguing for the development of more precise and universal terminology [15]. Feitelson noted that the use of these terms largely depends on domain preferences. While general science communication has largely referred to reproducibility as a central term to describe modern science practices geared towards validation and reuse, the social sciences, and psychology in particular, prefer the term replication, according to Feitelson's analysis. The author contributes suggested terminology around five terms: repetition (rerun original artefacts), replication (precisely replicate, recreating artefacts), variation (repeat or replicate with measured modification of a parameter), reproduction (recreate the spirit with your own artefacts), and corroboration (obtaining the same results with another procedure).
Stefan Schmidt also noted a systematic missed opportunity to provide clear definitions around the term replication beyond its current conceptual use in science [16]. He stressed that "a detailed examination of the notion of replication reveals that there are many different meanings to this concept and the relevant procedures, but hardly any systematic literature." His well-received analysis concludes "that the notion of replication has several meanings and is a very ambiguous term".
Discussions and proposals surrounding the nomenclature of terms such as replication and reproduction concern not only scholars such as Feitelson and Schmidt, but a set of diverse science stakeholders. For example, the Association for Computing Machinery (ACM) also recognized, related to the concept of reproducibility, that "the terminology in use has not been uniform" [17]. Their proposed definitions of repeatability, reproducibility, and replicability revolve around the acting team and the nature of the experimental setup: repeatability (same team, same experimental setup), reproducibility (different team, same experimental setup), and replicability (different team, different experimental setup). Notably, these definitions have themselves evolved significantly over time. The characterizations of reproducibility and replicability were swapped entirely in an earlier version of the document, and the ACM noted that the changes were made as a result of discussions with the National Information Standards Organization (NISO). This is a strong example hinting at the major challenges of introducing clear-cut nomenclature, even if designed for a well-defined and limited audience.
Chen et al. [2] provide an impressive account of how to further develop and adapt definitions to fit specific scientific domains. Based on the ACM's definitions and terminology used by Goble [18] and Barba [19], the authors characterized terminology around science reproducibility tailored to the particle physics domain. Their characterizations reflect the unique data volume and data science challenges of this particular field. For example, the authors characterize replication as the same dataset and same implementation, but independent analysts, while reproducibility is introduced as variations in implementation and independent analysts. Notably, their definitions make explicit use of the term reuse, thereby shifting its meaning from an abstract goal that is considered an underlying motivation and effect of modern science practices, such as replication and reproduction, to a dedicated and clearly defined concept that is distinct from the aforementioned concepts. This shift adds complexity, as reuse becomes a dedicated strategy that needs to find its place in an already unstructured and ambiguous science terminology space. However, we argue that reuse is not the only term that adds to this complexity.
Another major term that is closely linked in practice to goals of repeatability, replicability, reproducibility, and reuse is open science. Open science, characterized as "transparent and accessible knowledge that is shared and developed through collaborative networks" [20], is often perceived as a key movement and enabler of science reproducibility. The rationale is clear: researchers can only repeat, replicate, reproduce, or reuse artefacts and/or methods if they are available and sufficiently documented. Nevertheless, perceiving sharing and openness as the only required actions to enable reproducibility and the like fails to acknowledge the complexity of modern science practices.
In their well-received article "Open is not enough", Chen et al. [2] provide insight into practice in particle physics. The authors argue that "openness alone does not guarantee reproducibility or reusability, so it should not be pursued as a goal in itself." Instead, the authors stress that shared resources must be accompanied by a rich set of resources, meta-data, and explanations that make replication, reproduction, and reuse realistically feasible.
In summary, the review in this section shows that reproducibility and related terms are missing clear definitions and are currently used interchangeably across fields of science. Further, we note that open sharing, a key requirement of the FAIR data principles [21,22], cannot act as a common denominator to address and resolve these issues. While open science is a term and movement supported by and understood across fields of science, it describes only a subset of activities necessary to make science useful and transparent beyond the original research process and publication.

Socio-Technical Barriers
Research data management (RDM) is another key term that refers to "the organisation of data, from its entry to the research cycle through to the dissemination and archiving of valuable results" [23]. As such, RDM encompasses actions related to the sharing of resources. However, sharing is hindered by a wide range of socio-technical barriers. This understanding is also echoed in the work of Oleksik et al. [24], who stressed that in order to design effective tools for collaborative data generation and reuse, "a deeper understanding of the social and technological circumstances" is needed.
Common barriers relate to the mismatch between the extensive effort needed to conduct good RDM [25] and the perceived low personal gain for following these practices [26,27], the fear of judgement for erroneous or less-than-perfect quality shared data [9], and the fierce competition in academia [28]. In response, a wide range of extrinsic and intrinsic solutions have been proposed, including journals and conferences demanding resource sharing [29,30], monetary rewards [31], and gamification-based peer recognition [8].
Howison and Herbsleb [32] reported on incentives and barriers of open source science software development, related to academic reputation. They found that small contributions to software repositories are often not at all reflected in publications. The authors stressed that challenges of code sharing and academic recognition have not yet been sufficiently addressed to promote open source developments in science. Clearly, this is a growing issue as science becomes increasingly data-intensive and reliant on software processing.
Vertesi and Dourish [33] studied the value of data and willingness to share data through the lens of their context of production. Based on ethnographic studies with two robotic space exploration teams, they found that the production context informs researchers' views on data as either collective or individual resources, in turn impacting their willingness to make data available to third parties.
This social aspect to data sharing is also reflected in the large-scale survey that Monya Baker [3] conducted with 1500 scientists across a range of diverse scientific fields, including chemistry, physics, environmental science, biology, and medicine. Baker found that 90% of the survey respondents perceived either a significant or slight reproducibility crisis. Around 80% of the participants stated that "incentives for better practice" and "incentives for formal reproduction" were likely or very likely to boost reproducibility.
Besides the challenges around motivation and incentives for sharing, another set of common barriers includes technical concerns. Akers and Doty [10] reported on a survey to which 330 university faculty members responded. Their findings showed that survey respondents in 2013 commonly stored data on external hard drives and university servers. This also impacts the way in which resources are shared. The faculty members indicated that sharing by email and through popular cloud storage services were among the most common sharing strategies. These findings still resonate within more recent cross-domain studies (e.g., [9,34]) and were echoed in 2019 even in highly data-intensive particle physics [4,35].
Tang and Hu [34] reported on a survey study that focused on the role of librarians and RDM support provided by academic libraries. Their work showed that most challenges were related to technical limitations, including storage capacity and data bandwidth. To address technical challenges, science repositories are increasingly developed and promoted, both general repositories (e.g., Dryad and Zenodo) and field-specific ones (e.g., CERN Analysis Preservation (CAP) [2] in particle physics and ICARDA MEL (https://mel.cgiar.org/, accessed 19 February 2022) for agricultural science). However, the challenge remains to educate researchers about how to use these services to conduct effective RDM. This training perspective is another socio-technical challenge that places responsibility on academic libraries and support services [5,6].

Researcher-Centered Strategies
Discussions around the irreproducibility of research indicate that the mere availability of science repositories today does not solve the broader socio-technical challenges discussed in the previous section. This understanding is increasingly reflected in the work of human-centered, or better, researcher-centered studies that aim to inform the design of socio-technical frameworks through an extensive understanding of researchers' practices and needs. Feger et al. [35] demonstrate this in the case of particle physics at CERN. The authors studied the community's adoption of the CAP service, tailored specifically to the storing and sharing needs of researchers in this data-intensive domain. Despite this tailored design, which aimed at lowering storage and documentation efforts through autocompletion mechanisms, the researchers participating in this interview and walkthrough study expressed hesitation towards adoption of the service, stressing common issues around personal motivation, heavy workload, and competition. However, the study also showed that a central repository such as CAP can support specific use cases that in turn benefit contributing scientists. The authors mapped a range of features related to collaboration stimulation, effective communication, and automated error analysis as tangible rewards for contributors to the repository. The authors referred to such features as platforms' "secondary usage forms" that hold the potential to turn researcher-centered insight on practices and needs into tangible incentives for researchers to follow reproducible practices.
In their later work, Feger et al. [8] further explored peer recognition as a motivating design element. The authors explored gamification, the use of game design elements in non-game contexts [36], to promote and reward reproducible research practices. Recognizing that an overly simplistic approach to gamification would most likely alienate researchers, the authors systematically mapped design requirements across a wide range of game design elements. This work resulted in the design and implementation of tailored science badges [37] that focused on peer assessment and recognition and that were overall considered appropriate and rewarding by researchers participating in their evaluation. The potential of rewarding design components is also echoed in the work of Kidwell et al. [38]. The authors reported on the effects of the journal Psychological Science adopting open science badges. Here, authors received public digital badges reflecting the sharing state of data and/or materials. The analysis of Kidwell et al. showed that the badges significantly increased sharing rates, both in relation to earlier sharing behaviors in Psychological Science and in comparison to other journals in the same discipline.
Differentiating between generic science badges and tailored ones reveals similarities to differentiating between generic and field-specific repositories. While generic game design elements, including badges, can easily be adopted across scientific fields and services, the design of tailored elements is grounded in an in-depth understanding of researchers' needs and practices in a given community. The ACM started introducing science badges [17] that blur the lines between the two. These badges are generic in nature but tailored to the publication outlets and the research supported within the ACM's scope.
Finally, we wish to highlight a researcher-centered framework that places human motivation for RDM practices at its center. The stage-based model of personal RDM commitment evolution [9] is grounded in an interview study with scientists and data managers from a wide range of diverse scientific domains. The model contains four stages (non-reproducible practices; overcoming barriers; sustained commitment; and rewards) and corresponding transitions along a spectrum of RDM commitment evolution and RDM commitment decrease. The stage overcoming barriers relates closely to issues around education and technical infrastructure, while rewards aims at providing tangible benefits to researchers who follow RDM practices. Thereby, the model encapsulates and recognizes issues around the wider socio-technical barriers, while emphasizing the value of researcher-centered studies to provide tangible solutions.

Discussion
Our narrative literature review showed the ambiguity of terminologies around responsible modern science practices, including open science, reproducibility, replicability, repeatability, and reusability. Not only do these terms carry different meanings across fields of science, but they are even used interchangeably within disciplines and publications. While we do not intend to claim that such ambiguity represents a major barrier to open and reproducible science, we stress that the lack of a universal understanding represents a missed opportunity to advocate for the value of modern science principles across the wider research community. This belief is anchored in our extensive understanding of the socio-technical barriers that hinder the broader adoption of effective practices tailored to the challenges of modern data-intensive science. In response, we propose a researcher-centered definition of reproducibility as a central term that concerns the ease of access to scientific resources, as well as their completeness, to the degree required for efficiently and effectively interacting with scientific work.
In this section, we first discuss the three key elements of this researcher-centered definition: (1) ease of access; (2) completeness; and (3) efficient and effective interaction. We then reflect on the applicability of the definition.

Ease of Access
Access to scientific resources is a key issue independent of the relationship between scientists and the team that originally performed the experiment. Research shows that external teams need to rely on existing open data or the goodwill of the team that originally performed an analysis. Both strategies are characterized by a high degree of uncertainty [35]. The same applies in part to research data (re-)use within the same team/same organization. Here, access uncertainty is affected by unstructured and volatile storage. Vines et al. [39], for example, found that research data availability decreased rapidly after publication. When they contacted original authors and requested data for a reproducibility study, they found that the data availability of 516 studies, with article ages ranging from 2 to 22 years, decreased by 17% per year. However, even if resources are still available and researchers are willing to share them, personal communication through emails is still one of the most common access strategies [10,35], representing an inefficient and non-scalable exchange pattern.
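To give a rough sense of scale for the 17% figure, the following minimal sketch compounds a constant yearly decline over the article ages mentioned above. It assumes, purely for illustration, that the 17% acts as a constant relative decrease applied once per year to a hypothetical starting value; the function name `remaining_fraction` and this reading of the figure are assumptions made here and are not taken from Vines et al. [39].

```python
def remaining_fraction(years: int, yearly_decline: float = 0.17) -> float:
    """Fraction of a starting value left after compounding a constant yearly decline.

    Illustrative assumption: the decline is relative and identical every year.
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    if not 0.0 <= yearly_decline <= 1.0:
        raise ValueError("yearly_decline must lie between 0 and 1")
    return (1.0 - yearly_decline) ** years


# Article ages at the two ends of the reported range, plus one in between.
for age in (2, 10, 22):
    print(f"{age:>2} years: {remaining_fraction(age):.3f} of the starting value")
```

Under this simplifying assumption, the remaining fraction shrinks geometrically, which illustrates why access strategies that depend on contacting original authors become increasingly uncertain as publications age.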

Completeness
Completeness refers not only to a dataset or piece of software being intact and not missing any components. It also relates to the availability of any accompanying meta-data and thorough documentation that make (re-)use possible. For example, missing information about the concrete runtime environment might make a software snippet as useless as if parts of the source code were missing.
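As one possible illustration of such accompanying meta-data, the sketch below records basic runtime information that could be stored next to a shared analysis script. It relies only on the Python standard library; the function name `describe_runtime`, the record layout, and the example package names are hypothetical choices made for this illustration and do not correspond to any particular repository or service.

```python
import json
import platform
import sys
from importlib import metadata


def describe_runtime(packages):
    """Collect interpreter, operating system, and package versions as a plain dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in the current environment.
            versions[name] = None
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "operating_system": platform.platform(),
        "executable": sys.executable,
        "packages": versions,
    }


# Example: describe the environment for two commonly used analysis packages.
record = describe_runtime(["numpy", "matplotlib"])
print(json.dumps(record, indent=2))
```

A record like this covers only part of what completeness requires. Documentation of the data, parameters, and intended use would still need to accompany it.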

Efficient and Effective Interaction
Finally, both ease of access and completeness of data must be considered in the context of the intended interaction. One common issue that we mapped in data-intensive particle physics relates to the tweaking of individual parameters to test small-scale hypotheses [35]. Moreover, the notion of "all about getting the plots", as stated in the work of Howison and Herbsleb [28], is reflected in our own findings. Here, researchers emphasized the need to frequently go back to previous analyses and to rerun them simply for the purpose of changing the data visualization for a presentation. It is clear that such examples are part of a class of interaction scenarios that do not tolerate extensive effort to access or complete scientific resources. The resources must be readily available and usable; otherwise, (re-)use attempts are likely to fail right from the start, harming both the original creators, through lack of credit, and the researchers who want to (re-)use the resources.
In summary, we refer to efficiency as a property that is characterized by the resources that researchers need to invest in order to interact with scientific materials. Key metrics include time and money spent in the process. In contrast, effectiveness is characterized by the researchers' ability to successfully complete their task. Good accessibility of the complete data and meta-data that are suitable for the intended task is likely to foster an efficient and effective interaction with shared research materials. Limited accessibility or incomplete materials lower interaction efficiency or successful task completion.
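The distinction between the two properties can be made concrete with a small sketch. It assumes that individual (re-)use attempts are logged with the time and money invested and with whether the task was completed; the class `ReuseAttempt`, the function names, and the example values are hypothetical and serve only to illustrate the definitions above.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ReuseAttempt:
    """One hypothetical attempt to (re-)use shared research materials."""
    minutes_spent: float
    money_spent: float
    task_completed: bool


def effectiveness(attempts):
    """Share of attempts in which the researcher completed the intended task."""
    if not attempts:
        raise ValueError("at least one attempt is required")
    return sum(a.task_completed for a in attempts) / len(attempts)


def efficiency_summary(attempts):
    """Mean time and money invested per attempt; lower values mean higher efficiency."""
    if not attempts:
        raise ValueError("at least one attempt is required")
    return {
        "mean_minutes": mean(a.minutes_spent for a in attempts),
        "mean_money": mean(a.money_spent for a in attempts),
    }


# Made-up example values, not empirical data.
attempts = [
    ReuseAttempt(minutes_spent=30, money_spent=0, task_completed=True),
    ReuseAttempt(minutes_spent=90, money_spent=20, task_completed=False),
]
print(effectiveness(attempts))
print(efficiency_summary(attempts))
```

Keeping the two measures separate reflects that an interaction can succeed while still demanding more time or money than the intended task justifies.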

Applicability
We conclude this section with a brief discussion of the applicability of the proposed researcher-centered definition. Here, we note three key limitations that may impact the adoption of the definition.
First, deciding on the ease of access might not be entirely within the control of the researchers or their institutes. Sharing can be affected by external factors such as license restrictions and embargo periods. Depending on these limitations, research materials might need to be shared within a restricted circle or might not be shared at all. Resource accessibility might change over time as embargo periods or license claims run out. Second, these external factors impact resource completeness for certain members of a community if that specific community is excluded from accessing research materials.
Third, we note that privacy restrictions represent additional limitations to data sharing, accessibility, and resource completeness. For example, interview recordings represent key research resources in social sciences and many other fields. However, interview data can be sensitive and it is often challenging or impossible to fully anonymize these materials. Legal frameworks such as the European Union's General Data Protection Regulation and missing consent from study participants might further block data sharing.
These examples hint at the diversity of research materials and resulting challenges for data sharing and (re-)use. If we take computer science as an example, we can imagine the spectrum of different research approaches and materials, as well as their individual requirements and needs for effective sharing and (re-)use. In this domain alone, researchers cover theory with mathematical models and develop applications for a wide range of technologies, from mobile phones to high-performance servers, to name just a few. Creative coding is another example of computer programming that closely focuses on creating art forms or expressive interactions. Here, the lines between more traditional executable research materials and design resources become blurry, in turn increasing the challenges for effective reproducibility. In response, our researcher-centered definition of reproducibility aims to be inclusive, without prescribing specific actions. Rather, we expect the definition to contribute to a productive conversation across science stakeholders, with the goal of creating suitable socio-technical frameworks for science reproducibility and (re-)use.

Conclusions
We reported on our narrative literature review that focused on common open and reproducible science terminologies as well as HCI research on open and reproducible science practices. We found that scientists' key practices and needs for the (re-)use of research materials are not reflected in the wider taxonomy of terminologies that are related to the notion of reproducibility and reuse. In this context, we argue that the ambiguity of terms such as reproducibility, replicability, repeatability, and reuse is not the key barrier to researchers' widespread adoption of these science practices. Rather, the missing integration of qualities that matter to researchers is concerning. In response, we proposed a researcher-centered definition of reproducibility that reflects a bottom-up perspective on data (re-)use practices: reproducibility concerns the ease of access to scientific resources, as well as their completeness, to the degree required for efficiently and effectively interacting with scientific work.
We note that this researcher-centered definition of reproducibility does not refer to any primary or secondary terms related to reproducibility. In particular, it does not make references to open science, sharing, reuse, repetition, replication, or RDM. Rather, it encapsulates our understanding of researchers' needs and practices related to calls for reproducible research. This is conceptually in line with van de Sandt et al. [40], who, in 2019, emphasized that clear definitions of the terms use and reuse were still missing and who concluded that "there is no reason to distinguish use and reuse" from a researcher point of view. In response, they defined "(re)use as the use of any research resource regardless of when it is used, the purpose, the characteristics of the data and its user." We note that while our own researcher-centered definition of reproducibility takes a similar approach to simplifying terminology for the sake of acceptance and compliance across the research community, our definition does place the purpose of the interaction with scientific resources at its center. We argue that this is key to the general acceptance of the definition across fields of science, as well as to creating tangible outcomes that stem from the integration of the definition's components into the design of socio-technical science frameworks. We envision two key opportunities resulting from the adoption of our researcher-centered definition: (1) we expect that the definition will provide researchers with improved tools to plan their resources' life cycle by analyzing RDM activities according to the three pillars of ease of access, completeness, and supported purpose; (2) we envision that a common and basic shared understanding of what denotes reproducibility and its key components will act as a common protocol in the exchange between researchers and support staff, including service developers, science regulators, data managers, and librarians.
As a concrete application, research directors, data managers, and librarians might reflect this definition in their policies by distinguishing between classes of research contributions. For example, instead of demanding only the sharing of research materials to conclude a project, policies might mandate the sharing of executable pipelines or containers for complex computational analyses that have reached a final state. Along these lines, future repositories and RDM services might even provide an estimation of how much time and effort needs to be invested for (re-)using shared research materials.
Finally, we note that introducing a researcher-centered definition of reproducibility leaves room for speculation on the definition's interplay, if adopted in practice across fields of science, with related terms. We expect that funding agencies, publishers, and regulators will continue to distinguish between different terms (e.g., repeatability, reuse, reproducibility, and replicability) established in specific fields of science in order to outline concrete expectations. However, we also expect that a common researcher-centered understanding of reproducibility will help to translate and negotiate expectations between the various stakeholders in science.