1. Summary
Determining the quality of scientific data is a task of key importance for any research project and involves considerations at conceptual, practical and methodological levels. The task has arguably become even more pressing in recent years, as a result of the ways in which the volume, variety, value, volatility, veracity and validity of scientific data have changed with the rise of data-intensive methods in the sciences [
1]. At the start of the last decade, many commentators argued that these changes would bring dramatic shifts to the scientific method and would per se make science better, thanks to fully automated reasoning, more data-driven methods, less theorizing and more objectivity [
2]. However, analyses of the use of data-intensive methods in the sciences have shown that the feasibility and benefits of these methods are not automatic results of these changes, but crucially rest upon the transparency, validity and quality of data practices [
4]. As a consequence, there are currently various attempts at implementing guidelines to maintain and promote the quality of datasets, developing ways and tools to measure it and conceptualizing the notion of quality [4,5,6].
In this commentary, I want to focus on the last of these lines of research and discuss the following question: what are high-quality data? I propose a framework for data quality that suggests a contextual approach, whereby quality should be seen as a result of the context where a dataset is used and not only of the intrinsic features of the data. I develop this approach by integrating philosophical discussions on the quality of data, information and evidence. In Section 2, I start by reviewing analyses of quality in different areas of philosophical research, particularly in the philosophy of information, the philosophy of science and the philosophy of biomedicine. I identify and integrate shared results from this review and argue that these point towards a contextual approach, which I present in Section 3. I then discuss what the approach entails and how it can be used in practice, looking at current debates on quality in the scientific and philosophical literature (Section 4). I conclude by summarizing the discussion of the commentary in Section 5.
2. Quality as a Property of Information, Evidence and Data
Quality has been discussed in areas of philosophical work highly engaged with research practices and debates in the sciences [
7]. In this context, I identify three main areas of research whose results are particularly significant for conceptualizations of quality and yet have only partially been applied to issues in data quality. I want to bring forth these results and their integration as important contributions for more general and interdisciplinary discussions on data quality. I identify and discuss research on quality as a property of three closely related notions: information, data and evidence.
First, research on quality has traditionally focused on information quality, which became prominent in computer science in the 1990s. In this context, an influential line of research started to move beyond traditional interpretations of quality in terms of accuracy only, developing a multi-dimensional and purpose-dependent view whereby a piece of information is of high quality insofar as it is fit for a certain purpose [
8]. Since the 1990s, this line of research has developed into two main approaches: an “empirical” approach, which surveys the opinions and definitions of academics and practices; and a theoretical, “ontological” approach, which studies the different dimensions of quality and the interrelations between them [
9]. The empirical approach has expanded conceptualizations of information quality to include not only traditional dimensions such as accuracy, but also objectivity, completeness, relevance, security, access and timeliness; here, the goal has primarily been to categorize these dimensions, rather than to define them [
10]. On the other hand, the goal of the ontological approach has been to understand how to connect different dimensions of information quality (such as those surveyed through the empirical approach [
11]) and conceptualize and measure potential disconnections as errors [
12].
These discussions have been picked up and analyzed in the area of research known as philosophy of information. According to Phyllis Illari and Luciano Floridi, computer science has not fully embraced the purpose-dependent approach to information quality in all of its implications and theoretical understandings of information quality are still in search of a way of applying the approach to concrete contexts [
6] (p. 8). With these problems and goals in mind, Illari has suggested that information quality suffers from a rock-and-a-hard-place problem [
13]. While high-quality information is defined as information that is fit for purpose, many still think that some aspects and dimensions of information quality should be independent of specific purposes (the rock). At the same time, there is a sense in which quality should make information fit for multiple, if not all, purposes: a piece of information that is fit for a specific purpose, but not for others, will not be considered of high quality (the hard place). As a way of going beyond the impasse, Illari has argued that we should classify information quality on the basis of a relational model, which links the different dimensions of quality to specific purposes and uses [
13]. Therefore, Illari conceives of quality as a property of information that is highly dependent on its context, i.e., the specific uses, aims and purposes we want to employ a piece of information for. In other words, quality cannot be independent of fit for a specific purpose and cannot consist in a fit for any single purpose.
I identify a similar push for the purpose-dependent and contextual approach in a second area of philosophical analyses, which have more specifically focused on the use of data in the context of scientific practice. The increasing volume and variety of data used in the sciences, with related and different levels of veracity, validity, volatility and value, have created a number of potential benefits as well as challenges for scientific epistemology [
14]. Determining and assessing quality is one of the main challenges of data-intensive science because of the diversity of sources of data and integration practices, the often short “timespan” and relevance of data, the difficulties of providing quality assessments and evaluations in a timely manner and the overall lack of unified standards [
4]. Partly as a result of these shifts, recently philosophers of science have expanded their focus on data as an important component of scientific epistemology [
15]. In this context, some analyses have focused on the tools that are used to calibrate, standardize and assess the quality of data in the sciences. For instance, data quality assessment tools are often applied to clinical studies, in the form of scales or checklists about specific aspects of the study, with the goal of checking whether the study, e.g., makes use of specific statistical methods, sufficiently describes subject withdrawal, etc. According to Jacob Stegenga, there are two main issues affecting the use of these tools in the biomedical context: a poor level of inter-rating operability, i.e., different users of the tools achieve different instead of similar results; and a low level of inter-tool operability, i.e., different types of tools give different instead of similar results when assessing the same study [
16]. Stegenga has argued that this can be conceptualized as a result of the underdetermination of the evidential significance of data: there is no uniquely correct way of estimating information quality and different results will always be obtained in relation to the context, users and type of study. I interpret these results in similar terms to the aforementioned analysis by Illari [
13], as pointing to the crucial role that the context where data are analyzed and used plays in the determination of their quality. Quality is not an intrinsic property of data that only depends on the characteristics of the data themselves: quality will differ depending on contextual features, such as the tools used to assess quality, who uses them, their purposes, etc.
Further support for this point comes from Sabina Leonelli’s studies of data practices—especially assessment methods—in the life sciences [
17]. Leonelli has argued that existing approaches to data quality assessment mostly fail to deliver on their objectives or to be used in standard practice, to the point that, currently, newly developed technologies and techniques of data collection are used as unofficial markers of data quality. This is problematic for the following reasons. Using technologies as markers of quality creates problematic relations with industry, whose economic interests in pushing specific and new technologies do not necessarily align with the epistemic aims of research communities. In particular, when quality standards are locked in and tied to specific technologies, researchers without access to those technologies cannot meet those standards. In this way, using technologies as proxies reduces diversity by systematically disadvantaging researchers who have little access to the latest technologies, often excluding their contributions. To overcome these issues, Leonelli has argued for a different approach to quality: the quality of data is determined by the alignment and relations between data and other components of scientific research, including technologies but also research questions, methods and infrastructures. I interpret this as a further point in favor of the purpose-driven and local approach to quality, which takes into account the contextual features of data use as much as the intrinsic characteristics of the data themselves.
These discussions align with other, closely related areas of philosophical research, which focus on the history and epistemology of experimentation [18,19] and the role of measurement practices, concepts and quantity terms [20]. In this context, measurement has been discussed as an inferential process that starts from instrument indications and results in outcomes, in the form of claims about the status of the object that is measured. In this sense, Bas van Fraassen has interpreted measurement outcomes as regions of the space of possible values identified by measurement practices, whose dependence on theory is involved at the stage of the interpretation of the outcomes as much as in their capacity to represent the objects of interest [
21]. More recently, Luca Mari has argued that measurement should be discussed as a form of information gathering; on this basis, measurement and standardization practices should be seen as producers of knowledge and their quality can be measured as the quality of the types of knowledge they produce [
22]. In this direction, standardization is a type of modeling, whereby the calibration of measurement as a system of practices and conceptualizations is obtained by the specific modeling and representation of the elements involved in a specific context of those measurements [23,24].
The third line of philosophical research that I want to discuss here has focused on quality as a property of scientific evidence, especially in the biomedical context. This research has partly been a reaction to the rise of evidence-based medicine (EBM), an approach to medical research and practice that is based on a specific categorization and ranking of evidence. Since the 1980s, as a movement to reform medical practice and research, EBM aimed at improving decision-making by removing the influence of subjective preferences from different stages of the process. As formulated by Sackett and colleagues, the central idea of EBM has been “the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients” [
25] (p. 71). Practically, EBM proponents have introduced “evidence hierarchies”, which describe the assumed quality of different types of evidence and are supposed to help decision-makers impose some order on the available evidence [26]. This order aligns better support for the efficacy of different interventions with better evidence types, which in the EBM context consist of evidence from randomized controlled trials or systematic reviews and meta-analyses of randomized controlled trials.
Philosophers of medicine have analyzed and criticized various tenets of EBM, including the theoretical and methodological basis of the choice of specific types of evidence as high-quality evidence [
27], the exclusion and denigration of some types of evidence [
28] and the ways in which hierarchies of evidence are delineated in evidence-based approaches [
29]. While these analyses have not explicitly taken issue with notions of quality per se, their results are significant for our discussion on how to approach data quality. The ways in which evidence is classified and its quality is assessed in EBM seem to apply an intrinsic and universalistic approach to evidence, whereby, e.g., evidence collected through randomized controlled trials (RCTs) is the “gold standard”. This means that RCTs are normally given the highest level of quality, although this may be lowered in case of methodological problems; by contrast, evidence from other methods such as observational studies could be ranked as high quality, but is automatically given a lower rank as the starting point [
30]. In other words, certain methods are considered to be prima facie and epistemically superior, with a gold-like, higher value compared to alternatives [
31]. The problem with this approach to the classification of evidence quality is that it is applied to most areas of biomedical research, with no consideration for specifics and different research contexts. In many areas of biomedical research, the gold standard evidence hailed by EBM often cannot be produced, but this does not necessarily mean that the evidence produced is of low quality. For example, Saana Jukola has shown that in nutrition research, RCTs cannot be conducted because of practical, ethical and methodological aspects of this line of research [
32]. In contrast to the EBM approach, this suggests that the quality of biomedical evidence should be assessed against specific, rather than universal, hierarchies, depending on the aims and the context in which the evidence is to be used.
3. Developing a Contextual Approach to Data Quality
Where does the previous review leave us? I argue that discussions of quality in relatively distinct areas of the philosophical literature can be integrated into an overarching approach to quality in the context of scientific data, which has contextuality as its core principle. According to this approach, quality is a contextual feature of data: it is a result of the relations established between a dataset and the questions, aims and tools employed in the context of the use of data; the assessment of the quality of a dataset needs to focus on the features of this context as much as the dataset itself.
As such, the contextual approach is a development of contemporary accounts of data and information. Leonelli has defended a view of scientific data according to which data is a relational entity, whose evidential value is not a given and intrinsic component, but is rather a result of the relations established between the questions, claims and purposes involved in scientific practices and the objects that are used as data [
15]. Similarly, Floridi has discussed information in relational terms, according to which something counts as information only for a certain type of agent and use—to the point that misinformation does not count as information for an agent interested in the production of knowledge [
33].
More specifically, the main difference-maker that determines the relevant features and dimensions of data quality in a specific context is the use of data, including the goals, assumptions, chains of inference and evidential reasoning involved. Depending on the features of the context where a dataset is used, some dimensions will become more important than others. For example, in cases of urgency and the need to use data for immediate policy measures, the timeliness and accuracy of data might be preferred over its completeness. In turn, the context of use is shaped by the evidential reasoning that determines the use of a dataset as a representation of certain phenomena and is thus based on various chains of inference and the mobilization of other evidence and knowledge, as well as specific assumptions, warrants and goals [
34]. This means that, for instance, when mobilizing and integrating different datasets, users might determine and assess the quality of a dataset for its relevance and compatibility with available evidence, as opposed to its objectivity. Therefore, in a specific context, the quality of a dataset is determined by the use of the dataset and in particular the alignment of its dimensions with the contextual properties of use. In this sense, conceptualizing the quality of a dataset in contextual terms is a move beyond seeing quality as a “static” and universal component of datasets, which can be determined independently and on the basis of their intrinsic characteristics only. At the same time, emphasizing the role of context entails that quality will have to be assessed differently in different contexts. Yet, this does not mean that the categorization of quality into different dimensions and components is subjective, or that the development of approaches, tools and standards for the evaluation of data quality is unnecessary. These need to be encouraged as attempts at developing more local and situated approaches to quality, which include critically evaluating the use of data and explicitly reflecting on its relation with specific dimensions of quality. In this sense, the contextual approach indicates quality criteria and assessment approaches according to which:
the specific elements and dimensions of quality will be different depending on the features and goals of data use;
some dimensions will be more important than others in different contexts; and
since each dimension of quality needs a specific type of measurement, different measurement tools and techniques will be used in different contexts.
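The three points above can be illustrated with a minimal sketch in Python. It is not a method proposed in this commentary: the dimension names, scores and weights are hypothetical, and collapsing quality into a single weighted average is a deliberate simplification, since each dimension would in practice need its own measurement tool. The sketch only shows how the same dataset can receive different assessments once the context of use changes which dimensions count and how much.

```python
from typing import Dict

# Hypothetical scores in [0, 1] for one dataset, one per quality dimension.
# In practice each score would come from a dimension-specific measurement.
SCORES: Dict[str, float] = {
    "timeliness": 0.9,
    "accuracy": 0.8,
    "completeness": 0.4,
    "relevance": 0.6,
}

# Hypothetical contexts of use: each selects its own dimensions and weights.
URGENT_POLICY: Dict[str, float] = {
    "timeliness": 3.0,
    "accuracy": 3.0,
    "completeness": 1.0,
}
DATA_INTEGRATION: Dict[str, float] = {
    "relevance": 3.0,
    "completeness": 3.0,
    "accuracy": 1.0,
}


def contextual_quality(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted mean of the dimension scores that a context cares about."""
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"no score available for dimensions: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("a context must give positive total weight to its dimensions")
    return sum(scores[dim] * w for dim, w in weights.items()) / total_weight


if __name__ == "__main__":
    print("urgent policy use:", round(contextual_quality(SCORES, URGENT_POLICY), 3))
    print("data integration use:", round(contextual_quality(SCORES, DATA_INTEGRATION), 3))
```

With these illustrative numbers, the dataset scores higher for urgent policy use than for data integration, although nothing about the dataset itself has changed.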
4. The Contextual Approach in Practice
The contextual approach is a way of conceptualizing quality as a relation of scientific data with other components of a research context and therefore thinking about how quality assessment should be structured and implemented. But how does the approach translate in research practice? I now discuss three cases where the contextual approach can be seen in practice—areas of research which show how the approach can work in the context of specific applications, as well as the issues and different directions the approach points to.
The first case I want to discuss is one of the current attempts at improving the quality of research, by making data FAIR, i.e., findable, accessible, interoperable and reusable [
5]. FAIR is a set of guidelines that aims to raise the quality of data collected and produced as a result of research practices and related analytic and processing tools. Both this specification of the four components and the contribution of the FAIR movement can be informed and interpreted from the perspective of the contextual approach to data quality. First, the four FAIR guiding principles can be conceptualized as dimensions of quality as an overarching (desired) feature of a dataset—e.g., accessibility was already present in some of the categorizations of quality in the information quality debates of the 1990s [
8]. Second, as such, the guiding principles presented by FAIR are contextual dimensions of quality. Namely, the findability, accessibility, interoperability and reusability of a dataset are features that determine the quality of a dataset but arise and can be evaluated only in specific contexts. Whether a dataset is findable, accessible, interoperable or reusable will depend on the specific features of a dataset—and indeed the FAIR guiding principles are presented with concrete suggestions for how to deliver these principles. For instance, in this context, the findability of a dataset can be achieved through, e.g., metadata that can assign a unique identifier to the dataset, describe it and are registered in searchable resources and repositories. The digital object identifier (DOI) is a good example of a way in which the findability of publications and datasets can be applied and their quality therefore improved as a result.
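As a rough illustration of this point, the following Python sketch checks findability against a given community of users. It is an assumption-laden toy, not part of the FAIR specification: the `DatasetMetadata` record, the `is_findable` function, the identifier and the repository names are all hypothetical. The only claim it makes is that the same metadata record can be findable for one community and not for another, depending on where that community searches.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class DatasetMetadata:
    """Hypothetical, minimal metadata record for a dataset."""
    identifier: Optional[str]          # e.g. a DOI string; None if never assigned
    description: str                   # free-text description of the dataset
    registered_in: Set[str] = field(default_factory=set)  # repositories listing it


def is_findable(meta: DatasetMetadata, searched_repositories: Set[str]) -> bool:
    """Findability relative to the repositories a given community actually searches."""
    has_identifier = bool(meta.identifier)
    is_described = bool(meta.description.strip())
    is_reachable = bool(meta.registered_in & searched_repositories)
    return has_identifier and is_described and is_reachable


# Hypothetical example: one record, two research communities.
RECORD = DatasetMetadata(
    identifier="10.1234/example-dataset",   # made-up identifier for illustration
    description="Dietary practices survey",
    registered_in={"repository-a"},
)
EPIDEMIOLOGISTS = {"repository-a", "repository-b"}
FOOD_HISTORIANS = {"repository-c"}

if __name__ == "__main__":
    print(is_findable(RECORD, EPIDEMIOLOGISTS))
    print(is_findable(RECORD, FOOD_HISTORIANS))
```

The record itself is identical in both calls; only the set of repositories searched by each community differs.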
What I want to highlight here is that these elements, such as metadata and repositories, pertain to the context of a dataset, rather than the intrinsic properties of the dataset itself. The evaluation of the findability of a dataset comes down to these elements and, more generally, to the context in which the data are actually used, as much as to the dataset in itself. I agree that, in individual cases, a dataset will either be found or not, but the realization of findability as a property of a dataset and a dimension of its quality will be due to the specific context where the use of a dataset is realized. This is why I argue that the FAIR guidelines largely instantiate the approach to data quality that I have presented in this commentary. At the same time, seeing the features indicated by FAIR in contextual terms also suggests that they will be highly dependent on the context of the specific sciences where the data are used, which means that their meaning and application will change, as research practices are different. For instance, although findability is a relatively generalizable feature, whether a dataset is findable by researchers is not an absolute or intrinsic feature of a dataset that can be easily scaled to any discipline. Different researchers will approach and potentially access a dataset coming from the perspective of their discipline and its specific data, which might be different from those of the context of data production. This is why the work of data curation is so important in current data-intensive research, because it provides data with information on its original context and in this way enables new users to judge its reliability, quality and relevance for new and different uses [
35].
Another implication of this contextual interpretation of FAIR is that each re-use of data will lead to a different quality assessment and that a dataset used in one research project could have a different level of quality in another. Yet, one might argue that quality ratings need to be independent of context, especially for new users and different communities, who have not used the data yet and want to know about its quality. Is the approach I am proposing, thus, unhelpful and not applicable to actual research practice? I agree that quality ratings are necessary, in particular for data re-use, but here I think that the contextual approach can give different suggestions. Approaching quality in contextual terms suggests that researchers should take into consideration that initial quality ratings are highly dependent on the original context of the production and use of the data. This does not strip them of their value, but pushes new users to consider the situated nature of data production and make full use of the results of data curation. In addition, as we have seen, viewing quality ratings in context-independent terms is problematic, considering the evidential underdetermination of quality assessment tools, their low operability and their failure to be adopted in research practice [
17]. More generally, in contemporary and data-intensive research, a single dataset is often used as different types of evidence, depending on how the dataset is analyzed, interpreted and used. For example, a dataset about dietary practices of a population could be used as evidence for diverse types of studies on, e.g., food practices and culinary culture, socioeconomic status and the epidemiology of obesity, etc. As a consequence, data quality could similarly differ depending on this use: what counts as a certain level of quality for a research project in epidemiology might be different in the case of research on food history.
A second point of discussion that I want to focus on comes from various issues connected to the reproducibility of research. The possibility of reproducing results has traditionally been considered a crucial feature of the scientific method, as a way of ensuring that experimental design and methods can be tested to deliver the same results as the original study and therefore prove the reliability of the researcher. As famously argued by Karl Popper, “non-reproducible single occurrences are of no significance to science” [
36] (p. 64). The issue has recently gained prominence in meta-scientific discussions as a result of the failures to reproduce various studies in different areas of the sciences [
37]. These and other issues have therefore been framed in terms of a “reproducibility crisis” of the sciences, which has been attributed to a number of problems and features of contemporary research management and practice, such as failures in quality checks, lack of transparency of research methods and data, cognitive and other forms of bias and problematic practices such as p-hacking [
38].
The reproducibility crisis is usually connected to discussions on a general decrease in the quality of scientific research and in particular the quality of research outputs, such as specific types of data. This is where the reproducibility and quality of data can be connected, in the sense that reproducibility is often taken to be an overarching epistemic value of science and, among other things, a sign of the quality of data [
39]. In this sense, I argue that the contextual approach can be used to critically analyze current discussions on the reproducibility crisis, suggesting that we should look at reproducibility as a contextual property of research practices, as opposed to requiring it as a general indicator of quality that can be equally applied in any research context. Recently, Felipe Romero has argued that reproducibility and replication do not necessarily apply to all of the sciences or work as regulatory ideals and indicators of research quality. What counts as reproducible is actually highly variable and depends on:
As a consequence, what reproducibility means, what can and should be reproduced and the degree to which reproducibility is possible, if at all, change significantly in connection to the specific context of research. With a focus on experimental practices, Uljana Feest has argued that the role of reproducibility is not particularly central in this context [
41] and Sabina Leonelli has claimed that reproducibility requirements should be tailored to the contextual features, circumstances and goals of a specific scientific project or area of research, as well as the assumptions, values and judgements that are involved in practice [
42]. This direction in the literature on the reproducibility crisis further intersects with the contextual approach I have presented. The push for more documentation, coordination and transparency of local variants of reproducibility is a way to document the use of data and to frame the quality and reproducibility of data and results as contextual properties of research practices. This is also a move beyond the application of general requirements and protocols, which are supposed to apply universally and thus largely independently of the specific contexts of research. As I have argued, the contextual approach suggests moving away from viewing quality in universal terms and in the direction of locality and dependency, also in the context of the current debate on reproducibility. At the same time, similarly to the previous discussion on re-use, this brings up the question of how and when to apply reproducibility standards [
43]. Here, the contextual approach follows a “local” approach to the issue, according to which requiring reproducibility as a general standard is highly problematic and discussing the issue on a case-by-case basis is the direction to follow.
A third point that I want to discuss further explicates this interplay between universal and local standards, in the context of current critiques of the quality of data collected through non-experimental methods. For instance, the use of observational studies in areas of research such as the life and health sciences has been significantly criticized because of the low degree of quality and reproducibility that they yield [
32]. In EBM evidential hierarchies, data from observational research are ranked at a very low level of quality compared to clinical studies. In this context, various issues of quality connected with bias, internal and external validity and reproducibility are condensed into more general critiques of observational data, to the point that the collection and employment of other types of data are often encouraged [
44].
One of the problems with this line of argument is that it does not do justice to methodologies and traditions of research that employ observational methods, are not primarily experimental and cannot have similar levels of control over the experimental environment and setup. In this sense, these critiques do not consider the contextual nature of data quality. In many areas of the sciences, observational studies are the primary means of data collection, not just because of practical constraints but also for epistemic reasons. For example, in epidemiology, observational data can deliver epistemic goals that would be difficult, if not impossible, to achieve through experimental studies [
32]. I acknowledge that EBM evidential hierarchies are designed with specific requirements and reasons for classifying types of evidence in the ways they do. The problem is that these hierarchies are often considered to have a universal value and are applied to a variety of different contexts, without considering local and contextual features. Following the contextual approach, the quality of observational data depends on the specific ways and context in which the data are used and cannot be evaluated only on the basis of the methods used to collect the data. Decisions on whether a dataset can be used as a source of high-quality evidence depend on its compatibility with other datasets, and its evidential significance is produced and evaluated as part of data practices, not just on the basis of the intrinsic properties of the dataset or its compliance with experimental and universal standards [
45].