In her 2016 lecture to the Warburg Institute entitled “Exempla and the Epistemology of the Humanities” [1], historian of science Lorraine Daston described the experience, well known in many interdisciplinary research fields, when “mutual incomprehension or sometimes mutual indignation threatens to sever lines of communication” between researchers as they attempt to agree on the most basic premises of valid knowledge creation. The primary differences in the positions commonly taken by natural and human scientists lie, in Daston’s view, not in the finesse of the strategies and standards deployed to create knowledge, nor even in the nature of the knowledge itself, but rather in the fact that the natural scientists have “a rich epistemological vocabulary in which to reflect critically on their various ways of knowing”. In contrast, Daston claims that it is barely an exaggeration to say that there “is no epistemology of the human sciences”. This is not because the methods and strategies for knowledge creation are deficient in these fields, but rather, in Daston’s view, because they have received so little attention from historians and philosophers of science that it can now seem almost presumptuous to speak of an epistemology of the human sciences, as if this were a “usurpation of the rights and privileges of the mightier natural sciences”.
In an attempt to redress some of this historical neglect, Daston proceeded in her lecture to explore an epistemic strategy she finds particularly refined in the humanities, that is, the argument via exempla. Facing the same challenge, William Franke posits an approach based on recognising that epistemic strategies in the human sciences are optimised to account for the different relationship between the researcher and the object of their study in these fields: “…the humanities consist in relational or personal knowing rather than in objective, methodological knowledge. In other words, we ourselves are involved in what we know, and this character of personal involvement is crucial to the nature of such knowledge. This applies, I would suggest, to all knowledge, in the humanities and in the sciences alike, with the difference that in the humanities we do not try to eliminate—or, at least, to limit as much as possible—this personal involvement that underpins our knowing” [2].
These two different tentative attempts to establish the kind of vocabulary needed to enable humanists to speak not only of what they know, but also how they know, illustrate how challenging it can be to manage the gaps between disciplines traversed by approaches such as the digital humanities. At its best, such work can be truly collaborative and integrative, with the epistemic benefits being reciprocal and balanced between all contributors, but this is not always the case. One particular area where the impact can be seen of the gaps between methods and models applied in the humanities and those of computer science, software engineering and related disciplines, such as human–computer interaction, is in the management of uncertainty. If you are challenged to explain how you know what you know, then you will find it perhaps more difficult still to pinpoint and categorise sources and points of uncertainty, that is, how you know what you do not know. This paper will therefore look at how this differentiation between humanistic ‘fact’ and interpretation shapes the nature of humanistic research questions and attitudes toward sources in the face of such uncertainty. It will look at this phenomenon first in its analogue manifestations, then in the context of digital tools and data-driven research, thereby exposing some of the challenges inherent in designing digital systems to manage uncertainty in the context of research in the humanities. The frame of reference will be largely drawn from the discipline of history (for the humanities themselves are highly heterogenous), though the conclusions will be applicable across many such disciplines dedicated to understanding the artefacts of human culture and creativity.
As such, the work makes its claim to an interdisciplinary positioning, broadly within the digital humanities, attempting to communicate in a structured and nuanced fashion the manner in which uncertainty is recognised and managed, largely according to the perspective of the archival historian. It is written to reflect the practical and epistemic positioning of that perspective and should, therefore, be viewed as a sort of user story at the theoretical or macro level, reflecting a composite view of community norms, values and perceptions, rather than an individual case. Although it may not seem immediately actionable, understanding and being sensitive to the values, methods and processes in operation at this level is a key success factor for the adoption in one domain (the humanities) of tools to support the epistemic processes developed according to others (information or computer science). In this way, the primary contribution of this work will hopefully be to assist in the reduction of “scepticism toward ‘algorithmically reached’ interpretation” [3] (p. 48) and facilitate the kind of cultural change that Ribes and Baker refer to as ‘conceptual innovation,’ which is “…an extended process: one cannot simply make claims about the importance of… [e.g., cyberinfrastructure] and expect immediate meaningful community uptake” [4].
In order to contribute maximally to this dialogue, this paper will first look at the kinds of research questions that historians pose as starting points for their research. The next section will use these and other accounts to extract implicit characteristics of uncertainty in this work and the role it plays in historical interpretation. The fourth section will explore some of the mechanisms proposed by scholars for dealing with uncertainty in their work, while the fifth will look specifically at an example of the role technology can play in these processes. The final section and the conclusion will extract from the sum total of this discussion a list of the areas where particular sensitivities exist in the management of uncertainty in the humanities, specifically as pertains to the migration of this process into data-driven environments, as well as best practice exemplars in the management of these concerns.
2. The Nature of Humanities Research Questions and Processes
Humanities research questions can seldom be answered adequately through a single source of information. They also tend not to adhere to specific categories or types, making generalisation of them difficult and often misleading. For this reason, this article draws upon specific narrative examples of historical research questions captured and recorded in the context of two different digital projects. The first of these is the Collaborative European Digital Archival Research Infrastructure project (CENDARI). As a part of developing its portal and Virtual Research Environment, CENDARI produced a very instructive set of use cases, deployed as user stories and scenarios, featuring research questions from medieval and modern history [5]. Some of the questions explored in that report are as follows:
“My project examines how the rural-urban divide shaped Habsburg Austrian society’s experience of the war from about 1915.”
“I want to investigate the relationship between the Bec-Hellouin Abbey, in Normandy, and other monasteries, priory, archbishopric and the kingdom of England, from its foundation to the XV century.”
“I wish to examine the ways in which the Ottoman empire and Islam were perceived by the political (liberal and Catholic) elites in the Slovenian lands of the Habsburg monarchy in the decade before the outbreak of the First World War.”
Eef Masson made the useful observation that humanities scholars “do not seek to establish unassailable, objective truths” and “instead […] approach their objects of study from interpretive and critical perspectives, acting in the assumption that in doing so they necessarily also preconstitute them,” [6] (p. 25) a statement which is borne out by these examples. None of these questions has a straightforward, factual answer: indeed, each of them proposes a multifaceted investigation of an issue, perception or relationship that, even at the time of its happening, would have been complex to explain. These questions, in other words, are not only rife with uncertainty, but suffused with and dependent upon it. One can imagine factual layers that could be recruited to support these investigations, such as correspondence flows into and out of the Abbey of Bec-Hellouin or records of trade relationships between the agricultural heartlands of the Austro-Hungarian Empire and its great cities, such as Vienna and Budapest. But even from and within these, selections and interpretations will need to be constantly made.
The use of such source materials will often be opportunistic, as there is no one pathway toward the resolution of these questions, and none that is not almost assuredly partial, unreliable or biased. Learning how to judge the provenance and authority of sources is therefore an essential part of the formation of the historian—a necessary and complementary skill to the kinds of questioning the above examples demonstrate.
3. Defining Uncertainty in the Humanities
The breadth of such humanistic research questions leaves a great latitude for encountering, incorporating and managing uncertainty. Uncertainty is not equally found in all sources, or by all researchers, nor does it have an equivalent impact on the ability to produce knowledge. As Petersen wrote, “uncertainty takes many forms, whether epistemic, statistical, methodological, or sociocultural” [7]. Outside of the humanities, many models and tools have been developed, either for capturing uncertainty in data (such as Petersen’s “Uncertainty Matrix” [7]) or for capturing aspects of processing that could introduce uncertainty (such as the NASA EOSDIS [8]). These would, however, be cumbersome if not impossible to adapt to and apply in the humanities, an aspect of the overall environment discussed further in Section 4.
Epistemically closer to home, Kouw, Van Den Heuvel, and Scharnhorst [9] cite Brugnach et al.’s [10] idea of a relational concept of uncertainty in which three possible sources of uncertainty exist: unpredictable systems, incomplete knowledge of a system, or incompatible frames of reference for the system [9]. As a theoretical basis for exploring data and uncertainty in the humanities, this is a powerful model but somewhat removed from the wide variety of observable practices in humanistic disciplines. A more empirical account can perhaps be found in Arlette Farge’s The Allure of the Archives [11], which gives a rich and detailed account of the many-faceted challenges of attempting to create knowledge via the instrument of original sources held in historical archives. From the very outset, the reader of this text is struck by the omnipresent language of uncertainty: the researcher is “unaware … guesses … nothing indicates” [11] (p. 19), the sources “fragmented … incomplete … imprecise” [11] (p. 79), “tangled … contradictory … inconsistent … far from clear. … opaqueness and contradiction begin to creep in … incongruous spaces emerge” [11] (pp. 80–83). Her contention that “the archive is an excess of meaning” [11] (p. 31) is repeatedly borne out by her descriptions of the processes of discovery and interpretation that frame her professional activities. Farge’s account, therefore, provides an excellent basis on which to extract and characterise the sources of uncertainty encountered by the historian, which include the following:
: Farge writes that “the words of those ensnared actors contain perhaps more intensity than truth” [11] (p. 27). This indicates perhaps one of the most obvious sources of uncertainty. It is not exactly aleatory uncertainty (that is, irreducible uncertainty resulting from random occurrences), but it is as close as one gets when working with the artefacts of human activity, given that humans often struggle to understand even their own reasons for taking particular actions or decisions and the impact these actions and decisions may have. Somebody did something, or something happened: why did they do this? Why did it happen? Is this mark in the manuscript a doodle or a representation of a face? What was the inspiration behind this author’s use of this word, this image?
Similarly, uncertainty arises due to errors, especially in cataloguing, but also in interpretation. I do not know where this document came from: is it in the right place/box? The date or origin given for this object in the finding aid or catalogue record does not make sense to me as an expert: what is the source of this gap? By revisiting evidence, I can see that a medical diagnosis made decades earlier was probably wrong, or that an earlier interpretation was based on biased (see definition below) or incomplete records. Known disagreements between existing records or interpretations could be seen as a subcategory of error, for one can assume that in cases of differing accounts, all or part of one or both would be based on incorrect assumptions or information.
exist in the record, leading to partial, missing, perspective-limited or conflicting information. Farge writes: “both presence and absence from the archive are signs we must interpret in order to understand how they fit into the larger landscape” [11] (p. 71). This is also a very simple form of uncertainty to imagine. I have a letter and I know who received it, but who wrote it? I know the age of an object, but where did it come from? I know a document is from 1944 (or, more commonly, ‘5th Century,’ ‘medieval era,’ ‘around 1650’ etc.) but what specific date? One account claims that 20 people were killed in the skirmish, another account claims 200: which is accurate? Was this story written before or after the author heard about a particular event?
, that is, the intentional or unintentional incorporation into the archival record of a personal inclination toward or against an individual or group. Farge is exceptionally attuned to sources of bias in the archive. It can be introduced at any number of points in time and the research process, by the nature of the sources gathered and the authorities gathering them, or by the researchers’ own influences and knowledge. “The archives can always be twisted into saying anything, everything and its opposite” [11] (p. 97). I can verify that a statement was made and by whom, but how can I verify the veracity or intention of the speaker? I am working with a collection that supports a particular conclusion, but is there material excluded here (intentionally or unintentionally) that would contradict it? This woman’s writing (which we no longer have) is described as inferior by a male contemporary, but was that an aesthetic judgement or a gender-based one?
. Uncertainty is seldom a property of a document (unless it is itself internally inconsistent) but rather of the person attempting to interpret it in a given context. Awareness of, and the importance given to, the discovery of a source of uncertainty matching any one of the categories described above is very likely to be dependent on the individual reader, their time, place and purpose for accessing a document, that is, the research question that drives their action. Because of this complex relationship between background knowledge, research question, and data source, a source or nature of uncertainty sometimes cannot be defined precisely: it consists of a sense of something wrong, without being able to say exactly why (a mismatch with tacit knowledge as opposed to the ‘happy accident’ of discovery that is serendipity). For example: the number of items found in a digital search seems too low, but I cannot be sure what the source of the problem is. I know a certain object is in this collection, so why can’t I find it in the catalogue? This visualisation (e.g., in a GIS) does not match my tacit understanding of a phenomenon. The author writes that his intention was to portray a character in a certain way, but that does not match my own interpretation; as Farge expresses it, “The heart of the matter is never immediately clear” [11] (p. 62).
Humanistic data streams (defined here as the sources and other inputs that are used to inspire questions and build interpretations) are composed of these kinds of ambiguous, contradictory, ‘messy’ components. The need to verify a single individual fact is a subordinate task for the historian, whose method embraces a much broader uncertainty. Uncertain data (defined as data whose meaning is unresolved or unresolvable for any or any given purpose) would in many fields be epistemically ‘off limits,’ an unusable and unstable ground for any conclusions to be made. This, however, is the norm and not the exception in the humanities, and any humanist should be well able to isolate and build around, either via proxy sources or other corroborative material, those aspects of a useful source that are also somehow flawed. As Farge describes, “When research runs into the opaqueness of the documents, and the documents no longer readily offer up the clarity and convenience of an easy ‘it’s like this, because that’s what’s written’ then our work as historians truly begins. We must start with what the texts harbour that is improbable and incoherent but also irreducible to any readily available interpretations” [11] (p. 72).
4. Motivations and Mechanisms for Managing Uncertainty
Across the disciplines, there are many motivations and mechanisms for approaching the problems introduced by uncertainty into a research process. In many cases, these motivations are future-oriented, and primarily concerned with the need to take particular actions on the basis of decisions made under uncertainty, such as starting or supporting a war [12] or acting in the face of climate change [13]. The categories and types of uncertainty that will be defined under these conditions, as well as the strategies most relevant to the management of this uncertainty, will be very different in these cases than in those involving historical research. The lack of a solid basis of meta-reflection on epistemic processes in the humanities, discussed in the introduction to this piece, would make it difficult to adapt models from other contexts under the best of circumstances, but specificities in and between disciplines, and the world views they represent, are also significant barriers. For example, the W3C standard for uncertainty reasoning [14] envisions one uncertainty state it calls ‘empirical,’ which can “be resolved by obtaining additional information.” Historians do not have the leisure to simply undertake further data gathering, and, as a result, the W3C category of ‘empirical uncertainty’ would seem to collapse from their perspective into other categories in the taxonomy, such as ‘incompleteness’ or ‘inconsistency.’ Similarly, the long-recognised and powerful Uncertainty Matrix developed by Petersen [7] introduces both a level of detail to its model and a conceptual framework so foreign to the work of the historian that even if some instances could be mapped to it, widespread adoption would be so resource-intensive (given how central uncertainty is to the humanities) as to create more questions than answers. This is not to say that there could not be benefits inherent in such an exercise, only that asking a researcher whose discipline is not based on such models to adapt one from another discipline is like asking someone to translate into English text from a language without verbs: an already challenging exercise is rendered nearly futile by turning it from a means into an end in itself.
This is not to say that historians do not discuss the problem of uncertainty and how to deal with it, and it is to some of these discussions that we now turn. The examples of this sort that do exist tend to follow one of three general patterns, the first of which is to advocate that uncertainty not be reduced but made transparent in research findings. Adrian Blau [15] takes this approach in his study of the place of uncertainty in the history of ideas, a particularly fertile ground for seemingly intractable uncertainties to take root. Uncertainty, he writes in his introductory paragraphs, is “inevitable for intellectual historians,” and must be tackled by “reducing and reporting it” [15] (p. 358). There are a number of reasons for this that come out in the course of Blau’s discussion: because the field deals with actions and beliefs, which are always ‘underdetermined’ (and therefore subject to a number of potential interpretations), but also because the object of historical research is not really the events of the past, but the evidence that remains behind to attest to them: “…intellectual historians who make empirical claims are not saying what happened, but how strong they think their evidence is” [15] (p. 358). Stated differently, “we can be definitive about what people wrote but only ‘speculative’ about their beliefs” [15] (p. 369).
Although Blau’s approach may seem on the surface conservative, and therefore to represent a resistance to reducing the uncertainty in the research processes that underpin intellectual history, in fact he indicates a number of potential pathways for the better management of this intrinsic aspect of the field. For one thing, he targets the fact that historical research lacks an appropriate language for speaking about uncertainty as it occurs in sources and conclusions: “One consequence of the subjectivity of historical uncertainty is that there is no agreed language for communicating degrees of uncertainty. Statisticians can report degrees of uncertainty in terms of significance levels, confidence intervals, standard deviations, and so on. Intellectual historians must use terms like “probably” and “likely”. This is far less precise” [15] (pp. 364–365). This issue, while important in itself, is embedded within another, perhaps even more central lacuna in the practices of historical research, which is the fact that history, because of its basis in the uncertainty of evidence and events, is much more dependent on the context of its community of practice than other disciplines might be: “Uncertainty reminds researchers about the dangers of evidence misleading them, and of them misreading evidence. Uncertainty requires us to ask if our theoretical expectations are right, if we have focused overly on a single explanation, if we have looked at the similarities and differences of plausible alternatives, and at their strengths and weaknesses. Ignoring uncertainty by no means precludes high-class research. But it does make errors more likely” [15] (p. 372). Due to the nature of historical research questions, and the manner in which historical evidence is generally only able to act as a more or less flawed proxy for the real answers being sought, uncertainty must be maintained in the process both as a part of the communications process between historians (a process that can be improved through refined language and techniques) and as a reminder of the very nature of the conclusions being reached: “If we have done our best and an interpretation is still highly uncertain, we should say so; if nothing else we will look less foolish if we or someone else later finds more support for an alternative claim. […] If our estimates of uncertainty are honest and we keep noting that our inferences are possible but unlikely, we will seek more evidence, make different claims, or not publish at all. But if we know we are right, we are more likely to go wrong. The […] most important reason for reporting uncertainty is thus to remind ourselves about the subjectivity of our research” [15] (p. 367). The preservation of uncertainty therefore becomes not a basis for questioning the validity of the conclusions of historical research, but the very basis for their claims, which must be limited in their scope because of the nature of the observations possible under the circumstances of the discipline.
Blau distinguishes the methods and fundamental conditions of historical research from statistics, in that “for statisticians, uncertainty is objective, but for intellectual historians, uncertainty is subjective. The other two differences follow from the first: there is no agreed way of reporting subjective uncertainty, or of estimating it” [15] (pp. 362–363). These issues become an important point of contrast with an approach such as that taken by Myles Lavin [16], who advocates very strongly for adopting a statistical approach to the epistemic uncertainty of history. His argument does, in many ways, bring out some of the same sorts of weaknesses in the historical research process as Blau does, in particular the nature of the historical evidence base, the lack of a stable language for describing levels and types of uncertainty, and the possibility for overconfidence and anchoring in previous work to lead to the propagation of conclusions based upon earlier flawed interpretations. But while Blau advocates the surfacing of assumptions and uncertainties as a method to deal with these weaknesses, Lavin prefers an almost opposing strategy, that is, the application of Bayesian probabilities to the uncertainties of historical research. His argument draws from some very strong proponents of probabilistic methods, but also from the contention that there is no real difference between the kinds of uncertainty (epistemic as opposed to aleatory, events as opposed to quantities, past events rather than future actions) encountered in historical research and those found in other domains.
Although Lavin’s desire to manage the information blind spots that surely inform much historical research is welcome, his argument leaves a number of essential issues unexplored. Even if we take his understanding of the nature of the uncertainty in question here as valid (the discussion of quantities versus events, for example, seems particularly tentative [16] (p. 99)), the process by which probabilities could be assigned to the range of interpretations available seems merely to take the process of weighing and considering, which Blau sought to make transparent, and displace it to a quantifiable space within a compounded Bayesian black box. This is perhaps not the intention, for, as he explains, “It becomes much easier if we remind ourselves that probability curves ‘do not exist,’ as De Finetti said, ‘They are only a language in which we express our state of knowledge or state of certainty’” [16] (p. 103), but given that this is the case, it would seem that Blau’s strategy of maintaining the complexity and the provenance of the arguments being made would be an ultimately more productive one, in particular as the assignment and accretion of probabilities does not remove the reliance upon potentially flawed assumptions from the process: “Even in fields with much better data, estimation often entails an irreducible element of subjective judgment” [16] (p. 102). In particular, the proposition that “A traditional point estimate based on most-likely values for each of the input quantities could never hope to command credibility because of the proliferating uncertainties” [16] (p. 106) seems rather overextended, given that authority in historical research has indeed been constructed for many centuries now without recourse to probabilities.
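The contrast at stake here, between a point estimate built from most-likely values and an estimate that carries its input uncertainties forward, can be illustrated with a minimal sketch. It is not drawn from Lavin’s work: the two quantities, their ranges, and the choice of a triangular distribution are all invented for the purpose of illustration, and a simple Monte Carlo simulation stands in for a fuller Bayesian treatment. What the sketch makes visible is that every number entering the simulation is itself a subjective judgement supplied by the researcher.

```python
import random
import statistics

# Hypothetical inputs, invented for illustration only: (low, mode, high)
# judgements a historian might supply for two quantities in an estimate.
HOUSEHOLDS = (800, 1000, 1500)           # households in a town
PERSONS_PER_HOUSEHOLD = (3.5, 4.5, 6.0)  # average household size


def point_estimate(*quantities):
    """Multiply the most-likely (mode) value of each input quantity."""
    result = 1.0
    for _low, mode, _high in quantities:
        result *= mode
    return result


def simulate(quantities, draws=10_000, seed=0):
    """Propagate the subjective ranges by sampling triangular distributions."""
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        product = 1.0
        for low, mode, high in quantities:
            product *= rng.triangular(low, high, mode)
        samples.append(product)
    return sorted(samples)


def interval(sorted_samples, coverage=0.9):
    """Return the central interval holding roughly `coverage` of the samples."""
    tail = (1.0 - coverage) / 2.0
    lo_index = int(tail * len(sorted_samples))
    hi_index = int((1.0 - tail) * len(sorted_samples)) - 1
    return sorted_samples[lo_index], sorted_samples[hi_index]


samples = simulate([HOUSEHOLDS, PERSONS_PER_HOUSEHOLD])
low, high = interval(samples)
print("point estimate:", point_estimate(HOUSEHOLDS, PERSONS_PER_HOUSEHOLD))
print("median of simulation:", round(statistics.median(samples)))
print("central 90% interval:", (round(low), round(high)))
```

The simulation reports a range rather than a single figure, but the bounds and most-likely values it starts from remain expert judgements, which is the reliance on tacit knowledge that the argument above identifies.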
Ultimately, Lavin’s proposed approach to managing the uncertainty in a historian’s research process leaves a taste of ‘old wine in new bottles’ by trying to make tacit knowledge more explicit through a process that is itself largely based upon this same tacit knowledge.
5. Humanities, Uncertainty and the Digital
As the work of Blau and Lavin illustrates, the powerful training a humanist receives for managing uncertainty in their source material does not necessarily translate well to digital or data-driven environments requiring a clear recording of uncertainty. In particular, exchanging a set of diverse and varied sources for a homogeneous corpus of ‘data’ that cannot be surveyed and ‘seen’ in the same way is a challenging shift. Kouw, Van Den Heuvel, and Scharnhorst in particular acknowledge this “highly ambiguous meaning of data in the humanities” [9], a position that Christine Borgman advances in her conjecture that “ … [b]ecause almost anything can be used as evidence of human activity, it is extremely difficult to set boundaries on what are and are not potential sources of data for humanities scholarship” [17].
But it is not just words that are being shifted as the humanities move from sources to data, it is methods and values as well. As Masson describes it, “with the introduction of digital research tools, and tools for data research specifically, humanistic scholarship seems to get increasingly indebted to positivist traditions. For one, this is because those tools, more often than not, are borrowed from disciplines centred on the analysis of empirical, usually quantitative data. Inevitably, then, they incorporate the epistemic traditions they derive from” [6].
As we have seen from the examples given above, positivism is not an approach currently favoured in historical or literary scholarship and indeed is rather discredited in these disciplines. However, the push toward a sort of ‘new positivism,’ arising not from a research or epistemic cultural imperative so much as an opportunistic one based on tool availability, cannot be ignored. Christine Borgman describes the challenge of the humanist using a digital tool not necessarily developed for her accustomed mode of questioning as follows: “they are caught in the quandary of adapting their methods to the tools versus adapting the tools to their methods. New tools lead to new representations and interpretations” [17]. Digital humanities should, by all means, open up the way to new interpretations (which must be informed, of course, by an understanding of the function of tools) but it should also be able to resolve this quandary by moving the tool to the user, drawing strength from the different perspective the humanist brings to the use of quantitative approaches, rather than resisting them.
A very well-developed example of this can be seen in the work of the e-Science and Ancient Documents (or eSAD) project [18]. eSAD takes as its starting point not so much a theoretical position on uncertainty, but a very specific and applied one, namely the attempt to create a software tool to support the interpretation of ancient papyri. Interestingly, this project also bases its reflections on the desire to make tacit knowledge and implicit processes more explicit, but with a somewhat different focus from either Blau or Lavin, in that the final goal of building a tool places the emphasis on aligning with existing expert workflows, so as to ensure a minimal additional overhead in the enhanced process. Like Lavin, this work views interpretation as an accretive process that builds a network of ‘percepts’ [18] (p. 350) but, like Blau, it places high value on maintaining documentation of the research process: “how to digitize appropriately an artefact and how to record a thought process” promoting a recognition that “digitization is both sampling and interpreting” [18] (p. 350). The project thereby also recognises much that is already implicit in the modelling processes that perhaps manage uncertainty but may also be already introducing or amplifying existing biases, introducing ‘spurious exactitude’ and potentially hiding genuine uncertainty. As such, the primary focus of the eSAD tool was to facilitate efficient oscillation between different levels and modes of reading, and to avoid creating a system that requires interpretations to be encoded in any machine-readable way (such as by allowing strokes to be drawn without requiring any alignment to specific letters or characters). As the authors note (and in contrast to Lavin), “quantifying uncertainty is always risky and usually presupposes that problems are complete… which is far from the case in a papyrological context,” a material condition which makes more preferable an approach that “allows us to point out inconsistencies without forbidding them” [18] (p. 355).
6. Productive and Unproductive Management of Uncertainty in Humanities Research
By and large, humanistic researchers are not looking for tools that change utterly what they study or how they undertake their investigations, so much as an enhancement of and supplement to their already heterogeneous sources and adaptable methods. This is not to say that the non-neutral impact of the digital on epistemic processes is not recognised, but rather that the digital interventions preferred are generally more of the nature of a supplement than a displacement. In this sense, Tim Sherratt’s distinction between research infrastructures that are deployed in a ‘tactical’ and targeted manner and those that are broad and encompassing [19] is a useful one. Virtual research environments and other forms of ‘one-stop-shop’ for carrying out research are generally less readily accepted and reused in the humanities than are specific tools able to deliver specific parts of a workflow, such as the TEI standard [20] for text encoding or Voyant Tools [21] for data visualisation. The conservatism that underlies this tendency can have quite wide-reaching consequences, with humanists regarding such widely accepted practices of data-driven research as ‘data cleaning’ or ‘data scrubbing’ with great suspicion, and seeing these activities in the much more negative light of ‘data manipulation’ [9].
They have good reason to be suspicious. In his analysis of the process by which the metadata was created for the massive oral history collection of the Shoah Visual History Archive, Todd Presner [22] speaks of the potential that algorithmic approaches seem to have to be ‘ethical,’ that is, to neither lose the suffering of the individual in the masses nor focus only on particular well-known stories, such as Anne Frank’s or Elie Wiesel’s. And yet, as Presner shows, this apparently ethical viewpoint is deeply flawed when it comes to representing uncertainty, leading to the tagging of some material as ‘indeterminate data’ or ‘non-indexable content’. Presner’s account of this is worth quoting in full, as it not only raises a number of concerning ethical issues, but also highlights the real difficulties facing researchers when it comes to encoding particularly difficult or rich materials into a dataset:
“‘Indeterminate data’ such as ‘non-indexable content,’ must be given either a null value or not represented at all. How would emotion, for example, need to be represented to allow database queries? While certain feelings, such as helplessness, fear, abandonment, and attitudes, are tagged in the database, it would be challenging to mark-up emotion into a set of tables and parse it according to inheritance structures (sadness, happiness, fear, and so forth, all of which are different kinds of emotions), associative relationships (such as happiness linked to liberation, or tears to sadness and loss), and quantifiable degrees of intensity and expressiveness: weeping gently (1), crying (2), sobbing (3), bawling (4), inconsolable (5). While we can quickly unpack the absurdity (not to mention the insensitivity) of such a pursuit, there are precedents for quantified approaches to cataloguing trauma. [...] Needless to say, databases can only accommodate unambiguous enumeration, clear attributes, and definitive data values; everything else is not in the database
. The point here is not to build a bigger, better, more totalizing database but that database as a genre always reaches its limits precisely at the limits of the data collected (or extracted, or indexed, or variously marked up) and the relationships that govern these data. We need narrative to interpret, understand, and make sense of data” [22].
Presner gestures towards a complete rethinking of the database as a genre, specifically with regard to representations of ‘the indeterminate’: “Such a notion of the archive specifically disavows the finality of interpretation, relishes in ambiguity, and constantly situates and resituates knowledge through varying perspectives, indeterminacy, and differential ontologies” [22]. Perhaps, if we are to realise the potential of the digital humanities, this is the direction in which we must look?
Knowledge organisation frameworks and their associated tools, like metadata standards, taxonomies, controlled vocabularies and ontologies, have all provided powerful frameworks to increase our ability to connect and find information. They also, however, can reduce the complexity of the information available around a particular object and its digital surrogate, stripping it of its original context and its provenance in a way very much counter to the recommendations of Blau, discussed above. Ironically, therefore, the way in which cultural data, in particular historical data, is prepared may well increase its findability, but reduce its usability by reducing potential sources of uncertainty.
In order to provide some further evidence of how uncertainty was viewed and managed by historians within their research projects, the PROVIDE DH project carried out a series of interviews with researchers working with a specific corpus of historical source collections that are central to any understanding of 17th-century Ireland [23]. The project’s reason for choosing this particular topic as its focus was threefold: first, the relevant source materials were readily available in well-structured digital formats; second, the original sources and their digital counterparts were both well known for the complexity and uncertainty of the information they presented; and third, the project team had access to a cohort of researchers actively working with both the original and digitised sources. From each of these interviews, two use cases were extracted for a total of eight research cases related to this body of sources, each of which demonstrated a different face of the uncertainty these researchers deal with, and their strategies for dealing with it. The cases cover a very wide range of challenges, from specific gaps between contemporary and historical cultural practices, to language usage and text normalisation, and in particular to the locus of uncertainty, be it in the primary sources, their later interpretation, or even in the digital or analogue environments in which they are presented. From these cases, and from the other material presented above, we can derive the following prompts for the developers of digital supports to the management of uncertainty in the historical research process, each of which is illustrated by one or more projects or approaches demonstrating effective management of this particular issue.
provide access to context and provenance. As Kouw, Van Den Heuvel, and Scharnhorst state: “Metadata provide context, but the question of whose context is particularly contentious in the humanities” [9]. Systems that capture and make it possible to explore the provenance of data streams, the variety and richness of their contexts, who has contributed to them, and what they may have been produced in proximity to could greatly enhance a researcher’s ability to overcome inherent weaknesses in a source. Although they may not be the most technically interesting examples, genetic editions of texts, such as the Beckett Digital Manuscript Project [24], can still be an inspiration to developers seeking to work with humanities research materials. By maintaining and making transparent the development of an eventual ‘final’ text, such editions reveal provenance and build user confidence by enabling what the system displays at one level to always be queried and checked on another. Delivering this for historical sources may be a more difficult challenge, however, as the material required to form a useful corpus is likely to be more dispersed, loosely structured and larger overall.
do not focus on an unrealistic ‘single source’ model. Researchers not trained in the humanities may assume that deeply interrogating a single source is a norm for the humanities, as it may be in other disciplines. While this does occur, knowledge of that single source must be supported by corroborating evidence from elsewhere. While there are some emerging examples of powerful single source corpora for humanistic research (such as historic newspapers, social media, or parliamentary records), humanists will long maintain an inconvenient tendency to “[draw] on all imaginable sources of evidence” [17] or at least deploy a strategy of “triangulation” [15] (p. 361), a fact that should not just be accepted, but celebrated. The SESHAT project’s use of linked open data to build multivocal datasets able to resolve or document the origins of competing hypotheses without prejudging their resolution has been a leader in this respect [25].
support the development of more precise vocabularies for expressing uncertainty. As all of the articles surveyed above pointed out, in one way or another, the language available for talking about uncertainty in the humanities is not particularly precise or clear. Digital environments should at all costs avoid implying certainty, in the form of a clear and unequivocal single accepted interpretation, where none exists (as the eSAD project achieves in its presentation of strokes, not letters). At the same time, developers can use a similar process to investigate what the variations in usage are, and how they might be flexibly incorporated. It should be borne in mind while doing this that the possible gradations may not be in the obvious place, that is, from more to less uncertain. An equivalent of the statistician’s standard deviation that is clear enough to gain community acceptance may exist on the axis of how and where uncertainty enters a system (as seen very clearly in the PROVIDE DH scenarios), or may need to be aligned to specific research questions: one size will likely not fit all. Achieving this on a generic level may be challenging, but many good examples exist where specific challenges have been met in ways that map to the specifics of research domains and questions, such as the identification of elements on ancient coins [26] and some aspects of temporal-spatial encoding [27].
focus on interoperability and ‘comparative legibility’ in corroborating sources. This is a corollary to the ‘single source’ point above: if a single source will never be enough, then finding new ways to move fluidly between sources would be the far greater gain. This does not mean pulling all data into a universal federated information bank, a process that would inevitably lose context and flatten complexity. Rather, the goal would be to enable sources that are siloed to be combined and compared more easily than they are now, that is, more easily than as a linearly accessed succession of searches operating in different environments with different affordances and norms of interaction in each. One might think of the Orange tool chain platform [28] as an inspiration for this kind of linked, rather than siloed or merged, experience.
provide a ‘fuzzy search’ that can reduce false negatives, such as is incorporated in the excellent interface of the Transkribus handwriting recognition tool [29]. While such a capacity will not solve all of the problems uncertainty about data might instil, it will at least promote an interrogability that may increase confidence.
Interrogability of processing must also become more the norm. In a universe where the majority of humanistic sources were textual, a methodological source (such as a work of critical theory) could be read, evaluated and then used or discarded. Only the foolhardy scholar would attempt to use a source s/he had not read. And yet, in the digital age, the equivalent tools for framing arguments and approaches, from topic modelling to stylometric tools, do not always explicitly expose their ‘lines of argument,’ their ‘thought processes.’ Instead, they often seem to run the risk of promoting the maxim ‘garbage in, gospel out,’ leaving users who may not be aware of the limitations of the tool to accept the authoritative voice of its output. DARPA’s research into “Explainable AI” [30] points in a direction that could provide models for this, despite the additional cost and limitations such an approach may place on the technology deployed.
explore embodied practices. Part of the strength of the humanistic research process, and its adaptation to the heterogeneous and uncertain sources it relies upon, comes from its multimodality. The embodied elements of humanistic research practices are highly complex, enabling a very subtle management of time and space, of kinds of knowledge and of complementary sources, which is antithetical to work on a single platform or device. A better match with these strategies would create a far more fluid relationship between researchers’ needs and digital tools and environments. Collaborations with novel physical infrastructures for the digital humanities such as the HumLab at Umeå University in Sweden could lead to fresh approaches to resolving complex knowledge creation problems such as that of uncertainty in the humanities [31].
enable trust. Digital tools will speed up some aspects of a humanities researcher’s process, but other aspects will almost certainly defy interrogation by digital methods. According to Tenopir et al.’s substantive report on trust and authority in scholarly communications, the top criteria scholars used to judge their sources were “criteria … associated with personal perusal and knowledge, the credibility of the data and the logic of the argument and content” [32]. All of these processes are ones that the digital presentation of source material has the potential to impede, by reducing perusability, removing context, restructuring an existing logic framework, or indeed presenting data stripped of the interpretive narrative meant to accompany it. To disrupt these elements is to disrupt the internalised, tacit verification system of the humanist, without which sources and tools are of no use at all. The hugely successful development of the Text Encoding Initiative (or TEI, [20]) as a standard that has managed not only to prove itself able to capture the complexity of humanistic data but also to gain widespread acceptance as sympathetic to humanities methods proves that this is possible.
finally, and most importantly, do not try to remove uncertainty, but signal where it is. Humanists will never have certainty, because the sources, and the humans who created them, are flawed. Because of this, honing a human instrument able to draw conclusions under these circumstances is a value and a process humanists hold dear. There are many things a researcher has to learn to deal with just by ‘slogging through’ them; this is a part of the discovery and learning process. But, properly deployed, the digital can contribute a lot to what humanists do with the uncertainty they have, and to how they move toward a greater and better-grounded confidence in their interpretations, maintaining the all-important aspects of provenance as a means by which to preserve and communicate uncertainty while reducing the dependence on potentially biased methods just below this surface.
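By way of illustration only, two of the prompts above (a ‘fuzzy search’ that reduces false negatives, and signalling rather than removing uncertainty) might be combined along the following lines. This is a minimal sketch in Python using the standard library’s `difflib`; it does not describe how Transkribus or any other project cited here works. The `Reading` class, the `fuzzy_search` function, the similarity threshold and the sample spelling variants are all hypothetical, introduced purely for the example.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Reading:
    """One transcribed form, kept together with its (hypothetical) provenance."""
    text: str    # the form as it appears in the digital surrogate
    source: str  # where the reading comes from
    note: str    # where uncertainty entered, in the researcher's own words


# Hypothetical sample data: spelling variants of one personal name.
SAMPLE_READINGS = [
    Reading("Phelim", "hypothetical source A", "normalised by a modern editor"),
    Reading("Phelym", "hypothetical source B", "scribal spelling, clearly legible"),
    Reading("Felim", "hypothetical source C", "damaged page, first letter unclear"),
    Reading("Patrick", "hypothetical source D", "clearly legible"),
]


def fuzzy_search(query, readings, threshold=0.7):
    """Return (score, reading) pairs for every reading that resembles the query.

    Nothing is merged or 'cleaned': each hit keeps its own source and note,
    so the user can see why it matched and judge it for themselves.
    """
    hits = []
    for reading in readings:
        # ratio() is a similarity score between 0.0 and 1.0.
        score = SequenceMatcher(None, query.lower(), reading.text.lower()).ratio()
        if score >= threshold:
            hits.append((round(score, 2), reading))
    # Best matches first; near misses remain visible further down the list.
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


if __name__ == "__main__":
    for score, reading in fuzzy_search("Phelim", SAMPLE_READINGS):
        print(f"{score:.2f}  {reading.text:<8} {reading.source}: {reading.note}")
```

An exact-match search for ‘Phelim’ would return only the first of these readings. The fuzzy version also surfaces the two variant spellings, each with its score and its note on where the uncertainty lies, and leaves the decision about what counts as the same name with the researcher. The threshold of 0.7 is an arbitrary assumption and would need to be tuned, or exposed to the user, for any real corpus.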
I have long been inspired by the work of historians on the phenomenon known as epistemicide, the systematic marginalisation to the point of extinction of certain ways of creating knowledge, which was particularly pronounced at the height of the long 16th century, with its many instances of colonial expansion [33]. In our digital, quantitative age, with its keyword searches, artificial intelligences and statistical profiling algorithms, I worry we may be facing another great wave of this same phenomenon. Bruno Latour seems to harbour the same fear, proposing that we need to “...recalibrate, or realign, knowledge with uncertainty, and thereby remain open to a productive disruptive aspect of uncertainty” [34] (p. 245). As Kouw, Van Den Heuvel, and Scharnhorst point out, “uncertainty is often explained as a lack of knowledge, or as an aspect of knowledge that implies a degree of unknowability. Such interpretations can result in commitments to acquire more information about a particular situation, system, or phenomenon, with the hope of avoiding further surprises” [9]. But for the humanist, the joy of discovery, and of reaching across a gulf of time and text to connect with others, is a surprise one could never wish away. The authors of this passage go on to say that we need to understand and appreciate “how uncertainty can be a source of knowledge that can disrupt categories that provide epistemological bearing” [9]. If our attempts to assist researchers to manage uncertainty with digital tools are to succeed, we must be ever mindful of this.
Historian Christina Larner sounded a warning bell as early as 1984 that “Inadequate data [does] not become scientific information simply by virtue of being processed through a computer” [35]. This does not mean, however, that uncertain historical sources cannot be made into knowledge with the assistance of a computer in a manner consistent with humanistic ways of knowing, and it is toward this goal we must strive.