Strategies and Recommendations for the Management of Uncertainty in Research Tools and Environments for Digital History
Abstract
1. Introduction
2. The Nature of Humanities Research Questions and Processes
“My project examines how the rural-urban divide shaped Habsburg Austrian society’s experience of the war from about 1915.”
“I want to investigate the relationship between the Bec-Hellouin Abbey, in Normandy, and other monasteries, priory, archbishopric and the kingdom of England, from its foundation to the XV century.”
“I wish to examine the ways in which the Ottoman empire and Islam were perceived by the political (liberal and Catholic) elites in the Slovenian lands of the Habsburg monarchy in the decade before the outbreak of the First World War.”
3. Defining Uncertainty in the Humanities
- Inherent uncertainty: Farge writes that “the words of those ensnared actors contain perhaps more intensity than truth” [11] (p. 27). This indicates perhaps one of the most obvious sources of uncertainty. It is not exactly aleatory uncertainty (that is, irreducible uncertainty resulting from random occurrences), but it is as close as one gets when working with the artefacts of human activity, given that humans often struggle to understand even their own reasons for taking particular actions or decisions and the impact these actions and decisions may have. Somebody did something, or something happened: why did they do this? Why did it happen? Is this mark in the manuscript a doodle or a representation of a face? What was the inspiration behind this author’s use of this word, this image?
- Similarly, uncertainty arises due to errors, especially in cataloguing, but also in interpretation. I do not know where this document came from; is it in the right place/box? The date or origin given for this object in the finding aid or catalogue record does not make sense to me as an expert; what is the source of this gap? By revisiting evidence, I can see that a medical diagnosis made decades earlier was probably wrong, or that an earlier interpretation was based on biased (see definition below) or incomplete records. Known disagreements between existing records or interpretations could be seen as a subcategory of error, for one can assume that in cases of differing accounts, all or part of one or both would be based on incorrect assumptions or information.
- Gaps exist in the record, leading to partial, missing, perspective-limited or conflicting information. Farge writes: “both presence and absence from the archive are signs we must interpret in order to understand how they fit into the larger landscape” [11] (p. 71). This is also a very simple form of uncertainty to imagine. I have a letter, I know who received it, but who wrote it? I know the age of an object, but where did it come from? I know a document is from 1944 (or, more commonly, ‘5th Century,’ ‘medieval era,’ ‘around 1650’ etc.) but what specific date? One account claims that 20 people were killed in the skirmish, another account claims 200; which is accurate? Was this story written before or after the author heard about a particular event?
- Bias, that is, the intentional or unintentional incorporation into the archival record of a personal inclination toward or against an individual or group. Farge is exceptionally attuned to sources of bias in the archive. It can be introduced at any number of points in time and the research process, by the nature of the sources gathered and the authorities gathering them, or by the researchers’ own influences and knowledge. “The archives can always be twisted into saying anything, everything and its opposite” [11] (p. 97). I can verify that a statement was made and by whom, but how can I verify the veracity or intention of the speaker? I am working with a collection that supports a particular conclusion, but is there material excluded here (intentionally or unintentionally) that would contradict it? This woman’s writing (which we no longer have) is described as inferior by a male contemporary, but was that an aesthetic judgement or a gender-based one?
- Undefined. Uncertainty is seldom a property of a document (unless it is itself internally inconsistent) but rather of the person attempting to interpret it in a given context. Awareness of, and the importance given to, the discovery of a source of uncertainty matching any one of the categories described above is very likely to be dependent on the individual reader, their time, place and purpose for accessing a document, that is, the research question that drives their action. Because of this complex relationship between background knowledge, research question, and data source, a source or nature of uncertainty sometimes cannot be defined precisely: it consists of a sense of something wrong, without being able to say exactly why (a mismatch with tacit knowledge as opposed to the ‘happy accident’ of discovery that is serendipity). For example: the number of items found in a digital search seems too low, but I cannot be sure what the source of the problem is. I know a certain object is in this collection, so why can’t I find it in the catalogue? This visualisation (e.g., in a GIS) does not match my tacit understanding of a phenomenon. The author writes that his intention was to portray a character in a certain way, but that does not match my own interpretation. As Farge expresses it, “The heart of the matter is never immediately clear” [11] (p. 62).
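For developers, one way to read the categories above is as labels that travel with a record instead of being resolved away. The following is a minimal sketch in Python under that assumption; the class and field names (`UncertaintyKind`, `Claim`, `Annotation`, `UncertainField`) are hypothetical and do not come from any project discussed here, and the ten-year window used for ‘around 1650’ is an arbitrary illustrative choice.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class UncertaintyKind(Enum):
    """The five categories from the list above (labels are illustrative)."""
    INHERENT = "inherent"
    ERROR = "error"
    GAP = "gap"
    BIAS = "bias"
    UNDEFINED = "undefined"


@dataclass
class Claim:
    """One assertion about a source, with who or what made it."""
    value: object
    asserted_by: str  # e.g. a catalogue record, an account, a researcher


@dataclass
class Annotation:
    """A researcher's note that a field is uncertain, and why."""
    kind: UncertaintyKind
    note: str


@dataclass
class UncertainField:
    """A field that keeps every competing claim instead of picking one."""
    name: str
    claims: List[Claim] = field(default_factory=list)
    annotations: List[Annotation] = field(default_factory=list)

    def is_contested(self) -> bool:
        # More than one distinct value means the sources disagree.
        return len({repr(c.value) for c in self.claims}) > 1


# Two accounts of the same skirmish, both preserved with their origin.
casualties = UncertainField(
    name="people killed in the skirmish",
    claims=[Claim(20, "account A"), Claim(200, "account B")],
    annotations=[Annotation(UncertaintyKind.GAP, "accounts conflict")],
)

# A date known only as 'around 1650', kept as a range rather than one year.
date_written = UncertainField(
    name="date written",
    claims=[Claim((1640, 1660), "finding aid")],
    annotations=[Annotation(UncertaintyKind.GAP, "'around 1650' in the finding aid")],
)

print(casualties.is_contested())    # True
print(date_written.is_contested())  # False
```

The design choice worth noting is that nothing here ranks or resolves the competing claims; the structure only records that they exist and where they came from, leaving interpretation to the researcher.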
4. Motivations and Mechanisms for Managing Uncertainty
5. Humanities, Uncertainty and the Digital
6. Productive and Unproductive Management of Uncertainty in Humanities Research
- provide access to context and provenance. As Kouw, Van Den Heuvel, and Scharnhorst state: “Metadata provide context, but the question of whose context is particularly contentious in the humanities” [9]. Systems that capture and make it possible to explore the provenance of data streams, the variety and richness of their contexts, who has contributed to them, and what they may have been produced in proximity to could greatly enhance a researcher’s ability to overcome inherent weaknesses in a source. Although they may not be the most technically interesting examples, genetic editions of texts, such as the Beckett Digital Manuscript Project [24], can still be an inspiration to developers seeking to work with humanities research materials. By maintaining and making transparent the development of an eventual ‘final’ text, such editions reveal provenance and build user confidence by enabling what the system displays at one level to always be queried and checked on another. Delivering this for historical sources may be a more difficult challenge, however, as the material required to form a useful corpus is likely to be more dispersed, loosely structured and larger overall.
- do not focus on an unrealistic ‘single source’ model. Researchers not trained in the humanities may assume that deeply interrogating a single source is a norm for the humanities, as it may be in other disciplines. While this does occur, knowledge of that single source must be supported by corroborating evidence from elsewhere. While there are some emerging examples of powerful single source corpora for humanistic research (such as historic newspapers, social media, or parliamentary records), humanists will still long maintain an inconvenient tendency to “[draw] on all imaginable sources of evidence” [17] or at least deploy a strategy of “triangulation” [15] (p. 361), a fact that should not just be accepted, but celebrated. The SESHAT project’s use of linked open data to build multivocal datasets able to resolve or document the origins of competing hypotheses without prejudging their resolution has been a leader in this respect [25].
- support the development of more precise vocabularies for expressing uncertainty. As all of the articles surveyed above pointed out, in one way or another, the language available for talking about uncertainty in the humanities is not particularly precise or clear. Digital environments should at all costs avoid implying certainty, in the form of a clear and unequivocal single accepted interpretation, where none exists (something the eSAD project achieves by presenting strokes, not letters). Developers can use a similar process to investigate what the variations in usage are, and how they might be flexibly incorporated. It should be borne in mind while doing this that the possible gradations may not lie in the obvious place, that is, on a scale from more to less uncertain. An equivalent of the statistician’s standard deviation that is clear enough to gain community acceptance may exist on the axis of how and where uncertainty enters a system (as seen very clearly in the PROVIDE DH scenarios), or may need to be aligned to specific research questions: one size will likely not fit all. Achieving this on a generic level may be challenging, but many good examples exist where specific challenges have been met in ways that map to the specifics of research domains and questions, such as the identification of elements on ancient coins [26] and some aspects of temporal-spatial encoding [27].
- focus on interoperability and ‘comparative legibility’ in corroborating sources. This is a corollary to the recommendation above against a ‘single source’ model: if a single source will never be enough, then finding new ways to move fluidly between sources would be the far greater gain. This does not mean pulling all data into a universal federated information bank, a process that would inevitably lose context and flatten complexity. Rather, the goal would be to enable sources that are siloed to be combined and compared more easily than they are now, that is, more easily than as a linearly accessed succession of searches operating in different environments with different affordances and norms of interaction in each. One might think of the Orange tool chain platform [28] as an inspiration for this kind of linked, rather than siloed or merged, experience.
- provide a ‘fuzzy search’ that can reduce false negatives, such as is incorporated in the excellent interface of the Transkribus handwriting recognition tool [29]. While such a capacity will not solve all of the problems uncertainty about data might instil, it will at least promote an interrogability that may increase confidence.
- Interrogability of processing must also become more the norm. In a universe where the majority of humanistic sources were textual, a methodological source (such as a work of critical theory) could be read, evaluated and then used or discarded. Only the foolhardy scholar would attempt to use a source s/he had not read. And yet, in the digital age, such equivalent tools for framing arguments and approaches, from topic modelling to stylometric tools, do not always explicitly expose their ‘lines of argument,’ their ‘thought processes.’ Instead, they often seem to run the risk of promoting the maxim ‘garbage in, gospel out,’ leaving the user who may not be aware of the limitations of the tool to accept the authoritative voice of its output. DARPA’s research into “Explainable AI” [30] points in a direction that could provide models for this, despite the additional cost and limitations such an approach may place on the technology deployed.
- explore embodied practices. Part of the strength of the humanistic research process, and its adaptation to the heterogeneous and uncertain sources it relies upon, comes from its multimodality. The embodied elements of humanistic research practices are highly complex, enabling a very subtle management of time and space, of kinds of knowledge and of complementary sources, which is antithetical to work on a single platform or device. A better match with these strategies would create a far more fluid relationship between researchers’ needs and digital tools and environments. Collaborations with novel physical infrastructures for the digital humanities such as the HumLab at Umeå University in Sweden could lead to fresh approaches to resolving complex knowledge creation problems such as that of uncertainty in the humanities [31].
- enable trust. Digital tools will speed up some aspects of a humanities researcher’s process, but other aspects will almost certainly defy interrogation by digital methods. According to Tenopir et al.’s substantive report on trust and authority in scholarly communications, the top criteria scholars used to judge their sources were “criteria … associated with personal perusal and knowledge, the credibility of the data and the logic of the argument and content” [32]. All of these processes are ones that the digital presentation of source material has the potential to impede, by reducing perusability, removing context, restructuring an existing logic framework, or indeed presenting data stripped of the interpretive narrative meant to accompany it. To disrupt these elements is to disrupt the internalised, tacit verification system of the humanist, without which sources and tools are of no use at all. The hugely successful development of the Text Encoding Initiative (or TEI, [20]), a standard that has not only proved able to capture the complexity of humanistic data but has also gained widespread acceptance as sympathetic to humanities methods, shows that enabling this kind of trust is possible.
- finally, and most importantly, do not try to remove uncertainty, but signal where it is. Humanists will never have certainty, because the sources, and the humans who created them, are flawed. Because of this, honing a human instrument able to draw conclusions under these circumstances is a value and a process humanists hold dear. There are many things a researcher has to learn to deal with just by ‘slogging through’ them; this is a part of the discovery and learning process. But, properly deployed, the digital can contribute a lot to what a humanist does with the uncertainty they have, and how they move toward a greater and better-grounded confidence in their interpretations, maintaining the all-important aspects of provenance as a manner by which to preserve and communicate uncertainty while reducing the dependence on potentially biased methods just below this surface.
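Two of the recommendations above, a ‘fuzzy search’ that reduces false negatives and the signalling rather than removal of uncertainty, can be illustrated together. The sketch below is a minimal example under stated assumptions: it uses `SequenceMatcher` from Python’s standard `difflib` module as a stand-in similarity measure, it is not a description of how Transkribus or any other tool cited here implements search, and the function name `fuzzy_search`, the threshold of 0.7 and the sample catalogue entries are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def fuzzy_search(query: str, entries: List[str],
                 threshold: float = 0.7) -> List[Tuple[str, float]]:
    """Return entries similar enough to the query, each with its score."""
    results = []
    for entry in entries:
        # ratio() is 1.0 for identical strings and falls as they diverge.
        score = SequenceMatcher(None, query.lower(), entry.lower()).ratio()
        if score >= threshold:
            results.append((entry, round(score, 2)))
    # Best matches first; the score stays attached to every hit.
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Hypothetical catalogue with a variant spelling of the same place name.
catalogue = ["Bec-Hellouin", "Bec Helluin", "Le Bec", "Canterbury"]
hits = fuzzy_search("Bec-Hellouin", catalogue)

for name, score in hits:
    # Flag approximate matches instead of presenting them as certain.
    label = "exact" if score == 1.0 else "approximate"
    print(f"{name}\t{score}\t{label}")
```

An exact-match search over this catalogue would silently miss the variant spelling. Here it is returned, but marked as approximate and accompanied by its score, so the researcher can see why it appeared and decide whether to trust it.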
7. Conclusions
Funding
Acknowledgments
Conflicts of Interest
References
- Daston, L. Exempla and the Epistemology of the Humanities. Available online: https://www.youtube.com/watch?v=8JlXfIyqsG4 (accessed on 15 August 2019).
- Franke, W. Involved Knowing: On the Poetic Epistemology of the Humanities. Humanities 2015, 4, 600–622.
- Antonijević, S. Amongst Digital Humanists: An Ethnographic Study of Digital Knowledge Production; Palgrave: Basingstoke, UK, 2015.
- Ribes, D.; Baker, K. Modes of Social Science Engagement in Community Infrastructure Design. Communities Technol. 2007, 107–130.
- CENDARI Project Team. Domain Use Cases. Available online: http://www.cendari.eu/sites/default/files/CENDARI_D4.2%20Domain%20Use%20Cases%20final.pdf (accessed on 15 August 2019).
- Masson, E. Humanistic Data Research: An Encounter between Epistemic Traditions. In The Datafied Society: Studying Culture through Data; Schäfer, M.T., van Es, K., Eds.; Amsterdam University Press: Amsterdam, The Netherlands, 2017.
- Petersen, A. Simulating Nature: A Philosophical Study of Computer-Simulation Uncertainties and their Role in Climate Science and Policy Advice; Uitgeverij Maklu: Antwerpen, Belgium, 2006.
- NASA’s Earth Observing System Data Information System (EOS DIS). Available online: https://science.nasa.gov/earth-science/earth-science-data/data-processing-levels-for-eosdis-data-products (accessed on 15 August 2019).
- Kouw, M.; Van Den Heuvel, C.; Scharnhorst, A. Exploring Uncertainty in Knowledge Representations: Classifications, Simulations, and Models of the World. In Virtual Knowledge: Experimenting in the Humanities and the Social Sciences; Wouters, P., Beaulieu, A., Scharnhorst, A., Wyatt, S., Eds.; MIT Press: Cambridge, MA, USA, 2013; pp. 89–125.
- Brugnach, M.; Dewulf, A.; Pahl-Wostl, C.; Taillieu, T. Toward a Relational Concept of Uncertainty: About Knowing Too Little, Knowing Too Differently, and Accepting Not to Know. Ecol. Soc. 2008, 13.
- Farge, A. The Allure of the Archive; trans. by Thomas Scott-Railton; Yale University Press: New Haven, CT, USA; London, UK, 2013.
- Bradley, R.; Drechsler, M. Types of Uncertainty. Erkenntnis (1975-) Spec. Issue Radic. Uncertain. 2014, 79, 1225–1248.
- Lewandowsky, S.; Pancost, R.D.; Ballard, T. Uncertainty as Knowledge. Philos. Trans. Math Phys. Eng. Sci. 2015, 373, 1–11.
- W3C Incubator Group Report. Uncertainty Reasoning for the World Wide Web. Available online: https://www.w3.org/2005/Incubator/urw3/XGR-urw3-20080331/ (accessed on 15 August 2019).
- Blau, A. Uncertainty and the History of Ideas. Hist. Theory 2011, 50, 358–372.
- Lavan, M. Epistemic Uncertainty, Subjective Probability, and Ancient History. J. Interdiscip. Hist. 2019, 50, 91–111.
- Borgman, C. Big Data, Little Data, No Data; MIT Press: Cambridge, MA, USA, 2017.
- Tarte, S.M. Digitizing the Act of Papyrological Interpretation: Negotiating Spurious Exactitude and Genuine Uncertainty. Lit. Linguist. Comput. 2011, 26, 349–358.
- Sherratt, T. Towards A Manifesto for Tactical DH Research Infrastructure. Available online: https://www.youtube.com/watch?v=FL5pP2ysjU4 (accessed on 15 August 2019).
- The Text Encoding Initiative. Available online: https://tei-c.org/ (accessed on 15 August 2019).
- Voyant: See through Your Text. Available online: https://voyant-tools.org/ (accessed on 15 August 2019).
- Presner, T. Probing the Ethics of Holocaust Culture, in Fogu, Kansteiner, and Presner, History Unlimited; Harvard University Press: Cambridge, MA, USA; Available online: http://www.hup.harvard.edu/catalog.php?isbn=9780674970519 (accessed on 15 August 2019).
- Edmond, J.; Doran, M. Uncertainty, Digital Sources and Historical Research in the Early Modern Period; PROVIDE DH Internal Project Report; in press.
- Beckett Digital Manuscript Project. Available online: https://www.beckettarchive.org/ (accessed on 15 August 2019).
- SESHAT Data Bank. Available online: http://seshatdatabank.info/methods/ (accessed on 15 August 2019).
- Tolle, K.; Wigg-Wolf, D. Uncertainty? ECFN Meeting 2014—Basel Goethe University 2014. Available online: http://ecfn.fundmuenzen.eu/images/Tolle_Wigg-Wolf_Uncertainty.pdf (accessed on 15 August 2019).
- Martin-Rodilla, P.; Gonzalez-Perez, P. Representing Imprecise and Uncertain Knowledge in Digital Humanities: A Theoretical Framework and ConML Implementation with a Real Case Study. In Proceedings of the Sixth International Conference on Technological Ecosystems for Enhancing Multiculturality ACM, Salamanca, Spain, 24–26 October 2018.
- Orange. Available online: https://orange.biolab.si/ (accessed on 15 August 2019).
- Transkribus. Available online: https://transkribus.eu/Transkribus/ (accessed on 15 August 2019).
- Gunning, D. Explainable Artificial Intelligence (XAI). Available online: https://www.darpa.mil/program/explainable-artificial-intelligence (accessed on 15 August 2019).
- Emerson, L. Humanities Infrastructure, Inflection and Imagination: An Interview with Patrick Svensson. (24 April 2018). Available online: https://whatisamedialab.com/tag/humlab/ (accessed on 15 August 2019).
- Tenopir, C.; Nicholas, D.; Watkinson, A.; Volentine, R.; Allard, S.; Levine, K.; Tenopir, C.; Herman, E. Trust and Authority in Scholarly Communications in the Light of the Digital Transition, Final Report; University of Tennessee: Knoxville, TN, USA; CIBER Research Ltd.: Berkshire, England, 2013.
- Hall, B. Beyond Epistemicide: Knowledge Democracy and Higher Education. Available online: http://hdl.handle.net/1828/6692 (accessed on 15 August 2019).
- Latour, B. Re-assembling the Social: An Introduction to Actor-Network-Theory; Oxford University Press: Oxford, UK, 2005.
- Larner, C. Witchcraft and Religion. The Politics of Popular Belief; Oxford University Press: Oxford, UK, 1984.
© 2019 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Edmond, J. Strategies and Recommendations for the Management of Uncertainty in Research Tools and Environments for Digital History. Informatics 2019, 6, 36. https://doi.org/10.3390/informatics6030036