Capturing the Silences in Digital Archaeological Knowledge

The availability and accessibility of digital data are increasingly significant in the creation of archaeological knowledge with, for example, multiple datasets being brought together to perform extensive analyses that would not otherwise be possible. However, this makes capturing the silences in those data (what is absent as well as present, what is unknown as well as what is known) a critical challenge for archaeology in terms of the suitability and appropriateness of data for subsequent reuse. This paper reverses the usual focus on knowledge and considers the role of ignorance (the lack of knowledge, or nonknowledge) in archaeological data and knowledge creation. Examining aspects of archaeological practice in the light of different dimensions of ignorance, it proposes ways in which the silences, the range of unknowns, can be addressed within a digital environment and the benefits which may accrue.


Introduction
A new scientific paradigm in archaeology has recently been characterized: from Kristiansen's "Third Science Revolution" [1] (pp. 12-14), through Sørensen's "Scientific Turn" and its "new empiricism" [2] (p. 101), to Cunningham and MacEachern's discussion of archaeology's aspiration to become "big science" [3] (p. 630), for example. In such an environment, it can be argued that empiricism in archaeology has become resurgent (e.g., [2,4,5]), promoting knowledge and its acquisition through observation, experimentation, recording, and analysis. However, with the foregrounding of knowledge, ignorance, or simply the lack of knowledge, can become an absence, a silence, a void to be filled: "our understanding … is a delicate interplay of knowing and not knowing. However, because we are often aware of what we know, and rarely aware of what we do not, we tend to overemphasize the range and importance of our knowing. While the known and the knowable are minute portions of the unknown, it is the known we identify with 'reality'" [6] (p. 173).
In this way a focus on knowledge and the knowable promotes the primacy of data and the authority of "facts", and Sørensen argues that this new empiricism raises a number of challenges, not least in the way in which it "… generates an unhelpful return to the ethos of letting 'data speak for itself' … because… 'facts do not lie' and thus become associated with 'truth'" and a "perceived need to force scientific methods onto otherwise ambiguous archaeological research topics …" [2] (p. 102); (see also [5] and see [7] for a contrary view).
At the same time, an associated paradigm shift has been identified in archaeology's equivalent of the "datalogical turn" (e.g., [8]): a new ontological approach built on the availability of digital data and its automated adaptive algorithmic processing [4]. Archaeological knowledge is increasingly predicated on digital data, much of it digitized from material analogs but increasingly born digital from the outset. This digitalization or datafication of the archaeological domain opens new possibilities and perspectives in terms of access, reuse, and analysis of data in the creation of knowledge about the past. Underpinning this environment has been the creation of a wide variety of digital recording systems, the development and application of a broad range of analytical tools, and the establishment of digital infrastructures to store, categorize, and distribute data. However, the degree of standardization and conformity within these structures and associated analyses often disguises the diversity of archaeological data. For example, Huvila [9] (pp. 151-154) sees archaeological information as highly idiosyncratic, its fragmentary and incomplete character captured using a variety of standards and methodologies, and featuring multiple temporalities in terms of both the material remains and the documentation of them. Similarly, Martin-Rodilla [10] (pp. 33-34) characterizes archaeological data in terms of their temporal and spatial dimensions, their high degree of variability, their multidisciplinary nature, their preferentially textual focus, the vagueness of cognitive processes associated with reasoning around data, and limited studies about the visualization of data and the effects on knowledge generation.
Although none of these characteristics should be a surprise to archaeologists, they emphasize the complexities of archaeological data in relation to knowledge creation, underlined by Wylie's description of archaeology as a discipline defined by the challenges of working with gaps and absences in its primary data [11] (p. 204).
In a digital environment, however, it can seem as if this awareness is set aside: narrative subtleties are frequently lost as data are combined, mined, recycled, and repurposed. The amalgamation of multiple datasets, the stripping of context, the remoteness encouraged by algorithmic methods, and the embrace of the large-scale over small detail all combine to provide a challenging environment for archaeological knowledge creation (e.g., [4] (p. S13)). In such circumstances, what is needed is a means of characterizing and capturing the absences in archaeological data, rather than setting them aside. The rise of new empiricism combined with the growth of datafication in archaeology makes it an appropriate time to reconsider our approach to archaeological knowledge and its shadowy data [11].

Current Practice
Aspects of these absences are recognized, most evidently in the debates surrounding the importance of providing contextual information alongside data. The move to open and reproducible science in archaeology has highlighted the importance of context as a means of maximizing data reuse potential through providing information about the circumstances of discovery and the methods used in data retrieval, categorization, and analysis, for example (e.g., [12][13][14]). Metadata is understood to address this by providing descriptions of data that enable a common context to be provided across different datasets. Typically, however, most metadata are focused on the needs of discovery (the title of the dataset, its nature, and location, authorship, rights, sources, etc.). Consequently, metadata primarily serves the purpose of reuse only insofar as it facilitates the search for potential datasets to reuse. Beyond metadata, the concept of paradata has been used to describe what might otherwise be termed "data provenance": the decisions and processes surrounding the collection, recording, modification, and processing of data (e.g., [15][16][17]). However, the precise constituents of paradata are ill-defined. For example, the Seville Principles for virtual archaeology simply refer to paradata in terms of the need for clarity, conciseness, and availability, alongside the importance of providing as much information as possible [18] (p. 280). Elsewhere, paradata has been described as "documentation about equipment, protocols, procedures for collecting and processing data, and experimental or laboratory conditions of data handling" [19]. In archaeology, paradata has been characterized simply as the recording of "interpretative decisions" [20] (p. 121) or more explicitly in terms of "detailed information about the excavation of the remains, the analyst's training and expertise, where analysis took place, which methods and reference materials were used, how the dataset was modified, etc." [12] (p. 45). What paradata precisely consists of is not specified further, and the way in which paradata are prepared and presented is equally loosely defined. For example, it may consist of a document, or a separately published peer-reviewed paper, or shared online and linked to the dataset [12] (p. 45).
Three key aspects may be drawn from this brief overview of paradata. First, a seeming paradox of paradata is its potentially infinite regression. As Gant and Reilly [21] (p. 109) observe, the provenance of paradata is itself a subject of further meta/paradata, creating what they call layers of nested datasets of introspection, or "peridata". Similarly, Martin-Rodilla and Gonzalez-Perez [22] (p. 36) (see also [23] (pp. 181-189)) highlight that making metadata or "metainformation" the focus of study creates meta-metainformation, and argue that these "metastacks" underline that "metainformation is just a particular role that information may play in specific scenarios" [22] (p. 37). In other words, what is meta/paradata to one researcher may be data to another, and vice versa. Secondly, the loose definition and narrative form of most archaeological paradata lend themselves to differential interpretations and implementations by different projects and different authors. In the process, an opportunity is lost to use the paradata in an explicitly analytical way, or even to retrieve datasets according to qualities embedded in their paradata. This is despite the existence of more formal models for characterizing paradata, including the W3C model for data provenance [24] and the CRMdig extension to CIDOC CRM designed to capture paradata for digital data (e.g., [25,26]). Thirdly, the character of paradata remains relatively limited. While it extends what is traditionally seen as metadata, especially in its more structured forms it retains a relatively technical aspect with a strong focus on the digital background to digital data. For example, it focuses on the systems, formats, processing, and analytical methods used in creating the digital data, although it may not include the code or scripts actually used [27]. While perhaps inevitable, this technological emphasis means that nontechnological, more human-centered decisions and actions may be unrepresented. 
This might reasonably be expected to affect approaches to the data and consequently skew results in unforeseen and unrecognized ways. For example, a blind analysis of an archaeological faunal dataset [13] which lacked background contextual and methodological information (provenance, relationships, standards, etc.) demonstrated that different analysts arrived at different conclusions from the same data. For instance, analysts responded differently in terms of their aggregation of data categories and in their assessments of the reliability, consistency, and comparability of the data. These kinds of absences are characterized by Hand [28] as "dark data": data that have not been recorded and remain largely hidden but which may have a major impact on subsequent use and interpretation. It is true that the problem of gaps and absences in the data is not new: it has always been a challenge to reuse data (digital or otherwise) that have been collected and presented by others. However, the digital nature of data exacerbates this situation through their availability and accessibility, increasing the knowledge distance between data producers and data reusers [29]. This makes it even more important to identify the silences and, once identified, to capture them.

Introducing Ignorance
One approach to the question of these silences is to switch from the traditional focus on knowledge and knowledge creation to one which considers what we do not know: essentially to examine our ignorance. Considering ignorance can be a valuable means to understanding knowledge since "… we often understand something by contrasting it with what it is not. Thus, we might get a better grip on knowledge by analyzing ignorance." [30] (p. 57). Similarly, ignorance rather than knowledge can be seen to drive science [31], and "Awareness of ignorance occasions inquiry, and fuels it." [6] (p. 172). This is not an approach that has been extensively investigated until recently (and not at all in archaeology) but beyond the obvious paradox, the desire to know what we do not know [32] (p. 18), the approach offers a number of advantages. For example, Frickel identifies a series of strategies which can be used in combination to identify and explain the production of knowledge gaps, to "systematically investigate temporal and spatial processes of ignorance production and their institutionalization within and among different social domains." [33] (p. 272). These include an "inferential approach": "When actors understand themselves to be operating from a position of not-knowing, they plan for and talk about what they do not know and they act accordingly." [33] (p. 272). This inferential approach to ignorance sits well with an understanding of archaeological practice, such as in developing an excavation strategy for instance, and underlines that ignorance can act positively to guide future action.
If approaches to understanding ignorance accord with some perspectives on archaeological knowledge and new data-based empiricism (e.g., [5,34]), they are also seen to relate to information technologies. For example, while computers are typically seen as a means of increasing knowledge and thereby reducing ignorance, they may also enhance ignorance through misinformation and restricting access (e.g., [35] (p. 413)). Furthermore, the act of digitalization or datafication may also propagate ignorance: "Not only do our information systems limit information, they alter it to make it processable" [6] (p. 174). This highlights two key problematic areas. First, information technologies invisibly manipulate our access to knowledge and consequently create ignorance. For instance, the filter bubbles of online search tools, recommendation systems, social media, etc., tailor themselves to our interactions and shape what we subsequently encounter, while algorithmic big data and machine learning tools frequently disguise human biases (e.g., [36]). This is not something that has been addressed within an archaeological context in any detail, but it seems clear that, for example, digital archives providing gateways to archaeological data will influence discovery, retrieval, and subsequent analysis in various ways ([4] (p. S12); [37] (p. 338)). Secondly, the digitalization of data is itself reductive. To make data computable, accessible, and retrievable requires the structuring of recording systems, the atomization of selected attributes, the compression of information into largely predetermined categories, and so on. Much of this remains hidden, leaving users ignorant of the black-boxed structures, configurations, and algorithms embedded in the digital cognitive artifacts employed in archaeological analysis [38]. Together, these two factors support the use of the concept of ignorance as a means of considering the digital creation and manipulation of archaeological knowledge.
As Frazier [39] (p. 1) observes, information science is typically defined in terms of what is seen, captured, and known (data, information, and knowledge often represented as the DIK(W) pyramid) whereas ignorance, or agnotology (the cultural production of ignorance and its study [40]), can be a valuable tool in addressing the ambiguities surrounding knowledge.
Ignorance is not something that has been explicitly discussed within archaeology beyond perhaps its mitigation (e.g., [41] (pp. 128-129)), although it is related to more familiar discussions surrounding the uncertainty, vagueness, and ambiguity of archaeological data (e.g., [5,34]). The exception, however, is Wylie's [42] discussion of ignorance within an archaeological context which is not widely cited in the archaeological literature. Wylie identified two aspects of archaeological ignorance. First are epistemological factors, where empirical data is lacking or has not survived, or the technologies to recover, analyze, and interpret the data do not yet exist [42] (p. 185). For example, Sørensen warns against archaeological treatments of empirical evidence that "categorically reduce or eliminate the role of absent data and the potential vagueness or ambiguity of data in the scientific pursuit of the human past" [2] (p. 107). Secondly, Wylie identifies ontological constraints, particularly in dealing with the complexities of human actions and intentions past and present [42] (p. 185). For example, Proctor [40] (p. 22) points to the different epistemic traditions of field archaeologists and linguistic archaeologists (philologists) concerning differences in understanding the costs of knowledge and ignorance in relation to artifact provenance. Wylie argues that the combination of these epistemological and ontological factors forces archaeologists seeking to reuse data into routinely undertaking what amounts to a secondary retrieval of data: "In the process, they find not only that the empirical legacy of 150 years of archaeological work is rife with gaps and inconsistencies, a reflection of evolving retrieval and recording practices, but also that it has been badly compromised by poor storage conditions, sometimes lost altogether, or dispersed among institutions in ways that greatly complicate any systematic use of existing records or collections." [42] (p. 195). 
She points to "the creative use of new computer technologies" [42] (p. 195) as a means of addressing these problems, but it is argued here that digital tools are equally likely to compound or contribute to such issues in a way that may not always be entirely helpful.

Characterizing Ignorance
There are different dimensions to ignorance, and several approaches to categorizing them. For instance, Proctor [40] distinguishes between ignorance as a native state, where ignorance is the common condition, overcome through the acquisition of knowledge [40] (pp. 4-6); ignorance as selective choice or bias, recognizing that any investigation focuses on certain aspects and omits others [40] (pp. 6-8); and ignorance as a strategic ploy, where it is actively cultivated to deceive, withhold information, misdirect, or deny access to knowledge [40] (pp. 8-10). In certain respects, Wylie's [42] characterization of archaeological ignorance in terms of epistemological factors overlaps with Proctor's native state ignorance, while her ontological factors overlap with Proctor's selective ignorance. Whether archaeology actively employs ignorance as a strategic ploy would seem more controversial, although the degradation of open spatial data to reduce the risk of site looting would be one clear instance. Wylie's categories are also akin to Smithson's distinction between informational ignorance, where ignorance arises through errors concerning facts, and epistemological ignorance, errors in the processing of facts [43] (pp. 6-7).
Probably the most popularly known characterization of ignorance is that of Donald Rumsfeld, when as US Secretary of Defense he referred to known knowns, known unknowns, and unknown unknowns in response to a question about whether evidence existed or not (e.g., [44]). As DeNicola [32] (p. 40) observes, Rumsfeld omitted the category of unknown knowns, things we do not know we know, while Kerwin [6] (pp. 178-182) had previously outlined similar categories of ignorance. Leaving aside known knowns as explicitly representing knowledge rather than ignorance, four specific aspects of ignorance can be usefully addressed in terms of unknowns (summarized in Table 1):
1. Things we know we do not know [6] (p. 179). These are the known unknowns representing a form of conscious ignorance [43] (p. 6) or specified ignorance [45] (p. 7). This may include nonknowledge (which may be taken into account in the future) and negative knowledge (where what is not known is considered unimportant) [46] (pp. 59-61).
2. Things we do not know we do not know [6] (pp. 179-180). These are the unknown unknowns, otherwise referred to as unrecognized ignorance [45] (p. 10), ignorance-squared (i.e., ignorance of ignorance) [47] (p. 157), or nescience [46] (p. 66), for example.
3. Things we think we know but do not know [6] (p. 180). These are characterized by Smithson as erroneous cognition or simply errors which may arise through distortions or biases, or through incomplete, uncertain, or missing information [43] (pp. 7-8).
4. Things we do not know we know [6] (pp. 180-181). These are unknown knowns, things we do not realize we know, or which we operate through custom or instinct and are rarely articulated, otherwise known as tacit knowledge. These need to be articulated to become known knowns.
How does the characterization of known unknowns, unknown unknowns, mistaken knowns, and unknown knowns affect our approach to digital data and the creation of archaeological knowledge?
Changing the emphasis away from knowledge switches the focus onto the implications of the data we know are missing, data we do not know we are missing, data which might have existed but has not survived, data which might have existed but has not been collected, data which we do not know is selective or the reasons for its selection, and data we do not know is unreliable because of issues of measurement, uncertainty, ambiguity, and bias. It restores and reinforces an image of partial knowledge against what can seem an almost fetishistic perspective of data as a resource to be mined algorithmically with its volume overcoming any possible issues with reliability or quality [4]. While the digitalization of data is not the sole cause of the reduction of information through its archaeological capture, it is nevertheless a contributory factor and its own shortcomings are easily disguised in the presentation of the resulting digital data since many of these features are absent from current metadata or paradata.
Consideration of (un)known (un)knowns can, therefore, highlight the challenges of data creation and subsequent use, but it also sheds light on our treatment and attitudes toward those data. For example, a study looking at the reuse of archaeological field data [14] suggested that although the lack of contextual information is frequently cited as a shortcoming of archaeological data, nevertheless data are reused as ways are found of circumventing the absences via workarounds. In this way, from a position of conscious ignorance, we proceed to combine things we think we know but do not know with unknown unknowns, which themselves are compounded by the unknown knowns of the original data creators. Thinking about the range of unknowns, therefore, leads to a series of questions about data. For example, what is omitted or missing? Are specific types of data not included? Why are they not included? What aspects do we habitually overlook or set aside in our practice? Where do the ambiguities, uncertainties, inaccuracies, and confusion lie? What are the consequences of these absences on our knowledge? In this light, Table 1 would suggest that there are two primary areas to be addressed: handling the known unknowns (and, by association, the mistaken knowns), and dealing with the unknown knowns.
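As an illustration only, the four kinds of unknown could be attached to individual dataset fields so that the questions above become something that can be recorded and queried. The following is a minimal sketch under stated assumptions: the `Silence` record, its attribute names, and the example annotations are all hypothetical and are not drawn from any existing metadata or paradata standard.

```python
from dataclasses import dataclass
from enum import Enum


class Unknown(Enum):
    """The four kinds of unknown discussed in the text."""
    KNOWN_UNKNOWN = "things we know we do not know"
    UNKNOWN_UNKNOWN = "things we do not know we do not know"
    MISTAKEN_KNOWN = "things we think we know but do not know"
    UNKNOWN_KNOWN = "things we do not know we know"


@dataclass(frozen=True)
class Silence:
    """One documented absence attached to a dataset column (hypothetical schema)."""
    column: str
    kind: Unknown
    note: str


def silences_by_kind(silences):
    """Group column names under each kind of unknown, preserving input order."""
    grouped = {kind: [] for kind in Unknown}
    for silence in silences:
        grouped[silence.kind].append(silence.column)
    return grouped


# Illustrative annotations only; the column names and notes are invented.
annotations = [
    Silence("mesh_size", Unknown.KNOWN_UNKNOWN, "sieving not reported"),
    Silence("sex", Unknown.MISTAKEN_KNOWN, "inferred from grave goods"),
    Silence("sampling", Unknown.UNKNOWN_KNOWN, "house practice, never written down"),
]
grouped = silences_by_kind(annotations)
print(grouped[Unknown.KNOWN_UNKNOWN])
```

By definition an unknown unknown cannot be annotated in advance, so in a scheme of this kind that category would only be filled retrospectively, once a later reuser has recognized the gap.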

On Known Unknowns
Any archaeological study incorporating the reuse of data encounters a range of unknowns, especially when analyzing multiple datasets that have been captured by different people separated in time and space. Common problems that are typically encountered include variable recovery methodologies, multiple and/or idiosyncratic recording methods, and questions of trust in the identification and classification of cases, for example. These are not restricted to digital encounters but have been a feature of meta-analyses for many years and constitute a set of known unknowns and mistaken knowns when dealing with multiple datasets from different sources. For example, studies of burial data without direct access to the physical evidence typically need to resolve differential methods of aging and sexing individuals, whether through skeletal characteristics (assuming these can be taken at face value) or characteristic grave goods (assuming they are gender-associated), or both, or neither (e.g., [48] (pp. 572-573); [49] (pp. 343-345); [50] (p. 2)). To resolve known unknowns such as these, data about age at death and sex are frequently fitted to general classes (such as neonate, infant, subadult, etc.) because of the variance in recording across different data sets (see also [51] (p. 540) for example), although such reclassification risks the creation of mistaken knowns. Similarly, identification of animal specimens may be recorded at different levels: from simple presence/absence to estimates of quantity recorded in different ways (e.g., [52] (pp. 4-5); [53] (p. 8); [54] (p. 5)), requiring the analyst to detect whether the absence of certain categories is genuine or simply that they were not considered (e.g., [54] (p. 5)). For a given analysis, datasets may need to exceed minimum estimates to be included (e.g., [53] (p. 8)), or quantities are omitted altogether (e.g., [54] (p. 6)), or the study restricted to the most abundant taxa (e.g., [55] (p. 2)).
Elsewhere, differential recovery methodologies (whether screening is used, mesh size, collection and retention procedures for artifacts and ecofacts, and so on) affect how those data can subsequently be used (e.g., [52] (p. 5); [53] (p. 8); [54] (p. 7)). Again, data may be set aside altogether, or certain fields dropped because they are missing from some datasets, or ultimately the problem ignored altogether because differential factors are seen to be outweighed by the quantity of data. The latter is a common claim in "big data" analyses (e.g., [51] (p. 544); [53] (p. 1)) although the statistical assumptions behind this kind of approach are open to challenge (e.g., [4] (p. S13)).
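The reclassification and absence problems described above can be sketched in code. The example below is a hedged illustration, not a method taken from the studies cited: the `AGE_CLASSES` lookup, the status labels, and both function names are assumptions introduced here. The point it illustrates is that the original value is kept alongside the general class, and that a taxon which was looked for and not found is distinguished from one that was never considered.

```python
# Hypothetical lookup from source-specific age terms to broad classes.
AGE_CLASSES = {
    "neonate": "neonate",
    "newborn": "neonate",
    "infant": "infant",
    "juvenile": "subadult",
    "adolescent": "subadult",
    "subadult": "subadult",
    "adult": "adult",
}


def harmonize_age(raw):
    """Map a recorded age term to a broad class without discarding the original."""
    if raw is None or not raw.strip():
        return {"original": raw, "age_class": None, "status": "not_recorded"}
    key = raw.strip().lower()
    if key in AGE_CLASSES:
        return {"original": raw, "age_class": AGE_CLASSES[key], "status": "reclassified"}
    # The term exists but cannot be mapped: keep it visible rather than guess.
    return {"original": raw, "age_class": None, "status": "unmapped"}


def taxon_status(recorded_taxa, taxa_considered, taxon):
    """Distinguish a genuine absence from a taxon that was never looked for."""
    if taxon in recorded_taxa:
        return "present"
    if taxa_considered is None:
        return "unknown"         # the source does not say what was considered
    if taxon in taxa_considered:
        return "absent"          # looked for and not found
    return "not_considered"      # outside the scope of the original recording


print(harmonize_age("Juvenile"))
print(taxon_status({"cattle"}, {"cattle", "fish"}, "fish"))
print(taxon_status({"cattle"}, None, "fish"))
```

Retaining the `original` value and an explicit `status` means that a later reuser can see where a general class was imposed, which is where a mistaken known is most likely to have been created.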
These and similar kinds of known unknowns can be handled in various ways to address the problems and biases in the data but are most commonly addressed through data cleansing. This constitutes a set of often undefined processes that seek to identify where the discrepancies lie, determine whether or not they can be resolved, and take the necessary action (correction, recategorization, or omission). Such issues may consist of syntactic anomalies (e.g., irregularities in the presentation of the data), semantic anomalies (e.g., handling duplicate or contradictory data), and coverage anomalies (e.g., missing or incomplete data) [56] (pp. 6-7). The experience of data cleansing is inevitably iterative, with multiple stages back and forth as problems with the data are often revealed only once an analysis has been attempted, as interesting insights are separated from data anomalies, as data are added or removed in an attempt to resolve problems (e.g., [57] (pp. 272-274)) and as mistaken knowledge comes to light. These iterations are often poorly documented, if at all, yet this kind of documentation is an important component of ensuring the accountability of the researchers and the reusability of the data [27,58]. While there have been considerable efforts in recent years to make the collection of data documentation as easy as possible (e.g., [59] (p. 231)), this primarily relates to metadata for data discovery rather than paradata capturing the set of inferences, actions, and modifications that lie behind the data. In many respects, this is a far more complex problem. Capturing the provenance of data as a continuous chain of history back to its origins (e.g., [60] (p. 70)) in a manner which is automatically or at least easily captured and maintained remains an as-yet unsolved problem, with solutions limited to specific instances: for example, EXIF tags embedded in images taken with digital cameras, or descriptive metadata summarizing geoprocessing variables and procedures embedded in GIS geodatabases (e.g., [61] (pp. 20-21)). The problem for a data reuser is that in their absence such details shift from known unknowns to unknown unknowns over time and between practitioners.
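One way to keep the iterations of data cleansing from slipping into silence is to record each step as it is taken. The following is a minimal sketch, assuming a simple in-memory log: the `CleansingStep` and `CleansingLog` classes and their attribute names are hypothetical, and the structure is not an implementation of the W3C provenance model or of CRMdig. It uses only the anomaly types and actions named in the text.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

# Vocabulary taken from the anomaly types and actions named in the text.
ANOMALY_TYPES = {"syntactic", "semantic", "coverage"}
ACTIONS = {"correction", "recategorization", "omission"}


@dataclass
class CleansingStep:
    """One documented cleansing decision (hypothetical paradata record)."""
    record_id: str
    anomaly: str
    action: str
    before: object
    after: object
    reason: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self):
        # Reject undocumented categories so that the log stays queryable.
        if self.anomaly not in ANOMALY_TYPES:
            raise ValueError(f"unknown anomaly type: {self.anomaly!r}")
        if self.action not in ACTIONS:
            raise ValueError(f"unknown action: {self.action!r}")


class CleansingLog:
    """An append-only list of cleansing steps that can be shared with the data."""

    def __init__(self):
        self.steps = []

    def record(self, **details):
        step = CleansingStep(**details)
        self.steps.append(step)
        return step

    def to_json(self):
        return json.dumps([asdict(step) for step in self.steps], indent=2)


# Illustrative entries only; the record identifiers and reasons are invented.
log = CleansingLog()
log.record(record_id="burial-017", anomaly="semantic", action="recategorization",
           before="juvenile", after="subadult",
           reason="aligned with broad age classes")
log.record(record_id="burial-023", anomaly="coverage", action="omission",
           before=None, after=None, reason="age at death not recorded")
print(log.to_json())
```

Because each entry states what was changed and why, a log of this kind could accompany the cleaned dataset as paradata, and it could in principle be mapped onto a formal provenance model at a later stage.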

On Unknown Knowns
In many instances, unknown knowns consist of habitual practices that are not considered by the practitioner to be worthy of explanation or discussion. These only come to light when encountered by another who does not recognize them as standard practice (e.g., [62] (p. 67)). In this way, what is considered unnecessary or irrelevant and hence silent is potentially what another (usually later) researcher would consider to be meta/paradata or data. These silences consequently represent the unknown knowns of the original practitioner: tacit knowledge which operates through custom, experience, or instinct and which is rarely articulated or explained to others, ultimately becoming those others' unknown unknowns. How do these unknown knowns arise in practice and become unknown unknowns? Here there are useful parallels with Hayes' [63] discussion of forgetting, which she considers to be a powerful archaeological contribution to understanding the production of history [63] (p. 200). In part, this builds on the identification by Connerton [64] of seven kinds of forgetting, using a range of historical examples to show how forgetting leads to knowledge about the past being lost or ignored. Hayes sees forgetting as a critical component of memory [63] (p. 199), which parallels the way that ignorance is seen as part of acquiring knowledge. Indeed, ignorance is an outcome of successful forgetting, and, like ignorance, digital tools are frequently employed as a means of forestalling forgetting. The characterization of forgetting transposed into archaeological knowledge practice may, therefore, reveal something of the background to unknown knowns.

Forgetting Disciplines
Disciplinary divides may give rise to forgetting [63] (p. 204). For example, Khazraee [65] (p. 330) sees the integrity of an archaeological site as broken across different disciplinary specializations, with evidence viewed through different disciplinary lenses. In this sense forgetting may be envisaged as the absence of some specialists from the field, their restriction to the laboratory and post-excavation phase, or their omission from the investigatory and analytical process altogether, leading to certain categories of data not being recognized, collected, or analyzed. For instance, in their metastudy of Neolithic farming using zooarchaeological data, Orton and colleagues [54] (p. 5) found that certain taxa of animals were only sporadically included in datasets, and argued that these absences were related to those categories being specialisms within zooarchaeology. Micromorphological studies may be similarly underrepresented, with geoarchaeologists frequently absent from field projects [66,67]. While archaeology has always been celebrated for its interdisciplinarity (it is "in our bones" [68] (p. 49)), this is not without its challenges. For example, the natural sciences may be perceived as threatening and provocative [68] (p. 50), and the incorporation of data from the natural and hard sciences into cultural narratives can be problematic at times (e.g., [69] (pp. 178-179)). A disciplinary approach is by definition limited and partial: "Disciplinary researchers inherit ways of interacting with the world that are specialized for producing certain kinds of knowledge, and this specialization brings certain kinds of ignorance with it" [70] (p. 645). Interdisciplinary research can, therefore, be seen as a means of redressing ignorance, but interdisciplinarity can itself be a source of ignorance: "ignorance can emerge as a function of how the group is collectively situated" [70] (p. 649).

Forgetting through Effacement
Effacement [63] (p. 210), where interactions are not written into the record, deliberately or otherwise, is related to Connerton's repressive erasure [64] (pp. 60-61) which accentuates a specific perspective by covertly erasing or editing out parts and emphasizing others. In archaeological practice, this might be identified in three specific areas: the effacement of methodology, the effacement of technique, and effacement through translation. For example, methodological practice may be effaced through a black-boxing approach: for instance, Leighton [62] (p. 69) observes that archaeologists rarely describe in detail how an excavation was carried out or explain how specific practices support a particular claim. Similarly, Edgeworth warns that "Silent or tacit knowledge is one component in the mix, but at the same time processes of explicit thinking, reflection and discussion entailed in the practical labor of following cuts and other material patterns must not be written out of accounts of excavation." [71] (p. 107).
Effacement of technique is related to disciplinary forgetting (above). This may include the omission of geomorphological techniques [66], for example, but may simply be a consequence of different excavation strategies which affect the scale of examination of the deposits and their formation (e.g., [72] (p. 105); [67] (p. 238)). Effacement through translation most clearly arises through the digitization of analog records, an interpretative act involving decisions about what to include together with the categorization and standardization of information to suit the digital tools. Translation, therefore, always risks loss or change of meaning which may only partially be addressed through reflection and "data intimacy" [20] (p. 130).

Forgetting over Time
Certain actions or interactions may not be recognized as significant until well after the event, adding a temporal dimension to forgetting [63] (p. 205). In this context, decisions or activities may be considered insignificant or so customary that they are not recorded (and essentially effaced) but may come to be seen by others at a later time to crucially determine the functionality or usefulness of the data, by which time the opportunity to capture that information has been lost. Certain things are deemed worth recording while others are not, and these perspectives change over time under different paradigms and with different levels of resources. For instance, features from certain periods may be given precedence over those from others (e.g., [73] (p. 93)). As Edgeworth observes, "there is an aspect to the archaeologist's perception and interpretation of the world that actively leaps out to regard some things as significant, so that these favored objects or features seem to stand out in the foreground of our attention, while thousands of other things (and aspects of things) not perceived to be significant remain in the background." [71] (p. 110).
The nature of archaeological remains as constructs rather than records, constituted and interpreted at the time of their recognition, means certain aspects may not be recognized in the initial encounter, may not be thought significant or relevant at the time, or may not be capable of being captured, and are consequently laid aside, forgotten, becoming the unknown unknowns of subsequent encounters.

Forgetting by Command
Finally, there may be commanded forgetting [63] (p. 212), what Connerton calls "prescriptive forgetting" [64] (pp. 61-62). It is commonly characterized as institutionalized forgetting, perhaps most clearly represented in archaeology through top-down approaches to recording and increasingly mechanistic data capture, whether through paper proformas or digital equipment. Hence, for example, "The norms of our recording system enables forms, such as this trench plan, to be repeated, again and again, until they are 'forgotten' and simply become forms of life, and habitual practice" [73] (p. 87). Customary, if not mandated, recording systems are often seen as deskilling method and practice through providing predetermined descriptive fields and predefined categories (e.g., [74] (p. 423)), and, although not limited to digital systems, they can alternatively be seen as supporting a digitally enhanced practice enabling fuller engagement with the physical data (e.g., [75] (p. 339)). In either case, an element of forgetting takes place within standardized practice, especially since "… when confronted by a feature or a site that is 'out of place' we immediately fall back upon our standardized and comfortable methods to create the trench and the feature-and thus render the past knowable." [73] (p. 88).
The range of forgetfulness surrounding practice usefully characterizes the range of unknown knowns which, unless they are captured in some way, risk becoming the unknown unknowns of subsequent practitioners separated by space and time.

The Tacit Problem
Taking these (un)known (un)knowns and transforming them into known knowns presents considerable challenges; however, as has been shown, these are not entirely unfamiliar within archaeological practice. For example, as discussed above, there is a range of established methods for attempting to resolve the known unknowns inherent in analyses across multiple datasets, even if they are frequently not discussed or documented in detail. These approaches often bring their own limitations as they seek to address differential recording, sampling bias, or the range of anomalies within the data. Indeed, they may potentially introduce new problems (mistaken knowns) through incorrect recategorization, invalid treatment of "unknown" or "null" data, or mishandling of contradictory or duplicate data, for example. If such decisions and actions are themselves not adequately recorded, subsequent studies using these newly formed datasets will be negatively impacted. Similarly, archaeological recognition of the importance of unknown knowns is seen in the use of unstructured recording methods during fieldwork using diaries or notebooks, often alongside standard proforma records, as a means of capturing elements of the tacit knowledge of the fieldworker through recording their thoughts, ideas, mistakes, decisions, and actions. In the process, a record of the changing understanding of the site is created, albeit one that mixes data, interpretation, and speculation, and which is often remote from the digital record and ultimately only deposited in the physical archive, if at all. Moreover, such unstructured methods may be largely restricted to research scenarios and absent from the commercial context because of time and resource constraints.
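By way of illustration, the recording of recategorization decisions and the treatment of "unknown" or "null" values might be sketched as follows. This is a minimal sketch under stated assumptions rather than an established method: the `Harmonizer` and `Decision` names, the two sentinel values, the list of terms treated as "undetermined", and the example taxon mapping are all hypothetical and are not drawn from any of the studies cited.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical sentinels: separating "examined but undetermined" from
# "never recorded" is an assumption made for this illustration.
UNKNOWN = "unknown"
NOT_RECORDED = "not_recorded"


@dataclass
class Decision:
    """One harmonization decision, kept so later users can audit it."""
    record_id: str
    field_name: str
    original: Optional[str]
    harmonized: str
    reason: str


@dataclass
class Harmonizer:
    mapping: dict  # hypothetical source-term -> shared-term table
    log: list = field(default_factory=list)

    def harmonize(self, record_id, field_name, value):
        """Return the harmonized value and log how it was reached."""
        cleaned = value.strip().lower() if value is not None else ""
        if cleaned == "":
            result, reason = NOT_RECORDED, "empty in source; absence kept explicit"
        elif cleaned in {"unknown", "indet", "?"}:
            result, reason = UNKNOWN, "source marks value as undetermined"
        elif cleaned in self.mapping:
            result, reason = self.mapping[cleaned], "recategorized via mapping table"
        else:
            result, reason = cleaned, "kept as recorded; no mapping available"
        # Every decision is logged, including those that change nothing.
        self.log.append(Decision(record_id, field_name, value, result, reason))
        return result
```

In such a sketch the log, rather than the harmonized value alone, would travel with the derived dataset, so that a later analyst could see which values were recategorized and which absences were present in the source.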
In some instances, using less structured formats to represent more tacit elements of recording has been translated into a digital environment. For example, excavations at Çatalhöyük employed digital site diaries as a means of revealing the assumptions and preconceptions of the project team as part of the overall record [76] (pp. 436-437), while at Pompeii, digital diary-style entries were kept alongside the database forms [77] (p. 64). Elsewhere, the Federated Archaeological Information Management System (FAIMS) includes annotation fields on each data record, which is suggested to mimic handwritten annotations in the margins of proforma sheets [78] (p. 49). Such approaches are seen to contribute to a reflexive approach to fieldwork and are claimed to result in a richer body of data (e.g., [77] (p. 64)), although they may also result in a tension between standardized recording practices and the more fluid and flexible approaches which are seeking to expose the otherwise largely tacit knowledge employed in the field (e.g., [79] (pp. S20-S21)). Again, commercial imperatives may restrict the application of such methods.
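The idea of attaching free-text annotations to individual fields of a structured record can be sketched in general terms as follows. This is an illustrative data structure only: it is not the FAIMS data model, and the `ContextRecord` and `Annotation` names, the `certainty` attribute, and the example values are assumptions introduced here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Annotation:
    """A marginal note attached to one field of a record (hypothetical)."""
    author: str
    text: str
    certainty: str = "unstated"
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class ContextRecord:
    context_id: str
    fields: dict  # the standardized, proforma-style values
    annotations: dict = field(default_factory=dict)  # field name -> notes

    def annotate(self, field_name, author, text, certainty="unstated"):
        """Attach a note to an existing field without altering its value."""
        if field_name not in self.fields:
            raise KeyError(f"no such field: {field_name}")
        note = Annotation(author, text, certainty)
        self.annotations.setdefault(field_name, []).append(note)
        return note
```

The standardized value and the note remain separate, so the structured data can still be queried while the doubt or reasoning behind an entry stays attached to it.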
However, these approaches to revealing unknown knowns are largely incomplete and based on a limited perspective of tacit knowledge. In particular, they assume that these unknown knowns can be externalized and are just waiting for the time and resource to be made available to simply retrieve them, whereas more widely tacit knowledge is understood to be difficult, if not impossible, to articulate and to codify (e.g., [80] (p. 129); [81]; [82] (p. 228); [83] (p. 472)). Indeed, like the paradox behind the study of ignorance (seeking to know what we do not know [32] (p. 18)), there is a series of paradoxes at the heart of the investigation of tacit knowledge. If tacit knowledge can be articulated, is it still tacit [84] (p. 364)? Or if tacit knowledge cannot be articulated then can it even be discussed? If it can, then presumably it can be recorded [81]. This tension is exacerbated by the introduction of digital technology which tends to focus on what can be codified, emphasizing explicit knowledge over tacit and attempting to convert tacit into explicit. In turn, this might suggest that there are some aspects of tacit knowledge that are easier to articulate than others and it may be useful to distinguish between these, in the process highlighting key aspects of these unknown knowns along with potential methods for their representation.
Over time, tacit knowledge has become characterized as a rich and complex tapestry of competencies, skills, know-how, physical abilities, practices, personal insight, expertise, gut-feeling, and so on (e.g., [85] (p. 84)), which may be capable of externalization to differing degrees. For example, Kingston [81] defines three categories of tacit knowledge (summarized in Table 2):
1. Symbolic Experiential Knowledge is "gained from experience that the knowledge owner knows they possess. It is in the form of words or concepts; it can, therefore, be verbalized or recorded, but never has been" [81]. It primarily consists of heuristics, categorizations, and patterns which can be made explicit through words, diagrams, and models, for example.
2. Non-symbolic Experiential Knowledge is "gained from experience that is not in the form of symbols but in some other form: numeric; geometric; perceptual; or physiological. The owner of this knowledge knows that they have it, but may find it very difficult to verbalise" [81]. If capable of being captured, such knowledge may be represented through the use of photographs, diagrams, videos, and so on.
3. True Tacit Knowledge is knowledge that a person has without knowing that they have it, which may make it difficult, if not impossible, to capture.
Kingston's three-way characterization is broadly similar to Collins' [86] characterization of tacit knowledge, for example (Table 2). Symbolic experiential knowledge is comparable to Relational Tacit Knowledge (or weak tacit knowledge): knowledge that can be made explicit [86] (pp. 85-98). Nonsymbolic experiential knowledge is equivalent to Somatic Tacit Knowledge (or medium tacit knowledge) and is associated with the properties of human bodies and brains, and physical things [86] (pp. 99-117). True tacit knowledge largely corresponds to Collective Tacit Knowledge (or strong tacit knowledge) which we do not know how to make explicit and cannot see how it might be explicated [86] (pp. 119-138). Similarly, Virtanen [87] identifies what he defines as three different levels of content of mind from the perspective of its accessibility which again broadly map onto the characterizations of Kingston and Collins (Table 2). He sees conscious linguistic representations, or representations that are capable of being made linguistic, in terms of declarative knowledge or propositional thoughts [87] which are like the categorical or pattern-based knowledge forms identified by Kingston [81]. Then there are conscious representations that are difficult to articulate, which Virtanen [87] sees as more phenomenological in nature, where there may be a lack of suitable words, an inability to describe the phenomenon or experience, or an incompletely formulated concept. There is a clear equivalence here with Kingston's nonsymbolic experiential knowledge and Collins' somatic tacit knowledge. What remains is incapable of being represented: the unreachable content of the mind [87], which maps onto Kingston's true tacit knowledge and Collins' collective or strong tacit knowledge.
Breaking down and characterizing tacit knowledge in these ways underlines the limitations of prior attempts to record archaeological tacit knowledge during data capture, which inevitably focus on aspects that can be (relatively) easily captured and codified, and which can be incorporated within a digital structure. In the process, significant aspects of the unknown knowns are left unaddressed. For example, in a reversal of Pozzali's criticism of the wider body of literature on tacit knowledge [82] (p. 229), archaeology rarely captures aspects of the implicit skills and kinesthetic abilities associated with bodily engagement with the material evidence in the field or laboratory, because these are difficult to articulate and no simple way of communicating them has been identified.

Tacit Articulation
The most common approaches to externalizing tacit knowledge are through dialogue, narration, or discourse (e.g., [88] (p. 1133); [80] (p. 130); [84] (p. 360)), although of course, this depends on its capability to be articulated in the first place. Tacit knowledge mediates practice and may be foregrounded when actions surrounding a particular task do not meet the practitioner's tacit expectations. Such breakdowns help practitioners exteriorize tacit practice in response to anomalous situations through employing coping strategies which reveal formerly hidden aspects and connections within practice. These may then be externalized through what Tsoukas calls attention-drawing forms of talk: "'look at this,' 'have you thought about this in that way?', 'try this,' 'imagine this,' 'compare this to that'" [83] (p. 471). In this way, he argues, tacit knowledge is manifested (rather than captured, converted or translated) in what we do: "New knowledge comes about not when the tacit is converted to explicit, but when tacit knowledge is re-punctuated (articulated) through dialogical interaction" [83] (p. 473). Analysis of discourse can reveal the tacit knowledge used by the practitioner to resolve their encounter, so that "… an understanding of speakers' lived in experience and interpretation of their world-their orientation to context-can be unlocked" [88] (p. 1136).
Although strongly reminiscent of trench-side or laboratory conversations, such discourses or dialogues are rarely captured in archaeological contexts. What is otherwise largely silent may be represented within field diaries and notebooks, for example, but otherwise the opportunities for a narrative explanation within standardized data recording are typically limited. However, one area which has been investigated, if not widely implemented, is video recording. Video has been employed on a number of sites (e.g., [89-91]), notably at Çatalhöyük where it was introduced as a means of providing reflexivity ("the camera at the trowel's edge" [91]) through recording group discussions in the excavation trenches along with individual accounts and laboratory work [92] (p. 696). This practice was limited since the cameras were not operated by the excavators themselves and filming was dependent on the archaeologists summoning a camera when something was considered worth recording [93] (p. 39). Elsewhere at Çatalhöyük, the BACH (Berkeley Archaeologists at Çatalhöyük) project created a detailed video archive recorded as part of the excavation documentation, including short digital diaries and discussions among excavators and lab specialists, for example [94]; [93] (p. 41). Advantages of video recording may include the level of spontaneity and individual expression in spoken, rather than written, language, and the way that spoken language seems less dominated by formalized, standardized codes (e.g., [95] (p. 230)), both of which may facilitate the externalization of tacit knowledge. More recently, Chrysanthi and colleagues [91] described experiments with personal video recorders worn during fieldwork at Portus and Çatalhöyük. Using head-mounted cameras enabled hands-free capture of excavation and recording [91] (p. 250), recording conversations about the work, key points of debate and discussion, and in some cases showing the act of excavation itself without commentary.
These are described as narratives of the lived experience and as arguments surrounding the archaeological process [91] (pp. 254-255). For example, they note the way in which uncertainty surrounding interpretation is captured on video, highlighting the development of understanding in contrast to the way in which it is more commonly subsumed within standard recording practices [91] (pp. 262-264). Their approach is akin to what Morgan [96] (p. 334) characterizes as the phenomenological genre of video recording, providing the viewer with the gaze of the archaeologist and conveying their personal experience of the excavation process, unlike the direct testimonial approach [96] (p. 331) which creates authoritative summaries presenting the conclusions rather than the processes.
One of the strengths of video capture is, therefore, the prospect of recording the lost aspects of process, capturing aspects of both the symbolic experiential and nonsymbolic experiential forms of tacit knowledge, whether through verbalizing heuristic decisions, representing patterns, demonstrating physical actions in the moving images, or visualizing the shape, texture, and color of layers and the relationships between them by moving the viewpoint around, for instance. For example, video provides a means of capturing the materialization of observations through drawing and the conversations that occur around them [97] (p. 146). A final archived field drawing represents the end of a process that is normally not captured: from the initial marking of the actual archaeological surfaces to differentiate the interfaces, through intermediate sketches and diagrams (e.g., [21] (p. 104)) and associated annotations, these intervening procedures shed light on the developing understanding of the features concerned.
On the other hand, video is limited in certain key respects. For example, poor color reproduction of sections and layers [91] (p. 253) may be compounded by insufficient resolution, and variable levels of exposure and focus, especially in poor lighting conditions, which may restrict the utility of the resulting video (all problems shared by photography in general). More significant is the absence of sensory information beyond the visual and aural. While the sound of tools on surfaces may provide some insight into the physical encounter alongside its visualization, it provides little information about the levels of resistance, texture, and "feel" that expert excavators rely on. This is what Gant and Reilly [21] (p. 108) call the "voice of the deposits" and the multithreaded narrative created by the different aspects of interaction with them. The affective nature of archaeology modifies thought and action through encounters with its materiality and physicality (e.g., [98] (p. 4); [99] (pp. 360-361)) at the same time as the archaeologist materializes the archaeological feature [100] (p. 78). Such aspects are perhaps best suited to the impressionistic genre of video [96] (p. 332) or else move beyond video to incorporate other digital techniques. For example, the use of 3D imaging from photography can show textural as well as color variations in stratigraphy (e.g., [101] (p. 188)) and is increasingly widely used in archaeological practice. The representation of the sensual has been attempted through 3D models of troweling gestures [21] (pp. 110-112) and "sonic stratigraphy" [21] (pp. 113-115), through auralization of spaces (e.g., [102,103]), and through the creation of multi-sensory augmented reality incorporating real-world imagery with 3D models which respond to actions and use haptics to capture feel and smell (e.g., [104,105]).
Other means of externalizing the unknown knowns may also be sought. For example, analyzing the ways in which a project archive changes through time may reveal the shifts in methodology and interpretative values by detecting and mapping the alterations to the underlying data structures [79] (pp. S26-S28). The scope for such an approach relies on a long-standing project such as at Çatalhöyük, but it does point to the potential for revealing aspects of the inferences and tacit knowledge embedded within the data collected through analyzing their structure. This is akin to Gardin's work on codifying archaeological representation, discourse and meaning (e.g., [106]; see summary in [107] (pp. 311-316)) in which he sought to analyze "the mental operations carried out in archaeological constructions of all sorts, from the collecting of data to the writing of an article or book" [106] (p. xi). For example, applying a model of argumentation such as defined by CRMinf [108,109], while not designed to externalize tacit knowledge since it specifically reasons about explicit facts, may reveal tacit elements in terms of what is "left over" from or does not fit within the formal model. This kind of inverse modeling, where tacit knowledge is revealed by what is being modeled without being modeled itself, may also be applied in other areas. For instance, the kinds of knowledge elicitation techniques applied to the development of expert systems in the 1980s may in certain respects reveal weak tacit knowledge [86] (p. 160) through the attempt to model the reasoning of human experts using methods such as discourse analysis, interviews, think-aloud methods, and critical decision analysis.
While only partly successful at the time due to the degree of formalization required and the difficulty of articulating tacit knowledge (e.g., [110]), expert systems were capable of emulating the outcomes of human reasoning in certain well-defined areas, and analysis of the gaps in those emulations may reveal aspects corresponding to tacit elements within those fields. While expert systems sought to manually create a set of formal rules to adequately describe real-world reasoning, more recently developed deep learning tools seek to reduce a complex problem into a series of simple nested mappings, each described by a different layer of the model and generated automatically from data (e.g., [111] (pp. 5-7)). Deep learning tools and neural networks may be capable of surfacing medium tacit knowledge [86] (pp. 160-161) through the way in which they are trained, using positive and negative feedback to fine-tune the system. For example, a neural network may draw inferences using features in images other than those expected, and rather than being treated necessarily as errors to be refined out, they may shed light on potential tacit inferences and biases instead. Externalizing true, or strong, tacit knowledge is more problematic still, as noted above. However, Kingston [81] suggests that it may be possible to use statistical associations between evidence used to make decisions and the decisions reached in order to gain some insight into the knowledge involved, in much the same way as knowledge is represented in neural networks.
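To give a sense of what a statistical association between evidence and decisions might look like in its simplest form, the following sketch computes a "lift" score for each pairing of an observed cue with an interpretation reached. It should not be read as Kingston's method: the use of lift as the measure, the `association_table` function, and the example cues and decisions are all assumptions made purely for illustration.

```python
from collections import Counter


def association_table(cases):
    """Score how strongly each cue is associated with each decision.

    `cases` is an iterable of (set of cues, decision) pairs. The score is
    lift: P(decision | cue) / P(decision). Values above 1 suggest the cue
    co-occurs with the decision more often than chance would predict.
    """
    cases = list(cases)
    total = len(cases)
    decision_counts = Counter(decision for _, decision in cases)
    cue_counts = Counter(cue for cues, _ in cases for cue in set(cues))
    joint = Counter((cue, d) for cues, d in cases for cue in set(cues))

    table = {}
    for (cue, decision), count in joint.items():
        p_decision = decision_counts[decision] / total
        p_decision_given_cue = count / cue_counts[cue]
        table[(cue, decision)] = p_decision_given_cue / p_decision
    return table


# Hypothetical observations: cues noted in the field and the interpretation made.
example_cases = [
    ({"charcoal flecks", "dark fill"}, "pit"),
    ({"dark fill"}, "pit"),
    ({"charcoal flecks"}, "hearth"),
    ({"compact surface"}, "floor"),
]
example_scores = association_table(example_cases)
```

With so few cases the scores carry no evidential weight; the point is only that cues which consistently accompany a decision, yet are never cited as its justification, might mark places where tacit inference is at work.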

From Ignorance into Knowledge
This paper has sought to demonstrate the value of approaching the gaps in archaeological digital knowledge by turning from the traditional evaluation of what we know to consider what we do not know, and characterizing the different forms of ignorance and their implications for archaeological practice. It has shown how current archaeological approaches to resolving the (un)known (un)knowns are limited in scope, whether through the poorly documented methods of handling known unknowns or the restricted perspective on articulating unknown knowns. Shining a light on these limitations offers the prospect of addressing them, and several means of doing so have been outlined. However, the approach is not without its challenges, some of which remain to be addressed.
For example, there is the question as to whether the process adds much to archaeological knowledge in the first place. For instance, Chadwick [72] (p. 103) questioned whether the hours of video recording at Çatalhöyük added much to site interpretation, or whether it simply reflected a desire to capture or record everything. However, the proposition here is that, far from attempting to record "everything" (were that even possible), this approach emphasizes the importance of capturing what is already undertaken as a matter of standard practice so as to better represent existing practice rather than creating a new practice or capturing new data for its own sake. In the process, our knowledge claims will be strengthened and enhanced. Elsewhere, Leighton [62] (p. 67) argues that the small differences in basic archaeological practices between projects and between countries are not sufficient to affect our evaluation of the knowledge claims of other archaeologists. While this may be true, here it is argued that our knowledge of those practices is insufficient without proper consideration of their (un)known (un)knowns as a means of surfacing the process by which interpretations arise, which otherwise remains primarily implicit. In this way, archaeology can indeed challenge existing Science and Technology Studies models of scientific knowledge production as Leighton [62] (p. 68) suggests, but from a position of greater strength than is currently the case.
Of more practical concern are the implications of adopting this kind of approach in the real world. Do the practical realities outweigh any supposed theoretical purity? It is certainly true that many of these techniques take additional time. For instance, the time taken to process video footage is not insignificant, which is one reason why it has seen relatively little use. The need to create descriptive metadata, link clips to site records, generate textual descriptions of content, extract aural narrative components as text, and so on, represents a considerable investment of time. Where such processing is undertaken, it tends to be in the evenings during fieldwork and during the post-excavation phase (e.g., [91] (p. 251)), which clearly raises issues of sustainability. Aspects of processing may be automated to some degree: for instance, keyframes for summary storyboarding may be identified and extracted (e.g., [112]) and transcripts automatically created with semantic concepts and keywords extracted (e.g., [113]), but these techniques are in their relative infancy and not yet widely available. Consequently, to be feasible the approach must be capable of being embedded within existing practice. For example, increasingly commonplace in the field is 3D imaging (3Di) using structure from motion analysis of multiple overlapping photographs as a means of capturing an excavation's plans and sections or a local landscape. This is much faster than traditional drawing and survey techniques and can incorporate color and texture which can be enhanced through virtual lighting. The benefits are not unequivocal (e.g., [97] (pp. 143-145)), but, rightly or wrongly, there is evidence that the technique is being increasingly used to replace rather than enhance traditional field drawings (e.g., [114] (p. 254)). Whether such imagery is used as a replacement for, or enhancement to, traditional field drawing, there are time savings to be had.
While in a commercial environment these might be used to reduce costs and time on site, they may also be used to enhance the archaeological encounter itself (e.g., [101] (p. 188)). Indeed, Powlesland [115] (p. 23) argues that "… dropping the traditional approach to recording and replacing the record with an ongoing and repeated process of 3Di … would be a tragic misuse of a very powerful technology", emphasizing instead the value of spending the time released to gain a better interpretative understanding of the archaeology. This would also provide the opportunity to implement methods that seek to capture the silences within that interpretative process.
Finally, it has to be asked whether a focus on ignorance risks being detrimental to the credibility of archaeological knowledge. Can it lead to the accusation of providing comfort for "alternative facts" through its highlighting of uncertainties in the archaeological process and emphasizing the shortcomings in the knowledge used in arriving at archaeological interpretations? Will it increase skepticism in the outcomes of archaeological inquiry and foster the further growth of pseudoarchaeology? Here, the argument is that a failure to recognize and address ignorance would adopt a position of essentially unarguable knowledge claims that is unhelpfully associated with authority and power. Fundamentally, a focus on ignorance encourages a greater degree of honesty in knowledge creation. Indeed, the mistaken illusion of knowledge (the things we think we know but do not) is arguably a greater threat than the unknown knowns, known unknowns, and unknown unknowns. In this light, ignorance is a virtuous condition for inquiry and a foundational aspect of knowledge (e.g., [116] (p. 386); [30] (p. 57)). Identifying areas of ignorance provides for new investigations creating new knowledge that will, in turn, reveal more ignorance. Consequently, addressing ignorance simply makes our archaeological knowledge more robust through inverting the lens through which we view it and shedding light on our "data shadows" [11].
Funding: This research received no external funding.
Acknowledgments: I thank Cesar Gonzalez-Perez for the invitation to contribute to this Special Issue. I am also grateful to the anonymous reviewers for their constructive and helpful feedback. As ever, any errors or misconceptions remain my own.

Conflicts of Interest: The author declares no conflicts of interest.