Researchers and mediators of knowledge from the humanities and the arts are living in interesting times. After an era of a largely idiographic orientation towards exceptional and extraordinary works of art (referred to as “canon” or “seminal works”), new options to access and analyze large corpora and object collections (i.e., the oftentimes hidden or neglected rest of the archives) are available and up for a vivid methodological debate [1
]. In the wake of digitization and datafication initiatives, scholars and cultural custodians increasingly find themselves in the presence of remediated digital object collections [5
]. These complex corpora from galleries, libraries, archives and museums (i.e., the GLAM institutions) invite and require new ways and means of investigating them, of encountering and experiencing their accumulated content, and also, quite practically, of representing their extensive stocks on screen through graphical user interfaces.
Hybrid assemblies of computer scientists, designers, digital humanists, and art historians have started to develop and discuss answers to these questions. Quantitative and computational technologies enable new kinds of access to the cultural riches of GLAM institutions, both for the idiographic study of single objects and for the study of large aggregates and bigger cultural-historical pictures. Thus, for the whole spectrum of cultural corpora (from literary to visual to performing arts), new perspectives for “distant reading” and “distant viewing” are unfolding, and for these purposes, the visual representation of cultural data plays an ever-increasing role [6
]. To create and communicate overviews and conceptual orientation for collections, researchers and mediators have to develop novel portals to cultural complexity—and the use of visualizations (i.e., the versatile use of “graphs, maps, trees”) is “one way to begin doing this” ([9
], p. 4).
Visualization of cultural collections thus has become a research and development field of its own, oriented towards a unique constellation of rich data, diverse (i.e., expert and non-expert) users, and a corresponding variety of heterogeneous tasks [8
]. Along with this unique constellation comes a whole range of specific research challenges. This article is dedicated to one of these challenges, which arises from the often poor levels of data quality and the large amounts of uncertainty in cultural heritage collections and historical databases. While several authors have begun to address uncertainty and its visualization for single data dimensions (e.g., for temporal origins), no work so far has looked at the bigger picture and tried to understand uncertainty with regard to multiple data dimensions and synthesized design options for different dimensions. With the following considerations, we will thus review related work (Section 2
) and document a whole spectrum of data quality dimensions for cultural collections (Section 3
). Building on this collection, we will explore how to handle this omnipresent variety of uncertainty challenges during the visualization design process—and showcase a conceptual assessment of techniques for the PolyCube framework of collection visualization [10
], which aims to represent cultural data and its many uncertainties in a more synoptic and systematic fashion (Section 4
). Finally, we will discuss related challenges and future work (Section 5).
2. Related Work
Visualization designers frequently encounter a substantial challenge when confronted with historical data from a cultural collection: In comparison to many modern data collections, the quality of archival information can be remarkably poor. As a standard condition, historical data sources often rely on estimates for different data dimensions (such as time or place of origin), which introduce a lack of precision, accuracy, or certainty into the very basis of structured or quantified information. Sometimes, the information is ambiguous (e.g., due to polysemic place names or alternative interpretations) and thus subject to ongoing scholarly discourse. It is also common to have no entries at all for several data dimensions (missing data), or to have heterogeneous levels of data quality for different subcollections within a larger collection. Against this background, we will briefly discuss standard datafication strategies and data structures in cultural collections, and align them with visualization techniques that are commonly used to represent various data dimensions for expert and non-expert audiences.
The digitization of cultural object collections often begins with the creation of “realistic” object images (photographs, scans, digital images) of given artifacts. To enrich these realistic images with further data, GLAM institutions frequently operate with three main datafication strategies, which lay the basis for subsequent “distant reading” or “distant viewing” approaches.
Firstly, cultural analysts often use and digitize existing information, which has been attached to cultural objects in the past. This kind of object information appears either as structured metadata (such as date and place of origin, creator, collector, object category or style)—or as a language-based, textual description of each object and its context [8
Secondly, researchers can extract structured metadata from the mentioned textual object descriptions or make “intrinsic” object features explicit, by computationally extracting textual or visual features from the objects of a collection or from associated descriptions. As such, natural language processing [6
] or computational image recognition and feature extraction techniques are frequently applied [3
Thirdly, researchers can also generate new layers of computer-readable object information, for instance by object annotation, which can create novel object categories that are then available for set-typed visualization [18
Of these techniques, only the first two can be used in a quasi-automated fashion, whereas qualitative information from object annotation remains tied to labour-intensive human input. Regardless of the type of datafication, we will assume for this article that all techniques, including feature extraction, generate entries in specified data categories (or data dimensions), which lead to a standard array of structured data. While feature extraction is constantly working towards more complex features, we may assume that complex top-level features and concepts of traditional (non-digital) hermeneutical interpretation methods (e.g., “expression”, explicit and implicit “meaning”, art-historical “relevance”) will not be “computer-readable” and thus not available for a while (cf. [20
Given these different datafication techniques, uncertainty seeps into cultural collection databases through several channels: through the historical object information itself (i.e., through the accumulated guesses, estimates, or omissions of former collectors and curators), through digitization procedures applied to analogue object information (OCR, transcription, database creation), through feature extraction (due to largely probabilistic algorithmic recognition and identification methods), or through processes of human sensemaking, interpretation, and categorical annotation. Later on, this uncertainty “propagates” through the representational system, and can be either omitted or further amplified by design choices of data modelling, data processing, and data visualization, and obviously also by complex procedures of human interpretation [21
Information visualization approaches to cultural collections usually select multiple data dimensions—and encode the given data per dimension (quantitative, ordinal, categorical, or textual) into interactive, visual representations. As such, graph, map, tree, set, or timeline visualizations (and complex coordinated combinations thereof) help different types of users to visually analyze, access, and navigate an increasing number of cultural collections on screen. Generous, web-based interfaces thus play the same role for digital researchers or visitors as galleries, libraries, archives or museums do for researchers and visitors in the real world: They present, structure, mediate, narrate, and grant or hinder insights, learning, enjoyment, as well as cognitive and emotional cultural experiences [25
Given this growing number of visualization interfaces to cultural collections, it is noteworthy how few of them take the sometimes substantial amounts of data uncertainty into account and make them explicit [8
]. If we find such work, it mainly focuses on one single data dimension: the temporal dimension of object information (e.g., [26
]). While this focus on time seems to correlate with the factual relevance of this dimension in cultural heritage and history domains [28
], it is obviously not the only dimension where uncertainties exist and where their representation could help to make the factual state of cultural information and knowledge more transparent.
Figure 1 summarizes all the relevant aspects of cultural collection data so far and connects them to standard techniques of collection visualization for different types of users. Uncertainty ((4) in Figure 1
) comes along with various datafication procedures and thus affects all possible types of metadata, yet, to the best of our knowledge, it has not been visualized and studied in multiple dimensions up to now. In general, while we find many visual interfaces in the area using multiple (coordinated) views to cover many data dimensions [28
], there has been no orchestrated effort to better integrate these information dimensions with visual representations up to now [10
], nor to integrate uncertainty in these visualizations for multiple data dimensions in a synoptic fashion. Consequently, studies that would evaluate the effect of multidimensional uncertainty visualization on sensemaking by expert and non-expert users are also missing.
3. Types of Uncertainty in Cultural Collections
3.1. Object Uncertainty
A type of uncertainty that is rarely taken into account at all is object uncertainty. With this term we summarize uncertainties pertaining to the fundamental “ontological” status of cultural objects, and whether they are part of a collection or not. Arguably, large parts of cultural collections evidently exist, which makes it an obvious choice to also represent their physical objects as digital objects (whether as previews or overviews) on screens. Yet, for numerous historical collections, (residual) knowledge exists about objects which have been lost, stolen, looted, lent out, given away, or not digitized until now. We consider these absent/present objects to be a first interesting challenge for the common “binary” approach to the representation of digital collections. A second challenge arises from possible discussions or controversies about whether an object should be included in a collection according to various art-historical or art-critical demarcation criteria. Furthermore, an object’s inclusion could be questioned due to its (lack of) relevance, or in general due to a controversy about its alleged metadata ascription (from place or date of origin to an assumed creator, meaning, genre, or style).
A third challenge might be even more pervasive: In contrast to spatially restricted exhibition facilities, databases and interfaces technically allow storing and exhibiting unlimited numbers of cultural objects. However, hardly any digitization project can circumvent discussions about diverging standards of object quality, which distinguish flagship objects and premium content from various subprime classes, including objects (and their digital counterparts) not deemed worthy of public interest or visitors’ attention. As a practical consequence, many GLAM institutions prefer to hide these stocks rather than expose the poorer states of digitization and documentation to public scrutiny and risk negative judgments by the peer community. We list these three types of “object uncertainty” in Figure 2
(top row) in an inverted fashion, and will argue for novel strategies to represent these fundamental uncertainty aspects in future collection visualizations in the following.
Options for visualization: In general, interfaces to cultural collections have to decide whether they use discrete visual surrogates for objects in a collection (e.g., dots or any other kind of glyphs) or whether they aggregate individual objects into the abstract shapes of collection overviews (such as treemaps or area charts) [8
]. Figure 3 shows a variety of overview visualizations on the right-hand side. Object uncertainty can thus be encoded for discrete visual surrogates by a multitude of glyph modification techniques (see Figure 4), or by adding uncertainty indicators to abstract collection overview visualizations.
3.2. Temporal Uncertainty
Time has been argued to be the primary data dimension for cultural heritage collections, which could be productively combined with every other analytical perspective [28
]. Yet, “especially when dealing with historic time, dates are often unknown or uncertain” ([26
], p. 62). As such, information on temporal origins of cultural objects can (1) lack accuracy or be imprecise, which can result from different temporal granularities and estimates (days, years, centuries, epochs) or from some level of probability assigned (“about”, “approx.”, “circa”). Furthermore, temporal information can be (2) ambivalent (e.g., it is often unclear whether the recorded date refers to the production, use, or sale of an object [26
]) or (3) contested, due to assessments from different experts or age determination methods. (4) Finally, temporal information can be completely absent or missing for a given object.
Options for visualization: As for the large (second) class of imprecise time information, it is often challenging to quantify or formalize the corresponding entries so that they can also be visualized [26] without losing any information. To visualize uncertain temporal data, Kräutli proposed to keep the temporal granularity as entered in the database (rather than converting years to days, which might result in an accumulation of objects on the 1st of January) and to calculate a probability value for the uncertainties. Such probabilities can be visualized either by all available standard techniques to modify glyphs (see Figure 4
, top row), or—in cases where linear axes are used for the encoding of time—with linear extensions in combination with uncertainty indicators, such as opacity, dashed lines, or wavelength (see Figure 4
, center row) [26
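The idea of keeping the recorded granularity can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the cited method: the function names `to_interval` and `daily_probability` are hypothetical, and the probability is assumed to be spread uniformly over the days covered by the recorded granularity.

```python
from datetime import date, timedelta

def to_interval(year, month=None, day=None):
    """Expand a date recorded at year, month, or day granularity into (start, end).

    Note: Python's datetime.date only supports years 1 to 9999.
    """
    if month is None:                      # only the year is known
        return date(year, 1, 1), date(year, 12, 31)
    if day is None:                        # year and month are known
        first_of_next = date(year + (month == 12), month % 12 + 1, 1)
        return date(year, month, 1), first_of_next - timedelta(days=1)
    exact = date(year, month, day)         # the full date is known
    return exact, exact

def daily_probability(interval):
    """Assumed uniform model: each day in the interval is equally likely."""
    start, end = interval
    return 1.0 / ((end - start).days + 1)

# An object dated only "1850" stays an interval instead of piling up on 1 January.
print(to_interval(1850), daily_probability(to_interval(1850)))
```

A renderer could then draw the interval as a linear extension along the time axis and map the probability value to an indicator such as opacity. The choice of mapping is left open here.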
3.3. Geospatial Uncertainty
Geographical information for cultural objects (such as place of origin or places of exhibition) is frequently subject to various forms of uncertainty (see Figure 2
): (1) Spatial origins of objects can be given by imprecise references only, ranging from points and buildings to larger spatial granularities (cities, states, cultural and historical territories), and any given reference can be further modified by various levels of probability. (2) Even detailed place names can be ambiguous (e.g., Vienna in Austria or in the U.S.?). (3) The place of origin may also be contested by experts, or (4) spatial origin information can be entirely absent or unknown.
Options for visualization: When it comes to visual representations of data quality, geo-spatial uncertainty might be the most extensively explored data dimension, as maps have been in use since ancient times and experts (from cartographers to developers of GIS systems) have to deal with positional uncertainty on a daily basis [30
]. Accordingly, a whole spectrum of encoding techniques has been developed, documented, and evaluated [32
]. Some of the available techniques for geo-spatial uncertainty encoding include fuzziness, scaling, color saturation, blur, and set symbols [32
], or dynamic encoding like blinking pixels [33
]. Figure 4
lists a multitude of corresponding encoding options (top row), which could be used to modify glyphs, acting as visual surrogates for objects in (discrete) collection overviews.
3.4. Set-Typed Uncertainty
Which types of uncertainty can arise in cultural collections with respect to sets? (1) The assignment of cultural objects to a set-typed category can be uncertain or only assumed with a certain degree of probability, (2) its proper assignment can be contested and debated, or (3) cultural objects can lack an assignment to set-typed information entirely.
Options for visualization: The visualization of uncertainty in sets has not received much attention up to now [38
]. For the visualization of fuzzy sets it has been suggested to position data points on rings with a different diameter within a circular set area, based on their probability [39
], or to vary the opacity of the set area accordingly [40
]. Aside from the position within a set area, the uncertainty of set membership of individual objects could also be encoded with varying transparency, or any other gradual modification of a glyph—as illustrated with selected techniques listed in Figure 4
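The ring-based placement mentioned above can be sketched as follows. This is an illustration under our own assumptions: membership probability is binned into a fixed number of concentric rings, and the most certain members sit closest to the center. The function names, the ring count, and the direction of the mapping are hypothetical, and the cited work may define them differently.

```python
import math

def ring_radius(probability, set_radius=1.0, n_rings=4):
    """Map a membership probability in [0, 1] to the radius of a concentric ring.

    Assumption: ring 0 (innermost) holds the most certain members.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    ring = min(int((1.0 - probability) * n_rings), n_rings - 1)
    return set_radius * (ring + 1) / n_rings

def place_on_ring(probability, angle, set_radius=1.0, n_rings=4):
    """Return (x, y) relative to the set center for a glyph at the given angle."""
    r = ring_radius(probability, set_radius, n_rings)
    return r * math.cos(angle), r * math.sin(angle)

print(ring_radius(1.0), ring_radius(0.5), ring_radius(0.0))
```

Binning into a few rings, rather than using a continuous radius, keeps the levels visually distinguishable. Whether this reads well in practice would have to be evaluated.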
3.5. Uncertainty in Graphs and Trees
Network graphs can make explicit or implicit relations visible between cultural objects (as so-called 1-mode networks, cf. [42
]) or represent relations between objects and various other entities (e.g., between objects and creators, objects and exhibitions, or objects and concepts, cf. [43
]). Depending on the underlying data and their spatialization by various graph layouts, graphs can appear in a variety of types and shapes. As so-called directed acyclic graphs, they are also able to model structures commonly referred to as trees, which allows interface designers to represent cultural genealogies or time-oriented networks of influences. For all graph-based constellations, both the nodes and the links can be subject to uncertainty. As we have already discussed object uncertainty (see Section 3.1
), we argue for distinguishing the following types of relational uncertainty: (1) Relations can be merely plausible or probabilistic in nature, (2) they can be contested by different observers, or (3) data on their existence can be missing entirely.
Options for visualization: The visualization of uncertainty in (time-oriented) networks or graphs is a novel, yet already established field of research. Approaches to relational uncertainty representation not only have to visualize the uncertainty of links (e.g., by their visual modification, cf. [44]), but also have to take related changes of the graph layout into account, which can change significantly in response to even the smallest modification of the overall relational structure [46
]. Figure 4
(bottom row) also summarizes exemplary techniques to modify edge visualizations in graphs or trees.
3.6. Uncertainty of Attributes
In addition to the well-established metadata dimensions mentioned so far, technically countless further object data can be stored as additional object attributes. Common examples are the names of object creators or owners, the estimated value of an object, its materials, or its size. Sets of objects or relations between objects can also have attributes attached to them. All these attributes can be subject to various degrees of uncertainty, can be disputed, or can be missing altogether.
Options for visualization: With regard to the diversity of object attributes, we see a corresponding variety of possible encoding techniques. Given that attributes are often already used to modify the visual appearance of objects (such as glyph color), attribute uncertainty would be a meta-attribute, requiring an additional visual channel for encoding. These techniques again can range from the use of glyph modification techniques (Figure 4
, top row), to the use of labels, or to different other encodings for aggregated and abstract collection overviews.
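To illustrate what treating uncertainty as a meta-attribute could mean for a data model, the following sketch attaches a certainty value and a contested flag to each attribute and maps them to visual channels separate from the one used for the attribute itself. The class `Attribute`, the function `glyph_style`, the certainty scale from 0 to 1, and the chosen channels (opacity, stroke) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Attribute:
    value: Optional[str]      # None models a missing entry
    certainty: float = 1.0    # hypothetical scale: 0 (unknown) to 1 (certain)
    contested: bool = False   # True if experts disagree on the value

def glyph_style(attr, palette):
    """Encode the attribute as fill color and its uncertainty in other channels."""
    if attr.value is None:
        # Missing data: hollow glyph, so absence is visible rather than hidden.
        return {"fill": "none", "opacity": 1.0, "stroke": "dashed"}
    return {
        "fill": palette.get(attr.value, "grey"),
        "opacity": round(0.3 + 0.7 * attr.certainty, 2),
        "stroke": "dashed" if attr.contested else "solid",
    }

palette = {"oil": "blue", "bronze": "orange"}
print(glyph_style(Attribute("oil", certainty=0.0, contested=True), palette))
```

Keeping the uncertainty channels disjoint from the attribute channel means that the attribute encoding itself does not have to change when uncertainty is switched on or off.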
3.7. Provenance Uncertainty
With the term object provenance we refer to the entirety of movements and changes to which a cultural object may have been exposed. This includes all changes of the object and its circumstances that have become known since its creation, such as changes of owners, changes of places, or changes of object attributes. All these aspects of provenance information can be subject to various degrees of uncertainty or scholarly debate.
Options for visualization: Provenance information can be visualized as spatio-temporal traces or trajectories [47
]. To represent uncertainty along these trajectories, the modification of glyphs can denote uncertainty of spatio-temporal positions or attributes (Figure 4
, top row), while edge modification techniques can signify uncertainty of hypothetical connections, movements, or vectors along an object’s historical tracks (Figure 4
, bottom row).
4. Synoptic (Uncertainty) Visualization for Cultural Collections
Cultural collections frequently provide visualization designers with rich and multivariate stocks of content, which has given rise to a whole genre of “generous” [25
] or “humanist” [48
] multi-perspective designs, which aim to honor a collection’s complexity and represent its internal diversity. As a consequence, a majority of collection interfaces use the design principle of multiple views to offer a plurality of access points for complementary visual analysis [28
], and thus support the multi-faceted interpretation of complex data collections [8
]. While it could be argued that this quasi-standard of multiple views has already laid the ground for the corresponding representation of multiple uncertainty dimensions, we are not aware of any work that has taken on the challenge of representing more than one individual uncertainty dimension up to now.
We have reflected on possible reasons to omit uncertainty and data quality issues above (see Section 3.1
): Technically, the existence of uncertainty can always be attributed to both the GLAM field (and to the nature of historical data) in general and to the “negligence” of specific institutions—and it is safe to assume that many institutions prefer to avoid ascriptions or appearances of the latter kind. In this context, we can also speculate that all these reasons and motives to restrain full data transparency even magnify and multiply in the context of multiple revelations by multiple views. Yet in addition to psychological-reputational hindrances, there is another barrier on which we will focus in the following discussion, as it can also be easily underestimated: The representation of data quality and (un)certainty can significantly increase the complexity of every interface design, and it naturally complicates design processes, which often favor simple, elegant, and consistent design solutions.
This problem leads to a relatively unexplored future challenge of complex (yet consistent) interface design: How can visualization designers structure their design space and deal with the many combinatorial options (and possible interferences) when developing synoptic interfaces with multiple views?
From this practical standpoint, the topic of uncertainty visualization appears mainly as an additional layer of data and design complexity, which has to be adapted for—and integrated into—any given visualization, and which has to be fine-tuned with regard to the right amount of salience given to uncertainty. To support future research and development in this area, we assemble a variety of uncertainty encoding techniques, which have been documented by related work (see Figure 4
). Building on this overview, we discuss a simple assessment method which can help to manage the rarely discussed challenge of bringing uncertainty (and thus multiple additional visualization dimensions) into a given visualization design.
4.1. Uncertainty Encoding Techniques
Figure 4 summarizes various techniques, which have been documented to encode uncertainty dimensions by modifying glyphs (top row). While most of these techniques have been developed in the area of geo-spatial uncertainty visualization [32
], they can be transferred to most of the other uncertainty types (discussed in Section 3) and to the context of collection overviews in general. In the second row of Figure 4
, selected techniques are summarized to encode temporal uncertainty as a linear plot along a spatial dimension [26
]. The third row collects uncertainty visualization techniques for edges of graphs [44
While these techniques can be rather easily applied and implemented for single data dimensions, challenges of complex and combinatorial design emerge when an orchestrated implementation for synoptic interfaces is required [10
]. In such a context, the design of these techniques has to be combined, aligned, and well-attuned to encode uncertainty as an additional layer in a consistent fashion. To do so, all options should be assessed for their consistent use in different views and probably also for different types of uncertainty. As a simple assessment method, we suggest drawing up a selection of relevant uncertainty encoding techniques and cross-tabulating these techniques with the multiple views that a visualization system will use. When assessing the specific applicability of techniques for each given view, a selected technique will exhibit either a good or a bad fit, which can be documented with a value. In comparing different possible techniques, such a tabular approach helps to weed out design options which collide with already existing features of a system’s design space, and it can highlight encoding options which could be implemented in a consistent and well-attuned fashion. Figure 4
illustrates such an assessment with regard to the PolyCube framework of collection visualization.
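The cross-tabulation itself is simple enough to sketch in code. The following Python snippet is a minimal illustration, assuming a numeric scale of 0 (bad fit), 1 (usable with care), and 2 (good fit). The scale, the function names, and the example scores are our own placeholders rather than values taken from the figure.

```python
from statistics import mean

# Hypothetical fit scores: technique -> view -> score (0 bad, 1 with care, 2 good).
FIT = {
    "color hue": {"space-time cube": 2, "color coding": 0,
                  "juxtaposition": 2, "animation": 2},
    "error bars": {"space-time cube": 2, "color coding": 0,
                   "juxtaposition": 0, "animation": 0},
}

def assess(fit):
    """Return (technique, mean score, worst score), best average first."""
    rows = []
    for technique, by_view in fit.items():
        scores = list(by_view.values())
        rows.append((technique, mean(scores), min(scores)))
    return sorted(rows, key=lambda row: (row[1], row[2]), reverse=True)

def consistent(fit, threshold=1):
    """Techniques whose worst score across all views still meets the threshold."""
    return [t for t, by_view in fit.items() if min(by_view.values()) >= threshold]

for technique, average, worst in assess(FIT):
    print(f"{technique}: mean={average}, worst={worst}")
print("usable in every view:", consistent(FIT))
```

Reporting the worst score next to the average is a deliberate choice in this sketch: a technique with a high average can still fail completely in one view, which is exactly the kind of collision the tabular assessment is meant to surface.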
4.2. The PolyCube Project
We set up the PolyCube framework (https://donau-uni.ac.at/en/polycube) as a web-based multi-perspective visualization environment for time-oriented data from cultural collections. Its visualization architecture is modular, so that multiple standard visualization techniques and templates (such as maps, sets, and graphs) can be used to generate time-oriented visualizations [10
]. One central option to do so is given by the technique of space-time cube representations, which build on a 2D-visualization as a base, and add a third spatial dimension for encoding time, to generate historically expressive data sculptures. Aside from geographic space-time cubes (geo vis), the framework can generate set-typed space-time cubes (set vis, [53
]) and relational space-time cubes (graph vis) (see Figure 5
, upper row). Due to the known relevance of the temporal data dimension, time is consistently represented along the vertical axis across these coordinated cubes.
In addition, and to compensate for possible drawbacks of 3D representations, time can be visualized with three further visualization techniques: color coding, juxtaposition, and animation. All these techniques to represent the essential time dimension in cultural collections are available on demand, and the different views are mediated via seamless transitions, which aim to support and preserve the users’ orientation and to maintain their mental map during navigation [10
]. While this framework enables the synoptic visual analysis of cultural collections from multiple time-visualization perspectives, the resulting combinatorial design space is of notable complexity. Thus, the ongoing introduction of uncertainty visualizations into its operating visualization architecture proves to be a major challenge. We will briefly outline the corresponding assessment of the design space, which is represented in Figure 4
. In doing so, we also build on a related discussion of the geo-temporal design space, which has already shown that the desideratum of visual consistency becomes a major problem when taking too many time-visualization perspectives into account [52
Visualization designers in all areas commonly build on the technique of multiple views when it comes to complex interface design. For example, the Palladio interface, a widely used open tool for cultural collection visualization, uses (among other views) (i) a geo-spatial and (ii) a relational visualization in combination with (iii) a timeline visualization [54
]. Building on this standard practice, the PolyCube framework aims to connect, combine, and intertwine its multiple views for the sake of a better integrated “bigger picture” of complex collections. As described earlier, it uses maps, sets, and graphs to represent geographic, set-typed, and relational data, and it uses space-time cubes, color coding, juxtaposition, and animation to encode the essential temporal data dimension in multiple complementary fashions. Figure 4
shows how the resulting cross-tabular design space could be matched with existing uncertainty types and uncertainty visualization techniques—to assess the goodness of fit for every visualization option.
Uncertainty visualization in PolyCube: In all four temporal perspectives (= rows) we use dot-like glyphs to represent individual cultural objects (i.e., for maps, sets, and graphs). Many uncertainty visualization options from the context of geographic research (first row of techniques: saturation, fuzziness, etc.) could thus be used to modify these glyphs, but how well they are suited depends on the time-visualization perspective. For example, color hue would be a good uncertainty encoding technique within the space-time cube, the juxtaposition, or the animation view (thus encoded in dark green), but it obviously does not go together well with the color-coding perspective (grey). Other techniques stemming from temporal uncertainty encoding (second row) can be productively used for encoding uncertainty with linear extensions (such as error bars) in the space-time cube (dark green), but they cannot be meaningfully applied to the three other time visualization perspectives or other types of data visualization. As for the visualization of graphs, the figure shows how relational uncertainty could be encoded by modifying edges (bottom row), but these modifications have to be chosen carefully so that they are not mistaken for depth cues in the space-time cube (light green) or for transitions in the animation view (grey). Looking at the whole table, it becomes clear (1) that many uncertainty visualization techniques are not applicable within the PolyCube collection visualization framework, and (2) that there is no single best uncertainty encoding technique for use in all PolyCube views, but (3) that we could markedly narrow down the design space for different uncertainty types and views.
When doing such comparative assessments in a more fine-grained, quantitative fashion, visualization projects can find out which uncertainty visualization technique fits best with their own design space (e.g., by looking for the most consistent possible use across cells and selecting those techniques with the highest average values). As a disconcerting takeaway of our exemplary assessment for the PolyCube framework, Figure 4
also shows that we do not find any encoding technique for glyphs to fit well with all dimensions of the design space. While this has already been established with regard to the combination of geographic and temporal data uncertainty only [52
], it holds true all the more for the whole spectrum of data dimensions. This means that complex, synoptic interface design either (a) has to accept that, despite its merits, it comes at the cost of requiring additional encoding options, and/or (b) has to break the general “rule of consistency” [55] for the sake of a more flexible working solution (see Section 5.2).
With this paper, we developed visualization options to represent the full breadth of possible data quality issues in cultural collections in a more systematic and synoptic fashion. To the best of our knowledge, no visual interface so far allows users to inspect more than selected single dimensions of data uncertainty. To facilitate future approaches to a more synoptic representation, we assembled existing uncertainty encoding techniques and discussed how they could be matched with the design space of a cultural collection visualization system. To illustrate associated challenges for complex interface design, we showcased how the design space of the PolyCube system reacts to the increase of uncertainty-related data and visualization complexity. As the PolyCube framework supports the synoptic exploration and experience of rich GLAM-typed data collections from multiple perspectives, it technically also enables novel types of integrated uncertainty representation. Thus, we outlined how to represent data uncertainty with regard to geo-, set-typed-, and relational-temporal collection information in an integrated fashion, but we also identified limits and necessary trade-offs imposed by the rise of visualization and interface complexity.
While our current research centers on related implementation and evaluation efforts, we see a specific demand for future research to study the corresponding costs and benefits of uncertainty visualization with regard to different user groups. While we advocate the deliberate implementation of uncertainty visualization techniques in the expert context, we see convincing reasons not to push the same features for non-expert users, but to keep the focus on overview first and offer further complexity layers on demand. Overall, the study of the outlined design space and its challenges amounts to a complex research program. Yet, the required developments might be worth the effort, as they arguably can contribute to the acceptability and validity of digital methods in the arts and humanities domains. Stepping back, we can situate this endeavor in a general striving for a wider data and representation realism. As realism in the arts is commonly defined as an endeavor to represent subject matters truthfully, free from artistic conventions, or implausible, exotic, and supernatural elements, we see reason to do the same for arts and humanities data domains, where “uncertain” oftentimes is as good as it gets.