Linked Data interfaces: a survey

Cultural heritage has become a very active application area for linked data and semantic web technologies. Visualizing and exploring the linked data produced to enhance cultural heritage is one of the significant challenges in the research field of Digital Humanities. In this survey, we therefore describe systems developed by the semantic web community in the context of the Web of Linked Data. We compare twenty-eight interfaces that present either generic datasets or digital library contents. We classify them by interaction paradigm, type of information displayed, and complexity reduction strategy.


Introduction
Large amounts of cultural heritage linked data are available for research and public use [49,50,51,52]. Visualizing and exploring the linked data produced to enhance cultural heritage is one of the significant challenges in the research field of Digital Humanities. We therefore surveyed the literature for works concerning the visualization and exploration of semantic data as knowledge graphs. We selected twenty-eight tools that we consider significant representatives of the identified classification categories. We first tagged the tools by reference context (digital libraries or generic databases), then classified them according to the interaction paradigm used, the type of information displayed and, finally, the strategy used to reduce the displayed information. The classification categories are listed in figure 1. Figure 2 lists the Linked Data (LD) interfaces together with the categories assigned to them. The first category (a) describes the interaction paradigm used, which can be tabular visualization, node-link visualization or visual query composition; the second category (b) describes the type of information displayed: data, model or data to model (namely schema extraction); the third category (c) describes the strategy used to reduce the displayed information: navigational visualization, incremental visualization or summarized visualization; the last category (d) identifies the interfaces that deal with digital library contents.

bernasconi@diag.uniroma1.it (E. Bernasconi); miguel.ceriani@uniba.it (M. Ceriani); mecella@diag.uniroma1.it (M. Mecella)
ORCID: 0000-0003-3142-3084 (E. Bernasconi); 0000-0002-5074-2112 (M. Ceriani); 0000-0002-9730-8882 (M. Mecella)

Related work
This section introduces previous work that surveyed and classified LD interfaces. Table 1 lists some of these works by year and description. In [39], the authors provide an overview of current systems and techniques for visualizing LD, although with a less detailed discussion. They identified six classification categories, some independent of each other, whereas we focus on three categories that characterize every tool for LD visualization and exploration. In [41], semantic portals are categorized conceptually into three generations: portals for search and browsing, portals with tools for distant reading, and portals for serendipitous knowledge discovery. The authors focused more on the purposes of use than on connecting those uses with the main characteristics of the interfaces. [40] reviews 16 existing tools in three categories, classified according to 16 criteria. The first category represents the LD browsers; the second represents domain-specific and cross-domain recommenders; the third represents the exploratory search systems. In [46], the authors conduct a focus group study to determine users' needs and system requirements for visualizing LD using the dashboard approach. They highlight the heterogeneity of LD and the need for highly customized visualizations for various kinds of LD. In [45], the authors survey existing tooling for LD consumption by non-technical end-users and present general requirements for end-user LD platforms, encompassing a variety of topics such as dataset discovery and data manipulation. In [47], the authors present the available techniques for visualizing large linked datasets and the systems that implement them. They identify seven categories of LD interfaces: semantic browsers, graph-based visualization systems, hierarchical visualization systems, SPARQL endpoint visualization tools, facet browsers, query writers, and schema extraction.

Figure 2: Semantic web interfaces

Existing works tend to classify LD interfaces by usage purpose and by the manner in which the knowledge graph (KG) is visualized, rather than by characteristics common to all LD exploration and visualization interfaces. For this reason, we analyze tools that are representative of each category, each with its own specific features. The interaction paradigm determines how the KG is shown to the user. The type of information displayed indicates whether the interface shows data, models, or a conversion from data to model. The information reduction strategy is the way each interface copes with the large amount of information contained in the knowledge base it queries. Below we analyze each category in turn, referring to tools that we believe are representative of the paradigm, information type, and strategy described.

Visualization of semantic data
The extracted semantics can be extremely useful for exploring a corpus of documents, but they are not fixed and homogeneous like a set of predefined metadata. Therefore, data models and visual user interfaces need to deal with complex and heterogeneous data. The semantic web [1] and LD [3] efforts address the modelling and integration of this kind of data on the web, as well as interaction with it. These efforts have lately contributed to the emergence of KGs as a way to organize complex datasets integrating multiple sources [10,6]. In a visualization scenario, users are expected to be interested both in finding specific resources and in discovering something interesting and valuable without knowing what they are looking for until they identify it (serendipity effect). In the latter case, users perform a sequence of operations (e.g., queries) in which the result of each operation determines the formulation of the next. Many user interfaces for the visualization and exploration of KGs exist, and new ones are developed every year, especially using semantic web and LD technologies [9,5,7,2]. We have identified three categories that capture the main characteristics of interfaces for viewing LD. The first category describes the interaction paradigm used; the second describes the type of information displayed; the last describes the strategies used to reduce the displayed information.

By interaction paradigm
Several tools offer basic interactive operations that allow users to visually explore a KG provided by either a data file or a SPARQL endpoint. This subsection classifies the tools for viewing LD by interaction paradigm. The interaction paradigm concerns the tools and processes used to produce a visual representation of the data that can be explored and analyzed directly within the visualization. Different interaction paradigms support different methodologies for gaining data-driven insights. We have identified three types of interaction paradigm:
• Tabular: In interfaces with a tabular interaction paradigm, information about a single resource is shown in one visualization. Views focus on one or more tables, which show specific properties linked to the resource, for example media files such as photos, descriptions, or links to other related resources.
• Node-link: In the node-link paradigm, resources are represented by nodes or boxes, while triples are represented by arcs that connect the resources. The node-link view can be static or dynamic (the latter allowing interaction).
• Visual query composition: The visual query paradigm comprises user interfaces that allow the user to perform advanced queries without necessarily having technical knowledge of the RDF model and the SPARQL language.

Tabular visualization tools
DBpedia (fig. 3) is a project that aims to extract structured content from the information created in the Wikipedia project. DBpedia presents each resource in tabular format as a list of triples. The resource page shows all the connections of the resource, that is, all the triples in which it acts as subject or object, including both inbound and outbound arcs. All predicates, and all objects that are themselves resources, are rendered as hyperlinks (blue and underlined), so the user can jump from one resource page to another and explore related resources.
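
As a rough illustration of the tabular paradigm, the following Python sketch builds the rows of a resource page from a list of triples, separating outbound arcs (the resource is the subject) from inbound arcs (the resource is the object). The sample triples, the `ex:` names and the `resource_table` helper are hypothetical; this is not how DBpedia itself is implemented.

```python
# Hypothetical in-memory triples (subject, predicate, object); not real DBpedia data.
TRIPLES = [
    ("ex:Augustus", "ex:birthPlace", "ex:Rome"),
    ("ex:Augustus", "rdfs:label", '"Augustus"'),
    ("ex:Tiberius", "ex:predecessor", "ex:Augustus"),
    ("ex:Rome", "ex:country", "ex:Italy"),
]


def resource_table(triples, resource):
    """Return the rows of a tabular view for one resource.

    Each row is (direction, predicate, other_end): direction is "out" when
    the resource is the subject of the triple and "in" when it is the object.
    """
    rows = []
    for s, p, o in triples:
        if s == resource:
            rows.append(("out", p, o))
        if o == resource:
            rows.append(("in", p, s))
    return rows


if __name__ == "__main__":
    for direction, predicate, other in resource_table(TRIPLES, "ex:Augustus"):
        print(f"{direction:>3}  {predicate:<16} {other}")
```

In a real interface, every `other_end` value that is itself a resource would be rendered as a link to that resource's own table.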

Node-Link visualization tools
Aemoo [16] aims to provide more information than the content available from a SPARQL endpoint alone by supporting exploratory search over the Web. The tool receives information from the DBpedia endpoint and enriches it with information gathered from Wikipedia, Google News, and Twitter. Its primary focus is to provide information about a resource while building a bridge between the Semantic Web and the traditional Web. Aemoo uses Wikipedia as its primary source of information. It collects all the wikilinks connected to the resource and divides them into "set nodes", each containing resources of the same class. It then searches Google News and Twitter for more information connected to the subject. All the set nodes are displayed in a graph whose core node is the resource of interest. Hovering over an entity shows contextual information about the relation between the subject and the hovered element; generally, that information is a sentence found on Wikipedia. Aemoo's approach is based on knowledge patterns, which represent the core elements contributing to knowledge about specific events.
LODmilla [21,22] shows digital library contents with a node-link interaction paradigm in which users can search for and extract data associations hidden inside the LD with the help of nodes. The LODmilla interface does not display the data underlying the documents.
ResearchSpace [53] is an open-source, versatile and flexible platform for working with digital cultural heritage data in a linked data environment, allowing for better discoverability and reuse of data. It includes a node-link interaction paradigm with incremental visualization for knowledge exploration. It also offers a collaborative annotation feature for texts and images.

Visual query composition
Other works address node-link visual query composition. These tools allow users without prior knowledge of semantic web technologies to express SPARQL queries easily using visual graph representations. SPARQLFilterFlow [37] is a web tool based on a filter/flow model; it allows users to formulate SPARQL queries using graphical elements in a tree-based visualization. Similarly, in QueryVOWL [23,24] the user combines visual elements based on the VOWL notation into graphs that are transformed into SPARQL queries. The primary focus of this tool is query construction rather than data exploration.
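
To make the idea of turning a drawn graph into a query more concrete, the sketch below translates a small graph pattern (nodes and labelled edges) into a SPARQL SELECT string. It is only a minimal illustration under our own assumptions: the `pattern_to_sparql` function, the node identifiers and the `ex:` vocabulary are hypothetical and do not reflect the internals of SPARQLFilterFlow or QueryVOWL.

```python
def pattern_to_sparql(nodes, edges, prefixes, limit=100):
    """Translate a drawn graph pattern into a SPARQL SELECT query string.

    nodes:    dict mapping a node id to a variable ("?x") or a prefixed name.
    edges:    list of (source_id, predicate, target_id) tuples.
    prefixes: dict mapping a prefix to its namespace IRI.
    """
    header = [f"PREFIX {name}: <{iri}>" for name, iri in sorted(prefixes.items())]
    # Every node drawn as a variable becomes a projected variable.
    variables = sorted({term for term in nodes.values() if term.startswith("?")})
    # Every edge becomes one triple pattern.
    patterns = [f"  {nodes[s]} {p} {nodes[t]} ." for s, p, t in edges]
    select = f"SELECT DISTINCT {' '.join(variables)} WHERE {{"
    return "\n".join(header + [select] + patterns + [f"}} LIMIT {limit}"])


# Hypothetical pattern: books whose author was born in Rome.
PREFIXES = {"ex": "http://example.org/"}
NODES = {"n1": "?book", "n2": "?author", "n3": "ex:Rome"}
EDGES = [("n1", "ex:author", "n2"), ("n2", "ex:birthPlace", "n3")]

if __name__ == "__main__":
    print(pattern_to_sparql(NODES, EDGES, PREFIXES))
```

The user only manipulates the nodes and edges; the query text is generated and could then be sent to any SPARQL endpoint.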

By type of information
This subsection lists tools classified by the type of information represented:
• Data visualization. Tools made to visualize the actual data.
• Model visualization. Tools that show data models (i.e., schemas and ontologies).
• Data to model visualization. Tools that, starting from the data, extract and visualize the underlying data model.
The tools representing each category are described below.

Data visualization
Knowledge exploration tools support a particular kind of information seeking over one or more knowledge bases: they reveal related information to the searcher and help retrieve what users are looking for. In this kind of search, the final targets are not known in advance and the goal itself is not precisely defined. The search therefore involves additional activities such as learning, exploration and evaluation. Ariadne [54] is a digital cultural heritage portal for knowledge exploration in archaeology. Developed with the support of the European Commission, it provides a centralized access point to digital cultural heritage resources across Europe. The Ariadne portal is designed to help users discover and explore a wealth of cultural heritage resources, including collections, digital libraries, archives, and museums. The portal aggregates metadata from these resources and presents it in a consistent, searchable format, allowing users to easily find and access digital cultural heritage content from various sources. Additionally, the portal supports linked data principles and provides APIs for integrating it into other applications and services. Overall, the Ariadne portal aims to improve access to digital cultural heritage resources and to facilitate the discovery, reuse, and preservation of cultural heritage content.

Model visualization
Ontodia [8] is a web-based tool for visualizing ontologies and semantic datasets, with additional functionality for sharing and distributing the resulting diagrams. Ontodia adopts a 2D node-link visualization with a UML-inspired way of displaying additional information about each node. In terms of layouts, the tool offers force-directed and grid layouts. It includes a hierarchical relationship view that displays the parent-child relationships between classes in a tree layout. Since the tool is also meant to visualize semantic datasets, it allows instances to be dragged and dropped onto the diagram. The user can freely adjust the diagram by dragging additional items onto the canvas, rearranging them, removing nodes from the graph, and turning links between nodes on and off. Ontodia supports data exploration: the user can list the nodes related to a selected node and, from the instance panel, drag one or several related nodes onto the canvas, thus expanding the graph and exploring the ontology. The tool's diagram management features allow users to publish a fixed URL of the diagram on the Web, share it with others via email address, or keep it private. The tool also introduces a data source entity, access to which can be managed in the same way as access to a diagram. Searching and filtering are available for classes, instances and links. Ontodia was designed to simplify the visualization of ontologies and semantic data; for this reason, some OWL constructs are omitted and only the basic ones are shown in the graph.
TopBraid Composer is designed mainly for ontology editing. Its visualization is a side feature and is not particularly convenient for simple tasks, for several reasons. The full version of Composer is paid, while Ontodia is a free service, which could be a decisive factor for researchers. Moreover, TopBraid Composer runs in the Eclipse environment, which is not comfortable for non-programmers, while Ontodia is deliberately designed for non-programming users.
WebVOWL [25] is a web application for the user-oriented visualization of ontologies. It implements the Visual Notation for OWL Ontologies (VOWL) by providing graphical depictions for elements of OWL that are combined into a force-directed graph layout representing the ontology. Interaction techniques allow users to explore the ontology and customize the visualization. The VOWL visualizations are automatically generated from JSON files into which the ontologies need to be converted. A Java-based OWL2VOWL converter is provided along with WebVOWL. The force-directed graph layout uses a physics simulation in which the forces are applied iteratively, resulting in an animation that dynamically positions the graph nodes. The energy of the forces cools down at each iteration, and the layout animation stops automatically after some time to provide a stable graph visualization. WebVOWL renders the graphical elements according to the VOWL specification.
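
The iterative, cooling force simulation described above can be sketched in a few lines of Python. The code follows the general shape of a Fruchterman-Reingold style layout (pairwise repulsion, attraction along edges, a per-step displacement cap that shrinks at every iteration); it is an assumption-laden illustration, not WebVOWL's actual implementation, and the `force_layout` function, its parameters and the cooling factor of 0.95 are our own choices.

```python
import math
import random


def force_layout(nodes, edges, iterations=200, size=1.0, seed=0):
    """Compute 2D positions with a simple force-directed simulation.

    All node pairs repel each other, connected nodes attract each other,
    and the maximum displacement per step ("temperature") cools down at
    every iteration so that the layout settles.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, size), rng.uniform(0, size)] for n in nodes}
    k = size / math.sqrt(len(nodes))  # preferred distance between nodes
    temperature = size / 10           # maximum displacement per step
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsive forces between every pair of nodes.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[a][0] += dx / dist * force
                disp[a][1] += dy / dist * force
                disp[b][0] -= dx / dist * force
                disp[b][1] -= dy / dist * force
        # Attractive forces along the edges.
        for a, b in edges:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[a][0] -= dx / dist * force
            disp[a][1] -= dy / dist * force
            disp[b][0] += dx / dist * force
            disp[b][1] += dy / dist * force
        # Move each node, limiting the step length by the temperature.
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy) or 1e-9
            step = min(length, temperature)
            pos[n][0] += dx / length * step
            pos[n][1] += dy / length * step
        temperature *= 0.95  # cooling: later iterations move nodes less
    return pos


if __name__ == "__main__":
    layout = force_layout(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")])
    for node, (x, y) in layout.items():
        print(f"{node}: ({x:.3f}, {y:.3f})")
```

An interactive tool would redraw the graph after each iteration, which produces the animation mentioned above; here only the final positions are returned.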

Data to model visualization (schema extraction)
Recently, the problem of data to model visualization, namely schema extraction, has been examined in the context of LD. VizLOD [35], LD-VOWL [19], and RDF2Graph [36] use SPARQL queries over the RDF triples to infer the ontology schema. The tools first identify and present the most representative concepts, using several methods and assumptions, e.g., considering the classes with the largest number of instances as representative. Then, the ontology schema is (progressively) visualized as a graph, offering several interactive operations.
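
A minimal sketch of this kind of schema inference is given below. Instead of issuing SPARQL queries against an endpoint, as the cited tools do, it works on an in-memory list of triples to stay self-contained; the `extract_schema` function and the `ex:` data are hypothetical. It counts instances per class (so that the most populated classes can be shown first) and lifts instance-level links to class-level links.

```python
from collections import Counter

RDF_TYPE = "rdf:type"


def extract_schema(triples):
    """Infer a class-level schema from instance-level triples.

    Returns (class_counts, links): class_counts maps each class to its
    number of instances; links maps (subject_class, property, object_class)
    to the number of instance-level triples supporting that link.
    """
    types = {}  # instance -> set of classes it belongs to
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    class_counts = Counter(c for classes in types.values() for c in classes)
    links = Counter()
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        for subject_class in types.get(s, ()):
            for object_class in types.get(o, ()):
                links[(subject_class, p, object_class)] += 1
    return class_counts, links


# Hypothetical sample data.
TRIPLES = [
    ("ex:b1", RDF_TYPE, "ex:Book"),
    ("ex:b2", RDF_TYPE, "ex:Book"),
    ("ex:a1", RDF_TYPE, "ex:Author"),
    ("ex:b1", "ex:author", "ex:a1"),
    ("ex:b2", "ex:author", "ex:a1"),
]

if __name__ == "__main__":
    counts, links = extract_schema(TRIPLES)
    print(counts.most_common())  # classes ordered by number of instances
    print(dict(links))
```

The resulting classes and links form a small graph that can be displayed progressively, starting from the classes with the most instances.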

Complexity reduction strategies
Viewing a large number of data objects is a challenging task. Providing an overview can be extremely difficult even for small datasets, because handling and visualizing datasets raises information overload issues. Consequently, visual scalability is a fundamental requirement of modern systems, which must effectively support data reduction and abstraction over many data objects. Three strategies have been identified for reducing the amount of information displayed:
• Navigational visualization. The visualization in many user interfaces is focused around a specific data object, typically a resource. The user can see the "neighborhood" of the current resource and can navigate to directly related resources. This strategy is often used in the tabular interaction paradigm.
• Incremental visualization. The paradigm of incremental visualization is often adopted in dynamic node-link user interfaces. The user controls a workspace where they can add or remove views of specific data objects from the dataset as needed. There are often shortcuts to visualize data objects related to the objects already in view.
• Summarized visualization. In order to offer an overview of a dataset while avoiding the problem of overplotting (related to visual information overload) in large graph visualizations, several tools use data reduction techniques to provide graph summaries.
The tools representing each category are described below.

Navigational Visualization
Far from the paradigm of simple linear lists of search results, new and more expressive navigation features, such as node-link views, cluster maps, geographic maps and timelines, support the user in perceiving information. Due to the high diversity of relationships between entities, interfaces must be highly generic. This in turn requires methods to structure information visually based on the user's interests, which means assigning a specific relevance to related entities. For example, the DBpedia entity "Imperatore Augusto" is described by more than 600 facts (RDF triples). This amount of information cannot be presented to the user at a glance. Heuristics based on the statistical and semantic analysis of the underlying RDF graph structure are applied to rank related entities by relevance. Furthermore, each user may have different preferences, so relevance rankings need to be customized. User behavior can be monitored by analyzing the log file. From the user's preferences, it is possible to generate a profile and map it to a subgraph of the linked open data, representing the user's interests. This allows subjective relevance ranking and personalized search recommendations.

Incremental visualization
Fenfire [33], LodLive [20], and LodView are web exploratory tools that allow users to browse LD using a dynamic node-link interaction paradigm. The user can explore LD by following links, starting from a given URI or a SPARQL endpoint.
LodLive is an LD visualization tool that allows the user to explore a dataset starting from a single resource and then to widen the search by expanding the properties of the elements displayed on the screen. LodLive can retrieve information from several endpoints once the URI of a resource is provided. It also offers an autocompletion feature for choosing the URI of the resource of interest. Each resource is drawn on the screen as a circle containing the value of its rdfs:label property, with smaller circles arranged around it, each of which represents an object connected to the central resource.
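
The incremental strategy can be pictured as a workspace that starts from one resource and grows only when the user asks for it. The following sketch is a simplified model under our own assumptions (the `Workspace` class, its methods and the sample triples are hypothetical, and a real tool such as LodLive would fetch neighbours from a SPARQL endpoint rather than from a local list).

```python
class Workspace:
    """Keeps track of which resources of a dataset are currently in view."""

    def __init__(self, triples):
        self.triples = list(triples)
        self.visible = set()

    def add(self, resource):
        self.visible.add(resource)

    def remove(self, resource):
        self.visible.discard(resource)

    def expand(self, resource):
        """Bring the direct neighbours of a resource into view.

        Returns the set of resources that were newly added.
        """
        neighbours = set()
        for s, _p, o in self.triples:
            if s == resource:
                neighbours.add(o)
            elif o == resource:
                neighbours.add(s)
        added = neighbours - self.visible
        self.visible |= added
        return added

    def edges(self):
        """Triples whose two ends are both in view (the arcs to draw)."""
        return [t for t in self.triples
                if t[0] in self.visible and t[2] in self.visible]


# Hypothetical sample data: a small chain of resources.
TRIPLES = [
    ("ex:A", "ex:knows", "ex:B"),
    ("ex:B", "ex:knows", "ex:C"),
    ("ex:C", "ex:knows", "ex:D"),
]

if __name__ == "__main__":
    ws = Workspace(TRIPLES)
    ws.add("ex:A")
    print(ws.expand("ex:A"), ws.edges())
```

Only the part of the graph that the user has explicitly requested is ever drawn, which keeps the view small regardless of the size of the dataset.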

Summarized Visualization
H-BOLD [18] (fig. 4), an acronym for High-level visualization over Big Linked Open Data, is the successor of LODeX. It aims to help general users, without any knowledge of SPARQL or of the content of a dataset, to explore large sources of linked open data. For summarized representation and exploration, H-BOLD collects information about datasets (number of triples, class list, property list, and class relations) in a data store and uses that information to generate a "Schema Summary". If the number of classes is high, a community detection algorithm is executed to shrink the graph into a "Cluster Schema". Cluster Schemas and Schema Summaries can be expanded on the fly for a detailed visualization of the dataset. The tool can create a fast and compact overview of the content of a SPARQL endpoint. In the same context, RDF4U [26] offers graph visualization over summarized graphs. Researchers have observed that graphs are powerful instruments for enhancing human comprehension thanks to their readability and conciseness, but only when they display relevant information alone. The RDF4U visualization approach combines graph simplification, triple ranking, and property selection. During graph simplification, the tool automatically analyzes the data collected by querying and searches for redundant information.
The Graphless [17] visualization tool generates summaries based on statistical data, e.g., the connectivity degree of nodes and the frequency of properties.
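
The two statistics mentioned for Graphless are easy to compute, as the sketch below shows. The summarization rule used here (keep only the `top_k` most connected nodes and the triples between them) is a deliberately naive assumption of ours, chosen to show how such statistics can drive data reduction; it is not the algorithm of Graphless or of any other cited tool, and all names are hypothetical.

```python
from collections import Counter


def graph_statistics(triples):
    """Return (degree, property_frequency) for a list of triples."""
    degree = Counter()
    property_frequency = Counter()
    for s, p, o in triples:
        degree[s] += 1
        degree[o] += 1
        property_frequency[p] += 1
    return degree, property_frequency


def summarize(triples, top_k):
    """Keep only the triples connecting the top_k highest-degree nodes."""
    degree, _ = graph_statistics(triples)
    keep = {node for node, _count in degree.most_common(top_k)}
    return [t for t in triples if t[0] in keep and t[2] in keep]


# Hypothetical sample data.
TRIPLES = [
    ("ex:hub", "ex:p", "ex:a"),
    ("ex:hub", "ex:p", "ex:b"),
    ("ex:hub", "ex:q", "ex:c"),
    ("ex:a", "ex:p", "ex:b"),
]

if __name__ == "__main__":
    degree, frequency = graph_statistics(TRIPLES)
    print(degree.most_common(), frequency.most_common())
    print(summarize(TRIPLES, top_k=3))
```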

Paths discovery
Some tools focus on analyzing the associations between LD entities. In this context, associations (i.e., relationships) are considered to be the paths connecting the entities in the LD graph.
Arca [13,14] (fig. 5) is a tool for the knowledge exploration of digital libraries. It is a modular tool that includes a knowledge extraction and semantic enrichment engine and an interface for searching and exploring data. Node-link visualization was selected as the paradigm, with a multi-level representation of information. The books and their associated contents are placed at the centre of the view. The mode of interaction is incremental with respect to the information sought: the user moves from the particular to the general. Interaction supports serendipity, that is, unexpected information can be reached by exploring and navigating the graph. Arca allows the creation of visual queries through the integrated "trace path" component. Trace path retrieves information common to two selected concepts, such as books containing two concepts or concepts common to two books. Furthermore, Arca integrates an association validation system to support the collaborative improvement of data quality [11]. RelClus [31] uses class/property hierarchies to generate hierarchies of paths. WiSP [30] (Weighted Shortest Paths for RDF Graphs) uses weighted shortest path algorithms to identify and present the most relevant paths between two resources. The weights are computed based on several metrics, e.g., PageRank and node degree.
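
As an illustration of weighted path discovery, the sketch below runs Dijkstra's algorithm over an undirected view of the triples. The weighting is an assumption made for this example only: stepping onto a node costs log(1 + degree), so that paths through highly connected hubs are penalized. WiSP's actual metrics and their combination are not reproduced here, and the function name and sample data are hypothetical.

```python
import heapq
import math
from collections import defaultdict


def weighted_shortest_path(triples, source, target):
    """Return the cheapest path between two resources, or None.

    Triples are treated as undirected links. The cost of stepping onto a
    node is log(1 + degree), an assumed weighting that penalizes hubs.
    """
    adjacency = defaultdict(set)
    for s, _p, o in triples:
        adjacency[s].add(o)
        adjacency[o].add(s)

    def cost(node):
        return math.log(1 + len(adjacency[node]))

    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            break
        if dist > best.get(node, math.inf):
            continue  # stale queue entry
        for neighbour in adjacency[node]:
            candidate = dist + cost(neighbour)
            if candidate < best.get(neighbour, math.inf):
                best[neighbour] = candidate
                previous[neighbour] = node
                heapq.heappush(queue, (candidate, neighbour))
    if target not in best:
        return None
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1]


# Hypothetical data: A reaches B either through a busy hub or through X.
TRIPLES = [
    ("A", "ex:p", "HUB"), ("HUB", "ex:p", "B"),
    ("HUB", "ex:p", "h1"), ("HUB", "ex:p", "h2"),
    ("HUB", "ex:p", "h3"), ("HUB", "ex:p", "h4"),
    ("A", "ex:p", "X"), ("X", "ex:p", "B"),
]

if __name__ == "__main__":
    print(weighted_shortest_path(TRIPLES, "A", "B"))
```

Both candidate paths have two hops, but the one through `X` is preferred because `X` has a lower degree than `HUB`; with unit weights the two paths would be indistinguishable.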
The LODVader [32] (LOD Visualization, Analytics and DiscovERy) system visualizes linked datasets following the graph layout used in the LOD cloud visualization, where the nodes represent the datasets and the edges represent the links between them.
Yewno Discover [4] (fig. 6) is an integrated system that offers classification and visual exploration of academic materials to help scholars in their research, but it cannot be adapted to different contexts of use without ad hoc adjustments. Furthermore, compared with Arca, it makes limited use of the KG structure for exploration (node-link interaction paradigm with static view). Sampo-UI [15] is a framework that provides a set of reusable and extensible components, application state management, and a read-only API for SPARQL queries, which can be used to create a user interface for a semantic portal. Sampo-UI supports different search paradigms: free-text search, faceted search, geospatial search, and temporal search. It provides users with different views of the search results as tables, lists, geospatial visualizations, or temporal visualizations. Unlike Sampo-UI, Arca also offers a service for knowledge extraction from unstructured data and a semantic enrichment service.

Conclusion
The survey reveals that LD interfaces use different interaction paradigms, each with its own strengths. At the moment, few interfaces combine the different paradigms, and combining them is a promising direction for improving the browsing experience of digital library content. Information reduction strategies are present in most tools and are crucial for exploring complex KGs. However, while there are different tools and paradigms for data exploration, it remains an open challenge to support the creation and exploration of data models by domain experts who may not possess a strong technical background in ontologies and RDF.
19th IRCDL (The Conference on Information and Research science Connecting to Digital and Library science), February 23-24, 2023, Bari, Italy * Corresponding author.

Table 1: Related work