Open Access Article
Information 2018, 9(6), 134; https://doi.org/10.3390/info9060134

High Performance Methods for Linked Open Data Connectivity Analytics

1
Institute of Computer Science, FORTH-ICS, Heraklion 70013, Greece
2
Department of Computer Science, University of Crete, Heraklion 70013, Greece
*
Authors to whom correspondence should be addressed.
Received: 9 May 2018 / Revised: 29 May 2018 / Accepted: 29 May 2018 / Published: 3 June 2018
(This article belongs to the Special Issue Semantics for Big Data Integration)

Abstract

The main objective of Linked Data is linking and integration, and a major step in evaluating whether this target has been reached is to find all the connections among the Linked Open Data (LOD) Cloud datasets. Connectivity among two or more datasets can be achieved through common Entities, Triples, Literals, and Schema Elements, and further connections can arise from equivalence relationships between URIs, such as owl:sameAs, owl:equivalentProperty and owl:equivalentClass, since many publishers use such relationships to declare that their URIs are equivalent to URIs of other datasets. However, no connectivity measurements (or indexes) are available that involve more than two datasets and cover either the whole content (e.g., entities, schema, triples) or “slices” (e.g., triples for a specific entity) of datasets, although they can be of primary importance for several real-world tasks, such as Information Enrichment, Dataset Discovery and others. In general, finding the connections among the datasets is not easy, since there is a large number of LOD datasets and the transitive and symmetric closure of equivalence relationships must be computed so that no connections are missed. For this reason, we introduce scalable methods and algorithms (a) for computing the transitive and symmetric closure of equivalence relationships (since they can produce more connections between the datasets); (b) for constructing dedicated global semantics-aware indexes that cover the whole content of datasets; and (c) for measuring the connectivity among two or more datasets. Finally, we evaluate the speedup of the proposed approach and report comparative results for over two billion triples.
Keywords: content-based connectivity measurements; semantic web; linked data; dataset discovery; information enrichment; LOD scale analytics; lattice of measurements; MapReduce; big data
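
The three steps named in the abstract (closure of equivalence relationships, a global entity index, and connectivity measurements over two or more datasets) can be illustrated with a small in-memory sketch. This is not the authors' implementation, which targets MapReduce-scale data; it is a toy version under stated assumptions. The function names (`same_as_classes`, `entity_index`, `common_entities`), the dataset labels, and the URIs are all hypothetical, and the subset enumeration is deliberately naive.

```python
from collections import defaultdict
from itertools import combinations


def same_as_classes(pairs):
    """Symmetric and transitive closure of owl:sameAs pairs via union-find.

    Returns a dict mapping every URI seen in `pairs` to a class representative.
    """
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lexicographically smallest URI as representative.
            parent[max(ra, rb)] = min(ra, rb)
    return {u: find(u) for u in list(parent)}


def entity_index(dataset_uris, rep):
    """Map each equivalence-class representative to the datasets containing it."""
    index = defaultdict(set)
    for dataset, uris in dataset_uris.items():
        for uri in uris:
            index[rep.get(uri, uri)].add(dataset)
    return index


def common_entities(index, min_size=2):
    """Count common entities for every subset of at least `min_size` datasets.

    Naive enumeration: exponential in the number of datasets sharing an entity.
    """
    counts = defaultdict(int)
    for datasets in index.values():
        ordered = sorted(datasets)
        for k in range(min_size, len(ordered) + 1):
            for subset in combinations(ordered, k):
                counts[subset] += 1
    return dict(counts)


# Hypothetical toy data: three datasets and two sameAs statements.
same_as = [("d1:Aristotle", "d2:Aristoteles"), ("d2:Aristoteles", "d3:Q868")]
datasets = {
    "D1": {"d1:Aristotle", "d1:Plato"},
    "D2": {"d2:Aristoteles", "d2:Plato"},
    "D3": {"d3:Q868"},
}

rep = same_as_classes(same_as)
counts = common_entities(entity_index(datasets, rep))
no_closure = common_entities(entity_index(datasets, {}))
```

In this toy data, D1 and D3 share no URI and no direct sameAs statement, so `no_closure` is empty; only after the closure is computed do the three datasets appear connected through a single entity. This is the sense in which skipping the closure would miss connections.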

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

MDPI and ACS Style

Mountantonakis, M.; Tzitzikas, Y. High Performance Methods for Linked Open Data Connectivity Analytics. Information 2018, 9, 134.

Information, EISSN 2078-2489, published by MDPI AG, Basel, Switzerland