Special Issue "Semantics for Big Data Integration"

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Information Systems".

Deadline for manuscript submissions: closed (15 September 2018)

Special Issue Editors

Guest Editor
Prof. Maurizio Vincini

Dipartimento di Ingegneria "Enzo Ferrari" – DIEF, Università di Modena e Reggio Emilia, Modena, Italy
Interests: data integration; ontology
Guest Editor
Prof. Domenico Beneventano

Dipartimento di Ingegneria "Enzo Ferrari" – DIEF, Università di Modena e Reggio Emilia, Modena, Italy
Interests: data integration; ontology

Special Issue Information

Dear Colleagues,

In recent years, there has been a great deal of interest in big data. Much of the work on big data has focused on volume and velocity in order to handle data set size and speed, but the problems of variety and veracity are equally important in dealing with the heterogeneity, diversity, and complexity of data. Semantic technologies can be a means to deal with these issues.

Therefore, the purpose of this Special Issue is to publish high-quality research from academic and industrial stakeholders, disseminating innovative solutions that explore how big data can leverage semantics, i.e., the challenges and opportunities that arise from adapting and transferring semantic technologies to the big data context.

Original, high-quality contributions that have not been published previously and are not currently under review by other journals or peer-reviewed conferences are sought.

Topics of interest include, but are not limited to, the following:

  • interplay of semantics and big data
  • semantic methods and technologies applied to big data dimensions
  • scalability of semantic methods and technologies
  • the use of semantic metadata, linked open data, and ontologies for big data
  • semantics for big data extraction, transformation, and integration
  • knowledge integration from big data on the Web
Prof. Maurizio Vincini
Prof. Domenico Beneventano
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 850 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Big data dimensions
  • Big data technology
  • Big data integration
  • Semantic data design
  • Information visibility

Published Papers (5 papers)


Research

Open Access Article: Integration of Web APIs and Linked Data Using SPARQL Micro-Services—Application to Biodiversity Use Cases
Information 2018, 9(12), 310; https://doi.org/10.3390/info9120310
Received: 9 November 2018 / Revised: 3 December 2018 / Accepted: 3 December 2018 / Published: 6 December 2018
Abstract
In recent years, Web APIs have become a de facto standard for exchanging machine-readable data on the Web. Despite this success, however, they often fail in making resource descriptions interoperable because they rely on proprietary vocabularies that lack formal semantics. The Linked Data principles similarly seek the massive publication of data on the Web, yet with the specific goal of ensuring semantic interoperability. Given their complementary goals, it is commonly admitted that cross-fertilization could stem from the automatic combination of Linked Data and Web APIs. Towards this goal, in this paper we leverage micro-service architectural principles to define a SPARQL Micro-Service architecture, aimed at querying Web APIs using SPARQL. A SPARQL micro-service is a lightweight SPARQL endpoint that provides access to a small, resource-centric, virtual graph. In this context, we argue that full SPARQL Query expressiveness can be supported efficiently without jeopardizing server availability. Furthermore, we demonstrate how this architecture can be used to dynamically assign dereferenceable URIs to Web API resources that do not have URIs beforehand, thus literally “bringing” Web APIs into the Web of Data. We believe that the emergence of an ecosystem of SPARQL micro-services published by independent providers would enable Linked Data-based applications to easily glean pieces of data from a wealth of distributed, scalable, and reliable services. We describe a working prototype implementation and illustrate the use of SPARQL micro-services in the context of two real-life use cases related to the biodiversity domain, developed in collaboration with the French National Museum of Natural History.
(This article belongs to the Special Issue Semantics for Big Data Integration)
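As an illustration of the idea described in the abstract above, the following minimal Python sketch queries a hypothetical SPARQL micro-service endpoint with the SPARQLWrapper library; the endpoint URL, prefix, and graph vocabulary are placeholders, not taken from the paper.

```python
# Minimal sketch: querying a (hypothetical) SPARQL micro-service that wraps a Web API.
# The endpoint URL and vocabulary below are illustrative, not from the paper.
from SPARQLWrapper import SPARQLWrapper, JSON

# A SPARQL micro-service behaves like an ordinary SPARQL endpoint, so a standard
# client library can be used; only the endpoint URL and query are service-specific.
endpoint = "https://example.org/sparql-ms/flickr/getPhotosByTag"  # hypothetical
sparql = SPARQLWrapper(endpoint)
sparql.setQuery("""
    PREFIX schema: <http://schema.org/>
    SELECT ?photo ?title WHERE {
        ?photo a schema:Photograph ;
               schema:name ?title .
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["photo"]["value"], "-", binding["title"]["value"])
```

Because a SPARQL micro-service is exposed as an ordinary endpoint, any standard SPARQL client can consume it; only the query and the service URL change from one micro-service to another.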

Open Access Article: Chinese Microblog Topic Detection through POS-Based Semantic Expansion
Information 2018, 9(8), 203; https://doi.org/10.3390/info9080203
Received: 25 June 2018 / Revised: 25 July 2018 / Accepted: 8 August 2018 / Published: 10 August 2018
Abstract
A microblog is a new type of social media for publishing, acquiring, and spreading information. Finding the significant topics of a microblog is necessary for popularity tracing and public opinion following. This paper puts forward a method to detect topics from Chinese microblogs. Since traditional methods show low performance on the short texts of microblog posts, we propose a topic detection method based on the semantic description of the microblog post. The semantic expansion of the post supplies more information and clues for topic detection. First, semantic features are extracted from a microblog post. Second, the semantic features are expanded according to a thesaurus; here, TongYiCi CiLin is used as the lexical resource to find words with the same meaning. To overcome the polysemy problem, several semantic expansion strategies based on part-of-speech are introduced and compared. Third, an approach to detect topics based on semantic descriptions and an improved incremental clustering algorithm is introduced. A dataset from Sina Weibo is employed to evaluate our method. Experimental results show that our method yields better results for both post clustering and topic detection in Chinese microblogs. We also found that the semantic expansion of nouns is far more effective than that of other parts of speech. The potential mechanism behind this phenomenon is also analyzed and discussed.
(This article belongs to the Special Issue Semantics for Big Data Integration)
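The following sketch illustrates the general shape of POS-based semantic expansion followed by single-pass incremental clustering. It is not the authors' implementation: the toy thesaurus stands in for TongYiCi CiLin, and the cosine measure and threshold are arbitrary placeholders.

```python
# Illustrative sketch (not the paper's code): expand only noun features via a
# synonym thesaurus, then cluster posts with a single-pass incremental algorithm.
from collections import Counter
import math

# Hypothetical stand-in for TongYiCi CiLin: word -> set of synonyms.
THESAURUS = {"电影": {"影片"}, "上映": {"放映"}}

def expand(tokens_with_pos):
    """Keep all tokens; add thesaurus synonyms only for nouns (POS tag 'n')."""
    bag = Counter()
    for word, pos in tokens_with_pos:
        bag[word] += 1
        if pos == "n":
            for syn in THESAURUS.get(word, ()):
                bag[syn] += 1
    return bag

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def incremental_cluster(posts, threshold=0.3):
    """Assign each post to the most similar existing cluster, or start a new one."""
    centroids, assignments = [], []
    for bag in posts:
        sims = [cosine(bag, c) for c in centroids]
        if sims and max(sims) >= threshold:
            best = sims.index(max(sims))
            centroids[best].update(bag)      # merge the post into the cluster centroid
            assignments.append(best)
        else:
            centroids.append(Counter(bag))
            assignments.append(len(centroids) - 1)
    return assignments

posts = [expand([("电影", "n"), ("上映", "v")]), expand([("影片", "n"), ("好看", "a")])]
print(incremental_cluster(posts))  # the two posts end up in the same cluster via the noun expansion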

Open Access Article: LOD for Data Warehouses: Managing the Ecosystem Co-Evolution
Information 2018, 9(7), 174; https://doi.org/10.3390/info9070174
Received: 9 June 2018 / Revised: 9 July 2018 / Accepted: 11 July 2018 / Published: 17 July 2018
Abstract
For more than 30 years, data warehouses (DWs) have attracted particular interest both in practice and in research. This success is explained by their ability to adapt to their evolving environment. One of the latest challenges for DWs is their ability to open their frontiers to external data sources in addition to internal ones. The development of linked open data (LOD) as external sources is an excellent opportunity to create added value and enrich the analytical capabilities of DWs. However, the incorporation of LOD in the DW must be accompanied by careful management. In this paper, we are interested in managing the evolution of DW systems that integrate internal sources and external LOD datasets. The particularity of LOD is that they contribute to evolving the DW at several levels: (i) the source level, (ii) the DW schema level, and (iii) the DW design-cycle constructs. In this context, we have to ensure this co-evolution, as conventional evolution approaches are adapted neither to this new kind of source nor to the semantic constructs underlying LOD sources. One way of tackling this co-evolution issue is to ensure the traceability of DW constructs across the whole design cycle. Our approach is tested using the LUBM (Lehigh University Benchmark), different LOD datasets (DBpedia, YAGO, etc.), and the Oracle 12c database management system (DBMS) used for the DW deployment.
(This article belongs to the Special Issue Semantics for Big Data Integration)
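As a rough illustration of treating LOD as an external DW source, the sketch below queries DBpedia's public SPARQL endpoint for facts that could enrich a hypothetical "City" dimension; the mapping to a DW schema and the co-evolution bookkeeping described in the paper are only hinted at in comments.

```python
# Sketch: pull external LOD facts (from DBpedia's public endpoint) that could enrich
# a hypothetical "City" dimension of a data warehouse. The DW mapping is illustrative.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?city ?population WHERE {
        ?city a dbo:City ;
              dbo:populationTotal ?population .
    } LIMIT 5
""")
sparql.setReturnFormat(JSON)
rows = sparql.query().convert()["results"]["bindings"]

# Each row could feed an ETL step that adds a 'population' attribute to the DW schema;
# recording which schema elements originate from which LOD source is the kind of
# traceability that co-evolution management relies on.
for r in rows:
    print(r["city"]["value"], r["population"]["value"])
```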

Open Access Article: High Performance Methods for Linked Open Data Connectivity Analytics
Information 2018, 9(6), 134; https://doi.org/10.3390/info9060134
Received: 9 May 2018 / Revised: 29 May 2018 / Accepted: 29 May 2018 / Published: 3 June 2018
Abstract
The main objective of Linked Data is linking and integration, and a major step in evaluating whether this target has been reached is to find all the connections among the Linked Open Data (LOD) Cloud datasets. Connectivity among two or more datasets can be achieved through common Entities, Triples, Literals, and Schema Elements, while more connections can occur due to equivalence relationships between URIs, such as owl:sameAs, owl:equivalentProperty, and owl:equivalentClass, since many publishers use such equivalence relationships to declare that their URIs are equivalent to URIs of other datasets. However, there are no connectivity measurements (or indexes) available that involve more than two datasets and cover the whole content (e.g., entities, schema, triples) or “slices” (e.g., triples for a specific entity) of datasets, although such measurements can be of primary importance for several real-world tasks, such as Information Enrichment, Dataset Discovery, and others. In general, it is not an easy task to find the connections among the datasets, since there is a large number of LOD datasets and the transitive and symmetric closure of equivalence relationships must be computed so that no connections are missed. For this reason, we introduce scalable methods and algorithms (a) for computing the transitive and symmetric closure of equivalence relationships (since they can produce more connections between the datasets); (b) for constructing dedicated global semantics-aware indexes that cover the whole content of datasets; and (c) for measuring the connectivity among two or more datasets. Finally, we evaluate the speedup of the proposed approach and report comparative results for over two billion triples.
(This article belongs to the Special Issue Semantics for Big Data Integration)
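The closure computation mentioned in the abstract can be illustrated with a standard union-find structure; the sketch below uses toy owl:sameAs pairs and does not reproduce the paper's dedicated indexes or connectivity measurements.

```python
# Minimal sketch: transitive and symmetric closure of owl:sameAs links via union-find.
# The URIs are toy examples, not the paper's data.
from collections import defaultdict

class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

same_as = [
    ("http://dbpedia.org/resource/Aristotle", "http://yago-knowledge.org/resource/Aristotle"),
    ("http://yago-knowledge.org/resource/Aristotle", "http://www.wikidata.org/entity/Q868"),
]

uf = UnionFind()
for a, b in same_as:
    uf.union(a, b)   # symmetry and transitivity follow from sharing a common root

# All three URIs end up in one equivalence class, so datasets that use any of them
# can be counted as connected through this common real-world entity.
classes = defaultdict(set)
for uri in list(uf.parent):
    classes[uf.find(uri)].add(uri)
print([sorted(c) for c in classes.values()])
```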

Open Access Article: A Hybrid Information Mining Approach for Knowledge Discovery in Cardiovascular Disease (CVD)
Information 2018, 9(4), 90; https://doi.org/10.3390/info9040090
Received: 14 March 2018 / Revised: 8 April 2018 / Accepted: 10 April 2018 / Published: 12 April 2018
Abstract
The healthcare domain is usually perceived as “information rich” yet “knowledge poor”. Nowadays, an unprecedented effort is underway to increase the use of business intelligence techniques to solve this problem. Heart disease (HD) is a major cause of mortality in modern society. This paper analyzes the risk factors that have been identified in cardiovascular disease (CVD) surveillance systems. The Heart Care study identifies attributes related to CVD risk (gender, age, smoking habit, etc.) and other dependent variables that include a specific form of CVD (diabetes, hypertension, cardiac disease, etc.). In this paper, we combine Clustering, Association Rules, and Neural Networks for the assessment of heart-event-related risk factors, targeting the reduction of CVD risk. With the use of the K-means algorithm, significant groups of patients are found. Then, the Apriori algorithm is applied in order to understand the kinds of relations between the attributes within the dataset, first on the whole dataset and then refining the results through the subsets defined by the clusters. Finally, both results allow us to better define patients’ characteristics in order to make predictions about CVD risk with a multilayer perceptron neural network. The results obtained with the hybrid information mining approach indicate that it is an effective strategy for knowledge discovery concerning chronic diseases, particularly CVD risk.
(This article belongs to the Special Issue Semantics for Big Data Integration)
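A condensed sketch of the three-stage pipeline described above (K-means, Apriori, multilayer perceptron) is shown below on a synthetic toy dataset, using scikit-learn and mlxtend; the feature names, thresholds, and network size are illustrative rather than the study's settings.

```python
# Condensed sketch of the clustering + association rules + MLP pipeline on synthetic data.
# Feature names, thresholds, and network size are illustrative, not the study's settings.
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier
from mlxtend.frequent_patterns import apriori, association_rules

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age_over_60": rng.integers(0, 2, 200).astype(bool),
    "smoker": rng.integers(0, 2, 200).astype(bool),
    "hypertension": rng.integers(0, 2, 200).astype(bool),
    "diabetes": rng.integers(0, 2, 200).astype(bool),
})
df["cvd_risk"] = ((df["age_over_60"] & df["smoker"]) | df["hypertension"]).astype(int)

# 1) K-means: group patients into coarse profiles.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(df.drop(columns="cvd_risk"))

# 2) Apriori: mine attribute associations within each cluster.
for c in sorted(set(clusters)):
    subset = df[clusters == c].drop(columns="cvd_risk")
    frequent = apriori(subset, min_support=0.3, use_colnames=True)
    rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
    print(f"cluster {c}: {len(rules)} rules")

# 3) Multilayer perceptron: predict CVD risk from the attributes.
X, y = df.drop(columns="cvd_risk"), df["cvd_risk"]
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0).fit(X, y)
print("training accuracy:", round(mlp.score(X, y), 3))
```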
