Novel Database Systems and Data Mining Algorithms in the Big Data Era

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: closed (31 March 2021) | Viewed by 12537

Special Issue Editors


Guest Editor
Department of Control and Computer Engineering, Polytechnic of Turin, 10129 Turin, Italy
Interests: data mining; big data analytics; classification algorithms

Guest Editor
Dipartimento di Informatica, Universita' degli Studi di Verona, Ca' Vignal 2 - Strada le Grazie 15, I-37134 Verona - VR, Italy
Interests: database; big data; graph-based query languages; context-awareness; recommender systems

Guest Editor
Department of Automation and Computer Science, Polytechnic of Turin, 10129 Turin, Italy
Interests: multidocument text summarization; cross-lingual text analytics; quantitative trading systems based on ML; sentiment analysis; vector representations of text and deep natural language processing; time series analysis and forecasting; anomaly detection from time series data; classification of structured data; itemset mining and association rule discovery; generalized pattern extraction

Special Issue Information

Dear Colleagues,

The increasing availability of huge amounts of data, the so-called big data, possibly produced by devices that define heterogeneous, decentralized and distributed environments, poses new challenges for the database and data mining communities.

First, big data must be efficiently and properly collected, integrated, stored, managed and queried by means of novel database systems. Then, novel scalable data mining and machine learning algorithms can be applied to big data to extract useful, compact, interpretable and actionable knowledge that improves decision-making processes. Some efforts have already been devoted to addressing the scalability issues related to big data management and analytics. However, more efficient and novel systems and algorithms are still needed. Moreover, other important issues, such as data integration, data tailoring, data contextualization and interpretability, are still open research issues in the big data context.

This special issue focuses on the design, implementation and validation of novel database systems and data mining algorithms for big data management and analytics. The special issue covers the entire big data analytics pipeline: from data acquisition to knowledge extraction and exploitation.

The topics of interest include, but are not limited to:

  • NoSQL databases
  • Scalable and distributed frameworks for big data management and analytics
  • Scalable data mining and machine learning algorithms
  • Big data integration
  • Data tailoring for big data
  • Big data analytics and contextual information
  • Big data and interpretability
  • Big heterogeneous data (e.g., textual data, images, videos, social data)
  • Vector representations of text

Prof. Dr. Paolo Garza
Prof. Dr. Elisa Quintarelli
Prof. Dr. Luca Cagliero
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Big data acquisition and storage
  • NoSQL databases
  • Big data integration
  • Data tailoring
  • Scalable and distributed big data frameworks
  • Scalable big data mining algorithms
  • Scalable machine learning algorithms
  • Big data interpretability
  • Context-aware big data analytics
  • Heterogeneous data

Published Papers (4 papers)


Research

21 pages, 1707 KiB  
Article
K-MDTSC: K-Multi-Dimensional Time-Series Clustering Algorithm
by Danilo Giordano, Marco Mellia and Tania Cerquitelli
Electronics 2021, 10(10), 1166; https://doi.org/10.3390/electronics10101166 - 13 May 2021
Cited by 8 | Viewed by 3319
Abstract
The increasing capability to collect data gives us the possibility to collect a massive amount of heterogeneous data. Among the heterogeneous data available, time-series represent a mother lode of information yet to be fully explored. Current data mining techniques have several shortcomings when analyzing time-series, especially when more than one time-series, i.e., multi-dimensional time-series, should be analyzed together to extract knowledge from the data. In this context, we present K-MDTSC (K-Multi-Dimensional Time-Series Clustering), a novel clustering algorithm specifically designed to deal with multi-dimensional time-series. Firstly, we demonstrate K-MDTSC's capability to group multi-dimensional time-series using synthetic datasets. We compare K-MDTSC results with k-Shape, a state-of-the-art time-series clustering algorithm based on K-means. Our results show that both K-MDTSC and k-Shape create good clustering results. However, K-MDTSC outperforms k-Shape as the synthetic dataset becomes more complex. Secondly, we apply K-MDTSC in a real case scenario where we are asked to replace a scheduled maintenance with a predictive approach. To this end, we create a generalized pipeline to process data from a real industrial plant welding process. We apply K-MDTSC to create clusters of weldings based on their welding shape. Our results show that K-MDTSC identifies different welding profiles, but that the aging of the electrode does not negatively impact the welding process.
(This article belongs to the Special Issue Novel Database Systems and Data Mining Algorithms in the Big Data Era)

35 pages, 1404 KiB  
Article
J-CO: A Platform-Independent Framework for Managing Geo-Referenced JSON Data Sets
by Giuseppe Psaila and Paolo Fosci
Electronics 2021, 10(5), 621; https://doi.org/10.3390/electronics10050621 - 7 Mar 2021
Cited by 12 | Viewed by 2074
Abstract
Internet technology and mobile technology have enabled producing and diffusing massive data sets concerning almost every aspect of day-by-day life. Remarkable examples are social media and apps for volunteered information production, as well as Open Data portals on which public administrations publish authoritative and (often) geo-referenced data sets. In this context, JSON has become the most popular standard for representing and exchanging possibly geo-referenced data sets over the Internet. Analysts wishing to manage, integrate and cross-analyze such data sets need a framework that allows them to access possibly remote storage systems for JSON data sets, and to retrieve and query data sets by means of a unique query language (independent of the specific storage technology), by exploiting possibly remote computational resources (such as cloud servers), comfortably working on their PC in their office, more or less unaware of the real location of resources. In this paper, we present the current state of the J-CO Framework, a platform-independent and analyst-oriented software framework to manipulate and cross-analyze possibly geo-tagged JSON data sets. The paper presents the general approach behind the J-CO Framework, illustrating the query language by means of a simple, yet non-trivial, example of geographical cross-analysis. The paper also presents the novel features introduced by the re-engineered version of the execution engine and the most recent components, i.e., the storage service for large single JSON documents and the user interface that allows analysts to comfortably share data sets and computational resources with other analysts possibly working in different places around the globe. Finally, the paper reports the results of an experimental campaign, which show that the execution engine performs in a more than satisfactory way, proving that our framework can actually be used by analysts to process JSON data sets.

16 pages, 1689 KiB  
Article
CBase-EC: Achieving Optimal Throughput-Storage Efficiency Trade-Off Using Erasure Codes
by Chuqiao Xiao, Yefeng Xia, Qian Zhang, Xueqing Gong and Liyan Zhu
Electronics 2021, 10(2), 126; https://doi.org/10.3390/electronics10020126 - 8 Jan 2021
Cited by 1 | Viewed by 1809
Abstract
Many distributed database systems that guarantee high concurrency and scalability adopt a read-write separation architecture. Simultaneously, these systems need to store massive amounts of data daily, requiring different mechanisms for storing and accessing data, such as hot and cold data access strategies. Unlike distributed storage systems, the distributed database splits a table into sub-tables or shards, and the request frequency of each sub-table is not the same within a specific time. Therefore, it is necessary to design not only hot-to-cold approaches to reduce storage overhead, but also cold-to-hot methods to ensure high concurrency of those systems. We present a new redundancy strategy named CBase-EC, which uses erasure codes to trade off transaction processing performance against storage efficiency for CBase database systems developed for financial scenarios of the Bank. Two algorithms are proposed: the hot-cold tablets (shards) recognition algorithm and the hot-cold dynamic conversion algorithm. Then we adopt two optimization approaches to improve CBase-EC performance. In the experiment, we compare CBase-EC with three-replicas in CBase. The experimental results show that although the transaction processing performance declined by no more than 6%, the storage efficiency increased by 18.4%.

22 pages, 2497 KiB  
Article
A Cloud-to-Edge Approach to Support Predictive Analytics in Robotics Industry
by Simone Panicucci, Nikolaos Nikolakis, Tania Cerquitelli, Francesco Ventura, Stefano Proto, Enrico Macii, Sotiris Makris, David Bowden, Paul Becker, Niamh O’Mahony, Lucrezia Morabito, Chiara Napione, Angelo Marguglio, Guido Coppo and Salvatore Andolina
Electronics 2020, 9(3), 492; https://doi.org/10.3390/electronics9030492 - 16 Mar 2020
Cited by 25 | Viewed by 4420
Abstract
Data management and processing to enable predictive analytics in cyber physical systems hold the promise of creating insight into underlying processes, discovering anomalous behaviours and predicting imminent failures threatening a normal and smooth production process. In this context, proactive strategies can be adopted, as enabled by predictive analytics. Predictive analytics in turn can shift traditional maintenance approaches toward more effective ones, optimising their cost and transforming maintenance from a necessary evil to a strategic business factor. Empowered by the aforementioned points, this paper discusses a novel methodology for remaining useful life (RUL) estimation enabling predictive maintenance of industrial equipment using partial knowledge of its degradation function and the parameters that affect it. Moreover, the design and prototype implementation of a plug-n-play end-to-end cloud architecture supporting predictive maintenance of industrial equipment is presented, integrating the aforementioned concept as a service. This is achieved by integrating edge gateways, data stores at both the edge and the cloud, and various applications, such as predictive analytics, visualization and scheduling, integrated as services in the cloud system. The proposed approach has been implemented into a prototype and tested in an industrial use case related to the maintenance of a robotic arm. Obtained results show the effectiveness and the efficiency of the proposed methodology in supporting predictive analytics in the era of Industry 4.0.
