Open Access, Feature Paper, Article
Symmetry 2018, 10(8), 342; https://doi.org/10.3390/sym10080342

A Robust Distributed Big Data Clustering-based on Adaptive Density Partitioning using Apache Spark

1
Electrical and Computer Engineering Department, Semnan University, Semnan 35131-1911, Iran
2
Faculty of Electrical and Computer Engineering Department, Semnan University, Semnan 35131-1911, Iran
*
Author to whom correspondence should be addressed.
Received: 12 July 2018 / Revised: 28 July 2018 / Accepted: 13 August 2018 / Published: 15 August 2018
(This article belongs to the Special Issue Emerging Approaches and Advances in Big Data)

Abstract

Unsupervised machine learning and knowledge discovery from large-scale datasets have recently attracted considerable research interest. This paper proposes a distributed big data clustering approach based on adaptive density estimation. The method is built on the Apache Spark framework and tested on several widely used datasets. In the first step, the input data is divided into partitions using a Bayesian type of Locality Sensitive Hashing (LSH). Partitioning makes the processing fully parallel and much simpler by avoiding unnecessary calculations. Each step of the algorithm is completely independent of the others, and no serial bottleneck exists anywhere in the clustering procedure. Locality preservation also filters out outliers and enhances the robustness of the approach. Density is defined on the basis of the Ordered Weighted Averaging (OWA) distance, which makes clusters more homogeneous. Local density peaks are detected adaptively according to the density of each node. Merging the local peaks yields the final cluster centers, and every other data point is assigned to the cluster with the nearest center. The method has been implemented and compared with similar, recently published work. The cluster validity indexes it achieves show its superiority in precision and noise robustness over recent approaches. The comparison also shows its advantages in scalability, performance, and computation cost. The proposed method is a general clustering approach, and gene expression clustering is presented as a sample application.
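The pipeline outlined in the abstract (partition, estimate density, pick local peaks, merge, assign) can be illustrated with a minimal single-machine sketch. It is not the authors' implementation: plain random-hyperplane LSH stands in for Bayesian LSH, a Gaussian kernel with a fixed `cutoff` stands in for the adaptive density estimate, outlier filtering is omitted, and nothing runs on Spark. All function names and the parameters `weights`, `cutoff`, and `merge_radius` are hypothetical.

```python
import math
import random
from collections import defaultdict

def owa_distance(a, b, weights):
    """OWA distance: weights apply to per-dimension gaps sorted largest first."""
    gaps = sorted((abs(x - y) for x, y in zip(a, b)), reverse=True)
    return sum(w * g for w, g in zip(weights, gaps))

def random_planes(dim, n_planes, seed=0):
    """Draw hyperplane normals through the origin (assumed stand-in for Bayesian LSH)."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]

def lsh_partition(points, planes):
    """Group point indices by the sign pattern of their projections on the planes."""
    buckets = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(sum(c * x for c, x in zip(plane, p)) >= 0 for plane in planes)
        buckets[key].append(idx)
    return list(buckets.values())

def local_peak(points, bucket, weights, cutoff):
    """Return the index of the densest point in one bucket (Gaussian kernel density)."""
    def density(i):
        return sum(
            math.exp(-(owa_distance(points[i], points[j], weights) / cutoff) ** 2)
            for j in bucket if j != i
        )
    return max(bucket, key=density)

def merge_peaks(points, peaks, weights, merge_radius):
    """Greedily keep a peak only if it lies beyond merge_radius of every kept peak."""
    centers = []
    for i in peaks:
        if all(owa_distance(points[i], points[c], weights) > merge_radius
               for c in centers):
            centers.append(i)
    return centers

def assign(points, centers, weights):
    """Label each point with the position of its nearest center in `centers`."""
    return [
        min(range(len(centers)),
            key=lambda k: owa_distance(p, points[centers[k]], weights))
        for p in points
    ]

def cluster(points, weights, n_planes=2, cutoff=1.0, merge_radius=2.0, seed=0):
    """Run the whole sketch; each bucket could be processed independently."""
    planes = random_planes(len(points[0]), n_planes, seed)
    buckets = lsh_partition(points, planes)
    peaks = [local_peak(points, b, weights, cutoff) for b in buckets]
    centers = merge_peaks(points, peaks, weights, merge_radius)
    return centers, assign(points, centers, weights)
```

In this sketch the per-bucket work in `local_peak` touches only the points of one bucket, which is the property that would let a distributed framework process buckets in parallel. With `weights = [1, 0, ...]` the OWA distance reduces to the largest per-dimension gap, and with equal weights it is a scaled Manhattan distance, so the weight vector controls how strongly the largest coordinate differences dominate.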
Keywords: distributed data clustering; big data; density-based clustering; density peak detection; gene expression; Apache Spark; Bayesian locality sensitive hashing; ordered weighted averaging; micro array; scalable clustering
This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
MDPI and ACS Style

Hosseini, B.; Kiani, K. A Robust Distributed Big Data Clustering-based on Adaptive Density Partitioning using Apache Spark. Symmetry 2018, 10, 342.

Symmetry, EISSN 2073-8994, published by MDPI AG, Basel, Switzerland.