Open Access Article
Informatics 2017, 4(3), 24; https://doi.org/10.3390/informatics4030024

Big Data Management with Incremental K-Means Trees–GPU-Accelerated Construction and Visualization

1 Visual Analytics and Imaging Lab, Computer Science Department, Stony Brook University, Stony Brook, NY 11794, USA
2 Chemical and Material Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
3 Imre Consulting, Richland, WA 99352, USA
* Author to whom correspondence should be addressed.
Academic Editors: Achim Ebert and Gunther H. Weber
Received: 1 June 2017 / Revised: 25 July 2017 / Accepted: 26 July 2017 / Published: 28 July 2017
(This article belongs to the Special Issue Scalable Interactive Visualization)
Abstract

While big data is revolutionizing scientific research, the tasks of data management and analytics are becoming more challenging than ever. One way to ease the difficulty is to recover the multilevel hierarchy embedded in the data. Knowing the hierarchy not only reveals the nature of the data, but is also often the first step in big data analytics. However, current algorithms for learning the hierarchy typically do not scale to large volumes of high-dimensional data. To tackle this challenge, we propose a new scalable approach for constructing the tree structure from data. Our method builds the tree in a bottom-up manner with an adapted incremental k-means. By referencing the distribution of point distances, one can flexibly control the height of the tree and the branching of each node. Dimension reduction is also conducted as a pre-processing step to further boost computing efficiency. The algorithm has a parallel design and is implemented with CUDA (Compute Unified Device Architecture), so that it can be efficiently applied to big data. We test the algorithm with two real-world datasets, and the results are visualized with extended circular dendrograms and other visualization techniques.
Keywords: data management; hierarchy construction; parallel computing; visualization
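
The bottom-up construction summarized in the abstract can be illustrated with a small CPU-only sketch. It is not the authors' CUDA implementation, and the abstract does not give the exact assignment or update rules. The sketch assumes a threshold-based incremental k-means: a point joins its nearest center if it lies within a distance threshold, and otherwise seeds a new cluster. It also assumes one hand-picked threshold per tree level, whereas the paper derives its control parameters from the distribution of point distances. The names `incremental_kmeans`, `build_tree`, and `thresholds` are hypothetical.

```python
import math


def incremental_kmeans(points, threshold):
    """One pass of threshold-based incremental k-means (assumed rule).

    A point joins its nearest center when the Euclidean distance is within
    `threshold`; otherwise it becomes the seed of a new cluster.
    Returns (centers, labels), where labels[i] is the cluster of points[i].
    """
    centers, counts, labels = [], [], []
    for p in points:
        best, best_dist = -1, math.inf
        for i, c in enumerate(centers):
            d = math.dist(p, c)
            if d < best_dist:
                best, best_dist = i, d
        if best >= 0 and best_dist <= threshold:
            counts[best] += 1
            n = counts[best]
            # Running-mean update of the winning center.
            centers[best] = [ci + (pi - ci) / n
                             for ci, pi in zip(centers[best], p)]
            labels.append(best)
        else:
            centers.append(list(p))
            counts.append(1)
            labels.append(len(centers) - 1)
    return centers, labels


def build_tree(points, thresholds):
    """Build the tree bottom-up: cluster the points, then cluster the centers.

    Returns one (centers, labels) pair per level; labels[i] is the parent
    index of node i from the level below. Larger thresholds merge more
    nodes, so the threshold sequence controls height and branching.
    """
    levels = []
    current = [list(p) for p in points]
    for t in thresholds:
        centers, labels = incremental_kmeans(current, t)
        levels.append((centers, labels))
        current = centers
        if len(centers) == 1:  # a single root has been reached
            break
    return levels


# Toy data (illustrative only): two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
levels = build_tree(points, thresholds=[0.5, 8.0, 20.0])
for depth, (centers, labels) in enumerate(levels):
    print(depth, len(centers), labels)
```

In this sketch the result depends on the order in which points arrive, which is typical of single-pass incremental clustering. A GPU version would mainly parallelize the nearest-center search.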
This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Cite This Article

MDPI and ACS Style

Wang, J.; Zelenyuk, A.; Imre, D.; Mueller, K. Big Data Management with Incremental K-Means Trees–GPU-Accelerated Construction and Visualization. Informatics 2017, 4, 24.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Informatics, EISSN 2227-9709, published by MDPI AG, Basel, Switzerland.