Open Access Article
Remote Sens. 2017, 9(12), 1301; doi:10.3390/rs9121301

Research on the Parallelization of the DBSCAN Clustering Algorithm for Spatial Data Mining Based on the Spark Platform

1 School of Resources & Environment, University of Electronic Science and Technology of China, 2006 Xiyuan Ave., West Hi-Tech Zone, Chengdu 611731, China
2 Institute of Remote Sensing Big Data, Big Data Research Center, University of Electronic Science and Technology of China, 2006 Xiyuan Road, West Hi-Tech Zone, Chengdu 611731, China
3 Texas A&M Engineering Experiment Station and High Performance Research Computing, Texas A&M University, College Station, TX 77843, USA
4 Key Laboratory of Spatial Data Mining & Information Sharing of Ministry of Education, Fuzhou University, No. 2 Xueyuan Road, Fuzhou University New District, Fuzhou 350116, China
5 International School of Software, Wuhan University, 129 Luoyu Road, Wuhan 430079, China
6 School of Computer Science, China University of Geosciences, Wuhan 430074, China
7 Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 10094, China
* Authors to whom correspondence should be addressed.
Received: 27 October 2017 / Revised: 1 December 2017 / Accepted: 8 December 2017 / Published: 12 December 2017

Abstract

Density-based spatial clustering of applications with noise (DBSCAN) is a clustering algorithm that can discover clusters of arbitrary shape, effectively distinguishes noise points, and naturally supports spatial databases. It has been widely used in spatial data mining. This paper studies the design and implementation of a parallel DBSCAN algorithm on the Spark platform. It addresses the following problems that arise when processing large-scale data: the heavy computational load of the single-node algorithm, the low resource utilization of the multi-node algorithm, the long running time, and the lack of real-time capability. The experimental results indicate that the proposed parallel design achieves a more stable speedup as the scale of the spatial data increases.
Keywords: spatial data mining; DBSCAN algorithm; parallel computing; Spark platform; traffic congestion area discovery
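
For readers unfamiliar with DBSCAN, the following is a minimal single-node sketch of the classic algorithm in plain Python. It is not the parallel Spark design studied in the paper, and the names `dbscan`, `region_query`, `eps` and `min_pts` are illustrative assumptions rather than identifiers taken from the article. It shows the two properties the abstract mentions: clusters grow by density reachability, so they can take any shape, and points in sparse regions are labelled as noise.

```python
from math import dist

NOISE = -1  # label assumed here for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (i included)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            # Not a core point; may be relabelled later as a border point.
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is a core point: keep growing
    return labels
```

Each call to `region_query` scans every point, so this sketch needs a quadratic number of distance computations. That cost on a single node is the kind of load that motivates distributing the work across a cluster.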

This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Citation (MDPI and ACS Style)

Huang, F.; Zhu, Q.; Zhou, J.; Tao, J.; Zhou, X.; Jin, D.; Tan, X.; Wang, L. Research on the Parallelization of the DBSCAN Clustering Algorithm for Spatial Data Mining Based on the Spark Platform. Remote Sens. 2017, 9, 1301.

Remote Sens. EISSN 2072-4292. Published by MDPI AG, Basel, Switzerland.