Open Access Article

Automatic Scaling Hadoop in the Cloud for Efficient Process of Big Geospatial Data

Department of Geography, University of South Carolina, Columbia, SC 29208, USA
Spatiotemporal Innovation Center, George Mason University, Fairfax, VA 22030, USA
Yunnan Provincial Geomatics Center, Kunming 650034, China
Authors to whom correspondence should be addressed.
Academic Editor: Wolfgang Kainz
ISPRS Int. J. Geo-Inf. 2016, 5(10), 173;
Received: 8 August 2016 / Revised: 16 September 2016 / Accepted: 20 September 2016 / Published: 27 September 2016
Efficient processing of big geospatial data is crucial for tackling global and regional challenges such as climate change and natural disasters, but it is challenging not only because of the massive data volume but also because of the intrinsic complexity and high dimensionality of geospatial datasets. Because traditional computing infrastructure does not scale well with the rapidly increasing data volume, Hadoop has attracted increasing attention in geoscience communities for handling big geospatial data. Many recent studies have investigated adopting Hadoop for processing big geospatial data, but how to adjust computing resources to efficiently handle a dynamic geoprocessing workload has barely been explored. To bridge this gap, we propose a novel framework that automatically scales a Hadoop cluster in the cloud environment to allocate the right amount of computing resources based on the dynamic geoprocessing workload. The framework and auto-scaling algorithms are introduced, and a prototype system was developed to demonstrate the feasibility and efficiency of the proposed scaling mechanism, using Digital Elevation Model (DEM) interpolation as an example. Experimental results show that this auto-scaling framework could (1) significantly reduce computing resource utilization (by 80% in our example) while delivering performance similar to that of a full-powered cluster; and (2) effectively handle spikes in processing workload by automatically increasing the computing resources to ensure that processing finishes within an acceptable time. Such an auto-scaling approach provides a valuable reference for optimizing the performance of geospatial applications to address data- and computational-intensity challenges in GIScience in a more cost-efficient manner.
Keywords: geoprocessing; cloud computing; big data; geospatial cyberinfrastructure; Hadoop
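
The abstract describes allocating computing resources according to the current geoprocessing workload, but it does not give the scaling rule itself. The sketch below is therefore only an illustration of the general idea, not the authors' algorithm: it assumes a simple deadline-based rule, and every name and parameter in it (`target_workers`, `scaling_action`, `tasks_per_worker_per_min`, `deadline_min`, the worker bounds) is hypothetical.

```python
import math


def target_workers(pending_tasks, tasks_per_worker_per_min, deadline_min,
                   min_workers=1, max_workers=10):
    """Estimate how many worker nodes are needed to clear the pending
    tasks within the deadline, clamped to the allowed cluster size.

    Hypothetical rule for illustration; not taken from the paper.
    """
    if pending_tasks <= 0:
        # Idle cluster: shrink to the minimum footprint.
        return min_workers
    # Work one node can complete before the deadline.
    capacity_per_worker = tasks_per_worker_per_min * deadline_min
    needed = math.ceil(pending_tasks / capacity_per_worker)
    return max(min_workers, min(max_workers, needed))


def scaling_action(current_workers, target):
    """Translate a target size into an (action, node_count) decision."""
    delta = target - current_workers
    if delta > 0:
        return ("add", delta)
    if delta < 0:
        return ("remove", -delta)
    return ("hold", 0)


if __name__ == "__main__":
    # A workload spike: 50 queued tasks, 2 tasks/min per node, 10 min deadline.
    target = target_workers(50, 2.0, 10)
    print(target, scaling_action(1, target))
```

Under these assumptions, an idle cluster falls back to the minimum number of nodes, while a spike in queued tasks raises the target until the upper bound is reached. A real deployment would also need to account for node start-up time and for safely decommissioning Hadoop nodes that hold data.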
MDPI and ACS Style

Li, Z.; Yang, C.; Liu, K.; Hu, F.; Jin, B. Automatic Scaling Hadoop in the Cloud for Efficient Process of Big Geospatial Data. ISPRS Int. J. Geo-Inf. 2016, 5, 173.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
