Open Access Article
ISPRS Int. J. Geo-Inf. 2016, 5(10), 173;

Automatic Scaling Hadoop in the Cloud for Efficient Process of Big Geospatial Data

Department of Geography, University of South Carolina, Columbia, SC 29208, USA
Spatiotemporal Innovation Center, George Mason University, Fairfax, VA 22030, USA
Yunnan Provincial Geomatics Center, Kunming 650034, China
Authors to whom correspondence should be addressed.
Academic Editor: Wolfgang Kainz
Received: 8 August 2016 / Revised: 16 September 2016 / Accepted: 20 September 2016 / Published: 27 September 2016

Efficient processing of big geospatial data is crucial for tackling global and regional challenges such as climate change and natural disasters, but it is challenging not only because of the massive data volume but also because of the intrinsic complexity and high dimensionality of geospatial datasets. While traditional computing infrastructure does not scale well with rapidly increasing data volumes, Hadoop has attracted increasing attention in the geoscience community for handling big geospatial data. Recently, many studies have investigated adopting Hadoop to process big geospatial data, but how to adjust computing resources to efficiently handle dynamic geoprocessing workloads has barely been explored. To bridge this gap, we propose a novel framework that automatically scales a Hadoop cluster in the cloud to allocate the right amount of computing resources for the current geoprocessing workload. The framework and auto-scaling algorithms are introduced, and a prototype system was developed to demonstrate the feasibility and efficiency of the proposed scaling mechanism, using Digital Elevation Model (DEM) interpolation as an example. Experimental results show that this auto-scaling framework could (1) significantly reduce computing resource utilization (by 80% in our example) while delivering performance similar to a full-powered cluster; and (2) effectively handle spikes in the processing workload by automatically adding computing resources to ensure processing finishes within an acceptable time. Such an auto-scaling approach provides a valuable reference for optimizing the performance of geospatial applications to address data- and computational-intensity challenges in GIScience in a more cost-efficient manner.
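The scaling behavior the abstract describes, adding Hadoop worker nodes when the geoprocessing workload spikes and releasing them when the queue drains, can be sketched as a simple sizing rule. The function name, thresholds, and per-node capacity below are illustrative assumptions, not the authors' published algorithm:

```python
def decide_scaling(pending_tasks, min_nodes=2, max_nodes=20, tasks_per_node=4):
    """Return a target Hadoop cluster size for the current workload.

    Hypothetical sketch: pending_tasks is the number of geoprocessing
    tasks waiting in the job queue; tasks_per_node is an assumed
    per-node processing capacity. Real systems would also weigh node
    startup latency and billing granularity.
    """
    # Nodes needed so each handles at most `tasks_per_node` tasks
    # (ceiling division without importing math).
    needed = -(-pending_tasks // tasks_per_node)
    # Clamp to the cluster's allowed size range so the cluster never
    # shrinks below a usable minimum or exceeds the cloud quota.
    return max(min_nodes, min(max_nodes, needed))
```

Under these assumptions, a spike of 40 queued tasks would grow the cluster to 10 nodes, an empty queue would shrink it back to the 2-node floor, and very large spikes would be capped at the 20-node ceiling.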
Keywords: geoprocessing; cloud computing; big data; geospatial cyberinfrastructure; Hadoop

This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

MDPI and ACS Style

Li, Z.; Yang, C.; Liu, K.; Hu, F.; Jin, B. Automatic Scaling Hadoop in the Cloud for Efficient Process of Big Geospatial Data. ISPRS Int. J. Geo-Inf. 2016, 5, 173.

ISPRS Int. J. Geo-Inf. EISSN 2220-9964. Published by MDPI AG, Basel, Switzerland.