Evaluating the Open Source Data Containers for Handling Big Geospatial Raster Data
Abstract: Big geospatial raster data pose a grand challenge to data management technologies for effective big data query and processing. To address this challenge, various big data container solutions have been developed or enhanced to facilitate data storage, retrieval, and analysis. Data containers were also developed or enhanced to handle geospatial data. For example, Rasdaman was developed to handle raster data, and GeoSpark/SpatialHadoop were enhanced from Spark/Hadoop to handle vector data. However, few studies have systematically compared and evaluated the features and performance of these popular data containers. This paper provides a comprehensive evaluation of six popular data containers (i.e., Rasdaman, SciDB, Spark, ClimateSpark, Hive, and MongoDB) for handling multi-dimensional, array-based geospatial raster datasets. Their architectures, technologies, capabilities, and performance are compared and evaluated from two perspectives: (a) system design and architecture (distributed architecture, logical data model, physical data model, and data operations); and (b) practical use experience and performance (data preprocessing, data uploading, query speed, and resource consumption). Four major conclusions are offered: (1) no data container except ClimateSpark has good support for the HDF data format used in this paper, so the others require time- and resource-consuming data preprocessing to load data; (2) SciDB, Rasdaman, and MongoDB handle queries over small to medium data volumes well, whereas Spark and ClimateSpark can handle large volumes of data with stable resource consumption; (3) SciDB and Rasdaman provide mature array-based data operations and analytical functions, while the others lack these functions; and (4) SciDB, Spark, and Hive have better support for user-defined functions (UDFs) to extend the system capability.
Cite This Article
Hu, F.; Xu, M.; Yang, J.; Liang, Y.; Cui, K.; Little, M.M.; Lynnes, C.S.; Duffy, D.Q.; Yang, C. Evaluating the Open Source Data Containers for Handling Big Geospatial Raster Data. ISPRS Int. J. Geo-Inf. 2018, 7, 144.
Hu F, Xu M, Yang J, Liang Y, Cui K, Little MM, Lynnes CS, Duffy DQ, Yang C. Evaluating the Open Source Data Containers for Handling Big Geospatial Raster Data. ISPRS International Journal of Geo-Information. 2018; 7(4):144.
Chicago/Turabian Style
Hu, Fei; Xu, Mengchao; Yang, Jingchao; Liang, Yanshou; Cui, Kejin; Little, Michael M.; Lynnes, Christopher S.; Duffy, Daniel Q.; Yang, Chaowei. 2018. "Evaluating the Open Source Data Containers for Handling Big Geospatial Raster Data." ISPRS Int. J. Geo-Inf. 7, no. 4: 144.
Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.