Large Scale Geospatial Data Management, Processing and Mining

A special issue of ISPRS International Journal of Geo-Information (ISSN 2220-9964).

Deadline for manuscript submissions: closed (30 September 2021) | Viewed by 22031

Special Issue Editors


Guest Editor
Intelligent Technologies Research Center (CiTIUS), University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
Interests: spatial databases; sensor data; environmental data infrastructures; scientific data management; smart cities; GIS; approximate processing; spatial query processing for ML

Guest Editor
Intelligent Technologies Research Center (CiTIUS), University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
Interests: sensor data; scientific data management; data mining; smart cities; industry 4.0; GIS

Guest Editor
Escuela Politécnica, Universidad de Extremadura, 10004 Cáceres, Spain
Interests: accuracy data; circular and spherical analysis; fusion data; GIS; geoprocessing; modelling & analysis spatial data; smart cities; remote sensing

Special Issue Information

Dear Colleagues,

The amount of data handled by geospatial and environmental applications has traditionally been very large. Nowadays, however, the data deluge generated by the increasing number of Earth observation networks and modeling infrastructures is accompanied by a huge amount of information collected through Volunteered Geographic Information (VGI) platforms, mobile crowdsensing, and social media. The range of applications that may benefit from these data is also wide, including urban planning and management in smart cities, smart transportation systems (including autonomous vehicles), environmental monitoring and modeling, risk assessment and management, smart agriculture, cattle raising, fishing and aquaculture, smart energy production and management, cultural heritage, tourism, smart health (geospatial distribution of diseases), active and healthy ageing, and many others. Apart from the problems caused by the sheer volume of the data, it is also important to bear in mind that the data are highly heterogeneous in both format and semantics, comprising structured geospatial features, coverages, and graphs as well as unstructured geospatial images, point clouds, and text documents.


This special issue explores the main research challenges that arise during the management, processing, and mining of such very large geospatial datasets. In particular, the topics covered include (but are not limited to) geospatial semantic data integration, advanced geospatial data fusion, on-line analytical processing over geospatial data lakes, geospatial data mining, HPC for geoprocessing, raster data management and mining, geospatial mobile computing, and geospatial natural language processing.


Prof. Dr. José R.R. Viqueira
Prof. Dr. José M. Cotos
Prof. Dr. Aurora Cuartero
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. ISPRS International Journal of Geo-Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Spatial Data Management
  • Spatial Data Mining
  • Large Scale Geospatial Processing
  • Geospatial Data Lakes
  • Geospatial Analytics
  • Raster Data Management
  • Raster Data Mining
  • Geospatial NLP
  • Geospatial Data Fusion
  • Accuracy and Reliability of Data

Published Papers (7 papers)


Research

21 pages, 40150 KiB  
Article
Development of Big Data-Analysis Pipeline for Mobile Phone Data with Mobipack and Spatial Enhancement
by Apichon Witayangkurn, Ayumi Arai and Ryosuke Shibasaki
ISPRS Int. J. Geo-Inf. 2022, 11(3), 196; https://doi.org/10.3390/ijgi11030196 - 15 Mar 2022
Viewed by 3608
Abstract
Frequent and granular population data are essential for decision making. Furthermore, for progress monitoring towards achieving the sustainable development goals (SDGs), data availability at global scales as well as at different disaggregated levels is required. The high population coverage of mobile cellular signals has been accelerating the generation of large-scale spatiotemporal data such as call detail record (CDR) data. This has enabled resource-scarce countries to collect digital footprints at scales and resolutions that would otherwise be impossible to achieve solely through traditional surveys. However, using such data requires multiple processes, algorithms, and considerable effort. This paper proposes a big data-analysis pipeline built exclusively on an open-source framework with our spatial enhancement library and a proposed open-source mobility analysis package called Mobipack. Mobipack consists of useful modules for mobility analysis, including data anonymization, origin–destination extraction, trip extraction, zone analysis, route interpolation, and a set of mobility indicators. Several implemented use cases are presented to demonstrate the advantages and usefulness of the proposed system. In addition, we explain how a large-scale data platform that requires efficient resource allocation can be constructed for managing data as well as how it can be used and maintained in a sustainable manner. The platform can further help to enhance the capacity of CDR data analysis, which usually requires a specific skill set and is time-consuming to implement from scratch. The proposed system is suited for baseline processing and the effective handling of CDR data; thus, it allows for improved support and on-time preparation.
(This article belongs to the Special Issue Large Scale Geospatial Data Management, Processing and Mining)

18 pages, 3458 KiB  
Article
Spatial Data Sequence Selection Based on a User-Defined Condition Using GPGPU
by Driss En-Nejjary, François Pinet and Myoung-Ah Kang
ISPRS Int. J. Geo-Inf. 2021, 10(12), 816; https://doi.org/10.3390/ijgi10120816 - 2 Dec 2021
Viewed by 2097
Abstract
The size of spatial data is growing intensively due to the emergence of and the tremendous advances in technology such as sensors and the internet of things. Supporting high-performance queries on this large volume of data becomes essential in several data- and compute-intensive applications. Unfortunately, most of the existing methods and approaches are based on a traditional computing framework (uniprocessors) which makes them not scalable and not adequate to deal with large-scale data. In this work, we present a high-performance query for massive spatio–temporal data. The query consists of selecting fixed size raster subsequences, based on the average of their region of interest, from a spatio–temporal raster sequence satisfying a user threshold condition. In our paper, for the purpose of simplification, we consider that the region of interest is the entire raster and not only a subregion. Our aim is to speed up the execution using parallel primitives and pure CUDA. Furthermore, we propose a new method based on a sorting step to save computations and boost the speed of the query execution. The test results show that the proposed methods are faster and good performance is achieved even with large-scale rasters and data.
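As a rough illustration of the kind of query this abstract describes, the following sequential Python sketch selects fixed-size subsequences from a raster sequence. It is not the authors' CUDA implementation. It assumes one possible reading of the condition: a window qualifies when the mean of every raster in it (with the whole raster as the region of interest) exceeds the user threshold. The names `raster_mean` and `select_subsequences` and the sample data are hypothetical.

```python
def raster_mean(raster):
    """Mean of all cells in a 2D raster given as a list of rows."""
    cells = [value for row in raster for value in row]
    return sum(cells) / len(cells)


def select_subsequences(rasters, length, threshold):
    """Return the start index of every window of `length` consecutive
    rasters whose per-raster means all exceed `threshold`.

    Assumption: the threshold condition applies to each raster's mean.
    """
    qualifies = [raster_mean(r) > threshold for r in rasters]
    starts = []
    run = 0  # number of consecutive qualifying rasters ending at index i
    for i, flag in enumerate(qualifies):
        run = run + 1 if flag else 0
        if run >= length:
            starts.append(i - length + 1)
    return starts


# Hypothetical sequence of five 2x2 rasters.
sequence = [
    [[1, 1], [1, 1]],  # mean 1.0
    [[5, 5], [5, 5]],  # mean 5.0
    [[6, 4], [5, 5]],  # mean 5.0
    [[7, 7], [7, 7]],  # mean 7.0
    [[0, 0], [0, 0]],  # mean 0.0
]
windows = select_subsequences(sequence, length=2, threshold=4.0)
```

The per-raster means are independent of one another, which is the part a GPU implementation would naturally compute in parallel; the single pass over the boolean flags is kept sequential here for readability.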

31 pages, 3788 KiB  
Article
Efficient Group K Nearest-Neighbor Spatial Query Processing in Apache Spark
by Panagiotis Moutafis, George Mavrommatis, Michael Vassilakopoulos and Antonio Corral
ISPRS Int. J. Geo-Inf. 2021, 10(11), 763; https://doi.org/10.3390/ijgi10110763 - 11 Nov 2021
Cited by 4 | Viewed by 2130
Abstract
Aiming at the problem of spatial query processing in distributed computing systems, the design and implementation of new distributed spatial query algorithms is a current challenge. Apache Spark is a memory-based framework suitable for real-time and batch processing. Spark-based systems allow users to work on distributed in-memory data, without worrying about the data distribution mechanism and fault-tolerance. Given two datasets of points (called Query and Training), the group K nearest-neighbor (GKNN) query retrieves the K points of the Training dataset with the smallest sum of distances to every point of the Query dataset. This spatial query has been actively studied in centralized environments, and several performance-improving techniques and pruning heuristics have also been proposed, while a distributed algorithm in Apache Hadoop was recently proposed by our team. Since, in general, Apache Hadoop exhibits lower performance than Spark, in this paper, we present the first distributed GKNN query algorithm in Apache Spark and compare it against the one in Apache Hadoop. This algorithm incorporates programming features and facilities that are specific to Apache Spark. Moreover, techniques that improve performance and are applicable in Apache Spark are also incorporated. The results of an extensive set of experiments with real-world spatial datasets are presented, demonstrating that our Apache Spark GKNN solution, with its improvements, is efficient and a clear winner in comparison to processing this query in Apache Hadoop.
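The GKNN definition given in this abstract can be made concrete with a small brute-force sketch in Python. This is only a centralized baseline for illustration, not the distributed Spark algorithm the paper presents, and it uses no pruning heuristics. The function name `group_knn`, the use of Euclidean distance, and the sample points are assumptions.

```python
import heapq
import math


def group_knn(query, training, k):
    """Return the k training points with the smallest sum of
    Euclidean distances to all query points (brute force)."""

    def total_distance(point):
        # Sum of distances from this training point to every query point.
        return sum(math.dist(point, q) for q in query)

    return heapq.nsmallest(k, training, key=total_distance)


# Hypothetical 2D datasets.
query = [(0.0, 0.0), (2.0, 0.0)]
training = [(1.0, 0.0), (1.0, 1.0), (5.0, 5.0), (-3.0, 0.0)]
nearest = group_knn(query, training, k=2)
```

The cost of this baseline is proportional to the product of the two dataset sizes, which is the kind of workload that motivates pruning and distributed execution for large inputs.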

11 pages, 13870 KiB  
Communication
Performance Evaluation of Parallel Structure from Motion (SfM) Processing with Public Cloud Computing and an On-Premise Cluster System for UAS Images in Agriculture
by Anjin Chang, Jinha Jung, Jose Landivar, Juan Landivar, Bryan Barker and Rajib Ghosh
ISPRS Int. J. Geo-Inf. 2021, 10(10), 677; https://doi.org/10.3390/ijgi10100677 - 7 Oct 2021
Cited by 3 | Viewed by 2290
Abstract
Thanks to sensor developments, unmanned aircraft systems (UAS) are the most promising modern technologies used to collect imagery datasets that can be utilized to develop agricultural applications these days. UAS imagery datasets can grow exponentially due to the ultrafine spatial and high temporal resolution capabilities of UAS and sensors. One of the main obstacles to processing UAS data is the intensive computational resource requirements. Structure from motion (SfM) is the most popular algorithm to generate 3D point clouds, orthomosaic images, and digital elevation models (DEMs) in agricultural applications. Recently, the SfM algorithm has been implemented in parallel computing to process big UAS data faster for certain applications. This study evaluated the performance of parallel SfM processing on public cloud computing and on-premise cluster systems. The UAS datasets collected over cropping fields were used for performance evaluation. We used multiple computing nodes and centralized network storage with different network environments for the SfM workflow. In single-node processing, an instance with the most computing power in the cloud computing system performed approximately 20 and 35 percent faster than the most powerful machine in the on-premise cluster. The parallel processing results showed that the cloud-based system performed better in speed-up and efficiency metrics for scalability, although the absolute processing time was faster in the on-premise cluster. The experimental results also showed that the public cloud computing system could be a good alternative computing environment in UAS data processing for agricultural applications.
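The speed-up and efficiency metrics mentioned in this abstract are commonly defined as the single-node time divided by the parallel time, and that ratio divided by the number of nodes. The Python sketch below assumes these standard definitions, which the abstract itself does not spell out. All timings are hypothetical and are not taken from the paper; they only show how one system can scale better while the other remains faster in absolute terms.

```python
def speedup(t_single, t_parallel):
    """Speed-up: single-node time divided by parallel time."""
    return t_single / t_parallel


def efficiency(t_single, t_parallel, n_nodes):
    """Efficiency: speed-up divided by the number of nodes used."""
    return speedup(t_single, t_parallel) / n_nodes


# Hypothetical processing times in minutes (not from the paper).
cluster_1_node, cluster_4_nodes = 100.0, 40.0
cloud_1_node, cloud_4_nodes = 120.0, 42.0

cluster_speedup = speedup(cluster_1_node, cluster_4_nodes)       # 2.5
cloud_speedup = speedup(cloud_1_node, cloud_4_nodes)             # about 2.86
cluster_eff = efficiency(cluster_1_node, cluster_4_nodes, 4)     # 0.625
cloud_eff = efficiency(cloud_1_node, cloud_4_nodes, 4)           # about 0.71
```

With these made-up numbers the cloud system has the higher speed-up and efficiency, yet the cluster still finishes the four-node run sooner (40 versus 42 minutes).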

16 pages, 4104 KiB  
Article
MDST-DBSCAN: A Density-Based Clustering Method for Multidimensional Spatiotemporal Data
by Changlock Choi and Seong-Yun Hong
ISPRS Int. J. Geo-Inf. 2021, 10(6), 391; https://doi.org/10.3390/ijgi10060391 - 6 Jun 2021
Cited by 12 | Viewed by 4329
Abstract
The increasing use of mobile devices and the growing popularity of location-based services have generated massive spatiotemporal data over the last several years. While these data provide new opportunities to enhance our understanding of various urban dynamics, they also pose challenges due to the complex structure and large volume of spatiotemporal data. To facilitate the processing and analysis of such spatiotemporal data, various data mining and clustering methods have been proposed, but there is still a need for a more flexible and computationally efficient method. The purpose of this paper is to present a clustering method that can work with large-scale, multidimensional spatiotemporal data in a reliable and efficient manner. The proposed method, called MDST-DBSCAN, is applied to idealized patterns and a real data set, and the results from both examples demonstrate that it can identify clusters accurately within a reasonable amount of time. MDST-DBSCAN performs well on both spatial and spatiotemporal data, and it can be particularly useful for exploring massive spatiotemporal data, such as detailed real estate transactions data in Seoul, Korea.

23 pages, 2519 KiB  
Article
Machine Learning Methods Applied to the Prediction of Pseudo-nitzschia spp. Blooms in the Galician Rias Baixas (NW Spain)
by Francisco M. Bellas Aláez, Jesus M. Torres Palenzuela, Evangelos Spyrakos and Luis González Vilas
ISPRS Int. J. Geo-Inf. 2021, 10(4), 199; https://doi.org/10.3390/ijgi10040199 - 25 Mar 2021
Cited by 4 | Viewed by 2822
Abstract
This work presents new prediction models based on recent developments in machine learning methods, such as Random Forest (RF) and AdaBoost, and compares them with more classical approaches, i.e., support vector machines (SVMs) and neural networks (NNs). The models predict Pseudo-nitzschia spp. blooms in the Galician Rias Baixas. This work builds on a previous study by the authors (doi.org/10.1016/j.pocean.2014.03.003) but uses an extended database (from 2002 to 2012) and new algorithms. Our results show that RF and AdaBoost provide better prediction results compared to SVMs and NNs, as they show improved performance metrics and a better balance between sensitivity and specificity. Classical machine learning approaches show higher sensitivities, but at a cost of lower specificity and higher percentages of false alarms (lower precision). These results seem to indicate a greater adaptation of new algorithms (RF and AdaBoost) to unbalanced datasets. Our models could be operationally implemented to establish a short-term prediction system.
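The trade-off between sensitivity, specificity, and precision discussed in this abstract can be illustrated with the standard confusion-matrix definitions. The Python sketch below assumes those textbook formulas; the function name `classification_metrics` and the counts are hypothetical and do not come from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard metrics from confusion-matrix counts.

    tp: true positives, fp: false positives (false alarms),
    tn: true negatives, fn: false negatives (missed events).
    Assumes each denominator is non-zero.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of real events detected
        "specificity": tn / (tn + fp),  # share of non-events correctly rejected
        "precision": tp / (tp + fp),    # share of alarms that were real events
    }


# Hypothetical counts for two classifiers on the same unbalanced dataset
# (50 events, 100 non-events).
balanced = classification_metrics(tp=40, fp=20, tn=80, fn=10)
sensitive = classification_metrics(tp=45, fp=50, tn=50, fn=5)
```

In this made-up example the second classifier detects more events (higher sensitivity) but raises many more false alarms, so both its specificity and its precision drop.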

24 pages, 5973 KiB  
Article
Accessible Routes Integrating Data from Multiple Sources
by Miguel R. Luaces, Jesús A. Fisteus, Luis Sánchez-Fernández, Mario Munoz-Organero, Jesús Balado, Lucía Díaz-Vilariño and Henrique Lorenzo
ISPRS Int. J. Geo-Inf. 2021, 10(1), 7; https://doi.org/10.3390/ijgi10010007 - 26 Dec 2020
Cited by 9 | Viewed by 3406
Abstract
Providing citizens with the ability to move around in an accessible way is a requirement for all cities today. However, modeling city infrastructures so that accessible routes can be computed is a challenge because it involves collecting information from multiple, large-scale and heterogeneous data sources. In this paper, we propose and validate the architecture of an information system that creates an accessibility data model for cities by ingesting data from different types of sources and provides an application that can be used by people with different abilities to compute accessible routes. The article describes the processes that allow building a network of pedestrian infrastructures from the OpenStreetMap information (i.e., sidewalks and pedestrian crossings), improving the network with information extracted from mobile-sensed LiDAR data (i.e., ramps, steps, and pedestrian crossings), detecting obstacles using volunteered information collected from the hardware sensors of the mobile devices of the citizens (i.e., ramps and steps), and detecting accessibility problems with software sensors in social networks (i.e., Twitter). The information system is validated through its application in a case study in the city of Vigo (Spain).
