# Model of Point Cloud Data Management System in Big Data Paradigm


## Abstract


## 1. Introduction

## 2. Materials and Methods

#### 2.1. Apache Spark Framework

#### 2.2. Study Area

#### 2.3. Data-Processing Module Architecture

#### 2.3.1. Logical Data Model

#### 2.3.2. Physical Data Model

#### 2.3.3. User Defined Types (UDT)

#### 2.3.4. Range Queries

$Z_{min} \le Z_{point} \le Z_{max}$, where $Z_{min}$ and $Z_{max}$ are the smallest and highest Morton code values covered by the given range. However, this interval also includes a large number of points that do not belong to the given range, which slows down the determination of the final result set.
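To make the false-positive problem concrete, the following minimal Python sketch (not the authors' Spark implementation) encodes points as 3D Morton codes and filters them with the single interval $[Z_{min}, Z_{max}]$. It makes three assumptions. Coordinates are already quantized to non-negative integer grid indices. Bits are interleaved in x, y, z order starting from the least significant bit. The function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int, int]


def morton3d(x: int, y: int, z: int, bits: int = 21) -> int:
    """Interleave the low `bits` bits of x, y and z into one Morton (Z-order) code.

    21 bits per axis is the most that fits into a 64-bit code (3 x 21 = 63)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)      # x bit i -> code bit 3i
        code |= ((y >> i) & 1) << (3 * i + 1)  # y bit i -> code bit 3i + 1
        code |= ((z >> i) & 1) << (3 * i + 2)  # z bit i -> code bit 3i + 2
    return code


def in_box(p: Point, lo: Point, hi: Point) -> bool:
    """Exact coordinate test against an axis-aligned query box."""
    return all(lo_c <= c <= hi_c for c, lo_c, hi_c in zip(p, lo, hi))


def single_interval_query(points: List[Point], lo: Point, hi: Point):
    # Morton order is monotone in each axis, so the smallest and largest codes
    # inside the box belong to its min and max corners.
    z_min, z_max = morton3d(*lo), morton3d(*hi)
    candidates = [p for p in points if z_min <= morton3d(*p) <= z_max]
    result = [p for p in candidates if in_box(p, lo, hi)]
    return candidates, result


pts = [(1, 1, 0), (2, 2, 0), (3, 0, 0), (0, 0, 1), (5, 5, 5)]
cand, res = single_interval_query(pts, (1, 1, 0), (2, 2, 0))
print(cand)  # [(1, 1, 0), (2, 2, 0), (3, 0, 0), (0, 0, 1)]
print(res)   # [(1, 1, 0), (2, 2, 0)]
```

Here $Z_{min} = 3$ and $Z_{max} = 24$. The points (3, 0, 0) and (0, 0, 1) have codes 9 and 4, so they pass the interval filter even though they lie outside the box. Only the final coordinate test removes them.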

- The query range is split into shorter sequences that contain only codes within the given range (Figure 8): $(Z_{min}, Z_{max}) \rightarrow ((Z^{1}_{min}, Z^{1}_{max}), (Z^{2}_{min}, Z^{2}_{max}), \ldots, (Z^{m}_{min}, Z^{m}_{max}))$.
- If $Z^{i+1}_{min} - Z^{i}_{max} < \delta$, adjacent ranges are merged to reduce the number of ranges that must be searched. This keeps the candidate set from growing drastically while gaining execution speed: $((Z^{1}_{min}, Z^{1}_{max}), (Z^{2}_{min}, Z^{2}_{max}), \ldots, (Z^{m}_{min}, Z^{m}_{max})) \rightarrow ((Z^{1}_{min}, Z^{1}_{max}), (Z^{2}_{min}, Z^{2}_{max}), \ldots, (Z^{k}_{min}, Z^{k}_{max}))$. Here m is the number of ranges in the query region, k is the number of ranges after merging, and k < m.
- The join operation is performed on all points and ranges with the condition $Z^{i}_{min} \le Z_{point} \le Z^{i}_{max}$, where $i = 1, \ldots, k$.
- The coordinates of the points in the candidate set are compared with the coordinates of the range query to obtain the final result.
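The four steps above can be sketched on a single machine as follows. This is a hedged illustration, not the authors' code, and all names are hypothetical. The box is decomposed by brute force over its grid cells; real systems typically use a recursive octree-style split. A binary search over the sorted, merged ranges stands in for the Spark join. Coordinates are assumed to be non-negative integer grid indices.

```python
from bisect import bisect_right
from itertools import product
from typing import List, Tuple

Point = Tuple[int, int, int]
Range = Tuple[int, int]


def morton3d(x: int, y: int, z: int, bits: int = 21) -> int:
    """Interleave x, y, z bits (x in the lowest position) into a Morton code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def box_to_ranges(lo: Point, hi: Point) -> List[Range]:
    """Step 1: split the query box into runs of consecutive Morton codes."""
    cells = product(*(range(a, b + 1) for a, b in zip(lo, hi)))
    ranges: List[Range] = []
    for c in sorted(morton3d(*cell) for cell in cells):
        if ranges and c == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], c)  # extend the current run
        else:
            ranges.append((c, c))            # start a new run
    return ranges


def merge_ranges(ranges: List[Range], delta: int) -> List[Range]:
    """Step 2: merge neighbours whose gap Z^{i+1}_min - Z^i_max is below delta."""
    merged: List[Range] = []
    for z_lo, z_hi in ranges:
        if merged and z_lo - merged[-1][1] < delta:
            merged[-1] = (merged[-1][0], z_hi)
        else:
            merged.append((z_lo, z_hi))
    return merged


def range_query(points: List[Point], lo: Point, hi: Point, delta: int):
    ranges = merge_ranges(box_to_ranges(lo, hi), delta)
    starts = [r[0] for r in ranges]
    candidates = []
    for p in points:                      # Step 3: "join" points with ranges
        z = morton3d(*p)
        i = bisect_right(starts, z) - 1   # last range with Z^i_min <= z
        if i >= 0 and z <= ranges[i][1]:  # ... and z <= Z^i_max
            candidates.append(p)
    # Step 4: exact comparison with the query box coordinates.
    result = [p for p in candidates
              if all(a <= c <= b for c, a, b in zip(p, lo, hi))]
    return ranges, candidates, result


pts = [(1, 1, 0), (2, 1, 0), (3, 0, 0), (0, 0, 1), (5, 5, 5)]
for delta in (1, 8):
    ranges, cand, res = range_query(pts, (1, 1, 0), (2, 2, 0), delta)
    print(delta, ranges, len(cand), res)
# 1 [(3, 3), (10, 10), (17, 17), (24, 24)] 2 [(1, 1, 0), (2, 1, 0)]
# 8 [(3, 24)] 4 [(1, 1, 0), (2, 1, 0)]
```

The run shows the trade-off controlled by δ. With δ = 1 nothing is merged (k = m = 4), and only true hits become candidates. With δ = 8 all four ranges collapse into one, and two false positives enter the candidate set. The final coordinate test removes them again.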

#### 2.3.5. kNN

- The array of points was sorted in ascending order of Morton code.
- RDD partitions were created with a custom range partitioner that provides an overlap of a × k rows between adjacent partitions, where k is the number of requested nearest neighbours. The constant a controls the trade-off between accuracy and execution speed: a larger value gives better accuracy but slower execution.
- A MapPartitions operation on those partitions created a new RDD containing one pair (point, set of candidate points) for each point of the initial RDD, in the form $(P_{i}, (P_{i-a \times k}, P_{i-a \times k+1}, \ldots, P_{i-1}, P_{i+1}, \ldots, P_{i+a \times k-1}, P_{i+a \times k}))$.
- A second MapPartitions operation on the new RDD sorted the candidate points by their distance from the point $P_{i}$ and returned the k nearest points.
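The procedure above can be imitated on one machine, where each loop iteration plays the role of one partition padded with w = a × k rows on either side. This is a hedged sketch, not the authors' Spark code. It assumes integer grid coordinates and Euclidean distance, and `approx_knn` is a hypothetical name.

```python
import math
from typing import List, Tuple

Point = Tuple[int, int, int]


def morton3d(x: int, y: int, z: int, bits: int = 21) -> int:
    """Interleave x, y, z bits (x in the lowest position) into a Morton code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def approx_knn(points: List[Point], k: int, a: int, n_parts: int):
    """Approximate kNN over a Morton-sorted array using overlapping partitions."""
    pts = sorted(points, key=lambda p: morton3d(*p))  # sort by rising Morton code
    w = a * k                                          # overlap / window half-width
    n = len(pts)
    size = math.ceil(n / n_parts)
    out = []
    for start in range(0, n, size):                    # one iteration = one partition
        end = min(start + size, n)
        lo, hi = max(0, start - w), min(n, end + w)    # pad with w rows on each side
        part = pts[lo:hi]
        for i in range(start, end):                    # points owned by this partition
            j = i - lo                                 # position of P_i inside `part`
            # (P_i, (P_{i-w}, ..., P_{i-1}, P_{i+1}, ..., P_{i+w})), clipped at the ends
            cands = part[max(0, j - w):j] + part[j + 1:j + 1 + w]
            p = pts[i]
            cands.sort(key=lambda q: math.dist(p, q))  # nearest candidates first
            out.append((p, cands[:k]))
    return out


pts = [(7, 7, 7), (1, 1, 0), (0, 0, 0), (0, 1, 0), (1, 0, 0)]
for p, nn in approx_knn(pts, k=1, a=1, n_parts=2):
    print(p, nn)
```

Because candidates come only from neighbours in Morton order, a point whose true nearest neighbour lies far away along the curve can be missed. Increasing a widens the window and reduces such misses, at the cost of more distance computations.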

#### 2.3.6. Improving Results Using Multiple Z-order Curves

$(X^{i}_{shift}, Y^{i}_{shift}, Z^{i}_{shift})$ and/or rotating about the origin by randomly generated angles $(\Omega^{i}, \Phi^{i}, K^{i})$, where $1 \le i \le 4$. After that, Morton codes were generated for every set. Finally, the set of candidate points was taken from the set that gives the shortest array of ranges in the query region and therefore provides the lowest execution time.
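The selection rule above (keep the curve that splits the query region into the fewest Morton ranges) can be illustrated with a small sketch. It is hedged in several ways. It uses only integer translations, because rotation would make the query box non-axis-aligned, and it counts ranges by brute force over the box cells. Including the unshifted curve as a baseline is an assumption of this sketch. The names `count_ranges` and `best_curve` are hypothetical.

```python
import random
from itertools import product
from typing import List, Tuple

Point = Tuple[int, int, int]


def morton3d(x: int, y: int, z: int, bits: int = 21) -> int:
    """Interleave x, y, z bits (x in the lowest position) into a Morton code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def count_ranges(lo: Point, hi: Point, shift: Point) -> int:
    """Number of runs of consecutive Morton codes covering the shifted box."""
    cells = product(*(range(a, b + 1) for a, b in zip(lo, hi)))
    codes = sorted(morton3d(*(c + s for c, s in zip(cell, shift))) for cell in cells)
    # Every gap between neighbouring codes starts a new range.
    return 1 + sum(1 for u, v in zip(codes, codes[1:]) if v != u + 1)


def best_curve(lo: Point, hi: Point, n_curves: int = 4, max_shift: int = 8, seed: int = 0):
    """Compare the original curve with n_curves randomly shifted ones and
    return the shift that yields the fewest ranges for this query box."""
    rng = random.Random(seed)
    shifts: List[Point] = [(0, 0, 0)] + [
        tuple(rng.randrange(max_shift) for _ in range(3)) for _ in range(n_curves)
    ]
    counts = [count_ranges(lo, hi, s) for s in shifts]
    best = min(range(len(shifts)), key=lambda i: counts[i])
    return shifts[best], counts[best]


print(count_ranges((1, 1, 0), (2, 2, 0), (0, 0, 0)))  # 4
print(count_ranges((1, 1, 0), (2, 2, 0), (1, 1, 0)))  # 1
print(best_curve((1, 1, 0), (2, 2, 0)))
```

The same shift has to be applied to the point coordinates before they are encoded. Otherwise the codes of the points and of the query ranges would refer to different curves.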

## 3. Results

#### 3.1. Experimental Platforms

- PostgreSQL (version 9.5.10) flat table, in which every point is stored as a single row.
- Apache Spark (version 1.6.2).

#### 3.2. Data Description

#### 3.3. PostgreSQL

#### 3.4. Apache Spark

#### 3.5. Query Evaluation

#### 3.5.1. PostgreSQL Query

#### 3.5.2. Apache Spark Query

#### 3.6. Query Performance

## 4. Discussion

## 5. Conclusions

- Extension of the Morton code index beyond 64-bit encoding in order to cover larger areas and increase coordinate precision.
- Research on using more than three dimensions in space-filling curve indexing, for example, a four-dimensional space for dynamic point clouds.
- Integration of vector geospatial data and implementation of spatial joins with point clouds.
- Feature extraction operations in order to produce elements for definition of CityGML structures, such as roofs, walls, city furniture, trees, etc.
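The first two items are linked by a simple bit budget. A 64-bit Morton code leaves 21 bits per axis in 3D and only 16 bits per axis in 4D. The short sketch below shows what that means for grid resolution over a hypothetical 10 km extent; both the extent and the function names are illustrative assumptions.

```python
def bits_per_axis(total_bits: int = 64, dims: int = 3) -> int:
    """Bits available to each coordinate when `dims` axes share one code."""
    return total_bits // dims


def cell_size(extent_m: float, bits: int) -> float:
    """Grid resolution along one axis for a given extent and bit budget."""
    return extent_m / (1 << bits)


for dims in (3, 4):
    b = bits_per_axis(dims=dims)
    print(f"{dims}D: {b} bits/axis, cell {cell_size(10_000.0, b) * 1000:.1f} mm")
# 3D: 21 bits/axis, cell 4.8 mm
# 4D: 16 bits/axis, cell 152.6 mm
```

Moving to four dimensions within 64 bits therefore coarsens each axis by a factor of 2^5 = 32. This is one reason why extending the code beyond 64 bits goes hand in hand with higher-dimensional indexing.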

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

| CPU | RAM | HDD |
|---|---|---|
| Intel Core i7-6700, 3.4 GHz | 16 GB DDR4, 3200 MHz | 1 TB, 7200 rpm |

| | PostgreSQL | Apache Spark |
|---|---|---|
| Data Set 1 | 7038 ms | 2307 ms |
| Data Set 2 | 4519 ms | 1807 ms |
| Data Set 3 | 10,612 ms | 2614 ms |

| Cluster Size | Data Set 1 |
|---|---|
| 3 nodes | 3620 ms |
| 5 nodes | 2307 ms |
| 9 nodes | 1477 ms |
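The tables can be summarized as speed-up ratios. The sketch below computes the PostgreSQL/Spark ratio per data set and the scaling relative to the 3-node cluster. It assumes the cluster-size table reports Apache Spark times for Data Set 1; the 5-node value of 2307 ms matches the Spark entry for that data set. The "efficiency" measure is an illustrative choice, not one used by the authors.

```python
# Query times (ms) copied from the tables above.
times = {
    "Data Set 1": (7038, 2307),   # (PostgreSQL, Apache Spark)
    "Data Set 2": (4519, 1807),
    "Data Set 3": (10612, 2614),
}
cluster = {3: 3620, 5: 2307, 9: 1477}  # nodes -> ms, Data Set 1

speedup = {name: pg / sp for name, (pg, sp) in times.items()}
for name, s in speedup.items():
    print(f"{name}: Spark is {s:.2f}x faster than PostgreSQL")

base_nodes, base_ms = 3, cluster[3]
for nodes, ms in cluster.items():
    scale = base_ms / ms                       # speed-up relative to 3 nodes
    efficiency = scale / (nodes / base_nodes)  # 1.0 would be linear scaling
    print(f"{nodes} nodes: {scale:.2f}x, efficiency {efficiency:.2f}")
```

The ratios are roughly 3.05, 2.50 and 4.06 for the three data sets. Moving from 3 to 5 and 9 nodes gives 1.57x and 2.45x, that is, about 94% and 82% of linear scaling.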

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Pajić, V.; Govedarica, M.; Amović, M.
Model of Point Cloud Data Management System in Big Data Paradigm. *ISPRS Int. J. Geo-Inf.* **2018**, *7*, 265.
https://doi.org/10.3390/ijgi7070265

**Chicago/Turabian Style**

Pajić, Vladimir, Miro Govedarica, and Mladen Amović.
2018. "Model of Point Cloud Data Management System in Big Data Paradigm" *ISPRS International Journal of Geo-Information* 7, no. 7: 265.
https://doi.org/10.3390/ijgi7070265