Study on Spatio-Temporal Indexing Model of Geohazard Monitoring Data Based on Data Stream Clustering Algorithm
Abstract
1. Introduction
- (1) To address the limitation of traditional spatial indexes, which exclude the temporal dimension, we use a joint B+ tree to index the temporal dimension, thereby enabling multidimensional spatio-temporal queries;
- (2) We use the micro-clusters generated by the CluStream algorithm in the stream processing stage and, in combination with the B+ tree, construct in-memory indexes that meet the needs of real-time geohazard data stream monitoring and enable a rapid response during the warning process;
- (3) We use a Hilbert-R tree enhanced with the CluStream data stream clustering algorithm to preprocess multidimensional spatial data. This reduces the areas of node MBRs (minimum bounding rectangles) and the similarity between them, thus avoiding excessive MBR overlap and unnecessary multi-path retrieval during queries;
- (4) Using the open-source columnar database HBase within the Hadoop big data processing framework, we achieve efficient storage of geohazard data.
2. Model Overview
- (1) The Hilbert-R tree is the principal component for spatio-temporal queries. It executes spatial dimension queries based on the spatial coordinates of the objects under consideration.
- (2) The CluStream algorithm processes the spatial objects in the leaf nodes of the Hilbert-R tree. This clusters the spatial dataset and reduces both node overlap and dead space.
- (3) The B+ tree indexes the time dimension of the BCHR tree, enabling temporal information to be filtered during spatio-temporal queries.
- (4) The HBase Rowkey is designed for data querying and, in this study, is a composite of the Hilbert code and the Unix timestamp. Using the query results from both the B+ tree and the Hilbert-R tree, the Rowkey directly locates the data in HBase that satisfy the query parameters.
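The Rowkey composition described in item (4) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `hilbert_index` and `make_rowkey`, the grid order, the zero-padding widths, and the `_` separator are all assumptions introduced here, since the text only states that the Rowkey combines a Hilbert code with a Unix timestamp. Fixed-width zero-padding is used because HBase sorts Rowkeys lexicographically by byte, so padded digits keep that order consistent with numeric order.

```python
def hilbert_index(order: int, x: int, y: int) -> int:
    """Distance of grid cell (x, y) along a Hilbert curve on a 2^order x 2^order grid."""
    n = 1 << order
    if not (0 <= x < n and 0 <= y < n):
        raise ValueError("cell outside grid")
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the next level sees a canonical orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def make_rowkey(hilbert_code: int, unix_ts: int) -> str:
    """Hypothetical layout: <12-digit Hilbert code>_<10-digit Unix timestamp>."""
    return f"{hilbert_code:012d}_{unix_ts:010d}"


# Cell (3, 0) on a 4 x 4 grid is the last cell of the curve (index 15).
print(make_rowkey(hilbert_index(2, 3, 0), 946657230))  # 000000000015_0946657230
```

Placing the Hilbert code first groups spatially close records in adjacent Rowkeys; whether the original design uses this order is not stated in the text.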
3. Indexing Implementation Details
3.1. B+ Tree
3.2. Hilbert-R Tree
3.2.1. CluStream Algorithm
3.2.2. Hilbert-R Tree Optimized Based on the CluStream Algorithm
3.2.3. Algorithm for Generating the BCHR Tree
Algorithm 1: GenerateBCHRTree
Require: Dataset S, time segmentation T, number of clusters K, maximum capacity of a node M
Ensure: Hilbert R-tree HRT
Initialize an empty tree HRT
for each object o in S do
    Compute its MBR and the center of its MBR
    Use the CluStream algorithm to generate K clusters based on these centers
    Calculate all objects' Hilbert values from their MBR centers in time frames (T0 to T1 for micro-clusters and T1 to Tn for macro-clusters)
end for
for each cluster c in Clusters do
    Calculate the Hilbert value of the cluster center
    if number of space objects in c ≤ M then
        Create a leaf node from all space objects in c
    else
        Sort all the space objects according to their Hilbert values in ascending order
        Create groups containing M space objects (the last group may have fewer than M objects)
        Create a leaf node for each group of space objects
    end if
    Sort the leaf nodes to be inserted into the Hilbert R-tree based on Hilbert values
end for
From the bottom up, construct the Hilbert R-tree HRT using the sorted leaf nodes
return HRT
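The packing steps of Algorithm 1 can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the authors' code: cluster labels and Hilbert values are taken as precomputed inputs (the CluStream step and the time segmentation are not implemented), and the names `SpatialObject`, `Node`, `pack`, and `build_bchr_tree` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class SpatialObject:
    mbr: Rect
    hilbert: int   # Hilbert value of the MBR center (assumed precomputed)
    cluster: int   # cluster label (assumed to come from CluStream)


@dataclass
class Node:
    mbr: Rect
    hilbert: int   # largest Hilbert value stored beneath this node
    entries: list
    is_leaf: bool


def union_mbr(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def pack(items: list, capacity: int, is_leaf: bool) -> List[Node]:
    """Group an ordered list into nodes of at most `capacity` entries."""
    nodes = []
    for i in range(0, len(items), capacity):
        group = items[i:i + capacity]
        nodes.append(Node(union_mbr([g.mbr for g in group]),
                          max(g.hilbert for g in group), group, is_leaf))
    return nodes


def build_bchr_tree(objects: List[SpatialObject], capacity: int) -> Optional[Node]:
    if capacity < 2:
        raise ValueError("capacity must be at least 2")
    clusters = {}
    for obj in objects:
        clusters.setdefault(obj.cluster, []).append(obj)
    leaves = []
    for members in clusters.values():
        # Within a cluster, order objects by Hilbert value and cut into leaves.
        members.sort(key=lambda o: o.hilbert)
        leaves.extend(pack(members, capacity, is_leaf=True))
    # Order leaves by Hilbert value, then build parent levels bottom-up.
    level = sorted(leaves, key=lambda n: n.hilbert)
    while len(level) > 1:
        level = pack(level, capacity, is_leaf=False)
    return level[0] if level else None
```

Because leaves never mix objects from different clusters, each leaf MBR stays within one cluster's extent, which is the mechanism the model relies on to limit overlap.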
3.2.4. Spatio-Temporal Range Query Algorithm for BCHR Tree
- (1) Spatial scope query
Algorithm 2: IntersectSearch
Input: n (Node), bb (Bounding Box)
Output: results
results = {}; // An empty list to hold Rectangles found within 'bb'.
for each entry e within the node n do
    Get the Minimum Bounding Rectangle (MBR) for the entry e.
    if bb intersects with the MBR of the entry e then
        if n is a leaf node then
            results = results ∪ {e};
        else
            results = results ∪ IntersectSearch(e.node, bb);
        end if
    end if
end for
return results;
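Algorithm 2 translates almost line for line into a recursive function. The following Python sketch assumes a hypothetical in-memory node layout (`Node`, `Entry`, and the `payload` field are illustrative names, not taken from the paper) and treats rectangles that merely touch as intersecting.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Entry:
    mbr: Rect
    child: Optional["Node"] = None  # set for entries of internal nodes
    payload: Any = None             # set for data entries in leaf nodes


@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)


def intersects(a: Rect, b: Rect) -> bool:
    """True when the two rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def intersect_search(node: Node, bb: Rect) -> List[Entry]:
    results = []
    for entry in node.entries:
        if intersects(entry.mbr, bb):
            if node.is_leaf:
                results.append(entry)
            else:
                # Descend only into subtrees whose MBR meets the query box.
                results.extend(intersect_search(entry.child, bb))
    return results
```

The less the sibling MBRs overlap, the fewer subtrees pass the `intersects` test, which is why reducing overlap shortens multi-path retrieval.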
- (2) Real-time temporal and spatial range queries
Algorithm 3: RealTimeSpatialTemporalQuery
Result: finalResults
Input: bb (Bounding Box), t1, t2
Initialize microClusters ← Read from CluStream algorithm
Initialize HilbertRT ← Constructed from microClusters
Initialize BplusTree ← Constructed from (time, microCluster) pairs from microClusters
Initialize timeResults ← searchRange(BplusTree, t1, t2)
Initialize spaceResults ← IntersectSearch(root of HilbertRT, bb)
Initialize finalResults ← Ø
for each result in timeResults do
    if result ∈ spaceResults then
        finalResults ← finalResults ∪ {result}
    end if
end for
return finalResults
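The filter-and-intersect logic of Algorithm 3 can be sketched as follows. To stay short, this sketch swaps in simpler stand-ins for the two index structures: a sorted list searched with `bisect` plays the role of the B+ tree range scan, and a linear scan over points plays the role of the Hilbert R-tree search. The record layout `(id, unix_ts, x, y)` and the function name are assumptions for illustration.

```python
import bisect
from typing import List, Tuple

Record = Tuple[str, int, int, int]        # (id, unix_ts, x, y)
Box = Tuple[int, int, int, int]           # (xmin, ymin, xmax, ymax)


def real_time_query(records: List[Record], bb: Box, t1: int, t2: int) -> List[Record]:
    # Temporal filter: ordered keys plus binary search stand in for the B+ tree.
    by_time = sorted(records, key=lambda r: r[1])
    times = [r[1] for r in by_time]
    lo = bisect.bisect_left(times, t1)
    hi = bisect.bisect_right(times, t2)
    time_results = by_time[lo:hi]

    # Spatial filter: a linear scan stands in for the Hilbert R-tree search.
    xmin, ymin, xmax, ymax = bb
    space_ids = {r[0] for r in records
                 if xmin <= r[2] <= xmax and ymin <= r[3] <= ymax}

    # Keep only records that satisfy both conditions, in time order.
    return [r for r in time_results if r[0] in space_ids]
```

Holding the spatial hits in a set makes each membership test in the final loop constant time on average, so the intersection costs roughly the size of the temporal result.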
4. Experiments
4.1. Experimental Design
Algorithm 4: GenerateGeologicalDisasterData
Result: geodisaster_dataset
Input: Defined_hazards, total_num_of_disasters, timerange, coordinaterange, hazard_prone_locations
Initialize geodisaster_dataset to an empty list
Initialize counter to 0
while counter < total_num_of_disasters do
    Randomly select a hazard (Disaster Type) from Defined_hazards
    Generate a random Disaster Time within timerange and convert it to a Unix Timestamp
    Generate a random Longitude and Latitude within coordinaterange
    Convert Longitude and Latitude to decimal degrees
    Multiply by 1,000,000 to store them as integers, avoiding floating point while ensuring meter-level accuracy
    if the current location is in hazard_prone_locations or its surrounding coordinates then
        Increase the probability of this Disaster Type
    end if
    Add record (Disaster Type, Disaster Time, Timestamp, Longitude, Latitude) to geodisaster_dataset
    Increment counter
end while
return geodisaster_dataset
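A runnable approximation of Algorithm 4 is sketched below. Several details are assumptions made for this sketch rather than facts from the paper: the location is drawn before the hazard type so that the weighting can take effect, `hazard_prone_locations` is modeled as a mapping from a (longitude, latitude) pair to the hazard favored there, "surrounding coordinates" is read as a square of half-width `radius` degrees, and the `boost` factor, the seed, and the UTC time format are arbitrary choices.

```python
import random
from datetime import datetime, timezone


def generate_geodisaster_data(hazards, total, time_range, lon_range, lat_range,
                              hazard_prone_locations, radius=0.5, boost=5.0, seed=0):
    rng = random.Random(seed)  # fixed seed so the dataset is reproducible
    dataset = []
    for _ in range(total):
        ts = rng.randint(time_range[0], time_range[1])
        lon = rng.uniform(lon_range[0], lon_range[1])
        lat = rng.uniform(lat_range[0], lat_range[1])

        # Raise the weight of a hazard type near locations prone to it.
        weights = [1.0] * len(hazards)
        for (plon, plat), hazard in hazard_prone_locations.items():
            if abs(lon - plon) <= radius and abs(lat - plat) <= radius:
                weights[hazards.index(hazard)] *= boost
        disaster_type = rng.choices(hazards, weights=weights, k=1)[0]

        disaster_time = datetime.fromtimestamp(ts, tz=timezone.utc).strftime(
            "%Y-%m-%d %H:%M:%S")
        # Store coordinates as integers: degrees multiplied by 1,000,000.
        dataset.append((disaster_type, disaster_time, ts,
                        round(lon * 1_000_000), round(lat * 1_000_000)))
    return dataset


sample = generate_geodisaster_data(
    ["Landslide", "Mudslide", "Collapse"], 5,
    (946656000, 1653803032), (97.0, 106.5), (21.0, 29.5),
    {(100.0, 25.0): "Landslide"})
for row in sample:
    print(row)
```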
4.2. Comparative Experiments and Analysis of Results
4.2.1. Performance of Real-Time Spatio-Temporal Query
4.2.2. Performance of Spatial Range Queries
4.2.3. Performance of Index Insertion
5. Conclusions and Outlook
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Category | Configuration |
---|---|
Number of VM | 3 |
Processor | Intel Core (4 cores) |
RAM | 8 GB |
Hard Drive | 50 GB |
Operating System | CentOS 7.5 |
Hadoop Version | 3.1.3 |
Zookeeper Version | 3.5.7 |
HBase Version | 2.4.11 |
JDK Version | 1.8.0_212 |
Elasticsearch Version | 7.8.0 |
Disaster Time | Unix Timestamp (s) | X Coordinate (longitude × 10^6) | Y Coordinate (latitude × 10^6) | Disaster Type |
---|---|---|---|---|
1 January 2000 0:20 | 946657230 | 97295417 | 26758213 | Landslide |
15 February 2017 2:04 | 1487095493 | 98463226 | 24882692 | Mudslide |
25 February 2018 10:24 | 1519525491 | 103810338 | 22889044 | Collapse |
26 April 2019 10:13 | 1556244796 | 99869433 | 21967614 | Land Subsidence |
29 May 2022 13:43 | 1653803032 | 104122763 | 24934143 | Ground Fissure |
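As a worked check of the encoding in the sample records, the short Python sketch below decodes one row back into a local time and decimal degrees. The UTC+8 offset is an assumption inferred from the sample rows (it reproduces the listed Disaster Time values), and `decode_record` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def decode_record(unix_ts: int, x: int, y: int, utc_offset_hours: int = 8):
    """Return (local datetime, longitude, latitude) for one stored record."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    when = datetime.fromtimestamp(unix_ts, tz)
    # Coordinates are stored as integers: degrees multiplied by 1,000,000.
    return when, x / 1_000_000, y / 1_000_000


when, lon, lat = decode_record(946657230, 97295417, 26758213)
print(when.strftime("%d %B %Y %H:%M"), lon, lat)  # 01 January 2000 00:20 97.295417 26.758213
```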
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, J.; Song, W.; Chen, J.; Wei, Q.; Wang, J. Study on Spatio-Temporal Indexing Model of Geohazard Monitoring Data Based on Data Stream Clustering Algorithm. ISPRS Int. J. Geo-Inf. 2024, 13, 93. https://doi.org/10.3390/ijgi13030093