Efficient Management of High-Frequency Sensor Data Streams Using a Read-Optimized Learned Index
Abstract
1. Introduction
2. Related Work
3. Preliminaries and Problem Definition
3.1. Overview of Real-Time Spatial Queries for Sensors
- Spatial Dataset and Objects: Consider a dataset containing N complex spatial observations captured by sensors (e.g., environmental boundaries, vehicle trajectories, or obstacle zones). Each spatial observation is encapsulated by a Minimum Bounding Rectangle (MBR). While MBRs are efficient for coarse-grained indexing, they introduce approximation errors, especially for non-rectangular (e.g., diagonal or L-shaped) geometries, whose MBRs include large regions that the geometry itself does not cover.
- Spatial Range Query (SRQ): Given a query window Q representing a specific monitoring area or a safety perimeter, an SRQ retrieves the set R of all spatial objects that intersect Q, enabling immediate decision-making. The query is represented in two forms:
- Q.Geometry: The exact geometric shape r, used for precise intersection tests in the refinement phase.
- Q.MBR: The Minimum Bounding Rectangle of r, used for index traversal and pruning in the filtering phase.
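
The filter-and-refine pattern behind an SRQ can be sketched in Python as follows. The sketch is illustrative, not the paper's implementation: it assumes every exact geometry, including the query shape r, is a union of axis-aligned rectangles so that the refinement test stays short (real systems would use full polygon or line-string intersection), and all names are hypothetical. The L-shaped object reproduces the MBR approximation error described above: its MBR overlaps the query window, so it survives filtering, yet the exact test rejects it.

```python
from dataclasses import dataclass

# Illustrative sketch; names and the rectangle-union geometry model are hypothetical.
Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    """Closed-interval overlap test for two axis-aligned rectangles."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def mbr_of(parts: list[Rect]) -> Rect:
    """Minimum bounding rectangle of a union of rectangles."""
    x0, y0, x1, y1 = zip(*parts)
    return (min(x0), min(y0), max(x1), max(y1))


@dataclass
class SpatialObject:
    oid: int
    parts: list[Rect]  # exact geometry, simplified here to a union of rectangles


def range_query(objects: list[SpatialObject], q_geometry: list[Rect]):
    """Return (candidate IDs after the MBR filter, result IDs after refinement)."""
    q_mbr = mbr_of(q_geometry)  # plays the role of Q.MBR
    # Filtering phase: cheap MBR-vs-MBR test.
    candidates = [o for o in objects if intersects(mbr_of(o.parts), q_mbr)]
    # Refinement phase: exact test against Q.Geometry removes false positives.
    results = [o for o in candidates
               if any(intersects(p, q) for p in o.parts for q in q_geometry)]
    return [o.oid for o in candidates], [o.oid for o in results]


l_shape = SpatialObject(1, [(0, 0, 1, 4), (0, 0, 4, 1)])  # MBR is (0, 0, 4, 4)
far_box = SpatialObject(2, [(5, 5, 6, 6)])
near_box = SpatialObject(3, [(2.5, 2.5, 3.5, 3.5)])
candidates, results = range_query([l_shape, far_box, near_box], [(2, 2, 3, 3)])
print(candidates, results)  # [1, 3] [3]
```
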
3.2. The GLIN Data Retrieval Response Latency Model
- Probe Stage: The learned model predicts the index location (a range on the Z-order curve) of the query MBR (Q.MBR). This stage is typically fast, benefiting from the model’s predictive power and the efficiency of the underlying B-Tree or ALEX structure.
- Refine Stage: The located leaf node contains a candidate set C. The refine stage must perform expensive geometric intersection tests on each object in C to filter out false positives (FPs) and identify true positives (TPs).
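
To make the probe stage concrete, the sketch below maps an MBR to a Z-order interval and locates that interval with a learned position model. Several details are assumptions rather than GLIN's actual design: coordinates are already quantized to 16-bit integer grid cells, each indexed object is keyed by the Z-value of a single representative point, and one least-squares line with a recorded maximum error stands in for the B-Tree/ALEX-backed model. Because a Morton code is non-decreasing in each coordinate, every point inside the MBR has a Z-value between those of its lower-left and upper-right corners.

```python
import bisect
import math

# Illustrative sketch; the key choice and the linear model are hypothetical stand-ins.


def spread_bits(v: int) -> int:
    """Insert a zero bit between each of the low 16 bits of v."""
    v &= 0xFFFF
    v = (v | (v << 8)) & 0x00FF00FF
    v = (v | (v << 4)) & 0x0F0F0F0F
    v = (v | (v << 2)) & 0x33333333
    v = (v | (v << 1)) & 0x55555555
    return v


def z_value(x: int, y: int) -> int:
    """Morton (Z-order) code: x bits in even positions, y bits in odd positions."""
    return spread_bits(x) | (spread_bits(y) << 1)


class LearnedProbe:
    """Least-squares line over sorted Z-values plus its maximum position error."""

    def __init__(self, z_keys: list[int]):
        self.keys = sorted(z_keys)
        n = len(self.keys)
        mean_k, mean_p = sum(self.keys) / n, (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0  # >= 0 because keys are sorted
        self.intercept = mean_p - self.slope * mean_k
        self.err = max(abs(self.predict(k) - i) for i, k in enumerate(self.keys))

    def predict(self, key: int) -> float:
        return self.slope * key + self.intercept

    def lower_bound(self, key: int) -> int:
        """First index whose key is >= key, searched only inside the error window."""
        p, n = self.predict(key), len(self.keys)
        lo = max(0, min(n, math.floor(p - self.err) - 1))
        hi = max(lo, min(n, math.ceil(p + self.err) + 2))
        return bisect.bisect_left(self.keys, key, lo, hi)

    def probe(self, mbr: tuple[int, int, int, int]) -> tuple[int, int]:
        """Half-open position range of keys inside the MBR's Z-interval."""
        xmin, ymin, xmax, ymax = mbr
        z_lo, z_hi = z_value(xmin, ymin), z_value(xmax, ymax)
        return self.lower_bound(z_lo), self.lower_bound(z_hi + 1)


grid_keys = [z_value(x, y) for x in range(4) for y in range(4)]  # every cell of a 4 x 4 grid
index = LearnedProbe(grid_keys)
print(index.probe((1, 1, 2, 2)))  # (3, 13): keys 3..12 are candidates
```

In this 4 × 4 example the MBR (1, 1, 2, 2) maps to the Z-interval [3, 12], which spans 10 keys although only 4 grid cells lie inside the box, so in this sketch the Z-interval over-approximates the query region and the surplus keys must be discarded later.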
3.3. Formalizing the Refinement Bottleneck
3.4. Ingestion Throughput and Analysis Immediacy
4. DyGLIN Methodology
4.1. Core Architecture: Edge-Oriented Sensor Stream Decoupling
4.2. Read Path: Real-Time Retrieval and Boundary Verification
| Algorithm 1: Real-Time Spatial Retrieval |
| Input: Query Q (MBR), DyGLIN Index |
| Output: Result Set R |
| 1: |
| 2: /* Index probe (top-level RMI probing) */ |
| 3: |
| 4: for each do |
| 5: |
| 6: /* 1. HMBR filter */ |
| 7: |
| 8: |
| 9: /* 2. Incremental buffer filter */ |
| 10: |
| 11: |
| 12: /* 3. CF filter */ |
| 13: |
| 14: for each do |
| 15: if then |
| 16: |
| 17: end if |
| 18: end for |
| 19: /* 4. Final refinement */ |
| 20: |
| 21: |
| 22: end for |
| 23: return |
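
The four filtering steps of Algorithm 1 can be illustrated with the following Python sketch. It is not the authors' implementation: the index probe is assumed to have already returned the relevant leaves, the HMBR is modeled as a list of (micro-MBR, objects) groups, a plain Python set stands in for the cuckoo filter, and exact geometries are unions of axis-aligned rectangles. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field

# Illustrative sketch; the leaf layout and the set-based deletion filter are hypothetical.
Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def mbr_of(parts: list[Rect]) -> Rect:
    x0, y0, x1, y1 = zip(*parts)
    return (min(x0), min(y0), max(x1), max(y1))


@dataclass
class Obj:
    oid: int
    parts: list[Rect]  # exact geometry as a union of rectangles (simplification)


@dataclass
class Leaf:
    groups: list[tuple[Rect, list[Obj]]]  # HMBR: micro-MBR -> archived objects in the MDS
    buffer: list[Obj] = field(default_factory=list)  # delta buffer of recent inserts
    deleted: set[int] = field(default_factory=set)  # stand-in for the cuckoo filter


def query(leaves: list[Leaf], q_geometry: list[Rect]) -> list[int]:
    q_mbr = mbr_of(q_geometry)
    results = []
    for leaf in leaves:  # leaves returned by the index probe
        # 1. HMBR filter: scan only groups whose micro-MBR overlaps the query MBR.
        cands = [o for micro, objs in leaf.groups if intersects(micro, q_mbr)
                 for o in objs if intersects(mbr_of(o.parts), q_mbr)]
        # 2. Delta buffer filter: recent inserts are not yet covered by the HMBR.
        cands += [o for o in leaf.buffer if intersects(mbr_of(o.parts), q_mbr)]
        # 3. Deletion filter: skip objects marked as logically deleted.
        cands = [o for o in cands if o.oid not in leaf.deleted]
        # 4. Final refinement: exact geometry test against the query.
        results += [o.oid for o in cands
                    if any(intersects(p, q) for p in o.parts for q in q_geometry)]
    return results


o1, o2 = Obj(1, [(0, 0, 1, 1)]), Obj(2, [(8, 8, 9, 9)])
leaf = Leaf(groups=[(mbr_of(o1.parts), [o1]), (mbr_of(o2.parts), [o2])],
            buffer=[Obj(3, [(0.5, 0.5, 2, 2)])], deleted={1})
print(query([leaf], [(0, 0, 3, 3)]))  # [3]: object 1 is filtered as deleted
```

Because a set has no false positives, this sketch never hides a live object; a real cuckoo filter can, with small probability, report a live object as deleted at step 3.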
4.3. Write Path: Amortized Analysis of Sensor Stream Ingestion
| Algorithm 2: High-Frequency Stream Ingestion |
| Input: Geometry G, DyGLIN Index |
| Output: Success/Failure |
| 1: /* 1. Find the corresponding leaf node */ |
| 2: LeafNode L = |
| 3: if then |
| 4: |
| 5: end if |
| 6: /* O(1) operation: Only append to the buffer */ |
| 7: |
| 8: return Success |
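
The buffering idea behind Algorithm 2 and its amortized cost can be sketched as below. The leaf layout (a sorted list of (key, id) pairs as the MDS and an unsorted list as the delta buffer) and the merge-on-full policy are simplifying assumptions for illustration; leaf lookup and HMBR maintenance are omitted, and all names are hypothetical.

```python
import heapq
from dataclasses import dataclass, field

# Illustrative sketch; the leaf layout and merge policy are hypothetical.


@dataclass
class Leaf:
    capacity: int                                # delta buffer capacity B
    mds: list = field(default_factory=list)      # main data store: sorted (key, oid) pairs
    buffer: list = field(default_factory=list)   # delta buffer: unsorted (key, oid) pairs
    merges: int = 0                              # number of batch merges performed

    def insert(self, key: int, oid: int) -> bool:
        if len(self.buffer) >= self.capacity:    # buffer full: archive before appending
            self.merge()
        self.buffer.append((key, oid))           # O(1) append on the hot path
        return True

    def merge(self) -> None:
        # One sort of B entries plus one linear merge with the MDS, paid once per B inserts.
        self.mds = list(heapq.merge(self.mds, sorted(self.buffer)))
        self.buffer.clear()
        self.merges += 1


leaf = Leaf(capacity=256)
for i in range(1000):
    leaf.insert(key=(i * 7919) % 1000, oid=i)
print(leaf.merges, len(leaf.mds), len(leaf.buffer))  # 3 768 232
```

With B = 256, 1,000 inserts trigger only 3 batch merges, whereas B = 1 would trigger 999. If one merge costs C_merge, its share per insert is roughly C_merge/B on top of the O(1) append, which is the usual intuition behind amortized buffering arguments.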
| Algorithm 3: Invalid Signal Removal |
| Input: Geometry G, DyGLIN Index |
| Output: Success/Failure |
| 1: LeafNode L = |
| 2: /* O(1) tag: Mark as “Deleted” in CF */ |
| 3: |
| 4: /* Mark in MDS if consistent with HMBR, otherwise defer physical removal to merge*/ |
| 5: return Success |
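
Algorithm 3 only tags a removal in the cuckoo filter. A compact Python sketch of a standard partial-key cuckoo filter (Fan et al.) is shown below, parameterized with the values listed in the cuckoo filter configuration table (8192 buckets, 8 slots per bucket, 24-bit fingerprints, 500 maximum kicks). The BLAKE2b hash, the bit split between fingerprint and bucket index, and the hypothetical deletion_key encoding are assumptions; the paper specifies only a 64-bit hash over MBR coordinates and geometry type. Because lookups compare fingerprints rather than full keys, a live object whose fingerprint collides with a deleted one would be wrongly skipped; longer fingerprints make this rarer.

```python
import hashlib
import random
import struct

# Illustrative sketch; hash choice, bit layout and deletion_key are hypothetical.


class CuckooFilter:
    """Partial-key cuckoo filter storing fingerprints of logically deleted entries."""

    def __init__(self, num_buckets=8192, bucket_size=8, fp_bits=24, max_kicks=500, seed=0):
        assert num_buckets & (num_buckets - 1) == 0, "XOR-based alternate index needs a power of two"
        self.m, self.b, self.f, self.max_kicks = num_buckets, bucket_size, fp_bits, max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    @staticmethod
    def _h64(data: bytes) -> int:
        return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

    def _locate(self, key: bytes) -> tuple[int, int]:
        h = self._h64(key)
        fp = (h >> 32) & ((1 << self.f) - 1)      # fingerprint from the high bits
        return fp, h & (self.m - 1)               # primary bucket from the low bits

    def _alt(self, i: int, fp: int) -> int:
        # i XOR hash(fp) is its own inverse, so either bucket yields the other.
        return (i ^ self._h64(fp.to_bytes(4, "big"))) & (self.m - 1)

    def insert(self, key: bytes) -> bool:
        fp, i1 = self._locate(key)
        i2 = self._alt(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.b:
                self.buckets[i].append(fp)
                return True
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):           # evict and relocate resident fingerprints
            j = self.rng.randrange(self.b)
            fp, self.buckets[i][j] = self.buckets[i][j], fp
            i = self._alt(i, fp)
            if len(self.buckets[i]) < self.b:
                self.buckets[i].append(fp)
                return True
        return False  # treated as full; the fingerprint in hand is dropped in this sketch

    def contains(self, key: bytes) -> bool:
        fp, i1 = self._locate(key)
        return fp in self.buckets[i1] or fp in self.buckets[self._alt(i1, fp)]

    def remove(self, key: bytes) -> bool:
        fp, i1 = self._locate(key)
        for i in (i1, self._alt(i1, fp)):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False

    def load_factor(self) -> float:
        return sum(map(len, self.buckets)) / (self.m * self.b)


def deletion_key(mbr: tuple[float, float, float, float], geom_type: str) -> bytes:
    """Hypothetical key: packed MBR coordinates plus the geometry type name."""
    return struct.pack("<4d", *mbr) + geom_type.encode()


cf = CuckooFilter()
key = deletion_key((1.0, 2.0, 3.0, 4.0), "Polygon")
cf.insert(key)              # Algorithm 3: O(1) logical deletion
print(cf.contains(key))     # True: queries now skip this object
```
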
| Algorithm 4: Background Sensor Data Archiving |
| Input: LeafNode L (DB is full) |
| Output: void |
| 1: /* Batch insertion of data */ |
| 2: |
| 3: /* Reconstruct the hierarchical MBR */ |
| 4: |
| 5: /* Clear the buffer */ |
| 6: |
| 7: /* Clear CF: If the CF is too full, rebuild it here */ |
| 8: if then |
| 9: |
| 10: end if |
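
The background archiving in Algorithm 4 can be sketched as follows. The sketch assumes object IDs are never reused, keeps a leaf's tombstones in a plain set standing in for its cuckoo filter, and performs the physical removal that Algorithm 3 defers to the merge. It rebuilds the HMBR by splitting the Z-sorted entries into num_micro contiguous groups (the granularity studied in the sensitivity analysis); this grouping rule, the 0.9 load threshold, and all names are hypothetical.

```python
from dataclasses import dataclass, field

# Illustrative sketch; the grouping rule, threshold and names are hypothetical.
Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Entry = tuple[int, int, Rect]             # (z_key, oid, mbr)


def mbr_of(rects: list[Rect]) -> Rect:
    x0, y0, x1, y1 = zip(*rects)
    return (min(x0), min(y0), max(x1), max(y1))


@dataclass
class Leaf:
    mds: list[Entry] = field(default_factory=list)     # main data store, sorted by z_key
    buffer: list[Entry] = field(default_factory=list)  # delta buffer (full when archiving)
    deleted: set[int] = field(default_factory=set)     # stand-in for the leaf's cuckoo filter
    hmbr: list[Rect] = field(default_factory=list)     # micro-MBRs
    num_micro: int = 16                                # HMBR granularity
    cf_capacity: int = 1024
    cf_rebuild_load: float = 0.9                       # hypothetical "too full" threshold


def archive(leaf: Leaf) -> None:
    # 1. Batch insertion: merge the buffer into the MDS, physically dropping
    #    entries that were only logically deleted until now.
    live = [e for e in leaf.mds + leaf.buffer if e[1] not in leaf.deleted]
    leaf.mds = sorted(live, key=lambda e: e[0])
    # 2. Reconstruct the HMBR from contiguous slices of the Z-sorted entries.
    n, k = len(leaf.mds), leaf.num_micro
    leaf.hmbr = [mbr_of([e[2] for e in leaf.mds[i * n // k:(i + 1) * n // k]])
                 for i in range(k) if (i + 1) * n // k > i * n // k]
    # 3. Clear the buffer.
    leaf.buffer.clear()
    # 4. The tombstones now refer to purged entries; rebuild the filter if it is too full.
    if len(leaf.deleted) >= leaf.cf_rebuild_load * leaf.cf_capacity:
        leaf.deleted = set()  # a real cuckoo filter would be re-allocated empty


leaf = Leaf(num_micro=4, cf_capacity=4, cf_rebuild_load=0.5)
leaf.mds = [(z, z, (z, 0, z + 1, 1)) for z in (0, 2, 4, 6)]
leaf.buffer = [(z, z, (z, 0, z + 1, 1)) for z in (1, 3, 5, 7)]
leaf.deleted = {2, 5}
archive(leaf)
print([e[0] for e in leaf.mds])  # [0, 1, 3, 4, 6, 7]
print(leaf.hmbr)  # [(0, 0, 1, 1), (1, 0, 4, 1), (4, 0, 5, 1), (6, 0, 8, 1)]
```
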
5. Experiments
5.1. Experiment Setup
5.1.1. Datasets for Sensor-Based Spatial Analysis
5.1.2. Sensor-Driven Workloads and Performance Metrics
5.2. Evaluation of Real-Time Query Efficiency in Sensor Networks
5.3. Evaluation of Sensor Stream Ingestion and Resource Efficiency
5.4. Component Effectiveness Analysis in Sensor Stream Workloads
5.4.1. Effectiveness of Precision Spatial Filtering
5.4.2. Impact of Delta Buffer and Cuckoo Filter
5.5. Correctness Validation for Cuckoo Filter-Based Logical Deletion
5.5.1. Cuckoo Filter Configuration and Theoretical Analysis
5.5.2. Experimental Correctness Validation
5.6. Sensitivity Analysis
5.6.1. Impact of HMBR Granularity
5.6.2. Impact of Delta Buffer Capacity
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| DyGLIN | Dynamic Generate Learning-based Index |
| CDF | Cumulative Distribution Function |
| CF | Cuckoo Filter |
| DB | Delta Buffer |
| HMBR | Hierarchical MBR |
| ID | Identifier |
| MBR | Minimum Bounding Rectangle |
| MDS | Main Data Store |
| SFC | Space-Filling Curve |
| WODS | Write-Optimized Data Structure |
| QRT | Query Response Time |
| SRQS | Spatial Range Queries |
References
- Xi, M.; Wen, J.; He, J.; Xiao, S.; Yang, J. An Expert Experience-Enhanced Security Control Approach for AUVs of the Underwater Transportation Cyber-Physical Systems. IEEE Trans. Intell. Transport. Syst. 2025, 26, 14086–14098.
- Yang, Y.; Yi, H.; Xi, M.; Wen, J.; Yang, J. GenAI-Driven Unsupervised Denoising for Consumer Device Imagery. IEEE Consum. Electron. Mag. 2025, 14, 94–102.
- Al Jawarneh, I.M.; Foschini, L.; Bellavista, P. Polygon simplification for the efficient approximate analytics of georeferenced big data. Sensors 2023, 23, 8178.
- Kadav, P.; Sharma, S.; Rojas, J.F.; Patil, P.; Wang, C.; Ekti, A.R.; Meyer, R.T.; Asher, Z.D. Automated lane centering: An off-the-shelf computer vision product vs. Infrastructure-based chip-enabled raised pavement markers. Sensors 2024, 24, 2327.
- Zhang, X.; Bai, W.; Liu, J.; Yang, S.; Shang, T.; Liu, H. Enhancing geomagnetic navigation with PPO-LSTM: Robust navigation utilizing observed geomagnetic field data. Sensors 2025, 25, 3699.
- Yu, J.; Wu, J.; Sarwat, M. Geospark: A cluster computing framework for processing large-scale spatial data. In Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, Seattle, WA, USA, 3–6 November 2015; pp. 1–4.
- Eldawy, A.; Mokbel, M.F. Spatialhadoop: A mapreduce framework for spatial data. In Proceedings of the 2015 IEEE 31st International Conference on Data Engineering, Seoul, Republic of Korea, 13–17 April 2015; pp. 1352–1363.
- Balderas-Díaz, C.; Miraz, M.; Hossain, M.A.; Guevara, V. ADEPT Framework: Optimizing Data Flow in Dynamic Environments for Ambient Assisted Living. In Proceedings of the International Symposium on Ambient Intelligence, Salamanca, Spain, 20–22 November 2024; pp. 1–8.
- Guttman, A. R-trees: A dynamic index structure for spatial searching. In Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data, Boston, MA, USA, 18–21 June 1984; pp. 47–57.
- Samet, H. The quadtree and related hierarchical data structures. ACM Comput. Surv. 1984, 16, 187–260.
- Pandey, V.; van Renen, A.; Kipf, A.; Sabek, I.; Ding, J.; Kemper, A. The case for learned spatial indexes. arXiv 2020, arXiv:2008.10349.
- Kraska, T.; Beutel, A.; Chi, E.H.; Dean, J.; Polyzotis, N. The case for learned index structures. In Proceedings of the 2018 International Conference on Management of Data, Houston, TX, USA, 10–15 June 2018; pp. 489–504.
- Al-Mamun, A.; Wu, H.; He, Q.; Wang, J.; Aref, W.G. A survey of learned indexes for the multi-dimensional space. ACM Comput. Surv. 2025, 58, 1–37.
- Nathan, V.; Ding, J.; Alizadeh, M.; Kraska, T. Learning multi-dimensional indexes. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, Portland, OR, USA, 14–19 June 2020; pp. 985–1000.
- Ding, J.; Nathan, V.; Alizadeh, M.; Kraska, T. Tsunami: A learned multi-dimensional index for correlated data and skewed workloads. arXiv 2020, arXiv:2006.13282.
- Wang, H.; Fu, X.; Xu, J.; Lu, H. Learned index for spatial queries. In Proceedings of the 2019 20th IEEE International Conference on Mobile Data Management (MDM), Hong Kong, China, 10–13 June 2019; pp. 569–574.
- Li, P.; Lu, H.; Zheng, Q.; Yang, L.; Pan, G. LISA: A learned index structure for spatial data. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, Portland, OR, USA, 14–19 June 2020; pp. 2119–2133.
- Qi, J.; Liu, G.; Jensen, C.S.; Kulik, L. Effectively learning spatial indices. Proc. VLDB Endow. 2020, 13, 2341–2354.
- Pai, S.; Mathioudakis, M.; Wang, Y. Wazi: A learned and workload-aware z-index. arXiv 2023, arXiv:2310.04268.
- Wang, C.; Yu, J.; Zhao, Z. GLIN: A (G)eneric (L)earned (In)dexing Mechanism for Complex Geometries. In Proceedings of the 11th ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data, Hamburg, Germany, 13–16 November 2023; pp. 1–12.
- Wu, J.; Zhang, Y.; Chen, S.; Wang, J.; Chen, Y.; Xing, C. Updatable learned index with precise positions. arXiv 2021, arXiv:2104.05520.
- Hidaka, F.; Matsui, Y. Flexflood: Efficiently updatable learned multi-dimensional index. arXiv 2024, arXiv:2411.09205.
- Tang, W.; Zhang, C.; Yang, J.; Wu, J.; Huang, H. Updatable Spatial Learned Index Based on Dimensionality Reduction. In Proceedings of the 2024 10th International Conference on Big Data and Information Analytics (BigDIA), Shenzhen, China, 23–25 August 2024; pp. 757–764.
- Beckmann, N.; Kriegel, H.-P.; Schneider, R.; Seeger, B. The R*-tree: An efficient and robust access method for points and rectangles. In Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data, Atlantic City, NJ, USA, 23–25 May 1990; pp. 322–331.
- Sellis, T.; Roussopoulos, N.; Faloutsos, C. The R+-Tree: A Dynamic Index for Multi-Dimensional Objects; Carnegie Mellon University: Pittsburgh, PA, USA, 1987.
- Ding, J.; Minhas, U.F.; Yu, J.; Wang, C.; Do, J.; Li, Y.; Zhang, H.; Chandramouli, B.; Gehrke, J.; Kossmann, D.; et al. ALEX: An updatable adaptive learned index. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, Portland, OR, USA, 14–19 June 2020; pp. 969–984.
- Ferragina, P.; Vinciguerra, G. The PGM-index: A fully-dynamic compressed learned index with provable worst-case bounds. Proc. VLDB Endow. 2020, 13, 1162–1175.
- Mishra, M.; Singhal, R. RUSLI: Real-time updatable spline learned index. In Proceedings of the Fourth International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, Virtual Event, China, 20 June 2021; pp. 1–8.
- Davitkova, A.; Milchevski, E.; Michel, S. The ML-Index: A Multidimensional, Learned Index for Point, Range, and Nearest-Neighbor Queries. In Proceedings of the 23rd International Conference on Extending Database Technology (EDBT), Copenhagen, Denmark, 30 March–2 April 2020; pp. 407–410.
- Patil, M.; Ravishankar, C.V. Model Reuse in Learned Spatial Indexes. In Proceedings of the 36th International Conference on Scientific and Statistical Database Management, Santa Cruz, CA, USA, 10–12 July 2024; pp. 1–12.
- Fan, B.; Andersen, D.G.; Kaminsky, M.; Mitzenmacher, M.D. Cuckoo filter: Practically better than bloom. In Proceedings of the 10th ACM International on Conference on Emerging Networking Experiments and Technologies, Sydney, Australia, 2–5 December 2014; pp. 75–88.
- Mao, Q.; Qader, M.A.; Hristidis, V. Comparison of LSM indexing techniques for storing spatial data. J. Big Data 2023, 10, 51.
- Arge, L. The buffer tree: A new technique for optimal I/O-algorithms. In Proceedings of the Workshop on Algorithms and Data Structures, Kingston, ON, Canada, 16–18 August 1995; pp. 334–345.






| Name | Type | Cardinality (M) | Size (GB) | Width (deg) | Height (deg) |
|---|---|---|---|---|---|
| AREAWATER | Polygon | 2.28 | 1.52 | 2.56862 | 1.463470 |
| LINEWATER | LineString | 5.8 | 4.56 | 1.52892 | 0.981663 |
| PARKS | Polygon | 9.96 | 5.76 | 155.842 | 82 |
| Method | Selectivity | Mean (ms) | 95% CI (ms) | Median (ms) | P95 (ms) | P99 (ms) |
|---|---|---|---|---|---|---|
| DyGLIN | 0.1% | 0.86 | [0.81, 0.92] | 0.78 | 1.12 | 1.45 |
| DyGLIN | 1.0% | 6.04 | [6.85, 7.45] | 6.52 | 8.95 | 11.20 |
| DyGLIN | 10.0% | 68.87 | [68.20, 74.80] | 66.80 | 92.50 | 118.40 |
| GLIN | 0.1% | 1.18 | [1.12, 1.25] | 1.05 | 1.85 | 2.62 |
| GLIN | 1.0% | 9.83 | [9.25, 10.41] | 9.12 | 14.50 | 19.80 |
| GLIN | 10.0% | 96.96 | [92.13, 104.17] | 90.77 | 144.02 | 210.95 |
| R-Tree | 0.1% | 2.41 | [2.25, 2.28] | 2.15 | 4.85 | 7.99 |
| R-Tree | 1.0% | 20.52 | [19.35, 20.80] | 18.61 | 42.32 | 75.44 |
| R-Tree | 10.0% | 201.38 | [188.58, 211.93] | 182.42 | 455.60 | 820.25 |
| Quad-Tree | 0.1% | 2.85 | [2.65, 3.08] | 2.52 | 5.60 | 9.43 |
| Quad-Tree | 1.0% | 22.02 | [20.95, 23.09] | 20.18 | 48.69 | 85.19 |
| Quad-Tree | 10.0% | 238.57 | [210.76, 230.48] | 211.50 | 495.75 | 887.58 |
| Workload (R/W Ratio) | Method | Total Throughput ( ops/s) | Avg Query Latency (ms) | P95 Query Latency (ms) |
|---|---|---|---|---|
| 95/5 (Read-Heavy) | DyGLIN | 1.34 | 7.26 | 9.14 |
| 95/5 (Read-Heavy) | GLIN | 1.04 | 10.13 | 25.31 |
| 95/5 (Read-Heavy) | R-Tree | 0.83 | 21.75 | 145.75 |
| 95/5 (Read-Heavy) | Quad-Tree | 0.89 | 23.95 | 183.70 |
| 50/50 (Balanced) | DyGLIN | 1.28 | 7.80 | 9.84 |
| 50/50 (Balanced) | GLIN | 0.93 | 13.65 | 38.88 |
| 50/50 (Balanced) | R-Tree | 0.53 | 55.00 | 295.30 |
| 50/50 (Balanced) | Quad-Tree | 0.57 | 62.70 | 365.20 |
| 5/95 (Write-Heavy) | DyGLIN | 1.21 | 8.34 | 10.60 |
| 5/95 (Write-Heavy) | GLIN | 0.81 | 18.17 | 48.44 |
| 5/95 (Write-Heavy) | R-Tree | 0.32 | 92.25 | 438.25 |
| 5/95 (Write-Heavy) | Quad-Tree | 0.34 | 105.45 | 580.70 |
| Dataset | R-Tree Total (MB) | Quad-Tree Total (MB) | GLIN Total (MB) | DyGLIN Overhead: +HMBR (MB, % vs. GLIN) | DyGLIN Overhead: +Cuckoo Filter (MB, % vs. GLIN) | DyGLIN Total (MB) |
|---|---|---|---|---|---|---|
| AREAWATER | 285.4 | 341.2 | 75.6 | 8.2 (10.8%) | 4.5 (6.0%) | 88.3 |
| LINEWATER | 460.8 | 682.5 | 112.4 | 14.8 (13.2%) | 7.4 (6.6%) | 134.6 |
| PARKS | 850.2 | 1420.1 | 230.5 | 28.9 (12.5%) | 15.0 (6.5%) | 274.3 |
| Method | Dataset | MBR Filtration | HMBR Filtration | Candidate Set Reduction Rate |
|---|---|---|---|---|
| GLIN | AREAWATER | 83,588 | N/A | 0 |
| GLIN | LINEWATER | 14,790 | N/A | 0 |
| GLIN | PARKS | 23,721 | N/A | 0 |
| DyGLIN | AREAWATER | 83,663 | 23,593 | 0.718 |
| DyGLIN | LINEWATER | 14,573 | 8,992 | 0.383 |
| DyGLIN | PARKS | 24,121 | 16,957 | 0.297 |
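
The Candidate Set Reduction Rate column is consistent with reading the two filtration columns as candidate counts and computing rate = 1 - (HMBR candidates / MBR candidates); this interpretation is inferred from the numbers rather than stated in the table. A quick check in Python, using the DyGLIN rows:

```python
# Candidate counts after each filter, copied from the DyGLIN rows of the table above.
# The formula in reduction_rate is inferred from the reported values (an assumption).
counts = {
    "AREAWATER": (83_663, 23_593),
    "LINEWATER": (14_573, 8_992),
    "PARKS": (24_121, 16_957),
}


def reduction_rate(mbr_candidates: int, hmbr_candidates: int) -> float:
    """Fraction of MBR-filter candidates that the HMBR filter prunes."""
    return 1 - hmbr_candidates / mbr_candidates


rates = {name: round(reduction_rate(*c), 3) for name, c in counts.items()}
print(rates)  # {'AREAWATER': 0.718, 'LINEWATER': 0.383, 'PARKS': 0.297}
```
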
| Method | Component Added | Deletion Semantics | Dataset | Average Query Time (ms) | Insertion Throughput ( ops/sec) | Delete Throughput ( ops/sec) |
|---|---|---|---|---|---|---|
| R-Tree | None | Physical | AREAWATER | 25.14 | 0.82 | 0.11 |
| R-Tree | None | Physical | LINEWATER | 52.36 | 0.75 | 0.09 |
| R-Tree | None | Physical | PARKS | 158.42 | 0.64 | 0.04 |
| Quad-Tree | None | Physical | AREAWATER | 28.56 | 0.88 | 0.15 |
| Quad-Tree | None | Physical | LINEWATER | 67.21 | 0.81 | 0.12 |
| Quad-Tree | None | Physical | PARKS | 172.15 | 0.70 | 0.06 |
| GLIN | None | Physical | AREAWATER | 9.83 | 1.01 | 0.72 |
| GLIN | None | Physical | LINEWATER | 19.19 | 0.92 | 0.65 |
| GLIN | None | Physical | PARKS | 64.23 | 0.84 | 0.41 |
| DyGLIN-NoBuffer | + HMBR | Logical | AREAWATER | 6.45 | 0.14 | 0.98 |
| DyGLIN-NoBuffer | + HMBR | Logical | LINEWATER | 15.12 | 0.28 | 0.79 |
| DyGLIN-NoBuffer | + HMBR | Logical | PARKS | 51.29 | 0.11 | 0.53 |
| DyGLIN | + Delta Buffer + Deletion Filter | Logical | AREAWATER | 6.04 | 1.34 | 1.04 |
| DyGLIN | + Delta Buffer + Deletion Filter | Logical | LINEWATER | 15.23 | 1.25 | 0.81 |
| DyGLIN | + Delta Buffer + Deletion Filter | Logical | PARKS | 51.34 | 1.02 | 0.53 |
| Parameter | Value and Justification |
|---|---|
| Fingerprint Size | 24 bits (provides theoretical FPR %) |
| Bucket Size | 8 slots per bucket (8-way associativity) |
| Number of Buckets | 8192 buckets (total capacity: 65,536 items) |
| Max Kickouts | 500 attempts (ensures insertion success rate) |
| Hash Function | 64-bit hash combining MBR coordinates and geometry type |
| Load Factor | Target (maintains low false positive rate) |
| Configuration | Recall | Precision | F1 Score | Measured FPR | Theoretical FPR |
|---|---|---|---|---|---|
| Baseline (16-bit FP, no safeguard) | 98.53% | 100.00% | 99.26% | 1.47% | 0.0015% |
| Recommended (24-bit FP, with safeguard) | 100.00% | 100.00% | 100.00% | 0.01% | 0.000006% |
| Number of Micro-MBRs | Avg Query Time (ms) | Memory Consumption (MB) |
|---|---|---|
| 4 | 8.24 | 78.5 |
| 16 | 6.04 | 88.3 |
| 64 | 5.85 | 115.2 |
| 256 | 5.76 | 194.8 |
| Buffer Capacity B | Write Throughput ( ops/s) | Avg Query Latency (ms) | P95 Query Latency (ms) |
|---|---|---|---|
| 16 | 0.85 | 5.82 | 7.41 |
| 64 | 1.22 | 5.91 | 7.68 |
| 256 | 1.34 | 6.04 | 8.14 |
| 1024 | 1.38 | 6.45 | 10.52 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Luo, H.; Wen, J.; Chen, D.; Li, Z.; Xi, M.; He, J.; Xiao, S.; Yang, J. Efficient Management of High-Frequency Sensor Data Streams Using a Read-Optimized Learned Index. Sensors 2026, 26, 1217. https://doi.org/10.3390/s26041217

