FPCache: A Fingerprint-Rectified Learned Index Cache for Disaggregated Memory
Abstract
1. Introduction
- Fingerprint-assisted Two-stage Read Approach: To mitigate read amplification caused by prediction errors in learned indexes, we propose a two-stage read approach. Instead of directly retrieving the full range of candidate records determined by the model error bound, the compute node first fetches a compact fingerprint array from remote memory corresponding to the predicted region. These fingerprints are then matched locally to identify candidate positions for the query key. The system subsequently retrieves the corresponding key-value records from remote memory using fine-grained point accesses. This design transforms range reads into point accesses, reducing unnecessary data transfers and improving system throughput.
- Fingerprint-Offset Compression Strategy: To address the high cache footprint on compute nodes, we design a fingerprint-offset compression strategy. In FPCache, compute nodes maintain the learned index, which consists of a set of piecewise linear segments, and cache the data entries belonging to its hot segments. Instead of storing original keys and full physical addresses, each cached entry stores a fixed-length fingerprint and a position offset, where the offset records the deviation between the model-predicted position and the actual data location. Compressing both keys and address pointers reduces the space required per cached entry, allowing compute nodes to hold more entries within limited memory.
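
The fingerprint-offset representation can be sketched as follows. This is a minimal illustration, not FPCache's actual entry layout: the 16-bit fingerprint, the signed 16-bit offset, the BLAKE2b hash, and the helper names `fingerprint`, `encode_entry`, and `decode_entry` are all assumptions made for the example.

```python
import hashlib
import struct
from typing import Tuple

# Assumed layout, for illustration only: 16-bit fingerprint + signed 16-bit offset.
ENTRY_FORMAT = "<Hh"
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 4 bytes


def fingerprint(key: bytes) -> int:
    """Hypothetical fingerprint: a 2-byte BLAKE2b digest of the key, as an integer."""
    digest = hashlib.blake2b(key, digest_size=2).digest()
    return int.from_bytes(digest, "little")


def encode_entry(key: bytes, predicted_pos: int, actual_pos: int) -> bytes:
    """Pack a cached entry as (fingerprint, actual position - predicted position)."""
    offset = actual_pos - predicted_pos
    # struct.pack raises struct.error if the offset does not fit in 16 signed bits.
    return struct.pack(ENTRY_FORMAT, fingerprint(key), offset)


def decode_entry(entry: bytes, predicted_pos: int) -> Tuple[int, int]:
    """Recover (fingerprint, actual position) from a packed entry."""
    fp, offset = struct.unpack(ENTRY_FORMAT, entry)
    return fp, predicted_pos + offset


key = b"k" * 128  # a 128 B key, the key size listed for the YCSB workloads
entry = encode_entry(key, predicted_pos=1000, actual_pos=997)
fp, pos = decode_entry(entry, predicted_pos=1000)
print(len(entry), pos, fp == fingerprint(key))  # prints: 4 997 True
```

Under these assumed widths an entry occupies 4 bytes, compared with 136 bytes for a 128-byte key stored alongside an 8-byte pointer (the pointer width is also an assumption). Because distinct keys can share a fingerprint, a fingerprint match alone does not prove that the key is present.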
2. Background and Motivation
2.1. Disaggregated Memory Architecture
2.2. Learned Indexes for Disaggregated Memory
2.3. Motivation
3. Design
3.1. Overview
3.2. Fingerprint-Assisted Two-Stage Read Approach
- Stage-1 (Fingerprint Prefetching and Local Matching): Upon receiving a query, the compute node uses its local learned index model to estimate the position of the target key and derives a retrieval range from the maximum prediction error bound. It then issues an RDMA read (①) to fetch the fingerprint array covering this range into local DRAM, forming a temporary fingerprint cache for subsequent verification. Meanwhile, the system computes the fingerprint of the query key with the same hash function used to build the fingerprint array and compares it element-wise against the retrieved fingerprints. This produces a set of candidate offsets relative to the predicted position; because distinct keys can hash to the same fingerprint, the set may contain multiple positions (②). At this stage, only compact fingerprint metadata is transferred from remote memory, while KV payloads remain untouched, which reduces network data transfer.
- Stage-2 (Precise Data Retrieval or Query Termination): If the candidate set is empty, the system concludes that the target key does not exist and terminates the query. Otherwise, for each candidate offset, the system computes the physical address of the record from the segment base address, the candidate position, and the fixed entry size. It then issues batched RDMA reads to retrieve the corresponding KV records from remote memory and verifies the full key of each retrieved record at the compute node. If a matching record is found, its value is returned and the query completes. If no retrieved record matches, the candidates are identified as false positives caused by fingerprint collisions, and a miss is returned (③).
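
The two stages above can be sketched with an in-process stand-in for remote memory. This is an illustrative simulation under stated assumptions, not the FPCache implementation: `RemoteSegment`, `two_stage_get`, the 8-bit BLAKE2b fingerprint, and the read counter are hypothetical, no real RDMA is issued, and records are addressed by list index rather than by a computed physical address.

```python
import hashlib
from typing import List, Optional, Tuple


def fingerprint(key: bytes) -> int:
    """Hypothetical 8-bit fingerprint taken from a BLAKE2b digest."""
    return hashlib.blake2b(key, digest_size=1).digest()[0]


class RemoteSegment:
    """In-process stand-in for one sorted segment in remote memory (no real RDMA)."""

    def __init__(self, records: List[Tuple[bytes, bytes]]) -> None:
        self.records = sorted(records)
        self.fps = bytes(fingerprint(k) for k, _ in self.records)
        self.reads = 0  # counts simulated remote read requests

    def read_fingerprints(self, lo: int, hi: int) -> bytes:
        self.reads += 1
        return self.fps[lo:hi]

    def read_records(self, positions: List[int]) -> List[Tuple[bytes, bytes]]:
        self.reads += 1  # the whole batch is counted as one request
        return [self.records[p] for p in positions]


def two_stage_get(seg: RemoteSegment, key: bytes,
                  predicted_pos: int, err_bound: int) -> Optional[bytes]:
    # Stage 1: fetch only the fingerprints inside the error window, match locally.
    lo = max(0, predicted_pos - err_bound)
    hi = min(len(seg.records), predicted_pos + err_bound + 1)
    fps = seg.read_fingerprints(lo, hi)
    target = fingerprint(key)
    offsets = [lo + i - predicted_pos for i, fp in enumerate(fps) if fp == target]
    if not offsets:
        return None  # no fingerprint matched: the key is absent from the window

    # Stage 2: fetch the candidate records and verify the full key.
    positions = [predicted_pos + off for off in offsets]
    for k, v in seg.read_records(positions):
        if k == key:
            return v
    return None  # every candidate was a fingerprint collision


records = [(f"key{i:04d}".encode(), f"val{i}".encode()) for i in range(100)]
seg = RemoteSegment(records)
print(two_stage_get(seg, b"key0042", predicted_pos=40, err_bound=8))  # b'val42'
```

In this sketch a successful lookup costs two simulated requests, one for the fingerprints and one for the batched candidates, while a lookup with no fingerprint match costs only the first.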
3.3. Fingerprint-Offset Compression Strategy
4. Evaluation
4.1. Experimental Setup
4.2. Overall Performance
4.2.1. Performance on YCSB Benchmark
4.2.2. Performance on Real-World Datasets
4.2.3. Performance Under Dynamic Hotspot Changes
4.2.4. Network Traffic and RDMA IO Analysis
4.3. Ablation Study
4.3.1. Performance Evaluation of the Two-Stage Read Approach
4.3.2. Performance Evaluation of the Dual Compression Strategy
4.4. Parameter Sensitivity
4.4.1. Sensitivity to Key-Value Length
4.4.2. Sensitivity to Fingerprint Design
4.4.3. Sensitivity to Prediction Error Bounds
4.4.4. Sensitivity to Cache Capacity
5. Related Work
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest

| Error Bound | 8 | 32 | 64 | 128 | 256 |
|---|---|---|---|---|---|
| Data retrieved per query (KB) | 0.25 | 1 | 2 | 4 | 8 |
| Throughput (Kops/s) | 1112 | 928.3 | 729.4 | 447.8 | 239.4 |
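
The rows of this table can be cross-checked with a few lines of arithmetic. Every "data retrieved" value equals the error bound multiplied by 32 bytes; that constant is inferred from the numbers and is not stated in the text. One reading that would produce it, offered only as an assumption, is a window of twice the error bound over 16-byte records (8 B key plus 8 B value).

```python
# Values copied from the table above.
error_bounds = [8, 32, 64, 128, 256]
data_kb = [0.25, 1, 2, 4, 8]
throughput_kops = [1112, 928.3, 729.4, 447.8, 239.4]

# Inferred from the table, not stated in the text: 32 bytes per unit of error bound.
BYTES_PER_BOUND_UNIT = 32


def range_read_kb(error_bound: int) -> float:
    """Data fetched by one range read, in KB, under the inferred constant."""
    return error_bound * BYTES_PER_BOUND_UNIT / 1024


for eps, kb, tput in zip(error_bounds, data_kb, throughput_kops):
    assert range_read_kb(eps) == kb  # matches every column of the table
    print(f"bound={eps}  data={range_read_kb(eps)} KB  throughput={tput} Kops/s")

data_growth = data_kb[-1] / data_kb[0]
throughput_drop = throughput_kops[0] / throughput_kops[-1]
print(f"data grows {data_growth:.0f}x, throughput drops {throughput_drop:.1f}x")
```

Across the table, the data fetched per query grows 32-fold while throughput falls from 1112 to 239.4 Kops/s, a drop of roughly 4.6 times.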

| Key Size | 8 B | 16 B | 32 B | 64 B | 128 B |
|---|---|---|---|---|---|
| Cached Entries | 6.55 M | 4.36 M | 2.62 M | 1.46 M | 0.77 M |
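
The entry counts in this table are consistent with a simple capacity model, shown below as a hedged check. The 100 MiB cache budget and the 8-byte pointer per entry are assumptions inferred from the numbers; neither is stated in the text.

```python
# Values copied from the table above (entries in millions).
key_sizes = [8, 16, 32, 64, 128]
reported_millions = [6.55, 4.36, 2.62, 1.46, 0.77]

# Assumptions inferred from the table, not stated in the text.
CACHE_BYTES = 100 * 1024 * 1024  # hypothetical 100 MiB cache budget
POINTER_BYTES = 8                # hypothetical full address stored per entry


def entries_millions(key_size: int) -> float:
    """Entries that fit when each entry stores the full key plus a pointer."""
    return CACHE_BYTES / (key_size + POINTER_BYTES) / 1e6


for size, reported in zip(key_sizes, reported_millions):
    print(f"key={size} B  reported={reported} M  model={entries_millions(size):.3f} M")
```

Under these assumptions the model agrees with each reported value to within about 0.01 million entries, which suggests the table describes a cache that stores uncompressed keys and addresses.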

| Workload Type | Distribution | Key Size | Value Size |
|---|---|---|---|
| YCSB-B (95% Read, 5% Update) | Zipfian | 128 B | 128 B |
| YCSB-C (100% Read) | Zipfian/Uniform | 128 B * | 128 B * |
| YCSB-D (95% Read, 5% Insert) | Zipfian | 128 B | 128 B |
| YCSB-E (95% Scan, 5% Update) | Zipfian | 128 B | 128 B |
| Facebook user_id | Zipfian | 8 B | 8 B |
| Wiki timestamps | Zipfian | 8 B | 8 B |

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.

Citation: Jia, C.; Cai, M. FPCache: A Fingerprint-Rectified Learned Index Cache for Disaggregated Memory. Electronics 2026, 15, 2210. https://doi.org/10.3390/electronics15102210

