TB-Collect: Efficient Garbage Collection for Non-Volatile Memory Online Transaction Processing Engines
Abstract
1. Introduction
- We analyze and experimentally evaluate existing GC approaches in DRAM and NVM OLTP engines, find that they are not well suited to NVM OLTP engines, and summarize design directions for GC in NVM OLTP engines.
- We propose TB-Collect, a garbage collection approach for NVM OLTP engines that requires no version chain traversal during reclamation, handles the version chain merging caused by long transactions at minimal cost, and significantly reduces writes to NVM.
- We implement TB-Collect, including the GC algorithm and NVM storage, on DBx1000 [26] and MySQL [27,28], and evaluate it on real NVM hardware and under five simulated NVM configurations. Experimental results show that TB-Collect improves transactional throughput by up to 1.58× compared to existing methods.
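The block-level reclamation idea in the second contribution can be illustrated with a minimal sketch. It is not the authors' algorithm: the `Block` class, its fields, and the visibility rule (a version is invisible to every transaction whose start timestamp is at or after the version's end timestamp) are assumptions made only for illustration. The point it shows is that a reclamation decision can be made from per-block metadata alone, without walking any version chain.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """Hypothetical append-only block; names and fields are illustrative only."""
    block_id: int
    live_versions: int = 0              # versions not yet superseded
    max_end_ts: float = float("-inf")   # largest end timestamp of superseded versions

    def append_version(self) -> None:
        # A newly appended version is live until a later update supersedes it.
        self.live_versions += 1

    def invalidate_version(self, end_ts: int) -> None:
        # Called when a newer version commits at end_ts and supersedes one stored here.
        self.live_versions -= 1
        self.max_end_ts = max(self.max_end_ts, end_ts)


def reclaimable_blocks(blocks: List[Block], min_active_start_ts: int) -> List[int]:
    """Return ids of blocks whose versions are all invisible to every active transaction.

    Assumed rule: a superseded version with end timestamp e is invisible to any
    transaction whose start timestamp is >= e, so a block with no live versions
    is reclaimable once max_end_ts <= the oldest active start timestamp.
    """
    return [
        b.block_id
        for b in blocks
        if b.live_versions == 0 and b.max_end_ts <= min_active_start_ts
    ]
```

Under these assumptions, a single long-running transaction lowers `min_active_start_ts` and therefore delays reclamation of every block that holds a version it might still read, which is the situation the long-transaction handling in the design is meant to address.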
2. Background and Motivation
2.1. MVCC in NVM OLTP Engine
2.2. Version Storage
2.3. Garbage Collection Approaches
2.3.1. Timely Version Chain Pruning
2.3.2. Background Scanning
2.3.3. Partition Clearing
2.4. Challenge
3. Related Work
3.1. Garbage Collection in Drive-Based Databases
3.2. OLTP Engines for NVM
4. TB-Collect Design
4.1. Tile-Block Append-Only Storage
4.2. Internal Structure of GC
4.3. Garbage Collection Processing
4.4. Long Transaction and Chain Consolidation
Algorithm 1: Collect obsolete block function.
Algorithm 2: Clean obsolete block function.
4.5. GC After Recovery
Algorithm 3: GC after recovery function.
5. Evaluation
5.1. Experimental Setup
5.2. TPCC Experiments
5.3. YCSB Experiments
5.4. Chain Consolidation Performance
5.5. Garbage Collection Performance
5.6. Garbage Collection in MySQL with Different NVM Configurations
6. Discussion
6.1. Advantages of TB-Collect
6.2. Limitations and Future Work
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Intel Optane DC Persistent Memory Architecture and Technology. Available online: https://www.intel.com/content/www/us/en/architecture-and-technology/optane-dc-persistent-memory.html (accessed on 15 March 2023).
- Diaconu, C.; Freedman, C.; Ismert, E.; Larson, P.; Mittal, P.; Stonecipher, R.; Verma, N.; Zwilling, M. Hekaton: SQL server’s memory-optimized OLTP engine. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, 22–27 June 2013; pp. 1243–1254.
- Stonebraker, M.; Weisberg, A. The VoltDB Main Memory DBMS. IEEE Data Eng. Bull. 2013, 36, 21–27.
- Faerber, F.; Kemper, A.; Larson, P.-Å.; Levandoski, J.; Neumann, T.; Pavlo, A. Main memory database systems. Found. Trends Databases 2017, 8, 1–130.
- Larson, P.A.; Levandoski, J. Modern main-memory database systems. Proc. VLDB Endow. 2016, 9, 1609–1610.
- Huang, J.; Schwan, K.; Qureshi, M.K. NVRAM-aware logging in transaction systems. Proc. VLDB Endow. 2014, 8, 389–400.
- Kimura, H. FOEDUS: OLTP engine for a thousand cores and NVRAM. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Australia, 31 May–4 June 2015; pp. 691–706.
- Arulraj, J.; Perron, M.; Pavlo, A. Write-behind logging. Proc. VLDB Endow. 2016, 10, 337–348.
- Liu, G.; Chen, L.; Chen, S. Zen: A high-throughput log-free OLTP engine for non-volatile main memory. Proc. VLDB Endow. 2021, 14, 835–848.
- Liu, G.; Chen, L.; Chen, S. Zen+: A Robust NUMA-Aware OLTP Engine Optimized for Non-Volatile Main Memory. VLDB J. 2023, 32, 123–148.
- Schwalb, D.; Kumar, G.B.K.; Dreseler, M.; Anusha, S.; Faust, M.; Hohl, A.; Berning, T.; Makkar, G.; Plattner, H.; Deshmukh, P. Hyrise-NV: Instant recovery for in-memory databases using non-volatile memory. In Proceedings of the Database Systems for Advanced Applications: 21st International Conference, DASFAA 2016, Dallas, TX, USA, 16–19 April 2016; pp. 267–282.
- Ji, Z.; Chen, K.; Wang, L.; Zhang, M.; Wu, Y. Falcon: Fast OLTP Engine for Persistent Cache and Non-Volatile Memory. In Proceedings of the 29th Symposium on Operating Systems Principles, Koblenz, Germany, 23–26 October 2023; pp. 531–544.
- Ma, S.; Chen, K.; Chen, S.; Liu, M.; Zhu, J.; Kang, H.; Wu, Y. ROART: Range-query optimized persistent ART. In Proceedings of the 19th USENIX Conference on File and Storage Technologies (FAST 21), Online, 23–25 February 2021; pp. 1–16.
- Zhang, B.; Zheng, S.; Qi, Z.; Huang, L. NBTree: A Lock-free PM-friendly Persistent B+-Tree for eADR-enabled PM Systems. Proc. VLDB Endow. 2022, 15, 1187–1200.
- Chen, S.; Jin, Q. Persistent b+-trees in non-volatile main memory. Proc. VLDB Endow. 2015, 8, 786–797.
- Wang, T.; Johnson, R. Scalable logging through emerging non-volatile memory. Proc. VLDB Endow. 2014, 7, 865–876.
- Shin, S.; Tirukkovalluri, S.K.; Tuck, J.; Solihin, Y. Proteus: A flexible and fast software supported hardware logging approach for nvm. In Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, Cambridge, MA, USA, 14–18 October 2017; pp. 178–190.
- Zhang, M.; Hua, Y. Silo: Speculative Hardware Logging for Atomic Durability in Persistent Memory. In Proceedings of the 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), Montreal, QC, Canada, 25 February–1 March 2023; pp. 651–663.
- Böttcher, J.; Leis, V.; Neumann, T.; Kemper, A. Scalable garbage collection for in-memory MVCC systems. Proc. VLDB Endow. 2019, 13, 128–141.
- Peloton. Available online: https://pelotondb.io/ (accessed on 20 March 2025).
- Raza, A.; Chrysogelos, P.; Anadiotis, A.C.; Ailamaki, A. One-shot garbage collection for in-memory OLTP through temporality-aware version storage. Proc. ACM Manag. Data 2023, 1, 1–25.
- Condit, J.; Nightingale, E.B.; Frost, C.; Ipek, E.; Lee, B.; Burger, D.; Coetzee, D. Better I/O Through Byte-Addressable, Persistent Memory. In Proceedings of the ACM SIGOPS 22nd symposium on Operating Systems Principles, Big Sky, MT, USA, 11–14 October 2009.
- Jacob, B.; Wang, D.; Ng, S. Memory Systems: Cache, DRAM, Disk; Morgan Kaufmann: Burlington, MA, USA, 2010.
- Akram, S. Exploiting Intel Optane Persistent Memory for Full Text Search. In Proceedings of the 2021 ACM SIGPLAN International Symposium on Memory Management, Online, 22 June 2021; pp. 80–93.
- Le Gallo, M.; Sebastian, A. An overview of phase-change memory device physics. J. Phys. D Appl. Phys. 2020, 53, 213002.
- Yu, X.; Bezerra, G.; Pavlo, A.; Devadas, S.; Stonebraker, M. Staring into the abyss: An evaluation of concurrency control with one thousand cores. Proc. VLDB Endow. 2014, 8, 209–220.
- Botros, S.; Tinley, J. High Performance MySQL; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2021.
- MySQL 8.4 Reference Manual. Available online: https://dev.mysql.com/doc/refman/8.4/en/ (accessed on 15 October 2024).
- Neumann, T.; Mühlbauer, T.; Kemper, A. Fast serializable multi-version concurrency control for main-memory database systems. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Australia, 31 May–4 June 2015; pp. 677–689.
- Freitag, M.; Kemper, A.; Neumann, T. Memory-Optimized Multi-Version Concurrency Control for Disk-Based Database Systems. Proc. VLDB Endow. 2022, 15, 2797–2810.
- Izraelevitz, J.; Kelly, T.; Kolli, A. Failure-atomic persistent memory updates via JUSTDO logging. ACM SIGARCH Comput. Archit. News 2016, 44, 427–442.
- Agrawal, R.; Jagadish, H.V. Recovery algorithms for database machines with non-volatile main memory. In International Workshop on Database Machines; Springer: Berlin/Heidelberg, Germany, 1989; pp. 269–285.
- Oukid, I.; Booss, D.; Lehner, W.; Bumbulis, P.; Willhalm, T. SOFORT: A hybrid SCM-DRAM storage engine for fast data recovery. In Proceedings of the 10th International Workshop on Data Management on New Hardware, Seattle, WA, USA, 18–23 June 2014; pp. 1–7.
- An, M.; Park, J.; Wang, T.; Nam, B.; Lee, S.W. NV-SQL: Boosting OLTP Performance with Non-Volatile DIMMs. Proc. VLDB Endow. 2023, 16, 1453–1465.
- Funke, F.; Kemper, A.; Mühlbauer, T.; Neumann, T.; Leis, V. Hyper beyond software: Exploiting modern hardware for main-memory database systems. Datenbank-Spektrum 2014, 14, 173–181.
- Kemper, A.; Neumann, T. HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots. In Proceedings of the 2011 IEEE 27th International Conference on Data Engineering, Hannover, Germany, 11–16 April 2011; pp. 195–206.
- Lyu, Z.; Zhang, H.H.; Xiong, G.; Guo, G.; Wang, H.; Chen, J.; Praveen, A.; Yang, Y.; Gao, X.; Wang, A.; et al. Greenplum: A hybrid database for transactional and analytical workloads. In Proceedings of the 2021 International Conference on Management of Data, Shaanxi, China, 20–25 June 2021; pp. 2530–2542.
- Lee, J.; Shin, H.; Park, C.G.; Ko, S.; Noh, J.; Chuh, Y.; Stephan, W.; Han, W.S. Hybrid garbage collection for multi-version concurrency control in SAP HANA. In Proceedings of the 2016 International Conference on Management of Data, San Francisco, CA, USA, 26 June–1 July 2016; pp. 1307–1318.
- Ahn, M.; Willhalm, T.; May, N.; Lee, D.; Desai, S.M.; Booss, D.; Kim, J.; Singh, N.; Ritter, D.; Rebholz, O. An Examination of CXL Memory Use Cases for In-Memory Database Management Systems using SAP HANA. Proc. VLDB Endow. 2024, 17, 3827–3840.
- Julakanti, S.R.; Sattiraju, N.S.K.; Julakanti, R. Transforming Data in SAP HANA: From Raw Data to Actionable Insights. NeuroQuantology 2022, 19, 854–861.
- Intel Corporation. eADR: New Opportunities for Persistent Memory Applications. 2021. Available online: https://www.intel.com/content/www/us/en/developer/articles/technical/eadr-new-opportunities-for-persistentmemory-applications.html (accessed on 20 December 2023).
- Cai, S.; Chen, K.; Liu, M.; Liu, X.; Wu, Y.; Zheng, W. Garbage collection and data recovery for N2DB. Tsinghua Sci. Technol. 2021, 27, 630–641.
- Färber, F.; Cha, S.K.; Primsch, J.; Bornhövd, C.; Sigg, S.; Lehner, W. SAP HANA database: Data management for modern business applications. ACM Sigmod Rec. 2012, 40, 45–51.
- Wu, S.; Lin, Y.; Mao, B.; Jiang, H. GCaR: Garbage collection aware cache management with improved performance for flash-based SSDs. In Proceedings of the 2016 International Conference on Supercomputing, Istanbul, Turkey, 1–3 June 2016; pp. 1–12.
- Kim, Y.; Oral, S.; Shipman, G.M.; Lee, J.; Dillow, D.A.; Wang, F. Harmonia: A globally coordinated garbage collector for arrays of solid-state drives. In Proceedings of the 2011 IEEE 27th Symposium on Mass Storage Systems and Technologies (MSST), Denver, CO, USA, 23–27 May 2011; pp. 1–12.
- Lee, J.; Kim, Y.; Shipman, G.M.; Oral, S.; Wang, F.; Kim, J. A semi-preemptive garbage collector for solid state drives. In Proceedings of the (IEEE ISPASS) IEEE International Symposium on Performance Analysis of Systems and Software, Austin, TX, USA, 10–12 April 2011; pp. 12–21.
- Hu, J.; Jiang, H.; Tian, L.; Xu, L. GC-ARM: Garbage collection-aware RAM management for flash based solid state drives. In Proceedings of the 2012 IEEE Seventh International Conference on Networking, Architecture, and Storage, Xiamen, China, 28–30 June 2012; pp. 134–143.
- Jung, M.; Prabhakar, R.; Kandemir, M.T. Taking garbage collection overheads off the critical path in SSDs. In Proceedings of the Middleware 2012: ACM/IFIP/USENIX 13th International Middleware Conference, Montreal, QC, Canada, 3–7 December 2012; pp. 164–186.
- Yan, S.; Li, H.; Hao, M.; Tong, M.H.; Sundararaman, S.; Chien, A.A.; Gunawi, H.S. Tiny-tail flash: Near-perfect elimination of garbage collection tail latencies in NAND SSDs. ACM Trans. Storage (TOS) 2017, 13, 22.
- Wang, K.; Tan, H.; He, Z.; Li, J.; Li, K. CDA-GC: An effective cache data allocation for garbage collection in flash-based solid-state drives. Integration 2025, 102, 102359.
- Hedayati, S.; Maleki, N.; Olsson, T.; Ahlgren, F.; Seyednezhad, M.; Berahmand, K. MapReduce scheduling algorithms in Hadoop: A systematic study. J. Cloud Comput. 2023, 12, 143.
- Bernstein, P.A.; Hadzilacos, V.; Goodman, N. Concurrency Control and Recovery in Database Systems; Addison-Wesley: Reading, MA, USA, 1987; Volume 370.
- Lomet, D.; Fekete, A.; Wang, R.; Ward, P. Multi-version Concurrency via Timestamp Range Conflict Management. In Proceedings of the 2012 IEEE 28th International Conference on Data Engineering (ICDE), Arlington, VA, USA, 1–5 April 2012; pp. 714–725.
- Guo, Z.; Wu, K.; Yan, C.; Yu, X. Releasing locks as early as you can: Reducing contention of hotspots by violating two-phase locking. In Proceedings of the 2021 International Conference on Management of Data, Shaanxi, China, 20–25 June 2021; pp. 658–670.
- Cooper, B.F.; Silberstein, A.; Tam, E.; Ramakrishnan, R.; Sears, R. Benchmarking cloud serving systems with YCSB. In Proceedings of the 1st ACM Symposium on Cloud Computing, Indianapolis, IN, USA, 10–11 June 2010; pp. 143–154.
- TPC Benchmark C. Available online: http://www.tpc.org/tpcc/ (accessed on 10 May 2023).
- Yang, Y.; Zhu, J. Write skew and zipf distribution: Evidence and implications. ACM Trans. Storage (TOS) 2016, 12, 1–19.
- MySQL. MySQL Internals: Writing a Custom Storage Engine; Technical report; Oracle Corporation: Austin, TX, USA, 2024.
- The Binary Log. Available online: https://dev.mysql.com/doc/refman/8.4/en/binary-log.html (accessed on 15 October 2024).
- External Locking. Available online: https://dev.mysql.com/doc/refman/8.4/en/external-locking.html (accessed on 15 October 2024).
- Volos, H.; Magalhaes, G.; Cherkasova, L.; Li, J. Quartz: A lightweight performance emulator for persistent memory software. In Proceedings of the 16th Annual Middleware Conference, Vancouver, BC, Canada, 7–11 December 2015; pp. 37–49.
- BenchmarkSQL. Available online: https://github.com/pingcap/benchmarksql (accessed on 8 July 2023).
- Haas, P.J.; Ilyas, I.F.; Lohman, G.M.; Markl, V. Discovering and exploiting statistical properties for query optimization in relational databases: A survey. Stat. Anal. Data Mining ASA Data Sci. J. 2009, 1, 223–250.
- Valavala, M.; Alhamdani, W. Automatic database index tuning using machine learning. In Proceedings of the 2021 6th International Conference on Inventive Computation Technologies (ICICT), Coimbatore, India, 20–22 January 2021; pp. 523–530.
- Cook, J.E.; Klauser, A.W.; Wolf, A.L.; Zorn, B.G. Semi-automatic, self-adaptive control of garbage collection rates in object databases. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Montreal, QC, Canada, 4–6 June 1996; pp. 377–388.

GC Issue | Description | NVM Disadvantage | Affected GC Approaches | TB-Collect Solution |
---|---|---|---|---|
Low reclamation rate from chain traversal | Requires scanning all historical versions, which is inefficient | Lower read bandwidth than DRAM slows version chain traversal | Timely Version Chain Pruning, Background Scanning | Block-level concurrent reclamation without traversing version chains |
Long version chains degrade access efficiency | Coarse-grained full-partition clearing leads to long chains | Lower read bandwidth than DRAM slows version access | Partition Clearing | Finer-grained block reclamation helps limit chain length |
Overhead of version chain consolidation | Hash index construction triggers many small writes | Write amplification and lower write bandwidth than DRAM | Partition Clearing | Block-level management and deferred merging to reduce small writes |

OLTP Engine | NVM Storage Layout | GC Approach |
---|---|---|
NoGC | Append-only | None |
Steam | Append-only | Timely Version Chain Pruning |
OneShot | Delta | Partition Clearing |
Zen | Append-only | Background Scanning |
N2DB | Append-only | Timely Version Chain Pruning |
TB-Collect | Append-only | Block Clearing |

Config | Read Bandwidth (% of DRAM) | Write Bandwidth (% of DRAM) | Cache Line Size |
---|---|---|---|
A | 70% | 50% | 256 B |
B | 70% | 50% | 128 B |
C | 90% | 70% | 128 B |
D | 90% | 90% | 128 B |
E | 100% | 100% | 64 B |
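
To make the link between these configurations and the write amplification noted in the GC issue table concrete, here is a rough worked sketch. It assumes (this is an interpretation, not something the table states) that the cache-line column is the granularity at which the device is written, and that writes are aligned to that granularity. The dictionary layout and function names are hypothetical.

```python
import math

# Values transcribed from the configuration table. Treating the last column as
# the device write granularity is an assumption made for this sketch.
NVM_CONFIGS = {
    "A": {"read_bw": 0.70, "write_bw": 0.50, "line_bytes": 256},
    "B": {"read_bw": 0.70, "write_bw": 0.50, "line_bytes": 128},
    "C": {"read_bw": 0.90, "write_bw": 0.70, "line_bytes": 128},
    "D": {"read_bw": 0.90, "write_bw": 0.90, "line_bytes": 128},
    "E": {"read_bw": 1.00, "write_bw": 1.00, "line_bytes": 64},
}


def media_bytes_written(payload_bytes: int, line_bytes: int) -> int:
    """Bytes the device writes for an aligned payload, rounded up to whole lines."""
    return math.ceil(payload_bytes / line_bytes) * line_bytes


def write_amplification(payload_bytes: int, line_bytes: int) -> float:
    """Ratio of bytes written on the device to bytes the software asked to write."""
    return media_bytes_written(payload_bytes, line_bytes) / payload_bytes


if __name__ == "__main__":
    # Amplification of a 64-byte update under each configuration.
    for name, cfg in NVM_CONFIGS.items():
        print(name, write_amplification(64, cfg["line_bytes"]))
```

Under these assumptions, a 64-byte update costs four times its size in configuration A and nothing extra in configuration E, which is why batching small writes into block-sized units matters more as the line size grows.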
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wei, J.; Zhang, Q.; Xiang, Y.; Gong, X. TB-Collect: Efficient Garbage Collection for Non-Volatile Memory Online Transaction Processing Engines. Electronics 2025, 14, 2080. https://doi.org/10.3390/electronics14102080