Exploiting Data Duplication to Reduce Data Migration in Garbage Collection Inside SSD
Abstract
1. Introduction
- We conduct a preliminary experiment that quantitatively analyzes how each operation within GC contributes to long-tail latency; it shows that page reads and page writes during GC (i.e., data migration) are the predominant contributors.
- We propose SplitGC, a novel GC scheme that exploits data duplication characteristics. It mainly consists of a read scrub-assisted fingerprint generation scheme and a latency-bounded GC scheme.
- We conduct a series of experiments to validate the effectiveness of SplitGC. Compared with state-of-the-art schemes, SplitGC reduces GC-induced tail latency at the 99.99th percentile by 8% to 83% and decreases the amount of valid page migration by 38% to 67%.
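The contributions above rest on one idea: a valid page in a GC victim block need not be copied if identical content already lives elsewhere on the device, because updating the mapping is enough. The following Python sketch illustrates that general idea on a toy page-mapped FTL. It is not SplitGC's implementation: the class `ToyFTL`, its methods, the SHA-256 fingerprint, and the `scrub` hook (standing in for fingerprints collected while pages are read for another purpose, such as read scrub) are all hypothetical, and the latency-bounded part of the scheme is not modeled.

```python
import hashlib
from dataclasses import dataclass, field


def fingerprint(data: bytes) -> str:
    # Hypothetical choice: SHA-256 over the page contents.
    return hashlib.sha256(data).hexdigest()


@dataclass
class ToyFTL:
    """Hypothetical page-mapped FTL used only to illustrate duplicate-aware GC."""
    pages_per_block: int = 4
    flash: dict = field(default_factory=dict)   # ppn -> page contents
    l2p: dict = field(default_factory=dict)     # lpn -> ppn (several LPNs may share one PPN)
    fp_of: dict = field(default_factory=dict)   # ppn -> cached fingerprint
    next_ppn: int = 0

    def _program(self, data: bytes) -> int:
        # Append-only allocation; erased blocks are never reused in this toy model.
        ppn = self.next_ppn
        self.flash[ppn] = data
        self.next_ppn += 1
        return ppn

    def write(self, lpn: int, data: bytes) -> None:
        # Out-of-place update: the previously mapped page simply becomes invalid.
        self.l2p[lpn] = self._program(data)

    def scrub(self, ppns) -> None:
        # Hypothetical hook: cache fingerprints for pages that a background scan
        # (e.g. read scrub) has already read, so GC does not have to hash them.
        for ppn in ppns:
            self.fp_of[ppn] = fingerprint(self.flash[ppn])

    def collect(self, block: int) -> tuple[int, int]:
        """Reclaim one block; return (pages copied, pages remapped instead of copied)."""
        first = block * self.pages_per_block
        victim = range(first, first + self.pages_per_block)
        owners = {}  # reverse map: ppn -> LPNs that point to it
        for lpn, ppn in self.l2p.items():
            owners.setdefault(ppn, []).append(lpn)
        # Only live pages outside the victim whose fingerprint is already cached
        # can serve as remap targets.
        index = {fp: ppn for ppn, fp in self.fp_of.items()
                 if ppn in owners and ppn not in victim}
        copied = remapped = 0
        for ppn in victim:
            lpns = owners.get(ppn, [])
            if not lpns:
                continue  # invalid page: nothing to migrate
            # GC reads the victim page anyway, so its fingerprint can be computed here.
            fp = self.fp_of.get(ppn) or fingerprint(self.flash[ppn])
            target = index.get(fp)
            if target is None:
                target = self._program(self.flash[ppn])  # conventional read + write
                self.fp_of[target] = fp
                index[fp] = target
                copied += 1
            else:
                remapped += 1  # duplicate found: mapping update only, no page write
            for lpn in lpns:
                self.l2p[lpn] = target
        for ppn in victim:  # erase the victim block
            self.flash.pop(ppn, None)
            self.fp_of.pop(ppn, None)
        return copied, remapped
```

For example, if block 0 holds live pages `A`, `C`, and `D` plus one invalidated page, and a later, already scrubbed block holds a live copy of `A`, then `collect(0)` copies only `C` and `D` and remaps the logical page holding `A`, returning `(2, 1)`. Without the `scrub` call the same GC copies all three valid pages, which is why having fingerprints available before GC starts matters in this sketch.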
2. Background and Research Motivation
2.1. SSD Architecture
2.2. Garbage Collection
2.3. Motivation
3. Design
3.1. Design Overview
3.2. Read Scrub-Assisted Fingerprint Generation Scheme
3.3. Latency-Bounded GC Scheme
3.4. Overhead Analysis
4. Experiment
Experimental Results Analysis
5. Related Work
5.1. Optimizing the GC Algorithm
5.2. Scheduling GC
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
| Trace | Write Ratio | Duplication Rate | Average Request Size (KB) |
|---|---|---|---|
| home1 | 98.6% | 45.3% | 8.25 |
| home2 | 86.9% | 33.6% | 10.93 |
| home3 | 99.0% | 36.0% | 8.26 |
| webmail | 69.6% | 70.1% | 8.00 |
| webresearch | 99.9% | 75.2% | 8.00 |
| webusers | 89.3% | 63.5% | 8.76 |
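The three workload columns in the table can be derived from a block-level trace roughly as follows. The definitions here are assumptions, not the paper's stated method: write ratio is taken as the share of requests that are writes, duplication rate as the share of written pages whose content hash already appeared in an earlier write, and average request size as the mean over all requests. The `Request` record and `trace_stats` function are hypothetical and represent an already-parsed trace entry, not the on-disk format of the traces themselves.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """Hypothetical pre-parsed trace record."""
    op: str                                          # "R" for read, "W" for write
    size_kb: float                                   # request size in KB
    page_hashes: list = field(default_factory=list)  # content hash per written page


def trace_stats(reqs: list) -> dict:
    # Assumed definitions; the paper's exact formulas may differ.
    writes = [r for r in reqs if r.op == "W"]
    seen = set()
    written_pages = duplicate_pages = 0
    for r in writes:
        for h in r.page_hashes:
            written_pages += 1
            if h in seen:
                duplicate_pages += 1  # same content was written earlier in the trace
            else:
                seen.add(h)
    return {
        "write_ratio": len(writes) / len(reqs),
        "duplication_rate": duplicate_pages / written_pages if written_pages else 0.0,
        "avg_request_kb": sum(r.size_kb for r in reqs) / len(reqs),
    }
```

For a four-request trace with writes carrying page hashes `["a", "b"]`, `["a", "c"]`, and `["b"]` plus one 4 KB read, this yields a write ratio of 0.75, a duplication rate of 0.4 (2 of 5 written pages repeat earlier content), and an average request size of 6.0 KB. Counting duplicates per page rather than per request matches the granularity at which an FTL can share physical pages.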
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Nie, S.; Niu, J.; Yang, C.; Zhang, P.; Yang, Q.; Wang, D.; Wu, W. Exploiting Data Duplication to Reduce Data Migration in Garbage Collection Inside SSD. Electronics 2025, 14, 1873. https://doi.org/10.3390/electronics14091873
Nie, Shiqiang, Jie Niu, Chaoyun Yang, Peng Zhang, Qiong Yang, Dong Wang, and Weiguo Wu. 2025. "Exploiting Data Duplication to Reduce Data Migration in Garbage Collection Inside SSD" Electronics 14, no. 9: 1873. https://doi.org/10.3390/electronics14091873