ZNSage: Towards Hardware-Agnostic Swap Metadata Management on ZNS SSDs

Insoon Jo and Min Jee Koo
1 Center for Creative Convergence Education, Hanyang University, Seoul 04763, Republic of Korea
2 Department of Computer Science and Engineering, Seoul National University, Seoul 08826, Republic of Korea
* Author to whom correspondence should be addressed.
Electronics 2025, 14(7), 1301; https://doi.org/10.3390/electronics14071301
Submission received: 10 February 2025 / Revised: 22 March 2025 / Accepted: 24 March 2025 / Published: 26 March 2025
(This article belongs to the Section Computer Science & Engineering)

Abstract

The flash translation layer (FTL) in SSDs introduces logical-to-physical address mapping, but this mapping alters data layout, leading to performance degradation. This issue is particularly evident in the Linux swap subsystem. While SSDs offer higher performance than HDDs, traditional FTL-based SSDs fail to meet swap performance expectations due to scattered page placement. The introduction of zoned namespace (ZNS) SSDs aims to address this problem by enforcing sequential writes within predefined zones, allowing hosts to directly manage physical data layout and garbage collection. Prior work leveraged this capability to enhance swap performance but introduced severe hardware dependencies, requiring specific logical block addressing (LBA) formats and vendor-specific firmware modifications. To overcome these limitations, we propose ZNSage, a hardware-agnostic swap subsystem for ZNS SSDs. Unlike prior work, ZNSage eliminates hardware dependencies by storing logical-to-physical mapping information in system memory, enabling compatibility with any ZNS SSD. ZNSage not only improves swap performance but also optimizes garbage collection by minimizing page faults during swap zone reclamation. Ultimately, ZNSage provides a generalized and efficient swap solution for next-generation storage architectures.

1. Introduction

In hyperscale data centers, frequently accessed or performance-critical “hot” data are stored on SSDs for fast retrieval, while less frequently accessed “warm” or “cold” data reside on more cost-efficient HDD media [1]. Beyond performance, there is a fundamental difference in the unit of read/write operations between HDDs and SSDs: HDDs use sectors, while SSDs use pages. Due to this difference in I/O granularity, SSDs cannot provide the same host interface as HDDs. Instead, they rely on the FTL to handle address translation and ensure compatibility.
The primary functions of the FTL are logical-to-physical address translation and garbage collection. When storing and accessing data, the FTL dynamically maps logical addresses requested by applications to arbitrary physical addresses. Additionally, to free up space, garbage collection relocates valid pages to different blocks, meaning that logical-to-physical mappings are not fixed. As a result, the logical data layout that applications perceive is arbitrarily transformed by the FTL and does not persist in the SSD. The key issue is that this mismatch between logical and physical data layouts degrades performance. Regardless of the storage type, sequential read/write operations outperform random ones. For high-end HDDs, the gap is substantial: sequential reads can be up to 100 times faster and sequential writes up to 160 times faster than their random counterparts. The gap is far smaller for SSDs, but sequential reads and writes can still be up to 2.03 times and 2.44 times faster, respectively [2], making it a non-negligible factor. Consequently, applications attempt to optimize I/O performance by aggregating data for sequential writes and reads. However, despite these efforts, the FTL distributes even contiguous data across flash memory blocks, preventing the expected sequential access performance.
A concrete example of this performance degradation is the Linux swap subsystem. The Linux swap subsystem [3] extends virtual memory using disk space, optimizing memory allocation by swapping pages in and out between system memory and the disk. However, as disk access is orders of magnitude slower than system memory [4], swap operations often cause severe performance degradation. While SSDs are faster than HDDs, simply using them as swap devices does not yield the expected performance benefits. Due to the high cost of swap-out operations, the swap subsystem attempts to optimize costs by grouping as many adjacent pages as possible before swapping them out. However, because of the FTL, these pages are ultimately scattered across flash memory, making subsequent sequential reads impossible. Moreover, swap pages consist of both valid pages that will be accessed again and invalid pages that will never be used. It is not feasible for the swap subsystem to convey this information to the SSD in real time, as doing so would impose excessive overhead. As a result, from the SSD’s perspective, even unnecessary swap pages are treated as valid and are unnecessarily included in garbage collection, further accelerating performance degradation.
To address such mismatches between application-level and storage-level data layouts, the ZNS standard [5,6] was introduced in 2020. ZNS SSDs divide their storage space into fixed-size zones and enforce sequential writes within each zone. The reason ZNS SSDs maintain logical-to-physical data layout consistency is due to the placement of the FTL. Unlike conventional SSDs, where the FTL is hidden inside the device, ZNS SSDs shift this responsibility to the host. This enables applications to directly manage the physical data layout on the SSD and even control garbage collection.
With the introduction of ZNS SSDs, ZNSwap [7] was proposed to enhance the performance of the Linux swap subsystem. As mentioned earlier, in ZNS SSDs, the host FTL directly manages logical-to-physical page mappings. Thus, for ZNS SSDs to function as swap devices, the FTL must be extended to manage swap page mappings and store this mapping information. The key strategy of ZNSwap is to distribute mapping metadata across the metadata area of physical pages, eliminating the need for system memory to store this information. Specifically, whenever garbage collection moves swap pages to new physical addresses, reverse mapping metadata are stored in the metadata space of the newly relocated physical pages, consuming 24 bytes per page.
However, this strategy introduces severe hardware constraints. First, the set of supported logical block address (LBA) formats varies across SSDs, and each LBA format allows for a different metadata size. Since reverse mapping metadata require 24 bytes, ZNSwap can only support ZNS SSDs that provide an LBA format with sufficient metadata space. Second, even if an SSD supports the required LBA format, ZNSwap must still be ported to each SSD model. This is because SSDs differ in firmware, interface types (NVMe, SAS, SATA), and vendor-specific specifications, which affect how metadata regions within SSD pages are accessed. Given that the total size of reverse mapping metadata for swap pages is at most a few tens of MB and only needs to be maintained while the system is powered on, the effort required to save system memory appears excessive.
This paper proposes ZNSage, a swap subsystem for ZNS SSDs that eliminates hardware dependencies. As discussed earlier, enabling ZNS SSDs to function as swap devices requires a mechanism to manage and store logical-to-physical page mappings for swap pages. Unlike ZNSwap, which embeds mapping metadata within the SSD, ZNSage stores this information in system memory. This approach removes hardware constraints, allowing any ZNS SSD to be used as a swap device.
This paper makes the following key contributions:
  • We fully resolve the limitations of traditional SSD-based swapping by leveraging ZNS SSDs, significantly improving swap performance.
  • Unlike prior work, such as ZNSwap, which imposes hardware restrictions, ZNSage provides a generalized swap solution applicable to all ZNS SSDs, regardless of vendor or model.
  • We minimize page faults that may occur during swap zone garbage collection, further improving system efficiency.

2. Background

SSDs employ various techniques to enhance performance and endurance: multi-streaming and ZNS optimize data placement and garbage collection, while protection information (PI) safeguards data integrity.

2.1. Multi-Stream and ZNS SSDs as SSD Garbage Collection Optimization Techniques

Garbage collection (GC) is an essential internal operation in SSDs designed to reclaim large amounts of free space. This process is necessary due to the unique characteristics of NAND flash memory: (1) in-place updates are not possible, (2) only sequential writes are allowed, and (3) erasures cannot be performed at the page level but only at the block level, typically comprising 64 to 256 pages. When an SSD lacks sufficient space for new writes, garbage collection is triggered to free up space.
The GC process consists of three main components: a victim block, the valid pages within the victim block, and a clean block. The process begins by copying valid pages from the victim block to a clean block, after which the entire victim block is erased, reclaiming a large amount of available space. As illustrated in Figure 1, assuming A–D are invalid pages, A′–D′ are valid pages, X is the victim block, and Y is the clean block, the GC process moves A′–D′ to Y and then erases X to obtain a new clean block. This process is not a one-time operation but can continue iteratively until the required free space is secured.
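To make the copy-and-erase sequence concrete, the following is a minimal user-space C sketch of the GC step in Figure 1; the block geometry, structure names, and valid-page bitmap are illustrative assumptions rather than the internals of any real FTL.

```c
/* Minimal user-space model of the GC step in Figure 1. Block geometry,
 * structure names, and the valid-page bitmap are illustrative assumptions,
 * not the internals of any real FTL. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define PAGES_PER_BLOCK 4        /* real flash blocks hold 64-256 pages */
#define PAGE_SIZE       4096

struct flash_block {
    char data[PAGES_PER_BLOCK][PAGE_SIZE];
    bool valid[PAGES_PER_BLOCK]; /* true while the page still holds live data */
    int  write_ptr;              /* next free page; writes are append-only */
};

/* Copy the valid pages of the victim into the clean block, then erase the
 * victim so it becomes reusable. Returns the number of pages copied. */
static int garbage_collect(struct flash_block *victim, struct flash_block *clean)
{
    int copied = 0;
    for (int i = 0; i < PAGES_PER_BLOCK; i++) {
        if (!victim->valid[i])
            continue;            /* invalid pages are simply dropped */
        memcpy(clean->data[clean->write_ptr], victim->data[i], PAGE_SIZE);
        clean->valid[clean->write_ptr++] = true;
        copied++;
    }
    memset(victim, 0, sizeof(*victim));  /* block erase */
    return copied;
}

int main(void)
{
    /* In Figure 1, X holds invalid pages A-D and valid pages A'-D'; here a
     * 4-page block with two valid pages stands in for that example. */
    struct flash_block x = { .valid = { false, true, false, true }, .write_ptr = PAGES_PER_BLOCK };
    struct flash_block y = { 0 };
    printf("copied %d valid pages from victim to clean block\n", garbage_collect(&x, &y));
    return 0;
}
```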
The fundamental problem with GC is that it involves massive data copying and block erasures. During GC execution, all application I/O requests are stalled, and the additional writes, which applications never explicitly requested, significantly degrade SSD throughput and inflate the write amplification factor (WAF), shortening SSD lifespan. Consequently, continuous optimization efforts have been made to mitigate these overheads. Since the erase operation is mandatory, optimization primarily targets the page copying stage, where the key challenge is minimizing the number of pages that must be copied. Two major optimization techniques have been proposed, each balancing a trade-off between page placement complexity and the number of pages copied during GC.
  • Indirect Page Invalidity Updates (via Side Channels): This method does not control page placement but instead relies on side channels to update page invalidity information, ensuring that invalid pages are excluded from GC copying. Only the application layer can determine whether a page is still valid; however, because the application and storage layers are completely separated, once data are written, their validity does not automatically update. The SSD TRIM command serves precisely this purpose, but due to its significant interface overhead, it is typically applied at coarse granularities (e.g., 1MB LBA regions for Linux swap), limiting its practical effectiveness [7].
  • Lifetime-Based Page Placement: This method groups pages with similar lifetimes into the same flash block, so when their lifespan ends, the entire block can be erased at once, eliminating the need for page copying. A well-known example is multi-stream, introduced in 2014 [8,9,10]. However, this technique had severe resource constraints (supporting fewer than 10 streams in most implementations) and lacked demonstrated benefits at the application level, leading to diminished interest in the technology.
In 2020, the ZNS command set [5,6] was introduced as a standard that inherits the streaming concept. To avoid the fate of multi-stream, which struggled to prove its effectiveness at the application level, ZNS must satisfy two critical conditions to demonstrate its practical utility:
  • Minimal Interface Constraints Due to Firmware or Hardware Limitations: Prior research on multi-stream has shown that using only one or two streams does not yield significant performance improvements [8,9]. If stream merging or stream recycling is required due to resource limitations [8], or if file size constraints prevent unrestricted stream creation [9], the substantial interface overhead (“stream I/O tax”) makes adoption impractical.
  • Integration into the SSD’s Primary I/O Path: Modern SSDs prioritize maximizing their primary I/O performance through mechanisms such as channel and way parallelism, fully hardware-accelerated data paths, and firmware-based implementations only for secondary I/O operations (e.g., Set/GetFeature, TRIM). Since operations integrated into the primary I/O path exhibit superior performance, using streams as an auxiliary mechanism imposes inherent limitations on overall SSD performance. For stream I/O to provide clear application-level performance gains, it must be integrated into the primary I/O path.
Since ZNS SSDs meet both of these conditions, they are expected to deliver greater application-level benefits than multi-stream. Leading SSD manufacturers, including Western Digital, Samsung, and SK Hynix, have already released ZNS SSDs adhering to this standard [11,12,13]. With traditional SSDs, which rely on an internal FTL, applications have no way to determine the physical data layout inside the SSD. Applications already manage their logical data layout, however, and once they can also control the physical layout, they can easily align the two.
ZNS enables applications to:
  • Minimize GC overhead by including only valid pages in GC operations;
  • Ensure sequential physical writes whenever a zone write command is issued.
This not only eliminates logical–physical layout mismatches but also allows applications to fully leverage sequential read/write performance. Recent research has explored optimizing performance by adopting ZNS SSDs for storage systems with relatively simple I/O patterns, such as key–value stores [14,15,16,17,18,19,20,21] and the Linux swap subsystem [7].

2.2. LBA Format and PI [22] in SSDs

The LBA format defines how data are structured and accessed in SSDs. Each LBA sector typically consists of user data (e.g., 512 B or 4 KB) and optional metadata, such as the error correction code (ECC). Additionally, some SSDs support PI as an optional extension to the LBA structure, improving data integrity and error detection. PI generally includes a guard field, which contains a cyclic redundancy check (CRC) or checksum for detecting bit errors; an application tag for software-defined integrity validation; and a reference tag to associate sectors with their expected values. By operating in conjunction with the data integrity field (DIF) at the storage device level and data integrity extensions (DIXs) at the host level, PI ensures end-to-end data validation, thereby enhancing the overall reliability of SSD storage.
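For reference, the PI fields described above form an 8-byte tuple per protected sector. The following C struct is a sketch of that layout; the struct name is chosen for illustration, and the exact encoding (the fields are big-endian on the wire) is outside the scope of this sketch.

```c
/* Sketch of the 8-byte T10 Protection Information tuple described above.
 * Field widths follow the standard DIF layout; the struct name is chosen
 * for illustration. */
#include <stdint.h>

struct pi_tuple {
    uint16_t guard;    /* CRC over the sector's user data */
    uint16_t app_tag;  /* application-defined integrity tag */
    uint32_t ref_tag;  /* ties the sector to its expected LBA */
};

/* With an LBA format such as "4 KB data + 8 B metadata", this tuple is what
 * occupies the per-sector metadata area when PI is enabled. */
```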

3. Design

This section describes the design considerations of ZNSage, a Linux swap subsystem that operates independently of ZNS SSD models.

3.1. Employing ZNS SSD as a Swap Device

ZNS SSDs represent a state-of-the-art solution for aligning the logical–physical data layout of SSD storage. However, they are not a universal solution, as they do not support random writes due to their zone-based architecture. Traditional applications exhibit diverse I/O patterns, many of which are not strictly sequential. Consequently, the synergy between ZNS SSDs and an application is maximized only when the conversion from random to sequential access is natural and efficient. Although not directly evaluated on ZNS SSDs, prior research such as FlashAlloc [23], which implemented an interface similar to multi-streaming on conventional SSDs, demonstrates that the throughput and WAF benefits of multi-streaming diminish as random writes increase—particularly due to metadata file writes. In complex applications where only a small fraction of accesses are sequential or where forced conversion from random to sequential access is required, the overhead of this transformation can outweigh the benefits, leading to degraded performance.
Thus, the compatibility between an application’s I/O pattern and ZNS SSD characteristics is crucial. In the case of the Linux swap subsystem, the I/O pattern is relatively simple and naturally amenable to sequential access. This makes ZNS SSDs a highly suitable choice for swap storage, effectively resolving the inherent limitations of SSD-based swap. As discussed in Section 1, the key issue with conventional SSD-based swap is the mismatch between logical and physical data layouts, which results in physically scattered pages despite their logical adjacency. Additionally, due to the high communication overhead between the host and SSD, the system cannot efficiently convey information about valid and invalid swap pages, leading to unnecessary inclusion of invalid pages in garbage collection.
In contrast, ZNS SSD-based swap can mitigate these issues. By leveraging a host-managed FTL, the system can track zone states and ensure that logically adjacent swap pages are placed within the same zone, preserving their physical locality. Furthermore, garbage collection efficiency improves, as only valid swap pages are included in GC without additional communication overhead. This is because, unlike conventional SSD swap systems where the host–SSD communication cost is high, a ZNS SSD-based swap system can directly pass validity information to the host FTL, enabling an accurate and low-cost GC process.

3.2. ZNS SSD-Agnostic Swap Subsystem in Linux

As discussed in Section 1, using a ZNS SSD as a swap device requires the swap subsystem to manage the logical-to-physical mapping of swap pages. This necessitates both an FTL capable of tracking mapping information and sufficient storage space to maintain the mapping. When garbage collection occurs on a victim swap zone, valid swap pages within that zone must be relocated to a clean zone, triggering an update to the logical–physical mapping. To accommodate this, ZNSwap [7] stores updated reverse mapping metadata (which associate physical pages with their corresponding logical pages) within the metadata space of the relocated physical page, consuming 24 bytes per entry. This approach distributes mapping information across the physical page metadata space, eliminating the need for additional system memory for mapping storage.
For instance, ZNSwap utilizes Western Digital’s Ultrastar DC ZN540 ZNS SSD [12], which supports five different LBA formats (Table 1). The metadata size (ms field) in these formats can be 0, 8, or 64 bytes, with the metadata size including PI if enabled. Since ZNSwap requires 24 bytes per page for reverse mapping metadata, it must use the LBA format that provides 64-byte metadata (ms:64).
This paper argues that storing mapping metadata within SSD metadata space is an inadequate strategy for four key reasons:
  • Hardware Constraints: As explained earlier, not all ZNS SSDs support the required LBA format, and the use of metadata space varies across firmware, interfaces (NVMe, SAS, SATA), and vendor specifications. This creates significant hardware dependencies and portability issues.
  • Minimal Memory Overhead: The assumption that significant system memory is required for mapping storage is questionable. For a system with 64 GB+ of RAM, the recommended swap size is 4 GB [24]. Assuming a 4 KB page size, this corresponds to a maximum of 1 M swap pages, each requiring 24 bytes of reverse mapping metadata, amounting to only 24 MB—a trivial amount of memory.
  • No Persistence Requirement: Unlike file system metadata, swap page mappings do not require persistence. Since swap pages exist solely to extend system memory, they are only valid while the system is powered on. Thus, storing swap metadata in persistent SSD storage is unnecessary.
  • Existing Metadata Usage: The metadata space utilized by ZNSwap is already allocated for critical functions in enterprise environments, such as data integrity protection (PI) [22] and user data expansion. Using this space for swap metadata requires careful consideration, as data integrity and storage capacity extensions take precedence over swap-related functions.
To address these challenges, this paper proposes ZNSage, a Linux swap subsystem that operates independently of ZNS SSD models. Unlike ZNSwap, which stores mapping metadata in SSD metadata space, ZNSage maintains mapping information in system memory, leveraging its small size and volatile nature for efficient storage. This approach offers several advantages. It eliminates hardware dependencies, allowing any ZNS SSD to be used as a swap device. It also preserves SSD metadata space for its intended PI and storage capacity functions. Additionally, it avoids unnecessary persistence overhead, aligning with the ephemeral nature of swap pages. By decoupling swap metadata from SSD hardware constraints, ZNSage provides a universally compatible, efficient, and optimized solution for leveraging ZNS SSDs as swap storage.

4. Implementation

This section describes the data structures, overall process, and optimization strategies of ZNSage.

4.1. Reverse Mapping Data Structure for Swap Pages

Since a ZNS SSD manages logical-to-physical page mappings directly through a host FTL, an additional mechanism is required to track mapping information for each swap page. When garbage collection occurs in a swap zone, valid swap pages must be copied to a new swap zone, necessitating updates to the logical-to-physical page mapping. ZNSwap manages this mapping using reverse mapping metadata, which allows retrieving the logical page corresponding to a physical page. This paper adopts a similar approach but extends it by maintaining this metadata in system memory rather than SSD metadata space. To facilitate this, an additional data structure is introduced for system memory management.
As shown in Figure 2, the proposed reverse mapping data structure consists of anon_vma and an index, enabling retrieval of the logical page corresponding to a physical page. Specifically, anon_vma provides access to the vm_area_struct that manages the virtual address space of each process. From vm_area_struct, it is possible to navigate to mm_struct, then to the process’s page table directory pointer (pgd). Once pgd is found, the second component of the reverse mapping data structure—the page table index—is used to compute the virtual page address corresponding to the physical page.
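A minimal C sketch of such an entry is shown below; struct anon_vma is the existing kernel type, while the entry name and field layout are assumptions made for illustration rather than the exact definition used by ZNSage.

```c
/* Minimal sketch of a per-swap-slot reverse mapping entry as described in
 * Figure 2. struct anon_vma is the existing Linux kernel type; the entry
 * name and field layout are assumptions for illustration, not the authors'
 * exact definition. */
struct anon_vma;                 /* kernel type: leads to the vm_area_struct(s) mapping the page */

struct znsage_rmap_entry {
    struct anon_vma *anon_vma;   /* walk: anon_vma -> vm_area_struct -> mm_struct -> pgd */
    unsigned long    index;      /* page-table index locating the virtual page under that pgd */
    /* the text budgets 24 bytes per page of reverse mapping metadata [7];
     * any remaining bytes are left unspecified here */
};
```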

4.2. Managing Reverse Mapping Data in System Memory

To ensure compatibility across different ZNS SSD models, this study manages reverse mapping data in system memory instead of SSD metadata space. Figure 3 illustrates the data structure used for reverse mapping in system memory. Each swap zone consists of multiple swap slots; each swap slot holds one swap page and therefore requires reverse mapping data. To facilitate this, the struct swap_zone structure, which represents a swap zone, is extended with the page_md_m field. In the proposed implementation, all zones are managed through the zns_swap_info structure, which allows zone-specific information (swap_zone) to be accessed and, through its page_md_m metadata field, reverse mapping data to be retrieved for each swap slot within the zone.
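The following sketch illustrates one way these structures could be laid out, assuming flat per-slot arrays; the names swap_zone, page_md_m, and zns_swap_info come from the description above, while the remaining fields and types are illustrative assumptions.

```c
/* Sketch of the in-memory layout in Figure 3, assuming flat per-slot arrays.
 * The names swap_zone, page_md_m, and zns_swap_info come from the text; the
 * remaining fields and types are illustrative assumptions. */
struct znsage_rmap_entry;                 /* per-slot reverse mapping (Section 4.1) */

struct swap_zone {
    unsigned int nr_slots;                /* swap slots (one per swap page) in this zone */
    unsigned int write_ptr;               /* next free slot, since zones are append-only */
    struct znsage_rmap_entry *page_md_m;  /* reverse mapping metadata, one entry per slot */
};

struct zns_swap_info {
    unsigned int nr_zones;
    struct swap_zone *zones;              /* per-zone information, indexed by zone number */
};

/* Looking up the reverse mapping for a (zone, slot) pair then costs two array
 * dereferences in system memory and touches no SSD metadata:
 *     struct znsage_rmap_entry *e = &info->zones[zone].page_md_m[slot];
 */
```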
In typical usage scenarios, the memory requirement for swap metadata is minimal. For example, with a 4 GB swap size, only 24 MB of system memory is required for reverse mapping data. Given that this is a small fraction of total system memory, efficient management of this memory is not a critical concern for the system’s overall performance. This allows the focus to remain on ensuring compatibility and functionality across different SSD models rather than optimizing system memory management for metadata.
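For concreteness, the 24 MB figure follows directly from the swap size, the page size, and the 24-byte entry size:

```latex
\frac{4\,\mathrm{GB}}{4\,\mathrm{KB/page}} = 2^{20}\ \text{pages},
\qquad 2^{20}\ \text{pages} \times 24\,\mathrm{B/page} = 24\,\mathrm{MB}.
```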

4.3. Overall Swap-Out Process

ZNSage selects swap-out pages based on the least recently used (LRU) policy, assigning them to swap zones and swap slots. Since ZNS SSDs use an append-only write model, once a swap zone is determined, the swap slot is automatically assigned to the next write position. The swap zone allocation can follow various policies; the current implementation adopts a per-core policy, where each core is assigned a dedicated swap zone. As illustrated in Figure 4, when a page stored at memory address 0xde8a000 is selected for swapping out, the per-core policy assigns it to zone N. Given that the next write position in this zone is 0x810020000, swap slot 131,072 is assigned accordingly.
However, assigning a swap zone and slot does not immediately trigger a write request. First, the reverse mapping information is updated and stored in the metadata field corresponding to the assigned slot. Then, the write request is added to the bio (block I/O) list of the target zone. A bio is the kernel data structure that describes a block device I/O request.
To manage write requests efficiently, ZNSage maintains a bio list for each zone. The prototype implementation of ZNSage uses an SK Hynix ZNS SSD, which processes writes in 192 KB units. Figure 5 depicts the process of handling write requests while considering this write unit. If a write request occurs and no bio list exists for the target zone, a new 192 KB bio list is created. As swap-out operations continue, write requests are added to the zone’s bio list. Once the accumulated requests reach 192 KB, the bio is submitted for writing.
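The following user-space C sketch models this batching logic; the per-core zone policy and the 192 KB write unit come from the description above, while the buffer-based abstraction of the per-zone bio list and all identifiers are illustrative assumptions.

```c
/* User-space model of the swap-out batching described above: a per-core swap
 * zone, slot assignment at the zone's write position, and a per-zone buffer
 * that is flushed once the device's 192 KB write unit is filled. All names
 * and the buffer-as-bio-list abstraction are illustrative assumptions. */
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE     4096
#define WRITE_UNIT    (192 * 1024)              /* write unit of the prototype SSD */
#define PAGES_PER_BIO (WRITE_UNIT / PAGE_SIZE)  /* 48 pages per submitted bio */
#define NR_ZONES      8                         /* one open swap zone per core (assumption) */

struct zone_bio_list {
    char buf[WRITE_UNIT];
    int  nr_pages;          /* pages accumulated but not yet submitted */
};

static struct zone_bio_list pending[NR_ZONES];

static void submit_bio_list(int zone)
{
    printf("zone %d: submitting %d pages (%d KB)\n",
           zone, pending[zone].nr_pages, pending[zone].nr_pages * PAGE_SIZE / 1024);
    pending[zone].nr_pages = 0;   /* the zone's write pointer advances on the device */
}

/* Swap out one page: the reverse mapping for the assigned slot is recorded in
 * system memory (Section 4.2, elided here), the page is appended to the zone's
 * buffered write, and a full 192 KB unit triggers submission. */
static void swap_out_page(int cpu, const void *page)
{
    int zone = cpu;               /* per-core swap zone allocation policy */
    struct zone_bio_list *bl = &pending[zone];

    memcpy(bl->buf + (size_t)bl->nr_pages * PAGE_SIZE, page, PAGE_SIZE);
    bl->nr_pages++;

    if (bl->nr_pages == PAGES_PER_BIO)
        submit_bio_list(zone);
}

int main(void)
{
    char page[PAGE_SIZE] = { 0 };
    for (int i = 0; i < PAGES_PER_BIO; i++)   /* 48 pages -> exactly one 192 KB submission */
        swap_out_page(0, page);
    return 0;
}
```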

4.4. Garbage Collection Optimization via Page Fault Minimization

Repeated swap-in and swap-out operations cause invalid pages to accumulate in swap zones, requiring garbage collection to reclaim space. During garbage collection, valid pages from a victim swap zone are copied to a new swap zone via system memory, and the victim zone is erased to become a free zone. However, even though valid pages temporarily reside in system memory, the page table still considers them to be in SSD storage, leading to page faults. This results in unnecessary read operations from the victim zone, incurring significant overhead.
To mitigate this issue, as illustrated in Figure 6, ZNSage updates the page table when transferring valid pages to system memory, ensuring that it correctly reflects the new locations. This prevents unnecessary page faults during garbage collection.
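The sketch below models this optimization with a toy page-table entry; the device I/O calls named in the comments are placeholders and all identifiers are illustrative, so it shows only the ordering of steps (1), (2), and (3) from Figure 6.

```c
/* Conceptual model of the optimization above: a toy page-table entry either
 * points at a RAM buffer or names a (zone, slot) on the ZNS SSD. During GC
 * the entry is switched to the RAM copy as soon as the page is staged in
 * memory, so a concurrent access hits RAM instead of faulting back to the
 * victim zone. Types, names, and the elided zns_read()/zns_append() calls
 * are illustrative placeholders. */
#include <stdbool.h>
#include <stdio.h>

struct toy_pte {
    bool  present;      /* true: page is in system memory */
    void *ram;          /* valid when present */
    int   zone, slot;   /* swap location when not present */
};

/* Called for every valid page of the victim zone during GC. */
static void gc_migrate_page(struct toy_pte *pte, void *staging_buf,
                            int clean_zone, int new_slot)
{
    /* (1) read the page from the victim zone into system memory:
     *     zns_read(pte->zone, pte->slot, staging_buf);            */

    /* (2) update the page table immediately: an access now finds the RAM
     *     copy instead of faulting and re-reading the victim zone.      */
    pte->present = true;
    pte->ram     = staging_buf;

    /* (3) append the RAM copy to the clean zone and remember its new slot,
     *     so the entry can later be turned back into a swap entry:
     *     zns_append(clean_zone, staging_buf);                           */
    pte->zone = clean_zone;
    pte->slot = new_slot;
}

int main(void)
{
    char buf[4096];
    struct toy_pte p = { .present = false, .zone = 3, .slot = 17 };
    gc_migrate_page(&p, buf, 4, 0);
    printf("page present in RAM after step (2): %s\n", p.present ? "yes" : "no");
    return 0;
}
```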

5. Evaluation

This section outlines the evaluation environment and results of ZNSage, along with a comparative analysis against the prior system, ZNSwap.

5.1. Experimental Setup

The system configuration used for evaluation is summarized in Table 2. To evaluate ZNSage, two ZNS SSDs were tested: the SK Hynix ZNS prototype SSD, which is incompatible with ZNSwap because none of its LBA formats provides a metadata region large enough for the 24-byte reverse mapping entries, and the Western Digital ZN540 ZNS SSD, which was used for ZNSwap. The system setup remained identical for both SSDs to ensure a fair comparison. ZNSage was evaluated using the SK Hynix SSD, while ZNSwap was tested on the Western Digital SSD, with both setups applying the same 12% over-provisioning ratio as assumed by ZNSwap.
To monitor lock contention and performance, we collected data from Lock Contention Statistics (/proc/lock_stat) and Kernel Code Coverage (/debug/gcov), enabling kernel configurations CONFIG_LOCK_STAT, CONFIG_GCOV_KERNEL, and CONFIG_GCOV_PROFILE_ALL. These configurations introduced significant overhead, reducing bandwidth by up to 9× in ZNSwap evaluations. For fairness, all experimental settings matched those used in the ZNSwap evaluation. To simulate memory shortage, we set the Cgroup memory limit to 2 GB. While 512 GB of swap space is not typical, this setting was chosen to focus on swap performance.

5.2. Micro-Benchmarks

The micro-benchmarks assess the performance of ZNSage under different swap device utilization levels using the vm-scalability benchmark [25]. The goal was to evaluate the efficiency and stability of swap bandwidth at varying device utilizations.

5.2.1. Benchmark Setup for vm-Scalability

The case-anon-w-rand-mt test from vm-scalability was used to test swap performance under random read and write operations, with 16 test threads. Swap device utilization varied from 20% to 80%.

5.2.2. vm-Scalability Performance Results

The results show that ZNSage sustained stable swap bandwidth across varying device utilization levels, operating efficiently even under higher swap load.
As shown in Figure 7, the swap bandwidth remained stable regardless of increasing device utilization, which is noteworthy since higher device utilization generally leads to reduced I/O bandwidth on SSDs. This suggests that ZNSage handles swap activity efficiently, maintaining consistent performance despite changes in swap space utilization.

5.2.3. Performance Comparison with ZNSwap

ZNSage outperformed ZNSwap by 12% in swap-out bandwidth at 60% device utilization, with similar improvements observed at 20%, 40%, and 80% utilization levels, as shown in Figure 8, demonstrating its overall higher efficiency in handling swap operations.

5.3. Macro-Benchmarks

To test ZNSage under more realistic, memory-intensive workloads, we used Memcached [26] with YCSB benchmarks [27]. This section explores how ZNSage performs in real-world scenarios, including both read-heavy and update-heavy workloads.

5.3.1. Benchmark Setup for Memcached and YCSB

The Memcached server ran on the hardware setup described in Section 5.1, and the YCSB client was run on a separate system, as shown in Table 3. Two configurations were tested: (1) 100% system memory usage and (2) 50% system memory, 50% swap usage. The YCSB workload included both read-heavy and update-heavy patterns.

5.3.2. Memcached–YCSB Results

The results demonstrate that swap usage led to noticeable throughput degradation in both workloads, but ZNSage kept the additional impact of swapping contained as the update ratio grew.
Figure 9 presents the throughput results for both YCSB workloads. Relative to the 100% system memory configuration, swap usage reduced throughput by 40.66% for the 95% read/5% update workload, whereas the drop was limited to 22.83% for the 50% read/50% update workload, showing that ZNSage mitigates the impact of high update rates on performance.

6. Conclusions

This paper proposes ZNSage, a Linux swap subsystem designed for ZNS SSDs. Unlike its predecessor, ZNSwap, which is restricted to specific ZNS SSDs due to hardware limitations, the proposed system overcomes these constraints by managing swap page mappings in system memory. Additionally, to optimize garbage collection during swapping, it introduces a scheme that minimizes page faults on swap pages.
Performance evaluations using the vm-scalability micro-benchmark demonstrate up to 12% higher swap bandwidth compared to ZNSwap. Furthermore, real-world workload experiments with Memcached–YCSB quantify the impact of swapping on system performance and highlight ZNSage's effectiveness in mitigating the resulting degradation.
As future work, we plan to explore integrating techniques like fuzzy logic and approximate computing, as discussed in [28], into the ZNS-SSD-based Linux swap system. These methods could enhance memory management efficiency. Additionally, we aim to investigate the energy impact of swap operations, particularly in I/O-intensive scenarios, and explore how optimizations in swap management can reduce energy consumption alongside improving performance. Moreover, we plan to extend our testing to include a broader range of workloads, such as SQL database and machine learning tasks, to better understand the behavior of swap operations under different system demands. This would provide a more comprehensive evaluation of the ZNS-SSD-based swap system across diverse real-world applications.

Author Contributions

Methodology, I.J.; Software, M.J.K.; Writing—original draft, I.J.; Writing—review & editing, M.J.K.; Project administration, I.J. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Research Foundation of Korea grant funded by the Korean government (MSIT) (RS-2023-00250918).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Herreria, A.A. Balancing Act: HDDs and SSDs in Modern Data Centers. Western Digital Blog, 25 July 2024. Available online: https://blog.westerndigital.com/a-balancing-act-hdds-and-ssds-in-modern-data-centers/ (accessed on 10 February 2025).
  2. Rana, A. Sequential vs. Random Read/Write Performance in Storage. StoredBits, 2 September 2024. Available online: https://storedbits.com/sequential-vs-random-data/ (accessed on 10 February 2025).
  3. Jacob, B. The Memory System; Morgan & Claypool Publishers: Kentfield, CA, USA, 2009. [Google Scholar]
  4. Green, S. DRAM or Not? The Difference Between DRAM and DRAM-Less SSDs (and Why It Matters). Phison Blog, 1 July 2024. Available online: https://phisonblog.com/dram-or-not-the-difference-between-dram-and-dram-less-ssds-and-why-it-matters/ (accessed on 10 February 2025).
  5. NVM Express. Zoned Namespace Command Set Specification, Revision 1.2. NVM Express, 5 August 2024. Available online: https://nvmexpress.org/wp-content/uploads/NVM-Express-Zoned-Namespace-Command-Set-Specification-Revision-1.2-2024.08.05-Ratified.pdf (accessed on 10 February 2025).
  6. Bjørling, M.; Aghayev, A.; Holmberg, H.; Ramesh, A.; Le Moal, D.; Ganger, G.R.; Amvrosiadis, G. ZNS: Avoiding the Block Interface Tax for Flash-Based SSDs. In Proceedings of the 2021 USENIX Annual Technical Conference (USENIX ATC 21), Berkeley, CA, USA, 14–16 July 2021; pp. 689–703. [Google Scholar]
  7. Bergman, S.; Cassel, N.; Bjørling, M.; Silberstein, M. ZNSwap: Unblock Your Swap. In Proceedings of the 2022 USENIX Annual Technical Conference (USENIX ATC 2022), Carlsbad, CA, USA, 11–13 July 2022. [Google Scholar]
  8. Kang, J.-U.; Hyun, J.; Maeng, H.; Cho, S. The Multi-Streamed Solid-State Drive. In Proceedings of the 6th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage 14), Philadelphia, PA, USA, 17–18 June 2014. [Google Scholar]
  9. Yang, J.; Pandurangan, R.; Choi, C.; Balakrishnan, V. AutoStream: Automatic Stream Management for Multi-Streamed SSDs. In Proceedings of the 10th ACM International Systems and Storage Conference, Haifa, Israel, 22–24 May 2017; pp. 1–11. [Google Scholar]
  10. Kim, T.; Hong, D.; Hahn, S.S.; Chun, M.; Lee, S.; Hwang, J.; Lee, J.; Kim, J. Fully Automatic Stream Management for Multi-Streamed SSDs Using Program Contexts. In Proceedings of the 17th USENIX Conference on File and Storage Technologies (FAST 19), Boston, MA, USA, 25–28 February 2019; pp. 295–308. [Google Scholar]
  11. Samsung Semiconductor Newsroom. Samsung Introduces Its First ZNS SSD with Maximized User Capacity and Enhanced Lifespan. Samsung Newsroom, 2 June 2021. Available online: https://news.samsungsemiconductor.com/global/samsung-introduces-its-first-zns-ssd-with-maximized-user-capacity-and-enhanced-lifespan/ (accessed on 10 February 2025).
  12. Western Digital. Ultrastar DC ZN540 Data Sheet. Western Digital Documentation. September 2021. Available online: https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/collateral/data-sheet/data-sheet-ultrastar-dc-zn540.pdf (accessed on 10 February 2025).
  13. Chung, W. Benefits of ZNS in Datacenter Storage Systems. In Proceedings of the Flash Memory Summit, Santa Clara, CA, USA, 6–8 August 2019; Available online: https://files.futurememorystorage.com/proceedings/2019/08-06-Tuesday/20190806_ARCH-102-1_Chung.pdf (accessed on 10 February 2025).
  14. Choi, G.; Lee, K.; Oh, M.; Choi, J.; Jhin, J.; Oh, Y. A New LSM-Style Garbage Collection Scheme for ZNS SSDs. In Proceedings of the 12th USENIX Workshop on Hot Topics in Storage and File Systems, Online, 13–14 July 2020. [Google Scholar]
  15. Holmberg, H. ZenFS, Zones, and RocksDB. SDC 2020. September 2020. Available online: https://www.snia.org/educational-library/zenfs-zones-and-rocksdb-who-likes-take-out-garbage-anyway-2020 (accessed on 23 March 2025).
  16. Oh, G.; Yang, J.; Ahn, S. Efficient Key-Value Data Placement for ZNS SSD. Appl. Sci. 2021, 11, 11842. [Google Scholar] [CrossRef]
  17. Stavrinos, T.; Kourtis, K.; Ioannidis, S. Don’t Be a Blockhead: Zoned Namespaces Make Work on Conventional SSDs Obsolete. In Proceedings of the Workshop on Hot Topics in Operating Systems, Ann Arbor, MI, USA, 31 May–2 June 2021; pp. 144–151. [Google Scholar]
  18. Han, K.; Gwak, H.; Shin, D.; Hwang, J.-Y. ZNS+: Advanced Zoned Namespace Interface for Supporting In-Storage Zone Compaction. In Proceedings of the 15th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 21), Online, 14–16 July 2021; pp. 147–162. [Google Scholar]
  19. Lee, H.-R.; Lee, C.-G.; Lee, S.; Kim, Y. Compaction-Aware Zone Allocation for LSM-Based Key-Value Store on ZNS SSDs. In Proceedings of the 14th ACM Workshop on Hot Topics in Storage and File Systems, New York, NY, USA, 27–28 June 2022; pp. 93–99. [Google Scholar]
  20. Jung, J.; Shin, D. Lifetime-Leveling LSM-Tree Compaction for ZNS SSD. In Proceedings of the 14th ACM Workshop on Hot Topics in Storage and File Systems, New York, NY, USA, 27–28 June 2022; pp. 100–105. [Google Scholar]
  21. Jung, S.; Lee, S.; Han, J.; Kim, Y. Preemptive Zone Reset Design within Zoned Namespace SSD Firmware. Electronics 2023, 12, 798. [Google Scholar] [CrossRef]
  22. Snow, N. What Is T10 Protection Information? Kioxia Blog, 26 September 2023. Available online: https://blog-us.kioxia.com/post/2023/09/26/what-is-t10-protect-information (accessed on 10 February 2025).
  23. Park, J.; Choi, S.; Oh, G.; Im, S.; Oh, M.-W.; Lee, S.-W. FlashAlloc: Dedicating Flash Blocks by Objects. Proc. VLDB Endow. 2023, 16, 3266–3278. [Google Scholar]
  24. Both, D. What’s the Right Amount of Swap Space for a Modern Linux System? Opensource.com, 11 February 2019. Available online: https://opensource.com/article/19/2/swap-space-poll (accessed on 10 February 2025).
  25. VM-Scalability. Available online: https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git/about/ (accessed on 10 February 2025).
  26. Fitzpatrick, B. Distributed Caching with Memcached. Linux J. 2004, 124, 5. [Google Scholar]
  27. Cooper, B.F.; Silberstein, A.; Tam, E.; Ramakrishnan, R.; Sears, R. Benchmarking Cloud Serving Systems with YCSB. In Proceedings of the 1st ACM Symposium on Cloud Computing, Indianapolis, IN, USA, 10–11 June 2010; pp. 143–154. [Google Scholar]
  28. Khaleqi Qaleh Jooq, M.; Behbahani, F.; Al-Shidaifat, A.; Khan, S.R.; Song, H. A High-Performance and Ultra-Efficient Fully Programmable Fuzzy Membership Function Generator Using FinFET Technology for Image Enhancement. AEU—Int. J. Electron. Commun. 2023, 163, 154598. [Google Scholar] [CrossRef]
Figure 1. Garbage collection process in SSDs.
Figure 2. Hardware-agnostic data structure for reverse mapping of swap pages.
Figure 3. In-memory data structure for managing reverse mappings and swap metadata.
Figure 4. Allocation of swap zones and slots.
Figure 5. Write request handling.
Figure 6. Optimization of page fault reduction during garbage collection: (1) transferring valid pages, (2) updating the page table, and (3) copying system memory to a clean zone.
Figure 7. Swap performance at different swap utilization levels.
Figure 8. Relative swap bandwidth of ZNSage compared to ZNSwap.
Figure 9. Overall throughput from Memcached–YCSB.
Table 1. LBA formats available in Western Digital ZN540 ZNS SSD.
LBA Format (lbaf) | Metadata Size (ms) | LBA Data Size (lbads) | Relative Performance (rp)
0                 | 0 B                | 9 (512 B sectors)     | 0
1                 | 8 B                | 9 (512 B sectors)     | 0
2                 | 0 B                | 12 (4 KB sectors)     | 0
3                 | 8 B                | 12 (4 KB sectors)     | 0
4                 | 64 B               | 12 (4 KB sectors)     | 0
Table 2. System configuration (ZNS SSD).
CPU              | Intel Core i7-12700K (3.6 GHz, 8 cores), hyper-threading (HT) disabled, performance mode enabled, turbo mode disabled
System memory    | 16 GB DRAM
ZNS SSD          | SK Hynix ZNS SSD (32 TB): zone capacity 96 MB, zone size 96 MB; Western Digital ZN540 ZNS SSD (8 TB): zone capacity 1077 MB, zone size 2 GB
Operating system | Ubuntu 22.04, Cgroup memory limit: 2 GB, swap space: 512 GB
Table 3. System configuration (YCSB client).
CPU           | Intel Core i5-12500K (3.0 GHz, 6 cores), hyper-threading (HT) disabled, performance mode enabled, turbo mode disabled
System memory | 16 GB DRAM
Network       | 2.5 Gbps Ethernet