1. Introduction
The memory requirements of computing environments have grown with the rise of AI, cloud, and Internet of Things (IoT) technologies [1,2,3,4]. As a result, computing systems (e.g., smartphones, tablets, and edge devices) have gradually increased their memory capacity to meet the memory requirements of applications [5]. For example, the latest commercial smartphones and edge devices offer up to 16 GB and 4 GB of memory, respectively [6,7]. This trend means that the memory capacity of a device is an important factor in efficiently enabling diverse services (e.g., Smart Home and Industrial IoT with on-device AI) [1,8,9]. However, many devices still face an out-of-memory (OOM) issue because the demand for memory by runnable applications keeps increasing as the available memory space grows [2,10]. Thus, hardware extension alone cannot resolve the OOM issue; a memory shortage will reappear over time [8,11].
Meanwhile, operating systems (e.g., Linux and Android) have traditionally tried to make memory space available by reclaiming free space at the software level, and they commonly employ two mechanisms. One is the out-of-memory killer (OOMK), which is triggered when the available free memory drops below a predefined threshold (e.g., 122 MB) [12,13]. To reclaim memory, it kills one or more applications consuming large amounts of memory, regardless of user intent [5,10]; all pages used by an application are reclaimed at once when it is killed. The OOMK mechanism is effective in reclaiming a large amount of free space, but it can cause unintended overheads, such as shared-object symbol lookup and XML parsing, when killed applications are relaunched [10]. An alternative to the OOMK mechanism is the swap mechanism [3,8,14], which has gained a great deal of traction as a solution to the OOM issue in operating systems [5,15]. When the available memory space is insufficient, the swap mechanism first finds pages with a low re-access probability and then swaps them out to the swap memory space inside a secondary storage device (e.g., an HDD or SSD) to make free space [3,8,15]. Intuitively, the swap mechanism is considered more efficient than the OOMK because it selects victim pages using their temporal locality and can thus avoid unnecessary resource reclamation [16,17].
With the advancement of IoT technologies and the growing number of memory-hungry applications, research efforts are being made to handle memory resources more efficiently [5,10,11,15,18,19,20,21]. A recent proposal orchestrates OOMK and swap to reduce the number of memory reclamations and the launch latency of applications [10]. Some previous studies proposed new swap policies in which user preference and re-access probability are prioritized for victim page selection [5,11]. Prior work designed a new swap mechanism to prevent random writes caused by swap-out operations [15], and other researchers proposed ZNSwap, in which swap operations are piggybacked on host-side garbage collection for ZNS SSDs [18]. Unfortunately, despite these efforts to enhance the swap mechanism, it still introduces significant and unpredictable overhead. In particular, the overhead gradually increases as the pre-reserved swap memory space inside the storage device becomes fragmented over time; we call this the swap memory fragmentation (SMF) problem.
The root cause of the SMF issue is that the swap mechanism reuses swap pages that were invalidated by swap-in operations, which incurs additional overhead. In other words, once the swap memory is fragmented, the swap mechanism has to spend time finding the invalidated swap pages, delaying block allocation for swap-out operations in the swap memory space. Furthermore, the scattered writes produced by a series of swap-out operations degrade I/O throughput because they turn into random writes, which cause long seek times in HDDs and additional die- or chip-level collisions in SSDs. In general, prior studies on swap optimization have focused on improving victim-page selection or reducing write amplification, while rarely addressing the fragmentation that accumulates inside the swap memory space itself [5,10,11,18]. Although IOSR [15] attempted to reduce swap-space fragmentation through allocation control, its approach was preventive rather than corrective. This overlooked fragmentation gradually degrades swap efficiency, motivating the need for a kernel-level mechanism that can directly compact fragmented swap regions.
In this paper, we first examine in detail why the SMF issue arises over time. Then, we propose an extension of the traditional swap mechanism, called VSwap, that optimizes the swap memory space for defragmentation. The key idea of VSwap is to enable efficient data migration with minimal I/O cost. When the swap memory approaches its capacity limit, VSwap triggers virtual vswap-in and vswap-out operations to gather valid swap pages scattered across multiple swap clusters into a new swap cluster; a swap cluster consists of a fixed number of swap pages and is the unit of swap memory management. We also design an I/O monitor so that VSwap does not interfere with normal I/O operations issued by running applications. Unlike previous approaches that rely on specific storage or file system implementations (e.g., ZNSwap), VSwap is the first kernel-level swap defragmentation mechanism that operates generically across both HDDs and SSDs. It achieves this through vswap-in and vswap-out operations that collaborate with the I/O monitor, performing defragmentation transparently during idle periods to minimize interference with normal I/O operations. We implemented a prototype of VSwap on the Linux kernel (v5.15) and used the original swap mechanism as the baseline for comparison. Compared with the baseline, VSwap improves performance by up to 48.18% on a synthetic workload in which the swap memory is highly fragmented. We summarize our contributions as follows:
Identification of the SMF issue. Previous studies have focused on swap-out operations to accelerate the swap mechanism. In contrast, we present an in-depth analysis of the swap memory space and its side effects. Our analysis reveals that the swap memory space suffers from excessive swap-in and swap-out operations, resulting in performance degradation during execution.
Design and development of VSwap. We designed VSwap, which efficiently addresses the SMF issue with negligible extra overhead. We also implemented VSwap as an extension to the original swap mechanism that is generic enough to support any computing device (e.g., smartphones, tablets, and edge IoT devices).
Comprehensive evaluation. We conducted comprehensive evaluations on synthetic and real-world workloads to compare VSwap with the traditional swap mechanism (i.e., the baseline). We then performed a top-down analysis to clearly confirm the effectiveness of VSwap.
The remainder of this paper is organized as follows. After diving into the traditional swap mechanism and the motivation for our work in Section 2, Section 3 proposes VSwap to mitigate the performance drop caused by the SMF issue. The evaluation results of VSwap are shown in Section 4, and related work is given in Section 5. Finally, Section 6 concludes this research.
3. Design and Implementation
In this section, we introduce VSwap, a novel extension of the swap mechanism designed to optimize the swap memory space in computing environments that require more memory capacity than the physical memory provides. The design goals of VSwap are to (1) instantly identify fragmented swap clusters in the swap memory space and (2) migrate valid swap pages scattered across multiple clusters into a new cluster to secure available free clusters in advance. The challenges in achieving these goals are threefold. First, data migration in the swap memory space incurs extra I/O operations on the underlying storage device. Second, the migration must run without interfering with the execution of latency-sensitive applications. Third, the solution should operate transparently to the kernel's memory subsystem with only simple code modifications.
3.1. Basic Operations with Two Auxiliary Tables
As VSwap is an extension of the traditional swap mechanism, it follows the basic rules of the page allocation and reclamation procedures in the swap memory space. In our extension, we slightly modify the swap-in and swap-out operations to update two auxiliary tables: a reverse Swap-to-PTE (S2P) map table and a fragment table.
Figure 2 shows the overall architecture of VSwap and its internal operations. In the figure, the black circled numbers (➊, ➋) indicate the swap-out flow handled by kswapd, while the white circled numbers (①–④) represent the subsequent swap-in and VSwap flow that performs background defragmentation during idle periods.
Reverse S2P map table: This table maps the offset of a swapped page within the swap memory space (i.e., the swap offset) to the address of the page table entry dedicated to that page (i.e., the PTE address). It plays an important role in updating the swap offset stored in the PTE after the virtual migration; we discuss this in detail in the next section. For implementation, we modified the swap-out operation to insert a pair (swap offset, PTE address) into the S2P table. For example, when page 0 is swapped out to the swap memory space inside the secondary storage media, VSwap records the pair (0, 1FF32B) in the S2P map table (➊–➋ in Figure 2).
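For illustration, a minimal C sketch of such a reverse map is shown below, assuming a flat array indexed by swap offset; the names (s2p_entry, s2p_record) and layout are illustrative rather than taken from the actual VSwap code:

```c
/* Minimal sketch of the reverse S2P map (hypothetical names): one
 * entry per swap page, filled in on the modified swap-out path and
 * consulted later during address remapping. */
#include <stdint.h>

#define SWAP_PAGES (1024 * 512)       /* 2 GB swap / 4 KB pages */

struct s2p_entry {
    uint64_t swap_offset;             /* where the page lives in swap */
    uint64_t pte_addr;                /* PTE that references the page */
};

static struct s2p_entry s2p_table[SWAP_PAGES];

/* Hooked into swap-out: remember which PTE points at the new slot. */
static inline void s2p_record(uint64_t swap_offset, uint64_t pte_addr)
{
    s2p_table[swap_offset].swap_offset = swap_offset;
    s2p_table[swap_offset].pte_addr   = pte_addr;
}
```

With this layout, the table costs 16 bytes per swap page, which matches the 8 MB figure reported in Section 4.5.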
Fragment table: This table is designed to quickly find fragmented clusters in the swap memory space. To do so, VSwap monitors the fragmented ratio of each cluster, i.e., how many swap pages within the cluster remain in a "valid" state. Whenever a swap page is swapped in from the swap memory space to the host DRAM, VSwap updates the ratio of the targeted cluster to reflect the invalidation of the corresponding swap page. A lower ratio indicates that fewer swap pages in the cluster are in a "valid" state. VSwap inserts a cluster into the fragment table if its ratio drops below a predefined threshold (e.g., 5% of the total number of swap pages in a cluster). For example, if the fragmented ratio of cluster 0 drops below the threshold after completing swap-in operations for swap pages 509 and 511 in order, VSwap inserts cluster 0 into the fragment table (①–② in Figure 2), because the cluster then holds only two valid swap pages (offsets 0 and 510). Note that if the fragmented ratio of a cluster reaches 0, meaning all swap pages in the cluster are marked "invalid", VSwap removes the cluster from the fragment table, because the kernel's memory subsystem reclaims the whole swap memory space dedicated to that cluster for later reuse.
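As an illustration, a per-cluster count of valid pages plus a singly linked list is enough for this bookkeeping. The sketch below uses hypothetical names and elides locking and the list-removal path:

```c
/* Sketch of the fragment-table bookkeeping (illustrative names): each
 * cluster tracks its valid-page count; once the count falls to the
 * threshold, the cluster is linked into the fragment list. */
#define PAGES_PER_CLUSTER 512
#define FRAG_THRESHOLD (PAGES_PER_CLUSTER * 5 / 100)   /* 5% -> 25 pages */

struct swap_cluster {
    unsigned int valid_pages;          /* pages still in "valid" state */
    struct swap_cluster *next_frag;    /* link in the fragment table */
    int on_frag_list;
};

static struct swap_cluster *fragment_table;

/* Hooked into swap-in: one page of this cluster was just invalidated. */
static void cluster_page_invalidated(struct swap_cluster *c)
{
    if (--c->valid_pages == 0) {
        /* Fully invalid: the kernel reclaims the whole cluster, so it
         * is dropped from the fragment table (removal elided). */
        return;
    }
    if (!c->on_frag_list && c->valid_pages <= FRAG_THRESHOLD) {
        c->next_frag = fragment_table;  /* push onto the list */
        fragment_table = c;
        c->on_frag_list = 1;
    }
}
```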
3.2. Virtual Swap Operations
Now, we discuss the key idea, called virtual migration, and how it tackles the challenges mentioned above. The procedure of the virtual migration is illustrated in Algorithm 1.
Virtual migration: To efficiently migrate valid pages in the swap memory space, we designed virtual migration, a simple and transparent swap migration mechanism. In VSwap, the virtual migration leverages two key swap operations together with the auxiliary tables above: a vswap-in that reads a swap page from one of the clusters listed in the fragment table, and a vswap-out that writes the data of that page into a different cluster in the swap memory space. We implement both operations atomically to ensure data consistency (see Algorithm 1, line 7). The swap-in operation for migrated swap pages must be transparent to upper layers. To achieve this, VSwap scans the reverse S2P table for the corresponding PTE address and updates the swap offset in that PTE (see Algorithm 1, line 8). This process, called address remapping, enables the kernel to access the page at the new swap offset during a page fault without additional handling. In other words, when a page fault occurs on the swapped page, the kernel's memory subsystem simply reads the contents of the page from the new, migrated swap offset instead of the original one.
Algorithm 1 Pseudocode of the virtual migration

 1: // this function is started or stopped by the I/O monitor
 2: function virtual_migration(fragment_table, S2P_table, reserved_cluster)
 3:   for each fragmented_cluster in fragment_table do
 4:     for each valid_page in fragmented_cluster do
 5:       new_page ← get_swap_page(reserved_cluster)
 6:       PTE_addr ← S2P_table[valid_page]
 7:       virtual_swap_in_out(valid_page, new_page)
 8:       update_PTE(PTE_addr, new_page)
 9:       swap_free(valid_page)
10:     end for
11:     status ← check_I/O_monitor()
12:     if status is busy then
13:       return
14:     end if
15:   end for
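To illustrate the address remapping step (Algorithm 1, line 8), the toy model below rewrites a swapped-out PTE to the migrated offset. The encoding here is a deliberately simplified stand-in for the kernel's real swap-entry macros, so treat it as a sketch of the idea rather than kernel code:

```c
#include <stdint.h>

#define SWAP_PRESENT 0x1ULL   /* model flag: PTE refers to a swap slot */

/* Simplified model of a swapped-out PTE: offset shifted over a flag. */
static inline uint64_t make_swap_pte(uint64_t swap_offset)
{
    return (swap_offset << 1) | SWAP_PRESENT;
}

/* pte_addr comes from the reverse S2P lookup (Algorithm 1, line 6).
 * After this store, a page fault on the virtual address walks to this
 * PTE and transparently reads the page from the new offset. */
static void update_PTE(uint64_t *pte_addr, uint64_t new_offset)
{
    *pte_addr = make_swap_pte(new_offset);
}
```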
Efficient data placement for virtual migration: Unfortunately, extra I/O overhead is inevitable while running the virtual migration, since the migration must read swap pages from their original locations and write them to new ones. We mitigate this overhead through careful data placement based on the following hardware and software characteristics. (1) We exploit the fact that sequential writes are much faster than random writes; this reduces seek and rotation time in HDDs and garbage collection overhead in SSDs. To take advantage of this, VSwap assigns the swap pages used in the virtual migration in a log-structured fashion, so the migrated swap pages have consecutive swap offsets in a cluster and are written sequentially. (2) We also consider the "hotness" (i.e., swap-in likelihood) of swap pages. Swap pages that remain in fragmented clusters for a long time have a low probability of being swapped in soon, so it is valuable to gather them in the same cluster in the swap memory space, which reduces the possibility of future fragmentation. To benefit from both characteristics, VSwap tries to place as many swap pages in the same cluster as possible.
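A minimal sketch of this log-structured placement, assuming each reserved cluster hands out its slots append-only (names are illustrative):

```c
/* Append-only slot assignment inside a reserved cluster, so migrated
 * pages receive consecutive swap offsets and vswap-out becomes one
 * sequential write (illustrative layout, not the kernel structures). */
#define PAGES_PER_CLUSTER 512

struct cluster {
    unsigned long base_offset;   /* first swap offset of the cluster */
    unsigned int  next;          /* next unused slot, only grows */
};

/* Returns the next consecutive swap offset, or -1 if the cluster is full. */
static long take_slot(struct cluster *c)
{
    if (c->next >= PAGES_PER_CLUSTER)
        return -1;
    return (long)(c->base_offset + c->next++);
}
```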
I/O monitoring for low interference: To address the second challenge, we designed an I/O monitor that continuously tracks the I/O operations issued by applications and decides whether to start migrating in the background (③–④ in Figure 2). To do so, the monitor periodically collects the read and write operations issued by applications to the underlying storage device. The I/O monitor wakes up at the end of each time quantum (e.g., 5 s) so that it runs with minimal interference. The monitor computes a metric that indicates the total number of I/O requests accumulated within the specified time period. If the metric drops below a predefined threshold, VSwap recognizes an idle state, and the I/O monitor triggers the virtual migration (④ in Figure 2). The monitor also controls whether the migration continues. If the metric rises above the threshold during migration, the I/O monitor changes the state from "idle" to "busy" to terminate the running migration (see Algorithm 1, lines 12–14). The I/O monitor is a key component of VSwap, designed to ensure transparency to high-priority applications by deferring migration to idle periods. As a result, VSwap minimizes interference with foreground I/O workloads and maintains stable overall performance.
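The idle detection can be pictured as follows; the counter source, the threshold value, and all names are assumptions for illustration, not the actual implementation:

```c
/* Sketch of the I/O monitor's periodic tick (hypothetical names): the
 * tick runs once per quantum (e.g., 5 s), samples how many block I/O
 * requests applications issued since the last tick, and flips between
 * "idle" and "busy" to start or stop the virtual migration. */
#include <stdatomic.h>

#define IDLE_THRESHOLD 64UL        /* assumed max requests per quantum */

static atomic_ulong io_requests;   /* bumped on every read/write request */
static unsigned long last_sample;

enum io_state { IO_IDLE, IO_BUSY };

static enum io_state io_monitor_tick(void)
{
    unsigned long now = atomic_load(&io_requests);
    unsigned long delta = now - last_sample;

    last_sample = now;
    return (delta < IDLE_THRESHOLD) ? IO_IDLE : IO_BUSY;
}
```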
Reserved clusters: To efficiently support the virtual migration, VSwap pre-reserves a pair of clusters in the swap memory space, called reserved clusters (RC) 0 and 1 (see Figure 3). As illustrated in Figure 3, the virtual migration proceeds in two steps: (1) vswap-in loads valid swap pages from fragmented clusters into the swap cache, and (2) vswap-out writes them sequentially into the reserved clusters (RC0 or RC1). Once RC0 becomes full, VSwap alternates to RC1 to continue the migration without interruption. VSwap first assigns one side of the pair (i.e., RC0) when a swap page is required by the virtual migration and ensures that one side always has available swap memory space. When all swap pages within the assigned RC0 are used, VSwap quickly assigns the other side of the pair (i.e., RC1) and converts RC0 into a normal cluster so that the contents of its swapped pages can be read later by a swap-in operation. VSwap then allocates another available cluster as a new reserved cluster (i.e., a new RC0) so that a pair of clusters is always maintained.
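Continuing the placement sketch above, the RC0/RC1 double buffering might look like the following, where alloc_free_cluster, make_normal_cluster, and take_slot are hypothetical helpers standing in for the free-cluster pool, the cluster state change, and the append-only allocator from the earlier sketch:

```c
/* Double-buffered reserved clusters (illustrative): migration consumes
 * slots from the active RC; when it fills, the RC is demoted to a
 * normal cluster, a fresh one is reserved, and the standby RC takes
 * over, so a pair of reserved clusters is always maintained. */
struct cluster;                                /* opaque handle */
extern struct cluster *alloc_free_cluster(void);
extern void make_normal_cluster(struct cluster *c);
extern long take_slot(struct cluster *c);      /* -1 when full */

static struct cluster *rc[2];                  /* RC0 and RC1 */
static int active;                             /* index of RC in use */

static long rc_get_swap_slot(void)
{
    long slot = take_slot(rc[active]);

    if (slot < 0) {
        make_normal_cluster(rc[active]);       /* readable by swap-in */
        rc[active] = alloc_free_cluster();     /* keep the pair intact */
        active ^= 1;                           /* switch to standby RC */
        slot = take_slot(rc[active]);
    }
    return slot;
}
```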
4. Evaluation
In this section, we present the performance of VSwap on synthetic and real-world workloads and summarize the benefits of virtual migration. We evaluated VSwap to answer the following questions:
Can VSwap accelerate swap processing while the underlying swap memory space ages?
How many free clusters can VSwap maintain compared to traditional swap mechanisms?
What are the tradeoffs in VSwap design?
4.1. Experimental Setup
We conducted all experiments on a machine equipped with a 16-core Intel i9-12900KF CPU, 32 GB of DRAM, and a 250 GB Samsung 870 EVO SSD. To enable the swap mechanism, we allocated a contiguous 2 GB region of the SSD as the swap memory space. The swap memory space contains 1024 clusters, each of which is composed of 512 swap pages of 4 KB. To measure the performance of the swap mechanism, we pre-consumed almost all pages in the physical memory pool on the host DRAM with dummy data, leaving only 500 MB of available free space. Thus, the swap mechanism is triggered after 500 MB of memory has been consumed.
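These numbers are mutually consistent; as a quick check of the stated geometry:

$$512 \times 4\,\mathrm{KB} = 2\,\mathrm{MB}\ \text{per cluster}, \qquad \frac{2\,\mathrm{GB}}{2\,\mathrm{MB}} = 1024\ \text{clusters}.$$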
We implemented a prototype of VSwap in the Linux kernel (version 5.15) on Ubuntu 20.04, using the default kernel configuration with minimal code modifications for integration. The fragmentation threshold was empirically set to 5%; a higher threshold may cause excessive page migration, leading to unnecessary I/O overhead, while a lower threshold can reduce the effectiveness of defragmentation. We compared VSwap with two approaches: Ideal and Base. We define Ideal as the case in which the available swap memory space is always sufficient to allocate new swap pages, so the swap mechanism remains lightweight and efficient. To achieve the Ideal condition, we allocated 100 GB as the swap memory space, unlike Base and VSwap. In contrast, Base refers to the traditional swap mechanism.
For a comprehensive evaluation, we used two synthetic workloads and four real-world workloads. For the synthetic workloads, we utilized the synthetic benchmark described in Section 2.2 and pmbench [28], which simulates practical and intensive memory accesses. We also compared VSwap on memory-intensive real-world workloads: Memcached [29] with YCSB-A [30], Metis [31,32] with Word Count and Page View Count, and MobileNet for AI workloads. In each test, we compare VSwap against Ideal, whose performance is normalized to 1. We executed all experiments five times, report the average as the performance result, and inserted time intervals to create an idle state between experiments.
4.2. Synthetic Workloads
We first examine the performance of swap-out operations using the synthetic benchmark (see Figure 1 in Section 2.2), because this benchmark intuitively exposes performance and memory consumption. Figure 4 shows the normalized memory allocation time and the number of free clusters within the swap memory space. In Figure 4a, the x-axis represents the accumulated allocation size in the logical time order of the benchmark, where a 1 GB allocation is repeated 50 times. As shown in Figure 4, Ideal shows the best performance and a good free cluster ratio, even though the benchmark requires intensive memory allocation. Although the number of free clusters decreases over time, this never affects the latency of memory allocation because Ideal always allocates free clusters without the scanning operations on the swap map described in Section 2.1.
In Figure 4, VSwap shows performance trends similar to those of Ideal, although it has only 2 GB of swap memory space; the performance up to 29 iterations is nearly identical. Figure 4b explains the reason behind this similarity: VSwap quickly recovers free clusters under the memory-intensive workload by using the virtual migration operations, so it can offer free clusters without the scanning operations. Unfortunately, VSwap shows a drop in the free cluster ratio after 400 s, because reclaiming free clusters becomes limited as the accumulated valid pages grow over time. Despite this limitation, Figure 4b also confirms that the virtual migration contributes significantly to performance improvements. VSwap improves performance by up to 48.18% compared to Base. This performance improvement has an additional implication: the sequential writes also have a positive impact on the I/O performance of the underlying storage device. It is important to note that this gain is achieved despite the I/O monitoring and migration overheads introduced by VSwap, because the advantage of maintaining sequential write patterns through the virtual migration clearly outweighs the minor cost of these background operations.
To further confirm the effectiveness of VSwap, we also performed an evaluation based on pmbench, a benchmark widely used to profile paging performance under fault-intensive memory operations. Figure 5 shows the evaluation results for pmbench. As shown in Figure 5a, VSwap performs well, close to Ideal, on both the Linear and Uniform workloads. In particular, it improves performance by up to 4.47% on average compared with Base on the Uniform workload. The performance gap is smaller than that of Figure 4 because pmbench continuously requests memory space, and the used memory space is reclaimed during the idle time before the next benchmark run (see Figure 5b,c). Such a pattern gives Base opportunities to obtain free clusters without the scanning operations because reclamation is readily available. Note that VSwap still maintains more available clusters than Base. In Figure 5c, VSwap secures up to 26.85% more free clusters, which leads to a performance gain. These results demonstrate that VSwap mitigates the SMF issue by continuously reclaiming fragmented clusters through the virtual migration, as indicated by the increased free cluster ratios in Figure 5b,c.
4.3. Real-World Workloads
To produce more meaningful test results, we compared VSwap on real-world benchmarks: Memcached with YCSB, and Metis.
For the evaluation, we first selected Memcached with the YCSB benchmark, one of the standard frameworks for evaluating performance while consuming memory space. Figure 6 shows the results of running the YCSB load phase and the YCSB-A workload, a 50/50 mix of reads and updates with a Zipfian distribution. For better understanding, Figure 6a shows the average performance of YCSB load and YCSB-A separately. As shown in Figure 6a, the performance gain for YCSB load is larger than that for YCSB-A; VSwap improves overall performance by up to 9.36% and 6.26% compared to Base for YCSB load and YCSB-A, respectively. This is because the YCSB load phase performs sequential writes to record the initial data in the database (recordcount = 1,000,000), while YCSB-A is a mixed workload (50% reads and 50% updates). Meanwhile, Figure 6b shows the trends of the free cluster ratios when running YCSB load and YCSB-A consecutively. Unlike the synthetic workloads, the free cluster ratios in Figure 6b fluctuate significantly under the YCSB workload; at times, the gap between VSwap and Base exceeds 75%. The reason why the performance gain is not as large as the gap in the free cluster ratio is twofold. First, VSwap rarely has the opportunity to run the virtual migration because YCSB-A constantly issues I/O requests; recall that the virtual migration only runs in the idle state to avoid interfering with application performance. Second, since read and update operations can be served from the DRAM page cache, the swap mechanism is involved less often, which limits the performance gain.
Furthermore, we performed additional experiments using Metis, a multi-core MapReduce library; it allocates, reallocates, and deallocates numerous memory chunks to efficiently handle its three internal phases (i.e., Map, Reduce, and Merge) [31,32]. To compare Base with VSwap, we ran two workloads in Metis: (1) Word Count with 18 threads and a 300 MB dataset, and (2) Page View Count with 4 threads and a 500 MB dataset. Figure 7 shows the average performance and free cluster ratios over five consecutive experiments. As shown in Figure 7a, VSwap improves performance by 10.32% in Word Count and 8.21% in Page View Count on average across multiple runs. In addition, Figure 7b,c clearly show that VSwap eagerly reclaims swap pages that are no longer used during idle time. This eager reclamation has a positive impact on memory allocation performance because it reduces the scanning time for the swap map. In other words, VSwap quickly assigns an empty swap cluster by securing as many free clusters as possible in advance, before memory allocation requests arrive on the fly.
4.4. Real-World AI Workload
In general, AI workloads require and consume a vast volume of memory to perform training and inference [33]. To confirm the performance benefit of VSwap, we additionally evaluated VSwap on the MobileNet V2 model, a representative AI workload for mobile devices. For the evaluation, we employed the CIFAR-10 dataset, a benchmark widely used in machine learning and computer vision to evaluate image classification models. In this evaluation, the MobileNet V2 model only performs inference on 1,280 images, without any training, to emulate an edge IoT or mobile environment [6,33].
To measure the performance of the inference process, we executed MobileNet five consecutive times with 15 s intervals. Figure 8 shows the average performance normalized to Ideal and the free cluster ratios for MobileNet. As shown in Figure 8a, VSwap achieves an effect similar to the Ideal case and improves overall performance over Base by up to 27.35% on average. Figure 8b explains this benefit: MobileNet periodically requires CPU resources to perform inference, resulting in I/O-idle periods. This implies that VSwap efficiently reclaims free clusters by migrating the valid swap pages scattered across clusters into its reserved clusters (i.e., RC0 or RC1) during the inference time.
Overall, VSwap shows high performance on real-world workloads spanning server and edge IoT environments. We attribute the performance gains of VSwap across diverse memory-intensive workloads to the virtual migration and its storage-friendly write patterns.
4.5. Limitations and Tradeoffs
The overhead of VSwap can be categorized into two parts: memory space and performance overhead. First, VSwap requires extra memory space to maintain the auxiliary tables (i.e., the S2P and fragment tables). We used 25 KB of linked-list pointers to implement the single fragment table: 16 bytes were allocated for the list_head, and a total of 24 KB was used to assign 8-byte pointers to each cluster. When the fragmented ratio of a cluster drops below the predefined threshold, a pointer in the fragment table is updated to point to the next fragmented cluster. We also used a total of 8 MB of memory to maintain the S2P tables: the 2 GB swap memory space holds a total of 524,288 pages, and each swap page additionally consumes 16 bytes to locate the corresponding PTE address.
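The 8 MB S2P figure follows directly from the configuration in Section 4.1:

$$\frac{2\,\mathrm{GB}}{4\,\mathrm{KB/page}} = 524{,}288\ \text{pages}, \qquad 524{,}288 \times 16\,\mathrm{B} = 8\,\mathrm{MB}.$$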
In terms of CPU overhead, VSwap introduces only a small processing load because the I/O monitoring and migration tasks are lightweight and event-driven. Note that the I/O monitor is triggered periodically (every five seconds) to collect block request counters. Most of the migration time is spent on I/O transfers during vswap-in/out operations rather than on CPU computation. Finally, the migration is triggered only during idle periods, ensuring negligible interference with user applications.
Meanwhile, VSwap never triggers the virtual migration during busy times, when I/O operations to the underlying storage device are ongoing, because doing so could hurt the performance and latency of running applications. Fortunately, most computing systems have their own scheduling policies to efficiently share limited hardware resources, so an idle state periodically occurs in real-world systems.
5. Related Work
This section discusses prior studies on swap mechanisms for various devices and fragmentation mitigation techniques.
Edge device. As applications demand more memory, an efficient swap mechanism has become essential in edge devices [10,11,14]. SmartSwap [11] predicts rarely used applications on mobile devices and performs process-level early swap-out of their pages. MemSaver [14] records event-based hot-page history on mobile devices and selectively swaps out the other pages of background applications. SWAM [10] selects victim pages for swap-out based on their application access or reference history, and then decides between a storage-based slow path and a compressed-memory-based fast path according to the characteristics of the victim page.
Storage device. To address the performance and lifetime issues of storage devices, many studies have focused on an efficient swap mechanism [2,8,15]. IOSR [15] addresses swap memory space pollution by employing process-oriented allocation, which predicts the swap page count of each process and allocates space accordingly rather than in cluster units. DISS [8] improves swap efficiency on storage devices by invalidating unused swap pages in application-specific regions, which reduces storage space overhead and I/O interference. Wang et al. [2] mitigate flash lifetime degradation caused by random writes in swap I/O by buffering and compressing swap pages in groups.
Fragment mitigation. Fragmentation is a traditional issue in systems that operate with large write units (e.g., blocks or segments), as partially valid data within these units prevents efficient space reclamation [8,18,34,35]. To mitigate fragmentation, many systems employ various data relocation techniques that compact valid data and free up contiguous space. For example, SSDs perform internal garbage collection to reclaim space by relocating valid pages within a block [8,18]. ZNSwap [18] extends this concept to the kernel level, relocating valid swap data at the zone level to meet the sequential write constraints of Zoned Namespace (ZNS) SSDs. F2FS [34] performs segment cleaning by moving valid data from low-utilization segments to newly allocated contiguous space to improve write efficiency. Similarly, RocksDB [35], a database based on the Log-Structured Merge (LSM) tree, performs compaction to merge duplicate or obsolete data from upper levels into lower levels, thereby reducing fragmentation and improving storage efficiency.
In summary, prior research on swap mechanisms has mainly focused on optimizing page selection or allocation efficiency, while fragmentation within the swap memory space has remained largely unaddressed [2,3,5]. Edge-device-level techniques (e.g., SmartSwap [11], MemSaver [14], DISS [8], IOSR [15]) improve swapping policies or space utilization, but they cannot compact fragmented swap regions once fragmentation occurs. Fragmentation mitigation methods at the storage or file system layer (e.g., F2FS [34], RocksDB [35], ZNSwap [18]) perform compaction on data blocks rather than swap clusters, and thus cannot directly resolve swap-space fragmentation at the kernel level. In contrast, VSwap bridges this gap by introducing a kernel-level compaction mechanism that proactively mitigates swap memory fragmentation through virtual migration and address remapping.
6. Conclusions
Today, applications that consume ever larger volumes of memory are increasing in various environments, such as IoT and embedded systems. In this paper, we introduced the swap mechanism, which dynamically extends memory space using the storage device, and explored the swap memory fragmentation (SMF) issue, which leads to unacceptable performance collapse. We then proposed VSwap as an extension of the traditional swap mechanism to address the challenges posed by the SMF issue. To confirm the effectiveness of VSwap, we conducted comprehensive evaluations with synthetic and real-world workloads. Our evaluation results clearly show that VSwap achieves performance improvements of up to 48.18% and 27.35% over the traditional swap mechanism for synthetic and real-world workloads, respectively. Through performance analysis, we found that the virtual migration of VSwap accounts for a significant portion of this improvement by harvesting as many free clusters as possible in advance. Finally, we believe that VSwap can guide a new direction of performance improvement for various systems (e.g., servers, tablets, and IoT devices) that suffer from memory pressure and hardware limitations. Although VSwap introduces only a small amount of memory overhead due to the additional S2P and fragment tables, this overhead grows proportionally with larger swap configurations. As future work, we plan to design a more memory-efficient S2P structure to improve scalability in resource-constrained environments.