4.1. Experimental Setup
To evaluate our proposed SSD management scheme, we collected workload traces from real applications running on a commercial SSD. All experiments were conducted on a Samsung T5 Portable SSD (1 TB) connected to a workstation with an Intel® Core™ i7-9700K CPU, 32 GB RAM, and Red Hat Enterprise Linux 7.9 (kernel 3.10.0-1160).
For each workload, tracing was activated on the target SSD before launching the application under test. We then started the corresponding application—such as MySQL [26], Cassandra [27], MongoDB [28], RocksDB [29], SQLite [30], Dbench [31], or the Varmail workload model—and generated load using its appropriate benchmarking tool. Specifically, sysbench [32] was used to drive MySQL, YCSB [33] to exercise Cassandra, MongoDB, and RocksDB, Filebench to run the Varmail workload, and the Phoronix Test Suite to run SQLite and Dbench. As these applications executed under load, blktrace [34] recorded all resulting block-layer I/O events, and blkparse later translated these logs into text-based traces. When file-level semantics were required, ftrace [35] output was analyzed in parallel. This process ensured that each trace reflected authentic, application-generated storage behavior.
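To replay such traces in the simulator described below, each record must be converted into an internal I/O event. The following C sketch shows one way to do this, assuming the blkparse output has been post-processed into lines of the form "timestamp sector length op"; the trace_event structure and the line format are illustrative assumptions, not the simulator's actual interface.

#include <stdio.h>
#include <stdlib.h>

/* Illustrative replay event; the simulator's real structures may differ. */
struct trace_event {
    double             timestamp; /* seconds since the start of the trace  */
    unsigned long long sector;    /* starting LBA in 512-byte sectors      */
    unsigned int       length;    /* request length in sectors             */
    char               op;        /* 'R' = read, 'W' = write, 'D' = delete */
};

/* Parses one post-processed trace line such as "0.000123 123456 8 W".
 * Returns 1 on success, 0 on a malformed line. */
static int parse_trace_line(const char *line, struct trace_event *ev)
{
    return sscanf(line, "%lf %llu %u %c",
                  &ev->timestamp, &ev->sector, &ev->length, &ev->op) == 4;
}

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <trace-file>\n", argv[0]);
        return EXIT_FAILURE;
    }
    FILE *fp = fopen(argv[1], "r");
    if (!fp) {
        perror("fopen");
        return EXIT_FAILURE;
    }
    char line[256];
    struct trace_event ev;
    while (fgets(line, sizeof line, fp)) {
        if (parse_trace_line(line, &ev)) {
            /* enqueue ev into the event-driven simulation engine here */
        }
    }
    fclose(fp);
    return EXIT_SUCCESS;
}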
We developed an event-driven SSD simulator in C to evaluate our proposed GC-concurrency control mechanism under realistic, trace-driven workloads. The purpose of this simulator is not to model every firmware detail, but to accurately reproduce the timing, queuing, and channel-level scheduling interactions that determine how garbage collection competes with foreground I/O in a multi-channel SSD. Because commercial SSDs do not expose internal states such as page validity, channel occupancy, or GC progress, a simulation framework is required to isolate and measure the specific impact of concurrency control. Our model incorporates widely used SSD configuration parameters—including page size, block count, and NAND timing characteristics—drawn from prior studies on flash memory organization and garbage-collection behavior [4,36,37,38,39]. The key parameters used in our evaluation are summarized in Table 1.
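For illustration, the configuration parameters the simulator consumes can be grouped into a single record of the following shape; the struct and field names are our own shorthand, and no concrete values from Table 1 are reproduced here.

/* Illustrative simulator configuration record. Field names are ours;
 * the actual parameter values are those listed in Table 1. */
struct ssd_config {
    int    n_channels;       /* independent flash channels               */
    int    blocks_per_chip;  /* physical blocks per NAND chip            */
    int    pages_per_block;  /* pages per erase block                    */
    int    page_size_bytes;  /* NAND page size (e.g., 4 KB or 8 KB)      */
    double t_read_us;        /* NAND page read latency (microseconds)    */
    double t_prog_us;        /* NAND page program latency (microseconds) */
    double t_erase_us;       /* NAND block erase latency (microseconds)  */
    double t_sram_ns;        /* SRAM buffer access latency (nanoseconds) */
};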
The parameters used in the equation for determining the maximum number of concurrent GC channels were configured as follows: the lower utilization threshold was set to 0.5, the upper threshold to 0.85, the number of channels Nch to 8, and the maximum number of concurrently GC-active channels Gmax to 4 (i.e., Nch/2). These threshold values were chosen based on empirical observations and design considerations. The lower threshold of 0.5 ensures that garbage collection remains completely disabled while more than half of the SSD’s capacity is still available, allowing foreground I/O to proceed unimpeded during light to moderate usage. The upper threshold of 0.85 reflects the point at which GC becomes increasingly urgent, and full concurrency is needed to prevent space exhaustion. Between these thresholds, GC intensity ramps up smoothly to balance cleaning efficiency with user responsiveness. Gmax = 4 was selected to ensure that at least half of the system’s channels remain available to service user read/write requests at any time, thereby minimizing the risk of system-wide stalls caused by GC contention.
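As a concrete reading of this policy, the sketch below computes the per-utilization limit on GC-active channels; the macro names and the linear-ramp form are our assumptions rather than the paper's actual equation.

#include <math.h>

/* Thresholds and channel counts from Section 4.1; names are illustrative. */
#define U_LOW   0.50          /* GC stays disabled below this utilization */
#define U_HIGH  0.85          /* full GC concurrency at or above this     */
#define N_CH    8             /* flash channels in the simulated SSD      */
#define G_MAX   (N_CH / 2)    /* at most half the channels may run GC     */

/* Maximum number of channels allowed to perform GC at space
 * utilization u (0.0 .. 1.0), assuming a linear ramp between
 * the two thresholds. */
static int gc_channel_limit(double u)
{
    if (u < U_LOW)
        return 0;                              /* ample free space: no GC  */
    if (u >= U_HIGH)
        return G_MAX;                          /* urgent: full concurrency */
    double frac = (u - U_LOW) / (U_HIGH - U_LOW);
    int limit = (int)ceil(frac * G_MAX);       /* ramp from 1 up to G_MAX  */
    return limit < 1 ? 1 : limit;
}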
Table 2 outlines the latency parameters used in our SSD simulation model for page-level operations across different components, including SRAM and NAND flash memory. To determine SRAM access delay, we utilized timing data from a single-port, 16-bit SRAM implemented on a Xilinx Spartan-7 FPGA (model: xc7s100fgga676-1). The measurements were obtained using Xilinx Vivado 2021.2 design tools. For NAND flash memory timing, we referenced the Micron MT29F4G08ABADAH4 device model [40], which provides representative read and write delay specifications for modern flash memory.
4.2. Trace File Analysis
Table 3 summarizes the characteristics of the workload traces collected using the methodology described in Section 4.1. These traces represent a diverse set of real-world storage environments, including database systems (MySQL, Cassandra, MongoDB, RocksDB) and file-system benchmarks (SQLite, Dbench, Varmail). Each trace captures the actual block-I/O patterns generated by its corresponding application under load, including request type distributions, access locality, and temporal burstiness.
To further characterize the nature of the workloads used in our experiments, we analyzed the types of files accessed during trace execution.
Table 3 summarizes the distribution of file system entities involved in each workload. For several workloads, the proportion of certain categories is extremely small, and these categories are therefore marked as “n/a”.
To better characterize the workloads used in our experiments, we analyzed user-request patterns, as summarized in Table 4. This includes the proportions of read, write, and delete operations, along with their breakdown into sequential and random access types. The results reveal distinct I/O behaviors across applications—for example, some traces are dominated by sequential writes, while others involve frequent small random reads.
Table 5 complements this by detailing the request-size distribution by operation type and access pattern. Together, these statistics offer a comprehensive view of workload diversity, which is essential for evaluating the generality and effectiveness of our proposed GC scheduling scheme. For several workloads—including Dbench, MySQL, SQLite, and Varmail—the proportion of read requests is extremely small, as reported in Table 4. Because these traces contain too few read operations to yield statistically meaningful figures, the corresponding read-related entries in Table 4 and Table 5 are marked as “n/a” (not applicable). This notation indicates that a read-specific breakdown is not meaningful for those workloads and avoids misinterpretation of incomplete or unreliable values.
4.3. Experimental Results
We conducted a series of simulations using our C-based SSD simulator to evaluate the effectiveness of the proposed adaptive GC scheduling scheme. Specifically, we measured both average and maximum read/write latencies under a variety of real-world workload traces. To establish a meaningful comparison, we implemented three conventional GC strategies:
Conv_0_AllCh: GC is enabled for all channels from the very beginning of SSD operation (i.e., at 0% storage utilization).
Conv_0.7_AllCh: GC is postponed until the SSD reaches 70% storage capacity, at which point all channels may participate.
Conv_0.7_HalfCh: This variant follows the same 70% usage threshold but introduces a concurrency constraint, allowing at most half of the channels to perform GC operations simultaneously.
These configurations allow controlled evaluation of GC trigger timing and concurrency effects, enabling a focused and fair comparison with our proposed Adaptive GC method.
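The four configurations can be viewed as (trigger utilization, concurrency limit) pairs for the 8-channel simulated SSD, as sketched below; the struct and field names are illustrative rather than the simulator's actual interface.

/* Illustrative encoding of the GC policies compared in our evaluation. */
struct gc_policy {
    const char *name;
    double      trigger_util;    /* utilization at which GC may begin       */
    int         max_gc_channels; /* channels allowed to run GC concurrently */
    int         adaptive;        /* nonzero: limit scales with utilization  */
};

static const struct gc_policy policies[] = {
    { "Conv_0_AllCh",    0.00, 8, 0 },  /* GC from 0% utilization, all channels */
    { "Conv_0.7_AllCh",  0.70, 8, 0 },  /* GC deferred to 70%, all channels     */
    { "Conv_0.7_HalfCh", 0.70, 4, 0 },  /* GC deferred to 70%, at most half     */
    { "Adaptive",        0.50, 4, 1 },  /* ramps from 0.5 to full Gmax at 0.85  */
};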
Table 6 summarizes the measured average and maximum latencies for all configurations. For each workload, we compare the proposed method against the best-performing configuration among the three conventional GC schemes (Conv_0_AllCh, Conv_0.7_AllCh, and Conv_0.7_HalfCh). The average improvement factor was 4.86×. When excluding two extreme outliers (Cassandra’s average read latency and RocksDB’s maximum read latency), the improvement factor still remained substantial, with an average of 1.55× and a standard deviation of 1.17, highlighting the consistency and robustness of the proposed method across diverse workload characteristics.
As noted in Section 4.2, the Dbench, MySQL, SQLite, and Varmail traces contain too few read operations to produce statistically meaningful latency measurements; the corresponding read-latency entries in Table 6 are therefore marked as “n/a” (not applicable) rather than reported as unreliable values.
The proposed GC scheme delivered particularly strong performance gains for workloads such as Cassandra and RocksDB. To understand the cause, we examined each workload’s I/O behavior.
Figure 3 illustrates host I/O request patterns over time, and Figure 3a highlights Cassandra’s distinctive access pattern, marked by intermittent spikes in read and write activity separated by comparatively idle intervals. These idle periods provide ideal windows for GC to operate with minimal interference, which our adaptive scheduling effectively exploits.
To analyze this effect in more detail for the Cassandra workload, we examined key runtime metrics such as the host request rate, the number of GC operations executed, and the depth of pending I/O queues across the SSD system. Under the conventional GC strategy Conv_0_AllCh, Figure 4a shows that spikes in user requests coincide with GC activity, causing sudden increases in queue depth and degraded responsiveness. This contention is particularly harmful during sequential reads, where data is striped across channels and the slowest channel dictates overall completion time. In contrast, the proposed adaptive GC method dynamically limits the number of channels engaged in GC. As illustrated in Figure 4b, this results in more evenly distributed GC activity, which smooths queue-depth fluctuations and improves latency. By avoiding GC bursts and maintaining steady space reclamation, the system achieves superior responsiveness and stability.
Large sequential data is typically striped across multiple channels to maximize parallelism and throughput. While this enhances bandwidth, it also creates interdependence: a sequential read cannot be completed until all its segments are retrieved from the respective channels. When GC occupies all channels simultaneously—as is common in conventional schemes—this dependency leads to pronounced delays in sequential reads, as with the Cassandra workload. Our adaptive GC mechanism addresses this by throttling the number of GC-active channels based on current SSD usage. This ensures that a sufficient number of channels remain free to handle user I/O, significantly reducing the risk of system-wide stalls. As a result, sequential reads are more likely to complete without interruption, particularly under workloads with intermittent access patterns such as Cassandra. Although it remains possible that a GC-active channel may hold multiple segments of a sequential file, such worst-case overlaps are statistically rare in SSDs with high channel counts. Thus, the adaptive throttling approach not only smooths average latency but also robustly guards against extreme tail latencies, offering improved responsiveness under mixed and sequential workloads.
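To make this channel dependency concrete, the sketch below models the completion time of a striped sequential read as the maximum of its per-segment finish times, so a single GC-busy channel (with a later ready time) delays the whole request; the function and parameter names are illustrative, not taken from the simulator.

/* Completion time of a sequential read striped across n_segments channels.
 * ready_time[i] : time at which channel i is free to serve its segment
 *                 (later if that channel is currently performing GC)
 * read_latency  : per-segment NAND read plus transfer time
 * The request finishes only when its LAST segment finishes, so the
 * slowest channel dictates the overall completion time. */
static double striped_read_completion(const double *ready_time,
                                      int n_segments,
                                      double read_latency)
{
    double done = 0.0;
    for (int i = 0; i < n_segments; i++) {
        double t = ready_time[i] + read_latency;
        if (t > done)
            done = t;
    }
    return done;
}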
The MongoDB workload presented a particularly challenging case for the proposed GC scheme, showing higher latencies across all metrics compared with the best-performing conventional approaches. From the beginning of operation, MongoDB generates a continuous stream of interleaved read and write requests, with frequent delete operations overlapping write activity. Once SSD utilization exceeds roughly 70%, these deletions lead to a large number of invalid blocks and trigger intensive GC activity, as illustrated in Figure 5a. Under the Conv_0.7_AllCh configuration, this results in a sharp increase in write latency, as nearly all channels become occupied by GC tasks. However, since GC requests are prioritized ahead of user reads in this configuration, read latency remains comparatively lower despite the heavy background activity.
In contrast, the proposed adaptive GC method intentionally limits the number of channels engaged in GC at any given time. While this design successfully prevents full-channel blocking, it also means that GC extends over a longer duration. In workloads like MongoDB, where write and delete operations occur continuously, this prolonged GC overlap with user I/O increases queue depth and response times. Consequently, both read and write latencies rise as the SSD remains under sustained mixed-load pressure.
Figure 5b reflects this behavior, showing how the adaptive scheme trades immediate throughput for more evenly distributed—but ultimately heavier—GC interference in continuously active workloads. This illustrates a key trade-off: by preventing all-channel GC activation, our method avoids system-wide contention, yielding significantly improved latency stability and predictability.
We also examined how the SSD’s available storage capacity evolves over time under different GC strategies, as shown in Figure 6.
Figure 6a plots the number of empty blocks at each time interval, where a block is only counted as empty after its garbage collection process is fully completed. Until GC is executed, blocks selected for reclamation remain marked as occupied.
As expected, Conv_0_AllCh shows the slowest decline in storage availability because it initiates GC immediately and aggressively, reclaiming space as soon as possible. This behavior is confirmed in Figure 6b, which illustrates the number of GC operations executed. In contrast, the proposed adaptive GC scheme, like the other two threshold-based methods—Conv_0.7_AllCh and Conv_0.7_HalfCh—begins reclamation only after the SSD reaches a designated utilization threshold. However, the key distinction is in how GC proceeds after activation. Whereas Conv_0.7_AllCh and Conv_0.7_HalfCh exhibit steep drops in the number of empty blocks, followed by sudden surges—reflecting large bursts of simultaneous GC activity across many channels—the proposed scheme maintains a smooth, gradual decrease in available space. These abrupt oscillations in the conventional approaches result from initiating GC across all or half the channels simultaneously, leading to short-lived but intensive space reclamation events.
By contrast, the proposed adaptive method evenly distributes GC activity over time, preventing abrupt shifts and maintaining a stable, consistent pattern of space recovery. This not only aligns GC execution with actual storage pressure but also helps to avoid performance degradation caused by sudden bursts of background activity.
Overall, this behavior demonstrates a key advantage of the proposed scheme: it avoids inefficient, burst-driven GC, supports continuous background cleaning, and enables more predictable and stable SSD performance across varying workloads and usage levels.
In summary, these results highlight several strengths of the proposed scheme:
It smoothly adapts GC intensity to usage conditions, reducing both average and tail latency.
It avoids channel-wide stalls by limiting concurrent GC, preserving throughput for foreground tasks.
It proves especially beneficial under burst-driven and journal-heavy workloads, where latency spikes from conventional GC methods can be severe.
These findings suggest that dynamic, usage-aware GC scheduling is essential for maintaining SSD responsiveness across diverse workloads, and that a fixed-threshold approach may be insufficient—especially in multi-channel architectures.
To validate the parameter choices used in our GC-concurrency control model, we conducted a sensitivity analysis examining how different threshold values and GC-concurrency limits affect latency and GC activity.
Table 7 and Table 8 summarize the results obtained from varying the lower and upper utilization thresholds and Gmax across a broad range of configurations. These results confirm that the selected values—a lower threshold of 0.5, an upper threshold of 0.85, and Gmax = Nch/2 = 4—provide the most consistent trade-off between cleaning effectiveness and user-level responsiveness.
In addition, Table 9 presents an evaluation using a larger NAND page size (8 KB) to examine the robustness of the proposed GC-concurrency control under different flash-geometry configurations. Increasing the page size alters the granularity of write and GC operations, which reduces the overall frequency of GC events and consequently narrows the performance gap between the proposed method and the conventional schemes. While the proposed method generally maintains stable and predictable performance across workloads, the results also reveal important workload-dependent behavior. For MongoDB, the proposed approach continues to show slightly worse performance than the best conventional configuration, consistent with the observations made under the 4 KB page size. For RocksDB, the proposed scheme clearly outperforms the conventional approaches at 4 KB, but this advantage diminishes under the 8 KB geometry, where one of the conventional configurations produces lower average write latency. This shift appears to stem from the way RocksDB’s compaction bursts interact with the coarser page granularity, which reduces GC frequency and alters the balance between GC concurrency and foreground I/O.