Article

DirectFS: An RDMA-Accelerated Distributed File System with CPU-Oblivious Metadata Indexing

1 CRRC Nanjing Puzhen Co., Ltd., Nanjing 210031, China
2 College of Computer Science and Software Engineering, Hohai University, Nanjing 211100, China
3 College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(19), 3778; https://doi.org/10.3390/electronics14193778
Submission received: 18 August 2025 / Revised: 13 September 2025 / Accepted: 22 September 2025 / Published: 24 September 2025
(This article belongs to the Section Computer Science & Engineering)

Abstract

The rapid growth of data-intensive applications has imposed significant demands on the performance of distributed file systems, particularly in metadata operations. Traditional systems rely heavily on metadata servers to handle indexing tasks, leading to Central Processing Unit (CPU) bottlenecks and increased latency. To address these challenges, we propose Direct File System (DirectFS), a Remote Direct Memory Access (RDMA)-accelerated distributed file system that offloads metadata indexing to clients by leveraging one-sided RDMA operations. Further, we propose a range of techniques, including hash-based namespace indexing and hotness-aware metadata prefetching, to fully unleash the performance potential of RDMA hardware. We implement DirectFS on top of Moose File System (MooseFS) and compare DirectFS with state-of-the-art distributed file systems using a variety of workloads from Filebench v1.4.9.1 and MDTest (from the IOR suite v4.0.0). Evaluation results demonstrate that DirectFS achieves significant performance improvements for metadata-intensive benchmarks compared to other file systems.

1. Introduction

The explosive growth of data-intensive applications such as large-scale machine learning and real-time analytics has imposed significant pressure on the performance and scalability of distributed file systems (DFS) [1,2,3,4,5,6]. Traditional DFS architectures, primarily designed for conventional storage media and TCP/IP-based communication, are increasingly unable to meet the low-latency and high-throughput demands of modern workloads. In particular, metadata operations—such as file creation, lookup, and directory traversal—have emerged as critical bottlenecks in contemporary DFS environments [7,8,9,10,11,12].
In traditional distributed file systems, metadata management typically follows a server-active architecture, where one or more dedicated metadata servers are responsible for maintaining the global namespace and handling all metadata-related operations [1,2,3,4,5,6]. These include file and directory creation, pathname resolution, permission checks, and inode or dentry lookups. Clients do not traverse the namespace independently; instead, they issue metadata requests to the server through remote procedure calls (RPCs). The server actively parses the pathname, traverses the internal metadata structures, and returns the target metadata or a corresponding error code.
The advent of high-performance hardware technologies, including Non-Volatile Memory (NVM) and Remote Direct Memory Access (RDMA), presents a transformative opportunity to improve the metadata performance of DFS [13,14,15,16,17,18]. NVM, as a novel class of persistent storage hardware, combines the low latency and byte addressability of DRAM with the durability of traditional block devices [19]. It enables direct, fine-grained access to persistent metadata without the overhead of software-managed block layers. Meanwhile, RDMA, as a high-performance networking technology, allows direct memory access between nodes without kernel involvement or CPU interruption, drastically reducing cross-node communication latency and CPU overhead [19].
However, existing server-active DFS metadata architectures are fundamentally mismatched with the characteristics of NVM and RDMA. Designed for slow block-based storage and TCP/IP-based networking, the server-active metadata architecture often fails to unlock the potential of modern hardware due to several critical limitations.
First, metadata server CPU overhead becomes a dominant bottleneck. With traditional hardware, end-to-end latency is dominated by the network and storage devices, making server-side CPU latency negligible. However, with RDMA and NVM drastically reducing data transport and access latency, CPU tasks such as directory parsing and index traversal now represent the critical path. This shift causes server saturation even under modest metadata loads.
Second, cross-node interactions incur inefficiencies due to their reliance on RPC-based communication. RPC typically maps to two-sided RDMA verbs such as RDMA_SEND and RDMA_RECV [20,21]. These verbs require server-side CPU involvement to deserialize and process work requests, which incurs high software overhead. Consequently, systems that adopt RPC-style metadata handling over RDMA fail to fully leverage the low-latency, zero-copy semantics that one-sided RDMA operations offer.
Designing a client-active metadata path, however, is not straightforward. An intuitive approach is to expose raw metadata structures in remote memory and let clients traverse them directly using RDMA READs. Resolving file paths this way requires one round-trip time (RTT) per directory level, so latency grows linearly with depth. In deep directory trees, traversal with one-sided RDMA can therefore incur even higher latency than the RPC-based approach.
To solve these problems, we present DirectFS, a distributed file system with metadata path co-designed for RDMA and NVM. The core idea is to adopt a client-active architecture. Instead of relying on RPC and server-side processing, DirectFS allows clients to directly traverse metadata structures stored in remote persistent memory via one-sided RDMA operations. To reduce the overhead of deep directory resolution, we introduce a hash-based namespace index that narrows the search scope to a small subtree, significantly reducing the number of RDMA round-trips required for path lookup. In addition, a hotness-aware metadata prefetching technique is proposed to improve access locality in subtree traversal.
We implement DirectFS on top of MooseFS and evaluate it on a two-node Cloudlab testbed using both microbenchmarks and realistic workloads. Compared to baseline systems including MooseFS, Crail, and a naive RDMA variant, DirectFS achieves up to 2.3× higher metadata throughput and reduces RDMA I/O traffic by over 60%. DirectFS enables efficient metadata operations in distributed file systems, unlocking the full potential of RDMA and NVM technologies.

2. Background and Motivation

2.1. Metadata Management in Distributed File Systems

Metadata management is a critical component of distributed file systems, responsible for tracking file attributes, directory structures, and access permissions. Traditional DFS architectures typically employ a server-active model, where dedicated metadata servers handle all metadata operations [1,2,3,4,5,6]. For example, CephFS [3,22] uses a centralized metadata server, Ceph-MDS, to manage the global namespace. Clients interact with the metadata server through remote procedure calls (RPCs) to perform operations such as file creation, lookup, and directory traversal.

2.2. RDMA

Remote Direct Memory Access (RDMA) is a high-performance networking technology that allows direct memory access between nodes in a distributed system without involving the CPU or operating system kernel. This capability significantly reduces communication latency and CPU overhead, making RDMA particularly well-suited for data-intensive applications [23]. RDMA provides two types of operations: one-sided and two-sided verbs. One-sided verbs, such as RDMA READ and RDMA WRITE, allow a client to directly access the memory of a remote server without requiring the server to actively participate in the operation. Two-sided verbs, RDMA_SEND and RDMA_RECV, provide channel semantics similar to socket programming. Two-sided verbs require server-side CPU involvement to process requests, which can introduce significant overhead.
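For illustration, the following minimal sketch shows how a client posts a one-sided RDMA READ with libibverbs; the queue pair, the registered local buffer, and the server-advertised remote address and RKey are assumed to have been exchanged during connection setup.

```c
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

/* Post a one-sided RDMA READ of `len` bytes from the server-exposed
 * address `raddr` (protected by `rkey`) into a locally registered buffer.
 * The server CPU is not involved in completing this read. */
static int post_rdma_read(struct ibv_qp *qp, struct ibv_mr *mr,
                          void *local_buf, uint32_t len,
                          uint64_t raddr, uint32_t rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_buf,
        .length = len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr, *bad_wr = NULL;

    memset(&wr, 0, sizeof(wr));
    wr.wr_id               = 1;
    wr.opcode              = IBV_WR_RDMA_READ;   /* one-sided verb         */
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED;  /* request a completion   */
    wr.wr.rdma.remote_addr = raddr;              /* server-exposed address */
    wr.wr.rdma.rkey        = rkey;               /* server-exposed RKey    */

    return ibv_post_send(qp, &wr, &bad_wr);      /* 0 on success */
}
```

The client later reaps the completion with ibv_poll_cq; the remote CPU never executes any file-system code on this path.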

2.3. RDMA-Enabled Distributed File Systems

Many popular industrial distributed file systems have added RDMA support for higher performance. However, they treat RDMA only as a drop-in transport upgrade. For example, CephFS [3] and Crail [24] incorporate RDMA support through RDMA-based RPC libraries, i.e., Accelio [25] and DaRPC [21], respectively. GlusterFS [26] uses an RDMA library only for data communication between clients and servers.
In academia, several systems have explored deeper RDMA integration [13,14,15,16,17,18], but still fall short in fully unleashing the performance potential of RDMA hardware. Octopus [14] and NVFS [13] leverage one-sided RDMA and persistent memory to accelerate data operations, but retain a server-active model where the server CPU participates in directory lookups. Orion [17] offloads metadata handling only for simple operations. Assise [15] improves system performance by placing persistent memory near clients.

2.4. Limitations of Prior Art

Prior DFSes use server-active architecture for metadata handling. This architecture fails to unlock the potential of modern hardware due to several critical limitations:
High CPU overhead. Metadata server CPU overhead has become a dominant bottleneck in RDMA/NVM-enabled distributed file systems. In traditional environments, metadata requests are spaced out by high network and storage latency—often over 100 μs—so the server handles fewer requests per second, and its CPU load remains manageable.
However, RDMA and NVM reduce network and storage latency to the microsecond scale [19], allowing clients to issue metadata operations at much higher rates. This shift significantly increases the arrival rate of metadata requests, rapidly pushing metadata server CPU utilization toward saturation.
To illustrate this problem, we run a microbenchmark comparing two systems: MooseFS [27], which uses traditional TCP-based RPC, and Crail [24], which uses RDMA-accelerated DaRPC. A single client issues statfile operations using up to 128 concurrent threads, and we monitor the CPU utilization of the metadata server.
As shown in Figure 1, both systems exhibit increasing CPU usage with higher concurrency. However, Crail’s CPU utilization grows significantly faster, reaching over 80% at 32 threads, while MooseFS remains under 60%. This is because RDMA’s low latency shortens the inter-request interval, increasing pressure on the server’s CPU pipeline. Although Crail benefits from faster transport [24], its server still processes every metadata request via software, including message handling, index lookup, and response generation. This highlights that the low latency of RDMA and NVM cannot translate into end-to-end metadata scalability without a fundamental redesign that reduces or eliminates server-side CPU involvement.
Slow cross-node interaction. Second, cross-node interactions incur inefficiencies due to their reliance on RPC-based communication. RPC abstractions are typically built on two-sided RDMA verbs such as RDMA_SEND and RDMA_RECV [20,21], which require active CPU involvement at both sender and receiver. The remote metadata server must post receive buffers, deserialize and process requests, and post replies—operations that introduce substantial software overhead.
To understand the cost of cross-node interaction in RDMA-based metadata operations, we profile the latency breakdown of several CephFS [3] metadata operations over RDMA. As shown in Figure 2, communication-related tasks—including receive buffer preparation, serialization/deserialization, and request dispatch—dominate total latency. Even for simple stat operations, interaction overhead accounts for over 60% of the total latency, highlighting the inefficiency of server-involved metadata paths under RDMA.
Consequently, systems built atop two-sided RPC over RDMA remain CPU-bound and fail to take full advantage of RDMA’s low-latency, zero-copy characteristics. Only by shifting to pure one-sided RDMA metadata access, eliminating RPC and server CPU involvement, can distributed file systems achieve scalable, high-throughput metadata handling.
Inefficient access offloading. A naive solution to enable client-active architecture is to allow clients to directly traverse the server’s directory tree with one-sided RDMA verbs. However, this approach suffers from significant inefficiencies when dealing with deep directory structures. In such cases, clients must perform multiple RDMA READ operations to fetch each directory’s metadata sequentially, as directory traversal requires resolving each intermediate directory in the path. This process introduces multiple round-trip times (RTTs) between the client and the server, significantly increasing the overall latency for metadata operations.
We conduct a microbenchmark to illustrate this problem. We create a directory tree with varying depths and measure the time taken to resolve a file path at each depth using both a naive one-sided RDMA approach and a traditional RPC-based method. The results in Table 1 show that as the directory depth increases, the naive RDMA approach even exhibits higher latency than the RPC-based method, which only requires a single round-trip to resolve the entire path. This is because the naive approach incurs one RTT per directory level, leading to linear latency growth with depth.

3. Design

3.1. Overview

We propose DirectFS, a novel RDMA-accelerated distributed persistent memory file system. The core component of DirectFS is a client-active metadata indexing mechanism. Unlike traditional server-active metadata paths, DirectFS offloads all metadata resolution logic to clients, allowing direct access to remote metadata using one-sided RDMA operations without involving server CPUs. The architecture of DirectFS is illustrated in Figure 3.
The metadata plane of DirectFS consists of three key components: the Metadata Indexer, the RDMA Communication Module, and the Metadata Manager. These modules cooperate to support efficient, serverless metadata lookup via one-sided RDMA.
Metadata Indexer. The Metadata Indexer is responsible for processing metadata operation requests issued by the client, including both reads and writes. It first normalizes the input pathname (e.g., removing redundant slashes, resolving “.” and “..”) and classifies the operation into read or write categories. For write operations such as mkdir and create, the indexer forwards the request to the remote metadata server via the RPC interface, where it is handled by the Metadata Manager to ensure consistency and coordination.
For read operations such as stat or lookup, the indexer performs path resolution entirely on the client side using one-sided RDMA. Based on the pathname structure, it selects the appropriate resolution strategy, then computes the corresponding memory addresses and issues one-sided RDMA READs to the Metadata Mempool to retrieve the required directory entries.
RDMA Communication Module. The RDMA Communication Module is deployed on both the metadata server and client sides to enable direct, low-latency access to remote metadata. On the server side, this module is responsible for registering the metadata memory pool—residing in non-volatile memory—as RDMA-accessible memory regions. It exposes the base addresses and corresponding memory keys (RKey) to authorized clients, allowing remote one-sided access.
On the client side, the RDMA module manages connection establishment and maintains the necessary RDMA Queue Pairs (QPs) and completion queues. Once the RDMA connection is established between a client and a metadata server, clients can issue one-sided RDMA operations to access or update metadata stored on the server without involving the remote CPU or software stack. This module provides the foundational transport layer for the client-active metadata resolution path in DirectFS.
Metadata Manager. The Metadata Manager resides on the metadata server and is responsible for managing all metadata structures and processing client-issued write operations such as mkdir, create, and unlink. It maintains the global state of the Metadata Mempool stored in NVM, including a persistent allocation bitmap and an operation log. These structures support safe metadata allocation and provide crash consistency guarantees through write-ahead logging and recovery mechanisms.
In addition to persistent structures, the Metadata Manager also maintains runtime metadata in DRAM. Specifically, it uses an LRU list to monitor file and directory access frequency. This heat-tracking mechanism is used by the system to guide heat-aware metadata placement and optimize spatial locality for future RDMA access.
To fully unleash the performance potential of RDMA and NVM hardware, we further propose two optimizations to reduce RDMA RTTs and improve metadata access locality.
Hash-based Namespace Indexing. To reduce the overhead of deep directory traversals in metadata lookups, DirectFS employs a hash-based namespace indexing mechanism, which uses full pathname hashing to transform most component resolutions into a single hash table lookup.
DirectFS partitions the directory tree into multiple subtrees. A hash table maps the corresponding pathnames to the subtree root directory entries. When a client issues a metadata read operation, the Metadata Indexer first computes the hash of the pathname prefix and uses it to quickly locate the corresponding subtree root in the hash table. This allows the client to access the subtree root directory entry via a few one-sided RDMA READs, significantly reducing the number of RDMA operations required for deep directory traversals.
Hotness-aware Metadata Prefetching. To improve pathname resolution performance within a subtree, DirectFS employs a hotness-aware metadata prefetching mechanism; while resolving the subtree root, DirectFS simultaneously prefetches metadata of directory entries that are likely to be accessed later. This prediction is guided by a hotness tracking structure maintained on the metadata server, which uses an LRU-based scheme to identify frequently accessed subdirectories and files within each subtree.
When a prefetch hit occurs, subsequent pathname resolution steps within the subtree can be satisfied directly from the client-side buffers, eliminating the need for additional RDMA round-trips. This significantly reduces the depth and latency of metadata traversal, particularly for workloads with high temporal locality or repeated accesses to hot directories. Furthermore, because hot entries are colocated in memory through the heat-aware layout strategy, multiple useful entries can be prefetched efficiently in a single RDMA operation, improving bandwidth utilization.

3.2. Hash-Based Namespace Indexing

To reduce the overhead of deep directory traversals in metadata lookups, DirectFS employs a hash-based namespace indexing mechanism, which transforms most pathname component resolutions into a single hash table lookup via full-path hashing. DirectFS partitions the global directory tree into multiple disjoint subtrees. Instead of traversing each directory level recursively, the client can resolve the subtree root directory entry in a single hash table lookup.
As shown in Figure 4, DirectFS adopts a depth-based namespace tree partitioning strategy, where the global namespace is divided at pre-defined levels N_0, N_1, ..., N_k. All paths that share the same prefix up to a level N_i belong to the same subtree. This ensures a balanced distribution of directories while preserving structural semantics, and allows efficient lookup of the subtree root corresponding to any given pathname. We use a global hash table stored in the Metadata Mempool to map full pathnames to their corresponding subtree roots.
When handling a read operation, the client first canonicalizes the input pathname by removing redundant slashes and resolving dot components (“.” and “..”). Next, it extracts the longest possible prefix of the pathname that matches a subtree boundary (i.e., it selects the largest i such that the prefix contains N_i components and N_i does not exceed the pathname depth). Then, the client computes the hash of this prefix and uses it to look up the corresponding subtree root in the global hash table. Finally, the remaining pathname components within the subtree are resolved by level-by-level traversal starting from the subtree root.
We choose cuckoo hashing to implement the subtree hash table because it offers two key advantages for RDMA-based access: (1) it guarantees at most two candidate locations for each key, ensuring bounded lookup latency even in the presence of collisions, and (2) both candidate slots can be fetched in parallel using one-sided RDMA reads, making it highly efficient in terms of both latency and bandwidth. We also implement doorbell-batching optimization to reduce PCIe overhead [23]. This design allows the client’s Metadata Indexer to compute a pathname hash, fetch the candidate buckets, and resolve the subtree root directory entry in just one or two RDMA operations, even for deeply nested paths.
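The two candidate buckets can be fetched with a single doorbell by chaining the two work requests, as in the sketch below; the bucket size, hash function, and seeds are illustrative assumptions rather than the exact DirectFS parameters.

```c
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

#define BUCKET_SIZE 64                    /* assumed size of one hash bucket */
#define SEED1 0x9e3779b97f4a7c15ULL
#define SEED2 0xc2b2ae3d27d4eb4fULL

/* Hypothetical client-side hash function over the pathname prefix. */
uint64_t hash64(const char *key, uint64_t seed);

/* Read both cuckoo candidate buckets for `prefix` with one doorbell:
 * the two RDMA READ work requests are chained via wr.next and posted
 * with a single ibv_post_send call (doorbell batching). */
static int fetch_candidate_buckets(struct ibv_qp *qp, struct ibv_mr *mr,
                                   char local[2][BUCKET_SIZE],
                                   uint64_t table_base, uint32_t rkey,
                                   uint64_t nbuckets, const char *prefix)
{
    uint64_t idx[2] = { hash64(prefix, SEED1) % nbuckets,
                        hash64(prefix, SEED2) % nbuckets };
    struct ibv_sge sge[2];
    struct ibv_send_wr wr[2], *bad_wr = NULL;

    memset(wr, 0, sizeof(wr));
    for (int i = 0; i < 2; i++) {
        sge[i].addr   = (uintptr_t)local[i];
        sge[i].length = BUCKET_SIZE;
        sge[i].lkey   = mr->lkey;

        wr[i].opcode              = IBV_WR_RDMA_READ;
        wr[i].sg_list             = &sge[i];
        wr[i].num_sge             = 1;
        wr[i].wr.rdma.remote_addr = table_base + idx[i] * BUCKET_SIZE;
        wr[i].wr.rdma.rkey        = rkey;
        wr[i].next       = (i == 0) ? &wr[1] : NULL;    /* chain the WRs  */
        wr[i].send_flags = (i == 1) ? IBV_SEND_SIGNALED : 0;
    }
    return ibv_post_send(qp, &wr[0], &bad_wr);          /* one doorbell   */
}
```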
We use lightweight seqlock-style versioning for reads and fine-grained per-bucket mutexes for writes. Each entry carries a monotonically increasing version; writers acquire the bucket lock, set the version to an odd value, update the entry, persist the modified fields, and then flip the version to the next even value before releasing the lock. Readers perform one-sided lookups without locks: they read the version, then the entry, then re-read the version; if the version is odd or changed, the reader retries. This ensures readers never observe a torn or partially updated entry. For crash safety, the publish order (data → version) guarantees all-or-nothing visibility of an entry; upon restart, any entry with an odd version is discarded. When a UMR-fused view (Section 3.3) spans both DRAM and NVM, the DRAM segments are merely a performance hint. Since the NVM entry is authoritative, DRAM state loss cannot cause inconsistency. Rehashing is done in the background and switches the table root atomically.
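The reader-side protocol can be sketched as follows; the slot layout and the rdma_read_entry() helper are illustrative stand-ins for the one-sided read path described above.

```c
#include <stdint.h>

#define MAX_RETRIES 8

/* Illustrative slot layout: an even version marks a stable entry. */
struct dentry_slot {
    uint64_t version;        /* odd while a writer is updating the entry */
    char     payload[56];    /* directory entry fields                   */
};

/* Hypothetical helper that issues a one-sided RDMA READ of one slot. */
void rdma_read_entry(uint64_t remote_addr, struct dentry_slot *dst);

/* Optimistic, lock-free read: version, entry, version again. */
int read_entry_consistent(uint64_t remote_addr, struct dentry_slot *out)
{
    for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
        struct dentry_slot first, second;

        rdma_read_entry(remote_addr, &first);      /* version + entry     */
        if (first.version & 1)
            continue;                              /* writer in progress  */

        rdma_read_entry(remote_addr, &second);     /* re-read the version */
        if (second.version == first.version) {
            *out = first;                          /* snapshot is stable  */
            return 0;
        }
        /* version changed: a concurrent update raced with us, retry */
    }
    return -1;   /* persistent contention: fall back to the RPC path */
}
```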
The full-path hashing approach in DirectFS posed two design challenges: (1) it is incompatible with symbolic links and permission checks, as it bypasses intermediate path resolution and directory-level permission verification; (2) concurrent access to the global hash table by the client-side Metadata Indexer and server-side Metadata Manager may lead to race conditions and inconsistent states. We address these challenges as follows.
Speculative Pathname Resolution. To handle intermediate symbolic links, we propose speculative path resolution technique. The basic idea of speculative full-path hashing is to optimistically hash the longest known subtree prefix, assuming no symbolic links. If the lookup fails, the system recursively traverses from the nearest ancestor subtree. When a symlink is found, its target is concatenated with the remaining path, and the lookup restarts. This enables fast-path access in common cases while preserving correctness. The detailed algorithm is shown in Algorithm 1. Speculative pathname resolution allows DirectFS to resolve most pathnames in a single hash table lookup while still handling intermediate symbolic links correctly.
To further reduce the cost of speculative lookup failures, each client maintains a lightweight Speculative Prefix Cache (SPC). The SPC stores, for each accessed path, the longest prefix composed solely of regular directory components (i.e., excluding any symbolic links). This prefix allows the client to skip redundant failed attempts on shorter prefixes.
When performing a pathname lookup, the client first queries the SPC to identify the longest known symlink-free prefix. It then uses this prefix as the starting point for speculative hashing. If the lookup succeeds, the remaining suffix is resolved normally. If the directory tree has been changed (e.g., a previously regular component becomes a symlink), the client lazily updates the SPC with the newly discovered valid prefix.
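The overall flow corresponding to Algorithm 1 can be condensed into the sketch below; every helper function and type is a hypothetical stand-in for DirectFS internals, and only the control flow mirrors the description above.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for DirectFS internals (declarations only). */
struct dentry;
enum resolve_status { RESOLVE_OK, RESOLVE_SYMLINK, RESOLVE_MISS };
struct resolve_result {
    enum resolve_status status;
    struct dentry      *dentry;
    const char         *symlink_target;
    int                 resolved_depth;
};
void canonicalize(const char *in, char *out);             /* strip "//", ".", ".." */
const char *spc_longest_prefix(const char *path);         /* Speculative Prefix Cache */
void spc_update(const char *path, const char *prefix);
struct dentry *lookup_subtree_root(const char *prefix);   /* RDMA hash-table lookup */
struct resolve_result resolve_within_subtree(struct dentry *root, const char *rest);
void splice_symlink(char *path, const char *target, int depth);
int shorten_to_parent_boundary(const char **prefix);      /* 0 when no ancestor left */

struct dentry *speculative_resolve(const char *path)
{
    char canon[PATH_MAX];
    canonicalize(path, canon);

    /* Start from the longest known symlink-free prefix (from the SPC). */
    const char *prefix = spc_longest_prefix(canon);

    for (;;) {
        /* Fast path: hash the prefix and fetch the subtree root. */
        struct dentry *root = lookup_subtree_root(prefix);
        if (root != NULL) {
            struct resolve_result r =
                resolve_within_subtree(root, canon + strlen(prefix));
            if (r.status == RESOLVE_OK) {
                spc_update(canon, prefix);       /* remember valid prefix */
                return r.dentry;
            }
            if (r.status == RESOLVE_SYMLINK) {
                /* Splice the symlink target into the remaining path and
                 * restart the lookup on the rewritten pathname. */
                splice_symlink(canon, r.symlink_target, r.resolved_depth);
                prefix = spc_longest_prefix(canon);
                continue;
            }
        }
        /* Speculation failed: retry from the nearest ancestor subtree. */
        if (!shorten_to_parent_boundary(&prefix))
            return NULL;                         /* path does not exist */
    }
}
```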
Compressed Permission Encoding. To support permission checking during speculative lookups, each subtree stores the access control metadata along its prefix path, including the UID, GID, and permission bits (e.g., POSIX mode) of each ancestor directory. This metadata allows clients to perform local, RDMA-based permission checks without server involvement.
However, reading full permission metadata for every directory component can lead to significant network overhead, especially for long paths. To mitigate this, we introduce a compressed permission encoding scheme that reduces the size of metadata transmitted over RDMA. In most real-world file systems, directory permissions exhibit strong spatial locality: multiple consecutive directories often share identical ownership and mode settings [28]. We exploit this with a simple run-length encoding. As shown in Figure 5, each compressed entry records a permission tuple (uid, gid, mode) and a repetition count indicating how many consecutive path components it applies to.
Algorithm 1: Speculative Pathname Resolution.
When a directory’s permission metadata are modified via chmod, chown, or chgrp, all subtrees that include this directory in their stored prefix must be updated to maintain consistent permission checks. In practice, such permission changes are rare. Production traces from Alibaba Cloud’s Pangu file systems report that directory set_permission and rename operations together account for only 0.0083% of all metadata operations [7]; similarly, ZoFS reports 0 chmod/chown in 64,282 syscalls (Facebook trace) and only 16 chmod in 25,306 syscalls (Twitter trace) [28]. Moreover, all related subtree-prefix metadata updates are performed as local memory operations on the metadata server without involving network communication, so the maintenance overhead is minimal and has negligible impact on common-case performance.
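For illustration, a compressed permission prefix and the corresponding client-side check might look as follows; the layout is a sketch (root bypass and supplementary groups are omitted), not DirectFS's exact on-NVM format.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

/* One run-length-encoded permission run: a (uid, gid, mode) tuple that
 * applies to `count` consecutive ancestor directories on the prefix. */
struct perm_run {
    uid_t    uid;
    gid_t    gid;
    uint16_t mode;    /* POSIX permission bits                          */
    uint16_t count;   /* consecutive path components sharing this tuple */
};

/* Client-side check: does the caller hold execute (search) permission on
 * every ancestor directory encoded in the compressed prefix? */
static bool check_prefix_search_perm(const struct perm_run *runs, int nruns,
                                     uid_t uid, gid_t gid)
{
    for (int i = 0; i < nruns; i++) {
        uint16_t m = runs[i].mode;
        bool ok = (uid == runs[i].uid) ? (m & 0100) != 0 :  /* owner x */
                  (gid == runs[i].gid) ? (m & 0010) != 0 :  /* group x */
                                         (m & 0001) != 0;   /* other x */
        if (!ok)
            return false;   /* one run covers runs[i].count components */
    }
    return true;
}
```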

3.3. Hotness-Aware Metadata Prefetching

Hash-based pathname indexing significantly reduces the cost of global namespace traversal by enabling direct lookup of subtree roots. However, metadata resolution within a subtree still requires step-by-step traversal, where each directory component may incur an individual RDMA read. To address this, we introduce Hotness-aware Metadata Prefetching, which proactively loads frequently accessed directory entries into the client before they are needed.
For each full-path key in the global hash table, the metadata server maintains not only the pointer to the subtree root, but also a small prefetch block containing up to k pointers to hot directory entries within the corresponding subtree. Due to the limited read bandwidth of NVM, prefetching may increase NVM device-level bandwidth contention under high load. Therefore, we place the prefetch block in DRAM. When a client performs a hash table lookup, the associated prefetch block is retrieved as well, and its entries are speculatively fetched into the client’s local buffer. This reduces subsequent lookup latency if any prefetched entry is accessed.
Clients periodically report their local metadata access patterns to the metadata server via lightweight RPCs. The server uses this information to update an LRU list per subtree and populate the corresponding prefetch block with the most frequently accessed entries. This process is asynchronous and does not affect fast-path lookups.
When effective, hotness-aware prefetching transforms subtree lookups into mostly local memory hits, significantly reducing RDMA operations and improving end-to-end metadata resolution latency.
Selective Prefetching. Prefetching k directory entries per lookup may increase RDMA traffic and consume valuable IOPS resources. This overhead can be significant, especially when clients rarely access most of the prefetched entries. To reduce unnecessary reads, as shown in Figure 6, we augment each entry in the prefetch block with a compact 1-byte fingerprint [29,30], computed as a hash of the path prefix associated with that directory entry. At runtime, the client calculates the fingerprint of the current lookup path and only fetches entries with matching fingerprints. In practice, hashing ensures that only one fingerprint matches per lookup in most cases [30], leading to exactly one additional RDMA read in the common case. This allows the system to retain the benefits of prefetching while minimizing bandwidth and memory overhead.
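A sketch of this filtering step is given below; the 1-byte fingerprint function (a folded FNV-1a hash here) and the prefetch-block layout are illustrative assumptions.

```c
#include <stdint.h>

#define PREFETCH_K 8   /* assumed number of entries per prefetch block */

struct prefetch_block {
    uint8_t  fp[PREFETCH_K];          /* 1-byte fingerprints of prefixes  */
    uint64_t entry_addr[PREFETCH_K];  /* remote addresses of hot dentries */
};

/* Illustrative 1-byte fingerprint: 32-bit FNV-1a folded to one byte. */
static uint8_t fingerprint(const char *prefix)
{
    uint32_t h = 2166136261u;
    for (const char *p = prefix; *p; p++)
        h = (h ^ (uint8_t)*p) * 16777619u;
    return (uint8_t)(h ^ (h >> 8) ^ (h >> 16) ^ (h >> 24));
}

/* Return the remote address of the matching hot entry, or 0 if no
 * fingerprint matches; in the latter case the client falls back to
 * normal level-by-level subtree traversal. */
static uint64_t select_prefetch_target(const struct prefetch_block *blk,
                                       const char *lookup_prefix)
{
    uint8_t want = fingerprint(lookup_prefix);
    for (int i = 0; i < PREFETCH_K; i++)
        if (blk->fp[i] == want)
            return blk->entry_addr[i];   /* fetch only this entry */
    return 0;
}
```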
Fused Prefetching. Since prefetch blocks and the subtree hash table reside in different memory regions, naive implementations would require two separate RDMA reads to fetch both the hash table entry and the prefetch block, which is inefficient due to limited IOPS of RNICs [31]. To address this, we leverage User Memory Region (UMR) [32], a feature supported by modern RDMA-capable NICs (e.g., Mellanox ConnectX), to fuse prefetching and hash entry read requests into a single RDMA_READ. UMR allows the server to expose a virtual memory region composed of multiple non-contiguous physical memory segments. Internally, the NIC maintains a scatter-gather list of these physical segments and presents them to the client as a single, continuous virtual address range [32,33]. This enables the client to fetch data from multiple underlying buffers using a single RDMA read operation.
We use UMR to construct a virtualized hash table view. Each hash table entry logically concatenates the NVM-resident hash table entry with the DRAM-resident prefetch block. The client can then fetch both the subtree root pointer and hot sub-directory entry pointers within a single RDMA read. This saves valuable IOPS resources of the RNIC.
Adaptive K Selection. In practice, the benefit of prefetching grows with k at first but diminishes when k is large, since the bandwidth cost keeps increasing. We therefore dynamically adjust k using a simple feedback rule: starting from k = 2 , if the observed prefetch hit ratio improves significantly without raising tail latency or NIC utilization, we increase k; otherwise, we decrease it. This ensures that under low load the system can prefetch more aggressively, while under high load it backs off to avoid bandwidth contention.
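One possible realization of this feedback rule is sketched below; the epoch statistics and thresholds are illustrative parameters, not values prescribed by DirectFS.

```c
/* Sketch of the feedback rule for adapting the prefetch width k.
 * The thresholds and the stats structure are illustrative. */
struct prefetch_stats {
    double hit_ratio;       /* prefetch hit ratio in the last epoch   */
    double prev_hit_ratio;  /* hit ratio in the epoch before that     */
    double nic_util;        /* observed NIC utilization, 0.0 .. 1.0   */
    double tail_latency_us; /* p99 lookup latency in the last epoch   */
    double prev_tail_us;
};

#define K_MIN 1
#define K_MAX 16

static int adapt_k(int k, const struct prefetch_stats *s)
{
    int gain   = (s->hit_ratio - s->prev_hit_ratio) > 0.02;  /* clear win */
    int strain = (s->nic_util > 0.8) ||
                 (s->tail_latency_us > 1.05 * s->prev_tail_us);

    if (gain && !strain && k < K_MAX)
        return k + 1;      /* prefetch more aggressively under low load */
    if (strain && k > K_MIN)
        return k - 1;      /* back off to avoid bandwidth contention    */
    return k;
}
```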

4. Implementation

DirectFS is implemented by modifying both the client and metadata server of the MooseFS distributed file system. The entire implementation is written in C and spans approximately 5200 lines of code across both sides.
On the client side, DirectFS introduces two new modules: an RDMA Communication Module and a metadata indexing module. The RDMA module establishes and maintains one-sided RDMA connections with the metadata server, while the indexing module intercepts all metadata-related pathname operations and performs direct RDMA reads to access server-side metadata structures.
The metadata server is extended with an RDMA service module that handles memory registration and exposes the metadata memory pool to clients. The core metadata storage is rebuilt using a persistent memory (NVM) pool managed by PMDK. The server-side code supports fixed-location metadata allocation, directory tree manipulation, and transactional consistency for updates.
DirectFS also maintains auxiliary metadata structures in DRAM, including per-subtree hotness tracking lists and RDMA-friendly metadata views. These are updated based on client feedback and used to guide prefetching and access optimization.
All metadata operations are crash-consistent. Metadata updates are performed via PMDK transactions, and metadata pool state is recoverable upon crash using persistent journaling and allocation metadata.
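As a concrete illustration, a metadata update wrapped in a libpmemobj transaction might look as follows; the persistent record type is hypothetical and stands in for DirectFS's actual metadata structures.

```c
#include <libpmemobj.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical persistent dentry record; the real DirectFS layout differs. */
struct pm_dentry {
    char     name[64];
    uint64_t ino;
    uint32_t mode;
};

/* Update a dentry inside a PMDK transaction: the touched range is added
 * to the transaction's undo log, so the update becomes all-or-nothing
 * even if the metadata server crashes mid-way. */
static int update_dentry(PMEMobjpool *pop, struct pm_dentry *d,
                         const char *new_name, uint32_t new_mode)
{
    volatile int ret = 0;

    TX_BEGIN(pop) {
        pmemobj_tx_add_range_direct(d, sizeof(*d));   /* undo-log the record */
        strncpy(d->name, new_name, sizeof(d->name) - 1);
        d->name[sizeof(d->name) - 1] = '\0';
        d->mode = new_mode;
    } TX_ONABORT {
        ret = -1;                                     /* rolled back */
    } TX_END

    return ret;
}
```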

5. Evaluation

We evaluate DirectFS on a two-node (unless otherwise stated) testbed deployed on Cloudlab (Clemson cluster). One node acts as the metadata server and the other as the client. Both nodes are Dell R650 machines equipped with Intel Xeon Platinum 8360Y CPUs, 256 GB of DRAM, and Mellanox ConnectX-6 NICs. To emulate persistent memory, we reserve 20 GB of DRAM on the server node as a simulated NVM region for metadata storage.
We use two widely adopted benchmark tools to evaluate metadata performance: Filebench [34] and MDTest [35]. Filebench simulates realistic application workloads such as web serving and file creation, providing a practical view of metadata operation efficiency under common file system patterns [34]. MDTest focuses specifically on metadata-intensive operations, including file and directory creation, lookup, read, and removal, and is commonly used to evaluate file system metadata scalability [35].
We compare DirectFS against the following systems:
  • MooseFS [27]: The baseline distributed file system where clients communicate with the metadata server via TCP. The server performs metadata operations in DRAM and returns results over the network.
  • Crail [24]: An RDMA-optimized user-space distributed file system that uses the DaRPC communication library for metadata access. Crail interacts with the file system via crail-fuse, a FUSE-based client implementation.
  • DirectFS-Naive: A simplified variant of DirectFS that supports metadata access via one-sided RDMA but lacks full-path indexing and prefetching optimizations. Clients resolve pathnames by traversing the directory tree level by level and issuing one RDMA read per component.
All systems are evaluated using their respective FUSE-based clients. MooseFS [27], DirectFS, and DirectFS-Naive use their built-in FUSE clients, while Crail [24] uses the crail-fuse [36] client for file system mounting and access. We report both incremental ablations to quantify each design choice, and comparisons to MooseFS and Crail, representing existing DFSes.

5.1. Evaluating Hash-Based Namespace Indexing

To isolate the performance benefit of hash-based pathname indexing, we disable prefetching in DirectFS and compare it against DirectFS-Naive, which uses one-sided RDMA to traverse the directory tree level by level. We use the fileserver.f workload in Filebench and pre-generate 100,000 files under the test directory tree.
Impact of Directory Depth. We first evaluate how directory depth affects RDMA I/O efficiency. We generate file paths with varying depths from 1 to 7 and measure the average number of RDMA reads per lookup.
As shown in Figure 7, DirectFS consistently performs fewer RDMA I/Os than DirectFS-Naive; while DirectFS-Naive performs one RDMA read per directory level, DirectFS resolves deep paths in nearly constant I/O count due to direct subtree lookup via hash-based indexing.
Impact of symlinks. To evaluate the cost of symbolic links on speculation, we insert 0 to 3 symlinks into the path and measure lookup I/O. Each symlink points to a valid directory and is placed at varying depths.
Figure 8 shows that DirectFS’s I/O count increases with the number of symlinks, as speculation may fail and require fallback traversal. However, it still outperforms DirectFS-Naive, which traverses all components regardless of symlink presence.
Impact of Permission Diversity. Since DirectFS stores compressed permission metadata per subtree, we evaluate how the number of unique permission tuples affects RDMA read size. We construct path prefixes with varying numbers of unique permission combinations (uid, gid, mode) and measure the average permission metadata size transferred per lookup.
As shown in Figure 9, more permission diversity leads to slightly larger metadata blocks. However, DirectFS maintains low RDMA read sizes due to permission compression, while DirectFS-Naive reads each dentry individually, resulting in higher cumulative transfer.

5.2. Evaluating Hotness-Aware Metadata Prefetching

To evaluate the impact of our metadata prefetching optimizations, we compare the following configurations:
  • DirectFS-Naive: a baseline that performs RDMA-based per-component traversal without any indexing or prefetching;
  • DirectFS-Hash: hash-based metadata indexing enabled, but prefetching disabled;
  • DirectFS-FP: DirectFS with prefetching enabled, but without selective prefetching;
  • DirectFS-SP: DirectFS with prefetching enabled, but without fused prefetching;
  • DirectFS: full design with hash-based indexing, selective prefetching, and fused prefetching.
We examine the effectiveness of prefetching through three microbenchmarks.
Impact of Prefetch Block Size. We first study how the number of directory entries per prefetch block affects RDMA read size. Intuitively, larger blocks prefetch more metadata but also incur higher bandwidth costs. As shown in Figure 10, DirectFS-FP incurs increasing RDMA size with block size. DirectFS reduces this cost using fingerprint-based selective prefetching, which filters unnecessary entries. Across all block sizes, DirectFS achieves 30–70% smaller RDMA reads compared to DirectFS-FP.
Impact of Access Skewness. To simulate realistic workload patterns, we vary metadata access skewness using a Zipf distribution. Figure 11 shows that under low skewness (i.e., uniform access), all prefetching schemes show limited benefit. As skew increases, DirectFS increasingly benefits from both selective and fused prefetching. Selective filtering avoids prefetching cold entries, while fused access minimizes RDMA IOPS. At Zipf = 1.2, DirectFS reduces RDMA traffic by 55% over DirectFS-SP.
Impact of Client Concurrency. We evaluate system throughput under increasing numbers of concurrent clients. Figure 12 shows that DirectFS scales better than all baselines. Compared to DirectFS-SP, fused prefetching reduces RDMA request rate, improving NIC efficiency. Compared to DirectFS-Hash, prefetching improves the cache hit rate on hot entries. At 128 clients, DirectFS achieves up to 1.7× the throughput of DirectFS-Hash.
Metadata Write Performance. To evaluate the overhead of metadata writes, we implement a lightweight microbenchmark issuing create, unlink, mkdir, rmdir, chmod, and rename operations. The test uses one metadata server and a single client node with 64 worker threads, each confined to its own private directory to avoid contention. We repeat each workload three times and report median latency. As shown in Table 2, DirectFS delivers latency nearly identical to MooseFS across all write operations (within 5–10%), confirming that our client-active read optimizations do not introduce measurable penalties on the write path.

5.3. Filebench

To evaluate the end-to-end metadata performance of DirectFS under realistic workloads, we run the fileserver.f profile from Filebench. This workload simulates common metadata-intensive operations such as statfile, openfile, closefile, deletefile, and creatfile. Since our focus is on metadata performance, we exclude data-oriented operations such as readfile, writefile, and appendfile.
Figure 13 reports the average latency of each metadata operation under a typical configuration: average directory width of 3 and depth of 6.3. As shown in the figure, all systems exhibit similar latency for metadata update operations such as creatfile and deletefile. This is expected, as DirectFS, DirectFS-Naive, and MooseFS rely on server-side metadata modification via synchronous communication, and DirectFS does not optimize metadata writes. Crail achieves lower latency on writes due to its use of RDMA-optimized RPC (DaRPC) between the client and metadata server.
In contrast, DirectFS significantly outperforms other systems in the statfile operation, which is a pure metadata lookup. Compared to MooseFS (TCP-based) and Crail (RDMA-RPC), DirectFS reduces metadata read latency by 69% and 26%, respectively. This improvement stems from its full-path hash-based namespace indexing and hotness-aware metadata prefetching. With these optimizations, DirectFS can resolve lookups using a small number of one-sided RDMA reads, avoiding round-trip communications and fully leveraging RDMA’s low-latency capabilities.

5.4. MDTest

We further evaluate the metadata read throughput of DirectFS using MDTest, a standard benchmark for measuring file system metadata performance. We focus on the stat operation, which retrieves file metadata without reading file contents. To examine how directory hierarchy affects system behavior, we vary the depth of the directory tree while maintaining a constant number of total files. We also vary the number of client nodes to assess the scalability of DirectFS.
Impact of directory depth. Figure 14 shows stat throughput with varying directory depth. DirectFS consistently achieves the highest throughput across all tested depths. This is due to its use of full-path hashing and speculative lookup, which minimize the number of RDMA accesses required to resolve deep paths. In contrast, DirectFS-Naive exhibits degraded performance as depth increases, since it must perform one RDMA read per path component, incurring high IOPS and network overhead.
MooseFS and Crail maintain relatively stable performance regardless of directory depth. This is expected, as both systems retrieve metadata through a single round-trip communication with the metadata server—via TCP in MooseFS and RDMA-RPC in Crail—regardless of path length. However, they require server-side CPU processing for each lookup, which limits their throughput compared to DirectFS’s one-sided RDMA reads.
Interestingly, DirectFS shows small throughput spikes at specific depths (e.g., 1, 4, 7), which correspond to directory levels that align with the configured subtree root granularity. In these cases, the final directory component directly maps to a subtree root, enabling a single RDMA access to complete the lookup without fallback traversal.
Impact of system scale. Figure 15 shows the impact of system scale (number of client nodes) on metadata throughput. We fix the number of metadata servers to one and increase the number of client nodes while keeping 64 client threads per node. RPC-based baselines saturate the metadata server’s CPU, so their throughput quickly plateaus, and may even decline due to resource contention. By contrast, DirectFS’s client-active fast path resolves lookups with a few one-sided RDMA READs and minimal server CPU involvement, so throughput continues to increase with more client nodes; the growth rate eventually tapers when NIC/PCIe limits are approached.
In conclusion, these results demonstrate that DirectFS’s namespace indexing mechanism effectively reduces remote metadata access overhead and scales well in deep, hierarchical directory structures as well as with larger numbers of client nodes.

6. Discussion

Benefits on real-world systems. DirectFS targets the metadata read fast path, cutting per-op CPU/RPC work and improving scalability under concurrent clients. This matters because many real systems are read-heavy on the metadata path: production traces from Alibaba Cloud’s Pangu show readdir is the most frequent directory operation (93.3% of directory ops) [7], and file/directory reads dominate overall activity; classic NAS studies report that metadata procedures dominate in physical client workloads (e.g., 72% in SPECsfs2008 [37]; 64% in Filebench’s web-server profile [34]). These observations indicate substantial headroom for the read path that our design exploits. Workloads dominated by metadata writes (e.g., rename storms) may reduce relative gains; our design still functions correctly but will incur more server-side maintenance work.
External validity. The main differences between our setup and real deployments are (i) scale and (ii) fabric complexity (multi-hop routing, heterogeneous links). Our benefits are topology-agnostic because the fast path replaces an RPC path-walk with a few one-sided RDMA READs, cutting server-side CPU overhead and on-wire messages. At larger scale, since servers face higher CPU pressure and the fabric experiences more congestion and bandwidth contention, the relative speedup tends to grow with the number of client nodes, as depicted in Figure 15. Meanwhile, our hash-based namespace indexing reduces most lookups to one RDMA operation. Its performance gain is amplified in complex fabrics with higher RTT.

7. Conclusions

This paper presents DirectFS, an RDMA-accelerated distributed file system that offloads metadata indexing to clients by leveraging one-sided RDMA operations. By combining hash-based namespace indexing and hotness-aware metadata prefetching, DirectFS significantly reduces lookup latency and RDMA overhead. Our evaluation shows that DirectFS outperforms traditional RPC-based and naive RDMA designs, achieving up to 2.3× higher throughput and 60% lower RDMA traffic on real-world metadata workloads.

Author Contributions

File system design, L.J.; file system implementation, Z.Z.; writing—original draft preparation, R.N.; writing—review and editing, M.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Fundamental Research Funds for the Central Universities grant number NS2024057 and the 2024 Yangtze River Delta Science and Technology Innovation Community Joint Research Project grant number 2024CSJZN00400.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

Authors Lingjun Jiang and Zhaoyao Zhang were employed by the company CRRC Nanjing Puzhen Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Ghemawat, S.; Gobioff, H.; Leung, S. The Google File System. In Proceedings of the 19th ACM Symposium on Operating Systems Principles, Bolton Landing, NY, USA, 19–22 October 2003; pp. 29–43. [Google Scholar]
  2. Shvachko, K.; Kuang, H.; Radia, S.; Chansler, R. The Hadoop Distributed File System. In Proceedings of the IEEE 26th Symposium on Mass Storage Systems and Technologies, Lake Tahoe, NV, USA, 3–7 May 2010; pp. 1–10. [Google Scholar]
  3. Weil, S.A.; Brandt, S.A.; Miller, E.L.; Long, D.D.E.; Maltzahn, C. Ceph: A Scalable, High-Performance Distributed File System. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation, Seattle, WA, USA, 6–8 November 2006; pp. 307–320. [Google Scholar]
  4. Carns, P.H.; Ligon, W.B., III; Ross, R.B.; Thakur, R. PVFS: A Parallel File System for Linux Clusters. In Proceedings of the 4th Annual Linux Showcase & Conference, Atlanta, GA, USA, 10–14 October 2000. [Google Scholar]
  5. The Lustre File System. Available online: https://www.lustre.org/ (accessed on 21 September 2025).
  6. Li, Q.; Xiang, Q.; Wang, Y.; Song, H.; Wen, R.; Yao, W.; Dong, Y.; Zhao, S.; Huang, S.; Zhu, Z.; et al. More Than Capacity: Performance-oriented Evolution of Pangu in Alibaba. In Proceedings of the 21st USENIX Conference on File and Storage Technologies, Santa Clara, CA, USA, 21–23 February 2023; pp. 331–346. [Google Scholar]
  7. Lv, W.; Lu, Y.; Zhang, Y.; Duan, P.; Shu, J. InfiniFS: An Efficient Metadata Service for Large-Scale Distributed Filesystems. In Proceedings of the 20th USENIX Conference on File and Storage Technologies, Santa Clara, CA, USA, 22–24 February 2022; pp. 313–328. [Google Scholar]
  8. Ren, K.; Zheng, Q.; Patil, S.; Gibson, G.A. IndexFS: Scaling File System Metadata Performance with Stateless Caching and Bulk Insertion. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, New Orleans, LA, USA, 16–21 November 2014; pp. 237–248. [Google Scholar]
  9. Patil, S.; Gibson, G.A. Scale and Concurrency of GIGA+: File System Directories with Millions of Files. In Proceedings of the 9th USENIX Conference on File and Storage Technologies, San Jose, CA, USA, 15–17 February 2011; pp. 177–190. [Google Scholar]
  10. Thomson, A.; Abadi, D.J. CalvinFS: Consistent WAN Replication and Scalable Metadata Management for Distributed File Systems. In Proceedings of the 13th USENIX Conference on File and Storage Technologies, Santa Clara, CA, USA, 16–19 February 2015; pp. 1–14. [Google Scholar]
  11. Niazi, S.; Ismail, M.; Haridi, S.; Dowling, J.; Grohsschmiedt, S.; Ronström, M. HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases. In Proceedings of the 15th USENIX Conference on File and Storage Technologies, Santa Clara, CA, USA, 27 February–2 March 2017; pp. 89–104. [Google Scholar]
  12. Aghayev, A.; Weil, S.A.; Kuchnik, M.; Nelson, M.; Ganger, G.R.; Amvrosiadis, G. File Systems Unfit as Distributed Storage Backends: Lessons from 10 years of Ceph Evolution. In Proceedings of the 27th ACM Symposium on Operating Systems Principles, Huntsville, ON, Canada, 27–30 October 2019; pp. 353–369. [Google Scholar]
  13. Islam, N.S.; Wasi-ur-Rahman, M.; Lu, X.; Panda, D.K. High Performance Design for HDFS with Byte-Addressability of NVM and RDMA. In Proceedings of the 2016 International Conference on Supercomputing, Istanbul, Turkey, 1–3 June 2016; pp. 1–14. [Google Scholar]
  14. Lu, Y.; Shu, J.; Chen, Y.; Li, T. Octopus: An RDMA-enabled Distributed Persistent Memory File System. In Proceedings of the 2017 USENIX Annual Technical Conference, Santa Clara, CA, USA, 12–14 July 2017; pp. 773–785. [Google Scholar]
  15. Anderson, T.E.; Canini, M.; Kim, J.; Kostic, D.; Kwon, Y.; Peter, S.; Reda, W.; Schuh, H.N.; Witchel, E. Assise: Performance and Availability via Client-local NVM in a Distributed File System. In Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, Virtual Event, 4–6 November 2020; pp. 1011–1027. [Google Scholar]
  16. Guo, H.; Lu, Y.; Lv, W.; Liao, X.; Zeng, S.; Shu, J. SingularFS: A Billion-Scale Distributed File System Using a Single Metadata Server. In Proceedings of the 2023 USENIX Annual Technical Conference, Boston, MA, USA, 10–12 July 2023; pp. 915–928. [Google Scholar]
  17. Yang, J.; Izraelevitz, J.; Swanson, S. Orion: A Distributed File System for Non-Volatile Main Memory and RDMA-Capable Networks. In Proceedings of the 17th USENIX Conference on File and Storage Technologies, Boston, MA, USA, 25–28 February 2019; pp. 221–234. [Google Scholar]
  18. Kim, J.; Jang, I.; Reda, W.; Im, J.; Canini, M.; Kostic, D.; Kwon, Y.; Peter, S.; Witchel, E. LineFS: Efficient SmartNIC Offload of a Distributed File System with Pipeline Parallelism. In Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles, Virtual Event, 26–29 October 2021; pp. 756–771. [Google Scholar]
  19. Wei, X.; Xie, X.; Chen, R.; Chen, H.; Zang, B. Characterizing and Optimizing Remote Persistent Memory with RDMA and NVM. In Proceedings of the 2021 USENIX Annual Technical Conference, Virtual Event, 14–16 July 2021; pp. 523–536. [Google Scholar]
  20. Kalia, A.; Kaminsky, M.; Andersen, D. Datacenter RPCs can be General and Fast. In Proceedings of the 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19), Boston, MA, USA, 26–28 February 2019; pp. 1–16. [Google Scholar]
  21. Stuedi, P.; Trivedi, A.; Metzler, B.; Pfefferle, J. DaRPC: Data Center RPC. In Proceedings of the ACM Symposium on Cloud Computing, Seattle, WA, USA, 3–5 November 2014; SOCC ’14. pp. 1–13. [Google Scholar] [CrossRef]
  22. Weil, S.A.; Pollack, K.T.; Brandt, S.A.; Miller, E.L. Dynamic Metadata Management for Petabyte-Scale File Systems. In Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, Pittsburgh, PA, USA, 6–12 November 2004; pp. 1–12. [Google Scholar]
  23. Kalia, A.; Kaminsky, M.; Andersen, D.G. Design Guidelines for High Performance RDMA Systems. In Proceedings of the 2016 USENIX Annual Technical Conference, Denver, CO, USA, 22–24 June 2016; pp. 437–450. [Google Scholar]
  24. Stuedi, P.; Trivedi, A.; Pfefferle, J.; Stoica, R.; Metzler, B.; Ioannou, N.; Koltsidas, I. Crail: A High-Performance I/O Architecture for Distributed Data Processing. In Bulletin of the IEEE Computer Society Technical Committee on Data Engineering; IEEE: New York, NY, USA, 2017; Available online: https://api.semanticscholar.org/CorpusID:19264551 (accessed on 21 September 2025).
  25. Salomon, E.C. Accelio: IO, Message and RPC Acceleration Library. Available online: https://github.com/accelio/accelio (accessed on 21 September 2025).
  26. Boyer, E.B.; Broomfield, M.C.; Perrotti, T.A. GlusterFS One Storage Server to Rule Them All; Los Alamos National Laboratory (LANL): Los Alamos, NM, USA, 2012. [Google Scholar]
  27. Kruszona-Zawadzki, J.; Saglabs, S.A. Moose File System (MooseFS). Available online: https://moosefs.com (accessed on 21 September 2025).
  28. Dong, M.; Bu, H.; Yi, J.; Dong, B.; Chen, H. Performance and Protection in the ZoFS User-space NVM File System. In Proceedings of the 27th ACM Symposium on Operating Systems Principles, Huntsville, ON, Canada, 27–30 October 2019; pp. 478–493. [Google Scholar]
  29. Lu, B.; Hao, X.; Wang, T.; Lo, E. Dash: Scalable hashing on persistent memory. Proc. VLDB Endow. 2020, 13, 1147–1161. [Google Scholar] [CrossRef]
  30. Oukid, I.; Lasperas, J.; Nica, A.; Willhalm, T.; Lehner, W. FPTree: A Hybrid SCM-DRAM Persistent and Concurrent B-Tree for Storage Class Memory. In Proceedings of the 2016 International Conference on Management of Data, San Francisco, CA, USA, 26 June–1 July 2016; SIGMOD ’16. pp. 371–386. [Google Scholar] [CrossRef]
  31. Luo, X.; Zuo, P.; Shen, J.; Gu, J.; Wang, X.; Lyu, M.R.; Zhou, Y. SMART: A High-Performance Adaptive Radix Tree for Disaggregated Memory. In Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23), Boston, MA, USA, 10–12 July 2023; pp. 553–571. [Google Scholar]
  32. NVIDIA Developer Documentation. User-Mode Memory Registration (UMR); NVIDIA Corporation: Santa Clara, CA, USA, 2023. [Google Scholar]
  33. Gao, Y.; Li, Q.; Tang, L.; Xi, Y.; Zhang, P.; Peng, W.; Li, B.; Wu, Y.; Liu, S.; Yan, L.; et al. When Cloud Storage Meets RDMA. In Proceedings of the 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21), Virtual Event, 12–14 April 2021; USENIX Association: Berkeley, CA, USA, 2021; pp. 519–533. [Google Scholar]
  34. Filebench. Available online: https://github.com/filebench/filebench (accessed on 21 September 2025).
  35. MDTest. Available online: https://github.com/LLNL/mdtest (accessed on 21 September 2025).
  36. dmp265. Crail-Fuse: FUSE Interface for Crail Distributed File System; GitHub Repository: San Francisco, CA, USA, 2025. [Google Scholar]
  37. Standard Performance Evaluation Corporation (SPEC). SPECsfs2008 User’s Guide; Standard Performance Evaluation Corporation (SPEC): Warrenton, VA, USA, 2008. [Google Scholar]
Figure 1. Server CPU utilization.
Figure 2. CephFS metadata latency breakdown.
Figure 3. DirectFS architecture.
Figure 4. Hash-based namespace indexing.
Figure 5. Permission encoding example.
Figure 6. Prefetch block layout.
Figure 7. Impact of directory depth.
Figure 8. Impact of symlinks.
Figure 9. Impact of permission diversity on RDMA read size.
Figure 10. RDMA read size vs. prefetch block size.
Figure 11. RDMA read size vs. access skewness.
Figure 12. System throughput under increasing client count.
Figure 13. Metadata operation latency under fileserver.f workload (avg. dir width = 3, depth = 6.3).
Figure 14. Mdtest throughput vs. directory depth.
Figure 15. Mdtest throughput vs. system scale.
Table 1. Latency comparison between naive RDMA and RPC-based metadata resolution.

Depth            1    2    3    4    5    6    7
Naive RDMA (μs)  24   31   37   47   58   66   76
RPC (μs)         63   64   65   61   61   63   62
Table 2. Metadata write microbenchmark.

FS        Create  Unlink  Mkdir  Rmdir  Chmod  Rename
MooseFS   214     197     202    195    181    230
DirectFS  217     201     205    202    195    251
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
