Article

Efficient I/O Merging Scheme for Distributed File Systems

Byoung Chul An and Hanul Sung
1 Department of Convergence Electronic Engineering, Sangmyung University, Seoul 03016, Republic of Korea
2 Department of Game Design and Development, Sangmyung University, Seoul 03016, Republic of Korea
* Author to whom correspondence should be addressed.
Symmetry 2023, 15(2), 423; https://doi.org/10.3390/sym15020423
Submission received: 23 December 2022 / Revised: 19 January 2023 / Accepted: 31 January 2023 / Published: 5 February 2023
(This article belongs to the Special Issue Recent Advances in Software for Symmetry)

Abstract

Recently, decentralized file systems have been widely adopted to overcome the load asymmetry between nodes and the scalability problems of centralized file systems. Because they lack a dedicated metadata server, decentralized systems exchange more RPC requests between clients and servers to handle metadata processing, which increases RPC latency and thereby degrades I/O performance and worsens traffic imbalance. In this paper, we propose an efficient I/O scheme to reduce the RPC overhead in decentralized file systems. Instead of sending a single RPC request at a time, we enqueue RPCs in a global queue and merge them into larger RPC requests, thus avoiding excessive RPC latency overheads. The experimental results show that our scheme improves write and read performance by up to 13% and 16%, respectively, compared with the original file system.

1. Introduction

As cloud computing services become prevalent, distributed file systems (DFSs), the core technology behind these services, are in the spotlight. DFSs' characteristics, such as user transparency and efficient I/O processing, make it easy for users to utilize cloud computing services. DFSs can be divided into two categories based on the presence of a metadata server: centralized and decentralized file systems.
Centralized file systems, such as the Hadoop Distributed File System (HDFS) [1] and Lustre [2], have separate servers for processing metadata and the data themselves. I/O requests from clients are first sent to metadata servers to handle the metadata operations of the file. The file systems then use the results from the metadata servers to process the actual read or write operations on the data servers. Thus, when the number of metadata requests increases, the metadata servers bear a higher load than the clients, and users suffer low I/O performance caused by the asymmetry between server nodes [3].
In addition, more than 60% of the files used in Big Data processing are smaller than 64 KB [4,5]. This trend worsens I/O performance by further increasing the already high overheads in centralized file systems' metadata servers.
To overcome the limitations of centralized file systems, decentralized file systems, such as the Gluster file system [6] and Swift [7], process metadata and data operations on the same servers. As a result, I/O processing is performed in a client-driven manner [8], and there is little performance degradation caused by metadata overheads on dedicated metadata servers. Consequently, decentralized file systems are more scalable and better suited to handling large amounts of I/O requests.
Since distributed file systems run on multiple computer nodes that are physically separated, clients and servers communicate via the Remote Procedure Call (RPC) protocol, and the RPC overheads created by this communication are inevitable. Especially in decentralized file systems, where there are no separate metadata servers, even more RPC requests for metadata are exchanged between clients and servers. Thus, under metadata-intensive, small-sized I/Os, RPC overheads increase, and I/O performance degradation persists even in decentralized file systems. Additionally, since each server node handles different I/O patterns, a performance bottleneck occurs in some nodes with small-sized I/O requests, resulting in a performance imbalance between nodes. The imbalance problem in the metadata server of centralized file systems is solved, but the load asymmetry between nodes due to RPC overheads remains, resulting in broken symmetry and reduced scalability.
Some researchers mitigate the I/O performance degradation by modifying the software stacks for processing I/O requests [9,10,11,12]. However, these approaches leave the fundamental RPC overheads unaddressed. RPC requests are stored in the I/O queue in the order requested, and the next request is sent only after the completion response of the previous RPC request arrives. In other words, RPC requests are not processed in parallel; instead, each must wait for the processing time on the server nodes and the network latency of the previous requests. In addition, the number of RPC requests transmitted between nodes is unchanged, and the processing time for RPC requests remains considerable. Other researchers have handled RPC requests faster using modified network protocols with advanced hardware [13,14,15,16,17]. However, equipping large-scale storage entirely with such high-spec devices is too costly. For these reasons, a new software-level RPC protocol is needed, one that addresses the fundamental RPC overheads without any hardware support.
In this paper, we propose a method to reduce RPC overheads, improve performance, and resolve the asymmetry of I/O loads in decentralized file systems by merging multiple RPC requests and sending them to servers at once. We applied the I/O merging method to the Gluster file system, a well-known decentralized file system, and evaluated it using the FIO benchmark and the Small-File benchmark. The evaluation results show that our proposed scheme outperforms the original file system by up to 16% and 13% in read and write performance, respectively, while distributing I/O loads across nodes.
The rest of the paper is organized as follows. Section 2 reviews related work on various methods to reduce network traffic, such as modifying network stacks and adopting new network topologies. Section 3 explains the basic architecture and I/O flow of the GlusterFS and identifies the cause of the RPC overheads in the file system. Section 4 presents the design and implementation of the proposed I/O merging method. Section 5 compares the performance of the original GlusterFS and the proposed method. We discuss various factors that affect the performance of the merging method in Section 6. Section 7 discusses future work and concludes the paper.

2. Related Work

Warehouse-scale computing systems (WSCs) are systems in which every computer node is connected via a network [18]. Over the years, the hardware resources used for WSCs have improved gradually, increasing the parallelism in processing. However, network traffic between nodes soars drastically and becomes unpredictable, causing bottlenecks. Research continues to be conducted to reduce network bottlenecks and maximize throughput.
The field of networking has been actively researched to maximize network performance by adopting larger frames and modifying the protocol stack. Some studies have maximized network throughput by recovering partial packets of jumbo frames in wireless LAN communication [19]. Others have modified the layers above and below the TCP protocol stack with a variety of optimizations, delivering up to 2 Gb/s of end-to-end TCP bandwidth [20]. One study built a new transport protocol for high volumes of short messages, which employs a receiver-managed priority queue and achieves round-trip times of less than 15 μs [21]. However, the RPC waiting problem mentioned throughout this paper remains even with these efficient network protocols.
Researchers have also utilized extra network hardware to improve performance by shortening processing time. ALTOCUMULUS uses a software–hardware co-designed NIC to schedule RPCs at nanosecond scales, replacing under-performing existing RPC schedulers [13]. An FPGA-based reconfigurable RPC stack integrated into near-memory NICs delivers 1.3–3.8× higher throughput while accommodating a variety of microservices [14].
Remote direct memory access (RDMA) has also been redesigned or combined with various methodologies for better resource usage and lower RPC latency. FaSST remodels the original one-sided RDMA primitives into two-sided datagrams and outperforms other designs by two-fold while using half the hardware resources [15]. Researchers have presented an RPC framework that integrates RPC processing with network processing in user space using RDMA, coordinating the distribution of resources for both types of processing within a many-core system [16]. Another framework is designed on top of Apache Thrift over RDMA with a hierarchical hint scheme that optimizes heterogeneous RPC services and functions, showing up to 55% performance improvement [17]. However, the above research not only requires high-spec hardware support, but also needs modifications of the network-related software stacks. Furthermore, the number of RPC requests does not change, leaving the RPC bottleneck unresolved.
Other studies have reduced latency with efficient metadata services for large-scale distributed file systems. InfiniFS resolves metadata bottlenecks, such as long path resolution and poor locality, in large-scale file systems [22]. The system utilizes access-content-decoupled partitioning, speculative path resolution, and an optimistic access metadata cache to deliver high-performance metadata operations. Researchers have addressed the challenging metadata management in centralized file systems, such as Hadoop, by creating metadata access graphs based on historical access values, minimizing the latency [23]. There are also file system studies that reduce network overheads by packing more operations into each RPC request. The network file system (NFS) flows through the VFS and utilizes the POSIX file system API. Due to the limitations of the low-level POSIX file system interface, numerous RPC requests are sent in small units, which creates overheads. By detouring around the VFS layer and implementing the system at the user level, one team built its own API to resolve the POSIX-level bottleneck in the NFS [24]. The API merges as many operations as it can handle and sends the stacked RPC requests to reduce the RPC overhead. However, as vNFS is reconstructed from the kernel-level NFS to the user level, users need to re-implement their applications using the API. This research has reduced I/O processing time by modifying the metadata processing procedures and the file systems, but the huge number of RPC requests exchanged between nodes and the performance bottleneck caused by waits among RPC requests remain.

3. Background and Motivation

3.1. The GlusterFS Architecture

The GlusterFS has a storage pool that contains several storage servers. A server hosts volumes, each of which is a logical cluster of bricks. A brick works as the basic unit of storage that holds files. The GlusterFS servers have their own local file systems, and files are stored in directories managed by those file systems. As depicted in Figure 1, servers are combined within the storage pool in parallel, and clients access multiple servers within the storage pool.
Figure 2 shows the flow of a file operation from the client to the server in the GlusterFS. I/O requests from a user program pass through the virtual file system (VFS) and the Filesystem in Userspace (FUSE) in the kernel space, are translated by the FUSE translator and various other translators according to the translation options, and then reach the server. On the server node, the received requests are passed to the GlusterFS server translator and the Portable Operating System Interface (POSIX) translator. The POSIX translator uses system calls to transfer control to the VFS in the server's kernel space, and the VFS accesses the appropriate local file system, which finally handles the requests.

3.2. Handling I/O Requests in GlusterFS

Figure 3a illustrates RPC requests exchanged between a client and a server for handling a write request in the GlusterFS. There are nine RPC requests between them, starting with the “lookup” operation request and its “callback”. The operation–callback pattern is repeated throughout the entire request processing procedure. Figure 3b shows the RPC request transmissions for a read in the GlusterFS. There are three RPC–callback pairs: “lookup”, “open”, and “read”.
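The following is a schematic C sketch of the client-side pattern described above, under the assumption of a simplified, blocking RPC interface; the function and type names are illustrative and are not GlusterFS's actual API. Each operation is sent only after the callback of the previous one has arrived, so the pairs are strictly serialized.

```c
#include <stdio.h>

/* Illustrative only: the request/callback pattern of Figure 3b, not
 * GlusterFS's actual RPC API.  Each call stands for "submit the RPC and
 * block until the server's callback arrives". */
enum rpc_op { RPC_LOOKUP, RPC_OPEN, RPC_READ };

static void rpc_send_and_wait(enum rpc_op op, const char *path)
{
    /* The real client serializes the request, transmits it, and sleeps
     * until the matching callback is received. */
    printf("op %d for %s sent, waiting for callback\n", op, path);
}

static void read_small_file(const char *path)
{
    rpc_send_and_wait(RPC_LOOKUP, path);  /* pair 1: lookup + callback */
    rpc_send_and_wait(RPC_OPEN,   path);  /* pair 2: open   + callback */
    rpc_send_and_wait(RPC_READ,   path);  /* pair 3: read   + callback */
}

int main(void)
{
    /* Every small file repeats all three pairs, so the metadata round
     * trips dominate when many small files are read. */
    read_small_file("file-0001");
    read_small_file("file-0002");
    return 0;
}
```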

3.3. Performance Degradation

As Figure 3 depicts, there are additional RPC requests besides the actual read or write operations. For a large file, requests are processed with a single set of metadata operations for the file. The file system does not have to look up or open the file multiple times to service requests for the same file. On the other hand, I/Os on multiple small files require as many metadata operation sets as the number of files. This high metadata-to-I/O ratio causes overheads, which lead to I/O performance degradation.
Figure 4 shows the throughput of 4 KB block-sized reads and writes on small files and a large file; the left is the result for a single 40 MB file, and the right is for 10,000 4 KB-sized files. Even though both issue 10,000 write requests, the performance on the large file is ten times higher than that on the small files.
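As a rough illustration of this ratio for reads (following the three request–callback pairs of Figure 3b, and assuming for simplicity that each 4 KB block read of the large file issues its own read RPC), reading N small files repeats the lookup–open–read sequence for every file, whereas one large file pays the lookup and open only once:

$$
\underbrace{3N}_{\text{small files}} = 30{,}000
\qquad \text{vs.} \qquad
\underbrace{2 + N_{\mathrm{blocks}}}_{\text{large file}} \approx 10{,}002
\qquad (N = N_{\mathrm{blocks}} = 10{,}000).
$$

Every one of these pairs is serialized behind its predecessor, so the small-file case also accumulates roughly three times as much waiting.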

3.4. Enqueue and Dequeue Threads

Figure 5 shows the effects of the number of I/O threads on the number of requests queued in the I/O thread queue and the throughput in the GlusterFS. The row is the number of dequeue threads, and the column is the number of enqueue threads in both figures.
Figure 5a shows the number of requests queued in the I/O thread queue. The figure can be separated into two parts: the lower left, where we use more enqueue threads than dequeue threads, and the upper right, where the numbers are reversed.
In the lower left, requests pile up in the queue because there are more enqueue threads. In the upper right, requests are dequeued as soon as they are enqueued because there are more dequeue threads; however, since the current RPC request cannot be submitted until the previous ones have been transmitted, a few requests still remain in the queue.
Even though the number of queued requests changes dynamically with the number of each type of thread, the throughput depicted in Figure 5b stays nearly constant. Due to the characteristics of RPC requests, the throughput fluctuates little with changes in the number of threads. Regardless of how many threads are enqueuing or dequeuing requests, we cannot improve throughput by adjusting the number of threads. The performance becomes even worse with metadata-intensive I/Os, as the actual I/Os have to wait for the metadata RPC requests to be sent.

4. Design and Implementation

To resolve metadata RPC overheads under metadata-intensive I/Os, we merge multiple RPC requests into larger RPC requests within the global queue, then send them to the server at once.
Figure 6a depicts the enqueue and dequeue flow of RPC requests in the original GlusterFS. On the client side, an I/O request is dequeued from the I/O thread queue, translated by the translators according to the options, and stays in the queue as an RPC request waiting for its preceding RPC request to be processed. The delays between translation and transmission become the bottleneck.
Due to the bottleneck caused by this limitation of the RPC protocol, workloads executing I/Os suffer poor performance. To overcome the limitation, we designed a global queue in which multiple RPC requests are merged into a single request. Figure 6b and Figure 7 illustrate the implementation and the flowchart of the proposed solution.
The sender thread checks the queue periodically to see whether the number of requests has reached a predetermined watermark. If it has, the merger thread serializes the RPC requests into a single packet and transmits it to the server. We protect the queue with a mutex lock for synchronization and to prevent race conditions, as multiple threads access the same global queue. After receiving the packet, the server calls the parsing function, issues the POSIX system calls for the corresponding I/Os, and then sends the results back to the client. The client checks the call IDs of the received data and matches them with the outstanding requests.
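The following is a minimal client-side sketch of this idea, assuming a simplified request layout; the structure names, header fields, watermark value, and buffer handling are illustrative and are not taken from the GlusterFS code base.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MERGE_WATERMARK 4        /* requests merged per transmission (see Section 6.2) */
#define MAX_PAYLOAD     4096

struct rpc_request {
    uint32_t call_id;            /* used to match server replies to requests */
    uint32_t op;                 /* lookup, open, read, write, ...          */
    uint32_t len;                /* payload length in bytes                  */
    char     payload[MAX_PAYLOAD];
    struct rpc_request *next;
};

static struct rpc_request *queue_head, *queue_tail;
static int queue_len;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

/* Translator threads enqueue translated, heap-allocated RPC requests
 * instead of sending them one by one. */
void enqueue_rpc(struct rpc_request *req)
{
    pthread_mutex_lock(&queue_lock);
    req->next = NULL;
    if (queue_tail)
        queue_tail->next = req;
    else
        queue_head = req;
    queue_tail = req;
    queue_len++;
    pthread_mutex_unlock(&queue_lock);
}

/* Called periodically by the sender thread: once the watermark is reached,
 * serialize the queued requests back-to-back into one buffer and hand the
 * buffer to the transport as a single merged RPC packet. */
size_t merge_and_build_packet(char *buf, size_t buf_size)
{
    size_t off = 0;

    pthread_mutex_lock(&queue_lock);
    if (queue_len < MERGE_WATERMARK) {
        pthread_mutex_unlock(&queue_lock);
        return 0;                /* not enough requests queued yet */
    }
    for (int i = 0; i < MERGE_WATERMARK && queue_head; i++) {
        struct rpc_request *req = queue_head;
        size_t need = 3 * sizeof(uint32_t) + req->len;

        if (off + need > buf_size)
            break;               /* stop before overflowing the packet */
        memcpy(buf + off, &req->call_id, sizeof(uint32_t)); off += sizeof(uint32_t);
        memcpy(buf + off, &req->op,      sizeof(uint32_t)); off += sizeof(uint32_t);
        memcpy(buf + off, &req->len,     sizeof(uint32_t)); off += sizeof(uint32_t);
        memcpy(buf + off, req->payload, req->len);          off += req->len;

        queue_head = req->next;
        if (!queue_head)
            queue_tail = NULL;
        queue_len--;
        free(req);               /* requests were heap-allocated by the enqueuer */
    }
    pthread_mutex_unlock(&queue_lock);
    return off;                  /* number of bytes to send as one merged RPC */
}
```

On the server side, the parser would walk the buffer in the same header-then-payload order, issue the corresponding POSIX calls, and return results tagged with the same call IDs so that the client can match them, as described above.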
We discuss the optimal number of requests to merge depending on the specs and environment of the server in Section 6.2.
The RPC requests used in the GlusterFS are sent over the network since the server nodes are physically separated; therefore, the network environment affects the performance. Even if we merge multiple RPC requests using the proposed method, the new RPC request might be divided into several pieces to fit the maximum transmission unit (MTU), or network bandwidth is wasted if the merged request is smaller than the MTU. We used jumbo frames [25], which provide a 9000-byte MTU, under 10 Gbps Ethernet and merged as many requests as possible to utilize the maximum network bandwidth.
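As a back-of-the-envelope check (the RPC header size h below is an assumption, not a value measured in this work): a merged batch of read requests, which carry no data payload, packs many requests into one 9000-byte frame, whereas each merged 4 KB write carries its payload, so only about two writes fit per frame:

$$
\left\lfloor \frac{9000}{4096 + h} \right\rfloor = 2 \quad \text{for header sizes } h \lesssim 400 \text{ bytes}.
$$

This is one more reason, besides serialization and parsing costs, why merged write batches are more likely to be split across frames than merged read batches (see Section 5.3).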

5. Evaluation

5.1. Environmental Setup

We measured the read and write performance of our solution using the Small-File benchmark and the FIO benchmark on the machines shown in Figure 8.
The evaluations were performed on two different client-to-server setups: one-to-one (a single client node connected to a single server node) and one-to-two (a single client node connected to two server nodes). As a read request is sent to only one server, there is no difference between sending the requests to a single server and to multiple servers; thus, read I/O performance was tested only on the one-to-one setup. For write requests, we used both the one-to-one and one-to-two setups, with the two nodes in the one-to-two setup configured in replication mode.
The following subsections demonstrate and compare the performance of the proposed method with the original GlusterFS under the Small-File [26] benchmark and FIO [27] benchmark.

5.2. Performance of Merging Read Requests

Figure 9 depicts the throughput and execution time of the original GlusterFS and the proposed method according to the number of read requests merged in the global queue. Read I/O throughput on 30,000 4 KB-sized files using the Small-File benchmark is shown in Figure 9a. The original GlusterFS shows a 27 MB/s throughput. The throughput of our method increases as the number of RPC requests merged increases, showing 29 MB/s, 30 MB/s, and 31.5 MB/s for 2, 3, and 4–6 requests merged, respectively.
The read I/O throughput on 10,000 4 KB-sized files using the FIO benchmark is shown in Figure 9b. The performance tendency along the number of requests merged is the same as the result of the Small-File benchmark.
The right graphs in each subfigure show the total execution time for the Small-File benchmark and the FIO benchmark. Since RPC requests have to wait for their preceding requests, the total time that requests spend waiting between one another affects the execution time; executing I/O operations takes longer when there are more RPC requests to send. Merging multiple requests therefore shortens the execution time, as it reduces the time requests spend waiting in the queue. Consequently, our method outperforms the original by up to 16% in both execution time and read I/O throughput.
We present the throughputs evaluated on the eight-core machine in Figure 10. As the processing speed increases with more cores and better specifications, the performance saturates at 5 merged requests on the 8-core machine, whereas it saturates at 4 requests on the 4-core machine. The 8-core machine improves throughput by 17% compared with the original and achieves higher absolute performance than the 4-core machine because it merges five requests at once.
However, because of the overhead inherent to merging itself, simply increasing the number of RPC requests to merge does not improve the performance. If we choose to merge too many requests at once, overheads occur from holding the lock and serializing them in the queue. It also becomes a heavy burden for the servers to parse large serialized RPC requests, which causes more overheads. Hence, both throughput and execution time converge as we merge more requests.

5.3. Performance of Merging Write Requests

The evaluations of write I/O throughput using the scheme give results similar to those of read requests. Figure 11 depicts the throughput and the execution time according to the number of write requests merged in the global queue. We used 10,000 4 KB-sized files with the Small-File benchmark for the evaluation, shown in Figure 11a. The throughput of our method increases as more RPC requests are merged, showing 27.5 MB/s, 28.3 MB/s, and 29.3 MB/s for 2, 3, and 4–6 requests merged, respectively, while the original shows a 26 MB/s throughput. This is up to a 13% improvement over the original.
The write I/O throughput on 10,000 4 KB-sized files using the FIO benchmark is shown in Figure 11b. The result resembles that of the Small-File benchmark, as in Section 5.2, and so does the execution time.
Write I/O does not benefit from the solution as much as read I/O does. Because user data must be attached to write requests, they become larger than read RPC requests. Larger requests take longer for the client to enqueue in the global queue and for the server to parse, causing a smaller performance improvement.
Figure 12 shows the results on two server nodes, both set to replication mode, depicting a maximum 12.4% increase in throughput for the Small-File benchmark. This is an even smaller performance increase than the above. Because the client has to prepare the RPCs to be sent to both server nodes, it bears a heavier load, leading to smaller improvements.

6. Discussion

In this section, we discuss the parameters that affect the performance and determine how many requests should be merged into a single request in the global queue.

6.1. File Size and Block Size

We evaluated the performance of our solution with different block sizes. Figure 13b shows the write I/O throughput on a 16 KB-sized file, enlarging the block size from 4 KB to 8 KB to 16 KB. Because more user data are sent to the server with bigger merged requests, the larger the block size, the better the throughput.
However, unlike write requests, there is no difference in read I/O throughput as the block size changes, as shown in Figure 13a, since the read RPC requests are not transmitted to the server in block-sized units. When we perform read operations on a large file, the GlusterFS sends the entire file from the server to the client for efficiency, even if the requested block size is smaller than the file size. The client splits the file received from the server into block-sized pieces and returns the results to the user. As a result, varying the block size does not affect read operations with our I/O merging method.

6.2. The Optimal Number of Requests to Merge

Since multiple requests are merged before being sent to the server, performance increases because the bottleneck caused by delays between RPC requests is eliminated. Yet, the performance converges, or even drops, if we merge more than a certain number of RPC requests. Because this number changes with the experimental environment, it has to be found empirically. We identified the merge count that fully utilizes the solution.
As more requests are combined at once, as shown in Figure 14, the transmission speed increases and the transmission time decreases. We gain performance from the decreased transmission time. However, the performance saturates if the overhead of processing too many requests outgrows the transmission-time savings. This relation is the key to determining how many requests to merge in general. The proposed method is fully utilized when we use the number that balances the processing time and the transmission time.
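One way to formalize this trade-off (our own notation, not a formula taken from the measurements): let k be the number of requests merged per transmission, N the total number of requests, t_net the fixed per-transmission round-trip cost, and t_merge(k), t_parse(k) the serialization and parsing costs, which grow with k. Then, roughly,

$$
T_{\text{total}}(k) \;\approx\; \frac{N}{k}\Bigl(t_{\text{merge}}(k) + t_{\text{net}} + t_{\text{parse}}(k)\Bigr).
$$

Increasing k amortizes t_net over more requests, but once the growth of t_merge(k) + t_parse(k) outpaces that amortization, T_total flattens and then rises, which matches the saturation observed in Section 5.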
The balance point differs for each environment. As shown in Section 5.2, since the 8-core machine has more powerful hardware than the 4-core machine, it fully utilizes the solution with 5 merged requests instead of 4. Thus, the faster processing speed of high-spec machines has to be considered, as it reduces the processing time. It is also desirable to merge more read requests than write requests, owing to the delay caused by processing larger write requests, as mentioned in Section 5.3.
Additionally, a user application has to wait if we choose to merge more requests than the dequeue threads can obtain from the I/O thread queue. In general, it is suitable to keep the merge count at least one below the number of requests in the queue, and this number has to be obtained through several experiments. In our evaluation setups, it is best to merge no more than 4 requests at once, as 5–7 requests are piled up in the I/O thread queue. A simple heuristic capturing this rule is sketched below.
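The sketch below is one compact way to express this rule of thumb; the function name is ours, and the "empirically best" value is whatever a given environment measures (4 on our 4-core machine, 5 on our 8-core machine), not a universal constant.

```c
/* Pick the merge watermark: stay at least one below the typical number of
 * requests waiting in the I/O thread queue, and never exceed the batch size
 * found to be best empirically for this machine. */
static int pick_merge_watermark(int avg_queued_requests, int empirical_best)
{
    int by_queue = avg_queued_requests - 1;  /* leave room so the application
                                                never waits on an empty queue */
    if (by_queue < 1)
        by_queue = 1;                        /* always send at least one request */
    return by_queue < empirical_best ? by_queue : empirical_best;
}
```

For example, pick_merge_watermark(5, 4) returns 4, matching the choice in our setup, where 5–7 requests are typically queued.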

7. Conclusions

In this paper, we proposed a new software-level RPC protocol that merges requests without any hardware support. With this method, we resolved the performance degradation and I/O load asymmetry created under metadata-intensive workloads in decentralized file systems. Each RPC request between servers and clients has to wait for its preceding requests before being sent, causing a bottleneck. To handle this fundamental RPC overhead, our solution merges multiple RPC requests and sends them at once. We created a lock-synchronized global queue to collect the requests and modified the dequeue threads to send the merged request. With the proposed method, we eliminated the waits between RPC requests so that multiple RPC requests are processed in parallel, distributing the imbalanced loads and improving the performance. The proposed method was evaluated with the GlusterFS and outperformed the original by up to 16% and 13% in read and write I/O processing, respectively, resolving the asymmetry. We also explored the factors that affect the utilization of the merging method in various experimental environments.
Future Work. We designed only a software-level RPC protocol that handles the degradation in decentralized file systems by merging multiple RPC requests into a single RPC request. In the future, we will devise a way to merge the metadata processing of multiple I/Os across multiple software layers into a single operation, removing redundant RPC requests and thereby reducing the number of requests themselves. Furthermore, we plan to devise system-level optimizations, such as cache and memory management.

Author Contributions

Conceptualization, B.C.A. and H.S.; methodology, B.C.A. and H.S.; software, B.C.A.; validation, B.C.A. and H.S.; investigation, B.C.A.; writing—original draft preparation, B.C.A.; writing—review and editing, H.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by a 2021 Research Grant from Sangmyung University (2021-A000-0386, 2022-A000-0051).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Shvachko, K.; Kuang, H.; Radia, S.; Chansler, R. The Hadoop Distributed File System. In Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), Incline Village, NV, USA, 3–7 May 2010; pp. 1–10. [Google Scholar]
  2. Schwan, P. Lustre: Building a file system for 1000-node clusters. In Proceedings of the 2003 Linux Symposium, Ottawa, ON, Canada, 23–26 June 2003; Volume 2003. [Google Scholar]
  3. Depardon, B.; Mahec, G.L.; Séguin, C. Analysis of Six Distributed File Systems; Technical Report; HAL. 2013. Available online: https://hal.inria.fr/hal-00789086 (accessed on 19 November 2021).
  4. Welch, B.; Noer, G. Optimizing a hybrid SSD/HDD HPC storage system based on file size distributions. In Proceedings of the IEEE 29th Symposium Mass Storage Systems and Technologies, Long Beach, CA, USA, 6–10 May 2013; p. 1.12. [Google Scholar]
  5. Dai, H.; Wang, Y.; Kent, K.; Zeng, L.; Xu, C. The State of the Art of Metadata Managements in Large-Scale Distributed File Systems—Scalability, Performance and Availability. IEEE Trans. Parallel Distrib. Syst. 2022, 33, 3850–3869. [Google Scholar] [CrossRef]
  6. Boyer, E.B.; Broomfield, M.C.; Perrotti, T.A. Glusterfs One Storage Server to Rule Them All. No. LA-UR-12-23586; Los Alamos National Lab. (LANL): Los Alamos, NM, USA, 2012.
  7. Cabrera, L.-F.; Long, D.D.E. Swift: Using Distributed Disk Striping to Provide High I/O Data Rates; University of California, Santa Cruz, Computer Research Laboratory: Santa Cruz, CA, USA, 1991; Volume 8523. [Google Scholar]
  8. Zheng, Q.; Ren, K.; Gibson, G. BatchFS: Scaling the File System Control Plane with Client-funded Metadata Servers. In Proceedings of the 9th Parallel Data Storage Workshop, New Orleans, LA, USA, 16 November 2014. [Google Scholar]
  9. Fattahi, T.; Azmi, R. A new approach for directory management in GlusterFS. In Proceedings of the 2017 9th International Conference on Information and Knowledge Technology (IKT), Tehran, Iran, 18–19 October 2017. [Google Scholar]
  10. Kim, D.; Eom, H.; Yeom, H. Performance optimization in glusterfs on ssds. Kiise Trans. Comput. Pract. 2016, 22, 95–100. [Google Scholar] [CrossRef]
  11. Vaidya, M.; Deshpande, S. Critical study of performance parameters on distributed file systems using MapReduce. Procedia Comput. Sci. 2016, 78, 224–232. [Google Scholar] [CrossRef]
  12. Tao, X.; Alei, L. Small file access optimization based on GlusterFS. In Proceedings of the 2014 International Conference on Cloud Computing and Internet of Things, IEEE, Changchun, China, 6–7 December 2019; pp. 101–104. [Google Scholar]
  13. Zhao, J.; Uwizeyimana, I.; Ganesan, K.; Jeffrey, M.C.; Jerger, N.E. ALTOCUMULUS: Scalable Scheduling for Nanosecond-Scale Remote Procedure Calls. In Proceedings of the 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), Chicago, IL, USA, 1–5 October 2022. [Google Scholar]
  14. Lazarev, N.; Xiang, S.; Adit, N.; Zhang, Z.; Delimitrou, C. Dagger: Efficient and fast RPCs in cloud microservices with near-memory reconfigurable NICs. In Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Virtual, 19–23 April 2021. [Google Scholar]
  15. Kalia, A.; Kaminsky, M.; Andersen, D.G. FaSST: Fast, Scalable and Simple Distributed Transactions with Two-Sided (RDMA) Datagram RPCs. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), Savannah, GA, USA, 2–4 November 2016. [Google Scholar]
  16. Stuedi, P.; Trivedi, A.; Metzler, B.; Pfefferle, J. Darpc: Data center rpc. In Proceedings of the ACM Symposium on Cloud Computing, Seattle, WA, USA, 3–5 November 2014. [Google Scholar]
  17. Li, T.; Shi, H.; Lu, X. HatRPC: Hint-accelerated thrift RPC over RDMA. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, St. Louis, MO, USA, 14–19 November 2021. [Google Scholar]
  18. Barroso, L.A.; Clidaras, J.; Hölzle, U. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines; Morgan & Claypool Publishers: San Rafael, CA, USA, 2013. [Google Scholar]
  19. Iyer, A.P.; Deshpande, G.; Rozner, E.; Bhartia, A.; Qiu, L. Fast Resilient Jumbo frames in wireless LANs. In Proceedings of the 2009 17th International Workshop on Quality of Service, IWQos, Charleston, SC, USA, 13–15 July 2009. [Google Scholar]
  20. Chase, J.S.; Gallatin, A.J.; Yocum, K.G. End system optimizations for high-speed TCP. IEEE Commun. Mag. 2001, 39, 68–74. [Google Scholar] [CrossRef]
  21. Montazeri, B.; Li, Y.; Alizadeh, M.; Ousterhout, J.K. Homa: A receiver-driven low-latency transport protocol using network priorities. In Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication, Budapest, Hungary, 20–25 August 2018. [Google Scholar]
  22. Lv, W.; Lu, Y.; Zhang, Y.; Duan, P.; Shu, J. InfiniFS: An Efficient Metadata Service for Large-Scale Distributed Filesystems. In Proceedings of the 20th USENIX Conference on File and Storage Technologies (FAST 22), Santa Clara, CA, USA, 22–24 February 2022. [Google Scholar]
  23. Nguyen, M.C.; Won, H.; Son, S.; Gil, M.; Moon, Y. Prefetching-based metadata management in advanced multitenant hadoop. J. Supercomput. 2019, 75, 533–553. [Google Scholar] [CrossRef]
  24. Chen, M.; Bangera, G.B.; Hildebrand, D.; Jalia, F. vNFS: Maximizing NFS Performance with Compounds and Vectorized I/O. In Proceedings of the 15th USENIX Conference on File and Storage Technologies (FAST 17), Santa Clara, CA, USA, 27 February–2 March 2017. [Google Scholar]
  25. Alliance, E.; Kohl, B. Ethernet Jumbo Frames. 2009. Available online: https://www.ethernetalliance.org/wp-content/uploads/2011/10/EA-Ethernet-Jumbo-Frames-v0-1.pdf (accessed on 6 January 2022).
  26. England, B. SMALLFILE Benchmark. Available online: https://github.com/bengland2/smallfile (accessed on 6 January 2022).
  27. Axboe, J. Fio-Flexible I/O Tester Synthetic Benchmark. 2005. Available online: https://github.com/axboe/fio (accessed on 13 June 2015).
Figure 1. The GlusterFS architecture.
Figure 2. The flow of a file operation in the GlusterFS.
Figure 3. Write and read RPC operations in the GlusterFS.
Figure 4. I/O performance on a large file and small files in the GlusterFS.
Figure 5. The number of requests in the I/O thread queue and throughput according to the number of enqueue and dequeue threads.
Figure 6. RPC request flow of the original GlusterFS and the proposed solution.
Figure 7. Flowchart of the proposed client side thread.
Figure 8. Hardware specifications used for the evaluation.
Figure 9. The performances according to the number of read requests merged in a 4-core machine.
Figure 10. The performances according to the number of read requests merged in an 8-core machine.
Figure 11. The performances according to the number of write requests merged in a 4-core machine.
Figure 12. The performances according to the number of write requests merged on 2 server nodes.
Figure 13. I/O performances on 16KB-sized file with different block sizes.
Figure 14. The component of the execution time.