Adaptively Periodic I/O Scheduling for Concurrent HPC Applications

Abstract: With the convergence of big data and HPC (high-performance computing), various machine learning applications and traditional large-scale simulations with stochastically iterative I/O periodicities run concurrently on HPC platforms, which places further pressure on the scarce shared I/O resources as data-transfer demand keeps growing. The existing heuristic online and periodic offline I/O scheduling methods target traditional HPC applications with a fixed I/O periodicity and are not suitable for applications with stochastically iterative I/O periodicities, whose concurrent I/Os must nevertheless be scheduled under I/O congestion. In this work, we propose an adaptively periodic I/O scheduling (APIO) method that optimizes system efficiency and application dilation by taking the stochastically iterative I/O periodicity of the applications into account. We first build a periodic offline schedule within a specified duration to capture the iterative nature. APIO then adjusts the bandwidth allocation to resist stochasticity based on the actual length of each computing phase. When the specified duration does not satisfy the actual running requirements, the period length is extended to adapt to the actual duration. Theoretical analysis and extensive simulations demonstrate the efficiency of our proposed I/O scheduling method over the existing online approach.


Introduction
High-performance computing (HPC) systems, especially supercomputers, play an unprecedentedly important role in modern scientific discovery, thanks to their enormous computing power and storage capacity. Large-scale numerical simulations from different fields, such as meteorology, aerospace, bio-pharmacy, and high-energy physics, help scientists accelerate research and save money by eliminating the need for physical experiments [1]. As the era of the exascale supercomputer arrives, more large-scale modeling, simulations, and other applications will be deployed, bringing further challenges. The I/O bottleneck is one of the most severe problems on HPC platforms.
Although computing power has increased dramatically, system I/O throughput cannot scale at the same pace owing to the slower development of storage technology [2]. Larger-scale applications deployed on HPC place greater data-transfer demands on the scarce I/O resources. Under the convergence trend of big data and HPC [3], certain big data applications have higher data requirements on the parallel file system (PFS). In addition, fault-tolerance technologies such as checkpointing/restart, which are designed to counter the decreasing Mean Time Between Failures (MTBF), also exacerbate I/O contention [4]. To meet these practical demands, data transfer and management must become more efficient.
Many studies have been conducted to mitigate the I/O bottleneck problem. In terms of system architecture, there are topology-aware methods [5], memory hierarchy-aware methods [6][7][8], burst-buffering methods [4,9,10], and so on. From the application side, many approaches have been proposed, such as application coordination [11,12], I/O scheduling [13][14][15][16][17], and data layout optimization [18,19]. The I/O scheduling method, which allocates I/O bandwidth to each application in order to optimize system utilization and application efficiency, is widely used in HPC.
Nevertheless, the existing I/O scheduling approaches largely focus on traditional HPC applications that usually have a fixed I/O periodicity. With the convergence of big data and HPC, many machine learning (ML)-based applications deployed on HPC exhibit a stochastically iterative I/O periodicity: their executions depend on specific input data and proceed iteratively to approximate an acceptable solution [20,21], such as the structural identification of orbital anatomy application [22]. Some scientific data analysis applications also present stochasticity, such as functional MRI quality assurance (fMRIQA) [22]. Furthermore, many traditional scientific applications based on solving large sparse linear systems with iterative methods, such as the randomized Kaczmarz method, also possess these properties [23]. The existing methods are either unable to fully exploit the characteristics of these applications, such as online scheduling [13], or not suitable for applications with a stochastically iterative I/O periodicity, such as periodic I/O scheduling [14,15,24]. To simplify the exposition, we hereafter refer to an application with a stochastically iterative I/O periodicity as a stochastic iterative application.
In order to utilize the stochastically iterative I/O periodicity of these emerging applications, we propose an adaptively periodic I/O scheduling (APIO) method to optimize application efficiency and system utilization. It first conducts a periodic scheduling that exploits the periodicity given the specified probabilistic distribution of each application, allocating specified bandwidths within specified durations to each instance of each application in a period. Within each period, it then fine-tunes the bandwidth allocation at run time to resist the stochasticity of the applications. When the specified number of instances of some application cannot be scheduled within a period, the period is extended to adapt to the actual duration. Our proposed algorithm inherits the advantages of periodic I/O scheduling and adapts it to a wide range of HPC applications that have a stochastically iterative I/O periodicity.
The main contributions of this work are as follows:
• We propose an adaptively periodic I/O scheduling algorithm that combines the advantages of periodic scheduling and online scheduling to leverage the iterativeness and stochasticity of the ever-growing class of stochastic iterative applications on HPC;
• We perform a theoretical analysis of the efficiency of the proposed scheduling;
• We conduct simulations showing the efficiency and effectiveness of our proposed method compared to the existing online scheduling.
The rest of this paper is organized as follows. Section 2 describes the related works on stochastic iterative applications and I/O scheduling. Section 3 introduces the models on platform and application, the I/O scheduling problems, and the existing I/O scheduling algorithms. In Section 4, the proposed adaptively periodic I/O scheduling algorithm is presented, and the related analysis on the efficiency is also shown. Section 5 shows the simulation experiments and Section 6 concludes this work.

Related Works
The enormous data-transfer requirements of a variety of applications pose a huge challenge for HPC storage systems, especially those with a bandwidth-limited PFS. Several research studies have examined how to use such systems efficiently in different scenarios. In this work, the focus is on scheduling I/Os from stochastic iterative applications that concurrently share the aggregated bandwidth of the PFS. We therefore discuss the three most closely related topics in this section.

Stochastic Iterative Applications
With the computing capacity of HPC systems rapidly increasing, applications from a wide range of fields, which involve heavy computation, large amounts of data transfer, and long executions (hours or even days), are deployed on such platforms. For fault tolerance or visualization, these applications regularly store intermediate results to persistent storage and thus exhibit periodicity [14,25]. This periodicity can cause I/O bursts and worsen the I/O bottleneck problem when many applications access the underlying PFS concurrently. An architectural solution for mitigating this I/O congestion is burst-buffering, which is widely discussed in the literature [4,9]. In addition, applications running on HPC often show stochasticity, in that their execution time depends on the input data [26].
In our work, we define a stochastic iterative application as an application with a stochastically iterative I/O periodicity: it executes I/O operations iteratively, but a random-length computing interval separates two consecutive I/O operations. The iterative I/O periodicity has several causes, such as iterative computation patterns and checkpointing/restart, while the stochasticity of the computing phase stems from data characteristics, non-stationary iterative methods, and so forth.
The reasons why stochastic iterative applications are becoming more common include the following. First, the convergence of big data and HPC attracts many ML-based applications, which reach an acceptable solution through stochastic iterative algorithms [20,21]; the structural identification of orbital anatomy is such an ML-based data analysis example [22]. Second, some scientific data analysis applications, such as functional MRI quality assurance (fMRIQA) [22], show stochasticity and execute over different instances iteratively. Third, many traditional scientific applications based on solving large sparse linear systems with popular iterative methods, such as the randomized Kaczmarz method, also possess stochasticity [23].

I/O Scheduling
By controlling the execution of I/O requests, I/O scheduling can be applied to many data-transfer scenarios to mitigate I/O-related problems. On HPC systems, it schedules the I/O requests of different applications accessing the underlying persistent storage, and it can be implemented at different storage layers for different purposes [27]. For application-side optimization, Liao et al. [28] proposed a dynamic file-domain partitioning method based on the locking protocol of the PFS to optimize the parallel I/O of a single application. For server-side methods [29], Song et al. [30] presented server-side I/O coordination for the PFS to reduce interference between applications. For interactions across multiple layers [7,8], data compression and smart data movement have been designed. In this work, we study coordinating the I/O requests of many stochastic iterative applications on the I/O nodes.
I/O scheduling deployed on I/O nodes can exploit data-location information to optimize data access. In reference [31], the proposed IOrchestrator reorganizes I/O requests by considering data spatial locality. In reference [5], Tessier et al. provide a topology-aware data aggregation method to minimize data conflicts on the computing network. In reference [19], a randomness detection method, SSDup, is designed to improve data transfer. In reference [18], contention-aware scheduling is presented to balance the workload on each SSD server. In addition, this kind of I/O scheduling can obtain global application information and can easily be integrated with job scheduling to coordinate multiple applications. To resist the effects of I/O interference, Dorier et al. [12] propose CALCioM, a coordinated scheduling scheme for two applications. In reference [32], Carretero et al. provide a bandwidth-aware mapping algorithm that considers job and I/O scheduling simultaneously.
The closest study to this work is the offline periodic scheduling proposed by Aupy et al. [14], which constructs a period to consider the periodicity of HPC applications. It achieves better performance on system efficiency and application dilation than another general online scheduling method [13]. In our prior study [17], we proposed a Markov-chain-based I/O scheduling, which improves the online scheduling by considering the state of burst-buffers. This type of I/O scheduling has wide applicability for applications on HPC.

Stochastic Scheduling
In order to account for the stochasticity of jobs, many stochastic job scheduling methods have been proposed, as surveyed in the book by Pinedo [33]. A speculative scheduling method proposed in reference [26] provides a solution for stochastic HPC applications in a reservation-based scheduling context, building a speculative reservation sequence to rerun a job when the prior reservation is unsatisfactory. In reference [22], Gainaru et al. further optimize the speculative scheduling by checkpointing the completed work.
For stochastic iterative applications, Du et al. [23] verified the robustness of periodic checkpointing, which is essentially an I/O scheduling case dealing with stochasticity. In reference [34], the authors construct optimal checkpointing strategies that decide at which iterations to checkpoint. These works take stochastic iterative applications as their research object and motivate our work on scheduling the I/O of such applications.

Preliminaries and Motivations
In this section, we first describe the HPC platform model and the application execution model. Then, the I/O scheduling problem is formulated. Finally, we introduce the existing online and offline methods related to this work and describe our motivations.

HPC Platform Model
The HPC platform consists of many computing nodes and storage nodes to satisfy the requirements of large-scale scientific applications. The computing nodes are identical in computing capacity and local bandwidth. A job scheduler assigns these computing resources to applications in batches, so each application has its own exclusive computing nodes.
We depict the platform model assumed in this work in Figure 1. Many applications run on the platform concurrently and share the underlying PFS through the I/O nodes (IONs). The computation of each application is isolated on its assigned computing nodes, but the I/O operations contend for the shared PFS I/O bandwidth B. When the total I/O bandwidth requirement exceeds the aggregate bandwidth B, some applications must be delayed, as decided by the I/O scheduler.

Application Execution Model
In our considered execution model, K applications with stochastically iterative I/O periodicities run concurrently on the HPC platform illustrated above. Applications alternate between a computing phase and an I/O phase; the combination of one computing phase and one I/O phase is called an instance. Each application consists of N i such instances (applications on HPC platforms often run for a very long time, so we assume the number of instances is large enough to admit a periodic schedule, as in reference [14]). Unlike in periodic scheduling [14], the length of the computing phase of each instance is not fixed but stochastic, following a distribution D, whereas the length of the I/O phase is fixed, since the data structure of the intermediate results is designed in advance.
To clarify the execution procedure, an example with three stochastic iterative applications is illustrated in Figure 2. The three applications A_1, A_2, and A_3 have their own execution characteristics. Each application A_i has N_i instances with different computation lengths W_i^k and the same I/O volume IO_i. Due to the limited platform bandwidth and the periodicity of the applications [25], I/O congestion might occur during execution. If the I/O execution order is left uncontrolled under a best-effort strategy, the resulting I/O congestion can dramatically degrade I/O performance because of the write amplification of SSDs (solid-state drives) [13]. The aim of I/O scheduling is to order the I/O executions by arranging a specific bandwidth for each application. Each application A_i runs on β_i computing nodes specified by the HPC batch job scheduler, and the local bandwidth of a computing node is b. Thus, the maximum rate at which A_i can transfer data is B_i = min(B, β_i · b), and the real I/O rate of A_i at time t is the minimum of B_i and the remaining PFS bandwidth, i.e., b_i(t) = min(B_i, B − Σ_{j≠i} b_j(t)).
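The two bandwidth caps above can be made concrete in a short sketch. This is a minimal illustration; the function names and example numbers are ours, not from the paper:

```python
def max_app_bandwidth(B, beta_i, b):
    """Peak transfer rate B_i of application A_i: capped by the PFS
    aggregate bandwidth B and by its beta_i nodes' local links of rate b."""
    return min(B, beta_i * b)

def effective_rate(B, B_i, other_rates):
    """Instantaneous I/O rate b_i(t) of A_i: its own cap B_i, or whatever
    PFS bandwidth the other applications currently leave free."""
    return min(B_i, B - sum(other_rates))

# Example: B = 10 GB/s, A_1 on 8 nodes with 1 GB/s local links,
# while concurrent applications already consume 7 GB/s in total.
B_1 = max_app_bandwidth(10.0, 8, 1.0)         # min(10, 8) = 8 GB/s
rate = effective_rate(10.0, B_1, [4.0, 3.0])  # min(8, 10 - 7) = 3 GB/s
```

The example shows that an application can be throttled either by its own node allocation or by contention on the shared PFS, whichever binds first.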

Problem Description
The objectives of I/O scheduling are to maximize the system efficiency and minimize the application dilation, as in the work in reference [13]. We first define the application efficiency of each application A_i at time t as

ρ̃_i(t) = ( Σ_{k=1}^{n_i(t)} W_i^k ) / (t − r_i),

where n_i(t) ≤ N_i is the number of instances of A_i that have been executed by time t and r_i is the release time of A_i. The optimal application efficiency ρ_i is obtained when A_i runs in dedicated mode. The system efficiency refers to the total performance of all processors in the platform, and the application dilation refers to the largest slowdown among all applications.
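The application efficiency can be illustrated numerically. This is a minimal sketch with hypothetical numbers, assuming the efficiency is the accumulated computing time of finished instances divided by the elapsed time since release:

```python
def app_efficiency(W, n_done, t, r_i):
    """rho~_i(t): fraction of elapsed wall-clock time that A_i has spent
    computing, counting only its n_done finished instances.
    W holds the computing durations W_i^k in instance order."""
    return sum(W[:n_done]) / (t - r_i)

# Three finished instances of 40, 50 and 60 s of computing,
# released at t = 0 and observed at t = 200 s: 150/200 = 0.75.
eff = app_efficiency([40.0, 50.0, 60.0], 3, 200.0, 0.0)
```

Time not covered by the numerator is spent either transferring I/O or waiting for bandwidth, which is exactly what the scheduler tries to minimize.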
Therefore, we formulate two problems on these two objectives of I/O scheduling as follows.

Problem 1 (MaxSysEfficiency): Given K applications A_i(β_i, N_i, r_i, W_i^k, IO_i) and an HPC platform with a PFS aggregate bandwidth B and N computing nodes with local node bandwidth b, find the I/O bandwidth assignment b_i(t) for each application that maximizes the total platform performance.
The assignment is subject to the following constraints: (1) 0 ≤ b_i(t) ≤ B_i for all t; (2) Σ_i b_i(t) ≤ B for all t; (3) for every instance, the assigned bandwidth integrated over its I/O phase equals IO_i; and (4) the I/O phase of the k-th instance starts only after its computing phase completes, and the (k+1)-th instance starts only after the k-th finishes. The first and second constraints enforce the application and platform bandwidth limits. Through the third and fourth constraints, the I/O volume of each application is satisfied and the order of instances is implicitly guaranteed.

Problem 2 (MinDilation): Find the I/O bandwidth assignment for each application b i (t)
to minimize the largest slowdown among applications with the same parameters and consistent constraint conditions as Problem 1.
The rationale behind the MinDilation objective is to provide fairness among all applications: it guides the scheduling to minimize the maximum slowdown, thereby avoiding the starvation of any application. All notations mentioned in these problem descriptions are listed in Table 1.
With the rapid growth of computing resources, HPC centers currently tend to rent spare computing resources to more users. Different applications have different I/O requirements, for reasons such as service levels and the types of storage hardware [35]. The I/O scheduling problems can be generalized to take the applications' criticality into account. Here, we provide a simple extended model by introducing a weight parameter for each application: the objective of the MaxSysEfficiency problem becomes maximizing the weighted sum of the per-application contributions, where the weight w_i denotes the importance of application A_i. The MinDilation problem can be modified in the same way. However, our proposed I/O scheduling in this work aims at a global improvement while ignoring the demands of individual applications, so the weight parameters are all set to one (i.e., all applications have the same importance).

Existing Methods and Motivations
Both problems described above have been proven NP-complete, even in a simple offline setting [13], so we resort to heuristics rather than exact algorithms. The two problems differ only in their optimization objectives; thus, a unified method with different strategies can address both. Online I/O scheduling [13] is a greedy algorithm based on different heuristics, and periodic I/O scheduling [14] improves on it by exploiting periodicity. We briefly describe both and then give the motivations of our work.

Table 1. Notations.

N: the number of computing nodes in the HPC platform
B: the aggregate bandwidth of the PFS
b: the bandwidth of each local computing node
K: the number of stochastic iterative applications
A_i: the i-th application
β_i: the number of computing nodes allocated to A_i
N_i: the number of instances of A_i
r_i: the release time of A_i
d_i: the final completion time of A_i
B_i: the maximum possible bandwidth for A_i
W_i^k: the computing duration of the k-th instance of A_i
IO_i: the I/O volume of each instance of A_i
ρ_i: the optimal application efficiency of A_i
s_i^k: the start time of the k-th instance of A_i
b_i(t): the bandwidth assigned to A_i at time t
ρ̃_i(t): the real application efficiency of A_i at time t

Online I/O Scheduling
The rationale here is to determine a priority queue of applications, based on some strategy, at each event. This greedy algorithm can adapt to many application settings; however, it is an online centralized method, which incurs heavy computation and lacks scalability. For different optimization objectives, different strategies [13], shown below, can be chosen.

• The RoundRobin strategy serves applications in a "first-come, first-served" (FCFS) fashion. It ensures fairness and is usually used as a baseline;
• The MinDilation strategy favors applications with low values of ρ̃_i(t)/ρ_i(t). Executing the application with the lowest efficiency first improves the application dilation; this objective is user-oriented;
• The MaxSysEff strategy favors applications with high β_i. An application holding more computing nodes utilizes the system resources more efficiently when served first; this objective is CPU-oriented;
• The MinMax-γ strategy balances MinDilation and MaxSysEff. It favors applications with high values of β_i · ρ̃_i(t)/ρ_i(t) among those whose dilation value ρ̃_i(t)/ρ_i(t) is below a certain threshold γ.
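The strategies above can be sketched as one selection function. This is a minimal Python illustration; the application field names ('beta', 'eff', 'opt_eff') and the exact tie-breaking behavior are our assumptions, not the paper's specification:

```python
def progress(a):
    """Current efficiency ratio rho~_i(t) / rho_i: low means far behind."""
    return a['eff'] / a['opt_eff']

def pick_next(apps, strategy, gamma=0.7):
    """Pick the application granted I/O bandwidth at the next event."""
    if strategy == 'MinDilation':        # rescue the most slowed-down app
        return min(apps, key=progress)
    if strategy == 'MaxSysEff':          # favor apps holding many nodes
        return max(apps, key=lambda a: a['beta'])
    if strategy == 'MinMaxGamma':        # MaxSysEff unless someone starves
        starving = [a for a in apps if progress(a) < gamma]
        if starving:
            return min(starving, key=progress)
        return max(apps, key=lambda a: a['beta'])
    return apps[0]                       # RoundRobin: FCFS order

apps = [
    {'name': 'A1', 'beta': 4, 'eff': 0.50, 'opt_eff': 0.80},  # ratio 0.625
    {'name': 'A2', 'beta': 8, 'eff': 0.70, 'opt_eff': 0.80},  # ratio 0.875
]
```

With these toy numbers, MinDilation picks A1 (lowest ratio), MaxSysEff picks A2 (most nodes), and MinMax-γ with γ = 0.7 picks A1 because its ratio falls below the threshold.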

Periodic I/O Scheduling
For the case in which application instances have a fixed length, periodic I/O scheduling exploits the periodicity of the applications to assign I/O bandwidth to each application offline [14]. It searches for an appropriate period through an exponential search and inserts each schedulable application into the period based on some strategy, in the same way as the online method. This method outperforms online I/O scheduling in this special case. Moreover, it is decentralized, so it causes no additional overhead while the applications run.
The method first sets the minimum possible period T_min = max_i(W_i + IO_i/B_i) and the maximum possible period T_max = K′ · T_min with a specified parameter K′. It searches all possible periods between T_min and T_max in increasing order by a factor of (1 + ε). For each candidate period T, it inserts each schedulable application A_i into the current bandwidth allocation via insert-in-pattern(P, A_i); A_i is schedulable if there is enough space in the period to satisfy its I/O volume. Finally, it chooses the optimal period T_opt that yields the best system efficiency SE. The detailed algorithm is shown in Algorithm 1.

Algorithm 1 Periodic I/O Scheduling (PerSched) [14]
Input: A set of applications A_i(β_i, N_i, r_i, W_i, IO_i), PFS bandwidth B, local bandwidth b, K′, ε
Output: The bandwidth allocation P_opt for all applications and the period T_opt
1: T_min = max_i(W_i + IO_i/B_i), T_max = K′ · T_min
2: T = T_min
3: SE = 0, T_opt = 0, P_opt = ∅
4: while T ≤ T_max do
5:   P = ∅
6:   while exists a schedulable application do
7:     A = {A_i | A_i is schedulable}
8:     choose A_i from A by strategy MaxSysEff
9:     insert-in-pattern(P, A_i)
10:  end while
11:  if SE(P) > SE then SE = SE(P), T_opt = T, P_opt = P end if
12:  T = (1 + ε) · T
13: end while
14: return P_opt, T_opt
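The exponential period search at the heart of PerSched can be isolated in a few lines. This sketch only enumerates the candidate periods, assuming the feasibility check (insert-in-pattern) is handled elsewhere; the tuple layout of `apps` is our own:

```python
def candidate_periods(apps, B, b, K_prime=10, eps=0.1):
    """Yield the candidate period lengths examined by the exponential
    search: geometric steps by (1 + eps) from T_min up to K' * T_min.
    `apps` holds (beta_i, W_i, IO_i) tuples (hypothetical layout)."""
    T_min = max(W + IO / min(B, beta * b) for beta, W, IO in apps)
    T_max = K_prime * T_min
    T = T_min
    while T <= T_max:
        yield T
        T *= 1.0 + eps

# One app on 8 nodes (b = 1), W = 100 s, IO = 20 units, B = 10:
# T_min = 100 + 20/8 = 102.5; candidates grow by 10% up to 1025.
periods = list(candidate_periods([(8, 100.0, 20.0)], B=10.0, b=1.0))
```

The geometric stepping keeps the number of candidates logarithmic in K′/ε, which is what makes the offline search affordable.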

Motivations
This work is motivated by three observations. First, for reasons such as the convergence of big data and HPC, many stochastic iterative applications, whose computing phases follow certain distributions, are deployed on HPC platforms. Second, the existing methods cannot adequately exploit the characteristics of these applications: the general online method ignores their periodicity and stochasticity completely, and the periodic method cannot adapt to stochastic applications directly. Third, the effects of the lengths of different application instances becoming longer or shorter can cancel each other out. We therefore propose the adaptively periodic method to satisfy the requirement of stochasticity.

Adaptively Periodic I/O Scheduling
In this section, we describe the adaptively periodic I/O scheduling (APIO) in detail. First, we introduce the overall scheme and the related data structure. Then, we present the APIO algorithm and give some analysis results.

Scheme and Data Structures
In order to exploit the periodicity and stochasticity of applications with a stochastically iterative I/O periodicity, we construct a scheme based on the periodic I/O scheduling. For each stochastic iterative application A_i, the length of the computing phase W_i^k of each instance I_i^k is a random variable following a distribution D(µ_i, σ_i); the actual length of W_i^k is known only after that computing phase finishes. The overall scheme includes two steps. In the first step, it sets each W_i^k to µ_i and uses the periodic I/O scheduling (Algorithm 1) to obtain a basic schedule P (the periodic pattern), which can be expressed as the union of the per-instance assignments I_i^k. Each instance I_i^k holds a sequence of pairs <t_j, b_j>, meaning that the I/O of A_i runs from time t_j with bandwidth b_j. We also construct an auxiliary array, free, to record the free PFS I/O bandwidth; free is likewise a sequence of <t_j, b_j> pairs. In the second step, the scheme adjusts the basic schedule P at each event, i.e., whenever a computing phase ends.
To clarify the algorithm of the second step, we introduce a list data structure, L, which records the start time of the first I/O part of each instance I_i^k as <I_i^k.t_1, p_i^k>, where p_i^k is a pointer to the instance I_i^k. L is sorted in increasing order of I_i^k.t_1. The basic schedule P and the free space free also serve as input to the second step. These three main data structures are illustrated in Figure 3.
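A minimal Python layout for the three data structures might look as follows; the concrete representations, keys, and numbers are illustrative assumptions, not the paper's implementation:

```python
import bisect

# P: per-instance I/O plans, each a chronologically ordered list of
# <t_j, b_j> parts (start time, bandwidth).  Keys are (app, instance).
pattern = {
    ('A1', 0): [(10.0, 2.0), (12.0, 1.0)],
    ('A2', 0): [(11.0, 3.0)],
}
# free: unallocated PFS bandwidth over time, in the same <t, b> shape.
free = [(0.0, 10.0), (13.0, 10.0)]

# L: sorted start times of each instance's *first* I/O part, so that all
# instances overtaken by an event are found with a single bisect.
L = sorted((parts[0][0], key) for key, parts in pattern.items())

def instances_before(L, t):
    """Instances whose planned first I/O part starts strictly before t."""
    return [key for start, key in L[:bisect.bisect_left(L, (t,))]]
```

Keeping L sorted means an event at time t locates every overtaken instance in O(log n) plus the size of the answer, which is what makes the online adjustment cheap.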

Adjusting the Periodic Schedule
Because the length of the computing phase of each instance of a stochastic iterative application varies randomly, the periodic I/O scheduling pattern should be adjusted, either to achieve better performance or to accommodate extended computing. For an instance of an application, if its computing phase ends early, its I/O phase can be executed ahead of schedule; otherwise, the execution of its I/O phase must be postponed.
Specifically, when the computing phase of an instance finishes its execution on the computing nodes, it issues an event to signal that its I/O phase can start. Let e_i^k be the event issued when the computing phase of the k-th instance of the i-th application finishes. If the time e_i^k.t at which the event is issued is earlier than the assigned time I_i^k.t_1, the I/O transfer should start earlier: its periodic schedule, I_i^k{<t_1, b_1>, <t_2, b_2>, ...}, is modified, taking space from free to execute IO_i. Similarly, when e_i^k.t is later than I_i^k.t_1, the schedule is also adjusted.
In addition, when an event occurs, some previously assigned I/O parts may not have been executed yet; we can assert that their execution must be postponed, so their related schedules are recalculated as well. The detailed algorithm for the online execution of the stochastic iterative applications is described in Algorithm 2, and further explanation of the specific operations follows.

Algorithm 2 Online Adjustment (OnlineAdj)
Input: A set of applications A_i(β_i, N_i, r_i, D(µ_i, σ_i), IO_i), PFS bandwidth B, local bandwidth b, the periodic schedule P
Output: The used time T_per for the current period
1: get the application set A, the remaining PFS bandwidth free, and the auxiliary list L from P
2: while exists an event e_i^k do
3:   if I_i^k is marked as empty then
4:     allocate bandwidth for I_i^k and update free
5:   else
6:     if e_i^k.t ≠ I_i^k.t_1 then
7:       clean I_i^k and update free
8:       allocate bandwidth for I_i^k and update free
9:     end if
10:    for each I_j^k in L with I_j^k.t_1 < e_i^k.t do
11:      empty I_j^k and update free
12:      remove the I_j^k term from L
13:    end for
14:  end if
15:  execute the current bandwidth assignment
16: end while
17: T_per = Time(A)

Cleaning Instances
When an event arrives earlier or later than expected, the previously assigned bandwidth is invalid, and the bandwidth assignment must be recalculated for the application issuing the event. Figure 4 shows an example in which an instance finishes its computing phase early. The solid line marked t_1^- represents the current time, and t_1 denotes the expected time in the periodic I/O scheduling at which the pre-assigned bandwidth parts should start. However, the computing phase of instance I_1^1 finishes early, so its pre-assigned bandwidth parts are invalid and it is reassigned bandwidth from the remaining bandwidth, free. The gray part of the figure represents the expected execution under the pre-assignment of the periodic I/O scheduling. Similarly, if the computing phase of instance I_1^1 finishes at a time t_1^+ later than t_1, its bandwidth is reassigned as well. For an instance I_i^k to be cleaned, we first release the bandwidth parts {<t_1, b_1>, <t_2, b_2>, ...} assigned to it and add them to free. The algorithm then allocates bandwidth to I_i^k with a best-effort strategy from the remaining bandwidth, free. Throughout these operations, each item <t_j, b_j> of I_i^k satisfies b_j < B_i, and each item <t_j, b_j> of free satisfies 0 ≤ b_j ≤ B. Note that when instance I_i^k issues the event e_i^k, a new bandwidth part <e_i^k.t, b_1> might be allocated; this adds a new item starting at e_i^k.t to free's bookkeeping and removes the first bandwidth part <I_i^k.t_1, p_i^k> of instance I_i^k from the auxiliary list L, since that part is now executed immediately.
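The cleaning operation can be sketched with a simplified model in which free bandwidth is tracked per unit-length time slot (a deliberate simplification of ours; the real parts have arbitrary start times and lengths):

```python
def clean_instance(parts, free, event_t, volume, B_i):
    """Release an instance's pre-assigned <t, b> parts back into `free`
    (spare bandwidth per unit-length time slot), then re-assign its I/O
    volume best-effort from the event time on, capped by B_i per slot."""
    for t, b in parts:                       # give the stale assignment back
        free[t] += b
    new_parts, remaining = [], volume
    for t in range(event_t, len(free)):
        if remaining <= 0:
            break
        b = min(free[t], B_i, remaining)     # per-app and PFS caps
        if b > 0:
            free[t] -= b
            new_parts.append((t, b))
            remaining -= b
    return new_parts

# The instance was planned at slot 3 with 2.0 bandwidth, but its
# computing phase finished at slot 0; 5.0 units are re-assigned greedily.
free = [4.0, 4.0, 0.0, 4.0, 4.0]
parts = clean_instance([(3, 2.0)], free, 0, 5.0, 3.0)
```

In the toy run, the stale slot-3 reservation returns to free and the 5.0 units are served from slots 0 and 1, respecting the per-application cap of 3.0 per slot.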

Emptying Instances
When an event e_i^k arrives, the bandwidth assignment of instance I_i^k is updated directly. However, if e_i^k.t is greater than the start time of the I/O phase of some other instance I_j^k with j ≠ i, we can assert that instance I_j^k will be postponed. To find such instances, we maintain the auxiliary list L, which records the first bandwidth part of each instance. From the beginning of the list, we find all terms with I_j^k.t_1 < e_i^k.t, mark those instances I_j^k with a flag variable empty, and return their assigned bandwidth to free. The record <I_j^k.t_1, p_j^k> of I_j^k in L is also removed. When an instance marked as empty later issues its own event, we simply allocate bandwidth for it from free directly. The cleaning operation acts on the instance issuing the event, whereas the emptying operation acts on the other instances. Algorithm 2 adjusts the periodic bandwidth assignment through both operations, which preserves the advantage of the periodic I/O scheduling.
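The emptying operation can be sketched similarly, again under our simplifying assumption that free bandwidth is tracked per unit-length time slot:

```python
def empty_overdue(L, pattern, free, event_t, empty_flags):
    """Mark instances whose planned first I/O start precedes the event
    time as empty, returning their bandwidth parts to `free`; they get
    fresh bandwidth later, when their own events arrive."""
    overdue = [key for start, key in L if start < event_t]
    for key in overdue:
        for t, b in pattern[key]:
            free[t] += b                  # release the stale reservation
        pattern[key] = []
        empty_flags.add(key)              # remember: reallocate on demand
    L[:] = [(s, k) for s, k in L if s >= event_t]
    return overdue

# Toy data: instance 'x' was planned to start its I/O at slot 1, but an
# event arrives at t = 2.0 before 'x' ever started; 'y' is untouched.
L = [(1, 'x'), (3, 'y')]
pattern = {'x': [(1, 2.0)], 'y': [(3, 1.0)]}
free = [0.0] * 5
empty_flags = set()
overdue = empty_overdue(L, pattern, free, 2.0, empty_flags)
```

Only instances strictly overtaken by the event lose their reservation, so punctual instances keep the offline pattern's guarantees.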

APIO Algorithm
In order to utilize the periodicity and stochasticity of the stochastic iterative applications, the adaptively periodic I/O scheduling (APIO) algorithm adjusts the bandwidth assignment of periodic I/O scheduling. It is composed of two basic modules: PerSched (Algorithm 1) and OnlineAdj (Algorithm 2). The complete description is shown in Algorithm 3.
APIO first calculates the total number of periods from each N_i and the number of instances of application A_i placed in one period by PerSched. Then, for each period, it performs online scheduling through the OnlineAdj module, whose input includes all the instance information of that period. Finally, it calculates the system efficiency SE and the dilation DI through the objectives shown in Formulas (1) and (2). The algorithm can be seen as a combination of online and offline ideas: it uses the periodic offline schedule to obtain prior information and then performs online adjustment to resist the stochasticity of the applications, thereby comprehensively utilizing both global and local information. The core loop accumulates the adjusted period lengths:

Algorithm 3 Adaptively Periodic I/O Scheduling (APIO)
1: (P_opt, T_opt) = PerSched(A, B, b, K′, ε)
2: calculate the total number of periods p_tot
3: T_tot = 0, p = 0
4: while p < p_tot do
5:   T_per = OnlineAdj(A, B, b, P_opt)
6:   T_tot += T_per, p++
7: end while
8: calculate SE and DI from T_tot
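The period-accumulation logic can be sketched in a minimal, runnable form; the inputs (instance counts, instances per period, and the stand-in for Algorithm 2) are hypothetical:

```python
import math

def apio_loop(N_list, per_period, online_adj):
    """Top-level APIO loop (sketch): repeat the periodic pattern until
    every application has completed all of its instances, summing the
    actual (adjusted) length of each period.  `per_period[i]` is how many
    instances of application i fit in one PerSched period; `online_adj`
    stands in for Algorithm 2 and returns the real length of one period."""
    p_tot = max(math.ceil(N / n) for N, n in zip(N_list, per_period))
    T_tot = 0.0
    for _ in range(p_tot):
        T_tot += online_adj()
    return T_tot

# Toy run: two apps with 10 and 6 instances, 2 resp. 3 per period,
# every adjusted period happening to last exactly 100 s -> 5 periods.
total = apio_loop([10, 6], [2, 3], lambda: 100.0)
```

Because the period count is driven by the slowest-to-finish application, faster applications may idle in later periods, which the dilation objective is meant to keep in check.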

Performance Analysis
APIO is an online scheduling built on the pre-assignment of the periodic I/O scheduling to exploit the characteristics of stochastic iterative applications. Here, we analyze the advantages of the proposed method in terms of effectiveness and efficiency.
The key operations of APIO are the advance and delay of I/O transfers relative to the pre-assignment of the periodic method. Neither operation worsens I/O congestion, since there is usually enough free space around the congested area; in practice, the I/O load is below one-third of the PFS aggregate bandwidth most of the time [4].
The advance of an I/O transfer can use the free space before the pre-assignment, so it does not degrade the schedule; even if there is no free space, the application's own pre-assignment can satisfy the I/O requirement. When an I/O transfer is postponed, some pre-assigned space might be wasted; in most cases, however, there is enough space to satisfy the postponed I/O requirement. With high probability (≥ 0.95), the length of a computing phase is less than twice its mean length, and the pre-assigned space of the next instance can also be used. Hence, Theorem 1 below holds.

Theorem 1. With high probability, the performance of APIO is within a factor of two of the online scheduling for stochastic iterative applications with Gaussian-distributed computing phases.
Proof. Without loss of generality, the performance considered here is the system efficiency over all the applications; similar results can be obtained for the other objectives.
For an application, the system efficiency is determined by its completion time. Assume that the completion time of the online scheduling proposed in reference [13] is T_online for the stochastic iterative applications, and that the completion time of the periodic I/O scheduling in reference [14] is T_periodic for the applications obtained by removing the stochasticity from the stochastic iterative applications. Since periodic I/O scheduling fully exploits the periodicity of the applications, it achieves a global optimization, and thus T_periodic < T_online with high probability.
APIO adjusts the pre-assignment of the periodic I/O scheduling. When the length of the computing phase gets shorter, the completion time of the instance is less than the pre-assignment. When the length gets longer, the completion time grows, but the length stays within a factor of two of the average with high probability: for the Gaussian distribution N(µ, σ²), the probability p(µ − 2σ ≤ x ≤ µ + 2σ) is approximately 0.954, and the pre-assigned space of the next instance can satisfy the I/O requirement of the current instance. So the completion time of APIO is within a factor of two of the periodic I/O scheduling with high probability, i.e., T_APIO < 2 · T_periodic, and therefore T_APIO < 2 · T_online. The theorem is proved.
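The Gaussian tail bounds used in the proof can be checked numerically with the standard normal CDF, which the Python standard library expresses via the error function:

```python
import math

def normal_cdf(z):
    """Standard normal CDF, Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Probability that a compute-phase length stays below mu + 2*sigma:
p_upper = normal_cdf(2.0)                          # ~0.977
# Probability of staying within mu +/- 2*sigma (two-sided):
p_two_sided = normal_cdf(2.0) - normal_cdf(-2.0)   # ~0.954
```

Both values exceed the p ≥ 0.95 threshold assumed above, so the "within a factor of two" argument holds whichever bound is used.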
Moreover, APIO is more efficient than the existing online scheduling [13]. It assigns the I/O bandwidth for each instance only once, with a few computations, rather than for each application at every event. The pre-assignment of I/O bandwidth for a period is pre-calculated, which provides the performance basis of our method and makes this efficiency possible. The remaining computational overhead is searching the sorted auxiliary arrays, which binary search performs in logarithmic time, adding negligible run-time cost.
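As an illustration of that search step, a sorted auxiliary array of pre-assigned start times can be queried with the standard-library `bisect` module. The array contents and the helper name are hypothetical, chosen only to show the O(log n) lookup:

```python
import bisect

# Hypothetical auxiliary array: pre-assigned slot start times, kept sorted.
pre_assigned_starts = [0.0, 2.5, 5.0, 7.5, 10.0]

def next_slot(t_now, starts=pre_assigned_starts):
    """Locate the first pre-assigned slot at or after t_now via
    binary search (O(log n) per query)."""
    i = bisect.bisect_left(starts, t_now)
    return starts[i] if i < len(starts) else None

next_slot(3.0)   # -> 5.0: first slot not earlier than t = 3.0
next_slot(11.0)  # -> None: no slot remains in this period
```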

Simulation Results
In this section, simulation experiments are designed to evaluate the performance of our proposed method, APIO. The experiments are conducted on stochastic iterative applications constructed from real applications with different I/O characteristics. We compare the system efficiency and application dilation of APIO and the online scheduling [13] under different I/O congestion settings. All simulations are implemented with the discrete event simulator introduced in reference [4], which maintains an event queue to mimic the execution of applications on an HPC platform.

Experiments Settings
The settings of the simulation experiments include the system configuration and the application configuration. Both are derived from the parameters of a real system and real applications.

System Configuration
In this work, the run-time platform is described by the simple model illustrated in Figure 1. The related system parameters follow the experiment settings in references [4,15], which originate from the real environment of the Intrepid Blue Gene/P supercomputer at Argonne National Laboratory, US.
The aggregate bandwidth of the PFS, B, is set to 100 GB/s. The peak bandwidth of each node, b, is 1 GB/s. The number of computing nodes is assumed to be sufficient; this simple model needs no other parameters. Given these platform parameters, the discrete event simulator can mimic the running of the entire HPC platform.
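Under this model, any bandwidth allocation must respect both limits: no application may exceed the combined peak of its nodes, and the total may not exceed the PFS aggregate. A minimal sketch of that feasibility check, with our own variable names for the two parameters above:

```python
# Platform parameters from the simulation settings (Intrepid-like).
B_PFS = 100.0    # aggregate PFS bandwidth, GB/s
b_node = 1.0     # peak bandwidth per computing node, GB/s

def feasible(allocations):
    """Check a list of (bandwidth_gbps, node_count) allocations against
    the per-application node peak and the aggregate PFS limit."""
    return (sum(gbps for gbps, _ in allocations) <= B_PFS
            and all(gbps <= b_node * nodes for gbps, nodes in allocations))

feasible([(60.0, 64), (30.0, 32)])   # 90 GB/s total, within both limits
feasible([(80.0, 64), (30.0, 32)])   # 110 GB/s total exceeds the PFS
```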

Application Configuration
The application settings used in this work also originate from the real applications reported in APEX's report (https://www.nersc.gov/assets/apex-workflows-v2.pdf, accessed on 18 April 2022) for the LANL (Los Alamos National Laboratory) workflows [4]. We consider four real scientific applications: the Eulerian Application Project (EAP), the Lagrangian Applications Project (LAP), Silverton, and the vector particle-in-cell (VPIC) code. The detailed characteristics of these applications are depicted in Table 2. Note that B_i^rate (GB/s) reflects the number of computing nodes assigned to application A_i, and the checkpoint time reflects the volume of its I/O transfers. The number of instances of each application in each set is as follows:

Set #       1   2   3   4   5   6   7   8   9   10
EAP         0   0   0   0   0   0   1   0   0   1
LAP        10   8   6   4   2   2   2   0   0   0
Silverton   0   1   2   3   0   4   0   1   5   1
VPIC        0   0   0   0   1   0   0   1   0   0

To model the stochasticity of the applications, we design three different distributions for the experiments. The Uniform distribution covers simple situations, while the Truncated Normal and LogNormal distributions are closer to the real situation. The parameters are derived from the characteristics of the APEX applications; the detailed distributions are shown in Table 4.
From the parameters of these applications, we construct the applications as the input of the discrete event simulator. The simulator eventually evaluates the objective functions and obtains the system efficiency and application dilation for the different I/O scheduling methods.
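Drawing the stochastic compute-phase lengths from the three distributions can be sketched with the standard-library `random` module. The function name and parameterization are illustrative (the actual parameters are those of Table 4), and the truncation bounds for the Truncated Normal are an assumption on our part:

```python
import random

def compute_phase(dist, mu, sigma, rng=random):
    """Draw one compute-phase length from one of the three distributions
    used in the experiments; parameter names are illustrative."""
    if dist == "uniform":
        # Equal probability over [mu - sigma, mu + sigma].
        return rng.uniform(mu - sigma, mu + sigma)
    if dist == "truncnorm":
        # Gaussian draw, rejecting samples outside (0, 2 * mu).
        while True:
            x = rng.gauss(mu, sigma)
            if 0.0 < x < 2.0 * mu:
                return x
    if dist == "lognormal":
        # Note: mu and sigma are on the log scale for this distribution.
        return rng.lognormvariate(mu, sigma)
    raise ValueError(f"unknown distribution: {dist}")
```

Feeding such draws to the simulator for each application instance reproduces the stochastic iterative behavior the experiments target.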

Results and Analysis
In this section, we show the experiment results by comparing the performance of APIO and the basic online I/O scheduling (BIOS [13]). Both methods are based on the MinMax-γ strategy with γ = 0.5, which achieves above-average performance compared to the other strategies [13]. We conduct the simulation for each set of applications and each probability distribution, and then calculate the system efficiency and application dilation defined in Section 3.2. For each set, the test is repeated five times and the average is reported.
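For concreteness, the two objectives can be sketched in the following common form; the exact definitions are Formulas (1) and (2) in Section 3.2, so treat these expressions as one plausible instantiation rather than the paper's formulas:

```python
def system_efficiency(compute_node_seconds, total_nodes, makespan):
    """Fraction of the available node-seconds actually spent computing
    (higher is better)."""
    return sum(compute_node_seconds) / (total_nodes * makespan)

def dilation(times_shared, times_alone):
    """Worst slowdown of any application relative to running alone on
    the platform (lower is better)."""
    return max(t_s / t_a for t_s, t_a in zip(times_shared, times_alone))

# Two applications computing 50 and 30 node-seconds on 10 nodes over 10 s:
system_efficiency([50.0, 30.0], total_nodes=10, makespan=10.0)  # -> 0.8
# The second application takes twice as long as when running alone:
dilation([12.0, 20.0], [10.0, 10.0])                            # -> 2.0
```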
Due to the simplicity of the Uniform distribution, we first show the results of APIO and BIOS under it. Here, the length of the computing phase of each application instance is distributed over the interval [a, b] with equal probability, and APIO can exploit this probabilistic characteristic to optimize the I/O scheduling. The detailed results are shown in Figure 5. From set #1 to #10, the I/O congestion decreases gradually, so, as Figure 5 shows, the system efficiency increases and the application dilation decreases accordingly. For all sets, the performance of APIO is superior to that of BIOS. At set #1, where I/O congestion is the most serious, APIO achieves its best relative performance; when the I/O congestion disappears, both methods obtain similar performance.
Second, in order to show the influence of the probability distributions, we conduct the simulation under the Truncated Normal and LogNormal distributions. The particular distribution parameters are listed in Table 4; the other experiment settings are the same as for the Uniform distribution. The detailed results are shown in Figure 6.
As the results show, the trends of the system efficiency and application dilation are the same as under the Uniform distribution, and APIO obtains better performance than BIOS. However, the performance under the Truncated Normal distribution is better than under the LogNormal distribution overall. The reason is that the proposed method favors symmetric stochastic changes of the computing phase: adaptively adjusting the periodic bandwidth allocation can counteract the effects of shrinking or expanding the computing phase, and the Truncated Normal distribution has better symmetry than the LogNormal distribution. This result shows that the performance is strongly affected by the characteristics of the application. In addition, when I/O congestion is serious, as in sets #1 and #2, the system efficiency of APIO under the LogNormal distribution even surpasses that of BIOS under the Truncated Normal distribution.

Conclusions
In this paper, we studied the I/O scheduling problem for applications with a stochastically iterative I/O periodicity, targeting objectives such as system efficiency and application dilation. The existing methods do not utilize the stochasticity and periodicity present in a wide range of applications, particularly big data analytics. To take both characteristics into account, we proposed an online scheduling method, the adaptively periodic I/O scheduling (APIO) method, which dynamically adjusts online the pre-assigned bandwidth provided by periodic I/O scheduling. APIO thus combines the advantage of periodic I/O scheduling, which exploits the periodicity, with online adjustment, which adapts to the stochasticity. We provided a performance analysis showing the effectiveness and efficiency of the proposed method, and the simulation results demonstrate its superiority over the existing online scheduling method.
In future work, a theoretical analysis based on computational complexity and probability theory will be conducted. Meanwhile, many directions for new I/O scheduling methods remain to be investigated: more sophisticated scheduling based on additional application properties, weighted I/O scheduling that considers the applications' criticality, and energy-efficient I/O scheduling based on an HPC platform's energy model in order to reduce energy consumption.