Hypergraph+: An Improved Hypergraph-Based Task-Scheduling Algorithm for Massive Spatial Data Processing on Master-Slave Platforms

Bo Cheng 1,2, Xuefeng Guan 1,2,*, Huayi Wu 1,2 and Rui Li 1,2
1 State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, 129 Luoyu Road, Wuhan 430079, China; chengbo@whu.edu.cn (B.C.); wuhuayi@whu.edu.cn (H.W.); ruili@whu.edu.cn (R.L.)
2 Collaborative Innovation Center of Geospatial Technology, 129 Luoyu Road, Wuhan 430079, China
* Correspondence: guanxuefeng@whu.edu.cn; Tel.: +86-27-6877-8311


Introduction
In recent years, with the rapid development of surveying and remote sensing technologies, the volume of spatial data has increased dramatically [1][2][3]. Spatial data processing is a typical data-intensive application in which users must access and process massive spatial data. Figure 1 depicts a typical data-intensive computing scenario comprising a set of storage and computing nodes that collaborate over a network. Each task requires a subset of input files from the storage nodes; a task may share a number of files with other tasks, while an individual task is submitted to one computing node for execution. The computing nodes are connected to the storage nodes through a network for data transfer. This collaboration is orchestrated by a task/data scheduling strategy; therefore, the efficiency of the scheduling strategy has an important influence on collaboration performance.
For such data-intensive applications, a number of scheduling strategies have been proposed, including task-oriented, data-aware, and hypergraph-based algorithms. Among these, only hypergraph-based algorithms can fully capture data sharing among tasks and thus minimize the overall data transfer while still maintaining a balanced distribution of computing loads across the nodes. Heterogeneous processing platforms, however, create additional problems for hypergraph-based scheduling strategies. The formulated hypergraph model can completely represent the relationship among tasks, data files and compute platforms, but because the task execution node and file transfer destination are unknown before scheduling, these hypergraph algorithms cannot take processor or network heterogeneity into consideration. Without due consideration of platform heterogeneity, scheduling with a single hypergraph partitioning is not optimal. Furthermore, the existing scheduling algorithms, including the hypergraph approaches, generally neglect the overlap between task executions and data transfers. This overlooked overlap can be exploited to further decrease the total task execution time.
To address these problems, we propose an extended hypergraph-based task-scheduling algorithm, named Hypergraph+. Hypergraph+ first encapsulates a master-slave platform, spatial data processing applications, and a scheduling objective into a general hypergraph model. The subsequent Hypergraph+ scheduling comprises two consecutive stages: matching and ordering. In the matching stage, a Fitness function represents platform heterogeneity, evaluates the quality of hypergraph partitioning, and selects the optimum partition. In the ordering stage, a Sharing-Files metric determines the task execution order so as to maximize the overlap between communication and computation.
We conducted experiments to compare our proposed Hypergraph+ algorithm with three classical scheduling algorithms on a virtual heterogeneous master-slave platform using the GridSim simulation toolkit [4]. These classical scheduling algorithms are MinMin [5], XSufferage [6], and the pure hypergraph-based scheduling algorithm [7], which we term Hypergraph in this paper for the sake of simplicity. The target application is a real inverse distance weighted (IDW) interpolation of a massive point cloud. Simulation results show that, in comparison to Hypergraph, our proposed Hypergraph+ algorithm can decrease task execution time by more than 43% when scheduling massive spatial data processing applications.
The rest of this paper is organized as follows. Hypergraph partitioning and scheduling strategies for data-intensive applications are introduced in Section 2. The formulated general hypergraph model for task scheduling is presented in Section 3. Section 4 describes the proposed Hypergraph+ algorithm, followed by simulation result details in Section 5. Section 6 concludes the paper.

Hypergraph and Hypergraph Partitioning
A hypergraph $H = (V, N)$ is defined as a set of vertices $V$ and a set of hyperedges $N$ that connect those vertices [8]. Each hyperedge $n_j \in N$ is a non-empty subset of the vertices, i.e., $n_j \subseteq V$. Figure 2 illustrates one hypergraph; in this figure, each closed curve represents one hyperedge and the dots inside the closed curve denote the vertices on this hyperedge. A graph can be treated as a special type of hypergraph in which each hyperedge connects exactly two vertices. As in a graph, weights $w_i$ and costs $c_j$ can be assigned to the vertices ($v_i \in V$) and hyperedges ($n_j \in N$) of the hypergraph, respectively.
ISPRS Int. J. Geo-Inf. 2016, 5, 141
A partition $\Pi = \{V_1, V_2, \ldots, V_K\}$ is called a K-way partition of $H$ if (1) each part $V_k$ is a non-empty subset of $V$; (2) the parts are pairwise disjoint; and (3) the union of all parts is equal to $V$. In a partition $\Pi$, a hyperedge that has at least one vertex in a part is said to be connected to that part. The connectivity set $\Lambda_j$ of a hyperedge $n_j$ denotes all the parts connected by $n_j$, and the connectivity value $\lambda_j = |\Lambda_j|$ is the number of parts connected by $n_j$. A hyperedge that connects more than one part is cut ($\lambda_j > 1$); otherwise, it is uncut ($\lambda_j = 1$). The cutsize of a partition $\Pi$ is computed as in Equation (1):
$$\mathrm{cutsize}(\Pi) = \sum_{n_j \in N_{cut}} c_j (\lambda_j - 1) \qquad (1)$$
where $N_{cut}$ is the set of all cut hyperedges and each cut hyperedge $n_j$ incurs a cost of $c_j(\lambda_j - 1)$. This partition cutsize is also known as the connectivity-1 metric.
To solve the hypergraph partitioning problem, a partition must be found in which the cutsize is minimized and a relative balance among all the parts is maintained. A partition $\Pi$ of $H$ is balanced if the workload $W_k$ of each part $V_k$ satisfies the balance criterion shown in Equation (2):
$$W_k \le W_{avg}(1 + \varepsilon), \quad k = 1, 2, \ldots, K \qquad (2)$$
where $W_k = \sum_{v_i \in V_k} w_i$ denotes the sum of the vertex weights of part $V_k$; $W_{avg} = \left(\sum_{v_i \in V} w_i\right)/K$ is the average part weight; and $\varepsilon$ is a predetermined maximum imbalance ratio.
Although hypergraph partitioning is an NP-hard problem, effective heuristic partitioning algorithms exist. In addition, freely available tools, such as hMETIS [9], PaToH [10], and Parkway [11], can be used to obtain high-quality hypergraph partitions.
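The cutsize and balance criterion can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used by any of the partitioning tools: the dictionary layout, the function names and the toy hypergraph are all hypothetical.

```python
from collections import defaultdict


def connectivity(hyperedges, part_of):
    """Return {edge: lambda_j}, the number of distinct parts each edge touches."""
    return {e: len({part_of[v] for v in verts}) for e, verts in hyperedges.items()}


def cutsize(hyperedges, costs, part_of):
    """Connectivity-1 metric: sum of c_j * (lambda_j - 1).

    Uncut edges have lambda_j = 1 and contribute zero, so summing over all
    edges equals summing over the cut edges only.
    """
    lam = connectivity(hyperedges, part_of)
    return sum(costs[e] * (lam[e] - 1) for e in hyperedges)


def is_balanced(weights, part_of, k, eps):
    """Check W_k <= W_avg * (1 + eps) for every one of the k parts."""
    load = defaultdict(float)
    for v, w in weights.items():
        load[part_of[v]] += w
    w_avg = sum(weights.values()) / k
    return all(load[p] <= w_avg * (1 + eps) for p in range(k))


# Toy hypergraph (illustrative values): 4 vertices, 3 hyperedges, 2 parts.
hyperedges = {"n1": ["v1", "v2"], "n2": ["v2", "v3"], "n3": ["v3", "v4"]}
costs = {"n1": 5, "n2": 3, "n3": 2}
weights = {"v1": 1, "v2": 1, "v3": 1, "v4": 1}
part_of = {"v1": 0, "v2": 0, "v3": 1, "v4": 1}

print(cutsize(hyperedges, costs, part_of))            # 3 (only n2 is cut)
print(is_balanced(weights, part_of, k=2, eps=0.05))   # True
```

In the toy example only hyperedge n2 spans both parts, so the cutsize equals its cost, and both parts carry the average weight.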
Moreover, a hypergraph can also be represented as a bipartite graph, e.g., a conceptual graph [19,20]. A conceptual graph contains two disjoint sets of vertices, with the semantic relationships represented as directed edges connecting vertices of the two sets. Conceptual graphs can support problem solving and decision making processes, including artificial intelligence, data mining and case-based reasoning [21]. However, a hypergraph is more general and more intuitive than a conceptual graph, and was therefore selected here to model massive spatial data processing.

The Scheduling Heuristics for Data Intensive Applications
Scheduling data-intensive applications has been extensively studied, and a number of scheduling algorithms have been proposed. According to whether and how they take data transfer into account, these algorithms can be classified into three categories: task-oriented, data-aware, and hypergraph-based.
Task-oriented scheduling algorithms usually require detailed information about tasks and machines for accurate estimation of task execution times on each machine. Maheswaran et al. proposed several typical mapping heuristics including MinMin, MaxMin and Sufferage [5]. From the unscheduled tasks, MinMin chooses the task that has the minimum earliest completion time and allocates this task to the corresponding machine that can complete it the quickest. Unlike MinMin, MaxMin assigns the task with the maximum earliest completion time to the fastest executing node. Sufferage selects the task with the highest sufferage value, defined as the difference between its earliest completion time and its second earliest completion time. None of these heuristics, however, considers data issues when making scheduling decisions, and they are therefore inefficient for data-intensive applications.
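The MinMin selection rule described above can be sketched as follows. This is a simplified illustration, assuming a hypothetical matrix `exec_time[t][m]` of estimated execution times and machines that become free at a known ready time; consistent with the description, it ignores data transfer entirely.

```python
def min_min(exec_time, n_machines):
    """Sketch of the MinMin heuristic.

    exec_time[t][m] is the (assumed known) execution time of task t on
    machine m. Returns (task -> machine assignment, makespan).
    """
    ready = [0.0] * n_machines          # time at which each machine is free
    unscheduled = set(exec_time)
    assignment = {}
    while unscheduled:
        best = None
        for t in sorted(unscheduled):   # sorted only for deterministic ties
            # Machine giving this task its earliest completion time.
            m = min(range(n_machines), key=lambda j: ready[j] + exec_time[t][j])
            ct = ready[m] + exec_time[t][m]
            # Keep the task whose earliest completion time is smallest.
            if best is None or ct < best[0]:
                best = (ct, t, m)
        ct, t, m = best
        assignment[t] = m
        ready[m] = ct
        unscheduled.remove(t)
    return assignment, max(ready)


# Illustrative execution times for three tasks on two machines.
times = {"a": [2, 4], "b": [3, 1], "c": [5, 5]}
print(min_min(times, 2))   # ({'b': 1, 'a': 0, 'c': 1}, 6)
```

MaxMin would differ only in keeping the task whose earliest completion time is largest in each round.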
Unlike task-oriented algorithms, data-aware scheduling algorithms can produce significant performance improvements because they take both data transfer and task scheduling into account [2,22,23]. Casanova et al. proposed an extension of Sufferage called XSufferage, which exploits file locality and computes a cluster-level sufferage value to achieve better performance [6]. The Close-to-Files algorithm [24] schedules tasks with file replication on the least loaded processor close to the sites where the input files are stored. Zhang et al. proposed metaheuristic data pre-scheduling and dynamic task scheduling strategies to solve all-to-all comparison problems in heterogeneous distributed systems [25]. Szmajduch and Kołodziej presented a new version of the Expected Time to Compute Matrix model (ETC Matrix), which accounts for both data transmission and task computation [26]. These data-aware scheduling algorithms, however, do not consider file sharing patterns in a global way and thus cannot fully exploit high degrees of shared I/O.
Hypergraph-based scheduling algorithms globally optimize the data transfer during task scheduling. Khanna et al. proposed a hypergraph partitioning-based strategy to schedule a batch of independent tasks so as to minimize the volume of remote data transfer and contention on storage nodes while maintaining a balanced computational load distribution across compute nodes [7]. Kaya and Aykanat proposed an iterative scheduling approach that improves the scheduling performance by adopting hypergraph partitioning [18]. These algorithms exploit data sharing in a global way and achieve better performance than the other two types of algorithms.
However, since the task execution node and file transfer destination are unknown before scheduling, the formulated hypergraph scheduling model cannot fully represent the underlying heterogeneous platforms, in which the processors have different processing capabilities and the network links have different bandwidths. Hence, a single hypergraph partitioning may not be optimal, since platform heterogeneity is neglected. Furthermore, these scheduling algorithms cannot adequately exploit the overlap between communication and computation. Therefore, a new task-scheduling algorithm is needed that can address the heterogeneous platform problem and maximize the communication-computation overlap.

Hypergraph-Based Task Scheduling Model
In this section, we formulate a general hypergraph-based task scheduling model consisting of a master-slave platform, spatial data processing applications, and the scheduling objective.

Platform Model
The target platform conforms to a typical heterogeneous master-slave paradigm and contains a master $P_0$ and a set of $p$ slave processors, $P = \{P_1, P_2, \ldots, P_p\}$, as depicted in Figure 3. The master $P_0$ is connected to the slaves over a local area network. The slaves are employed as computing nodes, and each has a relative processing capability $\rho$ to execute the tasks. We assume that all the data files are initially stored on the master $P_0$, so if an input file required by a task is not on the slave processor where the task is executed, it must be requested from the master $P_0$. The bandwidth of the link between the master $P_0$ and the slave processor $P_k$ is denoted by $b_k$ ($k = 1, 2, \ldots, p$), while the maximum outgoing bandwidth of $P_0$ is denoted by $b_m$. In order to decrease the waiting time for tasks, task executions and file transfers can overlap on the slaves, i.e., a slave processor can execute a task while receiving the files needed to execute the next task.
The multiplexed connection model [27] is used for communications between the master and the slaves: (1) it allows multiple slaves to download files from the master $P_0$ simultaneously; (2) two slaves cannot request the same file at the same time; and (3) a slave processor can receive another file only after it has saved the previously received file on its local disk.

Application Model
The spatial data processing application $A = (T, F)$ consists of a set of independent tasks $T = \{t_1, t_2, \ldots, t_n\}$ and $m$ files $F = \{f_1, f_2, \ldots, f_m\}$. The execution of each task $t_i$ depends upon a subset of files, denoted by $F_i = \{f_1, f_2, \ldots, f_k\}$; a given file may be shared by several tasks. The target application $A$ can be represented as a hypergraph model $H = (V, N)$ to capture this data-sharing pattern. In the formulated hypergraph model $H$, tasks correspond to vertices and files correspond to hyperedges. A hyperedge $n_j$ connects the vertices of all tasks that need the file $f_j$ as input, i.e., the set of tasks that share this file. The vertex weight $w_i$ is the estimated completion time $T_{ct}(t_i)$ of the corresponding task, and the hyperedge cost $c_j$ is equal to the file size $Size(f_j)$.
The estimated completion time $T_{ct}(t_i)$ of one task is the sum of the total input file transfer time from the master $P_0$ and the actual task computation time. Prior to task mapping, the file transmission destination is unknown, but the file transfer time can be estimated as the size of file $f_j$ divided by the maximum outgoing bandwidth $b_m$ of $P_0$. For spatial data processing applications, it is reasonable to assume that the actual computation time of a task is proportional to the size of its input files $F_i$, with $C$ the predefined computation cost of one data byte. Thus, the total estimated completion time of task $t_i$ is defined as in Equation (3):
$$T_{ct}(t_i) = \sum_{f_j \in F_i} \frac{Size(f_j)}{b_m} + C \sum_{f_j \in F_i} Size(f_j) \qquad (3)$$
Neighborhood computations are usually required in spatial data processing applications. Figure 4 shows some typical neighborhood configurations, including von Neumann, Moore, and extended Moore neighborhoods [28]. In neighborhood processing, one cell generally corresponds to a block of pixels, whose attribute values are stored in a file. When a neighborhood algorithm is used, for example, to calculate slopes and aspects from elevations, the computation task for a given cell requires the values of its neighborhood cells (including the cell itself), i.e., a set of corresponding files.
For example, for nine blocks stored in files A to I and arranged in a 3 × 3 grid (A, B, C in the first row; D, E, F in the second; G, H, I in the third), a Moore neighborhood yields the following input file sets for the nine tasks: F1 = {A, B, D, E}, F2 = {A, B, C, D, E, F}, F3 = {B, C, E, F}, F4 = {A, B, D, E, G, H}, F5 = {A, B, C, D, E, F, G, H, I}, F6 = {B, C, E, F, H, I}, F7 = {D, E, G, H}, F8 = {D, E, F, G, H, I}, and F9 = {E, F, H, I}.
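The application model can be illustrated with a short sketch that derives the Moore-neighborhood file sets for a grid of blocks, evaluates Equation (3), and builds the task/file hypergraph. The function names, the single-letter file naming (valid for at most 26 blocks) and the numeric values for file size, $b_m$ and $C$ are illustrative assumptions only.

```python
def moore_file_sets(rows, cols):
    """Map each task (one per block, numbered row by row from 1) to the
    files of its Moore neighborhood, including the block itself."""
    names = [[chr(ord("A") + r * cols + c) for c in range(cols)] for r in range(rows)]
    file_sets = {}
    for r in range(rows):
        for c in range(cols):
            task = r * cols + c + 1
            file_sets[task] = sorted(
                names[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            )
    return file_sets


def estimated_completion_time(files, size, b_m, cost_per_byte):
    """Equation (3): transfer time at bandwidth b_m plus computation time."""
    total = sum(size[f] for f in files)
    return total / b_m + cost_per_byte * total


def build_hypergraph(file_sets, size, b_m, cost_per_byte):
    """Tasks become vertices (weight T_ct); files become hyperedges (cost Size)."""
    hyperedges = {}
    for task, files in file_sets.items():
        for f in files:
            hyperedges.setdefault(f, []).append(task)
    vertex_weight = {
        t: estimated_completion_time(fs, size, b_m, cost_per_byte)
        for t, fs in file_sets.items()
    }
    edge_cost = {f: size[f] for f in hyperedges}
    return hyperedges, vertex_weight, edge_cost


file_sets = moore_file_sets(3, 3)
size = {f: 100 for f in "ABCDEFGHI"}          # illustrative file sizes
hyperedges, vertex_weight, edge_cost = build_hypergraph(file_sets, size, 50, 0.01)
print(file_sets[1])      # ['A', 'B', 'D', 'E']
print(hyperedges["E"])   # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The center file E is needed by all nine tasks, so its hyperedge connects every vertex, which is exactly the sharing pattern the partitioner tries to keep inside one part.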

Scheduling Objective
The scheduling objective is to minimize the overall execution time, known as the makespan, which starts with the first file transfer and ends with the completion of the last task execution [18]. Since the estimated completion time of one task is the sum of the total data transfer time and the actual task computation time, the scheduling objective is to reduce the amount of data transfer and balance the computational load across the slaves so that the overall execution time is minimized. With the help of the formulated hypergraph model, this objective is recast as the objective of hypergraph partitioning.
Data transfer minimization is achieved by the hypergraph partitioning objective. After constructing the hypergraph $H$, the objective of a typical hypergraph partitioning problem is to find a partition $\Pi = \{V_1, V_2, \ldots, V_K\}$ in which the cutsize is minimized. For a given partition $\Pi$, a cut hyperedge $n_j$ with connectivity $\lambda_j$ means that the file $f_j$ needs to be transferred $\lambda_j - 1$ more times, which incurs an additional $(\lambda_j - 1) \cdot Size(f_j)$ bytes of data transmission. Thus, the total communication cost can be computed as in Equation (4):
$$\mathrm{comm}(\Pi) = \sum_{n_j \in N} c_j + \mathrm{cutsize}(\Pi) \qquad (4)$$
where $\sum_{n_j \in N} c_j$ is equal to the total input file size and can be treated as a constant, so $\mathrm{comm}(\Pi)$ depends only on $\mathrm{cutsize}(\Pi)$. Thus, minimizing the cutsize is equivalent to minimizing the total data transfer.
The scheduling load balance is guaranteed by the hypergraph partitioning constraint. Equation (2) shows that a partition $\Pi$ of $H$ is balanced if each part $V_k$ satisfies the balance constraint. Since the estimated completion time of a task is the weight of the corresponding vertex, the workload of one slave processor $P_k$ is equal to the accumulated execution time of all the tasks assigned to it, as in Equation (5):
$$W_k = \sum_{v_i \in V_k} w_i = \sum_{t_i \in V_k} T_{ct}(t_i) \qquad (5)$$
Thus, achieving balance among all the grouped vertices during hypergraph partitioning corresponds to balancing the workload of the slave processors during scheduling.
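As a small worked instance of the relation between communication cost and cutsize, consider three files of 4, 2 and 6 MB, of which only the 2 MB file is cut, with connectivity $\lambda = 2$. These numbers are purely illustrative and are not taken from the experiments.

```latex
\begin{aligned}
\sum_{n_j \in N} c_j &= 4 + 2 + 6 = 12\ \text{MB} \\
\mathrm{cutsize}(\Pi) &= 2 \times (2 - 1) = 2\ \text{MB} \\
\mathrm{comm}(\Pi) &= 12 + 2 = 14\ \text{MB}
\end{aligned}
```

Every file is transferred at least once (12 MB in total), and the shared file is sent one additional time because its tasks are split across two slaves.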

The Hypergraph+ Scheduling Algorithm
The proposed Hypergraph+ scheduling algorithm has two consecutive stages: matching and ordering. Section 4.1 introduces hypergraph partitioning for mapping tasks to the slave processors. Section 4.2 explains an ordering algorithm that efficiently orders tasks for execution and transfers the needed files accordingly.

Hypergraph Partitioning for Matching Tasks
Hypergraph partitioning provides an initial scheme for assigning tasks to slave processors so that data transfers are minimized and computational workloads are balanced. A single hypergraph partitioning may not be optimal, however, on a heterogeneous platform. Therefore, we consider both network heterogeneity and processor heterogeneity when evaluating the quality of partitioning results. The whole flow of matching tasks to slaves is shown in Figure 7. First, the input hypergraph model is quickly partitioned with the PaToH tool [10] to obtain a partition $\Pi = \{V_1, V_2, \ldots, V_K\}$. Next, the partition $\Pi$ is evaluated with the fitness function Fitness($\Pi$). Finally, the optimum partition is chosen to map tasks to slaves. The Fitness($\Pi$) evaluation proceeds as follows:
(a) Equation (4) is only valid for homogeneous networks, in which the per-file transfer factor $\lambda_j$ is set to the constant $1/b_m$. After obtaining one hypergraph partitioning, the actual communication volume $\mathrm{comm}'(\Pi)$ is calculated with Equation (6). For a heterogeneous network, this factor is replaced by a sum over the slave processors in $\Lambda_j$, where $\Lambda_j$ denotes the set of slave processors to which file $f_j$ must be transferred, and $b_k$ is the bandwidth between $P_k$ and $P_0$.
(b) Then, the actual workload $W'_k(\Pi)$ of each slave processor is calculated, which is the sum of the computation load and the communication cost. In contrast to Equation (5), the relative processing capability $\rho_k$ is added to represent the actual computational load, and $b_k$ is substituted for $b_m$ to derive the actual communication load in Equation (7).
(c) The average workload $W'_{avg}(\Pi)$ and the mean square deviation of the workload $sd'_{W'_k}(\Pi)$ are calculated as in Equations (8) and (9).
The fitness value is defined in Equation (10): a smaller fitness value implies that the partition has lower communication overhead and a more equally balanced computational load.
From Equation (10), a lower fitness value means a better partition quality. Generally, the lowest fitness value found decreases as the number $n$ of partitioning repetitions grows, but a greater repetition number increases the overall evaluation time. To achieve a cost/quality balance, the repetition number $n$ is chosen as follows. Initially, $n$ is set to a given number (e.g., 10). Then, $n$ is doubled each time until the percentage decrease in the lowest fitness value is smaller than a given threshold (e.g., 5%). In this way, the evaluation yields a satisfactory partition quality without costing much time.
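The doubling rule for the repetition number can be sketched as below. The two callables are hypothetical stand-ins: `partition_once` represents one randomized partitioner run (for example, a PaToH call with a new seed) and `fitness` represents the fitness function of Equation (10), which is not reproduced here. The `max_n` safety cap is an added assumption, not part of the described procedure.

```python
import random


def select_partition(partition_once, fitness, n0=10, threshold=0.05, max_n=1280):
    """Repeat partitioning, doubling the repetition number n until the best
    (lowest) fitness value improves by less than `threshold`.

    Returns (best partition found, final repetition number n).
    """
    best = min((partition_once() for _ in range(n0)), key=fitness)
    n = n0
    while n < max_n:
        # Doubling n means drawing n additional candidate partitions.
        extra = [partition_once() for _ in range(n)]
        n *= 2
        candidate = min(extra + [best], key=fitness)
        old, new = fitness(best), fitness(candidate)
        best = candidate
        # Stop once the relative decrease in the lowest fitness is small.
        if old == 0 or (old - new) / old < threshold:
            break
    return best, n


# Toy usage: a "partition" is just a random number and its fitness is itself.
random.seed(0)
best, n = select_partition(lambda: random.uniform(1.0, 2.0), lambda p: p)
print(best, n)
```

Keeping the previous best among the candidates guarantees that the lowest fitness never increases from one round to the next.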

Ordering Tasks and File Transfers
After all the tasks have been assigned to their destination processors, Hypergraph+ will then determine the task execution order and input file transfer so as to maximize the overlap between computation and communication while decreasing the end-point contention among the slaves.
To maximize this overlap, a Sharing-Files (SF) metric is introduced to order the task execution on each processor. This metric measures how similar one task is to the other tasks. The SF value of a task is defined as the number of bytes that the task's input files share with the other tasks on the assigned processor. It is calculated as in Equation (11). A task $t_i$ with a higher $SF(t_i)$ value shares its input files with more tasks. The task with the highest SF value is executed first, so the files it requires are transferred first; the other tasks that rely on these files can then be executed subsequently. In this way, communication and computation overlap, which decreases the waiting time of tasks. Algorithm 1 outlines the proposed task ordering heuristic.
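The ordering idea can be sketched under one explicit assumption about the SF value: here it is taken to be the total size of the files a task has in common with each other task on the same processor, summed over those tasks. This is one plausible reading of the verbal definition, not necessarily the exact form of Equation (11), and the function names and toy data are hypothetical.

```python
def sharing_files(task, tasks_on_proc, file_sets, size):
    """Assumed SF value: bytes of input files shared with each other task
    assigned to the same processor, accumulated over those tasks."""
    own = set(file_sets[task])
    return sum(
        size[f]
        for other in tasks_on_proc
        if other != task
        for f in own & set(file_sets[other])
    )


def order_tasks(tasks_on_proc, file_sets, size):
    """Order tasks by decreasing SF value (ties broken by task id)."""
    return sorted(
        tasks_on_proc,
        key=lambda t: (-sharing_files(t, tasks_on_proc, file_sets, size), t),
    )


# Toy data: three tasks on one processor, illustrative file sizes in bytes.
file_sets = {1: ["A", "B"], 2: ["B", "C"], 3: ["C"]}
size = {"A": 1, "B": 10, "C": 100}
print(order_tasks([1, 2, 3], file_sets, size))   # [2, 3, 1]
```

Task 2 shares file B with task 1 and file C with task 3, so it runs first and its files are already local when the other two tasks start.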

Simulated Resources
Simulations provide a repeatable and controllable evaluation environment, and were used to evaluate our proposed Hypergraph+ algorithm. We selected the GridSim toolkit [4] to conduct the simulations since it allows us to model heterogeneous processor resources and network connectivity with different bandwidths. GridSim also supports both static and dynamic scheduling simulations.
In this simulation, six slave processors were defined as in Table 1 to execute the input tasks. Each slave processor is characterized by two attributes: CPU speed and network bandwidth. Since the task execution time can be defined in terms of million instructions (MI), the CPU speed was modeled in million instructions per second (MIPS). The network bandwidth is the bandwidth of the link between the master and the slave. The MIPS and bandwidth values were randomly generated for this evaluation experiment [29].

Experimental Application and Datasets
We selected spatial interpolation as the target application to evaluate the Hypergraph+ scheduling algorithm. For simplicity, inverse distance weighted (IDW) interpolation was used in the experiments. IDW reflects the principle that the estimated value of a cell is more likely to be correlated with nearby points than with distant points [30]. The IDW interpolation equation is defined as
$$Z_p = \frac{\sum_{i=1}^{k} Z_i / d_i^{\beta}}{\sum_{i=1}^{k} 1 / d_i^{\beta}}$$
where $Z_p$ is the interpolated value at the target point $p$; $Z_i$ is the observed value at the $i$th scatter point $p_i$ in the neighborhood of $p$; $k$ is the number of scatter points taken into the interpolation in the predefined neighborhood of $p$; $d_i$ is the Euclidean distance from the $i$th scatter point $p_i$ to $p$; and $\beta$ is an arbitrary positive number called the weighting exponent.
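The IDW estimate for a single target point can be sketched as follows. This is a minimal illustration assuming 2D coordinates and a neighborhood that has already been selected; the function name and the sample values are hypothetical, and returning the sample value when the target coincides with a scatter point is a common convention added here to avoid division by zero.

```python
import math


def idw(target, points, beta=2.0):
    """Inverse distance weighted estimate at `target`.

    points: list of ((x, y), z) scatter points in the neighborhood.
    beta:   weighting exponent (positive).
    """
    num = 0.0
    den = 0.0
    for (x, y), z in points:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return z            # target coincides with a scatter point
        w = 1.0 / d ** beta     # weight decays with distance
        num += w * z
        den += w
    return num / den


# Two scatter points at distances 1 and 2 from the origin.
samples = [((1.0, 0.0), 10.0), ((0.0, 2.0), 30.0)]
print(idw((0.0, 0.0), samples))   # 14.0
```

With $\beta = 2$ the nearer point receives four times the weight of the farther one, so the estimate of 14.0 lies closer to 10 than to 30.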
A LiDAR point cloud dataset was used as the real input in the experiments. The data were acquired in Gilmer County, West Virginia, USA, and were freely available for download on the Internet (http://www.wvview.org/data/lidar/Gilmer/). The dataset contains 0.883 billion points with a point spacing of about 1.4 m, as illustrated in Figure 8. It is stored in the ASPRS LAS file format, and the total data size is approximately 16.4 GB.
The experimental LiDAR dataset was then divided into multiple point blocks. IDW interpolation of a block requires its Moore neighborhood, i.e., the neighboring blocks, as input, so the formulated hypergraph application model is the same as the example model defined in Section 3.2. In this application model, the hyperedge weight $c_j$ was set to the size of the corresponding point block, and the vertex weight $w_i$ was set to the interpolation time $T_{ct}$ of the corresponding point block.
In Equation (3), we derived that the actual computation time of one task is proportional to the size of its input files. An additional experiment was therefore conducted first to quantify the relationship between the IDW interpolation time and the input point size. As shown in Figure 9, the relationship between the data size of the points and the IDW interpolation runtime is almost linear ($R^2 > 0.99$). From the fitted curve, the constant C in Equation (3) was found to be 0.0002.
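The kind of fit described here can be sketched as follows. This is a minimal illustration, assuming that Equation (3) has the proportional form time = C × size (no intercept); the function names and the sample measurements are hypothetical and are not the data behind Figure 9.

```python
def fit_through_origin(sizes, times):
    """Least-squares slope C for the model time = C * size (no intercept)."""
    sxx = sum(s * s for s in sizes)
    sxy = sum(s * t for s, t in zip(sizes, times))
    return sxy / sxx


def r_squared(sizes, times, c):
    """Coefficient of determination of the fitted line, relative to the mean of the observations."""
    mean_t = sum(times) / len(times)
    ss_res = sum((t - c * s) ** 2 for s, t in zip(sizes, times))
    ss_tot = sum((t - mean_t) ** 2 for t in times)
    return 1.0 - ss_res / ss_tot


# Made-up measurements for illustration only (NOT the paper's data).
sizes = [1000.0, 2000.0, 4000.0, 8000.0]   # input size, arbitrary units
times = [0.21, 0.39, 0.82, 1.59]           # interpolation time, arbitrary units

c = fit_through_origin(sizes, times)
print(f"C = {c:.6f}, R^2 = {r_squared(sizes, times, c):.4f}")
```

With a constant of this kind in hand, the vertex weight of a task can be estimated from its input size alone, without running the interpolation.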

Evaluation Results and Discussions
With the platform and application models formulated in Sections 5.1 and 5.2, experiments were carried out to evaluate the performance and efficiency of Hypergraph+, comparing it to MinMin [5], XSufferage [6], and Hypergraph, the original hypergraph partitioning-based approach [7]. These three heuristics are, respectively, typical task-oriented, data-aware, and hypergraph-based scheduling algorithms, as described in Section 2.2.
The metrics used for evaluating the scheduling algorithms are makespan, I/O reduction percentage, and running time. The makespan, i.e., the overall execution time, is the most common performance measure for a scheduling algorithm; a lower makespan means better performance. The I/O reduction percentage was calculated as the ratio of the amount of data accessed from local disk storage to the total amount of data required by the tasks; a higher I/O reduction percentage means a greater decrease in data transfers. The running time is the time spent scheduling tasks to computing processors and reflects the time complexity of a scheduling algorithm; an algorithm with a shorter running time is more efficient. Together, these three metrics provide a complete evaluation of each scheduling algorithm.
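The first two metrics can be computed as in the following sketch. It assumes that all times are measured from the start of the run and that per-task local and required data volumes are known; the schedule shown is hypothetical.

```python
def makespan(finish_times):
    """Overall execution time: the latest finish time over all tasks."""
    return max(finish_times)


def io_reduction_percentage(local_bytes, total_bytes):
    """Share of the required data that was served from local disk, in percent."""
    return 100.0 * local_bytes / total_bytes


# Hypothetical schedule: task -> (finish time in s, bytes read locally, bytes required).
schedule = {
    "t0": (120.0, 300, 400),
    "t1": (95.0, 100, 400),
    "t2": (140.0, 400, 400),
}

finish = [f for f, _, _ in schedule.values()]
local = sum(l for _, l, _ in schedule.values())
total = sum(r for _, _, r in schedule.values())
print(makespan(finish), io_reduction_percentage(local, total))
```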
In our experiments, the target application computed a digital elevation model of Gilmer County from the LiDAR dataset. The original point cloud was divided into blocks of different sizes, which led to different task granularities and different degrees of I/O overlap; smaller block sizes created more I/O overlap. As illustrated in Figure 5 and the example in Section 3.2, the number of IDW interpolation tasks was equal to the number of point blocks. Experimental results are illustrated in Figures 10 and 11.
As shown in Figure 10a, as the number of tasks increased, the makespan of MinMin increased quite rapidly; XSufferage and Hypergraph followed the same pattern, but the makespan of Hypergraph+ grew much more slowly. Over the entire task execution process, Hypergraph+ reduced the total execution time relative to MinMin, XSufferage, and Hypergraph by 70%, 62%, and 43%, respectively. These results demonstrate that Hypergraph+ outperforms the other scheduling strategies.
Figure 10b shows how the I/O reduction percentage of these heuristics varied with the number of tasks. As the number of tasks increased, the I/O reduction percentage was about 40% for MinMin, nearly 55% for XSufferage, and above 80% for Hypergraph, while Hypergraph+ achieved a 2%-5% higher reduction than Hypergraph. In terms of the I/O reduction metric, Hypergraph+ was superior to MinMin, XSufferage, and Hypergraph.
As shown in Figure 10a,b, MinMin and XSufferage are slower and achieve lower I/O reduction than Hypergraph+ and Hypergraph. This is because MinMin does not consider data sharing at all, and XSufferage fails to exploit data sharing patterns globally. Hypergraph+ and Hypergraph take data sharing into consideration globally, so that tasks with shared input are assigned to the same processor as much as possible. In addition, Hypergraph+ obtains an optimal hypergraph partition result and maximizes the overlap probability between communication and computation to decrease the waiting time for tasks. Therefore, Hypergraph+ achieves better performance than Hypergraph in terms of both makespan and I/O reduction percentage.
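The effect of co-locating tasks that share input can be illustrated with a small sketch. It assumes a regular grid of equally sized blocks, a Moore-neighborhood input pattern, and that each processor fetches a needed block exactly once; the function names and the two mappings are hypothetical and do not reproduce either scheduler.

```python
from itertools import product


def moore_inputs(r, c, rows, cols):
    """Blocks needed by task (r, c): its own block plus the Moore neighbours inside the grid."""
    return [
        (r + dr, c + dc)
        for dr, dc in product((-1, 0, 1), repeat=2)
        if 0 <= r + dr < rows and 0 <= c + dc < cols
    ]


def transfer_volume(assignment, rows, cols, block_size=1):
    """Total data sent, assuming each processor fetches every block it needs exactly once."""
    needed_by = {}  # block -> set of processors that need it
    for (r, c), proc in assignment.items():
        for block in moore_inputs(r, c, rows, cols):
            needed_by.setdefault(block, set()).add(proc)
    return block_size * sum(len(procs) for procs in needed_by.values())


rows, cols, procs = 4, 4, 2
tasks = list(product(range(rows), range(cols)))

# Scattered mapping: tasks in row-major order alternate between the processors.
scattered = {t: i % procs for i, t in enumerate(tasks)}
# Contiguous mapping: left half of the grid on processor 0, right half on processor 1.
contiguous = {(r, c): 0 if c < cols // 2 else 1 for r, c in tasks}

print(transfer_volume(scattered, rows, cols), transfer_volume(contiguous, rows, cols))
```

In this toy setting, the contiguous mapping transfers fewer blocks because only the blocks along the boundary between the two halves are needed by both processors.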
As illustrated in Figure 11, as the number of tasks increased, the running time of MinMin and XSufferage increased at a much faster rate than that of Hypergraph+ and Hypergraph. MinMin and XSufferage must calculate the expected completion time of each remaining task on each computing node in order to choose one task, and repeat this until all tasks are scheduled; consequently, their time complexity is $O(n^2)$. In contrast, Hypergraph+ and Hypergraph use hypergraph partitioning to map all tasks to processors quickly, as the time complexity of hypergraph partitioning is $O(n) + O(\log n)$ [31]. Hypergraph+ was only about 3 s slower than Hypergraph on average, whereas, as seen in Figure 10a, it saved more than 2400 s of makespan compared with Hypergraph, so this small overhead is negligible. Thus, Hypergraph+ achieves better performance than the other three algorithms while still maintaining high efficiency.
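The quadratic behavior of MinMin can be seen in a compact sketch of the heuristic. This is an illustrative version that ignores data transfer costs; the `exec_time` table and the task names are hypothetical.

```python
def min_min(exec_time):
    """MinMin sketch. exec_time[t][p] is the run time of task t on processor p.

    Returns (assignment, makespan). Data transfer costs are ignored here.
    """
    n_procs = len(next(iter(exec_time.values())))
    ready = [0.0] * n_procs        # time at which each processor becomes free
    unscheduled = set(exec_time)
    assignment = {}
    while unscheduled:             # one task is fixed per iteration: n iterations
        # Scan every remaining task on every processor and take the
        # (task, processor) pair with the smallest completion time.
        finish, task, proc = min(
            (ready[p] + exec_time[t][p], t, p)
            for t in unscheduled
            for p in range(n_procs)
        )
        assignment[task] = proc
        ready[proc] = finish
        unscheduled.remove(task)
    return assignment, max(ready)


# Hypothetical run times of three tasks on two processors.
exec_time = {"a": [2.0, 4.0], "b": [3.0, 1.0], "c": [5.0, 5.0]}
assignment, span = min_min(exec_time)
print(assignment, span)
```

Each of the n iterations rescans all remaining tasks, which is where the quadratic growth in the number of tasks comes from.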

Conclusions
This paper presents Hypergraph+, a scheduling algorithm that extends the existing hypergraph-based scheduling algorithm for massive spatial data processing to obtain better performance. It first formulates a general hypergraph model to represent the tasks, the spatial datasets, and the processing platform. Then, the quality of the hypergraph partitioning results is evaluated by a Fitness function to map tasks to processors such that the total volume of communication is minimized while the computational workloads are balanced. Moreover, Hypergraph+ schedules tasks and file transfers to maximize the overlap probability between communication and computation with reduced end-point contention among processors. Simulations were carried out to compare Hypergraph+ with MinMin, XSufferage, and Hypergraph using spatial interpolation applications on heterogeneous master-slave platforms. The simulation results show that Hypergraph+ reduces the makespan by 43% on average compared with Hypergraph, while preserving the efficiency of Hypergraph.
In the future, we will extend the Hypergraph+ algorithm to distributed file system storage centers. Currently, distributed file systems, e.g., Hadoop HDFS, are used to store and process massive spatial datasets, and data replication is often employed in Hadoop HDFS to improve availability and throughput. Therefore, our Hypergraph+ scheduling algorithm can be further investigated to address the data replication problem and to exploit a higher degree of data sharing in a Hadoop environment.
Figure 2 illustrates one hypergraph; in this figure, each closed curve represents one hyperedge, and the dots inside a closed curve denote the vertices on that hyperedge. A graph can be treated as a special type of hypergraph in which each hyperedge connects only two vertices. As in a graph, weights $w_i$ and costs $c_j$ can be assigned to the vertices ($v_i \in V$) and hyperedges ($n_j \in N$) of the hypergraph, respectively.

Figure 2. An illustration of one hypergraph: the dots represent the vertices, and the closed curves denote the hyperedges.
where $W_k = \sum_{v_i \in V_k} w_i$ denotes the sum of the vertex weights of one part $V_k$; $W_{avg} = \left(\sum_{v_i \in V} w_i\right)/K$ is the average weight over the $K$ parts; and $\varepsilon$ is a predetermined imbalance value.
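A balance check of this kind can be sketched as follows. The sketch assumes that the constraint takes the usual form $W_k \le W_{avg}(1 + \varepsilon)$ for every part and that $W_{avg}$ is the total vertex weight divided by the number of parts; the function names and the example weights are hypothetical.

```python
def part_weights(vertex_weight, part_of, n_parts):
    """W_k: sum of the weights of the vertices assigned to each part k."""
    weights = [0.0] * n_parts
    for v, w in vertex_weight.items():
        weights[part_of[v]] += w
    return weights


def is_balanced(vertex_weight, part_of, n_parts, eps):
    """True if every part satisfies W_k <= W_avg * (1 + eps) (assumed form of the constraint)."""
    w_k = part_weights(vertex_weight, part_of, n_parts)
    w_avg = sum(vertex_weight.values()) / n_parts
    return all(w <= w_avg * (1.0 + eps) for w in w_k)


# Hypothetical vertex weights (e.g., task computation times) and two 2-way partitions.
vertex_weight = {"v0": 4.0, "v1": 3.0, "v2": 2.0, "v3": 1.0}
even = {"v0": 0, "v3": 0, "v1": 1, "v2": 1}     # part weights 5 and 5
skewed = {"v0": 0, "v1": 0, "v2": 1, "v3": 1}   # part weights 7 and 3

print(is_balanced(vertex_weight, even, 2, eps=0.1))
print(is_balanced(vertex_weight, skewed, 2, eps=0.1))
```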


Figure 5. Tasks and files in a Moore neighborhood algorithm.

Figure 7. The flow diagram of hypergraph partitioning for matching tasks.

Figure 9. The quantitative relationship between input points size and IDW interpolation time.

Figure 10. Performance evaluation with different numbers of tasks. (a) Makespan; (b) I/O reduction percentage.

Figure 11. Efficiency evaluation with different numbers of tasks.

Table 1. Slave setup for the simulation.