New Bipartite Graph Techniques for Irregular Data Redistribution Scheduling

: For many parallel and distributed systems, automatic data redistribution improves its locality and increases system performance for various computer problems and applications. In general, an array can be distributed to multiple processing systems by using regular or irregular distributions. Some data distribution adopts BLOCK, CYCLIC, or BLOCK-CYCLIC to specify data array decomposition and distribution. On the other hand, irregular distributions specify a di ﬀ erent-size data array distribution according to user-deﬁned commands or procedures. In this work, we propose three bipartite graph problems, including the “maximum edge coloring problem”, the “maximum degree edge coloring problem”, and the “cost-sharing maximum edge coloring problem” to formulate these kinds of distribution problems. Next, we propose an approximation algorithm with a ratio bound of two for the maximum edge coloring problem when the input graph is biplanar. Moreover, we also prove that the “cost-sharing maximum edge coloring problem” is an NP-complete problem even when the input graph is biplanar.


Introduction
For many years, parallel and distributed networked computer systems have been developed and implemented to tackle compound scientific problems powerfully. Many parallel and distributed systems have been designed to change the distribution of an array at run-time to improve the desired performance. At that time, each processor needs to know which local elements of the array should be sent to the specified processor, and which elements of the array should be received from the specified processor. In general, an array can be distributed to multiple processing systems by using regular or irregular distributions. The former is used in the SCALAPACK library, which uses the block-cyclic data distribution on a virtual grid of processors to reach load-balance on arrays, and adopts equal memory usage between processors [1,2]. On the other hand, an irregular distribution does not adopt specific rules or methods to represent the desired decomposition. Instead, they usually represent the irregular array distribution with the help of user-defined command or procedures. As an example, High Performance Fortran version 2 (HPF2) supports a generalized block distribution (named GEN_BLOCK) format [3,4], when unevenly sized data of an array is to be allocated to a set of processors.
In Figure 1, a data redistribution pattern BC (1,4,4) is depicted as an example, and its weighted distribution graph GBC (1,4,4) is also shown in Figure 2. In Figure 1, array elements A(1:16) mapped onto four processors (P) are to be redistributed to four (P) processors. In source distribution, every block with α(=1) element are mapped onto one of processors which are ordered in the cyclic sequence. As a result, A(1), A(5), A(9), A (13) are mapped onto processor P0 in Figure (1,4,4) is K4, 4. In a similar way, an irregular distribution GB (P, Q) is modeled by a weighted bipartite graph GGB(P, Q) = (S, T, E). For instance, in Figure 3, array elements A(1:100) are to be redistributed. Originally, array elements A(1:24), A(25:50), A(51:64), A(65:100) are mapped onto P0, P1, P2, and P3, respectively. After redistribution, a new distribution is formed, such as A(1:15), A(16:49), A(50:89), A(90:100) onto Q0, Q1, Q2, and Q3, respectively. Its distribution graph, GGB (4,4), is presented in Figure  4. Note that one processor may need to send array elements to multiple consecutive processors in an irregular distribution. For example, P0 has 24 elements A(1:24) which A(1:15) are sent to Q0, and A(16:24) are sent to Q1. As a result, there is an edge with a weight 15 and another edge with a weight 9 in the distribution graph to indicate these two communications in Figure 4. In a similar way, an irregular distribution GB (P, Q) is modeled by a weighted bipartite graph G GB(P, Q) = (S, T, E). For instance, in Figure 3, array elements A(1:100) are to be redistributed. Originally, array elements A(1:24), A(25:50), A(51:64), A(65:100) are mapped onto P 0 , P 1 , P 2 , and P 3 , respectively. After redistribution, a new distribution is formed, such as A(1:15), A(16:49), A(50:89), A(90:100) onto Q 0 , Q 1 , Q 2 , and Q 3 , respectively. Its distribution graph, G GB (4,4) , is presented in Figure 4. Note that one processor may need to send array elements to multiple consecutive processors in an irregular distribution. For example, P 0 has 24 elements A(1:24) which A(1:15) are sent to Q 0 , and A(16:24) are sent to Q 1 . As a result, there is an edge with a weight 15 and another edge with a weight 9 in the distribution graph to indicate these two communications in    (4,4) of GB (4,4) in Figure 3.
Most parallel computer programming languages supply run-time primitives or instructions to reallocate the array distribution of a program. Consequently, designing effective algorithms to do array redistribution is crucial for implementing distributed memory system software for these programming languages. In addition to array redistribution in parallel systems, in recent years, different data redistribution issues also received a lot of attention, including data redistribution in wireless sensor networks to minimize power consumption [5], data redistribution and retrieval in wireless sensor networks to preserve data [6], data redistribution in parallel join operations in distributed system [7], data distribution without breaking its semantics [8], data redistribution of data between processes [9], co-optimized application-level data movement and network-level data communications for distributed operators [10].
In this work, a bipartite graph is used to model data redistributions. Notice that a data redistribution is modeled by a bipartite graph G = (S, T, F), named a redistribution graph. In G, set S represents processor set containing a source to send data, and set T represents processor set containing a destination to receive the data, and each edge in F indicates the message. The interested data redistributions are modeled as the following graph problems in this work.
When a set of independent edges of the given graph G is used to represent a set of synchronized contention-free communications in MECP, the value max{w(e)⏐e∈Fi} indicates the longest time required to complete all communications allocated to ith step. Accordingly, the sum of ( max , which indicates the overall communication time for the data redistribution represented by G. In the next problem, we try to not only shorten the overall communication time but also the number of required communication steps. Additional variants of MECP use the minimum number of edge colorings, which are defined as follows.    (4,4) of GB (4,4) in Figure 3.
Most parallel computer programming languages supply run-time primitives or instructions to reallocate the array distribution of a program. Consequently, designing effective algorithms to do array redistribution is crucial for implementing distributed memory system software for these programming languages. In addition to array redistribution in parallel systems, in recent years, different data redistribution issues also received a lot of attention, including data redistribution in wireless sensor networks to minimize power consumption [5], data redistribution and retrieval in wireless sensor networks to preserve data [6], data redistribution in parallel join operations in distributed system [7], data distribution without breaking its semantics [8], data redistribution of data between processes [9], co-optimized application-level data movement and network-level data communications for distributed operators [10].
In this work, a bipartite graph is used to model data redistributions. Notice that a data redistribution is modeled by a bipartite graph G = (S, T, F), named a redistribution graph. In G, set S represents processor set containing a source to send data, and set T represents processor set containing a destination to receive the data, and each edge in F indicates the message. The interested data redistributions are modeled as the following graph problems in this work.
When a set of independent edges of the given graph G is used to represent a set of synchronized contention-free communications in MECP, the value max{w(e)⏐e∈Fi} indicates the longest time required to complete all communications allocated to ith step. Accordingly, the sum of  Most parallel computer programming languages supply run-time primitives or instructions to reallocate the array distribution of a program. Consequently, designing effective algorithms to do array redistribution is crucial for implementing distributed memory system software for these programming languages. In addition to array redistribution in parallel systems, in recent years, different data redistribution issues also received a lot of attention, including data redistribution in wireless sensor networks to minimize power consumption [5], data redistribution and retrieval in wireless sensor networks to preserve data [6], data redistribution in parallel join operations in distributed system [7], data distribution without breaking its semantics [8], data redistribution of data between processes [9], co-optimized application-level data movement and network-level data communications for distributed operators [10].
In this work, a bipartite graph is used to model data redistributions. Notice that a data redistribution is modeled by a bipartite graph G = (S, T, F), named a redistribution graph. In G, set S represents processor set containing a source to send data, and set T represents processor set containing a destination to receive the data, and each edge in F indicates the message. The interested data redistributions are modeled as the following graph problems in this work.

Definition 1.
For a bipartite graph G = (S, T, F) as an input, the maximum edge coloring problem (MECP) is to find a set of edge colorings (partitions) {F 1 , F 2 , F 3 , . . . , F z } of F to minimize z i=1 max w(e)|e ∈ F i , where w(e) is the weight of edge e.
When a set of independent edges of the given graph G is used to represent a set of synchronized contention-free communications in MECP, the value max{w(e)|e∈F i } indicates the longest time required to complete all communications allocated to ith step. Accordingly, the sum of max{w(e)|e∈F i }(for 1 ≤ i ≤ z) is denoted as z i=1 max w(e)|e ∈ F i , which indicates the overall communication time for the data redistribution represented by G. In the next problem, we try to not only shorten the overall communication time but also the number of required communication steps. Additional variants of MECP use the minimum number of edge colorings, which are defined as follows.

Definition 2.
For a bipartite graph G = (S, T, F) as an input, the maximum degree edge coloring problem (MDECP) is to find out a set of edge colorings {F 1 , F 2 , F 3 , . . . , F ∆(G) } of G to minimize ∆(G) i=1 max w(e)|e ∈ F i , where w(e) denotes the weight of e and ∆(G) denotes the maximum degree of G.
For example, when the input is the distribution graph G in Figure 4, a solution to the MECP is the edge partition {{P o , Q 0 }, {P 1 , Q 1 }, {P 3 , Q 2 }}, {{P o , Q 1 }, {P 2 , Q 2 }, {P 3 , Q 3 }}, and {{P 1 , Q 1 }} with the minimum value max{15, 16, 25} + max{9, 15, 11} + max{9} = 25 + 15 + 9 = 49, as shown in Figure 5. Since the number of colors used equals to the maximum degree of G, the edge partition is also a solution to the MDECP.   Figure 5. Since the number of colors used equals to the maximum degree of G, the edge partition is also a solution to the MDECP.
In previous research, the maximum coloring problem has been defined in [11,12]. The problem here is finding a proper coloring on the vertex set of input graphs such that the color classes (disjoint vertex sets) V1, V2, …, Vk minimize where w denotes the weight function and k denotes the number of colors used. However, the maximum coloring problem is an NP-hard problem even when its input is an interval graph [11]. Due to its intractability, a lot of papers aim to devise approximation algorithms for it on special graphs [11,12]. Obviously, the above problem is a kind of vertex coloring, which is totally different from the edge coloring problems defined in this work.
Note that some traditional open shop scheduling problems can be solved by using a bipartite-graph based approach [13]. A shop consists of many processors. Each of these processors performs a different task. In open shop no restrictions are placed on the order in which the tasks for any job are to be processed, which is different from the MECP and MDECP defined in this work. In [13], Gonzalez et al. showed that preemptive and non-preemptive optimal finish time schedules can be obtained in linear time when the number of processors is two. When the number of processors is larger than two, the optimal finish time schedule can still be obtained in polynomial time by applying maximal matching in bipartite graphs.
To the best knowledge of the authors, this is the first time anyone has set out to define MECP and MDECP by using bipartite graph terms and techniques. Though some sophisticated algorithms have been developed for MDECP [3,4,14], MECP and MDECP are open to scholars to devise a polynomial-time algorithm even when the input belongs to biplanar graphs. In this work, we present the first approximation algorithm for MECP with a ratio bound of two when the input graph is biplanar.
Moreover, a new graph problem called the cost-sharing maximum edge coloring problem (CSMECP) is defined in this work to further model how to partition communication messages in irregular redistribution scheduling. In our previous conference paper [15], we designed a heuristic algorithm to partition large data segments into multiple small data segments and to schedule them in different communication steps without using graph techniques. In this work, we formally In previous research, the maximum coloring problem has been defined in [11,12]. The problem here is finding a proper coloring on the vertex set of input graphs such that the color classes (disjoint vertex sets) where w denotes the weight function and k denotes the number of colors used. However, the maximum coloring problem is an NP-hard problem even when its input is an interval graph [11]. Due to its intractability, a lot of papers aim to devise approximation algorithms for it on special graphs [11,12]. Obviously, the above problem is a kind of vertex coloring, which is totally different from the edge coloring problems defined in this work.
Note that some traditional open shop scheduling problems can be solved by using a bipartite-graph based approach [13]. A shop consists of many processors. Each of these processors performs a different task. In open shop no restrictions are placed on the order in which the tasks for any job are to be processed, which is different from the MECP and MDECP defined in this work. In [13], Gonzalez et al. showed that preemptive and non-preemptive optimal finish time schedules can be obtained in linear time when the number of processors is two. When the number of processors is larger than two, the optimal finish time schedule can still be obtained in polynomial time by applying maximal matching in bipartite graphs.
To the best knowledge of the authors, this is the first time anyone has set out to define MECP and MDECP by using bipartite graph terms and techniques. Though some sophisticated algorithms have been developed for MDECP [3,4,14], MECP and MDECP are open to scholars to devise a polynomial-time algorithm even when the input belongs to biplanar graphs. In this work, we present the first approximation algorithm for MECP with a ratio bound of two when the input graph is biplanar.
Moreover, a new graph problem called the cost-sharing maximum edge coloring problem (CSMECP) is defined in this work to further model how to partition communication messages in irregular redistribution scheduling. In our previous conference paper [15], we designed a heuristic algorithm to partition large data segments into multiple small data segments and to schedule them in different communication steps without using graph techniques. In this work, we formally reformulate the ideas by using a new graph decision problem, CSMECP. So far, it is an open question to know whether CSMECP is NP-complete or not. In this work, we also prove that the CSMECP is an NP-complete problem even when the input belongs to biplanar graphs.
In short, the contributions of this work are listed below.

2.
Design a first approximation algorithm with ratio bound of two for MECP when the input is a biplanar graph.

3.
Give a formal proof to show that the CSMECP is an NP-complete problem even when the input is a biplanar graph.
The remainder of our work is organized below. Section 2 introduces required graph definitions. Related research is briefly described and discussed in Section 3. We design and prove the correctness of an approximation algorithm for MECP in Section 4. Section 5 defines the CSMECP and shows that it is NP-complete. At last, Section 6 concludes this work and points out some future work.

Definitions and Notations
A tree is a connected acyclic graph. The components of a graph, G, are its maximal connected subgraphs. We use ω(G) to denote the number of components of G. Let N(v) denote the set of vertices adjacent to v. A multi-graph is a graph with two more edges with the same pair of ends. The degree d G (v) of a vertex v in a loopless graph G is the number of incident edges. The maximum degree of vertices of G is denoted by ∆(G).
A k-edge coloring of G = (V, E) is a labeling function f :E→S, where |S| = k. The labels are colors; the edges of one color form a color class. A k-edge-coloring is proper if incident edges have different labels. A graph is k-edge colorable if it has a proper k-edge-coloring. The edge chromatic number χ'(G) of a loopless graph G is the least k such that G is k-edge-colorable. For example, Figure 6 shows a graph that is 3-edge colorable; its edge chromatic number χ'(G) = 3.
Algorithms 2018, 11, x FOR PEER REVIEW 5 of 15 reformulate the ideas by using a new graph decision problem, CSMECP. So far, it is an open question to know whether CSMECP is NP-complete or not. In this work, we also prove that the CSMECP is an NP-complete problem even when the input belongs to biplanar graphs. In short, the contributions of this work are listed below.
1. Define three new bipartite graph models including MECP, MDECP, and CSMECP. 2. Design a first approximation algorithm with ratio bound of two for MECP when the input is a biplanar graph. 3. Give a formal proof to show that the CSMECP is an NP-complete problem even when the input is a biplanar graph.
The remainder of our work is organized below. Section 2 introduces required graph definitions. Related research is briefly described and discussed in Section 3. We design and prove the correctness of an approximation algorithm for MECP in Section 4. Section 5 defines the CSMECP and shows that it is NP-complete. At last, Section 6 concludes this work and points out some future work.

Definitions and Notations
A tree is a connected acyclic graph. The components of a graph, G, are its maximal connected subgraphs. We use ω(G) to denote the number of components of G. Let N(v) denote the set of vertices adjacent to v. A multi-graph is a graph with two more edges with the same pair of ends. The degree dG(v) of a vertex v in a loopless graph G is the number of incident edges. The maximum degree of vertices of G is denoted by Δ(G).
For example, Figure 6 shows a graph that is 3-edge colorable; its edge chromatic number χ'(G) = 3. A matching in a graph G is a set of non-loop edges with no shared endpoints. A graph is bipartite if its vertex set is the union of two disjoint independent sets, called partite sets. A complete bipartite graph is a simple bipartite graph such that two vertices are adjacent if and only if they are in different partite sets; when the sets have sizes m and n, this complete bipartite graph is denoted by Km, n. The line graph of a graph G, written L(G), is the graph whose vertices are the edges of G, with (e, f) is in the edge set of L(G) when e = (u, v) and f = (v, w). In Figure 7, the right graph is the line graph L(G) of the left graph G. A matching in a graph G is a set of non-loop edges with no shared endpoints. A graph is bipartite if its vertex set is the union of two disjoint independent sets, called partite sets. A complete bipartite graph is a simple bipartite graph such that two vertices are adjacent if and only if they are in different partite sets; when the sets have sizes m and n, this complete bipartite graph is denoted by K m, n . The line graph of a graph G, written L(G), is the graph whose vertices are the edges of G, with (e, f ) is in the edge set of L(G) when e = (u, v) and f = (v, w). In Figure 7, the right graph is the line graph L(G) of the left graph G.
Algorithms 2018, 11, x FOR PEER REVIEW 5 of 15 reformulate the ideas by using a new graph decision problem, CSMECP. So far, it is an open question to know whether CSMECP is NP-complete or not. In this work, we also prove that the CSMECP is an NP-complete problem even when the input belongs to biplanar graphs. In short, the contributions of this work are listed below.
1. Define three new bipartite graph models including MECP, MDECP, and CSMECP. 2. Design a first approximation algorithm with ratio bound of two for MECP when the input is a biplanar graph. 3. Give a formal proof to show that the CSMECP is an NP-complete problem even when the input is a biplanar graph.
The remainder of our work is organized below. Section 2 introduces required graph definitions. Related research is briefly described and discussed in Section 3. We design and prove the correctness of an approximation algorithm for MECP in Section 4. Section 5 defines the CSMECP and shows that it is NP-complete. At last, Section 6 concludes this work and points out some future work.

Definitions and Notations
A tree is a connected acyclic graph. The components of a graph, G, are its maximal connected subgraphs. We use ω(G) to denote the number of components of G. Let N(v) denote the set of vertices adjacent to v. A multi-graph is a graph with two more edges with the same pair of ends. The degree dG(v) of a vertex v in a loopless graph G is the number of incident edges. The maximum degree of vertices of G is denoted by Δ(G). χ'(G) of a loopless graph G is the least k such that G is k-edge-colorable. For example, Figure 6 shows a graph that is 3-edge colorable; its edge chromatic number χ'(G) = 3. G = (V, E) is an interval graph provided that one can assign to each vertex u an interval I u such that I u ∩I v is nonempty precisely when (u, v)∈E. For example, in Figure 8, the graph is an interval graph because there is a set of intervals representing vertex set.

G = (V, E)
is an interval graph provided that one can assign to each vertex u an interval Iu such that Iu∩Iv is nonempty precisely when (u, v)∈E. For example, in Figure 8, the graph is an interval graph because there is a set of intervals representing vertex set. A method for embedding a bipartite graph G = (X, Y, E) into two parallel lines is putting the vertices of X on the first line and the vertices of Y on the second line. For every edge (x, y) in G where x∈X and y∈Y, draw an interval from x to y. In the above embedding, if all edges do not cross, this bipartite graph is called biplanar [16]. A sequence of X(Y) owns the adjacency property if ∀ v in Y(X). A graph G = (X, Y, E) is called a doubly convex-bipartite graph when both sequences of X and Y have the adjacency property [17]. At last, most graph definitions used in the work can be found in [18].
For easy reference, all notations used in this work are listed in the Table 1.  the edge chromatic number of G z the number of used colors GBC(α, β, P, Q) the distribution graph of BLOCK-SYCLIC array redistribution using Block-Cyclic(α) to Block-Cyclic(β) (from P processors to Q processors) GGB(P, Q) the distribution graph of GEN_BLOCK redistribution from P processors to Q processors

Related Work
Regular data redistribution like the SCALAPACK library uses block-cyclic data distribution on a virtual grid of processors to achieve good load-balance on arrays and an equal memory usage between processors. Arrays are wrapped by blocks in all dimensions corresponding to the processor grid. Many algorithms and formulism for discussing regular array redistribution are classified into two types: (1) set identification; and (2) communication optimization. Ramaswamy et al. found a mathematical representation for redistribution, which belongs to the former [19]. Prylli et al. implemented their algorithm in the ScaLAPACK so as to go from a block cyclic distribution to another one [20]. Based on a generalized matrix formula, Park et al. devised an algorithm for array redistribution from cyclic(x) on source processors to cyclic(Kx) on destination processors [1]. Petitet et al. proposed various redistribution methods in BLOCK-CYCLIC fashion for a very special application which executed block-partitioned linear algebra algorithms on dense matrices [2]. A method for embedding a bipartite graph G = (X, Y, E) into two parallel lines is putting the vertices of X on the first line and the vertices of Y on the second line. For every edge (x, y) in G where x∈X and y∈Y, draw an interval from x to y. In the above embedding, if all edges do not cross, this bipartite graph is called biplanar [16]. A sequence of X(Y) owns the adjacency property if ∀ v in Y(X). A graph G = (X, Y, E) is called a doubly convex-bipartite graph when both sequences of X and Y have the adjacency property [17]. At last, most graph definitions used in the work can be found in [18].
For easy reference, all notations used in this work are listed in the Table 1.  the distribution graph of BLOCK-SYCLIC array redistribution using Block-Cyclic(α) to Block-Cyclic(β) (from P processors to Q processors) G GB(P, Q) the distribution graph of GEN_BLOCK redistribution from P processors to Q processors

Related Work
Regular data redistribution like the SCALAPACK library uses block-cyclic data distribution on a virtual grid of processors to achieve good load-balance on arrays and an equal memory usage between processors. Arrays are wrapped by blocks in all dimensions corresponding to the processor grid. Many algorithms and formulism for discussing regular array redistribution are classified into two types: (1) set identification; and (2) communication optimization. Ramaswamy et al. found a mathematical representation for redistribution, which belongs to the former [19]. Prylli et al. implemented their algorithm in the ScaLAPACK so as to go from a block cyclic distribution to another one [20]. Based on a generalized matrix formula, Park et al. devised an algorithm for array redistribution from cyclic(x) on source processors to cyclic(Kx) on destination processors [1]. Petitet et al. proposed various redistribution methods in BLOCK-CYCLIC fashion for a very special application which executed block-partitioned linear algebra algorithms on dense matrices [2]. Bandera et al. pointed out some characterizations of the sparse matrix redistribution and its related issues because of the use of compressed representations [21]. Hsu et al. proposed a new method to perform a BLOCK-CYCLIC array redistribution [22]. They also implemented the proposed method on an IBM SP2 parallel machine.
On the other hand, with respect to irregular array redistribution, previous work has usually focused on indexing, message generation, or communication efficiency. Guo et al. first mentioned the problem of communication set generation for irregular applications [31]. They also devised optimization techniques for irregular array in nested loops. To decrease communication cost in GEN_BLOCK redistribution, Lee et al. [25] devised logical processor reordering methods on irregular array redistribution. In [3,4], Wang et al. designed a divide-and-conquer approach to execute irregular array redistribution. In [14], Yook et al. presented two-stage scheduling for relocation problems on a processor grid. Their proposed scheduling algorithm makes use of a spatial locality in message passing. Their algorithms have been implemented on CRAY T3E and IBM SP2.
In [15], Yu et al. presents an efficient algorithm to partition large messages into multiple small ones and schedules them by using the minimum number of steps without communication contention and, in doing so, reduces the overall redistribution time. When the number of processors or the maximum degree of the redistribution graph increases or the selected size of the messages is medium, the proposed algorithm can significantly reduce the overall redistribution time to 52%. In [15], Yu et al. failed to model their problem theoretically and did not discuss the intractability of the resolved problem. In this work, we formally reformulate the ideas by using a new graph decision problem. We also prove that the new graph problem is an NP-complete problem even when the input belongs to biplanar graphs.

Two-Approximation Algorithm for Irregular Data Redistribution Scheduling
In Theorem 1, we are going to formally prove that the corresponding graphs of GEN_BLOCK redistribution are biplanar graphs, which are a subclass of special bipartite graphs, called doubly convex-bipartite graphs [17].

Theorem 1. The corresponding redistribution graph of GEN-BLOCK data redistribution is biplanar.
Proof. Clearly the corresponding graph G GB(P, Q) = (S, T, E) is bipartite. The rest is to prove that it is planar. We can try to find out a biplanar embedding for G GB(P, Q) (see Figures 3 and 4 for example) by placing all processors on two lines and numbering them from left to right (i.e., {P 0 , P 1 , . . . , P |P|−1 } and {Q 0 , Q 1 , . . . , Q |Q|−1 }). In the GEN-BLOCK format GB (P, Q), any set of consecutive array elements are originally allocated to consecutive source processors with respect to {P 0 , P 1 , . . . , P |P|−1 }. Moreover, any set of consecutive array elements needs to be reallocated to consecutive destination processors with respect to {Q j , Q j+1 , . . . , Q k }. As a result, when there is a message sent and reallocated from P i to Q o(i) , then for any j such that i ≤ j ≤ |P|−1 and any message sent from P j to Q o(j) , we have o(j) ≥ o(i). Suppose the corresponding redistribution graph is not biplanar, there is at least a pair of crossing edges such that (P j , Q o(j) ) and (P j+∆ , Q o(j+∆) ) where ∆ > 0 and o(j + ∆) < o(j). A contradiction occurs. Therefore, the corresponding redistribution bipartite graph is a biplanar graph.
Theorem 2 presents extra properties of biplanar graphs, which will be used in this work. Theorem 2. The following statements are equivalent [16,32,33]: (1) A graph G is a biplanar graph.
(2) After removing all leaves in G, the remainder is an acyclic graph and contains no vertices with a degree more than two. (3) A graph G contains a set of caterpillars which are disjoint graphs.
In Theorem 2, a caterpillar is a tree in which a single path (the spine) is incident to (or contains) every edge. When n denotes the number of vertices (i.e., |V|) in G, the edge set of a bipartite graph contains at most O(n 2 ) edges. Yet, if the input is planar, the number of edges in G would be reduced to 3n − 6 [18]. Moreover, because biplanar graphs are essentially a subclass of planar graph, the number of edges in biplanar graphs is evidently less than 3n − 6. Additionally, by combining Theorem 1 with Theorem 2, we conclude that the graph generated from the GEN-BLOCK method is a forest in the following corollary. The next corollary further indicates that the number of edges in G is less than or equal to n − 1.

Corollary 1.
The graph G generated from GEN-BLOCK data redistribution is a forest. Proof. Suppose that G consists of ω(G) trees T 1 , T 2 , . . . , T ω(G) (by Corollary 1). Since the size of edge set of a tree is one less than the size of its vertex set and G consists of ω(G) trees, we have |E| = |X| + |Y| − ω(G).
Although we know that the redistribution graph constructed from GEN-BLOCK data redistribution is a biplanar graph by Theorem 1, so far there exists no polynomial time algorithm for MECP even when the input belongs to such graph. On the other hand, it becomes crucial to design an efficient approximation algorithm for MECP when the input is restricted to biplanar graphs. In order to devise such algorithm, we need to investigate the properties of the line graph L(G) of a biplanar graph G, because it is easy to see that the max-coloring problem on L(G) is equivalent to MECP on G. With the help of Theorem 2 and Corollary 1, we have found that G contains a set of caterpillars, which are useful to show that L(G) is an interval graph.
The next theorem becomes essential for us to design an approximation algorithm for the MECP latter. Proof. When a connected communication graph G = (S, T, E) of GEN-BLOCK data redistribution is given as an input, we only need to find out an interval model of L(G). In view of the fact that G is biplanar (by Theorem 1), we can construct a biplanar embedding of G on two lines. Without changing the sequences of the biplanar embedding, we need to show that these sequences are able to be rescaled to satisfy the next four properties:

1.
Every vertex v in S∩T can be assigned and labeled with a distinct integer χ(v) from left to right where 1 ≤ χ(v) ≤ |S|+|T|. Moreover, the x-coordinate of vertex v on two horizontal lines is exactly χ(v).

2.
Vertex set S∩T can be grouped into two disjoint subsets: spine set B and leaf set L. Here B contains all vertices in the spine, and L contains the rest of vertices (that is, all vertices with degree one in G); that is, L contains all leaf nodes in G. Since every vertex v in L is a leaf, there is a unique edge (v, b k ) connecting v to another vertex in B by Theorem 2.

4.
Due to the properties of bipartite planar graphs, L is able to be divided into R ≤ |B| disjoint leaf sets L 1 , L 2 , . . . , L |B| such that each vertex in the same leaf set is adjacent to the same vertex in B.
We can rescale the biplanar drawing properly so that every vertex v in the same leaf set is allocated with a x-coordinate χ(v) contained in a same subintervals selected from [1, , . . . , [χ( |B|−1 ) + 1, |S| + |T|]. Moreover, when two leaf nodes are in different leaf sets, their x-coordinates contained in different subintervals selected from [1, , . . . , [χ( |B|−1 ) + 1, |S| + |T|] (e.g., blue dash rectangles in Figure 9). Also notice that for each edge (u, v) in this sequence, their corresponding x-coordinates χ(u) and χ(v) are located in the same subintervals selected from [1, To show that L(G) is an interval graph, we need to find out its interval model. According to the above rescaled planar drawing, we first label each vertex of G with its x-coordinate, and then an interval [χ(v1) − 0.1, χ(v2) + 0.1] is constructed for every (v1, v2) in edge set (Figure 9). The rest shows that the intervals obtained represent L(G).
We devise an approximation algorithm for maximum edge coloring problem (called AMECP) listed below. Theorem 5. AMECP is an approximation algorithm with a ratio bound of two for MECP when the input is a biplanar graph.
Proof. Because an edge coloring of G is always mapped to a vertex coloring of L(G), the MECP of G is converted to the maximum coloring problem of L(G) according to the mapping. Due to the fact that L(G) is an interval graph (proved in Theorem 4), an approximation algorithm with a ratio bound of two for MECP of G can be devised by executing Pemmaraju et al.'s algorithm [11], which found a relation between maximum coloring and on-line graph coloring and exploited this observation to design an approximation algorithm for max vertex coloring on interval graphs. At last, we show AMECP is an approximation algorithm with a ratio bound of two for MECP. Proof. Algorithm AMECP is analyzed here step by step. The first step takes O(n 2 ) time because to implement it we need to compare every pair of edges in E of G to create the edge set of L(G). By Theorem 3, |E| = |X| + |Y| − ω(G). That means |E| < n, because ω(G) ≥ 1. As a result, this step takes at most (n − 1) × (n − 2)/2 = O(n 2 ) computation time. The second step can be implemented in an efficient way (O(n 2 ) time) if we apply Booth and Lueker's interval graph recognition algorithm, which adopted well-known PQ trees to test the consecutive ones property [34]. The third step needs O(nlog n) time if applying Pemmaraju et al.'s two-approximation algorithm [11]. At last, the last step can be implemented in linear time trivially. Finally, AMECP takes O(n 2 ) time totally.

New Graph Model for Applying Partitioning Message Techniques
In order to solve the CSMECP efficiently, we try to shorten the total communication time but do not increase the required total steps simultaneously. In [15], Yu et al. have designed a heuristic algorithm for the CSMECP. Unlike existing conventional approaches, the intuitive idea behind our previous work [15] is partitioning larger data into multiple small data units, and our algorithm reschedules these added small data units properly in different existing steps. So far, it is an open question to know whether CSMECP is NP-hard or not. In this section, we give a proof for CSMECP.
When G is bipartite, χ (G) = ∆(G) is a well-known graph property [18]. In terms of graphs, this equation indicates the number of minimum colors is equal to the maximum degree of the inputted bipartite graph. In terms of communication, this equation indicates that the minimum number of communication steps for this distribution (discussed in this work) equals the maximum degree ∆(G) of the corresponding distribution graph G. Specifically, to avoid increasing the total communication steps, we should partition larger data into small data units as many as possible and scatter them into different steps if ∆(G) = ∆(G') here G' represents the new distribution graph after these partitions. How to scatter these new add-in small data elements into different steps is another key issue for CSMECP. When a new data communication is added in a pre-schedule for an arbitrary step, if the size of the new data unit is larger than that of every data units in the pre-schedule, the time for this step becomes longer. Note that CSMECP is designed to avoid this case.
In CSMECP, to model how to partition larger data into small data units, we add duplicate edges for each of these edges with larger weights to the redistribution graph. For such edge e with larger weight, these corresponding new-adding edges share weights with e. The above techniques are used to present the properties of CSMECP as follows. Property 1. When a weighted biplanar graph G = (S, T, E) is taken as an input, the CSMECP is a kind of decision problem, which adds additional edges E so that the new graph G = (S, T, E∪E ) satisfies the subsequent constraints: 1.
Additional edge (s, t) is allowed to be added in edge set E only if these two end points, s and t, are originally adjacent by an edge (s, t) in E.

2.
∆(G ) = ∆(G). That means after adding some multiple edges, the maximum degree of the new graph G remains unchanged when compared to that of the original one G.

3.
In G = (S, T, E∪E ), w(e) e = (s, t) in E in G = (S, T, E), where w(e) is the weight of e.
When a weighted bipartite graph G = (S, T, E) with a value K are given as inputs, the cost-sharing maximum edge coloring problem (CSMECP) is to answer whether there is a proper edge coloring {F 1 , F 2 , F 3 , . . . , F ∆ } of G = (V, E∪E ) such that ∆ i=1 max w(e)|e ∈ F i ≤ K.
In the rest of this section, we show that CSMECP is NP-complete even when G is biplanar by transforming the partition problem to CSMECP. Property 2. When a set A with a function s(k)∈Z + for each k∈A are given as inputs, the partition problem is to answer whether there exists a subset A * in A such that k∈A * s(k) = k∈A−A * s(k). Theorem 7. CSMECP is NP-complete even when the input is a biplanar graph.
Proof. First, to prove that CSMECP∈NP is obvious, because a nondeterministic procedure is easily devised to guess the added edges E and their shared weights, and then the algorithm checks whether the above three conditions in Property 1 are satisfied.
Next, we reduce the partition problem, which has been found to be NP-complete in [35], to CSMECP. When an instance of the partition problem A = {α 1 , α 2 , . . . , α n } with a weight function s(α) for each α i ∈A are given as inputs, we generate a graph G=(S, T, E) so that 1.

Conclusion and Future Work
We have defined MECP, MDECP, and CSMECP graph models in this work. In addition, the intractability of CSMECP has been formally proved, and we also devised a two-approximation algorithm for MECP when the input is a biplanar graph. Note that, by applying bipartite graph techniques, an efficient heuristic algorithm including three steps to decrease the total redistribution time has been devised in our previous work [15]. Moreover, in [15], extensive simulation results are presented for the proposed heuristic algorithm for CSMECP. In future, how to estimate the reduction ratio precisely and formally is a big challenge to us. The bipartite graph techniques devised and defined in the work including MECP, MDECP, and CSMECP are expected to be applied to other practical problems in different distributed systems and networks.