Algorithmic Perspectives of Network Transitive Reduction Problems and their Applications to Synthesis and Analysis of Biological Networks

In this survey paper, we present a number of core algorithmic questions concerning several transitive reduction problems on networks that have applications in network synthesis and analysis involving cellular processes. Our starting point is the so-called minimum equivalent digraph problem, a classic computational problem in combinatorial algorithms. We subsequently consider a few non-trivial extensions or generalizations of this problem motivated by applications in systems biology. We then discuss the applications of these algorithmic methodologies in the context of three major biological research questions: synthesizing and simplifying signal transduction networks, analyzing disease networks, and measuring redundancy of biological networks.


Introduction
In this survey paper, we review several transitive reduction problems on networks that have applications in network synthesis and analysis involving cellular processes. Problems of this type, which involve formal frameworks of very similar combinatorial nature, have been investigated by two independent communities of researchers: the theoretical computer science and computer networking research community, and the biological network research community. However, the published literature suggests that there is minimal cooperation between these groups. The purpose of this survey is to promote a constructive dialogue between these two research communities working on similar problems, so that intrigued biologists may probe further and learn new techniques from the perspective of formal analysis of algorithms, and intrigued computer scientists may probe further to learn new terminologies and applications in biology. Following the general guidelines of this special issue, we first present the formal algorithmic ideas separately from their applications and subsequently discuss the applications that involve these formal frameworks.
Minimum equivalent digraph is a classical computational problem (cf. [1]) with several recent extensions motivated by applications in social sciences and systems biology. A formal definition of the basic equivalent digraph problem (MIN-ED) is as follows: given a digraph $G = (V, E)$, find a subset of edges $A \subseteq E$ of minimum cardinality $|A|$ such that, for every pair of nodes $u_i, u_j \in V$, there is a path from $u_i$ to $u_j$ using edges in $A$ if and only if there is such a path using edges in $E$. A complementary problem is the MAX-ED problem, whose objective is to maximize $|E \setminus A|$. Even though the complexity of finding an exact solution is the same for both MIN-ED and MAX-ED, the same may not necessarily be true for their approximate solutions (in the same manner as for the node cover and independent set problems for general graphs [2]). For example, suppose that we have a graph with 1,000 edges whose optimal solution $A$ for MIN-ED (and hence for MAX-ED) has 490 edges. Suppose that an approximation algorithm for MIN-ED guarantees that we will find a solution with at most 980 edges. Thus, this approximation algorithm provides an approximation ratio of 980/490 = 2 for MIN-ED. However, the same algorithm for MAX-ED can have an approximation ratio as large as

$$\frac{1000 - 490}{1000 - 980} = \frac{510}{20} = 25.5 \qquad (1)$$

Skipping the condition $A \subseteq E$ in the definition of MIN-ED (or MAX-ED) yields the so-called transitive reduction (TR) problem, which was solved in polynomial time by Aho, Garey and Ullman [3]. See Figure 1 for an illustration of valid solutions of MIN-ED.
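To make the two objectives concrete, the following is a minimal sketch that solves MIN-ED by exhaustive search on a tiny graph, assuming the usual reading of the problem (a smallest subset $A \subseteq E$ that preserves all reachability relationships of $E$). The function names `reachability` and `min_equivalent_digraph` and the 3-node example are illustrative choices, not taken from the cited works, and the search is exponential, so it is only meant for toy inputs. The last two lines repeat the arithmetic of the 1,000-edge example.

```python
from itertools import combinations

def reachability(nodes, edges):
    """Return all ordered pairs (u, v), u != v, such that v is reachable from u."""
    succ = {u: [] for u in nodes}
    for u, v in edges:
        succ[u].append(v)
    pairs = set()
    for s in nodes:
        seen, stack = {s}, [s]
        while stack:
            x = stack.pop()
            for y in succ[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        pairs |= {(s, t) for t in seen if t != s}
    return pairs

def min_equivalent_digraph(nodes, edges):
    """Exhaustive search for a smallest A within E with the same reachability as E."""
    edges = list(edges)
    target = reachability(nodes, edges)
    for k in range(len(edges) + 1):          # try subsets by increasing size
        for subset in combinations(edges, k):
            if reachability(nodes, subset) == target:
                return set(subset)

# Hypothetical toy instance: a directed 3-cycle plus the chord (1, 3).
nodes = [1, 2, 3]
edges = [(1, 2), (2, 3), (3, 1), (1, 3)]
A = min_equivalent_digraph(nodes, edges)
min_ed_value = len(A)                # MIN-ED objective |A|
max_ed_value = len(edges) - len(A)   # MAX-ED objective |E \ A|

# Arithmetic of the 1,000-edge example in the text.
min_ratio = 980 / 490
max_ratio = (1000 - 490) / (1000 - 980)
```

On this instance the chord is the only removable edge, so the two objective values are 3 and 1, respectively.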

Three Extensions of the Basic Version
In this subsection, we discuss three non-trivial extensions of the basic problem that have been formulated based on their applications. We will review in more detail the applications of the basic version as well as the other extensions separately in Section 4.
(Figure 1, caption fragment: the solution in (c) is optimal since it has fewer edges.)

MIN-ED and MAX-ED with Critical Edges
This extension is the same as MIN-ED or MAX-ED except that a given subset $D$ of edges must be present in any valid solution. Formally, we are given a subset $D \subseteq E$ of critical (required) edges, and the condition $A \subseteq E$ is changed to $D \subseteq A \subseteq E$. Let us denote this version as critical-MIN-ED and critical-MAX-ED, as appropriate. As we will see subsequently, this extension is quite non-trivial if one desires a good approximate solution.

Weighted Version of MIN-ED or MAX-ED
In this version, each edge has a weight (a positive real number) and an optimal valid solution must have the minimum possible value of total edge weight. Formally, we have a weight function $w: E \to \mathbb{R}^+$ and the goal is either to minimize $\sum_{e \in A} w(e)$ or to maximize $\sum_{e \in E} w(e) - \sum_{e \in A} w(e)$. Let us denote this version as weighted-MIN-ED or weighted-MAX-ED, as appropriate. Obviously, the basic version is a special case of this weighted version when every edge weight is 1.

Binary Transitive Reduction (BTR)
This extension is a generalization of the basic version with critical edges and is described as follows [4–7]. We have an edge labeling function $\mathcal{L}: E \to \{-1, 1\}$. The label or parity of a path $P = (u_0, u_1, \ldots, u_k)$ is derived from the labels of its edges and is given by $\mathcal{L}(P) = \prod_{i=1}^{k} \mathcal{L}(u_{i-1}, u_i)$. The transitive closure relation is now generalized as $\mathrm{closure}_{\mathcal{L}}(E) = \{(u_i, u_j, q) : \exists \text{ path } P \text{ using edges in } E \text{ from } u_i \text{ to } u_j \text{ with } \mathcal{L}(P) = q\}$. Then, $A$ is a binary transitive reduction of $E$ with a required subset $D$ if $D \subseteq A \subseteq E$ and $\mathrm{closure}_{\mathcal{L}}(A) = \mathrm{closure}_{\mathcal{L}}(E)$. Obviously, the basic version with critical edges is a special case of BTR when every edge label is 1. There are two (maximization and minimization) objective functions corresponding to the two generalizations of the basic versions MIN-ED and MAX-ED; they will be denoted by MIN-BTR and MAX-BTR, respectively. We will use the notation $u_i \overset{p}{\Rightarrow}_E u_j$ to indicate a path from node $u_i$ to node $u_j$ of parity $p \in \{-1, 1\}$ using edges in $E$.
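The parity-aware closure can be computed by a breadth-first search over (node, parity) states. The sketch below is illustrative only: the names `signed_closure` and `is_btr` are hypothetical, edge labels are assumed to be +1 or -1 and multiplied along a path, and paths are assumed to be allowed to revisit nodes (which is what makes both parities reachable once a cycle of parity -1 exists).

```python
from collections import deque

def signed_closure(nodes, labels):
    """labels maps each edge (u, v) to +1 or -1.

    Returns all triples (u, v, q) such that some walk with at least one edge
    leads from u to v and the product of its edge labels is q.
    """
    succ = {u: [] for u in nodes}
    for (u, v), lab in labels.items():
        succ[u].append((v, lab))
    closure = set()
    for s in nodes:
        seen = set(succ[s])            # states (node, parity) reached by one edge
        queue = deque(seen)
        while queue:
            x, q = queue.popleft()
            for y, lab in succ[x]:
                state = (y, q * lab)
                if state not in seen:
                    seen.add(state)
                    queue.append(state)
        closure |= {(s, v, q) for v, q in seen}
    return closure

def is_btr(nodes, labels, A, D=()):
    """Check that D is within A, A is within E, and A preserves the signed closure of E."""
    A, D = set(A), set(D)
    if not (D <= A <= set(labels)):
        return False
    sub = {e: labels[e] for e in A}
    return signed_closure(nodes, sub) == signed_closure(nodes, labels)

# Hypothetical example: the edge (a, c) of parity -1 is implied by a -> b -> c.
nodes = ["a", "b", "c"]
labels = {("a", "b"): 1, ("b", "c"): -1, ("a", "c"): -1}
kept = {("a", "b"), ("b", "c")}
redundant_ok = is_btr(nodes, labels, kept)

# If (a, c) has parity +1 instead, it is no longer implied and must be kept.
labels_pos = {**labels, ("a", "c"): 1}
not_ok = is_btr(nodes, labels_pos, kept)
```

The second call fails precisely because the triple (a, c, 1) is in the closure of the full edge set but not in the closure of `kept`.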
The relationships between various versions of the basic equivalent digraph problem are as follows: MIN-ED $\prec$ critical-MIN-ED $\prec$ MIN-BTR and MIN-ED $\prec$ weighted-MIN-ED (and, similarly, MAX-ED $\prec$ critical-MAX-ED $\prec$ MAX-BTR and MAX-ED $\prec$ weighted-MAX-ED), where $A \prec B$ means that problem $A$ is a special case of problem $B$. The relationships between the problem weighted-MIN-ED and the problems critical-MIN-ED and MIN-BTR (and, similarly, between the problem weighted-MAX-ED and the problems critical-MAX-ED and MAX-BTR) are not completely known, though it is possible to design approximation algorithms for critical-MIN-ED and MIN-BTR based on approximation algorithms for weighted-MIN-ED.
We review the following standard definitions from the theory of approximation algorithms. A $\rho$-approximate solution (or simply a $\rho$-approximation) of a minimization (respectively, maximization) problem is a solution, computable in polynomial time, with an objective value no larger than $\rho$ times (respectively, no smaller than $1/\rho$ times) the value of the optimum, where $\rho \ge 1$; an algorithm of performance or approximation ratio $\rho$ produces a $\rho$-approximate solution. A problem is APX-hard if there exists a constant $\rho > 1$ such that no polynomial-time algorithm has an approximation ratio of $\rho$ unless P = NP. The notation OPT(G) (or simply OPT when $G$ is clear from the context) will always denote the objective value of an optimal solution for the problem under consideration. We assume that the reader is familiar with the basic concepts of design and analysis of algorithms found in graduate level algorithms textbooks such as [2,8], and with basic concepts of computational biology found in standard textbooks such as [9,10].

Summary of Known Algorithmic and Inapproximability Results
In this section, we briefly review known algorithmic and inapproximability results for the various equivalent digraph and transitive reduction problems defined in the previous section, leaving a more detailed description of algorithmic techniques used to obtain these results in the next section.
The algorithmic research work on MIN-ED was initiated by Moyles and Thompson [1], who described an efficient polynomial-time reduction of this problem for an arbitrary graph to that for a strongly connected graph, followed by an exact but exponential-time algorithm for strongly connected graphs. Subsequently, an approximation algorithm for MIN-ED was detailed by Khuller, Raghavachari and Young [11] with an approximation ratio of $\frac{\pi^2}{6} - \frac{1}{36} + \varepsilon \approx 1.617 + \varepsilon$ (for any constant $\varepsilon > 0$), which was improved to an approximation ratio of $\frac{3}{2}$ independently by Vetta [12] and by Berman, DasGupta and Karpinski [13]. Except for the algorithm in [13], none of these approximation algorithms generalizes directly to critical-MIN-ED with the same approximation ratio. The only non-trivial approximation algorithm known for either MAX-ED or critical-MAX-ED is a 2-approximation algorithm described in [13].
For weighted-MIN-ED, Frederickson and JáJá [14] designed a 2-approximation algorithm using an algorithm for minimum cost rooted arborescence due to Edmonds [15] and Karp [16]. Basically, it suffices to find a minimum cost in-arborescence and a minimum cost out-arborescence with respect to an arbitrary root node $v \in V$ and take the union of all the edges in these two arborescences as the approximate solution.
Albert et al. [4] showed how to convert any algorithm for MIN-ED with an approximation ratio of $\rho$ to an algorithm for critical-MIN-ED with an approximation ratio of $3 - \frac{2}{\rho}$. They also provided a 2-approximation for MIN-BTR, but in fact a minor modification of their method and analysis, as outlined in [13], yields a $\frac{5}{3}$-approximation. Other heuristics for these problems were investigated in [5,6], but none of these heuristics guarantees a better approximation ratio. Table 1 shows a theoretical comparison of running times and approximation ratios of some of the known algorithms for the transitive reduction problems. Unfortunately, a systematic comparative empirical evaluation of these algorithmic approaches is not available in the published literature. However, implementations of several individual algorithmic approaches are available. For example, Kachalo et al. [6] provided a software package called NET-SYNTHESIS, which uses some of the algorithmic approaches described in Sections 3.2 and 3.4, and Milanović et al. [17] discussed two meta-heuristic approaches to solve a more general version of the MIN-BTR problem.
On the inapproximability side, Papadimitriou [18] left it as an exercise to show that MIN-ED is NP-hard. Subsequently, Khuller, Raghavachari and Young [11] provided a formal proof of both NP-hardness and APX-hardness of MIN-ED for arbitrary graphs. Motivated by their cycle contraction method in [11], they were interested in the complexity of the problem when there is an upper bound $\gamma$ on the length of any cycle in the input graph. In [18] the authors showed that MIN-ED can be solved in polynomial time if $\gamma = 3$, MIN-ED is NP-hard if $\gamma = 5$, and MIN-ED is APX-hard if $\gamma = 17$. Reference [13] improved the APX-hardness result to show that both MIN-ED and MAX-ED are APX-hard even when $\gamma = 5$. The exact complexity of both MIN-ED and MAX-ED when $\gamma = 4$ is still unresolved.
Table 1 (surviving row). Problem: MAX-BTR; algorithm: Berman, DasGupta and Karpinski [13]; running time: $O(n \log n)$; approximation ratio: 2.

Review of a Few Algorithmic Techniques Used for Transitive Reduction Problems
In this section, we review a few key algorithmic techniques that have been used in the literature to investigate the algorithmic complexity of various versions of the transitive reduction problem. Our goal is not to provide every technical detail of these methods, but rather to bring out the salient features of these techniques in a way that may be understood by practitioners as well.

From General Graphs to Strongly Connected Graphs
Recall that a digraph $(V, E)$ is strongly connected if and only if, for every pair of nodes $u_i$ and $u_j$, both the paths $u_i \Rightarrow_E u_j$ and $u_j \Rightarrow_E u_i$ exist. A reduction that was originally suggested in [1] and has been implicit in all subsequent works is the observation that a $\rho$-approximation algorithm for critical-MIN-ED and critical-MAX-ED when the given graph is strongly connected also implies a $\rho$-approximation algorithm for the same problem on arbitrary digraphs. To understand why this is true, we first note that all four of these problems can be solved easily in polynomial time using the following greedy approach if the input graph $G = (V, E)$ is a directed acyclic graph (DAG) with $D \subseteq E$ as the set of required edges ($\emptyset$ is the standard mathematical symbol for the empty set):
1. Compute a topological ordering $u_1, u_2, \ldots, u_n$ of the nodes of $G$ (thus, if $(u_i, u_j) \in E$ then $i < j$).
2. for $i = n, n-1, n-2, \ldots, 1$ do for $j = n, n-1, \ldots$ […]
3. Return $(V, A)$ as the solution.
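For a DAG, one standard way to obtain an optimal solution is to keep every required edge together with every edge whose endpoints are not joined by any other path. The sketch below follows this idea and is only an illustration under that assumption: it does not reproduce the exact loop structure of the greedy procedure above, and the name `dag_critical_min_ed` and the example graph are hypothetical.

```python
def dag_critical_min_ed(nodes, edges, required=frozenset()):
    """Optimal critical-MIN-ED solution on a DAG (sketch).

    An edge (u, v) is kept if it is required, or if v cannot be reached
    from u once the edge (u, v) itself is ignored.
    """
    edges = set(edges)
    succ = {u: set() for u in nodes}
    for u, v in edges:
        succ[u].add(v)

    def reachable_without(src, dst):
        """True if dst is reachable from src without using the edge (src, dst)."""
        seen, stack = {src}, [src]
        while stack:
            x = stack.pop()
            for y in succ[x]:
                if x == src and y == dst:
                    continue              # skip the edge under test
                if y == dst:
                    return True
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return False

    return {(u, v) for (u, v) in edges
            if (u, v) in required or not reachable_without(u, v)}

# Hypothetical DAG: a chain 1 -> 2 -> 3 -> 4 with two shortcut edges.
nodes = [1, 2, 3, 4]
edges = [(1, 2), (2, 3), (3, 4), (1, 3), (1, 4)]
plain = dag_critical_min_ed(nodes, edges)
with_required = dag_critical_min_ed(nodes, edges, required={(1, 4)})
```

The check per edge costs one graph traversal, so this sketch runs in $O(|E|(|V| + |E|))$ time, which is polynomial but not tuned for large inputs.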
It is easy to implement the above algorithm to run in polynomial time. Now, suppose that the input graph $G$ is not a DAG and consider the strong component graph $G^c = (V^c, E^c)$ of $G$, where $V^c = \{C : C \text{ is a strongly connected component of } G\}$ and $E^c = \{(C, C') : C \ne C' \text{ and } (u_i, u_j) \in E \text{ for some } u_i \in C \text{ and } u_j \in C'\}$. Since $G^c$ is a DAG, the connections between different strongly connected components can be handled exactly by the greedy approach above, and we use a $\rho$-approximation algorithm for critical-MIN-ED or critical-MAX-ED on each strongly connected component of $G$. Then, the union of the edges in these solutions is a $\rho$-approximation for the entire graph $G$. For MIN-BTR or MAX-BTR, Albert et al. [4] provide a more complex reduction to show that a $\rho$-approximation algorithm for strongly connected graphs also implies a $\rho$-approximation algorithm for arbitrary digraphs. To achieve this, each strongly connected component is replaced by a suitably constructed graph such that the resulting graph is a DAG, and a $\rho$-approximation for the entire graph can be recovered using an exact optimal solution of the DAG and $\rho$-approximations of the strongly connected components.
Thus, for the remainder of this section, we assume without loss of generality that the input graph G is strongly connected.
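The decomposition into strongly connected components and the resulting DAG of components can be computed in linear time. The following minimal sketch uses Kosaraju's two-pass method, which is one of several standard choices and is not claimed to be the method used in the cited works; the names `strong_components` and `component_graph` are hypothetical, and each component is identified by one representative node.

```python
def strong_components(nodes, edges):
    """Map every node to a representative of its strongly connected component."""
    succ = {u: [] for u in nodes}
    pred = {u: [] for u in nodes}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    # First pass: record nodes in order of DFS completion (iterative DFS).
    order, seen = [], set()
    for s in nodes:
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(succ[s]))]
        while stack:
            x, it = stack[-1]
            for y in it:
                if y not in seen:
                    seen.add(y)
                    stack.append((y, iter(succ[y])))
                    break
            else:                     # all successors of x explored
                order.append(x)
                stack.pop()

    # Second pass: sweep the reversed graph in decreasing completion order.
    comp = {}
    for s in reversed(order):
        if s in comp:
            continue
        comp[s] = s
        stack = [s]
        while stack:
            x = stack.pop()
            for y in pred[x]:
                if y not in comp:
                    comp[y] = s
                    stack.append(y)
    return comp

def component_graph(nodes, edges):
    """Return the component map and the edge set of the (acyclic) component graph."""
    comp = strong_components(nodes, edges)
    inter = {(comp[u], comp[v]) for u, v in edges if comp[u] != comp[v]}
    return comp, inter

# Hypothetical example with components {1, 2}, {3, 4} and {5}.
nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (2, 1), (2, 3), (3, 4), (4, 3), (4, 5)]
comp, inter = component_graph(nodes, edges)
```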

The Cycle Contraction Method [11]
Consider an input graph G = (V, E) for the MIN-ED problem and suppose that G has a directed Hamiltonian cycle, i.e., a (directed) cycle that contains every node exactly once. Then clearly the edges in this cycle constitute an optimal solution of |V| edges. This intuition suggests a general strategy of repeatedly finding a longest cycle in the given graph, selecting the edges in this cycle and modifying the graph to reflect the selection of edges until we reach a valid solution.
However, finding a directed Hamiltonian cycle or the longest cycle is in general NP-hard [2]. To circumvent the NP-hardness issue, Khuller, Raghavachari and Young [11] designed the cycle contraction method. Contraction of an edge $(v_i, v_j)$ is nothing but the act of merging the two nodes $v_i$ and $v_j$ into a new single node $v_{ij}$ and deleting any resulting self-loops or multi-edges. Similarly, contraction of a cycle is defined as the contraction of every edge of the cycle; see Figure 2 for an illustration. Note that if $c$ is a constant then one can easily check in polynomial time whether a graph has a cycle of at least $c$ edges. The algorithm, parameterized by a constant $c > 3$ to be chosen by the user, now proceeds as follows:
1. For $i = c, c-1, \ldots, 4$: while the current graph contains a cycle with at least $i$ edges, contract such a cycle and select its edges.
2. Solve MIN-ED on the reduced graph exactly using the algorithm in [19] and select the edges in this exact solution.
For any constant $\varepsilon > 0$ and a sufficiently large constant $c$, the number of edges $y$ selected in this manner is at most $\left(\frac{\pi^2}{6} - \frac{1}{36} + \varepsilon\right) \mathrm{OPT}(G)$.
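The contraction primitive itself is simple to state in code. The sketch below shows only this primitive, not the search for long cycles nor the exact algorithm of [19]; the name `contract_cycle` and the convention of naming the merged node by the frozen set of its members are illustrative assumptions.

```python
def contract_cycle(edges, cycle):
    """Contract the nodes on `cycle` into one mega-node.

    The mega-node is named by the frozenset of merged nodes. Self-loops
    are dropped explicitly, and multi-edges collapse because the result
    is a set.
    """
    merged = frozenset(cycle)

    def rename(x):
        return merged if x in merged else x

    return {(rename(u), rename(v)) for u, v in edges if rename(u) != rename(v)}

# Hypothetical example: contract the cycle 1 -> 2 -> 3 -> 1.
edges = {(1, 2), (2, 3), (3, 1), (3, 4), (4, 1), (4, 2)}
mega = frozenset({1, 2, 3})
reduced = contract_cycle(edges, [1, 2, 3])
```

After the contraction, the two edges (4, 1) and (4, 2) become a single edge from node 4 into the mega-node.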
The above approach can also be applied to critical-MIN-ED by simply adding all the edges from the required set of edges $D$ to the solution. The number of edges $z$ in the resulting solution of critical-MIN-ED satisfies […]. Another possibility outlined in [4] is to replace every required edge $(u_i, u_j) \in D$ by introducing a new node $u_{ij}$ and adding two new edges $(u_i, u_{ij})$ and $(u_{ij}, u_j)$, running the approximation algorithm for MIN-ED on this new graph, and then replacing the edges $(u_i, u_{ij})$ and $(u_{ij}, u_j)$ in the solution by the original edge $(u_i, u_j)$. If an optimal solution of critical-MIN-ED on $G$ uses edges from $E \setminus D$ then this approach returns a solution $(V, A)$ with […].

The Arborescence Approach [14]
A (rooted) spanning out-arborescence of a directed edge-weighted graph $G = (V, E)$ is a directed acyclic spanning sub-graph $(V, A)$ of $G$ such that every node except one node (the root) has exactly one incoming edge; the weight of such an out-arborescence is the sum of the weights of its edges. A spanning in-arborescence is defined analogously except that every node except the root has exactly one outgoing edge. An exact polynomial-time algorithm for computing a spanning in-arborescence or spanning out-arborescence of minimum weight was provided in [15,16,20]. An overview of this algorithm for computing a minimum weight out-arborescence (as formulated in [16]) is as follows. We first remove all incoming edges to the root $v$. Then we select, for each node except the root $v$, an incoming edge of minimum weight. If these edges do not give a spanning arborescence, then there must be a (directed) cycle $C$ formed by a subset of these edges. Let $w(C) = \min \{w(e) \mid e \in C\}$. We contract the cycle $C$ to a mega-node, and decrease the weight of every edge $(u, x)$ from a node $u \notin C$ to a node $x \in C$ by $w_x - w(C)$, where $w_x$ is the weight of the unique edge in $C$ that is incoming to $x$. The process is then repeated on the reduced graph, and continued until we have a spanning arborescence on the remaining graph. The mega-nodes are then expanded in the reverse order. Each time a mega-node is expanded, exactly one of its edges that would produce two incoming edges to a node is discarded. A minimum weight in-arborescence can be computed by the same algorithm if we reverse the direction of all the edges of the input graph. See Figure 3 for an illustration.
For weighted-MIN-ED, Frederickson and JáJá [14] proposed the following simple algorithm that gives a 2-approximation for an input graph $G = (V, E)$:
1. Select an arbitrary node $v$ of $G$.
2. Find a minimum weight spanning in-arborescence $(V, A_1)$ of $G$ rooted at $v$.
3. Find a minimum weight spanning out-arborescence $(V, A_2)$ of $G$ rooted at $v$.
4. Return $(V, A_1 \cup A_2)$ as the solution.
The above solution is a valid solution since we can reach any node $v_j$ starting from any node $v_i$ by taking a path from $v_i$ to the root $v$ followed by a path from $v$ to the node $v_j$. The solution is a 2-approximation since any valid solution of weighted-MIN-ED contains both a spanning in-arborescence and a spanning out-arborescence rooted at $v$, each of weight at least that of the corresponding minimum weight arborescence, and thus $\mathrm{OPT}(G) \ge \max \{w(A_1), w(A_2)\}$, where $w(A_i)$ denotes the total weight of the edges in $A_i$. A simple example of an input graph was also provided in [14] for which the above algorithm provides a solution of total weight $2\,\mathrm{OPT}(G)$.
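In the special case where every edge has weight 1, any spanning arborescence has exactly $|V| - 1$ edges and is therefore of minimum weight, so the two arborescences can be found by breadth-first search. The sketch below covers only this unit-weight case, under the assumption that the input graph is strongly connected; the general weighted case needs a minimum weight arborescence routine such as the one in [15,16], which is not implemented here. The names `bfs_tree` and `two_approx_unit_weight` are hypothetical.

```python
from collections import deque

def bfs_tree(nodes, adj, root):
    """Edges (parent, child) of a BFS tree of the adjacency map `adj` from `root`."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                queue.append(y)
    if len(parent) != len(nodes):
        raise ValueError("graph is not strongly connected")
    return {(p, c) for c, p in parent.items() if p is not None}

def two_approx_unit_weight(nodes, edges, root):
    """Union of an in-arborescence and an out-arborescence rooted at `root`."""
    succ = {u: [] for u in nodes}
    pred = {u: [] for u in nodes}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    out_arb = bfs_tree(nodes, succ, root)   # root reaches every node
    # A BFS tree of the reversed graph, with its edges flipped back.
    in_arb = {(c, p) for p, c in bfs_tree(nodes, pred, root)}
    return out_arb | in_arb

# Hypothetical strongly connected example.
nodes = [1, 2, 3, 4]
edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3), (2, 4)]
A = two_approx_unit_weight(nodes, edges, root=1)
```

The returned set has at most $2(|V| - 1)$ edges, while any strongly connected spanning subgraph needs at least $|V|$ edges, which gives the factor of 2 in this special case.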
For critical-MIN-ED, a very similar approach can again be used to provide a 2-approximation for an input graph $G = (V, E)$. Albert et al. [4] showed how to modify the above algorithm and combine it with any $\rho$-approximation algorithm for MIN-ED to obtain an improved algorithm for critical-MIN-ED with an approximation ratio of $3 - \frac{2}{\rho}$. Currently, the best known value of $\rho$ is 1.5, which leads to a $\frac{5}{3}$-approximation for critical-MIN-ED using this approach.

From Critical-MIN-ED and Critical-MAX-ED to MIN-BTR and MAX-BTR [4,13]
The results in [4,13] show how to transform a solution to critical-MIN-ED (respectively, critical-MAX-ED) into a solution to MIN-BTR (respectively, MAX-BTR) by adding a single edge that can be found in polynomial time (we remind the reader that we assume that the input graph is strongly connected). The idea behind this is as follows. We can distinguish our strongly connected input graph $G = (V, E)$ based on whether $G$ has a cycle of parity $-1$ (double parity graph) or not (single parity graph). Whether $G$ is a single or double parity graph can be easily checked in $O(|V|^3)$ time by using a simple modification of the well-known Floyd-Warshall transitive closure algorithm [8] as outlined in [4]. Now we can observe the following. If $G$ is a single parity graph then, for every pair of nodes $u_i, u_j \in V$, exactly one of the two paths $u_i \overset{1}{\Rightarrow}_E u_j$ and $u_i \overset{-1}{\Rightarrow}_E u_j$ exists; thus, we can ignore the edge labels, and any solution of critical-MIN-ED (respectively, critical-MAX-ED) on $G$ is also a valid solution of MIN-BTR (respectively, MAX-BTR). Otherwise, $G$ is a double parity graph. We again first ignore the edge labels and compute a solution $(V, A)$ of critical-MIN-ED (respectively, critical-MAX-ED) on $G$. Note that $(V, A)$ contains a rooted out-arborescence, say $(V, A_1)$ with $A_1 \subseteq A$, rooted at some node $u_r$. We label each node $u_i \in V$ with $\ell(u_i) = \mathcal{L}(P_i)$, where $P_i$ is the unique path in $(V, A_1)$ from $u_r$ to $u_i$ and $\mathcal{L}(P_i)$ is its parity. Since $G$ is a double parity graph, there must exist an edge $(u_i, u_j) \in E$ such that $\ell(u_i) \ne \ell(u_j)\,\mathcal{L}(u_i, u_j)$, and adding this edge (if not already present) to $A$ produces a valid solution of MIN-BTR or MAX-BTR for $G$.
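The node-labeling argument above also gives a simple way to classify a strongly connected signed graph and to find the extra edge. The sketch below labels nodes along a breadth-first out-arborescence and looks for an edge whose label disagrees with the node labels; it is an illustration of the labeling idea only, not the Floyd-Warshall based procedure of [4], and the name `parity_witness` is hypothetical. It assumes that edge labels are +1 or -1 and that the graph is strongly connected.

```python
from collections import deque

def parity_witness(nodes, labels, root):
    """Classify a strongly connected signed digraph.

    labels maps each edge (u, v) to +1 or -1. Returns None for a single
    parity graph; otherwise returns an edge (u, v) whose label is
    inconsistent with the node labels, which certifies a cycle of parity -1.
    """
    succ = {u: [] for u in nodes}
    for (u, v), lab in labels.items():
        succ[u].append((v, lab))
    # Label every node by the parity of its tree path from the root.
    node_label = {root: 1}
    queue = deque([root])
    while queue:
        x = queue.popleft()
        for y, lab in succ[x]:
            if y not in node_label:
                node_label[y] = node_label[x] * lab
                queue.append(y)
    # An edge is consistent if it maps the label of its tail to that of its head.
    for (u, v), lab in labels.items():
        if node_label[u] * lab != node_label[v]:
            return (u, v)
    return None

nodes = [1, 2, 3]
# The cycle 1 -> 2 -> 3 -> 1 has parity (+1)(-1)(-1) = +1: single parity.
single = parity_witness(nodes, {(1, 2): 1, (2, 3): -1, (3, 1): -1}, root=1)
# The same cycle with parity (+1)(+1)(-1) = -1: double parity.
double = parity_witness(nodes, {(1, 2): 1, (2, 3): 1, (3, 1): -1}, root=1)
```

If every edge is consistent, every cycle has parity +1; otherwise the inconsistent edge, together with tree paths and a return path to the root, yields a closed walk of parity -1.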

Linear Programming Based Approach [13]
We refer the reader to a standard graduate level textbook such as [21] for basic concepts and definitions related to linear programming and its applications to designing approximation algorithms.
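An exponential-size linear programming (LP) formulation for the minimum weight rooted (at node $u_r$) out-arborescence problem for an edge-weighted input graph $G = (V, E)$ was provided by Edmonds [15] in the following manner. For $\emptyset \subset U \subset V$, let $\iota(U) = \{(u_i, u_j) \in E : u_i \notin U \text{ and } u_j \in U\}$ be the set of edges entering $U$. We use a binary indicator variable $x_e$ for every edge $e \in E$ to indicate whether $e$ is selected, and relax it to $x_e \ge 0$:

$$\begin{array}{lll} \text{minimize} & \sum_{e \in E} w(e)\, x_e & \\ \text{subject to} & \sum_{e \in \iota(U)} x_e \ge 1 & \text{for all } \emptyset \subset U \subset V \text{ with } u_r \notin U \qquad (1) \\ & x_e \ge 0 & \text{for all } e \in E \end{array}$$

Edmonds [15] showed that the above LP always has an integral optimal solution (i.e., an optimal solution with $x_e \in \{0, 1\}$ for all $e \in E$), which provides an optimal solution for our minimum weight rooted out-arborescence problem. Note that the above LP has $O(2^{|V|})$ constraints in the worst case.
However, the advantage of such a linear programming formulation is that we can now make use of powerful mathematical tools, such as the duality theorem, from the theory of linear programming. We can modify the above LP formulation to a primal LP formulation $P_1$ for MIN-ED provided we set $w(e) = 1$ for all $e \in E$ and remove $u_r \notin U$ from the condition in constraint (1). The dual program $D_1$ of this LP can be constructed by having a variable $y_U$ for every $\emptyset \subset U \subset V$. Both the primal and the dual LP are written down below for clarity.

$$(P_1) \quad \text{minimize } \sum_{e \in E} x_e \quad \text{subject to} \quad \sum_{e \in \iota(U)} x_e \ge 1 \ \text{ for all } \emptyset \subset U \subset V, \qquad x_e \ge 0 \ \text{ for all } e \in E$$

$$(D_1) \quad \text{maximize } \sum_{\emptyset \subset U \subset V} y_U \quad \text{subject to} \quad \sum_{U \,:\, e \in \iota(U)} y_U \le 1 \ \text{ for all } e \in E, \qquad y_U \ge 0 \ \text{ for all } \emptyset \subset U \subset V$$

Requiring $x_e \in \{0, 1\}$ for all $e \in E$ in $P_1$ gives an integer linear program (ILP) whose exact solution is in general NP-hard to compute. We will denote this ILP corresponding to $P_1$ by $IP_1$.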
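The cut constraints of the MIN-ED formulation can be checked by brute force on small graphs, which makes their meaning tangible: the indicator vector of an edge set satisfies every constraint exactly when each nonempty proper subset of nodes has an entering edge, that is, when the edge set is strongly connected. The sketch below assumes that the constraints have the form "at least one selected edge enters $U$ for every nonempty proper subset $U$ of $V$"; the function names are hypothetical and the enumeration is exponential in $|V|$.

```python
from itertools import combinations

def entering(edges, U):
    """Edges (u, v) with u outside U and v inside U."""
    return {(u, v) for u, v in edges if u not in U and v in U}

def satisfies_cut_constraints(nodes, A):
    """Check that every nonempty proper subset U of the nodes has an entering edge of A."""
    nodes = list(nodes)
    for k in range(1, len(nodes)):
        for U in combinations(nodes, k):
            if not entering(A, set(U)):
                return False
    return True

nodes = [1, 2, 3]
cycle = {(1, 2), (2, 3), (3, 1)}
feasible = satisfies_cut_constraints(nodes, cycle)               # directed 3-cycle
infeasible = satisfies_cut_constraints(nodes, {(1, 2), (2, 3)})  # no edge enters {1}
```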

Applying LP-Based Approach to Critical-MIN-ED
We provide a high-level overview of the primal-dual approach used in [13] for critical-MIN-ED on an input graph G = (V, E).
1. We start with an initial assignment of values to the variables in $IP_1$ in the following manner. We keep only a subset of the constraints of $IP_1$ such that the resulting ILP can be solved exactly in polynomial time, giving an optimal solution $A_1 \subseteq E$. Then, it follows that $\mathrm{OPT}(G) \ge |A_1|$.
2. However, $(V, A_1)$ may not be a valid solution for critical-MIN-ED on $G$ (i.e., for $IP_1$). Then, we try to make $A_1$ a valid solution by adding and/or removing edges so that we use a total of at most $\frac{3}{2}|A_1|$ edges; since $\mathrm{OPT}(G) \ge |A_1|$, this gives a $\frac{3}{2}$-approximation for critical-MIN-ED. The edge alteration procedure was carried out in [13] using the DFS (depth-first-search) algorithm as originally outlined in a seminal paper by Tarjan (e.g., see the textbook [22]).
The initial solution $A_1$ referred to above in Step 1 is obtained in the following manner. For $\emptyset \subset U \subset V$, define $\iota(U) = \{(u_i, u_j) \in E : u_i \notin U \text{ and } u_j \in U\}$, and let $o(U)$ denote the corresponding set of edges leaving $U$. Call a constraint of type $\sum_{e \in \iota(U)} x_e \ge 1$ in $IP_1$ tractable if for some node $u_i$ either $\iota(U) \subseteq \iota(\{u_i\})$ or $\iota(U) \subseteq o(\{u_i\})$. It was shown in [13] that the set of tractable constraints of $IP_1$ can be found easily and the resulting ILP can be solved exactly using any algorithm that finds a maximum matching in a bipartite graph. Figure 4 shows an example of the initial solution $A_1$ found by this approach. The DFS-based edge addition/removal method referred to in Step 2 is highly technical with elaborate case analysis and is beyond the scope of this review paper. In a nutshell, difficulties may arise because in some cases the algorithm may be forced to use more than $\frac{3}{2}|A_1|$ edges. Then, we use the primal $P_1$ or the dual $D_1$ to get an improved lower bound for $\mathrm{OPT}(G)$ (i.e., $\mathrm{OPT}(G) > |A_1|$) to ensure that we still use at most $\frac{3}{2}\mathrm{OPT}(G)$ edges. In the proof we crucially need the weak-duality theorem of linear programming, which states that if $\mathrm{OPT}(P_1)$ and $\mathrm{OPT}(D_1)$ are the objective values of an optimal solution of $P_1$ and $D_1$, respectively, then $\mathrm{OPT}(P_1) \ge \mathrm{OPT}(D_1)$.
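The weak-duality inequality used above can be verified in three lines. The derivation below is a sketch under the assumption that $P_1$ minimizes $\sum_{e} x_e$ subject to $\sum_{e \in \iota(U)} x_e \ge 1$ for every nonempty proper subset $U$ of $V$ with $x_e \ge 0$, and that $D_1$ maximizes $\sum_{U} y_U$ subject to $\sum_{U : e \in \iota(U)} y_U \le 1$ for every edge $e$ with $y_U \ge 0$; these forms are a reading of the surrounding text rather than a quotation of [13]. For any feasible $x$ and any feasible $y$:

```latex
\begin{align*}
\sum_{e \in E} x_e
  &\ge \sum_{e \in E} x_e \sum_{U \,:\, e \in \iota(U)} y_U
     && \text{(dual feasibility and } x_e \ge 0\text{)} \\
  &= \sum_{\emptyset \subset U \subset V} y_U \sum_{e \in \iota(U)} x_e
     && \text{(exchange the order of summation)} \\
  &\ge \sum_{\emptyset \subset U \subset V} y_U
     && \text{(primal feasibility and } y_U \ge 0\text{)}
\end{align*}
```

Hence every feasible dual solution gives a lower bound on the value of every feasible primal solution, and in particular on the number of edges in any valid integral solution.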

Applying LP-Based Approach to Critical-MAX-ED
We provide an overview of the 2-approximation algorithm for critical-MAX-ED on an input graph $G = (V, E)$ using an LP-based approach as described in [13]. Call an edge $e \in E$ a necessary edge if either $e \in D$ or $\iota(U) = \{e\}$ for some $\emptyset \subset U \subset V$, and let $F$ be the set of necessary edges. If the edges in $F$ provide a valid solution of critical-MAX-ED on $G$ then $(V, F)$ provides us with an optimal solution; thus, assume below that this is not the case. In this case, $|F \cap \iota(U)| = 0$ for some $\emptyset \subset U \subset V$, so there must be a node $u_r$ such that no edges in $F$ enter $u_r$. As a pre-processing step, we repeatedly contract a cycle of necessary edges until no such cycles remain. Let $\mathrm{OPT}_{\text{in-arb}}(G)$ be the total weight of a minimum-weight in-arborescence of $G$ rooted at $u_r$. Consider the LP formulation for the minimum weight rooted out-arborescence problem as defined before: […]

Limitations of LP-Based Approaches
A standard way of understanding the limitations of any LP-based approach for designing approximation algorithms is to measure the integrality gap, i.e., the ratio of the objective value of an optimal integral solution to that of an optimal fractional solution for a minimization problem, and the ratio of the objective value of an optimal fractional solution to that of an optimal integral solution for a maximization problem [21]. In [13] it was shown that the integrality gap for $P_1$ is at least $\frac{4}{3}$ by giving an explicit construction of an input graph for which this ratio is achieved. The same input graph also shows that the integrality gap for the modification of $P_1$ corresponding to MAX-ED is at least $\frac{3}{2}$.

Biological Applications
In this section, we discuss three applications of transitive reduction problems in computational biology and bioinformatics. For other non-biology applications of transitive reduction problems, such as in visualization of Enron email networks or in connectivity issues of computer networks, the reader may consult appropriate references such as [11,23].
We briefly review the standard regulatory network model that was mentioned in Section 1.1.3 in connection with the MIN-BTR and MAX-BTR problems. A regulatory network is described by an edge-labelled directed graph $G = (V, E)$ in which nodes represent individual components of the biological system and (directed) edges of the form $(u_i, u_j)$ indicate that node $u_i$ has an influence on node $u_j$. The edge labeling function $\mathcal{L}: E \to \{-1, 1\}$ indicates the nature of the causal relationship, with $\mathcal{L}(u_i, u_j) = 1$ and $\mathcal{L}(u_i, u_j) = -1$ indicating that $u_i$ has an excitatory (positive) and inhibitory (negative) influence on $u_j$, respectively; pictorially, it is quite common to denote an excitatory and an inhibitory edge by $\rightarrow$ and $\dashv$, respectively. This representation applies to both gene regulatory networks (describing the regulation of gene transcription and related processes) and signal transduction networks (describing the information flow from external signals to within-cell components). Some examples of large biological networks include: the mammalian network of signaling pathways and cellular machines in the hippocampal CA1 neuron, having 512 nodes and 1,047 edges [24]; the S. cerevisiae transcriptional regulatory network of interactions between transcription factor proteins and genes, having 690 nodes and 1,082 edges [25]; the C. elegans metabolic network, having 651 nodes and 2,040 edges [26]; and an oriented version of an unweighted PPI network constructed from S. cerevisiae interactions in the BioGRID database, having 786 nodes and 2,453 edges [27].
Existence of such large networks rules out exact brute-force calculations of optimal solutions of transitive reduction problems and provides motivations to explore approximation algorithms for these problems.

Network Construction and Simplification from Direct and Double-Causal Data
Signal transduction and gene regulatory networks are crucial to the maintenance of cellular homeostasis and for cell behavior such as growth, survival, apoptosis, and movement. Deregulation of these networks is a key contributor to many disease processes such as developmental disorders, diabetes, vascular diseases, and cancer. In a signal transduction network (pathway), there is typically an input, perceived by a receptor, followed by a series of elements through which the signal percolates to the output node, which represents the final outcome of the signal transduction process. For a cellular signal transduction pathway not involving alterations in gene expression, elements often consist of proteinaceous receptors, intermediary signaling proteins and metabolites, effector proteins, and a final output, which represents the ultimate combined effect of the effector proteins. If the signal transduction process includes regulation of the transcript level of a particular gene, the intermediate signaling elements will also include the gene itself and the transcription factors that regulate it, as well as […] abundance, with the final output being presence or absence of transcripts. Genome-wide experimental methods now identify interactions among thousands of proteins [28–34]. However, the state-of-the-art understanding of many signaling processes is often limited to the knowledge of key mediators and of their positive or negative effects on the whole process. The experimental evidence about the involvement of specific components in a given signal transduction network frequently belongs to one of these two categories:
(i) Biochemical evidence that provides information on enzymatic activity or protein-protein interactions and represents direct physical interactions. An interaction of this type is of the form "A promotes B" or "A inhibits B", and is represented in the usual manner by a directed edge $A \rightarrow B$ and $A \dashv B$, respectively. Edges corresponding to such direct interactions are the required edges.
(ii) Putative interaction patterns that arise, for example, from differential responses to a stimulus in a wild-type organism versus a mutant organism, which implicate the product of the mutated gene in the signal transduction process. This type of interaction pattern is not a direct interaction but rather corresponds to an indirect (double-causal) relationship, most likely resulting from a chain of direct interactions and reactions, and is a 3-component inference represented by a small sub-graph among three or four nodes.
As noted above, inferences of type (ii) may not give direct interactions but indirect causal relationships that correspond to reachability relationships in the unknown interaction network, for which the MIN-BTR and MAX-BTR problems become directly applicable. More precisely, inferences of type (ii) typically lead to double-causal relationships of the form "C promotes the process through which A promotes B", which correspond to an intersection of two paths (one path from A to B and another path from C to B) in the interaction network (i.e., C is assumed to activate an unknown intermediary node of the A to B path).
The research works in [5–7] led to the development of an efficient and accurate method that incorporates all relevant biological knowledge for synthesizing path-level information into a consistent network by constructing a minimal graph that maintains all reachability relationships without requiring expression information (unlike, say, many reverse-engineering approaches). Methods prior to [5–7] for synthesizing signal transduction networks, such as [28], only included direct biochemical interactions and were therefore restricted by the incompleteness of the experimental knowledge on pairwise interactions. Key steps in the network synthesis method developed in [5–7] are schematically shown in Figure 5. The first step is a distillation of experimental conclusions into qualitative regulatory relations between cellular components. (This is a complex process by itself. It is important to note that human intervention will inevitably be an important component of the literature curation process even though automated text search engines such as GENIES [32] are becoming more and more popular.) A direct interaction between A and B is incorporated as a directed edge (A, B). Other kinds of double-causal evidence (such as genetic evidence of differential responses to a stimulus) are handled in the third step in the schematic diagram. For the sake of concreteness, assume that such a double-causal interaction is of the form "C promotes the process through which A promotes B". The only way such a double-causal interaction may correspond to a direct interaction is if C is an enzyme catalyzing a reaction in which A is transformed into B, and for this case the interaction can be represented as both A (the substrate) and C (the enzyme) activating B (the product), i.e., by two edges $A \rightarrow B$ and $C \rightarrow B$. If the interaction between A and B is direct and C is not a catalyst of the interaction between A and B, we can assume that C activates A. In all other cases, this type of interaction corresponds to an intersection of two paths ([…] elsewhere since they are added only to satisfy the pathway properties).
One important algorithmic idea in this network synthesis method is that of finding a minimal network, in terms of the number of non-critical edges (i.e., edges not in D), that is consistent with all (directed) reachability relationships between nodes; this is captured by the MIN-BTR and MAX-BTR problems discussed earlier. (Intuitively, a minimal network is the simplest one that is consistent with the experimental observations. An implicit assumption of chain-like or tree-like topologies permeates the traditional molecular biology literature, e.g., signal transduction and metabolic pathways are assumed to be close to linear chains and genes are assumed to be regulated by one or two transcription factors [33].) For further details, see [5–7]. A software tool named NET-SYNTHESIS, which incorporates the method shown in Figure 5 using some of the algorithmic ideas described for MIN-BTR and MAX-BTR in Section 3, was first reported in [5,6] and is freely available for download. The input to NET-SYNTHESIS is a list of relationships among biological components (direct and double-causal), and its output is a network diagram and a text file with the edges of the signal transduction network.
Figure 5. A schematic diagram of the network synthesis method in [5–7]. Human interaction is necessary since some choices may have to be made in distilling the component relationships, e.g., when there are conflicting reports in the literature.
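To make the idea of removing non-critical edges while preserving reachability concrete, the following is a minimal greedy sketch of a binary transitive reduction. It is not the approximation algorithm of [5–7] or the NET-SYNTHESIS code: it yields an inclusion-minimal edge set, not necessarily a minimum one. The assumptions are ours: edges are triples `(u, v, sign)` with sign 0 for activation and 1 for inhibition, the effect of a path is the sum of its signs modulo 2, paths may revisit nodes, and the names `reaches` and `greedy_btr` are hypothetical.

```python
from collections import deque


def reaches(edges, src, dst, parity):
    """True if a non-empty walk src -> dst with the given parity exists.

    Parity 0 means net activation, 1 means net inhibition (assumed encoding).
    """
    adj = {}
    for u, v, s in edges:
        adj.setdefault(u, []).append((v, s))
    seen = {(src, 0)}
    queue = deque(seen)
    while queue:
        node, par = queue.popleft()
        for nxt, s in adj.get(node, []):
            state = (nxt, par ^ s)  # flip parity on inhibitory edges
            if state == (dst, parity):
                return True
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return False


def greedy_btr(edges, critical=frozenset()):
    """Drop each non-critical edge that has an alternate same-parity walk."""
    kept = list(edges)  # edges are assumed to be distinct
    for e in list(edges):
        if e in critical:
            continue  # critical edges (the set D) are never removed
        rest = [x for x in kept if x != e]
        u, v, s = e
        if reaches(rest, u, v, s):
            kept = rest
    return kept


if __name__ == "__main__":
    net = [("A", "B", 0), ("B", "C", 0), ("A", "C", 0)]
    print(greedy_btr(net))
```

The result of this greedy pass depends on the order in which edges are examined; the edge A → C in the toy network is dropped only because the path A → B → C has the same (activating) effect.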

Applications in Agronomic Research
Guard cells are central components in the control of plant water status [34], and a better understanding of their regulation is imperative for the goal of engineering crops with improved drought tolerance. Plants both lose water and take in carbon dioxide through microscopic stomatal pores, each of which is regulated by a surrounding pair of guard cells. During drought, the plant hormone abscisic acid (ABA) inhibits stomatal opening and promotes stomatal closure, thereby promoting water conservation. ABA signal transduction in guard cells is one of the best characterized signaling systems in plants, with many signal transduction proteins, secondary metabolites and ion channels having been identified as participating in the process [35–37].
The research works in [5,6] used the NET-SYNTHESIS software to generate a network for ABA-induced closure from a list of about 140 interactions and causal inferences for ABA-induced closure published in Table S1 and Text S1 in [38]. A detailed comparison of this computer-generated network with a manually curated network for ABA-induced closure published in [38] validated the accuracy of the algorithms for MIN-BTR used in the software.

Analyzing Disease Networks (Biomedical Application)
Large Granular Lymphocytes (LGL) are medium to large size cells with eccentric nuclei and abundant cytoplasm. In normal adults, LGL comprise 10%–15% of the total peripheral blood mononuclear cells. The disease LGL leukemia is a disordered clonal expansion of LGL and their invasion of the marrow, spleen and liver. Ras is a small GTPase that is essential for controlling multiple signaling pathways, and its deregulation is frequently seen in human cancers. Activation of H-Ras requires its farnesylation, which can be blocked by farnesyltransferase inhibitors (FTIs). This makes FTIs candidates for future anti-cancer therapies. One such FTI is tipifarnib, which induces apoptosis in leukemic LGL in vitro. This observation, together with the finding that Ras is constitutively activated in leukemic LGL cells, leads to the hypothesis that Ras plays an important role in LGL leukemia, and may function through influencing the Fas/FasL pathway.
Kachalo et al. in [6] used the NET-SYNTHESIS software, together with its specific transitive reduction algorithms, to synthesize a signaling network related to cell-survival/cell-death regulation from the Transpath 6.0 database, with additional information manually curated from a literature search. The network had 359 nodes representing proteins/protein families and mRNAs participating in pro-survival and Fas-induced apoptosis pathways, and 1,295 edges representing regulatory relationships between nodes, including protein interactions, catalytic reactions, transcriptional regulation and known double-causal regulations. Using MIN-BTR and other algorithms, they were able to reduce the size of the original network to 267 nodes and 751 edges, in order to focus on the effect of Ras on the apoptosis response through the Fas/FasL pathway that involves the 33 known T-LGL deregulated proteins. Further work in this direction was done by Zhang et al. in [39], who used the NET-SYNTHESIS software to build and analyze a network model of the signaling components involved in the survival of cytotoxic T lymphocytes in LGL leukemia.
For further applications of transitive reduction problems to drug target identification, see [40].

Measuring Topological Redundancy of Biological Networks
The concept of redundancy is well known in information theory. Informally, redundancy refers to identical elements performing the same function. (There are also other definitions of the redundancy concept in the context of other biological applications that are completely different from ours. For example, in some contexts redundancy refers to paralogous genes that provide functional backup for one another [41].) In computer networks and electronic systems, such measures are useful in analyzing properties such as fault-tolerance. It is an accepted fact that biological networks do not necessarily have the lowest possible degeneracy or redundancy. For example, the connectivity of neurons in brains suggests a high degree of degeneracy [42]. As Tononi, Sporns and Edelman observed in [43], a specific and useful notion of redundancy has yet to be firmly incorporated into biological thinking, often because of the lack of a suitable formal theoretical framework. A further reason is the lack of computationally efficient procedures for computing these measures for large-scale networks, even when formal definitions are available. Therefore, such studies are often done in a somewhat ad hoc fashion, such as in [44]. There are notions of redundancy available in the field of analysis of undirected graphs based on clustering coefficients [45] or betweenness centrality measures [46]. However, such notions are not appropriate for the analysis of biological networks where we must distinguish positive from negative regulatory interactions, or where we wish to study possible relationships of the dynamics of the network with its redundancy.
Based on the MIN-BTR and MAX-BTR problems, Albert et al. in [47] proposed a new combinatorial measure of redundancy that is amenable to efficient algorithmic analysis. Note that binary transitive reduction of a graph (V, E) does not change pathway-level information of the network and removes an edge from one node $u_i$ to another node $u_j$ (i.e., $u_i \to u_j$ or $u_i \dashv u_j$) only when a similar alternate pathway, namely a path from $u_i$ to $u_j$ with the same overall effect, exists. Writing $E'$ for the set of edges that remain after the reduction, the redundancy measure is

$$R = 1 - \frac{|E'|}{|E|},$$

where the |E| term in the denominator is simply a min-max normalization of the measure to ensure that 0 ≤ R < 1. Note that the higher the value of R is, the more redundant the network is. Since MIN-BTR or MAX-BTR can be computed efficiently, Albert et al. were able to evaluate R on a variety of large biological and directed social networks and to derive several interesting conclusions: transcriptional networks are less redundant than signaling networks; directed social networks are more redundant than biological networks; the topological redundancy of the C. elegans metabolic network is largely due to its inclusion of currency metabolites; and the redundancy of signaling networks is highly (negatively) correlated with the monotonicity of their dynamics.
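As a small worked example of the measure, the sketch below assumes that R takes the form 1 − |E'|/|E|, where |E| is the number of edges in the original network and |E'| is the number of edges remaining after a binary transitive reduction. The function name `redundancy` and the toy edge counts are hypothetical and are not taken from [47].

```python
def redundancy(num_edges, num_reduced_edges):
    """Redundancy R = 1 - |E'|/|E| (assumed form of the measure).

    num_edges:         |E|, edges in the original network
    num_reduced_edges: |E'|, edges left after transitive reduction
    """
    if num_edges <= 0:
        raise ValueError("the original network must have at least one edge")
    if not 0 <= num_reduced_edges <= num_edges:
        raise ValueError("the reduced edge count must lie between 0 and |E|")
    return 1 - num_reduced_edges / num_edges


if __name__ == "__main__":
    # Toy network: 3 edges, of which 1 is removable by the reduction.
    print(redundancy(3, 2))
    # A network with no removable edge has no redundancy.
    print(redundancy(4, 4))
```

Dividing by |E| is what makes values comparable across networks of different sizes: a network that loses one of three edges scores higher than one that loses one of a thousand.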

Conclusions
In this review paper, we have elaborated on a few graph-theoretic problems that involve finding a smallest subgraph that preserves the reachability relationships of a given network, discussed how to design efficient computational methods to solve these problems, and then provided details of three biological applications of these problems. The idea of transitive reductions, in a more simplistic setting or in a different form, has also been used to identify the structure of gene regulatory networks [48–52]. A related problem, studied in [52], is in some sense an inverse of the transitive reduction problems studied in this paper: the goal there was to infer the original network given a set of direct (edge-level) and indirect (pathway-level) information about the graph. The authors of [52] showed that an exact closed-form solution of this problem can be found using an infinite-series summation. We hope that our review will lead to further interest in transitive reduction type problems and will promote further collaboration between the computational biology and the graph algorithms communities.