Adding Edges for Maximizing Weighted Reachability

In this paper, we consider the problem of improving the reachability of a graph. We approach the problem from a graph augmentation perspective, in which a limited number of edges is added to the graph to increase the overall number of reachable nodes. We call this new problem the Maximum Connectivity Improvement (MCI) problem. We first show that, for the purpose of solving MCI, we can focus on Directed Acyclic Graphs (DAGs) only. We show that approximating the MCI problem on DAGs to within any constant factor greater than $1 - 1/e$ is NP-hard, even if we restrict to graphs with a single source or a single sink, and that the problem remains NP-complete if we further restrict to unitary weights. We then present a dynamic programming algorithm for the MCI problem on trees with a single source that produces optimal solutions in polynomial time. Finally, we propose two polynomial-time greedy algorithms that guarantee a $(1 - 1/e)$-approximation ratio on DAGs with a single source, a single sink or two sources.


Introduction
The problem of improving the reachability of a graph is an important graph-theoretical question, which finds applications in several areas. Recent application scenarios include suggesting friends in a social network in order to increase the spreading of information [1][2][3], reducing the convergence time of random walk processes to perform faster network simulations [4,5], improving the resilience of wireless sensor networks [6] and even controlling elections in social networks [7,8].
In this paper, we approach the problem from a graph augmentation perspective, that is, we add a set of non-existing edges to a graph to increase the overall number of reachable nodes. In traditional graph theory, graph augmentation was first studied by Eswaran and Tarjan [9]. They considered the problem of adding a minimum-cost set of edges to a graph such that the resulting graph satisfies a given connectivity requirement, e.g., making a directed graph strongly connected or making an undirected graph bridge-connected or biconnected. In their seminal paper, Eswaran and Tarjan presented linear-time algorithms for many graph augmentation problems and proved that some variants are instead NP-complete.
More recently, several optimisation problems related to graph augmentation have been addressed. Several papers in the literature deal with the problem of minimising the eccentricity of a graph by adding a limited number of new edges [10][11][12][13].
The problem of minimising the average all-pairs shortest-path distance of the whole graph, i.e., its characteristic path length, was studied by Papagelis [5]. The author considered the problem of adding a small set of edges to minimise the characteristic path length, proved that the problem is NP-hard, and proposed a path screening technique to select the edges to be added. It is worth noting that the objective function in [5] does not satisfy the submodularity property, so good approximations cannot be guaranteed via a greedy strategy. The problem of adding a small set of links to maximise the centrality of a given node in a network has been addressed for different centrality measures: PageRank [4,14], average distance [15], harmonic and betweenness centrality [16,17] and some measures related to the number of paths passing through a given node [18]. Many traditional graph problems have also been studied on dynamic graphs; those closest to ours include connectivity [19], minimum spanning tree [20] and reachability [21,22].
In this work, which is based on and extends preliminary research in this direction [23], we study the problem of adding at most $B$ edges to a directed graph in order to maximise the overall weighted number of reachable nodes, which we call the Maximum Connectivity Improvement (MCI) problem. The rest of the paper is organised as follows. We first show that we can focus on Directed Acyclic Graphs (DAGs) without loss of generality (Section 2). Then, we study the complexity of the problem (Section 3) and prove that MCI is NP-hard to approximate to within a factor greater than $1 - 1/e$. This result holds even if the DAG has a single source or a single sink. Moreover, the problem remains NP-complete if we further restrict to the unweighted case. In Section 4, we give a dynamic programming algorithm for the case in which the graph is a rooted tree, where the root is the only source node. In Section 5, we present a greedy algorithm which guarantees a $(1 - 1/e)$-approximation factor for the case in which the DAG has a single source or a single sink. As a first step towards extending our approach to general DAGs, i.e., with multiple sources and multiple sinks, in Section 6 we tackle the special case of DAGs with two sources and provide a constant-factor approximation algorithm. We end with some concluding remarks in Section 7. Compared to the conference version, we add the results of Section 6 on DAGs with two sources and revise the paper to improve readability by adding examples and detailed proofs.

Preliminaries
Let $G = (V, E)$ be a directed graph. Each node $v \in V$ is associated with a non-negative weight $w_v \in \mathbb{N}_{\geq 0}$ and a profit $p_v \in \mathbb{N}_{\geq 0}$. Given a node $v \in V$, we denote by $R(v, G)$ the set of nodes that are reachable from $v$ in $G$, that is, $R(v, G) = \{u \in V : \exists \text{ a path from } v \text{ to } u \text{ in } G\}$. Moreover, we denote by $p(R(v, G)) = \sum_{u \in R(v, G)} p_u$ the sum of the profits of the nodes reachable from $v$ in $G$. In the rest of the paper, we also use $p(R(v, G) \setminus R(u, G)) = \sum_{z \in R(v, G) \setminus R(u, G)} p_z$ to denote the sum of the profits of the nodes in $G$ that are reachable from $v$ but not from $u$. Note that, in the case . Given a set $S$ of edges not in $E$, we denote by $G(S)$ the graph augmented by adding the edges in $S$ to $G$, i.e., $G(S) = (V, E \cup S)$. Let $R(v, G(S))$ and $p(R(v, G(S)))$ be, respectively, the set of nodes that are reachable from $v$ in $G(S)$ and the sum of the profits of the nodes in $R(v, G(S))$. Note that augmenting $G$ cannot make the connectivity worse, and thus $R(u, G) \subseteq R(u, G(S))$. Let $f(G) = \sum_{v \in V} w_v \, p(R(v, G))$ be a weighted measure of the connectivity of $G$. When weights and profits are unitary, $f(G)$ is the overall number of ordered pairs $(v, u)$ such that $u$ is reachable from $v$ in $G$ (including the pairs $(v, v)$).
In this paper, we aim to augment $G$ by adding a set $S$ of at most $B$ edges, i.e., $|S| \leq B$ with $B \in \mathbb{N}_{\geq 0}$, that maximises the weighted connectivity $f(G(S))$. We call this problem the Maximum Connectivity Improvement (MCI) problem, because maximising $f(G(S))$ is the same as maximising the improvement $f(G(S)) - f(G)$. Formally:

Definition 1 (Maximum Connectivity Improvement). Given a graph $G = (V, E)$, a weight $w_v \in \mathbb{N}_{\geq 0}$ and a profit $p_v \in \mathbb{N}_{\geq 0}$ for each node $v \in V$, and a budget $B \in \mathbb{N}_{\geq 0}$, find a set of edges $S^*$ such that $S^* = \arg\max_{S \subseteq V \times V : |S| \leq B} f(G(S))$.
From now on, for simplicity, we omit the original graph $G$ from the notation. Thus, we simply use $R(v)$ and $R(v, S)$ to denote $R(v, G)$ and $R(v, G(S))$, respectively. Similarly, we denote by $f$ and $f(S)$ the value of the weighted connectivity in $G$ and in $G(S)$, respectively.
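The following Python sketch makes the notation concrete: it evaluates $f(S) = \sum_{v \in V} w_v \, p(R(v, S))$ by brute force, with one DFS per node of $G(S)$. The helper names `reachable` and `objective` are illustrative choices, not part of the paper, and nodes are assumed to be hashable labels with `w` and `p` given as dictionaries.

```python
from collections import defaultdict


def reachable(adj, start):
    """R(start): nodes reachable from `start`, including `start` itself (iterative DFS)."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for x in adj.get(u, ()):
            if x not in seen:
                seen.add(x)
                stack.append(x)
    return seen


def objective(nodes, edges, w, p, added=()):
    """f(S) = sum over v of w[v] * p(R(v, G(S))), where G(S) has edge set E plus S."""
    adj = defaultdict(list)
    for u, v in list(edges) + list(added):
        adj[u].append(v)
    return sum(w[v] * sum(p[u] for u in reachable(adj, v)) for v in nodes)


# Path a -> b -> c with unit weights and profits.
nodes = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c")]
unit = {v: 1 for v in nodes}
print(objective(nodes, edges, unit, unit))                # 3 + 2 + 1 = 6
print(objective(nodes, edges, unit, unit, [("c", "a")]))  # 3 + 3 + 3 = 9
```

Adding the edge $(c, a)$ closes a cycle, so every node reaches all three nodes and $f$ grows from 6 to 9.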
First, we show how to transform any directed graph $G$ with cycles into a Directed Acyclic Graph (DAG) $G' = (V', E')$ and how to transform any solution for $G'$ into a feasible solution for $G$.
Graph $G' = (V', E')$ has as many nodes as the number of strongly connected components of $G$, i.e., there is one vertex in $V'$ for each strongly connected component of $G$. Specifically, $G'$ selects one representative node $v'$ for each strongly connected component of $G$ and contains a directed edge between two nodes $u'$ and $v'$ if there is a directed edge in $G$ connecting any vertex of the strongly connected component represented by $u'$ with any vertex of the strongly connected component represented by $v'$. Graph $G'$ is called the condensation of $G$ and can be computed in $O(|V| + |E|)$ time by using Tarjan's algorithm, which is based on a DFS visit [24].
The weight and the profit of a node $v'$ in $G'$ are given by the sums of the weights and of the profits, respectively, of the nodes of $G$ that belong to the strongly connected component $C_{v'}$ represented by $v'$, i.e., $w'_{v'} = \sum_{v \in C_{v'}} w_v$ and $p'_{v'} = \sum_{v \in C_{v'}} p_v$. Note that, when the profits are unitary in $G$, i.e., $p_v = 1$ for all $v \in V$, then $p'_{v'}$ is equal to the size of the strongly connected component $C_{v'}$ associated with $v'$.
Since the condensation preserves the connectivity of $G$, the following lemma can be proved.

Lemma 1. Given a graph $G$ and its condensation $G'$, it holds that $f(G') = f(G)$.
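Proof. First, consider two nodes $u$ and $v$ of $G$ that belong to the same strongly connected component $C_{v'}$. Clearly, $R(u, G) = R(v, G)$. Moreover, $p(R(v, G)) = p'(R(v', G'))$, because $R(v', G')$ contains one node for each different strongly connected component in $R(v, G)$, and thus:
$$p(R(v, G)) = \sum_{u' \in R(v', G')} \sum_{u \in C_{u'}} p_u = \sum_{u' \in R(v', G')} p'_{u'} = p'(R(v', G')).$$
Denoting by $C_{v'}$ the strongly connected component represented by $v'$, we have:
$$f(G) = \sum_{v' \in V'} \sum_{v \in C_{v'}} w_v \, p(R(v, G)) = \sum_{v' \in V'} p'(R(v', G')) \sum_{v \in C_{v'}} w_v = \sum_{v' \in V'} w'_{v'} \, p'(R(v', G')) = f(G').$$
Given a solution $S'$ for the MCI problem in $G'$, we can build a solution $S$ with the same value for the MCI problem in $G$ as follows: for each edge $(u', v')$ in $S'$, we add an edge $(u, v)$ to $S$, where $u$ and $v$ are two arbitrary nodes in the strongly connected components corresponding to $u'$ and $v'$, respectively.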
This follows from the fact that applying the condensation algorithm to $G \cup S$ or to $G' \cup S'$ yields the same condensed graph, say $G''$. From Lemma 1, we can conclude that $f(G \cup S) = f(G'') = f(G' \cup S')$. Observe that, if we add an edge $e$ within a strongly connected component of $G$, we do not add any edge to $G'$. Since the condensation of $G \cup \{e\}$ is the same as $G'$, we have $f(G \cup \{e\}) = f(G') = f(G)$. Hence, we can assume, without loss of generality, that no solution to MCI in $G$ contains an edge within a single strongly connected component, since such an edge does not improve the objective function. As a consequence, in the remainder of the paper, we assume that the graph is a DAG.
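As an illustration of the condensation and of Lemma 1, here is a minimal Python sketch, assuming small graphs given as edge lists: it labels strongly connected components with a recursive version of Tarjan's algorithm, uses component ids in place of representative nodes, sums weights and profits per component, and checks that $f(G') = f(G)$ on a toy graph. The function names are hypothetical.

```python
from collections import defaultdict


def tarjan_scc(nodes, adj):
    """Tarjan's algorithm (recursive; fine for small graphs). Maps each node to an SCC id."""
    index, low, comp = {}, {}, {}
    stack, on_stack = [], set()
    counter = 0
    n_comps = 0

    def visit(v):
        nonlocal counter, n_comps
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for x in adj.get(v, ()):
            if x not in index:
                visit(x)
                low[v] = min(low[v], low[x])
            elif x in on_stack:
                low[v] = min(low[v], index[x])
        if low[v] == index[v]:  # v is the root of an SCC: pop the whole component
            while True:
                x = stack.pop()
                on_stack.discard(x)
                comp[x] = n_comps
                if x == v:
                    break
            n_comps += 1

    for v in nodes:
        if v not in index:
            visit(v)
    return comp


def condense(nodes, edges, w, p):
    """Condensation G': one node per SCC, with weights and profits summed over the SCC."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    comp = tarjan_scc(nodes, adj)
    w2, p2 = defaultdict(int), defaultdict(int)
    for v in nodes:
        w2[comp[v]] += w[v]
        p2[comp[v]] += p[v]
    edges2 = sorted({(comp[u], comp[v]) for u, v in edges if comp[u] != comp[v]})
    return sorted(w2), edges2, dict(w2), dict(p2)


def objective(nodes, edges, w, p):
    """f = sum over v of w[v] * p(R(v)), one DFS per node."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    total = 0
    for v in nodes:
        seen, stack = {v}, [v]
        while stack:
            for x in adj[stack.pop()]:
                if x not in seen:
                    seen.add(x)
                    stack.append(x)
        total += w[v] * sum(p[u] for u in seen)
    return total


# a and b form a cycle (one SCC); c is a separate SCC reachable from it.
nodes = ["a", "b", "c"]
edges = [("a", "b"), ("b", "a"), ("b", "c")]
w = {"a": 1, "b": 2, "c": 3}
p = {"a": 1, "b": 1, "c": 5}
print(objective(nodes, edges, w, p))             # 1*7 + 2*7 + 3*5 = 36
print(objective(*condense(nodes, edges, w, p)))  # same value on the condensation
```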
Given a DAG, we can distinguish between three kinds of nodes: sources, i.e., nodes with no incoming edges; sinks, i.e., nodes with no outgoing edges; and the remaining nodes. The next lemma allows us to focus on solutions that contain only edges connecting sink nodes to source nodes. In the remainder of the paper, we use this property to derive algorithms both for the simpler case of trees and for the more general case of DAGs (with a single source, a single sink, or two sources).

Lemma 2.
Let $S$ be a solution to the MCI problem; then there exists a solution $S'$ such that $|S'| = |S|$, $f(S) \leq f(S')$, and all edges in $S'$ connect sink nodes to source nodes.
Proof. We show how to modify any solution $S$ in order to find a solution $S'$ with the same cardinality that contains only edges connecting sink nodes to source nodes and such that $f(S) \leq f(S')$. To obtain $S'$, we start from $S$ and repeatedly apply the following modifications to each edge $(u, v)$ of $S$ such that $u$ is not a sink or $v$ is not a source:

1. If $u$ is not a sink, then there exists a path from $u$ to some sink $u'$, and we swap edge $(u, v)$ with edge $(u', v)$. The objective function does not decrease: every node that reached $v$ through $(u, v)$ also reaches $u'$ and hence $v$ through $(u', v)$. Moreover, after adding the edge $(u', v)$, any node $z$ on the path from $u$ to $u'$ now reaches $v$ passing through $u'$, so the objective function may increase.

2. If $v$ is not a source, then there exists a path from some source $v'$ to $v$, and we swap edge $(u, v)$ with edge $(u, v')$. The objective function does not decrease, since $v'$ reaches $v$ and hence every node that reached $v$ through $(u, v)$ still reaches it through $v'$; moreover, it may increase, because $u$ and every node that reaches $u$ now also reach the nodes of $R(v') \setminus R(v)$, e.g., the nodes on the path from $v'$ to $v$.
Note that, in both cases, the gain of a node on the path we are extending can be zero if the node was already able to reach the source or sink through another edge of the solution. Since each modification replaces one edge with another, the cardinality of the new solution $S'$ is equal to that of the previous solution $S$. Furthermore, since the objective function can only increase when the edges are modified according to the rules described above, we have that $f(S) \leq f(S')$.
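The edge modifications in the proof of Lemma 2 amount to two walks on the original DAG: forward from the tail until a sink is reached and backward from the head until a source is reached. This Python sketch (hypothetical helper names, unit weights and profits) performs both walks for a single edge and compares the objective before and after the swap on a small example; since any path works, the walks simply follow the first listed neighbour.

```python
from collections import defaultdict


def normalize_edge(edges, u, v):
    """Lemma 2 sketch: replace (u, v) by (u2, v2), where u2 is a sink reachable from u and
    v2 is a source that reaches v. Only the original edges E are followed; since the graph
    is assumed to be a DAG, both walks terminate."""
    succ, pred = defaultdict(list), defaultdict(list)
    for a, b in edges:
        succ[a].append(b)
        pred[b].append(a)
    while succ[u]:  # walk forward until a node with no outgoing edges (a sink)
        u = succ[u][0]
    while pred[v]:  # walk backward until a node with no incoming edges (a source)
        v = pred[v][0]
    return u, v


def objective_unit(nodes, edges, added=()):
    """f(S) with unit weights and profits: number of pairs (x, y) with y in R(x, S)."""
    adj = defaultdict(list)
    for a, b in list(edges) + list(added):
        adj[a].append(b)
    total = 0
    for x in nodes:
        seen, stack = {x}, [x]
        while stack:
            for y in adj[stack.pop()]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        total += len(seen)
    return total


nodes = ["s", "a", "b", "t"]
edges = [("s", "a"), ("a", "t"), ("s", "b")]  # source s; sinks t and b
new_edge = normalize_edge(edges, "a", "b")
print(new_edge)                                    # ('t', 's')
print(objective_unit(nodes, edges, [("a", "b")]))  # 4 + 3 + 1 + 1 = 9
print(objective_unit(nodes, edges, [new_edge]))    # 4 + 4 + 1 + 4 = 13
```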

Hardness Results
In this section, we first show that the MCI problem is NP-complete, even in the case in which all the weights and profits are unitary and the graph contains a single sink node or a single source node. Then, we show that it is NP-hard to approximate MCI to within a factor greater than $1 - 1/e$. This last result also holds for graphs with a single sink node or a single source node, but not in the case of unitary weights.

Theorem 1. MCI is NP-complete, even in the case in which all the weights and profits are unitary and the graph contains a single sink node or a single source node.
Proof. We consider the decision version of MCI in which all the weights and profits are unitary (i.e., $w_v = p_v = 1$): given a directed graph $G = (V, E)$ and two integers $M, B \in \mathbb{N}_{\geq 0}$, the goal is to find a set of additional edges $S \subseteq (V \times V) \setminus E$ such that $f(S) \geq M$ and $|S| = B$. The problem is in NP, since it can be checked in polynomial time whether a set of edges $S$ satisfies $f(S) \geq M$ and $|S| = B$. We reduce from the Set Cover (SC) problem, which is known to be NP-complete [25]. Consider an instance $I_{SC} = (X, \mathcal{F}, k)$ of SC defined by a collection of subsets $\mathcal{F} = \{S_1, \ldots, S_m\}$ of a ground set of items $X = \{x_1, \ldots, x_n\}$. The problem is to decide whether there exist $k$ subsets whose union is equal to $X$. We define a corresponding instance $I_{MCI} = (G, M, B)$ of MCI as follows: $V = \{v\} \cup \{v_{x_j} : x_j \in X\} \cup \{v_{S_i} : S_i \in \mathcal{F}\}$, $E = \{(v_{S_i}, v_{x_j}) : x_j \in S_i\} \cup \{(v_{x_j}, v) : x_j \in X\}$, $M = (n + B + 1)^2 + (m - B)(n + B + 2)$, and $B = k$. See Figure 1 (left, top) for an example. Note that $G$ is a DAG. By Lemma 2, we can assume that any solution $S$ of MCI contains only edges $(v, v_{S_i})$ for some $S_i \in \mathcal{F}$; in fact, $v$ is the only sink node and the nodes $v_{S_i}$ are the only source nodes. Assume that there exists a set cover $\mathcal{F}' \subseteq \mathcal{F}$ with $|\mathcal{F}'| = k$; then, we define a solution to the MCI instance as $S = \{(v, v_{S_i}) : S_i \in \mathcal{F}'\}$. Indeed, all the nodes in $G$ can reach node $v$, all the nodes $v_{x_j}$ (since $\mathcal{F}'$ is a set cover), and all the nodes $v_{S_i}$ such that $S_i \in \mathcal{F}'$. Moreover, each node $v_{S_i}$ such that $S_i \notin \mathcal{F}'$ can also reach itself. Therefore, there are $n + B + 1$ nodes that reach $n + B + 1$ nodes and $m - B$ nodes that reach $n + B + 2$ nodes, that is, $f(S) = M$. On the other hand, assume that there exists a solution $S$ for MCI with $f(S) \geq M$; then, $S$ is of the form $\{(v, v_{S_i}) : S_i \in \mathcal{F}'\}$ for some $\mathcal{F}' \subseteq \mathcal{F}$, and we define the solution for the set cover as $\mathcal{F}' = \{S_i : (v, v_{S_i}) \in S\}$. We show that $\mathcal{F}'$ is a set cover. By contradiction, if we assume that $\mathcal{F}'$ is not a set cover and covers only $n' < n$ elements of $X$, then $f(S) \leq (n' + B + 1)^2 + (n - n' + m - B)(n + B + 2) < M$. Note that, in the above reduction, the graph $G$ has a single sink node.
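To make the counting in this reduction tangible, the following Python sketch builds the single-sink instance and evaluates every solution made of $k$ edges $(v, v_{S_i})$, with unit weights and profits. It assumes the construction in which each set node $v_{S_i}$ has an edge to every item node $v_{x_j}$ with $x_j \in S_i$ and every item node has an edge to the single sink $v$; this is the structure implied by the counting argument, while the reference example is in Figure 1. Node labels and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def sc_instance(n, sets):
    """Assumed single-sink construction: v_Si -> v_xj if x_j in S_i, and v_xj -> v."""
    nodes = ["v"] + [f"x{j}" for j in range(n)] + [f"S{i}" for i in range(len(sets))]
    edges = [(f"S{i}", f"x{j}") for i, s in enumerate(sets) for j in s]
    edges += [(f"x{j}", "v") for j in range(n)]
    return nodes, edges


def f_unit(nodes, edges):
    """Unit weights and profits: sum over nodes of |R(node)|."""
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
    total = 0
    for x in nodes:
        seen, stack = {x}, [x]
        while stack:
            for y in adj[stack.pop()]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        total += len(seen)
    return total


def best_mci_value(n, sets, k):
    """Best f(S) over solutions S = {(v, v_Si) : S_i in F'} with |F'| = k (Lemma 2 form)."""
    nodes, edges = sc_instance(n, sets)
    return max(f_unit(nodes, edges + [("v", f"S{i}") for i in chosen])
               for chosen in combinations(range(len(sets)), k))


def threshold(n, m, k):
    """M = (n + B + 1)^2 + (m - B)(n + B + 2), with B = k."""
    return (n + k + 1) ** 2 + (m - k) * (n + k + 2)


print(threshold(3, 3, 2))                           # 43
print(best_mci_value(3, [{0, 1}, {1, 2}, {2}], 2))  # 43: a cover of size 2 exists
print(best_mci_value(3, [{0}, {1}, {2}], 2))        # 38: no cover of size 2
```

In the second instance two singletons cover only two of the three items, and the best value stays strictly below $M$.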
We can prove the NP-hardness of the case of graphs with a single source node by using the same arguments on an instance of MCI made of the reverse graph of $G$, $M = (B + n + 1)(n + m + 1) + m - B$, and $B = k$ (see Figure 1 (left, bottom) for an example).

Theorem 2. MCI is NP-hard to approximate to within a factor $1 - 1/e + \epsilon$, for any $\epsilon > 0$, even if the graph contains a single sink node or a single source node.
Proof. We give two approximation-factor-preserving reductions from the Maximum Coverage (MC) problem, which is known to be NP-hard to approximate to within a factor greater than $1 - 1/e$ [26]. The MC problem is defined as follows: given a ground set of items $X = \{x_1, \ldots, x_n\}$, a collection $\mathcal{F} = \{S_1, \ldots, S_m\}$ of subsets of $X$, and an integer $k$, find $k$ sets in $\mathcal{F}$ that maximise the cardinality of their union.
We first focus on the single-sink case. Given an instance $I_{MC} = (X, \mathcal{F}, k)$ of MC, we define an instance $I_{MCI} = (G, k)$ of the (maximisation) MCI problem similar to the one used in Theorem 1, but where we modify the weights and add $Y$ paths of one node between each $v_{x_j}$ and $v$, where $Y$ is an arbitrarily large number (polynomial in $n + m$).
In detail, $I_{MCI}$ is defined as follows: See Figure 1 (right, top) for an example. We first show that there exists a solution $\mathcal{F}' \subseteq \mathcal{F}$ to $I_{MC}$ that covers $n'$ elements of $X$ if and only if there exists a solution $S$ to $I_{MCI}$ such that $f(S) = n'(Y + 1) + B + 1$. Moreover, we can compute $\mathcal{F}'$ from $S$ and vice versa in polynomial time. Indeed, given $\mathcal{F}'$, we define $S$ as nodes $v_{x_j}$ corresponding to the items $x_j$ covered by $\mathcal{F}'$, the $B$ nodes $v_{S_i}$ it is connected to, and itself. On the other hand, given a solution $S$ to $I_{MCI}$, by Lemma 2, we can assume that it has only edges from $v$ to nodes $v_{S_i}$. Let $n'$ be the number of nodes $v$ … If $OPT(I_{MC})$ and $OPT(I_{MCI})$ denote the optimum values for $I_{MC}$ and $I_{MCI}$, respectively, then $OPT(I_{MCI}) \geq OPT(I_{MC})(Y + 1) + B + 1 \geq Y \cdot OPT(I_{MC})$. Moreover, given the above definitions of $S$ and $\mathcal{F}'$, for any $\epsilon > 0$ there exists a value of $Y = O(\mathrm{poly}(n + m))$ such that $f(S) \leq (n' + \epsilon) Y$.
Let us assume that there exists a polynomial-time algorithm that guarantees an $\alpha$-approximation for $I_{MCI}$; then, we can compute a solution $S$ such that $f(S) \geq \alpha \, OPT(I_{MCI})$. It follows that
$$(n' + \epsilon) Y \geq f(S) \geq \alpha \, OPT(I_{MCI}) \geq \alpha Y \cdot OPT(I_{MC}), \quad \text{i.e.,} \quad n' \geq \alpha \, OPT(I_{MC}) - \epsilon,$$
where $n'$ is the number of elements covered by the solution $\mathcal{F}'$ to MC obtained from $S$. Therefore, we obtain an algorithm that approximates the MC problem within a factor $\alpha$ (up to lower-order terms). Since MC is NP-hard to approximate to within a factor greater than $1 - 1/e$ [26], the statement follows.

Let us now focus on the single-source case. Given $I_{MC}$, we define $I_{MCI} = (G, B)$ as follows: $Y$ is an arbitrarily large polynomial value in $m + n$. See Figure 1 (right, bottom) for an example. We use arguments similar to those above. In detail, there exists a solution $\mathcal{F}' \subseteq \mathcal{F}$ to $I_{MC}$ that covers $n'$ elements of $X$ if and only if there exists a solution $S$ to $I_{MCI}$ such that $f(S) = n'(n + m - m' + Y) + n(m' + 1)$, where $m'$ is the number of sets in $\mathcal{F}$ that do not cover any of the $n'$ elements covered by $\mathcal{F}'$. Moreover, we can compute $\mathcal{F}'$ from $S$ and vice versa in polynomial time. Given $\mathcal{F}'$, we define $S = \{(v_{S_i}, v) : S_i \in \mathcal{F}'\}$ and we can verify that $f(S) = \sum_{x_j \in X} p(R(v_{x_j}, S)) = \sum_{x_j \in X} |R(v_{x_j}, S)| = n'(n + m + Y + 1) + (n - n')(m' + 1) = n'(n + m - m' + Y) + n(m' + 1)$ and $|S| = B$. Given $S$, if $n'$ is the number of nodes $v_{x_j}$ such that $v \in R(v_{x_j}, S)$, then $f(S) = n'(n + m - m' + Y) + n(m' + 1)$ and $\mathcal{F}' = \{S_i : (v_{S_i}, v) \in S\}$ covers $n'$ elements of $X$.
As above, we can show that $OPT(I_{MCI}) \geq Y \cdot OPT(I_{MC})$ and that, for any $\epsilon > 0$, there exists a value of $Y = O(\mathrm{poly}(n + m))$ such that $f(S) \leq (n' + \epsilon) Y$. Then, the statement follows by using the same arguments as above.

Polynomial-Time Algorithm for Trees
In this section, we focus on the case of directed weighted rooted trees in which the root is the only source node and all the edges are directed towards the leaves. We give a polynomial-time dynamic programming algorithm for the special case of binary trees that requires $O(|V| B^2)$ time and $O(|V| B)$ space. Exploiting Lemma 2, the algorithm considers only edges that connect leaves to the root. We then extend our result by giving an algorithm that transforms any tree into a binary tree in $O(|V|)$ time and space. Under this transformation, each solution for the transformed instance has the same value as the corresponding solution in the original instance.

Binary Trees
In the following, we introduce our dynamic programming algorithm to solve the MCI problem in binary trees. Let us consider an instance of the MCI problem with a directed weighted binary tree $T = (V, E)$, where all the edges are directed towards the leaves, the root $r \in V$ is the only source node, and $w : V \to \mathbb{N}_{\geq 0}$, $p : V \to \mathbb{N}_{\geq 0}$. We denote by $\psi(v)$ (left child) and $\delta(v)$ (right child) the children of node $v \in T$, and by $T(v)$ the subtree rooted at $v$.
Let us first consider a node $v$ and a solution $S_v$ that connects some leaves of $T(v)$ to $r$. The gain of $S_v$ in $T(v)$ is the increase in weighted reachability of the nodes in $T(v)$, which can be written as $\sum_{u \in T(v)} w_u \left( p(T(u, S_v)) - p(T(u)) \right)$, where, with a slight abuse of notation, $T(u, S_v)$ denotes the set of nodes reachable from $u$ after adding $S_v$. Note that, given a node $u \in T(v)$, if no edge of $S_v$ starts from a leaf of the subtree of $u$, then $p(T(u, S_v)) - p(T(u))$ is equal to zero, since $T(u, S_v) = T(u)$. On the other hand, if at least one such edge is present, then $T(u, S_v) = T(r)$. The algorithm that we propose computes, for each node $v \in V$ and for each budget $b = 0, 1, \ldots, B$, a solution that connects $b$ leaves of $T(v)$ to $r$ and maximises the gain in $T(v)$. Formally, we define $g(v, b)$ as the maximum gain in $T(v)$ achievable by adding at most $b$ edges from $b$ leaves of $T(v)$ to node $r$. Note that $g(v, 0) \leq g(v, 1) \leq \ldots \leq g(v, B)$. We now show how to compute $g(v, b)$ for each node $v$ and each budget $b = 0, 1, \ldots, B$ by dynamic programming. For each leaf $v \in T$ and for each $b = 1, 2, \ldots, B$, we have $g(v, b) = w_v \cdot (p(T(r)) - p(T(v)))$, that is, $w_v$ times the sum of the profits of the new nodes that $v$ can reach thanks to the new edge $(v, r)$. Moreover, $g(v, 0) = 0$ for each $v \in V$. Then, the algorithm visits the nodes of $T$ in post-order and, for each internal node $v$, computes $g(v, b)$ from the solutions of its subtrees $T(\psi(v))$ and $T(\delta(v))$.
Let us assume that we have computed $g(\psi(v), b)$ and $g(\delta(v), b)$ for each $b = 0, 1, \ldots, B$. Recall that, if a solution adds an edge between any leaf of $T(v)$ and $r$, then the gain of node $v$ is $w_v (p(T(r)) - p(T(v)))$, since $v$ now reaches all the nodes in $T$. This gain is independent of the number of edges that are added from the leaves of $T(v)$ to $r$. In fact, given the maximum gain $g(\psi(v), b_l)$ for $T(\psi(v))$ and budget $b_l \in \{1, 2, \ldots, B\}$, the gain in $T(v)$ of a solution that connects $b_l$ leaves of $T(\psi(v))$ to $r$ is equal to $g(\psi(v), b_l) + w_v \cdot (p(T(r)) - p(T(v)))$. Similarly, the gain in $T(v)$ of a solution that connects $b_r$ leaves of $T(\delta(v))$ to $r$, for some $b_r \in \{1, 2, \ldots, B\}$, is equal to $g(\delta(v), b_r) + w_v \cdot (p(T(r)) - p(T(v)))$.
Then, once we have decided how many edges to add in $T(\psi(v))$ and in $T(\delta(v))$ for $g(v, b)$, the reachability function increases by the same quantity $w_v \cdot (p(T(r)) - p(T(v)))$, regardless of the split.
Hence, $g(v, b)$ is obtained by choosing the split $b_l + b_r = b$ that maximises $g(\psi(v), b_l) + g(\delta(v), b_r) + w_v \cdot (p(T(r)) - p(T(v)))$. Precisely, for each $b = 1, 2, \ldots, B$:
$$g(v, b) = \max_{b_l + b_r = b} \left\{ g(\psi(v), b_l) + g(\delta(v), b_r) \right\} + w_v \cdot \left( p(T(r)) - p(T(v)) \right). \qquad (1)$$
The optimal value of the problem is $f(S) = g(r, B) + f(T)$, where $f(T)$ is the value of the objective function on $T$ (i.e., when no edges have been added). The pseudo-code of the algorithm is reported in Algorithm 1.
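The recurrence in Equation (1) can be sketched in Python as follows. This is not the paper's Algorithm 1 listing, only a minimal sketch assuming the tree is given as a dictionary `children` mapping each node to at most two children; a node with a single child simply inherits that child's row, which is an assumption about how such nodes are handled. The function `tree_dp` (a hypothetical name) returns the optimal value $g(r, B) + f(T)$, not the edge set.

```python
def tree_dp(children, root, w, p, B):
    """Equation (1) on a rooted binary tree with edges directed towards the leaves.
    Returns g(r, B) + f(T)."""
    # Pre-order traversal; reversing it processes every node after all its descendants.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))
    order.reverse()

    pT = {}  # pT[v] = p(T(v)), the total profit of the subtree rooted at v
    for v in order:
        pT[v] = p[v] + sum(pT[c] for c in children.get(v, []))
    total = pT[root]                        # p(T(r))
    f_T = sum(w[v] * pT[v] for v in order)  # f(T): in a tree, R(v) = T(v)

    g = {}
    for v in order:
        kids = children.get(v, [])
        gain_v = w[v] * (total - pT[v])  # v's gain once any leaf of T(v) is linked to r
        row = [0] * (B + 1)              # g(v, 0) = 0
        for b in range(1, B + 1):
            if not kids:                 # leaf: the single edge (v, r) suffices
                best = 0
            elif len(kids) == 1:         # assumed handling of a single child
                best = g[kids[0]][b]
            else:
                left, right = g[kids[0]], g[kids[1]]
                best = max(left[bl] + right[b - bl] for bl in range(b + 1))
            row[b] = best + gain_v
        g[v] = row
    return g[root][B] + f_T


# Root r with children a and b; a has leaves c and d. Unit weights and profits.
children = {"r": ["a", "b"], "a": ["c", "d"]}
unit = {v: 1 for v in "rabcd"}
for B in range(4):
    print(B, tree_dp(children, "r", unit, unit, B))  # 0 11, 1 17, 2 21, 3 25
```

With $B = 3$ every leaf is linked to the root, so all five nodes reach all five nodes and the value is 25. The double loop over $b$ and the split $b_l$ gives the $O(|V| B^2)$ bound discussed below.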

Theorem 3. Algorithm 1 finds an optimal solution for MCI if the graph is a binary tree.
Proof. Let us assume, by contradiction, that $v$ and $b$ are, respectively, the first node and the first budget for which Algorithm 1 computes a non-maximum gain at Line 9, that is, $g(v, b) < g^*(v, b)$, where $g^*(v, b)$ is the maximum gain for tree $T(v)$ and budget $b$. Let $S^*$ be an optimal solution that achieves $g^*(v, b)$, and let $S^*_l$ and $S^*_r$ be the sets of edges in $S^*$ that start from leaves in $T(\psi(v))$ and $T(\delta(v))$, respectively. Let $b^*_l = |S^*_l|$ and $b^*_r = |S^*_r|$. Since, by hypothesis, $g(v, b)$ is the first value for which Algorithm 1 does not find the maximum gain, the gains of $S^*_l$ in $T(\psi(v))$ and of $S^*_r$ in $T(\delta(v))$ are at most $g(\psi(v), b^*_l)$ and $g(\delta(v), b^*_r)$, respectively. Moreover, the term $w_v \cdot (p(T(r)) - p(T(v)))$ does not depend on the edges selected in the left and right subtrees of $v$. Hence, the gain of the optimal solution $S^*$ satisfies
$$g^*(v, b) \leq g(\psi(v), b^*_l) + g(\delta(v), b^*_r) + w_v \cdot (p(T(r)) - p(T(v))),$$
and, since $b^*_l + b^*_r \leq b$ and $g$ is non-decreasing in the budget, at Line 9 Algorithm 1 must select a value of at least $g^*(v, b)$, which contradicts $g(v, b) < g^*(v, b)$.

For each node $v$, the algorithm computes $B + 1$ values of function $g$. Therefore, the values $g(v, b)$ can be stored, for example, in a matrix of dimension $|V| \times (B + 1)$. For each entry of the matrix, we need to compute the maximum among at most $B + 1$ gains (see Equation (1)), because the number of pairs $(b_l, b_r)$ with $b_l + b_r = b$ is at most $\binom{B + 2 - 1}{B} = B + 1$. Thus, Algorithm 1 takes $O(|V| B^2)$ time. Note that we can assume $B \in O(|V|)$, because the new edges are limited to be of the form leaf-root. Moreover, the space complexity is $O(|V| B)$.

General Trees
In the following, we present an algorithm that requires $O(|V|)$ time and space to transform any rooted tree $T = (V, E)$ into an equivalent binary tree $T' = (V \cup U, E')$ by adding at most $|V| - 3$ dummy nodes, following a tree transformation proposed in [27].
Given a rooted tree $T = (V, E)$, we transform it into a rooted binary tree $T' = (V \cup U, E')$ with weights $w'$ and profits $p'$ by adding a set $U$ of dummy nodes as follows:

1. Let the root $r$ of $T$ be the root of $T'$.

2. For each non-leaf node $v$, let $v_1, v_2, \ldots, v_l$ be the children of $v$: See Figure 3 for an example of the transformation. Note that $p'(R(v, T')) = p(R(v, T))$ for any node $v$ in $T$, because the added dummy nodes have $p'_u = 0$; moreover, dummy nodes do not contribute to the objective function, because their weight is set to zero, i.e., any dummy node $u$ has $w'_u \cdot p'(R(u, T')) = 0$.
For each solution $S$ to MCI in $T'$, let $f'(S) = \sum_{v \in V \cup U} w'_v \, p'(T'(v, S))$. It is easy to see that, by applying Algorithm 1 to $T'$, we obtain an optimal solution with respect to $f'$. Moreover, for each solution $S$ to $T'$, $f'(S) = f(S)$. Note that a solution $S$ for $T'$ that connects leaves of $T'$ to its root is also a feasible solution for $T$, since $T$ and $T'$ have the same root and the same leaves.
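The concrete rule of step 2 comes from [27] and is illustrated in Figure 3. As a hypothetical stand-in, the following Python sketch uses a left-comb binarisation: a node with $l > 2$ children keeps $v_1$ as its left child and hangs the remaining children on a chain of $l - 2$ dummy nodes with zero weight and zero profit. It then checks the properties the argument above relies on: the profit reachable from every original node is unchanged, the leaves are the same, and the objective is preserved. Names and the dummy-node labelling are illustrative.

```python
def binarize(children, w, p):
    """Hypothetical left-comb binarisation: a node with l > 2 children keeps v_1 as left
    child and gets a chain of l - 2 dummy nodes (weight 0, profit 0) for the others."""
    new_children = {}
    w2, p2 = dict(w), dict(p)
    n_dummies = 0
    for v, kids in children.items():
        if len(kids) <= 2:
            new_children[v] = list(kids)
            continue
        parent = v
        for i in range(len(kids) - 2):
            dummy = ("dummy", n_dummies)
            n_dummies += 1
            w2[dummy] = p2[dummy] = 0
            new_children[parent] = [kids[i], dummy]
            parent = dummy
        new_children[parent] = [kids[-2], kids[-1]]
    return new_children, w2, p2


def reach_profit(children, p, v):
    """p(R(v)) in a tree whose edges point towards the leaves."""
    stack, total = [v], 0
    while stack:
        u = stack.pop()
        total += p[u]
        stack.extend(children.get(u, []))
    return total


children = {"r": ["a", "b", "c", "d"], "a": ["e"]}
unit = {v: 1 for v in "rabcde"}
new_children, w2, p2 = binarize(children, unit, unit)
print(max(len(k) for k in new_children.values()))  # 2
print(all(reach_profit(children, unit, v) == reach_profit(new_children, p2, v)
          for v in unit))                          # True
```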

Polynomial-Time Algorithm for DAG with a Single Source or a Single Sink
In this section, we focus on the case of weighted DAGs in which we have a single source node or a single sink node. We first describe our greedy algorithm to approximate MCI on DAGs with a single source. Then, we show how to modify the algorithm for the case of DAGs with a single sink.
In the case of a single source, by Lemma 2, we restrict our choices to the edges that connect sink nodes to the source. Let $S$ denote the set of edges added to $G$. With a slight abuse of notation, we also use $S$ to denote the set of sinks from which the edges in $S$ start. No information is lost in this way since, in a DAG with a single source, at most one edge is added for each sink; in fact, a second edge from the same sink would not bring any increase to the objective function.
The Greedy algorithm for MCI on DAGs with a single source (see Algorithm 2) starts with an empty solution $S = \emptyset$ and, at most $B$ times, adds to $S$ the edge $e'$ that maximises the function $f(S \cup \{e'\})$. The edge $e'$ is chosen from the set $E'$ of edges $(u, s)$, where $u$ is a sink in $V$ not already in $S$ and $s$ is the single source in $V$ (see Lines 2 and 4).
To implement the Greedy algorithm with a single source, some preprocessing is required. First, for each node $v \in V$, we perform a DFS visit on $G$ to compute $R(v)$ and $p(R(v))$. We store $p(R(v, S))$ in a vector $\rho$ of size $|V|$. Every time a new edge is added to the solution $S$, each entry of $\rho$ can be updated in constant time because, for each node $v$, $p(R(v, S))$ is equal either to $p(R(v))$ or to $p(R(s))$, as explained below. To compute the gain of adding the edge $e = (u, s)$, we need to find the set $R^T(u, S)$ of all the nodes that reach $u$ in $G(S)$. $R^T(u, S)$ is computed by performing a DFS visit starting from $u$ on the reverse graph $G^T(S)$ of $G(S)$. Note that the reverse graph $G^T$ of $G$ is computed in a preprocessing phase in $O(|V| + |E|)$ time, and $G^T(S)$ is updated in constant time after every new edge is added to $S$. Finally, to compute $f(S \cup \{e\})$ for $e = (u, s)$, observe that $f(S \cup \{e\}) = f(S) + \sum_{z \in R^T(u, S)} w_z \left( p(R(s)) - p(R(z, S)) \right)$. After selecting the edge $e' = (u, s)$ that maximises $f(S \cup \{e'\})$, we update $S$ and set $p(R(z, S)) = p(R(s))$ in $\rho$ for each node $z \in R^T(u, S)$, because $z$ reaches $s$ by traversing the edge $e' = (u, s)$ and inherits the reachability of $s$.
The Greedy algorithm with a single source requires $O(B|V||E|)$ time. Namely, for each edge $e = (u, s) \in E'$, computing $f(S \cup \{e\})$ requires $O(|E|)$ time to compute $R^T(u, S)$ on $G^T(S)$ and $O(|V|)$ time to compute $\sum_{z \in R^T(u, S)} w_z (p(R(s)) - p(R(z, S)))$. Since there are $O(|V|)$ sinks, the computation of the maximum at Line 4 costs $O(|V||E|)$ time. Once $e'$ is selected, $O(|V|)$ time is spent to update $\rho$. Since at most $B$ edges are added, the Greedy algorithm requires $O(B|V||E|)$ time.
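A compact Python sketch of the single-source Greedy algorithm follows, assuming the DAG is given as an edge list with exactly one source; Algorithm 2 itself is not reproduced here. The sketch keeps the vector $\rho$ of values $p(R(v, S))$ and the reverse graph $G^T(S)$, computes $R^T(u, S)$ with one DFS per candidate sink, and applies the gain formula above. Function names are hypothetical, and ties are broken in favour of the first sink in the node order.

```python
from collections import defaultdict


def dfs(adj, start):
    """Nodes reachable from `start` in the graph given by the adjacency lists `adj`."""
    seen, stack = {start}, [start]
    while stack:
        for x in adj[stack.pop()]:
            if x not in seen:
                seen.add(x)
                stack.append(x)
    return seen


def greedy_single_source(nodes, edges, w, p, B):
    """Greedy for MCI on a DAG with a single source s (candidate edges: (sink, s))."""
    succ, rev = defaultdict(list), defaultdict(list)
    for a, b in edges:
        succ[a].append(b)
        rev[b].append(a)
    (s,) = [v for v in nodes if not rev[v]]  # exactly one source is assumed
    sinks = [v for v in nodes if not succ[v]]
    rho = {v: sum(p[u] for u in dfs(succ, v)) for v in nodes}  # preprocessing: p(R(v))
    p_s = rho[s]                                                # p(R(s)) = p(V)
    f = sum(w[v] * rho[v] for v in nodes)
    chosen = []
    for _ in range(B):
        best = None
        for u in sinks:
            if u in chosen:
                continue
            RT = dfs(rev, u)  # R^T(u, S): nodes that reach u in G(S)
            gain = sum(w[z] * (p_s - rho[z]) for z in RT)
            if best is None or gain > best[0]:
                best = (gain, u, RT)
        if best is None:  # every sink is already used
            break
        gain, u, RT = best
        chosen.append(u)
        f += gain
        for z in RT:      # z now reaches s and inherits p(R(s))
            rho[z] = p_s
        rev[s].append(u)  # edge (u, s) of G(S) is the edge (s, u) of G^T(S)
    return chosen, f


nodes = ["s", "a", "b", "c", "d"]
edges = [("s", "a"), ("s", "b"), ("a", "c"), ("b", "d")]
unit = {v: 1 for v in nodes}
print(greedy_single_source(nodes, edges, unit, unit, 1))  # (['c'], 18)
print(greedy_single_source(nodes, edges, unit, unit, 2))  # (['c', 'd'], 25)
```

Starting from $f = 11$, linking sink $c$ gives $s$, $a$ and $c$ full reachability (gain 7); linking $d$ afterwards does the same for $b$ and $d$, reaching $5 \cdot 5 = 25$.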
In the following, we describe the algorithm for DAGs with a single sink. The differences with respect to the previous case mainly concern the preprocessing phase. The greedy algorithm remains the same, except that, by Lemma 2, we replace the set $E'$ on Line 2 with the set of edges $(d, v)$, where $v$ is a source in $V$ and $d$ is the only sink in $V$.
It is worth noting that, in this case, differently from before, $R^T(d, S)$ is always equal to $V$ by definition, since all the nodes reach the sink; on the other hand, $p(R(v, S))$ no longer takes only the values $p(V)$ or $p(R(v))$. Hence, no DFS visit of $G^T$ is required to compute $f(S \cup \{(d, v)\})$; instead, the reachability set of every node depends on the set of added edges and has to be recomputed each time by performing a DFS visit on the augmented graph. We have $f(S \cup \{(d, v)\}) = f(S) + \sum_{z \in V} w_z \cdot p(R(v, S) \setminus R(z, S))$. Since computing $f(S \cup \{(d, v)\})$ requires $|V|$ DFS visits, the overall cost of the Greedy algorithm with a single sink increases by a factor of $|V|$ with respect to the case of DAGs with a single source, becoming $O(B|V|^2|E|)$. Namely, for each edge $e = (d, v) \in E'$, computing $f(S \cup \{e\})$ requires $O(|V||E|)$ time for the $|V|$ DFS visits in $G(S)$, one for each node in $V$, and the sum $\sum_{z \in V} w_z \cdot p(R(v, S) \setminus R(z, S))$ is computed within the same bound. Since there are $O(|V|)$ sources, the computation of the maximum at Line 4 costs $O(|V|^2|E|)$ time. Then, since at most $B$ edges are added, the Greedy algorithm requires $O(B|V|^2|E|)$ time.
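For the single-sink variant, the gain of an edge $(d, v)$ is $\sum_{z \in V} w_z \, p(R(v, S) \setminus R(z, S))$, since every node reaches $d$. The Python sketch below works under the same assumptions as the single-source sketch (edge-list input, hypothetical names, here exactly one sink) and recomputes all reachability sets by DFS. As a small design choice, it computes the sets $R(z, S)$ once per round rather than once per candidate, because they do not depend on the candidate edge; this does not change the selected edges.

```python
from collections import defaultdict


def dfs(adj, start):
    """Nodes reachable from `start` in the graph given by the adjacency lists `adj`."""
    seen, stack = {start}, [start]
    while stack:
        for x in adj[stack.pop()]:
            if x not in seen:
                seen.add(x)
                stack.append(x)
    return seen


def greedy_single_sink(nodes, edges, w, p, B):
    """Greedy for MCI on a DAG with a single sink d (candidate edges: (d, source))."""
    succ, pred = defaultdict(list), defaultdict(list)
    for a, b in edges:
        succ[a].append(b)
        pred[b].append(a)
    (d,) = [v for v in nodes if not succ[v]]  # exactly one sink is assumed
    sources = [v for v in nodes if not pred[v]]
    chosen = []
    for _ in range(B):
        R = {z: dfs(succ, z) for z in nodes}  # R(z, S) in the current G(S)
        best = None
        for v in sources:
            if v in chosen:
                continue
            # Every z reaches d, so adding (d, v) lets z gain the nodes of R(v, S) - R(z, S).
            gain = sum(w[z] * sum(p[u] for u in R[v] - R[z]) for z in nodes)
            if best is None or gain > best[0]:
                best = (gain, v)
        if best is None:
            break
        chosen.append(best[1])
        succ[d].append(best[1])  # add the edge (d, v) to G(S)
    R = {z: dfs(succ, z) for z in nodes}
    return chosen, sum(w[z] * sum(p[u] for u in R[z]) for z in nodes)


nodes = ["a", "b", "c", "d"]
edges = [("a", "c"), ("b", "c"), ("c", "d")]
unit = {v: 1 for v in nodes}
print(greedy_single_sink(nodes, edges, unit, unit, 1))  # (['a'], 13)
print(greedy_single_sink(nodes, edges, unit, unit, 2))  # (['a', 'b'], 16)
```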
Observe that Algorithm 2, as described for the case of a single source, can also be used on trees in place of Algorithm 1. However, on trees, where $|E| = |V| - 1$, its complexity is $O(|V|^2 B)$, which is higher than the $O(|V| B^2)$ complexity of Algorithm 1 since $B \in O(|V|)$, and it guarantees only an approximate solution rather than an optimal one.
To give a lower bound on the approximation ratio of Algorithm 2, we show that the objective function $f(S)$ is monotone and submodular. Recall that, for a ground set $N$, a function $z : 2^N \to \mathbb{R}$ is submodular if it satisfies the following property of diminishing marginal returns: for any pair of sets $S \subseteq T \subseteq N$ and for any element $e \in N \setminus T$, $z(S \cup \{e\}) - z(S) \geq z(T \cup \{e\}) - z(T)$. This allows us to apply the result by Nemhauser et al. [28]: given a finite set $N$, an integer $k$, and a real-valued function $z$ defined on the subsets of $N$, if $z$ is monotone and submodular, then the problem of finding a set $S \subseteq N$ such that $|S| \leq k$ and $z(S)$ is maximum can be approximated within a factor $1 - 1/e$ by starting with the empty set and repeatedly adding the element that gives the maximal marginal gain.
Recall that $f(S) = \sum_{v \in V} w_v \, p(R(v, S))$ with $w_v, p_v \in \mathbb{N}_{\geq 0}$. To prove that $f(S)$ is monotone increasing and submodular, it suffices to show that $p(R(v, S))$ is monotone increasing and submodular for each node $v \in V$, because a non-negative linear combination of monotone submodular functions is also monotone and submodular.

Theorem 4. Function $f(S)$ is monotone and submodular with respect to any feasible solution for MCI on DAGs with a single source.

Proof.
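To prove that $f(S)$ is monotone, we prove that, for each $v \in V$, $S \subseteq E'$, and $e = (t', s) \in E' \setminus S$, we have $p(R(v, S \cup \{e\})) \geq p(R(v, S))$. We first notice that, for each node $v \in V$ and solution $S$, if there exists an edge $(t, s) \in S$ such that $t \in R(v)$, then $p(R(v, S)) = p(R(s))$; otherwise, $p(R(v, S)) = p(R(v))$. The same holds for $p(R(v, S \cup \{e\}))$. Since $s$ is the only source, $R(v) \subseteq R(s) = V$, hence $p(R(v)) \leq p(R(s))$, and monotonicity follows.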
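To prove that $f(S)$ is submodular, we prove that, for any node $v \in V$, any two solutions $S, T$ of MCI such that $S \subseteq T$, and any edge $e = (t', s) \in E' \setminus T$, where $t'$ is a sink node, it holds:
$$p(R(v, S \cup \{e\})) - p(R(v, S)) \geq p(R(v, T \cup \{e\})) - p(R(v, T)). \qquad (2)$$
We analyse the following cases, where reachability refers to the original graph $G$. If $v$ does not reach $t'$, both sides of Equation (2) are zero. If $v$ reaches $t'$ and the tail of some edge in $S$, then, since $S \subseteq T$, both sides are zero. If $v$ reaches $t'$ and the tail of some edge in $T$ but of no edge in $S$, the left-hand side is $p(R(s)) - p(R(v)) \geq 0$ and the right-hand side is zero. Finally, if $v$ reaches $t'$ but no tail of an edge in $T$, both sides are equal to $p(R(s)) - p(R(v))$. In all the cases, the inequality in Equation (2) holds.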
Theorem 5. Function f (S) is monotone and submodular with respect to any feasible solution for MCI on DAGs with a single sink.

Proof.
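To prove that $f(S)$ is monotone, we show that, for each $v \in V$, $S \subseteq E'$, and $e = (d, s') \in E' \setminus S$, we have $p(R(v, S \cup \{e\})) \geq p(R(v, S))$. We observe that, since every node reaches the sink $d$,
$$R(v, S \cup \{(d, s')\}) = R(v, S) \cup R(s'). \qquad (3)$$
Thus, $p(R(v, S)) \leq \sum_{u \in R(v, S) \cup R(s')} p_u = p(R(v, S \cup \{e\}))$.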
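To prove that $f(S)$ is submodular, we prove that, for any node $v \in V$, any two solutions $S, T$ of MCI such that $S \subseteq T$, and any edge $e = (d, s') \in E' \setminus T$, where $s'$ is a source node, it holds:
$$p(R(v, S \cup \{e\})) - p(R(v, S)) \geq p(R(v, T \cup \{e\})) - p(R(v, T)). \qquad (4)$$
By Equation (3), the left-hand side is equal to $p(R(s') \setminus R(v, S))$ and the right-hand side is equal to $p(R(s') \setminus R(v, T))$. The inequality in Equation (4) follows by observing that $R(v, S) \subseteq R(v, T)$.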
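The monotonicity and submodularity results for the single-source and single-sink cases can be sanity-checked by brute force on small instances: enumerate every subset of candidate edges, then test $z(S \cup \{e\}) - z(S) \geq 0$ and the diminishing-returns inequality for all $S \subseteq T$ and $e \notin T$. The following Python sketch does this for one single-source and one single-sink DAG with arbitrary non-negative weights and profits; it only illustrates the statements on these examples and is not a substitute for the proofs. All names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def f(nodes, edges, w, p, added):
    """f(S) = sum over v of w[v] * p(R(v, S)), one DFS per node of G(S)."""
    adj = defaultdict(list)
    for a, b in list(edges) + list(added):
        adj[a].append(b)
    total = 0
    for v in nodes:
        seen, stack = {v}, [v]
        while stack:
            for x in adj[stack.pop()]:
                if x not in seen:
                    seen.add(x)
                    stack.append(x)
        total += w[v] * sum(p[u] for u in seen)
    return total


def is_monotone_submodular(nodes, edges, w, p, candidates):
    """Brute force over all S subset of T subset of candidates and all e not in T."""
    subsets = [frozenset(c) for r in range(len(candidates) + 1)
               for c in combinations(candidates, r)]
    value = {S: f(nodes, edges, w, p, S) for S in subsets}
    for S in subsets:
        for T in subsets:
            if not S <= T:
                continue
            for e in candidates:
                if e in T:
                    continue
                gain_S = value[S | {e}] - value[S]
                gain_T = value[T | {e}] - value[T]
                if gain_S < 0 or gain_S < gain_T:  # monotonicity / diminishing returns
                    return False
    return True


# Single source s; candidate edges go from each sink to s.
ss_nodes = ["s", "a", "b", "c", "d"]
ss_edges = [("s", "a"), ("s", "b"), ("a", "c"), ("b", "d")]
ss_w = {"s": 1, "a": 2, "b": 3, "c": 1, "d": 2}
ss_p = {"s": 2, "a": 1, "b": 1, "c": 3, "d": 1}
print(is_monotone_submodular(ss_nodes, ss_edges, ss_w, ss_p, [("c", "s"), ("d", "s")]))

# Single sink d; candidate edges go from d to each source.
sk_nodes = ["a", "b", "e", "c", "d"]
sk_edges = [("a", "c"), ("b", "c"), ("c", "d"), ("e", "d")]
sk_w = {"a": 1, "b": 2, "e": 1, "c": 3, "d": 1}
sk_p = {"a": 2, "b": 1, "e": 4, "c": 1, "d": 1}
print(is_monotone_submodular(sk_nodes, sk_edges, sk_w, sk_p,
                             [("d", "a"), ("d", "b"), ("d", "e")]))
```

Both checks print `True`, consistent with the two results; the checker itself is generic and can be pointed at other small graphs.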

Corollary 1.
Algorithm 2 provides a $(1 - 1/e)$-approximation for the MCI problem on DAGs with either a single source or a single sink.

Polynomial-Time Algorithm for DAG with Two Sources
In this section, we take a first step in the direction of studying the MCI problem in general DAGs, i.e., multiple sources and multiple sinks, by tackling the special case of weighted DAGs with two source nodes. In the following, we describe an approach that can be easily extended to the case of DAGs with a constant number of sources. However, this approach can be computationally heavy as the number of such sources increases.
We first prove an important property that we exploit to obtain a polynomial-time algorithm for this case. The idea is that, once an initial edge has been added to the solution, we can assume that every other added edge is directed toward the opposite source node. Using this property, we reduce the number of candidate solutions that we have to consider, and we can reuse the greedy algorithm proposed in the previous section to solve the problem.
In this section, we extend the notation by denoting by $s_1, s_2$ the two sources and by $L_1, L_2$ the sets of sink nodes reachable from source $s_1$ and $s_2$, respectively. Note that we allow $L_1 \cap L_2 \neq \emptyset$, i.e., the two sources may share sink nodes.
Lemma 3. Let $S$ be a solution to the MCI problem in a graph $G = (V, E)$ with two sources $s_1, s_2$; then, there exists a solution $S'$ such that $|S'| = |S|$, $f(S') \geq f(S)$, and $S'$ contains at most one edge directed towards source $s_1$ ($s_2$, respectively), while all the other edges are directed towards $s_2$ ($s_1$, respectively).
Proof. Let us first consider the following case: $S$ contains at least one edge connecting a sink in $L_2$ with the source $s_1$, i.e., an edge $(v_i, s_1) \in S$ with $v_i \in L_2$. Note that this case is symmetric to the one in which a sink in $L_1$ is connected to $s_2$. Now, for any other edge $e_1 = (v_j, s_1) \in S$, we prove that we can redirect it to source $s_2$, i.e., replace it with $e_2 = (v_j, s_2)$, without decreasing the objective function. Indeed, for any node $z \in R^T(v_j, S)$, i.e., any node that can reach $v_j$, $p(R(z, S))$ is either equal to $p(R(V))$, if there exists a path that connects $z$ to source $s_2$, or equal to $p(R(s_1))$ by using edge $e_1$. Instead, by using edge $e_2$, any node $z \in R^T(v_j, S)$ reaches $s_2$, then $v_i \in L_2$ and, through edge $(v_i, s_1)$, also $s_1$; hence it reaches every node in the graph, thus having profit $p(R(V))$. Therefore, $f(S) \leq f((S \cup \{e_2\}) \setminus \{e_1\})$, since any node that is able to reach sink $v_j$ in solution $(S \cup \{e_2\}) \setminus \{e_1\}$ actually reaches all the nodes in the graph.
Otherwise, no edge of $S$ connects a sink in $L_2$ to $s_1$ or a sink in $L_1$ to $s_2$, so all edges in $S$ are of the kind $(v_i, s_1)$ with $v_i \in L_1$ or $(v_i, s_2)$ with $v_i \in L_2$, i.e., all the edges from $L_1$ are directed toward source $s_1$ and all the edges from $L_2$ toward source $s_2$. Let us divide solution $S$ into two subsets: $S_1$, containing all the edges of kind $(v_i, s_1)$ with $v_i \in L_1$, and $S_2$, containing the edges of kind $(v_i, s_2)$ with $v_i \in L_2$. If $S_1$ or $S_2$ is empty, $S$ already has the required form. Otherwise, by changing one edge $(v_j, s_1) \in S_1$ into $(v_j, s_2)$ and redirecting all the edges in $S_2$ toward $s_1$, we do not decrease the objective function, since now each source reaches the other one, and every node that reaches a source through an added edge is able to reach every node in the graph.
Lemma 3 shows that we can build a solution for the MCI problem by selecting one initial edge and then using the greedy algorithm to choose the remaining edges. In fact, after the initial edge has been added, we can exploit the algorithm given in Section 5, since the graph now has only one source node and thus submodularity holds. The underlying idea of the algorithm is to create $2 \cdot |L_1 \cup L_2|$ different instances by enumerating all possible choices of the initial edge; we then run the greedy algorithm on each of these instances and choose the best solution among them. Note that this set of instances is composed of four different kinds of edges: the edges from $L_1$ to $s_2$ (and from $L_2$ to $s_1$, respectively) and the edges from $L_1 \setminus L_2$ to $s_1$ (and from $L_2 \setminus L_1$ to $s_2$, respectively). In both cases, the algorithm considers $s_1$ ($s_2$, respectively) as the only source node. Note that, in both cases, we are optimising an MCI instance with only one source node, and thus the objective function is monotone and submodular, as proved in Section 5.
Moreover, since our algorithm tries all possible initial edges and then picks the best solution, one of the instances it considers uses the same initial edge as an optimal solution having the form given by Lemma 3, and the returned solution is at least as good as the greedy solution on that instance. Thus, the algorithm gives us the same approximation ratio as before, i.e., $1 - 1/e$, and requires $O(B|V|^2|E|)$ time; in fact, the number of instances that the algorithm tries is of the order of the number of nodes, since $|L_1 \cup L_2| \leq |V|$.
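The enumeration-plus-greedy scheme for two sources can be sketched as follows. This Python sketch is a hypothetical illustration, not the paper's implementation: the helper names are ours, $f$ is evaluated naively by breadth-first search (assuming $v \in R(v, S)$ and $p(X) = \sum_{u \in X} p_u$), and, following Lemma 3, after fixing an initial edge $(v, s_i)$ the greedy step only considers edges from sinks toward the other source.

```python
from collections import deque

def reach(adj, v):
    """Nodes reachable from v (v included), by breadth-first search."""
    seen, queue = {v}, deque([v])
    while queue:
        u = queue.popleft()
        for x in adj[u]:
            if x not in seen:
                seen.add(x)
                queue.append(x)
    return seen

def f(nodes, edges, S, w, p):
    """f(S) = sum_v w_v * p(R(v, S)) in the graph augmented with S."""
    adj = {u: [] for u in nodes}
    for a, b in list(edges) + list(S):
        adj[a].append(b)
    return sum(w[v] * sum(p[u] for u in reach(adj, v)) for v in nodes)

def greedy_extend(nodes, edges, w, p, S, cand, budget):
    """Greedily add edges from cand to S until |S| = budget."""
    S = set(S)
    while len(S) < budget:
        options = [e for e in cand if e not in S]
        if not options:
            break
        base = f(nodes, edges, S, w, p)
        S.add(max(options, key=lambda e: f(nodes, edges, S | {e}, w, p) - base))
    return S

def two_source_mci(nodes, edges, w, p, s1, s2, sinks, budget):
    """Try all 2 * |sinks| initial edges, complete each greedily, keep the best."""
    best_S, best_val = set(), f(nodes, edges, set(), w, p)
    if budget < 1:
        return best_S, best_val
    for v in sinks:
        for src, other in ((s1, s2), (s2, s1)):
            cand = [(u, other) for u in sinks]  # remaining edges go to the other source
            S = greedy_extend(nodes, edges, w, p, {(v, src)}, cand, budget)
            val = f(nodes, edges, S, w, p)
            if val > best_val:
                best_S, best_val = S, val
    return best_S, best_val

# Hypothetical two-source DAG with unit weights and profits.
nodes = ["s1", "s2", "a", "b", "t1", "t2", "t3"]
edges = [("s1", "a"), ("s2", "a"), ("s2", "b"), ("a", "t1"), ("a", "t2"), ("b", "t3")]
unit = {v: 1 for v in nodes}
sinks = ["t1", "t2", "t3"]
for B in (1, 2):
    S, val = two_source_mci(nodes, edges, unit, unit, "s1", "s2", sinks, B)
    print(B, sorted(S), val)
# prints:
# 1 [('t1', 's2')] 29
# 2 [('t1', 's2'), ('t3', 's1')] 43
```

With $B = 2$ the sketch returns $(t_1, s_2)$ and $(t_3, s_1)$, which close a cycle through both sources: every node except $t_2$ then reaches all seven nodes, for a value of $6 \cdot 7 + 1 = 43$, the largest possible here since one sink is left without an outgoing edge.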

Conclusions and Future Works
In this work, we study the problem of improving the reachability of a graph. In particular, we introduce a new problem called the Maximum Connectivity Improvement (MCI) problem: given a directed graph and an integer B, the problem asks to add a set of edges of size at most B in order to maximise the overall weighted number of reachable nodes.
We first provide a reduction from the well-known Set Cover problem to prove that the MCI problem is NP-complete. This result holds even if weights and profits are unitary and the graph contains a single sink node or a single source node. Moreover, via a reduction from the Maximum Coverage problem, we prove that the MCI problem is NP-hard to approximate to within a factor greater than $1 - 1/e$, even in DAGs with a single source or a single sink.
We then propose a dynamic programming algorithm for the MCI problem on trees with a single source that produces optimal solutions in polynomial time. Then, we study the case of DAGs with a single source (or a single sink); in this case, we propose two polynomial-time $(1 - 1/e)$-approximation algorithms, by showing that the objective function is monotone and submodular. Finally, we extend the latter algorithm for DAGs with one source to DAGs with two sources while keeping the same approximation guarantee.
As future work, we plan to extend our approach to general DAGs, i.e., with multiple sources and multiple sinks. Another possible extension is the budgeted version of the MCI problem, in which each edge has its own cost. The goal is then to find a minimum-cost set of (sink, source) pairs to connect with new edges.
Author Contributions: All authors have equally contributed to this work. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.