Robustness of Network Controllability with Respect to Node Removals Based on In-Degree and Out-Degree

Network controllability and its robustness have been widely studied. However, analytical methods to calculate network controllability with respect to node in- and out-degree targeted removals are currently lacking. This paper develops methods, based on generating functions for the in- and out-degree distributions, to approximate the minimum number of driver nodes needed to control directed networks, during node in- and out-degree targeted removals. By validating the proposed methods on synthetic and real-world networks, we show that our methods work reasonably well. Moreover, when the fraction of the removed nodes is below 10% the analytical results of random removals can also be used to predict the results of targeted node removals.


Introduction
Network controllability is a crucial area of research that has been explored in various types of networks, including biological networks [1], transportation networks [2], and corruption networks [3]. The controllability of a network refers to the ability to steer the states of its nodes to any desired state in finite time by manipulating the inputs to a subset of its nodes. Nodes on which inputs are imposed are called driver nodes. In linear time-invariant systems, Kalman's controllability rank condition [4] is the classic method of assessing controllability. However, the method has limitations: it is computationally expensive, and it requires information about the system's interaction matrix and input matrix that is often lacking. To overcome these limitations, the concept of structural controllability was proposed [5]. A structured linear time-invariant system, whose interaction and input matrices consist of independent free parameters and fixed zeros, is structurally controllable if the free parameters can be chosen such that the controllability rank condition is satisfied. Directed networks are structural systems. Liu et al. [6] developed an algorithm and analytical methods to obtain the minimum number of driver nodes in directed networks, under the assumption that the directed network has no self-links and that a node's internal state can only be modified through interaction with neighboring nodes [7]. Throughout this paper, we adhere to this assumption. Besides structural controllability, Yuan et al. introduced an exact controllability paradigm that determines the minimum number of driver nodes for undirected networks with arbitrary weights by using the maximum multiplicity [8].
In recent years, network structural controllability has gained increasing attention as a tool to measure and enhance network robustness. Robustness is commonly assessed by measuring network performance under various perturbations [9]. One approach is to randomly remove nodes or links and observe the resulting changes in network performance, while another approach involves targeted attack strategies exploiting specific

Network Data
To validate the theoretical results presented in the following sections, we will utilize three categories of synthetic networks as well as several real-world networks. In this section, we provide specific information regarding the utilized networks.

Directed Synthetic Networks
We choose three types of synthetic networks: Erdős-Rényi (ER) networks, Swarm Signaling Networks (SSNs), and scale-free networks (SFs).
We generate a directed ER network with $N$ nodes by placing a directed link from each node to every other node independently with probability $p_{ER}$. The expected number of links is $L = N(N-1)p_{ER}$. This study employs two ER networks: one with $N = 50$ and $p_{ER} = 0.07$, and one with $N = 100$ and $p_{ER} = 0.04$.
The topology of Swarm Signaling Networks (SSNs), proposed in [24], is characterized by a regular out-degree and an in-degree distribution that follows a Poisson distribution. Two parameters must be specified to generate SSNs: the number of nodes, $N$, and the out-degree value, $k$. Each node in the network randomly creates $k$ outgoing links to other nodes. Two SSNs are chosen, both with $N = 10^4$, with out-degree values of $k = 2$ and $k = 5$, respectively.
Scale-free networks (SFs) are a class of complex networks whose in-degree and out-degree distributions both follow a power law. In this paper, we generate two SFs using the Barabási-Albert model, a preferential attachment mechanism that generates networks with a power-law degree distribution with an exponent $\gamma = 3$ [25]. Specifically, we generate SFs in two stages. In the first stage, we generate a Barabási-Albert graph with $N$ nodes, where the initial state is a star with $m + 1$ nodes. At each step, a new node with $m$ edges is attached preferentially to existing nodes with high degrees, until the total number of nodes reaches $N$. In the second stage, we randomly assign a direction to each link in the generated graph. The resulting SFs have in-degree and out-degree distributions that follow a power-law distribution with an exponent $\gamma = 3$. We set $m = 5$ and $m = 10$ for the two SFs, each with $N = 10^5$ nodes, resulting in minimum in-degree and out-degree values $a$ of 5 and 10, respectively, which are approximated by the integers that make the ceiling of the average value of the power-law distribution equal to $m$.
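The three generators described above can be sketched with the Python standard library only. This is a minimal illustration, not the code used for the experiments: the function names, the seed, and the reduced network sizes are assumptions, and so is the choice that the $k$ targets of an SSN node are distinct.

```python
import random

def directed_er(n, p, rng):
    """Directed ER graph: each ordered pair (u, v) with u != v is linked with probability p."""
    return [(u, v) for u in range(n) for v in range(n)
            if u != v and rng.random() < p]

def swarm_signaling(n, k, rng):
    """SSN: every node creates k outgoing links to randomly chosen other nodes.
    Assumption of this sketch: the k targets of a node are distinct."""
    edges = []
    for u in range(n):
        targets = set()
        while len(targets) < k:
            v = rng.randrange(n)
            if v != u:
                targets.add(v)
        edges.extend((u, v) for v in sorted(targets))
    return edges

def directed_ba(n, m, rng):
    """Barabasi-Albert growth from a star on m + 1 nodes, then a random direction per link."""
    undirected = [(0, v) for v in range(1, m + 1)]     # initial star with centre 0
    ends = [x for link in undirected for x in link]    # each node listed once per link end
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:                        # degree-proportional choice
            targets.add(rng.choice(ends))
        for t in sorted(targets):
            undirected.append((new, t))
            ends.extend((new, t))
    return [(u, v) if rng.random() < 0.5 else (v, u) for u, v in undirected]

rng = random.Random(42)                                # arbitrary seed, for reproducibility
er = directed_er(50, 0.07, rng)
ssn = swarm_signaling(1000, 2, rng)                    # smaller than N = 10^4, to run quickly
sf = directed_ba(1000, 5, rng)
print(len(er), len(ssn), len(sf))
```

Each network is returned as a list of directed links `(source, end)`, which keeps the sketch independent of any graph library.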

Real-World Networks
In this study, we employed real-world communication networks obtained from the Topology Zoo dataset [26]. To convert these networks from undirected to directed, we utilized the source and target node attributes [13]. Table 1 lists the basic properties of the networks used in this study, including the number of nodes $N$, the number of links $L$, and the average total degree $\langle k \rangle$. The total degree of a node is the sum of its in-degree and out-degree. Since the average in-degree equals the average out-degree, the average total degree is twice the average in-degree (and out-degree).

Network Controllability
Consider a linear, time-invariant networked system of $N$ nodes whose state evolves according to $\dot{x}(t) = Ax(t) + Bu(t)$, with $x(t) = (x_1(t), x_2(t), \ldots, x_N(t))^T$ being the $N \times 1$ state vector. The $N \times N$ matrix $A$ represents the interactions among the network components, and the $N \times M$ matrix $B$ specifies which nodes are under the direct control of the $M \times 1$ control input vector $u(t) = (u_1(t), u_2(t), \ldots, u_M(t))^T$.
A linear, time-invariant networked system is controllable if it can reach any desired state within a finite time by applying external inputs. The Kalman rank criterion requires that the rank of the controllability matrix $[B, AB, A^2B, \ldots, A^{N-1}B]$ equals $N$ for the system to be fully controllable. Liu et al. introduced the maximum matching method and the minimum inputs theorem to determine the minimum number of driver nodes required to ensure network structural controllability [6]. The number of driver nodes, $N_D$, can be obtained by mapping a directed network into a bipartite network [13], obtaining a maximum matching edge set using the maximum matching algorithm [27], and then calculating $N_D = \max\{1, N - N_m\}$, where $N_m$ is the number of directed edges in the maximum matching set, in which no two edges share the same source node or the same end node.
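The computation of $N_D$ from a maximum matching can be sketched as follows. For brevity the sketch uses a simple augmenting-path search (Kuhn's algorithm), not the algorithm cited in [27]; both return a maximum matching, but the recursive version here is only suitable for small networks. The function name and the edge-list representation are assumptions of this example.

```python
def min_driver_nodes(n, edges):
    """Return N_D = max(1, n - N_m) for a directed graph on nodes 0..n-1,
    where N_m is the size of a maximum matching (no shared source or end node)."""
    successors = [[] for _ in range(n)]
    for u, v in edges:
        successors[u].append(v)
    matched_source = [-1] * n        # matched_source[v]: source matched to end node v

    def augment(u, seen):
        """Try to match source u, re-routing earlier matches if necessary."""
        for v in successors[u]:
            if v in seen:
                continue
            seen.add(v)
            if matched_source[v] == -1 or augment(matched_source[v], seen):
                matched_source[v] = u
                return True
        return False

    n_m = sum(augment(u, set()) for u in range(n))
    return max(1, n - n_m)

print(min_driver_nodes(4, [(0, 1), (1, 2), (2, 3)]))   # directed path: prints 1
print(min_driver_nodes(4, [(0, 1), (0, 2), (0, 3)]))   # directed star: prints 3
```

In the path every link can be matched, so a single driver node suffices; in the star all links share the source node 0, so only one link can be matched and three driver nodes are needed.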

In-Degree and Out-Degree Node Attacks
Centrality analysis is an essential research area in studying network robustness [28]. Nodes with a high degree are known to have a substantial impact on network functioning and are more susceptible to targeted attacks. In this study, our objective is to investigate an analytical approximation of network controllability during targeted node removal based on two types of degrees: in-degree and out-degree.
Assuming that the probability of a node being attacked is proportional to some power $\alpha$ of its in-degree or out-degree, we denote the probability of removing node $i$ based on its in-degree $k_{in_i}$ by $p_{in_i}$ and based on its out-degree $k_{out_i}$ by $p_{out_i}$. These probabilities are given by

$$p_{in_i} = \frac{k_{in_i}^{\alpha}}{\sum_{j} k_{in_j}^{\alpha}}, \qquad p_{out_i} = \frac{k_{out_i}^{\alpha}}{\sum_{j} k_{out_j}^{\alpha}}. \tag{1}$$

In the node removal process, after some nodes are removed, we recalculate the removal probabilities for the remaining nodes using Equation (1), with the sums running over the remaining nodes. We then select nodes to remove based on the recalculated probabilities until all nodes are removed. When $\alpha = 0$, Equation (1) becomes

$$p_{in_i} = p_{out_i} = \frac{1}{N}, \tag{2}$$

which indicates that each node has an equal probability of being removed, resulting in a random removal strategy. On the other hand, for $\alpha > 0$, nodes with higher degrees have a greater likelihood of being removed, while for $\alpha < 0$, nodes with lower degrees are more likely to be removed.

In this study, we investigate the impact of degree-based node removal strategies on network robustness. To this end, we focus on $\alpha > 0$, as higher-degree nodes are commonly targeted for attack in real-world scenarios. Specifically, we consider two values of $\alpha$, namely $\alpha = 1$ and $\alpha = 10$, to evaluate the impact of removing nodes proportionally to their degree and of removing high-degree nodes more aggressively, respectively. By using Equation (1), we obtain the probabilities of a node being removed based on in-degree and out-degree when $\alpha = 1$ as follows:

$$p_{in_i} = \frac{k_{in_i}}{\sum_{j} k_{in_j}}, \qquad p_{out_i} = \frac{k_{out_i}}{\sum_{j} k_{out_j}}. \tag{3}$$

Analogously, the node removal probabilities based on in-degree or out-degree with $\alpha = 10$ can be calculated by

$$p_{in_i} = \frac{k_{in_i}^{10}}{\sum_{j} k_{in_j}^{10}}, \qquad p_{out_i} = \frac{k_{out_i}^{10}}{\sum_{j} k_{out_j}^{10}}. \tag{4}$$

Our results show that, for $\alpha = 10$, the removal of high-degree nodes does not lead to a significant reduction in network robustness in the beginning stage. For several networks, there are no significant differences between the results with $\alpha = 1$ and $\alpha = 100$.
Interestingly, we observe that increasing the value of $\alpha$ to 100 does not result in further performance gains, as the performance of attacks with $\alpha = 100$ is similar to that of attacks with $\alpha = 10$. Additional details on these findings can be found in Appendix A. Furthermore, we find that when $\alpha = 1$, the removal strategies based on in-degree or out-degree can be more detrimental to certain networks than node removal based on the total degree. However, for some other networks, the harmful effects of these strategies are comparable. The results are presented in Appendix B.
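The removal process with recalculated probabilities can be sketched as below. Two choices are assumptions of this example and are not stated in the text: degrees are recounted in the remaining network before every draw, and once every remaining node has degree zero (so that all weights vanish for $\alpha > 0$) the next node is drawn uniformly. The function name and the `by` argument are illustrative, and negative $\alpha$ is not handled.

```python
import random

def removal_order(n, edges, alpha, by="in", rng=None):
    """Draw a full removal order; each draw uses weights k_i**alpha on the remaining nodes."""
    rng = rng or random.Random()
    alive = set(range(n))
    live_edges = list(edges)
    order = []
    while alive:
        degree = dict.fromkeys(alive, 0)
        for u, v in live_edges:
            degree[v if by == "in" else u] += 1        # in-degree counts end nodes
        nodes = sorted(alive)
        weights = [float(degree[i]) ** alpha for i in nodes]   # unnormalised probabilities
        if sum(weights) == 0:                          # assumption: fall back to uniform
            weights = [1.0] * len(nodes)
        victim = rng.choices(nodes, weights=weights)[0]
        order.append(victim)
        alive.remove(victim)
        live_edges = [(u, v) for u, v in live_edges if victim not in (u, v)]
    return order

# Five leaves pointing at hub 0: the hub is the only node with positive in-degree.
star_in = [(i, 0) for i in range(1, 6)]
print(removal_order(6, star_in, alpha=1, by="in", rng=random.Random(0))[0])   # prints 0
```

With `alpha=0` every weight equals 1, which reproduces the random removal strategy.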

Analytical Approximation
The analytical approximation for targeted node removal based on in- and out-degrees with different $\alpha$ is derived from the analytical approximation of random node removal. As such, we begin by introducing the methodology for approximating the minimum fraction of driver nodes under random removal, and then introduce the analytical methods for the cases $\alpha = 1$ and $\alpha = 10$.
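To predict the minimum fraction of driver nodes under random removal ($\alpha = 0$) with generating functions of the degrees, we first employ the framework proposed by Liu et al. [6]. Given a directed network $G(N, L)$ with $N$ nodes and $L$ links, we can determine the minimum fraction of driver nodes using the generating functions of the in- and out-degree distributions, denoted by $G_{in}(x)$ and $G_{out}(x)$, respectively, as well as of the excess in- and out-degree distributions, denoted by $H_{in}(x)$ and $H_{out}(x)$, respectively. These generating functions can be defined as follows:

$$G_{in}(x) = \sum_{k_{in}=0}^{\infty} P_{in}(k_{in})\, x^{k_{in}}, \qquad G_{out}(x) = \sum_{k_{out}=0}^{\infty} P_{out}(k_{out})\, x^{k_{out}},$$
$$H_{in}(x) = \frac{G'_{in}(x)}{G'_{in}(1)}, \qquad H_{out}(x) = \frac{G'_{out}(x)}{G'_{out}(1)}, \tag{5}$$

where $k_{in}$ and $k_{out}$ represent in-degree and out-degree, respectively, while $P_{in}(\cdot)$ and $P_{out}(\cdot)$ are the in- and out-degree probability distributions, respectively. Then, the minimum fraction of driver nodes can be obtained by

$$n_D = \frac{1}{2}\Big\{ G_{in}(\omega_2) + G_{in}(1-\omega_1) - 2 + G_{out}(\hat{\omega}_2) + G_{out}(1-\hat{\omega}_1) + k\big[\hat{\omega}_1(1-\omega_2) + \omega_1(1-\hat{\omega}_2)\big] \Big\}, \tag{6}$$

where $\omega_1$, $\omega_2$, $\hat{\omega}_1$ and $\hat{\omega}_2$ satisfy

$$\omega_1 = H_{out}(\hat{\omega}_2), \quad \omega_2 = 1 - H_{out}(1-\hat{\omega}_1), \quad \hat{\omega}_1 = H_{in}(\omega_2), \quad \hat{\omega}_2 = 1 - H_{in}(1-\omega_1), \tag{7}$$

and $k$ denotes half of the average degree, equal to the average in-degree and the average out-degree: $k = \frac{1}{2}\langle k \rangle = \langle k_{in} \rangle = \langle k_{out} \rangle$.

We aim to determine the minimum fraction of driver nodes $n_D$ needed to control the remaining part of the network after removing a fraction $p$ of nodes. To this end, we partition the network into two sets: a set containing $N_D$ driver nodes that can control the rest of the network and a set of $N_r$ removed nodes. We assume that each removed node requires the control of an individual driver node. Then, we define the fraction of driver nodes $n_D$ as $n_D = \frac{N_D + N_r}{N}$. After removing a fraction $p$ of nodes from the network, $N_r = pN$, and we obtain the following expression for the minimum fraction of driver nodes $n_D$,

$$n_D = \frac{N_D + pN}{N} = p + (1-p)\,\tilde{n}_D, \tag{8}$$

where $\tilde{n}_D = N_D / ((1-p)N)$ is the minimum fraction of driver nodes within the remaining network.

We adopt the method proposed by Shao et al. [29] to adjust the generating functions of the in- and out-degree and the excess in- and out-degree after randomly removing a fraction $p$ of nodes from the network. According to this method, the generating functions after random removal can be obtained by applying the adjusted argument $\bar{x} = p + (1-p)x$ to the original generating functions.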
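The generating-function calculation under random removal can be sketched numerically as follows. The sketch assumes the Liu et al. [6] form of the driver-node expression, with the self-consistent equations $\omega_1 = H_{out}(\hat{\omega}_2)$, $\omega_2 = 1 - H_{out}(1-\hat{\omega}_1)$, $\hat{\omega}_1 = H_{in}(\omega_2)$, $\hat{\omega}_2 = 1 - H_{in}(1-\omega_1)$, applies the substitution $x \to p + (1-p)x$, and rescales by $n_D = p + (1-p)\tilde{n}_D$. Solving by fixed-point iteration from zero, the number of sweeps, the truncation of the Poisson distribution, and all function names are assumptions of this example; convergence can be slow near the point where the equations acquire several solutions.

```python
import math

def make_generating_functions(pk):
    """pk[k] = P(k). Returns G(x), H(x) = G'(x)/G'(1) and the mean degree G'(1)."""
    mean = sum(k * q for k, q in enumerate(pk))

    def G(x):
        return sum(q * x ** k for k, q in enumerate(pk))

    def H(x):
        if mean == 0:                      # no links at all: no excess degree
            return 0.0
        return sum(k * q * x ** (k - 1) for k, q in enumerate(pk) if k > 0) / mean

    return G, H, mean

def driver_fraction(p_in, p_out, p=0.0, sweeps=500):
    """Approximate n_D after randomly removing a fraction p of the nodes."""
    G_in, H_in, k_mean = make_generating_functions(p_in)
    G_out, H_out, _ = make_generating_functions(p_out)
    t = lambda x: p + (1 - p) * x          # adjusted argument for random removal
    w1 = w2 = v1 = v2 = 0.0                # v1, v2 play the role of the hatted omegas
    for _ in range(sweeps):                # fixed-point iteration from zero (assumption)
        w1 = H_out(t(v2))
        w2 = 1 - H_out(t(1 - v1))
        v1 = H_in(t(w2))
        v2 = 1 - H_in(t(1 - w1))
    k_left = (1 - p) * k_mean              # mean in-degree of the remaining network
    remaining = 0.5 * (G_in(t(w2)) + G_in(t(1 - w1)) - 2
                       + G_out(t(v2)) + G_out(t(1 - v1))
                       + k_left * (v1 * (1 - w2) + w1 * (1 - v2)))
    return p + (1 - p) * remaining         # removed nodes count as driver nodes

def poisson(mean, k_max=60):
    """Truncated and renormalised Poisson distribution (illustrative input only)."""
    pk = [math.exp(-mean)]
    for k in range(1, k_max + 1):
        pk.append(pk[-1] * mean / k)
    total = sum(pk)
    return [q / total for q in pk]

pk = poisson(5.0)
for p in (0.0, 0.1, 0.5):
    print(p, round(driver_fraction(pk, pk, p), 4))
```

Two limiting cases serve as sanity checks: a network in which every node has in-degree and out-degree 1 gives $n_D = 0$ before removal, and $p = 1$ gives $n_D = 1$.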
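Hence, the generating functions of the in- and out-degree and the excess in- and out-degree after removing a fraction $p$ of nodes can be expressed as follows:

$$\bar{G}_{in}(x) = G_{in}(p + (1-p)x), \qquad \bar{G}_{out}(x) = G_{out}(p + (1-p)x),$$
$$\bar{H}_{in}(x) = H_{in}(p + (1-p)x), \qquad \bar{H}_{out}(x) = H_{out}(p + (1-p)x). \tag{9}$$

Next, we use Equations (6) and (8) to obtain the minimum fraction of driver nodes $n_D$ after removing a fraction $p$ of nodes. Since the remaining network contains $(1-p)N$ nodes and has average in-degree $(1-p)k$,

$$n_D = p + \frac{1-p}{2}\Big\{ \bar{G}_{in}(\omega_2) + \bar{G}_{in}(1-\omega_1) - 2 + \bar{G}_{out}(\hat{\omega}_2) + \bar{G}_{out}(1-\hat{\omega}_1) + (1-p)k\big[\hat{\omega}_1(1-\omega_2) + \omega_1(1-\hat{\omega}_2)\big] \Big\}, \tag{10}$$

where $\omega_1$, $\omega_2$, $\hat{\omega}_1$ and $\hat{\omega}_2$ satisfy

$$\omega_1 = \bar{H}_{out}(\hat{\omega}_2), \quad \omega_2 = 1 - \bar{H}_{out}(1-\hat{\omega}_1), \quad \hat{\omega}_1 = \bar{H}_{in}(\omega_2), \quad \hat{\omega}_2 = 1 - \bar{H}_{in}(1-\omega_1). \tag{11}$$

5.1.2. Case: α = 1

In-degree: In undirected networks, after a fraction $p$ of nodes has been removed with a probability proportional to some power of the node degree (see Equation (1)), the generating function of the degree distribution, $G(x)$, transforms into a function $\bar{G}(x)$, Equation (12), given in [28]. We investigate the extension of this result to directed networks when nodes are removed based on their in-degree. We assume that a node's in-degree and out-degree are independent and uncorrelated, such that removing a fraction $p$ of nodes based on their in-degree results in the generating function of the in-degree distribution described by Equation (12). Furthermore, the generating function of the out-degree distribution is given by $\bar{G}_{out}(x) = G_{out}(p + (1-p)x)$, following the equation for random node removals. So, if we remove nodes based on in-degree, $\bar{G}_{in}(x)$ follows from Equation (12) and $\bar{G}_{out}(x)$ from the random-removal substitution. Then, we can obtain the analytical approximation of the minimum fraction of driver nodes under node removals based on in-degree using Equation (10).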

Out-degree: Analogously, if we remove a fraction $p$ of nodes based on their out-degree, we maintain the assumption that the generating function of the out-degree distribution is described by Equation (12). Additionally, the generating function of the in-degree distribution can be expressed as $\bar{G}_{in}(x) = G_{in}(p + (1-p)x)$, as for random node removals. Therefore, $\bar{G}_{out}(x)$ follows from Equation (12) and $\bar{G}_{in}(x)$ from the random-removal substitution. Furthermore, utilizing Equation (10), we can derive an analytical approximation of the minimum fraction of driver nodes when nodes are removed based on out-degree.
5.1.3. Case: α = 10

When $\alpha = 10$, we encounter difficulties in obtaining the required numerical solution. Consequently, it becomes challenging to determine the evolution of the generating functions for the in-degree and out-degree distributions during the node removal process. To address this challenge, we propose a heuristic approach whereby we map the targeted node removal process based on in-degree or out-degree into a random node attack process.
Specifically, for node removals based on in-degree with $\alpha = 10$, where a fraction $p$ of nodes is to be removed, we map this process to the random removal of a fraction $\bar{p}$ of nodes in the in-degree distribution, while keeping the removed fraction in the out-degree distribution at $p$. Similarly, for node removals based on out-degree with $\alpha = 10$, we map the process to the random removal of a fraction $\bar{p}$ of nodes in the out-degree distribution and of a fraction $p$ of nodes in the in-degree distribution.

In-degree: To estimate the effective fraction $\bar{p}$ corresponding to a given fraction $p$ under node removals based on in-degree with $\alpha = 10$, we adopt the assumption that nodes are removed in descending order of in-degree. Specifically, we first sort the nodes according to their in-degree and then remove nodes starting from the node with the highest in-degree until the targeted fraction $p$ is reached.
Next, we calculate the total in-degree of all the removed nodes from the original in-degree distribution and the targeted removal fraction $p$. The effective fraction $\bar{p}_{in}$ is then obtained by normalizing the total in-degree of all removed nodes with respect to the total in-degree of all nodes in the initial network. This can be calculated as follows:

$$\bar{p}_{in} = \frac{\sum_{k_{in}=\bar{k}_{in}}^{k_{in}^{max}} k_{in}\, p_{k_{in}}}{\sum_{k_{in}=0}^{k_{in}^{max}} k_{in}\, P_{in}(k_{in})},$$

where the largest in-degree value is denoted as $k_{in}^{max}$, the probability of removed nodes with in-degree $k_{in}$ is denoted as $p_{k_{in}}$, and the degree $\bar{k}_{in}$ satisfies $\sum_{k_{in}=\bar{k}_{in}}^{k_{in}^{max}} p_{k_{in}} = p$. It is worth mentioning that, except for the probability $p_{\bar{k}_{in}}$ at the boundary degree, every probability $p_{k_{in}}$ is equal to the probability $P_{in}(k_{in})$ in the generating function. Then, we use the effective proportion $\bar{p}_{in}$ in the approximation of the minimum fraction of driver nodes according to the mapping described above, where $\omega_1$, $\omega_2$, $\hat{\omega}_1$ and $\hat{\omega}_2$ satisfy Equation (11).

Out-degree: Analogously, for targeted node removal based on out-degree with $\alpha = 10$, the calculation of the fraction $\bar{p}_{out}$ follows the same assumption: nodes are removed from the node with the highest out-degree to the node with the lowest out-degree until the removed fraction of nodes reaches $p$. The effective fraction $\bar{p}_{out}$ is the total out-degree of the removed nodes normalized by the total out-degree in the original network, which can be calculated by

$$\bar{p}_{out} = \frac{\sum_{k_{out}=\bar{k}_{out}}^{k_{out}^{max}} k_{out}\, p_{k_{out}}}{\sum_{k_{out}=0}^{k_{out}^{max}} k_{out}\, P_{out}(k_{out})},$$

where the largest out-degree value is denoted as $k_{out}^{max}$, and the probability of removed nodes with out-degree $k_{out}$ as $p_{k_{out}}$. To achieve the targeted removal fraction $p$, we find the minimum out-degree value $\bar{k}_{out}$ satisfying $\sum_{k_{out}=\bar{k}_{out}}^{k_{out}^{max}} p_{k_{out}} = p$. For all out-degree values except $\bar{k}_{out}$, the corresponding probabilities $p_{k_{out}}$ are equal to the probabilities $P_{out}(k_{out})$ in the generating function. Then, we use $\bar{p}_{out}$, the effective proportion of removed nodes based on out-degree, to estimate the minimum fraction of driver nodes according to the mapping described above, where $\omega_1$, $\omega_2$, $\hat{\omega}_1$ and $\hat{\omega}_2$ satisfy Equation (11).
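The effective fraction can be computed directly from a degree distribution, as in the following sketch. It walks down from the largest degree, removes probability mass until the targeted fraction $p$ is reached (taking only part of the mass at the boundary degree), and normalizes the removed total degree by the total degree of the network. The function name and the list representation `pk[k] = P(k)` are assumptions of this example; the same function serves for in-degree and out-degree.

```python
def effective_fraction(pk, p):
    """pk[k] = P(k); p = targeted fraction, removed in descending order of degree.
    Returns (total degree of removed nodes) / (total degree of all nodes)."""
    total_degree = sum(k * q for k, q in enumerate(pk))
    left = p                               # probability mass still to be removed
    removed_degree = 0.0
    for k in range(len(pk) - 1, -1, -1):   # from the largest degree downwards
        take = min(pk[k], left)            # the boundary degree is only partly removed
        removed_degree += k * take
        left -= take
        if left <= 0:
            break
    return removed_degree / total_degree

# Degrees 0..3, each with probability 0.25: removing the top 25% of nodes
# removes all nodes of degree 3, which carry half of the total degree.
print(effective_fraction([0.25, 0.25, 0.25, 0.25], 0.25))   # prints 0.5
```

Because high-degree nodes are removed first, the effective fraction is never smaller than $p$, which reflects the heavier loss of links under a targeted attack.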

5.2. Results for Targeted Node Attacks

5.2.1. Case: α = 1

We ran simulations on the networks described in Section 2. We carried out 10,000 realizations for all networks to obtain statistically stable averages. For ER and real-world networks, which have a relatively small number of nodes, one node was removed at each step until all nodes had been removed during each realization. After each step, the minimum fraction of driver nodes was recalculated using the maximum matching algorithm. For SSNs and SFs, which have a large number of nodes, 1% of the nodes were removed at each step until all nodes had been removed during each realization, and the minimum fraction of driver nodes was recalculated on the modified network structure after each step. The average over the 10,000 realizations was taken as the final simulation output.
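The simulation procedure for small networks (one node removed per step) can be sketched as follows. This is an illustration under several assumptions rather than the code behind the reported results: removal is based on in-degree only, in-degrees are recounted in the remaining network before each draw, a uniform draw is used once all remaining in-degrees are zero, removed nodes are counted as driver nodes via $n_D = (N_D + N_r)/N$, and an empty network is assigned $N_D = 0$. All function names are hypothetical.

```python
import random

def matching_size(nodes, edges):
    """Size of a maximum matching among the links of the remaining network."""
    successors = {u: [] for u in nodes}
    for u, v in edges:
        successors[u].append(v)
    mate = {}                                   # mate[v]: source matched to end node v

    def augment(u, seen):
        for v in successors[u]:
            if v not in seen:
                seen.add(v)
                if v not in mate or augment(mate[v], seen):
                    mate[v] = u
                    return True
        return False

    return sum(augment(u, set()) for u in nodes)

def one_realization(n, edges, alpha, rng):
    """Remove nodes one by one (in-degree based); return n_D after each removal."""
    alive, live, curve = set(range(n)), list(edges), []
    for removed in range(1, n + 1):
        indeg = dict.fromkeys(alive, 0)
        for _, v in live:
            indeg[v] += 1
        nodes = sorted(alive)
        weights = [float(indeg[i]) ** alpha for i in nodes]
        if sum(weights) == 0:                   # assumption: uniform draw as a fallback
            weights = [1.0] * len(nodes)
        x = rng.choices(nodes, weights=weights)[0]
        alive.remove(x)
        live = [(u, v) for u, v in live if x not in (u, v)]
        n_drivers = max(1, len(alive) - matching_size(alive, live)) if alive else 0
        curve.append((n_drivers + removed) / n)  # removed nodes count as driver nodes
    return curve

def average_curve(n, edges, alpha, realizations, seed=0):
    """Average the n_D curves over independent realizations."""
    rng = random.Random(seed)
    total = [0.0] * n
    for _ in range(realizations):
        for i, y in enumerate(one_realization(n, edges, alpha, rng)):
            total[i] += y
    return [s / realizations for s in total]

path = [(0, 1), (1, 2), (2, 3)]
print(average_curve(4, path, alpha=1, realizations=100))
```

For the large SSNs and SFs, the same loop would remove a batch of 1% of the nodes per step and would need a non-recursive matching routine.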
We present the results of targeted node removal based on in-degree and out-degree with $\alpha = 1$ in Figures 1 and 2. The simulation results are shown as green lines, whereas the analytical results are in red. The results of random node removal are also presented as gray lines for comparison. We observe that the analytical results serve as a closed-form approximation of the minimum fraction of driver nodes ($n_D$) rather than an exact prediction, as a discrepancy exists between the predicted and simulation values during the targeted node removal process based on in-degree or out-degree. In the case of ER networks and SFs, the in-degree and out-degree distributions are identical. Consequently, the predicted values of targeted removal based on in-degree and out-degree are also the same. For SSNs, the out-degree of nodes is fixed. Therefore, the lines of the analytical results of targeted node removal based on out-degree with $\alpha = 1$ in SSNs overlap with the lines of random node removal. We find that for SFs and SSNs, the simulation results of targeted removal based on in-degree and out-degree are slightly different from the simulation results of random removals. Thus, even though the analytical results of SFs differ slightly from the simulation results of random removals and the analytical results of SSNs are the same as the random removal results, they closely approach the simulation results of targeted removals. When the removed fraction $p$ is small, the simulation results of targeted removals based on in-degree and out-degree are close to those of random removals. We verified this by calculating the Root Mean Square Error (RMSE) between the simulation results of targeted removals based on in-degree and out-degree and the analytical results of random removals for removed fractions below 10%, as shown in Table 2. Moreover, we calculated the RMSE between the simulation and analytical results of targeted node removals based on in-degree and out-degree for removed fractions below 10%, as shown in Table 3.
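The RMSE restricted to small removed fractions can be computed as in this short sketch. The function name, the argument layout (three aligned lists), and the inclusive threshold at 10% are assumptions of the example.

```python
import math

def rmse_up_to(simulated, analytical, fractions, p_max=0.10):
    """RMSE between two n_D curves, using only points with removed fraction <= p_max."""
    pairs = [(s, a) for s, a, p in zip(simulated, analytical, fractions) if p <= p_max]
    return math.sqrt(sum((s - a) ** 2 for s, a in pairs) / len(pairs))

# Toy curves: only the first two points lie at or below 10% removed nodes.
print(rmse_up_to([0.1, 0.2, 0.9], [0.1, 0.4, 0.0], [0.0, 0.1, 0.5]))
```

Points beyond the threshold, such as the large mismatch at $p = 0.5$ in the toy data, do not contribute to the value.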
The results indicate that both methods provide a good approximation of the simulation results, as the values in both tables for targeted node removals based on in-degree and out-degree with $\alpha = 1$ are reasonably small.

Table 2. The RMSE between the analytical results of random removals and the simulation results under random removals and under targeted removals with α = 1 and α = 10, respectively, while removing 10% of the nodes. The column labeled "Random" indicates the RMSE under random removals. The columns labeled "α = 1" and "α = 10" represent the RMSE under targeted node removals with α = 1 and α = 10, respectively. The columns labeled "Indegree", "Outdegree", and "Degree" represent the RMSE under targeted node removals based on in-degree, out-degree, and total degree, respectively. The analytical method for random removals is from reference [19].

Table 3. The RMSE between the analytical results of the proposed analytical methods and the simulation results under different kinds of removals while removing 10% of the nodes. The column labeled "Random" indicates the RMSE under random removals. The columns labeled "α = 1" and "α = 10" represent the RMSE under targeted node removals with α = 1 and α = 10, respectively. The columns labeled "Indegree", "Outdegree", and "Degree" represent the RMSE under targeted node removals based on in-degree, out-degree, and total degree, respectively. The analytical methods for random removals and targeted node removals based on the total degree are from reference [19].

5.2.2. Case: α = 10

We ran simulations of 10,000 realizations with $\alpha = 10$ under in-degree and out-degree node removals in the networks mentioned above. Each realization for every network follows the same procedure as described for the case $\alpha = 1$. The simulation results of network controllability are shown as green lines in Figures 3 and 4. As before, the analytical results are depicted as red lines, while the simulation results of network controllability under random node attacks are shown as gray lines.

For targeted node removals based on out-degree in SSNs, whose out-degree is fixed, the analytical results coincide with those of random node removals. In the other cases, the analytical results exhibit a similar pattern for $\alpha = 10$: they initially exceed the simulation results, then intersect them, and finally fall below the targeted node attack lines while staying above the random node attack lines as the fraction of removed nodes approaches one. We find that the proposed methods closely approximate network controllability in closed form, but do not precisely align with the simulation results.
Upon examining Tables 2 and 3, we observe that both the proposed analytical methods and the analytical results of random node removal perform satisfactorily for targeted node removal based on in-degree and out-degree with $\alpha = 10$ when the fraction of removed nodes $p$ is below 10%. However, the RMSE values obtained for $\alpha = 10$ are worse than those obtained for $\alpha = 1$ and for random node removal. These outcomes highlight the limitations of our proposed approach. Specifically, our method assumes that nodes are removed from the node with the highest degree to the node with the lowest degree, which holds only when $\alpha$ is large enough, that is, in the limit $\alpha \to \infty$. With $\alpha = 10$, the node with the highest degree is much more likely to be removed, but its removal at each step is still not guaranteed.

Conclusions and Discussion
This study introduces analytical methods based on generating functions to determine the minimum fraction of driver nodes required to maintain network controllability in directed networks under node failures based on in-degree and out-degree. We develop separate analytical techniques for two scenarios, namely $\alpha = 1$ and $\alpha = 10$. Our proposed analytical methods predict the minimum fraction of driver nodes under targeted attacks reasonably well. Furthermore, our investigation indicates that the analytical results of random node removal may also serve as a reliable predictor of the results of various targeted node removals, particularly when the fraction of removed nodes is small (below 10%).
In addition to the findings presented in this paper, we have applied our simulations to various other real-world networks. For these networks, the minimum fraction of driver nodes calculated by the proposed analytical method utilizing generating functions does not coincide with the results obtained using the maximum matching algorithm before node removal. As such, our proposed methods are inadequate for predicting the minimum fraction of driver nodes under node removal for these networks. When targeted node removal is based on in-degree and out-degree with $\alpha = 10$, our approximation method assumes that nodes are removed in descending order of in-degree and out-degree. However, this assumption does not reflect the actual removal process, in which the removal probabilities are recalculated to choose nodes at each step. This discrepancy is one of the reasons for the inaccurate results obtained. Moreover, we acknowledge that further improvements are required to enhance the method's efficacy. Notably, the numerical solution of the predicted outcomes can be challenging to obtain, particularly for SFs with some other parameters.
The approximation of node removals based on in- or out-degree involves an assumption that the in-degree distribution and out-degree distribution evolve independently. However, the assumption requires further investigation to ensure its validity. To address this issue, a promising avenue of research involves examining the relationship between in-degree and out-degree distributions through the randomization of networks. Such analyses may provide upper and lower bounds for analytical methods, contributing to the improvement of predictions about network controllability under targeted attacks based on in-degree and out-degree.
In the future, we aim to broaden the scope of our findings by including other types of node attacks, specifically localized node attacks, as documented in [28]. Furthermore, we intend to verify our conclusions on a more comprehensive collection of real-world networks and various types of networks, such as interdependent networks. We also plan to apply additional prediction techniques, such as machine learning methods, to assess network controllability under node removals concerning in-degree and out-degree.

Abbreviations
The following abbreviations are used in this manuscript:

ER      Erdős-Rényi networks
SSNs    Swarm Signaling Networks
SFs     Scale-free networks
RMSE    Root Mean Square Error

Appendix A. The Simulation Results Based on Different α Values
The following figures show, for α = 0, α = 1, α = 10 and α = 100, how the minimum fraction of driver nodes changes under targeted attacks based on degree, in-degree and out-degree for ER networks and SSNs. We find that the results of α = 10 and α = 100 overlap.

Appendix B. Comparison with Node Removal Based on Degree with α = 1
We present the results of four types of node removal strategies: random removal, targeted node removal based on the total degree with α = 1, targeted node removal based on in-degree with α = 1, and targeted node removal based on out-degree with α = 1 for three networks in Figure A2. We find that the three targeted node removal strategies are more disruptive than random removal. However, the effectiveness of the targeted node removal strategies varies depending on the network structure. For instance, in ER(100, 0.04), all three targeted node removal strategies show similar performance. In SSN($10^4$, 2), the targeted node removal based on in-degree is the most disruptive, whereas in HiberniaGlobal, the targeted node removal based on out-degree is the most disruptive.

Figure A2. The minimum fraction of driver nodes $n_D$ during random removal and targeted node removal based on degree, in-degree and out-degree with α = 1 for three networks. The results are the average $n_D$ calculated by the maximum matching algorithm over 10,000 realizations of HiberniaGlobal and 1000 realizations of ER(100, 0.04) and SSN($10^4$, 2). The blue, red, orange and pink dashed lines are the results of simulations with random removal, targeted removal based on the total degree with α = 1, targeted removal based on in-degree with α = 1 and targeted removal based on out-degree with α = 1, respectively.

Appendix C. Results for Another Real-World Network
In this study, the real-world graphs utilized have an average degree ranging from 2 to 3. To further evaluate the efficacy of the proposed techniques, we selected a network from the Topology Zoo dataset, namely BtNorthAmerica, which has an average total degree of 4.22. The network comprises 36 nodes and 76 links. We analyzed the controllability of the network under node removals based on node in-degree and out-degree with different α. The results are presented in Figure A3 and Tables A1 and A2. Our findings suggest that the predicted values of the proposed methods are valid. It is worth mentioning that, when α = 10, attacks based on out-degree are initially less damaging than random removals: removing the node with the highest out-degree in the initial steps results, on average, in a lower number of driver nodes than removing other nodes.

Table A1. The RMSE between the analytical results of random removals and the simulation results under random removals and under targeted removals with α = 1 and α = 10, respectively, while removing 10% of the nodes. The analytical method for random removals is from reference [19].

Table A2. The RMSE between the analytical results of the proposed analytical methods and the simulation results under different kinds of removals while removing 10% of the nodes. The analytical methods for random removals and targeted node removals based on the total degree are from reference [19].