A Potential Information Capacity Index for Link Prediction of Complex Networks Based on the Cannikin Law

Recently, a number of similarity-based methods have been proposed for link prediction in complex networks. Among these, the resource-allocation-based methods perform very well because they consider the amount of resources exchanged in the information transmission process between nodes. However, they ignore the information channels, and the information capacity of those channels, in the transmission process between two endpoints. Motivated by the Cannikin Law, we define information capacity to quantify the information transmission capability between any two nodes. Based on this definition, a potential information capacity (PIC) index is proposed for link prediction. An empirical study on 15 datasets shows that the proposed PIC index achieves good performance compared with eight mainstream baselines.


Introduction
Recently, more and more complex systems have been analyzed through theories of network science [1][2][3][4][5]. As an important hot topic of complex networks, link prediction [6] aims to predict the likelihood that a link exists between two nodes of complex networks. It plays an important role in recommending friends of online social networks [7] and discovering missing interactions of protein-protein interaction networks [8].
In the last few years, many link prediction methods have been proposed for predicting missing links in complex networks. Among these methods, topology-based similarity indices are simple and effective, and they have attracted the attention of scholars in various fields [9]. As the simplest similarity index, the common neighbor (CN) index measures the similarity between two endpoints by counting their common neighbors [10]. Based on CN, many common-neighbor-based methods have been proposed by weighting the common neighbors with local information, such as the Adamic-Adar (AA) index [11], the resource allocation (RA) index [12], CAR [13] and so on. These local indices perform very well in many types of networks, but in some networks they need more topological information to improve the prediction accuracy. Considering longer paths, the Local Path (LP) index [14] and the Extended Resource Allocation (ERA) index [4] add the paths of length 3 to the CN index and the RA index, respectively. Furthermore, many global indices have been proposed by considering all the topological information between two endpoints, such as the Katz index [15], SimRank [16], Average Commute Time (ACT, also called Mean Commute Time) [17,18] and Cosine Similarity Time (Cos+) [19]. In real prediction tasks, the global indices perform better than most other methods, but they are not suitable for large-scale networks due to their high complexity. It is worth mentioning that, by considering the coupling information of the local topology, some resource-allocation-based and local-path-based indices perform as well as or even better than global indices [9]. These local-topology indices are well suited to large-scale network prediction, because their complexity is higher than that of CN but lower than that of global indices. Although these indices achieve good performance with lower complexity, most of them ignore the potential information capacity between endpoints.
In the real world, various types of information are transmitted constantly in different networks [20]. Any neighbor of a node can be regarded as the anchor point of an information channel for information transmission, and the information capacity denotes the information transmission capability of an information channel (as shown in Figure 1). For online social networks, the larger the potential information capacity between two users x and y, the greater the likelihood that hot topics (such as rumors, news, stocks, etc.) will spread between them [21]. That is, they are more likely to be friends. Mainstream indices (such as the RA index and the LP index) measure the similarity between nodes mainly through common neighbors and their related paths. However, if node y receives information from or sends information to node x, the number of neighbor nodes Γ(y) and the information transmission capability between x and Γ(y) determine how much information can be exchanged between them through Γ(y). Therefore, besides the local information considered by the existing indices, the information channels (all the neighbors, not only the common neighbors) and their information capacities also play an important role in describing the similarity between two endpoints. In view of the above analysis, a potential information capacity (PIC) index is proposed for link prediction. To quantify the information transmission capability between any two nodes, the information capacity is defined based on the Cannikin Law. With a parameter that adjusts the strength of potential information capacity for different networks, the PIC index measures the similarity between two endpoints by considering the information channels and their information capacity. Experimental results show that the proposed PIC index can improve the prediction accuracy on 15 datasets, compared with several global and local indices.
The main parts of this paper are organized as follows: in Section 2, the information capacity is defined and the potential information capacity index is introduced; in Section 3, two standard metrics and eight mainstream baselines are described; in Section 4, all the 15 datasets and their topological features are introduced; in Section 5, the comparison between PIC index and eight mainstream methods is discussed; finally, a brief conclusion is given.

The Potential Information Capacity Index
The information transmission or interaction process (including resource transmission) between nodes has been described and used by several link prediction methods, and their prediction accuracy is also very high. However, they ignore the analysis and utilization of the potential information capacity between nodes. In this section, motivated by the Cannikin Law, we propose an information capacity quantification method and a new similarity index.

Information Capacity Based on the Cannikin Law
Information transmission is a common phenomenon in nature and human society, and it is also an important intrinsic motivation for establishing connections in complex networks [12]. Different kinds of information flow constantly in different networks: messages are sent from a terminal to any person through the infrastructure network [22], passengers travel from one train station to another through the railway transportation network [23], neural signals are transmitted from one neuron to another through the neural network [24], and so on.
As shown in Figure 2, if node i has one unit of information and transfers it to node j through a self-avoiding random walk on any path in the multipath set $Path_{ij}$, the amount of information $R_{ij}$ received by j can be expressed as:

[Equation (1)]

Here, $k_{z_{ij}}$ denotes the degree of node $z_{ij}$, where $z_{ij}$ is a node on the path ($Path_{ij}$). Obviously, $R_{ij}$ represents the ability to transmit information between nodes i and j.
Considering the transmission fading and computational complexity of multi-hop paths, we only analyze the information transmission process over paths of no more than two hops [25]. Therefore, the amount of information received by j through a certain common neighbor $z_{ij}$ can be expressed as:

[Equation (2)]

After estimating the amount of information transmitted through common neighbors, the next question is how to use the information transmission process to define or quantify the information capacity between any two nodes. In a real information transmission process, a high-degree common neighbor is more easily selected as the transmission relay node [26], so the information capacity between two endpoints is strongly related to the information transmission capability of high-degree common neighbors.

As shown in Figure 3a, if all the paths between two nodes i and j are compared to one "container" (a bucket for storing information), the capacity of the container indicates the potential information transmission capability between the nodes through the various possible paths. According to the Cannikin Law [27], the capacity of a wooden bucket is limited by the height of its shortest plank (as shown in Figure 4). In different types of complex networks, information flow can be traffic flow in traffic networks, topic flow in social networks, or bioelectricity flow in neural networks. As a special kind of material flow, information flow shares the common characteristics and attributes of a fluid, so it can also be described and studied by the Cannikin Law. Based on this theory, each path between nodes i and j can be regarded as a plank of a wooden bucket. Then, their information capacity IC(i, j) is determined by the number of paths $n_{ij}$ (the number of planks) and the minimum amount of information $R^{Min}_{ij}$ transmitted by these paths (the shortest plank), which can be expressed as:

[Equation (3)]

Here, $k^{max}_{z_{ij}}$ denotes the highest degree among the common neighbors of i and j, and β ≥ 0 adjusts the strength of information transmission capability for different types of networks.
If there is a direct connection between two endpoints i and j, as shown in Figure 3b, the information capacity IC(i, j) can be expressed as follows (the direct connection $v_{ij}$ can be regarded as another bucket with only one plank):

[Equation (4)]

Taking both cases in Figure 3a,b into account, we define the information capacity between any two endpoints in a complex network.

Definition 1. Consider a pair of endpoints i and j in a complex network, and let $z'_{ij}$ be a common neighbor of them. The information capacity IC(i, j) between the two endpoints, which represents the information transmission capability between them, can be quantified as:

[Equation (5)]

Here, $a_{ij}$ is the element of the adjacency matrix A that denotes whether there is a connection between nodes i and j.

The Potential Information Capacity Index
Consider an undirected network G(V, E), where V and E are the sets of vertices and edges, respectively. A link prediction method assigns a score $s_{xy}$ to each pair of endpoints x and y. The score $s_{xy}$ can be a measure of the similarity between the two endpoints, and the score of each nonexistent link represents the likelihood that the link exists.
In general, the simplest way to calculate the likelihood that a link exists between two endpoints is to use the information capacity between them directly. However, this ignores the important role of neighbor nodes in the potential information transmission process. In the real world, any node transmits information through its neighbor nodes. As shown in Figure 5a, a neighbor node $z_y$ can be regarded as an antenna of node y, that is, as the anchor point of an information channel for receiving and transmitting information. Theoretically, the calculation of the total potential information capacity between two endpoints should consider all the information channels and their information capacities at the same time. Based on the above discussion, the similarity between two endpoints is calculated from the information capacity between each endpoint and the neighbors of the other endpoint.


Definition 2. Consider a pair of nodes x, y ∈ V, where $z_x$ is a neighbor of node x and $z_y$ is a neighbor of node y. Considering the information channels and their information capacity, the potential information capacity (PIC) index is composed of all the potential information capacity between nodes x and $z_y$ and between nodes y and $z_x$, which can be defined as:

$$ s^{PIC}_{xy} = \sum_{z_y \in \Gamma(y)} IC(x, z_y) + \sum_{z_x \in \Gamma(x)} IC(y, z_x) \qquad (6) $$

When β = 0, the PIC index becomes $s^{PIC}_{xy} = k_x + k_y$, which is similar to the PA index ($s^{PA}_{xy} = k_x \cdot k_y$). Obviously, the complexity of the PIC index is between $O(Nk^2)$ (PA) and $O(Nk^3)$ (LP).
Because a neighbor node is the anchor of the information channels through which a node exchanges information, the physical meaning of Equation (6) is that the potential information capacity between any two endpoints is the sum of the information capacities of all possible information channels. That is, the potential information capacity is composed of all the information capacities between the neighbor nodes of one endpoint and the other endpoint.
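The summation structure described for Equation (6) can be sketched in a few lines of Python. This is only an illustrative sketch: the capacity function is passed in as a parameter, and `example_ic` is a hypothetical stand-in chosen so that the β = 0 case reduces to $k_x + k_y$ as stated above; it is an assumption and is not taken from Definition 1. The names `Graph`, `pic_score` and `example_ic` are likewise illustrative.

```python
from typing import Callable, Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as adjacency sets


def pic_score(g: Graph, x: Hashable, y: Hashable,
              ic: Callable[[Graph, Hashable, Hashable], float]) -> float:
    """Sum the capacity of every channel: x to each neighbor of y, y to each neighbor of x."""
    return sum(ic(g, x, zy) for zy in g[y]) + sum(ic(g, y, zx) for zx in g[x])


def example_ic(beta: float) -> Callable[[Graph, Hashable, Hashable], float]:
    """HYPOTHETICAL capacity (a_ij + n_ij / k_max) ** beta, an assumption for illustration."""
    def ic(g: Graph, i: Hashable, j: Hashable) -> float:
        common = g[i] & g[j]                      # common neighbors of i and j
        a_ij = 1.0 if j in g[i] else 0.0          # direct connection, if any
        two_hop = len(common) / max(len(g[z]) for z in common) if common else 0.0
        return (a_ij + two_hop) ** beta           # Python evaluates 0.0 ** 0.0 as 1.0
    return ic


g = {"a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b"}, "d": {"b"}}
print(pic_score(g, "a", "d", example_ic(0.0)))  # equals k_a + k_d for beta = 0
print(pic_score(g, "a", "d", example_ic(1.0)))
```

Keeping the capacity as a callable means the same scoring loop can be reused with whatever form of IC(i, j) is adopted.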

Metrics
Two standard metrics are widely used to quantify the accuracy of link prediction methods: area under the receiver operating characteristic curve (AUC) [28,29] and precision [30,31]. In principle, a link prediction method gives each non-observed link a similarity score to quantify its existence likelihood. The AUC evaluates the method's performance as a whole while the precision only focuses on the L links with top ranks or highest scores. A detailed description of these two metrics is as follows.
Given the ranking of all non-observed links, the AUC value can be interpreted as the probability that the score given to a randomly chosen missing link is higher than that of a randomly chosen non-existent link [6]. In the algorithm implementation, we usually calculate the score of each non-observed link instead of producing the ordered list, since the latter task is more time consuming. Each time, we randomly pick a non-existent link and a missing link and compare their scores. If, among n independent comparisons, the missing link has a higher score n′ times and the two links have the same score n″ times, the AUC value of the method is:

$$ AUC = \frac{n' + 0.5\,n''}{n} $$

Obviously, if all the scores are generated from an independent and identical distribution, AUC ≈ 0.5. An AUC score of 1.0 represents a perfect prediction, while a random method has a score of 0.5. Therefore, the extent to which the AUC of a link prediction method exceeds 0.5 indicates how much better its prediction accuracy is than pure chance.
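The sampling procedure above can be sketched as follows. This is a minimal sketch, assuming the scores of the missing (probe) links and of the non-existent links are already available as two lists; the function name `auc_by_sampling` and its parameters are illustrative.

```python
import random
from typing import Optional, Sequence


def auc_by_sampling(missing: Sequence[float], nonexistent: Sequence[float],
                    n: int = 10000, rng: Optional[random.Random] = None) -> float:
    """Estimate AUC from n random comparisons of a missing link against a non-existent link."""
    rng = rng or random.Random(0)
    higher = ties = 0
    for _ in range(n):
        m = rng.choice(missing)       # score of a randomly chosen missing link
        u = rng.choice(nonexistent)   # score of a randomly chosen non-existent link
        if m > u:
            higher += 1
        elif m == u:
            ties += 1
    return (higher + 0.5 * ties) / n


print(auc_by_sampling([0.9, 0.4, 0.7], [0.1, 0.4, 0.2], n=1000))
```

A fixed seed is used by default so that repeated runs give the same estimate; passing a different `random.Random` instance gives an independent realization.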
Precision only pays attention to the top-ranked links. In practice, all non-observed links are ranked in descending order according to their similarity scores. The precision is defined as the ratio of relevant items selected to the number of items selected [30]. That is, if we take the top-L links as the predicted ones, among which m links belong to the missing links, then the precision value is defined as:

$$ Precision = \frac{m}{L} $$

Obviously, the precision value depends on the parameter L. For a given L, a higher precision value means better performance. In practice, L is generally set to 100 for large-scale networks, as in Refs. [4,32]. Thus, in order to compare the results more intuitively and clearly across multiple datasets, we set L = 100 in this paper.
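A short sketch of the precision computation, assuming the scores are held in a dictionary keyed by node pair and that the probe set stores each pair in the same orientation as the dictionary keys. The name `precision_at_l` is illustrative.

```python
from typing import Dict, Hashable, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def precision_at_l(scores: Dict[Pair, float], probe: Set[Pair], l: int = 100) -> float:
    """Fraction of the top-l scored non-observed links that belong to the probe set."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:l]  # ties keep insertion order
    hits = sum(1 for pair in ranked if pair in probe)
    return hits / l


scores = {("a", "b"): 0.9, ("a", "c"): 0.5, ("b", "d"): 0.7, ("c", "d"): 0.1}
probe = {("a", "b"), ("c", "d")}
print(precision_at_l(scores, probe, l=2))
```

In this toy example the two top-ranked pairs are ("a", "b") and ("b", "d"), of which only the first is a missing link.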

Baselines
We compare the PIC index with eight mainstream similarity indices, including five local indices: CN, AA, CAR, RA and LP index, and three global indices: Katz, ACT and Cos+ index. A brief description of these indices is shown as follows:

1. Common Neighbor (CN) index [10] calculates the similarity of two endpoints by the number of their common neighbors:

$$ s^{CN}_{xy} = |\Gamma(x) \cap \Gamma(y)| $$

Here, Γ(x) is the set of neighbors of node x, and Γ(x) ∩ Γ(y) represents the common neighbors of nodes x and y.

2. Resource Allocation (RA) index [12] measures the similarity of two endpoints by the resource (information) that endpoint y receives from endpoint x through their common neighbors:

$$ s^{RA}_{xy} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{k_z} $$

Here, $k_z$ denotes the degree of the common neighbor z.

Entropy 2019, 21, 863

3. Adamic-Adar (AA) index [11] weights the common neighbors according to their degree and penalizes common neighbors with large degree:

$$ s^{AA}_{xy} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{\log k_z} $$

This method gives common neighbors with low degree a higher weight than those with high degree; the weight used by the AA index is the reciprocal of the logarithm of the node degree [10].

4. CAR index [13] assumes that a link is more likely to exist between two nodes if their common first neighbors are members of a strongly inner-linked cohort:

$$ s^{CAR}_{xy} = |\Gamma(x) \cap \Gamma(y)| \cdot \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{|\gamma(z)|}{2} $$

Here, γ(z) denotes the subset of the neighbors of node z that are also common neighbors of nodes x and y.

5. Local Path (LP) index [14] adds the longer paths of length 3 between the endpoints to the common neighbors:

$$ S^{LP} = A^2 + \alpha A^3 $$

Here, α is the adjustable parameter for the longer paths, and A is the adjacency matrix.

6. Katz index [15] calculates the similarity between two nodes by considering all the paths between them:

$$ s^{Katz}_{xy} = \sum_{l=1}^{\infty} \varepsilon^{l} \, |path^{l}_{xy}| $$

Here, ε is the adjustable parameter for paths, and $path^{l}_{xy}$ is the set of paths of length l between nodes x and y.

7. Average Commute Time (ACT) [17] calculates the similarity between two nodes by the average number of steps required by random walks between them:

$$ s^{ACT}_{xy} = \frac{1}{l^{+}_{xx} + l^{+}_{yy} - 2\,l^{+}_{xy}} $$

Here, $L^{+}$ denotes the pseudo-inverse of the matrix L = D − A, and $l^{+}_{xy}$ is the corresponding entry of $L^{+}$.

8. Cosine Similarity Time (Cos+) [19] calculates the similarity between nodes based on the angle between the random walk vectors:

$$ s^{Cos+}_{xy} = \frac{l^{+}_{xy}}{\sqrt{l^{+}_{xx} \cdot l^{+}_{yy}}} $$
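As a rough illustration of the five local baselines, the sketch below computes CN, RA, AA, CAR and LP on a small undirected graph stored as adjacency sets. It assumes the commonly used forms of these indices (CAR as the number of common neighbors times half the number of links among them counted per node, LP as walks of length 2 plus α times walks of length 3); the function names and the toy graph are illustrative.

```python
import math
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as adjacency sets


def cn(g: Graph, x, y) -> int:
    return len(g[x] & g[y])


def ra(g: Graph, x, y) -> float:
    return sum(1 / len(g[z]) for z in g[x] & g[y])


def aa(g: Graph, x, y) -> float:
    # A common neighbor has degree >= 2, so the logarithm is positive.
    return sum(1 / math.log(len(g[z])) for z in g[x] & g[y])


def car(g: Graph, x, y) -> float:
    common = g[x] & g[y]
    lcl = sum(len(g[z] & common) for z in common) / 2  # sum of |gamma(z)| / 2
    return len(common) * lcl


def lp(g: Graph, x, y, alpha: float = 0.001) -> float:
    walks3 = sum(len(g[u] & g[y]) for u in g[x])  # entry (x, y) of A^3
    return cn(g, x, y) + alpha * walks3           # entry (x, y) of A^2 + alpha * A^3


g = {
    "x": {"a", "b"}, "y": {"a", "b"},
    "a": {"x", "y", "b", "c"}, "b": {"x", "y", "a"}, "c": {"a"},
}
for index in (cn, ra, aa, car, lp):
    print(index.__name__, index(g, "x", "y"))
```

The global indices (Katz, ACT, Cos+) need a matrix inverse or pseudo-inverse and are left out of this sketch.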

Data
To test the effectiveness of the proposed PIC index, twelve different real networks and three synthetic dynamic networks (randomly generated by the BA model at different scales, denoted as SD-1, SD-2 and SD-3, respectively) are used in our experiments. The twelve real networks are introduced as follows: (i) AIDS-Blog (AIDS) [33]: a citation network among blogs related to AIDS, patients, and their support networks. (ii) Food Web of Florida Bay ecosystem (FWFB) [34]: the network of carbon exchanges occurring during the wet season in Florida Bay. (iii) Food Web of Everglades ecosystem (FWEW) [35]: [...] Table 1. Each original dataset is randomly divided into a training set, which contains 90% of the links, and a probe set, which contains the remaining 10%.
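The 90%/10% division can be sketched as below. This is a minimal sketch under the assumption that the network is given as a list of undirected edges; the name `split_links` and the seed handling are illustrative, and the sketch does not try to keep the training network connected.

```python
import random
from typing import Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def split_links(edges: List[Edge], probe_fraction: float = 0.1,
                seed: int = 0) -> Tuple[List[Edge], List[Edge]]:
    """Randomly divide the observed links into a training set and a probe set."""
    rng = random.Random(seed)
    shuffled = list(edges)          # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    n_probe = round(len(shuffled) * probe_fraction)
    return shuffled[n_probe:], shuffled[:n_probe]


edges = [(i, i + 1) for i in range(20)]
train, probe = split_links(edges)
print(len(train), len(probe))
```

Repeating the split with different seeds gives the independent realizations over which results can be averaged.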

AUC Results
First, we explore the AUC results of the PIC index with different values of β on the 15 datasets; each result is the average of 20 realizations. As the parameter β changes, the AUC values vary continuously for all 15 datasets, as shown in Figure 6. For most of the datasets, the AUC values of the PIC index are very high when 0 ≤ β < 1 (except FWFB, FWEW and Email). For some datasets, the AUC reaches its maximum when β is equal to or very close to zero, which indicates that link establishment in these networks depends more on the information channels. However, for other datasets such as FWFB, FWEW, CE, Email, Flight and Yeast, the AUC reaches its maximum when β is far greater than 1, which indicates that link establishment in these networks depends more on the information capacity of the information channels.

Table 2 compares the AUC values of the PIC index and the eight mainstream similarity indices. PIC-Max is the maximum AUC value of the PIC index, and PIC-0.9 denotes the AUC value with β = 0.9. In 14 out of 15 networks, the AUC value of the PIC index is the highest; it is lower only than that of Cos+ in the Flight network. Because it considers only the number of common neighbors between endpoints, CN gets the lowest AUC value for most networks. Common-neighbor-based indices such as AA and CAR perform better than CN, and even better than the global indices in some networks. For all the networks except SD-3, the AUC value of RA is generally higher than that of CN. Evidently, describing the resource transmission through common neighbors, as resource allocation does, achieves better prediction results than directly counting the common neighbors (CN). It also indicates that different common neighbors contribute differently to similarity in most complex networks.

With the longer paths considered, the LP index obtains good performance at the cost of slightly higher complexity. The global indices can achieve better performance than the local indices, especially Katz, which has the highest complexity. However, the AUC values of ACT and Cos+ are lower than expected in the three synthetic dynamic networks, probably because these indices are not suitable for datasets with a power-law degree distribution. Interestingly, for Yeast, the AUC values of Katz, Cos+ and PIC are the same. This shows that path information beyond the third order has little effect on the probability that a link exists between nodes in this network, and the AUC values of these indices are very similar because their different coupling calculations over these special local topological structures behave alike (the average path length of the network is long, but the clustering coefficient is high). In addition, the parameters of these indices have little effect on the result: even at small parameter values the AUC is already high and remains stable (see Yeast in Figure 6).
As can be seen, by considering all the information channels and their information capacity, the PIC index can perform even better than these mainstream baselines in both real networks and synthetic dynamic networks. In many networks, such as AIDS, FWFB, FWEW, Hamster, Figeys, UcSocial, SD-1, SD-2 and SD-3, the AUC of the PIC index is significantly higher than that of the other methods. Compared with the local indices, the performance of the PIC index is increased by 2% to 68% under the AUC metric, while compared with the global indices, the performance is improved by up to about 2.18 times (the AUC value of Cos+ in SD is very small). Overall, the average improvement rate of the PIC index is about 12.25% compared with these baselines, with a maximum improvement rate of 68%. Furthermore, the higher AUC results of the PIC index show that the potential capacity of information transmission among nodes can represent the similarity between nodes to some extent. In addition, we recommend setting the parameter β to around 0.9 for the PIC index under the AUC metric in real prediction tasks; with this setting, most of the AUC values are still higher than those of the other indices (see PIC-0.9 in Table 2).

Precision Results
To test the effectiveness of the PIC index further, the standard metric precision is used to measure the prediction accuracy from another perspective. Figure 7 shows the precision results of the 15 datasets as β changes. As with the AUC results, the precision values of the PIC index are very high when 0 ≤ β < 1 for most networks (except FWFB, CE, Email and UcSocial). For some datasets, the precision reaches its maximum when β is around 0, which indicates that the establishment of the top-L predicted links in these networks depends more on the information channels. However, for other datasets such as FWFB, CE, Email, PB, UcSocial, Flight and Yeast, the precision reaches its maximum when β is far greater than 1, which indicates that the establishment of the top-L predicted links in these networks depends more on the information capacity of the information channels.

Table 3 compares the precision of the PIC index and the eight mainstream baselines. PIC-Max is the maximum precision value of the PIC index, and PIC-0.4 denotes the precision value with β = 0.4. For all 15 datasets, PIC-Max obtains the best performance under the precision metric. Similarly, in 14 out of 15 networks, PIC-0.4 gets the best performance; it is worse only than CAR, LP and Katz in the Flight network. Surprisingly, the precision value of the CN index is higher than that of RA and AA for many datasets, such as FWFB, PB, UcSocial, Flight, Haggle, SD-1, SD-2 and SD-3. For most datasets, LP and Katz, which consider longer paths, achieve better performance than the common-neighbor-based local indices. However, the precision values of ACT and Cos+ are lower than those of all the other indices, probably because they are not suitable for the precision metric. Relative to all the local and global indices, the precision results in most networks (except AIDS, PB, UcSocial, Haggle and SD-1) are significantly improved by the proposed PIC index, and the precision value is increased by 0.11 to 0.91.
Overall, the PIC index can increase the precision value by an average of 0.19 and by a maximum of 0.95 compared to these baselines (because the precision values of ACT and Cos+ were close to 0). The proposed PIC index performs very well on all the datasets under the precision metric, which indicates that the potential information capacity between two endpoints is positively related to the establishment of the top L predicted links. In addition, under the standard metric precision, we recommend setting the parameter β at around 0.4 in real prediction for most of the datasets.

1 In these methods, the adjustable parameter α = 0.001.
2 The adjustable parameter α = 0.01.

Conclusions
Topology-based similarity indices play an important role in predicting missing links of large-scale networks. Motivated by the potential information capacity between two endpoints, a potential information capacity (PIC) index is proposed for link prediction. Based on the Cannikin Law, the information capacity considers the number of paths (the number of planks) and the minimum amount of information transmitted by these paths (the shortest plank). With an adjustable parameter on the information capacity of each channel, the PIC index achieves good performance, and it attains its maximum performance at different parameter values for different networks. For most datasets, the AUC and precision of the PIC index are very close to their maxima when the parameter β is around 0.9 and 0.4, respectively. When the parameter is equal to zero, the PIC index is similar to the PA index. This indicates that the closer to zero the parameter value at which a dataset attains its maximum, the closer the degree distribution of that dataset is to a power law. Owing to its good performance in different types of real networks and its low time complexity, the PIC index can be applied to many real networks, especially large-scale networks. In our future work, we will address how to quantify the information capacity between nodes in directed networks, and then propose an effective link prediction method for directed networks. Furthermore, in information networks, technology networks and other related networks, transmission nodes are subject to attack and failure during real information transmission [45,46], and transmission delay commonly varies with topology [47]. Therefore, re-modeling the information capacity between nodes with these factors taken into account will provide a new idea for link prediction.