# ALPINE: Active Link Prediction Using Network Embedding


## Abstract


## 1. Introduction

**max-deg.**, **max-prob.**, and **max-ent.**, which are defined in Section 4. Nodes mentioned in the table are highlighted with character names in Figure 1. Strategy **max-deg.** suggests querying the relationships between Harry and the high-degree nodes—those who are known to have many allies. Strategy **max-prob.** selects nodes that are highly likely to be Harry's friends based on the observed part of the network. Finally, **max-ent.** proposes to query the most uncertain relationships. A more detailed discussion of these results and a thorough formal evaluation of ALPINE are left for Section 5, but the reader may agree that the proposed queries are indeed intuitively informative for understanding Harry's social connections.

The **main contributions** of this paper are:

- We proposed the ALPINE framework for actively learning to embed partially observed networks, by identifying the node pairs whose as-yet unobserved link status is estimated to be maximally informative for the embedding algorithm (Section 3).
- To identify the most informative candidate link statuses, we developed several active learning query strategies for ALPINE, including simple heuristics, uncertainty sampling, and principled variance reduction methods based on D-optimality and V-optimality from experimental design (Section 4).
- Through extensive experiments (the source code of this work is available at https://github.com/aida-ugent/alpine_public, accessed on 17 October 2020), (1) we showed that CNE adapted for partially observed networks was more accurate for link prediction and more time efficient than when considering unobserved link statuses as unlinked (as most state-of-the-art embedding methods do), and (2) we studied the behaviors of different query strategies under the ALPINE framework both qualitatively and quantitatively (Section 5).

## 2. Background

#### 2.1. Active Learning and Experimental Design

#### 2.2. Network Embedding and Link Prediction

#### 2.3. Related Work

## 3. The ALPINE Framework

#### 3.1. Network Embedding for Partially Observed Networks

**Definition 1.**

#### 3.2. Active Link Prediction Using Network Embedding—The Problem

**Problem 1.** Given a partially observed network $\mathcal{G}=(V,E,D)$, a network embedding model, a budget k, a query-pool $P\subseteq U$, and a target set $T\subseteq U$ containing all node pairs for which the link statuses are of primary interest, how can we select k node pairs from the pool P such that, after querying the link status of these node pairs, adding them to the respective set E or D depending on the status, and retraining the model, the link predictions made by the network embedding model for the target set T are as accurate as possible?

#### 3.3. The ALPINE Framework

- At iteration $i=0$, initialize the pool as ${P}_{0}=P$, and the set of node pairs with known link status as ${K}_{0}=E\cup D$, and initialize ${\mathcal{G}}_{0}=\mathcal{G}$ and ${\mathit{A}}_{0}=\mathit{A}$;
- Then, repeat:
- Compute the optimal embedding ${\mathit{X}}_{i}^{*}$ for ${\mathcal{G}}_{i}$;
- Find the set of queries ${Q}_{i}\subseteq {P}_{i}$ of size $|{Q}_{i}|=min(s,k)$ with the largest utilities according to ${u}_{{\mathit{A}}_{i},{\mathit{X}}_{i}^{*}}$ (and T);
- Query the oracle for the link statuses of node pairs in ${Q}_{i}$, set ${P}_{i+1}\leftarrow {P}_{i}\backslash {Q}_{i}$, and set ${\mathcal{G}}_{i+1}$ equal to ${\mathcal{G}}_{i}$ with node pairs $\{i,j\}\in {Q}_{i}$ added to the set of known linked or unlinked node pairs (depending on the query result), then set ${\mathit{A}}_{i+1}$ accordingly;
- Set $k\leftarrow k-|{Q}_{i}|$, and break if k is zero.
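The loop above can be sketched in a few lines of Python; `fit_embedding`, `utility`, and `query_oracle` are hypothetical stand-ins for the embedding optimizer, a query strategy from Section 4, and the labeling oracle, respectively.

```python
def alpine(P, E, D, k, s, fit_embedding, utility, query_oracle):
    """Active link prediction loop (sketch): query the s most useful
    pairs per iteration until the budget k is exhausted."""
    P = set(P)
    while k > 0 and P:
        X = fit_embedding(E, D)                          # optimal embedding X_i*
        ranked = sorted(P, key=lambda pair: utility(pair, X), reverse=True)
        Q = ranked[:min(s, k)]                           # batch of queries
        for pair in Q:
            linked = query_oracle(pair)                  # ask the oracle
            (E if linked else D).add(pair)               # grow the known statuses
        P -= set(Q)
        k -= len(Q)
    return fit_embedding(E, D)                           # final retrained model
```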

## 4. Query Strategies for ALPINE

The first four strategies (**page-rank.**, **max-deg.**, **max-prob.**, and **min-dis.**) are heuristic approaches that use the node degree and link probability information. The fifth (**max-ent.**) is uncertainty sampling, and the last two (**d-opt.** and **v-opt.**) are based on variance reduction. These last three strategies are directly inspired by the active learning and experimental design literature for classical prediction problems (regression and classification). From the utility functions in the last column of the table, we see that the first two strategies do not depend on the embedding model, while the other five are all embedding based. Only for the last strategy (**v-opt.**) is the utility function a function of the target set T.

#### 4.1. Heuristics

Nodes with many observed links are influential in the network, which motivates **page-rank.** and **max-deg.** Meanwhile, as networks are often sparse, queries that result in the discovery of new links—rather than of non-links—are considered more informative; this idea leads to the **max-prob.** and **min-dis.** strategies.

For **page-rank.**, each candidate node pair is evaluated as the sum of both nodes' PageRank scores [31]: ${u}_{\mathit{A}}(i,j)={\mathrm{PR}}_{i}+{\mathrm{PR}}_{j}$, while for **max-deg.**, the utility is defined as the sum of the degrees: ${u}_{\mathit{A}}(i,j)={\sum}_{k:\{i,k\}\in E}{a}_{ik}+{\sum}_{l:\{j,l\}\in E}{a}_{jl}$. The probability-based strategies both tend to query node pairs that are highly likely to be linked. This is true by definition for **max-prob.**: ${u}_{\mathit{A},{\mathit{X}}^{*}}(i,j)=P({a}_{ij}=1|{\mathit{X}}^{*})$, and approximately true for **min-dis.**: ${u}_{\mathit{A},{\mathit{X}}^{*}}(i,j)=-\|{\mathit{x}}_{i}^{*}-{\mathit{x}}_{j}^{*}{\|}_{2}$, as nearby nodes in the embedding space are typically linked with a higher probability.
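As an illustration, two of these utilities can be computed directly from an observed adjacency matrix and an embedding matrix; this NumPy sketch assumes `A_obs` holds the observed links (zero elsewhere) and `X` stores one embedding vector per row.

```python
import numpy as np

def u_max_deg(i, j, A_obs):
    """max-deg.: sum of the observed degrees of nodes i and j."""
    return A_obs[i].sum() + A_obs[j].sum()

def u_min_dis(i, j, X):
    """min-dis.: negative Euclidean distance between the embeddings,
    so that nearby (and hence likely linked) pairs score highest."""
    return -np.linalg.norm(X[i] - X[j])
```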

#### 4.2. Uncertainty Sampling

Uncertainty sampling yields the **max-ent.** strategy, which selects the most uncertain candidate link status to be labeled by the oracle. Intuitively, knowing the most uncertain link status maximally reduces the total amount of uncertainty in the unobserved part, although indirect benefits of the queried link status on the model's ability to predict the status of other node pairs are not accounted for.
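A minimal sketch of the **max-ent.** utility, assuming the model supplies the link probability $P({a}_{ij}=1|{\mathit{X}}^{*})$ as a number `p`:

```python
import numpy as np

def u_max_ent(p):
    """max-ent.: binary entropy of the predicted link probability;
    maximal at p = 0.5, zero at p = 0 or p = 1."""
    p = np.clip(p, 1e-12, 1 - 1e-12)   # guard against log(0)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))
```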

#### 4.3. Variance Reduction

#### 4.3.1. The Fisher Information for the Modified CNE

#### 4.3.2. Parameter Variance Reduction with D-Optimality

The **d-opt.** strategy seeks to minimize these bounds by maximizing the Fisher information, following the D-optimality criterion from experimental design. The utility of **d-opt.** for a node pair $\{i,j\}\in P$ is formally defined as the sum of the two per-node contributions (cf. Table 2): ${u}_{\mathit{A},{\mathit{X}}^{*}}(i,j)={u}_{{\mathit{x}}_{i}^{*}}(i,j)+{u}_{{\mathit{x}}_{j}^{*}}(i,j)$.
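To give a flavor of the D-optimality computation, the sketch below scores a candidate query by how much it would increase the log-determinant of a node's Fisher information, under the illustrative assumption that the query contributes a rank-one term $\mathit{g}{\mathit{g}}^{\top}$; the matrix determinant lemma reduces this to a single linear solve.

```python
import numpy as np

def logdet_gain(I_node, g):
    """Increase in log det(I) when a query adds the rank-one term
    g g^T to a node's Fisher information I_node, via the matrix
    determinant lemma: det(I + g g^T) = det(I) * (1 + g^T I^{-1} g)."""
    return np.log1p(g @ np.linalg.solve(I_node, g))
```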

#### 4.3.3. Prediction Variance Reduction with V-Optimality

The goal of the **v-opt.** strategy is to minimize the link prediction variance for the target set T. This is done in two steps:

- First, generate the expression of the prediction variance;
- Then, define the query strategy as the utility function that quantifies the variance reduction.

The remainder of this subsection derives the utility function of **v-opt.** and proves a theorem for computing it.

**Definition 2.**

**Theorem 1.**

## 5. Experiments and Discussion

- **Q1**: What is the impact of distinguishing an “observed unlinked” from an “unobserved” status of a node pair for partial network embedding?
- **Q2**: Do the proposed active learning query strategies for ALPINE make sense qualitatively?
- **Q3**: How do the different active learning query strategies for ALPINE perform quantitatively?
- **Q4**: How can the query strategies be applied best according to the results?

**Data:** We used eight real-world networks of varying sizes in the experiments:

- **Polbooks** is a network of 105 books about U.S. politics, among which 441 connections indicate co-purchasing relations [40];
- **C. elegans** is a neural network of C. elegans with 297 neurons and 2148 synapses as the links [41];
- **USAir** is a network of 332 airports connected through 2126 airlines [42];
- **MP_cc** is a Twitter network we gathered in April 2019 for the Members of Parliament (MPs) in the U.K., which originally contained 650 nodes. We only used its largest connected component of 567 nodes and 49,631 friendship (i.e., mutual follow) connections;
- **Polblogs_cc** is the largest connected component of the U.S. Political Blogs Network [40], containing 1222 nodes and 16,714 undirected hyperlink edges;
- **PPI** is a protein–protein interaction network with 3890 proteins and 76,584 interactions [43]; we used its largest connected component **PPI_cc** of 3852 nodes and 37,841 edges after deleting the self-loops;
- **Blog** is a friendship network of 10,312 bloggers from BlogCatalog, containing 333,983 links [44].

#### 5.1. The Benefit of Partial Network Embedding

- CNE: fit the entire network where the unobserved link status is treated as unlinked;
- CNE_K: fit the model only for the observed linked and unlinked node pairs.

**Setup:** To construct a PON, we first initialized the observed information by randomly sampling a node pair set ${K}_{0}=E\cup D$ that contained a proportion ${r}_{0}$ of the complete information, where the complete information means the link statuses of all $n(n-1)/2$ node pairs in a network with n nodes. For example, ${r}_{0}=10\%$ means that $10\%$ of the network link statuses are observed as either linked or unlinked: $|{K}_{0}|=10\%\cdot n(n-1)/2$. The observed ${K}_{0}$ was guaranteed to be connected, as this is a common assumption for network embedding methods. Then, we embedded the same PON using both CNE and CNE_K on a machine with an Intel Core i7 CPU 4.20 GHz and 32 GB RAM.
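The PON construction can be sketched as follows; the function assumes a connected input network on nodes $0,\dots,n-1$, and because a BFS spanning tree of the true graph is always included (keeping the observed links connected), ${K}_{0}$ contains at least $n-1$ pairs even for tiny ${r}_{0}$.

```python
import random
from collections import deque

def sample_pon(edges, n, r0, seed=0):
    """Sample an observed pair set K0 covering a fraction r0 of all
    n*(n-1)/2 link statuses (sketch). A BFS spanning tree of the true
    graph is included first so the observed links form a connected graph."""
    rng = random.Random(seed)
    E_all = {frozenset(e) for e in edges}
    adj = {i: set() for i in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    K0, seen, queue = set(), {0}, deque([0])
    while queue:                       # BFS tree rooted at node 0
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
                K0.add(frozenset((u, v)))
    target = int(r0 * n * (n - 1) / 2)
    pool = [frozenset((i, j)) for i in range(n) for j in range(i + 1, n)]
    rng.shuffle(pool)
    for pair in pool:                  # top up with random pair statuses
        if len(K0) >= target:
            break
        K0.add(pair)
    E0 = K0 & E_all                    # observed linked pairs
    return E0, K0 - E0                 # and observed unlinked pairs
```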

**Results:** From the results shown in Figure 2, we see that CNE_K was not only more time efficient, but also provided more accurate link predictions. The ratio ${r}_{0}$ of observed information varied across datasets because the larger the network, the more time consuming the computations; the time differences for a small observed part were already enough to highlight the time efficiency of CNE_K. The two measures examined were: AUC_U—the prediction AUC score for all unobserved node pairs $(i,j)\in U$, containing $1-{r}_{0}$ of the network information; and t(s)—the model fitting time in seconds. Both values are averaged: for each ${r}_{0}$, over 10 different PONs with 10 different embeddings each (as CNE has local optima) for the first four datasets, over $5\times 5$ runs for the fifth and sixth, and over $3\times 3$ runs for the last dataset.

#### 5.2. Qualitative Evaluation of ALPINE

Table 3 lists the top suggestions of three further strategies: **page-rank.**, **d-opt.**, and **v-opt.**; **min-dis.** was omitted as it approximates **max-prob.**

Similar to **max-deg.** and **max-prob.**, the strategies **page-rank.** and **d-opt.** had the same top three suggestions: Hermione, Ron, and Albus, who are essential allies of Harry. Knowing whether Harry is linked to them gives a clear big picture of his social relations. The results can be analyzed further according to the strategy definitions.

**page-rank.**, like **max-deg.**, aims to find out Harry's relationships with the influencers—nodes that are observed to have many links. With this type of strategy, we learn which influencers Harry is close to, as well as his potential allies connecting to them, and conversely for the influencers he is unlinked to.

The **d-opt.** strategy selects queries based on parameter variance reduction. It implies that by knowing whether Harry is linked to the suggested nodes, the node embeddings will have a smaller variance, so that the entire embedding space is more stable and the link predictions are more reliable. For example, Severus, who ranks fourth here (also fourth with **max-ent.**), was not an obvious ally of Harry, but he helps secretly and is essential in shaping the network structure. The suggestions were considered uncertain and contributed to the reduction of the parameter variance.

The **v-opt.** strategy quantifies the informativeness of the unobserved link statuses by the amount of estimated prediction variance reduction they cause. It suggests that Harry's relationships to the Weasley family are informative for minimizing the prediction variance for him, which makes sense as this family is well connected with Rubeus, who is Harry's known ally, and also connects well with other nodes. As for Fluffy, it was observed to be connected only to Rubeus and unlinked to all other nodes except Harry. Knowing whether Fluffy and Harry are linked thus greatly reduces the variance of the prediction for this pair, because there was no other information for it.

#### 5.3. Quantitative Evaluation of ALPINE

**Setup:**As before, we constructed a PON by randomly initializing the observed node pair set ${K}_{0}$ with a given ratio ${r}_{0}$, while making sure ${K}_{0}$ was connected. Then, we applied ALPINE with different query strategies for a budget k and a step size s. More specifically, we investigated three representative different cases depending on the pool P and the target set T:

- $P=U$ and $T=U$: all the unobserved information was accessible, and we were interested in knowing all link statuses in U;
- $P\subset U$ and $T=U$: only part of U was accessible, and we still wanted to predict the entire U as accurately as possible;
- $P\subset U$, $T\subset U$, and $T\cap P=\varnothing $: only part of U was accessible, and we were interested in predicting a different set of unobserved link statuses that was inaccessible.
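For the third case, splitting the unobserved pairs U into a disjoint query pool and target set can be sketched as:

```python
import random

def split_pool_target(U, pool_frac, seed=0):
    """Case 3 setup sketch: partition the unobserved pairs U into a
    query pool P and a disjoint target set T (P and T partition U)."""
    rng = random.Random(seed)
    U = list(U)
    rng.shuffle(U)
    cut = int(pool_frac * len(U))
    return set(U[:cut]), set(U[cut:])
```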

#### 5.3.1. Case 1: $P=U$ and $T=U$

We compared the active strategies against the passive, random **rand.** strategy. We saw that when the observed part was relatively small—$3\%$ or $10\%$—the degree-related strategies that do not depend on the embedding usually performed very well, and the random strategy was not always the worst. As more information was observed with increasing ${r}_{0}$ (see the plots from left to right in each row), the active learning strategies, such as **v-opt.** and **max-ent.**, began to dominate and passive learning became the worst, while the starting $AUC\_U$ also increased. Zooming in on individual subplots, we saw that ALPINE boosted link prediction accuracy with far fewer queries for the active learning strategies, compared to passive learning. Overall, when the observed information was very limited, the embedding-independent strategies **page-rank.** and **max-deg.** outperformed the others, while given sufficient information, **v-opt.** and **max-ent.** were the better choices. We speculate that this is because the embedding must be of sufficient quality for the embedding-dependent strategies to work, which requires a certain minimum amount of data. Worth noticing is that **d-opt.** showed similar performance across different values of ${r}_{0}$, which is discussed further in Section 5.4.

For HALLP, whose uncertainty component is related to **max-ent.**, the performance was very variable. In some cases it performed quite well, as shown in the top right subplot, beating **v-opt.** in the first few iterations, while on the MP_cc network it was one of the worst strategies. In addition, the runtime of HALLP was much longer than that of the other strategies; thus, some of the subplots do not include the HALLP result. The runtime analysis for one iteration of the query process, on a server with an Intel Xeon Gold CPU 3.00 GHz and 256 GB RAM, is shown in Table 4 below. The results were averaged in the same way as in Figure 4 for the four values of ${r}_{0}$ and then further averaged over the ${r}_{0}$ values. Across the datasets, HALLP was by far the most computationally expensive strategy, as it had to run two link predictors.

#### 5.3.2. Case 2: $P\subset U$ and $T=U$

In this case, **page-rank.**, **max-deg.**, and **d-opt.** were the winning group for the first two ${r}_{0}$ values. However, in the third and fourth subplots in the same row, **v-opt.**, **max-ent.**, and **d-opt.** performed best. The **d-opt.** strategy remained one of the top strategies across the different percentages of observed information.

#### 5.3.3. Case 3: $P\subset U$, $T\subset U$, and $T\cap P=\varnothing $

One might expect **v-opt.** to perform best in this case, because it was the only strategy that explicitly considered T. Although it indeed performed quite well in some subplots, especially in the first iteration, the quality of the embedding affected its performance. Indeed, as observed before, the reliability of all embedding-based strategies depended largely on how well the network was embedded, which improved markedly as ${r}_{0}$ increased. Overall, the embedding-independent strategies—**page-rank.** and **max-deg.**—had the top performance when ${r}_{0}$ was small, and the embedding-based strategies became increasingly competitive as more information was observed.

#### 5.3.4. Evaluations on Two Larger Networks

On the two larger networks, we excluded **v-opt.** and HALLP, as they were computationally too expensive. The AUC scores were averaged over five sets of random initial ${K}_{0}$, P, and T, each with five initial embeddings. The last column looks bumpy because the score was already very high, so small randomness in the embedding could cause a slight difference. Figure 8 shows the results for the second and third case with three values of ${r}_{0}$. Case 1 was omitted because embedding the Blog network with a large observed part was already quite expensive, and evaluating all the unobserved candidates when ${r}_{0}$ was small made it computationally too demanding.

#### 5.4. Discussion

Among the seven query strategies, **page-rank.** and **max-deg.** did not depend on the network embedding, while the other five were embedding based. Thus, with limited observed information, where the network embedding may be of poor quality, **page-rank.** and **max-deg.** were seen to outperform the others.

With more observed information, **max-ent.** and **v-opt.** had the top performance, but **d-opt.** showed a more stable high performance across different values of ${r}_{0}$. Based on these observations, we recommend a mixed strategy that starts with the degree-related strategies and then switches to the embedding-based ones.
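The recommended mixed strategy amounts to a simple switch on ${r}_{0}$; the threshold below is purely illustrative, not a value from the experiments.

```python
def pick_strategy(r0, switch_at=0.3):
    """Mixed strategy sketch: a degree-based heuristic while the
    observed fraction r0 is small, an embedding-based strategy after.
    The 0.3 threshold is illustrative only."""
    return "max-deg." if r0 < switch_at else "v-opt."
```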

All strategies, including **rand.**, required a similar computing time given the same size of P. A notable exception is **v-opt.**, which was computationally more expensive. Yet, given a sufficiently accurate network embedding model (see, e.g., the last columns of Figure 4, Figure 5 and Figure 6), **v-opt.** was almost always the most accurate, especially in the first few iterations. Thus, when the cost of querying is high compared to the cost of computation, **v-opt.** is preferable as soon as enough data are available for the embedding to be sufficiently accurate. If computational cost is a bottleneck, **max-ent.** and **d-opt.** are computationally less expensive substitutes for **v-opt.**, with comparable accuracies.

## 6. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## Appendix A. The Observed Fisher Information Matrix

**Full Hessian:**When considering the entire embedding matrix $\mathit{X}$, its Fisher information is its full Hessian $\mathit{H}\in {\mathbb{R}}^{nd\times nd}$ consisting of $n\times n$ blocks of size $d\times d$. The diagonal blocks ${\mathcal{I}}_{ii}\left(\mathit{X}\right)=\mathcal{I}\left({\mathit{x}}_{i}\right)$ and the off-diagonal blocks ${\mathcal{I}}_{ij}\left(\mathit{X}\right)$ are defined as:

## Appendix B. The Prediction Variance

## References

1. Cui, P.; Wang, X.; Pei, J.; Zhu, W. A Survey on Network Embedding. IEEE Trans. Knowl. Data Eng. **2019**, 31, 833–852.
2. Liben-Nowell, D.; Kleinberg, J. The Link-Prediction Problem for Social Networks. J. Am. Soc. Inf. Sci. Technol. **2007**, 58, 1019–1031.
3. Martínez, V.; Berzal, F.; Cubero, J.C. A Survey of Link Prediction in Complex Networks. ACM Comput. Surv. **2016**, 49, 1–33.
4. Mara, A.C.; Lijffijt, J.; De Bie, T. Benchmarking Network Embedding Models for Link Prediction: Are We Making Progress? In Proceedings of the 7th IEEE International Conference on Data Science and Advanced Analytics, Sydney, Australia, 6–9 October 2020; pp. 138–147.
5. Zhao, Y.; Wu, Y.J.; Levina, E.; Zhu, J. Link Prediction for Partially Observed Networks. J. Comput. Graph. Stat. **2017**, 26, 725–733.
6. Cesa-Bianchi, N.; Gentile, C.; Vitale, F.; Zappella, G. A Linear Time Active Learning Algorithm for Link Classification. In Proceedings of the 26th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; Volume 25.
7. Ostapuk, N.; Yang, J.; Cudré-Mauroux, P. ActiveLink: Deep Active Learning for Link Prediction in Knowledge Graphs. In Proceedings of the 2019 World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 1398–1408.
8. Jia, J.; Schaub, M.T.; Segarra, S.; Benson, A.R. Graph-based Semi-Supervised & Active Learning for Edge Flows. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 761–771.
9. Chen, K.J.; Han, J.; Li, Y. HALLP: A Hybrid Active Learning Approach to Link Prediction Task. J. Comput. **2014**, 9, 551–556.
10. Kang, B.; Lijffijt, J.; De Bie, T. Conditional Network Embeddings. In Proceedings of the 7th International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
11. Evans, C.; Friedman, J.; Karakus, E.; Pandey, J. PotterVerse. 2014. Available online: https://github.com/efekarakus/potter-network (accessed on 31 March 2019).
12. Brinker, K. Incorporating Diversity in Active Learning with Support Vector Machines. In Proceedings of the 20th International Conference on Machine Learning, Washington, DC, USA, 21–24 August 2003; pp. 59–66.
13. Cai, H.; Zheng, V.W.; Chang, K.C.C. Active Learning for Graph Embedding. arXiv **2017**, arXiv:1705.05085.
14. Settles, B. Active Learning Literature Survey; Technical Report; University of Wisconsin: Madison, WI, USA, 2009.
15. Atkinson, A.; Donev, A.; Tobias, R. Optimum Experimental Designs, with SAS; Oxford University Press: Oxford, UK, 2007.
16. Cohn, D.A.; Ghahramani, Z.; Jordan, M.I. Active Learning with Statistical Models. J. Artif. Intell. Res. **1996**, 4, 129–145.
17. Zhang, T.; Oles, F.J. A Probability Analysis on the Value of Unlabeled Data for Classification Problems. In Proceedings of the 17th International Conference on Machine Learning, Stanford, CA, USA, 29 June–2 July 2000; pp. 1191–1198.
18. Perozzi, B.; Al-Rfou, R.; Skiena, S. DeepWalk: Online Learning of Social Representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014; pp. 701–710.
19. Grover, A.; Leskovec, J. node2vec: Scalable Feature Learning for Networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 855–864.
20. Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of the 5th International Conference on Learning Representations, Toulon, France, 24–26 April 2017.
21. Chen, X.; Yu, G.; Wang, J.; Domeniconi, C.; Li, Z.; Zhang, X. ActiveHNE: Active Heterogeneous Network Embedding. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019; pp. 2123–2129.
22. Yang, Z.; Cohen, W.; Salakhudinov, R. Revisiting Semi-Supervised Learning with Graph Embeddings. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 40–48.
23. Aggarwal, C.C.; Kong, X.; Gu, Q.; Han, J.; Philip, S.Y. Active Learning: A Survey. In Data Classification: Algorithms and Applications; CRC Press: Boca Raton, FL, USA, 2014; pp. 571–605.
24. Kong, X.; Fan, W.; Yu, P.S. Dual Active Feature and Sample Selection for Graph Classification. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, 21–24 August 2011; pp. 654–662.
25. Bilgic, M.; Mihalkova, L.; Getoor, L. Active Learning for Networked Data. In Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel, 21–24 June 2010; pp. 79–86.
26. Cesa-Bianchi, N.; Gentile, C.; Vitale, F.; Zappella, G. Active Learning on Trees and Graphs. arXiv **2013**, arXiv:1301.5112.
27. Guillory, A.; Bilmes, J.A. Label Selection on Graphs. In Proceedings of the 23rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 7–10 December 2009; Volume 22.
28. Yang, Z.; Tang, J.; Zhang, Y. Active Learning for Streaming Networked Data. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, Shanghai, China, 3–7 November 2014; pp. 1129–1138.
29. Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. **1995**, 20, 273–297.
30. Clauset, A.; Moore, C.; Newman, M.E. Hierarchical structure and the prediction of missing links in networks. Nature **2008**, 453, 98–101.
31. Page, L.; Brin, S.; Motwani, R.; Winograd, T. The PageRank Citation Ranking: Bringing Order to the Web; Technical Report; Stanford InfoLab: Stanford, CA, USA, 1999.
32. Smith, K. On the Standard Deviations of Adjusted and Interpolated Values of an Observed Polynomial Function and its Constants and the Guidance they give Towards a Proper Choice of the Distribution of Observations. Biometrika **1918**, 12, 1–85.
33. Welch, W.J. Computer-Aided Design of Experiments for Response Estimation. Technometrics **1984**, 26, 217–224.
34. Liu, S.; Neudecker, H. A V-optimal design for Scheffé’s polynomial model. Stat. Probab. Lett. **1995**, 23, 253–258.
35. Lehmann, E.L.; Casella, G. Theory of Point Estimation; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2006.
36. Rao, C.R. Information and the Accuracy Attainable in the Estimation of Statistical Parameters. In Breakthroughs in Statistics; Springer: Berlin/Heidelberg, Germany, 1992; pp. 235–247.
37. Efron, B.; Hinkley, D.V. Assessing the Accuracy of the Maximum Likelihood Estimator: Observed Versus Expected Fisher Information. Biometrika **1978**, 65, 457–483.
38. Harville, D.A. Matrix Algebra From a Statistician’s Perspective. Technometrics **1998**, 40, 164.
39. Sherman, J.; Morrison, W.J. Adjustment of an Inverse Matrix Corresponding to a Change in One Element of a Given Matrix. Ann. Math. Stat. **1950**, 21, 124–127.
40. Adamic, L.A.; Glance, N. The Political Blogosphere and the 2004 U.S. Election: Divided They Blog. In Proceedings of the 3rd International Workshop on Link Discovery, Chicago, IL, USA, 21–25 August 2005; pp. 36–43.
41. Watts, D.J.; Strogatz, S.H. Collective dynamics of ’small-world’ networks. Nature **1998**, 393, 440–442.
42. Handcock, M.S.; Hunter, D.R.; Butts, C.T.; Goodreau, S.M.; Morris, M. statnet: An R Package for the Statistical Modeling of Social Networks. 2003. Available online: http://www.Csde.Washington.Edu/Statnet (accessed on 22 November 2018).
43. Breitkreutz, B.J.; Stark, C.; Reguly, T.; Boucher, L.; Breitkreutz, A.; Livstone, M.; Oughtred, R.; Lackner, D.H.; Bähler, J.; Wood, V.; et al. The BioGRID Interaction Database: 2008 update. Nucleic Acids Res. **2007**, 36, D637–D640.
44. Zafarani, R.; Liu, H. Social Computing Data Repository at ASU. 2009. Available online: http://socialcomputing.asu.edu (accessed on 22 November 2018).
45. Zhang, M.; Chen, Y. Link Prediction Based on Graph Neural Networks. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018; Volume 31.
46. Li, P.; Wang, Y.; Wang, H.; Leskovec, J. Distance Encoding: Design Provably More Powerful Neural Networks for Graph Representation Learning. In Proceedings of the 34th International Conference on Neural Information Processing Systems, Virtual, 6–12 December 2020; Volume 33, pp. 4465–4478.
47. Lin, W.; Ji, S.; Li, B. Adversarial Attacks on Link Prediction Algorithms Based on Graph Neural Networks. In Proceedings of the 15th ACM Asia Conference on Computer and Communications Security, Taipei, Taiwan, 5–9 October 2020; pp. 370–380.

**Figure 1.** Harry Potter network with suggestions from Table 1 highlighted.

**Figure 3.** Harry Potter network with suggestions from Table 3 highlighted.

**Figure 4.** ALPINE: $P=U$ and $T=U$. Row 1: Polbooks ($s=54$); Row 2: C. elegans ($s=439$); Row 3: USAir ($s=549$); Row 4: MP_cc ($s=1604$); Row 5: Polblogs_cc ($s=7460$).

**Figure 5.** ALPINE: $P\subset U$ and $T=U$. Row 1: Polbooks ($s=54$); Row 2: C. elegans ($s=439$); Row 3: USAir ($s=549$); Row 4: MP_cc ($s=1604$); Row 5: Polblogs_cc ($s=7460$).

**Figure 6.** ALPINE: $P\subset U$, $T\subset U$, and $T\cap P=\varnothing $. Row 1: Polbooks ($s=54$); Row 2: C. elegans ($s=439$); Row 3: USAir ($s=549$); Row 4: MP_cc ($s=1604$); Row 5: Polblogs_cc ($s=7460$).

**Table 1.** Top five suggested nodes to query for Harry, according to **max-deg.**, **max-prob.**, and **max-ent.**

| Strategy | Max-Deg. | Max-Prob. | Max-Ent. |
|---|---|---|---|
| 1 | Ron Weasley | Ron Weasley | Albus Dumbledore |
| 2 | Albus Dumbledore | Hermione Granger | Grawp |
| 3 | Hermione Granger | Albus Dumbledore | Minerva McGonagall |
| 4 | Ginny Weasley | Grawp | Severus Snape |
| 5 | Sirius Black | Minerva McGonagall | Aragog |

**Table 2.** The seven query strategies and their utility functions.

| Strategy | Definition | Utility Function |
|---|---|---|
| page-rank. | PageRank score sum | ${u}_{\mathit{A}}(i,j)={\mathrm{PR}}_{i}+{\mathrm{PR}}_{j}$ |
| max-deg. | Degree sum | ${u}_{\mathit{A}}(i,j)={\sum}_{k:(i,k)\in E}{a}_{ik}+{\sum}_{l:(j,l)\in E}{a}_{jl}$ |
| max-prob. | Link probability | ${u}_{\mathit{A},{\mathit{X}}^{*}}(i,j)=P({a}_{ij}=1|{\mathit{X}}^{*})$ |
| min-dis. | Node pair distance | ${u}_{\mathit{A},{\mathit{X}}^{*}}(i,j)=-\|{\mathit{x}}_{i}^{*}-{\mathit{x}}_{j}^{*}{\|}_{2}$ |
| max-ent. | Link entropy | ${u}_{\mathit{A},{\mathit{X}}^{*}}(i,j)=-{\sum}_{{a}_{ij}\in\{0,1\}}P({a}_{ij}|{\mathit{X}}^{*})\log P({a}_{ij}|{\mathit{X}}^{*})$ |
| d-opt. | Parameter variance reduction | ${u}_{\mathit{A},{\mathit{X}}^{*}}(i,j)={u}_{{\mathit{x}}_{i}^{*}}(i,j)+{u}_{{\mathit{x}}_{j}^{*}}(i,j)$ |
| v-opt. | Prediction variance reduction | ${u}_{\mathit{A},{\mathit{X}}^{*}}(i,j)={\sum}_{k:(i,k)\in T}{u}^{ik}(i,j)+{\sum}_{l:(j,l)\in T}{u}^{jl}(i,j)$ |

**Table 3.** Top five suggested nodes to query for Harry, according to **page-rank.**, **d-opt.**, and **v-opt.**

| Strategy | Page-Rank. | d-opt. | v-opt. |
|---|---|---|---|
| 1 | Ron Weasley | Hermione Granger | Arthur Weasley |
| 2 | Albus Dumbledore | Ron Weasley | Fluffy |
| 3 | Hermione Granger | Albus Dumbledore | Charlie Weasley |
| 4 | Vincent Crabbe Sr. | Severus Snape | Albus Dumbledore |
| 5 | Neville Longbottom | Ginny Weasley | Ron Weasley |

**Table 4.** Runtime (in seconds) of one iteration of the query process for each strategy.

| Data | Rand. | Page-Rank. | Max-Deg. | Max-Prob. | Min-Dis. | Max-Ent. | d-opt. | v-opt. | HALLP |
|---|---|---|---|---|---|---|---|---|---|
| Polbooks | 0.001 | 0.031 | 0.008 | 0.004 | 0.027 | 0.004 | 0.093 | 0.482 | 12.33 |
| C. elegans | 0.005 | 0.108 | 0.042 | 0.028 | 0.18 | 0.029 | 0.675 | 5.469 | 148.4 |
| USAir | 0.006 | 0.134 | 0.052 | 0.035 | 0.231 | 0.036 | 0.881 | 7.309 | 232.6 |
| MP_cc | 0.016 | 0.707 | 0.16 | 0.117 | 0.693 | 0.125 | 2.746 | 28.90 | 1074 |
| Polblogs_cc | 0.092 | 1.153 | 0.707 | 0.675 | 3.264 | 0.709 | 12.31 | 226.0 | 12022 |



## Share and Cite

**MDPI and ACS Style**

Chen, X.; Kang, B.; Lijffijt, J.; De Bie, T.
ALPINE: Active Link Prediction Using Network Embedding. *Appl. Sci.* **2021**, *11*, 5043.
https://doi.org/10.3390/app11115043
