Multi-View Network Representation Learning Algorithm Research

Abstract: Network representation learning is a key research field in network data mining. In this paper, we propose a novel multi-view network representation algorithm (MVNR), which embeds multi-scale relations of network vertices into a low-dimensional representation space. In contrast to existing approaches, MVNR explicitly encodes higher-order information using k-step networks. In addition, we introduce the matrix forest index as a network feature, which is used to balance the representation weights of the different network views. We also examine the relationship between MVNR and several well-known algorithms, including DeepWalk, node2vec and GraRep. We conduct experiments on several real-world citation datasets and demonstrate that MVNR outperforms several recent approaches based on neural matrix factorization. Specifically, we demonstrate the effectiveness of MVNR on network classification, visualization and link prediction tasks.


Introduction
Network representation learning aims to learn low-dimensional, compressed and dense distributed representation vectors for various kinds of networks. It can be regarded as a network encoding task in which neighboring vertices lie close to each other in the low-dimensional representation space.
DeepWalk [1] is the representative algorithm of network representation learning, and it incorporates higher-order network information through multi-step random walks. In this respect, WALKLETS [2] has shown that multi-step random walks can encode higher-order features into the network representations. Moreover, in sparse networks, higher-order network representation learning can extract valuable features from the existing structural information alone. In addition, the essence of Skip-Gram-based DeepWalk is to factorize a matrix of network structural features [3]. GraRep [4] and Network Embedding Update (NEU) [5] were subsequently proposed to learn network representations that capture higher-order network features. Inspired by GraRep [4], WALKLETS [2] and NEU [5], we propose a joint learning framework, the multi-view network representation learning algorithm (MVNR). MVNR aims to embed higher-order structural relations into low-dimensional embeddings by weighting the higher-order k-step network representations and combining them into a single vector. The k-step network is regarded as the k-th view of the original network; MVNR is therefore a network representation learning algorithm that integrates multi-view features.
The framework of MVNR is illustrated in Figure 1a and 1b. As shown in Figure 1a, MVNR can capture higher-order structural relations because we reconstruct k-step networks based on random walks of length k over the edges. Each new network differs from the original network, and the networks do not contain one another. With respect to the original network, the 2-step network is the second-order feature network (k = 2), and so on.
As shown in Figure 1b, the link weights consist of the certainty degrees of existing edges and the link probabilities of non-existing edges. The certainty degrees of existing edges are applied to the 1-step network, and the link probabilities of non-existing edges are applied to the k-step networks with k ≥ 2.
Most previous work on network representation learning uses a "one-size-fits-all" approach to train the learning model, in which a single learnt representation is applied to various tasks. For example, DeepWalk [1], Tri-Party Deep Network Representation (TriDNR) [6] and node2vec [7] can only capture lower-order features and fail to explicitly capture higher-order relations and global structural information. In sparse networks in particular, existing structure-based network representation learning algorithms have difficulty extracting valuable relations and structural features between vertices. GraRep [4] factorizes the transition feature matrix of the network and can explicitly encode higher-order network features, but it does not study how to weight the features from networks of different orders. WALKLETS [2] improves the random walk procedure and captures walk sequences by skipping a pre-selected number of walk steps. Nevertheless, WALKLETS [2] is still unable to obtain global features of the network; it remains a network representation learning algorithm based on local features. NEU [5] can implicitly approximate higher-order proximities with a theoretical approximation bound. In fact, NEU [5] is more like an optimization algorithm that refines the network representations obtained by DeepWalk [1], node2vec [7], Line [8] and similar methods. This optimization process transforms low-order network representations into high-order ones; however, NEU has no influence on the modeling procedure of network representation learning.
We adopt a novel approach to jointly learn the network representations. In the final network representations, the different k-step networks should be given different weights. Therefore, we use the link probabilities of non-existing edges to weight the features of the different k-step networks. For the joint learning task, we use the inductive matrix completion algorithm to generate network representations for each view. As in GraRep [4], we concatenate the network representations of the individual views to form the final representations. We believe that the k-step relations between vertices reveal higher-order relations and global structural features, and that explicitly exploiting the features of the different k-step networks is essential for learning better graph representations of various networks.
Our contributions are as follows:
(1) We introduce a novel network representation learning algorithm (MVNR), which explicitly captures higher-order neighboring relations and global features. In the network classification task, MVNR is superior to network representation algorithms based on single-view features, such as DeepWalk, Line and node2vec. MVNR also outperforms existing network representation learning algorithms based on higher-order features, such as GraRep and NEU.
(2) We introduce the Matrix Forest Index (MFI) to evaluate the weights of the different k-step networks, which assigns different weights to the representation vectors of the different k-step networks in a common representation vector space. This operation remedies an important deficiency of the GraRep algorithm. In addition, MFI features are calculated from the structural features of the network. Therefore, MFI can provide sufficient structural features and alleviate the sparsity of the structural feature matrix in sparse networks.
(3) The visualization results of MVNR are better than those of node2vec and DeepWalk, showing stronger cohesiveness and clearer classification boundaries. MVNR can therefore learn discriminative representation vectors. Consequently, it also performs well in link prediction tasks, where it outperforms the baseline algorithms used in this paper.

Related Work
Network representation learning aims to embed the features of a network into a low-dimensional representation space. The network features, which form the input of network representation learning, mainly consist of structural features, text features of nodes, community attributes and labels. Network representation learning is similar to a network encoding task. Its output is a representation vector space in which nodes with similar features lie close to each other.
Most network analytics approaches have high computational complexity. Network representation learning is an effective and efficient way to address these challenges: it converts the network data into a low-dimensional representation space in which the structural features and properties of the network are preserved as far as possible. The learnt vector representations benefit various network analytics tasks and can be applied efficiently in both time and space. Examples include node classification, node clustering, node recommendation, node retrieval, node ranking, link prediction and network visualization. Other application scenarios include multimedia network embedding, information propagation and social network alignment.
According to whether network properties are used in the modeling procedure, existing network representation learning algorithms fall mainly into two categories. The first category is based only on network structural features, and the second is based on a joint learning model. Their representative algorithms are DeepWalk [1] and TriDNR [6], respectively.
Methods based on network structural features aim to improve the performance of network representation learning using only the structure of the network. For example, Perozzi et al. [1] proposed the DeepWalk algorithm in 2014. DeepWalk [1] adopts the Skip-Gram model of Word2Vec [9], changing the input of the model from (Current Word, Context Words) pairs to (Current Vertex, Context Vertices) pairs. These pairs are fed into a shallow neural network, which embeds the structural features into a low-dimensional representation space.
Consequently, DeepWalk has been successfully applied to various tasks [10]. Line [8], proposed in 2015, defines first-order and second-order proximity functions for learning large-scale network representations. Node2vec [7] improves the random walk procedure of DeepWalk by introducing breadth-first search (BFS) and depth-first search (DFS) strategies to sample the neighboring vertices of the current vertex. Structural Deep Network Embedding (SDNE) [11] integrates global information into network representation learning. Tu et al. [12] introduce the idea of polysemy into representation learning, training different representations for the current vertex according to its different context nodes. There are also network representation approaches for special network structures, such as DynamicTriad [13], DepthLGP [14] and Structural Deep Embedding for Hyper-Networks (DHNE) [15]. In addition, motivated by Generative Adversarial Networks (GAN) [16], some network representation learning algorithms use GANs to optimize the learning procedure, such as ANE [17], GraphGAN [18] and NetGan [19].
Methods based on the joint learning model mainly improve the performance of network representation learning by introducing other attribute information of the network, such as community information, text content and labels. According to the existing literature, joint network representation learning algorithms perform better than algorithms that use only network structure information. Building on DeepWalk, many researchers have proposed network representation learning algorithms based on a joint learning model, such as TriDNR [6], Content-Enhanced Network Embedding (CENE) [20], Graph Convolutional Network (GCN) [21], Discriminative Deep Random Walk (DDRW) [22], Planetoid [23], PTE [24], Modularized Nonnegative Matrix Factorization (M-NMF) [25], Accelerated Attributed Network Embedding (AANE) [26], COSINE [27] and Community Embedding (comE) [28]. Specifically, TriDNR [6] and CENE [20] incorporate text content into network representation learning by treating the content information as a special kind of node. M-NMF [25] is a modularized nonnegative matrix factorization approach that incorporates community structures into network representations. Both COSINE [27] and comE [28] also incorporate community information into network representations. AANE [26] is an accelerated attributed network embedding approach that makes joint learning tractable by decomposing the complicated learning and training process into many sub-problems. In addition, based on the matrix factorization view of DeepWalk [1], Yang et al. [29] propose the Text-Associated Deep Walk (TADW) algorithm, which incorporates text content into the matrix factorization. Based on TADW [29], the Max-Margin Deep Walk (MMDW) [30] algorithm adopts a maximum margin method to optimize the learnt representation vectors. There are also community-preserving [31] and heterogeneous [32] network representation algorithms.

Our Method
In this paper, we propose a novel network representation algorithm, MVNR, to learn multi-view representations of the vertices in a network. First, we introduce the multi-view strategy to define the k-step network features. We then introduce a link prediction index to evaluate the link weights of existing and non-existing edges. Based on the link weights, we propose a new approach to calculate the weights of each view's representations. Finally, we propose a new approach to jointly learn the representations from the different views and the weight information. Consequently, our model integrates rich local structural information of the network while capturing its global structural properties.

Formalization
Suppose that $G = (V, E)$ is a network, where $V$ denotes the node set, $E \subseteq V \times V$ denotes the edge set and $|V|$ is the number of nodes. For each node $v \in V$, network representation learning aims to learn a low-dimensional representation $r_v \in \mathbb{R}^{k}$, where $k$ is the length of the representation vector and $k$ is less than $|V|$. The representation $r_v$ can be applied not only to the network classification task but also to various other machine learning tasks, such as clustering, link prediction and recommendation.

Feature Extraction for Different k-Step Networks
DeepWalk uses the Skip-Gram model for large-scale network representation learning and captures context vertices by random walks. The objective of DeepWalk is to maximize the following average log probability:

$$\frac{1}{L}\sum_{i=1}^{L}\ \sum_{-t \le j \le t,\ j \ne 0} \log p(v_{i+j} \mid v_i) \tag{1}$$

where $L$ denotes the number of vertices, $v_i$ denotes the current vertex and $v_{i+j}$ denotes a context vertex of $v_i$. $t$ is the number of context vertices before and after the current vertex $v_i$, so the context window size is $2t + 1$. Moreover, the conditional probability $p(v_j \mid v_i)$ is defined by the softmax function

$$p(v_j \mid v_i) = \frac{\exp\left(o_{v_j}^{T} r_{v_i}\right)}{\sum_{v \in V} \exp\left(o_{v}^{T} r_{v_i}\right)}$$

where $r_v$ and $o_v$ are the input and output latent variables, namely the input and output representation vectors of $v$.
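To make the Skip-Gram objective concrete, the following Python sketch evaluates the average log probability of one walk under a full softmax, $p(v_j \mid v_i) \propto \exp(o_{v_j}^{T} r_{v_i})$. It is only an illustration under stated assumptions: the function names, the random toy vectors and the toy walk are ours, and a full softmax is used for readability even though DeepWalk is normally trained with hierarchical softmax.

```python
import numpy as np

def log_softmax_prob(R, O, i):
    """Return log p(v_j | v_i) for every vertex j.

    R holds the input vectors r_v (one row per vertex) and
    O holds the output vectors o_v (one row per vertex).
    """
    scores = O @ R[i]                  # o_v^T r_{v_i} for every vertex v
    scores = scores - scores.max()     # shift for numerical stability
    return scores - np.log(np.exp(scores).sum())

def average_log_probability(walk, R, O, t):
    """Average, over walk positions, of the summed context log probabilities."""
    total = 0.0
    for i, v in enumerate(walk):
        logp = log_softmax_prob(R, O, v)
        # Context positions: at most t steps before and after position i.
        for j in range(max(0, i - t), min(len(walk), i + t + 1)):
            if j != i:
                total += logp[walk[j]]
    return total / len(walk)

# Toy example (hypothetical data): 5 vertices, 3-dimensional vectors.
rng = np.random.default_rng(0)
R = rng.normal(size=(5, 3))
O = rng.normal(size=(5, 3))
walk = [0, 1, 2, 3, 4, 2]
value = average_log_probability(walk, R, O, t=2)
```

Training would adjust `R` and `O` to increase `value`; the sketch only evaluates the objective.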
Based on DeepWalk and PageRank, TADW [29] shows that DeepWalk in effect factorizes the following matrix:

$$M_{ij} = \log \frac{\left[e_i \left(Pr + Pr^{2} + \cdots + Pr^{t}\right)\right]_j}{t} \tag{2}$$

where $e_i$ is the indicator vector whose i-th entry is 1 and whose other entries are 0, and $Pr$ is the probability transition matrix. As shown in Equation (2), matrix $M$ captures the neighboring vertices up to the t-th step. Node2vec adopts BFS and DFS to capture higher-order neighboring vertices. GraRep captures a higher-order network feature matrix through the k-step probability transition matrix $B^{k}$, the k-th power of $B$, where $B = D^{-1}A$, $A$ is the adjacency matrix and $D$ is the diagonal degree matrix. One shortcoming of GraRep is that the inverse matrix $D^{-1}$ does not exist when a sparse network contains vertices of degree zero. Based on Equation (2), TADW [29] presents a simplified target matrix that balances speed and accuracy. Consequently, matrix $M$ can be formulated as follows:

$$M = \frac{Pr + Pr^{2}}{2} \tag{3}$$

Matrix $M$ captures the first-order and second-order neighbors of each vertex with low computational complexity.
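The simplified target matrix can be computed directly from the adjacency matrix. The sketch below assumes the row-normalized transition matrix $Pr = D^{-1}A$ and the simplified form $(Pr + Pr^{2})/2$ described above; the function names are illustrative, and leaving the rows of zero-degree vertices at zero is our own workaround for the missing inverse, not something prescribed by the paper.

```python
import numpy as np

def transition_matrix(A):
    """Row-normalize A: Pr[i, j] = 1 / d_i if (i, j) is an edge, else 0."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    # Assumption: rows of isolated vertices (d_i = 0) are left as zeros.
    inv_deg = np.divide(1.0, deg, out=np.zeros_like(deg), where=deg > 0)
    return inv_deg[:, None] * A

def simplified_target(A):
    """Simplified target matrix (Pr + Pr^2) / 2."""
    P = transition_matrix(A)
    return (P + P @ P) / 2.0

# Toy path graph 0 - 1 - 2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
M = simplified_target(A)
```

On a graph without isolated vertices every row of `M` sums to 1, because both $Pr$ and $Pr^{2}$ are row-stochastic.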
GraRep uses the matrix $(D^{-1}A)^{k}$ to represent the higher-order feature matrix. In MVNR, we capture higher-order features by defining an adjacency matrix $A^{(k)}$ in a manner similar to GraRep, but we apply some optimization operations to $A^{(k)}$. Unlike in GraRep, $A^{(k)}$ is the adjacency matrix of the k-step network, whereas $(D^{-1}A)^{k}$ in GraRep can be regarded as a variant of the matrix power $A^{k}$.
$A^{(k)}$ can be regarded as the reachability matrix within k steps. Therefore, although the construction of $A^{(k)}$ takes a form similar to that of GraRep, the matrices $A^{(k)}$ and $A^{k}$ are essentially different, and so are the structural feature matrices built from them. These differences constitute the optimization operations applied to $A^{(k)}$ in this paper.
Here, we use $A^{(k)}$ to denote the adjacency matrix of the k-step network, which is formulated as follows:

$$A^{(k)} = \left[a^{(k)}_{ij}\right], \qquad a^{(k)}_{ij} = \begin{cases} 0, & a^{k}_{ij} = 0 \\ 1, & \text{otherwise} \end{cases} \tag{4}$$

where $a^{(k)}_{ij}$ is an element of matrix $A^{(k)}$ and $a^{k}_{ij}$ is an element of the matrix power $A^{k}$. Because the adjacency matrix $A$ consists of 0 and 1, we also restrict the elements of $A^{(k)}$ to 0 or 1. Equation (4) differs from the transition matrix of GraRep: for each k-step network, we reconstruct the network based on the matrices $A^{k}$ and $A^{(k)}$. In fact, $A^{(k)}$ indicates whether two vertices are reachable from each other. For each k-step network, we obtain the network structural features as follows:

$$M^{(k)} = \frac{A^{(k)} + \left(A^{(k)}\right)^{2}}{2} \tag{5}$$

Here, $A^{(k)}$ is the adjacency matrix of the k-step network, which differs from the probability transition matrix $Pr$ in Equation (3). Equations (5) and (3) have the same form but contain different elements: $A^{(k)}$ consists only of 0 and 1, whereas $Pr_{ij} = 1/d_i$ if $(i, j) \in E$ and $Pr_{ij} = 0$ otherwise, where $d_i$ is the degree of vertex $i$. Both the transition probability matrix $Pr$ and the adjacency matrix $A^{(k)}$ are structural feature matrices of the network. Moreover, GraRep factorizes the transition probability matrix derived from the adjacency matrix and achieves better network representation performance. The factorization target of the SGNS algorithm is the matrix in Equation (2), whereas the TADW algorithm factorizes the matrix $M$ in Equation (3), a simplification of Equation (2), and achieves excellent network representation learning performance. Therefore, we replace the transition probability matrix $Pr$ in Equation (3) with the adjacency matrix $A^{(k)}$. This choice rests partly on the above comparative analysis and partly on the consideration that factorizing the adjacency matrix has lower computational complexity.
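The construction of the k-step network can be sketched in a few lines of Python. The sketch assumes the elementwise rule stated above (an entry of $A^{(k)}$ is 1 whenever the corresponding entry of the matrix power $A^{k}$ is non-zero) and assumes that the structural feature matrix takes the form $(A^{(k)} + (A^{(k)})^{2})/2$, that is, Equation (3) with $Pr$ replaced by $A^{(k)}$. The function names and the toy graph are illustrative only.

```python
import numpy as np

def k_step_adjacency(A, k):
    """Binary adjacency matrix of the k-step network.

    An entry is 1 where the matrix power A^k is non-zero, 0 elsewhere.
    """
    A = np.asarray(A, dtype=float)
    A_power = np.linalg.matrix_power(A, k)
    return (A_power > 0).astype(float)

def structural_features(A_k):
    """Structural feature matrix (A_k + A_k^2) / 2 of one k-step network."""
    return (A_k + A_k @ A_k) / 2.0

# Toy path graph 0 - 1 - 2 - 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

A1 = k_step_adjacency(A, 1)   # identical to A
A2 = k_step_adjacency(A, 2)   # links vertices joined by a 2-step walk
M2 = structural_features(A2)
```

In this toy graph, vertices 0 and 2 are not adjacent in the original network but are linked in the 2-step network. Under this rule the diagonal of `A2` is also non-zero, since every vertex with a neighbor can return to itself in two steps.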
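Therefore, matrix $M^{(k)}$ can be regarded as the structural features of the k-step network, which is essentially different from the feature matrix of GraRep. The structural features of GraRep are as follows:

$$Y^{k}_{i,j} = \log\left(\frac{B^{k}_{i,j}}{\sum_{t} B^{k}_{t,j}}\right) - \log\left(\frac{\lambda}{|V|}\right) \tag{6}$$

Here, $|V|$ is the number of vertices in graph $G$, $B^{k}_{i,j}$ is the element in the i-th row and j-th column of the matrix $B^{k}$, and $B = D^{-1}A$. In this equation only, $\lambda$ denotes the number of negative samples used by GraRep [4].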

Feature Weighting for Different k-Step Networks
For the different k-step networks, the same vertex pair should be given different weight values. However, DeepWalk, GraRep and node2vec neglect the weight information of higher-order neighbors. The proposed MVNR addresses this problem with the following procedure.
We first introduce the Matrix Forest Index (MFI) to evaluate the weights between vertices in the 1-step network; the weights between vertices in the k-step networks are then derived from these values. We compared the performance of MFI with some classical link prediction algorithms, including algorithms based on common neighbors, random walks and paths, and found that MFI achieves the best link prediction performance on several real citation network datasets. In addition, MFI can be calculated from the Laplacian matrix of the network alone, and the Laplacian matrix is built from the adjacency matrix, which is exactly the input of the proposed MVNR algorithm. Unlike other link prediction algorithms, the input, calculation procedure and result of MFI can therefore be embedded in the learning framework of MVNR. For these reasons, we choose MFI to measure the weights between vertices.
MFI is formulated as follows:

$$S = (I + L)^{-1} \tag{7}$$

Here, $S = [s_{ij}]$ is the matrix constructed by the matrix forest index, $L$ is the Laplacian matrix of $G$ and $I$ is the identity matrix. $L$ is calculated from the adjacency matrix; the detailed calculation is given in Algorithm 1.
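As a minimal sketch, the matrix forest index can be computed with NumPy, assuming the standard definition $S = (I + L)^{-1}$ with the unnormalized Laplacian $L = D - A$, where $D$ is the diagonal degree matrix. The function name and the toy graph are illustrative, and a dense inverse is used only because the example is tiny.

```python
import numpy as np

def matrix_forest_index(A):
    """Matrix forest index S = (I + L)^{-1}, with Laplacian L = D - A."""
    A = np.asarray(A, dtype=float)
    D = np.diag(A.sum(axis=1))     # diagonal degree matrix
    L = D - A                      # unnormalized graph Laplacian
    return np.linalg.inv(np.eye(A.shape[0]) + L)

# Toy path graph 0 - 1 - 2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
S = matrix_forest_index(A)
```

For this path graph, the similarity between the adjacent vertices 0 and 1 is larger than the similarity between the non-adjacent vertices 0 and 2, which matches the intuition that existing edges carry larger weights than potential future links. Since $I + L$ is always invertible, no special handling of isolated vertices is needed.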
We do not apply the MFI algorithm to the adjacency matrix of the k-step network to calculate the weights between vertices in the k-step network. Instead, we calculate these weights from the weights between vertices in the 1-step network and the adjacency matrix of the k-step network, because the weights calculated by MFI on the 1-step network include both the weights of existing edges and the future connection probabilities of non-existing edges. Only in this way are the weights of the different k-step networks distinct and hierarchical, so that they can adjust the weights of the different k-step network representations. For the k-step network, we define its weight matrix as follows:

$$W^{(k)} = \left[w^{(k)}_{ij}\right], \qquad w^{(k)}_{ij} = \begin{cases} s_{ij}, & a^{(k)}_{ij} = 1 \\ 0, & \text{otherwise} \end{cases} \tag{8}$$

where $s_{ij}$ is an element of the MFI matrix $S$ and $a^{(k)}_{ij}$ is an element of the adjacency matrix $A^{(k)}$.
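The weighting step can be illustrated as a simple masking operation. The sketch assumes that the k-step weight matrix keeps the 1-step MFI value $s_{ij}$ wherever the k-step adjacency matrix has an edge and is zero elsewhere; the function name `k_step_weights` and the toy graph are ours.

```python
import numpy as np

def k_step_weights(S, A_k):
    """Keep the MFI value s_ij where the k-step network has an edge, else 0."""
    return np.where(np.asarray(A_k) > 0, S, 0.0)

# Toy path graph 0 - 1 - 2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)

# MFI of the original (1-step) network, computed once.
S = np.linalg.inv(np.eye(3) + np.diag(A.sum(axis=1)) - A)

A1 = A
A2 = (A @ A > 0).astype(float)   # 2-step adjacency matrix

W1 = k_step_weights(S, A1)       # certainty degrees of existing edges
W2 = k_step_weights(S, A2)       # link probabilities of 2-step pairs
```

Here the edge (0, 1) of the original network receives weight 0.25 in `W1`, while the 2-step pair (0, 2) receives the smaller weight 0.125 in `W2`, so `S` is computed once and reused for every view.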
As mentioned above, the link weights consist of the certainty degrees of existing edges and the link probabilities of non-existing edges. The certainty degrees of existing edges are applied to the 1-step network, and the link probabilities of non-existing edges are applied to the k-step networks with k ≥ 2. Equations (7) and (8) show that the MFI matrix is computed only once; the weight matrices of the different k-step networks are then constructed from it. For the 1-step network, we retain only the MFI similarity values between vertices that share an edge and delete the similarity values between vertices that do not. We therefore define the similarity value between two vertices joined by an edge as the certainty degree of that existing edge. For the 2-step network, we retain only the similarity values between vertices that are reachable within two steps; each such similarity value is the future link probability of an edge that does not exist in the 1-step network. Specifically, the weights of the 2-step network are the similarity values between the current central vertex and the neighbors of its neighbors. We therefore regard this kind of similarity value as the future link probability between two vertices that share no edge. Both the certainty degrees of existing edges and the link probabilities of non-existing edges are calculated from the MFI matrix of the 1-step network. The specific calculation process is shown in Equation (8) and Algorithm 1.
For each k-step network, we reconstruct the k-step weight matrix from matrix $S$ and matrix $A^{(k)}$. We retain only the link weights of the existing edges of the k-step network and neglect the weights of non-existing edges. The weight matrix not only balances the k-step network representations but can also be regarded as a set of network weight features that are integrated into the network representations.
In the k-step network, edges are established between vertices that are not connected in the original network. Specifically, we construct an edge weight matrix for each k-step network, and these weight matrices differ from one another. The weights of the edges of the original network are the largest. An edge of the 2-step network represents reachability within two steps between two vertices of the original network and is encoded as 1 or 0. By the MFI similarity calculation, the edge weights of the 2-step network are the similarity values between the current vertex and the neighbors of its neighbors in the original network. Therefore, the weights in the 2-step network are smaller than those in the 1-step network, and in general, the higher the order of the network, the smaller the edge weights in its weight matrix. Using this hierarchical weight matrix, we combine the weight matrix with the network structural feature matrix for joint network representation learning, so that different weight factors are assigned to the representation vectors of the different k-step networks.

Joint Learning of MVNR
Suppose that matrix $M \in \mathbb{R}^{m \times n}$ admits an approximation of low rank $k$, where $k \ll \min(m, n)$. Yu et al. [33] propose a matrix factorization approach with a penalty term, which aims to find $X \in \mathbb{R}^{k \times m}$ and $Y \in \mathbb{R}^{k \times n}$ that minimize the following loss:

$$\min_{X, Y} \left\| M - X^{T} Y \right\|_F^{2} + \frac{\alpha}{2}\left( \|X\|_F^{2} + \|Y\|_F^{2} \right) \tag{9}$$

where $\alpha$ is a harmonic factor that balances the two components of Equation (9). Specifically, Equation (9) factorizes $M \in \mathbb{R}^{m \times n}$ into two matrices $X \in \mathbb{R}^{k \times m}$ and $Y \in \mathbb{R}^{k \times n}$ such that $M \approx X^{T} Y$. For the task of network representation learning, matrix $M$ can be regarded as the feature matrix of the network $G$, and matrix $X^{T} \in \mathbb{R}^{m \times k}$ can be regarded as the learnt representation matrix of the network $G$.
We do not use the model of Equation (9) in MVNR. Instead, we use the Inductive Matrix Completion (IMC) algorithm presented by Natarajan and Dhillon [34], which uses two known matrices to factorize matrix $M$ as follows:

$$\min_{X, Y} \sum_{(i,j) \in \Omega} \left( M_{ij} - \left(P^{T} X^{T} Y Q\right)_{ij} \right)^{2} + \frac{\beta}{2}\left( \|X\|_F^{2} + \|Y\|_F^{2} \right) \tag{10}$$

where $\Omega$ denotes the sample set of matrix $M \in \mathbb{R}^{m \times n}$, and $P \in \mathbb{R}^{p \times m}$ and $Q \in \mathbb{R}^{q \times n}$ are two known feature matrices. $\beta$ is the hyper-parameter that balances the reconstruction term $M_{ij} - (P^{T} X^{T} Y Q)_{ij}$ and the regularization term $\|X\|_F^{2} + \|Y\|_F^{2}$. IMC aims to learn the matrices $X \in \mathbb{R}^{d \times p}$ and $Y \in \mathbb{R}^{d \times q}$ such that $M \approx P^{T} X^{T} Y Q$. Equation (10) was originally applied to complete the gene-disease matrix using gene and disease feature matrices. Motivated by IMC, we integrate the link weights of existing edges into Equation (10); MVNR solves for the matrices $X^{(k)}$ and $Y^{(k)}$ that minimize the following objective function:

$$\min_{X^{(k)}, Y^{(k)}} \left\| M^{(k)} - \left(X^{(k)}\right)^{T} Y^{(k)} W^{(k)} \right\|_F^{2} + \frac{\lambda}{2}\left( \left\|X^{(k)}\right\|_F^{2} + \left\|Y^{(k)}\right\|_F^{2} \right) \tag{11}$$

Here, $W^{(k)} \in \mathbb{R}^{q \times n}$ is the weight matrix of the k-step network, and $\lambda$ is the hyper-parameter that balances the reconstruction error and the regularization term. In Equation (10), we set the matrix $P \in \mathbb{R}^{p \times m}$ to the identity matrix $E \in \mathbb{R}^{p \times m}$ and replace $Q$ with $W^{(k)}$, which yields Equation (11). For each k-step network, we denote $R^{(k)} = (X^{(k)})^{T}$ as the representation matrix of the k-th network. The final representation vectors are obtained by concatenation:

$$R = \left[ R^{(1)}, R^{(2)}, \ldots, R^{(K)} \right] \tag{12}$$

where $k \in \{1, 2, \ldots, K\}$ and $K$ is a pre-selected constant. As in Equation (12), we concatenate all k-step representations to form the global representations, which can be used in various machine learning tasks. We introduce the matrix $W^{(k)}$ to adjust the weights of the different k-step networks and to remedy the fact that GraRep ignores the weight information. For each k-step network representation $r^{(k)}$, we apply the L2 norm to normalize the learnt network representations, which yields better performance in some evaluation tasks, such as network classification and link prediction.
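The joint learning step can be sketched as follows. This is an illustration under explicit assumptions, not the authors' implementation: the per-view objective is taken to be $\|M^{(k)} - (X^{(k)})^{T} Y^{(k)} W^{(k)}\|_F^{2} + \frac{\lambda}{2}(\|X^{(k)}\|_F^{2} + \|Y^{(k)}\|_F^{2})$, plain gradient descent with a backtracking step size stands in for the optimization approach of Yu et al. [33], and all function names, the step-size schedule and the toy graph are hypothetical.

```python
import numpy as np

def view_loss(M, W, X, Y, lam):
    """Assumed per-view objective: squared error plus Frobenius penalty."""
    E = X.T @ Y @ W - M
    return (E ** 2).sum() + 0.5 * lam * ((X ** 2).sum() + (Y ** 2).sum())

def factorize_view(M, W, d, lam=0.5, iters=200, seed=0):
    """Gradient descent on X, Y (both d x n); the loss never increases."""
    rng = np.random.default_rng(seed)
    n = M.shape[0]
    X = 0.1 * rng.standard_normal((d, n))
    Y = 0.1 * rng.standard_normal((d, n))
    losses = [view_loss(M, W, X, Y, lam)]
    step = 1e-2
    for _ in range(iters):
        E = X.T @ Y @ W - M
        grad_X = 2.0 * (Y @ W) @ E.T + lam * X
        grad_Y = 2.0 * X @ E @ W.T + lam * Y
        # Backtracking: halve the step until the loss does not increase.
        while step > 1e-12:
            X_new, Y_new = X - step * grad_X, Y - step * grad_Y
            new_loss = view_loss(M, W, X_new, Y_new, lam)
            if new_loss <= losses[-1]:
                X, Y = X_new, Y_new
                losses.append(new_loss)
                step *= 1.5
                break
            step *= 0.5
    return X, Y, losses

def concatenate_views(reps):
    """L2-normalize the rows of each view, then concatenate the views."""
    normed = []
    for R_k in reps:
        norms = np.linalg.norm(R_k, axis=1, keepdims=True)
        normed.append(R_k / np.maximum(norms, 1e-12))
    return np.hstack(normed)

# Toy path graph 0 - 1 - 2 - 3 with K = 2 views and d = 2.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0],
              [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
S = np.linalg.inv(np.eye(4) + np.diag(A.sum(axis=1)) - A)  # MFI matrix
reps, histories = [], []
for k in (1, 2):
    A_k = (np.linalg.matrix_power(A, k) > 0).astype(float)
    M_k = (A_k + A_k @ A_k) / 2.0        # structural features of view k
    W_k = np.where(A_k > 0, S, 0.0)      # MFI weights of view k
    X_k, _, losses = factorize_view(M_k, W_k, d=2)
    reps.append(X_k.T)                   # R^(k) = (X^(k))^T
    histories.append(losses)
R = concatenate_views(reps)              # one row of length K * d per vertex
```

Each row of `R` is the concatenation of the normalized per-view vectors of one vertex. A practical implementation would use sparse matrices and a stronger optimizer; the sketch only shows how the weight matrix enters the factorization and how the views are combined.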
We give the algorithm description of MVNR in Algorithm 1.

Complexity Analysis
In the proposed MVNR, the training procedure can be divided into the following parts: construction of $M$, factorization of $M$ using IMC and concatenation of the representations. Matrix $M$ is square, and its numbers of rows and columns both equal the number of vertices in the network, $|V|$. Thus, the time complexity of constructing matrix $M$ is $O(|V|^{3})$.
To solve Equation (11), we adopt the optimization approach proposed by Yu et al. [33]. For each k-step network, we use the known weight matrix $W$ to factorize the network feature matrix $M$ based on the IMC algorithm. The output of Equation (11) is the pair of matrices $X$ and $Y$, where $M \approx X^{T} Y W$. In the notation of Equation (10), the time complexity of each iteration of minimizing over $X$ and $Y$ is $O(\mathrm{nnz}(M)d + |V|dq + |V|d^{2})$, where $\mathrm{nnz}(M)$ denotes the number of non-zero elements in $M$ and $d$ denotes the vector length of the network representations. The time complexity of concatenating the representations is $O(|V|kd)$.

Input:
Adjacency matrix A on network.Maximum transition step K. Harmonic factor λ. Dimension of representation vector d.

Output:
Matrix of the network representation R.

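1. Calculate the matrix forest index $S = [s_{ij}]$:
   % The positions of 1 in the identity matrix are set to the degrees of the vertices to obtain $D$.
   $L = D - A$, $S = (I + L)^{-1}$.
2. For each $k = 1, \ldots, K$:
   (1) Get the matrix $A^{(k)} = [a^{(k)}_{ij}]$, where $a^{(k)}_{ij} = 0$ if $a^{k}_{ij} = 0$ and $a^{(k)}_{ij} = 1$ otherwise.
   (2) Get the network structural feature matrix $M^{(k)} = (A^{(k)} + (A^{(k)})^{2})/2$.
   (3) Get the weight matrix $W^{(k)} = [w^{(k)}_{ij}]$, where $w^{(k)}_{ij} = s_{ij}$ if $a^{(k)}_{ij} = 1$ and $w^{(k)}_{ij} = 0$ otherwise.
   (4) Solve Equation (11) for $X^{(k)}$ and set $R^{(k)} = (X^{(k)})^{T}$.
3. Concatenate all the k-step representations: $R = [R^{(1)}, R^{(2)}, \ldots, R^{(K)}]$.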

Experiments and Evaluations
In this work, we conduct experiments on three real-world citation networks and evaluate the performance of MVNR on network classification, visualization and link prediction.

Baseline Algorithms
To evaluate the performance of MVNR, we use the following network representation algorithms as baselines. The representation length of all the baseline methods is set to 200.
DeepWalk: DeepWalk is the most classical algorithm that uses only network structural features to learn the network representations. We use the Skip-Gram model and Hierarchical Softmax for DeepWalk.
Line: Line is a recently proposed algorithm for representation learning on large-scale networks. It captures local features based on a probability loss function and provides the 1st Line and 2nd Line models. Like MVNR, it uses 1-step and 2-step neighboring vertices to learn the network representations.
node2vec: node2vec adopts the BFS and DFS strategies to capture higher-order neighboring vertices, and defines a second-order random walk to balance DFS and BFS.
NEU: This algorithm is from the paper "Fast Network Embedding Enhancement via High Order Proximity Approximation" and is a higher-order network representation learning algorithm. In this paper, NEU converts the lower-order representations trained by DeepWalk into higher-order representations.
GraRep: GraRep is a network representation learning algorithm that captures higher-order network features by matrix factorization and representation concatenation.
Weight-based Matrix Factorization (WMF): For the k-step network, we define the matrix $W^{(k)}$ as the link weight matrix of the k-th network view. We then use the SVD algorithm to factorize the matrix T. We regard $U^{(k)}(S^{(k)})^{0.5}$ as the network representation vectors.

MVNR: It is our proposed algorithm. In this experiment, we adopt different k values to capture the k-step vertex relational information and learn the network representations. We set λ to 0.5 for Citeseer and Cora and to 0.1 for Wiki.
Unweighted MVNR (Un_MVNR): This algorithm is a simplified MVNR with equal weights for the different k-step networks. Other optimizations are the same as in MVNR.
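As a rough illustration of the SVD-based representation described for WMF, and of the concatenation of k-step representations shared by GraRep and MVNR, the following sketch computes U S^0.5 from a truncated SVD and stacks the per-view results side by side. The input matrices here are random placeholders, and the helper names `svd_embedding` and `concatenate_views` are hypothetical; the sketch does not reproduce the IMC-based factorization used by MVNR itself.

```python
import numpy as np


def svd_embedding(M: np.ndarray, d: int) -> np.ndarray:
    """Return the |V| x d representation U_d * S_d^{0.5} from a truncated SVD of M."""
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    # Scale each of the first d left singular vectors by the square root
    # of its singular value.
    return U[:, :d] * np.sqrt(s[:d])


def concatenate_views(feature_matrices, d: int) -> np.ndarray:
    """Embed every k-step feature matrix and concatenate the results column-wise."""
    return np.hstack([svd_embedding(M, d) for M in feature_matrices])


# Placeholder feature matrices for K = 3 views of a 6-vertex network
# (random values, for shape illustration only).
rng = np.random.default_rng(0)
toy_views = [rng.random((6, 6)) for _ in range(3)]
R = concatenate_views(toy_views, d=2)  # shape: (6, 3 * 2)
```

With K views of length d each, the concatenated representation has K × d columns, which is why the final representation is longer than that of the single-view baselines.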
MVNR is a higher order network representation learning algorithm. We therefore choose three classical lower order network representation learning algorithms as baselines: DeepWalk, Line and node2vec. DeepWalk is the most classical network representation learning algorithm, node2vec improves the random walk of DeepWalk, and Line and DeepWalk both aim to factorize the structural feature matrix of the network; the proposed MVNR is also an NRL algorithm based on the matrix factorization form of DeepWalk. In addition, we choose two classical higher order network representation learning algorithms for comparison and analysis, GraRep and NEU, both of which were accepted by top-level conferences and are representative of higher order representation learning. WMF and Un_MVNR are simplified versions of MVNR, which are used to measure the influence of the weight matrix on the performance of MVNR. The parameter settings of these algorithms are introduced in the algorithm description part of Section 4.2.
The parameter settings for the selected baseline models are shown in Table 2.

Multi-Class Classification
In this experiment, we first learn the network representations with the MVNR algorithm. Based on the learnt representations, we then train a Support Vector Machine (SVM) classifier with different proportions of training data. We let the training proportion vary from 10% to 90% and regard the remaining data as the testing set. We repeat the procedure 10 times and report the average accuracy. We set K to 1, 3 and 6. For example, we concatenate the network representations of the 1-step, 2-step and 3-step networks when K is 3. The detailed results are shown in Tables 3-5. In the classification evaluation, MVNR consistently improves the accuracy over the other network representations on the three evaluation datasets. In addition, MVNR outperforms the baseline algorithms across the training ratios on the Citeseer, Cora and Wiki datasets. The higher order network representation learning algorithms (GraRep, NEU and MVNR) perform better than the algorithms based on local feature acquisition (DeepWalk, Line and node2vec).
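The evaluation protocol described above (random split at a given training proportion, repeated 10 times, averaged accuracy) can be sketched as follows. To keep the example self-contained, a simple nearest-centroid classifier stands in for the SVM; it is not the classifier used in the experiments, and any function with the same `fit_predict(X_train, y_train, X_test)` signature, for instance a wrapper around an SVM implementation, could be passed instead. The data and function names are hypothetical.

```python
import numpy as np


def nearest_centroid_predict(X_train, y_train, X_test):
    """Stand-in classifier (NOT an SVM): assign each test vector to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]


def mean_accuracy(X, y, train_ratio, fit_predict=nearest_centroid_predict,
                  repeats=10, seed=0):
    """Average test accuracy over `repeats` random train/test splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(train_ratio * n))
    scores = []
    for _ in range(repeats):
        perm = rng.permutation(n)
        train, test = perm[:n_train], perm[n_train:]
        pred = fit_predict(X[train], y[train], X[test])
        scores.append(np.mean(pred == y[test]))
    return float(np.mean(scores))


# Toy data: two well-separated Gaussian blobs standing in for learnt representations.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (50, 4)), rng.normal(5.0, 0.1, (50, 4))])
y = np.array([0] * 50 + [1] * 50)

# Training proportions from 10% to 90%, as in the protocol above.
ratios = [i / 10 for i in range(1, 10)]
results = {ratio: mean_accuracy(X, y, ratio) for ratio in ratios}
```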
In this classification evaluation, we set K to 1, 3 and 6, which means that we learn and concatenate the network representations of the 1-step, {1, 2, 3}-step and {1, 2, 3, 4, 5, 6}-step networks for GraRep and MVNR. The experimental results show that, although GraRep can capture higher order neighboring relations, MVNR captures them better than GraRep on the three datasets. We also use the NEU algorithm to transform the network representations learnt by DeepWalk into higher order ones, but the resulting representations are inferior to those of the proposed MVNR algorithm in the network classification task. The main reason is that GraRep ignores the weight information between different k-step networks and also neglects the edge weights within each k-step network. In contrast, MVNR assigns a corresponding weight to the edges of each k-step network, which indirectly affects the weights between different k-step networks. NEU is not involved in the learning procedure of the higher order network representations; it only applies a higher order transformation to representations trained by other network representation algorithms. NEU [6] is therefore essentially different from GraRep and MVNR. MVNR can embed global information into lower dimension embeddings by explicitly constructing the global feature matrix of the network. In addition, MVNR can learn network representations of the 2nd, 3rd and 4th order by constructing k-step networks.
DeepWalk and node2vec are algorithms based on local feature acquisition. Building on DeepWalk, node2vec takes the micro-view and macro-view of the vertices into account. Line provides a network representation learning algorithm for various large-scale networks. Although Line improves the training speed of network representation learning, its precision is not as good as that of DeepWalk and node2vec, because Line only considers the first order and second order similarity of the vertices. The proposed MVNR fully considers the global information of the vertices by explicitly constructing the global feature matrix. Thus, the performance of MVNR is superior to that of DeepWalk, node2vec and Line. GraRep and NEU encode higher order features into a lower dimension representation space. GraRep considers neither the weights of edges nor the weights between different k-step networks. NEU only performs a higher order transformation of network representations learnt by other network representation learning algorithms. The MVNR algorithm proposed in this paper assigns corresponding weights to the edges of different k-step networks. The larger k is, the smaller the edge weights are, which indirectly gives different weights to different k-step networks. In addition, MVNR explicitly embeds higher order features of the network, global information and weight information into the representation space through the joint learning model. Therefore, the performance of MVNR is better than that of GraRep and NEU in Tables 3-5.
In addition, the experimental results show that the classification performance of MVNR is better than that of Un_MVNR on the Citeseer, Cora and Wiki datasets. This demonstrates that adding different weight information to different k-step networks can improve the performance of network representation learning. In fact, the weight information also plays another role: it acts as another feature view of the network structure. Therefore, it can also compensate for the sparsity of the network structure.
In Section 3.5, we discuss the time complexity of the proposed MVNR algorithm theoretically. Here, we provide an empirical comparison of its running time with that of the GraRep and NEU algorithms on the Citeseer dataset. We installed the MVNR running environment in a virtual machine on a notebook computer. The notebook has 8 GB of memory and a 2.53 GHz Core i3 processor, and 3 GB of memory is allocated to the virtual machine. The experimental results show that MVNR takes 15 min and 1 s to train the network representation learning model, GraRep takes 5 min and 35 s when K is set to 6, and NEU takes about 7 s to convert the network representations learnt by DeepWalk. Although MVNR takes the longest time, its network classification performance is superior to that of GraRep and NEU on the Citeseer, Cora and Wiki datasets. The main reason why MVNR is time-consuming is that it uses the weight matrix W to factorize the matrix M based on the IMC algorithm, whereas GraRep only uses SVD to factorize the network feature matrix and NEU only applies multiplication and addition operations to the learnt representations.

Parameter Sensitivity
MVNR has two parameters that need to be tuned: the representation length (vector dimension) and K. Here, K is the pre-selected constant for the k-step networks, where 1 ≤ k ≤ K. We let K range from 1 to 6 and let the representation length of each k-step network take the values 25, 50, 100 and 200 for Citeseer, Cora and Wiki. Accordingly, the final concatenated network representation has size 75, 150, 300 and 600 when K is 3. We fix the training ratio to 0.5 and then test the classification accuracy with diverse representation lengths and K values in Figure 2.
As shown in Figure 2, the accuracy of network classification shows an increasing trend as K and the representation length increase. Importantly, the growth of accuracy becomes slower as the representation vector length grows. In addition, the average degree of Cora is larger than that of Citeseer and Wiki, so the accuracy of Cora fluctuates within a smaller range than that of Citeseer and Wiki.

Network Visualization
In this experiment, we visualize the learnt representations on a real-world citation network (the Citeseer dataset). We adopt the t-SNE [36] toolkit for the visualization. First, we randomly sample 4 categories on Citeseer. Each category consists of 150 randomly sampled nodes. With this experiment, we aim to verify whether MVNR is able to learn discriminative representations. We visualize the representations learnt by the DeepWalk, node2vec and MVNR algorithms. The representation length of MVNR is 600 and the representation length of DeepWalk and node2vec is 200. The results are shown in Figure 3. In Figure 3, data points of different colors represent different categories of nodes. The visualization results show that points of the same color tend to cluster together. The clustering ability of the network representations reflects whether the trained model embeds nodes with similar attributes and features close to each other in the representation space. For example, the similarity between linked nodes should be greater than that between nodes without links. In addition, a network representation with better clustering ability helps node classification, because nodes of the same category are embedded close to each other in the representation space.
Note that the Wiki dataset consists of 2405 nodes and 19 different categories. On average there are 126 nodes in each category and most categories contain fewer than 150 nodes, so we randomly select four of the 19 categories such that each selected category contains at least 150 nodes. For consistency, on the Citeseer dataset we also randomly select 150 nodes from each of four categories, which is why we use four categories and 600 vertices on Citeseer. The random selection strategy can better reflect the learning ability of the proposed MVNR algorithm. In addition, the learnt network representations are projected onto a two-dimensional plane, so we select only four categories of nodes in each dataset in order to obtain clear clustering boundaries between different categories and to avoid the poor display caused by too many categories.
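The node selection described above (four categories, 150 nodes each, chosen only among categories that are large enough) can be sketched as below. The function name, its parameters and the toy label vector are hypothetical; the selected indices would then be used to pick the rows of the representation matrix that are passed to a t-SNE implementation.

```python
import numpy as np


def sample_balanced_subset(labels, n_categories=4, n_per_category=150, seed=0):
    """Pick n_categories labels that have enough nodes, then n_per_category nodes from each."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    values, counts = np.unique(labels, return_counts=True)
    # Only categories with at least n_per_category nodes are eligible.
    eligible = values[counts >= n_per_category]
    if len(eligible) < n_categories:
        raise ValueError("not enough categories with the requested number of nodes")
    chosen = rng.choice(eligible, size=n_categories, replace=False)
    idx = [
        rng.choice(np.flatnonzero(labels == c), size=n_per_category, replace=False)
        for c in chosen
    ]
    return np.concatenate(idx), chosen


# Toy labels: six categories with uneven sizes (hypothetical data).
toy_labels = np.repeat(np.arange(6), [400, 90, 250, 160, 120, 300])
node_ids, chosen = sample_balanced_subset(toy_labels)
```

In this toy example only four categories contain at least 150 nodes, so exactly those four are selected and the subset contains 600 distinct nodes.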

Link Prediction
A link prediction algorithm predicts the probability that a currently non-existing edge between two vertices will appear in the future. First, we need to assign a reasonable score to each vertex pair. Let r_i and r_j be the representations of vertex i and vertex j; we calculate the similarity score as the cosine similarity (r_i · r_j)/(||r_i||_2 ||r_j||_2). We use AUC to evaluate the performance of link prediction. For Citeseer, Cora and Wiki, we remove 30%, 20% and 10% of the edges as the test set and use the remaining edges to learn the network representations. We also compare the proposed MVNR with several baseline link prediction algorithms. We set K to 3 and 6. The results are shown in Table 6, where the proposed MVNR algorithm is compared with 8 link prediction algorithms. The experimental results show that MVNR always achieves the best performance under the three training proportions on the Citeseer, Cora and Wiki datasets.
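The scoring and evaluation steps above can be illustrated with a short sketch: cosine similarity between the representations of a vertex pair, and AUC computed as the probability that a held-out edge receives a higher score than a non-existing edge (ties count as 0.5). The representation values and vertex pairs are made up for illustration, and the function names are hypothetical.

```python
import numpy as np


def cosine_scores(R, pairs):
    """Cosine similarity (r_i . r_j) / (||r_i||_2 ||r_j||_2) for each vertex pair (i, j)."""
    pairs = np.asarray(pairs)
    ri, rj = R[pairs[:, 0]], R[pairs[:, 1]]
    norms = np.linalg.norm(ri, axis=1) * np.linalg.norm(rj, axis=1)
    return np.sum(ri * rj, axis=1) / np.maximum(norms, 1e-12)


def auc(pos_scores, neg_scores):
    """Probability that a held-out edge outscores a non-existing edge (ties count 0.5)."""
    pos = np.asarray(pos_scores)[:, None]
    neg = np.asarray(neg_scores)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))


# Toy representations (hypothetical values): vertices 0 and 1 point in one
# direction, vertices 2 and 3 in another.
R = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]])
held_out_edges = [(0, 1), (2, 3)]  # removed edges used as the test set
non_edges = [(0, 2), (1, 3)]       # vertex pairs without an edge
score = auc(cosine_scores(R, held_out_edges), cosine_scores(R, non_edges))
```

In this toy case both held-out edges score higher than both non-edges, so the AUC is 1.0.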

Conclusions
In this paper, we propose a unified network representation learning framework (MVNR), which can embed higher order neighboring relations and global information into a lower dimension representation space. In addition, we adopt the weight matrix to balance the different k-step network representations within the joint representation learning procedure; namely, we adopt the inductive matrix completion algorithm to learn the network features from both the network structure and the link weights simultaneously. Empirically, the learnt representations can be effectively applied to various machine learning tasks, such as clustering, classification, visualization and link prediction. The experimental results show that the proposed MVNR can effectively capture the global features and the higher order relations between vertices at the same time. Meanwhile, the classification performance of MVNR exceeds that of the popular global network representation learning algorithm (Line) and of the higher order network representation learning algorithms (GraRep and NEU). In future work, we will explore the extensibility of our algorithm to other representation learning tasks.

Figure 2. Parameter sensitivity analysis of the influence of the representation vector size and the K value on the network classification task. (a) Parameter sensitivity analysis on the Citeseer dataset; (b) parameter sensitivity analysis on the Cora dataset; (c) parameter sensitivity analysis on the Wiki dataset.

Table 6. Link prediction performance results on Citeseer dataset.