Heterogeneous Network Embedding Based on Random Walks of Type and Inner Constraint

Abstract: In heterogeneous networks, random walks based on meta-paths require prior knowledge and lack flexibility. On the other hand, random walks based on non-meta-paths only consider the number of node types, but not the influence of the schema and the topology between node types in real networks. To solve these problems, this paper proposes a novel model, HNE-RWTIC (Heterogeneous Network Embedding Based on Random Walks of Type and Inner Constraint). Firstly, to realize flexible walks, we design a Type strategy, which is a node type selection strategy based on the co-occurrence probability of node types. Secondly, to achieve uniformity of node sampling, we design an Inner strategy, which is a node selection strategy based on the adjacency relationship between nodes. Together, the Type and Inner strategies can realize meta-path-based random walks, make the walks flexible, and sample node types and nodes uniformly in proportion. Thirdly, based on the above strategies, a transition probability model is constructed; we then obtain the nodes' embedding based on the random walks and Skip-Gram. Finally, for the classification and clustering tasks, we conducted a thorough empirical evaluation of our method on three real heterogeneous networks. Experimental results show that HNE-RWTIC outperforms state-of-the-art approaches. In the classification task, on DBLP, AMiner-Top, and Yelp, the Micro-F1 and Macro-F1 values of HNE-RWTIC are the highest: 2.25% and 2.43%, 0.85% and 0.99%, and 3.77% and 5.02% higher than those of five other algorithms, respectively. In the clustering task, on the DBLP, AMiner-Top, and Yelp networks, the NMI value is increased by at most 19.12%, 6.91%, and 0.04%, respectively.


Introduction
Many systems in the real world can be modeled as Heterogeneous Information Networks (HINs) [1], such as literature and technology networks, social media networks and medical information networks, etc. Among them, the DBLP literature and technology network (DBLP network for short) is a classic heterogeneous information network (HIN), as shown in Figure 1a. Compared with homogeneous networks [2], HINs contain multiple types of entities and relationships, and richer semantic information. Therefore, HINs have been widely used in various fields.
In the big data era, with the increase of network scale, traditional methods (such as the adjacency matrix) present problems such as high dimensionality, sparseness, and coupling, and have become the bottleneck of network analysis and mining tasks [3]. Fortunately, inspired by the NLP model Word2Vec [4], the first network embedding (network representation learning) model, DeepWalk [5], was proposed in 2014 and can effectively solve the above problems. Therefore, as the basis for dealing with large-scale network analysis tasks, network embedding has attracted extensive attention from industry and academia.
Due to the increase of semantic and structural information in HINs, the methods of homogeneous networks embedding either cannot be used directly or their complexity increases greatly. In contrast to this, the heterogeneous network embedding [6] can preserve the key structural attributes and the semantic attributes, and mine the potential semantic information. This is also of great significance for completing various network application tasks, such as classification [7,8], clustering [9,10], link prediction [11,12], and so on. Therefore, heterogeneous network embedding has become a current research hotspot.
At present, heterogeneous network embedding based on random walks is a classic and widely used approach, mostly relying on meta-path-guided random walks. For example, Metapath2Vec [1] manually selects "A-P-A" or "A-P-C-P-A" as a meta-path to guide the random walks in literature and technology networks. ESim [13] considers the information of multiple meta-paths and attempts to learn the optimal weight combination to guide the random walks. HIN2Vec [14] uses the different types of relations between nodes to combine meta-paths shorter than a certain length to guide the random walks. Meta-paths are the embodiment of semantics in HINs. The meta-path information of DBLP is shown in Figure 1b. In this case, the semantics of "A-P-A" is co-authorship, and "A-P-C-P-A" means two papers published by two authors at the same conference, and so on. There are many meta-paths in HINs. Different meta-paths can capture different semantic information, but their number increases exponentially with the length. The selection of meta-paths either requires domain experts or the optimization of a set of predefined meta-paths, so that many possibilities must be tried. A fixed meta-path limits the flexibility of the random walks. All of this brings significant challenges for the practical application of random walks based on meta-paths. Therefore, there is an urgent need for more flexible random walk methods in HINs.
In order to solve some existing problems of meta-paths, JUST (JUmp & STay) was proposed by Yang et al. [15]. This is the first random walk method based on non-meta-paths for HINs. It applies a Jump/Stay (Jump to other types/Stay on the current node type) strategy when selecting the next node. In Figure 2, the current node type is P, and L = 3 means the walk has stayed in P for 3 steps. In this case, the probability of the next node staying in P is α^3. If α^3 is greater than the given threshold, the walk remains in P. Otherwise, the Jump strategy is performed. JUST maintains a queue Q_hist of size m to memorize up to m previously visited types. Q_hist = {P, A} indicates that the recently visited types are P and A with m = 2. Then, JUST randomly samples one type from {T, V} as the target type, from which the next node is sampled. In Figure 2, we found some problems existing in JUST. (1) For the Stay strategy, 0/1 represents the cases that cannot/must stay in the current node type. In other cases, α^l limits the probability of staying in the current node type, but without considering the schema of the network. For example, in DBLP, it can be seen from Figure 1c that only P can stay in its own type, while A, T, and C cannot stay. (2) For the Jump strategy, the node types {Q − Q_hist} are given priority selection, but it is not checked whether those types meet the Jump requirements. If the current random walk sequence is "···-P-P-P-A-?···" and Q_hist = {P, A}, then the next node type will be T or C. But actually, A only has edges with P, so A can only jump to P, and it makes no sense to consider those types first. (3) JUST only provides the selection of the node type, but does not consider the node selection. When the above problems arise, JUST shows great limitations.
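To make the Jump/Stay discussion concrete, the rule can be sketched in a few lines of Python. This is only an illustrative reconstruction: the function name, the `type_graph` schema encoding, and the default α are our own assumptions, not JUST's reference implementation.

```python
import random

def just_next_type(current_type, stay_count, type_graph, q_hist, alpha=0.5):
    """Sketch of JUST's Jump/Stay rule (names and alpha are illustrative).

    current_type: type of the current node
    stay_count:   number of consecutive steps spent in current_type (l)
    type_graph:   dict mapping each type to the set of types it has edges to
    q_hist:       recently visited types (the Q_hist queue of size m)
    """
    # Stay with probability alpha**l, but only if the schema allows a
    # same-type edge (e.g., only P can stay in its own type in DBLP).
    can_stay = current_type in type_graph[current_type]
    if can_stay and random.random() < alpha ** stay_count:
        return current_type
    # Jump: prefer reachable types not in Q_hist; fall back to any reachable type.
    reachable = type_graph[current_type] - {current_type}
    preferred = reachable - set(q_hist)
    return random.choice(sorted(preferred or reachable))
```

With a DBLP-like schema such as `{"P": {"P", "A", "C", "T"}, "A": {"P"}, ...}`, a walk at an A node can only move to P regardless of Q_hist, which illustrates problem (2) above.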
To sum up, in view of the difficulties in choosing meta-paths, the poor flexibility of meta-paths, and the above problems existing in JUST, we design a novel random walks strategy based on Type and Inner constraint, which considers the node type and the adjacency relationship between nodes.
The main contributions of this paper can be summarized as follows.

•
We propose a novel model HNE-RWTIC. It performs the random walks based on Type and Inner constraint, and adopts Skip-Gram to learn dense and low-dimensional embedding in HINs.

•
We propose a novel random walks strategy based on Type and Inner constraint. In the Type strategy, the node types selection considers the co-occurrence probability of nodes. In the Inner strategy, the nodes selection considers the adjacency relationship between nodes. The strategy realizes the flexibility of node types selection in HINs (see Section 5), and the uniformity of proportional sampling between types and nodes (see Section 6.2.4).

•
We build a transition probability model based on the Type and Inner strategy. In the model, parameters control the selection of node type and node. Then, some properties are obtained. They indicate the relationship between the parameter value and the node type or node selection.

•
Using DBLP, AMiner-Top, and Yelp, we conduct the experiments in the classification and clustering tasks. By comparing with five classic networks embedding algorithms, the correctness and effectiveness of HNE-RWTIC are verified.
The remainder of this paper is organized as follows. Section 2 introduces the related work. Section 3 gives the preliminary knowledge and the problem definition. The random walks strategy, the transition probability model, and the HNE-RWTIC algorithm are described in Section 4. We present the properties and analysis of the transition probability model in Section 5, and show the experimental results and analysis in Section 6. The last section concludes the paper and forecasts future work.
The first set of methods usually decomposes a heterogeneous network into multiple simple networks, learns the embedding of these networks, and integrates them. For example, EOE [16] transforms the academic network into a word co-occurrence network and an author co-occurrence network, and learns the vector representations of the node pairs within and between subnets. HERec [21] transforms a heterogeneous network into multiple homogeneous networks based on meta-path extraction, and then fuses the vector representations of the homogeneous networks through fusion functions. The advantage of the above methods is that existing homogeneous network embedding methods can be directly reused after decomposition. The disadvantage is that the quality of the network decomposition and fusion methods directly affects the vector representation of the original network.
The second set of methods uses deep neural network models to obtain the embedding. For example, HeGAN [21] uses generative adversarial networks to distinguish nodes connected through different relationships, and uses a generalized generator to sample potential nodes, so as to obtain the representation of nodes. ActiveHNE [22] is a semi-supervised embedding method based on graph convolutional neural networks, and adopts different active selection strategies according to uncertainty to make full use of supervisory information. MPDRL [24] uses reinforcement learning to find semantic-rich meta-paths of different lengths based on task accuracy, and performs node embedding based on the resulting meta-path set. The advantage of the above methods is that more abundant semantic and structural information can be learned. The disadvantage is that, as the number of network layers increases, the number of parameters may reach into the millions, which slows down training and greatly increases the hardware requirements.
The third set of methods usually combines random walks and Skip-Gram for embedding. According to the walk strategy, these methods can be divided into two categories. (1) Random walks based on meta-paths, including Metapath2Vec [1], ESim [13], HIN2Vec [14], HeteSpaceyWalk [26], etc. HeteSpaceyWalk systematically formalizes meta-path-based random walks as a higher-order Markov chain process, and proposes a heterogeneous personalized spacey random walk. These techniques are designed to optimize meta-paths, but they still require a specific meta-path, and experimental results show that the quality of node embedding is sensitive to the chosen meta-path. (2) Random walks based on non-meta-paths. For example, JUST [15], which guides random walks with a jump-and-stay strategy, breaks the constraint of needing to define meta-paths in advance and can balance the sampling distribution of different node types in random walks. TANE-RWCN [9] is based on a novel random walk strategy that combines the node's degree, path length, the user's preference for topics, and high-order proximity, and uses the set pair connection number to improve the accuracy of the vector representation. The advantage of the above methods is that the generated walk sequences are treated as articles in NLP, so existing NLP models can be utilized for learning. The disadvantage is that the quality of the random walks affects the performance of embedding.
As a classical graph analysis model, the random walk is often used to describe the reachability between nodes in networks, and is widely used in network embedding. Most existing studies on heterogeneous network embedding based on random walks rely on meta-paths. However, meta-path selection is still challenging in reality, requiring sufficient domain knowledge, and the meta-path also limits the flexibility of walking. Therefore, there is an urgent need to study random walk methods based on non-meta-paths. However, the existing methods based on non-meta-paths either have high complexity, or only consider the selection of the node type and not how to select the nodes within the type. In order to solve the above problems, and inspired by the idea of selecting node types in JUST and the priority search strategy in Node2Vec [7], we propose a novel random walk strategy that can balance the selection of node type and node.

Preliminary Knowledge and Problem Definition
In this section, we first define the important terms, followed by a formal definition of the heterogeneous network embedding problem. For ease of presentation, a list of notations is given in Table 1.

Definition 1. A heterogeneous information network (HIN) is defined as G = (V, E, A, R), where V is the node set, E is the edge set, A = {A 1, A 2, . . . , A n, . . . , A N} (N ≤ |V|) is the node type set, and R is the edge type set. Each node v i ∈ V belongs to a specific node type, denoted by ϕ(v i) = A n ∈ A (1 ≤ n ≤ N), where N = |A| is the number of node types. Each edge e j = (v i, v j) ∈ E belongs to a specific relation type, denoted by ψ(e j) ∈ R, where M = |R| is the number of edge types. It is generally believed that heterogeneous information networks satisfy M > 1 or N > 1.

Definition 2.
The network schema [30] is denoted as T G = (A, R). This is a meta template for a heterogeneous network G = (V, E, A, R) with the object type mapping ϕ:V→A and the link type mapping ψ:E→R.
Definition 3. Given a heterogeneous network G = (V, E, A, R), heterogeneous network embedding is to learn a mapping function f: V → X ∈ R^(|V|×d), d << |V|, so as to obtain the vector representation of the nodes in the network. The vector representation can capture the structural and semantic relationships between nodes in the network.
The purpose of this paper is to study the embedding method of random walks based on non-meta-paths in HINs. Firstly, the random walks strategy is determined and described as a transition probability model. Secondly, the sequence W is obtained through random walks. Then, the obtained W is combined with the Skip-Gram model to learn the embedding of nodes in HINs.
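As a concrete companion to Definitions 1-3, a minimal HIN container might look as follows. This is an illustrative sketch; the class and method names are our own, not part of the paper.

```python
from collections import defaultdict

class HIN:
    """Minimal heterogeneous information network per Definition 1 (illustrative)."""

    def __init__(self):
        self.node_type = {}            # the type mapping phi: V -> A
        self.adj = defaultdict(set)    # undirected adjacency lists

    def add_node(self, v, a_type):
        self.node_type[v] = a_type

    def add_edge(self, u, v):
        self.adj[u].add(v)
        self.adj[v].add(u)

    def num_node_types(self):
        return len(set(self.node_type.values()))   # N = |A|

# A toy DBLP-style fragment: authors (A), papers (P), conferences (C).
g = HIN()
for v, t in [("a1", "A"), ("a2", "A"), ("p1", "P"), ("c1", "C")]:
    g.add_node(v, t)
for u, v in [("a1", "p1"), ("a2", "p1"), ("p1", "c1")]:
    g.add_edge(u, v)
```

Here N = 3 > 1, so the toy graph satisfies Definition 1's heterogeneity condition.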

HNE-RWTIC Model
In this section, the random walks strategy and transition probability model are introduced in detail. Then, the algorithm HNE-RWTIC and its detailed description are given. Finally, the time complexity of the algorithm is analyzed.

Random Walks Strategy
Due to the characteristics of HINs and the problems existing in meta-paths and JUST, we design a random walks strategy based on Type and Inner strategies. This strategy is divided into three steps:

•
Node type partitioning strategy. According to the network schema and the research purpose, the node types are divided into an objective class and a non-objective class.

•

Type strategy. In the node type selection strategy, the co-occurrence probability of three consecutive node types in the walking sequence (the previous, current, and next node types) is considered, and the next node type is selected with the largest probability value.

•

Inner strategy. In the node selection strategy, based on the adjacency relationship of three consecutive nodes, the probability values of backtracking, breadth, or depth are calculated, and the next node is selected by the largest value.
Thus, the node type partitioning strategy can solve JUST problem (2), which occurs when the preferred type does not meet the jump requirement. The Type strategy can solve JUST problem (1), of being confined to the current node type. The Inner strategy can solve JUST problem (3), of not considering how to select the next node. The next step is to build the transition probability model based on these strategies.

Node Type Partitioning
In this paper, the HINs are unsigned. Therefore, the node type needs to satisfy N ≥ 2. In order to better select the node type and the next node, node type partitioning is necessary in HINs.
Based on the network schema and the application, node types are divided into objective and non-objective classes, where the objective class is the type of the entity being studied or the type connected to most classes in the network, denoted as O. The rest are non-objective classes, denoted as Ō. Then, in Definition 1, the set of node types can also be written as A = O ∪ Ō, with |O| = n 1 and |Ō| = n 2, so n 1 + n 2 = N, n 1 = n, n 2 = N − n. In this study, n 1 ≥ 1 and n 2 ≥ 1 are required.
In HINs, ∀v i ∈ V, if ϕ(v i) ∈ O, the type of node v i is the objective class. Otherwise, the type of node v i is the non-objective class, denoted by ϕ(v i) ∈ Ō. During the random walks, the stay probability of the node type is given in Equation (1).
In Equation (1), α ∈ [0, 1] is the probability that the node stays at O, and 1 − α is the probability that the node stays at Ō, as shown in Figure 3.

Figure 3. Illustration of the random walks model.
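Under this reading, Equation (1) amounts to a one-line lookup. The sketch below is an assumed rendering of it (the function name is ours):

```python
def stay_probability(node_type, objective_types, alpha):
    """Stay probability of a node type per Equation (1) (sketch):
    alpha for the objective class O, 1 - alpha for the non-objective
    class O-bar."""
    return alpha if node_type in objective_types else 1.0 - alpha
```

For example, with O = {"P"} in DBLP and α = 0.7, a P node stays with probability 0.7, while an A, T, or C node stays with probability 0.3.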

Transition Probability Model

Given a start node v 0 and length L, we carry out the random walks; v i−1 and v i are the (i − 1)-th and i-th nodes in the path. The transition probability of v i+1 is shown in Equation (2):

P(v i+1 | v i, v i−1) = P Type(ϕ(v i+1) | ϕ(v i), ϕ(v i−1)) × P Inner(v i+1 | v i, v i−1),  (2)

where P Type is the selection probability of ϕ(v i+1), and P Inner is the selection probability of node v i+1. v i−1, v i, and v i+1 represent the previous, current, and next nodes, and ϕ(v i−1), ϕ(v i), and ϕ(v i+1) represent the previous, current, and next node types.

1.
The probability of selecting the node type. In the random walks, we use the parameters α and k to control the transition probability between node types. Given a G, when ϕ(v i) and ϕ(v i−1) are known, the probability of ϕ(v i+1) is shown in Equation (3).
In Equation (3) and Figure 3, the probability of ϕ(v i+1) is divided into five cases. We can see that when ϕ(v i) ∈ O, there are four cases; otherwise, there is only one case, since in current HIN studies there are no edges between nodes in Ō.
When N = |A| = 2, there are only two types in the HIN; in this case, k = 1. When N > 2, Ō contains multiple types, and k is set according to Equation (4).

2.
The probability of selecting the node. After determining the node type, we consider the adjacency relationship between v i+1, v i, and v i−1, and adopt the parameters h, p, and q to control the backtracking, breadth, or depth of the walk. Therefore, the transition probability of v i+1 is shown in Equation (5).
In Equation (5) and Figure 3, d(v i−1, v i+1) = 2 represents that v i+1 is a neighbor of v i but not a neighbor of v i−1; then the probability of v i+1 is 1/q, where q ∈ (0, +∞) controls the breadth or depth. When q > 1, the walk adopts breadth-first search; otherwise, it adopts depth-first search. d(v i−1, v i+1) = 1 represents that v i+1 is a common neighbor of v i and v i−1; then the probability of v i+1 is 1.
p ∈ (0, +∞) and h ∈ {0, 1} are return parameters that control the probability of returning to v i−1. When p > max(q, 1), the walk tends not to return to v i−1; when p < min(q, 1), it tends to return to v i−1. h is set as shown in Equation (6).
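The three cases of Equation (5) can be sketched as an unnormalized weight function in the style of Node2Vec. Combining h and p as h/p in the backtracking case is our assumption; the paper defines their exact interaction in Equations (5) and (6).

```python
def inner_weight(prev, cand, adj, p, q, h):
    """Unnormalized Inner-strategy weight for a candidate next node `cand`
    (a neighbor of the current node), following the three cases of
    Equation (5). Parameter names follow the paper; the code is a sketch.

    prev: the previous node v_{i-1}
    adj:  dict mapping each node to its set of neighbors
    """
    if cand == prev:
        return h / p          # backtrack to v_{i-1}: d(v_{i-1}, v_{i+1}) = 0
    if cand in adj[prev]:
        return 1.0            # common neighbor:      d(v_{i-1}, v_{i+1}) = 1
    return 1.0 / q            # breadth/depth move:   d(v_{i-1}, v_{i+1}) = 2
```

Normalizing these weights over the neighbors of the current node yields the transition distribution for v i+1.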

Algorithm Description of HNE-RWTIC
The description of the algorithm HNE-RWTIC is given in Algorithm 1. Algorithm 1 is mainly divided into four steps.
Step 1: initialize the random walk paths set W to be empty, at line 1.
Step 2: randomly sort all nodes in the network, at line 2.
Step 3: for each starting node, perform r random walks with a walk length of L; the final paths set W is then obtained, at lines 3-16.
Step 4: put W into the Skip-Gram model for training, so that the vector representation of each node in the HIN is obtained, at line 17.
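Steps 1-3 above can be sketched as the following corpus-building loop. `next_node` stands in for the Type/Inner transition of Equation (2); the names and structure are illustrative, not the paper's reference code.

```python
import random

def generate_walks(graph_nodes, next_node, r, L, seed=None):
    """Sketch of Algorithm 1, Steps 1-3 (walk-corpus construction).

    `next_node(path)` returns the next node given the walk so far
    (standing in for the Type/Inner transition), or None to stop early.
    """
    rng = random.Random(seed)
    walks = []                       # Step 1: W := empty set of paths
    order = list(graph_nodes)
    rng.shuffle(order)               # Step 2: randomly sort all nodes
    for v0 in order:                 # Step 3: r walks of length L per node
        for _ in range(r):
            path = [v0]
            while len(path) < L:
                nxt = next_node(path)
                if nxt is None:
                    break
                path.append(nxt)
            walks.append(path)
    return walks                     # Step 4 would feed W to Skip-Gram
```

Step 4 then treats the returned walks as sentences for Skip-Gram training (e.g., via a Word2Vec implementation).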

Algorithm 1 HNE-RWTIC
Input: G, probability parameter α, control parameter k 1, return parameter p, search-mode control parameter q, walk length L, number of walks per start node r, node vector dimension d, window size window. Output: The embedding of nodes Φ ∈ R^(|V|×d).
In Algorithm 1, two key steps are included.
One key step is shown in Algorithm 2. That is, according to ϕ(v i), ϕ(v i−1), and the parameters α and k, the next node type ϕ(v i+1) is selected, at line 12. Algorithm 2 is divided into three steps.

Algorithm 2 next_node_type
Another key step is shown in Algorithm 3. That is, according to the parameters q, p, and h, the next node v i+1 is selected, at line 13. Algorithm 3 is divided into five steps. Step 1: calculate p_in1, p_in2, and p_in3 according to the parameters q, p, and h, at line 1.
Step 3 is shown in lines 3-16. When p_in1 is the largest, the candidate set of v i+1 is {v i−1}. When p_in2 is the largest, v i+1 is obtained through the breadth-first search of the neighbor nodes of v i. When p_in3 is the largest, v i+1 is obtained through the depth-first search of the neighbor nodes of v i. Step 4: a node is randomly selected as v i+1 from the candidate set, at line 10.
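Algorithm 3's selection logic can be sketched as below. The candidate sets are our interpretation of the text (breadth keeps neighbors of v i shared with v i−1, depth keeps the rest); the actual algorithm may differ in details.

```python
import random

def select_next_node(prev, cur, adj, p_in1, p_in2, p_in3, rng=random):
    """Sketch of Algorithm 3: build the candidate set for the largest of
    p_in1 (backtrack), p_in2 (breadth), p_in3 (depth), then pick uniformly.

    adj: dict mapping each node to its set of neighbors.
    """
    if p_in1 >= p_in2 and p_in1 >= p_in3:
        candidates = [prev]                                   # backtrack
    elif p_in2 >= p_in3:
        # breadth: neighbors of cur that are also neighbors of prev
        candidates = [x for x in adj[cur] if x != prev and x in adj[prev]]
    else:
        # depth: neighbors of cur that are not adjacent to prev
        candidates = [x for x in adj[cur] if x != prev and x not in adj[prev]]
    # Step 4: random uniform choice within the candidate set.
    return rng.choice(candidates) if candidates else prev
```

Falling back to `prev` when the candidate set is empty is an assumption added to keep the sketch total.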

Time Complexity Analysis
The key of HNE-RWTIC is to use the Type and Inner strategies to generate the random walks. In the strategy, it is assumed that there are n nodes in the network, the average degree of the nodes is d, each node is selected as a start node r times, and the walk length is L. When ϕ(v i+1) and v i+1 are selected, the time complexity of visiting first-order neighbor nodes is O(d), and that of visiting second-order neighbor nodes is O(d^2). The time complexity of generating the random walks is O(r × L × n). Then, the time complexity of the strategy is O(r × L × n + d + d^2), where r, d, and L are constants. Therefore, the time complexity of the strategy is O(n).

Properties
After constructing the transition probability model based on the Type and Inner strategies, we obtain the following properties of the parameters α, k, p, q, and h in the selection of ϕ(v i+1) and v i+1.
We can see from Property 1 and Equation (3) that, without considering the influence of the parameter k, ϕ(v i+1) is affected by α, as shown in Figure 4a.


Property 3.
With the increase of q ∈ (0, +∞), the selection of v i+1 tends to change from depth to breadth.

Property 4.
With the increase of p ∈ (0, +∞), the selection of v i+1 tends to change from backtracking to non-backtracking.
According to Properties 3 and 4 and Equation (5), ignoring the influence of h, we consider the adjacency relationship of v i−1, v i, and v i+1. The parameter q controls the preference for breadth-first or depth-first search, and p controls the backtracking. The influence of p and q on v i+1 is shown in Figure 4c,d.
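Properties 3 and 4 can be checked numerically with a small sketch. The setup is assumed: one backtrack candidate (weight h/p), `n_common` common neighbors (weight 1), and `n_far` distance-2 neighbors (weight 1/q), normalized into a distribution.

```python
def normalized_inner_probs(p, q, n_common, n_far, h=1):
    """Normalized next-node probabilities illustrating Properties 3 and 4
    (sketch): returns (backtrack probability, probability of each
    distance-2 neighbor)."""
    weights = [h / p] + [1.0] * n_common + [1.0 / q] * n_far
    z = sum(weights)
    back = (h / p) / z
    far_each = (1.0 / q) / z if n_far else 0.0
    return back, far_each
```

Increasing q lowers the probability mass on distance-2 (depth) moves, shifting the walk toward breadth; increasing p lowers the backtracking probability, matching the two properties.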

Analysis
A meta-path is usually used to guide random walks in HINs. For example, in DBLP, we generally choose "A-P-A" or "A-P-C-P-A" as the meta-path in experiments. It can be seen that the meta-path essentially only considers node types. We also describe the selection strategy of node types in Equations (3) and (4). According to Properties 1 and 2, the strategy can realize random walks with a specified meta-path as well as other paths. In this section, DBLP is taken as an example to make a comparative analysis with the meta-path "A-P-C-P-A".
In DBLP, the types in "A-P-C-P-A" are A, P, and C. According to the network schema and Equation (4), in this case N = 3, so k = k 1 or k = 1/k 1.
According to "A-P-C-P-A", the starting node type of the random walks can only be A. When k 1 < 1, according to Property 2 and Equation (4), it can be known that k 1 < 1/k 1. At this time, the path type is Ō-O-Ō.
Similarly, other meta-paths can be realized by the Type and Inner strategy. Therefore, this strategy is more flexible than the meta-path one.

Experimental Setup
In experiments, three real HIN datasets are used: DBLP [4], AMiner-Top [27], and Yelp [20]. A detailed description of the datasets is given in Table 2. Five classic network embedding algorithms are selected for comparison: Node2Vec [7], HIN2Vec [14], Metapath2Vec [1], JUST [15], and HeGAN [21]. In these algorithms, the general parameters are consistent with those in DeepWalk: the number of walks per start node r = 10, walk length L = 100, and node vector dimension d = 128.
The unique parameters of the five algorithms are set as follows. In Node2Vec, the bias parameters are p = 1 and q = 1. In HIN2Vec, the range of the meta-path length is 1-4. In Metapath2Vec, the DBLP and AMiner-Top networks use the meta-path "A-P-C-P-A" suggested by the authors, and Yelp uses "B-U-B". In JUST, the stay probability is set to α ∈ [0.2, 0.5], and the number of recently visited node types recorded is set to m = 1. In HeGAN, the number of iterations is epoch = 20, the training times of the generator and discriminator are n G = 5 and n D = 15, and the Gaussian variance is σ^2 = 1.

Experimental Results and Analysis
In this section, we compare HNE-RWTIC with five classic algorithms on three HINs in terms of classification and clustering. Then, a parameter sensitivity analysis of HNE-RWTIC is performed. Finally, a sampling uniformity experiment is carried out on DBLP.

Classification
The goal of classification is to predict the most likely label of a node based on nodes with known labels. In experiments, we divided the datasets into training sets and test sets. The training sets comprised 100%, 80%, 60%, 40%, and 20% of the dataset, and the rest was used for testing. We trained a one-vs-rest logistic regression classifier, a widely recognized node classification method in heterogeneous network embedding, to predict the labels of the test nodes. Moreover, we compared the predicted results with the true labels. The 4 categories of authors in DBLP and the 3 categories of businesses in Yelp are classified. There are 8 categories of authors in AMiner-Top, but because some categories contain too few labeled authors, we chose the 3 categories with the most labels to classify. We repeated the experiment 10 times and report the mean values of Micro-F1 and Macro-F1 in Table 3. Bold indicates the maximum value, and bold & italics indicates the second-largest value. JUST and the meta-path-based methods consider only the node type and ignore the balance of the node distribution. An unbalanced node distribution has less impact when the network has many node types and edges, because rich semantic relationships can compensate for this shortcoming. However, Yelp contains only two node types and one edge type, so the quality of the walk sequences is seriously affected. For a detailed analysis, refer to the discussion of parameters p and q in Section 6.2.3.
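As a sketch of this evaluation pipeline (with synthetic vectors standing in for the learned 128-dimensional embeddings; all data here is illustrative, not the paper's actual inputs), the one-vs-rest logistic regression and F1 computation might look like:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Synthetic stand-in for 128-d node embeddings and 4 labels (e.g., the
# 4 author categories in DBLP); real inputs would come from HNE-RWTIC.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 128))
y = rng.integers(0, 4, size=500)
X[np.arange(500), y] += 3.0          # make the classes separable

# 80/20 split, as in one of the evaluated settings
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.8,
                                          random_state=0)
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X_tr, y_tr)
pred = clf.predict(X_te)
micro = f1_score(y_te, pred, average="micro")
macro = f1_score(y_te, pred, average="macro")
```

Averaging the two scores over repeated runs, as the paper does, smooths out the variance introduced by the random split.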

Clustering
The goal of clustering is to aggregate similar nodes into the same community. In experiments, K-means was used to cluster the nodes, and NMI was used to evaluate the results. Because the networks are sparse and the communities are large, the NMI values are very small, which makes it difficult to compare different approaches. Therefore, following Metapath2Vec, we selected only the nodes of the two largest communities to calculate the NMI values for DBLP and Yelp, and the three largest communities for AMiner-Top. Their community proportions are 53.2%, 82.01%, and 81.09%, respectively. We repeated the experiment multiple times and report the average NMI values in Table 4. Bold indicates the maximum value, and bold & italics indicates the second-largest value. Table 4 shows that HNE-RWTIC achieves better clustering results on the three networks: its NMI values are up to 19.12%, 6.91%, and 0.04% higher than those of the other algorithms on DBLP, AMiner-Top, and Yelp, respectively.
Interestingly, the embeddings obtained by HNE-RWTIC show a decreasing NMI increment when clustering on DBLP, AMiner-Top, and Yelp. The network analysis therefore suggests that clustering is better when the structural information in the network is richer and the connections between nodes are closer. For example, DBLP has four node types, P, C, T, and A, and four edge types, P-P, P-A, P-T, and P-C; its NMI increment is the largest, 19.12%. AMiner-Top has three node types, P, C, and A, and three edge types, P-P, P-A, and P-C; its NMI increment is 6.91%. However, Yelp has only two node types, B and U, and one edge type, B-U; its NMI increment is the lowest, at 0.04%.
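The K-means plus NMI evaluation described above can be sketched as follows (again with synthetic embeddings standing in for the learned vectors; the data and community labels are illustrative only):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

# Synthetic stand-in embeddings for nodes of the two largest communities
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=400)
X = rng.normal(size=(400, 128))
X[labels == 1, :8] += 4.0            # separate the two communities

# Cluster into k = 2 groups and score agreement with the true communities
pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
nmi = normalized_mutual_info_score(labels, pred)
```

NMI is invariant to a permutation of cluster ids, which is why it suits K-means output: cluster 0 of K-means need not correspond to community 0.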

Parametric Sensitivity Analysis
In the classification and clustering of the three datasets, Micro-F1, Macro-F1, and NMI were used to analyze the sensitivity of the six main parameters of HNE-RWTIC: α, k1, p, q, L, and d. The results are shown in Figures 5-10. The ordinates in the figures represent the classification results (Micro-F1 and Macro-F1) and the clustering results (NMI).

1.
Probability parameter α
Experimental results for α are shown in Figure 5. The abscissa represents the range of parameter α, α ∈ [0.1, 0.9], with a step size of 0.1. In Yelp, when α > 0.5, the walk tends to stay in the current node type, but there are no B-B or U-U edges in the real network, so α ∈ [0.1, 0.5]. The smaller the parameter α, the better the experimental results. A smaller α makes the walk tend toward different node types, that is, toward heterogeneous edges. Conversely, when α is larger, many homogeneous edges are traversed and the quality of the node embedding suffers. The results show that the balance of edge types can be achieved by adjusting α.
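The role of α as a stay probability can be illustrated with a toy sketch (the function and schema names are hypothetical; the real model combines this with the co-occurrence and Inner constraints):

```python
import random

def next_type(current, schema_neighbors, alpha, rng=random):
    """With probability alpha, stay in the current node type (a homogeneous
    step); otherwise jump uniformly to a type adjacent to it in the schema
    (a heterogeneous step). Toy sketch of the probability parameter alpha."""
    if rng.random() < alpha:
        return current
    return rng.choice(schema_neighbors[current])

schema = {"A": ["P"], "P": ["A", "C", "T"], "C": ["P"], "T": ["P"]}

# alpha = 1 always stays in the current type; alpha = 0 always leaves it
assert next_type("A", schema, alpha=1.0) == "A"
assert next_type("A", schema, alpha=0.0) == "P"
```

This also makes the Yelp restriction concrete: with only B-U edges, staying in the current type (a B-B or U-U step) has no corresponding edge, so large α values are excluded.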

2.
Control parameter k1
Experimental results for k1 in DBLP and AMiner-Top are shown in Figure 6. The abscissa represents the range of log values for parameter k1, log2(k1) ∈ [−3, 3], with a step size of 1. Figure 6 shows that when log2(k1) ∈ [−3, 0], both classification and clustering achieve good results. In DBLP and AMiner-Top, when k1 = 0.5, the classification results are the best, with Micro-F1 and Macro-F1 values of 0.8863 and 0.8865, and 0.8962 and 0.8859, respectively. When k1 is 0.25 and 0.5, the clustering results are the best, with NMI values of 0.7826 and 0.4461.
Since Yelp has only two node types, the value of k1 is always 1, so no parameter analysis of k1 is carried out for Yelp.
The smaller the parameter k1, the better the experimental results. A smaller k1 indicates that the node types tend to differ, that is, there are more types of heterogeneous edges. Conversely, when k1 is larger, there are too few heterogeneous edge types and the quality of the node embedding suffers. The results show that the balance of the number of edge types can be achieved by adjusting k1.

3.
Return parameter p
Experimental results for p are shown in Figure 7. The abscissa represents the range of log values for p, log2(p) ∈ [−3, 3], with a step size of 1. Figure 7 shows that when log2(p) ∈ [0, 2], both classification and clustering achieve good results in DBLP and Yelp; when log2(p) ∈ [−2, 0], both achieve good results in AMiner-Top. In DBLP, AMiner-Top, and Yelp, when p is 0.25, 4, and 2, the classification results are the best, with Micro-F1 and Macro-F1 values of 0.9031 and 0.9030, 0.8960 and 0.8856, and 0.7479 and 0.7179, respectively. When p is 0.5, 4, and 2, the clustering results are the best, with NMI values of 0.8091, 0.4446, and 0.0017, respectively. A larger p indicates that the walk tends to migrate to other nodes, that is, to visit more nodes. Conversely, when p is smaller, the walk tends to return to the previous node v_{i−1}, which affects the distribution of nodes. The results show that the balance of the node distribution can be achieved by adjusting p.

4.
Controlling search mode parameter q
The experimental results for q are shown in Figure 8. The abscissa represents the range of log values for q, log2(q) ∈ [−3, 3], with a step size of 1. Figure 8 shows that when log2(q) ∈ [−2, 0], classification achieves good results, and when log2(q) ∈ [0, 2], clustering achieves good results in DBLP, Yelp, and AMiner-Top. In DBLP, AMiner-Top, and Yelp, when q is 0.5, 0.125, and 1, the classification results are the best, with Micro-F1 and Macro-F1 values of 0.9157 and 0.9159, 0.9002 and 0.8907, and 0.7479 and 0.7179, respectively. When q is 1, 0.125, and 1, the clustering results are the best, with NMI values of 0.8091, 0.4689, and 0.0017, respectively.
When q ∈ [0.5, 1], the classification results are better; when q ∈ [1, 4], the clustering results are better. q > 1 indicates that the walk tends to be breadth-first, and q < 1 indicates that it tends to be depth-first. The results show that the distribution of nodes within a type can be balanced by adjusting q.
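The semantics of p and q described above match the Node2Vec-style second-order bias; a minimal sketch of that bias (illustrative names, unnormalised weights, not the paper's exact transition formula) is:

```python
def step_bias(prev, cur, nxt, graph, p, q):
    """Unnormalised second-order walk bias in the Node2Vec style:
    1/p to return to the previous node, 1 for a neighbour shared with
    it, and 1/q to move outward. HNE-RWTIC's Inner strategy combines
    such node-level biases with its Type strategy."""
    if nxt == prev:
        return 1.0 / p        # return step, controlled by p
    if nxt in graph[prev]:
        return 1.0            # stays close to prev (breadth-first tendency)
    return 1.0 / q            # moves outward (depth-first tendency)

# Toy graph: triangle a-b-c with a pendant node d attached to c
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}

# Walking a -> c with p = 0.25, q = 0.5, the candidate next nodes weigh:
assert step_bias("a", "c", "a", graph, 0.25, 0.5) == 4.0  # back to a: 1/p
assert step_bias("a", "c", "b", graph, 0.25, 0.5) == 1.0  # b neighbours a
assert step_bias("a", "c", "d", graph, 0.25, 0.5) == 2.0  # outward to d: 1/q
```

Normalising these weights over the neighbours of the current node yields the transition distribution for the next step.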

5.
Walk length parameter L
Experimental results for L are shown in Figure 9. The abscissa represents the range of parameter L: [40, 60, 80, 100, 150, 200]. Figure 9 shows that when the walk length is small, the classification results are poor. As L increases, the results improve, but after L > 100 there is a downward trend. The clustering results are generally better for L ∈ [80, 100], and they improve as L increases in Yelp. In DBLP, AMiner-Top, and Yelp, when L is 100, 150, and 100, the classification results are the best, with Micro-F1 and Macro-F1 values of 0.9157 and 0.9159, 0.9015 and 0.8924, and 0.7479 and 0.7179, respectively. When L is 60, 100, and 200, the clustering results are the best, with NMI values of 0.8167, 0.4689, and 0.0019, respectively. To be consistent with the baseline algorithms, we set L = 100.

6.
Dimension parameter d
The experimental results for d are shown in Figure 10. The abscissa represents the range of parameter d: [32, 64, 128, 256, 512]. Figure 10 shows that when d is small, the classification and clustering results are relatively poor. As d increases, the results improve, but when d > 256 they level off. In DBLP, AMiner-Top, and Yelp, when d is 512, 512, and 256, the classification results are the best, with Micro-F1 and Macro-F1 values of 0.9267 and 0.9267, 0.9019 and 0.8923, and 0.7589 and 0.7260, respectively. When d = 128, the clustering results are the best, with NMI values of 0.8091, 0.4689, and 0.0017, respectively.

Uniformity of Sampling
This section analyzes the sampling uniformity in terms of node types and nodes of the same type. Taking DBLP as an example, with α = 0.4, k1 = 0.5, p = 0.25, q = 0.5, L = 100, and r = 10, 10 experiments were carried out. The mean values for each node type and each node were taken for analysis.

1.
Distribution of node types
The distribution of the four node types in DBLP is shown in Table 5. NUM1 is the number of nodes of types P, T, C, and A: 5237, 5915, 4479, and 18, respectively. Since the number of C nodes is much smaller than the others, non-uniform data processing is required: according to the ratio of the sampling number of P, A, and T to their actual numbers, the filling value for type C is 1671. NUM2 is the average number of samples of P, T, C, and A: 78,400, 314,161, 346,665, and 13,073, respectively. To make the results more intuitive, we take Log2 of NUM1, NUM2, and NUM2/NUM1; the comparison is shown in Figure 11. The trends of Log2(NUM1) and Log2(NUM2) are the same in Figure 11a. In Figure 11b, the value of Log2(NUM2)/Log2(NUM1) is close to 1.5. In other words, the sampling proportions of the four types match their actual proportions in the network. Therefore, the strategy achieves proportional and uniform sampling of node types.
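As an illustrative check (toy corpus and hypothetical counts, not the DBLP figures above), the NUM1/NUM2 comparison amounts to counting type occurrences in the walks and comparing against the actual node counts:

```python
from collections import Counter

# Toy walk corpus; each node id encodes its type as the first character
walks = [["a1", "p1", "c1", "p2", "a2"],
         ["a3", "p2", "c1", "p1", "a1"]]

# NUM2: how often each type was sampled across all walks
num2 = Counter(node[0].upper() for walk in walks for node in walk)
# NUM1: how many distinct nodes of each type the toy corpus contains
num1 = {"A": 3, "P": 2, "C": 1}

# Sampling is proportionally uniform when NUM2/NUM1 is similar across types
rates = {t: num2[t] / num1[t] for t in num1}
```

On the real data the paper reports these quantities on a Log2 scale, which keeps the widely differing counts comparable on one axis.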

2.
Distribution of nodes of the same type
The sampling distribution of nodes of the same type in DBLP is shown in Figure 12. The abscissa is the node identifier, and the ordinate is the Log2 of the average sampling value of the node, with the minimum value set to 6 for comparison purposes. In Figure 12a, 90.95% of the sampling values of A-type nodes are distributed between 7 and 10. In Figure 12b, 84.79% of the sampling values of P-type nodes lie between 10 and 12. In Figure 12c, 85.74% of the sampling values of T-type nodes lie between 6 and 10. In Figure 12d, 100% of the sampling values of C-type nodes lie between 14 and 17. In summary, for A, P, T, and C, at least about 85% of the nodes of each type are uniformly distributed. Therefore, the strategy achieves proportional and uniform sampling of nodes within each type.

Figure 11. Uniformity of node types. (a) Log2 of NUM1 and NUM2; (b) Log2 ratio of NUM2 to NUM1.



Discussion
To demonstrate the correctness and effectiveness of HNE-RWTIC, we conducted the above experiments on three real HINs. First, the six models were used to learn vector representations of the nodes; then, a one-vs-rest logistic regression classifier and K-means were used for the classification and clustering tasks.
Among the baselines, Node2Vec is a homogeneous network embedding model based on random walks. Therefore, the vector representations learned by Node2Vec lose heterogeneous semantic information, which degrades classification and clustering performance. HIN2Vec and Metapath2Vec are models based on meta-paths. Due to the limitations of meta-paths, their performance is unstable, and they achieve good results only on the DBLP network. JUST is a model based on non-meta-path random walks. Although JUST overcomes some shortcomings of meta-paths, it considers only the node type and ignores the uniform distribution of nodes and types, so its performance is mediocre. HeGAN is a model based on generative adversarial networks. HeGAN focuses on node sampling, but it achieves good results only on AMiner-Top.
Compared with the five classic algorithms, HNE-RWTIC comprehensively considers the co-occurrence probability of node types and the adjacency relationships between nodes. It not only realizes flexible walks among all kinds of nodes in HINs, but also achieves proportionally uniform sampling of both types and nodes. Therefore, HNE-RWTIC performs best overall.

Conclusions
In this paper, HNE-RWTIC is proposed based on a random walk strategy with Type and Inner constraints. HNE-RWTIC realizes flexible random walks by adjusting parameters. It also realizes uniform sampling of nodes and node types, balances homogeneous and heterogeneous edges, and balances the node distribution across types. In experiments on three real networks, the F1-score and NMI of HNE-RWTIC outperform five other state-of-the-art approaches in the classification and clustering tasks. Future work will improve the type and node constraint strategy for more complex HINs, and will combine the ideas of type and node constraints with dynamic networks. Another important task is to use heterogeneous graph convolution and deep learning models to automatically learn richer semantic and structural information.