HeteEdgeWalk: A Heterogeneous Edge Memory Random Walk for Heterogeneous Information Network Embedding

Most Heterogeneous Information Network (HIN) embedding methods use meta-paths to guide random walks to sample from HIN and perform representation learning in order to overcome the bias of traditional random walks that are more biased towards high-order nodes. Their performance depends on the suitability of the generated meta-paths for the current HIN. The definition of meta-paths requires domain expertise, which makes the results overly dependent on the meta-paths. Moreover, it is difficult to represent the structure of complex HIN with a single meta-path. In a meta-path guided random walk, some of the heterogeneous structures (e.g., node type(s)) are not among the node types specified by the meta-path, making this heterogeneous information ignored. In this paper, HeteEdgeWalk, a solution method that does not involve meta-paths, is proposed. We design a dynamically adjusted bidirectional edge-sampling walk strategy. Specifically, edge sampling and the storage of recently selected edge types are used to better sample the network structure in a more balanced and comprehensive way. Finally, node classification and clustering experiments are performed on four real HINs with in-depth analysis. The results show a maximum performance improvement of 2% in node classification and at least 0.6% in clustering compared to baselines. This demonstrates the superiority of the method to effectively capture semantic information from HINs.


Introduction
In this current era of rapid Internet development, the volume of data is rising, and the correlation of data is becoming more complex. Most of them are presented as networks with social relationships and genome, academic, and protein interactions, to name a few. Network embedding has become a major research direction. The main concept of mapping the nodes in the network is to a consecutive lower dimensional vector representation. It is utilized to preserve key structural and semantic information in the network. Not only that, network embeddings can be applied to all kinds of supervised and unsupervised downstream tasks, such as node classification [1,2], link prediction [3,4], and node clustering [5,6], among others.
Random walks are extensively used for exploring information about nodes in the immediate vicinity of a network [7,8]. These methods can be used to represent nodes in homogeneous information networks; they are not appropriate for HINs. HINs are more complex, with multiple types of nodes containing both homogeneous and heterogeneous edges. A simple academic HIN is depicted in Figure 1a, consisting of three different types of nodes, three heterogeneous edges, and one homogeneous edge. The nodes are author (A), paper (P), venue (V), and topic (T), while the heterogeneous edges are A-P, P-T, and P-V, respectively. Inspired by homogeneous information network methods, Dong et al. and Shi et al. [9,10] proposed to sample network nodes via meta-path guided random walks walks to solve the complex relationships of HINs. As shown in Figure 1b, just as the metapath A-P-A in [9] represents two authors co-authoring a paper, A-P-V-P-A represents two authors publishing papers in the same venue. However, the selection of the best metapaths continues to present a difficult time and quality challenge. A single meta-path is not adequate to represent the rich structural and semantic information in HINs. Additionally, the choice of meta-paths frequently calls for specialized domain knowledge. Fu et al. [11] propose a strategy using automatic extraction of meta-paths below a threshold value. However, the quantity of these self-selected meta-paths increases exponentially with their length. In any case, the quality of the predefined meta-path or the length chosen have a significant impact on the outcome. Network schemas are used by [12] to guide their spatial random walks. As shown in Figure 1c, network schema is defined as a minimal graph consisting of all node and edge types in a HIN together. However, the results of the sampling will be biased in favor of types where there appear to be more nodes. Samy et al. [13] also use network schema to guide random walks, balancing the choice of edge types in the network. However, the graph data itself is unevenly distributed. Nodes of the same type accumulate together, leading to starvation. This makes some of the clustered nodes inaccessible, losing node information. Network embedding methods depend on the quality of sampling. Both homogeneous and heterogeneous network embedding methods improve the quality of sampling as much as possible to produce a better representation of the network structure and semantic information. The main problem is that the nodes with high centeredness in the network may appear more frequently in the walk sequence. This makes the sampling unbalanced, which leads to a negative impact on the embedding results. The main concern of this paper is how to choose the rich information in HINs in a balanced manner while minimizing the influence of the central node on the sampling results. We propose HeteEdgeWalk, a random walk that samples flexibly through network schema, to achieve this. It mitigates bias when exploring HINs by regulating random walk strategy. To be precise, our aim is a more balanced sampling of edge types. We use a dynamically adjusted bidirectional edge sampling walk strategy. Specifically, we make a probabilistic selection of the next type of edge in the walk by means of an exponential decay function. It reduces the frequency of recurrence of the same edge type in the random walk. Then, we use a dynamic resizing sequence to store the last few edge types to mitigate the probability of starvation during the walk.
The main contributions of our work are as follows: • We propose HeteEdgeWalk, a HIN-embedding technique based on network schema. It is an approach that does not involve meta-paths and optimizes the representation more fairly and comprehensively by capturing composite interactions; Network embedding methods depend on the quality of sampling. Both homogeneous and heterogeneous network embedding methods improve the quality of sampling as much as possible to produce a better representation of the network structure and semantic information. The main problem is that the nodes with high centeredness in the network may appear more frequently in the walk sequence. This makes the sampling unbalanced, which leads to a negative impact on the embedding results. The main concern of this paper is how to choose the rich information in HINs in a balanced manner while minimizing the influence of the central node on the sampling results. We propose HeteEdgeWalk, a random walk that samples flexibly through network schema, to achieve this. It mitigates bias when exploring HINs by regulating random walk strategy. To be precise, our aim is a more balanced sampling of edge types. We use a dynamically adjusted bidirectional edge sampling walk strategy. Specifically, we make a probabilistic selection of the next type of edge in the walk by means of an exponential decay function. It reduces the frequency of recurrence of the same edge type in the random walk. Then, we use a dynamic resizing sequence to store the last few edge types to mitigate the probability of starvation during the walk.
The main contributions of our work are as follows: • We propose HeteEdgeWalk, a HIN-embedding technique based on network schema. It is an approach that does not involve meta-paths and optimizes the representation more fairly and comprehensively by capturing composite interactions; • We analyze the disadvantages of sampling based on node types and suggest sampling based on edge types as a more versatile and finer-grained method for exploring HINs; • Four large real-world datasets are utilized to evaluate the experimental. The results show that HeteEdgeWalk achieves optimal or sub-optimal performance compared to the baselines. In the study of parameter sensitivity, our approach exhibits robustness in various scenarios.

Related Work
According to the type of graph, homogeneous and heterogeneous information networks are the two major categories of available graph-embedding methods. On the one hand, all nodes of a homogeneous information network belong to a single domain, such as a social network of users. Many graph-embedding methods attempt to use the adjacency matrix to obtain the embedding representation of homogeneous information network. However, the sparsity of the adjacency matrix and the amount of matrix computation posed significant challenges. Perozzi et al. [7] proposed a strategy to learn node embeddings by combining a random walk with the SkipGram model [14]. SkipGram maximizes the co-occurrence probability of nodes in context window in random walks. Tang et al. [8] achieved better embedding by preserving first and second order proximity in homogeneous information networks. On the other hand, HINs exhibit a more complicated architecture. Its nodes belong to various domains with heterogeneous and homogeneous edges. The homogeneous information network embeddings directly extending to HINs will result in low-quality node embeddings. The performance of downstream graphics analysis tasks may be further degraded as a result. Dong et al. [9] propose a combination of meta-path guided random walks and heterogeneous SkipGram strategy to capture the key structural properties of HINs. They manually select A-P-A and A-P-V-P-A as meta-paths of the academic HIN in accordance with a priori knowledge. However, the selected meta-path is only appropriate for this network. This causes the meta-path to be highly dependent on the expertise of domain experts.
Some methods either select meta-paths according to predefined criteria or use other approaches instead of meta-paths to achieve improved robustness. First, the specific criterion is generally to set a threshold value for the length of meta-paths, and only those below the threshold are retained. Fu et al. [11] combine all meta-paths below a predetermined threshold in order to direct random walks and thus learn different relationships between nodes. Second, other methods can also be used instead of meta-paths. Zhang et al. [15] use meta-graph to direct random walks. However, the generation of the meta-graph is a considerable overhead. Hussein et al. [16] propose the Jump and Stay strategies to regulate the sampling of node sequences. The Stay strategy keeps the current domain sampled and saves the most recent node types so that domains with the same node type are not visited repeatedly in the near future, as opposed to the Jump strategy, which selects to leap to the next domain with a different type of node. This successfully strikes a slight balance between sampling at heterogeneous and homogeneous edges. However, it is challenging to access short sequences, such as (A-P-A), which leads to the loss of node information. He et al. [12] used network schema to guide their spatial random walks. The results of the sampling, however, will be biased in favor of types where there appear to be more nodes. Samy et al. [13] also used network schema to guide random walks, balancing the choice of edge types in the network. However, the graph data itself are unevenly distributed, with nodes of the same type accumulating together. This leads to starvation, thus making some of the clustered nodes inaccessible, losing node information. In this paper, we propose a flexible sampling strategy that does not utilize any meta-path. It can not only provide a balance of heterogeneous edges but also mitigates the probability of starvation during the walk. Our edge-sampling method can capture the HIN information better, as compared to node sampling. It mitigates the effect of high-frequency nodes in random walks and provides a more balanced sampling of heterogeneous edges.
Finally, there are other HIN-embedding techniques, such as the use of deep learning frameworks [17], hyperedges [18], hypergraphs [19], and HeGAN [20], which learn node embeddings using adversarial learning. In addition to static charts, there are also tech-nologies applied to dynamic graphs [21]. In terms of applications, HIN embedding can be applied to many aspects, such as recommender systems [10], business networks [22], and bioinformatics networks [23].

Materials and Methods
In this section, we provide a detailed description of our HeteEdgeWalk method. It utilizes a dynamically adjusted bidirectional edge sampling walk strategy on a HIN without involving meta-paths. Specifically, HeteEdgeWalk first conducts random walks on the imported HIN. In selecting the next node, our method first performs probabilistic balanced edge sampling. Then, we randomly select nodes by sampled edge types. Finally, these random walks are input into the SkipGram to output node embeddings. Hereafter, we first introduce some key concepts. Subsequently, we describe in detail dynamically adjusted bidirectional edge sampling walks and node embedding learning with SkipGram. Finally, the materials and experimental setup are described.

Preliminaries
Heterogeneous Information Network (HIN). Given as a graph G = (V, E , A, R), where V and E represent set of nodes and edges, respectively. Given a function φ: V → A for node-type mapping and a function ψ: E → R for edge-type mapping, A and R represent node-type and edge-type set, respectively. When |A| + |R| > 2 is a HIN, the opposite is a homogeneous information network.
Heterogeneous Information network embedding. Given a HIN G = (V, E , A, R), for each node v ∈ V, the aim is to learn the d-dimensional consecutive embedding function f : V → R d to maintain each node v semantic and structural information in the HIN, where d |V |. Network schema. Given a HIN G = (V, E , A, R), a network schema is a minimal graph T G = (A, R) containing all node and edge types.

Dynamically Adjusted Bidirectional Edge Sampling Walk
We design dynamically adjustable edge-sampling strategy without meta-path guidance inspired from [16]. Jiang et al. proposed bidirectional random walks, which allow retrieval of more tail nodes than a unidirectional one, in theory [24]. We have followed this idea to design a bidirectional edge sampling walk. After this, the bidirectional edgesampling walk is simply called Bi-EdgeSamplingWalk. It is an edge-sampling strategy that adjusts for walks by rationalizing the choice of edges. To guide the random walk behavior in the HIN, the following probability is first defined to select the next edge type r i+1 : where R N (v i ) denotes the set of all edge types connected to the current node v i , and Weight(r) denotes the probability of all edge types being selected. We adjust Weight(r) by the number of visits n to this edge type. The probability decreases progressively as the number n increases, as follows: where α is the decay parameter and is set to 0.4. Node types that link multiple-type edges in the graph often act as pivots. To reduce the influence of this high-frequency type of nodes, we sample the different edge types as evenly as possible. A queue Q is designed for storing the most recently visited edge types, with the following rules: where in(r) represents the in-queue operation, which adds edge type r to the end of queue Q. The pop() represents the out-of-queue operation. The pop(r) means to remove the edge type r from the queue Q. When the queue |Q|=|R N (v i )|, all edge types connected by the current node v i have been visited in the recent past. After that, r ∈ R N (v i ) is chosen randomly as the next edge type to achieve more balanced edge type access. Afterwards, the next node of the present node v i is selected depending on the edge type, as follows: The above process is the basic strategy of our algorithm. To better understand this process, Figure 2 illustrates an instance of a walker in a simple academic HIN with Bi-EdgeSamplingWalk.
where ( ) represents the in-queue operation, which adds edge type to the end of queue . The () represents the out-of-queue operation. The ( ) means to remove the edge type from the queue . When the queue| | = |ℛ ( )| , all edge types connected by the current node have been visited in the recent past. After that, ∈ ℛ ( ) is chosen randomly as the next edge type to achieve more balanced edge type access.
Afterwards, the next node of the present node is selected depending on the edge type, as follows: The above process is the basic strategy of our algorithm. To better understand this process, Figure 2 illustrates an instance of a walker in a simple academic HIN with Bi-EdgeSamplingWalk. It shows the storage queue and weight changes in a walker with A1 as the starting node. Set and weights to 0.1 and 1, respectively. Taking the left half as an example, nodes of type A are linked to nodes of type P only, and A1 can only select edges of type A-P. Weight (A-P, I = 1) = 0.1, where I is the number of selections. Meanwhile the queue is initially empty, and the ( ) operation is performed directly. After walking to P1, P links four edge types, all with a weight of 1. P-T is randomly chosen as the next edge type and the weights are updated. Because | | = 1 < |ℛ ( )|/2 = 2, the ( ) operation is performed and the queue stores A-P and P-T. After walking to T1, | | ≥ |ℛ ( )|/2, the () and ( ) operation is performed, the queue is updated to P-T and T-P. Meanwhile, the weight of T-P is updated to 0.1. After walking to P3, since P-T has a weight of 0.1 and the other three edge types have a weight of 1, P-P is chosen randomly as the next edge type and the weights are updated. | | = |ℛ ( )|/2, the queue is updated to T-P and P-P.
The pseudo-code for the Bi-EdgeSamplingWalk algorithm is displayed in Algorithm 1. It shows the storage queue and weight changes in a walker with A1 as the starting node. Set α and weights to 0.1 and 1, respectively. Taking the left half as an example, nodes of type A are linked to nodes of type P only, and A1 can only select edges of type A-P. Weight (A-P, I = 1) = 0.1, where I is the number of selections. Meanwhile the queue is initially empty, and the in(r) operation is performed directly. After walking to P1, P links four edge types, all with a weight of 1. P-T is randomly chosen as the next edge type and the weights are updated. Because |Q| = 1 < |R N (v i )|/2 = 2, the in(r) operation is performed and the queue Q stores A-P and P-T. After walking to T1, |Q| ≥ |R N (v i )|/2, the pop() and in(r) operation is performed, the queue Q is updated to P-T and T-P. Meanwhile, the weight of T-P is updated to 0.1. After walking to P3, since P-T has a weight of 0.1 and the other three edge types have a weight of 1, P-P is chosen randomly as the next edge type and the weights are updated. |Q| = |R N (v i )|/2, the queue Q is updated to T-P and P-P.
The pseudo-code for the Bi-EdgeSamplingWalk algorithm is displayed in Algorithm 1.
Input: the graph G = (V, E , A, R), the start node per walk v, the walk length l, the decay parameter α. Output: the random walk sequence W.

Node Embedding Learning with SkipGram
After obtaining the random walk sequence W, HeteEdgeWalk uses the SkipGram [14]. In particular, it aims to maximize the co-occurrence probability p of nodes that appear in the identical context window k, as follows: where w is a walk and N(v) is the neighborhood nodes in w. The node v does not exceed the size of context window k. The co-occurrence probability is generally defined as a softmax function, as follows: The number of nodes is usually large. The negative sampling technique [25] is generally used to achieve an approximate probability. It maximizes the probability that the target node does not occur at the same time as a randomly sampled negative node. The final maximization objective is formulated as follows: where M is the number of negative samples and p(u) is the distribution of negative samples. The above equation is the full process for SkipGram to learn the node representation in HIN. Finally, the pseudo-code for the entire HeteEdgeWalk algorithm is given in Algorithm 2. HeteEdgeWalk as a whole is divided into two parts, Bi-EdgeSamplingWalk and SkipGram. In the first part, we perform edge sampling and generate random walks on the HIN. When the next node is to be selected, our method first performs probabilistic balanced edge sampling by parameter α. Then, the most recently visited edge types are stored according to certain conditions. Finally, we use the sampled edge types to randomly select nodes. In the second part, we input the random walks into the SkipGram to learn the node embedding results.  (G, λ, l, d, k, α) Input: the graph G = (V, E , A, R), the numbers of walks per node λ, the walk length l, the node-embedding dimension d, the context window size k, the decay parameter α. Output: the node representations X ∈ R |V |×d 1: Initialize X; 2: for i = 1 → λ do ; 3: for v ∈ V do; 4: W = Bi-EdgeSamplingWalk (G, v, l, α) 5: X = SkipGram(X, W, d, k); 6: end for 7: end for 8: return X

Dataset Description
We use four real heterogeneous networks for evaluation: DBLP [13], ACM [26], Foursquare [27], and Movies [13]. Table 1 contains statistics on the four HINs. • DBLP [13] is an academic network based on four types of nodes. It includes 5915 authors (A), 5237 papers (P), 4479 topics (T), and 18 venues (V). In addition, each author corresponds to the below four domain areas: "data mining", "information retrieval", "database", and "machine learning"; • ACM [26] is the other academic network containing three types of nodes. The nodes of the network include 7167 authors (A), 4039 papers (P), and 60 subjects (S). Each paper falls under one of the three research categories of "databases", "wireless communications", or "data mining"; In addition, each movie corresponds to the following movie themes: "action", "horror", "adventure", "science fiction", and "crime".
• DeepWalk [7] is a classical homogeneous graph-embedding algorithm. It first performs random walks for each node and transforms the obtained random walks into node embeddings using the SkipGram model; is a HIN-embedding algorithm based on meta-path-guided random walks. It first guides random walks through meta-paths defined in advance and then learns node embedding through the SkipGram model. For DBLP, the meta-paths as in [13] are P-A-P and A-P-V-P-A. The selection of P-A-P and P-S-P as meta-paths for ACM is similar to the selection in [26]. Similar to [13], the meta-paths for Foursquare are U-C-P-C-U and P-C-T-C-P, and the meta-paths for Movies are A-M-D-M-A and A-M-C-M-A, respectively; • JUST [16] is a HIN-embedding algorithm that proposes whether meta-paths are necessary or not. The relational information of each node is treated as a domain, and whether to jump to the next node domain is controlled by the Jump and Stay operations. Jump is equivalent to selecting heterogeneous nodes, and similarly, Stay selects homogeneous nodes. It prioritizes heterogeneous Jump and reduces the heterogeneous Jump probability to achieve balanced sampling of heterogeneous and homogeneous edges. Additionally, it stores the last few selected node domain types to restrict the same node types of accesses. Finally, the node embedding is learned through the SkipGram model; • Bidirectional Random Walks are the sampling part of MARU [24]. After this, it is simply called Bidirectional. It involves performing two independent random walks for each node and stitching together a random walk with the node as the center. This is used to overcome the problem that in traditional random walks, low-order nodes are mostly found at the beginning of the walk. It causes the SkipGram model to ignore information at the beginning of the walk, which leads to the loss of node information.
Ref. [24] designed a unique heterogeneous SkipGram model and set the window size to the walk length to obtain complete context window information. However, in this paper, only the classical SkipGram algorithm is used for evaluation; • SchemaWalk [13] is a schema-aware random walk algorithm for HIN embedding. The edge-sampling mechanism is used to control the selection of edge types by exponential descent. Nodes are then randomly drawn from their corresponding edge types to achieve a more balanced sampling effect. Finally, the sampled random walks are put into the SkipGram model to obtain the corresponding node embeddings and learn the node representations.
Note that DeepWalk was originally designed for homogeneous information networks. Applying it to HIN embedding is to consider the graph as a homogeneous graph. There are also HIN-embedding methods that replace the SkipGram model with a unique heterogeneous SkipGram model, such as HIN2Vec [11], Metapath2Vec++ [9], and MARU [24]. We believe that the most expressive random walks can be combined with different learning methods. In what concerns this paper, we highlight the performance of different random walks as a sampling technique. Therefore, we are using only the classical SkipGram model to learn node embeddings. Methods involving more complex learning components are not used as a baseline because comparisons with them are beyond the scope of this work.

Implementation
We implement the proposed model HeteEdgeWalk with Python. The Python version is 3.9.16. SkipGram is implemented using Gensim [28], and the Sklearn [29] library is used to evaluate the results. The same hyper-parameters are set for all methods. Specifically, we set walk length l and number of walks λ to 40 and 80, respectively. We also set window size k and dimension size d to 5 and 128, as well as negative samples n to 5. The average of the results of 10 random data segmentation evaluations is used as the final evaluation result. Specifically, for each segmentation, 10% to 80% of the database is utilized as the training set with a 10% step size, and we evaluate it on a 20% testing set.

Results
In this section, we evaluate the experiments on node classification and node clustering tasks. Subsequently, we analyze the distribution of edge types across random walks and a parameter sensitivity study.

Node Classification Task
The purpose of multi-label node classification is to predict the most probable labels of target nodes according to marked nodes. It focuses on authors in DBLP, movies in Movies, points of interest in Foursquare, and papers in ACM. We train a one-to-all logistic regression function. We divided 80% of the data as the training set and into eight parts, with training percentages [0.1, 0.8] for each split. The remaining 20% of the data served as a testing set. The ten-fold averages of Micro-F1 and Macro-F1 are the evaluation metric. Figure 3 illustrates the performance of six random walks on four different HINs for the multi-label node classification. The results show that increasing the training ratio usually leads to better results. First, it is obvious that HeteEdgeWalk achieves optimal or suboptimal results in most cases. The performance in the DBLP is more outstanding. Both HeteEdgeWalk and SchemaWalk achieve high accuracy scores at just 10% of the training percentage, and they demonstrate stable performance on all datasets. Second, since the performance of Metapath2Vec depends on the suitability of meta-path selection for the current dataset, it performs best in Foursquare but worst in Movies. Finally, DeepWalk and Bidirectional perform well on both Movies and ACM. This shows that both datasets are more relevant to correctly classifying nodes without considering node and edge types. However, they do not perform well on datasets with more heterogeneous information. This suggests that both algorithms are more suitable for homogeneous information networks.
To demonstrate the reliability and validity of the method, we perform statistical significance tests on the results of node classification. We analyze the results of Micro-F1 using a paired t-test. Our method is an experimental group, with a baseline as a control group. Table 2 shows the significance level of our method relative to the baseline. The test results show that in most cases, our method is significant at the 0.01 level compared to the baseline. In ACM, our method and Bidirectional did not show differences. Additionally, in Foursquare, Metapath2vec is significantly different and better than our method. This result is also consistent with the node classification results. Overall, all improvements in HeteEdgeWalk are statistically significant at a 95% confidence level relative to the baselines in most cases.  To demonstrate the reliability and validity of the method, we perform statistical significance tests on the results of node classification. We analyze the results of Micro-F1 using a paired t-test. Our method is an experimental group, with a baseline as a control group. Table 2 shows the significance level of our method relative to the baseline. The test results show that in most cases, our method is significant at the 0.01 level compared to the baseline. In ACM, our method and Bidirectional did not show differences. Additionally,

Node Clustering Task
The task is a node clustering with the aim of forming clusters of similar nodes. Specifically, we use the k-means [30] to cluster the node-embedding vectors and normalized mutual information (NMI) as an evaluation metric. We focus on three datasets, DBLP, ACM, and Foursquare, since k-means can only be used for non-overlapping partial clustering, and the labels in Movies are overlapping, i.e., a movie node has multiple labels. This makes the clustering results too low; no experiments are conducted on Movies. The label categories of DBLP and ACM are four and three classes, respectively. In order not to affect the clustering results, we perform a TSNE [31] dimensionality reduction operation on these two datasets before k-means clustering, which reduces the node-embedding vectors to two dimensions. All results are averaged from 10 experimental data. Table 3 demonstrates the clustering performance of the three HINs, and it can obviously be viewed that HeteEdgeWalk shows the optimal performance in all three datasets. In all datasets, our method outperforms the other methods by at least 0.6%.

Qualitative Analysis
To allow a comparison of the different sampling results, a qualitative analysis is performed. We display the results by visualization for a more intuitive analysis of the sampling.
Edge Type Sampling Distribution. To analyze the sampling distribution of edge types, we compare the DeepWalk, SchemaWalk, and HeteEdgeWalk methods in DBLP and Movies datasets. Figure 4 displays the distribution of edge types for random walks in doughnut charts.  In DBLP, the differences between the three random walk methods are evident. The sampling distribution of DeepWalk is skewed because it is biased towards higher-order nodes in the graph, i.e., the more frequently the node type is sampled, the higher the probability. The nodes of type "venue" and their related edge types "paper-venue" and In DBLP, the differences between the three random walk methods are evident. The sampling distribution of DeepWalk is skewed because it is biased towards higher-order nodes in the graph, i.e., the more frequently the node type is sampled, the higher the probability. The nodes of type "venue" and their related edge types "paper-venue" and "venue-paper" are sampled much more infrequently than other edge types. SchemaWalk aims to achieve an even distribution of nodes so the overall performance is relatively balanced. However, the homogeneous edges of SchemaWalk "paper-paper" are sampled more frequently than the other types. HeteEdgeWalk has a fairly balanced sampling of heterogeneous edge types. It reduces the sampling frequency of "paper-paper" edge types in comparison to SchemaWalk in exchange for more sampling of heterogeneous edge types. For HIN, the information of heterogeneous edges is more important than that of homogeneous edges.
In Movies, DeepWalk samples the "actor-actor" with the highest frequency. SchemaWalk is clearly under-sampling "actor-actor" due to the starvation phenomenon mentioned in [13]. The HeteEdgeWalk prefers to sample heterogeneous edge types evenly, with the effect of reduced homogeneous edge sampling being relatively small. In Movies, the distribution of "actor-actor" is unbalanced, with aggregation and edge distribution issues. Aggregation makes it relatively easy to sample the same types of edges here, and the exponential drop too quickly makes some of the same types of edges inaccessible. The edge problem makes the edge types distributed at the edges of the graph, which are relatively difficult to sample and difficult to access.
It can be seen that our algorithm properly suppresses the sampling of higher-order nodes and homogeneous edges and achieves a more homogeneous sampling of heterogeneous edges. This is also consistent with the theory that it can properly mitigate the bias of higher-order nodes in HIN.

Parameter Sensitivity Study
To investigate the effect of different parameters on the experiment, we conduct a parameter sensitivity study. We investigate the impact of the decay parameter α, the node-embedding dimension d, and the context window k, respectively.

Impact of the Decay Parameter α
The α is the decay rate that adjusts the sampling probability for each edge type. We adjust α in the range [0.1, 0.9] while holding the other parameters constant in steps of 0.1. Figure 5 illustrates the effect of α on the node classification.
The performance peaks at DBLP and Movies when α is 0.4 and fluctuates down afterwards. For Movies, it first peaks at α = 0.4, and performance improves when α > 0.6. In Foursquare, the performance is overall smooth, peaking at α = 0.3, then fluctuating down and picking up slightly at α = 0.7. In addition to ACM, α ∈ [0.2, 0.5] generally gives the finest outcome. This suggests that an appropriate reduction in the value of α can help compensate for the edge types that some nodes lack. However, the analysis results of the ACM dataset differ from the other three datasets. We think this has something to do with the ACM dataset itself. Our algorithm mitigates the bias of higher-order nodes in the graph data by applying an appropriate suppression to it. This allows higher-order nodes to be relatively balanced to reach other types of nodes. However, the ACM data itself has only two edge types, A-P and P-S. At α > 0.4, first, the increase in sampling of A-P is not enough to compensate for the performance degradation due to the lack of information from the reduction of P-S because the author is relatively more useful to the paper than the subject. After A-P sampling compensates for the loss of information, performance starts to pick up, which may be the reason why ACM is different from the others.

Node-Embedding Dimension d
By fixing other parameters, we increase the dimension d from 4 to 256 to investigate its impact on node-embedding results. Node classification results for the four datasets  Figure 6a. Firstly, it can be clearly seen that node-embedding cannot capture sufficient graph information when the dimensions are low, resulting in poor performance. In ACM and Movies, the node classification performance increases as the dimension d rises, but the increase tends to be flat. In contrast, in DBLP and Foursquare, performance decreases after reaching a peak.

Parameter Sensitivity Study
To investigate the effect of different parameters on the experiment, we conduct a pa rameter sensitivity study. We investigate the impact of the decay parameter , the node embedding dimension , and the context window , respectively.

Impact of the Decay Parameter
The is the decay rate that adjusts the sampling probability for each edge type. We adjust in the range [0.1, 0.9] while holding the other parameters constant in steps of 0.1 Figure 5 illustrates the effect of on the node classification. The performance peaks at DBLP and Movies when is 0.4 and fluctuates down af terwards. For Movies, it first peaks at = 0.4, and performance improves when > 0.6. In Foursquare, the performance is overall smooth, peaking at = 0.3, then fluctuating down and picking up slightly at = 0.7. In addition to ACM, ∈ [0.2,0.5] generally gives the finest outcome. This suggests that an appropriate reduction in the value of can help compensate for the edge types that some nodes lack. However, the analysis results of the ACM dataset differ from the other three datasets. We think this has something to do with the ACM dataset itself. Our algorithm mitigates the bias of higher-order nodes in the graph data by applying an appropriate suppression to it. This allows higher-order node to be relatively balanced to reach other types of nodes. However, the ACM data itself has only two edge types, A-P and P-S. At > 0.4, first, the increase in sampling of A-P is no enough to compensate for the performance degradation due to the lack of information from the reduction of P-S because the author is relatively more useful to the paper than the subject. After A-P sampling compensates for the loss of information, performance starts to pick up, which may be the reason why ACM is different from the others.

Node-Embedding Dimension
By fixing other parameters, we increase the dimension from 4 to 256 to investigate its impact on node-embedding results. Node classification results for the four datasets with different dimensions are shown in Figure 6a. Firstly, it can be clearly seen tha node-embedding cannot capture sufficient graph information when the dimensions are low, resulting in poor performance. In ACM and Movies, the node classification

Context Window k
We study the effect of the context window k by keeping the other parameters stationary, increasing them from 1 to 10 in steps of 1. The right column in Figure 6b shows the node classification results for four datasets with different k. Firstly, on the whole, a larger k will generally have better node classification performance as it captures the higher-order node nearest to neighboring degrees. In DBLP and ACM, the performance of the node classification results grows sharply as k increases, and then tends to remain stable. In Foursquare and Movies, however, there is a small fluctuation in performance after reaching a peak. There is a fluctuating downward trend in Foursquare.

Visualization
We use the TSNE [31] dimensionality reduction algorithm to visualize the 2D results of the node embedding in the DBLP and ACM datasets and color them with the groundtruth labels. DeepWalk and SchemaWalk are chosen as the baselines for comparison with HeteEdgeWalk. The upper side of Figure 7 shows the results of DBLP, while the lower side shows the results of ACM. In DBLP, it is clear that the SchemaWalk and HeteEdgeWalk visualizations have four clusters that are more aggregated and consistent with the label set than DeepWalk. The HeteEdgeWalk clusters are the most aggregated and well-defined. However, in ACM, DeepWalk shows some competitiveness, with all three algorithms presenting roughly the same results. Further, the clustering boundaries are relatively fuzzy, which may lead to inaccurate node classification results. The multiple small clustering centers can be seen in DeepWalk, while SchemaWalk has fewer such clustering centers. HeteEdgeWalk performs somewhat better compared to the first two. This result supports the results exhibited by the node classification task.
performance increases as the dimension rises, but the increase tends to be flat. In contrast, in DBLP and Foursquare, performance decreases after reaching a peak.

Context Window k
We study the effect of the context window by keeping the other parameters stationary, increasing them from 1 to 10 in steps of 1. The right column in Figure 6b shows the node classification results for four datasets with different . Firstly, on the whole, a larger will generally have better node classification performance as it captures the higher-order node nearest to neighboring degrees. In DBLP and ACM, the performance of the node classification results grows sharply as increases, and then tends to remain stable. In Foursquare and Movies, however, there is a small fluctuation in performance after reaching a peak. There is a fluctuating downward trend in Foursquare.

Visualization
We use the TSNE [31] dimensionality reduction algorithm to visualize the 2D results of the node embedding in the DBLP and ACM datasets and color them with the groundtruth labels. DeepWalk and SchemaWalk are chosen as the baselines for comparison with HeteEdgeWalk. The upper side of Figure 7 shows the results of DBLP, while the lower side shows the results of ACM. In DBLP, it is clear that the SchemaWalk and HeteEdge-Walk visualizations have four clusters that are more aggregated and consistent with the label set than DeepWalk. The HeteEdgeWalk clusters are the most aggregated and welldefined. However, in ACM, DeepWalk shows some competitiveness, with all three algorithms presenting roughly the same results. Further, the clustering boundaries are relatively fuzzy, which may lead to inaccurate node classification results. The multiple small clustering centers can be seen in DeepWalk, while SchemaWalk has fewer such clustering centers. HeteEdgeWalk performs somewhat better compared to the first two. This result supports the results exhibited by the node classification task.

Discussion
In this section, we first discuss the overall experimental results. Then, we present some real-world potential applications and future research directions.
The results show a maximum performance improvement of 2% for node classification

Discussion
In this section, we first discuss the overall experimental results. Then, we present some real-world potential applications and future research directions.
The results show a maximum performance improvement of 2% for node classification and at least 0.6% for clustering. This demonstrates, to some extent, the advantages of bidirectional edge sampling. The edge-sampling strategy allows for a more detailed sampling of the graph data. It also proves the advantage of bidirectional random wandering over unidirectional. In the analysis of the sampling distribution of edge types, it is obvious that the state of distribution of sampling for most of the edges accounts for about 14%. This proves that our algorithm can alleviate the bias of high-frequency nodes in HIN and can achieve a more uniform sampling of heterogeneous edges.
Future research could combine other deep-learning algorithms to replace SkipGram for different potential applications. A comprehensive grasp of semantic relations can greatly facilitate the performance of HIN representation learning. Language is also a network of complex relations. Better HIN embeddings are better for the semantic extraction of utterances in natural language processing (NLP) [32]. It can be applied in biology, such as protein interaction networks (PINs) [23] and disease-gene relationship studies [33]. It also has some other applications, such as recommender systems [10], business networks [22], media networks [34], social relationship studies [35], etc.
In addition to static networks, dynamic networks are also an interesting direction for development. Similarity matrices can be constructed for networks of different time periods to study the evolution of dynamic networks [21]. Combining spatiotemporal graphical neural networks (STGNNs) may also be a possible research direction.

Conclusions
In this paper, the proposed HeteEdgeWalk is a random walk approach for HIN embedding without meta-path guidance. It uses a dynamically adjusted bidirectional edge-sampling walk strategy. To validate the effectiveness of the algorithm, four HINs are evaluated comprehensively in several aspects of node classification, clustering tasks, and edge-type distribution of random walks. The results show a maximum performance improvement of 2% for node classification and at least 0.6% for clustering. A more balanced sampling of heterogeneous edges is achieved in the sampling analysis. The experiments show that HeteEdgeWalk has the advantage.
The following is a summary of the conclusions: (1) homogeneous graph embedding methods are insufficient to capture heterogeneous graph structures with rich information; (2) sampling based on edge types can capture the rich information in HINs more effectively and carefully than sampling based on node types; (3) HeteEdgeWalk mitigates the effect of high-frequency nodes in random walks and provides a more balanced sampling of heterogeneous edges.