Hypernetwork Representation Learning with the Set Constraint

Many real-world situations that cannot be described by traditional networks can be described naturally by hypernetworks. Unlike the traditional network, the hypernetwork structure is more complex and poses a great challenge to existing network representation learning methods. To overcome this challenge, this paper proposes a hypernetwork representation learning method with the set constraint, abbreviated as HRSC, which incorporates the hyperedge set associated with the nodes into the process of hypernetwork representation learning to obtain node representation vectors that preserve both the hypernetwork topology structure and hyperedge information. Our proposed method is extensively evaluated through machine learning tasks on four hypernetwork datasets. Experimental results demonstrate that HRSC outperforms the best baseline methods by about 1% on the MovieLens and wordnet datasets in terms of node classification, and outperforms the best baseline methods on average by about 29.03%, 1.94%, 26.27%, and 6.24% on the GPS, MovieLens, drug, and wordnet datasets, respectively, in terms of link prediction.


Introduction
Networks are ubiquitous in our daily life, and many real-life applications focus on mining valuable information from networks. A basic issue of data mining is how to learn ideal node representation vectors. To cope with this issue, network representation learning has been proposed to learn a low-dimensional representation vector for each node in the network, which can be applied to lots of machine learning tasks such as node classification [1], link prediction [2], and community detection [3].
Most of the related works learn node representation vectors only from the network topology structure, such as Deepwalk [4], node2vec [5], LINE [6], SDNE [7], and HARP [8]. The basic assumption of these topology-based representation learning methods is that the nodes with similar topological contexts should be tightly distributed in the low-dimensional vector representation space. However, in many real scenarios, the vectors learned only from the network topology structure are not desirable vectors. For example, in the social network, if two users share some common tags or topics, they are likely to be similar but topologically far apart. Thus, topology-based network representation learning methods cannot effectively capture their similarity. In such a case, other types of heterogeneous information should be incorporated to learn node representation vectors of better quality. Therefore, the researchers propose some representation learning methods that utilize both the network topology and heterogeneous information to learn node representation vectors, such as CANE [9], CNRL [10], and PPNE [11].
Nevertheless, most existing network representation learning methods are designed for traditional networks with pairwise relationships, wherein each edge links only one pair of nodes. However, the relationships among data objects in the real world are much more complex and often not pairwise. For example, the fact that James watches a war movie forms a high-order relation <James, movie, war>. A network that captures such high-order relationships is often referred to as a hypernetwork. In view of the above facts, researchers have extended the spectral clustering technique [12] to hypergraphs [13] and proposed a series of methods [14][15][16]. However, these methods mainly aim at homogeneous hypernetworks, which construct the hyperedges through potential similarities such as common tags. As for heterogeneous hypernetworks, tensor decomposition [17,18] can be directly applied to learn node representation vectors, but its time complexity is often too high to scale to large hypernetworks. HGNN [19] is a hypergraph neural network model for data representation learning; however, the datasets used to evaluate this model are not real hypernetwork datasets but traditional network datasets. HPHG [20] proposes a deep model named Hyper-gram to learn low-dimensional representation vectors, but the hyperedges are not sufficiently considered in this model, so the hypernetwork topology structure cannot be well preserved. DHNE [21] provides a solution for modeling hyperedges directly without hyperedge decomposition; however, due to the neural network structure it uses, DHNE is limited to heterogeneous hyperedges of fixed type and fixed size, and cannot model relationships among multiple types of objects of variable size.
In view of the above facts, this paper proposes a universal hypernetwork representation learning framework that can effectively encode both the hypernetwork topology structure and the hyperedge information. The main challenge is to combine the hypernetwork topology structure and the hyperedges into a unified representation learning process under a universal framework: since there are complex interactions between the hypernetwork topology structure and the hyperedges, it is difficult to incorporate the hyperedges into existing topology-based network representation learning methods.
To deal with the above challenge, this paper proposes a universal hypernetwork representation learning method with the set constraint (HRSC) to effectively incorporate the hyperedges into the process of hypernetwork representation learning, formulated as a joint optimization problem solved by the stochastic gradient ascent (SGA) algorithm. To be more specific, a negative sampling method is utilized to capture the hypernetwork topology structure, which exploits the correlations between the nodes by maximizing the prediction probabilities of the center nodes given their contextual nodes in the random walk node sequences. Besides, the same negative sampling method is utilized to capture semantic correlations between the center nodes and their associated hyperedges by maximizing the prediction probabilities of the center nodes given the hyperedge sets associated with them.
The contribution of this paper mainly includes the following three aspects:
• The hypernetwork is transformed into a traditional network abstracted as a two-section graph approximating the hypernetwork topology structure. On the basis of this traditional network, a hypernetwork representation learning method, HRSC, is proposed to learn node representation vectors by comprehensively utilizing both the hypernetwork topology structure and the hyperedges.
• HRSC incorporates the set of the hyperedges associated with the nodes into the process of hypernetwork representation learning, regarding the learning of node representation vectors as a joint optimization problem.
• An efficient and scalable learning algorithm for HRSC has been developed, which is able to handle large-scale hypernetworks efficiently.
The rest of this paper is organized as follows. In Section 2, we review related works. Section 3 formally defines our studied problem. Section 4 introduces the preliminaries. In Section 5, we introduce the hypernetwork representation learning method with the set constraint in detail. The experimental results are given in Section 6. Finally, we conclude this paper in Section 7.

Related Works
Network representation learning aims to learn a low-dimensional vector for each node in the network, preserving the structure and inherent characteristics of the network. More and more researchers have focused on network representation learning in recent years. For example, Deepwalk [4] introduces Skip-Gram [22], a widely-used distributed word representation method, into the study of the network to learn node representation vectors. node2vec [5] maximally preserves the network neighborhoods of the nodes through a biased random walk strategy, and maps the network nodes to a low-dimensional feature representation space. LINE [6], which can be easily extended to a network of millions of nodes and billions of edges, carefully designs an objective function that preserves both first-order and second-order proximities. SDNE [7] captures highly nonlinear network structures and utilizes first-order and second-order proximities to characterize local and global network structures. HARP [8] is proposed to learn low-dimensional representations of a graph's nodes to preserve high-order structural features by compressing the input graph prior to graph embedding. However, the above-mentioned network representation learning methods only utilize the network structure to learn the node representation vectors, and do not consider heterogeneous information associated with the nodes, such as text, community or label information of the nodes. To cope with this issue, researchers have incorporated heterogeneous information associated with the nodes into the topology-based network representation learning methods. For example, CANE [9] learns the context-aware representation vectors of the nodes with mutual attention mechanism, and models semantic relations between the nodes. 
CNRL [10] utilizes the similarities between the text theme and the network community to incorporate community information, a key network analysis factor, into the process of network representation learning, which simultaneously detects the community distribution of each node and learns the representation vectors of the nodes and the communities. PPNE [11] effectively incorporates the network topology and attribute information of the nodes into the process of network representation learning.
Nevertheless, the network representation learning methods listed here only consider the pairwise relationships between the nodes in the traditional network. However, in the real world, the complex interaction relationships among no less than two nodes need to be considered sometimes, and the complex relationships can be modeled by the hypernetwork. Unfortunately, the above network representation learning methods are not appropriate for the hypernetwork, since these methods do not fully consider the hypernetwork topology structure.
So far, only a few hypernetwork representation learning methods have been proposed. For example, Zhou [14] extends the powerful method of spectral clustering, originally run on undirected graphs, to the hypergraph, and further develops a hypergraph embedding algorithm on the basis of the spectral hypergraph clustering method. Liu [15] extends traditional spectral hashing to the hypergraph and accelerates the similarity search of social images by mapping semantically related nodes to similar binary codes within a short Hamming distance. Wu [16] utilizes two hypergraph views to represent text-related and theme-related images and proposes a spectral multi-hypergraph clustering algorithm to classify the images. HGNN [19] is a hypergraph neural network model for data representation learning. HPHG [20] proposes a deep model named Hyper-gram to learn low-dimensional representation vectors. DHNE [21] is proposed to learn low-dimensional representation vectors of the hypernetwork with non-decomposable hyperedges. HHNE [23] is a deep model based on graph convolutional networks, which incorporates the features of the nodes into the process of network representation learning. Although all of the above hypernetwork representation learning methods have good representation learning ability, each has its own problems. To deal with these problems, this paper proposes a universal hypernetwork representation learning method with the set constraint to learn discriminative node representation vectors. Table 1 summarizes the pros and cons of the different network representation learning methods.

Problem Definition
This section formally defines our studied problem. The hypernetwork is usually abstracted as a hypergraph H = (V, E), where V = ∪_{t=1}^{T} V_t is the set of nodes belonging to T types and E is the set of hyperedges, each of which can be associated with no less than two nodes. If the number of nodes is two for each hyperedge, the hypernetwork degenerates to a traditional network. If T ≥ 2, the hypernetwork is defined as a heterogeneous hypernetwork. The problem of hypernetwork representation learning with the set constraint is formally defined as follows.

Definition 1. Given a hypernetwork H = (V, E), the problem of hypernetwork representation learning with the set constraint aims to learn a low-dimensional vector r_n ∈ R^k for each node n in the hypernetwork, where k is expected to be much smaller than |V|, such that the learned node representation vectors explicitly preserve both the hypernetwork topology structure and the hyperedges.

Preliminaries
Since the study of traditional graphs is relatively mature, a desirable way to study a hypergraph is to transform it into a traditional graph, if such a transformation exists, and deepen the understanding of the hypergraph through the corresponding traditional graph. In the literature [13], it is shown that a hypergraph can be transformed into three types of traditional graphs, namely the line graph, the two-section graph, and the incidence graph. The line graph has only the nodes converted from the hyperedges and no nodes associated with the hyperedges, so the relationships between the nodes associated with the hyperedges are lost, leading to a large loss of the hypernetwork topology structure. The incidence graph introduces the nodes converted from the hyperedges, which weakens the strong correlations between the nodes associated with the hyperedges and leads to a certain loss of the hypernetwork topology structure. The two-section graph joins all of the nodes associated with each hyperedge into a complete graph, which leads to less loss of the hypernetwork topology structure than the other two graphs, so it preserves the hypergraph topology structure best among the three transformation strategies. Hence, in this paper, the hypergraph is transformed into a two-section graph, and then the traditional network topology structure abstracted as the two-section graph and the hyperedges in the hypernetwork are comprehensively utilized to carry out the study of hypernetwork representation learning. The detailed transformation strategy is as follows.
Given a hypergraph H = (V, E), the two-section graph S = (V′, E′) corresponding to the hypergraph H is a traditional graph that meets the following conditions:
• The node set of the two-section graph S is the same as the node set of the hypergraph H.
• One edge is linked between any two different nodes if and only if they are simultaneously contained in at least one hyperedge (that is, all of the nodes in each hyperedge are joined into a complete graph).
A hypergraph and two-section graph of its transformation are shown in Figure 1.
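As an illustration of the transformation just described, the following sketch clique-expands each hyperedge into pairwise edges (the function name `two_section_graph` and the toy hypergraph are our own, not from the paper):

```python
from itertools import combinations

def two_section_graph(hyperedges):
    """Clique-expand a hypergraph: every pair of distinct nodes that
    co-occurs in at least one hyperedge is joined by an edge."""
    edges = set()
    for e in hyperedges:
        for u, v in combinations(sorted(e), 2):
            edges.add((u, v))
    return edges

# A toy hypergraph with two hyperedges; the hyperedge {a, b, c}
# becomes the triangle a-b, a-c, b-c in the two-section graph.
H = [{"a", "b", "c"}, {"c", "d"}]
S_edges = two_section_graph(H)
```

Note that the expansion keeps every pairwise co-occurrence but, on its own, forgets which clique came from which hyperedge; that residual information is exactly what the set constraint in Section 5.2 reintroduces.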

Our Method
In this section, we introduce the details of the hypernetwork representation learning method with the set constraint. Firstly, Section 5.1 introduces the topology-derived objective function. Secondly, Section 5.2 introduces the set constraint objective function. Thirdly, HRSC, based on the joint optimization of the above two objective functions, is introduced in detail in Section 5.3. Finally, Section 5.4 gives the complexity analysis of HRSC.

Topology-Derived Objective Function
Continuous Bag-of-Words [22], abbreviated as CBOW, is a popular distributed word representation method whose computational efficiency is higher than that of Skip-Gram. Hence, to improve the computational efficiency of learning node representation vectors, we follow the idea of CBOW under the assumption that nodes with similar topological contexts tend to be similar: we aim to maximize the prediction probabilities of the center node given its contextual nodes, defined as the nodes before and after the center node within a fixed-size window in the node sequences generated by random walk. We propose a novel negative sampling-based topology-derived objective function in which the node representation vectors are regarded as the parameters. In the optimization procedure, the representation vectors are updated, and the finally learned vectors preserve the hypernetwork topology structure.
A set of node sequences C is generated by the same random walk on the traditional network abstracted as the two-section graph as in Deepwalk. For each node, a node sequence of length t starting from this node is generated in each random walk iteration, and the process is repeated τ times to generate enough sequences.
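The walk generation described above can be sketched as follows; the helper `random_walks` and its parameter names are illustrative stand-ins for the paper's procedure, with the walk length t and repetition count τ mapped to `walk_length` and `num_walks`:

```python
import random

def random_walks(adj, walk_length=40, num_walks=10, seed=0):
    """Generate truncated random walks (Deepwalk-style) on the
    two-section graph; `adj` maps each node to its neighbor list."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):          # tau passes over all nodes
        for start in adj:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj[walk[-1]]
                if not neighbors:       # stop early at a dead end
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# Adjacency of a small two-section graph.
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
walks = random_walks(adj, walk_length=5, num_walks=2)
```

Each (center, context) pair fed to the objective below is then read off these walks with a sliding window.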
Given a center node n and its contextual nodes context(n), the node n is regarded as the positive sample, the other nodes are the negative samples, and NEG(n) is the subset of negative samples of the center node n with a predefined size ds. For ∀u ∈ V, the label of the node is defined as L^n(u) = 1 if u = n and L^n(u) = 0 otherwise. p(u|context(n)) denotes the prediction probability of the node u under the condition of the contextual nodes context(n). On the basis of the node sequences C, we try to maximize the following objective function.
As for each node n, this paper designs the embedding vector and the parameter vector, where the embedding vector v n is the representation of the node n when it is treated as the contextual node, while the parameter vector θ n is the representation of n when it is treated as the center node. p(u|context(n)) in Formula (2) is defined as follows.
where σ(X_n^T θ_u) = 1/(1 + e^{-X_n^T θ_u}) is the sigmoid function and X_n is the sum of the embedding vectors corresponding to context(n). Formula (3) can also be rewritten as a single unified expression.
Accordingly, Formula (2) can be rewritten as follows.
Formally, maximizing D_1 means maximizing the prediction probabilities of the positive samples and minimizing the prediction probabilities of the negative samples simultaneously, by which the hypernetwork topology structure is encoded into the node representation vectors.
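A minimal numerical sketch of the prediction probability under negative sampling, assuming the sigmoid form given above (the function and variable names are ours, not the paper's):

```python
import numpy as np

def sigmoid(x):
    """sigma(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + np.exp(-x))

def prediction_prob(X_n, theta_u, is_positive):
    """p(u | context(n)): sigma(X_n^T theta_u) for the positive sample,
    1 - sigma(X_n^T theta_u) for a negative sample."""
    s = sigmoid(X_n @ theta_u)
    return s if is_positive else 1.0 - s

# X_n is the sum of the embedding vectors of the contextual nodes.
context_vectors = np.array([[0.1, 0.2], [0.3, -0.1]])
X_n = context_vectors.sum(axis=0)
theta_u = np.array([0.5, 0.4])          # parameter vector of candidate u
p_pos = prediction_prob(X_n, theta_u, True)
```

The positive and negative probabilities for the same pair sum to one, which is what lets the unified expression of Formula (3) cover both cases with the label L^n(u).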

Set Constraint Objective Function
The model based on the above topology-derived objective function only takes as input the traditional network topology structure, abstracted as a two-section graph approximating the hypernetwork topology structure, to learn node representation vectors, but ignores the relations among the nodes (namely, the hyperedges). Hence, it cannot learn good representation vectors for the nodes associated with the hyperedges. To learn node representation vectors better, the set of the hyperedges associated with the nodes is incorporated into the process of hypernetwork representation learning.
Inspired by the above topology-derived objective function, this paper proposes a novel negative sampling-based set constraint objective function, where T_n denotes the set of the hyperedges associated with the center node n ∈ V (also the set of the nodes associated with n if each hyperedge is regarded as a node). The center node n is regarded as the positive sample, and the nodes that are not associated with n are regarded as the negative samples. For ∀v ∈ T_n, NEG(v) is the subset of negative samples with a predefined size ds, and the label of a node is defined analogously to the topology-derived case. On the basis of the set constraint and the node sequences C, we aim to maximize the following objective function, which forces the embedding vectors to satisfy the extracted constraints.
where e v is the parameter vector corresponding to v ∈ T n .
By maximizing D_2, the hyperedges are encoded into the node representation vectors.

Joint Optimization Objective Function
A hypernetwork representation learning method with the set constraint, HRSC, which jointly optimizes the topology-derived objective function and the set constraint objective function, is proposed in this subsection. Figure 2 illustrates the HRSC framework.
In Figure 2, v_{i-s}, …, v_{i-1}, v_{i+1}, …, v_{i+s} are the contextual nodes of the center node v_i (shown as red dots), SUM denotes the summation of all of the vectors corresponding to context(v_i), and T_{v_i} is the set of hyperedges associated with the center node v_i (also the set of the nodes associated with v_i if each hyperedge is regarded as a node).
Compared to other hypernetwork representation learning methods, HRSC is improved at two levels: (I) at the hypernetwork topology structure level, HRSC exploits the correlations between the nodes by maximizing the prediction probabilities of the center nodes given their contextual nodes in random walk node sequences; (II) at the node-hyperedge level, HRSC captures semantic correlations between the center nodes and their associated hyperedges by maximizing the prediction probabilities of the center nodes given the set of the hyperedges associated with the center nodes. By means of the above improvements, HRSC effectively incorporates the hyperedges into the process of hypernetwork representation learning to get node representation vectors of better quality.
As shown in Figure 2, the hypernetwork topology representation and hyperedge representation learned by the model based on the topology-derived objective function and the model based on the set constraint objective function respectively share the same representation, which can comprehensively utilize the contexts and hyperedge information of each node to get the vectors of better quality.
For ease of calculation, taking the logarithms of D_1 and D_2, we aim to maximize the following joint objective function of HRSC.
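The displayed joint objective did not survive extraction. Based on the surrounding description (the logarithms of D_1 and D_2 combined under the harmonic factor β), one plausible reconstruction, with the exact weighting being our assumption rather than the paper's formula, is:

```latex
\mathcal{L} \;=\; \log D_1 \;+\; \beta \,\log D_2
```

This form is consistent with the complexity expression O(|C| · (ds + 1) · (β · M + 1)) given in Section 5.4, in which the set constraint term is scaled by β.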
where β is the harmonic factor that balances the model based on the topology-derived objective function against the model based on the set constraint objective function. For ease of derivation, L(n, u, v, ϑ) is defined as follows.
The joint objective function L is then optimized by means of the stochastic gradient ascent algorithm. The key is to derive four kinds of gradients of L.
Firstly, the gradient of L(n, u, v, ϑ) on θ_u is calculated as follows.
Accordingly, the updating formula of θ_u is expressed as follows.
where α is the learning rate of HRSC. Secondly, the gradient of L(n, u, v, ϑ) on X_n is calculated; the symmetry between θ_u and X_n is utilized to obtain it.
Accordingly, the updating formula of v_v is expressed as follows, where v ∈ context(n).
Thirdly, the gradient of L(n, u, v, ϑ) on θ_ϑ is calculated as follows.
Accordingly, the updating formula of θ_ϑ is expressed as follows.
Fourthly, the gradient of L(n, u, v, ϑ) on e_v is calculated; the symmetry between θ_ϑ and e_v is utilized to obtain it.
Accordingly, the updating formula of e_v is expressed as follows, where v ∈ T_n.
The stochastic gradient ascent (SGA) algorithm is used for optimization. In our implementation, the effect of β is approximated through instance sampling (node-node and node-hyperedge) in each training epoch. More details are shown in Algorithm 1.

Algorithm 1: HRSC
Input: hypernetwork H = (V, E); embedding size d
Output: embedding matrix X ∈ R^{|V|×d}

for node n in V do
    initialize embedding vector v_n ∈ R^{1×d}
    initialize parameter vector θ_n ∈ R^{1×d}
    for node v in T_n do
        initialize parameter vector e_v ∈ R^{1×d}
    end for
end for
node sequences C = RandomWalk()
for (n, context(n)) in C do
    update parameter vectors following Formula (11)
    update embedding vectors following Formula (13)
    update parameter vectors following Formula (15)
    for node v in T_n do
        update parameter vectors following Formula (17)
    end for
end for
for i = 0; i < |V|; i++ do
    X_i = v_{v_i}
end for
return X
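A compact sketch of the per-sample SGA update at the heart of Algorithm 1, assuming the standard negative sampling gradient (label minus sigmoid); the helper name `sga_step` and the exact update form are our reading of the derivation above, not the authors' code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sga_step(X_n, theta, pos, negs, alpha=0.025):
    """One stochastic-gradient-ascent step for center node `pos` with
    negative samples `negs`, given the summed context vector X_n.
    `theta` maps node -> parameter vector. Returns the accumulated
    gradient on X_n, which would then be redistributed over the
    embedding vectors of context(n)."""
    grad_X = np.zeros_like(X_n)
    for u in [pos] + list(negs):
        label = 1.0 if u == pos else 0.0
        g = alpha * (label - sigmoid(X_n @ theta[u]))  # log-likelihood gradient
        grad_X += g * theta[u]
        theta[u] = theta[u] + g * X_n                  # update parameter vector
    return grad_X

X_n = np.array([1.0, 0.0])                 # summed context vector
theta = {"n": np.zeros(2), "u1": np.array([0.2, -0.1])}
before = sigmoid(X_n @ theta["n"])         # 0.5 before any update
grad = sga_step(X_n, theta, pos="n", negs=["u1"])
after = sigmoid(X_n @ theta["n"])          # positive-sample score rises
```

The same step applies to the node-hyperedge pairs, with e_v in place of θ_u; ascent (adding the gradient) is used because the joint objective is maximized.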

Complexity Analysis
The time complexity of Deepwalk is O(|C| · 2b · log |V|), where b is the window size of the contextual nodes. According to Formula (8), the time complexity of HRSC based on the joint objective function is reduced to O(|C| · (ds + 1) · (β · M + 1)), where the time complexities of the models based on the topology-derived objective function and the set constraint objective function are O(|C| · (ds + 1)) and O(|C| · (ds + 1) · M), respectively; here ds is the predefined size of the negative sample set (a constant independent of the network size), and M = max{|T_{v_1}|, |T_{v_2}|, …, |T_{v_{|V|}}|} is the maximum number of hyperedges associated with any node. Compared to Deepwalk, the computational efficiency of HRSC is improved, since the CBOW-derived models in HRSC are more efficient than the Skip-Gram model in Deepwalk.
Meanwhile, the time complexity of the negative sampling adopted in HRSC is lower than that of the hierarchical SoftMax adopted in Deepwalk.

Dataset
Four hypernetwork datasets, including a GPS network, an online social network, a drug network, and a semantic network, are utilized to evaluate the effectiveness of our proposed method. Detailed dataset statistics are shown in Table 2. The four real-world hypernetwork datasets are as follows:
• GPS [24] describes a situation in which a user participates in an activity in a location. The relationships <user, location, activity> are regarded as hyperedges to construct a hypernetwork.
• MovieLens [25] describes personal tag activities from MovieLens. The relationships <user, movie, tag> are regarded as hyperedges to construct a hypernetwork, where each movie has at least one genre.
• drug (http://www.fda.gov/Drugs/, accessed 27 January 2020) is from the FDA Adverse Event Reporting System (FAERS), which includes information on adverse events and medication error reports submitted to the FDA. The relationships <user, drug, reaction> are regarded as hyperedges to construct a hypernetwork; that is, the users who take some drugs have certain reactions that lead to adverse events.
• wordnet [26] is composed of a set of triplets <head, relation, tail> extracted from WordNet 3.0. The relationships <head, relation, tail> are regarded as hyperedges to construct a hypernetwork.

Baseline Methods
Deepwalk. Deepwalk is a popular topology-derived network representation learning method, and utilizes local information obtained from truncated random walk node sequences to learn node representation vectors.
node2vec. node2vec is an algorithmic framework for learning node representation vectors, which designs a biased random walk strategy to maximally preserve the network neighborhoods of the nodes.
LINE. LINE is proposed to learn node representation vectors for large-scale networks, and designs an objective function that preserves both first-order and second-order proximities.
GraRep. GraRep [27] captures global structure properties of a graph using k-step loss functions defined on the graph that integrates rich local structure information.
HOPE. HOPE [28] preserves higher-order proximity of a graph and captures the asymmetric transitivity.
SDNE. SDNE captures highly nonlinear network structure and utilizes first-order proximity and second-order proximity to characterize local and global network structures.
HRSC. HRSC incorporates the set of the hyperedges associated with the nodes into the process of hypernetwork representation learning, regards the learning process of node representation vectors as a joint optimization problem, and obtains node representation vectors including the hypernetwork topology structure and the hyperedges by means of the stochastic gradient ascent (SGA) algorithm to solve this joint optimization problem.

Experiment Setup
The experiments are conducted on the four hypernetwork datasets. Node classification and link prediction tasks are adopted to verify the feasibility of our proposed method. For all four datasets, the vector dimension is set to 100, the number of random walks starting at each node to 10, and the length of each random walk to 40. A portion of each dataset is randomly selected as the training set, and the rest serves as the testing set.

Node Classification
Multi-label classification tasks are carried out on the MovieLens and wordnet datasets, since only these two datasets have label information; nodes without labels are removed from both datasets. The node representation vectors are taken as input to train an SVM [29] classifier, and node classification accuracies are computed. Tables 3 and 4 show the node classification accuracies on the MovieLens and wordnet datasets. From the two tables, we make the following observations:

• On the MovieLens and wordnet datasets, the node classification performance of HRSC is superior to that of the other baseline methods. For example, on these two datasets, HRSC outperforms the best other baseline method (i.e., Deepwalk) by about 1%, and exceeds the remaining baseline methods to a larger extent. These results demonstrate that HRSC is effective and robust, since it incorporates the hyperedges into the process of hypernetwork representation learning to learn node representation vectors of better quality than Deepwalk and the other baseline methods. Meanwhile, the margin over the other baseline methods is small, since the categorical attributes on the MovieLens and wordnet datasets are not obvious.
• The node classification performance of GraRep is second only to that of HRSC and Deepwalk, and is nearly the same as that of Deepwalk. The reason is that GraRep integrates the global structure information of a graph into the representation learning process, and thus considers the hyperedges to a certain extent.
• Although SDNE preserves local and global structure information in the process of network representation learning, it does not perform well on the hypernetwork datasets, which indicates that SDNE is not universal.

In summary, we find that HRSC obtains high-quality node representation vectors by comprehensively considering the hypernetwork topology structure and the hyperedges.
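The evaluation protocol described above can be sketched as follows, using scikit-learn's `LinearSVC` as a stand-in for the SVM classifier and random vectors in place of the learned embeddings (all names and data here are illustrative, not the paper's):

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# Stand-in for learned 100-dimensional node representation vectors.
X = rng.normal(size=(200, 100))
y = rng.integers(0, 2, size=200)
# Shift class 1 along the first dimension so the classes are separable
# and the classifier has signal to learn from.
X[:, 0] += 3 * y

# 50% training ratio, as in the parameter sensitivity experiments.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.5, random_state=0)
clf = LinearSVC(max_iter=5000).fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
```

In the paper's setting, `X` would be the rows of the embedding matrix produced by Algorithm 1 and `y` the node labels.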

Link Prediction
Link prediction is widely used in real life, especially in recommendation systems. In this subsection, we carry out link prediction tasks on all four datasets, adopting AUC [30] as the evaluation measure. As shown in Tables 5-8, edges at different ratios are randomly removed from the four datasets to evaluate the AUC. From the results, we make the following observations:
• On the GPS, drug, and wordnet datasets, the link prediction performance of HRSC is superior to that of the other baseline methods. On the MovieLens dataset, HRSC is better than most of the other baseline methods; specifically, HRSC performs better than Deepwalk and GraRep at smaller training ratios and worse than them at larger training ratios. These results indicate the effectiveness of HRSC and verify that HRSC can accurately estimate the relations among the nodes.
• The link prediction performance of Deepwalk is second only to that of HRSC, which indicates that the node representation vectors obtained by directly training Deepwalk on the hypernetwork datasets contain the hyperedge information to a certain extent. It is worth noting that SDNE is the worst-performing baseline method overall and is unstable across training ratios, which again verifies that SDNE is not universal. In contrast, HRSC performs consistently at different training ratios, which demonstrates its feasibility and robustness.
• By introducing the hyperedges, the node representation vectors learned by HRSC are better than those of the baseline methods that do not consider the hyperedges, which verifies our assumption that incorporating the hyperedges into the process of hypernetwork representation learning benefits the link prediction task.

In short, the above observations show that HRSC learns high-quality node representation vectors, which help to accurately estimate the relations among the nodes, and the link prediction results demonstrate the effectiveness and robustness of HRSC.
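A minimal sketch of AUC-based link prediction evaluation, scoring held-out edges by the inner product of their endpoint embeddings; the scoring function and data are illustrative assumptions, since the paper does not specify its exact edge score:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def edge_score(u_vec, v_vec):
    """Score a candidate edge by the inner product of endpoint vectors."""
    return float(u_vec @ v_vec)

rng = np.random.default_rng(1)
emb = rng.normal(size=(6, 4))          # stand-in node embedding matrix
pos_pairs = [(0, 1), (2, 3)]           # held-out (removed) true edges
neg_pairs = [(0, 5), (1, 4)]           # sampled non-edges
scores = [edge_score(emb[u], emb[v]) for u, v in pos_pairs + neg_pairs]
labels = [1] * len(pos_pairs) + [0] * len(neg_pairs)
auc = roc_auc_score(labels, scores)    # AUC over true edges vs. non-edges
```

An AUC of 1.0 means every removed edge is scored above every non-edge, while 0.5 is no better than chance.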

Parameter Sensitivity
HRSC has a harmonic factor β to balance the model based on the topology-derived objective function and the model based on the set constraint objective function. We fix the training ratio at 50% and test the node classification accuracies of HRSC as β ranges from 0.1 to 0.9 on the MovieLens and wordnet datasets. Figure 3 compares the node classification accuracies of HRSC with different β. As shown in Figure 3, the variation trends of node classification accuracy with increasing β differ between the two datasets, but all variations are within 2%, which indicates that the overall performance of HRSC is not sensitive to β and demonstrates the robustness of HRSC.
For the MovieLens and wordnet datasets, the best node classification results are achieved at β = 0.5. Even when the hyperedges are fully incorporated into the process of hypernetwork representation learning, part of the hypernetwork topology structure information is still lost, so the balanced setting β = 0.5 yields node representation vectors of optimal quality.

Conclusions
This paper proposes a hypernetwork representation learning method with the set constraint, which effectively incorporates the hyperedges into the process of hypernetwork representation learning and regards the learning of node representation vectors as a joint optimization problem. By using an efficient stochastic gradient ascent algorithm to solve the joint optimization problem, node representation vectors that preserve both the hypernetwork topology structure and the hyperedges are obtained. We conducted extensive experiments on four hypernetwork datasets. The experimental results demonstrate that HRSC outperforms the best baseline methods by about 1% on the MovieLens and wordnet datasets in terms of node classification, and outperforms the best baseline methods on average by about 29.03%, 1.94%, 26.27%, and 6.24% on the GPS, MovieLens, drug, and wordnet datasets, respectively, in terms of link prediction.