Hyperbolic Directed Hypergraph-Based Reasoning for Multi-Hop KBQA

Abstract: The target of the multi-hop knowledge base question-answering (KBQA) task is to find answers to factoid questions by reasoning across multiple knowledge triples in the knowledge base. Most existing methods for multi-hop KBQA based on a general knowledge graph ignore the semantic relationship between hops. However, modeling the knowledge base as a directed hypergraph suffers from sparse incidence matrices and asymmetric Laplacian matrices. To make up for these deficiencies, we propose a directed hypergraph convolutional network modeled on hyperbolic space, which can better deal with the sparse structure and effectively adapt to the asymmetric incidence matrix of directed hypergraphs modeled on a knowledge base. We propose an interpretable KBQA model based on this hyperbolic directed hypergraph convolutional neural network, named HDH-GCN, which can update relation semantic information hop-by-hop and attends to different relations at different hops. The model improves the accuracy of the multi-hop KBQA task, and has application value in text question answering, human-computer interaction and other fields. Extensive experiments on the PQL and MetaQA benchmarks demonstrate the effectiveness and universality of our HDH-GCN model, leading to state-of-the-art performance.


Introduction
Knowledge base question answering (KBQA) has been a hot task in the field of natural language processing and is very challenging [1]. Several different QA datasets have been proposed, such as the Stanford Question Answering Dataset (SQuAD) [2,3], NarrativeQA [4] and CoQA [5], and the kind of reasoning based on these datasets is termed single-hop reasoning, since it requires reasoning only over a single piece of evidence [6]. For QA tasks requiring single-hop reasoning, the performance of previous work [1,7] has improved considerably in recent years.
However, in real-world QA tasks, obtaining answers often requires multi-hop reasoning [8], that is, to find a knowledge path consisting of multiple pieces of knowledge in the knowledge base to deduce the answer. Figure 1 shows a two-hop KBQA example. The reasoning path starts from the entity mentioned in the query and consists of the relations at each hop and the intermediate entities. The methods mentioned above that focus only on single-hop reasoning lack the ability to deal with multi-hop reasoning QA tasks. To solve a multi-hop QA task, some work has been proposed recently which can be mainly divided into two categories. One is the neural network-based methods such as models in [9,10] and the other is the graph neural network-based methods such as [11]; this has achieved desirable performance [12].
When we deal with multi-hop tasks, generally, reasoning starts from the entities extracted from the query, retrieves related facts in the knowledge base, moves to the next hop according to the hop-specific relation, and repeats this step to form a reasoning path that finally reaches the answer. In this process, we argue that semantic relational information is crucial for multi-hop reasoning, while previous studies have not fully exploited semantic relational information. The work in [13] computes relation-specific transformations by separating different relations, which does not consider semantic relation information. Ref. [14] does not update relational information during multi-hop reasoning, but exploits relational information to obtain attention for static graphs. In addition, the pairwise connections between nodes in a graph neural network (GNN) are insufficient for fully representing the higher-order relations between relations and entities in the knowledge graph. Recently, some important work on hypergraph convolutional networks (HGNN) has been proposed. HGNN uses hyperedges to connect more than two nodes at the same time, which is conducive to imitating human reasoning and accurately locating a group of entities connected by the same relation, rather than reasoning entity by entity. The disadvantage is that HGNN is aimed at undirected hypergraphs, while knowledge graphs are directed, and each triplet has a specific directional meaning. HGNN collects potential learned relations from connected entities, but does not reveal and further utilize them.

Based on the study of the hypergraph neural network introduced above, a directed hypergraph convolutional network-based model for multi-hop KBQA (2HR-DR) was proposed [15]. 2HR-DR models the entities extracted from questions and their related relationships and entities in the knowledge base as directed hypergraphs, and then uses Directed Hypergraph Convolutional Networks (DHGCN) [15] to predict relations hop-by-hop and form a sequential relation path to make the reasoning interpretable. 2HR-DR can explicitly learn and update relation information and dynamically concentrate on different relations at different hops.
Although 2HR-DR can better solve some of the challenges mentioned above, it still has some disadvantages. First, using directed hypergraphs to model entities and relationships may lead to a situation in which an entity is related to many entities via some relations while it is related to only a few entities via others. For example, as shown in Figure 1, when constructing a hypergraph of the query, the number of entities (actors) related to "Child_of_Deaf_Adults" can be very small via relations such as "Starring_in", while a large number of entities (actors) do "Act_in" this movie. This results in a large difference in the number of nodes contained by each hyperedge in the modeled directed hypergraph. In that case, the incidence matrices of the constructed hypergraphs become much sparser, which harms the training efficiency and accuracy of the model. Second, 2HR-DR used the directed hypergraph convolution network, which needs the eigenvalue decomposition of Laplacian matrices when calculating the spectral convolution of hypergraphs, and that requires the Laplacian matrices to be real symmetric matrices (we cannot ensure that non-symmetric matrices admit eigenvalue decomposition). However, for directed hypergraph convolution networks, since each hyperedge has a direction, the degree of each node must be divided into in-degree and out-degree, which differ in most cases; this means the Laplacian matrices are often asymmetric.
To solve the problems mentioned above, we first propose a Hyperbolic Directed Hypergraph Convolutional Network (HDH-GCN) for directed hypergraphs that takes the direction of information transmission into account. We investigate hyperbolic embedding spaces [16] and map the sparse data points and the hypergraph to the hyperboloid manifold directly. The rationale is that hyperbolic space has a stronger ability than Euclidean space to accommodate networks with long-tailed distributions and sparse structures [17], which is also verified in our experiments. On that basis, we propose an HDH-GCN-based framework for multi-hop QA. This framework explicitly updates the relation information and dynamically focuses on specific relations at every hop of the query. In addition, we record the semantic representation of the relation at each hop, and the representation of the relation at every hop is influenced by the representations at the previous hops, which makes the QA task interpretable to a large extent.
In summary, we make the following contributions:
• To solve the problem of sparse incidence matrices of directed hypergraphs modeled on a knowledge base, we design a method of modeling a directed hypergraph in hyperbolic space.
• Based on the hyperbolic directed hypergraph, we propose a Hyperbolic Directed Hypergraph Convolutional Network (HDH-GCN) for directed hypergraphs and design a framework on this basis that can handle multi-hop knowledge base question-answering tasks well.
• The modules constitute a new model, namely, HDH-GCN, for handling the multi-hop knowledge base question-answering task. Through experiments on several real-world datasets, we confirm the superiority of HDH-GCN over state-of-the-art models.

Multi-Hop Question Answering
Multi-hop KBQA models can be basically divided into two types. The first applies the neural networks mentioned earlier. These models use previous single-hop question-answering methods [18][19][20] for multi-hop question-answering tasks. Xu et al. improved KVMemNet to achieve better results across multiple triples [10]. Zhong et al. used coarse-grained modules and fine-grained modules [21]. Some of these methods introduce an end-to-end framework explicitly designed to simulate the step-by-step reasoning process involved in multi-hop QA and MRC. Kundu et al.'s [22] model constructed paths connecting questions and candidate answers, and then scored them through a neural architecture. Jiang et al. [23] also constructed a proposer to propose an answer from each root-to-leaf path in the reasoning tree, extract a key sentence containing the proposed answer from each path and finally combine them to predict the final answer. However, these methods lack consideration of graph structure information.
The other kind of method is based on graph neural networks. Sun et al. learnt what to retrieve from the KB and corpus and then reasoned over the built graph [24]. Tu et al. employed GCN to reason over heterogeneous graphs [25]. Xiong et al. achieved better performance by applying graph attention networks [14]. Cao et al. proposed a bi-directional attention entity graph convolutional network [26]. These models use r-GCN [13], which does not consider semantic relation information, or use graph attention networks to assign static weights. Different from these models, ref. [15] proposes a dynamic relation strategy, which dynamically updates relation states during the reasoning process. Documents unrelated to the complex query may affect the accuracy of the model. In the "select, answer, and explain" (SAE) model proposed by Tu et al. [27], BERT [28] acts as the encoder in the selection module. A sentence extractor is then applied to the output of BERT to obtain the sequential output of each sentence with precalculated sentence start and end indices, filtering out answer-unrelated documents and thus reducing the amount of distracting information. The selected answer-related documents are then input to a model which jointly predicts the answer and supporting sentences. Concurrently with the SAE model, Bhargav et al. [29] used a two-stage BERT-based architecture to first select the supporting sentences and then used the filtered supporting sentences to predict the answer. The upstream side of Jiang et al.'s [23] proposed model is the Document Explorer, which iteratively selects relevant documents. Han et al. [15] proposed two-phase hypergraph-based reasoning with dynamic relations, which explicitly learns and updates relation information and dynamically concentrates on different relations at different hops.

Hypergraph Convolutional Networks
Feng et al. [30] proposed a hypergraph neural network, which replaces the general graph with a hypergraph structure, effectively encoding the higher-order data correlation. Bai et al. [31] further enhanced the representational learning ability by using attention modules. Yadati, N. et al. [32] proposed a new method of training a GCN on a hypergraph using tools from the spectral theory of hypergraphs and applying the method to the problems of SSL (hypergraph-based semi-supervised learning) and combinatorial optimization on real-world hypergraphs. Zhang et al. [33] developed a new self-attention-based graph neural network applicable to homogeneous and heterogeneous hypergraphs with variable hyperlink sizes. Han et al. [15] proposed a directed hypergraph convolutional network that incorporates direction information into HGNN to deal with a directed knowledge graph.

Hyperbolic Neural Networks
Hyperbolic space has always been a popular research domain in mathematics. Some works have been conducted to explore the treelike structure of graphs [34,35] and the relations between hyperbolic space and hierarchical data such as languages and complex networks [36,37]. Such works have demonstrated the consistency between real-world scale-free and hierarchical data and the hyperbolic space, providing a theoretical basis for recent works which apply hyperbolic space to various tasks including link prediction, node classification and recommendation. Some researchers apply hyperbolic space to traditional metric learning approaches such as HyperBPR [38] and HyperML [39]. Some try to adopt hyperbolic space to neural networks and define hyperbolic neural network operations, producing powerful models such as hyperbolic neural networks [40], hyperbolic graph neural networks [41] and hyperbolic convolutional neural networks [17]. Meanwhile, ref. [42] provides a scalable hyperbolic recommender system for industry use. Ref. [43] applies hyperbolic space to heterogeneous networks for link prediction tasks. Ref. [44] applies hyperbolic space to next-POI recommendation. Ref. [45] proposes a path-based recommendation approach with hyperbolic embeddings, etc.

Hyperbolic Directed Hypergraph Convolutional Networks
In this section, we introduce the directed hypergraph convolutional network constructed on hyperbolic space. Definitions of the notations used in the text are shown in Table 1.

Undirected Hypergraph Convolutional Network
We first introduce undirected hypergraph convolutional neural networks. Different from simple graphs, hyperedges in a hypergraph may contain two or more vertices. A hypergraph can be defined as G = (V, E, W), which includes a vertex set V and a hyperedge set E; each hyperedge is assigned a weight by W, a diagonal matrix whose diagonal entries are the weights of the hyperedges. We use a |V| × |E| incidence matrix H to denote a hypergraph G, with entries h(v, e) = 1 if v ∈ e and h(v, e) = 0 otherwise. For every vertex v ∈ V, the degree is defined as d(v) = Σ_{e∈E} w(e)h(v, e), and for every hyperedge e ∈ E, the degree is δ(e) = Σ_{v∈V} h(v, e). Let D_v and D_e denote the diagonal matrices of vertex and hyperedge degrees, and let Θ = D_v^{−1/2} H W D_e^{−1} H^T D_v^{−1/2}; then ∆ = I − Θ is defined as the hypergraph Laplacian [46]. From this expression, ∆ is a symmetric positive semidefinite matrix, so the eigenvalue decomposition ∆ = ΦΛΦ^T always exists. Ref. [30] uses the eigenvectors Φ as the Fourier bases and the eigenvalues Λ as frequencies to express the spectral convolution as g ⋆ x = Φ g(Λ) Φ^T x. Ref. [30] then approximates this by Chebyshev polynomials, modifies the inner parameters appropriately and finally formulates the hyperedge convolution as X^{(l+1)} = σ(D_v^{−1/2} H W D_e^{−1} H^T D_v^{−1/2} X^{(l)} P^{(l)}), where X^{(l)} is the node feature matrix at layer l, and P^{(l)} is the learnable parameter.
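As an illustration of the hyperedge convolution above, the following NumPy sketch implements one layer X^{(l+1)} = σ(D_v^{−1/2} H W D_e^{−1} H^T D_v^{−1/2} X^{(l)} P) on a toy hypergraph; the sizes, values and function name are hypothetical, not the authors' code.

```python
import numpy as np

def hgnn_layer(X, H, w, P):
    """One undirected hyperedge-convolution layer (Feng et al. [30]):
    X' = ReLU(Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X P)."""
    W = np.diag(w)                       # diagonal hyperedge-weight matrix
    d_v = H @ w                          # vertex degrees d(v) = sum_e w(e) h(v, e)
    d_e = H.sum(axis=0)                  # hyperedge degrees delta(e)
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
    De_inv = np.diag(1.0 / d_e)
    Theta = Dv_inv_sqrt @ H @ W @ De_inv @ H.T @ Dv_inv_sqrt
    return np.maximum(Theta @ X @ P, 0.0)   # ReLU as the nonlinearity sigma

# Toy hypergraph: 4 vertices, 2 hyperedges (e0 = {0, 1, 2}, e1 = {2, 3}).
H = np.array([[1, 0],
              [1, 0],
              [1, 1],
              [0, 1]], dtype=float)
w = np.ones(2)                           # unit hyperedge weights
X = np.random.default_rng(0).normal(size=(4, 3))
P = np.eye(3)                            # identity "learnable" parameter for the demo
X_next = hgnn_layer(X, H, w, P)
```

Because W is diagonal, the propagation matrix Θ is symmetric, which is exactly the property that the directed case below loses.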

Hyperbolic Directed Hypergraph Convolutional Neural Network
As shown in Section 3.1.1, the derivation of the hyperedge convolution is based on the eigenvalue decomposition of the Laplacian matrix of the hypergraph. In an undirected hypergraph, the Laplacian matrix is a symmetric positive semidefinite matrix because the degree matrices of its vertices and hyperedges are unique, so the eigenvalue decomposition always works. However, for directed hypergraphs, since hyperedges have direction, the degree matrices of vertices and hyperedges must be divided into out-degree and in-degree matrices, and according to the random-walk explanation of spectral hypergraph partitioning [46], the Laplacian matrix of a directed hypergraph should be ∆ = I − (D_v^{tail})^{−1/2} H^{tail} W (D_e^{head})^{−1} (H^{head})^T (D_v^{tail})^{−1/2}, where D_v^{tail} and D_e^{head} are the diagonal matrices of tail degrees of nodes and head degrees of hyperedges, and H^{tail} and H^{head} stand for the tail and head incidence matrices.
Since the two incidence matrices are generally different in a directed hypergraph, the Laplacian matrix is often not symmetric. As a result, a directed hypergraph modeled on the knowledge base may not admit the hyperedge convolution derivation used for the undirected hypergraph in Section 3.1.1, and this produces calculation error to some extent.
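A small numerical check of this asymmetry, assuming the Laplacian form above and toy incidence matrices (a hypothetical three-node, three-hyperedge example, not taken from the paper):

```python
import numpy as np

# Directed hypergraph: e0: head {0} -> tail {1}; e1: head {1} -> tail {2};
# e2: head {2} -> tail {0}. Columns index hyperedges, rows index nodes.
H_head = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]], dtype=float)
H_tail = np.array([[0, 0, 1],
                   [1, 0, 0],
                   [0, 1, 0]], dtype=float)
w = np.ones(3)
W = np.diag(w)

d_v_tail = H_tail @ w                    # tail degrees of nodes
d_e_head = H_head.sum(axis=0)            # head degrees of hyperedges
Dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v_tail))
De_inv = np.diag(1.0 / d_e_head)

Delta = np.eye(3) - Dv_inv_sqrt @ H_tail @ W @ De_inv @ H_head.T @ Dv_inv_sqrt
asymmetric = not np.allclose(Delta, Delta.T)
```

Here every node is the tail of exactly one hyperedge, yet Δ is still asymmetric, so a real symmetric eigenvalue decomposition is unavailable.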
Aiming at solving the problem above and the sparsity issue in hypergraphs modeled on a knowledge base, we apply the variant forms of GCN on hyperbolic space [17] to the directed hypergraph and obtain the matrix form of the directed hypergraph convolution network on hyperbolic space. The directed hypergraph convolution operations on hyperbolic space aggregate each representation vector individually in the vector dimensions, without the aforementioned symmetry problem of the Laplacian matrix of directed hypergraphs. The specific process is as follows. We first transform the initial item features from Euclidean space to hyperbolic space H^K, and then feed the initial hyperbolic item embeddings into the network to learn item embeddings. For the hyperbolic space, we set α := (√K, 0, 0, . . . , 0) ∈ H^K as the north pole in H^K, and the negative curvature of the hyperboloid manifold is −1/K. The initial item features in hyperbolic space can then be deduced from Euclidean space as x^{(0,H)} = exp_α^K((0, x^{(0,E)})), where x^{(0,H)} and x^{(0,E)} are the initial hyperbolic embedding and the initial Euclidean embedding, respectively. For the directed hypergraph convolutional network, when updating entity representations through the convolutional layer, we first aggregate the head entities in directed hyperedges to obtain the representations of relations, and then accumulate the relation representations containing the same tail entity to obtain the representation of that tail entity, continuously updating the entity representations. The representations can be aggregated via the following convolutional operation in hyperbolic space: r_i^{(L,H)} = exp_α^K(Σ_j M_{ij} log_α^K(x_j^{(L,H)})), where r_i^{(L,H)} is the hyperbolic hidden embedding of relation e_i in the L-th layer after aggregation. Node j's hyperbolic embedding is transformed to a Euclidean embedding via log_α^K, so the Euclidean-based sum and add operations are available; exp_α^K transforms the Euclidean-based embedding back to a hyperbolic embedding.
M_{ij} is the projecting weight derived from the head incidence structure. Accordingly, we can write an expression that evaluates the tail entity representation: x_t^{(L+1,H)} = exp_α^K(Σ_{e_i ∈ e_t^{tail}} w_i log_α^K(r_i^{(L,H)})), where x_t^{(L+1,H)} is the hyperbolic embedding of tail entity x_t in the (L+1)-th layer, and e_t^{tail} stands for the set of directed hyperedges containing x_t as a tail entity.
Because exp_α^K and log_α^K are inverses of each other, adjacent pairs of these operations cancel in the stacked aggregation, and max pooling is applied to obtain a weight for each relation. To facilitate the formulation of the model, we write the total convolution in matrix form: X^{(L+1,H)} = σ(exp_α^K(M^{tail} W M^{head} log_α^K(X^{(L,H)}) P)), where X^{(L+1,H)} and X^{(L,H)} are the entity representation matrices in the (L+1)-th and L-th layers, respectively, W stands for a diagonal matrix of hyperedge weights, and M^{head} and M^{tail} are the aggregation matrices of the head part and tail part of the directed hyperedges.
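The hyperboloid maps used throughout this derivation can be sketched in NumPy. The following is a minimal, illustrative implementation of exp_α^K and log_α^K at the north pole α on the hyperboloid of curvature −1/K (function names are ours, not the authors'), checked by a round trip:

```python
import numpy as np

def lorentz_inner(x, y):
    # Minkowski inner product <x, y>_L = -x0*y0 + sum_i xi*yi
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def exp_map(v, K=1.0):
    """exp_alpha^K: tangent vector at the north pole alpha -> hyperboloid H^K.
    v is assumed to have zero time component (v[0] == 0)."""
    alpha = np.zeros_like(v); alpha[0] = np.sqrt(K)
    r = np.linalg.norm(v[1:])
    if r < 1e-12:
        return alpha
    return np.cosh(r / np.sqrt(K)) * alpha + np.sqrt(K) * np.sinh(r / np.sqrt(K)) * v / r

def log_map(x, K=1.0):
    """log_alpha^K: point on the hyperboloid -> tangent space at alpha."""
    alpha = np.zeros_like(x); alpha[0] = np.sqrt(K)
    u = x + (lorentz_inner(alpha, x) / K) * alpha        # component orthogonal to alpha
    norm_u = np.sqrt(max(lorentz_inner(u, u), 1e-24))
    dist = np.sqrt(K) * np.arccosh(np.clip(-lorentz_inner(alpha, x) / K, 1.0, None))
    return dist * u / norm_u

# Round trip: a Euclidean feature padded with a 0 time coordinate.
v = np.array([0.0, 0.3, -0.5, 0.2])
x = exp_map(v, K=1.0)
v_back = log_map(x, K=1.0)
```

The round trip recovers v, and x satisfies the hyperboloid constraint ⟨x, x⟩_L = −K, which is what licenses the cancellation of adjacent exp_α^K and log_α^K operations in the stacked convolution.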

Model
In this section, we introduce the concrete model for the multi-hop knowledge base QA task. An overview of how the model works is shown in Figure 2.

Query-Aware Entity Encoder
The query-aware entity encoder encodes entities and relations in questions and their potentially related entities in the knowledge base into vector representations. Let L_q, L_e and L_r respectively denote the embedding matrices of the question, entities and relations; we begin by encoding each question using bidirectional Gated Recurrent Units (GRUs) [47].
We then follow the work of [21] and employ co-attention to learn query-aware entity representations: C_e = softmax(A_eq)E_q (16), where softmax stands for column-wise normalization and f_c is a linear network which converts the 2h-dimensional concatenation to h dimensions.
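A shape-level sketch of this co-attention step in NumPy, with hypothetical dimensions, a dot-product affinity matrix A_eq and a random linear map standing in for f_c (the paper's exact construction may differ):

```python
import numpy as np

def col_softmax(A):
    # Column-wise normalization, the "softmax" of Equation (16).
    Z = np.exp(A - A.max(axis=0, keepdims=True))
    return Z / Z.sum(axis=0, keepdims=True)

rng = np.random.default_rng(1)
h = 8
E_q = rng.normal(size=(5, h))            # question token embeddings (n_q x h), toy sizes
E_e = rng.normal(size=(7, h))            # entity embeddings (n_e x h)

A_eq = E_e @ E_q.T                       # entity-question affinity matrix (n_e x n_q)
C_e = col_softmax(A_eq) @ E_q            # query-aware entity representation (n_e x h)

# f_c: a linear map from the 2h-dim concatenation [E_e; C_e] down to h dims.
W_c = rng.normal(size=(2 * h, h)) * 0.1
E_tilde = np.concatenate([E_e, C_e], axis=1) @ W_c
```

Each row of E_tilde is an entity embedding fused with the question context it attends to.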

Reasoning over Hypergraph
According to the property that relation embedding can be obtained in the intermediate processes of hyperbolic directed hypergraph convolutional networks, we separate the hyperbolic directed hypergraph convolutional networks into two steps. Specifically, the model first collects the features of nodes onto the connected hyperedges and explicitly represents the learning relationship. Then, it dynamically assigns the weight of the relationship according to the similarity between the problem and the relationship, and predicts the current relation. Finally, the node status is updated through the connection relationship information. The specific process is as follows.
Firstly, assuming that the current hop is l, we use a linear network to concatenate the node status obtained by the previous l − 1 layer and the input entity representation of the current hop, and then map it onto the hyperbolic space.
where operator [ ; ] is column-wise concatenation. Then the model learns the relation representation R^{(l,H)} by aggregating the connected head entity features.
where P_r is the relation-specific learnable parameter. We then use a linear network to concatenate the relation representation obtained at the previous l − 1 hops with U_r^{(l,H)} to obtain the representation of relations at hop l.
After that, we apply w_r^L in (12) to obtain weights for each relation, and the diagonal matrix W of edge weights is W = diag(w_r^L). The dynamically allocated relation weights depend on the relation representations updated hop-by-hop, and the model predicts the current relation based on these weights. Finally, the model adaptively updates entity states by accumulating connected relation features: E_l = exp_α^K(M^{tail} W R^{(l,H)} P_e) (23), where P_e is the entity-specific learnable parameter.

Training
For an L-hop question, we sum the entity representations of each layer to obtain the final representation and use a linear layer f_ans to predict the answer distribution: ŷ = σ(f_ans(Σ_{l=1}^L E_l)) (24), where σ is the sigmoid function.
Since the model needs to predict both the answer to the question and the reasoning path, the loss function consists of two parts: the binary cross-entropy loss of the final answer prediction, and the negative log-likelihood of the intermediate predictions along the reasoning path. The loss function is L = −Σ_i [y_i log ŷ_i + (1 − y_i) log(1 − ŷ_i)] − λ Σ_{l=1}^L log w^l_{r_l^*}, where y_i is the golden distribution over entities, r_l^* is the golden relation index at hop l, and λ is a hyperparameter balancing the two terms.
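A minimal NumPy sketch of this two-part loss, assuming ŷ is a per-entity probability vector and the per-hop relation weights are already normalized (the names and toy values are illustrative, not the authors' code):

```python
import numpy as np

def kbqa_loss(y_true, y_prob, rel_weights, gold_rels, lam=1.0):
    """Binary cross-entropy on the answer distribution plus the negative
    log-likelihood of the golden relation at each hop, balanced by lam."""
    eps = 1e-12
    bce = -np.mean(y_true * np.log(y_prob + eps)
                   + (1 - y_true) * np.log(1 - y_prob + eps))
    # rel_weights[l] is the normalized relation-weight vector at hop l;
    # gold_rels[l] is the golden relation index r*_l at that hop.
    nll = -sum(np.log(rel_weights[l][r] + eps) for l, r in enumerate(gold_rels))
    return bce + lam * nll

y_true = np.array([0.0, 1.0, 0.0])                       # golden answer distribution
y_prob = np.array([0.1, 0.8, 0.2])                       # predicted probabilities
rel_weights = [np.array([0.7, 0.3]), np.array([0.2, 0.8])]
loss = kbqa_loss(y_true, y_prob, rel_weights, gold_rels=[0, 1], lam=1.0)
```

Confident correct predictions on both the answer and the relation path drive the loss toward zero, while λ trades off the two supervision signals.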

Experiment
This section reports the experiments.

Experiment Setup
We detail the adopted datasets, evaluation metrics, parameters and baselines.

Datasets
We use two single-answer KBQA datasets and two large-scale multi-answer KBQA datasets for the multi-hop KBQA task. We briefly outline these datasets in Table 2.
• PQL-2H [48]: PQL-2H is a single-answer KBQA dataset, which includes a knowledge base containing 5035 entities and 364 relationships, and a question set containing 1594 two-hop questions. These questions can be answered by following a reasoning path consisting of several relations and intermediate entities; the path is given.
• MetaQA-1H [49]: To test QA systems in more realistic (and more difficult) scenarios, MetaQA-1H also provides neural-translation-model-paraphrased datasets and text-to-speech-based audio datasets.
• MetaQA-2H [49]: MetaQA-2H contains 148,724 questions for two-hop reasoning, and the knowledge base in the dataset contains 40,128 entities and nine relations. Like MetaQA-1H, MetaQA-2H provides neural-translation-model-paraphrased datasets and text-to-speech-based audio datasets.

Metrics and Parameters
We test the effectiveness of the model on four datasets. The questions in each dataset are divided into three parts: 70% for training, 10% for validation and 20% for testing. We evaluate the experimental results via two standard metrics: F1 and Hits@1. The F1 value is an overall evaluation of precision and recall, which evaluates the performance of the model well, while Hits@1 measures the proportion of examples whose top-1 prediction is correct. The aim of training is to achieve high F1 and Hits@1.
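The two metrics can be computed as follows; this is a generic set-based implementation for illustration, not the authors' evaluation script:

```python
def f1_score(pred_set, gold_set):
    """Set-based F1 between predicted and golden answer entities."""
    if not pred_set or not gold_set:
        return 0.0
    tp = len(pred_set & gold_set)          # true positives
    if tp == 0:
        return 0.0
    precision = tp / len(pred_set)
    recall = tp / len(gold_set)
    return 2 * precision * recall / (precision + recall)

def hits_at_1(ranked_answers, gold_set):
    """1 if the top-ranked answer is a golden answer, else 0."""
    return 1.0 if ranked_answers and ranked_answers[0] in gold_set else 0.0

# Toy example: one correct and one spurious prediction against a single gold answer.
f1 = f1_score({"Celine_Dion", "Titanic"}, {"Celine_Dion"})
h1 = hits_at_1(["Celine_Dion", "Titanic"], {"Celine_Dion"})
```

For single-answer datasets such as PQL, F1 collapses to exact-match behavior, which is why only Hits@1 is reported there.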
The reported results are given for the best set of hyperparameters evaluated on the validation set for each model after a grid search over the following values: embedding size ∈ {100, 200, 300, 400, 500} and learning rate ∈ {1, 0.1, 0.01, 0.001}; λ and dropout are set to 1 and 0.4, respectively.

Baselines
We compare HDH-GCN with the following baselines:
• KVMemNet [50]: An end-to-end memory network which divides the memory into two parts: the key memory stores the head entity and relation, and the value memory stores the tail entity.
• IRN [48]: An interpretable reasoning network, which uses a hop-by-hop reasoning process and answers questions based on knowledge graphs.
• VRN [49]: An end-to-end variational learning algorithm which can effectively solve the multi-hop reasoning problem and simultaneously deal with the noise in the question.
• GraftNet [12]: Text information and entities are introduced to construct a graph, and GCN is applied for reasoning.
• SGReader [14]: This also combines unstructured text and the knowledge graph to address the incompleteness of the knowledge graph. The model employs graph attention to reason effectively.
• 2HR-DR [15]: This models the entities extracted from questions and their related relationships and entities in the knowledge base as directed hypergraphs, then uses Directed Hypergraph Convolutional Networks to predict relations hop-by-hop and form a sequential relation path that makes the reasoning interpretable.
Tables 3 and 4 show the main experiment results for the two kinds of multi-hop KBQA datasets; the highest scores are in bold. As shown in Table 3, our proposed HDH-GCN achieves optimal results under the Hits@1 measure (there is only one answer for each question in the PQL datasets, so we only adopt Hits@1 for evaluation). For the remaining datasets, except for the F1 value of HDH-GCN on the MetaQA-1H dataset, which does not exceed the baseline model, the other evaluation indexes improve to some extent, as shown in Table 4. Specifically, HDH-GCN achieves an improvement on PQL-2H of 0.9% over the second-best model. It also obtains a good result on PQL-3H, 1.2% higher than the second-best one.
Table 4 demonstrates the performance of the baseline methods and HDH-GCN on the MetaQA datasets; on MetaQA-1H, our model improves Hits@1 by 1.8% and obtains a competitive F1. For MetaQA-2H, we improve Hits@1 and F1 by 0.3% and 0.8%, respectively. First, compared with models whose knowledge base is modeled as a simple graph, our model reconstructs the knowledge graph as a hypergraph structure, which fully considers high-order data correlations. Meanwhile, we dynamically concentrate on relation information at different hops by performing loop operations at each hop of inference to guide the model to follow the golden relation path and select the final answers; in this way, the information of the intermediate reasoning path is introduced into the model to supervise it to focus on the dynamic relations at different hops.

Results of Main Experiment
When comparing with the directed hypergraph-based model 2HR-DR, the improvement in both evaluation values on the two PQL datasets is more obvious than on the other datasets. The reasons why our method performs better include the following: (1) The directed hypergraph is modeled on hyperbolic space, which effectively reduces the sparsity of the incidence matrix of the directed hypergraph; this reduces the scale of matrix calculation during training and mitigates undertraining. This also explains why the results on the two PQL datasets are better than on MetaQA-1H and MetaQA-2H: the number of relationships in PQL is much higher than in MetaQA, which makes the matrix sparsity problem more pronounced (the number of relations in MetaQA is small, and each relation can relate to many entities). (2) The convolutional network of a directed hypergraph involves an eigenvalue decomposition of the asymmetric Laplacian matrix, but an asymmetric matrix may not admit eigenvalue decomposition; forcing the decomposition of a matrix that cannot be diagonalized consistently introduces training error. This problem can be solved effectively by deforming the Laplacian matrix in hyperbolic space. The problem is evident in the single-answer QA task, while the multi-answer QA task dilutes its influence to a certain extent when calculating the F1 value, which also explains why the F1 value of HDH-GCN does not improve on MetaQA-1H.

Parameter Analysis
Embedding size is a significant factor in KBQA models, determining the performance of the model to a large extent. Hence, we analyze the results obtained by the model on PQL-2H with different embedding sizes to investigate its impact. First, according to Figure 3a, HDH-GCN outperforms the other methods when the dimension is in {100, 200, 300, 400}. The Hits@1 of HDH-GCN increases sharply in the early stage of increasing the embedding size and becomes smooth after the embedding size reaches 400. The Hits@1 of 2HR-DR is almost identical to TF-DHP's from the start; however, because the sparsity issue intensifies as the dimension increases, it cannot remain smooth like HDH-GCN when the embedding size increases. After the embedding size increases to a certain extent, 2HR-DR's Hits@1 decreases slightly. For the other methods, since the knowledge base is not modeled as a hypergraph, the sparsity issue has no obvious effect on the training results at higher dimensions; however, for the reasons mentioned above, their results at each dimension are not as good as HDH-GCN's. We also record the Hits@1 results on PQL-2H for each training session. In Figure 3b, we compare the Hits@1 of HDH-GCN and 2HR-DR during model training. 2HR-DR is stopped early at around 35 epochs because Hits@1 has not updated for 10 epochs, so the line is not complete. HDH-GCN always achieves better performance, and keeps improving until around epoch 34.

Approximate Training Time Comparison
On the two kinds of datasets PQL and MetaQA, HDH-GCN takes around 75 min and 3 h, respectively, of training time, while 2HR-DR takes around 2 h and 3 h, respectively. All were run on a GeForce GTX 1080 super GPU machine with Python 3.

Case Study
As Figure 4 shows, we give an exemplar question from PQL-2H and its corresponding reasoning path and triples in the KB. It is clear that HDH-GCN has the ability to predict relations hop-by-hop and stop reasoning automatically. For the question "Who is the singer of the theme song of the movie 'Titanic'?", the model first detects the relation "Theme song", then "Singer" in succession, and finally meets <STOP> to end the reasoning process. From the "path" in Figure 4, we can observe our model's predicted relation path (Theme song → Singer → <STOP>).

Path: Titanic → Theme_Song → My_Heart_Will_Go_On → Singer → Celine_Dion → <STOP>
Figure 4. An exemplar of a two-hop query in a KBQA dataset; the figure shows the reasoning path and triples related to entities in the query, and graphically shows how the model reaches the answer to the question through the reasoning path.

Conclusions and Future Work
In this paper, we introduce HDH-GCN, a novel model for multi-hop KBQA tasks. We model the directed hypergraph convolutional network in hyperbolic space, which effectively reduces the influence of the sparsity issue on model performance. Our model improves the accuracy of the multi-hop knowledge base question-answering task, and has application value in text question answering, human-computer interaction and other fields. The experimental results verify the advantages of HDH-GCN on both single-answer and multi-answer question datasets.
In the future, we will study the multi-hop knowledge base question-answering task on multi-modal data, study the possibility of modeling a multi-modal knowledge base as a directed hypergraph and explore possible applications.

Conflicts of Interest:
The authors declare no conflict of interest.