DAEOM: A Deep Attentional Embedding Approach for Biomedical Ontology Matching

Abstract: Ontology Matching (OM) is performed to find semantic correspondences between the entity elements of different ontologies to enable semantic integration, reuse, and interoperability. Representation learning techniques have been introduced to the field of OM with the development of deep learning. However, two limitations remain. Firstly, these methods focus only on terminological features to learn word vectors for discovering mappings, ignoring the network structure of the ontology. Secondly, the final alignment threshold is usually determined manually within these methods; it is difficult for an expert, and even more so for a non-expert user, to adjust the threshold value. To address these issues, we propose an alternative ontology matching framework called Deep Attentional Embedded Ontology Matching (DAEOM), which models the matching process by embedding techniques that jointly encode the ontology terminological descriptions and network structure. We propose a novel inter-intra negative sampling skill tailored for the structural relations asserted in ontologies, and further improve our iterative final alignment method by introducing an automatic adjustment of the final alignment threshold. Preliminary results on real-world biomedical ontologies indicate that DAEOM is competitive with several OAEI top-ranked systems in terms of F-measure.


Introduction
An ontology is a highly interoperable, extensible, and scalable mechanism to represent knowledge as a set of concepts within a domain and the relations among those concepts [1]. Life science is one of the most prominent application areas of ontology technology. Many biomedical ontologies, such as SNOMED CT [2], the National Cancer Institute Thesaurus (NCI) [3], and the Foundational Model of Anatomy (FMA) [4], have been developed and utilized in real-world systems. Biomedical ontologies provide domain knowledge to support applications such as semantic annotation of biomedical data, knowledge discovery and exchange, data integration, and decision support [5,6]. To integrate and migrate data among applications, it is crucial to first establish mappings between the entities of their respective ontologies. However, there are diverse ways to construct an ontology, which leads to varying degrees of heterogeneity [7]. The heterogeneities among different ontologies limit interoperability across them. Ontology matching techniques are usually performed to find semantic correspondences between the entity elements of different ontologies to enable interoperability [8]. Two types of correspondences can be distinguished: While approaches generating simple correspondences are limited to matching single entities (i.e., linking a single entity from a source ontology to a single entity of a target ontology), complex matching approaches can generate correspondences that express more complex relationships between entities from different ontologies [9]. In this paper, we focus on matching single entities.
Early studies on automatic ontology matching have focused on engineering features from terminological, structural, extensional (individuals of concepts) information, and external resources [10,11]. These features are used to compute the similarities of ontological entities (i.e., concepts, properties, individuals) for guiding ontology matching. To date, the mainstream ontology matching systems (e.g., AML [12], FCA_Map [13], LogMap [14], and XMap [15]) still use feature-based strategies to evaluate entity similarity. AML employs various sophisticated features and domain-specific thesauri to perform ontology matching based on heuristic methods that rely on aggregation functions. FCA_Map uses Formal Concept Analysis to derive terminological hierarchical structures that are represented as lattices; the matching is performed by aligning the constructed lattices, taking into account the lexical and structural information they incorporate. LogMap uses logic-based reasoning over the extracted features and casts ontology matching as a satisfiability problem. XMap relies on the notion of context to deal with lexical ambiguity, as well as parallel comparison between concepts to efficiently handle the matching of large ontologies. Feature-based methods mainly employ hand-crafted features of the data to achieve specific tasks. Unfortunately, determining which hand-crafted features will be valuable for a given task can be highly time-consuming. Cheatham and Hitzler showed that the performance of ontology matching based on such engineered features varies greatly with the domain described by the ontologies [16].
As a complement to feature engineering, attempts have been made to develop representation learning techniques for ontology matching. Continuous vectors representing ontological entities can capture the potential associations among features, which is helpful to discover more mappings among ontologies [17].
Representation learning has so far had limited impact on ontology matching, especially for biomedical ontologies. To the best of our knowledge, only seven studies have explored the use of representation learning techniques for ontology matching. The works in [18-20] used word embeddings [21] to compute the semantic similarities among elements. Zhang et al. [18] first introduced word embeddings to the field of ontology matching. They trained word embeddings on Wikipedia using the skip-gram architecture to compute vector representations, and combined edit distance with word embeddings to compute the semantic similarities among elements. For textual entity descriptions consisting of more than one word, they proposed a heuristic method to compute semantic similarities. Their entity matching strategy was based on maximum similarity: For every entity in the source ontology, the algorithm found the most similar entity in the target ontology. To avoid the problems that arise from conflating semantic similarity and conceptual association, the works in [19,20] used synonymy and antonymy constraints extracted from semantic lexicons to refine pre-trained word embeddings and make them better suited for evaluating semantic similarity. Kolyvakis et al. [19] used the Dual Embedding Space Model (DESM) to compute the semantic similarity of two entity textual description word sets. In [20], they averaged the bag of word vectors to produce sentence vectors of the entities' textual descriptions and used the cosine distance to compute the semantic similarity of two entities; in addition, they leveraged an outlier detection mechanism based on a denoising autoencoder to improve the quality of the alignments. DOME [22] is a scalable matcher that relies on large texts describing ontological concepts. It uses the doc2vec approach to train a fixed-length vector representation of the concepts.
Mappings are generated if two concepts are close to each other in the resulting vector space. Xiang et al. [23] proposed an entity representation learning algorithm based on Stacked Auto-Encoders to learn general representations of the entities. To describe an ontological entity, they designed a combination of its ID, labels, and comments. They also introduced an iterative similarity propagation method that takes advantage of the richer structure information of ontologies to discover more mappings. Wang et al. [24] proposed a neural architecture tailored for biomedical ontology matching called OntoEmma, which can encode a variety of information and derive large amounts of labeled data for training the model; moreover, they utilized natural language texts associated with entities to further improve the quality of the alignments. Li et al. [17] presented an alternative ontology matching framework called MultiOM, in which different loss functions based on cross-entropy were designed to model different views among ontologies and learn the vector representations of concepts. They further proposed a novel negative sampling skill tailored for structural relations, which obtains better vector representations of concepts. After computing the semantic similarity of entities from the two compared ontologies with the learned vectors, a final alignment method is executed to select appropriate correspondences between the entities of the compared ontologies. The final alignment is an important part of the ontology matching process because it directly determines the output of this process. Several works [17,19,20,23] use the Stable Marriage algorithm [25] for mapping selection during the final alignment: They iteratively pass through all the candidate alignments and discard those with a cosine distance higher than a certain threshold. The threshold value is usually set manually within matching systems based on the experience of the developers.
However, the above methods have two limitations. The first is the sparsity problem of structural relations. To avoid the poor capability of encoding sparse relations, the above methods prefer terminological features to learn concept vectors for discovering mappings, but they do not make full use of the structural relations in ontologies. Although Li et al. [17] designed a negative sampling technique to fine-tune the vector representations of concepts by using the network structure indirectly, it cannot effectively preserve the network structure. The second is that using the Stable Marriage algorithm during the final alignment has worst-case time complexity $\Theta(n^2 \log n)$ [20]; that is, the larger the ontology, the more time the final alignment takes. Meanwhile, the threshold value is usually set manually within matching systems based on the experience of the developers, and is therefore not necessarily adjusted in an optimal way for each pair of compared ontologies [26].
Motivated by the above observations, in this paper we propose a graph attentional autoencoder-based attributed ontology matching framework, which models the matching process by jointly encoding terminological description and network structure into continuous vector representations. Based on the representation, a mapping selection module is proposed to perform the matching algorithm towards better performance. We summarize our contributions as follows:
• We develop the first siamese graph attention-based autoencoder to effectively integrate both network structure and terminological description for deep latent representation learning in ontology matching. To make fuller use of the structural relations in ontologies, we design a novel inter-intra negative sampling skill tailored for the structural relations asserted in ontologies, which obtains better vector representations of concepts;
• To improve the performance of ontology matching, we combine the Greedy Matching algorithm [27] with the Stable Marriage algorithm to find only the highest correspondences in the semantic similarity matrix as the candidate alignments. We also introduce an automatic threshold adjustment method based on the highest correspondences found in the first iteration;
• We implement our method and conduct experiments on real-world biomedical ontologies. The experimental results show that our matching approach achieves competitive matching performance compared to several OAEI top-ranked systems in terms of F-measure.

Problem Definition
We consider an ontology $O = (C, E, A, X)$, where $C$ is the set of concepts (vertices) and $E \subseteq C \times C$ is the set of relationships (directed edges) between concepts. The topological structure of ontology $O$ can be represented by an adjacency matrix $A$, where $A_{i,j} = 1$ if $(c_i, c_j) \in E$ and $A_{i,j} = 0$ otherwise. $X = \{x_1; \cdots; x_n\}$ is the sequence of textual description values, where $x_i \in \mathbb{R}^d$ is a real-valued feature vector associated with concept $c_i$.
Ontology matching can be formally defined as a function that takes two ontologies $O_s$ and $O_t$ and returns all semantically equivalent mappings between their concepts $c_i \in O_s$ and $c_j \in O_t$. In this work, we focus on discovering equivalence correspondences between two ontologies with cardinality 1:1. That is, one concept in ontology $O_s$ can be matched to at most one concept in ontology $O_t$, and vice versa.
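As a concrete illustration of the definitions above, the following is a minimal NumPy sketch (with an invented toy ontology) of constructing the adjacency matrix $A$ from the asserted edges:

```python
import numpy as np

def build_adjacency(num_concepts, edges):
    """Build the adjacency matrix A of an ontology O = (C, E, A, X):
    A[i, j] = 1 if a directed edge (c_i, c_j) is asserted, else 0."""
    A = np.zeros((num_concepts, num_concepts), dtype=np.int8)
    for i, j in edges:
        A[i, j] = 1
    return A

# Toy ontology: concepts c1 and c2 are subclasses of c0.
A = build_adjacency(3, [(1, 0), (2, 0)])
```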

Overall Framework
Existing work [17] designs a negative sampling skill tailored for structural relations to obtain better vector representations of concepts, but it leverages the structural relations for embedding learning only indirectly and cannot effectively preserve the network structure information in the vectors directly. Graph attention networks (GATs) [28] can assign nodes in a network to low-dimensional representations and effectively preserve the network structure. They are highly effective in network analysis tasks such as link prediction [29] and node classification [30]. Until recently, however, this method had received little attention in ontology matching. Inspired by these works, we develop an ontology attentional encoder that effectively integrates both structure and terminological description to learn a latent representation. This model uses both direct and indirect methods to address the sparsity problem of structural relations in representation learning. By jointly encoding terminological description and network structure into continuous vector representations, we can obtain more similar concepts and discover more potential mappings among ontologies. Moreover, there is no need to select and aggregate different single similarities in the similarity computation as described in [17].
Our framework is shown in Figure 1 and consists of two parts: An ontology attentional encoder and a mapping selection module.
• Ontology attentional encoder. The encoder takes the terminological descriptions and network structure as input, learns the latent embedding by minimizing the reconstruction loss, and uses an inter-intra negative sampling skill to indirectly alleviate the sparsity problem for embedding learning;
• Mapping selection. The mapping selection module performs ontology matching based on the learned representation. It includes two phases: Candidate selection and final alignment. To reduce the running time of mapping selection, we improve the Stable Marriage algorithm's ability to choose 1:1 mappings between two different ontologies and introduce an automatic threshold adjustment method based on the highest correspondences.

Ontology Attentional Encoder
The architecture of the ontology attentional encoder is shown in Figure 2. We first assign each sequence of terminological descriptions associated with concepts from the source and target ontologies ($O_s$ and $O_t$) to low-dimensional representations ($X_s$ and $X_t$) such that semantically similar sequences are close. Then we develop a siamese graph attentional autoencoder, which effectively integrates both network structure ($A_s$ and $A_t$) and terminological description ($X_s$ and $X_t$) to learn the latent representations $Z_s$ and $Z_t$. The encoder exploits both network structure and terminological description with a graph attention network, and multiple encoder layers are stacked to build a deep architecture for embedding learning. On the other side, the decoder reconstructs the topological network information from the latent graph representation.

Terminological Description Embedding. In this paper, we present Terminological-BERT (TBERT), a modification of the BERT [31] network using siamese networks that can derive semantically meaningful terminological description embeddings. TBERT adds a pooling operation to the output of BERT to derive a fixed-size terminological description embedding. The pooling strategy computes the mean of all output vectors (MEAN-strategy). To fine-tune BERT, we create siamese networks to update the weights such that the produced terminological description embeddings ($X_s$ and $X_t$) are semantically meaningful and can be compared with a similarity metric.
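The MEAN-strategy pooling described above can be sketched as a generic mask-aware mean over encoder token outputs. The `mean_pool` helper and the toy tensors are illustrative, not the paper's implementation:

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """MEAN-strategy pooling: average the encoder's token output
    vectors, ignoring padded positions, to obtain one fixed-size
    terminological description embedding per sequence."""
    mask = attention_mask[:, :, None].astype(float)   # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = mask.sum(axis=1)
    return summed / counts

# One sequence of two real tokens plus one padded token.
emb = np.array([[[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]])
mask = np.array([[1, 1, 0]])
pooled = mean_pool(emb, mask)  # → [[2.0, 3.0]]
```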
We employ a similarity metric, adapted from [17], to guide the optimization procedure. Given an input pair $\{u, v\}$, where $u$ and $v$ form a mapping pair, TBERT seeks parameter values such that the symmetric similarity metric is large if $u$ and $v$ share the same semantics and small otherwise. The similarity metric $sim_{tb}(u, v)$ measures the semantic relatedness between the mapping pair $(u, v)$ in terms of the distance between the generated terminological description embeddings $f(u)$ and $f(v)$, where $\| \cdot \|_2$ is the L2-norm.
We define a loss function based on cross-entropy to optimize the vector representations of the terminological descriptions. The loss is computed over a set of positive mappings $M_{intra}$ and a set of negative mappings $M'_{intra}$ (see Section 3.2 for details).

Joint Embedding. After producing all the terminological description embeddings ($X_s$ and $X_t$) of concepts from the source and target ontologies, we present structural-GAT (SGAT), a variant of the graph attention network (GAT) [28] using siamese networks that represents both network structure ($A_s$ and $A_t$) and terminological description ($X_s$ and $X_t$) in a unified framework. The idea is to learn hidden representations of each concept that assign different importance to different concepts within a neighborhood while dealing with differently sized neighborhoods when aggregating feature information. To measure the importance of various neighbors, different weights are given to the neighbor representations in our layer-wise graph attention strategy. Here, $z_i^{(l+1)}$ denotes the output representation of concept $i$, and $N_i$ denotes the neighbors of $i$.
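The exact cross-entropy loss over the positive and negative mapping sets is not reproduced in the extracted text; the sketch below is only a plausible form, assuming similarity scores in (0, 1) and a standard binary cross-entropy over positive and negative pairs:

```python
import numpy as np

def bce_mapping_loss(sims_pos, sims_neg, eps=1e-9):
    """Hypothetical cross-entropy loss over mapping pairs: positive
    mappings should score near 1, negative mappings near 0.
    This is an assumed form, not the paper's exact formula."""
    sims_pos = np.clip(sims_pos, eps, 1 - eps)
    sims_neg = np.clip(sims_neg, eps, 1 - eps)
    loss = -np.log(sims_pos).sum() - np.log(1 - sims_neg).sum()
    return loss / (len(sims_pos) + len(sims_neg))
```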
$\alpha_{ij}$ is the attention coefficient that indicates the importance of neighbor concept $j$ to concept $i$, and $\sigma$ is a nonlinearity. To calculate $\alpha_{ij}$, we measure the importance of neighbor concept $j$ from the aspects of both terminological description and topological distance. The attention coefficient $\alpha_{ij}$ is computed by a single-layer feedforward neural network applied to the sum of $x_i$ and $x_j$ with weight vector $\vec{a} \in \mathbb{R}^d$. Topologically, neighbor concepts contribute to the representation of a target concept through edges. GAT considers only the 1-hop neighboring concepts (parents and children) for graph attention. In this paper, we assume that two concepts from different ontologies with the same parents, children, and siblings are more similar than concepts that do not have these in common. Four relationships are used to construct a neighborhood: "self", "parent", "child", and "sibling". For a given concept, its neighbors include itself, all direct parents and children, and its sibling concepts. We obtain a proximity matrix $P$ by considering 1-order neighbor concepts (parents and children) and 2-order neighbor concepts (only siblings) in the ontology. Here $B$ is the transition matrix, where $B_{ij} = 1/d_i$ if $r_{ij} \in R$ and $B_{ij} = 0$ otherwise, and $d_i$ is the degree of concept $i$. Therefore, $P_{ij}$ denotes the topological relevance of concept $j$ to concept $i$ up to first or second order. In this case, $N_i$ denotes the neighboring concepts of $i$ in $P$, i.e., $j$ is a neighbor of $i$ if $P_{ij} > 0$.
The attention coefficients are normalized across all neighbors $j \in N_i$ with a softmax function to make them easily comparable across concepts. Adding the topological weights $P$ and an activation function $\sigma$ (here LeakyReLU), the coefficients can then be expressed in normalized form. We take $x_i = z_i^{(0)}$ as the input for our problem and stack two graph attention layers. In this way, our encoder encodes both the structure and the terminological description of a concept into a hidden representation, i.e., we obtain $z_i = z_i^{(2)}$.
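The transition-matrix construction above can be sketched as follows. The exact combination of 1-order and 2-order proximities is not shown in the extracted text; $P = (B + B^2)/2$ is assumed here as one common choice:

```python
import numpy as np

def proximity_matrix(A):
    """Hypothetical proximity matrix combining 1-order (parents and
    children) and 2-order (siblings) neighbours from the row-normalised
    transition matrix B (B_ij = 1/d_i on asserted relations).
    P = (B + B^2)/2 is an assumed combination, not the paper's exact one."""
    A = A + A.T                       # treat parent/child links symmetrically
    A = (A > 0).astype(float)
    d = A.sum(axis=1, keepdims=True)  # concept degrees
    B = np.divide(A, d, out=np.zeros_like(A), where=d > 0)
    return (B + B @ B) / 2.0
```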
We jointly optimize the ontology autoencoder embedding and the matching learning, and define the total objective as a combination of $L_r$ and $L_{sg}$, the reconstruction loss of the source and target ontologies and the matching loss, respectively, where $\gamma \geq 0$ is a coefficient that controls the balance between them. As our latent embedding already contains both content and structure information, we adopt a simple inner product decoder to predict the relationships between concepts, which is efficient and flexible: $\hat{A} = \mathrm{sigmoid}(Z Z^\top)$, where $\hat{A}_s$ is the reconstructed structure matrix of the source ontology $O_s$ and $\hat{A}_t$ is the reconstructed structure matrix of the target ontology $O_t$. We minimize the reconstruction error by measuring the difference between $A$ and $\hat{A}$.

Ontology mapping pairs are fed in to train the siamese graph attention networks. Given an input $(cs_i^k, ct_j^k)$, where $cs_i^k$ and $ct_j^k$ form the $k$-th mapping pair between the $i$-th concept from $O_s$ and the $j$-th concept from $O_t$, the similarity metric $sim_{sg}(cs_i^k, ct_j^k)$ measures the semantic relatedness between the mapping pair, where $\| \cdot \|_2$ is the L2-norm. We define a loss function based on cross-entropy to optimize the vector representations of the concepts, computed over a set of positive mappings $M_{inter}$ and a set of negative mappings $M'_{inter}$ (see Section 3.2 for details). The model is trained by minimizing the overall loss function. In this paper, we only consider small ontology structure data, so the batch size during model training is 1. Large-scale ontologies will be discussed in future research.
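The inner product decoder is straightforward to sketch; the binary cross-entropy form of the reconstruction error below is an assumption (the extracted text omits the exact formula):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def inner_product_decoder(Z):
    """Simple inner product decoder: reconstruct the structure matrix
    from the latent concept embeddings, A_hat = sigmoid(Z Z^T)."""
    return sigmoid(Z @ Z.T)

def reconstruction_loss(A, A_hat, eps=1e-9):
    """Binary cross-entropy between the asserted adjacency A and the
    reconstruction A_hat (one standard choice of reconstruction error;
    assumed here, not taken verbatim from the paper)."""
    A_hat = np.clip(A_hat, eps, 1 - eps)
    return -(A * np.log(A_hat) + (1 - A) * np.log(1 - A_hat)).mean()
```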

Inter-Intra Negative Sampling
In this paper, the unique rdfs:label that accompanies every type in the source and target ontologies is used as the terminological description of the concept. If a concept contains multiple labels, we choose the first label as its terminological description. To obtain more candidate mappings for representation learning, we assume that the mappings generated by equivalent strings or their synonym labels are positive samples. Thus, the positive mapping set $M$ has two parts: A set of positive mappings from inside each ontology ($M_{intra}$) and a set of positive mappings across the ontologies ($M_{inter}$). Each concept carries native attributes (name, alias, and synonyms) that share the same semantics, and we construct $M_{intra}$ and $M_{inter}$ from these attributes. An example can be seen in Figure 3.
Furthermore, we design a novel inter-intra negative sampling tailored for the structural relations (e.g., subclassOf relations) asserted in ontologies, which obtains better vector representations of entities for ontology matching. Like [17], this method indirectly alleviates the sparsity problem of structural relations. Unlike the uniform negative sampling method, which samples a replacer from all concepts, we limit the sampling scope to a group of candidates. The negative mapping set is $M' = M'_{intra} \cup M'_{inter}$. For a given positive sample $(c_i, c_j) \in M_{intra}$, if a subclassOf relation (e.g., $(c_i, \text{subclassOf}, c_j)$ or $(c_j, \text{subclassOf}, c_i)$) is asserted in the ontologies, we exclude this replacement case. If concept $u$ of ontology $O_s$ has semantics similar to concept $v$ of ontology $O_t$, then $u$ cannot be matched with another concept of $O_t$, and $v$ cannot be matched with another concept of $O_s$. For a given positive sample $(u, v) \in M_{inter}$, we corrupt it by randomly replacing $u$ or $v$ to generate negative mapping pairs. Unlike uniform negative sampling, we randomly select a concept that is not a neighbor of $u$ or $v$; that is, the randomly selected concept is not a parent or child of $u$ or $v$, because the subclassOf relations in our datasets are transitive [32].
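A minimal sketch of the corruption step, with hypothetical helper names (`neighbors` is assumed to map each concept to its parents and children; the real system operates over ontology concept IDs):

```python
import random

def sample_negative(positive_pair, concepts, neighbors):
    """Sketch of inter-intra negative sampling: corrupt one side of a
    positive mapping (u, v), but never pick a replacer that is u, v,
    or a parent/child of either, since subclassOf is transitive."""
    u, v = positive_pair
    excluded = {u, v} | set(neighbors.get(u, ())) | set(neighbors.get(v, ()))
    candidates = [c for c in concepts if c not in excluded]
    if not candidates:
        return None
    replacer = random.choice(candidates)
    # Randomly corrupt either side of the pair.
    return (replacer, v) if random.random() < 0.5 else (u, replacer)
```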

Mapping Selection
As shown in Figure 4, the mapping selection consists of two phases: Candidate selection (blue boxes) and the final alignment (green boxes). We compute distances using the Euclidean distance over the concept embeddings learned by the joint embedding model. We iteratively match the concepts of the two ontologies using the improved Stable Marriage algorithm (SM) over the pairwise distances of the concept embeddings. We also introduce an automatic threshold adjustment method based on the highest correspondences found in the first iteration. Finally, we pass through all the candidate alignments and discard those with a Euclidean distance higher than the threshold.
As the computation of the preference matrix required to define an instance of the Stable Marriage assignment problem has worst-case time complexity $\Theta(n^2 \log n)$ [20], aligning larger ontologies consumes much more time. To improve the performance of ontology matching, we combine the Greedy Matching algorithm with the Stable Marriage algorithm to reduce the runtime. Inspired by [26], we only find the highest correspondences in the semantic similarity matrix as the candidate alignments. A correspondence between concept $c_s$ of ontology $O_s$ and $c_t$ of ontology $O_t$ is a highest correspondence if and only if it has a smaller semantic distance than any other correspondence of either $c_s$ or $c_t$ with some other concept. We iteratively find the highest correspondences in the semantic similarity matrix until no highest correspondence remains or all concepts in either $O_s$ or $O_t$ are matched. In this way, we do not have to find the most similar concepts for all concepts. Experiments show that finding only the highest correspondences reduces the running time without affecting the matching performance.
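The iterative highest-correspondence search described above can be sketched as follows (a NumPy sketch of the described procedure, not the authors' implementation):

```python
import numpy as np

def highest_correspondences(D):
    """Iteratively extract 'highest correspondences' from a semantic
    distance matrix D (rows: source concepts, columns: target concepts).
    A pair (i, j) is a highest correspondence when D[i, j] is smaller
    than any other correspondence involving either concept, i.e. it is
    the minimum of both row i and column j. Accepted pairs are removed
    and the search repeats until none remain or one side is matched."""
    D = D.astype(float).copy()
    matches = []
    while True:
        found = []
        for i in range(D.shape[0]):
            if not np.isfinite(D[i]).any():
                continue  # source concept already matched
            j = int(np.argmin(D[i]))
            if int(np.argmin(D[:, j])) == i:
                found.append((i, j, D[i, j]))
        if not found:
            break
        for i, j, dist in found:
            matches.append((i, j, dist))
            D[i, :] = np.inf  # remove the matched source concept
            D[:, j] = np.inf  # remove the matched target concept
    return matches
```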
We iteratively pass through all the candidate alignments and discard those with a Euclidean distance higher than a certain threshold $t$. The threshold value is usually determined manually within matching systems based on the experience of developers. In this paper, we instead introduce an automatic threshold adjustment method. Since the highest correspondences found in the first iteration are the most likely to be correct matching pairs, we use their semantic similarities to calculate the threshold $t$ [26]. The threshold is computed from $Q = \{q_1, q_2, \ldots, q_i\}$ $(i > 0)$, the semantic similarities of all highest correspondences found in the first iteration, where $\bar{Q}$ is the arithmetic mean of $Q$ and $SD_Q$ is its standard deviation. The algorithm of the whole module is shown in Algorithm 1.
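A sketch of the automatic threshold computation. The exact formula is not shown in the extracted text, so $t = \bar{Q} + SD_Q$ is assumed here as one plausible combination of the mean and standard deviation:

```python
import numpy as np

def auto_threshold(first_iteration_distances):
    """Automatic threshold from the correspondences found in the first
    iteration: t = mean(Q) + SD(Q) is an assumed form (the extracted
    text omits the exact formula). Candidate alignments whose distance
    exceeds t are discarded."""
    Q = np.asarray(first_iteration_distances, dtype=float)
    return Q.mean() + Q.std()
```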

Experiments and Discussion
To verify the effectiveness of DAEOM, we used Python to implement our approach with the aid of TensorFlow. All reported experiments were performed on a desktop computer with an Intel® Core™ i7-6700K (3.40 GHz) processor, 32 GB RAM, and one NVIDIA® GeForce® GTX 1080 (8 GB) graphics card.

Dataset
The experiments were performed on biomedical evaluation benchmarks coming from the Ontology Alignment Evaluation Initiative (OAEI), which organizes annual campaigns for evaluating ontology matching systems.
Biomedical ontologies were collected from the Anatomy Track and Large BioMed Track of the OAEI. We used four ontologies in our ontology mapping experiments: The Adult Mouse Anatomical Dictionary (MA) [33], the NCI Thesaurus (NCI), the Foundational Model of Anatomy (FMA), and SNOMED Clinical Terms (SNOMED). Two of them (FMA and MA) are pure anatomical ontologies, while the other two (SNOMED and NCI) are broader biomedical ontologies in which anatomical structure is a subdomain [34]. These resources are available from the Ontology Alignment Evaluation Initiative. We provide details on the size of each ontology matching task in Table 1. The ontology matching tasks are MA-NCI, FMA-NCI, and FMA-SNOMED. "Nodes" denotes the ontology entities, and "Relations" denotes the SUBCLASS-OF edges between nodes; in this paper, we only consider the SUBCLASS-OF relationship of these datasets. We focus only on discovering one-to-one matchings, so "Matchings" includes only one-to-one equivalences between nodes. To represent each ontology node by jointly encoding its textual description and network structure information, the unique rdfs:label that accompanies every type in the ontologies is used as the textual description of the node. We performed textual preprocessing, including case-folding, tokenization, removal of English stopwords, and removal of words co-appearing in terms (for example, the word "structure" in SNOMED). The reference alignments of these alignment scenarios are based on the UMLS Metathesaurus [35], which currently constitutes the most comprehensive effort for integrating independently developed medical thesauri and ontologies. All mappings marked as "?" in the reference alignment were considered positive.

Evaluation Measures
We followed the standard evaluation criteria from the OAEI, calculating the precision (P), recall (R), and F-measure (F) over each test. Given a reference alignment $G$, the precision and recall of an alignment $A$ are defined as (17) and (18):

$P = \frac{|A \cap G|}{|A|}, \qquad R = \frac{|A \cap G|}{|G|}$

The F-measure of an alignment $A$ is the harmonic mean of precision and recall, defined as (19):

$F = \frac{2 \cdot P \cdot R}{P + R}$
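These standard measures can be computed directly over sets of correspondence pairs:

```python
def evaluate(alignment, reference):
    """Standard OAEI measures: precision P = |A ∩ G| / |A|,
    recall R = |A ∩ G| / |G|, and the F-measure as their
    harmonic mean."""
    A, G = set(alignment), set(reference)
    correct = len(A & G)
    P = correct / len(A) if A else 0.0
    R = correct / len(G) if G else 0.0
    F = 2 * P * R / (P + R) if (P + R) else 0.0
    return P, R, F

# Two produced mappings, one of which appears in the reference.
P, R, F = evaluate([(1, 1), (2, 2)], [(1, 1), (3, 3)])  # → (0.5, 0.5, 0.5)
```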

Experiment Settings
We select several strategies to construct the baseline methods to explore the performance details of our algorithm. The following is the detailed construction of strategies in our experiments.
-OM-TD (TF-IDF [36]): Uses TF-IDF to calculate the similarity of ontology terminological descriptions by the procedure described in [17];
-OM-TD (LSTM [37]): Uses only an LSTM to produce ontology terminological description embeddings, ignoring the ontology network structure information;
-OM-TD (TBERT): Employs our proposed ontology terminological description embedding component without encoding the network structure information;
-OM (LSTM + SGAT): Extends the LSTM experiment by applying our proposed SGAT to jointly encode terminological description and network structure;
-OM (TBERT + GraphSAGE [38]): Combines our proposed TBERT with GraphSAGE to jointly encode terminological description and network structure;
-OM (TBERT + TransE [39]): Employs TransE to jointly encode terminological description and network structure, with the node vectors initialized by our proposed TBERT;
-OM (DAEOM): Our proposed joint embedding method for ontology matching.
"OM-TD" means only using terminological description for ontology matching. "OM" means using both network structure and terminological description. Those methods applied our proposed mapping selection module in final alignment.
For DAEOM, we use Adam as the optimizer, with the hyper-parameter configuration listed below. Word vectors have size $d = 200$ and are shared everywhere. We initialized the word vectors from vectors pre-trained on a combination of PubMed and PMC texts together with texts extracted from a recent English Wikipedia dump [40]. All initial out-of-vocabulary word vectors were sampled from a normal distribution ($\mu = 0$, $\sigma^2 = 0.01$). We sampled {2, 4, 6} negative triples for each positive triple. In TBERT, the number of transformer-encoder blocks $N$ and attention heads $h_e$ were 2 and 4, respectively. The training of TBERT took 50 epochs. The mini-batch size was set to Tbatch = {100, 500, 1000}, and we selected the learning rate $\lambda_{tb}$ from {0.01, 0.001, 0.0001}. In SGAT, we set the matching coefficient $\gamma$ to 10. The output concept vectors had size $d' = 25$. SGAT was trained over 100 epochs with a mini-batch size of Sbatch = 1, and we selected the learning rate $\lambda_{sg}$ from {0.01, 0.001, 0.0001}.
To show the effect of our proposed negative sampling, a superscript "-" appended to a module name indicates that the module is not equipped with the negative sampling tailored for structural relations.

Table 2 lists the matching results of DAEOM compared with the baseline systems. We can see that our method clearly outperforms all the baselines across all the evaluation metrics. Comparing the F-measure of TF-IDF and our proposed TBERT, the latter achieves higher performance. The main reason is that continuous vectors representing tokens provide more semantic information than single strings for calculating the similarity of concepts. To validate the importance of our ontology term encoder component (TBERT), we further analyzed the behavior of aligning ontologies based on the LSTM. As we can see, TBERT achieves a statistically significantly higher performance than LSTM in all experiments. This indicates the superiority of TBERT in injecting semantic similarity into terminological description embeddings. Moreover, the performance of DAEOM is better than that of TBERT in terms of F-measure. We further extended the LSTM experiment by applying our proposed SGAT. This extension leads to the same effects as those summarized in the TBERT-DAEOM comparison. This observation demonstrates that both the network structure and the terminological description contain useful information for ontology matching, and illustrates the significance of capturing the interplay between the two kinds of information. As our proposed model is graph-based, we also selected two graph-based models, GraphSAGE and TransE, as baselines. Notice that the performance of DAEOM is much better than combining TBERT with GraphSAGE or TransE. The main reason is that the GAT can learn node embeddings that assign different importance to different nodes within a neighborhood while dealing with differently sized neighborhoods.
Moreover, DAEOM outperforms DAEOM- in terms of F-measure, which further shows that exploiting structural relations helps to distinguish the vector representations of concepts. Note: "OM-TD" means using only the terminological description for ontology matching; "OM" means using both the network structure and the terminological description. Bold numbers indicate the best performance on each matching task. Table 3 compares DAEOM with six top-performing systems based on feature engineering and representation learning, according to the results published for the Anatomy track and Large BioMed track of OAEI 2019. The preliminary results show that DAEOM obtains the top performance on two mapping tasks, FMA-NCI (small) and FMA-SNOMED (small), where its precision and F-measure are significantly better than those of all other systems. In terms of recall, DAEOM performs worse on the ontology matching tasks. However, we note that, unlike the other systems, we have not used any semantic lexicons specific to the biomedical domain. For instance, AML uses three sources of biomedical background knowledge to extract synonyms: the Uber Anatomy Ontology (Uberon), the Human Disease Ontology (DOID), and the Medical Subject Headings (MeSH) [12]. Hence, our lower recall can be explained by the lower coverage of biomedical terminology in the lexical resources we used. Nevertheless, there is still a gap with respect to the best systems (e.g., AML, LogMap, POMAP++) on MA-NCI. We believe that considering only the SUBCLASS-OF relationship is the main reason: in MA-NCI, the ontology O_s contains a large number of property relationships, which we did not use. This is also why the relations of O_s for this task in Table 1 are sparser than those of the other ontologies. We leave this issue for future work.
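The negative sampling ablated above can be sketched as triple corruption that draws corrupted tails from two pools. This is a minimal illustration only: the exact inter/intra mix used by DAEOM is not reproduced here, and the even alternation between pools below is an assumption.

```python
import random

def sample_negatives(triple, intra_entities, inter_entities, k, positives):
    """Corrupt the tail of a SUBCLASS-OF triple to build k negative triples.

    Sketch only: `intra_entities` are concepts from the same ontology as the
    positive triple, `inter_entities` come from the other ontology. The even
    alternation between the two pools is an assumption, not DAEOM's scheme.
    """
    h, r, t = triple
    negatives = []
    pools = [intra_entities, inter_entities]
    i = 0
    while len(negatives) < k:
        pool = pools[i % 2]          # alternate intra- and inter-ontology pools
        t_neg = random.choice(pool)
        i += 1
        # reject the true tail and any known positive triple
        if t_neg != t and (h, r, t_neg) not in positives:
            negatives.append((h, r, t_neg))
    return negatives
```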

Results and Discussion
In Table 4 we show a sample of results produced by aligning ontologies with our improved Stable Marriage solution, based on a similarity matrix computed with either DAEOM or OM-TD(LSTM) on the FMA-NCI task. The automatically calculated threshold is 0.046 for DAEOM and 0.097 for OM-TD(LSTM). The results are divided into true and false alignments. There are two types of ontological heterogeneity: in the first, two concepts from the two ontologies carry the same semantics but are given different name labels; in the second, two concepts share the same name label but their semantics are very different. The results in Table 4 show that DAEOM can effectively distinguish concepts exhibiting either name or semantic heterogeneity by jointly encoding the terminological description and the network structure of the ontologies. In particular, our method is more effective than the term-only method at matching concepts whose name labels contain numbers and abbreviations.
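The similarity matrix referenced above is built from Euclidean distances over the learned concept embeddings. A minimal sketch follows; the distance-to-similarity transform 1/(1 + dist) is an assumption for illustration, as the paper does not fix a particular mapping here.

```python
import numpy as np

def similarity_matrix(src_emb, tgt_emb):
    """Pairwise similarity from Euclidean distance over concept embeddings.

    src_emb: (n_s, d) source-ontology embeddings; tgt_emb: (n_t, d) target.
    The distance is mapped to a similarity with 1 / (1 + dist); this
    particular transform is an assumption, not DAEOM's exact formula.
    """
    # broadcast to all (source, target) pairs and take the L2 norm over d
    dists = np.linalg.norm(src_emb[:, None, :] - tgt_emb[None, :, :], axis=2)
    return 1.0 / (1.0 + dists)
```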

Runtime Analysis
In this section, we report the runtime of our ontology matching algorithm in different matching scenarios. Since DAEOM consists of three major steps, we present the time devoted to each of them in Figure 5a: the training of TBERT (Step 1), the training of SGAT (Step 2), and the mapping selection (Step 3). As seen in Figure 5a, the majority of the time is spent training TBERT and SGAT. We note that the training time depends on the number of epochs, and that the running time gradually increases with the scale of the ontologies. We also analyzed the running time and performance (F-measure) of our improved Stable Marriage (SM) matching across the different matching scenarios; the results are shown in Figure 5b,c. As the scale of the ontologies increases, the running time of the unimproved matching grows rapidly, whereas that of our improved matching grows linearly. Accordingly, the running time of the improved matching is shorter than that of the unimproved matching across all scenarios. This can be explained as follows: the unimproved solution compares each entity of ontology O_s against all possible entities of ontology O_t and has worst-case time complexity Θ(n² log n), while our improved matching only takes the highest correspondences in the semantic similarity matrix as candidate alignments, with time complexity Θ(n). In Figure 5c, we can see that the improved matching obtains similar or even better performance than the unimproved matching on all tasks. The experimental results show that keeping only the highest correspondences reduces the running time without affecting matching quality.
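The candidate selection step above can be sketched as follows: instead of comparing all n² entity pairs, each source concept keeps only its highest-scoring target, with a mutual-best check. This is an illustrative reading of the "highest correspondences" idea, not the paper's exact algorithm; the function name is hypothetical.

```python
import numpy as np

def highest_correspondences(sim):
    """Keep, for each source concept (row), only its best-scoring target.

    sim: (n_s, n_t) semantic-similarity matrix.
    Returns a list of (i, j, score) candidate alignments; a mutual-best
    check keeps only pairs where i is also target j's best source, so the
    surviving pairs are stable by construction. After the row and column
    argmax are computed, the loop is linear in the number of entities.
    """
    best_t = sim.argmax(axis=1)   # best target for each source concept
    best_s = sim.argmax(axis=0)   # best source for each target concept
    candidates = []
    for i, j in enumerate(best_t):
        if best_s[j] == i:        # mutual-best pair
            candidates.append((i, int(j), float(sim[i, j])))
    return candidates
```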

Threshold
The threshold t quantifies whether two entities are semantically similar. To validate that our automatic threshold adjustment is effective, we performed a sensitivity analysis on t and compared the performance (F-measure) obtained with the automated and manually set thresholds. Figure 6a shows the threshold sensitivity of our method across the different ontology matching tasks: we report the performance of DAEOM in all matching scenarios while manually varying t between 0 and 1 at 0.1 intervals. The performance (F-measure) increases monotonically as t varies between 0 and approximately 0.1, then decreases for t ∈ [0.1, 0.6] and reaches an asymptotic value at about t = 0.6.
We report the performance (F-measure) obtained with automated and manual thresholds across the different matching tasks in Figure 6b. Since the performance peaks for t between 0 and approximately 0.1, we only varied the manual threshold between 0 and 0.1 at 0.01 intervals. The automatically determined threshold values are close to the optimal manual values, and obtain similar or even better performance on all matching tasks. These results show that using the highest correspondences found in the first iteration to calculate the threshold is effective for ontology matching.
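One plausible sketch of this automatic adjustment derives the threshold from the similarities of the first-iteration mutual-best pairs. The statistic used below (the minimum of those similarities, so that every first-iteration candidate survives the cut) is an assumption for illustration; DAEOM's actual formula may differ.

```python
import numpy as np

def auto_threshold(sim):
    """Derive an alignment threshold from first-iteration best matches.

    Sketch under an assumption: the threshold is the minimum similarity
    among mutual-best (row/column argmax) pairs of the similarity matrix.
    The statistic actually used by DAEOM may differ.
    """
    best_t = sim.argmax(axis=1)
    best_s = sim.argmax(axis=0)
    # similarities of the mutual-best pairs found in the first iteration
    scores = [sim[i, j] for i, j in enumerate(best_t) if best_s[j] == i]
    return float(min(scores)) if scores else 0.0
```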

Conclusions
In this paper, we developed Siamese graph attention-based autoencoders to effectively integrate both network structure and terminological description for deep latent representation learning in ontology matching. Our ontology embedding architecture builds on BERT, Graph Attention Networks, and Siamese Neural Networks. We computed the semantic similarity between ontologies using the Euclidean distance over the node embedding vectors, and took the highest correspondences in the semantic similarity matrix as candidate alignments to reduce the runtime of ontology matching. Finally, we used the highest correspondences found in the first iteration to set the threshold automatically instead of adjusting it manually. Experimental results on the OAEI datasets demonstrate that our approach performs better than most participants and achieves competitive performance.
Nonetheless, our approach has certain shortcomings. First, it employs a single model for all relations without distinction, which inevitably restricts the capability of the network embedding: we considered only the SUBCLASS-OF relationship, whereas ontologies contain many other types of relationships. A promising direction for future research is to investigate the heterogeneous relations of ontologies. Second, we considered only small-scale ontology matching, so graph segmentation and distributed model training did not need to be addressed. Large-scale ontology matching will be discussed in future research.