Article

A Biomedical Relation Extraction Method Based on Graph Convolutional Network with Dependency Information Fusion

1 School of Computer Science and Technology, Shandong University of Technology, Zibo 255049, China
2 College of Agricultural Engineering and Food Science, Shandong University of Technology, Zibo 255090, China
3 Beijing Key Laboratory of “Research on Intelligent Processing Methods of Building Big Data”, College of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(18), 10055; https://doi.org/10.3390/app131810055
Submission received: 7 August 2023 / Revised: 1 September 2023 / Accepted: 4 September 2023 / Published: 6 September 2023

Abstract

Biomedical texts are relatively obscure in describing the relations between specialized entities, and automatically extracting drug–drug or drug–disease relations from massive biomedical texts is a challenge faced by many researchers. To this end, this paper designs a relation extraction method based on dependency information fusion to improve the model's ability to predict the relations between given biomedical entities. First, we propose a local–global pruning strategy for the dependency syntax tree. Next, we construct a dependency type matrix for the pruned dependency tree to incorporate sentence dependency information into the model for feature extraction. We then incorporate an attention mechanism into the graph convolutional model by calculating attention weights over word-to-word dependencies, thus improving the traditional graph convolutional network. The model distinguishes the importance of different dependency information through attention weights, weakening the influence of interfering information such as word-to-word dependencies unrelated to the entities in long sentences. Our proposed Dependency Information Fusion Attention Graph Convolutional Network (DIF-A-GCN) is evaluated on two biomedical datasets, DDI and CIVIC. The experimental results show that our method based on dependency information fusion outperforms current state-of-the-art biomedical relation extraction models.

1. Introduction

Relation Extraction (RE) is an important subtask of information extraction [1]. With the rapid development of the biomedical field, this technology has become a powerful tool for mining biological information and its relations. One of the key issues biomedical relation extraction must address is how to obtain effective biomolecular information and its correlations from massive, discrete yet interrelated data. It plays a very important role in downstream tasks such as medical knowledge graph construction and biomedical knowledge discovery [2].
Traditional relation extraction methods are mainly rule-based or based on traditional machine learning [3]; they match text using handwritten rules or rely on manually designed kernel functions [4]. In recent years, with the gradual maturation of deep learning, deep learning models have made considerable progress on relation extraction tasks and no longer rely on experts to design grammars and data features, as was previously necessary. The most important feature of deep learning is that it can automatically learn features from the data and optimize them through backpropagation. Using neural network models for relation extraction is not only simple and effective, but also promises to incorporate more diverse and richer knowledge into the relation extraction model. Among the different sources of knowledge, syntactic information, especially the dependency syntax tree, has been proven beneficial in many studies. A dependency syntax tree provides better semantic guidance for analyzing the contextual information associated with a given entity. However, extensive use of dependency syntactic information does not always lead to good relation extraction performance; the noisy information it contains can confuse relation classification. In particular, when sentences in biomedical data are long and semantically complex, the dependency tree generated from a sentence becomes unwieldy, which not only increases the complexity of the model but also degrades biomedical relation extraction performance on complex text data.
In light of previous research and the current situation, we summarize our work as follows. First, the problem statement: most biomedical texts have long sentences that describe relations between specialized entities implicitly, and their semantics are complex; current research that introduces dependency trees for relation extraction has failed to solve the problem of noise propagation within those trees. Second, the research questions: how can the problem of dependency tree noise propagation be solved, and how can dependency information be used to improve model performance on biomedical relation extraction? Finally, the hypothesis: the noise propagation caused by dependency trees can be addressed by a suitable pruning strategy, and relation extraction performance can be improved by incorporating the dependency information between words.
Based on the above questions and hypothesis, this paper proposes a method to improve relation extraction performance in the biomedical domain from the perspective of dependency information. The main contributions are as follows:
(1) Obtain the dependency syntax trees of biomedical data labeled with entities, and apply a local–global pruning strategy to filter out interfering information and improve accuracy;
(2) Construct a dependency type matrix from the pruned dependency tree so as to incorporate dependency information into the graph convolutional model. Calculating attention weights allows the model to distinguish which dependency types in the tree are important for predicting relations, thus improving relation extraction performance.

2. Related Work

Early research on relation extraction suffered from being time-consuming, labor-intensive, and inaccurate. Existing studies show that deep learning-based relation extraction methods have become a research hotspot, and in biomedical relation extraction, sentence-level relation extraction has attracted particular attention in recent years [5]. Researchers at home and abroad have continuously innovated relation extraction architectures tailored to the characteristics of texts in this field, advancing relation extraction research in the biomedical domain. Zeng et al. [6] first applied a Convolutional Neural Network (CNN) to the relation extraction task. They exploited the powerful local feature extraction ability of CNNs to extract word and sentence features, and used position embeddings to represent the distance between words and entities. Santos et al. [7] used a CNN for relation classification, classifying by ranking, and also proposed a new pairwise ranking loss function so that the impact of artificial classes can easily be reduced. However, these methods perform poorly when the two entities are far apart. Socher et al. [8] first proposed applying a Recurrent Neural Network (RNN) to relation extraction and made some progress; however, RNNs may suffer from vanishing or exploding gradients when processing long sequences. Suárez-Paniagua et al. [9] proposed using an RNN to classify drug interactions in biomedical texts; their system is based on MV-RNN over Stanford constituency parse trees of sentences. However, MV-RNN did not provide satisfactory results, mainly because DDIs are usually described by long sentences with complex structures (e.g., subordinate clauses and antithetical or parallel constructions). Later, Long Short-Term Memory (LSTM) was proposed to solve these problems of RNNs. Zhang et al. [10] used a bidirectional LSTM for relation extraction and achieved a good improvement. Song et al. [11] proposed a graph state LSTM model that uses parallel states to model each word, continuously enriching the state values through message passing and speeding up computation through greater parallelism; at the time, their model performed well on the CIVIC dataset. Vu et al. [12] showed that relation extraction performance could be further improved by combining CNNs with RNNs. Later, the continuous development of attention mechanisms and language models provided better semantic representations for natural language processing tasks, and the birth of the BERT pre-trained model had a significant impact on the NLP field. Zhou et al. [13] combined a bidirectional LSTM with an attention mechanism for the relation extraction task, effectively learning the importance of words. Lin et al. [14] used a graph neural network to represent relations from sentence semantics and incorporated an attention mechanism to minimize the impact of incorrect labels in the samples. Wu et al. [15] used BERT for the relation extraction task, which greatly improved effectiveness and marked a new stage of progress.
Existing research has extensively used syntactic information, especially dependency syntax trees, to improve relation extraction. Miwa and Bansal [16] demonstrated that dependency syntax trees are beneficial for relation extraction and that dependency analysis effectively captures long-distance word relations in relation extraction tasks. However, over-using dependency information can confuse relation extraction models, so pruning strategies are needed to alleviate this problem. For example, Xu et al. [17] proposed using only the shortest dependency path between two entities for the dependency connections and modeling it with an LSTM, while Miwa and Bansal [16] proposed pruning the original dependency tree to the subtree below the lowest common ancestor. These pruning strategies are either too aggressive or too mild and do not effectively solve the problem. The pruning strategy proposed in this paper lies between these two methods, is theoretically sound, and is demonstrated through experiments. In recent years, GCNs that consider the dependency structure have been widely used in natural language processing tasks. Zhang et al. [18] used a GCN over the dependency structure of word sequences for relation extraction. The method proposed in this paper is closely related to GCNs and improves the standard GCN without complex model design by designing the graph convolution module from the dependency tree perspective, effectively incorporating information external to the sentences. Most previous work has validated relation extraction models on generic datasets, and less work has applied deep neural networks to biomedical relation extraction [19]. This paper applies natural language processing technology to biomedical relation extraction to demonstrate that deep neural networks are also effective for relation extraction between biomedical entities, rather than being limited to generic datasets, reflecting the robustness of our method.

3. Method Design

3.1. Dependency Tree Pruning Strategy

Relation extraction is usually performed as a typical classification task. Specifically, given an unstructured input sentence with $n$ words, $X = \{x_1, x_2, \ldots, x_n\}$, let $E_1$ and $E_2$ be two entities in $X$. The goal is to predict the relation $r$ between $E_1$ and $E_2$, where $r \in R$ and $R$ is the set of relation types.
In this paper, we study from the perspective of dependency syntax tree, which first needs to process the dependency tree of each input sentence. The complete dependency syntax tree of a sentence is shown in Figure 1.
As shown in the figure, the sentence comes from the biomedical DDI dataset; the bolded words Thiazides and norepinephrine are its two entities, and the relation between the entity pair is “effect”. After obtaining the complete dependency syntax tree, the first task is to prune it in order to filter out extraneous structural interference. In this paper, we propose a pruning strategy that combines two kinds of dependency connections: local dependency connections and global dependency connections.
Local dependency connections comprise all dependency relations directly connected to the two entity words. This pruning method preserves only the head–dependent relations involving the entities and removes the remaining irrelevant dependency connections, as shown in Figure 2.
Global dependency connections comprise all dependency relations along the shortest path between the two entities. This pruning method preserves the information describing the relation between the two entities: the shortest path between them condenses the most informative cues about the entity relation, providing a strong hint for relation classification, as shown in Figure 3.
The combination of these two pruning methods constitutes the pruning strategy of this paper, shown in Figure 4. Intra-entity and inter-entity information is used to remove irrelevant information and weaken the impact of noise, which matters especially when the input sentences are long and the dependency information is complex. In the biomedical datasets used in this paper, many sentences contain many words, the semantics describing interactions are complex, and the generated dependency trees have many dependency connections, requiring further optimization.
The detailed pruning operation is shown in Figure 5 below:
Figure 5 illustrates dependency tree pruning in detail. The tree is rooted at the central word of the sentence (the ROOT verb), and the L+G pruning strategy can be clearly seen in the figure. The Local Pruning strategy (local pruning around the entities) centers on the two entity word nodes: it preserves the edges (dependencies) to their parent nodes and then traverses their child nodes, keeping the edges (dependencies) to any children that exist. The Global Pruning strategy (global retention around the central word) obtains the path from one entity word node through the central word node to the other entity word node and keeps the dependencies along this path. The L+G pruning strategy covers the syntactic relations related to and between the entities more completely, retaining enough dependency information to characterize the main content of a sentence, while the word-to-word dependencies unrelated to the entity words are removed to reduce the interference of irrelevant information when predicting entity-to-entity relations.
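To make the L+G strategy concrete, the following is a minimal sketch of the pruning step over a parsed sentence. This is our own illustration, not the authors' released code; the edge-triple format and the entity indices `e1`/`e2` are assumptions.

```python
from collections import defaultdict, deque

def prune_l_plus_g(edges, e1, e2):
    """Sketch of the L+G pruning strategy described above.

    edges: (head, dependent, deprel) triples from the parser; head 0 is ROOT.
    e1, e2: token indices of the two entity head words (assumed inputs).
    Returns the subset of dependency edges to keep.
    """
    keep = set()
    # Local pruning: keep every edge incident to an entity word, i.e., the
    # edge to its parent and the edges to any of its children.
    for h, d, rel in edges:
        if h in (e1, e2) or d in (e1, e2):
            keep.add((h, d, rel))
    # Global pruning: keep the edges on the shortest path between the two
    # entities in the undirected tree (found here with a plain BFS).
    adj = defaultdict(list)
    for h, d, rel in edges:
        adj[h].append((d, (h, d, rel)))
        adj[d].append((h, (h, d, rel)))
    parent = {e1: None}
    queue = deque([e1])
    while queue:
        node = queue.popleft()
        for nxt, edge in adj[node]:
            if nxt not in parent:
                parent[nxt] = (node, edge)
                queue.append(nxt)
    node = e2
    while parent.get(node) is not None:
        prev, edge = parent[node]
        keep.add(edge)
        node = prev
    return keep
```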
Figure 6 shows the construction of the dependency type matrix D. It is obtained from the L+G tree produced by pruning the sentence dependency tree as in Figure 5, together with the dependency type dictionary. It is used in the model below to exploit the external structural information of sentences.
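A possible construction of the adjacency matrix A and the dependency type matrix D from the pruned tree is sketched below; the dictionary name `type2id` and the "0 = no edge" convention are our assumptions for illustration.

```python
import numpy as np

def build_matrices(n, kept_edges, type2id):
    """Build A and D for an n-token sentence from the pruned (L+G) edges.

    type2id maps dependency labels such as "nsubj" to integer ids,
    with 0 reserved for "no edge" (our convention for this sketch).
    """
    A = np.eye(n, dtype=np.int64)         # a_ii = 1 (self-loops)
    D = np.zeros((n, n), dtype=np.int64)
    for h, d, rel in kept_edges:
        if h == 0:                        # skip the artificial ROOT node
            continue
        A[h - 1, d - 1] = 1               # tokens are 1-indexed in the parse
        D[h - 1, d - 1] = type2id.get(rel, 0)
    return A, D
```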

3.2. Dependency Information Fusion Attention Graph Convolutional Network

In this paper, we propose a graph convolutional network architecture incorporating dependency information, which we call the Dependency Information Fusion Attention Graph Convolutional Network (DIF-A-GCN). The model consists of an input layer, a BERT encoding layer, a graph convolution layer, an attention mechanism, and a classification layer.
Figure 7 shows the model diagram. The dependency tree of the input sentence is pruned by the operation in Section 3.1, and the sentence is then encoded by BERT. The resulting word vectors enter the graph convolution module in the blue dashed box on the left. The Adjacency Matrix (A) in the figure is obtained from the nodes and edges of the dependency connections remaining after pruning the sentence dependency tree and is used in the graph convolution model. The Dependency Type Matrix (D) is the matrix constructed in Figure 6; it incorporates the sentence's external dependency information into the word embeddings so that graph convolution can extract features and obtain a more complete representation of entity pairs. The blue dashed module on the right calculates the attention weight of each dependency connection to obtain the third matrix, the Attention Matrix, which guides the model to learn features better through the attention mechanism. All three matrices are used in the graph convolution module, improving the traditional GCN. Finally, the relation between the two given entities is predicted from the sentence representation and the entity pair representations.

3.2.1. Input Layer

In this paper, when processing the data, special tokens are inserted into the input sentences to mark the entity pairs, better reflecting the structural features of the sentences. The details are as follows:
<e1> Thiazides </e1> may decrease arterial responsiveness to <e2> norepinephrine </e2>.
This approach enables the encoder to distinguish the position of entities during encoding, thus improving the performance of the model.
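A small helper illustrating this preprocessing step might look as follows; this is our sketch, and the (start, end) span format is an assumption.

```python
def mark_entities(tokens, e1_span, e2_span):
    """Wrap the two entity spans with <e1>...</e1> and <e2>...</e2> markers.

    tokens: list of words; e*_span: (start, end) token indices, end exclusive.
    """
    out = []
    for i, tok in enumerate(tokens):
        if i == e1_span[0]:
            out.append("<e1>")
        if i == e2_span[0]:
            out.append("<e2>")
        out.append(tok)
        if i == e1_span[1] - 1:
            out.append("</e1>")
        if i == e2_span[1] - 1:
            out.append("</e2>")
    return " ".join(out)

# mark_entities("Thiazides may decrease arterial responsiveness to norepinephrine .".split(),
#               (0, 1), (6, 7))
# -> "<e1> Thiazides </e1> may decrease arterial responsiveness to <e2> norepinephrine </e2> ."
```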

3.2.2. BERT Encoding Layer

Machine learning methods cannot process text directly and require suitable methods to convert text into numerical data, hence the concept of word vectors. Word vectors represent words as low-dimensional dense vectors for subsequent tasks. Traditional word vectors such as word2vec and GloVe rely on the distributional hypothesis of word meaning to assign each word a single, fixed mapping vector. However, this approach ignores polysemy and does not take the semantics of the context into account, and is therefore a static word vector. In 2018, Google proposed the pre-trained language model BERT [20], which uses self-supervision to learn a feature representation for each word over a massive corpus. Earlier pre-trained models were limited by unidirectional language modeling (left-to-right or right-to-left), which in turn limited their representational power to unidirectional contextual information. BERT instead pre-trains with a masked language model (MLM) and builds the entire model from deep bidirectional Transformer components, generating deep bidirectional linguistic representations that incorporate left and right context. In this paper, we first obtain the word-vector and sentence-vector representations of each example through the BERT model.
Assume $X = [x_1, x_2, \ldots, x_n]$ is an input text sentence in the dataset, where $x_j$ ($1 \le j \le n$) is the $j$th word in the sentence. In this paper, we use BERT to encode $X$ as follows:

$$H = [h_1, h_2, \ldots, h_n] = \mathrm{BERT}([x_1, x_2, \ldots, x_n]) \qquad (1)$$

where $h_j$ is the hidden vector of the last BERT layer for $x_j$.
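Using the Hugging Face transformers library, this encoding step can be sketched as follows; this is an illustration under our assumptions, not the authors' exact pipeline.

```python
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# Register the entity markers as special tokens so WordPiece does not split
# them (our assumption about how the markers are handled).
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<e1>", "</e1>", "<e2>", "</e2>"]})
model.resize_token_embeddings(len(tokenizer))

sent = ("<e1> Thiazides </e1> may decrease arterial responsiveness "
        "to <e2> norepinephrine </e2>.")
enc = tokenizer(sent, return_tensors="pt", max_length=128, truncation=True)
with torch.no_grad():
    out = model(**enc)
H = out.last_hidden_state   # [1, seq_len, 768]: the hidden vectors h_j
h_X = H[:, 0]               # [CLS] vector, used later as the sentence vector
```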

3.2.3. Graph Convolution Layer

Graph Convolutional Network (GCN) is a widely used architecture for encoding information in a graph. It convolves the features of neighboring nodes and propagates the information of a node to its nearest neighbors [21]. By stacking GCN layers, a GCN can extract the features of each node. Generally, the graph in the standard GCN model is constructed from word dependencies and represented by the adjacency matrix $A = (a_{i,j})_{n \times n}$, with $a_{i,j} = 1$ when $i = j$ or when there is an edge between the two words $x_i$ and $x_j$, and $a_{i,j} = 0$ otherwise. Based on the adjacency matrix $A$, for each word $x_i \in X$, the $l$th GCN layer aggregates the information carried by its contextual words in the sentence and computes the output representation $h_i^{(l)}$ of $x_i$ as follows:

$$h_i^{(l)} = \sigma\Big(\sum_{j=1}^{n} a_{i,j}\big(W^{(l)} \cdot h_j^{(l-1)} + b^{(l)}\big)\Big) \qquad (2)$$

where $h_j^{(l-1)}$ is the output for $x_j$ of the GCN at layer $(l-1)$, $W^{(l)}$ and $b^{(l)}$ are the trainable weight matrix and bias of the GCN at layer $l$, respectively, and $\sigma$ is the ReLU activation function.
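In PyTorch, a minimal sketch (ours) of a layer implementing Equation (2) is:

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """Standard GCN layer of Equation (2): every retained edge weighs equally."""

    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)   # W^(l) and b^(l)

    def forward(self, h, A):
        # h: [batch, n, dim] word representations; A: [batch, n, n] 0/1 matrix.
        # bmm computes sum_j a_ij * (W h_j + b) for every node i.
        return torch.relu(torch.bmm(A.float(), self.linear(h)))
```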
In Equation (2), the dependency connections between words are treated equally, with $a_{i,j}$ either 1 or 0, so the traditional GCN model cannot distinguish the importance of different connections. To solve this problem, we propose a graph convolutional network incorporating an attention mechanism that computes weights for the different dependency connections, enabling the model to distinguish important information and reduce the impact of distracting information. To calculate the weights, we use the dependency types between words in the dependency syntax tree. Specifically, we first construct the dependency type matrix $D = (d_{i,j})_{n \times n}$ to represent the dependency types in the dependency tree $D_X$, as detailed in Figure 6, where $d_{i,j}$ is the dependency type of the directed dependency connection between $x_i$ and $x_j$ (e.g., $nsubj$). Next, each dependency type $d_{i,j}$ is mapped to its embedding vector $u_{i,j}^d$. At the $l$th GCN layer, the weight $k_{i,j}^{(l)}$ of the dependency connection between $x_i$ and $x_j$ is computed as:

$$k_{i,j}^{(l)} = \frac{a_{i,j} \cdot \exp\big(v_i^{(l)} \cdot v_j^{(l)}\big)}{\sum_{j'=1}^{n} a_{i,j'} \cdot \exp\big(v_i^{(l)} \cdot v_{j'}^{(l)}\big)} \qquad (3)$$

where $a_{i,j} \in A$, and $v_i^{(l)}$ and $v_j^{(l)}$ are the intermediate vectors of words $x_i$ and $x_j$, respectively, given by:

$$v_i^{(l)} = h_i^{(l-1)} \oplus u_{i,j}^d \qquad (4)$$

$$v_j^{(l)} = h_j^{(l-1)} \oplus u_{i,j}^d \qquad (5)$$
That is, the output of the previous GCN layer is concatenated ($\oplus$) with the dependency type embedding vector, incorporating dependency information into the model. We then apply the weights $k_{i,j}^{(l)}$ to the dependency connections between $x_i$ and $x_j$ to obtain the output representation of $x_i$ as follows:
$$h_i^{(l)} = \sigma\Big(\sum_{j=1}^{n} k_{i,j}^{(l)}\big(W^{(l)} \cdot H_j^{(l-1)} + b^{(l)}\big)\Big) \qquad (6)$$

where $\sigma$, $W^{(l)}$ and $b^{(l)}$ have the same meanings as in Equation (2), and

$$H_j^{(l-1)} = h_j^{(l-1)} + W_D^{(l)} \cdot u_{i,j}^d \qquad (7)$$

where $W_D^{(l)}$ maps the dependency type embedding vector $u_{i,j}^d$ into the same dimension as $h_j^{(l-1)}$.
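The following sketch shows one way to realize Equations (3)-(7) in PyTorch. The module and parameter names are ours, and the batched outer-product layout is an implementation choice, not the authors' code.

```python
import torch
import torch.nn as nn

class AttnGCNLayer(nn.Module):
    """Attention-weighted GCN layer fusing dependency type embeddings."""

    def __init__(self, dim, n_types, type_dim):
        super().__init__()
        self.type_emb = nn.Embedding(n_types, type_dim)    # u^d_{i,j}
        self.linear = nn.Linear(dim, dim)                  # W^(l), b^(l)
        self.w_d = nn.Linear(type_dim, dim, bias=False)    # W_D^(l)

    def forward(self, h, A, D):
        # h: [b, n, dim]; A: [b, n, n] 0/1 adjacency; D: [b, n, n] type ids.
        b, n, _ = h.shape
        u = self.type_emb(D)                               # [b, n, n, type_dim]
        # Equations (4)-(5): v_i and v_j concatenate the previous layer's
        # output with the dependency type embedding of edge (i, j).
        v_i = torch.cat([h.unsqueeze(2).expand(b, n, n, -1), u], dim=-1)
        v_j = torch.cat([h.unsqueeze(1).expand(b, n, n, -1), u], dim=-1)
        # Equation (3): softmax of v_i . v_j restricted to existing edges;
        # self-loops (a_ii = 1) keep every row of the softmax finite.
        scores = (v_i * v_j).sum(-1)
        scores = scores.masked_fill(A == 0, float("-inf"))
        k = torch.softmax(scores, dim=-1)
        # Equation (7): inject the type embedding into the neighbor vectors.
        H_j = h.unsqueeze(1) + self.w_d(u)                 # [b, n, n, dim]
        # Equation (6): attention-weighted aggregation over neighbors j.
        return torch.relu((k.unsqueeze(-1) * self.linear(H_j)).sum(dim=2))
```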
The original GCN was designed for undirected graphs. Since we start from the dependency syntax tree, and in order to account for the directedness of the dependency tree, we use a bidirectional GCN model incorporating the attention mechanism to extract its structural features:

$$\overrightarrow{h}_i^{(l)} = \sigma\Big(\sum_{j=1}^{n} \overrightarrow{k}_{i,j}^{(l)}\big(\overrightarrow{W}^{(l)} \cdot \overrightarrow{H}_j^{(l-1)} + \overrightarrow{b}^{(l)}\big)\Big) \qquad (8)$$

$$\overleftarrow{h}_i^{(l)} = \sigma\Big(\sum_{j=1}^{n} \overleftarrow{k}_{i,j}^{(l)}\big(\overleftarrow{W}^{(l)} \cdot \overleftarrow{H}_j^{(l-1)} + \overleftarrow{b}^{(l)}\big)\Big) \qquad (9)$$

$$h_i^{(l)} = \overrightarrow{h}_i^{(l)} \oplus \overleftarrow{h}_i^{(l)} \qquad (10)$$

where the forward ($\rightarrow$) direction operates over outgoing dependency edges and the backward ($\leftarrow$) direction over incoming ones.
In this paper, the outgoing and incoming features of words are concatenated together as the final word features.
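Building on the AttnGCNLayer sketch above, a minimal way to obtain the bidirectional features (our sketch; the sizes are illustrative) is to run two such layers, one over A for outgoing edges and one over its transpose for incoming edges, and concatenate:

```python
fwd_gcn = AttnGCNLayer(dim=768, n_types=50, type_dim=64)   # sizes are illustrative
bwd_gcn = AttnGCNLayer(dim=768, n_types=50, type_dim=64)

def bidirectional_layer(h, A, D):
    h_out = fwd_gcn(h, A, D)                                  # outgoing edges
    h_in = bwd_gcn(h, A.transpose(1, 2), D.transpose(1, 2))   # incoming edges
    return torch.cat([h_out, h_in], dim=-1)                   # Equation (10)
```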

3.2.4. Classification Layer

After obtaining the output representation of each word, the vector representation $h_{E_q}$ of each entity is obtained by max pooling:

$$h_{E_q} = \mathrm{MaxPooling}\big(\{h_i^{(L)} \mid x_i \in E_q\}\big) \qquad (11)$$

Finally, we concatenate the vector representations of the entire sentence and the two entities (i.e., $h_X$, $h_{E_1}$ and $h_{E_2}$) to obtain the output vector $O$:

$$O = W_R \cdot \big(h_X \oplus h_{E_1} \oplus h_{E_2}\big) \qquad (12)$$

where $W_R$ is a trainable matrix and $h_X$ is the whole-sentence vector representation, obtained directly from the $[CLS]$ token in BERT without passing through the graph convolution module. $O$ is an $|R|$-dimensional vector in which each value corresponds to a relation type in the set $R$. The softmax function is then applied to predict the relation $\hat{r}$ between the two entities:

$$\hat{r} = \arg\max_{s} \frac{\exp(O_s)}{\sum_{s'=1}^{|R|} \exp(O_{s'})} \qquad (13)$$

where $O_s$ denotes the value of dimension $s$ in $O$. The full model is shown in Figure 7. The above is the method we propose for biomedical relation extraction, which is evaluated on two biomedical datasets in the experiments that follow.
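Before moving to the experiments, Equations (11)-(13) can be illustrated with a short sketch; the masks and names here are our own.

```python
import torch
import torch.nn as nn

class RelationClassifier(nn.Module):
    """Max-pool each entity span, concatenate with the [CLS] sentence vector,
    and project to |R| relation scores (Equations (11)-(13))."""

    def __init__(self, dim, n_relations):
        super().__init__()
        self.w_r = nn.Linear(3 * dim, n_relations, bias=False)   # W_R

    def forward(self, H, h_X, e1_mask, e2_mask):
        # H: [b, n, dim] final GCN outputs; e*_mask: [b, n] boolean entity masks.
        neg = torch.finfo(H.dtype).min
        h_e1 = H.masked_fill(~e1_mask.unsqueeze(-1), neg).max(dim=1).values
        h_e2 = H.masked_fill(~e2_mask.unsqueeze(-1), neg).max(dim=1).values
        O = self.w_r(torch.cat([h_X, h_e1, h_e2], dim=-1))       # logits over R
        return O  # softmax/argmax over O gives the predicted relation r_hat
```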

4. Experiment and Analysis

4.1. Experiment Data

This paper evaluates and analyzes the model on two biomedical datasets: Drug-Drug Interaction (DDI) [22] and Clinical Interpretations of Variants In Cancer (CIVIC) [23]. The DDI dataset consists of two parts, the DrugBank database and MedLine abstracts, which together contain 1025 documents and define 5 types of drug-entity interaction relations: advise, effect, int, mechanism, and negative. These relation types are annotated at sentence level, with at least two drug entities in each sentence. Among the 5 types, the negative relation accounts for a disproportionate 85.5% of instances, so the data are severely unbalanced; negative indicates that there is no interaction between the two drugs in a sentence. Previous researchers have typically filtered out these negative samples, as in the study by Sahu et al. [26]. We therefore follow that practice, discarding the negative class and considering only data for which a relation exists. We thus obtained 3000 training samples (1.1 MB), 500 validation samples (190 KB) and 500 test samples (200 KB). Following the data preprocessing method described above, the positions of the two drug entities are marked in each sentence and the true relation type is indicated. CIVIC concerns the interactions between drugs and variants; its corpus comes from PubMed Central, which contains a large amount of biomedical literature. From it we obtained 3000 training samples (1.5 MB), 500 validation samples (270 KB) and 500 test samples (250 KB). The dataset defines four relation types (sensitivity, resistance, response, and resistance or non-response), and the distribution of relation types is relatively balanced. For the CIVIC dataset, we follow the data processing method of previous researchers combined with the method of this paper: we use the entity tagger from Literome to tag the two entities mentioned in each sentence, namely drug and variant, and label the entity positions and true relation labels.
In addition, this paper uses the evaluation metrics precision (P), recall (R) and F1 score to analyze the experimental results. They are defined as:

$$P = \frac{TP}{TP + FP} \qquad (14)$$

$$R = \frac{TP}{TP + FN} \qquad (15)$$

$$F1 = \frac{2PR}{P + R} \qquad (16)$$
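For reference, these metrics reduce to a few lines of code:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute P, R, and F1 from true positive, false positive, and
    false negative counts, guarding against empty denominators."""
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1
```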

4.2. Experiment Setup and Results

The experiments in this paper use the PyTorch deep learning framework within a Miniconda 3 environment and train the model on an NVIDIA RTX 3090 GPU with 24 GB of memory. We use the uncased version of the BERT-base encoder, leaving the do_lower_case parameter at its default value of True so that case is ignored. We follow the official default settings for BERT-base: 12 layers of multi-head attention with 768-dimensional hidden vectors. In the graph convolution module, we randomly initialize all trainable parameters and dependency type embeddings.
To maximize the effect of the model, a series of experiments was conducted to determine the most appropriate parameter values; the specific settings are shown in Table 1. The Adam optimizer is used to iteratively update the network parameters. Adam combines the advantages of the AdaGrad and RMSProp optimization algorithms, using first-order moment estimates of the gradient (its mean) together with second-order moment estimates (its uncentered variance) to calculate the update step. We use cross-entropy as the loss function. The final step of the biomedical relation extraction studied in this paper is a classification problem, and many neural network models for classification use cross-entropy loss, given by:

$$H(p, q) = -\sum_{x} p(x) \log\big(q(x)\big) \qquad (17)$$

We want the predicted distribution learned by the model on the training data to be as close as possible to the true data distribution, i.e., to minimize this loss.
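Putting the Table 1 settings together, a training loop might look like the following sketch; `model` and `train_loader` are hypothetical stand-ins for the full DIF-A-GCN network and a PyTorch DataLoader.

```python
import torch.nn as nn
import torch.optim as optim

optimizer = optim.Adam(model.parameters(), lr=3e-5)   # learning rate from Table 1
criterion = nn.CrossEntropyLoss()                     # the H(p, q) loss above

for epoch in range(100):                              # Epoch = 100 (Table 1)
    for batch in train_loader:                        # BatchSize = 16 (Table 1)
        optimizer.zero_grad()
        logits = model(batch)                         # [16, |R|] relation scores
        loss = criterion(logits, batch["labels"])
        loss.backward()
        optimizer.step()
```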

4.2.1. Verification of Pruning Strategy

In this paper, we first verify the correctness of the pruning strategy, starting from the dependency structure. To construct the graph for the bidirectional GCN, we process each input sentence with Stanford CoreNLP, the Stanford syntactic parser, to obtain lexical information and the sentence's dependency syntax tree. After obtaining the dependency tree of each sentence, we run dependency analysis on the full dependency tree (Full), the shortest-dependency-path (SDP) tree, and the local–global connected tree (L+G) proposed in this paper, producing the dependency path adjacency matrix and the dependency type matrix for each variant (a minimal parsing sketch is shown below). The experiments verify the correctness and advantages of our pruning strategy on the two biomedical datasets, with the results reported in Table 2.
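As an illustration of the parsing step, the stanza package (the Stanford NLP group's Python library, which we substitute here for the Java CoreNLP server the authors mention) can produce the head indices and dependency labels needed to build the trees:

```python
import stanza

# stanza.download("en")  # one-time model download
nlp = stanza.Pipeline(lang="en", processors="tokenize,pos,lemma,depparse")
doc = nlp("Thiazides may decrease arterial responsiveness to norepinephrine.")
for sentence in doc.sentences:
    for word in sentence.words:
        # word.head is the 1-indexed head token (0 = ROOT);
        # word.deprel is the dependency type, e.g., "nsubj".
        print(word.id, word.text, word.head, word.deprel)
```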
In Table 2, Full denotes the complete dependency tree, SDP the current mainstream pruning method, and L+G our proposed pruning method. The results show that when the input is the complete dependency syntax tree, the model is less effective than with either pruning strategy, mainly because the excess of external information confuses the model's predictions. This is especially true in biomedical datasets, where sentences are long, semantics are complex, and there are many dependencies between words, providing improper semantic guidance when predicting the relation between two given entities. SDP, used by many researchers, preserves the dependency information on the shortest path between the two entities but ignores the contextual dependency information associated with each entity, resulting in over-pruning and missing information. Our pruning strategy fully considers intra-entity and inter-entity dependency information and discards the word-to-word dependency connections unrelated to the entities. It yields a significant improvement on both datasets, which shows that the proposed pruning strategy is reasonable and effective.
Case Study:
Table 3 verifies the effectiveness of our pruning strategy on one example from the DDI dataset. The sentence describes a 2- to 3-fold elevation of liver enzymes in 5 of 30 patients in a combination study of ARAVA with methotrexate. It expresses the effect produced by the interaction of the two drugs and does not describe a drug mechanism, so the relation is effect rather than mechanism. Neither the full dependency tree nor the SDP-pruned tree extracts the correct relation: both expose the model to the dependency information around the term liver enzymes, incorrectly leading it to believe that the sentence describes a pharmacological mechanism between the two drug entities. Our pruning strategy removes this noisy information and guides the model to understand that the two drugs together produce an effect on liver enzymes; the model follows this semantics to predict the relation between the entity pair without being misled by the words liver enzymes.

4.2.2. Verification of GCN Layers

In addition, this paper conducts a comparative experiment on the number of GCN layers to verify whether stacking GCN layers improves the relation extraction model. The results are as follows:
The results in Table 4 show that performance improves when the number of GCN layers increases from 1 to 2, but beyond 2 layers there is no further improvement and performance instead declines. A 2-layer GCN can aggregate feature information from most nodes in the network to the target node; with deeper stacks, the sets of nodes whose features are aggregated overlap heavily across nodes, weakening the discriminative power of the trained embeddings and harming the downstream classification. Therefore, the graph convolution module in this paper uses a 2-layer GCN.

4.2.3. Model Comparison

Finally, this section experimentally compares several baseline and current mainstream models against the DIF-A-GCN biomedical relation extraction model proposed in this paper.
For the DDI dataset, a number of CNN-based and RNN-based baseline models are selected, several of them dependency-based. The more typical ones are as follows. The syntactic CNN (SCNN) proposed by Zhao et al. (2016) [24] used syntactic information of sentences as well as features of lexical labels and dependency trees for syntactic word embedding. Quan et al. (2016) [25] used a multi-channel CNN (MCCNN) for DDI relation extraction, achieving the fusion of multiple word embeddings. Joint AB-LSTM, proposed by Sahu and Anand (2018) [26], concatenates two independent RNN-based modules, each a Bi-LSTM network, applying max pooling in one and attention pooling in the other to obtain features. Ma et al. (2018) [27] proposed an RvNN model based on bottom-up and top-down tree structures; its core idea is to enhance the high-level representation of tree nodes by recursing over the propagation structure along different branches of the tree. Zhang et al. (2017) [28] proposed a neural sequence model with a position-aware attention mechanism over LSTM networks for relation extraction; compared with methods using dependency information, the sequence model works well on short-range relations. Hong et al. (2020) [29] proposed the BERE model, which uses a hybrid encoding network to represent each sentence semantically and syntactically and a feature aggregation network for prediction after considering all relevant utterances; it performs well on the DDI dataset.
We compare the mainstream relation extraction models mentioned above with the method proposed in this paper. The experimental results on the DDI dataset are shown in Table 5.
For the CIVIC dataset, this paper selects several mainstream biomedical relation extraction models built on the dependency syntax tree. Tree-LSTM: the authors first proposed using a bidirectional tree LSTM to capture the dependency structure of the target sentences, so that the features encode syntactic structure. Graph LSTM-FULL [30]: the authors explored a graph long short-term memory network-based framework for relation extraction that combines intra-sentence and inter-sentence dependencies, such as order, syntax, and discourse relations, to learn a robust context representation for entities as input to a relation classifier. AGGCN [31]: an attention-guided graph convolutional network for relation extraction that learns soft pruning in an end-to-end fashion.
Table 6 compares the results of our model with the above mainstream methods on the CIVIC dataset.
The experimental results in Table 5 and Table 6 demonstrate the effectiveness of our model. First, on the DDI dataset, the RNN-based models work better than the CNN-based ones. A CNN model pools n consecutive grams built over the entire sentence to obtain fixed-length features, where n is the convolution or filter length, which can cause problems for long sentences or sentences whose important cues lie far apart. LSTM's memory ability makes it better suited than CNNs to processing sequences such as sentences, especially when biomedical data contain long sentences with widely separated entities. Our model improves by 3.2 percentage points over the best RNN-based model, BERE. This is due not only to the power of the BERT encoder, but also to the fact that we perform graph convolution after efficiently pruning the sentence's dependency tree, encoding the word-to-word dependency types into the feature representation of each node and obtaining a more complete and dense representation of each word. By applying the attention mechanism to the dependency connections in the graph convolution layer and assigning a weight to each edge, the connections provide better semantic guidance when predicting the relation between the given entities. The dependency structure-based models outperform the sequence feature-based models, mainly because structure-based models use graph neural networks to capture more complex correlations between words, improving the ability of the feature representation to reflect the relations in the text. The CIVIC results in Table 6 likewise show that AGGCN and our DIF-A-GCN extract structural features better through their graph convolution modules: they comprehensively utilize the contextual semantic features of words and the long-distance structural features between words, combining local and global correlations for relation extraction to achieve better F1 scores.

5. Discussion

Analyzing the experimental results above, it is easy to see that our proposed local–global pruning strategy outperforms the current mainstream pruning strategy, and that our method of dependency information fusion via GCN outperforms other mainstream models. However, there are still limitations in our methodology. We study data in the biomedical domain, yet there is no dependency parser built for this domain, so, like previous researchers, we use a general-purpose out-of-domain parser. As a result, the quality of the dependency syntax trees generated for biomedical sentences is limited, preventing biomedical relation extraction methods from reaching their full potential. This limitation of our methodology is shared by numerous studies, and addressing it remains a major challenge for future work. We will continue to study biomedical relation extraction from the perspective of dependency syntax trees, so as to advance biomedical text mining.

6. Conclusions

With the development of artificial intelligence and big data, AI technology has been adopted in almost every field. In this paper, we likewise apply a deep learning-based relation extraction model to the biomedical field, combining natural language processing with biomedicine.
In terms of the technical route, building on existing research that introduces syntactic information into relation extraction, this paper proposes an effective local–global pruning strategy for the dependency syntax tree. We then incorporate sentence dependency information into the model for feature extraction by constructing a dependency type matrix. The importance of different dependency connections is distinguished by adding attention weights to the GCN, guiding the model's use of dependency information. The directedness of the dependency tree is also considered to obtain more complete word features, which effectively helps in predicting the relation class. Our pruning strategy obtains an F1 score of 77.1 on the DDI dataset, 3.8 and 2.4 percentage points higher than the unpruned and mainstream pruning strategies, respectively, and an F1 score of 77.9 on the CIVIC dataset, 3.7 and 1.8 percentage points higher, respectively, validating its effectiveness. In model comparisons, our method outperforms the best existing model by 3.2 and 0.5 percentage points on the DDI and CIVIC datasets, respectively. Experiments on the two biomedical datasets demonstrate that our method mitigates the problem of noise propagation in biomedical relation extraction, achieving state-of-the-art performance. We improve the robustness of the dependency structure-based model by removing irrelevant content without discarding critical dependency information.
In the future, with the development of deep learning and NLP technology, more concise and efficient biomedical relation extraction models will be developed to promote the research and development of biomedical text mining.

Author Contributions

Conceptualization, W.Y. and L.X.; methodology, W.Y. and L.X.; software, L.X. and L.Z.; validation, W.Y., L.X. and M.G.; formal analysis, H.C.; investigation, W.Y.; resources, L.X.; data curation, W.Y.; writing—original draft preparation, W.Y. and L.X.; writing—review and editing, W.Y., L.X., L.Z. and M.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (No. 62002206).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Jin, Y.; Li, J.; Lian, Z.; Jiao, C.; Hu, X. Supporting Medical Relation Extraction via Causality-Pruned Semantic Dependency Forest. In Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea, 12–17 October 2022; pp. 2450–2460. [Google Scholar]
  2. Liu, K. A survey on neural relation extraction. Sci. China Technol. Sci. 2020, 63, 1971–1989. [Google Scholar] [CrossRef]
  3. Kambhatla, N. Combining lexical, syntactic, and semantic features with maximum entropy models for information extraction. In Proceedings of the ACL Interactive Poster and Demonstration Sessions, Barcelona, Spain, 21–26 July 2004; pp. 178–181. [Google Scholar]
  4. Bunescu, R.; Mooney, R. A shortest path dependency kernel for relation extraction. In Proceedings of the Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Vancouver, BC, Canada, 6–8 October 2005; pp. 724–731. [Google Scholar]
  5. Qian, M.; Wang, J.; Lin, H.; Zhao, D.; Zhang, Y.; Tang, W.; Yang, Z. Auto-learning convolution-based graph convolutional network for medical relation extraction. In Proceedings of the Information Retrieval: 27th China Conference, CCIR 2021, Dalian, China, 29–31 October 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 195–207. [Google Scholar]
  6. Zeng, D.; Liu, K.; Lai, S.; Zhou, G.; Zhao, J. Relation classification via convolutional deep neural network. In Proceedings of the COLING 2014, the 25th International Conference On Computational Linguistics: Technical Papers, Dublin, Ireland, 23–29 August 2014; pp. 2335–2344. [Google Scholar]
  7. dos Santos, C.; Xiang, B.; Zhou, B. Classifying Relations by Ranking with Convolutional Neural Networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Beijing, China, 26–31 July 2015. [Google Scholar]
  8. Socher, R.; Huval, B.; Manning, C.D.; Ng, A.Y. Semantic compositionality through recursive matrix-vector spaces. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Jeju Island, Republic of Korea, 12–14 July 2012; pp. 1201–1211. [Google Scholar]
  9. Suárez-Paniagua, V.; Segura-Bedmar, I. Extraction of drug-drug interactions by recursive matrix-vector spaces. In Proceedings of the 6th International Workshop on Combinations of Intelligent Methods and Applications (CIMA 2016), The Hague, The Netherlands, 30 August 2016; Volume 2016, p. 65. [Google Scholar]
  10. Zhang, S.; Zheng, D.; Hu, X.; Yang, M. Bidirectional long short-term memory networks for relation classification. In Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation, Shanghai, China, 30 October–1 November 2015; pp. 73–78. [Google Scholar]
  11. Song, L.; Zhang, Y.; Wang, Z.; Gildea, D. N-ary relation extraction using graph state LSTM. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018. [Google Scholar]
  12. Vu, N.T.; Adel, H.; Gupta, P.; Schütze, H. Combining Recurrent and Convolutional Neural Networks for Relation Classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016; pp. 534–539. [Google Scholar]
  13. Zhou, P.; Shi, W.; Tian, J.; Qi, Z.; Li, B.; Hao, H.; Xu, B. Attention-based bidirectional long short-term memory networks for relation classification. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Berlin/Heidelberg, Germany, 7–12 August 2016; pp. 207–212. [Google Scholar]
  14. Lin, Y.; Shen, S.; Liu, Z.; Luan, H.; Sun, M. Neural relation extraction with selective attention over instances. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin/Heidelberg, Germany, 7–12 August 2016; pp. 2124–2133. [Google Scholar]
  15. Wu, S.; He, Y. Enriching pre-trained language model with entity information for relation classification. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 2361–2364. [Google Scholar]
  16. Miwa, M.; Bansal, M. End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin/Heidelberg, Germany, 7–12 August 2016; pp. 1105–1116. [Google Scholar]
  17. Xu, Y.; Mou, L.; Li, G.; Chen, Y.; Peng, H.; Jin, Z. Classifying relations via long short term memory networks along shortest dependency paths. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 1785–1794. [Google Scholar]
  18. Zhang, Y.; Qi, P.; Manning, C.D. Graph Convolution over Pruned Dependency Trees Improves Relation Extraction. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 2205–2215. [Google Scholar]
  19. Li, F.; Zhang, M.; Fu, G.; Ji, D. A neural joint model for entity and relation extraction from biomedical text. BMC Bioinform. 2017, 18, 1–11. [Google Scholar] [CrossRef] [PubMed]
  20. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), Minneapolis, MN, USA, 2–7 June 2019; Volume 1, pp. 4171–4186. [Google Scholar]
  21. Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017. [Google Scholar]
  22. Herrero-Zazo, M.; Segura-Bedmar, I.; Martínez, P.; Declerck, T. The DDI corpus: An annotated corpus with pharmacological substances and drug–drug interactions. J. Biomed. Inform. 2013, 46, 914–920. [Google Scholar] [CrossRef] [PubMed]
  23. Griffith, M.; Spies, N.C.; Krysiak, K.; McMichael, J.F.; Coffman, A.C.; Danos, A.M.; Ainscough, B.J.; Ramirez, C.A.; Rieke, D.T.; Kujan, L.; et al. CIViC is a community knowledgebase for expert crowdsourcing the clinical interpretation of variants in cancer. Nat. Genet. 2017, 49, 170–174. [Google Scholar] [CrossRef]
  24. Zhao, Z.; Yang, Z.; Luo, L.; Lin, H.; Wang, J. Drug drug interaction extraction from biomedical literature using syntax convolutional neural network. Bioinformatics 2016, 32, 3444–3453. [Google Scholar] [CrossRef] [PubMed]
  25. Quan, C.; Hua, L.; Sun, X.; Bai, W. Multichannel convolutional neural network for biological relation extraction. Biomed Res. Int. 2016, 2016, 1850404. [Google Scholar] [CrossRef] [PubMed]
  26. Sahu, S.K.; Anand, A. Drug-drug interaction extraction from biomedical texts using long short-term memory network. J. Biomed. Inform. 2018, 86, 15–24. [Google Scholar] [CrossRef] [PubMed]
  27. Ma, J.; Gao, W.; Wong, K.F. Rumor Detection on Twitter with Tree-structured Recursive Neural Networks. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, 15–20 July 2018. [Google Scholar]
  28. Zhang, Y.; Zhong, V.; Chen, D.; Angeli, G.; Manning, C.D. Position-aware attention and supervised data improve slot filling. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017. [Google Scholar]
  29. Hong, L.; Lin, J.; Li, S.; Wan, F.; Yang, H.; Jiang, T.; Zhao, D.; Zeng, J. A novel machine learning framework for automated biomedical relation extraction from large-scale literature repositories. Nat. Mach. Intell. 2020, 2, 347–355. [Google Scholar] [CrossRef]
  30. Peng, N.; Poon, H.; Quirk, C.; Toutanova, K.; Yih, W.T. Cross-sentence n-ary relation extraction with graph lstms. Trans. Assoc. Comput. Linguist. 2017, 5, 101–115. [Google Scholar] [CrossRef]
  31. Guo, Z.; Zhang, Y.; Lu, W. Attention Guided Graph Convolutional Networks for Relation Extraction. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 241–251. [Google Scholar]
Figure 1. A complete dependency syntax tree.
Figure 2. Local dependency connection.
Figure 3. Global dependency connection.
Figure 4. Pruned dependency tree.
Figure 5. Illustration of L+G Tree pruning.
Figure 6. Illustration of the construction of the dependency type matrix.
Figure 7. Model diagram.
Table 1. Parameter settings.

Parameter Name    Parameter Value
Epoch             100
BatchSize         16
Learning_rate     3 × 10−5
Warmup_rate       0.06
Random_seed       42
Max_seq_length    128
Dropout           0.2
Table 2. Comparison of F1 values (%) of experimental results.

Method        DDI     CIVIC
Full          73.3    74.2
SDP           74.7    76.1
L+G (ours)    77.1    77.9
Table 3. DDI data case study.

DDI Sentence    In a small combination study of ARAVA with methotrexate, a 2- to 3-fold elevation in liver enzymes was seen in 5 of 30 patients.
True relation   e1: ARAVA, e2: methotrexate, RE: effect
Full            e1: ARAVA, e2: methotrexate, RE: mechanism
SDP             e1: ARAVA, e2: methotrexate, RE: mechanism
L+G (ours)      e1: ARAVA, e2: methotrexate, RE: effect
Table 4. Comparison of F1 values (%) of experimental results.

Method         DDI     CIVIC
1-layer GCN    74.0    75.8
2-layer GCN    77.1    77.9
3-layer GCN    73.5    75.1
Table 5. Comparison of the results of various models on the DDI dataset.

Models                 F1 (%)
MV-RNN                 50.0
SCNN                   67.0
CNN-bioWE              69.8
MCCNN                  70.2
Joint AB-LSTM          71.5
RvNN                   71.7
Position-aware LSTM    73.0
BERE                   73.9
DIF-A-GCN (ours)       77.1
Table 6. Comparison of the results of various models on the CIVIC dataset.

Models                  F1 (%)
BiLSTM-Shortest-Path    70.2
GRN                     71.7
CNN                     73.0
BiLSTM                  73.9
Tree LSTM               75.9
Graph LSTM-FULL         76.7
AGGCN                   77.4
DIF-A-GCN (ours)        77.9