LSLSD: Fusion Long Short-Level Semantic Dependency of Chinese EMRs for Event Extraction

Abstract: Most existing medical event extraction methods adopt a single model based on either pattern matching or deep learning, which ignores the distribution characteristics of entities and events in the medical corpus. They also do not categorize the granularity of event elements, leading to poor generalization ability. This paper proposes LSLSD, a diagnosis and treatment event extraction method for Chinese that fuses long short-level semantic dependency of the corpus to solve these problems. LSLSD can effectively capture different levels of semantic information within and between event sentences in the electronic medical record (EMR) corpus. Moreover, the event arguments are divided into short word-level and long sentence-level arguments, with sequence annotation and pattern matching combined to realize multi-granularity argument recognition and to improve the generalization ability of the model. Finally, this paper constructs a diagnosis and treatment event dataset of Chinese EMRs by proposing a semi-automatic corpus labeling method, and extensive experimental results show that LSLSD can improve the F1-value of the event extraction task by 7.1% compared with several strong baselines.


Introduction
Information extraction is a critical task in natural language processing, which aims at extracting structured information from unstructured or semi-structured texts. Medical research, such as disease source [1,2], disease prediction [3], and clinical decision [4,5], needs to extract structured information from the underlying medical corpus. This kind of effective extraction and learning of knowledge has become a significant task. Traditionally, the medical knowledge graph (KG) takes entities as nodes and edges as relations to describe medical concepts. However, in addition to the involved medical concepts, the rich and closely related diagnosis and treatment events (DTE) in electronic medical records (EMRs) also contain dynamic knowledge and disease development logic, which cannot be captured by a fine-grained conceptual structure. Therefore, it is necessary to study the extraction method of DTE [6,7].
At present, automatic content extraction (ACE) [8] is the most famous event extraction conference. It divides event extraction (EE) into two subtasks: trigger word detection and event argument recognition. The event trigger word (ETW) is the word that can indicate the event type in the sentence, and the event argument (EA) is the participant that describes the event. For example, the sentence "On 12 August 2015, the patient underwent radical resection of rectal cancer in our hospital due to rectal cancer" is an operation event, "underwent" is an ETW, and the EAs include "12 August 2015", "rectal cancer", "radical resection of rectal cancer".
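To make the two subtasks concrete, the structured record extracted from this example sentence can be sketched as a simple mapping; the field names below are illustrative, not taken from the official ACE schema:

```python
# Hypothetical structured record for the example operation event;
# field names are illustrative, not the ACE specification.
event = {
    "event_type": "Operation",
    "trigger": "underwent",                      # the ETW
    "arguments": [                               # the EAs
        ("time", "12 August 2015"),
        ("disease", "rectal cancer"),
        ("operation", "radical resection of rectal cancer"),
    ],
}

roles = [role for role, _ in event["arguments"]]
```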
Early research on medical DTE extraction mostly utilized pattern matching [9,10]. Recent studies have shown that sequence annotation methods based on deep neural networks have positive effects in the medical field [11,12]. These methods incorporate the context of entities in the form of distributed word representations and illustrate the importance of domain knowledge in clinical medical event extraction.
However, basic-feature-based EE methods merely adopt pattern matching or sequence annotation, which limits their ability to fully mine the important but implicit semantic dependency information contained within and between sentences. As shown in Table 1, both example 1 and example 2 are operation events. The ETWs and EAs contained in this type of event sentence show a certain regularity: the ETWs are words such as "underwent", and the EAs often contain entity words such as disease and operation names. In contrast, the pathological examination event of example 3 usually takes a long sentence-level test result as one of its EAs and is mostly located after the operation event sentence.
Table 1. Examples of event sentences from Chinese present medical history.
In this paper, an elegant framework is proposed to recognize the complex ETWs and EAs in the rich DTEs in the history of present illness (HPI) texts of Chinese EMRs. The key idea of our work is to introduce different feature information to train the ETW classifier according to the distribution of entities and events at different corpus levels, and adopt different modules to realize the recognition of EAs with different granularity. Specifically, the contribution of this paper is three-fold:

1.
Filling the gap in the current representation framework and annotated corpus for HPI events in Chinese EMRs, based on the combined sources of the ACE task definition and Informatics for Integrating Biology and the Bedside (I2B2) [13], by adopting the reverse maximum matching algorithm to perform semi-automatic labeling of the HPI corpus of Chinese EMRs.

2.
The proposed Chinese HPI event extraction method, which fuses long short-level semantic dependency (LSLSD) of the corpus, not only highlights the semantic dependence between components within sentences, but also emphasizes the semantic dependence between sentences. Meanwhile, distinguishing the granularity of EAs at the short word-level and long sentence-level can better capture the semantic information of the corpus and enhance the generalization ability of the model.

3.
A series of experiments on the annotated dataset shows that the proposed model LSLSD achieves strong performance in trigger word detection and event argument recognition, and can effectively recognize EAs of different granularity at the same time.

Related Works
Event extraction [14] is an information extraction task that can be traced back to the 1980s. With the emergence of big data and the development of text mining and natural language processing, event extraction technology has been popularized. Event extraction technology for the English context is mature [15], while that for the Chinese context, especially for clinical diagnosis and treatment events, is relatively limited. Zheng et al. [16] first raised the language issues regarding Chinese event extraction and divided event extraction into four sub-tasks: event recognition, event classification, argument recognition, and argument role classification. We follow ACE by dividing event extraction into two sub-tasks: event trigger word detection and event argument recognition.

Event Trigger Word Detection
The approaches to the ETW detection task are mainly divided into two categories. One is the traditional pattern matching method based on syntactic analysis or clustering. Marco et al. [17] completed ETW detection in the open domain through event extraction rules established by syntactic analysis, while Bui et al. [10] and Huang et al. [18] completed ETW detection by building extraction rules according to the characteristics of biomedical text and by using joint constraint clustering, respectively. The other comprises the currently popular machine learning classification methods, which classify ETWs after extracting them from sentences. For example, Xia et al. [19] detect ETWs through multi-type lexical, dictionary, and syntactic features according to the characteristics of biological literature; Yang et al. [20] and Zheng et al. [21] apply remote supervision to automatically label financial texts from a Chinese financial event knowledge base to realize the detection of financial ETWs. Jindal et al. [22] adopt lexical features such as words and semantic relations in clinical terms to identify events. Wei et al. [23] construct a method based on sequence annotation and combine static pre-trained character-level word vectors with dynamic contextual word representations based on a pre-trained language model as the model input. However, most studies detect ETWs by using only intra-sentence information and ignore inter-sentence information, while some scholars believe that document-level event distribution information, i.e., long-level semantic dependency, can improve the accuracy of ETW detection. Xia et al. [24] construct event reasoning rules by using the document consistency feature to improve the classification results after recognizing and classifying ETWs with a semi-supervised model. Li et al. [25] utilize the combined semantics of ETWs and document consistency in the Chinese training set to improve ETW detection performance; the former is used to infer unknown ETWs in the test set, and the latter is used to infer cases that are difficult to handle with feature-based methods.
To make the most of the semantic information implied in the corpus, and differently from previous models, we design extended features on top of the basic features according to the characteristics of the data, including intra-sentence features, inter-sentence features, dependency syntax, and other features, to improve the accuracy of the ETW detection model from the semantic perspective of entity and relation information.

Event Argument Recognition
Recently, the most popular methods for the EA recognition task have been based on sequence annotation; the bidirectional long short-term memory (BiLSTM) model combined with the conditional random field (CRF) algorithm is most frequently employed in sequence annotation tasks and has achieved excellent results. BiLSTM can capture useful context information from the forward and backward directions of sentences, while CRF has the advantage of using sentence-level and adjacent label information when predicting the current label. Xu et al. [26] construct an event model of clinical guidelines that extracts treatment events from Chinese clinical guidelines through Word2Vec, LSTM, and CRF. Zeng et al. [27] propose a convolutional BiLSTM neural network combining LSTM and a convolutional neural network (CNN) for Chinese event extraction, which can capture both sentence-level and local lexical information without any hand-crafted features, in addition to designing ETW location and ETW type features and connecting them with the original word vector. However, these models only extract treatment events containing short word-level arguments and do not divide EAs by long short-level granularity, and long sentence-level arguments are hardly extracted by a single sequence annotation model. Wei et al. [24] propose a multi-classification model based on self-attention, which fully utilizes entity and entity type features; however, the model only considers the granularity of the corpus when obtaining the representation vector. Jari et al. [28] focus on edge detection, entity context features, and the relation path feature between entities in the biomedical field. The hybrid method proposed by Xuan et al. [29] combines rule-based and feature-based classifiers and is effective for long sentence-level EAs in biological EA recognition.
We follow the classic BiLSTM-CRF method [30] for short word-level arguments in the EA recognition task; the differences are that we employ the transformer [31] instead of the original encoder, integrate extended semantic features, and design a joint sequence annotation and pattern matching method to recognize multi-granularity EAs. For Chinese EMR texts, we improve the generalization of the EA recognition model by dividing EAs into long- and short-level granularity and jointly using pattern matching and sequence annotation, so that the model can not only accurately identify the short word-level EAs in the corpus, but also effectively recognize the long sentence-level EAs widely existing in Chinese EMRs, thereby improving the performance of the event argument recognition model.

Materials and Methods
The event extraction model in this paper is applied to DTE extraction from the HPI texts of Chinese EMRs. To that end, we complete the semi-automatic annotation of entities, ETWs, and EAs according to the annotation standards for entities and events from ACE and I2B2. Moreover, an event extraction model fusing long short-level semantic dependence of the corpus is proposed to capture the feature information of the corpus at different levels. Figure 1 illustrates our framework. To simplify the expression, the common nouns in the paper are abbreviated, as shown in Table 2. We will first introduce the semi-automatic annotation of the corpus.

Figure 1. The framework of LSLSD. This model is composed of two modules: trigger word detection (A) and event argument recognition (B) (the pink and blue background colors, respectively). f 1 ∼ f 4 are the basic features commonly used in the trigger word detection task; f 5 ∼ f 10 are the extended features proposed in this paper. By fusing these semantic features, the potential association information within and between the event sentences in HPI texts can be fully mined.

Table 2. Abbreviations of nouns commonly used in the paper.

• Defining event types: the diagnosis and treatment event types defined in this paper are shown in Table 3;
• Building a dictionary of candidate ETWs: the ETW is the core word that indicates the occurrence of a certain event in a sentence, generally a verb or gerund. The words with the highest frequency in each type of event are selected in this paper as the candidate ETWs for each event type.
As shown in Table 3, the dictionary is expanded by using the synonyms of the candidate ETWs in the synonym forest;
• Defining entity types: I2B2 and CCKS2017 (http://www.sigkg.cn/ccks2017/?page_id=51 (accessed on 18 January 2020)) divide medical named entities into three categories (medical problem, examination, treatment) and five categories (treatment, body part, symptom, examination, disease), respectively. Symptom entities are divided into body parts and symptom descriptions, and treatment entities into drug and operation, in CCKS2018 (http://www.sigkg.cn/ccks2018/?page_id=16 (accessed on 18 January 2020)). Therefore, the entities in the data set are divided into three categories: disease, symptom, and treatment. Symptom includes body part and symptom description, and treatment includes drug, operation, and general treatment;
• Corpus semi-automatic annotation: each HPI text contains 30-40 entities, so manual annotation is time-consuming and labor-intensive. Therefore, this paper annotates the corpus by a semi-automatic method based on a medical dictionary. Firstly, an entity dictionary is collected from medical dictionaries (e.g., Dingxiangyuan (https://portal.dxy.cn/ (accessed on 21 September 2019)), Sogou Medical (https://pinyin.sogou.com/dict/cate/index/132 (accessed on 1 September 2019)), and 39 Health Net (http://www.39.net/ (accessed on 18 September 2019))), then a reverse maximum matching algorithm is employed to tag the texts automatically. Secondly, the ETWs, EAs, and ETW types in the texts are manually annotated, referring to the definition of the event types. Finally, two doctors are invited to check and correct the above annotation results.
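The reverse maximum matching step used for automatic entity tagging can be sketched as follows; the toy dictionary and window size are illustrative, while the actual system matches against the collected medical dictionaries:

```python
def reverse_max_match(text, dictionary, max_len=20):
    """Scan from the end of the text, greedily taking the longest
    dictionary entry that ends at the current position; characters
    with no match are emitted as singletons."""
    tokens = []
    i = len(text)
    while i > 0:
        match = None
        for size in range(min(max_len, i), 1, -1):
            piece = text[i - size:i]
            if piece in dictionary:
                match = piece
                break
        if match is None:            # no entry ends here: back off one char
            match = text[i - 1:i]
        tokens.append(match)
        i -= len(match)
    tokens.reverse()
    return tokens

# Toy entity dictionary (illustrative only).
entity_dict = {"rectal cancer", "radical resection"}
tokens = reverse_max_match("radical resection of rectal cancer", entity_dict)
```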

Event Trigger Word Detection Module
Event trigger word detection is a vital step in the event extraction task; its result directly affects the accuracy of event extraction. In this paper, ETW detection is regarded as a multi-classification task. By analyzing the long short-level semantic dependence within and between event sentences in HPI texts, extended features are proposed to support the training of the classifiers and to obtain more accurate ETW detection results.
Firstly, the words in a sentence that are contained in the candidate ETW dictionary are regarded as the ETWs of the sentence. If no such word is found, the ETW is taken to be the sentence word with the highest similarity to the candidate ETW dictionary. Then, several types of features commonly utilized in ETW detection are selected as the basic features, as shown in Table 4 (1-4); for details, please refer to paper [32]. Based on these basic features suitable for general texts, extended features are designed for the entity distribution information in medical event sentences by analyzing the linguistic characteristics of Chinese HPI event sentences, as shown in Table 4 (5,6), where features f 5 and f 6 refer to the distribution and number of entities in different types of event sentences, respectively. These two extended features of short-level semantic dependency are proposed because each type of event sentence does not cover all six types of entities, and the number of entities differs; these distribution regularities play an important role in the classification of ETWs.
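The dictionary lookup with a similarity fallback can be sketched as below; `difflib` string similarity stands in for the (unspecified) similarity measure used in the paper, so the scoring and threshold are assumptions:

```python
from difflib import SequenceMatcher

# Illustrative candidate ETW dictionary.
CANDIDATE_ETWS = {"underwent", "showed", "pathology", "immunohistochemistry"}

def find_trigger(words, candidates=CANDIDATE_ETWS, threshold=0.6):
    """Return a sentence word found in the candidate ETW dictionary;
    otherwise fall back to the word most similar to any candidate."""
    for word in words:
        if word in candidates:
            return word
    best, best_score = None, 0.0
    for word in words:
        for cand in candidates:
            score = SequenceMatcher(None, word, cand).ratio()
            if score > best_score:
                best, best_score = word, score
    return best if best_score >= threshold else None

direct = find_trigger(["patient", "underwent", "resection"])   # dictionary hit
fuzzy = find_trigger(["patient", "under-went", "resection"])   # similarity fallback
```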
In particular, there is a correlation between the events, because an HPI text describes the occurrence and evolution of the patient's disease. For example, the admission event does not occur after the operation event, and the pathological examination event does not occur before the operation event. Therefore, inspired by [24,25], the extended feature of long-level semantic dependence, which refers to the consistency between event sentences in the HPI text, is proposed and named the document consistency feature, as shown in Table 4 (7). Taking the operation event as an example, Figure 2 shows the distribution of different event types before and after the operation event in the HPI texts. It can be seen that admission, diagnosis, and examination events are more likely to occur before the operation event, while pathological test, chemotherapy, and general treatment events are more likely to occur after it.
Based on the above semantic features, for a given HPI text R, R = {S 1 , S 2 , · · · , S n }, where S n = {W n1 , W n2 , · · · , W ni } is the nth event sentence, W ni is a word in the sentence, and i expresses the sentence length (total number of words). The goal of the ETW detection task is to obtain the classification P(T|S n ) of the ETW in the event sentence to determine the event type. Firstly, R is transformed into the corresponding word vectors [e W n1 , e W n2 , · · · , e W ni ] through the pre-trained word embedding Word2Vec (https://github.com/tensorflow/tensorflow/tree/r1.1/tensorflow/examples/tutorials/word2vec (accessed on 22 April 2020)). Secondly, SVM is adopted as a classifier to classify ETWs.
The one versus rest (OVR) method is adopted to realize the multi-classification of ETWs: K SVMs are trained for the K ETW categories, and the jth SVM judges whether an ETW belongs to category j.
Specifically, SVM classifies a given x into category j by searching for the maximum value of W T j x + b j . To mine the long short-level semantic dependence information in the corpus, we concatenate the feature vectors f m (m = 1, 2, · · · , 7) of the input HPI text with the sentence embedding: x n * i = [S e ; f 1 ; · · · ; f 7 ], where '[;]' is the concatenation operation, S e ∈ R d n * i , x n * i ∈ R dx (dx = d n * i + d m ). Therefore, the dual form of the SVM optimization problem can be expressed as:

max α ∑ q α q − (1/2) ∑ q ∑ j α q α j y q y j K(x q , x j ), s.t. ∑ q α q y q = 0, 0 ≤ α q ≤ c,

where α q and α j are Lagrange multipliers, y q and y j are the ETW types of x q and x j , respectively, K(x q , x j ) is the kernel function, and c is the penalty coefficient.
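The feature-concatenation and one-vs-rest classification step can be sketched with scikit-learn, whose SVC solves this RBF-kernel dual internally; the toy dimensions (16-dim sentence embedding, seven 4-dim feature vectors, three event types) are invented for illustration:

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy stand-ins: 40 event sentences, a 16-dim sentence embedding S_e and
# seven extended feature vectors f_1..f_7 of 4 dims each (sizes invented).
S_e = rng.normal(size=(40, 16))
feats = [rng.normal(size=(40, 4)) for _ in range(7)]
X = np.concatenate([S_e] + feats, axis=1)        # x = [S_e; f_1; ...; f_7]
y = rng.integers(0, 3, size=40)                  # 3 event types

# RBF kernel with penalty coefficient c = 1, as in the experimental setup.
clf = OneVsRestClassifier(SVC(kernel="rbf", C=1.0, coef0=0.0))
clf.fit(X, y)
preds = clf.predict(X)
```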

Event Argument Extraction
Inspired by the research of [24], the EAs are divided into long sentence-level and short word-level according to the long short-level granularity of the corpus, and the EAs are recognized by a joint sequence annotation and pattern matching method. Here, the results of examination, pathological test, and immunohistochemistry events include 20~50 words and are defined as long sentence-level EAs, while the other EAs are short word-level EAs.
In addition to the intra-sentence and inter-sentence semantic features described in Section 3.2, there are other semantic features in the event sentences of HPI, such as the location and type features of ETWs and the dependency syntactic feature of event sentences, shown in Table 5. The location feature, i.e., the distance between EAs and the ETW in an event sentence, can provide syntactic information about the event for EA recognition, because the distribution of EAs usually surrounds the ETW. The ETW position is encoded as in the transformer, with even and odd positions encoded by PE (pos,2i) = sin(pos/10,000 2i/d ) and PE (pos,2i+1) = cos(pos/10,000 2i/d ), respectively, where pos is the ETW position in the sequence, and d and i are the vector dimension and the ith item in the sequence, respectively. Meanwhile, different event types often correspond to different kinds of EAs, which are closely related, while the ETW type implies the type of the event sentence. Moreover, under the writing standard of HPI texts, the same type of event sentences usually follows a similar syntactic structure, such as the treatment event sentences "On May 14, 2015, the patient was given acid making, anti-inflammatory, and antiemetic nursing." and "The patient was given treatment for symptomatic after the operation, such as protecting liver and stomach and improving immunity.". The ETW of the former is "nursing", with EAs "acid making", "anti-inflammatory", and "antiemetic"; the ETW of the latter is "treatment for symptomatic", with EAs "protecting liver", "protecting stomach", and "improving immunity". Although the ETWs of the two event sentences differ, the EAs in each sentence are juxtaposed, and the EA and ETW form modifier-head constructions. Thus, syntactic components such as "subject predicate object" and "attribute adverbial complement" in the event sentence can be identified by dependency parsing.
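The sinusoidal position encoding above can be sketched as follows:

```python
import math

def position_encoding(pos, d):
    """Transformer-style encoding: PE(pos, 2i) = sin(pos / 10000^(2i/d)),
    PE(pos, 2i+1) = cos(pos / 10000^(2i/d))."""
    pe = []
    for j in range(d):
        i = j // 2                              # frequency index
        angle = pos / (10000 ** (2 * i / d))
        pe.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
    return pe

pe = position_encoding(3, 8)    # encoding of the 4th position, dimension 8
```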
The syntactic relations of dependency parsing and their corresponding labels are shown in Table 6; for the specific types of relations, refer to the language technology platform LTP.
(1) Short word-level EAs are recognized by the sequence annotation method fusing the semantic feature vectors of the corpus; the model framework is given in Figure 3. Firstly, the vector representation [e B n1 , e B n2 , · · · , e B ni ] of all words in each event sentence from HPI text R is obtained by BERT [33]. Secondly, the word vector is fused with the semantic feature vectors f m (m = 5, 8, 9, 10); the fusion process is as follows:

ê ni = [e B ni ; f 5 ; f 8 ; f 9 ; f 10 ]

Then, the fused vectors are encoded by a bi-directional LSTM network with a hidden layer size of d n , which learns to judge the key fusion vectors to obtain the semantic information of the short word-level, where d n = 1/2 d n * i . In the tth time step, the hidden layer state of the BiLSTM output is h t ni with the input ê t ni ; this process is as follows:

h t ni = BiLSTM(ê t ni , h t−1 ni )

The dropout layer is adopted to randomly deactivate some bi-directional long short-term memory units to avoid over-fitting of the training results, and the semantic information of the short word-level is enhanced by concatenating the vectors fused with semantic features:

ĥ L ni = f (W L h t ni + b L ) ⊙ r L
where W L ∈ R d n * i ×dx is the weight matrix, b L is the bias, and f ( * ) is the activation function; r L = [r L n1 , · · · , r L ni ], r L ni ∼ Bernoulli(p). Here, the Bernoulli distribution is utilized to randomly generate a vector of 0 s and 1 s.
Finally, the output ĥ L ni of dropout is entered into the CRF, and the corresponding label sequence is y a = (y a n1 , · · · , y a ni ). Then, all parameters of the CRF can be estimated by maximizing P(y a |ĥ L ) for the given HPI text R:

P(y a |ĥ L ) = (1/Z(ĥ L )) exp( ∑ g λ g E y a g ,g + ∑ v µ v T y a v−1 ,y a v )

where Z(ĥ L ) is the normalization factor, E y a g ,g is the probability that label y a g corresponds to ĥ L g , T y a v−1 ,y a v is the probability that label y a v−1 corresponds to ĥ L v−1 under the premise that label y a v corresponds to ĥ L v , and λ g and µ v are hyperparameters. Therefore, the CRF is trained by maximizing the log-likelihood function. To sum up, we train the event extraction model LSLSD on Chinese DTEs by minimizing the following loss function:

L(Θ) = λ 1 L 1 + λ 2 L 2

where λ 1 and λ 2 are weights and Θ is the sum of the parameters of the model LSLSD.
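The emission/transition scoring and the normalizer Z above can be sketched with a tiny numpy CRF; the score matrices are toy values, and the forward algorithm computes log Z:

```python
import numpy as np

def logsumexp(x, axis=None):
    m = x.max(axis=axis, keepdims=True)
    s = np.log(np.exp(x - m).sum(axis=axis, keepdims=True)) + m
    return s.squeeze(axis) if axis is not None else float(s)

def crf_log_prob(emissions, transitions, labels):
    """log P(y|h) = score(y) - log Z: the path score sums emission and
    transition terms; Z is computed by the forward algorithm."""
    n, _ = emissions.shape
    score = emissions[0, labels[0]]
    for t in range(1, n):
        score += transitions[labels[t - 1], labels[t]] + emissions[t, labels[t]]
    alpha = emissions[0]                    # forward variables, log space
    for t in range(1, n):
        alpha = emissions[t] + logsumexp(alpha[:, None] + transitions, axis=0)
    return score - logsumexp(alpha)

E = np.array([[1.0, 0.5], [0.2, 0.8], [0.6, 0.1]])   # toy emission scores
T = np.array([[0.3, -0.1], [0.0, 0.2]])              # toy transition scores
lp = crf_log_prob(E, T, [0, 1, 0])
```

As a sanity check, exponentiating the log-probabilities of all possible label sequences sums to 1, confirming the forward-algorithm normalizer.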
(2) Long sentence-level EAs are recognized by the pattern matching rules given in Table 7, not only because machine learning methods have low detection accuracy when EAs contain a large number of words, but also because the sentence structures of the examination and test events corresponding to long sentence-level EAs are relatively regular and have distinct syntactic characteristics. For example, in the pathological test (PT) event "On 3 June 2015, postoperative pathology showed: colon cancer, infiltrating to the upper serosa, perirectal . . . adenocarcinoma metastasis", the ETW is "pathology" and the EAs are the time argument "3 June 2015", the disease argument "colon cancer", and the test result argument "infiltrating to the upper serosa . . . adenocarcinoma metastasis". From the dependency syntactic analysis of the event sentence in Figure 4a, the time argument and "showed" have an adverbial structure, "showed" has a subject-predicate structure with the ETW "pathology", the disease argument has a verb-object structure with "showed", and the long sentence-level test result argument contains all words in the recursive juxtaposition structure with the disease argument. The logical expression of the rule is shown in Table 7-E1. The dependency syntactic analysis of the immunohistochemistry (IHC) event "On 12 April 2016, underwent immunohistochemistry, reported as KI50%, ER60%, . . . , SYN+, CK10%" is shown in Figure 4b, where the ETW is "immunohistochemistry" and the EAs are the time argument "12 April 2016" and the test result argument "KI50%, ER60%, . . . , SYN+, CK10%". The logical expression of the rule for this sentence pattern is shown in Table 7-IHC1.
"CT", "4 September 2016", "esophageal cancer" and "showing local thickening of midesophageal, uneven thickening of the lower esophagus" in CT examination event "On 4 September 2016, take CT showed: esophageal cancer, local thickening of middle esophageal, uneven thickening of lower esophagus" are the ETW, time argument, disease argument, and examination result argument, respectively, and its dependency syntactic analysis and logical expression of rules are shown in Figure 4c and Table 7-E1, respectively.  (2) For long sentence-level EAs, being recognized by the pattern matching rules that are given in Table 7 is not only resulted from the low detection accuracy of machine learning methods as the EAs contain a generous number of words, but also because the sentence structure of examination and test event corresponding to long sentence-level EA is relatively regular and has distinct syntactic characteristics. For example, pathological test (PT) event "On 3 June 2015, postoperative pathology showed: colon cancer, infiltrating to the upper serosa, perirectal…adenocarcinoma metastasis", where the ETW is "pathology" and the EAs are time argument "3 June 2015", disease argument "colon cancer" and test result argument "infiltrating to the upper serosa…adenocarcinoma metastasis". From the dependency syntactic analysis of the event sentence, Figure 4a, time argument and "showed" have adverbial structure, while "showed" has the subject-predicate structure with ETW "pathology"; the disease argument has the verb-object structure with "showed"; the long sentence-level test result argument contains all words in the recursive juxtaposition structure with disease argument. The logical expression of the rule is shown in Table 7-E1. 
The dependency syntactic analysis of immunohistochemistry (IHC) event "On 12 April 2016, underwent immunohistochemistry, reported as KI50%, ER60%, …, SYN+, CK10%" is shown in Figure 4b, where the ETW is "immunohistochemistry" and the EAs are time argument "12 April 2016" and test result argument "KI50%, ER60%, …, SYN+, CK10%". The logical expression of the rule for such sentence pattern is shown in Table 7-IHC1. "CT", "4 September 2016", "esophageal cancer" and "showing local thickening of mid-esophageal, uneven thickening of the lower esophagus" in CT examination event "On 4 September 2016, take CT showed: esophageal cancer, local thickening of middle esophageal, uneven thickening of lower esophagus" are the ETW, time argument, disease argument, and examination result argument, respectively, and its dependency syntactic analysis and logical expression of rules are shown in Figure 4c and Table 7-E1, respectively.
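The rule matching described above can be sketched as a function over dependency arcs. This is a minimal illustrative sketch, not the paper's implementation: the arc labels (SBV, ADV, VOB, COO) follow a common Chinese dependency scheme, and the `match_e1_rule` helper and its arc-tuple representation are assumptions for illustration.

```python
def match_e1_rule(arcs, trigger):
    """arcs: list of (head, label, child) tuples from a dependency parse;
    trigger: a candidate ETW. Returns the argument slots if the sentence
    matches the Table 7-E1 style pattern, else None."""
    # find the predicate the trigger is subject of (e.g. "showed")
    pred = next((h for h, lab, c in arcs if lab == "SBV" and c == trigger), None)
    if pred is None:
        return None
    time_arg = next((c for h, lab, c in arcs if h == pred and lab == "ADV"), None)
    disease = next((c for h, lab, c in arcs if h == pred and lab == "VOB"), None)
    if time_arg is None or disease is None:
        return None
    # long sentence-level result argument: all words coordinated with the disease
    result = [c for h, lab, c in arcs if h == disease and lab == "COO"]
    return {"time": time_arg, "disease": disease, "result": result}

# toy parse of the PT example sentence (arcs are illustrative)
arcs = [
    ("showed", "ADV", "3 June 2015"),
    ("showed", "SBV", "pathology"),
    ("showed", "VOB", "colon cancer"),
    ("colon cancer", "COO", "infiltrating to the upper serosa"),
    ("colon cancer", "COO", "adenocarcinoma metastasis"),
]
slots = match_e1_rule(arcs, "pathology")
```

The long sentence-level result argument falls out as the list of words coordinated with the disease argument, matching the recursive juxtaposition structure described above.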


Dataset
The data from 1000 HPI texts of Chinese EMRs from a top-three hospital in Shanghai were annotated with the semi-automatic annotation method proposed in Section 3.1, yielding the Chinese medical event data set (CME_1000), which contains 7035 event sentences. The numbers of ETWs and EAs in the event sentences are shown in Table 8, where "#" represents a count. Then, 70%, 20%, and 10% of CME_1000 are randomly adopted as the training set, test set, and validation set, respectively, with no duplicate items across the three sets.

Set          #Texts   #Entities   #Event Sentences   #ETWs   #EAs
Training     700      16,071      4822               4822    14,296
Validation   100      2194        728                728     1681
Test         200      3528        1485               1485    3607
Total        1000     21,793      7035               7035    19,584
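The 70/20/10 split with no duplicates across sets can be sketched as a shuffle followed by slicing. The proportions come from the paper; the random seed and function name are illustrative assumptions.

```python
import random

def split_dataset(items, seed=42):
    """Shuffle and split into 70% train / 20% test / 10% validation,
    with no item appearing in more than one set."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train, n_test = int(0.7 * n), int(0.2 * n)
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    valid = items[n_train + n_test:]
    return train, test, valid

# 1000 HPI texts, as in CME_1000
train, test, valid = split_dataset(range(1000))
```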

Details
Several comparative experiments are designed for the event trigger word detection and event argument recognition subtasks of event extraction; the results are evaluated with precision (P), recall (R), and F1-value.
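For reference, the three evaluation indices can be computed from true positive, false positive, and false negative counts; this is the standard definition, written as a small helper for clarity.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall, and F1 from TP/FP/FN counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
```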
In the event trigger word detection module, the software package LibSVM [34] is adopted to train and test the SVM. The RBF kernel is selected for K(x_i, x_j) in formula (2); the kernel constant coef0 is 0, and the penalty coefficient C is 1.
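The RBF kernel used for K(x_i, x_j) is K(x_i, x_j) = exp(−γ‖x_i − x_j‖²). A minimal sketch follows; the gamma value here is an illustrative assumption (the paper specifies coef0 = 0 and C = 1, which affect the SVM objective rather than this kernel formula).

```python
import math

def rbf_kernel(x_i, x_j, gamma=0.5):
    """RBF kernel K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * sq_dist)

k_same = rbf_kernel([1.0, 2.0], [1.0, 2.0])   # identical points -> 1.0
k_far = rbf_kernel([0.0], [1.0], gamma=1.0)   # exp(-1)
```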
In the event argument recognition module, the neural network framework TensorFlow 2.0 (https://www.tensorflow.org/install (accessed on 13 October 2019)) is employed for model training, and the pre-trained language model is BERT_base. The Adam optimizer is used in the BiLSTM layer, and the batch size, number of epochs, learning rate, and dropout rate are set to 16, 20, 2 × 10^−5, and 0.5, respectively.
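The reported hyperparameters can be collected in a single configuration fragment; the dictionary layout and key names are illustrative, while the values are those stated above.

```python
# Training configuration for the EA recognition module (values from the paper;
# key names are an illustrative convention, not the authors' code).
config = {
    "pretrained_model": "BERT_base",
    "optimizer": "Adam",
    "batch_size": 16,
    "epochs": 20,
    "learning_rate": 2e-5,
    "dropout_rate": 0.5,
}
```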

Results
For ETW detection, to demonstrate the positive effect of the proposed extended features fusing long short-level semantic dependency, models with added short-level semantic dependency features are compared against a baseline model without extended features (only features f1–f4 are included). The performance of the ETW detection module under different feature combinations on CME_1000 is shown in Table 9. The results of the baseline are not ideal because the entity information in the medical text is not fully utilized. After adding the intra-sentence entity type feature f5 (T), the classification results of all kinds of events improve, because different entity types appear only in specific event types. For example, the P and F1 values of the operation event increase by 5.3% and 3.61%. This is because the ETW of the operation event often appears in chemotherapy, test, and examination events, while entities of the operation type appear only in admission, diagnosis, and operation events, so the intra-sentence entity type feature T can improve the ETW detection results of the operation event. Moreover, after adding the inter-sentence feature f6 (N), the number of entities of each type, the average F1 value over all events increases from 81.07% to 82.45%; for example, the F1 value of the examination event increases by 3.44%. This is because feature N can effectively distinguish test and examination events, which have similar entity type distributions.
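The entity-count feature N described above can be sketched as a fixed-order count vector over entity types. This is a minimal sketch under assumptions: the entity-type inventory and the `(mention, type)` input representation are illustrative, not taken from the paper.

```python
from collections import Counter

# illustrative entity-type inventory (the paper's full inventory may differ)
ENTITY_TYPES = ("disease", "operation", "drug", "test", "examination")

def entity_count_feature(entities):
    """Feature N: count of entities of each type in a sentence,
    as a fixed-order vector. entities: list of (mention, type) pairs."""
    counts = Counter(etype for _, etype in entities)
    return [counts.get(t, 0) for t in ENTITY_TYPES]

feat = entity_count_feature([
    ("colon cancer", "disease"),
    ("CT", "examination"),
    ("CEA", "test"),
])
```

The same function applied to the previous and latter sentences yields the PN and RN document consistency features discussed in Table 12.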
For EA extraction, LSLSD is compared with the word-based C-BiLSTM and char-based C-BiLSTM proposed by Zeng et al. [27] to demonstrate its better performance. As shown in Table 10, the short word-level arguments and long sentence-level arguments of the corpus are better captured by LSLSD, which joins sequence annotation and pattern matching. In the EA recognition task, the F1 value of LSLSD is 6.5% and 5.1% higher than word-based and char-based C-BiLSTM, respectively; in the EA classification task, it is 11.2% and 8.2% higher, respectively.

Discussion
In terms of ETW detection, ETWs in the HPI texts that appear infrequently in the candidate ETW dictionary directly reduce the detection accuracy. The accuracy of the ETW module is improved with a two-stage scheme: the first stage determines, by direct binary classification using the word, part-of-speech, and entity-type-count features, whether each candidate ETW, verb, and gerund is an ETW; the second stage determines the event type by multi-classification over the ETWs selected in the first stage. As shown in Table 11, the F1 value of two-stage ETW detection is significantly improved under the same feature combinations. For example, compared with the result in Table 10, the F1 value of the chemotherapy event increases from 89.91% to 91.48% with the two-stage method under the feature combination "Baseline + T". In addition, an event sentence whose ETW classification accuracy is less than 40% when fusing the extended features of short-level semantic dependency is called an untrusted event sentence; otherwise, it is a trusted event sentence. The untrusted event sentences are classified a second time by adding the long-level semantic dependency feature f7 and the other inter-sentence extended features to improve the accuracy of ETW detection. Among the ETW detection results under different feature combinations in Table 12, the "P1 + R1 + B + N + PN + RN" combination is the best. Compared with the two-stage ETW detection method, the R and F1 values of the experiment with document consistency features improve from 79.03% to 80.61% and from 82.23% to 83.57%, respectively, which indicates that the document consistency features can supplement information that is difficult to extract at the long sentence-level and improve the performance of the model.
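The two-stage scheme can be sketched as a filter followed by a type classifier. This is a minimal illustrative sketch: the two lambda stand-ins replace the trained binary SVM (stage 1) and multi-class SVM (stage 2), and the candidate words are toy examples.

```python
def two_stage_etw_detection(candidates, is_trigger, classify_type):
    """Sketch of two-stage ETW detection: stage 1 binary-filters candidate
    trigger words; stage 2 assigns an event type only to the survivors."""
    stage1 = [w for w in candidates if is_trigger(w)]       # binary classification
    return {w: classify_type(w) for w in stage1}            # multi-classification

# toy stand-ins for the two trained classifiers (illustrative only)
detected = two_stage_etw_detection(
    ["pathology", "underwent", "CT"],
    is_trigger=lambda w: w in {"pathology", "CT"},
    classify_type=lambda w: "pathological test" if w == "pathology" else "examination",
)
```

Separating the two decisions means stage 2 never wastes capacity on words that are not triggers at all, which is the intuition behind the F1 gains in Table 11.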
The trends of the F1 values of different events under different feature combinations are shown in Figure 5. The intra-sentence entity type feature T significantly improves ETW detection on chemotherapy, treatment, operation, and examination events, and the inter-sentence entity-count feature N significantly improves ETW detection on pathological test, immunohistochemistry, operation, and examination events. The two-stage experiment has a large influence on treatment, operation, examination, and chemotherapy events, while the document consistency features S improve the ETW detection results of all event types. Here, P1, R1, PT, RT, PN, RN, PTN, and RTN are the document consistency features: P1 and R1 represent the event type of the previous and latter sentence, respectively; PT and RT represent the entity type feature (T) of the previous and latter sentence, respectively; PN and RN are the entity-count feature (N) of the previous and latter sentence; and PTN and RTN represent the extended intra-sentence features of the previous and latter sentence, respectively.
In terms of EA extraction, the influence of different word vector pre-training models, Word2Vec, ELMO, and BERT, on the recognition of ETWs and EAs is compared in this paper; as shown in Table 13, BERT performs better than Word2Vec and ELMO. Word2Vec does not consider the influence of context on the word vector when constructing the text vector, so the vector generated for a given word is invariant. ELMO employs stacked BiLSTMs so that the generated word vectors contain more contextual information, and thus ELMO outperforms Word2Vec. BERT performs best because its transformer feature extractor gives the generated text sequence deeper contextual and lexical-semantic information.
To highlight the positive effect of the CRF, which improves the accuracy of label prediction by adding constraint rules, the experiment is repeated with the CRF layer removed; the results show that BERT-BiLSTM alone performs far worse than the model with the CRF layer. In addition, an ablation experiment is designed to study the influence of the four extra input vectors of the BiLSTM on the EA recognition task. This experiment uses the BERT-BiLSTM-CRF sequence annotation model for all event types; the results are shown in Table 14, where D represents the distance vector between a word and the ETW, and C and Y represent the type vector of the ETW and the dependency syntactic information vector, respectively.
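The constraint rules a CRF layer enforces on BIO-style label sequences can be illustrated with a simple validity check: an I-X tag is only legal after B-X or I-X of the same type. This sketch shows the kind of transition a CRF rules out; the tag names are illustrative.

```python
def valid_bio_transitions(labels):
    """Check the BIO transition constraint that a CRF layer enforces:
    I-X must follow B-X or I-X of the same argument type X."""
    prev = "O"
    for lab in labels:
        if lab.startswith("I-"):
            t = lab[2:]
            if prev not in (f"B-{t}", f"I-{t}"):
                return False
        prev = lab
    return True
```

Without the CRF, a BiLSTM softmax can emit sequences such as ["O", "I-time"], which no valid annotation contains; this is one reason BERT-BiLSTM alone falls short of BERT-BiLSTM-CRF.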
From Table 14, each of the four vector types improves the performance of the model to varying degrees, which proves the effectiveness of the proposed feature vectors. Notably, when the dependency syntactic information vector Y is added to the combination of the distance vector D, the ETW type vector C, and the intra-sentence entity type vector T, the result is higher than that of BERT-BiLSTM-CRF but slightly lower than without Y. The same occurs for the vector combinations DC and DCY, which illustrates that adding the dependency syntactic information vector can harm combinations of other features; this may be because the syntactic structure is not distinctive, owing to the different writing styles of different doctors, even though medical text has a certain degree of standardization. Compared with BERT-BiLSTM-CRF, the F1 values of the best combination, DCT, on the EA recognition and EA classification tasks improve by 5.9% and 4.5%, respectively.
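The per-token input the ablation varies can be sketched as a concatenation of the BERT embedding with the extra vectors. The vector dimensions and toy values here are illustrative assumptions; only the set of vectors (D, C, T, Y) comes from the paper.

```python
def build_token_input(bert_vec, d_vec, c_vec, t_vec, y_vec):
    """Concatenate a token's BERT embedding with the four extra vectors
    from the ablation: distance to the ETW (D), ETW type (C),
    intra-sentence entity type (T), and dependency syntax (Y)."""
    return bert_vec + d_vec + c_vec + t_vec + y_vec

# toy vectors with illustrative dimensions
x = build_token_input(
    bert_vec=[0.1, 0.2],  # BERT embedding (truncated for illustration)
    d_vec=[1.0],          # D: distance to the ETW
    c_vec=[0, 1],         # C: one-hot ETW type
    t_vec=[1, 0],         # T: one-hot entity type
    y_vec=[0.5],          # Y: dependency syntactic information
)
```

Dropping one of the trailing vectors reproduces the corresponding ablation row (e.g. DCT omits Y), which is how the contribution of each vector is isolated in Table 14.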

Conclusions
A novel model for event extraction on Chinese history of present illness texts that fuses their long short-level semantic dependency is proposed in this paper, and the experiment results demonstrate that the proposed extended features, which carry implicit intra-sentence and inter-sentence information, help improve the performance of LSLSD. Meanwhile, LSLSD has excellent generalization performance because the event argument recognition module divides event arguments into the long sentence-level and short word-level and recognizes them by jointly applying sequence annotation and pattern matching.
We also noticed in the analysis of the experiment results that, overall, the dependency syntactic feature proposed in this paper has negative effects on the recognition of some event arguments. One reason is that the HPI text data set is small; another is that the writing styles of different clinicians differ. Further improvement might be made by expanding the scale of the Chinese HPI texts to explore deeper semantic features. We will also study advanced deep learning algorithms to improve the accuracy of event argument recognition.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement: The electronic medical record data obtained from the hospital have not been made available due to privacy concerns.