A KGE Based Knowledge Enhancing Method for Aspect-Level Sentiment Classification

ALSC (Aspect-Level Sentiment Classification) is a fine-grained task in the field of NLP (Natural Language Processing) that aims to identify the sentiment toward a given aspect. In addition to exploiting sentence semantics and syntax, current ALSC methods focus on introducing external knowledge as a supplement to the sentence information. However, integrating these three categories of information remains challenging. In this paper, a novel method is devised to effectively combine sufficient semantic and syntactic information with external knowledge. The proposed model contains a sentence encoder, a semantic learning module, a syntax learning module, a knowledge enhancement module, an information fusion module and a sentiment classifier. The semantic and syntactic information are extracted via a self-attention network and a graph convolutional network, respectively. Specifically, KGE (Knowledge Graph Embedding) is employed to enhance the feature representation of the aspect. Then, an attention-based gate mechanism is used to fuse the three types of information. We evaluated the proposed model on three benchmark datasets, and the experimental results establish strong evidence of high accuracy.


Introduction
Aspect-level sentiment classification, as a fine-grained sentiment analysis task, is widely considered a main focus in the field of natural language processing. In ALSC tasks, the sentiment polarity of a given aspect in a given text is classified as positive, neutral or negative [1]. As an example, in the sentence 'the ambience was nice, but service wasn't so great', the sentiments of the two discussed aspects, 'ambience' and 'service', are predicted as positive and negative, respectively. In practice, ALSC has become an effective approach to identifying opinions and preferences toward products, stocks and many other targets.
Currently, most ALSC methods proceed through the following steps: sentence encoding, syntax dependency tree construction, syntactic information capture via a graph convolutional network (GCN) [2], semantic information extraction based on an attention mechanism, information fusion and sentiment classification. Such is the effectiveness of attention networks in distributing attentive weights that a number of studies show their superiority in ALSC tasks [3][4][5]. Nevertheless, when there is a long distance between an aspect and its dependent words, more weight may be assigned to irrelevant words. To address this, the relation between an aspect and its opinion words is established by exploiting the sentence's syntax dependency tree [6]. Figure 1 shows the syntax dependency tree of a given sentence. One can easily see that the words syntactically related to the aspect, such as 'nice' and 'great', have a strong effect on sentiment polarity prediction. Despite the significance of syntax structure, ALSC for informal grammar styles (e.g., colloquial comments, slang, etc.) remains challenging. In these cases, the connection between aspect and opinion words can be confusing; the extracted syntax can thereby even become noise, resulting in a misunderstanding of the sentiment.
Figure 1. The syntax dependency tree of the sentence 'The ambience was nice but service was not so great'.

Encouragingly, according to recent publications, external knowledge has also been employed to enhance the aspect information for ALSC [7]. Generally, external knowledge is exploited by searching for information related to the given aspect. That is, the aspect is taken as the central node of the knowledge graph, based on which subgraphs are built from its neighboring nodes. In such a manner, the selection of the neighboring nodes becomes critical: the distinctiveness of the external knowledge is mainly restricted by the selection method, and for searched knowledge of substantial distinction, the selected nodes must be revised to a large extent. Moreover, when dealing with the knowledge graph, most previous methods use graph neural networks such as the graph convolutional network to search the knowledge graph nodes, which is inefficient.
In consideration of the aforementioned issues, we propose a method that integrates the sentence semantics and syntax as well as external knowledge toward the aspect. In order to fully extract the sentence information, the semantic relation between the aspect and its contexts is built. Likewise, the connection of opinion words to the aspect is set up. With respect to external knowledge, knowledge graph embedding (KGE) [8] is employed to obtain the knowledge embeddings of the aspect, which makes it more efficient to deal with the knowledge graph. In addition, a fusion module is devised to incorporate the relevant external information and the sentence information for sentiment classification. The contributions of this paper are threefold and summarized as follows:

• The external knowledge is effectively applied to enhance the aspect information, which also serves as a supplement to the sentence information.

• An information fusion approach is dedicatedly designed to integrate different types of information for ALSC.

• Compared with state-of-the-art methods, experimental results on three benchmark datasets corroborate the competitiveness of the proposed method.
The rest of this paper is organized as follows: we review recent studies on ALSC methods and KGE applications in Section 2. Section 3 presents the proposed model in detail. In Section 4, experiments are carried out to investigate the performance of our model. Finally, concluding remarks are given in Section 5.

Aspect-Level Sentiment Classification
Early deep-learning-based ALSC methods generally concentrate on extracting contextual semantics by integrating an RNN (Recurrent Neural Network) with an attention mechanism [9]. When multiple aspects are present, however, determining sentiment polarity from semantic information alone becomes insufficient. In addition to semantic-based models, exploiting the sentence syntax is another such approach. The relation between an aspect and its opinion words can be conveyed by a syntax dependency tree. Because of the graph structure of dependency trees, graph neural networks [10] are employed to cope with the syntactic information. Distinctively, the graph convolutional network is the most prominent for processing graph-structured data in a variety of tasks. In terms of ALSC, GCN-based models are capable of not just aggregating and delivering information among neighboring nodes, but also of extracting features and syntactic information from the graph. Zhao [11] takes a GCN to model the sentiment dependencies between aspect words, and thereby captures the sentiment relationships of multiple aspects in a sentence. Zhang [12] characterizes the sentence using a syntax dependency tree and extracts syntactic information via the GCN. Furthermore, aiming to distinguish the importance of each node in the graph, the attention mechanism is integrated into GCN-based methods. To comprehensively understand the relation between an aspect and its opinion words, Tian [13] exploits the attention mechanism to assign an attention weight to each word syntactically connected with the aspect word, based on which the syntactic information can be precisely extracted by the GCN. By constructing an aspect-centered syntax dependency tree, Wang [14] focuses on identifying each node using graph attention, and thus aggregating information from neighboring nodes.

Semantics and Syntax
Since both semantics and syntax have their own advantages and disadvantages, some recent research solves ALSC by combining the two pieces of information. Zhang et al. [15] propose an aspect-aware attention mechanism combined with self-attention to obtain the attention score matrices of a sentence, which can learn not only the aspect-related semantic correlations, but also the global semantics of a sentence. Bie et al. [16] propose an end-to-end ABSA model, which fuses syntactic structure information and lexical semantic information, to address the limitation that existing end-to-end methods do not fully exploit textual information. Zhang et al. [17] also analyze sentences both syntactically and semantically, and propose a simple and effective fusion mechanism to make the integration of aspect information and context information more adequate. Some researchers also utilize GCNs to capture neighbors' information [18][19][20]. However, these works generally ignore the fact that sentences may not be well formed: slang and informal writing can be found in most user-generated content. As a result, more information is required to help in these situations.

Knowledge Graph
A knowledge graph involves a great number of entities and their relationship types. Knowledge graphs have been applied in a variety of domains, such as education [21], medicine [22], cybersecurity [23], etc. More recent work validates the significance of the knowledge graph in natural language processing [24]. As such, utilizing knowledge graphs is currently a main focus in NLP tasks, which also gives rise to new opportunities for their use in ALSC. Zhou [25] devised a GCN-based method that combines syntactic information and external knowledge. Liang [26] introduced knowledge from the SenticNet knowledge base, thus enhancing the information about aspect-word sentiment. However, these approaches generally ignore the inefficiency of GCN-based methods when dealing with knowledge graphs.
Knowledge graph embedding (KGE) is a creative and practical method for introducing the knowledge graph. Theoretically, KGE aims to represent complex and sparse entity relationship types with low-dimensional, continuous embeddings, which facilitates computation over the introduced knowledge. KGE is currently a widely used approach in question answering [27], semantic retrieval [28] and recommendation systems [29]. Early KGE methods, such as TransE [30] and TransH [31], consider the "relationship" as a translation between head and tail entities. Furthermore, advances in deep neural networks have improved the performance of KGE. State-of-the-art KGE methods such as ConvE [32] and CapsE [33] are developed on the basis of convolutional and capsule neural networks, respectively, obtaining features and calculating the credibility of a triplet through their network layers.

Methodology
Figure 2 shows the architecture of the proposed model. There are six main components, namely the sentence encoder, the semantic learning module, the syntax learning module, the knowledge enhancement module, the information fusion module and the sentiment classifier. More details of each component are presented as follows.

Sentence Encoder
Let $x = \{w^s_1, w^s_2, \ldots, w^t_m, \ldots, w^t_{m+l}, \ldots, w^s_n\}$ be an $n$-word sentence containing the aspect, where $w^t_m, \ldots, w^t_{m+l}$ are the aspect words. Each word is mapped into a low-dimensional vector by looking it up in a pretrained word embedding matrix. We can thus obtain the sentence embedding.
Then, the hidden states of the given sentence are extracted via a Bidirectional Gated Recurrent Unit (Bi-GRU), which outperforms other methods in extracting the long-term information of a sentence. As a result, we use the Bi-GRU to encode the sentence for further processing. The forward and backward hidden states of the sentence are delivered as $\overrightarrow{H}$ and $\overleftarrow{H}$, and concatenated into the sentence representation $H^{GRU} = [\overrightarrow{H}; \overleftarrow{H}]$.
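As a rough, self-contained sketch of this encoding step (plain NumPy, with randomly initialized weights standing in for trained parameters; the function and class names are ours, not the paper's):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell with randomly initialized weights (illustration only)."""
    def __init__(self, d_in, d_h, seed=0):
        rng = np.random.default_rng(seed)
        self.Wz = rng.normal(0, 0.1, (d_in + d_h, d_h))  # update gate
        self.Wr = rng.normal(0, 0.1, (d_in + d_h, d_h))  # reset gate
        self.Wh = rng.normal(0, 0.1, (d_in + d_h, d_h))  # candidate state

    def step(self, x, h):
        z = sigmoid(np.concatenate([x, h]) @ self.Wz)
        r = sigmoid(np.concatenate([x, h]) @ self.Wr)
        h_tilde = np.tanh(np.concatenate([x, r * h]) @ self.Wh)
        return (1 - z) * h + z * h_tilde

def bigru_encode(X, d_h=8):
    """Encode a sentence embedding X (n, d_in) with a forward and a backward
    GRU pass, then concatenate the two hidden-state sequences -> (n, 2*d_h)."""
    n, d_in = X.shape
    fwd, bwd = GRUCell(d_in, d_h, seed=1), GRUCell(d_in, d_h, seed=2)
    h = np.zeros(d_h); H_f = []
    for t in range(n):                      # left-to-right pass
        h = fwd.step(X[t], h); H_f.append(h)
    h = np.zeros(d_h); H_b = [None] * n
    for t in reversed(range(n)):            # right-to-left pass
        h = bwd.step(X[t], h); H_b[t] = h
    return np.concatenate([np.stack(H_f), np.stack(H_b)], axis=1)
```

In practice one would use a trained recurrent layer from a deep learning framework; this only illustrates the forward/backward passes and the concatenation of hidden states.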

Semantic Learning Module
The semantic learning module is mainly developed to establish the semantic relation between the aspect and its context. Given the input sentence representation, we propose two attention mechanisms to capture this relation. The self-attention mechanism is first performed to obtain the contextual dependency of the given sentence. Subsequently, the aspect-specific attention mechanism is carried out to determine the relation between the aspect and the context. Concretely, the attention weight of each context word is computed as

$\alpha = \mathrm{softmax}\left(\frac{(H^{GRU} W_q)(H^{GRU} W_k)^T}{\sqrt{d_k}}\right),$

where $W_k$ and $W_q$ are trainable parameter matrices and $d_k$ is the dimension of the input vector.
Based on the attention weights, the hidden state in relation to the aspect can be derived as

$H^{se} = \mathrm{softmax}\left(\frac{H^a (H^{SelfAtt})^T}{\sqrt{d_k}}\right) H^{SelfAtt},$

where $H^{SelfAtt}$ represents the outcome of the self-attention network and $H^a$ is the hidden state of the aspect word output from the Bi-GRU. We take $H^{se}$ as the semantic representation for further processing.
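The two attention steps above can be sketched as follows (NumPy; a minimal scaled dot-product formulation, with function names of our own choosing):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(H, Wq, Wk, Wv):
    """Scaled dot-product self-attention over the sentence states H (n, d)."""
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    d_k = K.shape[-1]
    alpha = softmax(Q @ K.T / np.sqrt(d_k))   # (n, n) attention weights
    return alpha @ V

def aspect_attention(H_selfatt, H_a):
    """Weight the self-attended context states by their affinity to the
    aspect hidden states H_a (m, d), yielding the semantic representation."""
    scores = softmax(H_a @ H_selfatt.T / np.sqrt(H_selfatt.shape[-1]))
    return scores @ H_selfatt
```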

Syntax Learning Module
Syntax can be seen as a supplement to semantics and has been shown to be helpful in sentiment classification. Thus, to fully extract the sentence information, syntactic information is necessary. The syntax dependency tree of the given sentence is built in advance. In the syntax learning module, the syntax dependency tree is transformed into the graph $G^{sy} = (H^{GRU}, A^{sy})$ to facilitate processing. Notably, $H^{GRU}$ is the feature matrix derived from the Bi-GRU, while $A^{sy}$ is the adjacency matrix of the syntax dependency tree.
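Turning a dependency tree into an adjacency matrix can be illustrated as below (NumPy; the head indices in the test are hypothetical, since a real system would obtain them from a dependency parser):

```python
import numpy as np

def dependency_adjacency(heads):
    """Build a symmetric adjacency matrix with self-loops from dependency
    heads: heads[i] is the index of token i's head word, or -1 for the root."""
    n = len(heads)
    A = np.eye(n)                      # self-loops on the diagonal
    for i, h in enumerate(heads):
        if h >= 0:
            A[i, h] = A[h, i] = 1.0    # undirected dependency edge
    return A
```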
We employ a GCN to extract the syntactic information of the sentence, which can be written as

$H^{sy(l+1)} = \mathrm{ReLU}\left(\tilde{A}^{sy} H^{sy(l)} W^{sy(l+1)}\right),$

where $H^{sy(l+1)}$ stands for the output of the $(l+1)$-th layer of the GCN, the initial $H^{sy(0)}$ is the output of the Bi-GRU, $\tilde{A}^{sy}$ represents the adjacency matrix with self-loops, and $W^{sy(l+1)}$ is the learnable parameter matrix of the $(l+1)$-th layer.
With the convolution of each layer, the information of every node is aggregated from its neighboring nodes, based on which the node information is updated during the iterative computation of the GCN. The syntax representation is thus the output of the GCN after the last layer.
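A minimal sketch of the stacked graph convolutions (NumPy; row-degree normalization is one common choice, as the paper does not spell out its exact variant):

```python
import numpy as np

def gcn_layer(H, A, W):
    """One GCN layer over a self-looped adjacency A: H' = ReLU(D^-1 A H W),
    where D is the diagonal degree matrix of A."""
    D_inv = np.diag(1.0 / A.sum(axis=1))
    return np.maximum(0.0, D_inv @ A @ H @ W)

def syntax_gcn(H, A, weights):
    """Stack GCN layers; the paper's ablation finds two layers work best."""
    for W in weights:
        H = gcn_layer(H, A, W)
    return H
```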

Knowledge Enhancement Module
To supplement the aspect features, external knowledge is leveraged to enhance the information of the aspect. Specifically, we use Freebase [34] as the external knowledge base, which contains a large number of words together with various semantic relations.
For an unfamiliar word, one can search for known information associated with it to gain a better understanding. In the same manner, external knowledge can be applied to complement information related to the aspect during learning.
In most user-generated content, informal writing, such as spelling and grammar errors and slang, can be found. In such cases, exploiting external knowledge contributes to the determination of sentiment polarity. For instance, the sentence 'check out these songs! Especially that amazing rock one' contains the aspect word 'songs'. Syntactically, there is no explicit opinion word in direct relation to the aspect 'songs' for sentiment classification. For this reason, external knowledge can be introduced, based on which the relation between 'songs' and 'rock' is set up. That is, the word 'rock' indicates a type of song, i.e., a subordinate of 'songs'. Seeing that the opinion word toward 'rock' is 'amazing', its sentiment polarity is identified as positive. In this way, the sentiment polarity of the aspect 'songs' is inferred to be the same as that of 'rock'.
In the knowledge enhancement module, we introduce the knowledge graph and take KGE to tackle the external knowledge from Freebase. Notably, most state-of-the-art methods employ a GCN to encode the external knowledge. However, a certain amount of external knowledge bases contain heterogeneous graphs, which are challenging for the GCN to deal with. In our model, the external knowledge is mapped into a continuous vector space using KGE, which is more efficient. The enhancement of the aspect is conducted by computing the weights between the aspect words and the knowledge embeddings.
On this occasion, we select DistMult [35] as the KGE method of the proposed model. Every entity within the knowledge base is delivered as

$y_e = f(W x_e),$

where $f$ stands for either a linear or nonlinear function, $W$ is a parameter matrix and $x_e$ is a vector that represents an entity. Notably, the relationship representation is typically obtained from the score function. DistMult takes the basic bilinear score function

$g(e_1, r, e_2) = y_{e_1}^T M_r\, y_{e_2},$

where the relation matrix $M_r$ is a diagonal matrix whilst $y_{e_1}$ and $y_{e_2}$ are the vector representations of entities $x_{e_1}$ and $x_{e_2}$, respectively. The aspect-based knowledge embedding $H^{kg}$ can then be obtained by computing the attentive weights between the aspect and its knowledge embeddings.
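The DistMult score and the aspect-knowledge attention can be sketched as follows (NumPy; the attention form over the knowledge embeddings is our assumption, as the paper does not give it explicitly):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def distmult_score(y_e1, m_r, y_e2):
    """DistMult bilinear score y_e1^T diag(m_r) y_e2, with m_r holding the
    diagonal of the relation matrix M_r."""
    return float(np.sum(y_e1 * m_r * y_e2))

def aspect_knowledge_embedding(H_a, E):
    """Attend from the aspect states H_a (m, d) over candidate knowledge
    embeddings E (k, d) to form the aspect-based knowledge representation."""
    w = softmax(H_a @ E.T / np.sqrt(E.shape[-1]))
    return w @ E
```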

Information Fusion Module
Having gained three kinds of information, namely the syntactic, semantic and external knowledge information, effectively combining them is of vital importance. The information fusion module is devised to make full use of all three. Both the syntax and the semantics can be considered sentence information, while the external knowledge is supplementary. During information fusion, each type of information has to be controlled within a certain extent to prevent the introduction of noise. Therefore, we compute the attention weights of the syntactic information toward the other two types of information. The attention weight between $H^{sy}$ and $H^{se}$ is expressed as

$\alpha^{se} = \mathrm{softmax}\left(\frac{H^{sy} (H^{se})^T}{\sqrt{d}}\right).$

Likewise, the attention weight between $H^{sy}$ and $H^{kg}$ is

$\alpha^{kg} = \mathrm{softmax}\left(\frac{H^{sy} (H^{kg})^T}{\sqrt{d}}\right).$

Then, two gating units are established to filter the noise from the input information:

$G^{s} = \sigma\left(W_s (\alpha^{se} H^{se}) + b_s\right), \quad G^{k} = \sigma\left(W_k (\alpha^{kg} H^{kg}) + b_k\right),$

where $W_k$, $W_s$, $b_k$ and $b_s$ are trainable parameters of the proposed model. The aspect-related sentence representation $H$ is finally computed from the gated information using a cross product operation.
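The fusion can be sketched as below (NumPy). Note the hedges: the exact attention and gating forms, and the 'cross product operation', are not fully specified in the text, so the elementwise combination here is an assumption of ours:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse(H_sy, H_se, H_kg, W_s, b_s, W_k, b_k):
    """Attend from the syntax states to the semantic and knowledge views,
    gate each attended view, then combine them elementwise (a stand-in for
    the paper's 'cross product operation')."""
    d = H_sy.shape[-1]
    A_se = softmax(H_sy @ H_se.T / np.sqrt(d)) @ H_se   # syntax -> semantics
    A_kg = softmax(H_sy @ H_kg.T / np.sqrt(d)) @ H_kg   # syntax -> knowledge
    G_s = sigmoid(A_se @ W_s + b_s)                     # semantic gate
    G_k = sigmoid(A_kg @ W_k + b_k)                     # knowledge gate
    return (G_s * A_se) * (G_k * A_kg)
```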

Sentiment Classifier
The sentence representation $H$ is sent to the sentiment classifier for sentiment polarity classification. A fully connected layer is developed to obtain the score for each sentiment polarity, and the final sentiment probability distribution of the aspect is determined using a SoftMax classifier:

$y = \mathrm{softmax}(W_1^T H + b_1),$

where $W_1$ and $b_1$ are trainable parameters and $y$ is the predicted sentiment distribution. The proposed model is trained using cross entropy with regularization as the loss function, i.e.,

$L = -\sum_i \sum_{j=1}^{N} \hat{y}_i^j \log y_i^j + \lambda \lVert \Theta \rVert^2,$

where $i$ indexes the $i$-th sample and $j$ the $j$-th sentiment polarity, $N$ is the number of sentiment polarities, $\hat{y}$ is the real sentiment distribution and $y$ is the predicted one, while $\lambda$ is the regularization coefficient and $\Theta$ the set of trainable parameters.
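The classifier and loss amount to a standard softmax layer with cross entropy plus L2 regularization, which can be sketched as (NumPy; function names are ours):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def predict(h, W1, b1):
    """Fully connected layer + SoftMax over the N sentiment polarities."""
    return softmax(h @ W1 + b1)

def ce_loss(y_true, y_prob, params, lam=1e-4):
    """Cross entropy against a one-hot target, plus L2 regularization over
    the trainable parameters."""
    ce = -np.sum(y_true * np.log(y_prob + 1e-12))
    l2 = lam * sum(float(np.sum(p ** 2)) for p in params)
    return ce + l2
```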

Dataset
In this experiment, three publicly available benchmark datasets are used for performance evaluation: Laptop14 and Restaurant14 from SemEval2014 [36], and Twitter [37]. All samples are labeled with one of three polarities, i.e., positive, neutral or negative. Each sample is a review sentence with the aspect tagged within it. Details of each dataset are exhibited in Table 1.

Implementation Details
The sentence embeddings are initialized using either Glove [38] or BERT [39]. The batch sizes for Restaurant14, Laptop14 and Twitter are 32, 64 and 32, respectively. The learning rates of the Glove-based model and the BERT-based model are set to 1e-3 and 2e-5, respectively. In addition, the Adam optimizer is adopted during model training.
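For reference, the reported hyperparameters can be collected in a small configuration sketch (the dictionary names are ours, not from any released code):

```python
# Hyperparameters as reported in the paper, keyed by dataset / embedding type.
CONFIG = {
    "Restaurant14": {"batch_size": 32},
    "Laptop14":     {"batch_size": 64},
    "Twitter":      {"batch_size": 32},
}
LEARNING_RATE = {"glove": 1e-3, "bert": 2e-5}
OPTIMIZER = "Adam"
```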

Baseline Methods
To corroborate the performance of the proposed model, seven state-of-the-art methods are taken for comparison.
Syntax- and semantic-based methods:
• BiGCN [40]: Two graphs, i.e., a global lexical graph and a concept hierarchy graph, are constructed. A bi-level interactive GCN is established to deal with these graphs.
• R-GAT: An aspect-oriented dependency tree is constructed, which is encoded by a relational graph attention network.
• AFGCN [41]: An aspect fusion graph is constructed based on the syntax dependency tree, which captures the aspect-related context words.

• InterGCN [42]: To capture the relation between multiple aspect words, an inter-aspect GCN is devised on the foundation of AFGCN.
KG-based methods:
• SK-GCN: A two-GCN-based model that deals with the syntax dependency tree and the knowledge graph, respectively.
• Sentic GCN: The external knowledge from SenticNet is introduced into the GCN, which enhances the sentiment dependency between aspects and their contexts.

Experiment Results
Table 2 shows the experimental results on all datasets. As presented in Table 2, the proposed model outperforms the state-of-the-art methods on the Restaurant14 and Twitter datasets. Notably, there is a considerable gap between our model and the baselines: the minimum accuracy gaps of the Glove-based model and the BERT-based model are 3.57% (versus SK-GCN) and 3.15% (versus RGAT+BERT), which are significant. The main reason is that the introduction of external knowledge from Freebase provides a large amount of semantic information and relationships. With the enhancement of the aspect by external information, the sentiment classification performance can be optimized. With respect to Laptop14, the Sentic-GCN model performs slightly better than the proposed method. One possible explanation is that the syntactic structure plays a more important role in the sentiment determination of sentences from Laptop14; the utilization of SenticNet [43] brings information into the adjacency matrices, so that the syntactic information can be extracted via graph convolution. Moreover, the pre-training of BERT further improves the ALSC results. Since the proposed model is capable of integrating the sentence semantics, the sentence syntax and the external knowledge, we can expect better sentiment classification results with the three types of information supplementing each other.

Impact of GCN Layer Number
A GCN is a key component in the syntax learning module for syntactic information encoding. We therefore explore the optimal number of GCN layers for ALSC, setting it to 1, 2, 3, 4 and 5, respectively. According to Table 3, two GCN layers obtain the best result in all evaluation settings. Comprehensively, the configuration of the GCN determines the amount of contextual information aggregated toward the aspect. It is clear that a one-layer GCN fails to capture sufficient syntactic information from the sentence. When the number of layers ranges from 3 to 5, the performance of our model declines as the number increases. There are two main reasons. Firstly, the number of connected context words grows with the number of layers, which introduces syntactic noise. Secondly, after multi-layer graph convolution, the nodes become less distinguishable and their representation vectors tend toward uniformity, which results in the over-smoothing problem of multi-layer GCNs.

Regarding the KGE methods, TransE, TransR and TransH yield lower accuracy than DistMult. The reason is that these three translation models determine word relationships using head and tail entities rather than semantic information. By contrast, DistMult uses a bilinear method, which is capable of computing the semantic credibility of entities and relationships within the vector space. That is, the introduction of semantic information improves the incorporation of external knowledge, and thus the sentiment classification accuracy.

Run Time and Parametric Amount
To further evaluate the efficiency of the proposed model, the run times for training and testing, as well as the parameter sizes of different methods, are compared; see Table 5. Both SK-GCN and our model take advantage of the knowledge graph, and our model performs better in terms of not only run time but also parameter amount. In this way, our model shows its superiority over GCN-based methods in dealing with knowledge graphs. On the other hand, the run time of BiGCN and the proposed model is comparable, but the test accuracy of our model is far better than that of RGAT and BiGCN, which indicates a higher working efficiency.

Case Study
The visualization of the attention weight distributions for a given sentence is presented in Figure 3. Words in a darker color carry greater weight, and vice versa. Figure 3 shows two distributions: the former is produced by integrating only the semantic learning module and the syntax learning module, while the latter incorporates the external knowledge as well. According to Figure 3, when only sentence-related information is used, more attention is given to words that are close to the aspect. One can easily see that the opinion word 'love' for the aspect 'drinks' obtains a higher attentive weight, as does 'great' for 'food'. However, for the aspect 'lychee martini', few syntactically or semantically related words are identified via the semantic and syntax learning modules. The introduction of external knowledge facilitates the sentiment word determination for 'lychee martini', which contributes to the sentiment classification.

Conclusions
In this work, we propose a model that integrates semantics, syntax and external knowledge for the task of ALSC. Aiming to sufficiently incorporate the external information into the aspect words, we employ KGE and an aspect-specific attention mechanism to enhance the aspect features. Further, a semantic learning module and a syntax learning module are devised to extract the sentence information. In addition, an information fusion module is established to integrate the three types of information for sentiment classification. Experiments are carried out on three benchmark datasets, on which our model performs best among the compared baselines.
Future work will focus on more details of knowledge graph processing. The loss of graph structural information remains an open question.

Table 1. Statistics of the datasets.

Table 3. ALSC accuracy in line with the number of GCN layers.

We employ four distinct KGE methods and investigate their effectiveness in external knowledge enhancement. Table 4 exhibits the ALSC results of the Glove-based model with different KGE methods.

Table 4. ALSC results of different KGE methods.

Table 5. Run time and parameter amount of different methods.