Knowledge-Enhanced Dual-Channel GCN for Aspect-Based Sentiment Analysis

Abstract: As a subtask of sentiment analysis, aspect-based sentiment analysis (ABSA) refers to identifying the sentiment polarity of a given aspect. State-of-the-art ABSA models use graph neural networks to deal with the semantics and the syntax of a sentence. These methods face two challenges. For one thing, semantic-based graph convolutional networks fail to capture the relation between an aspect and its opinion words. For another, little attention is assigned to the aspect words within graph convolution, which introduces contextual noise. In this work, we propose a knowledge-enhanced dual-channel graph convolutional network for ABSA, in which a semantic-based graph convolutional network (GCN) and a syntactic-based GCN are established. With respect to semantic learning, the sentence semantics are enhanced using commonsense knowledge, and a multi-head attention mechanism is taken to construct the semantic graph and filter the noise, which facilitates the aggregation of information between the aspect and the opinion words. For syntactic information processing, the syntax dependency tree is pruned to remove irrelevant words, after which more attention weight is given to the aspect words. Experiments are carried out on four benchmark datasets to evaluate the performance of the proposed model. Our model significantly outperforms the baseline models, which verifies its effectiveness on ABSA tasks.


Introduction
Aspect-based sentiment analysis (ABSA) is a sentiment classification task that aims to identify the sentiment of given aspects [1]. Within ABSA, the sentiment of each aspect is classified according to a predefined set of sentiment polarities, i.e., positive, neutral or negative [2]. ABSA yields very fine-grained sentiment information, which is useful for applications in a variety of domains [3].
In the context of advancing deep neural networks, state-of-the-art ABSA methods report high accuracy and strong robustness on benchmark datasets. Progress on ABSA tasks has generally been made in two directions: one is to enhance the significant information in the given text, and the other is to filter out irrelevant information and its impact. A major step toward the comprehension of semantic information is the integration of attention mechanisms with deep neural networks [4][5][6]. Larger attention weights are assigned to aspect-related words, based on which the sentiment polarity is classified. Nevertheless, it can be challenging for attention-based models to capture syntax dependencies between the aspect and its contexts. More recently, research on graph neural networks (GNNs) has enabled the processing of syntactic information from dependency trees, which helps suppress syntactically irrelevant contextual noise [7][8][9]. Widespread GNNs, such as graph convolutional networks (GCNs) and graph attention networks (GATs), are capable of encoding both the semantics and the syntax. It has been an ongoing trend to incorporate syntactic information and semantic information into GNN-based models [10][11][12].
In spite of this collaborative exploitation of syntax and semantics, two main limitations can be observed: (1) For one thing, GNNs are generally used to tackle global syntactic information, and a mask operation is performed at the end to conceal the context words, from which the sentiment of the aspect is determined. In practice, contextual noise can be introduced, which results in only minor importance being given to the aspect words.
(2) For another, semantic-based GNNs are typically built upon attention weights. Given the delicate relationship between aspects and opinion words, more attention can be assigned to other words instead of the sentiment words, which further confuses the sentiment aggregation. As presented in Figure 1, in the sentence 'Meal was very expensive for what you get', the aspect 'meal' and its opinion word 'expensive' are only weakly connected semantically.
(Figure 1 shows two example sentences: 'Meal was very expensive for what you get.' and 'The menu may be small, but everything on it is delicious.')
On the task of ABSA, this work focuses on establishing a Knowledge-Enhanced Dual-Channel Graph Convolutional Network (KDGCN). Two GCN-based modules, referred to as the syntax-based GCN and the semantic-based GCN, are developed to separately deal with the syntax structure and the semantic information. On the one hand, the syntactic dependency tree of the sentence is pruned to remove the connections of minor relevance to the aspect. Hence, the aspect-oriented syntactic information is sent to the syntax-based GCN. Besides, position information and attention mechanisms are taken to highlight the importance of the aspect. On the other hand, external knowledge is introduced to enhance the semantic-based GCN. The word sentiment vectors, together with supplementary words for the aspect, are derived using SenticNet (i.e., a commonsense knowledge base); see Figure 2. A multi-head attention mechanism is employed to re-assign the attention weights among words. The sentiment of the opinion words can thus be aggregated to the aspect via the knowledge-enhanced semantic-based GCN.
Notably, a number of studies leverage commonsense knowledge to enhance the sentiment expression and classify the sentiment polarity of the aspect [13,14]. Theoretically, commonsense knowledge involves background material on the entities under discussion. It is preserved in commonsense bases, such as ConceptNet [15], SenticNet [16] and WordNet [17], and recalled for processing. In most cases, however, the integration of semantic-related commonsense knowledge can introduce noise from the external information. Our model aims to exploit sentence-related external knowledge: not just the sentiment information of each word, but also knowledge related to the aspect. In this manner, the input of the semantic-based GCN is distilled, so the most relevant information is preserved and the noise is removed. The contributions of this paper are threefold and summarized as follows:
• Considering the deficiencies of current ABSA methods, a dual-channel GCN-based model is proposed, which processes both the syntax structure and the semantic information.
• External knowledge is incorporated to enhance the semantics of the sentence, while a multi-head attention mechanism further filters the noise.
• Experiments on a variety of datasets indicate the effectiveness of the proposed method. Our model produces results considerably better than the baselines.
The paper is divided into six sections. In the Introduction, we summarize the content of the article in general and propose our solutions to the challenges of the current ABSA task; in Section 2, we summarize the research related to our work; in Section 3, we introduce our proposed model and each of its modules in detail; in Section 4, we conduct experiments on four public datasets and design ablation experiments; in Section 5, we further analyze the general behavior of the model and the experimental results; in Section 6, we conclude the paper.

Aspect-Based Sentiment Analysis
As pointed out in the introduction, ABSA is a fine-grained sentiment classification task. Rather than assigning an overall sentiment polarity to a sentence or a document, ABSA aims at precisely determining the sentiment of a certain aspect. Early methods usually rely on manually designed features, which cannot model the dependency relationship between an aspect and its context [18][19][20].
In recent years, advances in deep-learning algorithms have significantly improved the performance of ABSA, and more detailed analyses of textual information have arisen [21,22]. Integrating an attention mechanism into deep neural networks highlights the contribution of opinion words toward the aspects [4][5][6][23][24][25]. The relationship between an aspect and its opinion words is reliably modeled in attention-based networks. Wang et al. [4] proposed an attention-based long short-term memory (LSTM) method to obtain more relevant information about a given aspect. Chen et al. [5] devised a hierarchical multi-attention model to address the long-range dependency between the aspect and the opinion words. Whereas attention mechanisms fail to cope with sentence syntax, GCNs take advantage of the syntactic dependencies between the aspect and the opinion words. To be specific, an adjacency matrix is formed from the syntactic dependency tree, which is then modeled by a GCN to aggregate sentiment information onto the aspect [7,8]. Wang et al. [9] eliminated the noise from irrelevant contexts by constructing an aspect-oriented syntactic dependency tree, and then encoded the syntax relation with a GNN. More recently, multi-channel GCN modules have been developed to resolve the syntax and semantics of the given sentence, which effectively improves the results of ABSA.
Most studies [7,8] take GCNs to capture the syntactic information of a sentence where the nodes represent the words and the edges indicate the dependencies, which can induce representation vectors of nodes based on their neighborhoods' features. Likewise, the semantic relation within the sentence can also be obtained using GCN. In [10,11], the semantic graph was constructed with edges standing for the attention weights. Therefore, both semantic features and syntactic features can be extracted via GCN-based modules.
Graph Convolutional Network

Considering a graph as structured data, the multiple layers of a GCN are responsible for information delivery, so that every single node within the graph can learn global information. Formally, a graph is denoted as G = (V, E), where V = {v_1, v_2, . . . , v_n} is a set of n = |V| nodes and E is the set of edges; an n-node graph is represented by an adjacency matrix A ∈ R^{n×n}. Let v_i ∈ V denote a node and e_ij = (v_i, v_j) ∈ E denote an edge between v_i and v_j.
A GCN can only capture information about immediate neighbors within a single layer; information from wider neighborhoods is integrated when multiple GCN layers are stacked. We define h_i^l as the output of node i at the l-th layer and h_i^0 as the initial state of node i. The graph convolution of node i can be written as:

h_i^l = σ( Σ_{j=1}^{n} A_ij W^l h_j^{l−1} + b^l )

where W^l is the weight of the linear transformation, b^l is the bias and σ is a nonlinear function such as ReLU.
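As a toy illustration, a single graph-convolution layer of the kind described above can be sketched in Python with NumPy. The degree normalisation and the parameter shapes here are illustrative choices, not the paper's exact formulation:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def gcn_layer(A, H, W, b):
    """One graph-convolution layer: each node aggregates its neighbours'
    features (rows of A select neighbours), applies a linear transform,
    then a ReLU non-linearity."""
    # Normalise by node degree so high-degree nodes do not dominate.
    deg = A.sum(axis=1, keepdims=True)
    return relu((A @ H) / np.maximum(deg, 1.0) @ W + b)

# Toy 3-node graph with self-loops: nodes 0 and 1 connected, node 2 isolated.
A = np.array([[1., 1., 0.],
              [1., 1., 0.],
              [0., 0., 1.]])
H0 = np.eye(3)              # initial node states h_i^0
W = np.ones((3, 2)) * 0.5   # toy weight matrix W^l
b = np.zeros(2)             # bias b^l
H1 = gcn_layer(A, H0, W, b)
print(H1.shape)             # (3, 2)
```

Stacking several such calls corresponds to the multi-layer GCN described above, widening each node's receptive field by one hop per layer.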

Commonsense Knowledge
Commonsense knowledge for NLP is typically obtained through large-scale corpus training and saved in commonsense bases, where it serves as prior knowledge for knowledge-enhanced approaches. SenticNet [16] is one such commonsense knowledge base, which contains 100k concepts related to sentiment expression (e.g., mood, polarity and semantics). These affective properties provide concept-level representations and semantic connections between words.
To facilitate access to corresponding knowledge, SenticNet provides an application programming interface. A series of sentiment scores of the word and its related concepts can be obtained from the interface (as shown in Figure 2), which can expand the semantics of the sentence.
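The lookup pattern can be sketched as follows, with a plain dictionary standing in for the real SenticNet interface; the entries and scores below are made up for illustration (the actual base returns richer, multi-dimensional affective information):

```python
# A stand-in for SenticNet lookups; the entries below are illustrative,
# NOT actual SenticNet values.
SENTIC = {
    "expensive": {"polarity": -0.60, "related": ["costly", "pricey"]},
    "delicious": {"polarity": 0.85, "related": ["tasty", "flavorful"]},
}

def sentic_lookup(word):
    """Return (polarity score, related concepts); unknown words get a
    neutral score and no expansion, mirroring the zero-vector rule
    used later in the knowledge enhancement module."""
    entry = SENTIC.get(word.lower())
    if entry is None:
        return 0.0, []
    return entry["polarity"], entry["related"]

print(sentic_lookup("delicious"))   # (0.85, ['tasty', 'flavorful'])
print(sentic_lookup("laptop"))      # (0.0, [])
```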
The application of SenticNet to ABSA shows its distinctiveness in sentiment representation learning [13,33]. Ma et al. [13] utilized the commonsense from SenticNet to generate essays more closely surrounding the semantics of the input topics. Zhou et al. [14] enlarged the sentence semantics using SenticNet 5, and then jointly modeled the syntactic dependency trees and the commonsense graph. Despite the additional key information, filtering the noise introduced along with external knowledge remains an open issue.

Methodology
The architecture of KDGCN is presented in Figure 3. Our model consists of five key components, i.e., a sentence encoder, a knowledge enhancement module, a semantic learning module, a syntax aware module and a sentiment classifier. Firstly, each word of the sentence is encoded as a vector by the sentence encoder. At the same time, the sentence is input into the knowledge enhancement module, and the sentiment vector of each word and the expanding words of aspect are obtained from SenticNet; secondly, the hidden state vector of the sentence is sent into a semantic learning module and a syntax aware module, respectively, to obtain the syntactic and semantic representation. Finally, we can obtain the sentiment polarity of the aspect from the sentiment classifier.

Sentence Encoder
Glove embedding. For a sentence c = {w_1, w_2, . . . , w_n} with the aspect a = {w_a1, w_a2, . . . , w_am}, we take the pre-trained embedding matrix E ∈ R^{|V|×d_e} to map each word into a low-dimensional vector, where |V| represents the lexicon size and d_e is the dimension of the word vectors [34].
BERT embedding. BERT [35] is a commonly used sentence encoder in recent years. Each sentence is pre-processed by adding [CLS] at the beginning and [SEP] at the end to obtain c = {w_0, w_1, . . . , w_{n+1}}, where w_0 and w_{n+1} denote the two inserted special tokens. Then, c is fed into BERT to obtain the textual feature representation.
A Bidirectional LSTM (Bi-LSTM) is employed for sentence encoding. The given sentence embedding is sent to the Bi-LSTM to generate the hidden state vectors H^{LSTM} = {h_1, h_2, . . . , h_n}. Specifically, h_t ∈ R^{2d_h} is the hidden state at time step t, where d_h is the hidden state dimension of the LSTM.

Knowledge Enhancement Module
Word sentiment enhancement: For the given sentence c, the sentiment vector of each word can be obtained from the commonsense in SenticNet, so that a 23-dimensional sentiment vector H^{sen} ∈ R^{23} is derived per word. For words that do not appear in SenticNet, a zero vector is used instead. Then, H^{LSTM} and H^{sen} are fused by concatenation to obtain the sentence representation H^c, with H^c ∈ R^{2d_h+23}. Aspect knowledge enhancement: In terms of the aspect a, the words related to each word within a are collected from SenticNet, i.e., {w_ex1, w_ex2, . . . , w_exn}. For word supplementation, the first five words related to the aspect are used. All the related words are also mapped to word embeddings and encoded with the Bi-LSTM encoder.
where H^{LSTM}_{ex} stands for the hidden state vector of the Bi-LSTM and H^{sen}_{ex} is the corresponding sentiment vector. The aspect expanding vector is denoted as H_{ex} ∈ R^{2d_h+23}.
Notably, since word co-occurrence in the corpus affects the Glove word embeddings, the aspect-related words are not pre-trained with Glove, so as to prevent noise fusion. We take a_unk for related words that are absent from the given texts. Similarly, zero vectors are used for words absent from SenticNet.
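The fusion step of the knowledge enhancement module reduces to a per-token concatenation; a minimal NumPy sketch, with random values as placeholders for the Bi-LSTM states (the shapes follow the paper):

```python
import numpy as np

def fuse_with_sentiment(h_lstm, h_sen):
    """Concatenate each token's Bi-LSTM state (2*d_h dims) with its
    23-dim SenticNet sentiment vector; words absent from SenticNet
    contribute a zero vector, per the rule above."""
    assert h_lstm.shape[0] == h_sen.shape[0]
    return np.concatenate([h_lstm, h_sen], axis=1)

n, d_h = 5, 8
h_lstm = np.random.randn(n, 2 * d_h)   # H^LSTM, one row per token
h_sen = np.zeros((n, 23))              # H^sen, zeros by default
h_sen[1] = 0.3                         # pretend token 1 is in SenticNet
H_c = fuse_with_sentiment(h_lstm, h_sen)
print(H_c.shape)                       # (5, 39), i.e., 2*d_h + 23
```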

Semantic Learning Module
As observed in [10], many short sentences have confusing syntactic structure; that is, rigid extraction of syntactic information can lead to misinterpretation of the sentiment information. For this reason, a semantic learning module based on GCN is proposed to capture the semantic information among words. Both the enhanced sentiment vector and the aspect expanding vector are sent to the semantic learning module, which further enriches the semantic information.
Node construction: Each word w i from the sentence, together with each aspect relative word w exi , is taken as a node. All nodes constitute a node set V.
Edge construction: An edge indicates the relationship between word nodes. Concretely, two semantically related nodes are connected with an edge. To capture the semantic relation of each word pair, we employ a K-head self-attention mechanism to compute the attention weights:

A^{att} = (1/K) Σ_{k=1}^{K} softmax( (H^c W^{se,q}_k)(H^c W^{se,k}_k)^T / √d_head )

where H^c ∈ R^{n×(2d_h+23)} is the commonsense-enhanced hidden layer output, K is the number of attention heads, and W^{se,k}_k, W^{se,q}_k ∈ R^{(2d_h+23)×d_head} are trainable matrices. Subsequently, based on a top-k selection approach, the largest k values of each row are selected and set to 1, while the others are set to 0; hence, the adjacency matrix A^{se} is obtained (Equation (7)). Corresponding to the edge construction principle, a value of 1 in the adjacency matrix denotes semantic relevance between the nodes. Notably, A^{se} remains symmetric after the application of the top-k selector.
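One plausible reading of this edge-construction step, sketched in NumPy: average the K attention heads, binarise each row with a top-k selector, and symmetrise. Taking the elementwise maximum with the transpose is one way to keep A_se symmetric; the paper does not spell out the exact mechanism, so treat this as an assumption:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def semantic_adjacency(H, Wq, Wk, top_k):
    """Average K-head self-attention scores, keep the top_k entries
    per row as edges (value 1), then symmetrise with max."""
    K, d_head = Wq.shape[0], Wq.shape[-1]
    att = np.zeros((H.shape[0], H.shape[0]))
    for k in range(K):
        q, key = H @ Wq[k], H @ Wk[k]
        att += softmax(q @ key.T / np.sqrt(d_head))
    att /= K
    A = np.zeros_like(att)
    idx = np.argsort(-att, axis=1)[:, :top_k]  # largest top_k per row
    np.put_along_axis(A, idx, 1.0, axis=1)
    return np.maximum(A, A.T)                  # enforce symmetry

rng = np.random.default_rng(0)
n, d, K, d_head = 4, 6, 2, 3
H = rng.standard_normal((n, d))                # stands in for H^c
Wq = rng.standard_normal((K, d, d_head))
Wk = rng.standard_normal((K, d, d_head))
A_se = semantic_adjacency(H, Wq, Wk, top_k=2)
print(np.allclose(A_se, A_se.T))               # True
```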
Thereby, a graph G^{sem} = (A^{se}, H^c) comprising the node representations and the adjacency matrix is constructed. The graph is fed into an N-layer GCN to obtain the hidden layer state H^{se}:

H^{se,(l)} = σ( A^{se} H^{se,(l−1)} W^{(l)}_{se} + b^{(l)}_{se} )

where W^{(l)}_{se} ∈ R^{(2d_h+23)×d_gcn} stands for the parametric matrix of the GCN. A mask operation is conducted on the non-aspect words, followed by average pooling to compute the semantic hidden layer output h^{se}, which is written as:

h^{se} = f( mask(H^{se}) )

where τ+1 ≤ t ≤ τ+m indicates the aspect indices and f(·) is the average pooling function.
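The mask-and-pool step reduces to selecting the aspect rows and averaging them; a minimal sketch:

```python
import numpy as np

def aspect_pool(H_se, aspect_idx):
    """Mask out non-aspect tokens, then average-pool the aspect token
    representations into a single vector h_se."""
    return H_se[aspect_idx].mean(axis=0)

H_se = np.arange(12.0).reshape(4, 3)          # 4 tokens, 3-dim states
h_se = aspect_pool(H_se, aspect_idx=[1, 2])   # aspect spans tokens 1-2
print(h_se)                                   # [4.5 5.5 6.5]
```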

Syntax Aware Module
The syntax aware module is devised by modifying the method proposed by Zhang et al. [7]. The sentence syntax is characterized by the syntax dependency tree. Since not all context words are syntactically related to the aspect, an aspect-related selection approach is taken to reshape the syntax dependency tree: a dependency edge between nodes is kept only if the context word reaches the aspect within n hops. We can thus revise the adjacency matrix A^0 to A^{sy}. In this way, the revised graph is written as G^{sy} = (A^{sy}, H^{LSTM}), where H^{LSTM} is the current node representation. Before sending G^{sy} to the GCN, a position-aware transformation is performed [7]:

h_i = q_i h_i,  q_i = F(i)

where q_i ∈ R is the position weight of the i-th token and F(·) is the function for position weight assignment. The syntactic information is learned using graph convolution, and the syntactic hidden layer output is expressed as:

H^{sy,(l)} = σ( A^{sy} H^{sy,(l−1)} W^{(l)}_{sy} + b^{(l)}_{sy} )

where W^{(l)}_{sy} ∈ R^{2d_h×d_gcn} is a trainable parametric matrix. Similar to the semantic-based GCN, the syntactic hidden state representation H^{sy} is revised via masking (Equation (16)), which yields H^t = {h^t_1, h^t_2, . . . , h^t_n}. The resulting hidden layer state concentrates more on the aspect words. In addition, to further detect the significant semantic features concealed within the syntax structure, an attention weight is assigned to each context word; the dot product of h^t_i and h_i is computed to obtain the syntactic representation h^{sy}.
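The n-hop pruning of the dependency tree can be sketched as a breadth-first search from the aspect tokens. Keeping only edges whose endpoints both lie within n hops of the aspect is our interpretation of the selection rule; the position-aware transformation is omitted here:

```python
import numpy as np
from collections import deque

def prune_to_aspect(A, aspect_idx, n_hops):
    """Keep only dependency edges whose endpoints lie within n_hops of
    some aspect token (BFS over the dependency graph); all other
    edges are dropped, revising A^0 to A^sy."""
    n = A.shape[0]
    dist = np.full(n, np.inf)
    queue = deque()
    for i in aspect_idx:
        dist[i] = 0
        queue.append(i)
    while queue:                      # breadth-first search
        u = queue.popleft()
        for v in range(n):
            if A[u, v] and dist[v] == np.inf:
                dist[v] = dist[u] + 1
                queue.append(v)
    keep = dist <= n_hops
    return A * np.outer(keep, keep)

# Dependency chain 0-1-2-3, aspect at token 0, keep edges within 2 hops.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_sy = prune_to_aspect(A, aspect_idx=[0], n_hops=2)
print(A_sy[2, 3], A_sy[1, 2])   # 0.0 1.0 — the far edge is pruned
```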

Sentiment Classifier
Both the semantic representation and the syntactic representation have now been computed. We concatenate h^{se} and h^{sy} to obtain the final representation h^a (Equation (20)):

h^a = [h^{se}; h^{sy}]

The sentiment polarity of the given aspect is classified by sending h^a to the Softmax classifier, which is:

ŷ = softmax( W_a h^a + b_a )
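A minimal sketch of the classifier, assuming a single linear layer with weights W and bias b before the Softmax (names and toy shapes are illustrative; zero weights yield a uniform distribution over the three polarities):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def classify(h_se, h_sy, W, b):
    """Concatenate semantic and syntactic representations, then map
    to the three polarity classes with a Softmax layer."""
    h_a = np.concatenate([h_se, h_sy])   # h^a = [h^se; h^sy]
    return softmax(W @ h_a + b)

d = 4
h_se, h_sy = np.ones(d), np.zeros(d)
W = np.zeros((3, 2 * d))                 # toy weights -> uniform output
b = np.zeros(3)
p = classify(h_se, h_sy, W, b)
print(p)                                 # [0.333... 0.333... 0.333...]
```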

Model Training
The training process uses the categorical cross-entropy with L_2 regularization as the loss function:

L = − Σ_i Σ_j y_ij log(ŷ_ij) + λ ||Θ||_2

where i is the index of the ABSA sample, j is the corresponding sentiment polarity, λ is the regularization weight and Θ denotes all trainable parameters.
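The loss can be sketched as follows; lam plays the role of the L2 weight λ, and the small constant only guards against log(0):

```python
import numpy as np

def absa_loss(y_true, y_pred, params, lam=1e-4):
    """Categorical cross-entropy summed over samples and classes, plus
    an L2 penalty (weight lam) on all trainable parameters."""
    ce = -np.sum(y_true * np.log(y_pred + 1e-12))
    l2 = lam * sum(np.sum(p ** 2) for p in params)
    return ce + l2

y_true = np.array([[1., 0., 0.]])       # one sample, positive label
y_pred = np.array([[0.8, 0.1, 0.1]])    # classifier output
params = [np.ones((2, 2))]              # toy parameter set
loss = absa_loss(y_true, y_pred, params)
print(round(loss, 4))                   # ≈ -log(0.8) + 1e-4 * 4
```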

Experiment
In this section, we design the main experiments and an attention visualization to verify the effectiveness of our model on the ABSA task. Specifically, we first introduce the benchmark datasets used in our experiments, and then briefly describe the implementation details and the selected baselines. We then carry out the main experiments and analyze the results. In addition, to explore the contribution of each module to the model, we design ablation experiments and analyze the mechanism of knowledge enhancement through attention visualization.

Dataset
To verify the working performance of the proposed model, experiments were carried out on four publicly available benchmark datasets, i.e., Rest14 and Lap14 from SemEval 2014 [36], Rest15 from SemEval 2015 [37] and Rest16 from SemEval 2016 [1], containing reviews of restaurant and laptop domains.
Every single sentence from the datasets contains at least one aspect. The sentiment polarity of each aspect is given as well, including: positive, negative and neutral. For example, in the sentence "Great food but the service was dreadful!", there are two aspect terms, 'food' and 'service', and their sentiment polarities are positive and negative, respectively. The details of each dataset are presented in Table 1.

Implementation Details
The best test result of each method was taken for evaluation. For the proposed model, the initialization of word embeddings was conducted using Glove [38] and uncased BERT [35], respectively. The pre-trained Glove provides a 300-dimensional word vector, with a learning rate of 0.001 and a batch size of 64. Moreover, the dimension of BERT-based word embeddings was 768, with a learning rate of 0.00002 and a batch size of 32. The head number of the multi-head attention network was set to 1. The value of top-k selection was 2. Besides, the Adam optimizer was employed. The L_2 regularization weight was 0.0001. The value of dropout was determined within the interval of [0.4, 0.6] using grid searching. With respect to the GCNs in our model, the number of layers and the dimension of hidden layers ranged within [1, 4] and [100, 200], respectively, which were also selected via grid searching.

Baseline
For the purpose of validating the effectiveness of our model, twelve state-of-the-art methods were taken for comparison, which are presented as follows:
• CDT [8]: GCN is taken to deal with the syntax dependency tree, aiming to learn the syntactic information of the sentence. Specifically, it exploits a GCN to model the structure of a sentence through its dependency tree, where node (word) embeddings of the tree are initialized by a Bi-LSTM network.
• ASGCN [7]: On the task of ABSA, GCN is applied to learn aspect-specific representations for the first time. Specifically, it starts with an LSTM layer to encode the sentence, and a multi-layered graph convolution structure is implemented on top of the LSTM output to obtain aspect-specific features.
• SK-GCN [14]: A syntax-based GCN and a knowledge-based GCN are designed to model the syntax dependency tree and the knowledge graph, respectively. Specifically, it obtains sentiment information from SenticNet to enrich the representation of a sentence toward a given aspect.
• R-GAT [9]: It reshapes and prunes an ordinary dependency parse tree to obtain an aspect-oriented dependency tree structure rooted at a target aspect. Then, a relational graph attention network (R-GAT) is introduced to encode the new tree structure for sentiment prediction.
• DualGCN [10]: Considering the complementarity of syntax structures and semantic correlations, a dual graph convolutional network is proposed to tackle both the syntactic and the semantic information.
• DMGCN [11]: A multi-channel GCN-based method is developed to exploit not only the syntax and the semantics, but also the correlated information from the generated graph.
• BERT [35]: The basic BERT model is established based on a bidirectional transformer.
With the concatenation of the sentence and the corresponding aspect, BERT can be applied to ABSA.
• SK-GCN+BERT [14], R-GAT+BERT [9], DualGCN+BERT [10], DMGCN+BERT [11]: The pre-trained BERT is integrated with SK-GCN, R-GAT, DualGCN and DMGCN, respectively, where BERT is used for sentence encoding.
• TGCN+BERT [39]: The dependency type is identified with type-aware graph convolutional networks, while the relations are distinguished with an attention mechanism. The pre-trained BERT is used for sentence encoding.

Experimental Results
Experimental results on all datasets are exhibited in Table 2. In this experiment, we took accuracy and macro-F1 as the evaluation metrics. Compared with the baseline models, KDGCN generally obtained the best and most consistent results in all evaluation settings. However, our model with the BERT encoder was less competitive than DMGCN+BERT on the Rest14 dataset. A possible explanation is that the pre-trained BERT already contains a wealth of semantic information, so the semantic enhancement via SenticNet is less distinctive. With respect to the Glove-based word embeddings, the performance of KDGCN was 0.93% and 2.89% higher than DMGCN in accuracy and macro-F1, respectively.
Comprehensively, current GCN-based models focus on encoding either the syntactic information (e.g., ASGCN, CDT, R-GAT and TGCN+BERT) or the semantic-integrated syntactic information (e.g., DualGCN and DMGCN). The performance of these methods largely depends on their fitting capabilities. By contrast, the proposed model adopts the aspect-related selection approach to prune the edges of the syntax dependency tree, based on which the information unrelated to the aspect is eliminated. On the other hand, commonsense knowledge is introduced to enhance the semantic information and the sentiment of the aspect. In this way, the results of ABSA can be improved.
Furthermore, SK-GCN also uses external knowledge derived from SenticNet to construct a syntax-based GCN and a semantic-based GCN. In comparison with SK-GCN, our model performs significantly better on all datasets. Clearly, KDGCN is capable of exploiting commonsense knowledge in ABSA tasks. As such, it is rational to expect that integrating external knowledge into the given sentence improves the sentiment classification results.
Table 2. Experimental results on four public datasets. The results of R-GAT and R-GAT+BERT are retrieved from [40], and the others are retrieved from the original papers.

Ablation Study
An ablation study was conducted to quantitatively investigate the importance of the different modules in the proposed model. The results of the ablation study are given in Table 3 and Figure 4. We took the basic KDGCN as the baseline and ablated the knowledge enhancement module, the semantic learning module, the syntax aware module and the aspect-related selection procedure. According to Table 3, the most important component of the proposed model is the syntax aware module. The accuracy drops on the four datasets were 6.78%, 6.12%, 4.61% and 3.08%, which are significant. Obviously, the use of syntactic information plays a pivotal role in ABSA. Moreover, the contributions of the semantic learning module and the knowledge enhancement module are comparable. The integration of commonsense knowledge into the semantic learning process improves the sentiment classification performance. Lastly, withdrawal of the aspect-related selection also caused a minor decrease in performance.
Figure 4. Results of the ablation study. Different columns show the performance of different models on different datasets.

Attention Visualization
To investigate the effectiveness of the knowledge enhancement, we visualized the attention matrix. In our model, the semantic enhancement is carried out using commonsense from SenticNet, so the connection between the aspect and its opinion word is established and enhanced. The syntax-based GCN also removes irrelevant information by encoding the syntax dependency tree. Cases are presented to demonstrate the attention weight distribution. In the first line of Figure 5, the attention weights are assigned by a basic multi-head attention mechanism. One can easily see that only minor attention was given to the opinion word 'excellent' of the aspect 'food'. Likewise, the attention weight of 'food' toward 'excellent' was also weakened. With the integration of commonsense knowledge, the relationships of both 'food' and 'excellent' to the context word 'meal' were established. That is, the 'food-meal' edge and the 'excellent-meal' edge can be constructed via top-k selection. As a result, the sentiment information of 'excellent' can be aggregated on the aspect word 'food' through the GCN encoding. Besides, the syntactic-based GCN, which deals with the syntactic relations among words, also facilitates the determination of the aspect sentiment polarity.
Similarly, from the two figures in the second line, we can see that the aspect word 'waiter' established a direct connection with the opinion word 'helpful' after knowledge enhancement. Additionally, from the two figures in the last line, the aspect word 'sauce' and the opinion word 'flavorful' are connected through the path 'sauce-dough-flavorful' after knowledge enhancement, so that the sentiment polarity of the aspect words can be better predicted after the subsequent network structure.

Discussion
Through a series of experiments, we can see that our KDGCN performs well on the ABSA task. Specifically, in the main experiments (Section 4.4), the accuracy and F1-score of our model on the four datasets are generally higher than those of the baselines; in particular, compared with SK-GCN [14], which also uses SenticNet for knowledge enhancement, our improvement was 2-5%. In the ablation study, we removed the semantic learning module, the syntax aware module and so on, which shows that semantics and syntax are both important for ABSA tasks. In addition, after removing the knowledge enhancement module, the model performance also decreased significantly on the four datasets, indicating that our knowledge enhancement facilitates ABSA tasks.
Moreover, we also found limitations of our model. Take DMGCN [11] and the Glove encoder as an example: KDGCN's improvement on Lap14 was not as large as that on Rest14 (0.52% and 0.93%, respectively). This may be because many aspect terms in the Lap14 dataset are proper nouns (such as Windows 7 and Microsoft), which carry no obvious sentiment clues. In contrast, most of the words in Rest14 are everyday words, so the sentiment information is rich and can be further enhanced through SenticNet. In order to obtain more semantic information and deeper connections, large-scale knowledge graphs can be introduced into the ABSA task in future work.

Conclusions
In this work, we propose a knowledge-enhanced dual-channel graph convolutional network to deal with ABSA tasks. A semantic-based GCN and a syntactic-based GCN are devised to encode both the sentence semantics and the syntax. On the one hand, external commonsense knowledge is introduced to enhance the semantics, based on which more attention is assigned to the aspect and its relevant words. On the other hand, the syntactic-based GCN, operating on the syntax dependency tree, further filters out low-dependency words. We demonstrate the effectiveness of our method on four benchmark datasets, obtaining state-of-the-art results in both accuracy and macro-F1. Compared with the baseline models, the proposed method produces results considerably better than the widely applied approaches in ABSA. In the ablation experiments, we tested the contribution of each module to the model and verified that our innovations are effective. In addition, we also carried out a case analysis to further demonstrate, intuitively, the role of knowledge enhancement in our task.
However, SenticNet is a small-scale knowledge base with shallow and limited semantics, which limits the performance of the model. Therefore, future work can consider exploring the use of a larger scale knowledge graph (such as Wikipedia) to enhance the knowledge of ABSA tasks, which can provide more clues to predict the sentiment polarity of the aspect.

Conflicts of Interest:
The authors declare no conflict of interest.