Article

A KGE Based Knowledge Enhancing Method for Aspect-Level Sentiment Classification

1 School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou 510006, China
2 School of Electronics and Information Engineering, South China Normal University, Foshan 528225, China
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(20), 3908; https://doi.org/10.3390/math10203908
Submission received: 21 September 2022 / Revised: 16 October 2022 / Accepted: 17 October 2022 / Published: 21 October 2022

Abstract

ALSC (Aspect-Level Sentiment Classification) is a fine-grained task in the field of NLP (Natural Language Processing) which aims to identify the sentiment toward a given aspect. In addition to exploiting sentence semantics and syntax, current ALSC methods focus on introducing external knowledge as a supplement to the sentence information. However, the integration of these three categories of information remains challenging. In this paper, a novel method is devised to effectively combine semantic and syntactic information with external knowledge. The proposed model contains a sentence encoder, a semantic learning module, a syntax learning module, a knowledge enhancement module, an information fusion module and a sentiment classifier. The semantic and syntactic information are extracted via a self-attention network and a graph convolutional network, respectively. Specifically, KGE (Knowledge Graph Embedding) is employed to enhance the feature representation of the aspect. Then, an attention-based gating mechanism is used to fuse the three types of information. We evaluated the proposed model on three benchmark datasets, and the experimental results establish strong evidence of high accuracy.

1. Introduction

Aspect-level sentiment classification, as a fine-grained sentiment analysis task, is widely considered a main focus in the field of natural language processing. In ALSC tasks, the sentiment polarity of a given aspect in a given text is classified as positive, neutral or negative [1]. As an example, in the sentence ‘the ambience was nice, but service wasn’t so great’, the sentiments of the two discussed aspects, ‘ambience’ and ‘service’, are predicted as positive and negative, respectively. In practice, ALSC has become an effective approach for identifying opinions and preferences toward products, stocks and many other targets.
Currently, most ALSC methods are performed using the following steps: sentence encoding, syntax dependency tree construction, syntactic information capture via a graph convolutional network (GCN) [2], semantic information extraction based on an attention mechanism, information fusion and sentiment classification. Owing to the effectiveness of attention networks in distributing attentive weights, a number of studies show their superiority in ALSC tasks [3,4,5]. However, when the distance between an aspect and its dependent words is long, more weight may be assigned to irrelevant words. To address this, establishing the relation between an aspect and its opinion words by exploiting the sentence syntax dependency tree has been proposed [6]. Figure 1 shows the syntax dependency tree of a given sentence. One can easily see that the words syntactically related to the aspect, such as ‘nice’ and ‘great’, have a decisive effect on sentiment polarity prediction. In spite of the significance of syntax structure, ALSC for informal grammar styles (e.g., colloquial comments, slang, etc.) remains challenging. In these cases, the connection between aspect and opinion words can be confusing, so the extracted syntax can even become noise, which results in misunderstanding of the sentiment.
Encouragingly, according to recent publications, external knowledge has also been employed to enhance the aspect information for ALSC [7]. Generally, external knowledge is exploited by searching for information related to the given aspect. That is, the aspect is taken as the central node of the knowledge graph, and subgraphs are built up from its neighboring nodes. In such a manner, the selection of the neighboring nodes becomes critical: the distinctiveness of the external knowledge is mainly restricted by the selection method, and obtaining knowledge of substantial distinction requires extensive revision of the selected nodes. Moreover, when dealing with the knowledge graph, most previous methods use graph neural networks such as the graph convolutional network to search the knowledge graph nodes, which is inefficient.
In consideration of the aforementioned issues, we propose a method that integrates the sentence semantics and syntax as well as the external knowledge toward the aspect. In order to fully extract the sentence information, the semantic relation between the aspect and its contexts is built. Likewise, the connection of opinion words to the aspect is set up. With respect to external knowledge, knowledge graph embedding (KGE) [8] is employed to obtain the knowledge embeddings of the aspect, which makes it more efficient to deal with the knowledge graph. In addition, a fusion module is devised to incorporate the relevant external information and the sentence information for sentiment classification. The contributions of this paper are threefold and summarized as follows:
  • External knowledge is effectively applied to enhance the aspect information, which also serves as a supplement to the sentence information.
  • An information fusion approach is dedicatedly designed to integrate different types of information for ALSC.
  • Compared with the state-of-the-art methods, experimental results on three benchmark datasets corroborate the competitiveness of the proposed method.
The rest of this paper is organized as follows: we review the recent studies on ALSC methods and the KGE applications in Section 2. Section 3 presents the proposed model in detail. In Section 4, experiments are carried out to investigate the working performance of our model. Finally, concluding remarks of this work are given in Section 5.

2. Related Work

2.1. Aspect-Level Sentiment Classification

Early deep-learning-based ALSC methods generally concentrate on extracting contextual semantics by integrating an RNN (Recurrent Neural Network) with an attention mechanism [9]. When multiple aspects are present, determining sentiment polarity from semantic information alone becomes insufficient. In addition to semantic-based models, exploiting sentence syntax is another common approach. The relation between an aspect and its opinion words can be conveyed by a syntax dependency tree. Because of the graph structure of dependency trees, graph neural networks [10] are employed to cope with the syntactic information. In particular, the graph convolutional network is the most widely used for processing graph-structured data in a variety of tasks. In terms of ALSC, GCN-based models are capable not only of aggregating and delivering information among neighboring nodes, but also of extracting features and syntactic information from the graph. Zhao [11] uses a GCN to model the sentiment dependencies between aspect words, and thereby captures the sentiment relationships of multiple aspects in a sentence. Zhang [12] characterizes the sentence using a syntax dependency tree and extracts syntactic information via the GCN. Furthermore, aiming to distinguish the importance of each node in the graph, the attention mechanism is integrated into GCN-based methods. To comprehensively understand the relation between an aspect and its opinion words, Tian [13] exploits the attention mechanism to assign an attention weight to each word syntactically connected with the aspect word, based on which the syntactic information can be precisely extracted by the GCN. By constructing an aspect-centered syntax dependency tree, Wang [14] focuses on identifying each node using graph attention, and thus aggregates information from neighboring nodes.

2.2. Semantics and Syntax

Since both semantics and syntax have their own advantages and disadvantages, some recent research solves ALSC by combining these two pieces of information together. Zhang et al. [15] propose an aspect-aware attention mechanism combined with self-attention to obtain the attention score matrices of a sentence, which can learn not only the aspect-related semantic correlations, but also the global semantics of a sentence. Bie et al. [16] propose an end-to-end ABSA model, which fuses the syntactic structure information and the lexical semantic information, to address the limitation that existing end-to-end methods do not fully exploit the textual information. Zhang et al. [17] also analyze sentences both syntactically and semantically, and propose a simple and effective fusion mechanism to make the integration of aspect information and context information more adequate. Some researchers also utilize GCNs to capture neighbor information [18,19,20]. However, these studies generally ignore that sentences may not be well formed, and that slang and informal writing can be found in most user-generated content. As a result, more information is required in these situations.

2.3. Knowledge Graph

A knowledge graph involves a great number of entities and their relationship types. Knowledge graphs have been applied in a variety of domains, such as education [21], medicine [22] and cybersecurity [23]. More recent work validates the significance of the knowledge graph in natural language processing [24], and its utilization is currently a main focus in NLP tasks, which also gives rise to new opportunities for its use in ALSC. Zhou [25] devised a GCN-based method that combines syntactic information and external knowledge. Liang [26] introduced knowledge from the SenticNet knowledge base, thus enhancing the information about aspect-word sentiment. However, these approaches generally ignore the inefficiency of GCN-based methods when dealing with the knowledge graph.
Knowledge graph embedding (KGE) is a creative and practical method for introducing the knowledge graph. Theoretically, KGE aims to represent both complex and sparse entity relationship types with low-dimensional, continuous embeddings, which facilitates computation over the introduced knowledge. KGE is currently a widely used approach in question answering [27], semantic retrieval [28] and recommendation systems [29]. Early KGE methods, such as TransE [30] and TransH [31], consider the “relationship” as a translation between head and tail entities. Furthermore, advances in deep neural networks have improved the performance of KGE. State-of-the-art KGE methods, such as ConvE [32] and CapsE [33], are built on convolutional and capsule neural networks, which obtain features and calculate the credibility of a triplet through convolutional layers.
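The contrast between the translation-based and bilinear families mentioned above can be sketched in a few lines. This is an illustrative NumPy sketch with random stand-in embeddings, not trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding dimension (illustrative)

# Random embeddings for a (head, relation, tail) triple.
h, r, t = rng.normal(size=(3, d))

def transe_score(h, r, t):
    """TransE: a triple is plausible when h + r lies close to t,
    so the score is the negative translation distance."""
    return -np.linalg.norm(h + r - t)

def distmult_score(h, r, t):
    """DistMult: bilinear score h^T diag(r) t, i.e. a
    relation-weighted inner product of head and tail."""
    return float(np.sum(h * r * t))

# A tail located exactly at h + r is maximally plausible under TransE.
assert transe_score(h, r, h + r) == 0.0
```

Note that DistMult's diagonal bilinear form is symmetric in head and tail, one of its known modeling trade-offs.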

3. Methodology

Figure 2 shows the architecture of the proposed model. There are six main components, namely the sentence encoder, semantic learning module, syntax learning module, knowledge enhancement module, information fusion module and sentiment classifier. More details of each component are presented as follows.

3.1. Sentence Encoder

Let $x = \{w_1^s, w_2^s, \ldots, w_m^t, \ldots, w_{m+l}^t, \ldots, w_n^s\}$ be an $n$-word sentence containing the aspect, where $w_m^t, \ldots, w_{m+l}^t$ denote the aspect words. Each word is mapped into a low-dimensional vector by looking it up in a pretrained word embedding matrix, which yields the sentence embedding.
Then, the hidden states of the given sentence are extracted via a Bidirectional Gated Recurrent Unit (Bi-GRU), which performs well in capturing the long-term information of a sentence. The forward and backward hidden states of the sentence are $\overrightarrow{H}_{GRU} = \{\overrightarrow{h}_1^s, \overrightarrow{h}_2^s, \ldots, \overrightarrow{h}_m^t, \ldots, \overrightarrow{h}_{m+l}^t, \ldots, \overrightarrow{h}_n^s\}$ and $\overleftarrow{H}_{GRU} = \{\overleftarrow{h}_1^s, \overleftarrow{h}_2^s, \ldots, \overleftarrow{h}_m^t, \ldots, \overleftarrow{h}_{m+l}^t, \ldots, \overleftarrow{h}_n^s\}$, respectively. The sentence representation is the concatenation of the two, i.e.,

$$H_{GRU} = [\overrightarrow{H}_{GRU}; \overleftarrow{H}_{GRU}]$$
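As a minimal illustration of this encoding step, the following NumPy sketch runs a forward and a backward GRU over a toy sentence and concatenates their hidden states; all dimensions and parameters are illustrative stand-ins, not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_h, n = 4, 3, 5          # embedding size, hidden size, sentence length (illustrative)
X = rng.normal(size=(n, d_in))  # word embeddings of an n-word sentence

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_layer(X, params):
    """Run a single-direction GRU over the sequence, returning all hidden states."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    h = np.zeros(d_h)
    states = []
    for x in X:
        z = sigmoid(x @ Wz + h @ Uz)              # update gate
        r = sigmoid(x @ Wr + h @ Ur)              # reset gate
        h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)  # candidate state
        h = (1 - z) * h + z * h_tilde
        states.append(h)
    return np.stack(states)

params_f = [rng.normal(size=s) for s in [(d_in, d_h), (d_h, d_h)] * 3]
params_b = [rng.normal(size=s) for s in [(d_in, d_h), (d_h, d_h)] * 3]

H_fwd = gru_layer(X, params_f)               # forward hidden states
H_bwd = gru_layer(X[::-1], params_b)[::-1]   # backward pass, re-aligned to word order
H_gru = np.concatenate([H_fwd, H_bwd], axis=-1)  # concatenation, shape (n, 2*d_h)
```

Each word thus receives a representation informed by both its left and right context, which is what the concatenated $H_{GRU}$ provides to the downstream modules.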

3.2. Semantic Learning Module

The semantic learning module is mainly developed to establish the semantic relation between the aspect and its context. Given the input sentence representation, we propose two attention mechanisms to capture this relation. The self-attention mechanism is first performed to obtain the contextual dependency of the given sentence. Subsequently, the aspect-specific attention mechanism is carried out to determine the relation between the aspect and context. Concretely, the attention weights of the context words are computed as:

$$\mathrm{SelfAtt} = \frac{(H_{GRU} W_k)(H_{GRU} W_q)^\top}{\sqrt{d_k}}$$

where $W_k$ and $W_q$ are trainable parameter matrices and $d_k$ is the dimension of the input vector.
Based on the attention weights, the hidden state in relation to the aspect can be derived:

$$H_{se} = \mathrm{Att}(H_{SelfAtt}, H_a)$$

where $H_{SelfAtt}$ represents the output of the self-attention network and $H_a$ is the hidden state of the aspect words output from the Bi-GRU. We take $H_{se}$ as the semantic representation for further processing.
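The two attention steps can be sketched as follows in NumPy, under the common convention that a row-wise softmax turns the SelfAtt scores into weights and that the aspect attention is a dot-product attention against $H_a$ (the paper does not spell out either choice); the aspect position and all sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 5, 6                          # sentence length and hidden size (illustrative)
H = rng.normal(size=(n, d))          # Bi-GRU hidden states H_GRU
Wk, Wq = rng.normal(size=(2, d, d))  # trainable projections W_k, W_q

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Scaled dot-product scores, as in the SelfAtt equation;
# a row-wise softmax then yields attention weights.
scores = (H @ Wk) @ (H @ Wq).T / np.sqrt(d)
weights = softmax(scores)
H_selfatt = weights @ H              # contextualized sentence representation

# Aspect-specific attention: weight each contextual state by its
# similarity to the aspect hidden state H_a (here, word index 2).
H_a = H[2]
alpha = softmax(H_selfatt @ H_a)
H_se = alpha @ H_selfatt             # semantic representation H_se
```

The result is a single vector summarizing the sentence from the aspect's point of view, which is what the fusion module later consumes.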

3.3. Syntax Learning Module

Syntax can be seen as a supplement to semantics and has been shown to be helpful in sentiment classification. Thus, to fully extract sentence information, syntactic information is necessary. To obtain it, the syntax dependency tree of the given sentence is built in advance. In the syntax learning module, the syntax dependency tree is transformed into the graph $G_{sy} = (H_{GRU}, A_{sy})$ to facilitate processing. Notably, $H_{GRU}$ is the feature matrix derived from the Bi-GRU, while $A_{sy}$ is the adjacency matrix of the syntax dependency tree.
We employ GCN to extract the syntactic information of the sentence, which can be written as:
$$H_{sy}^{(l+1)} = \mathrm{GCN}\big(H_{sy}^{(l)}, \tilde{A}_{sy}, W_{sy}^{(l+1)}\big)$$
$$\mathrm{GCN}\big(H^{(l)}, \tilde{A}, W^{(l+1)}\big) = \mathrm{ReLU}\big(\tilde{A} H^{(l)} W^{(l+1)}\big)$$
with
$$\tilde{A}_{sy} = \tilde{D}^{-\frac{1}{2}} \big(A_{sy} + I\big) \tilde{D}^{-\frac{1}{2}}$$
where $H_{sy}^{(l+1)}$ stands for the output of the $(l+1)$-th GCN layer, and the initial $H_{sy}^{(0)}$ is the output from the Bi-GRU. $\tilde{A}_{sy}$ represents the normalized adjacency matrix with self-loops, $\tilde{D}$ is its degree matrix, and $W_{sy}^{(l+1)}$ is the learnable parameter matrix of the $(l+1)$-th layer.
With the convolution of each layer, the information of every node is aggregated from its neighboring nodes, and the node information is updated during the iterative computation of the GCN. Thus, the syntax representation is the output of the GCN after the last layer.
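The normalized-adjacency construction and the layer-wise update can be illustrated on a toy dependency graph; the edges, sizes and weights below are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 5, 4  # number of words (nodes) and feature size (illustrative)

# Adjacency matrix of a toy dependency tree (undirected edges).
A = np.zeros((n, n))
for i, j in [(0, 1), (1, 2), (1, 3), (3, 4)]:
    A[i, j] = A[j, i] = 1.0

# Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}.
A_hat = A + np.eye(n)
d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
A_tilde = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(H, A_tilde, W):
    """One GCN layer: aggregate neighbor features, project, apply ReLU."""
    return np.maximum(A_tilde @ H @ W, 0.0)

H = rng.normal(size=(n, d))  # initial features H_sy^(0) from the Bi-GRU
W1, W2 = rng.normal(size=(2, d, d))
H_sy = gcn_layer(gcn_layer(H, A_tilde, W1), A_tilde, W2)  # two stacked layers
```

Stacking two layers lets each word aggregate information from its two-hop syntactic neighborhood, which matches the layer-number analysis reported later in the experiments.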

3.4. Knowledge Enhancement Module

To supplement the aspect features, external knowledge is leveraged to enhance the information of the aspect. Specifically, we use Freebase [34] as the external knowledge base, which contains a large number of words together with various semantic relations.
For a word beyond comprehension, one can search for known information involved with this word for better understanding. In such a manner, the external knowledge can be applied to complement information related to the aspect during learning.
In most user-generated content, informal writing, such as spelling and grammar errors and slang, can be found. In such cases, exploiting external knowledge contributes to the determination of sentiment polarity. For instance, the sentence ‘check out these songs! Especially that amazing rock one’ contains the aspect word ‘songs’. Syntactically, there is no explicit opinion word in direct relation to the aspect ‘songs’. For this reason, external knowledge can be introduced to set up the relation between ‘songs’ and ‘rock’: the word ‘rock’ indicates a type of song and is thus subordinate to ‘songs’. Seeing that the opinion word toward ‘rock’ is ‘amazing’, the sentiment polarity is identified as positive. In this way, the sentiment polarity of the aspect ‘songs’ follows that of ‘rock’.
In the knowledge enhancement module, we introduce the knowledge graph and use KGE to process the external knowledge from Freebase. Notably, most state-of-the-art methods employ a GCN to encode the external knowledge. However, many external knowledge bases contain heterogeneous graphs, which are challenging for a GCN to deal with. In our model, the external knowledge is instead mapped into a continuous vector space using KGE, which is more efficient. The enhancement of the aspect is conducted by computing the weights between the aspect words and the knowledge embeddings.
Accordingly, we select DistMult [35] as the KGE of the proposed model. Every entity within the knowledge base is represented as:

$$y_e = f(W x_e)$$

where $f$ is either a linear or nonlinear function, $W$ is a parameter matrix and $x_e$ is the vector representing an entity. Notably, the relationship representation is typically obtained from the score function. DistMult takes the basic bilinear score function:

$$g_r^b(y_{e_1}, y_{e_2}) = y_{e_1}^\top M_r y_{e_2}$$

where the relation matrix $M_r$ is a diagonal matrix, whilst $y_{e_1}$ and $y_{e_2}$ are the vector representations of entities $x_{e_1}$ and $x_{e_2}$, respectively. The aspect-based knowledge embedding $H_{kg}$ can be obtained by computing the attentive weight between the aspect and its knowledge embeddings:

$$H_{kg} = \mathrm{Att}(\mathrm{DistMult}(x_e), H_a)$$
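The aspect enhancement step, attending over the knowledge embeddings and pooling them into $H_{kg}$, can be sketched as below. The entity embeddings stand in for trained DistMult vectors, and the dot-product-plus-softmax attention is an assumed reading of the $\mathrm{Att}(\cdot,\cdot)$ operator:

```python
import numpy as np

rng = np.random.default_rng(4)
k, d = 6, 5  # number of neighbor entities and embedding size (illustrative)

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

# Embeddings of k knowledge-graph entities related to the aspect
# (stand-ins for trained y_e = f(W x_e) vectors), plus the aspect
# hidden state H_a from the Bi-GRU.
E = rng.normal(size=(k, d))
H_a = rng.normal(size=d)

# Attentive weight between the aspect and each knowledge embedding,
# then a weighted sum producing the knowledge representation H_kg.
alpha = softmax(E @ H_a)
H_kg = alpha @ E
```

Because this step is a single attention pass over precomputed embeddings, it avoids the graph traversal a GCN-based encoder would need, which is the efficiency argument made above.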

3.5. Information Fusion Module

Having obtained three kinds of information, namely syntactic, semantic and external knowledge information, effectively combining them is of vital importance. The information fusion module is devised to make full use of all three. Both the syntax and the semantics can be considered sentence information, while the external knowledge is supplementary. During information fusion, each type of information has to be controlled within a certain extent to prevent the introduction of noise. Therefore, we compute the attention weights of the syntactic information toward the other two types of information. The attention between $H_{sy}$ and $H_{se}$ is expressed as:

$$\mathrm{Att}(H_{sy}, H_{se}) = \sum_{i=1}^{N} \alpha_i \cdot H_i^{sy}$$
$$\alpha_i = \frac{\exp\big(H_i^{sy\,\top} H_i^{se}\big)}{\sum_{j=1}^{N} \exp\big(H_j^{sy\,\top} H_j^{se}\big)}$$

Likewise, the attention between $H_{sy}$ and $H_{kg}$ is:

$$\mathrm{Att}(H_{sy}, H_{kg}) = \sum_{i=1}^{N} \alpha_i \cdot H_i^{kg}$$

Then, two gating units are established to filter the noise from the input information:

$$H_i^L = \tanh\big(\mathrm{Att}(H_i^{sy}, H_i^{se}) \cdot W_s + b_s\big)$$
$$H_i^K = \mathrm{ReLU}\big(\mathrm{Att}(H_i^{sy}, H_i^{kg}) \cdot W_k + b_k\big)$$

where $W_k$, $W_s$, $b_k$ and $b_s$ are trainable parameters of the proposed model. The aspect-related sentence representation is computed using a cross product operation:

$$H = H^L \times H^K$$
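A sketch of the fusion step, reading $\mathrm{Att}(\cdot,\cdot)$ as position-wise dot-product attention applied back to the syntactic stream, and the final “cross product” as an element-wise product; both are assumed interpretations for illustration, since the text does not pin them down:

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 5, 4  # sequence length and feature size (illustrative)
H_sy, H_se, H_kg = rng.normal(size=(3, n, d))  # syntax / semantic / knowledge features
Ws, Wk_ = rng.normal(size=(2, d, d))
bs, bk = rng.normal(size=(2, d))

def att(Ha, Hb):
    """Softmax over per-position dot products of Ha and Hb,
    applied back to Ha (a sketch of the Att(.,.) term)."""
    score = (Ha * Hb).sum(axis=1)
    alpha = np.exp(score - score.max())
    alpha = alpha / alpha.sum()
    return alpha[:, None] * Ha

# Two gating units filter noise from the fused streams.
H_L = np.tanh(att(H_sy, H_se) @ Ws + bs)           # sentence-information gate
H_K = np.maximum(att(H_sy, H_kg) @ Wk_ + bk, 0.0)  # knowledge-information gate

# Combine the two gated representations multiplicatively.
H = H_L * H_K
```

The tanh gate bounds the sentence-side contribution while the ReLU gate zeroes out negative knowledge-side activations, so each stream can suppress the other where their evidence disagrees.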

3.6. Sentiment Classifier

The sentence representation $H$ is sent to the sentiment classifier for sentiment polarity classification. A fully connected layer is used to obtain the score for each sentiment polarity, and the final sentiment probability distribution of the aspect is determined using a SoftMax classifier:

$$\tilde{H} = \mathrm{ReLU}\big(W_1^\top H + b_1\big)$$
$$\tilde{y} = \mathrm{softmax}(\tilde{H})$$

where $W_1$ and $b_1$ are trainable parameters, and $\tilde{y}$ is the predicted sentiment polarity.
The training of the proposed model is conducted using the cross-entropy loss with regularization, i.e.,

$$L = -\sum_i \sum_{j=1}^{N} y_i^j \log \tilde{y}_i^j$$

where $i$ indexes the samples and $j$ indexes the sentiment polarities, $N$ is the number of sentiment polarities, $y$ is the real sentiment distribution and $\tilde{y}$ is the predicted one.
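The classifier and loss can be checked numerically in a few lines; the regularization term is omitted, matching the loss equation above, and all weights are random stand-ins:

```python
import numpy as np

rng = np.random.default_rng(6)
d, C = 4, 3  # feature size and number of polarities (positive/neutral/negative)
H = rng.normal(size=d)          # fused sentence representation
W1 = rng.normal(size=(d, C))
b1 = rng.normal(size=C)

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

# Fully connected layer + SoftMax, as in the classifier equations.
H_tilde = np.maximum(W1.T @ H + b1, 0.0)  # ReLU(W1^T H + b1)
y_hat = softmax(H_tilde)

# Cross-entropy for one sample whose gold label is index 0;
# the small epsilon guards the log against zero probabilities.
y_true = np.array([1.0, 0.0, 0.0])
loss = -np.sum(y_true * np.log(y_hat + 1e-12))
```

Since $\tilde{y}$ is a proper probability distribution, the per-sample loss is non-negative and vanishes only when the model puts all its mass on the gold polarity.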

4. Experiment

4.1. Dataset

In this experiment, three publicly available benchmark datasets are used for working performance evaluation, which are Laptop14 and Restaurant14 from SemEval2014 [36] and Twitter [37]. All the samples in the experiment are labeled as three different polarities, i.e., positive, neutral and negative. Each sample is a review sentence with the tagged aspect within it. Details of each dataset are exhibited in Table 1.

4.2. Implementation Details

The sentence embeddings are initialized using both GloVe [38] and BERT [39]. The batch sizes for Restaurant14, Laptop14 and Twitter are 32, 64 and 32, respectively. The learning rates of the GloVe-based and BERT-based models are set to 1e-3 and 2e-5, respectively. In addition, the Adam optimizer is adopted during model training.

4.3. Baseline Methods

Aiming to corroborate the working performance of the proposed model, six state-of-the-art methods are taken for comparison.
Syntax- and semantic-based methods:
  • BiGCN [40]: Two graphs, i.e., a global lexical graph and a concept hierarchy graph, are constructed. A bi-level interactive GCN is established to deal with these graphs.
  • R-GAT: An aspect-oriented dependency tree is constructed, which is encoded by a relational graph attention network.
  • AFGCN [41]: An aspect fusion graph is constructed based on the syntax dependency tree, which captures the aspect-related context words.
  • InterGCN [42]: To capture the relation between multiple aspect words, an inter-aspect GCN is devised on the foundation of the AFGCN.
KG-based methods:
  • SK-GCN: A two-GCN-based model that deals with the syntax dependency tree and knowledge graph, respectively.
  • Sentic GCN: The external knowledge from SenticNet is introduced to the GCN, which enhances the sentiment dependency between aspects and their contexts.

4.4. Experiment Results

Table 2 shows the experimental results on all datasets. As presented in Table 2, the proposed model outperforms the state-of-the-art methods on the Restaurant14 and Twitter datasets. Notably, there is a considerable gap between our model and the baselines: the minimum accuracy margins of the GloVe-based and BERT-based models are 3.57% (versus SK-GCN) and 3.15% (versus RGAT+BERT), which are significant. The main reason is that the introduction of external knowledge from Freebase provides a large amount of semantic information and relationships; with this external enhancement of the aspect, the sentiment classification performance is improved. With respect to Laptop14, the Sentic-GCN model performs slightly better than the proposed method. One possible explanation is that syntactic structure plays a more important role in sentiment determination for sentences from Laptop14: the utilization of SenticNet [43] brings information into the adjacency matrices, so that the syntactic information can be extracted via graph convolution. Moreover, BERT pre-training further improves the ALSC results. Since the proposed model is capable of integrating the sentence semantics, the sentence syntax and the external knowledge, we can thus expect better sentiment classification results, with each type of information supplementing the others.

4.5. Impact of GCN Layer Number

The GCN is a key component in the syntax learning module for syntactic information encoding. Accordingly, we explore the optimal number of GCN layers for ALSC. The number of layers is set to 1, 2, 3, 4 and 5, respectively. According to Table 3, two GCN layers obtain the best result in all evaluation settings. Comprehensively, the configuration of the GCN determines the amount of contextual information that is aggregated toward the aspect. A one-layer GCN fails to capture sufficient syntactic information from the sentence. When the number of layers ranges from 3 to 5, the performance of our model declines as the number increases, for two main reasons. Firstly, the number of connected context words grows with the layer number, which introduces syntactic noise. Secondly, after multi-layer graph convolution, the nodes become less distinguishable and their representation vectors tend toward uniformity, which results in the over-smoothing problem of multi-layer GCNs.

4.6. Impact of KGE

We employ four distinguishing KGE methods and investigate their effectiveness in external knowledge enhancement. Table 4 exhibits the ALSC results of the Glove-based model of different KGEs.
TransE, TransR and TransH achieve lower accuracy than DistMult. The reason is that these three translation models determine the word relationship using head and tail entities, rather than semantic information. By contrast, DistMult uses a bilinear method, which is capable of computing the semantic credibility of entities and relationships within the vector space. That is, the introduction of semantic information improves the incorporation of external knowledge, and thus yields better sentiment classification accuracy.

4.7. Run Time and Parametric Amount

To further evaluate the efficiency of the proposed model, the run time for training and testing, as well as the parameter counts of different methods, are compared in Table 5. Both SK-GCN and our model take advantage of the knowledge graph, and our model performs better in both run time and parameter count. In this way, our model shows its superiority over the GCN-based method in dealing with knowledge graphs. On the other hand, the run times of BiGCN and the proposed model are comparable, but the test accuracy of our model is far better than that of RGAT and BiGCN, which indicates a higher working efficiency.

4.8. Case Study

Figure 3 visualizes two attention weight distributions for a given sentence; words in a darker color are of greater weight, and vice versa. The former distribution is produced by the integration of the semantic learning module and syntax learning module alone, while the latter also incorporates the external knowledge. According to Figure 3, with only sentence-related information, more attention is given to words close to the aspect. One can easily see that the opinion word ‘love’ toward the aspect ‘drinks’ obtains a higher attentive weight, as does ‘great’ toward ‘food’. However, for the aspect ‘lychee martini’, few syntactically or semantically related words are identified by the semantic and syntax learning modules. The introduction of external knowledge facilitates the sentiment-word determination for ‘lychee martini’, which contributes to the sentiment classification.

5. Conclusions

In this work, we propose a model that integrates semantics, syntax and external knowledge on the task of ALSC. Aiming to sufficiently incorporate the external information into aspect words, we employ the KGE and aspect-specific attention mechanism to enhance the aspect features. Further, a semantic-learning module and a syntactic-learning module are devised to extract the sentence information. In addition, an information fusion module is established to integrate three types of information for sentiment classification. Experiments are carried out on three benchmark datasets. Our model is the best-performing method compared with the baselines.
Further work will focus on more details of knowledge graph processing; the loss of graph structural information remains an open question.

Author Contributions

Conceptualization, H.Y. and Y.X.; methodology, H.Y.; formal analysis, H.Y. and G.L.; writing—original draft preparation, H.Y.; writing—review and editing, Q.C.; supervision, Y.X. and Q.C.; funding acquisition, Q.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Characteristic Innovation Projects of Guangdong Colleges and Universities (Nos. 2018KTSCX049), the Science and Technology Plan Project of Guangzhou under Grant Nos. 202102080258 and 201903010013.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zhou, J.; Huang, J.X.; Chen, Q.; Hu, Q.V.; Wang, T.; He, L. Deep learning for aspect-level sentiment classification: Survey, vision, and challenges. IEEE Access 2019, 7, 78454–78483. [Google Scholar] [CrossRef]
  2. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
  3. Wang, Y.; Huang, M.; Zhu, X.; Zhao, L. Attention-based LSTM for aspect-level sentiment classification. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–5 November 2016. [Google Scholar]
  4. Yang, M.; Tu, W.; Wang, J.; Xu, F.; Chen, X. Attention based LSTM for target dependent sentiment classification. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31. [Google Scholar]
  5. Zhou, X.; Wan, X.; Xiao, J. Attention-based LSTM network for cross-lingual sentiment classification. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–5 November 2016. [Google Scholar]
  6. Nakagawa, T.; Inui, K.; Kurohashi, S. Dependency tree-based sentiment classification using CRFs with hidden variables. In Proceedings of the Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Los Angeles, CA, USA, 2–4 June 2010. [Google Scholar]
  7. Wu, S.; Xu, Y.; Wu, F.; Yuan, Z.; Huang, Y.; Li, X. Aspect-based sentiment analysis via fusing multiple sources of textual knowledge. Knowl.-Based Syst. 2019, 183, 104868. [Google Scholar] [CrossRef]
  8. Wang, Q.; Mao, Z.; Wang, B.; Guo, L. Knowledge Graph Embedding: A Survey of Approaches and Applications. IEEE Trans. Knowl. Data Eng. 2017, 29, 2724–2743. [Google Scholar] [CrossRef]
  9. Medhat, W.; Hassan, A.; Korashy, H. Sentiment analysis algorithms and applications: A survey. Ain Shams Eng. J. 2014, 5, 1093–1113. [Google Scholar] [CrossRef] [Green Version]
  10. Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C. A comprehensive survey on graph neural networks. IEEE Trans. Neural Networks Learn. Syst. 2020, 32, 4–24. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  11. Zhao, P.; Hou, L.; Wu, O. Modeling sentiment dependencies with graph convolutional networks for aspect-level sentiment classification. Knowl.-Based Syst. 2020, 193, 105443. [Google Scholar] [CrossRef] [Green Version]
  12. Zhang, C.; Li, Q.; Song, D. Aspect-based Sentiment Classification with Aspect-specific Graph Convolutional Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019. [Google Scholar]
  13. Tian, Y.; Chen, G.; Song, Y. Aspect-based sentiment analysis with type-aware graph convolutional networks and layer ensemble. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021. [Google Scholar]
  14. Wang, K.; Shen, W.; Yang, Y.; Quan, X.; Wang, R. Relational Graph Attention Network for Aspect-based Sentiment Analysis. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020. [Google Scholar]
  15. Zhang, Z.; Zhou, Z.; Wang, Y. SSEGCN: Syntactic and Semantic Enhanced Graph Convolutional Network for Aspect-based Sentiment Analysis. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, WA, USA, 10–15 July 2022; pp. 4916–4925. [Google Scholar]
  16. Bie, Y.; Yang, Y.; Zhang, Y. Fusing Syntactic Structure Information and Lexical Semantic Information for End-to-End Aspect-Based Sentiment Analysis. Tsinghua Sci. Technol. 2022, 28, 230–243. [Google Scholar] [CrossRef]
  17. Zhang, D.; Zhu, Z.; Kang, S.; Zhang, G.; Liu, P. Syntactic and semantic analysis network for aspect-level sentiment classification. Appl. Intell. 2021, 51, 6136–6147. [Google Scholar] [CrossRef]
  18. Wu, H.; Zhang, Z.; Shi, S.; Wu, Q.; Song, H. Phrase dependency relational graph attention network for Aspect-based Sentiment Analysis. Knowl.-Based Syst. 2022, 236, 107736. [Google Scholar] [CrossRef]
  19. Phan, H.T.; Nguyen, N.T.; Hwang, D. Convolutional attention neural network over graph structures for improving the performance of aspect-level sentiment analysis. Inf. Sci. 2022, 589, 416–439. [Google Scholar] [CrossRef]
  20. He, J.; Wumaier, A.; Kadeer, Z.; Sun, W.; Xin, X.; Zheng, L. A Local and Global Context Focus Multilingual Learning Model for Aspect-Based Sentiment Analysis. IEEE Access 2022, 10, 84135–84146. [Google Scholar] [CrossRef]
  21. Chen, P.; Lu, Y.; Zheng, V.W.; Chen, X.; Yang, B. KnowEdu: A System to Construct Knowledge Graph for Education. IEEE Access 2018, 6, 31553–31563. [Google Scholar] [CrossRef]
  22. Rotmensch, M.; Halpern, Y.; Tlimat, A.; Horng, S.; Sontag, D. Learning a Health Knowledge Graph from Electronic Medical Records. Sci. Rep. 2017, 7, 5994. [Google Scholar] [CrossRef] [PubMed]
  23. Jia, Y.; Qi, Y.; Shang, H.; Jiang, R.; Li, A. A Practical Approach to Constructing a Knowledge Graph for Cybersecurity. Engineering 2018, 4, 53–60. [Google Scholar] [CrossRef]
  24. Chen, X.; Jia, S.; Xiang, Y. A review: Knowledge reasoning over knowledge graph. Expert Syst. Appl. 2020, 141, 112948. [Google Scholar] [CrossRef]
  25. Zhou, J.; Huang, J.X.; Hu, Q.V.; He, L. SK-GCN: Modeling Syntax and Knowledge via Graph Convolutional Network for aspect-level sentiment classification. Knowl.-Based Syst. 2020, 205, 106292. [Google Scholar] [CrossRef]
  26. Liang, B.; Su, H.; Gui, L.; Cambria, E.; Xu, R. Aspect-based sentiment analysis via affective knowledge enhanced graph convolutional networks. Knowl.-Based Syst. 2022, 235, 107643. [Google Scholar] [CrossRef]
  27. Huang, X.; Zhang, J.; Li, D.; Li, P. Knowledge graph embedding based question answering. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, Melbourne, VIC, Australia, 11–15 February 2019. [Google Scholar]
28. Gerritse, E.J.; Hasibi, F.; de Vries, A.P. Graph-embedding empowered entity retrieval. In European Conference on Information Retrieval; Springer: Cham, Switzerland, 2020. [Google Scholar]
  29. Sun, Z.; Yang, J.; Zhang, J.; Bozzon, A.; Huang, L.K. Recurrent knowledge graph embedding for effective recommendation. In Proceedings of the 12th ACM Conference on Recommender Systems, New York, NY, USA, 2 October 2018. [Google Scholar]
  30. Bordes, A.; Weston, J.; Collobert, R.; Bengio, Y. Learning structured embeddings of knowledge bases. In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 7–8 August 2011. [Google Scholar]
  31. Wang, Z.; Zhang, J.; Feng, J.; Chen, Z. Knowledge graph embedding by translating on hyperplanes. In Proceedings of the AAAI Conference on Artificial Intelligence, Québec City, QC, Canada, 27–31 July 2014; Volume 28. [Google Scholar]
32. Dettmers, T.; Minervini, P.; Stenetorp, P.; Riedel, S. Convolutional 2D knowledge graph embeddings. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32. [Google Scholar]
  33. Vu, T.; Nguyen, T.D.; Nguyen, D.Q. A capsule network-based embedding model for knowledge graph completion and search personalization. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1. [Google Scholar]
  34. Bollacker, K.; Evans, C.; Paritosh, P.; Sturge, T. Freebase: A collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, Vancouver, BC, Canada, 10–12 June 2008. [Google Scholar]
  35. Yang, B.; Yih, W.; He, X.; Gao, J.; Deng, L. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. In Proceedings of the International Conference on Learning Representations (ICLR) 2015, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
36. Pontiki, M.; Galanis, D.; Papageorgiou, H. SemEval-2014 Task 4: Aspect Based Sentiment Analysis. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), Dublin, Ireland, 23–24 August 2014. [Google Scholar]
  37. Dong, L.; Wei, F.; Tan, C.; Tang, D.; Zhou, M. Adaptive recursive neural network for target-dependent twitter sentiment classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Baltimore, MD, USA, 22–27 June 2014. [Google Scholar]
  38. Pennington, J.; Socher, R.; Manning, C.D. Glove: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014. [Google Scholar]
39. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), Minneapolis, MN, USA, 2–7 June 2019. [Google Scholar]
  40. Chen, Z.; Ma, T.; Jin, Z.; Song, Y.; Wang, Y. BiGCN: A bi-directional low-pass filtering graph neural network. arXiv 2021, arXiv:2101.05519. [Google Scholar] [CrossRef]
  41. Zhang, F.; Zhang, Y.; Hou, S.; Chen, F.; Lu, M. Aspect Fusion Graph Convolutional Networks for Aspect-Based Sentiment Analysis. In China Conference on Information Retrieval; Springer: Cham, Switzerland, 2021. [Google Scholar]
  42. Liang, B.; Yin, R.; Gui, L.; Du, J.; Xu, R. Jointly learning aspect-focused and inter-aspect relations with graph convolutional networks for aspect sentiment analysis. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 8–13 December 2020. [Google Scholar]
43. Cambria, E.; Speer, R.; Havasi, C.; Hussain, A. SenticNet: A publicly available semantic resource for opinion mining. In Proceedings of the 2010 AAAI Fall Symposium Series, Arlington, VA, USA, 11–13 November 2010. [Google Scholar]
Figure 1. Syntax dependency tree.
Figure 2. Model architecture.
Figure 3. Attention weights to aspects ‘drinks’, ‘lychee martini’ and ‘food’.
Table 1. Statistics of datasets.

| Dataset | Positive (Train) | Positive (Test) | Negative (Train) | Negative (Test) | Neutral (Train) | Neutral (Test) |
|---|---|---|---|---|---|---|
| Restaurant | 2164 | 728 | 805 | 196 | 633 | 196 |
| Laptop | 987 | 341 | 866 | 128 | 466 | 169 |
| Twitter | 1560 | 173 | 1560 | 173 | 3127 | 346 |
Table 2. Experimental results.

| Models | Restaurant Acc | Restaurant F1 | Laptop Acc | Laptop F1 | Twitter Acc | Twitter F1 |
|---|---|---|---|---|---|---|
| BiGCN | 81.96 | 73.53 | 74.61 | 71.19 | 74.13 | 72.64 |
| AFGCN | 81.70 | 73.43 | 76.80 | 72.88 | - | - |
| InterGCN | 82.23 | 72.81 | 77.12 | 72.87 | - | - |
| RGAT | 83.30 | 76.08 | 77.74 | 73.91 | 73.12 | 72.40 |
| SK-GCN | 81.53 | 72.90 | 77.62 | 73.84 | 71.97 | 70.22 |
| Sentic-GCN | 84.03 | 75.38 | 77.90 | 74.71 | - | - |
| Ours | 84.23 | 76.12 | 77.81 | 73.47 | 77.70 | 76.27 |
| BERT only | 84.11 | 76.66 | 77.90 | 73.30 | 73.27 | 71.52 |
| AFGCN+BERT | 86.16 | 79.34 | 80.88 | 77.24 | - | - |
| RGAT+BERT | 86.61 | 80.99 | 78.53 | 74.06 | 75.72 | 74.60 |
| InterGCN+BERT | 86.43 | 80.75 | 82.29 | 78.90 | - | - |
| Ours+BERT | 86.93 | 81.05 | 82.41 | 79.32 | 78.87 | 77.97 |
Table 3. ALSC accuracy in line with GCN layer numbers.

| Num of GCN Layers | Restaurant | Laptop | Twitter |
|---|---|---|---|
| 1 | 83.68 | 76.94 | 77.13 |
| 2 | 84.23 | 77.81 | 77.70 |
| 3 | 83.12 | 76.55 | 76.81 |
| 4 | 82.83 | 75.98 | 76.44 |
| 5 | 82.35 | 75.50 | 75.93 |
Table 4. ALSC results of different KGE methods.

| KGE Methods | Restaurant | Laptop | Twitter |
|---|---|---|---|
| TransE | 82.37 | 77.13 | 75.89 |
| TransR | 82.65 | 77.44 | 75.80 |
| TransH | 82.80 | 77.63 | 76.12 |
| DistMult | 84.23 | 77.81 | 77.70 |
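For context on the comparison in Table 4, the translation-based methods (TransE and its variants) and the bilinear DistMult model score a knowledge triple (head, relation, tail) in structurally different ways. The sketch below is illustrative only and is not taken from the paper; the toy vectors and function names are ours.

```python
import numpy as np

def transe_score(h, r, t):
    # TransE models a relation as a translation in embedding space:
    # a plausible triple satisfies t ~ h + r, so the (negated)
    # distance ||h + r - t|| serves as the plausibility score.
    return -np.linalg.norm(h + r - t, ord=2)

def distmult_score(h, r, t):
    # DistMult is a bilinear model with a diagonal relation matrix,
    # equivalent to an element-wise product summed over dimensions.
    return float(np.sum(h * r * t))

# Toy 3-dimensional embeddings (hypothetical values).
h = np.array([0.5, 0.1, -0.2])
r = np.array([0.3, -0.1, 0.4])
t = np.array([0.8, 0.0, 0.2])

print(transe_score(h, r, t))   # near 0, since h + r is close to t
print(distmult_score(h, r, t))
```

In both families a higher score indicates a more plausible triple; training pushes scores of observed triples above those of corrupted (negative) ones.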
Table 5. Results of run time and the parameter amount of different methods.

| Method | Training Speed (secs.) | Params (M) |
|---|---|---|
| BiGCN | 6.88 | 1.9 |
| RGAT | 4.22 | 3.9 |
| SK-GCN | 8.92 | 8.5 |
| Ours | 7.43 | 7.8 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Yu, H.; Lu, G.; Cai, Q.; Xue, Y. A KGE Based Knowledge Enhancing Method for Aspect-Level Sentiment Classification. Mathematics 2022, 10, 3908. https://doi.org/10.3390/math10203908