Article

Attentional Interactive Encoder Network Focused on Aspect for Sentiment Classification

1
China Unicom Research Institute, Beijing 100048, China
2
School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
3
School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing 100876, China
4
School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
5
School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
*
Author to whom correspondence should be addressed.
Electronics 2023, 12(6), 1329; https://doi.org/10.3390/electronics12061329
Submission received: 7 February 2023 / Revised: 4 March 2023 / Accepted: 8 March 2023 / Published: 10 March 2023
(This article belongs to the Special Issue Big Data Analytics, Emerging Technologies and Its Applications)

Abstract
Aspect-based sentiment analysis (ABSA) plays a significant role in the field of big data and aims to distinguish the sentiment polarity of specific aspects in given sentences; however, previous works on ABSA had two limitations: they mainly considered semantic features rather than syntactic dependency features, and they paid too much attention to the context words while ignoring the high-level interaction of multiple representations of the aspects themselves. To cope with these limitations, we propose a new method based on the graph convolutional network (GCN) and the multi-head attention mechanism, called the attentional interactive encoder network (AIEN). The GCN is used to obtain the syntactic information that has the greatest syntactic impact on the aspect, based on the syntax dependency tree. The multi-head attention mechanism can obtain not only the context-aware information of the given aspects, but also the interaction information between multiple representations of the aspect itself. The high-level information generated by the interaction of multi-dimensional features produces a stronger representation of the aspect. Our experiments with the proposed model on five benchmark datasets showed that our model significantly outperformed other works. The experimental results further demonstrated the feasibility and applicability of our proposed model in the ABSA task.

1. Introduction

As an emerging technology, big data analysis aims to collect massive amounts of data, analyze and mine the hidden relationships between datasets, and build corresponding models [1,2]. Due to the rapid development of AI technology, the use of deep learning methods for data analysis has become the norm in industry and academia [3]. In natural language processing, sentiment analysis is a common data analysis task, widely used in various text and voice services [4,5,6]. Among these, aspect-based (also known as aspect-level) sentiment analysis (ABSA) is a fine-grained sentiment analysis task for a given aspect [7]. It determines the sentiment polarity of each aspect of a sentence, rather than outputting a single sentiment polarity for the whole sentence. The sentiment polarity of a particular aspect term is closely related to its contextual words and syntactic dependencies, so the polarity of each aspect in a sentence is usually distinct. For example, in the sentence “The pizza is delicious, but the service is terrible.”, “pizza” and “service” have different sentiment polarities.
Early work applied rule-based or statistical methods to ABSA tasks, which relied heavily on manual feature engineering [8]. More recently, machine learning and neural networks have been broadly utilized in ABSA tasks [9]. The mainstream approach is to develop a neural network structure that can learn the features of aspects and context words, such as semantic features. Because long short-term memory (LSTM) can extract the sequential characterizations of sentences [10] and the attention mechanism can emphasize the context words that are important for a given aspect while reducing the interference of irrelevant information [11], these two techniques are widely used in ABSA tasks. However, the attention mechanism has some limitations. Generally speaking, it cannot capture the complex syntactic dependency between aspect and context words, because it only outputs their relative importance. For example, in “Its size is ideal and its weight is acceptable”, the attention mechanism may misinterpret “acceptable” as a key contextual word describing size. To capture the relationship between aspect and context, recent research has tried to obtain the syntactic dependency between these two in ABSA based on a graph convolutional network (GCN) [12,13]. Nevertheless, these works focused more on context words and less on the interactive information between context words and aspects.
In short, current research on ABSA has two limitations. First, previous works mainly considered semantic features rather than syntactic dependency features [14]. They focused on context-aware features without considering the aspects themselves, and they also ignored some of the syntactic dependencies linking the aspect to context words, which can prevent them from identifying the context words most relevant for determining the sentiment polarity of a particular aspect. Second, some works modeled the direct interaction between context words and aspects, but ignored the interaction between the features of multiple representations of the aspect itself [12,13]. However, these multiple representations of an aspect contain rich information, covering semantics, syntax, context-aware features, etc. By letting these comprehensive representations interact to obtain high-level features, the various sources of information about the aspect can be combined and the aspect representation improved, so that the sentiment polarity of the aspect can be described more effectively.
Hence, we designed an attentional interactive encoder network (AIEN) to tackle these two problems. In the AIEN, the main concern of this paper is how to explore the semantic and syntactic dependencies centered on the aspect itself. Firstly, the semantic features of the context and the aspect in a sentence are captured separately by LSTM. Then, we use the GCN to obtain the syntactic dependency features from the graph constructed according to the syntactic dependency tree. A simple schematic of syntactic dependency analysis is shown in Figure 1.
Interactive features can integrate multiple types of features to obtain a better representation. Therefore, after acquiring the multi-dimensional representation features of the aspect and context words, we acquire high-level interaction features centered on the aspect. We propose a strategy based on the attention mechanism to obtain interactive information on aspects from these various features. Considering the multiple semantic and syntactic representations of aspects and the influence of context words on aspects, we obtain two types of high-level interaction features of aspects based on the multi-head attention mechanism: the interaction between multiple representations of the aspect itself and the interaction between aspects and context words. These interactive features improve the basic characterization ability of aspects.
Our contributions are as follows:
1. We endeavored to investigate multiple representations of aspects, including the semantic features of the aspect word itself, as well as the semantic and syntactic dependency features in the whole sentence.
2. We propose a high-level interactive feature extraction strategy focused on the aspect via the attention mechanism. The generated high-level interaction features consider both semantic characteristics and syntactic dependence at the same time.
3. The extensive experimental results showed that our model performed better than the state-of-the-art models. Our ablation experiments additionally confirmed the significance of semantic features and syntactic dependence information; meanwhile, they also demonstrated the effectiveness of interaction features focused on aspect in the ABSA task.
The rest of this paper is organized as follows. Section 2 reviews existing works related to ours. Section 3 presents the architecture of our model, the AIEN. Section 4 presents and analyzes the experimental results. Finally, Section 5 concludes this work.

2. Related Works

2.1. Aspect-Based Sentiment Analysis

As a fine-grained sentiment classification task, ABSA has attracted the attention of researchers and has gradually become an important branch in the field of sentiment analysis. In early studies, researchers mainly extracted features of the specific aspect through rule-based and statistical methods. Most of these methods are manual and labor-intensive [15]. With the emergence of deep learning (DL), many scholars have turned to exploring ABSA tasks by introducing neural networks. In the beginning, researchers only extracted the semantics of the text through LSTM, without considering the relationship between aspect and context [9,16]. Later research recognized the importance of this relationship and proposed using LSTM and the attention mechanism to determine the sentiment polarity of aspects by jointly considering aspect and context. Because of the excellent performance of attention in natural language processing (NLP) and computer vision (CV) tasks, many works have also tried to enhance the information in context words relevant to the aspect through attention methods in the ABSA task [17,18,19]. While enjoying the benefits brought by the attention mechanism, researchers have also considered integrating other methods, such as feature interaction and regularization, to improve model performance on ABSA tasks. Reference [20] proposed an interactive attention network (IAN) to interactively learn contextual and aspect attention and generate representations for aspect and context, respectively. Reference [14] used an attentional encoder network to model the relationship between the context and the target and introduced label smoothing regularization to improve the results. These methods model only the semantic information of aspect and context words; they lack information about syntactic dependencies. Thus, irrelevant context words may be involved in determining the sentiment polarity of the aspect.

2.2. Graph Convolutional Network

The graph convolutional network (GCN) proposed by [21] can effectively process graph-structured data containing rich inter-node relationship information. The GCN has been introduced into computer vision and applied to the tasks of image segmentation and video detection [22,23]. It has also been applied to social network mining and recommendation systems, building graphs from the relationships between users or between users and items [24]. Furthermore, the GCN has been extended to NLP tasks such as text classification [25], machine translation [26], and relation extraction [27]. In recent years, some works have attempted to introduce the GCN into sentiment analysis. Reference [12] first built a GCN on the syntactic dependency tree to use syntactic information to handle syntactic constraints and long-distance word dependencies. Reference [28] combined the attention mechanism and the GCN to capture the syntactic dependencies between different aspects of a sentence. Reference [13] proposed a model composed of the attention mechanism and a graph convolutional network built on sentence dependency trees. Reference [29] proposed a directed GCN, which performs joint extraction and sentiment analysis by encoding syntactic information. Although these methods attempt to obtain syntactic dependency features through the GCN, they do not consider the feature interactions of aspect and context words at the same time. In this paper, we propose a novel method called the AIEN to obtain comprehensive interactive features focused on the aspect, based on the GCN and the attention mechanism.

3. Methodology

The architecture of our developed AIEN model is shown in Figure 2. In the figure, “Embedding” represents GloVe embedding or pre-trained BERT embedding; “Hidden states” represents the Bi-LSTM; “GCN Layer” represents the layers in the GCN; “Mask Layer” denotes the layer of the aspect-specific masking technique; “MHA” refers to multi-head attention.

3.1. Embedding

Given a sentence $s = \{w_1^s, w_2^s, \ldots, w_\tau^s, w_{\tau+1}^s, \ldots, w_{\tau+m}^s, \ldots, w_n^s\}$ of $n$ words, in which an $m$-word aspect spans the $(\tau+1)$-th to the $(\tau+m)$-th token, we map each word to a low-dimensional real-valued vector space through word embedding. The embedding matrix is denoted as $E \in \mathbb{R}^{d \times |V|}$, where $d$ is the embedding dimension of the word vectors and $|V|$ is the size of the vocabulary. In our work, the pre-trained word embeddings of GloVe [30] and BERT [31] are used to initialize the word embeddings.
For the BERT embedding layer, BERT uses a multi-layer transformer encoder architecture; here, we used the most basic BERT model, which has 12 transformer layers. The training of the BERT model is divided into two steps: pre-training and fine-tuning. After pre-training, BERT returns two feature vectors, which contain the context information extracted for the aspect and the context, respectively. Then, we fine-tuned the model: for simplicity, we added a dropout layer after the output of the model, followed by a linear layer. After word embedding, the $n$-word sentence and the $m$-word target are encoded as the embedding vectors $c = \{e_1^c, e_2^c, e_3^c, \ldots, e_n^c\}$ and $t = \{e_1^t, e_2^t, e_3^t, \ldots, e_m^t\}$, respectively.

3.2. Semantic Information Encoding

3.2.1. Bidirectional LSTM

We employed a bidirectional LSTM (Bi-LSTM) in the layer after the embedding to capture the context information of each word. A Bi-LSTM can effectively use both the preceding and following words as contextual information and then summarize the information from the two directions to obtain word features.
The word embedding vectors including context and target words are sent into Bi-LSTM, respectively. Then, the Bi-LSTM produces the hidden state vector for contexts and the target, respectively, as follows:
$H^c = \{h_1^c, h_2^c, h_3^c, \ldots, h_n^c\}$
$H^t = \{h_1^t, h_2^t, h_3^t, \ldots, h_m^t\}$
where $h_i^c, h_i^t \in \mathbb{R}^{2 d_h}$ represent the hidden state vectors at time step $i$ of the Bi-LSTM and $d_h$ is the dimension of the hidden state vector output by a unidirectional LSTM.
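The Bi-LSTM encoding above can be sketched in plain NumPy. This is a minimal, single-example illustration under our own assumptions (gate layout, parameter names); it is not the authors' implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(X, W, U, b, d_h):
    """Run a unidirectional LSTM over X of shape (n, d_in).
    W: (4*d_h, d_in), U: (4*d_h, d_h), b: (4*d_h,). Assumed gate order: i, f, o, g."""
    n = X.shape[0]
    h, c = np.zeros(d_h), np.zeros(d_h)
    H = np.empty((n, d_h))
    for t in range(n):
        z = W @ X[t] + U @ h + b
        i = sigmoid(z[:d_h])            # input gate
        f = sigmoid(z[d_h:2 * d_h])     # forget gate
        o = sigmoid(z[2 * d_h:3 * d_h]) # output gate
        g = np.tanh(z[3 * d_h:])        # candidate cell state
        c = f * c + i * g
        h = o * np.tanh(c)
        H[t] = h
    return H

def bilstm(X, params_fwd, params_bwd, d_h):
    # Concatenate forward and backward passes, giving h_i in R^{2*d_h} as in the text.
    Hf = lstm_forward(X, *params_fwd, d_h)
    Hb = lstm_forward(X[::-1], *params_bwd, d_h)[::-1]
    return np.concatenate([Hf, Hb], axis=-1)
```

In practice one would use a framework LSTM layer; the sketch only makes the $2 d_h$-dimensional output concrete.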

3.2.2. Aspect-Specific Masking

To focus on aspect features, we designed a masking mechanism that filters out non-aspect words and keeps solely the aspect-specific features:
$h_t^L = 0, \quad 1 \le t < \tau+1 \ \text{or} \ \tau+m < t \le n$
$H_{mask} = \{0, \ldots, h_{\tau+1}, \ldots, h_{\tau+m}, \ldots, 0\}$
As shown in Equation (2), $H_{mask}$ represents the semantic information filtered through the masking mechanism. In our work, we obtained the aspect-focused context-aware features through this masking mechanism. The masked information focuses on the aspect, so the aspect carries rich contextual semantic information.
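The masking step amounts to zeroing every non-aspect position of the hidden-state matrix. A minimal NumPy sketch (0-indexed, with a hypothetical helper name):

```python
import numpy as np

def aspect_mask(H, tau, m):
    """Zero out hidden states of non-aspect tokens.

    H: (n, d) hidden-state matrix; the aspect spans tokens
    tau .. tau+m-1 in 0-indexed terms. Illustrative only.
    """
    n, _ = H.shape
    mask = np.zeros((n, 1))
    mask[tau:tau + m] = 1.0   # keep only the aspect span
    return H * mask
```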

3.3. Syntactic Information Encoding

3.3.1. Position Encoding

Empirically, we found that the polarity of a specific aspect is more easily affected by the context words that are closer to the aspect. Therefore, we employed position encoding to model this prior before feeding the features into the subsequent GCN layer. Formally, given an aspect indexed from $\tau+1$ to $\tau+m$, the relative distance weight of the $i$-th word is defined as follows:
$q_i = \begin{cases} 1 - \frac{\tau + 1 - i}{n} & 1 \le i < \tau + 1 \\ 1 & \tau + 1 \le i \le \tau + m \\ 1 - \frac{i - \tau - m}{n} & \tau + m < i \le n \end{cases}$
where $q_i \in \mathbb{R}$ is the position weight of the $i$-th token. In Equation (4), $q_i = 1$ means that the word is an aspect word: the position distance weight of the aspect is 1, and the weight gradually decreases on both sides moving away from the aspect. Finally, we obtain the position-aware representation:
$H = \{h_1^L, \ldots, h_{\tau+1}^L, \ldots, h_{\tau+m}^L, \ldots, h_n^L\}$
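The piecewise weighting of Equation (4) can be written directly. The sketch below keeps the paper's 1-indexed convention; the function name is our own:

```python
import numpy as np

def position_weights(n, tau, m):
    """Relative distance weights of Eq. (4): 1 inside the aspect span,
    decaying linearly with distance from the span outside it."""
    q = np.empty(n)
    for i in range(1, n + 1):          # 1-indexed tokens, as in the paper
        if i < tau + 1:
            q[i - 1] = 1 - (tau + 1 - i) / n
        elif i <= tau + m:
            q[i - 1] = 1.0
        else:
            q[i - 1] = 1 - (i - tau - m) / n
    return q
```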

3.3.2. Graph Convolutional Network

The graph convolutional network (GCN) can effectively capture the connectivity of a graph through information propagation between its nodes; even with randomly initialized, untrained parameters $W$, the features extracted by a GCN can be useful. We therefore use the GCN to model the sentiment-relevant dependencies between aspect and context in the ABSA problem. We applied the GCN to the syntactic dependency tree of each sentence and, following common GCN practice, made each word adjacent to itself, that is, the diagonal elements of the adjacency matrix are ones. The representation of each node is updated by a graph convolutional operation with a normalization factor, as shown below:
$h_i^{(k)} = f\!\left(W^{(k)} \cdot \frac{\sum_{u \in N(i)} h_u^{(k-1)}}{|N(i)|} + B \cdot h_i^{(k-1)}\right)$
where $h_i^{(k)}$ is node $i$'s embedding at step $k$ and $h_i^{(0)}$ is node $i$'s initial embedding. The embedding of a neighbor $u$ of node $i$ at step $k-1$ is $h_u^{(k-1)}$, so $\frac{\sum_{u \in N(i)} h_u^{(k-1)}}{|N(i)|}$ is the mean of node $i$'s neighbors' embeddings at step $k-1$. For each step $k$, the function $f$ and the matrices $W^{(k)}$ and $B$ are learnable parameters shared across all nodes. In particular, $h_i^{(0)} = x_i$, the feature description of node $i$, summarized in an $N \times D$ feature matrix $X$ ($N$: number of nodes, $D$: number of input features). The node-level output after $L$ layers is $Z$, an $N \times F$ feature matrix, where $F$ is the number of output features per node.
In our work, we used a two-layer GCN to exploit text information, and the calculation can be indicated as follows:
$Z^{(0)} = \mathrm{ReLU}(\tilde{A} X W^{(0)})$
$Z^{(1)} = \mathrm{softmax}(\tilde{A} Z^{(0)} W^{(1)})$
where $Z^{(0)}$ is the output of the first layer, $Z^{(1)}$ is the output of the second layer, and $\mathrm{ReLU}$ is a nonlinear activation function. The final output is $H^g$:
$H^g = \mathrm{softmax}(\tilde{A}\, \mathrm{ReLU}(\tilde{A} X W^{(0)}) W^{(1)})$
Then, by analogy with the aspect semantic feature extraction from the context, we also applied the masking mechanism to obtain the syntactic dependency features of the aspect in the context. As shown in Equation (8), $H^g$ is the syntax-dependent feature of the aspect.
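The two-layer GCN above can be sketched in NumPy. This sketch adds self-loops on the diagonal and uses mean normalization for the $1/|N(i)|$ factor; the function and variable names are our own assumptions:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gcn_forward(A, X, W0, W1):
    """Two-layer GCN forward pass over the dependency graph.

    A: (N, N) adjacency matrix of the dependency tree (no self-loops yet).
    X: (N, D) node features; W0, W1: layer weights.
    """
    A_tilde = A + np.eye(A.shape[0])                    # each word adjacent to itself
    A_hat = A_tilde / A_tilde.sum(axis=1, keepdims=True)  # mean over neighbors
    Z0 = relu(A_hat @ X @ W0)                           # first layer
    return softmax(A_hat @ Z0 @ W1)                     # second layer -> H^g
```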

3.4. Aspectwise Feature Interaction

3.4.1. Multi-Head Attention

Our model is based on multi-head attention (MHA), which can be performed in parallel subspaces. The main idea of MHA is to split the attention parameters (Query, Key, and Value) into multiple subspaces; after the split, each subspace can attend to a different feature space, improving the performance of the model, and the results of the subspaces are finally concatenated. We define a Key sequence $k = \{k_1, k_2, \ldots, k_n\}$ and a Query sequence $q = \{q_1, q_2, \ldots, q_n\}$. The attention is then calculated as follows:
$\mathrm{Attention}(k, q) = \mathrm{softmax}(f_m(k, q))\, k$
where $f_m$ is the function used to calculate and learn the semantic relation between $q_j$ and $k_i$:
$f_m(k_i, q_j) = \tanh([k_i; q_j] \cdot W_a)$
where $W_a \in \mathbb{R}^{2 d_{hid}}$ is a learnable parameter matrix. MHA is able to learn the scores of $n_{head}$ different heads in parallel subspaces, and the parameters are not shared between these heads, because the values of $q$ and $k$ are continuously changing. The outputs of the $n_{head}$ heads are concatenated and projected to the hidden dimension $d_{hid}$ by
$\mathrm{MHA} = (o^1 \oplus o^2 \oplus \cdots \oplus o^{n_{head}}) \cdot W_m$
$o^h = \mathrm{Attention}_h(k, q)$
where “$\oplus$” represents vector concatenation, $W_m \in \mathbb{R}^{d_{hid} \times d_{hid}}$, $o^h = \{o_1^h, o_2^h, \ldots, o_m^h\}$ is the output of the $h$-th head's attention, and $h \in [1, n_{head}]$.
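The attention and multi-head equations above can be sketched as follows: one scalar score per key-query pair via the tanh scorer, one $W_a$ per head, then concatenation and projection. All names are hypothetical and the loops are kept explicit for clarity:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(k, q, Wa):
    """Score each (k_i, q_j) pair with tanh([k_i; q_j] . W_a),
    then reweight the keys with the softmax-normalized scores."""
    n = k.shape[0]
    scores = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            scores[i, j] = np.tanh(np.concatenate([k[i], q[j]]) @ Wa)
    return softmax(scores) @ k

def multi_head(k, q, Was, Wm):
    # One attention head per W_a; concatenate head outputs, then project by W_m.
    heads = [attention(k, q, Wa) for Wa in Was]
    return np.concatenate(heads, axis=-1) @ Wm
```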

3.4.2. Aspect to Aspect Interaction

Through the above work, we obtained the context semantic features $H^c$, the syntactic features $H^g$, and the semantic features $H^a$ of the aspect itself. Since the different representation forms of the aspect contain different information, high-level features are obtained through their interaction. We realized the interaction through MHA, including the interaction between context-aware semantic features and syntactic features, and the interactions between the aspect's own features and the context-aware features. The calculation formulas of these three groups of aspect-focused interaction features are as follows:
$H^{ca} = \mathrm{MultiHead}(H^c, H^a)$
$H^{ga} = \mathrm{MultiHead}(H^g, H^a)$
$H^{cg} = \mathrm{MultiHead}(H^c, H^g)$
where $H^{ca} = \{h_1^{ca}, h_2^{ca}, \ldots, h_n^{ca}\} \in \mathbb{R}^{d_h \times n}$, $H^{ga} = \{h_1^{ga}, h_2^{ga}, \ldots, h_n^{ga}\} \in \mathbb{R}^{d_h \times n}$, $H^{cg} = \{h_1^{cg}, h_2^{cg}, \ldots, h_n^{cg}\} \in \mathbb{R}^{d_h \times n}$, and $d_h$ is the dimension of the MHIA output.

3.5. Output and Training

After obtaining the aspect-focused contextual feature representation $h^{ga}$, the semantic feature representation $h^{ca}$, and the syntactic feature representation $h^{cg}$, we first obtain the averaged vectors by average pooling and then concatenate the aspect feature with the interactive feature between aspect and context, the interactive feature between aspect and GCN context, the interactive feature between context and GCN context, and the GCN context feature. The final feature representation $u$ is as follows:
$u = h_{avg}^c \oplus h_{avg}^{ca} \oplus h_{avg}^{ga} \oplus h_{avg}^{cg} \oplus h_{avg}^a$
where “$\oplus$” represents vector concatenation and $h_{avg}$ stands for $\frac{\sum_{i=1}^{n} h_i}{n}$. Finally, the probability distribution over the different target sentiment polarities is obtained through the softmax output layer, as follows:
$x = W_u^{\top} u + b_u$
$y = \mathrm{softmax}(x), \quad y_k = \frac{\exp(x_k)}{\sum_{j=1}^{C} \exp(x_j)}$
Our model was trained by minimizing the cross-entropy with the L2-regularization term. For a given sentence, the loss function is represented by
$\mathrm{Loss} = -\sum_{(c, l) \in C} \log P_l + \lambda \|\Theta\|^2$
where $C$ is the collection of datasets, $l$ is the label, $P_l$ denotes the $l$-th element of $P$, $\Theta$ represents all of the trainable parameters, and $\lambda$ is the regularization coefficient.
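The output and loss formulas above can be sketched in NumPy: average-pool each feature matrix, concatenate the five pooled vectors, project through a linear layer with softmax, and compute cross-entropy plus an L2 term. All names are illustrative assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def predict(feats, Wu, bu):
    """Average-pool each (n, d) feature matrix, concatenate into u,
    then produce the class distribution via a linear layer and softmax."""
    u = np.concatenate([F.mean(axis=0) for F in feats])
    return softmax(Wu.T @ u + bu)

def total_loss(probs, labels, params, lam):
    # Cross-entropy over the dataset plus L2 regularization on all parameters.
    ce = -sum(np.log(p[l]) for p, l in zip(probs, labels))
    return ce + lam * sum(np.sum(W ** 2) for W in params)
```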

4. Experiments

This section presents the experimental setup, including the datasets, evaluation metrics, and baselines, and demonstrates the effectiveness of our proposed model in comparison with other models. The strengths of the model for the ABSA task are highlighted through an ablation experiment, a parameter experiment, and a case study.

4.1. Experiment Setting

4.1.1. Datasets

To comprehensively evaluate the performance of the AIEN, we conducted experiments on five real datasets. The TWITTER dataset was originally constructed by [16] and contains Twitter posts, while the other four datasets (LAP14, REST14, REST15, REST16) were derived from SemEval 2014 Task 4 [32], SemEval 2015 Task 12 [33], and SemEval 2016 Task 5 [34]. These four datasets include two categories: laptop and restaurant. Each sample consists of the review sentences, aspects, and sentiment polarity towards the aspects. These datasets are labeled with three sentiment polarities: positive, neutral, and negative. For the fairness and rationality of the experiment, we used the same training set, validation set, and test set data distribution as all other baseline models. Table 1 shows the number of training and test instances in each dataset.

4.1.2. Evaluation Metrics

The experimental results were obtained by averaging over three runs with random initialization; accuracy and the macro-averaged F1 were adopted as the evaluation metrics.
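For reference, accuracy and macro-averaged F1 over the three polarity classes can be computed as follows (a plain NumPy sketch; in practice a library metric function would be used):

```python
import numpy as np

def accuracy(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return (y_true == y_pred).mean()

def macro_f1(y_true, y_pred, n_classes=3):
    """Unweighted mean of per-class F1 scores (positive, neutral, negative)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    f1s = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return float(np.mean(f1s))
```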

4.1.3. Baseline

To comprehensively evaluate the performance of the proposed AIEN, we compared our model with a range of baselines and state-of-the-art models, as listed below:
  • SVM [35] is a traditional method based on complicated feature engineering.
  • LSTM [36] extracts features through LSTM and then applies a softmax function to obtain the sentiment polarity.
  • The MemNet [37] uses a multi-hop architecture and handles contexts as external memory.
  • AOA [38] migrates the attention-over-attention method from the field of machine translation and applies it to the field of ABSA.
  • The IAN [20] proposes an interactive attention network model based on LSTM and the attention mechanism, modeling target and context, respectively, then interacting with them, and finally, obtaining the results.
  • The TNet-LF [39] proposes a technique for producing target-specific word representations that incorporates a mechanism for keeping the original contextual data from the RNN layer.
  • The AEN [14] eliminates recurrence and models the relationship between the contexts and the specific targets using an attentional encoder network.
  • The ASGCN [12] proposes a graph convolutional network (GCN) on the dependency tree of sentences to make use of syntactic information and word dependence.
  • The AEGCN [13] proposes the multi-head attention and an improved graph convolutional network built over the dependency tree of a sentence.
  • The SK-GCN [40] utilizes both the syntax-based and knowledge-based GCN to describe dependency trees and knowledge graphs.
In order to fairly evaluate our method, all experiments in this paper extracted the embedding vector with the GloVe pre-trained model. Furthermore, we also used the BERT pre-trained model and fine-tuned it in combination with this task.

4.2. Performance Comparison

The overall performance of all the models is shown in Table 2, from which several observations can be noted. The AIEN consistently outperformed all compared models on the TWITTER, LAP14, and REST14 datasets.
The deep-learning-based AIEN performed better than traditional machine learning methods. In the table above, the SVM model proposed by Kiritchenko et al. uses SVM for classification, which depends on a large amount of manual feature extraction. On the TWITTER, LAP14, and REST14 datasets, the accuracy of our model was 10.97%, 5.44%, and 1.74% higher than that of SVM, without any manual feature extraction. This result showed that deep learning models are well suited to aspect-based sentiment analysis.
In addition, the masking mechanism preserves context information for the specific aspect words, which gave some advantages on REST16. We also speculate that the REST16 dataset may suffer from fuzzy feature information and a large amount of noisy data, which affected our model to some extent. The experimental results also showed that the accuracy of our model on the REST15 dataset was slightly lower than that of the SK-GCN, but the AIEN was significantly better than the SK-GCN on the F1 measure. Compared with the benchmark models, the ASGCN and the AEGCN, our model performed better on the various datasets, which showed that the idea of using LSTM and the GCN for feature extraction and performing sentiment classification through the multi-head attention mechanism is sound.
The pre-trained model based on BERT showed excellent performance, and the final classification accuracy was far higher than all existing ABSA models. At the same time, in order to make the BERT pre-trained model better match our emotion classification task, we fine-tuned the model. By adding a simple linear layer and a softmax output layer, the model accuracy and F1 value on various datasets exceeded the basic pre-trained model, which fully proved the ability of BERT in this task. As an embedded layer, BERT showed further improvement after joining our network, the AIEN. These results proved the effectiveness of the AIEN in capturing important syntactic structures of sentiment analysis.

4.3. Study of Our Method

4.3.1. Ablation Experiment

To further understand the impact of each part of our model on its performance, we conducted several ablation experiments, shown in Table 3. All results in the w/o-context experiments were inferior to the full model because the context contains aspect lexical structure information and semantic correlations, which affect the performance of the model. In addition, the results of the w/o-GCN experiments proved that the GCN can be harmoniously combined with our proposed model and plays a positive role. The results of the w/o-mask experiments further emphasized the importance of the mask layer and proved that target-oriented feature extraction plays a key role in the ABSA task.
In the experiment without MHIA, the performance on the REST15 dataset improved after deleting MHIA. A possible reason is that, compared with the other datasets, REST15 is more sensitive to syntactic information, and using MHIA causes a small part of the REST15 dataset to lose key feature information.

4.3.2. Parameter Experiment

Since the design of the AIEN model involves MHA and the GCN, we investigated the impact of the number of MHA heads and the number of GCN layers on the model performance, checking the accuracy and macro-F1 of the AIEN on the LAP14 dataset. The results are shown in Figure 3 and Figure 4. When the number of attention heads was three and the number of GCN layers was two, the model obtained the best performance. This supported the parameter choices made in the design of the AIEN model.

4.3.3. Case Study

In this section, to better understand how the AIEN works, we further show the advantages of our proposals through some typical examples from the test set, listed in Figure 5. In Sentence (a), “I have not a bad thing to say about this place.”, there is a negation word, “not”, which can easily lead a model to a wrong prediction. Our model obtains the correct polarity, unaffected by negation words that do not express negation here. In Sentence (b), “great food but the service was dreadful!”, there are two different aspects, “food” and “service”, which makes detecting the target semantics more difficult. Our model can distinguish the different sentiment polarities of the different aspects. In the last instance (c), “I’m not necessarily fanatical about this place, but it was a fun time for low prices”, the sentence structure is long and complex, and it may be difficult for existing models to correctly predict the polarity. However, our model processes such sentences well with the help of modules such as the attention mechanism and the GCN and makes the correct sentiment polarity judgment.

5. Conclusions

Due to the lack of research simultaneously addressing syntactic features and high-level interaction features in traditional works, this paper proposed the AIEN model to overcome these problems. Through the LSTM and GCN layers, we obtained multi-dimensional features including the semantic features of the aspect itself, the context-aware semantic features, and the syntactic features. Based on these features, we constructed the multi-dimensional representation of a specific aspect, composed of the new high-level interactive features, which was used for the final prediction in the ABSA task. We conducted extensive experiments on five datasets and compared the performance with other advanced methods. The experimental results showed that our proposed approach significantly improves the effectiveness of the model. In future work, we will extend the proposed method to sentences with multiple aspects to verify its scalability.

Author Contributions

Conceptualization, B.Y. and H.L.; methodology, B.Y. and S.T.; software, H.L., S.T. and Y.S.; validation, Y.S.; formal analysis, S.T. and Y.X.; investigation, H.L. and Y.S.; resources, Y.X.; data curation, S.T. and Y.X.; writing—original draft preparation, H.L.; writing—review and editing, H.L., Y.S. and Y.X.; visualization, Y.S.; supervision, Y.X.; project administration, Y.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Chen, M.; Mao, S.; Liu, Y. Big data: A survey. Mob. Netw. Appl. 2014, 19, 171–209. [Google Scholar] [CrossRef]
  2. Mayer-Schönberger, V.; Cukier, K. Big Data: A Revolution That Will Transform How We Live, Work, and Think. In Smart Business Cincinnati/Northern Kentucky; Houghton Mifflin Harcourt: Boston, MA, USA, 2013. [Google Scholar]
  3. Deng, L.; Yu, D. Deep Learning: Methods and Applications. Found. Trends Signal Process. 2014, 7, 197–387. [Google Scholar] [CrossRef] [Green Version]
  4. Elkorany, A.; Hassan, H. Opinion Mining and Sentimental Analysis Approaches: A Survey. Life Sci. J. 2014, 11, 321–326. [Google Scholar]
  5. Muthukumaran, S.; Suresh, P.; Amudhavel, J. A state of art approaches on sentiment analysis techniques. J. Adv. Res. Dyn. Control Syst. 2017, 9, 1353–1370. [Google Scholar]
  6. Sitaula, C.; Basnet, A.; Mainali, A.; Shahi, T.B. Deep learning-based methods for sentiment analysis on Nepali COVID-19-related tweets. Comput. Intell. Neurosci. 2021, 2021, 2158184. [Google Scholar] [CrossRef] [PubMed]
  7. Yadav, K.; Kumar, N.; Kumar Reddy, P.; Reddy Gadekallu, T. A Comprehensive Study on Aspect Based Sentimental Analysis Framework and its Techniques. Int. J. Eng. Syst. Model. Simul. 2021, 12, 279. [Google Scholar]
  8. Talafha, B.; Al-Ayyoub, M.; Abuammar, A.; Jararweh, Y. Outperforming State-of-the-Art Systems for Aspect-Based Sentiment Analysis. In Proceedings of the 2019 IEEE/ACS 16th International Conference on Computer Systems and Applications (AICCSA), Abu Dhabi, United Arab Emirates, 3–7 November 2019; pp. 1–5. [Google Scholar] [CrossRef]
  9. Vo, D.T.; Zhang, Y. Target-dependent twitter sentiment classification with rich automatic features. In Proceedings of the Twenty-fourth International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015. [Google Scholar]
  10. Greff, K.; Srivastava, R.K.; Koutník, J.; Steunebrink, B.R.; Schmidhuber, J. LSTM: A Search Space Odyssey. IEEE Trans. Neural Netw. Learn. Syst. 2016, 28, 2222–2232. [Google Scholar] [CrossRef] [Green Version]
  11. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
  12. Zhang, C.; Li, Q.; Song, D. Aspect-based Sentiment Classification with Aspect-specific Graph Convolutional Networks. arXiv 2019, arXiv:1909.03477. [Google Scholar]
  13. Xiao, L.; Hu, X.; Chen, Y.; Xue, Y.; Gu, D.; Chen, B.; Zhang, T. Targeted sentiment classification based on attentional encoding and graph convolutional networks. Appl. Sci. 2020, 10, 957. [Google Scholar] [CrossRef] [Green Version]
  14. Song, Y.; Wang, J.; Jiang, T.; Liu, Z.; Rao, Y. Attentional encoder network for targeted sentiment classification. arXiv 2019, arXiv:1902.09314. [Google Scholar]
  15. Jiang, L.; Yu, M.; Zhou, M.; Liu, X.; Zhao, T. Target-dependent twitter sentiment classification. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA, 19–24 June 2011; pp. 151–160. [Google Scholar]
  16. Dong, L.; Wei, F.; Tan, C.; Tang, D.; Zhou, M.; Xu, K. Adaptive recursive neural network for target-dependent twitter sentiment classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Baltimore, MD, USA, 22–27 June 2014; pp. 49–54. [Google Scholar]
  17. Wang, Y.; Huang, M.; Zhu, X.; Zhao, L. Attention-based LSTM for aspect-level sentiment classification. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–5 November 2016; pp. 606–615. [Google Scholar]
  18. Wang, B.; Lu, W. Learning latent opinions for aspect-level sentiment classification. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32. [Google Scholar]
  19. Yang, H.; Zeng, B. Enhancing Fine-grained Sentiment Classification Exploiting Local Context Embedding. arXiv 2020, arXiv:2010.00767. [Google Scholar]
  20. Ma, D.; Li, S.; Zhang, X.; Wang, H. Interactive attention networks for aspect-level sentiment classification. arXiv 2017, arXiv:1709.00893. [Google Scholar]
  21. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
  22. Qi, X.; Liao, R.; Jia, J.; Fidler, S.; Urtasun, R. 3d graph neural networks for rgbd semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5199–5208. [Google Scholar]
  23. Xu, M.; Zhao, C.; Rojas, D.S.; Thabet, A.; Ghanem, B. G-tad: Sub-graph localization for temporal action detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 10156–10165. [Google Scholar]
  24. Wang, H.; Zhao, M.; Xie, X.; Li, W.; Guo, M. Knowledge graph convolutional networks for recommender systems. In Proceedings of the World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 3307–3313. [Google Scholar]
  25. Peng, H.; Li, J.; He, Y.; Liu, Y.; Bao, M.; Wang, L.; Song, Y.; Yang, Q. Large-scale hierarchical text classification with recursively regularized deep graph-cnn. In Proceedings of the 2018 World Wide Web Conference, Lyon, France, 23–27 April 2018; pp. 1063–1072. [Google Scholar]
  26. Bastings, J.; Titov, I.; Aziz, W.; Marcheggiani, D.; Sima’an, K. Graph convolutional encoders for syntax-aware neural machine translation. arXiv 2017, arXiv:1704.04675. [Google Scholar]
  27. Zhang, Y.; Qi, P.; Manning, C.D. Graph convolution over pruned dependency trees improves relation extraction. arXiv 2018, arXiv:1809.10185. [Google Scholar]
  28. Zhao, P.; Hou, L.; Wu, O. Modeling sentiment dependencies with graph convolutional networks for aspect-level sentiment classification. Knowl. Based Syst. 2020, 193, 105443. [Google Scholar] [CrossRef] [Green Version]
  29. Chen, G.; Tian, Y.; Song, Y. Joint aspect extraction and sentiment analysis with directional graph convolutional networks. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 8–13 December 2020; pp. 272–279. [Google Scholar]
  30. Pennington, J.; Socher, R.; Manning, C. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; Association for Computational Linguistics: Doha, Qatar, 2014; pp. 1532–1543. [Google Scholar] [CrossRef]
  31. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; Association for Computational Linguistics: Minneapolis, MN, USA, 2019; pp. 4171–4186. [Google Scholar] [CrossRef]
  32. Manandhar, S. Semeval-2014 task 4: Aspect based sentiment analysis. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), Dublin, Ireland, 23–24 August 2014. [Google Scholar]
  33. Pontiki, M.; Galanis, D.; Papageorgiou, H.; Manandhar, S.; Androutsopoulos, I. Semeval-2015 task 12: Aspect based sentiment analysis. In Proceedings of the 9th international workshop on semantic evaluation (SemEval 2015), Denver, CO, USA, 4–5 June 2015; pp. 486–495. [Google Scholar]
  34. Pontiki, M.; Galanis, D.; Papageorgiou, H.; Androutsopoulos, I.; Manandhar, S.; Al-Smadi, M.; Al-Ayyoub, M.; Zhao, Y.; Qin, B.; De Clercq, O.; et al. Semeval-2016 task 5: Aspect based sentiment analysis. In Proceedings of the International Workshop on Semantic Evaluation, San Diego, CA, USA, 16–17 June 2016; pp. 19–30. [Google Scholar]
  35. Kiritchenko, S.; Zhu, X.; Cherry, C.; Mohammad, S. Detecting aspects and sentiment in customer reviews. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval), Dublin, Ireland, 23–24 August 2014; pp. 437–442. [Google Scholar]
  36. Tang, D.; Qin, B.; Feng, X.; Liu, T. Effective LSTMs for target-dependent sentiment classification. arXiv 2015, arXiv:1512.01100. [Google Scholar]
  37. Tang, D.; Qin, B.; Liu, T. Aspect level sentiment classification with deep memory network. arXiv 2016, arXiv:1605.08900. [Google Scholar]
  38. Huang, B.; Ou, Y.; Carley, K.M. Aspect level sentiment classification with attention-over-attention neural networks. In Proceedings of the International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation, Washington DC, USA, 10–13 July 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 197–206. [Google Scholar]
  39. Li, X.; Bing, L.; Lam, W.; Shi, B. Transformation networks for target-oriented sentiment classification. arXiv 2018, arXiv:1805.01086. [Google Scholar]
  40. Zhou, J.; Huang, J.X.; Hu, Q.V.; He, L. Sk-gcn: Modeling syntax and knowledge via graph convolutional network for aspect-level sentiment classification. Knowl. Based Syst. 2020, 205, 106292. [Google Scholar] [CrossRef]
Figure 1. An example of a syntactic dependency parse tree.
Figure 2. The overall architecture of the attentional interactive encoder network (AIEN).
Figure 3. Impact of number of heads in MHA.
Figure 4. Impact of number of GCN layers.
Figure 5. Case study. Visualization of attention scores for Rest14, Rest15, and Rest16.
Table 1. Statistics of datasets used in this paper.
| Dataset | Split | Pos | Neu | Neg | Total |
|---|---|---|---|---|---|
| TWITTER | Train | 1561 | 3127 | 1560 | 6248 |
| TWITTER | Test | 173 | 346 | 173 | 692 |
| LAP14 | Train | 994 | 464 | 870 | 2328 |
| LAP14 | Test | 341 | 169 | 128 | 638 |
| REST14 | Train | 2164 | 637 | 807 | 3608 |
| REST14 | Test | 728 | 196 | 196 | 1120 |
| REST15 | Train | 912 | 36 | 256 | 1204 |
| REST15 | Test | 326 | 34 | 182 | 542 |
| REST16 | Train | 1240 | 69 | 439 | 1748 |
| REST16 | Test | 469 | 30 | 117 | 616 |
Table 2. Model comparison results (%). The top two results on each dataset are in bold. We use “N/A” for unreported experimental results. The comparison models’ results were retrieved from the original papers.
| Model | TWITTER Acc | TWITTER F1 | LAP14 Acc | LAP14 F1 | REST14 Acc | REST14 F1 | REST15 Acc | REST15 F1 | REST16 Acc | REST16 F1 |
|---|---|---|---|---|---|---|---|---|---|---|
| SVM [35] | 63.4 | 63.3 | 70.49 | N/A | 80.16 | N/A | N/A | N/A | N/A | N/A |
| LSTM [36] | 69.56 | 67.7 | 69.28 | 63.09 | 78.13 | 67.47 | 77.37 | 55.17 | 86.8 | 63.88 |
| MemNet [37] | 71.48 | 69.9 | 70.64 | 65.17 | 79.61 | 69.64 | 77.31 | 58.28 | 85.44 | 65.99 |
| AOA [38] | 72.3 | 70.2 | 72.62 | 67.52 | 79.97 | 70.42 | 78.17 | 57.02 | 87.5 | 66.21 |
| IAN [20] | 72.5 | 70.81 | 72.05 | 67.38 | 79.26 | 70.09 | 78.54 | 52.65 | 84.74 | 55.21 |
| TNet-LF [39] | 72.98 | 71.43 | 74.61 | 70.14 | 80.42 | 71.03 | 78.47 | 59.47 | 89.07 | 70.43 |
| AEN [14] | 72.83 | 69.81 | 73.51 | 69.04 | 80.98 | 72.14 | N/A | N/A | N/A | N/A |
| ASGCN-DT [12] | 71.53 | 69.68 | 74.14 | 69.24 | 80.86 | 72.19 | 79.34 | 60.78 | 88.69 | 66.64 |
| ASGCN-DG [12] | 72.15 | 70.4 | 75.55 | 71.05 | 80.77 | 72.02 | 79.89 | 61.89 | 88.99 | 67.48 |
| AEGCN [13] | 73.16 | 71.82 | 75.91 | 71.63 | 81.04 | 71.32 | 79.95 | 60.87 | 87.39 | 68.22 |
| SK-GCN [40] | 71.97 | 70.22 | 73.20 | 69.18 | 80.36 | 70.43 | 80.12 | 60.70 | 85.17 | 68.08 |
| Our GloVe | 74.37 | 72.78 | 75.91 | 71.71 | 81.90 | 73.69 | 80.01 | 63.70 | 88.31 | 69.43 |
| Our BERT | **74.86** | **73.71** | **78.21** | **73.39** | **85.36** | **78.33** | **83.58** | **64.67** | **90.58** | **74.49** |
| BERT + FineTune | **74.98** | **73.15** | **79.01** | **74.98** | **86.96** | **81.82** | **84.13** | **71.41** | **91.40** | **75.35** |
Table 3. Ablation study results (%). The best result on each dataset is highlighted in bold.
| Model | TWITTER Acc | TWITTER F1 | LAP14 Acc | LAP14 F1 | REST14 Acc | REST14 F1 | REST15 Acc | REST15 F1 | REST16 Acc | REST16 F1 |
|---|---|---|---|---|---|---|---|---|---|---|
| Our | **74.37** | **72.78** | **75.91** | **71.71** | **81.90** | **73.69** | 80.01 | **63.70** | **88.31** | **69.43** |
| w/o mask | 73.98 | 72.61 | 73.09 | 68.29 | 79.82 | 69.51 | 78.59 | 60.93 | 87.53 | 67.77 |
| w/o GCN | 73.69 | 72.20 | 75.28 | 70.95 | 81.69 | 73.53 | 79.58 | 62.24 | 88.04 | 68.07 |
| w/o context | 72.10 | 70.17 | 73.14 | 68.41 | 80.53 | 71.09 | 79.52 | 56.87 | 87.71 | 66.19 |
| w/o MHIA | 72.88 | 71.69 | 73.77 | 69.37 | 80.50 | 70.00 | **80.50** | 62.77 | 88.04 | 65.73 |

Share and Cite

MDPI and ACS Style

Yang, B.; Li, H.; Teng, S.; Sun, Y.; Xing, Y. Attentional Interactive Encoder Network Focused on Aspect for Sentiment Classification. Electronics 2023, 12, 1329. https://doi.org/10.3390/electronics12061329

