Aspect-Level Sentiment Analysis Based on Syntax-Aware and Graph Convolutional Networks

Gu, Qun; Wang, Zhidong; Zhang, Hai; Sui, Siyi; Wang, Rui

doi:10.3390/app14020729


by Qun Gu ¹, Zhidong Wang ¹, Hai Zhang ¹, Siyi Sui ¹ and Rui Wang ²,*

¹ School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
² College of Information Technology, Shanghai Jianqiao University, Shanghai 201306, China
* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(2), 729; https://doi.org/10.3390/app14020729

Submission received: 1 December 2023 / Revised: 9 January 2024 / Accepted: 11 January 2024 / Published: 15 January 2024

(This article belongs to the Special Issue Advances in Emotion Recognition and Affective Computing)


Abstract

Aspect-level sentiment analysis is the task of identifying the sentiment polarity of specific aspects of a sentence. In recent years, aspect-level sentiment analysis models based on graph convolutional networks have made significant progress. However, existing models still have shortcomings: they do not make full use of the information about specific aspects in a sentence, and they ignore the benefit that external sentiment commonsense knowledge can bring to the model. To solve these problems, this paper proposes a sentiment analysis model based on the Syntax-Aware and Graph Convolutional Network (SAGCN). The model first integrates aspect-specific features into the contextual information, and then incorporates external sentiment knowledge to enhance the model’s ability to perceive sentiment information. Finally, a multi-head self-attention mechanism and a Point-wise Convolutional Transformer (PCT) are applied to capture the semantic information of the sentence, so that the semantic and syntactic information of the sentence are considered together. Experimental results on three benchmark datasets show that the SAGCN model outperforms the benchmark methods.

Keywords:

aspect-level sentiment analysis; graph convolutional networks; external sentiment commonsense knowledge; multi-head self-attention mechanism; point-wise convolutional transformer

1. Introduction

With the development of the Internet, the volume of user-generated online reviews has exploded, and these massive textual data contain valuable information that can support business decision-making, policy formulation, and so on. Extracting this information quickly and accurately remains a challenge, and the research field of sentiment analysis has arisen to meet it. Sentiment analysis automatically analyses the sentiment in large-scale text data and obtains users’ emotions and opinions. For a business or an organization, for example, it provides detailed sentiment information about products, services, features and other aspects, which helps to target improvement and optimization, thereby enhancing user experience and increasing sales volume so that the enterprise and its customers end up in a win–win situation. Traditional sentiment analysis research mainly focuses on coarse-grained text, i.e., prediction at the sentence or document level [1], identifying the overall sentiment of the whole sentence or document. Such coarse-grained analysis struggles to meet users’ needs for personalization. Fine-grained aspect-level sentiment analysis (ALSA) was therefore proposed to identify the sentiment polarity of specific aspects of a sentence. For example, in the sentence “The taste of food is delicious and the price is reasonable, but the service is worst”, when the given aspect is taste of food, the sentiment polarity identified by ALSA is positive, whereas when the given aspect is price or service, the sentiment polarity is positive and negative, respectively.

In the early days of sentiment analysis research, machine learning approaches were mainly used. The input features were designed manually, which required expertise and experience [2], and the classifiers were trained and optimized using traditional machine learning algorithms such as support vector machines [3] and decision trees. While these methods perform well on sentiment analysis problems, they usually require a lot of manual feature design and parameter tuning, and their classification effectiveness depends heavily on the quality of the features [4]. Deep learning models, which do not rely on hand-crafted feature engineering, have since shown superior performance in many natural language processing tasks, such as machine translation, semantic recognition, question answering and text summarization. For the task of sentiment analysis, recurrent neural networks, long short-term memory networks and gated recurrent units have become the mainstream research methods [5] and have achieved good results [6]. Attention mechanisms have also been widely used in this task [7]. Applying pre-trained language models to downstream tasks has been a research hotspot in recent years. Earlier work usually obtained pre-trained word embeddings with GloVe [8] and Word2vec [9], but such embeddings cannot represent polysemous words. The BERT pre-trained language model effectively solved this problem, and GPT, RoBERTa [10] and others were later proposed, further enhancing word representation capability. With the powerful representation capability of these pre-trained language models, the contextual information and semantic associations in the text can be captured more accurately, which leads to significant performance improvements in various downstream tasks.

Recently, aspect-level sentiment analysis using syntactic information and graph neural networks has been heavily researched and has achieved good performance. Zhang et al. [11] argued that early models lacked a mechanism to account for syntactic constraints and dependencies between long-distance words, which led the models to mistake syntactically irrelevant context words for cues when determining aspect sentiment. They therefore used the dependency relations of the sentence to define the neighbors over which a GCN aggregates and propagates information, and obtained good results. Huang et al. [12] proposed a new approach using graph attention networks that effectively integrates syntactic information to improve aspect-level sentiment classification. Zhu et al. [13] integrated the global and local structural information of sentences. By constructing word-document graphs to capture the global dependencies between words, while using syntactic structure analysis to mine potential local structural information in sentences, they obtained excellent results on multiple datasets. Sun et al. [14] used long short-term memory networks to learn the features of sentences and further enhanced the embedding representation on the dependency tree with graph convolutional networks, achieving excellent results on four benchmark datasets. These models greatly improve accuracy. However, they make little use of sentiment-related knowledge about specific aspects of sentences and do not fuse aspect-specific information. Integrating external sentiment knowledge as auxiliary information into aspect-level sentiment analysis tasks is expected to further enhance model performance. Liang et al. [15] constructed a new enhanced sentence dependency graph by applying SenticNet sentiment commonsense knowledge to the dependency graph of sentences. Liu et al. [16] combined a GCN with a gating mechanism to strengthen the GCN’s ability to aggregate node information fully. They also incorporated contextual sentiment knowledge into the graph convolutional network to enhance the model’s perception of sentiment features, which further demonstrated the effectiveness of sentiment commonsense knowledge in aspect-level sentiment analysis tasks.

Given the limitations of the above models, this paper proposes a novel aspect-level sentiment analysis model to address these issues. Firstly, aspectual features are supplemented to the syntax-aware module and semantic enhancement module, and external sentiment knowledge is integrated into a graph convolutional network to enhance the model’s ability to perceive the sentiment information; secondly, the semantic information of the sentence is obtained by using a multi-head self-attention mechanism and Point-wise Convolutional Transformer. The obtained syntactic and semantic information is pooled and spliced to obtain the final feature representation based on specific aspects of the sentence.

The main contributions of this paper are as follows:

(1): For the task of aspect-level sentiment analysis, we propose a novel model that integrates both syntactic and semantic information to identify the sentiment polarity of specific aspects of a sentence.
(2): External sentiment commonsense knowledge is introduced to enhance the model’s ability to perceive sentiment information, and additional aspect-specific information is added to the model to increase its sensitivity to different aspects of the sentence.
(3): In extensive experiments on three benchmark aspect-level sentiment analysis datasets, the proposed SAGCN model outperforms the benchmark models compared, demonstrating its superiority in aspect-level sentiment analysis tasks.

2. Related Work

Aspect-level sentiment analysis belongs to the fine-grained research area of sentiment analysis. Compared with traditional sentence-level and document-level sentiment analysis, its main challenge lies in accurately identifying the sentiment polarity associated with specific aspects within a sentence. This task requires the model to understand the text in greater detail and to dig deeper into the sentiment information associated with specific aspects, thus placing higher demands on the accuracy and complexity of the algorithm. With the development of deep learning, researchers have gradually introduced a variety of neural network structures and attention mechanisms that meet this requirement and improve model performance. For example, Tang et al. [17] used LSTM networks to model the semantic correlations between the preceding context and the target word and between the following context and the target word, which greatly improved the accuracy of target-dependent sentiment classification.

With the introduction of the attention mechanism, many models have emerged that integrate attention with neural networks. Wang et al. [18] proposed a unidirectional LSTM aspect-level sentiment classification model based on the attention mechanism, which captures the most important sentiment features in the sentence for each input aspect. Tang et al. [19] used the attention mechanism to design a deep memory network, where each layer of the network learns an abstract representation of the text. By stacking multiple attention layers, the model learns a highly complex function of the sentence for a specific aspect and can therefore represent the important sentiment information in the text at a high level of abstraction. Ma et al. [20] argued that the target and the context should be treated equally, and on this basis designed an interactive attention model to establish deep semantic associations between context and target. The model obtains not only sentence-to-aspect attention but also aspect-to-sentence attention, and then combines them for sentiment classification. Ren et al. [21] designed a lightweight and efficient model using gated CNNs, which integrates stacked gated convolutions and attention mechanisms. Liu et al. [22] used a multilayer attention mechanism, including intra-layer and inter-layer attention, to generate hidden state representations of sentences. The intra-layer attention uses multi-head self-attention and a point-wise feed-forward structure. The inter-layer attention introduces global attention to capture the interaction between context and aspect words, and on this basis a feature-focused attention mechanism is proposed to enhance the model’s sentiment recognition ability.

In recent years, a number of aspect-level sentiment analysis models have combined syntactic information with graph convolutional networks. These models capture the sentiment information of the target aspect more accurately because syntactic information provides the syntactic relationships between words, while graph convolutional networks capture complex associations in the text through the graph structure, enabling the model to understand the textual context more comprehensively. Zhang et al. [23] proposed a two-layer interactive graph convolutional network model for sentiment analysis. Tian et al. [24] devised a type-aware graph convolutional network model that uses not only the syntactic information but also the explicit inter-word dependency types, and proposed an attention-based integration to distinguish between different edges in the graph. Zhang et al. [25] pruned the syntactic dependency tree to remove noisy information and built a semantic-based GCN and a syntactic-based GCN.

3. Methodology

In this section, the SAGCN model proposed in this paper is introduced in detail. The model architecture is shown in Figure 1 and consists of BERT, external sentiment commonsense knowledge, and a graph convolutional network. First, the pre-trained BERT language model is used to obtain context-aware word embeddings. The syntax-aware module on the left side then produces the syntactically constrained aspect-specific sentiment representation, and the semantic assistance module on the right side supplements the sentiment semantics. The representations obtained from the syntax-aware and semantic assistance modules are average pooled and concatenated, and finally classified using the Softmax function to obtain the sentiment polarity of the specific aspect in the sentence.

3.1. Definition of Tasks

Given a sentence of $n$ words, denoted by $P = \{w_{1}, \dots, w_{a + 1}, \dots, w_{a + m}, \dots, w_{n}\}$, we select one of the aspects in this sentence, denoted by $T = \{w_{a + 1}, w_{a + 2}, \dots, w_{a + m}\}$. $T$ is a subsequence of $P$ consisting of $m$ words, where $a + 1$ and $a + m$ are the start and end indexes of the aspect, respectively. The aim of the ALSA task is to identify the sentiment polarity of the given aspect in the sentence.

3.2. BERT Embedding

In this study, a pre-trained BERT language model is used to obtain the embedding vector of each word in the input sentence, i.e., the discrete symbols of the sentence are mapped to real-valued vectors that capture the semantic and association information in the data for the downstream model. Rather than inputting the sentence alone, and inspired by [26,27], this paper inputs the text into the BERT model as a sentence-aspect pair, i.e., [CLS] Sentence [SEP] Aspect [SEP], to obtain embedding representations of both the sentence and the aspect. This is expressed as follows:

E^{s} = \{e_{1}^{s}, e_{2}^{s}, \dots, e_{n}^{s}\},

(1)

E = \{e_{a + 1}, e_{a + 2}, \dots, e_{a + m}\}.

(2)
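As a rough illustration of the sentence-aspect pair input described above, the following sketch assembles the token sequence for one example. It is a minimal sketch under stated assumptions: whitespace splitting stands in for BERT's real subword tokenizer, and the function name `build_pair_input` and the example sentence are hypothetical, not taken from the paper.

```python
# Illustrative only: whitespace tokens stand in for BERT's subword tokenizer.
def build_pair_input(sentence_tokens, aspect_start, aspect_len):
    """Return the [CLS] sentence [SEP] aspect [SEP] sequence and the aspect tokens.

    aspect_start is the 0-based index of the first aspect token, which equals
    the offset a in the paper's 1-based notation w_{a+1}, ..., w_{a+m}.
    """
    aspect_tokens = sentence_tokens[aspect_start:aspect_start + aspect_len]
    pair = ["[CLS]"] + sentence_tokens + ["[SEP]"] + aspect_tokens + ["[SEP]"]
    return pair, aspect_tokens


sentence = "the price is reasonable but the service is poor".split()
# The aspect "service" starts at 0-based index 6 and spans m = 1 token.
pair, aspect = build_pair_input(sentence, aspect_start=6, aspect_len=1)
print(pair)
```

In a real pipeline the same pair would be passed to a BERT tokenizer, and the positions of the sentence and aspect tokens would be used to slice $E^{s}$ and $E$ out of the encoder output.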

3.3. Syntax-Aware Module

3.3.1. BiLSTM

In order to obtain richer contextual semantic information, this paper inputs the vectorized text representation $E^{s}$ obtained by BERT into a BiLSTM network. The structure of the BiLSTM is shown in Figure 2. The BiLSTM has two independent LSTM units at each time step, one reading the sequence forward and one reading it backward. This structure allows the model to capture both past and future dependencies simultaneously, and thus to better capture the long-term dependencies and contextual information in the sequence. Concatenating the corresponding hidden representations of the forward and backward LSTMs into a higher-dimensional representation yields richer semantic information, which works well in many sequence tasks such as language modelling, machine translation and sentiment analysis. This bi-directional structure also helps to process various relations and patterns in the input sequence, improving the model’s ability to understand sequence data. The specific calculations are shown in Equations (3)–(6).

\overrightarrow{h}_{t} = \overrightarrow{\mathrm{LSTM}}(e_{t}^{s}), \quad t \in [1, n],

(3)

\overleftarrow{h}_{t} = \overleftarrow{\mathrm{LSTM}}(e_{t}^{s}), \quad t \in [n, 1],

(4)

\tilde{e}_{t} = (\overrightarrow{h}_{t}; \overleftarrow{h}_{t}), \quad t \in [1, n],

(5)

H = (\tilde{e}_{1}, \tilde{e}_{2}, \dots, \tilde{e}_{n - 1}, \tilde{e}_{n}),

(6)

where $[n, 1]$ denotes that the vectors are processed by the LSTM in back-to-front order in the sequence.

3.3.2. Emotional Common Sense Knowledge and Aspect-Enhanced Syntax Map

To capture the dependencies between words in a sentence, which graph convolutional networks can exploit, and inspired by [11,14], this paper uses the spaCy toolkit (https://spacy.io/, accessed on 6 February 2023) to obtain a syntactic dependency tree, from which an adjacency matrix $A \in \mathbb{R}^{n \times n}$ is constructed for each input sentence. If there is a dependency relationship between word $w_{i}$ and word $w_{j}$, then $A_{i j}$ is set to 1; otherwise $A_{i j} = 0$. Each word is also assumed to be adjacent to itself, i.e., the diagonal elements of the adjacency matrix $A$ are all set to 1. The specific formula is shown in Equation (7):

A_{i j} = \begin{cases} 1, & \text{if there is a dependency between } w_{i} \text{ and } w_{j}, \text{ or } i = j \\ 0, & \text{otherwise} \end{cases}.

(7)
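The construction in Equation (7) can be sketched in a few lines. This is a minimal sketch under stated assumptions: the dependency parse is written by hand as a list of head indices rather than produced by spaCy, the edges are treated as undirected, and the name `build_adjacency` is hypothetical.

```python
def build_adjacency(heads):
    """Build the n x n adjacency matrix of Equation (7).

    heads[i] is the 0-based index of the syntactic head of token i;
    the root token points to itself.
    """
    n = len(heads)
    A = [[0] * n for _ in range(n)]
    for i, h in enumerate(heads):
        A[i][i] = 1  # every word is adjacent to itself
        A[i][h] = 1  # dependency edge between token i and its head
        A[h][i] = 1  # treated as undirected (assumption)
    return A


# Hand-written parse of "the service is poor" (illustrative, not spaCy output):
# the -> service, service -> is, is = root, poor -> is
tokens = ["the", "service", "is", "poor"]
heads = [1, 2, 2, 2]
A = build_adjacency(heads)
for row in A:
    print(row)
# [1, 1, 0, 0]
# [1, 1, 1, 0]
# [0, 1, 1, 1]
# [0, 0, 1, 1]
```

With spaCy, the head indices would typically come from each token's `head.i` attribute after parsing the sentence.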

Then, in order to enhance the ordinary syntactic dependency graph and to highlight the sentiment dependencies between individual words in a sentence, SenticNet7 is introduced into the model as an external source of sentiment commonsense knowledge. SenticNet7 is a concept-level knowledge base that assigns semantics to 300,000 concepts. It contains a large amount of information about sentiment-related concepts, words and the associations between them, which facilitates the inference of sentiment information from text and helps to identify and understand the emotional states embedded in it. To integrate sentiment knowledge into the graph convolutional network, this paper uses the sentiment scores that SenticNet7 provides for its words, computed over four sentiment dimensions; each word’s score lies between −1 and 1. The sentiment scores of some example words are shown in Table 1. The sentiment score matrix $M \in \mathbb{R}^{n \times n}$ is defined element-wise as the sum of the sentiment scores of two words, as shown in Equation (8):

M_{i j} = \mathrm{SenticNet7}(w_{i}) + \mathrm{SenticNet7}(w_{j}).

(8)

In the above equation, $\mathrm{SenticNet7}(w_{i}) = 0$ means that word $w_{i}$ is neutral or does not exist in the SenticNet7 sentiment knowledge base.
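A small sketch of Equation (8) follows. The lexicon values here are made up for illustration and are not real SenticNet7 scores; `toy_lexicon` and `sentiment_matrix` are hypothetical names. Words missing from the lexicon receive a score of 0, matching the convention stated above.

```python
# Hypothetical scores for illustration; these are NOT real SenticNet7 values.
toy_lexicon = {"delicious": 0.8, "worst": -0.9}


def sentiment_matrix(tokens, lexicon):
    """M[i][j] = score(w_i) + score(w_j); unknown or neutral words score 0."""
    scores = [lexicon.get(tok, 0.0) for tok in tokens]
    return [[si + sj for sj in scores] for si in scores]


tokens = ["food", "delicious", "service", "worst"]
M = sentiment_matrix(tokens, toy_lexicon)
for row in M:
    print([round(v, 2) for v in row])
```

Because the score of a pair is a plain sum, $M$ is symmetric, and a pair of opposite-polarity words can largely cancel out.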

In addition, existing aspect-level sentiment analysis models based on graph convolutional networks usually do not pay enough attention to the specific aspect when constructing graphs. Therefore, in order to further enhance the sentiment dependencies between context words and aspect words on top of SenticNet7, this paper proposes an aspect enhancement matrix $T$, whose element $T_{i j} = 1$ if $w_{i}$ or $w_{j}$ is an aspect word, and $T_{i j} = 0$ otherwise. Finally, the adjacency matrix enhanced by external sentiment knowledge and aspect words is obtained as shown in Equation (9):

D_{i j} = A_{i j} (M_{i j} + T_{i j} + 1).

(9)
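To make Equation (9) concrete, the sketch below combines a small adjacency matrix, a sentiment score matrix and an aspect enhancement matrix. It is only a sketch: the sentiment scores are hypothetical values, and the helper names `aspect_matrix` and `enhanced_adjacency` are not from the paper.

```python
def aspect_matrix(n, aspect_indices):
    """T[i][j] = 1 if token i or token j is an aspect word, else 0."""
    aspect = set(aspect_indices)
    return [[1 if (i in aspect or j in aspect) else 0 for j in range(n)]
            for i in range(n)]


def enhanced_adjacency(A, M, T):
    """D[i][j] = A[i][j] * (M[i][j] + T[i][j] + 1), as in Equation (9)."""
    n = len(A)
    return [[A[i][j] * (M[i][j] + T[i][j] + 1) for j in range(n)]
            for i in range(n)]


# Tokens: "the service is poor"; the aspect is "service" (index 1).
A = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 1], [0, 0, 1, 1]]
scores = [0.0, 0.0, 0.0, -0.5]  # hypothetical sentiment scores
M = [[si + sj for sj in scores] for si in scores]
T = aspect_matrix(4, [1])
D = enhanced_adjacency(A, M, T)
for row in D:
    print(row)
```

Edges that touch the aspect word are up-weighted by $T$, entries with no dependency stay at zero because of the factor $A_{i j}$, and a negative sentiment score lowers the weight of an edge (here the self-loop of "poor" drops to 0).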

3.3.3. Syntax-Enhanced Graph Convolution

Graph Convolutional Neural Networks were proposed by Kipf et al. [28] in 2016. The core idea is to update the feature representation of each node by aggregating its features with those of its neighbors. This aggregation process uses the topology of the graph to define the relationships between nodes and learns weights that determine how much influence different neighboring nodes have on the current node. It resembles the traditional convolutional neural network operation in that it aggregates information within a node’s neighborhood to encode local features of unstructured data. By stacking multiple layers of graph convolution, the network can progressively learn richer and higher-level node representations to adapt to more complex graph data. Following [14], the enhanced syntactic graph obtained in the previous section is fed into the GCN layers in order to learn the sentiment representations of specific aspects of sentences subject to syntactic constraints. The hidden representation of each node in the $l$-th GCN layer is computed as shown in Equations (10) and (11):

\tilde{h}_{i}^{l} = \sum_{j = 1}^{n} D_{i j} W^{l} u_{j}^{l - 1},

(10)

h_{i}^{l} = \mathrm{ReLU}(\tilde{h}_{i}^{l} / (d_{i} + 1) + b^{l}),

(11)

where $D_{i j}$ denotes an element of the enhanced adjacency matrix, $u_{j}^{l - 1}$ is the representation of the $j$-th node in the previous GCN layer (the input to the first GCN layer is the concatenation of the BiLSTM hidden representation and the original aspect-specific embedding $E$), $h_{i}^{l}$ is the feature representation of the $i$-th node in the current layer, $d_{i} = \sum_{j = 1}^{n} D_{i j}$ is the degree of the $i$-th node, and the weights $W^{l}$ and bias $b^{l}$ are trainable parameters.
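The layer update in Equations (10) and (11) can be written out directly. The following is a plain-Python sketch for readability rather than an efficient implementation; the weights, bias and input features are arbitrary illustrative values, node features are stored as row vectors (so the product is `U[j] @ W` rather than `W @ u_j`), and the name `gcn_layer` is hypothetical.

```python
def gcn_layer(D, U, W, b):
    """One GCN layer following Equations (10) and (11).

    D: n x n enhanced adjacency matrix
    U: n x d_in node features from the previous layer (row vectors)
    W: d_in x d_out weight matrix, b: bias of length d_out
    """
    n, d_in, d_out = len(U), len(W), len(b)
    H = []
    for i in range(n):
        degree = sum(D[i])  # d_i = sum_j D[i][j]
        row = []
        for k in range(d_out):
            # Aggregate the transformed neighbor features weighted by D[i][j].
            agg = sum(
                D[i][j] * sum(U[j][p] * W[p][k] for p in range(d_in))
                for j in range(n)
            )
            row.append(max(0.0, agg / (degree + 1) + b[k]))  # ReLU
        H.append(row)
    return H


D = [[1.0, 1.0], [1.0, 2.0]]   # illustrative enhanced adjacency
U = [[1.0, 0.0], [0.0, 1.0]]   # illustrative node features
W = [[1.0, 0.0], [0.0, 1.0]]   # identity weights for easy checking
b = [0.0, -0.4]
H = gcn_layer(D, U, W, b)
print(H)
```

Stacking two such calls, feeding the output of the first as `U` of the second with separate weights, corresponds to the two-layer setting used in the experiments.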

3.4. Semantic Assistance Module

3.4.1. Multi-Head Self-Attention

The syntax-aware module extracts syntactic information and part of the semantic information, but the model still lacks complementary semantic information. To capture more semantic information, a semantic assistance module is introduced. Instead of using the traditional BiLSTM to obtain semantic information, it computes the hidden state of the embeddings using multi-head self-attention. Multi-head self-attention can be computed in parallel and makes full use of the semantic relations between words; it does not depend on word order or distance, which avoids the information loss caused by long-range dependencies [29], and it captures different semantic information in different subspaces to obtain a richer semantic representation. Its input is the concatenation of the sentence vectors $E^{s}$ and the mean of the aspect vectors $E$. The calculation process is shown in Equations (12) and (13):

\mathrm{MHSA}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_{1}; \mathrm{head}_{2}; \dots; \mathrm{head}_{h}) W^{h},

(12)

\mathrm{head}_{i} = \mathrm{Attention}(Q_{i}, K_{i}, V_{i}).

(13)
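The text does not spell out the Attention function in Equation (13). The sketch below assumes the standard scaled dot-product attention from the Transformer, softmax(QKᵀ/√d_k)V, and simplifies further by omitting all learned projections (the per-head query, key and value projections and the output matrix $W^{h}$), so it only illustrates the split, attend and concatenate pattern. The function names are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention (assumed form of Attention in Eq. (13))."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, V))
                    for c in range(len(V[0]))])
    return out


def multi_head_self_attention(X, num_heads):
    """Split features into heads, attend within each head, then concatenate.

    Learned projections are omitted for brevity (a simplification).
    """
    d = len(X[0])
    if d % num_heads != 0:
        raise ValueError("feature size must be divisible by num_heads")
    size = d // num_heads
    heads = []
    for h in range(num_heads):
        Xh = [row[h * size:(h + 1) * size] for row in X]
        heads.append(attention(Xh, Xh, Xh))
    # Concatenate the per-head outputs for every position.
    return [[v for h in range(num_heads) for v in heads[h][t]]
            for t in range(len(X))]


X = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
out = multi_head_self_attention(X, num_heads=2)
print(out)
```

Each output row is a weighted average of the value rows, with the weights determined by how similar the query is to each key; every position attends to all others in one step, regardless of distance.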

3.4.2. Point-Wise Convolutional

The Point-wise Convolutional Transformer transforms the hidden representation generated by the multi-head self-attention mechanism. It highlights the features related to aspect words, thus improving the sensitivity of the model to sentiment information. The kernel size of the point-wise convolution is 1. For an input sequence $x'$, the specific formula is shown below.

\mathrm{PCT}(x') = \sigma(x' * W_{p 1} + b_{p 1}) * W_{p 2} + b_{p 2},

(14)

h^{s} = \mathrm{PCT}(\mathrm{MHSA}),

(15)

where $\sigma$ is the ELU nonlinear activation function, $W_{p 1}$ and $W_{p 2}$ are two trainable weight matrices, “$*$” denotes the convolution operator, and $b_{p 1}$ and $b_{p 2}$ are the bias terms.
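Because the kernel size is 1, the convolution in Equation (14) reduces to applying the same linear map independently at every position of the sequence. The sketch below illustrates this with arbitrary weights; the function names are hypothetical and the ELU uses the common default $\alpha = 1$, which is an assumption.

```python
import math


def elu(x, alpha=1.0):
    """ELU activation; alpha = 1.0 is an assumed default."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def pointwise(x_seq, W, b):
    """Kernel-size-1 convolution: the same linear map at every position."""
    return [[sum(x[p] * W[p][k] for p in range(len(W))) + b[k]
             for k in range(len(b))]
            for x in x_seq]


def pct(x_seq, W1, b1, W2, b2):
    """PCT(x') = ELU(x' * W_p1 + b_p1) * W_p2 + b_p2, as in Equation (14)."""
    hidden = [[elu(v) for v in row] for row in pointwise(x_seq, W1, b1)]
    return pointwise(hidden, W2, b2)


identity = [[1.0, 0.0], [0.0, 1.0]]  # illustrative weights
zeros = [0.0, 0.0]
x_seq = [[1.0, -1.0], [2.0, 0.5]]
out = pct(x_seq, identity, zeros, identity, zeros)
print(out)
```

Since no position looks at its neighbors, processing a single position on its own gives the same result as processing it inside the full sequence.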

3.5. Feature Fusion

Inspired by [14], only the aspect vectors in the output of the syntax-aware module are selected for feature aggregation, because these vectors have been encoded by the bi-directional long short-term memory network and the graph convolutional network and therefore incorporate contextual semantic information and sentiment dependency information. The aspect vectors are average pooled so as to retain the vast majority of the information in the vectors, and the result is denoted by $h^{s y n}$.

In the semantic assistance module, all the outputs of the Point-wise Convolutional Transformer are average pooled to obtain the final semantic representation $h^{s e m}$. The syntactic and semantic representations are then concatenated to obtain the final integrated representation of aspect-specific sentiment information. After a linear transformation, it is projected into the target space, and the probability distribution $y$ over sentiment polarities is obtained with the Softmax function. The specific formula is shown below.

h = [h^{s y n}; h^{s e m}],

(16)

x = W_{x} h + b_{x},

(17)

y = \mathrm{softmax}(x).

(18)
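The fusion and classification steps in Equations (16) to (18) can be sketched as follows. All vectors and weights are small illustrative values, and the helper names `mean_pool` and `classify` are hypothetical.

```python
import math


def mean_pool(vectors):
    """Average a list of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def classify(aspect_vectors, semantic_vectors, W, b):
    h_syn = mean_pool(aspect_vectors)    # pooled aspect nodes from the GCN
    h_sem = mean_pool(semantic_vectors)  # pooled PCT outputs
    h = h_syn + h_sem                    # list concatenation, Equation (16)
    logits = [sum(W[c][k] * h[k] for k in range(len(h))) + b[c]
              for c in range(len(b))]    # Equation (17)
    return softmax(logits)               # Equation (18)


aspect_vectors = [[1.0, 3.0]]
semantic_vectors = [[0.0, 2.0], [2.0, 0.0]]
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]  # 3 polarity classes x 4 fused features
b = [0.0, 0.0, 0.0]
y = classify(aspect_vectors, semantic_vectors, W, b)
print(y)
```

The predicted polarity is the index of the largest entry of `y`.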

3.6. Training of the Model

This model uses the cross-entropy loss function with L2 regularization as the objective function to measure the difference between the predicted values and the actual labels. The model is gradually optimized during training so that it can make more accurate predictions. The specific calculation formula is shown below.

Loss = - \sum_{i = 1}^{s} \sum_{j = 1}^{c} y_{i j} \log \hat{y}_{i j} + \lambda \|\theta\|^{2},

(19)

where $c$ denotes the number of categories, $s$ is the number of training samples, $y_{i j}$ is the ground-truth label of sample $i$ for category $j$, $\hat{y}_{i j}$ is the predicted probability that sample $i$ belongs to category $j$, $\lambda$ is the regularization coefficient, and $\theta$ is the set of all parameters in the model. During training, the loss between the predicted and labelled values is computed by forward propagation, its gradients are obtained by backpropagation, and the model parameters are updated by gradient descent so that the loss function is minimized.
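As a numerical illustration of Equation (19), the sketch below evaluates the regularized cross-entropy for two samples with one-hot labels. The predictions and parameter values are made up, and `cross_entropy_l2` is a hypothetical name.

```python
import math


def cross_entropy_l2(y_true, y_pred, params, lam):
    """Cross-entropy summed over samples and classes plus lam * ||theta||^2.

    Terms with y = 0 contribute nothing and are skipped, which also avoids
    evaluating log(0) for classes predicted with zero probability.
    """
    ce = -sum(
        y * math.log(p)
        for labels, probs in zip(y_true, y_pred)
        for y, p in zip(labels, probs)
        if y > 0
    )
    l2 = sum(w * w for w in params)
    return ce + lam * l2


y_true = [[1, 0, 0], [0, 0, 1]]                # one-hot labels
y_pred = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]    # illustrative predictions
params = [0.5, -0.5]                           # illustrative parameters
loss = cross_entropy_l2(y_true, y_pred, params, lam=1e-5)
print(loss)
```

With one-hot labels, only the predicted probability of the correct class enters the cross-entropy term for each sample.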

4. Experiment

4.1. Datasets

This paper focuses on training and evaluating the proposed model on three publicly available benchmark datasets, namely the Laptop and Restaurant review datasets from SemEval 2014 Task 4 [30] and the Twitter [31] review dataset from the ACL 14 task, where each sample in the dataset consists of a review sentence, a number of aspect words, and the sentiment polarity of the sentence corresponding to the aspect word. The details of the three datasets are shown in Table 2.

4.2. Experimental Environment

The experimental environment for this study is shown in Table 3.

4.3. Experimental Parameters and Evaluation Criteria

The embedded representations of aspect terms and sentences are obtained from the pre-trained BERT model and both have 768 dimensions. The number of LSTM hidden units is set to 300, the batch size is 64, the maximum sentence length is 85, and the number of GCN layers is two. The optimizer is Adam with a learning rate of 0.001. Dropout and L2 regularization are applied, with a dropout rate of 0.3 and a regularization coefficient of 10⁻⁵. The number of sentiment polarity categories is three. Accuracy and Macro-F1 are used as the evaluation criteria for this model.
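For reference, the two evaluation criteria can be computed as in the following sketch. It uses the usual definitions (Macro-F1 as the unweighted mean of per-class F1 scores); the labels and predictions are toy values, and the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    f1_scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores)


y_true = ["pos", "pos", "neg", "neu"]
y_pred = ["pos", "neg", "neg", "neu"]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred, labels=["pos", "neg", "neu"])
print(acc, f1)
```

Macro-F1 weights every class equally, so it is more sensitive than accuracy to poor performance on a minority class.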

4.4. Model Comparison

To validate the effectiveness of the proposed SAGCN model, it is compared with five classical graph neural network-based aspect-level sentiment analysis models from recent years. The results are shown in Table 4.

(1): CDT: Learns aspect-specific sentiment representations of sentences using a bi-directional long short-term memory network, and further enhances the embedded representations with a graph convolutional network that uses syntactic information.
(2): R-GAT [32]: To highlight the importance of the target aspect, the dependency parse tree is reconstructed and pruned so that the target aspect becomes the root node of the tree, and a graph attention network is used for information aggregation.
(3): SenticGCN: Enhances sentence dependencies using external sentiment knowledge and constructs graph neural networks from the enhanced dependency trees.
(4): DualGCN: A graph convolutional network model that simultaneously considers the complementarity of syntactic structure and semantic relevance.
(5): SSEGCN: A syntactically and semantically enhanced graph convolutional network that learns not only the semantic associations related to aspects but also the overall semantics of the sentence, and then combines syntactic structure and semantic information through the different syntactic distances between words.

4.5. Analysis of Experimental Results

As can be seen from Table 4, the results of the CDT model are poorer; the model uses a long short-term memory network to capture the contextual information of the sentence and a graph convolutional network to perform convolutional operations on the syntactic dependency tree to enhance BiLSTM embedding representation. However, focusing on syntactic information with a single-channel GCN ignores the semantic relations of sentences to some extent. In contrast, the RGAT model outperforms the CDT model in the Laptop, Restaurant and Twitter datasets, with 1.02%, 4.30% and 1.49% higher accuracy than CDT, respectively, because the R-GAT model argues that the graph neural network learns sentence-specific aspectual sentiment representations from a dependency tree lacking effective between aspect words and opinion sentiment word dependencies, and that encoding the entire dependency tree introduces noisy information that degrades model performance. Therefore, the model uses a pruned and reconstructed syntactic dependency tree that preserves only edges directly related to aspects and uses aspect words as the root node of the tree in order to focus on the connection between aspects and potential sentiment words. Therefore, the sentiment classification accuracy of the RGAT model increases. Different from the CDT and RGAT models, the SenticGCN model proposes, for the first time, to introduce external sentiment knowledge into the aspect-level sentiment analysis task by augmenting the ordinary dependency graph with external sentiment knowledge and generating a sentiment-enhanced dependency graph, which is then passed into the GCN model to facilitate the model to extract the sentiment dependencies between context words and specific aspects, resulting in a further improvement of the model effect. 
The DualGCN model also pays attention to syntactic information while considering syntactic structure in order to make the syntactic and semantic information complement each other and proposes an orthogonal regularizer to constrain the attention scores in the model for the purpose of accurately capturing the semantic correlation between words, and the differential regularizer is used to supplement the semantic information that has not been obtained by the model so that the model effect is much higher than that of the CDT model. In order to make full use of syntactic information and to directly link aspect words with sentiment opinion words, the SSEGCN model proposes to use different syntactic distances between words to obtain syntactic mask matrices and learn the structural information from local to global in the sentence through the syntactic mask matrices in order to augment the traditional GCN. Although it achieves good results in the Restaurant and Twitter dataset, it does not fully utilize the information of specific aspects in the sentence. The SAGCN model proposed in this paper, on the other hand, focuses on the semantic information related to aspects, and also introduces external sentiment knowledge to enhance the dependency graph, which facilitates the model to extract the sentiment dependencies between context words and specific aspects in the sentence. Finally, a semantic auxiliary module is used to supplement certain semantic information to enrich sentiment features in the final output. Good experimental results are achieved on all three public datasets. SAGCN improves the accuracy of the Laptop dataset by 5.87% compared to the CDT model, while the accuracy of the Restaurant and Twitter datasets is improved by 5.23% and 3.31%, respectively. From the improvement results, it can be seen that the improvement on the Laptop and Restaurant datasets is larger, while the improvement on the Twitter dataset is relatively lower. 
This is because the text in the Twitter dataset is usually shorter than that in the Laptop and Restaurant datasets, contains a large number of abbreviations and slang, and has a larger proportion of neutral sentences than the other two datasets. The semantic relationships of its sentences are more complex, so it is more difficult for the model to capture the sentiment information in them.
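The statement about neutral sentences can be checked against the training counts in Table 2. The following is a minimal sketch; the dictionary layout and the function name `neutral_share` are illustrative choices, not part of the original experiments.

```python
# Training-split class counts copied from Table 2: (positive, neutral, negative).
TRAIN_COUNTS = {
    "Laptop": (994, 464, 870),
    "Restaurant": (2164, 637, 807),
    "Twitter": (1507, 3016, 1528),
}


def neutral_share(counts):
    """Return the percentage of neutral samples among all samples."""
    positive, neutral, negative = counts
    return 100.0 * neutral / (positive + neutral + negative)


for name, counts in TRAIN_COUNTS.items():
    print(f"{name}: {neutral_share(counts):.1f}% neutral")
```

On these counts, roughly half of the Twitter training samples are neutral, compared with about one fifth for Laptop and Restaurant.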

4.6. Ablation Study

To further validate the effectiveness of each module in SAGCN, ablation experiments were conducted; the results are shown in Table 5. Taking the full SAGCN as the baseline, it can first be observed that removing the external sentiment knowledge (w/o Sentic) decreases model performance, with accuracy decreasing by 0.94, 1.51, and 1.60 percentage points on the Laptop, Restaurant, and Twitter datasets, respectively. This indicates that introducing external general sentiment knowledge is beneficial for the aspect-level sentiment analysis task: it injects sentiment information into the dependencies between sentence context and aspects, which enables the model to obtain more accurate sentiment features. The model without fused aspect feature information (w/o Aspect) also has lower accuracy than the model with spliced aspect features (by 0.20, 0.37, and 0.76 percentage points), suggesting that making full use of aspect-specific information in the sentence helps model performance. Finally, removing the semantic auxiliary module (w/o Semantic) leads to the largest decrease, with accuracy dropping by 1.94, 1.61, and 2.24 percentage points, suggesting that supplementing the model with sufficient semantic information plays a key role in the aspect-level sentiment analysis task. In conclusion, the results of the ablation experiments show that each module contributes to the overall model.
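The relative importance of the three modules can be read off Table 5 by averaging the accuracy drop of each ablated variant over the three datasets. The sketch below does this; the names `FULL`, `ABLATED` and `mean_drop` are hypothetical, and an unweighted mean over datasets is an assumption made only for illustration.

```python
# Accuracy values copied from Table 5.
FULL = {"Laptop": 83.06, "Restaurant": 87.53, "Twitter": 77.97}
ABLATED = {
    "w/o Sentic": {"Laptop": 82.12, "Restaurant": 86.02, "Twitter": 76.37},
    "w/o Aspect": {"Laptop": 82.86, "Restaurant": 87.16, "Twitter": 77.21},
    "w/o Semantic": {"Laptop": 81.12, "Restaurant": 85.92, "Twitter": 75.73},
}


def mean_drop(variant):
    """Average accuracy lost (in percentage points) when a module is removed."""
    drops = [FULL[d] - ABLATED[variant][d] for d in FULL]
    return sum(drops) / len(drops)


# Variants ordered from most to least harmful removal.
ranking = sorted(ABLATED, key=mean_drop, reverse=True)
for variant in ranking:
    print(f"{variant}: mean accuracy drop {mean_drop(variant):.2f} points")
```

Under this simple average, removing the semantic auxiliary module costs the most accuracy, followed by the external sentiment knowledge and then the aspect features.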

4.7. Effect of the Number of GCN Layers

Figure 3 shows the performance of the model with different numbers of GCN layers on the Laptop, Restaurant and Twitter datasets. The model clearly works best with two GCN layers; as the number of layers increases beyond two, accuracy and Macro-F1 decrease significantly. A two-layer GCN can capture both local and global information to a certain degree, so the model retains some global contextual information while taking the surrounding nodes into account, which helps it better understand the relationship between aspect words and context. With a single GCN layer, it is difficult for the model to access the complex dependencies and contextual information in the sentence, and it cannot fully perceive the information about aspect and sentiment, so it performs poorly. With more than two GCN layers, the node representations undergo over-smoothing: the representations of neighboring nodes become very similar and are difficult to distinguish. In aspect-level sentiment analysis, this may cause the model to lose important information related to a particular aspect, since different aspects require different feature representations, so model performance is degraded.
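The over-smoothing effect can be illustrated with a deliberately simplified propagation rule. The sketch below is not the SAGCN layer: it assumes a toy five-word chain as the dependency graph, drops the learned weights and nonlinearity of a real GCN, and simply lets every node average itself and its neighbours at each layer. All names are hypothetical.

```python
def normalized_adjacency(num_nodes, edges):
    """Row-normalised adjacency with self-loops (mean over each neighbourhood)."""
    adj = [[1.0 if i == j else 0.0 for j in range(num_nodes)]
           for i in range(num_nodes)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    return [[value / sum(row) for value in row] for row in adj]


def propagate(adj, features):
    """One parameter-free propagation step: each node averages its neighbours."""
    return [sum(a * f for a, f in zip(row, features)) for row in adj]


def spread_after(layers, adj, features):
    """Gap between the largest and smallest node feature after `layers` steps."""
    for _ in range(layers):
        features = propagate(adj, features)
    return max(features) - min(features)


EDGES = [(0, 1), (1, 2), (2, 3), (3, 4)]  # toy chain standing in for a parse
ADJ = normalized_adjacency(5, EDGES)
START = [1.0, 0.0, 0.0, 0.0, 0.0]  # only node 0 carries a signal initially

for depth in (1, 2, 4, 8):
    print(depth, round(spread_after(depth, ADJ, START), 3))
```

The printed spread shrinks as depth grows, meaning the node features drift toward a common value. This only mirrors the qualitative argument above; it does not reproduce the behaviour at one or two layers shown in Figure 3.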

5. Conclusions

In this paper, we propose a syntax-aware and graph convolutional network-based sentiment analysis model (SAGCN) for aspect-level sentiment analysis tasks. The model first uses the pre-trained BERT language model to obtain embedded representations of sentences and aspect words, and then obtains syntactic knowledge and semantic information of sentences through the syntax-aware and semantic auxiliary modules, respectively. In the syntax-aware module, sentence dependency graphs are obtained through dependency parsing, and external affective common-sense knowledge is then introduced to augment the dependency graphs and provide more accurate sentiment representations of the different aspects of the sentence. In the semantic auxiliary module, multi-head self-attention and a Point-wise Convolutional Transformer (PCT) are used to supplement the semantic information and enrich the sentiment features of specific aspects. In order to highlight the importance of aspect information, aspect-specific information is additionally supplied in both the semantic and syntactic modules. The experimental results on three benchmark datasets demonstrate the effectiveness of the SAGCN model proposed in this paper. Compared to CDT, the most basic of the compared graph convolutional network models, the accuracy of SAGCN is improved by 5.87 percentage points on the Laptop dataset and by 5.23 and 3.31 percentage points on the Restaurant and Twitter datasets, respectively.

Although the SAGCN model achieved good results in the aspect-level sentiment analysis (ALSA) task, it still has some limitations, which we will address in future research. Firstly, the model relies heavily on dependencies between words and ignores other complex relationships in the sentence, which may reduce its accuracy. To address this, we plan to integrate the constituency (composition) tree into the aspect-level sentiment analysis model, enriching the syntactic information and improving classification accuracy. Secondly, the dependence on pre-trained models such as BERT may affect the performance of the model and the repeatability of the experiments. To address this, we plan to freeze the weights of the pre-trained models and fix the model version, in order to minimize the sensitivity of the model to changes in external resources and improve the reliability and repeatability of the experiments. Thirdly, the model architecture is relatively complex, which is inconvenient for practical application. For this, we will consider pruning or compressing the model to reduce its size and number of parameters, so as to improve its scalability and efficiency in practical application.

Author Contributions

Conceptualization, Q.G. and R.W.; Methodology, Q.G. and Z.W.; Software, Z.W.; Validation, Q.G., S.S. and R.W.; Formal analysis, H.Z.; Writing—original draft, Z.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found here: https://github.com/zhangzheng1997/SSEGCN-ABSA/tree/main/dataset.

Conflicts of Interest

The authors declare no conflict of interest.

References

Trusca, M.M.; Frasincar, F. Survey on aspect detection for aspect-based sentiment analysis. Artif. Intell. Rev. 2023, 56, 3797–3846.
Wang, X.; Chen, X.; Tang, M.; Yang, T.; Wang, Z. Aspect-level sentiment analysis based on position features using multilevel interactive bidirectional GRU and attention mechanism. Discret. Dyn. Nat. Soc. 2020, 2020, 1–13.
Liu, J.; Li, Z.; Ye, X.; Chen, J.; Deng, J. Sentiment Orientation of Text:bfsmPMI-SVM. J. Wuhan Univ. Nat. Sci. Ed. 2017, 63, 259–264.
Dubey, T.; Jain, A. Sentiment Analysis of Keenly Intellective Smart Phone Product Review Utilizing SVM Classification Technique. In Proceedings of the 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kanpur, India, 6–8 July 2019; pp. 1–8.
Liu, B.; Lane, I. Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling. In Proceedings of the 17th Annual Conference of the International-Speech-Communication-Association (INTERSPEECH 2016), San Francisco, CA, USA, 8–12 September 2016; pp. 685–689.
Liang, B.; Du, J.; Xu, R.; Li, B.; Huang, H. Context-aware embedding for targeted aspect-based sentiment analysis. arXiv 2019, arXiv:1906.06945.
Lu, T.; Xiang, Y.; Zhang, L.; Zhang, J. Sentence constituent-aware attention mechanism for end-to-end aspect-based sentiment analysis. Multimed. Tools Appl. 2022, 81, 15333–15348.
Pennington, J.; Socher, R.; Manning, C.D. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543.
Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.; Dean, J. Distributed Representations of Words and Phrases and Their Compositionality. In Proceedings of the 27th Annual Conference on Neural Information Processing Systems, NIPS 2013, Lake Tahoe, NV, USA, 5–10 December 2013.
Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A robustly optimized BERT pretraining approach. arXiv 2019, arXiv:1907.11692.
Zhang, C.; Li, Q.; Song, D. Aspect-based sentiment classification with aspect-specific graph convolutional networks. arXiv 2019, arXiv:1909.03477.
Huang, B.; Carley, K.M. Syntax-aware aspect level sentiment classification with graph attention networks. arXiv 2019, arXiv:1909.02606.
Zhu, X.; Zhu, L.; Guo, J.; Liang, S.; Dietze, S. GL-GCN: Global and Local Dependency Guided Graph Convolutional Networks for aspect-based sentiment classification. Expert Syst. Appl. 2021, 186, 115712.
Sun, K.; Zhang, R.; Mensah, S.; Mao, Y.; Liu, X. Aspect-Level Sentiment Analysis via Convolution over Dependency Tree. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 5679–5688.
Liang, B.; Su, H.; Gui, L.; Cambria, E.; Xu, R. Aspect-based sentiment analysis via affective knowledge enhanced graph convolutional networks. Knowl. Based Syst. 2022, 235, 107643.
Liu, H.; Wu, Y.; Li, Q.; Lu, W.; Li, X.; Wei, J.; Liu, X.; Feng, J. Enhancing aspect-based sentiment analysis using a dual-gated graph convolutional network via contextual affective knowledge. Neurocomputing 2023, 553, 126526.
Tang, D.; Qin, B.; Feng, X.; Liu, T. Effective LSTMs for target-dependent sentiment classification. arXiv 2015, arXiv:1512.01100.
Wang, Y.; Huang, M.; Zhu, X.; Zhao, L. Attention-Based LSTM for Aspect-Level Sentiment Classification. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–5 November 2016; pp. 606–615.
Tang, D.; Qin, B.; Liu, T. Aspect level sentiment classification with deep memory network. arXiv 2016, arXiv:1605.08900.
Ma, D.; Li, S.; Zhang, X.; Wang, H. Interactive attention networks for aspect-level sentiment classification. arXiv 2017, arXiv:1709.00893.
Ren, F.; Feng, L.; Xiao, D.; Cai, M.; Cheng, S. DNet: A lightweight and efficient model for aspect based sentiment analysis. Expert Syst. Appl. 2020, 151, 113393.
Liu, M.; Zhou, F.; Chen, K.; Zhao, Y. Co-attention networks based on aspect and context for aspect-level sentiment analysis. Knowl. Based Syst. 2021, 217, 106810.
Zhang, M.; Qian, T. Convolution over Hierarchical Syntactic and Lexical Graphs for Aspect Level Sentiment Analysis. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 3540–3549.
Tian, Y.; Chen, G.; Song, Y. Aspect-Based Sentiment Analysis with Type-Aware Graph Convolutional Networks and Layer Ensemble. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; pp. 2910–2922.
Zhang, Z.; Ma, Z.; Cai, S.; Chen, J.; Xue, Y. Knowledge-Enhanced Dual-Channel GCN for Aspect-Based Sentiment Analysis. Mathematics 2022, 10, 4273.
Li, R.; Chen, H.; Feng, F.; Ma, Z.; Wang, X.; Hovy, E. Dual Graph Convolutional Networks for Aspect-Based Sentiment Analysis. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, 1–6 August 2021; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 6319–6329.
Song, Y.; Wang, J.; Jiang, T.; Liu, Z.; Rao, Y. Attentional encoder network for targeted sentiment classification. arXiv 2019, arXiv:1902.09314.
Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907.
Song, W.; Wen, Z.; Xiao, Z.; Park, S.C. Semantics perception and refinement network for aspect-based sentiment analysis. Knowl. Based Syst. 2021, 214, 106755.
Pontiki, M.; Papageorgiou, H.; Galanis, D.; Androutsopoulos, I.; Pavlopoulos, J.; Manandhar, S. SemEval-2014 Task 4: Aspect Based Sentiment Analysis. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), Dublin, Ireland, 23–24 August 2014; pp. 27–35.
Dong, L.; Wei, F.; Tan, C.; Tang, D.; Zhou, M.; Xu, K. Adaptive Recursive Neural Network for Target-Dependent Twitter Sentiment Classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Baltimore, MD, USA, 23–24 June 2014; pp. 49–54.
Wang, K.; Shen, W.; Yang, Y.; Quan, X.; Wang, R. Relational graph attention network for aspect-based sentiment analysis. arXiv 2020, arXiv:2004.12362.

Figure 1. SAGCN-Specific Model Architecture.

Figure 2. Bidirectional LSTM network structure.

Figure 3. Effect of the number of GCN layers on the model.

Table 1. Sentiment scores for selected words in SenticNet7.

Words	Introspection	Attitude	Sensitivity	Sentiment Scores
Abandon	−0.329	0	0	−0.329
Happy	0.659	0	0	0.659
Depression	−0.999	0	−0.999	−0.999
Beautiful	0	0.823	0	0.823
Excitement	0.659	0	0.659	0.659
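As an illustration of how scores such as those in Table 1 could enter a dependency graph, the sketch below weights each dependency edge by the sentiment scores of its two words and by whether the edge touches an aspect word. The weighting rule (1 + score_i + score_j + aspect bonus) is an assumption in the style of sentiment-enhanced dependency graphs and may differ from the exact form used in SAGCN; the sentence, the hand-written dependency arcs, and all names are hypothetical, and self-loops are omitted for brevity.

```python
# Sentiment scores copied from the last column of Table 1.
# Words not listed are assumed to be neutral (score 0.0).
SENTIC_SCORES = {
    "abandon": -0.329,
    "happy": 0.659,
    "depression": -0.999,
    "beautiful": 0.823,
    "excitement": 0.659,
}


def enhanced_adjacency(tokens, edges, aspect_indices):
    """Build a symmetric, sentiment-weighted adjacency matrix.

    Assumed rule: weight = 1 + score_i + score_j + (1 if the edge touches
    an aspect word else 0). Pairs without a dependency edge keep weight 0.
    """
    n = len(tokens)
    scores = [SENTIC_SCORES.get(token.lower(), 0.0) for token in tokens]
    adj = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        aspect_bonus = 1.0 if (i in aspect_indices or j in aspect_indices) else 0.0
        adj[i][j] = adj[j][i] = 1.0 + scores[i] + scores[j] + aspect_bonus
    return adj


tokens = ["The", "screen", "is", "beautiful"]
edges = [(1, 0), (3, 1), (3, 2)]  # hand-written (head, dependent) arcs
adj = enhanced_adjacency(tokens, edges, aspect_indices={1})  # aspect: "screen"

for row in adj:
    print([round(value, 3) for value in row])
```

In this toy sentence the edge between the aspect "screen" and the opinion word "beautiful" receives the largest weight, which is the intended effect of such an enhancement.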

Table 2. Statistics of datasets.

Datasets	Positive (Train)	Positive (Test)	Neutral (Train)	Neutral (Test)	Negative (Train)	Negative (Test)
Laptop	994	341	464	169	870	128
Restaurant	2164	727	637	196	807	196
Twitter	1507	172	3016	336	1528	169

Table 3. Experimental environment details.

Development Environment	Parameters
Operating System	Windows 10
CPU	Intel(R) Core(TM) i5-10200H
RAM	16 GB
Integrated Development Tools	JetBrains PyCharm
Deep Learning Framework	PyTorch 1.13.0
GPU	NVIDIA GeForce RTX 3050 Laptop GPU

Table 4. Experimental results.

Models	Laptop Accuracy	Laptop Macro-F1	Restaurant Accuracy	Restaurant Macro-F1	Twitter Accuracy	Twitter Macro-F1
CDT	77.19	72.99	82.30	74.02	74.66	73.66
RGAT	78.21	74.07	86.60	81.35	76.15	74.88
SenticGCN	82.12	79.05	86.92	81.03	-	-
DualGCN	81.80	78.10	87.13	81.16	77.40	76.02
SSEGCN	81.01	77.96	87.31	81.09	77.81	76.08
SAGCN	83.06	79.69	87.53	81.28	77.97	76.62

Bold numbers represent the best experimental results. Symbol “-” indicates that the results of the experiment were not available in the original paper.
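The accuracy gains quoted in the analysis of Table 4 follow directly from the table entries. A minimal sketch of that arithmetic is given below; the dictionary `ACCURACY` and the helper `gain_over` are illustrative names, and only three of the six models are copied for brevity.

```python
# Accuracy values copied from Table 4 (in percent).
ACCURACY = {
    "CDT": {"Laptop": 77.19, "Restaurant": 82.30, "Twitter": 74.66},
    "RGAT": {"Laptop": 78.21, "Restaurant": 86.60, "Twitter": 76.15},
    "SAGCN": {"Laptop": 83.06, "Restaurant": 87.53, "Twitter": 77.97},
}


def gain_over(baseline, model):
    """Per-dataset accuracy difference (percentage points) of model vs. baseline."""
    return {
        dataset: round(ACCURACY[model][dataset] - ACCURACY[baseline][dataset], 2)
        for dataset in ACCURACY[baseline]
    }


print("RGAT vs. CDT:", gain_over("CDT", "RGAT"))
print("SAGCN vs. CDT:", gain_over("CDT", "SAGCN"))
```

The differences are absolute percentage points between accuracy values, not relative improvements.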

Table 5. The results of the ablation study.

Models	Laptop Accuracy	Laptop Macro-F1	Restaurant Accuracy	Restaurant Macro-F1	Twitter Accuracy	Twitter Macro-F1
w/o Sentic	82.12	78.51	86.02	80.11	76.37	75.12
w/o Aspect	82.86	79.14	87.16	81.72	77.21	75.89
w/o Semantic	81.12	77.81	85.92	79.02	75.73	74.10
SAGCN	83.06	79.69	87.53	81.28	77.97	76.62

Bold numbers represent the best experimental results.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
