Article

Aspect-Based Sentiment Analysis Through Graph Convolutional Networks and Joint Task Learning

1 School of Computer and Information Engineering, Henan University, Kaifeng 475004, China
2 Intelligent Computing Infrastructure Innovation Center, Zhejiang Lab, Hangzhou 311121, China
3 54th Research Institute, China Electronics Technology Group Corporation, Shijiazhuang 050081, China
4 School of Software, Henan University, Kaifeng 475004, China
* Author to whom correspondence should be addressed.
Information 2025, 16(3), 201; https://doi.org/10.3390/info16030201
Submission received: 28 January 2025 / Revised: 22 February 2025 / Accepted: 1 March 2025 / Published: 5 March 2025

Abstract

Aspect-based sentiment analysis (ABSA) through joint task learning aims to simultaneously identify aspect terms and predict their sentiment polarities. However, existing methods face two major challenges: (1) most existing studies focus on the sentiment polarity classification task and ignore the critical role of aspect term extraction, leading to insufficient performance in capturing aspect-related information; (2) existing methods typically model the two tasks independently, failing to effectively share underlying features and semantic information, which weakens the synergy between the tasks and limits overall performance. To resolve these issues, this research proposes a unified joint task learning framework, named MTL-GCN, that simultaneously performs aspect term extraction and sentiment polarity classification. The proposed model utilizes dependency trees combined with a self-attention mechanism to generate new weight matrices that emphasize the locational information of aspect terms, and optimizes the graph convolutional network (GCN) to extract aspect terms more efficiently. Furthermore, the model employs a multi-head attention (MHA) mechanism to process the input data and uses its output as the input to the GCN. The GCN then models the graph structure of the input data, capturing the relationships between nodes and global structural information, fully integrating global contextual semantic information, and generating deep-level contextual feature representations. Finally, the extracted aspect-related features are fused with global features and applied to the sentiment classification task. Experimental results on four benchmark datasets show that the proposed unified framework achieves state-of-the-art performance, with MTL-GCN outperforming baseline models in terms of F1ATE, accuracy, and F1SC. Additionally, comparative and ablation studies further validate the rationale and effectiveness of the model design.


1. Introduction

With the rapid advancement of Internet technology in recent years, the flourishing of social media platforms and e-commerce has provided netizens with extensive channels for expressing their opinions, while also generating vast amounts of emotion-rich review data. Analyzing users’ emotional tendencies toward events, products, and services from these data has become a key research hotspot. Efficiently analyzing review data can not only enhance the service quality of online platforms and strengthen societal sentiment monitoring but also optimize business marketing strategies, enabling precise customer service and product improvement.
Aspect-based sentiment analysis (ABSA) is a fine-grained sentiment classification task aimed at identifying the sentiment polarity of specific aspects within a text. It is commonly used to analyze users’ sentiment toward particular aspects of a product or service, such as the taste of food or the attitude of service. The sentiment polarity is typically divided into three categories: positive, negative, and neutral. Such analysis holds significant commercial value for businesses, enabling them to accurately understand consumer feedback and optimize their products or services accordingly. Currently, ABSA research is primarily focused on English corpora, with relatively limited studies on other languages. Due to differences in grammar structures, vocabulary usage, and ways of expressing sentiment across languages, these linguistic characteristics can directly impact the effectiveness of sentiment analysis. Therefore, ABSA methods need to be adapted to the specific features of each language. With the rapid development of deep learning technologies, sentiment classification techniques have also matured in the context of ABSA. Currently, pre-trained language models such as BERT have been widely applied to sentiment analysis tasks. Through fine-tuning, BERT can effectively adjust and optimize for different languages and sentiment categories. Additionally, other deep learning methods, such as graph neural networks (GNNs) and long short-term memory networks (LSTMs), are also commonly used in ABSA tasks, further improving the model’s performance in handling complex sentiment analysis. Moreover, ABSA demonstrates remarkable advantages even in scenarios with scarce data. On one hand, it leverages large-scale pre-trained models to effectively reduce the dependency on labeled data. On the other hand, through knowledge transfer, it can generate more high-quality training samples, thereby further improving model performance.
Many previous studies on aspect-based sentiment analysis [1,2] have primarily focused on improving the performance of single tasks. However, the ABSA task consists of two closely related subtasks: aspect term extraction (ATE) and sentiment classification (SC). These two subtasks can be effectively handled together using a multi-task learning approach. For example, in the sentence “The price is reasonable although the service is poor,” the ATE task needs to identify “price” and “service” as aspect terms, while the SC task attempts to predict the sentiment polarity (e.g., neutral, positive, or negative) associated with these aspect terms. The intrinsic relationship between these two subtasks lies in their shared dependency on the same aspect terms.
However, current research still faces several notable limitations in the following areas: (1) Existing methods mainly focus on sentiment polarity classification tasks, neglecting the critical role of aspect term extraction in ABSA tasks. This leads to insufficient performance in capturing aspect-related information, resulting in incomplete identification of aspect terms and, consequently, affecting the accurate judgment of sentiment polarity for these terms. (2) Existing methods typically model aspect term extraction and sentiment polarity classification as independent tasks, failing to effectively share underlying features and semantic information between them. This weakens the synergy between the tasks and limits the further improvement in model performance. (3) Although multi-task learning methods have achieved some success, current multi-task learning approaches do not fully leverage syntactic dependency information, leading to poor performance when handling long-distance dependencies and complex contexts. As a result, the models struggle to effectively capture deep dependencies within the text.
To address the aforementioned issues, this article proposes a unified joint framework, MTL-GCN, based on GCN and joint task learning. The joint framework consists of four modules: a context encoding module, an aspect term extraction module, a contextual feature representation module, and a sentiment polarity classification module. First, the input sequence is processed using a BERT-based context encoder [3] to generate low-level textual features shared by both subtasks.
On the one hand, a self-attention mechanism and the input sentence’s dependency tree (a tool in natural language processing used to analyze the syntactic structure of a sentence and help the model recognize the dependencies between words) are used to calculate a new attention weight matrix $W_G$. The graph convolutional network is then reconstructed to focus on positional information, resulting in position-focused graph convolutional networks (P-GCNs). These P-GCNs model the local structure and dependency relations of the sentence to extract aspect terms, providing aspect features for the sentiment classification task. On the other hand, the MHA processes the input data, which is then provided to the GCN. The GCN performs graph structure modeling to capture long-distance dependencies and generates deep contextual feature representations. Finally, the results from the aspect term extraction and contextual feature representation modules are integrated, where the extracted aspect features are fused with contextual features. These combined features are fed into a classifier to complete sentiment polarity classification, ultimately outputting the sentiment category for each aspect. Experimental results demonstrate that the proposed joint task learning framework achieves superior performance across four benchmark datasets, significantly outperforming multiple baseline models in classification accuracy and effectiveness.
The primary contributions of this study are as follows:
  • This study presents an innovative aspect-based sentiment analysis model (MTL-GCN) that integrates graph convolutional networks into a joint task learning framework. By introducing a feature-sharing mechanism to promote information interaction between tasks, aspect term extraction is effectively leveraged to provide deep semantic features for the sentiment classification task, thereby improving the overall performance.
  • In terms of method design, we redesign the traditional GCN. By introducing the relative positional information of nodes, P-GCNs are proposed, which significantly enhance the modeling capability of syntactic dependencies and focus on the positional information of aspect terms.
  • We propose a context feature representation method that combines graph convolutional networks with the multi-head attention mechanism, comprehensively integrating the local and global semantic information within the text.
  • Finally, experiments on multiple benchmark datasets validate the superior performance of the proposed model in aspect term extraction and sentiment classification tasks.

2. Related Work

Early aspect term extraction methods have employed various techniques to extract aspect features [4], including both traditional methods and deep learning-based approaches [5]. In the medical field, Ji et al. [6] proposed applying the Bi-LSTM-CRF model to named entity recognition in Chinese electronic medical records. Phan et al. [7] combined part-of-speech embeddings, dependency embeddings, and contextual embeddings (such as BERT and RoBERTa) to further enhance the performance of aspect term extraction. In addition, some studies [8] have attempted to model the relationships between words using latent dependency information, where certain relationships may include positional information of aspect terms. Chen et al. [1] were the first to associate words with soft prototypes for the ATE task, where soft prototypes provide a semantic summary of aspect terms. This approach significantly improved the performance of traditional sequence labeling methods. Ma et al. [9] formalized this method in a sequence-to-sequence learning task. They also improved ATE performance by creating gated unit networks and position-aware attention mechanisms. These methods classify each word into specific categories (such as whether it constitutes an aspect term) and rely on features such as lexical information, part-of-speech tags, and syntactic information for model training. With the development of deep learning, more and more research has started to use neural network methods for aspect term extraction. Sun et al. [10] proposed an aspect term extraction method that combines a dynamic attention mechanism and dense connection graph convolutional network. This method has achieved the best results among existing studies.
In the field of sentiment classification, early research primarily relied on rule-based methods or traditional machine learning approaches, such as support vector machines (SVMs) [11] and naive Bayes [12]. These methods typically relied on manually constructed sentiment lexicons or methods based on the frequency of sentiment-related words in the text for classification. However, these traditional methods have certain limitations, as they are unable to fully capture the complex emotions and deep contextual information within the text. In recent years, deep learning technologies have made significant progress, particularly graph convolutional networks (GCNs) [13], whose core advantage lies in their ability to embed graph structures into deep neural networks, enabling efficient modeling of structured information in text and providing richer contextual representations. Zhang et al. [13] proposed a unique syntax and semantics enhanced graph convolutional network (SSEGCN) model that learns from local features to global contextual information to enhance classification performance. A bi-syntax aware graph attention network (BiSyn-GAT+) was proposed by Liang et al. [2]. This network utilizes syntactic information to simulate intra- and inter-contextual relationships, effectively improving sentiment classification performance. In addition, the SS-GCN model proposed by Chen et al. [14] is one of the most advanced models in sentiment classification tasks. This model automatically learns the syntactic weight matrix through graph convolutional networks (GCNs) and combines semantic information to obtain deep semantic representations of the text, thereby effectively enhancing the overall performance of the task.
The multi-task learning approach has been widely adopted in recent years and has played a significant role in improving model performance. Nguyen et al. [15] proposed the MATEPC model to jointly handle aspect term extraction and sentiment polarity classification tasks. Zhao et al. [16] integrated aspect term extraction and sentiment polarity classification within a multi-task learning framework, utilizing multi-head attention mechanisms and relational graph attention networks (RGAT) to capture key dependency relations, thereby enhancing classification performance. Fan et al. [17] integrated the AD-BiReGU module into the BERT-LCF framework to simultaneously perform aspect term extraction and fine-grained sentiment analysis, addressing the limitations of existing models that primarily focus on a single task. Their proposed model achieved state-of-the-art results. Multi-task learning enhances the model’s semantic understanding in complex contexts by sharing internal representations, allowing the model to effectively capture the inherent relationships between sentiment classification and aspect extraction tasks during training. Compared to traditional single-task learning methods, multi-task learning optimizes shared features between tasks, enabling the model to promote and collaborate between sentiment polarity classification and aspect extraction, thereby significantly improving the overall performance.

3. Methodology

3.1. Task Definition

For a sentence s = {w1, w2, …, wn}, where w1, w2, …, wn represent the words in the sentence, the aspect term sequence consists of m words a = {a1, a2, …, am}, which represent specific words or phrases related to the sentiment aspects in the sentence. These aspect terms differ from other words in the sentence because they specifically represent the entities or attributes toward which the sentiment is directed. ABSA through multi-task learning aims to label aspect terms with the set of boundary labels Y = {B, I, O} and to forecast their sentiment polarities from {pos, neg, neu}, where B indicates the beginning of an aspect term, I indicates an intermediate or final word of an aspect term, and O represents all other words. The sentiment polarities correspond to positive, negative, and neutral, respectively.
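As a concrete illustration of the labeling scheme, the example sentence from the Introduction could be tagged as follows (a minimal sketch; the whitespace tokenization and polarity assignments are illustrative assumptions, not gold annotations):

```python
# Hypothetical BIO boundary labels and polarities for
# "The price is reasonable although the service is poor".
tokens = ["The", "price", "is", "reasonable", "although", "the", "service", "is", "poor"]
bio    = ["O",   "B",     "O",  "O",          "O",        "O",   "B",       "O",  "O"]
polarity = {"price": "pos", "service": "neg"}   # sentiment polarity per extracted aspect term
```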

3.2. Overview of the Model Framework

In Figure 1, we present the proposed multi-task learning framework (MTL-GCN), which combines graph convolutional networks with multi-task learning methods. The framework consists of four main modules: a context encoding module, an aspect term extraction module, a contextual feature representation module, and a sentiment polarity classification module. Specifically, Figure 1 outlines the entire processing flow from input text to sentiment classification. First, the input text is processed through a BERT encoder to obtain low-level text representations. Then, the resulting text representations are processed by the aspect term extraction module on the left and the contextual feature representation module on the right, generating vector representations for aspect terms and contextual features, respectively. Finally, the two extracted vectors are concatenated and passed into the sentiment polarity classifier for sentiment classification. In Section 3.3, Section 3.4, Section 3.5 and Section 3.6, we will provide a detailed explanation of the functions and working principles of each module.

3.3. Context-Encoder

In this study, we utilized the BERT [3] model to capture contextual information from text. BERT processes input text sequences through a bidirectional transformer architecture, effectively extracting the semantic features of each word. To handle the input for BERT, we used WordPiece embeddings [18]. The processing of BERT input is shown in Figure 2. Each word is first mapped to a low-dimensional real-valued vector, with its embedding matrix represented as $E \in \mathbb{R}^{|V| \times d_e}$, where $|V|$ represents the size of the vocabulary and $d_e$ represents the dimension of the word embeddings. Therefore, a sentence s is represented by its corresponding word embeddings $E = [E_1, E_2, \ldots, E_n]$.
To ensure the model focuses solely on the actual text and eliminates interference from irrelevant padding, we employ a masking mechanism and sequence length information to filter BERT’s output. Using the mask operation, the feature values of the padding sections are set to zero, ensuring that subsequent feature learning relies only on valid text portions. Furthermore, after deriving contextual features from the BERT encoder, we map the vocabulary through one-hot encoding to provide a foundation for semantic enhancement. Subsequently, the one-hot encoded mapping results are combined with the masked BERT output via matrix multiplication. This approach not only retains the contextual information provided by the BERT model but also improves the ability of the model to perceive the semantic subtleties of individual words, thereby offering more refined contextual features for the subsequent sentiment classification task. Next, the hidden state vectors $H = [H_1, H_2, \ldots, H_n]$ are generated by the BERT encoder as follows:
$$H_i = \mathrm{BERTEncoder}(E_i)$$
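A minimal sketch of this encoding step is shown below, assuming the Hugging Face transformers library and a bert-base-uncased checkpoint; it illustrates only the masked hidden states H, not the one-hot semantic-enhancement step, and variable names are illustrative rather than the authors' implementation.

```python
import torch
from transformers import BertTokenizerFast, BertModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

sentence = "The price is reasonable although the service is poor"
enc = tokenizer(sentence, return_tensors="pt", padding="max_length",
                max_length=32, truncation=True)

with torch.no_grad():
    out = bert(**enc)                          # hidden states, shape (1, seq_len, 768)

# Zero out padding positions so that subsequent layers only see valid tokens.
mask = enc["attention_mask"].unsqueeze(-1)     # (1, seq_len, 1)
H = out.last_hidden_state * mask               # masked hidden states H = [H1, ..., Hn]
print(H.shape)
```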

3.4. Aspect Terms Extraction

The aspect term extraction module can be divided into three parts: dependency tree construction, weight matrix computation, and position-focused graph convolutional networks.

3.4.1. Construction of the Dependency Tree

The input text is first processed through a natural language processing tool to construct the dependency tree. As an important syntactic structure, the dependency tree clearly describes the relationships between words and effectively captures the dependencies between them. By precisely encoding these syntactic relationships, the dependency tree plays a crucial role in helping the model understand the meaning of the sentence. It guides the model to focus on the most relevant syntactic connections between words, thus enabling it to accurately distinguish sentiment words related to aspect terms using these dependencies. Compared to other syntactic representations, the dependency tree has significant advantages: it explicitly models the syntactic relationships between words, allowing the model to directly capture the dependencies between them. This feature makes the dependency tree particularly suitable for tasks such as aspect term extraction and sentiment polarity classification. For a given sentence s, we construct a standard dependency graph $G(E, V)$, where each edge $e \in E$ denotes a syntactic dependency connection and each node $v \in V$ denotes a word’s hidden state. The adjacency matrix of the dependency graph, denoted DA, is defined as follows:
$$DA_{ij} = \begin{cases} 1, & \text{if } e_{ij} \in E \text{ or } i = j \\ 0, & \text{otherwise} \end{cases}$$
where $e_{ij}$ denotes the edge connecting node $i$ and node $j$.
However, the quality and accuracy of the dependency tree may vary depending on the input text. Specifically, sentences with ambiguity or spelling errors can lead to parsing mistakes. To address this, we combine Spacy’s powerful parsing capabilities with fine-tuning of pre-trained language models, which helps improve parsing accuracy. This fine-tuning process contributes to the construction of high-quality dependency trees, ensuring that the model performs well across a variety of input texts, thus optimizing overall performance. As shown in Figure 3, the complete dependency tree structure of the sentence “The price is reasonable although the service is poor” is presented. The direction of the arrows indicates the dependency relationship from the head word to the dependent word, while the labels specify the specific grammatical functions (e.g., det represents a determiner, nsubj represents a nominal subject, etc.).
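The following sketch illustrates how such an adjacency matrix DA can be built with spaCy; the parser model and the undirected treatment of edges are assumptions for illustration, not necessarily the authors' exact setup.

```python
import numpy as np
import spacy

nlp = spacy.load("en_core_web_sm")   # assumes the small English pipeline is installed

def dependency_adjacency(sentence: str) -> np.ndarray:
    doc = nlp(sentence)
    n = len(doc)
    DA = np.eye(n, dtype=np.float32)          # self-loops: DA[i, i] = 1
    for token in doc:
        if token.i != token.head.i:           # one edge per head-dependent pair
            DA[token.i, token.head.i] = 1.0
            DA[token.head.i, token.i] = 1.0   # treat the dependency graph as undirected
    return DA

DA = dependency_adjacency("The price is reasonable although the service is poor")
print(DA.shape)   # (9, 9)
```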

3.4.2. Calculation of the Weight Matrix

Self-Attention Mechanism is an important method for capturing dependencies between elements in a sequence. It dynamically adjusts the attention given to each element by calculating the relationships between query (Q), key (K), and value (V). By evaluating the relevance between each pair of elements in the sequence, it effectively captures the complex dependencies within the sequence. To aggregate the attention scores of different tokens and form the final attention matrix, we first compute the attention score for each pair of tokens (i.e., the similarity between their query and key). These attention scores are then normalized using the softmax function, transforming them into a probability distribution. Next, these normalized attention scores are used to weight the corresponding value vectors. Finally, the attention matrix is obtained by aggregating the weighted value vectors for each token. The calculation formula is as follows:
$$A = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
where $Q$ and $K$ are both set equal to $H$, which is generated by the encoding layer, and $d_k$ is the dimension of the key vectors.
In our model, a new weight matrix is constructed by integrating the dependency features between nodes, resulting in the weighted feature representation. The specific calculation formula is as follows:
$$W_G = A \odot DA$$
where $\odot$ represents element-wise multiplication.
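A hedged PyTorch sketch of this step is given below; note that, to keep $W_G$ the same shape as DA, it uses the n × n attention score matrix (with Q = K = H) before multiplication by V, which is an interpretation for illustration rather than the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def attention_weight_matrix(H: torch.Tensor, DA: torch.Tensor) -> torch.Tensor:
    """H: (n, d) hidden states from the encoder; DA: (n, n) dependency adjacency matrix."""
    d_k = H.size(-1)
    scores = H @ H.t() / d_k ** 0.5      # Q = K = H, pairwise attention scores (n, n)
    A = F.softmax(scores, dim=-1)        # normalized attention matrix
    return A * DA                        # element-wise mask with the dependency graph: W_G

W_G = attention_weight_matrix(torch.randn(9, 768), torch.ones(9, 9))
print(W_G.shape)   # torch.Size([9, 9])
```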

3.4.3. Position-Focused Graph Convolutional Networks

To further improve the performance of graph convolutional networks (GCNs), inspired by [19,20], we designed the position-focused graph convolutional network (P-GCN) as an enhancement to the traditional GCN. Traditional GCNs primarily focus on the dependencies between nodes while neglecting their relative positions in the graph. In contrast, P-GCN integrates the positional information of nodes into the graph construction process, enabling the model not only to capture dependencies between words but also to pay closer attention to the positions of aspect terms within the sentence. This allows the model to identify aspect terms more effectively and generate more accurate feature vectors for them. Specifically, we converted the aggregation process into one that aggregates local dependency information by extracting the maximum features of nodes. The graph convolution process involves $l$ iterations of updates. During each iteration, the target node aggregates information based on the hidden representations of its neighboring nodes. Utilizing the previously defined weight matrix $W_G$, the representation of the target node is computed as follows:
$$h_i^{l} = \sigma\left(\left[\mathrm{MAX}\left(W_{G_i} h^{l-1}\right); h_i^{l-1}\right] W_m + b_m\right)$$
$$h^{l} = \left[h_1^{l}, h_2^{l}, h_3^{l}, \ldots, h_n^{l}\right]$$
where $h^{l-1}$ represents the node feature representations from the previous layer, $W_m$ and $b_m$ are learnable parameters, and $h_i^{l}$ denotes the representation of the i-th node in the l-th layer.
In order to represent the position of aspect terms in the final feature vector, P-GCN combines the adjacency matrix and the self-attention mechanism, adding position weighting to the nodes’ features. Specifically, each node’s position weight ($W_{G_i}$) is integrated with the feature representations from the previous layer ($h^{l-1}$), making the graph convolution operation focus more on the relative position of the nodes and capture the syntactic dependencies related to the aspect terms.
During the update process of the target node, a MAX aggregator is used to extract feature information from preceding nodes and concatenate it with the current node’s features. Through the graph convolution operation, the node features are progressively updated to capture the dependencies and positional information between the nodes. Finally, after processing with the P-GCN module, we obtain the feature vector for aspect terms, denoted as $h^{A}$, calculated as follows:
$$h^{A} = \mathrm{P\text{-}GCNs}(H)$$
where P-GCNs denotes the stacked position-focused graph convolution operations.
Through multiple iterations, we can obtain a more accurate local feature representation $h^{A} = [h_1^{A}, h_2^{A}, h_3^{A}, \ldots, h_n^{A}]$, which serves as the final feature of the aspect terms. Then, a linear layer and the ReLU activation function are applied to further process these features, and the softmax function is used to convert them into a probability distribution over aspect categories. Finally, the model assigns a BIO tag to each word, predicting whether each word is an aspect term.
$$Z^{A} = \mathrm{ReLU}\left(h^{A} W_a + b_a\right)$$
$$Y^{A} = \mathrm{softmax}\left(Z^{A}\right) = \frac{\exp\left(z_i\right)}{\sum_{k=1}^{C_a}\exp\left(z_k\right)}$$
where $W_a$ and $b_a$ are learnable parameters, $C_a$ denotes the number of aspect categories, and $Y^{A}$ denotes the predicted probability distribution over aspect categories.
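As a rough illustration of one P-GCN layer with the MAX aggregator and concatenation described above, consider the following PyTorch sketch; layer dimensions and the exact handling of $W_G$ are assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class PGCNLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(2 * dim, dim)   # W_m, b_m over the concatenated features

    def forward(self, h: torch.Tensor, W_G: torch.Tensor) -> torch.Tensor:
        """h: (n, d) node features; W_G: (n, n) position/dependency weights."""
        # Weight each neighbor's features by W_G[i, j], then take the element-wise MAX.
        weighted = W_G.unsqueeze(-1) * h.unsqueeze(0)   # (n, n, d)
        aggregated = weighted.max(dim=1).values         # (n, d) MAX aggregator
        # Concatenate the aggregated neighborhood features with the node's own features.
        return torch.relu(self.proj(torch.cat([aggregated, h], dim=-1)))

layer = PGCNLayer(dim=768)
h_A = layer(torch.randn(9, 768), torch.rand(9, 9))      # aspect-term features h^A
print(h_A.shape)   # torch.Size([9, 768])
```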

3.5. Contextual Feature Representation

The main goal of the context representation module is to generate deeper contextual feature representations by modeling the global semantics and long-distance dependencies of the text. It consists of two components: the multi-head attention mechanism and the graph convolutional network.

Multi-Head Attention Mechanism and Graph Convolutional Network

Multi-Head Attention Mechanism: In this study, we designed and introduced the multi-head attention [21] mechanism to more accurately capture semantic information at different levels of the text. To enhance the model’s robustness in noisy environments, we introduce the BERT encoder for input processing. BERT employs the WordPiece tokenization method, which automatically splits misspelled words into smaller subunits, thereby effectively reducing the impact of noise on model performance. In this way, even if the input text contains spelling errors or informal language, BERT can still preserve semantic integrity through token decomposition and mapping, ensuring that the model’s performance is not excessively affected by noise. Unlike traditional single-head attention mechanisms, MHA employs multiple attention heads, allowing it to learn semantic relationships within the input sequence from various perspectives and comprehensively capture contextual information. Specifically, MHA first projects the input Query (Q), Key (K), and Value (V) into multiple independent subspaces, where each attention head independently computes attention scores to measure the relationships between elements in the input sequence. Then, the outputs of all attention heads are concatenated and processed through a linear transformation to integrate information from different subspaces, ultimately generating a global semantic representation.
The attention value of each individual head is computed as follows:
$$a = \mathrm{softmax}\left(f_s(K, Q)\right) K$$
$$f_s\left(K_i, Q_j\right) = \tanh\left(\left[K_i; Q_j\right] W_f\right)$$
where $[K_i; Q_j]$ represents the concatenation of the key and query vectors, $W_f$ is a model parameter, and $f_s$ is a function used to compute the relationship score between the key and query. The result of $f_s(K_i, Q_j)$ is then passed through a softmax operation to obtain the attention weights, which are subsequently multiplied with the key vectors $K$ to derive the attention value $a$.
The formula for multi-head attention is as follows:
$$\mathrm{MHA} = \left[a_1; a_2; a_3; \ldots; a_{n_{head}}\right] W_p$$
where $n_{head}$ represents the number of attention heads, and $W_p$ is a model parameter. For each attention head $i$ (ranging from 1 to $n_{head}$), there is a corresponding attention value $a_i$. The attention values of all heads $[a_1; a_2; a_3; \ldots; a_{n_{head}}]$ are concatenated and then multiplied by the parameter matrix $W_p$ to obtain the final output of the multi-head attention.
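For intuition, the sketch below uses PyTorch's built-in nn.MultiheadAttention with Q = K = V = H as a stand-in; the paper's additive tanh scoring function $f_s$ would require a custom attention module, so this is only an approximation of the described mechanism.

```python
import torch
import torch.nn as nn

d_model, n_head, seq_len = 768, 8, 9
mha = nn.MultiheadAttention(embed_dim=d_model, num_heads=n_head, batch_first=True)

H = torch.randn(1, seq_len, d_model)       # hidden states from the context encoder
mha_out, attn_weights = mha(H, H, H)       # Q = K = V = H
print(mha_out.shape, attn_weights.shape)   # (1, 9, 768) (1, 9, 9)
```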
Graph Convolutional Network: The results obtained from the MHA computation are used as the input for the GCN. By modeling the syntactic and semantic dependencies of the text, GCN effectively captures long-distance dependency information within the text. In the text, relationships between words are not only established through local context but also involve long-range dependencies spanning multiple words. By constructing a graph structure of the text, the graph convolutional network enables each word’s representation to be influenced not only by its neighboring words but also to incorporate global information from the entire sentence. Specifically, the graph convolutional network transmits information between nodes (words) through the adjacency matrix, enabling the model to learn more intricate dependency relationships in multi-layer graph convolutions. The specific calculation formula is as follows:
$$h_j = \mathrm{GCNs}(\mathrm{MHA})$$
$$h_i^{l} = \sigma\left(\sum_{j=1}^{n} DA_{ij} W^{l} h_j^{l-1} + b^{l}\right)$$
where $W^{l}$ is a learnable parameter, $b^{l}$ is a bias term, $\sigma$ is a non-linear activation function (ReLU), and $h_i^{l}$ denotes the representation of the i-th node in the l-th layer.
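A minimal sketch of one such GCN layer operating on the MHA output is shown below; the un-normalized adjacency and single-layer form are simplifying assumptions for illustration.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)    # W^l and b^l

    def forward(self, h: torch.Tensor, DA: torch.Tensor) -> torch.Tensor:
        """h: (n, d) node features (here, the MHA output); DA: (n, n) adjacency matrix."""
        return torch.relu(DA @ self.linear(h))   # sum_j DA_ij * (W^l h_j^{l-1} + b^l)

gcn = GCNLayer(dim=768)
h_C = gcn(torch.randn(9, 768), torch.ones(9, 9))   # contextual features h^C
print(h_C.shape)   # torch.Size([9, 768])
```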
By integrating GCN and MHA, we designed and implemented an efficient module for global context modeling and feature extraction. Firstly, MHA enhances the semantic representation of the context through multi-head parallel computation, enabling the model to capture information from multiple semantic subspaces and focus on different levels and fine-grained semantic relationships within the text. Secondly, GCN models long-distance dependencies in the text by constructing a graph structure, further capturing both local and global structural information, thereby strengthening the connections between syntactic and semantic levels. Through this integration, the model effectively captures syntactic dependencies between words while gaining a deeper understanding of the overall semantic structure of the text. After multiple iterations, a more accurate contextual feature representation $h^{C} = [h_1^{C}, h_2^{C}, h_3^{C}, \ldots, h_n^{C}]$ is obtained.

3.6. Sentiment Polarity Classification

To improve the accuracy of sentiment classification, we first integrate the local features (i.e., the aspect term features) obtained from the aspect term extraction module with the global features derived from the contextual feature representation module. Specifically, the sentiment polarity classification module concatenates the output features $h^{A}$ from the aspect term extraction module and the output features $h^{C}$ from the contextual feature representation module into a unified vector representation. Subsequently, this fused feature is fed into the classifier, which predicts the sentiment polarity (e.g., positive, negative, or neutral) for each aspect based on the feature relationships learned during training. Finally, the classifier generates a probability distribution across the sentiment polarities and selects the category with the greatest probability as the predicted result. This approach allows the model to simultaneously capture key information from the local context and the overall semantic relationships from the global context, thereby improving the accuracy of sentiment classification. The specific calculation formula is as follows:
$$h = \left[h^{C}; h^{A}\right]$$
$$y = \mathrm{softmax}\left(W_s h + b_s\right)$$
where $W_s$ represents learnable weights, and $b_s$ represents a bias term.
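The classification head can be sketched as follows, assuming aspect and contextual features of equal dimension (an illustrative simplification, not the authors' exact layer sizes).

```python
import torch
import torch.nn as nn

class SentimentHead(nn.Module):
    def __init__(self, dim: int, num_polarities: int = 3):
        super().__init__()
        self.classifier = nn.Linear(2 * dim, num_polarities)   # W_s, b_s

    def forward(self, h_A: torch.Tensor, h_C: torch.Tensor) -> torch.Tensor:
        h = torch.cat([h_C, h_A], dim=-1)             # h = [h^C; h^A]
        return torch.softmax(self.classifier(h), dim=-1)

head = SentimentHead(dim=768)
probs = head(torch.randn(9, 768), torch.randn(9, 768))   # polarity probabilities
print(probs.shape)   # torch.Size([9, 3])
```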

3.7. Model Training

To simultaneously optimize the aspect term extraction task and the sentiment polarity classification task, we designed a multi-task learning framework. During the training process, the parameters of both the BERT embedding layer and transformer layers are shared by both tasks, and the generated contextual representations are used as inputs for both tasks, helping to extract task-relevant features. The shared BERT parameters are updated through joint training, ensuring that both tasks can independently learn their respective features based on the shared contextual information. For each task, we also designed independent output layers, and the task-specific parameters are updated independently during training. Meanwhile, the shared BERT parameters are optimized through joint loss functions in the gradient updates for both tasks, while the task-specific parameters are gradually updated based on each task’s individual loss. To balance the influence of the two tasks on model training, we adopted a weighted loss function, assigning loss weights based on the relative importance of each task.
To optimize the model during training, we minimize the following overall objective function, which is calculated as:
$$L = L_{ATE} + L_{SC}$$
$L_{ATE}$ and $L_{SC}$ represent the loss functions for the aspect term extraction task and the sentiment polarity classification task, respectively. Both adopt the cross-entropy loss and incorporate L2 regularization as a constraint. The particular calculation formulas are as follows:
$$L_{ATE} = -\sum_{i=1}^{N}\sum_{j=0}^{C_1} y_{ij}^{(A)} \ln\left(\hat{y}_{ij}^{(A)}\right) + \lambda_1 \left\|\theta_1\right\|$$
$$L_{SC} = -\sum_{i=1}^{N}\sum_{j=0}^{C_2} y_{ij}^{(P)} \ln\left(\hat{y}_{ij}^{(P)}\right) + \lambda_2 \left\|\theta_2\right\|$$
where $y_{ij}^{(A)}$ represents the true label of the i-th sample in the j-th aspect category, and $y_{ij}^{(P)}$ represents the true label of the i-th sample in the j-th sentiment category. $\hat{y}_{ij}^{(A)}$ and $\hat{y}_{ij}^{(P)}$ are the labels predicted by the model. $C_1$ and $C_2$ denote the total number of categories for the two tasks, $\lambda_1$ and $\lambda_2$ represent the regularization coefficients, and $\theta_1$ and $\theta_2$ are the trainable parameters of the model.
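A hedged sketch of this joint objective in PyTorch is shown below; the label encodings, regularization weights, and the choice of which parameters enter the L2 terms are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def joint_loss(ate_logits, ate_labels, sc_logits, sc_labels,
               ate_params, sc_params, lambda1=1e-5, lambda2=1e-5):
    # Cross-entropy for BIO tagging (aspect term extraction).
    l_ate = F.cross_entropy(ate_logits, ate_labels)
    # Cross-entropy for sentiment polarity classification.
    l_sc = F.cross_entropy(sc_logits, sc_labels)
    # L2 regularization on the task-specific parameters theta_1 and theta_2.
    l2_ate = sum(p.pow(2).sum() for p in ate_params)
    l2_sc = sum(p.pow(2).sum() for p in sc_params)
    return (l_ate + lambda1 * l2_ate) + (l_sc + lambda2 * l2_sc)

# Toy usage with random logits and integer class labels.
w = [torch.randn(3, 3, requires_grad=True)]
loss = joint_loss(torch.randn(4, 3), torch.tensor([0, 1, 2, 0]),
                  torch.randn(4, 3), torch.tensor([2, 0, 1, 1]), w, w)
print(loss.item())
```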

4. Experiments

4.1. Datasets and Parameter Settings

4.1.1. Datasets

To assess the efficacy of the presented model, experiments were conducted on four benchmark datasets: the Restaurant14 and Laptop14 datasets from SemEval2014 (Task 4), MAMS [22], and Twitter [23]. All the datasets used in this study are in English, and each dataset has been carefully annotated for the aspect term extraction and sentiment polarity classification tasks. Specifically, the Restaurant14 and Laptop14 datasets contain both multi-aspect and single-aspect sentences, the Twitter dataset contains only single-aspect sentences, and each sentence in the MAMS dataset includes at least two aspect terms with different sentiment polarities. The sentiment labels are categorized into three types: positive, negative, and neutral. The statistical results of the datasets are shown in Table 1, where Train represents the training set and Test represents the test set.

4.1.2. Experimental Parameter Settings

To ensure optimal performance of the model, we configured and optimized several key hyperparameters. Table 2 presents the final hyperparameter settings, which were determined by comprehensively referencing previous studies and validating through multiple experiments to guarantee the stability and generalization capability of the model during training and testing. The Adam optimizer is a gradient-based optimization algorithm that combines momentum and adaptive learning rate adjustment. It efficiently adjusts parameters, optimizing the convergence speed and stability during training, and enhances the model’s performance and generalization ability in tasks such as aspect term extraction and sentiment polarity classification.

4.2. Evaluation and Criteria

To comprehensively evaluate the classification performance of the model, two commonly used metrics for classification tasks are adopted: accuracy and F1-score. Specifically, $F1_{ATE}$ is used as the evaluation metric for the aspect term extraction task, while $F1_{SC}$ and accuracy (Acc) are used as evaluation metrics for the aspect sentiment classification task.
The calculation formulas are as follows:
$$Acc = \frac{TP + TN}{TP + TN + FP + FN}$$
$$P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN}$$
$$F1_{ATE} = \frac{2PR}{P + R}$$
$$F1_{SC} = \frac{1}{C}\sum_{i=1}^{C} F1_i$$
where TP is the number of samples classified as positive that are truly positive, TN denotes the number of samples classified as negative that are truly negative, FP denotes the number of negative samples erroneously classified as positive, FN denotes the number of positive samples mistakenly classified as negative, $F1_i$ is the F1-score of the i-th sentiment category, and C represents the number of sentiment categories.
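The metrics can be computed, for example, with scikit-learn as sketched below; treating $F1_{SC}$ as the macro-average over the three sentiment categories and evaluating $F1_{ATE}$ at the token level are assumptions made for this illustration (span-level F1 via a tool such as seqeval is also common in practice).

```python
from sklearn.metrics import accuracy_score, f1_score

# Sentiment classification: accuracy and macro F1 (F1_SC).
sc_true = ["pos", "neg", "neu", "pos", "neg"]
sc_pred = ["pos", "neg", "pos", "pos", "neg"]
acc = accuracy_score(sc_true, sc_pred)
f1_sc = f1_score(sc_true, sc_pred, average="macro")

# Aspect term extraction: F1 over BIO tags (token-level stand-in for F1_ATE).
ate_true = ["B", "O", "O", "B", "I", "O"]
ate_pred = ["B", "O", "B", "B", "I", "O"]
f1_ate = f1_score(ate_true, ate_pred, average="macro")

print(f"Acc={acc:.3f}, F1_SC={f1_sc:.3f}, F1_ATE={f1_ate:.3f}")
```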

4.3. Compared Models

The models compared in the experiments mainly come from three major domains: aspect term extraction, aspect sentiment classification, and multi-task joint learning.

4.3.1. Aspect Term Extraction Model

  • DTBCSNN [24]: A dependency tree-based stacked convolutional neural network was proposed, which uses conditional random fields (CRFs) to accurately extract aspect terms.
  • RAL [25]: A reinforcement learning-based active learning sampling strategy was proposed to optimize the aspect term extraction process, improving extraction efficiency and accuracy.
  • LDA [26]: An unsupervised learning method was proposed, which identifies potential topics of user interest and achieves automatic aspect term extraction through the guidance of a small set of seed words.
  • DA-DCGCN [10]: A method for aspect term extraction combining dynamic attention mechanism and dense connection graph convolutional network (DA-DCGCN) is proposed.

4.3.2. Aspect Sentiment Classification Model

  • DualGCN [27]: A dual graph convolutional network model including SynGCN and SemGCN modules for capturing syntactic structures and semantic connections separately.
  • SDGCN-BERT [28]: An ABSA model based on graph convolutional networks. By introducing a bi-directional attention mechanism with positional encoding and a GCN module, the model effectively captures sentiment dependencies between multiple aspects in a sentence.
  • MHAGCN [21]: A model using hierarchical multi-head attention mechanisms and graph convolutional networks, thoroughly considering syntactic dependencies and combining semantic information to provide deep interaction among aspect terms and context.
  • SS-GCN [14]: This model enhances semantic representation for aspect-level sentiment analysis through graph convolutional networks by automatically learning syntactic weight matrices and integrating syntactic and semantic information, thereby capturing aspect sentiment more accurately.

4.3.3. Multi-Task Joint Learning Model

  • MNN [29]: Utilizes a unified sequence labeling scheme to define training tasks, simultaneously performing aspect term extraction and sentiment classification.
  • LCF-ATEPC [30]: A multi-task learning model for Chinese ABSA that is capable of simultaneously extracting aspect terms and inferring their sentiment polarities.
  • MTABSA [16]: Combines aspect term extraction and sentiment polarity classification in a multitask learning framework, leveraging multi-head attention and RGAT to capture key dependency relations and enhance classification performance.
  • BLAB [17]: By integrating the AD-BiReGU module into the BERT-LCF framework, aspect term extraction and fine-grained sentiment analysis are performed simultaneously, addressing the limitation of existing models that primarily focus on a single task.

4.4. Main Results

In the experiments conducted on each dataset, we consistently fine-tuned the MTL-GCN model using the same hyperparameters to ensure consistency and stability in both the training and testing processes. Overall, the experimental results on the four datasets indicate that the model performs well and achieves state-of-the-art levels. Table 3, Table 4, and Table 5 present the comparison results between our model and the baseline models. This outcome strongly demonstrates that our model can consistently leverage its advantages across multiple datasets.
Table 3 illustrates that, compared to the current best aspect term extraction model, DA-DCGCN, our proposed multi-task learning model, MTL-GCN, achieves an average improvement of approximately 2.01% across three datasets. This result indicates that the feature-sharing mechanism in multi-task learning can more effectively capture task-related features, thereby significantly enhancing the model’s performance in the aspect term extraction task. Similarly, as shown in Table 4, MTL-GCN also achieves significant improvements in the sentiment polarity classification task, compared to the latest single-task model SS-GCN. This further validates that multi-task learning can effectively leverage the relationships between different subtasks, thus improving the accuracy of sentiment polarity classification.
Additionally, the experimental results in Table 5 indicate that our proposed model, MTL-GCN, significantly outperforms the baseline models in both subtasks, providing strong evidence that our method can effectively enhance the overall performance of the model in the joint modeling process. Specifically, in comparison to the current state-of-the-art model, BLAB, our model improves the accuracy (Acc) by 1.03% and the F1SC score by 1.56% on the sentiment polarity classification task on the Restaurant dataset. On the Laptop dataset, our model increases the accuracy by 1.17% in the aspect term extraction task. On the Twitter dataset, our model achieves a 2.14% improvement in accuracy, and on the MAMS dataset, the accuracy increases by 2.28%. Although the F1ATE score shows a slight decline on the Restaurant and Twitter datasets, the overall performance of the model still significantly outperforms other methods. These results strongly validate the effectiveness of the proposed position-focused graph convolutional network in the aspect term extraction task and demonstrate the effectiveness of leveraging inter-task correlations to improve classification performance.

4.5. Ablation Study

To further validate the efficacy of each component in our model, we performed an ablation study, with the specific results shown in Table 6. In this study, we separately removed the dependency tree, the P-GCN module, and the MHA module to evaluate the contribution of each component to the model. The results in Table 6 indicate that removing the P-GCN module had the most significant impact on the performance of the ATE task.
Specifically, when the P-GCN module was removed, the F1ATE metric dropped by an average of approximately 4.3% across the four datasets. This result demonstrates that the P-GCN module effectively models the dependencies between aspect terms while capturing contextual information related to aspect terms, enabling accurate aspect term extraction. Additionally, when the MHA module was removed, the F1SC metric showed varying degrees of decline across the four datasets. This suggests that relying solely on the GCN for semantic feature extraction is insufficient to fully capture sentiment features, resulting in reduced performance in the SC task. Lastly, we evaluated the impact of the dependency tree on model performance. The results show that removing the dependency tree caused significant declines in performance for both the ATE and SC tasks, highlighting the crucial role of syntactic dependency relationships provided by the dependency tree in the overall model.

4.6. Impact of GCN Layers

This section analyzes the impact of different numbers of GCN layers on model performance through visualization experiments. Figure 4 summarizes the performance of P-GCN with different layers on the ATE task and GCN with different layers on the SC task. The experiments demonstrate that a single-layer GCN achieves relatively stable performance in both tasks. The introduction of a two-layer P-GCN further improves the effectiveness of the ATE task, enabling the model to more accurately identify aspect terms. However, when the number of GCN layers increases to three or more, the overall model performance begins to decline significantly.
The visualization results indicate that an excessively deep network structure may lead to over-aggregation of features, interfering with the model’s capacity to collect critical task-related information. Overall, the single-layer GCN, with its simple and efficient structure, fully demonstrates its advantages in multi-task learning.

4.7. Visualization of Attention Weights

To clearly demonstrate how the attention mechanism enhances performance in aspect term extraction (ATE) and sentiment classification (SC) tasks, we present the visualization of attention weights in Figure 5. Figure 5a shows that high attention weights are concentrated on aspect terms and their surrounding modifiers, indicating that the syntactic dependencies we introduced enhance the attention mechanism, helping the model focus more on aspect terms. Specifically, both the x-axis and y-axis represent words in the sentence, and the shading of each cell indicates the attention weight of one word to another. The darker the color, the higher the attention weight, meaning that when computing the features of the current word, the model pays more attention to the words that are highlighted. Therefore, the attention mechanism allocates more weight to aspect terms in this way. Figure 5b depicts how the multi-head attention mechanism captures the relationship between sentiment words and aspect terms, with a focus on the allocation of attention to sentiment words. We can observe that the attention layer assigns more weight to sentiment words, allowing the model to effectively determine sentiment polarity. These results indicate that the attention mechanism is essential for enhancing the model’s performance in both tasks.

4.8. Case Study

To intuitively demonstrate the benefits of our approach compared to other models, we selected MTABSA, BLAB, and our model, MTL-GCN, for analysis of example sentences, with the findings shown in Table 7.
For the initial example sentence, the labeled aspects include food, kitchen, and menu, with the corresponding sentiment polarities being positive, positive, and neutral, respectively. MTABSA exhibits certain deviations in sentiment classification. Although MTABSA successfully extracted all aspect terms (food, kitchen, and menu), it incorrectly labeled the sentiment polarity of “menu” as negative.
For the second example sentence, the labeled aspects are tech guy, service center, and “sales” team, with the sentiment polarities being neutral, negative, and negative, respectively. As shown in Table 7, neither MTABSA nor BLAB fully extracted all aspect terms, which affected the accuracy of these models in the sentiment classification task. Additionally, MTABSA incorrectly identified the sentiment polarity of the “sales” team.
In contrast, our model, MTL-GCN, not only accurately extracted all aspect terms but also correctly classified the sentiment polarity for each aspect term, perfectly matching the labels.

5. Conclusions and Future Work

5.1. Conclusions

This research presents a unified model based on graph convolutional networks and multi-task learning (MTL-GCN), capable of jointly modeling aspect term extraction and sentiment polarity classification. Within MTL-GCN, we introduce dependency trees combined with a self-attention mechanism to generate new weight matrices, focusing on the positional information of aspect terms. Additionally, we redesign the graph convolutional network to effectively extract aspect terms. By integrating the graph convolutional network with an MHA mechanism, the model’s comprehension of different levels of semantic information in the text is enhanced, thereby generating deeper contextual feature representations. The improved model achieves significant performance gains compared to existing baseline models across four different datasets.

5.2. Future Work

In subsequent work, we plan to further enhance the existing framework to accommodate the increasingly diverse multimodal data requirements in sentiment analysis. With the widespread application of multimodal data, such as images and audio, in social media and review scenarios, we aim to integrate these modalities into the current framework to enable the model to better understand complex emotional expressions and further enhance its generalization ability. First, we will explore how to more effectively model the complementarity between modalities within a unified framework, particularly in scenarios where emotional expression heavily depends on cross-modal information. Second, we will leverage the task relationship modeling mechanisms in the current framework to investigate how multimodal data can share features across different tasks, thereby enhancing the synergy between subtasks. In addition, considering that emotional expressions in different domains often have their own uniqueness, we plan to evaluate the model’s performance on domain-specific languages (such as those in medicine, finance, etc.) in our future research. While expanding the multimodal capabilities, we aim to further enhance the model’s adaptability and accuracy in these domains to better address the specific language features of each domain. Lastly, since statistical significance testing was not conducted in the current study, we will place greater emphasis on this aspect in future work to ensure the reliability and robustness of the results. Through these improvements, our research is expected to provide effective solutions for multimodal aspect-based sentiment analysis.

Author Contributions

H.H. was responsible for the design of the study, data analysis, and experiments; S.W. participated in the design and execution of the study and assisted the first author in conducting the research and writing the manuscript; B.Q. was responsible for reviewing and refining the initial draft of the manuscript; L.D. handled the proofreading and editing after the completion of the manuscript; Y.W. contributed to parts of the manuscript writing; H.X. and X.Z. provided financial support for this research. All authors have read and agreed to the published version of the manuscript.

Funding

The research was funded by the Key Research and Promotion Projects of Henan Province (Grant No. 242102210084 and 242102210081), the Zhejiang Provincial Natural Science Foundation (Grant No. LQ23F020013).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

I am grateful to my lab and mentors for their guidance and support throughout the entire process of this work.

Conflicts of Interest

Author Xiaomei Zou was employed by the company Intelligent Computing Infrastructure Innovation Center, Zhejiang Lab. Author Hui Xue was employed by the company 54th Research Institute, China Electronics Technology Group Corporation. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
BERT: Bidirectional encoder representations from transformers
ATE: Aspect term extraction
SC: Sentiment polarity classification
RGAT: Relational graph attention network
Pos: Positive
Neu: Neutral
Neg: Negative

References

  1. Chen, Z.; Qian, T. Enhancing aspect term extraction with soft prototypes. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 2107–2117. [Google Scholar]
  2. Liang, S.; Wei, W.; Mao, X.-L.; Wang, F.; He, Z. BiSyn-GAT+: Bi-syntax aware graph attention network for aspect-based sentiment analysis. arXiv 2022, arXiv:2204.03117. [Google Scholar]
  3. Kenton, J.D.M.-W.C.; Toutanova, L.K. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019. [Google Scholar]
  4. Wang, W.; Pan, S.J.; Dahlmeier, D.; Xiao, X. Coupled multi-layer attentions for co-extraction of aspect and opinion terms. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017. [Google Scholar]
  5. Yang, Y.; Li, K.; Quan, X.; Shen, W.; Su, Q. Constituency lattice encoding for aspect term extraction. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 8–13 December 2020; pp. 844–855. [Google Scholar]
  6. Ji, B.; Liu, R.; Li, S.; Tang, J.; Yu, J.; Li, Q.; Xu, W. A BILSTM-CRF method to Chinese electronic medical record named entity recognition. In Proceedings of the 2018 International Conference on Algorithms, Computing and Artificial Intelligence, Sanya China, 21–23 December 2018; pp. 1–6. [Google Scholar]
  7. Phan, M.H.; Ogunbona, P.O. Modelling context and syntactical features for aspect-based sentiment analysis. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 3211–3220. [Google Scholar]
  8. Chen, G.; Tian, Y.; Song, Y. Joint aspect extraction and sentiment analysis with directional graph convolutional networks. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 8–13 December 2020; pp. 272–279. [Google Scholar]
  9. Ma, D.; Li, S.; Wu, F.; Xie, X.; Wang, H. Exploring sequence-to-sequence learning in aspect term extraction. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 3538–3547. [Google Scholar]
  10. Sun, X.; Mi, Y.; Liu, J.; Li, H. Aspect Term Extraction via Dynamic Attention and a Densely Connected Graph Convolutional Network. In Proceedings of the Pacific Rim International Conference on Artificial Intelligence, Kyoto, Japan, 18–24 November 2024; pp. 383–395. [Google Scholar]
  11. Luo, F.; Li, C.; Cao, Z. Affective-feature-based sentiment analysis using SVM classifier. In Proceedings of the 2016 IEEE 20th International Conference on Computer Supported Cooperative Work in Design (CSCWD), Nanchang, China, 4–6 May 2016; pp. 276–281. [Google Scholar]
  12. Das, S.; Kolya, A.K. Sense GST: Text mining & sentiment analysis of GST tweets by Naive Bayes algorithm. In Proceedings of the 2017 Third International Conference on Research in Computational Intelligence and Communication networks (ICRCICN), Kolkata, India, 3–5 November 2017; pp. 239–244. [Google Scholar]
  13. Zhang, Z.; Zhou, Z.; Wang, Y. SSEGCN: Syntactic and semantic enhanced graph convolutional network for aspect-based sentiment analysis. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, DC, USA, 10–15 July 2022; pp. 4916–4925. [Google Scholar]
  14. Chen, J.; Fan, H.; Wang, W. Syntactic and Semantic Aware Graph Convolutional Network for Aspect-based Sentiment Analysis. IEEE Access 2024, 12, 22500–22509. [Google Scholar] [CrossRef]
  15. Nguyen, H.; Shirai, K. A joint model of term extraction and polarity classification for aspect-based sentiment analysis. In Proceedings of the 2018 10th International Conference on Knowledge and Systems Engineering (KSE), Ho Chi Minh City, Vietnam, 1–3 November 2018; pp. 323–328. [Google Scholar]
  16. Zhao, G.; Luo, Y.; Chen, Q.; Qian, X. Aspect-based sentiment analysis via multitask learning for online reviews. Knowl.-Based Syst. 2023, 264, 110326. [Google Scholar] [CrossRef]
  17. Fan, X.; Zhang, Z. A fine-grained sentiment analysis model based on multi-task learning. In Proceedings of the 2024 4th International Symposium on Computer Technology and Information Science (ISCTIS), Xi’an, China, 12–14 July 2024; pp. 157–161. [Google Scholar]
  18. Wu, Y.; Schuster, M.; Chen, Z.; Le, Q.V.; Norouzi, M.; Macherey, W.; Krikun, M.; Cao, Y.; Gao, Q.; Macherey, K.; et al. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv 2016, arXiv:1609.08144. [Google Scholar]
19. Zhao, X.; Peng, H.; Dai, Q.; Bai, X.; Peng, H.; Liu, Y.; Guo, Q.; Yu, P.S. RDGCN: Reinforced dependency graph convolutional network for aspect-based sentiment analysis. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining, Merida, Mexico, 4–8 March 2024; pp. 976–984. [Google Scholar]
  20. Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
  21. Li, X.; Lu, R.; Liu, P.; Zhu, Z. Graph convolutional networks with hierarchical multi-head attention for aspect-level sentiment classification. J. Supercomput. 2022, 78, 14846–14865. [Google Scholar] [CrossRef] [PubMed]
  22. Jiang, Q.; Chen, L.; Xu, R.; Ao, X.; Yang, M. A challenge dataset and effective models for aspect-based sentiment analysis. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 6280–6285. [Google Scholar]
23. Dong, L.; Wei, F.; Tan, C.; Tang, D.; Zhou, M.; Xu, K. Adaptive recursive neural network for target-dependent Twitter sentiment classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Baltimore, MD, USA, 23–25 June 2014; pp. 49–54. [Google Scholar]
  24. Ye, H.; Yan, Z.; Luo, Z.; Chao, W. Dependency-tree based convolutional neural networks for aspect term extraction. In Proceedings of the Advances in Knowledge Discovery and Data Mining: 21st Pacific-Asia Conference, PAKDD 2017, Jeju, South Korea, 23–26 May 2017; pp. 350–362. [Google Scholar]
  25. Venugopalan, M.; Gupta, D. A reinforced active learning approach for optimal sampling in aspect term extraction for sentiment analysis. Expert Syst. Appl. 2022, 209, 118228. [Google Scholar] [CrossRef]
  26. Venugopalan, M.; Gupta, D. An enhanced guided LDA model augmented with BERT based semantic strength for aspect term extraction in sentiment analysis. Knowl.-Based Syst. 2022, 246, 108668. [Google Scholar] [CrossRef]
  27. Li, R.; Chen, H.; Feng, F.; Ma, Z.; Wang, X.; Hovy, E. Dual graph convolutional networks for aspect-based sentiment analysis. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, 1–6 August 2021; pp. 6319–6329. [Google Scholar]
  28. Zhao, P.; Hou, L.; Wu, O. Modeling sentiment dependencies with graph convolutional networks for aspect-level sentiment classification. Knowl.-Based Syst. 2020, 193, 105443. [Google Scholar] [CrossRef]
  29. Wang, F.; Lan, M.; Wang, W. Towards a one-stop solution to both aspect extraction and sentiment analysis tasks with neural multi-task learning. In Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–8. [Google Scholar]
30. Yang, H.; Zeng, B.; Yang, J.; Song, Y.; Xu, R. A multi-task learning model for Chinese-oriented aspect polarity classification and aspect term extraction. Neurocomputing 2021, 419, 344–356. [Google Scholar] [CrossRef]
Figure 1. This figure illustrates how aspect-based sentiment analysis is conducted through graph convolutional networks and multi-task learning. The pink sections correspond to the aspect term extraction task, while the orange sections represent the sentiment polarity classification task.
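To make the joint-task setup in Figure 1 concrete, the sketch below shows a shared encoder output feeding two heads: one for aspect term extraction (token-level tagging, the pink path) and one for sentiment polarity classification (sentence-level, the orange path). It is only an illustrative skeleton under assumed module names and dimensions, not the authors' implementation of MTL-GCN.

```python
import torch
import torch.nn as nn

class JointABSAHeads(nn.Module):
    """Illustrative joint-task skeleton: a shared encoder output feeds an
    ATE head (per-token BIO tagging) and an SC head (per-sentence polarity)."""
    def __init__(self, hidden=768, num_bio_tags=3, num_polarities=3):
        super().__init__()
        self.ate_head = nn.Linear(hidden, num_bio_tags)    # B / I / O per token
        self.sc_head = nn.Linear(hidden, num_polarities)   # pos / neg / neu per sentence

    def forward(self, token_states):                       # (batch, seq_len, hidden)
        ate_logits = self.ate_head(token_states)           # (batch, seq_len, num_bio_tags)
        sentence_repr = token_states.mean(dim=1)           # simple pooling stand-in
        sc_logits = self.sc_head(sentence_repr)            # (batch, num_polarities)
        return ate_logits, sc_logits

# Joint training would sum the two losses, e.g.
# loss = ce(ate_logits.flatten(0, 1), bio_labels.flatten()) + ce(sc_logits, polarity_labels)
```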
Figure 2. This figure illustrates the process of generating input embeddings for the sentence “The price is reasonable although the service is poor”. The input embedding consists of three components: token embeddings, segment embeddings, and position embeddings. Token embeddings are the word vectors obtained through word embedding techniques; segment embeddings are used to distinguish different parts of the sentence, with each word receiving a corresponding segment embedding based on its position within the sentence; position embeddings contain information about the word’s position within the sentence.
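As a code-level companion to Figure 2, the minimal sketch below builds BERT-style input embeddings by summing token, segment, and position embeddings element-wise. The vocabulary size and maximum sequence length are assumptions; only the 768-dimensional hidden size matches Table 2.

```python
import torch
import torch.nn as nn

class InputEmbedding(nn.Module):
    """Minimal sketch of BERT-style input embeddings: the element-wise sum of
    token, segment, and position embeddings (sizes are illustrative)."""
    def __init__(self, vocab_size=30522, hidden=768, max_len=512, n_segments=2):
        super().__init__()
        self.token = nn.Embedding(vocab_size, hidden)
        self.segment = nn.Embedding(n_segments, hidden)
        self.position = nn.Embedding(max_len, hidden)

    def forward(self, token_ids, segment_ids):
        # Position indices 0..seq_len-1, broadcast to the batch shape.
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        positions = positions.unsqueeze(0).expand_as(token_ids)
        return self.token(token_ids) + self.segment(segment_ids) + self.position(positions)
```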
Figure 3. The visualization of the complete dependency tree structure for the sentence “The price is reasonable although the service is poor”, showing the relationships between words and their grammatical functions.
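The dependency tree in Figure 3 is the structure that is turned into a graph for graph convolution. The sketch below shows one common way to do this, using spaCy as an assumed parser (the paper does not specify this toolkit): each head-dependent arc becomes a symmetric edge and self-loops are added, yielding an adjacency matrix a GCN can consume.

```python
import numpy as np
import spacy  # assumed parser; requires the en_core_web_sm model to be installed

nlp = spacy.load("en_core_web_sm")

def dependency_adjacency(sentence):
    """Parse a sentence and return its dependency-tree adjacency matrix
    (undirected, with self-loops) plus the parsed arcs for inspection."""
    doc = nlp(sentence)
    n = len(doc)
    adj = np.eye(n, dtype=np.float32)        # self-loops
    for tok in doc:
        if tok.i != tok.head.i:              # skip the root's self-arc
            adj[tok.i, tok.head.i] = 1.0
            adj[tok.head.i, tok.i] = 1.0     # make edges symmetric
    return adj, [(tok.text, tok.dep_, tok.head.text) for tok in doc]

adj, arcs = dependency_adjacency("The price is reasonable although the service is poor")
```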
Figure 4. This figure illustrates the effect of the number of GCN layers, with subfigure (a) showing the impact of P-GCN layers on the ATE task and subfigure (b) showing the impact of GCN layers on the SC task.
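Figure 4 studies how many graph-convolution layers to stack. For orientation, a single GCN layer is often written as H^(l+1) = ReLU(Â H^(l) W^(l)), with Â the (normalized) adjacency matrix; the generic sketch below stacks such layers and is not the paper's P-GCN module.

```python
import torch
import torch.nn as nn

class StackedGCN(nn.Module):
    """Generic stack of GCN layers: H_{l+1} = ReLU(A_hat @ H_l @ W_l)."""
    def __init__(self, hidden=768, num_layers=2):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(num_layers))

    def forward(self, h, adj_hat):
        # h: (batch, n_tokens, hidden); adj_hat: (batch, n_tokens, n_tokens)
        for linear in self.layers:
            h = torch.relu(torch.bmm(adj_hat, linear(h)))
        return h
```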
Figure 5. Attention layer visualization. Darker colors indicate higher attention scores. Subfigure (a) presents the visualization results for the ATE task, while subfigure (b) shows the visualization results for the SC task.
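Heat maps like those in Figure 5 can be reproduced with a few lines of matplotlib once token-level attention scores are available. The sketch below uses invented scores purely for illustration; it is not the authors' plotting code.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_attention(tokens, scores, title):
    """Draw a 1 x n heat strip over tokens; darker cells mean higher scores."""
    scores = np.asarray(scores, dtype=float).reshape(1, -1)
    fig, ax = plt.subplots(figsize=(max(4, len(tokens)), 1.4))
    ax.imshow(scores, cmap="Reds", aspect="auto")
    ax.set_xticks(range(len(tokens)))
    ax.set_xticklabels(tokens, rotation=45, ha="right")
    ax.set_yticks([])
    ax.set_title(title)
    fig.tight_layout()
    plt.show()

# Illustrative scores only; not values reported in the paper.
plot_attention(["The", "price", "is", "reasonable"], [0.05, 0.55, 0.10, 0.30], "SC attention")
```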
Table 1. Statistics of datasets.
Dataset           Split   Positive   Negative   Neutral   Total
Restaurant 14 *   Train   2164       807        637       3608
Restaurant 14 *   Test    727        196        196       1119
Laptop 14 *       Train   937        851        455       2243
Laptop 14 *       Test    337        128        167       632
Twitter           Train   1507       1528       3016      6051
Twitter           Test    172        169        336       677
MAMS              Train   3380       2764       5042      11,186
MAMS              Valid   403        325        604       1332
MAMS              Test    400        329        607       1336
* http://alt.qcri.org/semeval2014/task4, accessed on 24 October 2024.
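Split statistics such as those in Table 1 are simple label tallies over (sentence, aspect) instances. The snippet below sketches that tally for a generic list of examples with a polarity field; the field name and data layout are assumptions rather than the actual SemEval/MAMS loaders.

```python
from collections import Counter

def label_statistics(examples):
    """Count positive / negative / neutral instances in one data split.
    `examples` is assumed to be a list of dicts with a 'polarity' key."""
    counts = Counter(ex["polarity"] for ex in examples)
    counts["total"] = sum(counts.values())
    return dict(counts)

# e.g. label_statistics(train_split)
# -> {'positive': 2164, 'negative': 807, 'neutral': 637, 'total': 3608}
```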
Table 2. Hyperparameter settings.
Hyper-Parameter            Value
Word embedding dimension   768
Batch size                 12
Learning rate              2 × 10⁻⁵
Training epochs            20
Dropout rate               0.5
Optimizer                  Adam
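For readers reimplementing the setup, the values in Table 2 map directly onto a training configuration. The sketch below is one plausible reading of that table, with `model` standing in for the MTL-GCN network; it is not the authors' training script.

```python
import torch

# Hyperparameters taken from Table 2.
CONFIG = {
    "embedding_dim": 768,
    "batch_size": 12,
    "learning_rate": 2e-5,
    "epochs": 20,
    "dropout": 0.5,
}

def build_optimizer(model: torch.nn.Module) -> torch.optim.Adam:
    """Adam optimizer with the learning rate from Table 2."""
    return torch.optim.Adam(model.parameters(), lr=CONFIG["learning_rate"])
```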
Table 3. Performance of different models on the aspect term extraction task. The bolded text indicates the best outcomes.
Models            Restaurant (F1ATE)   Laptop (F1ATE)   Twitter (F1ATE)
DTBCSN (2017)     83.97                75.66            75.33
RAL (2022)        85.63                78.67            73.61
LDA (2022)        81.00                75.00            74.00
DA-DCGCN (2024)   87.61                82.74            83.42
MTL-GCN           89.02                84.73            86.04
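The F1ATE metric reported in Table 3 (and in Tables 5 and 6) is span-level F1 over extracted aspect terms. A minimal exact-match version is sketched below; the authors' evaluation script may differ in details such as tokenization or span normalization.

```python
def span_f1(pred_spans, gold_spans):
    """Exact-match span F1: spans are (start, end) index pairs per sentence."""
    tp = fp = fn = 0
    for pred, gold in zip(pred_spans, gold_spans):
        pred, gold = set(pred), set(gold)
        tp += len(pred & gold)
        fp += len(pred - gold)
        fn += len(gold - pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# e.g. span_f1([[(1, 2)]], [[(1, 2), (7, 8)]]) -> 0.666...
```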
Table 4. Performance of different models on the sentiment classification task. The bolded text indicates the best outcomes.
Models              | Restaurant      | Laptop          | Twitter
                    | Acc      F1SC   | Acc      F1SC   | Acc      F1SC
SDGCN-BERT (2020)   | 83.57    76.47  | 81.35    78.34  | –        –
DualGCN (2021)      | 84.27    78.08  | 78.48    74.47  | 75.92    74.29
MHAGCN (2022)       | 82.57    75.83  | 79.06    75.70  | 74.53    73.75
SS-GCN (2024)       | 82.96    74.26  | 75.86    71.78  | –        –
MTL-GCN             | 89.49    84.35  | 81.62    78.23  | 81.53    80.22
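Acc and F1SC in Table 4 are standard classification accuracy and macro-averaged F1 over the three polarity classes. The sketch below computes both with scikit-learn, assuming flat lists of gold and predicted labels for an evaluation split.

```python
from sklearn.metrics import accuracy_score, f1_score

def sentiment_metrics(y_true, y_pred):
    """Accuracy and macro-F1 over the polarity labels (pos / neg / neu)."""
    return {
        "acc": accuracy_score(y_true, y_pred),
        "f1_sc": f1_score(y_true, y_pred, average="macro"),
    }

# e.g. sentiment_metrics(["pos", "neg", "neu"], ["pos", "neg", "neg"])
```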
Table 5. Comparison of experimental results of different models. The bolded text indicates the best outcomes.
Models             | Restaurant             | Laptop                 | Twitter                | MAMS
                   | F1ATE   Acc    F1SC    | F1ATE   Acc    F1SC    | F1ATE   Acc    F1SC    | F1ATE   Acc    F1SC
MNN (2018)         | 83.05   77.17  68.50   | 76.94   70.40  65.98   | 72.05   71.05  63.87   | –       –      –
LCF-ATEPC (2021)   | 88.45   86.77  80.54   | 83.32   80.97  77.86   | 85.12   76.7   74.54   | –       –      –
MTABSA (2023)      | 87.45   86.88  81.16   | 81.55   80.56  77.00   | 87.33   76.21  74.34   | –       –      –
BLAB (2024)        | 89.47   88.46  82.79   | 84.57   80.45  78.02   | 88.87   79.39  79.28   | 83.72   84.56  85.37
MTL-GCN            | 89.02   89.49  84.35   | 84.73   81.62  78.23   | 86.04   81.53  80.22   | 85.24   86.84  86.46
Table 6. Ablation study. w/o indicates the removal of the current module from the model.
Methods         | Restaurant             | Laptop                 | Twitter                | MAMS
                | F1ATE   Acc    F1SC    | F1ATE   Acc    F1SC    | F1ATE   Acc    F1SC    | F1ATE   Acc    F1SC
MTL-GCN         | 89.02   89.49  84.35   | 84.73   81.62  78.23   | 86.04   81.53  80.22   | 85.24   86.84  86.46
w/o P-GCN       | 83.78   85.64  79.23   | 81.59   77.86  73.54   | 80.99   76.49  76.36   | 81.61   82.16  83.29
w/o MHA         | 87.99   87.92  82.85   | 83.75   80.84  77.31   | 84.39   79.62  78.76   | 84.07   85.93  84.61
w/o Dep. tree   | 85.13   86.91  80.05   | 82.53   78.19  76.26   | 83.62   79.76  78.51   | 82.01   84.87  84.23
Table 7. A case study comparing the predicted results of different models.

Review 1: The food is uniformly exceptional, with a very capable kitchen that will proudly whip up whatever you feel like eating, whether it’s on the menu or not.
Gold label: Aspect {food, kitchen, menu}; Polarity {pos, pos, neu}
  MTABSA    Aspect: {food, kitchen, menu}   Polarity: {pos, pos, neg}
  BLAB      Aspect: {food, kitchen, menu}   Polarity: {pos, pos, neu}
  MTL-GCN   Aspect: {food, kitchen, menu}   Polarity: {pos, pos, neu}

Review 2: The tech guy then said the service center does not do 1-to-1 exchange and I have to direct my concern to the “sales” team, which is the retail shop from which I bought my netbook.
Gold label: Aspect {tech guy, service center, “sales” team}; Polarity {neu, neg, neg}
  MTABSA    Aspect: {service center, “sales” team}   Polarity: {neg, neu}
  BLAB      Aspect: {service center, “sales” team}   Polarity: {neg, neg}
  MTL-GCN   Aspect: {tech guy, service center, “sales” team}   Polarity: {neu, neg, neg}
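The aspect sets shown in Table 7 are recovered from per-token predictions of the extraction head. The sketch below assumes a simple B/I/O tagging scheme and shows how contiguous B/I tokens are grouped back into aspect-term strings; the exact scheme used by each model may differ.

```python
def decode_bio(tokens, tags):
    """Group B-/I- tagged tokens into aspect-term strings (O ends a span)."""
    terms, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                terms.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:                      # "O" or a stray "I" without an open span
            if current:
                terms.append(" ".join(current))
            current = []
    if current:
        terms.append(" ".join(current))
    return terms

# e.g. decode_bio(["The", "tech", "guy", "then", "said"], ["O", "B", "I", "O", "O"]) -> ["tech guy"]
```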
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
