Generative Aspect Sentiment Quad Prediction with Self-Inference Template

Qin, Yashi; Lv, Shu

doi:10.3390/app14146017

Open Access Article


by Yashi Qin and Shu Lv *

School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu 611731, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(14), 6017; https://doi.org/10.3390/app14146017

Submission received: 27 February 2024 / Revised: 28 June 2024 / Accepted: 9 July 2024 / Published: 10 July 2024

(This article belongs to the Special Issue AI Empowered Sentiment Analysis)


Abstract

Aspect Sentiment Quad Prediction is one of the most significant and complex research topics within Aspect-Based Sentiment Analysis. Leveraging the generative paradigm of the T5 model, we achieve end-to-end extraction of aspect sentiment elements by paraphrasing the original text into sentences predefined by templates. Existing work mostly restricts templates to a single sentence or directly concatenates sentiment elements with a few marker symbols, which limits the model’s opportunities to reason. In this work, we introduce a Self-Inference Template (SIT) that guides the model through a step-by-step inference process during generation, enabling it to identify aspect sentiment elements and their interdependencies more accurately. Experimental results show a significant improvement in quadruplet prediction performance without additional time cost, and the approach partially mitigates the overfitting caused by limited training data.

Keywords:

aspect-based sentiment analysis; aspect sentiment quad prediction; aspect-category-opinion-sentiment; chain of thought; prompt

1. Introduction

Research on Aspect-Based Sentiment Analysis (ABSA) mainly involves four sentiment elements: Aspect Term, Aspect Category, Opinion Term, and Sentiment Polarity. ABSA tasks aim to identify the sentiment elements related to specific text items, either individual elements, as in aspect term extraction [1,2] and aspect category detection [3,4], or multiple interdependent elements, as in aspect-opinion pair extraction [5,6], aspect sentiment triplet extraction [7], and aspect-category-sentiment detection [8]. In general, the more sentiment elements are identified, the more complete the understanding of the aspect-level opinions in a text. In 2021, Cai [9] first proposed the Aspect-Category-Opinion-Sentiment (ACOS) quadruple extraction task, which includes implicit aspect and opinion elements. In the same year, Zhang [10] introduced the Aspect Sentiment Quad Prediction (ASQP) task, which excludes implicit opinions. Together, these works established aspect sentiment quadruple extraction as a task.

Zhang [10] also proposed a novel modeling paradigm based on the T5 generative model. This paradigm paraphrases the original sentence into the form “$x_{ac}$ is $x_{sp}$ because $x_{at}$ is $x_{ot}$”, from which the quadruplets can easily be extracted. Here, $x_{ac}$ denotes the aspect category, $x_{sp}$ the sentiment polarity, $x_{at}$ the aspect term, and $x_{ot}$ the opinion term. Numerous subsequent studies of aspect-level sentiment analysis built on the generative paradigm, and some of them explored template design. Hu [11] investigated how the order of sentiment elements in the template affects aspect sentiment quadruplet prediction, simplifying the template by directly concatenating marker symbols and elements, as in “[AC] $x_{ac}$ [AT] $x_{at}$ [SP] $x_{sp}$ [OT] $x_{ot}$”. Joseph [12] also redefined the template as “$x_{ac}$ | the $x_{at}$ is $x_{ot}$ | $x_{sp}$”. These templates vary the order and form in which sentiment elements are placed in an attempt to improve aspect sentiment quadruplet prediction. However, all of them are concise sentences that expect the model to produce the quadruplet directly. We instead consider letting the model reason more slowly before giving its answer, because in some sentences the sentiment elements depend on one another in complex ways. For example, the aspect category is determined not only by the aspect term but also by the opinion term: in the two examples in Figure 1, the aspect term is “sandwiches” in both cases, yet the aspect categories differ. Therefore, a template that helps the model reason about the relationships among sentiment elements may benefit aspect sentiment quadruple prediction.
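The Paraphrase-style target described above can be sketched as a round trip between quadruplets and template sentences. This is a minimal illustration, assuming quadruplets are (aspect term, aspect category, sentiment polarity, opinion term) tuples with the polarity written as a plain word, and that several quadruplets in one sentence are joined with a hypothetical `[SSEP]` separator; Zhang [10] may verbalize polarities and separators differently.

```python
SEP = " [SSEP] "  # hypothetical separator between quadruplets in one target


def quads_to_paraphrase(quads):
    """Verbalize (at, ac, sp, ot) tuples as 'ac is sp because at is ot'."""
    return SEP.join(f"{ac} is {sp} because {at} is {ot}" for at, ac, sp, ot in quads)


def paraphrase_to_quads(text):
    """Invert quads_to_paraphrase, skipping segments that do not fit the template."""
    quads = []
    for segment in text.split(SEP):
        left, found, right = segment.partition(" because ")
        if not found:
            continue  # no 'because': not a valid template sentence
        ac, found_left, sp = left.rpartition(" is ")   # polarity follows the last ' is '
        at, found_right, ot = right.partition(" is ")  # aspect term precedes the first ' is '
        if found_left and found_right:
            quads.append((at.strip(), ac.strip(), sp.strip(), ot.strip()))
    return quads


target = quads_to_paraphrase([("sandwiches", "food quality", "positive", "delicious")])
print(target)  # food quality is positive because sandwiches is delicious
print(paraphrase_to_quads(target))
```

Splitting on the literal words “because” and “is” is what makes extraction easy, but it also means a term that itself contains those words would be split incorrectly; the sketch does not handle that case.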

Inspired by the chain-of-thought approach proposed by Wei et al. [13], we incorporate intermediate reasoning steps into our templates. These steps guide the model to reason progressively during generation, inferring the sentiment elements one at a time according to the template. We call this approach the Self-Inference Template (SIT). Because the template generates some elements more than once, the aspect term, aspect category, opinion term, and sentiment polarity may each appear several times in the output. We therefore apply a voting mechanism to these repeated elements to obtain the final quadruplet, which helps improve the reliability of the predictions.

We convert the gold labels into the Self-Inference Template form, denoted $y$, and feed the original text, denoted $x$, into the model for supervised training, yielding a model $p_{\theta}(y \mid x)$. Training the parameters $\theta$ effectively usually requires a large amount of supervised data. However, because ABSA annotation is complex and costly, commonly used ABSA datasets are relatively small. Prompts can help models learn in few-shot or even zero-shot scenarios [14], so we add prefix prompts to the text data to assist model training.

Our goal is to extract the required sentiment elements from sentences, which, like entity recognition, requires the model to understand the text deeply. Some recent studies have found that training on noised text can effectively improve model performance; for instance, concentrating the noise on the entities within a sentence yields particularly strong entity prediction [14]. Common noising operations fall into four types: masking, replacement, deletion, and permutation. BERT [15] uses masking and replacement to corrupt its training text. To encourage the model to understand the text, we mask a small number of tokens in the input, forcing the model to comprehend it better and thereby improving the identification of sentiment elements.

Concretely, we make three improvements to the model: first, a Self-Inference Template that guides the model to think and reason step by step; second, prompt prefixes added to the text data to help the model adapt quickly to the data; and third, a masking strategy within the text that forces the model to understand the text more deeply, aiding the identification of sentiment elements.

Experiments show that the best model, which combines the Self-Inference Template with the two additional methods, outperforms the Paraphrase model. Specifically, F1 improves by 3.07% and 4.06% on the ASQP datasets Rest15 and Rest16, respectively, and by 3.32% and 1.45% on the ACOS_Restaurant and ACOS_Laptop datasets, respectively.

Our main contributions are as follows:

We design a Self-Inference Template that guides the model through step-by-step reasoning and significantly improves aspect sentiment quadruplet prediction. To our knowledge, this is the first work to approach aspect sentiment quadruplet prediction from the perspective of encouraging the model to reason gradually.
We create task-specific prompt texts to help the model train on small datasets; experiments on both the Paraphrase and SIT models demonstrate the effectiveness of these prompts.
We apply masking to ABSA text data to help the model identify sentiment elements, opening further possibilities for future research on ABSA tasks.

2. Related Work

Chain of Thought (CoT) is a prompting method that significantly enhances the ability of large language models on complex reasoning tasks [13]. It presents the model with a small number of examples whose reasoning process is spelled out, guiding the model to generate intermediate reasoning steps of its own. CoT has led to substantial progress in large language models, and researchers have also applied it to sentiment analysis. Fei [16] used a CoT framework to simulate human-like reasoning in implicit sentiment analysis, extracting implicit aspects, opinions, and sentiment polarity step by step and achieving strong results.

Currently, the main modeling paradigms for ABSA tasks are Sequence-level Classification (SeqClass), Token-level Classification (TokenClass), Machine Reading Comprehension (MRC), and Sequence-to-Sequence Modeling (Seq2Seq) [17]. SeqClass and TokenClass are mostly used for single-element ABSA tasks and cannot meet the current demand for extracting multiple sentiment elements jointly. The MRC paradigm extracts sentiment elements by constructing relevant questions and having the model predict the position of the answer span in the original text. This requires every extracted element to appear in the original text, so it is ineffective for texts with implicit aspects or implicit opinions. In contrast, the Seq2Seq generative paradigm applies broadly to ABSA tasks, offers high flexibility, and provides a unified modeling framework. Zhang [10] cast the ASQP task as paraphrase generation, demonstrating for the first time the strong capabilities of generative paradigms on ABSA tasks. Joseph [12], combining a generative model with contrastive learning, achieved state-of-the-art quadruplet extraction on ACOS datasets containing implicit language and significantly improved the extraction of implicit aspects and opinions. This suggests that the generative paradigm is promising for datasets that contain implicit terms and demand strong reasoning.

Inspired by the generative paradigm and the Chain of Thought approach, we propose a Self-Inference Template for generative aspect sentiment quadruplet prediction. By guiding the model to generate the reasoning process for aspect sentiment elements, our approach helps the model better comprehend the text and improves aspect sentiment quadruplet prediction.

3. Methodology

3.1. Aspect Sentiment Quad Prediction Based on the Generative Paradigm

Aspect sentiment quadruplet prediction aims to predict all aspect terms (AT), aspect categories (AC), opinion terms (OT), and sentiment polarities (SP) within a given sentence $x$. Aspect terms and opinion terms are generally words present in the sentence, but sometimes aspect terms and opinion terms may be implicitly represented in the sentence, denoted as “NULL” in such cases. Aspect categories belong to a predefined set $V_{c}$. A sentence may contain multiple quadruplets.
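The constraints above can be written down as a small data structure with a well-formedness check. This is a sketch only: the category set is a hypothetical miniature stand-in for $V_{c}$, and “the term occurs in the sentence” is approximated by a plain substring test. The example mirrors the situation in Figure 1, where one aspect term receives two different categories.

```python
from typing import NamedTuple

NULL = "NULL"  # stands in for an implicit aspect or opinion term

# Hypothetical miniature category set standing in for V_c
CATEGORIES = {"food quality", "food prices", "service general"}
POLARITIES = {"positive", "negative", "neutral"}


class Quad(NamedTuple):
    aspect_term: str      # AT: a span of x, or NULL if implicit
    aspect_category: str  # AC: must belong to V_c
    opinion_term: str     # OT: a span of x, or NULL if implicit
    sentiment: str        # SP: one of the polarity labels


def is_well_formed(quad: Quad, sentence: str) -> bool:
    """Check the task constraints (substring test as a simplification)."""
    spans_ok = all(term == NULL or term in sentence
                   for term in (quad.aspect_term, quad.opinion_term))
    return spans_ok and quad.aspect_category in CATEGORIES and quad.sentiment in POLARITIES


x = "the sandwiches were bland but cheap"
quads = [Quad("sandwiches", "food quality", "bland", "negative"),
         Quad("sandwiches", "food prices", "cheap", "positive")]
print(all(is_well_formed(q, x) for q in quads))  # True
```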

Currently, generative aspect sentiment quadruplet prediction arranges the quadruplets in the dataset into a template format to create targets. The text and targets are fed into a sequence-to-sequence model for fine-tuning, and the targets generated by the trained model are then split into quadruplets according to the template format. The key component is the learning process of the sequence-to-sequence model, which learns parameters $\theta$ that maximize the probability $p_{\theta}(y \mid x)$, where $x$ is the original sentence and $y$ is the target sentence. Because the target sentence is generated token by token, the $i$-th token of $y$ is determined by $x$ and the preceding $i-1$ tokens of $y$:

$$p_{\theta}(y_{i} \mid x, y_{1}, \dots, y_{i-1}) = \mathrm{softmax}(W^{\top} y_{i-1}) \tag{1}$$

Here, $W$ maps $y_{i-1}$ (more precisely, the decoder’s hidden representation at that step) to a vector of vocabulary size, and the softmax function then determines which vocabulary word the model chooses as the next token.
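Eq. (1) can be illustrated numerically: project a decoder representation onto the vocabulary with $W^{\top}$, normalize with softmax, and, as in the greedy decoding used in our experiments, take the most probable token. The toy vocabulary, matrix, and vector below are hypothetical, and `h` stands for the decoder representation that Eq. (1) writes as $y_{i-1}$.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_distribution(W, h):
    """Compute softmax(W^T h).

    W is a d x |V| matrix given as d rows of length |V|; h is a length-d vector
    standing for the decoder representation used in Eq. (1).
    """
    vocab_size = len(W[0])
    logits = [sum(W[k][j] * h[k] for k in range(len(h))) for j in range(vocab_size)]
    return softmax(logits)


vocab = ["great", "bad", "ok"]            # toy vocabulary (hypothetical)
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]    # d = 2, |V| = 3
h = [2.0, 0.5]
probs = next_token_distribution(W, h)
greedy = vocab[max(range(len(probs)), key=probs.__getitem__)]  # greedy decoding step
print(greedy)  # great
```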

For training, we initialize the parameters with the T5 model [18]. T5, proposed by Google in 2020, is a pre-trained model that handles diverse text tasks within a unified framework: it casts every task as a text-to-text problem and distinguishes tasks, such as translation and summarization, by adding different prefix prompts to the input. T5 follows the standard Transformer encoder-decoder architecture. We initialize the model with T5-base and feed the ABSA text data into it. The input is converted into a sequence of word vectors by the embedding layer and passed to the Transformer encoder, which transforms it into high-dimensional hidden representations. The decoder combines the encoder’s output with the previously generated text to generate new tokens autoregressively, step by step. This architecture is shown in Figure 2.

During training, T5 uses cross-entropy loss to measure the difference between the generated text and our Self-Inference Template target, updating the model parameters accordingly.

$$\mathcal{L}(x, y) = -\sum_{i=1}^{n} \log p_{\theta}(y_{i} \mid x, y_{1}, \dots, y_{i-1}) \tag{2}$$

where n represents the length of the target sequence y.
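Eq. (2) is the summed negative log-likelihood of the gold tokens. The sketch below computes it from per-step distributions, assuming the usual teacher-forced setup in which the distribution at step $i$ is already conditioned on $x$ and the gold prefix $y_1, \dots, y_{i-1}$; the toy distributions are hypothetical.

```python
import math


def sequence_nll(step_probs, target_ids):
    """Eq. (2): L(x, y) = -sum_i log p(y_i | x, y_1..y_{i-1}).

    step_probs[i] is the model's distribution over the vocabulary at step i
    (conditioned on x and the gold prefix); target_ids[i] indexes the gold token y_i.
    """
    if len(step_probs) != len(target_ids):
        raise ValueError("one distribution per target token is required")
    return -sum(math.log(dist[t]) for dist, t in zip(step_probs, target_ids))


loss = sequence_nll([[0.5, 0.5], [0.25, 0.75]], [0, 1])
print(round(loss, 4))  # 0.9808
```

The loss is zero only when every gold token receives probability 1, and each poorly predicted token adds its own $-\log p$ term.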

3.2. Self-Inference Template

CoT places explicit reasoning processes in the prompt so that a large model can learn the way of thinking they demonstrate and then follow that chain of thought step by step, which enhances its reasoning ability. However, the generative model used in this paper, T5-base, has far fewer parameters than large language models, so teaching it the reasoning process through in-context examples is not suitable.

Therefore, we write the intermediate reasoning process directly into the template, as illustrated in Figure 3, guiding the model to reason step by step along the template’s line of thought. The first half of the template obtains the aspect term and opinion term and then infers the sentiment polarity from the opinion term. The second half deduces the aspect category from the obtained aspect term and opinion term. Finally, the aspect category and sentiment polarity are generated again to confirm the generated sentiment elements.
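The reasoning order described above can be sketched as a template builder and parser. The wording below is hypothetical and is not the template of Figure 3; it only reproduces the same step order and the same repetition counts (aspect term three times; aspect category, opinion term, and sentiment polarity twice each). The parser returns every repeated copy so that the copies can later be aggregated, and the `[SSEP]` separator between quadruplets is likewise an assumption.

```python
import re

SEP = " [SSEP] "  # hypothetical separator between quadruplets

# Hypothetical wording; the numbered groups mark repeated generations of one element.
SIT_PATTERN = re.compile(
    r"the aspect term is (?P<at1>.+?) and the opinion term is (?P<ot1>.+?) , "
    r"so (?P<at2>.+?) is (?P<sp1>.+?) ; "
    r"since (?P<at3>.+?) is (?P<ot2>.+?) , the category is (?P<ac1>.+?) , "
    r"so (?P<ac2>.+?) is (?P<sp2>.+)"
)


def build_sit(at, ac, ot, sp):
    """AT and OT first, SP from OT, then AC from AT and OT, then AC and SP again."""
    return (f"the aspect term is {at} and the opinion term is {ot} , so {at} is {sp} ; "
            f"since {at} is {ot} , the category is {ac} , so {ac} is {sp}")


def parse_sit(text):
    """Return, for each quadruplet, the repeated candidates of every element."""
    results = []
    for segment in text.split(SEP):
        m = SIT_PATTERN.fullmatch(segment.strip())
        if m is None:
            continue  # malformed generation: skip it
        g = m.groupdict()
        results.append({
            "at": [g["at1"], g["at2"], g["at3"]],
            "ac": [g["ac1"], g["ac2"]],
            "ot": [g["ot1"], g["ot2"]],
            "sp": [g["sp1"], g["sp2"]],
        })
    return results


segment = build_sit("sandwiches", "food quality", "bland", "negative")
print(parse_sit(segment)[0]["ac"])  # ['food quality', 'food quality']
```

At inference time the copies need not agree, which is why the parser keeps all of them instead of returning a single quadruplet.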

In the template, the aspect term appears three times, while the aspect category, opinion term, and sentiment polarity each appear twice. Because the repeated outputs may differ, each occurrence of a sentiment element is marked with a subscript index to distinguish it. Following the self-consistency idea for CoT [19], we aggregate the repeated sentiment elements by voting to obtain the final aspect sentiment quadruplets. The overall model structure is illustrated in Figure 4.
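The voting step can be sketched as a per-element majority vote. The text does not state how ties are broken, and with only two copies of an element any disagreement is a tie, so the rule here (prefer the earliest generation) is an assumption. It relies on `collections.Counter.most_common`, which lists elements with equal counts in first-encountered order.

```python
from collections import Counter


def vote(candidates):
    """Majority vote over repeated generations of one element.

    Counter.most_common orders equal counts by first occurrence, so a tie
    (e.g. two disagreeing copies) resolves to the earliest generation.
    """
    return Counter(candidates).most_common(1)[0][0]


def aggregate(slots):
    """slots: {'at': [...], 'ac': [...], 'ot': [...], 'sp': [...]} for one quadruplet.
    Returns the voted (at, ac, sp, ot) tuple."""
    return tuple(vote(slots[key]) for key in ("at", "ac", "sp", "ot"))


slots = {"at": ["sandwiches", "sandwich", "sandwiches"],
         "ac": ["food quality", "food quality"],
         "ot": ["bland", "bland"],
         "sp": ["negative", "neutral"]}
print(aggregate(slots))  # ('sandwiches', 'food quality', 'negative', 'bland')
```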

3.3. Prompt Addition

In recent years, prompts have been widely used with language models. Research indicates that appropriate prompts can steer a model’s behavior, enabling a language model to predict the desired outputs without additional training [20]. T5 [18] itself distinguishes tasks through prefix prompts, so it naturally supports prompt addition. We therefore experimented with adding prefix prompts that specify the task to the text. Experimental results show that these prompts effectively improve the model’s aspect sentiment quadruplet prediction.

3.4. Mask Tokens

BERT [15] uses a random token masking strategy to force the model to understand the text, improving its error correction ability and overall accuracy.

To deepen the model’s understanding of the text and improve its recognition of sentiment elements, we applied a masking strategy to the data. We selected 10% of the sentences in the dataset for masking, and in each selected sentence we randomly chose 10% of the tokens. Of these tokens, 80% were replaced with [mask] and the remaining 20% with a random word from the vocabulary. Experimental results indicate that combining masking with prompt addition effectively helps the model predict quadruplets.
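A minimal sketch of this masking scheme, combined with an optional prompt prefix, follows. Assumptions: whitespace tokenization, at least one corrupted token per selected sentence, an 80/20 split applied per token by probability (so it holds in expectation rather than exactly), and a prefix that is prepended after masking and never masked itself. The prefix string is a hypothetical placeholder, not one of the prompts in Figure 6.

```python
import random

MASK = "[mask]"  # mask string as written in the text


def mask_tokens(tokens, vocab, rng, token_rate=0.10, mask_share=0.80):
    """Corrupt round(token_rate * len(tokens)) positions (at least one).
    Each chosen token becomes MASK with probability mask_share, else a random vocab word."""
    tokens = list(tokens)
    n = max(1, round(token_rate * len(tokens)))
    for pos in rng.sample(range(len(tokens)), n):
        tokens[pos] = MASK if rng.random() < mask_share else rng.choice(vocab)
    return tokens


def mask_corpus(sentences, vocab, sentence_rate=0.10, prefix=None, seed=42):
    """Mask a random sentence_rate share of sentences, then optionally prepend a prefix."""
    rng = random.Random(seed)
    k = round(sentence_rate * len(sentences))
    chosen = set(rng.sample(range(len(sentences)), k))
    out = []
    for i, sentence in enumerate(sentences):
        tokens = sentence.split()  # whitespace tokens, a simplification
        if i in chosen and tokens:
            tokens = mask_tokens(tokens, vocab, rng)
        text = " ".join(tokens)
        out.append(f"{prefix} {text}" if prefix else text)
    return out


print(mask_corpus(["the pasta was great", "service was slow"],
                  vocab=["menu", "table"], sentence_rate=0.5, prefix="Rewrite:"))
```

Keeping masking as a preprocessing step over the training text, separate from the model, makes it easy to switch on or off, which matters given that its effect varies across datasets.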

4. Experimental Setup

4.1. Dataset

To evaluate our model on different kinds of data, we conducted experiments on two types of datasets: datasets without implicit opinions and datasets containing implicit opinions. The first type is the ASQP data curated by Zhang [10], comprising Rest15 and Rest16, which contain no implicit opinion terms. The second type is the ACOS data proposed by Cai [9], comprising ACOS_Restaurant and ACOS_Laptop. In these datasets, over 33% of the sentiment quadruples contain implicit opinion or aspect terms, placing higher demands on the model’s inference capability. Statistics for the four datasets are given in Table 1, and the proportions of explicit and implicit terms are illustrated in Figure 5.

In ACOS_Restaurant and ACOS_Laptop, aspect categories take forms such as LAPTOP#GENERAL, which may be hard for generative models to interpret semantically. Following Joseph [12], we replaced the aspect categories in the ACOS datasets with human-readable forms; for example, LAPTOP#GENERAL was replaced with “the laptop overall” to ease the model’s understanding and sentence rewriting.
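The category rewriting can be sketched as a lookup table with a fallback rule, plus an inverse map for turning generated phrases back into dataset labels. Only the LAPTOP#GENERAL entry comes from the text; the fallback rule (lower-case the entity and attribute and prepend “the”) and the other labels in the example are hypothetical.

```python
# Only the LAPTOP#GENERAL entry comes from the text; the fallback rule is hypothetical.
OVERRIDES = {"LAPTOP#GENERAL": "the laptop overall"}


def verbalize(category):
    """Map an ENTITY#ATTRIBUTE label to a readable phrase."""
    if category in OVERRIDES:
        return OVERRIDES[category]
    entity, _, attribute = category.partition("#")
    words = f"{entity} {attribute}".replace("_", " ").lower().split()
    return "the " + " ".join(words)


def build_inverse(categories):
    """Reverse map used to turn generated phrases back into dataset labels."""
    inverse = {verbalize(c): c for c in categories}
    if len(inverse) != len(set(categories)):
        raise ValueError("two labels verbalize to the same phrase")
    return inverse


labels = ["LAPTOP#GENERAL", "DISPLAY#QUALITY", "BATTERY#OPERATION_PERFORMANCE"]
print([verbalize(c) for c in labels])
# ['the laptop overall', 'the display quality', 'the battery operation performance']
```

The collision check matters because parsing relies on the mapping being invertible; a generated phrase that is not in the inverse map would have to be treated as an invalid category.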

4.2. Experiment Details

We used T5-base [18] as the pre-trained generative model, with a training batch size of 16, a learning rate of $3 \times 10^{-4}$, and a fixed random seed of 42 to control for run-to-run randomness. All models were trained for 20 epochs, and greedy decoding was used to generate output sequences at inference time. Experiments were run on an NVIDIA RTX 4080 GPU.

4.3. Baselines

To assess the effectiveness of our approach compared to previous methods, we selected several strong baseline methods:

HGCN-BERT+BERT-Linear: HGCN [21] jointly extracts aspect categories and sentiment polarities, BERT extracts the corresponding aspect terms and opinion terms [22], and a linear layer performs the final aggregation.
HGCN-BERT+BERT-TFM: A variant of the above model in which the final linear layer is replaced by Transformer blocks (BERT-TFM).
TASO-BERT-Linear: TAS [8], originally designed to extract unified triples of aspect categories, aspect terms, and sentiment polarities, is extended to TASO to handle ASQP tasks, with linear classification layers for prediction.
TASO-BERT-CRF: A variant of TASO that uses a Conditional Random Field layer in the prediction stage.
TAS-BERT-ACOS: Building on TAS, Cai [9] designed a two-step pipeline that uses BERT to extract quadruples from ACOS data.
Extract-Classify-ACOS: This method first extracts aspect terms and opinion terms from the original sentence and then classifies aspect categories and sentiment polarities based on the extracted terms [9].
GAS: A generative baseline [23], modified by [10] to directly generate aspect sentiment quadruplets as the target sequence.
Seq2Path: Treats the generation of sentiment tuples as paths in a tree, using constrained beam search and additional tokens to select valid paths automatically [24].
PARAPHRASE: This method extracts (at, ac, sp, ot) by paraphrasing the original sentence as “ac is sp because at is ot” [10].
DLO: Considering how the generation order of the quadruplet elements affects generative models [11], this method tries all 24 orderings of the four elements in the template and selects a single order based on overall quadruplet extraction performance on the dataset.
ILO: Like DLO, it tries all 24 template orders, but selects the order for each instance individually based on its own performance.

5. Results and Discussion

5.1. Main Results

Table 2 reports the results of all methods. On the ASQP datasets, which contain no implicit opinions, our model improves all metrics significantly over the Paraphrase method, raising the F1 scores on Rest15 and Rest16 by 2.05% and 2.33%, respectively. Compared with DLO and ILO, the Self-Inference Template slightly lags behind ILO on Rest15 but outperforms both DLO and ILO on Rest16 without increasing the time cost. With prefix prompts added, the model achieves the best results on Rest16, and combining prefix prompts with masking substantially improves performance on the smaller Rest15 dataset. With these two methods, the model achieves the best results on both datasets, improving F1 by 1.02% and 1.73%, respectively, over the Self-Inference Template alone.
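The F1 scores above are reported without restating their definition. The sketch below assumes the convention common in ASQP work: a predicted quadruplet counts as correct only if all four elements exactly match a gold quadruplet of the same sentence, and counts are pooled over the test set (micro-averaged). Using a multiset intersection, so that a duplicated prediction cannot match one gold quadruplet twice, is a design choice of this sketch.

```python
from collections import Counter


def quad_f1(predictions, golds):
    """Micro precision/recall/F1 over sentences; a quad is correct only on exact match."""
    n_pred = n_gold = n_correct = 0
    for pred, gold in zip(predictions, golds):
        n_pred += len(pred)
        n_gold += len(gold)
        # multiset intersection: a duplicated prediction cannot match one gold quad twice
        n_correct += sum((Counter(pred) & Counter(gold)).values())
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [[("sandwiches", "food quality", "negative", "bland")]]
pred = [[("sandwiches", "food quality", "negative", "bland"),
         ("sandwiches", "food prices", "positive", "cheap")]]
print(quad_f1(pred, gold))  # precision 0.5, recall 1.0, F1 about 0.667
```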

On the ACOS datasets, which contain implicit opinion terms, the Self-Inference Template improves the F1 score on ACOS_Restaurant by 2.94% over the Paraphrase method, indicating that guiding the model to think step by step does enhance its reasoning ability. On ACOS_Laptop, however, the improvement from the Self-Inference Template is marginal. This may be due to the large number of aspect categories in ACOS_Laptop and their imbalanced distribution: the ACOS_Laptop training set contains 114 aspect categories, only 10 of which appear in more than 100 tuples, while over half appear in fewer than 10 tuples. The model therefore cannot learn each aspect category adequately, and despite the reasoning guidance of the Self-Inference Template, it struggles to classify aspect categories correctly when there are many classes with few training examples. Nevertheless, adding prefix prompts and masking increases the F1 score on ACOS_Laptop by 1.44%, indicating that the two methods effectively assist the model in learning.
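The imbalance statistics quoted above (number of categories, categories seen in more than 100 tuples, categories seen in fewer than 10) can be computed with a simple frequency count over the training quadruplets. The toy training set below is hypothetical and only checks the counting logic.

```python
from collections import Counter


def category_stats(train_quads, frequent=100, rare=10):
    """Summarize how aspect categories are distributed over training quadruplets.
    Each quad is an (at, ac, sp, ot) tuple; thresholds mirror the analysis above."""
    counts = Counter(ac for _, ac, _, _ in train_quads)
    return {
        "n_categories": len(counts),
        "n_frequent": sum(c > frequent for c in counts.values()),
        "n_rare": sum(c < rare for c in counts.values()),
    }


train = ([("NULL", "LAPTOP#GENERAL", "positive", "NULL")] * 150
         + [("screen", "DISPLAY#QUALITY", "negative", "dim")] * 20
         + [("keys", "KEYBOARD#DESIGN_FEATURES", "negative", "stiff")] * 3)
print(category_stats(train))  # {'n_categories': 3, 'n_frequent': 1, 'n_rare': 1}
```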

Table 3 records the runtime of Paraphrase, SIT, and SIT combined with the two methods. Compared with the Paraphrase model, the runtime of the Self-Inference Template remains almost unchanged, and adding the two enhancements has no noticeable effect on runtime either.

5.2. Determination of Prefix Prompts

Prefix prompts can be broadly categorized into hard prompts and soft prompts [14]. Hard prompts, also known as discrete prompts, are crafted manually and typically consist of semantically meaningful phrases. Soft prompts, also known as continuous prompts, are updated during training like trainable parameters and have no clear human-interpretable semantics.

Training with soft prompts requires a substantial amount of data for iterative updates, and the existing datasets for aspect sentiment quadruplet prediction are relatively small, making them unsuitable for training with soft prompts. Therefore, we opt for hard prompts, where we manually create prompt texts to assist the model’s understanding during training. We generated six prompt texts, as illustrated in Figure 6. Three of them were created based on the original template, and the other three were created based on the Self-Inference Template, informing the model about the task it needs to perform in three different forms. The experimental results are presented in Table 4.

Based on the experimental results, it can be observed that, for the Rest15, Rest16, and ACOS_Restaurant datasets, the first type of prefix prompt, which directly instructs the model to rewrite, can significantly help improve the model’s reasoning ability. For ACOS_Laptop, the third type of prefix, instructing the model to first identify the four sentiment elements and then rewrite, combined with the Mask operation, leads to the optimal results. The reason might be that Rest15, Rest16, and ACOS_Restaurant datasets have fewer aspect categories, allowing the model to adequately learn the data for each aspect category during training and understand the task requirements without the need for prompting the model to recognize sentiment elements. However, ACOS_Laptop has more aspect categories, and many of them have fewer occurrences, making it challenging for the model to fully learn each class of data, resulting in an insufficient understanding of the task requirements. Therefore, for ACOS_Laptop, the third type of prefix, prompting the model to recognize sentiment elements first and then rewrite, can provide the maximum assistance in helping the model quickly understand task requirements and enhance its capabilities.

The second type of prefix prompt, phrased as a task assignment, yielded the worst results, possibly because of the relatively small parameter count of our T5-base model. Unlike large language models such as GPT, which can engage in task-oriented dialogue, our model may not benefit as much from prompts written as task assignments, so describing the task directly in the prefix proves more effective. Interestingly, prompts built on the original template yielded larger improvements than those built on the Self-Inference Template. This may be because the original-template prompt is shorter and mainly informs the model of the rewriting task it is about to perform; when rewriting sentences, the model learns the template from the data, so lengthy prompt text is unnecessary. Therefore, based on the experimental results, Prompt1 is the most effective prefix for Rest15, Rest16, and ACOS_Restaurant, and Prompt6 is the most effective for ACOS_Laptop.

5.3. Ablation Study

On the Self-Inference Template, we proposed two methods to assist in the experiments. To understand the respective contributions of the Self-Inference Template and the two methods, we incorporated each method separately into the Paraphrase model and the Self-Inference Template. The experimental results are shown in Table 5. Overall, the Self-Inference Template proves beneficial for sentiment quadruple extraction across all four datasets. The addition of prefix prompts effectively enhances the ability of both the Paraphrase model and the self-inference model to extract sentiment quadruples in Rest15, Rest16, and ACOS_Restaurant datasets. The use of Mask Tokens on the Paraphrase model results in a decrease in performance, but when combined with prefix prompts on the Self-Inference Template, it helps the model achieve the best results on Rest15 and ACOS_Laptop. This suggests that the combination of the Self-Inference Template and the two methods yields impressive performance on some datasets, but the effectiveness of Mask Tokens is unstable and requires careful experimentation.

5.4. Model Overfitting Analysis

Due to the limited amount of data, the original model exhibits a significant overfitting issue, as shown in Figure 7. In all four datasets, the training set’s loss steadily decreases, but the validation set’s loss increases instead of decreasing. After applying our Self-Inference Template, a notable reduction in the validation set’s loss is observed. Although there is still a subtle upward trend, it is considerably alleviated compared to the original model, indicating a significant reduction in overfitting.

5.5. Error Analysis and Case Study

To understand the issues our model may encounter during inference, we conducted an error analysis and case study. We randomly sampled 100 data points from the test set of each dataset and performed sentiment quadruple extraction. Subsequently, we compared the quadruples inferred by the model with the ground truth labels, tallying the frequency of errors in predicting each sentiment element. Additionally, we recorded instances where the model overpredicted or underpredicted quadruples, as shown in Figure 8.

On Rest15, Rest16, and ACOS_Restaurant, consistent with the findings of Zhang [10], the opinion term is the most difficult sentiment element to predict, as the model struggles to determine the correct span length of opinion terms. Aspect terms and aspect categories follow: the model has difficulty recognizing implicit aspect terms, and an incorrect aspect prediction easily leads to an incorrect aspect category, as illustrated in Example 1 in Figure 9. Besides predicting sentiment elements incorrectly, the model also tends to generate too many or too few quadruples, as shown in Example 2 in Figure 9, so how to make the model generate the appropriate number of quadruples deserves further study. On ACOS_Laptop, aspect category errors are the most frequent. As discussed in Section 5.1, this is mainly because ACOS_Laptop has many, imbalanced aspect categories, which leads to insufficient learning and confuses the model, as shown in Example 3 in Figure 9.

5.6. Practical Insights

Besides the methods described above, we also conducted other experiments. When we first observed the model’s overfitting, we tried to mitigate it through data augmentation with pseudo-labels. We crawled 10,000 restaurant reviews from the internet and cleaned and filtered them down to 3000 entries. We then took 1000 of these entries, ran inference on them with the models trained on Rest15 and Rest16, and intersected the two models’ predictions, which left 300 entries. We added these 300 entries to the training and test sets in various proportions and found that the model was very sensitive to the data: different proportions caused large fluctuations in its results, so we abandoned this method. We share these trial-and-error experiences in the hope that they serve as a reference for future research.
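The pseudo-label filter can be sketched as follows. The text does not say whether the intersection was taken per quadruplet or per review; this sketch assumes the stricter per-review reading, keeping a review only when both models predict exactly the same non-empty set of quadruplets. A looser variant would keep the common quadruplets, `set(qa) & set(qb)`, whenever that set is non-empty. The reviews and predictions below are hypothetical.

```python
def agreed_pseudo_labels(reviews, preds_a, preds_b):
    """Keep a review only if two models predict the same non-empty set of quads.

    preds_a / preds_b: per-review lists of (at, ac, sp, ot) tuples from two models
    (for example, one trained on Rest15 and one trained on Rest16).
    """
    kept = []
    for review, qa, qb in zip(reviews, preds_a, preds_b):
        if qa and set(qa) == set(qb):
            kept.append((review, sorted(set(qa))))
    return kept


reviews = ["great pizza", "slow service", "meh"]
a = [[("pizza", "food quality", "positive", "great")],
     [("service", "service general", "negative", "slow")],
     []]
b = [[("pizza", "food quality", "positive", "great")],
     [("service", "service general", "negative", "slow"),
      ("NULL", "restaurant general", "negative", "meh")],
     []]
print(agreed_pseudo_labels(reviews, a, b))
```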

6. Conclusions

In this work, we introduced a Self-Inference Template that uses a chain of thought to help the model reason about aspect sentiment quadruples. Without increasing time cost, this approach significantly improves quadruple prediction and effectively mitigates the overfitting caused by the limited amount of data. We also added prefix prompts and applied masking to the text to assist training, which further improved the results to some extent and suggests that both methods merit further exploration. Experiments on the ASQP datasets, which contain no implicit opinions, and on the ACOS datasets, which do, show that the Self-Inference Template combined with these two methods improves F1 over Paraphrase by 3.07%, 4.06%, 3.32%, and 1.45% on Rest15, Rest16, ACOS_Restaurant, and ACOS_Laptop, respectively, demonstrating its effectiveness.

Author Contributions

Conceptualization, S.L. and Y.Q.; methodology, Y.Q.; software, Y.Q.; validation, S.L. and Y.Q.; formal analysis, S.L.; investigation, Y.Q.; resources, S.L.; data curation, Y.Q.; writing—original draft preparation, Y.Q.; writing—review and editing, S.L.; supervision, S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. These data are available from [9,10].

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Liu, P.; Joty, S.; Meng, H. Fine-grained opinion mining with recurrent neural networks and word embeddings. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 1433–1443.
2. He, R.; Lee, W.S.; Ng, H.T.; Dahlmeier, D. An unsupervised neural attention model for aspect extraction. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, BC, Canada, 30 July–4 August 2017; pp. 388–397.
3. Pontiki, M.; Galanis, D.; Papageorgiou, H.; Androutsopoulos, I.; Manandhar, S.; AL-Smadi, M.; Al-Ayyoub, M.; Zhao, Y.; Qin, B.; De Clercq, O.; et al. SemEval-2016 task 5: Aspect based sentiment analysis. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), Association for Computational Linguistics, San Diego, CA, USA, 16–17 June 2016; pp. 19–30.
4. Zhou, X.; Wan, X.; Xiao, J. Representation learning for aspect category detection in online reviews. In Proceedings of the AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; Volume 29.
5. Zhao, H.; Huang, L.; Zhang, R.; Lu, Q.; Xue, H. SpanMlt: A span-based multi-task learning framework for pair-wise aspect and opinion terms extraction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 3239–3248.
6. Chen, S.; Liu, J.; Wang, Y.; Zhang, W.; Chi, Z. Synchronous double-channel recurrent network for aspect-opinion pair extraction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 6515–6524.
7. Peng, H.; Xu, L.; Bing, L.; Huang, F.; Lu, W.; Si, L. Knowing what, how and why: A near complete solution for aspect-based sentiment analysis. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 8600–8607.
8. Wan, H.; Yang, Y.; Du, J.; Liu, Y.; Qi, K.; Pan, J.Z. Target-aspect-sentiment joint detection for aspect-based sentiment analysis. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 9122–9129.
9. Cai, H.; Xia, R.; Yu, J. Aspect-category-opinion-sentiment quadruple extraction with implicit aspects and opinions. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Virtual Event, 1–6 August 2021; pp. 340–350.
10. Zhang, W.; Deng, Y.; Li, X.; Yuan, Y.; Bing, L.; Lam, W. Aspect sentiment quad prediction as paraphrase generation. arXiv 2021, arXiv:2110.00796.
11. Hu, M.; Wu, Y.; Gao, H.; Bai, Y.; Zhao, S. Improving aspect sentiment quad prediction via template-order data augmentation. arXiv 2022, arXiv:2210.10291.
12. Peper, J.J.; Wang, L. Generative aspect-based sentiment analysis with contrastive learning and expressive structure. arXiv 2022, arXiv:2211.07743.
13. Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Xia, F.; Chi, E.; Le, Q.V.; Zhou, D. Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 2022, 35, 24824–24837.
14. Liu, P.; Yuan, W.; Fu, J.; Jiang, Z.; Hayashi, H.; Neubig, G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Comput. Surv. 2023, 55, 1–35.
15. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
16. Fei, H.; Li, B.; Liu, Q.; Bing, L.; Li, F.; Chua, T.S. Reasoning Implicit Sentiment with Chain-of-Thought Prompting. arXiv 2023, arXiv:2305.11255.
17. Zhang, W.; Li, X.; Deng, Y.; Bing, L.; Lam, W. A survey on aspect-based sentiment analysis: Tasks, methods, and challenges. IEEE Trans. Knowl. Data Eng. 2022, 35, 11019–11038.
18. Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 2020, 21, 5485–5551.
19. Wang, X.; Wei, J.; Schuurmans, D.; Le, Q.; Chi, E.; Narang, S.; Chowdhery, A.; Zhou, D. Self-consistency improves chain of thought reasoning in language models. arXiv 2022, arXiv:2203.11171.
20. Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901.
21. Cai, H.; Tu, Y.; Zhou, X.; Yu, J.; Xia, R. Aspect-category based sentiment analysis with hierarchical graph convolutional network. In Proceedings of the 28th International Conference on Computational Linguistics, Online, 8–13 December 2020; pp. 833–843.
22. Li, X.; Bing, L.; Zhang, W.; Lam, W. Exploiting BERT for end-to-end aspect-based sentiment analysis. arXiv 2019, arXiv:1910.00883.
23. Zhang, W.; Li, X.; Deng, Y.; Bing, L.; Lam, W. Towards generative aspect-based sentiment analysis. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Virtual Event, 1–6 August 2021; pp. 504–510.
24. Mao, Y.; Shen, Y.; Yang, J.; Zhu, X.; Cai, L. Seq2Path: Generating sentiment tuples as paths of a tree. In Findings of the Association for Computational Linguistics: ACL 2022, Dublin, Ireland, 22–27 May 2022; pp. 2215–2225.
25. Li, S.; Zhang, Y.; Lan, Y.; Zhao, H.; Zhao, G. From Implicit to Explicit: A Simple Generative Method for Aspect-Category-Opinion-Sentiment Quadruple Extraction. In Proceedings of the 2023 International Joint Conference on Neural Networks (IJCNN), Gold Coast, Australia, 18–23 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–8.

Figure 1. Example of Aspect Sentiment Quad Prediction.

Figure 2. The architecture of the T5 model [18].

Figure 3. Self-Inference Template.

Figure 4. Model Architecture.

Figure 5. Distribution of Explicit and Implicit Terms in the Dataset. EA represents explicit aspect terms, EO represents explicit opinion terms, IA represents implicit aspect terms, and IO represents implicit opinion terms. The proportions are illustrated for each category.

Figure 6. Prefix Prompt Texts.

Figure 7. Training and validation set loss for Rest15, Rest16, ACOS_Restaurant, and ACOS_Laptop.

Figure 8. Quadruple Error Statistics.

Figure 9. Cases of Quadruple Extraction Errors.

Table 1. Data statistics. #C, #S, #+, #0, and #− denote the number of aspect categories, the number of sentences, the number of positive, neutral and negative quads, respectively.

Dataset	Statistic	Train	Dev	Test
Rest15	#C	13	12	12
	#S	834	209	537
	#+	1005	252	453
	#0	34	14	37
	#−	315	81	305
Rest16	#C	12	13	12
	#S	1264	316	544
	#+	1369	341	583
	#0	62	23	40
	#−	558	143	176
ACOS_Restaurant	#C	12	13	12
	#S	1530	171	583
	#+	1656	180	667
	#0	95	12	44
	#−	733	69	205
ACOS_Laptop	#C	114	71	81
	#S	2934	326	816
	#+	2583	279	716
	#0	227	24	65
	#−	1362	137	380

Table 2. Evaluation results compared with baseline methods in terms of precision (Pre, %), recall (Rec, %), and F1 score (F1, %). PT denotes the Add Prompt method, MT the Mask Tokens method, and PM the combination of Add Prompt and Mask Tokens. The best scores are marked in bold. The prefix prompts are the optimal prompts for each dataset determined in Section 5.2. For Rest15 and Rest16, baseline results marked * are from [10] and those marked ⋆ are from [11]. For ACOS_Restaurant and ACOS_Laptop, baseline results marked ♠ are from [9], ▴ from [25], and ♣ from [24]. ▾ indicates our reproduction of the official implementation on our datasets.

Methods	Rest15 Pre	Rest15 Rec	Rest15 F1	Rest16 Pre	Rest16 Rec	Rest16 F1	ACOS_Restaurant Pre	ACOS_Restaurant Rec	ACOS_Restaurant F1	ACOS_Laptop Pre	ACOS_Laptop Rec	ACOS_Laptop F1
HGCN-BERT+BERT-Linear *	24.43	20.25	22.15	25.36	24.03	24.68	-	-	-	-	-	-
HGCN-BERT+BERT-TFM *	25.55	22.01	23.65	27.40	26.41	26.90	-	-	-	-	-	-
TASO-BERT-Linear *	41.86	26.50	32.46	49.73	40.70	44.77	-	-	-	-	-	-
TASO-BERT-CRF *	44.24	28.66	34.78	48.65	39.68	43.71	-	-	-	-	-	-
TAS-BERT-ACOS ♠	-	-	-	-	-	-	26.29	46.29	33.53	47.15	19.22	27.31
Extract-Classify-ACOS ⋆♠	35.64	37.25	36.42	38.40	50.93	43.77	38.54	52.96	44.61	45.56	29.48	35.80
GAS *▴	45.31	46.70	45.98	54.54	57.62	56.04	53.57	54.34	53.95	40.70	40.17	40.43
Seq2Path ♣	-	-	-	-	-	-	62.38	55.02	58.47	41.46	41.00	41.23
Paraphrase *▾	46.16	47.72	46.93	56.63	59.30	57.93	61.02	59.73	60.37	44.87	44.10	44.48
DLO ⋆	47.07	49.33	48.18	57.92	61.80	59.79	-	-	-	-	-	-
ILO ⋆	47.78	50.38	49.05	57.58	61.17	59.32	-	-	-	-	-	-
SIT	47.89	50.13	48.98	58.98	61.60	60.26	63.13	63.49	63.31	44.38	44.61	44.49
SIT+PT	48.41	49.75	49.07	60.78	63.24	61.99	63.54	63.83	63.69	43.12	42.78	42.95
SIT+MT	47.93	49.50	48.70	58.30	60.96	59.60	61.79	63.27	62.52	44.46	44.35	44.41
SIT+PM	49.63	50.38	50.00	59.22	61.66	60.44	62.88	63.38	63.13	45.95	45.91	45.93
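Each F1 value in the tables should equal, up to rounding, the harmonic mean of the corresponding precision and recall, F1 = 2·Pre·Rec/(Pre + Rec). The sketch below checks a few rows of Table 2 against that definition; the row labels and the helper name `f1` are illustrative.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (inputs and output in %)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for a few rows of Table 2.
rows = {
    "SIT / Rest15": (47.89, 50.13, 48.98),
    "SIT+MT / ACOS_Restaurant": (61.79, 63.27, 62.52),
    "TAS-BERT-ACOS / ACOS_Laptop": (47.15, 19.22, 27.31),
}

# True when the recomputed F1, rounded to two decimals, matches the table.
checks = {name: round(f1(p, r), 2) == reported for name, (p, r, reported) in rows.items()}
for name, ok in checks.items():
    print(name, "consistent" if ok else "mismatch")
```

Because precision and recall are themselves rounded to two decimals, a recomputed F1 can occasionally differ from a reported value by 0.01 without indicating an error.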

Table 3. Model Runtime (Unit: Seconds).

Methods	Rest15	Rest16	ACOS_Restaurant	ACOS_Laptop
Paraphrase	152.24	224.81	266.91	501.65
SIT	151.16	225.55	263.80	495.60
SIT+PT	153.52	224.05	259.83	495.52
SIT+MT	154.39	225.97	268.03	496.58
SIT+PM	153.86	225.32	270.12	498.00
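The claim that the template adds no time cost can be checked by expressing each entry of Table 3 as a percentage change relative to Paraphrase. The sketch below performs only that arithmetic; the table does not state whether the times cover training or inference, or how many runs were averaged, so small differences should not be over-read. Names such as `RUNTIME_S` and `relative_change` are illustrative.

```python
# Running times in seconds, copied from Table 3.
RUNTIME_S = {
    "Paraphrase": {"Rest15": 152.24, "Rest16": 224.81, "ACOS_Restaurant": 266.91, "ACOS_Laptop": 501.65},
    "SIT": {"Rest15": 151.16, "Rest16": 225.55, "ACOS_Restaurant": 263.80, "ACOS_Laptop": 495.60},
    "SIT+PT": {"Rest15": 153.52, "Rest16": 224.05, "ACOS_Restaurant": 259.83, "ACOS_Laptop": 495.52},
    "SIT+MT": {"Rest15": 154.39, "Rest16": 225.97, "ACOS_Restaurant": 268.03, "ACOS_Laptop": 496.58},
    "SIT+PM": {"Rest15": 153.86, "Rest16": 225.32, "ACOS_Restaurant": 270.12, "ACOS_Laptop": 498.00},
}


def relative_change(method: str, dataset: str, baseline: str = "Paraphrase") -> float:
    """Percentage change in running time of `method` relative to `baseline`."""
    base = RUNTIME_S[baseline][dataset]
    return 100.0 * (RUNTIME_S[method][dataset] - base) / base


# Relative change for every SIT variant on every dataset.
changes = {
    (method, dataset): relative_change(method, dataset)
    for method in RUNTIME_S
    if method != "Paraphrase"
    for dataset in RUNTIME_S[method]
}
largest = max(changes, key=lambda key: abs(changes[key]))
print(largest, f"{changes[largest]:+.2f}%")
```

On these numbers every SIT variant stays within about 3% of Paraphrase; the largest deviation is SIT+PT on ACOS_Restaurant, which is roughly 2.65% faster.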

Table 4. Experimental Results Combining Different Prefix Prompts with the Self-Inference Template. The best scores are marked in bold.

Prompt Text	Rest15 Pre	Rest15 Rec	Rest15 F1	Rest16 Pre	Rest16 Rec	Rest16 F1	ACOS_Restaurant Pre	ACOS_Restaurant Rec	ACOS_Restaurant F1	ACOS_Laptop Pre	ACOS_Laptop Rec	ACOS_Laptop F1
SIT	47.89	50.13	48.98	58.98	61.60	60.26	63.13	63.49	63.31	44.38	44.61	44.49
+Prompt1	48.41	49.75	49.07	60.78	63.24	61.99	63.54	63.83	63.69	43.79	43.57	43.68
+Prompt1+MT	49.63	50.38	50.00	59.22	61.66	60.44	62.88	63.38	63.13	43.18	42.96	43.07
+Prompt2	48.20	48.99	48.59	58.35	61.09	59.69	62.13	62.13	62.13	44.16	43.74	43.95
+Prompt2+MT	48.94	49.37	49.15	58.65	60.58	59.60	62.60	62.81	62.71	45.23	44.52	44.87
+Prompt3	46.99	48.11	47.54	53.50	56.15	54.79	61.43	61.22	61.33	44.20	43.74	43.97
+Prompt3+MT	45.41	46.10	45.75	57.89	60.46	59.14	61.20	62.59	61.88	44.70	44.70	44.70
+Prompt4	43.85	43.07	43.46	54.25	58.30	56.20	58.68	58.62	58.65	41.99	41.48	41.73
+Prompt4+MT	46.45	46.98	46.71	54.43	56.02	55.22	58.33	58.73	58.53	42.45	41.83	42.14
+Prompt5	47.43	48.87	48.14	59.21	61.09	60.14	63.46	63.61	63.53	43.93	43.39	43.66
+Prompt5+MT	47.27	49.12	48.18	57.95	61.47	59.66	61.88	62.59	62.23	43.46	43.30	43.38
+Prompt6	47.77	48.49	48.13	58.29	60.58	59.42	61.01	60.32	60.66	43.12	42.78	42.95
+Prompt6+MT	46.48	47.36	46.91	57.58	59.70	58.62	61.51	60.88	61.20	45.95	45.91	45.93
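Table 2 states that SIT+PT and SIT+PM use the optimal prefix prompt for each dataset. Comparing the two tables, those rows match Prompt1 on Rest15, Rest16, and ACOS_Restaurant and Prompt6 on ACOS_Laptop, which is the prompt whose "+Prompt K+MT" row has the highest F1 in Table 4. The sketch below encodes that selection rule; it is an inference from the reported numbers rather than a procedure the paper spells out, and `PM_F1` and `best_prompt` are illustrative names.

```python
# F1 (%) of the "+PromptK+MT" rows in Table 4 (SIT combined with prompt K and masking).
PM_F1 = {
    "Prompt1": {"Rest15": 50.00, "Rest16": 60.44, "ACOS_Restaurant": 63.13, "ACOS_Laptop": 43.07},
    "Prompt2": {"Rest15": 49.15, "Rest16": 59.60, "ACOS_Restaurant": 62.71, "ACOS_Laptop": 44.87},
    "Prompt3": {"Rest15": 45.75, "Rest16": 59.14, "ACOS_Restaurant": 61.88, "ACOS_Laptop": 44.70},
    "Prompt4": {"Rest15": 46.71, "Rest16": 55.22, "ACOS_Restaurant": 58.53, "ACOS_Laptop": 42.14},
    "Prompt5": {"Rest15": 48.18, "Rest16": 59.66, "ACOS_Restaurant": 62.23, "ACOS_Laptop": 43.38},
    "Prompt6": {"Rest15": 46.91, "Rest16": 58.62, "ACOS_Restaurant": 61.20, "ACOS_Laptop": 45.93},
}
DATASETS = ("Rest15", "Rest16", "ACOS_Restaurant", "ACOS_Laptop")


def best_prompt(dataset: str) -> str:
    """Prompt with the highest prompt+mask F1 on `dataset` (first one wins ties)."""
    return max(PM_F1, key=lambda prompt: PM_F1[prompt][dataset])


selected = {dataset: best_prompt(dataset) for dataset in DATASETS}
print(selected)
```

The selection criterion matters: ranking by the prompt-only rows instead would pick Prompt3 for ACOS_Laptop (43.97 F1) rather than Prompt6.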

Table 5. Results of ablation experiments for four datasets. The best results are in bold.

Methods	Rest15 Pre	Rest15 Rec	Rest15 F1	Rest16 Pre	Rest16 Rec	Rest16 F1	ACOS_Restaurant Pre	ACOS_Restaurant Rec	ACOS_Restaurant F1	ACOS_Laptop Pre	ACOS_Laptop Rec	ACOS_Laptop F1
Paraphrase	46.16	47.72	46.93	56.63	59.30	57.93	61.02	59.73	60.37	44.87	44.10	44.48
Paraphrase+PT	48.46	49.56	49.00	58.99	61.58	60.26	60.07	59.40	59.73	44.51	43.32	43.91
Paraphrase+MT	45.51	46.54	46.02	58.19	61.33	59.72	57.71	57.84	57.78	43.53	42.89	43.21
Paraphrase+PM	47.58	48.30	47.94	57.11	58.82	57.95	60.09	60.29	60.19	44.54	43.24	43.88
SIT	47.89	50.13	48.98	58.98	61.60	60.26	63.13	63.49	63.31	44.38	44.61	44.49
SIT+PT	48.41	49.75	49.07	60.78	63.24	61.99	63.54	63.83	63.69	43.12	42.78	42.95
SIT+MT	47.93	49.50	48.70	58.30	60.96	59.60	61.79	63.27	62.52	44.46	44.35	44.41
SIT+PM	49.63	50.38	50.00	59.22	61.66	60.44	62.88	63.38	63.13	45.95	45.91	45.93

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

