A More Fine-Grained Aspect-Sentiment-Opinion Triplet Extraction Task

Aspect Sentiment Triplet Extraction (ASTE) aims to extract aspect term, sentiment and opinion term triplets from sentences and tries to provide a complete solution for aspect-based sentiment analysis (ABSA). However, some triplets extracted by ASTE are confusing, since the sentiment in a triplet extracted by ASTE is the sentiment that the sentence expresses toward the aspect term rather than the sentiment of the aspect term and opinion term pair. In this paper, we introduce a more fine-grained Aspect-Sentiment-Opinion Triplet Extraction (ASOTE) task. ASOTE also extracts aspect term, sentiment and opinion term triplets. However, the sentiment in a triplet extracted by ASOTE is the sentiment of the aspect term and opinion term pair. We build four datasets for ASOTE based on several popular ABSA benchmarks. We propose a Position-aware BERT-based Framework (PBF) to address this task. PBF first extracts aspect terms from sentences. For each extracted aspect term, PBF generates aspect term-specific sentence representations considering both the meaning and the position of the aspect term, then extracts the associated opinion terms and predicts the sentiments of the aspect term and opinion term pairs based on these sentence representations. Experimental results on the four datasets show the effectiveness of PBF.


Introduction
Aspect-based sentiment analysis (ABSA) (Hu and Liu, 2004; Pontiki et al., 2014, 2015, 2016) is a fine-grained sentiment analysis (Nasukawa and Yi, 2003; Liu, 2012) task and can provide more detailed information than general sentiment analysis. To solve the ABSA task, many subtasks have been proposed, such as Aspect Term Extraction (ATE), Aspect Term Sentiment Analysis (ATSA) and Target-oriented Opinion Words Extraction (TOWE) (Fan et al., 2019). An aspect term (aspect for short) is a word or phrase that refers to a discussed entity in a sentence. An opinion term (opinion for short) is a word or phrase that expresses a subjective attitude. ATE extracts aspects from sentences. Given a sentence and an aspect in the sentence, ATSA and TOWE predict the sentiment and the opinions associated with the aspect, respectively. These subtasks can work together to tell a complete story, i.e., the discussed aspect, the sentiment of the aspect, and the cause of the sentiment. However, no previous ABSA study tried to provide a complete solution in one shot.

Figure 1: An example showing the inputs and outputs of the tasks. For each arrow, when the head is a task name, the tail is an input of the task; when the tail is a task name, the head is an output of the task. The bold words are aspects. The underlined words are opinions.

Peng et al. (2020) proposed the Aspect Sentiment Triplet Extraction (ASTE) task, which attempted to provide a complete solution for ABSA. A triplet extracted from a sentence by ASTE contains an aspect, the sentiment that the sentence expresses toward the aspect, and one opinion associated with the aspect. The example in Figure 1 shows the inputs and outputs of the tasks mentioned above. However, the triplet extracted from a sentence by ASTE becomes confusing when the sentence has multiple opinions about the aspect and these opinions express different sentiments toward the aspect, since the sentiment in a triplet extracted by ASTE is the sentiment that the sentence expresses toward the aspect rather than the sentiment of the aspect and opinion pair.

Figure 2: Differences between ASOTE and ASTE. In the third sentence, the negative sentiment toward the aspect "Food" is expressed without an annotatable opinion.

The third column in Figure 2 shows the extraction results of ASTE from the corresponding sentences. When seeing the triplets where the words indicating sentiments are red, people will be confused. Moreover, downstream tasks cannot benefit from these triplets.
In this paper, we introduce a more fine-grained Aspect-Sentiment-Opinion Triplet Extraction (ASOTE) task. ASOTE also extracts aspect, sentiment and opinion triplets. In a triplet extracted by ASOTE, the sentiment is the sentiment of the aspect and opinion pair. The fourth column in Figure 2 shows the extraction results of the ASOTE task from the corresponding sentences. In addition, we build four datasets for ASOTE based on several popular ABSA benchmarks.
We propose a Position-aware BERT-based Framework (PBF) to address ASOTE. PBF first extracts aspects from sentences. For each extracted aspect, PBF then extracts the associated opinions and predicts the sentiments of the aspect and opinion pairs. PBF obtains triplets by merging the results. Since a sentence may contain multiple aspects associated with different opinions, to extract the corresponding opinions of a given aspect, similar to previous models proposed for the TOWE task (Fan et al., 2019; Wu et al., 2020b; Pouran Ben Veyseh et al., 2020; Jiang et al., 2021), PBF generates aspect-specific sentence representations. To accurately generate aspect-specific sentence representations, both the meaning and the position of the aspect are important. Some methods have been proposed to integrate the position information of aspects into non-BERT-based models for some ABSA subtasks, such as Gu et al. (2018) and Li et al. (2018a) for ATSA; however, how to integrate the position information of aspects into BERT-based (Devlin et al., 2019) models has not been well studied. PBF generates aspect-specific sentence representations considering both the meaning and the position of the aspect. We explore several methods of integrating the position information of aspects into PBF.
Our contributions are summarized as follows: • We introduce a new aspect-based sentiment analysis subtask: Aspect-Sentiment-Opinion Triplet Extraction (ASOTE).
• We build four datasets for ASOTE and release the datasets for public use as a benchmark.
• We propose a Position-aware BERT-based Framework (PBF) to address ASOTE.
• Experimental results on the four datasets demonstrate the effectiveness of our method.

Dataset Construction
Data Collection We annotate four datasets (i.e., 14res, 14lap, 15res, 16res) for our proposed Aspect-Sentiment-Opinion Triplet Extraction (ASOTE) task. First, we construct four Aspect Sentiment Triplet Extraction (ASTE) datasets. Similar to previous studies (Peng et al., 2020; Xu et al., 2020), we obtain the four ASTE datasets by aligning the four SemEval Challenge datasets (Pontiki et al., 2015, 2016) and the four Target-oriented Opinion Words Extraction (TOWE) datasets (Fan et al., 2019). The four SemEval Challenge datasets are the restaurant and laptop datasets from SemEval 2014, and the restaurant datasets from SemEval 2015 and SemEval 2016. The four SemEval Challenge datasets provide the annotation of aspect terms and the corresponding sentiments, and the four TOWE datasets were obtained by annotating the corresponding opinion terms for the annotated aspect terms in the four SemEval Challenge datasets.
Data Annotation We invited a researcher who works on natural language processing (NLP) and an undergraduate to annotate the sentiments of the aspect-opinion pairs in the triplets of the four ASTE datasets. The annotation tool we used is brat (Stenetorp et al., 2012). Each time, we only provided the triplets of one aspect term to the annotators. For each aspect term, not only the aspect term and its corresponding opinion terms but also the sentiment of the aspect term were provided to the annotators. Figure 3 (a) shows an example of what we provided to the annotators and Figure 3 (b) shows the results of annotation. When annotating the sentiment of an aspect-opinion pair, the annotators need to consider both the opinion itself and the context of the opinion. For example, given the sentence "The decor is night tho...but they REALLY need to clean that vent in the ceiling...its quite un-appetizing, and kills your effort to make this place look sleek and modern." and one aspect-opinion pair, ("place", "sleek"), the sentiment should be negative, even though the sentiment of "sleek" is positive. The kappa statistic (Cohen, 1960) ... on NLP.

Dataset Analysis
The statistics of the four ASOTE datasets are summarized in Table 1. Since #d_s2 is always greater than 0, the annotators also have to annotate the sentiments of the triplets whose aspects each have only one triplet and whose aspect sentiments are not conflict. That is, we cannot treat the sentiment of the aspect in such a triplet as the sentiment of the triplet. For example, for the third sentence in Figure 2, the aspect "Food" has negative sentiment, while the correct sentiment of its only triplet, ("Food", neutral, "average"), is neutral.

Method
In this section, we describe our Position-aware BERT-based Framework (PBF) for Aspect-Sentiment-Opinion Triplet Extraction (ASOTE).

Task Definition
Given a sentence S = {w_0, ..., w_i, ..., w_{n-1}} containing n words, ASOTE aims to extract a set of triplets T = {(a, s, o)_t | t = 1, ..., |T|}, where a is an aspect, o is an opinion, s is the sentiment of the aspect-opinion pair (a, o), and |T| is the number of triplets in the sentence. When a sentence does not contain triplets, |T| = 0.
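The output format above can be illustrated with a minimal sketch in Python; the sentence and triplets are taken from the Figure 1 example, and the tuple representation is only one possible encoding:

```python
# A minimal illustration of the ASOTE output format for one sentence.
from typing import List, Tuple

# A triplet is (aspect term, sentiment, opinion term).
Triplet = Tuple[str, str, str]

sentence = "The atmosphere is attractive, but a little uncomfortable."
triplets: List[Triplet] = [
    ("atmosphere", "positive", "attractive"),
    ("atmosphere", "negative", "uncomfortable"),
]

# |T| = 2 for this sentence; a sentence without opinionated aspects has |T| = 0.
assert len(triplets) == 2
```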

PBF
Figure 4 shows the overview of PBF. PBF contains three models. Given a sentence S = {w_0, ..., w_i, ..., w_{n-1}}, the Aspect Term Extraction (ATE) model first extracts a set of aspects A = {a_0, ..., a_j, ..., a_{m-1}}. For each extracted aspect a_j, the Target-oriented Opinion Words Extraction (TOWE) model then extracts its opinions O_j = {o_j^0, ..., o_j^k, ..., o_j^{l_j-1}}, where l_j is the number of opinions with respect to the j-th aspect and l_j >= 0. Finally, for each extracted aspect-opinion pair (a_j, o_j^k), the Aspect-Opinion Pair Sentiment Classification (AOPSC) model predicts its sentiment s_j^k ∈ P = {positive, neutral, negative}. PBF obtains the triplets by merging the results of the three models: T = {(a_j, s_j^k, o_j^k) | a_j ∈ A, o_j^k ∈ O_j}. In PBF, all three models use BiLSTM (Graves et al., 2013) with BERT (Devlin et al., 2019) as the sentence encoder.
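The three-stage pipeline and the merging step can be sketched as follows. The three stub functions stand in for the trained ATE, TOWE and AOPSC models; their hard-coded outputs are purely illustrative, not the paper's actual predictions:

```python
# A sketch of PBF's pipeline: ATE -> TOWE (per aspect) -> AOPSC (per pair),
# then merge into triplets. The real models are BERT+BiLSTM networks.
def extract_aspects(sentence):
    # ATE model stub: returns the aspect terms of the sentence.
    return ["atmosphere"]

def extract_opinions(sentence, aspect):
    # TOWE model stub: returns the opinions of one aspect; may be empty.
    return ["attractive", "uncomfortable"]

def classify_pair(sentence, aspect, opinion):
    # AOPSC model stub: returns one of {"positive", "neutral", "negative"}.
    return "positive" if opinion == "attractive" else "negative"

def asote(sentence):
    """Merge the three models' results into (aspect, sentiment, opinion) triplets."""
    triplets = []
    for aspect in extract_aspects(sentence):
        for opinion in extract_opinions(sentence, aspect):
            sentiment = classify_pair(sentence, aspect, opinion)
            triplets.append((aspect, sentiment, opinion))
    return triplets
```

Note that an aspect with no extracted opinions (l_j = 0) simply contributes no triplets, which is how PBF handles aspects without annotatable opinions.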
Since a sentence may contain multiple aspects associated with different opinions, to extract the associated opinions of a particular aspect, the TOWE model generates aspect-specific sentence representations for the aspect. It is intuitive that both the meaning and the position of the aspect are important for producing aspect-specific sentence representations. In other words, we need to tell the TOWE model what the aspect is and where the aspect is in the sentence. Given the sentence S and an aspect a_j in the sentence, we first replace the words of the aspect with the word "aspect", which tells the TOWE model where the aspect is in the sentence. We then append the words of the aspect to the end of the sentence, which tells the model what the aspect is. Finally, we obtain a new sentence S_B^A = {w_0, ..., w_i, ..., w_q}. We also generate segment indices I_seg^A = {0, ..., 1} and position indices I_pos^A = {0, ..., q} for the new sentence. The encoder of the TOWE model (Figure 4 (b)) takes S_B^A, I_seg^A and I_pos^A as inputs, and generates aspect-specific sentence representations. To predict the sentiment of an aspect-opinion pair, the AOPSC model (Figure 4 (c)) also generates aspect-specific sentence representations for the aspect. The inputs of the AOPSC model are the same as those of the TOWE model.
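The input construction described above can be sketched at the word level as follows. We assume, based on the indices {0, ..., 1}, that sentence words get segment index 0 and the appended aspect words get 1; this sketch also ignores BERT subword tokenization and special tokens, which the real model would need to handle:

```python
# A word-level sketch of building the TOWE/AOPSC input from a sentence
# and an aspect span (assumed conventions; details differ for BERT subwords).
def build_aspect_input(words, aspect_start, aspect_end):
    """aspect_start/aspect_end: word indices of the aspect (end exclusive)."""
    aspect_words = words[aspect_start:aspect_end]
    # Replace the aspect words with the single word "aspect"
    # (tells the model WHERE the aspect is) ...
    new_words = words[:aspect_start] + ["aspect"] + words[aspect_end:]
    # ... then append the aspect words to the end (tells the model WHAT it is).
    new_words = new_words + aspect_words
    # Segment indices: 0 for the modified sentence, 1 for the appended aspect.
    seg = [0] * (len(new_words) - len(aspect_words)) + [1] * len(aspect_words)
    # Position indices: running positions 0..q.
    pos = list(range(len(new_words)))
    return new_words, seg, pos

words = "The atmosphere is attractive".split()
new_words, seg, pos = build_aspect_input(words, 1, 2)
# new_words == ['The', 'aspect', 'is', 'attractive', 'atmosphere']
```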

ATE
We formulate ATE as a sequence labeling problem. The encoder takes S_B, I_seg and I_pos as inputs, and outputs the corresponding sentence representation, which is used to predict the tag y_i^A ∈ {B, I, O} (B: Begin, I: Inside, O: Outside) of the word w_i. It can be regarded as a three-class classification problem at each position of S_B. We use a linear layer and a softmax layer to compute the prediction probability ŷ_i^A:

ŷ_i^A = softmax(W_1^A h_i + b_1^A),

where h_i is the encoder representation of w_i, and W_1^A and b_1^A are learnable parameters. The cross-entropy loss of the ATE task can be defined as follows:

L_ATE = - Σ_i Σ_{t ∈ {B,I,O}} I(y_i^A = t) log ŷ_{i,t}^A,

where y_i^A denotes the ground truth label and I is an indicator function: if y_i^A = t, I = 1, otherwise I = 0. We minimize L_ATE to optimize the ATE model.
Finally, the ATE model decodes the tag sequence of the sentence and outputs a set of aspects A = {a_0, ..., a_j, ..., a_{m-1}}.
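The decoding step from BIO tags to aspect spans can be sketched as follows; the exact handling of malformed sequences (e.g. an "I" with no preceding "B") is an assumption, since the paper does not specify it:

```python
# A sketch of decoding a BIO tag sequence into aspect term spans.
def decode_bio(words, tags):
    """Return the list of aspect terms encoded by a BIO tag sequence."""
    spans, current = [], []
    for word, tag in zip(words, tags):
        if tag == "B":
            if current:                     # close any open span
                spans.append(" ".join(current))
            current = [word]                # start a new span
        elif tag == "I" and current:
            current.append(word)            # extend the open span
        else:                               # "O", or a dangling "I"
            if current:
                spans.append(" ".join(current))
            current = []
    if current:                             # span running to end of sentence
        spans.append(" ".join(current))
    return spans

words = "The hot dogs are top notch".split()
tags = ["O", "B", "I", "O", "O", "O"]
# decode_bio(words, tags) == ["hot dogs"]
```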

TOWE
We also formulate TOWE as a sequence labeling problem. The TOWE model has the same architecture as the ATE model, but they do not share parameters. The TOWE model takes S_B^A, I_seg^A and I_pos^A as inputs and outputs the opinions O_j = {o_j^0, ..., o_j^k, ..., o_j^{l_j-1}} of the aspect a_j.

AOPSC
Given an aspect a_j and its opinions {o_j^0, ..., o_j^k, ..., o_j^{l_j-1}}, the AOPSC model predicts the sentiments {s_j^0, ..., s_j^k, ..., s_j^{l_j-1}} of all aspect-opinion pairs {(a_j, o_j^0), ..., (a_j, o_j^k), ..., (a_j, o_j^{l_j-1})} at once. The encoder of the AOPSC model takes the new sentence S_B^A, the segment indices I_seg^A and the position indices I_pos^A as inputs and outputs the aspect-specific sentence representation H^S = {h_0^S, ..., h_q^S}. We then obtain the representation of an opinion by averaging the hidden representations of the words in the opinion. The representation h_{j,k}^o of opinion o_j^k is used to make the sentiment prediction ŷ_{j,k}^o:

ŷ_{j,k}^o = softmax(W_1^S h_{j,k}^o + b_1^S),

where W_1^S and b_1^S are learnable parameters. The loss of the AOPSC task is the sum of the cross-entropy losses of all opinions of the aspect:

L_AOPSC = - Σ_k Σ_{t ∈ P} I(y_{j,k}^o = t) log ŷ_{j,k,t}^o,

where y_{j,k}^o denotes the ground truth label. We minimize L_AOPSC to optimize the AOPSC model.
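The span-averaging classification head can be sketched with numpy as follows; the hidden states and the linear weights here are random placeholders standing in for the trained encoder output H^S and the learned W_1^S, b_1^S, and the dimensions are illustrative:

```python
# A numpy sketch of the AOPSC head: average the encoder hidden states over an
# opinion span, then apply a linear + softmax layer over the three sentiments.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())   # subtract max for numerical stability
    return e / e.sum()

def classify_opinion(H, span, W, b):
    """H: (q+1, d) hidden states; span: (start, end) word indices, end exclusive;
    W: (3, d) and b: (3,) for {positive, neutral, negative}."""
    h_o = H[span[0]:span[1]].mean(axis=0)   # span-averaged opinion representation
    return softmax(W @ h_o + b)             # sentiment probability distribution

rng = np.random.default_rng(0)
H = rng.normal(size=(6, 8))                 # 6 words, hidden size 8 (toy values)
probs = classify_opinion(H, (2, 4), rng.normal(size=(3, 8)), np.zeros(3))
assert probs.shape == (3,) and abs(probs.sum() - 1.0) < 1e-6
```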

Datasets and Metrics
We evaluate our method on two types of datasets. TOWE-data (Fan et al., 2019) is used to compare our method with previous methods proposed for the Target-oriented Opinion Words Extraction (TOWE) task. TOWE-data only includes sentences that contain aspect-opinion pairs, and every included aspect is associated with at least one opinion. Following previous works (Fan et al., 2019; Wu et al., 2020b), we randomly select 20% of the training set as a development set for tuning hyperparameters and early stopping.
ASOTE-data is the data we build for our Aspect-Sentiment-Opinion Triplet Extraction (ASOTE) task and is used to compare methods on the ASOTE task. ASOTE-data can also be used to evaluate TOWE models on the TOWE task. Compared with TOWE-data, ASOTE-data additionally includes sentences that do not contain aspect-opinion pairs, as well as aspects without opinions. Since methods can encounter such examples in real-world scenarios, ASOTE-data is more appropriate for evaluating methods on the TOWE task.
We use precision (P), recall (R), and F1-score (F1) as the evaluation metrics. For the ASOTE task, an extracted triplet is regarded as correct only if its aspect span, sentiment and opinion span exactly match those of a ground truth triplet.
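The exact-match metric can be sketched as follows. Representing spans as (start, end) index pairs is an assumption for illustration; any representation that supports exact comparison works the same way:

```python
# A sketch of exact-match evaluation for ASOTE: a predicted triplet counts as
# correct only if its aspect span, sentiment and opinion span all match a
# ground truth triplet.
def triplet_prf(pred, gold):
    """pred/gold: lists of (aspect_span, sentiment, opinion_span) tuples."""
    tp = len(set(pred) & set(gold))                 # exactly matched triplets
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1

gold = [((0, 1), "positive", (3, 4)), ((0, 1), "negative", (7, 8))]
pred = [((0, 1), "positive", (3, 4)), ((0, 1), "neutral", (7, 8))]
# The second prediction gets the sentiment wrong, so P = R = F1 = 0.5.
assert triplet_prf(pred, gold) == (0.5, 0.5, 0.5)
```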

Our Methods
We compare several variants of our Position-aware BERT-based Framework (PBF). These variants differ in how they generate the new sentence S_B^A, the segment indices I_seg^A and the position indices I_pos^A.

PBF -w/o A does not append the words of the aspect to the end of the original sentence. In other words, this variant does not know what the aspect is.

PBF -w/o P does not replace the words of the aspect with the word "aspect", namely this variant does not know where the aspect is. This method has been used on some aspect-based sentiment analysis subtasks to generate aspect-specific sentence representations (Jiang et al., 2019; Li et al., 2020b).

PBF -w/o AP neither appends the words of the aspect to the end of the original sentence, nor replaces the words of the aspect with the word "aspect".
PBF-M1 does not replace the words of the aspect with the word "aspect". In order to tell the model the position of the aspect, the words of the aspect in the original sentence and the words of the aspect appended to the sentence have the same position indices. This method has been used in relation classification (Zhong and Chen, 2020).

PBF-M2 does not replace the words of the aspect with the word "aspect". In order to tell the model the position of the aspect, the position indices of the words of the aspect in the original sentence are marked as 0, and the position indices of the other words are their relative distances to the aspect. This method has been used on the aspect-term sentiment analysis task (Gu et al., 2018).

PBF-M3 modifies the original sentence S by inserting the special token # at the beginning of the aspect and the special token $ at the end of the aspect. Special tokens were first used by Wu and He (2019) to incorporate target entity information into BERT on the relation classification task.
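Two of these variants can be sketched at the word level as follows. Both functions are illustrative reconstructions from the descriptions above, not the exact implementations, and they ignore BERT subword tokenization:

```python
# Word-level sketches of the PBF-M2 and PBF-M3 input schemes.
def relative_positions(n_words, aspect_start, aspect_end):
    """PBF-M2 style: aspect words get position 0, other words get their
    relative distance to the aspect (aspect_end is exclusive)."""
    pos = []
    for i in range(n_words):
        if i < aspect_start:
            pos.append(aspect_start - i)        # distance to aspect on the left
        elif i < aspect_end:
            pos.append(0)                       # inside the aspect
        else:
            pos.append(i - aspect_end + 1)      # distance to aspect on the right
    return pos

def insert_markers(words, aspect_start, aspect_end):
    """PBF-M3 style: wrap the aspect in the special tokens # and $."""
    return (words[:aspect_start] + ["#"] + words[aspect_start:aspect_end]
            + ["$"] + words[aspect_end:])

words = "Rice is too dry".split()
# relative_positions(4, 0, 1) == [0, 1, 2, 3]
# insert_markers(words, 0, 1) == ['#', 'Rice', '$', 'is', 'too', 'dry']
```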

Implementation Details
We implement our models in PyTorch (Paszke et al., 2017). We use the uncased base pre-trained BERT.

The BERT is fine-tuned during training. The batch size is set to 32 for all models. All models are optimized by the Adam optimizer (Kingma and Ba, 2014). The learning rate is 0.00002. We apply a dropout of p = 0.5 after the BERT and BiLSTM layers. We apply early stopping in training with a patience of 10. We run all models 5 times and report the average results on the test datasets. For the baseline models of the ASOTE task, we first convert our datasets to the same formats as the inputs of the baseline models, then run the code released by the authors on the converted datasets.
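The early-stopping criterion can be sketched as follows (patience is 10 epochs in our experiments; the example below uses 2 only to keep it short):

```python
# A minimal sketch of early stopping with patience: stop training when the
# best development score has not improved for `patience` consecutive epochs.
def should_stop(dev_scores, patience):
    """dev_scores: development-set scores, one per completed epoch."""
    best_epoch = max(range(len(dev_scores)), key=lambda i: dev_scores[i])
    return (len(dev_scores) - 1 - best_epoch) >= patience

# 0.62 at epoch 1 is still the best two epochs later, so training stops.
assert should_stop([0.55, 0.62, 0.61, 0.60], patience=2)
assert not should_stop([0.55, 0.62, 0.61, 0.63], patience=2)
```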

Results
The results of the ASOTE task are shown in Table 2. We have several observations from Table 2. First, MTL outperforms JET^t on all datasets, because JET^t can extract at most one triplet for an aspect. Although JET^o can extract at most one triplet for an opinion, JET^o outperforms JET^t on all datasets and surpasses MTL on 3 of 4 datasets, because fewer opinions than aspects belong to multiple triplets (more statistics about our ASOTE datasets can be found in Appendix B). Second, GTS-CNN and GTS-BiLSTM outperform both JET^t and JET^o on all datasets, and GTS-BERT also achieves better performance than JET^t+bert and JET^o+bert. GTS-BERT is the best baseline model. Third, our proposed PBF surpasses GTS-BERT on all datasets. Since the Aspect Term Extraction (ATE) model and the Aspect-Opinion Pair Sentiment Classification (AOPSC) model in PBF are vanilla (the results of PBF and its variants on AOPSC can be found in Appendix C; compared with PBF -w/o AP, PBF and the other variants do not obtain consistently better performance on all datasets), compared with previous models, the advantages of PBF come from the TOWE model. However, GTS-BERT cannot be applied to the TOWE task directly, so we compare PBF with GTS-BERT on the Aspect-Opinion Pair Extraction (OPE) (Wu et al., 2020a) task (code: https://github.com/l294265421/GTS-ASOTE). The results of OPE are shown in Table 3, which shows that PBF also outperforms GTS-BERT on all datasets. Fourth, PBF outperforms PBF -w/o P on all datasets, indicating that integrating the position information of aspects can boost model performance. Fifth, PBF -w/o A surpasses PBF -w/o P on the 14lap, 15res and 16res datasets, which indicates that the position information of aspects is more effective than the meanings of aspects on these datasets. Sixth, PBF outperforms PBF-M1 and PBF-M2, which shows that our method of telling the model where the aspect is is more effective. Although the method used by PBF-M2 to integrate the position information of aspects has been successfully applied to non-BERT-based models, it is not effective enough for BERT-based models. Seventh, PBF outperforms PBF-M3, indicating that our method is more effective than integrating the position information and the meaning of an aspect by inserting special aspect markers around the aspect. The possible reason is that the additional special tokens may destroy the syntax knowledge learned by BERT. Last but not least, PBF -w/o AP obtains the worst performance among all variants, which further demonstrates that both the position and the meaning of an aspect are important.

Figure 6: Extraction results (ground truth, PBF -w/o A, PBF -w/o P and PBF) on two example sentences.

Case Study
To further understand the effect of the position and the meaning of an aspect, we perform a case study on two sentences, as displayed in Figure 6. In the first sentence, the bold "food" and the underlined "food" are different aspects. The positions of the aspects help PBF and PBF -w/o A extract different opinions for aspects with the same meaning. In the second sentence, with the help of the meaning of the aspect "crust", PBF and PBF -w/o P do not extract "raw" and "cold" as opinions of "crust".

Comparison Methods
On the TOWE task, we compare our methods with (1) three non-BERT models, including IOG (Fan et al., 2019), ...

Results
The results on ASOTE-data are shown in Table 4 and the results on TOWE-data are shown in Table 5. We draw the following conclusions from the results. First, PBF outperforms all baselines proposed for TOWE on TOWE-data, indicating the effectiveness of our method. Second, PBF -w/o P also surpasses all baselines on TOWE-data. To the best of our knowledge, no previous study has evaluated the performance of this method on TOWE. Third, regarding PBF and its variants, we can obtain conclusions from Table 4 similar to those obtained from Table 2, because the differences in these models' performance on ASOTE are mainly brought about by the differences in their performance on TOWE. Fourth, since the methods (i.e., PBF, PBF -w/o A, PBF -w/o P and PBF -w/o AP) obtain better performance on TOWE-data than on ASOTE-data, ASOTE-data is a more challenging dataset for TOWE.


A More Related Work

Wang et al. (2016, 2017) have annotated the opinions and their sentiments of the sentences in the restaurant and laptop datasets from SemEval-2014 Task 4 (Pontiki et al., 2014) and the restaurant dataset from SemEval-2015 Task 12 (Pontiki et al., 2015). Is it necessary to annotate the sentiments of the aspect and opinion pairs in the Aspect Sentiment Triplet Extraction (ASTE) datasets to obtain our Aspect-Sentiment-Opinion Triplet Extraction (ASOTE) datasets? The answer is yes. The reasons are as follows: • The sentiments of aspect and opinion pairs are different from the sentiments of opinions.
• The opinions annotated by Wang et al. (2016, 2017) are different from the opinions annotated in the Target-oriented Opinion Words Extraction (TOWE) datasets (Fan et al., 2019), which are used to construct our ASOTE datasets. For example, given the sentence "those rolls were big, but not good and sashimi wasn't fresh.", the opinions and their sentiments annotated by Wang et al. (2016, 2017) are ("big", positive), ("good", positive) and ("fresh", positive), while the opinions annotated in the TOWE datasets are "big", "not good" and "wasn't fresh", and the triplets containing "not good" and "wasn't fresh" are negative. We think the opinions annotated in the TOWE datasets are more appropriate for the ASOTE task.

Zhang et al. (2020) defined their Opinion Triplet Extraction task as an integration of aspect-sentiment pair extraction (Zhou et al., 2019; Li et al., 2019; Phan and Ogunbona, 2020) and aspect-opinion co-extraction (Wang et al., 2016, 2017; Dai and Song, 2019). The resulting Opinion Triplet Extraction task has the same goal as our ASOTE task.
However, the authors used the ASTE datasets to evaluate the performances of their models on the Opinion Triplet Extraction task and stated, in the GitHub repository of their paper, that the Opinion Triplet Extraction task is the same as the ASTE task. In fact, combining aspect-sentiment pair extraction with aspect-opinion co-extraction yields neither the ASTE task nor our ASOTE task. Therefore, we do not think they introduced the ASOTE task.

B More Data Statistics
More statistics about our ASOTE-data are shown in Table 6.

C More Experimental Results
The results including precision (P), recall (R) and F1 (F) scores of the aspect-opinion pair extraction task on the ASOTE-data are shown in Table 7.
The results including precision (P), recall (R) and F1 (F) scores of the TOWE task on the ASOTE-data are shown in Table 8.
The results including precision (P), recall (R) and F1 (F) scores of the TOWE task on the TOWE-data are shown in Table 9.
The results of the Aspect-Opinion Pair Sentiment Classification (AOPSC) task are shown in Table 10. Position-aware BERT-based Framework (PBF) and its variants, including PBF -w/o AP, obtain similar performance.
The results of the Aspect Term Extraction (ATE) task are shown in Table 11.

C.1 Case Study
The extraction results of our Position-aware BERT-based Framework (PBF) on the sentence "We been there and we really enjoy the food, was areally great food, and the service was really good." are shown in Figure 7. From the results, we can see that all variants except for PBF -w/o P include the position information of aspects.

Figure 8 shows three sentences. While PBF correctly extracts all triplets from these sentences, GTS-BERT cannot correctly extract all of them.

Table 12 shows some hard sentences from which both GTS-BERT and PBF cannot correctly extract all triplets.

Figure 3: An example of annotating the sentiments of the aspect and opinion pairs on the ASTE triplets for the ASOTE task.

Figure 5: The inputs of PBF-M1, PBF-M2 and PBF-M3, given the sentence "Rice is too dry, tuna was n't so fresh" and the aspect "Rice".

Table 1: Statistics of our ASOTE datasets. #zero_t, #one_t and #m_t represent the number of aspects without triplets, with one triplet and with multiple triplets, respectively. #d_s1 represents the number of aspects that have multiple triplets with different sentiments. #d_s2 represents the number of aspects which have only one triplet, whose sentiments are not conflict, and whose sentiments are different from the sentiment of the corresponding triplet. #t_d represents the number of triplets whose sentiments are different from the sentiments of the aspects in them.

Table 2: Results of the ASOTE task. The bold F1 scores are the best scores among PBF and the baselines. The underlined F1 scores are the best scores among PBF and its variants.

Table 3: Results of the OPE task in terms of F1.

Table 4: Results of the TOWE task in terms of F1 on the ASOTE-data.

Table 5: Results of the TOWE task in terms of F1 on the TOWE-data. The results of the baselines are cited from the original papers.

Since the Aspect Term Extraction (ATE), Target-oriented Opinion Words Extraction (TOWE) and Aspect-Opinion Pair Sentiment Classification (AOPSC) tasks are highly correlated with each other, we can improve PBF by turning it into a joint model which jointly trains the ATE model, the TOWE model and the AOPSC model. However, it is not easy to jointly train the ATE model and the TOWE model, since we need to use the aspects that the ATE model extracts to modify the sentences that the TOWE model takes as input. In the future, we will explore how to jointly train the ATE model and the TOWE model.