PHNN: A Prompt and Hybrid Neural Network-Based Model for Aspect-Based Sentiment Classification

Abstract: Aspect-based sentiment classification (ABSC) is an important task in natural language processing (NLP) that aims to predict the sentiment polarity of different aspects in a sentence. The attention mechanism and pre-trained models are commonly used in ABSC tasks. However, a single pre-trained model typically does not perceive downstream tasks very well, and the attention mechanism usually neglects the syntactic information of sentences. In this paper, we propose a prompt and hybrid neural network (PHNN) model, which utilizes a prompt and a hybrid neural network structure to solve the ABSC task. More precisely, it first uses the prompt to convert an input sentence into cloze-type text and utilizes RoBERTa to deal with the input. Then, it applies the graph convolutional neural network (GCN) combined with the convolutional neural network (CNN) to extract the syntactic features of the sentence, while using bi-directional long short-term memory (BiLSTM) to obtain the semantic features of the sentence. Further, it utilizes the multi-head attention (MHA) mechanism to learn attention between the sentence and aspect words. Finally, the sentiment polarity of the aspect words is obtained by using the softmax function. Experiments on three benchmark datasets show that PHNN has the best performance compared with other baselines, validating the efficiency of our model.


Introduction
Sentiment analysis (SA) is an important research area of NLP, which studies emotions and attitudes about an entity in natural language texts. ABSC is an entity-level, fine-grained SA task that aims to determine the sentiment polarity (e.g., negative, neutral, or positive) of an entity in a sentence. E.g., a comment about a restaurant such as "poor restaurant environment but good food" contains two aspects with different sentiment polarities: the aspect word "food" shows a positive sentiment, and the aspect word "environment" indicates a negative sentiment. ABSC can accurately identify the attitude towards a particular aspect, instead of simply assigning a sentiment polarity to the whole sentence.
Traditional research has utilized various neural networks with attention mechanisms to extract sentence representations [1-3]. However, attention-based models only pay attention to the semantic information of a sentence, ignoring its syntactic dependency information. When a sentence contains multiple sentiment words with opposite polarities, the attention mechanism easily focuses on opinion words that are unrelated to the aspect words. Taking the sentence in Figure 1 as an example, for the aspect "environment", the opinion word "good" may receive more attention than the opinion word "poor", but "good" is related to another aspect of the sentence, namely, "food". The graph neural network (GNN) model is suitable for processing unstructured information. Using GNN on the syntactic dependency tree to solve the ABSC task usually yields better results than traditional neural networks, since the dependency tree can establish connections between related words. Considering Figure 1 as an example, there is a dependency between the aspect word "environment" and the opinion word "poor". Zhang et al. [4] applied GCN to the ABSC task, using the dependency tree and attention mechanism for sentiment classification. Huang et al. [5] utilized the graph attention network and MHA to update the feature representations of nodes. Zhao et al. [6] proposed a GCN-based ABSC model, effectively capturing the sentiment dependencies among multiple aspects in a sentence.
Since the emergence of large-scale pre-trained models, such as BERT [7] and RoBERTa [8], NLP tasks have generally been solved by fine-tuning pre-trained models. E.g., Ranaldi et al. [9] compared BERT with interpretable tree-based approaches to study the syntactic knowledge of downstream tasks and demonstrated the effectiveness of the BERT-based model. However, researchers have found a gap between downstream tasks and pre-trained models: in conventional fine-tuning, the pre-trained model is adapted to the downstream task, rather than the downstream task being reformulated to match the pre-training objective. Prompt technology solves this problem by recasting the downstream task in the form of a pre-training task. Some recent papers have used prompts attached to the raw input text to guide language models to perform different tasks. One of the earliest examples is [10], which evaluated the efficiency of the GPT-2 model on downstream tasks by using prompts without any fine-tuning. Brown et al. [11] added prompts to sentences to make accurate predictions in classification tasks, converting the task into a pre-training task, that is, a masked language model (MLM) task. Schick et al. [12] used prompts to achieve advanced results in text classification.
Based on the above analysis, to better adapt the pre-trained model to downstream tasks and make full use of the semantic and syntactic information of sentences, this paper proposes the PHNN model, which adds a prompt to adjust the input sequence and captures the sentiment of the aspect words through a hybrid neural network. This approach better extracts aspect words combined with contextual semantic and syntactic information. The validity of this model is verified on three benchmark datasets, and the contributions of this paper are summarized as follows:

•
This paper utilizes the prompt technology to convert the input into cloze-type text, making the downstream ABSC task more suitable for the pre-trained model.

The rest of the paper is organized as follows: the related work is reviewed in Section 2, and the details of PHNN are introduced in Section 3. Experiments are conducted and analyzed in Section 4, and the paper concludes in Section 5.

Related Work
ABSC is a fine-grained subtask of aspect-based sentiment analysis (ABSA) that seeks to identify the sentiment polarity of a given aspect in a sentence. Classical methods mainly utilize CNN, recurrent neural networks (RNNs), and attention mechanisms to solve the ABSC task. Fan et al. [13] proposed incorporating attention into CNN to capture word expressions in sentences. Joshi et al. [14] applied CNN and attention-based neural networks to extract features from text and model the semantic relations between sentences and aspect words. Xu et al. [15] proposed an MHA network to solve the ABSC problem when aspects contain multiple words. Zhang et al. [16] proposed an attention network that combines two attention parts of a sentence to obtain a better contextual representation.
In recent years, GNN has received much attention due to its ability to deal with unstructured content. Moreover, in ABSC tasks, GNN can handle syntactic dependency trees. Sun et al. [17] constructed a dependency tree model using BiLSTM to learn sentence feature representations and enhanced the sentence representation through GCN. Wang et al. [18] pruned and reshaped the ordinary dependency tree and proposed a relational graph attention network to encode the new dependency tree.
With the development of language models, pre-trained models such as BERT and RoBERTa have achieved remarkable results on many NLP tasks. In ABSA tasks, pre-trained models convert traditional static word vectors into dynamic word vectors with better dynamic semantic representations, effectively solving the sentiment analysis problem in long sentences and gradually becoming standard models. Sun et al. [19] devised an aspect-based approach to solve the ABSA task by constructing auxiliary sentences and converting ABSA into a sentence-pair classification problem. Yin et al. [20] proposed SentiBERT, a variant of BERT that can capture the sentiment features of a text more effectively. Alexandridis et al. [21] used BERT to perform emotion detection in social media text written in Greek. Sirisha et al. [22] combined RoBERTa and LSTM to analyze people's emotions about the conflict between Ukraine and Russia through Twitter data. Although pre-trained models are helpful in NLP tasks, they are often less aware of the downstream task and thus fail to exploit their full potential.
The prompt is a new fine-tuning paradigm inspired by GPT-3 [11], which provides better semantic modeling for NLP tasks. The common practice in prompt technology is to insert prompts with [mask] tokens into the original input text and have the pre-trained model predict the words likely to occur at the [mask] locations. Li et al. [23] first applied prompts to ABSA tasks, constructing consecutive prompts to predict the corresponding sentiment categories given known aspects and perspectives. Gao et al. [24] dynamically selected cases relevant to each context to generate prompts to fine-tune the model automatically. Hu et al. [25] introduced knowledgeable prompt tuning to utilize external knowledge of sentences, thus improving the stability of prompt tuning.
To solve the problem of inconsistent upstream and downstream ABSC tasks based on the pre-trained model, this paper designs the input text based on the prompt and splices the original sentence, prompt text, and aspect words together as the input of the pre-trained model. It then uses GCN combined with CNN to extract the syntactic information of the sentence, utilizes BiLSTM to obtain the semantic information of the sentence, and, finally, uses MHA to let the sentence and aspect words interact to further extract sentiment information.

To solve the ABSC problem, we propose the PHNN model. The model's architecture is shown in Figure 2. It consists of three layers: the prompt text construction layer, the syntactic and semantic encoding layer, and the sentiment classification layer. The details of the PHNN model are presented in the rest of this section.

Prompt Text Construction Layer
The main goal of the prompt text construction layer is to use the prompt mechanism to create prompt text. Adding prompt text helps the model better understand the semantic relations between context and aspect words, thus aligning the upstream and downstream tasks and maximizing the power of the MLM. The core of the prompt mechanism is to use prompt text marked with <mask> to simulate the pre-training objective, transforming the sentiment analysis task into a cloze task; in this paper, MLM is used to implement the cloze task. Different from BERT, <CLS> is marked as <s>, and <SEP> is marked as </s>. Adding a prompt to the input text can leverage the ability of the pre-trained model, improving its perception of downstream tasks. Figure 3 shows the process of prompt text construction in this paper.

As shown in Figure 3, given a sentence $X$ and an aspect term $A$, we change the original sentence $X$ to $X + P$, where the prompt text $P$ is defined as $P = P_{left} + A + P_{right}$. More precisely, $P_{left}$ is defined as "What is the sentiment about" and $P_{right}$ is defined as "? It was <mask>". E.g., if the original input sentence $X$ = "poor restaurant environment but good food", then, for the aspect word "food", the final sentence constructed by the prompt template $P$ is "<s> poor restaurant environment but good food </s> What is the sentiment about food? It was <mask> </s>". This paper uses RoBERTa and a sentence pair approach to generate the embedding vector representation of an input text, where the constructed prompt text $O_{inputs}$ is combined with the aspect term $O_{aspects}$ to form sentence pairs. The details are as follows:

$O_{inputs} = \text{<s>} + X + \text{</s>} + P + \text{</s>}$ (1)

$O_{aspects} = \text{<s>} + A + \text{</s>}$ (2)

where $X$ is the original input sentence, <s> is the unique identifier of each input sentence, </s> is the identifier of the contextual sentence, $P$ is the prompt text incorporating the aspect term, and $A$ is the aspect term.
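As an illustration, the template construction can be sketched as follows (a minimal sketch; the function name and the exact special-token handling are assumptions, since the paper does not publish code):

```python
def build_prompt_pair(sentence: str, aspect: str):
    """Build the cloze-style sentence pair of Eq. (1), e.g.,
    ("poor restaurant environment but good food",
     "What is the sentiment about food? It was <mask>").

    The two template strings correspond to P_left and P_right; <mask> is
    RoBERTa's mask placeholder. A sentence-pair tokenizer then wraps the
    two segments with <s> ... </s> markers as in O_inputs.
    """
    prompt = f"What is the sentiment about {aspect}? It was <mask>"
    return sentence, prompt
```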
The input text is transformed into word vectors using operations such as tokenization and word embedding, and the <mask> tokens are predicted by using the MLM task in the pre-trained model. In ABSC tasks, pre-trained models such as BERT and RoBERTa are commonly used. RoBERTa is an improvement on the BERT model with three main optimizations. Firstly, RoBERTa adopts dynamic masking, which uses a new masking pattern for each new sequence input, making it more flexible than the fixed masking in BERT. Secondly, RoBERTa removes the next sentence prediction task from BERT, which has little impact on performance. Finally, RoBERTa expands the batch size and vocabulary, allowing the model to use a larger dataset during pre-training, resulting in richer semantic information at the end of pre-training. Like the BERT model, RoBERTa consists of multiple bi-directional transformer encoders, where each transformer encoder includes components such as self-attention, residual connections, and layer normalization.
Using the sentence pair $O_{inputs}$ and $O_{aspects}$ as the input, the context hidden state vector $W^{i}_{inputs} = \{w^{i}_{1}, w^{i}_{2}, \ldots, w^{i}_{n}\}$ and the aspect vector $W^{a}_{aspects} = \{w^{a}_{1}, w^{a}_{2}, \ldots, w^{a}_{c}\}$ are generated by RoBERTa for MLM and RoBERTa, respectively, where $W^{i}_{inputs} \in \mathbb{R}^{d_i \times n}$, $W^{a}_{aspects} \in \mathbb{R}^{d_a \times c}$, $d_i$ and $d_a$ are the word-embedding dimensions of RoBERTa for MLM and RoBERTa, respectively, and $n$ and $c$ are the lengths of the input sentences and aspect words, respectively. The formulas are shown as follows:

$W^{i}_{inputs} = \text{RoBERTa}_{MLM}(O_{inputs})$ (3)

$W^{a}_{aspects} = \text{RoBERTa}(O_{aspects})$ (4)
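A hedged sketch of how the two encoders could be invoked with the Hugging Face transformers library (the roberta-base checkpoint and the use of MLM prediction scores as $W^{i}_{inputs}$ are assumptions; the paper does not specify its implementation):

```python
import torch
from transformers import RobertaTokenizer, RobertaForMaskedLM, RobertaModel

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
mlm_encoder = RobertaForMaskedLM.from_pretrained("roberta-base")  # RoBERTa for MLM
aspect_encoder = RobertaModel.from_pretrained("roberta-base")     # plain RoBERTa

sentence = "poor restaurant environment but good food"
prompt = "What is the sentiment about food? It was <mask>"

# O_inputs: sentence pair (original sentence, prompt text), Eq. (1)
inputs = tokenizer(sentence, prompt, return_tensors="pt")
# O_aspects: the aspect term on its own, Eq. (2)
aspects = tokenizer("food", return_tensors="pt")

with torch.no_grad():
    # W_inputs: the experimental settings report d_i = 50265 (the MLM
    # vocabulary size), which suggests the MLM prediction scores are used.
    w_inputs = mlm_encoder(**inputs).logits                   # (1, n, 50265)
    # W_aspects: hidden states of the aspect term, d_a = 768
    w_aspects = aspect_encoder(**aspects).last_hidden_state   # (1, c, 768)
```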

Syntactic and Semantic Encoding Layer
GCN can be considered an extension of the traditional CNN that encodes the local information of unstructured data. GCN combines hidden state vectors with dependency trees to construct a text graph and utilizes convolutional operations on the graph to obtain the syntactic features of aspect words. Moreover, GCN models multiple layers using the information of each node's neighbors, so that the final hidden state of each node can receive information from its more distant neighbors. Given a text with $n$ words, where each word is a node in the text graph, an adjacency matrix $A_{ij} \in \mathbb{R}^{n \times n}$ can be obtained. For an $L$-layer GCN, $l \in [1, 2, \ldots, L]$, the output $g^{l}_{i}$ of node $i$ at layer $l$ can be calculated as shown in Equation (5):

$g^{l}_{i} = \sigma\left(\sum_{j=1}^{n} A_{ij} W^{l} g^{l-1}_{j} + b^{l}\right)$ (5)

where $A_{ij}$ denotes the syntactic structural adjacency matrix produced by the dependency tree parser, $W^{l}$ is the weight matrix of layer $l$, $b^{l}$ is the bias of layer $l$, and $\sigma$ is a non-linear activation function, such as ReLU. The context hidden state vector $W^{i}_{inputs}$ generated by RoBERTa for MLM and the syntactic structural matrix $A_{ij}$ are fed into GCN, and the final output of GCN at layer $L$ is $g^{L}$. The CNN layer in the PHNN model continues modeling the output of GCN, further extracting text features. Then, the output is fed into ReLU. Compared with the earlier sigmoid function, ReLU can speed up the convergence of model training and can implement gradient descent and backpropagation more effectively, avoiding the problems of gradient explosion and gradient vanishing. The process of extracting features in CNN is shown in Equation (6):

$c_{i} = f(W \cdot g^{L}_{i:i+h-1} + b)$ (6)

where $W \in \mathbb{R}^{h \times m}$ denotes the convolution kernel, $h \times m$ is the convolution kernel size, $b$ is the bias, and $f$ is the ReLU activation function.
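A minimal PyTorch sketch of one GCN layer in the sense of Eq. (5) (the degree normalization is a common choice assumed here; the paper does not state its exact variant):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    """One graph convolution over the dependency-tree adjacency matrix, Eq. (5)."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)  # holds W^l and b^l

    def forward(self, g: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # g:   (batch, n, in_dim)  node states g^{l-1} from the previous layer
        # adj: (batch, n, n)       adjacency matrix A_ij from the parser
        neighbor_sum = torch.bmm(adj, self.linear(g))       # sum over neighbors j
        degree = adj.sum(dim=2, keepdim=True).clamp(min=1)  # assumed normalization
        return F.relu(neighbor_sum / degree)                # sigma = ReLU
```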
The output of GCN is convolved to obtain the vectors $c_{i}$, which are sequentially spliced into the matrix $C$. After the CNN is connected to the maximum pooling layer, each convolutional kernel obtains the scalar $\hat{C} = \max\{C\}$. In this paper, we use more than one convolutional kernel for feature extraction. After the maximum pooling layer, the features are concatenated to obtain the feature vector $Z$:

$Z = \hat{C}_{1} : \hat{C}_{2} : \cdots : \hat{C}_{m}$ (7)

where $m$ is the number of convolutional kernels.
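The CNN and max-pooling steps of Eqs. (6) and (7) could be sketched as follows (the kernel shape here is illustrative, not the exact (6, 100) configuration reported in the experimental settings):

```python
import torch
import torch.nn as nn

class ConvMaxPool(nn.Module):
    """CNN over the GCN output followed by max pooling, Eqs. (6) and (7)."""

    def __init__(self, emb_dim: int, num_kernels: int = 6, height: int = 3):
        super().__init__()
        # each kernel spans `height` tokens and the full embedding width
        self.convs = nn.ModuleList(
            nn.Conv2d(1, 1, kernel_size=(height, emb_dim))
            for _ in range(num_kernels)
        )

    def forward(self, g: torch.Tensor) -> torch.Tensor:
        x = g.unsqueeze(1)                       # (batch, 1, n, emb_dim)
        feats = []
        for conv in self.convs:
            c = torch.relu(conv(x)).squeeze(3)   # Eq. (6): c_i = f(W * g + b)
            feats.append(c.max(dim=2).values)    # C_hat = max{C}
        return torch.cat(feats, dim=1)           # Eq. (7): concatenate to Z
```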
BiLSTM is a special RNN that captures long-term dependencies in a sentence. In the PHNN model, the hidden state vector generated by RoBERTa for MLM is fed into BiLSTM, allowing the model to encode the input in both the forward and backward directions. Each LSTM unit contains three gates: an input gate, an output gate, and a forget gate. These gate mechanisms allow the model to selectively remember or ignore information when processing input sequences, and thus better capture the semantic and contextual relationships of the sentences. Through the BiLSTM encoding process, the model can obtain a sentence representation that integrates forward and backward information, providing much richer semantic expressiveness for subsequent tasks. The specific LSTM unit computation process is shown in Equations (8)-(13):

$i_{t} = \sigma(W_{i}[h_{t-1}; x_{t}] + b_{i})$ (8)

$f_{t} = \sigma(W_{f}[h_{t-1}; x_{t}] + b_{f})$ (9)

$o_{t} = \sigma(W_{o}[h_{t-1}; x_{t}] + b_{o})$ (10)

$\tilde{c}_{t} = \tanh(W_{c}[h_{t-1}; x_{t}] + b_{c})$ (11)

$c_{t} = f_{t} * c_{t-1} + i_{t} * \tilde{c}_{t}$ (12)

$h_{t} = o_{t} * \tanh(c_{t})$ (13)

where $t$ denotes the time step, $x_{t}$ is the input at $t$, $x_{t} \in W^{i}_{inputs}$, $h_{t}$ is the hidden vector representation at time step $t$, $*$ represents element-wise multiplication, $\sigma$ denotes the sigmoid activation function, $W_{i}$ and $b_{i}$ are the parameters of the input gate, $W_{f}$ and $b_{f}$ are the parameters of the forget gate, $W_{o}$ and $b_{o}$ are the parameters of the output gate, $W_{c}$ and $b_{c}$ are the parameters of the candidate cell state, and $c_{t-1}$ and $c_{t}$ denote the state of the previous cell and the state of the current cell, respectively. The hidden state vector $W^{i}_{inputs}$ generated by RoBERTa for MLM is passed through BiLSTM to obtain the output $H$:

$H = \{h_{1}, h_{2}, \ldots, h_{n}\}$ (14)

where each $h_{t}$ concatenates the forward and backward hidden states.
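In PyTorch, the BiLSTM encoder reduces to a single module call (a sketch; the 300-dimensional hidden size follows the experimental settings, while the input dimension is an assumption):

```python
import torch
import torch.nn as nn

bilstm = nn.LSTM(input_size=768, hidden_size=300,
                 batch_first=True, bidirectional=True)

w_inputs = torch.randn(1, 20, 768)  # stand-in for the MLM hidden states W_inputs
H, _ = bilstm(w_inputs)             # H: (1, 20, 600); each h_t = [fwd_t ; bwd_t]
```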
After obtaining the outputs of the maximum pooling and BiLSTM, we use MHA to perform interactive learning between these outputs and the aspect words, capturing possibly missed representations of sentiment features. MHA performs multiple attention functions in parallel to calculate attention. The attention function maps a key sequence $k = \{k_{1}, k_{2}, \ldots, k_{n}\}$ and a query sequence $q = \{q_{1}, q_{2}, \ldots, q_{m}\}$ to the output sequence, as shown in Equation (15):

$\text{Attention}(k, q) = \text{softmax}\left(\frac{qk^{T}}{\sqrt{d_{k}}}\right)k$ (15)

where $d_{k}$ is the scale parameter. MHA integrates the single attention outputs and projects them to a specified hidden dimension $d_{hid}$. The MHA value $MHA(k, q)$ is calculated as shown in Equations (16) and (17):

$A_{h} = \text{Attention}_{h}(k, q)$ (16)

$MHA(k, q) = (A_{1} : A_{2} : \cdots : A_{r}) W^{mh}$ (17)

where $W^{mh} \in \mathbb{R}^{d_{hid} \times d_{hid}}$, $A_{h}$ is the output of the $h$-th attention head, $h \in [1, 2, \ldots, r]$, $r$ is the number of heads, and ":" denotes vector concatenation. We obtain the output vector $Z$ of the maximum pooling and the output $H$ of the BiLSTM through the previous process and learn the vectors $C_{ca}$ and $C_{la}$ by letting MHA interact with the aspect words' vector $W^{a}_{aspects}$, as shown in Equations (18) and (19):

$C_{ca} = MHA(Z, W^{a}_{aspects})$ (18)

$C_{la} = MHA(H, W^{a}_{aspects})$ (19)
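PyTorch's built-in multi-head attention can serve as a stand-in for Eqs. (15)-(19), with the value set to the key sequence as in the paper's formulation (dimensions follow the experimental settings; projecting all inputs to $d_{hid} = 300$ beforehand is assumed):

```python
import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=300, num_heads=8,
                            dropout=0.1, batch_first=True)

Z = torch.randn(1, 6, 300)          # max-pooling output, projected to d_hid
H = torch.randn(1, 20, 300)         # BiLSTM output, projected to d_hid
W_aspects = torch.randn(1, 3, 300)  # aspect vectors, projected to d_hid

C_ca, _ = mha(query=W_aspects, key=Z, value=Z)  # Eq. (18)
C_la, _ = mha(query=W_aspects, key=H, value=H)  # Eq. (19)
```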

Sentiment Classification Layer
The vectors $C_{ca}$ and $C_{la}$ obtained from MHA are combined into $H_{fin}$ and then averaged to obtain $H_{avg}$. The averaged vector is fed into the linear layer, immediately followed by the softmax function, to generate the sentiment polarity probability distribution $y$. The calculation process is shown in Equations (20)-(22):

$H_{fin} = C_{ca} : C_{la}$ (20)

$H_{avg} = \text{average}(H_{fin})$ (21)

$y = \text{softmax}(W_{a} H_{avg} + b_{a})$ (22)

where $W_{a}$ and $b_{a}$ are the learnable parameter matrix and bias, respectively.
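A compact sketch of the classification head of Eqs. (20)-(22) (the module name and the pooling axis are assumptions):

```python
import torch
import torch.nn as nn

class SentimentHead(nn.Module):
    """Concatenate the two MHA outputs, average, then linear + softmax."""

    def __init__(self, d_hid: int = 300, num_classes: int = 3):
        super().__init__()
        self.fc = nn.Linear(2 * d_hid, num_classes)   # W_a and b_a

    def forward(self, c_ca: torch.Tensor, c_la: torch.Tensor) -> torch.Tensor:
        h_fin = torch.cat([c_ca, c_la], dim=-1)       # Eq. (20)
        h_avg = h_fin.mean(dim=1)                     # Eq. (21): average
        return torch.softmax(self.fc(h_avg), dim=-1)  # Eq. (22): distribution y
```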

Training
Using a gradient descent algorithm, the model is trained with a cross-entropy loss and L2 regularization, as shown in Equation (23):

$L = -\sum_{i=1}^{D} \sum_{j=1}^{C} \hat{y}^{j}_{i} \log y^{j}_{i} + \lambda \|\theta\|_{2}$ (23)

where $D$ is the size of the training set, $C$ takes a value of 3 because the dataset includes positive, neutral, and negative labels, $y^{j}_{i}$ is the predicted sentiment category of the text, $\hat{y}^{j}_{i}$ is the true sentiment category of the text, $\lambda \|\theta\|_{2}$ is the regularization term, $\theta$ denotes all the trainable parameters, and $\lambda$ denotes the L2 regularization coefficient.
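In practice, Eq. (23) maps onto a standard PyTorch training step; a sketch under the assumption that the L2 term is realized as Adam weight decay (λ = 1 × 10−4 and the learning rate follow the experimental settings):

```python
import torch
import torch.nn as nn

model = nn.Linear(600, 3)   # stand-in for the full PHNN model
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5, weight_decay=1e-4)
# CrossEntropyLoss applies log-softmax internally, so it takes raw logits;
# the explicit softmax of Eq. (22) is folded into the loss during training.
criterion = nn.CrossEntropyLoss()

h_avg = torch.randn(8, 600)              # a batch of pooled representations
labels = torch.randint(0, 3, (8,))       # true polarities: 0/1/2
loss = criterion(model(h_avg), labels)   # cross-entropy term of Eq. (23)
loss.backward()
optimizer.step()
```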

Datasets
Three datasets are used in the experiments: the Laptop and Restaurant datasets [26] from SemEval 2014 Task 4 and the Twitter dataset [27]. The first two datasets can be downloaded from https://alt.qcri.org/semeval2014/task4/ (accessed on 15 August 2023). The last dataset can be downloaded from http://goo.gl/5Enpu7 (accessed on 15 August 2023). The Laptop dataset consists of over 3K instances from laptop reviews. The Restaurant dataset consists of over 3K instances from restaurant reviews. The Twitter dataset contains over 7K tweets about celebrities, products, and companies. Each instance of the above datasets consists of three lines: the sentence, the aspect words, and the polarity of the aspect words (1: positive, 0: neutral, −1: negative). Each dataset is originally divided into two parts: the train set and the test set. The details are shown in Table 1.

Experimental Setting
In the experiments, we use the RoBERTa-base version; the RoBERTa embedding dimension is 768, the RoBERTa for MLM embedding dimension is 50265, the learning rate is 2 × 10−5, and the regularization coefficient is 1 × 10−4. The number of GCN layers is 2. In CNN, the number of convolutional kernels, the convolution kernel size, and the step size are 6, (6, 100), and (4, 55), respectively. The maximum pooling window size is (2, 1). The dimension of the hidden state vectors output by BiLSTM and MHA is 300. The number of attention heads is 8 and the dropout is 0.1 in MHA. The Adam optimizer is used to update all parameters. The model is run on a GeForce RTX 2080 Ti GPU (NVIDIA, Santa Clara, CA, USA).

Baseline Models
To verify the validity of the PHNN model, we compared it with the following models:

• AOA [28]. It borrows the idea of attention over attention (AOA) to model aspects and sentences, learning the representations of aspect terms and contexts.

• ATAE-LSTM [29]. It combines aspect and contextual word embeddings as the input, using LSTM and attention to process the hidden layer to obtain results.

• TD-LSTM [30]. It uses two LSTM networks to model the text, extending the LSTM for ABSA tasks.

• ASGCN [4]. It utilizes GCN to model the context, using syntactic information and interdependencies between words for ABSA tasks.

• R-GAT [18]. It reconstructs the dependency tree to remove redundant information, extending the original GNN with a relational attention mechanism.

• R-GAT+BERT [18]. An R-GAT model that is based on pre-trained BERT.

• DualGCN [32]. It is a dual GCN model that utilizes orthogonal and differential regularizers to improve the modeling of semantic correlations.

• SSEGCN [33]. It is a syntactically and semantically enhanced GCN model for ABSA tasks that uses an aspect-aware attention mechanism with self-attention to obtain the attention score matrix of a sentence and enhances node representations by executing GCN on the attention score matrix.

Main Results
We use accuracy and the macro-averaged F1 score as measures of model performance. The experimental results are shown in Table 2, where the bold value in each column represents the optimal result. The results in Table 2 can be found in more detail in Appendix A.

Discussion and Conclusions
ABSC is a well-studied NLP task, and pre-trained models and neural networks are frequently used in ABSC tasks. To address the problems that downstream tasks cannot fully exploit the ability of pre-trained models and that the attention mechanism usually neglects the syntactic information of sentences, resulting in information loss and unsatisfactory results, this paper proposes the PHNN model, which utilizes a prompt and a hybrid neural network to solve the ABSC task. PHNN contains three main layers: the prompt text construction layer, the syntactic and semantic encoding layer, and the sentiment classification layer. In the prompt text construction layer, we use the prompt to reformulate the sentence and then input it into the RoBERTa pre-trained model. The prompt knowledge guides the pre-trained model and narrows the gap between the downstream task and the pre-trained model. In the syntactic and semantic encoding layer, we consider both the syntactic dependency information and the semantic information of contextual sentences. More precisely, we use GCN combined with CNN to extract syntactic features and utilize BiLSTM to obtain semantic features. Then, we utilize MHA to capture possibly missed representations of sentiment features. In the sentiment classification layer, we obtain the sentiment polarity of the sentence by using the softmax function. Our experiments demonstrate the efficiency of PHNN for the ABSC task.
Our future plan is to investigate other deep learning techniques to further enhance the performance of the proposed model. Additionally, we intend to evaluate our proposed model in other ABSA tasks to verify its effectiveness in addressing sentiment-related issues.

Figure 1. A sentence with its syntactic dependency tree.


Figure 2. The overall architecture of the PHNN model.

Figure 3. The prompt text construction.


Table 2. Comparison of accuracy and macro-F1 on three datasets.