Domain-Aware Neural Network with a Novel Attention-Pooling Technology for Binary Sentiment Classification

Yue, Chunyi; Li, Ang; Chen, Zhenjia; Luan, Gan; Guo, Siyao

doi:10.3390/app14177971


by Chunyi Yue ¹, Ang Li ²,*, Zhenjia Chen ¹, Gan Luan ¹ and Siyao Guo ¹

¹ School of Information and Communication Engineering, Hainan University, Haikou 570228, China

² Peng Cheng Laboratory, Shenzhen 518066, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(17), 7971; https://doi.org/10.3390/app14177971

Submission received: 10 July 2024 / Revised: 2 August 2024 / Accepted: 3 September 2024 / Published: 6 September 2024


Abstract

Domain information plays a crucial role in sentiment analysis. Neural networks that treat domain information as attention can further extract domain-related sentiment features from a shared feature pool, significantly enhancing the accuracy of sentiment analysis. However, when the sentiment polarity within the input text is inconsistent, these methods are unable to further model the relative importance of sentiment information. To address this issue, we propose a novel attention neural network that fully utilizes domain information while also accounting for the relative importance of sentiment information. In our approach, firstly, dual long short-term memory (LSTM) is used to extract features from the input text for domain and sentiment classification, respectively. Following this, a novel attention mechanism is introduced to fuse features to generate the attention distribution. Subsequently, the input text vector obtained based on the weighted summation is fed into the classification layer for sentiment classification. The empirical results from our experiments demonstrate that our method can achieve superior classification accuracies on Amazon multi-domain sentiment analysis datasets.

Keywords: neural network; attention mechanism; pooling; multi-domain; sentiment analysis

1. Introduction

Sentiment classification is a classic research topic in natural language processing (NLP), with promising application and research prospects. For instance, sentiment polarity analysis of product and service reviews can help businesses identify and resolve problems, enhancing user experience. It can also serve as a tool for public sentiment monitoring, analyzing people’s comments on specific events to understand their emotional reactions and respond accordingly. Additionally, sentiment analysis can be used for mental health monitoring, helping individuals track their emotional state and detect mental disorders. In the field of chatbots, this technology can assist robots to promptly capture the emotions of interlocutors, facilitating deeper human–machine conversations. Furthermore, sentiment analysis can be applied in cross-cultural communication, reducing misunderstandings caused by cultural differences and enhancing communication and comprehension among people in a multilingual environment.

An effective sentiment classification system must predict the sentiment polarity of a text (negative, neutral, or positive) quickly and accurately. Some previous research has treated sentiment classification as a general text classification task and applied classic classification methods. Statistical methods [1,2], dictionary- and rule-based methods [3,4], and shallow machine learning models [5,6] all provide solutions for this task. In recent years, deep learning has attracted widespread attention, and three typical frameworks, namely convolutional neural networks (CNNs) [7], recurrent neural networks (RNNs) [8], and transformers [9], have also shown their abilities in this task. However, sentiment classification differs from other text classification tasks in that it is domain-specific: the same expression can convey different sentiment polarities or effects across domains. For instance, the word “simple” may mean “not rich” when describing a hotel breakfast but “handy” when describing the use of an electrical appliance. Even a sentence with a fixed meaning can carry different sentiments in different domains. The sentence “This skirt is very beautiful, but the camera on my phone is subpar” is positive in the clothing domain but negative in the phone domain. One simple way to resolve this issue is to train a separate classifier for each domain. Nonetheless, such classifiers frequently require a large number of labeled samples to ensure satisfactory performance, and it is impractical to create a sufficiently large labeled sample set for every domain.

Thus, some studies have started to explore how to leverage data from multiple domains to improve the classification accuracies of domain-specific sentiment classifiers. Multi-task learning, which can achieve mutual benefits from different tasks by learning them together, offers fresh prospects to this matter. By simultaneously training multiple domain-specific classifiers, a number of models extract both shared and domain-specific features for sentiment classification. These methods can be roughly divided into two categories. One category uses independent components to learn shared and domain-specific features [10], like a neural network containing a shared layer and other task-specific layers. The other category uses a general model structure [11], such as a neural network used to generate a feature pool and then attention or other techniques applied to extract the features required for domain-specific sentiment classification from the pool.

Although the aforementioned methods are efficient, we observe that they only concentrate on domain-related features and cannot directly model the relative importance of the features. In fact, this is essential for sentiment classification, particularly when a sentence contains descriptions from multiple aspects and the corresponding polarities are inconsistent. Consider the following three reviews:

“The food at this restaurant is delicious and reasonably priced.”

“This restaurant has poor service and an unpleasant atmosphere.”

“The food at this restaurant is delicious and reasonably priced, but the service and atmosphere are poor. Overall, you can give it a try.”

The first review is obviously positive, because it contains the words “delicious” and “reasonably priced”. The second review is negative, because neither the service nor the atmosphere is satisfactory. For the third review, if we only read the first sentence, it may be difficult to determine its polarity, because the polarities of the taste, price, service, and atmosphere are inconsistent. However, when we see the final sentence, “Overall, you can give it a try”, we know that this customer recommends the restaurant regardless of what was said earlier, which indicates that some sentiment information is more important than others. Inspired by this observation, we design a novel attention neural network that can automatically select the crucial information, such as the final sentence of the third review. Reviews with inconsistent polarity are common in natural language corpora and, in contrast to reviews with consistent polarity, are always a challenge for sentiment classification.

To further model the relative importance of domain-related sentiment features, we constructed an attention-pooling neural network for multi-domain sentiment classification (AP-MDSC). It consists of two core components: an attention weight matrix composed of feature-wise gates, and a min-pooling layer over the attention weight matrix. We evaluate the effectiveness of the proposed model and of other traditional sentiment classifiers on the Amazon multi-domain sentiment analysis dataset.

The following are our main contributions:

(1) We propose an attention mechanism for multi-domain sentiment classification that takes both the relative importance of features and the influence of domain information into account, expanding the set of existing attention mechanisms.
(2) We integrate attention with min-pooling to generate sharp probabilistic attention weights over the words in a sentence and thereby select salient features. To the best of our knowledge, the constructed model is distinct from other existing attention neural networks.
(3) Our experimental results show that the model obtains better performance on the Amazon sentiment analysis dataset at different scales.

The rest of this article is organized as follows. In Section 2, we present some important works on sentiment classification, cutting-edge attention and pooling techniques, and multi-domain scenarios. In Section 3, we provide a detailed introduction to the proposed method, AP-MDSC, covering the basic framework, the attention-pooling technology, and the model training approach. The evaluation is conducted in Section 4, and the overall work is summarized in Section 5.

2. Related Work

2.1. Sentiment Classification

From dictionary-based methods to deep learning, various approaches to sentiment classification have been proposed. Initially, sentiment lexicons [12,13] were broadly used for this task due to their high interpretability. However, it is impractical to construct an ex ante lexicon for each domain, because the number of domains in reality is unbounded. Therefore, domain adaptation based on the manipulation of sentiment lexicons was proposed. For instance, Blitzer et al. [14] proposed a novel pivot selection method for linking the source and target domains. Melville et al. [15] discovered that the upweighting/downweighting of terms can be regarded as a domain adaptation process. Xing et al. [16] extended the previous work by introducing a cognitive-inspired domain adaptation of sentiment lexicons.

In recent years, neural network algorithms have attracted great attention. Sentiment classification models based on three main architectures, CNNs, RNNs, and transformers, have emerged successively. For instance, Kim et al. [17] proposed a CNN for sentence classification. Zhang et al. [18] introduced a character-level convolutional network (ConvNet) for text classification. Tang et al. [19] demonstrated a gated RNN for document-level sentiment classification. Lai et al. [20] used a recurrent layer and a convolutional layer together to construct a recurrent CNN for text classification. Additionally, Liu et al. [21] combined a transformer and a graph convolutional network (GCN) to accomplish text classification.

2.2. Attention and Pooling Mechanisms in Neural Networks

Many studies have explored the potential of neural networks from the perspective of attention mechanisms. Attention was first introduced into a seq2seq model for machine translation, where it represents a soft alignment between a source input and its respective target output. Owing to its effectiveness in modeling dependencies between elements, it has become an almost essential component of neural networks. For instance, Rocktäschel et al. [22] used attention to align words and phrases in a premise and a hypothesis to deduce their final representation for textual entailment. Kumar et al. [23] developed a dynamic memory network with recursive attention for textual question answering. Yang et al. [24] built an attention neural network for inferring aspect/lexicon-aware document representations via a multi-view attention mechanism for cross-domain aspect-level sentiment classification. Transformers introduce self-attention mechanisms, which allow them to weigh the importance of different words in a sentence relative to each other, regardless of their distance in the sequence. However, this model also suffers from the drawback of high storage and computational requirements for multi-domain sentiment classification. The Domain Attention Model (DAM) [25] is an attention-based neural network model built upon the framework. It efficiently accomplishes multi-domain sentiment classification tasks with a relatively simple structure.

Pooling is another critical operation in neural networks, particularly in CNNs, where it helps to reduce the spatial dimensions of the feature maps, leading to a reduction in the number of parameters and computational cost. There are several commonly used pooling techniques, including max pooling, which selects the maximum value in a region; average pooling, which calculates the mean value of the region; and more sophisticated methods like stochastic pooling and spatial pyramid pooling. In CNNs, pooling operations are frequently used to provide downsampling. Thus, the mainstream models in computer vision (CV), like ImageNet [26], have predominantly utilized it. In NLP, mean pooling is a prevalent technique, such as in the domain feature extractor of DAM, where it is utilized to aggregate the outputs of the LSTM [27] as the domain representation of the input text. A similar use case can be found in the transformer architecture. Min pooling, which chooses the minimum value within a window, has not been commonly seen in previous works.

2.3. Multi-Domain Scenarios

We have showcased many text classification models earlier, but they were designed for a single domain, meaning they do not take the domain information of the text into consideration when performing the classification task. Thus, we need some methods to extract and utilize the domain information efficiently. Multi-domain sentiment classification approaches that can leverage labeled samples in various domains to effectively improve the classification performance in each domain are a good choice, including the multi-domain sentiment classification method with feature-level and classifier-level fusion mechanisms (MDSC-Com) [28], collaboratively training sentiment classifiers with a global classifier and a domain-specific one (CMSC) [29], regularized multi-task learning (RMTL) [30], multi-task learning with graph regularization (MTL-Graph) [31], the adversarial multi-task learning method for text classification (ASP-MTL) [32], multi-task learning with deep neural networks (MTL-DNNs) [33], deep multi-task learning with shared memory for text classification (MTL-SM) [34], and a neural word embeddings approach for multi-domain sentiment analysis (NeuroSent) [35]. Another method, DAM, was developed to provide a more flexible feature extraction method. In this study, the domain representation of an input text is treated as attention to select the features needed for sentiment classification in each domain. Overall, by introducing domain information, all of these approaches can obtain shared and domain-specific features.

More recent efforts have been devoted to incorporating general techniques to further enhance model performance [36,37,38], such as active learning, uncertainty sampling, and domain adversarial contrastive learning.

3. Methods

To directly model the ranking of emotional information, we propose an attention-pooling neural network for multi-domain sentiment classification called AP-MDSC. We introduce it in two parts: a general neural network framework for the sentiment classification task, and a novel attention mechanism with a pooling technique that improves the model performance. It is worth noting that the pooling operation in our model is performed not on the feature maps but on an attention weight matrix, and that min pooling is employed instead of conventional max pooling or mean pooling. We now give an overview of the model architecture and the proposed attention mechanism.

3.1. Overall Framework

We adopt the general framework illustrated in Figure 1 to accomplish the multi-domain sentiment classification task, which is composed of four components: a word-embedding layer, domain feature representation and sentiment feature layers, an attention layer, and domain and sentiment classifiers.

Word-embedding layer: converting words into dense vector representations to reflect semantic and syntactic information between words.

Feature representation layer: generating domain features and sentiment features from the obtained word embeddings for domain and sentiment classification.

Attention layer: deciding how to extract the features required for the target task, which is a critical mechanism for the attention neural network and directly affects the model performance.

Classifiers: categorizing input data into predefined domain and sentiment classes based on learned features.

Specifically, we assume that the one-hot vector representations of an input text are $X = \{x_{1}, x_{2}, \dots, x_{n}\}$. Initially, X is passed through the word-embedding layer to produce a set of embedding vectors, $E = \{e_{1}, e_{2}, \dots, e_{n}\}$, with $E = A X$ and $A \in \mathbb{R}^{D \times V}$. A is the word-embedding matrix, where D and V represent the dimension of the word embedding and the size of the dictionary, respectively. Then, the word-embedding vectors are conveyed to a domain module. In the domain module, inputs are transformed into context-aware vector representations via a bidirectional LSTM (BiLSTM) [39] layer, which can be expressed by the following formulae:

$$\overleftarrow{H_{d}} = \overleftarrow{LSTM_{d}}(E), \tag{1}$$

$$\overrightarrow{H_{d}} = \overrightarrow{LSTM_{d}}(E), \tag{2}$$

$$H_{d} = \overleftarrow{H_{d}} + \overrightarrow{H_{d}}, \tag{3}$$

where $H_{d} = \{h_{d1}, h_{d2}, \dots, h_{dn}\}$ are the hidden states, consisting of the forward states $\overrightarrow{H_{d}}$ and the backward states $\overleftarrow{H_{d}}$ from the BiLSTM. The internal structure of a cell in LSTM is described by the following formulae:

$$f_{i} = \sigma(v_{f} \cdot [h_{i-1}, e_{i}] + b_{f}), \tag{4}$$

$$i_{i} = \sigma(v_{i} \cdot [h_{i-1}, e_{i}] + b_{i}), \tag{5}$$

$$\tilde{c}_{i} = \tanh(v_{c} \cdot [h_{i-1}, e_{i}] + b_{c}), \tag{6}$$

$$C_{i} = f_{i} \cdot C_{i-1} + i_{i} \cdot \tilde{c}_{i}, \tag{7}$$

$$o_{i} = \sigma(v_{o} \cdot [h_{i-1}, e_{i}] + b_{o}), \tag{8}$$

$$h_{i} = o_{i} \cdot \tanh(C_{i}). \tag{9}$$

In the formulae above, $f_{i}$, $i_{i}$, and $o_{i}$ denote the forget gate, the input gate, and the output gate, which regulate the influence of historical inputs, the influence of the current input, and the information output at time i, respectively, according to the previous hidden state $h_{i-1}$ and the current word embedding $e_{i}$. $\tilde{c}_{i}$ and $C_{i}$ are the candidate and updated unit (cell) states, respectively. $\sigma$ denotes the sigmoid function. v and b with different subscripts represent the projection matrices and the biases, respectively. At this point, the domain representation of the input text, $h_{d}$, is the last hidden state of the input text, and it is converted to a probability distribution over the domain labels as follows:

$$\hat{y}_{d} = f_{d}(h_{d}), \tag{10}$$

where $f_{d}$ is the mapping of $h_{d}$ into a predicted domain label $\hat{y}_{d}$.
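As an illustration of Equations (4)–(9), the following minimal NumPy sketch runs a single-direction LSTM over a toy sequence and takes the last hidden state as the domain representation. The sizes, the random parameters, and the helper names `lstm_step` and `params` are assumptions made for this example only; the actual model uses a trained BiLSTM with separate parameters for each direction.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(h_prev, c_prev, e_i, params):
    """One LSTM step following Eqs. (4)-(9).

    params maps each gate name ("f", "i", "c", "o") to a (v, b) pair, where
    v has shape (hidden, hidden + embed) and b has shape (hidden,).
    """
    z = np.concatenate([h_prev, e_i])                        # [h_{i-1}, e_i]
    f = sigmoid(params["f"][0] @ z + params["f"][1])         # forget gate, Eq. (4)
    i = sigmoid(params["i"][0] @ z + params["i"][1])         # input gate, Eq. (5)
    c_tilde = np.tanh(params["c"][0] @ z + params["c"][1])   # candidate state, Eq. (6)
    c = f * c_prev + i * c_tilde                             # cell state, Eq. (7)
    o = sigmoid(params["o"][0] @ z + params["o"][1])         # output gate, Eq. (8)
    h = o * np.tanh(c)                                       # hidden state, Eq. (9)
    return h, c

# Toy sizes (assumptions): 4-dimensional embeddings, 3 hidden units, 5 tokens.
rng = np.random.default_rng(0)
embed_dim, hidden_dim, n_tokens = 4, 3, 5
params = {
    gate: (rng.normal(scale=0.1, size=(hidden_dim, hidden_dim + embed_dim)),
           np.zeros(hidden_dim))
    for gate in "fico"
}
E = rng.normal(size=(n_tokens, embed_dim))   # random stand-in for the embeddings E

h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
for e_i in E:                                # forward pass; a backward pass would use E[::-1]
    h, c = lstm_step(h, c, e_i, params)
h_d = h                                      # last hidden state as the domain representation
```

Because the output gate lies in (0, 1) and tanh lies in (−1, 1), every component of `h_d` is bounded in magnitude by 1.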

Then, the word embeddings and the domain representation are fed into another BiLSTM and the attention layer in the sentiment module. The word embeddings are converted into context-aware hidden states $H = \{h_{1}, h_{2}, \dots, h_{n}\}$ via the BiLSTM, and then the hidden states and the domain representation interact to produce attention weights $\{\alpha_{1}, \alpha_{2}, \dots, \alpha_{n}\}$ at the attention layer as follows:

$$\alpha_{i} = f(h_{d}, h_{i}), \tag{11}$$

where f is a fusion function. It is defined as a linear projection in DAM. Our work proposes a new method of information interaction, which is introduced in detail in the next subsection. Once $\alpha_{i}$ has been determined, the text vector v is calculated as the weighted sum over the hidden states $h_{i}$:

$$v = \sum_{i=1}^{n} \alpha_{i} h_{i}. \tag{12}$$

Finally, another mapping function $f_{s}$ is used to predict a sentiment label $\hat{y}_{s}$ according to the text representation, i.e.,

$$\hat{y}_{s} = f_{s}(v). \tag{13}$$
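Equations (12) and (13) can be illustrated with a few lines of NumPy. In this sketch the hidden states and attention scores are random placeholders, and $f_{s}$ is assumed to be a single dense layer with a sigmoid output; the text does not specify the form of $f_{s}$, so this choice is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_tokens, hidden_dim = 6, 4
H = rng.normal(size=(n_tokens, hidden_dim))   # hidden states h_1..h_n (random stand-ins)

scores = rng.normal(size=n_tokens)            # placeholder scores; Eq. (11) would supply these
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()                          # attention weights: non-negative, summing to 1

v = alpha @ H                                 # Eq. (12): v = sum_i alpha_i * h_i

# Assumed form of f_s: one dense layer with a sigmoid output for binary sentiment.
w_s, b_s = rng.normal(size=hidden_dim), 0.0
y_hat_s = 1.0 / (1.0 + np.exp(-(w_s @ v + b_s)))   # Eq. (13): predicted P(positive)
```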

3.2. Proposed Attention Mechanism

According to our analysis, when there is a conflict in the polarity of emotional information within a sentence, the relative importance of these conflicting emotions is crucial for determining the overall polarity of the sentence. To fit this relationship, we propose an attention mechanism distinguished from other works. This mechanism uses an attention weight matrix to store the relative importance scores between input elements and then employs pooling techniques to select the attention distribution over each element. Figure 2 shows the attention generation process in this mechanism.

First, from the domain representation $h_{d}$ of the input text generated in the domain module and the hidden states H generated in the sentiment module, an attention matrix is created that reflects the relative importance of the input elements. It is obtained by comparing the element at each time step with those at the other time steps:

$$s_{i,j} = f(g_{1}[h_{i}, h_{d}] + g_{2}[h_{j}, h_{d}]), \tag{14}$$

where f, $g_{1}$, and $g_{2}$ are one-layer feed-forward neural networks used to merge the domain information with the hidden states at all time steps. $h_{i}$ and $h_{j}$ are each concatenated with $h_{d}$ to form domain-aware hidden states at times i and j, respectively; the hidden state at time i is then used as the key point and compared with the hidden state at time j to generate a gate $s_{i,j}$. Ideally, this value should be 0 or 1 to indicate whether i is selected after the comparison. In our method, the activation function in f is set to the sigmoid function for this purpose. After each hidden state has been compared with the other hidden states, a matrix $S = \{s_{i,j};\ i, j = 1, \dots, n\}$ is generated.
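The pairwise comparison in Equation (14) can be computed for all (i, j) pairs at once with broadcasting. The sketch below is only illustrative: the toy sizes, the projection width `proj_dim`, the random weights, and the tanh activation inside $g_{1}$ and $g_{2}$ are assumptions, since the text specifies only that f uses a sigmoid.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)
n_tokens, hidden_dim, proj_dim = 5, 4, 8       # toy sizes (assumptions)
H = rng.normal(size=(n_tokens, hidden_dim))    # sentiment hidden states h_1..h_n
h_d = rng.normal(size=hidden_dim)              # domain representation

# Domain-aware hidden states [h_i, h_d], shape (n, 2 * hidden_dim).
H_dom = np.concatenate([H, np.tile(h_d, (n_tokens, 1))], axis=1)

# g1, g2: assumed dense layers with tanh; f: dense layer with sigmoid.
W1, b1 = rng.normal(scale=0.5, size=(2 * hidden_dim, proj_dim)), np.zeros(proj_dim)
W2, b2 = rng.normal(scale=0.5, size=(2 * hidden_dim, proj_dim)), np.zeros(proj_dim)
w_f, b_f = rng.normal(scale=0.5, size=proj_dim), 0.0

G1 = np.tanh(H_dom @ W1 + b1)                  # g1([h_i, h_d]) for every i, shape (n, p)
G2 = np.tanh(H_dom @ W2 + b2)                  # g2([h_j, h_d]) for every j, shape (n, p)

# Broadcast to all pairs: pair[i, j] = G1[i] + G2[j], shape (n, n, p).
pair = G1[:, None, :] + G2[None, :, :]
S = sigmoid(pair @ w_f + b_f)                  # Eq. (14): s_ij in (0, 1), shape (n, n)
```

Nothing in this construction forces `S[i, j] + S[j, i]` to equal 1.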

However, we note that in our method, the values in S are subject to a significant restriction: the sum of $s_{i,j}$ and $s_{j,i}$ must be equal to 1, because if the i-th element is selected between the i-th element and the j-th element, then the j-th element cannot be selected, and vice versa.

Accordingly, we define two triangular mask matrices, $M^{upper}$ and $M^{lower}$, whose entries are as follows:

$$M^{upper}_{i,j} = \begin{cases} 1, & i \leq j, \\ 0, & \text{otherwise}, \end{cases} \tag{15}$$

$$M^{lower}_{i,j} = \begin{cases} 0, & i \leq j, \\ 1, & \text{otherwise}. \end{cases} \tag{16}$$

Next, the attention matrix is updated as follows:

$$U = M^{upper} \cdot S + M^{lower} \cdot (J - S^{T}), \tag{17}$$

where $U = \{u_{i,j};\ i, j = 1, \dots, n\}$ is the updated attention matrix, S and $S^{T}$ are the initial attention matrix and its transpose, J is the all-ones matrix, and · denotes element-wise multiplication. This transformation ensures that in matrix U, the sum of $u_{i,j}$ and $u_{j,i}$ is equal to 1, except for the elements on the diagonal.
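The effect of Equations (15)–(17) can be checked numerically. In the sketch below, S is a random stand-in for the gate matrix of Equation (14); the two assertions confirm that off-diagonal pairs of U are complementary and that the diagonal of S is carried over unchanged.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
S = rng.uniform(size=(n, n))                  # random stand-in for the gate matrix S

M_upper = np.triu(np.ones((n, n)))            # Eq. (15): 1 where i <= j, 0 elsewhere
M_lower = 1.0 - M_upper                       # Eq. (16): 1 where i > j, 0 elsewhere
J = np.ones((n, n))

U = M_upper * S + M_lower * (J - S.T)         # Eq. (17), element-wise products

# Off-diagonal pairs sum to 1; the diagonal keeps s_ii unchanged.
off_diag = ~np.eye(n, dtype=bool)
assert np.allclose((U + U.T)[off_diag], 1.0)
assert np.allclose(np.diag(U), np.diag(S))
```

In other words, only the gates with i ≤ j are used directly; each entry below the diagonal is defined as the complement of its mirrored entry.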

After that, U passes through a min-pooling layer and a Softmax layer to generate the attention probability distribution $\{\alpha_{i};\ i = 1, \dots, n\}$. This step is crucial for our method, because the min-pooling layer implements the rule “if one element cannot be more significant than others, it will not be the focus of attention”. This rule is difficult to implement with other pooling layers, such as a max-pooling or mean-pooling layer. The process is expressed as follows:

$$a_{i} = \operatorname{min-pooling}\{u_{i,1}, u_{i,2}, \dots, u_{i,n}\}, \tag{18}$$

$$\alpha_{i} = \frac{e^{a_{i}}}{\sum_{k=1}^{n} e^{a_{k}}}, \tag{19}$$

where min-pooling is performed over each row of the matrix U (that is, over the index j) to determine the importance of the i-th element, represented by the attention score $a_{i}$, which is then converted to the attention weight $\alpha_{i}$ through a Softmax transformation.
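A small worked example of Equations (18) and (19) follows. The 3 × 3 matrix is hand-made so that off-diagonal pairs sum to 1; the diagonal value 0.5 and the helper name `attention_from_U` are assumptions for illustration.

```python
import numpy as np

def attention_from_U(U):
    """Row-wise min-pooling followed by Softmax, as in Eqs. (18)-(19)."""
    a = U.min(axis=1)                 # Eq. (18): a_i = min_j u_ij
    e = np.exp(a - a.max())           # shift by the maximum for numerical stability
    return a, e / e.sum()             # Eq. (19)

# Hand-made U for three tokens: u_ij + u_ji = 1 off the diagonal, 0.5 on it (assumed).
U = np.array([
    [0.5, 0.8, 0.1],
    [0.2, 0.5, 0.3],
    [0.9, 0.7, 0.5],
])
a, alpha = attention_from_U(U)
print(a)                              # [0.1 0.2 0.5]
```

Token 2 is preferred in every pairwise comparison, so it keeps the largest pooled score and therefore the largest attention weight. Token 0 beats token 1 but loses clearly to token 2, and that single loss determines its score.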

Afterward, we can use Formulas (12) and (13) to calculate the input text features required for sentiment classification and complete this task accordingly.

3.3. Training

In our model, the objective function is a linear combination of the loss functions of sentiment classification and domain classification. We suppose there are K domains, each containing N samples. The objective function is expressed as follows:

$$C(\phi) = \frac{1}{N} \sum_{k=1}^{K} \sum_{n=1}^{N} L(y_{s}^{k,n}, \hat{y}_{s}^{k,n}) + \frac{\lambda}{N} \sum_{k=1}^{K} \sum_{n=1}^{N} L(y_{d}^{k,n}, \hat{y}_{d}^{k,n}), \tag{20}$$

where $\phi$ represents all inputs and parameters; $y_{s}^{k,n}$ and $y_{d}^{k,n}$ denote the true sentiment label and domain label, respectively, of the n-th sample in the k-th domain; and $\hat{y}_{s}^{k,n}$ and $\hat{y}_{d}^{k,n}$ correspond to the predicted sentiment label and domain label, respectively. L denotes the cross-entropy loss function. Because the primary task is sentiment classification and the auxiliary task is domain classification, $\lambda$ is introduced to control the participation rate of the domain classification task.
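Equation (20) can be sketched directly in NumPy. The array layout (K, N, classes), the helper names, and the value λ = 0.5 in the example are assumptions for illustration; the text does not report the value of λ.

```python
import numpy as np

def cross_entropy(y_true, y_prob, eps=1e-12):
    """Per-sample cross-entropy for one-hot labels and predicted probabilities."""
    return -np.sum(y_true * np.log(y_prob + eps), axis=-1)

def joint_loss(ys, ys_hat, yd, yd_hat, lam):
    """Eq. (20). All arrays have shape (K, N, n_classes)."""
    N = ys.shape[1]
    sentiment = cross_entropy(ys, ys_hat).sum() / N   # first term of Eq. (20)
    domain = cross_entropy(yd, yd_hat).sum() / N      # second term, before weighting
    return sentiment + lam * domain

# Toy case: K = 2 domains, N = 2 samples each, uniform predictions everywhere.
ys = np.eye(2)[np.array([[0, 1], [1, 0]])]    # one-hot sentiment labels
yd = np.eye(2)[np.array([[0, 0], [1, 1]])]    # one-hot domain labels
uniform = np.full((2, 2, 2), 0.5)
loss = joint_loss(ys, uniform, yd, uniform, lam=0.5)
```

With uniform predictions every sample contributes ln 2 to each term, so the toy loss equals (4 ln 2)/2 + 0.5 · (4 ln 2)/2 = 3 ln 2.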

4. Experiments

In this section, we evaluate the performance of the proposed model on the Amazon multi-domain sentiment analysis dataset and compare it to other multi-domain sentiment analysis methods.

4.1. Experimental Settings

Datasets: The Amazon dataset provided by Blitzer et al. is a public English sentiment analysis dataset containing over 20 domains. We focus on four domains, namely books, DVDs, kitchen, and electronics, and in each domain we use 1000 positive samples (labeled as 1) from positive.reviews and 1000 negative samples (labeled as 0) from negative.reviews for our experiments. The selected labeled samples in each domain are randomly divided into training, validation, and testing data in a 7:2:1 ratio. Table 1 displays the statistics of the datasets. Because some sentences in the Amazon dataset are too long, we set a length threshold and truncate the part of any sentence that exceeds it.
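The per-domain split and truncation described above can be sketched with the standard library alone. The function names, the random seed, and the placeholder reviews are assumptions; the length threshold is not reported in the text, so the value used in the usage note is purely illustrative.

```python
import random

def split_domain(samples, seed=0, ratios=(7, 2, 1)):
    """Shuffle one domain's labeled samples and split them 7:2:1."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def truncate(tokens, max_len):
    """Keep only the first max_len tokens of a review."""
    return tokens[:max_len]

# 1000 positive (label 1) and 1000 negative (label 0) placeholder reviews for one domain.
reviews = ([(f"review {i}", 1) for i in range(1000)]
           + [(f"review {i}", 0) for i in range(1000, 2000)])
train, val, test_set = split_domain(reviews)
print(len(train), len(val), len(test_set))   # 1400 400 200
```

For example, `truncate("a b c d e".split(), max_len=3)` keeps `["a", "b", "c"]`. The split here is a plain random split; whether the original split was stratified by label is not stated.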

Implementation details: The experiments were conducted on a computer equipped with an Intel Core i7 processor (Santa Clara, CA, USA), 64 GB of RAM, and an NVIDIA GeForce RTX 1080 GPU (Santa Clara, CA, USA) running the Ubuntu operating system (v.24.04). The Python language was employed for programming, the data were processed with the NumPy and NLTK libraries, and our models were built using the Keras platform (v.3.4.1) for all experiments.

For training, an adaptive learning rate method (ADADELTA) [40] is selected to update the parameters in AP-MDSC, and the initial learning rate is set to 0.1. If the validation loss does not improve after 3 epochs of training, the learning rate is reduced. In addition, if the loss cannot be reduced further within 5 epochs, early stopping is triggered. To mitigate overfitting, the word-embedding and output layers each use a dropout layer with a rate of 0.2. The dimension of the hidden states is 100, and the batch size is 64. For the Amazon dataset, 300D GloVe [41] pretrained vectors are used as the initial word-embedding vectors for words in the GloVe dictionary; all other words receive randomly initialized embedding vectors. Each experiment is carried out 10 times, and the average accuracy is recorded.
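The interaction between learning-rate reduction and early stopping can be illustrated by replaying a sequence of validation losses. This is a sketch under assumptions: the reduction factor of 0.5, the exact patience semantics, and the function name `simulate_schedule` are hypothetical, as the text gives only the initial rate (0.1) and the two patience values (3 and 5 epochs).

```python
def simulate_schedule(val_losses, lr=0.1, lr_patience=3, stop_patience=5, factor=0.5):
    """Replay validation losses; return (learning rate used per epoch, stop epoch or None)."""
    best = float("inf")
    since_best = 0      # epochs since the validation loss last improved
    since_cut = 0       # non-improving epochs since the last LR cut or improvement
    lrs = []
    for epoch, loss in enumerate(val_losses, start=1):
        lrs.append(lr)
        if loss < best:
            best, since_best, since_cut = loss, 0, 0
            continue
        since_best += 1
        since_cut += 1
        if since_best >= stop_patience:      # early stopping
            return lrs, epoch
        if since_cut >= lr_patience:         # reduce the learning rate
            lr *= factor
            since_cut = 0
    return lrs, None

# Loss improves for three epochs and then plateaus.
lrs, stop_epoch = simulate_schedule([0.9, 0.8, 0.7] + [0.7] * 6)
print(lrs, stop_epoch)
```

In this replay the learning rate is halved after epoch 6 (three epochs without improvement) and training stops at epoch 8 (five epochs without improvement).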

Baselines: We compare the proposed AP-MDSC with classic supervised sentiment classification methods trained on a single domain and on all domains, including least squares (LS-single, LS-all), support vector machine (SVM-single, SVM-all), and LSTM methods (LSTM-single, LSTM-all). Considering that our model is based on multi-domain data, some typical multi-task learning methods are also compared, including MTL-DNN, MTL-CNN, MTL-SM, MDSC-Com, RMTL, MTL-Graph, CMSC, ASP-MTL, NeuroSent, and DAM. To further investigate the model’s performance on datasets composed of varying domains, we also define the DAM-x and AP-MDSC-x series of models, where x denotes the number of domains on which the model is trained and tested. For instance, when x equals 2, the model uses data from the book and DVD domains, and so on. When experiments are conducted on the classic four domains (books, DVDs, kitchen, and electronics), we simply use DAM and AP-MDSC to denote these setups.

4.2. Performance Evaluation

Table 2 summarizes the experimental results of our model and baseline models trained on the Amazon dataset.

A detailed analysis of the results can be given as follows:

(1) AP-MDSC versus traditional sentiment classification models: It is shown in Table 2 that our model obtains significant improvements compared to all single-domain sentiment classification methods, including LS-single, SVM-single, and LSTM-single. We speculate that the main reason is that those models only use single-domain data for training and cannot acquire general knowledge from other datasets. Compared to sentiment classification models that use all domain data indiscriminately, such as LS-all, SVM-all, and LSTM-all, our model achieves higher classification accuracy across all domains. This is because, although the models can gain general knowledge from a larger set of training data, they do not leverage domain information to filter out the sentiment features required for specific domains.

(2) AP-MDSC versus other multi-task learning models: As shown in Table 2, our model outperforms other multi-task-based models, whether in terms of performance within a single domain or the average performance across the entire dataset, except for DAM. According to the analysis of DAM, these multi-task-based methods have various shortcomings, such as not fully considering the relationships between different domains or ignoring shared sentiment knowledge, while DAM can extract domain-related features from a shared feature pool at the feature level using domain attention, reducing the need to design a specific layer for each task. In contrast to DAM, although AP-MDSC does not achieve the best performance in all domains, it can achieve the best overall performance. This suggests that our proposed approach can improve multi-domain sentiment classification by simulating the ranking of the elements in the input text. Overall, the experimental results show that AP-MDSC is a powerful multi-domain sentiment classification algorithm.

4.3. Comparative Experiments in Datasets with Different Domains

To further verify the performance of our model, we compare the performance of AP-MDSC and DAM on different numbers of domains. Table 3 presents their average sentiment classification accuracies as the number of domains increases from two to five (books, DVDs, kitchen, electronics, and music).

As shown in Table 3, both DAM and AP-MDSC generally perform better as the number of domains increases. As previously discussed, the expansion of domains can provide more shared knowledge in most cases. In addition, the average classification accuracy of AP-MDSC is higher than that of DAM, regardless of the number of domains. These results demonstrate that the attention mechanism used in AP-MDSC can effectively use the ranking of elements to obtain salient sentiment features. However, we also observed that as the number of domains increases, the performance advantage of AP-MDSC decreases. We believe that for multi-task learning problems such as multi-domain sentiment analysis, an increase in the number of domains can lead to more conflicts between tasks, to which AP-MDSC is particularly sensitive. This is a topic worthy of study, and we plan to explore it in future work.

4.4. Pooling Techniques

According to our analysis in the third section, using min-pooling in AP-MDSC can help extract significant features for sentiment classification, which is different from the more general use of max-pooling and mean-pooling in many other attention models. Table 4 shows the testing accuracy of the AP-MDSC model on the Amazon sentiment classification dataset with three different pooling techniques while keeping all other network structures and settings unchanged.

As shown in Table 4, using the min-pooling layer results in better performance on the dataset than using the mean-pooling or max-pooling layers. This validates our previous analysis of how the attention distribution should be calculated at each moment in AP-MDSC: if a word is less important than others, it should be given less attention; specifically, the corresponding attention weight should be smaller. Therefore, adopting the min-pooling layer is more reasonable than using the other two pooling layers. The choice of pooling technique is one of the keys to efficiently performing multi-domain sentiment analysis tasks for our model.
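To make the intuition behind the three pooling choices concrete, the following is a minimal sketch under stated assumptions, not the implementation used in the paper. It assumes a toy pairwise-importance matrix `U` (where `U[i][j]` is read as how important word `i` is relative to word `j`), pools each row over the other words, and normalizes the pooled scores with a softmax. The matrix values, the softmax normalization, and the function names are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


POOLERS = {
    "min": min,
    "max": max,
    "mean": lambda row: sum(row) / len(row),
}


def attention_from_pairwise(U, pooling="min"):
    """Pool row i of U (diagonal excluded) into one score, then normalize."""
    pool = POOLERS[pooling]
    scores = [
        pool([u for j, u in enumerate(row) if j != i])
        for i, row in enumerate(U)
    ]
    return softmax(scores)


# Toy matrix (hypothetical values) with U[i][j] + U[j][i] == 1.
U = [
    [0.5, 0.9, 0.6],
    [0.1, 0.5, 0.2],
    [0.4, 0.8, 0.5],
]

for name in ("min", "mean", "max"):
    weights = attention_from_pairwise(U, name)
    print(name, [round(w, 3) for w in weights])
```

In this toy case, word 2 loses the comparison with word 0 but wins against word 1. Min-pooling scores it by its worst comparison, so it receives a smaller share of attention than under max-pooling, which scores it by its best comparison.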

4.5. Inverse Events

As mentioned in the previous analysis, we set the sum of $u_{ij}$ and $u_{ji}$ to be 1 in the matrix $U$ in AP-MDSC. This is because, in our work, $u_{ij}$ represents the relative importance of the words at time $i$ and time $j$. Selecting the word at time $i$ and selecting the word at time $j$ are inverse events: when the word at time $i$ is selected, the word at time $j$ cannot be selected, and vice versa. This is different from other attention models for this task. Table 5 shows the test accuracy on the dataset with and without this inverse-event constraint.

As shown in Table 5, the test accuracy of AP-MDSC (U-limited) is higher than that of AP-MDSC (U-unlimited). This also verifies the rationality of the method our model uses to calculate the matrix U.
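One simple way to obtain a matrix whose mirrored entries sum to 1 is to pass the difference of two per-word scores through a sigmoid. The sketch below illustrates only the constraint itself. The scoring scheme, the example scores, and the function names are assumptions made for this example and are not taken from the definition of U in Section 3.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def pairwise_importance(scores):
    """Constrained variant: u[i][j] = sigmoid(s_i - s_j).

    Because sigmoid(x) + sigmoid(-x) == 1, every pair satisfies
    u[i][j] + u[j][i] == 1, and the diagonal equals 0.5.
    """
    return [[sigmoid(si - sj) for sj in scores] for si in scores]


# Hypothetical per-word scores for a four-word input.
scores = [1.2, -0.3, 0.4, 2.0]
U = pairwise_importance(scores)

for i in range(len(scores)):
    for j in range(len(scores)):
        assert abs(U[i][j] + U[j][i] - 1.0) < 1e-12
print([[round(u, 2) for u in row] for row in U])
```

An unconstrained variant would compute each entry independently, so the two entries for the same pair of words could both be large or both be small.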

4.6. Computational Costs

The training times of AP-MDSC and DAM during each epoch with an increasing number of domains are shown in Table 6.

Table 6 shows that as the size of the sample data increases, the training time costs of AP-MDSC and DAM increase gradually. Compared with DAM, the additional time cost of AP-MDSC during each epoch is slight, never exceeding 10%. Given the improvement in classification accuracy, we regard AP-MDSC as a good multi-domain sentiment classification model.
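The relative overhead can be recomputed from the ranges in Table 6. The snippet below is an arithmetic sketch only. It assumes that the low ends and the high ends of the reported ranges can be compared pairwise, and it leaves the upper bound for AP-MDSC-4 as missing because that value is truncated in Table 6. The variable and function names are illustrative.

```python
# Per-epoch training time ranges (seconds) copied from Table 6 as (low, high).
# The upper bound for AP-MDSC with four domains is truncated in the table,
# so it is recorded as None rather than guessed.
dam = {2: (67, 72), 3: (133, 141), 4: (203, 214), 5: (267, 281)}
ap_mdsc = {2: (70, 76), 3: (139, 147), 4: (213, None), 5: (284, 300)}


def relative_overhead(ours, baseline):
    """Percentage increase of `ours` over `baseline`; None if a value is missing."""
    if ours is None or baseline is None:
        return None
    return 100.0 * (ours - baseline) / baseline


overheads = {
    n: tuple(relative_overhead(a, d) for a, d in zip(ap_mdsc[n], dam[n]))
    for n in dam
}

for n, (low, high) in overheads.items():
    high_text = "n/a" if high is None else f"+{high:.1f}%"
    print(f"{n} domains: +{low:.1f}% (low end), {high_text} (high end)")
```

Every value that can be computed this way falls between roughly 4% and 7%, consistent with the stated bound of 10%.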

4.7. Visualization of Attention Weights

Figure 3 shows the distribution of attention weights on a test sample from the DVD domain, which is as follows:

“This is an excellent film, which unfortunately has not been given a decent treatment by the distributor. I was very disappointed in the quality of the release-the picture quality is poor inter-titles appear to be missing, and the score which has been added is just a reptition of long synth chords that don’t match the action on-screen. It’s a shame, because a film like this one deserves much better.”

We mark words with negative emotions in blue, words with positive emotions in red, and randomly generate colors for other words. The sizes of the words are based on the attention weights assigned. The greater the weight, the larger the font size, and vice versa.
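A mapping from attention weights to font sizes of the kind described above can be sketched as a linear rescaling. The function name and the point-size range below are hypothetical choices for this example; the exact scaling used to draw Figure 3 is not specified here.

```python
def font_sizes(weights, min_pt=8.0, max_pt=32.0):
    """Linearly map attention weights to font sizes in points.

    The smallest weight maps to `min_pt` and the largest to `max_pt`.
    If all weights are equal, every word gets the midpoint size.
    """
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return [(min_pt + max_pt) / 2.0] * len(weights)
    scale = (max_pt - min_pt) / (hi - lo)
    return [min_pt + (w - lo) * scale for w in weights]


# Hypothetical weights for three words.
words = ["excellent", "unfortunately", "decent"]
weights = [0.1, 0.3, 0.2]
for word, size in zip(words, font_sizes(weights)):
    print(f"{word}: {size:.1f} pt")
```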

Based on Figure 3, it is clear that, except for the initial position, the most prominent words are the blue word “unfortunately” and the red word “decent”. The most prominent sentence is the first one, followed by the last one. In the first sentence, the emotional words are “unfortunately”, “excellent”, and “decent”. However, it is evident that “excellent”, which describes “film”, is paid relatively less attention by our attention mechanism. This indicates that, by contrasting it with other words in the context, it is judged that “excellent” is not important. Then, there is the contrast between “unfortunately” and “decent”, where an important phrase “not been given” stands between them. Although these words are neutral when taken individually, the model’s context-aware capability allows it to understand the emotional information of “not been given decent treatment” as a whole. Similarly, in the last sentence, the negative emotion word “shame” and the positive emotion word “better” are included, but the understanding of “better” needs to be combined with the context; i.e., “deserves much better.” The large sections of description in the middle are allocated relatively little attention; compared to the first and last sentences, they have less influence on determining the emotional polarity of the review.

5. Conclusions

To further deal with the conflicting sentiment information contained in the input text, we proposed an attention neural network with a novel attention-pooling mechanism, AP-MDSC, to model the relative importance of domain-aware sentiment features. We demonstrated the advantages of AP-MDSC on a public multi-domain sentiment classification dataset and validated various settings of the model through a series of comparative experiments. Meanwhile, we observed some limitations of AP-MDSC, including its sensitivity to multi-task conflicts. In future work, we plan to delve into the balance algorithms between multi-tasks in our proposed attention neural network to improve model performance. Additionally, we will conduct more research on sentiment classification tasks in real sentiment classification scenarios based on large-size datasets to extend the boundaries of our work.

Author Contributions

Conceptualization, C.Y. and A.L.; data curation, S.G.; formal analysis, C.Y. and A.L.; funding acquisition, C.Y. and Z.C.; investigation, Z.C. and G.L.; methodology, C.Y. and A.L.; project administration, Z.C.; resources, A.L.; software, C.Y.; supervision, G.L.; validation, A.L.; visualization, A.L.; writing—original draft, C.Y.; writing—review and editing, Z.C. and G.L. All authors have read and agreed to the published version of the manuscript.

Funding

The work was supported by the Hainan Provincial Natural Science Foundation (621MS020) and the Joint Funds of the National Natural Science Foundation of China (U22A2002).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Cao, Q.; Duan, W.; Gan, Q. Exploring Determinants of Voting for the “Helpfulness” of Online User Reviews: A Text Mining Approach. Decis. Support Syst. 2011, 50, 511–521.
Hu, N.; Bose, I.; Koh, N.S.; Liu, L. Manipulation of online reviews: An analysis of ratings, readability, and sentiments. Decis. Support Syst. 2012, 52, 674–684.
Taboada, M.; Brooke, J.; Tofiloski, M.; Voll, K.; Stede, M. Lexicon-Based Methods for Sentiment Analysis. Comput. Linguist. 2011, 37, 267–307.
Park, S.; Kim, Y. Building Thesaurus Lexicon Using Dictionary-Based Approach for Sentiment Classification. In Proceedings of the 14th IEEE International Conference on Software Engineering Research, Management and Applications (SERA), Towson, MD, USA, 8–10 June 2016; pp. 39–44.
Hearst, M.A.; Dumais, S.T.; Osman, E.; Platt, J.; Scholkopf, B. Support Vector Machines. IEEE Intell. Syst. Their Appl. 1998, 13, 18–28.
Rasmussen, C.E. The Infinite Gaussian Mixture Model. In Proceedings of the Advances in Neural Information Processing Systems 12, NIPS 1999, Denver, CO, USA, 7 April 1999; Volume 12, pp. 554–560.
O’Shea, K.; Nash, R. An Introduction to Convolutional Neural Networks. arXiv 2015, arXiv:1511.08458.
Zaremba, W.; Sutskever, I.; Vinyals, O. Recurrent Neural Network Regularization. arXiv 2014, arXiv:1409.2329.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017; pp. 5998–6008.
Collobert, R.; Weston, J. A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning. In Proceedings of the 25th International Conference on Machine Learning, ICML 2008, Helsinki, Finland, 5–9 July 2008; pp. 160–167.
Tao, H.; Tong, S.; Zhao, H.; Xu, T.; Jin, B.; Liu, Q. A Radical-Aware Attention-Based Model for Chinese Text Classification. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, Hilton Hawaiian Village, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 5125–5132.
Shi, W.; Yu, Z. Sentiment Adaptive End-to-End Dialog Systems. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2018, Melbourne, Australia, 15–20 July 2018; Volume 1, pp. 1509–1519.
Xing, F.Z.; Cambria, E.; Welsch, R.E. Intelligent Asset Allocation via Market Sentiment Views. IEEE Comput. Intell. Mag. 2018, 13, 25–34.
Blitzer, J.; Dredze, M.; Pereira, F. Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, ACL 2007, Prague, Czech Republic, 23–30 June 2007; pp. 440–447.
Melville, P.; Gryc, W.; Lawrence, R.D. Sentiment Analysis of Blogs by Combining Lexical Knowledge with Text Classification. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2009, Paris, France, 28 June 2009; pp. 1275–1284.
Xing, F.Z.; Llucchini, F.P.; Cambria, E. Cognitive-Inspired Domain Adaptation of Sentiment Lexicons. Inf. Process. Manag. 2019, 56, 554–564.
Kim, Y. Convolutional Neural Networks for Sentence Classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; Association for Computational Linguistics: Stroudsburg, PA, USA, 2014; pp. 1746–1751.
Zhang, X.; Zhao, J.; LeCun, Y. Character-Level Convolutional Networks for Text Classification. In Proceedings of the 28th International Conference on Neural Information Processing Systems, NIPS 2015, Montreal, QC, Canada, 7–12 December 2015; Volume 28, pp. 649–657.
Tang, D.; Qin, B.; Liu, T. Document Modeling with Gated Recurrent Neural Network for Sentiment Classification. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, 17–21 September 2015; pp. 1422–1432.
Lai, S.; Xu, L.; Liu, K.; Zhao, J. Recurrent Convolutional Neural Networks for Text Classification. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI 2015, Austin, TX, USA, 25–30 January 2015; pp. 2267–2273.
Liu, B.; Guan, W.; Yang, C.; Fang, Z.; Lu, Z. Transformer and Graph Convolutional Network for Text Classification. Int. J. Comput. Intell. Syst. 2023, 16, 161.
Rocktäschel, T.; Grefenstette, E.; Hermann, K.M.; Kočiský, T.; Blunsom, P. Reasoning about Entailment with Neural Attention. In Proceedings of the 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, 2–4 May 2016.
Kumar, A.; Irsoy, O.; Ondruska, P.; Iyyer, M.; Bradbury, J.; Gulrajani, I.; Zhong, V.; Paulus, R.; Socher, R. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; Volume 48, pp. 1378–1387.
Yang, M.; Yin, W.; Qu, Q.; Tu, W.; Shen, Y.; Chen, X. Neural Attentive Network for Cross-Domain Aspect-level Sentiment Classification. IEEE Trans. Affect. Comput. 2019, 12, 761–775.
Yuan, Z.; Wu, S.; Wu, F.; Liu, J.; Huang, Y. Domain Attention Model for Multi-Domain Sentiment Classification. Knowl. Based Syst. 2018, 155, 1–10.
Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90.
Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
Li, S.; Zong, C. Multi-Domain Sentiment Classification. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics, ACL 2008, Columbus, OH, USA, 15–20 June 2008; pp. 257–260.
Wu, F.; Yuan, Z.; Huang, Y. Collaboratively Training Sentiment Classifiers for Multiple Domains. IEEE Trans. Knowl. Data Eng. 2017, 29, 1370–1383.
Evgeniou, T.; Pontil, M. Regularized Multi-Task Learning. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA, 22–25 August 2004; pp. 109–117.
Zhou, J.; Chen, J.; Ye, J. Malsar: Multi-task learning via structural regularization. Ariz. State Univ. 2011, 21, 1–50.
Liu, P.; Qiu, X.; Huang, X. Adversarial Multi-task Learning for Text Classification. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, BC, Canada, 30 July–4 August 2017; Volume 1, pp. 1–10. Available online: https://aclanthology.org/P17-1001/ (accessed on 30 July 2017).
Liu, X.; Gao, J.; He, X.; Deng, L.; Duh, K.; Wang, Y. Representation Learning Using Multi-Task Deep Neural Networks for Semantic Classification and Information Retrieval. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2015, Denver, CO, USA, 31 May–5 June 2015; pp. 912–921.
Liu, P.; Qiu, X.; Huang, X. Deep Multi-Task Learning with Shared Memory for Text Classification. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, TX, USA, 1–4 November 2016; pp. 118–127.
Dragoni, M.; Petrucci, G. A Neural Word Embeddings Approach for Multi-Domain Sentiment Analysis. IEEE Trans. Affect. Comput. 2017, 8, 457–470.
Katsarou, K.; Douss, N.; Stefanidis, K. REFORMIST: Hierarchical Attention Networks for Multi-Domain Sentiment Classification with Active Learning. In Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, Tallinn, Estonia, 27–31 March 2023; pp. 919–928.
Katsarou, K.; Jeney, R.; Stefanidis, K. MUTUAL: Multi-Domain Sentiment Classification via Uncertainty Sampling. In Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, Tallinn, Estonia, 27–31 March 2023; pp. 331–339.
Dai, Y.; El-Roby, A. DaCon: Multi-Domain Text Classification Using Domain Adversarial Contrastive Learning. In Proceedings of the International Conference on Artificial Neural Networks; Springer: Cham, Switzerland, 2023; pp. 40–52.
Schuster, M.; Paliwal, K.K. Bidirectional Recurrent Neural Networks. IEEE Trans. Signal Process. 1997, 45, 2673–2681.
Zeiler, M.D. Adadelta: An Adaptive Learning Rate Method. arXiv 2012, arXiv:1212.5701.
Pennington, J.; Socher, R.; Manning, C. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, Doha, Qatar, 25–29 October 2014; pp. 1532–1543.

Figure 1. Overall framework of AP-MDSC.

Figure 2. The process of generating attention weights and a text representation.

Figure 3. Attention distribution over a sample from the DVD domain (the text in the image corresponds to the description in the second paragraph of this section).

Table 1. Statistical information on the Amazon sentiment analysis dataset. N: number of labeled samples; L: maximum sample length; V: number of words.

Domain	Negative (train/val/test)	Positive (train/val/test)
Book	700/200/100	700/200/100
DVD	700/200/100	700/200/100
Kitchen	700/200/100	700/200/100
Electronics	700/200/100	700/200/100
Amazon	N: 8000 (5600/1600/800) L: 5939 V: 2022

Table 2. Sentiment classification accuracy (%) on each domain of the Amazon dataset.

Model	Books	DVDs	Kitchen	Electronics	Overall
LS-single	77.80	77.88	84.33	81.63	80.41
LS-all	78.40	79.76	85.73	84.67	82.14
SVM-single	78.56	78.66	84.74	83.03	81.25
SVM-all	79.16	80.97	86.06	85.15	82.84
LSTM-single	78.51	78.74	81.80	80.72	79.94
LSTM-all	83.74	79.70	83.56	82.87	82.47
MTL-DNN	79.70	80.50	82.80	82.50	81.38
MTL-CNN	80.20	81.00	83.00	83.40	81.90
MTL-SM	82.80	83.00	84.30	85.50	83.90
MDSC-Com	79.07	80.09	85.54	83.77	82.12
RMTL	81.33	82.18	87.02	85.49	84.01
MTL-Graph	79.66	81.84	87.06	83.69	83.06
CMSC	81.16	82.08	87.13	85.85	84.06
ASP-MTL	84.00	85.50	86.20	86.80	85.63
NeuroSent	79.66	80.90	86.86	86.41	83.46
DAM	87.75	86.58	88.93	87.50	87.69
AP-MDSC	90.04	84.01	88.07	89.90	88.01

Table 3. Average sentiment classification accuracies of AP-MDSC and DAM on Amazon datasets with different numbers of domains.

Model	Books	DVDs	Kitchen	Electronics	Music	Overall
DAM-2	87.57	82.57	-	-	-	85.07
AP-MDSC-2	90.10	84.65	-	-	-	87.38
DAM-3	88.02	86.00	84.45	-	-	86.16
AP-MDSC-3	89.21	84.65	86.14	-	-	86.67
DAM-4	87.75	86.58	88.93	87.50	-	87.69
AP-MDSC-4	90.04	84.01	88.07	89.90	-	88.01
DAM-5	90.79	84.95	87.72	87.72	88.76	87.99
AP-MDSC-5	91.73	83.96	87.72	87.92	88.86	88.05

Table 4. Average sentiment classification accuracies (%) of AP-MDSC across domains with different pooling layers.

Model	Accuracy
AP-MDSC (mean-pooling)	87.13
AP-MDSC (max-pooling)	87.55
AP-MDSC (min-pooling)	88.01

Table 5. Average sentiment classification accuracies (%) of AP-MDSC across domains with and without the inverse-event limitation.

Model	Accuracy
AP-MDSC (U-unlimited)	87.72
AP-MDSC (U-limited)	88.01

Table 6. Computational cost (seconds) of model training.

Model	2 domains	3 domains	4 domains	5 domains
DAM	67∼72	133∼141	203∼214	267∼281
AP-MDSC	70∼76	139∼147	213∼22	284∼300


© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
