Article

Weibo Text Sentiment Analysis Based on BERT and Deep Learning

School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2021, 11(22), 10774; https://doi.org/10.3390/app112210774
Submission received: 9 October 2021 / Revised: 29 October 2021 / Accepted: 8 November 2021 / Published: 15 November 2021
(This article belongs to the Special Issue Application of Artificial Intelligence, Deep Neural Networks)

Abstract

With the rapid growth of public opinion data, Weibo text sentiment analysis plays an increasingly significant role in monitoring online public opinion. Owing to the sparseness and high dimensionality of text data and the complex semantics of natural language, sentiment analysis tasks face tremendous challenges. To address these problems, this paper proposes a new model based on BERT and deep learning for Weibo text sentiment analysis. Specifically, BERT first represents the text with dynamic word vectors, and a processed sentiment dictionary enhances the sentiment features of those vectors; a BiLSTM then extracts the contextual features of the text, and the resulting representation is weighted by an attention mechanism. After weighting, a CNN extracts the important local sentiment features in the text, and the processed sentiment feature representation is finally classified. A comparative experiment was conducted on a Weibo text dataset collected during the COVID-19 epidemic; the results show that the performance of the proposed model is significantly better than that of similar models.

1. Introduction

Online public opinion emerges as people express their own views in response to various events, and that opinion in turn influences how those events develop. Sina Weibo (referred to in this paper as ‘Weibo’), one of the most widely used social platforms in China, carries a large amount of online public opinion. Users post their views on events as short texts for other users to browse, comment on and forward, and they can also search for related events by keyword. Through these channels, events that attract attention spread quickly and widely and then generate public opinion on the internet. Weibo is therefore regarded as a platform through which the public collects and publishes information and acquires social knowledge to manage uncertainty and risk [1], so sentiment analysis of Weibo text is critical.
Text sentiment analysis refers to analyzing the emotional information contained in a text and classifying the text into a sentiment category; the classification results can feed other downstream tasks. It plays a vital role in monitoring online public opinion. Analyzing the emotional tendency of short texts posted by Weibo users not only aids understanding of users’ views and psychology, but also helps control the trend of online public opinion, improves government credibility, and speeds up the government’s response to online public opinion.
Recently, sentiment analysis has been a research hotspot in data mining and natural language processing [2], and many methods have emerged. Commonly used approaches fall into early rule-based methods, machine learning methods, and the currently more popular deep learning methods. Rule-based methods include those built on a sentiment dictionary, which must be constructed manually [3]. However, because new online words and expressions appear endlessly, a constructed sentiment dictionary cannot keep up with demand, so these methods cannot achieve high accuracy [4]. Machine learning methods such as support vector machines and decision trees also require a certain labor cost, and their classification accuracy depends on the quality of feature extraction; with the continuous growth of massive information, it is difficult to extract text features fully, so accuracy drops. Compared with traditional machine learning, the deep learning techniques that appeared later have achieved notable success in computer vision [5], machine translation [6], and text classification [7]. Deep learning methods such as CNN and LSTM require no manual intervention and classify accurately, but they must be trained on large-scale datasets, and constructing a large, high-quality training dataset is difficult. Many researchers now combine rule-based methods with machine learning or deep learning; these hybrids achieve satisfactory accuracy but often suffer from problems such as excessive model complexity [8].
In view of the shortcomings of the above methods, and in order to increase the accuracy of text sentiment analysis, this paper proposes a new sentiment analysis model based on BERT and related deep learning techniques. Experimental results show that this model achieves significantly higher accuracy than other similar models.

2. Related Work

This section reviews work in the field of text sentiment analysis from three aspects: methods based on sentiment dictionaries, methods based on deep learning, and methods based on transformers.

2.1. Research on Sentiment Analysis Based on Sentiment Dictionary

The method based on a sentiment dictionary is the earliest applied to text sentiment analysis, and its classification performance mainly depends on the constructed dictionary. The method first computes the similarity between each word in the text and the words in the sentiment dictionary, then weights the results of all words and judges the sentiment polarity of the entire text from the weighted result. Ahmed et al. [9] observed that the polarity and intensity of a word change across application domains, so they constructed a domain-dependent sentiment dictionary to handle such changes. Zhang et al. [10] constructed a sentiment dictionary specifically for Weibo texts in order to better monitor online public opinion and improve the efficiency of network regulators. Dey et al. [11] created a new and diversified n-gram lexicon called Senti-N-Gram for sentiment analysis. Han et al. [12] introduced mutual information into dictionary construction, combining the traditional sentiment dictionary with mutual information to generate a dictionary for a specific domain. Wu et al. [13] proposed a new construction method that automatically builds a target-specific sentiment lexicon.

2.2. Research on Sentiment Analysis Based on Deep Learning

In recent years, because deep learning methods require no manual intervention and achieve high classification accuracy, they have attracted the attention of many scholars. Wei et al. [14] proposed a BiLSTM model with a multi-polarity orthogonal attention mechanism for implicit sentiment analysis; compared with a traditional single-attention model, it can distinguish words with different sentiment tendencies. Shuang et al. [15] proposed a neural sentiment-information collection and extraction architecture consisting of a BiLSTM-based sentiment information collector (SIC) and a CNN-based sentiment information extractor (SIE). Wu et al. [16] proposed a new labelling strategy and used a two-level LSTM network to build a sentiment classifier that also models polarity reversal; their experiments achieved good results. Liang et al. [17] proposed a novel architecture, AC-BiLSTM, in which a convolutional layer extracts features from word vectors, a BiLSTM accesses contextual information, and an attention mechanism assigns different degrees of attention to the output; the results show that the architecture captures both the local features of phrases and the global semantics of sentences. Basiri et al. [18] proposed ABCDM, a sentiment analysis model with a parallel structure that combines BiLSTM, BiGRU and CNN; it achieved the best results on five English review datasets and three Twitter datasets. Gonzalez et al. [19] proposed a sentiment analysis model based on CNN, RNN and a polarity dictionary and verified its effectiveness on English and Arabic datasets. Alexandridis et al. [20] proposed a model that effectively recognizes negative emotions; it combines traditional deep learning techniques with an attention mechanism and a knowledge management system to further improve classification accuracy. Alexandridis et al. [21] also benchmarked various text classifiers, such as feed-forward neural networks, on Greek sentiment analysis.

2.3. Research on Sentiment Analysis Based on Transformer

With its complex structure and powerful capabilities, the Transformer has recently excelled in sentiment analysis tasks, and BERT is one of its typical representatives. Gonzalez et al. [22] proposed TWilBERT, an improved BERT model for Twitter sentiment analysis in Spanish. A Reply Order Prediction signal is integrated into the model to learn inter-sentence coherence in Twitter conversations, which improves performance; in experiments on fourteen different sentiment classification tasks, it outperformed state-of-the-art models. Zhao et al. [23] proposed a knowledge-enabled BERT language representation model for aspect-based sentiment analysis. Its main feature is the integration of external sentiment domain knowledge into BERT, which yields better performance with a small amount of training data. Alaparthi et al. [24] compared the relative effectiveness of four sentiment analysis techniques and demonstrated the clear advantage of BERT in text sentiment classification. Yenduri et al. [25] proposed a customized BERT-oriented model for Twitter sentiment classification that feeds preprocessed and tokenized tokens into BERT, effectively handling problems caused by slang, modern accents, and grammar and spelling errors.

3. Preliminary Knowledge

In this section, a brief overview of the basic building blocks of our model is presented: BERT, CNN and LSTM are each described in turn.

3.1. BERT

BERT is a neural-network-based pre-training model for natural language processing. Compared with the traditional language model word2vec, BERT is more flexible and advantageous and can serve as its substitute. word2vec represents each word in a fixed form that does not change with context; because of the complex semantics of natural language, the same word may mean different things in different contexts, so word2vec word vectors can reduce the accuracy of downstream tasks. BERT, by contrast, dynamically adjusts the word vector representation according to the context in which the word appears. Since its release, BERT has set new records on 11 natural language processing tasks, such as MultiNLI and QQP [26]. Moreover, the word vectors obtained from BERT carry higher-quality features, and feeding them to downstream tasks yields better results.
Because of its complex structure, BERT requires long training time and expensive training costs. However, Google has open-sourced the model, so users only need to fine-tune the pre-trained BERT for the task at hand, which greatly saves training time and cost. The same pre-trained BERT can be applied to many different downstream natural language processing tasks with varying degrees of fine-tuning. In the sentiment analysis task, a [CLS] symbol is inserted at the front of the input text, and its output can be regarded as the semantic representation of the entire input; that is, the output of this symbol is used for sentiment classification. Because the symbol is independent of the words in the text and carries no obvious semantic information of its own, it represents the input more objectively: its output effectively integrates the semantic information carried by each word in the text [27]. The model structure of BERT is shown in Figure 1.
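As a concrete illustration (ours, not the authors’), the sketch below shows how the output at the [CLS] position can be taken as a sentence-level representation. The HuggingFace transformers library and the bert-base-chinese checkpoint are assumed choices; the paper does not specify a toolchain.

```python
# Minimal sketch, assuming the HuggingFace `transformers` library and the
# `bert-base-chinese` checkpoint (not specified by the paper).
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

text = "致敬奋战在一线的医护人员"  # an example Weibo comment
inputs = tokenizer(text, return_tensors="pt")  # prepends [CLS], appends [SEP]

with torch.no_grad():
    outputs = bert(**inputs)

hidden = outputs.last_hidden_state   # (batch, seq_len, 768)
cls_vector = hidden[:, 0, :]         # [CLS] output: sentence-level representation
word_vectors = hidden[:, 1:-1, :]    # dynamic, context-dependent word vectors
```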

3.2. CNN

As the most traditional deep learning model, the CNN is very popular in natural language processing. It is a feedforward neural network composed of 1-dimensional convolutional and pooling layers [28]; relying on the classic convolution and pooling operations, it has excelled at feature extraction and become the most popular feature extractor at present. CNNs mostly process local information: the results of convolution and pooling are based on local context. The CNN thus has strong feature extraction capabilities, successfully reducing the dimensionality of the input data and increasing robustness [29].
To apply a CNN to a text S with s words, first convert the s words into word vectors of dimension e, then repeatedly apply a filter F covering h consecutive words (an h × e sub-matrix) to the resulting text matrix; F slides over the matrix with the specified stride, acting on its successive sub-matrices. This process generates a feature map $M = [m_0, m_1, \ldots, m_{s-h}]$, which is then sent to the pooling layer to reduce its dimensionality. Commonly used pooling methods include max pooling, which retains the most salient component of the feature vector, and average pooling, which retains its mean; the output of the pooling layer can serve as the input of a fully connected layer.
The method of generating feature mapping is shown in Formula (1):
$m_i = F \cdot S_{i:i+h-1}$ (1)
In Formula (1), $i = 0, 1, \ldots, s-h$, and $S_{i:j}$ is the sub-matrix of S from the i-th row to the j-th row.
The method of max pooling is shown in Formula (2):
$r = \max_{0 \le i \le s-h} m_i$ (2)
The average pooling method is shown in Formula (3):
$q = \dfrac{1}{s-h+1} \sum_{i=0}^{s-h} m_i$ (3)
The structure of the CNN is shown in Figure 2.
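For concreteness, the following sketch (our addition) transcribes Formulas (1)–(3) directly in NumPy on a toy text matrix; the sizes are invented, and a real implementation would use an optimized deep learning library.

```python
# Direct transcription of Formulas (1)-(3) on toy data; sizes are assumptions.
import numpy as np

s, e, h = 6, 4, 2            # words, embedding dimension, filter height
S = np.random.randn(s, e)    # text matrix: one e-dimensional vector per word
F = np.random.randn(h, e)    # filter covering h consecutive words

# Formula (1): slide the filter over every h-row sub-matrix of S
M = np.array([np.sum(F * S[i:i + h]) for i in range(s - h + 1)])

r = M.max()    # Formula (2): max pooling keeps the strongest response
q = M.mean()   # Formula (3): average pooling keeps the mean response
```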

3.3. LSTM

Although the RNN can process sequence information, it is prone to vanishing or exploding gradients; the LSTM effectively solves these problems. Compared with the RNN, the LSTM introduces a memory cell and three logic gates: an input gate $i_t$, an output gate $o_t$ and a forget gate $f_t$. The network determines its output at each time step from the output of the previous step and the current input, and part of the current output serves as the input of the next step. Through the memory cell and the logic gates, the LSTM decides how much historical information and how much of the current input to keep, so it transmits and discards information more effectively. However, the LSTM processes information in only one direction, while the words in a text are affected by their context on both sides, so information needs to be processed in both the forward and reverse directions. The BiLSTM was therefore proposed to process information in both directions simultaneously, so its output contains contextual information from both before and after each position. The structure of BiLSTM is shown in Figure 3:
In one direction, suppose that $h_t$ and $x_t$ are the hidden state vector and input vector at time t, respectively; U and W represent the weight matrices to be trained, and b represents the bias term to be trained.
The forget gate expresses how much information to forget by outputting a number in the interval $[0, 1]$, as shown in Formula (4):
$f_t = \sigma(W_f h_{t-1} + U_f x_t + b_f)$ (4)
The input gate determines how much new information to store by calculating $i_t$ and the candidate state $C_t$ and combining them, as shown in Formulas (5)–(7):
$i_t = \sigma(W_i h_{t-1} + U_i x_t + b_i)$ (5)
$C_t = \tanh(W_c h_{t-1} + U_c x_t + b_c)$ (6)
$A_t = f_t \odot A_{t-1} + i_t \odot C_t$ (7)
The output gate determines which parts of the current existing information are output, as shown in Formulas (8) and (9):
$o_t = \sigma(W_o h_{t-1} + U_o x_t + b_o)$ (8)
$h_t = o_t \odot \tanh(A_t)$ (9)
BiLSTM concatenates the forward and reverse hidden states as the final hidden state representation at time t. In this way, temporal information flows in both directions and context information is learned better.
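The following sketch (ours; PyTorch is an assumed framework and the sizes are illustrative) shows how a bidirectional LSTM produces the concatenated forward and backward states described above.

```python
# Minimal BiLSTM sketch; framework and dimensions are assumptions.
import torch
import torch.nn as nn

batch, seq_len, input_dim, hidden_dim = 2, 12, 768, 128

bilstm = nn.LSTM(input_size=input_dim, hidden_size=hidden_dim,
                 batch_first=True, bidirectional=True)

x = torch.randn(batch, seq_len, input_dim)  # e.g., a sequence of word vectors
out, _ = bilstm(x)                          # out: (batch, seq_len, 2 * hidden_dim)

# For each time step t, out[:, t, :hidden_dim] is the forward hidden state and
# out[:, t, hidden_dim:] the backward one; together they form the final hidden
# state that carries context from both directions.
```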

4. Proposed Method

Aiming at the shortcomings of most existing sentiment classification models, this paper proposes a new model consisting of five parts: a pre-train layer, a BiLSTM layer, an attention layer, a CNN layer and a fully connected layer. The structure is shown in Figure 4:
Suppose the input text sentence is $S = \{w_1, w_2, \ldots, w_i, \ldots, w_n\}$, where $w_i$ represents the i-th word in S. Given a sentence S, the model outputs its sentiment polarity P.
We first build a sentiment dictionary $SD$ consisting of two parts: the sentiment words and the sentiment weight $sw_i$ corresponding to each word. The dictionary assigns the weight $sw_i$ to words of S that belong to $SD$ and the weight one to words that do not. The dictionary used in this article is based on the open-source sentiment lexicon provided by Dalian University of Technology [30]. We retain only the positive and negative words of the original lexicon and remove words whose polarity is neutral or ambiguous. The sentiment strength of a word in the original lexicon is used as its weight $sw_i$ in the constructed dictionary; it is worth mentioning that, for negative words, the weight is the sentiment strength multiplied by −1. The sentiment weight is expressed as follows:
$\mathrm{senti}(w_i) = \begin{cases} sw_i, & w_i \in SD \\ 1, & w_i \notin SD \end{cases}$ (10)
In Formula (10), $w_i$ represents the i-th word in S, $sw_i$ represents the sentiment weight of the word $w_i$ in the constructed sentiment dictionary, and $SD$ is the constructed sentiment dictionary.
In the pre-train layer, the fine-tuned BERT model converts each input word $w_i$ into a word vector $v_i$, which is then weighted by the sentiment weight from $SD$, as shown in Formula (11):
$v_i' = v_i \cdot \mathrm{senti}(w_i)$ (11)
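A hedged sketch of Formulas (10) and (11) follows (our illustration): `sentiment_dict` stands in for the processed Dalian University of Technology lexicon, and the entries and weights shown are invented examples.

```python
# Sketch of the lexicon weighting; dictionary entries are invented examples.
import torch

sentiment_dict = {"致敬": 5.0, "发烧": -3.0}   # word -> signed sentiment strength

def senti(word: str) -> float:
    """Formula (10): lexicon weight if the word is in SD, otherwise 1."""
    return sentiment_dict.get(word, 1.0)

def weight_vectors(words, vectors):
    """Formula (11): scale each BERT word vector by its sentiment weight."""
    weights = torch.tensor([senti(w) for w in words]).unsqueeze(-1)
    return vectors * weights   # broadcast over the embedding dimension

words = ["致敬", "一线", "医护"]
vectors = torch.randn(len(words), 768)   # stand-in for BERT word vectors
weighted = weight_vectors(words, vectors)
```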
The weighted word vector carries a stronger emotional color, and we use it as the input of the BiLSTM layer. In text data, the current word is affected by adjacent words, and an LSTM network extracts information in only one direction, so this paper uses a BiLSTM network. The BiLSTM can process long text sequences; applied after the pre-train layer, its function is to extract the dependencies of the text in the forward and backward directions. It combines the historical information of the previous moment with the current input to determine the current output, so it extracts contextual features from sequence data. For the input $v_t$ at time t, the forward and reverse LSTMs produce the hidden states $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$, as shown in Formulas (12) and (13):
$\overrightarrow{h_t} = \overrightarrow{\mathrm{LSTM}}(v_t), \quad t \in [1, n]$ (12)
$\overleftarrow{h_t} = \overleftarrow{\mathrm{LSTM}}(v_t), \quad t \in [n, 1]$ (13)
For each word, we connect its forward and backward context information to obtain its annotation: the combination of $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$ is the output $h_t$ of the state at time t, as shown in Formula (14):
$h_t = [\overrightarrow{h_t}, \overleftarrow{h_t}]$ (14)
In text sentiment analysis, some words have a strong emotional color while others have little or none, and different words contribute differently to judging the sentiment polarity of the whole text. The model therefore introduces an attention mechanism to assign different sentiment weights to different words: a word with stronger emotion is given higher attention, that is, a higher weight. For the hidden state output $h_t$ of the BiLSTM, the attention weights are computed as follows:
$u_t = \tanh(W_w h_t + b_w)$ (15)
$\alpha_t = \dfrac{\exp(u_t^\top u_w)}{\sum_t \exp(u_t^\top u_w)}$ (16)
$Z = \sum_t \alpha_t h_t$ (17)
In the above formulas, $u_t$ is the hidden state representation and $u_w$ is the context vector of the text; $u_w$ is randomly initialized at the start of training and continuously optimized during training.
The similarity between $u_t$ and $u_w$ measures the importance of each word; normalizing it yields the attention weight $\alpha_t$, which is used to weight $h_t$ and aggregate the results into Z. Z represents the entire input text and contains the sentiment information of all its words.
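The attention layer of Formulas (15)–(17) can be sketched as follows (our illustration; PyTorch and the dimension of 256, matching the BiLSTM output, are assumptions):

```python
# Word-level attention sketch following Formulas (15)-(17).
import torch
import torch.nn as nn

class WordAttention(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.proj = nn.Linear(dim, dim)            # W_w and b_w
        self.u_w = nn.Parameter(torch.randn(dim))  # context vector, learned

    def forward(self, h):                          # h: (batch, seq_len, dim)
        u = torch.tanh(self.proj(h))               # Formula (15)
        scores = u @ self.u_w                      # similarity with u_w: (batch, seq_len)
        alpha = torch.softmax(scores, dim=1)       # Formula (16)
        z = (alpha.unsqueeze(-1) * h).sum(dim=1)   # Formula (17): (batch, dim)
        return z, alpha
```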
After obtaining the text vector containing all word information, convolution is applied to it. Convolution extracts the most influential local features of the input through a filter, which reduces the data dimensionality and gives the model position invariance. The convolution is performed according to Formula (1).
After convolution, we perform pooling. The pooling layer further compresses the convolutional feature vector, reducing its dimensionality and the computational complexity. Traditional pooling methods include max pooling and average pooling; max pooling alone retains the most important feature but may discard other important ones. This article therefore applies max pooling and average pooling in parallel: the two operations are applied independently to the convolutional feature vector, and their results are concatenated into the text vector $L_c$, which is sent to the fully connected layer. This improves the robustness of the feature vector while retaining important features as much as possible.
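The parallel pooling can be sketched as below (our illustration; the shapes are assumptions):

```python
# Parallel max and average pooling over the same convolutional feature map.
import torch
import torch.nn.functional as F

feature_map = torch.randn(2, 128, 10)   # (batch, channels, positions) after Conv1d

max_pooled = F.adaptive_max_pool1d(feature_map, 1).squeeze(-1)  # (batch, 128)
avg_pooled = F.adaptive_avg_pool1d(feature_map, 1).squeeze(-1)  # (batch, 128)

L_c = torch.cat([max_pooled, avg_pooled], dim=1)  # (batch, 256), fed to the FC layer
```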
After obtaining the pooled vector $L_c$, this paper uses batch normalization to speed up network training and reduce overfitting [31]. To predict the emotional polarity of a review, the fully connected layer converts the text vector $L_c$ into the final sentiment representation X, which is sent to the output layer for the binary classification task using the sigmoid function.
In the output layer, the sigmoid function converts the sentiment representation X into the approximate probability Y of each polarity category, mapping the input to the interval $[0, 1]$. A Y close to 0 indicates a negative sentiment category, and a Y close to 1 a positive one; the emotional label of the input text is thus obtained, completing the positive/negative binary sentiment analysis task.
$Y = \mathrm{sigmoid}(W \cdot X + b)$ (18)
where X is the high-level sentiment representation obtained through the fully connected layer, and W and b are the parameter matrix and bias term learned during training.
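A sketch of the classification head follows (our illustration): batch normalization of $L_c$, a fully connected layer producing X, and a sigmoid output. The hidden size and activation are assumptions not stated in the paper.

```python
# Classification head sketch; hidden size and activation are assumptions.
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    def __init__(self, in_dim: int = 256, hidden_dim: int = 128):
        super().__init__()
        self.bn = nn.BatchNorm1d(in_dim)         # normalizes the pooled vector L_c
        self.fc = nn.Linear(in_dim, hidden_dim)  # produces the sentiment representation X
        self.out = nn.Linear(hidden_dim, 1)      # W and b of Formula (18)

    def forward(self, L_c):
        X = torch.relu(self.fc(self.bn(L_c)))    # final sentiment representation
        Y = torch.sigmoid(self.out(X))           # approximate probability of the
        return Y.squeeze(-1)                     # positive class (near 1 = positive)
```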

5. Experiment and Analysis

In this section, we evaluate our model on the sentiment analysis task.

5.1. Experimental Setup

5.1.1. Dataset

The dataset used in this article is a public dataset of Weibo texts collected during the COVID-19 epidemic. Data were collected with subject keywords related to ‘COVID-19’ between 1 January 2020 and 20 February 2020, and more than 1,000,000 Weibo posts were labelled. The labels fall into two categories: P (representing positive text) and N (representing negative text). We selected more than 100,000 items as our dataset: 102,830 comments in total, of which 50,830 are positive and 52,000 are negative. Examples of the emotional labels are shown in Table 1:

5.1.2. Evaluation Criteria

The experiment uses four evaluation criteria: Accuracy (Acc), Precision (P), Recall (R) and F1-measure (F1), which are widely used in text classification and sentiment analysis tasks.
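These criteria can be computed, for example, with scikit-learn (an assumed tooling choice; the labels below are invented):

```python
# Computing Acc, P, R and F1 with scikit-learn; toy labels for illustration.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0]   # 1 = positive (P), 0 = negative (N)
y_pred = [1, 0, 1, 0, 0]

print("Acc:", accuracy_score(y_true, y_pred))
print("P:",   precision_score(y_true, y_pred))
print("R:",   recall_score(y_true, y_pred))
print("F1:",  f1_score(y_true, y_pred))
```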

5.1.3. Data Preprocessing

First, the text data need to be segmented. This article uses the Jieba tokenizer to segment the comments. Before segmentation, all words of the constructed sentiment dictionary are added to the Jieba tokenizer so that sentiment words are not split into two words, which would affect the result of sentiment analysis. After segmentation, illegal characters and stop words are removed from the results, which improves the performance of the model. A minimal sketch of this pipeline is shown below.
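(Our sketch; the lexicon entries and stop-word list are placeholders.)

```python
# Preprocessing sketch: register lexicon words with Jieba, segment, drop stop words.
import jieba

sentiment_words = ["致敬", "发烧"]   # words from the constructed sentiment dictionary
for w in sentiment_words:
    jieba.add_word(w)               # prevents sentiment words from being split apart

stop_words = {"的", "了", "是"}     # placeholder stop-word list

def preprocess(text: str):
    tokens = jieba.lcut(text)
    return [t for t in tokens if t not in stop_words and t.strip()]

print(preprocess("我向奋战在一线的医护人员致敬"))
```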

5.2. Baselines

In order to verify the effectiveness of the proposed model in analyzing the sentiment of Weibo text comments collected during the epidemic, we designed the following comparative experiment.
  • CNN [32]: The most basic convolutional neural network for sentiment analysis.
  • CNN+Att [33]: After the convolutional neural network extracts the main features, it uses the attention mechanism to give different degrees of attention to the extracted features, and then classifies the feature vectors with different attention weights.
  • BiLSTM [8]: Using two LSTM networks, it can process information in both forward and backward directions, effectively fusing text contextual content.
  • BiLSTM+Att [34]: The attention mechanism is introduced into the BiLSTM network to assign different weights to different words, which can better reflect the importance of different words.
  • BERT [24]: A powerful and open source text pre-training model based on the transformer structure.
  • BERT+BiLSTM+Att [27]: The words are dynamically converted into word vectors through BERT, so that the converted word vectors are closer to the context. The generated word vectors are used to extract the sentiment features through the BiLSTM network, and the extracted feature vectors are given different attention weights through the attention mechanism.
  • AC-BiLSTM [17]: A new framework for sentiment analysis that combines bidirectional LSTM (BiLSTM) and asymmetric CNN (ACNN) with high classification accuracy.
  • ABCDM [18]: The model introduces a parallel mechanism so that BiLSTM and BiGRU can work at the same time. The addition of the attention mechanism allows the model to allocate weights more reasonably, and finally extract features through CNN.

5.3. Experimental Results

To evaluate the performance of our proposed model, we randomly divide the dataset into ten parts, using nine as the training and validation sets and the remaining one as the test set, and take the result on the test set as the evaluation of our model.
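For illustration (our sketch, with placeholder data), such a 9:1 split could be implemented with scikit-learn:

```python
# 9:1 train+validation / test split sketch; data below are placeholders.
from sklearn.model_selection import train_test_split

texts = ["评论一", "评论二", "评论三", "评论四"]   # placeholder Weibo comments
labels = [1, 0, 1, 0]                              # 1 = positive, 0 = negative

train_val_x, test_x, train_val_y, test_y = train_test_split(
    texts, labels, test_size=0.1, random_state=42, shuffle=True)
```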
The performance of the model is affected by the number of iterations, so the optimal number for the model is explored experimentally. The results are shown in Figure 5 and Table 2: when the number of iterations is eight, performance is optimal. Beyond eight, performance begins to decline because of overfitting; below eight, performance is also suboptimal because the model has not learned the features sufficiently.
The input length of the text also affects the performance of the model, so choosing an appropriate input length is very important. Reviews longer than the selected length are truncated, and shorter ones are padded with zeros. We experimented with the maximum text length in the dataset (648) and the average text length (12). The results, shown in Table 3 and Figure 6, indicate that the model performs better at the average text length. This is because Weibo posts are mostly short texts: with a longer input length, most texts must be padded with many zeros, which reduces accuracy while increasing time cost, thereby degrading the performance of the model.
We introduce dropout into the model to increase its generalization ability and prevent overfitting, which further improves the network’s performance. Comparison experiments with different dropout values show that the best result is obtained at a dropout of 0.6, which effectively prevents overfitting. The results are shown in Table 4 and Figure 7.
Through the above experiments, the best parameters of the model are obtained. The parameter settings of the model are shown in Table 5:
The experiment was conducted on the collected Weibo text dataset related to the COVID-19 epidemic. The results of the proposed model and the competing models are shown in Table 6. The competing models fall into three groups: traditional deep learning models, including CNN and BiLSTM; transformer-based models, such as BERT; and combined models.
As the table shows, although traditional deep learning models need no manually labelled features, their performance is not satisfactory. It is worth noting that, although both are deep learning models, the accuracy of the BiLSTM network is noticeably higher than that of the CNN, because BiLSTM considers the context of the text from both directions and captures long-range dependencies. The attention mechanism also increases accuracy by giving different degrees of attention to different words. Introducing the BERT language model increases classification accuracy significantly: unlike traditional language models such as GloVe and word2vec, BERT dynamically generates word vectors based on context before they are used in downstream tasks.
The proposed model outperforms most competing models in classification accuracy, reflecting its superiority. Its advantages are mainly the following: first, the model combines the strengths of BiLSTM, the CNN and the attention mechanism; second, it uses the more advanced BERT language model to mine the deep semantics of words and dynamically generate high-quality word vectors; finally, it introduces an external sentiment dictionary when constructing the word vectors, strengthening their sentiment intensity.
The experimental results also show that the accuracy and other performance indicators of the proposed model are lower than those of the ABCDM model. This is because ABCDM uses BiLSTM and BiGRU in parallel, giving it the ability to process long and short texts at the same time, so its performance is slightly better than that of the model proposed in this paper. This observation suggests that our model should also consider parallel mechanisms as a next step.
Besides, in order to study the impact of the sentiment dictionary and the CNN on the proposed model, we established four controlled experiments for an ablation study based on the model proposed in this article:
−CNN − Lexicon: We remove the sentiment dictionary and the convolutional neural network from our model, so the remaining model is equivalent to BERT + BiLSTM + Attention. BERT dynamically converts words into word vectors, which are sent to the BiLSTM network to extract context information; the attention mechanism then gives different word vectors different degrees of attention, and the result is sent to the output layer for classification.
−CNN + Lexicon: We remove the convolution and pooling operations (pooling here refers to the parallel max and average pooling whose results are concatenated, the same below) and retain the external sentiment dictionary. This experiment explores the influence of the sentiment dictionary on the model.
+CNN − Lexicon: We retain the convolution and pooling operations but do not introduce the external sentiment dictionary, to learn the effect of the CNN on the model.
+CNN + Lexicon: All components of the model are retained.
The experimental results are shown in Table 7 and Figure 8. Compared with removing both the CNN and the sentiment dictionary, retaining all components increases accuracy by 2.7%. Adding the CNN alone improves accuracy by 1.3%, because the CNN extracts local features and reduces the dimensionality of the feature vector, and concatenating the average pooling result with the max pooling result retains important features and increases robustness. It can also be observed that introducing the sentiment dictionary strengthens the model, raising accuracy by 1.6%: emotionally strong words play a significant role in judging the sentiment tendency of the whole text, and weighting the word vectors with the dictionary increases their sentiment strength, thereby improving classification accuracy.

6. Conclusions

At present, deep learning models are popular in sentiment analysis, but existing traditional models still leave room for improvement in accuracy. This paper proposes a new model based on BERT and deep learning algorithms for sentiment analysis. The model uses BERT to convert the words of a text into word vectors and introduces a sentiment dictionary to enhance their sentiment intensity; a BiLSTM network then extracts forward and reverse contextual information. To emphasize different words to different degrees, an attention mechanism is applied to the output of the BiLSTM network. Convolution and pooling operations extract the main features, reducing the dimensionality of the feature vector and increasing the robustness of the model. Long-distance dependencies and local features of the text are thus both handled effectively.
We conducted experiments on the collected COVID-19 Weibo text dataset, where the proposed model outperformed all comparison models except ABCDM. We also set up ablation experiments to explore the effects of the CNN and the sentiment dictionary on the proposed model.
The research in this article points to two directions for future work. One is that the BERT model consumes huge hardware resources and requires a huge corpus for training; how to train a high-quality, lightweight BERT model and apply it to downstream tasks is a future research direction. The other is that, since accuracy improved after introducing an external sentiment dictionary, our next goal is to introduce more external knowledge, such as part-of-speech information.

Author Contributions

Conceptualization, H.L.; Methodology, H.L. and Y.M.; Software, Z.M.; Validation, Y.M.; Resources, H.Z.; Writing—original draft preparation, Y.M.; Writing—review and editing, H.Z. and H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key Science Research Project of Colleges and Universities in Henan Province of China (No. 19A520009) and the National Science Foundation of China (No. 81501548).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request.

Acknowledgments

The authors would like to thank the editors and the anonymous reviewers for their helpful comments and suggestions, which have improved the presentation.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wang, T.; Lu, K.; Chow, K.P.; Zhu, Q. COVID-19 sensing: Negative sentiment analysis on social media in China via BERT model. IEEE Access 2020, 8, 138162–138169. [Google Scholar] [CrossRef]
  2. Hussain, A.; Cambria, E. Semi-supervised learning for big social data analysis. Neurocomputing 2018, 275, 1662–1673. [Google Scholar] [CrossRef] [Green Version]
  3. Asghar, M.Z.; Khan, A.; Ahmad, S.; Qasim, M.; Khan, I.A. Lexicon-enhanced sentiment analysis framework using rule-based classification scheme. PLoS ONE 2017, 12, e0171649. [Google Scholar] [CrossRef]
  4. Birjali, M.; Kasri, M.; Beni-Hssane, A. A comprehensive survey on sentiment analysis: Approaches, challenges and trends. Knowl.-Based Syst. 2021, 226, 107134. [Google Scholar] [CrossRef]
  5. Campos, V.; Jou, B.; Giro-i Nieto, X. From pixels to sentiment: Fine-tuning CNNs for visual sentiment prediction. Image Vis. Comput. 2017, 65, 15–22. [Google Scholar] [CrossRef] [Green Version]
  6. Dabre, R.; Chu, C.; Kunchukuttan, A. A survey of multilingual neural machine translation. ACM Comput. Surv. (CSUR) 2020, 53, 1–38. [Google Scholar] [CrossRef]
  7. Tai, K.S.; Socher, R.; Manning, C.D. Improved semantic representations from tree-structured long short-term memory networks. arXiv 2015, arXiv:1503.00075. [Google Scholar]
  8. Hameed, Z.; Garcia-Zapirain, B. Sentiment classification using a single-layered BiLSTM model. IEEE Access 2020, 8, 73992–74001. [Google Scholar] [CrossRef]
  9. Ahmed, M.; Chen, Q.; Li, Z. Constructing domain-dependent sentiment dictionary for sentiment analysis. Neural Comput. Appl. 2020, 32, 14719–14732. [Google Scholar] [CrossRef]
  10. Zhang, S.; Wei, Z.; Wang, Y.; Liao, T. Sentiment analysis of Chinese micro-blog text based on extended sentiment dictionary. Future Gener. Comput. Syst. 2018, 81, 395–403. [Google Scholar] [CrossRef]
  11. Dey, A.; Jenamani, M.; Thakkar, J.J. Senti-N-Gram: An n-gram lexicon for sentiment analysis. Expert Syst. Appl. 2018, 103, 92–105. [Google Scholar] [CrossRef]
  12. Han, H.; Zhang, J.; Yang, J.; Shen, Y.; Zhang, Y. Generate domain-specific sentiment lexicon for review sentiment analysis. Multimed. Tools Appl. 2018, 77, 21265–21280. [Google Scholar] [CrossRef]
  13. Wu, S.; Wu, F.; Chang, Y.; Wu, C.; Huang, Y. Automatic construction of target-specific sentiment lexicon. Expert Syst. Appl. 2019, 116, 285–298. [Google Scholar] [CrossRef]
  14. Wei, J.; Liao, J.; Yang, Z.; Wang, S.; Zhao, Q. BiLSTM with multi-polarity orthogonal attention for implicit sentiment analysis. Neurocomputing 2020, 383, 165–173. [Google Scholar] [CrossRef]
  15. Shuang, K.; Zhang, Z.; Guo, H.; Loo, J. A sentiment information collector–extractor architecture based neural network for sentiment analysis. Inf. Sci. 2018, 467, 549–558. [Google Scholar] [CrossRef] [Green Version]
  16. Wu, O.; Yang, T.; Li, M.; Li, M. Two-Level LSTM for Sentiment Analysis With Lexicon Embedding and Polar Flipping. IEEE Trans. Cybern. 2020, PP, 99. [Google Scholar] [CrossRef]
  17. Liang, D.; Zhang, Y. AC-BLSTM: Asymmetric convolutional bidirectional LSTM networks for text classification. arXiv 2016, arXiv:1611.01884. [Google Scholar]
  18. Basiri, M.E.; Nemati, S.; Abdar, M.; Cambria, E.; Acharya, U.R. ABCDM: An attention-based bidirectional CNN-RNN deep model for sentiment analysis. Future Gener. Comput. Syst. 2021, 115, 279–294. [Google Scholar] [CrossRef]
  19. González, J.A.; Pla, F.; Hurtado, L.F. ELiRF-UPV at SemEval-2017 task 4: Sentiment analysis using deep learning. In Proceedings of the 11th International Workshop on Semantic Evaluation (SEMEVAL-2017), Vancouver, BC, Canada, 3–4 August 2017; pp. 723–727. [Google Scholar]
  20. Alexandridis, G.; Aliprantis, J.; Michalakis, K.; Korovesis, K.; Tsantilas, P.; Caridakis, G. A Knowledge-Based Deep Learning Architecture for Aspect-Based Sentiment Analysis. Int. J. Neural Syst. 2021, 31, 2150046. [Google Scholar] [CrossRef]
  21. Alexandridis, G.; Varlamis, I.; Korovesis, K.; Caridakis, G.; Tsantilas, P. A Survey on Sentiment Analysis and Opinion Mining in Greek Social Media. Information 2021, 12, 331. [Google Scholar] [CrossRef]
  22. Gonzalez, J.A.; Hurtado, L.F.; Pla, F. TWilBert: Pre-trained deep bidirectional transformers for Spanish Twitter. Neurocomputing 2021, 426, 58–69. [Google Scholar] [CrossRef]
  23. Zhao, A.; Yu, Y. Knowledge-enabled BERT for aspect-based sentiment analysis. Knowl.-Based Syst. 2021, 227, 107220. [Google Scholar] [CrossRef]
  24. Alaparthi, S.; Mishra, M. BERT: A sentiment analysis odyssey. J. Mark. Anal. 2021, 9, 118–126. [Google Scholar] [CrossRef]
  25. Yenduri, G.; Rajakumar, B.R.; Praghash, K.; Binu, D. Heuristic-Assisted BERT for Twitter Sentiment Analysis. Int. J. Comput. Intell. Appl. 2021, 20, 2150015. [Google Scholar] [CrossRef]
  26. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
  27. Cai, R.; Qin, B.; Chen, Y.; Zhang, L.; Yang, R.; Chen, S.; Wang, W. Sentiment analysis about investors and consumers in energy market based on BERT-BiLSTM. IEEE Access 2020, 8, 171408–171415. [Google Scholar] [CrossRef]
  28. Liu, F.; Zheng, J.; Zheng, L.; Chen, C. Combining attention-based bidirectional gated recurrent neural network and two-dimensional convolutional neural network for document-level sentiment classification. Neurocomputing 2020, 371, 39–50. [Google Scholar] [CrossRef]
  29. Liu, G.; Guo, J. Bidirectional LSTM with attention mechanism and convolutional layer for text classification. Neurocomputing 2019, 337, 325–338. [Google Scholar] [CrossRef]
  30. Mao, X.; Chang, S.; Shi, J.; Li, F.; Shi, R. Sentiment-Aware Word Embedding for Emotion Classification. Appl. Sci. 2019, 9, 1334. [Google Scholar] [CrossRef] [Green Version]
  31. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 448–456. [Google Scholar]
  32. Kim, Y. Convolutional neural networks for sentence classification. arXiv 2014, arXiv:1408.5882. [Google Scholar]
  33. Shin, B.; Lee, T.; Choi, J.D. Lexicon integrated cnn models with attention for sentiment analysis. arXiv 2016, arXiv:1610.06272. [Google Scholar]
  34. Xie, J.; Chen, B.; Gu, X.; Liang, F.; Xu, X. Self-attention-based BiLSTM model for short text fine-grained sentiment classification. IEEE Access 2019, 7, 180558–180570. [Google Scholar] [CrossRef]
Figure 1. BERT structure.
Figure 2. Convolutional neural network structure.
Figure 3. BiLSTM structure.
Figure 4. Proposed model structure.
Figure 5. Experimental results of iterative experiments.
Figure 6. Experimental results of text length.
Figure 7. Experimental results of dropout value.
Figure 8. Comparison model experiment results.
Table 1. Specific cases of emotional polarity label.

Text | Label
I pay tribute to the medical staff who are fighting on the front line! | P
I have to infuse liquid for two days. It is too easy to get a fever on this day. | N
Table 2. Experimental results of iterative experiments.

Epoch | Accuracy | Precision | Recall | F1
3 | 91.6% | 94.4% | 88.3% | 91.3%
5 | 91.5% | 94.9% | 87.8% | 91.2%
7 | 91.7% | 94.7% | 88.3% | 91.4%
8 | 92.6% | 93.8% | 91.4% | 92.6%
9 | 92.1% | 92.6% | 91.5% | 92.0%
11 | 91.2% | 88.4% | 95.0% | 91.6%
13 | 91.6% | 92.9% | 90.3% | 91.6%
Table 3. Experimental results of text length.

Sentence Length | Accuracy | Precision | Recall | F1
12 | 92.2% | 94.1% | 90.1% | 92.1%
648 | 87.5% | 94.3% | 80.1% | 86.7%
Table 4. Experimental results of dropout value.

Dropout | Accuracy | Precision | Recall | F1
0.2 | 92.1% | 94.2% | 90.0% | 92.0%
0.3 | 92.2% | 93.5% | 90.6% | 92.0%
0.4 | 91.8% | 89.4% | 94.9% | 92.1%
0.5 | 92.2% | 92.3% | 91.9% | 92.1%
0.6 | 92.7% | 93.0% | 92.4% | 92.7%
0.7 | 90.5% | 86.8% | 95.3% | 90.8%
Table 5. Model parameter setting table.

Parameter | Parameter Value
Dimension of the word vector | 768
Length of the input text | 12
Dropout | 0.6
Epoch number | 8
Number of hidden neurons in the BiLSTM layer | 128
Number of hidden neurons in the CNN layer | 128
Size of the convolutional kernel | 3 × 3
Learning rate | 0.01
Table 6. Comparison model experiment results.

Model | Accuracy | Precision | Recall | F1
CNN | 73.5% | 73.7% | 72.1% | 72.9%
CNN + Att | 77.6% | 78.1% | 76.0% | 77.0%
BiLSTM | 83.6% | 83.7% | 82.9% | 83.3%
BiLSTM + Att | 86.5% | 86.4% | 86.2% | 86.3%
BERT | 84.4% | 85.5% | 82.4% | 83.9%
BERT + BiLSTM + Att | 89.7% | 90.9% | 87.8% | 89.3%
AC-BiLSTM | 91.6% | 93.0% | 89.8% | 91.4%
ABCDM | 93.1% | 93.7% | 92.2% | 92.9%
OURS | 92.4% | 92.5% | 92.4% | 92.5%
Table 7. Ablation experiment results.

Model | Accuracy | Precision | Recall | F1
−CNN − Lexicon | 89.7% | 90.9% | 87.8% | 89.3%
−CNN + Lexicon | 91.0% | 91.9% | 89.8% | 90.8%
+CNN − Lexicon | 91.3% | 93.3% | 88.8% | 91.1%
+CNN + Lexicon | 92.4% | 92.5% | 92.4% | 92.5%
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
