Running a Sustainable Social Media Business: The Use of Deep Learning Methods in Online-Comment Short Texts

With the prevalence of the Internet in society, social media has considerably altered the ways in which consumers conduct their daily lives and has gradually become an important channel for online communication and sharing activities. At the same time, the speed and accuracy with which companies can collect and interpret online data affect their sales and competitiveness; therefore, obtaining consumer opinions from online platforms has become urgent. However, problems such as sparse features and semantic loss exist in short-text online reviews; therefore, this article uses several deep learning techniques and related neural network models to perform a sentiment analysis of Weibo online-review short texts. The results show that, compared with the vector representation generated by Word2Vec’s CBOW model, BERT’s word vectors obtain better sentiment analysis results. Compared with the CNN, BiLSTM, and BiGRU models, the improved BiGRU-Att model can effectively improve the accuracy of the sentiment analysis. Therefore, deep learning neural network systems can improve the quality of the sentiment analysis of short-text online reviews, overcome the problems of too many unfamiliar words and low feature density in short texts, and provide an efficient and convenient computational method for improving the ability to perform sentiment analysis of short-text online reviews. Through deep learning methods, enterprises can use online data to analyze and immediately grasp the intentions of existing or potential consumers towards the company or product, and develop new services or sales plans that are more closely aligned with consumers to increase competitiveness. When consumers use new services or products again, they may provide feedback online.
In this situation, companies can use deep learning sentiment analysis models to perform additional analyses, forming a dynamic cycle to ensure the sustainable operation of their enterprises.


Introduction
In recent years, social media has become a major part of people's lives and has significantly impacted the way businesses communicate with their customers. However, with the increasing amount of online content and user-generated comments, it has become increasingly challenging for businesses to monitor and manage the sentiments and opinions of their customers. With the popularity of the Internet, social media is changing almost every aspect of people's daily lives and is gradually becoming an important channel for Internet users to communicate and share their ideas [1]. Because of their convenience and rapidity, Microblog, WeChat, QQ, and Zhihu influence people's daily lives and help them stay aware of the latest current affairs and exchange their thoughts [2]. Microblog is a leading social media platform for people to create, share, and discover information online. It not only provides users with more freedom and a faster way to communicate, express their opinions, and record their feelings, but it also has the advantages of rich information, simple operation, rapid dissemination, and a wide-ranging audience. As a result, more and more people have begun to use it [3,4]. According to Microblog's Q3 2018 financial report, the number of monthly active Microblog users increased to 446 million, maintaining a year-on-year net growth of 70 million, and the number of daily active users also increased to 195 million [5].
Numerous short texts are created in the process of using Microblog to communicate, share, and disseminate information, and these texts carry people's emotions, such as happiness, sadness, and joy, and certain sentiments, such as like, dislike, and hate [6]. People desire to express their emotions while communicating through Microblog, such as their views on news events, opinions on social hot topics, movie comments, and item evaluations, so a large quantity of online data is generated every day. An effective sentiment analysis of these Microblog data has many practical uses. For consumers, it can provide a reference for purchasing goods. For e-commerce platforms, it can inform how goods are marketed on Microblog. It can also help relevant government departments make decisions and improve network security and public-opinion monitoring systems [7,8].
However, online-comment short texts sent on Microblog differ from traditional texts, such as those posted on forums and in the news. Their specific characteristics are as follows: First, the number of words in a text is limited to 140. In addition to general text and punctuation, such texts contain a variety of emoticons, special symbols, stop words, and pictures [9,10]. Second, the language used is not standardized because users' knowledge and language use differ; for example, network terms such as "skr" and "lemon essence" exist [11]. Over time, online-comment short texts on Microblog have come to exhibit a large volume of data, many neologisms, heavy colloquialism, and rich content, which pose a challenge to the technical requirements of mining users' sentiment orientations from large collections of online-comment short texts on Microblog [12]. In recent years, deep learning techniques have emerged as a powerful tool for analyzing and mining text data. One study proposed a deep learning approach to Chinese Microblog topic modeling [3]. Jiang and Yang (2019) developed a hybrid approach for product quality assessment based on user-generated content presented on social media [4]. Huang et al. (2019) used a hierarchical deep learning model to perform fine-grained emotion analysis on Chinese online reviews [5]. Nishida et al. (2019) proposed an active learning approach for social media text classification via query generation based on kernel density estimation [6]. Chen et al. (2020) developed a novel deep learning-based approach for detecting fake news on social media [7]. Furthermore, deep learning techniques have also been applied to text mining and sentiment analysis in social media (Guo and Zhang, 2019) [8,9].
Convolutional and recurrent neural networks have also been compared in the literature for sentiment analysis (Fang et al., 2019) [10], and event extraction and rumor detection in social media have been tackled by novel deep learning approaches (Fu et al., 2020) [13,14]. Therefore, a series of models is proposed in the present study for the construction and optimization of the sentiment analysis of online-comment short texts through deep learning technology and related neural network models, aiming to rapidly analyze the sentiment of online-comment short texts and facilitate the development of related e-commerce marketing, public-opinion monitoring, and other businesses. Moreover, precision marketing and services have been used in the field to enhance enterprise competitiveness and seek opportunities for sustainable operations. If enterprises can use online data and conduct sentiment analyses through deep learning techniques, they can rapidly grasp the reactions and apprehend the intentions of existing and potential consumers concerning their products. Based on this information, enterprises can further modify, develop, or produce sales, services, and new products that are more in line with current consumer needs. When consumers receive new services or products, they provide feedback online, which forms new online data. At this point, enterprises can use deep learning sentiment analysis models to perform further analyses, creating a dynamic cycle and achieving sustainable business operations.

The BERT Model
BERT (bidirectional encoder representations from transformers) is a representative bidirectional encoder model based on the Transformer [15]. The BERT model was proposed by Google in 2018; it uses the Transformer encoder as its main structure and is built by stacking the encoder blocks of multiple Transformer models. The structure is presented in Figure 1 [16].

Figure 1 shows that the BERT model uses MLM (masked language model) and next-sentence prediction methods to capture sentiment representations at the word and sentence levels, respectively [17]. The main structure of the BERT model is stacked Transformer blocks, and the Transformer's text encoding is based on the attention mechanism [18]. Therefore, different weights can be assigned to each word according to the relationships evident between the words and sentences [19].
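To make the MLM pre-training idea concrete, the following is a minimal sketch of BERT-style token masking in pure Python. The 15% masking rate and the 80/10/10 split ([MASK] / random word / unchanged) follow the published BERT procedure; the token lists, vocabulary, and function name are illustrative, not part of the paper's implementation.

```python
import random

def mask_tokens(tokens, vocab, mask_rate=0.15, rng=None):
    """BERT-style MLM masking: select ~15% of tokens as prediction targets;
    of those, 80% become [MASK], 10% a random vocabulary word, 10% stay as-is."""
    rng = rng or random.Random(0)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            labels.append(tok)            # the model must recover this token
            roll = rng.random()
            if roll < 0.8:
                masked.append("[MASK]")   # 80%: replace with the mask symbol
            elif roll < 0.9:
                masked.append(rng.choice(vocab))  # 10%: replace with a random word
            else:
                masked.append(tok)        # 10%: keep the original word
        else:
            labels.append(None)           # not part of the MLM loss
            masked.append(tok)
    return masked, labels
```

During pre-training, the model is asked to predict every position whose label is not `None`, which is what forces the encoder to use bidirectional context.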

CNN (Convolutional Neural Network)
The CNN traces back to the 1960s work of Hubel and Wiesel; its concept comes from the neocognitron model, proposed by the Japanese scholar Fukushima based on the concept of the receptive field [20]. The core of a CNN is capturing local correlations. The CNN is mainly used in the field of image recognition, where it has achieved great success in recognizing different types of images, such as plants, animals, and food [21]. Yi used the CNN to identify different types of birds [22]. The convolution layer is the core of the whole CNN. Unlike the convolution layer in the image field, the text CNN no longer uses the square convolution kernel common in image processing but instead uses a rectangular convolution kernel, which works well for acquiring short-range semantics in a text [23,24]. In one study, Sun proposed a model for text analysis using the CNN. The segmented text obtained after pre-processing is transformed into word vectors by a word vector tool and used as the input of the CNN, and labeled texts are used to train the model. The trained model is then used to extract features for the text analysis [25]. The main features of the CNN are local perception, parameter sharing, multi-kernel convolutions, and downsampling. A CNN is good at extracting short-range semantic features in a text analysis [26].
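As a minimal sketch of the rectangular-kernel idea above: for text, the kernel spans the full word-vector dimension m and slides only down the word axis, so an n × m text matrix and an h × m kernel yield a feature map of length n − h + 1, which is then max-pooled. The matrices and kernel values below are toy examples.

```python
def conv1d_text(matrix, kernel):
    """Slide a rectangular kernel (h words x m dims) down an n x m text
    matrix; each window produces one feature, giving n - h + 1 features."""
    n, m = len(matrix), len(matrix[0])
    h = len(kernel)
    feats = []
    for i in range(n - h + 1):
        s = sum(matrix[i + r][c] * kernel[r][c]
                for r in range(h) for c in range(m))
        feats.append(s)
    return feats

def max_pool(feats):
    """1-max pooling keeps the strongest activation of the feature map."""
    return max(feats)
```

Because the kernel covers whole word vectors, each feature summarizes an h-word window, which is exactly the "short-range semantics" the paragraph above refers to.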


Recurrent Neural Network
The RNN (recurrent neural network) is a kind of neural network model used for processing sequence data. In comparison to the traditional neural network model, the nodes in the hidden layer of the network model are connected, which makes the structure and parameters of the RNN shared at each time point [27]. The greatest feature of sequence data-processing behavior is that the output of the current moment contains the output of the previous moment, which continues to preserve the information of the previous moment through the cyclic preservation process. This feature provides the RNN with "memory" [28]. Figure 2 presents the structure of the RNN.
When the RNN is unrolled in time, the computation of the recurrent network is obtained, as shown in Equations (1) and (2) [29]:

st = f(Uxt + wst−1) (1)

ot = g(Vst) (2)

Here, t and t + 1 are the time steps; x is the input sample; st is the memory of the sample at time t; U is the weight from the input layer to the hidden layer; w is the weight of the sample in the hidden layer; V is the weight of the output sample (w, U, and V are shared across all time steps); and f and g are activation functions: f can be tanh, ReLU, sigmoid, or another activation function, and g can be SoftMax or another function. In the initial state, st is usually set to 0, and the hidden-layer output of the network at time t is ot = softmax(Vst).
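Equations (1) and (2) can be sketched as one unrolled step in pure Python. Treating w as a shared scalar hidden-layer weight is a simplification for readability (in practice it is a matrix), and f = tanh, g = softmax are the choices named above.

```python
import math

def softmax(v):
    """g in Equation (2): normalize scores into a probability distribution."""
    e = [math.exp(x) for x in v]
    s = sum(e)
    return [x / s for x in e]

def rnn_step(x_t, s_prev, U, w, V):
    """One unrolled RNN step:
    Eq. (1): s_t = f(U x_t + w s_{t-1}) with f = tanh
    Eq. (2): o_t = g(V s_t) with g = softmax."""
    s_t = [math.tanh(sum(U[i][j] * x_t[j] for j in range(len(x_t)))
                     + w * s_prev[i])
           for i in range(len(U))]
    o_t = softmax([sum(V[k][i] * s_t[i] for i in range(len(s_t)))
                   for k in range(len(V))])
    return s_t, o_t
```

Feeding the returned s_t back in as s_prev at the next step is what gives the network its "memory" of earlier inputs.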
The memory of the RNN is limited. As the sequence length increases, the amount of long-distance information that can be memorized decreases and the information-processing time increases, resulting in the problems of gradient vanishing and explosion. Given this situation, researchers proposed the LSTM (long short-term memory) model, which controls the input and output of the current time-step information through "gate" structures: the forget, input, and output gates. To simplify the model structure, the GRU (gated recurrent unit), a variant of the LSTM, can also solve the gradient vanishing and explosion problems in the long-term dependencies of recurrent neural networks such as the LSTM model [30].
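A minimal scalar sketch of the GRU update mentioned above (real GRUs use weight matrices; the scalar weights and function name here are illustrative): the update gate z decides how much of the previous state survives, and the reset gate r gates the candidate state.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    """One scalar GRU update:
    z = sigmoid(Wz x + Uz h)        -- update gate
    r = sigmoid(Wr x + Ur h)        -- reset gate
    h~ = tanh(Wh x + Uh (r * h))    -- candidate state
    h' = (1 - z) * h + z * h~       -- blend old state and candidate."""
    z = sigmoid(Wz * x + Uz * h_prev)
    r = sigmoid(Wr * x + Ur * h_prev)
    h_cand = math.tanh(Wh * x + Uh * (r * h_prev))
    return (1.0 - z) * h_prev + z * h_cand
```

Because z blends the old state directly into the new one, gradients can flow through many steps without vanishing, which is how the GRU addresses the long-term-dependence problem with two gates instead of the LSTM's three.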

The BiGRU Network Model
The BiGRU (bi-directional gated recurrent unit) network model combines two common unidirectional GRU networks into a two-way GRU network structure to obtain the sequence characteristics of the text. The structure of the network model of BiGRU is presented in Figure 3 [31].

Figure 3 shows that, since a single GRU neural network lacks consideration of the following context in a text analysis, a two-way GRU network can be used to obtain the semantics of the context [17]. Here, V(W1), V(W2), and V(W3) represent the word vectors transformed by the vector representation model, and h1, h2, and h3 are the final vectors obtained by splicing following the feature selection of the forward and reverse GRU networks [32].
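The bidirectional splicing that produces h1, h2, and h3 can be sketched generically: run a recurrent step over the sequence forwards and backwards, then concatenate the two hidden states at each position. The toy `step` function passed in below is illustrative; in the actual model it would be a GRU cell.

```python
def bi_encode(vectors, step):
    """Run a recurrent `step(x, h) -> h` over the word vectors in both
    directions and splice the two hidden states at each position, as the
    BiGRU layer does to form h1, h2, h3."""
    fwd, h = [], 0.0
    for v in vectors:                 # forward pass: left-to-right context
        h = step(v, h)
        fwd.append(h)
    bwd, h = [], 0.0
    for v in reversed(vectors):       # backward pass: right-to-left context
        h = step(v, h)
        bwd.append(h)
    bwd.reverse()                     # realign backward states with positions
    return [(f, b) for f, b in zip(fwd, bwd)]
```

Each spliced pair sees both the words before it (forward state) and the words after it (backward state), which is exactly the "following context" a single GRU misses.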

The Attention Mechanism
As shown in Figure 4, V(W1), V(W2), and V(W3) are the word vectors transformed by the vector representation model, and h1, h2, and h3 are the final vectors obtained by splicing the forward and reverse GRU feature selections, which are input into the attention mechanism. y is the final result achieved following the analysis [33]. The specific calculations are presented in Equations (3)–(6):

ut = tanh(Wht + b) (3)

et = ut⊤u (4)

at = exp(et)/∑t exp(et) (5)

y = ∑t atht (6)

Here, t = 1, 2, 3, …; u is the parameter of the attention mechanism; h is the output vector value; and at is the normalized attention score, which is obtained by Equation (5). y is the summation of the products of the output vector values h of the BiGRU layer and the normalized attention scores at, as presented in Equation (6).
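A minimal sketch of the attention computation, under the common formulation in which each hidden state is projected with tanh, scored against the context parameter u, softmax-normalized, and used to weight the sum (the exact split of these steps across Equations (3)–(6), and the scalar weights W, b, are assumptions for illustration):

```python
import math

def attention(hs, u, W, b):
    """Attention over BiGRU outputs hs:
    project each h (Eq. 3), score against context u (Eq. 4),
    softmax-normalize to a_t (Eq. 5), weighted-sum to y (Eq. 6)."""
    us = [math.tanh(W * h + b) for h in hs]      # projected hidden states
    es = [ut * u for ut in us]                   # similarity with parameter u
    exps = [math.exp(e) for e in es]
    ats = [e / sum(exps) for e in exps]          # normalized attention scores
    y = sum(a * h for a, h in zip(ats, hs))      # weighted sum of outputs
    return ats, y
```

Hidden states with higher scores against u receive larger weights a_t, which is how the mechanism emphasizes sentiment-bearing words over neutral ones.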

The Vector Representation Model of Online-Comment Short Texts
In the research conducted on sentiment analysis methods based on deep learning networks, the vectorization of texts is an important step, and it is necessary to use vector representation technology to represent the text as a vector and then as the input of the sentiment analysis model based on deep learning methods.

Construction of the Vector Representation Model
The vector representation model is used to represent texts as vectors, thereby providing complete data for the sentiment analysis model based on deep learning methods. Prior to the text vector representation stage, the texts must be pre-processed, and then vector representation models, such as Word2Vec (Word To Vector), are used to transform texts into vectors. The vector representation process of online-comment short texts on Microblog is presented in Figure 5.
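The pre-processing stage described above can be sketched as follows. The regex, the stop-word list, and the whitespace split are illustrative stand-ins; for Chinese Microblog text, a segmenter such as jieba would replace the split.

```python
import re

STOP_WORDS = {"the", "a", "of"}   # illustrative stop-word list

def preprocess(text):
    """Pre-processing sketch for Figure 5: clean irrelevant symbols,
    segment the text (whitespace split stands in for a real segmenter
    such as jieba), and remove stop words."""
    cleaned = re.sub(r"[^\w\s]", " ", text)   # data cleaning: strip symbols
    tokens = cleaned.lower().split()          # word segmentation
    return [t for t in tokens if t not in STOP_WORDS]
```

The resulting token list is what the vector representation model (Word2Vec or BERT) consumes to build the final text vector.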

Figure 5 shows that the online-comment short texts presented on Microblog must be pre-processed and feature-selected prior to the vector representation step. The pre-processing step can clean out the redundant and disorderly data present in the short texts; feature selection reduces the number of features present and the feature dimensions. Following the vector representation step, the feature vectors of multiple features must be spliced together to form the final text vector representation.
A vector representation method of online-comment short texts on Microblog based on the BERT model was proposed to transform the short texts into vector representations that could input the sentiment analysis model, aiming at the problem of sparse features and a lack of short-text semantics on Microblog. This method uses the BERT model to embed words into the pre-processed short texts and analyzes a polysemy while the texts are transformed into vectors, thus generating more accurate text representation vectors.

The Vector Representation Model of Short Texts Based on the BERT Model
At present, Word2Vec and GloVe are mostly used in the field to train word vectors in the sentiment analysis models of online-comment short texts on Microblog based on deep learning methods. However, word vectors trained by these vector representation models are static vector representations; that is, the vector representations of the same word in different contexts are identical. This leads to some deviations in the results of the sentiment analysis. For example, the word "apple" can represent either the fruit or the Apple company. If the word "apple" is represented as the same vector in these different short texts, a certain deviation is evident in subsequent sentiment analyses. Since numerous polysemous words exist in the online-comment short texts on Microblog and word vector models such as Word2Vec cannot express and process them well, a text vector representation method based on BERT embedding was proposed to improve the semantic representation of the short-text vectors.
Each word input in the BERT model consists of three parts: the token embedding, which encodes the current word; the segment embedding, which encodes the sentence (segment) to which the current word belongs; and the position embedding, which encodes the position of the current word in the sentence. The details are presented in Figure 6.
Figure 6 shows that each text composing the input, such as Text 1 and Text 2, comprehensively considers the relationships between texts prior to using BERT to train the text vector, which is reflected by the segment and position vectors, respectively. The core idea of the BERT model for text vector representation is to calculate the relationships that exist between the texts in a sentence, which reflect the semantic correlations between different texts. Then, weights are allocated according to the importance of each text by using the encoding mechanism of the Transformer, so that the vector representation of the text contains richer semantics. Compared with the text vectors of Word2Vec, text vectors based on the BERT model are more universal in their application.
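The three-part input of Figure 6 is an element-wise sum of the token, segment, and position embeddings. A minimal sketch with toy lookup tables (the tables, tokens, and function name are illustrative; in BERT these are learned matrices):

```python
def bert_input_embedding(tokens, segment_ids, token_table, segment_table,
                         position_table):
    """Element-wise sum of the three vectors in Figure 6:
    token embedding + segment embedding + position embedding."""
    out = []
    for pos, (tok, seg) in enumerate(zip(tokens, segment_ids)):
        vec = [t + s + p for t, s, p in zip(token_table[tok],
                                            segment_table[seg],
                                            position_table[pos])]
        out.append(vec)
    return out
```

Because the segment and position components differ across sentences and positions, the same token receives a different input vector in different contexts, which is the starting point for BERT's context-dependent representations.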

Evaluation Indicators
Universal evaluation criteria were used to evaluate the sentiment analysis performance of the experimental models. Precision, recall, and F1-scores under the positive- and negative-emotion categories were used as the evaluation indicators of the models' analysis performance. The indicators and their corresponding contents are presented in Table 1.

Table 1. Evaluation indicators and their meanings.

TP: Both the prediction and the reality are positive.
FP: The prediction is positive, but the reality is negative.
TN: Both the prediction and the reality are negative.
FN: The prediction is negative, but the reality is positive.

The calculation equations for precision, recall, and F1-scores under the positive- and negative-emotion categories are presented in Equations (7)–(12). In these equations, P is the precision rate and R is the recall rate:

P = TP/(TP + FP)

R = TP/(TP + FN)

F1 = 2PR/(P + R)

Accuracy represents the analytical performance of the training model on the sample, as shown in Equation (13):

Accuracy = (TP + TN)/(TP + TN + FP + FN) (13)
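The metrics above can be computed directly from the confusion counts in Table 1; a short sketch (the function name and the example counts are illustrative):

```python
def prf1(tp, fp, tn, fn):
    """Precision, recall, F1, and accuracy from the confusion counts of Table 1."""
    p = tp / (tp + fp)                       # precision: correctness of positives
    r = tp / (tp + fn)                       # recall: coverage of true positives
    f1 = 2 * p * r / (p + r)                 # harmonic mean of P and R
    acc = (tp + tn) / (tp + fp + tn + fn)    # overall fraction correct
    return p, r, f1, acc
```

For the negative-emotion category, the same formulas apply with the roles of positive and negative swapped (TN takes the place of TP, and so on).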

The Sentiment Analysis Model Based on the BiGRU-Att Model
An increasing number of words are used to express people's opinions and emotions online, which poses a challenge to sentiment analysis techniques based on deep learning methods. The problems of sentiment analysis based on deep learning methods are less-advanced data pre-processing technology, inaccurate text vector representation, and feature extraction that depends on deep neural networks. The sentiment analysis process for online-comment short texts based on deep learning methods includes data pre-processing, text vector representation, and the sentiment analysis method, as shown in Figure 7.
Figure 7 shows that the data pre-processing method mainly includes data cleaning, text segmentation, and the removal of stop words. The data cleaning process removes content that is not relevant to subsequent sentiment analysis tasks, such as various symbols. Text segmentation uses word segmentation tools, especially jieba segmentation, to segment the text. Text vector representation is mainly used to represent the pre-processed text as a vector using Word2Vec, GloVe, and other neural networks, which then serves as the input of the deep learning model. Sentiment analysis refers to the use of machine learning, deep learning, and other technologies to construct models that perform the sentiment analysis of texts. Sentiment analysis based on deep learning methods should first train the model with the labeled data set, then use the trained optimal-parameter model to extract the text features, and finally use the analyzer to obtain the final sentiment analysis results. Performance evaluation uses common evaluation criteria to evaluate the performance of different sentiment analysis models.


The Construction of the Sentiment Analysis Model Based on the BiGRU-Att Model
Since the current sentiment analysis methods for online-comment short texts on Microblog based on deep neural networks do not focus on the importance of sentiment words or phrases when extracting text sentiment features, the attention mechanism is introduced, and a BiGRU-Att (bi-directional gated recurrent unit with attention) model is proposed by combining the attention mechanism with the BiGRU deep neural network. The attention mechanism can assign different weights to feature words according to their sentiment contributions, adding more weight to the words that contribute more sentiment. The BiGRU model can effectively capture the semantic information of online-comment short texts on Microblog. The model structure is presented in Figure 8. Figure 8 shows that the entire BiGRU-Att model is mainly divided into four layers: the input, BiGRU, attention, and output layers of the sentiment analysis. The contents of each layer are as follows:
The input layer: The vector matrix of the text is input. Given a text d = {s1, s2, ..., sn}, sn denotes a sentence that constitutes the text. Sentences can be regarded as sequences of words.
Given a sentence s = {W1, W2, ..., Wn}, Wn denotes a word composing the sentence. Word2Vec or other word vector representation tools are used to convert each word Wn into a word vector V(Wn) of dimension m. The input text is then represented as an n × m vector matrix, providing complete data for the BiGRU layer.
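As a minimal sketch of this input-layer construction, assume a toy three-word vocabulary and a word-vector dimension of m = 3 (both invented here; real Word2Vec vectors typically have hundreds of dimensions):

```python
# Toy word-vector table standing in for Word2Vec output; the words and
# the m = 3 dimensions are invented purely for illustration.
word_vectors = {
    "service": [0.2, 0.7, 0.1],
    "very":    [0.5, 0.1, 0.4],
    "good":    [0.9, 0.3, 0.8],
}
UNK = [0.0, 0.0, 0.0]  # fallback for out-of-vocabulary words

def text_to_matrix(words):
    """Map a segmented sentence of n words to an n x m vector matrix,
    the input expected by the BiGRU layer."""
    return [word_vectors.get(w, UNK) for w in words]

matrix = text_to_matrix(["service", "very", "good"])
assert len(matrix) == 3 and len(matrix[0]) == 3   # n x m = 3 x 3
```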
The BiGRU layer: Used to extract the semantic features of online-comment short texts on Microblog. A bidirectional GRU network is used: one GRU runs in the positive-order direction of the input sequence, and the other runs in the inverse-order direction. During feature extraction, there is no shared state between the GRUs in the two directions; the state of the forward GRU is transferred only along the forward direction, and that of the backward GRU only along the backward direction. The output results of the GRUs in the two directions are then spliced as the output values of the entire BiGRU layer, so that the process considers both the preceding and the following semantic context.
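The direction independence and output splicing described above can be illustrated with a scalar toy GRU; all weight values here are invented for illustration, and a real BiGRU layer uses learned weight matrices over vector-valued states:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Scalar GRU cell with fixed toy weights (invented for illustration).
W_Z, U_Z = 0.5, 0.4   # update gate
W_R, U_R = 0.3, 0.2   # reset gate
W_H, U_H = 0.8, 0.6   # candidate state

def gru_step(x, h):
    z = sigmoid(W_Z * x + U_Z * h)               # update gate
    r = sigmoid(W_R * x + U_R * h)               # reset gate
    h_tilde = math.tanh(W_H * x + U_H * r * h)   # candidate state
    return (1 - z) * h + z * h_tilde             # new hidden state

def bigru(sequence):
    """Run one GRU forward and one backward over the sequence; the two
    directions share no state, and their outputs are spliced per step."""
    h, forward = 0.0, []
    for x in sequence:                 # positive-order direction
        h = gru_step(x, h)
        forward.append(h)
    h, backward = 0.0, []
    for x in reversed(sequence):       # inverse-order direction
        h = gru_step(x, h)
        backward.append(h)
    backward.reverse()
    # Spliced output: each step now carries left and right context.
    return [(f, b) for f, b in zip(forward, backward)]

outputs = bigru([0.1, 0.9, 0.4])
assert len(outputs) == 3 and len(outputs[0]) == 2
```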
The attention layer: A structure added between the BiGRU and output layers. It takes the spliced vectors output by the BiGRU layer as input and automatically learns feature weights. According to the contribution of each word, it dynamically assigns different attention weights, adding additional weight to the features most relevant to the sentiment analysis so that the sentiment characteristics of the text become more obvious. The output vector of the attention layer therefore considers both the context and semantic information of the text and the key sentiment word features.
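A minimal sketch of such an attention step over (here scalar) BiGRU states, with an invented scoring weight standing in for learned parameters:

```python
import math

def attention(states):
    """Additive attention over spliced BiGRU states (scalars here for
    simplicity; the scoring weight is a toy value, not a learned one)."""
    w = 1.5  # toy scoring weight
    scores = [math.tanh(w * h) for h in states]
    exp = [math.exp(s) for s in scores]
    total = sum(exp)
    alphas = [e / total for e in exp]            # attention weights, sum to 1
    context = sum(a * h for a, h in zip(alphas, states))
    return alphas, context

# A state with a larger activation (e.g. a strong sentiment word)
# receives a larger attention weight.
alphas, context = attention([0.1, 0.9, 0.2])
assert abs(sum(alphas) - 1.0) < 1e-9
assert max(alphas) == alphas[1]
```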
The analysis output layer: A SoftMax analyzer is selected to perform the final sentiment analysis. After the attention layer assigns weights to the BiGRU layer's output features, the results are input into the SoftMax analyzer, which outputs the final integrated results in the form of an array. Since only positive and negative sentiments were considered in the study, the contents of the array represent the probabilities that the text's sentiment is positive or negative.
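This output step reduces to a two-class SoftMax; a minimal sketch with invented scores:

```python
import math

def softmax(logits):
    """Convert output scores into a probability array; with the two
    positive/negative classes the result is [p_positive, p_negative]."""
    m = max(logits)                        # subtract max for numerical stability
    exp = [math.exp(v - m) for v in logits]
    total = sum(exp)
    return [e / total for e in exp]

probs = softmax([2.0, 0.5])                # toy scores for [positive, negative]
assert abs(sum(probs) - 1.0) < 1e-9
assert probs[0] > probs[1]                 # the text is judged positive
```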

The Contrast Experiment
The sentiment analysis model based on the BiGRU-Att model proposed in this study was compared with the following three commonly used sentiment analysis models to verify its effect:

• The CNN sentiment analysis model.
• The BiLSTM (bi-directional long short-term memory) sentiment analysis model. In the comparative experiment, the BiLSTM model uses 100 hidden neurons and the tanh activation function.
• The BiGRU sentiment analysis model. This is the BiGRU-Att model with the attention layer removed, so that the output of the BiGRU connects directly to the output layer of the sentiment analysis.

Experimental Results and Analysis of the Short-Text Vector Representation Model Based on the BERT Model
In order to improve the quality of the word vectors and the semantic representation ability of online-comment short texts on Microblog, a vector representation method based on the BERT model was proposed to address the problem of sparse features in such short texts. The effects of two different word embedding methods were compared in the experiment: BERT and Word2Vec were used to obtain the text vectors, and the CNN model was selected as the analysis model for the comparative experiment. The experimental results are presented in Figure 9.
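The intuition behind this comparison can be caricatured as follows: a Word2Vec-style table assigns one fixed vector per word, while a BERT-style encoder produces context-dependent vectors. The words, vectors, and the context rule below are invented purely for illustration and do not reflect the actual transformer mechanism:

```python
# Static (Word2Vec-style) lookup: one vector per word, regardless of context.
static_vectors = {"apple": [0.3, 0.8]}   # toy vector, invented for illustration

def static_embed(word, context):
    return static_vectors[word]

# Contextual (BERT-style) embedding: the vector depends on surrounding words.
# This adjustment rule is a caricature, not the transformer mechanism.
def contextual_embed(word, context):
    base = static_vectors[word]
    if "eat" in context:                  # fruit sense
        return [base[0] + 0.5, base[1]]
    if "phone" in context:                # company sense
        return [base[0], base[1] + 0.5]
    return base

# The static model conflates the two senses; the contextual one separates them.
assert static_embed("apple", ["eat"]) == static_embed("apple", ["phone"])
assert contextual_embed("apple", ["eat"]) != contextual_embed("apple", ["phone"])
```

This sense-separating property is one reason a contextual representation can compensate for the low feature density of short texts.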


Figure 9 shows that the vector representation based on the BERT model performs better than that based on the Word2Vec model in the sentiment analysis. The reason for this result is that the BERT transformer mechanism can capture richer semantic representations, as can be observed in the derivation of the word vector representation model in the second chapter. In summary, the pre-trained word vectors produced by the BERT model outperform those of the CBOW (continuous bag-of-words) model based on Word2Vec.

Experimental Results and Analysis of the Sentiment Analysis Model Based on the BiGRU-Att Model
In order to verify the effectiveness of the BiGRU-Att model proposed in this study, the word vectors trained by the two vector representation models presented in Section 3.1 were used as the input of the sentiment analysis models. The experimental results of the sentiment analysis models based on deep learning methods using Word2Vec-based word vectors are presented in Figure 10.

Figure 10 shows that, among the four sentiment analysis models based on the Word2Vec word vectors, the BiGRU-Att sentiment analysis model proposed in this study achieves the highest accuracy, reaching 98.09%. Although the LSTM and GRU models can both process long sequences, the experimental results show that the accuracy of the BiGRU model is 97.58%, higher than that of the BiLSTM model. The results also show that the accuracy of the BiGRU-Att model increases by 0.53% compared with the BiGRU model without the attention mechanism.
The experimental results of the sentiment analysis models based on deep learning methods using BERT-based word vectors are presented in Figure 11.
Figure 11 shows that, among the four sentiment analysis models based on the BERT word vectors, the accuracy of the BiGRU-Att model is the highest, reaching 98.32%. Compared with the BiGRU model without the attention mechanism, the accuracy increases by 0.39%, which again verifies that adding an attention mechanism to the BiGRU sentiment analysis model can improve the accuracy of the sentiment analysis.
The advantages of the BiGRU-Att model proposed in this study are the following: the GRU can effectively record the temporal information of the text; the BiGRU learns both the preceding and the following semantic context; and, following the introduction of the attention mechanism, different words are assigned different weights, focusing on the important sentiment words in the text. From the above experimental results, we observe that the BiGRU-Att sentiment analysis model proposed in this study can effectively improve the accuracy of the sentiment analysis of online-comment short texts on Microblog.

Conclusions
With the popularity of social networking tools such as Microblog and WeChat, short online comments have become a major medium of people's online dialogue and interaction. The sentiment analysis of online-comment short texts on Microblog is, at present, a popular direction in the field of natural language processing. The new-word-discovery method, vector representation model, and sentiment analysis model based on deep learning methods for online-comment short texts on Microblog were studied in this paper. The main contributions of this study are as follows:
(1) Aiming at the problem of sparse features in short texts on Microblog, a short-text representation model of online-comment short texts based on BERT embedding was constructed. In comparative experiments, the influence of the two different word vector representation models, based on BERT and Word2Vec, on the accuracy of sentiment analysis models based on deep learning methods was explored.
(2) The application of deep learning methods to the sentiment analysis of short texts on Microblog was studied. Following the introduction of the attention mechanism, the BiGRU-Att sentiment analysis model of online-comment short texts on Microblog was designed. In the comparative experiments we performed, the BiGRU-Att model achieved higher accuracy on the data set than the CNN, BiLSTM, and BiGRU models, which shows that the proper addition of an attention mechanism can improve the accuracy of sentiment analysis models for online-comment short texts on Microblog.
This study explored how to use online data and deep learning methods for sentiment analysis, which helps companies rapidly understand the reactions and purchase intentions of current and potential consumers towards their products.
It further enables them to adjust or develop sales plans, services, and new products that better meet current consumer demands, generating new online data and achieving sustainable business operations through this dynamic cycle.

Research Limitations and Future Research Directions
In this study, we acknowledge some shortcomings of the present research and offer thoughts on future research directions; the following two aspects should be emphasized and studied further:
(1) Through our experiments, we observed that the text embedding method based on the BERT model yielded better sentiment analysis results. In the future, more auxiliary sentiment features, such as sentiment images, user portraits, and social networks, should be fused on the basis of BERT.
(2) Only the positive and negative categories were studied in our work, neglecting other emotions. Therefore, follow-up research should focus on other emotions, such as surprise and joy.