4. Experimental Results and Discussion
We first assessed the performance of various models in predicting the VA values of a word on the official data from the Dimensional Sentiment Analysis for Chinese Phrases (DSA_P) competition, which included 2802 training and 750 testing instances. In our experiments, we used a dropout rate of 0.2, a batch size of 64, and the Adam optimizer with a learning rate of 0.01. The VA prediction model was implemented using Keras (https://keras.io/). Mean Absolute Error (MAE) and Pearson Correlation Coefficient (PCC) were the two metrics employed to validate these approaches. MAE, defined in Equation (2), reflects the overall difference between the actual and estimated values; therefore, a smaller MAE indicates a better estimate. PCC measures the correlation between these values and ranges from −1 to 1. A PCC close to 1 indicates a high correlation between the two sets of values; a value between 0 and 0.09 indicates no correlation, 0.1 to 0.3 a low correlation, 0.3 to 0.5 a medium correlation, and greater than 0.5 a clear correlation. In Equation (3), we denote PCC as r, where A_i refers to the correct response, P_i refers to the output of the model (valence or arousal value), and n refers to the number of test samples. Ā and P̄ are the arithmetic means of A_i and P_i, respectively, and σ_A and σ_P are the corresponding standard deviations.
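From the definitions above, Equations (2) and (3) can be reconstructed in their standard form (the symbol names follow the text; the exact typesetting of the original equations may differ):

\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|A_i - P_i\right|  (2)

r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{A_i - \bar{A}}{\sigma_A}\right)\left(\frac{P_i - \bar{P}}{\sigma_P}\right)  (3)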
As shown in Table 1, the proposed method is compared with the top three competitors in the DSA_P word valence and arousal prediction task [30], i.e., THU_NGN, AL_I_NLP, and CKIP. Our method performs strongly in predicting the dimensional sentiment of Chinese words, achieving performance comparable to that of the highest-ranking competitor (i.e., THU_NGN). For the valence values, the MAE is reduced to 0.543 and the PCC reaches 88.7%, which indicates a very high correlation between the model predictions and the correct values. Similar observations can be made for the arousal value prediction task, where the MAE is lowered to 0.855 and the PCC reaches 68.9%.
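For illustration, the following is a minimal Keras sketch of the training configuration stated above (dropout rate 0.2, batch size 64, Adam optimizer with a learning rate of 0.01); the layer stack and the data-loading names are placeholders and do not reproduce the actual VA prediction architecture.

```python
# Minimal sketch of the VA-regression training setup described above.
# Only the hyperparameters (dropout 0.2, batch size 64, Adam, learning
# rate 0.01) follow the text; the layer stack, feature dimension, and the
# variable names `train_x` / `train_va` are hypothetical placeholders.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_va_regressor(input_dim: int) -> keras.Model:
    inputs = keras.Input(shape=(input_dim,))
    x = layers.Dense(256, activation="relu")(inputs)
    x = layers.Dropout(0.2)(x)                  # dropout rate from the paper
    outputs = layers.Dense(2)(x)                # [valence, arousal]
    model = keras.Model(inputs, outputs)
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=0.01),
        loss="mae",                             # MAE loss is an assumption; MAE is the reported metric
    )
    return model

# Hypothetical word-embedding features and gold VA labels (2802 training words).
train_x = np.random.rand(2802, 300).astype("float32")
train_va = np.random.uniform(1, 9, size=(2802, 2)).astype("float32")

model = build_va_regressor(input_dim=300)
model.fit(train_x, train_va, batch_size=64, epochs=10, validation_split=0.1)
```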
In the second experiment, we utilized the outputs of the valence-arousal prediction model and integrated them into a hybrid Deep Neural Network (DNN) for classifying the overall sentiment of a comment on social media. Because the goal of this research is to recognize public opinion in order to assist government social media management, only coarse-grained sentiment categories (positive and negative) were considered in this experiment. Moreover, to thoroughly demonstrate the effectiveness of the proposed model, we conducted two experiments. First, we evaluated the model on the dataset from the Natural Language Processing and Chinese Computing (NLPCC) 2014 competition (http://tcci.ccf.org.cn/conference/2014/pages/page04_sam.html) on sentiment classification of Chinese product reviews, which covers multiple domains such as books, DVDs, and electronics and contains 10,000 examples for training and 2500 for testing. In this experiment, we compared the proposed model with the best team in this competition, denoted as NNLM [31]. To demonstrate the generalization ability of the proposed model, we conducted the second experiment using the E-commerce service review dataset (ECSR) (https://github.com/renjunxiang/Text-Classification). This dataset consists mainly of review comments on TV products and distribution services collected from several E-commerce websites. The average length of the review comments is 72 words, and each review was given a sentiment tag:
positive or negative. The data contained a total of 4212 reviews, of which 1883 were positive and 2329 were negative. We performed 10-fold cross-validation to examine the performance. Here, we set the dropout rate to 0.25 and the batch size to 32, and we used the Adam optimizer with a learning rate of 0.01. Our proposed BiLSTM-based sentiment classification approach was implemented using Keras. We used precision, recall, and F1-measure for our evaluations [32]. Furthermore, we calculated the macro average of these metrics for the overall comparison. Precisely, letting C_i be the corpus in our study, we calculated the precision P(C_i), recall R(C_i), F1-measure F1(C_i), and micro-average F_μ, as in Equations (4)–(7).
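These metrics follow their standard definitions; reconstructed in the notation used here, Equations (4)–(7) take the form:

P(C_i) = \frac{TP(C_i)}{TP(C_i) + FP(C_i)}  (4)

R(C_i) = \frac{TP(C_i)}{TP(C_i) + FN(C_i)}  (5)

F_1(C_i) = \frac{2\,P(C_i)\,R(C_i)}{P(C_i) + R(C_i)}  (6)

F_{\mu} = \frac{2\sum_i TP(C_i)}{\sum_i \left(2\,TP(C_i) + FP(C_i) + FN(C_i)\right)}  (7)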
where TP(C_i) indicates the number of correctly classified positive cases and FP(C_i) denotes the number of false positives (namely, negative cases that are wrongly classified as positive). Analogously, TN(C_i) and FN(C_i) indicate the numbers of true negatives and false negatives, respectively. To systematically assess the relative effectiveness of the compared methods, the F1 value is also used.
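For illustration, the following is a minimal Keras sketch of a BiLSTM classifier with a simple additive attention layer, using the hyperparameters stated above (dropout rate 0.25, batch size 32, Adam with a learning rate of 0.01); it is a simplified stand-in for the proposed hybrid model, and the vocabulary size, sequence length, and data are hypothetical.

```python
# Simplified BiLSTM-with-attention sentiment classifier (a stand-in sketch,
# not the full hybrid architecture). Hyperparameters follow the text:
# dropout 0.25, batch size 32, Adam with learning rate 0.01.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_bilstm_att(vocab_size=20000, max_len=72, emb_dim=128):
    inputs = keras.Input(shape=(max_len,))
    x = layers.Embedding(vocab_size, emb_dim)(inputs)
    h = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
    h = layers.Dropout(0.25)(h)                             # dropout rate from the paper
    # Simple additive attention: score each timestep, normalize, weighted sum.
    scores = layers.Dense(1, activation="tanh")(h)           # (batch, max_len, 1)
    weights = layers.Softmax(axis=1)(scores)                 # attention weights over timesteps
    context = layers.Dot(axes=1)([weights, h])               # (batch, 1, 128)
    context = layers.Flatten()(context)
    outputs = layers.Dense(1, activation="sigmoid")(context) # positive vs. negative
    model = keras.Model(inputs, outputs)
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.01),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Hypothetical tokenized reviews (integer word ids) and binary sentiment labels.
x_train = np.random.randint(1, 20000, size=(4212, 72))
y_train = np.random.randint(0, 2, size=(4212, 1))

model = build_bilstm_att()
model.fit(x_train, y_train, batch_size=32, epochs=5, validation_split=0.1)
```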
Next, we evaluated the performance of the embeddings to demonstrate the effectiveness of our novel text representation method. Table 2 shows the gain in performance after applying LLR and VA, denoted as Em_BERT+LLR + BiLSTM_Att and Em_BERT+LLR+VA + BiLSTM_Att, respectively. To provide an all-inclusive performance evaluation, we also compared our method to the state-of-the-art system (denoted as NNLM) on the NLPCC 2014 dataset. As shown in the table, Em_BERT + BiLSTM_Att achieves about 74% and 87% F1-score on the NLPCC 2014 and ECSR datasets, respectively. By using LLR features, we further improved the system performance, because LLR successfully discriminates words that are highly correlated with a certain emotion, thereby boosting the BiLSTM's ability to find representative sentiment lexicons. Moreover, the VA features improve the F1 score significantly, indicating that when the model considers both the valence value as the polarity and the arousal value as the strength of sentiment, the effectiveness of sentiment analysis improves considerably. Notably, our method outperforms the compared systems in every category, because it infuses emotion-specific VA features into the BiLSTM with the attention mechanism, thereby effectively enhancing its ability to correctly identify the sentiment of Chinese product reviews. According to the above experimental results, our method improves performance by providing more detailed emotional knowledge to sentiment classifiers, and it thus achieves strong performance on different types of sentiment classification datasets. The source code of the proposed method and the compared approaches can be found on GitHub (https://github.com/yuyachengg/sentiment).
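As an illustration of how LLR can identify words that are strongly associated with a sentiment class, the following sketch computes a common (Dunning-style) log-likelihood ratio from document counts; the counting scheme and the toy data are assumptions made for this example and do not reproduce the exact weighting used in the proposed method.

```python
# Sketch: Dunning-style log-likelihood ratio (LLR) between a word and the
# positive sentiment class, computed from simple document counts.
# The counting scheme and the toy corpora below are illustrative assumptions.
import math

def llr_word_vs_positive(word, pos_docs, neg_docs):
    """G^2 log-likelihood ratio of `word` against the positive class."""
    k11 = sum(word in doc for doc in pos_docs)   # positive docs containing the word
    k12 = sum(word in doc for doc in neg_docs)   # negative docs containing the word
    k21 = len(pos_docs) - k11                    # positive docs without the word
    k22 = len(neg_docs) - k12                    # negative docs without the word
    n = k11 + k12 + k21 + k22

    def term(k, row_total, col_total):
        expected = row_total * col_total / n
        return k * math.log(k / expected) if k > 0 else 0.0

    return 2.0 * (term(k11, k11 + k12, k11 + k21)
                  + term(k12, k11 + k12, k12 + k22)
                  + term(k21, k21 + k22, k11 + k21)
                  + term(k22, k21 + k22, k12 + k22))

# Toy tokenized reviews (hypothetical); each document is a set of words.
pos_docs = [{"great", "service"}, {"great", "fast"}, {"good", "quality"}]
neg_docs = [{"broken", "slow"}, {"slow", "service"}]
print(llr_word_vs_positive("great", pos_docs, neg_docs))
```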
The above experiments quantitatively evaluate the performance of the proposed method. To gain deeper insight into the Facebook fan page of the Ministry of Health and Welfare, we carried out a case study on the posts published in April 2020. During this period, there was no local confirmed case of COVID-19 for 13 consecutive days in Taiwan; however, one case was confirmed among the navy crew of the Dunmu mission under the Ministry of National Defense. The Ministry of Health and Welfare released 16 posts (among a total of 5744 posts) through its Facebook fan page, which attracted lively discussion from netizens. We first predicted the sentiment behind the text of the posts and comments. Next, we used word clouds to visualize the categorized positive and negative keywords and color-coded them for clarity, so that readers can easily associate each sentiment with its corresponding terms. The word cloud was built from the fifty words with the highest LLR values in each of the positive and negative sentiment categories, with larger font sizes assigned to words with higher LLR weights.
Figure 6 shows the resulting word cloud, in which green terms denote positive comments and red terms denote negative ones. We can observe that the polarity of the sentiment in a comment can be influenced by the polarity of the terms it contains.
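For illustration, a minimal sketch of how such an LLR-weighted, color-coded word cloud can be generated with the Python wordcloud package is shown below; the term lists and weights are hypothetical examples rather than the actual case-study data.

```python
# Sketch: rendering an LLR-weighted word cloud, green for positive terms and
# red for negative terms. The term-weight dictionaries are hypothetical.
from wordcloud import WordCloud

# Hypothetical top-LLR terms per sentiment (the study used the top fifty each).
pos_terms = {"Epidemic": 95.2, "Government": 80.4, "Team": 66.1, "Nation": 54.7}
neg_terms = {"Neck guard": 70.3, "Scarf": 58.9, "Outdoor": 44.2, "Set sail": 39.5}

weights = {**pos_terms, **neg_terms}

def color_by_sentiment(word, **kwargs):
    """Color positive terms green and negative terms red."""
    return "green" if word in pos_terms else "red"

wc = WordCloud(width=800, height=400, background_color="white",
               color_func=color_by_sentiment)
wc.generate_from_frequencies(weights)   # font size scales with the LLR weight
wc.to_file("sentiment_wordcloud.png")
```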
Through our approach, we can not only easily identify that a post brought positive sentiment to the public but also see which topics provoked that sentiment. On the positive side, words such as "Epidemic, Ministry of National Defense, Ministry of Health and Welfare, Entire, Government, Team, Nation" illustrate people's affirmation of the excellent control of the epidemic in Taiwan through the cooperation of government teams and the whole nation. For example, one comment mentioned: "Thank you to all the anti-epidemic personnel for their hard work and the cooperation of all Taiwanese. Fight on, Taiwan! The world will also work hard to survive this pandemic." We also discovered positive messages about the Ministry of National Defense's handling of the viral infection on the Panshih ship during the Dunmu remote training mission. Some netizens stated: "Thanks to the strong mobilization of the Ministry of National Defense. The crew on board was urgently recalled when the incident occurred." This prompted even more people to cheer on the crew members, for example, "Dunmu soldiers have worked hard, fight on!" In addition, the Minister of Health and Welfare, Dr. Chen Shih-Chung, received positive recognition from the public, and many netizens left comments under the posts to thank the Minister for his leadership. For instance: "Thanks to Minister Chen Shih-Chung, all the epidemic prevention personnel, and people from all over Taiwan.", "We must trust Minister Chen Shih-Chung. Cheer for Taiwan.", and "Thank you, Minister Chen Shih-Chung, for being cautious. The Minister must pay attention to his own health as well." It can be said that he not only brought positive feelings to the public but also created a stream of positive energy across the country amid the COVID-19 pandemic.
Regarding the negative public opinions, words like "Neck guard, Scarf, Outdoor" pointed out that the weather was getting hotter and that, because of the epidemic, many people were debating whether to wear a mask. For example: "Can you please lift the ban forcing us to put on masks? The weather is getting hotter and I really can't tolerate wearing a mask." Some people also said, "Please quickly put on the neck guards, masks, and scarves," whereas more netizens believed that "outdoors should be fine, but I think it's safer to have masks on indoors." This indicates that different opinions on epidemic prevention measures emerged in the community once the COVID-19 epidemic seemed to be calming down in Taiwan. Moreover, we also noticed negative words such as "Injury, Set sail, Mothership," and "Diamond Princess." This is because some people criticized the decision to let the military personnel participating in the Dunmu training mission, held by the Ministry of National Defense, set sail: "There were already confirmed cases from the Diamond Princess, so why did the ship still insist on going out to sea?" On the other hand, many people wrote words of encouragement: "The navy has already done well; because of the epidemic prevention control on board, the virus did not cause serious damage on the ship. In addition, it successfully completed a mission that lasted more than a month, which was much better than the two aircraft carriers belonging to the U.S. and France, and also a group of cruise passengers." The above discussion shows that the proposed method can effectively analyze public opinion and capture the content of the discussion in fine detail. It can further assist the government in managing its social media accounts, thereby improving the image of the government and building more favorable interactions with the people.