With the rapid development and maturity of Internet technologies, many online social platforms have gradually become the primary medium for people to obtain information and communicate with each other. Emerging social platforms, such as Weibo and WeChat, allow users to interact with information quickly and easily.
Weibo is not only a medium for people to communicate with each other, but also a way to express personal emotions. However, while expressing opinions, spreading thoughts, and venting personal emotions, users also generate a large amount of information with subjective emotional characteristics of different tendencies. These emotional characteristics reflect the user's hobbies and interests. At the same time, they may also have a huge impact on the spread of Internet public opinion. Therefore, sentiment analysis of Weibo text makes it possible to understand users' preferences and their views on hot events in society, and to make trend predictions.
Personality is a unique characteristic of an individual and profoundly affects the user's psychological state and social behavior. Personality research mainly focuses on the correlations among personality traits and on the relationships between personality and performance, creativity, and other factors, most of which are analyzed by self-reporting and regression algorithms. With developments in psychological research, people with the same personality have been found to exhibit similarities in writing and expression. This finding is the basis for introducing personality into sentiment analysis. At present, personality-based sentiment analysis is still in its exploratory stages. Existing sentiment analysis does not differentiate the various ways of expressing emotions based on user individuality, nor does it consider combining sentiment analysis with personality analysis. In order to address this problem, this paper proposes a personality-based Weibo sentiment analysis model, which introduces personality judgment rules to study the influence of personality on sentiment analysis.
2. Related Work
A number of studies have been conducted to improve traditional sentiment analysis methods. Bermudezgonzalez et al. [1] proposed building a comprehensive Spanish sentiment repository for subjective analysis of emotions. Cai et al. [2] addressed the polysemy of sentiment words by constructing a domain-specific sentiment dictionary. Their experiments confirmed that combining two classifiers, Support Vector Machine (SVM) and Gradient Boosting Decision Tree (GBDT), achieves higher accuracy than a single model. Xu et al. [3] performed sentiment classification of text by assembling an extended sentiment dictionary containing essential sentiment words, scene sentiment words, and polysemous sentiment words. Yang and Zhou [4] compared the processing speed and accuracy of Bayesian classifiers and support vector machine classification algorithms for sentiment mining on microblogs. Pang et al. [5] determined the sentiment polarity of film reviews using three supervised machine learning methods, namely support vector machine, naive Bayes, and maximum entropy. In their experiments, Pang et al. used unigrams to construct feature vectors and then carried out document-level polarity discrimination. The experimental results show that both the SVM and naive Bayes achieved good classification results. Kamal and Abulaish [6] proposed a sentiment analysis system that combines rules and machine learning methods to identify feature-opinion pairs and their sentiment polarity, in order to capture user evaluations of different electronic products and the polarity of users' opinions. Song et al. [7] developed a new sentiment word embedding technique. The framework differs from earlier ones mainly in jointly encoding morphemes and part-of-speech tags, and in training only the important morphemes in the embedding space. This overcomes the traditional limitations of contextual word embedding methods and significantly improves the performance of sentiment classification. Sharma and Dey [8] proposed a hybrid sentiment classification model based on enhanced support vector machines. This model makes full use of the classification performance of boosting and support vector machines in sentiment-based classification of online reviews. Experimental results show that, in terms of sentiment classification accuracy, SVM ensembles built with bagging or boosting are significantly better than a single support vector machine. Sharma et al. [9] proposed a sentiment classification method based on machine learning. The experimental results show that combining multiple sentiment classifiers can further improve classification accuracy. Rong et al. [10] proposed an auto-encoder-based bagging prediction architecture (AEBPA), which experimental studies on commonly used datasets have shown to have great potential. Lin et al. [11] proposed a method that first adds weights to highlight sentiment features. Bagging is then used to construct multiple classifiers on different feature spaces, which are combined into one aggregate classifier. The results showed that the method could significantly improve the performance of sentiment classification. Wang and Han [12] proposed a microblog sentiment analysis method that integrates an explicit semantic analysis algorithm. Wikipedia is used as an external semantic knowledge base, which improves on the text representation used in earlier microblog sentiment analysis and increases the effectiveness of sentiment classification. Waila et al. [13] evaluated machine learning classifiers (naive Bayes and SVM) alongside the unsupervised, semantic-orientation-based SO-PMI-IR algorithm for sentiment analysis of movie reviews. Mladenovic et al. [14] established a framework (SAFOS) that uses sentiment dictionaries with polarity scores and the thesaurus of Serbian WordNet (SWN) in the feature selection process in order to perform sentiment analysis in Serbian.
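Several of the works above (Sharma and Dey [8], Rong et al. [10], Lin et al. [11]) rely on bagging, so a small illustration of the general idea may be useful. The sketch below is not the implementation of any cited paper: it assumes a toy word-count base learner in place of an SVM, and the function names and the `TOY_REVIEWS` data are hypothetical. It only shows the two ingredients of bagging, namely training each base classifier on a bootstrap sample and aggregating predictions by majority vote.

```python
import random
from collections import Counter

# Hypothetical toy corpus of (text, label) pairs, for illustration only.
TOY_REVIEWS = [
    ("great wonderful film", "pos"),
    ("wonderful great acting", "pos"),
    ("great story", "pos"),
    ("terrible boring film", "neg"),
    ("boring terrible plot", "neg"),
    ("terrible acting", "neg"),
]


def train_keyword_classifier(samples):
    """Train a toy base learner that counts how often each word occurs per label.

    This stands in for a real base classifier such as an SVM (an assumption
    made to keep the sketch dependency-free).
    """
    word_label_counts = {}
    for text, label in samples:
        for word in text.lower().split():
            word_label_counts.setdefault(word, Counter())[label] += 1
    # Fall back to the most frequent training label when no word is known.
    default_label = Counter(label for _, label in samples).most_common(1)[0][0]

    def predict(text):
        votes = Counter()
        for word in text.lower().split():
            votes.update(word_label_counts.get(word, {}))
        return votes.most_common(1)[0][0] if votes else default_label

    return predict


def bagging_fit(samples, n_estimators=15, seed=0):
    """Train n_estimators base learners, each on a bootstrap sample."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        # Bootstrap: draw len(samples) items with replacement.
        bootstrap = [rng.choice(samples) for _ in samples]
        models.append(train_keyword_classifier(bootstrap))
    return models


def bagging_predict(models, text):
    """Aggregate the base learners' predictions by majority vote."""
    votes = Counter(model(text) for model in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    ensemble = bagging_fit(TOY_REVIEWS)
    print(bagging_predict(ensemble, "great wonderful"))
```

Because each base learner sees a different resample of the training data, individual errors tend to be less correlated, which is the usual argument for why the aggregate vote is more stable than a single classifier.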
Numerous attempts have also been made to improve sentiment analysis techniques using deep learning. Yin et al. [15] proposed a semantic enhanced convolutional neural network (SCNN) for sentiment analysis. Based on SentiWordNet, a widely used sentiment lexicon, word embeddings and sentiment embeddings are both fed into a convolutional neural network classifier, and good experimental results are obtained. Dan and Jiang [16] proposed a long short-term memory (LSTM) language model for sentiment analysis. Lu et al. [17] proposed a P-LSTM model based on the long short-term memory (LSTM) recurrent neural network. The experimental results show that P-LSTM performs well on the sentiment classification task. To cope with the limitations of the existing pre-trained word vectors used as inputs to CNNs, Rezaeinia et al. [18] proposed a novel method, Improved Word Vectors (IWV). IWV improves the accuracy of CNNs used for text classification tasks. Jabreel and Moreno [19] combined two different methods for sentiment analysis. The first is N-Stream ConvNets, a deep learning method, and the second is XGBoost regression based on a set of embedding and dictionary-based features. Abdi et al. [20] proposed a deep-learning-based method (called RNSA) to classify the sentiments expressed by users in comments. This method analyzes sentiment using a unified feature set that represents word embeddings, sentiment knowledge, sentiment shifter rules, and statistical and linguistic knowledge. The experimental results show that learning from the unified feature set performs significantly better than learning without it. Liu and Chen [21] further studied deep learning for microblog sentiment analysis: they extracted data from microblogs with a crawler, preprocessed the corpus, used it as the input sample of a convolutional neural network, established a classifier based on SVM/RNN, and finally judged the sentiment orientation of each sentence in a given test set. The experimental results show that the scheme can effectively improve the accuracy of sentiment orientation detection. Hyun et al. [22] proposed a target-dependent convolutional neural network (TCNN) for target-level sentiment analysis (TLSA) tasks. This method uses the distance between target words and neighboring words to weigh their importance and to classify the sentiment expressed toward targets in the text. This approach achieves better performance on both single-target and multi-target datasets. Chen et al. [23] used BiLSTM and CNN methods to improve sentiment analysis. In this approach, a BiLSTM-CRF sequence model is used to classify sentences into three types (no target, one target, multiple targets) based on the number of targets appearing in the sentence. Each group of sentences is then fed into a one-dimensional convolutional neural network for sentiment classification. The experimental results show that the proposed method improves the performance of sentence-level sentiment analysis and achieves state-of-the-art results on several benchmark datasets. Rezaeinia et al. [24] proposed an improved word vector (IWV) method for sentiment analysis. This method is based on part-of-speech tagging, word-based methods, a word position algorithm, and the Word2Vec/GloVe methods. The experimental results show that IWV is very effective for sentiment analysis. Sun et al. [25] used a deep neural network based on convolutionally extended features to perform sentiment analysis on Chinese microblogs. The posts and comments on Chinese microblogs are integrated to form a microblog conversation. Then, a convolutional auto-encoder is trained to obtain the integrated features, and a deep belief network is used for the final sentiment classification. The experimental results show that, with a proper structure and parameters, the deep belief network performs better than SVM or naive Bayes (NB). To address the mismatch between reviews and ratings on Amazon, Shrestha and Nasoz [26] used paragraph vectors to transform product reviews and used the resulting vectors to train a recurrent neural network with gated recurrent units (GRU). This model combines the semantic relationship between review text and product information to perform sentiment analysis. Bijari et al. [27] developed a sentence-level graph representation, which includes stop words in order to account for semantic and term relationships. A representation learning method on the combined sentence graph is employed to extract latent, continuous features of the document. The learned document features are then fed into a deep neural network for the sentiment classification task. Hassan and Mahmood [28] proposed a neural network architecture that applies convolutional neural networks (CNN) and long short-term memory (LSTM) on top of pre-trained word vectors. In this approach, called ConvLSTM, the LSTM replaces the pooling layer of the CNN in order to reduce the loss of local detail and to capture long-term dependencies in sentence sequences.
At present, most sentiment analysis is based on text. However, with the rise of picture sharing on social platforms, multimodal sentiment analysis research on pictures, text, and emoji has emerged. Poria et al. [29] proposed a new method that extracts features from the visual and textual modalities using deep convolutional neural networks. Feeding these features into a multiple kernel learning classifier improves performance on the sentiment analysis task. You et al. [30] argued that pictures and text should be jointly analyzed in a structured way. They developed a semantic tree mechanism in which words in the text and regions of the image are mapped to each other to perform sentiment classification with image-text fusion. Jianzhong et al. [31] characterized Weibo messages using hand-crafted features (such as sentiment word frequency, use of negation words, and emoji) and employed an SVM for classification. Han and Ren [32] carried out sentiment classification by improving the kernel function of the Fisher discriminator. Using latent semantic information with probabilistic characteristics as classification features improves the classification performance of support vector machines. Cai and Xia [33] pre-trained a text CNN and an image CNN to obtain text and image representations, and then used a CNN to connect the two feature vectors. Yu et al. [34] used pre-trained CNNs to represent text and images and performed sentiment classification using logistic regression. Huang et al. [35] proposed the deep multimodal attention fusion (DMAF) method as a new image-text sentiment analysis model, which uses a hybrid fusion framework to mine discriminative features and intrinsic relationships of visual and semantic content. Xu et al. [36] developed a new bi-directional multi-level attention (BDMLA) model, which uses the complementary and comprehensive information between the image and text modalities to perform joint visual-textual classification. Poria et al. [37] used multimodal cues that blend speech, video, and text for sentiment analysis. In this approach, videos are first collected from a website and processed to obtain visual, audio, and textual features. The three modalities are then merged to obtain the final sentiment polarity.
In terms of personality prediction, a number of studies in psychology and computer science have explored the relationship between people's language use and personality traits in the Big Five model [38]. Golbeck et al. [39] analyzed Twitter using structural and linguistic features and applied two regression algorithms to predict user personality traits. Bai et al. [40] suggested using multi-task regression and incremental regression to predict user personality from the online behavior of Sina micro-blog (weibo.com) users. They found that the Mean Absolute Error (MAE) on this particular microblog platform is between 0.1 and 0.2. In addition, Nowson et al. [41] applied a machine translation model to solve multi-language problems in text-based personality prediction. Their study achieved a root mean square error (RMSE) between 0.08 and 0.25.
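For readers less familiar with the two error measures quoted above, the following minimal sketch computes MAE and RMSE for a list of predicted trait scores. The function names and the example scores are hypothetical and are not taken from the cited studies; the sketch only illustrates the standard definitions, assuming trait scores are plain floating-point numbers.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """MAE: average of the absolute differences between truth and prediction."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """RMSE: square root of the average squared difference."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


if __name__ == "__main__":
    # Hypothetical normalized scores for one personality trait.
    true_scores = [0.62, 0.40, 0.75, 0.31]
    predicted_scores = [0.55, 0.48, 0.70, 0.45]
    print(mean_absolute_error(true_scores, predicted_scores))
    print(root_mean_squared_error(true_scores, predicted_scores))
```

Because RMSE squares each error before averaging, it penalizes large individual errors more heavily than MAE and is never smaller than MAE on the same predictions, so the two figures reported by different studies are not directly comparable.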
Several studies have adopted ensemble learning methods for sentiment classification, but sentiment classification and personality prediction have so far been treated as separate research fields. Sentiment classification does not take into account the different emotional expressions of different personalities, nor does it couple sentiment analysis with personality analysis. Psychological research has shown that personality affects people's writing and speaking styles, and that people with similar personalities tend to exhibit similar emotional expressions. Considering the potential relationship between emotion and personality, this paper proposes a microblog sentiment analysis method based on a personality and bagging algorithm (PBAL).