Tracking and Analyzing Public Emotion Evolutions During COVID-19: A Case Study from the Event-Driven Perspective on Microblogs

Objective: Coronavirus disease 2019 (COVID-19) has caused substantial panic worldwide since its outbreak in December 2019. This study uses social networks to track the evolution of public emotion during COVID-19 in China and analyzes the root causes of these public emotions from an event-driven perspective. Methods: A dataset was constructed using microblogs (n = 125,672) labeled with COVID-19-related super topics (n = 680) from 40,891 users from 1 December 2019 to 17 February 2020. Based on the skeleton and key change points of COVID-19 extracted from microblogging contents, we tracked the public’s emotional evolution modes (accumulated emotions, emotion covariances, and emotion transitions) by time phase and further extracted the details of dominant social events. Results: Public emotions showed different evolution modes during different phases of COVID-19. Events about the development of COVID-19 remained hot, but generally declined, and public attention shifted to other aspects of the epidemic (e.g., encouragement, support, and treatment). Conclusions: These findings suggest that the public’s feedback on COVID-19 predated official accounts on the microblog platform. There were clear differences in the trending events that large users (users with many fans and readings) and common users paid attention to during each phase of COVID-19.


Introduction
Disasters usually happen unexpectedly. Coronavirus disease 2019 (COVID-19), a respiratory disease caused by a novel coronavirus, has spread to over 180 locations internationally in nearly five months since its sudden outbreak in December 2019. According to the World Health Organization (https://www.who.int/emergencies/diseases/novel-coronavirus-2019), as of 22 June 2020, there were 8,860,331 confirmed COVID-19 cases and more than 465,740 deaths worldwide. The effect of COVID-19 on personal health and economic development is self-evident, and it has had a wider impact on public sentiment. The earliest large-scale outbreak of COVID-19 was in China, and the country has now effectively controlled COVID-19. Examining the characteristics of the evolution of public emotion during the COVID-19 outbreak in China would be very useful in helping to understand and predict public emotion worldwide.
To this end, the study aim is to track the evolution of public emotions and analyze the root causes of such emotions from an event-driven perspective. The investigation focuses on the 48 day period following the announcement of the first laboratory-confirmed case of COVID-19 on 1 December 2019 [1] and covers the early and middle stages of the COVID-19 epidemic in China. However, tracking and understanding public emotion evolution during COVID-19 is not an easy task. Three fundamental issues need to be addressed: (1) How can the public's daily complex emotions be detected and generalized during the disaster? (2) How can the dynamic and constantly changing modes of the public's complex emotions be captured? (3) What are the criteria for identifying hot events that dominate current public emotions? As COVID-19 evolved, social network sites became appropriate spaces for the social sharing of emotions, opinions, and coping strategies, providing psychological benefits for users by increasing their feelings of emotional relief [2][3][4][5]. Therefore, the study aim is to track public emotion during COVID-19 using the contents of microblogs and to extract the details of dominant events.
There have been several series of investigations of the information flow of social networks during disasters; these have focused on issues such as crowd psychological states [6][7][8], disaster management strategies [9,10], detection of natural disasters [11,12], and public attention to disasters [13][14][15]. Specific to public sentiment analysis, scholars have proposed crowd emotion detection solutions for natural disasters (e.g., earthquakes [16], forest fires, floods, and droughts [17]) based on microblogging contents. If we view the aforementioned research as a pointwise exploration of public emotion at a particular moment caused by short-lived disasters, this study extends the exploration from point to line and adopts a period-wise approach for this longer term disaster situation. The period-wise perspective and long-lasting nature of COVID-19 enable more in-depth research on the dynamic latent connections among complex public emotions, as well as the root causes of public emotions during different phases of the disaster. Embedding public emotion evolution into the skeleton of COVID-19 also made it possible to strip out periodic changes from the overall epidemic process. From this perspective, research on long-term disasters is worthy of attention and is meaningful.
Another difference from previous work is that, despite the special long-term context of COVID-19, this paper considers public emotions to be driven by events. For example, the rapidly rising number of confirmed COVID-19 cases may trigger public panic, whereas effective government control measures may reduce panic. Behaviors that endanger social security may provoke public anger, whereas positive social donations may spark the emotion of pride. Finding the root causes of public emotion evolution would be helpful for understanding and managing the disaster.
The rest of the paper is organized as follows. Section 2 provides a summary of related work. The data collection process and methods of modeling public emotion evolution and extracting hot events are introduced in Section 3. The experimental results are shown in Section 4. The implications and future work are discussed in Section 5.

Disaster Detection and Management Using Social Networks
Scholars have demonstrated that social media is used extensively for shelter-seeking actions during disasters. Reference [2,3] reported that social media allows users to socially share emotions, seek empathy in a disaster situation, and obtain psychological benefits, as it increases their feelings of emotional relief and being part of a like-minded community. Reference [4,5] reported that data produced through social networks is ubiquitous, rapid, and accessible and has the potential to contribute to situational awareness during emergency social events, not only among civilians, but also among local and federal governments, non-governmental organizations, and local and mass media.
To analyze the effect of a disaster on the psychological state of the public, Reference [7] examined community disaster resilience and social solidarity regarding South Korea's most tragic maritime disaster, the Sewol ferry disaster, which was repeatedly portrayed on social media. Reference [6] uncovered the dynamics involved in the exchange of social support among a group of Puerto Ricans who experienced a natural disaster and highlighted the importance of culture in shaping patterns of help-seeking behavior in the aftermath of a disaster. For emergency event management, Reference [9] developed a framework for real-time urban disaster damage monitoring and assessment based on social media texts sent during and after the Tianjin explosion and Typhoon Nepartak. Reference [10] explored a mass media crisis information release strategy for controlling public panic stemming from accidental hazardous chemical leakage into rivers. Utilizing social networks as natural disaster sensors, Reference [11] presented and evaluated an earthquake detection procedure that relied solely on Twitter data. Reference [12] analyzed the spatial and temporal characteristics of Twitter feed activity in response to an earthquake in the United States and argued that the feeds represented a hybrid form of a sensor system that allowed the identification and localization of the impact area of the natural disaster. To analyze public attention during disasters, Reference [13] examined how the public discussed measles during a measles outbreak using a semantic network analysis of tweet content. They identified four frames based on word frequencies and co-occurrence: news updates, public health, vaccination, and political. Reference [14] proposed an "attention shift network" to observe, measure, and analyze the collective attention dynamics of Twitter users in response to violent terrorist attacks. In contrast to most previous research, which has concentrated on extracting situational information, Reference [15] focused on the potentially adverse effects of communal tweets during disasters and proposed to filter anti-communal tweets from communal tweets. In addition, Reference [18] presented a learning-based approach to identify prominent microblog users susceptible to sharing relevant and exclusive information in a disaster case. Reference [8] identified microblogging trustworthiness by characterizing the main features of tweets in a crisis, considering the effect of author emotions.
The studies discussed above have focused on natural disasters and social crisis events, including disaster detection, disaster management, public psychological dynamics, and public concerns via social networks. They have mainly focused on sudden, short-lived disasters and subsequent public feedback. In contrast, this study focuses on COVID-19, a disaster that happened suddenly and (to date) has lasted for several months. It is important to identify and assess the trends in hot topics on social networks at different stages of COVID-19, to provide an explanation for the continuous changes in public emotions during the COVID-19 epidemic.

Public Emotion Analysis Using Social Networks
Previous research on sentiment analysis in social networks has generated fruitful results using supervised classifier learning [19][20][21][22] and unsupervised emotion lexicon-based methods [23,24]. Most supervised methods are conducted by extracting linguistic features and constructing classifiers using machine-learning or deep-learning techniques. For example, Reference [19] presented a method to automatically build fine-grained emotion lexicon sets and suggested features that improve the performance of machine-learning-based emotion classification in Korean Twitter texts. However, supervised methods are usually affected by the corpus and limited by complicated parameter settings. In addition, most existing emotion classification methods focus on three polarities (positive, negative, and neutral emotions), the emotional granularity of which is not fine enough [25], and easily experience feature sparseness when extracting textual features from large-scale short and random social network posts [26]. Unsupervised methods detect emotion categories based on the emotional tendency of words in combination with semantics and syntactic structure in the text. For example, Reference [24] introduced an approach to classify emotion in microblog texts as anger, disgust, fear, happiness, liking, sadness, or surprise based on class sequential rules. Reference [23] proposed three pruning strategies to automatically build a word-level emotional dictionary for social emotion detection. In the dictionary, each word is associated with the distribution of a series of human emotions, including anger, disgust, fear, joy, sadness, and surprise. Such lexicon-based unsupervised methods can reduce the amount of calculation, are widely used in fine-grained emotion classification, and largely depend on lexicon coverage.
Specific to the analysis of public emotion during social crises, Reference [27] used an emotional change detection approach to analyze users' emotions on Twitter in the community involved in the Las Vegas shooting (October 2017). They argued that such analysis could help to improve emergency response services. Reference [28] took a resource perspective in conducting a qualitative analysis of affective expressions on Twitter collected in Germany during the enterohemorrhagic Escherichia coli (EHEC) food contamination incident. Affective expressions of coping went beyond positive or negative tone, as some people perceived the outbreak as a threat, whereas others perceived it as a challenge with which to cope. Reference [9] proposed to measure both urban emotional tendencies and physical damage from social media texts sent during and after the Tianjin explosion and Typhoon Nepartak based on a sentiment dictionary. Reference [29] examined the public mood in Korea in response to the Sewol ferry disaster by analyzing Twitter keywords ("anger", "anxiety", and "sadness"). They found that in the aftermath of the incident, keywords related to emotional reactions (such as anger and suicide) were prevalent on Twitter. Reference [30] proposed a disaster assessment model by correlating emotional data in voice and text communications with population density and disaster severity. Reference [31] found that during the 2007 Virginia Tech shooting, anger was the most frequently expressed emotion on Twitter, followed by fright, sadness, and anxiety. In terms of natural disasters, Reference [21] collected microblogs about the "H7N9" influenza and classified them into different emotion categories using a supervised classification method. A topic model was then used to extract subtopics about the event. The authors identified the intensity of five types of emotions for each subtopic over time. Reference [17] collected streaming tweets relating to disasters and built a sentiment classifier to categorize users' emotions during disasters (earthquakes, forest fires, floods, and drought) based on their varying levels of distress. They subcategorized users' emotions in response to different disasters as negative sentiments, unhappy, depressed, angry, and positive sentiments. Reference [32] analyzed tweets during Hurricane Sandy and projected users' sentiments onto a geographical map. Reference [16] aimed to track and predict the changing trends in negative emotions of victims during the post-disaster situation of the Ya'an earthquake. Reference [33] trained a binary sentiment classifier to examine public levels of concern and worry before and after Hurricane Irene. Reference [34] found that one-third of bloggers expressed discernible emotions when blogging about Hurricane Katrina, including concern, disgust, anger, fear, and hope.
Focusing on the process of the evolution of public emotions after crises, Reference [35] constructed a Weibo emotional lexicon with a wide coverage. By analyzing the dependency of Weibo text, they proposed an emotion-computing method based on dependency parsing and computed the emotional tendency and intensity of a single post. They used the Beijing "RYBKindergarten Child Abuse Case" as an example to count the positive and negative emotional posts on Weibo during the three stages of Internet public opinion evolution (outbreak, fermentation, and digestion) and obtained percentages for two types of emotional posts at each stage. Reference [36] qualitatively analyzed online public crisis emotions (especially positive ones) from comments on the Boston Marathon Facebook page during one month following the 2013 Boston Marathon bombing. They conducted qualitative content analysis by reading through comments and coding engagement strategies and emotional tones. They found evidence for positive public emotions; as the crisis evolved, public emotional content, tone, and intensity changed slightly as a result of the online public's identity and coping and the organization's engagement. Reference [7] examined how community disaster resilience and social solidarity concerning the Sewol ferry disaster were portrayed by social media over one year. They recorded the most frequently appearing and co-occurring words for three time periods and found a significant change in the sentiments of the community, as well as in ways of forming resilience and solidarity over time. Reference [16] introduced machine-learning methods to track and predict changing emotional trends on microblogs in victims of the Ya'an earthquake. Reference [37] assessed time-specific changes in the intensity and duration of individual distress after an attack, focusing on three primary negative emotions (anger, anxiety, and sadness). Guided by the Crisis and Emergency Risk Communication model, Reference [38] examined public emotional tone (alarm/concern, reassurance, anger, humor/sarcasm, neutral) from 2881 tweets posted during the initial, maintenance, and resolution stages of the 2015 California measles outbreak. They found that the public showed the greatest interest in the initial stage of the crisis, but their interest substantially declined afterward. Reference [39] used the deep-learning method to extract fine-grained public emotions (anger, anxiousness, fear, and sadness) from Chinese microblogs to assist in disaster analysis of the Ya'an earthquake.
These studies are summarized in Table 1. In terms of emotion types, previous work has mainly focused on general public sentiment (positive, negative, or neural) detection (e.g., Reference [9,10,23,25,28]) or negative emotions after crises (e.g., Reference [29,31,33,37,39]). Most research on fine-grained public emotions is restricted to the assessment of emotion types and rarely explores the relationship between different types of public emotions in crises. Regarding the public emotion evolution process, as most crises do not present a continuous evolutionary context, researchers have generally explored the developmental trajectory of public emotions during a specific period of time after the moment of crisis (e.g., Reference [7,16,35,39]). The latent transition modes between different emotions have rarely been studied. Therefore, this article focuses on fine-grained public emotions based on Ekman's six emotion types. It presents an in-depth exploration of the evolutionary patterns of emotions in different stages of COVID-19, as well as the transitions and co-change relationships among emotions. 1. Ekman's six emotion types comprise "happiness", "sadness", "anger", "disgust", "surprise", and "fear". 2. Plutchik's "wheel of emotions" comprises "anger", "disgust", "fear", "happiness", "sadness", "surprise", "trust", and "anticipation". 3. ECEC incident is the abbreviation for the enterohemorrhagic Escherichia coli food contamination incident in 2011. 4. H7N9 is the abbreviation for Influenza A virus subtype H7N9 (a bird flu strain of the species Influenza virus A).

Research Overview and Data Collection
In this study, a dataset was constructed using microblog data (n = 125,672) seeded from COVID-19-related super topics (n = 680) from 40,891 users from 1 December 2019 to 17 February 2020, on China's Sina Weibo platform. All super topics were manually classified into nine categories ("fighting COVID-19", "novel coronavirus", "Wenliang Li", "treatment", "epidemic development", "epidemic prevention and control", "confirmed cases", "isolation", and "support Wuhan") according to their content similarities. Users were divided into two groups: a common user group (n = 38,648) and a large user group (n = 377) based on their account influence after data cleaning. The dataset was then split into subsets by time unit. For each time unit, we aimed to describe the COVID-19 algorithmic skeleton by extracting and summarizing microblogging content in two dimensions: the number of newly confirmed COVID-19 cases and the number of accumulated COVID-19 cases. The hot events in each subset were then hierarchically extracted based on criteria related to both the microblogging and super topic characteristics. To automatically extract public emotions, we built upon previous emotion detection work and modeled the complex emotions in each time unit as a multidimensional description. We followed the categorization of the Chinese Affective Vocabulary Ontology Database for seven dimensions (joy, anger, sadness, fear, disgust, love, surprise) and 21 subdimensions, covering 27,466 Chinese phrases [41]. Weighted term frequency-inverse document frequency (tf-idf) measures were applied to construct the complex emotion matrix for each time unit. Based on emotion matrices, the accumulated emotion trends, the latent covariance modes, and the continuous transition modes for each pair of public emotions were modeled during different phases of COVID-19.
Microblogs. As China's largest and the only active public social network platform, Sina Weibo fosters user relationships that allow users to share, disseminate, and receive information. Through either the website or the mobile app (https://en.wikipedia.org/wiki/Sina_Weibo), users can publish microblogs containing text or multimedia contents for instant sharing; other users are able to browse, comment, or forward the content (similar to "retweet" in Twitter). Sina Weibo attracts more than 560 million monthly active users. Its user group shows a trend for younger users; 25% of users are of the post-1980s generation, 29% of the post-1990s generation, and 37% of the post-1995 generation (https://tui.weibo.com/insight/detail?id=222). In this study, the microblog dataset was collected from two sources. First, a search was made for keywords (n = 12) related to COVID-19 on the Sina microblog platform, and super topics containing the keywords were extracted. The microblogs (n = 93,840) labeled by these super topics were collected and posted by 37,108 users. This source ensured a large range of COVID-19-related topics (n = 585). Second, to ensure large coverage of important news during COVID-19, we examined the above dataset, filtered out the four most popular official accounts (People's Daily, Global Times, People's Hotspots, CCTV News), and collected their microblogs (n = 2183) from 1 December 2019 (when the first laboratory-confirmed case was announced in Wuhan [1]) to 17 February 2020. These four accounts have continued posting COVID-19-related news every day since the COVID-19 outbreak and contain 95 super topics. According to these 95 super topics, thirty-two thousand two-hundred seventeen microblogs were collected and combined into the dataset. We filtered out super topics with fewer than 50 microblogs and limited the post time to between 1 December 2019 and 17 February 2020. Finally, the dataset contained 125,672 microblogs, which were labeled with 326 super topics from 40,891 users. The data collection process is shown in Table 2. All emojis or emoticons in microblog content were converted into corresponding text (e.g., [Smile]). Super topics. All super topics were manually divided into nine categories based on their content, as shown in Table 3 and Figure 1. The number of super topics and microblogs, as well as the duration of each category are also shown. For example, under the category "confirmed cases", a total of 166 super topics and 68,555 microblogs were collected (e.g., #A total of 14,380 confirmed cases nationwide#). These nine super topic categories reflected news covering (1) measures that people adopted to fight COVID-19, (2) the process of COVID-19 in China, (3) a hero who appeared during the disaster, (4) treatment knowledge and achievements, (5) epidemic development nationwide, (6) the number of confirmed cases, (7) support from the world to Wuhan, (8) isolation, and (9) official epidemic control programs. Users. Users were divided into two groups: a large user group and a common user group. We marked users who had ever posted microblogs with more than 1000 comments in the collected dataset as large users. We used this classification because the microblogs of the two groups reflected different information. Common users tended to express their thoughts on the current super topic, and the number of comments, retweets, and likes for their microblogs were relatively small. Large users tended to post important official news, which attracted tens of thousands of fans and substantial social attention, but the content of these posts lacked self-expressed information. The mean number of retweets, comments, likes, topic discussions, and topic reads in each group are shown in Table 4. Table 4. Characteristics of nine microblog categories posted by large users and common users. Mean Mean C 1 = "Fighting COVID-19", C 2 = "Novel coronavirus", C 3 = "Wenliang Li", C 4 = "Treatment", C 5 = "Epidemic development", C 6 = "Epidemic prevention and control", C 7 = "Confirmed cases", C 8 = "Isolation", C 9 = "Support Wuhan".

Extracting the Skeleton of COVID-19
The skeleton describing the evolution process of COVID-19 was defined in two dimensions. Let n new i be the number of newest confirmed cases in time unit t i (day) and n acc i be the number of accumulated confirmed cases until time unit t i . < n new i , n acc i > describes the COVID-19 cases in time unit t i . Thus, < n new 1 , n acc 1 >, < n new 2 , n acc 2 >, · · · , < n new m , n acc m > describes the skeleton of COVID-19 in period < t 1 , t 2 , · · · , t m >. Regular expressions in Python were utilized to match n i and N i from microblogging contents of each time unit in the dataset. For a time unit for which multiple < n new i , n acc i > results were extracted, we ranked all results and chose the largest number for each dimension, considering that the confirmed number may vary and be reported more than once during a time unit. The final results were manually confirmed to ensure no conflicts and are represented in Figure 2. The extraction results before 21 January 2020 are not shown owing to the low posting rates. The number of new confirmed cases continued to increase from 21 January 2020 to 2 February 2020 and kept increasing until 12 February 2020. The peak appeared on February 12 and then kept decreasing until 17 February 2020. Thus, we divided the COVID-19 evolution into three phases in the following analysis according to these two change points.

Evolution Models of Complex Public Emotions
To extract complex public emotions during COVID-19, the microblog dataset was divided into k subsets based on k time units and clustered into k documents, denoted as D =< d 1 , · · · , d k >. For a document d, we aimed to identify its emotion distributions and sub-emotion distributions. The complex emotions in each time unit were extracted according to the Chinese Affective Vocabulary Ontology Database [41]. This vocabulary follows the categorization of Ekman's universal emotions [42] on six dimensions C = {joy, anger, sadness, fear, disgust, surprise} and, in particular, explores the emotion love based on a Chinese corpus containing 27,466 Chinese phrases on 21 subdimensions. Each emotional word w was labeled by <category, subcategory, degree>. For clarification, the following is an example of the extraction of emotion distributions; the extraction of sub-emotions followed a similar process.
To identify the most obvious emotions of each time unit, we propose a weighting scheme for the emotional words in each document d i ∈ D based on the tf-idf method. The tf-idf value increases proportionally to the number of times an emotional term w emo appears in the document d and is offset by the number of documents in the corpus D that contain the term w emo , thus adjusting for the fact that some words generally appear more frequently. The frequency of the emotional term w emo i in document d j is denoted as follows: where n i,j is the normalized frequency of term w emo i in document d j . The inverse document frequency of emotional term w emo i and the tf-idf value of w emo i in document d j are denoted as follows: Furthermore, the emotional degree of each word was combined with the tf-idf value to weight its emotional importance for the current document. Each emotional word w emo i in the vocabulary was assigned a degree level L(w emo i ) ∈ {1, 3, 5, 7, 9}. Let L × t f Id f (w emo i , d j ) be the emotional weight of w emo i contributed to document d j . For each category c ∈ C, its emotional importance in document d j is denoted as follows: Thus, the complex emotions during the k day period of COVID-19 are represented as a k × 7 matrix M, and the complex sub-emotions are represented as a k × 21 matrix M sub . Accumulated Emotions: The accumulated value represents the change in the intensity of each category of public emotion during COVID-19. To eliminate the influence of the unbalanced amount of microblogs in each time unit, t f ij is adopted as the frequency of the emotional term w emo i in document d j , where d j is the document aggregated from microblogs posted within the time unit t j (day). For each emotion category c ∈ C, its accumulated value in each time unit t j is denoted as: where L(w emo i ) ∈ {1, 3, 5, 7, 9} is the degree level of emotional word w emo i assigned in the vocabulary. Emotion Covariance Model: The covariance model measures whether two public emotion categories changed according to the same or opposite trends during COVID-19. To find the latent covariance modes between each pair of emotions, based on the complex emotion matrix M and the sub-emotion matrix M sub , we extracted the emotion covariance accompanying the COVID-19 skeleton (see Section 3.2.1). In Figure 2, the COVID-19 period from 21 January 2020 to 17 February 2020 was divided into three phases by the two change points of new confirmed cases: 21 January 2020 to 4 February 2020; 5 February 2020 to 12 February 2020; and 13 February 2020 to 17 February 2020, denoted as I = < I 1 , I 2 , I 3 >. For clarification, the following takes the emotion matrix M and phase I 1 as an example. The covariance between each pair of sub-emotions in M sub during the other subperiods I 2 and I 3 follows the same process. For each two emotion categories c i , c j ∈ C, and vectors C i and C j are extracted from emotion matrix M, |C i | = |I 1 |, |C j | = |I 1 |. The Pearson correlation coefficient between C i and C j is used to represent the covariance between emotion c i and c j in the current phase during COVID-19.
Emotion Transition Model: An emotion transition model extended from previous work [43] was used to measure whether there was mutual transformation between the two public emotion categories. We used the emotion matrix M during phase I 1 as an example. First, all the emotion vectors C i in M are first converted to ranked vectors by category, denoted as C r i and M r , respectively. Thus, the effects of emotional imbalances are eliminated. Next, the emotional transitions between two neighboring time units U i and U i+1 are measured. U is a 7 × 1 vector here. For each category of emotion U i,p in the ranked emotional vector of time unit U i , we compared U i,p with each emotion category in the next time unit U i+1 , applying function comp(x, y) as follows: Thus, for each pair of neighboring time units U i and U i+1 , the transition matrix T i,i+1 = Trans(U i , U i+1 ) is denoted as follows: where 7]. The public emotion transition matrix in phase I 1 during COVID-19 is denoted as follows:

Extracting Hot Events from COVID-19 Super Topics
Hot events are hierarchically extracted according to their popularity in each time unit. For each super topic e, let W =< w 1 , · · · , w n > be the set of microblogs labeled by super topic e. Five measures are used to describe the hot degree hot(e) from two aspects. (1) At the microblog level, for each post w i , the number of retweets, the number of comments, and the number of likes are used to describe the hot degree of e, denoted as (2) At the topic level, the number of discussions and the number of reads are used to describe the hot degree of e, denoted as S e = [e discuss , e read ]. For the set of super topics E =< e 1 , · · · , e k > in each time unit, the hot degree of each super topic e is as follows: hot(e) = weight 1 * ||rank( 1 n ∑ i∈ [1,n] H i )|| 1 + weight 2 * ||rank(S e )|| 1 (8) Thus, the hottest event in the current time unit is denoted as follows: This paper represents the results of the top two super topics for each time unit in Section 4 and sets weight 1 = weight 2 = 0.5.

Results
To better explain the experimental results, we first introduce the detection results for hot events during COVID-19 in Section 4.1. Then, we show the detection results for public emotion evolution and explain public emotion changes in Section 4.2 based on the corresponding hot social events.

Hot Events from Super Topics during COVID-19
Hot events among common users and large users were extracted by time unit. The top three hot events were statistically analyzed and are shown in Figure 3. From 1 January 2020 to 17 February 2020, among common users, twenty-six-point-eight-three percent of the hot events concentrated on topics about the "New coronavirus", followed by "Isolation" at 20.33% and "Epidemic development" at 17.07%. For large users, only one super topic "Wuhan unknown cause pneumonia" was found before 21 January 2020; thus, the hot events were analyzed from 21 January 2020. These hot events concentrated on topics about "Confirmed cases", which achieved a high proportion of 48.19%, followed by "Supporting Wuhan" at 13.25%. The top three hot events at each time unit at different phases of COVID-19 for common users and large users are shown in Figure 4a. The area of each circle represents the popularity of the current event (e.g., the top hot event is denoted as the largest circle), and the nine event categories are differentiated by color. As few super topics were identified at the very start of the COVID-19 period, before 21 January 2020, we viewed this period of time as a warm-up phase and did not include this period in the subsequent analysis. The detection results for category "Wenliang Li" are not shown here owing to their sparsity. The Y axis shows the number of discussions logged under each hot event. To highlight this pattern, only the top three events are shown. Tables A1 and A2 (see Appendix A) list the top and top two hot events in nine groups at each time unit for common users and large users from 21 January 2020 to 17 February 2020.

Public Emotion Evolution
In this section, we show the detection results for public emotion evolution in different phases of COVID-19 from three aspects: accumulated emotions, emotion covariances, and emotion transitions. We explain public emotion changes based on the corresponding hot social events during different phases of COVID-19 in Section 4.1.
Accumulated Emotions During COVID-19. The accumulated values of seven public emotions and 21 sub-emotions during COVID-19 were analyzed and are presented in Figure 5. Love appeared to have the most significant accumulated value during the whole period from 21 January 2020 to 17 February 2020. The accumulated value of love increased around several time points, such as 26 January 2020, 9 February 2020, and 14 February 2020. It is worth noting that from 14 February 2020 to 16 February 2020, the accumulated value of love was very high. The accumulated value of sub-emotion PD(respect) increased several times, especially from 14 February 2020 to 16 February 2020. PH(praise) showed relatively high accumulated values during the observation window.
Emotion Covariances in Different Phases of COVID-19. The covariances among complex public emotions during COVID-19 were analyzed and are presented in Figure 6 by phase. Each grid of coordinates (x, y) represents the covariance coefficient of two categories of emotions x and y for seven emotions. To avoid confusion, public emotion covariance was analyzed by three phases according to peak/valley values on the "newest confirmed cases of COVID-19" skeleton from 21 January 2020 to 17 February 2020. In the first phase (21 January 2020 to 4 February 2020), the sadness and disgust emotions appeared to have the most significant covariance. In the second phase (4 February 2020 to 12 February 2020), although the covariance between sadness and disgust persisted, the new main covariance was between surprise and joy. Surprise was highly covariant with joy and anger. In the third phase (12 February 2020 to 17 February 2020), stronger covariance between positive and negative emotions occurred. In contrast with the previous two phases, a strong negative correlation appeared between sadness and surprise in the third phase. The covariances for the 21 sub-emotions are presented in Figure A1. The sub-emotions of (doubt, reprimand) and (reprimand, wishing) showed significant covariance during the whole COVID-19 period.

Emotion Transitions in Different Phases of COVID-19.
The transitions between complex public emotions are presented in Figure 6 and analyzed by phase from 21 January 2020 to 17 February 2020. The number in each grid (x, y) denotes the frequency of the accumulated transitions between emotion x and emotion y at two neighboring time units. Across the three phases, love was the most frequent and stable emotion. Disgust was the most stable emotion during the first phase, and anger was the most stable emotion during the second phase. Transitions between (anger → surprise) appeared frequently during the first phase (21 January 2020 to 4 February 2020). Transitions between (sadness → disgust) were relatively high during the third phase. The transitions between 21 sub-emotions are presented in Figure A1. The sub-emotions (jealousy → reassurance) frequently exhibited transitions, and the sub-emotion praise showed high continuity during COVID-19.

Discussion
The aim of this research was to use social networks to track the public's emotions during the spread of COVID-19 over a 48 day period since the first laboratory-confirmed case was announced on 1 December 2019, in China, and to analyze hot spot events underlying the emotions. The results show that, at different stages of COVID-19, public emotions showed interesting migration trajectories, as evidenced by both the covariance and transition modes. There was also a clear difference in the focus and reactions of common microblog users and official accounts in the face of the epidemic.
Some studies on public reactions to crises have focused on negative emotions (e.g., anger, anxiousness, fear, and sadness [37,39]) and have considered the threat of negative emotions to institutional trust [44], reputation [45], and the contribution to responsibility [45]. However, our findings show that public positive emotions (e.g., love, respect, praise) co-existed with and even surpassed negative emotions during COVID-19, which is consistent with previous research on other crisis situations [45][46][47]. Love appeared to have the most significant accumulated value during the whole period of 21 January 2020 to 17 February 2020 and increased around several time points (26 January 2020; 9 February 2020; and 14 February 2020). We further checked top-ranked hot events (Tables A1 and A2) and found that social events about "Fighting COVID-19" became hot during this period, as well as good news about effective treatment (e.g., "Cumulative total of 5911 cases discharged from hospital") and concerns about new cases (e.g., "Total number of cases successfully admitted to Vulcan Mountain Hospital exceeds 1000"). The sub-emotions Respect (PD) and Praise (PH) also showed high accumulated values and increased several times, especially from 14 February 2020 to 16 February 2020. The results confirmed that prevalent positive emotions can enhance the public's coping function during disasters [48,49] and increase trust in government [36,50].
According to [51,52], public reactions to crises comprise four main coping methods: rational thinking (cognitive), emotional venting (emotional), instrumental support, and action (conative). At the very beginning phase of COVID-19 (before 21 January 2020), while instrumental support and effective actions from the government were lacking owing to the suddenness of the epidemic, the public's feedback on COVID-19 predated official accounts on the microblog platform and focused on "isolation" and "novel coronavirus". The microblog platform was used to share timely information and to create a sense of connectedness and functioned as a "psychological first aid" for the public [53]. Then, in the first phase (21 January 2020 to 4 February 2020), the emotions of (sadness, disgust) showed the most significant covariance. The sub-emotions (doubt, reprimand) and (reprimand, wishing) also showed significant covariance at this time. The public mainly focused on "Isolation" and "Rapid development of epidemic". The public's rational thinking and emotional venting were also captured, as topics about "Support Wuhan" became hot among both common and large users. During the middle phase (4 February 2020 to 12 February 2020), surprise and joy showed a new main covariance. The emergence of this covariance can be attributed to several causes. For example, with the decrease in daily reported new confirmed cases, topics about effective "Treatment" and "Fighting COVID-19" (e.g., "Original relay to fight COVID-19") became hot in this phase (Figure 4). Topics about "Epidemic prevention and control" became hot among large users. Topics about instrumental support (providing information and social support [51]) from governments, non-governmental organizations, and medical staff across the country were extremely hot, particularly helped to boost public crisis coping and positive emotions, and also reflected the community disaster resilience of the government [54].
It is notable that surprise was also covariant with anger during the middle phase (4 February 2020 to 12 February 2020). This may have been triggered by the fact that top-ranked events about "Doctor Wenliang Li" appeared among common users (see Tables A1 and A2). Doctor Wenliang Li was the first person to issue a protective warning about COVID-19 and was called the "whistleblower" of the epidemic, but was unfortunately infected. In the last phase (12 February 2020 to 17 February 2020), a strong negative correlation appeared between sadness and surprise. Discussions about "Decrease in newly confirmed cases" became hot in this phase.
In addition, after checking all event ranking results in this phase, we found that more than 20 super topics were proposed about the death of Doctor Wenliang Li. Research shows that community members tend to pay attention to victims and their symbolic behavior after a tragedy, and such social solidarity can increase the collective response to disasters [55]. Communal bereavement can also increase social solidarity and lead to collective action, such as memorial rituals [56]. This could explain why the public expressed such strong sadness for Wenliang Li, as well as anger and disgust at his tragic death.
Considering the emotion stability, love persisted as the most stable emotion during the whole COVID-19 period. Disgust was the most stable emotion during the initial phase, and anger was the most stable emotion during the second phase. Considering the emotion transition patterns, transitions between the emotions (anger → surprise) and the sub-emotions (jealousy → reassurance) frequently occurred. Throughout the observation window, the topics "Novel coronavirus" and "Confirmed cases" remained hot, but the number of discussions about these two categories gradually reduced over time, reflecting the process of gradual public adaptation to sudden disasters. This reflects one of the psychological functions of social media usage in disasters, which is to increase the public's feeling of emotional relief [2][3][4][5]. This finding also indicates that disaster resilience [57,58] and social solidarity [59,60] during COVID-19 in China occurred at different levels (e.g., individual, institutional, and environmental) from the perspective of continuous semantic analysis.
Another issue that requires consideration is the role of government censorship of the Sina Weibo platform. According to the Sina Weibo Service Use Agreement (https://www.weibo.com/signup/ v5/protocol), in cooperation with Internet censorship in China, Sina Weibo imposes some restrictions on the content posted by users; for example, "Must not violate the laws and regulations of the People's Republic of China and relevant international treaties or rules" (Regulation 4.10.1) and "Do not upload, display or disseminate any false, racially discriminatory, violent, bloody, or other illegal information materials." (Regulation 4.10.4). In addition, news media and government agencies that open Weibo accounts must also abide by relevant laws, regulations, organizational rules, and regulatory requirements (Regulation 4.2). Such censorship ensures a safe and healthy operating environment for the platform and does not normally restrict users from publishing legal content. Our dataset and experimental results also show that during COVID-19, negative public emotions, such as doubt, jealously, and disgust, were expressed on the platform. Therefore, Sina Weibo can be regarded as a social platform that can objectively reflect public emotions during crises.
This was an exploratory study and therefore had some limitations. First, we did not differentiate between sarcasm and true negative emotions in the microblogging contents. Sarcasm is a special kind of sentiment that uses expressions that mean the opposite of what the person really wants to say [61]. In recent years, the automatic detection of sarcasm in social networks has been considered an interesting research topic in the field of information retrieval and natural language processing. However, it is also recognized as a difficult task [62], as it requires a system that has the necessary knowledge to interpret the linguistic styles of authors [63]. Researchers have mainly used machine-learning methods [64][65][66] and lexicon-based methods for sarcasm detection. For example, Reference [63] attempted to detect sarcasm in microblogs by considering different sets of features (function words and parts of speech n-grams) and tested a range of different feature sets using fuzzy clustering and naive Bayesian models. Reference [61] proposed two approaches to detect sarcasm in the text of Twitter data: a parsing-based lexicon generation algorithm and the occurrence of interjection words. Reference [62] used a hybrid approach by combining machine-learning and lexicon-based methods. Specific to our problem, the addition of sarcasm detection in future studies could help to improve the accuracy of public emotion detection.
Second, regarding the Sina Weibo platform used in this study, an important issue is how to differentiate between real users who retweeted the microblogs and commented, liked, or read the sites from paid bots. At present, both the platform and researchers have proposed a series of solutions to recognize bots. The Sina Weibo platform has implemented complex technical rules to determine whether user account behavior constitutes an automated behavior; for example, frequent repeated postings, a large number of zombie fans, forwarding, but no original microblogs, no interaction with others, and never commenting. Such accounts are assumed to be "bots" (https://new.qq.com/omn/ 20181113/20181113A0TTA1.html?pc, https://www.weibo.com/signup/v5/protocol). In addition, researchers have conducted much in-depth research on the detection of bots in microblogs and tweets, applying multi-feature-based recognition methods and spam-filtering techniques [67][68][69][70]. The present study did not include the identification of bot accounts as a research aim. The elimination of such fake accounts in future work would help to improve data quality and to obtain more accurate research results.
Third, this study explored general public emotion evolution during COVID-19 in China. In practice, the situation may differ considerably in different regions of the country in terms of, for example, prevention and control policies, time of the outbreak, and public feedback, especially for severely affected cities like Wuhan. Owing to differences in the effect of the epidemic, the emotional states of Wuhan citizens may be different from those in other regions of the country. Therefore, more detailed information (e.g., positioning information published with microblogs, regional information in profiles) needs to be collected to carry out in-depth research on public emotion evolution in different regions. COVID-19 was continuing to evolve when this paper was last modified in August 2020. More areas of the world are facing serious situations because of COVID-19, and the psychological states of the public around the world are attracting more attention. Future work also needs to consider various factors like culture differences and government management.

Conclusions
This study focused on the dynamic latent connections among complex public emotions during COVID-19 in China and viewed public emotions as driven by events. To strip out periodic changes from the overall epidemic process, public emotion evolution was embedded into the skeleton of COVID-19 and explained according to hot spot events. This process can help to identify the root causes of public emotions during different phases of the disaster. We collected microblogs during a 48 day period from when the first laboratory-confirmed case was announced on 1 December 2019, in China.
Several key turning points in the occurrence, development (the number of confirmed cases continued to rise), and effective control (the number of confirmed cases continued to decline) of COVID-19 were identified. As COVID-19 is still developing, this study could not analyze the follow-up situation. However, the analysis of these key turning points has important reference value for the understanding of the epidemic situation in other countries experiencing COVID-19.   New coronavirus with potential for widespread transmission C 2 Zhong Nanshan talks about the epidemic C 5 2020-01-22 New coronavirus with potential for widespread transmission C 2 The first new pneumonia confirmed in Henan C 7 2020-01-23 New coronavirus with potential for widespread transmission C 2 New pneumonia C 2 2020-01-24 New coronavirus with potential for widespread transmission C 2 Guangdong medical team sets off to support Wuhan on New Year's Eve C 9 2020-01- 25 Medical experts help Wuhan C 5 138 medical staff in Sichuan help Wuhan C 9 2020-01- 26 138 medical staff in Sichuan support Wuhan C 9 Latest national epidemic map C 5 2020-01-27 New coronavirus with potential for widespread transmission C 9 160 tons of radish from Jiangxi aid Wuhan C 9 2020-01- 28 A medical team of 205 people from Liaoning assists Wuhan C 9 Caring for Wuhan C 9 2020-01-29 Internet companies help Wuhan C 9 A total of 5974 confirmed cases nationwide C 7 2020-01- 30 Hubei latest epidemic situation map C 5 9692 confirmed cases nationwide C 7 2020-01-31 New pneumonia Jiangsu epidemic map C 5 9692 confirmed cases nationwide C 7 2020-02-01 Weekly outbreak map C 5 Li Wenliang was awarded 100,000 yuan by the charity fund C 3 2020-02-02 New pneumonia outbreak map in Jiangsu C 5 A total of 17,205 new pneumonia cases diagnosed nationwide C 7 2020-02-03 A total of 14,380 confirmed cases nationwide C 7 A total of 17,205 new pneumonia cases diagnosed nationwide C 7 2020-02-04 A total of 11,794 confirmed diagnoses nationwide C 7 Over 20,000 confirmed cases nationwide C 7 2020-02-05 Experts say no new coronavirus has been found in pets or humans C 2 Reduction of the spread of the new coronavirus to the whole country C 5 2020-02-06 Guangdong epidemic map C 2 A total of 24,324 confirmed cases nationwide C 7 2020-02-07 A total of 24,324 confirmed cases nationwide C 5 3143 new confirmed cases nationwide C 7 2020-02-08 Over 30,000 confirmed cases of new coronavirus pneumonia nationwide C 7 National Epidemic Infected Community Map C 5 2020-02-09 Experts say no new coronavirus has been found in pets and humans C 7 Aerosol transmission may occur under special conditions of New Coronavirus C 2 2020-02- 10 Is it a work-related injury to be infected with new coronavirus at work? C 4 A total of 40,171 confirmed cases nationwide C 7 2020-02-11 Is it a work-related injury to be infected with new coronavirus at work? C 4 A total of 42,638 cases of new coronavirus pneumonia diagnosed nationwide C 7 2020-02- 12 Aggregated outbreaks indicate that the virus has not spread to society C 4 New cases dropped outside Hubei for 8 days C 7 2020-02- 13 Is it a work-related injury to be infected with new coronavirus at work? C 5 New coronavirus pneumonia outbreak expected to end by April C 5 2020-02- 14 Cumulative total of 5911 cases discharged from hospital C 7 10 new cases drop outside Hubei C 7 2020-02- 15 Nine people in the same family in Beijing IV have been infected C 7 Mr. Weekly fights COVID-19 with you C 1 2020-02- 16 18 new confirmed cases in Hebei C 7 A total of 68,500 cases of new coronavirus pneumonia diagnosed nationwide C 7 2020-02- 17 A total of 543 cases of new coronavirus pneumonia diagnosed in Jiangsu C 7 Spotlight on new coronavirus pneumonia C 5 C 1 = "Fighting COVID-19", C 2 = "Novel coronavirus", C 3 = "Wenliang Li", C 4 = "Treatment", C 5 = "Epidemic development", C 6 = "Epidemic prevention and control", C 7 = "Confirmed cases", C 8 = "Isolation", C 9 = "Support Wuhan". Wuhan frontline doctor consciously isolated from family C 8 2020-01- 24 New pneumonia nationwide real-time map C 5 Doctor feels sick and chooses self-isolation C 8 2020-01- 25 Wuhan elderly couple self-isolated for New Year's Eve C 8 New pneumonia outbreak may improve before Lantern Festival C 5 2020-01- 26 Update the outbreak map in real time C 5 137 people from Liaoning medical team support Wuhan C 9 2020-01- 27 Liaoning 205 medical team assists Wuhan C 9 Xinjiang uniformly isolates asymptomatic people C 8 2020-01- 28 Potential for widespread spread of new coronavirus C 2 1771 new diagnoses nationwide C 7 2020-01- 29 Wuhan New Coronavirus C 2 Han Hong loves Wuhan C 9 2020-01- 30 Wuhan New Coronavirus C 2 New pneumonia C 2 2020-01-31 More than 30 drugs that may be effective against new coronavirus C 2 Hubei Epidemic Prevention Conference C 6 2020-02-01 Husband yells at nurse wife in Wuhan C 9 A total of 17,205 new pneumonia cases diagnosed nationwide C 7 2020-02-02 A total of 11,821 confirmed diagnoses nationwide C 7 Hubei Epidemic Prevention Conference C 6 2020-02-03 Feces or survival virus in new coronavirus pneumonia patients C 2 New coronavirus survives for 5 days and is transmitted by droplets, etc. C 2 2020-02-04 6 key issues of new coronavirus C 2 Hubei Epidemic Prevention Conference C 6 2020-02-05 Fujian outbreak map C 5 A total of 24,324 confirmed cases nationwide C 7 2020-02-06 Guangdong Epidemic Prevention Conference C 6 Difference between cold flu and new coronavirus C 2 2020-02-07 Reduction of the spread of the new coronavirus to the whole country C 5 National Health Commission expresses deep condolences to Dr. Li Wenliang C 3 2020-02-08 Difference between cold flu and new coronavirus C 2 A total of 34,546 cases of new coronavirus pneumonia diagnosed nationwide C 7 2020-02-09 Over 30,000 confirmed cases of new coronavirus pneumonia nationwide C 7 A total of 34,546 new pneumonia cases diagnosed nationwide C 7 2020-02- 10 Difference between cold flu and new coronavirus C 2 New coronavirus can spread in hot, cold, humid and hot weather C 2 2020-02-11 Over 30,000 confirmed cases of new coronavirus pneumonia nationwide C 7 WHO says new coronavirus is not afraid of hot and humid weather C 2 2020-02-12 New cases drop outside Hubei for seven days C 7 Mr. Weekly fights COVID-19 with you C 1 2020-02- 13 Experts say no new coronavirus has been found in pets and humans C 2 Hubei notified the latest progress of epidemic prevention and control C 6 2020-02- 14 Original relay to fight COVID-19 C 1 Fight with epidemic C 1 2020-02- 15 Total number of cases admitted to Vulcan Mountain Hospital exceeds 1000 C 6 Mr. Weekly fights COVID-19 with you C 1 2020-02- 16 Original relay to fight COVID-19 C 1 Mr. Weekly fights COVID-19 with you C 1 2020-02- 17 Spotlight on new coronavirus pneumonia C 5 C 1 = "Fighting COVID-19", C 2 = "Novel coronavirus", C 3 = "Wenliang Li", C 4 = "Treatment", C 5 = "Epidemic development", C 6 = "Epidemic prevention and control", C 7 = "Confirmed cases", C 8 = "Isolation", C 9 = "Support Wuhan".