Do Informational and Emotional Elements Differ between Online Psychological and Physiological Disease Communities in China? A Comparative Study of Depression and Diabetes

Disease-specific online health communities provide a convenient and common platform for patients to share experiences, exchange information, and provide and receive social support. This study aimed to compare differences between online psychological and physiological disease communities in topics, sentiment, participation, and emotional contagion patterns using multiple methods, and to discuss how to satisfy the users’ different informational and emotional needs. We chose the online depression and diabetes communities on the Baidu Tieba platform as the data source. Topic modeling and theme coding were employed to analyze discussion preferences for various topic categories. Sentiment analysis was used to identify the sentiment polarity of each post and comment. Social network analysis was used to represent the users’ interactions and emotional flows and to discover the differences in participation and emotional contagion patterns between psychological and physiological disease communities. The results revealed that people affected by depression focused more on their symptoms and social relationships, while people affected by diabetes were more likely to discuss treatment and self-management behavior. In the depression community, there were obvious interveners spreading positive emotions and more core users in the negative emotional contagion network. In the diabetes community, emotional contagion was less prevalent, and the core users in the positive and negative emotional contagion networks were basically the same. The study reveals insights into the differences between online psychological and physiological disease communities, providing a greater understanding of the users’ informational and emotional needs expressed online. These results can help society provide practical medical assistance and deploy health interventions based on disease types.


Background
The World Health Organization [1] reported that "chronic diseases are the leading causes of death worldwide and the disease rates from these conditions are accelerating globally, advancing across every region and pervading all socioeconomic classes". Compared with other diseases, people suffering from chronic diseases often require a long period of treatment, and the disease becomes a part of their daily life. Therefore, they have many special informational and emotional needs around the disease over a long period of time. With the popularization of the Internet and the strengthening of public health awareness, patients with chronic diseases are more willing to seek informational and emotional support online. Online health communities (OHCs) play a strong role in providing support, where people with health problems can share objective and experiential information with others and raise sensitive topics anonymously [2]. User-generated content (UGC) online is becoming a valuable resource for researchers to study patients' needs and behaviors [3]. Topic modeling [4], sentiment analysis [5], and social network analysis [6] are often used to analyze UGC to uncover topics, emotions, and behaviors hidden in online communities.
Mental health and physical health are two important health issues in chronic diseases that are receiving more and more attention. Dividing communities according to the disease type, and then studying internal informational and emotional elements, is conducive to deepening the understanding of the online behavior of patients with different diseases. On the basis of these discovered differences, personalized interventions and support behaviors can be developed [7,8]. Chronic diseases can be divided into physiological diseases and psychological diseases. Chronic mental disorders mainly include depression, personality disorders, etc. Chronic physical diseases mainly include diabetes, stroke, cardiovascular disease, etc. As both of these types of diseases are chronic, patients with physiological and psychological diseases share some similarities in terms of discussion topics, emotional expression, and engagement patterns, but there are also differences due to different disease types.

Online Disease Communities
One of the leading sources of health information online is from online health communities. Users of OHCs can acquire knowledge and advice from others as well as share objective and experiential information [9]. Online disease communities (ODCs) link people who have common diseases with each other. Compared with OHCs, the topics discussed by users in ODCs are more closely related to a specific disease. Even with strong networks of support from family and friends, ODCs are valuable and important resources that make it possible for individuals to interact with patients with the same disease and obtain social support from them [10]. A wide range of ODC studies have been carried out to study health topics and user needs [4], create social support interventions [11], and understand patient sentiment [12] and behaviors [13].
Previous studies on ODCs have mostly focused on a single disease community such as cancer [14], diabetes [11], autism [15], and so on. There are few comparative studies on diseases in different communities. Some comparative studies of communities only focused on topics and emotions, and there was no further discussion on user participation behavior and different emotional contagion patterns. Regarding topics, Park et al. [7] found that the anxiety, depression, and PTSD communities shared four themes. Chen [16] compared breast cancer, type 1 diabetes, and fibromyalgia communities and found that the theme clusters fell into a set of common categories. Low et al. [17] compared fifteen mental health support groups on Reddit and found that "health anxiety emerged as a general theme across Reddit". For emotions, Gkotsis et al. [18] found that negative sentiment was prevalent in the sixteen mental health communities. Patients with mental illness generally expressed more negative emotions online than those with physical illness [19]. Therefore, in order to better understand the characteristics of different disease communities and user groups, we divided the ODCs into psychological disease communities and physiological disease communities to carry out a comparative study.

Emotional Contagion Theory
Emotional contagion theory was defined as "a type of emotional influence that describes the spread of one person's emotion to others during social encounters" [20], which plays a significant role in group members. It is widely used to identify opinion leaders [21,22] and improve the sentiment analysis model [23,24]. A study in PNAS collected over three million posts on Facebook from 698,003 users over a 20-year period and found that emotion could be transferred to others through emotional contagion, causing people to experience the same emotions unconsciously [25]. These positive and negative emotional states were proven to "behave like infectious diseases spreading across social networks over a long period of time" [26]. Negative emotional contagion, that is, catching someone else's bad mood and experiencing an increase in negative mood as a result, is particularly common in ODCs. Among social networks, unpleasant emotions have been shown to be more likely to transfer to others than pleasant emotions [27]. At the same time, positive emotional support that has proven particularly beneficial to patients' mental health can also be observed in ODCs [28]. Similarly, other desirable health outcomes can also be achieved through ODCs [29] including coping strategies [30], personal relationships [31], and even physical health [32]. In conclusion, figuring out the emotional contagion patterns of ODCs plays a significant role in patients' physical and mental outcomes.

Objectives
The purpose of this study was to compare differences between psychological and physiological disease communities in informational and emotional elements including topic, sentiment, participation, and emotional contagion patterns using multiple methods. Based on these, personalized interventions and support can be provided to satisfy patients' different informational and emotional needs due to different disease types. Specifically, this study aims to answer the following three questions (RQ).
RQ1: What are the main themes and how much thematic similarity and difference exists among the psychological and physiological disease communities?
RQ2: What are important features of sentiment and participation in psychological and physiological disease communities?
RQ3: What are the similarities and differences in emotional contagion patterns between psychological and physiological disease communities?
Depression and diabetes are two typical mental and physical disorders requiring a continuing network of support. Under these conditions, ODCs have unique value as communication platforms for people who share struggles with psychological or physiological diseases. Compared with previous studies, we innovatively introduce emotional contagion theory to understand positive and negative emotional contagion patterns of the two disease communities. We expect that quantifying differences between online psychological and physiological disease communities will yield valuable insights into specific informational and emotional elements expressed by patients with different types of disease and help deploy treatment more effectively.

Methods
Our approach employs topic modeling, theme coding, sentiment polarity analysis, and social network analysis to compare ODCs focusing on depression and diabetes. By comparing the themes and sentiment of UGC in the two communities as well as user engagement and emotional contagion patterns within the communities, the similarities and differences of the two ODCs in the above aspects can be obtained. The research process is described in Figure 1.

Data Collection
The dataset used in this study was collected from two online disease communities on Baidu Tieba, which is the largest Chinese online platform for self-disclosure and Q&A [33]. The platform hosts millions of communities where users can communicate with others on topics related to health, economy, entertainment, and so on [34]. To collect activity data on the users in online physiological and psychological disease communities, we chose the largest communities related to depression and diabetes on Baidu Tieba. The "depression community" and the "diabetes community" had attracted 429,500 and 186,006 contributing users, respectively. By 1 April 2021, there were 1,583,885 threads in the depression community and 31,648 threads in the diabetes community. Figure 2 shows an example of a complete thread. Each threaded discussion begins with an initial post (P0) from a patient with depression or diabetes. This may be followed by several reply posts (P1, P2, …, Pi) including the self-replies (P2). To communicate with the repliers, the patient or someone else can reply to the posts by publishing comments (C1, C2). In addition, members can also reply to the comments under a post (Cj).
Accordingly, we designed a web spider using Python 3.7 to crawl the records dating from 1 January 2010 to 30 October 2020. Because much of the data from before 2018 had been lost, only the records from 2018 to 2020 were retained for subsequent analysis. For this period, we archived 12,040 threads including a total of 152,287 posts and 164,100 comments in the depression community and 8091 threads including a total of 65,688 posts and 76,218 comments in the diabetes community. After identifying and sampling the threads, we extracted a dataset containing several fields, as shown in Table 1.
• Floor (the floor of a post in its thread, which represents the order of posts, e.g., P1 is the first floor)
• Author

Data Sampling
Baidu Tieba is a public platform where users can post without verifying their identities. Consequently, the communities contain some advertisements and meaningless texts. Given the noise in the whole dataset, we restricted our analysis to threads that were initiated by users with four or more initial posts. This threshold ensured sufficient disease-related text for analysis and was also used as the criterion for treating contributors who regularly expressed their views or emotions in ODCs as patients with depression or diabetes [35,36]. In addition, we removed posts and comments that were blank or had authors we could not recognize. Since users can modify their names, we used the portrait, the suffix of the user's homepage URL, as the unique identifier of the user.
The number of posts and comments generated by each user varied considerably: most users created only one thread, while a few active users generated the majority of posts and comments. In the depression community, a total of 6965 unique users initiated at least one thread; 5418 of them published only one initial post, and 465 users created a thread more than four times. These 465 users initiated a total of 4033 threads including 29,128 posts and 29,175 comments. The sample threads we extracted involved a total of 7100 users. In the diabetes community, 4147 unique users initiated at least one thread; 3088 of them published only one initial post, and 319 users initiated a thread more than four times. These 319 users initiated 3326 threads in total including 18,834 posts and 19,599 comments. A total of 3233 users were involved. Table 2 presents the basic descriptions of the collected dataset and sample dataset.
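The sampling rule above can be illustrated with a minimal sketch. It assumes each thread is a dictionary with a hypothetical "portrait" key holding the initiator's identifier, and it uses the "four or more initial posts" threshold stated above; the field names and the toy data are illustrative only and are not taken from the study's dataset.

```python
from collections import Counter

def sample_threads(threads, min_threads=4):
    """Keep threads whose initiator started at least `min_threads` threads.

    `threads` is a list of dicts; the "portrait" key (hypothetical name)
    identifies the user who wrote the initial post.
    """
    # Drop threads whose author cannot be recognized (empty identifier).
    valid = [t for t in threads if t.get("portrait")]
    # Count how many threads each user initiated.
    counts = Counter(t["portrait"] for t in valid)
    return [t for t in valid if counts[t["portrait"]] >= min_threads]

# Toy data: user "u_a" starts four threads, "u_b" one, and one author is unknown.
threads = [{"thread_id": i, "portrait": "u_a"} for i in range(4)]
threads += [{"thread_id": 10, "portrait": "u_b"},
            {"thread_id": 11, "portrait": ""}]
sampled = sample_threads(threads)
print(len(sampled))
```

Keying the count on the portrait rather than the display name follows the reasoning above that user names can change over time.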

Topic Mining
To distinguish the differences in user discussion topics between physiological and psychological disease communities, we used the LDA topic model [37] to model the text of patient discussions. Latent Dirichlet Allocation (LDA) is a three-tier Bayesian topic-generation model, which assumes that each document has a different probability for each discussed topic. To ensure the amount and accuracy of topic mining, we treated a thread as a document for analysis, which contains the initial post and self-replies in the thread [38]. Conducting the analysis at the thread level ensures that all the data were indeed the sincere information needs expressed by patients. After identifying data for topic mining, we removed punctuation marks, URLs, and emoticons that were not related to the topic of the text. Then, Jieba 0.39 in Python 3.7 was employed for word segmentation. During the segmentation, we used the Chinese Medical Subject Headings to expand the lexical dictionary, given the specialized vocabulary of diseases, and removed stop words using the stop word list of the Harbin Institute of Technology in China. Finally, the words "depression" and "diabetes" were also removed to avoid having many topic words directly associated with the diseases. After word segmentation, threads that contained at least three words were preserved. The final dataset for topic mining contained 6930 records including 3806 depression threads and 3124 diabetes threads.
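The cleaning steps described above might look roughly like the following sketch. It is not the study's code: the segmenter is passed in as a parameter (in practice a function such as `jieba.lcut` could be supplied), the toy segmenter and stop word set are placeholders, and the Chinese forms of "depression" and "diabetes" used in `BANNED` are an assumption.

```python
import re

URL_RE = re.compile(r"https?://\S+")
# Keep CJK characters, ASCII letters, and digits; punctuation, emoticons,
# and other symbols are replaced by a space.
NON_WORD_RE = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9]+")
# Assumed Chinese terms for "depression" and "diabetes".
BANNED = ("抑郁症", "糖尿病")

def preprocess(text, segment, stop_words):
    """Clean one thread document and return its list of tokens."""
    text = URL_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text)
    tokens = []
    for chunk in text.split():
        tokens.extend(segment(chunk))
    return [t for t in tokens if t not in stop_words and t not in BANNED]

def build_corpus(documents, segment, stop_words, min_tokens=3):
    """Tokenize all documents and keep those with at least `min_tokens` words."""
    tokenized = (preprocess(d, segment, stop_words) for d in documents)
    return [tokens for tokens in tokenized if len(tokens) >= min_tokens]

# Toy run with pre-spaced text and a trivial segmenter (placeholder for Jieba).
docs = ["今天 血糖 偏高 , 医生 建议 运动 http://example.com", "糖尿病 ?"]
corpus = build_corpus(docs, segment=lambda s: [s], stop_words={"今天"})
print(corpus)
```

Passing the segmenter as an argument keeps the sketch runnable without third-party packages while leaving room for a dictionary-extended segmenter.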
Previous studies have shed light on the discussion topics in different ODCs. Based on the topic classification results of ODCs in the existing literature and the characteristics of our dataset, we constructed a generic theme classification scheme for ODCs. Each document has different probabilities of LDA categories. High probability words in some LDA categories may express the same theme. The core theme of LDA categories can be extracted and labeled manually corresponding to the theme classification scheme of ODCs to obtain the distribution probability of each document under each theme. Then, we conducted Mann-Whitney U tests to examine whether the differences in these theme arrays between the depression and diabetes communities were statistically significant.
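As a hedged illustration of the last two steps, the sketch below sums a document's LDA topic probabilities by theme label and computes the Mann-Whitney U statistic for one theme across two groups. The labels and probabilities are invented, and the pairwise U computation is a simple stand-in: the text does not say which software was used, and a p-value would normally come from a statistics library routine such as `scipy.stats.mannwhitneyu`.

```python
from collections import defaultdict

def theme_coverage(doc_topic_probs, topic_labels):
    """Sum the topic probabilities of one document by theme label.

    `topic_labels[i]` is the manually assigned theme of LDA topic i.
    """
    coverage = defaultdict(float)
    for topic_id, prob in enumerate(doc_topic_probs):
        coverage[topic_labels[topic_id]] += prob
    return dict(coverage)

def mann_whitney_u(x, y):
    """U statistic for sample x: pairs with x > y count 1, ties count 0.5."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Toy example: three LDA topics, two of which share the same theme label.
labels = ["drug therapy", "drug therapy", "lifestyle"]
cov = theme_coverage([0.2, 0.3, 0.5], labels)

# Hypothetical per-document coverage of one theme in the two communities.
u_stat = mann_whitney_u([0.6, 0.7, 0.8], [0.1, 0.2, 0.7])
print(cov, u_stat)
```

U ranges from 0 to len(x) * len(y); values near either extreme suggest that one community's coverage of the theme tends to be higher than the other's.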

Sentiment Polarity Identification
In order to find out the whole sentiment polarity of these two ODCs and the differences between initial posts, reply posts, and comments, TextMind [39] was applied to the text. TextMind is a Chinese language psychological analysis system developed by the Chinese Academy of Sciences. It provides easy access to analyze the preferences and degrees of different categories in text and returns each linguistic dimension score as a proportion of the total number of words under analysis.
The two most notable linguistic dimensions related to sentiment are "positive emotion" and "negative emotion", both of which have been used in several studies to measure positive and negative emotion in online users' posts [40,41]. In this paper, sentiment polarity included three types: positive, neutral, and negative. We defined the sentiment polarity of our records as follows [42]: (1) if the score of positive emotion was higher than that of negative emotion, we defined the document as positive; (2) if the score of positive emotion was lower than that of negative emotion, we defined it as negative; and (3) if the two scores were equal, we defined it as neutral. All the records in the sample dataset were used for sentiment analysis.
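The polarity rule above reduces to a comparison of two scores. The following minimal sketch assumes the positive and negative emotion scores have already been obtained from the text analysis tool; the function name and the example scores are illustrative.

```python
def sentiment_polarity(pos_score, neg_score):
    """Classify a record from its positive and negative emotion scores."""
    if pos_score > neg_score:
        return "positive"
    if pos_score < neg_score:
        return "negative"
    return "neutral"

# Example scores expressed as proportions of words in a record.
examples = [(0.08, 0.02), (0.01, 0.05), (0.0, 0.0)]
polarities = [sentiment_polarity(p, n) for p, n in examples]
print(polarities)
```

Note that a record containing no emotion words at all has equal scores of zero and is therefore labeled neutral under this rule.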

Social Network Analysis
Social network analysis has been widely used in studies of user interaction behavior [43] and social support [44,45] in online communities. In this study, social network analysis was conducted to represent emotional contagion and examine the basic social characteristics of different emotional contagion networks and the role of patients in emotional contagion. A node is defined as a user who posted at least one message in the communities. A tie is operationalized as the post-reply relationship, that is, when A replied to B, a tie was established between them and the arrow pointed from A to B. We inferred that the sentiment of a user's reply post is an objective manifestation of the user's subjective desire to cause an emotional impact on other users. According to the sentiment polarity of each reply, the network can be divided into a positive emotional contagion network and a negative emotional contagion network. The aggregated network and subnetworks of the thread (Figure 1) are shown in Figure 3.
Ucinet 6.0 was employed in this study to construct two aggregated networks of ODCs and calculate the degree centrality of each node. Degree centrality is the number of direct relationships of an entity. A node with high degree centrality is generally an active user in the network. In directed networks, in-degrees of nodes reflect the number of replies one received, while the out-degrees indicate the replies one provided to others. After separating different sentiment polarities of posts and comments in ODCs, Gephi 0.9.2 was used to calculate and visualize the positive and negative emotional contagion networks.
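The network construction can be sketched in plain Python as follows. This is only an illustration of the definitions above, not a substitute for the Ucinet and Gephi workflow: the reply tuples are invented, and degrees are counted as numbers of replies, following the description of in-degree and out-degree.

```python
from collections import Counter

def build_networks(replies):
    """Build weighted directed edge lists from (replier, target, polarity) tuples.

    An edge (A, B) means A replied to B. Replies are also split into
    positive and negative emotional contagion networks by polarity.
    """
    edges = {"all": Counter(), "positive": Counter(), "negative": Counter()}
    for src, dst, polarity in replies:
        edges["all"][(src, dst)] += 1
        if polarity in ("positive", "negative"):
            edges[polarity][(src, dst)] += 1
    return edges

def degree_centrality(edge_weights):
    """Return (in_degree, out_degree) counters for a weighted edge list."""
    in_deg, out_deg = Counter(), Counter()
    for (src, dst), weight in edge_weights.items():
        out_deg[src] += weight   # replies the user gave
        in_deg[dst] += weight    # replies the user received
    return in_deg, out_deg

replies = [("A", "B", "positive"), ("C", "B", "negative"),
           ("B", "A", "positive"), ("A", "B", "neutral")]
networks = build_networks(replies)
in_deg, out_deg = degree_centrality(networks["all"])
pos_in, pos_out = degree_centrality(networks["positive"])
print(in_deg["B"], out_deg["A"], pos_in["B"])
```

Users with high out-degree in the positive network would correspond to those who frequently send positive replies, while high in-degree marks users who attract many replies.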

Preference Topics
The LDA topic model was used to identify the users' preferred topics in ODCs. To determine the optimal number of topics, we calculated the model perplexity for each number of topics (K). Based on the perplexity principle, a smaller value of perplexity reflects a better clustering effect of the model [37]. As shown in Figure 4, the perplexity was lowest at K = 24 (for K from 1 to 100). Therefore, we set the number of topics as 24 and generated the topic distribution for each document.
Moreover, after referring to previous literature and merging topics, a generic theme classification scheme for ODCs was formed, with a total of four main topics and eight subtopics, as shown in Table 3. In order to test the final classification scheme, an intercoder reliability test was conducted. Two students majoring in information science and medicine annotated each LDA category with the most related theme label. After labeling, Cohen's Kappa coefficient was used to verify the consistency of the labeling results. The Cohen's Kappa coefficient was 0.855, indicating good consistency, so the classification scheme was considered reliable [46]. The coders then discussed the remaining disagreements and arrived at a consistent content annotation result. After LDA topic modeling, the topic probability distribution for each document could be obtained. Based on the annotation result, we measured the topical coverage of each subtopic by the sum of the topic probabilities of different LDA categories with the same label. For the concrete content of these 24 LDA categories and merging results, see Appendix A, Table A1.
Table 3. The two-layer theme classification scheme for the texts in online disease communities.

Main Topics | Subtopics | Description
Treatment [16,47] | Drug therapy [19] | Treatment that involves using medications to treat diseases or conditions, usually on a consistent basis.
Treatment [16,47] | Nondrug therapy [19] | Treatments of diseases, which include hospitalization, psychotherapy, surgery, etc.
Symptoms [48] | Psychological | Abnormal feelings and thoughts of patients due to diseases.
Symptoms [48] | Physical | Abnormal physical conditions of patients due to diseases.
Self-management [16] | Lifestyle [49,50] | Life record about work, diet, mood or other issues.
Self-management [16] | Interventions [51] | The intervention of diseases with food, exercise or other methods to keep healthy or help for recovery in daily life.
Social environment | Social events | Discussion about social events related to diseases.
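The intercoder reliability check mentioned before Table 3 uses Cohen's Kappa, which compares observed agreement with the agreement expected by chance. The sketch below is a generic implementation of that standard formula; the two label lists are invented and do not reproduce the coders' actual annotations.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two coders who labeled the same items."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("both coders must label the same non-empty item set")
    # Observed agreement: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical labels assigned by two coders to four LDA categories.
coder_1 = ["drug therapy", "drug therapy", "physical", "lifestyle"]
coder_2 = ["drug therapy", "physical", "physical", "lifestyle"]
kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 3))
```

In this toy case the coders agree on three of four items, yet Kappa is noticeably lower than 0.75 because part of that agreement is attributed to chance.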
To assess the users' discussion preferences and thematic differences in the two disease communities, Figure 5 displays the document probability distribution of each topic in the communities. Seven discussed topics were found to have significant differences between the online depression and diabetes communities: drug therapy, nondrug therapy, psychological, physical, lifestyle, interventions, and relationships.

Figure 5 clarifies the differences in discussion preferences between the two ODCs. The first theme is treatment, which can be divided into drug therapy and nondrug therapy. Figure 5a shows that diabetics were more willing to discuss their medication habits in the community, while other treatment methods such as hospitalization and surgery were less likely to be discussed. Patients with depression paid less attention to treatment. The second theme is symptoms, which contains psychological and physical symptoms. It can be clearly observed in Figure 5b that patients with depression generally expressed a higher desire to talk about their mental states, and at the same time, they were also concerned about their physical states. Diabetics paid more attention to their physical symptoms, and their mental states were not significantly affected by the disease. The third theme is self-management, which includes texts on lifestyle and interventions. As shown in Figure 5c, the online disease community was an important platform for diabetics to record their living habits and daily self-management behavior. Through this platform, they can regulate their own behavior and encourage and supervise each other to take better care of themselves. In contrast, patients with depression did not pay much attention to their lifestyle and daily interventions for disease. The final theme is social environment, which includes texts on relationships and social events. It can be clearly seen in Figure 5d that patients with depression were more likely to talk about their relationships with others including parents, friends, work partners, etc. In contrast, diabetics worried less about their social relationships and more often discussed trending social issues.

Sentiment Polarity of the Communities
The sentiment polarity of each initial post, reply post, and comment was identified and calculated. Figure 6 shows the proportion of text sentiment in the two communities after classification. In general, initial posts contained the most emotion in the ODCs, followed by reply posts and comments. In the depression community, positive emotion was distributed evenly among the three types of text, while in the diabetes community it accounted for a large proportion of the initial posts and a small proportion of the reply posts and comments. Negative emotion accounted for a large proportion of the initial posts in the depression community and gradually decreased in the reply posts and comments, while in the diabetes community it was evenly distributed across the three types of text. Compared with diabetics, patients with depression were more likely to initiate a thread to express their negative emotions, and there were more positive voices in the replies. Diabetics tended to post more positive content in their initial posts than those with depression, and users expressed less positive emotion in the response text.

Figure 7 shows the number of posts and comments in different time periods, which reflects the users' participation behavior. The line chart also shows how positive, negative, and neutral emotion changed over time in the ODCs. The findings revealed that users preferred to participate in the depression community after 18:00, with a peak at 22:00. Compared with positive and neutral emotions, negative emotion was stable and changed more slowly. In the diabetes community, three peak periods of use were clearly observed: the first from 9:00 to 11:00, the second from 15:00 to 17:00, and the third from 20:00 to 22:00. The fluctuation range of positive emotion and negative emotion was basically the same. The peak of positive emotion in the afternoon and evening was one hour later than that of negative emotion.
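The hourly participation curves described for Figure 7 can be approximated with a simple tally. The following is a minimal sketch, not the procedure used in the study: the `(hour, sentiment)` record format, the function names, and the toy data are all hypothetical, and ties between hours are broken in favor of the earlier hour as an arbitrary convention.

```python
from collections import Counter, defaultdict

def hourly_counts(posts):
    """Tally posts per hour of day, overall and per sentiment label.

    `posts` is an iterable of (hour, sentiment) pairs, where hour is 0-23.
    This record format is an assumption made for illustration.
    """
    counts = defaultdict(Counter)
    for hour, sentiment in posts:
        counts[sentiment][hour] += 1
        counts["all"][hour] += 1
    return counts

def peak_hour(counter):
    """Return the hour with the most posts (earliest hour wins a tie)."""
    return max(counter, key=lambda h: (counter[h], -h))

# Hypothetical toy data, not taken from the study.
toy_posts = [
    (22, "negative"), (22, "negative"), (22, "positive"),
    (21, "negative"), (23, "positive"), (23, "positive"),
    (9, "neutral"),
]
toy_counts = hourly_counts(toy_posts)
overall_peak = peak_hour(toy_counts["all"])
negative_peak = peak_hour(toy_counts["negative"])
positive_peak = peak_hour(toy_counts["positive"])
```

Comparing `positive_peak` with `negative_peak` is one way to check for the kind of lag between emotion peaks reported above.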

Post-Reply Relationships
Ucinet 6.0 was used to construct aggregated networks of the two ODCs and to calculate each user's in-degree and out-degree. In the aggregated networks, the higher a member's in-degree, the more replies the member received; the higher a member's out-degree, the more replies the member provided. There are 7100 nodes in the depression network and 3233 nodes in the diabetes network. In a log-log plot, the distributions of in-degree and out-degree in the two aggregated networks approximate a long-tailed power law (see Figure 8), which is typical of scale-free networks. This means that the majority of users have low in/out degrees, while only a small proportion of users have very high in/out degrees in these two communities. Most posts were published by a few users, and only a small number of users received replies from others.
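The in-degree and out-degree definitions above can be illustrated with a short sketch. It is not the Ucinet procedure used in the study: the `(replier, recipient)` edge format and the toy data are hypothetical, and it is assumed here that every reply counts once toward the degree, so repeated replies between the same pair accumulate.

```python
from collections import Counter

def degree_counts(replies):
    """Return (in_degree, out_degree) Counters from (replier, recipient) pairs.

    Out-degree counts replies a user provided; in-degree counts replies a
    user received. Users with a degree of zero do not appear in the Counter.
    """
    in_degree, out_degree = Counter(), Counter()
    for replier, recipient in replies:
        out_degree[replier] += 1
        in_degree[recipient] += 1
    return in_degree, out_degree

def degree_distribution(degree):
    """Map each degree value k to the number of users having that degree."""
    return Counter(degree.values())

# Hypothetical toy data: (user who replied, user who was replied to).
toy_replies = [("u1", "u2"), ("u3", "u2"), ("u4", "u2"), ("u1", "u3"), ("u2", "u1")]
in_deg, out_deg = degree_counts(toy_replies)
in_dist = degree_distribution(in_deg)
```

Plotting the keys of `in_dist` against its values on log-log axes would give the kind of view shown in Figure 8; a roughly straight, downward-sloping line is the usual visual sign of a long-tailed distribution.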

Emotional Contagion Networks
A social network can portray the emotional contagion pattern among users. After identifying the sentiment polarity of each reply, the whole network can be divided into two kinds of emotional contagion networks: positive emotion and negative emotion. Table 4 shows the network measures calculated for each of the four emotional contagion subnetworks. We compared the subnetworks by average degree, average clustering coefficient, and average path length. The average degree is the mean number of ties per node. The average clustering coefficient represents the degree of node aggregation in a graph; a small value suggests that peer neighbors are not closely connected. The average path length is the average shortest distance between all pairs of nodes in the network. If emotion spreads over a social network, the average clustering coefficient and path length indicate how many steps it takes to travel from user A to user B.
Different networks have different characteristics. Among the four subnetworks, the positive emotional contagion network of the depression community had the most nodes (3834), followed by the positive subnetwork of the diabetes community (3157). Overall, the positive emotional contagion networks contained more nodes than the negative ones, indicating that more users participated in positive emotional contagion networks in ODCs. The number of ties and the average degree of the positive emotional contagion networks were higher than those of the negative emotional contagion networks, meaning that each person had fewer negative emotional connections and more positive emotional connections with others in both communities. In the depression community, the average distance between nodes was around 3.872 and the average clustering coefficient was around 0.014 for positive emotional contagion, which indicates that positive emotion could reach other nodes in fewer steps than negative emotion. In contrast, the positive emotional contagion network had a higher average path length (3.550) and a lower average clustering coefficient (0.029) than the negative emotional contagion network in the diabetes community, which indicates that negative emotion could spread more easily than positive emotion. In general, compared with the depression community, the positive and negative subnetworks of the diabetes community had higher clustering coefficients and shorter average path lengths, although they contained fewer nodes and ties. This means that information can spread among users more quickly in the diabetes community.
Figure 9 presents the structure of the different emotional contagion networks in ODCs. In these subnetworks, each node represents a user, and the node size is proportional to its out-degree, indicating the emotional intensity expressed by the user. Edge thickness represents the strength of the emotional link between users: the thicker the edge, the greater the amount of emotional contagion between them. The subnetworks can better capture how emotion flows among ODC users. A community detection algorithm was used to group users, so that nodes with the same characteristics are gathered together, and different communities are distinguished by color. The Louvain algorithm [53], which is recognized as one of the best methods for community detection in terms of computational time [54], was used to detect the potential communities of participants. With such visualization, it is possible to see how different emotions can spread from core users to others and how they are positioned in the network.
It can be clearly observed that emotional contagion is more prevalent in the depression community than in the diabetes community. For the depression community, there were fewer core users with large emotional contagion volume in the positive emotional contagion network. The network structure mainly consisted of core users expressing positive emotions and influencing many surrounding users. Three obvious clusters formed around core users, presenting an apparent core user-influenced emotional contagion dynamic. Compared with the positive emotional contagion network, the number of core users in the negative emotional contagion network was significantly higher, but the nodes were generally small. This means that more core users spread negative emotions in the network, but each user delivered a small amount of negative emotion. For the diabetes community, there were several core users in the positive and negative networks. The core nodes in these two networks were interconnected, and emotions spread between users across different communities. This means that users in the diabetes community expressed and communicated their feelings with each other more frequently and that there were no dominant influential users spreading emotions.
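The three measures compared in Table 4 can be made concrete with a small standard-library sketch. It rests on simplifying assumptions that may differ from how the values in Table 4 were computed: ties are treated as undirected and unweighted, nodes with fewer than two neighbors contribute a clustering coefficient of zero, and path lengths are averaged only over pairs of nodes that can reach each other. The function names and the toy edge list are hypothetical.

```python
from collections import deque
from itertools import combinations

def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbors)}, skipping self-loops."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def average_degree(adj):
    """Mean number of ties per node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

def average_clustering(adj):
    """Mean local clustering coefficient over all nodes."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # coefficient taken as 0 for nodes with < 2 neighbors
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)

def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            cur = queue.popleft()
            for nxt in adj[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

# Hypothetical toy network: a triangle (a, b, c) with one pendant node d.
toy_adj = build_graph([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
toy_degree = average_degree(toy_adj)
toy_clustering = average_clustering(toy_adj)
toy_path_length = average_path_length(toy_adj)
```

In this toy network, the triangle raises the clustering coefficient while the pendant node lengthens the average path, which mirrors the trade-off discussed above: higher clustering and shorter paths both suggest that emotion can reach other users in fewer steps.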

Core Users and Interveners Identification
To explore differences between the core users, Table 5 presents the 10 participants who had the highest out-degrees in the four subnetworks for the depression and diabetes communities. Descriptive network characteristics of these participants in the four subnetworks are available in Appendix A, Table A2. For confidentiality reasons, the participants' portraits were substituted with unique identifiers. By comparing the data presented in Table 5, it is possible to infer that, in the depression community, users who spread positive emotions are largely different from those who spread negative emotions, as only a few participants appeared in both the positive and negative networks. In contrast, seven of the top ten core users in the positive emotional contagion network could also be observed in the user list of the negative emotional contagion network in the diabetes community. This means that these participants spread negative emotions as well as positive emotions. Based on this, we defined the unique influential users in the positive emotional contagion network as "interveners". Combined with Figure 9, it can be seen clearly that two obvious interveners (Dep_user_01, Dep_user_02) existed in the positive emotional contagion network of the depression community. They transmitted large amounts of positive emotion and had a huge impact on others around them, but rarely conveyed negative emotions. The core user Dep_user_03 was ranked high in both the positive and negative subnetworks, so it cannot be defined as an intervener.
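One way to read the intervener definition above is as a set difference between the top spreaders of the two subnetworks. The sketch below follows that reading and is not the authors' exact procedure: the `(sender, receiver, weight)` edge format, the function names, and the toy data are hypothetical, and weighted out-degree is assumed as the ranking criterion.

```python
from collections import Counter

def top_spreaders(edges, n=10):
    """Return the n users with the highest weighted out-degree.

    `edges` is an iterable of (sender, receiver, weight) triples, where
    weight is the number of emotional replies sent (an assumed format).
    """
    out_degree = Counter()
    for sender, _receiver, weight in edges:
        out_degree[sender] += weight
    return [user for user, _ in out_degree.most_common(n)]

def find_interveners(positive_edges, negative_edges, n=10):
    """Top positive spreaders who are absent from the top negative spreaders."""
    top_negative = set(top_spreaders(negative_edges, n))
    return [u for u in top_spreaders(positive_edges, n) if u not in top_negative]

# Hypothetical toy data, not taken from Table 5.
toy_positive = [("helper", "a", 5), ("helper", "b", 4), ("mixed", "a", 3), ("a", "b", 1)]
toy_negative = [("mixed", "b", 4), ("a", "helper", 2), ("b", "a", 1)]
toy_interveners = find_interveners(toy_positive, toy_negative, n=2)
```

Here `helper` is flagged because it ranks high only in the positive network, whereas `mixed` ranks high in both and is excluded, analogous to the treatment of Dep_user_03 above.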

Principal Findings
The study presents several significant findings about informational and emotional elements in online disease communities. We wanted not only to compare the overall discussion themes and the sentiment expressed by users in the online depression and diabetes communities, but also to identify differences in emotional contagion patterns. Emotional contagion theory enables a better understanding of the emotional contagion patterns of ODCs. As we expected, the depression and diabetes communities have some similarities and differences in the above aspects, and our findings answer the three research questions raised at the beginning of this paper.
Preference topics: The discussion in the ODCs mainly focused on four topics: treatment, symptoms, self-management, and social environment. Patients with depression were more concerned about their own symptoms and social relationships with others. They were more likely to reveal their mental status and share their emotions in the community. Diabetics were more willing to share their life experiences within the community, and they tended to focus on their lifestyle and interventions in daily life. In sum, patients with mental illness paid more attention to the past, their own thoughts, and relationships with others, while patients with physiological disease had a more positive attitude toward the future and hoped to maintain fitness through treatment and self-management.
Emotional expression: Patients in both communities were willing to initiate a thread to express their negative emotions. As time passed, the number of posts in the depression community increased gradually from 6:00 in the morning and reached a peak at 22:00 in the evening. The online diabetes community had three emotional peaks during one day. Accordingly, compared with patients with physiological disease, patients with mental illness preferred to express their emotions at night.
Emotional contagion patterns: Core users of the positive emotional contagion network were fewer than, and different from, those of the negative emotional contagion network in the depression community. This may suggest that positive emotions are conveyed more by the interveners than by the patients in the depression community. In general, there were two obvious interveners in the depression community who broadly connected with other users and conveyed positive emotions to them. At the same time, positive emotions were more likely to spread within the community than negative emotions, so it is very important to bring in interveners and intervene proactively in the depression community. In contrast, core users in the diabetes community who spread positive and negative emotions were basically the same, and negative emotions spread more easily. Emotional contagion was less prevalent in the diabetes community. Therefore, more attention should be paid to the filtering of negative texts in the diabetes community.
Self-stigma: This difference in emotional contagion patterns provokes interesting discussions. One alternative explanation of the emotional contagion results could be self-stigma of depressed patients. People with mental disorders are often disdained by the public [55]. While all illnesses can face stigmatized attitudes from others, the public seems to discriminate against people with mental illnesses far more than those with physical illnesses [56]. "Stigmatized persons may internalize perceived prejudices and develop negative feelings about themselves", the result of which is "self-stigma" [57]. Self-stigma has become an important barrier to expressing emotions and seeking treatment for depression and other mental illnesses [58,59]. Compared with physical diseases, the self-stigma of patients with mental illness makes them unwilling to accept help from others and afraid of discrimination. Such a negative view focuses too much on self and is difficult to resonate within the self-stigma group, thus hindering the spread of negative emotions in the community to some extent.

Implications
This study is significant in that it deepens the understanding of physiological and psychological diseases, including the patients' needs expressed on the Internet, their attitudes toward the disease, and their ways of participating in online communities. On a theoretical level, we utilized emotional contagion theory to develop the study, and this research has helped to advance the understanding of how emotion flows in online communities and how these flows differ across different types of ODCs. On a practical level, our research can help health care administrators and health care providers offer social support that matches patients' different informational and emotional needs. Meanwhile, understanding the contagion patterns of different emotions in ODCs can give online community managers insights into how to improve the patients' mental well-being through interventions in different communities. Furthermore, this study addresses the psychosocial benefits of ODCs, and psychological counseling via ODCs should be strengthened. The ultimate goal of this research is to provide targeted social support and effective strategies for those who are struggling with psychological and physiological diseases, and to contribute to improving their quality of life by making full use of ODCs. ODCs should also consider user privacy when deploying any intervention.

Limitations and Future Directions
There are some limitations that may encourage further research efforts. First, because of the particularity of each community, the users' needs, emotions, and engagement patterns may differ. Although depression and diabetes are two typical chronic psychological and physiological diseases, the results we found in these two ODCs may not be applicable for all the characteristics of the two disease types. Moreover, there is inevitably some overlap among the communities. Patients with physical illness may also suffer from mental distress. Therefore, comparing the differences between patients with psychological and physiological diseases requires more ODC studies and more rigorous user filtering mechanisms.
The second limitation is the methods we used for topic mining and sentiment analysis. Manual coding was used to label the most related topic in each LDA category. This process could cause deviations in theme classification to some extent. A more reasonable topic model or clustering method can be considered in the future. TextMind was used for sentiment analysis, which includes multiple dimensions to describe the users' mental states. We compared the two dimensions in it (positive emotion and negative emotion) to identify sentiment polarity, and its effectiveness needs to be further verified. Future work could further investigate this by using a mixed method of machine learning and manual labeling.
Finally, our study did not examine the duration of emotional contagion within the community and the extent to which users were affected. Although previous studies suggested that emotional contagion is prevalent in social communities, we cannot conclude that positive or negative emotion directly caused an impact on the users. We presented the emotional contagion pattern through a weighted social network, and conducting longitudinal studies to analyze the patients' emotion changes could deepen our understanding of emotional contagion in future work.

Conclusions
The main purpose of this study was to find the similarities and differences between online psychological and physiological disease communities. Data were collected from Baidu Tieba, which is the largest online forum in China. Topic modeling, sentiment analysis, and social network analysis can effectively capture the characteristics of patients with different diseases. Patients with depression had a great demand for emotional catharsis and preferred to express their emotions at night. They focused more on themselves and paid little attention to treatment options. Diabetics advocated maintaining fitness through self-management. Furthermore, emotional contagion patterns were generally different in the two communities. In the depression community, there were two obvious interveners spreading positive emotions and more core users in the negative emotional contagion network. In the diabetes community, more users were involved in the positive emotional contagion network, and core users in the positive and negative emotional contagion networks were basically the same. In summary, our overall finding extends the existing emotional contagion theory to the OHCs by identifying the posts' sentiment and core users in active group communications. We also contribute practical suggestions for designing ODCs to improve the mental and physical benefits of members. Different and timely interventions should be adopted according to the content and emotional tone of discussions in the whole communities, thus creating a positive atmosphere and promoting the healthy and sustainable development of the communities. The results are also helpful for health experts and practitioners to understand the users' different informational and emotional needs expressed online and help them better manage their health.

Acknowledgments:
We also appreciate the anonymous reviewers for their valuable comments on an earlier draft of our paper.

Conflicts of Interest:
The authors declare no conflict of interest.  For confidentiality reasons, the users' portraits were not displayed in its entirety, and parts of them were replaced with *.