3.2. LDA Topic Model
The LDA topic model was implemented in Python, and Gibbs sampling was used to estimate the model parameters during the modeling process. We performed 1500 iterations; after that, we extracted the most representative words in each topic and summarized the top 10 words. Table 2 shows the top 10 topic words for the six topics that were ultimately generated. These words gave us clues that enabled us to explain and summarize the discussion text in each topic.
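The estimation procedure described above can be sketched as a collapsed Gibbs sampler. This is a minimal illustration, not the program used in the study: the function names, the hyperparameters `alpha` and `beta`, the seed, and the toy documents are all assumptions. The study's setting would correspond to `n_topics=6` and `n_iter=1500` on the segmented forum posts.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, n_iter, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents.

    Returns (doc_topic, topic_word, vocab). doc_topic[d][k] is the estimated
    probability of topic k in document d; topic_word[k] is a Counter holding
    the number of times each word is assigned to topic k.
    alpha and beta are assumed Dirichlet hyperparameters.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_vocab = len(vocab)
    ndk = [[0] * n_topics for _ in docs]        # topic counts per document
    nkw = [Counter() for _ in range(n_topics)]  # word counts per topic
    nk = [0] * n_topics                         # total tokens per topic
    z = []                                      # topic assignment per token

    # Random initialisation of the topic assignments.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Unnormalised full conditional p(z = t | all other tokens).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + n_vocab * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    doc_topic = [
        [(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    return doc_topic, nkw, vocab


def top_words(topic_word, n=10):
    """Return up to n of the most frequently assigned words for each topic."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]


# Toy, hypothetical documents; real input would be the segmented forum posts.
toy_docs = [
    ["doctor", "hospital", "surgery", "doctor"],
    ["medicine", "effect", "doctor", "medicine"],
    ["parents", "friends", "work", "life"],
]
toy_doc_topic, toy_topic_word, toy_vocab = lda_gibbs(toy_docs, n_topics=2, n_iter=50)
print(top_words(toy_topic_word, n=3))
```

The per-document topic probabilities returned in `doc_topic` are the quantities on which the later comparisons between disease communities rely.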
Topic 1 contains words describing feelings, such as “feel” and “thought”, as well as words describing emotions, such as “anxiety” and “painful”, which may be understood as patients describing feelings and ideas generated by external things.
Topic 2 includes words such as “accept”, “face” and “understand”. It can be observed that, under this topic, patients mainly describe their self-regulation when they face their diseases or the persuasion of others.
Topic 3 includes the words “parents”, “friends” and “children”, which reflect the close social relationships of patients, as well as “work”, “life” and other key words that reflect social life. Patients therefore mainly discussed their social environment in this topic, chiefly their social relationships and living background.
Topic 4 includes key words for entities such as “doctor” and “hospital”, as well as action words such as “surgery” and “inspect”. Thus, this topic can be summarized as nondrug therapy, which distinguishes it from drug therapy.
Topic 5 also includes entity keywords such as “doctor” and “hospital”, but it differs from Topic 4 in that key words include “medicine”, “meal”, “effect” and other words that are highly relevant to drug therapy. Thus, this topic is summarized as “drug therapy”. Doctors usually advise patients to exercise when using drug therapy. Thus, the keyword “exercise” also appears under this topic.
The key words of Topic 6 mainly include the words “polypeptide”, “vessel” and pharmacological reaction words, such as “effect”, “ingredient”, and so on. Therefore, we inferred that this topic mostly consisted of disease knowledge and drug recommendations; that is, exchanges of professional knowledge.
To demonstrate that our explanation of each topic is specific enough, and to clarify in more detail the differences between patients with physiological and psychological diseases, Table 3 provides examples of patient discussion text from four disease communities. Each example text has a probability of over 60 percent of belonging to its topic, which ensures the specificity of the text topic.
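The 60 percent criterion can be expressed as a simple filter over the per-document topic probabilities. The sketch below is illustrative only; the function name, the data layout, and the toy values are assumptions.

```python
def representative_texts(texts, doc_topic, threshold=0.6):
    """Group texts by dominant topic, keeping only those above the threshold.

    texts and doc_topic are parallel lists; doc_topic[d] holds the topic
    probabilities of texts[d]. Returns {topic_index: [(text, probability)]}.
    """
    selected = {}
    for text, probs in zip(texts, doc_topic):
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] > threshold:
            selected.setdefault(best, []).append((text, probs[best]))
    return selected


# Hypothetical values for illustration.
example_texts = ["post A", "post B", "post C"]
example_doc_topic = [[0.7, 0.2, 0.1], [0.4, 0.35, 0.25], [0.1, 0.05, 0.85]]
picked = representative_texts(example_texts, example_doc_topic)
```

Here "post B" is dropped because no single topic exceeds the threshold, which is the property that keeps the selected examples specific to one topic.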
By observing the representative texts in Topic 1 (the patients’ feelings and thoughts), we found that patients with physiological diseases were more likely to discuss their own thoughts or plans for action, while patients with psychological diseases were more likely to discuss how the diseases affected them. Both kinds of discussion are reflected in this topic.
In the examples of Topic 2 (patient self-regulation), we found that both physiological and psychological disease communities discussed their own self-regulation. This kind of self-regulation is mainly reflected in patients’ attitude adjustment and emotional control. In the forum, the topic of self-regulation is reflected in whether patients encourage others or comfort themselves.
After further observing the characteristics of the text in Topic 3 (the social environment), we found that patients with physiological diseases were concerned with changes in their social environment over a period of time. In comparison, posts by patients with psychological diseases indicated that they paid more attention to the long-term social environment before and after the illness or to their main social relationships. The examples illustrate this topic.
For Topic 4 (nondrug therapy), physiological diseases can be treated directly with nondrug therapies, so the discussion text on heart disease and hypertension consists of descriptions of surgery or laboratory tests. Patients with psychological diseases such as depression or OCD more often discuss the process of examination and psychological interventions, which is related to the particularities of the diseases themselves.
By observing the posts assigned to Topic 5 (drug therapy), we found that patients with both physiological and psychological diseases required certain drug therapies. Therefore, there is no significant difference in the text of these posts between the two types of disease communities.
We found that posts within Topic 6 (professional knowledge exchange) contained a large number of drug recommendations in advertising texts or introductions to professional knowledge. In contrast to the posts from the previous five topics, the words associated with Topic 6 were more specialized, and the text was not generated by normal community users. Therefore, although we did not analyze text differences under this topic, the existence of this topic shows patients’ need for professional medical and pharmacological knowledge. This should be of great significance to the operators of online health communities.
To assess the extent to which the topics may reflect patients’ discussion preferences, we plotted Figure 4a–f to analyze the probability distribution of each disease OHC for each topic. This step was also used to test the validity of the topic model, that is, to ensure that it identified the discussion preferences of patients with different types of diseases. As a whole, the topics could clearly identify the differences in topics discussed by patients with different types of physiological and psychological diseases.
Figure 4a–f can clearly distinguish the discussion preference differences between the two kinds of patients. It can be seen in Figure 4b that patients with obsessive-compulsive disorder were more likely to discuss Topic 2, patient self-regulation. In Figure 4c, patients with depression were more likely to discuss Topic 3, social environment. These topics focus on the expression of patients’ own emotions or the description of their living environment. Thus, patients with psychological diseases were more likely to focus on releasing their emotions and displaying their lives in an online health community. However, as shown in Figure 4d, patients with heart disease were more likely to discuss Topic 4, nondrug therapies. In Figure 4e, patients with hypertension were more likely to discuss Topic 5, drug therapies. We can conclude that patients with physiological diseases were more likely to be interested in the treatment of their diseases and were more likely to share their own or others’ treatment experiences. Finally, the discussions in Topic 1 can also illustrate the differences in topic preference between patients with physiological and psychological diseases. As shown in Figure 4a, patients with heart disease and hypertension participated in significantly less discussion pertaining to the topic of “feelings and thoughts” than did patients with depression and OCD; that is, patients with psychological diseases were more willing to express their thoughts, describe their feelings and express positive emotions that were beneficial to community activity.
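One simple way to summarise such per-community topic preferences numerically is to average each topic's probability over the posts of each community. The sketch below assumes this summary; the figure itself may show full distributions rather than means, and the function name and toy values are hypothetical.

```python
from collections import defaultdict


def mean_topic_share(doc_topic, communities):
    """Average each topic's probability over the posts of each community.

    doc_topic[d] is the topic distribution of post d and communities[d] is
    the name of the community in which post d appeared.
    """
    sums = {}
    counts = defaultdict(int)
    for probs, name in zip(doc_topic, communities):
        if name not in sums:
            sums[name] = [0.0] * len(probs)
        for k, p in enumerate(probs):
            sums[name][k] += p
        counts[name] += 1
    return {name: [s / counts[name] for s in total] for name, total in sums.items()}


# Hypothetical two-topic example.
sample_doc_topic = [[0.8, 0.2], [0.6, 0.4], [0.25, 0.75]]
sample_communities = ["depression", "depression", "hypertension"]
shares = mean_topic_share(sample_doc_topic, sample_communities)
```

Comparing the entries of `shares` across communities for a fixed topic index gives a rough numeric counterpart to reading one panel of the figure.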
3.3. Sentiment Analysis
We adopted the Boson NLP Chinese Sentiment Dictionary to analyze the text posted by each group of patients so that we could observe the emotional distribution among different types of patients. This allowed us to better understand the emotional differences between patients with physiological and psychological diseases who participate in the forum [34].
The Boson NLP Sentiment Dictionary is built automatically from millions of microblogs, news comment sections and forums by tagging the emotion contained in the user-generated content. Negative emotion words correspond to a negative score, and positive emotion words correspond to a positive score.
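Dictionary-based scoring of this kind can be sketched as follows. The aggregation rule is an assumption: the text does not state how word scores are combined, so the sketch simply sums them. The lexicon entries and values are hypothetical, and the input is assumed to be already segmented into words.

```python
def sentiment_score(tokens, lexicon):
    """Sum the lexicon scores of the tokens; unknown words contribute 0.

    Summation is an assumed aggregation rule, not one stated in the text.
    """
    return sum(lexicon.get(tok, 0.0) for tok in tokens)


# Hypothetical toy lexicon; real entries and scores come from the dictionary file.
toy_lexicon = {"painful": -2.0, "anxiety": -1.5, "hope": 1.0}
score = sentiment_score(["feel", "anxiety", "but", "hope"], toy_lexicon)
```

With a summed score, long posts that repeat strongly scored words accumulate very large absolute values, which is consistent with the extreme scores the authors report for repetitive or advertising text.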
Before the sentiment analysis, we screened the text further based on its sentiment score and content. When calculating sentiment scores, if a patient posted text that contained many repeated complaints and insults or recommended certain medicines (usually advertising text), the sentiment scores were extremely high or low. Thus, we examined text containing extreme emotions, that is, text with sentiment scores above 100 or below −100, and removed text that did not reflect the real emotion of the patient.
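The screening step can be sketched as a filter that sets aside extreme scores for inspection. Only the flagging is automated here; deciding whether a flagged text reflects real emotion was a manual judgment in the study. The function name and example values are hypothetical.

```python
def flag_extreme(scored_texts, limit=100.0):
    """Split (text, score) pairs into those within the limit and those to review.

    A pair is set aside for review when the absolute score exceeds the limit.
    """
    keep, review = [], []
    for text, score in scored_texts:
        (review if abs(score) > limit else keep).append((text, score))
    return keep, review


# Hypothetical scored posts.
kept, to_review = flag_extreme([("a", 12.5), ("b", -250.0), ("c", 100.0)])
```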
Based on the sentiment scores, we plotted a percentage histogram of the text sentiment scores in each OHC. Figure 5 shows the result.
As seen in Figure 5, approximately 50% of the text posted for each disease had a negative sentiment score, with the other 50% corresponding to positive and neutral sentiments. We believe that this is because both physiological and psychological diseases cause patients to suffer, so the expressions of emotion are largely negative. At the same time, the total of positive and neutral text was close to 50%. Thus, we can reasonably assume that, whether their disease is psychological or physiological, patients received social support from the community that alleviated their negative emotions.
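The percentages behind such a histogram can be computed as below. The sketch assumes that a text counts as neutral only when its score is exactly 0; the text does not define the neutral band, so this cut-off is an assumption, as are the function name and sample scores.

```python
def polarity_shares(scores):
    """Return the percentage of negative, neutral and positive scores.

    Neutral is assumed to mean a score of exactly 0.
    """
    n = len(scores)
    if n == 0:
        return {"negative": 0.0, "neutral": 0.0, "positive": 0.0}
    neg = sum(1 for s in scores if s < 0)
    pos = sum(1 for s in scores if s > 0)
    return {
        "negative": 100 * neg / n,
        "neutral": 100 * (n - neg - pos) / n,
        "positive": 100 * pos / n,
    }


# Hypothetical scores for one community.
example_shares = polarity_shares([-1, -2, 0, 3])
```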
We calculated the average sentiment scores for patients with each of the four diseases. As shown in Figure 6, the average sentiment scores for the four diseases were all less than 0. There was more negative than positive text among the four disease groups, which is consistent with the data presented in Figure 5. In addition, because the size of the shaded areas represents the negative sentiment scores for each of the diseases, we found that the sentiment scores of patients with heart disease and hypertension were higher than those of patients with depression and OCD; that is, the negative emotion conveyed by patients with physiological diseases was weaker than that of patients with psychological diseases.
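The per-disease averages can be obtained with a short helper. The data layout and the numbers below are hypothetical and serve only to show the calculation.

```python
from statistics import mean


def mean_score_by_disease(scores_by_disease):
    """Average sentiment score for each disease; empty groups are skipped."""
    return {d: mean(s) for d, s in scores_by_disease.items() if s}


# Hypothetical scores, not data from the study.
averages = mean_score_by_disease({
    "heart disease": [-4.0, 2.0],
    "OCD": [-9.0, 3.0],
    "empty group": [],
})
```

In this toy input both averages are negative and the value for "heart disease" is closer to zero, which is the kind of pattern the comparison above describes.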
We can explain this phenomenon from the perspective of relative deprivation theory. The core idea of relative deprivation is that, in comparison to a reference object, individuals or groups perceive themselves to be at a disadvantage and feel that they have not obtained the rights they deserve. Such perceptions lead to negative emotions, such as anger and grief, and the resulting psychological changes influence individual outcomes [35]. The theory is widely used in the fields of social behavior [36], social economics [37] and mental health [38].
From the perspective of relative deprivation, it can be understood that, compared to healthy persons, patients with both physiological and psychological diseases are suffering from diseases, resulting in a sense of relative deprivation. This sense is reflected in the expression of negative emotions in the OHC. It is further understood that most patients with physiological diseases can cure their diseases through a specific treatment, such as surgery or medication, while most patients with psychological diseases cannot recapture their health through explicit treatments and can only recover through a combination of external treatment and psychological self-adjustment. Therefore, patients with psychological diseases experience greater difficulty regaining health than people with physiological diseases, and the sense of relative deprivation experienced by patients with psychological diseases is greater, as are their negative emotions.
To observe the distribution of sentiment scores among the patients according to the four diseases, we plotted scatter plots of sentiment scores based on all of the text for each disease, as shown in Figure 7a–d. The horizontal coordinates correspond to each text posted by patients, and the vertical coordinates are the sentiment scores corresponding to each of those pieces of text. We analyzed the degree of dispersion of sentiment scores in patients with heart disease, hypertension, depression and OCD.
As seen in Figure 7, the sentiment scores for each disease are evenly distributed around the zero axis. The sentiment scores of patients with heart disease, hypertension and depression are more concentrated, while those of patients with OCD are more scattered. Thus, the emotions expressed in the text of patients with OCD are more intense and extreme. We believe that this may be because the OCD group receives less attention than patients with other diseases and OCD may be more difficult to cure. Thus, those patients may express emotions in the health forum that they cannot express in the real world, which in fact shows that OCD patients’ communication needs are not well satisfied.
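The comparison above is visual. A numeric counterpart, which is not part of the original analysis, would be to compute the standard deviation of the sentiment scores for each disease. The sketch below uses the population standard deviation; the function name and values are hypothetical.

```python
from statistics import pstdev


def score_spread(scores_by_disease):
    """Population standard deviation of sentiment scores for each disease."""
    return {d: pstdev(s) for d, s in scores_by_disease.items() if s}


# Hypothetical scores, not data from the study.
spread = score_spread({
    "depression": [-1.0, 1.0],
    "OCD": [-10.0, 10.0],
})
```

A larger value indicates more scattered scores, so in this toy input "OCD" would be read as the more dispersed group.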