Understanding the Public’s Emotions about Cancer: Analysis of Social Media Data

Cancer survivors suffer from emotional distress, which varies depending on several factors. However, existing emotion management programs are insufficient and do not take into consideration all of the factors. Social media provides a platform for understanding the emotions of the public. The aim of this study was to explore the relationship between the public’s emotions about cancer and factors affecting emotions using social media data. We used 321,339 posts on cancer and emotions relating to cancer extracted from 22 social media channels between 1 January 2014, and 30 June 2017. The factors affecting emotions were analyzed using association rule mining and social network analysis. Hope/gratitude was the most frequently mentioned emotion group on social media followed by fear/anxiety/overwhelmed, sadness/depression/loneliness/guilt, and anger/denial. Acute survival stage, treatment method, and breast cancer were associated with hope/gratitude. Early stage, gastrointestinal problems, fatigue/pain/fever, and pancreatic cancer were associated with fear/anxiety/overwhelmed. Surgery, hair loss/skin problems, and fatigue/pain/fever were associated with sadness/depression/loneliness/guilt. Acute survival stage and hair loss/skin problems were associated with anger/denial. We found that emotions concerning cancer differed depending on the cancer type, cancer stage, survival stage, treatment, and symptoms. These findings could guide the development of tailored emotional management programs for cancer survivors that meet the public’s needs more effectively.


Introduction
Cancer is a significant disease burden worldwide. In Korea, cancer-related deaths accounted for 26.5% of all deaths in 2018, and cancer has been ranked as the leading cause of death since 1983 [1]. However, with the recent advances in cancer prevention, early detection, treatment, and follow-up care, the five-year survival rate for all cancers combined improved from 44.0% during 1996-2000 to 70.4% during 2013-2017 in Korea [2]. Since the life expectancy of cancer patients has increased, cancer survivors may face physical and emotional hardships from cancer-related and treatment-related short-term, long-term, and late health effects. Cancer survivors experience various physical challenges, such as physical impairment, severe fatigue, vomiting, and loss of appetite. They also face numerous emotional problems, including anger, fear, anxiety, depression, and loneliness, throughout the cancer trajectory. Many cancer survivors suffer from severe emotional distress, which can impair their ability to cope with cancer effectively and impact their quality of life. Anxiety and depression, which are most common among cancer survivors, have been reported to result in poor adherence to cancer treatment, poor cancer survival, and increased risk of suicide if not properly managed [3,4]. Emotional distress in cancer survivors has been associated with several patient and disease characteristics, such as age [5], gender [6], cancer type [7], and stage [7,8]. Hence, cancer survivors are increasingly interested in how to deal with these chronic emotional challenges. The general public also has an increasing need for information concerning cancer-related emotion because they may have cancer survivors around them as family members, friends, or co-workers [9].
The National Cancer Institute and the American Cancer Society (United States), as well as Cancer Research UK (UK), provide comprehensive cancer information on their websites, including information on the emotional problems experienced by cancer survivors and on the management of these emotions. However, the emotions experienced by cancer survivors may vary depending on gender, age, cancer type, and cancer stage. Thus, it is necessary to provide tailored information on emotional management, which requires a deep understanding of the emotions experienced by the public regarding cancer. Recently, the use of data obtained from social media has been proposed as a novel approach to understanding the public's emotional reactions to cancer. Social media platforms, such as social networking services (SNS), internet blogs, and online communities, have gained increasing popularity with the spread of smartphones and the internet over the last decade. Through these platforms, individuals report personal experiences, opinions, and emotions about cancer, sharing them with others in real time [10]. Social media posts can be very useful in understanding the public's emotions relating to cancer. For example, Crannell et al. [11] analyzed 146,357 tweets from cancer survivors and found that patients with cancers with favorable prognoses had higher happiness scores than patients with cancers with poor prognoses. However, this and other similar studies using social media data were limited to only one emotion [11] or classified emotions only as positive, negative, or neutral [12-15]. Thus, these studies did not provide a comprehensive insight into the public's various emotions relating to cancer, such as depression, anxiety, hope, and anger.
Additionally, social media data were collected in the previous research using only a few keywords for each topic of interest, which might not be sufficient to capture the full scope of emotions relating to cancer. As social media posts are written in colloquial language, consumer terms must be used to collect social media data. Thus, a list of concepts and terms for emotions about cancer used by the consumers provides a more appropriate framework for the collection and analysis of such data.
In this study, we used a cancer ontology previously developed by the authors [16] as a framework for social media data collection and analysis to investigate the public's emotions concerning cancer. The ontology contained nine superclasses (cancer type, prevention, diagnosis, treatment, prognosis, risk factor, symptom, dealing with cancer, and emotion), 213 class concepts, and 4061 synonyms. The emotion superclass, which was one of the nine superclasses, was composed of nine classes (denial, anger, overwhelmed, anxiety, depression, loneliness, guilt, hope, and gratitude) with 454 synonyms. The synonyms included colloquial expressions (heteronyms, abbreviations, and slang) used by the public in daily life, which are useful for collecting social media data. This study explores the relationship between the public's emotions about cancer and the factors affecting these emotions by association rule mining and social network analysis. The relationship between the public's emotions relating to cancer and the factors affecting these emotions identified in this study will provide a framework for establishing tailored health information that supports the management of emotions.

Materials and Methods
In this study, we used 321,339 posts on cancer and cancer-related emotions. These posts were obtained from online cafés (online communities) operated by Naver and Daum, internet blogs operated by Naver, Daum, Tistory, and Egloos, Twitter, and 15 message boards (e.g., YouTube, Naver Knowledge iN, Nate talk), between 1 January 2014, and 30 June 2017, in Korea. Naver and Daum are the two largest online platforms in Korea. Tistory and Egloos are blogging platforms in Korea.

Data Collection
We collected posts on cancer and cancer-related emotions. As search keywords, we used 302 terms for concepts and synonyms of the cancer type superclass (such as malignant cancer, colon cancer, brain cancer, liver cancer, and breast cancer) and 454 terms for concepts and synonyms of the emotion superclass (such as surprise, embarrassment, contempt, anger, and worry) contained in the cancer ontology developed by the authors. As stop keywords, we used 418 terms (such as malicious virus). We collected posts on cancer, from which we then selected posts on emotions relating to cancer.
Of the 1,854,497 posts on cancer, 434,299 posts included keywords indicating at least one emotion. Of these, 112,960 posts contained advertising keywords (such as detoxification) and were excluded. Finally, a total of 321,339 posts were selected for analysis. Terms were extracted from the posts using ontology-based natural language processing (NLP). The data collection and NLP procedures were performed in collaboration with SK Telecom Smart Insight, a Korea-based big-data marketing platform. The company used a web crawler and Java-based NLP tools that were developed in-house. Each document was tokenized into words, and mentions of emotions and of factors related to emotions in each document were identified using the terms in the cancer ontology.
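
The selection steps described above can be illustrated with a minimal sketch. It is not the in-house Java tooling used in the study: the four term lists are tiny hypothetical stand-ins for the ontology's 302 cancer-type terms, 454 emotion terms, 418 stop keywords, and the advertising keywords, and plain substring matching is assumed in place of ontology-based tokenization.

```python
# Hypothetical, heavily shortened term lists (assumptions for illustration only).
CANCER_TERMS = {"breast cancer", "colon cancer", "liver cancer"}
EMOTION_TERMS = {"worry", "anger", "grateful"}
STOP_TERMS = {"malicious virus"}
AD_TERMS = {"detoxification"}


def contains_any(text, terms):
    """Return True if any term occurs in the lower-cased text."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def select_posts(posts):
    """Keep posts that mention cancer and an emotion, minus stop/ad posts."""
    selected = []
    for post in posts:
        if not contains_any(post, CANCER_TERMS):
            continue  # not a post on cancer
        if contains_any(post, STOP_TERMS):
            continue  # stop keyword: "cancer" used in an unrelated sense
        if not contains_any(post, EMOTION_TERMS):
            continue  # no emotion keyword
        if contains_any(post, AD_TERMS):
            continue  # advertising keyword
        selected.append(post)
    return selected


posts = [
    "I worry about my mother's breast cancer surgery.",
    "Detoxification tea cures liver cancer, no more worry!",
    "Colon cancer screening schedule for this year.",
    "A malicious virus is like liver cancer for my PC, such anger.",
]
kept = select_posts(posts)

# Consistency check on the counts reported in the text.
remaining = 434_299 - 112_960  # emotion posts minus advertising posts
```

In this toy input only the first post survives all four filters, and `remaining` reproduces the 321,339 posts reported as the final sample.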
The study was approved by the Institutional Review Board of the Seoul National University (IRB No. 1802/001-006). Collected posts did not have any identifiable personal information.

Data Preparation
The collected social media data were converted into structured data for analysis. A single post was treated as an analysis unit. After extracting terms from each post, we identified terms for emotions and factors affecting the emotions, which were included in the cancer ontology.
Emotions concerning cancer were defined as per the emotion classification proposed by Jack et al. [17] as "Happy", "Surprise/Fear", "Sad", and "Disgust/Anger". Following this classification, the nine emotions included in the cancer ontology were divided into four emotion groups: "Hope/Gratitude", "Fear/Anxiety/Overwhelmed", "Sadness/Depression/Loneliness/Guilt", and "Anger/Denial". We coded the posts based on the presence of terms indicating each of the four emotion groups as 0 (=no) or 1 (=yes). If two or more emotion groups were mentioned in a single post, each emotion group was counted. If the same emotion group was mentioned multiple times in a single post, it was only counted once.
Factors affecting emotions, such as gender, age, cancer type, stage, treatment, survival stage, and symptoms were defined. Gender was defined as male and female, and individuals were grouped according to age as <10, 10s, 20s, 30s, 40s, 50s, 60s, 70s, and >80. Cancer types were divided into 14 groups, namely breast cancer, colon cancer, gastric cancer, leukemia, lung cancer, cervical cancer, liver cancer, brain cancer, pancreatic cancer, ovarian cancer, prostatic cancer, gallbladder cancer, kidney cancer, and thyroid cancer, by combining the top 10 cancers in national cancer statistics in Korea, 2016 [18] and social media posts. The cancer stage was defined as early stage, middle stage, and terminal stage. Treatments were classified as surgery, chemotherapy, radiation therapy, immunotherapy, complementary and alternative medicine, and transplantation. Survival stages were classified as acute survival stage, extended survival stage, and permanent survival stage. Symptoms were classified as general symptoms of fatigue/pain/fever, gastrointestinal problems, skin problems, poor circulation, thrombocytopenia, and infection. We coded the posts based on the presence of terms indicating each of the 43 emotion-related factor groups as 0 (=no) or 1 (=yes). If two or more emotion-related factor groups were mentioned in one post, each factor group was counted. If the same emotion-related factor group was mentioned multiple times in a single post, it was only counted once.
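
The 0/1 coding rule described for emotion groups and emotion-related factor groups can be sketched as follows. The term lists are hypothetical and far shorter than the ontology's synonym lists, and substring matching is a simplifying assumption; the point is only that a group is coded 1 if any of its terms appears, regardless of how often.

```python
# Illustrative groups and terms only (assumptions, not the ontology's content).
GROUP_TERMS = {
    "Hope/Gratitude": ["hope", "grateful", "thankful"],
    "Fear/Anxiety/Overwhelmed": ["afraid", "anxious", "worry"],
    "Surgery": ["surgery", "operation"],
    "Breast cancer": ["breast cancer"],
}


def code_post(text):
    """Return a 0/1 indicator per group: 1 if any of its terms appears."""
    lowered = text.lower()
    return {
        group: int(any(term in lowered for term in terms))
        for group, terms in GROUP_TERMS.items()
    }


# Several groups in one post are each coded 1.
row = code_post(
    "Grateful and thankful after my breast cancer surgery, but I still worry."
)

# Repeated mentions of the same group still yield a single 1.
repeated = code_post("hope hope hope")
```

Each post then becomes one row of binary indicators, which is the structured form that the frequency analysis and association rule mining operate on.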

Frequency Analysis of Posts
We analyzed the frequency of the general characteristics mentioned in the posts of each social media channel. We also analyzed the frequency of the four emotion groups for each social media channel.

Association Rule Mining
We performed emotional analysis using association rule mining by applying the Apriori algorithm to investigate the relationship between different emotion-related factors and emotion groups. Three main measures of the association rule mining were support, confidence, and lift. Lift values close to 1 indicated no association between emotion-related factors and emotion groups [19]. Association rule mining was performed using the R software package (version 3.6.0).
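
To make the three measures concrete, the sketch below computes support, confidence, and lift for a single rule over a handful of invented posts. It is not the Apriori algorithm and not the R workflow used in the study; the function name `rule_measures` and the toy transactions are assumptions for illustration.

```python
def rule_measures(transactions, antecedent, consequent):
    """Compute support, confidence, and lift for antecedent -> consequent.

    transactions: list of sets of factor/emotion labels, one set per post.
    antecedent:   set of factor labels on the left-hand side of the rule.
    consequent:   a single emotion-group label on the right-hand side.
    Assumes at least one post contains the antecedent and the consequent.
    """
    n = len(transactions)
    antecedent = set(antecedent)
    n_ante = sum(1 for t in transactions if antecedent <= t)
    n_cons = sum(1 for t in transactions if consequent in t)
    n_both = sum(
        1 for t in transactions if antecedent <= t and consequent in t
    )
    support = n_both / n            # share of all posts matching the rule
    confidence = n_both / n_ante    # share of antecedent posts with the emotion
    lift = confidence / (n_cons / n)  # confidence relative to the base rate
    return support, confidence, lift


# Invented toy data: five posts coded as sets of labels.
transactions = [
    {"Surgery", "Hope/Gratitude"},
    {"Surgery", "Hope/Gratitude"},
    {"Surgery", "Fear/Anxiety/Overwhelmed"},
    {"Early stage", "Fear/Anxiety/Overwhelmed"},
    {"Early stage", "Hope/Gratitude"},
]
support, confidence, lift = rule_measures(
    transactions, {"Surgery"}, "Hope/Gratitude"
)
```

Here two of five posts contain both "Surgery" and "Hope/Gratitude" (support 0.4), two of the three "Surgery" posts mention hope/gratitude (confidence 2/3), and the lift is 10/9, slightly above the no-association value of 1.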

Social Network Analysis
We classified social media according to the post's length, as internet blogs, online cafés, and message boards allow for long posts, whereas Twitter posts are limited to 140 characters. We performed social network analysis to examine and visualize relationships between emotion groups or emotion-related factors that appeared together in each post in the two groups. The nodes in the network represent the emotion groups and emotion-related factors, while the edges indicate relationships between the nodes. Node activity was determined based on the degrees corresponding to the number of direct connections to the node. The strength of the relationship between nodes was assessed based on the edge weights corresponding to the number of interactions [20]. The network of the relationships between emotion groups and emotion-related factors was constructed using NetMiner 4.4.3.b (Cyram Inc., Seoul, Korea).
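
The degree and edge-weight definitions above can be sketched with a small co-occurrence network. The study used NetMiner; this standard-library version is only an illustration, and it assumes that an "interaction" between two nodes means that both labels appear in the same post.

```python
from collections import Counter
from itertools import combinations


def build_network(posts):
    """Return (edge_weights, degrees) for an undirected co-occurrence network.

    posts: list of label lists, one per post.
    edge_weights[(a, b)]: number of posts in which a and b co-occur (a < b).
    degrees[a]: number of distinct nodes directly connected to a.
    """
    edge_weights = Counter()
    for labels in posts:
        # Deduplicate and sort so each unordered pair has one canonical key.
        for a, b in combinations(sorted(set(labels)), 2):
            edge_weights[(a, b)] += 1
    degrees = Counter()
    for a, b in edge_weights:
        degrees[a] += 1
        degrees[b] += 1
    return edge_weights, degrees


# Invented toy posts, already coded as emotion-group and factor labels.
posts = [
    ["Hope/Gratitude", "Surgery"],
    ["Hope/Gratitude", "Surgery", "Breast cancer"],
    ["Fear/Anxiety/Overwhelmed", "Early stage"],
]
edge_weights, degrees = build_network(posts)
```

In this toy network the edge between "Hope/Gratitude" and "Surgery" has weight 2 because the pair co-occurs in two posts, while "Early stage" has degree 1 because it is connected to a single node.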

Results

Frequency Analysis of Posts
Of the 321,339 posts analyzed, 128,944 posts (40.1%) were from internet blogs, 125,976 posts (39.2%) were from online cafés, 61,180 posts (19.0%) were from Twitter, and 5,239 posts (1.6%) were from message boards.

Frequency of General Characteristics on Social Media Channels
We analyzed the frequencies of terms describing gender, age, and cancer type mentioned on social media channels (Table 1). Both genders were most commonly mentioned in online cafés, followed by internet blogs, Twitter, and message boards. Although age groups <60 were most commonly mentioned on internet blogs, age groups over 60 were most commonly mentioned in online cafés. Age groups <60 were mentioned in 2.3-9.7% of Twitter posts; however, ages over 60 were mentioned in only 0.6-0.7% of Twitter posts. All cancer types, except leukemia, were most frequently mentioned on internet blogs and online cafés. Notably, gallbladder cancer was mentioned about twice as often in online cafés (65.1%) as on internet blogs (33.4%). Leukemia was the most frequently mentioned cancer type on Twitter.

Association Rule Mining
The relationship between emotion groups and emotion-related factors was assessed using association rule mining. The top five association rules with the highest lift are shown in Table 2.

Hope/Gratitude
"Radiation therapy", "Acute survival stage", "Male", and "Female" were identified to be strongly associated with hope/gratitude (support, 0.010; confidence, 0.911; lift, 1.957). The support level of 0.010 indicated that the proportion of posts containing "Radiation therapy", "Acute survival stage", "Male", "Female", and hope/gratitude among total posts was 0.010. The confidence level of 0.911 indicated that the proportion of posts with hope/gratitude emotions among the posts mentioning "Radiation therapy", "Acute survival stage", "Male", and "Female" was 0.911. The lift level of 1.957 indicated that the ratio of the appearance of hope/gratitude in posts mentioning "Radiation therapy", "Acute survival stage", "Male", and "Female" to the appearance of hope/gratitude in total posts was 1.957. "Chemotherapy", "Breast cancer", "Surgery", and "Liver cancer" were also strongly associated with hope/gratitude, with a lift value of >1.89.

Fear/Anxiety/Overwhelmed
"Early stage" and "Gastrointestinal problems" were associated with fear/anxiety/overwhelmed, with a support value of 0.016, a confidence value of 0.773, and a lift value of 1.623. "Chemotherapy", "Fatigue/Pain/Fever", "Pancreatic cancer", "Acute survival stage", and "Brain cancer" were also associated with fear/anxiety/overwhelmed, with a lift value of >1.61.
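
Because the three measures are ratios of the same post counts, the values reported for the hope/gratitude rule can be used to back out two shares that are not stated directly: the share of posts containing the rule's factors and the overall share of posts mentioning the emotion group. The sketch below does this under the standard definitions (confidence = support / antecedent share; lift = confidence / consequent share). The function name `implied_shares` is hypothetical, and because the inputs are rounded to three decimals the results are approximate.

```python
def implied_shares(support, confidence, lift):
    """Back out antecedent and consequent shares from a rule's measures.

    antecedent_share: share of all posts containing the rule's factors.
    consequent_share: share of all posts mentioning the emotion group.
    """
    antecedent_share = support / confidence
    consequent_share = confidence / lift
    return antecedent_share, consequent_share


# Reported values for the rule
# {"Radiation therapy", "Acute survival stage", "Male", "Female"} -> hope/gratitude.
antecedent_share, consequent_share = implied_shares(0.010, 0.911, 1.957)

# Approximate number of posts matching the whole rule, given 321,339 posts.
approx_rule_posts = round(0.010 * 321_339)
```

Read this way, a rule with high lift can still rest on a small slice of the data: the support of 0.010 corresponds to roughly 1% of the 321,339 posts.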

Sadness/Depression/Loneliness/Guilt
"Hair loss/Skin problems", "Male", and "Female" were associated with sadness/depression/loneliness/guilt, with a support value of 0.014, a confidence value of 0.767, and a lift value of 1.949. "Surgery" and "Fatigue/Pain/Fever" were also identified as factors strongly associated with sadness/depression/loneliness/guilt, with a lift value of >1.77.

Anger/Denial
"Hair loss/Skin problems" was associated with anger/denial, with support, confidence, and lift values of 0.010, 0.129, and 1.619, respectively. "Acute survival stage", "Male", and "Female" were also strongly associated with anger/denial (lift value > 1.35). The factors associated with anger/denial showed a very low confidence value (0.107-0.129) compared with those associated with the other three emotion groups (0.699-0.911).

Social Network Analysis
Tables 3 and 4 and Figures 1 and 2 show the network of the top 24 nodes (50% of the total nodes) that represent emotion groups and factors related to emotion groups, which were mentioned on social media. Tables 3 and 4 show the frequency, degree, and edge weight of nodes on internet blogs, online cafés, and message boards (Table 3) and Twitter (Table 4). Figures 1 and 2 display the network of the relationship between nodes, including the degree and edge weight of the nodes in posts from internet blogs, online cafés, and message boards (Figure 1) and Twitter (Figure 2). The size of the nodes is proportional to the degree, and the thickness of the edges is proportional to the edge weight.
On internet blogs, online cafés, and message boards, the node representing "Fear/Anxiety/Overwhelmed" was the most frequent, followed by "Acute survival stage", "Hope/Gratitude", "Sadness/Depression/Loneliness/Guilt", and "Fatigue/Pain/Fever".
Each of the 47 nodes was connected to the remaining 46 nodes.

Discussion
In this study, we explored the relationship between the public's emotions about cancer and emotion-related factors by association rule mining and social network analysis of social media data based on a cancer ontology.
A frequency analysis of the public's emotions about cancer revealed that hope/gratitude, the only positive emotion group, was the most common emotion group mentioned on social media, appearing in 46.5% of all posts. A previous study calculating the happiness value of cancer patients and the general public on Twitter [11] found that the computed happiness value was higher for cancer patients' tweets than for those from the general public and that negative words were less frequent among the tweets of cancer patients. Cancer survivors may be more thankful for being cancer-free or grateful and appreciative of their family. Moreover, according to the study by Lieberman and Goldstein [21], cancer survivors not only record their experiences and feelings but also read other people's posts and provide positive emotional support and encouragement to each other. Hope and gratitude have a beneficial effect on physical health, psychological wellbeing, and the quality of life of cancer survivors [22].
Fear/anxiety/overwhelmed and sadness/depression/loneliness/guilt appeared in 45.2% and 39.4% of the total posts, respectively, consistent with a previously reported prevalence of anxiety and depression among cancer patients and caregivers [6,23]. Linden et al. [6] reported that the frequency of anxiety symptoms was 41.6% in 10,153 Canadian cancer patients. Similarly, in a meta-analysis involving 21,149 caregivers, Geng et al. [23] found that the frequencies of anxiety and depression were 46.55% and 41.0%, respectively. Uncertainty of the progress and prognosis of cancer could have contributed to the high prevalence of anxiety in cancer patients and caregivers. Herschbach et al. [24] found that the most important psychological distress in cancer patients was anxiety and fear. Notably, cancer patients were afraid of disease progression, re-hospitalization, pain, and not being fit for work. Sadness and depression are also prevalent feelings among cancer patients and caregivers.
Although several studies have reported that depression in cancer patients and caregivers impacts their quality of life, treatment compliance, subjective perception of physical symptoms, and prognosis [3,4], its diagnosis is often overlooked by healthcare providers. Thus, it is crucial that healthcare providers diagnose anxiety and depression in cancer patients and caregivers.
Anger/denial was the least frequently mentioned emotion group on social media, with a frequency of 8.0%. Consistently, Hadi et al. [25] found that anger was less frequent in breast cancer patients than were depression and anxiety. They also found that the anger score in breast cancer patients was significantly lower than that in the general public. Our study and Hadi et al.'s study suggest that cancer survivors may suppress or restrain the expression of anger, potentially leading to distress and depression [26], although anger or denial is the first emotion that most patients experience after cancer diagnosis [27]. Since anger suppression is known to have a negative effect on cancer prognosis [28], the development of interventions that manage anger in cancer patients is urgently needed.
In this study, we also identified factors associated with each emotion group using association rule mining. Acute survival stage, breast cancer, and treatment methods, including radiation therapy, chemotherapy, and surgery, were associated with hope/gratitude. Among those factors, acute survival stage was also associated with anger/denial. According to Mullan [29], acute survival stage is the first stage of cancer survivorship and includes the time of diagnosis until the initial treatment, such as surgery or radiotherapy. After being diagnosed with cancer, many people experience denial and anger. Moreover, people at this stage need practical assistance, such as medical information or social support, and healthcare providers and family members usually give hopeful messages about the prognosis after treatment. Breast cancer was also associated with hope/gratitude, likely because it is the most common cancer among Korean women and has the second-highest survival rate after thyroid cancer [30]. Breast cancer online cafés have the highest number of members among all cancer-related online cafés in Korea. Breast cancer patients often share health information on their treatment and symptoms, as well as emotional support. In contrast to men, women tend to share feeling-centered and emotion-focused supportive messages [31].
Early stage disease, gastrointestinal problems, fatigue/pain/fever, and pancreatic cancer were factors associated with fear/anxiety/overwhelmed. Patients with early stage cancer have a considerably better prognosis [32]. However, physical side effects, such as pain, fever, or gastrointestinal symptoms, result in anxiety and fear. Even if the early stage tumor is completely removed during surgery, the possibility of residual disease causes fear and anxiety [33]. Pancreatic cancer was associated with fear/anxiety/overwhelmed. Consistently, Zabora et al. [7] reported that pancreatic cancer had the highest anxiety score among 14 cancer types. Additionally, pancreatic cancer has an extremely poor prognosis, with a 5-year survival rate of only 12% in Korea [34]. Therefore, people with these factors require special attention regarding fear and anxiety.
Surgery, hair loss/skin problems, and fatigue/pain/fever were associated with sadness/depression/loneliness/guilt. Among these factors, hair loss/skin problems were also associated with anger/denial. Surgery is one of the most common treatments for cancer. Cancer patients often experience depression before surgery due to uncertainty about the surgery outcome, anesthesia, death, fear of postoperative pain, or complications. They also feel depressed after surgery due to pain and physical changes. Patients who experience loss of body parts, such as those undergoing mastectomy [35] or colostomy [36], often experience severe depression. Hair loss is a common side effect of chemotherapy [37]. Although hair loss is not permanent or life-threatening, it has a tremendous psychological impact on patients, as it is a drastic change in physical appearance. Fatigue is a common treatment-related physical symptom, experienced by 90% of cancer patients receiving radiation therapy and 100% of those receiving chemotherapy [38]. Although most healthy people recover from fatigue with sleep and rest, cancer patients suffer from depression and chronic fatigue due to their disease and the side effects of the treatment [39].
An internet blog is a personal platform where individuals can write about their interests, whereas online cafés are communities where people share common interests. In this study, we found that gallbladder cancer and pancreatic cancer, both of which have a poor prognosis, were mentioned more frequently in online cafés than on internet blogs, perhaps due to a potential preference to share their experiences with many people and acquire emotional comfort through an online café. In this study, we found that younger people (10-50 years old) were most frequently mentioned on internet blogs, whereas older people (>60 years old) were most frequently mentioned in online cafés. Younger people are more familiar with social media and often communicate through their own internet blogs. In contrast, older people tend to communicate through established online communities [40]. In terms of emotions and emotion-related factors, acute survival stage, fear/anxiety/overwhelmed, hope/gratitude, and sadness/depression/loneliness/guilt, were most commonly mentioned on internet blogs, online cafés, and message boards. This finding suggests that the general public is most active on social media for seeking information and emotional support during the acute survival stage and that various emotions, including hope, sadness, and fear, are mixed at this stage.
Twitter is a microblog, where users can post short messages to their followers. Since most Twitter users are aged between 10 and 30 [40], cancer types that are more common in young individuals, such as leukemia [41], were more frequently mentioned on Twitter than on internet blogs or online cafés. Leukemia patients or their parents who are the primary caregivers post about the disease and seek emotional support on Twitter. Moreover, leukemia and sadness/depression/loneliness/guilt were most frequently mentioned on Twitter. This could be because the parents, who are the principal caregivers of leukemia patients, feel guilty about their children. This hypothesis is supported by a study that showed that the most common emotion in mothers caring for their children with blood cancer was guilt [42]. Hence, tailored emotion management programs should be established for cancer patients of different ages and with different cancer types, but also for caregivers, considering social media usage patterns.
This study had some limitations. First, although various SNS are available in Korea, such as Kakaostory, Facebook, and Instagram [40], we were only able to collect data from Twitter because it was the only SNS whose data were publicly available. Future studies are required to analyze data from other social media platforms. Second, the emotions expressed in a post may not be the emotions felt by its writer, and the factors related to emotions, including demographic factors, may not reflect the writer's actual information, so care must be taken when interpreting the data. Third, if several emotions were mentioned in a single post, a factor mentioned in that post may have been analyzed as related to all of those emotions. It is necessary to explore the context of the posts to understand which factors are related to each emotion. Fourth, although cancer incidence increases rapidly after the age of 65, social media is mainly used by young people. Hence, older people's emotions in relation to cancer may have been under-represented in this study. It is necessary to collect data from other sources to understand the emotions of older people more fully.

Conclusions
We explored the relationship between the public's emotions about cancer and the factors related to these emotions using social media data. By using consumer terms in the cancer ontology, we could collect comprehensive social media data related to cancer written by consumers. We found that the most frequently mentioned emotion group was hope/gratitude and that the general public is most active on social media during the acute survival stage, when people feel various emotions. Thus, it is especially important to manage the emotions of cancer survivors in the acute survival stage. Additionally, the usage patterns of social media channels differed depending on age and cancer type. Younger generations and leukemia were mentioned more in social media pages created by affected individuals, such as internet blogs and Twitter threads, whereas older people and gallbladder cancer were mentioned more in social media pages created by others, such as online cafés. Thus, internet blogs or Twitter could be used for younger people and online cafés could be used for older people as the primary social media channel for managing the emotions of cancer survivors. The findings of this study could guide the development of tailored emotion management programs for cancer survivors reflecting the general public's needs by age, cancer type and stage, treatment type, symptoms, and survival stage.

Conflicts of Interest:
The authors declare no conflict of interest.