1. Introduction
Cancer is a significant disease burden worldwide. In Korea, cancer-related deaths accounted for 26.5% of all deaths in 2018, and cancer has been ranked as the leading cause of death since 1983 [1]. However, with the recent advances in cancer prevention, early detection, treatment, and follow-up care, the five-year survival rate for all cancers combined improved from 44.0% during 1996–2000 to 70.4% during 2013–2017 in Korea [2]. As the life expectancy of cancer patients has increased, cancer survivors may face physical and emotional hardships from the short-term, long-term, and late health effects of cancer and its treatment. Cancer survivors experience various physical challenges, such as physical impairment, severe fatigue, vomiting, and loss of appetite. They also face numerous emotional problems, including anger, fear, anxiety, depression, and loneliness, throughout the cancer trajectory. Many cancer survivors suffer from severe emotional distress, which can impair their ability to cope with cancer effectively and impact their quality of life. Anxiety and depression, which are the most common emotional problems among cancer survivors, have been reported to result in poor adherence to cancer treatment, poor cancer survival, and increased risk of suicide if not properly managed [3,4]. Emotional distress in cancer survivors has been associated with several patient and disease characteristics, such as age [5], gender [6], cancer type [7], and stage [7,8]. Hence, cancer survivors are increasingly interested in how to deal with these chronic emotional challenges. The general public also has an increasing need for information concerning cancer-related emotions because they may have cancer survivors around them as family members, friends, or co-workers [9].
The National Cancer Institute and the American Cancer Society (United States), as well as Cancer Research UK (UK), provide comprehensive cancer information on their websites. They also provide information on the emotional problems experienced by cancer survivors and on the management of these emotions. However, the emotions experienced by cancer survivors may vary depending on gender, age, cancer type, and cancer stage. Thus, it is necessary to provide tailored information on emotional management, which requires a deep understanding of the emotions experienced by the public regarding cancer. Recently, the use of data obtained from social media has been proposed as a novel approach to understand the public’s emotional reaction to cancer. Social media platforms, such as social networking services (SNS), internet blogs, and online communities, have gained increasing popularity with the spread of smartphones and the internet over the last decade. Through these platforms, individuals report personal experiences, opinions, and emotions about cancer, sharing them with others in real time [10]. Social media posts can be very useful in understanding the public’s emotions relating to cancer. For example, Crannell et al. [11] analyzed 146,357 tweets from cancer survivors and found that patients with cancers with favorable prognoses had higher happiness scores than patients with cancers with poor prognoses. However, this and other similar studies using social media data were limited to only one emotion [11] or classified emotions only as positive, negative, or neutral [12,13,14,15]. Thus, these studies did not provide a comprehensive insight into the public’s various emotions relating to cancer, such as depression, anxiety, hope, and anger. Additionally, previous studies collected social media data using only a few keywords for each topic of interest, which might not be sufficient to capture the full scope of emotions relating to cancer. As social media posts are written in colloquial language, consumer terms must be used to collect social media data. Thus, a list of the concepts and terms that consumers use for emotions about cancer provides a more appropriate framework for the collection and analysis of such data.
In this study, we used a cancer ontology previously developed by the authors [16] as a framework for social media data collection and analysis to investigate the public’s emotions concerning cancer. The ontology contained nine superclasses (cancer type, prevention, diagnosis, treatment, prognosis, risk factor, symptom, dealing with cancer, and emotion), 213 class concepts, and 4061 synonyms. The emotion superclass was composed of nine classes (denial, anger, overwhelmed, anxiety, depression, loneliness, guilt, hope, and gratitude) with 454 synonyms. The synonyms included colloquial expressions (heteronyms, abbreviations, and slang) used by the public in daily life, which are useful for collecting social media data. This study explores the relationship between the public’s emotions about cancer and the factors affecting these emotions using association rule mining and social network analysis. The relationships identified in this study will provide a framework for establishing tailored health information that supports the management of emotions.
2. Materials and Methods
In this study, we used 321,339 posts on cancer and cancer-related emotions. These posts were obtained from online cafés (online communities) operated by Naver and Daum, internet blogs operated by Naver, Daum, Tistory, and Egloos, Twitter, and 15 message boards (e.g., YouTube, Naver Knowledge iN, Nate talk), between 1 January 2014, and 30 June 2017, in Korea. Naver and Daum are the two largest online platforms in Korea. Tistory and Egloos are blogging platforms in Korea.
2.1. Data Collection
We collected posts on cancer and cancer-related emotions. As search keywords, we used 302 terms for concepts and synonyms of the cancer type superclass (such as malignant cancer, colon cancer, brain cancer, liver cancer, and breast cancer) and 454 terms for concepts and synonyms of the emotion superclass (such as surprise, embarrassment, contempt, anger, and worry) contained in the cancer ontology developed by the authors. As stop keywords, we used 418 terms (such as malicious virus). We collected posts on cancer, from which we then selected posts on emotions relating to cancer.
Of the 1,854,497 posts on cancer, 434,299 posts included keywords indicating at least one emotion. Of these, 112,960 posts contained advertising keywords (such as detoxification) and were excluded. Finally, a total of 321,339 posts were selected for analysis. Terms were extracted from the posts using ontology-based natural language processing (NLP). The data collection and NLP procedures were performed in collaboration with SK Telecom Smart Insight, a Korea-based big-data marketing platform. The company used a web crawler and Java-based NLP tools that were developed in-house. Each document was tokenized into words, and mentions of emotions and of factors related to emotions in each document were identified using the terms in the cancer ontology.
The study was approved by the Institutional Review Board of the Seoul National University (IRB No. 1802/001-006). Collected posts did not have any identifiable personal information.
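The selection steps described in this subsection can be sketched as a simple keyword filter. The sketch below is only an illustration in Python: the study used in-house Java-based tools with ontology-based term matching, whereas the functions `contains_any` and `select_posts`, the case-insensitive substring matching, and the toy English term lists are all assumptions made for this example.

```python
def contains_any(text, keywords):
    """Return True if any keyword occurs in text (case-insensitive substring match)."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def select_posts(posts, cancer_terms, emotion_terms, stop_terms, ad_terms):
    """Keep posts that mention cancer and an emotion and contain no stop or advertising keyword."""
    selected = []
    for post in posts:
        if not contains_any(post, cancer_terms):
            continue  # not a post on cancer
        if contains_any(post, stop_terms):
            continue  # stop keyword, e.g. "malicious virus"
        if not contains_any(post, emotion_terms):
            continue  # no emotion keyword
        if contains_any(post, ad_terms):
            continue  # advertising keyword, e.g. "detoxification"
        selected.append(post)
    return selected


# Toy stand-ins for the ontology term lists (hypothetical, not the real 302/454/418 terms).
cancer_terms = ["cancer"]
emotion_terms = ["worry", "anger"]
stop_terms = ["malicious virus"]
ad_terms = ["detoxification"]

posts = [
    "I worry about my mother's breast cancer surgery.",
    "Liver cancer screening is scheduled for next week.",
    "Detoxification tea eases worry about liver cancer!",
    "This malicious virus is a cancer on my laptop, such anger.",
]
selected = select_posts(posts, cancer_terms, emotion_terms, stop_terms, ad_terms)
print(selected)

# Counts reported in the text: posts with emotion keywords minus advertising posts.
remaining = 434_299 - 112_960
print(remaining)
```

In this toy run only the first post survives all four checks, and the subtraction reproduces the number of posts reported as selected for analysis.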
2.2. Data Preparation
The collected social media data were converted into structured data for analysis. A single post was treated as an analysis unit. After extracting terms from each post, we identified terms for emotions and factors affecting the emotions, which were included in the cancer ontology.
Emotions concerning cancer were defined according to the emotion classification proposed by Jack et al. [17]: “Happy”, “Surprise/Fear”, “Sad”, and “Disgust/Anger”. Following this classification, the nine emotions included in the cancer ontology were divided into four emotion groups: “Hope/Gratitude”, “Fear/Anxiety/Overwhelmed”, “Sadness/Depression/Loneliness/Guilt”, and “Anger/Denial”. We coded each post as 0 (=no) or 1 (=yes) for each of the four emotion groups, based on the presence of terms indicating that group. If two or more emotion groups were mentioned in a single post, each emotion group was counted. If the same emotion group was mentioned multiple times in a single post, it was only counted once.
Factors affecting emotions were defined as gender, age, cancer type, cancer stage, treatment, survival stage, and symptoms. Gender was defined as male and female, and individuals were grouped according to age as <10, 10s, 20s, 30s, 40s, 50s, 60s, 70s, and >80. Cancer types were divided into 14 groups, namely breast cancer, colon cancer, gastric cancer, leukemia, lung cancer, cervical cancer, liver cancer, brain cancer, pancreatic cancer, ovarian cancer, prostate cancer, gallbladder cancer, kidney cancer, and thyroid cancer, by combining the top 10 cancers in the 2016 national cancer statistics of Korea [18] and in social media posts. The cancer stage was defined as early stage, middle stage, and terminal stage. Treatments were classified as surgery, chemotherapy, radiation therapy, immunotherapy, complementary and alternative medicine, and transplantation. Survival stages were classified as acute survival stage, extended survival stage, and permanent survival stage. Symptoms were classified as general symptoms of fatigue/pain/fever, gastrointestinal problems, skin problems, poor circulation, thrombocytopenia, and infection. We coded each post as 0 (=no) or 1 (=yes) for each of the 43 emotion-related factor groups, based on the presence of terms indicating that group. If two or more emotion-related factor groups were mentioned in one post, each factor group was counted. If the same emotion-related factor group was mentioned multiple times in a single post, it was only counted once.
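The binary coding described in this subsection can be illustrated with a short Python sketch. The mapping of the nine ontology emotion classes to the four emotion groups follows the text; the function name `code_post` and the assumption that emotion mentions have already been extracted as class names are hypothetical. The sketch also checks that the factor groups listed above add up to 43.

```python
# Nine ontology emotion classes mapped to the four emotion groups named in the text.
EMOTION_TO_GROUP = {
    "hope": "Hope/Gratitude",
    "gratitude": "Hope/Gratitude",
    "anxiety": "Fear/Anxiety/Overwhelmed",
    "overwhelmed": "Fear/Anxiety/Overwhelmed",
    "depression": "Sadness/Depression/Loneliness/Guilt",
    "loneliness": "Sadness/Depression/Loneliness/Guilt",
    "guilt": "Sadness/Depression/Loneliness/Guilt",
    "anger": "Anger/Denial",
    "denial": "Anger/Denial",
}
GROUPS = sorted(set(EMOTION_TO_GROUP.values()))


def code_post(emotion_mentions):
    """Code one post as 0/1 per emotion group; repeated mentions count only once."""
    coded = {group: 0 for group in GROUPS}
    for emotion in emotion_mentions:
        coded[EMOTION_TO_GROUP[emotion]] = 1
    return coded


# A post mentioning hope twice and anxiety once is coded 1 for two groups.
coded = code_post(["hope", "hope", "anxiety"])
print(coded)

# Number of groups per emotion-related factor, as listed in the text.
FACTOR_GROUP_COUNTS = {
    "gender": 2,
    "age": 9,
    "cancer type": 14,
    "cancer stage": 3,
    "treatment": 6,
    "survival stage": 3,
    "symptoms": 6,
}
total_factor_groups = sum(FACTOR_GROUP_COUNTS.values())
print(total_factor_groups)
```

The same 0/1 scheme would apply to the emotion-related factor groups, with one indicator per group and per post.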
2.3. Data Analysis
2.3.1. Frequency Analysis of Post
We analyzed the frequency of the general characteristics mentioned in the posts of each social media channel. We also analyzed the frequency of the four emotion groups for each social media channel.
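As a minimal sketch of this per-channel frequency analysis, the Python example below computes, for each channel, the share of posts coded 1 for each emotion group. The function name `group_frequency_by_channel`, the input layout (a channel name paired with a 0/1 dictionary), and the toy data are assumptions made for illustration only.

```python
from collections import defaultdict


def group_frequency_by_channel(coded_posts):
    """Return, per channel, the share of posts that mention each emotion group."""
    totals = defaultdict(int)
    counts = defaultdict(lambda: defaultdict(int))
    for channel, coded in coded_posts:
        totals[channel] += 1
        for group, flag in coded.items():
            counts[channel][group] += flag
    return {
        channel: {group: counts[channel][group] / totals[channel] for group in counts[channel]}
        for channel in totals
    }


# Toy coded posts (hypothetical): (channel, {emotion group: 0 or 1}).
coded_posts = [
    ("Twitter", {"Hope/Gratitude": 1, "Anger/Denial": 0}),
    ("Twitter", {"Hope/Gratitude": 0, "Anger/Denial": 1}),
    ("Internet blog", {"Hope/Gratitude": 1, "Anger/Denial": 0}),
    ("Internet blog", {"Hope/Gratitude": 1, "Anger/Denial": 1}),
]
frequencies = group_frequency_by_channel(coded_posts)
print(frequencies)
```

Because a post can mention several emotion groups, the shares within one channel need not sum to 1.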
2.3.2. Association Rule Mining
We analyzed emotions using association rule mining with the Apriori algorithm to investigate the relationship between different emotion-related factors and emotion groups. The three main measures of association rule mining were support, confidence, and lift. Support indicated the proportion of posts containing both an emotion-related factor and an emotion group among all posts. Confidence indicated the proportion of posts containing the emotion group among posts containing the emotion-related factor. Lift indicated the ratio of the proportion of posts containing the emotion group among posts with the emotion-related factor to the proportion of posts containing that emotion group among all posts. Rules with lift values >1 indicated a positive correlation, whereas rules with lift values <1 indicated a negative correlation. Lift values close to 1 indicated no association between emotion-related factors and emotion groups [19]. Association rule mining was performed using R software (version 3.6.0).
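The three measures can be made concrete with a small worked example. The Python sketch below computes support, confidence, and lift for a single rule of the form factor → emotion group; it does not reproduce the Apriori search itself or the R workflow used in the study. The function name `rule_measures` and the toy posts are hypothetical.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent in t)
    n_c = sum(1 for t in transactions if consequent in t)
    n_ac = sum(1 for t in transactions if antecedent in t and consequent in t)
    if n == 0 or n_a == 0 or n_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    support = n_ac / n               # share of all posts containing both items
    confidence = n_ac / n_a          # share of antecedent posts containing the consequent
    lift = confidence / (n_c / n)    # confidence relative to the consequent's overall share
    return support, confidence, lift


# Toy posts (hypothetical): each post is the set of factor and emotion labels it mentions.
transactions = [
    {"pancreatic cancer", "fear"},
    {"pancreatic cancer", "fear"},
    {"pancreatic cancer", "fear"},
    {"pancreatic cancer", "hope"},
    {"breast cancer", "fear"},
    {"breast cancer", "hope"},
    {"breast cancer", "hope"},
    {"breast cancer", "hope"},
]
measures = rule_measures(transactions, "pancreatic cancer", "fear")
print(measures)  # (0.375, 0.75, 1.5)
```

In this toy data, 3 of 8 posts contain both labels (support 0.375), 3 of the 4 posts mentioning the factor contain the emotion (confidence 0.75), and the emotion appears in half of all posts, so the lift is 0.75/0.5 = 1.5, which would be read as a positive correlation.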
2.3.3. Social Network Analysis
We classified social media into two groups according to post length: internet blogs, online cafés, and message boards allow long posts, whereas Twitter posts are limited to 140 characters. We performed social network analysis separately for the two groups to examine and visualize relationships between emotion groups and emotion-related factors that appeared together in the same post. The nodes in the network represent the emotion groups and emotion-related factors, while the edges indicate relationships between the nodes. Node activity was determined based on the degree, that is, the number of direct connections to the node. The strength of the relationship between nodes was assessed based on the edge weight, that is, the number of interactions [20]. The network of the relationships between emotion groups and emotion-related factors was constructed using NetMiner 4.4.3.b (Cyram Inc., Seoul, Korea).
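The network construction can be sketched as a co-occurrence count. The Python example below is an assumption-laden illustration, not the NetMiner procedure used in the study: it treats two labels as connected when they appear in the same post, takes the edge weight as the number of posts in which the pair co-occurs, and takes the degree as the number of distinct neighbors. The function name `build_cooccurrence_network` and the toy posts are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_network(posts):
    """Return (edge_weights, degree) for labels that co-occur within posts."""
    edge_weights = Counter()
    for labels in posts:
        # Each unordered pair of distinct labels in a post adds 1 to that edge's weight.
        for a, b in combinations(sorted(set(labels)), 2):
            edge_weights[(a, b)] += 1
    degree = Counter()
    for a, b in edge_weights:
        # Degree counts distinct neighbors, regardless of edge weight.
        degree[a] += 1
        degree[b] += 1
    return edge_weights, degree


# Toy posts (hypothetical): labels are emotion groups or emotion-related factors.
posts = [
    {"leukemia", "guilt"},
    {"leukemia", "guilt", "sadness"},
    {"surgery", "sadness"},
]
edge_weights, degree = build_cooccurrence_network(posts)
print(edge_weights)
print(degree)
```

To mirror the two-group design described above, the same function would be applied once to the long-post channels and once to the Twitter posts.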
4. Discussion
In this study, we explored the relationship between the public’s emotions about cancer and emotion-related factors by association rule mining and social network analysis of social media data based on a cancer ontology.
A frequency analysis of the public’s emotions about cancer revealed that hope/gratitude, the only positive emotion group, was the most common emotion group mentioned on social media, appearing in 46.5% of all posts. A previous study calculating the happiness value of cancer patients and the general public on Twitter [11] found that the computed happiness value was higher for cancer patients’ tweets than for those from the general public and that negative words were less frequent among the tweets of cancer patients. Cancer survivors may be more thankful for being cancer-free or grateful and appreciative of their family. Moreover, according to the study by Lieberman and Goldstein [21], cancer survivors not only record their experiences and feelings but also read other people’s posts and provide positive emotional support and encouragement to each other. Hope and gratitude have a beneficial effect on the physical health, psychological wellbeing, and quality of life of cancer survivors [22].
Fear/anxiety/overwhelmed and sadness/depression/loneliness/guilt appeared in 45.2% and 39.4% of the total posts, respectively, consistent with the previously reported prevalence of anxiety and depression among cancer patients and caregivers [6,23]. Linden et al. [6] reported that the frequency of anxiety symptoms was 41.6% in 10,153 Canadian cancer patients. Similarly, in a meta-analysis involving 21,149 caregivers, Geng et al. [23] found that the frequencies of anxiety and depression were 46.55% and 41.0%, respectively. Uncertainty about the progression and prognosis of cancer could have contributed to the high prevalence of anxiety in cancer patients and caregivers. Herschbach et al. [24] found that the most important forms of psychological distress in cancer patients were anxiety and fear. Notably, cancer patients were afraid of disease progression, re-hospitalization, pain, and not being fit for work. Sadness and depression are also prevalent feelings among cancer patients and caregivers. Although several studies have reported that depression in cancer patients and caregivers impacts their quality of life, treatment compliance, subjective perception of physical symptoms, and prognosis [3,4], its diagnosis is often overlooked by healthcare providers. Thus, it is crucial that healthcare providers diagnose anxiety and depression in cancer patients and caregivers.
Anger/denial was the least frequently mentioned emotion group on social media, with a frequency of 8.0%. Consistently, Hadi et al. [25] found that anger was less frequent in breast cancer patients than were depression and anxiety. They also found that the anger score in breast cancer patients was significantly lower than that in the general public. Our study and Hadi et al.’s study suggest that cancer survivors may suppress or restrain the expression of anger, potentially leading to distress and depression [26], although anger or denial is the first emotion that most patients experience after cancer diagnosis [27]. Since anger suppression is known to have a negative effect on cancer prognosis [28], the development of interventions that manage anger in cancer patients is urgently needed.
In this study, we also identified factors associated with each emotion group using association rule mining. Acute survival stage, breast cancer, and treatment methods, including radiation therapy, chemotherapy, and surgery, were associated with hope/gratitude. Among those factors, acute survival stage was also associated with anger/denial. According to Mullan [29], acute survival stage is the first stage of cancer survivorship and spans the time from diagnosis until the initial treatment, such as surgery or radiotherapy. After being diagnosed with cancer, many people experience denial and anger. Moreover, people at this stage need practical assistance, such as medical information or social support, and healthcare providers and family members usually give hopeful messages about the prognosis after treatment. Breast cancer was also associated with hope/gratitude, likely because it is the most common cancer among Korean women and has the second-highest survival rate after thyroid cancer [30]. Breast cancer online cafés have the highest number of members among all cancer-related online cafés in Korea. Breast cancer patients often share health information on their treatment and symptoms, as well as emotional support. In contrast to men, women tend to share feeling-centered and emotion-focused supportive messages [31].
Early stage disease, gastrointestinal problems, fatigue/pain/fever, and pancreatic cancer were factors associated with fear/anxiety/overwhelmed. Patients with early stage cancer have a considerably better prognosis [32]. However, physical side effects, such as pain, fever, or gastrointestinal symptoms, result in anxiety and fear. Even if the early stage tumor is completely removed during surgery, the possibility of residual disease causes fear and anxiety [33]. Pancreatic cancer was associated with fear/anxiety/overwhelmed. Consistently, Zabora et al. [7] reported that pancreatic cancer had the highest anxiety score among 14 cancer types. Additionally, pancreatic cancer has an extremely poor prognosis, with a 5-year survival rate of only 12% in Korea [34]. Therefore, people with these factors require special attention regarding fear and anxiety.
Surgery, hair loss/skin problems, and fatigue/pain/fever were associated with sadness/depression/loneliness/guilt. Among these factors, hair loss/skin problems were also associated with anger/denial. Surgery is one of the most common treatments for cancer. Cancer patients often experience depression before surgery due to uncertainty about the surgery outcome, anesthesia, death, fear of postoperative pain, or complications. They also feel depressed after surgery due to pain and physical changes. Patients who experience loss of body parts, such as those undergoing mastectomy [35] or colostomy [36], often experience severe depression. Hair loss is a common side effect of chemotherapy [37]. Although hair loss is not permanent or life-threatening, it has a tremendous psychological impact on patients, as it is a drastic change in physical appearance. Fatigue is a common treatment-related physical symptom, experienced by 90% of cancer patients receiving radiation therapy and 100% of those receiving chemotherapy [38]. Although most healthy people recover from fatigue with sleep and rest, cancer patients suffer from depression and chronic fatigue due to their disease and the side effects of the treatment [39].
An internet blog is a personal platform where individuals can write about their interests, whereas online cafés are communities where people share common interests. In this study, we found that gallbladder cancer and pancreatic cancer, both of which have a poor prognosis, were mentioned more frequently in online cafés than on internet blogs, perhaps because people affected by these cancers prefer to share their experiences with many others and to seek emotional comfort through an online café. We also found that younger people (10–50 years old) were most frequently mentioned on internet blogs, whereas older people (>60 years old) were most frequently mentioned in online cafés. Younger people are more familiar with social media and often communicate through their own internet blogs. In contrast, older people tend to communicate through established online communities [40]. In terms of emotions and emotion-related factors, acute survival stage, fear/anxiety/overwhelmed, hope/gratitude, and sadness/depression/loneliness/guilt were most commonly mentioned on internet blogs, online cafés, and message boards. This finding suggests that the general public is most active on social media in seeking information and emotional support during the acute survival stage and that various emotions, including hope, sadness, and fear, are mixed at this stage.
Twitter is a microblog, where users can post short messages to their followers. Since most Twitter users are aged between 10 and 30 [40], cancer types that are more common in young individuals, such as leukemia [41], were more frequently mentioned on Twitter than on internet blogs or online cafés. Leukemia patients, or their parents as the primary caregivers, post about the disease and seek emotional support on Twitter. Moreover, leukemia and sadness/depression/loneliness/guilt were most frequently mentioned on Twitter. This could be because the parents, who are the principal caregivers of leukemia patients, feel guilty about their children. This hypothesis is supported by a study that showed that the most common emotion in mothers caring for their children with blood cancer was guilt [42]. Hence, tailored emotion management programs should be established not only for cancer patients of different ages and with different cancer types but also for caregivers, considering social media usage patterns.
This study had some limitations. First, although various SNS are available in Korea, such as Kakaostory, Facebook, and Instagram [40], we were only able to collect data from Twitter because it is the only SNS whose data are publicly available. Future studies are therefore required to analyze data from other social media platforms. Second, the emotions expressed in a post may not be the emotions felt by its writer, and the emotion-related factors mentioned, including demographic factors, may not describe the writer; therefore, care must be taken when interpreting the data. Third, if several emotions were mentioned in a single post, each factor in that post may have been analyzed as related to all of those emotions. It is necessary to explore the context of the posts to understand which factors are related to each emotion. Fourth, although cancer incidence increases rapidly after the age of 65, social media is mainly used by young people. Hence, older people’s emotions in relation to cancer may have been under-represented in this study. It is necessary to collect data from other sources to understand the emotions of older people more fully.