Exploring Signals for a Nuclear Future Using Social Big Data

Abstract: Since the start of the new Korean government in 2017, the Korean nuclear energy system has undergone a major change. This change in national energy policy can be forecasted by analyzing social big data. This study verifies whether future forecasting methodologies using weak signals can be applied to Korean nuclear energy through text mining the data of web news between 2005 and 2018, comparing and applying the methodology to notable events (i.e., the UAE nuclear power plant (NPP) contract and nuclear phase-out). In addition, we predict what changes will be made in the Korean nuclear energy system post-2019. Keywords extracted through text mining were quantitatively classified into a weak signal or a strong signal using a Keyword Emergence Map (KEM) and a Keyword Issue Map (KIM). The extracted keywords predicted the contract of the UAE NPPs in 2009 and nuclear phase-out in 2017. Furthermore, keywords revealing future signals beyond 2019 were found to be ‘nuclear phase-out’ and ‘wind energy’. The weak-signal methodology can be applied as a tool to predict future energy trends in the current, rapidly changing world energy market.


Introduction
Historically, mankind has devoted much attention to predicting the future. Previously, predicting the future was solely the responsibility of priests, astrologers, and shamans. Their forecasts were often wrong and lacked concrete rationale. However, with the advent of science and technology, future predictions became more sophisticated and fact-based. Future prediction in the modern sense has been redefined as a value-creating act that scientifically forecasts the future and presents strategies in a wide range of fields, such as technology, markets, organization, and policy [1]. Future predictions in the modern definition began to take shape between 1945 and 1960, during which time the development of scientific methodologies through rationalization was attempted [2]. At first, future-oriented research on a national level was mainly carried out to analyze new market characteristics due to the rapid economic growth after World War II [3] and to establish nuclear security and military strategies during the Cold War era [4,5]. Today, due to the rapidly changing social environment and competitive business environment, the importance and use of future forecasts based on a Management Information System (MIS) is increasing [6]. An MIS is a computerized information system that provides prediction information to support decision making. Major industrialized nations and many large corporations alike are focusing on forecasting to identify trends and to select key technologies for the future.
Since the launch of the new Korean government in May 2017, Korea's nuclear power sector has undergone considerable changes. The new government announced and enforced the so-called "energy conversion policy" that drastically reduced nuclear energy dependency in accordance with the presidential election pledge. As a result, nuclear science and the industry are experiencing turbulent times. However, this change could have been fully anticipated because four out of the five leading candidates in the presidential election backed reducing dependency on nuclear power; one of them was elected president, and he is simply fulfilling his pledge. So, why did the prominent presidential candidates pledge to move away from nuclear power? Generally, most of the promises made by each candidate during a presidential election consist of content that voters can fully support. In other words, each commitment already reflects content that has gained enough support and consensus throughout society, and this is especially true for policies that have a significant impact on the economy, such as energy policies. It can therefore be judged that there was already a social consensus on nuclear phase-out even before the establishment of the new government. The Korean nuclear energy field was simply not fully aware of the social consensus and is paying for its lack of preparation. So, how could the Korean nuclear industry have predicted this change before 2017? And what will be the future of nuclear energy in Korea?
This study uses web-news data collected from social media and performs an analysis using the future-predictive signaling method based on text mining to (1) verify the weak signal detection methodology and (2) predict the future of Korean nuclear energy.

Methodology
Since their beginnings in 2000, social media services have been widely used in our society, with many countries and corporations using social data to solve social problems and to predict the future. Therefore, much effort has been invested in developing a future-forecasting methodology using social media data. In order to predict the future using social media, we must first analyze the vast amount of data produced on web news, Facebook, Twitter, etc. Well-known methods for analyzing social media data include text mining [7], opinion mining [8], and network analysis [9]. Among them, text mining is known to be a very effective method for analyzing social media. Since atypical text data produced on social media have a high impact on the real economy and society, they have high information value [10]. Text mining is already commonly used for future predictions in various fields, such as policymaking [10], technology [11,12], management [13,14], and even the movie industry [15].
Recent studies on future predictions were extended to analyze time-series big data. Roberto Corizzo et al. [16] proposed a methodology using clustering to process massive time-series big data efficiently and utilize it to detect an anomaly of a natural signal (gravitational waves) [17]. Other models on time-series analysis were widely used to forecast the energy market [18,19], especially for renewable energy.
Among the future prediction methodologies based on text mining social big data, weak signal detection is attracting much attention [20,21]. Weak signals are defined as 'signs of a possible change in the future' [22] and can grow into a strong signal over time, eventually becoming a trend or a megatrend. Hiltunen (2008) described the concept of weak signals using the triadic model of the sign proposed by Charles Sanders Peirce [23]. According to Peirce, a sign is expressed in three dimensions: "the object", "the representamen", and "the interpretant". Hiltunen applied this to the concept of the future sign, interpreting "the object" as an emerging "issue" and "the representamen" as a "signal", such as news articles, rumors, and photographs. Additionally, "the interpretant" was seen as an "interpretation", such as the assumptions or sense of people regarding future events. As illustrated in Figure 1, we describe the weak signal as a three-dimensional future-signal space centered on signal, issue, and interpretation. The weak signal concept has been used in many future-predictive studies [24,25] to analyze the business environment [26,27], to assist in a company's strategy formulation [28], and to establish institutional policies. Many research groups, such as the US Strategic Business Insight, UK Horizon Scanning Centre, and Finland Futures Research Centre, have recently begun to use the weak signal concept to predict future trends. However, most studies do not establish peripheral vision, which can detect weak signals through quantitative analysis, but instead view the future according to expert knowledge and opinions [29]. In order to incorporate the weak signal concept into social media analysis, Yoon (2012) linked the 'term frequency' and 'document frequency' measured by text mining to the signal and issue, respectively, as defined by Hiltunen (2008) [20].
Yoon (2012) presented the Keyword Emergence Map (KEM) and the Keyword Issue Map (KIM) as tools for classifying keywords and selecting signals. These keyword portfolio maps apply a time-weighted factor based on two preconditions: (1) important keywords appear frequently, and (2) the more recent a keyword is, the higher its importance compared to older keywords.
First, the KEM shows the degree of visibility (DoV) of the future signal and is the keyword portfolio map that corresponds to the signal among the three dimensions of the future sign. The term frequency of a keyword obtained through text mining is used as the measure of visibility. The parameter DoV was proposed to take into account the importance of keywords according to their time of appearance [20]. The DoV of keyword i in period j is defined as follows:

$$\mathrm{DoV}_{ij} = \frac{TF_{ij}}{NN_{j}} \times \left\{ 1 - tw \times (n - j) \right\}$$

where TF_ij is the total term frequency of keyword i during period j; NN_j is the total number of news articles during period j; n is the number of periods; and tw is the time weight. A tw value of 0.05 was applied, as in Yoon's study (2012), where the value was determined by business experts on solar cells. In other words, the more often a keyword occurs and the more recent it is, the higher its DoV value and the higher its visibility. To analyze the signal dimension of future-signal keywords, the KEM classifies keyword patterns by setting the x-axis as the average term frequency and the y-axis as the average increasing rate of the DoV. The quadrants are delimited by the medians of the x and y values. A weak signal shows an abnormal pattern of increase although it has not yet been widely exposed, so keywords located in the second quadrant (low frequency, high increasing rate) correspond to weak signals. The first quadrant contains strong signals, the third quadrant contains latent signals, and the fourth quadrant contains signals that are well known but not strong.
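The KEM construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes the time-weighted form DoV_ij = (TF_ij / NN_j) × {1 − tw × (n − j)} from Yoon (2012), and it assumes that the "average increasing rate" is the arithmetic mean of period-over-period relative changes, which the text does not specify. The keyword names and counts are toy values, and all function names are hypothetical.

```python
from statistics import mean, median

def degree_of_visibility(tf, nn, tw=0.05):
    """DoV per period: (TF_ij / NN_j) * (1 - tw * (n - j)), with j = 1..n."""
    n = len(tf)
    return [tf_j / nn_j * (1 - tw * (n - j))
            for j, (tf_j, nn_j) in enumerate(zip(tf, nn), start=1)]

def average_increase_rate(series):
    """Assumed definition: mean of period-over-period relative changes."""
    rates = [(cur - prev) / prev
             for prev, cur in zip(series, series[1:]) if prev]
    return mean(rates) if rates else 0.0

def quadrant(x, y, x_median, y_median):
    """Map a point to a signal type using the median split on both axes."""
    if y > y_median:
        return "strong" if x > x_median else "weak"
    return "well-known" if x > x_median else "latent"

def keyword_emergence_map(tf_by_keyword, nn, tw=0.05):
    """x = average term frequency, y = average increasing rate of the DoV."""
    points = {
        kw: (mean(tf), average_increase_rate(degree_of_visibility(tf, nn, tw)))
        for kw, tf in tf_by_keyword.items()
    }
    x_median = median(x for x, _ in points.values())
    y_median = median(y for _, y in points.values())
    return {kw: quadrant(x, y, x_median, y_median)
            for kw, (x, y) in points.items()}

# Toy data: four periods with 100 articles each (not from the study).
nn = [100, 100, 100, 100]
toy_tf = {
    "kw_a": [40, 60, 90, 135],  # frequent and growing
    "kw_b": [2, 4, 8, 16],      # rare but growing fast
    "kw_c": [3, 3, 3, 3],       # rare and flat
    "kw_d": [50, 50, 50, 50],   # frequent and flat
}
labels = keyword_emergence_map(toy_tf, nn)
```

In this toy run, the rare but fast-growing keyword lands in the second quadrant and is labeled a weak signal. Ties with the median are assigned to the lower side here, which is an arbitrary choice.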
Second, the KIM is a keyword portfolio map that indicates the degree of diffusion of future signals and is related to the issue dimension of the future sign. To measure the degree of diffusion, the document frequency, that is, the number of news articles containing a keyword, was used. As with the DoV, the degree of diffusion (DoD) with a time weight is proposed as a parameter to quantify signal diffusivity and is defined as follows:

$$\mathrm{DoD}_{ij} = \frac{DF_{ij}}{NN_{j}} \times \left\{ 1 - tw \times (n - j) \right\}$$

where DF_ij is the document frequency of keyword i during period j, and NN_j, n, and tw are as defined for the DoV. In other words, the more news articles a keyword appears in and the more recent those articles are, the larger its DoD value and the more readily it spreads. To analyze the issue dimension of future-signal keywords, the KIM classifies keywords by setting the x-axis as the average document frequency and the y-axis as the average increasing rate of the DoD. Therefore, although a keyword in the second quadrant (based on the medians of both axes) is not yet well known, it is a potential weak signal that will become an issue in various news articles in the future.
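The difference between term frequency (used by the KEM) and document frequency (used by the KIM) can be illustrated with a short Python sketch. It assumes that each article has already been reduced to a list of keyword tokens and that the DoD takes the time-weighted form DoD_ij = (DF_ij / NN_j) × {1 − tw × (n − j)} from Yoon (2012). The function names and the toy articles are hypothetical.

```python
from collections import Counter

def term_and_document_frequency(articles):
    """Count total occurrences (TF) and containing articles (DF) per keyword.

    `articles` is a list of token lists, one list per news article.
    """
    tf, df = Counter(), Counter()
    for tokens in articles:
        tf.update(tokens)       # every occurrence counts toward TF
        df.update(set(tokens))  # each article counts at most once toward DF
    return tf, df

def degree_of_diffusion(df_by_period, nn_by_period, tw=0.05):
    """DoD per period: (DF_ij / NN_j) * (1 - tw * (n - j)), with j = 1..n."""
    n = len(df_by_period)
    return [df_j / nn_j * (1 - tw * (n - j))
            for j, (df_j, nn_j) in enumerate(zip(df_by_period, nn_by_period),
                                             start=1)]

# Toy period with three tokenized articles (not from the study).
articles = [["nuclear", "wind", "wind"], ["wind"], ["nuclear"]]
tf, df = term_and_document_frequency(articles)

# Toy document frequencies over two periods of ten articles each.
dod = degree_of_diffusion([2, 4], [10, 10])
```

Here 'wind' has a term frequency of 3 but a document frequency of 2, because two of its occurrences fall in the same article.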
By using the KEM and KIM, we can classify topic keywords that can be weak signals in terms of both signals and issues. As can be seen in Figure 1, the weak signal lies in the lower region of the plane between the signal and the issue. Therefore, as shown in Figure 2, the KEM and KIM are collated with each other, and the intersection of the weak-signal candidates obtained from each map is taken, so that a weak signal that could become a future trend can be identified. In other words, the keywords in the second quadrant of both the KEM and the KIM are not well known yet because of their low term frequency and document frequency, respectively. However, their rapid increase in visibility (signal) and diffusivity (issue) highlights their potential to develop into a strong signal, which is consistent with the definition of a weak signal by Hiltunen (2008). This methodology developed by Yoon (2012) overcomes the limitation of the qualitative approach while satisfying the weak-signal concept of Hiltunen (2008) [15]. Moreover, the methodology has been applied in the energy field (smart grid), which demonstrated its applicability there [21]. According to the future-signal space reported by Hiltunen (2008), the plane of the signal and issue constitutes objective dimensions. This plane can be replaced by the overlap of the KEM and KIM, which quantitatively correspond to signals and issues, respectively, without subjective analysis based on reality. Weak signals derived from the KEM and KIM are less recognizable but show abnormal patterns and can be reinforced in the near future with the addition of a receiver's future interpretation. Thus, the KEM and KIM provide a basis for finding weak signals objectively and quickly, which can help professionals interpret future signals.
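The collation step amounts to a set intersection: a keyword is retained as a weak-signal candidate only if it falls in the second quadrant of both maps. The following Python sketch assumes each map has already been reduced to a dictionary from keyword to quadrant label. The function name, labels, and keywords are hypothetical placeholders, not results from this study.

```python
def weak_signal_candidates(kem_labels, kim_labels):
    """Return keywords labeled 'weak' in both the KEM and the KIM."""
    return sorted(
        kw for kw, label in kem_labels.items()
        if label == "weak" and kim_labels.get(kw) == "weak"
    )

# Toy quadrant labels (placeholders only).
kem_labels = {"kw_a": "strong", "kw_b": "weak", "kw_c": "weak", "kw_d": "latent"}
kim_labels = {"kw_a": "strong", "kw_b": "weak", "kw_c": "latent", "kw_d": "weak"}

candidates = weak_signal_candidates(kem_labels, kim_labels)
```

Only `kw_b` survives in this toy case. `kw_c` is rising in visibility but not in diffusion, and `kw_d` shows the opposite pattern, so neither satisfies both conditions.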

Text Mining and Verification of the Methodology
This study collected 273,066 articles from 'Naver News' that were retrieved with the search term 'nuclear energy' from 2005 to 2018, as shown in Figure 3. The total period was divided into half-year periods, so that the investigation range runs from t1, which is the first half of 2005, to t28, which is the second half of 2018. The collected news articles were classified into: (1) data from the first half of 2005 to the first half of 2010, (2) data from the first half of 2011 to the first half of 2017, and (3) data from the first half of 2017 to the second half of 2018. Since the classified data are in sentence form, they were converted into keywords through morphological analysis. The total number of keywords extracted in this study is 20,514 over all half-year periods. Among all keywords, one-time keywords that did not appear continuously during the designated period were filtered out, and only the keywords that appeared at least once in every half-year period were selected. Then, we filtered out unnecessary words from the chosen keywords, such as the name of the press, the name of a person who is not related to the social environment of nuclear energy, a geographical name of little significance, and common words. The term frequency, document frequency, DoV, and DoD of the selected keywords were quantified according to the future-sign method to find weak signals. Based on these results, each keyword was mapped onto the KIM and KEM, and the keywords that constitute weak-signal topics in the field of nuclear power were identified. Keywords with a DoV and DoD of less than 10 were excluded from the KIM and KEM.
The aim of this study was to predict the future of nuclear energy in Korea after 2019. To this end, we collected web news related to nuclear energy, analyzed it with a text mining technique, and searched for topics that will become future trends according to the future-sign methodology using weak signals. First, we verified the validity of the future-sign methodology using social media data to find weak signals in the Korean nuclear energy field. For this verification, we set two hypotheses and used two events that the Korean nuclear industry experienced: the UAE nuclear power plant (NPP) contract in December 2009 and the nuclear phase-out declaration in June 2017. The study examined whether each of the two events was predictable in the form of a weak signal and whether it subsequently changed to a strong signal, thereby verifying the validity of applying the future-sign methodology using weak signals to the nuclear power field. Finally, this study presented the potential weak signals that could become a future trend of Korean nuclear energy after 2019.
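The period indexing and the keyword filtering described above can be sketched as follows. This is a hedged illustration rather than the authors' pipeline: it assumes the morphological analysis has already produced per-period keyword counts, and the function names, the exclusion list, and the toy counts are hypothetical.

```python
def half_year_index(year, month, start_year=2005):
    """Map a date to its half-year period: t1 = first half of 2005, and so on."""
    return (year - start_year) * 2 + (1 if month <= 6 else 2)

def persistent_keywords(tf_by_period, excluded=()):
    """Keep keywords that appear at least once in every period.

    `tf_by_period` is a list of {keyword: term frequency} dicts, one per
    half-year period. `excluded` holds manually chosen words to drop
    (press names, unrelated person names, common words).
    """
    kept = None
    for counts in tf_by_period:
        present = {kw for kw, count in counts.items() if count > 0}
        kept = present if kept is None else kept & present
    return (kept or set()) - set(excluded)

# Toy counts for three periods (not from the study).
periods = [
    {"nuclear": 5, "wind": 1, "daily_news": 3},
    {"nuclear": 4, "wind": 2, "daily_news": 1, "one_off": 1},
    {"nuclear": 7, "wind": 0, "daily_news": 2},
]
selected = persistent_keywords(periods, excluded={"daily_news"})
```

With this indexing, the second half of 2018 maps to t28, matching the range given in the text. In the toy data, 'wind' is dropped because it is absent from the third period, and 'daily_news' is dropped by the exclusion list.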

Results
First, we verified the feasibility of applying the future-signal prediction methodology using social media data to the nuclear energy field. Section 3.1 is the first validation result, and Section 3.2 is the second validation result. Section 3.3 shows the result of the future-signal prediction analysis of the nuclear field after the two validation tests.

Analysis 1 (2005-2010)
Table 1 shows the signals derived from the KIM/KEM results. The results include 'UAE' and 'contract' as weak signals. Based on this analysis, we can confirm that the keywords 'UAE' and 'contract' were transferred from a weak signal to a strong signal in the second half of 2009 and first half of 2010, as shown in Figure 5 and Table 2. Using these results, we were able to verify that Hypothesis 1, "Can the weak signal methodology predict the UAE nuclear contract by using social media data from the first half of 2005 to the first half of 2010," is supported.

Analysis 2 (2011-2017)
The KIM/KEM was plotted in Figure 6 using words collected with the keyword 'nuclear energy' from 2011 to 2016. 'President Moon' and 'nuclear phase-out' were included among the weak signals. All keywords were categorized according to their types of signal, as shown in Table 3. Based on this analysis, we can confirm that the keywords 'president Moon' and 'nuclear phase-out' were subsequently transferred from a weak signal to a strong signal, as shown in Figure 7 and Table 4. Using these results, we were able to verify that Hypothesis 2, "Is it possible to predict the nuclear phase-out declaration of the Korean government using social media data from the first half of 2011 to the first half of 2017," is supported.

Analysis 3 (2017-2018)
We successfully verified the validity of the future-signal prediction methodology using social data. As a result, we have determined that this methodology has sufficient validity. Therefore, this study went on to predict the future related to nuclear power in Korea beyond 2019.
Based on the results of the analysis, the strong signals related to nuclear power in Korea as of 2019 are 'North Korea' and 'U.S.A.' This is because negotiations between the United States and North Korea regarding the North Korean nuclear issue have continued since 2017. We can see that 'nuclear phase-out' is already classified as a trend in Korea. As shown in Figure 8 and Table 5 (first half of 2011-2017), the nuclear phase-out issue has previously appeared as a strong signal and, as a result, it can be seen that the movement led by nuclear phase-out NGOs has become a trend in Korean society. Moreover, several particular keywords, such as KFEM (a nuclear phase-out NGO) and Ms. Yangyi (a nuclear phase-out activist), arose as weak signals. At the same time, the keywords 'renewable energy' and 'wind energy' were classified as weak signals. These weak signals may show that Korea's nuclear phase-out will intensify and that renewable energy, especially wind energy, will attract attention as an alternative energy source in Korea.

Conclusions
In this study, social media data for the field of nuclear power was used to predict future signals. The proposed methodology was verified to confirm whether the future-signal prediction methodology is valid for the nuclear power sector (1) from 2005 to the first half of 2010 and (2) from 2011 to the first half of 2017. As a result of the verification process, the future-signal prediction methodology using social data accurately predicted the export of the UAE NPP, which became big news in the first half of 2010, and the declaration of nuclear phase-out in the first half of 2017. Therefore, we could carry out the future prediction of the Korean nuclear energy field beyond 2019, which was the purpose of this study. Using the social data from 2017 to 2018, we confirmed the future signals, and several keywords relevant to 'nuclear phase-out', 'renewable energy' and 'wind energy' were the weak signals. This result suggests that nuclear phase-out in Korea is likely to result in complete nuclear phase-out, both politically and industrially, rather than remaining a political slogan. In fact, in the first half of 2019, the Korean government pushed strongly towards nuclear phase-out policies. 'Wind energy' was analyzed because it is often mentioned as an alternative power supply under the nuclear phase-out policy. Generally, solar power is considered an alternative to nuclear power. However, because 75% of Korea is mountainous and the country is a peninsula, offshore wind energy can be an alternative for Korea instead of solar power generation. In other words, Korea is likely to experience an era of nuclear phase-out and select wind energy as an alternative to nuclear power, even though renewable energy sources are perceived by the public as more cost competitive than they really are [30].
The investigation on the public perceptions of energy sources in South Korea [31] indicated the same conclusion on the choice of the most acceptable power plant. As time passes and debates on the energy sources are exposed, local acceptance of nuclear power will plunge due to its unfamiliarity [32], and attention on wind energy will emerge [33].
A weak signal is the most probable topic to become a strong signal in the near future. However, as we can observe in the analysis, not all weak signals develop into strong signals. In our case study, a weak signal with a time-weighted increasing rate of more than 0.3 has a high possibility of becoming a strong signal. The two cases analyzed in this paper show that the increasing rate seems to be a more critical factor than the term frequency; however, several keywords, such as Turkey and green, were outliers. This qualitative criterion was not statistically verified and may not apply to other fields of study. To define explicit criteria, more cases should be investigated statistically in further research.
In addition, other challenges arise from time-variable and hidden factors. The current methodology cannot define the time required for weak signals to develop into strong signals. The continuous changes in the KEM/KIM position between two points of time were not appropriately presented [21]. Moreover, a sensitivity study for the value of tw and the investigation period should be conducted to improve the methodology. Furthermore, there may be other important factors besides the increasing rate and the term/document frequency. Hundreds of factors might be underlying, including internal (national policy, education, economics) and external (technology innovation, accidental events, price of energy source) factors; however, this study cannot consider those underlying factors because they have high future uncertainty and are challenging to quantify. Although the weak-signal methodology requires several improvements, it provides a number of future signals based on the DoD and DoV, is easy to quantify through text mining, and gives early warning of a future trend.

Discussion and Implication
In this study, the future-signal prediction methodology was used to analyze social data and predict the future of nuclear energy in Korea, which has experienced a rapidly changing situation since June 2017. The future-signal prediction methodology was chosen because the decision on whether to continue with nuclear power after the Fukushima accident in 2011 became a social and political problem, not a scientific and technological one. Since the 1970s, the Korean government and the nuclear industry, which have been operating state-run NPPs, have not recognized the change in social consensus. The focus was placed on solving nuclear energy problems in terms of energy security, economic value, science and technology, and system stability. However, the social awareness of ordinary citizens has changed drastically. The change moved the media and eventually had a significant impact on the political environment.
The pledges announced by each presidential candidate reflect the overall social consensus, which has a significant impact on votes. In the matter of nuclear power, the consensus was formed at least three or four years before the presidential election. The Korea Atomic Energy Agency overlooked the fact that the Korean government's declaration of nuclear phase-out had already been signaled three or four years prior to June 2017. In 2019, the Korean nuclear industry is experiencing a decline in students majoring in nuclear power, a decrease in applicants, a decline in profits at nuclear-related companies, bankruptcies, and a deterioration in the research performance of nuclear research institutes. This can be attributed to the nuclear energy system having been largely indifferent to social change.
In the future, Korea's nuclear energy industry faces the challenge of raising favorable public opinion on nuclear energy and of changing the perception of society as a whole. Through this process, we will be able to find a solution that ensures the survival of the nuclear energy system. The future-signal prediction methodology used in this study can help recognize such changes and predict the future in order to suggest better alternatives and policies.

Conflicts of Interest:
The authors declare no conflict of interest.