Cross-Societal Analysis of Climate Change Awareness and Its Relation to SDG 13: A Knowledge Synthesis from Text Mining

Abstract: The awareness and engagement of various stakeholders play a crucial role in the successful implementation of climate policy and the Sustainable Development Goals (SDGs). SDG 13, which refers to climate action, has three targets for combating climate change and its impact. Among the three targets, SDG 13.3 aims to “improve education, awareness-raising and human and institutional capacity on climate change mitigation, adaptation, impact reduction, and early warning”. This target should be implemented based on an understanding of climate change awareness among various groups in society. Furthermore, an indicator related to awareness-raising is absent from SDG 13.3. Hence, this study aims to explore the differences in climate change awareness among various social groups within a country using text mining techniques. By collecting and analyzing a large volume of text data from various sources, climate change awareness was investigated from a multilateral perspective. Two text analyses were utilized for this purpose: Latent Dirichlet Allocation (LDA) topic modeling and term co-occurrence network analysis. To integrate and comparatively analyze the awareness differences among diverse groups, extracted topics were compared by classifying them into four indicators derived from the detailed targets in SDG 13.3: mitigation, adaptation, impact reduction, and early warning. The results show that the Korean public exhibited a relatively high awareness of early warning compared to the other four groups, and the media dealt with climate change issues from the widest perspective. The Korean government and academia notably had a high awareness of both climate change mitigation and adaptation. In addition, corporations based in Korea were observed to focus their awareness substantially on climate change mitigation for greenhouse gas reduction.
This research successfully explored the imbalance and lack of climate change awareness across the public, social, governmental, industrial, and academic groups of society. Consequently, these results could be utilized as a decision criterion for society-tailored policy formulation and for promoting climate action. Our results suggest that this methodology could be utilized as a new SDG indicator and to measure the differences in awareness.


Introduction
The adverse impacts of climate change have become more noticeable worldwide, the evidence of which, including rising sea levels [1], melting glaciers [2], increasing wildfires, and changing biodiversity, has been observed all over the world [3]. To respond to climate change, all the parties (195 member countries) in the United Nations Framework Convention on Climate Change (UNFCCC) committed to the Paris Agreement in December 2015 with the aim of limiting global warming to well below 2 °C above pre-industrial levels and pursuing efforts to limit warming to 1.5 °C [4,5]. Notably, the Paris Agreement provides an ambitious opportunity to consolidate the relationship between climate and development [6]. In the same year, the 2030 Agenda for sustainable development was limited to the amount and form of data, the analysis of unstructured data such as images and text is considered a great challenge, whereas traditional statistical analysis usually focuses on numeric datasets [21,22].
Generally, abundant and valuable information lies within textual data [23], and many studies have been devoted to extracting latent semantic knowledge with text mining techniques in various fields. For example, some research has focused on investigating public opinion through the collection and analysis of text data from social media [24], or on identifying trends and current topics by analyzing scientific literature or patents [25,26]. In addition, text analysis of news data has been utilized as an instrument for monitoring drought impacts [27]. Even though text analysis has been widely used in various disciplines, including social science, political science, computer science, information science, biological science, and environmental science [28,29], most studies have focused on specific text data from a single source. Since climate change issues are intricately intertwined, however, identifying awareness from a multilateral perspective should be considered a key factor for successful policymaking [17,30]. To address this challenge, this study aims to understand climate change awareness from a cross-societal perspective through a text analysis based on multi-source data from different societies.
The objective of the present research is to analyze the gap in climate change awareness among different societal groups in support of SDG 13.3, which lacks measurable indicators for awareness-raising. At the same time, the research results could provide evidence to promote policy and action for raising climate change awareness. For these purposes, text analysis based on an unsupervised learning algorithm was applied to multiple sources produced by different societal groups. Five different societies were selected as decisive actors responding to climate change. Social media, news, national R&D data, patents, and scientific articles were selected as the objects of analysis representing public, social, governmental, industrial, and academic awareness, respectively. Text analyses, including topic modeling and co-occurrence networks, were applied to the collected text data. Latent Dirichlet Allocation (LDA) topic modeling was used to verify the distribution of climate change awareness from diverse perspectives in each society, and a term co-occurrence network analysis was employed for detailed content analysis. Consequently, the differences in climate change awareness among various societies were successfully investigated using this method. To the best of our knowledge, this is the first attempt to compile separately analyzed LDA results using a unified indicator in the field of text mining research. This approach could be helpful in planning and policy-making for raising awareness of climate change.

Data Collection
Five text datasets were collected. Social media data were collected through the Twitter application programming interface (API) using Python. News data were collected from the BIG KINDS website (https://www.bigkinds.or.kr/, accessed on 22 March 2021), one of the largest platforms providing metadata from news articles in Korea. The search string "climate change" in Korean was used to retrieve text data containing climate change awareness information. Both datasets, covering the one-year period of 2017, were collected in July 2020. In total, 7632 tweets and 7634 news articles were collected.
For scientific articles, publications written by Korean authors fall into two categories: Korean Citation Index (KCI) journals, written in Korean, and Science Citation Index (SCI)/Social Sciences Citation Index (SSCI) journals, mostly written in English. Accordingly, the data from SCI and SSCI journals were collected from the Web of Science database (DB), and those from Korean journals were collected from the KCI DB. The search string "climate change" was also used on both platforms, and the dataset was restricted to 2017 publications by authors based in Korean institutions. Most KCI journals recommend or require the submission of an additional English abstract, although the body of the article is written in Korean. Hence, English abstracts were additionally collected from the KCI website with web-scraping tools written in Python, since they are not provided in the KCI DB platform. In total, 920 publications were collected from both DBs.
The Cooperative Patent Classification (CPC) system was used to collect Korean corporations' patent information. The CPC, a well-known classification system jointly developed by the European Patent Office and the US Patent and Trademark Office, offers more detailed information and better retrieval performance than the International Patent Classification (IPC) system [31,32]. A dataset of the patents granted to Korean corporations in 2017 was built from the CPC Y-codes with a web crawler written in Python. The Y-codes used here are Y02 (technologies or applications for mitigation or adaptation against climate change) and Y04 (information or communication technologies having an impact on other technology areas), which are related to climate change technology [33]. In total, 7169 documents were collected from the patent DB.
National R&D data were obtained from the National Science and Technology Information Service (NTIS), a Korean platform for managing all national R&D information. The NTIS R&D data related to climate change were collected from a national research institution, the Green Technology Center (GTC), which manages climate change R&D under the Ministry of Science and ICT in Korea. In total, 9591 records of R&D conducted in 2017 were collected.

Topic Modeling
Topic modeling was employed to extract latent topics from a large volume and variety of collected documents. Topic modeling is a type of unsupervised machine learning technique, which requires input and interpretation from the researcher [34]. It relies on a probabilistic model of input data, which enables the estimation of the probability of latent topics based on the frequencies and co-occurrences of words [35]. LDA is the most widely used topic modeling algorithm. The LDA approach assumes that a single document is a mixture of several topics and that each topic is characterized as a probability distribution over words [36]. At the same time, LDA requires a fixed number of topics, defined as the K value, for analysis [37]. To determine the number of topics (K value), perplexity, one of the measures of the goodness of fit of statistical models, was evaluated on each text dataset [35,38]. The perplexity was calculated for K values ranging from 10 to 50. The number of topics (K) was finally set to 20, since the lowest perplexity was obtained at K = 20 for the tweet, R&D, and patent data. For news and scientific articles, on the other hand, the perplexity tended to increase with the number of topics. However, no significant differences in perplexity were observed below 30 topics. Thus, K = 20 was also adopted for these two datasets, in accordance with the results for the other three.
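For reference, the two ideas in this paragraph can be written compactly. The notation below is ours rather than the source's, and the perplexity expression is the commonly used definition for LDA, which we assume matches the measure evaluated here: $K$ is the number of topics, $z$ the latent topic of a word $w$ in document $d$, $M$ the number of documents, $\mathbf{w}_d$ the words of document $d$, and $N_d$ its length.

```latex
% Mixture assumption: each document is a mixture of K topics,
% and each topic is a probability distribution over words.
p(w \mid d) = \sum_{k=1}^{K} p(w \mid z = k)\, p(z = k \mid d)

% Perplexity of a corpus D with M documents (lower values indicate a better fit).
\mathrm{perplexity}(D) = \exp\left\{ -\frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right\}
```

Under this definition, choosing K amounts to fitting the model for each candidate value and keeping the one with the lowest perplexity.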
A document-term matrix (DTM), the most basic step for most text analyses, should be established before employing LDA. The DTM has documents as rows and terms as columns, and each cell indicates the frequency of a term in a document. The primary procedure for this is pre-processing, which includes tokenization, removal of stop words, and stemming. Subsequently, the pre-processed data are converted to a DTM. The text analyzed for each source was the tweet text for social media, the full text for news articles, the purpose section for national R&D data, the background and technical problem sections for patents, and the abstract for scientific articles. Afterward, the term frequency-inverse document frequency (TF-IDF) value was calculated to determine the weight of each term. This value indicates the importance of a given term in a document [39]. The LDA model was then applied to the TF-IDF matrix for topic extraction; the number of topics (K) was set to 20 for this study so that the mutually exclusive datasets could be analyzed and compared on a common basis. The statistical software R was utilized for all these procedures. All analytic procedures followed Korean natural language processing (NLP) practice, since most of the documents were written in Korean; the exception was scientific articles, which were analyzed in English [40,41].
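The DTM and TF-IDF steps described above can be illustrated with a minimal, standard-library Python sketch. It is not the authors' R pipeline: the stop-word list and toy documents are hypothetical, stemming and Korean noun extraction are omitted, and the TF-IDF variant used (relative term frequency times the natural log of inverse document frequency) is one common choice that we assume here.

```python
import math
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "of", "and", "is", "to", "a", "in"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    tokens = (tok.strip(".,;:!?\"'()").lower() for tok in text.split())
    return [tok for tok in tokens if tok and tok not in STOP_WORDS]


def build_dtm(docs):
    """Return (vocabulary, matrix) with documents as rows and terms as columns."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


def tfidf(matrix):
    """Weight each count by relative term frequency times log(N / document frequency)."""
    n_docs = len(matrix)
    n_terms = len(matrix[0]) if matrix else 0
    doc_freq = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    weighted = []
    for row in matrix:
        total = sum(row)
        weighted.append([
            (row[j] / total) * math.log(n_docs / doc_freq[j]) if total else 0.0
            for j in range(n_terms)
        ])
    return weighted


# Toy documents (hypothetical, not from the study's corpus).
docs = [
    "Sea level rise is a climate risk.",
    "Climate policy and renewable energy.",
    "Renewable energy policy reduces emissions.",
]
vocab, dtm = build_dtm(docs)
weights = tfidf(dtm)
print(vocab)
print(weights[0])
```

Terms that appear in only one document (such as "sea" here) receive the highest weight within that document, while terms shared across documents are down-weighted.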

Co-Occurrence Network Analysis
The same text data used in LDA topic modeling were employed to build a term co-occurrence network map representing climate change awareness in each of the five societies. The weight of terms was determined using TF-IDF after the extraction of nouns from text documents. Subsequently, the correlation between terms was computed from the frequency with which a pair of terms occurs together in a document. This process also used R software. Using these correlation values and term pairs, a co-occurrence network was constructed with Gephi, a well-known network construction and analysis tool [42]. Approximately 125 terms, selected by node degree, were used to construct each network map. However, the co-occurrence network of tweet data consisted of 79 terms because the total amount of text was smaller than in the other four cases. The limited amount of text information is a general characteristic and limitation of text mining from social media data [43,44]. The nodes are co-occurring terms, and the edges carry a correlation value indicating how strongly two terms are associated with each other. The size of a node is proportional to its degree, so a larger node co-occurs with a larger number of other terms. The thickness of an edge is proportional to the strength of the correlation between a pair of terms. After constructing the network, the clusters were determined by the modularity algorithm provided by Gephi, and nodes were colored by cluster. The position and distance of each node were set by the MultiGravity ForceAtlas2 layout algorithm, also provided by Gephi [45]. Each cluster was labeled manually based on the most frequently used and most relevant terms in that cluster.
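The network construction described above can be sketched in plain Python. The source does not name the correlation measure, so this sketch assumes the phi coefficient computed on document-level term presence; the threshold value and toy documents are also hypothetical. The resulting edge weights and degrees correspond to the edge thickness and node size described in the text.

```python
import math
from itertools import combinations


def phi_correlations(doc_terms):
    """Phi coefficient for each term pair, based on document-level presence."""
    docs = [set(terms) for terms in doc_terms]
    n = len(docs)
    vocab = sorted(set().union(*docs))
    present = {t: sum(1 for d in docs if t in d) for t in vocab}
    result = {}
    for a, b in combinations(vocab, 2):
        both = sum(1 for d in docs if a in d and b in d)
        na, nb = present[a], present[b]
        denom = math.sqrt(na * (n - na) * nb * (n - nb))
        if denom == 0:
            continue  # a term present in every document has no defined correlation
        result[(a, b)] = (n * both - na * nb) / denom
    return result


def build_network(correlations, threshold):
    """Keep pairs at or above the threshold as edges; degree = number of edges per term."""
    edges = {pair: w for pair, w in correlations.items() if w >= threshold}
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return edges, degree


# Toy noun lists per document (hypothetical).
doc_terms = [
    ["sea", "level", "rise"],
    ["sea", "level", "warming"],
    ["warming", "policy"],
    ["policy", "energy"],
]
corr = phi_correlations(doc_terms)
edges, degree = build_network(corr, threshold=0.5)

# Rank terms by degree, as done for the top terms reported per network.
top_terms = sorted(degree, key=lambda t: (-degree[t], t))
print(edges)
print(top_terms)
```

An edge list of this form (source term, target term, weight) is the kind of input a network tool can lay out and cluster; the clustering and layout steps themselves are left to that tool.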

Results
The research framework in this study is illustrated in Figure 1. To investigate climate change awareness in five different societies, a large number of text documents generated by each society were collected from multiple sources and analyzed. Five different societal groups were selected as decisive actors who affect climate change policies: public, social, government, industrial, and academic groups. The respective text data were utilized under the assumption that each text dataset encompasses the awareness and perception of each society [46]. For this study, social media data (Twitter) were utilized to identify public awareness, news data for social awareness, national R&D data for governmental awareness, patents for industrial awareness, and scientific articles for academic awareness. Each dataset was collected from a specific DB platform or by web scraping, as described in detail in the Methods section. Since articles, patents, and R&D data generally contain a large volume of text information, particular sections of each document were analyzed in order to focus on eliciting the awareness contained in each document. Accordingly, the purpose section was utilized for the analysis of national R&D, the background and technical problem sections were utilized for patent analysis, and the abstract was utilized as the object of analysis in the case of scientific articles.
After retrieving and pre-processing the analytic data, topic analysis based on the LDA model and term co-occurrence network analysis were performed separately on each group's text data. This approach helps in understanding the status of cross-societal awareness in a specific country, with Korea serving as the case study in this paper. Finally, the analysis results were translated from Korean to English so that the methodology could be shared with a wider audience of readers and researchers. Note that unavoidable differences in nuance between Korean and English words may have been introduced by translation, which was conducted in consideration of interchangeable words with similar or identical meanings as well as co-occurring terms. Scientific articles, in contrast, were analyzed using English abstracts regardless of whether they came from Korean or English journals, because both journal categories provide English abstracts. To identify the holistic awareness perspectives on climate change among different societies, topic modeling based on LDA was applied to the collected text documents. As an unsupervised learning technique, the LDA topic model can be used to detect latent topics in massive unstructured documents and to classify documents based on patterns of latent topics [47]. The LDA method was applied separately to the collected text data representing public, social, governmental, industrial, and academic awareness. This study aimed not only to extract topics and classify documents but also to infer how the latent topics correlate with the detailed targets of SDG 13.3 in terms of awareness of climate change. To this end, a manual validation process was additionally performed on the discerned topics in order to evaluate the relationship between each topic and each target in SDG 13.3.
There are four detailed targets in SDG 13.3: "mitigation", "adaptation", "impact reduction", and "early warning" with respect to climate change. These four targets were utilized as awareness indicators to reallocate extracted topics in the manual validation process. Twenty topics were extracted from each society's documents and subsequently reallocated to one of the detailed SDG 13.3 targets. The reallocation of each topic was evaluated based on the relevant terms in that topic. The awareness distribution from topic modeling with the manual validation process is illustrated in Figure 2. The detailed results of LDA are summarized in Tables A1-A5. As shown in Figure 2, the status of climate change awareness in each society can be comprehensively verified based on topic distribution. Clear differences in climate change awareness were observed among the societal groups. Based on social media (Twitter) analysis, the awareness of the Korean public of climate change largely focused on early warning (40%) and mitigation (30%) in 2017; less awareness of adaptation (10%) and impact reduction (20%) was also observed. News data corresponding to social awareness showed topics spread across all four awareness indicators, despite a low early warning ratio (5%). Awareness of mitigation, adaptation, and impact reduction was observed to be 25%, 40%, and 30%, respectively, which implies that Korean media have comparably wider perspectives on climate change issues. Governmental and academic awareness derived from national R&D and scientific articles mostly focused on mitigation and adaptation topics, which together accounted for a high proportion of 85%. However, national R&D focused more on mitigation research (60%), while scientific articles leaned further toward adaptation-related topics (60%).
Meanwhile, R&D data and academic articles showed that awareness of impact reduction and early warning lagged in both groups, inferred from the low proportion of awareness with an overall 15% ratio. Noticeably, industrial awareness represented by corporate patents was observed to be mostly concentrated on mitigation technology (80%). This indicates that the Korean industry paid attention to reducing greenhouse gas emissions for climate change mitigation. However, it is recognized that other awareness targets, such as adaptation, impact reduction, and early warning, fell behind compared to mitigation: in particular, the early warning awareness was not observed in industrial groups.
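The percentages reported above follow directly from counting how many of the 20 topics in each dataset were reallocated to each indicator. The sketch below illustrates that arithmetic only; the manual labeling itself is a researcher judgment, and the per-topic assignment used here is a hypothetical placeholder (the actual assignments are in Tables A1-A5) chosen so that the counts reproduce the reported public (Twitter) distribution.

```python
from collections import Counter

INDICATORS = ("mitigation", "adaptation", "impact reduction", "early warning")


def awareness_distribution(topic_labels):
    """Percentage of topics reallocated to each SDG 13.3 awareness indicator."""
    if not topic_labels:
        raise ValueError("no topics to summarize")
    unknown = set(topic_labels.values()) - set(INDICATORS)
    if unknown:
        raise ValueError(f"unknown indicator(s): {sorted(unknown)}")
    counts = Counter(topic_labels.values())
    total = len(topic_labels)
    return {ind: 100.0 * counts[ind] / total for ind in INDICATORS}


# Placeholder labels: only the counts per indicator matter here, and they are
# derived from the reported percentages (e.g., 40% of 20 topics = 8 topics).
counts_public = [
    ("early warning", 8),
    ("mitigation", 6),
    ("impact reduction", 4),
    ("adaptation", 2),
]
public_labels = {}
topic_id = 1
for indicator, n_topics in counts_public:
    for _ in range(n_topics):
        public_labels[f"topic_{topic_id}"] = indicator
        topic_id += 1

print(awareness_distribution(public_labels))
```

Because every dataset is summarized on the same four indicators, the resulting distributions can be compared across societies even though the LDA models were fitted separately.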
In the above section, the holistic distribution of topics from the respective documents was verified to identify climate change awareness in each group through LDA topic modeling. Another useful technique for extracting valuable insights from a large volume of text data is the term co-occurrence network [48]. As a kind of content analysis, the term co-occurrence network is useful for understanding the underlying content structure and the detailed relatedness between co-occurring terms in a document [49,50]. Visualization of the data, particularly as a co-occurrence network, supports intuitive human understanding [10,51]. In this regard, a co-occurrence network is advantageous for visualizing the associations and patterns between items (terms) as a two-dimensional map [6,48]. Furthermore, it can reveal sub-networks, called communities or clusters, which are aggregates of similar items (terms). In other words, a cluster implies connections between co-occurring terms within a particular theme. Term co-occurrence networks constructed using documents from each society are illustrated in Figures 3-7, and the top five terms by degree are summarized in Table 1. The degree is defined as the number of links to a given node. A high degree value indicates frequent co-occurrence with other terms, which represents the importance of a term. First, social media data exhibited five clusters, with the two main words "warmth" and "Earth" in the center of the network having the highest degree, as shown in Figure 3. The five clusters composing the network were "US withdrawal from the Paris Agreement", "abnormal and extreme climate", "climate change impacts and mortality", "sea-level rise arising from global warming", and "national policy and forum on climate change response". The next three terms by degree were "rise", "sea-level", and "case", in order.
The top five terms based on degree have the strongest association with other terms; in other words, they frequently co-occur with other terms. Hence, these words can be considered the major thematic keywords that actors in the corresponding society treat as important. The important thematic keywords in social media data are related to "warmth of the earth (indicating 'global warming')", "sea-level rise", and "cases (for abnormal climate and warming)".
News data representing social awareness showed the highest number of thematic clusters, seven, which implies that the media have the widest perspectives on climate change awareness among the five groups (Figure 4). This wide distribution is consistent with the LDA analysis result shown above. "US withdrawal from the Paris Agreement", "G20 Germany Summit", "eco-friendly renewable energy", "economy and policy research on future technology", "international organizations and forums", "countermeasures against disasters in agriculture and water resources", and "particulate matter and air pollution" were the seven clusters observed in the co-occurrence network. The top five terms were "USA", "GHG", "China", "agreement", and "Trump", in order. These five terms belonged to the same cluster, "US withdrawal from the Paris Agreement", implying that Korean media gave prominent coverage to news related to the US withdrawal from the Paris Agreement in 2017.
As presented in Figure 5, the national R&D analysis results showed five thematic clusters with a comparably balanced thematic distribution of climate change mitigation, adaptation, and impact reduction. Detailed themes of the clusters are as follows: "climate change impact on the ocean, ecology, and forest" related to impact reduction; "energy materials such as solar cells and batteries" and "automatic process equipment" related to mitigation technology; and "climate monitoring and information management system" and "selective breeding of adaptive and superior varieties/species" clusters related to climate change adaptation. The top 5 terms appearing in national R&D data were "efficiency", "design", "system", "nano", and "materials", in order, as summarized in Table 1. The terms "efficiency", "nano", and "materials" were associated with the "energy materials such as solar cells and batteries" cluster. In addition, the terms "design" and "system" were observed to be located in the center of the network.
The industrial patent analysis showed five clusters which were all associated with mitigation technology (Figure 6), as verified above in the topic analysis: "lithium-ion battery", "electric vehicle", "power and telecommunication management", "solar cell materials and manufacturing process", and "battery structure and stacked device". The terms "cell", "electrode", "anode", "cathode", and "discharge" were the top five thematic keywords in patent data (Table 1). These five keywords were associated with battery (or solar cell) technologies for mitigating greenhouse gas emissions. Thus, the results reveal that industrial awareness of climate change in Korea is mostly focused on mitigation technology.
Lastly, the term co-occurrence network of the scientific articles was constructed using VOSviewer software, as illustrated in Figure 7. Since scientific articles, unlike the other documents, were available only as English abstracts, VOSviewer, which is specialized for text mining in English, was utilized for the co-occurrence network analysis [22]. However, the mechanism for constructing the network from the term association between extracted nouns is essentially the same [22,52]. The term co-occurrence network built from articles also showed five clusters, but two of them accounted for only minor proportions. There were three major clusters, "technology and policy research on renewable energy", "patterns and trends of temperature rise and precipitation", and "number of fatalities due to climate change", and two minor thematic clusters, "impact on natural disaster damage" and "soil carbon and plant varieties". The terms "year", "system", "level", "development", and "increase" were highly important thematic keywords in the academic society. These important keywords are deeply associated with climate change adaptation themes.

Discussion
Knowledge synthesis based on machine learning is useful to discover underlying patterns or insights from a variety and large volume of big data, which provides evidence for data-driven decision-making [53,54]. The computer-assisted learning technique enables the improvement of the human capacity to handle unstructured text data rapidly, massively, and automatically [21,51,53]. Text analytic methods have already been utilized to track and identify sustainability indicators [34,55]. Likewise, the present method could be an effective tool for sustainably monitoring and managing climate change awareness from diverse perspectives in support of SDG 13 and its targets. Furthermore, this methodology could also be applied to a cross-national level analysis with relevant data, even though this study focused on identifying the differences among cross-societal awareness of climate change at a particular period of time and in a specific country.
Generally, text mining techniques have been applied to text data from a single source or aggregated documents with the same context despite different sources. On the contrary, by conducting text analysis on documents collected from different sources, comprehensive knowledge can be synthesized to infer multifaceted aspects involving diverse stakeholders. Thus, we endeavored to develop this methodology further to improve availability beyond detecting topics and classifying documents. We suggested a validation process for reallocation after topic extraction in an effort to integrate mutually exclusive data from different sources.
The general unsupervised learning method is limited to document classification, which requires additional interpretation [56,57]. Thus, an additional process is required to extract the relationship between topics and specific indicators and to integrate separately analyzed data. For this purpose, awareness indicators derived from detailed targets in SDG 13.3 were utilized. The target of SDG 13.3 is "improving education, awareness-raising, and human and institutional capacity on climate change mitigation, adaptation, impact reduction, and early warning". Accordingly, four detailed targets, which are (1) mitigation, (2) adaptation, (3) impact reduction, and (4) early warning, were utilized as criteria for the derivation of awareness indicators. The reallocation process was performed after the extraction of latent topics using these indicators in the manual validation process. Based on this relationship, the integration of exclusive data and multilateral distribution of awareness was successfully ascertained. In addition, the relation of climate change awareness to SDG 13 can be elicited simultaneously. This novel approach, which utilizes a reallocation procedure using unified awareness indicators, will provide informative and valuable insights in the measurement of climate change awareness. This methodology can be utilized to obtain an opportunity for the achievement of sustainable development goals with high reproducibility and scalability. In addition, a broader or more specific point of view could be clarified if the relevant text data can be collected.
Typically, awareness identification research has been widely conducted through questionnaires, telephone surveys, or interviews [17,58]. These empirical methods have several limitations, such as time and cost intensiveness, issues in sample size and response rate, challenges in real-time collection, missing samples, and potential bias. Nevertheless, there are many advantages, such as collecting information suitable for the purpose from welldesigned questionnaires as well as abundant individual-level background information of respondent groups [55,59,60]. The text analytic method based on computer-assisted learning has recently emerged to complement these limitations of empirical surveys [38,61]. Text mining techniques have also been utilized to elicit valuable insights from the data surveyed by empirical methods [30]. In particular, those are highly favorable for the direct detection of users' opinions from text data owing to the advantages of machine learning from data collection to analysis steps. Likewise, this study successfully captured the discriminative characteristics of climate change awareness among various groups. Therefore, the methodology used in this study can provide scientific evidence for identifying awareness from diverse stakeholder groups, which could complement survey-based research.
As a result of the LDA with the reallocation process, climate change awareness can be successfully inferred from five different societies in Korea: public, social, governmental, industrial, and academic awareness. Based on the collection and analysis of a large volume of text data from various sources, the identification of climate change awareness from diverse perspectives and their relationships with the SDG 13.3 targets was successfully investigated. The LDA topic model was used to explore the relationship between awareness indicators and each topic, as well as the distribution of awareness in terms of mitigation, adaptation, impact reduction, and early warning with respect to climate change. The term co-occurrence network analysis was used as a content analysis, complementing the LDA results in detail. All groups showed relatively high climate change mitigation awareness in the levels with a more than 25% distribution rate. While early warning awareness lagged in the four other groups, the Korean public paid it more attention. The media showed the broadest awareness perspectives on climate change, as verified by LDA and network analysis simultaneously. The awareness of Korean industry was found, by both text analyses, to concentrate on mitigation technology in an effort to reduce greenhouse gas. The government and researchers mostly focused on both mitigation and adaptation awareness of climate change.
The distribution of all topics from the LDA results was illustrated as a network visualization based on the association between topics and awareness indicators, as shown in Figure 8. The four awareness indicators are represented as hexagonal nodes, and the topics extracted from each text dataset are represented as circular nodes. The colors of the nodes distinguish the respective data sources and awareness indicators. Figure 8 corresponds directly to the summation of Tables A1-A5. News topics representing social awareness are located in the center of the network because they are evenly distributed across the four awareness indicators. Patent topics (industrial awareness) are mostly focused on mitigation, and scientific articles are focused on adaptation; R&D topics are primarily distributed across both mitigation and adaptation. In addition, impact reduction and early warning awareness were observed to be at a low level across all groups. Using this network visualization, the relationships between each topic and each awareness indicator can be recognized intuitively for all societies.
Sustainability and climate change issues require mutual understanding among stakeholders from various societal groups and backgrounds. In this respect, this study is well suited to provide useful knowledge for policy-making and management contexts that support the implementation of SDG 13 (climate action). The data-driven knowledge management based on multiple sources presented in this study could be extended with relevant datasets to any other policy- and decision-making strategy involving diverse stakeholders. Since this approach is based on big data text analysis, it is appropriate for inferring valuable insights from large volumes of varied data. Furthermore, it can be applied to real-time monitoring and analysis that keeps pace with the increasing velocity of data.
Accordingly, this method can also be utilized for other SDG targets and as evidence for the formulation of national policy. In addition, the presented methodology could be used for other decision-making processes, such as marketing strategies and sustainable corporate management, as well as the environmental-social-governance (ESG) agenda, which requires awareness among various stakeholders. It is hoped that this methodology will be utilized widely and developed further for valuable knowledge management.
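The topic-indicator network described for Figure 8 can be approximated as a bipartite graph in which each topic node links to the awareness indicator it was reallocated to. The following is a minimal sketch under that assumption; the topic labels, source names, and assignments are hypothetical placeholders rather than the contents of Tables A1-A5, and the helper names are illustrative only.

```python
from collections import defaultdict

# Hypothetical (source, topic label, indicator) assignments; placeholders only.
assignments = [
    ("news", "news_t1", "mitigation"),
    ("news", "news_t2", "adaptation"),
    ("news", "news_t3", "impact reduction"),
    ("news", "news_t4", "early warning"),
    ("patent", "patent_t1", "mitigation"),
    ("patent", "patent_t2", "mitigation"),
]


def build_network(rows):
    """Build an undirected bipartite adjacency map: topic <-> indicator."""
    adjacency = defaultdict(set)
    for _source, topic, indicator in rows:
        adjacency[topic].add(indicator)
        adjacency[indicator].add(topic)
    return adjacency


def indicator_coverage(rows):
    """Map each data source to the set of indicators its topics link to."""
    coverage = defaultdict(set)
    for source, _topic, indicator in rows:
        coverage[source].add(indicator)
    return coverage


network = build_network(assignments)
coverage = indicator_coverage(assignments)
```

In this toy input, the source whose topics reach all four indicators would sit near the center of a force-directed layout, while a source linked to a single indicator would cluster around that indicator node.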

Conclusions
Raising awareness of climate change is considered one of the key factors in the Sustainable Development Goals and climate policy [62]. Climate action most likely depends on climate change awareness, not only among individuals but also among diverse social groups [15,16]. Furthermore, there are no detailed indicators related to awareness-raising in SDG 13.3, even though goals for raising climate change awareness have been included. Since policy-making and strategies to address climate change issues and promote climate action are highly dependent on various stakeholders, understanding awareness from diverse perspectives should be considered essential. In this respect, this study utilized text analyses based on multi-source data as a promising methodology to identify climate change awareness from the diverse perspectives of different groups.
Five different groups were selected as decisive actors responding to climate change, and the text data produced by each of them were collected and analyzed. The differences in climate change awareness were verified through LDA topic modeling and co-occurrence network analysis of the collected text data. In order to integrate and analyze the awareness differences among these groups, the respective text data were comparatively analyzed by classifying them into four indicators (mitigation, adaptation, impact reduction, and early warning) based on the detailed targets in SDG 13.3. Through this research, the status and distribution of climate change awareness, including its disproportion and deficiency, can be grasped for five different groups: public, social, governmental, industrial, and academic societies. Accordingly, the results can be used as a criterion for decision- and policy-making tailored to each society to raise awareness.
Our research provides two main contributions to the literature. First, the multilateral view of climate change awareness among diverse cross-societal groups in a single country was successfully inferred. We proposed a methodology for comparative analysis that analyzes text data from multiple sources rather than a single source and integrates them into unified indicators. This methodology is beneficial for identifying the awareness distribution among different groups based on an unsupervised learning technique. Compared to the survey method, the awareness differences of cross-societal groups could be readily identified with high reproducibility and resource savings owing to the computer-assisted procedures [34,55]. However, there are some limitations, such as insufficient background information on each stakeholder. In addition, the prerequisite of collecting and handling relevant text data that can represent each group cannot currently be met in all countries. Nevertheless, this method is favorable for prompt responses and for reducing the time and resources needed to investigate awareness. Thus, it is well suited to compensate for the limitations of the survey method [17,63].
Second, this methodology and these results could be utilized as an indicator for SDG 13.3 with regard to awareness-raising. However, the methodology should be further developed into a quantitative indicator for this purpose. Nevertheless, the present method is highly appropriate as a qualitative indicator for understanding the status of climate change awareness on mitigation, adaptation, impact reduction, and early warning. Thus, this methodology will encourage policymakers and researchers to develop new indicators for identifying, tracking, and monitoring SDGs [8,14], and could at the same time help collect reliable information about the status of climate change awareness. The results investigated here could be utilized for knowledge management and national policy formulation to raise awareness of climate change.

Conflicts of Interest:
The authors declare no conflict of interest.
Appendix A
Table A1. Extracted topics and reallocation results to awareness indicators from social media data.

Topics
Indicators Relevant Keywords