Knowledge Discovering on Graphene Green Technology by Text Mining in National R&D Projects in South Korea

This paper reviews the development of South Korea’s national research and development (R&D) in graphene technology, focusing on projects that have been classified as “green” technology. A total of 826 projects (USD 210 million) from 2010 to 2019 were collected from the National Science and Technology Information Service (NTIS), the full-cycle national R&D project management system of South Korea. We then analyzed the R&D trends by applying diverse text mining methods, including frequency analysis, association rule mining, and topic modeling. The analysis suggests that the number of graphene green technology (GT) R&D projects and their research expenses are likely to rise again under the incumbent government with the implementation of the Korean New Deal policy, which integrates the Green New Deal and the Digital New Deal.


Introduction
International competitiveness in research and development (R&D) is a national strength and a key to sustainable national development [1,2]. In 2018, South Korea devoted 4.53% of its gross domestic product to R&D, the second-highest share in the world [3]. The importance of innovative R&D is increasingly emphasized in South Korea. On 14 July 2020, the South Korean government introduced the "Green New Deal" policy, under which it would spend KRW 73.4 trillion (USD 60.9 billion) and create 659,000 jobs. The Green New Deal is a socio-economic paradigm: a policy of shifting the current fossil-fuel-centered energy system to one centered on renewable energy, in order to create a low-carbon economy while increasing employment and investment. The government also announced its aim of reducing net carbon emissions to zero (Net Zero) by 2050; planned measures include supplying 1.13 million battery electric vehicles and creating 230,000 energy-efficient buildings. In the Green New Deal policy, the central concern is electricity supply [4]. As the markets for new and renewable energy and electric vehicles grow rapidly, and the limitations of fossil fuels and related energy problems become more apparent, high-capacity energy storage technologies have become increasingly important for efficient energy use and for responding to climate change. Therefore, researchers are considering graphene-based technologies for application in the green industry.
Graphene is a single layer of carbon atoms closely packed into a hexagonal lattice. It is regarded as a new-generation material with unrivalled characteristics in terms of mechanical robustness, electrical conductivity, and surface area. When graphene is combined with other materials, such as plastics or metals, the resulting composites become much stronger or lighter [5,6]. Owing to these intrinsic features, graphene finds practical application in a variety of fields, including mobile devices, semiconductors, biosensors, and flexible displays [7][8][9][10][11]. Furthermore, through its use in energy storage devices, it has tremendous potential to enable technologies that can help to solve energy shortages and climate problems. Therefore, many countries have started to undertake graphene-related R&D projects to position themselves as leaders in the field [12]. This study seeks to uncover knowledge about graphene green technology (GT) by applying various text mining methods to R&D project data collected from the National Science & Technology Information Service (NTIS) in South Korea.

Literature Review
The South Korean government has invested heavily in research infrastructure and has established NTIS, a centralized national science and technology information portal that integrates and provides information on national R&D programs, including programs, projects, human resources, and outcomes, so that R&D projects can be better monitored. NTIS aims to support sustainable national R&D by making science and technology information easily accessible to all citizens, including researchers, technology innovators, R&D managers, and policy makers [13]. NTIS is therefore a data source that can guide efficient full-cycle project management, but it is often overlooked.
Globally, graphene research has grown remarkably and has become a leading topic in the applied sciences [14]. A large body of research has made important contributions to graphene technology. According to Simonsen, graphene research targets five of the world's biggest problems: clean water, carbon emissions, energy, healthcare, and infrastructure [15]. Previous research has mainly highlighted the technical properties of graphene. Some papers have applied bibliometric analysis to papers and patents to analyze global R&D trends. However, few studies have detected technology trends by applying text mining methods.
Shapira et al. used bibliometric and web mining techniques to extract information from graphene-based small and medium enterprises (SMEs) in order to understand and map the companies' development and commercialization activities in the graphene domain. Their findings suggested that policy needs to pay attention to the introduction and scale-up of downstream intermediate and final graphene products, together with associated financial, intermediary, and market identification support [16]. In addition, Kay et al. produced science and patent overlay maps that show how graphene was discovered and used between 2000 and 2013, using text mining, visualization, software tools, and a self-developed mapping kit. The results showed that graphene research is focused on materials science and closely related fields such as engineering and physical chemistry, whereas graphene inventions cover technologies ranging from catalysis and semiconductors to pharmaceuticals [17].
According to the bibliometric trend analysis of global graphene research by Lv et al., the number of SCI papers on graphene soared from 5 in 1991 to over 2000 in 2009 [18]. According to the bibliometric analysis of graphene research and development trends by Zou et al., global publications and patents involving graphene have been increasing since 2005, especially since 2010, and more than 50% of these papers and patents were published from 2014 to 2016. China, the USA, and South Korea were the top three countries by number of publications from 2010 to 2016 [19]; similarly, in patent production, China, South Korea, and the USA were the top three [20]. As for the major actors in graphene research, universities and research institutes dominate in China and South Korea, whereas enterprises are the main R&D actors in the USA [19].
Recent publications and patents testify to the widespread interest in graphene [20]. In a patent analysis of the graphene industry based on competitive advantage (SCA), South Korea obtained a relatively high SCA, indicating that it is expected to be a promising competitor in the graphene field [21]. South Korea's large number of publications and patent applications shows that it has invested heavily in graphene R&D. However, continuous research on and analysis of graphene R&D trends in South Korea remain insufficient.
Little research has been conducted on GT fields using national R&D data. Jeong et al. [9] applied network analysis to NTIS R&D data to analyze South Korea's GT R&D trends from 2011 to 2016. Their keyword centrality analysis found that the centrality of solar cells and fuel cells was highest in 2011 and 2012, indicating that South Korea has invested heavily in new and renewable energy technologies since 2011. In 2013 and 2014, "graphene" ranked sixth as a keyword, because graphene is used in the development of solar cells, fuel cells, and secondary batteries. In 2015, when the government announced the "Technology Roadmap for Promoting Commercialization of graphene (2015-2020)", graphene rose to fifth place.
In this paper, we analyzed South Korea's national R&D trends in graphene GT, a focus of the Green New Deal policy, over the 10 years from 2010 to 2019. We conducted both qualitative and quantitative analyses by applying diverse text mining methods, including frequency analysis, association rule mining, and topic modeling. We also attempted to predict the trend of graphene GT projects after 2020 and to suggest policy implications for developing graphene technologies that create environmentally sustainable solutions.

Data and Analysis Process
The aim of this research is to explore the chronological change in graphene national R&D trends, the application fields of graphene technology, and policy implications for the future graphene industry. The research process entailed several steps. First, national R&D project data were collected from NTIS, which provides national R&D information including projects, human resources, facilities, and outcomes. Then, the 826 projects classified under the GT field were selected for analysis. The extracted data were parsed and cleansed. The first part of the analysis covered quantitative information on the number of projects and the research expenses: research trends were analyzed in terms of the number of projects by year, ministry, and technology classification, and in terms of total research expenses. The second part aimed to quantify the chronological change in, and the relations between, words that appear in the project information, using text mining methods including frequency analysis, association rule analysis, and topic modeling. All analyses were performed in R.
This analysis considered national R&D projects from 2010 to 2019 that involve graphene and are conducted in research areas classified as GT. A search of the NTIS database with "graphene" as the search keyword yielded 4054 R&D projects, of which only 826 were classified as GT. The fields used in the research are "year", "project name", "ministry", "start date of the year", "end date of the year", "science and technology standard classification", "green technology field classification", "6T related technology classification", "R&D stage", and "total R&D expenses".

Methods
For frequency analysis, we used the following columns of the graphene GT project information: "project name", "keyword", "summary of research objective", and "summary of research content". First, during data cleaning, uninformative components such as postpositions, punctuation, numbers, special characters, and stop words were removed using the "tm" package in R. We also treated "graphene" as a stop word, because it was our search keyword and therefore appeared in the descriptions of all projects. Second, the root forms of nouns and the infinitives of verbs with two or more letters were extracted using the morphological analysis package "NLP4kec" in R. We then formed a data frame containing a column of words and a column of their frequencies, and compared the chronological change in the 10 most frequent keywords.
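The cleaning and counting steps above can be illustrated with the following Python sketch. It is not the authors' R code. The sketch makes several simplifying assumptions:

- English text is split on whitespace, in place of the "tm" cleaning and "NLP4kec" morphological analysis that the authors applied to Korean descriptions.
- The stop-word list is a small invented example.
- The helper names `tokenize`, `top_keywords`, and `compare_periods` are hypothetical.
- The toy projects are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; "graphene" is included because it was the
# search keyword and therefore appears in every project description.
STOP_WORDS = {"the", "of", "and", "for", "a", "an", "in", "on", "to", "with", "graphene"}


def tokenize(text):
    """Lower-case, strip digits/punctuation/special characters, and split on whitespace.

    Keeps words of two or more letters that are not stop words. This stands in
    for tm cleaning plus NLP4kec morphological analysis; no stemming is done.
    """
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [w for w in text.split() if len(w) >= 2 and w not in STOP_WORDS]


def top_keywords(documents, k=10):
    """Return the k most frequent words across documents as (word, count) pairs."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts.most_common(k)


def compare_periods(projects, split_year, k=10):
    """Top-k keywords before split_year and from split_year on.

    projects: iterable of (year, text) pairs.
    """
    early = [text for year, text in projects if year < split_year]
    late = [text for year, text in projects if year >= split_year]
    return top_keywords(early, k), top_keywords(late, k)


# Toy usage with invented project descriptions.
projects = [
    (2011, "Development of graphene electrode technology"),
    (2013, "Nano material technology development"),
    (2016, "Graphene battery technology for energy storage"),
    (2018, "Battery electrode development for energy"),
]
early, late = compare_periods(projects, split_year=2015, k=3)
print(early)  # [('development', 2), ('technology', 2), ('electrode', 1)]
print(late)   # [('battery', 2), ('energy', 2), ('technology', 1)]
```

`Counter.most_common` breaks ties in order of first occurrence, so equally frequent words keep their corpus order. For a real ranking of Korean text, the morphological analysis step would dominate the quality of the result.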
Next, we applied association rule analysis to find rules relating the words that appeared in the project information, using the "arules" package in R. Whereas frequency analysis can only compare how often words appear, association rule mining can find relationships between them. Association rule analysis is widely used to discover interesting relations between variables in large databases [22]. It finds patterns of the form {A}→{B}, i.e., rules stating that when event A occurs, event B tends to occur with it. The Apriori algorithm generates frequent itemsets by testing candidate itemsets; an itemset is considered frequent when its support reaches the user-defined minimum support level. After all frequent itemsets have been generated, association rules are derived from them, and any rule whose confidence reaches the minimum confidence level is considered a strong rule [23][24][25]. For this analysis, the texts were split into individual sentences, each of which was treated as a transaction. The association rules between words were then calculated, and the associated words were visualized using the "arulesViz" package.
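The Apriori procedure described above has two stages: first find the frequent itemsets by support, then derive rules and keep those that pass a confidence threshold. The Python sketch below illustrates both stages. It is an illustrative re-implementation, not the arules code the authors ran, and it rests on these assumptions:

- Each sentence has already been reduced to a set of words.
- Rules have a single item on the right-hand side, as arules' `apriori` produces.
- The function names and the `min_lift` parameter are hypothetical.
- The toy sentences are invented.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset (Apriori)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions (sentences) containing every item in itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single words.
    current = {}
    for item in {item for t in transactions for item in t}:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


def rules(frequent, min_confidence, min_lift=0.0):
    """Derive rules lhs -> rhs (single right-hand-side item) from frequent itemsets.

    Returns (lhs, rhs, support, confidence, lift) tuples sorted by support.
    """
    found = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue
        for rhs in itemset:
            lhs = itemset - {rhs}
            # Every subset of a frequent itemset is frequent, so lhs is a key here.
            conf = supp / frequent[lhs]
            lift = conf / frequent[frozenset([rhs])]
            if conf >= min_confidence and lift >= min_lift:
                found.append((lhs, rhs, supp, conf, lift))
    return sorted(found, key=lambda r: r[2], reverse=True)


# Toy usage: each transaction is the set of words in one (invented) sentence.
sentences = [
    {"solar", "battery", "efficiency"},
    {"solar", "battery"},
    {"technology", "development"},
    {"technology", "development", "nano"},
    {"battery", "electrode"},
]
for lhs, rhs, supp, conf, lift in rules(apriori(sentences, 0.3), 0.5):
    print(sorted(lhs), "->", rhs, f"support={supp:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

The prune step is what makes Apriori efficient. Because any superset of an infrequent itemset is itself infrequent, such candidates are discarded without scanning the transactions again.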
In the last part of our analysis, we conducted topic modeling, an unsupervised method that groups documents by discovering the latent topics they share. Topic modeling techniques include Probabilistic Latent Semantic Analysis [26], Latent Dirichlet Allocation (LDA) [27], Hierarchical Dirichlet Processes [28], Correlated Topic Models [29], and Pachinko Allocation [30]. Among them, LDA is one of the most widely used methods for fitting a topic model [31,32]. LDA simultaneously estimates the mixture of words associated with each topic and the mixture of topics that describes each document [33,34]. Topic modeling consists of four steps: loading the data, pre-processing the data, building the model, and visualizing the words in each topic. In addition to these steps, we named each topic to determine what kinds of research topics are distributed in the graphene GT field.
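LDA's two mixtures can be estimated in several ways. The Python sketch below uses collapsed Gibbs sampling to make the idea concrete. Each word token is repeatedly reassigned to a topic in proportion to two counts: how often that topic occurs in the token's document, and how often the word occurs in that topic.

This is an illustration under assumptions, not the authors' setup. The "topicmodels" package offers both VEM and Gibbs estimation, and the source states neither which was used nor the hyperparameters. The `alpha`, `beta`, and iteration values are arbitrary, and the function names and toy documents are hypothetical.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of token lists. Returns (doc_topic, topic_word, vocab), where
    doc_topic[d][k] estimates the topic mixture of document d and
    topic_word[k][w] estimates the word mixture of topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    n_dk = [[0] * K for _ in docs]      # tokens of topic k in document d
    n_kw = [[0] * V for _ in range(K)]  # occurrences of word w assigned to topic k
    n_k = [0] * K                       # tokens assigned to topic k
    z = []                              # current topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for word in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][index[word]] += 1
            n_k[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                w, k = index[word], z[d][i]
                # Remove this token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z_i = j | all other assignments).
                weights = [(n_dk[d][j] + alpha) * (n_kw[j][w] + beta) / (n_k[j] + V * beta)
                           for j in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the new assignment.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Point estimates of both mixtures from the final sample.
    doc_topic = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
                 for d, doc in enumerate(docs)]
    topic_word = [[(n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)]
                  for k in range(K)]
    return doc_topic, topic_word, vocab


def top_words(topic_word, vocab, n=5):
    """The n highest-probability words of each topic, used when naming topics."""
    return [[vocab[w] for w in sorted(range(len(vocab)), key=lambda w: -row[w])[:n]]
            for row in topic_word]


# Toy usage with invented, already pre-processed documents.
docs = [
    ["battery", "electrode", "lithium", "storage"],
    ["battery", "lithium", "capacity", "storage"],
    ["solar", "cell", "efficiency", "panel"],
    ["solar", "cell", "panel", "perovskite"],
]
doc_topic, topic_word, vocab = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(top_words(topic_word, vocab, n=3)):
    print(f"Topic {k + 1}:", words)
```

Because LDA returns unlabeled topics, the printed top words are what an analyst would read to assign names such as "energy storage".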

Graphene GT R&D Trends
Table 1 shows the numbers and budgets of graphene GT R&D projects. As global competition in graphene technology development intensified, the South Korean government aimed to secure an early lead in the graphene market by establishing a systematic R&D investment strategy. In particular, in 2015, the year with the largest number of projects, the government presented the "Technology Roadmap for Promoting Commercialization of graphene (2015-2020)" with the goal of making South Korea a leading country in the future materials industry by capturing the global graphene market early [35]. This roadmap entailed an investment of USD 107.7 million by 2019 to support the commercialization of graphene, so numerous new related projects were planned and executed.
When sorted by the number of projects, the top five GT categories are "high-efficiency secondary battery technology", "other manufacturing process/material efficiency improvement technologies", "non-silicon solar cell mass production and core source technology", "other non-silicon solar cell technologies", and "next generation high-efficiency fuel cell system technology" (Table 2). "High-efficiency secondary battery technology" ranked first in both the number of projects and R&D expenses, accounting for 128 of 826 projects (15.5%) and USD 37 million, respectively. Secondary batteries are environmentally benign and economical because they can store electricity and be recharged and reused. They can be used in home appliances, transportation, and power grids, and are essential components in IT devices, electric vehicles, and mass storage devices. Owing to these characteristics, secondary battery development is a promising industry [36]. Of the graphene GT R&D projects, 551 (66.7%) were basic research, 124 (15%) were applied research, and 134 (16.2%) were development research. As for the division of roles between the government and the private sector, the private sector focused on technology development in application fields. The government, meanwhile, prepared a support plan to strategically promote the development of raw material manufacturing technology and the commercialization of core applied products. Because the data used in this analysis cover only national R&D projects and exclude private R&D, the large share of basic research largely reflects the government's R&D support plan.

Frequency Analysis
We divided the study period into two 5-year subperiods (Table 3). From 2010 to 2014, the top 10 keywords were "technology", "development", "nano", "research", "material", "electrode", "property", "structure", "usage", and "battery". Because R&D concerns technology development, "technology" and "development" were the two most frequent words in both periods. "Nano" ranked high because 479 (58%) of the projects belonged to the nanotechnology field (e.g., nano material technology, nano electronics technology, nanochemistry process technology, and nano photonics technology) under the 6T classification. 6T refers to six promising future technologies: Information Technology, Biology Technology, Nano Technology, Environment Technology, Culture Technology, and Space Technology. To increase the efficiency of R&D investment, the South Korean government has made strategic investments in the 6T sectors, which are expected to have large ripple effects and to create new growth engines. The 6T fields were adopted by the National Economic Advisory Committee in 2001 as the next-generation growth industries of the 21st century and have been intensively nurtured since then [37].
"Battery" rose from rank 10 in the first period to rank 6 in the second. This rise reflects increased government R&D funding for renewable energy and GT to encourage the development of highly efficient secondary batteries for environmentally benign vehicles. The number of graphene GT projects and their funding peaked in 2015. This coincided with the establishment of the "Technology Roadmap for Promoting Commercialization of graphene (2015-2020)" and with the finalization of the "Paris Agreement on climate", under which each country set its own greenhouse gas emission targets. Countries are expanding the share of environmentally benign vehicles by implementing environmental regulations, and a few countries have announced plans to ban or reduce diesel vehicles in the near future [38]. South Korea shares this goal. It offers financial support, such as subsidies and tax reductions, and environmental support, such as public procurement programs and charging infrastructure, which has contributed significantly to the expansion of the battery market. "Energy" entered the top 10 in the second period, rising from rank 16 in the first period. A total of 147 (17.8%) projects belonged to the energy technology field (e.g., energy material technology, energy storage technology, bioenergy technology, and unutilized-energy technology) under the 6T classification. As with "battery", this rise reflects South Korea's participation in the energy transition. Following the change in the global energy paradigm after the Paris Agreement was finalized in 2015, the government increased R&D funding for new and renewable energy.
The new paradigm of energy policy seeks economic feasibility while also taking into account the sustainable development of future generations. To keep up with this shift in global environmental policy, South Korea has paid close attention to developing graphene technology as a next-generation growth engine. In particular, South Korea ranks first in the world in patents for chemical vapor deposition of graphene and second in patents for graphene flakes. The main application fields are displays, mobile phones, and energy, in which South Korea holds a high market share [35].
Graphene is an important component in "battery" and "energy" devices. The major industrial fields of the Green New Deal include renewable energy, secondary batteries, electric vehicles, and hydrogen vehicles [4]. The core of South Korea's Green New Deal, which is fused with the Digital New Deal, is to connect power generation facilities such as solar and wind power to IT infrastructure. Because this raises the importance of energy storage technology, the development of graphene-based next-generation batteries is drawing attention both in South Korea and internationally.

Association Rule Mining
In total, 15,549 lines were extracted from the graphene GT R&D project information. The minimum support and minimum confidence levels were both set at 0.05, considering the size of the dataset and the number of rules created. In total, 10,643 transactions and 10,150 items were generated. The top 10 words by frequency were "development", "graphene", "technology", "nano", "research", "material", "property", "structure", "use", and "electrode". Among the word relationships, "technology" and "development" had the highest probability of co-occurring in a sentence, with support of 0.14 and confidence of 0.50. Although we identified numerous rules, we first consider only the top 10 (Table 4). The associations between word items are presented in parallel coordinates (Figure 2), in which line thickness represents the support level. The rules {solar}→{battery} and {technology}→{development} were strong; the former indicates that graphene-based solar cell technology has been heavily funded and studied. The rules {nano}→{graphene} and {characteristic}→{research} are also noteworthy. Most graphene-related national R&D remains in the basic research stage, so investment is currently larger in developing graphene technology than in commercializing it. We visualized 32 association rules as a network of associations between words (Figure 3). A rule is generally considered meaningful when its lift exceeds 1; here, we retained only the 32 rules with lift ≥1.3. "Graphene", "research", and "technology" were located at the center of the network.
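For reference, the standard definitions of the measures used in this section are given below, as used by arules, for a rule A ⇒ B over the set T of sentence transactions. Assume the reported support of 0.14 and confidence of 0.50 refer to the rule {technology}→{development}. Then, by the definition of confidence, those two values also imply the support of {technology} on its own:

```latex
\begin{aligned}
\operatorname{supp}(A) &= \frac{\lvert \{ t \in T : A \subseteq t \} \rvert}{\lvert T \rvert},\\
\operatorname{supp}(A \Rightarrow B) &= \operatorname{supp}(A \cup B),\\
\operatorname{conf}(A \Rightarrow B) &= \frac{\operatorname{supp}(A \cup B)}{\operatorname{supp}(A)},\\
\operatorname{lift}(A \Rightarrow B) &= \frac{\operatorname{conf}(A \Rightarrow B)}{\operatorname{supp}(B)}
  = \frac{\operatorname{supp}(A \cup B)}{\operatorname{supp}(A)\,\operatorname{supp}(B)},\\[4pt]
\operatorname{supp}(\{\text{technology}\})
  &= \frac{\operatorname{supp}(\{\text{technology}, \text{development}\})}
          {\operatorname{conf}(\{\text{technology}\} \Rightarrow \{\text{development}\})}
  = \frac{0.14}{0.50} = 0.28.
\end{aligned}
```

A lift above 1 means that A and B co-occur more often than they would if they were independent. This is why a lift greater than 1 serves as the usual significance criterion, and why a stricter threshold such as 1.3 keeps only the more strongly associated word pairs.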

Topic Modeling
For topic modeling, four columns were extracted from the graphene GT project information as the dataset: "project name", "keyword", "summary of research objective", and "summary of research content". We used the LDA function of the "topicmodels" package in R. First, we calculated the coherence score for different numbers of topics k (Figure 4a). The coherence score was highest for k = 20, but this setting produced duplicate words in many topics, and the topics could not be easily named. We therefore chose k = 12 as the optimal number of topics. This choice yielded overlapping areas only between topics 3 and 6 and between topics 10 and 12 (Figure 4b), which we consider an acceptable degree of overlap.
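The source does not say which coherence measure was computed. As one common choice, the UMass measure scores a topic by how often its top words co-occur in the same documents. For top words w_1, ..., w_M, it sums log((D(w_m, w_l) + 1) / D(w_l)) over all pairs l < m, where D counts the documents containing the given words.

The Python sketch below implements that measure. The function names and the toy corpus are hypothetical, and this is not necessarily the score shown in Figure 4a.

```python
import math


def umass_coherence(top_words, documents):
    """UMass coherence of one topic; higher values indicate a more coherent topic.

    top_words: the topic's words ordered by decreasing probability.
    documents: reference corpus as a list of token lists. Every top word is
    assumed to occur in at least one document (true when the topic was fitted
    on the same corpus); otherwise the division below fails.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            pair = doc_freq(top_words[i], top_words[j])
            score += math.log((pair + 1) / doc_freq(top_words[j]))
    return score


def mean_coherence(topics, documents):
    """Average UMass coherence over a model's topics, for comparing values of k."""
    return sum(umass_coherence(t, documents) for t in topics) / len(topics)


corpus = [["battery", "electrode"], ["battery", "electrode"], ["solar", "cell"]]
print(umass_coherence(["battery", "electrode"], corpus))  # log(3/2), about 0.405
print(umass_coherence(["battery", "solar"], corpus))      # log(1/2), about -0.693
```

In a model-selection loop, one would fit a model for each candidate k, take each topic's top words, and compare `mean_coherence` values across k. As the authors' choice of k = 12 over k = 20 shows, the score is a guide rather than the final word; topic interpretability also matters.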
In the next step, we visualized the per-topic word distributions to identify the most prominent words in each topic (Figure 5). Because LDA does not assign labels to topics, we named each topic according to its most frequent words (Table 5). All 12 topics focused on the development of graphene technology. Some topics are clearly associated with specific application areas. For instance, Topic 2 contains words such as "storage" and "capacitor" and therefore seems related to energy storage technology. Topic 6 seems related to batteries, mainly solar cells but also lithium-ion batteries; the word "process" suggests that a few of its projects may concern processing technology. Topic 7 includes "hydrogen", "fuel cell", and "hybrid" and therefore seems to concern green energy. Four of the 12 topics concern energy storage technology and batteries, so a substantial part of graphene GT R&D concerns the development of graphene-based next-generation batteries. Consistent with the continued focus of graphene technology development on basic research (Section 4.1), only one topic relates to the industrialization and commercialization of graphene.

Discussion
We have analyzed the graphene GT field by applying text mining techniques to data from South Korea's national R&D project management system. GT began to emerge in response to energy depletion, climate change, and rising oil prices, and many advanced countries have therefore promoted GT development as a top priority. In South Korea, GT emerged as a key driver of low-carbon green growth after the former administration (2008-2013) declared its vision of "Green Growth" in August 2008. In 2009, the government announced a comprehensive GT R&D plan to invest USD 85 billion in clean energy technologies and to foster a clean-tech export industry [39,40].
Research on graphene increased rapidly from 2010, when Andre Geim and Konstantin Novoselov of the University of Manchester won the Nobel Prize for the discovery of graphene. Globally, researchers working on carbon nanotubes gradually moved toward the study of graphene, with high expectations for its applications [41].
The succeeding government (2013-2017) maintained that South Korea must respond to energy and climate problems by promoting new energy-related industries as part of its core "Creative Economy" strategy [42]. In 2015, the South Korean government presented the "Technology Roadmap for Promoting Commercialization of graphene (2015-2020)", and the largest number of graphene GT R&D projects was carried out in that year.
In 2020, the incumbent government proposed a grand plan, called the "Korean New Deal", that combines the Green New Deal with the establishment of digital infrastructure. As one of the parties to the Paris Agreement on climate, the government has been pursuing greener energy policies to meet its national targets under the UN Framework Convention on Climate Change. In June 2019, the South Korean government confirmed the Third Energy Master Plan, which calls for expanding renewable energy production to 20% by 2030 and to 30-35% by 2040 while reducing South Korea's dependence on coal [43]. The government is also taking steps to stimulate the development and use of eco-friendly vehicles. The Green New Deal aims to increase the number of electric vehicles to 1.33 million units by 2025: 1.13 million that use batteries and 200,000 that use hydrogen power [44]. Hyundai, the fourth-largest global manufacturer of electric cars as of May 2020, plans to launch a next-generation electric vehicle with a range of 450 km (280 miles) per charge and a charging time of 20 min or less [45]. The key is the development of large-capacity battery technology.
The industrial use of graphene is steadily expanding, exploiting its electrical, mechanical, and medical properties in a range of applications such as batteries, capacitors, semiconductors, biosensors, and displays [46]. Chinese patent applications concentrated mainly on preparing batteries and composites. South Korean organizations applied for patents mainly in semiconductor devices and batteries, and US organizations primarily in semiconductor devices. Zou et al. found that the main applications of graphene are electronics, batteries, semiconductors, and composites, among others [19]. Moreover, graphene's applications can contribute to a greener environment [46]. Graphene applications are expected to emerge in areas related to sustainable energy and environmental technologies, such as batteries, hydrogen storage, field emission displays, and composite materials [47,48]. Graphene is being evaluated as a suitable material for fast charging because it conducts electricity more than 100 times better than copper and can move electrons more than 140 times faster than silicon. Graphene can increase battery capacity by 45% compared with conventional lithium-ion batteries and enable charging five times faster. In January 2018, Samsung SDI unveiled a next-generation battery that uses a new material called "graphene ball", which has attracted the attention of global automakers; the battery can be charged in only 20 min and has a range of 600 km [49]. Dominance of the next-generation electric vehicle market may depend on who has the technology to charge larger batteries fastest.
Graphene-based materials may also be applied in supercapacitors and other energy storage devices [50][51][52], where graphene may increase the charge storage capacity. The most important electrochemical energy storage and conversion systems are conventional capacitors, supercapacitors, Li-air batteries, and fuel cells. Supercapacitors can tolerate higher power rates than batteries and fuel cells and are therefore being studied in academia and industry [53]. High-capacity supercapacitors used for energy storage can be charged rapidly and can deliver high power instantaneously. For this reason, they are mainly used in smart grids and in renewable energy generation systems that manage power demand [54]. Replacing the existing carbon electrode with graphene material increased the charge storage capacity by 2-3 times. The energy storage capacity of such supercapacitors should therefore also increase, while their fast charge and discharge characteristics are maintained [55].
As for South Korea's national R&D, graphene GT projects increased rapidly from 2010 to 2013 under the former government's green policy. From 2013 to 2017, however, green policy was neglected in line with the succeeding government's policy direction. Consequently, graphene GT R&D projects showed a decreasing trend, except in 2015 when the "Technology Roadmap for Promoting Commercialization of graphene (2015-2020)" was announced. After the incumbent government took office in 2017, the trend curve declined slightly in 2018 but has since been increasing again. These results suggest an optimistic outlook, with spending on related R&D projects likely to increase in the following years. In particular, from 2020, national R&D investment in graphene GT is expected to increase significantly with the implementation of the Green New Deal policy.
Judging from the trend curves of graphene GT R&D projects and research expenses, both are expected to rise after 2020, as a large budget is allocated to the new and renewable energy sector. Graphene GT R&D projects are substantially oriented toward batteries and energy storage devices, and this pattern of government investment is likely to continue with the implementation of the Korean New Deal policy. In particular, demand for high-capacity graphene supercapacitors is expected to surge. Two factors drive this: the need to improve the reliability of the power supply from new and renewable energy in line with international environmental policies, and the introduction of smart grids. Supercapacitors may become a core component of renewable energy vehicles and industrial devices, and may develop into an independent industrial sector.
According to an IDTechEx report [56], the global market in the promising area of graphene will be worth more than USD 300 million by 2028. Graphene is still in the early stages of industrialization worldwide, and government budget support is shifting from source technology development to commercialization. In line with this global trend, South Korea should also consider increasing government support for the commercialization stage. One of the biggest obstacles to commercializing graphene is that materials companies usually cannot survive unless they operate at a large scale. More flexible standards are therefore needed so that materials companies can more easily receive government support at the national level.

Conclusions
In this paper, we explored the national R&D trends and knowledge patterns of graphene, a new material attracting research attention in connection with the South Korean version of the New Deal policy, especially the "Green New Deal". Among graphene-related national R&D projects, the 826 projects categorized as GT were used to identify the status and trends from 2010 to 2019. First, we tabulated the trends in graphene GT R&D over those 10 years: both the number and the funding of projects peaked in 2015 and had decreased to their initial level by 2019. We then conducted frequency analysis to identify the most recurrent terms in the graphene GT project information; notably, "battery" and "energy" moved up in the rankings in the 2015-2019 period. Next, we applied association rule mining to discover relationships between keywords, and by visualizing the association rules we confirmed that certain research keywords are strongly correlated. "Solar cell" was identified as one of the core research keywords, although it did not appear in the frequency analysis. Finally, we performed automatic topic classification by topic modeling, which revealed that one-third of the topics were related to batteries or energy storage technology.
This study is limited in that NTIS data include only national R&D projects. We therefore recommend that future studies use expanded datasets that include project information from the private sector. We also recommend comparative analyses of graphene R&D trends across countries, or analyses using paper and patent data. This study is expected to help guide the direction of national R&D investment and the revitalization of promising new industries. Furthermore, this research may motivate other countries to cooperate in graphene R&D and may provide guidance for developing successful R&D policies in other countries.