Exploring Promising Research Frontiers Based on Knowledge Maps in the Solar Cell Technology Field



Introduction
Many challenging environmental issues, such as rising energy demand, the exhaustion of fossil fuels, and climate change, have recently emerged, leading many countries to pay attention to influential policies and R&D efforts. In particular, shale gas has recently become a critical substitute source of natural gas, enabling a radical reorganization of the power structure in the energy industry. Although the development of shale gas originated in the United States, interest has spread to potential shale gas resources in the rest of the world. To cope promptly with such rapidly changing situations, strategic research and development (R&D) planning is important because sophisticated technology is needed to meet these energy demands. New sources of energy cannot be extracted and developed with traditional technologies, which cannot meet their very different technological needs. To this end, detecting promising and emerging research themes in energy technology is regarded as a priority task. However, most strategies for developing energy technology have relied on national policy considerations and qualitative analyses, without a systematic process or methodology. Such R&D processes therefore tend to yield rough research themes based on the intuitions of policy-makers and technologists.
Although some studies apply quantitative approaches such as bibliometric analyses of patents and publications in energy technology, most of them present simple statistical graphs and analyze the status quo of technologies [1][2][3][4]. The energy technology sector inevitably needs technology forecasting because many energy issues are global problems that all countries must collaborate to solve for the future of humanity. However, the rich databases of patents and academic papers have not been intensively utilized to anticipate promising energy technology. Thus, a systematic approach based on a knowledge map can overcome the limitations of previous research on bibliometric analysis. As one type of knowledge map, a patent map can be utilized in various functional areas. As Yoon et al. [5] suggested, a patent map can be used to: (1) identify industry or specific technology trends and understand competitors' technology assets and strategy in the administration area; (2) select new product themes, understand portfolios of technology assets, and identify technology vacuums in the R&D area; (3) identify portfolios of R&D human resources in the personnel area; and (4) identify patent infringements and investigate the life cycle and scope of patents in the patent management area.
The present research aims to propose a quantitative methodology for identifying promising research frontiers of energy technology. As an illustration of the proposed approach, a knowledge map of solar cell technology is developed by visualizing research frontiers derived from the bibliographic information of patents and scientific articles. A hybrid approach of network analysis and principal component analysis is utilized to visualize the knowledge map based on bibliographic coupling. Promising research frontiers from both the patent and the scientific paper knowledge maps are suggested by conducting dynamic analysis. Furthermore, the practical contents of the promising research frontiers are suggested by applying a text-mining technique.
The study is structured as follows. Section 2 briefly explains the relevant studies and describes the research methodology. Based on the methodology, Section 3 illustrates the proposed approach and shows the results of the analysis, using solar cell technology, and Section 4 discusses the results and their implications. Finally, Section 5 draws conclusions and policy implications, discussing the contributions and limitations of this study.

Research Frontier
The concept of a research front was originally introduced by de Solla Price [6]. According to Price, research fronts are research domains under development where papers cite each other densely. The analysis of research frontiers, also called research fronts, was originated by pioneers of bibliometric citation analysis [7]. The most highly cited scientific papers were defined as a "research frontier" since the number of citations can be interpreted as a sign of their importance as sources of new knowledge [8,9]. As a similar concept, Garfield and Small [10] identified "hot field" clusters of papers that were highly cited within three years of publication.
Quantitative methods have been applied to identify and track research frontiers. Small and Griffith [11] represented scientific specialties by computing the co-citation strength between pairs of documents. Braam et al. [12] analyzed the frequency of indexing terms and classification codes co-occurring in publications. Clustering methods were improved with co-word based maps [13,14]. Persson [15] defined research frontiers by developing a research front map based on citation-based bibliographic coupling. Fujita et al. [16] detected research fronts and promising fields by dividing combinational citation networks into clusters. Shibata et al. [17] developed a method of detecting emerging research fronts based on topological measures in citation networks of scientific publications. Studies comparing the performance of cluster solutions such as co-citation, bibliographic coupling, direct citation, and hybrid approaches have been conducted in order to improve research front mapping performance [18][19][20]. Lucio-Arias and Leydesdorff [21] utilized title words, cited references, and sequence numbers as indicators of research front activity. Saka et al. [22] analyzed the dynamic changes and trends of scientific research fronts using a database of articles with high citation rates among the published articles. Toivanen [7] analyzed the evolution of Brazilian research frontiers with bibliometric methods.

Bibliometric Analysis
Bibliometric analysis refers to a research methodology used in the information sciences that employs quantitative analysis and statistics to describe the distribution patterns of articles within a given topic, field, institution, or country [23]. Bibliometric methods can be applied to assess the scientific outputs or research patterns of authors, institutes, and countries. In addition, this analysis can contribute to the progress of science and technology in many ways by allowing analysts to analyze international collaborations, lay the academic foundation for the evaluation of new developments, identify major scientific actors, perform technology forecasting, and so on. The two main methods of bibliometric analysis are science/technology mapping and performance analysis. The purpose of science/technology mapping is to visualize the conceptual structure of research and technology and to show their evolution. Performance analysis, by contrast, aims at evaluating the citation impact of the scientific/technological production of different actors. Thus, it is related to various fields such as bibliometrics, scientometrics, and informetrics. The borders among these three terms are so fuzzy that they are used almost interchangeably. Before these terms came into use, Cole and Eales [24] conducted the first bibliometric study, although they used the term "statistical bibliography". Hulme [25] conducted further work using patents in order to measure social progress in Britain. Lotka [26] studied the frequency distribution of scientific productivity using the decennial indices of Chemical Abstracts. Gross and Gross [27] conducted a citation-based study to support the decision on which chemistry periodicals should be purchased by small college libraries. Zipf [28] formulated a law on word frequency or occurrences.
Many subsequent researchers [29][30][31][32][33][34][35] have stated that the term "bibliometrics" was coined by Pritchard [36]. Pritchard explained the term bibliometrics as "the application of mathematical and statistical methods to books and other media of communication". The term scientometrics, the equivalent of the Russian term "naukometriya", was introduced almost simultaneously by Nalimov and Mul'chenko [37]. They defined scientometrics as "the application of those quantitative methods which are dealing with the analysis of science viewed as an information process". The term informetrics, which comes from the German term "Informetrie", was proposed by Nacke [38] to cover a part of information science.
Hood and Wilson [39] explained the differences among the terminologies. Bibliometrics deals with the general information process and focuses on the literature of science and scholarship itself, while scientometrics is restricted to the measurement of science communication. However, scientometrics offers more information on science and technology after measuring and analyzing literature output, for example, socio-organizational structures and governmental policies toward science and technology. Informetrics stands for a more general subset of information science that deals with the mathematical and statistical analysis of communication processes in science and also covers electronic media.

Knowledge Map
A knowledge map, or knowledge mapping, is generally a representative tool or technique concerning "knowledge about knowledge" rather than knowledge itself [40][41][42]. A knowledge map is defined from various perspectives as: (1) a visualization tool that enables a knowledge manager to identify the competence of a firm by analyzing its knowledge portfolio [40]; (2) a type of mental diagram that allows a knowledge management system developer to array complex ideas in a logical sequence [43]; and (3) a spatial arrangement plan representing inclusion/dependence relationships among knowledge for the knowledge user [44]. In terms of their structure, knowledge maps are classified as hierarchical/radial knowledge maps, networked knowledge maps, knowledge source maps, and knowledge flow maps [45].
Studies on knowledge maps that utilize science and technology information such as patents and scientific papers have been widely conducted with various methods. Traditional science and technology knowledge maps presented statistical graphs on papers, textbooks, and reports [36]. In terms of bibliographic information, the range of data mapped on knowledge maps has been extended to citations, indices, classifications, authors, and so on. Textual information from scientific documents such as patents and papers is also utilized to generate knowledge maps. Thus, various types of knowledge maps on science and technology are visualized by diverse techniques such as matrices, network analysis, multi-dimensional scaling (MDS), self-organizing mapping (SOM), and generative topological mapping (GTM).
Among studies on knowledge maps of scientific papers, Persson [15] suggested a research front map based on bibliographic coupling with articles published in the Journal of the American Society for Information Science (JASIS). White and McCain [33] utilized co-citation analysis and MDS to visualize academic research areas. Van Raan [46] generated inter-journal bibliometric maps in order to track how scientific knowledge has developed, centering on interdisciplinary relationships. Boyack et al. [47] presented a knowledge map and examined differences in subject areas by comparing co-citation and inter-citation analysis. Medina and Leeuwen [48] presented a network map comprising the journals most deeply related to a seed journal. Su and Lee [49] conducted a survey by generating a keyword-based network from the articles published in Research Policy.
Among studies on knowledge maps of patents, Shin and Park [50] generated a patent claim map using text-mining and network analysis. Yoon, Yoon and Park [5] suggested technology vacuum maps, claim point maps, and technology portfolio maps utilizing the SOM method. Segev and Kantola [51] identified patent trends by presenting SOM-based knowledge maps of patents. Lee et al. [52] discovered new technology opportunities by conducting text-mining and principal component analysis. Son et al. [53] developed a GTM-based patent map to automatically identify patent vacuums.
Among studies on knowledge maps of other sources, Yoon et al. [54] developed a core R&D map, R&D trend map, R&D concentration map, R&D relation map, and R&D cluster map using a research proposal database. Chen and Chen [55] and Chen [56] established a design patent map with patent examiners. Kim and Park [57] proposed a user-centric service map to identify new service opportunities from potential needs.

Research Concept
This research aims to suggest promising research frontiers by generating knowledge maps of patents and scientific papers. Promising research areas are those with the potential to exert significant influence on subsequent patents and papers. Such impact can be measured by how frequently a document is cited in the citation relationship. In particular, since research frontiers need to cover both the academic and the practical worlds, the proposed approach uses two different databases, one of papers and one of patents. The results of the analyses of the two data sets are interpreted in a complementary way to investigate promising research themes. The overall research concept is shown in Figure 1. Bibliographic information in the research area is collected from the patent or scientific paper database. Core documents, which are highly cited by other documents, are extracted from the collected documents. The core documents are clustered to identify research frontiers. The research frontiers are visualized as a knowledge map. In the visualization process, emerging or enhanced research frontiers are identified by conducting dynamic analysis. Those emerging or enhanced research frontiers are identified as promising research frontiers for the future. Finally, the keywords related to the promising research frontiers are suggested as promising contents.

Research Framework
The overall research process for exploring promising research areas by developing knowledge maps consists of several steps, as shown in Figure 2. The first step is data collection from the patent database or the scientific paper database, because these two databases are the main data sources covering both practical and academic research activities. Second, the core documents are extracted based on the number of citations, since highly cited documents can be considered core data sources that contain high-quality technological information. Third, research frontiers are identified by conducting k-means clustering on the core documents. Similar documents that are correlated in terms of technological content can be grouped to investigate trends in influential technologies. Fourth, the research frontiers are visualized as knowledge maps by a hybrid approach of principal component analysis and network analysis. Fifth, promising research frontiers are determined by conducting a dynamic analysis, which compares the knowledge maps of the first and second periods. If the comparison over time reveals significant changes, promising research themes can be drawn and highlighted. Finally, promising contents are suggested by text-mining the documents related to the promising research frontiers, because concrete research activities can be pursued on the basis of the research keywords extracted by the text-mining.

Detailed Procedures

Data Collection
First, the technology field and the scope of analysis, such as the publication years and the types of documents to be analyzed, are identified. Afterward, a search formula is built to collect data. When building the search formula, the participation of domain experts or the use of a technology classification table leads to better search results. After the search process, bibliographic information is collected from a reliable patent or scientific paper database such as the United States Patent and Trademark Office (USPTO) database or the Web of Science (WoS) database. The bibliographic information contains the title, the abstract, the number of citations, and the list of cited references. The collected raw data are refined by eliminating noise, that is, documents that are irrelevant or have missing values.

Core Documents Extraction
Core patents and scientific papers exert such an important influence on a technology field that identifying them is essential. In this research, core documents are identified based on the number of citations. Highly cited documents are extracted as core documents since they can be considered to contain valuable technological information. In this process, a suitable threshold is selected separately for core patents and for core scientific papers since the characteristics of each database are different. Previous research on identifying hot research areas by bibliometric methods, conducted by the National Institute of Science and Technology Policy (NISTEP), used a threshold of the top 1% by number of citations to extract core papers [22]. However, the threshold for a patent database should be higher than that for papers since, in general, scientific papers are published and cited far more often than patents are registered and cited. Furthermore, visualization should be considered when choosing the threshold, since a map with too much data cannot be comprehended intuitively.
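The extraction step can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function name `select_core_documents`, the tie-handling rule, and the toy citation counts are assumptions made here for the example.

```python
def select_core_documents(citations, top_fraction):
    """Return (threshold, core_ids) for the top `top_fraction` of documents.

    `citations` maps a document id to its citation count. Documents tied
    with the cut-off count are all kept (an assumption of this sketch), and
    uncited documents are never selected.
    """
    if not citations:
        return 0, []
    ranked = sorted(citations.values(), reverse=True)
    keep = max(1, round(len(ranked) * top_fraction))
    threshold = ranked[keep - 1]
    core_ids = [doc for doc, count in citations.items()
                if count >= threshold and count > 0]
    return threshold, core_ids


# Hypothetical toy data: ten documents and their citation counts.
toy_citations = {"d1": 91, "d2": 12, "d3": 7, "d4": 5, "d5": 3,
                 "d6": 1, "d7": 0, "d8": 0, "d9": 0, "d10": 0}
threshold, core = select_core_documents(toy_citations, top_fraction=0.2)
```

A different `top_fraction` can be passed for each database, which mirrors the idea of using separate thresholds for patents and papers.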

Research Frontiers Identification
The extracted core documents are clustered by k-means clustering, a popular method for cluster analysis in data mining, to identify the research frontiers (RFs) that are the leading research themes in the technology area. RFs extracted from the patent data are technical RFs and those from the scientific paper data are academic RFs. The core documents are clustered based on their reference information; that is, documents that cite the same references are considered similar to each other. The data format for k-means clustering in this step is shown in Table 1. A suitable k should be selected by sensitivity analysis. After the cluster analysis, the extracted RFs are named by domain experts after reviewing the titles and abstracts of the documents.
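As a rough sketch of this step, the code below builds a binary document-by-reference matrix and runs a plain Lloyd-style k-means on it. The binary encoding is an assumption about the layout of Table 1, the deterministic initialization (the first k distinct vectors) is chosen only to keep the example reproducible, and all names and toy data are hypothetical.

```python
def reference_vectors(doc_refs):
    """Encode each document as a 0/1 vector over the sorted set of all references."""
    universe = sorted(set().union(*doc_refs.values()))
    return {doc: [1 if ref in refs else 0 for ref in universe]
            for doc, refs in doc_refs.items()}


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(vectors, k, max_iterations=100):
    """Plain Lloyd's algorithm; initial centroids are the first k distinct vectors."""
    centroids = []
    for v in vectors:
        if list(v) not in centroids:
            centroids.append(list(v))
        if len(centroids) == k:
            break
    if len(centroids) < k:
        raise ValueError("fewer than k distinct vectors")
    labels = None
    for _ in range(max_iterations):
        new_labels = [min(range(k), key=lambda c: squared_distance(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:      # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:               # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical core documents and the references they cite.
toy_refs = {"p1": {"a", "b"}, "p2": {"c", "d"},
            "p3": {"a", "b", "c"}, "p4": {"b", "c", "d"}}
encoded = reference_vectors(toy_refs)
labels = k_means([encoded[d] for d in sorted(encoded)], k=2)
```

In practice k would be varied and the resulting clusters reviewed by domain experts, as described above; a library implementation with several random restarts would normally replace this hand-written loop.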

Knowledge Maps Generation
Relations among RFs are visualized as two types of network-formatted knowledge maps: a patent knowledge map and a scientific paper knowledge map. The proposed visualization method is a hybrid approach that combines network analysis and principal component analysis. In the proposed network, a node is an RF and a link is a bibliographic coupling relation, which expresses the similarity between documents based on the number of common references. Furthermore, the proposed knowledge map additionally provides position information based on the first two principal component scores obtained by principal component analysis (PCA). PCA is conducted with the data formatted as in Table 2 to determine each RF's position. Thus, the X-axis is the first principal component score and the Y-axis is the second principal component score. The node position indicates the similarity between one RF and the other RFs in the whole map, while a link indicates a one-on-one similarity relation between two RFs. For example, nodes are positioned based on the principal component scores in Figure 3. Node A is similar to B and C, and node G is similar to F and H, when it comes to the whole network. However, node A is linked to node G, though A is far from G, since the two nodes are strongly bibliographically coupled. Bibliographic coupling strength should be calculated in order to visualize the one-on-one similarity relation between RFs. To this end, the original bibliographic coupling strength, which is defined as the number of common references, is calculated as a matrix as in Table 3. However, the original bibliographic coupling strength should be normalized since the documents have different numbers of references. The normalized coupling strength (NCS) is defined as [58]

$$NCS_{ij} = \frac{r_{ij}}{\sqrt{n_i \, n_j}}$$

where $NCS_{ij}$ is the normalized coupling strength between documents $i$ and $j$, $r_{ij}$ is the number of references common to both $i$ and $j$, $n_i$ is the number of references in the reference list of document $i$, and $n_j$ is the number of references in the reference list of document $j$. The NCS value ranges from zero to one. The original coupling strength matrix between documents is transformed into the normalized coupling strength matrix between documents. The normalized coupling strength matrix between documents is finally transformed into a matrix between RFs by calculating the average normalized coupling strength between RFs, as in Table 4. The RFs are visualized as nodes in the map based on the average normalized coupling strength. The coupling relation between RFs is visualized as a link when it exceeds a suitable threshold value. In addition, the core research frontiers, which have high degree centrality values in the knowledge network, can be identified.
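The chain from document-level coupling to RF-level links can be sketched as below. The sketch assumes the common cosine-style normalization, the number of shared references divided by the square root of the product of the two reference-list lengths, and it averages over all cross-RF document pairs; the function names, the toy documents, and the RF labels are hypothetical.

```python
import math
from itertools import combinations


def ncs(refs_i, refs_j):
    """Normalized coupling strength between two reference sets (range 0 to 1)."""
    if not refs_i or not refs_j:
        return 0.0
    return len(refs_i & refs_j) / math.sqrt(len(refs_i) * len(refs_j))


def rf_coupling(doc_refs, doc_rf):
    """Average NCS over all document pairs that belong to two different RFs."""
    sums, counts = {}, {}
    for a, b in combinations(sorted(doc_refs), 2):
        if doc_rf[a] == doc_rf[b]:
            continue
        key = tuple(sorted((doc_rf[a], doc_rf[b])))
        sums[key] = sums.get(key, 0.0) + ncs(doc_refs[a], doc_refs[b])
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


def links_and_degree(rf_strength, threshold):
    """Keep RF pairs above the threshold and count links per RF (degree)."""
    links = [pair for pair, s in rf_strength.items() if s > threshold]
    degree = {}
    for a, b in links:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return links, degree


# Hypothetical documents, their references, and their RF assignments.
toy_refs = {"d1": {"a", "b"}, "d2": {"a", "b", "c", "d"},
            "d3": {"c", "d"}, "d4": {"e"}}
toy_rf = {"d1": "RF1", "d2": "RF1", "d3": "RF2", "d4": "RF3"}
strength = rf_coupling(toy_refs, toy_rf)
links, degree = links_and_degree(strength, threshold=0.02)
```

RFs with the largest values in `degree` would correspond to the core research frontiers of the map; raw degree counts are used here, and a normalized centrality could be substituted.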

Promising Research Frontiers Identification
A promising RF is identified on the basis of the results of dynamic analysis, which compares the number of core documents in two periods. To this end, the data are split into two time periods. This paper defines two types of promising RFs: emerging and enhanced. First, an emerging RF is an RF that has newly emerged, in that it has no core documents in the first period and core documents appear in the second period. Second, an enhanced RF is an RF for which the number of core documents increases in the second period. Thus, the promising RFs are the RFs that newly emerge or are enhanced as core documents appear or increase in the dynamic analysis. In addition, the relation between core RFs and promising RFs can be observed to investigate and compare the currently important RFs and the RFs that may become important in the future.
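The two definitions above can be expressed as a small classification rule. This sketch assumes that each period is summarized by a count of core documents per RF and reads "enhanced" as a strictly larger count in the second period; if cumulative counts are compared instead, the inputs would change but the rule would stay the same. The function name and the toy counts are hypothetical.

```python
def classify_frontiers(first_period, second_period):
    """Label each RF as 'emerging', 'enhanced', or 'not promising'.

    Both arguments map an RF id to its number of core documents in that
    period; an RF missing from a dict is treated as having zero.
    """
    labels = {}
    for rf in sorted(set(first_period) | set(second_period)):
        before = first_period.get(rf, 0)
        after = second_period.get(rf, 0)
        if before == 0 and after > 0:
            labels[rf] = "emerging"
        elif after > before:
            labels[rf] = "enhanced"
        else:
            labels[rf] = "not promising"
    return labels


# Hypothetical core-document counts per RF in the two periods.
toy_labels = classify_frontiers({"RF1": 3, "RF2": 2},
                                {"RF1": 5, "RF2": 1, "RF3": 2})
```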

Contents Analysis in Promising Research Frontiers
Keywords can play a critical role in coming up with innovative research ideas because identifying relevant keywords is the first step in idea generation. Thus, this research utilizes a text-mining approach to extract the contents of promising RFs. Since it is difficult to derive keywords from the few core documents in a promising RF, documents relevant to the promising RF should be collected again. To this end, first, a search formula for the promising RF is developed to collect relevant patent or paper data. Second, the data are refined by eliminating noise because the search results may include irrelevant documents. Although this process can be performed automatically, the intervention of experts is necessary for better keyword analysis. Finally, promising keywords are extracted by text-mining as the most frequent keywords in the abstracts of the collected documents. The suggested promising keywords can be utilized as the contents of the promising RFs.
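A minimal version of the keyword step is a frequency count over abstracts after removing stop words. The tokenization rule, the short stop-word list, and the toy abstracts below are assumptions for illustration only; the study may well have used a dedicated text-mining tool and expert filtering.

```python
import re
from collections import Counter

# A deliberately short, hypothetical stop-word list.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "in", "on", "for",
             "to", "is", "are", "with", "by", "this", "that", "it"}


def top_keywords(abstracts, n=5, stopwords=STOPWORDS):
    """Return the n most frequent non-stop-word tokens across all abstracts."""
    counts = Counter()
    for text in abstracts:
        # Lower-case words of two or more characters; hyphens are kept.
        tokens = re.findall(r"[a-z][a-z\-]+", text.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return counts.most_common(n)


# Hypothetical abstracts for one promising RF.
toy_abstracts = [
    "A back contact solar cell with a dielectric layer.",
    "The dielectric layer of the solar cell is an oxide.",
]
keywords = top_keywords(toy_abstracts, n=4)
```

Multi-word terms such as "back contact-type" would require n-gram counting, which this sketch leaves out.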

Data Collection and Core Documents Extraction
Solar cell power generation is one of the most prominent technologies for applications of solar energy. The governments of many countries have promoted the solar cell industry to solve energy problems, allowing the industry to become one of the fastest growing new industries. A solar cell can be defined as an electrical device that converts the energy of light directly into electricity by the photovoltaic effect. Energy-harvesting powered devices have the potential for widespread use in buildings as sensors in building management systems. Thus, solar cell technology is one of the critical research areas in the energy sector, having industrial impacts in terms of market size as well as technology diffusion. Patent data on solar cell technology granted from 2008 to 2012 are collected from four patent offices: United States (US), Europe (EP), Korea (KR), and Japan (JP). To collect the patent data, patent search queries are built by research area with the participation of domain experts from the Korea Institute of Energy Research (KIER), as shown in Table A1. A technology classification related to solar cell technology is utilized in order to collect the relevant patent data. The citation frequency of the 1668 patents is distributed as shown in Table 5. The highest cited frequency is 91, and 956 patents have not been cited. The 175 patents that are cited more than five times, approximately 10% of the 1668 patents, are selected as core patents since these are considered the highly cited patents of the total patent data set. (We conducted a t-test on the self-citation rate of the selected patents to verify whether the selection of core patents is biased. The references citing the selected patents are additionally collected, and the patents are split into two groups: the top 5% most cited patents (group 1) and the patents ranked between the top 5% and the top 10% (group 2). As a result, the difference in self-citation rate between group 1 and group 2 is not statistically significant (p-value = 0.152 > 0.05). The result implies that self-citation may not seriously bias the selection of core patents. Results: Group 1 (N = 82, mean = 0.3571, S.D. = 0.3989); Group 2 (N = 93, mean = 0.2750, S.D. = 0.3493).) In this research, 174 patents are analyzed; one patent is excluded for having no citing reference. Scientific paper data relevant to solar cell technology, published from 2008 to 2012 and indexed in SCI-Expanded or SSCI, are collected from the Web of Science database. The paper search query is composed of a combination of the keywords "solar", "photovoltaic", and "cell*" in the "title" category. The citation frequency of the 13,162 papers is distributed as shown in Table 6. The 131 papers that are cited more than 194 times, about 1% of the total, are selected as core papers since these are considered the highly cited papers of the total paper data set. In this research, 130 papers are analyzed; one paper is excluded because it includes insufficient information. The thresholds for selecting core documents differ because the properties of patents and papers differ. The number of published papers is much greater than that of granted patents, and patents are cited far less often than scientific papers since patent citation is related to complicated business situations.
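The self-citation check reported above can be approximately retraced from the published summary statistics. The text does not state which variant of the t-test was used, so the sketch below assumes Welch's unequal-variance form; the function name is hypothetical, and only the t statistic and the degrees of freedom are computed, not the p-value.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and approximate degrees of freedom from summary data."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Summary statistics of the self-citation rates as reported in the text.
t_stat, dof = welch_t(0.3571, 0.3989, 82, 0.2750, 0.3493, 93)
```

Under this assumption the statistic is roughly 1.44 with about 162 degrees of freedom, which lies below the two-sided 5% critical value of about 1.97 and is therefore in line with the non-significant result reported above.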

Research Frontiers
The 174 core patents relevant to solar cell technology are clustered by k-means clustering. The 45 clusters are extracted by sensitivity analysis with energy technology domain experts. These 45 clusters are named by domain experts after reviewing the titles and abstracts of the patents. The 45 clusters are identified as research frontiers (RFs) since they consist of highly cited core patents and imply research themes. The 45 RFs are matched to the research areas (RAs), which are the highest class level in the solar cell technology classification used when collecting relevant documents (see Table 7). Though several RFs have only one core patent, a single core patent can stand for a research theme because the core patents are the top 10% most highly cited among all collected patents. In addition, the results are further supported by the fact that the RFs are identified by domain experts after the clustering analysis. The 130 core scientific papers relevant to solar cell technology are assigned by domain experts to the 45 predefined research frontiers derived from the core patents. However, four research frontiers are newly added since some of the core scientific papers cannot be assigned to the existing 45 research frontiers. The RFs from core scientific papers consist of 11 existing RFs and four newly added RFs, which are polymer solar cell (RF46), plasmonic solar cell (RF47), mesoscopic solar cell (RF48), and solar water splitting (RF49), as shown in Table 8. The thin film solar cell and organic/quantum dot/nano convergent solar cell RAs account for a high portion of the core patent data. There are many core patents in CIGS based thin film (RF6) and thin film solar cell manufacturing (RF41) of the thin film solar cell RA, dye-sensitized solar cell (RF10) and organic solar cell (RF23) of the organic/quantum dot/nano convergent solar cell RA, and manufacturing methods for solar modules (RF18) of the other solar cell RA. However, the organic/quantum dot/nano convergent solar cell and other solar cell RAs account for a high portion of the core scientific paper data. There are many core scientific papers in bulk heterojunction thin film solar cells (RF5) of the thin film solar cell RA, dye-sensitized solar cell (RF10) of the organic/quantum dot/nano convergent solar cell RA, and polymer solar cell (RF46) of the other solar cell RA.

Knowledge Maps
Two types of knowledge maps, one from patents and one from scientific papers, are generated by conducting principal component analysis (PCA) and network analysis. The PCA results determine the positions on the knowledge map; Tables A2 and A3 show the PCA results for the patent and scientific paper research frontiers. The relative positions of the RFs depict their relationships in the whole knowledge map.
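How the first two principal component scores yield map coordinates can be sketched without external libraries. The layout of the input (one row of numeric features per RF) is an assumption, since the tables holding the PCA input are not reproduced here, and power iteration with deflation is swapped in for whatever PCA routine was actually used. All names and the toy rows are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def covariance_matrix(centered):
    """Sample covariance of mean-centered rows (features in columns)."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
            for a in range(p)]


def leading_eigenpair(matrix, iterations=500):
    """Power iteration for the dominant eigenvector of a symmetric PSD matrix."""
    p = len(matrix)
    vec = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        nxt = [dot(row, vec) for row in matrix]
        norm = math.sqrt(dot(nxt, nxt))
        if norm < 1e-12:              # no variance left in any direction
            return [0.0] * p, 0.0
        vec = [x / norm for x in nxt]
    return vec, dot(vec, [dot(row, vec) for row in matrix])


def first_two_scores(rows):
    """Return (PC1, PC2) scores per row, usable as X and Y map coordinates.

    The sign of each component is arbitrary, as in any PCA.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = covariance_matrix(centered)
    v1, lam1 = leading_eigenpair(cov)
    # Deflation: subtract the first component, then find the next one.
    deflated = [[cov[a][b] - lam1 * v1[a] * v1[b] for b in range(p)]
                for a in range(p)]
    v2, _ = leading_eigenpair(deflated)
    return [(dot(r, v1), dot(r, v2)) for r in centered]


# Hypothetical feature rows for four RFs (two features each).
toy_rows = [[3.0, 0.0], [-3.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
coords = first_two_scores(toy_rows)
```

For real data a tested numerical library would be preferable, since power iteration can converge slowly when the leading eigenvalues are close.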
Furthermore, the RFs of patents and scientific papers are visualized based on the bibliographic coupling relation by network analysis (see Figures 4 and 5). The links of the knowledge maps visualize one-on-one relationships: two nodes are linked when their normalized bibliographic coupling strength is greater than the threshold value, even if the nodes are positioned far from each other. The threshold value is 0.02, which is selected by sensitivity analysis to visualize the relations among the research frontiers. The core research frontiers, which have high degree centrality values in the maps, are summarized in Tables 9 and 10. The core research frontier nodes are colored red in the maps. Most of the core RFs derived from the patent knowledge map are related to the structure of the solar cell, such as RF33, 3, 8, 14, 20, and 30, while RF6 is related to materials and RF44 is related to methods. However, the core RFs derived from the scientific paper knowledge map consist of two attribute-related RFs (RF23 and 46), two structure-related RFs (RF5 and 37), a method-related RF (RF45), and a material-related RF (RF19).

Promising Research Frontiers
Promising research frontiers are identified by conducting dynamic analysis. The whole data set is split into two periods. In the present research, 2008 to 2010 is the first period and the two most recent years, 2011 to 2012, are the second period. Then, knowledge maps are generated using the patent and scientific paper data registered and published from 2008 to 2010. The dynamic analysis compares the knowledge maps of the first period (Figures 6 and 7) with those of the second period (Figures 4 and 5), which represent the entire period of knowledge mapping with the cumulative data. The results of the dynamic analysis are shown in Tables 11 and 12. Twenty-three promising RFs among the 45 RFs are derived from the patent knowledge map. Five RFs, nine RFs, four RFs, and five RFs are derived from the crystalline silicon solar cell, thin film solar cell, organic/quantum dot/nano convergent solar cell, and other solar cell research areas, respectively. Ten promising RFs are related to the structure of solar cells, six are related to methods, five are related to attributes, and two are related to materials, as shown in Table 11. Eight promising RFs among the 15 RFs are derived from the scientific paper knowledge map. One RF, four RFs, two RFs, and one RF are derived from the crystalline silicon solar cell, thin film solar cell, organic/quantum dot/nano convergent solar cell, and other solar cell research areas, respectively. Three promising RFs are related to attributes, two each are related to structure and to methods, and one is related to materials, as shown in Table 12.

Contents of Promising Research Frontiers
Textual data of patents and scientific papers from 2008 to 2013 are collected using a search formula, since the number of core documents is small. The core keywords of promising RFs, shown in Tables 13 and 14, are obtained by extracting highly frequent keywords after excluding noise words such as articles, pronouns, and conjunctions. Table 13 shows the results for emerging research frontiers only, since there are many emerging and enhanced research frontiers, whereas Table 14 covers both emerging and enhanced RFs. The derived keywords are regarded as the promising contents of the promising RFs since they come from the emerging or enhanced RFs. Thus, users should pay attention to these contents when they plan technological strategies on promising RFs.
Table 13. Keywords of promising research frontiers from patent knowledge map.

RF2: Back contact-type solar cell. Keywords: quantum wells, silver particle, back contact-type, contact-type solar
RF7: Contact for a photovoltaic cell. Keywords: oxide, dielectric,
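The keyword extraction described in this section, namely counting frequent terms after removing noise words, can be sketched in Python as below. The stop-word list, the sample texts, and the function name are hypothetical, and the sketch counts single words only, whereas the keywords in Table 13 also include multi-word terms.

```python
import re
from collections import Counter
from typing import Iterable, List

# Small illustrative stop-word list (articles, pronouns, conjunctions, and so on).
STOP_WORDS = {
    "a", "an", "the", "it", "they", "this", "that", "and", "or", "but",
    "of", "for", "in", "on", "to", "is", "are", "with",
}


def top_keywords(texts: Iterable[str], k: int = 5) -> List[str]:
    """Return the k most frequent words across the texts, ignoring stop words."""
    counts: Counter = Counter()
    for text in texts:
        # Lower-case the text and keep alphabetic tokens, allowing inner hyphens.
        tokens = re.findall(r"[a-z][a-z\-]*", text.lower())
        counts.update(token for token in tokens if token not in STOP_WORDS)
    return [word for word, _ in counts.most_common(k)]


# Hypothetical document texts for one promising RF (not data from the study).
texts = [
    "A back contact solar cell with a passivation layer",
    "The solar cell and the contact are formed on the back surface",
]

keywords = top_keywords(texts, k=4)
```

For these two sample texts the four most frequent words are "back", "contact", "solar", and "cell", each occurring twice.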

Discussion
The present research deals with research frontiers that are derived from patents and scientific papers based on citation frequency. The data are split into two periods to extract promising research frontiers. Trends of core documents in research areas (RAs) and research frontiers are shown in Tables 15 and 16. The number of core documents in the second period is smaller than that in the first period for both patent and scientific paper RFs, since recently registered patents and recently published papers have had less chance to be cited. Furthermore, the total number of core scientific papers in the second period is much smaller than that of core patents since the threshold value of patents is much higher than that of scientific papers. Thus, the rate of core documents, rather than their frequency, is used to analyze the trend. The rate of core patents in the thin film solar cell RA increased rapidly; core patents in RF6, 43, and 44 increased, while core patents in RF30, 24, 41, 28, and 18 decreased. The rate of core scientific papers in the crystalline silicon solar cell and thin film solar cell RAs increased drastically during the second period, while that in the organic/quantum dot/nano convergent solar cell RA decreased. The rate of core scientific papers in RF45 increased and that in RF10 decreased remarkably.
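The use of rates instead of frequencies can be illustrated with a short Python sketch. It assumes that the rate is the share of a period's core documents that falls in each research area. The counts and function names are hypothetical and are not taken from Tables 15 and 16.

```python
from typing import Dict


def core_document_rates(counts: Dict[str, int]) -> Dict[str, float]:
    """Share of one period's core documents that falls in each research area."""
    total = sum(counts.values())
    if total == 0:
        return {area: 0.0 for area in counts}
    return {area: count / total for area, count in counts.items()}


def rate_changes(first: Dict[str, int],
                 second: Dict[str, int]) -> Dict[str, float]:
    """Second-period rate minus first-period rate for every research area."""
    r1 = core_document_rates(first)
    r2 = core_document_rates(second)
    areas = sorted(set(r1) | set(r2))
    return {area: r2.get(area, 0.0) - r1.get(area, 0.0) for area in areas}


# Hypothetical core-document counts per research area (not data from the study).
first_period = {"crystalline silicon": 40, "thin film": 40, "organic": 20}
second_period = {"crystalline silicon": 10, "thin film": 30, "organic": 10}

changes = rate_changes(first_period, second_period)
```

In this example the thin film count falls from 40 to 30, yet its rate rises from 0.4 to 0.6, which shows why comparing raw frequencies across periods with very different totals would be misleading.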
Core research frontiers are central nodes in the knowledge network of the whole period, while promising research frontiers are newly emerging or enhanced nodes in the second period. The relationship between core research frontiers and promising research frontiers is presented in Tables 17 and 18, and it can offer energy policy makers meaningful insights into how core RFs and promising RFs are related. Core and promising RFs derived from the patent knowledge map tend to be concentrated in structure- and method-related research themes, while core and promising RFs derived from the scientific paper knowledge map are evenly distributed, as shown in Table 19. Industrial technologies, such as structure- and method-related core and promising RFs, tend to be derived from the patent knowledge map, while basic technologies, such as attribute- and material-related core and promising RFs, appear more often in the scientific paper knowledge map than in the patent knowledge map. In other words, within R&D, RFs from the patent knowledge map relate to development, whereas RFs from the scientific paper knowledge map relate to research.
The types of technology are based on TEMPEST, one of the methods of classifying patents [59,60]. TEMPEST is an acronym for a set of viewpoints for analyzing patents. E stands for energy, representing the working principle of devices or technology, or power-driving devices. M stands for material, and P stands for personality, which describes the function, property, or attribute of the technology. The S (space) viewpoint addresses the structure of the product or technology. Finally, T (time) covers manufacturing or controlling methods and processes. The present study distinguishes four types of technology (structure, methods, attributes, and materials) in Tables 9-12 and 19. The energy type is excluded since it does not occur in the present case.

Conclusions and Policy Implications
The present research proposes a methodology for identifying promising research frontiers and their keywords by generating knowledge maps based on scientific documents such as patents and academic articles. To this end, core documents are derived from the collected scientific document data based on high citation counts. Research frontiers, into which the core documents are clustered, are visualized as a knowledge map that represents the relationships between research frontiers using network analysis and principal component analysis. The promising research frontiers are identified by conducting dynamic analysis, and the contents of the derived promising research frontiers are suggested.
The proposed method and results can be utilized in various ways since the method is based on patents and scientific papers, reflecting both industrial and academic perspectives. First, researchers can establish research directions by confirming core and promising research themes and contents. Second, R&D policy makers can use the results and the method as supporting data and as a tool for decision making on policies that foster promising research. In energy policy, where development times are long and R&D budgets are large, the proposed approach can contribute substantially to devising future energy policy. Third, firms can obtain insights into future technologies, products, and market opportunities in order to secure a competitive position.
This study makes the following contributions. First, a quantitative approach is proposed to suggest promising research frontiers from both industrial and academic perspectives by using scientific paper and patent data. Second, as a visualization method, a hybrid approach of network analysis and principal component analysis is suggested. It can be used to visualize not only the relations among all research frontiers but also one-on-one relations, through positions and links in a network. Finally, the characteristic differences between the research frontiers derived from the patent and scientific paper knowledge maps are presented.
Despite these contributions, this study has some limitations. First, recently emerged documents that will become highly cited might not be included, since recent documents have had little chance of being cited. To overcome this limitation, further study is needed on estimating the citation frequency of recent documents, on utilizing citation time lag distributions, or on extracting data by publication year. Second, although the study suggests a systematic approach, experts' participation is necessary in some processes, and appropriate expert participation is required to obtain high-quality knowledge maps. Thus, a future study can investigate which processes in this approach should be implemented automatically in a system and which should be performed manually by experts.

Figure 3. Concept of knowledge map.
for thin film photovoltaic cell fabrication
Enhanced: RF5 Bulk heterojunction thin film solar cell; RF37 Tandem-type thin film solar cell; RF46 Polymer solar cell

Table 2. Data format for principal component analysis.

Table 3. Original bibliographic coupling strength matrix.

Table 4. Average normalized coupling strength matrix.

Table 5. Patent citation frequency distribution.

Table 6. Scientific paper citation frequency distribution.

Table 7. Research frontiers from core patents.

Table 8. Research frontiers from core scientific papers.

Table 9. Core research frontiers from patent knowledge map.

Table 10. Core research frontiers from scientific paper knowledge map.

Table 11. Promising research frontiers from patent knowledge map.

Table 12. Promising research frontiers from scientific paper knowledge map.

Table 14. Keywords of promising research frontiers from scientific paper knowledge map.

Table 15. Change of core patents in research frontier.

Table 16. Change of core scientific papers in research frontier.

Table 17. Promising research frontier related to core research frontier in patent knowledge map.

Table 18. Promising research frontier related to core research frontier in scientific paper knowledge map.

Table 19. Summary of research frontiers.

Table A2. PCA results from patent RF.

Table A3. PCA results from scientific paper RF.