Advanced Technology Evolution Pathways of Nanogenerators: A Novel Framework Based on Multi-Source Data and Knowledge Graph

As an emerging nano energy technology, nanogenerators have developed rapidly, which makes it crucial to analyze the evolutionary pathways of advanced technology in this field to help estimate its development trends and directions. However, previous studies have some limitations. On the one hand, they generally relied on explicit correlations in the data, such as citation and cooperation between patents and papers, and ignored the rich semantic information these data contain. On the other hand, they did not consider the progressive evolutionary process from scientific grants to academic papers and then to patents. Therefore, this paper proposes a novel framework based on a separated three-layer knowledge graph with several time slices, built from grant data, paper data, and patent data. First, by using a representation learning method and a clustering algorithm, we obtain several clusters representing specific technologies in different layers and different time slices. Then, by calculating the similarity between clusters of different layers, we draw the evolutionary pathways of advanced technology from grants to papers and then to patents. Finally, this paper monitors the pathways of several developed technologies that evolve from grants to papers and then to patents, and identifies some emerging technologies under research.


Introduction
As a novel energy solution for micro and wearable wireless electronic devices, nanogenerators (NG) have been developed to harvest energy from the environment, including biomechanical energy, solar and wind energy, thermal energy, etc. [1]. Based on different physical effects, nanogenerators can be roughly divided into piezoelectric nanogenerators (PENGs), triboelectric nanogenerators (TENGs), and pyroelectric nanogenerators (PYENGs) [2]. Notably, benefiting from related technologies such as 5G and the Internet of Things (IoT) [3], nanomaterials [4], and flexible sensors [5], nanogenerators have found widespread applications beyond energy harvesting. To date, these applications fall into two domains: the engineering domain, covering innovative devices and techniques (e.g., self-powered sensing systems and wearable devices [6]), and the biomedical domain (e.g., implantable devices and tissue regeneration [7]). Given the rapid development and diversity of nanogenerator technology, identifying and understanding its evolutionary path is crucial for decision-makers to capture development trends and directions [8].
Some previous studies roughly described a sub-field development path of nanogenerator technology based on literature reviews. However, with the rapid development of

The key contributions of this paper can be summarized as follows:
1. The nanogenerator field is emerging and developing rapidly, which makes it hard to analyze the evolutionary pathways of its advanced technologies. This paper proposes a novel framework to monitor these pathways based on multi-source data and a knowledge graph.
2. When monitoring the evolution pathways, we apply a representation learning method and a clustering method to connect similar entities, which enables quantitative analysis of large-scale data and thus improves efficiency and accuracy.
3. This paper uses data from three sources and analyzes the evolutionary pathways between them, which reflects technology trends comprehensively and from multiple perspectives.

Development of Nanogenerators
With the rise of the Internet of Things (IoT), advanced materials, and electronics, wearable and implantable devices have developed rapidly. Miniaturization and power continuity have become an important development direction of such devices, which puts high demands on power supply systems [20]. Traditional power methods such as lithium batteries and lead-acid batteries have the limitations of considerable size, short service life, poor flexibility, the possibility of environmental pollution, and the need for frequent replacement. Therefore, developing a new microelectronic power supply device with high flexibility and a sustainable power supply has become the focus of researchers.
Piezoelectric nanogenerators (PENGs) based on ZnO nanowires were first invented by Wang Zhonglin in 2006; relying on the piezoelectric effect, they harvest mechanical energy and convert it into electric power, which marked the beginning of self-powered technology [21]. Since then, researchers have made many attempts to improve piezoelectric materials. At present, the mainstream and mature piezoelectric materials include ZnO, BaTiO3 [22], lead zirconate titanate (PZT) [23], and polyvinylidene fluoride (PVDF) [24]. Alongside the development of piezoelectric materials, triboelectric nanogenerators (TENGs), which are based on the conjunction of triboelectrification and electrostatic induction, emerged in 2012 [25]. Compared with PENGs, TENGs offer high output, low cost, simple structural design, and excellent stability. Up to now, PENGs and TENGs have made significant progress in output performance, sensitivity, energy conversion efficiency, flexibility, and environmental friendliness [26]. At the same time, other types of nanogenerators have been developed, such as pyroelectric nanogenerators (PYENGs) and piezoelectric-triboelectric hybrid nanogenerators (PTENGs) [27].

Technology Evolution Pathways
Evolution occurs constantly in nature, and technology likewise undergoes an evolutionary process [28]. At present, there is no unified definition of technology evolution. Researchers hold roughly two views: one holds that technology evolution results from the accumulation of continuous innovation behind a technology; the other holds that the development and change of the technology itself constitutes technology evolution, and that inducing these changes and displaying them in the form of paths yields the technology evolution pathway [29,30].
Early analyses of technology evolution mainly used qualitative methods, including morphological analysis, Delphi surveys [30], and technology roadmaps [19], which rely on expert knowledge and experience and require extensive human participation. As a result, qualitative methods have high research costs and are subjective, which makes the research inefficient and its results unstable. With the rapid growth of data mining technology, quantitative methods have been widely applied in technology evolution analysis. The main quantitative methods include patent citation analysis, patent classification analysis, text mining, etc.
Huenteler et al. analyzed the evolution of technology based on patent citation links, since a citation network reflects the flow of knowledge [31]. Zhou et al. analyzed the technology layout and trends of solar cells based on patent classification by IPC code [32]. However, citation network analysis and classification analysis do not take the semantic information in the text corpus into account. Additionally, because IPC codes do not change over time, they cannot sensitively capture technology evolution in rapidly developing, converging, or emerging technology fields. To make full use of the semantic information in patent text, researchers turned to text mining methods to analyze technology evolution. Yoon et al. constructed a semantic network using text mining methods to analyze the development trend of technology [33]. Miao et al. studied more than 30,000 patents since the 1990s using text mining methods to identify products and applications with promising prospects and to rule out traditional technologies in decline [11]. However, text mining methods focus on the semantic information carried by patent text while ignoring the relationships between patents. Naturally, researchers have considered combining patent citation networks with text mining methods to study technology evolution trends. Li et al. monitored and forecast the development trend of nanogenerators through citation analysis and used a Hierarchical Dirichlet Process topic model to extract technological topics [8].
Moreover, most existing studies focus on a single data source, such as patents or papers, and ignore the interaction between knowledge discovery, represented by grants or papers, and technology application, represented by patents, as well as the correlations and differences between the information in these sources.

Knowledge Graph and Representation Learning
With the advent of the information age, the explosive growth of multi-source heterogeneous data has brought significant challenges to data organization and application in the big data environment. A knowledge graph (KG), a structured knowledge base with strong semantic processing ability, provides a new approach to these problems. The KG concept originated in Google's next-generation intelligent semantic search engine technology. In essence, a KG is a semantic network that reveals the relationships between entities and can formally describe real-world things and their relationships. The term KG is now used to refer to all kinds of large-scale knowledge bases. Within a KG, data and knowledge are stored as triples, such as <s, p, o> or p(s, o), where s and o are nodes in the KG representing the subject and object entities, respectively, and p is an edge in the KG representing the relation from subject s to object o.
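As a concrete illustration of the <s, p, o> storage structure, the following Python sketch keeps triples in memory and answers pattern queries in which any position may be a wildcard. The `TripleStore` class and the entity names (`paper_1`, `journal_A`, etc.) are hypothetical; a production knowledge graph would more likely use a dedicated graph database or RDF store.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Optional


class TripleStore:
    """Minimal in-memory store of <s, p, o> triples (illustrative only)."""

    def __init__(self) -> None:
        self.triples: set[tuple[str, str, str]] = set()
        # Index (s, o) pairs by predicate so p(s, o) lookups avoid a full scan.
        self.by_predicate: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))
        self.by_predicate[p].add((s, o))

    def query(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> list[tuple[str, str, str]]:
        """Return matching triples in sorted order; None acts as a wildcard."""
        if p is not None:
            candidates = [(s_, p, o_) for s_, o_ in self.by_predicate.get(p, set())]
        else:
            candidates = list(self.triples)
        return sorted(t for t in candidates
                      if (s is None or t[0] == s) and (o is None or t[2] == o))


kg = TripleStore()
kg.add("paper_1", "published_in", "journal_A")
kg.add("paper_1", "written_by", "author_X")
kg.add("paper_2", "published_in", "journal_A")
print(kg.query(p="published_in", o="journal_A"))
# [('paper_1', 'published_in', 'journal_A'), ('paper_2', 'published_in', 'journal_A')]
```

Indexing by predicate reflects the p(s, o) reading of a triple: most queries in this paper's graphs fix the relation type (e.g., "published in") and vary the entities.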
At present, general-purpose knowledge graphs, such as Freebase, DBpedia, and Wikidata, have played an essential role in internet applications such as intelligent search, intelligent Q&A, and personalized recommendation. At the same time, they have been preliminarily applied in many areas such as finance, e-commerce, and medical treatment. Compared with general knowledge graphs, domain knowledge graphs draw on more knowledge sources, must scale up faster, have a more complex knowledge structure, demand higher knowledge quality, and support broader forms of application. In the field of nanogenerators, there is little literature applying knowledge graphs to analyze the relationships between various entities.
A knowledge graph is a structured knowledge base that stores entities' features and relationships, which calls for data mining methods that efficiently obtain specific knowledge from the vast knowledge base. In recent years, representation learning algorithms have developed rapidly. Their purpose is to learn latent, informative, low-dimensional representations of entities, which simplify the graph while retaining its structure, the entities' features, labels, and other auxiliary information. Socher et al. defined an evaluation function for each triple in the knowledge graph using a single-layer neural network and obtained the representation of each entity by maximizing this function [34]. Although such a nonlinear model can capture the semantic relationships between entities well, its computational cost is considerable. Inspired by the translation invariance observed in word vector spaces, Bordes et al. proposed the TransE model, which learns representations of entities in a vector space and regards each relation as a translation vector between the related entity pair to constrain the learning results [35]. TransE is simple, which reduces computational cost, and its performance improved significantly over previous models. Nevertheless, TransE still has many limitations, which has encouraged later researchers to propose many improved models. Wang et al. proposed the TransH model to improve the handling of complex relations [36]. Lin et al. further proposed the TransR model based on the belief that different relations should correspond to different semantic spaces [37]. TransR projects the entities in a triple into the vector space of the corresponding relation and then establishes the translation relationship between the projected entity vectors, as in TransE.
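The translation idea behind TransE can be sketched in a few lines: a triple (h, r, t) is scored by the distance between h + r and t, and training pushes true triples below corrupted ones by a margin. This is a minimal illustration assuming the L2 distance (TransE also admits L1) and hand-picked 2-D vectors; the function names are hypothetical and no training loop is shown.

```python
from __future__ import annotations

import math


def l2_distance(u: list[float], v: list[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def transe_score(h: list[float], r: list[float], t: list[float]) -> float:
    """TransE dissimilarity d(h + r, t); small values mean the triple is plausible."""
    return l2_distance([hi + ri for hi, ri in zip(h, r)], t)


def margin_loss(pos: tuple, neg: tuple, margin: float = 1.0) -> float:
    """Margin ranking loss max(0, margin + d(positive) - d(corrupted))."""
    return max(0.0, margin + transe_score(*pos) - transe_score(*neg))


# Toy 2-D embeddings: the relation vector r translates h exactly onto t.
h, r, t = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
t_corrupted = [-1.0, 0.0]  # a corrupted (negative) tail entity

print(transe_score(h, r, t))                        # 0.0
print(round(transe_score(h, r, t_corrupted), 3))    # 2.236 (= sqrt(5))
print(margin_loss((h, r, t), (h, r, t_corrupted)))  # 0.0: already separated by more than the margin
```

Because each relation is a single vector, the model has very few parameters per relation, which is the source of the low computational cost mentioned above.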
Building on TransR, the TransD model defines separate projection matrices for the head and tail entities and reduces the number of matrix parameters [38].
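To make the TransD projection concrete: in the original TransD formulation, each entity and relation carries an extra projection vector, and the mapping matrix for a head entity is M_rh = r_p h_p^T + I^{m×n} (and analogously for the tail). The sketch below, with hypothetical function names and toy vectors, applies this matrix without materializing it, which is where the parameter savings come from.

```python
from __future__ import annotations

import math


def transd_project(e: list[float], e_p: list[float], r_p: list[float]) -> list[float]:
    """Compute M_re @ e with M_re = r_p e_p^T + I^{m x n}, without forming the matrix.

    (r_p e_p^T) e = (e_p . e) r_p, and I^{m x n} e copies (or zero-pads) e to length m.
    """
    m = len(r_p)
    dot = sum(a * b for a, b in zip(e_p, e))
    base = (list(e) + [0.0] * m)[:m]
    return [b + dot * rp for b, rp in zip(base, r_p)]


def transd_score(h, h_p, r, r_p, t, t_p) -> float:
    """TransD dissimilarity ||h_perp + r - t_perp||_2 using the projected head and tail."""
    h_perp = transd_project(h, h_p, r_p)
    t_perp = transd_project(t, t_p, r_p)
    return math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(h_perp, r, t_perp)))


# Head and tail have their own projection vectors, so they get different matrices.
h, h_p = [1.0, 0.0], [1.0, 0.0]
t, t_p = [1.0, 2.0], [0.0, 0.0]
r, r_p = [0.0, 1.0], [0.0, 1.0]
print(transd_project(h, h_p, r_p))           # [1.0, 1.0]
print(transd_score(h, h_p, r, r_p, t, t_p))  # 0.0
```

Note that h + r != t here, yet the triple scores 0 after projection: the entity-specific matrices are what let TransD model relations that a single shared space cannot.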
TransE and its improved variants use only the relational data between entities in the knowledge graph for representation learning, leaving a large amount of descriptive information about the entities themselves unused. Meanwhile, graph neural networks (GNNs) have attracted the attention of researchers. A GNN is a deep learning model based on information propagation that can use both the structural information and the node information of a graph for representation. However, most classical GNN models, such as GCN [39], GAT [40], and GAE [41], only apply to graphs with a single type of entity and relationship. To address this, Cen et al. proposed the MEIRec model, which uses meta-path sampling to sample multiple subgraphs with unified structures to facilitate GNN representation learning [42]. Wang et al. proposed the HAN model, which calculates the adjacency matrices of different meta-paths and feeds them into the GAT model to learn the graph representation [43].

Data
This study attempts to analyze the knowledge flow between different data sources. First, using the query "nanogenerator* or nano-generator", we collected papers on nanogenerators from the Thomson Reuters Web of Science (WOS) database up to the end of December 2021. A total of 3304 publications were retrieved from the whole database, including each publication's title, citation information, abstract, time, author, institution, DOI, and journal name. Likewise, using the query "nanogenerator* OR nanometer generator", we collected patents on nanogenerators from the Derwent Innovation Index (DI) database up to the end of December 2021, retrieving 984 patents, including each patent's title, citation information, time, and institution. Finally, using the query "nanogenerator*", we collected nanogenerator grants from the grants database of the China Knowledge Centre for Engineering Science and Technology (CKCEST). A total of 169 grants were retrieved, including title, start date, keywords, abstract, and institution. The details of data acquisition are shown in Table 1.

Knowledge Graph of Different Time Slices
To make use of the semantic information in the multi-source data, we construct knowledge graphs that reflect the relationships between entities. Take paper data as an example. Based on the entities related to papers, such as authors, institutions, and journals, we construct a mapping r(s, o) to preserve the relationship between a paper and other entities, where s is the source of the relationship, o is its object, and r is its type. This yields several relationships, such as a paper published in a journal, p(p, j); a paper written by an author, w(p, a); and a paper owned by an institution, o(p, i). In addition, from the citation information of papers we obtain the relationship of a paper cited by another paper, c(p, p). For each type of relationship, we construct a matrix $M_{AB}$ to store the mapping, where A and B denote the entity types.
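The relation matrices $M_{AB}$ can be built directly from lists of (source, object) pairs. The following sketch is an assumption about representation (dense 0/1 lists and toy entity names); a real implementation over thousands of papers would more likely use sparse matrices.

```python
from __future__ import annotations


def relation_matrix(pairs: list[tuple[str, str]], rows: list[str], cols: list[str]) -> list[list[int]]:
    """Binary matrix M_AB with M[i][j] = 1 when entity rows[i] (type A) relates to cols[j] (type B)."""
    row_idx = {e: i for i, e in enumerate(rows)}
    col_idx = {e: j for j, e in enumerate(cols)}
    m = [[0] * len(cols) for _ in rows]
    for a, b in pairs:
        m[row_idx[a]][col_idx[b]] = 1
    return m


papers = ["p1", "p2", "p3"]
journals = ["j1", "j2"]
authors = ["a1", "a2"]
published_in = [("p1", "j1"), ("p2", "j1"), ("p3", "j2")]  # p(p, j)
written_by = [("p1", "a1"), ("p2", "a2"), ("p3", "a1")]     # w(p, a)

M_PJ = relation_matrix(published_in, papers, journals)
M_PA = relation_matrix(written_by, papers, authors)
print(M_PJ)  # [[1, 0], [1, 0], [0, 1]]
print(M_PA)  # [[1, 0], [0, 1], [1, 0]]
```

The same helper covers the citation relation c(p, p) by passing the paper list as both `rows` and `cols`.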
Thus far, we have obtained the relationships between entities from the semantic information contained in papers. Next, we extract features that reflect the similarities and differences between papers using a word vectorization method. Specifically, we vectorize the title of each paper with the doc2vec model (the resulting feature vector is denoted by $f_i$). After vectorization, papers with similar subject words in their titles have higher vector similarity, so the vectors preserve the feature information of papers. Knowledge graphs of patents and grants are constructed in the same way as those of papers. After

Heterogeneous Graph Attention Network for Representation Learning
In this paper, we use a Heterogeneous Graph Attention Network (HAN) to consider graph topology and text information at the same time [43]. HAN extends the Graph Attention Network (GAT) model [40], retaining GAT's attention mechanism while providing a solution for heterogeneous graph representation learning. The framework of HAN is shown in Figure 2.
First, a meta-path $\Phi$ is defined as a path of the form
$$E_1 \xrightarrow{R_1} E_2 \xrightarrow{R_2} \cdots \xrightarrow{R_n} E_{n+1},$$
which describes a composite relation $R = R_1 \circ R_2 \circ \cdots \circ R_n$ between entities $E_1$ and $E_{n+1}$, where $\circ$ denotes the composition operator on relations.
Based on this definition, we can extract relations between different papers, grants, or patents. For example, we can define the journal co-occurrence relation of papers by the meta-path $P_1 \xrightarrow{\text{published in}} J_1 \xrightarrow{\text{publishes}} P_2$ (abbreviated as PJP). The complete meta-paths of the different data sources are shown in Table 2.
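A meta-path's adjacency can be obtained by multiplying the relation matrices along the path: for PJP, the product of M_PJ and its transpose, then binarized. This mirrors how meta-path-based neighbors are commonly derived for HAN, but the helper names and toy matrices below are illustrative assumptions, not the authors' code.

```python
from __future__ import annotations


def matmul(a: list[list[int]], b: list[list[int]]) -> list[list[int]]:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: list[list[int]]) -> list[list[int]]:
    return [list(col) for col in zip(*m)]


def metapath_adjacency(*relations: list[list[int]]) -> list[list[int]]:
    """Chain relation matrices along a meta-path and binarize the product.

    Entry (i, j) of the product counts path instances between i and j; any
    positive count makes j a meta-path-based neighbor of i (self-links included).
    """
    product = relations[0]
    for m in relations[1:]:
        product = matmul(product, m)
    return [[1 if x > 0 else 0 for x in row] for row in product]


M_PJ = [[1, 0], [1, 0], [0, 1]]  # papers p1..p3 x journals j1, j2
M_PA = [[1, 0], [0, 1], [1, 0]]  # papers p1..p3 x authors a1, a2
A_PJP = metapath_adjacency(M_PJ, transpose(M_PJ))  # paper-journal-paper
A_PAP = metapath_adjacency(M_PA, transpose(M_PA))  # paper-author-paper
print(A_PJP)  # [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
print(A_PAP)  # [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
```

Longer meta-paths (e.g., paper-author-institution-author-paper) follow by passing more matrices to the same function.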

Next, based on the transformation matrices, for each type of entity (e.g., entities of type $\phi_i$), we conduct the information propagation process as follows:
$$f'_i = M_{\phi_i} \cdot f_i,$$
where $M_{\phi_i}$ is the transformation matrix for entities of type $\phi_i$, and $f_i$ and $f'_i$ are the original and projected features of node $i$, respectively. After that, self-attention is leveraged to learn the weights among different kinds of entities. Given an entity pair $(i, j)$ connected via meta-path $\phi$, a node-level attention coefficient $\alpha^{\phi}_{ij}$ is learned to show how important entity $j$ is for entity $i$:
$$\alpha^{\phi}_{ij} = \frac{\exp\left(\sigma\left(\mathbf{a}_{\phi}^{\top}\left[f'_i \,\Vert\, f'_j\right]\right)\right)}{\sum_{k \in \mathcal{N}^{\phi}_i} \exp\left(\sigma\left(\mathbf{a}_{\phi}^{\top}\left[f'_i \,\Vert\, f'_k\right]\right)\right)},$$
where $\mathbf{a}_{\phi}$ is the node-level attention vector for meta-path $\phi$, $\Vert$ denotes concatenation, $\sigma$ is an activation function, and $\mathcal{N}^{\phi}_i$ is the set of meta-path-based neighbors of entity $i$ (including $i$ itself). Then, the meta-path-based embedding of entity $i$ is aggregated from its neighbors' projected features with the corresponding coefficients:
$$z^{\phi}_i = \sigma\left(\sum_{j \in \mathcal{N}^{\phi}_i} \alpha^{\phi}_{ij} \cdot f'_j\right),$$
where $z^{\phi}_i$ is the learned embedding of entity $i$ for meta-path $\phi$. Given the meta-path set $\{\phi_1, \phi_2, \ldots, \phi_m\}$, after feeding the features into node-level attention, we obtain $m$ groups of semantic-specific node embeddings, denoted as $\{Z_{\phi_1}, Z_{\phi_2}, \ldots, Z_{\phi_m}\}$.
Generally, every node contains multiple types of semantic information, and a semantic-specific embedding reflects a node from only one aspect. To learn a more comprehensive node embedding, we need to fuse the multiple semantics revealed by the meta-paths. A semantic-level attention is therefore used to automatically learn the importance of different meta-paths and fuse them. The learned weights of the meta-paths are
$$(\beta_{\phi_1}, \beta_{\phi_2}, \ldots, \beta_{\phi_m}) = \mathrm{att}_{sem}\left(Z_{\phi_1}, Z_{\phi_2}, \ldots, Z_{\phi_m}\right).$$
With the learned weights as coefficients, we fuse these semantic-specific embeddings to obtain the final embedding $Z$:
$$Z = \sum_{p=1}^{m} \beta_{\phi_p} \cdot Z_{\phi_p}.$$
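The node-level and semantic-level attention steps can be sketched in plain Python for a single attention head. Assumptions: LeakyReLU (slope 0.2) inside the attention score and ELU as σ follow common GAT practice; the semantic importance w_ϕ = (1/|V|) Σ_i qᵀ tanh(W z^ϕ_i + b), normalized by softmax into β_ϕ, follows the original HAN paper rather than this document; all weights are untrained placeholders. HAN with K = 8 heads would run eight such heads and concatenate their outputs.

```python
from __future__ import annotations

import math


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(xs: list[float]) -> list[float]:
    top = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def elu(x: float) -> float:
    return x if x > 0 else math.exp(x) - 1.0


def node_level_attention(feats, adj, m_phi, a_phi):
    """Single-head node-level attention for one meta-path; returns Z_phi (one row per node)."""
    proj = [matvec(m_phi, f) for f in feats]  # f'_i = M_phi f_i
    z_phi = []
    for i, row in enumerate(adj):
        nbrs = [j for j, linked in enumerate(row) if linked]  # assumes self-links, so never empty
        # e_ij = LeakyReLU(a_phi^T [f'_i || f'_j]); list "+" concatenates the two vectors.
        scores = [leaky_relu(sum(a * x for a, x in zip(a_phi, proj[i] + proj[j]))) for j in nbrs]
        alpha = softmax(scores)  # alpha_ij over the meta-path-based neighbors of i
        agg = [sum(w * proj[j][d] for w, j in zip(alpha, nbrs)) for d in range(len(proj[i]))]
        z_phi.append([elu(x) for x in agg])  # z_i = sigma(sum_j alpha_ij f'_j)
    return z_phi


def semantic_fusion(z_list, w, b, q):
    """Semantic-level attention: w_phi = mean_i q^T tanh(W z_i + b), beta = softmax(w), Z = sum beta Z_phi."""
    importance = []
    for z_phi in z_list:
        per_node = [sum(qk * math.tanh(hk + bk) for qk, hk, bk in zip(q, matvec(w, zi), b))
                    for zi in z_phi]
        importance.append(sum(per_node) / len(per_node))
    beta = softmax(importance)
    n_nodes, dim = len(z_list[0]), len(z_list[0][0])
    fused = [[sum(beta[p] * z_list[p][i][k] for p in range(len(z_list))) for k in range(dim)]
             for i in range(n_nodes)]
    return beta, fused


feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy doc2vec-style features for p1..p3
identity = [[1.0, 0.0], [0.0, 1.0]]            # placeholder M_phi and W (untrained)
a_phi = [0.5, -0.5, 0.5, 0.5]                   # attention vector over [f'_i || f'_j]
A_PJP = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
A_PAP = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
z_pjp = node_level_attention(feats, A_PJP, identity, a_phi)
z_pap = node_level_attention(feats, A_PAP, identity, a_phi)
beta, Z = semantic_fusion([z_pjp, z_pap], w=identity, b=[0.0, 0.0], q=[1.0, 1.0])
print(beta, Z)
```

Each row of `Z` is a convex combination of the corresponding rows of the per-meta-path embeddings, weighted by how informative each meta-path looks on average across all nodes.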

K-Means for Clustering and LDA for Topic Extracting
K-means is an unsupervised clustering algorithm that, for a given sample set $D = \{x_1, x_2, \ldots, x_n\}$, identifies clusters $C = \{C_1, C_2, \ldots, C_k\}$ by minimizing the squared error:
$$E = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert_2^2,$$
where $\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x$ is the mean vector of cluster $C_i$, and $k$ is the number of clusters.
In this paper, the final embedding of entities was used as the input of the K-means model for clustering. Then, we can obtain k clusters, which represent research sub-fields.
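The squared-error objective above is usually minimized with Lloyd's algorithm, which alternates between assigning each sample to its nearest mean and recomputing the means. Below is a minimal pure-Python sketch under the assumption of random initialization from the data (library implementations such as scikit-learn's `KMeans` default to k-means++ initialization instead); the toy 2-D points stand in for HAN embeddings, and the function names are hypothetical.

```python
from __future__ import annotations

import random


def sq_dist(u: list[float], v: list[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points: list[list[float]], k: int, max_iter: int = 100, seed: int = 0):
    """Lloyd's algorithm: alternate assignment and mean updates to reduce E = sum ||x - mu_i||^2."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # initial means: k distinct samples
    labels = None
    for _ in range(max_iter):
        # Assignment step: each x joins the cluster whose mean vector is nearest.
        new_labels = [min(range(k), key=lambda c: sq_dist(x, centers[c])) for x in points]
        if new_labels == labels:
            break  # converged: assignments (and therefore means) no longer change
        labels = new_labels
        # Update step: mu_i = (1 / |C_i|) * sum of x in C_i; an empty cluster keeps its old mean.
        for c in range(k):
            members = [x for x, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sq_dist(x, centers[lab]) for x, lab in zip(points, labels))
    return labels, centers, sse


embeddings = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
labels, centers, sse = kmeans(embeddings, k=2)
print(labels, round(sse, 4))  # two groups of three points; the SSE of this split is 8/3
```

Lloyd's algorithm only reaches a local minimum of E, which is one reason practical pipelines run it with several seeds or a smarter initialization.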
To clarify what each cluster means, we used the Latent Dirichlet Allocation (LDA) topic model to extract topic words for each cluster. LDA is an unsupervised method that infers the latent topic distribution of each document and the word distribution of each topic. It can represent each cluster by several important topics, each containing several keywords.
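For readers unfamiliar with how LDA infers topics, the sketch below implements collapsed Gibbs sampling, one standard inference method (the paper does not state which LDA implementation it used). The hyperparameters `alpha` and `beta`, the tokenized titles, and the function names are illustrative assumptions; with `num_topics=1`, matching the paper's one topic per cluster, the topic-word distribution reduces to smoothed word frequencies.

```python
from __future__ import annotations

import random


def lda_gibbs(docs: list[list[str]], num_topics: int, alpha: float = 0.1,
              beta: float = 0.01, iters: int = 200, seed: int = 0):
    """Collapsed Gibbs sampling for LDA; returns the vocabulary and topic-word distributions phi."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    n_vocab, k_topics = len(vocab), num_topics
    n_dk = [[0] * k_topics for _ in docs]            # topic counts per document
    n_kw = [[0] * n_vocab for _ in range(k_topics)]  # word counts per topic
    n_k = [0] * k_topics                             # total tokens per topic
    z = []                                           # current topic of every token
    for d, doc in enumerate(docs):
        z.append([rng.randrange(k_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            n_dk[d][k] += 1
            n_kw[k][w_id[w]] += 1
            n_k[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # p(z = t | rest) is proportional to (n_dt + alpha)(n_tv + beta) / (n_t + V * beta).
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + n_vocab * beta)
                           for t in range(k_topics)]
                k = rng.choices(range(k_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    phi = [[(n_kw[k][v] + beta) / (n_k[k] + n_vocab * beta) for v in range(n_vocab)]
           for k in range(k_topics)]
    return vocab, phi


def top_words(vocab: list[str], phi_k: list[float], n: int = 10) -> list[str]:
    """The n highest-probability words of one topic (ties keep alphabetical order)."""
    return [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -phi_k[v])[:n]]


# Hypothetical tokenized titles from one cluster.
cluster_docs = [
    ["triboelectric", "wearable", "sensor"],
    ["wearable", "textile", "sensor"],
    ["self", "powered", "wearable", "sensor"],
]
vocab, phi = lda_gibbs(cluster_docs, num_topics=1)
print(top_words(vocab, phi[0], n=3))  # ['sensor', 'wearable', 'powered']
```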

Clusters Association for Evolutionary Path Identification
The cluster vector is calculated as the mean of the embedding vectors of the entities in the cluster. By calculating the similarity between cluster vectors in different time slices or different data sources, we connect the clusters with the highest similarity to form technology evolution paths, in which each cluster's topics reflect a specific technology. In this paper, the reciprocal of the Euclidean distance is used to measure the similarity between clusters.
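The cluster-vector and similarity computations described above amount to an element-wise mean followed by a reciprocal Euclidean distance. A short sketch with hypothetical function names; returning infinity for identical vectors is an assumption, since the text does not say how a zero distance is handled.

```python
from __future__ import annotations

import math


def cluster_vector(embeddings: list[list[float]]) -> list[float]:
    """Cluster vector = element-wise mean of its entities' embedding vectors."""
    return [sum(col) / len(embeddings) for col in zip(*embeddings)]


def cluster_similarity(u: list[float], v: list[float]) -> float:
    """Reciprocal Euclidean distance; identical vectors get infinite similarity."""
    d = math.dist(u, v)
    return math.inf if d == 0 else 1.0 / d


vec_a = cluster_vector([[0.0, 0.0], [2.0, 0.0]])  # [1.0, 0.0]
vec_b = cluster_vector([[4.0, 3.0], [4.0, 5.0]])  # [4.0, 4.0]
print(cluster_similarity(vec_a, vec_b))  # distance 5.0, so similarity 0.2
```

Because the reciprocal is monotonically decreasing, "highest similarity" and "smallest distance" select the same cluster pairs; the choice only matters when similarities are compared against a fixed threshold.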

Representation Learning and Clustering
Following the method proposed in Section 3, we identified and described the technology evolution pathways. The multi-source data were used to construct knowledge graphs for the different data sources and time slices. Based on these knowledge graphs, we extracted the transformation matrix $A \in \mathbb{R}^{n \times n}$ for each meta-path and the feature matrix $X \in \mathbb{R}^{n \times m}$ from the doc2vec model, where $n$ is the number of grant, paper, or patent entities in the knowledge graph, which can be found in Table 2, and $m$ is the dimension of the doc2vec output vectors.
Then, the transformation matrix $A$ and the feature matrix $X$ were fed into the HAN model to learn the representation vectors of entities. In this paper, we set the learning rate to 0.005, the dimension of the semantic-level attention vector to 128, the number of attention heads K to 8, the attention dropout to 0.6, and the number of training epochs to 200.
After obtaining 64-dimensional embedding vectors from the trained model, we clustered them with the K-means model. To select the number of clusters, we repeated the clustering with the number of clusters ranging from 2 to 10 and chose the number that gave the maximum silhouette coefficient.
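The silhouette-based choice of k can be sketched as follows. The silhouette coefficient of a sample compares its mean distance to the rest of its own cluster (a) with its mean distance to the nearest other cluster (b), and the best k maximizes the average over all samples. This pure-Python version assigns 0 to singleton clusters (a common convention, also used by scikit-learn) and takes precomputed labelings, which in the described workflow would come from K-means runs for k = 2 to 10; the labelings and function names below are hypothetical.

```python
from __future__ import annotations

import math


def silhouette(points: list[list[float]], labels: list[int]) -> float:
    """Mean silhouette coefficient; s(i) = (b - a) / max(a, b), and singleton clusters score 0."""
    clusters: dict[int, list[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, x in enumerate(points):
        own = [j for j in clusters[labels[i]] if j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(math.dist(x, points[j]) for j in own) / len(own)  # mean intra-cluster distance
        b = min(sum(math.dist(x, points[j]) for j in members) / len(members)  # nearest other cluster
                for lab, members in clusters.items() if lab != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def choose_k(points: list[list[float]], labelings: dict[int, list[int]]) -> int:
    """Return the cluster count whose labeling has the largest mean silhouette coefficient."""
    return max(labelings, key=lambda k: silhouette(points, labelings[k]))


embeddings = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
# Hypothetical K-means labelings for two candidate k values (the paper scans k = 2..10).
labelings = {2: [0, 0, 0, 1, 1, 1], 3: [0, 0, 1, 2, 2, 2]}
print(choose_k(embeddings, labelings))  # 2
```

The function requires at least two clusters per labeling, which the scanned range of 2 to 10 guarantees.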
After clustering, we extracted keywords for each cluster with the LDA topic model, using the text information in the cluster. We report one topic with ten keywords for each cluster; the details can be found in Tables 3-5. All experimental procedures were implemented in Python 3 on the PyCharm platform.
From Tables 3-5, we can summarize the technology topics of different time slices. In 2006-2012, the main topics were the PENG structure and PENG-based sensors. In 2013-2017, flexible sensors and wearable devices were the mainstream nanogenerator applications, while TENGs began to appear and gradually replace PENGs. In 2018-2021, wearable devices remained a research hotspot, while novel energy sources and the performance improvement of nanogenerators, such as output voltage, became key research questions.

Technology Evolution Pathways
Following the K-means step, we calculated the vector distances between clusters in different time slices and connected the clusters with the minimum distance, provided that this distance was smaller than a threshold (set to 2), to form the technology evolution pathways. The results are shown in Figure 3, in which the evolution pathways were generated automatically from the similarities of the preceding clustering results by a custom Python program. The dots in Figure 3 indicate clusters, which group similar grants, papers, and patents. The lines between dots indicate high similarity between clusters, which represents knowledge flow and indicates the technology evolution pathways.
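The linking rule can be expressed as a small function. The text leaves open in which direction the nearest-neighbour search runs, so this sketch assumes that each cluster in the later time slice (or downstream layer) is linked to its single nearest cluster in the earlier one when their distance is below the threshold of 2; the cluster identifiers and vectors are hypothetical.

```python
from __future__ import annotations

import math


def link_clusters(earlier: dict[str, list[float]], later: dict[str, list[float]],
                  threshold: float = 2.0) -> list[tuple[str, str, float]]:
    """Link each later cluster to its nearest earlier cluster if their distance is below threshold.

    Clusters whose nearest predecessor is too far stay unlinked (candidate new technologies).
    """
    edges = []
    for c_id, c_vec in later.items():
        p_id, dist = min(((pid, math.dist(pvec, c_vec)) for pid, pvec in earlier.items()),
                         key=lambda pair: pair[1])
        if dist < threshold:
            edges.append((p_id, c_id, dist))
    return edges


# Hypothetical cluster vectors for two consecutive time slices of the paper layer.
papers_2013_2017 = {"paper_13-17_c0": [0.0, 0.0], "paper_13-17_c1": [5.0, 5.0]}
papers_2018_2021 = {"paper_18-21_c0": [0.5, 0.5], "paper_18-21_c1": [9.0, 9.0]}
for edge in link_clusters(papers_2013_2017, papers_2018_2021):
    print(edge)  # only c0 -> c0 is linked; c1 of 2018-2021 stays isolated
```

The same function can link layers instead of time slices (grant clusters to paper clusters, paper clusters to patent clusters); an unlinked later cluster corresponds to the isolated short pathways discussed in the results.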
From Figure 3, we can analyze the knowledge flow pathways between data sources. First, knowledge flowed faster from grants to papers than from papers to patents: technologies proposed in grants can be found in papers in the same time slice, but appear in patents only in a later time slice. One possible explanation is that deepening a study is easier than putting theory into application.
Then, we find several knowledge flows that successfully move from research to application. The most typical case is wearable devices with nanogenerator sensors. Apart from these obvious evolution pathways, we find several isolated short pathways in 2018-2021, which represent technologies with strong innovativeness. Specifically, cluster 3 of papers in 2018-2021 contains the keywords "mechanical", "wave", "wind", "water", "vibration", "energy", and "harvesting", and cluster 4 of patents in 2018-2021 contains the keywords "water" and "energy". These keywords indicate that novel energy sources such as wind, water, and mechanical vibration have become new research directions and hotspots. Cluster 0 of papers in 2018-2021 represents the fiber structure of nanogenerators, which indicates an innovation direction for nanogenerator structures.

Conclusions
This paper proposed a novel framework to monitor the evolutionary pathways of nanogenerator technology based on multi-source data and a knowledge graph. In the framework, the knowledge graph makes full use of text information, and the multi-source data capture the evolutionary pathways from different data perspectives. Additionally, we show that the novel framework is efficient and accurate.
We find that the evolution process and knowledge flow from grants to papers is faster than that from papers to patents, which suggests that deepening a study is easier than applying theory in practice. We also monitor the complete evolution pathways of piezoelectric nanogenerators, wearable devices, and nanogenerator performance improvement technologies. While analyzing the evolution pathways, we also identify several emerging research directions for nanogenerators, such as novel energy sources and fiber-structured nanogenerators.
However, owing to the limited numbers of grants, papers, and patents in the nanogenerator field, we could not unleash the full advantages of the knowledge graph and representation learning. Moreover, identifying cluster topics still requires expert knowledge and human intervention. In future research, we will attempt to collect more data and use machine learning methods to classify cluster topics automatically.