A Network Analysis Model for Selecting Sustainable Technology

Abstract: Most companies develop technologies to improve their competitiveness in the marketplace. Typically, they then patent these technologies around the world in order to protect their intellectual property. Other companies may use patented technologies to develop new products, but must pay royalties to the patent holders or owners. Should they fail to do so, this can result in legal disputes in the form of patent infringement actions between companies. To avoid such situations, companies attempt to research and develop necessary technologies before their competitors do so. An important part of this process is analyzing existing patent documents in order to identify emerging technologies. In such analyses, extracting sustainable technology from patent data is important, because sustainable technology drives technological competition among companies and, thus, the development of new technologies. In addition, selecting sustainable technologies makes it possible to plan their R&D (research and development) efficiently. In this study, we propose a network model that can be used to select the sustainable technology from patent documents, based on the centrality and degree measures of social network analysis. To verify the performance of the proposed model, we carry out a case study using actual patent data from patent databases.


Introduction
Technology development is important for many companies, and is often based on previous technologies. As such, the competitive power of a company is dependent on technology [1]. Most R&D (research and development) results for a technology are published and registered in papers, articles, or patents [2]. Of these, patents provide exclusive rights to a developed technology, and thus, developers apply for their technologies to be patented around the world [3]. The exclusive rights offered by a registered patent mean that patent management is an important issue within the Management of Technology (MOT) field. Companies need to avoid patent infringements when planning their R&D, or they could face patent lawsuits or be required to pay legal costs for the infringement. It is possible to use a patented technology owned by someone else, but to do so, the company must either buy the patent or enter into a contract, such as a cross-licensing agreement. Thus, we need a method for determining the technologies required in a given technological field. Here, we propose a sustainable technology selection model to find the necessary technologies for a target domain. Our model is based on Social Network Analysis (SNA), which is used for technology forecasting in diverse MOT fields [4][5][6][7]. Forecasting models exist for finding future technologies in a target domain [8,9]. Based on previous works [10,11], we build a model to select sustainable technologies for MOT areas such as R&D planning or new product development. A technology is represented by a vertex in an SNA graph, and edges between the vertices represent technological relations. The representative or central node of an SNA graph can be regarded as the sustainable technology in the technology network, because it is related to all the other technologies. Lastly, to illustrate how our study can be applied to a real-world problem, we perform a case study using patent documents retrieved from actual patent databases around the world.

Graph Theory and Network Model
Graph theory is used throughout computer science and data science. A graph is a data structure representing relationships between connected objects [12], and can model many real-world networks. A graph G is composed of a set of vertices V and a set of edges E; thus, G = (V, E). Vertices represent objects with diverse characteristics, while edges denote relationships among those objects. There are two types of graph structures, namely undirected and directed graphs. In an undirected graph, an edge connects two objects without any direction, so the relationship holds both ways, while in a directed graph, both the connection and its direction are specified. In this study, we compute the degrees of objects from a graph structure in order to select a representative and central vertex. Then, we use this vertex to select the sustainable technology. Our network model is an SNA model and is based on a graph structure. SNA is an analytical method used to understand social structures among individuals and objects using a network model and graph theory [13][14][15]. The model analyzes the network of nodes (vertices) and their connecting lines (edges) in a graph data structure. SNA has been applied in diverse areas, including media, medicine, marketing, education, and politics [16]. An SNA graph can be visualized as a technological network in which nodes represent sub-technologies and links between nodes denote connections between sub-technologies. Previous studies have used International Patent Classification (IPC) codes or keywords as the nodes of their SNA network models [6,7]. Figure 1 shows an SNA graph based on keywords. From the SNA graph in Figure 2, we know that three of the IPC-code technologies are closely related, whereas the technology of IPC Code 2 is isolated from the other three. In practice, we would conclude that the technology of IPC Code 2 has been developed independently of the other three codes. In addition to keywords and IPC codes, we can build SNA technology graphs using other items such as inventors, U.S. patent class (UPC) codes, patent numbers, and so on. In this study, we use IPC codes to denote technological items (i.e., vertices) in the SNA graph. Then, we identify the sustainable technologies from the technological SNA graphs.
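A minimal sketch of how such an IPC-code graph could be assembled is given here. It assumes that two IPC codes are linked whenever they are assigned to the same patent; the text does not spell out the linking rule, so this co-occurrence rule, the function name `build_ipc_graph`, and the toy patent records are all hypothetical illustrations rather than the procedure used in the study.

```python
from collections import defaultdict
from itertools import combinations


def build_ipc_graph(patents):
    """Build an undirected graph {code: set of neighbouring codes}.

    Assumption: two IPC codes share an edge if they appear together
    in at least one patent (a co-occurrence rule, chosen for illustration).
    """
    graph = defaultdict(set)
    for codes in patents:
        unique = sorted(set(codes))
        for code in unique:
            graph[code]  # register the vertex even if it stays isolated
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


# Hypothetical toy records: each inner list holds the IPC codes of one patent.
patents = [
    ["IPC1", "IPC3"],
    ["IPC1", "IPC4"],
    ["IPC3", "IPC4"],
    ["IPC2"],
]

graph = build_ipc_graph(patents)
# The degree of a vertex is the number of codes it is linked to.
degree = {code: len(neighbours) for code, neighbours in graph.items()}
print(degree)
```

In this toy data three codes are mutually linked and one code has no links at all, which is the kind of pattern read off an SNA graph when one technology has been developed independently of the others.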

Technology Analysis and Sustainable Technology
A technology analysis examines technological data containing researched and developed technologies within the MOT field. There are diverse types of technological data, such as patents, papers, articles, news items, and so on. Of these, patent documents contain the most information on developed technologies, because they protect the inventor's right to the technology for a limited period [3]. A patent includes information such as the inventor/owner, title, abstract, application and issuance dates, IPC code, family patents, citation information, and figures. Each of these elements is important in terms of determining the developed technology. The IPC is a hierarchical system for patent classification based on technological areas [17]. It consists of eight sections, with about 70,000 subdivisions based on hierarchical technology classes [17]. Therefore, to use the IPC codes extracted from patent data effectively, it is necessary to establish the technological relationships between the codes. IPC code data are used in patent analyses for technology forecasting [6,7,18,19,20]. In this study, we use IPC codes to choose the sustainable technology of a target technology domain. With regard to MOT, a sustainable technology is defined as a technology that is necessary for the technological sustainability of companies and nations. Figure 3 shows an MOT process that determines a sustainable technology. Determining the sustainable technology is a core part of the MOT process, which runs from choosing a target technology domain through to building an R&D policy. Here, we use SNA measures and graphs to accomplish this task. The next section describes our proposed methodology for selecting sustainable technologies.

Patent Citation Network
A patent that is cited by other patents is considered to contain important information about the developed technology [21]. Accordingly, patent citations have been used to analyze the significance of a patent for technological innovation or R&D strategy [22]. To understand patent citation results efficiently, we can visualize the patents. The citation network is one of several methods for patent visualization [23]. Patent citation network models have been studied using graph theory, with patents as nodes and citations as the links between them [24]. They have been used in diverse fields of technology analysis to improve the technological competitiveness of a company [21,22,24,25]. To construct a network model for technology analysis, a patent citation network uses the citation relations between patents. In contrast, in this paper we use social network mining and statistics to build the technological network model.

Network Model for Selecting Sustainable Technology
In this study, we propose a network analysis model with which to extract sustainable technologies for IP mining. Our model is based on SNA measures and visualizations. SNA is one of many network models based on graph theory. A graph is a data structure found in computer science [12] and data science [26]. Our graph structure is defined as $G = (V, E)$ with $V = \{IPC_1, IPC_2, \ldots, IPC_n\}$, where n is the number of IPC codes, and IPC codes are used as the elements of vertices. The proposed network model has a maximum of $n(n-1)/2$ possible edges, from $(IPC_1, IPC_2)$ to $(IPC_{n-1}, IPC_n)$. We select sustainable technologies from the network structure based on vertices and edges. For example, Figure 4 shows a graph structure that includes a sustainable technology node. Nodes A, B, C, and D represent the sub-technologies held by a company, and node S is our sustainable technology. Note that S is located at the center of the nodes in Figure 4a, and S is related to all other nodes. Thus, the technology of S affects the development of the technologies of A, B, C, and D. Of course, the technologies of A, B, C, and D can also influence each other's development. Therefore, we determine node S to be the sustainable technology. The graph structure in Figure 4b shows another case of sustainable technology. In this case, the sub-technologies are dependent only on the technology of S. Therefore, based on the graph structures in Figure 4, a company would develop its technology with node S as the core technology. Figure 5 shows the sustainable technology management proposed in this study. We can select the sustainable technology from the technology network structure. Using the sustainable technology, we develop new products or services and plan the R&D strategies for emerging and future technologies. This will improve the technological competitiveness of a company. In addition, the development of new and innovative technologies is based on R&D planning that uses the sustainable technology. We can select core technologies from the new and innovative technologies. These core technologies become candidates for the sustainable technology of a company.
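As a small worked instance of the bound of n(n-1)/2 possible edges among n vertices, consider the five-node layout of Figure 4 (S together with A, B, C, and D) and the 10 IPC codes used later in the case study. Treating the networks as undirected is an assumption made here for the arithmetic.

```latex
\[
\left.\frac{n(n-1)}{2}\right|_{n=5} = \frac{5 \cdot 4}{2} = 10,
\qquad
\left.\frac{n(n-1)}{2}\right|_{n=10} = \frac{10 \cdot 9}{2} = 45 .
\]
```

Ignoring edge direction, the star-shaped case of Figure 4b contains only 4 of the 10 possible links, all of them attached to S, so S has degree n - 1 = 4, the largest degree a vertex can have in a five-node network.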
Sustainable technologies can be used for diverse MOT works, such as intellectual property (IP) R&D or new product development [6]. Based on the knowledge of the sustainable technology in a given field, we can manage the technologies of the corresponding fields. In addition, we can effectively develop new products by first determining the sustainable technology. Our model for choosing sustainable technology was constructed using social network mining [20], which is based on SNA. An SNA graph is composed of two components, namely vertices (or nodes) and edges (or connections), and we can explain the relationships between vertices by analyzing their edges [26]. Here, we consider a vertex and an edge as a technology and a technological connection, respectively. We use the degree, closeness centrality, betweenness centrality, graph centrality, shortest distance, and egocentric measure as evaluation criteria when choosing the sustainable technology. For n vertices, the closeness centrality (CC) of vertex i is defined in terms of sdist(i,j), the shortest distance between vertices i and j [4,26,27]. We select the vertex with the largest CC value as the sustainable technology. The betweenness centrality (BC) is defined in terms of sdist(i,k,j), the shortest distance from i to j through k [4,26,27]. As in the CC case, we select the vertex with the largest BC value as the sustainable technology. The graph centrality (GC) of vertex i is defined as in [4,26,27]; here too, we select the vertex with the largest GC value as the sustainable technology. We also use a degree measure to select the sustainable technology (see Figure 6). Figure 6 shows an SNA graph with four vertices and four edges. The degree of T1 is 3, T2 and T4 have degree 2, and T3 has degree 1. Therefore, T1 is determined to be the most important vertex in the SNA graph. Similarly, we can select the sustainable technology (vertex) with the largest degree value. We also consider an egocentric network and neighborhood measures when selecting sustainable technologies. The egocentric network of vertex i is defined as the subgraph (SG) induced by the union of i and the neighborhood of i [26]. The egocentric network shows the incoming, outgoing, and combined specifications in the neighborhood structure of SG. This is a useful tool for evaluating local structural associations in social networks. We select the IPC codes (vertices) with larger egocentric values as sustainable technologies. Figure 7 shows the proposed process for choosing sustainable technology, as well as its applications. Our model contributes to the R&D planning and technology management of a company. In this way, we propose a methodology for selecting sustainable technologies for R&D planning in MOT. In the next section, we present a case study to illustrate how this research can be applied to a real-world scenario.
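For reference, one common way of writing these measures is sketched here. The forms follow the conventions usually given in the SNA literature and in the documentation of the R "sna" package used in the case study; we assume, rather than know, that they coincide with the definitions cited above. The symbols $g_{jk}$, $g_{jik}$, $N(i)$, and $G[\cdot]$ are notation introduced only for this sketch.

```latex
\begin{align*}
CC(i) &= \frac{n-1}{\sum_{j \neq i} sdist(i,j)}, \\
BC(i) &= \sum_{j \neq k,\; j \neq i,\; k \neq i} \frac{g_{jik}}{g_{jk}}, \\
GC(i) &= \frac{1}{\max_{j \neq i} sdist(i,j)}, \\
SG(i) &= G\big[\{i\} \cup N(i)\big].
\end{align*}
```

Here $g_{jk}$ counts the shortest paths between j and k, $g_{jik}$ counts those that pass through i, $N(i)$ is the set of neighbors of i, and $G[\cdot]$ denotes the induced subgraph. Under these forms, a vertex that is near every other vertex has a large CC and GC, and a vertex that lies on many shortest paths has a large BC, which matches the rule of selecting the vertex with the largest value.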

Experimental Result
We used patent documents applied for by the Ford Motor Company to illustrate the practical application of our research. We extracted the IPC codes from the company's patent data [28][29][30]. The hierarchical structure [17] of the IPC codes is shown in Figure 8, using the IPC code F02M as an example. The section level represents the overall body of the technological domain and consists of eight sections, from A (human necessities) to H (electricity). The class level shows more detailed technologies than does the section level. In this study, we used the subclass level of the IPC code because, in previous studies, this level provided better results than did other levels [6,7,18]. In Figure 9, we show the 49 IPC codes with frequencies above 100. These are the IPC codes we use in this experiment. In our case study, we used the "sna" and "igraph" packages, as well as the statistical functions provided by the R-Project [26][27][28][29]. R is a free language that provides software for statistical computing. Since R is also an object-oriented programming language, it has been used in many studies on statistical analysis and visualization. Most jobs in our experiments were performed using R packages and functions. When R is first installed, it includes only the basic functions for statistical computing. Thus, to add more functions for advanced statistical analyses, additional packages must be installed. In this study, we installed the "sna" and "igraph" packages for SNA and social network mining. The most frequent IPC code in Ford's patents is F16H, which represents a "gearing" technology. We can obtain the technological definition of each IPC code from the World Intellectual Property Organization (WIPO) [17]. Therefore, we know that the gearing technology is a basic vehicle technology of Ford.
The second most common IPC code is F02M. This code is related to a technology for "supplying combustion engines in general with combustible mixtures or constituents thereof". These two are followed by B60G, B62D, and B60R, which describe the technologies of "vehicle suspension arrangements", "motor vehicles; trailers", and "vehicles, vehicle fittings, as well as other vehicle parts, not otherwise provided for", respectively. Figure 10 shows the frequency distribution of patents according to the number of included IPC codes. We observe that most patents include fewer than 10 IPC codes. In particular, the most frequent number of IPC codes included in a single patent is two. Thus, most technologies of Ford are related to a few sub-technologies with fewer than four IPC codes. Based on the previous two figures, we set the number of IPC codes to analyze at 10. Therefore, we selected the following 10 IPC codes for our case study: F16H, F02M, B60G, B62D, B60R, F16D, F02D, B60K, F02B, and H01M.
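The selection of frequent subclass-level IPC codes can be sketched as follows. This is an illustration only, not the R code used in the study: it assumes that the subclass is the first four characters of a full IPC symbol and that a subclass is counted at most once per patent. The function names `ipc_subclass` and `top_subclasses` and the toy records are hypothetical.

```python
from collections import Counter


def ipc_subclass(symbol):
    """Return the subclass part of an IPC symbol, e.g. 'F02M 25/07' -> 'F02M'.

    Assumption: section (1 letter) + class (2 digits) + subclass (1 letter)
    occupy the first four characters.
    """
    return symbol.strip().upper()[:4]


def top_subclasses(patents, k, min_count=1):
    """Rank subclasses by the number of patents that include them."""
    counts = Counter()
    for symbols in patents:
        # A set is used so that a subclass is counted once per patent.
        counts.update({ipc_subclass(s) for s in symbols})
    # Sort by descending frequency, then alphabetically to break ties.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(code, n) for code, n in ranked if n >= min_count][:k]


# Hypothetical toy records; each inner list is the IPC symbols of one patent.
patents = [
    ["F16H 61/02", "F16H 59/14"],
    ["F02M 25/07", "F02D 41/00"],
    ["F16H 3/44", "B60K 17/04"],
    ["F02M 35/10"],
    ["F16H 57/04", "F02M 37/00"],
]

print(top_subclasses(patents, k=2))
print(top_subclasses(patents, k=10, min_count=2))
```

The `min_count` argument plays the role of a frequency threshold, and `k` plays the role of the number of top-ranked codes kept for the network analysis.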
Figure 11 summarizes the numbers of patents and IPC codes, by year. First, the number of applied patents was larger in the 1970s and 1980s, but has decreased more recently. Second, the trend in the number of IPC codes is similar to that of the number of patents. Therefore, we know that Ford's technological development was most active in the 1970s and 1980s, and that its patented R&D output has decreased since then. To understand the technologies of Ford, we performed an SNA using the top 10 IPC codes (see Figure 12). We can see that F02M is connected with most of the IPC codes. Therefore, the F02M technology can be considered a sustainable technology of Ford. Next, F16H is the second candidate IPC code for a sustainable technology because of its large number of connections, as shown in Figure 12. On the other hand, B60K is the least likely candidate IPC code for a sustainable technology, because it is connected to only one IPC code, namely F16H. The top 10 IPC codes based on their degree are shown in Table 1. We can see that the vertex of F02M is connected to the eight vertices of B60R, B60G, F02D, F16D, F02B, F16H, B62D, and H01M. Next, F16H has degree seven, and B62D has degree six. The IPC codes B60G and F16D have the same degree value. Figure 13 shows SNA graphs by partial neighborhood order from 1 to 4 using the top 10 IPC codes. This figure explains the rough relationship structure of the SNA graph. Note that the individual connections between IPC codes are not meaningful in a neighborhood-based SNA graph. We find that all IPC codes are connected within their neighborhoods at order 1. This connectivity is largely retained at order 2, but most IPC codes are separated when the order is 3 or greater. Therefore, most IPC codes are connected at low orders. In other words, the major IPC codes (technologies) of Ford are associated in a similar way. To identify the SNA network structure, we show another neighborhood-based SNA graph in Figure 14. In contrast to the partial neighborhood SNA graph, this graph shows the cumulative structure of IPC code connections by neighborhood order. When the cumulative neighborhood order is 3, most IPC codes are fully connected to each other. From the results of Figures 13 and 14, we find that the technologies developed by Ford are closely connected. To perform a more advanced analysis in order to select sustainable technologies, we compute diverse SNA measures. Table 2 shows the results of the centrality and distance of the 10 extracted IPC codes. We can see that F02M has the largest closeness centrality. Therefore, F02M is determined to be the best candidate for a sustainable technology, based on its closeness centrality. The second best IPC code candidate is F16H, also based on its closeness centrality. In terms of betweenness centrality, F16H is the best candidate for a sustainable technology, whereas the second best IPC code candidate is F02M. Next, based on the graph centrality, F02M and F16H have the largest values, and are therefore considered as candidates for a sustainable technology. Finally, we compute the shortest distances for the 10 IPC codes. Here, F02M has the smallest distance, followed by F16H. Based on these results, we conclude that F02M is a sustainable technology. Here, F02M represents the technology for "supplying combustion engines in general with combustible mixtures or constituents thereof". Therefore, we could base our R&D planning or technology management on this sustainable technology selection.
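The kind of measures reported in Table 2 can be illustrated on a much smaller graph. The sketch below is not the R code used in the case study; it is a plain-Python illustration on the four-vertex, four-edge graph described for Figure 6 (T1 linked to T2, T3, and T4, plus a link between T2 and T4). It assumes that closeness is (n - 1) divided by the sum of shortest distances, that graph centrality is the reciprocal of the largest shortest distance, and that betweenness counts each unordered pair of endpoints once. The function names `bfs` and `centralities` are hypothetical.

```python
from collections import deque


def bfs(graph, source):
    """Return (dist, paths): hop distances and shortest-path counts from source."""
    dist = {source: 0}
    paths = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                paths[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                paths[v] += paths[u]  # every shortest path to u extends to v
    return dist, paths


def centralities(graph):
    """Degree, closeness, graph and betweenness centrality per vertex.

    Assumes an undirected, connected graph with at least two vertices.
    """
    nodes = sorted(graph)
    n = len(nodes)
    info = {v: bfs(graph, v) for v in nodes}
    result = {}
    for v in nodes:
        dist_v, paths_v = info[v]
        others = [dist_v[u] for u in nodes if u != v]
        between = 0.0
        for i, s in enumerate(nodes):
            for t in nodes[i + 1:]:
                if v in (s, t):
                    continue
                dist_s, paths_s = info[s]
                # v lies on a shortest s-t path when the distances add up.
                if dist_s[v] + dist_v[t] == dist_s[t]:
                    between += paths_s[v] * paths_v[t] / paths_s[t]
        result[v] = {
            "degree": len(graph[v]),
            "closeness": (n - 1) / sum(others),
            "graph": 1 / max(others),
            "betweenness": between,
        }
    return result


# The four-vertex graph described for Figure 6.
graph = {
    "T1": {"T2", "T3", "T4"},
    "T2": {"T1", "T4"},
    "T3": {"T1"},
    "T4": {"T1", "T2"},
}

scores = centralities(graph)
for vertex, values in scores.items():
    print(vertex, values)
```

Under these assumptions T1 has the largest value on every measure, so it would be the vertex selected, in the same way that the code with the largest centrality values is selected from the top 10 IPC codes.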

Discussion
In traditional studies, selecting the sustainable technology and determining its implications are the roles of domain experts. These experts select the sustainable technology using a Delphi survey, drawing on their experience and knowledge. This qualitative approach can be subjective, so its results vary between experts. To overcome this problem, we proposed a quantitative approach to selecting the sustainable technology. Here, we applied graph theory and statistics to derive our network analysis model, which does not depend on experts' subjective knowledge. In this way, our model is both novel and objective.
The results of this research will have implications for sustainable development in diverse areas. Furthermore, the results will contribute to developing new products and services and building R&D strategies for sustainable development. The model is quantitative in nature, but we expect that combining quantitative and qualitative approaches could create a synergistic effect with regard to determining sustainable technology. This remains a topic for future research.

Conclusions
In this study, we proposed a model for choosing sustainable technology based on social network analysis and its measures. The proposed model includes a visualization of the network and uses measures to evaluate the possibility of an IPC code being a sustainable technology. We also constructed an SNA graph in a case study and computed the closeness centrality, betweenness centrality, graph centrality, shortest distance, and egocentric measure. In the case study, we used the IPC code to represent technology, because such codes represent a hierarchical classification of developed technologies. This research contributes to the technological information of a company or nation in terms of R&D planning or technology management. Our results are based on finding sustainable technologies using the top-ranked IPC codes and SNA. However, more analytical results are needed to find sustainable technologies more accurately and objectively. This is a limitation of our research. In future work, we will develop more advanced methods of extracting the sustainable technology using diverse statistics and machine learning algorithms.

Figure 1. SNA (Social Network Analysis) graph based on keywords.

Figure 3. MOT (management of technology) process for finding sustainable technology.

Figure 4. Sustainable technology in a graph structure: A, B, C, and D are defined as sub-technologies, and S is defined as the sustainable technology, in (a) undirected and (b) directed networks.
Input: Retrieved patent data related to the target technology.
Output: Extracted sustainable technology.
Step 1: Selection of IPC codes
  (1.1) Extract all IPC codes from retrieved patent data
  (1.2) Select IPC codes with a frequency greater than the threshold value
Step 2: Descriptive statistics of patent data
  (2.1) Frequency distribution of patents by IPC codes
  (2.2) Yearly trend of the numbers of applied patents
  (2.3) Yearly trend of the numbers of IPC codes included in applied patents
Step 3: Social network mining
  (3.1) Visualize technology networking using social network graph
  (3.2) Count the degree of top-ranked IPC codes
  (3.3) Calculate the closeness centrality of top-ranked IPC codes
  (3.4) Calculate the betweenness centrality of top-ranked IPC codes
  (3.5) Calculate the graph centrality of top-ranked IPC codes
  (3.6) Calculate the shortest distance between top-ranked IPC codes
Step 4: Determine the sustainable technologies based on the results of Step 3
Step 5: Apply to practical domain
  (5.1) R&D planning
  (5.2) Technology management

Figure 7. Proposed process for choosing sustainable technology.

Figure 10. Frequency distribution of patents according to the number of included IPC codes.

Figure 11. Numbers of patents and IPC codes by year.

Figure 13. SNA graph by partial neighborhood order.

Figure 14. SNA graph by cumulative neighborhood order.

Table 1. Top 10 IPC codes by degree.

Table 2. Centrality and distance of the top 10 IPC codes.