Sustainable Technology Analysis of Artificial Intelligence Using Bayesian and Social Network Models

Abstract: Recent developments in artificial intelligence (AI) have led to a significant increase in the use of AI technologies. Many experts are researching and developing AI technologies in their respective fields, often submitting papers and patent applications as a result. In particular, owing to the characteristics of the patent system that is used to protect the exclusive rights to registered technology, patent documents contain detailed information on the developed technology. Therefore, in this study, we propose a statistical method for analyzing patent data on AI technology to improve our understanding of sustainable technology in the field of AI. We collect patent documents that are related to AI technology, and then analyze the patent data to identify sustainable AI technology. In our analysis, we develop a statistical method that combines social network analysis and Bayesian modeling. Based on the results of the proposed method, we provide a technological structure that can be applied to understand the sustainability of AI technology. To show how the proposed method can be applied to a practical problem, we apply the technological structure to a case study in order to analyze sustainable AI technology.


Introduction
Companies face a constant battle to survive in a competitive market environment. An important aspect of remaining competitive is the technology that a company possesses, so companies conduct research to develop innovative technologies that enhance their market competitiveness. Research and development (R&D) planning is therefore one of the main issues in business management. In addition, sustainability is a key point in the management of technology [1], and studies related to sustainable technology have been published in diverse domains [2][3][4]. Sustainable technology is essential for the continuous development of a technology in a given field [2]. Park et al. (2015) studied a network analysis model that can be used to select sustainable technology in a given technological field [4]. A forecasting methodology for sustainable technology management in the area of defense has also been proposed [3]. Choi et al. (2016) used patent document data for sustainable technology analyses [2]. They considered patents related to a given technology domain because a patent contains diverse and complete results about the developed technology. Statistical analyses of patent data have been used as an effective method for technology analyses in various fields [5][6][7]. In this study, we propose a statistical patent analysis method for sustainable technology analyses. Here, we consider two approaches. First, we apply social network analysis (SNA) to construct our methodology. Second, we perform Bayesian regression modeling to identify technological relationships for sustainable technology analyses. We then demonstrate our methodology by conducting a case study on artificial intelligence (AI) technology.
The remainder of the paper is structured as follows. Section 2 introduces Bayesian inference. Section 3 proposes a sustainable technology analysis method using a Bayesian regression model. Section 4 presents a case study that illustrates the practical application of our method. Finally, Section 5 presents our conclusions and suggestions for future research.

Bayesian Inference
Bayesian inference is based on Bayes' theorem, which represents the conditional relations between random variables [8]. In frequentist statistics, the model parameters are treated as fixed and are estimated using a maximum likelihood estimator (MLE) [9]. Because the MLE is computed from the observed data alone, a frequentist deals only with the uncertainty in the current data. Bayesian inference, on the other hand, considers probability distributions for the model parameters in addition to the evidence in the given data. Thus, Bayesian inference adds beliefs about the model parameters to the evidence provided by the data. In real-world problems, these beliefs are derived from the knowledge of domain experts. Therefore, using Bayesian inference, we can select the best model based on both the observed data and the knowledge of domain experts. The following structure of conditional probabilities defines Bayes' rule [10]:
$$f(X \mid Y) = \frac{f(Y \mid X)\, f(X)}{f(Y)}$$
Here, $X$ and $Y$ are random variables that represent technological keywords in a technology analysis. $f(X)$ represents the beliefs about the random variable $X$ that do not depend on the data; that is, it represents the domain knowledge of experts in a given technology field. In addition, $f(Y \mid X)$ denotes the evidence that is provided by the data, given the expert knowledge. The product of $f(Y \mid X)$ and $f(X)$, normalized by $f(Y)$, gives $f(X \mid Y)$, which represents the updated beliefs of the domain experts. Therefore, Bayesian inference continuously updates the beliefs about a given technology using observed technology data and the prior beliefs of domain experts.
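As a minimal sketch of the updating described in this section, the following Python snippet applies Bayes' rule to a discrete variable. The function name `bayes_update`, the two states of X, and all probability values are hypothetical and chosen only for illustration; they are not taken from the patent data.

```python
def bayes_update(prior, likelihood):
    """Return the posterior f(X | Y=y) from a prior f(X) and a likelihood f(Y=y | X).

    Both arguments are dicts keyed by the possible values of X.
    """
    unnormalized = {x: likelihood[x] * prior[x] for x in prior}
    evidence = sum(unnormalized.values())  # f(Y=y), the normalizing constant
    if evidence == 0:
        raise ValueError("the observation has zero probability under the prior")
    return {x: value / evidence for x, value in unnormalized.items()}


# Hypothetical expert belief about whether keyword X is "core" or "peripheral".
prior = {"core": 0.5, "peripheral": 0.5}
# Hypothetical probability of observing keyword Y in a patent for each state of X.
likelihood = {"core": 0.8, "peripheral": 0.2}

posterior = bayes_update(prior, likelihood)
# A second observation of the same kind uses the first posterior as the new prior.
posterior_2 = bayes_update(posterior, likelihood)
```

Feeding the posterior back in as the next prior is what makes the update continuous: each new batch of evidence shifts the belief further in the direction the data support.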

Sustainable Technology Analysis Model in Artificial Intelligence
Existing methodologies for technology analysis rely on two approaches: qualitative and quantitative methods. The qualitative method is based on the subjective knowledge of domain experts, gathered via a Delphi survey, and the quantitative method analyzes patent data using statistical models. The latter is relatively objective compared with the former, but it does not reflect the technological judgments of the expert group. To carry out a sustainable technology analysis efficiently and effectively, we need to combine the two approaches. Therefore, to improve the performance of sustainable technology analysis, we combine the evidence that is provided by the observed data with the domain knowledge of experts. Here, we consider patent documents as the observed data, and the prior and posterior distributions as the domain knowledge of experts. We first consider Bayesian inference modeling based on the prior and posterior distributions. From the collected patent documents, we extract the International Patent Classification (IPC) codes and construct a patent-IPC code matrix, whose rows are patents and whose columns are IPC codes. We denote the IPC code vector as $x_1, x_2, \ldots, x_p$, where $x_p$ represents the $p$th IPC code. Then, the joint probability distribution of the $p$ random variables (IPC codes) is defined as follows:
$$f(x_1, x_2, \ldots, x_p \mid \theta, M)$$
where the parameters $\theta$ are distributed on the model $M$. This distribution represents the observed data and is called the likelihood. In addition, we consider the prior distribution of the experts' domain knowledge. This is defined as $f(\theta \mid M)$, and represents the beliefs of domain experts without the evidence from the observed data, $x_1, x_2, \ldots, x_p$. Using Bayes' theorem, we obtain the posterior distribution as follows:
$$f(\theta \mid x_1, x_2, \ldots, x_p, M) = \frac{f(x_1, x_2, \ldots, x_p \mid \theta, M)\, f(\theta \mid M)}{f(x_1, x_2, \ldots, x_p \mid M)}$$
In this study, the posterior represents the updated beliefs of domain experts after observing the data, $x_1, x_2, \ldots, x_p$.
Thus, we obtain the final updated beliefs of the domain knowledge using the following procedure:
$$f(\theta_n \mid x^n_1, x^n_2, \ldots, x^n_p, M) \propto f(x^n_1, x^n_2, \ldots, x^n_p \mid \theta_{n-1}, M)\, f(\theta_{n-1} \mid M)$$
where $x^n_1, x^n_2, \ldots, x^n_p$ are the technological IPC codes extracted from patent documents (observed data), $f(\theta_{n-1} \mid M)$ is the updated belief at time step $n-1$, and $f(\theta_n \mid x^n_1, x^n_2, \ldots, x^n_p, M)$ is the updated belief after learning $x^n_1, x^n_2, \ldots, x^n_p$ and considering $f(\theta_{n-1} \mid M)$ at time step $n$. Through such iterative learning, we calculate the final belief based on Bayesian inference. We extract meaningful relationships between the $p$ IPC codes using Bayesian inference and learning. Let $C_i$ and $C_o$ be two IPC codes, such that we learn the technological relationship between the two codes from the observed data and the prior beliefs. When $C_i$ and $C_o$ are the input and output variables, respectively, in the prediction model, we seek the model $C_o = f(C_i)$ that minimizes a loss function [11]:
$$\min_f \; L\big(C_o, f(C_i)\big)$$
where $C_i$ and $C_o$ denote the frequencies with which the IPC codes appear in the patent documents. This loss function is the one used for the regression problem. To infer the response IPC code $C_o$, we use $C_{i1}, C_{i2}, \ldots, C_{ik}$ as input IPC codes. The Bayesian regression model is as follows:
$$C_o = \beta_0 + \beta_1 C_{i1} + \beta_2 C_{i2} + \cdots + \beta_k C_{ik} + \mu$$
where $\mu$ follows a normal distribution with mean 0 and variance $\sigma^2$, $\mu \sim N(0, \sigma^2)$. The likelihood function for the observed data is defined as follows:
$$C_o \mid C, \beta, \sigma^2 \sim N\big(\beta_0 + \beta_1 C_{i1} + \cdots + \beta_k C_{ik},\; \sigma^2\big)$$
That is, the probability distribution of the response IPC code is Gaussian, centered on the linear combination of the input IPC codes and their regression parameters. To obtain the final predictive model, we need a prior distribution. The prior beliefs of domain experts are represented by a prior distribution $f(\beta, \sigma^2)$ on the regression parameters [12]. Using the likelihood and prior functions, we obtain the posterior distribution as follows:
$$f(\beta, \sigma^2 \mid C_o, C) \propto f(C_o \mid C, \beta, \sigma^2)\, f(\beta, \sigma^2)$$
where $C = (C_{i1}, C_{i2}, \ldots, C_{ik})$ and $\beta = (\beta_0, \beta_1, \ldots, \beta_k)$. The predictive distribution of the response IPC code is then found by averaging the likelihood over this posterior.
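To make the regression step concrete, the following sketch computes the posterior mean of the coefficients for the linear model described above. It makes simplifying assumptions that are not stated in the text: each coefficient has an independent zero-mean Gaussian prior with variance `prior_var`, and the noise variance `noise_var` is treated as known. Under those assumptions the posterior mean solves $(X^\top X/\sigma^2 + I/\tau^2)\,\beta = X^\top y/\sigma^2$. The function names and the toy data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def posterior_mean(inputs, response, noise_var=1.0, prior_var=10.0):
    """Posterior mean of (beta_0, ..., beta_k) under a N(0, prior_var) prior on each coefficient.

    Assumes the noise variance is known; this is a simplification for illustration.
    """
    design = [[1.0] + list(row) for row in inputs]  # prepend the intercept column
    k = len(design[0])
    # Posterior precision: X^T X / noise_var + I / prior_var
    precision = [[sum(r[i] * r[j] for r in design) / noise_var
                  + (1.0 / prior_var if i == j else 0.0)
                  for j in range(k)] for i in range(k)]
    # Right-hand side: X^T y / noise_var
    rhs = [sum(r[i] * y for r, y in zip(design, response)) / noise_var
           for i in range(k)]
    return solve(precision, rhs)


# Toy data generated from C_o = 1 + 2 * C_i1 (hypothetical frequencies).
inputs = [[0.0], [1.0], [2.0], [3.0]]
response = [1.0, 3.0, 5.0, 7.0]

weak_prior = posterior_mean(inputs, response, prior_var=1e9)     # close to least squares
strong_prior = posterior_mean(inputs, response, prior_var=0.01)  # shrunk toward zero
```

With a very diffuse prior the estimate is driven almost entirely by the data, while a tight prior around zero pulls the coefficients toward the prior belief. This is the trade-off between expert knowledge and observed evidence that the model is meant to express.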
In addition, we use the probability value (p-value) to check the statistical significance of the effect of an input IPC code $C_i$ on the output IPC code $C_o$ [9]. The null hypothesis ($H_0$) is that the input IPC code does not affect the output IPC code. When the p-value from the hypothesis test is less than 0.05 (95% confidence level), we conclude that the input code influences the output code significantly.
In our research, we also use visualization based on social network analysis (SNA) as a sustainable technology analysis method. SNA is based on graph theory, in which a graph $G(\text{Node}, \text{Edge})$ consists of nodes and edges [13,14]. $G$ represents a graph data structure that expresses the relationships between connected objects. Each object is a node in the graph structure, and the connections between the objects are represented by edges. In this study, the nodes are the IPC codes and the edges are the connections between IPC codes. To build the SNA visualization, we need an adjacency matrix, and we construct it from the correlation matrix between IPC codes shown in Table 1.
Table 1. Correlation matrix between International Patent Classification (IPC) codes for adjacency matrix.
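One way a correlation matrix can serve as an adjacency matrix is sketched below. The text does not say how correlations are converted into edges, so the cut-off `threshold`, the use of positive correlations only, and the choice of degree centrality are all assumptions made for illustration. The code labels A to D and their frequency columns are hypothetical.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / sqrt(var_u * var_v)


def adjacency_from_counts(counts, threshold=0.6):
    """Connect two codes when their correlation reaches the (assumed) threshold.

    counts maps each code to its column of per-patent frequencies.
    """
    codes = sorted(counts)
    return {a: {b: int(a != b and pearson(counts[a], counts[b]) >= threshold)
                for b in codes} for a in codes}


def degree_centrality(adjacency):
    """Share of the other nodes that each node is directly connected to."""
    n = len(adjacency)
    return {code: sum(row.values()) / (n - 1) for code, row in adjacency.items()}


# Hypothetical frequency columns for four codes over five patents.
counts = {
    "A": [1, 0, 2, 1, 0],
    "B": [1, 0, 2, 0, 0],
    "C": [0, 1, 0, 0, 1],
    "D": [2, 0, 1, 1, 0],
}
adjacency = adjacency_from_counts(counts)
centrality = degree_centrality(adjacency)
```

In this toy example, code A is connected to both B and D and therefore has the largest centrality, while C is isolated. Codes at the center of the graph play the same role here as the central IPC codes in the SNA visualization.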
Using the correlation matrix in Table 1, we perform the SNA visualization of the IPC code data. Combining the results of the Bayesian inference and the SNA visualization, we carry out the sustainable technology analysis as follows:
Step 1: Collect patent documents related to the target technology
(1-1) Search for patents in world patent databases using the target technology as a keyword
(1-2) Filter valid patents that represent the target technology
Step 2: Preprocess the patent document data
(2-1) Transform patent documents into structured data using text mining
(2-2) Extract IPC codes from the structured patent data
Step 3: Perform SNA visualization
(3-1) Select top-ranked IPC codes for SNA
(3-2) Visualize top-ranked IPC codes by centrality of SNA
Step 4: Analyze IPC codes using Bayesian inference
(4-1) Use the IPC codes with the largest centrality as the response variables in the Bayesian regression
(4-2) Find technological relationships between IPC codes using the Bayesian regression results
Step 5: Build a hierarchical structure for the sustainable technology
(5-1) Choose statistically significant IPC codes using the p-values in the Bayesian regression models
(5-2) Construct the technological structure for sustainable technology related to the target technology
Through these five steps, we provide a hierarchical structure for sustainable technology in the AI domain. This result can contribute to the R&D planning of companies or nations seeking to improve their sustainability. For example, an AI company can use the results of the sustainable technology analysis proposed in this paper to research and develop the technologies it needs, from basic technologies to applied technologies related to AI. Through this process, the company can develop its own sustainable technologies and improve its competitiveness in the market.
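Steps (2-2) and (3-1) above can be sketched as follows. The snippet assumes that each patent record has already been reduced to a list of full IPC symbols and that codes are compared at the four-character subclass level (for example G06F). The helper names and the three sample records are hypothetical.

```python
from collections import Counter


def ipc_subclass(full_code):
    """Reduce a full IPC symbol such as 'G06F 17/30' to its four-character subclass."""
    return full_code.strip().upper()[:4]


def patent_ipc_matrix(patents):
    """Return (codes, matrix): one row per patent, one column per IPC subclass.

    Each cell holds how often the subclass occurs in that patent.
    """
    rows = [Counter(ipc_subclass(c) for c in ipc_list) for ipc_list in patents]
    codes = sorted(set().union(*rows))
    return codes, [[row[c] for c in codes] for row in rows]


def top_codes(codes, matrix, n):
    """Rank codes by the number of patents that contain them and keep the first n."""
    freq = {c: sum(1 for row in matrix if row[j] > 0) for j, c in enumerate(codes)}
    return sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Hypothetical patent records, each listing the IPC symbols assigned to one patent.
patents = [
    ["G06F 17/30", "G06N 3/08"],
    ["G06F 3/01", "G06F 17/30", "H04N 5/232"],
    ["G06N 3/04"],
]
codes, matrix = patent_ipc_matrix(patents)
top = top_codes(codes, matrix, 2)
```

The columns of the resulting matrix are the frequency vectors used in the later steps, and the ranking corresponds to choosing the top-ranked codes before the SNA visualization.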

Case Study of Artificial Intelligence
We conduct a case study using patent data related to AI technology to illustrate how the proposed method can be applied to find sustainable AI technology. We used the R language to analyze the retrieved patent documents [15]. The text mining functions provided in the 'tm' package were used to extract the IPC codes from the patent documents [16]. We also used the Bayesian analysis functions from the 'arm' package to carry out the Bayesian analysis of the patent data [17], and the graphic functions from the 'sna' package to produce the SNA visualization [18]. First, we retrieved the relevant patent documents from the patent databases of the United States Patent and Trademark Office and the WIPS Corporation [19,20]. We searched for patent documents on AI using the 10 keyword search formulae shown in Appendix A. Through the valid patent extraction process, a total of 13,858 patents were selected for the period 1995 to 2016. In addition, we extracted a total of 366 IPC codes from the valid patent documents. All IPC codes that are related to AI technology are shown in Appendix B. Of the 366 IPC codes, we perform the sustainable technology analysis using the top 20 codes. Table 2 shows the top 20 IPC codes of patents related to AI technology. The top-ranked IPC code is G06F, with a frequency of 7529. This means that 7529 patents related to AI technology depend on the technology of G06F. Table 3 provides the top 20 IPC codes and their representative technologies from the World Intellectual Property Organization (WIPO) [21,22].

H04R
Loudspeakers; microphones; gramophone pick-ups or like acoustic electromechanical transducers; deaf-aid sets; public address systems

G06N
Computer systems based on specific computational models

G01S
Radio direction-finding; radio navigation; determining distance or velocity by use of radio waves; locating or presence-detecting by use of the reflection or re-radiation of radio waves; analogous arrangements using other waves

H04L
Transmission of digital information, e.g., telegraphic communication

G06Q
Data processing systems or methods, specially adapted for administrative, commercial, financial, managerial, supervisory or forecasting purposes; systems or methods specially adapted for administrative, commercial, financial, managerial, supervisory or forecasting purposes, not otherwise provided for

According to the technological descriptions from the WIPO, the technologies defined by the top 20 IPC codes are very diverse. In the remainder of our case study, we use these IPC codes for our sustainable technology analysis in the AI field. We first perform an SNA and build the SNA graph shown in Figure 1. This figure shows the relationships between the top 20 IPC codes using a centrality measure of SNA. In Figure 1, we find that the IPC codes G06F and G06T are located at the center. This means that the technologies based on G06F and G06T are very popular and important for developing AI technology. That is, we conclude that these technologies lead the sustainability of AI technology development.
Therefore, the rest of this paper identifies the other IPC codes that affect these two IPC codes and the statistical relationships between them. Next, the IPC codes G06K, H04N, and G10L are found to play a central role. On the other hand, the IPC codes G11B and G08B are relatively isolated from the other IPC codes. We therefore perform Bayesian inference using G06F and G06T as response (output) variables, and the remaining IPC codes as explanatory (input) variables. The first Bayesian regression model is determined as follows:
$$\mathrm{G06F} = f(\mathrm{G06K}, \mathrm{H04N}, \mathrm{G10L}, \mathrm{G06T}, \mathrm{A61B}, \mathrm{H04M}, \mathrm{G01N}, \mathrm{H04R}, \mathrm{G06N}, \mathrm{G01S}, \mathrm{H04L}, \mathrm{G06Q}, \mathrm{H04B}, \mathrm{H04W}, \mathrm{G09G}, \mathrm{G02B}, \mathrm{G11B}, \mathrm{G08B}, \mathrm{G01B}) + \mu$$
Using Gaussian prior and likelihood functions, we fitted the Bayesian regression model on the response variable G06F; the results are given in Table 4. The statistically significant IPC codes at the 95% confidence level are G06K, H04N, G10L, G06T, A61B, G01N, H04R, G06N, G01S, H04L, G06Q, H04B, H04W, G09G, G02B, G08B, and G01B. In addition, we find that the IPC codes H04M and G11B are not related to G06F, because their p-values are greater than 0.05. The second Bayesian regression model is built on the response IPC code G06T, as follows:
$$\mathrm{G06T} = f(\mathrm{G06F}, \mathrm{G06K}, \mathrm{H04N}, \mathrm{G10L}, \mathrm{A61B}, \mathrm{H04M}, \mathrm{G01N}, \mathrm{H04R}, \mathrm{G06N}, \mathrm{G01S}, \mathrm{H04L}, \mathrm{G06Q}, \mathrm{H04B}, \mathrm{H04W}, \mathrm{G09G}, \mathrm{G02B}, \mathrm{G11B}, \mathrm{G08B}, \mathrm{G01B}) + \mu$$
As in the G06F case, we fit the Bayesian regression model on the response variable G06T using Gaussian prior and likelihood functions; the results are shown in Table 5.
As in the cases of G06F and G06T, the final Bayesian regression analysis is performed using Gaussian prior and likelihood functions; the results are shown in Table 6. We find that the IPC codes H04N, G10L, A61B, H04R, G06N, H04L, G09G, and G02B have a simultaneous technological influence on the IPC codes G06F and G06T. In this case study, we carry out the Bayesian regression analysis based on the result of the SNA visualization. Combining the analysis results, we create a hierarchy of technologies for sustainable AI technology, shown in Figure 2.
In Figure 2, the technologies based on the eight IPC codes H04N, G10L, A61B, H04R, G06N, H04L, G09G, and G02B affect the technologies based on the two IPC codes G06F and G06T. The technologies based on these eight IPC codes are "image or speech analysis and processing", "computational models for computer systems", "communication of digital information", and "diverse devices". These become the underlying technologies for AI in terms of sustainability, and they influence the "data processing" technologies based on G06F and G06T. Therefore, the most important technology for sustainable AI technology is data processing. That is, we find that data is the core factor for sustainable AI technology.
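The selection behind such a hierarchy can be sketched as follows: compute a p-value for each input code in each regression, keep the codes that are significant for every response code, and place them beneath the response codes. The two-sided p-value below uses a normal approximation to the coefficient estimate divided by its standard error, which is an assumption and not necessarily the test used in the case study. The response names R1 and R2, the input names X1 to X3, and all estimates are hypothetical.

```python
from math import erfc, sqrt


def two_sided_p(estimate, std_error):
    """Two-sided p-value for H0: coefficient = 0, using a normal approximation."""
    z = estimate / std_error
    return erfc(abs(z) / sqrt(2))


def significant_codes(fit, alpha=0.05):
    """fit maps each input code to (estimate, standard error)."""
    return {code for code, (est, se) in fit.items() if two_sided_p(est, se) < alpha}


def hierarchy(fits, alpha=0.05):
    """Keep the input codes that are significant for every response code."""
    common = set.intersection(*(significant_codes(f, alpha) for f in fits.values()))
    return {"upper": sorted(fits), "lower": sorted(common)}


# Hypothetical regression summaries for two response codes.
fits = {
    "R1": {"X1": (0.50, 0.10), "X2": (0.02, 0.10), "X3": (-0.40, 0.10)},
    "R2": {"X1": (0.30, 0.10), "X2": (0.35, 0.10), "X3": (0.05, 0.10)},
}
structure = hierarchy(fits)
```

Here only X1 is significant in both regressions, so it is the only code placed in the lower layer. Codes that matter for just one response are left out, mirroring the requirement of a simultaneous influence on both central codes.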

Discussion
In general, finding a sustainable technology area is a difficult task, but we need to know the sustainable areas in a target technology field. Sustainability in technology involves many issues related to improving and continuing the technology in society, because society needs new technology and technology changes society. In this paper, we studied a new method for finding sustainable technology from the results of Bayesian modeling and SNA visualization using patent data. In sustainable technology management, the final decisions and technological actions based on the hierarchical structure of sustainable technology, as in Figure 2, are the role of domain experts (AI experts in our case study). These decisions and actions may therefore not be stable, because they depend on the subjective knowledge of domain experts. In the future, it will be necessary to study objective analyses and processes covering all steps, from patent analysis to the final decision, in order to achieve stable decisions and actions in sustainable technology management.

Conclusions
We proposed a methodology for finding sustainable AI technology using a hierarchical technology structure. We combined SNA visualization and Bayesian modeling to carry out the sustainable technology analysis. In addition, we used patent documents that were related to AI technology to build the technological structure for AI sustainability. The IPC codes that were extracted from the searched patent documents were used for our methodology. Through the SNA visualization, we selected the IPC codes with large centrality. These were used as response variables in the Bayesian regression model in our case study. Using the result of the Bayesian regression modeling, we built the sustainable technology structure for AI technology. This study examined a sustainable technology analysis. We found the hierarchical structure of AI technology for AI sustainability. In addition, we performed a case study to illustrate how the proposed method can be applied to a real-world problem. Our study contributes to the R&D planning of companies or nations needing to improve their technological sustainability.
Our research focuses on finding sustainable technology areas within specific technology fields. In this paper, the technological field is AI technology, and the sustainable technologies related to AI that we extracted are shown in Figure 2. However, we used only patent documents on AI as the analysis target for the sustainable technology analysis. In future studies, we plan to conduct sustainable technology analyses using more diverse data sources in addition to patent data. We will also conduct further research on more advanced modeling, such as deep learning, for the methodology of sustainable technology analysis.

Author Contributions: Juhwan Kim designed this study and collected the data for the experiment. Sangsung Park and Dongsik Jang preprocessed the data and selected valid patents. Sunghae Jun analyzed the data to show the validity of the study, wrote the paper, and performed all the research steps. In addition, all authors cooperated with each other to revise the paper.

Conflicts of Interest:
The authors declare no conflict of interest.