Patent Data Analysis of Artificial Intelligence Using Bayesian Interval Estimation

Abstract: Technology analysis is one of the important tasks in technology and industrial management. Much information about technology is contained in patent documents, so patent data analysis is required for technology analysis. Existing patent analyses have relied on quantitative analysis of the collected patent documents. However, in technology analysis, the prior knowledge of experts should also be considered. In this paper, we study a patent analysis method using Bayesian inference, which considers the prior experience of experts and the likelihood function of patent data at the same time. For keyword data analysis, we use Bayesian predictive interval estimation with count data distributions such as the Poisson. Using the proposed models, we forecast the future trends of technological keywords of artificial intelligence (AI) in order to understand the future technology of AI. We perform a case study to show how the proposed method can be applied to real areas. We retrieve the patent documents related to AI technology and analyze them to find the technological trend of AI. From the results of the AI technology case study, we can find which technological keywords are more important or critical in the entire structure of the AI industry. Existing methods for patent keyword analysis depended only on the patent documents collected at present. But in technology analysis, the prior knowledge of domain experts is as important as the collected patent documents. So, we propose a method based on Bayesian inference for technology analysis using patent documents. Our method combines patent data analysis with the prior knowledge of domain experts.


Introduction
Much research related to patent data analysis has been performed in various fields of management of technology (MOT) [1][2][3][4][5]. These studies analyzed patent documents for technology transfer, new product development, technology forecasting, etc. A patent contains detailed information on the technology developed, including the title and abstract of the invention, claims, dates of filing and registration, citation information, technological classification codes, names of inventor and applicant, and so forth [6]. Because of this characteristic of patents, many researchers have carried out technology analysis in their fields using patent document data [7]. In addition, they proposed diverse methods for technology analysis using patent data. One study proposed a statistical method based on functional count data modeling for patent data analysis [8]. Its authors extracted technological keywords from the patent documents applied for by Apple and analyzed them with functional count data models based on the Poisson, negative binomial, and hurdle Poisson distributions [8]. Using the result of their patent analysis, they found the technological structure of Apple for understanding the innovation and evolution of Apple's technologies [8]. Uhm et al. (2017) also proposed a statistical inference for the

Patent Analysis and Technology Forecasting
Patent systems around the world encourage the research and development of technology by protecting the inventors' exclusive rights to registered patents for a certain period of time [7]. So, a patent contains various and detailed information on the developed technology [6]. We can get valuable knowledge for technology management from the results of patent analysis. Patent analysis is the analysis of the data contained in patent documents, such as the title of the invention, abstract, claims, citations, international patent classification (IPC) codes, and registration dates. Kim et al. (2019) studied a method for patent keyword analysis using time series and copula models [15]. They extracted the technological keywords from the searched patent documents and analyzed them for technology forecasting [15]. Kim et al. (2018) analyzed the IPC codes extracted from the patent documents related to AI [16]. They also used Bayesian regression and social network analysis for the analysis of the patent IPC codes [16].
Technology forecasting is the task of forecasting the future states of a target technology [9]. Uhm et al. (2017) proposed an interval estimation for patent keyword analysis [9]. According to the change in the intervals of patent keywords, they found the technological structure of the target technology [9]. That study focused only on the collected patent data. That is, the authors did not consider the experience and knowledge of domain experts. But in most technology analysis based on patent documents, the prior knowledge of domain experts is important to understand and analyze the technology. To solve this problem of previous research, we propose a new method for patent keyword analysis using Bayesian prediction interval estimation.

Bayesian Prediction Interval Estimation for Patent Data Analysis
The frequentist estimates the model parameters using only the observed data. The Bayesian, in contrast, adds a prior belief to the observed data when estimating the model parameters [17]. The main difference between the frequentist and the Bayesian is the use of this prior belief in statistical inference such as estimation, testing, modeling, etc. [10]. Because patent data analysis is an analysis of technology, more explanatory results can be expected when the expert's knowledge is reflected in the data analysis. So, we consider the Bayesian approach for patent data analysis. The Bayesian approach is based on Bayes' rule as follows [18].
$$P(y \mid x) = \frac{P(x \mid y)\,P(y)}{P(x)}$$
where x and y are variables representing the patent keywords. That is, Bayes' rule depends on conditional probability. Replacing y with a parameter θ gives $P(\theta \mid x) = P(x \mid \theta)\,P(\theta)/P(x)$, where $P(x) = \sum_{\theta} P(\theta)\,P(x \mid \theta)$. In Bayesian inference, $P(\theta \mid x)$ and $P(\theta)$ are the posterior and the prior, respectively, and $P(x \mid \theta)$ is the likelihood function based on the observed data. $P(x)$ does not depend on θ, so we can omit it as follows [10].
$$P(\theta \mid x) \propto P(\theta)\,P(x \mid \theta)$$
This is the core formula of Bayesian inference. In this paper, θ represents the keyword extracted from patent documents, and we are ultimately interested in $P(\theta \mid x)$, called the posterior density. So, Bayesian inference reflects not only the collected data but also the knowledge of the domain to which the data belongs, through the posterior distribution as follows.

Posterior ∝ Prior × Likelihood
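The relation posterior ∝ prior × likelihood can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes a Poisson likelihood for keyword counts, a hypothetical exponential prior standing in for expert belief, and made-up counts for five patents, and it evaluates the product on a grid of candidate rates.

```python
import math

def poisson_log_likelihood(counts, lam):
    """Log-likelihood of independent Poisson counts at rate lam (lam > 0)."""
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in counts)

def grid_posterior(counts, prior_density, grid):
    """Normalized posterior weights on the grid: posterior ∝ prior × likelihood."""
    log_post = [math.log(prior_density(lam)) + poisson_log_likelihood(counts, lam)
                for lam in grid]
    peak = max(log_post)  # subtract the peak before exponentiating for stability
    weights = [math.exp(v - peak) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]

def expert_prior(lam):
    """Hypothetical expert belief: exponential prior with mean 1 (an assumption)."""
    return math.exp(-lam)

# Hypothetical frequencies of one keyword in five patents (illustrative only).
counts = [2, 0, 3, 1, 4]
grid = [0.01 * k for k in range(1, 1001)]  # candidate rates 0.01, 0.02, ..., 10.0

posterior = grid_posterior(counts, expert_prior, grid)
posterior_mean = sum(lam * p for lam, p in zip(grid, posterior))
posterior_mode = grid[posterior.index(max(posterior))]
print(round(posterior_mean, 3), round(posterior_mode, 2))
```

With this prior and likelihood the exact posterior is a gamma distribution with shape 11 and rate 6, so the grid result can be checked against the closed-form mean 11/6.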
This feature of Bayesian statistics is useful for analyzing patent data, because patent data represents technology and understanding technology requires the experience of domain experts. We apply Bayesian inference to patent data analysis for technology analysis as shown in Table 1. In this paper, we combine domain expert knowledge with the collected patent data for patent-based technology analysis. Using this expression based on the prior, likelihood, and posterior, we compute the Bayesian confidence (credible) interval for each keyword (θ). The 100(1 − α)% Bayesian confidence interval $C_\alpha$ of keyword θ is defined as follows [10,17].
$$P(\theta \in C_\alpha \mid x) = \int_{C_\alpha} P(\theta \mid x)\, d\theta = 1 - \alpha$$
Here θ represents the frequency of the keyword and follows the posterior distribution. The interval is defined through the posterior density as follows [10,17].
$$C_\alpha = \{\theta : P(\theta \mid x) \ge k_\alpha\}$$
In this case, we find the largest $k_\alpha$ that satisfies $P(\theta \in C_\alpha \mid x) \ge 1 - \alpha$. In general, Bayesian confidence intervals are not unique, because many intervals attain the required probability coverage for a given posterior density.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 4 of 13
In this paper, $x_i$ represents the ith keyword in the patent-keyword matrix. Let the data of n keywords follow a normal distribution as follows.
$$x_1, x_2, \ldots, x_n \mid \theta \sim N(\theta, \sigma^2)$$
where $\sigma^2$ is known, and the parameter θ is distributed as $N(\mu, \tau^2)$. Then the posterior distribution of θ is as follows [10,17], where $\bar{x}$ is the sample mean.
$$\theta \mid x_1, \ldots, x_n \sim N\!\left(\frac{\sigma^2 \mu + n \tau^2 \bar{x}}{\sigma^2 + n \tau^2},\; \frac{\sigma^2 \tau^2}{\sigma^2 + n \tau^2}\right)$$
Among the various credible intervals, we select the interval with the largest posterior density of θ. If the interval contains the values of θ with large posterior density, the interval is shortened. Using this Bayesian interval based on the posterior density, we carry out patent data analysis as follows.
In Figure 1, we first select the target technology to be analyzed. In this study, we choose AI technology as the target technology. Since the Fourth Industrial Revolution, AI has dominated most technological fields. We search the patent documents related to AI from patent databases around the world. Using the collected AI patent documents, we build a patent-keyword matrix as structured data for patent data analysis. We apply various text mining techniques in this preprocessing, using the R language and its packages [19,20]. The rows and columns of the structured data (patent-keyword matrix) are patents and keywords, respectively. Each value of the matrix represents the frequency of occurrence of a keyword in a patent document. Next, we estimate the Bayesian credible intervals of AI keywords for AI technology analysis. Finally, we forecast the future trends of AI technology using the results of Bayesian interval estimation.
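The workflow described in this section (build a patent-keyword matrix, then estimate an interval for each keyword) can be sketched for the normal model with known variance. This is a minimal illustration under stated assumptions, not the authors' R code: the three toy patent texts, the keyword list, and the names `prior_mean`, `prior_var`, and `sigma2` are hypothetical, and the posterior uses the standard conjugate result for a normal likelihood with a normal prior.

```python
import re
from statistics import NormalDist

def patent_keyword_matrix(documents, keywords):
    """Rows are patents, columns are keywords; each cell is a term frequency."""
    matrix = []
    for text in documents:
        tokens = re.findall(r"[a-z]+", text.lower())
        matrix.append([tokens.count(k) for k in keywords])
    return matrix

def normal_posterior(xs, sigma2, prior_mean, prior_var):
    """Posterior mean and variance of theta when x_i ~ N(theta, sigma2)
    with sigma2 known and theta ~ N(prior_mean, prior_var)."""
    n = len(xs)
    post_var = 1.0 / (1.0 / prior_var + n / sigma2)
    post_mean = post_var * (prior_mean / prior_var + sum(xs) / sigma2)
    return post_mean, post_var

def credible_interval(post_mean, post_var, alpha=0.05):
    """Central interval; for a normal posterior it is also the highest-density interval."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * post_var ** 0.5
    return post_mean - half_width, post_mean + half_width

# Toy patent texts and keywords (hypothetical, for illustration only).
documents = [
    "Data mining of image data",
    "Speech data network",
    "Image recognition from video image",
]
keywords = ["data", "image"]

matrix = patent_keyword_matrix(documents, keywords)
data_counts = [row[0] for row in matrix]  # column for the keyword "data"
post_mean, post_var = normal_posterior(data_counts, sigma2=1.0,
                                       prior_mean=1.0, prior_var=1.0)
lower, upper = credible_interval(post_mean, post_var)
print(matrix, post_mean, post_var, round(lower, 3), round(upper, 3))
```

A smaller `prior_var` expresses a more confident expert opinion and pulls the posterior mean toward `prior_mean`, while a larger sample shrinks the interval regardless of the prior.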
In this paper, we consider diverse prior distributions for Bayesian inference. The frequency values in the patent-keyword matrix can be modeled by the Poisson probability mass function as follows [10].
$$P(x \mid \lambda) = \frac{e^{-\lambda} \lambda^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$
where λ is the parameter of the Poisson random variable, representing the average frequency of occurrence. In this case, we can choose the following priors: uniform, Jeffreys, exponential, gamma, and chi-square. For example, we consider the chi-square prior distribution with ν degrees of freedom as follows [10].
$$P(\lambda) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, \lambda^{\nu/2 - 1} e^{-\lambda/2}, \quad \lambda > 0$$
By combining the chi-square prior with the Poisson likelihood of the observations $x_1, \ldots, x_n$, we can get the posterior distribution as follows [10].
$$P(\lambda \mid x_1, \ldots, x_n) \propto \lambda^{\nu/2 - 1} e^{-\lambda/2} \prod_{i=1}^{n} e^{-\lambda} \lambda^{x_i} = \lambda^{\sum_{i} x_i + \nu/2 - 1}\, e^{-(n + 1/2)\lambda}$$
So, the posterior probability density function is that of a gamma distribution, defined as follows [10].
$$P(\lambda \mid x_1, \ldots, x_n) = \frac{(n + 1/2)^{a}}{\Gamma(a)}\, \lambda^{a - 1} e^{-(n + 1/2)\lambda}, \quad a = \sum_{i=1}^{n} x_i + \frac{\nu}{2}$$
Table 2 lists the prior and posterior distributions used in our paper for Bayesian interval estimation [10].
Table 2. Prior and posterior distributions for Bayesian interval estimation.

Prior Distribution, P(λ)
Posterior Distribution, P(λ|x)

In this paper, we use these five prior and posterior distributions to estimate the Bayesian prediction intervals of patent keywords. We also estimate the Bayesian intervals of patent keywords over time, because technology evolves over time [21]. Next, we carry out an experiment using AI patent keywords to illustrate the validity of our study.
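For a Poisson likelihood, each of the five priors named above can be written as a gamma kernel $\lambda^{a_0 - 1} e^{-b_0 \lambda}$, so the posterior is a gamma distribution with shape $a_0 + \sum x_i$ and rate $b_0 + n$. The sketch below relies on that standard conjugate result and is not a reproduction of the paper's Table 2: the hyperparameter values and the counts are assumptions chosen for illustration, and the interval is equal-tailed (computed by numerical integration), which may differ slightly from a highest-density interval.

```python
import math

# Each prior as a gamma kernel lam**(a0 - 1) * exp(-b0 * lam).
# Hyperparameter values are illustrative assumptions, not the paper's settings.
PRIORS = {
    "uniform": (1.0, 0.0),      # flat (improper) prior
    "jeffreys": (0.5, 0.0),     # proportional to lam**(-1/2)
    "exponential": (1.0, 1.0),  # rate 1
    "gamma": (2.0, 1.0),        # shape 2, rate 1
    "chi-square": (1.5, 0.5),   # 3 degrees of freedom
}

def gamma_posterior(counts, a0, b0):
    """Posterior (shape, rate) for Poisson counts under a gamma-kernel prior."""
    return a0 + sum(counts), b0 + len(counts)

def gamma_quantile(shape, rate, prob, steps=20000):
    """Approximate a gamma quantile by midpoint integration of the density."""
    upper = (shape + 10.0 * math.sqrt(shape) + 10.0) / rate  # far into the tail
    h = upper / steps
    log_norm = shape * math.log(rate) - math.lgamma(shape)
    cumulative = 0.0
    for k in range(steps):
        lam = (k + 0.5) * h
        cumulative += h * math.exp(log_norm + (shape - 1.0) * math.log(lam) - rate * lam)
        if cumulative >= prob:
            return lam
    return upper

def credible_interval(counts, prior, alpha=0.05):
    """Equal-tailed 100(1 - alpha)% interval for the Poisson mean."""
    shape, rate = gamma_posterior(counts, *PRIORS[prior])
    return (gamma_quantile(shape, rate, alpha / 2),
            gamma_quantile(shape, rate, 1 - alpha / 2))

# Hypothetical frequencies of one keyword in five patents.
counts = [2, 0, 3, 1, 4]
intervals = {name: credible_interval(counts, name) for name in PRIORS}
for name, (lo, hi) in intervals.items():
    print(f"{name:12s} [{lo:.3f}, {hi:.3f}]")
```

With only five observations the intervals still differ visibly across priors; as the number of patents grows, the data terms dominate the hyperparameters and the intervals converge.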

Case Study Using AI Patent Data
Using the patent documents related to AI technology, we carried out a case study to show the validity of our research. This illustrates how our study could be applied to practical problems.
We searched the AI patent documents from the patent databases [22,23]. The searching keywords equation is defined as follow; Searching keywords equation = (big OR data) OR (neural OR network) OR ((Linear OR Logistic)AND Regression) OR (Support AND Vector) OR (Naive AND Bayesian) OR (Hidden AND Markov) OR (Conditional AND Random*) OR (Decision AND Tree) OR Cluster* OR ((Dimension OR feature) AND Reduction) OR (nearest AND neighbor) or ((((big OR data) OR mining) OR ((DB OR big) AND data)) OR knowledge OR (find OR detect OR discovery)))) or ((((ontology OR OWL OR (DAML AND OIL) OR SWRL OR (Semantic AND Web AND Rule AND Language) OR SEMANTIC) OR (represent* OR expres* OR design OR induct* OR deduct* OR reason* OR inferenc*)))) or ((((Boltzma* A/1 Machin*) OR ((deep* OR Convolution* OR Recurre* OR unsupervised* OR supervised* OR Reinforcement*) OR learning)) AND (KNN* OR k-nearest* OR (data OR mining) OR (big AND data*) OR neural OR network OR ((Linear OR Logistic) AND Regression) OR (Support AND Vector*) OR (Naive AND Bayesian*) OR (Hidden AND Markov*) OR (Conditional AND Random*) OR (Decision AND Tree) OR Cluster* OR ((Dimension OR feature) AND Reduction) OR (nearest AND neighbor*)))) or pattern* OR (aware* OR realiz* OR cognit* OR recogn* OR percept* OR understand* OR comprehens* OR estimat* OR assumpt* OR presump* OR anal*) OR (cognit* OR percept*) AND (computing OR process* OR application OR program) OR (humanlife OR "human life" OR living* OR livelihood OR lifelog OR "life log") OR (aware* OR realiz* OR cognit* OR recogn* OR percept* OR understand* OR comprehens* OR estimat* OR assumpt* OR presump* OR anal*) or (emotion* OR sentiment* OR feel*) OR (aware* OR realiz* OR cognit* OR recogn* OR percept* OR understand* OR comprehens* or estimat* or assumpt* or presump* or anal*) OR space* OR (aware* OR realiz* OR cognit* OR recogn* or percept* or understand* or comprehens* or estimat* or assumpt* or presump* or anal*) or (collabor* or collect*) OR intel* OR ((((image* 
OR video* OR movie OR picture) OR (object* OR target* OR non-rigid* OR nonrigid*) OR (extract* OR awareness* OR realizat* OR cognit* OR capture*))))

From the collected patent documents, we removed the patent documents that were duplicated or not related to AI, and selected 13,858 valid patents from 1995 to 2016. In this case study, we use the R computing language and its text mining package to preprocess and analyze the collected patent data [19,20]. We also extracted 36 keywords representing AI technology from the valid patent documents by text mining. The keywords are as follows: analysis, awareness, behavior, cognitive, collaborative, computing, conversation, corpus, data, dialogue, feedback, figure, image, inference, interface, language, learning, mind, morphological, natural, network, neuro, object, ontology, pattern, recognition, representation, sentence, sentiment, situation, spatial, speech, understanding, video, vision, voice. We selected these AI keywords in close consultation with AI technology experts [24]. Using these keywords, we built the patent-keyword matrix as structured data for Bayesian data analysis. The rows and columns of this matrix are the 13,858 patents and the 36 keywords, respectively. Each element of this matrix is the frequency of occurrence of a keyword in a patent.

First, among the 36 AI technology keywords, we selected eight keywords that strongly represent AI technology. Table 3 shows the Bayesian intervals with 95% confidence level for the mean frequency of the eight representative keywords. In this case, we used five kinds of prior probability distributions suitable for patent frequency data: uniform, Jeffreys, exponential, gamma, and chi-square. We found that the widths of the intervals did not differ much across the prior probability distributions. From these results, we can see that the confidence interval for the keyword data is the largest, followed by those for the keywords speech and video.
So, we found that technology based on data is the most important to AI systems. For more detailed AI technology forecasting, we estimated Bayesian prediction intervals for all 36 keywords. In addition, we show the intervals separated by period (1990s, 2000s, 2010s), because technology develops and evolves over time. Table 4 shows the Bayesian prediction intervals with 95% confidence level for the 1990s. From the intervals of AI keywords from 1990 to 1999, we can see the characteristics of AI technology during this period. We estimated the Bayesian confidence intervals for all AI keywords using five representative prior distributions for count data: uniform, Jeffreys, exponential, gamma, and chi-square. The AI keywords with the highest average frequencies of occurrence in the estimated Bayesian intervals are object and data. In other words, we can confirm that data carries the largest weight among the objects handled by AI. In addition, the keywords speech, video, network, image, pattern, spatial, language, and analysis are also important in the AI technology field. During this period, AI technology was studied with a focus on data analysis, pattern recognition, speech and image processing, and natural language understanding. Next, Table 5 shows the Bayesian prediction intervals with 95% confidence level for all AI keywords from 2000 to 2009. As in the results of Table 4, AI technology from 2000 to 2009 is affected by technological keywords such as data, object, speech, video, and image. However, this period shows that the relative weight of the feedback and learning keywords is increasing. In other words, we found that research related to learning and behavior technologies for AI was being actively conducted at this time. Lastly, Table 6 presents the most recent results of Bayesian interval estimation for all AI keywords.
Similar to the results in Tables 4 and 5, we can find that a lot of research is still going on in data technology, imaging, and natural language processing at this time. In particular, since 2010, the research demand for analysis, recognition, and interface technologies for AI has increased. We also confirmed the emergence of technology related to the keyword situation. Combining the results in Tables 4-6, we show the technological change of AI over time in Figure 2.
This figure shows the top 20 AI technology keywords for each period. The impact of the keywords object and data on AI technology has not changed over the years. However, the other technological keywords show a change in their relative influence on AI technology over time. For example, the keyword voice increased in the 2000s and then declined. In contrast, the keyword interface shows the opposite tendency. From the result of Figure 2, we conclude that the technology related to data is very important to AI technology. In addition, video and image technology influences the development of AI technology. Pattern recognition and natural language understanding are meaningful technologies for AI. Recently, the technology related to situation has been needed to improve the performance of AI technology. This figure also provides results that researchers and developers in AI technology areas can use to build their R&D strategies.
To identify the trend of keywords over time, we show the ranking of all AI keywords by the width of their Bayesian prediction intervals with 95% confidence level over time in Table 7. The interval widths of most AI keywords have decreased, and this result is similar to the result of Figure 2. From the results, we confirmed that the detailed technologies for AI are spreading across various technological keywords rather than concentrating on specific keywords. So, in the future, AI technology is expected to have a large impact on most other technology fields as well as on society.
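Ranking keywords by interval width for each period, as described above, can be sketched as follows. This is only an illustration: the three keywords and all interval bounds are hypothetical numbers, not values from the paper's tables, and ties are broken alphabetically as an arbitrary choice.

```python
def rank_by_width(intervals):
    """Map keyword -> rank (1 = widest interval) from keyword -> (lower, upper)."""
    widths = {k: upper - lower for k, (lower, upper) in intervals.items()}
    ordered = sorted(widths, key=lambda k: (-widths[k], k))
    return {k: position + 1 for position, k in enumerate(ordered)}

# Hypothetical 95% intervals per period (illustrative only).
period_intervals = {
    "1990s": {"data": (1.0, 3.0), "voice": (0.2, 0.9), "interface": (0.1, 0.5)},
    "2000s": {"data": (1.2, 2.6), "voice": (0.3, 1.2), "interface": (0.1, 0.4)},
    "2010s": {"data": (1.4, 2.4), "voice": (0.2, 0.7), "interface": (0.2, 0.8)},
}

rank_table = {period: rank_by_width(iv) for period, iv in period_intervals.items()}
for period, ranks in rank_table.items():
    print(period, ranks)
```

Reading the ranks of one keyword across periods gives the kind of trend comparison the ranking table is meant to support.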

Conclusions
In this paper, we proposed a method for patent data analysis using Bayesian prediction interval estimation. Bayesian inference combines the prior distribution with the likelihood function to build the posterior distribution of the parameters. In general, the prior distribution represents previous knowledge of the parameter domain. In our technology analysis based on patent data, the prior distribution contains the experience and knowledge of AI experts. The likelihood function describes how well the parameters are supported by the observed data. In our method, the collected AI patent documents serve as the data in the likelihood function. We estimate the Bayesian prediction intervals of the parameters representing AI keywords using the posterior distribution, which combines the prior and the likelihood. Therefore, the proposed model is a statistical patent analysis model that can reflect the opinions of experts. This is an important issue in technology forecasting and management, because technology analysis should consider not only quantitative patent analysis but also expert knowledge in the relevant technological field. We tried to solve this problem of traditional patent data analysis using the learning approach of Bayesian statistics.
In addition, we applied the proposed method to AI technology analysis, because AI is of great interest in most technological fields. We collected the patent documents related to AI from patent databases around the world. Using preprocessing based on text mining techniques, we built structured data in the form of a matrix consisting of patents (rows) and keywords (columns). Each element of this matrix represents the frequency of occurrence of a keyword in a patent. We estimated the Bayesian prediction intervals of the mean frequency for all AI keywords. From the results of Bayesian interval estimation, we extracted the future technological trends of AI. We conclude that data technology is the core technology for AI systems. In addition, AI developers have studied video and image technology as well as natural language processing. Recently, AI researchers have also become interested in developing technology for situational awareness.
Our research contributes from two perspectives, academic and practical. On the academic side, the proposed method presents a Bayesian inference method for patent keyword analysis that considers the experience and knowledge of AI domain experts. On the practical side, our research contributes to technology management tasks such as technology forecasting, research and development planning, and technological innovation. In future work, we will study more advanced Bayesian models for patent data analysis using Markov Chain Monte Carlo (MCMC) computation, finite mixture or hierarchical models, Bayesian graph models, etc. In addition, we will consider the geographical distribution of patents by country, so that we can compare the AI technology portfolios of countries around the world. Lastly, we will study how this methodology could be applied to other practical scenarios that involve experts' opinions.
Author Contributions: J.-B.R. and S.J. designed this research and collected the data set for the experiment. D.U. analyzed the data to show the validity of this paper and wrote the paper and performed all the research steps. In addition, all authors have cooperated with each other for revising the paper. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.