Ensemble Modeling for Sustainable Technology Transfer

Abstract: These days, technological advances are being made through technological convergence. Following this trend, companies need to adapt and secure their own sustainable technological strategies. Technology transfer is one such strategy, and it is especially effective in coping with recent technological developments. In addition, universities and research institutes are able to secure new research opportunities through technology transfer. The aim of our study is to provide a technology transfer prediction model for the sustainable growth of companies. In the proposed method, we first collected patent data from a Korean patent information service provider. Next, we used latent Dirichlet allocation, a topic modeling method, to identify the technical fields of the collected patents. Quantitative indicators were also extracted from the patent data. Finally, we used the resulting variables to build a technology transfer prediction model with the AdaBoost algorithm. The model was found to have sufficient classification performance. We expect that the proposed model will enable universities and research institutes to secure new technology development opportunities more efficiently. In addition, companies using this model can maintain sustainable growth in step with the changing pace of society.


Introduction
Technology is a very important tool in today's information society. In recent years especially, as innovation has been emphasized, the importance of technology has been recognized anew. Technological innovation consists of the creation and application of knowledge, and it is an important factor in sustainable growth [1,2]. In the past, companies operated their research and development, management, and marketing functions independently. As the importance of technology has grown, however, companies are pursuing technology management (TM), which considers technology and management together. Many companies have tried to standardize technologies or manage intellectual property in order to secure a competitive advantage through TM [1].
Intellectual property management (IPM) is a managerial method that strategically utilizes intellectual property, such as patents, trademarks, and copyrights, for corporate management [3]. Among these, a patent legally discloses information about an invention, and in return the inventor obtains exclusive rights to it. In other words, although a patent has the disadvantage that the invention is disclosed to the public, the inventor holds exclusive rights to the invention for 20 years. For this reason, many technology-based companies try to acquire exclusive

Patent Analysis for Sustainable Technology Management
A patent is an essential element for protecting the rights to inventions and securing technological competitiveness. It is a necessary tool for technology management and can also be used for technology forecasting and R&D management [4,7,9,14,19–21]. In early research on technology management using patent data, Biju et al. used only quantitative information from five years of Indian patent data to identify trends and innovation levels in India [5]. Their study found macroscopic trends and levels of technology, but it is limited because the details are difficult to understand. Yoon and Park, by contrast, analyzed patent data using a text mining technique [19,20]. They extracted keywords from the patent literature by text mining and analyzed them using a morphological approach, attempting both to discover technological opportunities and to perform technology forecasting. Daim et al. forecast emerging technology by analyzing the bibliometrics of patents [7]. Jun and Park suggested a comprehensive patent analysis method that considers time-series information, information from the literature, and the international patent classification (IPC), which an examiner assigns according to the technical description of a patent. They evaluated the level of technological innovation using the proposed methodology [6,8]. Recently, patent analysis methodologies based on statistical techniques have been proposed for technology forecasting and for discovering emerging technology. Kim and Jun (2016) focused on the fact that the frequency of most words occurring in patent documents is zero, and attempted a technical analysis using a zero-inflated Poisson distribution and a negative binomial regression [22]. In addition, Park and Jun found that the frequencies of keywords and IPC codes in patent documents follow the Poisson distribution. Based on this, they constructed a technical analysis model for the LED field using a Poisson regression model and a Bayesian network [15].
Many studies have also used machine learning to discover promising and emerging technologies from patterns in patent applications. Kim and Bae attempted to foresee promising technology by identifying patterns of keywords in patent documents using clustering, a typical unsupervised technique [9]. Kyebambe et al. explored emerging technology using supervised learning on patent citation information [21].

Topic Modeling
The topic model is an unsupervised text mining technique that identifies the topics, or themes, of documents in a large collected corpus [23,24]. Latent Dirichlet allocation (LDA), one of the most popular topic modeling techniques, is a Bayesian probabilistic model of a corpus and is also considered a probabilistic extension of latent semantic analysis (LSA) [23,24]. The basic idea of LDA is that each document is a mixture of latent topics and each topic is defined as a distribution over words. Figure 1 shows a graphical model of LDA.
For the $m$-th document $D_m$, the distribution of the document over the latent topics $z_{m,n}$ is denoted by $\theta_m$, a multinomial parameter that follows a Dirichlet distribution with hyperparameter $\alpha$. The number of topics $K$ ($k = 1, \ldots, K$) can be statistically estimated, or the experimenter can fix its value. The distribution of words for topic $k$ is denoted by $\phi_k$, which is also a multinomial parameter that follows a Dirichlet distribution with hyperparameter $\beta$. The probability of the word $w_{m,n}$ is given by $p(w_{m,n} \mid z_{m,n}, \beta)$. The joint distribution for a document with $N_m$ words is given in Equation (1).
$$p(\theta_m, z_m, w_m \mid \alpha, \beta) = p(\theta_m \mid \alpha) \prod_{n=1}^{N_m} p(z_{m,n} \mid \theta_m)\, p(w_{m,n} \mid z_{m,n}, \beta) \qquad (1)$$
From Equation (1), the marginal distribution of a document is obtained by integrating over $\theta_m$ and summing over $z_{m,n}$, as shown in Equation (2).
$$p(w_m \mid \alpha, \beta) = \int p(\theta_m \mid \alpha) \left( \prod_{n=1}^{N_m} \sum_{z_{m,n}} p(z_{m,n} \mid \theta_m)\, p(w_{m,n} \mid z_{m,n}, \beta) \right) d\theta_m \qquad (2)$$
Finally, the probability of the corpus $D$ of $M$ documents can be expressed as the product of the marginal probabilities of the individual documents, as shown in Equation (3).
$$p(D \mid \alpha, \beta) = \prod_{m=1}^{M} \int p(\theta_m \mid \alpha) \left( \prod_{n=1}^{N_m} \sum_{z_{m,n}} p(z_{m,n} \mid \theta_m)\, p(w_{m,n} \mid z_{m,n}, \beta) \right) d\theta_m \qquad (3)$$
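The generative process that LDA assumes can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the study: the function names, the toy vocabulary, and the use of symmetric Dirichlet priors are assumptions made for the example.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    """Draw an index according to the given probability vector."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_corpus(n_docs, doc_len, n_topics, vocab, alpha, beta, seed=0):
    """Generate documents from the LDA generative process (hypothetical helper)."""
    rng = random.Random(seed)
    # phi_k: word distribution of topic k, drawn from a symmetric Dirichlet(beta)
    phi = [sample_dirichlet([beta] * len(vocab), rng) for _ in range(n_topics)]
    corpus, thetas = [], []
    for _ in range(n_docs):
        # theta_m: topic mixture of document m, drawn from a symmetric Dirichlet(alpha)
        theta = sample_dirichlet([alpha] * n_topics, rng)
        words = []
        for _ in range(doc_len):
            z = sample_categorical(theta, rng)    # latent topic z_{m,n}
            w = sample_categorical(phi[z], rng)   # word w_{m,n} drawn from topic z
            words.append(vocab[w])
        corpus.append(words)
        thetas.append(theta)
    return corpus, thetas, phi

corpus, thetas, phi = generate_corpus(
    n_docs=3, doc_len=8, n_topics=2,
    vocab=["network", "speech", "image", "rule"], alpha=0.5, beta=0.1)
```

Fitting LDA runs this process in reverse: given only the observed words, it estimates the topic mixtures and the topic-word distributions.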
A topic model is a useful method for identifying topics hidden in a set of documents. For this reason, topic models have often been used in recent years to analyze technical documents. Kim et al. used a topic model to find common technologies in patent data [25]. Lee and Kang applied a topic model to published articles to discover critical topics in technology innovation management [26].

Ensemble Method-AdaBoost Algorithm
An ensemble method improves accuracy by combining multiple weak learners, typically through boosting or bagging. The AdaBoost algorithm was proposed by Y. Freund and R. Schapire in 1995 to improve boosting [27,28]. Its most important feature is that, whereas bagging generates its weak learners in parallel, AdaBoost generates them sequentially. To handle the multi-label case and improve the generalization performance of AdaBoost, Schapire and Singer suggested a method for tuning the weights in 1999 [29]. Because our study deals with a single-label problem, we use the method proposed by Freund and Schapire. Figure 2 shows a graphical model of the AdaBoost algorithm.
The main idea is to adjust the weights of the weak classifiers under each distribution and finally combine these weak classifiers into one strong classifier, that is, a hypothesis with high accuracy.
Suppose that the training set is given as $S = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $x_i \in \chi$ and $y_i \in \{-1, +1\}$ is the class label. The initial distribution is uniform, $D_1(i) = 1/n$. In each round $t$, the weak hypothesis $h_t$ is chosen to minimize the weighted error $\epsilon_t$ under the current distribution $D_t$, as defined in Equation (4).
$$\epsilon_t = \Pr_{i \sim D_t}\left[h_t(x_i) \neq y_i\right] = \sum_{i:\, h_t(x_i) \neq y_i} D_t(i) \qquad (4)$$
Given the error $\epsilon_t$, the weight $\alpha_t$ of the weak hypothesis $h_t$ is determined by Equation (5), and the distribution $D_{t+1}(i)$ is then updated through Equation (6) using $\alpha_t$.
$$\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right) \qquad (5)$$
As can be seen from Equation (5), if $\epsilon_t > 0.5$ then $\alpha_t < 0$, and if $\epsilon_t < 0.5$ then $\alpha_t > 0$. That is, as the error $\epsilon_t$ of the weak hypothesis $h_t$ decreases, the weight $\alpha_t$ increases. If $\epsilon_t > 0.5$, the weak classifier performs worse than random guessing. Therefore, its hypothesis is not considered, and we move to the next round.
$$D_{t+1}(i) = \frac{D_t(i) \exp\left(-\alpha_t\, y_i\, h_t(x_i)\right)}{Z_t} \qquad (6)$$
In Equation (6), $Z_t$ is a normalization factor chosen so that $\sum_{i=1}^{n} D_{t+1}(i) = 1$. Finally, after $T$ rounds the strong classifier is defined as in Equation (7).
$$H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right) \qquad (7)$$
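The rounds described above can be sketched in plain Python with decision stumps as weak learners. This is an illustrative sketch, not the implementation used in the study: the stump learner, the function names, and the toy data are assumptions, and the loop simply stops when no stump beats random guessing.

```python
import math

def stump_predict(x, feature, threshold, polarity):
    """Decision stump: `polarity` at or below the threshold, the opposite sign above it."""
    return polarity if x[feature] <= threshold else -polarity

def best_stump(X, y, D):
    """Return the stump with the smallest weighted error under distribution D."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for polarity in (+1, -1):
                err = sum(d for x, yi, d in zip(X, y, D)
                          if stump_predict(x, feature, threshold, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best

def adaboost_fit(X, y, rounds=10):
    n = len(X)
    D = [1.0 / n] * n                      # uniform initial distribution
    ensemble = []
    for _ in range(rounds):
        err, feature, threshold, polarity = best_stump(X, y, D)
        if err >= 0.5:                     # no better than random guessing: stop
            break
        err = max(err, 1e-10)              # guard against log of infinity for a perfect stump
        alpha = 0.5 * math.log((1.0 - err) / err)   # weight of the weak hypothesis
        ensemble.append((alpha, feature, threshold, polarity))
        # Raise the weight of misclassified examples, lower the rest, then normalize.
        D = [d * math.exp(-alpha * yi * stump_predict(x, feature, threshold, polarity))
             for x, yi, d in zip(X, y, D)]
        Z = sum(D)
        D = [d / Z for d in D]
    return ensemble

def adaboost_predict(ensemble, x):
    """Strong classifier: sign of the weighted vote of the weak hypotheses."""
    score = sum(a * stump_predict(x, f, t, p) for a, f, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data that no single stump can separate.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost_fit(X, y, rounds=3)
predictions = [adaboost_predict(model, x) for x in X]
```

On this toy data every individual stump misclassifies at least two points, while the weighted vote of three stumps classifies all six correctly, which is the sense in which weak classifiers combine into a strong one.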

Methodology
In this research, we propose a science-based technology transfer prediction model that uses the AdaBoost algorithm for sustainable technology management. The proposed model covers both the quantitative indexes and the textual information in the patent literature. Patent data contain various kinds of information on inventions, such as text, numeric information, equations, and figures, so they must be converted into structured data before the experiment. The study proceeds as follows. First, we identify the technology topics by analyzing the titles and abstracts in the patent data. Next, we extract quantitative indexes from the patent data. Finally, we merge the technology topics and the quantitative indexes and build the technology transfer prediction model using AdaBoost. Figure 3 illustrates this experimental process.

Collecting the Data
The aim of our study is to produce a technology transfer prediction model using patent data. The patents used for the experiment belong to the domain of inference and machine learning technologies and were filed in the United States before August 2016. Because artificial intelligence (AI) technology is a fusion of various technologies, such as speech recognition, natural language processing, and machine learning, it is difficult for a single company to conduct research and development on all the sub-technologies in the field. Many enterprises therefore adopt a technology trading strategy. For this reason, we collected patent data in the field of inference and machine learning to build the model. The data were collected from the Worldwide Intellectual Property Service (WIPS), a Korean provider of patent information. The collected data contained noise and redundant patents, which we removed. As a result, a total of 711 valid patents were selected for further analysis. The patent search database that we used lists both the applicant and the current assignee of each patent. If the current assignee differs from the original applicant, we consider the patent to have been transferred. Of the 711 valid patents, 208 were found to have been transferred. Table 1 shows a detailed description of the collected data.
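The labeling rule above can be sketched as follows. The function names and the normalization of names (trimming whitespace and ignoring case) are assumptions made for illustration; the source only states that a differing assignee marks a transfer.

```python
def transfer_label(applicant, current_assignee):
    """Return +1 if the current assignee differs from the original applicant, else -1."""
    # Assumption: names are compared after trimming whitespace and ignoring case.
    same = applicant.strip().casefold() == current_assignee.strip().casefold()
    return -1 if same else +1

def transfer_rate(labels):
    """Share of patents labeled as transferred (+1)."""
    return sum(1 for label in labels if label == +1) / len(labels)

# Counts reported in the text: 208 transferred patents out of 711.
labels = [+1] * 208 + [-1] * (711 - 208)
rate = transfer_rate(labels)
```

With the reported counts, the rate is 208/711, or about 29%.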

Table 1. Description of the collected data (fields include technical field and applicant country).

Technology Clustering Using Topic Model
To enable searchers to locate patents easily, the patent examiner classifies them using IPC, cooperative patent classification (CPC), or file index (FI) codes according to the contents of the patent specification. However, because the classification codes do not contain detailed technical content, the ability to ascertain technical material from them is limited. Therefore, in our study, the technologies in the target domain are first classified through topic modeling. Figure 4 illustrates the process of finding the technology topics in this research.
To grasp the technological topics of the collected data, we merged the titles and abstracts of the patent documents into a corpus, that is, a linguistic set of texts. In a sentence, words of different classes appear according to grammatical roles such as predicate and object. Among them, stop-words such as "the", "a", "by", "as", and "is" are necessary for building sentences but carry no value for the analysis in this study. Because such words provide no special meaning in LDA and unnecessarily increase the computational complexity, we removed them during preprocessing. In addition, because not every word in the documents is significant for further analysis, the term frequency-inverse document frequency (TF-IDF) method was used to select significant words for the experiment.
The form of a word varies with its position in the sentence. If the forms differ, a computer cannot recognize that the words share the same meaning, which can distort the data. Words with the same meaning must therefore be unified, and in this study we used stemming to do so. We also eliminated any numbers, punctuation, and symbols that might distort the analysis. The data refined through this preprocessing are used to fit the LDA model.
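The preprocessing steps described above can be sketched in plain Python. This is a rough illustration under stated assumptions: the stop-word list is a small sample, `naive_stem` is a crude stand-in for a real stemmer, the TF-IDF weighting shown is one common variant, and the source does not state the threshold used to select significant words.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "by", "as", "is", "of", "and", "for", "to", "in"}

def naive_stem(token):
    """Very rough suffix stripping; a stand-in for a proper stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(title, abstract):
    """Merge title and abstract, strip non-letters, drop stop-words, and stem."""
    text = f"{title} {abstract}".lower()
    text = re.sub(r"[^a-z\s]", " ", text)   # remove numbers, punctuation, symbols
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return [naive_stem(t) for t in tokens]

def tfidf(docs):
    """Return one {term: tf-idf score} dict per tokenized document."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: (c / len(doc)) * math.log(n_docs / df[t])
                       for t, c in tf.items()})
    return scores

tokens = preprocess("Learning systems", "The system is trained by 3 models.")
scores = tfidf([["learn", "system"], ["system", "model"]])
```

A term that occurs in every document, such as "system" in the second call, receives a score of zero, which is why TF-IDF is useful for filtering out words that do not distinguish documents.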
To fit the LDA model, we adopted the Gibbs sampling method and increased the number of topics K from 2 to 10 to find the optimal value. In addition, we tried 500, 1000, and 2000 sampling iterations so that the topics would be well represented. The Dirichlet parameter α is set to the value estimated from the documents, and the parameter β is set to 0.1. Table 2 shows the most suitable parameters for fitting the LDA model, found through repeated trials.

Table 2. Parameters for fitting the LDA model.
Component | Candidates
Inference algorithm | Gibbs sampling
Number of topics K | From 2 to 10
Gibbs sampling iterations | 1000
Parameters α, β | α = T/50, β = 0.1

There are various opinions on how to determine the optimal number of topics [23,24,30–32]. For our purposes, the topic is determined by the topic probability θ of each document.

Figure 4. The process of finding the technology topic. TF-IDF: term frequency-inverse document frequency.
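One simple way to turn the topic probabilities θ into a single technology topic per patent is to take the most probable topic. The source does not spell out the rule, so the argmax used here is an assumption, as are the function name and the sample values.

```python
def assign_topics(theta):
    """Assign each document the topic with the highest probability (1-based index)."""
    return [max(range(len(row)), key=row.__getitem__) + 1 for row in theta]

# Hypothetical topic probabilities for two documents and three topics.
theta = [[0.1, 0.7, 0.2],
         [0.5, 0.3, 0.2]]
topics = assign_topics(theta)
```

Each row of `theta` sums to one, so the assignment keeps only the dominant topic and discards the rest of the mixture.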

Technology Transfer Prediction Model Using the AdaBoost Algorithm
To implement the technology transfer prediction model, we use the AdaBoost algorithm, a typical ensemble model. Figure 5 illustrates the proposed model, and Table 3 shows the variables used in this study.
In our experiment, the quantitative indexes are variables such as citation, period, claim, family patent, family country, patent references, and non-patent references. The technology topic represents the technical description of a patent obtained using LDA. As mentioned in the background, however, it is not appropriate to use categorical independent variables directly as inputs to the AdaBoost algorithm. Therefore, the technology topic variable is converted into dummy variables. Among the variables in the dataset, "Transfer" indicates whether a patent has been transferred from the original holder to another party. The patent search database that we used lists the applicant and the current assignee of each patent, and if the assignee has changed, we consider the patent to have been transferred. We used "Transfer" as the output variable y_i in this experiment: it takes the value +1 if the patent has been transferred and −1 otherwise. The parameters of the AdaBoost algorithm were estimated through iteration. Table 4 shows the parameters used in this experiment.
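The construction of the model inputs can be sketched as follows. The column names are hypothetical snake_case versions of the variables listed above, the record layout is an assumption, and the example uses one dummy per topic for readability, although a coding with K − 1 dummies would also work.

```python
# Hypothetical column names for the quantitative indexes.
QUANT_COLUMNS = ["citation", "period", "claim", "family_patent",
                 "family_country", "patent_references", "non_patent_references"]

def encode_patent(record, n_topics=5):
    """Return (features, label): quantitative indexes, topic dummies, and a +1/-1 label."""
    features = [float(record[c]) for c in QUANT_COLUMNS]
    # One dummy variable per technology topic (topics are numbered from 1).
    dummies = [1.0 if record["topic"] == k else 0.0 for k in range(1, n_topics + 1)]
    label = +1 if record["transferred"] else -1
    return features + dummies, label

# Made-up record used only to show the shape of the output.
record = {"citation": 3, "period": 5, "claim": 12, "family_patent": 2,
          "family_country": 1, "patent_references": 7, "non_patent_references": 0,
          "topic": 3, "transferred": True}
x, y = encode_patent(record)
```

The resulting vector has seven quantitative entries followed by five topic indicators, exactly one of which is set.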

Table 4. Parameters used for the AdaBoost algorithm.
Component | Candidates
Ensemble model | AdaBoost [27]
Max. depth | 4
Iteration | 100

To validate the performance of the proposed model, we consider accuracy, sensitivity, and specificity, which are measures for evaluating classification performance. The measures are shown in detail in Table 5.

Table 5. Performance measures. TP: true positive; TN: true negative; FP: false positive; FN: false negative; P: positive; N: negative.
Measure | Formula
Accuracy | (TP + TN) / (P + N)
Sensitivity | TP / (TP + FN)
Specificity | TN / (FP + TN)
The accuracy represents the overall performance of the classifier; it considers true positives (TP) and true negatives (TN) together. However, when a classifier fits noisy training data too closely, it may overfit, and accuracy alone cannot then measure its performance correctly. To overcome this problem, we also evaluate the proposed model on sensitivity, which considers only the positive class, and specificity, which considers only the negative class.
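The three measures can be computed from true and predicted labels as in the sketch below, assuming the +1/−1 label coding used for the "Transfer" variable. The function name and the sample labels are made up for illustration.

```python
def classification_measures(y_true, y_pred):
    """Compute accuracy, sensitivity, and specificity for +1/-1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    nan = float("nan")  # returned when a class is absent from y_true
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
    }

y_true = [1, 1, 1, -1, -1, -1, -1, -1]
y_pred = [1, 1, -1, -1, -1, -1, 1, 1]
measures = classification_measures(y_true, y_pred)
```

The need for all three measures is easy to see on this dataset: a classifier that labels every patent as not transferred would reach an accuracy of 503/711, or about 0.71, while its sensitivity would be zero.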

Experiment Result
To carry out this study, we collected data for the technical field of inference and machine learning according to the conditions mentioned at the beginning of Section 3.1.
First, to examine the trend of the collected data, we plot the numbers of applications and technology transfers in Figure 6. In Figure 6, "n" is the number of patent applications by year, and "tr" is the number of technology transfers by year.
As can be seen, patent applications in this technology field increased steadily from 1995 to 2008. In 2009, applications dropped sharply compared with 2008, and then increased again until 2013. From 2014 onward, however, patent applications have decreased again. Technology transfers first appear in the chart in 1997. Although the number of technology transfers is not large, it has gradually increased since 2008 and rose sharply in 2014.
We would like to know trends according to specific technologies, but the patent data do not have technical classification information. Therefore, in this study, a technical classification is performed using the LDA as mentioned above. The LDA uses the parameters shown in Table 2. In addition, several methods for selecting the optimal number of topics have already been proposed [23,24,[31][32][33]. We used the method Cao et al. proposed, which is considered to be the most appropriate method for this study [31]. The result is shown in Figure 7. We would like to know trends according to specific technologies, but the patent data do not have technical classification information. Therefore, in this study, a technical classification is performed using the LDA as mentioned above. The LDA uses the parameters shown in Table 2. In addition, several methods for selecting the optimal number of topics have already been proposed [23,24,[31][32][33]. We used the method Cao et al. proposed, which is considered to be the most appropriate method for this study [31]. The result is shown in Figure 7. As a result of the analysis of the optimal number of topics using the method described by Cao et al., it was found to have a minimum at k = 5. Based on these results, we classified the technology topic of collected data. Table 6 shows the results. The top ten keywords included in each topic were utilized to define the technology topic. As a result of the technical classification, the technology of "natural language understanding" occupied the largest part, about 22.4%, in the field of inference and machine learning. Next, "expert system" technology accounted for 21.4%, followed by signal processing, image processing, and artificial neural network technology. Table 7 shows the number of technology transfers for the above technology fields. As a result of the analysis of the optimal number of topics using the method described by Cao et al., it was found to have a minimum at k = 5. 
Based on these results, we classified the technology topics of the collected data. Table 6 shows the results. The top ten keywords in each topic were used to define the technology topic. According to this classification, "natural language understanding" accounted for the largest share of the field of inference and machine learning, about 22.4%. "Expert system" technology followed with 21.4%, and then signal processing, image processing, and artificial neural network technology. Table 7 shows the number of technology transfers for these technology fields. The transfer rate of the collected data was 29% on average. Broken down by the LDA-based technology classification, the transfer rate for Topic 3 (natural language understanding) was 38%, which is higher than in the other technology fields.
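The per-topic transfer rates are simple ratios, which the following sketch makes explicit. It assumes that each patent is assigned to a single dominant LDA topic, which the text does not state; the function name and the records are made up for illustration and do not reproduce the values in Table 7.

```python
from collections import defaultdict


def transfer_rate_by_topic(records):
    """Return ({topic: rate}, overall_rate) from (topic, transferred) pairs."""
    totals = defaultdict(int)
    transferred = defaultdict(int)
    for topic, was_transferred in records:
        totals[topic] += 1
        transferred[topic] += int(was_transferred)
    rates = {t: transferred[t] / totals[t] for t in totals}
    overall = sum(transferred.values()) / sum(totals.values())
    return rates, overall


# Made-up records: (dominant LDA topic, whether the patent was transferred).
toy_records = [
    (1, False), (1, False), (1, True), (1, False),
    (3, True), (3, False), (3, True), (3, False),
    (5, False), (5, False),
]
rates, overall = transfer_rate_by_topic(toy_records)
```

Comparing each entry of `rates` with `overall` is the kind of comparison made above between the Topic 3 rate and the average rate.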
To generate the technology transfer prediction model proposed in this study, we merged the topic model results with the quantitative patent indicators described in Table 3 and applied the AdaBoost algorithm. To evaluate the proposed method, we compared its performance with that of the K-nearest neighbor classifier (K-NN), support vector machine (SVM), and neural network, which are representative classification algorithms. The performance measures are the accuracy, sensitivity, and specificity described in Section 3. We also compare models that include technology topics with those that do not. The experimental results are shown in Tables 8 and 9 and Figures 8 and 9. In Figures 8 and 9, NT refers to a model that does not include the technology topic, and YT refers to a model that includes it.
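The AdaBoost step can be sketched as follows. The text does not state which AdaBoost variant or base learner was used, so this minimal sketch assumes discrete AdaBoost with decision stumps; all function names are hypothetical. Each row would normally hold the quantitative indicators and the topic variables of one patent, but the toy data use a single made-up indicator so that the behavior is easy to trace.

```python
import math


def best_stump(X, y, w):
    """Find the stump (error, feature, threshold, polarity) with the
    lowest weighted error under sample weights w."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[j] > thr else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def stump_predict(stump, row):
    """Predict +1 or -1 for one row with a fitted stump."""
    _, j, thr, polarity = stump
    return polarity if row[j] > thr else -polarity


def fit_adaboost(X, y, rounds=10):
    """Discrete AdaBoost; labels must be +1 (transferred) or -1 (not)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = best_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified samples and lower the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1


# Made-up data: one indicator value per patent, label +1 = transferred.
toy_X = [[v] for v in range(10)]
toy_y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = fit_adaboost(toy_X, toy_y, rounds=3)
toy_predictions = [predict(ensemble, row) for row in toy_X]
```

No single threshold separates the toy labels, but the weighted vote of three stumps does. This is the property that motivates an ensemble of weak learners.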
We compared models that include the technology topic with ones that do not. The results are as follows. The classification performance of the technology transfer prediction model that includes the technology information from the patent literature is superior to that of the model that does not include it. The sensitivity of the model without the technical content is notably lower than that of the model with the technical information, because overfitting occurs in the former. Therefore, it can be assumed that technology information is an important factor in predicting technology transfer.
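The two model variants differ only in their input features, which the following sketch illustrates. It assumes that the topic is encoded as a one-hot vector over the five topics; the text does not specify the encoding, and topic probabilities would be an equally plausible choice. The function name, the dictionary keys, and the indicator values are hypothetical.

```python
def build_features(patent, num_topics=5, include_topic=True):
    """Concatenate quantitative indicators with a one-hot topic vector."""
    features = list(patent["indicators"])
    if include_topic:
        one_hot = [0] * num_topics
        one_hot[patent["topic"] - 1] = 1  # topics are numbered from 1
        features.extend(one_hot)
    return features


# Made-up patent: three quantitative indicators, assigned to Topic 3.
toy_patent = {"indicators": [12, 3, 1], "topic": 3}
yt_row = build_features(toy_patent, include_topic=True)   # YT model input
nt_row = build_features(toy_patent, include_topic=False)  # NT model input
```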
Next, we generated models using the proposed method and, for comparison, models based on the K-nearest neighbor classifier (K-NN), support vector machine (SVM), and neural network. K-NN is simple in structure but performs well, and for this reason it is used in many classification problems [33-36]. The support vector machine and the neural network are also well known for their classification performance and are applied in various fields [37-40]. When the proposed model was compared with these classifiers, the accuracy of the models was found to be similar overall. However, in terms of sensitivity and specificity, which measure the true positive rate and the true negative rate, respectively, there was a significant difference between the proposed model and the other models. In particular, the sensitivity and specificity of the other models were lower than those of the proposed model. This seems to be due to overfitting in the compared models. These results show that the proposed model performs better than the other models in technology transfer prediction. It can, therefore, be inferred that the proposed model, which is based on patent data, is suitable for predicting technology transfer.
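The three performance measures can be computed directly from the confusion counts, as in the sketch below. The function name and the toy labels are hypothetical and do not reproduce the reported results. The toy data have 30% positives and show how two classifiers can reach the same accuracy while differing sharply in sensitivity.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (true positive rate), and specificity
    (true negative rate). Assumes both classes occur in y_true."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Made-up labels: 1 = transferred, 0 = not transferred.
toy_truth = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10
mixed = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]

metrics_a = classification_metrics(toy_truth, always_negative)
metrics_b = classification_metrics(toy_truth, mixed)
```

Both toy classifiers score 0.7 on accuracy, but the one that always predicts "not transferred" has a sensitivity of zero. This is why accuracy alone is not sufficient when transferred patents are the minority class.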

Discussion
The advancement of science and technology has made human life more convenient than ever, but competition in society has also become very intense. In today's technology-intensive market environment, companies strive to survive through sustained growth. Such efforts take a variety of forms, such as alliances in the self-driving car industry. Technology transfer is also a management strategy for maintaining technological competitiveness and sustaining the growth of companies. Previous studies have used surveys or patent data to promote technology transfer, but no systematic model has been suggested.
This study proposes a predictive model of technology transfer based on an ensemble method to support the continuous growth of enterprises and countries. The proposed model can predict the transferability of patents. In the experimental results, the proposed model showed better classification performance than the other models. Companies or research institutes that use the predicted results can select patents with a high potential for transfer, which can increase the success rate of transactions. The capital acquired through technology transfer can be reinvested in activities for the continuous growth of enterprises or research institutes.
In future work, we expect to improve the generalization performance of the model by using various technical data. In addition, it is necessary to develop an additional algorithm that is able to enhance the performance of prediction.

Conclusions
In recent years, intellectual property has become an indispensable element for the sustainable growth of a corporation. Many technology-intensive companies have tried to maintain their competitiveness directly through technological research and development. Because technological development now proceeds through convergence, it is difficult to secure competitiveness using traditional methods. As an alternative, technology transfer is becoming widely used. Technology transfer should be encouraged because not only companies but also the universities and research institutes that developed the technologies can gain opportunities to create new technologies through such transfers.
Previous studies have sought to identify the factors needed to predict technology transfers. In addition, models that consider only quantitative elements or only technical content have been proposed. However, these studies have not addressed a technology transfer prediction model that considers both. In this study, we proposed a methodology for predicting technology transfer to enable more effective technology transfers. LDA was used to take into account the technical content of the collected patent data, and its results were used as variables that represent patent technologies in the proposed model. Quantitative factors of the patents were also extracted. The technology transfer prediction model, based on both the LDA results and the quantitative patent variables, was then built using the AdaBoost algorithm, which is a representative ensemble method. As a result, it was confirmed that the accuracy, sensitivity, and specificity of the proposed model were superior to those of the other methods we compared.
Our study makes it possible to predict technology transfer, and the following advantages are expected. The quantitative elements of patents differ across technology areas. A technology transfer prediction that uses only quantitative patent factors is therefore difficult to generalize. Because the proposed model includes information on the technology, it reflects the technology field and can account for the differences in quantitative factors between technical areas. In addition, the result of the technology transfer prediction can be taken into consideration when evaluating the value of a technology.