Identifying Technology Opportunities for Electric Motors of Railway Vehicles with Patent Analysis

An electric motor is a device that converts electrical energy into mechanical energy for railway vehicles. Electric motors used to be developed simply in terms of the structure or control method of the motor itself, without considering convergence with other devices or technologies. However, as railway vehicles become more advanced, technology development through convergence with other devices or technologies is spreading. Therefore, based on patent data related to the electric motors applied to railway vehicles, this research aims to carry out technology forecasting to establish a research and development (R and D) direction for new technologies by predicting vacant technologies from the point of view of technology convergence. In other words, we studied how to find the vacant technologies in the field of convergence technology for the electric motors of railway vehicles by analyzing the patent data. More specifically, we search for patent data associated with the electric motors of railway vehicles that contain multiple IPC codes, and use the multiple IPC codes to determine the fields of convergence technology. In addition, we extract keywords from the patent data related to each of the determined convergence technologies and define the vacant technologies by interpreting the field of convergence technology and the extracted


Introduction
Railway vehicles have advantages over other vehicles in that they can transport many passengers or goods quickly and over long distances at a relatively low cost. Railway vehicles include a variety of devices and systems, of which the electric motors are directly related to vehicle operation. The electric motors are very important devices in railway vehicles because they control acceleration and deceleration. Therefore, predicting the direction of technological development of the electric motors is essential to predicting the direction of technological development of railway vehicles.
The electric motor areas of the railway vehicles comprise many complex devices or systems. As technology has developed and diversified, the importance of convergence technologies among these devices or systems has increased. In addition, the importance of convergence technologies among the technologies of railway vehicles and technologies from other fields has also increased [1,2]. Therefore, predicting vacant technologies for convergence technologies related to the electric motors is very important to lead the technology areas related to the railway vehicles.
In the past, vacant technologies were forecasted based on expert opinions using Delphi, scenario methods, etc. [3]. However, forecasting by experts may involve subjective judgement, so the conclusions can vary depending on who makes the forecast. To overcome this problem, various studies have forecasted vacant technologies using patent data [4].
Therefore, in this study, we aim to derive an R and D strategy in an objective and quantitative manner by forecasting the vacant technologies in the field of convergence technology related to the electric motors of railway vehicles using patent data. Patent data containing multiple IPC codes are defined as patent data related to convergence technology. That is, we use the patent data containing multiple IPC codes to define the fields of convergence technology in which vacant technologies can be derived. We then derive the vacant technology for each convergence technology field using keywords extracted from the patent data corresponding to that field. Using these derived vacant technologies, an R and D strategy is established for each field of convergence technology related to the electric motors of railway vehicles. This study thus allows the vacant technologies in a field of convergence technology to be derived and an R and D strategy to be established without relying on experts. This paper is organized as follows: Section 2 reviews previous studies on patent analysis and patent maps. Section 3 describes the overall research framework and the specific methods for each step. Section 4 presents the results of analyzing the patent data, and Section 5 presents the conclusions.

Literature Review
The electric motor field of railway vehicles can be defined as comprehensive engineering that requires many complex systems. Companies and research institutions have therefore struggled to establish the direction of R and D, because their survival in the industry depends on this decision making. Previous studies attempted to explore appropriate new ideas that could contribute to the research direction [5,6]. However, these studies are limited in that they were conducted with qualitative approaches or from the perspective of a specific country. Qualitative approaches can yield different interpretations for different researchers because the level of experience or knowledge varies among researchers; thus, the results depend on the researcher's judgement. Studies from the perspective of a specific country are limited in that the technical trend cannot be reflected in practical R and D planning, because the research cannot cover the overall trend of the global industry. Therefore, analysis from a global perspective based on a quantitative approach should be performed.
Patent documents contain textual information and bibliographic information [7]. The textual information (i.e., claims, description, etc.) has been employed to identify the technological details of a specific patent. In many previous studies, text mining has been used to extract significant keywords from patent documents, and a keyword vector has been generated from the extracted keywords [4]. This vector can stand in for a description of the characteristics of the patent. The bibliographic information (i.e., classifications, patent number, application number, etc.) has been employed to infer technological trends based on statistical analysis [8]. Patents must be classified by international patent classification (IPC) codes that represent the technology fields of each patent. If a patent is classified under several IPC codes, it can be interpreted that there are technological associations among them. A large number of patents classified under an IPC code is regarded as indicating high interest in the technology represented by that code.
The concept of the patent map has been employed to extract technical information and managerial implications by analyzing, classifying, and arranging raw patent data. Patent maps relying on simple statistical tools have limited explanatory capacity and operational efficiency due to the complexity of the data [9,10]. Accordingly, machine learning-based approaches have been employed to analyze patent data and develop patent maps that can objectively detect patent vacuums [8]. Patent vacuums are considered vacant fields of technology with high potential to emerge in the near future.
Various researchers, therefore, applied machine learning techniques (i.e., principal component analysis (PCA), self-organizing map (SOM), and generative topographic mapping (GTM)) in developing patent maps [11,12,13]. Machine learning-based patent maps can visualize the data space informatively by reducing multi-dimensional data to two or three dimensions. The PCA-based patent map projects the patent data onto a two-dimensional space by detecting significant principal components based on linear combinations of the variables [11,14]. The principal components account for the most significant variations within the database. However, PCA-based patent maps are limited in that patent vacuums are difficult to define automatically, because there are no rules for defining outliers [13].
SOM-based patent maps place patent information at discrete nodes by mapping multi-dimensional data onto a two-dimensional grid of neurons [15,16]. Visualization with color contrast is applied to show the similarities and differences between nodes, so researchers can detect the location of patent vacuums according to the color scale in the SOM-based patent map. However, because a great deal of time is required to extract significant and valuable ideas for practical R and D planning, this process of detecting patent vacuums has been regarded as a limitation of the SOM-based patent map.
The GTM-based patent map can project a multi-dimensional data space into a low-dimensional latent space and vice versa, and patent documents can be located in the resulting discrete distribution. More importantly, the vectors for patent vacuums in the map can be estimated using the inverse mapping algorithm based on Bayes' theorem [17]. Therefore, the GTM-based patent map can overcome the aforementioned limitations because it can automatically detect patent vacuums and interpret them in an objective way. According to the literature, the GTM-based patent map is a suitable method for quantitatively generating patent maps in that it does not require manual investigation by researchers to interpret a patent vacuum [13].
Therefore, this study employed the GTM algorithm to develop the patent map. As mentioned above, the GTM-based patent map can automatically detect patent vacuum areas. Despite the advantages of machine learning-based algorithms, however, their results alone have been inadequate for researchers to make decisions. In this regard, this study attempted to identify technical relationships in terms of technology convergence. Previous studies approached this through the classification of patents, since a patent can be characterized by its IPC codes. Patent analysis based on classification information has therefore been conducted to predict future technological trends by analyzing technological relationships with social network analysis (SNA) and association rule mining (ARM).
An SNA network can be constructed using the citation or classification information in the patent data. Frequently cited patents can be regarded as more important, since citation information can serve as a qualitative indicator [18]. Patents that are frequently cited in subsequent patents are technically important because the citations imply that new technologies or related patents based on them have been filed. By analyzing the knowledge flow between the citing patent and the cited patent, it is possible to identify the relationships between technologies or researchers [19].
However, citation-based SNA has several drawbacks [20]. First, the citation network merely demonstrates individual connections between two particular patents, which interferes with understanding the overall relationship between all patents. Second, the network can be visualized simply, but it is difficult to interpret the internal relationships between patents or technical functions. Finally, the cited patents have an average time lag of 10 years from the citing patent [21]. Therefore, the latest technical trends cannot be reflected properly, because the latest patents have not had enough time to be cited.
IPC co-occurrence, in which two or more IPC codes are simultaneously assigned to a patent document, has been considered to indicate a relationship between areas of technical knowledge [22,23]. For instance, IPCs that frequently occur together can be interpreted as having a strong relationship. Furthermore, unlike citations, co-occurrence involves no time lag, since the patent number and classification information are registered at the same time. ARM has been applied to identify relationships in patent data according to co-occurrences [24]. ARM mines frequently co-occurring IPCs that have meaningful relationships, and can thus derive patterns of technical information in terms of technology convergence characteristics. However, classic association rules have difficulty reflecting the proper relations. To address this limitation, weighted association rule mining (WARM) tools have been proposed [25].

Methodology
This research aims to explore technology convergence in terms of vacant technologies for the electric motors of railway vehicles. First, we develop an IPC-based patent map by employing the GTM algorithm, which can automatically detect patent vacuum areas. Second, technical relationships for technology convergence are identified by applying WARM to derive associated patterns that are expected to emerge in the future. This yields macroscopic information on promising technology convergence (i.e., IPC-based results). Finally, to grasp specific technical information, a text mining tool is applied to extract technical keywords at a micro level. These keywords can be defined as core technologies with high potential. Consequently, this study suggests that technological competitiveness can be reinforced by proposing technological areas with high development potential, identified through the analyzed promising technologies, and by providing this information to the relevant parties establishing R and D strategies. Figure 1 shows the research framework for finding the vacant technologies in the field of electric motors of railway vehicles. First, we collect patents in the field of electric motors from the USPTO patent database and extract the IPC information of the collected patent documents. Second, the IPC vectors are analyzed to create a GTM-based patent map and derive the patent vacuums. Third, the IPCs of each patent vacuum are analyzed through inverse mapping, and WARM is applied to interpret the relationships between technologies. Finally, we analyze the significance of the derived vacant technologies by applying text mining.

Data Collection and Preprocessing
Many companies file patents with the patent offices of many countries to protect their rights to their technologies. The database in which companies have filed the most patents is that of the USPTO [26]. In this paper, we extracted the patent data on electric motor technology from the USPTO database.
Each extracted patent consists of a title, abstract, IPC codes, images, etc. The IPC code indicates the technology group to which the patent belongs. It consists of a section, class, subclass, main group, and subgroup, from which the technical field of the patent can be identified. In this paper, the IPCs of the patent documents are extracted and used to analyze the fields of convergence technology.
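The hierarchy of an IPC code described above can be illustrated with a small parser. This is only a sketch: the function name `parse_ipc`, the accepted text layout (for example `H02K 7/18`), and the sample code are assumptions made for illustration, not the extraction procedure used in this study.

```python
import re
from typing import NamedTuple


class IPC(NamedTuple):
    section: str      # e.g. "H"
    ipc_class: str    # e.g. "H02"
    subclass: str     # e.g. "H02K"
    main_group: str   # e.g. "7"
    subgroup: str     # e.g. "18"


# Assumed layout: section letter, two-digit class, subclass letter,
# then "main group/subgroup", with optional whitespace in between.
_IPC_RE = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{1,6})$")


def parse_ipc(code: str) -> IPC:
    """Split an IPC symbol into its five hierarchical levels."""
    m = _IPC_RE.match(code.strip().upper())
    if m is None:
        raise ValueError(f"not a recognised IPC symbol: {code!r}")
    section, cls, sub, main_group, subgroup = m.groups()
    return IPC(section, section + cls, section + cls + sub, main_group, subgroup)


example = parse_ipc("H02K 7/18")  # illustrative symbol only
```

Truncating at the `subclass` field gives the four-character level (such as `H02K` or `B60L`) at which technology fields are compared later in the paper.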

Generative Topographic Mapping
GTM, developed by Bishop et al. (1998), is a nonlinear mapping method that projects data from a multi-dimensional space onto a lower-dimensional space [17]. GTM utilizes latent variables to model the correlations between the many variables in the input data [27]. Each grid node, expressed as a latent variable, displays the data assigned to it, and nodes with no data are automatically identified as vacant areas. GTM thus overcomes the limitations of PCA, which requires qualitative detection and definition of blank areas, and of SOM, in which blank areas must be interpreted qualitatively. In particular, GTM, which is nonlinear, can identify the relationship between latent and observed variables through inverse mapping [13]. The basic principle of GTM is to reduce the dimensionality of data in a multi-dimensional space by projecting them onto a low-dimensional latent grid of k × k nodes. As shown in Figure 2, the nodes in the low-dimensional latent grid are represented through radial basis functions (RBFs), which are included in the function y(x; W) that connects the two spaces.

The weight and noise parameters are estimated through the EM algorithm, which consists of two stages, an E-step and an M-step. In the E-step, the current values of the two parameters (the initial values in the first iteration) are used to calculate the responsibilities of the Gaussian mixture model. The responsibilities give the posterior probability distribution over the latent variables for each data point. In the M-step, the parameters are updated using the responsibilities. An error function serves as the objective function for the update, so that the update direction reflects the characteristics of the data.
Finally, for each data item M mapped onto the GTM, the probability that M belongs to node K is calculated, giving the probability matrix R(M, K), that is, the responsibility of node K for item M. This matrix can be used to identify the characteristics of each node and to visualize the GTM [28].
In this paper, we used the IPC codes extracted from the patent data to generate the patent-IPC matrix and then applied GTM. Each IPC code represents one component of the vector, so the resulting IPC vector has N dimensions. The data are mapped onto a two-dimensional latent space; as a result, patents on similar topics, as judged by their IPCs, are grouped at the same nodes, and the vacant technologies can be analyzed through the nodes of the map.
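The E-step and M-step described above can be sketched with NumPy as follows. This is a minimal illustration of the general GTM scheme rather than the implementation used in this study: the function names, the grid and RBF layout, the regularization term `reg`, the initialization, and the iteration count are all assumptions. Here `W` plays the role of the weights in y(x; W), and `beta` is the inverse noise variance.

```python
import numpy as np


def sq_dists(a, b):
    """Pairwise squared Euclidean distances, shape (len(a), len(b))."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)


def grid(n):
    """Regular n x n grid on [-1, 1]^2, returned as an (n*n, 2) array."""
    g = np.linspace(-1.0, 1.0, n)
    return np.array([(u, v) for u in g for v in g])


def fit_gtm(T, k=6, n_rbf=3, n_iter=30, reg=1e-3, seed=0):
    """Fit a small GTM to data T (N x D); n_rbf must be at least 2.

    Returns the latent nodes X (K x 2), their images Y in data space (K x D),
    and the responsibilities R (K x N) from the last E-step.
    """
    T = np.asarray(T, dtype=float)
    N, D = T.shape
    X = grid(k)                                    # latent nodes, K = k*k
    C = grid(n_rbf)                                # RBF centres
    width = 2.0 / (n_rbf - 1)                      # spacing between centres
    Phi = np.exp(-sq_dists(X, C) / (2.0 * width ** 2))
    Phi = np.hstack([Phi, np.ones((len(X), 1))])   # add a bias column

    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(Phi.shape[1], D))
    W[-1] = T.mean(axis=0)                         # bias starts at the data mean
    beta = 1.0 / max(T.var(), 1e-12)               # inverse noise variance

    for _ in range(n_iter):
        # E-step: responsibilities of each node for each data point.
        d = sq_dists(Phi @ W, T)                   # (K, N)
        log_r = -0.5 * beta * d
        log_r -= log_r.max(axis=0, keepdims=True)  # numerical stability
        R = np.exp(log_r)
        R /= R.sum(axis=0, keepdims=True)          # each column sums to 1

        # M-step: solve the regularized weighted least-squares system for W,
        # then re-estimate beta from the weighted reconstruction error.
        G = np.diag(R.sum(axis=1))
        A = Phi.T @ G @ Phi + (reg / beta) * np.eye(Phi.shape[1])
        W = np.linalg.solve(A, Phi.T @ R @ T)
        d = sq_dists(Phi @ W, T)
        beta = N * D / max((R * d).sum(), 1e-12)

    return X, Phi @ W, R
```

In this sketch the rows of the returned `Y` are the data-space images of the latent nodes, which is the quantity read off during inverse mapping, and the columns of `R` are the per-patent responsibilities.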

Weighted Association Rule Mining
Classic ARM has been considered one of the most useful mining tools for discovering patterns based on co-occurrence in a dataset [29]. A dataset contains numerous transactions, each consisting of a set of items, and an association rule is an implication derived from frequent itemsets. In this research, we define a patent as a transaction, and the IPCs classifying the patent as the items included in the transaction. As mentioned above, WARM is applied to overcome the limitations of classic ARM [30]. A weight is assigned to each IPC by counting the number of patents that include that IPC. The transaction weight (TW) is calculated by dividing the total weight of the items in a transaction by the number of items in the transaction. The weighted support of an itemset (WI) is calculated by dividing the sum of the TWs of the transactions containing the itemset by the total weight of all transactions. In this way, we can identify the most strongly associated IPC combinations, which are candidates when planning practical R and D strategies.
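The weighting scheme described above can be written out directly. The following is a sketch under stated assumptions: the function names are hypothetical, the toy patents are invented for illustration, every transaction is assumed to be non-empty, and candidate itemsets are enumerated by brute force rather than with the pruning a production WARM tool would use.

```python
from collections import Counter
from itertools import combinations


def item_weights(transactions):
    """Weight of each IPC = number of patents (transactions) containing it."""
    return dict(Counter(item for t in transactions for item in set(t)))


def transaction_weight(t, weights):
    """TW = total weight of the items in the transaction / number of items."""
    items = set(t)
    return sum(weights[i] for i in items) / len(items)


def weighted_support(itemset, transactions, weights):
    """Sum of TW over transactions containing the itemset / sum of all TW."""
    tws = [transaction_weight(t, weights) for t in transactions]
    need = set(itemset)
    hit = sum(tw for t, tw in zip(transactions, tws) if need <= set(t))
    return hit / sum(tws)


def frequent_itemsets(transactions, min_ws, max_size=3):
    """Brute-force search for itemsets of size 2..max_size above a threshold."""
    weights = item_weights(transactions)
    found = {}
    for size in range(2, max_size + 1):
        for cand in combinations(sorted(weights), size):
            ws = weighted_support(cand, transactions, weights)
            if ws >= min_ws:
                found[cand] = ws
    return found


# Toy data: each set holds the IPC subclasses of one hypothetical patent.
patents = [
    {"H02K", "B60L"},
    {"H02K", "B60L", "B61C"},
    {"B61C", "B60K"},
    {"H02K"},
]
w = item_weights(patents)
ws_example = weighted_support({"H02K", "B60L"}, patents, w)
```

In the toy data, H02K has weight 3, B60L and B61C weight 2, and B60K weight 1, so the four transaction weights are 2.5, 7/3, 1.5, and 3, and the weighted support of {H02K, B60L} is (2.5 + 7/3) / (28/3) = 29/56.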

Text Mining
In this research, core keywords were derived using TF-IDF, a text mining method. TF-IDF extracts the core words of each document by weighting each term based on its term frequency and inverse document frequency. In other words, TF-IDF compares the frequency of a term in each document with its frequency across all documents and weights each word accordingly to determine its importance [31]. Here, $tf_{d,t}$ represents the frequency with which term $t$ appears in a specific document $d$ relative to all words appearing in that document; the higher $tf_{d,t}$, the higher the importance of the term in the document. $df_t$ represents the frequency with which the term appears across all $N$ documents; the higher $df_t$, the more commonly the word is used in documents in general [32]. The TF-IDF algorithm reduces the importance of a word that appears frequently across the whole collection by taking $df_t$ over all $N$ documents into account, as shown in Equation (1) [33]. In other words, TF-IDF assigns low weights to words that appear frequently in all documents and high weights to words that appear frequently in specific documents. Therefore, since the highly ranked words can be regarded as keywords for technologies that have not had many patent applications in each IPC technical field, they can be used as keywords that define the vacant technologies [34].
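The weighting described above can be illustrated with a short sketch. It uses one common textbook form, tf × log(N / df), which is an assumption and may differ in detail from the Equation (1) referred to in the text; the function names and the toy documents are likewise hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Score every term of every document.

    docs: list of token lists. Returns one {term: score} dict per document,
    using relative term frequency times log(N / df).
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def top_keywords(score, n=10):
    """Terms of one document ordered by descending TF-IDF score."""
    return sorted(score, key=score.get, reverse=True)[:n]


# Toy documents standing in for tokenized patent abstracts.
docs = [
    ["motor", "rotor", "cooling", "motor"],
    ["motor", "sensor", "monitor"],
    ["motor", "inverter", "speed"],
]
scores = tf_idf(docs)
```

Because "motor" occurs in every toy document, its score is zero, while a term confined to one document, such as "rotor", keeps a positive score. This is the behaviour the paragraph describes: words common to all documents are down-weighted, and words specific to a few documents are promoted.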

Data Collection and Preprocessing
In this research, we selected keywords related to 'electric motor technology of railway vehicles' through expert advice. Based on the selected keywords, patents related to 'electric motor technology of railway vehicles' filed from 1990 to 2020 were collected from the USPTO. A total of 672 valid patents were selected through noise filtering and used for the analysis. The IPCs included in each patent were classified at the subclass level (four-digit), resulting in a total of 119 IPCs. Because IPC codes are used here to specify the field of technology, we utilize the four-digit IPC codes, which are sufficient for that purpose, instead of the entire IPC code. We calculated the frequency of each IPC, and only the 15 IPCs with a frequency greater than the average were extracted to generate the patent-IPC matrix, as shown in Table 1.
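The preprocessing steps above (truncating each IPC to its subclass, keeping only subclasses with above-average frequency, and building the patent-IPC matrix) can be sketched as follows. The function names and the sample IPC symbols are hypothetical, and the use of a binary (0/1) matrix is an assumption, since the text does not state whether presence or counts were recorded.

```python
from collections import Counter


def subclass(ipc):
    """First four characters of an IPC symbol, e.g. 'H02K 7/18' -> 'H02K'."""
    return ipc.strip().upper()[:4]


def patent_ipc_matrix(patents):
    """Build a binary patent-IPC matrix over above-average subclasses.

    patents: list of lists of IPC symbols, one inner list per patent.
    Returns (kept_subclasses, matrix) with matrix[i][j] = 1 when patent i
    carries subclass kept_subclasses[j].
    """
    per_patent = [{subclass(code) for code in codes} for codes in patents]
    freq = Counter(s for subs in per_patent for s in subs)
    mean_freq = sum(freq.values()) / len(freq)
    kept = sorted(s for s, n in freq.items() if n > mean_freq)
    matrix = [[1 if s in subs else 0 for s in kept] for subs in per_patent]
    return kept, matrix


# Toy input: IPC symbols of four hypothetical patents.
sample = [
    ["H02K 7/18", "B60L 50/50"],
    ["H02K 1/27", "B61C 9/38"],
    ["B60L 3/00", "H02K 9/19"],
    ["B61D 27/00"],
]
kept, matrix = patent_ipc_matrix(sample)
```

In the toy input the subclass frequencies are H02K 3, B60L 2, B61C 1, and B61D 1, so the mean is 1.75 and only B60L and H02K are kept as columns.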

Detection of Patent Vacuum
The GTM algorithm was applied to create a patent map and to derive the patent vacuum areas. To create a GTM-based patent map, the appropriate map size (K) must be determined. If the map is too large, a sparse map is derived, resulting in too many blank areas; if it is too small, blank areas may not be derived [15]. In this research, K values from 5 to 10 were tested to determine the optimal value. As shown in Figure 3, the optimal K was 6, and the experiment resulted in a total of 15 blank nodes. Through inverse mapping, GTM can express each blank node as a new vector in the IPC space. The process of inverse mapping is shown in Figure 4. For each blank node, the IPCs with a probability value of 0.4 or higher were determined to be the technical fields associated with the vacant technology. The IPCs for each patent vacuum derived from the inverse mapping are shown in Table 2.
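The detection of blank nodes and the 0.4 threshold described above can be sketched as follows. The function names and toy values are hypothetical. Assigning each patent to the node with its highest responsibility is an assumption, since the text does not specify the assignment rule; the data-space vector of each node is treated as per-IPC probabilities, as in the text.

```python
def blank_nodes(R):
    """Indices of nodes to which no patent is assigned.

    R[k][n] is the responsibility of node k for patent n; each patent is
    assigned to the node with the highest responsibility (an assumption).
    """
    n_nodes, n_patents = len(R), len(R[0])
    occupied = {
        max(range(n_nodes), key=lambda k: R[k][n]) for n in range(n_patents)
    }
    return [k for k in range(n_nodes) if k not in occupied]


def node_ipcs(Y, ipc_names, nodes, threshold=0.4):
    """IPCs whose inverse-mapped value at a node reaches the threshold.

    Y[k] is the data-space vector of node k, one value per IPC.
    """
    return {
        k: [name for name, value in zip(ipc_names, Y[k]) if value >= threshold]
        for k in nodes
    }


# Toy example: 3 nodes, 2 patents, 3 IPC subclasses.
R = [[0.90, 0.10],
     [0.05, 0.80],
     [0.05, 0.10]]
Y = [[1.00, 0.00, 0.10],
     [0.00, 0.90, 0.20],
     [0.50, 0.45, 0.10]]
names = ["B60L", "H02K", "B61C"]
vacant = blank_nodes(R)
vacant_ipcs = node_ipcs(Y, names, vacant)
```

In the toy example only node 2 receives no patent, and its inverse-mapped vector exceeds the threshold for B60L and H02K, so that pair would be read as the technical fields of the vacant area.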

Identifying Technology Convergence Field
As shown in Table 3, the technical fields of the vacant technologies derived from the GTM-based patent map are B60K, B60L, B61C, B61D, B61F, H01F, H02K, and H02P. WARM was applied to these IPCs, and the itemsets with a weighted support of 0.006 or higher were retained.

Investigation into Technology Opportunity
The patent data were reclassified according to the itemsets derived by weighted association rule mining. Itemset 1 selects the patent data whose IPC codes contain H02K and B60L, and Itemset 2 those containing B61C and B60K. Itemset 3 selects the patent data containing B61C, H02K, and B60L, and Itemset 4 those containing B61F, H02K, and B60L. Itemset 5 selects the patent data containing B61F and B60L, Itemset 6 those containing B61F, B61C, and B60L, and Itemset 7 those containing B61C and H02K.
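The reclassification above amounts to a subset test on each patent's IPC subclasses. The sketch below encodes the seven itemsets as listed in the text; the function name, the patent identifiers, and the toy data are hypothetical.

```python
# The seven itemsets as listed in the text.
ITEMSETS = {
    1: {"H02K", "B60L"},
    2: {"B61C", "B60K"},
    3: {"B61C", "H02K", "B60L"},
    4: {"B61F", "H02K", "B60L"},
    5: {"B61F", "B60L"},
    6: {"B61F", "B61C", "B60L"},
    7: {"B61C", "H02K"},
}


def group_by_itemset(patents, itemsets=ITEMSETS):
    """Map each itemset number to the patents containing all of its IPCs.

    patents: {patent_id: set of IPC subclasses}. A patent may fall into
    several itemsets.
    """
    return {
        number: sorted(pid for pid, ipcs in patents.items() if required <= ipcs)
        for number, required in itemsets.items()
    }


# Hypothetical patents keyed by made-up identifiers.
toy_patents = {
    "P1": {"H02K", "B60L"},
    "P2": {"B61C", "H02K", "B60L"},
    "P3": {"B61F", "B60L"},
}
groups = group_by_itemset(toy_patents)
```

A patent that contains the IPCs of a three-member itemset also falls into every two-member itemset it covers, so the groups overlap; in the toy data, P2 belongs to Itemsets 1, 3, and 7.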
Text mining was then applied to the summary and claim information of the patent data sorted by itemset to derive detailed technology keywords. Table 4 shows the top 10 keywords for each itemset derived by text mining.

Results
The IPC codes associated with the vacant technologies derived from the GTM-based patent map are defined as shown in Table A1 of Appendix A. The multiple IPC codes of each itemset were used to analyze its field of technology. Using the keywords extracted for each itemset, the vacant technology in the defined technology field was derived. In deriving the vacant technology, unrelated keywords were excluded.
Itemset 1 is a field of convergence technology related to H02K and B60L, and the top 10 keywords extracted are filter, transducer, suspension, valve, flashover, coil, chain, multivoltage, sensor, and monitor. Of the IPC codes associated with Itemset 1, H02K covers rotary electric machinery, including electric motors, and B60L covers electrical propulsion, power supply, charging, control, safety devices, and electrical braking in vehicles. Therefore, the technology related to Itemset 1 can be seen as a convergence technology for electrical propulsion, power supply, charging, control, safety devices, and electrical braking using the electric motors of railway vehicles. Based on the keywords extracted for Itemset 1, the vacant technology can be seen as a technology for monitoring components such as filters, transducers, valves, and coils using sensors. Therefore, the vacant technology derived from the IPC codes and keywords related to Itemset 1 is the sensor-based monitoring of components such as filters, transducers, valves, and coils in the field of technology related to electrical propulsion, power supply, charging, control, safety devices, and electrical braking using the electric motors of railway vehicles.
Itemset 2 is a field of technology related to B61C and B60K, and the top 10 keywords extracted are rotor, highspeed, hollow, shaft, cooling, winding, yoke, linear, magnet, and supply. B61C is a technology for power devices applied to railway vehicles, and B60K is a technology for the deployment or control of power transmission devices in vehicles. Therefore, the technology related to Itemset 2 can be seen as a convergence of the power devices applied to railway vehicles and the power transmission devices that transmit power from those power units. Based on the keywords extracted for Itemset 2, the vacant technology can be seen as a high-speed cooling technology for permanent magnet motors. Therefore, the vacant technology derived from the IPC codes and keywords related to Itemset 2 is the high-speed cooling technology for permanent magnet motors in the field of technology related to the power devices or the power transmission devices of the railway vehicles.
Itemset 3 is a field of technology related to B61C, H02K, and B60L. The top 10 keywords extracted are speed, supercapacitor, inverter, diesel, asynchronous, reducer, wiring, magnet, permanent, and small. B61C is a technology for power devices applied to the railway vehicles. H02K is a technology for rotary-electric machinery, including the electric motors. B60L is a technology for electric propulsion, power supply, charging, control, safety devices, and electrical braking in vehicles, including railway vehicles. Therefore, the technology related to Itemset 3 can be seen as a convergence technology for electric propulsion, power supply, charging, control, safety devices, and electrical braking using the electric motors of the railway vehicles. Based on the keywords extracted for Itemset 3, the vacant technology can be seen as a speed control technology and a miniaturization technology for permanent magnet motors. Therefore, the vacant technology derived from the IPC codes and keywords related to Itemset 3 is the speed control technology and the miniaturization technology for permanent magnet motors in the field of technology related to electric propulsion, power supply, charging, control, safety devices, and electrical braking using the electric motors of the railway vehicles.
Itemset 4 is a field of technology related to B61F, H02K, and B60L. The top 10 keywords extracted are vehicle, lightweight, synchronous, rail, reduction, schematic, dieselelectric, copper, surface, and aluminum. B61F is a technology for the suspensions, vehicle protection, and obstacle removal of the railway vehicles. H02K is a technology for rotary-electric machinery, including the electric motors, and B60L is a technology for electric propulsion, power supply, charging, control, safety devices, and electrical braking in the railway vehicles. Therefore, the field of technology related to Itemset 4 can be seen as a convergence of the technology related to the suspension, vehicle protection, and obstacle removal of the railway vehicles and the control technology of the electric motors. Based on the keywords extracted for Itemset 4, the vacant technology can be seen as a weight reduction technology for railway vehicles that uses alternative materials such as aluminum. Therefore, the vacant technology derived from the IPC codes and keywords related to Itemset 4 is the technology to lighten the suspensions, vehicle protection, and obstacle removal devices of the railway vehicles using alternative materials, such as aluminum, in order to improve the performance of electric propulsion, power supply, charging, control, safety devices, and electrical braking in the railway vehicles.
Itemset 5 is a field of technology related to B61F and B60L. The top 10 keywords extracted are alternation, aluminum, carrier, construction, frame, gate, intervals, legs, light, and motor. B61F is a technology for the suspensions, vehicle protection, and obstacle removal of the railway vehicles. B60L is a technology for electric propulsion, power supply, charging, control, safety devices, and electrical braking in the railway vehicles. Therefore, the field of technology related to Itemset 5 can be seen as a convergence of the technology related to the suspension, vehicle protection, and obstacle removal of the railway vehicles and the control technology of the electric motors. Based on the keywords extracted for Itemset 5, the vacant technology can be seen as a weight reduction technology for railway vehicles that uses alternative materials such as aluminum. Therefore, the vacant technology derived from the IPC codes and keywords related to Itemset 5 is the technology to lighten the suspensions, vehicle protection, and obstacle removal devices of the railway vehicles using alternative materials, such as aluminum, in order to improve the performance of electric propulsion, power supply, charging, control, safety devices, and electrical braking in the railway vehicles.
Itemset 6 is a field of technology related to B61F, B61C, and B60L. The top 10 keywords extracted are levitation, track, induction, frame, bolster, arc, passive, connecting, vertical, and running. B61F is a technology for the suspensions, vehicle protection, and obstacle removal of the railway vehicles. B61C is a technology for power devices applied to the railway vehicles, and B60L is a technology for electric propulsion, power supply, charging, control, safety devices, and electrical braking in the railway vehicles. The field of technology defined by B61F, B61C, and B60L is analyzed as a convergence of the technologies related to the suspension of the railway vehicles and the control technology of the power devices applied to the railway vehicles. Based on the keywords extracted for Itemset 6, the technology of a magnetic levitation train with a power unit, or the technology related to the surface or frame of the railway vehicle, is analyzed as the vacant technology. Therefore, the vacant technology derived from the IPC codes and keywords related to Itemset 6 is the control technology of the magnetic levitation train with the electric motor, or a technology related to the surface or the frame of the magnetic levitation train.
Itemset 7 is a field of technology related to B61C and H02K. The top 10 keywords extracted are damper, decompression, drawing, motor, spring, steep, synchronizer, track, trolley, and wheel. B61C is a technology for power devices applied to the railway vehicles, and H02K is a technology for rotary-electric machinery, including the electric motors. H02K covers the electric motors used in various devices, and B61C covers the power units used in the railway vehicles. The technical field common to H02K and B61C is therefore analyzed as the electric motors fitted to devices that include the railway vehicles. Accordingly, Itemset 7 is analyzed as a single technology related to the electric motor, not as a technology that combines motor-related technology with technology from other fields. Because this does not conform to the subject matter of this research, Itemset 7 was excluded from the results without analysis of the extracted keywords.

Conclusions
This research explored the patent data and looked for ways to derive the vacant technology in the field of convergence technology for the electric motor of the railway vehicles. Unlike other studies that analyzed vacant technologies alone, we derived vacant technologies from the perspective of convergence in order to identify promising future technologies that are further advanced. By applying the GTM methodology and WARM, we extracted IPC codes and defined the field of convergence technology using the extracted IPC codes. In addition, we extracted keywords from the patents related to each of the determined convergence technologies by using text mining. Lastly, we defined the vacant technologies by interpreting the field of convergence technology and the extracted keywords.
The promising technologies defined through the patent analysis are technologies that apply elastic materials to make vehicle frames lighter and smaller, and technologies that control objects and speed through monitoring. They also include technology that applies permanent magnet motors, technology to transfer power at high speeds, and cooling technology. These promising technologies show that technologies related to small size, light weight, permanent magnets, cooling, monitoring, low-loss materials, and individual control can be core technology keywords in the future. This paper confirmed that the miniaturization and weight reduction of electric motors is a global trend. The analysis indicates that it is appropriate to adopt a permanent magnet motor rather than an induction motor for a compact, lightweight motor. In order to secure the safety and reliability of the vehicle, the analysis also indicates that R and D is necessary to establish a system for real-time condition monitoring of electric motors. The fields related to these technologies have a high potential for development as vacant technologies. Applying the analyzed results to derive new technologies can provide a competitive advantage in related fields. This study analyzes promising future technologies, provides information on those technologies to support the establishment of R and D strategy, and offers insights to researchers, strategy makers, and decision makers. It can help in inventing new technologies and in planning technology development.
However, a limitation of this research is that even a field of technology that includes multiple IPC codes must be considered a single technology, not a convergence technology, when those IPC codes are closely related to one another. Nevertheless, this research is meaningful in that, instead of relying on the subjective judgments of experts to identify the vacant technology, IPC codes and keywords extracted from patent data can be used to find the vacant technology in the field of convergence technology.
Author Contributions: Y.C. performed data curation, investigation, methodology, validation, and writing the original manuscript; Y.J.H. contributed to conceptualization, funding acquisition, investigation, and project administration; J.H. contributed to conceptualization, investigation, and formal analysis; J.Y. and S.K. both contributed to data curation and formal analysis; C.L. contributed to conceptualization, investigation, project administration, supervision, and editing the manuscript; S.L. and K.P.Y. contributed to funding acquisition and idea conception. All authors have read and agreed to the published version of the manuscript.