Patent Data Analytics for Technology Forecasting of the Railway Main Transformer

Abstract: The railway main transformer is considered one of the most important pieces of electrical equipment for trains. Companies and research institutes around the world are striving to develop high-performance railway main transformers. To be the first mover in railway main transformer technology, companies and research institutes should predict vacant technology based on an analysis of promising detailed technology areas. Therefore, this study provides a patent analysis that predicts vacant technologies based on identified promising IPC technology areas. To identify promising detailed IPC technology areas, technology mapping analysis, time series analysis, and social network analysis are conducted based on the patent-IPC matrix, extracted from the data of 707 patents from the patent databases of Korea, China, Japan, United States, Canada


Introduction
The railway main transformer is one of the heaviest and bulkiest pieces of a train's electrical equipment. To meet the requirements for high-speed rail, passenger comfort, open space for passengers, and a heightened level of supplemental power, the railway main transformer should have technological innovations such as high power, high efficiency, and reduced weight and volume [1,2]. In recent years, many companies and research institutes all over the world have been investing in the development of railway main transformer technology. To be competitive in the market, companies need technology that is more advanced than conventional technology [3,4]. In these circumstances, more creative and progressive transformer technologies, which bring increased efficiency and reduced costs, will help to improve railway competitiveness in the future market. Considering the current market situation, in which the technological importance of railway main transformers is increasing, companies need to secure technological competitiveness through investment in technology development. Therefore, it is necessary to strategically establish an investment direction.
Technology forecasting through patent analysis is a method that is widely used in establishing future strategies for technology development [5]. Patents can provide a variety of valuable information about researched and developed technologies, including indicators of technology trends [6,7]. In addition, a patent includes information on the applicant and on the period and date of application, which can be used to identify competitors in the market through technology trend analysis [8]. Patent analysis that predicts vacant technology based on promising IPC technology areas can be useful in establishing technology development strategies for targeting markets and promising technologies in railway main transformers. However, the previous research literature mainly focused on qualitative methods based on expert evaluations and opinions, and it did not consider quantitative patent analysis and technology trends at the same time. The qualitative evaluation method also has the disadvantage of incurring considerable time and cost.
To overcome the above limitations, this study aims to provide an analysis framework that simultaneously performs technology trend analysis and predicts vacant technology based on promising detailed IPC technology areas identified through quantitative patent analysis. In the current situation, where many companies and research institutes are actively developing railway main transformers, we thereby intend to contribute an analysis framework for R&D strategy development based on patent data.
The analysis process of the paper is as follows. First, patent data on railway main transformer technology are analyzed to obtain the results of a technology trend analysis. Then, to identify promising detailed IPC technology areas, we obtain technology mapping analysis, time series analysis, and social network analysis results based on the patent-IPC matrix extracted from the patent data. Next, through GTM analysis of the promising detailed IPC technology areas, we obtain the vacant technology nodes and the analysis target nodes surrounding them. Lastly, we predict the groups of vacant technology using the patent information contained in each of the analysis target nodes, and we discuss the vacant technology of railway main transformers based on our analysis results. The remainder of this paper is organized as follows. First, a review of the prior literature related to this study is provided in Section 2. Then, the analysis methodology used in this study is presented in Section 3, and the analysis results are presented in Section 4. Finally, Sections 5 and 6 conclude the paper by discussing the analysis results and presenting the contributions and limitations of the study.

Literature Review
Patents are a valuable source of technological information and a measure of the research and development activities of companies and research institutes around the world [9]. The World Intellectual Property Organization (WIPO), an international organization that discusses policies and institutions related to intellectual property, describes patent applications as a process to protect inventions internationally [10]. Patent analysis is a valuable method for gathering information about new technologies and for measuring the impact of research and development activities [9]. Intellectual property (IP) rights are a form of monopoly rights and are considered a key strategic resource for businesses to improve their overall competitiveness. As carriers of intellectual property, patents allow organizations to use and exploit their inventions in a unique way [11]. Patent rights are guaranteed for 20 years from the date of filing. This means that a company can hold onto a specific technology for a long time [12]. Patent data provide valuable information about technology, geography, and applicants [13]. Moreover, patent data provide a valuable source of information about the latest technological developments and potential business opportunities [9]. Therefore, it is important to further explore a technology in real or potential applications by analyzing the related patents [14].
Patent information consists of unstructured and structured data [15]. Patent data are organized in a structured form, which makes it easy to find and access specific information [9]. Patent analysis is a common method used to evaluate technological innovations and provide insights into their development status [16]. In addition, patent analysis is an important method for technology monitoring because it can show how innovation is developing in a specific sector [17]. Patent analysis can trace the emergence of technology from the past to identify recent technological changes [18]. By analyzing the information recorded in patent documents, it is possible to identify technology trends, promising detailed technology areas, and the competitors related to a technology [19]. Patents can be analyzed by using the IPC class, which describes the detailed technology area of a patent, to extract the number of applications and inventors per IPC class as time series data, and by measuring the technology level of patents using the family size and the number of citations [20].
The prior academic literature discussed analysis methods that use patent documents to visualize valuable information in the market and thereby understand the R&D status of competitors. In addition, technology trend analysis can be used to establish R&D plans by predicting promising technologies [21]. Moreover, previous patent research has used text information, such as titles and abstracts, to analyze patents. For example, previous patent research extracted keywords to analyze development patterns [22][23][24][25][26] and create technology maps [27][28][29][30][31][32]. In addition, Noh et al. extracted the term frequency-inverse document frequency (TF-IDF) using keywords and IPCs included in the patent information for patent analysis [33]. Traditional keyword-based patent analysis cannot always capture the contextual meaning of words in a broad technology sector [34]. To overcome the limitations of existing analysis methods, our analysis is performed based on the IPC of the patent. The IPC standard taxonomy is used to sort, organize, classify, determine, and search patent documents [35]. The IPC (International Patent Classification), which appears as a field in patent documents, was originally established by the Strasbourg Agreement of 1971. IPC information is currently available from the WIPO (World Intellectual Property Organization). The ninth edition of the IPC was released on 1 January 2009. The IPC structure is well designed and has a high level of public confidence. The classification has a five-level hierarchical structure, with sections at the top, classes below sections, subclasses below classes, main groups below subclasses, and subgroups below main groups. The hierarchy contains 8 sections, 120 classes, 600 subclasses, and around 70,000 groups. Of these groups, about 7000 are main groups, and the remaining 63,000 or so are subgroups. Taking the IPC code "H01L 27/18" as an example of the hierarchical structure, "H" is the section, "01" is the class, "L" is the subclass, "27" is the main group, and "18" is the subgroup. IPC codes can be thought of as subject labels for the content of a patent document. IPC codes are used to enrich the retrieval method for patent documents [21,[30][31][32]34,36].
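The hierarchy described above can be illustrated with a short Python sketch that splits a full IPC symbol into its five levels. This is only an illustration: the names `IPCCode` and `parse_ipc` and the regular expression are assumptions introduced here and are not part of the study.

```python
import re
from typing import NamedTuple


class IPCCode(NamedTuple):
    """The five hierarchical levels of an IPC symbol (hypothetical helper type)."""
    section: str
    ipc_class: str
    subclass: str
    main_group: str
    subgroup: str


# Assumed layout: section letter, two-digit class, subclass letter,
# main group digits, a slash, and subgroup digits, e.g. "H01L 27/18".
_IPC_PATTERN = re.compile(
    r"^\s*([A-H])\s*(\d{2})\s*([A-Z])\s*(\d{1,4})\s*/\s*(\d{2,6})\s*$"
)


def parse_ipc(code: str) -> IPCCode:
    """Split a full IPC symbol into section, class, subclass, main group, subgroup."""
    match = _IPC_PATTERN.match(code.upper())
    if match is None:
        raise ValueError(f"not a full IPC symbol: {code!r}")
    return IPCCode(*match.groups())


parsed = parse_ipc("H01L 27/18")
print(parsed)
```

Joining the first three fields gives the four-character subclass code ("H01L"), which is the level of detail used later for the patent-IPC matrix.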
Patents provide information about relationships between different technologies, which can help companies to choose the most effective technologies for their needs. This information can also help firms make better decisions about which technologies to develop or use. Technology forecasting helps companies make informed decisions, prioritize R&D departments, and leverage knowledge [37]. Previous literature used qualitative and quantitative analysis methods to predict promising technology using patent data. The quantitative analysis methods included "(A) Time Series Analysis (TSA)" and "(B) Social Network Analysis (SNA)". First, through "(A) Time Series Analysis (TSA)", the frequency of patent applications by technology was extracted as time series data to predict promising future technology [21,24,[28][29][30][31]34,[38][39][40][41][42][43][44]. Then, through "(B) Social Network Analysis (SNA)", the relationships between nodes were extracted as quantitative indicators in the form of centrality indices, and promising detailed technology areas were predicted based on the extracted indicators [22,[24][25][26][27][28][29][30][31][32]34,39,40,43,[45][46][47][48][49][50]. In addition to the quantitative analysis methods, the qualitative analysis method of "(C) Technology Mapping Analysis (TM)" was also used in previous studies. Through "(C) Technology Mapping Analysis (TM)", the technology level of each area was obtained, and priorities for technology development were identified [21,24,[29][30][31]38,40,[45][46][47][48][49][50][51]. In addition to predicting promising technology, previous studies have identified vacant technology by analyzing the patent information of analysis target nodes through "(D) Generative Topographic Mapping Analysis (GTM)" [24,27,28,40,52]. Table 1 summarizes the prior literature that uses patent analysis for technology forecasting. The methodology column uses the letters (A) to (D) introduced above.
In addition, recent research has addressed technology analysis and prediction for railways using patent data. Xiao analyzed patent applications related to railway control system technology and summarized the status of various train control system technologies, but the direction of innovation still needs to be derived through a wider range of analysis methods [59]. Du et al. attempted to analyze the market by analyzing the invention patents filed by Thales SEC Transportation in the field of train signal control technology, but the study only identified trends [53]. Liu and Yang attempted to explore the technological innovation capabilities (TICs) of high-speed railways (HSRs) using patent information, but they made only limited use of that information [51]. Peng and Yu accessed patent data from the patent information platform of China's Intellectual Property Network and calculated the quantity and composition of patents held by China's two railway systems, but the analysis is biased toward a single country [55]. Sun et al. looked at how China's high-speed railway (HSR) construction has led to knowledge ripple effects. Using patent information, they sought to show that HSR stimulated the growth of new businesses and patent applications, but they considered only the single impact of variables [41]. Hanley et al. attempted to understand the effect of high-speed railways on patent cooperation between companies in Chinese cities, but the analysis used only the simple frequency of patent applications [42]. Zhang and Zhang studied the patent activities of vibration reduction control technology in the field of high-speed railway vehicle systems applied in China, but only a bibliographic analysis was conducted [58]. By mining patent data on magnetic levitation (linear motor car) transportation technology, Gou analyzed patents to understand global competitive trends in magnetic levitation transport technology, but only trends were analyzed [56]. Zhang and Zhang analyzed patents from the railway industry in the international patent database produced by the EPO (European Patent Office), but the variables used in the analysis were limited [57]. Cho et al. used patent data to predict vacant technology in electric motor technology applied to railway vehicles and to derive R&D strategies, but only through qualitative analysis [52]. Karasev et al. researched the technology trends of foreign railways using bibliometric patent analysis, but the study is limited to bibliographic analysis [54]. Moreover, Salmi and Torkkeli researched the patent trends of utilizing satellite navigation systems in the railway industry, but the study is likewise limited to bibliographic analysis [60].
To meet the requirements for high-speed rail, passenger comfort, open space for passengers, and a heightened level of supplemental power, the railway main transformer should have technological innovations such as high power, high efficiency, and reduced weight and volume [1,2]. Based on the above, the analysis of patents related to railway main transformer technologies is essential for technological innovation. Nevertheless, the topic of railway main transformer technology has not yet been researched. Therefore, forecasting vacant technology based on promising detailed technology areas through patent analysis of railway main transformers is a suitable topic for research. Furthermore, existing studies lack in-depth and varied approaches to detailed railway technology sectors. To overcome this problem, combining the existing patent analysis methods and indicators with other in-depth approaches, such as patent-IPC matrix extraction, technology mapping analysis, time series analysis, social network analysis, and generative topographic mapping, would make this research more comprehensive. In this study, we discuss vacant technology in the railway main transformer through technology trend analysis using patent data and predict vacant technology based on promising detailed IPC technology areas identified through quantitative patent analysis.

Research Framework
Figure 1 shows that the patent analysis is conducted in three stages to identify the vacant technology of the railway main transformer, which is predicted based on promising detailed IPC technology areas identified through quantitative patent analysis. First, we collected the patent data to be analyzed for main transformer technology based on specific search keywords extracted from the opinions of railway experts. Collected patent data that were not related to the railway main transformer were excluded from the analysis. Then, the unstructured data of the collected patent data were preprocessed to enable effective data analysis [61,62]. Since raw unstructured data include noise that is unnecessary for the analysis, unstructured data unrelated to the data analysis were removed through preprocessing, and the valid unstructured data were then extracted [63]. In addition, we obtained the results of a technology trend analysis from the collected patent data to understand technology trends for the railway main transformer.
Secondly, the patent-IPC matrix was extracted from the patent data by text mining. Then, the results of technology mapping, time series, and social network analysis were derived to identify promising detailed IPC technology areas [64].
Lastly, based on the obtained promising detailed IPC technology areas, the study forecasted vacant technology. Generative topographic mapping (GTM) was used to predict vacant technology [65,66]. Through GTM analysis of the promising detailed IPC technology areas, we obtained the vacant technology nodes and the analysis target nodes surrounding them. To predict vacant technology, we analyzed the patent data contained in each analysis target node. Then, based on the results of the patent analysis, the vacant technology of the railway transformer was discussed. Table 2 shows the position of our analysis method in the relevant prior literature on technology forecasting in the railway field. Our study provides a more diversified analysis framework than the previous literature.

Preprocessing and Patent-IPC Matrix Extraction
The patent-IPC matrix is a matrix representation of the frequency with which each IPC code appears in each patent. The frequency of the IPC code appearing in each patent is expressed as a matrix value, as demonstrated in the example shown in Figure 2. The patent-IPC matrix makes it possible to compare the IPC codes of different patents quantitatively, and promising detailed IPC technology areas can be derived by analyzing the various indicators of the patent data corresponding to each IPC code. Text data must be preprocessed before analysis to improve the accuracy of the analysis [67]. Preprocessing is required even for patent documents, which are legal documents with many professional legal terms and abbreviations [68].
Preprocessing of text data is the process of converting text data from patent documents into a format suitable for analysis by cleaning the text and removing unnecessary words [69]. The IPC information of patent data, which specifies the detailed technology, may contain punctuation, special symbols, and other details that must be cleaned up before analysis. By preprocessing the patent IPC data, it is possible to build a patent-IPC matrix that shows which IPCs are included in each patent document. For example, if the IPC code contained in the patent data is extracted as "H01L 27/18", the unnecessary special symbols (", /) are removed, and the code is reduced to its first four characters (H01L), corresponding to the section (H), class (01), and subclass (L) of the code. This process was carried out for the IPC data specifying detailed technology. We extracted keywords using the DTM (document-term matrix) function in R and converted them into a patent-IPC matrix, as shown in Figure 2.
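The preprocessing and matrix construction described in this subsection can be sketched in a few lines of Python. The study itself used R; the sketch below is only an assumed equivalent, and the function names `to_subclass` and `build_patent_ipc_matrix`, as well as the three sample patents and their IPC codes, are hypothetical illustrations rather than data from the study.

```python
import re
from collections import Counter


def to_subclass(raw_code: str) -> str:
    """Remove quotes, spaces, and symbols, then keep the first four characters."""
    cleaned = re.sub(r"[^A-Za-z0-9]", "", raw_code).upper()
    return cleaned[:4]


def build_patent_ipc_matrix(patent_ipcs):
    """Return (column labels, rows) where each row holds IPC frequencies for one patent."""
    counts = {
        patent_id: Counter(to_subclass(code) for code in codes)
        for patent_id, codes in patent_ipcs.items()
    }
    columns = sorted({ipc for counter in counts.values() for ipc in counter})
    # A Counter returns 0 for IPC codes that a patent does not contain.
    rows = [[counts[patent_id][ipc] for ipc in columns] for patent_id in patent_ipcs]
    return columns, rows


# Hypothetical sample input: raw IPC strings as they might appear after extraction.
patents = {
    "patent_1": ['"H01F 27/08"', "H01F 27/28", "B60L 9/00"],
    "patent_2": ["H02M 7/48", "B60L 9/24"],
    "patent_3": ["H01F 30/12"],
}
columns, matrix = build_patent_ipc_matrix(patents)
for patent_id, row in zip(patents, matrix):
    print(patent_id, dict(zip(columns, row)))
```

Each row of `matrix` corresponds to one patent and each column to one four-character IPC code, which is the layout assumed by the later analysis sketches.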

Technology Level Evaluation of IPC
Qualitative and quantitative analyses were used in combination to evaluate the technology level of the IPC. Qualitative evaluation is a method of evaluating the relative level of technology from the information in patent data [70]. We can identify the technology level by applying the technology mapping analysis [71]. In addition, through quantitative evaluation, centrality indicators and time series were extracted from each detailed IPC technology area, and the results of the analysis were confirmed through social network analysis and time series analysis [72,73].

Technology Mapping Analysis
Qualitative analysis of the quality of technology and its market securing ability was conducted by measuring the PFS (patent family size) and CPP (cites per patent) values. The patent family refers to all patents related to an application that is filed in foreign countries based on a patent filed in the applicant's own country [74][75][76]. An applicant who plans to enter the global market with a technology applies for patents in various countries through domestic and international patent bodies, so a large number of family patents indicates high marketability [77][78][79][80]. PFS is calculated as in Equation (1).
CPP is a metric that quantifies how frequently each patent is cited [81]. A higher number of citations indicates that the technology possesses a high level of qualitative value, and a low number of citations indicates that the technology has a low level of qualitative value [82]. Higher levels of citations indicate that the patent is likely to be significant, so citations can serve as an indicator of patent quality [83]. CPP (cites per patent) is calculated as in Equation (2):

$$\mathrm{CPP} = \frac{\text{number of citations received by the patents}}{\text{number of patents}} \tag{2}$$

In this study, to assess the technological level of each IPC, we performed a technology level evaluation through technology mapping analysis. There are four quadrants in which a technology can be located: the 1st, 2nd, 3rd, and 4th. In each quadrant, different technology areas can be found [84]. In the first quadrant, both the PFS and CPP indicators are high, which means the level of the technology is high. In the second and third quadrants, one of the PFS and CPP indicators is high, which means the level of the technology is medium. Finally, in the fourth quadrant, both indicators are low, and the level of the technology is defined as low. Figure 3 below is an example of a technology mapping analysis. The detailed IPC technology area "IPC_4" falls in the first quadrant, the detailed IPC technology areas "IPC_1,2,3" fall in the second and third quadrants, and the detailed IPC technology area "IPC_5" falls in the fourth quadrant.
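The quadrant logic above can be sketched as follows. Several points in this sketch are assumptions rather than the study's method: PFS is taken here as the average family size of the patents in an IPC area, the cut-off between "high" and "low" is taken as the mean of each indicator over all areas, and the function names and sample values are hypothetical.

```python
from statistics import mean


def indicators(patents):
    """patents: list of (family_size, citation_count) pairs for one IPC area."""
    pfs = mean(family for family, _ in patents)               # assumed PFS definition
    cpp = sum(cites for _, cites in patents) / len(patents)   # cites per patent
    return pfs, cpp


def technology_levels(areas):
    """Label each IPC area as high, medium, or low from its PFS and CPP values."""
    scores = {ipc: indicators(patents) for ipc, patents in areas.items()}
    # Assumed cut-offs: the mean of each indicator across all IPC areas.
    pfs_cut = mean(score[0] for score in scores.values())
    cpp_cut = mean(score[1] for score in scores.values())
    levels = {}
    for ipc, (pfs, cpp) in scores.items():
        high_count = (pfs >= pfs_cut) + (cpp >= cpp_cut)
        levels[ipc] = ("low", "medium", "high")[high_count]
    return levels


# Hypothetical sample values, not taken from the study.
areas = {
    "IPC_1": [(2, 10), (2, 6)],
    "IPC_2": [(6, 1), (6, 3)],
    "IPC_4": [(8, 12), (6, 8)],
    "IPC_5": [(1, 0), (1, 2)],
}
levels = technology_levels(areas)
print(levels)
```

Returning level labels instead of quadrant numbers avoids having to decide which of the two "medium" quadrants corresponds to a high PFS and which to a high CPP, a detail the text does not specify.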

Time Series Analysis
Time series analysis is a statistical analysis method based on time series data for forecasting technology trends [85]. Time series data consist of an independent variable representing a time point and a dependent variable representing the frequency, from which the future trend can be predicted [86,87]. Among the models for analyzing time series data, the autoregressive moving average (ARMA) model is a representative one [88,89]. The autoregressive integrated moving average (ARIMA) model is a generalization of the ARMA model. The ARIMA model can describe current and future time series values using past observations and errors. In this study, we predicted the future trend of each detailed IPC technology area through time series analysis using ARIMA models, after converting the yearly frequency of appearance of each IPC into time series data. As shown in Figure 4, each detailed IPC technology area was classified as hot, active, or cold according to its predicted future trend.
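As a minimal sketch of this step, the Python code below converts patent records into a yearly IPC frequency series, fits the simplest non-trivial ARIMA variant, ARIMA(1,1,0) with a constant, by least squares on the first differences, and labels the forecast. The model order, the least-squares estimator, the 10% band used to separate hot, active, and cold, and all function names are assumptions made for illustration; the study does not state these details in this section.

```python
from collections import Counter


def yearly_counts(records, ipc, first_year, last_year):
    """Turn (application_year, ipc) records into a yearly frequency series."""
    counts = Counter(year for year, code in records if code == ipc)
    return [counts[year] for year in range(first_year, last_year + 1)]


def fit_arima_110(series):
    """Fit d_t = const + phi * d_(t-1) on the first differences d_t by least squares."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    diffs = [b - a for a, b in zip(series, series[1:])]
    x, y = diffs[:-1], diffs[1:]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx if sxx else 0.0  # constant differences: no autoregressive part
    const = mean_y - phi * mean_x
    return const, phi


def forecast(series, steps, const, phi):
    """Iterate the fitted difference equation and integrate back to levels."""
    level = series[-1]
    last_diff = series[-1] - series[-2]
    predictions = []
    for _ in range(steps):
        last_diff = const + phi * last_diff
        level = level + last_diff
        predictions.append(level)
    return predictions


def classify_trend(series, predictions, band=0.1):
    """Assumed rule: compare the forecast mean with the mean of the latest observations."""
    recent = series[-len(predictions):]
    recent_mean = sum(recent) / len(recent)
    future_mean = sum(predictions) / len(predictions)
    if future_mean > recent_mean * (1 + band):
        return "hot"
    if future_mean < recent_mean * (1 - band):
        return "cold"
    return "active"


# Hypothetical records: one "H01F" application in 2015, two in 2016, ..., six in 2020.
records = [(2014 + n, "H01F") for n in range(1, 7) for _ in range(n)]
records.append((2016, "B60L"))
series = yearly_counts(records, "H01F", 2015, 2020)
const, phi = fit_arima_110(series)
predictions = forecast(series, 3, const, phi)
label = classify_trend(series, predictions)
print(series, predictions, label)
```

In practice, a statistics package with automatic order selection and maximum likelihood estimation would normally replace the hand-written fit; the sketch only makes the differencing and autoregressive steps explicit.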

Social Network Analysis
Social network analysis was performed using a correlation matrix based on the IPC classification [90]. The matrix was used to identify the IPCs that patents have in common and to analyze the correlations between them [91]. The correlation between IPCs was analyzed, and related technologies were identified as those with a high correlation [92]. In Table 3, patent 1 contains references to IPC 1, 2, and 3 and, as such, can be interpreted as containing associations between these technologies. The IPCs of each patent are correlated through the patent-IPC matrix shown in Table 4. Therefore, SNA could be applied to the matrix to obtain significant technological implications.
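One common way to turn a patent-IPC matrix into an IPC-to-IPC relationship matrix is to count how many patents contain both IPCs of a pair. The sketch below uses this co-occurrence count as a stand-in for the correlation matrix; that choice, the function name `cooccurrence`, and the sample rows are assumptions made for illustration.

```python
from itertools import combinations


def cooccurrence(patent_ipc_rows, n_ipcs):
    """Count, for every IPC pair, the patents in which both IPCs appear."""
    matrix = [[0] * n_ipcs for _ in range(n_ipcs)]
    for row in patent_ipc_rows:
        present = [index for index, value in enumerate(row) if value > 0]
        for i, j in combinations(present, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1  # the relationship is symmetric
    return matrix


# Hypothetical patent-IPC matrix: columns are IPC_1..IPC_4, rows are patents.
rows = [
    [1, 1, 1, 0],  # patent 1 contains IPC 1, 2, and 3
    [0, 1, 0, 1],  # patent 2 contains IPC 2 and 4
    [1, 1, 0, 0],  # patent 3 contains IPC 1 and 2
]
links = cooccurrence(rows, 4)
for line in links:
    print(line)
```

A non-zero entry `links[i][j]` can then be read as an edge between IPC nodes i and j, weighted by the number of shared patents.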

In social network analysis, IPCs are represented as nodes, and the relationships between them are represented as edges. This representation reveals the patterns and relationships between IPCs. SNA can be used to illustrate network structures and to quantify the relationships between nodes through centrality indices [93]. Each node in the visualized network represents one IPC, and each connection between nodes represents a relationship between two IPCs [94]. In this study, we identified IPC nodes with high centrality and predicted promising detailed technology areas based on the connections between the nodes [95]. The centrality indices "Degree", "Closeness", and "Betweenness" can be calculated through the following equations [96].
Equation (4) measures not only direct connections but also indirect connections by calculating the distances from one node to all nodes connected to it in the network. Equation (5) measures how well a node acts as a hub; if the betweenness centrality is large, the node is likely to affect the flow of connections between nodes in the network:

$$C_B(N_i) = \sum_{j<k} \frac{g_{jk}(N_i)}{g_{jk}} \qquad (5)$$

where $g_{jk}(N_i)$ is the number of shortest paths from node $j$ to node $k$ that include node $i$, and $g_{jk}$ is the number of shortest paths between nodes $j$ and $k$.
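The three centrality indices can be illustrated with a minimal sketch in Python. The patent-IPC data below are hypothetical, degree is taken as the number of neighbours, and closeness is taken as the reciprocal of the sum of shortest-path distances, which is a common definition assumed here; betweenness sums, over all node pairs, the fraction of shortest paths that pass through the node.

```python
from collections import deque
from itertools import combinations

# Hypothetical patent-IPC data: each patent lists the IPC subclasses it carries.
patents = {
    "patent_1": ["H01F", "F04D", "H02M"],
    "patent_2": ["H01F", "H01B"],
    "patent_3": ["H02M", "B60L"],
}

def build_graph(patent_ipcs):
    """Link two IPCs whenever they appear together in at least one patent."""
    graph = {}
    for ipcs in patent_ipcs.values():
        for ipc in ipcs:
            graph.setdefault(ipc, set())
        for a, b in combinations(sorted(set(ipcs)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def bfs(graph, source):
    """Return shortest-path distances and shortest-path counts from source."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                count[nxt] = 0
                queue.append(nxt)
            if dist[nxt] == dist[node] + 1:
                count[nxt] += count[node]
    return dist, count

def centralities(graph):
    """Compute unnormalized degree, closeness, and betweenness for every node."""
    paths = {n: bfs(graph, n) for n in graph}
    result = {}
    for i in graph:
        dist_i, count_i = paths[i]
        total = sum(d for n, d in dist_i.items() if n != i)
        betweenness = 0.0
        for j, k in combinations(sorted(graph), 2):
            if i in (j, k):
                continue
            dist_j, count_j = paths[j]
            if k not in dist_j or i not in dist_j or k not in dist_i:
                continue  # j, k, and i are not mutually reachable
            if dist_j[i] + dist_i[k] == dist_j[k]:
                # Shortest j-k paths through i = (paths j to i) * (paths i to k).
                betweenness += count_j[i] * count_i[k] / count_j[k]
        result[i] = {
            "degree": len(graph[i]),
            "closeness": 1.0 / total if total else 0.0,
            "betweenness": betweenness,
        }
    return result

scores = centralities(build_graph(patents))
```

In this toy network, "H01F" and "H02M" act as hubs because every shortest path to "H01B" or "B60L" passes through them.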

Identification of Vacant Technology through Generative Topographic Mapping (GTM)
Based on the derived promising detailed IPC technology areas, we applied patent-IPC matrix-based generative topographic mapping (GTM) to identify vacant technologies that may be of interest to researchers and investors. In GTM-based patent mapping, patents are split into two groups based on the distribution of promising detailed IPCs in each patent document. The GTM technology map was constructed on a square grid of size N × N [97]. There is no specific rule for choosing the map size [98]. If the map is too small, no empty nodes are obtained; if it is too large, too many empty nodes are obtained, which makes them difficult to analyze [99]. Therefore, in this study, we selected an appropriate map size with a blanking method that starts from a small map and increases its size incrementally [40]. The patent map obtained from the GTM analysis, presented in Figure 5, consists of nodes with and without patents. Vacant technology is technology that has not been patented and corresponds to the nodes without patents [100]. To define the corresponding vacant technology, the patents in the vicinity of these nodes must be analyzed [101]. Since the analysis target nodes surrounding the vacant technology contained patents, the vacant technology was defined using the patent information of those target nodes.
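The map-size selection described above can be sketched as follows. This is one plausible reading of the incremental procedure, not the authors' implementation: it assumes the patents have already been projected by GTM to two-dimensional latent coordinates in [-1, 1] (the `latent` points below are hypothetical), and it skips the GTM training itself. The grid is enlarged step by step until the first empty node appears.

```python
def assign_to_grid(points, n):
    """Map each (x, y) in [-1, 1]^2 to the nearest node of an n x n grid."""
    def nearest(v):
        if n == 1:
            return 0
        # Grid nodes are evenly spaced from -1 to 1 along each axis.
        idx = round((v + 1.0) / 2.0 * (n - 1))
        return min(max(idx, 0), n - 1)
    return [(nearest(x), nearest(y)) for x, y in points]

def empty_nodes(points, n):
    """Return grid nodes that hold no patent, i.e. vacant-technology candidates."""
    occupied = set(assign_to_grid(points, n))
    return [(r, c) for r in range(n) for c in range(n) if (r, c) not in occupied]

def smallest_map_with_vacancy(points, start=2, stop=10):
    """Increase the map size from `start` until at least one node is empty."""
    for n in range(start, stop + 1):
        vacant = empty_nodes(points, n)
        if vacant:
            return n, vacant
    return None, []

# Hypothetical latent positions of five patents.
latent = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0), (0.1, 0.1)]
size, vacant = smallest_map_with_vacancy(latent)
```

With these points, a 2 × 2 map has no empty node, while a 3 × 3 map leaves four nodes empty; the patents in the neighbouring occupied nodes would then be the ones to analyze.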

Patent Data Collection and Technology Trend Analysis

Patent Data Collection
Patent data on railway main transformer technology were collected based on technology keywords extracted from the opinions of railway experts. Published patent applications were collected over a period of years, covering the major patent application regions. The data used for the patent analysis were taken from applications filed from 1990 to 2020 in the six regions of Korea, China, Japan, the USA, Canada, and Europe. Because patent information becomes available 18 months after filing, only patent data that had passed this 18-month period were used in the analysis [102][103][104]. Although they were not used for the patent analysis, data for 2021 and 2022 were also extracted separately for technology trend analysis. To extract patent data, a patent search using the technology keywords was performed with the platform's patent database search engine. Because the raw data contained noise that was unnecessary for the analysis, effective data were extracted by filtering [105]. Patents not related to the railway main transformer were excluded from the study. The technology keywords extracted for railway main transformer technology were "Frame", "Core", "Coil", "Insulating Oil", and "Cooling Fan". As a result of the data extraction and filtering based on these keywords, a total of 707 effective patents were extracted, along with 67 patents from 2021 and 2022 for technology trend analysis.

Technology Trend Analysis
Figure 6 shows the number of collected patents from 1990 to 2020. Because patents can be kept confidential for up to 18 months at the request of the applicant [102,106], some of the patents filed in 2021 and 2022 are not yet reflected. Therefore, for accuracy, the technology trend analysis was conducted on patent data from 1990 to 2020. Patents related to railway main transformer technology have increased continuously since 1990, and the trend can be divided into two periods around 2006, when patent applications in China began to increase rapidly. From 1990 to 2006, patent applications increased steadily at an average annual rate of 10.5%. Since 2006, they have increased rapidly at an average annual rate of 18.1%. In addition, more than 20 patents have been filed every year since 2006, except in 2007, when applications temporarily slowed down.
Moreover, Figure 6 shows the trend of patent applications by region from 1990 to 2020. China, where about 48.38% of the effective data were filed, is the largest region for patent applications related to railway main transformer technology, and applicants from China are interested in recent technology. From 1990 to 2006, application activity in China was relatively minimal; since 2007, however, the number of patent applications has increased every year. The number of patent applications in Korea, which accounts for 6.38% of the effective data, has increased slightly since 2007. The USA, where about 7.51% of the effective data were filed, shows a continuously decreasing number of patent applications since 2010. In Canada, which accounts for about 0.57% of the effective data, overall application activity was insignificant. Japan had the highest share of patent applications between 1990 and 2006, but its share of the effective data has since declined to 18.86%; nevertheless, applications have continued to be filed. Lastly, Europe, which accounts for about 18.3% of the effective data, has had a consistently high number of patent applications since 1990, when the investigation period began.
Table 5 shows the 10 most active applicants for technical patents from 1990 to 2020. The companies listed in this table account for about 33.5% of all patents filed in this technology; "CRRC Co., Ltd." and "Mitsubishi Electric Corp." are the two most prominent filers. As expected, companies in China are dominant in filing patents.
Figure 7 displays the patent filing patterns of the ten most active applicants and the number of patents filed each year by each of them. As expected, the frequency of patent applications by Chinese companies has been increasing recently. Although Japan's share of patents is decreasing, Japanese companies still file a large number of patent applications.
Finally, Table 6 shows the 10 most active applicants for the 233 technical patents filed over the past five years, from 2018 to 2022. Because patents can be kept confidential for up to 18 months at the request of the applicant, patent data for 2021 and 2022 are not fully reflected, and accurate statistics cannot be produced for those years. Nevertheless, the companies in the table are responsible for about 43% of all patents filed in this technology, and "CRRC Co., Ltd." is the most prominent filer, accounting for 15% of all patent applications. Chinese patent applications account for 74.3% of all patents over the past five years. Moreover, it is noteworthy that new companies such as "Bombardier Inc." and "Zhuzhou Lince Group Co., Ltd" have emerged over the past five years.

Technology Level Evaluation of IPC
In this study, the first four characters of the IPC code were used for technology classification, and railway main transformer technology was divided into 127 detailed IPC technology areas through the patent-IPC matrix. To identify promising detailed IPC technology areas, we conducted qualitative and quantitative evaluations through social network analysis, technology mapping analysis, and time series analysis. Social network analysis confirms the linkage between IPCs. We analyzed the centrality indicators "Betweenness", "Closeness", and "Degree", which can be obtained through social network analysis. The results of the network analysis are shown in Figure 8 and Table 7.

Each of the 127 detailed IPC technology areas of railway main transformer technology has a different qualitative technology level and market-securing ability. Therefore, we analyzed the quality level of the technology and the ability to secure markets by deriving the PFS and CPP indices for each detailed IPC technology area through technology mapping analysis. The technology level according to the PFS and CPP indices is divided into three stages, and the relative position of the technology level of each IPC is shown in Figure 9 and Table 8.

In this study, we analyzed future technology trends for each detailed IPC technology area through time series analysis. We identified the future potential of each of the 127 detailed IPC technology areas with the ARIMA model. The model parameters p, d, and q, which are the autoregressive order, the degree of differencing, and the moving-average order, were selected with the "auto.arima" function in R. Since patents are disclosed 18 months after filing, the number of patent applications in each detailed IPC technology area from 1990 to 2020 was extracted as time series data and used for the analysis. The results were classified into hot, active, and cold fields according to future trends, as shown in Table 9.
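The data preparation step before model fitting can be sketched as below. This is a minimal illustration under stated assumptions: the patent records are hypothetical, each patent is counted once per four-character IPC subclass and filing year, and the `difference` helper only shows what the parameter d does to the series. The ARIMA fitting and order selection performed with "auto.arima" are not reproduced here.

```python
from collections import Counter

# Hypothetical records: (filing year, full IPC codes listed on the patent).
patents = [
    (1990, ["H01F 27/08", "F04D 29/00"]),
    (1992, ["H01F 27/28"]),
    (1992, ["H01F 27/08", "H02M 7/00"]),
]

def yearly_series(records, ipc, first_year, last_year):
    """Count patents per year that carry the given four-character IPC subclass."""
    counts = Counter()
    for year, codes in records:
        subclasses = {code[:4] for code in codes}  # e.g. "H01F 27/08" -> "H01F"
        if ipc in subclasses:
            counts[year] += 1
    # Years without any filing are kept as zeros so the series is evenly spaced.
    return [counts.get(y, 0) for y in range(first_year, last_year + 1)]

def difference(series, d=1):
    """Apply d rounds of first differencing, the 'I' step of ARIMA."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

h01f = yearly_series(patents, "H01F", 1990, 1993)
h01f_diff = difference(h01f, d=1)
```

In the study's setting, the year range would be 1990 to 2020 and one such series would be built for each of the 127 detailed IPC technology areas.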

In this study, qualitative and quantitative analyses were used together to evaluate the technology level of each IPC and to obtain promising detailed IPC technology areas. The technology level of each detailed technology area was analyzed based on the PFS and CPP indices derived from the technology mapping analysis, which is the qualitative analysis. In addition, for the quantitative evaluation, centrality indicators and time series were extracted for each detailed IPC technology area, and the technology level was confirmed through social network analysis and time series analysis. The results of the three analyses in the integrated framework are summarized in Table 10. Because the results differ between the analysis procedures, an area can be predicted to be a promising detailed IPC technology area when the results of all three analyses are integrated. Vacant technology is then obtained through GTM analysis based on the promising detailed IPC technology areas.
Table 10. Results of the integrated analysis frameworks. Column headers: Qualitative Evaluation; Quantitative Evaluation; Priority; Technology Mapping Analysis.

Forecasting of Vacant Technology in Each Group
The patent information of the patents included in the analysis target nodes identified through GTM analysis was investigated, and the vacant technology of each group of analysis target nodes was defined, as shown in Table 12. The vacant technology topic of each group was selected based on technology keywords that represent the group among the patents filed in it. The technology keywords are the technology-related keywords that appear most frequently in the text of the patents in each group. Existing main transformers have been one of the causes of reduced efficiency of electric energy consumption because they are among the largest pieces of electrical equipment installed in railways. Therefore, railway main transformers should have technological innovations such as high power, high efficiency, and reduced weight and volume. Overall, the vacant technologies obtained for each group focus on lightening, reduction of maintenance costs, and reduction of noise.
Group 1 was identified as blowerless technology in the main transformer. A search for technology keywords in the patent documents of Group 1 identified five keywords: "(1) fanless", "(2) blowerless", "(3) inverter", "(4) cooling", and "(5) nature". The analysis shows that these keywords correspond to the fan field of the main transformer, which corresponds to "F04D" and "H01F" among the promising detailed IPC technology areas. Blowerless technology provides the necessary cooling of the main transformer through natural cooling, without using a fan. Because the main transformer fan is so noisy, some high-speed railways do not operate the fan at the station and turn it on only after the train leaves the station. If no fan is used, the main transformer is cooled naturally, so its weight can be greatly reduced, and noise can also be greatly reduced because the fan accounts for most of the noise. The blowerless cooling method is a cost-effective cooling solution that is already applied to electric multiple units, but it has not yet been applied to high-speed railways, so it needs to be developed through research.
Group 2 was identified as oil-free technology in the main transformer. A search for technology keywords in the patent documents of Group 2 identified five keywords: "(1) refrigerant", "(2) insulating", "(3) dry", "(4) leak", and "(5) oilfree". The analysis shows that these keywords correspond to the insulating oil field of the main transformer, which corresponds to "F04D", "H01B", and "H01F" among the promising detailed IPC technology areas. Insulating oil occupies a large volume in the main transformer and increases its weight. Therefore, it is necessary to develop oil-free main transformer technology. This technology is applied to electric multiple units, but it has not yet been applied to high-speed railways, so it needs to be developed through research. Because insulating oil is not required, this technology is expected to reduce the weight and volume of the main transformer.
Group 3 was identified as solid-state technology in the main transformer. A search for technology keywords in the patent documents of Group 3 identified seven keywords: "(1) solidstate", "(2) highfrequency", "(3) semiconductor", "(4) converter", "(5) modular", "(6) electronic", and "(7) multilevel". The analysis shows that these keywords correspond to the solid-state field of the main transformer, which corresponds to "H01B", "H01F", and "H02M" among the promising detailed IPC technology areas. In line with the identified keywords, related technology development is currently underway around multilevel converters and AC/DC and DC/AC power conversion technology. The railway main transformer should have technological innovations such as high power, high efficiency, and reduced weight and volume. Efficient application of solid-state technology is therefore expected to make it possible to reduce the weight and volume of railway main transformers. It is also expected to increase railway efficiency while reducing maintenance costs.
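The keyword selection used for each group, in which the most frequent technology-related terms in the group's patent text are taken as its keywords, can be sketched as follows. The sample texts, the stopword list, and the function name `top_keywords` are hypothetical; the study does not state how the text was tokenized or filtered.

```python
from collections import Counter
import re

# Hypothetical stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "of", "and", "for", "in", "is", "to", "with", "without"}

def top_keywords(texts, k=5):
    """Return the k most frequent non-stopword tokens across the given texts."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    return [word for word, _ in counts.most_common(k)]

# Hypothetical patent text snippets for one group.
group_texts = [
    "Fanless cooling for the transformer",
    "Fanless inverter cooling",
    "Fanless design",
]
keywords = top_keywords(group_texts, k=2)
```

Ranking by raw frequency is the simplest choice; terms that are common across all groups would need to be filtered out separately for the keywords to characterize a single group.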

Discussion
The main implication of this study is to provide an analysis framework for simultaneously performing technology trend analysis and predicting vacant technology based on promising detailed IPC technology areas identified through quantitative patent analysis.Previous research literature mainly focused on qualitative methods through expert evaluations and opinions, and they did not consider the quantitative patent analysis and technology trends at the same time.
To overcome these limitations, this study first performed a patent analysis of railway main transformer technology to identify technology trends. Then, to identify promising detailed IPC technology areas, we performed technology mapping analysis, time series analysis, and social network analysis based on the patent-IPC matrix extracted from the patent data. Moreover, through GTM analysis of the promising detailed IPC technology areas, we obtained a vacant technology node and the analysis target nodes surrounding it. Lastly, we predicted the groups of vacant technology using the patent information contained in each of the analysis target nodes.
Despite these methodological advances, the study has some limitations, which we intend to address in follow-up research. Because patent information becomes available only 18 months after filing, recently filed patent data can be used for analysis only to a limited extent. To overcome this limitation, we will establish, in subsequent studies, an analysis framework that can further validate the results derived in this study. The results of this study are expected to be widely used in follow-up studies to support decision-making by researchers and investors in the field of the railway main transformer, and the analysis framework is expected to be applicable not only to the railway industry but also to various other equipment industries. In addition, this study is expected to contribute to the establishment of a patent information-based analysis framework for improving technology competitiveness.

Conclusions
In this study, we investigated vacant technology groups in order to forecast future railway main transformer technology based on promising detailed IPC areas. In addition, the study aimed to provide insight into technology trends in railway main transformers, as well as an analysis framework that supports informed, patent-based decision-making. First, we identified technology trends through patent analysis of railway main transformer technology. Then, to identify promising detailed IPC technology areas, we performed technology mapping analysis, time series analysis, and social network analysis based on the patent-IPC matrix extracted from the patent data. Moreover, GTM analysis of the promising detailed IPC technology areas was performed to maximize the possibility of identifying vacant technology groups. Lastly, we discussed the future technology of railway main transformers based on our analysis results. The technology trend analysis confirmed that patent applications in China have been increasing rapidly in recent years. Furthermore, through GTM analysis of the promising detailed IPC technology areas, we obtained one vacant technology node and three analysis target nodes surrounding it. We then predicted the groups of vacant technology using the patent information contained in each of the analysis target nodes. Through this process, the vacant technologies discovered by our patent analysis framework are (1) blowerless technology in the main transformer; (2) oil-free technology in the main transformer; and (3) solid-state technology in the main transformer.
To verify whether our analysis results were reasonable, we compared them with the future technology roadmaps of organizations related to the railway main transformer. These organizations are currently conducting research for technology development and, as shown in Table 13, have established many future technology roadmaps. First, the International Union of Railways (UIC) has established a future railway strategy for Europe. Second, the Railway Technical Research Institute (RTRI) aims to commercialize the technology within the next 10-20 years, with the main goals of improving safety, improving convenience, and reducing costs, through the roadmap derived from "Master plan Research 2020". Third, the Korea Railroad Research Institute (KRRI) analyzed the value-added effect, production inducement effect, and job creation effect in the railway transportation-related industry and established a roadmap of technologies with promising prospects for commercialization. These organizations set the following detailed goals for their future railway strategies: (1) reduction of maintenance costs, (2) reduction of noise, (3) high power, (4) high efficiency, and (5) lightening. These detailed goals were found to be similar to the expected effects of the vacant technologies forecasted for each group.
... reduce noise and volume. This technology is applied to electric multiple units, but it has not yet been applied to high-speed railways, so it needs to be developed through research. Therefore, we expect that research on applying this technology to the main transformers of high-speed railways will be actively conducted in the future. Second, the group that applies oil-free technology in main transformers is expected to be in the spotlight. Since this technology does not need insulating oil, it is expected to reduce the volume and weight of main transformers. Moreover, this technology is applied to electric multiple units, but it has not yet been applied to high-speed railways, so it needs to be developed through research. Therefore, we expect that research on applying this technology to the main transformer will be actively conducted in the future. Third, the group that applies solid-state technology in main transformers is expected to be in the spotlight. The railway main transformer should have technological innovations such as high power, high efficiency, and reduced weight and volume. Efficient application of solid-state technology is therefore expected to make it possible to reduce the weight and volume of the railway main transformer. It is also expected to increase railway efficiency while reducing maintenance costs. Therefore, we expect that research on applying this technology to the main transformer will be actively conducted in the future.
... vacant technology group that has not yet been activated based on the promising detailed technology areas of the railway main transformer.

Figure 1. Research methods and processes.

Figure 2. Examples of patent-IPC matrix extraction through patent data.
$$\mathrm{PFS} = \frac{\text{Count of family patents on each detailed IPC technology area}}{\text{Number of patents on each detailed IPC technology area}} \qquad (1)$$

$$\mathrm{CPP} = \frac{\text{Count of citing patents on each detailed IPC technology area}}{\text{Number of patents on each detailed IPC technology area}} \qquad (2)$$
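As a worked illustration of the two ratios, taking PFS as the count of family patents divided by the number of patents and CPP as the count of citing patents divided by the number of patents in each detailed IPC technology area, the following sketch computes both indices. The records and the function name `pfs_cpp` are hypothetical and serve only to show the arithmetic.

```python
# Hypothetical records: (IPC subclass, family patents, citing patents) per patent.
records = [
    ("H01F", 4, 10),
    ("H01F", 2, 0),
    ("H02M", 6, 3),
]

def pfs_cpp(rows):
    """Return the PFS and CPP indices for every IPC subclass in the records."""
    totals = {}
    for ipc, family, citing in rows:
        t = totals.setdefault(ipc, {"patents": 0, "family": 0, "citing": 0})
        t["patents"] += 1
        t["family"] += family
        t["citing"] += citing
    return {
        ipc: {
            "PFS": t["family"] / t["patents"],  # family patents per patent
            "CPP": t["citing"] / t["patents"],  # citing patents per patent
        }
        for ipc, t in totals.items()
    }

indices = pfs_cpp(records)
```

For the sample data, "H01F" has two patents with six family patents and ten citing patents in total, giving PFS = 3.0 and CPP = 5.0.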

Figure 3. Example of technology mapping analysis.

Figure 4. Example of time series analysis.

Figure 5 .
Figure 5.The above GTM-based patent map shows the distribution of patents; this information can be used to indicate where technology development has taken place.(a) Light blue nodes represent where patent application has occurred, dark blue nodes represent where patent apply has occurred frequently; (b) red nodes represent vacant technology, and yellow area represents nodes that need to be analyzed [40].

Figure 6. Number of patents according to each region.

Figure 7. Number of railway main transformer technology patents filed each year by each of the ten most active applicants.

Figure 8. Result of social network analysis.

Figure 9. Result of technology mapping analysis.

Figure 10. Results of GTM-based patent map based on promising detailed IPC technology areas.

Table 1. Literature on using patent analysis to predict technology.

Table 2. Position of our analysis method in prior literature in the field of railways.

Table 3. Examples of patent-IPC matrix based on patent data.

Table 4. Example of IPC classification-based correlation matrix.

Table 5. Top ten applicants filing patents in railway main transformer technology.

Table 6. Top ten applicants filing patents in railway main transformer technology in the last five years.

Table 7. Social network analysis results.

Table 8. Normalization results of technology mapping analysis.

Table 9. Result of time series analysis.

Table 12. Forecasted vacant technology in each group.

| Group | Forecasted Vacant Technology | Technology Keywords for Each Identified Group |
|---|---|---|
| 1 | Blowerless Technology in Main Transformer | fanless, blowerless, inverter, cooling, nature |
| 2 | Oil-Free Technology in Main Transformer | refrigerant, insulating, dry, leak, oilfree |
| 3 | Solid-State Technology in Main Transformer | solid-state, highfrequency, semiconductor, converter, modular, electronic, multilevel |