Discovering Technology Opportunity by Keyword-Based Patent Analysis: A Hybrid Approach of Morphology Analysis and USIT

As innovative technology is being developed at an accelerated rate, the identification of technology opportunities is especially critical for both companies and governments. Among various approaches to search for opportunities, one of the most frequently used is to discover technology opportunity from patent data. In line with it, this paper aims to propose a hybrid approach based on morphological analysis (MA) and unified structured inventive thinking (USIT) for technology opportunity discovery (TOD) through patent analysis using text mining and Word2Vec clustering analysis to explore the intrinsic links of innovation elements. A basic morphology matrix is constructed according to patent information and then is extended using the innovation algorithms that are reorganized from USIT. Technology opportunities are analyzed at two layers to generate new technical ideas. To illustrate the research process and validate its utility, this paper selects the technology of coalbed methane (CBM) extraction as a use case. This hybrid approach contributes by suggesting a semi-autonomous and systematic procedure to perform MA for TOD. By integrating the innovation algorithms, this approach improves the procedure of value extension in MA.


Introduction
Recently, technology opportunity discovery (TOD) has been emphasized as a core competitive factor for companies and governments, and dominating technology in advance can provide considerable competitive advantages [1,2]. Technology opportunity contains the potential for technological progress in general or within a specific field. To capture technological opportunities successfully, technology opportunity analysis, logical forecasting, or prediction is necessary. At the national level, technology forecasting is conducted in many developed countries to inform science and technology policy and to anticipate economic growth [3,4]. At the firm level, many large companies are striving to perform technology forecasting activities to prioritize research and development (R&D) and to secure a leading position in emerging markets ahead of competitors [5,6]. Thus, in both private and public spheres, many efforts have been made to discover technology opportunity.
Technological opportunity is defined as a set of possibilities or potential technological progress that can create value from new ideas within specific fields or over different industries [1]. TOD is thus the process of identifying technology characteristics and further deriving new valuable technology ideas. Scholars have studied TOD from three aspects: framework, methodology, and extension of
One thing to note is that the objective of this research is neither to propose the most effective method of generating creative technology ideas nor to supersede conventional idea generation techniques. The approach rather serves as a heuristic process model, contributing to the extension of the traditional morphology matrix and supporting the development of more new technology ideas across a wider range of concepts. The contribution of this study is threefold. First, a hybrid approach is suggested to extract information from existing patent documents, which can provide a basic data source for MA and USIT, and to extend that information in a systematic fashion. In the process, keywords are extracted from patent documents as the representative information to extract the morphology of patents, and these keywords constitute the values in the morphology matrix. Second, the combination for creating new ideas is performed at two layers, i.e., dimension and value, which is advantageous for discovering technology opportunities systematically. Thus, by combining USIT and patent analysis, the basic information extracted from patents can be transformed into information with novel and valuable features. This hybrid approach can mitigate the limitation of MA in extending and varying the information extracted from patent documents. Third, the application of the proposed approach to the technology of coalbed methane (CBM) extraction is an initial attempt to integrate USIT with MA, which can be further generalized to other fields.
The remainder of this paper is organized as follows. An overview of TOD, MA, USIT, and patent analysis is presented in Section 2. In Section 3, the proposed hybrid approach is described, covering how USIT is applied to the construction of the morphology matrix, and related concepts, such as the framework and the detailed overall process, are introduced. Subsequently, in Section 4, an exemplary analysis regarding CBM extraction is used to exhibit the application process of the hybrid approach. Finally, the discussion, conclusions, implications, and future research topics are presented.

Technology Opportunity Discovery
Technology opportunity analysis (TOA) was first developed at the Georgia Tech Technology Policy in 1990 [14]. Due to the increased risks involved in launching and growing a new business, TOD mainly collects bibliometric or patent information to conduct value-added analysis as a reference for technical engineers, strategy managers, or market analysts in a suitable form [5]. The patent documents provide not only the bibliographic information but also the detailed technical information, such as the principles of function, the components of process, and the intrinsic causal relationship of each element [16]. The use of patent analysis has been increasingly significant in recent years, and patents have been applied to identify technological opportunity [17].
Since qualitative approaches such as the Delphi method and analytical hierarchy processes [18] have been employed in identifying technology opportunities, multiple quantitative approaches based on patent data have been developed as a way of minimizing the impact of experts. For example, Yoon et al. [7] applied subject-action-object (SAO) semantic mining to patent data analysis to extract technology opportunities from a functional perspective. Lee et al. [4] extracted the key information of an R&D plan from bibliometric data through chunk-based mining and transformed it into a customized and detailed R&D plan. However, due to the inconsistent and incoherent nature of bibliometric and textual data, the combination of qualitative and quantitative approaches has been considered as a remedy for TOD, such as applying factor analysis and cluster analysis to analytic hierarchy process (AHP) [19], latent Dirichlet allocation (LDA) to TRIZ [20], and Wikipedia to morphological analysis [21]. In addition, the development of TOD approaches proves that a supplementary expert's discussion is required to select meaningful sections for the investigation of technology opportunities from the extracted data.
Nevertheless, previous research on TOD has focused on the combination of methods that could analyze a sea of patent data to discover technology opportunities, as well as decreasing expert involvement while the data are underutilized to form the paths to derive new technology ideas. The methods are bounded by complexity and randomness to discover technological opportunities.
TRIZ inventive principles and engineering parameters, which are available tools for solving complicated problems, have been introduced to improve information accuracy and explore technology opportunities [22,23]. Nevertheless, due to the complexity of TRIZ, this study selects USIT, which is simplified from TRIZ and applicable to industry fields [24], and further reorganizes it to serve as a set of guidelines for varying technology values in MA.

Morphology Analysis
As a powerful systematic methodology and non-quantified modelling approach, MA enables technological, organizational, and social problems to be analyzed by deriving unrealized combinations relying on expert experience and knowledge. The core of MA is that the research object is resolved into several fundamental dimensions, which depict the object from a detailed and inclusive perspective. Each dimension then can be divided into several sub-dimensions to explain the characteristics of the entire system [5]. The components of sub-dimensions are considered as values that contain the potential for discovering technology opportunities. By combining these dimensions and values in all possible forms, the new system can be investigated, and unexpected creations may be discovered, which can be a strong stimulus for inventing new alternatives to meet imposed requirements. The superiority of MA lies in its intuitive form for structuring complex systems in a non-quantitative manner [25]. The analysis process can be executed and developed quickly without superfluous data.
The basic procedure of MA is as follows [26]. First, the essential functions of the subject are defined, and its characteristics are divided into several dimensions. Second, all possible values of each dimension are listed to demonstrate itself. Third, all combinations that can produce unique sets of values are investigated. The number of combinations can be calculated by multiplying the number of values associated with each dimension together. Fourth, evidence is sought for each combination by practical examples. Finally, the unfeasible combinations are eliminated, while valuable combinations are retained and ranked by their significance.
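The combination step of this procedure can be illustrated with a minimal Python sketch. The dimension names, the values, and the function names below are hypothetical and chosen only for illustration; they are not taken from the morphology matrix of this study. The sketch counts the combinations as the product of the number of values per dimension, enumerates them, and lists those not yet occupied by a known configuration.

```python
from itertools import product
from math import prod

# Hypothetical morphology matrix: each dimension maps to its candidate values.
MATRIX = {
    "energy form": ["pulse", "ultrasonic", "microwave"],
    "well type": ["vertical", "horizontal"],
    "medium": ["water", "gas"],
}


def count_combinations(matrix):
    """Number of combinations = product of the number of values per dimension."""
    return prod(len(values) for values in matrix.values())


def enumerate_combinations(matrix):
    """Return every combination as a dict {dimension: value}."""
    dims = list(matrix)
    return [dict(zip(dims, combo)) for combo in product(*(matrix[d] for d in dims))]


def unoccupied_combinations(matrix, occupied):
    """Combinations that do not appear among the known (occupied) configurations."""
    dims = list(matrix)
    seen = {tuple(cfg[d] for d in dims) for cfg in occupied}
    return [c for c in enumerate_combinations(matrix)
            if tuple(c[d] for d in dims) not in seen]
```

In practice, the unoccupied combinations would still need the evidence check and the feasibility screening described in the fourth and final steps; the sketch only covers the enumeration.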
Valuable contributions have been made to the development of MA, and the major research stream lies in its application. MA has been applied to the design of materials and products in computer-based design [27,28], the development of new services [29], technology forecasting (TF) [30], TOD [15], and the generation of business models [31]. However, MA has the defect of inability to perform in a stand-alone mode. To systematize and automate the process of building the morphological matrix, other approaches have been used as a facilitator in the process of TOD. For example, text mining has been widely used to extract valuable information from the large volume of data by extracting keywords from patent documents [10,25,32] and selecting appropriate shapes based on the frequency and co-frequency of each word [5]. As a supplement, software tools have also been developed and applied to support the execution of MA [33].

USIT
Unified structured inventive thinking (USIT) was originally developed by Ed Sickafus at the Ford Motor Company in the 1990s based on TRIZ, a theory of inventive problem solving, and SIT, a methodology of systematic inventive thinking [34] (pp. 43-46). Therefore, USIT is defined as a systematic innovation approach for providing a beginning point for creativity with the goal of generating conceptual solutions rather than engineered solutions in a concise manner. Three key elements of USIT are object, attribute, and function, which are utilized to denote the entire innovation process. To identify and characterize the effective attributes of objects, the diagram of object-attribute-function (OAF) is constructed to serve as a heuristic tool in the application of solution techniques [34] (pp. 79-83). The strength of USIT lies in its concise structuring process for setting up and settling technological problems and short turn-around time for invention [35]. Moreover, two algorithms are developed in USIT: the closed-world algorithm, which establishes a vantage point for viewing the problem situation in functional relationships between selected objects; and the particles method algorithm, which enables analysts to find a conceptual solution to a certain problem by identifying physical, chemical, biological, and geometric effects to satisfy its specific features.
USIT presents a concise and systematic process for generating elements of ideas in engineering fields. Among the previous studies on USIT, the representative is Nakagawa's work. It proposes five categories of solution generation methods containing 32 sub-methods, called the "USIT Operators," which were reorganized from 40 innovative principles in TRIZ by Nakagawa [36]. For instance, "1a, eliminate the object" refers to the first sub-method in the first category method, the object pluralization method, which is derived from the two inventive principles of simplification and trimming. In this study, the meaning of each method is not elaborated, while the improved USIT Operators are utilized to discover innovation algorithms. Text mining is applied to produce the occurrence frequency of each keyword extracted from USIT Operators.
Jupp, Campean, and Travcenko [37] have integrated TRIZ and USIT as an effective design development in vehicle engineering, and Schöfer, Maranzana, Aoussat, Gazo, and Bersano [38] have utilized USIT tools to analyze the outcome of the problem-solving process on non-technological domains. However, the application of USIT is still insufficient. Furthermore, the definition of objects in USIT could be a wide range, which causes some trouble in discovering the core target objects. Therefore, our study is a development of USIT by integrating the solution generation techniques with MA to apply it in technological domains. The motivation is to utilize the reorganized USIT Operators to manage the innovation dimensions extracted from the morphological matrix, extend the scope of generated potential technological ideas, and evaluate their feasibility to obtain the technology opportunities in technical engineering fields.

Patent Analysis
Patents represent a useful source of knowledge about technical innovation and R&D performance in addition to acting as a means to protect inventions. It is suggested that the source of 80% of technical information lies in patents [39]. Among multiple sources of technical information (e.g., scientific and technical publications, products, and processes), the capability of patents to explain most aspects of technical innovation activities has been widely recognized [40]. Basically, patent information, such as patent documents and citations, is considered one of the sources for transferring knowledge, both to help policymakers worldwide favor promising technologies in terms of the social optimum [41] and to encourage firms' strategy makers to shift their focus from internal knowledge to external knowledge when developing product innovation [42,43]. Since raw patent data offer detailed information on a specific invention (e.g., a novel product or process) [44], previous studies suggested that patent analysis has long been employed as a useful analytical technique for TOD and has significantly benefited from the use of computerized methods such as patent statistics analysis and patent bibliometric analysis [45].
Since patent documents consist of two parts: structured part (e.g., patent citation, patent classification) and non-structured part (e.g., abstracts, claims, and description) [45], the techniques of patent analysis for TOD could largely be classified into two categories: the target of analysis aiming to structured part (e.g., patent co-classification analysis [46,47], patent citation analysis [48], etc.) and non-structured part (e.g., keyword analysis [10,49], patent network analysis [50], etc.). Among these techniques, keyword analysis combined with text mining techniques is particularly useful for scrutinizing the technical content of patents with the aim of decomposing technology to capture the overall technology features [30]. The keywords extracted from patent documents contain valuable information for TOD. In line with it, various studies have been conducted to discover new technology opportunities based on keywords extracted from patent data. For instance, Boon and Park [10] extracted representative keywords from 137 patent documents to establish the morphology matrix systematically and further identify technology opportunities. The analysis result indicated that the unoccupied territory of configurations was suggested as technology opportunities by listing the occupied configurations of collected patents. Also, Lee, Yoon, and Park [12] utilized text mining to identify keywords of patent documents by constructing the extracted keywords into structured vectors with the aim of creating keyword-based patent maps and identifying the vacancy containing the potential for new technology ideas. With the development of text mining techniques, the morphological analysis based on patent data has been one of the major research issues, while MA involves the implementation of text mining of patent content to identify core innovation elements and analyze them morphologically. 
Keyword-based morphological patent analysis can discover critical technology information from patents; it thus provides the underlying motivation for this study and is fully addressed here [49].

Research Concepts
To overcome the drawbacks of classical MA in innovation design activities, this research proposes the integration of MA with USIT, especially the "USIT Operators." Both methods share the same objective of deriving new ideas systematically and concisely and have the feature of hierarchical structure. MA provides a structured framework for idea generation, while USIT fills the paths to extend the values in MA. Different from the classical MA or USIT, this hybrid MA-USIT approach utilizes innovation algorithms from USIT to identify detailed technological characteristics on the basis of a basic morphological matrix to further extend the matrix.
The innovation algorithms are proposed through a semi-autonomous approach to improve the flexibility of USIT towards the target field and its compatibility with MA. An innovation algorithm is the integration of the inner innovative principles that tells designers how to deal with the target innovation dimension. Compared with the USIT Operators, the innovation algorithms, which are reorganized from the solution generation methods in USIT, have simpler classifications and are easier to understand and use. In addition, with the inherited advantages of USIT Operators, innovation algorithms are applicable in industry fields and require little specialized knowledge beyond what the analyst already has.
In this paper, the morphology matrix is constructed with dimensions, sub-dimensions, and values. Although USIT is a systematic approach for providing a beginning point for creativity in a concise manner, the innovation paths of extending values may be equivocal. By integrating MA and reorganizing USIT, this hybrid approach could fill the morphological structure with supplementary values to discover new technology ideas from existing patent documents. Under the guidance of innovation algorithms, a systematic analysis will be employed to show what to focus on and how to generate supplementary values. In this paper, the semantic similarity of patent information and knowledge flow between patents are considered in extending the morphological structure to discover new technology opportunities.

Research Framework
The hybrid MA-USIT approach mainly includes two parts: constructing the basic morphology matrix and extending values using the innovation algorithms reorganized from USIT. They are designed as two discrete sections and are executed successively.
The procedure of MA in this approach consists of seven steps, as shown in Figure 1. First, select the target technology area. In this step, the target technology that needs to be explored is selected. Second, collect relevant patent documents from the World Intellectual Property Organization (WIPO). Third, preprocess the collected patent data. In this step, function words and stop words are removed to eliminate valueless information, and keywords are extracted by text mining. Fourth, define the morphological structure by clarifying the relationships between dimensions, sub-dimensions, and values. Dimension is the preliminary level and could assist analysts in grasping the target knowledge domain effectively, while the values of each sub-dimension, which consist of keywords, can provide detailed descriptions of the technology. Fifth, construct the basic morphological matrix by matching the values with the sub-dimensions. Sub-dimensions are concluded by combining the F-term system with the clustering results of Word2Vec to describe the technological characteristics in detail. Sixth, extend the overall technological values in the morphological matrix by integrating the reorganized innovation algorithms from USIT with expert involvement. Finally, new technology ideas are derived from the combinations of values under the guideline of dimensions.
F-term classification is a system created by the Japan Patent Office (JPO) to classify the detailed technical features of inventions. By analyzing technical attributes from various perspectives, the F-term system can cope with the ever-increasing volume and diversification of technologies and improve the efficiency of prior art searches for patent examination [1]. A simple example is listed in Table 1 to show the basic structure of the F-term, which has two parts. The first part, consisting of five digits from the left, is the "theme code", which represents a technological field. The second part is the four-digit "term code" from the right, which subdivides a theme to explicate various technical viewpoints. "Viewpoint" and "figure" are the components of the "term code". For example, in the F-term 2D129AA01, 2D129 is the theme code ('earth drilling') and AA01 is the term code, in which the viewpoint "AA" refers to 'object to be drilled' and the figure "01" represents 'underwater' [51].
The Word2Vec model is utilized to transform chaotic data into structured word vectors so that we can further explore the deep connectivity of the keywords selected by their TF-IDF values. The Word2Vec model, a technique released by Google, uses word contexts to model the semantic meaning of a word when merging synonyms. This tool adopts two main model architectures to learn the vector representations of words: the continuous bag-of-words (CBOW) model and the continuous skip-gram model. The CBOW architecture predicts the current word based on the context, and the skip-gram predicts surrounding words given the current word [52,53]. Word2Vec models semantic meaning based on the relations between words and the surrounding context word collection [54]. For example, in a representation that ignores context, such as a one-hot encoding, the words "pretty", "beautiful", and "palace" are equally distant, even though semantically "pretty" should be closer to "beautiful" than to "palace"; context-based word vectors are intended to capture this difference. Semantic similarity analysis contributes to the construction of the MA matrix in discovering innovation dimensions and values.
To be specific, the procedure of obtaining the innovation algorithms that are used to extend the morphological values consists of seven steps, as shown in Figure 1. The first and second steps are the same as those of MA, i.e., field-selecting and data-collecting. Third, the texts are preprocessed in Python. The patent data are cleaned by removing function words and stop words and by normalizing the tenses of words. Fourth, the contents of USIT Operators are analyzed by experts, and the keywords are extracted to represent USIT Operators. Fifth, the extracted keywords are further selected by their high occurrence frequency and reorganized based on semantic similarity analysis using experts' knowledge. Sixth, the inner connectivity of the keywords is analyzed to reorganize the USIT Operators of the target technology field using expert knowledge. Seventh, the value of the newly reorganized algorithms is defined to make the algorithms clear and convenient to use.
In fact, these algorithms are not only applicable for the target technology field but also valuable to other fields that could be explored in further research.
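The difference between the CBOW and skip-gram architectures mentioned in the framework can be illustrated by how each one forms its training examples from a token sequence. The following Python sketch is an illustration only: the function names, the window size, and the sample tokens are assumptions, and the sketch covers only the construction of training examples, not the neural network training that Word2Vec performs on them.

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram: for each current word, emit (current word, surrounding word) pairs."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """CBOW: for each position, emit (context words, current word to predict)."""
    examples = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:
            examples.append((context, target))
    return examples


# Hypothetical token sequence from a preprocessed patent sentence.
SAMPLE = ["hydraulic", "fracturing", "coal", "seam"]
```

With a window of 1, `skipgram_pairs(SAMPLE, window=1)` pairs each word with its immediate neighbours, while `cbow_examples(SAMPLE, window=1)` groups those neighbours as the input from which the middle word is predicted.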

Phase 1: Data Collection and Preprocessing
Since the proposed method is executed based on the patent documents in specific technical fields, structured data must be extracted to explore their inner logic and regularities. First, WIPO PatentScope is selected as the data source, which records the patents of many countries, including China, the Russian Federation, and the United States. The PatentScope database provides access to international Patent Cooperation Treaty (PCT) applications in full-text format on the day of publication, as well as the patent documents of participating national and regional patent offices. Thus, numerous patent documents from worldwide patent databases can be collected in WIPO PatentScope, and valuable information can be derived automatically. Although patent documents from around the world can be collected and analyzed, each database has its own characteristics in terms of main patent classifications and diversity of applicants, and the integration of databases may entangle valuable information when analyzing different forms of patent texts. Second, the retrieval methods in WIPO PatentScope consist of five categories. This research selects the cross lingual expansion because it can search patent documents in various languages from English keywords. The collected patent documents are ranked by their relevance in WIPO PatentScope, and each of them has IPC (International Patent Classification) codes to represent the target technology. In the case used in this paper, the patent documents are collected by searching the keyword "coalbed extraction" in the cross lingual expansion search mode, which is effective for increasing the integrity of innovation elements. Third, the patent data are preprocessed in Python because much of the valuable information in the patents is obscured by meaningless information, such as detailed parameters, effectivity, etc. Word segmentation is executed to divide the patent documents into single words and prepare for the next step. 
For example, the sentence of "Well, I have an idea" is divided into "Well," "I," "have," "an", and "idea." Fourth, function words and stop words are removed to eliminate unnecessary impact. For example, "Well" and "an" are eliminated to decrease the workload of experts. In this step, the keywords in patent documents are extracted by calculating the TF-IDF of each word in the collected patent documents. Finally, stemming is executed to transform the words from inflectional affixes into the basic form of stems. For example, "separate" is transformed into "sep," and "measurement" is transformed into "meas." In this step, the keywords with high frequency in patent documents are identified from USIT Operators.
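The preprocessing steps described above (word segmentation, stop-word removal, and stemming) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the stop-word list and the suffix list are hypothetical, and the toy suffix stripper is not the stemmer used in the study, so it does not reproduce stems such as "sep" or "meas".

```python
import re

# Hypothetical stop-word list; a real list would be far longer.
STOP_WORDS = {"well", "an", "a", "the", "of", "is", "and"}

# Hypothetical suffixes for a toy stemmer, longest first.
SUFFIXES = ("ments", "ment", "ing", "ed", "s")


def segment(text):
    """Word segmentation: split a text into single alphabetic words."""
    return re.findall(r"[A-Za-z]+", text)


def remove_stop_words(tokens):
    """Drop function words and stop words (case-insensitive match)."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def toy_stem(word):
    """Strip one known suffix if at least three characters remain."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def preprocess(text):
    """Segmentation, then stop-word removal, then stemming."""
    return [toy_stem(t) for t in remove_stop_words(segment(text))]
```

Applied to the sentence used in the text, `segment("Well, I have an idea")` yields the five words listed above, and `remove_stop_words` then drops "Well" and "an".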

Phase 2: Basic Morphology Matrix Construction
In the basic morphology matrix construction phase, the morphological structure of the selected technology is defined to denote the characteristics of the technology hierarchically, as illustrated in Table 2. Generally, the morphological matrix is constructed by the combination of dimensions, corresponding sub-dimensions, and specific values. In this study, sub-dimensions are integrated by multiple values to represent the main structure of the target technology. Dimensions are defined to ensure that the integration of those sub-dimensions forms a coherent whole of the target. From the perspective of dimensions, the entire morphology of the technology domain could be described in a concise manner. The values composed by keywords represent the basic innovation elements of the potential for new technological ideas, while domain-based dimensions divide the boundary of values. To make the connection between dimensions and values clear, the subordinate concept is assigned as sub-dimensions to bridge the gap between dimensions and values by F-term and a supplementary expert's discussion. Technological opportunities are derived at the value level, while the innovation paths are performed based on dimensions and sub-dimensions. The process of constructing the basic morphology matrix is performed as follows. First, TF-IDF is utilized to identify the keywords from preprocessed patent documents assuming that keywords are used to label the crucial contents of documents. A word whose TF-IDF is not zero is selected as a preparatory keyword. The TF-IDF matrix for identifying representative keywords is listed in the supplementary material. Second, based on the experts' knowledge, a series of valuable keywords are screened from the preparatory keywords related to the technology. Third, the keywords are transformed into structured data to generate keyword vectors and then clustered into different groups by Word2Vec. The principle of classifying keywords lies in semantic similarity. 
As the details illustrated in Appendix A indicate, multidimensional vectors are produced from the extracted keywords based on the semantic contexts. The semantic relation among terms in the corpus comprised of extracted keywords is discovered by the skip-gram technique. After the training converges, terms with similar meaning are mapped to a similar position in the vector space. Fourth, according to the F-term system and domain experts, the selected keywords in each cluster are reclassified and connected with values. The rule of reclassification is that the inner link represents the common characteristics. For instance, pulse, ultrasonic, microwave, and infrared in Cluster 2 are similar in that they are forms that energy takes as it moves, so they could be utilized to generate percussion in the coal seam. Finally, dimensions are concluded on the basis of values by domain experts' analysis, and the concepts of sub-dimensions are developed from dimensions. Thus, the basic morphological matrix is constructed to portray the structure of technology in a semi-autonomous manner, as shown in Table 2. With the keywords of the collected patents, we can begin to expand the values by F-term classification, and dimensions can be further concluded on the basis of sub-dimensions to minimize the dependence on experts' intuition.
TF-IDF was first introduced as a term weighting scheme in 1972 [55] and has been the most fundamental form of document representation [56]. The basis of TF-IDF lies in the bag-of-words scheme whereby each document can be represented by a series of keywords that occur in the document. TF-IDF involves multiplying the IDF measure (the inverse document frequency of the term across the collection; the fewer documents that contain the term, the better) by a TF measure (the frequency of the term in the document; the more, the better), which reflects each word's value in the document in a quantitative manner. 
In this research, we apply the basic formula of TF-IDF analysis as shown in Equation (1) [57]. The parameter $tf_{ij}$ represents the number of times word $i$ appears in document $j$. The parameter $idf(t_i)$ reflects that term $t_i$ occurs in $n_i$ of the total number of records $N$:

$$\mathit{tfidf}_{ij} = tf_{ij} \times idf(t_i) = tf_{ij} \times \log\frac{N}{n_i} \qquad (1)$$
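As a rough illustration of this weighting, the following minimal sketch computes the basic TF-IDF weight $tf_{ij} \times \log(N / n_i)$ and keeps every term with a non-zero weight as a preparatory keyword. The token lists are hypothetical and the natural logarithm is an assumption; neither is taken from the patent set or from [57].

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document, with weight = tf * log(N / n)."""
    n_docs = len(docs)
    # n_i: number of documents in which each term occurs at least once
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # tf_ij: raw count of term i in document j
        weights.append({t: c * math.log(n_docs / doc_freq[t]) for t, c in tf.items()})
    return weights

# Hypothetical, already-preprocessed token lists (not taken from the patent set)
docs = [
    ["coal", "seam", "fracture", "fracture"],
    ["coal", "seam", "drill"],
    ["coal", "microwave", "drill"],
]
weights = tf_idf(docs)
# Preparatory keywords: terms with a non-zero weight in at least one document
preparatory = sorted({t for w in weights for t, v in w.items() if v > 0})
print(preparatory)
```

In this toy corpus "coal" occurs in every document, so its weight is zero and it is not kept as a preparatory keyword.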

Phase 3: Innovation Algorithm Generation
The innovation algorithms are generated on the basis of USIT Operators. In the innovation algorithm generation phase, USIT Operators are defined by main keywords to identify the representative keywords for the target technology. USIT Operators have a system of solution-generating methods containing 5 main methods with 32 sub-methods [35].
The process of extracting representative keywords from USIT is as follows. First, each USIT Operator is defined by a set of keywords. Second, the frequency of each keyword in the preprocessed patent documents is counted with CountVectorizer, a Python text-vectorization tool. Stemming is applied to reduce the words to their stems. For instance, "sep" stands for "separate," and "meas" stands for "measurement." Third, 31 keywords are selected and restored to their original form. Invalid keywords that cannot be traced back to an original word are then eliminated, and some similar external words are added as keywords to increase the accuracy of the keywords. Consequently, the link between USIT Operators and patent documents can be analyzed effectively by using the keywords rather than the entire contents of the patent documents.
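The counting and restoring steps can be sketched as follows. This is a simplified stand-in that uses plain prefix matching in place of CountVectorizer and a real stemmer. The stems "sep" and "meas" follow the examples in the text, while the stem "combin", the sample sentences, and the helper name `count_stems` are hypothetical.

```python
import re
from collections import Counter

# Stem -> restored keyword; "sep" and "meas" follow the text, "combin" is an assumed example
STEM_TO_KEYWORD = {"sep": "separate", "meas": "measurement", "combin": "combine"}

def count_stems(documents, stems):
    """Count, over all documents, the tokens that start with each stem."""
    counts = Counter()
    for text in documents:
        for token in re.findall(r"[a-z]+", text.lower()):
            for stem in stems:
                if token.startswith(stem):
                    counts[stem] += 1
    return counts

# Hypothetical sentences standing in for preprocessed patent text
documents = [
    "The gas is separated from water before measurement.",
    "A separation unit and a measuring device are combined.",
]
stem_counts = count_stems(documents, STEM_TO_KEYWORD)
# Restore each stem to its complete word before reporting frequencies
keyword_counts = {STEM_TO_KEYWORD[s]: n for s, n in stem_counts.items()}
print(keyword_counts)
```

Prefix matching can over-count (for example, "sep" would also match "september"), which is one reason a manual check of the restored keywords remains useful.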
To simplify the application of USIT to MA, USIT Operators are reconstructed by condensing the original 32 sub-methods into five groups, which are called innovation algorithms. Based on the results of the previous step, the representative keywords of USIT Operators are further grouped by experts on the basis of semantic similarity. For example, "movable," "vary," and "flexible" share the meaning of "dynamic," and both "fluid" and "liquid" describe the flowing state of a substance, which also implies "dynamic"; these words are therefore assigned to one innovation algorithm, defined as "Dynamic." MA is widely used for its intuitiveness and simplicity, and the innovation algorithms improve the compatibility between USIT and MA, providing a path for extending the basic morphology matrix.

Phase 4: TOD Achievement
Because the information extracted from existing patent documents is limited by the depth of values and the width of sub-dimensions, which is disadvantageous to TOD, a supplementary morphology matrix is developed to extend the values and sub-dimensions in the TOD achievement phase.

Extending Sub-Dimensions
Although the F-term classification system covers the detailed technical features of inventions, it does not provide an overall framework for a specific technological field. In response, sub-dimensions need to be expanded vertically to increase their width and develop a more structured matrix. With the aid of experts, innovation algorithms help expand the hyponyms in the dimension framework. For example, when a sub-dimension in the mechanism dimension is developed, the innovation algorithm "Friendliness," which aims to adjust the relationship between the elements of the innovation system and the environment, can guide the development of hyponyms as preliminary concepts. In this manner, numerous concepts can be extended and further developed as sub-dimensions. This is possible because the concepts are created by combining the results of the clustering analysis with the innovation algorithms.

Extending Values
The expansion of values falls into two categories. The first concerns the original values in the basic morphology matrix. The second concerns the new values developed from those original values. Moreover, building on the basic matrix, the developed values in each sub-dimension can be further expanded under the guidance of innovation algorithms. For example, the value "reverse circulation," which belongs to the basic matrix, can be developed into "positive circulation." The extended morphology matrix executes the value generation process to expand the depth of values. In this way, the values in the basic morphology matrix are extended horizontally, in the form of nouns or noun phrases, to enrich the values of the technology.
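One possible way to record where each extended value comes from is sketched below. The class names, the `derivation_depth` helper, the pairing of "positive circulation" with the "Dynamic" algorithm, and the placeholder second-step value are all assumptions made for illustration; the text does not state which innovation algorithm produces which value.

```python
from dataclasses import dataclass, field

@dataclass
class Value:
    name: str
    sources: tuple = ()   # values this one was developed from (empty for basic values)
    algorithm: str = ""   # innovation algorithm applied (assumed label, for illustration)

@dataclass
class SubDimension:
    name: str
    values: list = field(default_factory=list)

    def extend(self, name, sources, algorithm):
        """Add a new value derived from existing values of this sub-dimension."""
        known = {v.name for v in self.values}
        missing = [s for s in sources if s not in known]
        if missing:
            raise ValueError(f"unknown source values: {missing}")
        value = Value(name, tuple(sources), algorithm)
        self.values.append(value)
        return value

    def derivation_depth(self, name):
        """0 for a basic value, 1 for a value derived from basic values, and so on."""
        by_name = {v.name: v for v in self.values}
        value = by_name[name]
        if not value.sources:
            return 0
        return 1 + max(self.derivation_depth(s) for s in value.sources)

drilling = SubDimension("earth drilling method", [Value("reverse circulation")])
# Assumed pairing: the text does not say which algorithm yields this value
drilling.extend("positive circulation", ["reverse circulation"], "Dynamic")
# Placeholder for a value developed from an already extended value
drilling.extend("example second-step value", ["positive circulation"], "Combination and integration")
print(drilling.derivation_depth("example second-step value"))
```

Keeping the source values and the applied algorithm with each new value makes the extension path traceable when experts later review the extended matrix.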

Identifying Technology Ideas
New technology ideas are identified from the extended morphology matrix. In this paper, the procedure of identifying new ideas comprises two layers: the first layer identifies conceptual ideas, and the second layer identifies new technology ideas. The analysis in the first layer is performed on dimensions and sub-dimensions. Based on single sub-dimensions and combinations of sub-dimensions in the same or different dimensions, multiple new conceptual ideas are derived, which domain experts can develop into detailed technology ideas. The results of the first layer provide a guideline for the second layer, which gives the detailed description of new technology ideas. In the second layer, detailed technology ideas are derived from values on the basis of the conceptual ideas. Values enrich the details under the framework of dimensions and sub-dimensions. Thus, the second layer is effectively a length-and-width extension of the first layer.
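The first-layer step can be pictured as enumerating single sub-dimensions and combinations of sub-dimensions. The sketch below is only an illustration: the sub-dimension names are borrowed from the paper's CBM example, and the selection, the `max_size` limit, and the function name are assumptions. Every combination it lists is merely a candidate that domain experts would still have to assess.

```python
from itertools import combinations

# Sub-dimensions per dimension; names follow the CBM example, the selection is illustrative
matrix = {
    "Space": ["location of the object to be drilled"],
    "Mechanism": ["fracture", "scour"],
    "Material": ["fracturing medium"],
}

def conceptual_ideas(matrix, max_size=2):
    """List single sub-dimensions and combinations of up to max_size sub-dimensions."""
    pairs = [(dim, sub) for dim, subs in matrix.items() for sub in subs]
    ideas = []
    for size in range(1, max_size + 1):
        # combinations() covers pairs within one dimension and across dimensions
        ideas.extend(combinations(pairs, size))
    return ideas

ideas = conceptual_ideas(matrix)
print(len(ideas))
```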

Phase 1: Data Collection and Preprocessing
To illustrate the process of integrating basic MA with the innovation algorithms derived from USIT to explore technological opportunities, we provide an example of CBM extraction technology. The reasons for selecting CBM extraction technology are as follows. First, previous research on TOD has mainly focused on the field of product innovation but has rarely involved engineering innovation. The example will help engineers discover technological opportunities in engineering fields. Second, because CBM is an efficient, clean and environmentally friendly new energy and has become one of the emerging industries with huge development potential in the energy industry, identifying possible technological opportunities in CBM extraction is greatly significant for the industry, the nation, and all of society in that it can reduce air pollution and harness mine gas hazards.
The time window is set from 2009 to 2018, and 1385 patent documents related to coalbed methane extraction technology are collected by searching the keywords "coalbed extraction" in the WIPO PatentScope database with cross-lingual expansion. Before data preprocessing, we invited three researchers in the field of CBM from China University of Mining and Technology (CUMT) and Henan Coal Society (HCS). In addition, we invited one patent examiner from the State Intellectual Property Office of China (SIPO). With the assistance of these four experts, and according to the evaluation mechanism of the WIPO PatentScope database, only the 100 patent documents most relevant to coalbed methane extraction methods are retained, excluding those on devices and systems, to discover critical innovation dimensions and then technology opportunities. Table 3 lists the highly relevant IPC classes and the corresponding F-term classes, which reflect the development of CBM extraction technology over the past 10 years and pave the way for discovering the values of keyword-based innovation elements.
Text mining is utilized to develop a technology dictionary that represents the major components of coalbed methane extraction. First, we extracted keywords from each patent document on the basis of TF-IDF using Python. A matrix of 100 patents and 2930 keywords, $[m_{ij}]_{100 \times 2930}$, is constructed to show the TF-IDF values of the keywords in each patent document. According to this matrix, 2930 keywords are originally obtained. Then, the number of keywords is reduced sharply to 168 by eliminating supplementary or superfluous words such as "ability," "feedback," and "greatly." In this process, each keyword is selected with expert assistance based on its specific value in coalbed methane extraction. At this stage, the 168 keywords are obtained without specifying the relationships among them. The TF-IDF matrix is shown in the supplementary material, and the clustering results are listed in Appendix B.

Phase 2: Basic Morphology Matrix Construction
To deal with the ambiguous relation of the values from different patent documents, all of them are transformed into a keyword vector and then clustered into 40 groups by the Word2Vec algorithm. For example, "discharge" and "outlet" are close to each other. With this, the values with similar meanings are grouped into the same cluster, as illustrated in Figure 2. The full result of word vectors is presented in Appendix A. The line between each associated word represents the semantic proximity. The darker the colour is, the closer these two words' semantics are. The values in each cluster can be explained by a few dimensions with the guidance of domain experts. To make the connection between dimensions and values clear, values are first assigned into sub-dimensions, the concepts of which are subsidiary to the dimensions in Table 2. Each keyword has a specific value in the technology, which is developed by F-term and domain experts. F-term theme and viewpoint are extracted from the JPO database. Hence, the morphology of coalbed methane extraction is presented in Table 2. There are three innovation dimensions for coalbed methane extraction, including the mechanism dimension, space dimension, and material dimension. The technology can be structured in a multidimensional view according to the analysis of sub-dimensions and values. Moreover, Table 4 lists the definitions of dimensions that are used to illustrate the technology morphology.

2. Space dimension: Specific representations and characteristics of the system (machine, product, engineering, etc.) in the space domain.

3. Material dimension: All attributes of the material belonging to the innovation system (machine, product, engineering, etc.).
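The grouping of keywords by semantic proximity in Phase 2 can be illustrated with cosine similarity on word vectors. The sketch below is not the authors' procedure: the 3-dimensional vectors are made up rather than trained Word2Vec embeddings, and the greedy threshold rule with `threshold=0.9` is an assumed stand-in for whatever clustering was applied to obtain the 40 groups.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def group_by_similarity(vectors, threshold=0.9):
    """Greedy grouping: a word joins the first cluster whose seed word is similar enough."""
    clusters = []  # each cluster is a list of words; its first word is the seed
    for word, vec in vectors.items():
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters

# Made-up 3-dimensional vectors standing in for trained Word2Vec embeddings
vectors = {
    "discharge": [0.9, 0.1, 0.0],
    "outlet": [0.8, 0.2, 0.1],
    "ultrasonic": [0.0, 0.9, 0.3],
    "microwave": [0.1, 0.8, 0.4],
}
clusters = group_by_similarity(vectors)
print(clusters)
```

With these toy vectors, "discharge" and "outlet" end up in one group and "ultrasonic" and "microwave" in another, mirroring the kind of proximity described for the keyword clusters.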

Phase 3: Innovation Algorithm Generation
In this phase, the definitions of USIT Operators were analyzed to determine the keywords for text mining. With expert assistance, USIT Operators were divided into 70 keywords as the representative innovation elements. Appendix C shows a list of keywords extracted by the definition of solution generation methods. Their frequency is calculated by CountVectorizer. Considering the heterogeneous tense of the patent documents, the lexical form was restored to simplify the patents and USIT Operators. For example, "sep" means "separate," and "meas" means "measurement." Some invalid keywords were eliminated on account of failing to find the original words, such as "cont," "in," and "plac." Consequently, the keywords in USIT Operators were improved in the form of complete words. The full result of improved USIT keywords is presented in Appendix D. Moreover, some supplementary words were added to enrich the meaning of USIT Operators and increase the likelihood of innovation elements appearing in the patent documents. Then, the representative keywords in the technology of CBM extraction were selected on the basis of the improved keywords of USIT Operators.
According to the results in Appendix D, the keywords with high frequency are selected and then clustered by experts into 5 groups to identify the core innovation algorithms, as shown in Table 5. The five innovation algorithms were not selected directly from USIT Operators but were reorganized from the high-frequency keywords in USIT Operators. The criterion for selecting the keywords was their frequency of occurrence in patent documents, and the criterion for reorganizing them lay in semantic similarity. In group 1, for example, "connect" has the meaning of joining two or more things together, while "combine" means to join two or more things or groups together to form a single one, according to the Oxford English Dictionary. Thus, the two words are assigned to the same group to compose the innovation algorithm of "combination and integration." In group 2, "gaseous," "liquid," and "fluid" have the characteristics of being flexible, movable, and variable. Hence, the three words are utilized to describe the change of an object's state. In group 3, words such as "improve," "modify," and "change" refer to making something different from what it was before. In group 4, words such as "smaller," "separate," and "divide" share the meaning of causing something to move or be apart. In group 5, "measurement," "detection," and "level" are clustered in view of the human factors of engineering. Table 6 gives the definitions of the innovation algorithms that may be utilized to extend the values of technology morphology. Once the innovation algorithms are identified and restructured based on USIT Operators, they are applied to extend the values of technology morphology so that the basic morphology matrix can be expanded horizontally and systematically.

Table 6. Definition of innovation algorithms derived from USIT in the technology of CBM.

| Group | Innovation Algorithm | Definition |
|---|---|---|
| 1 | Combination and integration | Combine the same or different performance elements into one and take some methods to change the original scattered state of some independent elements to make them more interconnected and organically integrated. |
| 2 | Dynamic | Give the innovation elements the ability to change in time, space, environment, and other conditions. |
| 3 | Local optimization | Adjust the parameters to reach the optimal solution state partially to improve the performance of the entire innovation system (machine, product, engineering, etc.). |
| 4 | Decomposition and removal | Decompose and transform the innovative system (machine, product, engineering, etc.) into small units by means of segmentation, separation, removal, extraction, etc. to add necessary functions, reduce unnecessary functions, remove harmful functions, etc. |
| 5 | Friendliness | Adjust the relationship between the elements of the innovation system (machine, product, engineering, etc.) and the environment to reduce the occupied resources, minimize the harm caused to the environment, and achieve harmonious and friendly human-machine-environment relationships in view of human factors of engineering. |
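To make the regrouping concrete, the sketch below maps example keywords to the five innovation algorithms of Table 6 and sums their frequencies per algorithm. The keyword groups follow the examples given in the text and assume that groups 1 to 5 correspond to the Table 6 rows in order. The frequencies and the keyword "pump" are hypothetical and are not the counts reported in Appendix D.

```python
from collections import defaultdict

# Example keywords per innovation algorithm, following the groups described in the text
ALGORITHM_KEYWORDS = {
    "Combination and integration": ["connect", "combine"],
    "Dynamic": ["gaseous", "liquid", "fluid"],
    "Local optimization": ["improve", "modify", "change"],
    "Decomposition and removal": ["smaller", "separate", "divide"],
    "Friendliness": ["measurement", "detection", "level"],
}

def algorithm_frequencies(keyword_counts):
    """Sum keyword frequencies per innovation algorithm; unmapped keywords are ignored."""
    keyword_to_algorithm = {
        kw: alg for alg, kws in ALGORITHM_KEYWORDS.items() for kw in kws
    }
    totals = defaultdict(int)
    for keyword, count in keyword_counts.items():
        algorithm = keyword_to_algorithm.get(keyword)
        if algorithm is not None:
            totals[algorithm] += count
    return dict(totals)

# Hypothetical keyword frequencies, not the counts reported in Appendix D
keyword_counts = {"connect": 12, "combine": 5, "fluid": 7, "separate": 9, "pump": 4}
totals = algorithm_frequencies(keyword_counts)
print(totals)
```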

Extending Sub-Dimensions
The basic morphology matrix is supplemented and reconstructed by generating additional values and sub-dimensions to yield an extended morphology matrix. Numerous concepts of sub-dimensions are developed to make the matrix more structured. For coalbed methane extraction, "fracture," "scour," "thermal driving," "fusion," etc. are extended as the new sub-dimensions that show the characteristics of the specific technology field. Based on the clustering results, corresponding values in the extended sub-dimensions are developed to enrich the matrix. Furthermore, several novel sub-dimensions are expanded, mainly based on innovation algorithms and experts' knowledge, such as microorganism, which is proposed to adjust the relationship between the elements of the innovation system and the environment.

Extending Values
Via innovation algorithms, the hyponyms of basic sub-dimensions are generated to extend the values, including "positive circulation" and "shock." For instance, if a value is selected as the target to be extended, at least one innovation algorithm is applied to develop the hyponym within the range of dimensions and sub-dimensions. Table 7 shows examples of value extension by coupling the innovation algorithms with the basic values. The procedure of value extension in the earth drilling method is described in two categories. In the first category, new values are developed from the basic matrix. For example, the new value "positive circulation" is developed from the basic value "reverse circulation" because both share the hypernym "earth drilling method." In the second category, values are developed from the new ones in the first category. For example, "impact swing" is developed on the basis of "rotation" and "shock," both of which are values expanded in the first category. The extended morphology matrix obtained by coupling innovation algorithms is presented in Table 8.

In the first layer, conceptual ideas are identified based on dimensions and sub-dimensions. Specifically, in the sub-dimension of the earth drilling method, the combination of multiple methods could yield new drilling methods. The combination of different dimensions also induces conceptual ideas. For example, in the framework of the space dimension and mechanism dimension, the location of the object to be drilled could be combined with fracture, scour, etc. The new possible ideas are illustrated in Table 9.

Table 9. New conceptual ideas in the first layer.

| New Conceptual Ideas: Dimension | New Conceptual Ideas: Sub-Dimension | Description |
|---|---|---|
| Space dimension | Location of the object to be drilled | The state of pressure, permeability and gas saturation depend on the area of coalbed. Choosing a different operation point of the drill or reaction may contribute to the output of CBM. |
| Space dimension, Mechanism dimension | Location of the object to be drilled, Fracture, Scour | By applying different principles to the process of extraction, such as fracture, scour, etc., it is possible to control the process and vary the medium of the process according to the space and CBM content in a more harmless way. |
| Space dimension, Mechanism dimension, Material dimension | Location of the object to be drilled, Fracture, Flooding medium, Fracturing medium | In view of the space between different coalbeds, a specific medium or reactant may be applied to the coalbed to optimize the effect of recycling CBM or flow conductivity of fractures. |

Identifying Technology Ideas in the Second Layer
The analysis in the second layer is performed in the values. New technology ideas are identified based on the combinations of conceptual ideas derived from the above three dimensions. If the sub-dimension of the space dimension, location of the object to be drilled, and the sub-dimension fracture in the mechanism dimension are selected as the guideline, the combinations of their values will contribute to new detailed technology ideas.
Additionally, logical analysis is essential in this part. To acquire feasible technology ideas, the location value of the space dimension is first confirmed. In other words, it determines where to extract CBM. Then, the expected mechanism in the mechanism dimension is determined to apply the technology. The material dimension then serves as an auxiliary tool to support the procedure of discovering technology opportunity. Table 10 shows the new technology ideas that are derived based on the extended morphology matrix.

The horizontal well and vertical well are combined into a complex well group, and the horizontal well layout plan of the well group is optimized according to the geological conditions of the mine area. The large-area fracture network is formed by the horizontal well fracturing to increase the speed of coal mine drainage pressure reduction and increase the production of CBM.

Location: coal reservoir; Percussion: ultrasonic
One or more holes for the bedding layer or the through-hole vibration hole are arranged on the target coal seam, and the explosion-proof or intrinsically safe vibration equipment is placed into the bottom of the hole by the mounting rod. The vibration operation is performed through ultrasound, and the vibration-affected area is carried out according to a conventional method for CBM extraction.

Location: coal reservoir; Microorganism: hydrogen-producing bacteria + methanogens + Fracture: hydraulic; Fracturing medium: hydrogen-producing bacteria + methanogens
The fracturing fluid used in the coal fracturing operation is a bio-fracturing fluid prepared by mixing hydrogen-producing bacteria with methanogens. Hydrogen-producing bacteria and methanogens can transform coal reservoirs without harming the coal seam and improve the capacity of CBM desorption.

Location: coal reservoir + surrounding rock; Earth drilling method: impact swing + Scour: gas + fluid
Drilling into a coal reservoir from surrounding rock by the method of impact swing to increase the drilling rate and decrease the cost of extraction and then using the air fluid flushing to release the pressure from the acupoint. The high-pressure air shock wave is used as the power source, and the hole wall is subjected to the near-cylindrical rotary impact to cut the acupoints so that the coal around the borehole is gradually broken away from the hole wall to form a pressure-relief space. Then, gas drainage is implemented in the coal seam according to a conventional method.

Different from the previous practice of simply drilling coalbed methane in the surrounding rock, the scheme is to drill along a long seam by the method of deep mixing in the surrounding rock. Then, supercritical carbon dioxide is injected to perform the process of fracturing by repeated pulsation fracturing in the surrounding rock to create more gaps to solve the issue that fracture stress could not be effectively transferred in the plastic structure.
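The second-layer ordering described in this section (fix the location first, then the mechanism, then treat the material as an optional auxiliary choice) can be sketched as a Cartesian product over values. The value names are borrowed from the examples in the text, while the function name and the dictionary layout are assumptions. Each generated combination is only a candidate for expert screening, not a validated technology idea.

```python
from itertools import product

def technology_ideas(locations, mechanisms, materials=()):
    """Fix the location first, then the mechanism, then optionally attach a material value."""
    material_options = [None, *materials]  # None = idea without a material value
    return [
        {"location": loc, "mechanism": mech, "material": mat}
        for loc, mech, mat in product(locations, mechanisms, material_options)
    ]

# Values borrowed from the examples in the text; the selection itself is illustrative
locations = ["coal reservoir", "coal reservoir + surrounding rock"]
mechanisms = ["fracture: hydraulic", "percussion: ultrasonic"]
materials = ["fracturing medium: hydrogen-producing bacteria + methanogens"]

ideas = technology_ideas(locations, mechanisms, materials)
print(len(ideas))
```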

Discussion
This study employed a keyword-based hybrid approach of MA and USIT to analyze patent information for TOD in the field of CBM extraction. Depending on the complexity of the technology, two morphology matrices, basic and extended, were developed, and three major dimensions (mechanism, space, and material), as defined in Table 4, were analyzed according to the results of patent analysis. Moreover, as Tables 9 and 10 show, the mechanism dimension was critical to developing the technology of CBM extraction, while the space and material dimensions played subsidiary roles. As listed in Table 6, five innovation algorithms were developed from USIT based on the result of patent analysis. These innovation algorithms inherited the succinctness of USIT and were adapted to CBM extraction technology. Conventionally, a morphology matrix was established solely by numerous domain experts and practitioners, and biased knowledge bases limited the process of knowledge transfer. Thus, this research provided a systematic way to develop basic and extended morphology matrices, identify the critical innovation elements in them, and further discover technology opportunities for governments and firms.
Several features of the hybrid approach are worth noting. First, although the dimensions and innovation algorithms in this research are derived from a specific technical field, they are defined in a generalized way and therefore have the potential to be applied to other research fields. Second, the extension of sub-dimensions and values is guided by innovation algorithms, while the basic sub-dimensions and values are developed from patent documents. Compared to the basic morphology matrix, the extended one offers more valuable information. Third, MA and USIT are combined at the implementation level, so that the structure of the morphology matrix and the logic of USIT are retained in their intrinsic form. Although previous research has merged MA with other techniques, such as WordNet [15], Wikipedia [21], and conjoint analysis [30], the present approach differs from them in that innovation algorithms intervene in the process of extending the morphology matrix.

Conclusions
As a remedy for subjective and expert-based morphology building, this research has proposed a hybrid approach for TOD on the basis of MA and five innovation algorithms developed from USIT to extend the basic morphology matrix. While a technology can be visualized through its morphological structure, classical MA is qualitative in nature, which makes value building in each dimension fuzzy and decreases the efficiency of exploring new technological opportunities.
Several contributions are made in this paper. From the perspective of theory, first, this paper provides a semiautonomous and systematic procedure of constructing the basic and extended morphology matrix for review research. The use of innovation algorithms in the morphology analysis extends the subjective and expert-based morphology. Second, the proposed framework provides an initial study for the concept of innovation algorithms that are derived from USIT that has not appeared in existing review research. USIT Operators are reorganized into five high-frequency innovation algorithms in the specific technology field to improve the applicability for varying the technological morphology. From the perspective of application, the procedure of deriving new technology ideas is systematically depicted from two layers, which act as a guideline for TOD, on which prior research rarely focuses. The proposed approach would show remarkable performance to discover numerous valuable technology ideas in the framework of MA.
As mentioned earlier, the proposed method could be utilized by government and industry to plan technology strategies because it can discover the potential technology opportunities based on existing patent documents. Patent analysis is employed actively in excavating the related technology information and relevant innovation algorithms for technology opportunity discovery. Thus, at the national level, a new technology idea derived by the proposed approach may accelerate the development of an emerging technology that requires government to formulate technology policy for economic growth. Based on the patent analysis, the basic and extended technological morphology matrix can be constructed to visualize the technology. When the technology is decomposed into several dimensions and new potential values are developed widely through innovation algorithms, MA as a method to structure a problem is effective to avoid legal disputes over existing patents between developers competing around the world. The hybrid approach uses innovation algorithms instead of technological content to extend the morphology matrix, thus, this approach guides a path for knowledge transfer by combining knowledge from the aspects that were undetected previously in a practical way. At the firm level, under the circumstances of increasing technology complexity, growing technology convergence, and segmented technological knowledge, the direction for TOD can help create a competitive strategy for R&D planning, especially for SMEs to gain the lead position preferentially in emerging markets by clarifying technological performance efficiently. Key information is identified for new technology opportunities that may hold great business value for companies from ambiguous patent data by automated approaches. Considering the R&D cost, this hybrid approach provides a systematic way to utilize the common patent data to analyze the complex technology systems and further pursue innovation. 
Although the procedure of clarifying technological performance is slightly complex to perform, experts' participation has been reduced, and the obtained innovation paths are clearly presented, which will decrease the R&D cost.
This study still has some limitations. First, new technology ideas are derived at a theoretical level. The system for assessing each idea's value is underdeveloped, which reduces the efficiency of discovering feasible technology opportunities among the newly derived ideas. Although TOD should concentrate on opportunities and the strategic use of information in terms of objects and activity [8], the link between opportunities and the use of strategy is not explicit in this study. Second, five types of innovation algorithms were defined in this paper to further simplify the use of USIT and extend the morphology matrix. However, because they were clustered by the frequency of keywords, the results might not provide a generalized system of innovation algorithms. Third, bibliometric analysis is performed only by semantic similarity analysis, which might hinder it from clarifying the overall technological characteristics. The procedure to process documents is also complex compared with other improved text mining methods. This study endeavored to minimize subjective opinions and human effort and time with a computerized approach, but expert opinions are still required to analyze and reflect practical considerations more accurately.
Therefore, future research can focus on the following three aspects. First, an effective value assessment system needs to be developed for new technology opportunities, which can be utilized to systematize the procedure of screening feasible technology opportunities. Second, a generalized view of the concepts reorganized from USIT could be further defined to explore the deeper links among innovation algorithms. Third, integration with other bibliometric techniques, such as LDA [56], which draws on both word frequency and semantic information, could capture the distribution of topics more comprehensively, and the Apriori algorithm for association rule mining could also be considered to reduce the number of iterations while retaining high accuracy [58,59].

Conflicts of Interest:
The authors declare that there is no conflict of interest regarding the publication of this paper.

| No. | USIT Operators | Explanation | Keywords |
|---|---|---|---|
| 2f | Change the phase, utilize the phase change, or change the inner-structure of the Object. | Change the phase (i.e., state of condensation) of Object(s), utilize the phase change, or introduce/change the inner structure at the micro level for using various Attributes thus activated/enhanced. | change, introduce, micro level |
| 2g | Utilize Attributes/properties at the micro level. | Consider/design the structure/properties/interactions of Object(s) at the micrometer or nanometer (or even smaller) scale, and solve the problem from the micro-level principles. | micro level |
| 2h | Improve the properties/performance of the system as a whole. | (Besides the Attributes and Functions of the Objects as the components of the system,) consider the properties (or Attributes) and Functions of the system as a whole and improve them by designing/implementing/improving the system and its components. | |
| | | In order to achieve the target of the system or to solve the problem, introduce a new Function and assign it to an Object either present/modified or newly introduced. | introduce, assign |
| 3e | Distribute/vary the Function in space or utilize the spatial distribution/motion/vibration Function. | Distribute/arrange the Function(s) in some spatial order/structure and increase the degree of spatial freedom. Utilize/enhance the spatial Function(s) of distributing/moving/vibrating the Object(s) (or the Attribute(s) of Object(s)). | distribute, arrange, enhance |
| | | Introduce/enhance Function(s) for adapting/coordinating/controlling the system and make the system higher and more intelligent. | introduce, enhance, intelligent |
| 3i | Achieve the Function with a different physical principle. | In place of the present Function (especially the one achieved by gravitational or mechanical principles), achieve the similar Function in a more effective and controllable way on the basis of a different physical principle. | |
| | Combine solutions at the super-system level. | Consider the higher-level purpose or principal function which should be performed by the system in the problem and solve the current problem by combining/coordinating the present system with the neighboring system(s) and forming/improving the higher-level system (i.e., the super-system). | |
| | Solution Generalization Method | Represent a solution in a more general way, form a solution template, and obtain concepts of solutions in the associative manner. Also generate a hierarchical system of solutions. | hierarchical, generalize |
| 5a | Generalize/specify the solution for associative thinking. | Replace the technical/specific terms in a solution with plain/generic terms, form a plain solution template, and then obtain new specific conceptual solutions in an associative way. By making the word more familiar to ordinary people, who are not specialists, the system becomes easier for common people to learn well and quicker to get the satisfactory answers. | replace, generalize |
| 5b | Construct a hierarchical system of solutions. | Classify a number of solutions obtained so far, make a hierarchical system of solutions with respect to the levels of generalization, consider the overall view of the solution space, and try a comprehensive search of solutions. | |