Research on Product Core Component Acquisition Based on Patent Semantic Network

Patent data contain plenty of valuable information. Recently, the lack of innovative ideas has resulted in some enterprises encountering bottlenecks in product research and development (R&D). Some enterprises point out that they do not have enough comprehension of product components. To improve efficiency of product R&D, this paper introduces natural-language processing (NLP) technology, which includes part-of-speech (POS) tagging and subject–action–object (SAO) classification. Our strategy first extracts patent keywords from products, then applies a complex network to obtain core components based on structural holes and centrality of eigenvector algorism. Finally, we use the example of US shower patents to verify the effectiveness and feasibility of the methodology. As a result, this paper examines the acquisition of core components and how they can help enterprises and designers clarify their R&D ideas and design priorities.


Introduction
As critical elements of the product, components (i.e., parts), especially core components, are essential for product function [1]. With the rapid development of modern manufacturing, the speed of product iteration is accelerating. For products, the core components determine the performance and quality [2]. More companies are starting to pay attention to the identification and acquisition of core components and to strive for efficient product innovation through research of core components.
Although there are many ways to identify components, companies still lack methods to classify essential core components. As a result, enterprises and R&D institutions cannot fully understand what the core components for a product are, which hinders product R&D. In fact, core components play a vital role in identifying the focus of current product design [3]. However, companies tend to dismantle all the products in the marketplace to identify core parts, which is inefficient and costly.
At present, mining of patent texts is receiving increasing attention from academia. Patent texts are technical and contain nearly 90% of the technical information in the world. Use of available, valid patent information could save companies around 60% R&D time and 40% R&D expense [4]. Patent text has the characteristics of standard format, rigorous language and detailed technical descriptions. In addition, patent text has a sturdy structure, which allows its extraction from modules in order to obtain different information. More importantly, patent application time is crucial for patent text. The combination of patent technology information and application time enables the development of technology for its analyzation and prediction [5].
From the characteristics of the patent text, it can be found that patent text contains rich product information. In order to fully explore product structure information from patent texts and eliminate doubts for product designers, this study proposes a core component identification method based on patent texts, which combines the requirements for core component acquisition and the technical advantages of patent text analysis. Meanwhile, the methodology adopts mining technology of SAO text, which offers extra component description information apart from the patent text. Further, the theories of centrality of eigenvectors and structure holes to analyze the obtained SAO structure are introduced. Finally, the weights of components in products to obtain priority development objectives for enterprises and designers are calculated.
Adopting this methodology enables companies to mine and organize patent text information and identify core components. Compared to dismantling market products, analysis based on patent big data enables analysis of product design in a more comprehensive manner. This helps companies understand which components are critical to product design, analyze the design focus of existing industry, and provide a reference for further research.

Methods of Patent Information Mining
Patent information mining research includes three main methods: patent research method based on (i) international patent classification (IPC) [6], (ii) citation analysis [7] and (iii) patent text [8]. The method based on the IPC, includes the combination of IPC number with the quantity and frequency of related patent numbers. This method reflects the process of technology development and foresees a future development path based on timing characteristics [9]. However, this kind of patent extraction method is relatively rough and based on the macro-level, so it cannot maximize information usage [10]. Patent citation analysis starts from the citation network to analyze the relationship between patents and the process of technology evolution [11]. This method can analyze the development process of technology and future trends. Nonetheless, patent citation analysis cannot ensure timeliness; it also ignores vital information in the contents of the patent [12]. Therefore, the most-used analysis method is based on patent text information mining. In recent years, with the growing number of patents, more scholars prefer using machines for patent text mining [13].
Patent text information mining is mainly divided into three strategies: domain knowledge [14], vector space [15] and patent text semantics [16]. Patent information mining based on domain knowledge comprehensively mines patent technology information by constructing domain knowledge ontology, with the information in the patent text represented by the ontology. However, construction of domain knowledge requires a lot of manual participation, which is time-consuming and labor-intensive [17]. Patent information mining based on vector space models patent information as a vector in order to explore the spatial relationships between vectors, which has the advantage of identifying mature technology [18]. Patent information mining based on the semantic information of the patent text represents the text with semantics as the association rules, allowing for semantic mining and semantic relationship mapping to undertake full mining of the deep patent information, thus it has gradually become the research direction favored in academia [19].
To make the semantic information of patent texts more sufficient, and the technical features of patents to be expressed more clearly, scholars use increasingly efficient methods to mine the semantic information of patent texts, such as entity extraction and SAO. Among them, Ki and Kim [20] used SAO to extract the technical information from a patent by summarizing the defects in existing analysis methods so as to reduce manual operations as much as possible when extracting technical description content, and they displayed the results in the form of a matrix. As for An et al. [21], in order to determine the relationship between keywords in patents, prepositions were introduced into the semantic analysis network, thereby overcoming the limitations of keyword network analysis. Wang et al. [22] considered the relationship between different SAOs when using SAO to mark patent texts and proposed the DWSAO framework to assign different weights to different SAOs, which improves the extraction efficiency of patent text information. In addition to SAO, Chen et al. [23] built a patent information extraction framework through named entity Entropy 2022, 24, 549 3 of 20 recognition and semantic relation extraction, which improved performance of named entity extraction in patent texts. Xiong et al. [24] monitored, identified and extracted named entities in drug patents, and, by matching knowledge graphs, increased the efficiency of drug information mining in patents to 83%, providing reliable data for building drug monitoring programs. Compared to SAO, the named entity recognition method extracts less deep text information and cannot relate between words and words in patents [25].

Methods of Patent Analyzation
Patent analysis methods are mainly based on four strategies: complex network, vectors, TRIZ and keyword maps.
With the continuous improvement of computer big data analysis and graph computing applications, more academics have begun to emphasize complex networks. High similarity between complex networks and big data in scale, complexity and dynamics resulted in effective correlation in data expression. There are many patent analysis methods based on complex networks, for example, Iwan, Thorsten and Katja [26] took patents as nodes and citation relationships as arcs, showed reference relationships among patents in the form of complex networks and then analyzed the characteristics. Jiang and Zhou [27], through the establishment of complex network and a small-world model, explored the patent performance of industry-university research cooperation; As Park, Chun and Jeong [28] proposed, extracting keywords from patents and analyzing the network in the form of vectors and matrices can identify relationships between keywords and increase patent development efficiency. Choi, Lee and You [29] implemented text mining to extract the characteristics of word vectors, and the document is clustered by an outlier detection algorithm to develop a patent recognition network and explore potential technologies. In addition, Li et al. [30] raised the problem of avoiding patent infringement; this paper puts forward a problem-solving tool based on modern TRIZ tools, which successfully avoids the problem of the patent equivalent of word infringement. Kim M, Park Y and Yoon J [31] built a patent map through patent semantic topic analysis to discover the inheritance relationships between patents. Comparison of patent analysis methods is shown in Table 1: After comparative analysis, the patent analysis method based on a complex network considers the semantic information in the patent text, can be used for network analysis and has better visualization effect, so it has gradually become a popular method for patent analysis. Since 2019, new ideas of patent mining research based on complex networks have constantly been brought up. It also will be a hot topic in the future [32].

Methods of Core Component Acquisition
For component recognition, identification methods of core components are mainly divided into four categories: network algorithm, machine learning, quality function deployment (QFD) and term frequency-inverse document frequency (TF-IDF). Yin et al. [33] proposed the identification of influential parts of complex mechanical products based on a complex network: the components in the product are mapped to nodes. This method use a local rank algorithm to sort the influence of components. However, it can only analyze a specific product, and the construction of complex networks is quite complicated. For discovering components, Klahn et al. [34] proposed automatic identification of parts by QFD, which is of great significance to design change and technology transfer. Zheng et al. [35] pointed out that components' healthy and stable operation is the guarantee of stable and safe operation in power system, so they proposed a machine learning method to identify and detect defects in transmission line components. Dorji et al. [36] applied TF-IDF to domain-related terms to identify and extract domain-related terms. The comparison of core component identification methods is shown in Table 2: Table 2. Comparison of core component identification methods.

Method
Mode of Action Advantage Disadvantage Literature

Network algorithm
Map product components to networks, and identify core components through network measurement algorithms.
Easy to measure position and use of nodes in network, strong visualization.

Insufficient dynamic visualization.
Yin et al.

Machine learning
The model is trained through sample data, and the trained model is used to analyze and predict data.
High degree of automation, faster training speed.
Influenced by algorithm accuracy and quality.
Zheng et al.

QFD
Obtains the components that contribute most to requirements through requirement analysis.
Strong purpose, high excavation accuracy.
Influenced by subjective experience.

TF-IDF
Takes the frequency of keywords appearing in the text and the frequency of keywords appearing in all texts as the criteria for judging the importance of keywords.
The algorithm is simple and easy to implement.
Semantic order and context in the text are ignored.

Dorji et al.
It can be seen that core components play an essential role in equipment manufacturing, social progress, and enhancing a country's hard power. Core components are the most critical part of a product, so identifying core components is of great significance.

Summary
Judging from the research status of component identification, patent information mining and patent analysis methods, existing research mostly focuses on mining product technical information in patent texts to achieve technical efficacy analysis, cluster analysis, etc., but little research has been done in identifying and acquiring core components [37][38][39][40][41][42]. There has been some research on the application of patent text information mining to product functions [43]. In general, current academic circles are paying more and more attention to research of patent text information mining in product development. Therefore, application of patent text analysis has bright future prospects in core component identification and acquisition for both small-scale databases and large-scale databases [44]. In summary, this study proposes to build a semantic network for core components based on patent text big data, integrating the methods of SAO, structure hole and complex network Entropy 2022, 24, 549 5 of 20 measurement to carry out the identification and acquisition of core components. It is hoped that this method can provide an idea for core component extraction.

Overview of Core Components
Patented products usually contain multiple components and various connections between them. The importance of components depends on their functions and particular connection frequencies with other components. Based on the previous definition of core components [45], and combined with practical information from the mechanical field, this study makes the following definition: Definition 1. A core component is the component in a crucial position and that occupies more resources in production.
Resources occupied by a component can be measured by two aspects: (i) the quantity of adjacent components; and (ii) the quality of adjacent components. Suppose the relationship of components in a product as a matrix; the size of the adjacent component matrix depends on the number of adjacent components. Therefore, the nodes with more adjacent components have higher participation in the product, which means they play a more prominent role. In fact, some components are not only adjacent with many other components but also adjacent with other highly important components. Therefore, only considering the quantity but ignoring the quality is not reasonable. Thus, deciding whether a component is essential depends on both the quality and quantity of its adjacent components. For example, the shower in Figure 1 is composed of 10 components. The adjacent matrix shows that components 3 and 4 are connect to the same number of components, but the weight of component 3 (W3 = 8) is larger than that of component 4 (W4 = 7).
technical information in patent texts to achieve technical efficacy analysis, cluster analysis, etc., but little research has been done in identifying and acquiring core components [37][38][39][40][41][42]. There has been some research on the application of patent text information mining to product functions [43]. In general, current academic circles are paying more and more attention to research of patent text information mining in product development. Therefore, application of patent text analysis has bright future prospects in core component identification and acquisition for both small-scale databases and large-scale databases [44]. In summary, this study proposes to build a semantic network for core components based on patent text big data, integrating the methods of SAO, structure hole and complex network measurement to carry out the identification and acquisition of core components. It is hoped that this method can provide an idea for core component extraction.

Overview of Core Components
Patented products usually contain multiple components and various connections between them. The importance of components depends on their functions and particular connection frequencies with other components. Based on the previous definition of core components [45], and combined with practical information from the mechanical field, this study makes the following definition: Definition 1: a core component is the component in a crucial position and that occupies more resources in production.
Resources occupied by a component can be measured by two aspects: (i) the quantity of adjacent components; and (ii) the quality of adjacent components. Suppose the relationship of components in a product as a matrix; the size of the adjacent component matrix depends on the number of adjacent components. Therefore, the nodes with more adjacent components have higher participation in the product, which means they play a more prominent role. In fact, some components are not only adjacent with many other components but also adjacent with other highly important components. Therefore, only considering the quantity but ignoring the quality is not reasonable. Thus, deciding whether a component is essential depends on both the quality and quantity of its adjacent components. For example, the shower in Figure 1 is composed of 10 components. The adjacent matrix shows that components 3 and 4 are connect to the same number of components, but the weight of component 3 (W3 = 8) is larger than that of component 4 (W4 = 7). Starting from considering the "quantity" and "quality" of the component, adjacent matrixes can identify more-essential components. However, it is difficult to identify those "mediators" between networks because they have fewer adjacent components, so judging whether a component is essential also depends on the component's location in the product network. To judge whether the position of the component in the product is favorable, first, Starting from considering the "quantity" and "quality" of the component, adjacent matrixes can identify more-essential components. However, it is difficult to identify those "mediators" between networks because they have fewer adjacent components, so judging whether a component is essential also depends on the component's location in the product network. To judge whether the position of the component in the product is favorable, first, we start with the number of shortest paths through the component in the product network. A component's ability as a "mediator" in the product determines its importance. Perhaps there are not many adjacent components for a "mediator", however, as more components must communicate through the "mediators", the role of the component becomes more prominent.
It can be considered that once important components are damaged, the whole product system will take a huge loss. The core component is the one in closer contact with others and located in a more-favorable location in the product networks and that undertakes the main functions.

Research Framework of Core Component Acquisition
This study proposes a framework of core component acquisition of patents based on SAO, structure hole and complex network technology. The framework contains four steps, as shown in Figure 2. Firstly, the SAO structure of the patent text is extracted. Secondly, the complex network algorithm and structure hole theory are used to analyze the SAO structure and calculate the component's importance. Finally, the essential measurement of the core component is obtained according to the calculated data. The steps for acquiring core components are as follows: others and located in a more-favorable location in the product networks an takes the main functions.

Research Framework of Core Component Acquisition
This study proposes a framework of core component acquisition of pat SAO, structure hole and complex network technology. The framework steps, as shown in Figure 2. Firstly, the SAO structure of the patent text is e ondly, the complex network algorithm and structure hole theory are used t SAO structure and calculate the component's importance. Finally, the essen ment of the core component is obtained according to the calculated data. acquiring core components are as follows: Step 1: Patent data retrieval and acquisition Firstly, retrieve and download industry patent information through w ondly, calculate recall and precision ratio to verify downloaded patent tex will meet the requirements. Finally, select the downloaded patent informa text.
Step 2: Data preprocessing and supplementation Check the patent technical information acquired from Step 1 and man and supplement the patent text information, avoiding incorrect and garble information. The format of the patent text is generally divided into five part field; (ii) background technology; (iii) invention content; (IV) description of and (v) specific implementation. The background technology part mainly i background of the patent, the technical background and the significance o patent. The invention content part mainly explains the applied patent in de gram of the patented product is explained in the illustration part. Finally, th plementation part states the content that the patent wants to protect. Step 1: Patent data retrieval and acquisition Firstly, retrieve and download industry patent information through websites. Secondly, calculate recall and precision ratio to verify downloaded patent text information will meet the requirements. Finally, select the downloaded patent information from the text.
Step 2: Data preprocessing and supplementation Check the patent technical information acquired from Step 1 and manually retrieve and supplement the patent text information, avoiding incorrect and garbled patent text information. The format of the patent text is generally divided into five parts: (i) technical field; (ii) background technology; (iii) invention content; (IV) description of the drawings; and (v) specific implementation. The background technology part mainly introduces the background of the patent, the technical background and the significance of the applied patent. The invention content part mainly explains the applied patent in detail. The diagram of the patented product is explained in the illustration part. Finally, the specific implementation part states the content that the patent wants to protect.
The patent overview includes the description of the patent or the technical points of the utility model; this method extracts the invention part from patent text. Then, remove the label in the text by using a regular expression. After that, manually delete other interference information in the text. Finally, after preprocessing and supplementing the patent overview information, regular expressions are used to segment the text information.
Step 3: Part-of-speech tagging and SAO acquisition The SAO structure will be extracted from POS tagging obtained in Step 2. Since there are varieties of synonyms in the patent text, it is inaccurate to statistically analyze the SAO structure extracted in Step 2 directly. Use sememe similarity to compare and merge words with the same meaning. Then, screen and eliminate the nodes of unrelated components in SAO.
Step 4: Complex network construction and core component acquisition Prepare for construction of the SAO complex network by calculating the node's weight according to the measurement value of S (subject) and O (object) in the structure of SAO. After processing the SAO structure, take "S" as the source node, take "A" as the edge and "O" as the target node to form a directed graph. Finally, apply the structure hole algorithm and eigenvector centrality to calculate the obtained SAO to get the core components.

POS Tagging and SAO
POS tagging is a technology used in natural language processing (NLP), also known as grammar tagging. Valuable information in the text can be extracted by tagging the POS in the text. The primary forms of POS in the text are shown in Table 3: Subject-action-object (SAO) or subject-predicate-object is triple extracted from text. Among SAO, the subject and object usually appear as nouns, representing the executor and the executed, respectively, in the event. The predicate connects the subject and object as action. A sentence may contain only one group or multiple groups of SAO structures. In patent text, the structure of SAO in reference Guo et al. [43] is summarized as shown in Table 4:  The switching mechanism is disposed on fixing holder" (US10464077): 1. switching mechanism-disposed-holder. The present invention has arisen to mitigate and/or obviate the afore-described disadvantages (US20150354186A1): The temperature display is disposed on the curved pendant and connected with the temperature sensor (US20180202135A1): The switching mechanism is disposed on fixing holder" (US10464077): 1. switching mechanism-disposed-holder.  The switching mechanism is disposed on fixing holder" (US10464077): 1. switching mechanism-disposed-holder. The present invention has arisen to mitigate and/or obviate the afore-described disadvantages (US20150354186A1): The temperature display is disposed on the curved the forward position and the backward position are arranged at two sides of the initial position (US10449559): 1. forward position-arranged at-initial position 2. backward position-arranged at-initial position.   The switching mechanism is disposed on fixing holder" (US10464077): 1. switching mechanism-disposed-holder.  Components appear with high frequency in patent texts and mainly exist as subjects and objects in SAO structure, while the connection between components is mainly based on function as a "bridge". Taking "fast block valve" as an example, the SAO structure connecting components with functions is shown in Table 5: Applying SAO to an information-dense text can quickly and concisely explain the central idea of the sentence, reducing the workload in reading and analyzing. Furthermore, SAO ignores unimportant information in the text but quickly understands the core content. SAO is widely used in natural language processing, text information mining and artificial intelligence. Although the general keyword extraction method can extract keywords, it ignores the relationships between them, which reduces the utilization rate of data. Therefore, using SAO extraction technology does not only extract the main information in the text, but also retain the relationships between the key information. Component information and technical information can be collected accurately and quickly by Preferably, the rotation valve comprises a sleeve, a valve core unit, a rotation support and a switch knob (US8720799): 1. rotation valve-comprises-sleeve. 2. rotation valve-comprises-valve core unit. 3. rotation valve-comprises-rotation support. 4. rotation valve-comprises-switch knob. 4  The switching mechanism is disposed on fixing holder" (US10464077): 1. switching mechanism-disposed-holder. The present invention has arisen to mitigate and/or obviate the afore-described disadvantages (US20150354186A1): 1. Invention-mitigate-disadvantages.

5
The temperature display is disposed on the curved pendant and connected with the temperature sensor (US20180202135A1): 1. temperature display-disposed on-curved pendant.
Components appear with high frequency in patent texts and mainly exist as subjects and objects in SAO structure, while the connection between components is mainly based on function as a "bridge". Taking "fast block valve" as an example, the SAO structure connecting components with functions is shown in Table 5: Applying SAO to an information-dense text can quickly and concisely explain the central idea of the sentence, reducing the workload in reading and analyzing. Furthermore, SAO ignores unimportant information in the text but quickly understands the core content. SAO is widely used in natural language processing, text information mining and artificial intelligence. Although the general keyword extraction method can extract keywords, it ignores the relationships between them, which reduces the utilization rate of data. Therefore, using SAO extraction technology does not only extract the main information in the text, but also retain the relationships between the key information. Component information and technical information can be collected accurately and quickly by The present invention has arisen to mitigate and/or obviate the afore-described disadvantages (US20150354186A1): 1. Invention-mitigate-disadvantages. 2. Invention-obviate-disadvantages.   The switching mechanism is disposed on fixing holder" (US10464077): 1. switching mechanism-disposed-holder. The present invention has arisen to mitigate and/or obviate the afore-described disadvantages (US20150354186A1): 1. Invention-mitigate-disadvantages. 2. Invention-obviate-disadvantages.

5
The temperature display is disposed on the curved pendant and connected with the temperature sensor (US20180202135A1): 1. temperature display-disposed on-curved pendant.
Components appear with high frequency in patent texts and mainly exist as subjects and objects in SAO structure, while the connection between components is mainly based on function as a "bridge". Taking "fast block valve" as an example, the SAO structure connecting components with functions is shown in Table 5: Applying SAO to an information-dense text can quickly and concisely explain the central idea of the sentence, reducing the workload in reading and analyzing. Furthermore, SAO ignores unimportant information in the text but quickly understands the core content. SAO is widely used in natural language processing, text information mining and artificial intelligence. Although the general keyword extraction method can extract keywords, it ignores the relationships between them, which reduces the utilization rate of data. Therefore, using SAO extraction technology does not only extract the main information in the text, but also retain the relationships between the key information. Component information and technical information can be collected accurately and quickly by The temperature display is disposed on the curved pendant and connected with the temperature sensor (US20180202135A1): 1. temperature display-disposed on-curved pendant. 2. temperature display-connectedtemperature sensor.
Components appear with high frequency in patent texts and mainly exist as subjects and objects in SAO structure, while the connection between components is mainly based on function as a "bridge". Taking "fast block valve" as an example, the SAO structure connecting components with functions is shown in Table 5: Applying SAO to an information-dense text can quickly and concisely explain the central idea of the sentence, reducing the workload in reading and analyzing. Furthermore, SAO ignores unimportant information in the text but quickly understands the core content. SAO is widely used in natural language processing, text information mining and artificial intelligence. Although the general keyword extraction method can extract keywords, it ignores the relationships between them, which reduces the utilization rate of data. Therefore, using SAO extraction technology does not only extract the main information in the text, but also retain the relationships between the key information. Component information and technical information can be collected accurately and quickly by extracting the SAO structure from the patent text, and the Algorithm 1 of SAO extraction process is described as follows: . import CSV to DataFrame 12. Subject_POS_list = ["Subj", "nsubj", "nsubjpass"] 13. Action_POS_list = ["aux", "auxpass", "complm", "prt", "cordmod", "mmod"] 14. Object_POS_list = ["obj","dobj", "pobj"] 15

Semantic Similarity
There may be one or more interpretations for the same word. For example, the word 'handle' has two interpretations as a noun, namely (a) the part by which a thing is held, carried, or controlled; and (b) a name or nickel. As a verb, 'handle' also has two meanings: (a) feel or regulate with the hands; and (b) manage (a situation or problem). According to the meaning 'feel or regulate with the hands' as a verb, its synonyms are 'hold', 'pick up', 'grab', 'grip', 'lift', 'feel', 'touch', 'finger', 'thumb', 'touch with', 'play with' and 'paw'. According to its meaning as a noun, 'the part by which a thing is held, carried or controlled', similar words are 'have', 'shank', 'stock', 'shift', 'grip', 'handgrip', 'hilt', 'helve' and 'but knob'. Moreover, take the word 'hold' as an example-as a verb, it can be interpreted as grab, carry, or support with one's hands; as a noun, it means an act or manner of grasping something, and it also means power or control. These meanings are consistent with the interpretation of 'handle'. Therefore, the similarity of the interpretation of two words determines the similarity between them.
Suppose that the degree of similarity between a word and itself is 1; the similarity between two words with completely different meanings is 0. Set the word w i to contain sememes set (w i1 , w i2 , · · · , w im ), the word w j has interpretations set (w j1 , w j2 , · · · , w jm ), then the formula of semantic similarity (SS) between word w i and word w j is as Formula (1): where SOI(w ip ,w jq ) means similarity of interpretation. In order to calculate semantic similarity accurately, the definitions of words are divided into four characteristics: (1) basic definition description; (2) other definition description; (3) relational definition description; and (4) relational symbol description. Due to the different features of word interpretation, the influence of interpretation features on semantic similarity is also diverse. Thus, the weight of interpretation in semantic similarity is different. Based on this consideration, the weight factor β is added to the calculation of semantic similarity. Then, calculation of semantic similarity is shown in Formula (2): where β 1 is the basic interpretation of the word, β 2 represents other definitions, β 3 is the relationship interpretation and β 4 is the relationship symbol. Further, β 1 + β 2 + β 3 + β 4 = 1, and β 1 > β 2 > β 3 > β 4 . Since β 1 is the most apparent basic interpretation, the weight of β 1 is defined at above 0.5 according to the research of [46]. Sememe is the smallest unit of interpretation, and a finite set of sememes can express an interpretation. When calculating sememe similarity, it is usually calculated by the path length between two sememes in the sememe tree. When two sememes are on different sememe trees, the path length between the two sememe nodes is considered infinite, which means the similarity between them is 0. To calculate the interpretation similarity, Formula (3) can be applied: The dst(s i1 , s i2 ) in Formula (3) is the path distance between two sememe nodes, and α is the adjustable parameter. The sememe tree model is shown in Figure 3: opy 2022, 24, x FOR PEER REVIEW length between two sememes in the sememe tree. When two semem sememe trees, the path length between the two sememe nodes is c which means the similarity between them is 0. To calculate the interp Formula (3) can be applied: ( )  (3) is the path distance between two s α is the adjustable parameter. The sememe tree model is shown in Fig   Figure 3. Semaphore tree model.
Taking the sememe tree in Figure 2 as an example, "Root" is the ro nodes can find at least one path to reach the root node. In addition, B1 B11, B12, and so on. When two nodes do not have a common root nod Taking the sememe tree in Figure 2 as an example, "Root" is the root node, and other nodes can find at least one path to reach the root node. In addition, B 1 is the root node of B 11 , B 12 , and so on. When two nodes do not have a common root node, it means that the two nodes have no path to reach each other, so it can be regarded that the meanings expressed by the two sememes are not similar.

Complex Network
The term "semantic network" was proposed by Quillian [47]. It is a knowledge base with a directed graph structure in the knowledge graph. It is mainly used for understanding natural language. A semantic network can describe the state and the relationship between things.
The directed graph of a semantic network is composed of nodes and arcs (i.e., edges) between nodes. Nodes are mainly used to represent events or objects, and arcs represent the relationships between them. For example, data from Table 5-taking some components as nodes, the functions between components as arcs, and the degree as the node size-can be used to construct a semantic network model, shown in Figure 4: Taking the sememe tree in Figure 2 as an example, "Root" is the root node, an nodes can find at least one path to reach the root node. In addition, B1 is the root B11, B12, and so on. When two nodes do not have a common root node, it means t two nodes have no path to reach each other, so it can be regarded that the meani pressed by the two sememes are not similar.

Complex Network
The term "semantic network" was proposed by Quillian [47]. It is a knowled with a directed graph structure in the knowledge graph. It is mainly used for unde ing natural language. A semantic network can describe the state and the relations tween things.
The directed graph of a semantic network is composed of nodes and arcs (i.e. between nodes. Nodes are mainly used to represent events or objects, and arcs re the relationships between them. For example, data from Table 5-taking some nents as nodes, the functions between components as arcs, and the degree as th size-can be used to construct a semantic network model, shown in Figure 4: Apparently, a semantic network has the advantage of expressing the relations tween nodes directly and clearly. Moreover, a semantic network can efficiently a and infer the relationships between nodes. With their continued development, mo cepts and theories based on semantic networks have been applied to politics, the ec and intelligence analysis. Using semantic network to visualize the SAO structure in text, combined with the concepts and methods of structure hole and feature Apparently, a semantic network has the advantage of expressing the relationship between nodes directly and clearly. Moreover, a semantic network can efficiently analyze and infer the relationships between nodes. With their continued development, more concepts and theories based on semantic networks have been applied to politics, the economy and intelligence analysis. Using semantic network to visualize the SAO structure in patent text, combined with the concepts and methods of structure hole and feature vector centrality in semantic networks, the information in patent text can be fully mined and displayed.

Calculation of Element Position Importance Based on Structural Hole Theory
The structural hole theory was first proposed by Ronald Burt in 1992 [48]. A structural hole refers to a social phenomenon that a gap exists between individuals in society due to non-direct connection. This phenomenon is named as structural hole, as a hole seems to appear on the whole network. According to this theory, the position of an individual in the network is more important than the strength of the personal relationship. When the structural hole occurs between two network nodes, the node between them without direct connection has stronger information and control advantages. With these advantages, the node can obtain more resources and value. As shown in Figure 5, there is no direct connection from node A, B or C to E, but node D links them. According to Burt's structural hole theory, node D obtains more information because it occupies a structural hole. Therefore, the more structural holes a node occupies in the network, the more external connections it obtains, and it becomes more favorable.
When the structural hole occurs between two network nodes, the node between them without direct connection has stronger information and control advantages. With these advantages, the node can obtain more resources and value. As shown in Figure 5, there is no direct connection from node A, B or C to E, but node D links them. According to Burt's structural hole theory, node D obtains more information because it occupies a structural hole. Therefore, the more structural holes a node occupies in the network, the more external connections it obtains, and it becomes more favorable. According to the structural constraint algorithm, the higher the network constraint coefficient is, the better the network closure, and the fewer structural holes exist in the network. The calculation of the network constraint coefficient is shown in Formula (4): where ij α and ji α represent the weight of the connected edges between nodes; ik α and ki α represent the total weight of the connected edges between node i and all nodes; ij stren represents the strength of the connection between node i and node j.
From the strength of the connection between nodes, the network constraint coefficient (NC) of node i can be calculated by Formula (5): The value of k is not equal to i and j. It can be seen from the above formula that when j is the only connected node of i, i NC achieves its largest value, which is 1. When j has no direct or indirect connection with node i, From the calculation results of the network constraint coefficient, the critical node is the one with a small constraint coefficient. From the structure constraint algorithm of structure hole, we can see that the smaller the constraint coefficient is, the more structure holes it occupies in the network, and the more information it obtains. Figure 5 shows that node D is the only way for other nodes to contact node E. According to the intermediary centrality algorithm in the structural hole theory, the two nondirect contact nodes that get contact through other nodes will be controlled and restricted by the nodes on the path. Consider the product as a network and the components as According to the structural constraint algorithm, the higher the network constraint coefficient is, the better the network closure, and the fewer structural holes exist in the network. The calculation of the network constraint coefficient is shown in Formula (4): where α ij and α ji represent the weight of the connected edges between nodes; α ik and α ki represent the total weight of the connected edges between node i and all nodes; stren ij represents the strength of the connection between node i and node j.
From the strength of the connection between nodes, the network constraint coefficient (NC) of node i can be calculated by Formula (5): The value of k is not equal to i and j. It can be seen from the above formula that when j is the only connected node of i, NC i achieves its largest value, which is 1. When j has no direct or indirect connection with node i, NC i = stren ij . From the calculation results of the network constraint coefficient, the critical node is the one with a small constraint coefficient. From the structure constraint algorithm of structure hole, we can see that the smaller the constraint coefficient is, the more structure holes it occupies in the network, and the more information it obtains. Figure 5 shows that node D is the only way for other nodes to contact node E. According to the intermediary centrality algorithm in the structural hole theory, the two non-direct contact nodes that get contact through other nodes will be controlled and restricted by the nodes on the path. Consider the product as a network and the components as nodes, suppose the shortest path number from component q to component j is g qi , the number of shortest paths from element q to element j passing through element i is n i qj , then the betweenness centrality calculation of node i is shown in Formula (6): Taking the situation in Figure 5 as an example, the shortest paths between non-directly connected nodes are: A-B-D-E, A-D-E, A-C-D-E, B-D-C, B-A-C, B-D-E, C-D-E. That is, g AE = 3, g BC = 2, g BE = 1, g CE = 1, the shortest paths from node A to node E all pass through D, then n D AE = 3, and only one shortest path from node B to node C passes through node D, so n D BC = 1, and so on, n D BE = 1, n D CE = 1. Finally, the betweenness centrality of the node D is BC D = 3/3 + 1/2 + 1/1 + 1/1 = 3.5.
The formula of component position importance (CPI) in node i network can be obtained by combining the network constraint coefficient and the intermediary centrality algorithm in the structure hole, as shown in Formula (7): Entropy 2022, 24, 549 13 of 20 where σ is the coefficient of BC i , and the value is set to 0.5 according to experience.

Calculation of Component Resource Occupancy Based on Eigenvector Centrality
Academia prefers to use the concept of eigenvector centrality to measure the importance of a node in a network. Eigenvector centrality (EC) not only considers the degree of the node itself but also considers the importance of the node and its connected nodes.
Suppose a product is a network of components, and components are nodes of the product network. Based on the degree centrality of nodes, a product has n components, and take the component library as (e 1 , e 2 , · · · , e n ), α (i,j) is the relation component matrix of e i , then the calculation method of component metrics (CM) of e i is shown in Formula (8): From Formula (8), although the degree centrality can identify nodes with higher connectivity to other nodes in the product network, it only considers the degree centrality and ignores the importance of associated nodes. According to the centrality of eigenvectors, the importance of a node depends on both the quantity and the quality of its adjacent nodes. For example, in Figure 6, B, D and E have a higher I/O degree than the others. Thus, B, D and E occupy more resources. However, F and G have to pass through E to get in touch with other nodes, so node E holds a relatively higher resource amount than F and G; therefore, E is more critical than B and D.
where c is a constant. Since i CRO equals . suppose A is the component's iteration matrix of the associated components. After several iterations, V cAV = and the constant c is the inverse of the eigenvalue of matrix A.
The centrality of eigenvector is based on the centrality of degree, combined with measurement of the importance between nodes. This enables accurate identification of the core components. Therefore, the core components can be obtained by comprehensively considering the "quantity" and "quality" of component association.  Suppose the relative node matrix of e i is α (i,j) ; the importance measure of e i is v i . The calculation method of component resource occupancy (CRO) of e i is shown in Formula (9): where c is a constant. Since CRO i equals v i . suppose V = [v 1 , v 2 , · · · , v n ] T . A is the component's iteration matrix of the associated components. After several iterations, V = cAV and the constant c is the inverse of the eigenvalue of matrix A. The centrality of eigenvector is based on the centrality of degree, combined with measurement of the importance between nodes. This enables accurate identification of the core components. Therefore, the core components can be obtained by comprehensively considering the "quantity" and "quality" of component association.

Comprehensive Weight Calculation of Core Components
Based on the core component definition and identification methods mentioned in the former section, to calculate component importance (CI) for component i (i.e., CI i ), Formula (10) is applied: where α is the parameter of CPI i , specified as 0.5; CI is the value between the interval [0,1]. Using Formula (10) as an indicator, the higher the CI of the component, the more important it is. In previous research, the identification and acquisition of component importance had not considered the component's location and associations in the product. Formula (10) can make up for the shortcomings of existing research. CPI is based on the location of components, and CRO is based on the resources (associations) owned by components. Through parameter adjustment, component importance data under different focuses can be obtained.

Research Background
As a typical product in the field of mechanical design, a shower is a rapidly updating and high demand product. Due to fierce competition in the marketplace, enterprises upgrade their products to meet the requirements of mainstream consumer groups (post-1980s and 1990s consumers) by adding concepts such as energy conservation, environmental protection and intelligence. However, many companies do not understand the core components and main functions of shower products, which results in obstacles in product design and upgrading. The main reason for the low efficiency of transforming design schemes into market products is insufficient understanding of the product's core components, in other words, the core components of the designer's supervisory consciousness have deviated from the facts. Based on these problems, this study takes US shower patents as the research object to obtain the shower's core components.

Retrieval and Acquisition of Patent Data
The cooperative patent classification (CPC) is more detailed than the international patent classification (IPC). Therefore, a US shower patent was chosen as the research object. This case searched for and downloaded US shower patent text prior to 2020 through the well-known patent website USPTO database (http://www.uspto.gov accessed on 1 December 2019). A total of 1733 US shower patent texts were obtained with the keywords and CPC numbers in Table 6, in which the * means string of character of any length. The results met the requirements of patent analysis by conforming to the recall ratio and the accuracy ratio.

Data Preprocessing and Supplementation
First, we checked the shower patent texts and queried and supplemented missing shower patent text manually. Next, we extracted the patent overview section of the text and applied the BeautifulSoup module in Python to remove the label section, the title section and the garbled section from the text.

POS Tagging and SAO Acquisition
There are three kinds of NLP libraries currently: spacy, NLTK and core NLP. Because Spacy has certain advantages in terms of application ability, running speed and accuracy, it was used for extracting SAO structure and to annotate the POS of words in the text. The first step was to put the overview section of the obtained patent text into a CSV file, then, the overview was read line by line, and the punctuation in each sentence was divided into sentences in the form of regular expressions. Finally, all the sentences were listed, and Spacy was used to mark the POS of the words. Next, we extracted the subject, action, and object in the text according to POS tagging. Some sentences contained multiple subjects, actions, and object keywords. After matching, a total of 1048574 SAO structure groups were acquired. Each SAO structure group was input into the DataFrame according to patent number. Finally, the DataFrame was saved locally in CSV format. The results are shown in Table 7: In order to verify the effectiveness of this method, we compared our results with TF (term frequency) and TF-IDF keyword extraction algorithms, which are traditional methods [37,49,50], by precision (P), recall (R) rate and F-measure (F) Equations (11)- (13).
Ten patents that contained more than 600 words were randomly selected as test objects, and experts were invited to verify the effect. The results are shown in Figure 7. The ROC value of our technique (SAO-complex network (SAO-CN)) is significantly better than that of other algorithms. In order to verify the effectiveness of this method, we compared our results with TF (term frequency) and TF-IDF keyword extraction algorithms, which are traditional methods [37,49,50], by precision (P), recall (R) rate and F-measure (F) Equations (11)-(13).

TP P TP FP
= + (11) TP R TP FN = + (12) 2 P R F P R × × = + (13) Ten patents that contained more than 600 words were randomly selected as test objects, and experts were invited to verify the effect. The results are shown in Figure 7. The ROC value of our technique (SAO-complex network (SAO-CN)) is significantly better than that of other algorithms. There are some words with high similarity in the combination of SAO keywords, such as 'head' and 'heads'; 'contained' and 'containing'. Generally, the similarity between words and their plural forms and the primary meaning of passive voice is 1. Thus, synonyms are combined for nodes with the original similarity of 1. The degree of a node is There are some words with high similarity in the combination of SAO keywords, such as 'head' and 'heads'; 'contained' and 'containing'. Generally, the similarity between words and their plural forms and the primary meaning of passive voice is 1. Thus, synonyms are combined for nodes with the original similarity of 1. The degree of a node is taken as the weight to measure the size of the node. After statistical treatment, the weight of the corresponding nodes are shown in Table 8:

Complex Network Construction and Core Component Acquisition
Gephi is software for analyzing complex networks. In this study, Gephi was applied to analyze the shower patent case. For analyzing the shower patent text and building a complex network: firstly, import the CSV file into Gephi; take 'subject' as the initial node, 'label' as the directed edge, and point to the node target at 'object'. Finally, 3099 groups of component nodes are obtained by selecting and eliminating the SAO structures independent of components. Screen the core nodes by the K-core algorithm, and obtain the complex network of SAO keyword combinations of shower patent text, as shown in Figure 8:

Complex Network Construction and Core Component Acquisition
Gephi is software for analyzing complex networks. In this study, Gephi was applied to analyze the shower patent case. For analyzing the shower patent text and building a complex network: firstly, import the CSV file into Gephi; take 'subject' as the initial node, 'label' as the directed edge, and point to the node target at 'object'. Finally, 3099 groups of component nodes are obtained by selecting and eliminating the SAO structures independent of components. Screen the core nodes by the K-core algorithm, and obtain the complex network of SAO keyword combinations of shower patent text, as shown in Figure 8: According to Formula (9), the occupancy of component resources (CRO) is obtained by taking the centrality of eigenvector as the judgment standard, as shown in Table 9.
According to Formula (7), the position importance of components (CPI) is obtained from the betweenness centrality and structural constraint coefficient of components (the calculation results shown in Table 8). Because the intermediate centrality coefficient value According to Formula (9), the occupancy of component resources (CRO) is obtained by taking the centrality of eigenvector as the judgment standard, as shown in Table 9. According to Formula (7), the position importance of components (CPI) is obtained from the betweenness centrality and structural constraint coefficient of components (the calculation results shown in Table 8). Because the intermediate centrality coefficient value is small, the coefficient is adjusted to one decimal place. According to the values listed in Tables 7 and 8, combined with Formula (10), component importance is calculated as shown in Table 9. Due to the nature of this study, only the top 20 components are listed here.

Demonstration and Discussion
Furthermore, we selected US patent applications from 1 January 2020 to 31 December 2021 as a test dataset, which included 72 patents and 30,000 words, to evaluate the predictive ability of the method, and we used P, R and F calculated by Equations (11)-(13) as ability indexes. By importing the data in Table 9 into a test dataset, the scores of P, R and F were 0.75, 0.79 and 0.77, respectively. Those high scores mean that the performance of the proposed method is reliable. After all, it is hard to detach a few valuable components from a huge amount of patent data for R&D.
As shown in Table 9, patent data mining determined a valve is the core component of showers. In fact, the valve of showers in the market is mainly divided into single-way, double-way or three-way. Showers with more complicated valves can accommodate more water flow patterns, meeting diverse customer needs. In light of this, we designed a new shower with a motor-controlled valve that enables multiple types of water flow, as shown in Figure 9. The working principle of this shower is: when the water has passed through coil 5, the coil will generate electricity, which will be stored in the battery placed inside the showerhead 25; the battery will be used to support the motor 14 to adjust the valve 15, thereby allowing water flow control. The motor can be controlled by a microchip (not shown). Compared to existing products, the valve has a simpler mechanical structure and automatic-adjust function by using the motor and electric circuit.