Introducing Patents with Indirect Connection (PIC) for Establishing Patent Strategies

A patent system requires novelty and progressiveness so that new patents do not infringe on the rights of prior art. Patent investigation, including a prior art search, is essential to the commercialization of technology. In general, patent investigation has been conducted by experts based on their qualitative judgement. However, the number of patents has grown so fast that it has become difficult to handle the quantitative burden of the search with a conventional approach. Previous studies have dealt with patent investigation to find similar technologies, but they were limited in that they did not utilize the citation relationship and the similarity between patents in a comprehensive way. In addition, they could not properly reflect the sequential citation relationship of patents, even though it is effective in discovering similar patents. In this study, we propose an efficient methodology to discover similar technologies by comprehensively considering the similarity and citation relationship between patents. In particular, we reflect the citation sequence and indirect citation relationships in the process of searching for similar patents. To this end, we introduce the concept of “patents with indirect connections” (PICs) and devise an algorithm to efficiently detect patent pairs having such a relationship. The proposed methodology helps prevent patent litigation in advance by discovering patents with such potential risks. It is expected to give patent applicants the opportunity to establish appropriate strategies against competitors with similar technologies. To examine the practical applicability of the proposed method, Korean patents related to machine learning and deep learning were collected. In the experiment, 24 pairs of similar patents without a direct citation relationship were identified, and appropriate counter strategies were derived.


Introduction
Sustainable growth and development are very important goals for companies, but they are difficult to achieve [1]. Technology makes these goals possible by giving companies a competitive advantage in the marketplace [2]. According to Benz (2011), technology is created through the application of knowledge and plays an important role in sustainable growth [3][4][5]. It is thus inevitable that companies compete fiercely to secure superior technologies and gain competitive power in the market [6][7][8][9][10]. In such a highly competitive environment, an institutional device is needed that can safely protect the rights to technology created through research and development (R&D). The patent system serves this role by guaranteeing applicants legal rights to a technology. It promotes industrial development by having companies disclose the contents of their technology to the public; in return, they are granted an exclusive right to implement the technology for a certain period of time.
A patent without novelty is likely to cause legal disputes and social losses. A patent is registered after examination by the patent office of each country, and its rights are then granted to the applicant. For a patent to be registered, it requires novelty and progressiveness as well as industrial applicability. Novelty means that the rights claimed by a patent application are sufficiently differentiated from, and do not infringe, the rights of any prior art. If a patent without novelty infringes on the scope of rights belonging to prior art, conflict between the patent owners is highly likely. This could lead to litigation, resulting in financial losses for both sides and impeding industrial development. Therefore, prior art investigation is an essential prerequisite for R&D and patent application, as it plays an important role in preventing such problems in advance.
It is common for companies to conduct a prior art search before R&D or patent filing and to reflect the results in their management strategy. However, the direction of the strategy differs depending on a company's position and on the existence of similar technologies [11]. Companies applying for new patents use this investigation process to prevent potential disputes with prior art owners. If similar prior art is found, they might attempt to invalidate the rights of existing patents or differentiate the claims of their new patents from them. On the other hand, the owners of existing patents can also carry out patent investigation to monitor whether later patents infringe on their scope of rights. If an infringement is occurring, they can file a lawsuit to claim compensation. Alternatively, the companies may compromise through cross-licensing. Patent investigation, including a prior art search, is therefore a very important procedure that allows applicants to determine the direction of their patent management strategy.
The main purpose of a prior art search is to investigate whether a technology similar to the patent to be filed already exists. If live patents with an overlapping scope of rights exist in the market and are not found in time, it will be difficult to avoid conflicts between their owners. For this reason, there have been many studies on prior art search methodology [12][13][14]. The value of these studies lies in how effectively and efficiently similar technologies can be discovered. Some scholars have proposed methods that search for similar technologies based on the citation relationships of patents [15][16][17]. These methods can effectively find similar patents that are connected to each other in the patent citation network. However, since a direct connection in the citation network does not always guarantee high similarity between patents, the scope of the search needs to be expanded to include indirectly connected patents. Another group of studies searched for prior art based on the textual similarity of documents such as patents and papers [18][19][20][21]. This approach has the advantage of assessing the degree of similarity quantitatively. However, it is difficult to limit the scope of the search for prior art and to reflect changes in terminology over time. This study is motivated by the recognition that these limitations can be overcome by a patent investigation methodology that uses both document similarity, evaluated from the bibliographic information of patents, and their citation information. Although Yaghtin et al. (2019) found a significant correlation between citation information and the degree of similarity between patent documents, few studies have applied both types of information to identify core patents and prior art in a comprehensive way.
Moreover, even when both types of information were used to search for similar prior art, the citation sequence and indirect citation relationships were not reflected.
For the sustainable growth of companies and industries, the methodology of finding prior art and similar technologies should be able to answer the following questions.

1. What patents may pose a potential threat to my organization?
2. Which of our technologies could be involved in lawsuits?
3. What are the prior technologies that can serve as a driving force for competitive advantage when converged with our technology?
The first question corresponds to the case in which a competitor's patent is likely to infringe on the rights of a company's existing intellectual property. In this case, the risk of potential loss can be eliminated by asserting legal rights through patent litigation or licensing agreements. The second question corresponds to the case in which a patent to be filed by an organization is found to be similar to prior art. In this case, it may be necessary to amend the claims or to argue for the invalidation of the preceding patent so as not to infringe on the rights of the prior art. The final question concerns finding a technology that can generate synergy through fusion with the patent to be filed.
To effectively and efficiently discover similar technologies at risk of potential legal disputes, this study proposes a new prior art search methodology. Specifically, the proposed method detects similar technologies by using both the citation relationship and the similarity between patents. In particular, to overcome the limitations of previous research based on citation relationships, we define a special relationship between patents that may appear in the citation network, "patents with indirect connection (PIC)", which is useful for finding similar technologies and improves search efficiency. Based on PIC, the proposed method takes into account the sequential citation relationships among patents. Because patents tend to re-cite documents cited by similar prior patents, sequential citation relationships can help discover similar technologies efficiently. The algorithm used in this study to identify similar patents generates a citation network and a similarity network from the citation and bibliographic information of patents. It then integrates the two networks into one numeric matrix, from which patent groups that are similar to each other and have a high potential for rights conflict can be detected. The result is also represented as a visualized network so that users can easily find the pairs of patents corresponding to PICs. We expect the proposed methodology to give patent applicants an opportunity to prepare for potential patent disputes by making similar technologies easier to find.
The rest of this article is organized as follows. Section 2 reviews related work. Section 3 describes the theoretical background for network analysis using patent citation information. Section 4 explains the proposed methodology for finding similar technologies in detail. Section 5 presents an experiment to verify the applicability of the proposed methodology. Section 6 discusses the strengths and weaknesses of the proposed method. Finally, Section 7 proposes future research to address the shortcomings discussed in the previous section.

Studies on Finding Core Patents and Prior Art
When competitors hold many patents in a specific industry, companies entering the market need to establish a counter strategy to overcome the barriers to entry. Existing companies also need to monitor constantly whether there is a possibility of patent rights conflict with other applicants when filing new patents or implementing existing technologies, and reflect this in their management strategies. Establishing such strategies requires companies to find core patents and prior art. Patent investigation, which searches for core patents and similar prior art, must therefore precede the establishment of a company's technology management strategy. A core patent is not only unique and likely to be used for mass production but also a major target of patent disputes and licensing [22]. Prior art must be investigated to prevent the infringement of rights and to prove the novelty of a new patent. Identifying these kinds of patents through qualitative analysis is time-consuming and costly. In recent years, therefore, many studies have sought to search for core patents and prior art effectively from patent data and to use the results to establish counter strategies.
Applicants cite prior art to claim the novelty and differentiation of their patents. They also use the family patent system to secure patent rights in several countries. Such information helps to find similar patents and to build strategies to respond to them [23][24][25][26][27]. Su et al. (2011) proposed the concept of a patent priority network (PPN) built from family patent information, which is applied when searching for valuable patents. They also defined a critical chain and a significant chain to detect the possibility of a patent dispute. Kim et al. (2015) extracted core patents using information such as citations and family patents; to visualize the results, they constructed a matrix composed of patent documents and international patent classification (IPC) codes. Yoon & Choi (2012) and Kwon et al. (2018) derived core patents by indexing quantitative information such as the number of forward citations and conducting a matrix analysis based on it. They visualized patents in a two-dimensional matrix and proposed a method of constructing a counter strategy according to the characteristics of the patents in each quadrant. Kang et al. (2017) collected patents in a direct citation relationship with target patents to develop invalidation logic against core patents. The study also proposed a method for selecting candidate patents likely to be used for the invalidation. Furthermore, many prior studies have applied co-citation information to prior art search [28,29]. In particular, Shibata et al. (2008) clarified the concept of inter-citation as well as co-citation to derive insights from the citation relationships between documents. Yaghtin et al. (2019) suggested that the existence of a co-citation relationship had a significant correlation with the degree of similarity between patent documents.
Patents contain textual information such as an abstract and claims, as well as various numeric information, both of which can be effectively used to evaluate the degree of similarity between patents and to find prior art corresponding to a target technology [30][31][32][33]. … (2017) proposed a method of recommending prior art for developing invalidation logic by calculating the similarity between two arbitrary patents based on information entropy and topic modeling.
A review of the literature shows that previous studies used citation relationships and textual information to find core patents and prior art in a domain of interest. Some of them identified a significant correlation between citation information and document similarity through empirical experiments [29]. However, most of them did not use both types of information to identify core patents and prior art in a comprehensive way. Although some studies recommend prior art candidates based on both citation relationships and similarity between patents, they still do not consider the citation sequence of patents.

Development of Counter Strategies
When filing a new patent application, care must be taken not to infringe on the rights of prior art. To avoid a conflict of rights with prior art discovered through investigation, companies should consider the following strategies [34]:
1. Developing non-infringement logic: discovering loopholes in the claims of existing patents.
2. Developing invalidation logic: searching for prior art that could deny the novelty or progressiveness of the claims of existing patents.
3. Design of circumvention: designing alternative technology to avoid infringing on the rights of existing patents.
4. Cross license: negotiating contracts with the owners of patents with which there is potential for a rights conflict.
The first two can be classified as defensive strategies and the last two as aggressive strategies. Grindley et al. (1997) described defensive strategies as those that allow a company to innovate and commercialize technology freely in a market where competitors hold a lot of prior art [35]. Non-infringement and invalidation logics can be used when a lawsuit for infringement of rights is filed against a later patent. In this situation, defendants may attempt to invalidate the plaintiff's patents by examining patents filed earlier than them. They can also claim non-infringement by logically explaining how their invention differs from that of the plaintiff. The design of circumvention and cross licenses are further alternatives for reducing the risk of conflict. Applicants should try to write claims with novelty so as not to infringe on the scope of the rights of prior art. If it is difficult to invent in such a way, it is better to cooperate with the holders of prior patents. Lippman & Rumelt (1982) argued that aggressive strategies aim to prevent a company's technology from being imitated and to attain a monopolistic advantage in the marketplace [36]. For example, first movers and fast followers might monitor new competitors' patent activity to protect their own patents and prevent potential losses. According to Arora and Andrea (2003), new companies that lack commercialization capabilities tend to become active negotiators and seek licensing deals with companies that have relatively strong capabilities and more experience [37]. As such, patent strategies can vary depending on the purpose, size, and position of a company [38,39].

Background
A patent is a document that protects the scope of legal rights on a technology. When filing a patent application, the applicant is required to cite prior art so as not to infringe upon its legal rights. Because patents with superior technical characteristics are more likely to be cited by other patents, there have been many citation-related studies [40][41][42]. Citation network analysis of patents is a representative case in this research field. Figure 1 is an example of a patent citation network analysis.

A patent document can also be converted into a vector based on its term frequency, which makes it possible to evaluate the degree of similarity between two patents. The most representative measure is cosine similarity, which measures how closely the directions of two vectors coincide [43][44][45]. If A and B are both N-dimensional vectors, the cosine similarity between them is given by Equation (1):

$$\cos(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i=1}^{N} A_i B_i}{\sqrt{\sum_{i=1}^{N} A_i^2} \, \sqrt{\sum_{i=1}^{N} B_i^2}} \qquad (1)$$

If the two vectors point in exactly the same direction, the value equals 1; if they point in completely opposite directions, it equals −1. The cosine similarity between two documents is therefore a value between −1 and 1; for term-frequency vectors, whose components are non-negative, it lies between 0 and 1. Figure 2 shows the process of creating a similarity network using this measure. To build it, pairs of documents whose similarity value is greater than a preset threshold are identified; in the example, the threshold is set to 0.5. The similarity network used in this study is the visualization of a similarity adjacency matrix (SAM) constructed from the similarity between documents.
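A minimal sketch of Equation (1) and of the thresholding step behind the SAM, in plain Python. The function names, the toy term-frequency vectors, and the choice to leave the diagonal at 0 (a document is not linked to itself) are illustrative assumptions; the threshold of 0.5 and the strict "greater than" comparison follow the example above.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors, as in Equation (1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an all-zero vector is treated as dissimilar
    return dot / (norm_a * norm_b)


def similarity_adjacency_matrix(vectors, threshold=0.5):
    """Binary SAM: 1 where two different documents have similarity > threshold."""
    n = len(vectors)
    sam = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if cosine_similarity(vectors[i], vectors[j]) > threshold:
                sam[i][j] = sam[j][i] = 1  # similarity is symmetric
    return sam


# Hypothetical term-frequency vectors for three documents over a 4-term vocabulary.
docs = [
    [2, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 3, 1],
]
print(similarity_adjacency_matrix(docs))  # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```

Documents 0 and 1 share their terms (similarity about 0.95), while document 2 shares none with either, so only one edge appears in the similarity network.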

Proposed Methodology
This study proposes a method of searching for patent groups likely to have overlapping scopes of rights by using the citation relationship and similarity between them. Figure 3 shows the task flow of the proposed methodology. First, patent documents matching the purpose of the analysis are collected. The text of the collected patents is preprocessed and converted into a document-term matrix (DTM) through lexical analysis. Next, the citation network is drawn from the citation information of the collected patents. Then, the text similarity between each pair of patents is calculated based on the contents of their representative claims. The resulting citation network and similarity network are integrated into a citation and similarity network (CS-Net); how CS-Net is configured by combining the two networks is described in Section 4.2. Because CS-Net considers the citation relationship and similarity between patents comprehensively, it is effective in finding patents with a special relationship that we define as "patents with indirect connection (PIC)". A PIC is a pair of patents that are similar to each other and indirectly connected in CS-Net. To make it easy to discover PICs in CS-Net, which is a large network built from collected patent big data, we propose an algorithm named PIC-explorer (PIC-E), described in detail in Section 4.3. Even though the two patents in a PIC are not directly linked in the citation network, they may have a similar scope of rights. Therefore, once PICs are detected by PIC-E, it is necessary to examine the possibility of patent infringement and establish an appropriate response strategy.

PIC: A Pair of Patents that Can Be Found by Similarity and Citation Information between Technologies
This study aims to find sets of patents that deal with similar technologies but are not directly linked to each other in the citation network, as shown in Figure 4. For example, suppose that (i) patent B is a forward citation of C (that is, B cites C) and is in turn cited by patent A, and (ii) patents A and C are similar to each other. In the example of Figure 4, the filing years of patents A and C are 2017 and 2005, respectively. Since C was filed earlier than A, it is highly likely that C is prior art to A. In this study, the relationship between A and C is defined as a PIC. Given this relationship, the applicant of the preceding patent C needs to investigate whether patent A infringes the rights of patent C. On the other hand, the applicant of the succeeding patent A may need to develop differentiation or invalidation logic so as not to infringe C's scope of rights. In addition, new market entrants in this domain need to scrutinize the claims of existing patents carefully and establish a filing strategy that circumvents their scope of rights. Analyzing the patents that constitute PICs is therefore helpful for all of them.
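The PIC condition in this example can be written as a small check. This is a sketch under stated assumptions: citations are stored in a hypothetical dictionary mapping each patent to the set of patents it cites, the similarity value 0.7 and the threshold 0.5 for the pair (A, C) are made up for illustration, and only chains through a single intermediate patent (A → B → C) are considered.

```python
def is_pic(later, earlier, cites, similar):
    """Sketch of the PIC check for one pair of patents.

    later, earlier: patent IDs, where `earlier` was filed before `later`.
    cites:          dict mapping a patent ID to the set of IDs it cites.
    similar:        function (p, q) -> bool, True if p and q are similar.
    """
    cited_by_later = cites.get(later, set())
    directly_linked = earlier in cited_by_later or later in cites.get(earlier, set())
    # Is there an intermediate patent B with later -> B -> earlier?
    via_intermediate = any(earlier in cites.get(b, set()) for b in cited_by_later)
    return similar(later, earlier) and not directly_linked and via_intermediate


# Figure 4 example: B cites C, A cites B; A and C are similar.
cites = {"A": {"B"}, "B": {"C"}}
filing_year = {"A": 2017, "C": 2005}
similarity = {frozenset({"A", "C"}): 0.7}  # hypothetical similarity value


def similar(p, q, threshold=0.5):  # hypothetical threshold
    return similarity.get(frozenset({p, q}), 0.0) >= threshold


later, earlier = sorted(["A", "C"], key=filing_year.get, reverse=True)
print(later, earlier, is_pic(later, earlier, cites, similar))  # A C True
```

If A also cited C directly, the pair would no longer count as a PIC, since the two patents would then be directly linked in the citation network.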

CS-Net: A Method of Merging the Citation Network and the Similarity Network
Let n be the number of collected documents. The document similarity matrix is then n by n. If the collected documents cite only one another, the citation matrix is also n by n.
However, if they cite k documents that are not among the collected patents, the citation matrix is (n + k) by (n + k). The citation matrix is therefore always at least as large as the document similarity matrix, so the two cannot be combined by ordinary matrix addition. Table 1 shows the pseudo code of the CS-Net algorithm that merges the two networks. The adjacency matrix used to build CS-Net is the sum of the citation adjacency matrix (CAM) and the SAM, where the n collected patents occupy the first n rows and columns of the CAM so that they line up with the SAM; the remaining k rows and columns keep their CAM values. Since both matrices consist of 0s and 1s, each element of the resulting adjacency matrix is 0, 1, or 2.
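The merge rule just described (the CS-Net pseudo code of Table 1) can be sketched as a runnable function. This sketch assumes that the n collected patents occupy the first n rows and columns of the CAM in the same order as in the SAM, that CAM[i][j] = 1 means patent i cites patent j, and that matrices are plain nested lists. The six-node usage example follows the sizes of Figure 5 (n = k = 3), but the citation edges involving D, E, and F are hypothetical.

```python
def cs_net(cam, sam):
    """Merge a CAM (m x m) and a SAM (n x n) into the CS matrix (m x m).

    Assumes the n collected patents occupy the first n rows/columns of `cam`
    in the same order as in `sam`; the other k = m - n rows/columns are cited
    documents outside the collection and keep their citation values only.
    """
    m, n = len(cam), len(sam)
    if n > m:
        raise ValueError("expected m >= n")
    cs = [[0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if i < n and j < n:  # 0-based version of "i <= n and j <= n"
                cs[i][j] = cam[i][j] + sam[i][j]
            else:
                cs[i][j] = cam[i][j]
    return cs


# Six-node example: A, B, C collected (n = 3); D, E, F outside (k = 3).
A, B, C, D, E, F = range(6)
cam = [[0] * 6 for _ in range(6)]
for citing, cited in [(A, B), (B, C), (F, C), (A, D), (B, E)]:  # i cites j
    cam[citing][cited] = 1
sam = [[0, 0, 1],
       [0, 0, 0],
       [1, 0, 0]]  # only A and C are similar
cs = cs_net(cam, sam)
print(cs[A][B], cs[A][C], cs[F][C])  # 1 1 1
```

Keeping the CAM directed means the CS matrix is not necessarily symmetric; this preserves the citing direction, which is needed later to collect forward citations.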

Input: CAM = citation adjacency matrix (m × m), where m = n + k
       SAM = similarity adjacency matrix (n × n), where m ≥ n
Output: CS = adjacency matrix (m × m)
Initialize: CS = zero matrix (m × m)
FOR each element (i, j) of the CAM
    IF i ≤ n AND j ≤ n THEN
        CS(i, j) = CAM(i, j) + SAM(i, j)
    ELSE
        CS(i, j) = CAM(i, j)
    END IF
END FOR

Figure 5 shows the conceptual diagram of CS-Net. In the example, both n and k are three, so the citation matrix and the similarity matrix are 6 by 6 and 3 by 3, respectively. Since both matrices contain patents A, B, and C, the values corresponding to the same patent pair are added together. In this example, there are direct citation relationships between patents A and B and between B and C, and A and C are similar. As a result, one PIC (A and C) can be found in a network of six nodes.

Table 2 describes the pseudo code of PIC-E, an algorithm that searches for PICs in CS-Net. The input of PIC-E is the CS matrix obtained by applying the CS-Net algorithm to the citation matrix and the similarity matrix. x_ij denotes the element in the ith row and jth column of the CS matrix and takes the value 0, 1, or 2. x_ij is non-zero when at least one of two conditions holds: there is a citation relationship between the ith patent P_i and the jth patent P_j, or the similarity between the two patents is greater than or equal to the preset threshold.
The value of x_ij is 2 when both conditions are satisfied and 1 when only one of them is satisfied. Figure 5 is an example of CS where m is 6. In this example, patents A and C are P_1 and P_3, respectively. Since x_13 in the CS matrix equals 1, one of the two conditions is satisfied. The next step is to compare Date_1 and Date_3, the filing dates of P_1 and P_3. Since Date_1 is later than Date_3, Diff, the time difference between them, is positive. Therefore, the later patent P_1 corresponds to P_L, and P_3 to P_E. F_1 denotes the set of forward citations of the earlier patent P_E. In Figure 5, patents B and F are included in F_1 because they are forward citations of P_E (patent C). F_2 denotes the forward citations of the patents belonging to F_1. If P_L (patent A), whose filing date is later than that of P_E (patent C), is included in F_2, the relationship between P_E and P_L is a PIC.

CS-Net can visualize both the citation relationships and the similarity information of the collected patent big data, making it easy to grasp the citation flow of a patent and the process of similar technology development. However, since CS-Net is a very large network, it can be reduced through PIC-E to only the patents in a PIC relationship. As a result, big data can be visualized efficiently and patents with a high risk of patent disputes can be found.
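The PIC-E steps described above (screen pairs in the CS matrix, order them by filing date, collect F_1 and F_2, and test whether P_L is in F_2) can be sketched as follows. This is a hedged reconstruction, not the authors' Table 2: it assumes a directed CAM is available alongside the CS matrix so that forward citations can be recovered, skips pairs with identical filing dates, and tightens the x_ij ≠ 0 screen to pairs that are similar (x_ij − CAM_ij = 1) but not directly cited, matching the PIC definition. B's filing date and all month/day values are hypothetical.

```python
from datetime import date


def forward_citations(cam, p):
    """Indices of patents that cite patent p (cam[i][j] == 1 means i cites j)."""
    return {i for i in range(len(cam)) if cam[i][p] == 1}


def pic_explorer(cs, cam, dates):
    """Return PIC pairs as (earlier, later) index tuples.

    cs:    CS matrix (m x m) from the CS-Net step.
    cam:   directed citation adjacency matrix (m x m) used to build cs.
    dates: filing dates of the n collected patents, index-aligned with cs.
    """
    n = len(dates)
    pics = []
    for i in range(n):
        for j in range(i + 1, n):
            similar = (cs[i][j] - cam[i][j]) == 1  # recovers the SAM entry
            directly_linked = cam[i][j] == 1 or cam[j][i] == 1
            if not similar or directly_linked:
                continue
            diff = (dates[i] - dates[j]).days  # Diff in the text
            if diff == 0:
                continue  # assumption: same-day filings are skipped
            later, earlier = (i, j) if diff > 0 else (j, i)
            f1 = forward_citations(cam, earlier)  # F_1
            f2 = set().union(*(forward_citations(cam, b) for b in f1))  # F_2
            if later in f2:
                pics.append((earlier, later))
    return pics


# Figure 5-style example: A, B, C collected (n = 3); D, E, F outside.
A, B, C, D, E, F = range(6)
cam = [[0] * 6 for _ in range(6)]
for citing, cited in [(A, B), (B, C), (F, C), (A, D), (B, E)]:
    cam[citing][cited] = 1
cs = [row[:] for row in cam]
cs[A][C] += 1
cs[C][A] += 1  # A and C are similar (SAM entry 1)
dates = [date(2017, 1, 1), date(2010, 1, 1), date(2005, 1, 1)]  # A, B, C
print(pic_explorer(cs, cam, dates))  # [(2, 0)], i.e. (C, A)
```

Here F_1 = {B, F} and F_2 = {A}, so the later patent A is reached through B and the pair (C, A) is reported, as in the walkthrough above.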

Experimental Study
This section presents experiments to confirm the practical applicability of the proposed method. For the experiment, we collected 1484 patents related to machine learning and deep learning published by the Korean Intellectual Property Office (KIPO). These technologies have recently been widely applied in robotics, particularly in the components that play the role of the robot's brain. At the time of analysis, 771 of the patents had been cited more than once by other patents, and the most-cited patent had been cited 38 times.
Khaiii (Kakao Hangul Analyzer III) is a morphological analyzer trained with a deep learning architecture on the Korean 'Sejong' corpus provided by the National Institute of the Korean Language [46][47][48]. We used Khaiii as the tokenizer and part-of-speech (POS) tagger for preprocessing the text of the collected Korean patents. Tokenization and POS tagging were performed on the representative claims of the patent documents, and only nouns were extracted [49,50]. The DTM was then constructed by calculating the term frequency-inverse document frequency (TF-IDF) weight of each extracted noun [51,52], and a similarity matrix was computed from it using cosine similarity.
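The weighting and similarity steps can be sketched in plain Python, starting from noun lists of the kind a tokenizer such as Khaiii would produce (the sketch does not call Khaiii itself). The noun lists are hypothetical, and the TF-IDF variant shown (raw term count times log(N/df), with no smoothing or normalization) is an assumption; libraries and the original study may use a different weighting. The resulting matrix would then be thresholded into a SAM.

```python
import math
from collections import Counter


def tfidf_dtm(docs):
    """Return (vocabulary, TF-IDF document-term matrix) for lists of nouns.

    Assumed weighting: raw term count * log(N / df), no smoothing.
    """
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    dtm = []
    for doc in docs:
        tf = Counter(doc)  # term frequency within this document
        dtm.append([tf[term] * math.log(n_docs / df[term]) for term in vocab])
    return vocab, dtm


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(dtm):
    return [[cosine(a, b) for b in dtm] for a in dtm]


# Hypothetical nouns standing in for those extracted from representative claims.
nouns = [
    ["vehicle", "camera", "chip", "image"],
    ["vehicle", "camera", "image", "lane"],
    ["energy", "battery", "demand"],
]
vocab, dtm = tfidf_dtm(nouns)
sim = similarity_matrix(dtm)
print(round(sim[0][1], 3), sim[0][2])  # 0.29 0.0
```

Terms shared by many documents receive lower IDF weights, so the two vehicle-related documents end up only moderately similar, while the energy-related document shares no terms with them.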
The next step was to explore PICs in CS-Net using PIC-E. As a result, a total of 24 PIC pairs consisting of 48 patent nodes were identified. Figure 6 shows the CS-Net, in which the 48 patent nodes belonging to PICs are drawn in a dark color. The nodes drawn in a light color are patents that are similar to, or have a citation relationship with, the patents belonging to the PICs. Patents belonging to a PIC are marked with a number h, and the two patent nodes constituting a PIC are assigned the same number (h = 1, 2, ..., 24). For example, both patents of the 3rd PIC are labeled 3. In addition, the edge representing a PIC relationship is drawn as a thick blue line. The visualization shows that the nodes labeled 5, 7, 13, and 17 each form an independent network, whereas the others are connected through similarity or citation relationships into one network. Patent pairs in a PIC relationship are likely to deal with similar technologies. Table 1 in Appendix A lists the cosine similarity (Sim), the application year (Year), and the title of the patents corresponding to the PICs. Among the PIC pairs, the pair with the highest similarity relates to a vehicle vision system equipped with an artificial intelligence chip, and the pair with the second-highest similarity deals with an energy management system using machine learning. Based on the PIC relationship, applicants of prior art can establish a strategy for claiming infringement against new patents. Conversely, applicants of later patents may develop logic to circumvent or invalidate the scope of the prior art.
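To check programmatically which PICs form independent networks, as observed for the nodes labeled 5, 7, 13, and 17, one option is to compute the connected components of CS-Net with edge directions ignored. The sketch below uses union-find on a small hypothetical graph; the node indices, edges, and PIC labels are made up and do not reproduce Figure 6.

```python
def connected_components(num_nodes, edges):
    """Group the nodes of an undirected graph into connected components (union-find)."""
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
    groups = {}
    for node in range(num_nodes):
        groups.setdefault(find(node), []).append(node)
    return list(groups.values())


# Hypothetical CS-Net edges (direction ignored) and PIC labels h -> (earlier, later).
pic_pairs = {1: (0, 1), 2: (2, 3), 3: (4, 5)}
edges = [(0, 1), (2, 3), (4, 5), (1, 2), (4, 6)]
comps = connected_components(7, edges)
comp_of = {node: k for k, comp in enumerate(comps) for node in comp}

# A PIC forms an independent network if no other PIC shares its component.
isolated = [h for h, (a, _) in pic_pairs.items()
            if sum(comp_of[x] == comp_of[a] for x, _ in pic_pairs.values()) == 1]
print(isolated)  # [3]
```

PICs 1 and 2 are joined through the edge (1, 2), while PIC 3 only reaches a light-colored neighbor (node 6), so it is the only one reported as independent.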

Discussion
Industrial applicability, novelty, and progressiveness are essential elements of a patent. Novelty is crucial because patents legally protect rights instead of disclosing technology. Considering prior art, researchers try to improve and develop advanced technologies. This process is the ideal goal of the patent system. In this context, a prior art search is essential for rights protection and technological advancement. Researchers plan to develop new technologies through a prior art analysis. If similar prior art exists, they attempt to make their invention different or more advanced. Without the investigation of prior art, the risk of potential patent litigation increases.
Previous studies have used various methods to improve the efficiency of prior art searches. Most of them used citation relationships to search for similar patents, relying on this information to find similar prior art and technologies that infringe on the rights of other patents. Other related studies measured the similarity of patent documents using their text. However, previous studies did not comprehensively consider citation information together with document similarity, nor did they reflect the sequence of citations.
This study proposes a prior art search method that considers both citation information and document similarity. It builds on the tendency of a patent cited by one patent to be cited again by patents similar to the citing patent. If two similar patents have no citation relationship, it is necessary to examine whether the later patent infringes the rights of the earlier one. Therefore, the first purpose of this method is to find prior art whose scope of rights may have been infringed by later patents. The second purpose is to monitor later patents so that they do not infringe on the scope of rights of earlier patents.
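The selection idea above can be sketched as follows. This is a hedged illustration under stated assumptions, not the PIC-E algorithm itself: documents are assumed to be sparse term-weight vectors, and a pair is kept when its cosine similarity reaches a hypothetical `threshold`, when neither patent cites the other, and when the two share at least one neighbor in the citation graph (for example, both cite the same earlier patent). The names `cosine` and `find_pic_candidates` are illustrative.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def find_pic_candidates(vectors, citations, threshold):
    """Return (a, b, sim) for similar pairs with no direct citation but a
    two-step citation path through one intermediate patent (an assumption).

    vectors: {patent_id: {term: weight}}
    citations: iterable of (citing, cited) pairs
    threshold: hypothetical minimum cosine similarity
    """
    # Undirected citation neighbors; direction is ignored in this sketch.
    neighbors = {p: set() for p in vectors}
    for citing, cited in citations:
        neighbors.setdefault(citing, set()).add(cited)
        neighbors.setdefault(cited, set()).add(citing)

    results = []
    for a, b in combinations(sorted(vectors), 2):
        if b in neighbors[a]:
            continue  # direct citation: excluded under this sketch
        if not neighbors[a] & neighbors[b]:
            continue  # no shared citation neighbor between a and b
        sim = cosine(vectors[a], vectors[b])
        if sim >= threshold:
            results.append((a, b, sim))
    # Most similar pairs first.
    results.sort(key=lambda r: r[2], reverse=True)
    return results
```

Checking the cheap citation conditions before computing similarity avoids scoring pairs that could never qualify; a large-scale version would likely also avoid enumerating all pairs.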
Our research has some limitations. First, it is difficult to reflect new technologies because they have relatively few opportunities to be cited by other patents. Since the proposed method relies on the tendency of similar patents to cite the same patents, recently developed technologies may be less likely to be detected. The second limitation concerns the depth of sequential citations. We focused on the indirect relationship between two patents; however, patents in a direct citation relationship may also be highly similar. Furthermore, cases in which the citation sequence is long require further consideration.
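The depth limitation can be made concrete by measuring how many citation steps separate two patents. The sketch below is an illustration under assumptions rather than part of the proposed method: it ignores citation direction and runs a breadth-first search capped at a hypothetical `max_hops`, so a result of 1 is a direct citation, 2 is the indirect case studied here, and larger values are the longer citation sequences left for future work. The function name `citation_hops` is hypothetical.

```python
from collections import deque


def citation_hops(citations, source, target, max_hops):
    """Shortest number of citation steps between two patents, ignoring
    direction, or None if no path exists within max_hops.

    citations: iterable of (citing, cited) pairs
    """
    adjacency = {}
    for citing, cited in citations:
        adjacency.setdefault(citing, set()).add(cited)
        adjacency.setdefault(cited, set()).add(citing)

    if source == target:
        return 0
    visited = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the depth limit
        for neighbor in adjacency.get(node, ()):
            if neighbor == target:
                return depth + 1
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return None
```

Raising `max_hops` widens the search to longer citation sequences at the cost of exploring more of the network.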

Conclusions
The purpose of the patent system is to promote technological advancement and industrial development. Accordingly, a new patent requires novelty and progressiveness compared to prior art. When a new patent infringes the rights of prior art, companies inevitably spend considerable time and money resolving patent disputes. To prevent such disputes, this paper proposed a method of establishing counter strategies using the citation relationships and similarities of prior art.
The proposed method was tested on patents related to machine learning and deep learning to confirm its practical applicability. The experiment identified 24 pairs (48 patents) of similar patents with no direct citation relationship. In addition, some of the patents in an indirect citation relationship were judged to have a possibility of dispute because their claims are similar to each other. Because the patents in each similar pair differ in filing time, it is possible to prepare both a strategy for assessing rights infringement and a strategy for developing non-infringement or invalidation logic. This methodology is expected to be widely applicable to prior art searches and to monitoring rights infringement in domains with complex citation relationships, such as robotics.
In future work, the counter strategy needs to be extended from the patent level to the company level, which would make it possible to search for competitors. In addition, research on algorithms that can reflect new technologies is needed; to this end, family patents may be used in addition to the citation information of a patent. Finally, a method that considers deeper citation relationships is needed. Such research can be used to analyze the direction of technology development and to search for basic technologies. Basic technologies are patents that form the basis of a technical field; once they are identified, the flow of technology in that field can be easily understood. Therefore, this method is expected to be used for efficient patent big data analysis.

Appendix A

Table 1 in Appendix A lists the PICs derived from the experiment in Section 5. The first column of the table is the index of the PIC. Sim is the cosine similarity between the two patents belonging to the PIC, and the rows are sorted by Sim in descending order. Year refers to the filing year of each patent. Of the two patents in each PIC, the older one is listed at the top.
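The ordering rules for Table 1 can be expressed in a few lines. This minimal sketch assumes each PIC is given as a `(sim, patent_x, patent_y)` tuple; the `Patent` record, its fields, the `table_rows` function, and the sample identifiers are hypothetical stand-ins for the columns described above. Rows are ranked by Sim in descending order, and the older patent of each pair is placed first.

```python
from typing import NamedTuple


class Patent(NamedTuple):
    number: str  # hypothetical identifier field
    title: str
    year: int    # filing year


def table_rows(pics):
    """Order PICs as described for Table 1.

    pics: list of (sim, patent_x, patent_y) tuples.
    Returns (index, sim, older_patent, newer_patent) rows, ranked by sim
    in descending order, with the earlier-filed patent listed first.
    """
    rows = []
    ranked = sorted(pics, key=lambda p: p[0], reverse=True)
    for index, (sim, x, y) in enumerate(ranked, start=1):
        older, newer = sorted((x, y), key=lambda p: p.year)
        rows.append((index, sim, older, newer))
    return rows
```

Placing the older patent first also separates the two roles discussed earlier: the prior art applicant, who may consider an infringement claim, and the later applicant, who may prepare non-infringement or invalidation logic.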