Article

A Purpose-Aware Semantic Reasoning Model for Patent Infringement Detection in the DIKWP Network

1
School of Information and Communication Engineering, Hainan University, Haikou 570228, China
2
School of Computer Science and Technology, Hainan University, Haikou 570228, China
*
Author to whom correspondence should be addressed.
Electronics 2026, 15(11), 2449; https://doi.org/10.3390/electronics15112449
Submission received: 30 April 2026 / Revised: 24 May 2026 / Accepted: 29 May 2026 / Published: 3 June 2026
(This article belongs to the Special Issue AI for Industry)

Abstract

Patent infringement detection requires coordinated interpretation of technical claims, legal standards, and contextual evidence. This study proposes a semantic AI framework for patent infringement detection grounded in the DIKWP network and artificial consciousness theory. The DIKWP network organizes the analytical modules as interacting semantic spaces rather than as a strictly layered pipeline. This design supports iterative semantic interpretation, knowledge integration, and purpose-oriented reasoning. The framework integrates document ingestion, semantic information extraction, ontology-based knowledge representation, rule-guided inference, and decision support. The system processes patent claims, product descriptions, and prior-art documents with patent-oriented NLP. Named entity recognition and subject–action–object parsing convert unstructured text into structured semantic representations. Legal and technical ontologies support claim-element interpretation. Knowledge graphs, semantic pattern matching, and inference rules then align claim elements with product features and identify potential infringement risks. A prototype implementation demonstrates end-to-end processing from raw text to infringement-oriented assessment. The evaluation was conducted in two layers. First, a controlled synthetic patent–product corpus was used to isolate claim-element reasoning, rule-guided inference, and purpose-conditioned operating modes. Second, a real-world pilot corpus was constructed from publicly available patent claims and real product technical descriptions, including manufacturer manuals, technical datasheets, official product webpages, installation guides, and technical brochures. The controlled-corpus results show that the DIKWP network improves over keyword-matching and ontology-only baselines by integrating semantic coverage, claim-level legal reasoning, and explainable output. 
The real-world pilot provides a preliminary external-validity check of whether the framework can preserve element-level reasoning under realistic drafting styles, domain terminology, incomplete product evidence, and borderline claim-to-product correspondences. These findings provide preliminary evidence of feasibility and analytical value, rather than a final benchmark of litigation-level performance.

1. Introduction

Patent infringement analysis is a core task in intellectual property protection: it determines whether an accused product or process falls within the scope of asserted patent claims. This task entails the interpretation of technically dense claim language, domain-specific terminology, and legally constrained evidence. As patent filings, technical disclosures, product manuals, product web pages, standards documents, and patent drawings continue to proliferate, manual claim-to-product comparison becomes increasingly costly, time-consuming, and difficult to scale. Recent surveys show rapid growth in patent-oriented natural language processing and patent retrieval [1,2]. PatentSBERTa [3] and PatentBERT [4] further demonstrate that patent-specific transformer models can improve document-level distance estimation, classification, and semantic comparison.
The methodological difficulty, however, is not solved by better document-level similarity alone. SAO-based infringement analysis and dependency-based claim analysis show that claim-to-product comparison must operate at the level of technical relations and claim limitations [5,6]. Later studies on SAO text mining, weighted semantic structures, and function-oriented patent semantics further confirm that infringement-oriented comparison requires functional and structural alignment rather than ordinary lexical overlap [7,8,9].
The legal reason for this granularity is straightforward. Claim construction determines the operative meaning of asserted limitations. Literal infringement is generally assessed element by element, and the doctrine of equivalents remains constrained by element-specific reasoning and prosecution-history limitations. Therefore, an infringement-detection system must distinguish semantic relatedness from legally sufficient claim coverage.
Real-world patent litigation introduces additional linguistic and doctrinal complexity that cannot be fully captured by synthetic patent–product pairs. Patent claims may contain intentionally broad functional expressions, open-ended modifiers, nested dependencies, numerical ranges, means-plus-function formulations, and strategically ambiguous terminology. Product descriptions may also be incomplete, promotional, or selectively drafted. In such settings, semantic similarity alone may increase false positives, whereas overly strict element matching may increase false negatives. The proposed DIKWP network is therefore intended as an explainable decision-support framework that identifies, ranks, and explains potential correspondences, rather than as an autonomous substitute for legal claim construction or expert infringement judgment.
Early computational approaches to patent analysis mainly relied on keyword matching, Boolean retrieval, vector-space comparison, citation networks, or classification features. These methods remain useful for prior-art discovery and coarse-grained patent landscape analysis [10,11]. They are less suitable for infringement detection because they cannot adequately represent functional relations, claim syntax, spatial qualifiers, quantity constraints, and semantic equivalence across alternative technical expressions.
The infringement task is element-centric rather than document-centric. Park et al. used SAO-based technological similarity to capture functional relations that keyword search tends to miss [5]. Lee et al. introduced dependency-based semantic claim analysis for infringement-risk assessment, showing that claim syntax has direct analytical value [6]. Subsequent work refined this direction through SAO-based text mining, broader SAO similarity measures, and function-oriented semantic knowledge [7,8,9]. Wang et al. further showed that a richer understanding of SAO semantics can improve patent text similarity modeling [12]. Recent research also shows that patents should not be treated as plain text alone. Patent documents encode relations among functions, structures, materials, components, locations, operating conditions, purposes, and embodiment alternatives. Ontology-based representation learning and ontological knowledge-graph refinement have therefore been introduced to organize patent semantics beyond surface tokens [13,14].
Knowledge graphs have also been used for patent examiner citation recommendation and patent recommendation, indicating their value for relational patent analysis and retrieval support [15,16]. Interpretable patent recommendation and product-innovation design studies further show that graph structures can preserve technical relations that flat embeddings may weaken [17,18]. Particularly relevant to the present task, a patent infringement analysis approach based on patent knowledge graphs and image similarity demonstrates the value of combining symbolic graph structure with other evidence channels [19].
This graph-oriented perspective is consistent with general knowledge-graph research. Foundational surveys describe knowledge graphs as entity-relation structures that support schema modeling, reasoning, completion, and learning [20,21]. Relational machine learning on graphs provides additional methods for exploiting typed links and graph paths when semantic correspondence is not reducible to direct lexical identity [22].
However, patent infringement detection is not merely a semantic matching problem. It is also a reasoning problem governed by legal criteria. Claim construction ordinarily begins with the claim language and its intrinsic context, including the specification and prosecution history. Literal infringement generally requires that every material claim limitation be found in the accused product or process. Doctrine-of-equivalents analysis may consider whether a non-identical feature is substantially equivalent to a claimed element, but it remains element-specific and legally constrained. Prosecution history estoppel may further restrict equivalence when claim amendments narrow the scope of a limitation.
These legal constraints require more than black-box similarity scoring. Ontologies and knowledge-aware explanation methods can make intermediate semantic commitments visible to reviewers [23,24]. Legal information extraction and ontology-supported legal research show that legal AI systems must preserve legally relevant entities, relations, and evidentiary context rather than only optimizing predictive accuracy [25,26]. General XAI and interpretability research likewise warns that high-stakes decision support needs transparent and auditable reasoning structures [27,28].
The emergence of large language models creates additional opportunities and risks. LLMs may assist with claim summarization, paraphrase generation, candidate correspondence discovery, and interactive legal-technical explanation. Retrieval-augmented generation provides one way to connect generation with external evidence, while foundation-model research highlights both the breadth and the governance risks of general-purpose models [29,30].
At the same time, empirical work shows that LLMs can hallucinate legal propositions and citations [31]. Broader surveys of large language models in legal systems and hallucination research emphasize that fluent output is not equivalent to factual or doctrinal reliability [32,33]. For patent infringement detection, generative outputs should therefore be embedded within controlled semantic architectures that validate candidate correspondences against ontologies, evidence links, and legal rules.
To address these requirements, the present study adopts the DIKWP model as the organizing framework for system design. DIKWP extends the traditional Data-Information-Knowledge-Wisdom framing by making Purpose an explicit control dimension [34]. Recent applications show that DIKWP can support purpose-sensitive reasoning in domain-specific settings, including medical dispute resolution and smart healthcare systems [35,36].
In this paper, DIKWP is not treated as a rigid hierarchy or one-way pipeline. It is operationalized as a network of interacting semantic dimensions in which data, information, knowledge, wisdom, and purpose are recursively transformed and mutually constrained during analysis. DIKWP studies on uncertainty modeling and semantic judicial reasoning support this networked interpretation [37,38]. Work on personalized and bidirectional semantic communication further motivates the use of purpose as a regulating dimension in human-machine reasoning [39,40].
Based on this perspective, this paper proposes a DIKWP network for patent infringement detection. The framework integrates patent-oriented NLP, ontology-based knowledge representation, graph-based semantic alignment, rule-guided legal inference, and purpose-aware decision support. Named entity recognition, dependency parsing, and SAO extraction transform unstructured text into structured semantic units. These units are linked to legal and technical ontologies and organized into knowledge graphs for claim interpretation and feature alignment. The system then performs infringement-oriented reasoning by combining semantic similarity, claim-limitation coverage analysis, functional-equivalence screening, and rule-based evidence aggregation, while preserving intermediate representations that support expert review.
This study makes three contributions. First, it develops a DIKWP-network-based design for patent infringement detection and extends purpose-aware semantic modeling to a concrete legal-technical task. Second, it constructs an end-to-end analytical framework that connects patent NLP, ontology and knowledge-graph representation, explicit legal logic, uncertainty handling, and explainable decision support. Third, it evaluates the proposed framework through a two-layer validation strategy. A controlled synthetic patent–product corpus is used to isolate claim-element reasoning and compare system configurations, while a real-world pilot corpus constructed from public patent claims and real product technical descriptions is used to examine external validity under realistic drafting styles, domain terminology, and incomplete product evidence. By framing patent infringement detection as a problem of semantic understanding, structured knowledge, legal reasoning, and purpose-sensitive control, this work contributes to research at the intersection of legal AI, patent analytics, knowledge-based systems, and semantic computing.
The remainder of this paper is organized as follows. Section 2 reviews the DIKWP model, artificial consciousness theory, patent NLP, semantic patent analysis, knowledge graphs, legal AI, and explainable decision support. Section 3 presents the overall system framework. Section 4 describes the semantic processing workflow. Section 5 details the prototype implementation and illustrative outputs. Section 6 reports experimental results. Section 7 discusses implications, limitations, deployment considerations, and future research directions. Section 8 concludes the paper.

2. Background and Related Work

2.1. DIKWP Network and Artificial Consciousness

The DIKWP model extends the traditional Data-Information-Knowledge-Wisdom framework by explicitly incorporating Purpose into intelligent processing. Early DIKWP work frames purpose as a bridge between task-oriented reasoning and more general intelligent behavior [34]. Domain-specific studies then illustrate how the model can organize medical dispute resolution and smart healthcare decision support [35,36].
In this study, DIKWP is not treated as a rigid hierarchy. It is understood as a network model in which data, information, knowledge, wisdom, and purpose interact dynamically and recursively during analysis. Studies on DIKWP uncertainty handling and semantic judicial reasoning support this non-linear view [37,38]. Related work on DIKWP-based distributed learning and semantic communication further emphasizes interaction among semantic spaces rather than a one-way conversion chain [39,40,41].
Within this formulation, data refers to raw patent claims, specifications, product descriptions, drawings, and related technical materials. Information denotes structured outputs derived from these materials, including extracted entities, claim limitations, product features, predicate-argument structures, and metadata. Knowledge comprises legal and technical ontologies, patent knowledge graphs, domain rules, lexical resources, and learned semantic correspondences. Wisdom refers to higher-order reasoning, including claim-coverage analysis, semantic equivalence assessment, uncertainty-aware judgment, and explanation generation. Purpose specifies the governing objective of the analysis, such as infringement screening, enforcement support, freedom-to-operate assessment, design-around guidance, or evidence prioritization for expert review.
A networked interpretation of DIKWP is particularly appropriate for patent infringement detection because the reasoning process is inherently iterative. Purpose may influence evidence selection, matching thresholds, equivalence screening, and report granularity; knowledge constrains semantic normalization and relation interpretation; and intermediate reasoning results may trigger additional retrieval, ontology expansion, or re-analysis of ambiguous product evidence. DIKWP research on uncertainty modeling and judicial reasoning provides the theoretical basis for this feedback-rich interpretation [37,38]. Studies of DIKWP semantic communication further support the idea that task purpose can shape information exchange between humans and machines [39,40].
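The feedback behaviour described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the function `iterative_analysis`, the `retrieve_more` callback, the `demo_retriever` helper, and the substring test that stands in for semantic alignment are all assumptions introduced here for illustration only.

```python
from typing import Callable, Dict, List, Tuple


def iterative_analysis(
    limitations: List[str],
    evidence: List[str],
    retrieve_more: Callable[[List[str]], List[str]],
    max_rounds: int = 3,
) -> Tuple[Dict[str, str], List[str], int]:
    """Align limitations with evidence, requesting more evidence when needed.

    Substring matching is only a placeholder for semantic alignment.
    Returns (matched, unresolved, rounds_used).
    """
    pool = list(evidence)            # working evidence set
    matched: Dict[str, str] = {}     # limitation -> supporting snippet
    rounds = 0
    while rounds < max_rounds:
        rounds += 1
        for lim in limitations:
            if lim in matched:
                continue
            hit = next((e for e in pool if lim.lower() in e.lower()), None)
            if hit is not None:
                matched[lim] = hit
        unresolved = [lim for lim in limitations if lim not in matched]
        if not unresolved or rounds == max_rounds:
            break
        # Feedback step: unresolved limitations trigger additional retrieval.
        extra = retrieve_more(unresolved)
        if not extra:
            break
        pool.extend(extra)
    unresolved = [lim for lim in limitations if lim not in matched]
    return matched, unresolved, rounds


def demo_retriever(unresolved: List[str]) -> List[str]:
    """Hypothetical retriever that returns one extra datasheet sentence."""
    if "temperature sensor" in unresolved:
        return ["Datasheet: integrated temperature sensor on the board."]
    return []


matched, unresolved, rounds = iterative_analysis(
    ["heat sink", "temperature sensor"],
    ["The unit has an aluminium heat sink."],
    demo_retriever,
)
```

In this toy run the second limitation is resolved only after the retrieval callback supplies an additional sentence, which mirrors the idea that intermediate results may trigger re-analysis rather than end the process.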
Artificial consciousness is introduced here in a limited engineering sense rather than a strong philosophical sense. The present study does not claim that the proposed system is conscious in a phenomenological or subjective sense. Instead, artificial consciousness serves as a design inspiration for goal-aware coordination, context-sensitive adaptation, self-monitoring of reasoning status, and reflective control over explanations. Recent discussions in artificial consciousness research similarly stress the need to distinguish full consciousness claims from more modest functional notions such as self-monitoring, adaptive control, and goal-sensitive behavior [42,43,44].
From this perspective, the relevance of artificial consciousness to DIKWP lies in the explicit role of purpose. A purpose-aware system does not merely process patent text; it adjusts its reasoning strategy according to the analytical task. Purpose-driven DIKWP research motivates this form of control at the architectural level [34]. Recent DIKWP applications show how purpose can regulate domain-specific decision processes rather than merely label their outputs [35,36]. In the present setting, uncertainty-aware DIKWP reasoning and semantic judicial reasoning are particularly relevant because infringement assessment often involves incomplete evidence and legally constrained ambiguity [37,38].

2.2. Patent NLP, Semantic Representation, and Claim-Level Analysis

Patent analysis has become a major application area for natural language processing, machine learning, and knowledge-based systems. Earlier patent-analysis and patent-retrieval surveys emphasized the difficulty of patent language, including long sentences, broad functional expressions, synonymy, claim drafting conventions, and domain-specific terminology [10,11]. Recent surveys show that the field has moved beyond bibliographic matching toward machine-learning-based patent analysis and patent-specific NLP [1,45]. Deep-learning surveys further document the transition from handcrafted features to neural patent representations [46]. Tailored patent search studies show that retrieval strategies must still be adapted to the target technical and professional context [47].
Patent-specific transformer resources illustrate this shift. BIGPATENT supplies a large patent summarization corpus and demonstrates the availability of high-volume patent text for model development [48]. PatentBERT shows that BERT fine-tuning can improve patent classification [4]. PatentSBERTa extends sentence-transformer representations to patent distance estimation and classification [3].
General pretrained language models also provide useful representation components. BERT introduced contextual bidirectional pretraining, Sentence-BERT adapted transformers for sentence-level semantic similarity, and SciBERT demonstrated the value of domain-aware pretraining for scientific text [49,50,51]. Legal-domain models such as LEGAL-BERT and benchmarks such as LexGLUE show that legal language also benefits from specialized adaptation and evaluation [52,53].
These methods substantially improve retrieval and classification, but infringement detection requires a more granular analysis of claim limitations and accused-product features. A patent and a product can be semantically related without satisfying all material limitations. Conversely, two descriptions may have low lexical similarity yet correspond at the element level because of synonymy, functional equivalence, or alternative technical terminology.
One influential response to this limitation is the use of structure-aware semantic representations, especially SAO models. SAO representations make functional relations explicit and reduce dependence on surface lexical overlap. Park et al. used SAO-based semantic technological similarity to identify potential infringement relations [5]. Lee et al. proposed dependency-based semantic claim analysis for patent infringement risk assessment [6]. Later studies refined SAO-based infringement analysis and patent similarity modeling through text mining, weighted semantic structures, and function-oriented semantic knowledge [7,8,9]. Newer SAO work continues to improve patent similarity by modeling SAO semantics more comprehensively [12]. A mathematical-logical approach to semantic patentability assessment further shows that patent analysis benefits from explicit formal interpretation [54]. Patent-specific semantic relation classification research also supports treating relation extraction as a distinct task rather than as a by-product of document similarity [55].
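To make the contrast between SAO-level comparison and surface overlap concrete, the following sketch compares two toy measures. It is an illustration under stated assumptions, not the similarity measure of any cited study: the `SAO` class, the `CANONICAL` normalization table, and both scoring functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SAO:
    subject: str
    action: str
    obj: str


# Hypothetical normalization table; a real system would consult an ontology.
CANONICAL = {"transmits": "send", "sends": "send", "emits": "send"}


def normalize(term: str) -> str:
    t = term.strip().lower()
    return CANONICAL.get(t, t)


def sao_similarity(a: SAO, b: SAO) -> float:
    """Fraction of the three role slots that agree after normalization."""
    pairs = [(a.subject, b.subject), (a.action, b.action), (a.obj, b.obj)]
    return sum(normalize(x) == normalize(y) for x, y in pairs) / 3


def keyword_jaccard(a: SAO, b: SAO) -> float:
    """Role-blind baseline: Jaccard overlap of the raw lower-cased terms."""
    ta = {a.subject.lower(), a.action.lower(), a.obj.lower()}
    tb = {b.subject.lower(), b.action.lower(), b.obj.lower()}
    return len(ta & tb) / len(ta | tb)


claim = SAO("controller", "transmits", "signal")
product = SAO("Controller", "sends", "signal")
reversed_roles = SAO("signal", "sends", "controller")
```

Here `sao_similarity(claim, product)` is 1.0 although the keyword baseline gives only 0.5, because the verbs differ on the surface. Conversely, `product` and `reversed_roles` share every keyword, yet the role-aware score of `reversed_roles` against `claim` drops to one third because subject and object are swapped.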
Patent claims also exhibit a semi-structured legal form. Claim preambles, transitional phrases, limitations, dependent-claim references, means-plus-function expressions, process steps, numerical ranges, and negations all influence claim scope. The present framework therefore treats information extraction as more than a generic NLP task. It separates claim segmentation, product feature extraction, relation extraction, modifier detection, quantity parsing, and negation recognition. The output of this stage is a set of structured claim limitations and product features that can be aligned by the knowledge dimension and evaluated by the wisdom dimension.
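A minimal sketch of three of these sub-tasks (claim segmentation, quantity parsing, and negation recognition) is given below. It assumes a simple apparatus claim whose limitations are separated by semicolons; the regular expressions, the function names `segment_claim` and `annotate`, and the example claim are illustrative assumptions and would not cover real drafting variety.

```python
import re
from typing import Dict, List, Optional, Tuple

TRANSITION = re.compile(
    r"\b(comprising|consisting essentially of|consisting of)\b\s*:?",
    re.IGNORECASE,
)
QUANTITY = re.compile(r"\b(at least|at most|exactly)\s+(\d+(?:\.\d+)?)", re.IGNORECASE)
NEGATION = re.compile(r"\b(without|not|free of|excluding)\b", re.IGNORECASE)


def segment_claim(text: str) -> Dict[str, object]:
    """Split a claim into preamble, transitional phrase, and limitations."""
    m = TRANSITION.search(text)
    if m is None:
        return {"preamble": text.strip(), "transition": None, "limitations": []}
    preamble = text[: m.start()].strip().rstrip(",").strip()
    limitations: List[str] = []
    for part in text[m.end():].split(";"):
        part = part.strip().rstrip(".").strip()
        part = re.sub(r"^and\s+", "", part, flags=re.IGNORECASE)
        if part:
            limitations.append(part)
    return {
        "preamble": preamble,
        "transition": m.group(1).lower(),
        "limitations": limitations,
    }


def annotate(limitation: str) -> Dict[str, object]:
    """Attach a parsed quantity constraint and a negation flag."""
    q = QUANTITY.search(limitation)
    quantity: Optional[Tuple[str, float]] = None
    if q is not None:
        quantity = (q.group(1).lower(), float(q.group(2)))
    return {
        "text": limitation,
        "quantity": quantity,
        "negated": NEGATION.search(limitation) is not None,
    }


CLAIM = (
    "A cooling device comprising: a fan; "
    "a heat sink having at least 3 fins; and a housing without a vent."
)
segments = segment_claim(CLAIM)
annotated = [annotate(lim) for lim in segments["limitations"]]
```

Keeping the quantity and negation annotations separate from the limitation text lets later stages test them explicitly instead of relying on overall textual similarity.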

2.3. Ontologies, Knowledge Graphs, and Semantic Patent Analytics

A second major research direction emphasizes ontology-based and knowledge-graph-based patent representation. Ontologies provide explicit vocabularies for a domain, including concepts, relations, constraints, and axioms [56]. Analyses of intellectual-property ontologies show that ontology design is itself a specialized problem in this domain [57]. Knowledge graphs extend this idea by organizing entities and relations into graph-structured semantic networks that can support querying, reasoning, completion, and machine learning [20,21,22]. Patent documents are natural candidates for such representation because they encode components, functions, structural relations, materials, operating conditions, technical effects, and embodiment alternatives.
Recent patent analytics studies demonstrate the value of explicit semantic representation. Zhai et al. designed a patent ontology for patent representation learning [13]. Trappey et al. used ontological knowledge-graph refinement for patent portfolio analysis [14]. Lu et al. used knowledge graphs to support patent examiner citation recommendation [15]. Patent knowledge graphs have also been applied to patent recommendation and interpretable patent recommendation, suggesting that graph representations can support both retrieval and explanation [16,17]. Related work on double-classification patent retrieval and product innovation design shows that structured patent knowledge can improve search and design-support tasks [18,58]. Particularly relevant to infringement detection, Jing et al. proposed a patent infringement analysis method that combines patent knowledge graphs with graph and image similarity [19].
The present framework builds on this literature by using a dual ontology. The technical ontology encodes domain concepts such as components, materials, functions, structures, part-whole relations, and operational relations. The legal ontology encodes concepts such as Patent, Claim, ClaimLimitation, ProductFeature, Correspondence, InfringementEvidence, LiteralMatch, EquivalenceCandidate, NegativeEvidence, and UncertainEvidence. This dual structure is important because infringement analysis requires both technical interpretation and legal sufficiency. A product feature may be technically similar to a claim limitation but legally insufficient if a material modifier, quantity condition, sequence constraint, or exclusion is missing.
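The interplay between the two ontologies can be sketched as follows. The class names reuse the legal concepts listed above, but the fields, the toy `IS_A` and `SAME_FUNCTION` relations, and the `classify` logic are assumptions made for illustration. In particular, treating a subtype (a brushless motor for a claimed motor) as a literal-match candidate is a simplification adopted here, not a rule stated in this paper.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class EvidenceType(Enum):
    LITERAL_MATCH = "LiteralMatch"
    EQUIVALENCE_CANDIDATE = "EquivalenceCandidate"
    NEGATIVE_EVIDENCE = "NegativeEvidence"
    UNCERTAIN_EVIDENCE = "UncertainEvidence"


@dataclass(frozen=True)
class ClaimLimitation:
    concept: str
    modifiers: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class ProductFeature:
    concept: str
    modifiers: FrozenSet[str] = frozenset()
    absent: FrozenSet[str] = frozenset()  # concepts the evidence says are absent


# Toy technical ontology (hypothetical entries).
IS_A = {"brushless motor": "motor"}
SAME_FUNCTION = {frozenset({"screw", "rivet"})}


def classify(lim: ClaimLimitation, feat: ProductFeature) -> Optional[EvidenceType]:
    """Type one limitation-feature pair; None means no correspondence."""
    if lim.concept in feat.absent:
        return EvidenceType.NEGATIVE_EVIDENCE
    same_or_subtype = (
        feat.concept == lim.concept or IS_A.get(feat.concept) == lim.concept
    )
    functional = frozenset({feat.concept, lim.concept}) in SAME_FUNCTION
    if not (same_or_subtype or functional):
        return None
    # Technically related, but a material modifier is not evidenced.
    if not lim.modifiers <= feat.modifiers:
        return EvidenceType.UNCERTAIN_EVIDENCE
    if same_or_subtype:
        return EvidenceType.LITERAL_MATCH
    return EvidenceType.EQUIVALENCE_CANDIDATE


motor_limitation = ClaimLimitation("motor", frozenset({"waterproof"}))
full_evidence = ProductFeature("brushless motor", frozenset({"waterproof"}))
partial_evidence = ProductFeature("brushless motor")
```

With these inputs, `full_evidence` is typed as a literal match, whereas `partial_evidence` is only uncertain evidence because the "waterproof" modifier is unsupported, which is the technically-similar-but-legally-insufficient case described above.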
Knowledge graphs also support explanation. Instead of producing a single similarity score, the system records how a claim limitation was extracted, which product feature was linked to it, which ontology relation justified the link, which rule evaluated the link, and how the final claim-level assessment was produced. Ontology-centered XAI research treats explicit knowledge as a basis for transparent explanation [23]. Legal XAI work emphasizes evidential presentation and reviewability in legal settings [24]. Broader XAI surveys and social-science accounts of explanation further show that useful explanations should be selective, contrastive, and understandable to human reviewers [27,59,60].
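One way to picture such a record is a flat provenance structure with one entry per claim limitation. The sketch below is hypothetical: the field names, the rule identifier `R-ALL-ELEMENTS`, and the example values are invented for illustration and do not describe the prototype's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class CorrespondenceTrace:
    limitation_id: str
    limitation_text: str
    extraction_method: str   # how the limitation was extracted
    product_feature: str     # which product feature was linked
    evidence_source: str     # where that feature was found
    ontology_relation: str   # which relation justified the link
    rule_id: str             # which rule evaluated the link
    outcome: str             # result of that rule


def explain(trace: CorrespondenceTrace) -> str:
    """Render one trace as a reviewer-readable sentence."""
    return (
        f"Limitation {trace.limitation_id} ('{trace.limitation_text}') was "
        f"extracted by {trace.extraction_method}, linked to product feature "
        f"'{trace.product_feature}' from {trace.evidence_source} via ontology "
        f"relation '{trace.ontology_relation}'; rule {trace.rule_id} "
        f"returned {trace.outcome}."
    )


def to_audit_json(traces: List[CorrespondenceTrace]) -> str:
    """Serialize traces so the reasoning path can be stored and reviewed."""
    return json.dumps([asdict(t) for t in traces], indent=2, sort_keys=True)


example_trace = CorrespondenceTrace(
    limitation_id="1.b",
    limitation_text="a heat sink having at least 3 fins",
    extraction_method="claim segmentation",
    product_feature="aluminium heat sink with 5 fins",
    evidence_source="technical datasheet",
    ontology_relation="same_concept",
    rule_id="R-ALL-ELEMENTS",
    outcome="LiteralMatch",
)
audit_json = to_audit_json([example_trace])
```

Because every field is explicit, a reviewer can contest a single step (for example, the ontology relation) without having to reinterpret an aggregate score.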

2.4. Legal AI, Claim Construction, and Explainable Decision Support

The broader legal AI literature confirms that legal decision support requires more than predictive accuracy. Legal information extraction must identify legally relevant entities, relations, and events in text, while legal reasoning must respect rule structures, evidentiary constraints, and doctrinal categories [25,61]. Legal NLP benchmarks and domain-adapted models such as LEGAL-BERT and LexGLUE show that legal language is sufficiently specialized to warrant domain-aware modeling [52,53]. These developments support the use of patent-specific and legal-specific NLP components in the proposed DIKWP network.
Recent patent-legal analytics also extends beyond retrieval and classification. Claim-scope-aware litigation risk prediction illustrates how patent drafts can be analyzed for dispute-related risk [62]. Generative AI has been explored for standard-essentiality assessment, and LLM-based patent litigation mining has been tested in domain-specific dispute analysis [63,64]. These studies reinforce the need to combine language-model flexibility with legally constrained validation.
For patent infringement, claim construction and element-level analysis are central. Markman v. Westview Instruments established the court-centered role of claim construction, while Phillips v. AWH Corp. clarified that claim meaning should be interpreted in the context of the claims, specification, and prosecution history. Literal infringement depends on the presence of each material claim limitation in the accused product or process. Doctrine-of-equivalents analysis may extend beyond literal identity, but it remains element-specific and constrained by prosecution history estoppel and related limiting principles. These legal constraints justify the framework’s emphasis on all-elements reasoning, limitation-level mapping, negative evidence, and expert-review flags.
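The all-elements logic can be expressed compactly. The sketch below is a deliberately simplified screening rule, not a statement of law and not the rule base of the prototype: the `Match` categories, the boolean `estoppel` flag, and the status labels are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Match(Enum):
    LITERAL = "literal"
    EQUIVALENT = "equivalent"
    NONE = "none"


@dataclass(frozen=True)
class LimitationFinding:
    limitation_id: str
    match: Match
    estoppel: bool = False  # hypothetical flag: scope narrowed during prosecution


def assess_claim(findings: List[LimitationFinding]) -> Dict[str, object]:
    """Aggregate limitation-level findings into a claim-level screening status."""
    if not findings:
        return {"status": "no_limitations", "unsupported": [], "review": []}
    unsupported: List[str] = []
    review: List[str] = []
    for f in findings:
        if f.match is Match.NONE:
            unsupported.append(f.limitation_id)
        elif f.match is Match.EQUIVALENT:
            if f.estoppel:
                # Equivalence is blocked for this limitation.
                unsupported.append(f.limitation_id)
            else:
                # Equivalence is never decided automatically here.
                review.append(f.limitation_id)
    if unsupported:
        status = "not_covered"            # one missing element defeats coverage
    elif review:
        status = "equivalence_candidate"  # route to expert review
    else:
        status = "literal_candidate"
    return {"status": status, "unsupported": unsupported, "review": review}


result = assess_claim(
    [
        LimitationFinding("1.a", Match.LITERAL),
        LimitationFinding("1.b", Match.EQUIVALENT),
    ]
)
```

The aggregation is intentionally conjunctive: a single unsupported limitation yields `not_covered` regardless of how strongly the remaining limitations match.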
Explainability is especially important because patent infringement analysis is a high-stakes legal task. A system output is of limited value unless it can show which claim limitations were matched, which product features supplied the evidence, which correspondences were semantic rather than literal, and which limitations remained unsupported. General XAI surveys classify explanation methods for black-box models and hybrid systems [27,59]. Work on interpretable machine learning warns that post hoc explanations may be insufficient in high-stakes settings [27,28]. Explanation-theory studies further support user-oriented, contrastive, and reviewable explanations [60,65].

2.5. Epistemological Basis of Symbolic–Statistical Hybridization

The proposed DIKWP network is not merely an engineering aggregation of heterogeneous AI techniques. It is grounded in a guarded symbolic–statistical epistemology in which different components make different types of knowledge claims and are assigned different degrees of legal authority. This distinction is essential for patent infringement analysis because the task involves both empirical semantic interpretation and normative legal sufficiency.
Statistical components, including patent-oriented NLP, named entity recognition, dependency parsing, SAO extraction, transformer-based embedding similarity, and candidate relation extraction, operate under an inductive epistemological assumption. They assume that recurring linguistic patterns, distributional similarity, and learned contextual representations can reveal candidate entities, relations, paraphrases, and semantic correspondences. Their outputs are therefore probabilistic or confidence-oriented. These components are appropriate for discovering possible claim elements, product features, synonym relations, and functionally related expressions. However, they do not by themselves establish legal sufficiency. A high embedding-similarity score may indicate semantic relatedness, but it cannot determine whether all material claim limitations are satisfied.
Symbolic components, including ontologies, knowledge graphs, legal rules, all-elements reasoning, prosecution-history constraints, and explanation templates, operate under a different epistemological assumption. They assume that legally and technically relevant concepts can be explicitly represented as typed entities, relations, constraints, and inference rules. Their role is not primarily to discover semantic candidates, but to validate, constrain, and explain them. In the proposed framework, symbolic structures determine whether a candidate correspondence is anchored to a material claim limitation, whether required relations and modifiers are preserved, whether negative evidence blocks a match, and whether the claim-level all-elements condition is satisfied.
This division of labor is consistent with Barbierato et al.’s argument that machine learning should be conceptually distinguished from broader artificial intelligence. They argue that ML has developed its own methodological identity, centered on data-driven performance optimization, while broader AI also includes symbolic reasoning, problem solving, and governance-oriented concerns [66]. In the present framework, this distinction prevents machine-learning-style components from being treated as autonomous legal reasoners. Instead, ML and NLP modules serve as candidate-generation and uncertainty-estimation mechanisms, whereas ontology-based and rule-based modules provide legal-semantic validation, evidentiary traceability, and doctrinal control.
This hybridization is appropriate for a high-stakes legal task because neither component family is sufficient alone. A purely statistical model may capture paraphrase, technical synonymy, and functional similarity, but it may also confuse general technical relatedness with legally sufficient claim coverage. A purely symbolic model may enforce legal rules transparently, but it may fail when real product descriptions use terminology that differs from claim language. The proposed DIKWP network combines the semantic flexibility of statistical processing with the auditability and constraint-sensitivity of symbolic reasoning.
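The two-stage division of labor can be sketched in a few lines. In this illustration a bag-of-words cosine score stands in for transformer-based embedding similarity, and a required-term check stands in for ontology- and rule-based validation. Both substitutions, the 0.5 threshold, and the function names are assumptions made only to keep the example self-contained.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Set, Tuple


def tokens(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (placeholder for embedding similarity)."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def propose(
    limitation: str, features: List[str], threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Statistical stage: rank product features and keep likely candidates."""
    scored = [(f, cosine(limitation, f)) for f in features]
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [(f, s) for f, s in ranked if s >= threshold]


def validate(feature: str, required_terms: Set[str]) -> bool:
    """Symbolic stage: every material term must be evidenced in the feature."""
    return required_terms <= set(tokens(feature))


def screen(
    limitation: str, required_terms: Set[str], features: List[str]
) -> List[Dict[str, object]]:
    return [
        {"feature": f, "score": round(s, 3), "validated": validate(f, required_terms)}
        for f, s in propose(limitation, features)
    ]


results = screen(
    "a heat sink with copper fins",
    {"copper"},
    [
        "a heat sink with aluminium fins",
        "copper fins on the heat sink",
        "a plastic housing",
    ],
)
```

In this run the aluminium variant receives the highest similarity score but fails validation because the material term "copper" is missing, while the lower-scoring copper variant passes. The example therefore reproduces, in miniature, the gap between semantic relatedness and claim coverage.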
Within the DIKWP interpretation, the Data and Information dimensions mainly host evidence acquisition and statistical extraction. The Knowledge dimension normalizes extracted candidates into ontologies and knowledge graphs. The Wisdom dimension performs legal sufficiency assessment, uncertainty handling, and claim-level aggregation. The Purpose dimension controls operating posture, including whether the system emphasizes enforcement-oriented recall, clearance-oriented precision, design-around analysis, or expert-review routing. Thus, the DIKWP network provides not only a modular engineering architecture but also an epistemological allocation of responsibility among statistical estimation, symbolic representation, legal reasoning, and purpose-sensitive decision support.
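Purpose-conditioned operating posture can be pictured as a small configuration that changes how the same evidence score is routed. The profile names, numeric thresholds, and routing labels below are hypothetical values chosen for illustration; they are not the settings used in the evaluation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PurposeProfile:
    name: str
    flag_threshold: float  # score at or above which a correspondence is flagged
    review_margin: float   # band below the threshold routed to expert review


# Hypothetical profiles: enforcement favors recall, clearance favors precision.
PROFILES: Dict[str, PurposeProfile] = {
    "enforcement": PurposeProfile("enforcement", flag_threshold=0.55, review_margin=0.15),
    "clearance": PurposeProfile("clearance", flag_threshold=0.80, review_margin=0.10),
}


def route(score: float, purpose: str) -> str:
    """Map one correspondence score to an action under the given purpose."""
    profile = PROFILES[purpose]
    if score >= profile.flag_threshold:
        return "flag"
    if score >= profile.flag_threshold - profile.review_margin:
        return "expert_review"
    return "dismiss"


same_score_routes = {name: route(0.6, name) for name in PROFILES}
```

The same score of 0.6 is flagged under the enforcement profile and dismissed under the clearance profile, which shows how Purpose acts as a control dimension over the other stages rather than as a label attached to their output.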
Table 1 summarizes the epistemological roles of the main symbolic, statistical, and control components in the proposed DIKWP network.

2.6. Research Gap and Positioning of This Study

Existing studies have established the value of patent-specific NLP, SAO extraction, ontology modeling, knowledge graphs, patent recommendation, semantic retrieval, and explainable legal AI. However, most prior work addresses only one part of the infringement-detection problem. SAO-based infringement analysis and knowledge-graph-based infringement prediction provide important foundations but do not by themselves constitute a full purpose-aware decision architecture [7,19]. Patent recommendation and innovation-design studies demonstrate the value of graph representations, but their primary tasks differ from claim-to-product infringement assessment [16,17]. Legal information extraction and ontology-based legal research address legal semantics, yet they do not fully integrate patent-specific claim-element logic with DIKWP-style purpose control [25,26].
A further limitation is that many architectures remain implicitly linear, even when they use advanced semantic models. Such designs are effective for retrieval or classification but less suitable for infringement analysis, where evidence selection, semantic interpretation, equivalence screening, and reporting thresholds may need to change according to the user’s objective. Purpose-driven DIKWP research provides the conceptual basis for treating purpose as an explicit control dimension [34,35]. DIKWP work on uncertainty and semantic judicial reasoning supports recurrent interaction among semantic spaces [37,38]. DIKWP semantic communication studies further motivate human-machine feedback as part of the reasoning architecture [39,40].
The present study addresses this gap by proposing a DIKWP-network-based semantic AI framework for patent infringement detection. The framework integrates patent-oriented NLP, ontology and knowledge-graph construction, rule-guided inference, uncertainty handling, purpose-aware control, and explanation generation within a single architecture. Patent infringement detection is therefore treated not as a standalone similarity task, but as a coordinated process of semantic understanding, structured knowledge integration, legal sufficiency assessment, and purpose-governed reasoning.

3. System Architecture Overview

Figure 1 presents the overall architecture of the proposed semantic AI system for patent infringement detection. Although the architecture is organized analytically around the five DIKWP dimensions—Data, Information, Knowledge, Wisdom, and Purpose—it is not implemented as a rigid bottom-up stack. Instead, it is designed as a networked semantic architecture in which these dimensions operate as interacting functional spaces. Bottom-up processing transforms patent and product documents into structured evidence, while top-down regulation from the purpose space dynamically adjusts matching strategies, reasoning strictness, and explanation requirements. In this sense, DIKWP serves not merely as a descriptive taxonomy, but as the organizing principle of a purpose-aware and explainable reasoning system [1,19,23,38].
For expository clarity, Figure 1 arranges the DIKWP dimensions vertically. Operationally, however, the system behaves as a coordinated network in which modules exchange information recurrently rather than only sequentially. Data acquisition supports information extraction; extracted information is normalized and linked within the knowledge space; knowledge structures guide reasoning in the wisdom space; and the purpose space continuously conditions the behavior of the lower dimensions by regulating thresholds, retrieval priorities, and inference scope. This recurrent design is particularly important for patent infringement analysis, where claim interpretation, technical matching, and legal judgment often require iterative refinement rather than one-pass processing [38,40,67].
At the data dimension, the system acquires the patent materials to be protected or examined and the description of the potentially infringing product or process. These materials may include patent claims, specification excerpts, product manuals, technical brochures, web descriptions, and, where available, patent drawings. In the current prototype, the inputs are provided as structured text files, but the architecture is extensible to web crawlers, enterprise databases, and document management systems. Before further processing, the raw materials are normalized through standard preprocessing operations such as character-encoding unification, removal of irrelevant boilerplate, and segmentation of legally relevant sections. For infringement analysis, the claims and the technically informative parts of the specification are prioritized over bibliographic front matter, because the central task is claim-to-product comparison rather than general patent retrieval.
The information dimension transforms raw text into structured semantic units suitable for downstream matching and reasoning. Patent claims are first segmented into individual claim elements, since infringement analysis ultimately depends on whether each legally material element can be found, directly or equivalently, in the accused product. Product descriptions are processed in parallel and decomposed into feature statements or predicate–argument structures. To support this transformation, the architecture employs a patent-adapted NLP pipeline consisting of tokenization, part-of-speech tagging, dependency parsing, named entity recognition, and subject–action–object (SAO) extraction. The use of claim parsing and SAO-oriented semantic representation is motivated by earlier work showing that infringement-related comparison benefits from functional and structural representations that go beyond lexical overlap [14,15,23].
The information dimension also performs term indexing and preliminary candidate matching. Each extracted claim element is associated with its corresponding technical entities, actions, and modifiers, while product features are indexed in an analogous manner. This step enables efficient retrieval of candidate correspondences before more expensive ontology-based reasoning is invoked. For example, a claim element such as “a hinge connecting the door to the frame” may be transformed into one or more structured units that preserve both the action and the relational context. Such representations are important because patent language often embeds technically critical qualifiers—such as location, direction, or function—that would be lost in purely keyword-based comparison [23,36].
The knowledge dimension is the core semantic integration space of the architecture. Here, the extracted information is mapped onto formal legal and technical representations, including domain ontologies, lexical resources, and a patent-oriented knowledge graph. The technical ontology encodes relevant entities and relations in the technological domain of the patent, such as components, functions, materials, part–whole structures, and operational relations. The legal ontology captures concepts such as claim, claim element, accused product, correspondence, infringement evidence, and equivalence. This dual-ontology design allows the system to align textual expressions with both technical semantics and legal interpretation [16,45,58].
Because technical vocabulary evolves rapidly, the ontology layer is designed as a maintainable and extensible knowledge resource rather than as a fixed dictionary. New candidate concepts and relations can be harvested from patent claims, specifications, product manuals, CPC/IPC classifications, technical standards, and expert claim charts. Candidate terms are first proposed through named entity recognition, relation extraction, embedding-based clustering, and knowledge-graph completion. They are then validated through ontology consistency checks, provenance tracking, and expert review before being incorporated into the production ontology.
Once the ontology mapping is completed, the system builds a knowledge graph in which nodes represent claim elements, product features, technical entities, and inferred semantic correspondences, while edges represent structural, functional, and legal relations. This graph-based representation allows the architecture to preserve relational information that flat textual similarity measures often fail to capture. For instance, if a patent claim refers to a “hinge” and a product description refers to a “pivot,” the system may identify a potential correspondence not through lexical identity but through ontology-level proximity, lexical expansion, or functional equivalence. Knowledge-graph-based approaches have recently shown clear promise for patent representation and infringement-related analysis, especially when combined with semantic similarity and multimodal evidence [17].
Reasoning rules are also maintained in the knowledge dimension. These include general correspondence rules, taxonomy-aware matching rules, and legal rules that operationalize core infringement standards. The most important of these is the all-elements rule, according to which literal infringement requires that every material claim element be present in the accused product. Additional rules can be introduced for function-based equivalence analysis, allowing the system to flag cases in which a product feature is not lexically identical to a claim element but may perform substantially the same function in a substantially similar way. In this manner, the knowledge dimension does not merely store information; it creates the structured semantic conditions under which infringement reasoning becomes possible.
The wisdom dimension performs high-level evidential reasoning and converts graph-level correspondences into a legal-technical assessment. Its role is not limited to executing deterministic rules; it also evaluates ambiguity, resolves conflicts, and aggregates heterogeneous evidence into a usable conclusion. In practice, not all claim elements are matched with equal certainty. Some correspondences may be exact, some ontology-mediated, and some only functionally analogous. The wisdom dimension therefore includes a confidence aggregation mechanism that assigns differentiated weights to different forms of match evidence and produces a final infringement-oriented assessment based on the configured decision policy.
This dimension also generates explanations. Because patent infringement analysis is a high-stakes legal task, a decision is of limited value unless the system can show how it was reached. The explanation generator therefore traces the reasoning path from extracted claim elements and product features, through ontology mapping and graph correspondence, to the final decision outcome. A typical output may state which claim elements were matched exactly, which were matched through semantic expansion or functional equivalence, and which remained unsupported or ambiguous. Such traceability is consistent with broader developments in explainable legal AI, where structured evidence presentation and reviewable reasoning are regarded as central requirements rather than optional features [25,26].
An optional case-based reasoning component can also be positioned in the wisdom dimension. This component does not replace rule-guided reasoning, but supplements it by retrieving and comparing similar prior analytical patterns or historical dispute configurations. If the current case resembles a previously observed non-infringement or high-risk pattern, this information may be used as supplementary decision support. The component is treated as an optional extension rather than as a mandatory module, since its usefulness depends on the availability and quality of prior case data.
The purpose dimension regulates the architecture at the highest level of abstraction. In the proposed DIKWP network, purpose is not a passive label added after reasoning has finished. Instead, it functions as an active orchestration space that configures the system according to the user’s legal or strategic objective. Different purposes imply different operational preferences. An enforcement-oriented task may prioritize recall and sensitivity, thereby encouraging broader semantic matching and stronger attention to potential equivalence. A clearance or design-around task may prioritize precision and conservative risk control, thereby requiring stricter thresholds and more cautious treatment of uncertain matches. Recent DIKWP work in legal and semantic reasoning similarly emphasizes that purpose should be modeled as a controlling factor that shapes semantic processing rather than as an external annotation [38,63].
The purpose dimension also supports user interaction and feedback. Users may specify analytical preferences such as higher recall, stronger explainability, or emphasis on exact claim coverage. They may also initiate follow-up queries, for example by asking which elements were not matched, which correspondences are uncertain, or which design changes might reduce infringement risk. In such cases, the purpose orchestrator reconfigures the information, knowledge, and wisdom dimensions and initiates another reasoning cycle. This top-down influence is one of the main reasons why the architecture should be described as a DIKWP network rather than as a fixed layer-by-layer pipeline.
Communication across the architecture is therefore both bottom-up and top-down. Bottom-up processing remains the primary evidential path, beginning with raw documents and culminating in an infringement-oriented judgment. Top-down control, however, is equally important: purpose can modify reasoning strategy, and wisdom can request additional semantic evidence or refined ontology alignment when the available support is insufficient. This iterative interaction resembles the working process of a human analyst, who may return to the claims, re-examine terminology, or seek additional contextual evidence when initial comparison results remain inconclusive.
A major design principle of the proposed DIKWP network is modularity. Each functional component can be refined or replaced without redesigning the entire system. For example, a rule-based SAO extractor may later be replaced with a patent-specific relation extraction model, and the ontology-based matcher may be complemented by graph embeddings or other learned semantic similarity mechanisms. This modularity is particularly valuable in patent AI because both the language technology and the legal knowledge layer are evolving rapidly. Hybrid symbolic–subsymbolic enhancement is therefore possible within the same architecture, provided that explainability and traceability are preserved [3,19,55].
A second key design principle is explainability. The architecture is designed to preserve a trace from raw evidence to final assessment, with each stage of the trace held in the corresponding DIKWP space. The data space records the original textual materials; the information space records extracted claim elements and product features; the knowledge space records ontological mappings, graph structures, and inferred correspondences; the wisdom space records confidence aggregation and decision rationale; and the purpose space records task settings and strategic constraints. This explicit traceability is essential for legal deployment because it supports expert review, error analysis, and procedural accountability [23,24].
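The per-space trace described above can be pictured as a single record with one slot per DIKWP space. The following is only a minimal sketch: the class name `ReasoningTrace`, its fields, and the helper `spaces_recorded` are hypothetical and are not taken from the prototype.

```python
from dataclasses import dataclass, field

@dataclass
class ReasoningTrace:
    """Hypothetical audit record for one analysed claim, keyed by DIKWP space."""
    data: dict = field(default_factory=dict)         # original claim and product text
    information: dict = field(default_factory=dict)  # extracted claim elements and product features
    knowledge: dict = field(default_factory=dict)    # ontology mappings and inferred correspondences
    wisdom: dict = field(default_factory=dict)       # confidence aggregation and decision rationale
    purpose: dict = field(default_factory=dict)      # task settings and strategic constraints

    def spaces_recorded(self) -> list[str]:
        """List the spaces that already hold evidence, in DIKWP order."""
        order = ["data", "information", "knowledge", "wisdom", "purpose"]
        return [name for name in order if getattr(self, name)]

trace = ReasoningTrace()
trace.data["claim_text"] = "A chair comprising a seat and a backrest."
trace.purpose["mode"] = "design-around"
```

Keeping the five slots separate means a reviewer can inspect, for example, the purpose settings without re-running extraction.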
Overall, the proposed DIKWP network treats patent infringement detection as a networked process of semantic transformation, structured knowledge integration, evidential reasoning, and purpose-aware orchestration. This formulation better reflects the actual requirements of patent analysis than a purely sequential or text-similarity-based pipeline. The next section presents the semantic processing workflow and illustrates how patent claims and product descriptions are transformed through the DIKWP network into an infringement-oriented analytical result.

4. Semantic Processing Workflow and Module Functionality in the DIKWP Network

Building on the system architecture introduced in Section 3, this section describes how the proposed semantic AI system transforms patent claims and accused-product descriptions into an infringement-oriented assessment. For clarity of presentation, the workflow is described in a sequence of analytical stages. Operationally, however, the system is not implemented as a strictly linear pipeline. Instead, it functions as a networked DIKWP process in which the data, information, knowledge, wisdom, and purpose dimensions interact recurrently. Bottom-up processing converts raw text into structured evidence, while top-down control from the purpose dimension adjusts matching strategies, inference scope, uncertainty handling, and explanation requirements. This networked formulation is consistent with the DIKWP view that intelligent processing arises from interaction among semantic spaces rather than from a one-way hierarchical chain [36,38,40].
The inputs to the workflow include one or more patent claims, a textual description of the potentially infringing product or process, and, where available, additional supporting documents such as manuals, drawings, or prior-art references. The output is a structured analytical report containing element-level correspondences, a claim-level infringement assessment, a confidence estimate, and an explanation trace. Figure 2 summarizes this workflow. Although the stages are introduced below in a sequential manner, the system permits backward transitions. For example, if the wisdom dimension identifies an unresolved element mismatch, it may request additional ontology expansion from the knowledge dimension or more fine-grained feature extraction from the information dimension. Likewise, the purpose dimension may alter thresholds or matching rules in accordance with the user’s objective, such as enforcement, risk screening, or design-around analysis.
To illustrate the workflow, consider the following simplified example. Claim 1 of a hypothetical patent states: “A chair comprising a seat, a plurality of legs, and a backrest attached to the seat.” The accused-product description states: “Our product is a portable stool with a flat round sitting surface supported by three collapsible legs. It does not have any back support.” A human expert would immediately suspect that the product does not literally infringe the claim because the claim requires a backrest, whereas the product explicitly lacks any back support. The example is used below to show how the proposed DIKWP network reaches the same conclusion in a transparent and legally interpretable manner.

4.1. Data Acquisition and Normalization

The workflow begins in the data dimension, where raw patent and product materials are acquired and normalized. In the current prototype, these materials are provided as text files or structured text strings, although the architecture is extensible to patent databases, document management systems, and web-based product sources. At this stage, the system performs basic preprocessing operations such as character-encoding normalization, section segmentation, and removal of irrelevant boilerplate. For patent documents, the claims and technically informative parts of the specification are prioritized because infringement analysis depends primarily on claim construction and claim-to-product comparison rather than on bibliographic metadata or general background text.
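As a rough illustration of the preprocessing operations listed above, the sketch below unifies the Unicode form of the input and drops boilerplate lines. The function name `normalize_document` and the two boilerplate patterns are assumptions made for this example; the prototype's actual rules are not specified in the text.

```python
import re
import unicodedata

# Hypothetical boilerplate patterns; a real deployment would curate these per source.
BOILERPLATE = [
    re.compile(r"^page \d+ of \d+$", re.IGNORECASE),
    re.compile(r"^all rights reserved\.?$", re.IGNORECASE),
]

def normalize_document(raw: str) -> list[str]:
    """Return cleaned, non-empty lines with a unified Unicode form."""
    text = unicodedata.normalize("NFKC", raw)  # fold compatibility characters
    cleaned = []
    for line in text.splitlines():
        line = re.sub(r"\s+", " ", line).strip()  # collapse runs of whitespace
        if not line or any(p.match(line) for p in BOILERPLATE):
            continue  # skip blank lines and boilerplate
        cleaned.append(line)
    return cleaned
```

The raw input should still be archived unchanged alongside the cleaned lines, so that the audit trail described in the next paragraph refers to the exact original text.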
In the running example, the data dimension stores the patent claim and the product description as raw textual inputs. No legal conclusion is drawn at this point. The purpose of the data dimension is to preserve the original evidence and make it available for subsequent semantic transformation. This separation is important for traceability because it enables the system to maintain an auditable record of the exact textual material from which later correspondences and inferences are derived.

4.2. Information Extraction and Claim Structuring

The information dimension converts raw text into structured semantic units suitable for downstream matching and reasoning. Patent claims are first segmented into legally material claim elements, while product descriptions are decomposed into feature statements or predicate–argument structures. This stage relies on a patent-adapted NLP workflow comprising tokenization, part-of-speech tagging, dependency parsing, named entity recognition, and subject–action–object extraction. The use of claim parsing and SAO-oriented representation is motivated by prior research showing that infringement-related comparison benefits from structural and functional representations rather than lexical overlap alone [15,23,64].
For the example claim, the system identifies three core elements: a seat, a plurality of legs, and a backrest attached to the seat. These elements can be represented by a set of structured relations such as (chair, comprises, seat), (chair, comprises, legs), (chair, comprises, backrest), and (backrest, attached_to, seat). The objective here is not simply to extract isolated terms, but to preserve the structural constraints embedded in the claim. The phrase “backrest attached to the seat,” for instance, is not treated as a flat keyword set; rather, it is decomposed into an object and an internal relation, because the attachment relation may later become relevant to element-level comparison.
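The decomposition of the example claim can be sketched with a small pattern-based splitter. This is a toy stand-in for the patent-adapted NLP workflow, not the prototype's parser: the helper names `head` and `claim_to_triples` are hypothetical, and the code assumes a claim of the form "X comprising A, B, and C" with at most one "attached to" relation per element.

```python
import re

ARTICLE = re.compile(r"^(?:a|an|the)\s+", re.IGNORECASE)
QUANTIFIER = re.compile(r"^plurality of\s+", re.IGNORECASE)

def head(phrase: str) -> str:
    """Strip a leading article and 'plurality of' to expose the head noun."""
    phrase = ARTICLE.sub("", phrase.strip())
    return QUANTIFIER.sub("", phrase).lower()

def claim_to_triples(claim: str) -> list[tuple[str, str, str]]:
    """Split a simple 'comprising' claim into (subject, relation, object) triples."""
    preamble, body = claim.rstrip(". ").split(" comprising ", 1)
    subject = head(preamble)
    triples = []
    for element in re.split(r",\s*(?:and\s+)?", body):
        if " attached to " in element:
            # Keep the internal relation instead of flattening it into keywords.
            part, target = element.split(" attached to ", 1)
            triples.append((subject, "comprises", head(part)))
            triples.append((head(part), "attached_to", head(target)))
        else:
            triples.append((subject, "comprises", head(element)))
    return triples

claim = "A chair comprising a seat, a plurality of legs, and a backrest attached to the seat."
triples = claim_to_triples(claim)
```

For the example claim this yields the four relations listed above, with the attachment relation kept as a separate triple.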
The product description is processed in parallel. From the sentence “Our product is a portable stool with a flat round sitting surface supported by three collapsible legs,” the system extracts relations such as (stool, has, sitting_surface) and (sitting_surface, supported_by, legs). From the sentence “It does not have any back support,” the system extracts a negated relation, such as (stool, not_have, back_support). Negation handling is especially important. In patent infringement detection, the absence of a claim element can be legally decisive, and the system must preserve negative evidence rather than treat it as mere omission or noise.
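To make the point about negative evidence concrete, the following sketch extracts an explicit negated relation from a product sentence. The cue list and the function name `extract_negated` are assumptions for illustration; a production system would rely on dependency-based negation scope rather than a regular expression.

```python
import re
from typing import Optional

# Hypothetical negation cues followed by an optional determiner and the denied object.
NEGATION = re.compile(
    r"\b(?:does not have|has no|lacks|without)\s+(?:any\s+|an\s+|a\s+)?(?P<obj>[a-z][a-z ]*)",
    re.IGNORECASE,
)

def extract_negated(sentence: str, subject: str) -> Optional[tuple[str, str, str]]:
    """Return (subject, 'not_have', object) if the sentence denies a feature."""
    match = NEGATION.search(sentence)
    if match is None:
        return None  # no explicit denial: this is non-mention, not absence
    obj = match.group("obj").strip().lower().replace(" ", "_")
    return (subject, "not_have", obj)

negated = extract_negated("It does not have any back support.", "stool")
```

Returning `None` for sentences without a negation cue keeps explicit denial separate from mere silence, which matters for the evidence states discussed in Section 4.6.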
The information dimension also performs term indexing and preliminary candidate matching. Each claim element is linked to its associated entities, modifiers, and quantities, while product features are indexed in an analogous manner. In the present example, the phrase “plurality of legs” is normalized into a quantity constraint, while “three collapsible legs” is represented as a feature bundle that includes both count and attribute information. This indexing stage supports efficient retrieval of candidate correspondences before ontology-based reasoning is invoked. It also supports later explanation generation because each extracted feature remains linked to its originating sentence.
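The quantity normalization mentioned above can be illustrated as follows. The sketch assumes that "a plurality of" is read as "at least two" and a singular article as "at least one"; the number-word table and function names are hypothetical.

```python
from typing import Optional

NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}

def claim_min_count(phrase: str) -> int:
    """Read 'plurality' as at least two; otherwise require at least one."""
    return 2 if "plurality" in phrase.lower() else 1

def product_count(phrase: str) -> Optional[int]:
    """Return the first explicit count in the phrase, or None if none is stated."""
    for token in phrase.lower().split():
        if token.isdigit():
            return int(token)
        if token in NUMBER_WORDS:
            return NUMBER_WORDS[token]
    return None

def quantity_compatible(claim_phrase: str, product_phrase: str) -> Optional[bool]:
    """True/False when a count is disclosed; None when the product text is silent."""
    count = product_count(product_phrase)
    if count is None:
        return None
    return count >= claim_min_count(claim_phrase)
```

In the running example, `quantity_compatible("a plurality of legs", "three collapsible legs")` evaluates to `True`, while an undisclosed count is reported as `None` rather than as a mismatch.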

4.3. Ontology Mapping and Knowledge Graph Construction

The knowledge dimension is the primary semantic integration space of the workflow. At this stage, the extracted information is mapped onto formal semantic structures, including a technical ontology, a legal ontology, lexical resources, and a patent-oriented knowledge graph. The technical ontology represents concepts such as chair, stool, seat, leg, and backrest, together with their relations and constraints. The legal ontology captures concepts such as claim, claim element, accused product, correspondence, infringement evidence, and equivalence. The purpose of this dual-ontology design is to align linguistic expressions with both technical semantics and legal interpretation [17,45,58].
In the running example, the system maps “sitting surface” to the concept of seat and “back support” to the concept of backrest through ontology-based normalization and lexical expansion. The term “stool” is mapped to the concept stool, which may be modeled as a seating device but not necessarily as an object containing a backrest. The patent claim is then represented as a graph in which the claimed chair has part relations to seat, legs, and backrest, together with the relation attached_to(backrest, seat). The product is represented as a second graph in which the stool has a seat-like surface and legs, but no backrest node is instantiated. The negative statement in the product description is preserved either as an explicit negative relation or as a graph-level absence constraint.
Once the two representations are constructed, the system performs semantic alignment between claim elements and product features. This alignment uses lexical similarity, ontology subsumption, synonym expansion, and graph-level relational consistency. In the example, the correspondence between “sitting surface” and “seat” is straightforward, and the relation between “three collapsible legs” and “a plurality of legs” is supported by both lexical identity and numerical compatibility. The critical issue concerns the backrest element. Because the product text explicitly states that no back support is present, the system registers both the absence of a corresponding feature and explicit negative evidence against a backrest match.
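The alignment step for the running example can be sketched with a small dictionary standing in for the technical ontology. The mapping `CONCEPT_OF`, the function `align`, and the simplification that any non-negated relation object counts as a present feature are all assumptions of this example; the status labels follow the evidence states introduced in Section 4.6.

```python
# Hypothetical surface-form-to-concept mapping standing in for the technical ontology.
CONCEPT_OF = {
    "seat": "seat", "sitting_surface": "seat",
    "leg": "leg", "legs": "leg",
    "backrest": "backrest", "back_support": "backrest",
}

def align(claim_elements: list[str], product_triples: list[tuple[str, str, str]]) -> dict[str, str]:
    """Label each claim element as supported, contradicted, or non-mentioned."""
    present, denied = set(), set()
    for _, relation, obj in product_triples:
        concept = CONCEPT_OF.get(obj)
        if concept is None:
            continue  # object not covered by the toy ontology
        (denied if relation == "not_have" else present).add(concept)
    status = {}
    for element in claim_elements:
        concept = CONCEPT_OF.get(element, element)
        if concept in denied:
            status[element] = "contradicted"   # explicit negative evidence
        elif concept in present:
            status[element] = "supported"
        else:
            status[element] = "non-mentioned"  # silence is not absence
    return status

product = [
    ("stool", "has", "sitting_surface"),
    ("sitting_surface", "supported_by", "legs"),
    ("stool", "not_have", "back_support"),
]
status = align(["seat", "legs", "backrest"], product)
```

Here the seat and legs elements are supported through concept normalization, and the backrest element is marked as contradicted rather than merely missing.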
The knowledge dimension also hosts the rule base used for infringement analysis. The central rule is the all-elements rule: literal infringement requires that every material claim element be found in the accused product. Additional rules may encode taxonomy-aware matching, quantity conditions, and function-based equivalence. For example, a product feature may satisfy a claim element either through direct identity or through ontology-mediated correspondence if it is a subtype or accepted synonym. Equivalence rules can also be invoked when a feature differs lexically but performs substantially the same function in a substantially similar way. In the present example, however, no substitute exists for the missing backrest; accordingly, both literal matching and equivalence-based matching fail for that element.
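The all-elements rule itself reduces to a conjunction over element-level statuses. The sketch below assumes the status labels used in this paper's evidence-state scheme; the function names are hypothetical, and equivalence screening is deliberately left out.

```python
def literal_infringement(element_status: dict[str, str]) -> bool:
    """All-elements rule: every material claim element must be supported."""
    return bool(element_status) and all(
        state == "supported" for state in element_status.values()
    )

def failing_elements(element_status: dict[str, str]) -> list[str]:
    """Elements that defeat literal infringement, in claim order."""
    return [e for e, state in element_status.items() if state != "supported"]

chair_claim = {"seat": "supported", "legs": "supported", "backrest": "contradicted"}
```

For the chair claim, `literal_infringement(chair_claim)` is `False` and `failing_elements(chair_claim)` identifies the backrest as the element that defeats the match.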

4.4. Reasoning, Decision Formation, and Explanation

The wisdom dimension transforms graph-level correspondences into an infringement-oriented assessment. Its role is not limited to executing deterministic rules. It also aggregates evidence, evaluates ambiguity, and produces a decision rationale that can be inspected by legal and technical experts. This is important because patent infringement analysis often yields mixed evidence: some claim elements may match exactly, some only approximately, and some may remain unsupported. In the proposed system, each element-level correspondence is therefore associated with a confidence value derived from the type and strength of the match.
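One simple way to derive a claim-level score from typed element matches is shown below. The numeric weights and the use of the minimum as the aggregation operator are assumptions chosen for illustration only; the text above does not fix a particular weighting scheme or decision policy.

```python
# Hypothetical confidence per match type; the values are illustrative only.
MATCH_CONFIDENCE = {"exact": 1.0, "ontology": 0.8, "functional": 0.5, "none": 0.0}

def claim_confidence(match_types: dict[str, str]) -> float:
    """Conjunctive aggregation: coverage is only as strong as the weakest element."""
    return min(MATCH_CONFIDENCE[kind] for kind in match_types.values())

chair_matches = {"seat": "ontology", "legs": "exact", "backrest": "none"}
```

Taking the minimum mirrors the conjunctive character of the all-elements rule: a single unmatched element drives the claim-level score to zero, whatever the strength of the other matches.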
In the running example, the seat element is matched through semantic normalization between “seat” and “sitting surface.” The legs element is matched through direct lexical and quantitative correspondence because the product contains three legs and therefore satisfies the claim requirement of a plurality of legs. The backrest element is not matched, and the product description provides explicit negative support for its absence. Under the all-elements rule, the failure of a single required element is sufficient to defeat literal infringement. Because the product also lacks any evident structure serving the same function as a backrest, the doctrine of equivalents is not triggered. The resulting conclusion is therefore non-infringement.
The explanation generator then produces a traceable report. A representative output may be formulated as follows: “No infringement is detected for Claim 1 because the accused product does not contain the backrest element required by the claim. The product includes a seat-like sitting surface and a plurality of legs, which correspond to two claim elements. However, the description explicitly states that the product has no back support. Because at least one material claim element is absent, literal infringement is not established.” This type of explanation links the decision directly to extracted evidence and legal logic, which is consistent with the requirements of explainable legal AI [25,26].
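A template-based generator is one straightforward way to produce such a report from element-level statuses. The sketch below is an assumption about how this could be done, not the prototype's generator; the function name `explain` and the wording of the templates are hypothetical.

```python
def explain(claim_id: int, element_status: dict[str, str]) -> str:
    """Render a short, evidence-linked explanation from element statuses."""
    supported = [e for e, s in element_status.items() if s == "supported"]
    contradicted = [e for e, s in element_status.items() if s == "contradicted"]
    unresolved = [e for e, s in element_status.items()
                  if s not in ("supported", "contradicted")]
    if contradicted:
        return (f"No literal infringement is established for Claim {claim_id}: "
                f"the product description explicitly excludes {', '.join(contradicted)}. "
                f"Supported elements: {', '.join(supported) or 'none'}.")
    if unresolved:
        return (f"Insufficient evidence for Claim {claim_id}: "
                f"unresolved elements: {', '.join(unresolved)}.")
    return f"All elements of Claim {claim_id} are supported: {', '.join(supported)}."

report = explain(1, {"seat": "supported", "legs": "supported", "backrest": "contradicted"})
```

Because the report is built directly from the element statuses, every sentence in it can be traced back to a specific correspondence or piece of negative evidence.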
The wisdom dimension may also invoke auxiliary reasoning components when the evidence is less clear. An optional case-based reasoning module can retrieve similar prior analytical patterns or previously observed dispute configurations to support the current assessment. This component is not essential to the basic workflow, but it can strengthen decision support in borderline cases. More importantly, the wisdom dimension may request additional processing from the lower dimensions when the current evidence is insufficient. For example, if the match status of a claim element remains ambiguous, the system may request ontology refinement, synonym expansion, or extraction of additional relations from the product description. This feedback mechanism further illustrates why the proposed DIKWP formulation should be described as a network rather than a unidirectional processing chain.

4.5. Purpose-Guided Control and User Feedback

The purpose dimension regulates the workflow according to the user’s analytical objective. In the proposed DIKWP network, purpose is not introduced only after the reasoning process has been completed; rather, it actively configures the behavior of the lower dimensions throughout the analysis. An enforcement-oriented task may prioritize recall and broader semantic matching, thereby lowering the threshold for flagging possible equivalence. A clearance or design-around task may prioritize precision and conservative risk assessment, thereby requiring stricter correspondence criteria. This role of purpose as an active control space is consistent with the DIKWP network perspective advanced in recent work on purpose-sensitive semantic systems [38,40,63].
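Purpose-conditioned control can be pictured as a lookup from operating mode to matching parameters. In the sketch below, the profile fields and all threshold values are hypothetical; only the direction of the difference (a more permissive enforcement mode and a stricter clearance mode) follows the description above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PurposeProfile:
    match_threshold: float        # minimum confidence to accept a correspondence
    equivalence_threshold: float  # minimum confidence to flag possible equivalence

# Illustrative values only; real thresholds would be calibrated on validation data.
PROFILES = {
    "enforcement": PurposeProfile(match_threshold=0.5, equivalence_threshold=0.3),
    "clearance": PurposeProfile(match_threshold=0.8, equivalence_threshold=0.6),
}

def accept_match(confidence: float, purpose: str) -> bool:
    """Apply the purpose-specific threshold to an element-level confidence."""
    return confidence >= PROFILES[purpose].match_threshold
```

With these illustrative settings, a correspondence with confidence 0.6 would be accepted in enforcement mode but held back in clearance mode.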
In the present example, the legal conclusion remains non-infringement regardless of operating mode because the absence of a backrest is explicit and decisive. However, the form of the output may vary according to purpose. In an enforcement scenario, the system may report that no actionable claim coverage is presently supported by the available evidence. In a design-around scenario, the system may emphasize that the absence of a backrest is the principal reason why infringement is not established and may further indicate that adding a back-support structure would materially increase infringement risk. Thus, purpose affects not only thresholds and inference scope, but also the explanatory framing of the final report.
The purpose dimension also supports interactive follow-up queries. Users may ask which claim elements were unmatched, which correspondences relied on equivalence rather than direct identity, or which textual passages served as the decisive evidence. Because the system preserves a trace from raw text to decision, these questions can be answered transparently. If the available product description is incomplete, the purpose dimension may instruct the system to return an “insufficient evidence” status rather than a definitive non-infringement conclusion. This distinction is practically important because unmentioned features should not automatically be treated as absent features.

4.6. Robustness to Ambiguous Patent Language, Multi-Claim Processing, and Practical Considerations

Several robustness issues must be addressed in real-world applications. First, product descriptions are often incomplete, promotional, or strategically vague. In such circumstances, failure to detect a claim element may reflect missing disclosure rather than actual absence. The proposed workflow therefore distinguishes between explicit negative evidence and mere non-mention. Only the former should strongly support non-infringement, whereas the latter may trigger uncertainty handling or a request for additional materials.
Building on this distinction, the system assigns one of five evidence states to each claim element: supported, contradicted, uncertain, non-mentioned, or equivalence-candidate. Supported evidence indicates that a product feature corresponds to a claim limitation, either directly or through ontology-mediated matching, while preserving all material constraints. Contradicted evidence is assigned when the product description explicitly denies the presence of the claimed feature or conflicts with a material relation, modifier, quantity, range, or sequence constraint. Uncertain evidence applies when the available description is vague, incomplete, or semantically under-specified. Non-mentioned features are not automatically treated as absent; rather, they trigger uncertainty handling, additional evidence retrieval, or human review. Equivalence-candidate evidence applies when a non-identical product feature may perform a comparable function in a comparable way with a comparable result, but is not treated as literal support. Table 2 summarizes the five evidence states and their corresponding system actions.
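The five evidence states lend themselves to an enumeration with a small classification routine. The sketch below is illustrative: the boolean inputs and, in particular, the precedence order among them are assumptions, since the text does not specify how competing signals are resolved.

```python
from enum import Enum

class EvidenceState(Enum):
    SUPPORTED = "supported"
    CONTRADICTED = "contradicted"
    UNCERTAIN = "uncertain"
    NON_MENTIONED = "non-mentioned"
    EQUIVALENCE_CANDIDATE = "equivalence-candidate"

def classify(match_found: bool, explicitly_denied: bool,
             functional_analogue: bool, description_vague: bool) -> EvidenceState:
    """Assign one state per claim element (precedence order is an assumption)."""
    if explicitly_denied:
        return EvidenceState.CONTRADICTED
    if match_found:
        return EvidenceState.SUPPORTED
    if functional_analogue:
        return EvidenceState.EQUIVALENCE_CANDIDATE
    if description_vague:
        return EvidenceState.UNCERTAIN
    return EvidenceState.NON_MENTIONED

def counts_as_literal_support(state: EvidenceState) -> bool:
    """Only SUPPORTED contributes to literal claim coverage."""
    return state is EvidenceState.SUPPORTED
```

In the running example, the backrest element would be classified as `CONTRADICTED`, whereas a product text that simply never mentions back support would yield `NON_MENTIONED` and trigger further evidence gathering.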
Second, semantic mismatch may arise from domain-specific terminology, spelling variation, or differences between technical and commercial descriptions. To mitigate this problem, the knowledge dimension combines ontology-based normalization with lexical resources and, where appropriate, learned similarity models. This hybrid strategy reduces the risk of false mismatches when equivalent technical concepts are expressed using different surface forms.
Third, patent infringement analysis is inherently claim-specific. The workflow described above is therefore executed independently for each asserted claim. Since infringement of any valid asserted claim may be legally sufficient, the final report aggregates claim-level outcomes rather than producing only one global similarity score. This claim-by-claim processing is preferable to document-level matching because it preserves legal granularity and supports targeted explanation.
Overall, the proposed semantic processing workflow converts raw patent and product texts into a structured, knowledge-rich, and purpose-sensitive infringement assessment. Although described in ordered stages for clarity, the workflow is implemented as a DIKWP network in which data acquisition, semantic extraction, ontology mapping, evidential reasoning, and purpose-guided orchestration interact recurrently. This networked design enables the system to combine claim-level precision, semantic flexibility, legal interpretability, and user-oriented control within a single analytical framework. The next section presents the prototype implementation and representative outputs generated by the system.

5. Prototype Implementation and Illustrative Outputs in the DIKWP Network

This section presents a proof-of-concept implementation of the proposed semantic AI framework for patent infringement detection. The prototype is intended to show how the DIKWP-based design can be operationalized through existing natural language processing, semantic-web, and rule-reasoning technologies, and how its intermediate reasoning states can be represented in an explainable form. The illustrative examples in this section make the element-level representation, ontology-mediated correspondence, rule-based inference, and explanation trace concrete, while the quantitative performance evaluation and statistical interpretation are reported separately in Section 6. In contrast to a rigid sequential pipeline, the prototype is implemented as a networked semantic process in which the data, information, knowledge, wisdom, and purpose dimensions exchange intermediate representations recurrently. Information extraction produces structured evidence for ontology mapping, knowledge-driven reasoning constrains subsequent matching, and purpose settings regulate inference sensitivity, uncertainty handling, and reporting style [63].

5.1. Prototype Realization

The prototype was implemented primarily in Python 3.11, with semantic-web components used for ontology engineering and graph-based knowledge representation. At the information dimension, linguistic preprocessing relies on spaCy for tokenization, part-of-speech tagging, and dependency parsing, with domain-oriented customization to better handle long patent claims and technical noun phrases. Subject–action–object extraction is implemented through a hybrid rule layer that integrates dependency patterns, matcher templates, and curated grammar rules for passive constructions and domain-specific relation patterns, such as “attached to,” “supported by,” and “connected to.” Technical entity recognition is handled by a transformer-based model fine-tuned on patent annotations, enabling the detection of components, materials, and other domain-relevant entities. This design choice is consistent with prior work showing that claim parsing and SAO-oriented semantic structures are more suitable for infringement-related comparison than purely lexical similarity [14,23].
The knowledge dimension is implemented through an OWL ontology designed in Protégé and a graph representation managed through RDFLib. The ontology contains approximately fifty classes covering general mechanical and electrical components together with core legal concepts, including Patent, Claim, ClaimElement, Product, ProductFeature, Correspondence, and InfringementEvidence. WordNet-based lexical expansion is used where appropriate to support synonym and hyponym matching. The extracted claim elements and product features are converted into RDF-style triples and then integrated into a claim–product knowledge graph. This graph functions as the semantic substrate for ontology alignment, element correspondence, rule evaluation, and explanation tracing [17,45].
The prototype enforces the epistemological separation described in Section 2.5. Outputs from statistical modules are stored as candidate structures rather than as final legal determinations. For example, transformer-based entity recognition may propose a product feature, and embedding similarity may propose a candidate correspondence between that feature and a claim limitation. However, the correspondence is not counted as legally supported until the ontology layer confirms concept compatibility, the graph layer preserves relation consistency, and the rule engine verifies that no material modifier, quantity constraint, negative evidence, or legal exclusion defeats the match.
Reasoning in the wisdom dimension is implemented through a hybrid strategy that combines ontology lookup with a custom rule engine in Python. Although off-the-shelf semantic reasoners were considered during prototyping, a lightweight custom engine proved more convenient for element-wise claim analysis, negative-evidence handling, and purpose-conditioned threshold adjustment. Literal infringement is operationalized through an all-elements rule. Let $c$ denote an asserted patent claim and let $p$ denote the accused product or process description. The set $E(c) = \{e_1, \ldots, e_m\}$ denotes the legally material claim elements extracted from $c$ after claim segmentation. Each claim element $e_i$ is represented as a structured unit $e_i = \langle k_i, R_i, M_i, Q_i, N_i \rangle$, where $k_i$ denotes the normalized core technical concept, $R_i$ denotes required structural or functional relations, $M_i$ denotes textual modifiers, $Q_i$ denotes quantity or range constraints, and $N_i$ denotes negation, exclusion, or limiting conditions. The set $F(p) = \{f_1, \ldots, f_n\}$ denotes the product features extracted from $p$. Each product feature $f_j$ is represented as $f_j = \langle k_j, R_j, M_j, Q_j, S_j, P_j \rangle$, where $k_j$ denotes the normalized product concept, $R_j$ denotes observed structural or functional relations, $M_j$ denotes observed modifiers, $Q_j$ denotes observed quantities or ranges, $S_j$ denotes the evidence state, and $P_j$ records provenance links to the source text. In the detailed legal reasoning module below, the all-elements rule is formalized through the $\mathrm{LiteralSupport}(f, e)$ predicate, which requires satisfaction of the core concept, required relations, modifiers, quantities, sequence constraints, and negative-evidence gates.
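The two structured units can be sketched as plain Python records. This is a minimal illustration only: the class names, field names, and the `(min, max)` encoding of a quantity constraint are assumptions made for the sketch and are not taken from the prototype.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class ClaimElement:
    """Illustrative container for e_i = <k_i, R_i, M_i, Q_i, N_i>."""
    concept: str                                          # k_i: normalized core concept
    relations: FrozenSet[Tuple[str, str]] = frozenset()   # R_i: (relation, target) pairs
    modifiers: FrozenSet[str] = frozenset()               # M_i: textual modifiers
    quantity: Optional[Tuple[Optional[int], Optional[int]]] = None  # Q_i: (min, max)
    negations: FrozenSet[str] = frozenset()               # N_i: exclusions or limits


@dataclass(frozen=True)
class ProductFeature:
    """Illustrative container for f_j = <k_j, R_j, M_j, Q_j, S_j, P_j>."""
    concept: str                                          # k_j: normalized product concept
    relations: FrozenSet[Tuple[str, str]] = frozenset()   # R_j: observed relations
    modifiers: FrozenSet[str] = frozenset()               # M_j: observed modifiers
    quantity: Optional[int] = None                        # Q_j: observed count
    state: str = "uncertain"                              # S_j: evidence state
    provenance: Tuple[str, ...] = ()                      # P_j: links to source text


# Elements of the chair claim used as the running example.
backrest = ClaimElement("backrest", relations=frozenset({("attached_to", "seat")}))
legs = ClaimElement("leg", quantity=(2, None))  # "a plurality of legs", read as two or more

# One feature of the stool description.
stool_legs = ProductFeature(
    "leg",
    modifiers=frozenset({"collapsible"}),
    quantity=3,
    state="supported",
    provenance=("product description, sentence 1",),
)
```

Keeping the records immutable means a feature cannot be silently altered after its provenance has been recorded, which fits the traceability goal described above.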
A simplified functional-similarity screening predicate is also included, but it is not treated as an infringement predicate. The predicate $\mathrm{SameFunction}(f, e)$ is used only as an internal screening signal for equivalence-candidate routing. It returns 1 when the product feature appears to perform a comparable function in a technically comparable way and produces a comparable technical result according to the function ontology and curated functional-action rules. However, $\mathrm{SameFunction}(f, e) = 1$ does not establish legal equivalence, does not satisfy the all-elements rule by itself, and does not override explicit negative evidence, prosecution-history constraints, or material claim limitations. The legally cautious equivalence-screening rule is formalized later as $\mathrm{EquivalenceConcern}(c, p)$ in Section 5.2.
Table 3 lists representative failure and boundary cases used to illustrate how the prototype separates literal-support failures from functional-equivalence screening.
The purpose dimension is realized as a lightweight orchestration layer that controls system mode, reasoning strictness, and output configuration. In an enforcement-oriented mode, the system can operate with higher sensitivity and broader equivalence screening. In a clearance or design-around mode, it can apply stricter correspondence criteria and emphasize unmatched claim elements or non-infringing design distinctions. The current prototype exposes these controls through a command-line interface, but the analytical core is interface-independent and can be embedded in a graphical or web-based system without changing the underlying DIKWP network logic.
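A purpose-conditioned configuration of this kind could look as follows. The sketch is hypothetical: the mode names follow the text, but the threshold values, field names, and the `accept_candidate` helper are assumptions chosen only to show how one setting can shift matching strictness and report emphasis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PurposeMode:
    name: str
    similarity_threshold: float  # assumed cut-off for candidate correspondences
    equivalence_scope: str       # "broad" or "narrow" screening
    report_focus: str            # what the report emphasizes


# Illustrative values only; the prototype's actual settings are not stated.
MODES = {
    "enforcement": PurposeMode("enforcement", 0.60, "broad", "equivalence concerns"),
    "clearance": PurposeMode("clearance", 0.80, "narrow", "unmatched claim elements"),
}


def accept_candidate(score, mode_name):
    """Keep a candidate correspondence only if it clears the mode's threshold."""
    return score >= MODES[mode_name].similarity_threshold
```

With these assumed values, a candidate correspondence scored at 0.70 would be kept for further reasoning in the enforcement mode and discarded in the clearance mode, while the underlying semantic evidence stays the same.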
Ontology expansion is handled through a semi-automatic update loop. The system records unmatched claim elements, low-confidence correspondences, and repeatedly observed out-of-vocabulary technical terms. These items are added to an ontology expansion queue. Candidate concepts and relations are generated by patent-specific entity recognition, SAO extraction, embedding-based term clustering, and graph-completion methods. However, automatic expansion is not directly committed to the legal-technical ontology. Each proposed update is associated with provenance metadata, confidence scores, and source evidence, and it must pass consistency checking and expert validation. This design allows the ontology to evolve with emerging technologies while preserving the traceability required for legal decision support.
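The gating behavior of this update loop can be illustrated with a small sketch. The `ProposedUpdate` record, its fields, and the example terms are hypothetical. The sketch only shows that a proposal reaches the ontology when both consistency checking and expert validation have succeeded.

```python
from dataclasses import dataclass


@dataclass
class ProposedUpdate:
    term: str
    source_evidence: str           # provenance of the proposal
    confidence: float
    consistent: bool = False       # result of consistency checking
    expert_approved: bool = False  # result of expert validation


def commit_validated(queue, ontology):
    """Commit fully validated proposals; return the ones still pending."""
    pending = []
    for update in queue:
        if update.consistent and update.expert_approved:
            ontology[update.term] = {
                "provenance": update.source_evidence,
                "confidence": update.confidence,
            }
        else:
            pending.append(update)
    return pending


ontology = {}
queue = [
    ProposedUpdate("torsion spring", "unmatched claim element", 0.91, True, True),
    ProposedUpdate("quick-release latch", "out-of-vocabulary term", 0.55, True, False),
]
queue = commit_validated(queue, ontology)
```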

5.2. Legal Operationalization of Claim Construction and Infringement Reasoning

To make the legal reasoning component operational rather than merely conceptual, the prototype represents claim construction, limitation decomposition, literal matching, equivalence screening, missing-limitation handling, prosecution-history constraints, and legally relevant similarity as explicit data structures and rules. The system does not treat patent infringement as document-level similarity. Instead, it treats infringement-risk assessment as a claim-by-claim and element-by-element reasoning process in which each asserted limitation must be represented, matched, contradicted, or marked as evidentially unresolved.
Claim limitation decomposition begins by segmenting each asserted claim into the preamble, transition phrase, and body limitations. The body is further decomposed into material limitations, including structural components, functional requirements, relational constraints, modifiers, quantity or range constraints, process-order requirements, negative limitations, and dependent-claim references. Where available, specification excerpts and prosecution-history materials are linked to the corresponding limitation as claim-construction evidence. The output is not a flat keyword list, but a set of legally structured claim-limitation objects.
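As a rough illustration of the first segmentation step, the sketch below splits a simple apparatus claim into preamble, transition phrase, and body limitations with regular expressions. It is an assumption-laden simplification: the transition list and the comma/semicolon splitting rule are illustrative, and real claims with nested clauses would need the dependency-based parsing described in Section 5.1.

```python
import re

# Longer phrases come first so that they win over their shorter prefixes.
TRANSITION = re.compile(
    r"\b(consisting essentially of|consisting of|comprising|including)\b"
)


def segment_claim(text):
    """Split a claim into preamble, transition phrase, and body limitations."""
    text = text.strip().rstrip(".")
    match = TRANSITION.search(text)
    if match is None:
        return {"preamble": text, "transition": None, "limitations": []}
    preamble = text[: match.start()].strip(" ,:")
    body = text[match.end():].strip(" ,:")
    # Split on commas or semicolons, dropping a leading "and".
    parts = re.split(r"[;,]\s*(?:and\s+)?", body)
    return {
        "preamble": preamble,
        "transition": match.group(1),
        "limitations": [part.strip() for part in parts if part.strip()],
    }


claim = "A chair comprising a seat, a plurality of legs, and a backrest attached to the seat."
segments = segment_claim(claim)
```

For the chair claim this yields the preamble "A chair", the transition "comprising", and three body limitations, which would then be enriched with relations, modifiers, and quantities.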
For an asserted claim $c$, the decomposed claim-element set is denoted as $E(c) = \{e_1, \ldots, e_m\}$. Each claim limitation is represented as
$$e_i = \langle \mathit{id}_i, \mathit{type}_i, \mathit{core}_i, \mathit{rel}_i, \mathit{mod}_i, \mathit{quant}_i, \mathit{seq}_i, \mathit{neg}_i, \mathit{dep}_i, \mathit{constr}_i, \mathit{pros}_i, \mathit{mat}_i, \mathit{prov}_i \rangle .$$
Here, $\mathit{id}_i$ is the limitation identifier; $\mathit{type}_i$ indicates whether the limitation is structural, functional, relational, numerical, material, process-oriented, or negative; $\mathit{core}_i$ denotes the normalized technical concept; $\mathit{rel}_i$ records required structural, spatial, causal, or functional relations; $\mathit{mod}_i$ records material, positional, shape, or purpose modifiers; $\mathit{quant}_i$ records numerical, range, threshold, and plurality constraints; $\mathit{seq}_i$ records method-step order; $\mathit{neg}_i$ records negative or exclusionary language; $\mathit{dep}_i$ records inherited dependent-claim limitations; $\mathit{constr}_i$ records claim-construction sources from claim language, specification, or other intrinsic evidence; $\mathit{pros}_i$ records prosecution-history constraints; $\mathit{mat}_i$ indicates whether the limitation is material for all-elements reasoning; and $\mathit{prov}_i$ records provenance links to the source text.
Table 4 defines the fields used in this structured claim-limitation representation.
Missing limitations are handled conservatively. The prototype distinguishes supported, contradicted, non-mentioned, uncertain, and equivalence-candidate evidence states. Explicit contradiction is treated as strong negative evidence. Non-mention is not treated as proof of absence because real product descriptions may be incomplete or selectively drafted. A claim-level positive literal-support outcome is generated only when every material limitation is supported. If one material limitation is contradicted, literal support fails. If one material limitation is non-mentioned or uncertain, the system returns an insufficient-evidence or expert-review status rather than a definitive positive conclusion.
Table 5 reports the corresponding evidence states and their claim-level consequences.
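The conservative aggregation from element-level evidence states to a claim-level status can be sketched as follows. The enum values mirror the five states named in the text. The outcome labels and the precedence order (contradiction first, then full support, then equivalence routing) are assumptions of this sketch rather than the prototype's documented behavior.

```python
from enum import Enum


class EvidenceState(Enum):
    SUPPORTED = "supported"
    CONTRADICTED = "contradicted"
    UNCERTAIN = "uncertain"
    NON_MENTIONED = "non-mentioned"
    EQUIVALENCE_CANDIDATE = "equivalence-candidate"


def claim_level_outcome(states):
    """Aggregate the states of all material limitations of one claim."""
    states = list(states)
    if not states:
        raise ValueError("a claim needs at least one material limitation")
    if EvidenceState.CONTRADICTED in states:
        return "literal support fails"      # explicit negative evidence
    if all(s is EvidenceState.SUPPORTED for s in states):
        return "literal support"            # every material limitation supported
    if EvidenceState.EQUIVALENCE_CANDIDATE in states:
        return "expert review"              # screening flag, not a legal finding
    return "insufficient evidence"          # non-mention or uncertainty remains
```

Under this ordering, a single non-mentioned limitation prevents a positive outcome but does not produce a negative one, which reflects the distinction between non-mention and proof of absence.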
For a product feature f and a claim limitation e, literal support is defined as
$$\mathrm{LiteralSupport}(f, e) = 1 \iff \mathrm{CoreOK}(f, e) \land \mathrm{RelOK}(f, e) \land \mathrm{ModOK}(f, e) \land \mathrm{QuantOK}(f, e) \land \mathrm{SeqOK}(f, e) \land \neg\,\mathrm{Contradicted}(f, e).$$
The predicate $\mathrm{CoreOK}(f, e)$ may be satisfied by direct lexical identity, accepted synonymy, or ontology-mediated literal support, such as a recognized subtype or domain-standard alternative term. However, literal support still requires that all legally material relations, modifiers, quantities, ranges, and process-order constraints be satisfied. Therefore, ontology-mediated semantic normalization is not equivalent to unrestricted semantic similarity.
$$\mathrm{LiteralInfringe}(c, p) = 1 \iff \forall e_i \in E_M(c),\ \exists f_j \in F(p) : \mathrm{LiteralSupport}(f_j, e_i) = 1,$$
where $E_M(c)$ denotes the set of material claim limitations.
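The two predicates translate almost directly into code. The sketch below is a simplified reading under stated assumptions: features and limitations are plain dictionaries, relations and modifiers are compared by set inclusion, quantity is reduced to a minimum count, and `same_concept` is a hypothetical stand-in for the ontology-backed concept check.

```python
def literal_support(feature, element, same_concept):
    """LiteralSupport(f, e): all gates hold and nothing contradicts the match."""
    core_ok = same_concept(feature["concept"], element["concept"])
    rel_ok = element.get("relations", set()) <= feature.get("relations", set())
    mod_ok = element.get("modifiers", set()) <= feature.get("modifiers", set())
    quant_ok = feature.get("count", 1) >= element.get("min_count", 1)
    seq_ok = element.get("step") is None or feature.get("step") == element["step"]
    contradicted = feature.get("negated", False)
    return core_ok and rel_ok and mod_ok and quant_ok and seq_ok and not contradicted


def literal_infringe(material_elements, features, same_concept):
    """LiteralInfringe(c, p): every material limitation has a supporting feature."""
    return all(
        any(literal_support(f, e, same_concept) for f in features)
        for e in material_elements
    )


SYNONYMS = {"sitting surface": "seat"}  # illustrative normalization table


def same_concept(a, b):
    return SYNONYMS.get(a, a) == SYNONYMS.get(b, b)


claim = [
    {"concept": "seat"},
    {"concept": "leg", "min_count": 2},
    {"concept": "backrest", "relations": {("attached_to", "seat")}},
]
full_chair = [
    {"concept": "sitting surface"},
    {"concept": "leg", "count": 4},
    {"concept": "backrest", "relations": {("attached_to", "seat")}},
]
# Same product, but the backrest is described without the attachment relation.
loose_backrest = full_chair[:2] + [{"concept": "backrest"}]
```

The second product shows why concept normalization alone is not enough: the backrest concept matches, yet the missing `attached_to` relation defeats literal support for that limitation.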
The doctrine-of-equivalents component is implemented only as a screening mechanism. It does not produce a final legal determination of equivalence. Instead, it produces an expert-review flag:
$$\mathrm{EquivalenceConcern}(c, p) = 1 \iff \exists e_i \in E_M(c),\ \exists f_j \in F(p) : \big[\, \mathrm{FWR}(f_j, e_i) = 1 \land \neg\,\mathrm{LiteralSupport}(f_j, e_i) \land \neg\,\mathrm{Contradicted}(f_j, e_i) \land \neg\,\mathrm{EstoppelBlocked}(f_j, e_i) \,\big].$$
Here, $\mathrm{FWR}(f_j, e_i)$ denotes a simplified function–way–result screening predicate. It is satisfied only when the product feature performs a comparable function, in a comparable technical way, and achieves a comparable technical result according to the functional ontology and curated rule base. Even when $\mathrm{FWR}(f_j, e_i) = 1$, the system does not conclude doctrine-of-equivalents infringement. It only flags the limitation for expert legal review.
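A compact sketch of this screening rule is given below. The four predicates are passed in as callables because their internals (functional ontology, rule base, prosecution-history graph) are outside the scope of the sketch. The example pairs, including the "biasing spring" limitation, are hypothetical.

```python
def equivalence_flags(material_elements, features, fwr, literal, contradicted, blocked):
    """Return (limitation, feature) pairs that should be flagged for expert review."""
    return [
        (e, f)
        for e in material_elements
        for f in features
        if fwr(f, e)
        and not literal(f, e)
        and not contradicted(f, e)
        and not blocked(f, e)
    ]


# Hypothetical screening results, stored as sets of (feature, limitation) pairs.
FWR_PAIRS = {("torsion spring", "biasing spring"), ("elastic band", "biasing spring")}
LITERAL_PAIRS = {("pivot pin", "pivot pin")}
BLOCKED_PAIRS = {("elastic band", "biasing spring")}  # e.g., a disclaimed alternative

flags = equivalence_flags(
    material_elements=["biasing spring", "pivot pin"],
    features=["torsion spring", "elastic band", "pivot pin"],
    fwr=lambda f, e: (f, e) in FWR_PAIRS,
    literal=lambda f, e: (f, e) in LITERAL_PAIRS,
    contradicted=lambda f, e: False,
    blocked=lambda f, e: (f, e) in BLOCKED_PAIRS,
)
equivalence_concern = bool(flags)  # EquivalenceConcern(c, p) = 1 iff any pair is flagged
```

Returning the flagged pairs, rather than only a Boolean, keeps the explanation trace at the level of individual limitations.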
Table 6 summarizes the legal matching hierarchy applied by the prototype.
Prosecution-history limitations are represented in the legal ontology and knowledge graph rather than being treated as external narrative comments. The legal ontology is extended with the classes Amendment, NarrowingAmendment, ApplicantArgument, ExaminerRejection, Disclaimer, EstoppelConstraint, ExcludedEquivalent, and ClaimConstructionEvidence. These classes allow the system to represent whether a limitation was narrowed, whether a feature was disclaimed, whether an applicant distinguished prior art on a particular ground, and whether an asserted equivalence candidate falls within an excluded technical territory.
In the knowledge graph, prosecution-history evidence is encoded through typed edges such as narrowed_by, disclaimed, argued_distinct_from, and blocked_by. The rule engine uses these relations as gates. If a proposed product feature falls within an excluded equivalent or a disclaimed feature, the system blocks the equivalence-candidate flag even if the feature is functionally similar. If prosecution-history evidence is unavailable, the system does not assume that no estoppel exists. Instead, the output is marked as prosecution-history-unchecked, and the equivalence assessment is routed to expert review.
$$\mathrm{EstoppelBlocked}(f, e) = 1 \iff \big[\exists x : \mathrm{ExcludedEquivalent}(x, e) \land \mathrm{FallsWithin}(f, x)\big] \lor \big[\exists d : \mathrm{Disclaimer}(d, e) \land \mathrm{Covers}(d, f)\big].$$
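The gate logic, including the handling of unavailable prosecution history, might be sketched as follows. The data layout is an assumption: each excluded equivalent or disclaimer is modeled as a set of feature concepts, so that FallsWithin and Covers reduce to set membership. The route labels and example terms are illustrative.

```python
def estoppel_blocked(feature, element, excluded, disclaimed):
    """EstoppelBlocked(f, e): an excluded equivalent or a disclaimer covers f."""
    by_exclusion = any(feature in territory for territory in excluded.get(element, []))
    by_disclaimer = any(feature in covered for covered in disclaimed.get(element, []))
    return by_exclusion or by_disclaimer


def route_equivalence_candidate(feature, element, history):
    """Route a candidate; history is None when no file history was available."""
    if history is None:
        # Missing evidence is not treated as absence of estoppel.
        return "prosecution-history-unchecked: expert review"
    if estoppel_blocked(feature, element, history["excluded"], history["disclaimed"]):
        return "equivalence flag blocked"
    return "equivalence-candidate: expert review"


# Hypothetical prosecution-history record for one limitation.
history = {
    "excluded": {"biasing spring": [{"elastic band", "rubber strap"}]},
    "disclaimed": {"biasing spring": []},
}
```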
The system distinguishes legally relevant similarity from merely technical similarity through an anchoring rule. A correspondence is legally relevant only when it is anchored to a specific material claim limitation, supported by traceable product evidence, and not defeated by relation, modifier, quantity, sequence, negative-evidence, or prosecution-history constraints. General topical similarity, shared technical field, common function, or embedding proximity is therefore insufficient.
$$\mathrm{LegallyRelevant}(f, e) = 1 \iff \mathrm{Material}(e) \land \mathrm{AnchoredTo}(f, e) \land \mathrm{TraceableEvidence}(f) \land \mathrm{ConstraintSatisfied}(f, e) \land \neg\,\mathrm{Excluded}(f, e).$$
If $\mathrm{TechnicalSimilar}(f, e) = 1$ but $\mathrm{LegallyRelevant}(f, e) = 0$, the system records the relation as background technical similarity only. It is not counted as a matched limitation, does not satisfy the all-elements rule, and does not support a positive infringement-risk output.
Table 7 summarizes how prosecution-history constraints are represented in the legal ontology and knowledge graph.
Table 8 summarizes the operational steps used for legal reasoning in the prototype.

5.3. Representation of Claim and Product Knowledge

A central design goal of the prototype is to preserve the structural relation between claim language and product evidence rather than collapsing both into flat text vectors. Each patent claim is therefore decomposed into a set of claim elements, and each element is represented as an entity node together with its attributes and internal relations. Product descriptions are processed in a parallel manner. When a claim contains a relation such as “a backrest attached to the seat,” the representation includes both the backrest entity and the relational edge attached_to(backrest, seat). Likewise, when a product description states that a “sitting surface is supported by three collapsible legs,” the representation preserves the component identity, the support relation, and the descriptive modifiers.
The resulting knowledge graph is source-aware. Claim-derived nodes and product-derived nodes remain distinguishable, while candidate correspondences are represented by explicit alignment relations. This design enables three forms of evidential reasoning. First, direct matches can be established when the same or synonymous concepts appear in both sources. Second, ontology-mediated matches can be inferred when a product feature is a subtype or accepted variant of the claimed element. Third, explicit negative evidence can be preserved when the product description denies the presence of a claimed feature. The last of these is particularly important in patent infringement analysis, because an explicit absence statement may be legally decisive and should not be treated as a mere omission.
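A source-aware graph of this kind can be approximated with tagged edges. The sketch uses plain tuples instead of an RDF store, and apart from `attached_to`, the predicate names (`supported_by`, `has_modifier`, `explicitly_absent`, `ontology_match`) are hypothetical labels introduced for illustration.

```python
from collections import namedtuple

Edge = namedtuple("Edge", "source subject predicate obj")

graph = [
    Edge("claim", "backrest", "attached_to", "seat"),
    Edge("product", "sitting surface", "supported_by", "legs"),
    Edge("product", "legs", "has_modifier", "collapsible"),
    Edge("product", "back support", "explicitly_absent", "true"),
    Edge("alignment", "sitting surface", "ontology_match", "seat"),
    Edge("alignment", "back support", "ontology_match", "backrest"),
]


def edges_from(graph, source):
    """Select the edges contributed by one source (claim, product, alignment)."""
    return [edge for edge in graph if edge.source == source]


def negative_evidence(graph):
    """Claim concepts whose aligned product term is explicitly denied."""
    denied = {
        edge.subject
        for edge in edges_from(graph, "product")
        if edge.predicate == "explicitly_absent"
    }
    return {
        edge.obj
        for edge in edges_from(graph, "alignment")
        if edge.subject in denied
    }
```

Because every edge keeps its source tag, the explicit denial of back support stays attached to the product side and is linked to the claimed backrest only through a separate alignment edge.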

5.4. Illustrative Chair–Stool Example

To demonstrate the end-to-end behavior of the prototype, the system was applied to the running example introduced in Section 4. The asserted claim was: “A chair comprising a seat, a plurality of legs, and a backrest attached to the seat.” The accused-product description was: “Our product is a portable stool with a flat round sitting surface supported by three collapsible legs. It does not have any back support.” A human analyst would ordinarily conclude that literal infringement is unlikely because the accused product lacks the backrest element. The purpose of this experiment was to determine whether the prototype would reproduce the same conclusion through a transparent DIKWP-network reasoning process.
At the information dimension, the claim was segmented into three required elements: seat, plurality of legs, and backrest attached to the seat. The product description yielded three relevant feature statements: a sitting surface, three collapsible legs, and an explicit negative statement indicating the absence of back support. At the knowledge dimension, sitting surface was normalized to the concept of seat, and back support was mapped to the concept of backrest for the purpose of semantic comparison. The claim graph and the product graph were then aligned and tested against the all-elements condition.
The result was a clear non-infringement outcome. The seat element was matched through ontology-based normalization between sitting surface and seat. The legs element was matched directly, with the additional numerical check that three legs satisfy the claim requirement of a plurality of legs. The backrest element was not matched, and the explicit negative product statement provided positive support for its absence. Because at least one material claim element was missing, literal infringement was rejected. No substitute structure was identified that would satisfy the simplified equivalence condition, and equivalence-based concern was therefore also rejected. Table 9 summarizes the element-level comparison.
The explanation generator then produced a structured report stating that the product contains seat and leg features corresponding to two claim elements, but does not contain the required backrest or any identified equivalence-candidate substitute. This example is useful not because the legal conclusion is difficult, but because it shows that the prototype preserves a traceable path from raw text to final assessment. Such traceability is a central requirement in explainable legal AI and is especially important for patent infringement analysis, where expert review must remain possible at the level of individual claim elements and supporting evidence [25,26].
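The element-level outcome of the chair–stool example can be reproduced with a short sketch. The normalization table, the dictionary layout, and the reading of "a plurality" as a minimum count of two are assumptions of the sketch, not the prototype's internal representation.

```python
# Illustrative normalization of product terms to claim concepts.
NORMALIZE = {"sitting surface": "seat", "back support": "backrest", "legs": "leg"}

claim_elements = [
    {"concept": "seat", "min_count": 1},
    {"concept": "leg", "min_count": 2},       # "a plurality of legs"
    {"concept": "backrest", "min_count": 1},
]
product_features = [
    {"term": "sitting surface", "count": 1, "negated": False},
    {"term": "legs", "count": 3, "negated": False},
    {"term": "back support", "count": 0, "negated": True},  # "does not have any back support"
]


def element_state(element, features):
    """Assign an evidence state to one claim element."""
    for feature in features:
        if NORMALIZE.get(feature["term"], feature["term"]) != element["concept"]:
            continue
        if feature["negated"] or feature["count"] < element["min_count"]:
            return "contradicted"
        return "supported"
    return "non-mentioned"


trace = {e["concept"]: element_state(e, product_features) for e in claim_elements}
literal_infringement = all(state == "supported" for state in trace.values())
```

The resulting trace marks the seat and leg elements as supported and the backrest as contradicted, so the all-elements condition fails, in line with the outcome reported for this example.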
To complement this minimal example, Table 10 presents a litigation-style claim–product semantic comparison for a mechanical clamp. The table illustrates how the DIKWP network renders claim limitations, ontology concepts, and aligned product features within a format that remains familiar to patent professionals. Unlike purely lexical claim charts, this semantic chart separates literal support, ontology-mediated correspondence, and functional-equivalence candidates. In the example, a pivot may provide an ontology-mediated correspondence to a hinge limitation if claim construction permits, while a torsion spring may trigger a functional-equivalence candidate flag rather than a final legal equivalence conclusion.
The prototype also records a rule-oriented decision trace linking semantic matches to legal inference. Table 11 summarizes this operational logic. The table should be read as the prototype’s reasoning abstraction rather than as a complete statement of jurisdiction-specific patent doctrine. Its purpose is to show how literal coverage, equivalence-oriented screening, and failure of the all-elements condition are distinguished transparently within the DIKWP network.
The examples in Table 9, Table 10 and Table 11 are illustrative outputs intended to demonstrate element-level representation, semantic correspondence, and rule-oriented reasoning traces. They are not used as validation evidence. Quantitative evaluation and statistical interpretation are reported separately in Section 6.

5.5. Additional Qualitative Evaluation

Beyond the chair–stool example, the prototype was tested on several hypothetical scenarios and one qualitative case derived from a publicly reported coffee-capsule patent dispute. In that case, the patent description and the accused product were sufficiently similar to trigger extensive correspondence analysis, yet one structural element required by the asserted claim did not appear in the literal claimed form. The system initially rejected direct element identity, but it also detected a functionally similar edge structure that could trigger an equivalence-candidate flag. The resulting output was therefore not a simple binary judgment, but an uncertainty-aware report indicating that literal correspondence was incomplete while equivalence-based concern remained.
This qualitative case illustrates the benefit of the DIKWP network formulation. The initial mismatch identified in the knowledge dimension triggered additional reasoning in the wisdom dimension, which in turn depended on purpose-conditioned evaluation criteria. In an enforcement-oriented configuration, the same case would be reported as an equivalence concern requiring legal review. In a clearance-oriented configuration, the system instead emphasizes the disputed element as a possible design vulnerability or redesign target. The semantic evidence remains the same, but the analytical posture and reporting logic are shaped by the purpose dimension.

5.6. Preliminary Throughput and Implementation Observations

A preliminary throughput test was conducted to assess whether the prototype architecture is computationally workable for small-scale experimental use. In this test, one patent claim set was compared against 1000 randomly selected patent abstracts used as surrogate product descriptions. Under a rule-based configuration without deep neural extraction, the system processed approximately 50 cases per minute on a standard desktop computer used for prototyping. When the patent-adapted deep NLP models were enabled, throughput decreased to approximately 5 cases per minute. These numbers are not intended as benchmark results; rather, they provide a practical indication of the relative computational burden of the main modules.
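Assuming the reported rates stay constant over the whole run, the two configurations imply the following approximate wall-clock times for the 1000 comparisons. The symbols $t_{\text{rule}}$ and $t_{\text{deep}}$ are introduced here only for this calculation.

```latex
\[
t_{\text{rule}} \approx \frac{1000\ \text{cases}}{50\ \text{cases/min}} = 20\ \text{min},
\qquad
t_{\text{deep}} \approx \frac{1000\ \text{cases}}{5\ \text{cases/min}} = 200\ \text{min},
\qquad
\frac{t_{\text{deep}}}{t_{\text{rule}}} \approx 10 .
\]
```

Enabling the deep NLP models therefore corresponds to roughly a tenfold increase in processing time for the same workload.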
The main bottleneck was not graph matching or rule execution, but linguistic parsing and entity extraction. This observation is consistent with the broader patent-NLP literature, where domain-adapted parsing and semantic extraction often dominate runtime cost [1,3]. From a design perspective, this suggests that future optimization should focus on batching, pre-parsing, model distillation, or staged candidate filtering rather than on reducing the already moderate cost of ontology-based reasoning. It also supports the modular strategy adopted in the prototype, because individual components can be replaced or accelerated without altering the overall DIKWP network structure.
Future optimization will focus on the information dimension, where linguistic parsing and entity extraction dominate runtime. Several strategies are planned. First, a staged filtering architecture can apply lightweight lexical retrieval, BM25-style ranking, or dense vector retrieval to identify candidate patent–product pairs before invoking the full deep NLP pipeline. Second, repeated claim parsing can be cached, because the same patent claims are often compared against many products. Third, transformer-based extraction models can be accelerated through batching, GPU inference, quantization, pruning, and task-specific model distillation. Fourth, graph construction can be made incremental so that only changed or newly introduced entities are recomputed. These optimizations would reduce latency while preserving the explainable DIKWP reasoning structure.
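The first two strategies, staged filtering and cached claim parsing, can be illustrated together. The sketch substitutes a simple term-overlap score for BM25 or dense retrieval, and `parse_claim` is a stand-in for the expensive claim parser. The threshold and the second and third product texts are hypothetical.

```python
from functools import lru_cache


def _terms(text):
    """Lowercased tokens longer than three characters, punctuation stripped."""
    tokens = (tok.strip(".,;:").lower() for tok in text.split())
    return frozenset(t for t in tokens if len(t) > 3)


@lru_cache(maxsize=None)
def parse_claim(claim_text):
    """Stand-in for the costly claim parser; the result is cached per claim."""
    return _terms(claim_text)


def staged_candidates(claim_text, products, min_overlap=0.1):
    """Keep products whose term overlap with the claim clears the threshold."""
    claim_terms = parse_claim(claim_text)
    kept = []
    for text in products:
        overlap = len(claim_terms & _terms(text)) / max(len(claim_terms), 1)
        if overlap >= min_overlap:
            kept.append(text)
    return kept


claim = "A chair comprising a seat, a plurality of legs, and a backrest attached to the seat."
products = [
    "Portable stool with a flat round sitting surface supported by three collapsible legs.",
    "Wireless router with dual antennas.",
    "Office chair with a padded seat and a backrest.",
]
shortlist = staged_candidates(claim, products)
```

Only the shortlisted pairs would be passed to the full pipeline. The threshold has to stay permissive, because a purely lexical filter cannot see that "sitting surface" corresponds to "seat" and would otherwise drop semantically relevant products.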

5.7. Implications of the Prototype

The prototype should be interpreted as a proof of feasibility rather than as a production-level legal system. Its main contribution is to show that the proposed DIKWP network can be instantiated through existing NLP, semantic-web, and rule-reasoning techniques while preserving explainability and legal interpretability. The implementation demonstrates that the architecture is capable of preserving claim-element structure, handling negative evidence, supporting ontology-mediated semantic alignment, producing human-readable explanations, and adjusting its reporting behavior to different analytical purposes.
At the same time, the current prototype also reveals the main areas requiring further development. Ontology coverage remains limited to a manageable set of technical and legal classes; equivalence reasoning is operational rather than jurisprudentially complete; and the interface is intentionally lightweight. These limitations do not weaken the conceptual value of the framework, but they identify the next steps toward a more mature system. In particular, richer domain ontologies, stronger patent-specific entity recognition resources, multimodal processing of patent drawings, and tighter integration between symbolic reasoning and learned semantic matching would substantially improve practical coverage [19,55].
Overall, the prototype validates the architectural claim advanced in the previous sections: patent infringement detection can be modeled as a networked process of semantic extraction, ontology-guided alignment, evidential reasoning, and purpose-aware orchestration. The next section therefore moves from illustrative implementation results to a more systematic evaluation using controlled experimental data and quantitative performance metrics.

6. Experimental Evaluation of the DIKWP Network on Controlled and Real-World Patent–Product Corpora

This section evaluates the proposed DIKWP-based semantic AI framework from three perspectives: infringement-risk detection effectiveness, computational efficiency, and the contribution of the DIKWP network dimensions. To address both internal controllability and external validity, the evaluation uses a two-layer design. The first layer is a controlled synthetic patent–product corpus, which allows the reasoning behavior of different system configurations to be isolated and compared under known positive, negative, and near-miss conditions. The second layer is a real-world pilot corpus constructed from publicly available patent claims and real product technical descriptions. This pilot corpus is designed to examine whether the framework can preserve claim-element reasoning under realistic patent drafting styles, domain-specific terminology, incomplete product evidence, and borderline semantic correspondences.
The two evaluation layers serve different purposes and are therefore reported separately. The controlled synthetic corpus is used for configuration comparison, ablation analysis, and transparent precision–recall calculation. The real-world pilot corpus is used as an external-validity check rather than as a litigation-level benchmark. This distinction is important because real patent infringement analysis depends not only on textual similarity but also on claim construction, product-description completeness, prosecution-history context, and expert legal interpretation.

6.1. Dataset Construction and Annotation Protocol

To address the possibility that the reported precision and recall could reflect artifacts of dataset design rather than genuine element-level reasoning, the controlled synthetic corpus was constructed according to an explicit claim-element protocol. The controlled corpus contains 100 patent–product pairs generated from 20 synthetic patent claim templates across four technical domains: mechanical devices, electrical/electronic devices, software/process methods, and medical tools. Each template was used to generate five product descriptions representing different claim-to-product correspondence patterns. The purpose of this corpus is not to approximate the full complexity of patent litigation, but to provide a transparent and controllable evaluation setting in which claim-limitation decomposition, missing-limitation handling, near-miss discrimination, and equivalence-candidate routing can be tested.
Each synthetic patent claim template was first decomposed into legally material claim limitations. A material limitation was defined as a claim element whose absence, contradiction, or material alteration would affect the claim-level infringement-risk label under the prototype’s all-elements reasoning rule. The decomposed limitations included core technical concepts, structural relations, functional relations, spatial constraints, material or purpose modifiers, numerical or range constraints, and process-order constraints. Product descriptions were then generated by systematically preserving, paraphrasing, omitting, contradicting, or functionally modifying these limitations.
Positive cases were generated only when all material limitations were preserved in the product description. To avoid trivial lexical overlap, positive cases included literal-support descriptions, synonym/paraphrase descriptions, and ontology-mediated support descriptions in which the product expression used a domain-recognized alternative term while preserving the material relation and constraint. Negative cases were generated by omitting, contradicting, or materially altering at least one required limitation. Near-miss negative cases preserved most claim elements but changed one legally decisive limitation, such as a spatial relation, material modifier, numerical range, or process-step order. Borderline cases were generated when a product feature was functionally related to a claim limitation but was not treated as literal or ontology-mediated support. These cases were labeled as equivalence-candidate cases and routed to expert review rather than counted as definitive literal-support positives.
Table 12 summarizes the generation rule, annotation criterion, and evaluation purpose for each case type in the controlled synthetic corpus.
The corpus was annotated at two levels. At the element level, each material claim limitation was assigned an evidence state. At the claim level, each patent–product pair was assigned an infringement-risk label. The claim-level label was derived from the element-level evidence states rather than from global document similarity. This design prevents a high overall textual similarity score from producing a positive label when a material limitation is missing, contradicted, or legally unresolved.
Table 13 reports the element-level and claim-level annotation criteria used for the controlled synthetic corpus.
Table 14 reports the label distribution and domain coverage of the controlled synthetic corpus. The corpus contains 50 positive cases and 50 non-positive cases. The non-positive cases include 20 missing-limitation cases, 20 near-miss cases, and 10 borderline equivalence-candidate cases. This design prevents the evaluation from collapsing into an easy positive-versus-obvious-negative task.
Table 15 provides representative positive, negative, near-miss, borderline, and insufficient-evidence examples. These examples are included to make the annotation logic inspectable. They also show that the labels were assigned through element-level legal-technical reasoning rather than through document-level lexical overlap.
Several safeguards were used to reduce dataset-design artifacts. First, positive cases were not generated by simply copying claim language; synonym, paraphrase, and domain-specific alternative terminology were introduced. Second, negative cases were not limited to obvious non-infringement examples; near-miss cases preserved most claim elements while changing one decisive limitation. Third, labels were assigned at the claim-element level before claim-level aggregation, so global document similarity could not by itself produce a positive label. Fourth, borderline equivalence-candidate cases were separated from literal-support cases and were not treated as definitive infringement positives. These safeguards make the evaluation more suitable for testing structured reasoning than a simple lexical-overlap benchmark.

6.2. Evaluation Settings and System Configurations

The evaluation compared five system configurations. The first was a keyword-matching baseline based on lexical overlap between patent claims and product descriptions. The second was a semantic configuration using SAO extraction and ontology-based matching without the full DIKWP reasoning and purpose control. The third was the full proposed system in its default setting. The fourth and fifth were purpose-conditioned variants of the full system, namely an aggressive mode optimized for higher recall and a conservative mode optimized for higher precision. These variants allow the contribution of the purpose dimension to be assessed explicitly.
The quantitative metrics included precision, recall, and F1-score at the patent–product pair level. Average processing time, scalability with respect to corpus size, and memory consumption were also measured. Explanation quality was manually assessed by checking whether each generated explanation identified the decisive claim element or elements. In 98 of the 100 evaluated cases, the generated explanation correctly identified the decisive claim element or elements. In the remaining two cases, parser errors led to slightly inaccurate element naming, although the final infringement label remained unchanged.

6.3. Real-World Pilot Corpus and External-Validity Check

To address the limited external validity of a purely synthetic patent–product corpus, we added a real-world pilot corpus based on publicly available patent claims and real product technical descriptions. The purpose of this pilot corpus is not to construct a litigation-level benchmark, but to examine whether the proposed DIKWP network can preserve claim-element reasoning under realistic drafting styles, domain-specific terminology, incomplete product evidence, and borderline claim-to-product correspondences. This design directly responds to the concern that synthetic patent–product pairs may not fully capture the ambiguity and evidentiary incompleteness of real patent infringement analysis.
Patent claims in the pilot corpus were collected from public patent-search resources. For each selected patent, one independent claim was used as the primary asserted claim, and dependent claims were considered when they introduced material technical limitations relevant to the product comparison. Product evidence was collected from real-world technical materials, including manufacturer manuals, technical datasheets, official product webpages, installation guides, user manuals, and technical brochures. Short promotional descriptions or commercial summaries were not used alone unless they contained technically informative product features. Where publicly available litigation-related materials were available, such as court opinions, complaints, claim-construction materials, or claim-chart-like descriptions, they were used only for qualitative sanity checking of legally relevant claim limitations rather than as binding legal determinations.
The pilot corpus was designed to complement, rather than replace, the controlled synthetic corpus. The synthetic corpus provides internal controllability because positive, negative, and near-miss conditions can be systematically constructed. By contrast, the real-world pilot corpus provides an external-validity check because real patents and product descriptions often contain drafting variation, implicit terminology, incomplete disclosure, and domain-specific expressions. The two corpora are therefore reported separately and are not pooled into a single performance estimate.
The pilot corpus contains real patent–product pairs across three technical domains: mechanical devices, electrical/electronic devices, and software/process claims. These domains were selected because they represent different types of claim limitations. Mechanical-device claims often contain structural and spatial relations. Electrical/electronic claims often contain component, signal, and control-function relations. Software/process claims often contain functional limitations, process steps, and sequence constraints. This domain coverage allows the pilot study to test whether the DIKWP network can handle different forms of claim-to-product correspondence beyond surface lexical overlap.
Table 16 summarizes the composition of the real-world pilot corpus.
Each real-world patent–product pair was annotated at two levels. At the element level, each material claim limitation was assigned one of five evidence states: supported, contradicted, non-mentioned, uncertain, or equivalence-candidate. A limitation was labeled as supported when the product evidence disclosed the required feature and its material constraints directly, through a synonym or paraphrase, or through an ontology-mediated correspondence. A limitation was labeled as contradicted when product evidence explicitly denied the required feature or disclosed an incompatible relation, modifier, range, quantity, or process order. A limitation was labeled as non-mentioned when the available product evidence did not disclose the feature. A limitation was labeled as uncertain when the product evidence was vague, incomplete, or technically under-specified. A limitation was labeled as equivalence-candidate when a product feature was not literally identical to the claimed limitation but appeared functionally related in a way that could require expert doctrine-of-equivalents review.
At the claim level, each pair was assigned one of four outcome labels: likely literal support, likely non-coverage due to a missing or contradicted material limitation, borderline equivalence concern, or insufficient evidence. Likely literal support was assigned only when every material claim limitation was supported by traceable product evidence. Likely non-coverage was assigned when at least one material limitation was contradicted or materially mismatched. Borderline equivalence concern was assigned when literal support was incomplete but a non-identical product feature appeared to perform a comparable function in a comparable way with a comparable technical result. Insufficient evidence was assigned when one or more material limitations were not disclosed by the product evidence and no reliable positive or negative inference could be made.
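The claim-level rules above can be sketched as a small aggregation function. This is a minimal illustration rather than the prototype's implementation: the names `Evidence` and `claim_level_label` are hypothetical, and the precedence among mixed cases (contradiction first, then undisclosed or vague evidence, then equivalence candidates) is an assumption, because the text does not state how competing evidence states are ordered.

```python
from enum import Enum


class Evidence(Enum):
    """The five element-level evidence states used in the pilot annotation."""
    SUPPORTED = "supported"
    CONTRADICTED = "contradicted"
    NON_MENTIONED = "non-mentioned"
    UNCERTAIN = "uncertain"
    EQUIVALENCE_CANDIDATE = "equivalence-candidate"


def claim_level_label(states):
    """Aggregate element-level evidence states into one claim-level label."""
    states = list(states)
    if not states:
        raise ValueError("a claim needs at least one material limitation")
    # One contradicted material limitation defeats coverage (all-elements rule).
    if Evidence.CONTRADICTED in states:
        return "likely non-coverage"
    # Literal support requires every material limitation to be supported.
    if all(s is Evidence.SUPPORTED for s in states):
        return "likely literal support"
    # Assumed precedence: undisclosed or vague evidence is resolved before
    # any equivalence question is raised.
    if Evidence.NON_MENTIONED in states or Evidence.UNCERTAIN in states:
        return "insufficient evidence"
    # Remaining case: every unsupported limitation is an equivalence candidate.
    return "borderline equivalence concern"


print(claim_level_label([Evidence.SUPPORTED, Evidence.EQUIVALENCE_CANDIDATE]))
```

Because the label is computed only from per-limitation states, a high document-level similarity score cannot by itself produce a literal-support outcome in this sketch.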
For reporting and interpretation, definite positive and definite negative cases were separated from borderline and insufficient-evidence cases. The real-world pilot corpus was not pooled with the controlled synthetic corpus and was not used to produce a separate headline precision–recall estimate. Instead, it was used as a qualitative external-validity check to examine whether the DIKWP network could preserve element-level reasoning, avoid overconfident infringement or non-infringement conclusions, and route uncertain or borderline cases to expert review or insufficient-evidence status. This reporting design is important because forcing legally unresolved cases into binary labels would overstate the certainty of the system output.
Table 17 summarizes the annotation criteria used for the real-world pilot corpus.
The real-world pilot corpus also highlights a practical difficulty that is less visible in synthetic testing: product descriptions often omit legally material implementation details. In such cases, the system should not treat non-disclosure as proof of absence. The DIKWP network therefore assigns non-mentioned or uncertain evidence states and routes the case to expert review when the available evidence cannot support a definite claim-level conclusion. This conservative behavior is consistent with the intended role of the framework as an explainable decision-support tool rather than an autonomous infringement adjudicator.
The pilot corpus remains limited in scale and should not be interpreted as a statistically stable benchmark. Its purpose is to test external validity at a preliminary level by exposing the system to real patent drafting styles and real product-description quality. These pilot observations were analyzed qualitatively and were not pooled with the controlled-corpus precision–recall scores. Broader validation will require larger expert-annotated patent–product datasets, public claim charts, litigation materials, prosecution-history records, and multimodal evidence such as patent drawings and product images.

6.4. Results on the Controlled Synthetic Corpus

Table 18 reports the main classification results on the controlled synthetic corpus. These results are used to compare system configurations under a balanced and controlled setting. They are reported separately from the real-world pilot corpus and should not be interpreted as estimates of real-world litigation accuracy. The keyword baseline achieved a precision of 0.68, a recall of 0.52, and an F1-score of 0.59. This result indicates that surface-level lexical overlap is insufficient for infringement-risk analysis, especially when claims and product descriptions express similar functions through different terminology. The ontology-supported semantic configuration improved performance to 0.78 precision, 0.70 recall, and 0.74 F1, indicating that semantic normalization and structured matching already provide a measurable improvement over lexical methods. The full DIKWP network in default mode improved precision to 0.87, recall to 0.82, and F1-score to 0.85 within the controlled synthetic corpus.
To make the calculation of the default-mode performance transparent, Table 19 reports the corresponding confusion matrix. The full DIKWP network correctly identified 41 of the 50 positive cases and correctly rejected 44 of the 50 non-positive cases. It missed 9 positive cases and falsely flagged 6 non-positive cases. These counts yield an accuracy of 0.85 (85/100), a precision of 0.87 (41/47), a recall of 0.82 (41/50), and an F1-score of 0.85.
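As a check on the arithmetic, the default-mode metrics can be recomputed from the four confusion-matrix counts stated above. The helper name `classification_metrics` is illustrative only.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute pair-level metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Counts reported for the full system in default mode.
metrics = classification_metrics(tp=41, fp=6, fn=9, tn=44)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, the values match the reported precision (0.87), recall (0.82), F1-score (0.85), and accuracy (0.85).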
The performance gains are further reported as absolute within-corpus effect sizes in Table 20. Compared with the keyword-matching baseline, the default DIKWP network improved F1 by 0.26. Compared with the semantic-and-ontology configuration without full DIKWP reasoning, it improved F1 by 0.11. These gains indicate measurable analytical benefit within the constructed evaluation setting, but they should not be interpreted as externally generalizable estimates of litigation-level performance.
Because the evaluation corpus contains only 100 synthetic patent–product pairs, the reported gains remain sensitive to case construction, domain coverage, parser behavior, and ontology completeness. As an additional indication of small-sample uncertainty, Wilson 95% confidence intervals were estimated for the default-mode results. The approximate intervals were 0.75–0.94 for precision, 0.69–0.90 for recall, and 0.77–0.91 for accuracy. These intervals characterize uncertainty within the controlled corpus only and should not be treated as confidence intervals for real-world litigation performance.
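The intervals can be reproduced approximately with the standard Wilson score formula. The sketch below assumes z = 1.96 and takes the denominators from the default-mode confusion matrix: 47 flagged pairs for precision, 50 positive pairs for recall, and all 100 pairs for accuracy. The function name `wilson_interval` is illustrative.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


for label, k, n in [("precision", 41, 47), ("recall", 41, 50), ("accuracy", 85, 100)]:
    low, high = wilson_interval(k, n)
    print(f"{label}: {low:.2f}-{high:.2f}")
```

Under these assumptions the rounded bounds agree with the intervals quoted above.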
The improvement from the ontology-based configuration to the full DIKWP configuration is analytically important. It indicates that the performance gain does not arise from semantic normalization alone. The wisdom dimension contributes by enforcing claim-level consistency, especially through the all-elements rule and conflict resolution among partially matched elements. The purpose dimension contributes by controlling the decision threshold and the treatment of borderline correspondences. Within the controlled experimental setting, the DIKWP advantage is therefore not reducible to any single module; it emerges from the coordinated interaction of semantic extraction, structured knowledge, evidential reasoning, and purpose-sensitive orchestration.
The aggressive and conservative configurations further demonstrate the operational value of the purpose dimension. In aggressive mode, recall increased to 0.92, while precision decreased to 0.80. This setting is appropriate when the primary objective is to avoid missing potentially infringing products, even at the cost of additional manual review. In conservative mode, precision increased to 0.93, while recall decreased to 0.75. This setting is more appropriate for clearance analysis, where false positives may lead to unnecessary redesign efforts or legal concern. These two operating points show that the purpose dimension can shift the precision–recall balance within the prototype, although the stability of this behavior requires further validation on larger and externally sourced datasets.
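One way to picture how a purpose setting could move the operating point is sketched below. It is a hypothetical illustration, not the prototype's mechanism: the `PurposeMode` fields, the per-element support scores, and the numeric cut-offs (0.60, 0.75, 0.90) are all assumptions introduced for the example. The sketch keeps the all-elements rule fixed and lets the mode change only the per-element cut-off and whether equivalence candidates may contribute to a flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PurposeMode:
    name: str
    min_support: float       # hypothetical per-element score cut-off
    flag_equivalence: bool   # may equivalence candidates count toward a flag?


# Hypothetical settings; the article does not report actual threshold values.
AGGRESSIVE = PurposeMode("aggressive", min_support=0.60, flag_equivalence=True)
DEFAULT = PurposeMode("default", min_support=0.75, flag_equivalence=False)
CONSERVATIVE = PurposeMode("conservative", min_support=0.90, flag_equivalence=False)


def flag_pair(elements, mode):
    """Apply the all-elements rule under a given purpose mode.

    elements: list of (support_score, is_equivalence_candidate) tuples,
    one per material claim limitation.
    """
    for score, is_equivalence_candidate in elements:
        if is_equivalence_candidate and not mode.flag_equivalence:
            return False  # borderline element is left to expert review
        if score < mode.min_support:
            return False  # one unsupported limitation defeats the flag
    return True


pair = [(0.95, False), (0.80, False)]
for mode in (AGGRESSIVE, DEFAULT, CONSERVATIVE):
    print(mode.name, flag_pair(pair, mode))
```

In this toy setting the same element-level evidence is flagged in aggressive and default modes but not in conservative mode, which mirrors the recall-oriented versus precision-oriented trade-off described above.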
Figure 3 visualizes the precision, recall, and F1-score of the main configurations. The figure shows that the full DIKWP network achieves the best overall balance within the controlled corpus, whereas the two purpose-conditioned variants shift the operating point toward recall or precision according to the analytical objective.
Performance also varied across technical domains. Table 21 reports a domain-disaggregated view across the four categories used in the controlled synthetic corpus. Mechanical and electrical/electronic cases produced stronger results because their structural and component-level limitations were more directly represented in the ontology. Software/process and medical-tool cases were more difficult because they contained more functional wording, process-step ambiguity, specialized terminology, and borderline equivalence patterns. These domain-level results remain exploratory because each domain contains only 25 synthetic pairs.
Overall, the detection results support the feasibility and analytical value of the DIKWP network as a proof-of-concept decision-support framework. They do not establish generalizable litigation-level accuracy. Broader validation will require externally collected patent–product pairs, expert-annotated claim charts, public litigation materials, and larger cross-domain corpora.

6.5. Efficiency and Scalability

The evaluation used a workstation equipped with an 8-core CPU and 16 GB RAM. For a single patent–product pair, the DIKWP network required an average runtime of 2.5 s. Linguistic parsing, entity extraction, and knowledge-graph construction accounted for approximately 2.0 s of this runtime, whereas the reasoning stage typically required less than 0.5 s. This distribution shows that the main computational bottleneck lies in the information dimension rather than in the knowledge or wisdom dimensions.
Table 22 provides a module-level breakdown of latency and resource usage. The table shows that linguistic parsing and graph construction dominate end-to-end latency, while semantic inference consumes the largest share of CPU and memory. Purpose validation and report generation introduce comparatively little overhead. This pattern is consistent with the DIKWP network interpretation of the prototype: the main cost arises when raw text is transformed into structured semantic evidence, whereas the purpose dimension mainly regulates the analytical posture rather than performing heavy computation itself.
Based on this bottleneck profile, future optimization should focus primarily on reducing the cost of linguistic processing, graph construction, and candidate alignment, rather than on legal-rule execution itself. Table 23 summarizes the main engineering bottlenecks and corresponding optimization strategies.
Scalability tests were performed by comparing one product description against increasing numbers of patents, with each patent represented by an average of ten claims. Under this setting, the system processed approximately 100 patents, corresponding to roughly 1000 claims, against a single product in about three minutes. The observed scaling behavior was approximately linear. This is expected because most claim-to-product analyses are independent once the documents have been preprocessed. The same linear pattern was observed when a single patent was compared against many product descriptions.
Memory usage remained moderate throughout the experiments. Ontology loading and graph construction required only a few megabytes per case, and even when processing about one thousand comparisons in batch mode, memory consumption remained within a few hundred megabytes. These results suggest that the current architecture is computationally feasible for offline professional analysis and can be further improved through batching, pre-parsing, or parallel execution. Since the individual patent–product analyses are largely independent, the system is also amenable to straightforward parallelization.

6.6. Error Analysis

A more detailed inspection of the errors provides insight into the current limitations of the framework. Out of the 50 positive cases, the system missed 9, which corresponds to the reported recall of 0.82. Five of these false negatives were caused primarily by linguistic parsing failures. In these cases, complex claim syntax was segmented incorrectly, leading to incomplete or distorted claim-element representations. For example, compound technical noun phrases were occasionally split into separate units, causing the system to search for an incorrect feature in the product description. This finding suggests that the information dimension remains the most fragile part of the current architecture.
The remaining four false negatives were caused by difficult equivalence patterns that were not captured by the existing ontology or rule base. In such cases, the accused product did not use the same term or explicit structure as the patent claim, but the functional similarity was still strong enough that a human expert might consider equivalence. These misses indicate that the present equivalence mechanism remains operational and simplified rather than jurisprudentially complete.
Out of the 50 negative cases, the system falsely flagged 6 as infringing. Together with the 41 true positives, this yields a precision of 41/47 ≈ 0.87, which is consistent with the default-mode confusion matrix reported in Table 19. Most of these false positives occurred when all major components appeared to be present, but the claim actually depended on a more specific limitation that the current prototype did not model deeply enough. Typical examples included numerical ranges, compositional proportions, or process constraints. In such cases, the knowledge graph captured the presence of the relevant elements but did not fully represent the quantitative or procedural condition that restricted the claim scope. This suggests that future work should strengthen support for numerical claim interpretation, method claims, and more detailed legal claim construction.
The real-world pilot corpus produced a different error profile from the controlled synthetic corpus. The most frequent difficulty was not direct contradiction but incomplete product evidence. Manufacturer brochures and webpages often describe product advantages, high-level functions, or commercial specifications without disclosing all structural or process details required by a patent claim. In these cases, the system frequently assigned uncertain or non-mentioned evidence states. This behavior lowered the number of definitive outputs but improved legal caution because the system did not treat non-disclosure as proof of absence.
A second difficulty was domain-specific terminology. Some real product documents used commercial or engineering terms that differed from claim language even when they referred to related structures or functions. The ontology and knowledge-graph layer reduced some of these mismatches, but several cases still required expert review. A third difficulty involved borderline functional substitutions, where a product feature performed a similar function but used a different structure or operating mode. In such cases, the system flagged equivalence concern rather than producing a final infringement conclusion.
Taken together, these errors are likely to become more pronounced in real litigation materials. Broad functional claim terms may cause the ontology matcher to over-generalize correspondences, while highly specific modifiers may be missed if they are embedded in long dependent claims or technical descriptions. Accordingly, the model is expected to perform best as a screening and explanation tool when the available evidence is textually explicit, and to require stronger human-in-the-loop review when claim language is intentionally broad, functionally defined, or dependent on jurisdiction-specific claim construction.
From an explanation perspective, the system performed relatively well. Manual inspection showed that in 98 of 100 quantitative cases the explanation correctly named the decisive matched or unmatched elements. In the remaining two cases, the final label was correct but one extracted element name was imprecise because of a parser error. This finding is encouraging because it shows that explanation quality is closely tied to extraction quality; when the information dimension is correct, the downstream explanation is usually coherent and legally interpretable.

6.7. Contribution of the DIKWP Network

The experimental results allow the role of each DIKWP dimension to be assessed more explicitly. The data and information dimensions provide the foundational semantic evidence. When parsing quality deteriorates, the rest of the network is forced to reason over incomplete or distorted representations. This was visible in the false-negative cases caused by claim-segmentation errors. In practical terms, the DIKWP network cannot compensate indefinitely for poor information extraction; errors introduced early in the process propagate unless they are corrected through feedback.
The knowledge dimension contributes mainly through ontology-based normalization and semantic alignment. When this dimension was weakened in the ablated configurations, recall decreased substantially because synonymy, taxonomy, and part–whole relations were no longer captured reliably. For example, expressions such as back support and backrest, or sitting surface and seat, are easy for human experts to align but may be missed by purely lexical systems. The ontology and knowledge graph therefore play a decisive role in bridging the linguistic gap between patent language and product descriptions.
The wisdom dimension contributes by transforming local semantic correspondences into a legally meaningful decision. Without this dimension, the system can report that several elements appear semantically related, but it cannot reliably determine whether the claim as a whole is covered. The performance difference between the ontology-only configuration and the full system shows that claim-level logical consistency materially improves precision. In particular, the all-elements rule prevented the DIKWP network from treating limited semantic correspondences as sufficient evidence of infringement when the broader claim structure did not support that conclusion.
The purpose dimension contributes by controlling the operating posture of the system. Its effect is visible in the contrast between the aggressive and conservative configurations. More importantly, the purpose dimension does not merely adjust a fixed threshold after the fact. It regulates the interpretation of uncertain matches, the activation of equivalence-oriented reasoning, and the framing of the final report. This is why the present system is better understood as a DIKWP network than as a fixed stack of modules. The purpose dimension can influence how the wisdom dimension treats ambiguity, and the wisdom dimension can in turn request additional support from the knowledge or information dimensions when the available evidence is insufficient. The empirical behavior of the system is therefore consistent with a recurrent and purpose-sensitive networked architecture.

7. Discussion

The results of this study suggest that patent infringement detection benefits substantially from combining semantic text processing with structured knowledge representation and purpose-aware reasoning. The experimental findings indicate that purely lexical or keyword-based matching is insufficient for claim-level infringement assessment, especially when product descriptions and patent claims express similar technical content through different terminology or functional phrasing. By contrast, the proposed framework improves analytical reliability by connecting patent-oriented NLP, ontology-based normalization, knowledge-graph construction, and rule-guided reasoning within a unified DIKWP network. This result is consistent with broader developments in legal AI and explainable AI, where structured knowledge and transparent reasoning are increasingly recognized as necessary in high-stakes decision-support systems.
A central implication of this work is that the knowledge dimension is not merely an auxiliary enhancement to text analytics, but a core requirement for legally meaningful patent analysis. The ontology and knowledge graph enable the system to preserve structural, functional, and relational information that would otherwise be weakened or lost in flat text similarity approaches. This is particularly important in patent law, where infringement depends on the presence or absence of specific claim elements and their relations rather than on global topical resemblance alone. In this respect, the present study supports the view that hybrid symbolic–statistical architectures are especially suitable for legal-technical domains, because they combine the semantic flexibility of NLP with the explicit interpretability of formal knowledge models [13].
The study also shows the methodological value of treating DIKWP as a networked design paradigm rather than as a hierarchical checklist. In the proposed system, purpose does not appear only at the end of the reasoning process. Instead, it functions as an active control dimension that regulates matching sensitivity, the treatment of borderline correspondences, and the form of the final analytical report. This became visible in the evaluation, where aggressive and conservative operating modes produced different precision–recall trade-offs without changing the underlying evidential basis. The effect is conceptually important because it demonstrates that the same semantic evidence can support different analytical postures depending on whether the user’s objective is enforcement, clearance, or design-around support. In this sense, the DIKWP network contributes not only to system architecture but also to analytical controllability.
Another important contribution concerns explainability and user trust. Patent professionals are unlikely to rely on AI tools that produce conclusions without traceable justification. The present framework addresses this issue by preserving an evidence path from raw text to extracted elements, from extracted elements to ontology-based correspondences, and from those correspondences to claim-level reasoning outcomes. This layered evidence trace is particularly valuable in legal settings because it supports both user confidence and post hoc review. If the system reaches a plausible but incorrect conclusion, the error can usually be localized to a specific stage, such as claim parsing, ontology mapping, or equivalence reasoning. This is an important practical advantage over opaque predictive systems, especially in regulated domains where justification and accountability matter as much as output accuracy.
From a computational perspective, the evaluation suggests that the proposed DIKWP network is feasible for offline analytical use. The main runtime burden lies in the information dimension, particularly in linguistic parsing and entity extraction, whereas ontology-based reasoning and graph-level matching remain relatively lightweight in the current implementation. This indicates that future optimization should focus more on patent-specific NLP efficiency than on reducing the cost of rule execution. A pragmatic deployment strategy would therefore use staged analysis: a lightweight retrieval or filtering component could first narrow the candidate set, after which the full semantic and reasoning workflow would be applied only to the most relevant patent–product pairs. Such a two-stage approach would preserve interpretability while making the system more scalable for industrial use.
The present work also clarifies the role of the wisdom dimension. In practical terms, this dimension corresponds to the part of the architecture that transforms local semantic correspondences into a legally meaningful assessment. The ontology may indicate that two concepts are related, but that does not by itself establish infringement. Claim-level reasoning still requires consistency checking, element aggregation, and treatment of ambiguity. The wisdom dimension performs this function by enforcing the all-elements rule, handling uncertain matches, and determining whether function-based equivalence should be considered. It is precisely this layer that prevents the system from confusing semantic relatedness with legal sufficiency.
Although the architecture is informed by artificial consciousness theory, the contribution should be interpreted cautiously. The present system does not claim machine consciousness in any phenomenological sense. Rather, the relevance of artificial consciousness lies in the architectural emphasis on purpose-aware coordination, reflective explanation, and adaptive control. In this limited engineering sense, the DIKWP network offers a useful way to operationalize higher-order design principles without making ontologically strong claims about consciousness itself. This restrained interpretation preserves the conceptual contribution while avoiding unnecessary overstatement [42,43,44].
The addition of the real-world pilot corpus strengthens the external validity of the evaluation, while also showing that patent–product analysis under realistic evidentiary conditions is more ambiguous than controlled synthetic testing. Real patent claims often contain broad functional expressions, nested limitations, numerical ranges, and terminology shaped by drafting and prosecution strategy. Real product descriptions, by contrast, are frequently incomplete, promotional, or selectively drafted, and may omit structural, compositional, or process details that are legally material to infringement analysis. The pilot evaluation therefore suggests that an infringement-support system should not be evaluated only by binary classification accuracy, but also by its ability to distinguish supported evidence, contradicted evidence, uncertain evidence, and non-mentioned limitations.
The real-world pilot should nevertheless be interpreted cautiously. Its scale remains limited, and publicly available product evidence does not always disclose all product features that would be material in litigation. The pilot therefore should not be treated as a litigation-level benchmark. Rather, it serves as an external-validity check showing that the DIKWP network can preserve claim-element reasoning, identify unsupported or non-mentioned limitations, and route uncertain or borderline cases to expert review under more realistic evidentiary conditions.
At the same time, several limitations should be acknowledged. First, the current system handles relatively straightforward claim structures more effectively than highly intricate dependent claims or method claims. Patent claims often include nested constraints, numerical ranges, process steps, and functionally defined limitations that require more detailed semantic and legal modeling than the current prototype provides. Second, although the framework includes a simplified equivalence mechanism, this component should not be confused with a full implementation of doctrine-of-equivalents reasoning in its jurisdiction-specific legal complexity. Third, the current ontology remains limited in scope and would require significant extension for broader deployment across domains such as biotechnology, chemistry, or telecommunications.
A related limitation concerns uncertainty. The present system can assign confidence-oriented outputs and flag borderline cases, but it does not yet provide a rich formal treatment of evidential uncertainty, incompleteness, or conflicting sources. In real-world practice, product descriptions are often incomplete or strategically vague, and missing mention should not always be interpreted as absence. Future versions of the architecture should therefore make a clearer distinction between explicit negative evidence, uncertain evidence, and non-mentioned features. This would improve both legal robustness and practical usability.
The system also faces the classic knowledge acquisition bottleneck. Building and maintaining legal and technical ontologies remains labor-intensive, and performance depends heavily on ontology quality. Although automatic extraction, lexical resources, learned embeddings, and graph-completion methods can accelerate ontology expansion, the final incorporation of legal and technical concepts should remain governed by expert validation. This is not merely an engineering limitation but a governance requirement in legal AI, because incorrect ontology updates may directly affect infringement-risk assessment. Accordingly, future ontology maintenance should combine semi-automatic candidate generation with provenance tracking, consistency checking, and expert review.
Recent advances in large language models also raise an important question: whether a general-purpose LLM could replace part of the proposed DIKWP network. In practice, LLMs may be useful in extraction, paraphrase normalization, or candidate correspondence generation. However, their outputs are probabilistic and may be inconsistent or insufficiently constrained by legal doctrine. For this reason, the more promising direction is not replacement, but careful integration. An LLM can assist with semantic flexibility, while ontology-based validation and rule-guided reasoning maintain determinism, traceability, and legal consistency. This kind of guarded hybridization would be fully compatible with the DIKWP network and may strengthen both coverage and robustness.
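The guarded integration described above can be sketched as follows. The example assumes that a statistical proposer (for instance an LLM) has already returned candidate correspondences with confidence scores; no real model is called. The `ONTOLOGY` dictionary, the `Candidate` type, and the `validate` function are hypothetical and serve only to show that a high score alone does not bypass symbolic validation.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Hypothetical mini-ontology: normalized claim concept -> accepted product-side variants.
ONTOLOGY: Dict[str, Set[str]] = {
    "backrest": {"backrest", "back support", "seat back"},
    "biasing spring": {"biasing spring", "coil spring"},
}


@dataclass(frozen=True)
class Candidate:
    claim_concept: str
    product_phrase: str
    score: float  # confidence reported by the statistical proposer (assumed given)


def validate(
    candidates: List[Candidate], ontology: Dict[str, Set[str]] = ONTOLOGY
) -> Tuple[List[Candidate], List[Candidate]]:
    """Split candidates into ontology-validated ones and ones routed to expert review.

    The score is kept for reporting but is deliberately ignored in the decision:
    only an explicit ontology entry can turn a proposal into an accepted match.
    """
    accepted, review = [], []
    for cand in candidates:
        variants = ontology.get(cand.claim_concept, set())
        if cand.product_phrase.lower() in variants:
            accepted.append(cand)
        else:
            review.append(cand)
    return accepted, review


proposals = [
    Candidate("backrest", "back support", 0.71),
    Candidate("biasing spring", "elastic band", 0.93),  # plausible, but not in the ontology
]
accepted, review = validate(proposals)
```

Here the second proposal is routed to review despite its higher score, which mirrors the division of labor between statistical flexibility and deterministic, traceable validation.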
Finally, although this study focuses on patent infringement detection, the broader architectural principle may generalize to other legal domains. Tasks such as claim construction support, freedom-to-operate analysis, contract compliance checking, and trademark risk analysis all require some combination of semantic extraction, structured legal knowledge, explicit reasoning, and purpose-sensitive reporting. The specific ontology and rule base would differ, but the DIKWP-network formulation remains applicable. This suggests that the present work may have value beyond patents as a more general model for explainable legal-semantic AI.

8. Conclusions and Future Work

This paper presented a semantic AI framework for patent infringement detection grounded in the DIKWP network and informed, in a limited architectural sense, by artificial consciousness theory. The framework transforms raw patent and product descriptions into structured, explainable infringement-oriented assessments by coordinating five interacting semantic dimensions: data, information, knowledge, wisdom, and purpose. Unlike conventional linear pipelines, the proposed system is formulated as a recurrent network in which semantic extraction, knowledge organization, decision logic, and task objectives interact during analysis.
The main contribution is to show that patent infringement detection can be modeled as an integrated process of patent-oriented NLP, ontology-based semantic alignment, knowledge-graph construction, legal rule-guided reasoning, uncertainty handling, and purpose-aware orchestration. At the architectural level, the study instantiates the DIKWP network as a modular legal-AI system. At the methodological level, it develops a claim-to-product semantic workflow capable of preserving claim-limitation structure, handling negative evidence, and producing interpretable reasoning traces. At the implementation level, it provides a proof-of-concept prototype and illustrative outputs. At the evaluation level, the study provides two forms of evidence. The controlled synthetic corpus shows that the proposed DIKWP network improves over keyword-based and ontology-only baselines under known positive, negative, and near-miss conditions. The real-world pilot corpus further examines whether the framework can preserve claim-element reasoning when applied to public patent claims and real product technical descriptions under more realistic evidentiary conditions.
Theoretically, the study further clarifies the epistemological division of labor between statistical candidate generation and symbolic legal validation. Statistical components provide semantic flexibility and uncertainty estimates, while symbolic components preserve legal constraints, evidentiary provenance, and auditable reasoning. This guarded hybridization is especially important for patent infringement analysis because legally sufficient claim coverage cannot be reduced to document-level similarity or unconstrained predictive optimization.
The literature synthesis situates the proposed framework within patent retrieval, patent NLP, transformer-based representation learning, knowledge graphs, ontology engineering, claim-construction doctrine, explainable AI, legal information extraction, and LLM reliability research. This positioning clarifies the intended role of the system: it is a decision-support assistant rather than an autonomous legal decision maker. Its goal is to augment patent professionals by automating large-scale element comparison, surfacing plausible correspondences, identifying missing or uncertain limitations, and producing reviewable evidence traces.
Future research should extend this pilot validation to larger expert-annotated patent–product datasets, richer public patent examination and litigation materials, multimodal patent drawings and product images, and jurisdiction-specific claim-construction and prosecution-history evidence. Additional work should improve claim parsing for complex legal syntax, strengthen numerical and method-claim interpretation, formalize evidential uncertainty and source provenance, optimize deep NLP components through distillation and hardware acceleration, and explore human-in-the-loop adaptive learning. More broadly, the study suggests that cognitive-inspired frameworks such as DIKWP can make a practical contribution to legal AI when translated into explicit system design principles rather than treated only as abstract theory.

Author Contributions

Conceptualization, Z.G. and Y.D.; methodology, Z.G. and Y.D.; formal analysis, Z.G.; writing—original draft, Z.G.; writing—review and editing, Y.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 72462016; the Hainan Province Health Science and Technology Innovation Joint Program, grant number WSJK2024QN025; and the Hainan Province Key R&D Program, grant numbers ZDYF2022GXJS007 and ZDYF2022GXJS010.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The controlled synthetic corpus construction protocol, summary statistics, label distribution, and representative examples are included in the article. Summary metadata for the real-world pilot corpus, including patent identifiers, product-evidence source categories, annotation labels, and system outputs, can be provided where legally and practically permissible. Public patent and product materials remain available from their original public sources. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Jiang, L.; Goetz, S.M. Natural language processing in the patent domain: A survey. Artif. Intell. Rev. 2025, 58, 214.
  2. Ali, A.; Tufail, A.; De Silva, L.C.; Abas, P.E. Innovating Patent Retrieval: A Comprehensive Review of Techniques, Trends, and Challenges in Prior Art Searches. Appl. Syst. Innov. 2024, 7, 91.
  3. Bekamiri, H.; Hain, D.S.; Jurowetzki, R. PatentSBERTa: A deep NLP based hybrid model for patent distance and classification using augmented SBERT. Technol. Forecast. Soc. Change 2024, 206, 123536.
  4. Lee, J.-S.; Hsiang, J. Patent classification by fine-tuning BERT language model. World Pat. Inf. 2020, 61, 101965.
  5. Park, H.; Yoon, J.; Kim, K. Identifying patent infringement using SAO based semantic technological similarities. Scientometrics 2012, 90, 515–529.
  6. Lee, C.; Song, B.; Park, Y. How to assess patent infringement risks: A semantic patent claim analysis using dependency relationships. Technol. Anal. Strateg. Manag. 2013, 25, 23–38.
  7. Kim, S.; Yoon, B. Patent infringement analysis using a text mining technique based on SAO structure. Comput. Ind. 2021, 125, 103379.
  8. Wang, X.; Ren, H.; Chen, Y.; Liu, Y.; Qiao, Y.; Huang, Y. Measuring patent similarity with SAO semantic analysis. Scientometrics 2019, 121, 1–23.
  9. Teng, H.; Wang, N.; Zhao, H.; Hu, Y.; Jin, H. Enhancing semantic text similarity with functional semantic knowledge (FOP) in patents. J. Informetr. 2024, 18, 101467.
  10. Abbas, A.; Zhang, L.; Khan, S.U. A literature review on the state-of-the-art in patent analysis. World Pat. Inf. 2014, 37, 3–13.
  11. Lupu, M.; Hanbury, A. Patent Retrieval. Found. Trends Inf. Retr. 2013, 7, 1–97.
  12. Wang, N.; Wan, Z.; Zhao, H.; Hu, Y. New patent text similarity methods with a comprehensive understanding of SAO semantics. World Pat. Inf. 2025, 83, 102403.
  13. Zhai, D.; Zhai, L.; Li, M.; He, X.; Xu, S.; Wang, F. Patent representation learning with a novel design of patent ontology: Case study on PEM patents. Technol. Forecast. Soc. Change 2022, 183, 121912.
  14. Trappey, A.J.C.; Lin, G.-B.; Hung, L.-P. Intelligent Text Mining for Ontological Knowledge Graph Refinement and Patent Portfolio Analysis—Case Study of Net-Zero Data Center Innovation Management. Information 2024, 15, 374.
  15. Lu, Y.; Tong, X.; Xiong, X.; Zhu, H. Knowledge graph enhanced citation recommendation model for patent examiners. Scientometrics 2024, 129, 2181–2203.
  16. Xiao, Y.; Li, C.; Thürer, M. A patent recommendation method based on KG representation learning. Eng. Appl. Artif. Intell. 2023, 126, 106722.
  17. Chen, H.; Deng, W. Interpretable patent recommendation with knowledge graph and deep learning. Sci. Rep. 2023, 13, 2586.
  18. Jiang, S.; Yang, J.; Xie, J.; Xu, X.; Dou, Y.; Jing, L. Product innovation design approach driven by implicit relationship completion via patent knowledge graph. Adv. Eng. Inform. 2024, 61, 102530.
  19. Jing, L.; Zhou, C.; Feng, D.; Dou, Y.; Fan, X.; Jiang, S. A Patent Infringement Analysis Approach Based on Patent Knowledge Graph Driven Fusion of Graph and Image Similarity. IEEE Access 2025, 13, 29944–29968.
  20. Hogan, A.; Blomqvist, E.; Cochez, M.; D’amato, C.; Melo, G.D.; Gutierrez, C.; Kirrane, S.; Gayo, J.E.L.; Navigli, R.; Neumaier, S.; et al. Knowledge Graphs. ACM Comput. Surv. 2021, 54, 71.
  21. Ji, S.; Pan, S.; Cambria, E.; Marttinen, P.; Yu, P.S. A Survey on Knowledge Graphs: Representation, Acquisition, and Applications. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 494–514.
  22. Nickel, M.; Murphy, K.; Tresp, V.; Gabrilovich, E. A Review of Relational Machine Learning for Knowledge Graphs. Proc. IEEE 2016, 104, 11–33.
  23. Confalonieri, R.; Kutz, O.; Calvanese, D.; Alonso-Moral, J.M.; Zhou, S.-M. The role of ontologies and knowledge in Explainable AI. Semant. Web 2024, 15, 933–936.
  24. Richmond, K.M.; Muddamsetty, S.M.; Gammeltoft-Hansen, T.; Olsen, H.P.; Moeslund, T.B. Explainable AI and Law: An Evidential Survey. Digit. Soc. 2024, 3, 1.
  25. Premasiri, D.; Ranasinghe, T.; Mitkov, R.; El-Haj, M.; Frommholz, I. Survey on legal information extraction: Current status and open challenges. Knowl. Inf. Syst. 2025, 67, 11287–11358.
  26. Li, G.-K.J.; Trappey, C.V.; Trappey, A.J.C.; Li, A.A.S. Ontology-based knowledge representation and semantic topic modeling for intelligent trademark legal precedent research. World Pat. Inf. 2022, 68, 102098.
  27. Guidotti, R.; Monreale, A.; Ruggieri, S.; Turini, F.; Giannotti, F.; Pedreschi, D. A Survey of Methods for Explaining Black Box Models. ACM Comput. Surv. 2018, 51, 93.
  28. Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 2019, 1, 206–215.
  29. Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.-T.; Rocktäschel, T.; et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Adv. Neural Inf. Process. Syst. 2020, 33, 9459–9474.
  30. Bommasani, R.; Hudson, D.A.; Adeli, E.; Altman, R.; Arora, S.; von Arx, S.; Bernstein, M.S.; Bohg, J.; Bosselut, A.; Brunskill, E.; et al. On the opportunities and risks of foundation models. arXiv 2021, arXiv:2108.07258.
  31. Dahl, M.; Magesh, V.; Suzgun, M.; Ho, D.E. Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models. J. Leg. Anal. 2024, 16, 64–93.
  32. Dehghani, F.; Dehghani, R.; Naderzadeh Ardebili, Y.; Rahnamayan, S. Large Language Models in Legal Systems: A Survey. Humanit. Soc. Sci. Commun. 2025, 12, 1977.
  33. Huang, L.; Yu, W.; Ma, W.; Zhong, W.; Feng, Z.; Wang, H.; Chen, Q.; Peng, W.; Feng, X.; Qin, B.; et al. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Trans. Inf. Syst. 2025, 43, 1–55.
  34. Duan, Y. Bridging the Gap between Purpose-Driven Frameworks and Artificial General Intelligence. Appl. Sci. 2023, 13, 10747.
  35. Mei, Y.; Duan, Y. The DIKWP (Data, Information, Knowledge, Wisdom, Purpose) Revolution: A New Horizon in Medical Dispute Resolution. Appl. Sci. 2024, 14, 3994.
  36. Duan, Y.; Guo, Z. DIKWP-Driven Artificial Consciousness for IoT-Enabled Smart Healthcare Systems. Appl. Sci. 2025, 15, 8508.
  37. Wu, K.; Duan, Y. Modeling and Resolving Uncertainty in DIKWP Model. Appl. Sci. 2024, 14, 4776.
  38. Mei, Y.; Duan, Y. DIKWP Semantic Judicial Reasoning: A Framework for Semantic Justice in AI and Law. Information 2025, 16, 640.
  39. Mei, Y.; Duan, Y. A Review of Personalized Semantic Secure Communications Based on the DIKWP Model. Electronics 2025, 14, 3671.
  40. Mei, Y.; Duan, Y. Bidirectional Semantic Communication Between Humans and Machines Based on Data, Information, Knowledge, Wisdom, and Purpose Artificial Consciousness. Appl. Sci. 2025, 15, 1103.
  41. Mei, Y.; Duan, Y.; Wu, K.; Li, Y. A DIKWP white-box semantic distributed learning approach for resource-constrained edge devices in web environments. Int. J. Web Inf. Syst. 2026, 22, 17–43.
  42. Farisco, M.; Evers, K.; Changeux, J.-P. Is artificial consciousness achievable? Lessons from the human brain. Neural Netw. 2024, 180, 106714.
  43. Chella, A. Artificial consciousness: The missing ingredient for ethical AI? Front. Robot. AI 2023, 10, 1270460.
  44. Evers, K.; Farisco, M.; Chatila, R.; Earp, B.D.; Freire, I.T.; Hamker, F.; Nemeth, E.; Verschure, P.F.M.J.; Khamassi, M. Preliminaries to artificial consciousness: A multidimensional heuristic approach. Phys. Life Rev. 2025, 52, 180–193.
  45. Chi, Z.; Lin, W.; Xiao, Z.; Li, H.; Chen, W.; Liu, X. A review of patent analysis based on machine learning. Appl. Soft Comput. 2026, 186, 114063.
  46. Krestel, R.; Chikkamath, R.; Hewel, C.; Risch, J. A survey on deep learning for patent analysis. World Pat. Inf. 2021, 65, 102035.
  47. Waterstraat, J.; Walter, L. Designing tailored patent search approaches—A case study on nursing care technology. World Pat. Inf. 2026, 84, 102420.
  48. Sharma, E.; Li, C.; Wang, L. BIGPATENT: A Large-Scale Dataset for Abstractive and Coherent Summarization. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics; Association for Computational Linguistics: Florence, Italy, 2019; pp. 2204–2213.
  49. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers); Association for Computational Linguistics: Minneapolis, MN, USA, 2019; pp. 4171–4186.
  50. Reimers, N.; Gurevych, I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP); Association for Computational Linguistics: Hong Kong, China, 2019; pp. 3982–3992.
  51. Beltagy, I.; Lo, K.; Cohan, A. SciBERT: A Pretrained Language Model for Scientific Text. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP); Association for Computational Linguistics: Hong Kong, China, 2019; pp. 3615–3620.
  52. Chalkidis, I.; Fergadiotis, M.; Malakasiotis, P.; Aletras, N.; Androutsopoulos, I. LEGAL-BERT: The Muppets straight out of Law School. In Findings of the Association for Computational Linguistics: EMNLP 2020; Association for Computational Linguistics: Hong Kong, China, 2020; pp. 2898–2904.
  53. Chalkidis, I.; Jana, A.; Hartung, D.; Bommarito, M.; Androutsopoulos, I.; Katz, D.; Aletras, N. LexGLUE: A Benchmark Dataset for Legal Language Understanding in English. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); Association for Computational Linguistics: Dublin, Ireland, 2022; pp. 4310–4330.
  54. Schmitt, V.J.; Walter, L.; Schnittker, F.C. Assessment of patentability by means of semantic patent analysis—A mathematical-logical approach. World Pat. Inf. 2023, 73, 102182.
  55. Chen, L.; Xu, S.; Zhu, L.; Zhang, J.; Yang, G.; Xu, H. A deep learning based method benefiting from characteristics of patents for semantic relation classification. J. Informetr. 2022, 16, 101312.
  56. Gruber, T.R. A translation approach to portable ontology specifications. Knowl. Acquis. 1993, 5, 199–220.
  57. Nardi, J.C.; Barcellos, M.P.; Almeida, J.P.A. An Analysis of Ontologies for the Intellectual Property Domain. Appl. Ontol. 2026, 21, 49–77.
  58. Li, C.; Li, W.; Hong, Y.; Xiang, H. A patent retrieval method and system based on double classification. Inf. Sci. 2024, 672, 120659.
  59. Barredo Arrieta, A.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; Garcia, S.; Gil-Lopez, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 2020, 58, 82–115.
  60. Miller, T. Explanation in artificial intelligence: Insights from the social sciences. Artif. Intell. 2019, 267, 1–38.
  61. Zhong, H.; Xiao, C.; Tu, C.; Zhang, T.; Liu, Z.; Sun, M. How Does NLP Benefit Legal System: A Summary of Legal Artificial Intelligence. Proc. ACL 2020, 5218–5230.
  62. Sakthivel, C.; Jose, J. Multifeature fusion for claim scope-aware litigation risk prediction for patent drafts. PeerJ Comput. Sci. 2025, 11, e3069.
  63. Herzberg, A. Assessing the standard-essentiality of 5G technology patents by means of generative artificial intelligence. World Pat. Inf. 2025, 81, 102363.
  64. Trappey, A.J.C.; Chou, S.-C.; Li, G.-K.J. Patent litigation mining using a large language model—Taking unmanned aerial vehicle development as the case domain. World Pat. Inf. 2025, 80, 102332.
  65. Mittelstadt, B.; Russell, C.; Wachter, S. Explaining Explanations in AI. In Proceedings of the Conference on Fairness, Accountability, and Transparency; Association for Computing Machinery: New York, NY, USA, 2019; pp. 279–288.
  66. Barbierato, E.; Gatti, A.; Incremona, A.; Pozzi, A.; Toti, D. Breaking Away From AI: The Ontological and Ethical Evolution of Machine Learning. IEEE Access 2025, 13, 55627–55647.
  67. Li, R.; Yu, W.; Wang, S. Research on Chinese patent classification based on structured features. Sci. Rep. 2025, 15, 18036.
Figure 1. Conceptual architecture of the proposed DIKWP network for purpose-aware patent infringement analysis.
Figure 2. Operational semantic processing workflow and legal interpretability trace of the proposed DIKWP network.
Figure 3. Precision, recall, and F1-score of different system configurations.
Table 1. Epistemological roles of symbolic and statistical components in the DIKWP network.

| Component | Method Type | Epistemological Assumption | Primary Output | Legal Control Mechanism |
| --- | --- | --- | --- | --- |
| Patent-oriented NLP, NER, dependency parsing, and SAO extraction | Statistical or rule-assisted statistical | Linguistic patterns and learned contextual representations reveal candidate entities, actions, and relations | Candidate claim elements and product features | Output must retain provenance and remain subject to element-level review |
| Embedding-based semantic similarity | Statistical | Distributional proximity suggests semantic relatedness or paraphrase potential | Candidate correspondences with confidence scores | Similarity cannot establish legal sufficiency without rule-based validation |
| Ontology and lexical resources | Symbolic | Domain concepts, synonyms, taxonomic relations, and technical functions can be explicitly represented | Normalized concepts and typed semantic relations | Prevents unsupported synonym expansion and preserves material limitations |
| Patent knowledge graph | Symbolic with optional graph learning | Claim elements, product features, and evidence relations can be represented as traceable graph structures | Claim–product correspondence graph | Maintains source provenance, relation structure, and explanation paths |
| Legal rule engine | Symbolic and normative | Legal sufficiency depends on explicit doctrinal constraints rather than global similarity | Evidence states and claim-level outcomes | Enforces all-elements reasoning, negative-evidence handling, and prosecution-history gates |
| Purpose layer | Control and decision policy | Analytical objectives affect acceptable risk posture, review routing, and explanation granularity | Enforcement, clearance, design-around, or expert-review operating modes | Configures thresholds while preserving decision-support rather than autonomous adjudication |
Table 2. Evidence states for ambiguous claim-element matching.

| Evidence State | Meaning | System Action |
| --- | --- | --- |
| Supported | Claim element is supported directly or through ontology mediation, and all material constraints are satisfied | Count as matched |
| Contradicted | Product text explicitly denies the feature or conflicts with a material constraint | Treat as strong negative evidence |
| Uncertain | Text is vague, broad, incomplete, or under-specified | Flag for expert review |
| Non-mentioned | Feature is not described in the available product evidence | Do not infer absence automatically |
| Equivalence-candidate | Non-identical feature may satisfy function–way–result screening but is not literal support | Route to doctrine-of-equivalents review |
Table 3. Representative failure and boundary cases for element matching and functional-equivalence screening.

| Case Type | Claim Element | Product Evidence | System Interpretation |
| --- | --- | --- | --- |
| Explicit negative evidence | backrest attached to the seat | the product does not have any back support | LiteralSupport = 0 because the product text explicitly contradicts the required element |
| Quantity or range mismatch | heating temperature between 40–60 °C | heating temperature reaches 80 °C | LiteralSupport = 0 because the quantitative limitation is not satisfied |
| Relation mismatch | sensor mounted inside the housing | sensor mounted externally on the housing | LiteralSupport = 0 because the required spatial relation is materially different |
| Process-order mismatch | coating applied before curing | curing performed before coating | LiteralSupport = 0 because the order of process steps is inconsistent with the claim limitation |
| Modifier mismatch | stainless-steel cutting blade | plastic cutting blade | LiteralSupport = 0 if the material modifier is treated as legally material |
| Functional non-equivalence | spring urging arms apart | magnetic latch holding arms together | SameFunction = 0 because the function and result are materially different |
| Equivalence candidate only | biasing spring urging arms apart | elastic band urging arms apart | SameFunction = 1 may trigger an equivalence-candidate flag, but it is not counted as literal support and expert review is required before any legal equivalence conclusion |
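Two of the boundary cases in Table 3, the quantity or range mismatch and the process-order mismatch, lend themselves to a compact illustration. The Python sketch below is an assumption-laden simplification: the function names `range_support` and `order_support` are hypothetical, the inputs are assumed to be already extracted and normalized, and the returned value only mirrors the LiteralSupport indicator used in the table.

```python
from typing import Dict, Sequence


def range_support(required_min: float, required_max: float, observed: float) -> int:
    """LiteralSupport for a numerical limitation: 1 only if the observed
    value lies inside the claimed closed range."""
    return 1 if required_min <= observed <= required_max else 0


def order_support(required_order: Sequence[str], observed_order: Sequence[str]) -> int:
    """LiteralSupport for a process-order limitation: 1 only if every
    consecutive pair of required steps appears in the same relative order.

    A return value of 0 means only that literal support is not established;
    separating contradiction from non-mention is a distinct step.
    """
    position: Dict[str, int] = {step: i for i, step in enumerate(observed_order)}
    for earlier, later in zip(required_order, required_order[1:]):
        if earlier not in position or later not in position:
            return 0
        if position[earlier] >= position[later]:
            return 0
    return 1


# Cases taken from Table 3.
heating = range_support(40.0, 60.0, 80.0)                            # claimed 40-60 °C, product 80 °C
coating = order_support(["coating", "curing"], ["curing", "coating"])  # reversed step order
```

Both calls yield 0 for the Table 3 examples, whereas an observed temperature of 55 °C or an observed sequence in which coating precedes curing would yield 1.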
Table 4. Structured representation of decomposed claim limitations.

| Field | Meaning | Example |
| --- | --- | --- |
| $\mathit{id}_i$ | Limitation identifier | Claim 1, limitation 1.b |
| $\mathit{type}_i$ | Limitation type | structural, functional, numerical, process-order, negative |
| $\mathit{core}_i$ | Normalized core concept | backrest, biasing spring, image block, sensor |
| $\mathit{rel}_i$ | Required relations | attached_to(backrest, seat), mounted_inside(sensor, housing) |
| $\mathit{mod}_i$ | Material, spatial, shape, or purpose modifiers | stainless-steel, external, collapsible, configured to filter |
| $\mathit{quant}_i$ | Quantity, range, threshold, or plurality constraints | plurality, at least one, 40–60 °C |
| $\mathit{seq}_i$ | Method-step order | coating before curing |
| $\mathit{neg}_i$ | Negative or exclusionary language | without adhesive, not electrically connected |
| $\mathit{dep}_i$ | Dependent-claim inheritance | Claim 3 depends from Claim 1 |
| $\mathit{constr}_i$ | Claim-construction source | claim language, specification excerpt, definition |
| $\mathit{pros}_i$ | Prosecution-history constraint | narrowing amendment, disclaimer, excluded equivalent |
| $\mathit{mat}_i$ | Materiality flag | material limitation for all-elements reasoning |
| $\mathit{prov}_i$ | Provenance | source sentence, claim number, document section |
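The fields of Table 4 map naturally onto a record type. The following Python dataclass is a minimal sketch of such a record; the class name `Limitation`, the attribute names, the chosen types, and the default values are illustrative assumptions rather than the prototype's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Limitation:
    """One decomposed claim limitation, following the fields of Table 4."""
    lim_id: str                                   # id: limitation identifier
    lim_type: str                                 # type: structural, functional, numerical, ...
    core: str                                     # core: normalized core concept
    rel: List[str] = field(default_factory=list)  # rel: required relations
    mod: List[str] = field(default_factory=list)  # mod: material, spatial, shape, or purpose modifiers
    quant: Optional[str] = None                   # quant: quantity, range, threshold, or plurality
    seq: Optional[str] = None                     # seq: method-step order
    neg: Optional[str] = None                     # neg: negative or exclusionary language
    dep: Optional[str] = None                     # dep: dependent-claim inheritance
    constr: Optional[str] = None                  # constr: claim-construction source
    pros: Optional[str] = None                    # pros: prosecution-history constraint
    mat: bool = True                              # mat: materiality flag (default is an assumption)
    prov: str = ""                                # prov: provenance


# Example populated from the table's own example values.
backrest = Limitation(
    lim_id="Claim 1, limitation 1.b",
    lim_type="structural",
    core="backrest",
    rel=["attached_to(backrest, seat)"],
    constr="claim language",
    prov="claim number",
)
```

Keeping the provenance and materiality fields on every record is what allows later reasoning steps to cite the source of each limitation and to restrict all-elements checks to material limitations.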
Table 5. Evidence states and claim-level consequences for missing or ambiguous limitations.

| Evidence State | Condition | Element-Level Result | Claim-Level Consequence |
| --- | --- | --- | --- |
| Supported | Product evidence supports the limitation directly, through a synonym, or through ontology mediation, and all material constraints are satisfied | Matched | May support literal coverage if all other material limitations are also supported |
| Contradicted | Product evidence explicitly denies the limitation or conflicts with a material relation, modifier, range, quantity, or sequence constraint | Not matched | Strong negative evidence; literal support fails for the claim |
| Non-mentioned | Product evidence does not disclose the limitation | Not matched, but absence is not inferred automatically | Insufficient evidence unless additional evidence is retrieved |
| Uncertain | Product evidence is vague, incomplete, or technically under-specified | Expert-review flag | No definitive positive or negative conclusion |
| Equivalence-candidate | Product feature is not identical literally or through ontology mediation, but may perform a comparable function in a comparable way with a comparable result | Expert-review flag; not literal support | Doctrine-of-equivalents review may be triggered, subject to prosecution-history and legal constraints |
Table 6. Legal matching hierarchy used by the prototype.

| Match Category | Required Condition | System Output | Legal Interpretation |
|---|---|---|---|
| Literal lexical identity | Product feature uses the same normalized term and satisfies all material relations, modifiers, quantities, ranges, and sequence constraints | Literal support | Supports literal coverage at the element level |
| Ontology-mediated literal support | Product expression is an accepted synonym, subtype, or domain-recognized variant, and all material constraints remain satisfied | Literal support with ontology explanation | May support literal coverage because semantic normalization does not remove any material limitation |
| Functional equivalence candidate | Product feature differs from literal wording but may satisfy a function–way–result screening rule and is not contradicted or estoppel-blocked | Expert-review flag | Does not establish infringement; triggers doctrine-of-equivalents review |
| Technical similarity only | Product feature is generally related to the same technical field or function but is not anchored to the specific material limitation | Not matched | Legally insufficient despite technical relatedness |
| Contradicted or excluded feature | Product evidence denies the limitation or prosecution history excludes the proposed correspondence | Not matched | Blocks literal support and may block equivalence screening |
| Non-mentioned feature | Product documents do not disclose the limitation | Insufficient-evidence flag | Does not support literal coverage and does not prove absence by itself |
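The hierarchy in Table 6 can be read as an ordered sequence of checks. The sketch below is one possible encoding under stated assumptions: the `Evidence` flags are hypothetical inputs that an upstream extractor would have to supply, and the precedence of the checks (negative evidence first, then missing evidence, then the literal tiers, then the equivalence screen) is assumed rather than specified by the table.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Match(Enum):
    LITERAL_LEXICAL = auto()
    ONTOLOGY_MEDIATED = auto()
    EQUIVALENCE_CANDIDATE = auto()
    TECHNICAL_SIMILARITY = auto()
    CONTRADICTED_OR_EXCLUDED = auto()
    NON_MENTIONED = auto()


# Strings copied from the "System Output" column of Table 6.
SYSTEM_OUTPUT = {
    Match.LITERAL_LEXICAL: "Literal support",
    Match.ONTOLOGY_MEDIATED: "Literal support with ontology explanation",
    Match.EQUIVALENCE_CANDIDATE: "Expert-review flag",
    Match.TECHNICAL_SIMILARITY: "Not matched",
    Match.CONTRADICTED_OR_EXCLUDED: "Not matched",
    Match.NON_MENTIONED: "Insufficient-evidence flag",
}


@dataclass
class Evidence:
    """Hypothetical boolean flags for one claim limitation and product feature."""
    disclosed: bool = True          # product documents mention the feature
    contradicted: bool = False      # product evidence denies the limitation
    excluded: bool = False          # prosecution history excludes the mapping
    same_term: bool = False         # same normalized term
    ontology_variant: bool = False  # accepted synonym, subtype, or variant
    constraints_ok: bool = False    # all material constraints satisfied
    fwr_candidate: bool = False     # passes function-way-result screening


def classify(ev: Evidence) -> Match:
    # Assumed precedence; Table 6 lists categories but not an evaluation order.
    if ev.contradicted or ev.excluded:
        return Match.CONTRADICTED_OR_EXCLUDED
    if not ev.disclosed:
        return Match.NON_MENTIONED
    if ev.constraints_ok and ev.same_term:
        return Match.LITERAL_LEXICAL
    if ev.constraints_ok and ev.ontology_variant:
        return Match.ONTOLOGY_MEDIATED
    if ev.fwr_candidate:
        return Match.EQUIVALENCE_CANDIDATE
    return Match.TECHNICAL_SIMILARITY


hinge_pin = Evidence(ontology_variant=True, constraints_ok=True)
print(SYSTEM_OUTPUT[classify(hinge_pin)])
```

Two categories share the output "Not matched", so the category and its output string are stored separately; this keeps the legal interpretation recoverable from the category even when the user-facing label is the same.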
Table 7. Representation of prosecution-history constraints in the legal ontology and knowledge graph.

| Ontology Class or Edge | Meaning | Example Representation | Reasoning Effect |
|---|---|---|---|
| NarrowingAmendment | Amendment that narrows a claim limitation | limitation narrowed by adding “stainless steel” | Reduces permissible correspondence scope |
| ApplicantArgument | Argument distinguishing prior art | applicant argued that external mounting is different from internal mounting | Records basis for later exclusion |
| Disclaimer | Express or implicit disclaimer of a feature or interpretation | claim element disclaimed external sensor placement | Blocks inconsistent matching |
| ExcludedEquivalent | Feature excluded from equivalence consideration | elastic substitute excluded after amendment | Blocks equivalence-candidate flag |
| EstoppelConstraint | Legal constraint limiting equivalence | candidate blocked by prosecution estoppel | Routes to negative or expert-review outcome |
| narrowed_by | Edge from limitation to amendment | sensor location narrowed_by amendment | Updates material limitation representation |
| blocked_by | Edge from candidate correspondence to estoppel constraint | equivalenceCandidate blocked_by estoppelConstraint | Prevents functional similarity from becoming a legal equivalence concern |
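The effect of the `blocked_by` edge in Table 7 can be sketched with a list of triples standing in for the knowledge graph. Everything here is illustrative: the node identifiers, the triple layout, and the function names are hypothetical, and only the edge labels `narrowed_by` and `blocked_by` come from the table.

```python
# (subject, edge label, object) triples; node names are hypothetical.
Edge = tuple[str, str, str]

EDGES: list[Edge] = [
    ("sensor_location", "narrowed_by", "amendment_1"),
    ("elastic_band_for_biasing_spring", "blocked_by", "estoppel_constraint_1"),
]


def blockers(candidate: str, edges: list[Edge]) -> list[str]:
    """Return every estoppel constraint linked to the candidate by blocked_by."""
    return [obj for subj, label, obj in edges
            if subj == candidate and label == "blocked_by"]


def equivalence_flag(candidate: str, fwr_passed: bool, edges: list[Edge]) -> str:
    """Apply the prosecution-history gate before raising an equivalence flag."""
    found = blockers(candidate, edges)
    if found:
        # Functional similarity alone must not become an equivalence concern.
        return "blocked by " + ", ".join(found)
    if fwr_passed:
        return "equivalence-candidate (expert review)"
    return "not matched"


print(equivalence_flag("elastic_band_for_biasing_spring", True, EDGES))
print(equivalence_flag("torsion_spring_for_biasing_spring", True, EDGES))
```

Checking the gate before the function–way–result outcome means a blocked candidate is never reported as an equivalence concern, whatever its screening score.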
Table 8. Operational steps for legal reasoning in the prototype.

| Step | Operation | Output |
|---|---|---|
| 1 | Segment asserted claim into preamble, transition, and body limitations | Candidate limitation clauses |
| 2 | Extract core concepts, relations, modifiers, quantities, ranges, negations, and process-order constraints | Structured claim-limitation objects |
| 3 | Link limitations to specification excerpts, dependent claims, and prosecution-history evidence where available | Claim-construction and provenance links |
| 4 | Extract product features and evidence states from product descriptions | Structured product-feature objects |
| 5 | Evaluate literal lexical and ontology-mediated support for each material limitation | Element-level literal-support status |
| 6 | Apply negative-evidence, missing-evidence, and prosecution-history gates | Blocked, contradicted, uncertain, or insufficient-evidence status |
| 7 | Apply function–way–result screening only to non-literal candidate correspondences | Equivalence-candidate flags for expert review |
| 8 | Aggregate element-level results under the all-elements rule | Claim-level output and explanation trace |
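Step 1 of Table 8 can be approximated with a simple rule-based splitter. This is a rough sketch that assumes conventional claim drafting (a transitional phrase followed by semicolon-separated limitations); the transition list, the function name, and the sample claim text are assumptions, and the prototype's patent-oriented NLP is presumably more robust than this.

```python
import re

# Common transitional phrases; longer phrases are listed first so that
# "consisting essentially of" is preferred over "consisting of".
TRANSITIONS = ("consisting essentially of", "consisting of", "comprising")
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in TRANSITIONS) + r")\b:?",
    re.IGNORECASE,
)


def segment_claim(claim: str) -> dict:
    """Split a claim into preamble, transition, and body limitations."""
    match = _PATTERN.search(claim)
    if match is None:
        # No recognizable transition: treat the whole text as preamble.
        return {"preamble": claim.strip(), "transition": None, "limitations": []}
    preamble = claim[:match.start()].strip(" ,")
    body = claim[match.end():]
    limitations = []
    for part in body.split(";"):
        # Drop a leading "and" and the trailing period of the final clause.
        clause = re.sub(r"^and\s+", "", part.strip()).rstrip(". ")
        if clause:
            limitations.append(clause)
    return {
        "preamble": preamble,
        "transition": match.group(1).lower(),
        "limitations": limitations,
    }


# Hypothetical claim text built from the elements of the chair-stool example.
claim = ("A chair, comprising: a seat; a plurality of legs; "
         "and a backrest attached to the seat.")
print(segment_claim(claim))
```

The output clauses correspond to the "candidate limitation clauses" of step 1; the later steps would turn each clause into a structured limitation object.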
Table 9. Element-level matching results for the chair–stool example.

| Patent Claim Element | Product Evidence | Match Status | Interpretation |
|---|---|---|---|
| seat | flat round sitting surface | Matched | Ontology-based normalization supports seat correspondence |
| plurality of legs | three collapsible legs | Matched | Quantity condition is satisfied because 3 ≥ 2 |
| backrest attached to the seat | explicit statement of no back support | Not matched | Negative evidence defeats literal coverage; no equivalent identified |
Table 10. Example semantic mapping between patent claim elements and product components in a mechanical clamp case.

| Claim Element | Ontology Concept | Product Feature | Match Outcome |
|---|---|---|---|
| First and second arms pivotally connected by a hinge | Articulated arms + hinge joint | Pliers handles joined by a pivot | Ontology-mediated joint correspondence if claim construction permits; otherwise expert review |
| Spring urging the arms apart | Biasing spring mechanism | Torsion spring at pivot | Functional-equivalence candidate; expert review required before any legal equivalence conclusion |
| Gripping jaws on opposite ends of the arms | Clamping jaws | Serrated jaw surfaces | Literal or ontology-mediated support; same clamping role and spatial position |
Table 11. Operational legal decision trace used by the prototype.

| Condition | Rule or Check | Prototype Output |
|---|---|---|
| All material claim limitations are literally or ontology-literally supported | All-elements test with literal-support predicate | Literal-support risk flagged; explanation lists matched limitations and supporting evidence |
| At least one material limitation is contradicted | Negative-evidence gate | Literal support rejected for the asserted claim |
| At least one material limitation is non-mentioned or uncertain | Missing-evidence rule | Insufficient evidence or expert-review output; no definitive positive conclusion |
| A non-identical feature satisfies function–way–result screening and is not estoppel-blocked | Equivalence-candidate screening | Equivalence concern flagged; expert legal review required |
| A product feature is technically similar but not anchored to a material limitation | Legal-relevance anchoring rule | Technical similarity recorded but not counted as infringement evidence |
| A candidate correspondence is disclaimed or estoppel-blocked | Prosecution-history gate | Candidate rejected or routed to legal review depending on available evidence |
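The claim-level aggregation in Table 11 can be sketched as a function over element-level evidence states. This is an illustration under assumptions: the precedence among the rules (contradiction, then missing or uncertain evidence, then equivalence candidates, then full support) is not stated in the table and is chosen here so that negative evidence always dominates; the state strings and function name are also hypothetical.

```python
VALID_STATES = {
    "supported", "contradicted", "non-mentioned",
    "uncertain", "equivalence-candidate",
}


def claim_level_output(element_states: dict[str, str]) -> str:
    """Aggregate element-level evidence states under the all-elements rule."""
    if not element_states:
        raise ValueError("at least one material limitation is required")
    states = set(element_states.values())
    unknown = states - VALID_STATES
    if unknown:
        raise ValueError(f"unknown evidence states: {sorted(unknown)}")
    # Negative-evidence gate: one contradicted limitation defeats literal support.
    if "contradicted" in states:
        return "Literal support rejected for the asserted claim"
    # Missing-evidence rule: no definitive positive conclusion.
    if states & {"non-mentioned", "uncertain"}:
        return "Insufficient evidence or expert-review output"
    # Equivalence-candidate screening: never counted as literal support.
    if "equivalence-candidate" in states:
        return "Equivalence concern flagged; expert legal review required"
    # Only "supported" remains, so every material limitation is matched.
    return "Literal-support risk flagged"


# Element states from the chair-stool example in Table 9.
chair_stool = {
    "seat": "supported",
    "plurality of legs": "supported",
    "backrest attached to the seat": "contradicted",
}
print(claim_level_output(chair_stool))
```

For the chair–stool case, two supported elements do not outweigh the contradicted backrest limitation, which is the behavior the all-elements rule requires.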
Table 12. Data-generation protocol for the controlled synthetic corpus.

| Case Type | Generation Rule | Annotation Criterion | Purpose in Evaluation |
|---|---|---|---|
| Positive literal support | Product description preserves all material limitations using substantially the same technical terminology | All material claim limitations are supported directly | Tests basic all-elements reasoning under explicit evidence |
| Positive synonym/paraphrase support | Product description preserves all material limitations using accepted synonyms, paraphrases, or domain-specific alternative terms | All material limitations are supported through lexical normalization or ontology-mediated correspondence | Tests resistance to lexical variation and drafting style changes |
| Positive ontology-mediated support | Product description uses a domain-recognized alternative expression while preserving the material function, relation, and constraint | The disputed element is treated as supported only when the ontology confirms compatibility and no material limitation is lost | Tests structured semantic normalization beyond surface wording |
| Negative missing-limitation case | Product description omits at least one material limitation | At least one material limitation is non-mentioned and therefore prevents a definite positive label | Tests missing-limitation handling |
| Negative near-miss case | Product description preserves most limitations but contradicts or materially alters one decisive limitation | At least one limitation is contradicted or materially mismatched | Tests whether global semantic similarity is prevented from overriding the all-elements rule |
| Borderline equivalence-candidate case | Product feature is functionally related to a claim limitation but differs in structure, way, or legally material implementation | Element is labeled as equivalence-candidate and routed to expert review | Tests uncertainty handling and expert-review routing |
Table 13. Element-level and claim-level annotation criteria for the controlled synthetic corpus.

| Annotation Level | Label | Condition | Claim-Level Effect |
|---|---|---|---|
| Element level | Supported | Product evidence directly, synonymously, or ontologically supports the claim limitation and all material constraints | Counts as a matched material limitation |
| Element level | Contradicted | Product evidence explicitly conflicts with the limitation or changes a legally material relation, modifier, quantity, range, or sequence | Counts as negative evidence |
| Element level | Non-mentioned | Product description does not disclose the required limitation | Does not infer absence automatically, but prevents a definitive positive label |
| Element level | Equivalence-candidate | Product feature is functionally related but not treated as literal or legally sufficient support | Routed to expert review; not counted as literal support |
| Claim level | Positive/literal-or-supported | All material limitations are supported | Positive case for binary precision–recall calculation |
| Claim level | Negative/missing-limitation | At least one material limitation is non-mentioned | Negative case for binary precision–recall calculation |
| Claim level | Negative/near-miss | At least one material limitation is contradicted or materially altered | Negative case for binary precision–recall calculation |
| Claim level | Borderline/equivalence-candidate | Literal support is incomplete, but functional-equivalence screening is triggered | Non-positive for binary metrics; expert-review case |
Table 14. Label distribution and domain coverage of the controlled synthetic corpus.

| Domain | Claim Templates | Pairs | Positive Literal | Positive Synonym | Positive Ontology-Mediated | Negative Missing | Negative Near-Miss | Borderline Candidate | Main Limitation Types |
|---|---|---|---|---|---|---|---|---|---|
| Mechanical devices | 5 | 25 | 5 | 5 | 5 | 5 | 5 | 0 | Components, joints, spatial and force relations |
| Electrical/electronic devices | 5 | 25 | 5 | 5 | 5 | 5 | 5 | 0 | Components, signal paths, control functions |
| Software/process methods | 5 | 25 | 5 | 5 | 0 | 5 | 5 | 5 | Process steps, functional operations, sequence constraints |
| Medical tools | 5 | 25 | 5 | 5 | 0 | 5 | 5 | 5 | Specialized components, material and operational constraints |
| Total | 20 | 100 | 20 | 20 | 10 | 20 | 20 | 10 | Four-domain controlled synthetic corpus |
Table 15. Representative positive, negative, borderline, and insufficient-evidence examples in the controlled synthetic corpus.

| Case Type | Claim Limitation | Product Evidence | Annotation | Rationale |
|---|---|---|---|---|
| Positive literal | biasing spring urging the arms apart | the accused clamp includes a biasing spring urging the arms apart | Supported | The product evidence preserves the material limitation using substantially the same terminology |
| Positive synonym/paraphrase | pivot joint connecting the first arm to the second arm | hinge pin coupling the two arms | Supported | Ontology-mediated synonymy supports the same connection relation |
| Negative missing limitation | biasing spring urging the arms apart | the documentation does not describe any biasing spring urging the arms apart | Non-mentioned | A material limitation is not supported by product evidence |
| Negative near-miss | biasing spring urging the arms apart | a magnetic latch holding the arms together | Contradicted | The product feature changes the claimed function and result rather than merely paraphrasing it |
| Borderline equivalence candidate | transform coding each image block | an alternative image-processing structure performs a related compression function in a materially different way | Equivalence-candidate | The case is semantically related but not treated as literal support; expert review is required |
| Insufficient-evidence pattern | sensor mounted inside the housing | the brochure mentions a sensor but does not disclose its location | Non-mentioned/uncertain | The evidence does not support the required spatial relation |
Table 16. Composition of the real-world pilot corpus.

| Domain | Patent Claims | Product Descriptions | Definite Cases | Borderline/Insufficient Cases | Main Evidence Sources |
|---|---|---|---|---|---|
| Mechanical devices | 20 | 20 | 14 | 6 | Manuals, datasheets, product brochures |
| Electrical/electronic devices | 20 | 20 | 14 | 6 | Datasheets, installation guides, official product webpages |
| Software/process claims | 20 | 20 | 12 | 8 | Technical documentation, user guides, product webpages |
| Total | 60 | 60 | 40 | 20 | Public patent claims and real product descriptions |
Table 17. Annotation criteria for the real-world pilot corpus.

| Annotation Level | Label | Condition | System Interpretation |
|---|---|---|---|
| Element level | Supported | Product evidence discloses the limitation directly, synonymously, or through ontology-mediated support | Counts as a matched material limitation |
| Element level | Contradicted | Product evidence explicitly conflicts with the limitation or with a material relation, modifier, quantity, range, or sequence constraint | Strong negative evidence |
| Element level | Non-mentioned | Product evidence does not disclose the limitation | Does not infer absence; prevents definite positive output |
| Element level | Uncertain | Product evidence is vague, incomplete, or technically under-specified | Expert-review or insufficient-evidence status |
| Element level | Equivalence-candidate | Non-identical feature may perform a comparable function in a comparable way with a comparable result | Routed to doctrine-of-equivalents review; not literal support |
| Claim level | Likely literal support | All material limitations are supported | Definite positive case |
| Claim level | Likely non-coverage | At least one material limitation is contradicted or materially mismatched | Definite negative case |
| Claim level | Borderline equivalence concern | Literal support is incomplete, but functional-equivalence screening is triggered | Expert-review case |
| Claim level | Insufficient evidence | One or more material limitations are not disclosed and cannot be resolved from available evidence | No definitive legal conclusion |
Table 18. Performance of different configurations on the controlled synthetic corpus.

| Approach | Precision | Recall | F1-Score | Interpretation |
|---|---|---|---|---|
| Keyword matching baseline | 0.68 | 0.52 | 0.59 | Relies on lexical overlap; misses paraphrased or functionally expressed claim elements |
| Semantic matching with ontology, without full DIKWP reasoning | 0.78 | 0.70 | 0.74 | Improves semantic correspondence, but still produces some logically inconsistent matches |
| Full DIKWP network (default mode) | 0.87 | 0.82 | 0.85 | Best overall balance of semantic coverage and claim-level decision logic in the controlled corpus |
| Full DIKWP network, aggressive mode | 0.80 | 0.92 | 0.85 | Recall-oriented setting; useful for broad enforcement screening |
| Full DIKWP network, conservative mode | 0.93 | 0.75 | 0.83 | Precision-oriented setting; useful for clearance or design-around analysis |
Table 19. Confusion matrix of the full DIKWP network in default mode on the controlled synthetic corpus.

| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual positive | 41 | 9 |
| Actual negative | 6 | 44 |
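The default-mode precision, recall, and F1-score reported in Table 18 can be recomputed from the counts in Table 19. The short sketch below uses the standard definitions of these metrics; the function name is arbitrary, and reading the matrix as TP = 41, FN = 9, FP = 6, TN = 44 follows from the row and column labels.

```python
def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    false_positive_rate = fp / (fp + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "fpr": false_positive_rate,
    }


# Counts from Table 19 (full DIKWP network, default mode).
metrics = binary_metrics(tp=41, fn=9, fp=6, tn=44)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, precision (41/47), recall (41/50), and F1 come out at 0.87, 0.82, and 0.85, which agrees with the default-mode row of Table 18.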
Table 20. Absolute within-corpus effect sizes of the full DIKWP network compared with baseline and purpose-conditioned configurations.

| Comparison | ΔPrecision | ΔRecall | ΔF1 | Interpretation |
|---|---|---|---|---|
| Full DIKWP network vs. keyword baseline | +0.19 | +0.30 | +0.26 | Controlled-corpus improvement over lexical matching |
| Full DIKWP network vs. semantic + ontology configuration | +0.09 | +0.12 | +0.11 | Added benefit of claim-level reasoning, conflict handling, and purpose control |
| Aggressive mode vs. default mode | −0.07 | +0.10 | 0.00 | Recall-oriented operating-point shift |
| Conservative mode vs. default mode | +0.06 | −0.07 | −0.02 | Precision-oriented operating-point shift |
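The effect sizes in Table 20 are plain differences between rows of Table 18, which can be checked in a few lines. The dictionary keys below are shorthand labels introduced for this sketch; the numbers are copied from Table 18.

```python
# (precision, recall, F1) per configuration, copied from Table 18.
TABLE_18 = {
    "keyword": (0.68, 0.52, 0.59),
    "semantic_ontology": (0.78, 0.70, 0.74),
    "default": (0.87, 0.82, 0.85),
    "aggressive": (0.80, 0.92, 0.85),
    "conservative": (0.93, 0.75, 0.83),
}


def delta(config: str, reference: str) -> tuple[float, ...]:
    """Absolute difference (config minus reference), rounded to two decimals."""
    return tuple(
        round(a - b, 2) for a, b in zip(TABLE_18[config], TABLE_18[reference])
    )


for config, reference in [
    ("default", "keyword"),
    ("default", "semantic_ontology"),
    ("aggressive", "default"),
    ("conservative", "default"),
]:
    print(f"{config} vs. {reference}: {delta(config, reference)}")
```

The rounding step matters because the inputs are already rounded to two decimals; differences are therefore only meaningful at that precision.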
Table 21. Domain-specific performance of the full DIKWP network on the controlled synthetic corpus.

| Category | Precision | Recall | F1-Score | False Positive Rate |
|---|---|---|---|---|
| Mechanical device | 0.93 | 0.93 | 0.93 | 0.10 |
| Electrical/electronic device | 0.93 | 0.87 | 0.90 | 0.10 |
| Software/process method | 0.80 | 0.80 | 0.80 | 0.13 |
| Medical tool | 0.75 | 0.60 | 0.67 | 0.13 |
Table 22. Resource usage of major modules in the prototype DIKWP network.

| Module | Latency (ms) | CPU Usage | Memory (MB) |
|---|---|---|---|
| Linguistic parsing and knowledge-graph construction | 2000 | 55% | 900 |
| Semantic inference | 400 | 75% | 1500 |
| Purpose validation and report generation | 100 | 10% | 200 |
Table 23. Potential optimization strategies for improving efficiency and scalability.

| Bottleneck | Current Issue | Optimization Strategy |
|---|---|---|
| Deep NLP parsing | Long patent claims and nested syntactic structures slow down inference | Batch inference, GPU acceleration, and model distillation |
| Repeated claim processing | The same claims may be repeatedly parsed in multi-product or multi-scenario analysis | Claim-graph caching and reusable claim-element representations |
| Large candidate set | Full DIKWP reasoning may be applied before irrelevant patent–product pairs are filtered out | Two-stage retrieval followed by full DIKWP reasoning on high-priority candidates |
| Knowledge-graph construction | Rebuilding graphs from scratch increases preprocessing cost | Incremental graph updates and partial graph reuse |
| Ontology matching | Candidate alignment may become more expensive as the ontology expands | Approximate nearest-neighbor search and ontology-index-based candidate alignment |
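The claim-caching strategy listed in Table 23 can be illustrated with memoization from the Python standard library. This is only a sketch of the idea: `parse_claim` is a hypothetical stand-in for the expensive parsing and graph-construction stage, and the substring check in `analyze` is a placeholder for real element matching.

```python
from functools import lru_cache


@lru_cache(maxsize=1024)
def parse_claim(claim_text: str) -> tuple[str, ...]:
    """Hypothetical stand-in for expensive claim parsing.

    Returns an immutable tuple so cached results cannot be mutated by callers.
    """
    return tuple(part.strip() for part in claim_text.split(";") if part.strip())


def analyze(claim_text: str, product_texts: list[str]) -> list[int]:
    """Count, per product, how many claim elements appear verbatim."""
    scores = []
    for product in product_texts:
        elements = parse_claim(claim_text)  # parsed once, then served from cache
        scores.append(sum(element in product for element in elements))
    return scores


products = [
    "stool with a seat",
    "chair with a seat and a plurality of legs",
    "table",
]
print(analyze("a seat; a plurality of legs", products))
print(parse_claim.cache_info())
```

With one claim screened against three products, the cache statistics show a single miss followed by hits, so the parsing cost is paid once per claim rather than once per patent–product pair.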
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Guo, Z.; Duan, Y. A Purpose-Aware Semantic Reasoning Model for Patent Infringement Detection in the DIKWP Network. Electronics 2026, 15, 2449. https://doi.org/10.3390/electronics15112449
