1. Introduction
Patent infringement analysis is a core task in intellectual property protection: it determines whether an accused product or process falls within the scope of asserted patent claims. The task requires interpreting technically dense claim language, domain-specific terminology, and legally constrained evidence. As patent filings, technical disclosures, product manuals, product web pages, standards documents, and patent drawings continue to proliferate, manual claim-to-product comparison becomes increasingly costly, time-consuming, and difficult to scale. Recent surveys show rapid growth in patent-oriented natural language processing and patent retrieval [1, 2]. PatentSBERTa [3] and PatentBERT [4] further demonstrate that patent-specific transformer models can improve document-level distance estimation, classification, and semantic comparison.
The methodological difficulty, however, is not solved by better document-level similarity alone. SAO-based infringement analysis and dependency-based claim analysis show that claim-to-product comparison must operate at the level of technical relations and claim limitations [5, 6]. Later studies on SAO text mining, weighted semantic structures, and function-oriented patent semantics further confirm that infringement-oriented comparison requires functional and structural alignment rather than ordinary lexical overlap [7, 8, 9].
The legal reason for this granularity is straightforward. Claim construction determines the operative meaning of asserted limitations. Literal infringement is generally assessed element by element, and the doctrine of equivalents remains constrained by element-specific reasoning and prosecution-history limitations. Therefore, an infringement-detection system must distinguish semantic relatedness from legally sufficient claim coverage.
Real-world patent litigation introduces additional linguistic and doctrinal complexity that cannot be fully captured by synthetic patent–product pairs. Patent claims may contain intentionally broad functional expressions, open-ended modifiers, nested dependencies, numerical ranges, means-plus-function formulations, and strategically ambiguous terminology. Product descriptions may also be incomplete, promotional, or selectively drafted. In such settings, semantic similarity alone may increase false positives, whereas overly strict element matching may increase false negatives. The proposed DIKWP network is therefore intended as an explainable decision-support framework that identifies, ranks, and explains potential correspondences, rather than as an autonomous substitute for legal claim construction or expert infringement judgment.
Early computational approaches to patent analysis mainly relied on keyword matching, Boolean retrieval, vector-space comparison, citation networks, or classification features. These methods remain useful for prior-art discovery and coarse-grained patent landscape analysis [10, 11]. They are less suitable for infringement detection because they cannot adequately represent functional relations, claim syntax, spatial qualifiers, quantity constraints, and semantic equivalence across alternative technical expressions.
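The limitation of order-insensitive lexical measures can be made concrete with a minimal sketch. The two sentences and the helper names `tokens` and `jaccard` are illustrative assumptions, not part of the proposed system: both statements share the same vocabulary, yet the functional relation between the components is reversed.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercased word set: an order-insensitive, keyword-style view of the text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a: str, b: str) -> float:
    """Jaccard overlap between the word sets of two statements."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)

# Hypothetical claim element and product statement with reversed roles.
claim_element = "the controller regulates the valve"
product_feature = "the valve regulates the controller"

# The word sets are identical, so the overlap is maximal even though
# the subject and object of the relation have been swapped.
overlap = jaccard(claim_element, product_feature)
```

A keyword-level score therefore cannot separate these two statements, which is the kind of case that relation-aware representations are meant to handle.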
The infringement task is element-centric rather than document-centric. Park et al. used SAO-based technological similarity to capture functional relations that keyword search tends to miss [5]. Lee et al. introduced dependency-based semantic claim analysis for infringement-risk assessment, showing that claim syntax has direct analytical value [6]. Subsequent work refined this direction through SAO-based text mining, broader SAO similarity measures, and function-oriented semantic knowledge [7, 8, 9]. Wang et al. further showed that a richer understanding of SAO semantics can improve patent text similarity modeling [12]. Recent research also shows that patents should not be treated as plain text alone. Patent documents encode relations among functions, structures, materials, components, locations, operating conditions, purposes, and embodiment alternatives. Ontology-based representation learning and ontological knowledge-graph refinement have therefore been introduced to organize patent semantics beyond surface tokens [13, 14].
Knowledge graphs have also been used for patent examiner citation recommendation and patent recommendation, indicating their value for relational patent analysis and retrieval support [15, 16]. Interpretable patent recommendation and product-innovation design studies further show that graph structures can preserve technical relations that flat embeddings may weaken [17, 18]. Particularly relevant to the present task, a patent infringement analysis approach based on patent knowledge graphs and image similarity demonstrates the value of combining symbolic graph structure with other evidence channels [19].
This graph-oriented perspective is consistent with general knowledge-graph research. Foundational surveys describe knowledge graphs as entity-relation structures that support schema modeling, reasoning, completion, and learning [20, 21]. Relational machine learning on graphs provides additional methods for exploiting typed links and graph paths when semantic correspondence is not reducible to direct lexical identity [22].
However, patent infringement detection is not merely a semantic matching problem. It is also a reasoning problem governed by legal criteria. Claim construction ordinarily begins with the claim language and its intrinsic context, including the specification and prosecution history. Literal infringement generally requires that every material claim limitation be found in the accused product or process. Doctrine-of-equivalents analysis may consider whether a non-identical feature is substantially equivalent to a claimed element, but it remains element-specific and legally constrained. Prosecution history estoppel may further restrict equivalence when claim amendments narrow the scope of a limitation.
These legal constraints require more than black-box similarity scoring. Ontologies and knowledge-aware explanation methods can make intermediate semantic commitments visible to reviewers [23, 24]. Legal information extraction and ontology-supported legal research show that legal AI systems must preserve legally relevant entities, relations, and evidentiary context rather than only optimizing predictive accuracy [25, 26]. General XAI and interpretability research likewise warns that high-stakes decision support needs transparent and auditable reasoning structures [27, 28].
The emergence of large language models creates additional opportunities and risks. LLMs may assist with claim summarization, paraphrase generation, candidate correspondence discovery, and interactive legal-technical explanation. Retrieval-augmented generation provides one way to connect generation with external evidence, while foundation-model research highlights both the breadth and the governance risks of general-purpose models [29, 30].
At the same time, empirical work shows that LLMs can hallucinate legal propositions and citations [31]. Broader surveys of large language models in legal systems and hallucination research emphasize that fluent output is not equivalent to factual or doctrinal reliability [32, 33]. For patent infringement detection, generative outputs should therefore be embedded within controlled semantic architectures that validate candidate correspondences against ontologies, evidence links, and legal rules.
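One way to picture such validation is a gate that rejects any generated correspondence whose ontology concept is unknown or whose quoted evidence cannot be found in the product text. The sketch below is illustrative only. The class `ProposedCorrespondence`, its fields, the identifier scheme, and the two checks are assumptions introduced here, not the validation logic of the proposed system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProposedCorrespondence:
    limitation_id: str     # hypothetical identifier, e.g. "claim1.L2"
    ontology_concept: str  # concept label assigned by the generator
    evidence_quote: str    # text the generator says appears in the product document

def validate(proposal, known_concepts, product_text):
    """Return the reasons for rejection; an empty list means the proposal passes."""
    problems = []
    if proposal.ontology_concept not in known_concepts:
        problems.append("unknown ontology concept")
    # Case-insensitive verbatim check: the quote must be traceable to the source.
    if proposal.evidence_quote.lower() not in product_text.lower():
        problems.append("evidence quote not found in product text")
    return problems

# Hypothetical inputs for illustration.
product_text = "The unit uses a brushless motor to drive the fan."
concepts = {"Motor", "Fan"}
grounded = ProposedCorrespondence("claim1.L1", "Motor", "a brushless motor")
ungrounded = ProposedCorrespondence("claim1.L2", "HeatSink", "an aluminium heat sink")
```

A verbatim substring check is deliberately strict. A practical system would likely also need span offsets and tolerance for formatting differences.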
To address these requirements, the present study adopts the DIKWP model as the organizing framework for system design. DIKWP extends the traditional Data-Information-Knowledge-Wisdom framing by making Purpose an explicit control dimension [34]. Recent applications show that DIKWP can support purpose-sensitive reasoning in domain-specific settings, including medical dispute resolution and smart healthcare systems [35, 36].
In this paper, DIKWP is not treated as a rigid hierarchy or one-way pipeline. It is operationalized as a network of interacting semantic dimensions in which data, information, knowledge, wisdom, and purpose are recursively transformed and mutually constrained during analysis. DIKWP studies on uncertainty modeling and semantic judicial reasoning support this networked interpretation [37, 38]. Work on personalized and bidirectional semantic communication further motivates the use of purpose as a regulating dimension in human-machine reasoning [39, 40].
Based on this perspective, this paper proposes a DIKWP network for patent infringement detection. The framework integrates patent-oriented NLP, ontology-based knowledge representation, graph-based semantic alignment, rule-guided legal inference, and purpose-aware decision support. Named entity recognition, dependency parsing, and SAO extraction transform unstructured text into structured semantic units. These units are linked to legal and technical ontologies and organized into knowledge graphs for claim interpretation and feature alignment. The system then performs infringement-oriented reasoning by combining semantic similarity, claim-limitation coverage analysis, functional-equivalence screening, and rule-based evidence aggregation, while preserving intermediate representations that support expert review.
This study makes three contributions. First, it develops a DIKWP-network-based design for patent infringement detection and extends purpose-aware semantic modeling to a concrete legal-technical task. Second, it constructs an end-to-end analytical framework that connects patent NLP, ontology and knowledge-graph representation, explicit legal logic, uncertainty handling, and explainable decision support. Third, it evaluates the proposed framework through a two-layer validation strategy. A controlled synthetic patent–product corpus is used to isolate claim-element reasoning and compare system configurations, while a real-world pilot corpus constructed from public patent claims and real product technical descriptions is used to examine external validity under realistic drafting styles, domain terminology, and incomplete product evidence. By framing patent infringement detection as a problem of semantic understanding, structured knowledge, legal reasoning, and purpose-sensitive control, this work contributes to research at the intersection of legal AI, patent analytics, knowledge-based systems, and semantic computing.
The remainder of this paper is organized as follows. Section 2 reviews the DIKWP model, artificial consciousness theory, patent NLP, semantic patent analysis, knowledge graphs, legal AI, and explainable decision support. Section 3 presents the overall system framework. Section 4 describes the semantic processing workflow. Section 5 details the prototype implementation and illustrative outputs. Section 6 reports experimental results. Section 7 discusses implications, limitations, deployment considerations, and future research directions. Section 8 concludes the paper.
2. Background and Related Work
2.1. DIKWP Network and Artificial Consciousness
The DIKWP model extends the traditional Data-Information-Knowledge-Wisdom framework by explicitly incorporating Purpose into intelligent processing. Early DIKWP work frames purpose as a bridge between task-oriented reasoning and more general intelligent behavior [34]. Domain-specific studies then illustrate how the model can organize medical dispute resolution and smart healthcare decision support [35, 36].
In this study, DIKWP is not treated as a rigid hierarchy. It is understood as a network model in which data, information, knowledge, wisdom, and purpose interact dynamically and recursively during analysis. Studies on DIKWP uncertainty handling and semantic judicial reasoning support this non-linear view [37, 38]. Related work on DIKWP-based distributed learning and semantic communication further emphasizes interaction among semantic spaces rather than a one-way conversion chain [39, 40, 41].
Within this formulation, data refers to raw patent claims, specifications, product descriptions, drawings, and related technical materials. Information denotes structured outputs derived from these materials, including extracted entities, claim limitations, product features, predicate-argument structures, and metadata. Knowledge comprises legal and technical ontologies, patent knowledge graphs, domain rules, lexical resources, and learned semantic correspondences. Wisdom refers to higher-order reasoning, including claim-coverage analysis, semantic equivalence assessment, uncertainty-aware judgment, and explanation generation. Purpose specifies the governing objective of the analysis, such as infringement screening, enforcement support, freedom-to-operate assessment, design-around guidance, or evidence prioritization for expert review.
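These five dimensions can be pictured as fields of a single shared analysis state. The following is a minimal sketch under stated assumptions: the names `Purpose` and `AnalysisState`, the field types, and the triple encoding of knowledge are illustrative choices made here, not the data model of the prototype.

```python
from dataclasses import dataclass, field
from enum import Enum

class Purpose(Enum):
    """Governing objectives listed in the text (member names are illustrative)."""
    INFRINGEMENT_SCREENING = "infringement_screening"
    ENFORCEMENT_SUPPORT = "enforcement_support"
    FREEDOM_TO_OPERATE = "freedom_to_operate"
    DESIGN_AROUND = "design_around"
    EVIDENCE_PRIORITIZATION = "evidence_prioritization"

@dataclass
class AnalysisState:
    purpose: Purpose
    # Data: raw texts keyed by a source identifier (claims, manuals, web pages).
    data: dict[str, str] = field(default_factory=dict)
    # Information: structured units such as limitations and product features.
    information: list[dict] = field(default_factory=list)
    # Knowledge: (subject, relation, object) triples from ontologies and graphs.
    knowledge: set[tuple[str, str, str]] = field(default_factory=set)
    # Wisdom: reasoning outcomes such as coverage findings and explanations.
    wisdom: list[str] = field(default_factory=list)

state = AnalysisState(purpose=Purpose.FREEDOM_TO_OPERATE)
state.data["claim_1"] = "A fan assembly comprising a housing and a motor."
state.information.append({"kind": "limitation", "text": "a motor"})
state.knowledge.add(("Motor", "partOf", "FanAssembly"))
```

Keeping all dimensions in one mutable state object is only one possible design. It makes it easy for later stages to read and revise earlier results, which matches the networked reading of DIKWP.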
A networked interpretation of DIKWP is particularly appropriate for patent infringement detection because the reasoning process is inherently iterative. Purpose may influence evidence selection, matching thresholds, equivalence screening, and report granularity; knowledge constrains semantic normalization and relation interpretation; and intermediate reasoning results may trigger additional retrieval, ontology expansion, or re-analysis of ambiguous product evidence. DIKWP research on uncertainty modeling and judicial reasoning provides the theoretical basis for this feedback-rich interpretation [37, 38]. Studies of DIKWP semantic communication further support the idea that task purpose can shape information exchange between humans and machines [39, 40].
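The feedback behaviour described above, in which unresolved limitations trigger further retrieval and re-analysis, can be sketched as a simple loop. The function `analyze`, the batch-based retrieval model, and the pluggable `matches` predicate are assumptions made for illustration. The actual control flow of the proposed network is not specified at this level in the text.

```python
def analyze(limitations, evidence_batches, matches):
    """Re-analyse unsupported limitations as further evidence batches arrive.

    limitations: list of limitation strings.
    evidence_batches: list of lists of product statements, in retrieval order.
    matches: callable(limitation, statement) -> bool, a placeholder matcher.
    Returns (supported, pending, rounds).
    """
    supported = {}
    pending = list(limitations)
    rounds = 0
    for batch in evidence_batches:
        if not pending:
            break  # everything is resolved, so no further retrieval is triggered
        rounds += 1
        still_pending = []
        for lim in pending:
            hit = next((s for s in batch if matches(lim, s)), None)
            if hit is None:
                still_pending.append(lim)
            else:
                supported[lim] = hit
        pending = still_pending
    return supported, pending, rounds

# Hypothetical example with a naive substring matcher.
batches = [
    ["a brushless motor drives the fan"],
    ["a humidity sensor is mounted on the lid"],
    ["the housing is made of aluminium"],
]
supported, pending, rounds = analyze(
    ["motor", "sensor", "battery"], batches, lambda lim, s: lim in s
)
```

In this example the limitation "battery" remains pending after all three rounds, which is the situation in which a real system would report an unsupported limitation or route it to expert review.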
Artificial consciousness is introduced here in a limited engineering sense rather than a strong philosophical sense. The present study does not claim that the proposed system is conscious in a phenomenological or subjective sense. Instead, artificial consciousness serves as a design inspiration for goal-aware coordination, context-sensitive adaptation, self-monitoring of reasoning status, and reflective control over explanations. Recent discussions in artificial consciousness research similarly stress the need to distinguish full consciousness claims from more modest functional notions such as self-monitoring, adaptive control, and goal-sensitive behavior [42, 43, 44].
From this perspective, the relevance of artificial consciousness to DIKWP lies in the explicit role of purpose. A purpose-aware system does not merely process patent text; it adjusts its reasoning strategy according to the analytical task. Purpose-driven DIKWP research motivates this form of control at the architectural level [34]. Recent DIKWP applications show how purpose can regulate domain-specific decision processes rather than merely label their outputs [35, 36]. In the present setting, uncertainty-aware DIKWP reasoning and semantic judicial reasoning are particularly relevant because infringement assessment often involves incomplete evidence and legally constrained ambiguity [37, 38].
2.2. Patent NLP, Semantic Representation, and Claim-Level Analysis
Patent analysis has become a major application area for natural language processing, machine learning, and knowledge-based systems. Earlier patent-analysis and patent-retrieval surveys emphasized the difficulty of patent language, including long sentences, broad functional expressions, synonymy, claim drafting conventions, and domain-specific terminology [10, 11]. Recent surveys show that the field has moved beyond bibliographic matching toward machine-learning-based patent analysis and patent-specific NLP [1, 45]. Deep-learning surveys further document the transition from handcrafted features to neural patent representations [46]. Tailored patent search studies show that retrieval strategies must still be adapted to the target technical and professional context [47].
Patent-specific transformer resources illustrate this shift. BIGPATENT supplies a large patent summarization corpus and demonstrates the availability of high-volume patent text for model development [48]. PatentBERT shows that BERT fine-tuning can improve patent classification [4]. PatentSBERTa extends sentence-transformer representations to patent distance estimation and classification [3].
General pretrained language models also provide useful representation components. BERT introduced contextual bidirectional pretraining, Sentence-BERT adapted transformers for sentence-level semantic similarity, and SciBERT demonstrated the value of domain-aware pretraining for scientific text [49, 50, 51]. Legal-domain models such as LEGAL-BERT and benchmarks such as LexGLUE show that legal language also benefits from specialized adaptation and evaluation [52, 53].
These methods substantially improve retrieval and classification, but infringement detection requires a more granular analysis of claim limitations and accused-product features. A patent and a product can be semantically related without satisfying all material limitations. Conversely, two descriptions may have low lexical similarity yet correspond at the element level because of synonymy, functional equivalence, or alternative technical terminology.
One influential response to this limitation is the use of structure-aware semantic representations, especially SAO models. SAO representations make functional relations explicit and reduce dependence on surface lexical overlap. Park et al. used SAO-based semantic technological similarity to identify potential infringement relations [5]. Lee et al. proposed dependency-based semantic claim analysis for patent infringement risk assessment [6]. Later studies refined SAO-based infringement analysis and patent similarity modeling through text mining, weighted semantic structures, and function-oriented semantic knowledge [7, 8, 9]. Newer SAO work continues to improve patent similarity by modeling SAO semantics more comprehensively [12]. A mathematical-logical approach to semantic patentability assessment further shows that patent analysis benefits from explicit formal interpretation [54]. Patent-specific semantic relation classification research also supports treating relation extraction as a distinct task rather than as a by-product of document similarity [55].
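A small sketch shows how an SAO representation separates cases that share vocabulary but differ in functional role. It is not the similarity measure of any cited study. The `SAO` tuple, the `CANONICAL` normalization table, and the slot-agreement score are simplifying assumptions introduced here for illustration.

```python
from typing import NamedTuple

class SAO(NamedTuple):
    subject: str
    action: str
    obj: str

# Hypothetical normalization table mapping surface terms to canonical concepts.
CANONICAL = {
    "screw": "fastener", "bolt": "fastener", "fastener": "fastener",
    "secures": "fix", "fixes": "fix", "attaches": "fix",
    "cover plate": "panel", "panel": "panel",
}

def normalize(term: str) -> str:
    """Map a surface term to its canonical concept, defaulting to the term itself."""
    return CANONICAL.get(term.lower(), term.lower())

def sao_similarity(a: SAO, b: SAO) -> float:
    """Fraction of SAO slots (subject, action, object) that agree after normalization."""
    agree = sum(normalize(x) == normalize(y) for x, y in zip(a, b))
    return agree / 3

claim = SAO("fastener", "secures", "panel")
paraphrase = SAO("screw", "attaches", "cover plate")  # different words, same relation
reversed_roles = SAO("cover plate", "attaches", "screw")  # same words, roles swapped
```

Here the paraphrase agrees on all three slots despite having no word in common with the claim element, while the role-reversed statement agrees only on the action slot.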
Patent claims also exhibit a semi-structured legal form. Claim preambles, transitional phrases, limitations, dependent-claim references, means-plus-function expressions, process steps, numerical ranges, and negations all influence claim scope. The present framework therefore treats information extraction as more than a generic NLP task. It separates claim segmentation, product feature extraction, relation extraction, modifier detection, quantity parsing, and negation recognition. The output of this stage is a set of structured claim limitations and product features that can be aligned by the knowledge dimension and evaluated by the wisdom dimension.
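Three of the sub-tasks named above (claim segmentation, quantity parsing, and negation recognition) can be illustrated with a rule-based sketch. The regular expressions, function names, and sample claim are assumptions chosen for brevity. Real claims would need far more robust handling of transitional phrases, nested clauses, units, and negation scope.

```python
import re

def segment_claim(claim: str) -> tuple[str, list[str]]:
    """Split a claim into preamble and limitations at the first transitional phrase."""
    m = re.search(r"\b(comprising|consisting of|including)\b\s*:?", claim)
    if m is None:
        return claim.strip(), []
    preamble = claim[:m.start()].strip(" ,")
    body = claim[m.end():]
    # Limitations are assumed to be separated by semicolons, optionally with "and".
    parts = [p.strip(" .") for p in re.split(r";\s*(?:and\s+)?", body)]
    return preamble, [p for p in parts if p]

def parse_range(text: str):
    """Return (low, high, unit) for 'between X and Y unit' phrases, else None."""
    m = re.search(r"between\s+(\d+(?:\.\d+)?)\s+and\s+(\d+(?:\.\d+)?)\s*(\w+)", text)
    return (float(m.group(1)), float(m.group(2)), m.group(3)) if m else None

def has_negation(text: str) -> bool:
    """Detect a few explicit negation cues (a deliberately small cue list)."""
    return re.search(r"\b(not|without|free of|no)\b", text.lower()) is not None

# Hypothetical claim text for illustration.
claim = ("A fan assembly comprising: a housing; a motor mounted without adhesive; "
         "and a blade having a length between 10 and 20 mm.")
preamble, limitations = segment_claim(claim)
```

Each returned limitation can then be passed to `parse_range` and `has_negation` separately, so that quantity constraints and exclusions stay attached to the limitation they qualify.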
2.3. Ontologies, Knowledge Graphs, and Semantic Patent Analytics
A second major research direction emphasizes ontology-based and knowledge-graph-based patent representation. Ontologies provide explicit vocabularies for a domain, including concepts, relations, constraints, and axioms [56]. Analyses of intellectual-property ontologies show that ontology design is itself a specialized problem in this domain [57]. Knowledge graphs extend this idea by organizing entities and relations into graph-structured semantic networks that can support querying, reasoning, completion, and machine learning [20, 21, 22]. Patent documents are natural candidates for such representation because they encode components, functions, structural relations, materials, operating conditions, technical effects, and embodiment alternatives.
Recent patent analytics studies demonstrate the value of explicit semantic representation. Zhai et al. designed a patent ontology for patent representation learning [13]. Trappey et al. used ontological knowledge-graph refinement for patent portfolio analysis [14]. Lu et al. used knowledge graphs to support patent examiner citation recommendation [15]. Patent knowledge graphs have also been applied to patent recommendation and interpretable patent recommendation, suggesting that graph representations can support both retrieval and explanation [16, 17]. Related work on double-classification patent retrieval and product innovation design shows that structured patent knowledge can improve search and design-support tasks [18, 58]. Particularly relevant to infringement detection, Jing et al. proposed a patent infringement analysis method that combines patent knowledge graphs with graph and image similarity [19].
The present framework builds on this literature by using a dual ontology. The technical ontology encodes domain concepts such as components, materials, functions, structures, part-whole relations, and operational relations. The legal ontology encodes concepts such as Patent, Claim, ClaimLimitation, ProductFeature, Correspondence, InfringementEvidence, LiteralMatch, EquivalenceCandidate, NegativeEvidence, and UncertainEvidence. This dual structure is important because infringement analysis requires both technical interpretation and legal sufficiency. A product feature may be technically similar to a claim limitation but legally insufficient if a material modifier, quantity condition, sequence constraint, or exclusion is missing.
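The distinction between technical similarity and legal sufficiency can be sketched with typed records that borrow the legal-ontology class names listed above. The decision rules in `classify` are assumptions made here for illustration (in particular, the rule that a missing material modifier yields UncertainEvidence), as are the field names and the `equivalents` parameter.

```python
from dataclasses import dataclass
from enum import Enum

class EvidenceType(Enum):
    LITERAL_MATCH = "LiteralMatch"
    EQUIVALENCE_CANDIDATE = "EquivalenceCandidate"
    NEGATIVE_EVIDENCE = "NegativeEvidence"
    UNCERTAIN_EVIDENCE = "UncertainEvidence"

@dataclass(frozen=True)
class ClaimLimitation:
    concept: str                        # technical-ontology concept
    modifiers: frozenset = frozenset()  # material qualifiers, e.g. {"removable"}

@dataclass(frozen=True)
class ProductFeature:
    concept: str
    modifiers: frozenset = frozenset()
    excluded: frozenset = frozenset()   # qualifiers the product text expressly negates

def classify(lim, feat, equivalents=frozenset()):
    """Assumed rules: return an EvidenceType, or None if the concepts are unrelated."""
    if lim.concept == feat.concept:
        literal = True
    elif (lim.concept, feat.concept) in equivalents:
        literal = False
    else:
        return None
    if lim.modifiers & feat.excluded:
        return EvidenceType.NEGATIVE_EVIDENCE   # product expressly lacks a qualifier
    if not lim.modifiers <= feat.modifiers:
        return EvidenceType.UNCERTAIN_EVIDENCE  # similar concept, modifier unsupported
    return (EvidenceType.LITERAL_MATCH if literal
            else EvidenceType.EQUIVALENCE_CANDIDATE)

lim = ClaimLimitation("Filter", frozenset({"removable"}))
```

With these rules, a product feature `ProductFeature("Filter")` that is silent about removability is technically on point but is not reported as a literal match, which mirrors the insufficiency case described in the paragraph.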
Knowledge graphs also support explanation. Instead of producing a single similarity score, the system records how a claim limitation was extracted, which product feature was linked to it, which ontology relation justified the link, which rule evaluated the link, and how the final claim-level assessment was produced. Ontology-centered XAI research treats explicit knowledge as a basis for transparent explanation [23]. Legal XAI work emphasizes evidential presentation and reviewability in legal settings [24]. Broader XAI surveys and social-science accounts of explanation further show that useful explanations should be selective, contrastive, and understandable to human reviewers [27, 59, 60].
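The recorded chain from extraction to assessment can be pictured as an ordered trace that is rendered for the reviewer. The sketch below is a simplification under stated assumptions: the `TraceStep` record, its stage labels, and the source-pointer strings are hypothetical, and a graph-backed implementation would store these steps as linked nodes rather than a flat list.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TraceStep:
    stage: str   # e.g. "extraction", "linking", "rule", "assessment"
    detail: str  # what was decided at this stage
    source: str  # pointer to the supporting text span, relation, or rule id

def explain(trace):
    """Render the trace as a numbered, reviewable explanation."""
    return "\n".join(
        f"{i}. [{step.stage}] {step.detail} (source: {step.source})"
        for i, step in enumerate(trace, start=1)
    )

# Hypothetical trace for one claim limitation.
trace = [
    TraceStep("extraction", "limitation 'a removable filter' segmented", "claim 1, element 2"),
    TraceStep("linking", "linked to product feature 'detachable strainer'", "manual, section 3"),
    TraceStep("rule", "concept pair accepted as equivalence candidate", "rule EQ-1"),
]
report = explain(trace)
```

Because every step carries a source pointer, a reviewer can check each intermediate commitment instead of having to trust a single aggregate score.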
2.4. Legal AI, Claim Construction, and Explainable Decision Support
The broader legal AI literature confirms that legal decision support requires more than predictive accuracy. Legal information extraction must identify legally relevant entities, relations, and events in text, while legal reasoning must respect rule structures, evidentiary constraints, and doctrinal categories [25, 61]. Legal NLP benchmarks and domain-adapted models such as LEGAL-BERT and LexGLUE show that legal language is sufficiently specialized to warrant domain-aware modeling [52, 53]. These developments support the use of patent-specific and legal-specific NLP components in the proposed DIKWP network.
Recent patent-legal analytics also extends beyond retrieval and classification. Claim-scope-aware litigation risk prediction illustrates how patent drafts can be analyzed for dispute-related risk [62]. Generative AI has been explored for standard-essentiality assessment, and LLM-based patent litigation mining has been tested in domain-specific dispute analysis [63, 64]. These studies reinforce the need to combine language-model flexibility with legally constrained validation.
For patent infringement, claim construction and element-level analysis are central. Markman established the court-centered role of claim construction, while Phillips clarified that claim meaning should be interpreted in the context of the claims, specification, and prosecution history. Literal infringement depends on the presence of each material claim limitation in the accused product or process. Doctrine-of-equivalents analysis may extend beyond literal identity, but it remains element-specific and constrained by prosecution history estoppel and related limiting principles. These legal constraints justify the framework’s emphasis on all-elements reasoning, limitation-level mapping, negative evidence, and expert-review flags.
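The all-elements logic summarized above can be expressed as a small screening rule. This is a sketch of a decision-support heuristic, not a statement of the law or of the framework's rule base. The status labels, the outcome labels, and the treatment of estoppel as a per-limitation flag are assumptions introduced for illustration.

```python
def assess_claim(findings, estopped=frozenset()):
    """Screen one claim under an all-elements rule.

    findings: dict mapping limitation id -> "literal" | "equivalent" | "missing".
    estopped: limitation ids for which equivalence is flagged as barred.
    Returns (outcome, flags) for expert review.
    """
    if not findings:
        raise ValueError("a claim needs at least one limitation")
    effective, flags = {}, []
    for lim_id, status in findings.items():
        if status == "equivalent" and lim_id in estopped:
            flags.append(f"{lim_id}: equivalence screened out by prosecution-history flag")
            status = "missing"
        effective[lim_id] = status
    if any(s == "missing" for s in effective.values()):
        outcome = "not_all_elements"       # at least one limitation is unsupported
    elif all(s == "literal" for s in effective.values()):
        outcome = "literal_candidate"      # every limitation is literally matched
    else:
        outcome = "equivalents_candidate"  # coverage depends on equivalence findings
    return outcome, flags

# Hypothetical findings for a three-limitation claim.
findings = {"L1": "literal", "L2": "equivalent", "L3": "literal"}
```

For the findings above, the outcome is an equivalents candidate. If limitation L2 is flagged as estopped, the same findings no longer satisfy the all-elements condition, and the flag explains why.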
Explainability is especially important because patent infringement analysis is a high-stakes legal task. A system output is of limited value unless it can show which claim limitations were matched, which product features supplied the evidence, which correspondences were semantic rather than literal, and which limitations remained unsupported. General XAI surveys classify explanation methods for black-box models and hybrid systems [27, 59]. Work on interpretable machine learning warns that post hoc explanations may be insufficient in high-stakes settings [27, 28]. Explanation-theory studies further support user-oriented, contrastive, and reviewable explanations [60, 65].
2.5. Epistemological Basis of Symbolic–Statistical Hybridization
The proposed DIKWP network is not merely an engineering aggregation of heterogeneous AI techniques. It is grounded in a guarded symbolic–statistical epistemology in which different components make different types of knowledge claims and are assigned different degrees of legal authority. This distinction is essential for patent infringement analysis because the task involves both empirical semantic interpretation and normative legal sufficiency.
Statistical components, including patent-oriented NLP, named entity recognition, dependency parsing, SAO extraction, transformer-based embedding similarity, and candidate relation extraction, operate under an inductive epistemological assumption. They assume that recurring linguistic patterns, distributional similarity, and learned contextual representations can reveal candidate entities, relations, paraphrases, and semantic correspondences. Their outputs are therefore probabilistic or confidence-oriented. These components are appropriate for discovering possible claim elements, product features, synonym relations, and functionally related expressions. However, they do not by themselves establish legal sufficiency. A high embedding-similarity score may indicate semantic relatedness, but it cannot determine whether all material claim limitations are satisfied.
Symbolic components, including ontologies, knowledge graphs, legal rules, all-elements reasoning, prosecution-history constraints, and explanation templates, operate under a different epistemological assumption. They assume that legally and technically relevant concepts can be explicitly represented as typed entities, relations, constraints, and inference rules. Their role is not primarily to discover semantic candidates, but to validate, constrain, and explain them. In the proposed framework, symbolic structures determine whether a candidate correspondence is anchored to a material claim limitation, whether required relations and modifiers are preserved, whether negative evidence blocks a match, and whether the claim-level all-elements condition is satisfied.
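The division of labour between the two component families can be sketched as a two-stage filter: a statistical stage proposes scored candidates, and a symbolic stage removes those that an explicit constraint blocks. The toy vectors, the 0.8 threshold, and the `blocked_pairs` representation of negative evidence are assumptions for illustration. Real embeddings would come from a patent-adapted model.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def generate_candidates(claim_vecs, feature_vecs, threshold=0.8):
    """Statistical stage: propose (limitation, feature, score) pairs above a threshold."""
    candidates = []
    for lim_id, lv in claim_vecs.items():
        for feat_id, fv in feature_vecs.items():
            score = cosine(lv, fv)
            if score >= threshold:
                candidates.append((lim_id, feat_id, score))
    return candidates

def validate(candidates, blocked_pairs):
    """Symbolic stage: drop candidates blocked by an explicit rule or negative evidence."""
    return [c for c in candidates if (c[0], c[1]) not in blocked_pairs]

# Hypothetical two-dimensional embeddings.
claim_vecs = {"L1": [1.0, 0.0], "L2": [0.0, 1.0]}
feature_vecs = {"F1": [2.0, 0.0], "F2": [0.0, 3.0]}
candidates = generate_candidates(claim_vecs, feature_vecs)
validated = validate(candidates, blocked_pairs={("L2", "F2")})
```

The pair (L2, F2) has a high similarity score but is still removed, which illustrates why a similarity score on its own is not treated as a finding of legal sufficiency.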
This division of labor is consistent with Barbierato et al.’s argument that machine learning should be conceptually distinguished from broader artificial intelligence. They argue that ML has developed its own methodological identity, centered on data-driven performance optimization, while broader AI also includes symbolic reasoning, problem solving, and governance-oriented concerns [66]. In the present framework, this distinction prevents machine-learning-style components from being treated as autonomous legal reasoners. Instead, ML and NLP modules serve as candidate-generation and uncertainty-estimation mechanisms, whereas ontology-based and rule-based modules provide legal-semantic validation, evidentiary traceability, and doctrinal control.
This hybridization is appropriate for a high-stakes legal task because neither component family is sufficient alone. A purely statistical model may capture paraphrase, technical synonymy, and functional similarity, but it may also confuse general technical relatedness with legally sufficient claim coverage. A purely symbolic model may enforce legal rules transparently, but it may fail when real product descriptions use terminology that differs from claim language. The proposed DIKWP network combines the semantic flexibility of statistical processing with the auditability and constraint-sensitivity of symbolic reasoning.
Within the DIKWP interpretation, the Data and Information dimensions mainly host evidence acquisition and statistical extraction. The Knowledge dimension normalizes extracted candidates into ontologies and knowledge graphs. The Wisdom dimension performs legal sufficiency assessment, uncertainty handling, and claim-level aggregation. The Purpose dimension controls operating posture, including whether the system emphasizes enforcement-oriented recall, clearance-oriented precision, design-around analysis, or expert-review routing. Thus, the DIKWP network provides not only a modular engineering architecture but also an epistemological allocation of responsibility among statistical estimation, symbolic representation, legal reasoning, and purpose-sensitive decision support.
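Purpose-controlled operating posture can be illustrated with per-purpose thresholds that decide whether a scored correspondence is reported, routed to an expert, or discarded. The profile names, the numeric thresholds, and the three routing labels are hypothetical values chosen for this sketch and are not taken from the prototype.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PurposeProfile:
    accept_at: float  # scores at or above this are reported as candidate matches
    review_at: float  # scores in [review_at, accept_at) are routed to expert review

# Hypothetical profiles: a recall-leaning and a precision-leaning posture.
PROFILES = {
    "enforcement_recall": PurposeProfile(accept_at=0.60, review_at=0.40),
    "clearance_precision": PurposeProfile(accept_at=0.85, review_at=0.60),
}

def route(score: float, purpose: str) -> str:
    """Route a correspondence score according to the active purpose profile."""
    profile = PROFILES[purpose]
    if score >= profile.accept_at:
        return "candidate_match"
    if score >= profile.review_at:
        return "expert_review"
    return "no_match"
```

Under these assumed values, a score of 0.70 is reported as a candidate match in the recall-leaning posture but is sent to expert review in the precision-leaning posture, so the same evidence is handled differently depending on the governing purpose.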
Table 1 summarizes the epistemological roles of the main symbolic, statistical, and control components in the proposed DIKWP network.
2.6. Research Gap and Positioning of This Study
Existing studies have established the value of patent-specific NLP, SAO extraction, ontology modeling, knowledge graphs, patent recommendation, semantic retrieval, and explainable legal AI. However, most prior work addresses only one part of the infringement-detection problem. SAO-based infringement analysis and knowledge-graph-based infringement analysis provide important foundations but do not by themselves constitute a full purpose-aware decision architecture [7, 19]. Patent recommendation studies, including interpretable recommendation, demonstrate the value of graph representations, but their primary tasks differ from claim-to-product infringement assessment [16, 17]. Legal information extraction and ontology-based legal research address legal semantics, yet they do not fully integrate patent-specific claim-element logic with DIKWP-style purpose control [25, 26].
A further limitation is that many architectures remain implicitly linear, even when they use advanced semantic models. Such designs are effective for retrieval or classification but less suitable for infringement analysis, where evidence selection, semantic interpretation, equivalence screening, and reporting thresholds may need to change according to the user’s objective. Purpose-driven DIKWP research provides the conceptual basis for treating purpose as an explicit control dimension [34, 35]. DIKWP work on uncertainty and semantic judicial reasoning supports recurrent interaction among semantic spaces [37, 38]. DIKWP semantic communication studies further motivate human-machine feedback as part of the reasoning architecture [39, 40].
The present study addresses this gap by proposing a DIKWP-network-based semantic AI framework for patent infringement detection. The framework integrates patent-oriented NLP, ontology and knowledge-graph construction, rule-guided inference, uncertainty handling, purpose-aware control, and explanation generation within a single architecture. Patent infringement detection is therefore treated not as a standalone similarity task, but as a coordinated process of semantic understanding, structured knowledge integration, legal sufficiency assessment, and purpose-governed reasoning.
3. System Architecture Overview
Figure 1 presents the overall architecture of the proposed semantic AI system for patent infringement detection. Although the architecture is organized analytically around the five DIKWP dimensions (Data, Information, Knowledge, Wisdom, and Purpose), it is not implemented as a rigid bottom-up stack. Instead, it is designed as a networked semantic architecture in which these dimensions operate as interacting functional spaces. Bottom-up processing transforms patent and product documents into structured evidence, while top-down regulation from the purpose space dynamically adjusts matching strategies, reasoning strictness, and explanation requirements. In this sense, DIKWP serves not merely as a descriptive taxonomy, but as the organizing principle of a purpose-aware and explainable reasoning system [1, 19, 23, 38].
For expository clarity, Figure 1 arranges the DIKWP dimensions vertically. Operationally, however, the system behaves as a coordinated network in which modules exchange information recurrently rather than only sequentially. Data acquisition supports information extraction; extracted information is normalized and linked within the knowledge space; knowledge structures guide reasoning in the wisdom space; and the purpose space continuously conditions the behavior of the lower dimensions by regulating thresholds, retrieval priorities, and inference scope. This recurrent design is particularly important for patent infringement analysis, where claim interpretation, technical matching, and legal judgment often require iterative refinement rather than one-pass processing [
38,
40,
67].
At the data dimension, the system acquires the patent materials to be protected or examined and the description of the potentially infringing product or process. These materials may include patent claims, specification excerpts, product manuals, technical brochures, web descriptions, and, where available, patent drawings. In the current prototype, the inputs are provided as structured text files, but the architecture is extensible to web crawlers, enterprise databases, and document management systems. Before further processing, the raw materials are normalized through standard preprocessing operations such as character-encoding unification, removal of irrelevant boilerplate, and segmentation of legally relevant sections. For infringement analysis, the claims and the technically informative parts of the specification are prioritized over bibliographic front matter, because the central task is claim-to-product comparison rather than general patent retrieval.
The information dimension transforms raw text into structured semantic units suitable for downstream matching and reasoning. Patent claims are first segmented into individual claim elements, since infringement analysis ultimately depends on whether each legally material element can be found, directly or equivalently, in the accused product. Product descriptions are processed in parallel and decomposed into feature statements or predicate–argument structures. To support this transformation, the architecture employs a patent-adapted NLP pipeline consisting of tokenization, part-of-speech tagging, dependency parsing, named entity recognition, and subject–action–object (SAO) extraction. The use of claim parsing and SAO-oriented semantic representation is motivated by earlier work showing that infringement-related comparison benefits from functional and structural representations that go beyond lexical overlap [
14,
15,
23].
The information dimension also performs term indexing and preliminary candidate matching. Each extracted claim element is associated with its corresponding technical entities, actions, and modifiers, while product features are indexed in an analogous manner. This step enables efficient retrieval of candidate correspondences before more expensive ontology-based reasoning is invoked. For example, a claim element such as “a hinge connecting the door to the frame” may be transformed into one or more structured units that preserve both the action and the relational context. Such representations are important because patent language often embeds technically critical qualifiers—such as location, direction, or function—that would be lost in purely keyword-based comparison [
23,
36].
The knowledge dimension is the core semantic integration space of the architecture. Here, the extracted information is mapped onto formal legal and technical representations, including domain ontologies, lexical resources, and a patent-oriented knowledge graph. The technical ontology encodes relevant entities and relations in the technological domain of the patent, such as components, functions, materials, part–whole structures, and operational relations. The legal ontology captures concepts such as claim, claim element, accused product, correspondence, infringement evidence, and equivalence. This dual-ontology design allows the system to align textual expressions with both technical semantics and legal interpretation [
16,
45,
58].
Because technical vocabulary evolves rapidly, the ontology layer is designed as a maintainable and extensible knowledge resource rather than as a fixed dictionary. New candidate concepts and relations can be harvested from patent claims, specifications, product manuals, CPC/IPC classifications, technical standards, and expert claim charts. Candidate terms are first proposed through named entity recognition, relation extraction, embedding-based clustering, and knowledge-graph completion. They are then validated through ontology consistency checks, provenance tracking, and expert review before being incorporated into the production ontology.
Once the ontology mapping is completed, the system builds a knowledge graph in which nodes represent claim elements, product features, technical entities, and inferred semantic correspondences, while edges represent structural, functional, and legal relations. This graph-based representation allows the architecture to preserve relational information that flat textual similarity measures often fail to capture. For instance, if a patent claim refers to a “hinge” and a product description refers to a “pivot,” the system may identify a potential correspondence not through lexical identity but through ontology-level proximity, lexical expansion, or functional equivalence. Knowledge-graph-based approaches have recently shown clear promise for patent representation and infringement-related analysis, especially when combined with semantic similarity and multimodal evidence [
17].
Reasoning rules are also maintained in the knowledge dimension. These include general correspondence rules, taxonomy-aware matching rules, and legal rules that operationalize core infringement standards. The most important of these is the all-elements rule, according to which literal infringement requires that every material claim element be present in the accused product. Additional rules can be introduced for function-based equivalence analysis, allowing the system to flag cases in which a product feature is not lexically identical to a claim element but may perform substantially the same function in substantially the same way to achieve substantially the same result. In this manner, the knowledge dimension does not merely store information; it creates the structured semantic conditions under which infringement reasoning becomes possible.
The wisdom dimension performs high-level evidential reasoning and converts graph-level correspondences into a legal-technical assessment. Its role is not limited to executing deterministic rules; it also evaluates ambiguity, resolves conflicts, and aggregates heterogeneous evidence into a usable conclusion. In practice, not all claim elements are matched with equal certainty. Some correspondences may be exact, some ontology-mediated, and some only functionally analogous. The wisdom dimension therefore includes a confidence aggregation mechanism that assigns differentiated weights to different forms of match evidence and produces a final infringement-oriented assessment based on the configured decision policy.
This dimension also generates explanations. Because patent infringement analysis is a high-stakes legal task, a decision is of limited value unless the system can show how it was reached. The explanation generator therefore traces the reasoning path from extracted claim elements and product features, through ontology mapping and graph correspondence, to the final decision outcome. A typical output may state which claim elements were matched exactly, which were matched through semantic expansion or functional equivalence, and which remained unsupported or ambiguous. Such traceability is consistent with broader developments in explainable legal AI, where structured evidence presentation and reviewable reasoning are regarded as central requirements rather than optional features [
25,
26].
An optional case-based reasoning component can also be positioned in the wisdom dimension. This component does not replace rule-guided reasoning, but supplements it by retrieving and comparing similar prior analytical patterns or historical dispute configurations. If the current case resembles a previously observed non-infringement or high-risk pattern, this information may be used as supplementary decision support. The component is treated as an optional extension rather than as a mandatory module, since its usefulness depends on the availability and quality of prior case data.
The purpose dimension regulates the architecture at the highest level of abstraction. In the proposed DIKWP network, purpose is not a passive label added after reasoning has finished. Instead, it functions as an active orchestration space that configures the system according to the user’s legal or strategic objective. Different purposes imply different operational preferences. An enforcement-oriented task may prioritize recall and sensitivity, thereby encouraging broader semantic matching and stronger attention to potential equivalence. A clearance or design-around task may prioritize precision and conservative risk control, thereby requiring stricter thresholds and more cautious treatment of uncertain matches. Recent DIKWP work in legal and semantic reasoning similarly emphasizes that purpose should be modeled as a controlling factor that shapes semantic processing rather than as an external annotation [
38,
63].
The purpose dimension also supports user interaction and feedback. Users may specify analytical preferences such as higher recall, stronger explainability, or emphasis on exact claim coverage. They may also initiate follow-up queries, for example by asking which elements were not matched, which correspondences are uncertain, or which design changes might reduce infringement risk. In such cases, the purpose orchestrator reconfigures the information, knowledge, and wisdom dimensions and initiates another reasoning cycle. This top-down influence is one of the main reasons why the architecture should be described as a DIKWP network rather than as a fixed layer-by-layer pipeline.
Communication across the architecture is therefore both bottom-up and top-down. Bottom-up processing remains the primary evidential path, beginning with raw documents and culminating in an infringement-oriented judgment. Top-down control, however, is equally important: purpose can modify reasoning strategy, and wisdom can request additional semantic evidence or refined ontology alignment when the available support is insufficient. This iterative interaction resembles the working process of a human analyst, who may return to the claims, re-examine terminology, or seek additional contextual evidence when initial comparison results remain inconclusive.
A major design principle of the proposed DIKWP network is modularity. Each functional component can be refined or replaced without redesigning the entire system. For example, a rule-based SAO extractor may later be replaced with a patent-specific relation extraction model, and the ontology-based matcher may be complemented by graph embeddings or other learned semantic similarity mechanisms. This modularity is particularly valuable in patent AI because both the language technology and the legal knowledge layer are evolving rapidly. Hybrid symbolic–subsymbolic enhancement is therefore possible within the same architecture, provided that explainability and traceability are preserved [
3,
19,
55].
A second key design principle is explainability. The architecture is designed to preserve a trace from raw evidence to final assessment, corresponding naturally to the DIKWP network. The data space records the original textual materials; the information space records extracted claim elements and product features; the knowledge space records ontological mappings, graph structures, and inferred correspondences; the wisdom space records confidence aggregation and decision rationale; and the purpose space records task settings and strategic constraints. This explicit traceability is essential for legal deployment because it supports expert review, error analysis, and procedural accountability [
23,
24].
Overall, the proposed DIKWP network treats patent infringement detection as a networked process of semantic transformation, structured knowledge integration, evidential reasoning, and purpose-aware orchestration. This formulation better reflects the actual requirements of patent analysis than a purely sequential or text-similarity-based pipeline. The next section presents the semantic processing workflow and illustrates how patent claims and product descriptions are transformed through the DIKWP network into an infringement-oriented analytical result.
4. Semantic Processing Workflow and Module Functionality in the DIKWP Network
Building on the system architecture introduced in Section 3, this section describes how the proposed semantic AI system transforms patent claims and accused-product descriptions into an infringement-oriented assessment. For clarity of presentation, the workflow is described in a sequence of analytical stages. Operationally, however, the system is not implemented as a strictly linear pipeline. Instead, it functions as a networked DIKWP process in which the data, information, knowledge, wisdom, and purpose dimensions interact recurrently. Bottom-up processing converts raw text into structured evidence, while top-down control from the purpose dimension adjusts matching strategies, inference scope, uncertainty handling, and explanation requirements. This networked formulation is consistent with the DIKWP view that intelligent processing arises from interaction among semantic spaces rather than from a one-way hierarchical chain [
36,
38,
40].
The inputs to the workflow include one or more patent claims, a textual description of the potentially infringing product or process, and, where available, additional supporting documents such as manuals, drawings, or prior-art references. The output is a structured analytical report containing element-level correspondences, a claim-level infringement assessment, a confidence estimate, and an explanation trace.
Figure 2 summarizes this workflow. Although the stages are introduced below in a sequential manner, the system permits backward transitions. For example, if the wisdom dimension identifies an unresolved element mismatch, it may request additional ontology expansion from the knowledge dimension or more fine-grained feature extraction from the information dimension. Likewise, the purpose dimension may alter thresholds or matching rules in accordance with the user’s objective, such as enforcement, risk screening, or design-around analysis.
To illustrate the workflow, consider the following simplified example. Claim 1 of a hypothetical patent states: “A chair comprising a seat, a plurality of legs, and a backrest attached to the seat.” The accused-product description states: “Our product is a portable stool with a flat round sitting surface supported by three collapsible legs. It does not have any back support.” A human expert would immediately suspect that the product does not literally infringe the claim because the claim requires a backrest, whereas the product explicitly lacks any back support. The example is used below to show how the proposed DIKWP network reaches the same conclusion in a transparent and legally interpretable manner.
4.1. Data Acquisition and Normalization
The workflow begins in the data dimension, where raw patent and product materials are acquired and normalized. In the current prototype, these materials are provided as text files or structured text strings, although the architecture is extensible to patent databases, document management systems, and web-based product sources. At this stage, the system performs basic preprocessing operations such as character-encoding normalization, section segmentation, and removal of irrelevant boilerplate. For patent documents, the claims and technically informative parts of the specification are prioritized because infringement analysis depends primarily on claim construction and claim-to-product comparison rather than on bibliographic metadata or general background text.
In the running example, the data dimension stores the patent claim and the product description as raw textual inputs. No legal conclusion is drawn at this point. The purpose of the data dimension is to preserve the original evidence and make it available for subsequent semantic transformation. This separation is important for traceability because it enables the system to maintain an auditable record of the exact textual material from which later correspondences and inferences are derived.
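The normalization step described above can be sketched as follows. This is a minimal illustration rather than the prototype's code: the function names, the record layout, and the boilerplate patterns are hypothetical, and only the Python standard library is assumed. The raw text is kept alongside the normalized text so that later inferences remain traceable to the original evidence.

```python
import re
import unicodedata

# Hypothetical boilerplate patterns; a real deployment would curate these per source.
BOILERPLATE_PATTERNS = [
    re.compile(r"^\s*page \d+ of \d+\s*$", re.IGNORECASE),
    re.compile(r"^\s*all rights reserved\.?\s*$", re.IGNORECASE),
]

# Typographic quotes are not folded by NFKC, so they are mapped explicitly.
QUOTE_MAP = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2019": "'"})


def normalize_text(raw: str) -> str:
    """Unify character forms, drop boilerplate lines, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", raw).translate(QUOTE_MAP)
    kept = [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not any(p.match(line) for p in BOILERPLATE_PATTERNS)
    ]
    return " ".join(" ".join(kept).split())


def make_record(doc_id: str, role: str, raw: str) -> dict:
    """Store the untouched raw text next to its normalized form (audit trail)."""
    return {"doc_id": doc_id, "role": role, "raw": raw,
            "normalized": normalize_text(raw)}
```

Keeping both fields in one record is a design choice made for this sketch; it mirrors the requirement that no legal conclusion is drawn at this stage and that the exact source text stays available.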
4.2. Information Extraction and Claim Structuring
The information dimension converts raw text into structured semantic units suitable for downstream matching and reasoning. Patent claims are first segmented into legally material claim elements, while product descriptions are decomposed into feature statements or predicate–argument structures. This stage relies on a patent-adapted NLP workflow comprising tokenization, part-of-speech tagging, dependency parsing, named entity recognition, and subject–action–object extraction. The use of claim parsing and SAO-oriented representation is motivated by prior research showing that infringement-related comparison benefits from structural and functional representations rather than lexical overlap alone [
15,
23,
64].
For the example claim, the system identifies three core elements: a seat, a plurality of legs, and a backrest attached to the seat. These elements can be represented by a set of structured relations such as (chair, comprises, seat), (chair, comprises, legs), (chair, comprises, backrest), and (backrest, attached_to, seat). The objective here is not simply to extract isolated terms, but to preserve the structural constraints embedded in the claim. The phrase “backrest attached to the seat,” for instance, is not treated as a flat keyword set; rather, it is decomposed into an object and an internal relation, because the attachment relation may later become relevant to element-level comparison.
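To make the decomposition concrete, the sketch below derives the four relations listed above from the example claim using simple string patterns. It is only an illustration under strong assumptions: the prototype relies on dependency parsing, whereas this sketch assumes a "comprising" claim whose elements are separated by commas, and the helper names and the small relation lexicon are hypothetical.

```python
import re

# Hypothetical lexicon of relation phrases mapped to relation identifiers.
RELATION_PHRASES = {
    "attached to": "attached_to",
    "supported by": "supported_by",
    "connected to": "connected_to",
}

_DETERMINER = re.compile(r"^(?:a plurality of|an|a|the)\s+", re.IGNORECASE)


def _head(noun_phrase: str) -> str:
    """Drop a leading determiner and return a lower-case identifier."""
    stripped = _DETERMINER.sub("", noun_phrase.strip().rstrip("."))
    return "_".join(stripped.lower().split())


def parse_claim(claim: str) -> list[tuple[str, str, str]]:
    """Split a simple 'X comprising A, B, and C' claim into relation triples."""
    preamble, body = re.split(r"\s+comprising\s+", claim.strip().rstrip("."), maxsplit=1)
    subject = _head(preamble)
    triples = []
    for part in re.split(r",\s*(?:and\s+)?", body):
        part = part.strip()
        if not part:
            continue
        for phrase, relation in RELATION_PHRASES.items():
            if f" {phrase} " in part:
                left, right = part.split(f" {phrase} ", 1)
                # The element itself, plus its internal structural relation.
                triples.append((subject, "comprises", _head(left)))
                triples.append((_head(left), relation, _head(right)))
                break
        else:
            triples.append((subject, "comprises", _head(part)))
    return triples
```

Applied to the example claim, the sketch keeps the attachment relation as a separate triple instead of folding it into a keyword set, which is the property emphasized in the paragraph above.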
The product description is processed in parallel. From the sentence “Our product is a portable stool with a flat round sitting surface supported by three collapsible legs,” the system extracts relations such as (stool, has, sitting_surface) and (sitting_surface, supported_by, legs). From the sentence “It does not have any back support,” the system extracts a negated relation, such as (stool, not_have, back_support). Negation handling is especially important. In patent infringement detection, the absence of a claim element can be legally decisive, and the system must preserve negative evidence rather than treat it as mere omission or noise.
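The preservation of negative evidence can be illustrated with a small cue-based extractor. This is a simplified sketch, not the prototype's rule layer: the cue patterns are hypothetical, and the grammatical subject (here, the pronoun "It" resolved to "stool") is assumed to have been resolved by an earlier step.

```python
import re
from typing import Optional

# Hypothetical negation cues; a dependency-based rule layer would be more robust.
NEGATION_PATTERNS = [
    re.compile(r"\b(?:does|do) not (?:have|include|contain)(?: any| an| a)? (?P<obj>[\w\s-]+)",
               re.IGNORECASE),
    re.compile(r"\b(?:has|have|with) no (?P<obj>[\w\s-]+)", re.IGNORECASE),
    re.compile(r"\bwithout(?: any| an| a)? (?P<obj>[\w\s-]+)", re.IGNORECASE),
]


def extract_negated_relation(sentence: str, subject: str) -> Optional[tuple[str, str, str]]:
    """Return (subject, 'not_have', object) if the sentence denies a feature."""
    for pattern in NEGATION_PATTERNS:
        match = pattern.search(sentence)
        if match:
            obj = "_".join(match.group("obj").lower().split())
            return (subject, "not_have", obj)
    return None  # no explicit denial found; this is non-mention, not absence
```

Returning `None` for sentences without a negation cue keeps explicit denial distinct from silence, so that a later stage does not mistake an unmentioned feature for an absent one.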
The information dimension also performs term indexing and preliminary candidate matching. Each claim element is linked to its associated entities, modifiers, and quantities, while product features are indexed in an analogous manner. In the present example, the phrase “plurality of legs” is normalized into a quantity constraint, while “three collapsible legs” is represented as a feature bundle that includes both count and attribute information. This indexing stage supports efficient retrieval of candidate correspondences before ontology-based reasoning is invoked. It also supports later explanation generation because each extracted feature remains linked to its originating sentence.
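The quantity normalization mentioned above can be sketched as follows. The class and function names are hypothetical. Two interpretive assumptions are made explicit: "a plurality of" is read as "two or more", and an element introduced without a quantity term is read as "at least one".

```python
from dataclasses import dataclass
from typing import Optional

NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}


@dataclass(frozen=True)
class QuantityConstraint:
    minimum: int
    maximum: Optional[int] = None

    def satisfied_by(self, count: int) -> bool:
        return count >= self.minimum and (self.maximum is None or count <= self.maximum)


def claim_quantity(phrase: str) -> QuantityConstraint:
    """Map a claim phrase to a count constraint (assumptions noted above)."""
    if "plurality of" in phrase.lower():
        return QuantityConstraint(minimum=2)
    return QuantityConstraint(minimum=1)


def product_feature(phrase: str) -> dict:
    """Split e.g. 'three collapsible legs' into count, attributes, and head noun."""
    words = phrase.lower().split()
    count = None
    if words and words[0] in NUMBER_WORDS:
        count = NUMBER_WORDS[words.pop(0)]
    elif words and words[0].isdigit():
        count = int(words.pop(0))
    if not words:
        raise ValueError("feature phrase has no head noun")
    return {"head": words[-1], "count": count, "attributes": words[:-1]}
```

With these definitions, the feature bundle for "three collapsible legs" carries a count of 3, which satisfies the constraint derived from "a plurality of legs".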
4.3. Ontology Mapping and Knowledge Graph Construction
The knowledge dimension is the primary semantic integration space of the workflow. At this stage, the extracted information is mapped onto formal semantic structures, including a technical ontology, a legal ontology, lexical resources, and a patent-oriented knowledge graph. The technical ontology represents concepts such as chair, stool, seat, leg, and backrest, together with their relations and constraints. The legal ontology captures concepts such as claim, claim element, accused product, correspondence, infringement evidence, and equivalence. The purpose of this dual-ontology design is to align linguistic expressions with both technical semantics and legal interpretation [
17,
45,
58].
In the running example, the system maps “sitting surface” to the concept of seat and “back support” to the concept of backrest through ontology-based normalization and lexical expansion. The term “stool” is mapped to the concept stool, which may be modeled as a seating device but not necessarily as an object containing a backrest. The patent claim is then represented as a graph in which the claimed chair has part relations to seat, legs, and backrest, together with the relation attached_to(backrest, seat). The product is represented as a second graph in which the stool has a seat-like surface and legs, but no backrest node is instantiated. The negative statement in the product description is preserved either as an explicit negative relation or as a graph-level absence constraint.
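The mapping and graph construction for the running example can be sketched with plain Python sets of triples. This is an illustrative stand-in for the OWL ontology and RDF graph, not the prototype's representation: the synonym table, the `has_part` relation name, and the separate store for negative relations are assumptions of this sketch.

```python
# Hypothetical synonym table standing in for ontology-based normalization.
SYNONYMS = {"sitting_surface": "seat", "back_support": "backrest"}

CLAIM_TRIPLES = [
    ("chair", "comprises", "seat"),
    ("chair", "comprises", "legs"),
    ("chair", "comprises", "backrest"),
    ("backrest", "attached_to", "seat"),
]
PRODUCT_TRIPLES = [
    ("stool", "has", "sitting_surface"),
    ("sitting_surface", "supported_by", "legs"),
    ("stool", "not_have", "back_support"),
]


def canonical(term: str) -> str:
    """Map a surface term to its ontology concept (identity if unknown)."""
    return SYNONYMS.get(term, term)


def build_graph(triples):
    """Return positive relations and explicitly negated relations separately."""
    graph = {"positive": set(), "negative": set()}
    for subj, pred, obj in triples:
        subj, obj = canonical(subj), canonical(obj)
        if pred == "not_have":
            # Explicit negative evidence is kept, not discarded.
            graph["negative"].add((subj, "has_part", obj))
        elif pred in {"comprises", "has"}:
            graph["positive"].add((subj, "has_part", obj))
        else:
            graph["positive"].add((subj, pred, obj))
    return graph


def nodes(graph) -> set:
    """Concepts that are actually instantiated by positive relations."""
    return {term for s, _, o in graph["positive"] for term in (s, o)}
```

In the resulting product graph, no backrest node is instantiated, while the denial of a backrest is retained as a negative relation, matching the two options described above.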
Once the two representations are constructed, the system performs semantic alignment between claim elements and product features. This alignment uses lexical similarity, ontology subsumption, synonym expansion, and graph-level relational consistency. In the example, the correspondence between “sitting surface” and “seat” is straightforward, and the relation between “three collapsible legs” and “a plurality of legs” is supported by both lexical identity and numerical compatibility. The critical issue concerns the backrest element. Because the product text explicitly states that no back support is present, the system registers both the absence of a corresponding feature and explicit negative evidence against a backrest match.
The knowledge dimension also hosts the rule base used for infringement analysis. The central rule is the all-elements rule: literal infringement requires that every material claim element be found in the accused product. Additional rules may encode taxonomy-aware matching, quantity conditions, and function-based equivalence. For example, a product feature may satisfy a claim element either through direct identity or through ontology-mediated correspondence if it is a subtype or accepted synonym. Equivalence rules can also be invoked when a feature differs lexically but performs substantially the same function in substantially the same way to achieve substantially the same result. In the present example, however, no substitute exists for the missing backrest; accordingly, both literal matching and equivalence-based matching fail for that element.
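A minimal sketch of the all-elements rule with taxonomy-aware matching is given below. The subtype table and function names are hypothetical, and equivalence rules are deliberately left out; the sketch only checks direct identity and subtype correspondence.

```python
# Hypothetical taxonomy: each key is a subtype of its value.
SUBTYPE_OF = {"pivot_hinge": "hinge", "folding_leg": "legs"}


def satisfies(feature: str, element: str) -> bool:
    """True if the feature is the element itself or one of its subtypes."""
    current = feature
    while current is not None:
        if current == element:
            return True
        current = SUBTYPE_OF.get(current)
    return False


def all_elements_rule(claim_elements, product_features):
    """Return (literal_match, missing_elements) for one claim."""
    if not claim_elements:
        raise ValueError("a claim must contain at least one element")
    missing = [
        element
        for element in claim_elements
        if not any(satisfies(feature, element) for feature in product_features)
    ]
    return (not missing, missing)
```

Returning the list of missing elements, and not only a Boolean, is a choice made here so that the result can feed the explanation trace directly.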
4.4. Reasoning, Decision Formation, and Explanation
The wisdom dimension transforms graph-level correspondences into an infringement-oriented assessment. Its role is not limited to executing deterministic rules. It also aggregates evidence, evaluates ambiguity, and produces a decision rationale that can be inspected by legal and technical experts. This is important because patent infringement analysis often yields mixed evidence: some claim elements may match exactly, some only approximately, and some may remain unsupported. In the proposed system, each element-level correspondence is therefore associated with a confidence value derived from the type and strength of the match.
In the running example, the seat element is matched through semantic normalization between “seat” and “sitting surface.” The legs element is matched through direct lexical and quantitative correspondence because the product contains three legs and therefore satisfies the claim requirement of a plurality of legs. The backrest element is not matched, and the product description provides explicit negative support for its absence. Under the all-elements rule, the failure of a single required element is sufficient to defeat literal infringement. Because the product also lacks any evident structure serving the same function as a backrest, the doctrine of equivalents is not triggered. The resulting conclusion is therefore non-infringement.
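The element-level confidence values and their aggregation can be illustrated as follows. The weights are hypothetical placeholders, and taking the minimum over elements is an assumption of this sketch, chosen because it mirrors the all-elements rule: a claim-level score cannot exceed the score of its weakest element.

```python
# Hypothetical weights per match type; the prototype's values are not given here.
MATCH_WEIGHTS = {"exact": 1.0, "ontology": 0.85, "functional": 0.6, "none": 0.0}


def aggregate_confidence(matches: dict[str, str]) -> tuple[float, str]:
    """Return the claim-level score and the element that limits it."""
    scores = {element: MATCH_WEIGHTS[kind] for element, kind in matches.items()}
    weakest = min(scores, key=scores.get)
    return scores[weakest], weakest


# Running example: seat via normalization, legs directly, backrest unmatched.
EXAMPLE_MATCHES = {"seat": "ontology", "legs": "exact", "backrest": "none"}
```

For the running example, the aggregate score is 0.0 and the limiting element is the backrest, which is consistent with the non-infringement conclusion stated above.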
The explanation generator then produces a traceable report. A representative output may be formulated as follows: “No infringement is detected for Claim 1 because the accused product does not contain the backrest element required by the claim. The product includes a seat-like sitting surface and a plurality of legs, which correspond to two claim elements. However, the description explicitly states that the product has no back support. Because at least one material claim element is absent, literal infringement is not established.” This type of explanation links the decision directly to extracted evidence and legal logic, which is consistent with the requirements of explainable legal AI [
25,
26].
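A report of this kind can be assembled from element-level records that retain their supporting text. The sketch below is illustrative only; the record fields, status labels, and wording are hypothetical and do not reproduce the prototype's report template.

```python
# Hypothetical element records for the running example.
RECORDS = [
    {"element": "seat", "status": "matched", "basis": "ontology normalization",
     "evidence": "a flat round sitting surface"},
    {"element": "legs", "status": "matched", "basis": "lexical and quantity match",
     "evidence": "three collapsible legs"},
    {"element": "backrest", "status": "contradicted", "basis": None,
     "evidence": "It does not have any back support."},
]


def explain(claim_id: str, records: list[dict]) -> str:
    """Build a line-per-element explanation linked to the source text."""
    matched = [r for r in records if r["status"] == "matched"]
    unmatched = [r for r in records if r["status"] != "matched"]
    verdict = ("Literal infringement is not established" if unmatched
               else "All claim elements are supported")
    lines = [f"{verdict} for {claim_id}."]
    for r in matched:
        lines.append(f'- "{r["element"]}" matched via {r["basis"]}; evidence: "{r["evidence"]}"')
    for r in unmatched:
        lines.append(f'- "{r["element"]}" {r["status"]}; evidence: "{r["evidence"]}"')
    return "\n".join(lines)
```

Calling `explain("Claim 1", RECORDS)` yields one verdict line followed by one line per element, each quoting the passage on which the status rests.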
The wisdom dimension may also invoke auxiliary reasoning components when the evidence is less clear. An optional case-based reasoning module can retrieve similar prior analytical patterns or previously observed dispute configurations to support the current assessment. This component is not essential to the basic workflow, but it can strengthen decision support in borderline cases. More importantly, the wisdom dimension may request additional processing from the lower dimensions when the current evidence is insufficient. For example, if the match status of a claim element remains ambiguous, the system may request ontology refinement, synonym expansion, or extraction of additional relations from the product description. This feedback mechanism further illustrates why the proposed DIKWP formulation should be described as a network rather than a unidirectional processing chain.
4.5. Purpose-Guided Control and User Feedback
The purpose dimension regulates the workflow according to the user’s analytical objective. In the proposed DIKWP network, the purpose is not introduced only after the reasoning process has been completed. Rather, it actively configures the behavior of the lower dimensions throughout the analysis. An enforcement-oriented task may prioritize recall and broader semantic matching, thereby lowering the threshold for flagging possible equivalence. A clearance or design-around task may prioritize precision and conservative risk assessment, thereby requiring stricter correspondence criteria. This role of purpose as an active control space is consistent with the DIKWP network perspective advanced in recent work on purpose-sensitive semantic systems [
38,
40,
63].
In the present example, the legal conclusion remains non-infringement regardless of operating mode because the absence of a backrest is explicit and decisive. However, the form of the output may vary according to purpose. In an enforcement scenario, the system may report that no actionable claim coverage is presently supported by the available evidence. In a design-around scenario, the system may emphasize that the absence of a backrest is the principal reason why infringement is not established and may further indicate that adding a back-support structure would materially increase infringement risk. Thus, purpose affects not only thresholds and inference scope, but also the explanatory framing of the final report.
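The effect of purpose on thresholds and framing can be sketched with a small configuration object. All names and numeric values below are hypothetical; they illustrate the idea that a stricter mode raises the acceptance threshold, not the settings used in the prototype.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PurposeProfile:
    match_threshold: float      # minimum element score accepted as a match
    flag_equivalents: bool      # whether equivalence candidates are reported
    framing: str                # emphasis of the final report


# Hypothetical profiles for two of the objectives discussed above.
PROFILES = {
    "enforcement": PurposeProfile(0.55, True,
                                  "claim coverage supported by available evidence"),
    "design_around": PurposeProfile(0.80, False,
                                    "elements whose absence avoids infringement"),
}


def element_accepted(score: float, purpose: str) -> bool:
    """Apply the purpose-specific threshold to one element-level score."""
    return score >= PROFILES[purpose].match_threshold
```

Under these illustrative values, a borderline score of 0.6 would be accepted in enforcement mode and rejected in design-around mode, whereas the unmatched backrest (score 0.0) is rejected in both, in line with the observation that the conclusion for the running example does not depend on the operating mode.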
The purpose dimension also supports interactive follow-up queries. Users may ask which claim elements were unmatched, which correspondences relied on equivalence rather than direct identity, or which textual passages served as the decisive evidence. Because the system preserves a trace from raw text to decision, these questions can be answered transparently. If the available product description is incomplete, the purpose dimension may instruct the system to return an “insufficient evidence” status rather than a definitive non-infringement conclusion. This distinction is practically important because unmentioned features should not automatically be treated as absent features.
4.6. Robustness to Ambiguous Patent Language, Multi-Claim Processing, and Practical Considerations
Several robustness issues must be addressed in real-world applications. First, product descriptions are often incomplete, promotional, or strategically vague. In such circumstances, failure to detect a claim element may reflect missing disclosure rather than actual absence. The proposed workflow therefore distinguishes between explicit negative evidence and mere non-mention. Only the former should strongly support non-infringement, whereas the latter may trigger uncertainty handling or a request for additional materials.
Building on this distinction, the system assigns one of five evidence states to each claim element: supported, contradicted, uncertain, non-mentioned, or equivalence-candidate. Supported evidence indicates that a product feature corresponds to a claim limitation, either directly or through ontology mediation, while preserving all material constraints. Contradicted evidence is assigned when the product description explicitly denies the presence of the claimed feature or conflicts with a material relation, modifier, quantity, range, or sequence constraint. Uncertain evidence applies when the available description is vague, incomplete, or semantically under-specified. Non-mentioned features are not automatically treated as absent; rather, they trigger uncertainty handling, additional evidence retrieval, or human review. Equivalence-candidate evidence applies when a non-identical product feature may perform a comparable function in a comparable way with a comparable result, but is not treated as literal support.
Table 2 summarizes the five evidence states and their corresponding system actions.
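The five evidence states can be represented as an enumeration with a simple classification function. The flag names and the precedence order among the flags are assumptions of this sketch; the text above defines the states but does not prescribe how competing signals are ranked.

```python
from enum import Enum


class EvidenceState(Enum):
    SUPPORTED = "supported"
    CONTRADICTED = "contradicted"
    UNCERTAIN = "uncertain"
    NON_MENTIONED = "non-mentioned"
    EQUIVALENCE_CANDIDATE = "equivalence-candidate"


def classify(*, explicit_denial: bool = False, constraint_conflict: bool = False,
             direct_match: bool = False, functional_match: bool = False,
             vague_mention: bool = False) -> EvidenceState:
    """Assign one evidence state to a claim element (precedence is assumed)."""
    if explicit_denial or constraint_conflict:
        return EvidenceState.CONTRADICTED
    if direct_match:
        return EvidenceState.SUPPORTED
    if functional_match:
        return EvidenceState.EQUIVALENCE_CANDIDATE
    if vague_mention:
        return EvidenceState.UNCERTAIN
    # Nothing in the description addresses the element: not the same as absence.
    return EvidenceState.NON_MENTIONED
```

In the running example, the backrest element would be classified with `explicit_denial=True` and therefore as contradicted, whereas a description that is simply silent about a backrest would yield the non-mentioned state.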
Second, semantic mismatch may arise from domain-specific terminology, spelling variation, or differences between technical and commercial descriptions. To mitigate this problem, the knowledge dimension combines ontology-based normalization with lexical resources and, where appropriate, learned similarity models. This hybrid strategy reduces the risk of false mismatches when equivalent technical concepts are expressed using different surface forms.
Third, patent infringement analysis is inherently claim-specific. The workflow described above is therefore executed independently for each asserted claim. Since infringement of any valid asserted claim may be legally sufficient, the final report aggregates claim-level outcomes rather than producing only one global similarity score. This claim-by-claim processing is preferable to document-level matching because it preserves legal granularity and supports targeted explanation.
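Claim-by-claim aggregation can be sketched as below. The outcome labels and the report layout are hypothetical. The sketch reports per-claim outcomes and derives an overall status in which any infringed claim dominates; claim validity is outside its scope.

```python
def aggregate_claims(outcomes: dict[str, str]) -> dict:
    """Combine per-claim outcomes without collapsing them into one score.

    Each outcome is assumed to be one of: 'infringement',
    'non-infringement', or 'insufficient evidence'.
    """
    infringed = sorted(c for c, o in outcomes.items() if o == "infringement")
    unresolved = sorted(c for c, o in outcomes.items() if o == "insufficient evidence")
    if infringed:
        overall = "potential infringement"
    elif unresolved:
        overall = "insufficient evidence"
    else:
        overall = "non-infringement"
    return {"overall": overall, "infringed_claims": infringed,
            "unresolved_claims": unresolved, "per_claim": dict(outcomes)}
```

Keeping the per-claim dictionary in the report preserves the legal granularity discussed above and lets the explanation generator target the specific claims that drive the overall status.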
Overall, the proposed semantic processing workflow converts raw patent and product texts into a structured, knowledge-rich, and purpose-sensitive infringement assessment. Although described in ordered stages for clarity, the workflow is implemented as a DIKWP network in which data acquisition, semantic extraction, ontology mapping, evidential reasoning, and purpose-guided orchestration interact recurrently. This networked design enables the system to combine claim-level precision, semantic flexibility, legal interpretability, and user-oriented control within a single analytical framework. The next section presents the prototype implementation and representative outputs generated by the system.
5. Prototype Implementation and Illustrative Outputs in the DIKWP Network
This section presents a proof-of-concept implementation of the proposed semantic AI framework for patent infringement detection. The prototype is intended to show how the DIKWP-based design can be operationalized through existing natural language processing, semantic-web, and rule-reasoning technologies, and how its intermediate reasoning states can be represented in an explainable form. The illustrative examples in this section make the element-level representation, ontology-mediated correspondence, rule-based inference, and explanation trace concrete, while the quantitative performance evaluation and statistical interpretation are reported separately in Section 6. In contrast to a rigid sequential pipeline, the prototype is implemented as a networked semantic process in which the data, information, knowledge, wisdom, and purpose dimensions exchange intermediate representations recurrently. Information extraction produces structured evidence for ontology mapping, knowledge-driven reasoning constrains subsequent matching, and purpose settings regulate inference sensitivity, uncertainty handling, and reporting style [
63].
5.1. Prototype Realization
The prototype was implemented primarily in Python 3.11, with semantic-web components used for ontology engineering and graph-based knowledge representation. At the information dimension, linguistic preprocessing relies on spaCy for tokenization, part-of-speech tagging, and dependency parsing, with domain-oriented customization to better handle long patent claims and technical noun phrases. Subject–action–object extraction is implemented through a hybrid rule layer that integrates dependency patterns, matcher templates, and curated grammar rules for passive constructions and domain-specific relation patterns, such as “attached to,” “supported by,” and “connected to.” Technical entity recognition is handled by a transformer-based model fine-tuned on patent annotations, enabling the detection of components, materials, and other domain-relevant entities. This design choice is consistent with prior work showing that claim parsing and SAO-oriented semantic structures are more suitable for infringement-related comparison than purely lexical similarity [14, 23].
The knowledge dimension is implemented through an OWL ontology designed in Protégé and a graph representation managed through RDFLib. The ontology contains approximately fifty classes covering general mechanical and electrical components together with core legal concepts, including Patent, Claim, ClaimElement, Product, ProductFeature, Correspondence, and InfringementEvidence. WordNet-based lexical expansion is used where appropriate to support synonym and hyponym matching. The extracted claim elements and product features are converted into RDF-style triples and then integrated into a claim–product knowledge graph. This graph functions as the semantic substrate for ontology alignment, element correspondence, rule evaluation, and explanation tracing [17, 45].
The prototype enforces the epistemological separation described in Section 2.5. Outputs from statistical modules are stored as candidate structures rather than as final legal determinations. For example, transformer-based entity recognition may propose a product feature, and embedding similarity may propose a candidate correspondence between that feature and a claim limitation. However, the correspondence is not counted as legally supported until the ontology layer confirms concept compatibility, the graph layer preserves relation consistency, and the rule engine verifies that no material modifier, quantity constraint, negative evidence, or legal exclusion defeats the match.
Reasoning in the wisdom dimension is implemented through a hybrid strategy that combines ontology lookup with a custom rule engine in Python. Although off-the-shelf semantic reasoners were considered during prototyping, a lightweight custom engine proved more convenient for element-wise claim analysis, negative-evidence handling, and purpose-conditioned threshold adjustment. Literal infringement is operationalized through an all-elements rule. Let $c$ denote an asserted patent claim and let $p$ denote the accused product or process description. The set $E_c = \{e_1, \dots, e_n\}$ denotes the legally material claim elements extracted from $c$ after claim segmentation. Each claim element is represented as a structured unit $e_i = (k_i, R_i, M_i, Q_i, N_i)$, where $k_i$ denotes the normalized core technical concept, $R_i$ denotes required structural or functional relations, $M_i$ denotes textual modifiers, $Q_i$ denotes quantity or range constraints, and $N_i$ denotes negation, exclusion, or limiting conditions. The set $F_p = \{f_1, \dots, f_m\}$ denotes the product features extracted from $p$. Each product feature is represented as $f_j = (\hat{k}_j, \hat{R}_j, \hat{M}_j, \hat{Q}_j, s_j, \pi_j)$, where $\hat{k}_j$ denotes the normalized product concept, $\hat{R}_j$ denotes observed structural or functional relations, $\hat{M}_j$ denotes observed modifiers, $\hat{Q}_j$ denotes observed quantities or ranges, $s_j$ denotes the evidence state, and $\pi_j$ records provenance links to the source text. In the detailed legal reasoning module below, the all-elements rule is formalized through the predicate $\mathrm{LitSupport}(f, e)$, which requires satisfaction of the core concept, required relations, modifiers, quantities, sequence constraints, and negative-evidence gates.
A simplified functional-similarity screening predicate is also included, but it is not treated as an infringement predicate. The predicate $\mathrm{FWR}(f, e)$ is used only as an internal screening signal for equivalence-candidate routing. It returns 1 when the product feature appears to perform a comparable function in a technically comparable way and produces a comparable technical result according to the function ontology and curated functional-action rules. However, $\mathrm{FWR}(f, e)$ does not establish legal equivalence, does not satisfy the all-elements rule by itself, and does not override explicit negative evidence, prosecution-history constraints, or material claim limitations. The legally cautious equivalence-screening rule is formalized later as $\mathrm{EqCandidate}(f, e)$ in Section 5.2.
Table 3 lists representative failure and boundary cases used to illustrate how the prototype separates literal-support failures from functional-equivalence screening.
The purpose dimension is realized as a lightweight orchestration layer that controls system mode, reasoning strictness, and output configuration. In an enforcement-oriented mode, the system can operate with higher sensitivity and broader equivalence screening. In a clearance or design-around mode, it can apply stricter correspondence criteria and emphasize unmatched claim elements or non-infringing design distinctions. The current prototype exposes these controls through a command-line interface, but the analytical core is interface-independent and can be embedded in a graphical or web-based system without changing the underlying DIKWP network logic.
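As a minimal sketch of how such purpose-conditioned control might be configured, the following Python fragment maps each mode to a strictness profile. The class `PurposeProfile`, its field names, and the threshold values are illustrative assumptions; the prototype's actual settings are not reported here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PurposeProfile:
    """Hypothetical strictness settings for one analysis mode."""
    match_threshold: float  # minimum score for keeping a candidate correspondence
    report_focus: str       # what the generated report emphasizes


# Illustrative values only; no concrete thresholds are given in the text.
PROFILES = {
    "enforcement": PurposeProfile(0.60, "potential correspondences"),
    "clearance": PurposeProfile(0.80, "unmatched claim elements"),
}


def keep_candidate(score: float, mode: str) -> bool:
    """Keep a candidate correspondence only if it meets the mode's threshold."""
    return score >= PROFILES[mode].match_threshold
```

Under these assumed values, a candidate scored 0.7 would be retained in the enforcement-oriented mode and discarded in the clearance mode, which mirrors the higher sensitivity of the former and the stricter correspondence criteria of the latter.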
Ontology expansion is handled through a semi-automatic update loop. The system records unmatched claim elements, low-confidence correspondences, and repeatedly observed out-of-vocabulary technical terms. These items are added to an ontology expansion queue. Candidate concepts and relations are generated by patent-specific entity recognition, SAO extraction, embedding-based term clustering, and graph-completion methods. However, automatic expansion is not directly committed to the legal-technical ontology. Each proposed update is associated with provenance metadata, confidence scores, and source evidence, and it must pass consistency checking and expert validation. This design allows the ontology to evolve with emerging technologies while preserving the traceability required for legal decision support.
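The gating logic of this update loop can be sketched as follows. The class `ProposedUpdate`, its fields, and the example entry are hypothetical and serve only to show that a candidate is held in the queue until both checks have passed.

```python
from dataclasses import dataclass


@dataclass
class ProposedUpdate:
    """One candidate ontology change waiting in the expansion queue."""
    term: str
    source_evidence: str          # provenance: where the term was observed
    confidence: float             # score from the proposing module
    consistency_checked: bool = False
    expert_validated: bool = False


def committable(update: ProposedUpdate) -> bool:
    """An update may be committed only after both gates have passed."""
    return update.consistency_checked and update.expert_validated


# Hypothetical queue entry for an out-of-vocabulary term.
queue = [ProposedUpdate("torsion spring", "hypothetical product-text snippet", 0.72)]
```

A high confidence score alone never makes an entry committable in this sketch, which reflects the stated requirement that automatic expansion is not written directly into the legal-technical ontology.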
5.2. Legal Operationalization of Claim Construction and Infringement Reasoning
To make the legal reasoning component operational rather than merely conceptual, the prototype represents claim construction, limitation decomposition, literal matching, equivalence screening, missing-limitation handling, prosecution-history constraints, and legally relevant similarity as explicit data structures and rules. The system does not treat patent infringement as document-level similarity. Instead, it treats infringement-risk assessment as a claim-by-claim and element-by-element reasoning process in which each asserted limitation must be represented, matched, contradicted, or marked as evidentially unresolved.
Claim limitation decomposition begins by segmenting each asserted claim into the preamble, transition phrase, and body limitations. The body is further decomposed into material limitations, including structural components, functional requirements, relational constraints, modifiers, quantity or range constraints, process-order requirements, negative limitations, and dependent-claim references. Where available, specification excerpts and prosecution-history materials are linked to the corresponding limitation as claim-construction evidence. The output is not a flat keyword list, but a set of legally structured claim-limitation objects.
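The first segmentation step can be illustrated with a deliberately naive sketch. The function `segment_claim` and its comma-based splitting are assumptions made for illustration; the prototype relies on dependency parsing and curated rules, which this fragment does not reproduce.

```python
import re

# Common transition phrases, longest first so the most specific one wins.
TRANSITIONS = ("consisting essentially of", "consisting of", "comprising")


def segment_claim(claim: str):
    """Split a claim into (preamble, transition phrase, body limitations)."""
    text = claim.strip().rstrip(".")
    for phrase in TRANSITIONS:
        match = re.search(rf"\b{re.escape(phrase)}\b", text)
        if match:
            preamble = text[:match.start()].strip()
            body = text[match.end():]
            # Naive assumption: body limitations are separated by commas.
            parts = [re.sub(r"^and\s+", "", part.strip())
                     for part in body.split(",") if part.strip()]
            return preamble, phrase, parts
    return text, "", []


preamble, transition, limitations = segment_claim(
    "A chair comprising a seat, a plurality of legs, "
    "and a backrest attached to the seat."
)
```

For the running chair claim this yields the preamble "A chair", the transition phrase "comprising", and three body limitations. Real claims with nested clauses or internal commas would need the structured decomposition described above.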
For an asserted claim $c$, the decomposed claim-element set is denoted as $E_c = \{e_1, \dots, e_n\}$. Each claim limitation is represented as

$$e_i = (\mathit{id}_i, \tau_i, k_i, R_i, M_i, Q_i, O_i, N_i, D_i, C_i, H_i, \mu_i, \pi_i).$$

Here, $\mathit{id}_i$ is the limitation identifier; $\tau_i$ indicates whether the limitation is structural, functional, relational, numerical, material, process-oriented, or negative; $k_i$ denotes the normalized technical concept; $R_i$ records required structural, spatial, causal, or functional relations; $M_i$ records material, positional, shape, or purpose modifiers; $Q_i$ records numerical, range, threshold, and plurality constraints; $O_i$ records method-step order; $N_i$ records negative or exclusionary language; $D_i$ records inherited dependent-claim limitations; $C_i$ records claim-construction sources from claim language, specification, or other intrinsic evidence; $H_i$ records prosecution-history constraints; $\mu_i$ indicates whether the limitation is material for all-elements reasoning; and $\pi_i$ records provenance links to the source text.
Table 4 defines the fields used in this structured claim-limitation representation.
Missing limitations are handled conservatively. The prototype distinguishes supported, contradicted, non-mentioned, uncertain, and equivalence-candidate evidence states. Explicit contradiction is treated as strong negative evidence. Non-mention is not treated as proof of absence because real product descriptions may be incomplete or selectively drafted. A claim-level positive literal-support outcome is generated only when every material limitation is supported. If one material limitation is contradicted, literal support fails. If one material limitation is non-mentioned or uncertain, the system returns an insufficient-evidence or expert-review status rather than a definitive positive conclusion.
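The aggregation rule described in this paragraph can be sketched in a few lines of Python. The enum `Evidence`, the function `claim_level_status`, and the returned status strings are illustrative names; checking contradiction before the other states is an assumption about precedence when several states co-occur.

```python
from enum import Enum


class Evidence(Enum):
    SUPPORTED = "supported"
    CONTRADICTED = "contradicted"
    NON_MENTIONED = "non-mentioned"
    UNCERTAIN = "uncertain"
    EQUIVALENCE_CANDIDATE = "equivalence-candidate"


def claim_level_status(states: dict) -> str:
    """Aggregate evidence states of the material limitations of one claim."""
    values = list(states.values())
    # One contradicted material limitation defeats literal support.
    if any(state is Evidence.CONTRADICTED for state in values):
        return "literal-support-fails"
    # Positive literal support requires every material limitation.
    if values and all(state is Evidence.SUPPORTED for state in values):
        return "literal-support"
    # Non-mention, uncertainty, or an equivalence candidate is never
    # promoted to a positive conclusion.
    return "insufficient-evidence-or-expert-review"
```

The important design point is the final branch: a limitation that is merely not mentioned leads to an unresolved status, not to a finding of absence.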
Table 5 reports the corresponding evidence states and their claim-level consequences.
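For a product feature $f$ and a claim limitation $e$, literal support is defined as

$$\mathrm{LitSupport}(f, e) = \mathrm{ConceptMatch}(f, e) \wedge \mathrm{RelMatch}(f, e) \wedge \mathrm{ModMatch}(f, e) \wedge \mathrm{QtyMatch}(f, e) \wedge \mathrm{SeqMatch}(f, e) \wedge \neg\,\mathrm{NegEvidence}(f, e).$$

The predicate $\mathrm{ConceptMatch}(f, e)$ may be satisfied by direct lexical identity, accepted synonymy, or ontology-mediated literal support, such as a recognized subtype or domain-standard alternative term. However, literal support still requires that all legally material relations, modifiers, quantities, ranges, and process-order constraints be satisfied. Therefore, ontology-mediated semantic normalization is not equivalent to unrestricted semantic similarity. At the claim level, the all-elements rule is

$$\mathrm{LiteralSupport}(c, p) = \bigwedge_{e \in E_c^{\mathrm{mat}}} \exists f \in F_p : \mathrm{LitSupport}(f, e),$$

where $E_c^{\mathrm{mat}}$ denotes the set of material claim limitations.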
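A minimal Python sketch of the element-level literal-support check is given below. The classes `Limitation` and `Feature`, the lookup table `NORMALIZE`, and the reduction of quantity constraints to a minimum count are simplifying assumptions; sequence constraints are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limitation:
    concept: str
    relations: frozenset = frozenset()  # e.g. {("attached_to", "seat")}
    modifiers: frozenset = frozenset()
    min_count: int = 1                  # plurality or lower-bound constraint


@dataclass(frozen=True)
class Feature:
    concept: str
    relations: frozenset = frozenset()
    modifiers: frozenset = frozenset()
    count: int = 1
    negated: bool = False               # explicit negative statement in the text


# Hypothetical normalization table standing in for the ontology layer.
NORMALIZE = {"sitting surface": "seat", "back support": "backrest"}


def concept_match(f: Feature, e: Limitation) -> bool:
    """Identity or ontology-mediated normalization of the core concept."""
    return NORMALIZE.get(f.concept, f.concept) == e.concept


def literal_support(f: Feature, e: Limitation) -> bool:
    """All gates must hold; a concept match alone is not sufficient."""
    return (concept_match(f, e)
            and e.relations <= f.relations   # required relations observed
            and e.modifiers <= f.modifiers   # required modifiers observed
            and f.count >= e.min_count       # quantity constraint satisfied
            and not f.negated)               # no explicit negative evidence
```

With these definitions, a feature "sitting surface" supports the limitation "seat" and three legs satisfy a plurality of legs, whereas a negated "back support" feature fails even though its concept normalizes to "backrest".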
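The doctrine-of-equivalents component is implemented only as a screening mechanism. It does not produce a final legal determination of equivalence. Instead, it produces an expert-review flag:

$$\mathrm{EqCandidate}(f, e) = \neg\,\mathrm{LitSupport}(f, e) \wedge \mathrm{FWR}(f, e) \wedge \neg\,\mathrm{NegEvidence}(f, e) \wedge \neg\,\mathrm{PHBlocked}(f, e).$$

Here, $\mathrm{FWR}(f, e)$ denotes a simplified function–way–result screening predicate, $\mathrm{LitSupport}(f, e)$ is the literal-support predicate, $\mathrm{NegEvidence}(f, e)$ marks explicit negative evidence, and $\mathrm{PHBlocked}(f, e)$ marks a prosecution-history exclusion. $\mathrm{FWR}(f, e)$ is satisfied only when the product feature performs a comparable function, in a comparable technical way, and achieves a comparable technical result according to the functional ontology and curated rule base. Even when $\mathrm{FWR}(f, e) = 1$, the system does not conclude doctrine-of-equivalents infringement. It only flags the limitation for expert legal review.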
Table 6 summarizes the legal matching hierarchy applied by the prototype.
Prosecution-history limitations are represented in the legal ontology and knowledge graph rather than being treated as external narrative comments. The legal ontology is extended with the classes Amendment, NarrowingAmendment, ApplicantArgument, ExaminerRejection, Disclaimer, EstoppelConstraint, ExcludedEquivalent, and ClaimConstructionEvidence. These classes allow the system to represent whether a limitation was narrowed, whether a feature was disclaimed, whether an applicant distinguished prior art on a particular ground, and whether an asserted equivalence candidate falls within an excluded technical territory.
In the knowledge graph, prosecution-history evidence is encoded through typed edges such as narrowed_by, disclaimed, argued_distinct_from, and blocked_by. The rule engine uses these relations as gates. If a proposed product feature falls within an excluded equivalent or a disclaimed feature, the system blocks the equivalence-candidate flag even if the feature is functionally similar. If prosecution-history evidence is unavailable, the system does not assume that no estoppel exists. Instead, the output is marked as prosecution-history-unchecked, and the equivalence assessment is routed to expert review.
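The gating behavior can be sketched as a small routing function. The function name, the status strings, and the use of `None` to signal unavailable prosecution-history evidence are assumptions made for illustration.

```python
from typing import Optional, Set


def equivalence_flag(fwr_comparable: bool,
                     feature_concept: str,
                     excluded: Optional[Set[str]],
                     negative_evidence: bool = False) -> str:
    """Route one functionally similar feature through the prosecution-history gates.

    `excluded` holds concepts reached by `blocked_by` or `disclaimed` edges;
    None means that no prosecution-history evidence was available.
    """
    if negative_evidence or not fwr_comparable:
        return "no-flag"
    if excluded is None:
        # Absence of evidence is not treated as absence of estoppel.
        return "prosecution-history-unchecked"
    if feature_concept in excluded:
        return "blocked-by-prosecution-history"
    return "equivalence-candidate"
```

Every non-blocking outcome in this sketch still leads to expert review; the function only decides which label accompanies the case.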
The system distinguishes legally relevant similarity from merely technical similarity through an anchoring rule. A correspondence is legally relevant only when it is anchored to a specific material claim limitation, supported by traceable product evidence, and not defeated by relation, modifier, quantity, sequence, negative-evidence, or prosecution-history constraints. General topical similarity, shared technical field, common function, or embedding proximity is therefore insufficient.
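Writing $\mathrm{LegallyRelevant}(f, e)$ for this anchoring condition and $\mathrm{Sim}(f, e)$ for a general technical or embedding-based similarity score, if $\mathrm{Sim}(f, e)$ is high but $\mathrm{LegallyRelevant}(f, e) = 0$, the system records the relation as background technical similarity only. It is not counted as a matched limitation, does not satisfy the all-elements rule, and does not support a positive infringement-risk output.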
Table 7 summarizes how prosecution-history constraints are represented in the legal ontology and knowledge graph.
Table 8 summarizes the operational steps used for legal reasoning in the prototype.
5.3. Representation of Claim and Product Knowledge
A central design goal of the prototype is to preserve the structural relation between claim language and product evidence rather than collapsing both into flat text vectors. Each patent claim is therefore decomposed into a set of claim elements, and each element is represented as an entity node together with its attributes and internal relations. Product descriptions are processed in a parallel manner. When a claim contains a relation such as “a backrest attached to the seat,” the representation includes both the backrest entity and the relational edge attached_to(backrest, seat). Likewise, when a product description states that a “sitting surface is supported by three collapsible legs,” the representation preserves the component identity, the support relation, and the descriptive modifiers.
The resulting knowledge graph is source-aware. Claim-derived nodes and product-derived nodes remain distinguishable, while candidate correspondences are represented by explicit alignment relations. This design enables three forms of evidential reasoning. First, direct matches can be established when the same or synonymous concepts appear in both sources. Second, ontology-mediated matches can be inferred when a product feature is a subtype or accepted variant of the claimed element. Third, explicit negative evidence can be preserved when the product description denies the presence of a claimed feature. The last of these is particularly important in patent infringement analysis, because an explicit absence statement may be legally decisive and should not be treated as a mere omission.
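A source-aware graph of this kind can be sketched with plain tuples. The `Triple` type, the predicate names other than `attached_to`, and the helper `edges` are hypothetical; the prototype stores its graph as RDF-style triples, which this standard-library fragment only imitates.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    source: str      # "claim", "product", or "alignment"
    subject: str
    predicate: str
    obj: str


graph = [
    Triple("claim", "backrest", "attached_to", "seat"),
    Triple("product", "sitting surface", "supported_by", "legs"),
    Triple("product", "legs", "has_count", "3"),
    Triple("product", "legs", "has_modifier", "collapsible"),
    # Explicit negative evidence is stored, not silently dropped.
    Triple("product", "back support", "absent", "true"),
    # Candidate correspondence kept separate from both sources.
    Triple("alignment", "sitting surface", "corresponds_to", "seat"),
]


def edges(source: str, predicate: Optional[str] = None):
    """Return the triples of one source, optionally filtered by predicate."""
    return [t for t in graph
            if t.source == source and (predicate is None or t.predicate == predicate)]
```

Because each triple carries its source, a query such as `edges("product", "absent")` retrieves the explicit absence statement without confusing it with a claim-derived or alignment edge.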
5.4. Illustrative Chair–Stool Example
To demonstrate the end-to-end behavior of the prototype, the system was applied to the running example introduced in Section 4. The asserted claim was: “A chair comprising a seat, a plurality of legs, and a backrest attached to the seat.” The accused-product description was: “Our product is a portable stool with a flat round sitting surface supported by three collapsible legs. It does not have any back support.” A human analyst would ordinarily conclude that literal infringement is unlikely because the accused product lacks the backrest element. The purpose of this illustration was to determine whether the prototype would reproduce the same conclusion through a transparent DIKWP-network reasoning process.
At the information dimension, the claim was segmented into three required elements: seat, plurality of legs, and backrest attached to the seat. The product description yielded three relevant feature statements: a sitting surface, three collapsible legs, and an explicit negative statement indicating the absence of back support. At the knowledge dimension, sitting surface was normalized to the concept of seat, and back support was mapped to the concept of backrest for the purpose of semantic comparison. The claim graph and the product graph were then aligned and tested against the all-elements condition.
The result was a clear non-infringement outcome. The seat element was matched through ontology-based normalization between sitting surface and seat. The legs element was matched directly, with the additional numerical check that three legs satisfy the claim requirement of a plurality of legs. The backrest element was not matched, and the explicit negative product statement provided positive support for its absence. Because at least one material claim element was missing, literal infringement was rejected. No substitute structure was identified that would satisfy the simplified equivalence condition, and equivalence-based concern was therefore also rejected.
Table 9 summarizes the element-level comparison.
The explanation generator then produced a structured report stating that the product contains seat and leg features corresponding to two claim elements, but does not contain the required backrest or any identified equivalence-candidate substitute. This example is useful not because the legal conclusion is difficult, but because it shows that the prototype preserves a traceable path from raw text to final assessment. Such traceability is a central requirement in explainable legal AI and is especially important for patent infringement analysis, where expert review must remain possible at the level of individual claim elements and supporting evidence [25, 26].
To complement this minimal example, Table 10 presents a litigation-style claim–product semantic comparison for a mechanical clamp. The table illustrates how the DIKWP network renders claim limitations, ontology concepts, and aligned product features within a format that remains familiar to patent professionals. Unlike purely lexical claim charts, this semantic chart separates literal support, ontology-mediated correspondence, and functional-equivalence candidates. In the example, a pivot may provide an ontology-mediated correspondence to a hinge limitation if claim construction permits, while a torsion spring may trigger a functional-equivalence candidate flag rather than a final legal equivalence conclusion.
The prototype also records a rule-oriented decision trace linking semantic matches to legal inference. Table 11 summarizes this operational logic. The table should be read as the prototype’s reasoning abstraction rather than as a complete statement of jurisdiction-specific patent doctrine. Its purpose is to show how literal coverage, equivalence-oriented screening, and failure of the all-elements condition are distinguished transparently within the DIKWP network.
The examples in Table 9, Table 10, and Table 11 are illustrative outputs intended to demonstrate element-level representation, semantic correspondence, and rule-oriented reasoning traces. They are not used as validation evidence. Quantitative evaluation and statistical interpretation are reported separately in Section 6.
5.5. Additional Qualitative Evaluation
Beyond the chair–stool example, the prototype was tested on several hypothetical scenarios and one qualitative case derived from a publicly reported coffee-capsule patent dispute. In that case, the patent description and the accused product were sufficiently similar to trigger extensive correspondence analysis, yet one structural element required by the asserted claim did not appear in the literal claimed form. The system initially rejected direct element identity, but it also detected a functionally similar edge structure that could trigger an equivalence-candidate flag. The resulting output was therefore not a simple binary judgment, but an uncertainty-aware report indicating that literal correspondence was incomplete while equivalence-based concern remained.
This qualitative case illustrates the benefit of the DIKWP network formulation. The initial mismatch identified in the knowledge dimension triggered additional reasoning in the wisdom dimension, which in turn depended on purpose-conditioned evaluation criteria. In an enforcement-oriented configuration, the same case would be reported as an equivalence concern requiring legal review. In a clearance-oriented configuration, the system instead emphasizes the disputed element as a possible design vulnerability or redesign target. The semantic evidence remains the same, but the analytical posture and reporting logic are shaped by the purpose dimension.
5.6. Preliminary Throughput and Implementation Observations
A preliminary throughput test was conducted to assess whether the prototype architecture is computationally workable for small-scale experimental use. In this test, one patent claim set was compared against 1000 randomly selected patent abstracts used as surrogate product descriptions. Under a rule-based configuration without deep neural extraction, the system processed approximately 50 cases per minute on a standard desktop computer used for prototyping. When the patent-adapted deep NLP models were enabled, throughput decreased to approximately 5 cases per minute. These numbers are not intended as benchmark results; rather, they provide a practical indication of the relative computational burden of the main modules.
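To make the relative burden concrete, the reported rates can be converted into wall-clock time for the 1000-case test set. The short calculation below uses only the approximate figures stated above; the helper name is illustrative.

```python
def minutes_needed(n_cases: int, cases_per_minute: float) -> float:
    """Wall-clock minutes to process n_cases at a constant rate."""
    return n_cases / cases_per_minute


rule_based_minutes = minutes_needed(1000, 50)    # 20.0 minutes
deep_nlp_minutes = minutes_needed(1000, 5)       # 200.0 minutes
slowdown = deep_nlp_minutes / rule_based_minutes  # 10.0
```

At the reported rates, the full comparison takes about 20 minutes in the rule-based configuration and about 3 hours 20 minutes with the deep NLP models enabled, a tenfold difference.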
The main bottleneck was not graph matching or rule execution, but linguistic parsing and entity extraction. This observation is consistent with the broader patent-NLP literature, where domain-adapted parsing and semantic extraction often dominate runtime cost [1, 3]. From a design perspective, this suggests that future optimization should focus on batching, pre-parsing, model distillation, or staged candidate filtering rather than on reducing the already moderate cost of ontology-based reasoning. It also supports the modular strategy adopted in the prototype, because individual components can be replaced or accelerated without altering the overall DIKWP network structure.
Future optimization will focus on the information dimension, where linguistic parsing and entity extraction dominate runtime. Several strategies are planned. First, a staged filtering architecture can apply lightweight lexical retrieval, BM25-style ranking, or dense vector retrieval to identify candidate patent–product pairs before invoking the full deep NLP pipeline. Second, repeated claim parsing can be cached, because the same patent claims are often compared against many products. Third, transformer-based extraction models can be accelerated through batching, GPU inference, quantization, pruning, and task-specific model distillation. Fourth, graph construction can be made incremental so that only changed or newly introduced entities are recomputed. These optimizations would reduce latency while preserving the explainable DIKWP reasoning structure.
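The first two strategies, staged filtering and claim-parse caching, can be sketched with the standard library alone. The term-overlap prefilter below is a crude stand-in for BM25 or dense retrieval, and the function names, stopword list, and overlap threshold are assumptions made for illustration.

```python
import re
from functools import lru_cache

STOPWORDS = frozenset({"a", "an", "and", "of", "the", "to", "with"})


def terms(text: str) -> frozenset:
    """Lowercased content words of a text."""
    return frozenset(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


@lru_cache(maxsize=None)
def parse_claim(claim_text: str) -> frozenset:
    """Stand-in for expensive claim parsing; cached per claim text."""
    return terms(claim_text)


def prefilter(claim_text: str, products: list, min_overlap: int = 2) -> list:
    """Keep only products sharing enough terms with the claim."""
    claim_terms = parse_claim(claim_text)  # reused across products and calls
    return [p for p in products if len(claim_terms & terms(p)) >= min_overlap]
```

Only products that survive the prefilter would be passed to the full extraction and reasoning pipeline, and the cache ensures that a claim compared against many products is parsed once.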
5.7. Implications of the Prototype
The prototype should be interpreted as a proof of feasibility rather than as a production-level legal system. Its main contribution is to show that the proposed DIKWP network can be instantiated through existing NLP, semantic-web, and rule-reasoning techniques while preserving explainability and legal interpretability. The implementation demonstrates that the architecture is capable of preserving claim-element structure, handling negative evidence, supporting ontology-mediated semantic alignment, producing human-readable explanations, and adjusting its reporting behavior to different analytical purposes.
At the same time, the current prototype also reveals the main areas requiring further development. Ontology coverage remains limited to a manageable set of technical and legal classes; equivalence reasoning is operational rather than jurisprudentially complete; and the interface is intentionally lightweight. These limitations do not weaken the conceptual value of the framework, but they identify the next steps toward a more mature system. In particular, richer domain ontologies, stronger patent-specific entity recognition resources, multimodal processing of patent drawings, and tighter integration between symbolic reasoning and learned semantic matching would substantially improve practical coverage [19, 55].
Overall, the prototype supports the feasibility of the architectural claim advanced in the previous sections: patent infringement detection can be modeled as a networked process of semantic extraction, ontology-guided alignment, evidential reasoning, and purpose-aware orchestration. The next section therefore moves from illustrative implementation results to a more systematic evaluation using controlled experimental data and quantitative performance metrics.
6. Experimental Evaluation of the DIKWP Network on Controlled and Real-World Patent–Product Corpora
This section evaluates the proposed DIKWP-based semantic AI framework from three perspectives: infringement-risk detection effectiveness, computational efficiency, and the contribution of the DIKWP network dimensions. To address both internal controllability and external validity, the evaluation uses a two-layer design. The first layer is a controlled synthetic patent–product corpus, which allows the reasoning behavior of different system configurations to be isolated and compared under known positive, negative, and near-miss conditions. The second layer is a real-world pilot corpus constructed from publicly available patent claims and real product technical descriptions. This pilot corpus is designed to examine whether the framework can preserve claim-element reasoning under realistic patent drafting styles, domain-specific terminology, incomplete product evidence, and borderline semantic correspondences.
The two evaluation layers serve different purposes and are therefore reported separately. The controlled synthetic corpus is used for configuration comparison, ablation analysis, and transparent precision–recall calculation. The real-world pilot corpus is used as an external-validity check rather than as a litigation-level benchmark. This distinction is important because real patent infringement analysis depends not only on textual similarity but also on claim construction, product-description completeness, prosecution-history context, and expert legal interpretation.
6.1. Dataset Construction and Annotation Protocol
To address the possibility that the reported precision and recall could reflect artifacts of dataset design rather than genuine element-level reasoning, the controlled synthetic corpus was constructed according to an explicit claim-element protocol. The controlled corpus contains 100 patent–product pairs generated from 20 synthetic patent claim templates across four technical domains: mechanical devices, electrical/electronic devices, software/process methods, and medical tools. Each template was used to generate five product descriptions representing different claim-to-product correspondence patterns. The purpose of this corpus is not to approximate the full complexity of patent litigation, but to provide a transparent and controllable evaluation setting in which claim-limitation decomposition, missing-limitation handling, near-miss discrimination, and equivalence-candidate routing can be tested.
Each synthetic patent claim template was first decomposed into legally material claim limitations. A material limitation was defined as a claim element whose absence, contradiction, or material alteration would affect the claim-level infringement-risk label under the prototype’s all-elements reasoning rule. The decomposed limitations included core technical concepts, structural relations, functional relations, spatial constraints, material or purpose modifiers, numerical or range constraints, and process-order constraints. Product descriptions were then generated by systematically preserving, paraphrasing, omitting, contradicting, or functionally modifying these limitations.
Positive cases were generated only when all material limitations were preserved in the product description. To avoid trivial lexical overlap, positive cases included literal-support descriptions, synonym/paraphrase descriptions, and ontology-mediated support descriptions in which the product expression used a domain-recognized alternative term while preserving the material relation and constraint. Negative cases were generated by omitting, contradicting, or materially altering at least one required limitation. Near-miss negative cases preserved most claim elements but changed one legally decisive limitation, such as a spatial relation, material modifier, numerical range, or process-step order. Borderline cases were generated when a product feature was functionally related to a claim limitation but was not treated as literal or ontology-mediated support. These cases were labeled as equivalence-candidate cases and routed to expert review rather than counted as definitive literal-support positives.
Table 12 summarizes the generation rule, annotation criterion, and evaluation purpose for each case type in the controlled synthetic corpus.
The corpus was annotated at two levels. At the element level, each material claim limitation was assigned an evidence state. At the claim level, each patent–product pair was assigned an infringement-risk label. The claim-level label was derived from the element-level evidence states rather than from global document similarity. This design prevents a high overall textual similarity score from producing a positive label when a material limitation is missing, contradicted, or legally unresolved.
Table 13 reports the element-level and claim-level annotation criteria used for the controlled synthetic corpus.
Table 14 reports the label distribution and domain coverage of the controlled synthetic corpus. The corpus contains 50 positive cases and 50 non-positive cases. The non-positive cases include 20 missing-limitation cases, 20 near-miss cases, and 10 borderline equivalence-candidate cases. This design prevents the evaluation from collapsing into an easy positive-versus-obvious-negative task.
Table 15 provides representative positive, negative, near-miss, borderline, and insufficient-evidence examples. These examples are included to make the annotation logic inspectable. They also show that the labels were assigned through element-level legal-technical reasoning rather than through document-level lexical overlap.
Several safeguards were used to reduce dataset-design artifacts. First, positive cases were not generated by simply copying claim language; synonym, paraphrase, and domain-specific alternative terminology were introduced. Second, negative cases were not limited to obvious non-infringement examples; near-miss cases preserved most claim elements while changing one decisive limitation. Third, labels were assigned at the claim-element level before claim-level aggregation, so global document similarity could not by itself produce a positive label. Fourth, borderline equivalence-candidate cases were separated from literal-support cases and were not treated as definitive infringement positives. These safeguards make the evaluation more suitable for testing structured reasoning than a simple lexical-overlap benchmark.
6.2. Evaluation Settings and System Configurations
The evaluation compared five system configurations. The first was a keyword-matching baseline based on lexical overlap between patent claims and product descriptions. The second was a semantic configuration using SAO extraction and ontology-based matching without the full DIKWP reasoning and purpose control. The third was the full proposed system in its default setting. The fourth and fifth were purpose-conditioned variants of the full system, namely an aggressive mode optimized for higher recall and a conservative mode optimized for higher precision. These variants allow the contribution of the purpose dimension to be assessed explicitly.
The quantitative metrics included precision, recall, and F1-score at the patent–product pair level. Average processing time, scalability with respect to corpus size, and memory consumption were also measured. Explanation quality was manually assessed by checking whether each generated explanation identified the decisive claim element or elements. In 98 of the 100 evaluated cases, the generated explanation correctly identified the decisive claim element or elements. In the remaining two cases, parser errors led to slightly inaccurate element naming, although the final infringement label remained unchanged.
6.3. Real-World Pilot Corpus and External-Validity Check
To address the limited external validity of a purely synthetic patent–product corpus, we added a real-world pilot corpus based on publicly available patent claims and real product technical descriptions. The purpose of this pilot corpus is not to construct a litigation-level benchmark, but to examine whether the proposed DIKWP network can preserve claim-element reasoning under realistic drafting styles, domain-specific terminology, incomplete product evidence, and borderline claim-to-product correspondences. This design directly responds to the concern that synthetic patent–product pairs may not fully capture the ambiguity and evidentiary incompleteness of real patent infringement analysis.
Patent claims in the pilot corpus were collected from public patent-search resources. For each selected patent, one independent claim was used as the primary asserted claim, and dependent claims were considered when they introduced material technical limitations relevant to the product comparison. Product evidence was collected from real-world technical materials, including manufacturer manuals, technical datasheets, official product webpages, installation guides, user manuals, and technical brochures. Short promotional descriptions or commercial summaries were not used alone unless they contained technically informative product features. Where publicly available litigation-related materials were available, such as court opinions, complaints, claim-construction materials, or claim-chart-like descriptions, they were used only for qualitative sanity checking of legally relevant claim limitations rather than as binding legal determinations.
The pilot corpus was designed to complement, rather than replace, the controlled synthetic corpus. The synthetic corpus provides internal controllability because positive, negative, and near-miss conditions can be systematically constructed. By contrast, the real-world pilot corpus provides an external-validity check because real patents and product descriptions often contain drafting variation, implicit terminology, incomplete disclosure, and domain-specific expressions. The two corpora are therefore reported separately and are not pooled into a single performance estimate.
The pilot corpus contains real patent–product pairs across three technical domains: mechanical devices, electrical/electronic devices, and software/process claims. These domains were selected because they represent different types of claim limitations. Mechanical-device claims often contain structural and spatial relations. Electrical/electronic claims often contain component, signal, and control-function relations. Software/process claims often contain functional limitations, process steps, and sequence constraints. This domain coverage allows the pilot study to test whether the DIKWP network can handle different forms of claim-to-product correspondence beyond surface lexical overlap.
Table 16 summarizes the composition of the real-world pilot corpus.
Each real-world patent–product pair was annotated at two levels. At the element level, each material claim limitation was assigned one of five evidence states: supported, contradicted, non-mentioned, uncertain, or equivalence-candidate. A limitation was labeled as supported when the product evidence disclosed the required feature and its material constraints directly, through a synonym, or through an ontology-mediated correspondence. A limitation was labeled as contradicted when product evidence explicitly denied the required feature or disclosed an incompatible relation, modifier, range, quantity, or process order. A limitation was labeled as non-mentioned when the available product evidence did not disclose the feature. A limitation was labeled as uncertain when the product evidence was vague, incomplete, or technically under-specified. A limitation was labeled as equivalence-candidate when a product feature was not literally identical to the claimed limitation but appeared functionally related in a way that could require expert doctrine-of-equivalents review.
At the claim level, each pair was assigned one of four outcome labels: likely literal support, likely non-coverage due to a missing or contradicted material limitation, borderline equivalence concern, or insufficient evidence. Likely literal support was assigned only when every material claim limitation was supported by traceable product evidence. Likely non-coverage was assigned when at least one material limitation was contradicted or materially mismatched. Borderline equivalence concern was assigned when literal support was incomplete but a non-identical product feature appeared to perform a comparable function in a comparable way with a comparable technical result. Insufficient evidence was assigned when one or more material limitations were not disclosed by the product evidence and no reliable positive or negative inference could be made.
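The two-level annotation scheme can be illustrated with a minimal Python sketch. The names `Evidence` and `aggregate_claim_label` are hypothetical, and the precedence order among the rules (contradiction first, then missing or uncertain evidence, then equivalence candidates) is an assumption made for illustration; the text states the four outcome conditions but not how overlaps between them are resolved.

```python
from enum import Enum


class Evidence(Enum):
    """Element-level evidence states used in the pilot annotation."""
    SUPPORTED = "supported"
    CONTRADICTED = "contradicted"
    NON_MENTIONED = "non-mentioned"
    UNCERTAIN = "uncertain"
    EQUIVALENCE_CANDIDATE = "equivalence-candidate"


def aggregate_claim_label(states):
    """Map element-level states to one claim-level outcome label.

    The precedence order below is an assumption of this sketch.
    """
    states = list(states)
    if not states:
        raise ValueError("a claim needs at least one material limitation")
    # One contradicted limitation is enough to defeat literal coverage.
    if Evidence.CONTRADICTED in states:
        return "likely non-coverage"
    # Non-disclosure is not treated as proof of absence.
    if Evidence.NON_MENTIONED in states or Evidence.UNCERTAIN in states:
        return "insufficient evidence"
    # Every limitation is accounted for, but at least one only by equivalence.
    if Evidence.EQUIVALENCE_CANDIDATE in states:
        return "borderline equivalence concern"
    # Every material limitation is supported by product evidence.
    return "likely literal support"


example = [Evidence.SUPPORTED, Evidence.SUPPORTED, Evidence.NON_MENTIONED]
print(aggregate_claim_label(example))  # prints "insufficient evidence"
```

Under this ordering, a single non-mentioned limitation keeps an otherwise well-supported claim out of the literal-support category, which mirrors the conservative routing described for the pilot corpus.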
For reporting and interpretation, definite positive and definite negative cases were separated from borderline and insufficient-evidence cases. The real-world pilot corpus was not pooled with the controlled synthetic corpus and was not used to produce a separate headline precision–recall estimate. Instead, it was used as a qualitative external-validity check to examine whether the DIKWP network could preserve element-level reasoning, avoid overconfident infringement or non-infringement conclusions, and route uncertain or borderline cases to expert review or insufficient-evidence status. This reporting design is important because forcing legally unresolved cases into binary labels would overstate the certainty of the system output.
Table 17 summarizes the annotation criteria used for the real-world pilot corpus.
The real-world pilot corpus also highlights a practical difficulty that is less visible in synthetic testing: product descriptions often omit legally material implementation details. In such cases, the system should not treat non-disclosure as proof of absence. The DIKWP network therefore assigns non-mentioned or uncertain evidence states and routes the case to expert review when the available evidence cannot support a definite claim-level conclusion. This conservative behavior is consistent with the intended role of the framework as an explainable decision-support tool rather than an autonomous infringement adjudicator.
The pilot corpus remains limited in scale and should not be interpreted as a statistically stable benchmark. Its purpose is to test external validity at a preliminary level by exposing the system to real patent drafting styles and real product-description quality. These pilot observations were analyzed qualitatively and were not pooled with the controlled-corpus precision–recall scores. Broader validation will require larger expert-annotated patent–product datasets, public claim charts, litigation materials, prosecution-history records, and multimodal evidence such as patent drawings and product images.
6.4. Results on the Controlled Synthetic Corpus
Table 18 reports the main classification results on the controlled synthetic corpus. These results are used to compare system configurations under a balanced and controlled setting. They are reported separately from the real-world pilot corpus and should not be interpreted as estimates of real-world litigation accuracy. The keyword baseline achieved a precision of 0.68, a recall of 0.52, and an F1-score of 0.59. This result indicates that surface-level lexical overlap is insufficient for infringement-risk analysis, especially when claims and product descriptions express similar functions through different terminology. The ontology-supported semantic configuration improved performance to 0.78 precision, 0.70 recall, and 0.74 F1, indicating that semantic normalization and structured matching already provide a measurable improvement over lexical methods. The full DIKWP network in default mode improved precision to 0.87, recall to 0.82, and F1-score to 0.85 within the controlled synthetic corpus.
To make the calculation of the default-mode performance transparent, Table 19 reports the corresponding confusion matrix. The full DIKWP network correctly identified 41 of the 50 positive cases and correctly rejected 44 of the 50 negative cases. It missed 9 positive cases and falsely flagged 6 negative cases. These counts yield an accuracy of 0.85, a precision of 0.87, a recall of 0.82, and an F1-score of 0.85.
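The reported default-mode scores can be reproduced directly from the confusion-matrix counts. The short Python sketch below uses the standard definitions of the four metrics; the function name `binary_metrics` is introduced here for illustration only.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Default-mode counts: 41 true positives, 6 false positives,
# 9 false negatives, 44 true negatives.
metrics = binary_metrics(tp=41, fp=6, fn=9, tn=44)
print({name: round(value, 2) for name, value in metrics.items()})
# prints {'precision': 0.87, 'recall': 0.82, 'f1': 0.85, 'accuracy': 0.85}
```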
The performance gains are further reported as absolute within-corpus effect sizes in Table 20. Compared with the keyword-matching baseline, the default DIKWP network improved F1 by 0.26. Compared with the semantic-and-ontology configuration without full DIKWP reasoning, it improved F1 by 0.11. These gains indicate measurable analytical benefit within the constructed evaluation setting, but they should not be interpreted as externally generalizable estimates of litigation-level performance.
Because the evaluation corpus contains only 100 synthetic patent–product pairs, the reported gains remain sensitive to case construction, domain coverage, parser behavior, and ontology completeness. As an additional indication of small-sample uncertainty, Wilson 95% confidence intervals were estimated for the default-mode results. The approximate intervals were 0.75–0.94 for precision, 0.69–0.90 for recall, and 0.77–0.91 for accuracy. These intervals characterize uncertainty within the controlled corpus only and should not be treated as confidence intervals for real-world litigation performance.
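For readers who wish to check the intervals, the following Python sketch applies the standard Wilson score formula to the default-mode counts. It assumes z = 1.96 for the 95% level and takes the denominators from the confusion matrix (47 predicted positives for precision, 50 actual positives for recall, 100 cases for accuracy); the function name is illustrative.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


for name, successes, n in [
    ("precision", 41, 47),   # true positives / predicted positives
    ("recall", 41, 50),      # true positives / actual positives
    ("accuracy", 85, 100),   # correct decisions / all cases
]:
    low, high = wilson_interval(successes, n)
    print(f"{name}: {low:.2f} to {high:.2f}")
```

With these inputs the computation gives approximately 0.75 to 0.94, 0.69 to 0.90, and 0.77 to 0.91, matching the reported intervals.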
The improvement from the ontology-based configuration to the full DIKWP configuration is analytically important. It indicates that the performance gain does not arise from semantic normalization alone. The wisdom dimension contributes by enforcing claim-level consistency, especially through the all-elements rule and conflict resolution among partially matched elements. The purpose dimension contributes by controlling the decision threshold and the treatment of borderline correspondences. Within the controlled experimental setting, the DIKWP advantage is therefore not reducible to any single module; it emerges from the coordinated interaction of semantic extraction, structured knowledge, evidential reasoning, and purpose-sensitive orchestration.
The aggressive and conservative configurations further demonstrate the operational value of the purpose dimension. In aggressive mode, recall increased to 0.92, while precision decreased to 0.80. This setting is appropriate when the primary objective is to avoid missing potentially infringing products, even at the cost of additional manual review. In conservative mode, precision increased to 0.93, while recall decreased to 0.75. This setting is more appropriate for clearance analysis, where false positives may lead to unnecessary redesign efforts or legal concern. These two operating points show that the purpose dimension can shift the precision–recall balance within the prototype, although the stability of this behavior requires further validation on larger and externally sourced datasets.
Figure 3 visualizes the precision, recall, and F1-score of the main configurations. The figure shows that the full DIKWP network achieves the best overall balance within the controlled corpus, whereas the two purpose-conditioned variants shift the operating point toward recall or precision according to the analytical objective.
Performance also varied across technical domains. Table 21 reports a domain-disaggregated view across the four categories used in the controlled synthetic corpus. Mechanical and electrical/electronic cases produced stronger results because their structural and component-level limitations were more directly represented in the ontology. Software/process and medical-tool cases were more difficult because they contained more functional wording, process-step ambiguity, specialized terminology, and borderline equivalence patterns. These domain-level results remain exploratory because each domain contains only 25 synthetic pairs.
Overall, the detection results support the feasibility and analytical value of the DIKWP network as a proof-of-concept decision-support framework. They do not establish generalizable litigation-level accuracy. Broader validation will require externally collected patent–product pairs, expert-annotated claim charts, public litigation materials, and larger cross-domain corpora.
6.5. Efficiency and Scalability
The evaluation used a workstation equipped with an 8-core CPU and 16 GB RAM. For a single patent–product pair, the DIKWP network required an average runtime of 2.5 s. Linguistic parsing, entity extraction, and knowledge-graph construction accounted for approximately 2.0 s of this runtime, whereas the reasoning stage typically required less than 0.5 s. This distribution shows that the main computational bottleneck lies in the information dimension rather than in the knowledge or wisdom dimensions.
Table 22 provides a module-level breakdown of latency and resource usage. The table shows that linguistic parsing and graph construction dominate end-to-end latency, while semantic inference consumes the largest share of CPU and memory. Purpose validation and report generation introduce comparatively little overhead. This pattern is consistent with the DIKWP network interpretation of the prototype: the main cost arises when raw text is transformed into structured semantic evidence, whereas the purpose dimension mainly regulates the analytical posture rather than performing heavy computation itself.
Based on this bottleneck profile, future optimization should focus primarily on reducing the cost of linguistic processing, graph construction, and candidate alignment, rather than on legal-rule execution itself. Table 23 summarizes the main engineering bottlenecks and corresponding optimization strategies.
Scalability tests were performed by comparing one product description against increasing numbers of patents, with each patent represented by an average of ten claims. Under this setting, the system processed approximately 100 patents, corresponding to roughly 1000 claims, against a single product in about three minutes. The observed scaling behavior was approximately linear. This is expected because most claim-to-product analyses are independent once the documents have been preprocessed. The same linear pattern was observed when a single patent was compared against many product descriptions.
Memory usage remained moderate throughout the experiments. Ontology loading and graph construction required only a few megabytes per case, and even when processing about one thousand comparisons in batch mode, memory consumption remained within a few hundred megabytes. These results suggest that the current architecture is computationally feasible for offline professional analysis and can be further improved through batching, pre-parsing, or parallel execution. Since the individual patent–product analyses are largely independent, the system is also amenable to straightforward parallelization.
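Because the individual analyses are independent, batch execution can be sketched with Python's standard `concurrent.futures` module. This is a minimal illustration, not the prototype's implementation: `analyze_pair` is a hypothetical stand-in that only counts shared tokens, and a thread pool is used for simplicity. For CPU-bound parsing, `ProcessPoolExecutor` would be the more natural choice.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_pair(claim, product_text):
    """Hypothetical stand-in for one claim-to-product analysis.

    Counts shared lowercase tokens; the real analysis would run the
    full parsing, graph construction, and reasoning workflow.
    """
    claim_tokens = set(claim.lower().split())
    product_tokens = set(product_text.lower().split())
    return len(claim_tokens & product_tokens)


def analyze_batch(claims, product_text, max_workers=4):
    """Run independent analyses concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda c: analyze_pair(c, product_text), claims))


claims = ["a seat and a backrest", "a motor"]
print(analyze_batch(claims, "chair with seat and backrest"))  # prints [3, 0]
```

Since `Executor.map` returns results in input order, each output can be paired back with its claim without extra bookkeeping.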
6.6. Error Analysis
A more detailed inspection of the errors provides insight into the current limitations of the framework. Out of the 50 positive cases, the system missed 9, which corresponds to the reported recall of 0.82. Five of these false negatives were caused primarily by linguistic parsing failures. In these cases, complex claim syntax was segmented incorrectly, leading to incomplete or distorted claim-element representations. For example, compound technical noun phrases were occasionally split into separate units, causing the system to search for an incorrect feature in the product description. This finding suggests that the information dimension remains the most fragile part of the current architecture.
The remaining four false negatives were caused by difficult equivalence patterns that were not captured by the existing ontology or rule base. In such cases, the accused product did not use the same term or explicit structure as the patent claim, but the functional similarity was still strong enough that a human expert might consider equivalence. These misses indicate that the present equivalence mechanism remains operational and simplified rather than jurisprudentially complete.
Out of the 50 negative cases, the system falsely flagged 6 as infringing. Together with the 41 true positive cases, this yields a precision of 0.87, which is consistent with the default-mode confusion matrix reported in Table 19. Most of these false positives occurred when all major components appeared to be present, but the claim actually depended on a more specific limitation that the current prototype did not model deeply enough. Typical examples included numerical ranges, compositional proportions, or process constraints. In such cases, the knowledge graph captured the presence of the relevant elements but did not fully represent the quantitative or procedural condition that restricted the claim scope. This suggests that future work should strengthen support for numerical claim interpretation, method claims, and more detailed legal claim construction.
The real-world pilot corpus produced a different error profile from the controlled synthetic corpus. The most frequent difficulty was not direct contradiction but incomplete product evidence. Manufacturer brochures and webpages often describe product advantages, high-level functions, or commercial specifications without disclosing all structural or process details required by a patent claim. In these cases, the system frequently assigned uncertain or non-mentioned evidence states. This behavior lowered the number of definitive outputs but improved legal caution because the system did not treat non-disclosure as proof of absence.
A second difficulty was domain-specific terminology. Some real product documents used commercial or engineering terms that differed from claim language even when they referred to related structures or functions. The ontology and knowledge-graph layer reduced some of these mismatches, but several cases still required expert review. A third difficulty involved borderline functional substitutions, where a product feature performed a similar function but used a different structure or operating mode. In such cases, the system flagged equivalence concern rather than producing a final infringement conclusion.
Taken together, these errors are likely to become more pronounced in real litigation materials. Broad functional claim terms may cause the ontology matcher to over-generalize correspondences, while highly specific modifiers may be missed if they are embedded in long dependent claims or technical descriptions. Accordingly, the model is expected to perform best as a screening and explanation tool when the available evidence is textually explicit, and to require stronger human-in-the-loop review when claim language is intentionally broad, functionally defined, or dependent on jurisdiction-specific claim construction.
From an explanation perspective, the system performed relatively well. Manual inspection showed that in 98 of 100 quantitative cases the explanation correctly named the decisive matched or unmatched elements. In the remaining two cases, the final label was correct but one extracted element name was imprecise because of a parser error. This finding is encouraging because it shows that explanation quality is closely tied to extraction quality; when the information dimension is correct, the downstream explanation is usually coherent and legally interpretable.
6.7. Contribution of the DIKWP Network
The experimental results allow the role of each DIKWP dimension to be assessed more explicitly. The data and information dimensions provide the foundational semantic evidence. When parsing quality deteriorates, the rest of the network is forced to reason over incomplete or distorted representations. This was visible in the false-negative cases caused by claim-segmentation errors. In practical terms, the DIKWP network cannot compensate indefinitely for poor information extraction; errors introduced early in the process propagate unless they are corrected through feedback.
The knowledge dimension contributes mainly through ontology-based normalization and semantic alignment. When this dimension was weakened in the ablated configurations, recall decreased substantially because synonymy, taxonomy, and part–whole relations were no longer captured reliably. For example, expressions such as back support and backrest, or sitting surface and seat, are easy for human experts to align but may be missed by purely lexical systems. The ontology and knowledge graph therefore play a decisive role in bridging the linguistic gap between patent language and product descriptions.
The wisdom dimension contributes by transforming local semantic correspondences into a legally meaningful decision. Without this dimension, the system can report that several elements appear semantically related, but it cannot reliably determine whether the claim as a whole is covered. The performance difference between the ontology-only configuration and the full system shows that claim-level logical consistency materially improves precision. In particular, the all-elements rule prevented the DIKWP network from treating limited semantic correspondences as sufficient evidence of infringement when the broader claim structure did not support that conclusion.
The purpose dimension contributes by controlling the operating posture of the system. Its effect is visible in the contrast between the aggressive and conservative configurations. More importantly, the purpose dimension does not merely adjust a fixed threshold after the fact. It regulates the interpretation of uncertain matches, the activation of equivalence-oriented reasoning, and the framing of the final report. This is why the present system is better understood as a DIKWP network than as a fixed stack of modules. The purpose dimension can influence how the wisdom dimension treats ambiguity, and the wisdom dimension can in turn request additional support from the knowledge or information dimensions when the available evidence is insufficient. The empirical behavior of the system is therefore consistent with a recurrent and purpose-sensitive networked architecture.
7. Discussion
The results of this study suggest that patent infringement detection benefits substantially from combining semantic text processing with structured knowledge representation and purpose-aware reasoning. The experimental findings indicate that purely lexical or keyword-based matching is insufficient for claim-level infringement assessment, especially when product descriptions and patent claims express similar technical content through different terminology or functional phrasing. By contrast, the proposed framework improves analytical reliability by connecting patent-oriented NLP, ontology-based normalization, knowledge-graph construction, and rule-guided reasoning within a unified DIKWP network. This result is consistent with broader developments in legal AI and explainable AI, where structured knowledge and transparent reasoning are increasingly recognized as necessary in high-stakes decision-support systems.
A central implication of this work is that the knowledge dimension is not merely an auxiliary enhancement to text analytics, but a core requirement for legally meaningful patent analysis. The ontology and knowledge graph enable the system to preserve structural, functional, and relational information that would otherwise be weakened or lost in flat text similarity approaches. This is particularly important in patent law, where infringement depends on the presence or absence of specific claim elements and their relations rather than on global topical resemblance alone. In this respect, the present study supports the view that guarded symbolic–statistical architectures are especially suitable for legal-technical domains, because they combine the semantic flexibility of NLP with the explicit interpretability of formal knowledge models [13].
The study also shows the methodological value of treating DIKWP as a networked design paradigm rather than as a hierarchical checklist. In the proposed system, purpose does not appear only at the end of the reasoning process. Instead, it functions as an active control dimension that regulates matching sensitivity, the treatment of borderline correspondences, and the form of the final analytical report. This became visible in the evaluation, where aggressive and conservative operating modes produced different precision–recall trade-offs without changing the underlying evidential basis. The effect is conceptually important because it demonstrates that the same semantic evidence can support different analytical postures depending on whether the user’s objective is enforcement, clearance, or design-around support. In this sense, the DIKWP network contributes not only to system architecture but also to analytical controllability.
Another important contribution concerns explainability and user trust. Patent professionals are unlikely to rely on AI tools that produce conclusions without traceable justification. The present framework addresses this issue by preserving an evidence path from raw text to extracted elements, from extracted elements to ontology-based correspondences, and from those correspondences to claim-level reasoning outcomes. This layered evidence trace is particularly valuable in legal settings because it supports both user confidence and post hoc review. If the system reaches a plausible but incorrect conclusion, the error can usually be localized to a specific stage, such as claim parsing, ontology mapping, or equivalence reasoning. This is an important practical advantage over opaque predictive systems, especially in regulated domains where justification and accountability matter as much as output accuracy.
From a computational perspective, the evaluation suggests that the proposed DIKWP network is feasible for offline analytical use. The main runtime burden lies in the information dimension, particularly in linguistic parsing and entity extraction, whereas ontology-based reasoning and graph-level matching remain relatively lightweight in the current implementation. This indicates that future optimization should focus more on patent-specific NLP efficiency than on reducing the cost of rule execution. A pragmatic deployment strategy would therefore use staged analysis: a lightweight retrieval or filtering component could first narrow the candidate set, after which the full semantic and reasoning workflow would be applied only to the most relevant patent–product pairs. Such a two-stage approach would preserve interpretability while making the system more scalable for industrial use.
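The staged strategy can be sketched as follows. This is an illustrative outline under stated assumptions: token-level Jaccard similarity stands in for the lightweight filter (an embedding-based retriever would be an equally plausible choice), and `full_analysis` is a hypothetical callable representing the complete semantic and reasoning workflow.

```python
def jaccard(text_a, text_b):
    """Token-level Jaccard similarity between two texts."""
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def two_stage(claims, product_text, full_analysis, top_k=10):
    """Cheap filtering first, full analysis on the shortlist only."""
    # Stage 1: rank all claims with the inexpensive similarity score.
    ranked = sorted(claims, key=lambda c: jaccard(c, product_text), reverse=True)
    shortlist = ranked[:top_k]
    # Stage 2: apply the expensive workflow to the top candidates.
    return [(claim, full_analysis(claim, product_text)) for claim in shortlist]


claims = [
    "chair with seat",
    "engine with piston",
    "chair with backrest and seat",
]
results = two_stage(
    claims,
    "chair with seat and backrest",
    full_analysis=lambda claim, product: "analyzed",
    top_k=2,
)
print([claim for claim, _ in results])
# prints ['chair with backrest and seat', 'chair with seat']
```

A purely lexical filter can discard paraphrased matches, so in practice the first stage would need a generous `top_k` or a semantically aware scorer to avoid reintroducing the false negatives that the full workflow is meant to prevent.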
The present work also clarifies the role of the wisdom dimension. In practical terms, this dimension corresponds to the part of the architecture that transforms local semantic correspondences into a legally meaningful assessment. The ontology may indicate that two concepts are related, but that does not by itself establish infringement. Claim-level reasoning still requires consistency checking, element aggregation, and treatment of ambiguity. The wisdom dimension performs this function by enforcing the all-elements rule, handling uncertain matches, and determining whether function-based equivalence should be considered. It is precisely this layer that prevents the system from confusing semantic relatedness with legal sufficiency.
Although the architecture is informed by artificial consciousness theory, the contribution should be interpreted cautiously. The present system does not claim machine consciousness in any phenomenological sense. Rather, the relevance of artificial consciousness lies in the architectural emphasis on purpose-aware coordination, reflective explanation, and adaptive control. In this limited engineering sense, the DIKWP network offers a useful way to operationalize higher-order design principles without making ontologically strong claims about consciousness itself. This restrained interpretation preserves the conceptual contribution while avoiding unnecessary overstatement [42,43,44].
The addition of the real-world pilot corpus strengthens the external validity of the evaluation, while also showing that patent–product analysis under realistic evidentiary conditions is more ambiguous than controlled synthetic testing. Real patent claims often contain broad functional expressions, nested limitations, numerical ranges, and terminology shaped by drafting and prosecution strategy. Real product descriptions, by contrast, are frequently incomplete, promotional, or selectively drafted, and may omit structural, compositional, or process details that are legally material to infringement analysis. The pilot evaluation therefore suggests that an infringement-support system should not be evaluated only by binary classification accuracy, but also by its ability to distinguish supported evidence, contradicted evidence, uncertain evidence, and non-mentioned limitations.
The real-world pilot should nevertheless be interpreted cautiously. Its scale remains limited, and publicly available product evidence does not always disclose all product features that would be material in litigation. The pilot therefore should not be treated as a litigation-level benchmark. Rather, it serves as an external-validity check showing that the DIKWP network can preserve claim-element reasoning, identify unsupported or non-mentioned limitations, and route uncertain or borderline cases to expert review under more realistic evidentiary conditions.
At the same time, several limitations should be acknowledged. First, the current system handles relatively straightforward claim structures more effectively than highly intricate dependent claims or method claims. Patent claims often include nested constraints, numerical ranges, process steps, and functionally defined limitations that require more detailed semantic and legal modeling than the current prototype provides. Second, although the framework includes a simplified equivalence mechanism, this component should not be confused with a full implementation of doctrine-of-equivalents reasoning in its jurisdiction-specific legal complexity. Third, the current ontology remains limited in scope and would require significant extension for broader deployment across domains such as biotechnology, chemistry, or telecommunications.
A related limitation concerns uncertainty. The present system can assign confidence-oriented outputs and flag borderline cases, but it does not yet provide a rich formal treatment of evidential uncertainty, incompleteness, or conflicting sources. In real-world practice, product descriptions are often incomplete or strategically vague, and missing mention should not always be interpreted as absence. Future versions of the architecture should therefore make a clearer distinction between explicit negative evidence, uncertain evidence, and non-mentioned features. This would improve both legal robustness and practical usability.
The system also faces the classic knowledge acquisition bottleneck. Building and maintaining legal and technical ontologies remains labor-intensive, and performance depends heavily on ontology quality. Although automatic extraction, lexical resources, learned embeddings, and graph-completion methods can accelerate ontology expansion, the final incorporation of legal and technical concepts should remain governed by expert validation. This is not merely an engineering limitation but a governance requirement in legal AI, because incorrect ontology updates may directly affect infringement-risk assessment. Accordingly, future ontology maintenance should combine semi-automatic candidate generation with provenance tracking, consistency checking, and expert review.
Recent advances in large language models also raise an important question: whether a general-purpose LLM could replace part of the proposed DIKWP network. In practice, LLMs may be useful in extraction, paraphrase normalization, or candidate correspondence generation. However, their outputs are probabilistic and may be inconsistent or insufficiently constrained by legal doctrine. For this reason, the more promising direction is not replacement, but careful integration. An LLM can assist with semantic flexibility, while ontology-based validation and rule-guided reasoning maintain determinism, traceability, and legal consistency. This kind of guarded hybridization would be fully compatible with the DIKWP network and may strengthen both coverage and robustness.
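The guarded integration described above can be sketched as follows, under strong simplifying assumptions. The LLM is represented by a stub function (`stub_proposer`) rather than any real model API, and the ontology is a small hypothetical lookup table (`ONTOLOGY`); the point is only that probabilistic candidates pass through a deterministic validation step before they are accepted.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy ontology: surface term -> canonical concept.
ONTOLOGY: Dict[str, str] = {
    "elastic member": "elastic_member",
    "helical spring": "elastic_member",
    "coil spring": "elastic_member",
    "rigid rod": "rigid_member",
}


def validate(claim_term: str, product_term: str,
             ontology: Dict[str, str] = ONTOLOGY) -> bool:
    """Deterministic check: both terms must map to the same known concept."""
    claim_concept = ontology.get(claim_term.lower())
    product_concept = ontology.get(product_term.lower())
    return claim_concept is not None and claim_concept == product_concept


def guarded_match(
    claim_term: str,
    product_text: str,
    propose: Callable[[str, str], List[str]],
) -> Tuple[List[str], List[str]]:
    """Split proposed correspondences into validated and flagged candidates."""
    accepted: List[str] = []
    flagged: List[str] = []
    for candidate in propose(claim_term, product_text):
        if validate(claim_term, candidate):
            accepted.append(candidate)
        else:
            flagged.append(candidate)  # kept for human review, not discarded
    return accepted, flagged


def stub_proposer(claim_term: str, product_text: str) -> List[str]:
    """Stand-in for an LLM call: returns fixed phrases found in the text."""
    known_phrases = ("coil spring", "rigid rod")
    return [p for p in known_phrases if p in product_text.lower()]
```

With the stub proposer, calling `guarded_match("elastic member", "The housing holds a coil spring and a rigid rod.", stub_proposer)` accepts "coil spring" and flags "rigid rod". Replacing the stub with a real model would change which candidates are proposed, but not the rule that decides which of them are accepted.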
Finally, although this study focuses on patent infringement detection, the broader architectural principle may generalize to other legal domains. Tasks such as claim construction support, freedom-to-operate analysis, contract compliance checking, and trademark risk analysis all require some combination of semantic extraction, structured legal knowledge, explicit reasoning, and purpose-sensitive reporting. The specific ontology and rule base would differ, but the DIKWP-network formulation remains applicable. This suggests that the present work may have value beyond patents as a more general model for explainable legal-semantic AI.
8. Conclusions and Future Work
This paper presented a semantic AI framework for patent infringement detection grounded in the DIKWP network and informed, in a limited architectural sense, by artificial consciousness theory. The framework transforms raw patent and product descriptions into structured, explainable infringement-oriented assessments by coordinating five interacting semantic dimensions: data, information, knowledge, wisdom, and purpose. Unlike conventional linear pipelines, the proposed system is formulated as a recurrent network in which semantic extraction, knowledge organization, decision logic, and task objectives interact during analysis.
The main contribution is to show that patent infringement detection can be modeled as an integrated process of patent-oriented NLP, ontology-based semantic alignment, knowledge-graph construction, legal rule-guided reasoning, uncertainty handling, and purpose-aware orchestration. At the architectural level, the study instantiates the DIKWP network as a modular legal-AI system. At the methodological level, it develops a claim-to-product semantic workflow capable of preserving claim-limitation structure, handling negative evidence, and producing interpretable reasoning traces. At the implementation level, it provides a proof-of-concept prototype and illustrative outputs. At the evaluation level, it provides two forms of evidence. First, the controlled synthetic corpus shows that the proposed DIKWP network improves over keyword-based and ontology-only baselines under known positive, negative, and near-miss conditions. Second, the real-world pilot corpus examines whether the framework can preserve claim-element reasoning when applied to public patent claims and real product technical descriptions under more realistic evidentiary conditions.
Theoretically, the study further clarifies the epistemological division of labor between statistical candidate generation and symbolic legal validation. Statistical components provide semantic flexibility and uncertainty estimates, while symbolic components preserve legal constraints, evidentiary provenance, and auditable reasoning. This guarded hybridization is especially important for patent infringement analysis because legally sufficient claim coverage cannot be reduced to document-level similarity or unconstrained predictive optimization.
The literature synthesis situates the proposed framework within patent retrieval, patent NLP, transformer-based representation learning, knowledge graphs, ontology engineering, claim-construction doctrine, explainable AI, legal information extraction, and LLM reliability research. This positioning clarifies the intended role of the system: it is a decision-support assistant rather than an autonomous legal decision maker. Its goal is to augment patent professionals by automating large-scale element comparison, surfacing plausible correspondences, identifying missing or uncertain limitations, and producing reviewable evidence traces.
Future research should extend this pilot validation to larger expert-annotated patent–product datasets, richer public patent examination and litigation materials, multimodal patent drawings and product images, and jurisdiction-specific claim-construction and prosecution-history evidence. Additional work should improve claim parsing for complex legal syntax, strengthen numerical and method-claim interpretation, formalize evidential uncertainty and source provenance, optimize deep NLP components through distillation and hardware acceleration, and explore human-in-the-loop adaptive learning. More broadly, the study suggests that cognitive-inspired frameworks such as DIKWP can make a practical contribution to legal AI when translated into explicit system design principles rather than treated only as abstract theory.