Repairing DNN Numerical Defects with Semantic-Driven Knowledge Graph Retrieval
Abstract
1. Introduction
- Constructing a conceptual semantic framework for DNN numerical defects to unify and interrelate heterogeneous defect–repair knowledge. A structured conceptual semantic framework is proposed to model DNN numerical defects through explicit representations of faulty components, numerical anomaly states, repair operations, and domain constraints. By focusing on conceptual representations, this framework unifies heterogeneous defect–repair artifacts, enabling transferable defect–repair knowledge beyond individual code instances.
- Developing a multi-view semantic-knowledge-driven graph indexing and hybrid retrieval method. Building on the proposed semantic framework, an index-based GraphRAG extension for numerical defect retrieval is introduced. Multi-view semantic subgraphs are constructed to capture complementary aspects of defect knowledge, and a hybrid retrieval mechanism integrates structure-aware subgraph matching with semantic vector similarity, allowing precise yet flexible retrieval of relevant defect–repair knowledge grounded in domain semantics.
- Proposing a knowledge-guided LLM-based repair generation pipeline for DNN numerical defects. A knowledge-guided repair pipeline is developed to leverage the retrieved semantic context to guide large language models, reducing hallucinations and improving the reliability of numerical defect–repair generation.
- RQ1: Comparative Evaluation of Retrieval Effectiveness. Does NCKG enable more semantically accurate retrieval of numerical defect–repair knowledge compared to existing retrieval methods, as measured by the semantic alignment between the retrieved mitigation strategies and the ground-truth strategies?
- RQ2: Impact of the Retrieved Context on Repair Generation. How does the quality of the final generated repair change when large language models are augmented with NCKG-retrieved semantic contexts across different retrieval backbones and generative models?
- RQ3: Ablation Study of the Hybrid Retrieval Mechanism. What are the individual contributions of the graph-based and vector-based retrieval components within the hybrid NCKG framework to overall repair effectiveness?
2. Related Work
2.1. Knowledge Engineering for Software Tasks
2.2. Prompt Engineering and Retrieval-Augmented Generation for LLMs in Code Processing
2.3. Automated Repair for DNN Numerical Defects
3. Methodology
3.1. Unified Framework of Conceptual Semantics and Relations for DNN Numerical Defect
3.1.1. Numerical Defect Semantic Elements Definition
3.1.2. Semantic Relation Schema
3.2. Graph-Index Construction via Semantic Subgraphs
3.3. Hybrid Semantic Retrieval over Graph Indices
3.3.1. Query Representation
3.3.2. Subgraph-Based Similarity Retrieval
- Node Overlap Similarity: The Jaccard similarity coefficient between the node sets of the query subgraph and a candidate subgraph, $|V_q \cap V_c| / |V_q \cup V_c|$, measures their conceptual commonality. The intersection is defined as the set of nodes sharing identical semantic attributes, after textual attributes are normalized (e.g., by lowercasing and stemming) to account for surface variations.
- Edge Overlap Similarity: Structural similarity is measured via the overlap between edge sets, where two edges are considered identical if they share the same source node attribute, target node attribute, and relation type r.
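The two overlap measures above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the helper names (`normalize`, `node_overlap`, `edge_overlap`) are hypothetical, normalization is reduced to lowercasing and whitespace collapsing (stemming is omitted), and the edge overlap is assumed to be a Jaccard coefficient as well, which the text does not state explicitly.

```python
from typing import Hashable, Iterable, Set, Tuple

# An edge is (source node attribute, target node attribute, relation type r).
Edge = Tuple[str, str, str]


def normalize(attr: str) -> str:
    """Lowercase and collapse whitespace; stemming is left out of this sketch."""
    return " ".join(attr.lower().split())


def jaccard(a: Set[Hashable], b: Set[Hashable]) -> float:
    """|a ∩ b| / |a ∪ b|, defined here as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def node_overlap(query_nodes: Iterable[str], cand_nodes: Iterable[str]) -> float:
    """Jaccard similarity over normalized node attributes."""
    return jaccard({normalize(n) for n in query_nodes},
                   {normalize(n) for n in cand_nodes})


def edge_overlap(query_edges: Iterable[Edge], cand_edges: Iterable[Edge]) -> float:
    """Jaccard similarity over edges (assumed form); relation types are compared as-is."""
    def norm(e: Edge) -> Edge:
        return (normalize(e[0]), normalize(e[1]), e[2])
    return jaccard({norm(e) for e in query_edges},
                   {norm(e) for e in cand_edges})
```

For example, the node sets {"Softmax", "exp"} and {"softmax", "log"} share one of three distinct normalized attributes, so their node overlap is 1/3.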
3.3.3. Multi-View Score Aggregation and Hybrid Retrieval
Algorithm 1. Pseudocode of the hybrid retrieval method.
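One way a hybrid ranking of this kind could be organized is sketched below. It is not a reproduction of Algorithm 1: the mean aggregation over views, the mixing weight `alpha`, and the function names are all assumptions made for illustration. Each candidate carries per-view subgraph similarity scores in [0, 1] and an embedding vector.

```python
import math
from typing import Dict, List, Sequence, Tuple

Candidate = Tuple[str, Dict[str, float], Sequence[float]]  # (id, view scores, embedding)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def hybrid_rank(query_vec: Sequence[float],
                candidates: List[Candidate],
                alpha: float = 0.5,   # assumed weight of the graph-based score
                top_k: int = 10) -> List[Tuple[str, float]]:
    """Rank candidates by alpha * graph score + (1 - alpha) * vector similarity."""
    scored = []
    for cand_id, view_scores, vec in candidates:
        # Assumed aggregation: unweighted mean over the available views.
        graph_score = (sum(view_scores.values()) / len(view_scores)
                       if view_scores else 0.0)
        score = alpha * graph_score + (1.0 - alpha) * cosine(query_vec, vec)
        scored.append((cand_id, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

Setting `alpha` to 1.0 or 0.0 reduces the ranking to the graph-only or vector-only variant, which is the kind of comparison an ablation over the two components would need.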
3.4. Knowledge-Guided Repair Generation
3.4.1. Prompt Design for Contextual Repair Generation
1. Instruction Header: A fixed natural language instruction specifying the repair task.
2. Retrieved Exemplars Block: A sequence of retrieved exemplars, formatted as shown below.
3. Target Problem Block: The query’s defective code with a placeholder for the model to complete.
Listing 1. Knowledge-Guided Repair Prompt Template.
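The three-part prompt structure (instruction header, retrieved exemplars block, target problem block) can be assembled as in the sketch below. The instruction wording, the section labels, and the `Exemplar` fields are placeholders chosen for illustration and should not be read as the wording of the actual template.

```python
from dataclasses import dataclass
from typing import List

# Placeholder instruction text (assumed, not the paper's exact wording).
INSTRUCTION = ("You are repairing a numerical defect in deep learning code. "
               "Use the examples as guidance and complete the fixed code.")


@dataclass
class Exemplar:
    """One retrieved defect-repair example (hypothetical field names)."""
    defective_code: str
    strategy: str
    fixed_code: str


def build_prompt(exemplars: List[Exemplar], target_code: str) -> str:
    """Concatenate instruction header, exemplars block, and target problem block."""
    parts = [INSTRUCTION, ""]
    for i, ex in enumerate(exemplars, start=1):
        parts += [f"### Example {i}",
                  "Defective code:", ex.defective_code,
                  f"Mitigation strategy: {ex.strategy}",
                  "Fixed code:", ex.fixed_code, ""]
    # The prompt ends with an open slot for the model to complete.
    parts += ["### Target", "Defective code:", target_code, "Fixed code:"]
    return "\n".join(parts)
```

Ending the prompt on the open "Fixed code:" slot keeps the exemplars and the target in the same format, so a completion model continues the pattern rather than restating the task.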
3.4.2. Generation and Post-Processing
4. Experimental Setup
4.1. Dataset
4.2. Baseline Methods
- BM25 baseline: a classic lexical retrieval model that relies on exact keyword matching and term frequency statistics. BM25 is included as a representative lexical retrieval baseline and is applied to the concatenated textual fields of each defect–fix pair, including code tokens and commit messages. This baseline represents shallow lexical matching methods that do not explicitly model semantic or structural relationships.
- DPR baseline: a neural dual-encoder retrieval model following the DPR paradigm, where queries and candidate passages are independently encoded into a shared dense vector space. In our implementation, both encoders are instantiated using GraphCodeBERT, a code-aware pretrained model that captures syntactic and semantic properties of the source code. The model is fine-tuned on the training split using defective contexts as queries and corresponding fix contexts as positive passages.
- GPT-Neo (125M, 1.3B, and 2.7B parameters): a family of autoregressive decoder-only models developed by EleutherAI as an open-source replication of the GPT-3 architecture, implemented with the Mesh TensorFlow library. These models were trained on The Pile, a diverse 825 GiB English text corpus, providing strong general language-understanding capabilities.
- Phi-2 (2.7B parameters): a compact transformer model developed by Microsoft that shows strong reasoning ability for its parameter count. Phi-2 is trained on “textbook-quality” data, including synthetic data, and is released as a base model without instruction fine-tuning or reinforcement learning from human feedback (RLHF). Microsoft reports that it outperforms models of a similar size on reasoning benchmarks.
- DeepSeekCoder (6.7B parameters): a series of code-specialized LLMs pretrained on a corpus of 2 trillion tokens across 87 programming languages. In this study, we use the DeepSeekCoder-6.7B-Instruct version. Its pretraining includes a “fill-in-the-middle” objective, which lets the model condition on code both before and after the target span and makes it well suited to code completion and repair tasks.
- CodeLlama (7B parameters): Meta’s code-specialized variant of the Llama 2 architecture, further pretrained on 500B tokens of code-specific data. At release, the model family reported state-of-the-art results among open models on code generation benchmarks including HumanEval and MBPP.
4.3. Evaluation Metrics
4.3.1. Retrieval Metrics
1. Exact Match@K (EM@K): The proportion of queries where the top-K retrieved results contain at least one exact match of the ground-truth mitigation strategy. This measures the retrieval system’s ability to return a fully relevant fix within the top-K positions. The proportion is computed over all Q queries by comparing, for each query i, the strategy of the j-th retrieved result (j ≤ K) with the ground-truth strategy.
2. Reciprocal Rank (RR) and Mean Reciprocal Rank (MRR): The reciprocal of the rank at which the first exact strategy match occurs. For a single query q, the reciprocal rank is defined as $\mathrm{RR}(q) = 1/\mathrm{rank}_q$, where $\mathrm{rank}_q$ is the position of the first exact match, with $\mathrm{RR}(q) = 0$ if no exact match is found. The MRR is the average of $\mathrm{RR}(q)$ across all Q queries: $\mathrm{MRR} = \frac{1}{Q}\sum_{q=1}^{Q}\mathrm{RR}(q)$.
3. Overall Success Rate (OSR): The proportion of queries for which at least one exact strategy match exists within the entire retrieved list. In our implementation, we retrieve up to 10 results per query, so this is equivalent to EM@10. Formally, $\mathrm{OSR} = \mathrm{EM@}R$, where R is the total number of retrieved results per query (10 in our experiments). Note that the OSR provides an upper bound on recall given the retrieval depth constraint.
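The three retrieval metrics can be computed directly from ranked lists of strategy labels. The sketch below follows the prose definitions given above (EM@K as "at least one exact match in the top K"); the function names and the list-of-lists input format are assumptions made for illustration.

```python
from typing import List, Sequence


def reciprocal_rank(retrieved: Sequence[str], truth: str) -> float:
    """1 / rank of the first exact strategy match, or 0.0 if there is none."""
    for rank, strategy in enumerate(retrieved, start=1):
        if strategy == truth:
            return 1.0 / rank
    return 0.0


def exact_match_at_k(runs: List[Sequence[str]], truths: List[str], k: int) -> float:
    """Share of queries whose top-k results contain the ground-truth strategy."""
    hits = sum(1 for retrieved, truth in zip(runs, truths) if truth in retrieved[:k])
    return hits / len(truths)


def mean_reciprocal_rank(runs: List[Sequence[str]], truths: List[str]) -> float:
    """Average reciprocal rank over all queries."""
    return sum(reciprocal_rank(r, t) for r, t in zip(runs, truths)) / len(truths)


def overall_success_rate(runs: List[Sequence[str]], truths: List[str],
                         depth: int = 10) -> float:
    """EM@R with R equal to the retrieval depth (10 in the experiments)."""
    return exact_match_at_k(runs, truths, depth)


# Toy data: three queries with short retrieved lists.
runs = [["add overflow check", "rewrite math formula"],
        ["limit input range", "add warning"],
        ["Other"]]
truths = ["rewrite math formula", "limit input range", "add warning"]
```

On this toy data, one of three queries is matched at rank 1 and a second at rank 2, so EM@1 is 1/3, EM@2 and OSR are 2/3, and MRR is (1/2 + 1 + 0)/3 = 0.5.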
4.3.2. Generation Metrics
1. Automated Assessment with an LLM: We employ a structured prompt template to instruct a Deepseek-R1-8b model to act as a software repair evaluation expert. The template asks for three analyses:
   - Strategy Match: It measures whether the generated repair’s mitigation strategy aligns with the ground truth. The confidence score (match_confidence) directly serves as the strategy match score.
   - Code Similarity: It quantifies resemblance to the ground-truth fix through three sub-scores: syntactic, semantic, and textual similarity.
   - Feasibility: It assesses practical viability through three sub-scores: compilation feasibility, logical correctness, and the risk of new bugs.

   The output is constrained to a specific JSON format containing scores and reasoning. From this JSON, we extract numerical metrics and calculate their averages as final indicators. The rationale behind each metric’s score is also preserved to facilitate subsequent manual verification.
2. Human Validation: To ensure reliability, a subset of the Deepseek evaluations (stratified by score ranges) is reviewed by human experts. Experts verify the correctness of the strategy classification and the plausibility of the similarity/feasibility scores, correcting any clear discrepancies. The final reported generation metrics are based on this validated set.
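The extraction-and-averaging step for the evaluator's JSON output could look like the sketch below. Only `match_confidence` is a field name taken from the text; the nested `code_similarity` and `feasibility` objects, their keys, and the choice to skip malformed outputs are assumptions made for illustration.

```python
import json
from statistics import mean
from typing import Dict, List


def summarize(raw_outputs: List[str]) -> Dict[str, float]:
    """Average evaluator scores over all outputs that parse as JSON."""
    records = []
    for raw in raw_outputs:
        try:
            records.append(json.loads(raw))
        except json.JSONDecodeError:
            # Assumed policy: malformed outputs are skipped, not scored as zero.
            continue
    if not records:
        return {}
    return {
        "strategy_match": mean(r["match_confidence"] for r in records),
        # Each group is first averaged over its three sub-scores.
        "code_similarity": mean(mean(list(r["code_similarity"].values()))
                                for r in records),
        "feasibility": mean(mean(list(r["feasibility"].values()))
                            for r in records),
    }
```

Keeping the raw records alongside the averages would preserve the per-metric reasoning text for the manual verification step.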
4.4. Implementation and Configuration
5. Results
5.1. RQ1: Comparative Evaluation of Retrieval Effectiveness
5.2. RQ2: Impact of Retrieved Context on Repair Generation
5.3. RQ3: Ablation Study of the Hybrid Retrieval Mechanism
6. Discussion
6.1. The Necessity of Structured Semantic Retrieval (RQ1)
6.2. From Accurate Retrieval to Reliable Generation (RQ2)
6.3. Synergy of the Hybrid Design (RQ3)
6.4. Limitation and Future Work
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Detailed Definition of Conceptual Semantics for DNN Numerical Defect
Appendix A.1. Detailed Definition of Conceptual Semantics for DNN Numerical Defect
- The phase type denotes the high-level phase or module type, drawn from an enumerated set.
- f is a descriptor specifying the particular function or sub-module (e.g., softmax).
- op is a descriptor indicating the core mathematical operation involved (e.g., matrix multiplication, exponential, or normalization).
- The symptom type is the observed symptom or anomaly type, drawn from an enumerated set.
- ρ is a description of the hypothesized root cause (e.g., overflow, underflow, or GradientExplosion).
- χ is a textual context providing additional situational details about the problem’s manifestation.
- The strategy type is the high-level strategy, drawn from {change variable type, increase variable precision, rewrite math formula, use a different algorithm, add warning, add overflow check, limit input range, Other}.
- μ is a string descriptor specifying the concrete method or implementation of the strategy.
- The knowledge type categorizes the knowledge and is drawn from an enumerated set.
- ξ is a textual knowledge context (e.g., a snippet of forum discussion, a mathematical principle, or a code implementation).
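The four semantic elements defined above map naturally onto small record types. The sketch below is one possible encoding, not the authors' data model: the class and field names are hypothetical, the phase, symptom, and knowledge types are kept as plain strings because their enumerated sets are not listed here, and the example values are illustrative only.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    """High-level mitigation strategies, as enumerated in the definition above."""
    CHANGE_VARIABLE_TYPE = "change variable type"
    INCREASE_VARIABLE_PRECISION = "increase variable precision"
    REWRITE_MATH_FORMULA = "rewrite math formula"
    USE_DIFFERENT_ALGORITHM = "use a different algorithm"
    ADD_WARNING = "add warning"
    ADD_OVERFLOW_CHECK = "add overflow check"
    LIMIT_INPUT_RANGE = "limit input range"
    OTHER = "Other"


@dataclass(frozen=True)
class ExecutionContext:      # C
    phase_type: str
    f: str                   # function descriptor
    op: str                  # operation descriptor


@dataclass(frozen=True)
class NumericalDefect:
    symptom_type: str
    root_cause: str          # rho
    context: str             # chi


@dataclass(frozen=True)
class MitigationMethod:      # M
    strategy: Strategy
    method: str              # mu


@dataclass(frozen=True)
class Constraint:            # K
    knowledge_type: str
    knowledge_context: str   # xi


@dataclass(frozen=True)
class DefectRecord:
    context: ExecutionContext
    defect: NumericalDefect
    mitigation: MitigationMethod
    constraint: Constraint


# Illustrative record; all values are made up for the example.
record = DefectRecord(
    ExecutionContext("training", "softmax", "exponential"),
    NumericalDefect("Inf", "overflow", "large logits passed to exp"),
    MitigationMethod(Strategy.REWRITE_MATH_FORMULA, "subtract max(logit) before exp"),
    Constraint("numerical stability principle", "exp overflows for large inputs"),
)
```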
Appendix A.2. Automated Extraction Process and Semantic Concept Alignment
| Semantic Elements | Corresponding FIDES/OSDEF Knowledge | Source | Alignment Rationale and Purpose |
|---|---|---|---|
| Execution context (C), numerical defect, and mitigation method (M) | Execution–Executor–Procedure (EEP) ODP and Result–Context (RC) ODP | FIDES | Reusable structural pattern: formally links processes (defect and fix), their executors (DNN component), and the circumstantial context of their occurrence. |
| Phase type | Model: What is the objective of the model? | FIDES | Modeling-phase alignment: anchors the defect and repair within a specific stage of the ML life cycle, enabling phase-aware reasoning and mitigation. |
| Function descriptor (f) | Model: Which package implements the algorithm? | FIDES | Implementation context: identifies the specific component where the defect manifests, linking the implementation-level context. |
| Operation descriptor (op) | Model: What is the base algorithm of the ML-based model? | FIDES | Algorithmic context: specifies the core algorithm or mathematical operation involved, connecting the computational foundation of the defect. |
| Symptom type | Failure: A subtype of event that brings about a failure state. | OSDEF | Failure characterization: aligns with observable failure events in software, providing a taxonomy for the external manifestations of numerical defects. |
| Root cause (ρ) | Defect: A subtype of vulnerability that inheres in an object. | OSDEF | Defect root cause: disambiguates the underlying bug from its symptomatic errors or failures. |
| Contextual description (χ) | Error: The incorrect internal state. | OSDEF | Error description: represents the erroneous computational state that bridges the internal defect and the external failure, enriching the semantic description of the defect. |
| Constraint (K) | Execution–Executor–Procedure (EEP) ODP | FIDES | Validity anchoring: provides a condition or property that anchors the validity of a procedure. |
Appendix B. Extensible Evaluation Process for Generated Repairs
- Strategy match can be replaced or complemented by
  - Rule-based strategy classifiers;
  - Ontology-driven matching over defect–fix concepts;
  - Supervised classifiers trained on annotated defect–strategy pairs.
- Code similarity can be computed using
  - CodeBLEU [41], which combines n-gram matching, weighted n-gram matching, AST (syntactic) matching, and data-flow (semantic) matching for code;
  - Embedding-based code similarity models independent of LLM reasoning.
- Feasibility can be strengthened through
  - Static syntax checking and type checking;
  - Static analysis tools targeting numerical safety;
  - Execution-based validation when runnable environments and test data are available.
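As an example of the first feasibility option, a static syntax check can be run without executing the generated repair. The sketch below assumes the repairs are Python source and uses the standard-library `ast` module; the function name `syntax_check` is hypothetical, and a passing check says nothing about logical correctness or numerical safety.

```python
import ast
from typing import Tuple


def syntax_check(code: str) -> Tuple[bool, str]:
    """Return (True, "") if the code parses as Python, else (False, reason)."""
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return False, f"line {exc.lineno}: {exc.msg}"
    return True, ""
```

A check like this can act as a cheap first filter before type checking, numerical static analysis, or execution-based validation.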
References
- Humbatova, N.; Jahangirova, G.; Bavota, G.; Riccio, V.; Stocco, A.; Tonella, P. Taxonomy of real faults in deep learning systems. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion 2020), Seoul, Republic of Korea, 5–11 October 2020; Curran Associates, Inc.: Red Hook, NY, USA, 2020; pp. 1110–1121.
- Harzevili, N.S.; Shin, J.; Wang, J.; Wang, S.; Nagappan, N. Characterizing and Understanding Software Security Vulnerabilities in Machine Learning Libraries. In Proceedings of the 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR), Melbourne, Australia, 15–16 May 2023; Curran Associates, Inc.: Red Hook, NY, USA, 2023; pp. 27–38.
- Zhang, Y.; Ren, L.; Chen, L.; Xiong, Y.; Cheung, S.C.; Xie, T. Detecting numerical bugs in neural network architectures. In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering; ACM: New York, NY, USA, 2020; pp. 826–837.
- Yan, M.; Chen, J.; Zhang, X.; Tan, L.; Wang, G.; Wang, Z. Exposing numerical bugs in deep learning via gradient back-propagation. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering; ACM: New York, NY, USA, 2021; pp. 627–638.
- Kloberdanz, E.; Kloberdanz, K.G.; Le, W. DeepStability: A study of unstable numerical methods and their solutions in deep learning. In Proceedings of the 44th International Conference on Software Engineering, Pittsburgh, PA, USA, 25–27 May 2022; ACM: New York, NY, USA, 2022; pp. 586–597.
- Wang, G.; Wang, Z.; Chen, J.; Chen, X.; Yan, M. An Empirical Study on Numerical Bugs in Deep Learning Programs. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, Rochester, MI, USA, 10–14 October 2022; ACM: New York, NY, USA, 2022; pp. 1–5.
- Zhang, X.; Zhai, J.; Ma, S.; Shen, C. AUTOTRAINER: An Automatic DNN Training Problem Detection and Repair System. In Proceedings of the 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), Madrid, Spain, 22–30 May 2021; IEEE Press: New York, NY, USA, 2021; pp. 359–371.
- Wardat, M.; Cruz, B.D.; Le, W.; Rajan, H. DeepDiagnosis: Automatically diagnosing faults and recommending actionable fixes in deep learning programs. In Proceedings of the 44th International Conference on Software Engineering, Pittsburgh, PA, USA, 25–27 May 2022; ACM: New York, NY, USA, 2022; pp. 561–572.
- Li, L.; Zhang, Y.; Ren, L.; Xiong, Y.; Xie, T. Reliability Assurance for Deep Neural Network Architectures Against Numerical Defects. In Proceedings of the 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), Melbourne, Australia, 14–20 May 2023; IEEE Press: New York, NY, USA, 2023; pp. 1827–1839.
- Shang, Y.; Liu, S. LRNN: A Formal Logic Rules-Based Neural Network for Software Defect Prediction. In Proceedings of the Formal Methods and Software Engineering: 25th International Conference on Formal Engineering Methods, ICFEM 2024, Hiroshima, Japan, 2–6 December 2024; Springer Nature: Berlin/Heidelberg, Germany, 2024; pp. 106–124.
- Abdu, A.; Zhai, Z.; Abdo, H.A.; Lee, S.; Al-masni, M.A.; Gu, Y.H.; Algabri, R. Cross-project software defect prediction based on the reduction and hybridization of software metrics. Alex. Eng. J. 2025, 112, 161–176.
- Yamaguchi, F.; Golde, N.; Arp, D.; Rieck, K. Modeling and discovering vulnerabilities with code property graphs. In Proceedings of the 2014 IEEE Symposium on Security and Privacy, Berkeley, CA, USA, 18–21 May 2014; IEEE Computer Society: Washington, DC, USA, 2014; pp. 590–604.
- Xu, J.; Ai, J.; Liu, J.; Shi, T. ACGDP: An Augmented Code Graph-Based System for Software Defect Prediction. IEEE Trans. Reliab. 2022, 71, 850–864.
- Radjenović, D.; Heričko, M.; Torkar, R.; Živkovič, A. Software fault prediction metrics: A systematic literature review. Inf. Softw. Technol. 2013, 55, 1397–1418.
- Muthukumaran, K.; Choudhary, A.; Murthy, N.B. Mining GitHub for novel change metrics to predict buggy files in software systems. In Proceedings of the 2015 International Conference on Computational Intelligence and Networks, Odisha, India, 12–13 January 2015; IEEE Press: New York, NY, USA, 2015; pp. 15–20.
- Ai, J.; Su, W.; Zhang, S.; Yang, Y. A software network model for software structure and faults distribution analysis. IEEE Trans. Reliab. 2019, 68, 844–858.
- Phan, A.V.; Le Nguyen, M.; Bui, L.T. Convolutional neural networks over control flow graphs for software defect prediction. In Proceedings of the 2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI), Boston, MA, USA, 6–8 November 2017; IEEE Press: New York, NY, USA, 2017; pp. 45–52.
- Wang, S.; Liu, T.; Nam, J.; Tan, L. Deep semantic feature learning for software defect prediction. IEEE Trans. Softw. Eng. 2018, 46, 1267–1293.
- Zhao, Z.; Yang, B.; Li, G.; Liu, H.; Jin, Z. Precise learning of source code contextual semantics via hierarchical dependence structure and graph attention networks. J. Syst. Softw. 2022, 184, 111108.
- Wang, L.; Sun, C.; Zhang, C.; Nie, W.; Huang, K. Application of knowledge graph in software engineering field: A systematic literature review. Inf. Softw. Technol. 2023, 164, 107327.
- Schumi, R.; Sun, J. ExAIS: Executable AI semantics. In Proceedings of the 44th International Conference on Software Engineering, Pittsburgh, PA, USA, 25–27 May 2022; ACM: New York, NY, USA, 2022; pp. 859–870.
- Zhou, Q.; Zhou, D.; Dai, C.; Chen, J.; Guo, Z. Knowledge-driven innovation in industrial maintenance: A neural-enhanced model-based definition framework for lifecycle maintenance process information propagation. J. Manuf. Syst. 2025, 82, 976–999.
- Xia, L.; Liang, Y.; Leng, J.; Zheng, P. Maintenance planning recommendation of complex industrial equipment based on knowledge graph and graph neural network. Reliab. Eng. Syst. Saf. 2023, 232, 109068.
- Sahoo, P.; Singh, A.K.; Saha, S.; Jain, V.; Mondal, S.; Chadha, A. A systematic survey of prompt engineering in large language models: Techniques and applications. arXiv 2024, arXiv:2402.07927.
- Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.t.; Rocktäschel, T. Retrieval-augmented generation for knowledge-intensive nlp tasks. Adv. Neural Inf. Process. Syst. 2020, 33, 9459–9474.
- Dong, Q.; Li, L.; Dai, D.; Zheng, C.; Ma, J.; Li, R.; Xia, H.; Xu, J.; Wu, Z.; Chang, B. A survey on in-context learning. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, FL, USA, 12–16 November 2024; Association for Computational Linguistics: Kerrville, TX, USA, 2024; pp. 1107–1128.
- Yin, X.; Ni, C.; Wang, S.; Li, Z.; Zeng, L.; Yang, X. Thinkrepair: Self-directed automated program repair. In ISSTA 2024: Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, Vienna, Austria, 16–20 September 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 1274–1286.
- Liu, Z.; Du, X.; Liu, H. ReAPR: Automatic program repair via retrieval-augmented large language models. Softw. Qual. J. 2025, 33, 30.
- Trotman, A.; Puurula, A.; Burgess, B. Improvements to BM25 and language models examined. In ADCS ’14: Proceedings of the 19th Australasian Document Computing Symposium; Association for Computing Machinery: New York, NY, USA, 2014; pp. 58–65.
- Karpukhin, V.; Oguz, B.; Min, S.; Lewis, P.S.; Wu, L.; Edunov, S.; Chen, D.; Yih, W.t. Dense Passage Retrieval for Open-Domain Question Answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; Association for Computational Linguistics: Kerrville, TX, USA, 2020; pp. 6769–6781.
- Zhang, Q.; Chen, S.; Bei, Y.; Yuan, Z.; Zhou, H.; Hong, Z.; Dong, J.; Chen, H.; Chang, Y.; Huang, X. A Survey of Graph Retrieval-Augmented Generation for Customized Large Language Models. arXiv 2025, arXiv:2501.13958.
- Li, Z.; Chen, X.; Yu, H.; Lin, H.; Lu, Y.; Tang, Q.; Huang, F.; Han, X.; Sun, L.; Li, Y. Structrag: Boosting knowledge intensive reasoning of llms via inference-time hybrid information structurization. arXiv 2024, arXiv:2410.08815.
- Edge, D.; Trinh, H.; Cheng, N.; Bradley, J.; Chao, A.; Mody, A.; Truitt, S.; Metropolitansky, D.; Ness, R.O.; Larson, J. From local to global: A graph rag approach to query-focused summarization. arXiv 2024, arXiv:2404.16130.
- Sun, J.; Xu, C.; Tang, L.; Wang, S.; Lin, C.; Gong, Y.; Ni, L.M.; Shum, H.Y.; Guo, J. Think-on-graph: Deep and responsible reasoning of large language model on knowledge graph. arXiv 2023, arXiv:2307.07697.
- Ma, S.; Xu, C.; Jiang, X.; Li, M.; Qu, H.; Guo, J. Think-on-graph 2.0: Deep and interpretable large language model reasoning with knowledge graph-guided retrieval. arXiv 2024, arXiv:2407.10805.
- Liu, J.; Ai, J.; Su, H.; Shi, T. Enhancing Reliability Assurance for DNN against Numerical Defect with Large Language Models. In Proceedings of the 2025 IEEE 36th International Symposium on Software Reliability Engineering (ISSRE), São Paulo, Brazil, 21–24 October 2025; IEEE Press: New York, NY, USA, 2025; pp. 300–310.
- Fernandez, I.; Aceta, C.; Gilabert, E.; Esnaola-Gonzalez, I. FIDES: An ontology-based approach for making machine learning systems accountable. J. Web Semant. 2023, 79, 100808.
- Duarte, B.B.; Falbo, R.A.; Guizzardi, G.; Guizzardi, R.S.; Souza, V.E. Towards an Ontology of Software Defects, Errors and Failures; Springer: Berlin/Heidelberg, Germany, 2018; pp. 349–362.
- Guizzardi, G. Ontological Foundations for Structural Conceptual Models. Ph.D. Thesis, University of Twente, Enschede, The Netherlands, 2005.
- Esnaola-Gonzalez, I.; Bermúdez, J.; Fernandez, I.; Arnaiz, A. EEPSA as a core ontology for energy efficiency and thermal comfort in buildings. Appl. Ontol. 2021, 16, 193–228.
- Ren, S.; Guo, D.; Lu, S.; Zhou, L.; Liu, S.; Tang, D.; Sundaresan, N.; Zhou, M.; Blanco, A.; Ma, S. CodeBLEU: A Method for Automatic Evaluation of Code Synthesis. arXiv 2020, arXiv:2009.10297.






| Semantic Elements | Symbol | Description | Primary Attribute |
|---|---|---|---|
| Execution Context | C | (WHEN and WHERE) denotes the architectural or algorithmic module within a DNN pipeline. | Phase type, function descriptor (f), and operation descriptor (op) |
| Numerical Defect | | (WHAT) characterizes the problem and observable manifestation of the numerical defect. | Symptom type, root cause (ρ), and contextual description (χ) |
| Mitigation Method | M | (HOW) categorizes the approach for repairing or mitigating the defect. | Strategy type and method descriptor (μ) |
| Constraint | K | (WHY) documents the origin of the background knowledge. | Knowledge type and knowledge context (ξ) |
| Relation Name r | Source S | Target T | Semantic Description |
|---|---|---|---|
| phase_defines | | | A component’s high-level type logically encompasses or is implemented by specific functions. |
| operation_implements | C.op | | A mathematical operation (e.g., exp) is a constituent part or the core computation within a specific function. |
| symptom_manifests_in | | | A specific symptom is observed during the execution of a particular function. |
| symptom_indicates | | | A manifested symptom implies or is directly caused by an underlying root cause (e.g., Inf symptom caused by “division by zero”). |
| context_informs_cause | | | The problem context contains information that further explains the root cause. |
| cause_suggests_method | | | An identified root cause dictates or strongly suggests a specific mitigation method. |
| context_suggests_method | | | The problem context dictates or strongly suggests a specific mitigation method. |
| strategy_generalizes | | | A high-level repair/mitigation strategy is concretely implemented by a specific method. |
| knowledge_explains_context | | | External knowledge (e.g., a forum post, a numerical stability principle) provides the rationale or explanation for the observed problem context. |
| knowledge_motivates_strategy | | | External knowledge (e.g., a mathematical principle) motivates or justifies a repair strategy. |
| operation_constrained_by | C.op | | A particular mathematical operation is associated with specific background knowledge (e.g., “numerical stability trick: subtract max(logit)”). |
| type_of_knowledge | | | The knowledge type categorizes the nature or provenance of the knowledge context. |
| Method | Exact Match@1 | Exact Match@3 | Exact Match@5 | Mean Reciprocal Rank | Overall Success Rate |
|---|---|---|---|---|---|
| BM25 | 0.3330 | 0.2727 | 0.2647 | 0.4610 | 0.6777 |
| DPR | 0.3443 | 0.3333 | 0.2990 | 0.4540 | 0.6883 |
| NCKG | 0.8710 | 0.5520 | 0.4410 | 0.8297 | 0.9460 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Liu, J.; Zhou, Q.; Ai, J.; Shi, T. Repairing DNN Numerical Defects with Semantic-Driven Knowledge Graph Retrieval. Appl. Sci. 2026, 16, 2124. https://doi.org/10.3390/app16042124


