LLM Agents as Catalysts for Resilient DFT: An Orchestration-Based Framework Beyond Brittle Scripts
Abstract
1. Introduction
- We propose and implement DFTAgent, an innovative framework integrating an LLM agent with DFT tools. This includes a methodology for encapsulating legacy EDA tools into a structured toolset for reliable agent interaction.
- We provide a comprehensive empirical evaluation of the framework’s performance on standard benchmark circuits (ISCAS’85, ISCAS’89, and ITC’99), demonstrating its advantages in robustness and ease of use over traditional script-based approaches.
- We release the complete implementation of the DFTAgent framework as an open-source project to foster reproducibility and provide a practical baseline for future research in EDA automation, available at: https://github.com/ame-shiro/DFTAgent (accessed on 23 October 2025).
2. Background and Related Work
2.1. DFT Workflows
2.2. LLM Agent Architectures
2.3. Related Work
3. The DFTAgent Framework
3.1. Conceptual Architecture
- Entry Layer: This is the user-facing layer. It consists of a command-line interface for natural language interaction and the LLM Assistant, which serves as the central brain and orchestrator of the entire system. Powered by the DeepSeek model, the LLM Assistant is responsible for interpreting user goals, planning task sequences, and dispatching calls to the appropriate tools.
- Tool Layer: This layer forms the functional framework. It comprises a suite of specialized Python modules. Each encapsulates a specific EDA task: test vector generation, netlist analysis, circuit visualization, fault report parsing, and summary generation.
- Data and External Tools Layer: This layer manages all external dependencies. It includes data sources in standard circuit formats such as Verilog, BENCH, and EDIF. It also interfaces with external tools, reusing specialized, pre-existing software: DFTAgent integrates Atalanta [25] for ATPG execution within a WSL environment and Graphviz [26] for circuit visualization.
- Output Layer: The final layer produces tangible and human-readable results. The primary outputs are test vectors in standard formats and circuit diagrams in SVG format, which provide an intuitive visual representation of the netlist.
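The hand-off between the Entry Layer and the Tool Layer can be illustrated with a minimal dispatch sketch. It assumes the model emits each tool call as a JSON object with `name` and `arguments` fields. That message shape, the `TOOL_REGISTRY` dictionary, and the `register_tool` and `dispatch` helpers are hypothetical names chosen for illustration, not the framework's actual code. The `analyze_netlist` body is a stub.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry mapping tool names to Python callables (Tool Layer).
TOOL_REGISTRY: Dict[str, Callable[..., Dict[str, Any]]] = {}


def register_tool(func: Callable[..., Dict[str, Any]]) -> Callable[..., Dict[str, Any]]:
    """Add a function to the registry under its own name."""
    TOOL_REGISTRY[func.__name__] = func
    return func


@register_tool
def analyze_netlist(netlist_file: str) -> Dict[str, Any]:
    # Stub: a real tool would parse the file; this one only echoes the request.
    return {"tool": "analyze_netlist", "file": netlist_file}


def dispatch(tool_call_json: str) -> Dict[str, Any]:
    """Route one model-issued tool call (assumed JSON shape) to the matching tool."""
    call = json.loads(tool_call_json)
    name = call.get("name")
    if name not in TOOL_REGISTRY:
        return {"error": f"unknown tool: {name}"}
    try:
        return TOOL_REGISTRY[name](**call.get("arguments", {}))
    except TypeError as exc:
        # Wrong or missing arguments are reported back instead of crashing.
        return {"error": f"bad arguments for {name}: {exc}"}


result = dispatch('{"name": "analyze_netlist", "arguments": {"netlist_file": "c499.v"}}')
```

Returning errors as data rather than raising lets the orchestrator feed the failure back to the model so that it can retry with corrected arguments.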

3.2. Core Agentic Orchestration
3.3. The Toolset: An Interface to the EDA
3.4. Semantic Parsing of Tool Output
- Tool-mode constraints: Our framework does not allow the LLM to act unconstrained. Each function in the toolset, such as generate_test_vectors, is defined with a strict API schema (function signature and docstring) that is made available to DFTAgent. This schema explicitly states, for instance, that the required circuit_file input must be in the .bench format. When DFTAgent plans its next action, it is bound by such constraints, effectively translating the abstract goal of “run ATPG” into the concrete requirement for a valid .bench file.
- Inherent Domain Knowledge: The LLM possesses foundational knowledge of the EDA domain. It understands that while a Verilog file describes the circuit’s hardware design, tasks like ATPG are performed on a gate-level netlist, which, in the context of academic benchmarks like ISCAS, is commonly represented by a .bench file. This knowledge allows the agent to associate the user’s intent (perform ATPG) with the required artifact (a .bench netlist).
- Deterministic File System Search: Armed with the context from the LLM’s knowledge and the constraint from the tool schema, DFTAgent’s final step is not a guess but a targeted action.
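The interplay of the schema constraint and the file search can be sketched as follows. This is a minimal illustration under stated assumptions: `validate_circuit_file` and `resolve_bench_file` are hypothetical helper names, and the recursive search with sorted results is one plausible way to make the lookup deterministic, not necessarily how DFTAgent does it.

```python
import tempfile
from pathlib import Path
from typing import Optional

REQUIRED_SUFFIX = ".bench"  # format constraint stated in the tool schema


def validate_circuit_file(circuit_file: str) -> None:
    """Reject inputs that violate the schema's file-format constraint."""
    if Path(circuit_file).suffix.lower() != REQUIRED_SUFFIX:
        raise ValueError(
            f"circuit_file must be a {REQUIRED_SUFFIX} file, got {circuit_file!r}"
        )


def resolve_bench_file(design_name: str, search_root: Path) -> Optional[Path]:
    """Find '<design>.bench' under search_root; sorting keeps the choice stable."""
    stem = Path(design_name).stem  # "c499.v" and "c499" both map to "c499"
    matches = sorted(
        p for p in search_root.rglob(f"{stem}{REQUIRED_SUFFIX}") if p.is_file()
    )
    return matches[0] if matches else None


# Demo on a throwaway directory holding both representations of one design.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "netlists").mkdir()
    (root / "netlists" / "c499.v").write_text("// verilog placeholder\n")
    (root / "netlists" / "c499.bench").write_text("# bench placeholder\n")
    found = resolve_bench_file("c499", root)
    validate_circuit_file(str(found))  # passes: the resolved file ends in .bench
    found_name = found.name
```

Here a request that names only "c499" resolves to `c499.bench`, while passing `c499.v` directly to the validator raises a `ValueError`.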
4. Evaluation Results
4.1. Experimental Setup
4.2. Experiment 1: Complete ATPG Task Cycle Automation
1. Invoke the generate_test_vectors tool within the WSL environment;
2. Upon successful completion, identify the newly generated report file;
3. Invoke the parse_fault_report tool to extract key metrics from the report file;
4. Invoke the generate_summary tool to present the final results.
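The four steps above can be sketched as a single chained function. Everything tool-specific here is an assumption made for illustration: the function bodies are stubs, the `.rep` extension and the two-line report text are invented and do not reproduce Atalanta's real output, and `find_new_report` and `run_atpg_cycle` are hypothetical helper names. The stub values (96.57% coverage, 55 patterns) reuse the c499 figures reported in Experiment 4.

```python
import re
import tempfile
from pathlib import Path
from typing import Set


def generate_test_vectors(bench_file: Path, out_dir: Path) -> None:
    """Step 1 stand-in for the ATPG call: writes a made-up report file."""
    report = out_dir / f"{bench_file.stem}.rep"
    report.write_text("Fault coverage: 96.57 %\nTest patterns: 55\n")


def find_new_report(out_dir: Path, before: Set[Path]) -> Path:
    """Step 2: pick the single file that appeared since the snapshot."""
    new_files = sorted(set(out_dir.iterdir()) - before)
    if len(new_files) != 1:
        raise RuntimeError(f"expected one new report, found {len(new_files)}")
    return new_files[0]


def parse_fault_report(report: Path) -> dict:
    """Step 3: extract key metrics from the (invented) report format."""
    text = report.read_text()
    coverage = re.search(r"Fault coverage:\s*([\d.]+)", text)
    patterns = re.search(r"Test patterns:\s*(\d+)", text)
    if not coverage or not patterns:
        raise ValueError("report is missing expected metrics")
    return {
        "fault_coverage": float(coverage.group(1)),
        "patterns": int(patterns.group(1)),
    }


def generate_summary(metrics: dict) -> str:
    """Step 4: present the final results."""
    return (
        f"Fault coverage {metrics['fault_coverage']:.2f}% "
        f"with {metrics['patterns']} patterns"
    )


def run_atpg_cycle(bench_file: Path, out_dir: Path) -> str:
    before = set(out_dir.iterdir())  # snapshot so step 2 can spot the new file
    generate_test_vectors(bench_file, out_dir)
    report = find_new_report(out_dir, before)
    return generate_summary(parse_fault_report(report))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    bench = root / "c499.bench"
    bench.write_text("# bench placeholder\n")
    out = root / "out"
    out.mkdir()
    summary = run_atpg_cycle(bench, out)
```

In the agent, each arrow in this chain is a separate tool call planned by the LLM. The sketch hard-codes the order only to make the data flow between steps explicit.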
- Finding 1: DFTAgent demonstrates contextual awareness by transitioning between analysis tasks based on natural language commands. The agent successfully interpreted a sequence of conversational requests, invoking tools for structural analysis, visualization, and test generation in turn, and correctly inferred the required file extension for the final step. This ability to dynamically chain different tools based on conversational context indicates that it can serve as a versatile and intuitive partner in complex EDA workflows.
4.3. Experiment 2: Ablation Study
- Baseline: All modules active, full workflow execution.
- Ablation: One module removed or replaced by a non-functional placeholder returning default outputs.
- Evaluation: Run identical ISCAS’85 and ISCAS’89 benchmark tasks.
| Ablation Setting | Performance Impact |
|---|---|
| Baseline (all modules) | – |
| Without parse_fault_report | Final report generation not possible |
| Without visualize_circuit | Workflow intact, loses visualization capability |
| Without generate_summary | Output lacks readability, partial tasks considered incomplete |
| Without analyze_netlist | Higher failure rate when input format is ambiguous |
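The ablation procedure can be illustrated with a small harness that swaps one tool for a non-functional placeholder. This is a toy sketch, not the experimental code: `placeholder`, `ablate`, and `run` are hypothetical names, the two tool bodies are simplified stand-ins, and the `"coverage: 96.57"` input string is an invented format.

```python
from typing import Callable, Dict


def placeholder(*_args, **_kwargs) -> dict:
    """Non-functional stand-in that returns an empty default output."""
    return {}


def ablate(tools: Dict[str, Callable], removed: str) -> Dict[str, Callable]:
    """Return a copy of the toolset with one tool swapped for the placeholder."""
    if removed not in tools:
        raise KeyError(removed)
    ablated = dict(tools)
    ablated[removed] = placeholder
    return ablated


def parse_fault_report(report_text: str) -> dict:
    # Toy parser for an invented "coverage: <number>" line.
    return {"fault_coverage": float(report_text.split(":")[1])}


def generate_summary(metrics: dict) -> str:
    if "fault_coverage" not in metrics:
        return "INCOMPLETE: no metrics available"
    return f"coverage={metrics['fault_coverage']:.2f}%"


def run(tools: Dict[str, Callable], report_text: str) -> str:
    """Two-stage workflow: parse the report, then summarize the metrics."""
    metrics = tools["parse_fault_report"](report_text)
    return tools["generate_summary"](metrics)


BASELINE = {
    "parse_fault_report": parse_fault_report,
    "generate_summary": generate_summary,
}
baseline_out = run(BASELINE, "coverage: 96.57")
ablated_out = run(ablate(BASELINE, "parse_fault_report"), "coverage: 96.57")
```

With the parser ablated, the summary stage receives no metrics and cannot produce a final report, which mirrors the "Without parse_fault_report" row of the table.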
4.4. Experiment 3: Toolset vs. LLM Reasoning: Deterministic Executor Comparison
- LLM Agent: The current DFTAgent implementation.
- Deterministic Executor: A hard-coded sequential workflow without natural language parsing or contextual reasoning. It runs only under fully standardized input conditions.
4.5. Experiment 4: Multi-Tool Orchestration for Interactive Analysis
- Workflow and Results:
- Initial Query (Netlist Analysis)
  - User: “Tell me about the c499.v netlist.”
  - Agent Action: DFTAgent correctly identified the need for structural information and called analyze_netlist.
  - Agent Response: DFTAgent returned a structured summary as shown in Figure 2. In response to a query about the netlist, DFTAgent provides a high-level overview of the circuit’s structure. It identifies the file type as Verilog and details its complexity, including the total number of cells, inputs, outputs, and wires.
- Follow-up Query (Visualization)
  - User: “That’s interesting. Can you draw me a picture of c499.v?”
  - Agent Action: Understanding the user’s intent to visualize, DFTAgent called visualize_circuit.
  - Agent Response: DFTAgent returned a structured summary as shown in Figure 3. When asked to visualize the circuit, DFTAgent not only generates the diagram but also provides a deeper functional analysis. It identifies the circuit as a well-known “ECAT” error correction benchmark and describes its three-stage logic structure, breaking down the composition of its gates and I/O ports.
- Final Query (ATPG Execution)
  - User: “Okay, now let’s see its testability. Run ATPG on c499.” (Note: DFTAgent must infer the correct file format, .bench.)
  - Agent Action: The agent correctly inferred the corresponding .bench file and called generate_test_vectors.
  - Agent Response: DFTAgent returned a structured summary as shown in Figure 4. For the ATPG task, DFTAgent executes the workflow and presents a comprehensive testability report. It highlights key metrics, such as achieving 96.57% fault coverage with 55 compacted test patterns, and offers a detailed analysis of the fault types and the effectiveness of the test pattern compaction.
- Finding 2: DFTAgent demonstrates its ability to automate a complete analysis workflow based on simple conversational instructions. It successfully interpreted a sequence of conversational requests, invoking tools for structural analysis, visualization, and test generation in turn, and correctly inferred the required file extension for the final step. This ability to dynamically chain different tools based on conversational commands indicates that it can serve as a versatile and intuitive partner in complex EDA workflows.
4.6. Experiment 5: Resilience to Environmental Changes
- Finding 3: DFTAgent exhibits superior robustness and adaptability compared to traditional hard-coded scripts. In a direct baseline comparison, a standard Tcl script failed when faced with a minor alteration to a report file format. In contrast, DFTAgent successfully parsed the modified report, demonstrating its ability to handle unexpected variations in tool output. This resilience to superficial formatting changes directly addresses a core weakness of script-based automation, reducing maintenance overhead and proving the framework’s potential to create more robust and adaptive EDA workflows.
4.7. Experiment 6: Resilience to Systematic Text Structure Changes
- Paragraph Reordering: Relocated the fault coverage and test pattern sections to different positions in the file, including placing them at the end.
- Section Header Renaming: Changed key headers (e.g., Fault Coverage → Coverage Rate, Test Patterns → Generated Vectors).
- Result Block Merging/Splitting: Combined multiple metrics into one sentence, or split a single metric into multiple lines.
- Noise Injection: Added unrelated statistical data blocks that could confuse parsers relying on positional rules.
- Finding 4: DFTAgent’s resilience extends beyond minor environmental variations to complex structural changes in report format. This demonstrates strong adaptability to realistic toolchain evolutions, further validating the framework’s value in production environments.
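To make the perturbations concrete, the sketch below applies three of the four change types (reordering, header renaming, and noise injection) to a toy report and shows how a positional parser breaks under each. The report layout, the `Section` representation, the noise block, and all function names are assumptions made for illustration. The renamed headers follow the examples given in the list above, and the metric values reuse the c499 figures from Experiment 4.

```python
import re
from typing import Dict, List, Optional, Tuple

Section = Tuple[str, str]  # (header, body) in an invented report layout

BASE: List[Section] = [("Fault Coverage", "96.57 %"), ("Test Patterns", "55")]


def render(sections: List[Section]) -> str:
    return "\n".join(f"{header}: {body}" for header, body in sections)


def reorder(sections: List[Section]) -> List[Section]:
    """Paragraph reordering: move the coverage section to the end."""
    return list(reversed(sections))


def rename_headers(sections: List[Section], mapping: Dict[str, str]) -> List[Section]:
    """Section header renaming, e.g. Fault Coverage -> Coverage Rate."""
    return [(mapping.get(header, header), body) for header, body in sections]


def inject_noise(sections: List[Section], noise: Section) -> List[Section]:
    """Noise injection: put an unrelated block ahead of the real metrics."""
    return [noise] + list(sections)


def positional_parse(text: str) -> Optional[float]:
    """Brittle baseline: assumes coverage is always on the first line."""
    first_line = text.splitlines()[0]
    match = re.match(r"Fault Coverage:\s*([\d.]+)", first_line)
    return float(match.group(1)) if match else None


RENAMES = {"Fault Coverage": "Coverage Rate", "Test Patterns": "Generated Vectors"}
variants = {
    "original": render(BASE),
    "reordered": render(reorder(BASE)),
    "renamed": render(rename_headers(BASE, RENAMES)),
    "noisy": render(inject_noise(BASE, ("Runtime Statistics", "unrelated data"))),
}
results = {name: positional_parse(text) for name, text in variants.items()}
```

The positional parser recovers the coverage value only from the unmodified report and returns `None` for every perturbed variant. Block merging and splitting is omitted here for brevity.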
5. Discussion
5.1. Experimental Validation: Robustness and Intelligence
5.2. Limitations and Future Research Directions
- Integration with Commercial EDA Suites: The most critical next step is to expand the Tool Layer by developing robust wrappers for industry-standard commercial tools, such as Synopsys TestMAX and Siemens Tessent. This effort will involve overcoming significant challenges. First, it requires programmatic license management and the ability to handle complex tool configuration files. Second, the system must be able to parse the much richer and more varied log file outputs generated by these sophisticated tools.
- Scalability and Asynchronous Task Management: Industrial DFT and ATPG runs can take many hours or even days to complete on large designs, especially when applying advanced fault models or high test compression ratios. DFTAgent’s current synchronous orchestration loop is insufficient for such workloads, as it assumes tools will complete within minutes. To support production-scale designs, the core executive module must be enhanced to manage persistent, long-running jobs in a truly asynchronous manner. This entails implementing job completion callbacks from ATPG tools, periodic status polling with configurable intervals, and integration with workload managers such as Slurm or LSF for compute farm submission. The framework should maintain a resilient task queue with checkpointing so that partial progress can survive agent restarts or system failures. Such capabilities would enable DFTAgent to continue orchestrating other design or verification tasks in parallel while ATPG jobs execute in the background, thereby aligning with the operational requirements of industrial-scale DFT workflows.
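The periodic status polling described above could look roughly like the following `asyncio` sketch. It is an assumption-laden illustration of one possible design, not a planned implementation: `poll_until_done` and `FakeAtpgJob` are hypothetical names, the `"RUNNING"` and `"DONE"` status strings are invented, and no scheduler such as Slurm or LSF is modeled.

```python
import asyncio
from typing import Awaitable, Callable, Tuple


async def poll_until_done(
    check_status: Callable[[], Awaitable[str]],
    interval_s: float,
    timeout_s: float,
) -> str:
    """Poll a job's status at a fixed interval until it leaves RUNNING."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_s
    while True:
        status = await check_status()
        if status != "RUNNING":
            return status
        if loop.time() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        await asyncio.sleep(interval_s)  # yields control to other tasks


class FakeAtpgJob:
    """Hypothetical stand-in for a long-running ATPG job."""

    def __init__(self, polls_until_done: int) -> None:
        self.remaining = polls_until_done

    async def status(self) -> str:
        self.remaining -= 1
        return "DONE" if self.remaining <= 0 else "RUNNING"


async def main() -> Tuple[str, str]:
    job = FakeAtpgJob(polls_until_done=3)
    # The poller and an unrelated task share the event loop concurrently.
    atpg_status, other = await asyncio.gather(
        poll_until_done(job.status, interval_s=0.01, timeout_s=1.0),
        asyncio.sleep(0.005, result="other task finished"),
    )
    return atpg_status, other


final_status, other_result = asyncio.run(main())
```

Because the poller sleeps between checks, the event loop stays free for other orchestration work while the job runs. Checkpointing and a persistent task queue would sit on top of this and are not shown.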
5.3. Minimum Experimental Plan
1. We will utilize synthetic benchmarks with large logic depths to measure the scaling of runtime and resource consumption.
2. A test-compression variant of the c6288 benchmark will be used to assess the system’s compatibility with compressed Automatic Test Pattern Generation (ATPG) patterns.
3. We will conduct controlled log perturbation tests (e.g., missing tags, noisy timestamps, and multithread interleaving) to quantify error-recovery rates and replication stability.
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Agnihotri, P.; Kalla, P.; Blair, S. Design-for-Test for Silicon Photonic Circuits. In Proceedings of the 2024 IEEE International Test Conference (ITC), San Diego, CA, USA, 3–8 November 2024; IEEE: New York, NY, USA, 2024; pp. 86–90.
- Yuan, S.; Yaldagard, M.A.; Xun, H.; Fieback, M.; Marinissen, E.J.; Kim, W.; Rao, S.; Couet, S.; Taouil, M.; Hamdioui, S. Design-for-test for intermittent faults in STT-MRAMs. In Proceedings of the 2024 IEEE European Test Symposium (ETS), The Hague, The Netherlands, 20–24 May 2024; IEEE: New York, NY, USA, 2024; pp. 1–6.
- Wang, Y.; Mäntylä, M.V.; Liu, Z.; Markkula, J.; Raulamo-jurvanen, P. Improving test automation maturity: A multivocal literature review. Softw. Test. Verif. Reliab. 2022, 32, e1804.
- Eisty, N.U.; Kanewala, U.; Carver, J.C. Testing research software: An in-depth survey of practices, methods, and tools. Empir. Softw. Eng. 2025, 30, 81.
- Pan, J.; Zhou, G.; Chang, C.C.; Jacobson, I.; Hu, J.; Chen, Y. A survey of research in large language models for electronic design automation. ACM Trans. Des. Autom. Electron. Syst. 2025, 30, 1–21.
- Pasandi, G.; Kunal, K.; Tej, V.; Shan, K.; Sun, H.; Jain, S.; Li, C.; Deng, C.; Ene, T.D.; Ren, H.; et al. JARVIS: A Multi-Agent Code Assistant for High-Quality EDA Script Generation. arXiv 2025, arXiv:2505.14978.
- Huang, G.; Hu, J.; He, Y.; Liu, J.; Ma, M.; Shen, Z.; Wu, J.; Xu, Y.; Zhang, H.; Zhong, K.; et al. Machine learning for electronic design automation: A survey. ACM Trans. Des. Autom. Electron. Syst. (TODAES) 2021, 26, 1–46.
- Mirhoseini, A.; Goldie, A.; Yazgan, M.; Jiang, J.W.; Songhori, E.; Wang, S.; Lee, Y.J.; Johnson, E.; Pathak, O.; Nova, A.; et al. A graph placement methodology for fast chip design. Nature 2021, 594, 207–212.
- Shi, Z.; Li, M.; Khan, S.; Wang, L.; Wang, N.; Huang, Y.; Xu, Q. Deeptpi: Test point insertion with deep reinforcement learning. In Proceedings of the 2022 IEEE International Test Conference (ITC), Anaheim, CA, USA, 23–30 September 2022; IEEE: New York, NY, USA, 2022; pp. 194–203.
- Chen, Z.; Xu, J.; Alippi, C.; Ding, S.X.; Shardt, Y.; Peng, T.; Yang, C. Graph neural network-based fault diagnosis: A review. arXiv 2021, arXiv:2111.08185.
- Pandey, S.; Sarangi, S.R. HybMT: Hybrid Meta-Predictor based ML Algorithm for Fast Test Vector Generation. In Proceedings of the 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC), Incheon, Republic of Korea, 22–25 January 2024; IEEE: New York, NY, USA, 2024; pp. 497–502.
- Banerjee, S.; Talukdar, J.; Firouzi, F. Silicon Whisperers: Improving Test Quality and Cost in the Age of Generative AI. In Proceedings of the 2025 IEEE International Conference on Omni-layer Intelligent Systems (COINS), Madison, WI, USA, 4–6 August 2025; IEEE: New York, NY, USA, 2025; pp. 1–5.
- Chen, L.; Chen, Y.; Chu, Z.; Fang, W.; Ho, T.Y.; Huang, R.; Huang, Y.; Khan, S.; Li, M.; Li, X.; et al. The dawn of ai-native eda: Opportunities and challenges of large circuit models. arXiv 2024, arXiv:2403.07257.
- Chen, L.; Chen, Y.; Chu, Z.; Fang, W.; Ho, T.Y.; Huang, R.; Huang, Y.; Khan, S.; Li, M.; Li, X.; et al. Large circuit models: Opportunities and challenges. Sci. China Inf. Sci. 2024, 67, 200402.
- Huang, Z.; Huang, Z.; Tao, S.; Chen, S.; Zeng, Z.; Ni, L.; Zhuang, C.; Li, W.; Zhao, X.; Liu, H.; et al. AiEDA: An open-source AI-native EDA library. In Proceedings of the 2024 2nd International Symposium of Electronics Design Automation (ISEDA), Xi’an, China, 10–13 May 2024; IEEE: New York, NY, USA, 2024; pp. 794–795.
- Thakur, S.; Ahmad, B.; Pearce, H.; Tan, B.; Dolan-Gavitt, B.; Karri, R.; Garg, S. Verigen: A large language model for verilog code generation. ACM Trans. Des. Autom. Electron. Syst. 2024, 29, 1–31.
- Firouzi, F.; Nakkilla, S.S.R.; Fu, C.; Banerjee, S.; Talukdar, J.; Chakrabarty, K. Llm-aid: Leveraging large language models for rapid domain-specific accelerator development. In Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design, New York, NY, USA, 27–31 October 2024; pp. 1–9.
- Fu, Y.; Zhang, Y.; Yu, Z.; Li, S.; Ye, Z.; Li, C.; Wan, C.; Lin, Y.C. Gpt4aigchip: Towards next-generation ai accelerator design automation via large language models. In Proceedings of the 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), San Francisco, CA, USA, 28 October–2 November 2023; IEEE: New York, NY, USA, 2023; pp. 1–9.
- Wu, H.; He, Z.; Zhang, X.; Yao, X.; Zheng, S.; Zheng, H.; Yu, B. Chateda: A large language model powered autonomous agent for eda. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2024, 43, 3184–3197.
- Jiang, Z.; Zhang, Q.; Liu, C.; Cheng, L.; Li, H.; Li, X. Iicpilot: An intelligent integrated circuit backend design framework using open eda. arXiv 2024, arXiv:2407.12576.
- Wu, H.; Zheng, H.; He, Z.; Yu, B. Divergent Thoughts toward One Goal: LLM-based Multi-Agent Collaboration System for Electronic Design Automation. arXiv 2025, arXiv:2502.10857.
- Wang, X.; Wan, G.W.; Wong, S.Z.; Zhang, L.; Liu, T.; Tian, Q.; Ye, J. Chatcpu: An agile cpu design and verification platform with llm. In Proceedings of the 61st ACM/IEEE Design Automation Conference, San Francisco, CA, USA, 23–27 June 2024; pp. 1–6.
- Hu, Y.; Ye, J.; Xu, K.; Sun, J.; Zhang, S.; Jiao, X.; Pan, D.; Zhou, J.; Wang, N.; Shan, W.; et al. Uvllm: An automated universal rtl verification framework using llms. arXiv 2024, arXiv:2411.16238.
- Liu, B.; Zhang, H.; Gao, X.; Kong, Z.; Tang, X.; Lin, Y.; Wang, R.; Huang, R. LayoutCopilot: An LLM-Powered Multiagent Collaborative Framework for Interactive Analog Layout Design. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2025, 44, 3126–3139.
- Petrolo, V.; Medya, S.; Graziano, M.; Pal, D. DETECTive: Machine Learning-driven Automatic Test Pattern Prediction for Faults in Digital Circuits. In Proceedings of the Great Lakes Symposium on VLSI 2024, GLSVLSI ’24, Clearwater, FL, USA, 12–14 June 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 32–37.
- Shrestha, P.; Aversa, A.; Phatharodom, S.; Savidis, I. EDA-schema: A Graph Datamodel Schema and Open Dataset for Digital Design Automation. In Proceedings of the Great Lakes Symposium on VLSI 2024, GLSVLSI ’24, Clearwater, FL, USA, 12–14 June 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 69–77.
- Rahimifar, M.; Jahanirad, H.; Fathi, M. Deep transfer learning approach for digital circuits vulnerability analysis. Expert Syst. Appl. 2024, 237, 121757.





| Approach/Framework | Automation Scope | Interaction Mode |
|---|---|---|
| Traditional Script-Driven DFT | Tool-specific Task Automation | Command Scripting (Tcl or Perl) |
| DFTAgent (Our Work) | Complete ATPG task cycle | Natural Language → Dynamic Tool Invocation |
| JARVIS [6] | Script Generation for EDA Tools | Natural Language → Tcl or Python 3.10 Scripts |
| ChatEDA [19] | Full Digital Design Flow Orchestration | Natural Language |
| VeriGen [16] | Verilog Hardware Description Generation | Natural Language → Synthesizable Hardware Description Language |
| LayoutCopilot [24] | Interactive Analog Layout Design | Natural Language Interactive Interface |
| UVLLM [23] | Register-transfer Level Verification and Error Repair | Natural Language → Verification Tool Execution |

| Tool Name | Description |
|---|---|
| generate_test_vectors | Generates test vectors for given BENCH files using the Atalanta tool |
| analyze_netlist | Analyzes a Verilog or SPICE netlist to extract structural information like module name, port counts, and cell instances |
| visualize_circuit | Generates a visual circuit diagram from a netlist file using Graphviz |
| parse_fault_report | Parses a fault report to identify untested faults and provide structured analysis |
| generate_summary | Consolidates results from a test run into a final summary, reporting key metrics like fault coverage and pattern count |

| Circuit | Fault Coverage / Test Pattern Count | CPU Time (s) | Agent Overhead (USD) |
|---|---|---|---|
| ISCAS’85 Combinational Circuits | | | |
| c17 | 100.000%/63 | 32.9 ± 0.1 | 0.0028 ± 0.0001 |
| c880 | 100.000%/149 | 32.8 ± 0.2 | 0.0032 ± 0.0001 |
| c432 | 99.046%/63 | 96.9 ± 0.3 | 0.0035 ± 0.0002 |
| c1908 | 99.468%/129 | 41.1 ± 0.2 | 0.0037 ± 0.0001 |
| c2670 | 95.741%/512 | 86.1 ± 0.4 | 0.0041 ± 0.0002 |
| c6288 | 99.251%/64 | 37.0 ± 0.3 | 0.0043 ± 0.0001 |
| ISCAS’89 Sequential Circuits | | | |
| s27 | 100.000%/8 | 24.5 ± 0.2 | 0.0030 ± 0.0001 |
| s208 | 100.000%/46 | 32.1 ± 0.2 | 0.0031 ± 0.0001 |
| s349 | 99.430%/63 | 23.5 ± 0.1 | 0.0032 ± 0.0001 |
| s526 | 100.000%/100 | 26.5 ± 0.2 | 0.0034 ± 0.0001 |
| s832 | 98.390%/125 | 45.5 ± 0.3 | 0.0040 ± 0.0001 |
| ITC’99 Benchmark Circuits | | | |
| b01 | 100.000%/16 | 23.9 ± 0.1 | 0.0032 ± 0.0001 |
| b02 | 100.000%/12 | 27.0 ± 0.2 | 0.0035 ± 0.0001 |
| b03 | 100.000%/31 | 24.1 ± 0.1 | 0.0039 ± 0.0001 |
| b04 | 98.930%/98 | 24.2 ± 0.2 | 0.0042 ± 0.0001 |
| b05 | 81.870%/95 | 24.9 ± 0.2 | 0.0048 ± 0.0001 |

| Execution Mode | Standard Input | Non-Standard Input |
|---|---|---|
| LLM Agent | 100% (11/11) | 91% (10/11) |
| Deterministic Executor | 100% (11/11) | 0% (0/11) |

| Change Type | Baseline Script | DFTAgent |
|---|---|---|
| Paragraph Reordering | 0% / FAIL | 100% / PASS |
| Section Header Renaming | 0% / FAIL | 90% / PASS |
| Block Merging/Splitting | 0% / FAIL | 80% / PASS |
| Noise Injection | 0% / FAIL | 90% / PASS |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, H.; Wang, Y.; Liu, J.; Liu, H. LLM Agents as Catalysts for Resilient DFT: An Orchestration-Based Framework Beyond Brittle Scripts. Appl. Sci. 2025, 15, 11390. https://doi.org/10.3390/app152111390

