Causal Graph-Enhanced Large Language Models for Automated Fault Diagnosis and Intelligent Operation and Maintenance in Distributed Computing Systems
Abstract
1. Introduction
- Question 1: How can domain-specific causal graphs be systematically constructed and effectively integrated with causal relationship networks?
- Question 2: How can coherent and relevant reasoning chains be generated to support inference and decision-making?
- Question 3: Can causal graph-augmented LLMs enhance the accuracy of fault diagnosis in distributed computing systems?
2. Methodology
2.1. Construction of UCG
2.1.1. Construction of Basic Graph Based on Causal Knowledge
2.1.2. Methodology for Constructing the Unified Causal Relationship Graph
2.2. Automatic Generation of Diagnostic Reasoning Chains via System State Perception
2.2.1. System State Perception and Abnormal Symptom Mining
2.2.2. Diagnostic Reasoning Chain Generation
2.3. Fault Diagnosis and Auxiliary Operation and Maintenance Framework Based on LLMs
2.3.1. Fault Diagnosis
2.3.2. Auxiliary Operation and Maintenance Decision Suggestions
3. Results
3.1. Distributed Computing System Introduction
3.2. Dataset Preparation
3.3. The Unified Causal Relationship Graph
3.4. Generation of Chain of Thought
3.5. Auxiliary Decision-Making for O&M
4. Discussion
4.1. Advantages of the Method
- Precision enhancement: This study improves precision by integrating domain expertise with data-driven approaches. At the core of this approach is the construction of the UCG, which combines domain expert knowledge with data-driven algorithms to address the limitations in knowledge completeness and generalization capability that affect traditional fault diagnosis methods. The generated diagnostic reasoning chains-of-thought achieve retrieval hit rates above 93% for all eight fault types, and five fault types reach 100%.
- Reliability improvement: This study demonstrates reliable performance in complex fault scenarios, particularly under concurrent multi-fault occurrences and incomplete observations. Experimental analysis further shows that the employed strategy substantially improves overall matching performance, with accuracy improvements of approximately 41% and 34% over the “With faults” and “With symptoms” groups, respectively. Notably, the framework can accurately diagnose faults even in highly complex fault situations.
- Interpretability enhancement: This study significantly improves the transparency of the diagnostic process by integrating LLMs with knowledge graphs. The LLM-generated diagnostic results include not only the most probable fault but also detailed logical justifications, causal propagation pathways, and rationales for excluding alternative fault candidates.
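
The reliability figures above can be checked against the accuracy-by-group table reported in this article. The text does not say whether the 41% and 34% improvements are absolute or relative. The minimal sketch below computes both readings; under the assumption of a relative gain, the GPT-4o column reproduces the quoted figures. The function names are illustrative, and this is one consistent reading rather than the authors' stated computation.

```python
# Accuracy values (%) transcribed from the accuracy-by-group table in this article.
ACCURACY = {
    "QWEN3-8B": {"With faults": 31.9, "With symptoms": 38.8, "Complete system": 44.6},
    "GPT-4o": {"With faults": 70.7, "With symptoms": 74.9, "Complete system": 100.0},
    "GPT-5.4": {"With faults": 72.6, "With symptoms": 78.3, "Complete system": 100.0},
}


def absolute_gain(baseline: float, improved: float) -> float:
    """Difference in percentage points."""
    return improved - baseline


def relative_gain(baseline: float, improved: float) -> float:
    """Improvement expressed as a percentage of the baseline."""
    return (improved - baseline) / baseline * 100.0


if __name__ == "__main__":
    for model, rows in ACCURACY.items():
        full = rows["Complete system"]
        for group in ("With faults", "With symptoms"):
            base = rows[group]
            print(
                f"{model:9s} vs {group:13s}: "
                f"absolute {absolute_gain(base, full):5.1f} points, "
                f"relative {relative_gain(base, full):5.1f}%"
            )
```

Reporting both numbers side by side avoids ambiguity, since the absolute and relative gains differ noticeably whenever the baseline is well below 100%.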
4.2. Limitations and Future Research Directions
4.2.1. Evaluation Limited to a Single Distributed Computing System
4.2.2. Pre-Training and Fine-Tuning of Domain-Specific LLMs
4.2.3. Explainability Techniques for LLMs in Distributed Computing Systems
4.2.4. Expanded Baselines and O&M-Recommendation Assessment
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Symbol | Description |
|---|---|
| c | The normalization factor |
| c_j | The j-th cluster |
| d | Euclidean distance |
| E | Average path length over all isolation trees |
| h | The path length of a data point |
| hr | Hit rate |
| i | The value of the counter |
| J | Objective function value |
| K | Number of filtered chains-of-thought (set to 3) |
| k | Preset number of clusters (set to 5) |
| N | The number of abnormal symptom nodes |
| N_UCG,i | The count of UCG nodes retrieved |
| n | The sampling size |
| X | The number of chains |
| x | Data point in cluster c_j |
| x_current | Current metric value |
| x_normal | Mean historical value of the metric |
| x_max | Maximum historical value of the metric |
| x_min | Minimum historical value of the metric |
| P | Set of steady-state data points |
| Q | Set of historical steady-state data points |
| p | The working pressure indicator of set P |
| q | The working pressure indicator of set Q |
| Volatility | Volatility of symptoms deviating from their normal values |
| s | Anomaly score |
| s_sub | The collection of abnormal symptom nodes |
| s_act | The set of empirically identified abnormal symptom nodes |
| μ_j | Centroid of the j-th cluster |
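
Several of the symbols above (c, E, h, n, s) match the textbook Isolation Forest anomaly score, and J, d, and the cluster centroids match the textbook k-means objective. The sketch below writes out those standard forms, assuming the article uses them unchanged; the body text that would confirm this is not part of this excerpt, and the function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant used in the harmonic-number approximation


def normalization_factor(n: int) -> float:
    """c(n): average path length of an unsuccessful binary-search-tree lookup over n samples."""
    if n <= 1:
        raise ValueError("sampling size n must be greater than 1")
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # H(n - 1) approximated as ln(n - 1) + gamma
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length: float, n: int) -> float:
    """s = 2 ** (-E(h) / c(n)); values near 1 suggest anomalies, values near 0.5 or below suggest normal points."""
    return 2.0 ** (-mean_path_length / normalization_factor(n))


def kmeans_objective(clusters, centroids) -> float:
    """J = sum over clusters j and points x in c_j of d(x, mu_j) ** 2, with d the Euclidean distance."""
    total = 0.0
    for points, mu in zip(clusters, centroids):
        for x in points:
            total += math.dist(x, mu) ** 2
    return total


if __name__ == "__main__":
    n = 256  # illustrative sampling size, not a value taken from the article
    print(anomaly_score(normalization_factor(n), n))  # a point with average depth scores exactly 0.5
    print(kmeans_objective([[(0.0, 0.0), (2.0, 0.0)]], [(1.0, 0.0)]))
```

A point whose mean path length equals c(n) sits at the decision midpoint, which is why 0.5 is commonly used as the reference value when interpreting s.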
Appendix A
| Representative Approach | Core Mechanism | Data Assumptions | Interpretability |
|---|---|---|---|
| Classical supervised learning | Learn feature-to-fault mapping from labeled examples. | Requires sufficient, representative labeled data. | Moderate overall; deep models tend toward black-box behavior. |
| Unsupervised learning | Discover latent structure or anomalies without dense supervision. | Weak labels or none. | Structural patterns can be explained; semantic fault naming remains weak. |
| Knowledge-driven diagnosis | Encode expert if–then logic or probabilistic graphical inference. | Depends on curated expert knowledge and maintenance. | High when rules or graphs are complete; brittle when coverage is incomplete. |
| Few-shot cross-domain learning | Meta-learning plus optimization to relieve scarce target-domain samples. | Labeled source domain; few target-domain samples. | Moderate. |
| High-resolution time–frequency feature learning | Physically motivated time–frequency features followed by a classifier. | Fault-related vibration or process signals. | Strong at the feature level; less explicit on end-to-end decision logic. |
| GNNs on multivariate sensor graphs | Trainable graph adjacency with GNN encoding of variable relations. | Variable graph construction plus labels for training. | Graph topology and neighborhoods provide partial explanations. |
| Recurrent GCN for sensor FDI&A | Recurrent graph convolutions for detection, isolation, and accommodation of sensor faults. | Sensing streams aligned with twin or network topology. | Structure- and residual-based analyses are feasible. |
| Data-driven learning | Bayesian filtering on residuals plus open-set separation of known vs. unknown faults. | Residual and operating-condition trajectories. | Moderate; statistical summaries of residuals are inspectable. |
| LLM-assisted O&M/cloud incident RCA | LLMs read incidents, logs, and reports to attribute root causes in natural language. | Unstructured incident text plus O&M corpora. | Strong natural-language rationales; weaker guarantees on structured causal consistency. |
| LLM-based industrial fault diagnosis from sensor narratives | Prompting or light fine-tuning so LLMs consume textualized sensor context. | Domain corpora and careful prompt design. | Natural-language explanations; limited built-in graph-level structure. |
| Fine-tuning for LLM domain adaptation | Update model parameters to fit target-domain language and tasks. | High-cost curated annotations and compute for adaptation. | Moderate; explanations depend on prompting and post hoc tools unless constrained. |
| Retrieval-Augmented Generation (RAG) | Retrieve external documents or passages at inference to ground generations without full retraining. | Quality and coverage of the external knowledge base dominate performance. | Retrieved citations are traceable; flat retrieval still limits multi-hop relational reasoning. |
| Graph RAG | Encode external or operational knowledge as a graph, then couple with LLM reasoning or generation. | Requires building and maintaining a domain event/knowledge graph. | High: explicit graph structure plus LLM-generated rationales. |
| AIOps/microservice RCA | Joint use of logs, traces, metrics, and optionally LLMs for incident understanding and localization. | Cloud-native observability stacks. | Interpretability varies by pipeline stage and tooling; component-level clarity is uneven. |
Appendix B



Appendix C

| Fault Name | Fault Description | Example |
|---|---|---|
| Fault 1 | Redis cluster capacity not enough | “node a—[fault 1] → node b node a—[fault 1] → node c” |
| Fault 2 | Downstream service response error | “node a—[fault 2] → node d node b—[fault 2] → node e” |
| Fault 3 | Downstream service breakdown | “node f—[fault 3] → node h node g—[fault 3] → node h” |
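
The example column above can be read as labeled directed edges of the form (source node, fault label, target node). The minimal sketch below stores those edges as triples and ranks faults by how many of their nodes appear in a set of observed abnormal nodes. The overlap score and the function names are assumptions made for illustration; they are not the retrieval method described in the article.

```python
from collections import defaultdict

# Edges transcribed from the example column of the table above:
# (source node, fault label, target node).
EDGES = [
    ("node a", "fault 1", "node b"),
    ("node a", "fault 1", "node c"),
    ("node a", "fault 2", "node d"),
    ("node b", "fault 2", "node e"),
    ("node f", "fault 3", "node h"),
    ("node g", "fault 3", "node h"),
]


def group_by_fault(edges):
    """Map each fault label to the list of (source, target) edges it annotates."""
    graph = defaultdict(list)
    for source, fault, target in edges:
        graph[fault].append((source, target))
    return dict(graph)


def rank_faults(edges, abnormal_nodes):
    """Hypothetical score: fraction of a fault's nodes found among the abnormal nodes."""
    nodes_by_fault = defaultdict(set)
    for source, fault, target in edges:
        nodes_by_fault[fault].update((source, target))
    scores = {
        fault: len(nodes & set(abnormal_nodes)) / len(nodes)
        for fault, nodes in nodes_by_fault.items()
    }
    # Highest score first; ties are broken alphabetically by fault label.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    observed = {"node a", "node b", "node c"}
    for fault, score in rank_faults(EDGES, observed):
        print(f"{fault}: {score:.2f}")
```

Keeping the fault label on the edge rather than on the node lets the same node, such as node a, take part in several fault subgraphs without duplication.
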
| Slot | Definition | Example |
|---|---|---|
| [X1] | Key components of a distributed computing system | “Redis cluster CPU, service cluster CPU usage, service cluster io…” |
| [X2] | Functional architecture and operational process of a distributed computing system | “In phase 1, the main task is for the consumer to process data…” |
| [X3] | Abnormal symptoms of a distributed computing system | “Symptom 1: Service cluster CPU usage is higher than normal condition. Symptom 2…” |
| [X4] | Diagnostic reasoning chains-of-thought | “Fault 5: Downstream service response timeout. downstream service process latency is high—[fault 5] → downstream service response latency is normal…” |
| [X5] | LLMs diagnosed fault name | “Fault 1: Redis cluster capacity not enough” |
| [X6] | Operation and maintenance methods for the current fault in distributed computing systems | “1. Log in to the service console and identify the keys that trigger errors during Redis interactions…” |
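
The slots above suggest a template-filling step before each LLM call. The sketch below shows one way such slots could be substituted into a prompt. The template wording, the split into a diagnosis prompt ([X1] to [X4]) and an O&M prompt ([X5] and [X6]), and the function name `fill_slots` are all assumptions for illustration; the article's actual prompts are not reproduced in this excerpt.

```python
import re

# Hypothetical templates; only the slot names [X1]..[X6] come from the table above.
DIAGNOSIS_TEMPLATE = (
    "Key components: [X1]\n"
    "Architecture and operational process: [X2]\n"
    "Observed abnormal symptoms: [X3]\n"
    "Candidate diagnostic reasoning chains: [X4]\n"
    "Identify the most probable fault and explain the reasoning."
)
OM_TEMPLATE = (
    "Diagnosed fault: [X5]\n"
    "Reference operation and maintenance methods: [X6]\n"
    "Suggest concrete operation and maintenance steps."
)

SLOT_PATTERN = re.compile(r"\[X\d+\]")


def fill_slots(template: str, values: dict) -> str:
    """Replace every [Xn] slot in the template; fail loudly if a slot has no value."""

    def substitute(match):
        slot = match.group(0)
        if slot not in values:
            raise KeyError(f"no value supplied for slot {slot}")
        return values[slot]

    return SLOT_PATTERN.sub(substitute, template)


if __name__ == "__main__":
    prompt = fill_slots(
        OM_TEMPLATE,
        {
            "[X5]": "Fault 1: Redis cluster capacity not enough",
            "[X6]": "placeholder O&M notes",  # illustrative value only
        },
    )
    print(prompt)
```

Raising an error on a missing slot, rather than leaving the placeholder in place, keeps an incomplete prompt from reaching the model unnoticed.
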
| Fault Name | Fault Type | Description |
|---|---|---|
| Fault 1 | Redis error | Redis cluster capacity not enough |
| Fault 2 | Downstream error | Downstream service response error |
| Fault 3 | Downstream error | Downstream service breakdown |
| Fault 4 | Downstream error | Downstream service capacity not enough |
| Fault 5 | Downstream error | Downstream service response timeout |
| Fault 6 | Queue error | Service cluster IO threads not enough |
| Fault 7 | Service error | Service cluster instances not enough |
| Fault 8 | Service error | Consumer polling message threads not enough |

| Metric Name | Unit | Description |
|---|---|---|
| CPU_RE | Percentage | Redis engine CPU usage |
| CPU | Percentage | CPU usage |
| IO | Percentage | Network usage |
| DISK | Percentage | Disk usage |
| MEM | Percentage | Memory usage |
| INS | Item | Instance count |
| RPS | Item/Second | Requests per second |
| OH | Second | Internal message queue processing latency |
| PD | Second | External message queue processing latency |
| ERR_COUNT | Item | Downstream error count |
| UP_LT | Second | Downstream processing latency |
| DS_LT | Second | Downstream response latency |
| DS_CB | Item | Downstream circuit break |
| OH_NUM | Item | Internal message number |
| PD_NUM | Item | External message number |

| Fault Name | Hit Rate (%) |
|---|---|
| Fault 1 | 100.00 |
| Fault 2 | 100.00 |
| Fault 3 | 95.00 |
| Fault 4 | 100.00 |
| Fault 5 | 94.29 |
| Fault 6 | 100.00 |
| Fault 7 | 93.33 |
| Fault 8 | 100.00 |

| Different Group | QWEN3-8B | GPT-4o | GPT-5.4 |
|---|---|---|---|
| With GNN | 24.9% | 46.5% | 55.4% |
| With faults | 31.9% | 70.7% | 72.6% |
| With symptoms | 38.8% | 74.9% | 78.3% |
| Complete system | 44.6% | 100.0% | 100.0% |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Gu, Y.; Zhang, J.; Du, Y. Causal Graph-Enhanced Large Language Models for Automated Fault Diagnosis and Intelligent Operation and Maintenance in Distributed Computing Systems. Electronics 2026, 15, 2359. https://doi.org/10.3390/electronics15112359
