RCEGen: A Generative Approach for Automated Root Cause Analysis Using Large Language Models (LLMs)
Abstract
1. Introduction
- RCEGen: A novel LLM-based framework for generating evidence-grounded root cause explanations (RCEs) from bug reports.
- Comparative evaluation: A systematic comparison of multiple state-of-the-art LLMs using unified zero-shot prompting.
- LLM-as-Judge evaluation framework: A ranking-based scoring system that assesses RCE quality using Correctness, Clarity, and Depth of Reasoning metrics validated through inter-rater agreement analysis.
2. Related Work
2.1. Automated RCA as a Classification Problem with Supervised and Deep Learning Approaches
2.2. Automated RCA Through Large Language Models (LLMs)
3. Methodology
3.1. RCEGen Framework Overview
3.2. Pre-Processing Phase
3.3. RCE Generation
3.3.1. Qwen 2.5 Coder 32B Instruct
3.3.2. DeepSeek Coder 33B Instruct
3.3.3. Codestral 22B
3.3.4. CodeLlama 34B Instruct
3.3.5. OpenCoder 8B Instruct
3.4. Prompt Template Design
3.5. RCE Evaluation
3.5.1. LLM Judge Model Selection
3.5.2. RCE Selection
4. Experiment
4.1. Dataset
4.2. Experimental Setup
4.3. LLM Judges
- Correctness: The explanation should clearly and precisely identify the actual root cause.
- Clarity: It needs to be easy for developers to understand and fit well within the context of the bug report.
- Depth of Reasoning: The explanation should connect the observed symptoms to the root cause logically, providing solid evidence to support the diagnosis.
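
As an illustration only, the three criteria above could be turned into a judge rubric along the following lines. This is a minimal sketch: the prompt wording, the 0 to 1 score scale, the JSON reply format, and the names `build_judge_prompt` and `parse_scores` are assumptions made for the example and are not the template used in this study.

```python
import json

# Criterion descriptions paraphrase the three metrics listed above.
CRITERIA = {
    "correctness": "identifies the actual root cause clearly and precisely",
    "clarity": "is easy for developers to understand and fits the bug report context",
    "depth_of_reasoning": "logically connects observed symptoms to the root cause with evidence",
}


def build_judge_prompt(bug_report: str, explanation: str) -> str:
    """Assemble a rubric prompt (hypothetical wording, not the paper's template)."""
    rubric = "\n".join(
        f"- {name}: the explanation {text}" for name, text in CRITERIA.items()
    )
    return (
        "Rate the root cause explanation for the bug report below.\n"
        f"Criteria (score each from 0 to 1):\n{rubric}\n\n"
        f"Bug report:\n{bug_report}\n\n"
        f"Explanation:\n{explanation}\n\n"
        "Reply with a JSON object whose keys are the criterion names."
    )


def parse_scores(raw_reply: str) -> dict[str, float]:
    """Parse the judge's JSON reply and reject missing or out-of-range scores."""
    data = json.loads(raw_reply)
    scores = {}
    for name in CRITERIA:
        value = float(data[name])  # KeyError if the judge omitted a criterion
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score {value} is outside [0, 1]")
        scores[name] = value
    return scores
```

Validating the reply before aggregation keeps a malformed judge response from silently distorting the per-model averages.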
5. Results
1. Under-specification: Short titles/descriptions lacking both context and reproduction cues yielded shallow, speculative explanations. The canonical case is shown in Figure 4. With only the two-word title “RRateLimiter delete” and a minimal code snippet, CodeLlama-34B-Instruct conjectured that “the client attempts to delete a rate-limiter,” whereas the true defect was that delete() fails to remove all Redis keys allocated by the rate-limiter. Without signals about the observable evidence, the LLM defaulted to a generic interpretation.
2. Missing cross-checks: Even when a stack trace was provided, the absence of “expected vs. actual” text led models into symptom chasing. They recited the top frame of the trace as the “cause” instead of reasoning about why the exception was thrown.
3. Ambiguous scope: Reports describing multi-component workflows (for example, a front end and a backend API) but omitting module boundaries caused hallucinated fixes that touched the wrong subsystem. Titles naming only a high-level feature, such as “Search page fails,” gave no indication of which layer was responsible.
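
The three failure modes above suggest a simple pre-check that flags which signals a report lacks before it is sent to a model. The sketch below is a hypothetical heuristic, not part of RCEGen: the function name `missing_signals`, the word-count thresholds, and the keyword list are all assumptions chosen for illustration.

```python
import re

# Keywords that hint at a stated module boundary (illustrative, not exhaustive).
SCOPE_PATTERN = re.compile(r"\b(module|component|class|package|endpoint)\b")


def missing_signals(title: str, body: str) -> list[str]:
    """Return the signals a bug report appears to lack (heuristic sketch)."""
    text = f"{title}\n{body}".lower()
    missing = []
    # Under-specification: very short title and very short description.
    if len(title.split()) < 4 and len(body.split()) < 30:
        missing.append("context")
    # Missing cross-checks: no "expected vs. actual" statement.
    if not ("expected" in text and "actual" in text):
        missing.append("expected_vs_actual")
    # Ambiguous scope: no mention of a module, component, or similar boundary.
    if not SCOPE_PATTERN.search(text):
        missing.append("scope")
    return missing


# The two-word report discussed above trips all three checks.
print(missing_signals("RRateLimiter delete", "limiter.delete();"))
```

A check of this kind could feed a clarification step, where the reporter is asked for the missing details before an explanation is generated.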
6. Discussion
Study Takeaways
- Generative RCA surpasses classification-based approaches. Previous studies largely conceptualized automated RCA as a classification task, predicting coarse-grained categories of root causes [9,10,11]. While useful for bug triaging, these models often produced ambiguous results that still required manual inspection [17]. Our findings confirm concerns raised by Catolino et al. [28] that “not all bugs are the same,” and that fixed categories cannot capture the diversity of root causes. By reframing RCA as a generative task, RCEGen enables LLMs to produce detailed, context-specific explanations, thereby addressing the lack of fine-grained outputs that prior classification-based methods struggled to deliver.
- Qwen2.5-Coder-Instruct and Codestral-22B deliver the strongest performance. Consistent with earlier findings that larger, instruction-tuned models outperform smaller models in software engineering tasks [32,33], we observed that Qwen2.5-Coder-Instruct and Codestral-22B generated the most accurate and useful root cause explanations. These results extend the work of Plein et al. [31], who found that LLMs like ChatGPT could generate useful test cases from bug reports, but only when sufficient capacity and training breadth were present. In contrast, lightweight models such as OpenCoder-8B underperformed, reinforcing earlier observations by Hirsch et al. [10] that smaller models fail to capture the complexity of real-world bug reports.
- Correctness is strong, but reasoning depth remains limited. While the best models in our study achieved high correctness (≈0.89), their reasoning depth remained moderate (≈0.65). This mirrors findings by Du et al. [34] that pre-trained models achieve reasonable accuracy in bug prediction but fail to capture deeper semantic chains. Similarly, Jin et al. [30] showed in program repair tasks that LLMs often propose correct patches but struggle to justify their correctness with deeper reasoning. Our results extend these insights by showing that in RCA, correctness without depth produces explanations that identify a fault but fail to connect it to observable symptoms, which limits their diagnostic value.
- Evaluator LLMs provide stable, complementary judgments. Our use of GPT-4o and DeepSeek-V3 as independent evaluators aligns with recent work demonstrating that LLMs can serve as reliable judges in software engineering tasks [43,44]. We found substantial agreement on correctness (≈0.70), similar to the observation of Wang et al. [43] that LLM judges align closely with human evaluators on factual dimensions. However, low agreement on clarity echoes concerns raised by Kumar et al. [35], who found that readability and stylistic attributes are more subjective and less consistently judged by LLMs. This suggests that hybrid pipelines, which combine LLM and human judgments, may remain necessary.
- Semantic alignment with developer analyses is high. We observed strong semantic similarity (CodeBERT ≈ 0.98) between LLM-generated and developer-written explanations, even though lexical overlap (ROUGE-L < 0.27) was low. This finding complements prior work on bug summarization [32], which showed that LLMs can paraphrase bug descriptions faithfully without exact word matches. Similarly, Plein et al. [31] demonstrated that LLMs generate semantically meaningful but lexically different reproducing test cases. Our results confirm that semantic alignment, rather than surface similarity, is a more faithful metric for evaluating RCA outputs.
- Bug report quality strongly influences model effectiveness. Our findings reinforce Bettenburg et al.’s [6] seminal study showing that bug report quality strongly affects triaging and resolution outcomes. In our evaluation, vague titles or missing expected/actual behavior consistently led to speculative and shallow explanations, a pattern consistent with Tan et al. [14], who showed that defect characteristics influence developers’ ability to localize faults. By contrast, reports with error messages and reproduction steps enabled LLMs to produce accurate and actionable root cause explanations. This highlights the continued importance of structured bug reporting templates, echoing Baysal et al. [8] and Hirsch et al. [10], who argued that report quality is a central determinant of debugging efficiency.
- Future improvements require richer inputs and adaptive prompting. Earlier studies have emphasized the value of augmenting RCA with richer contextual signals, such as execution logs or code diffs [15]. Jin et al. [30] similarly showed that external signals improve program repair quality, while Du et al. [34] found that bug prediction benefits from contextual metadata. Our results suggest that adaptive prompting, where LLMs request missing details from reporters, could complement these approaches. This aligns with Zhang et al. [33], who proposed interactive, in-context learning for cloud incident RCA. Together, these studies suggest hybrid frameworks where generative RCA is enhanced by additional context and adaptive clarification mechanisms.
7. Threats to Validity
7.1. Internal Validity
7.2. External Validity
8. Limitations
9. Future Work
10. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Rooney, J.J.; Heuvel, L.N.V. Root cause analysis for beginners. Qual. Prog. 2004, 37, 45–56.
- Van Moll, J.; Jacobs, J.; Freimut, B.; Trienekens, J. The importance of life cycle modeling to defect detection and prevention. In Proceedings of the 10th International Workshop on Software Technology and Engineering Practice, Montreal, QC, Canada, 6–8 October 2002; IEEE: Piscataway, NJ, USA, 2002; pp. 144–155.
- Adeel, K.; Ahmad, S.; Akhtar, S. Defect prevention techniques and its usage in requirements gathering-industry practices. In Proceedings of the 2005 Student Conference on Engineering Sciences and Technology, Karachi, Pakistan, 27 August 2005; IEEE: Piscataway, NJ, USA, 2005; pp. 1–5.
- Davies, S.; Roper, M. What’s in a bug report? In Proceedings of the 8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, Torino, Italy, 18–19 September 2014; pp. 1–10.
- Xia, X.; Lo, D.; Wang, X.; Zhou, B. Accurate developer recommendation for bug resolution. In Proceedings of the 2013 20th Working Conference on Reverse Engineering (WCRE), Koblenz, Germany, 14–17 October 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 72–81.
- Bettenburg, N.; Just, S.; Schröter, A.; Weiss, C.; Premraj, R.; Zimmermann, T. What makes a good bug report? In Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering, Atlanta, GA, USA, 9–14 November 2008; pp. 308–318.
- Dalal, S.; Chhillar, R.S. Empirical study of root cause analysis of software failure. ACM SIGSOFT Softw. Eng. Notes 2013, 38, 1–7.
- Baysal, O.; Holmes, R.; Godfrey, M.W. Revisiting bug triage and resolution practices. In Proceedings of the 2012 First International Workshop on User Evaluation for Software Engineering Researchers (USER), Zurich, Switzerland, 5 June 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 29–30.
- Lal, H.; Pahwa, G. Root cause analysis of software bugs using machine learning techniques. In Proceedings of the 2017 7th International Conference on Cloud Computing, Data Science & Engineering-Confluence, Noida, India, 12–13 January 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 105–111.
- Hirsch, T.; Hofer, B. Root cause prediction based on bug reports. In Proceedings of the 2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), Coimbra, Portugal, 12–15 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 171–176.
- Hirsch, T.; Hofer, B. Using textual bug reports to predict the fault category of software bugs. Array 2022, 15, 100189.
- Alsaedi, S.A.; Noaman, A.Y.; Gad-Elrab, A.A.; Eassa, F.E. Nature-based prediction model of bug reports based on Ensemble Machine Learning Model. IEEE Access 2023, 11, 63916–63931.
- Du, X.; Liu, Z.; Li, C.; Ma, X.; Li, Y.; Wang, X. LLM-BRC: A large language model-based bug report classification framework. Softw. Qual. J. 2024, 32, 985–1005.
- Tan, L.; Liu, C.; Li, Z.; Wang, X.; Zhou, Y.; Zhai, C. Bug characteristics in open source software. Empir. Softw. Eng. 2014, 19, 1665–1705.
- Thung, F.; Lo, D.; Jiang, L. Automatic recovery of root causes from bug-fixing changes. In Proceedings of the 2013 20th Working Conference on Reverse Engineering (WCRE), Koblenz, Germany, 14–17 October 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 92–101.
- Kawrykow, D.; Robillard, M.P. Non-essential changes in version histories. In Proceedings of the 33rd International Conference on Software Engineering, Honolulu, HI, USA, 21–28 May 2011; pp. 351–360.
- Ni, Z.; Li, B.; Sun, X.; Chen, T.; Tang, B.; Shi, X. Analyzing bug fix for automatic bug cause classification. J. Syst. Softw. 2020, 163, 110538.
- Chillarege, R.; Bhandari, I.S.; Chaar, J.K.; Halliday, M.J.; Moebus, D.S.; Ray, B.K.; Wong, M.Y. Orthogonal defect classification-a concept for in-process measurements. IEEE Trans. Softw. Eng. 1992, 18, 943–956.
- Fluri, B.; Würsch, M.; Pinzger, M.; Gall, H. Change distilling: Tree differencing for fine-grained source code change extraction. IEEE Trans. Softw. Eng. 2007, 33, 725–743.
- Falleri, J.R.; Morandat, F.; Blanc, X.; Martinez, M.; Monperrus, M. Fine-grained and accurate source code differencing. In Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering, Västerås, Sweden, 15–19 September 2014; pp. 313–324.
- Zhou, B.; Neamtiu, I.; Gupta, R. Predicting concurrency bugs: How many, what kind and where are they? In Proceedings of the 19th International Conference on Evaluation and Assessment in Software Engineering, Nanjing, China, 27–29 April 2015; pp. 1–10.
- Ahmed, H.A.; Bawany, N.Z.; Shamsi, J.A. Capbug-a framework for automatic bug categorization and prioritization using nlp and machine learning algorithms. IEEE Access 2021, 9, 50496–50512.
- Tabassum, N.; Namoun, A.; Alyas, T.; Tufail, A.; Taqi, M.; Kim, K.H. Classification of bugs in cloud computing applications using machine learning techniques. Appl. Sci. 2023, 13, 2880.
- Limsettho, N.; Hata, H.; Monden, A.; Matsumoto, K. Automatic unsupervised bug report categorization. In Proceedings of the 2014 6th International Workshop on Empirical Software Engineering in Practice, Osaka, Japan, 12–13 November 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 7–12.
- Limsettho, N.; Hata, H.; Monden, A.; Matsumoto, K. Unsupervised bug report categorization using clustering and labeling algorithm. Int. J. Softw. Eng. Knowl. Eng. 2016, 26, 1027–1053.
- Liu, X.; Xu, Z.; Yang, D.; Yan, M.; Zhang, W.; Zhao, H.; Xue, L.; Fan, M. An unsupervised cross project model for crashing fault residence identification. IET Softw. 2022, 16, 630–646.
- IEEE Std 1044-2009; IEEE Standard Classification for Software Anomalies. IEEE: Piscataway, NJ, USA, 2010; pp. 1–23.
- Catolino, G.; Palomba, F.; Zaidman, A.; Ferrucci, F. Not all bugs are the same: Understanding, characterizing, and classifying the root cause of bugs. arXiv 2019, arXiv:1907.11031.
- Ahmed, T.; Pai, K.S.; Devanbu, P.; Barr, E. Automatic semantic augmentation of language model prompts (for code summarization). In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, Lisbon, Portugal, 14–20 April 2024; pp. 1–13.
- Jin, M.; Shahriar, S.; Tufano, M.; Shi, X.; Lu, S.; Sundaresan, N.; Svyatkovskiy, A. Inferfix: End-to-end program repair with llms. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, San Francisco, CA, USA, 3–9 December 2023; pp. 1646–1656.
- Plein, L.; Bissyandé, T.F. Can llms demystify bug reports? arXiv 2023, arXiv:2310.06310.
- Xiang, B.; Shao, Y. SUMLLAMA: Efficient Contrastive Representations and Fine-Tuned Adapters for Bug Report Summarization. IEEE Access 2024, 12, 78562–78571.
- Zhang, X.; Ghosh, S.; Bansal, C.; Wang, R.; Ma, M.; Kang, Y.; Rajmohan, S. Automated root causing of cloud incidents using in-context learning with GPT-4. In Proceedings of the Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, Porto de Galinhas, Brazil, 15–19 July 2024; pp. 266–277.
- Du, X.; Li, C.; Ma, X.; Zheng, Z. How Does Pre-trained Language Model Perform on Deep Learning Framework Bug Prediction? In Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings, Lisbon, Portugal, 14–20 April 2024; pp. 346–347.
- Kumar, A.; Haiduc, S.; Das, P.P.; Chakrabarti, P.P. LLMs as Evaluators: A Novel Approach to Evaluate Bug Report Summarization. arXiv 2024, arXiv:2409.00630.
- Liu, J.; Xia, C.S.; Wang, Y.; Zhang, L. Is your code generated by chatgpt really correct? rigorous evaluation of large language models for code generation. Adv. Neural Inf. Process. Syst. 2023, 36, 21558–21572.
- Hui, B.; Yang, J.; Cui, Z.; Yang, J.; Liu, D.; Zhang, L.; Liu, T.; Zhang, J.; Yu, B.; Lu, K.; et al. Qwen2.5-Coder Technical Report. arXiv 2024, arXiv:2409.12186.
- Guo, D.; Zhu, Q.; Yang, D.; Xie, Z.; Dong, K.; Zhang, W.; Chen, G.; Bi, X.; Wu, Y.; Li, Y.K.; et al. DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence. arXiv 2024, arXiv:2401.14196.
- Mistral AI. Codestral: A State-of-the-Art Code Language Model. 2024. Available online: https://mistral.ai/news/codestral/ (accessed on 3 July 2025).
- Rozière, B.; Gehring, J.; Gloeckle, F.; Sootla, S.; Gat, I.; Tan, X.E.; Adi, Y.; Liu, J.; Sauvestre, R.; Remez, T.; et al. Code Llama: Open Foundation Models for Code. arXiv 2024, arXiv:2308.12950.
- Huang, S.; Cheng, T.; Liu, J.K.; Hao, J.; Song, L.; Xu, Y.; Yang, J.; Liu, J.; Zhang, C.; Chai, L.; et al. OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models. arXiv 2025, arXiv:2411.04905.
- Hurst, A.; Lerer, A.; Goucher, A.P.; Perelman, A.; Ramesh, A.; Clark, A.; Ostrow, A.; Welihinda, A.; Hayes, A.; Radford, A. GPT-4o System Card. arXiv 2024, arXiv:2410.21276.
- Wang, R.; Guo, J.; Gao, C.; Fan, G.; Chong, C.Y.; Xia, X. Can llms replace human evaluators? an empirical study of llm-as-a-judge in software engineering. Proc. ACM Softw. Eng. 2025, 2, 1955–1977.
- Tan, S.; Zhuang, S.; Montgomery, K.; Tang, W.Y.; Cuadron, A.; Wang, C.; Popa, R.A.; Stoica, I. Judgebench: A benchmark for evaluating llm-based judges. arXiv 2024, arXiv:2410.12784.
- Liu, A.; Feng, B.; Xue, B.; Wang, B.; Wu, B.; Lu, C.; Zhao, C.; Deng, C.; Zhang, C.; Ruan, C. DeepSeek-V3 Technical Report. arXiv 2025, arXiv:2412.19437.
- Yamane, T. Statistics: An Introductory Analysis; Harper & Row: New York, NY, USA; Evanston: London, UK; John Weatherhill, Inc.: Tokyo, Japan, 1973.




| Judge | Model | Correctness | Clarity | Depth | Overall |
|---|---|---|---|---|---|
| DeepSeek-V3 | CodeLlama 34B Instruct | 0.757 (0.217) | 0.813 (0.156) | 0.493 (0.187) | 0.688 (0.158) |
| DeepSeek-V3 | Codestral 22B | 0.836 (0.176) | 0.883 (0.132) | 0.616 (0.159) | 0.778 (0.120) |
| DeepSeek-V3 | DeepSeek Coder 33B Instruct | 0.743 (0.219) | 0.837 (0.145) | 0.537 (0.190) | 0.706 (0.156) |
| DeepSeek-V3 | OpenCoder 8B Instruct | 0.756 (0.227) | 0.677 (0.251) | 0.519 (0.208) | 0.651 (0.181) |
| DeepSeek-V3 | Qwen 2.5 Coder 32B Instruct | 0.879 (0.155) | 0.879 (0.133) | 0.636 (0.151) | 0.798 (0.112) |
| GPT-4o | CodeLlama 34B Instruct | 0.810 (0.186) | 0.769 (0.122) | 0.550 (0.165) | 0.710 (0.137) |
| GPT-4o | Codestral 22B | 0.873 (0.157) | 0.824 (0.121) | 0.656 (0.149) | 0.784 (0.121) |
| GPT-4o | DeepSeek Coder 33B Instruct | 0.781 (0.199) | 0.778 (0.132) | 0.562 (0.179) | 0.707 (0.150) |
| GPT-4o | OpenCoder 8B Instruct | 0.731 (0.220) | 0.575 (0.254) | 0.506 (0.222) | 0.604 (0.199) |
| GPT-4o | Qwen 2.5 Coder 32B Instruct | 0.890 (0.133) | 0.821 (0.115) | 0.648 (0.138) | 0.786 (0.105) |

| Model | Correctness | Clarity | Depth of Reasoning |
|---|---|---|---|
| CodeLlama 34B Instruct | 0.670 | 0.236 | 0.608 |
| Codestral 22B | 0.704 | 0.146 | 0.546 |
| DeepSeek Coder 33B Instruct | 0.709 | 0.255 | 0.650 |
| OpenCoder 8B Instruct | 0.639 | 0.631 | 0.555 |
| Qwen 2.5 Coder 32B Instruct | 0.662 | 0.216 | 0.532 |

| Judge | Model | Overall (Original) | Overall (Without Clarity) |
|---|---|---|---|
| DeepSeek-V3 | CodeLlama 34B Instruct | 0.688 (0.158) | 0.625 (0.194) |
| DeepSeek-V3 | Codestral 22B | 0.778 (0.120) | 0.726 (0.158) |
| DeepSeek-V3 | DeepSeek Coder 33B Instruct | 0.706 (0.156) | 0.640 (0.196) |
| DeepSeek-V3 | OpenCoder 8B Instruct | 0.651 (0.181) | 0.638 (0.204) |
| DeepSeek-V3 | Qwen 2.5 Coder 32B Instruct | 0.798 (0.112) | 0.758 (0.143) |
| GPT-4o | CodeLlama 34B Instruct | 0.710 (0.137) | 0.710 (0.137) |
| GPT-4o | Codestral 22B | 0.784 (0.121) | 0.784 (0.121) |
| GPT-4o | DeepSeek Coder 33B Instruct | 0.707 (0.150) | 0.707 (0.150) |
| GPT-4o | OpenCoder 8B Instruct | 0.604 (0.199) | 0.604 (0.199) |
| GPT-4o | Qwen 2.5 Coder 32B Instruct | 0.786 (0.105) | 0.786 (0.105) |
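
For the DeepSeek-V3 judge, the reported Overall values appear consistent with an unweighted mean of the three criteria, and the Without Clarity values with the mean of Correctness and Depth. The sketch below checks this arithmetic against the reported means. The aggregation rule is inferred from the tabulated numbers rather than stated in the text, so it should be read as an assumption; the helper names are illustrative.

```python
# Mean scores from the DeepSeek-V3 judge, copied from the results tables:
# (correctness, clarity, depth, reported overall, reported overall without clarity)
DEEPSEEK_V3 = {
    "CodeLlama 34B Instruct": (0.757, 0.813, 0.493, 0.688, 0.625),
    "Codestral 22B": (0.836, 0.883, 0.616, 0.778, 0.726),
    "DeepSeek Coder 33B Instruct": (0.743, 0.837, 0.537, 0.706, 0.640),
    "OpenCoder 8B Instruct": (0.756, 0.677, 0.519, 0.651, 0.638),
    "Qwen 2.5 Coder 32B Instruct": (0.879, 0.879, 0.636, 0.798, 0.758),
}


def overall(correctness: float, clarity: float, depth: float) -> float:
    """Unweighted mean of the three criteria (assumed aggregation rule)."""
    return (correctness + clarity + depth) / 3


def overall_without_clarity(correctness: float, depth: float) -> float:
    """Unweighted mean of correctness and depth only (assumed ablation rule)."""
    return (correctness + depth) / 2


for model, (c, cl, d, reported, reported_ablated) in DEEPSEEK_V3.items():
    full = overall(c, cl, d)
    ablated = overall_without_clarity(c, d)
    # Reported values are rounded to three decimals, so allow a small tolerance.
    assert abs(full - reported) < 1e-3, model
    assert abs(ablated - reported_ablated) < 1e-3, model
    print(f"{model}: overall={full:.3f}, without clarity={ablated:.3f}")
```

Because both aggregates are linear in the per-report scores, the mean of per-report overall scores equals the aggregate of the per-criterion means, which is why the check can be run on the tabulated means alone.
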

| LLM | CodeBERT Score | ROUGE-L Score |
|---|---|---|
| Qwen 2.5 Coder 32B Instruct | 0.989 ± 0.005 | 0.134 ± 0.057 |
| DeepSeek Coder 33B Instruct | 0.978 ± 0.009 | 0.153 ± 0.063 |
| Codestral 22B | 0.987 ± 0.006 | 0.165 ± 0.079 |
| OpenCoder 8B Instruct | 0.947 ± 0.013 | 0.017 ± 0.027 |
| CodeLlama 34B Instruct | 0.987 ± 0.006 | 0.183 ± 0.086 |

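
To make the low ROUGE-L values easier to interpret, the sketch below computes a ROUGE-L F-measure from the longest common subsequence (LCS) of two token sequences. It is a minimal illustration that assumes lowercase whitespace tokenization and the balanced F1 form; the exact ROUGE-L variant and tokenizer used in this study are not specified here, and the example sentences are invented.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 over lowercase whitespace tokens (assumed tokenization)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


reference = "delete fails to remove all redis keys"
paraphrase = "the limiter leaves stale entries behind after removal"
print(rouge_l_f1(reference, reference))   # identical text gives the maximum score
print(rouge_l_f1(paraphrase, reference))  # a paraphrase with no shared tokens scores zero
```

A paraphrase can describe the same defect while sharing few or no tokens with the developer's wording, which is the situation in which an embedding-based score stays high and ROUGE-L stays low.
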
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Mollik, R.H.; Datta, A.; Mollah, A.H.; Aljedaani, W. RCEGen: A Generative Approach for Automated Root Cause Analysis Using Large Language Models (LLMs). Software 2025, 4, 29. https://doi.org/10.3390/software4040029

