Evaluating the Predictive Power of Software Metrics for Fault Localization
Abstract
1. Introduction
2. Materials and Methods
2.1. Dataset
2.2. Prediction Abstraction Level
2.3. Class Labels
- Strongly Faulty: The actual (ground-truth) faulty class.
- Faulty: Classes ranked between 1 and 5.
- Fairly Faulty: Classes ranked between 6 and 10.
- Weakly Faulty: Classes ranked between 11 and 15.
- Not Faulty: Classes ranked beyond position 15.
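The rank-to-label mapping above can be sketched as a small helper; the function name and the `is_real_fault` flag are illustrative, not taken from the paper:

```python
def label_from_rank(rank, is_real_fault=False):
    """Map a class's suspiciousness rank to one of the five labels.

    `rank` is the 1-based position of the class in the ranked list;
    `is_real_fault` marks the class that actually contains the fault.
    """
    if is_real_fault:
        return "Strongly Faulty"
    if 1 <= rank <= 5:
        return "Faulty"
    if 6 <= rank <= 10:
        return "Fairly Faulty"
    if 11 <= rank <= 15:
        return "Weakly Faulty"
    return "Not Faulty"
```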
2.4. Feature Generation
2.4.1. Static Metrics
2.4.2. Dynamic Metrics
2.4.3. Test Suite Characteristics
2.5. Data Preparation
2.6. Random Baseline Estimation
2.7. Machine Learning Models
2.8. Evaluation Metrics
- Accuracy (ACC): Measures the proportion of correctly predicted instances over the total number of instances. It provides a general overview of the model’s performance but can be misleading on imbalanced datasets.
- Weighted Precision (WP): Quantifies the proportion of correctly identified positive predictions out of all predictions for a given class. The weighted variant incorporates the support of each class, so that the precision score reflects the class distribution.
- Weighted Recall (WR): Measures the proportion of actual positive instances correctly identified by the model for a given class. As with weighted precision, weighted recall accounts for class support, so that it reflects the overall dataset distribution.
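The standard definitions behind these scores can be written as follows, with \(N\) the total number of instances, \(n_c\) the support of class \(c\), and \(TP_c\), \(FP_c\), \(FN_c\) the per-class true-positive, false-positive, and false-negative counts (these are the usual textbook formulas, not transcribed from the paper):

```latex
\mathrm{ACC} = \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}\,[\hat{y}_i = y_i],
\qquad
\mathrm{WP} = \sum_{c} \frac{n_c}{N}\cdot\frac{TP_c}{TP_c + FP_c},
\qquad
\mathrm{WR} = \sum_{c} \frac{n_c}{N}\cdot\frac{TP_c}{TP_c + FN_c}.
```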
2.9. Code Availability
3. Results and Discussion
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Metric | Short Description |
---|---|
CBO | Counts the number of dependencies a class has |
NOF | Counts the number of fields in a class, no matter their modifiers |
NOPF | Counts only the public fields |
NOSF | Counts only the static fields |
NOM | Counts the number of methods, no matter their modifiers |
NOPM | Counts only the public methods |
NOSM | Counts only the static methods |
WMC | Counts the number of branch instructions in a class |
LOC | Counts the lines of code, ignoring empty lines |
LCOM | Calculates the Lack of Cohesion of Methods |
Tot2Op | Counts the total number of operators |
TotMaxOp | Counts the total of the max operators |
Max2Op | Counts the max operators |
MaxTotOp | Counts the max total number of operators based on method results |
Tot2Lev | Counts the total number of levels in the whole class code based on method results |
TotMaxLev | Counts the sum of the maximum level in each method |
MaxTotLev | Counts the max of the total number of levels in each method |
Max2Lev | Counts the max level in the whole class, i.e., the deepest branch |
Tot2DU | Counts the total amount of data usage in the class |
TotMaxDU | Counts the total amount of max data usage in the class |
Max2DU | Counts the max amount of data usage in the class |
MaxTotDU | Counts the max amount of total data usage in the class |
PubMembers | Counts the number of public members (fields or methods) |
Metric | Short Description |
---|---|
DIT | Counts the number of ancestors (parent classes up the inheritance tree) a class has |
NOSI | Counts the number of invocations to static methods |
NOC | Counts the number of children a class has |
RFC | Counts the number of unique method invocations in a class |
Tot2DF | Counts the total number of data flows in a class |
TotMaxDF | Counts the total max data flow in each method of the class |
Max2DF | Counts the max data flow in each method of the class |
MaxTotDF | Counts the max of total data flows in each method of the class |
TotInMetCall | Counts the total number of within-class method calls |
MaxInMetCall | Counts the max number of within-class method calls |
InOutDeg | Counts the number of calls from within the class to external methods (similar to the out-degree of a dynamic call graph) |
Metric | Short Description |
---|---|
Run Time | The run time in seconds that it took Gzoltar to run all the tests and generate the matrix |
Ncf | The number of failed test cases that cover the class |
Nuf | The number of failed test cases that do not cover the class |
Ncs | The number of successful test cases that cover the class |
Ns | The number of successful tests |
Nf | The number of failed tests |
Ntsc | The total number of statements in the class covered by the test suite |
Ndsc | The distinct number of statements covered by the test suite in a class |
Nntc | The total number of test cases |
PassTestRatio | The ratio of passed test cases in a class vs. the total number of tests that cover the class |
FailTestRatio | The ratio of failed test cases in a class vs. the total number of tests that cover the class |
TotPassTestRatio | The ratio of passed test cases in a class vs. the total number of tests in the test suite |
TotFailTestRatio | The ratio of failed test cases in a class vs. the total number of tests in the test suite |
NTestRunPerRT | The number of tests run on a class vs. the total run time |
Um | Uniqueness metric |
Md | Matrix density |
Nmd | Normalized matrix density |
Gs | Gini-Simpson diversity index |
DDU | Density-Diversity-Uniqueness |
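Several of the ratio features in the table derive directly from the basic coverage counts. A minimal sketch, assuming plausible denominators (the function and its exact formulas are one reading of the table, not the paper's code):

```python
def suite_features(ncf, nuf, ncs, ns, nf):
    """Derive the ratio features from per-class coverage counts.

    ncf / nuf: failed tests that do / do not cover the class;
    ncs: passing tests that cover the class;
    ns / nf: total passing / failing tests in the suite.
    """
    covering = ncs + ncf  # all tests that touch the class
    return {
        "PassTestRatio": ncs / covering if covering else 0.0,
        "FailTestRatio": ncf / covering if covering else 0.0,
        "TotPassTestRatio": ncs / ns if ns else 0.0,
        "TotFailTestRatio": ncf / nf if nf else 0.0,
    }
```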
Model | WP | WR | ACC |
---|---|---|---|
RF | 73.8% ± 2.1% | 73.5% ± 2.1% | 73.5% ± 2.1% |
XGBoost | 78.5% ± 2.0% | 78.1% ± 2.0% | 78.1% ± 2.0% |
LightGBM | 79.0% ± 1.9% | 78.6% ± 2.0% | 78.6% ± 2.0% |
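For reference, WP, WR, and ACC can be reproduced from a prediction vector with a short stdlib-only sketch, equivalent to scikit-learn's `accuracy_score` and `precision_score`/`recall_score` with `average="weighted"`:

```python
from collections import Counter

def weighted_scores(y_true, y_pred):
    """Compute accuracy, weighted precision, and weighted recall by hand,
    with each per-class score weighted by that class's support."""
    n = len(y_true)
    support = Counter(y_true)  # instances per true class
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / n
    wp = wr = 0.0
    for c, n_c in support.items():
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        pred_c = sum(p == c for p in y_pred)  # predictions for class c
        prec = tp / pred_c if pred_c else 0.0
        wp += (n_c / n) * prec       # precision weighted by support
        wr += (n_c / n) * (tp / n_c)  # recall weighted by support
    return acc, wp, wr
```

Note that support-weighted recall mathematically reduces to accuracy in single-label multiclass classification, which is why the WR and ACC columns coincide.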
Arab, I.; Magel, K.; Akour, M. Evaluating the Predictive Power of Software Metrics for Fault Localization. Computers 2025, 14, 222. https://doi.org/10.3390/computers14060222