Analysis of Faults in Software Systems Using Tsallis Distribution: A Unified Approach
Abstract
1. Introduction
1.1. Motivation
1.2. Contributions
- A generalized mathematical model, the Tsallis distribution, is derived using the maximum-entropy principle applied to Tsallis entropy.
- The Tsallis distribution is fitted to fault datasets from enterprise and open-source software and is found to be a generic model for both kinds of systems.
- Applications of the Tsallis distribution to software fault prediction and software reliability modeling are also outlined.
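The maximum-entropy step named in the first contribution can be sketched with the standard derivation below. It assumes the usual Tsallis entropy over a discrete fault count n, maximized subject to normalization and an arithmetic-mean constraint A, with Lagrange multipliers α and β. The paper's own equations, including (14) and (18) used in Algorithm 1, may use a different parametrization or constraint form.

```latex
S_q = \frac{1 - \sum_{n} p_n^{\,q}}{q - 1}, \qquad
\sum_{n} p_n = 1, \qquad \sum_{n} n\, p_n = A

\mathcal{L} = S_q - \alpha\Big(\sum_{n} p_n - 1\Big) - \beta\Big(\sum_{n} n\, p_n - A\Big)

\frac{\partial \mathcal{L}}{\partial p_n} = -\frac{q}{q-1}\, p_n^{\,q-1} - \alpha - \beta n = 0
\;\Longrightarrow\;
p_n = \left[\frac{(1-q)(\alpha + \beta n)}{q}\right]^{\frac{1}{q-1}}
```

Here α is fixed by normalization and β by the mean A. For 0 < q < 1, the range of all fitted q values reported in the tables, the exponent 1/(q − 1) is negative, so p_n decays as a power of (α + βn) with exponent 1/(1 − q); as q → 1, S_q reduces to the Shannon entropy.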
2. Related Work
3. Methodology
3.1. Data Collection
3.2. Generalized Pareto Distribution
3.3. Weibull Distribution
3.4. Maximum Entropy Tsallis Distribution
Algorithm 1. Fitting the Tsallis Distribution to an Empirical Dataset of Software Faults
Require: empirical fault data
Ensure: estimated values of q and β
Compute the arithmetic mean A from the data
Compute the empirical cumulative distribution of faults
Initialize the Tsallis entropy parameter q
Give an initial value to the parameter β
while q lies within the chosen search range do
compute β using (18)
repeat the β update until β converges
compute the cumulative distribution of faults using (14)
compute the KS statistic between the empirical and fitted cumulative distributions
increment q
end while
Choose the minimum KS value and the corresponding q and β
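Algorithm 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the paper's implementation. Equations (14) and (18) are not reproduced in this section, so the code assumes the pmf p_n ∝ (1 + b·n)^(1/(q−1)) on n = 0, 1, ..., n_max. This is the form obtained by maximizing Tsallis entropy under normalization and arithmetic-mean constraints, written with b = β/α. It also assumes the data are per-module fault counts, with n_max the largest observed count. In place of the fixed-point update (18), b is found by bisection so that the model mean equals A. The helper names `tsallis_pmf`, `solve_b` and `fit_tsallis`, the q grid, and the support are hypothetical choices.

```python
import numpy as np

# Hypothetical helpers; the pmf form below is an assumption, not the paper's equation (14).


def tsallis_pmf(q, b, n_max):
    """Assumed pmf p_n proportional to (1 + b*n)**(1/(q-1)) on n = 0..n_max (0 < q < 1)."""
    n = np.arange(n_max + 1)
    w = (1.0 + b * n) ** (1.0 / (q - 1.0))
    return w / w.sum()


def solve_b(q, mean, n_max, iters=200):
    """Find b so that the model mean equals the sample mean A.

    Bisection on log(b) stands in for the paper's fixed-point update (18).
    The model mean decreases as b grows, so a root is bracketed whenever one exists.
    """
    n = np.arange(n_max + 1)

    def model_mean(log_b):
        return float(tsallis_pmf(q, np.exp(log_b), n_max) @ n)

    lo, hi = -20.0, 20.0  # search b in [e^-20, e^20]
    if not (model_mean(hi) <= mean <= model_mean(lo)):
        return None  # the mean constraint cannot be met on this support
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if model_mean(mid) > mean:
            lo = mid  # model mean too large: increase b
        else:
            hi = mid
    return float(np.exp(0.5 * (lo + hi)))


def fit_tsallis(data, q_grid=np.arange(0.51, 0.99, 0.01)):
    """Grid search over q; returns (KS, q, b) with the smallest KS, or None."""
    data = np.asarray(data, dtype=int)  # assumed: non-negative fault counts per module
    n_max = int(data.max())
    A = data.mean()  # arithmetic-mean constraint
    emp_cdf = np.cumsum(np.bincount(data, minlength=n_max + 1)) / data.size
    best = None
    for q in q_grid:  # outer loop of Algorithm 1: step through q
        b = solve_b(q, A, n_max)
        if b is None:
            continue
        model_cdf = np.cumsum(tsallis_pmf(q, b, n_max))
        ks = float(np.max(np.abs(emp_cdf - model_cdf)))  # KS distance on the support
        if best is None or ks < best[0]:
            best = (ks, float(q), b)
    return best
```

For example, `fit_tsallis([0, 0, 0, 1, 1, 2, 3, 5, 8, 20])` returns a tuple (KS, q, b) for that toy data. Because b absorbs the ratio β/α, its value is not directly comparable with the β column in the tables.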
4. Results and Discussion
5. Threats to Validity
6. Conclusions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Software | Number of Modules | Number of Pre-Release Faults | Number of Post-Release Faults |
---|---|---|---|
Eclipse 2.0 | 376 | 4152 | 2049 |
Eclipse 2.1 | 433 | 2007 | 1394 |
Eclipse 3.0 | 431 | 3312 | 2151 |
Software | Type | Number of Modules | Number of Faults |
---|---|---|---|
Equinox | enterprise | 313 | 3120 |
KAA | enterprise | 30 | 711 |
gcc version 10 | open source | 23 | 290 |
samba version 3.0 | open source | 35 | 2519 |
samba version 4.0 | open source | 19 | 2523 |
samba version 4.1 | open source | 133 | 2398 |
Python version 3.9 | open source | 74 | 841 |
Firefox version 2.0 | open source | 46 | 10,000 |
Firefox for Android | open source | 29 | 10,000 |
Dataset | Generalized Pareto KS | Generalized Pareto h Value | Generalized Pareto p Value | Weibull KS | Weibull h Value | Weibull p Value |
---|---|---|---|---|---|---|
Eclipse 2.0 (pre-release faults) | 0.1944 | 0 | 0.4603 | 0.3889 | 1 | 0.0059 |
Eclipse 2.1 (pre-release faults) | 0.1667 | 0 | 0.8608 | 0.3750 | 0 | 0.0506 |
Eclipse 3.0 (pre-release faults) | 0.1250 | 0 | 0.9868 | 0.2500 | 0 | 0.3873 |
Eclipse 2.0 (post-release faults) | 0.2353 | 0 | 0.6725 | 0.8824 | 0 | 0.2083 |
Eclipse 2.1 (post-release faults) | 0.9091 | 1 | 8.1868 | 0.7083 | 1 | 4.0102 |
Eclipse 3.0 (post-release faults) | 0.9412 | 1 | 1.0822 | 0.5833 | 1 | 2.7336 |
Equinox | 1.0000 | 1 | 1.3029 | 1.0000 | 1 | 1.3029 |
KAA | 0.0741 | 0 | 1.0000 | 0.0741 | 0 | 1.0000 |
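A small sketch can illustrate how the KS, h and p columns relate. In every row whose p value lies in [0, 1], h = 1 exactly when p < 0.05, so h appears to flag rejection of the fitted model at the 5% level. The code below is an assumption, not the paper's stated procedure. It computes an asymptotic two-sample Kolmogorov-Smirnov p value from the Kolmogorov survival function, using the effective sample size n1·n2/(n1 + n2) and the small-sample correction (√ne + 0.12 + 0.11/√ne) popularized by Numerical Recipes. The function names and sample sizes are hypothetical.

```python
import math


def kolmogorov_q(lam, terms=100):
    """Kolmogorov survival function Q(lam) = 2 * sum_{j>=1} (-1)**(j-1) * exp(-2 j^2 lam^2)."""
    if lam < 0.1:
        return 1.0  # Q equals 1 to double precision here, and the series converges slowly
    total = 0.0
    for j in range(1, terms + 1):
        total += (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
    return min(1.0, max(0.0, 2.0 * total))


def ks_decision(d, n1, n2, alpha=0.05):
    """Asymptotic p value for a two-sample KS statistic d, and h = 1 if p < alpha."""
    ne = n1 * n2 / (n1 + n2)  # effective sample size
    root = math.sqrt(ne)
    p = kolmogorov_q((root + 0.12 + 0.11 / root) * d)
    return p, int(p < alpha)
```

With the small sample sizes behind these tables, an asymptotic approximation of this kind need not reproduce the reported p values exactly.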
Dataset | Tsallis KS | Tsallis h Value | Tsallis p Value | q | β |
---|---|---|---|---|---|
Eclipse 2.0 (pre-release faults) | 0.0811 | 0 | 0.9995 | 0.71 | 1.2978 |
Eclipse 2.1 (pre-release faults) | 0.1600 | 0 | 0.9896 | 0.75 | 1.6322 |
Eclipse 3.0 (pre-release faults) | 0.1111 | 0 | 0.9713 | 0.71 | 1.7671 |
Eclipse 2.0 (post-release faults) | 0.0556 | 0 | 1.0000 | 0.72 | 3.1030 |
Eclipse 2.1 (post-release faults) | 0.0909 | 0 | 1.0000 | 0.82 | 2.9025 |
Eclipse 3.0 (post-release faults) | 0.1176 | 0 | 0.9994 | 0.76 | 2.6499 |
Equinox | 0.0435 | 0 | 1.0000 | 0.66 | 0.2850 |
KAA | 0.0741 | 0 | 1.0000 | 0.51 | 0.1250 |
Dataset | Generalized Pareto KS | Generalized Pareto h Value | Generalized Pareto p Value | Weibull KS | Weibull h Value | Weibull p Value | Tsallis KS | Tsallis h Value | Tsallis p Value | q | β |
---|---|---|---|---|---|---|---|---|---|---|---|
gcc version 10 | 0.1429 | 0 | 0.9971 | 0.2857 | 0 | 0.5407 | 0.1429 | 0 | 0.9971 | 0.70 | 0.1857 |
samba version 3.0 | 0.1111 | 0 | 0.9936 | 0.1111 | 0 | 0.9936 | 0.1111 | 0 | 0.9936 | 0.71 | 0.0327 |
samba version 4.0 | 0.1500 | 0 | 0.9655 | 0.1500 | 0 | 0.9655 | 0.1000 | 0 | 0.9999 | 0.71 | 0.0178 |
samba version 4.1 | 0.9474 | 1 | 1.3431 | 0.1053 | 0 | 0.9998 | 0.1053 | 0 | 0.9998 | 0.83 | 0.0158 |
Python version 3.9 | 1.0000 | 1 | 1.5659 | 1.0000 | 1 | 1.5659 | 0.1579 | 0 | 0.9563 | 0.56 | 0.6151 |
Firefox version 2.0 | 1.0000 | 1 | 1.3029 | 1.0000 | 1 | 1.3029 | 0.0652 | 0 | 0.9999 | 0.66 | 0.0143 |
Firefox for Android | 1.0000 | 1 | 5.0391 | 1.0000 | 1 | 5.0391 | 0.1034 | 0 | 0.9961 | 0.57 | 0.0206 |
Software Type | Pareto and Its Variants | Weibull | Tsallis |
---|---|---|---|
Enterprise | × | √ | √ |
Open source | √ | × | √ |
© 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sharma, S. Analysis of Faults in Software Systems Using Tsallis Distribution: A Unified Approach. Software 2022, 1, 473-484. https://doi.org/10.3390/software1040020