# Analysis of Faults in Software Systems Using Tsallis Distribution: A Unified Approach

## Abstract


## 1. Introduction

#### 1.1. Motivation

#### 1.2. Contributions

- A generalized mathematical model, the Tsallis distribution, is derived using the maximum-entropy principle.
- The Tsallis distribution is fitted to fault data sets of enterprise and open-source software and is found to be a generic model.
- Applications of the Tsallis distribution to software fault prediction and software-reliability modeling are also outlined.

## 2. Related Work

## 3. Methodology

#### 3.1. Data Collection

#### 3.2. Generalized Pareto Distribution

#### 3.3. Weibull Distribution

#### 3.4. Maximum Entropy Tsallis Distribution

**Cumulative distribution of faults:**

**Estimation of parameters:**

**Algorithm 1.** Algorithm for fitting the Tsallis distribution to an empirical dataset of software faults.

**Require:** Empirical data

**Ensure:** Estimated values of $q$ and $\beta$

1. Compute the arithmetic mean $A$ from the data.
2. Compute the empirical cumulative distribution of faults.
3. Initialize the Tsallis entropy parameter $q$.
4. Give an initial value to the parameter $\beta$.
5. **while** $q<1$ **do**
   1. Compute $\Delta\beta$ using (18).
   2. $\beta \leftarrow \beta + \Delta\beta$
   3. Repeat the above two steps until $\beta$ converges.
   4. Compute the cumulative distribution of faults using (14).
   5. Compute the KS statistic.
   6. Increment $q$.
6. **end while**
7. Choose the minimum KS value and the corresponding $q$ and $\beta$.
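The fitting loop of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. Equations (14) and (18) are not reproduced in this section, so the sketch assumes a discrete q-exponential form $p_i \propto [1+\beta(1-q)\,i]^{-1/(1-q)}$ for the fault distribution and replaces the $\Delta\beta$ update of (18) with a bisection on the arithmetic-mean constraint. The function names (`tsallis_pmf`, `fit_beta`, `fit_tsallis`), the grid of $q$ values, and the input format (one non-negative integer fault count per module) are hypothetical.

```python
import numpy as np


def tsallis_pmf(support, q, beta):
    """Assumed model: p_i proportional to [1 + beta*(1-q)*i]^(-1/(1-q)), 0 < q < 1."""
    weights = (1.0 + beta * (1.0 - q) * support) ** (-1.0 / (1.0 - q))
    return weights / weights.sum()


def fit_beta(support, q, mean, lo=1e-8, hi=1e3, tol=1e-12):
    """Find beta whose model mean equals the arithmetic mean A.

    Stand-in for the iterative update of Eq. (18): under the assumed model the
    mean decreases as beta grows, so a bisection on a log scale is sufficient.
    """
    for _ in range(200):
        mid = np.sqrt(lo * hi)  # geometric midpoint of the bracket
        model_mean = float(np.sum(support * tsallis_pmf(support, q, mid)))
        if model_mean > mean:
            lo = mid  # beta too small: model tail is too heavy
        else:
            hi = mid
        if hi / lo - 1.0 < tol:
            break
    return float(np.sqrt(lo * hi))


def fit_tsallis(data, q_grid=None):
    """Scan q < 1, fit beta for each q, and keep the pair with the smallest KS statistic."""
    data = np.asarray(data, dtype=int)
    support = np.arange(data.max() + 1)
    mean = data.mean()  # arithmetic mean A
    # Empirical cumulative distribution of faults on the integer support
    emp_cdf = np.cumsum(np.bincount(data, minlength=support.size)) / data.size
    if q_grid is None:
        q_grid = np.linspace(0.50, 0.99, 50)  # hypothetical grid; step 0.01
    best = None
    for q in q_grid:
        beta = fit_beta(support, q, mean)
        model_cdf = np.cumsum(tsallis_pmf(support, q, beta))
        ks = float(np.max(np.abs(emp_cdf - model_cdf)))  # KS statistic
        if best is None or ks < best[0]:
            best = (ks, float(q), beta)
    return best  # (minimum KS, corresponding q, corresponding beta)
```

A call such as `ks, q, beta = fit_tsallis(fault_counts)` returns the smallest KS statistic together with the corresponding parameters, mirroring the last step of Algorithm 1. The bisection was chosen only because it needs no derivative of the constraint; any root finder on the mean constraint would serve the same purpose in this sketch.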

## 4. Results and Discussion

## 5. Threats to Validity

## 6. Conclusions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest


Software | Number of Modules | Number of Pre-Release Faults | Number of Post-Release Faults |
---|---|---|---|
Eclipse 2.0 | 376 | 4152 | 2049 |
Eclipse 2.1 | 433 | 2007 | 1394 |
Eclipse 3.0 | 431 | 3312 | 2151 |

Software | Type | Number of Modules | Number of Faults |
---|---|---|---|
Equinox | enterprise | 313 | 3120 |
KAA | enterprise | 30 | 711 |
gcc version 10 | open source | 23 | 290 |
samba version 3.0 | open source | 35 | 2519 |
samba version 4.0 | open source | 19 | 2523 |
samba version 4.1 | open source | 133 | 2398 |
Python version 3.9 | open source | 74 | 841 |
Firefox version 2.0 | open source | 46 | 10,000 |
Firefox for Android | open source | 29 | 10,000 |

Dataset | Generalized Pareto KS | Generalized Pareto h Value | Generalized Pareto p Value | Weibull KS | Weibull h Value | Weibull p Value |
---|---|---|---|---|---|---|
Eclipse 2.0 (pre-release faults) | 0.1944 | 0 | 0.4603 | 0.3889 | 1 | 0.0059 |
Eclipse 2.1 (pre-release faults) | 0.1667 | 0 | 0.8608 | 0.3750 | 0 | 0.0506 |
Eclipse 3.0 (pre-release faults) | 0.1250 | 0 | 0.9868 | 0.2500 | 0 | 0.3873 |
Eclipse 2.0 (post-release faults) | 0.2353 | 0 | 0.6725 | 0.8824 | 0 | 0.2083 |
Eclipse 2.1 (post-release faults) | 0.9091 | 1 | $8.1868\times 10^{-7}$ | 0.7083 | 1 | $4.0102\times 10^{-6}$ |
Eclipse 3.0 (post-release faults) | 0.9412 | 1 | $1.0822\times 10^{-7}$ | 0.5833 | 1 | $2.7336\times 10^{-4}$ |
Equinox | 1.0000 | 1 | $1.3029\times 10^{-21}$ | 1.0000 | 1 | $1.3029\times 10^{-21}$ |
KAA | 0.0741 | 0 | 1.0000 | 0.0741 | 0 | 1.0000 |

Dataset | Tsallis KS | Tsallis h Value | Tsallis p Value | Tsallis $q$ | Tsallis $\beta$ |
---|---|---|---|---|---|
Eclipse 2.0 (pre-release faults) | 0.0811 | 0 | 0.9995 | 0.71 | 1.2978 |
Eclipse 2.1 (pre-release faults) | 0.1600 | 0 | 0.9896 | 0.75 | 1.6322 |
Eclipse 3.0 (pre-release faults) | 0.1111 | 0 | 0.9713 | 0.71 | 1.7671 |
Eclipse 2.0 (post-release faults) | 0.0556 | 0 | 1.0000 | 0.72 | 3.1030 |
Eclipse 2.1 (post-release faults) | 0.0909 | 0 | 1.0000 | 0.82 | 2.9025 |
Eclipse 3.0 (post-release faults) | 0.1176 | 0 | 0.9994 | 0.76 | 2.6499 |
Equinox | 0.0435 | 0 | 1.0000 | 0.66 | 0.2850 |
KAA | 0.0741 | 0 | 1.0000 | 0.51 | 0.1250 |

Dataset | Generalized Pareto KS | Generalized Pareto h Value | Generalized Pareto p Value | Weibull KS | Weibull h Value | Weibull p Value | Tsallis KS | Tsallis h Value | Tsallis p Value | Tsallis $q$ | Tsallis $\beta$ |
---|---|---|---|---|---|---|---|---|---|---|---|
gcc version 10 | 0.1429 | 0 | 0.9971 | 0.2857 | 0 | 0.5407 | 0.1429 | 0 | 0.9971 | 0.70 | 0.1857 |
samba version 3.0 | 0.1111 | 0 | 0.9936 | 0.1111 | 0 | 0.9936 | 0.1111 | 0 | 0.9936 | 0.71 | 0.0327 |
samba version 4.0 | 0.1500 | 0 | 0.9655 | 0.1500 | 0 | 0.9655 | 0.1000 | 0 | 0.9999 | 0.71 | 0.0178 |
samba version 4.1 | 0.9474 | 1 | $1.3431\times 10^{-8}$ | 0.1053 | 0 | 0.9998 | 0.1053 | 0 | 0.9998 | 0.83 | 0.0158 |
Python version 3.9 | 1.0000 | 1 | $1.5659\times 10^{-9}$ | 1.0000 | 1 | $1.5659\times 10^{-9}$ | 0.1579 | 0 | 0.9563 | 0.56 | 0.6151 |
Firefox version 2.0 | 1.0000 | 1 | $1.3029\times 10^{-21}$ | 1.0000 | 1 | $1.3029\times 10^{-21}$ | 0.0652 | 0 | 0.9999 | 0.66 | 0.0143 |
Firefox for Android | 1.0000 | 1 | $5.0391\times 10^{-14}$ | 1.0000 | 1 | $5.0391\times 10^{-14}$ | 0.1034 | 0 | 0.9961 | 0.57 | 0.0206 |

Software Type | Pareto and Its Variants | Weibull | Tsallis |
---|---|---|---|
Enterprise | × | √ | √ |
Open source | √ | × | √ |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Sharma, S.
Analysis of Faults in Software Systems Using Tsallis Distribution: A Unified Approach. *Software* **2022**, *1*, 473-484.
https://doi.org/10.3390/software1040020
