1. Introduction
The Burr III distribution, renowned for its adaptability, has been widely applied across various domains, including reliability engineering and survival analysis [1,2,3,4]. It was first introduced by Burr in 1942 [5] and has since undergone substantial enhancements to augment its modeling capabilities. Such advancements have expanded the distribution’s utility, allowing it to encompass a broader spectrum of data configurations and demonstrating its significant practical value [6,7,8]. Recent progress in the field has spurred the development of innovative regression models and analytical tools predicated on the structural nuances of the Burr III distribution. By tuning parameters and modifying functional structures, these tools can be used to fit models and make predictions for various types of data. They have found widespread application in data analysis, underpinning informed decision-making across a variety of scientific disciplines. In contrast to the Weibull and gamma distributions, the Burr Types III and XII display a broader spectrum of skewness and kurtosis [2], with Burr III being particularly versatile. In this paper, we therefore focus on the extension of the Burr III distribution.
In reliability modeling, the odds ratio serves as a pivotal statistical metric. It provides profound insights into the relationships between exposures and outcomes, proving indispensable in epidemiology and public health. The odds ratio has been effectively employed in assessing medical interventions and identifying behavioral risk factors, thereby shaping healthcare policies and preventive measures [9,10,11].
In recent years, numerous methodologies for constructing generalized continuous probability distributions have been proposed [12,13,14,15,16]. Contributions to this burgeoning field include the modified slash distributions [17], the modified-X family of distributions [18], the new arcsine-generator distribution [19], an enhanced version of the generalized Weibull distribution [20], the McDonald Generalized Power Weibull distributions [21], the exponentiated X-Lindley distribution [22], the Pareto–Poisson distribution [23], the Kumaraswamy Generalized Inverse Lomax distribution [24], and others that have significantly advanced the statistical modeling landscape.
In Chen et al. [25], we explored the exponentiated odds ratio generator to find the general form of the distribution in terms of the odds ratio. The mathematical construct is defined as follows:
where $T\left(s\right)$ represents the cumulative distribution function (cdf) of the transformer; $D(x,\mathsf{\Phi})$ and $\overline{D}(x,\mathsf{\Phi})$ denote the cdf and survival function, respectively, of any baseline distribution associated with a random variable x; and $\mathsf{\Phi}$ refers to the baseline distribution’s vector of parameters.
The development of statistical distributions for bathtub-shaped datasets is crucial for accurately modeling various real-life phenomena where the hazard rate initially decreases, then stabilizes, and finally increases over time. This characteristic shape is observed in numerous applications, particularly in reliability engineering and biomedical fields. For example, the mortality rate of humans typically follows a bathtub curve: high during infancy, low during most of adulthood, and high again in old age. This pattern is also prevalent in the failure rates of mechanical and electronic components, where early failures (infant mortality), a period of reliable operation (useful life), and wear-out failures toward the end of the life cycle are common. Drugs exhibit similar patterns, where efficacy might vary significantly across age groups, with higher failure rates in children and the elderly compared to middle-aged individuals. By adding additional parameters to existing distributions, statisticians can improve the precision of reliability assessments and risk evaluations, which are critical for industries like insurance, engineering, and healthcare.
Due to the inherent complexity of bathtub-shaped data, developing new distributions using traditional methods, such as those used for the uniform, exponential, gamma, and Weibull distributions, is not feasible. However, integrating established general structures with simpler distributions can yield new families of distributions capable of modeling bathtub shapes. These new distributions also perform well with other dataset shapes, enhancing their versatility and applicability across various fields. This approach allows for the creation of models that capture the unique characteristics of bathtub-shaped hazard functions while maintaining the simplicity and robustness of traditional distributions.
This paper introduces a revised Burr III distribution that integrates the odds ratio generator, aimed at improving the modeling of real-world data. The motivation for developing this new family of distributions is to enhance the flexibility of the basic function $D(x,\mathsf{\Phi})$, especially for single-parameter baseline distributions. We aim to show that, by adding additional parameters, simple distributions with limited variability can be transformed to display a wide range of shapes and skewness, as explored in Section 3. Moreover, the flexibility and usefulness of this new family of distributions are demonstrated through four real-life examples. Notably, as illustrated in the first two examples of Section 6, even with the simplest parent distribution, the uniform distribution, the BSIOR-Uniform provides a robust bathtub shape for modeling real-life datasets. As shown in Figures 11 and 13, well-known distributions such as the gamma, Weibull, and generalized exponential distributions all fail to model bathtub-shaped datasets. Complex models, like the Burr III [5] and those related to the Weibull and exponential distributions, fail to model the bathtub-shaped dataset as well. These models include the Weibull generalized exponential distribution [26], the Type-2 Gumbel [27], the Lomax Gumbel Type-2 [28], and the Exponentiated Generalized Gumbel Type-2 distributions [29]. These observations further support the novelty and significance of the proposed model. Recent literature has introduced several univariate distributions, such as those in [30,31,32], which often possess complex structures that can complicate their practical use. In contrast, the BSIORG model, described in Equation (2), maintains a relatively simple structure. This simplicity aids in the ease of computing its properties and performing parameter inference, offering advantages over many more generalized distributions.
The structure of the paper is as follows: Section 2 delineates the new distribution family and examines its key submodel, the BSIORG family of distributions. In Section 3, we highlight several special cases with illustrative examples of probability density functions, hazard rate functions, and plots of skewness and kurtosis. Section 4 is dedicated to exploring the statistical properties of the BSIORG family of distributions, covering aspects such as hazard rate functions, quantile functions, moments, and more. Section 5 presents various estimation methods along with simulation results. Finally, Section 6 demonstrates the proposed model’s flexibility through real-life applications.
4. Statistical Properties of the Burr III Scaled Inverse Odds Ratio–G Distribution
Given the extensive formulation of the probability density function in Equation (3), computing the statistical properties of this new family of distributions could involve intricate processes, and deriving closed forms for complex parent distributions may be challenging. To expedite these calculations, we will first expand the pdf of the BSIORG into the well-known exponentiated-G distribution, for which the statistical properties have been established both theoretically and numerically. This expansion simplifies the computation of all related statistical measures, including moments, the moment-generating function, incomplete and conditional moments, moments of residual life and reversed residual life, the Rényi entropy, order statistics, stochastic ordering, and probability-weighted moments. Due to the length of this paper, comprehensive proofs and detailed derivations of these properties are provided in the Supplementary Information, accessible at https://github.com/shusenpu/BSIORG/blob/main/Supplementary_Info.pdf (accessed on 2 May 2024).
4.1. Basic Properties of the BSIORG Distribution
A series expansion of the probability density function simplifies calculations when deriving properties, allows the function to be approximated during simulation, and facilitates analytical manipulation of the pdf.
Theorem 1. The pdf of the BSIORG distribution can be expressed as a linear combination of the exponentiated generalized distribution as follows:
where the coefficients ${c}_{i,j}$ are given by
and the term ${r}_{ibjb1}(x,\mathsf{\Phi})$ is defined as
representing the exponentiated generalized distribution’s pdf given parameter ${W}^{\ast}=ibjb$.
Proof. We consider the general form of binomial series expansion:
Thus, the pdf of the BSIORG distribution can be expanded as
We note that, using the definition of odds ratio, the function can be generalized as
Therefore, we have the pdf as
where
and
which is the pdf of the exponentiated generalized distribution with parameter
${b}^{\ast}=ibjb$. □
The pdf is represented as an infinite series, which implies that the function can be approximated by summing sufficiently many terms of the series; the quality of the approximation typically improves as more terms are included. The coefficients ${c}_{i,j}$ modify the contribution of each term in the series and can be interpreted as weights that adjust the influence of each term based on the distribution. It can be easily demonstrated that the series converges because it forms a geometric sequence with a common ratio $0<D(x,\mathsf{\Phi})<1$. The expansion provides an intuitive and easier way to calculate the statistical properties of the distribution, whereas it is relatively difficult to perform derivations such as integration on the original pdf with odds ratios.
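The convergence behavior of such a truncated expansion can be checked numerically. The sketch below is a generic illustration only (it uses the elementary binomial series $(1+z)^{-k}=\sum_i \binom{-k}{i}z^i$, not the paper’s actual coefficients ${c}_{i,j}$), and confirms that the truncation error shrinks as more terms are included, mirroring the geometric-ratio argument above.

```python
# Truncated binomial series for (1 + z)^(-k), |z| < 1.
# Illustrative only: the BSIORG expansion applies the same idea to its pdf.

def binomial_series(z, k, n_terms):
    """Sum the first n_terms of (1 + z)^(-k) = sum_i C(-k, i) z^i."""
    total, term = 0.0, 1.0  # term_0 = C(-k, 0) * z^0 = 1
    for i in range(n_terms):
        total += term
        # Recurrence: term_{i+1} = term_i * (-(k + i) / (i + 1)) * z
        term *= -(k + i) / (i + 1) * z
    return total

z, k = 0.3, 2.5
exact = (1.0 + z) ** (-k)
approx_5 = binomial_series(z, k, 5)
approx_50 = binomial_series(z, k, 50)
# Error decreases as terms are added (common ratio ~ |z| < 1)
assert abs(approx_50 - exact) < abs(approx_5 - exact)
assert abs(approx_50 - exact) < 1e-10
```

The same truncation logic applies when the expanded BSIORG pdf is evaluated in practice: one sums terms until the remaining geometric tail is negligible.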
The hazard rate function (hrf) defines the instantaneous rate at which events occur, given that no prior event has happened. A higher hazard rate at a certain time indicates a greater risk of the event happening. The hazard rate does not represent the probability of an event occurring at a specific time but rather the rate or intensity of occurrence in an infinitesimal interval around that time. In their seminal work, ref. [33] demonstrated the equivalent behavior of the hrf, the reverse hazard function, and the mean residual life function.
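As a concrete numerical illustration of the relation $h(x)=f(x)/(1-F(x))=-\frac{d}{dx}\ln S(x)$, the sketch below uses the baseline Burr III cdf $F(x)=(1+x^{-c})^{-k}$ as a stand-in (not the full BSIORG form, whose hrf is given in Remark 1 below):

```python
import math

def burr3_cdf(x, c, k):
    """Burr III cdf: F(x) = (1 + x^(-c))^(-k), x > 0."""
    return (1.0 + x ** (-c)) ** (-k)

def burr3_pdf(x, c, k):
    """Burr III pdf: f(x) = c k x^(-c-1) (1 + x^(-c))^(-k-1)."""
    return c * k * x ** (-c - 1) * (1.0 + x ** (-c)) ** (-k - 1)

def hazard(x, c, k):
    """hrf h(x) = f(x) / (1 - F(x))."""
    return burr3_pdf(x, c, k) / (1.0 - burr3_cdf(x, c, k))

# Sanity check: h(x) must equal -d/dx log S(x) (central finite difference)
c, k, x, eps = 2.0, 3.0, 1.5, 1e-6
num = -(math.log(1 - burr3_cdf(x + eps, c, k))
        - math.log(1 - burr3_cdf(x - eps, c, k))) / (2 * eps)
assert abs(hazard(x, c, k) - num) < 1e-5
```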
Remark 1. The hrf of the distribution can be derived as follows:
and the reverse hrf is
The quantile function, which can also be termed the inverse cdf, serves as a method to map probabilities back to values in the domain of a random variable. For a given probability p (where $0\le p\le 1$), the quantile function provides the value x such that the probability of the random variable being less than or equal to x is p.
Remark 2. The quantile function for the BSIORG distribution is defined as
where $0\le p\le 1$ and
Generally, the quantile function can be used to find percentiles and for inference. For example, $p=0.5$ gives the median of the distribution, the value below which $50\%$ of the data falls. The simple form of the quantile function q provides a direct way to map probabilities to values, since it can be evaluated directly by plugging in the desired probability.
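The percentile round-trip property $F(Q(p))=p$ is easy to verify numerically. The sketch below again uses the baseline Burr III distribution as a stand-in, whose closed-form quantile is $Q(p)=(p^{-1/k}-1)^{-1/c}$:

```python
def burr3_cdf(x, c, k):
    """Burr III cdf: F(x) = (1 + x^(-c))^(-k)."""
    return (1.0 + x ** (-c)) ** (-k)

def burr3_quantile(p, c, k):
    """Inverse cdf: Q(p) = (p^(-1/k) - 1)^(-1/c), 0 < p < 1."""
    return (p ** (-1.0 / k) - 1.0) ** (-1.0 / c)

c, k = 2.0, 1.5
median = burr3_quantile(0.5, c, k)
# Round trip: F(Q(p)) == p, and half the mass lies below the median
assert abs(burr3_cdf(median, c, k) - 0.5) < 1e-12
for p in (0.1, 0.25, 0.9):
    assert abs(burr3_cdf(burr3_quantile(p, c, k), c, k) - p) < 1e-12
```

The same round-trip check (with a numerical root-finder when no closed form exists) is also the standard way to generate random samples from the fitted model via inverse-transform sampling.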
4.2. Moments, Incomplete Moments, and Generating Functions
Moments serve as statistical indicators that succinctly capture the essence of probability distributions and datasets.
Remark 3. For a random variable Y∼$\mathit{B}\text{}\mathit{SIOR}\text{}\mathit{G}(x;k,a,b,\mathsf{\Phi})$, the rth moment of the BSIORG distribution is
where ${Z}_{i,j}$ follows the exponentiated generalized distribution, which is obtained by raising the cdf of a baseline distribution to a certain power, with ${c}_{i,j}$ as defined in Theorem 1. The convergence of the rth moment can be demonstrated by examining the expansion of the pdf of the BSIORG, where the pdf can be expressed as a coefficient multiplied by a certain power of the parent cdf, $D(x,\mathsf{\Phi})$. On one end, as i and j increase, the tails of the power series approach zero since $D(x,\mathsf{\Phi})<1$. On the other end, the powers of $D(x,\mathsf{\Phi})$ can be sorted while calculating the rth moment of ${Z}_{i,j}$. To establish an upper bound for $E\left({Y}^{r}\right)$, we replace all coefficients with their maximum value and rank the ${Z}_{i,j}^{r}$ according to the power of $D(x,\mathsf{\Phi})$, forming a geometric series with a common ratio of $D(x,\mathsf{\Phi})$. Therefore, we conclude that $E\left({Y}^{r}\right)$ converges. A similar approach can be applied to demonstrate that the infinite series converges in all remaining conclusions and remarks.
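In practice, each term $E\left({Z}_{i,j}^{r}\right)$ reduces to a one-dimensional integral $\int y^{r}g(y)\,dy$. A minimal numerical sketch of that computation, using the exponential(1) density (whose rth raw moment is known to be $r!$) as a stand-in baseline:

```python
import math

def raw_moment(pdf, r, upper=50.0, n=200_000):
    """E[Y^r] = integral of y^r * f(y) dy, via the trapezoidal rule."""
    h = upper / n
    total = 0.0
    for i in range(n + 1):
        y = i * h
        w = 0.5 if i in (0, n) else 1.0
        total += w * (y ** r) * pdf(y)
    return total * h

exp_pdf = lambda y: math.exp(-y)  # exponential(1) stand-in baseline

# For the exponential(1) distribution, E[Y^r] = r!
for r in (1, 2, 3):
    assert abs(raw_moment(exp_pdf, r) - math.factorial(r)) < 1e-3
```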
An incomplete moment refers to the moment of a portion of a distribution. It is defined as the expected value of a given function of a random variable over a specified range, which is in contrast to the complete moment that takes into account the entire distribution.
Remark 4. The incomplete moment for the distribution is formulated as
where ${I}_{i,j}\left(z\right)={\int}_{0}^{z}{y}^{s}{r}_{ibjb}(y,\mathsf{\Phi})\,dy$ is the incomplete moment of the exponentiated-G distribution. Here, we can examine the moment over the designated range from 0 to z, which is especially useful when dealing with truncated portions of a distribution, such as censoring in survival analysis. The moment-generating function (mgf) encapsulates all the moments of the probability distribution. By evaluating its derivatives at zero, the mgf provides a systematic way to calculate the mean, variance, skewness, and other moments of the distribution.
Remark 5. The mgf can be found using
where ${M}_{{Z}_{i,j}}\left(t\right)$ denotes the mgf of the exponentiated generalized distribution with parameter ${W}^{\ast}=ibjb$. Thus, the overall distribution is a mixture or combination of multiple distributions, each with a different parameter ${W}^{\ast}$. Adjusted by the coefficient weights ${c}_{i,j}$, each ${M}_{{Z}_{i,j}}\left(t\right)$ contributes to the overall ${M}_{Y}\left(t\right)$, which can be used for finding the nth moment of the distribution by taking the nth derivative of the resulting ${M}_{Y}\left(t\right)$ function.
4.3. Moment of Residual Life and Reversed Residual Life
In survival analysis and reliability engineering, the moment of residual life and the moment of reversed residual life are essential for analyzing the distribution of time-to-event data. The moment of residual life at time t quantifies the expected remaining lifetime given that an event has not occurred by t. Conversely, the moment of reversed residual life at time t reflects the expected past duration given that an event occurred before t.
Lemma 1. The mth moment of residual life, denoted as ${R}_{m}\left(t\right)$, is formally defined as
where X is the lifetime random variable, and m is the moment order. The moment of residual life of the distribution can be derived from Equation (10):
where ${g}_{i,j}\left(x\right)$ is the pdf of the exponentiated-G distribution with parameter $ibjb$. Following a similar calculation, we can find the mth moment of reversed residual life as follows.
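A quick numerical illustration of the definition in Lemma 1, for $m=1$: the first moment of residual life is $R_1(t)=\int_t^\infty S(x)\,dx / S(t)$. The sketch below uses the exponential survival function as a stand-in, for which memorylessness forces $R_1(t)=1/\lambda$ at every t:

```python
import math

def mean_residual_life(survival, t, upper, n=100_000):
    """R_1(t) = E[X - t | X > t] = (integral of S(x) dx from t to inf) / S(t)."""
    h = (upper - t) / n
    integral = 0.0
    for i in range(n + 1):
        x = t + i * h
        w = 0.5 if i in (0, n) else 1.0
        integral += w * survival(x)
    return integral * h / survival(t)

lam = 2.0
S = lambda x: math.exp(-lam * x)  # exponential survival function (stand-in)

# Memoryless property: the exponential's mean residual life is 1/lam at any t
for t in (0.0, 0.7, 3.0):
    assert abs(mean_residual_life(S, t, t + 60.0 / lam) - 1.0 / lam) < 1e-4
```

For a bathtub-shaped BSIORG fit, $R_1(t)$ would instead vary with t, which is precisely what makes this quantity diagnostic.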
Remark 6. The mth moment of reversed residual life, denoted as ${R}_{m}^{\prime}\left(t\right)$, can be derived as
4.4. Skewness and Kurtosis Analysis
Skewness quantifies the asymmetry of a distribution, while kurtosis measures its tail heaviness; positive skewness indicates a right-skewed distribution, and higher kurtosis indicates heavier tails and a sharper peak.
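Both coefficients are standard ratios of the raw moments $\mu_n'=E(Y^n)$. As a quick numerical check of those formulas (using the exponential(1) distribution, whose raw moments are $\mu_n'=n!$, as a stand-in):

```python
def skew_kurt(mu1, mu2, mu3, mu4):
    """Coefficients of skewness and kurtosis from raw moments mu_n' = E[Y^n]."""
    var = mu2 - mu1 ** 2
    skew = (mu3 - 3 * mu1 * mu2 + 2 * mu1 ** 3) / var ** 1.5
    kurt = (mu4 - 4 * mu1 * mu3 + 6 * mu1 ** 2 * mu2 - 3 * mu1 ** 4) / var ** 2
    return skew, kurt

# Exponential(1): mu_n' = n!, so skewness = 2 and kurtosis = 9
s, kt = skew_kurt(1.0, 2.0, 6.0, 24.0)
assert abs(s - 2.0) < 1e-12 and abs(kt - 9.0) < 1e-12
```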
Lemma 2. Given $E\left({Y}^{n}\right)={\mu}_{n}^{\prime}$, the coefficient of skewness (${\theta}_{s}$) for Y∼$\mathit{B}\text{}\mathit{SIOR}\text{}\mathit{G}(x;k,a,b,\mathsf{\Phi})$ is
and the coefficient of kurtosis (${\theta}_{k}$) is
4.5. Rényi Entropy
Rényi entropy is a generalization of the Shannon entropy and provides a measure of the diversity, uncertainty, or randomness of a probability distribution.
Theorem 2. The Rényi entropy for the BSIORG distribution is calculated as
where $0<\omega \ne 1$, indicating the diversity of values the distribution can take, with ${I}_{REG}$ being the Rényi entropy for the exponentiated generalized distribution parameterized by ${W}^{\ast}=\frac{i\omega bjb}{\omega}$.
Proof. The Rényi entropy of the BSIORG distribution is given by
where
$\omega >0$ and
$\omega \ne 1$. By applying the same expansion technique for the pdf, we obtain
Using the definition of odds ratio, we have
Therefore, the further expansion of the Rényi entropy can be generalized as
where
${I}_{REG}$ is the Rényi entropy of the exponentiated generalized distribution with parameter
${W}^{\ast}=\frac{i\omega bjb}{\omega}$. □
The Rényi entropy for the BSIORG distribution captures the diversity of the distribution’s outcomes weighted by the order $\omega $. Higher values of ${I}_{R}\left(\omega \right)$ indicate greater uncertainty or randomness within the distribution for the given order $\omega $.
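The defining integral $I_R(\omega)=\frac{1}{1-\omega}\log\int f^{\omega}(x)\,dx$ is straightforward to evaluate numerically. A minimal sketch, using the exponential(1) density as a stand-in (its Rényi entropy has the closed form $\ln(\omega)/(\omega-1)$):

```python
import math

def renyi_entropy(pdf, omega, upper=60.0, n=200_000):
    """I_R(omega) = 1/(1-omega) * log( integral of f(x)^omega dx ), omega != 1."""
    h = upper / n
    integral = 0.0
    for i in range(n + 1):
        w = 0.5 if i in (0, n) else 1.0
        integral += w * pdf(i * h) ** omega
    return math.log(integral * h) / (1.0 - omega)

exp_pdf = lambda x: math.exp(-x)  # exponential(1) stand-in baseline

# Closed form for exponential(1): I_R(omega) = ln(omega) / (omega - 1)
for omega in (0.5, 2.0, 3.0):
    assert abs(renyi_entropy(exp_pdf, omega) - math.log(omega) / (omega - 1)) < 1e-4
```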
4.6. Order Statistics and Stochastic Ordering
Theorem 3. For ${X}_{1},{X}_{2},\dots ,{X}_{N}$ as i.i.d. random variables from the BSIORG distribution, the pdf of the jth order statistic ${f}_{j:N}\left(x\right)$ is
expressing ${f}_{j:N}\left(x\right)$ as a linear combination of BSIORG with parameters $({k}^{\ast},a,b)$, where ${k}^{\ast}=(j+s)k$.
Proof. Let
${X}_{1},{X}_{2},\dots ,{X}_{N}$ be independent identically distributed random variables following the BSIORG distribution. The pdf of the jth order statistic ${f}_{j:N}\left(x\right)$ is given by
□
Theorem 4. Given ${X}_{1}$∼$\mathit{B}\text{}\mathit{SIOR}\text{}\mathit{G}(x;k,{a}_{1},b,\mathsf{\Phi})$ and ${X}_{2}$∼$\mathit{B}\text{}\mathit{SIOR}\text{}\mathit{G}(x;k,{a}_{2},b,\mathsf{\Phi})$, the likelihood ratio Λ is
indicating the relative likelihood of outcomes from two distributions based on their parameters.
Earlier order statistics (smaller j) tend to have “lighter” tails as they are biased towards smaller values, whereas later order statistics (larger j) have “heavier” tails, reflecting larger sample values. The parameter modification $(j+s)k$ implies that higher-order statistics or terms in the sum increasingly stretch or scale the distribution, accounting for the more extreme values expected in higher-order statistics.
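The behavior of order statistics described above can be illustrated by simulation. A small sketch with the uniform baseline, for which the classical result $E[U_{(j)}]=j/(N+1)$ is known:

```python
import random

def order_stat_mean(n_samples, N, j, rng):
    """Monte Carlo estimate of E[U_(j)] for the j-th of N U(0,1) order statistics."""
    total = 0.0
    for _ in range(n_samples):
        draws = sorted(rng.random() for _ in range(N))
        total += draws[j - 1]  # j-th smallest value
    return total / n_samples

rng = random.Random(42)
# Classical result: E[U_(j)] = j / (N + 1) for uniform order statistics
est = order_stat_mean(20_000, N=5, j=2, rng=rng)
assert abs(est - 2.0 / 6.0) < 0.01
```

Smaller j indeed concentrates mass on smaller values, consistent with the "lighter tail" remark above.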
4.7. Probability Weighted Moments
Probability Weighted Moments (PWMs) are moments of a probability distribution weighted by the probabilities themselves. They provide a way to summarize the shape of a probability distribution and are particularly useful for characterizing the tails of the distribution.
Lemma 4. The rth PWM for the distribution can be calculated as follows:
which simplifies to
for nonnegative integers p, q, and r.
For practical applications, often $r=0$; thus, the PWM simplifies further to
A common example used for illustration is the first PWM ($PW{M}_{1,0}$), expressed as
for $p=1$ and $q=0$, which simplifies the evaluation process for specific cases. Commonly, in practice, particularly for hydrological models where one might want to understand the mean behavior rather than higher-order moments, r is set to 0, simplifying the PWM to consider only the probability weights without raising x to any power.
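As a concrete check, a commonly used PWM convention is $\beta_r = E[X\,F(X)^r]$ (the indexing here is this common convention and may differ from the $PWM_{p,q,r}$ notation of Lemma 4). For the uniform baseline on $(0,1)$, $\beta_r=\int_0^1 x\cdot x^r\,dx = 1/(r+2)$:

```python
def pwm_uniform(r, n=100_000):
    """beta_r = E[X F(X)^r] for U(0,1): integral of x * x^r dx = 1/(r+2)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n + 1):
        x = i * h
        w = 0.5 if i in (0, n) else 1.0
        total += w * x * x ** r  # F(x) = x for the uniform baseline
    return total * h

for r in (0, 1, 2):
    assert abs(pwm_uniform(r) - 1.0 / (r + 2)) < 1e-6
```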
5. Methods of Estimation
In this section, we discuss five estimation methods crucial for discerning the parameters of the proposed BSIORG distributions. The exploration of these methods is key to achieving accurate approximations that align BSIORG models with observed data. We then apply Monte Carlo simulation to test the convergence of the estimated parameters as the sample size grows, in order to determine their reliability. The estimation methods include Maximum Likelihood Estimation (MLE), Least Squares (LS) and Weighted Least Squares (WLS) Estimation, Maximum Product Spacing Estimation (MPS), Cramér–von Mises Estimation (CVM), and Anderson–Darling Estimation (ADE).
Given an independent random sample $({X}_{1},{X}_{2},\dots ,{X}_{n})$ from the BSIORG distribution with parameter vector $\sigma ={(k,a,b,\mathsf{\Phi})}^{T}$, the likelihood function $\Delta \left(\sigma \right)$ is formulated as
MLE seeks to find the set of parameters that maximize the likelihood function, making it a powerful method for parameter estimation by leveraging the full probability model, as shown in Equation (11).
The LS technique in Equation (12) minimizes the sum of the squared differences between observed and theoretical values, offering a straightforward approach for fitting models to data by emphasizing overall error reduction. An extension of LS, given in Equation (13), WLS assigns weights to data points, prioritizing certain observations over others and thus refining the fitting process, especially when dealing with heteroscedastic data. The LS and WLS methods are expressed as [34]
and
Both methods aim to find the best parameters ($\sigma $) that align the theoretical distribution specified by the BSIORG model as closely as possible with the observed data.
The MPS is valuable for complex distributions, as highlighted by Cheng and Amin [35]. The MPS estimators are obtained by optimizing
where ${T}_{i}$ represents spacing functions, and the estimators are found by solving $\left(\frac{\partial L}{\partial k},\frac{\partial L}{\partial a},\frac{\partial L}{\partial b},\frac{\partial L}{\partial {\mathsf{\Phi}}_{s}}\right)=0$. MPS in Equation (14) focuses on maximizing the product of the spacings between ordered observations and their estimated distribution, providing an alternative to MLE that is less sensitive to outliers.
The CVM estimators are derived by minimizing the Cramér–von Mises criterion, $\mathit{CVM}(x,\sigma )$, with respect to $\sigma $, where
The Cramér–von Mises distance between continuous distribution functions is one of the distinguished measures of deviation between distributions, as shown in Equation (15).
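For a sample of size n, the Cramér–von Mises criterion is commonly computed as $W^{2}=\frac{1}{12n}+\sum_{i=1}^{n}\left(F(x_{(i)};\sigma)-\frac{2i-1}{2n}\right)^{2}$; this standard computable form is assumed to match Equation (15). A minimal sketch, with the uniform cdf standing in for the fitted BSIORG cdf:

```python
def cramer_von_mises(sample, cdf):
    """W^2 = 1/(12n) + sum_i (F(x_(i)) - (2i-1)/(2n))^2."""
    xs = sorted(sample)
    n = len(xs)
    return 1.0 / (12 * n) + sum(
        (cdf(x) - (2 * i - 1) / (2 * n)) ** 2 for i, x in enumerate(xs, start=1)
    )

uniform_cdf = lambda x: x  # F(x) = x on [0, 1], stand-in for the fitted cdf

# If the ordered sample sits exactly at the plotting positions (2i-1)/(2n),
# the criterion attains its minimum value 1/(12n)
sample = [0.125, 0.375, 0.625, 0.875]  # n = 4
assert abs(cramer_von_mises(sample, uniform_cdf) - 1.0 / 48.0) < 1e-12
```

Minimizing this quantity over $\sigma$ (e.g., with a numerical optimizer) yields the CVM estimators.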
As an advancement of the Cramér–von Mises criterion, the Anderson–Darling approach in Equation (16) places more weight on the tails of the distribution, making it particularly useful for detecting discrepancies in the distribution’s extremities and for distributions with significant tail behavior. By minimizing the formula below, we obtain the Anderson–Darling estimators:
To determine the maximum or minimum of each objective function, we differentiate it and use numerical methods such as iteratively reweighted least squares (IRLS) or Newton–Raphson to locate the extreme values. For a practical demonstration of how the numerical simulation operates, we employ the BSIORE as an illustrative example. The detailed steps of the simulation process are outlined in Algorithm 1. We conducted a comprehensive simulation study using Monte Carlo simulations to estimate the parameters of the BSIORE distribution. We picked $a=2.5$, $b=0.8$, $\lambda =1.3$, and $k=1.2$ as the initial values for each parameter; note that the model is highly nonlinear and sensitive to the initial parameter values. Then, $N=40$, 80, 160, 320, and 640 were selected as sample sizes to generate random samples, with each experiment replicated 500 times to ensure statistical reliability. The resulting data were then analyzed to compute both the bias and the mean squared error (MSE) for each setting. As illustrated in Figure 10, the MSE converges towards 0 with increasing N, affirming the stability and reliability of the estimations across all cases.
Algorithm 1 Monte Carlo Simulation for Parameter Estimation
1: Input: Randomly select a set of parameters $a,b,\lambda ,k$ as the true values
2: Initialize: Initial value of each parameter for optimization, $\mathrm{init\_cond}\leftarrow [{a}_{0},{b}_{0},{\lambda}_{0},{k}_{0}]$
3: Set: Sample sizes ${N}_{s}=\{40,80,160,320,640\}$, number of simulations $NN=500$
4: Define: BEORE function as per the model specifics
5: for each n in ${N}_{s}$ do
6:   Initialize data frame error to store parameters
7:   for $k=1$ to $NN$ do
8:     Generate uniform random numbers ${F}_{x}$ of size n
9:     Initialize vectors: $x,\mathrm{ls\_ins},\mathrm{wls\_ins},\mathrm{ins},{i}_{21},{i}_{2n1}$
10:     for each i in 1 to n do
11:       $r\leftarrow \mathrm{BEORE}({F}_{x}\left[i\right],{a}_{1},{b}_{1},{\lambda}_{1},{k}_{1})$
12:       Store r in x
13:       Calculate indices for estimation methods including MLE, LS, WLS, etc.
14:     end for
15:     Sort x
16:     ${x}_{\mathrm{rev}}\leftarrow \mathrm{sort}(x,\mathrm{decreasing}=\mathrm{TRUE})$
17:     Apply estimation methods and store results
18:     Store errors in the error data frame
19:   end for
20:   Compute mean parameter estimates and MSE from error
21: end for
22: Output: Final parameter estimates and MSE for each sample size
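A compact, executable version of this replicate-and-measure loop is sketched below. To keep it self-contained, the closed-form MLE for an exponential rate stands in for the numerically optimized BSIORE fit; the replication structure and the MSE-versus-sample-size comparison mirror Algorithm 1:

```python
import random

def mle_rate(sample):
    """Closed-form MLE of the exponential rate: lambda_hat = 1 / mean(sample)."""
    return len(sample) / sum(sample)

def mse_of_mle(n, true_rate, replications, rng):
    """Replicate estimation to get the mean squared error, as in Algorithm 1."""
    errs = []
    for _ in range(replications):
        sample = [rng.expovariate(true_rate) for _ in range(n)]
        errs.append((mle_rate(sample) - true_rate) ** 2)
    return sum(errs) / replications

rng = random.Random(7)
true_rate = 1.3
mses = {n: mse_of_mle(n, true_rate, 500, rng) for n in (40, 160, 640)}
# MSE should shrink toward 0 as the sample size grows
assert mses[640] < mses[160] < mses[40]
```

Replacing `mle_rate` with a numerical optimizer over the BSIORE log-likelihood (or the LS/WLS/MPS/CVM/ADE objectives) recovers the full study.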

6. Application
In this section, we transition from theoretical discussions to practical examinations, highlighting the applicability of our model through the analysis of real-world datasets. This exploration is designed to validate the practical utility of the newly introduced BSIORU and BSIORE distributions by showcasing their effectiveness in real, data-driven scenarios. Some simple and well-known distributions, such as the gamma, generalized exponential, Log-logistic, and Weibull distributions, are included in the comparison. Given that the BSIORE model derives from the Burr III and exponential distributions, we aim to benchmark it against other models based on Burr III [5], as well as those related to the Weibull and exponential distributions, such as the Weibull generalized exponential (WGE) distribution [26], known for its applicability in complex scenarios. Furthermore, in light of Chen’s work [25] on Type-2 Gumbel distributions, we have also incorporated comparisons with the Type-2 Gumbel (T2G) [27], Lomax Gumbel Type-2 (LGT) [28], and Exponentiated Generalized Gumbel Type-2 (EGG2) distributions [29] into our analysis. It should be noted that both LGT and EGG2 have four parameters each. When these models are compared to BSIORE and BSIORU, the significance of the proposed model is evident, even though it shares the same number of parameters with its counterparts.
To thoroughly assess and compare the efficacy of statistical models, several goodnessoffit metrics were employed, each evaluating distinct facets of model performance. The metrics include the following:
The −2 Log-Likelihood Statistic [36], which quantifies the fit of a model by summarizing the discrepancies between observed and expected values under the model. A lower statistic suggests a better fit, and this metric underpins various other statistical tests.
The Cramér–von Mises Statistic (${W}^{\ast}$) [37], which measures how closely a theoretical cdf matches the empirical cdf by integrating the squared differences across all values. A lower ${W}^{\ast}$ value indicates a better fit, as it means there is less deviation between the theoretical and empirical cdfs. This provides a thorough assessment of the deviation between the modeled and observed data.
The Anderson–Darling Statistic (${A}^{\ast}$) [38], akin to ${W}^{\ast}$ but placing greater emphasis on the tails of the distribution. This makes it particularly sensitive to extremities in the data, which is valuable for analyses where tail behavior is crucial. A lower ${A}^{\ast}$ value indicates a better fit, especially when the tails of the distribution are well modeled.
The Akaike Information Criterion ($AIC$) [36], which balances model fit against the number of parameters, penalizing unnecessary complexity. Derived from information entropy, it seeks to minimize information loss, preferring models with lower $AIC$ values; a lower $AIC$ suggests that the model achieves a good balance between accuracy and simplicity.
The Bayesian Information Criterion ($BIC$) [39], similar to $AIC$ but imposing a stronger penalty on the number of parameters. Based on Bayesian probability, it is useful for selecting among a finite set of models, favoring simpler models with fewer parameters unless a more complex model significantly improves the fit; a lower $BIC$ value indicates a better fit.
The Consistent Akaike Information Criterion ($CAIC$) [40], an augmentation of $AIC$ that incorporates an extra penalty for the parameter count, making it more conservative and particularly apt for larger datasets where overfitting is a concern; a lower $CAIC$ value indicates a better fit.
The Hannan–Quinn Criterion ($HQIC$) [41], which, like $AIC$ and $BIC$, employs a penalty term that grows logarithmically with the sample size. It offers a compromise between the propensity of $AIC$ to overfit and the strict penalties of $BIC$; a lower $HQIC$ value indicates a better fit.
The Kolmogorov–Smirnov Test Statistic ($KS$) [42], which identifies the maximum divergence between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution. The corresponding p-value assists in recognizing statistically significant deviations, with a smaller $KS$ statistic indicating a more accurate fit between the observed and modeled distributions, as it suggests minimal divergence.
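These criteria are straightforward to compute from the maximized log-likelihood and the fitted cdf. A minimal sketch with the standard formulas ($AIC = 2k - 2\ell$, $BIC = k\ln n - 2\ell$, and the two-sided KS distance); the example numbers are illustrative, not taken from the paper's tables:

```python
import math

def aic(loglik, k):
    """AIC = 2k - 2*loglik, with k the number of fitted parameters."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """BIC = k*ln(n) - 2*loglik, with n the sample size."""
    return k * math.log(n) - 2 * loglik

def ks_statistic(sample, cdf):
    """Two-sided KS distance: D = max_i max( i/n - F(x_(i)), F(x_(i)) - (i-1)/n )."""
    xs = sorted(sample)
    n = len(xs)
    return max(
        max(i / n - cdf(x), cdf(x) - (i - 1) / n)
        for i, x in enumerate(xs, start=1)
    )

# Worked example: 3 fitted parameters, n = 50, log-likelihood -100
assert aic(-100.0, 3) == 206.0
assert abs(bic(-100.0, 3, 50) - (3 * math.log(50) + 200.0)) < 1e-12
# KS distance of a small sample against the U(0,1) cdf
assert abs(ks_statistic([0.1, 0.4, 0.7], lambda x: x) - 0.3) < 1e-9
```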
6.1. Lifetime Data
The first dataset contains the lifetime data of 50 devices, provided in [43]. The goodness-of-fit statistics are shown in Table 1, and Figure 11 presents the histogram of the observed data together with the density plots of the fitted distributions. The BSIORU distribution emerges as the best model for this bathtub-shaped dataset, since it has the optimal goodness-of-fit metrics and the highest KS test p-value. Figure 12 illustrates the Kaplan–Meier (KM) survival curve and both the theoretical and empirical cdfs. The theoretical predictions align well with the actual data, highlighting that the BSIORU distribution is effective in modeling data with a bathtub shape.
6.2. Failure Data
This dataset contains the failure and running times of a sample of 30 devices given by Meeker and Escobar [44]. The model results are shown in Table 2. Figure 13 presents the histogram of the observed data and the density plots of the fitted distributions. The BSIORU distribution has the best performance for this dataset, since it attains the optimal values of all goodness-of-fit metrics. Figure 14 presents the Kaplan–Meier (KM) survival curve and the theoretical and empirical cdf plots. It is evident that the BSIORU distribution approximates the true data well.
6.3. Lung Cancer Data
This dataset contains the survival times of 128 patients with advanced lung cancer from the North Central Cancer Treatment Group. A summary of the parameter estimates and goodness-of-fit metrics is presented in Table 3, while Figure 15 illustrates the comparison between the histogram of the observed data and the density plots of the fitted distributions. The BSIORE distribution emerges as the superior model for this dataset, evidenced by its better goodness-of-fit metrics and the highest KS test p-value, as detailed in Table 3. Figure 16 presents a suite of plots including the Kaplan–Meier (KM) survival curve, both the theoretical and empirical cdfs, and a Total Time on Test (TTT) plot adjusted for scaling. The consistency observed between the theoretical predictions and the actual data underscores the BSIORE distribution’s adeptness at modeling data with a monotonic hazard rate structure.
6.4. Bladder Cancer Data
This dataset contains the remission times, in months, of bladder cancer patients reported by Lee and Wang [45]. A summary of parameter estimates and goodness-of-fit metrics is presented in Table 4, while Figure 17 compares the histogram of observed data with the density plots of fitted distributions. The BSIORE distribution again emerges as the superior model, evidenced by its better goodness-of-fit metrics and the highest KS test p-value, as detailed in Table 4.
Figure 18 presents the KM survival curve, the theoretical and empirical cdfs, and a scaled TTT plot. As with the lung cancer data, the close agreement between theoretical predictions and the observed data underscores the BSIORE distribution's ability to model data with a monotonic hazard rate structure.
These results from real-world datasets do more than confirm the statistical robustness of the BSIORG model; they also underscore its practical relevance in biomedical science. By accurately capturing the dynamics of the specified cancer datasets, and by successfully aligning theoretical models with empirical evidence, the model can enhance decision-making in healthcare. The primary contribution of the proposed family of distributions lies in its exceptional performance in modeling bathtub-shaped data. While traditional distributions such as the gamma, generalized exponential, log-logistic, and Weibull may achieve a better Bayesian Information Criterion (BIC) for simpler shapes, such as unimodal data, owing to their fewer parameters, the BSIORG family excels overall: across diverse datasets, the BSIORG distributions consistently deliver superior fits, making them highly effective for complex data structures.
7. Conclusions
This paper presented the modified Burr III Odds Ratio–G distribution, a comprehensive generalization aimed at enhancing data modeling through the incorporation of Burr III and the odds ratio. The extensive examination of its subfamilies, particularly the Burr III Scaled Inverse Odds Ratio–G distribution, revealed its versatility and efficiency in fitting cancer data. The theoretical underpinnings were rigorously explored, providing insights into its statistical properties and potential applications. Through simulation studies and real-life data analyses, the BSIORG distribution demonstrated superior performance in modeling and prediction accuracy over several well-known distributions, underlining its significance and utility in statistical modeling. Notably, even the simplest model of BSIORG, the BSIOR–Uniform distribution, is capable of generating bathtub-shaped density and hazard rate functions. As demonstrated in the application section, the BSIORG distribution flexibly models both complex bathtub shapes and skewed data. This broad applicability highlights the model's practical utility for various types of data. With the key statistical properties thoroughly examined in this paper, this new model can be readily applied to data analysis and statistical modeling. However, we acknowledge that no single model is universally the best. In line with George E. P. Box's observation, "All models are wrong, but some are useful", we have transparently presented instances where our model may not be the optimal fit, to provide an honest and balanced view. Future research could further explore the application of this model to more diverse datasets and compare its performance with other recent distributions, potentially opening new avenues for statistical analysis and modeling across various scientific domains.
Given the limited information available on distributions with bathtub-shaped probability density functions and hazard rate functions, a systematic literature review of all models capable of generating these shapes would be particularly valuable. Furthermore, our ongoing work aims to assess the robustness and accuracy of various statistical distributions in producing bathtub shapes, enabling a comprehensive evaluation of their efficacy in modeling diverse real-world data.