Abstract
With the aim of identifying a probability model that correctly describes the stochastic behavior of extreme environmental factors such as excess rain, acid rain pH level, and concentrations of ozone and NO, etc., at a single site or at multiple sites, as well as data from life testing experiments, we introduce a novel class of distributions known as the Sine Burr family. Some special members of this class are proposed. Statistical properties of the presented class, such as the density function, complete and incomplete moments, mean deviation, and Lorenz and Bonferroni curves, are derived. Parameter estimation is carried out via the maximum likelihood method. Moreover, the application is explained by using four real data sets. We also illustrate the significance and flexibility of the proposed class for the above-mentioned stochastic phenomena.
1. Introduction
Several researchers have proposed approaches for generating new probability models by adding parameters to existing ones. This practice of adding parameters yields more flexible families of distributions, which are effectively used for modeling data sets in engineering, economics, biological studies and environmental sciences. Some well-known classes in this regard are the Marshall Olkin- by [], beta- by [], the Kumaraswamy- studied by [], odd Fréchet- by [] logistic- by [], exponentiated generalized- proposed by [], odd generalized N-H- by [], - class by [], transmuted odd Fréchet- by [], exponentiated power generalized Weibull power series- by [], the Weibull- by [], the exponentiated half-logistic generated family by [], Type II half logistic class by the odd [], bivariate Weibull-G family by [], exponentiated generalized alpha power family of distributions by [], truncated Cauchy power Weibull-G class of distributions by [], odd Perks-G class of distributions by [], Type I half logistic Burr X-G family by [], sine Topp-Leone-G family of distributions by [], exponentiated version of the M family of distributions by [], a new power Topp-Leone generated family of distributions by [], truncated inverted Kumaraswamy generated family of distributions by [], generalized exponential class discussed by [], the beta odd log-logistic generalized studied by [], alpha power transformation family of distributions introduced by [], the Kumaraswamy exponential Pareto proposed by [], the generalized Burr XII power series (GBXIIPS) class studied by [], additive Weibull geometric (AWG) distribution proposed by [] and the beta exponentiated modified Weibull (BEMW) distribution developed by [], among others. However, in recent years, Ref. [] presented another approach, in which trigonometric functions are modified to generate new lifetime distributions.
They transformed the sine function into a new statistical distribution called the sine- class, with the cumulative distribution function (cdf) and probability density function (pdf) expressed as
and
respectively. The failure rate function (hrf) is defined as
Some motivating features of this family are as follows: in its simple form, the new cdf and the baseline cdf possess an equal number of parameters, so the family avoids the problem of over-parametrization, i.e., it introduces no additional parameters. In addition, the new cdf can increase the flexibility of the baseline, offering new adaptable classes. Trigonometric families of probability models developed thus far include the -trigonometric model studied by [], sine square distribution discussed by [], a cosine approximation to the normal distribution by [], odd hyperbolic cosine exponential–exponential distribution by [], odd hyperbolic cosine family of lifetime distributions by [], transmuted arcsine distribution properties and application by [], the arcsine exponentiated- family by [], among others. These are very complicated models that are seldom employed by applied practitioners. In creating more practical models from trigonometric functions, avoiding non-identifiability is a major challenge. The proposed generalization is significant in this regard. Further, a model should be able to capture all types of hazard rate curves. The sub-models of the family studied in this article fulfill this requirement. One key reason for proposing new generalizations is the improvement in fit of new models over conventional models on real data sets. The two sub-models, fitted on four data sets, outperform twelve competitive well-established models, including four distributions with four parameters. Additionally, in order to compare the fit of the proposed model with that of each competing model on the same data, the Vuong test is used; it yielded significant findings, thus reinforcing the motivation for proposing the new family.
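The sine transform described above can be sketched numerically. The equations are not reproduced in this text, so the sketch assumes the commonly used sine-G form, cdf F(x) = sin(πG(x)/2) and pdf f(x) = (π/2) g(x) cos(πG(x)/2), where G and g are the baseline cdf and pdf; the function names and the unit exponential baseline are illustrative choices, not taken from the article.

```python
import math

def sine_g_cdf(G):
    """Assumed sine-G cdf, evaluated at a baseline cdf value G in [0, 1]."""
    return math.sin(0.5 * math.pi * G)

def sine_g_pdf(G, g):
    """Assumed sine-G pdf, given baseline cdf value G and baseline pdf value g."""
    return 0.5 * math.pi * g * math.cos(0.5 * math.pi * G)

def sine_g_hrf(G, g):
    """Hazard rate: pdf divided by the survival function 1 - cdf."""
    return sine_g_pdf(G, g) / (1.0 - sine_g_cdf(G))

# Illustrative baseline (an assumption): unit exponential distribution.
x = 0.7
G = 1.0 - math.exp(-x)
g = math.exp(-x)
print(sine_g_cdf(G), sine_g_pdf(G, g), sine_g_hrf(G, g))
```

Note that the transform adds no parameters: the only unknowns are those of the baseline, which is the point made in the paragraph above.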
Ref. [] introduced the Burr class of probability models. The cdf and pdf for the Burr family are expressed by
and
respectively, where, for brevity, the survival function (sf) and the pdf refer to a baseline model that depends on a vector of unknown parameters. Here, we propose a class of sine-generated models by taking the Burr class as the baseline distribution in the sine family. This new family is referred to as the Sine Burr class of models.
The remainder of the article is organized as follows. In the second section, the new extended generator, called the Sine Burr family, is presented, and its sub-models are discussed. The third section derives an expansion of the density as a linear combination of probability models. Statistical properties of the family are provided in the fourth section. Inference about the population parameters based on maximum likelihood estimation (MLE) is performed in the fifth section. The sixth section deals with applications of the proposed family. The final section states the conclusion.
2. Ingenious Proposed Class
Here, we construct a new flexible family of distributions, called the Sine Burr X (SBX) family, by inserting (3) into (1); the resulting cdf is expressed as
where the respective pdf is
whereas the sf and hazard rate function (hrf) are expressed as
and
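As a hedged illustration of the construction in this section, the sketch below composes a Burr X type generator with the sine transform. The displayed equations are missing from this text, so the code assumes the Burr X-G cdf [1 − exp{−(G/(1−G))²}]^θ and the sine transform sin(πF/2); the Lomax parametrization G(x) = 1 − (1 + x/b)^(−a), the parameter names `theta`, `a`, `b`, and all function names are likewise assumptions made for illustration.

```python
import math

def lomax_cdf(x, a, b):
    """Lomax baseline cdf (assumed parametrization: shape a, scale b)."""
    return 1.0 - (1.0 + x / b) ** (-a)

def lomax_pdf(x, a, b):
    """Lomax baseline pdf matching lomax_cdf."""
    return (a / b) * (1.0 + x / b) ** (-a - 1.0)

def sbx_cdf(x, theta, G):
    """Assumed SBX cdf: sine transform of a Burr X type generator."""
    Gx = G(x)
    if Gx >= 1.0:
        return 1.0
    odds = Gx / (1.0 - Gx)               # odds ratio of the baseline
    base = -math.expm1(-odds ** 2)       # 1 - exp(-odds^2)
    return math.sin(0.5 * math.pi * base ** theta)

def sbx_pdf(x, theta, G, g):
    """Derivative of sbx_cdf with respect to x (chain rule)."""
    Gx, gx = G(x), g(x)
    if Gx <= 0.0 or Gx >= 1.0:
        return 0.0
    odds = Gx / (1.0 - Gx)
    base = -math.expm1(-odds ** 2)
    return (math.pi * theta * gx * Gx / (1.0 - Gx) ** 3
            * math.exp(-odds ** 2) * base ** (theta - 1.0)
            * math.cos(0.5 * math.pi * base ** theta))

def sbx_hrf(x, theta, G, g):
    """Hazard rate: pdf over survival function."""
    return sbx_pdf(x, theta, G, g) / (1.0 - sbx_cdf(x, theta, G))

# Illustrative parameter values (not estimates from the article).
G = lambda x: lomax_cdf(x, a=2.0, b=3.0)
g = lambda x: lomax_pdf(x, a=2.0, b=3.0)
print(sbx_cdf(1.0, 1.5, G), sbx_pdf(1.0, 1.5, G, g), sbx_hrf(1.0, 1.5, G, g))
```

Passing the baseline as a pair of callables keeps the sketch reusable for the other baselines considered in this section (log-logistic, exponential, Rayleigh).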
2.1. Sub-Models of SBX Family
In Table 1, we list four possible sub-models of the SBX class. Their parent (baseline) distributions are the Lomax, log-logistic, exponential, and Rayleigh models, whose cdfs and pdfs are presented in Table 1.
Table 1.
Odds ratios and baseline models.
From this table, we pick models 1 and 2, study the shapes of their pdfs and hrfs, and apply them to real-life data sets in Section 6.3 for a thorough analysis.
2.1.1. A Sine Burr Lomax (SBXL) Probability Model
The cdf and pdf of sine Burr Lomax distribution are
and
2.1.2. A Sine Burr Loglogistic (SBXLL) Probability Model
Remark 1.
This family of distributions has the ability to model the positively skewed and symmetrical data (Figure 1 and Figure 2) with decreasing failure rate, increasing failure rate, bathtub shape, upside-down bathtub and decreasing-increasing-decreasing failure data (Figure 3 and Figure 4) structure in an appropriate fashion.
Figure 1.
Pdf graphs of the SBXL model.
Figure 2.
Pdf graphs of the SBXLL model.
Figure 3.
Plots of hrf of the SBXL model for random parameter values.
Figure 4.
Plots of hrf of the SBXLL model for random parameter values.
2.1.3. A Sine Burr-X Exponential (SBXE) Distribution
If and , then the cdf and pdf of the SBXE model (for ) are given below
and
2.1.4. A Sine Burr Rayleigh (SBXR) Probability Model
The incorporation of the Rayleigh distribution’s cdf and pdf into Equations (1) and (2) is given below
and
3. Expansion of the SBX Density Function
Here, we derive an expansion of the pdf of the Sine Burr class of distributions. By applying the Taylor series expansion, we obtain
We have
Inserting (10) in (6), the density function reduces to
if and the generalized binomial series expansion holds
and on applying (12) to the last term in (11), we obtain
On expanding , we obtain
Inserting the above term in (13), the density function becomes
where
Inserting (15) into (14), the pdf can be written as an infinite linear combination of probability models
where
and the last factor is the exponentiated (exp) pdf with the indicated power parameter. Thus, the SBX probability model can be viewed as an infinite mixture of exponentiated densities. Consequently, several mathematical features of the SBX model follow directly from those of the exponentiated model. In addition, the cdf of the family can be expressed as a mixture of exponentiated cdfs
where is the cdf with power parameter .
4. Mathematical and Statistical Properties
Here we study the quantile function, moments, moment generating function, conditional moments, mean deviation, Bonferroni and Lorenz curves, and order statistics of the class of distributions.
4.1. Percentile Function
Let X be a continuous random variable with cumulative distribution function . The percentile (quantile) function returns the threshold x below which a random draw from the given cdf falls a given percentage of the time. Inverting the cdf yields the percentile function as follows
where denotes the percentile function of . As is characterized by the equation , . The median is given by
The skewness measure is due to the Bowley skewness defined by
On the other hand, the Moors kurtosis (Moors, 1988) based on quantiles is given by
where Q(·) represents the percentile function. The measures and possess the usual characteristics.
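The quantile-based measures above can be sketched in code. The closed-form inverse below assumes the SBX cdf sin{(π/2)[1 − exp(−(G/(1−G))²)]^θ} with a Lomax baseline G(x) = 1 − (1 + x/b)^(−a); these forms and the parameter names are assumptions, since the equations are not reproduced in this text. The Bowley skewness and Moors kurtosis use their standard quartile and octile definitions.

```python
import math

def sbxl_cdf(x, theta, a, b):
    """Assumed SBXL cdf (sine transform of Burr X type generator, Lomax baseline)."""
    G = 1.0 - (1.0 + x / b) ** (-a)
    odds = G / (1.0 - G)
    return math.sin(0.5 * math.pi * (-math.expm1(-odds ** 2)) ** theta)

def sbxl_quantile(u, theta, a, b):
    """Invert sbxl_cdf step by step for 0 < u < 1."""
    t = (2.0 / math.pi) * math.asin(u)          # undo the sine transform
    base = t ** (1.0 / theta)                   # undo the power theta
    odds = math.sqrt(-math.log1p(-base))        # undo 1 - exp(-odds^2)
    G = odds / (1.0 + odds)                     # baseline cdf value
    return b * ((1.0 - G) ** (-1.0 / a) - 1.0)  # Lomax quantile

def bowley_skewness(Q):
    """Bowley skewness from quartiles of a quantile function Q."""
    return (Q(0.75) + Q(0.25) - 2.0 * Q(0.5)) / (Q(0.75) - Q(0.25))

def moors_kurtosis(Q):
    """Moors kurtosis from octiles of a quantile function Q."""
    return ((Q(7 / 8) - Q(5 / 8) + Q(3 / 8) - Q(1 / 8))
            / (Q(6 / 8) - Q(2 / 8)))

# Illustrative parameter values (not estimates from the article).
Q = lambda u: sbxl_quantile(u, theta=1.5, a=2.0, b=3.0)
print(Q(0.5), bowley_skewness(Q), moors_kurtosis(Q))
```

Because both measures depend only on quantiles, they remain defined even when the moments of the baseline (for example a heavy-tailed Lomax) do not exist.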
4.2. Moments and Moment Generating Functions
In mathematics and statistics, the moments of a function are quantitative measures related to the shape of the function's graph. If the function is a density or mass function, then the first moment is the center of mass or expected value, and the second central moment is the rotational inertia or the variance. Similarly, the ratio of the third central moment to the second central moment raised to the power 3/2 is the skewness, and the ratio of the fourth central moment to the square of the second central moment is the kurtosis. Moreover, these moments not only describe the shape of a function but also help to characterize probability distributions.
Let be a stochastic variate possessing pdf with power parameter . The moment of a SBX class of distributions can be obtained from (16)
where denotes the exponentiated distribution with power parameter Another formula for the moment follows from (16) as
where
can be estimated in terms of the baseline percentile function, i.e., as
Now we introduce two formulae for the moment generating function (mgf). The first formula follows from Equation (16) as
where is the moment generating function of . Consequently, we can easily determine from the exp generating function. The second formula for the follows from (16) as
where is the mgf of random variable given by
which can be computed numerically by using the baseline percentile function. Table 2 and Table 3 give a numerical analysis of the mean, variance, skewness, kurtosis and coefficient of variation for the SBXL and SBXLL models, respectively.
Table 2.
Numerical values of , , , , and at for the SBXL model.
Table 3.
Numerical values of , , , , and at for the SBXLL model.
Figure 5 and Figure 6 represent the 3-D plots of the , , and of the SBXL and SBXLL distributions, respectively, for several values of parameters.
Figure 5.
Three-dimensional plots of , , and of the SBXL distribution for .
Figure 6.
Three-dimensional plots of , , and of the SBXLL distribution for .
4.3. Conditional Moments
Prediction with lifetime probability models often requires conditional moments, the mean residual life function and the mean inactivity time function. In this section, we focus on the first partial (incomplete) moment, which determines the Lorenz and Bonferroni curves; these are helpful in demography, econometrics, medicine, survival analysis and insurance applications. The partial moments of the variate X defined as for any real is given as
where
and can be evaluated numerically.
4.3.1. Mean Deviation
Partial moments are useful for computing the mean deviations about the mean and about the median, which convey key information about the spread of a population. They can be used in many fields such as economics and insurance. Let the random variable X follow the SBX family of distributions. The mean deviations about the mean and about the median are defined by
and
respectively, where (X) = (, and is the first complete moment given by (20) with .
4.3.2. Bonferroni and Lorenz Curves
For a positive stochastic variate X, the Lorenz and Bonferroni curves, for a given probability , are given by and , respectively, where , and is the percentile function of X at percentile p.
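For intuition, an empirical analogue of the two curves can be computed directly from a sample. This is a minimal sketch that swaps the model-based incomplete first moment for its sample counterpart: at p = k/n the Lorenz ordinate is the share of the total held by the k smallest observations, and the Bonferroni ordinate is that share divided by p. The function name and the toy sample are illustrative assumptions.

```python
def lorenz_bonferroni(sample):
    """Return (p, L(p), B(p)) at p = k/n for a sample of positive values."""
    xs = sorted(sample)
    n, total = len(xs), sum(xs)
    points, running = [], 0.0
    for k, x in enumerate(xs, start=1):
        running += x                 # partial sum of the k smallest values
        p = k / n
        lorenz = running / total     # empirical Lorenz ordinate
        points.append((p, lorenz, lorenz / p))  # Bonferroni = L(p) / p
    return points

# Toy sample (an assumption for illustration only).
for p, L, B in lorenz_bonferroni([1.0, 2.0, 3.0, 4.0]):
    print(p, L, B)
```

A model-based version would replace the partial sums by the incomplete first moment of the fitted distribution evaluated at the p-th percentile.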
4.4. Order Statistics
Order statistics are important tools for dealing with ordered data. Let , ,…, be independent random variables following the SBX family of distributions, where n is the sample size, and let the arrangement of these variates in ascending order be , ,…, ; then the variates …≤ are the order statistics of the random variables. Order statistics are frequently used in the reliability analysis of a system. The cumulative distribution function of the order statistics is expressed as follows
The corresponding pdf is expressed in the given form as
Then the moment of the order statistics is given by
where this integral can be evaluated numerically.
5. Parameter Estimation
Method of Maximum Likelihood
Statistical inference is usually carried out through point estimation, interval estimation and hypothesis testing. Although numerous methods for parameter estimation exist in the literature, the maximum likelihood method is the most widely used; its estimators enjoy desirable properties that are useful when constructing confidence regions and intervals, as well as test statistics. The asymptotic theory of these estimators leads to simple approximations that work well in finite samples. Statisticians frequently seek to approximate quantities, such as the density of a test statistic, that depend on the sample size so as to obtain better approximate distributions. The resulting calculations for the MLEs can be handled either analytically or numerically. In this section, we consider parameter estimation via the MLE method from a complete sample. Let be a random sample of size n from the SBX distribution given by (5). Let be a vector of the parameters. The log-likelihood function is given by
The log-likelihood can be maximized by differentiating (23) with respect to the parameters, i.e.,
where and The MLEs of the parameters are obtained by solving the system of nonlinear equations, i.e., . These equations cannot be solved analytically, so they are solved numerically using the Newton–Raphson method via statistical packages such as Mathematica [12.0], R and Matlab.
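The numerical maximization can be sketched as follows. This is not the authors' procedure: it swaps Newton–Raphson for a simple derivative-free compass search, and it assumes the SBXL density implied by the cdf sin{(π/2)[1 − exp(−(G/(1−G))²)]^θ} with Lomax baseline G(x) = 1 − (1 + x/b)^(−a). The parameter names, starting values and helper names are hypothetical; the data are the rainfall acidity values of Data-I listed in Section 6.3.

```python
import math

# Rainfall acidity values (Data-I) as listed in Section 6.3.
DATA_I = [3.71, 4.23, 4.16, 2.98, 3.23, 4.67, 3.99, 5.04, 4.55, 3.24,
          2.80, 3.44, 3.27, 2.66, 2.95, 4.70, 5.12, 3.77, 3.12, 2.38,
          4.57, 3.88, 2.97, 3.70, 2.53, 2.67, 4.12, 4.80, 3.55, 3.86,
          2.51, 3.33, 3.85, 2.35, 3.12, 4.39, 5.09, 3.38, 2.73, 3.07]

def sbxl_logpdf(x, theta, a, b):
    """Log of the assumed SBXL density at x > 0."""
    G = 1.0 - (1.0 + x / b) ** (-a)            # Lomax cdf (assumed form)
    g = (a / b) * (1.0 + x / b) ** (-a - 1.0)  # Lomax pdf
    odds = G / (1.0 - G)
    base = -math.expm1(-odds ** 2)             # 1 - exp(-odds^2)
    return (math.log(math.pi * theta * g * G) - 3.0 * math.log(1.0 - G)
            - odds ** 2 + (theta - 1.0) * math.log(base)
            + math.log(math.cos(0.5 * math.pi * base ** theta)))

def neg_log_lik(log_params, data):
    """Negative log-likelihood; parameters are passed on the log scale."""
    try:
        theta, a, b = (math.exp(v) for v in log_params)
        return -sum(sbxl_logpdf(x, theta, a, b) for x in data)
    except (ValueError, OverflowError, ZeroDivisionError):
        return math.inf                        # outside the usable region

def compass_search(f, start, step=0.5, tol=1e-6, max_iter=2000):
    """Derivative-free minimizer: try +/- step along each coordinate."""
    point, best = list(start), f(start)
    for _ in range(max_iter):
        if step <= tol:
            break
        improved = False
        for i in range(len(point)):
            for delta in (step, -step):
                trial = list(point)
                trial[i] += delta
                value = f(trial)
                if value < best:
                    point, best, improved = trial, value, True
        if not improved:
            step *= 0.5                        # shrink the search pattern
    return point, best

# Hypothetical starting values (theta, a, b) = (2.0, 0.5, 5.0).
start = [math.log(2.0), math.log(0.5), math.log(5.0)]
fit, nll = compass_search(lambda p: neg_log_lik(p, DATA_I), start)
theta_hat, a_hat, b_hat = (math.exp(v) for v in fit)
print(theta_hat, a_hat, b_hat, nll)
```

Working on the log scale keeps all three parameters positive without explicit constraints. The search only finds a local optimum, so in practice several starting points, or a dedicated optimizer, would be advisable.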
6. Real-Life Applications of the Proposed Family
Recently, Ref. [] studied the hazards associated with health in the context of extreme value theory. In this part, we apply the proposed model in three different scenarios, namely environmental, survival and biomedical, using five different data sets: the rainfall acidity on 40 successive days in the state of Minnesota, the line transect data, the failure times of brake pads for 88 cars, the lengths of power failures (in minutes) and the survival times of 72 guinea pigs after receiving an injection of a specific amount of Mycobacterium tuberculosis in a medical experiment. Sources of the mentioned data sets are given in their respective sections.
6.1. Focused Distributions
For the selection of appropriate models, we have studied twelve rival distributions, each of which has its own merits and demerits. These distributions include the Beta–Weibull (BWD), Beta–Lomax (BLD), exponentiated generalized Lomax (EGLD), Weibull generalized Lomax (WGLD), odd Weibull–Lomax (OWLD), exponentiated Weibull (EWD), new sine inverse Weibull (NSINIWD), exponentiated exponential (EED), generalized Lindley (GLD), Weibull (WD), log-logistic (LLD) and Lomax (LD) distributions. These distributions are studied by [,,,,,,,,,,,], respectively. As selection criteria, we chose the most notable, well-established four- and three-parameter models. The required computations were carried out using the R package AdequacyModel.
6.2. Test Statistics
For comparison purposes, we used some goodness-of-fit tests, as discussed by [,,], such as the chi-square , Anderson–Darling (AD), Cramér–von Mises (CVM) and Kolmogorov–Smirnov (KS) statistics, along with some information criteria, such as the Akaike information criterion (A.I.C), corrected Akaike information criterion (A.I.C.C), Bayesian information criterion (B.I.C) and Hannan–Quinn information criterion (H.Q.I.C), based on the log-likelihood (ℓ) result. For the corresponding formulas and explanation, readers are referred to [,,]. Additionally, the Vuong test (VT) statistics are also used for testing the credibility of the proposed model, and comprehensive details are stated in [,]. Further, the empirical findings of these comparisons are displayed in Tables 9, 14, 19 and 24, respectively.
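The information criteria named above follow standard textbook definitions, sketched below for a maximized log-likelihood ℓ, k estimated parameters and sample size n. The function name and the numbers in the usage line are illustrative assumptions, not results from the article.

```python
import math

def information_criteria(loglik, k, n):
    """Standard definitions of AIC, AICc, BIC and HQIC.

    loglik: maximized log-likelihood, k: number of parameters, n: sample size.
    """
    aic = 2.0 * k - 2.0 * loglik
    aicc = aic + 2.0 * k * (k + 1.0) / (n - k - 1.0)   # small-sample correction
    bic = k * math.log(n) - 2.0 * loglik
    hqic = 2.0 * k * math.log(math.log(n)) - 2.0 * loglik
    return {"AIC": aic, "AICc": aicc, "BIC": bic, "HQIC": hqic}

# Hypothetical inputs: log-likelihood -50 for a 3-parameter model, n = 40.
print(information_criteria(-50.0, k=3, n=40))
```

Smaller values indicate a better trade-off between fit and complexity, which is the sense in which the text later refers to a minimum loss of information.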
6.3. Examples
Here, we focus on three types of applications that are frequently of interest to applied researchers: environmental data, component failure time data and biomedical data. In Table 4, we define the two proposed distributions, SBXL and SBXLL, by their cdfs as follows.
Table 4.
CDFs of proposed models.
In order to pursue these targets, we compared our models with the most competitive existing models as follows: SBXL is fitted to the environmental data sets (Data-I and Data-II), SBXLL is fitted to the failure time data sets (Data-III and Data-IV), and both SBXL and SBXLL are fitted to the biomedical data set (Data-V).
Case-I: Environmental Data Sets
Any occurrence, activity, or state that has a harmful effect on the environment is considered an environmental hazard. Physical or chemical pollution in the air, water, and soil is a reflection of environmental risks. Environmental risks have the ability to damage both people and the environment severely. There is a growing global effort to enhance environmental-related decision-making.
Data-I. Because of the large concentrations of nitric and sulfuric acids in the atmosphere that are washed down to the earth, acid rain is a common environmental phenomenon that has a trickle-down effect on a number of ecological variables, such as numbers of species, abundances of worms, change in the sizes of crabs, measures of quality of water or physiological condition of individual animals, etc. The production of acidic pollutants in the atmosphere results from the oxidation of sulfur and nitrogen in coal and other fossil fuels. In many industrialized nations, acid rain has significantly harmed forests. Acid rain can be avoided by using low-sulfur fuel and coal. Environmental catastrophes are covered in this part of the study. Acidity level is measured on a pH scale, which varies from one (highly acidic) to seven (neutral). Rain is considered acid rain when its pH is less than 5.7. The first data set records the acidity of rainfall on forty days in the state of Minnesota. This data set was reported by [], and its values are given as 3.71, 4.23, 4.16, 2.98, 3.23, 4.67, 3.99, 5.04, 4.55, 3.24, 2.80, 3.44, 3.27, 2.66, 2.95, 4.70, 5.12, 3.77, 3.12, 2.38, 4.57, 3.88, 2.97, 3.70, 2.53, 2.67, 4.12, 4.80, 3.55, 3.86, 2.51, 3.33, 3.85, 2.35, 3.12, 4.39, 5.09, 3.38, 2.73, 3.07. In addition, for drawing a valid conclusion, the data are grouped via the R computational package. Possible groups, [0.03, 2.54], [2.54, 6.22], [6.22, 11.8], [11.8, 21.7], [21.7, 38.7], [38.7, 60.6], possess the frequencies 9, 8, 8, 8, 8, 9, respectively.
Table 5 and Table 6 show close agreement between the theoretical and descriptive statistics of the data. This also implies that the proposed model can handle platykurtic and positively skewed data more effectively than the competing distributions.
Table 5.
Summary statistics related to data-I.
Table 6.
Theoretical statistical measures from SBXL for data-I.
Furthermore, Table 7 and Table 8 support the proposed model in every aspect. These tables show that SBXL has not only the smallest values of the goodness-of-fit statistics but also the minimum loss of information.
Table 7.
MLEs and goodness-of-fit of data set-I.
Table 8.
Comparison of data set I fitting via information criterion.
Data-II (Table 9). In order to simulate detectability, distances of observed targets from transect lines are frequently utilized in line-transect distance sampling to estimate population densities. The present crisis is associated with large populations of wild animals in a particular environment. This method’s fundamental premise is that all creatures are found where they first appear. Thus, animal migration that is not controlled by the transect and observer might seriously disrupt the natural food chain in a community. This data set, obtained from [], represents the distances from the transect line for the 68 stakes detected in walking L = 1000 m and searching w = 20 m on each side of the line. The measurements are: 2.0, 0.5, 10.4, 3.6, 0.9, 1.0, 3.4, 2.9, 8.2, 6.5, 5.7, 3.0, 4.0, 0.1, 11.8, 14.2, 2.4, 1.6, 13.3, 6.5, 8.3, 4.9, 1.5, 18.6, 0.4, 0.4, 0.2, 11.6, 3.2, 7.1, 10.7, 3.9, 6.1, 6.4, 3.8, 15.2, 3.5, 3.1, 7.9, 18.2, 10.1, 4.4, 1.3, 13.7, 6.3, 3.6, 9.0, 7.7, 4.9, 9.1, 3.3, 8.5, 6.1, 0.4, 9.3, 0.5, 1.2, 1.7, 4.5, 3.1, 3.1, 6.6, 4.4, 5.0, 3.2, 7.7, 18.2, 4.1. For converting into groups, the bins code of the R computational package is used, and possible groups with respective frequencies are displayed as [0.1, 1.52], [1.52, 3.23], [3.23, 4.45], [4.45, 6.57], [6.57, 9.97], [9.97, 18.6], and the frequencies are 12, 11, 11, 11, 11 and 12, respectively.
Table 9.
Vuong’s test applied on data set-I at .
Table 10 and Table 11 also indicate that SBXL describes the data better. Encouragingly, SBXL not only works for positively skewed data but also handles leptokurtic curves better than the competing distributions.
Table 10.
Summary statistics of data-II.
Table 11.
Theoretical statistical measures from SBXL for data-II.
Moreover, Table 12 and Table 13 show that the SBXL model fits the data very well, with the minimum values of the goodness-of-fit statistics, the highest p-value of the KS statistic and the smallest values of the information criteria.
Table 12.
MLEs and goodness-of-fit of data set-II.
Table 13.
Comparison of data set II fitting via information criterion.
Overall Analysis of Data set-I and II via Goodness of Fit: Table 7 and Table 8 indicate that the proposed model exhibits much better goodness-of-fit statistics than the competing distributions. Some salient features are worth mentioning: the chi-square, A, W and KS values are the smallest among the competing models, along with the highest p-value; thus, the mentioned tables fully support the suitability of the proposed model. Table 9 further consolidates this claim through the larger Vuong test statistic values. In addition, the proposed model is also suitable for data set II, for which Table 12 and Table 13 exhibit the minimum values of chi-square and A. Additionally, Table 14 suggests that the proposed model is the only model with reliable Vuong statistics. Overall, Table 8 and Table 13 suggest that the proposed model also possesses the minimum values of log-likelihood () and all the other information criteria, especially when compared to the competing four-parameter and three-parameter distributions.
Table 14.
Vuong’s test (VT) applied on data set II at .
Figure 7 and Figure 8 support the numerical results of the application for data sets I and II, respectively, which strengthens our claim regarding the dominance of the SBXL model over its respective competitive models.
Figure 7.
Plots of estimated pdf and cdf of the SBXL model for data set-I.
Figure 8.
Plots of estimated pdf and cdf of the SBXL model for data set-II.
Figure 9.
Plots of estimated pdf and cdf of the SBXLL model for data set-III.
Figure 10.
Plots of estimated pdf and cdf of the SBXLL model for data set-IV.
Case-II: Failure time data sets
Failure is the occurrence, or unsuitable state, in which an object or a component of an item does not or would not operate as previously specified. Failure analysis is the logical, systematic investigation of a product, its design, use, and documentation after a failure in order to pinpoint the failure mode, the failure mechanism, and the fundamental cause of the failure. As systems become more diverse, failure time analysis is a discipline whose significance continues to expand. In this subsection, we explore two data sets that are related to this field.
Data-III: The braking system of a vehicle determines the safety of the vehicle. The brake pads and disk setup make up the braking system, where the brake pads are critical safety components; see []. In this regard, a manufacturer decided to select a sample of vehicles sold over the preceding 12 months at a specific group of dealers. After that period, only the cars that still had the initial pads were selected. For each car, the brake pad failure time could be observed. The following data represent the failure times of automobile brake pads for 98 cars, where the number of miles or kilometers driven is known to be related to the pad failure time; see []. However, the current data only present the failure time (in km), which is left truncated; see []. In addition, for drawing a valid conclusion, we have created different classes, namely [18.6, 44], [44, 53.9], [53.9, 65], [65, 77.6], [77.6, 91], [91, 166], with 15, 15, 14, 15, 14, 15 observations, respectively.
Table 15 and Table 16 also indicate that SBXLL describes the data well: the theoretical values of the mean, median, standard deviation, skewness and kurtosis agree with the observed values. Further, SBXLL not only works for positively skewed data but also handles leptokurtic curves better than its competing distributions (Table 17 and Table 18).
Table 15.
Summary statistics in relation to data-III.
Table 16.
Theoretical statistical measures of SBXLL from data-III.
Table 17.
Data set-III MLEs and goodness-of-fit.
Table 18.
Comparison of data set III fitting via information criterion.
Furthermore, the VT statistics, as displayed in Table 19, are also in close association with the above results. Thus, our proposed model seems to be a natural choice for such data sets.
Table 19.
Vuong’s test applied on data set III at .
Data IV: A power failure is a period of time during which the electricity supply to a specific structure or area is interrupted, typically as a result of a natural weather event, such as damage to the cables caused by strong winds, lightning, freezing rain, ice buildup on the lines, snow, etc. Power outages can also be triggered by wildlife and tree branches hitting power cables. This data set, obtained from [], gives the lengths of power failures measured in minutes: 22, 18, 135, 15, 90, 78, 69, 98, 102, 83, 55, 28, 121, 120, 13, 22, 124, 112, 70, 66, 74, 89, 103, 24, 21, 112, 21, 40, 98, 87, 132, 115, 21, 28, 43, 37, 50, 96, 118, 158, 74, 78, 83, 93, 95. We have also grouped the data with the help of the bins code of the R computational package, where the possible classes are [13, 22.7], [22.7, 53.3], [53.3, 78], [78, 95.3], [95.3, 114], [114, 158] and the respective frequencies are 8, 7, 8, 7, 7 and 8 (Table 20 and Table 21).
Table 20.
Summary statistics of data-IV.
Table 21.
Theoretical statistical measures of SBXLL from data-IV.
Moreover, Table 22 and Table 23 show that the SBXLL model fits the data very well, with the minimum values of the goodness-of-fit statistics, the highest p-value of the KS statistic and the lowest loss of information.
Table 22.
MLEs and goodness-of-fit related to data set-IV.
Table 23.
Comparison of data set IV fitting via information criterion.
Furthermore, the VT statistics, as displayed in Table 24, are in close association with the above results. Thus, our proposed model seems to be a natural choice for such data sets.
Table 24.
Vuong’s test applied on data set IV at .
General discussion about data set-III and IV: Table 15 and Table 16 show that data set-III is positively skewed, whereas Table 20 and Table 21 show that data set-IV is negatively skewed and platykurtic. In addition, both data sets are non-normal according to the Shapiro–Wilk test, with test statistics 0.9603 and 0.9455 and p-values 0.0087 and 0.0342, respectively. Furthermore, for outlier detection, Grubbs' test is used, which indicates that data set-III shows some evidence of outlier presence with critical values of , whereas data set-IV does not produce any sign of outliers with at the 5% level of significance.
Analysis of Data set-III and IV via Goodness of Fit: From Table 17, Table 18, Table 22 and Table 23, it is quite evident that the proposed model yields much better goodness-of-fit statistics than its competing distributions and outperforms the competing models in all respects. Further, the VT statistic values in Table 19 and Table 24 support the suitability of the proposed model. Figure 9 and Figure 10 support the numerical results of the applications for data sets III and IV, respectively, which further solidifies the superiority of the SBXLL model over the competitive models.
Figure 11.
Plots of estimated pdf of SBXL and SBXLL models for data set V.
Figure 12.
Plots of estimated cdf of SBXL and SBXLL models for data set V.
Case-III: Biomedical Data Set
Data-V: One of the most serious bacterial diseases in the world is caused by Mycobacterium tuberculosis (MBT). MBT infection affects two billion people, according to estimates. Since MBT is easily transmitted and long-course chemotherapy treatments are challenging to deliver, controlling the disease is a daunting task. Developing short-term antibiotic regimens to reduce the emergence of drug resistance, developing novel medications to treat TB patients, and developing new vaccines with more efficacy than traditional vaccines, such as BCG, are all critically needed new methods for the control of TB. Organs and tissues from guinea pigs are typically utilized in scientific research. Guinea pig blood transfusions and isolated organ preparations, including lung and intestine from the species, are extensively used in studies to develop novel drugs. The fifth data set corresponds to the survival time of the guinea pigs after receiving an injection of a specific amount of MBT in a medical experiment, as recently studied by [] in the context of comparative parameter estimation techniques. Some descriptive measures of the data are reported in Table 25.
Table 25.
Summary statistics of data-V.
The descriptive statistics reveal that data-V has a right-tailed distribution. A higher signifies more varied results when MBT is infused into the bloodstream of guinea pigs. This variability is also reflected in the platykurtic kurtosis value. The results in Table 26 show that both special models, SBXL and SBXLL, have similar properties when fitted to data of this nature.
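Summary measures of the kind discussed here can be computed as sketched below. This is an illustrative example under stated assumptions: the helper `describe_sample` is hypothetical, the choice of variance and coefficient of variation as dispersion measures is an assumption, and the input is a toy sample rather than data-V. SciPy's `kurtosis` returns excess kurtosis by default, so a negative value corresponds to a platykurtic shape and a positive skewness to a right tail.

```python
import numpy as np
from scipy import stats


def describe_sample(x):
    """Return basic location, dispersion and shape measures of a sample."""
    x = np.asarray(x, dtype=float)
    return {
        "mean": x.mean(),
        "variance": x.var(ddof=1),                # unbiased sample variance
        "cv": x.std(ddof=1) / x.mean(),           # coefficient of variation
        "skewness": stats.skew(x),                # > 0 means right-tailed
        "excess_kurtosis": stats.kurtosis(x),     # < 0 means platykurtic
    }


# Toy sample (not the guinea pig data): symmetric and flat-topped.
summary = describe_sample([1.0, 2.0, 3.0, 4.0, 5.0])
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```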
Table 26.
Theoretical statistical measures of SBXL and SBXLL from data-V.
Moreover, Table 27 and Table 28 show that the SBXL and SBXLL models describe the data very well, with minimum values of , the highest p-value of the KS statistic, the lowest values of AD and CVM, and the lowest loss of information.
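The KS, AD and CVM statistics referred to above can be obtained for any fitted continuous model as in the following sketch. It is not the authors' implementation: the plain log-logistic distribution (`scipy.stats.fisk`) stands in for the SBXLL model, the data are synthetic, and the helper `anderson_darling` is a hypothetical name for the standard Anderson–Darling formula.

```python
import numpy as np
from scipy import stats


def anderson_darling(x, cdf):
    """Anderson-Darling statistic A^2 of sample x against a fully specified cdf."""
    u = np.clip(cdf(np.sort(x)), 1e-12, 1 - 1e-12)  # guard against log(0)
    n = u.size
    i = np.arange(1, n + 1)
    return -n - np.mean((2 * i - 1) * (np.log(u) + np.log1p(-u[::-1])))


# Synthetic positive data (assumption); the paper uses real data sets.
rng = np.random.default_rng(1)
data = stats.fisk.rvs(c=3.0, scale=2.0, size=200, random_state=rng)

# Maximum likelihood fit of the stand-in model, with location fixed at 0.
c_hat, loc_hat, scale_hat = stats.fisk.fit(data, floc=0)
fitted = stats.fisk(c_hat, loc=loc_hat, scale=scale_hat)

ks = stats.kstest(data, fitted.cdf)            # Kolmogorov-Smirnov
cvm = stats.cramervonmises(data, fitted.cdf)   # Cramer-von Mises
a2 = anderson_darling(data, fitted.cdf)        # Anderson-Darling

print(f"shape = {c_hat:.3f}, scale = {scale_hat:.3f}")
print(f"KS = {ks.statistic:.4f} (p = {ks.pvalue:.4f})")
print(f"CVM = {cvm.statistic:.4f}, AD = {a2:.4f}")
```

Because the parameters are estimated from the same sample, the reported p-values are only approximate; smaller statistics and larger p-values still indicate a closer fit when models are compared on the same data.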
Table 27.
MLEs and goodness-of-fit related to data set-V for SBXL and SBXLL models.
Table 28.
Comparison of data set V fitting via information criterion of SBXL and SBXLL.
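For reference, information criteria are simple functions of the maximized log-likelihood, the number of parameters and the sample size. The sketch below uses the standard definitions of AIC, BIC, corrected AIC and HQIC; which criteria the paper tabulates is an assumption here, the function name `information_criteria` is hypothetical, and the numerical inputs are made up for illustration.

```python
import math


def information_criteria(loglik, k, n):
    """Standard information criteria from a maximized log-likelihood.

    loglik: maximized log-likelihood, k: number of estimated parameters,
    n: sample size (requires n > k + 1 for the corrected AIC).
    """
    aic = 2 * k - 2 * loglik
    return {
        "AIC": aic,
        "BIC": k * math.log(n) - 2 * loglik,
        "AICc": aic + 2 * k * (k + 1) / (n - k - 1),
        "HQIC": 2 * k * math.log(math.log(n)) - 2 * loglik,
    }


# Illustrative values only (not taken from the paper's tables).
criteria = information_criteria(loglik=-100.0, k=3, n=72)
for name, value in criteria.items():
    print(f"{name}: {value:.3f}")
```

Among candidate models fitted to the same data, the one with the smallest criterion value is preferred, which is the sense in which a model shows the lowest loss of information.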
Furthermore, the VT statistics displayed in Table 29 agree closely with the above results. These results suggest that the proposed model (SBXL) is more appropriate for such a data set.
Table 29.
Vuong’s test applied for the SBXL model on data set V at .
The comparison of VT statistics, presented in Table 30, reasserts the superior behaviour of the proposed SBXLL for the data set.
Table 30.
Vuong’s test applied for the SBXLL model on data set V at .
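A minimal sketch of how a Vuong-type statistic for two non-nested models can be computed is given below. It uses the unadjusted form of the statistic (no correction for the number of parameters); whether the paper applies a correction is not stated here, so this is an assumption. The function name `vuong_test` is hypothetical, the data are synthetic, and the log-logistic and exponential distributions are stand-ins for the proposed and competing models.

```python
import numpy as np
from scipy import stats


def vuong_test(logpdf_a, logpdf_b):
    """Unadjusted Vuong statistic from pointwise log-densities of two models.

    Returns (z, p): z > 0 favours model A, z < 0 favours model B;
    p is the two-sided p-value from the standard normal reference.
    """
    d = np.asarray(logpdf_a, dtype=float) - np.asarray(logpdf_b, dtype=float)
    z = np.sqrt(d.size) * d.mean() / d.std(ddof=0)
    return z, 2 * stats.norm.sf(abs(z))


# Synthetic data (assumption); both models are fitted by maximum likelihood.
rng = np.random.default_rng(2)
data = stats.fisk.rvs(c=3.0, scale=2.0, size=200, random_state=rng)

params_a = stats.fisk.fit(data, floc=0)    # (shape, loc, scale)
params_b = stats.expon.fit(data, floc=0)   # (loc, scale)

z, p = vuong_test(stats.fisk.logpdf(data, *params_a),
                  stats.expon.logpdf(data, *params_b))
print(f"Vuong z = {z:.3f}, two-sided p-value = {p:.4f}")
```

At a chosen significance level, a value of z beyond the upper normal critical value favours the first model, a value below the lower critical value favours the second, and anything in between leaves the two models statistically indistinguishable.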
Analysis of Data set-V via Goodness of Fit: The empirical findings in Table 27 and Table 28 show that the proposed models, SBXL and SBXLL, yield far better goodness-of-fit statistics than their competing models. Moreover, the minimum is significant to the VT statistic value in Table 29 and Table 30, which further strengthens the suitability of the proposed models. Figure 11 and Figure 12 support the results of the application for data set V, which further confirms the superiority of the SBXL and SBXLL models over well-established competing models.
7. Conclusions
This article presents a new family called the Sine Burr family of distributions. Some properties of the proposed family, such as moments and the moment generating function, percentile function, partial moments, order statistics, Lorenz and Bonferroni curves and mean deviation, are discussed. The model parameters are estimated by the maximum likelihood method. Four members of the Sine Burr family are considered: the Sine Burr Lomax, Sine Burr exponential, Sine Burr Rayleigh and Sine Burr log-logistic distributions. Environmental, failure life testing and biomedical experimental data sets are modeled via the Sine Burr Lomax and Sine Burr log-logistic models on four different data sets. In each case, the proposed models produced reliable results with the least loss of information. The fact that the special models stemming from the proposed generalization are flexible enough to model data sets from such diverse fields makes it a promising family for further exploration. In short, we are hopeful that the proposed family, along with its members, will be appealing for extensive applications in numerous fields such as insurance, bio-informatics, economics and queuing theory, as well as meteorology and hydrology.
Author Contributions
Conceptualization, I.E. and M.E.; methodology, H.E.S.; software, S.K.; validation, M.M.A., T.H. and N.A.; formal analysis, S.K. and T.H.; investigation, N.A.; resources, H.E.S.; writing—original draft preparation, M.E. and I.E.; writing—review and editing, M.E., S.K. and I.E.; visualization, N.A.; supervision, N.A.; project administration, H.E.S.; funding acquisition, I.E. All authors have read and agreed to the published version of the manuscript.
Funding
The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia for funding this research work through project number IFP-IMSIU202203.
Data Availability Statement
All the data sets are readily available in the manuscript.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Marshall, A.; Olkin, I. A new method for adding a parameter to a family of distributions with applications to the exponential and Weibull families. Biometrika 1997, 84, 641–652.
- Eugene, N.; Lee, C.; Famoye, F. Beta-normal distribution and its applications. Commun. Stat. Theory Methods 2002, 31, 497–512.
- Cordeiro, G.; de Castro, M. A new family of generalized distributions. J. Stat. Comput. Simul. 2011, 81, 883–898.
- Haq, M.; Elgarhy, M. The odd Fréchet-G family of probability distributions. J. Stat. Appl. Probab. 2018, 7, 189–203.
- Torabi, H.; Montazeri, N.H. The logistic-uniform distribution and its application. Commun. Stat. Simul. Comput. 2014, 43, 2551–2569.
- Cordeiro, G.; Ortega, E.; da Cunha, D.C. The exponentiated generalized class of distributions. J. Data Sci. 2013, 11, 1–27.
- Zubair, A.; Elgarhy, M.; Hamedani, G.; Butt, N. Odd generalized N-H generated family of distributions with application to exponential model. Pak. J. Stat. Oper. Res. 2020, 16, 53–71.
- Alzaatreh, A.; Lee, C.; Famoye, F. A new method for generating families of continuous distributions. Metron 2013, 71, 63–79.
- Badr, M.M.; Elbatal, I.; Jamal, F.; Chesneau, C.; Elgarhy, M. The transmuted odd Fréchet-G family of distributions: Theory and applications. Mathematics 2020, 8, 958.
- Aldahlan, M.A.; Jamal, F.; Chesneau, C.; Elbatal, I.; Elgarhy, M. Exponentiated power generalized Weibull power series family of distributions: Properties, estimation and applications. PLoS ONE 2020, 15, e0230004.
- Bourguignon, M.; Silva, R.B.; Cordeiro, G.M. The Weibull-G family of probability distributions. J. Data Sci. 2014, 12, 1253–1268.
- Cordeiro, G.; Alizadeh, M.; Ortega, E. The exponentiated half-logistic family of distributions: Properties and applications. J. Probab. Stat. 2014, 81, 1–21.
- Hassan, A.S.; Elgarhy, M.; Shakil, M. Type II half Logistic family of distributions with applications. Pak. J. Stat. Oper. Res. 2017, 13, 245–264.
- El-Sherpieny, E.S.A.; Muhammed, H.Z.; Almetwally, E.M. Bivariate Weibull-G family based on copula function: Properties, Bayesian and non-Bayesian estimation and applications. Statistics. Optim. Inf. Comput. 2022, 10, 678–709.
- ElSherpieny, E.A.; Almetwally, E.M. The Exponentiated Generalized Alpha Power Family of Distribution: Properties and Applications. Pak. J. Stat. Oper. Res. 2022, 8, 349–367.
- Alotaibi, N.; Elbatal, I.; Almetwally, E.M.; Alyami, S.A.; Al-Moisheer, A.S.; Elgarhy, M. Truncated Cauchy Power Weibull-G Class of Distributions: Bayesian and Non-Bayesian Inference Modelling for COVID-19 and Carbon Fiber Data. Mathematics 2022, 10, 1565.
- Elbatal, I.; Alotaibi, N.; Almetwally, E.M.; Alyami, S.A.; Elgarhy, M. On Odd Perks-G Class of Distributions: Properties, Regression Model, Discretization, Bayesian and Non-Bayesian Estimation, and Applications. Symmetry 2022, 14, 883.
- Algarni, A.; MAlmarashi, A.; Elbatal, I.; SHassan, A.; Almetwally, E.M.; MDaghistani, A.; Elgarhy, M. Type I half logistic Burr XG family: Properties, Bayesian, and non-Bayesian estimation under censored samples and applications to COVID-19 data. Math. Probl. Eng. 2021, 2021, 5461130.
- Al-Babtain, A.A.; Elbatal, I.; Chesneau, C.; Elgarhy, M. Sine Topp-Leone-G family of distributions: Theory and applications. Open Phys. 2020, 18, 74–593.
- Bantan, R.A.; Chesneau, C.; Jamal, F.; Elgarhy, M. On the Analysis of New COVID-19 Cases in Pakistan Using an Exponentiated Version of the M Family of Distributions. Mathematics 2020, 8, 953.
- Bantan, R.A.; Jamal, F.; Chesneau, C.; Elgarhy, M. A New Power Topp–Leone Generated Family of Distributions with Applications. Entropy 2019, 21, 1177.
- Bantan, R.A.; Jamal, F.; Chesneau, C.; Elgarhy, M. Truncated inverted Kumaraswamy generated family of distributions with applications. Entropy 2019, 21, 1089.
- Tahir, M.H.; Cordeiro, G.M.; Alzaatreh, A.; Mansoor, M.; Zubair, M. The Logistic-X family of distributions and its applications. Commun. Stat.-Theory Methods 2016, 45, 7326–7349.
- Cordeiro, G.M.; Alizadeh, M.; Tahir, H.; Mansoor, M.; Bourguignon, M.; Hamedani, G. The beta odd log-logistic family of distributions. Hacet. J. Math. Stat. 2015, forthcoming.
- Mahdavi, A.; Kundu, D. A new method for generating distributions with an application to exponential distribution. Commun. Stat.-Theory Methods 2017, 46, 6543–6557.
- Elbatal, I.; Aryal, G. A new generalization of the exponential Pareto distribution. J. Inf. Optim. Sci. 2017, 38, 675–697.
- Elbatal, I.; Altun, E.; Afify, A.Z.; Ozel, G. The Generalized Burr XII Power Series Distributions with Properties and Applications. Ann. Data Sci. 2018, 6, 571–597.
- Elbatal, I.; Mansour, M.M.; Ahsanullah, M. The Additive Weibull-Geometric Distribution: Theory and Applications. J. Stat. Theory Appl. 2016, 15, 125–141.
- Shahzad, M.N.; Ullah, E.; Hussanan, A. Beta Exponentiated Modified Weibull Distribution: Properties and Application. Symmetry 2019, 11, 781.
- Kumar, D.; Singh, U.; Singh, S.K. A New Distribution Using Sine Function Its Application to Bladder Cancer Patients Data. J. Stat. Appl. Probab. 2015, 4, 417–427.
- Nadarajah, S.; Kotz, S. Beta Trigonometric Distribution. Port. Econ. J. 2006, 5, 207–224.
- Al-Faris, R.Q.; Khan, S. Sine Square Distribution: A New Statistical Model Based on the Sine Function. J. Appl. Probab. Stat. 2008, 3, 163–173.
- Raab, D.H.; Green, E.H. A cosine approximation to the normal distribution. Psychometrika 1961, 26, 447–450.
- Kharazmi, O.; Saadatinik, A.; Jahangard, S. Odd Hyperbolic Cosine Exponential-Exponential (OHC-EE) Distribution. Ann. Data Sci. 2019, 6, 765–785.
- Kharazmi, O.; Saadatinik, A.; Alizadeh, M.; Hamedani, G.G. Odd hyperbolic cosine-FG (OHC-FG) family of lifetime distributions. J. Stat. Appl. 2018, 18, 387–401.
- Bleed, S.; Abdelali, A. Transmuted Arcsine Distribution Properties and Application. Int. J. Res. 2018, 10, 1–11.
- Wenjing, H.; Afify, Z.; Goual, H. The Arcsine Exponentiated-X Family: Validation and Insurance Application. Complexity 2020, 2020, 8394815.
- Yousof, H.M.; Ahmed, Z.; Hamedani, G.H.; Aryal, G. The Burr X generator of distributions for lifetime data. J. Stat. Theory Appl. 2016, 16, 1–19.
- Fayomi, A.; Khan, S.; Tahir, M.H.; Algarni, A.; Jamal, F.; Abu-Shanab, R. A New Extended Gumbel distribution: Properties and Application. PLoS ONE 2022, 17, e0267142.
- Lee, C.; Famoye, F.; Olumolade, O. Beta-Weibull Distribution: Some Properties and Applications to censored Data. J. Mod. Appl. Stat. Methods 2007, 6, 17.
- Rajab, M.; Aleem, M.; Nawaz, T.; Daniyal, M. On Five Parameter Beta Lomax Distribution. J. Stat. 2013, 20, 102–118.
- Mead, M.E. On Five-Parameter Lomax Distribution: Properties and Applications. Pak. J. Stat. Oper. Res. 2015, 12, 185–199.
- Cordeiro, G.M.; Ortega, E.M.M.; Ramires, T.G. A new generalized Weibull family of distributions: Mathematical properties and applications. J. Stat. Distrib. Appl. 2015, 2, 13.
- Pal, M.; Ali, M.M.; Woo, J. Exponentiated Weibull distribution. Statistica 2006, 66, 139–147.
- Mahmood, Z.; Chesneau, C. A New Sine-G Family of Distributions: Properties and Applications. Bull. Comput. App. Math. 2019, 7, 53–81.
- Gupta, R.D.; Kundu, D. Generalized exponential distribution. Austral N. Z. J. Stat. 1999, 41, 173–188.
- Nadarajah, S.; Bakouch, H.S.; Tahmasbi, R. A generalized Lindley distribution. Sankhya B 2011, 73, 331–359.
- Chesneau, C.; Bakouch, H.S.; Hussain, T.; Para, B.A. The cosine geometric distribution with count data modeling. J. Appl. Stat. 2021, 48, 124–137.
- Hussain, T.; Bakouch, H.S.; Chesneau, C. A new probability model with application to heavy-tailed hydrological data. Environ. Ecol. Stat. 2019, 26, 12–151.
- Hussain, T.; Bakouch, H.S.; Iqbal, Z. A New Probability Model for Hydrologic Events: Properties and Applications. J. Agric. Environ. Stat. 2018, 23, 63–82.
- Vuong, Q.H. Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica 1989, 57, 307–333.
- Ross, M.S. Introductory Statistics, 3rd ed.; Elsevier: Oxford, UK, 2010; p. 365.
- Patil, G.P.; Rao, C.R. Handbook of Statistics 12: Environmental Statistics; Elsevier Science: Amsterdam, The Netherlands, 1994; p. 35.
- Mukherjee, I.; Maiti, S.S.; Singh, V.V. Study on estimators of the PDF and CDF of the one parameter polynomial exponential distribution. arXiv 2020, arXiv:2006.06272v1.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).