1. Introduction
In the current statistical literature, various univariate continuous distributions are employed in a variety of data modeling applications. Nevertheless, the number of available distributions appears insufficient to handle the diverse data found in domains such as medicine, biology, demography, engineering sciences, actuarial science, finance, economics, and reliability [1]. Researchers in statistics and applied mathematics are therefore interested in developing new extended continuous distributions that are more effective for data modeling. Methods for extending well-known distributions include adding parameters, compounding, generating, transforming, and composing. In recent decades, the emergence of new families of continuous distributions has drawn many statisticians to develop novel models. Our specific interest is in the truncated Cauchy power-G (TCP-G) family, which was presented in [2]. The TCP-G family’s cumulative distribution function (CDF) and probability density function (PDF) are defined below:
and
where
and
are the CDF and PDF, respectively, for any baseline distribution, with the set of parameters
and
being a shape parameter of the TCP-G family.
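As a numerical illustration of the family, the following minimal sketch assumes the form commonly reported for the TCP-G family, namely a CDF of (4/π) arctan(G^α) for a baseline CDF value G and shape parameter α. The symbol names, the function names, and the exponential baseline are assumptions made only for this example; they are not taken from the displayed equations.

```python
import math

def tcp_g_cdf(G, alpha):
    """TCP-G CDF evaluated at a baseline CDF value G in [0, 1] (assumed form)."""
    return (4.0 / math.pi) * math.atan(G ** alpha)

def tcp_g_pdf(G, g, alpha):
    """TCP-G PDF from a baseline CDF value G and baseline PDF value g (assumed form)."""
    return 4.0 * alpha * g * G ** (alpha - 1.0) / (math.pi * (1.0 + G ** (2.0 * alpha)))

# Hypothetical baseline chosen only for illustration: standard exponential.
def exp_cdf(x):
    return 1.0 - math.exp(-x)

def exp_pdf(x):
    return math.exp(-x)

x = 1.3
F = tcp_g_cdf(exp_cdf(x), alpha=2.0)
f = tcp_g_pdf(exp_cdf(x), exp_pdf(x), alpha=2.0)
print(F, f)
```

Under this assumed form, the CDF equals 1 when the baseline CDF equals 1, which reflects the truncation of the Cauchy distribution to the unit interval.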
Several studies have built on the TCP-G family, for example, the TCP Weibull-G family [3], TCP inverse exponential distribution [4], TCP odd Fréchet-G family [5], and TCP Lomax distribution [6].
Because of their wide applicability, inverted (or inverse) distributions are essential in many domains, including biological sciences, chemical data analysis, life testing problems, and medical sciences. In terms of the density and hazard rate function (HRF), inverted distributions behave differently from their non-inverted counterparts. Many authors have studied inverted models, such as the inverse power Lindley distribution [7], inverted Kumaraswamy distribution [8], inverted length-biased exponential distribution [9], inverted log-logistic distribution [10], inverted Gompertz distribution [11], inverted Lindley distribution [12], inverted generalized linear exponential distribution [13], inverted Nakagami-m distribution [14], inverted Nadarajah–Haghighi distribution [15], inverse power Maxwell distribution [16], inverse power Lomax distribution [17], discrete inverse Burr distribution [18], discrete inverse Rayleigh distribution [19], inverse Weibull distribution [20], inverse Sushila distribution [21], and inverse log-gamma distribution [22].
The CDF and PDF of the inverted Topp–Leone (ITL) model with shape parameter a, introduced in [23], are given by:
and
Several authors have investigated extensions and generalizations of the ITL distribution, including the power ITL distribution investigated in [24], Kumaraswamy ITL distribution discussed in [25], alpha power ITL distribution proposed in [26], half-logistic ITL distribution suggested in [27], the odd log-logistic Topp–Leone G family in [28], and the Burr III-Topp–Leone-G family in [29].
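For readers who wish to experiment with the ITL baseline, the sketch below assumes the form usually reported for the ITL distribution, with CDF 1 − (1 + 2x)^a / (1 + x)^(2a) for x ≥ 0. The function names are hypothetical, and the closed-form quantile follows from the identity (1 + 2x)/(1 + x)² = 1 − [x/(1 + x)]².

```python
import math

def itl_cdf(x, a):
    """ITL CDF (assumed form): 1 - (1 + 2x)^a / (1 + x)^(2a), for x >= 0."""
    return 1.0 - (1.0 + 2.0 * x) ** a / (1.0 + x) ** (2.0 * a)

def itl_pdf(x, a):
    """ITL PDF, the derivative of itl_cdf with respect to x."""
    return 2.0 * a * x * (1.0 + 2.0 * x) ** (a - 1.0) / (1.0 + x) ** (2.0 * a + 1.0)

def itl_quantile(p, a):
    """Inverse of itl_cdf for 0 <= p < 1, using t = x / (1 + x)."""
    t = math.sqrt(1.0 - (1.0 - p) ** (1.0 / a))
    return t / (1.0 - t)

a = 1.5
median = itl_quantile(0.5, a)
print(median, itl_cdf(median, a))
```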
The primary purpose of this article is to present and examine the statistical properties of a novel two-parameter model known as the truncated Cauchy power-inverted Topp–Leone (TCP-ITL) distribution. The following considerations motivated us to investigate the proposed model:
- The proposed model is flexible, as shown by the various graphical forms of its PDF and HRF. Numerical and graphical analyses of the PDF and HRF reveal features that demonstrate the fitting capability of the TCP-ITL distribution.
- Several statistical features of the TCP-ITL model are derived, including the quantile function (QF), moments and incomplete moments, the moment-generating function, and four types of entropy: the Rényi entropy (RE), Havrda and Charvat entropy (HaChE), Tsallis entropy (TSE), and Arimoto entropy (ArE).
- The statistical inference of the model parameters under complete and hybrid censored data is explored using the maximum likelihood (ML), maximum product of spacing (MPSP), and Bayesian estimation approaches.
- The potential of the TCP-ITL distribution is demonstrated using three real data sets in comparison with the ITL, Kumaraswamy (K), Marshall–Olkin–Kumaraswamy (MOK), beta, alpha power Kumaraswamy (APK), and exponentiated Kumaraswamy (EK) distributions. According to the goodness-of-fit criteria, the proposed distribution provides the best fit among the compared models for the data sets under consideration.
Structure of the Paper
This paper has the following structure. Section 2 describes the construction of the TCP-ITL model and provides its CDF, PDF, reliability function (RF), and HRF, as well as the asymptotes and graphical forms of the PDF and HRF. Section 3 establishes explicit representations of several fundamental properties of the proposed TCP-ITL distribution, such as the QF, a linear representation of the PDF, the rth ordinary and sth incomplete moments, and the moment-generating function. Several entropy measures are derived in Section 4. In Section 5, we estimate the unknown parameters of the TCP-ITL model using three approaches: the ML, MPSP, and Bayesian approaches. In Section 6, a Monte Carlo simulation study is conducted to assess the efficiency of the three estimation methods. In Section 7, we apply the TCP-ITL distribution to three real data sets and compare the proposed model with several well-known models, including the ITL, K, MOK, beta, APK, and EK models. Finally, in Section 8, we offer some concluding remarks on our results.
5. Model Inference and Estimation Method
Hybrid censoring is discussed briefly in this section. Assume that n test units are identically distributed with the PDF and that is a vector of unknown parameters. Let represent the ordered lifetimes of these test units. Recall that, under hybrid censoring, a life test experiment is terminated when either a pre-specified time T is reached or a pre-determined number r of units have failed, whichever occurs first. In this scenario, the experiment is stopped at a random time point , where . As a result, the observed lifetimes under this censoring scheme fall into one of three categories:
- Category I:
, if and (type-I censored).
- Category II:
, if (type-II censored).
- Category III:
, if and (complete sample).
Note that Categories I, II, and III correspond to type-I censoring, type-II censoring, and the complete sample, respectively. Then,
is a censored hybrid sample with the PDF
and the CDF
describing its distribution. The corresponding likelihood function can then be expressed as
For more information on hybrid censored samples, see Balakrishnan and Kundu [36]. As a further example, Bayesian estimation and prediction for a hybrid censored lognormal distribution were obtained in [37].
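The stopping rule described above can be sketched as follows. This is a minimal illustration under the assumption that the experiment stops at the smaller of the r-th ordered failure time and T; the function name and the category labels are hypothetical and only mirror the three categories listed in this section.

```python
def hybrid_censor(lifetimes, r, T):
    """Apply hybrid censoring: stop at min(r-th ordered failure time, T).

    Returns (observed failure times, stopping time, category label).
    """
    x = sorted(lifetimes)
    n = len(x)
    if not 1 <= r <= n:
        raise ValueError("r must satisfy 1 <= r <= n")
    if x[r - 1] <= T:
        # The r-th failure occurs no later than T, so the test stops there.
        observed, stop = x[:r], x[r - 1]
        category = "complete" if r == n else "type-II"
    else:
        # Time T is reached first, with fewer than r failures observed.
        observed = [v for v in x if v <= T]
        stop, category = T, "type-I"
    return observed, stop, category

# Hypothetical lifetimes used only to show the three outcomes.
units = [5.0, 1.0, 3.0, 2.0, 4.0]
print(hybrid_censor(units, r=3, T=10.0))
print(hybrid_censor(units, r=3, T=2.5))
print(hybrid_censor(units, r=5, T=10.0))
```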
To address the estimation problem of the TCP-ITL distribution based on hybrid censored samples, this section uses three estimation methods: maximum likelihood, maximum product of spacing, and Bayesian estimation.
5.1. Maximum Likelihood Estimation
The maximum likelihood estimators (MLEs) of the TCP-ITL distribution parameters are investigated for the case in which both
are unknown. Let
be a random sample from the TCP-ITL distribution, and let
be the parameter vector. The likelihood function of the TCP-ITL distribution under hybrid censored samples takes the form
The log-likelihood
function is defined as follows:
The components of score vector
are given below
and
To obtain the MLEs, the two nonlinear equations obtained by equating (20) and (21) to zero must be solved simultaneously with respect to
and
. The ’maxLik’ package of the R statistical programming language, which uses the Newton–Raphson (NR) method of maximization, can be used to compute the desired MLEs
and
for any given data set.
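The structure of a hybrid censored log-likelihood can be sketched generically. The block below assumes the usual form, in which each observed failure contributes the log-density and each of the n − d surviving units contributes the log-survival probability at the stopping time. It does not reproduce the ’maxLik’ computation; the exponential model is a hypothetical stand-in chosen because its censored MLE has a closed form against which the function can be checked.

```python
import math

def hybrid_loglik(pdf, cdf, observed, stop, n):
    """Log-likelihood, up to an additive constant, of a hybrid censored sample.

    observed: the d failure times seen before the experiment stopped
    stop:     the stopping time (the r-th failure time or T)
    n:        number of units placed on test
    """
    d = len(observed)
    ll = sum(math.log(pdf(x)) for x in observed)
    if n > d:
        ll += (n - d) * math.log(1.0 - cdf(stop))
    return ll

def exp_loglik(rate, observed, stop, n):
    """Hybrid censored log-likelihood for an exponential model (illustration only)."""
    return hybrid_loglik(lambda x: rate * math.exp(-rate * x),
                         lambda x: 1.0 - math.exp(-rate * x),
                         observed, stop, n)

# Hypothetical censored sample: 3 failures out of 5 units, test stopped at time 1.0.
observed, stop, n = [0.2, 0.5, 0.9], 1.0, 5
rate_hat = len(observed) / (sum(observed) + (n - len(observed)) * stop)
print(rate_hat, exp_loglik(rate_hat, observed, stop, n))
```

For the TCP-ITL model, the same function could be passed the model's PDF and CDF, and a numerical optimizer would replace the closed-form estimate.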
5.2. Maximum Product of Spacing Method
If
is a random sample of size n, the uniform spacings can be defined as:
where
denotes the uniform spacings,
,
, and
. This is the general form of MPS with hybrid censored samples.
The maximum product of spacing (MPS) estimators (MPSE) of the TCP-ITL distribution parameters based on hybrid censored samples can be obtained by maximizing
with respect to
a and
. Alternatively, the MPSEs of the TCP-ITL distribution parameters can be obtained by solving the nonlinear equations obtained by equating to zero the derivatives of
with respect to
and
.
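The spacing-based objective can be illustrated for a complete sample. The sketch below assumes spacings of the form F(x_(i)) − F(x_(i−1)) with boundary values 0 and 1; the censored version used in this section would additionally involve the survival probability at the stopping time and is not reproduced here. The identity function on (0, 1) is a placeholder CDF, not the TCP-ITL CDF. The second call uses the flood data of data set III and shows that tied observations give a zero spacing, so the log-product is minus infinity.

```python
import math

def log_spacing_product(cdf, sample):
    """Sum of log spacings D_i = F(x_(i)) - F(x_(i-1)), i = 1, ..., n + 1,
    with F(x_(0)) = 0 and F(x_(n+1)) = 1. Returns -inf if any spacing is zero."""
    u = [0.0] + [cdf(x) for x in sorted(sample)] + [1.0]
    spacings = [u[i] - u[i - 1] for i in range(1, len(u))]
    if any(s <= 0.0 for s in spacings):
        return -math.inf
    return sum(math.log(s) for s in spacings)

def placeholder_cdf(x):
    # Uniform(0, 1) CDF, used only to make the example runnable.
    return x

no_ties = [0.25, 0.5, 0.75]
flood = [0.26, 0.27, 0.30, 0.32, 0.32, 0.34, 0.38, 0.38, 0.39, 0.40,
         0.41, 0.42, 0.42, 0.42, 0.45, 0.48, 0.49, 0.61, 0.65, 0.74]

print(log_spacing_product(placeholder_cdf, no_ties))
print(log_spacing_product(placeholder_cdf, flood))
```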
5.3. Bayesian Estimation
In this subsection, the Bayesian estimates of the model parameters are obtained under the squared error loss function (SELF), which is defined by
where
is an estimator of
. Denote the prior and posterior distributions of
by
and
, respectively. Under the SELF, the Bayesian estimation of any function
of
is given by
The choice of prior distribution is important for the development of Bayes estimators. We investigate this estimation problem under gamma prior distributions; that is, it is assumed here that
a and
follow independent gamma distributions with
, and
, with probability densities given by, respectively,
Using the informative prior (
26) and the likelihood function (
18), the joint posterior density can be derived as follows:
The marginal posterior densities of the parameters
a and
can be derived as
Because the marginal posterior densities in (28) do not correspond to well-known distributions, we use the Metropolis–Hastings sampler to generate values for
a and
using the normal proposal distribution in (
28).
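A random-walk Metropolis–Hastings sampler with a normal proposal can be sketched as follows. The target used here is a hypothetical toy log-posterior (two independent gamma densities), not the TCP-ITL posterior; the step size, the chain length, and the burn-in are likewise assumptions made for illustration. Under the SELF, the Bayes estimate is approximated by the mean of the retained draws.

```python
import math
import random

def metropolis_hastings(log_post, start, step, n_iter, seed=0):
    """Random-walk MH with independent normal proposals for two positive parameters."""
    rng = random.Random(seed)
    current = list(start)
    current_lp = log_post(*current)
    draws = []
    for _ in range(n_iter):
        proposal = [c + rng.gauss(0.0, step) for c in current]
        # Proposals outside the positive quadrant have zero posterior density
        # and are rejected outright.
        if all(p > 0.0 for p in proposal):
            lp = log_post(*proposal)
            # 1 - random() lies in (0, 1], so the logarithm is always defined.
            if math.log(1.0 - rng.random()) < lp - current_lp:
                current, current_lp = proposal, lp
        draws.append(tuple(current))
    return draws

def toy_log_post(a, b):
    # Hypothetical target: independent Gamma(shape 3, rate 2) densities, up to a constant.
    return 2.0 * math.log(a) - 2.0 * a + 2.0 * math.log(b) - 2.0 * b

draws = metropolis_hastings(toy_log_post, start=(1.0, 1.0), step=0.8, n_iter=20000)
kept = draws[2000:]  # discard burn-in
mean_a = sum(d[0] for d in kept) / len(kept)
mean_b = sum(d[1] for d in kept) / len(kept)
print(mean_a, mean_b)
```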
Furthermore, the approach of Chen and Shao [38] is widely used to construct highest posterior density (HPD) intervals for the unknown distribution parameters in Bayesian estimation. For example, using two endpoints from the MCMC sample outputs, the
and
percentiles, a
HPD interval can be produced. The Bayes credible intervals for the
, and
parameters are calculated as follows:
Sorted parameters as , and , and N is the length of MCMC generated.
The symmetric credible intervals of and become and .
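Both kinds of interval can be computed directly from sorted MCMC draws. The sketch below shows an equal-tail (symmetric) credible interval and a shortest-interval search in the spirit of the Chen and Shao approach; the function names and the index conventions are assumptions, and the skewed sample is artificial.

```python
def equal_tail_interval(draws, level=0.95):
    """Credible interval whose endpoints are the lower and upper tail percentiles."""
    s = sorted(draws)
    n = len(s)
    lo = s[int((1.0 - level) / 2.0 * n)]
    hi = s[min(n - 1, int((1.0 + level) / 2.0 * n))]
    return lo, hi

def hpd_interval(draws, level=0.95):
    """Shortest interval among those spanning about `level` of the sorted draws."""
    s = sorted(draws)
    n = len(s)
    k = round(level * n)  # number of index steps covered by each candidate
    widths = [s[j + k] - s[j] for j in range(n - k)]
    j = widths.index(min(widths))
    return s[j], s[j + k]

# Artificial right-skewed sample standing in for MCMC output.
sample = [i ** 2 for i in range(100)]
et = equal_tail_interval(sample)
hpd = hpd_interval(sample)
print(et, hpd)
```

For a skewed posterior, the HPD interval is never longer than an equal-tail interval spanning the same number of sorted draws, which is why the two can differ noticeably.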
6. Simulation
In this section, we conduct a simulation study to compare the performance of the proposed methods. We first simulate the hybrid censored data from the TCP-ITL distribution for different choices of n and r for all cases as:
If , , and 50. While , , and 100.
The time of the hybrid censored sample was changed for each case as follows:
In
Table 4: If
, and 9999 when
. If
, and 999 when
. If
, and 99 when
.
In
Table 5: If
, and 9999 when
. If
, and 99 when
. If
, and 99 when
.
In
Table 6: If
, and 99,999,999 when
. If
, and 99 when
. If
, and 99 when
.
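Simulated lifetimes can be generated by inverting the CDF. The sketch below assumes that the TCP-ITL CDF is the TCP-G form (4/π) arctan(G^α) composed with the ITL baseline G(x) = 1 − (1 + 2x)^a / (1 + x)^(2a); the parameter name alpha, the function names, and the parameter values are assumptions made for illustration, since the displayed equations are not available here.

```python
import math
import random

def tcp_itl_cdf(x, a, alpha):
    """Assumed TCP-ITL CDF: (4 / pi) * arctan(G(x)^alpha) with the ITL baseline G."""
    G = 1.0 - (1.0 + 2.0 * x) ** a / (1.0 + x) ** (2.0 * a)
    return (4.0 / math.pi) * math.atan(G ** alpha)

def tcp_itl_quantile(u, a, alpha):
    """Inverse of tcp_itl_cdf for 0 <= u < 1."""
    p = math.tan(math.pi * u / 4.0) ** (1.0 / alpha)   # baseline CDF value
    t = math.sqrt(1.0 - (1.0 - p) ** (1.0 / a))        # t = x / (1 + x)
    return t / (1.0 - t)

def tcp_itl_sample(n, a, alpha, seed=0):
    """Draw n lifetimes by the inverse-transform method."""
    rng = random.Random(seed)
    return [tcp_itl_quantile(rng.random(), a, alpha) for _ in range(n)]

lifetimes = tcp_itl_sample(50, a=1.5, alpha=2.0)
print(min(lifetimes), max(lifetimes))
```

The generated lifetimes would then be sorted and censored at the chosen r and T to produce a hybrid censored sample.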
In the simulation study, the MLE, MPS, and Bayesian estimation methods are compared. Bayesian and classical estimates are not directly comparable in general; however, when information from the MLE is used to construct the Bayesian estimate, the MLE and Bayesian methods can be compared. Many recent papers have compared the MLE with Bayesian and other estimation methods. The mathematical difference between the two is that, in the Bayesian approach, the parameters are random variables with prior distributions. We used gamma prior distributions with shape and scale hyper-parameters. The hyper-parameters are selected using the information from the MLE together with the moments of the gamma distribution. This method is known as the elicitation of hyper-parameters; see Dey et al. [39].
By equating
and
with the mean and variance of the gamma prior distributions, we obtain
where
N is the number of simulation iterations. Solving the above two equations, the estimated hyper-parameters can be written as
We then compute the MLE and MPS of
a and
using the NR algorithm, and the Bayesian estimates using the Metropolis–Hastings (MH) algorithm, based on 10,000 Monte Carlo simulations. All computations were carried out in the R programming language. We recommend the ’maxLik’ package, which computes the classical estimates using the NR maximization algorithm, see [40], and the ’CODA’ package, which simulates MCMC variates to generate the Bayesian estimates, see [41]. The performance of the MLE, MPS, and Bayesian estimates is assessed using the mean square error (MSE) and the lengths of the interval estimates. For the MLE and MPS, asymptotic confidence intervals were determined, and their length is denoted L.ACI, see [42,43]. For the Bayesian estimation, the length of the credible interval (L.CCI) was obtained. The best method is the one with the smallest MSE and the shortest interval length.
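The elicitation of hyper-parameters described earlier in this section amounts to moment matching. The sketch below assumes a gamma prior parameterized by shape and rate, so that its mean is shape/rate and its variance is shape/rate², and equates these to the sample mean and the sample variance (divisor N − 1) of the simulated MLEs. The parameterization, the function name, and the input values are assumptions made for illustration.

```python
from statistics import mean, variance

def elicit_gamma_hyperparameters(mle_estimates):
    """Match the gamma prior mean and variance to those of the simulated MLEs.

    Returns (shape, rate) under the assumed shape-rate parameterization.
    """
    m = mean(mle_estimates)
    v = variance(mle_estimates)  # sample variance with divisor N - 1
    shape = m * m / v
    rate = m / v
    return shape, rate

# Hypothetical MLEs from three simulation iterations.
shape, rate = elicit_gamma_hyperparameters([1.0, 2.0, 3.0])
print(shape, rate)
```

If a scale parameterization is preferred, the scale is simply the reciprocal of the rate returned here.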
We can draw the following conclusions from
Table 4,
Table 5 and
Table 6. As expected, the proposed estimates of
a and
perform better as n increases in terms of their MSE and interval length; that is, the MSE and the length of the CI decrease as the sample size grows. These results indicate the accuracy and consistency of the estimators. Thus, the three estimation approaches estimate the TCP-ITL distribution parameters well. The Bayesian method performs better than the other methods, and the L.CCI values are smaller than the corresponding L.ACI values.
Figure 6 shows the heat map of the MSE for
Table 4,
Table 5 and
Table 6, including the following:
X-label: Bayes1 is the MSE of a for the Bayesian, Bayes2 is the MSE of for the Bayesian, MPS1 is the MSE of a for the MPS, MPS2 is the MSE of for the MPS, MLE1 is the MSE of a for the MLE, and MLE2 is the MSE of for the MLE.
Y-label: n50c1s1 is the MSE when , , , and the first T value; n50c1s2 is the MSE when , , and the second T value; n50c1s3 is the MSE when , , , and the third T value; n50c2s1 is the MSE when , , , and the first T value; n50c2s2 is the MSE when , , , and the second T value; n50c2s3 is the MSE when , , , and the third T value; n50c3s1 is the MSE when , , , and the first T value; n50c3s2 is the MSE when , , , and the second T value; n50c3s3 is the MSE when , , , and the third T value; n100c1s1 is the MSE when , , , and the first T value; n100c1s2 is the MSE when , , , and the second T value; n100c1s3 is the MSE when , , , and the third T value; n100c2s1 is the MSE when , , , and the first T value; n100c2s2 is the MSE when , , , and the second T value; n100c2s3 is the MSE when , , , and the third T value; n100c3s1 is the MSE when , , , and the first T value; n100c3s2 is the MSE when , , , and the second T value; n100c3s3 is the MSE when , , , and the third T value. The dark color indicates that the MSE value is large, while the light color indicates that the MSE value is small.
7. Application of Real Data
In this section, three real data sets are used to demonstrate the TCP-ITL distribution’s potential. The TCP-ITL distribution is compared with several rival models, including the odd log-logistic modified Weibull (OLLMW) distribution by Saboor et al. [44], Kumaraswamy Weibull (KW) distribution by Cordeiro et al. [45], extended odd Weibull Lomax (EOWL) distribution by Alsuhabi et al. [46], Weibull–Lomax (WL) distribution by Tahir et al. [47], extended Weibull (EW) distribution by Peng et al. [48], modified Kies inverted Topp–Leone (MKITL) distribution by Almetwally et al. [49], inverse Weibull (IW) distribution and X-gamma Lomax (XGL) distribution by Almetwally et al. [50], generalized inverse Weibull (GIW) distribution by De Gusmao et al. [51], and gamma distribution.
For data set I (Global Reserves Natural), given in Table 7, the comparison models and the results of the estimation methods are reported in Table 8.
Table 9, Table 10 and Table 11 provide values for the Cramér–von Mises (CVM), Anderson–Darling (AD), and Kolmogorov–Smirnov (KSD) statistics, along with the KS P-values (PVKS), for all the models fitted to the three real data sets. The tables also report the Akaike information criterion (AIC), corrected Akaike information criterion (CAIC), Bayesian information criterion (BIC), and Hannan–Quinn information criterion (HQIC). The MLEs and standard errors (SE) of the parameters for the models under consideration are also included in these tables. The SE values were obtained as the square roots of the diagonal elements of the inverse of the Hessian matrix, which was computed using the ’maxLik’ package. Compared with all the other models applied to each real data set in Table 9, Table 10 and Table 11, the TCP-ITL distribution has the highest P-value and the lowest KSD, CVM, AD, AIC, BIC, HQIC, and CAIC values.
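The information criteria named above can be computed from the maximized log-likelihood, the number of parameters k, and the sample size n. The sketch below uses the standard textbook definitions, with the CAIC taken as the small-sample corrected AIC; whether these match the exact definitions used for the tables is an assumption, and the input values are hypothetical.

```python
import math

def information_criteria(loglik, k, n):
    """Standard information criteria from a maximized log-likelihood.

    loglik: maximized log-likelihood, k: number of parameters, n: sample size.
    """
    aic = 2.0 * k - 2.0 * loglik
    caic = aic + 2.0 * k * (k + 1) / (n - k - 1)          # corrected AIC
    bic = k * math.log(n) - 2.0 * loglik
    hqic = 2.0 * k * math.log(math.log(n)) - 2.0 * loglik
    return {"AIC": aic, "CAIC": caic, "BIC": bic, "HQIC": hqic}

# Hypothetical fit: log-likelihood of -10 with 2 parameters and 31 observations.
crit = information_criteria(-10.0, k=2, n=31)
print(crit)
```

Smaller values indicate a better trade-off between fit and model complexity, which is how the models are ranked in this section.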
Figure 7,
Figure 8 and
Figure 9 show the fitted empirical CDF, histogram, QQ-plot, and PP-plot for the TCP-ITL distribution for the COVID-19 data of the United Kingdom and Canada.
Table 8, Table 12, and Table 13 report the different estimation methods for the parameters of the TCP-ITL distribution based on the hybrid censored samples. From these results, we note that the SE decreases as the time T and the number of failures r increase. Compared with the MLE, the Bayesian estimation method has the smallest SE for the parameters of the TCP-ITL distribution based on the hybrid censored samples. The MPS is not applicable to data sets I and III because these data sets contain repeated observations.
For the Bayesian estimation, Figure 10, Figure 11 and Figure 12 display the MCMC results used to check convergence; we conclude that the MCMC chains have converged.
7.2. Data Set II
The second data set was obtained from the WHO [52] and covers Senegal for 31 days, from 18 July 2021 to 17 August 2021. The recorded COVID-19 mortality rates are 0.1017, 0.1179, 0.1361, 0.1720, 0.1885, 0.1867, 0.1465, 0.0904, 0.2144, 0.0883, 0.2447, 0.1207, 0.1869, 0.2504, 0.3282, 0.2271, 0.2897, 0.1437, 0.4596, 0.2817, 0.2469, 0.1528, 0.2269, 0.1954, 0.4639, 0.2820, 0.1323, 0.2334, 0.2470, 0.1882, and 0.2022. The mortality rate equation is
Throughout this subsection, we apply the TCP-ITL model to a real-world data set to assess its adaptability. The TCP-ITL model is compared with six other fitted distributions having one, two, or three parameters: the ITL [23], beta, Kumaraswamy (K), Marshall–Olkin–Kumaraswamy (MOK) [53], alpha power Kumaraswamy (APK) [54], and exponentiated Kumaraswamy (EK) [55].
The MLEs of the parameters, with their standard errors (SE), are presented in Table 9 and Table 10. Moreover, the numerical values of the KSD and its PVKS, AIC, BIC, HQIC, and CAIC statistics for the data sets are also presented in Table 9 and Table 10. From these tables, the values of the KSD, AIC, BIC, HQIC, and CAIC are smallest for the TCP-ITL distribution. Thus, the TCP-ITL distribution is a better model for the data sets than the other six models.
Figure 7,
Figure 8 and
Figure 9 display the fitted PDF plots of each data set.
7.3. Data Set III
The third data set, given in Dumonceaux and Antle [56] and Mazucheli et al. [57], includes 20 observations of the maximum flood level (in millions of cubic feet per second) for the Susquehanna River near Harrisburg, Pennsylvania. The data are as follows: 0.26, 0.27, 0.30, 0.32, 0.32, 0.34, 0.38, 0.38, 0.39, 0.40, 0.41, 0.42, 0.42, 0.42, 0.45, 0.48, 0.49, 0.61, 0.65, 0.74.