Article

The Generalized DUS Transformed Log-Normal Distribution and Its Applications to Cancer and Heart Transplant Datasets

by Muhammed Rasheed Irshad 1, Christophe Chesneau 2,*, Soman Latha Nitin 3, Damodaran Santhamani Shibu 3 and Radhakumari Maya 4
1 Department of Statistics, Cochin University of Science and Technology, Cochin 682 022, Kerala, India
2 Department of Mathematics, Université de Caen Basse-Normandie, LMNO, UFR de Sciences, F-14032 Caen, France
3 Department of Statistics, University College, Thiruvananthapuram 695 034, Kerala, India
4 Department of Statistics, Government College for Women, Thiruvananthapuram 695 014, Kerala, India
* Author to whom correspondence should be addressed.
Mathematics 2021, 9(23), 3113; https://doi.org/10.3390/math9233113
Submission received: 20 October 2021 / Revised: 27 November 2021 / Accepted: 1 December 2021 / Published: 2 December 2021
(This article belongs to the Special Issue Mathematical Modeling and Analysis in Biology and Medicine)

Abstract

Many studies have underlined the importance of the log-normal distribution in the modeling of phenomena occurring in biology. With this in mind, in this article we offer a new and motivated transformed version of the log-normal distribution, primarily for use with biological data. The hazard rate function, quantile function, and several other significant aspects of the new distribution are investigated. In particular, we show that the hazard rate function has increasing, decreasing, bathtub, and upside-down bathtub shapes. The maximum likelihood and Bayesian techniques are both used to estimate the unknown parameters. Based on the proposed distribution, we also present a parametric regression model and a Bayesian regression approach. To assess the long-run performance of the estimators, simulation studies based on the maximum likelihood and Bayesian estimation procedures are also conducted. Two real datasets are used to demonstrate the applicability of the new distribution. The efficiency of the third parameter in the new model is tested using the likelihood ratio test. Furthermore, the parametric bootstrap approach is used to determine the effectiveness of the suggested model for the datasets.

1. Introduction

In practice, the log-normal (LN) distribution has a wide variety of applications in an empirical sense for fitting data. In biology, too, there are diverse applications for the LN distribution. The presence of the LN distribution in biological science has been highlighted on numerous occasions. Earlier, in a study of the relationship between genes and characters in quantitative inheritance, [1] utilized the LN theory. The bivariate LN distribution has been examined by [2] in specific references to allometry, the study of biological scaling. In terms of statistical data derived from biological and agricultural sources, [3] provided much more general references. According to [4], a study on the intricacy of the biochemical processes involved in gene expression has induced an emergent LN distribution of expression levels. Again, ref. [5] discovered that a form of the LN distribution fit the postpartum blood loss data from several geographical areas quite well, implying that the LN distribution may fit postpartum blood loss globally.
In practice, the traditional basic distributions often fail to characterize, and do not accurately predict, most real-life datasets arising from complicated phenomena. Since the quality of a statistical analysis depends heavily on the assumed model, selecting an adaptive model for the data is of great importance. For this reason, more flexible related distributions are needed to obtain better and more accurate results. Given the importance of the LN distribution in the biological sciences, it is natural to derive a new extended version of it, not only for modeling biological data but also for datasets from other study areas where the LN distribution fits well. Note that the LN distribution has been used in most applied domains, such as economics, sociology, and meteorology, to name just a few examples. For more applications of the LN distribution in biology as well as in other study areas, see [6,7].
On the mathematical side, the probability density function (pdf) of an LN random variable $W$ is given by
$$q(w) = \frac{1}{\sqrt{2\pi}\,\sigma w} \exp\left\{-\frac{(\log w - \mu)^2}{2\sigma^2}\right\}, \quad w > 0,\ \mu \in \mathbb{R},\ \sigma > 0.$$
Thus, the LN distribution depends on two parameters, a scale parameter μ and a shape parameter σ . Recently, there has been a surge in interest in the art of adding parameters to well-known existing distributions in order to get different shapes of hazard/failure rate functions (i) for applying them in various real-life situations and (ii) for analyzing data with a high degree of skewness and kurtosis. A fair review of some of the extended models is presented in [8]. As in the context of extending or generalizing baseline distributions, several authors have started to develop families of distributions based on conventional distributions or using some other techniques. Thus, in this article we propose a new extended version of the LN distribution by using a transformation technique that includes an additional shape parameter. We aim to reveal some statistical properties of the proposed model and apply them to real-life data. The chief motivations for introducing this extended lifetime model are to (i) propose a new flexible version of the LN distribution that can be used, especially to model biological data, since the LN distribution has eminent superiority in biological sciences and its related fields, and also to be applied in a wider class of other reliability problems, and (ii) to possess some new additional shapes on the hazard/failure rate.
The remaining part of the article is structured as follows. Section 2 describes the construction of the distribution. In Section 3, we define the considered distribution and examine the hazard rate function. The quantile function and some of its associated measures are derived in Section 4. In Section 5, the maximum likelihood (ML) and Bayesian estimation techniques are used to estimate the unknown parameters of the new model. A parametric bootstrap method of simulation using the ML estimates (MLEs) is presented in Section 6. A parametric regression model associated with the new distribution is defined in Section 7, and a Bayesian regression method is presented in Section 8. To analyze the consistency of the ML and Bayesian estimates of the model parameters, two types of simulation studies are conducted in Section 9. In Section 10, we compare the proposed distribution with competing distributions using two real datasets from biological science, one univariate uncensored dataset and one censored dataset. Finally, Section 11 gives the concluding remarks.

2. Construction of the New Distribution

Ref. [9] suggested a transformation method known as the DUS transformation. With the exponential distribution as the baseline, the resulting distribution is termed the DUS exponential (DUSE) distribution. If $G(x)$ is the cumulative distribution function (cdf) of some baseline continuous distribution, then the DUS transformation yields a new cdf given by
$$F(x) = \frac{\exp[G(x)] - 1}{e - 1}, \quad x \in \mathbb{R}.$$
The benefit of this transformation is that the new distribution is parsimonious and computationally convenient, since it contains no parameters other than those of the baseline distribution. Ref. [10] introduced a generalized form of the DUS transformation, again taking the exponential distribution as the baseline distribution. The cdf of the generalized DUS (GDUS) transformation is given by
$$F(x) = \frac{\exp[G^{\alpha}(x)] - 1}{e - 1}, \quad x \in \mathbb{R},\ \alpha > 0.$$
Considering the immense applicability of the LN distribution as specified in the previous section, we propose to apply it as the baseline distribution in the GDUS transformation.

3. Definition of the Distribution

The definition of the new distribution, as well as several key features, are presented in this section. Henceforth, we call the new distribution the generalized DUS transformed log-normal (GDUSLN) distribution, and it is defined as follows:
Definition 1.
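We say that a random variable $X$ follows the GDUSLN distribution with parameters $\alpha$, $\mu$ and $\sigma$ if its cdf is given by
$$F(x) = \frac{\exp\left\{\Phi^{\alpha}\left(\frac{\log x - \mu}{\sigma}\right)\right\} - 1}{e - 1} \tag{1}$$
and its pdf is given by
$$f(x) = \frac{\alpha}{\sigma x (e - 1)}\, \phi\left(\frac{\log x - \mu}{\sigma}\right) \Phi^{\alpha - 1}\left(\frac{\log x - \mu}{\sigma}\right) \exp\left\{\Phi^{\alpha}\left(\frac{\log x - \mu}{\sigma}\right)\right\}, \tag{2}$$
where $x > 0$, $\mu \in \mathbb{R}$ and $\alpha, \sigma > 0$. Furthermore, $\Phi(\cdot)$ and $\phi(\cdot)$ are the cdf and pdf of the standard normal distribution, respectively. It is understood that $F(x) = f(x) = 0$ for $x \le 0$.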
The plots in Figure 1 and Figure 2 portray the corresponding cdf and pdf of the GDUSLN distribution.
We observe that the pdf may be decreasing and unimodal with a certain flexibility in the mode and tails. It is, however, mainly right-skewed or almost symmetrical.
The cdf of the GDUSLN distribution in (1) reduces to the cdf of the DUS transformed log-normal (DUSLN) distribution when $\alpha = 1$. It is worth mentioning that the DUSLN distribution has not been discussed in the available literature.
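As an illustration of Definition 1, the cdf (1) and pdf (2) can be evaluated with a few lines of code. The following is a minimal Python sketch using only the standard library; the function names `gdusln_cdf` and `gdusln_pdf` are hypothetical and are not taken from the article or from any package.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal: .cdf is Phi, .pdf is phi


def gdusln_cdf(x, alpha, mu, sigma):
    """cdf (1): F(x) = (exp(Phi(z)^alpha) - 1) / (e - 1), z = (log x - mu) / sigma."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.expm1(_STD.cdf(z) ** alpha) / (math.e - 1.0)


def gdusln_pdf(x, alpha, mu, sigma):
    """pdf (2), obtained by differentiating the cdf (1) with respect to x."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    big_phi = _STD.cdf(z)
    if big_phi == 0.0:  # far left tail: avoid 0 ** (negative power)
        return 0.0
    return (alpha / (sigma * x * (math.e - 1.0))
            * _STD.pdf(z) * big_phi ** (alpha - 1.0) * math.exp(big_phi ** alpha))


if __name__ == "__main__":
    # Illustrative parameter values only (not fitted values from the article).
    for x in (0.5, 1.0, 2.0, 5.0):
        print(x, gdusln_cdf(x, 2.0, 0.0, 1.0), gdusln_pdf(x, 2.0, 0.0, 1.0))
```

Setting `alpha = 1` in these functions gives the DUSLN special case mentioned above.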

Hazard Rate Function

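The hazard rate function of the GDUSLN distribution is given by
$$h(x) = \frac{f(x)}{S(x)},$$
where $S(x) = 1 - F(x)$ is the survival function, which follows from (1) as
$$S(x) = \frac{e - \exp\left\{\Phi^{\alpha}\left(\frac{\log x - \mu}{\sigma}\right)\right\}}{e - 1}.$$
Thus, the hazard rate function takes the form
$$h(x) = \frac{\alpha\, \phi\left(\frac{\log x - \mu}{\sigma}\right) \Phi^{\alpha - 1}\left(\frac{\log x - \mu}{\sigma}\right) \exp\left\{\Phi^{\alpha}\left(\frac{\log x - \mu}{\sigma}\right)\right\}}{\sigma x \left[e - \exp\left\{\Phi^{\alpha}\left(\frac{\log x - \mu}{\sigma}\right)\right\}\right]}.$$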
Furthermore, plots in Figure 3 refer to the shapes of the hazard rate function.
It is observed that the hazard rate function possesses all the common shapes, such as increasing, decreasing, bathtub, and upside-down bathtub shapes. In this context, one of the innovative features of our model is the ability to design a bathtub-shaped failure rate function with a long flat region. This region, nevertheless, is extremely important in real-world applications, emphasizing the need for proper flat region modeling (see [11]). Furthermore, from Figure 3, it is fascinating to observe that the GDUSLN distribution has a new decreasing–increasing–decreasing shape, which we call the inverted N-shaped hazard rate function, and again possesses a special shape starting with a flat region and continuing with an increasing–decreasing shape, which we call the constant-increasing–decreasing shaped hazard rate function. More elaborately, the following results are observed from Figure 3: The hazard rate function graphs for various combinations of parameters reveal a variety of shapes including increasing ( α = 0.1 , μ 0.9 , σ = 0.09 ), decreasing ( α = 1.5 , μ = 0.7 , σ 1.5 ), bathtub ( α = 0.0001 , μ = 1.5 , 0.04 σ 0.12 ), and upside-down bathtub ( α = 0.2 , 0.5 μ 1.5 , σ = 0.5 ). Furthermore, it can be found that the shapes vary from decreasing to increasing via upside-down bathtub when α 0.1 , μ = 0.01 , and σ = 1.1 .
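To explore such shapes numerically, one can evaluate the hazard rate on a grid. The sketch below is a minimal Python illustration, assuming the survival function $S(x) = 1 - F(x) = \left(e - \exp\{\Phi^{\alpha}((\log x - \mu)/\sigma)\}\right)/(e - 1)$ implied by the cdf (1); the function names and the parameter values are hypothetical choices for illustration.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def gdusln_hazard(x, alpha, mu, sigma):
    """Hazard rate h(x) = f(x) / S(x) for x > 0.

    The common factor 1 / (e - 1) of f and S cancels, leaving
    h(x) = alpha * phi(z) * Phi(z)^(alpha-1) * exp(Phi(z)^alpha)
           / (sigma * x * (e - exp(Phi(z)^alpha))).
    Intended for moderate x, where 0 < Phi(z) < 1 in floating point.
    """
    z = (math.log(x) - mu) / sigma
    big_phi = _STD.cdf(z)
    w = big_phi ** alpha
    numerator = alpha * _STD.pdf(z) * big_phi ** (alpha - 1.0) * math.exp(w)
    denominator = sigma * x * (math.e - math.exp(w))
    return numerator / denominator


def hazard_curve(grid, alpha, mu, sigma):
    """Evaluate the hazard rate at each point of a grid of positive values."""
    return [gdusln_hazard(x, alpha, mu, sigma) for x in grid]


if __name__ == "__main__":
    grid = [0.25 * k for k in range(1, 21)]  # 0.25, 0.50, ..., 5.00
    for x, h in zip(grid, hazard_curve(grid, 1.5, 0.7, 1.5)):
        print(f"{x:5.2f}  {h:.6f}")
```

Plotting the returned values against the grid for different parameter choices gives a quick visual check of the shape of the hazard rate.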

4. Quantile Function and Associated Measures

In this section, we derive an analytical expression for the quantile function of the GDUSLN distribution and some of its associated measures.
Theorem 1.
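Let $p \in (0, 1)$. If $X$ follows the GDUSLN distribution as given in (1), then the $p$th quantile of the distribution is given by $Q_p = F^{-1}(p)$, and, more explicitly,
$$Q_p = \exp\left\{\mu + \sigma\, \Phi^{-1}\left(\left[\log\left(p(e - 1) + 1\right)\right]^{1/\alpha}\right)\right\}, \tag{3}$$
where $\Phi^{-1}(\cdot)$ is the quantile function of the standard normal distribution.
Proof. 
For the GDUSLN distribution, $Q_p$ is the solution of the equation
$$\frac{\exp\left\{\Phi^{\alpha}\left(\frac{\log(Q_p) - \mu}{\sigma}\right)\right\} - 1}{e - 1} = p \iff \Phi\left(\frac{\log(Q_p) - \mu}{\sigma}\right) = \left[\log\left(p(e - 1) + 1\right)\right]^{1/\alpha}. \tag{4}$$
On simplification, (4) reduces to
$$\frac{\log(Q_p) - \mu}{\sigma} = \Phi^{-1}\left(\left[\log\left(p(e - 1) + 1\right)\right]^{1/\alpha}\right) \implies Q_p = \exp\left\{\mu + \sigma\, \Phi^{-1}\left(\left[\log\left(p(e - 1) + 1\right)\right]^{1/\alpha}\right)\right\}.$$
As a remark, since the standard normal quantile function satisfies $\Phi^{-1}(u) = \sqrt{2}\, \mathrm{erf}^{-1}(2u - 1)$, $Q_p$ in Equation (3) also takes the form
$$Q_p = \exp\left\{\mu + \sigma \sqrt{2}\; \mathrm{erf}^{-1}\left(2\left[\log\left(p(e - 1) + 1\right)\right]^{1/\alpha} - 1\right)\right\}, \tag{5}$$
where $\mathrm{erf}^{-1}(\cdot)$ is the inverse error function.
Now, by putting $p = 0.5$ in Equation (5), we get the median of the GDUSLN distribution, which is given by
$$M = Q_{0.5} = \exp\left\{\mu + \sigma \sqrt{2}\; \mathrm{erf}^{-1}\left(2\left[\log\left(\tfrac{1}{2}(e - 1) + 1\right)\right]^{1/\alpha} - 1\right)\right\}.$$
Equation (5) delivers the first and third quartiles of the distribution ($Q_{0.25}$ and $Q_{0.75}$) for $p = 1/4$ and $p = 3/4$, respectively. □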
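Because the quantile function is available in closed form, it can be checked numerically against the cdf and used for inverse-transform sampling. The following minimal Python sketch implements the quantile formula of Theorem 1; the function names are hypothetical, and the inverse-transform sampler is an illustration rather than the simulation code used in the article.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def gdusln_cdf(x, alpha, mu, sigma):
    """cdf (1) of the GDUSLN distribution."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.expm1(_STD.cdf(z) ** alpha) / (math.e - 1.0)


def gdusln_quantile(p, alpha, mu, sigma):
    """Q_p = exp(mu + sigma * Phi^{-1}([log(p (e - 1) + 1)]^(1/alpha)))."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    u = math.log1p(p * (math.e - 1.0)) ** (1.0 / alpha)
    # Keep u strictly inside (0, 1) so that inv_cdf is defined in floating point.
    u = min(max(u, 5e-324), 1.0 - 2.0 ** -53)
    return math.exp(mu + sigma * _STD.inv_cdf(u))


def gdusln_sample(n, alpha, mu, sigma, seed=0):
    """Inverse-transform sampling: X = Q_U with U uniform on (0, 1)."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        p = rng.random()  # lies in [0, 1); skip the endpoint 0
        if p > 0.0:
            out.append(gdusln_quantile(p, alpha, mu, sigma))
    return out


if __name__ == "__main__":
    # Median and quartiles for illustrative parameter values.
    for p in (0.25, 0.5, 0.75):
        print(p, gdusln_quantile(p, 2.0, 0.0, 1.0))
```

A round trip such as `gdusln_cdf(gdusln_quantile(p, ...), ...)` should return `p` up to floating-point error, which is a convenient sanity check of (1) and (3) together.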

5. Estimation of Parameters

In this section, we discuss how to estimate the parameters of the GDUSLN distribution by employing two well-known methods, namely the ML and the Bayesian methods.

5.1. ML Estimation

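In this subsection, we consider the ML estimation for the GDUSLN model parameters $\alpha$, $\mu$ and $\sigma$. Let $X_1, X_2, \ldots, X_n$ be a random sample from the GDUSLN distribution, and let $x_1, x_2, \ldots, x_n$ denote the observed values. Then the log-likelihood function can be written in the following form:
$$L_n = n \log(\alpha) - n \log(\sigma) - n \log(e - 1) - \sum_{i=1}^{n} \log(x_i) + \sum_{i=1}^{n} \log \phi\left(\frac{\log(x_i) - \mu}{\sigma}\right) + (\alpha - 1) \sum_{i=1}^{n} \log \Phi\left(\frac{\log(x_i) - \mu}{\sigma}\right) + \sum_{i=1}^{n} \Phi^{\alpha}\left(\frac{\log(x_i) - \mu}{\sigma}\right). \tag{6}$$
The score function associated with the log-likelihood function is
$$U = \left(\frac{\partial L_n}{\partial \alpha}, \frac{\partial L_n}{\partial \mu}, \frac{\partial L_n}{\partial \sigma}\right)^{T}.$$
Now, the associated nonlinear log-likelihood equations are given by $\partial L_n / \partial \alpha = 0$, $\partial L_n / \partial \mu = 0$ and $\partial L_n / \partial \sigma = 0$, which can be written explicitly as
$$\frac{n}{\alpha} + \sum_{i=1}^{n} \log \Phi\left(\frac{\log(x_i) - \mu}{\sigma}\right) + \sum_{i=1}^{n} \Phi^{\alpha}\left(\frac{\log(x_i) - \mu}{\sigma}\right) \log \Phi\left(\frac{\log(x_i) - \mu}{\sigma}\right) = 0, \tag{7}$$
$$\sum_{i=1}^{n} \frac{\log(x_i) - \mu}{\sigma^{2}} - \frac{\alpha - 1}{\sigma} \sum_{i=1}^{n} \frac{\phi\left(\frac{\log(x_i) - \mu}{\sigma}\right)}{\Phi\left(\frac{\log(x_i) - \mu}{\sigma}\right)} - \frac{\alpha}{\sigma} \sum_{i=1}^{n} \Phi^{\alpha - 1}\left(\frac{\log(x_i) - \mu}{\sigma}\right) \phi\left(\frac{\log(x_i) - \mu}{\sigma}\right) = 0 \tag{8}$$
and
$$-\frac{n}{\sigma} + \sum_{i=1}^{n} \frac{(\log(x_i) - \mu)^{2}}{\sigma^{3}} - \frac{\alpha - 1}{\sigma^{2}} \sum_{i=1}^{n} (\log(x_i) - \mu) \frac{\phi\left(\frac{\log(x_i) - \mu}{\sigma}\right)}{\Phi\left(\frac{\log(x_i) - \mu}{\sigma}\right)} - \frac{\alpha}{\sigma^{2}} \sum_{i=1}^{n} (\log(x_i) - \mu)\, \Phi^{\alpha - 1}\left(\frac{\log(x_i) - \mu}{\sigma}\right) \phi\left(\frac{\log(x_i) - \mu}{\sigma}\right) = 0, \tag{9}$$
respectively.
One obtains the MLEs $(\hat{\alpha}, \hat{\mu}, \hat{\sigma})$ of the GDUSLN model parameters $(\alpha, \mu, \sigma)$ by simultaneously solving the nonlinear Equations (7)–(9).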
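The log-likelihood (6) and the score components (7)–(9) are straightforward to code, and comparing the analytical score with a finite-difference derivative of the log-likelihood is a useful check before handing the problem to an optimizer. The sketch below is a minimal Python illustration with hypothetical function names and made-up data; it does not reproduce the R workflow used in the article.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def loglik(theta, xs):
    """Log-likelihood (6) for theta = (alpha, mu, sigma) and positive data xs."""
    alpha, mu, sigma = theta
    n = len(xs)
    total = n * (math.log(alpha) - math.log(sigma) - math.log(math.e - 1.0))
    for x in xs:
        z = (math.log(x) - mu) / sigma
        big_phi = _STD.cdf(z)
        total += (-math.log(x) + math.log(_STD.pdf(z))
                  + (alpha - 1.0) * math.log(big_phi) + big_phi ** alpha)
    return total


def score(theta, xs):
    """Left-hand sides of the likelihood equations (7), (8) and (9)."""
    alpha, mu, sigma = theta
    n = len(xs)
    d_alpha, d_mu, d_sigma = n / alpha, 0.0, -n / sigma
    for x in xs:
        d = math.log(x) - mu
        z = d / sigma
        big_phi, small_phi = _STD.cdf(z), _STD.pdf(z)
        d_alpha += math.log(big_phi) * (1.0 + big_phi ** alpha)
        # (alpha - 1) * phi / Phi + alpha * Phi^(alpha - 1) * phi
        common = ((alpha - 1.0) * small_phi / big_phi
                  + alpha * big_phi ** (alpha - 1.0) * small_phi)
        d_mu += d / sigma ** 2 - common / sigma
        d_sigma += d ** 2 / sigma ** 3 - d * common / sigma ** 2
    return d_alpha, d_mu, d_sigma


if __name__ == "__main__":
    data = [0.5, 1.2, 2.0, 3.3, 7.1]  # made-up values for illustration
    print(loglik((1.5, 0.3, 0.9), data))
    print(score((1.5, 0.3, 0.9), data))
```

At the MLE, all three components returned by `score` should be numerically close to zero.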
In this paper, the MLEs are found by numerically maximizing the log-likelihood function. To fix lower and upper bounds for each parameter, the “L-BFGS-B” optimization method is used through the fitdistrplus package in R (run in RStudio). The package provides a set of functions, such as fitdist and mledist, for fitting univariate distributions to various types of datasets. When the log-likelihood is maximized, one should carefully choose the initial values and remove the constraints of the parameters (see [12]). The fitdistrplus package is very handy when there are questions about the initial guesses and the convergence of the algorithm. In particular, we use the prefit function of this package, which delivers good starting values for the algorithm. One of the components returned by the mledist function is an integer convergence code: “0” indicates successful convergence, “1” indicates that the maximum iteration limit has been reached, “10” indicates the degeneracy of the algorithm, and “100” indicates that the algorithm encountered an internal error. For more details on this package, see “https://CRAN.R-project.org/package=fitdistrplus (accessed on 4 September 2021)”.
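The asymptotic confidence intervals for the parameters $\alpha$, $\mu$ and $\sigma$ are now derived. From the second partial derivatives of $L_n$ evaluated at $\hat{\Theta} = (\hat{\alpha}, \hat{\mu}, \hat{\sigma})$, the Hessian matrix of the GDUSLN distribution is obtained as
$$H(\hat{\Theta}) = \begin{pmatrix} \dfrac{\partial^2 L_n}{\partial \alpha^2} & \dfrac{\partial^2 L_n}{\partial \alpha\, \partial \mu} & \dfrac{\partial^2 L_n}{\partial \alpha\, \partial \sigma} \\ \dfrac{\partial^2 L_n}{\partial \mu\, \partial \alpha} & \dfrac{\partial^2 L_n}{\partial \mu^2} & \dfrac{\partial^2 L_n}{\partial \mu\, \partial \sigma} \\ \dfrac{\partial^2 L_n}{\partial \sigma\, \partial \alpha} & \dfrac{\partial^2 L_n}{\partial \sigma\, \partial \mu} & \dfrac{\partial^2 L_n}{\partial \sigma^2} \end{pmatrix}.$$
Now, the observed Fisher information matrix $J(\hat{\Theta})$ is obtained by taking the negative of the Hessian matrix. That is,
$$J(\hat{\Theta}) = -H(\hat{\Theta}).$$
In the case of $\alpha = 1$, the second partial derivatives of (6) with respect to the parameters $\mu$ and $\sigma$ are given as follows:
$$\frac{\partial^2 L_n}{\partial \mu^2} = -\frac{1}{\sigma^2} \left[n + \sum_{i=1}^{n} \left(\frac{\log(x_i) - \mu}{\sigma}\right) \phi\left(\frac{\log(x_i) - \mu}{\sigma}\right)\right],$$
$$\frac{\partial^2 L_n}{\partial \sigma^2} = \frac{n}{\sigma^2} - \frac{3}{\sigma^2} \sum_{i=1}^{n} \left(\frac{\log(x_i) - \mu}{\sigma}\right)^{2} - \frac{1}{\sigma^2} \sum_{i=1}^{n} \left(\frac{\log(x_i) - \mu}{\sigma}\right)^{3} \phi\left(\frac{\log(x_i) - \mu}{\sigma}\right) + \frac{2}{\sigma^2} \sum_{i=1}^{n} \left(\frac{\log(x_i) - \mu}{\sigma}\right) \phi\left(\frac{\log(x_i) - \mu}{\sigma}\right)$$
and
$$\frac{\partial^2 L_n}{\partial \mu\, \partial \sigma} = \frac{1}{\sigma^2} \sum_{i=1}^{n} \phi\left(\frac{\log(x_i) - \mu}{\sigma}\right) - \frac{2}{\sigma^2} \sum_{i=1}^{n} \left(\frac{\log(x_i) - \mu}{\sigma}\right) - \frac{1}{\sigma^2} \sum_{i=1}^{n} \left(\frac{\log(x_i) - \mu}{\sigma}\right)^{2} \phi\left(\frac{\log(x_i) - \mu}{\sigma}\right).$$
Clearly, $E[\partial^2 L_n / \partial \mu^2] = -n / \sigma^2 < 0$ and $E[\partial^2 L_n / \partial \sigma^2] = -2n / \sigma^2 < 0$. Hence, the information matrix is non-singular, and the same result follows for the GDUSLN model. Thus, we verified that the MLEs of the GDUSLN model parameters are unique.
Now, the inverse of the observed Fisher information matrix provides the variance-covariance matrix of the MLEs, which is given by
$$\Sigma = J^{-1}(\hat{\Theta}) = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} & \Sigma_{13} \\ \Sigma_{21} & \Sigma_{22} & \Sigma_{23} \\ \Sigma_{31} & \Sigma_{32} & \Sigma_{33} \end{pmatrix},$$
and $\Sigma_{ij} = \Sigma_{ji}$ for $i \ne j = 1, 2, 3$.
The asymptotic normality of MLEs has been thoroughly established. That is, $\hat{\Theta} - \Theta$ asymptotically follows the multivariate normal distribution $N_3(0, \Sigma)$.
Using the following formulae, we calculate the $100 \times (1 - \delta)\%$ asymptotic confidence intervals for the parameters:
$$\alpha \in \hat{\alpha} \pm Z_{\delta/2} \sqrt{\Sigma_{11}}, \quad \mu \in \hat{\mu} \pm Z_{\delta/2} \sqrt{\Sigma_{22}} \quad \text{and} \quad \sigma \in \hat{\sigma} \pm Z_{\delta/2} \sqrt{\Sigma_{33}},$$
where $Z_{\delta}$ is the upper $\delta$th percentile of the standard normal distribution.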

5.2. Bayesian Estimation

In this subsection, we perform the Bayesian analysis for the GDUSLN model parameters. To do so, each parameter should have a prior density. We employ two types of priors for this: the half-Cauchy (HC) and the normal (N) priors. The pdf of the HC distribution with scale parameter $a$ is defined as
$$f_{HC}(x^{*}) = \frac{2a}{\pi\left(x^{*2} + a^{2}\right)}, \quad x^{*} > 0,\ a > 0. \tag{10}$$
The HC distribution has neither a mean nor a variance, and its mode is equal to 0. With scale $a = 25$, the pdf of the HC distribution is nearly flat but not completely flat, which provides enough information for the numerical approximation algorithm to continue exploring the target posterior pdf; for this reason, the HC distribution with $a = 25$ is recommended as a noninformative prior. Ref. [13] suggested the uniform distribution or, when more information is required, the HC distribution as a superior alternative. As a result, for the parameters $\alpha$ and $\sigma$, the HC distribution with $a = 25$ is chosen as a noninformative prior distribution in this article. Thus, we set the prior distributions of the parameters to be
$$\mu \sim N(0, 1000), \qquad \alpha, \sigma \sim HC(25). \tag{11}$$
The log-likelihood function $L_n$ of the GDUSLN distribution is given in Equation (6). Now, using (6) and (11), we obtain the joint posterior pdf, up to a normalizing constant, as
$$\pi(\mu, \alpha, \sigma \mid x) \propto \exp(L_n) \times \pi(\mu) \times \pi(\alpha) \times \pi(\sigma). \tag{12}$$
From (12), it is obvious that there is no analytical solution to find out the Bayesian estimates. Thus, we use a remarkable method of simulation, namely the Metropolis-Hastings algorithm of the Markov Chain Monte Carlo (MCMC) method.
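To make the sampling step concrete, the following is a minimal Python sketch of a random-walk Metropolis–Hastings sampler for the posterior of $(\alpha, \mu, \sigma)$. It is an illustration under stated assumptions, not the implementation used in the article: the value 1000 in the normal prior is interpreted as a variance, the proposal is a symmetric Gaussian random walk with hand-picked step sizes, and all function names and data values are hypothetical.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()
_LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def log_posterior(theta, xs, hc_scale=25.0, mu_var=1000.0):
    """Unnormalized log posterior: log-likelihood plus log priors."""
    alpha, mu, sigma = theta
    if alpha <= 0.0 or sigma <= 0.0:
        return -math.inf
    n = len(xs)
    lp = n * (math.log(alpha) - math.log(sigma) - math.log(math.e - 1.0))
    for x in xs:
        z = (math.log(x) - mu) / sigma
        big_phi = _STD.cdf(z)
        if big_phi <= 0.0:
            return -math.inf
        lp += (-math.log(x) - 0.5 * z * z - _LOG_SQRT_2PI
               + (alpha - 1.0) * math.log(big_phi) + big_phi ** alpha)
    lp += -mu * mu / (2.0 * mu_var)              # N(0, mu_var) prior; variance is an assumption
    lp += -math.log(alpha ** 2 + hc_scale ** 2)  # half-Cauchy prior on alpha
    lp += -math.log(sigma ** 2 + hc_scale ** 2)  # half-Cauchy prior on sigma
    return lp


def metropolis_hastings(xs, init, n_iter=2000, steps=(0.2, 0.1, 0.1), seed=1):
    """Random-walk Metropolis-Hastings; returns the chain and the acceptance rate."""
    rng = random.Random(seed)
    current = tuple(init)
    current_lp = log_posterior(current, xs)
    if current_lp == -math.inf:
        raise ValueError("initial value has zero posterior density")
    chain, accepted = [], 0
    for _ in range(n_iter):
        proposal = tuple(c + rng.gauss(0.0, s) for c, s in zip(current, steps))
        proposal_lp = log_posterior(proposal, xs)
        # The proposal is symmetric, so the acceptance ratio is the posterior ratio.
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
            accepted += 1
        chain.append(current)
    return chain, accepted / n_iter


if __name__ == "__main__":
    data = [0.8, 1.1, 1.9, 2.5, 3.0, 4.2, 0.6, 1.4]  # made-up values
    chain, rate = metropolis_hastings(data, (1.0, 0.5, 1.0))
    kept = chain[len(chain) // 2:]                   # discard the first half as burn-in
    print("acceptance rate:", rate)
    print("posterior means:", [sum(t[k] for t in kept) / len(kept) for k in range(3)])
```

In practice, the step sizes, the number of iterations, and the burn-in length would need to be tuned and the chain checked for convergence.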

6. Bootstrap Confidence Intervals

In this section, we use the parametric bootstrap method to approximate the distribution of the MLEs of the GDUSLN model parameters. Then, we can use the bootstrap distribution to estimate confidence intervals of each parameter for the fitted GDUSLN distribution. Let $\hat{\Theta}$ be an MLE of the set of parameters of interest $\Theta = (\alpha, \mu, \sigma)$ using a given dataset $\{x_1, x_2, \ldots, x_n\}$. The bootstrap is a method to estimate the distribution of the statistic $\hat{\Theta}$ by getting a random sample $\Theta_1^{*}, \Theta_2^{*}, \ldots, \Theta_B^{*}$ for $\Theta$ based on $B$ random samples that are drawn with replacement from $\{x_1, x_2, \ldots, x_n\}$; see [14]. The bootstrap sample $\Theta_1^{*}, \Theta_2^{*}, \ldots, \Theta_B^{*}$ can be used to construct bootstrap confidence intervals for the parametric set $\Theta = (\alpha, \mu, \sigma)$ of the GDUSLN distribution.
Thus, using the following formulae, we calculate the $100 \times (1 - \delta)\%$ bootstrap confidence intervals for the parameters:
$$\alpha \in \hat{\alpha} \pm z_{\delta/2}\, \widehat{se}_{\alpha, boot}, \quad \mu \in \hat{\mu} \pm z_{\delta/2}\, \widehat{se}_{\mu, boot}, \quad \sigma \in \hat{\sigma} \pm z_{\delta/2}\, \widehat{se}_{\sigma, boot},$$
where $z_{\delta}$ denotes the $\delta$th percentile of the bootstrap sample and, for $\theta \in \{\alpha, \mu, \sigma\}$,
$$\widehat{se}_{\theta, boot} = \sqrt{\frac{1}{B} \sum_{b=1}^{B} \left(\theta_b^{*} - \frac{1}{B} \sum_{b=1}^{B} \theta_b^{*}\right)^{2}}.$$
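The bootstrap standard error above is simply the standard deviation (with divisor $B$) of the bootstrap replicates of an estimate. A minimal Python sketch of that computation, and of the resulting interval, is given below; the function names are hypothetical, the replicate values are made up, and the multiplier `z` is passed in by the user rather than fixed by the code.

```python
import math


def bootstrap_se(boot_estimates):
    """Square root of (1/B) * sum_b (theta*_b - mean(theta*))^2."""
    b = len(boot_estimates)
    mean = sum(boot_estimates) / b
    return math.sqrt(sum((t - mean) ** 2 for t in boot_estimates) / b)


def bootstrap_interval(estimate, boot_estimates, z):
    """Interval estimate -/+ z * se_boot for a user-supplied multiplier z."""
    se = bootstrap_se(boot_estimates)
    return estimate - z * se, estimate + z * se


if __name__ == "__main__":
    replicates = [1.92, 2.10, 2.05, 1.88, 2.21, 1.97]  # made-up bootstrap MLEs
    print(bootstrap_se(replicates))
    print(bootstrap_interval(2.0, replicates, z=1.96))
```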

7. GDUSLN Regression Model

In this section, we define a regression model based on the GDUSLN distribution, called the GDUSLN regression model. To construct it, we consider a random variable $X$ following the GDUSLN distribution with pdf as given in (2), and we define another random variable $Y$ as $Y = \log(X)$. Then $Y$ has the following pdf:
$$f_Y(y) = \frac{\alpha}{\sigma (e - 1)}\, \phi\left(\frac{y - \mu}{\sigma}\right) \Phi^{\alpha - 1}\left(\frac{y - \mu}{\sigma}\right) \exp\left\{\Phi^{\alpha}\left(\frac{y - \mu}{\sigma}\right)\right\}, \tag{13}$$
where $y \in \mathbb{R}$, the shape parameter $\alpha > 0$, the location parameter $\mu \in \mathbb{R}$, and the scale parameter $\sigma > 0$. We refer to the distribution in Equation (13) as the Log-GDUSLN (Log GDUS log-normal) distribution or, alternatively, the GDUS normal (GDUSN) distribution. It is worth mentioning that the GDUSN distribution is not covered in any of the existing literature. In this setting, the standardized random variable $Z = (Y - \mu) / \sigma$ has the pdf given by
$$f_Z(z) = \frac{\alpha}{e - 1}\, \phi(z)\, \Phi^{\alpha - 1}(z) \exp\left\{\Phi^{\alpha}(z)\right\}. \tag{14}$$
Now, the linear location-scale regression model linking the response variable, say $y_i$, and the explanatory variable vector, say $v_i^{T} = (v_{i1}, v_{i2}, \ldots, v_{ip})$, is obtained as:
$$y_i = \mu_i + \sigma z_i, \quad i = 1, 2, \ldots, n, \tag{15}$$
where $z_i$ is the random error component with the pdf given in (14), $\mu_i = v_i^{T} \tau$ is the location parameter of $y_i$, with $\tau = (\tau_1, \tau_2, \ldots, \tau_p)^{T}$, and $\alpha$ and $\sigma$ are unknown parameters. The linear model $\mu = V \tau$ represents the location parameter vector $\mu = (\mu_1, \mu_2, \ldots, \mu_n)^{T}$, where $V = (v_1, v_2, \ldots, v_n)^{T}$ is a known model matrix.
Ultimately, in this article, we propose the GDUSLN regression model from (15), which is given by
$$x_i = \exp(y_i) = \exp(\mu_i + \sigma z_i), \quad i = 1, 2, \ldots, n. \tag{16}$$
Consider a sample $(x_1, v_1), (x_2, v_2), \ldots, (x_n, v_n)$ of $n$ independent observations. Here, the usual likelihood estimation approach can be used. Now, for the vector of parameters $\psi = (\tau^{T}, \alpha, \sigma)^{T}$ from model (16), the total log-likelihood function for right-censored data has the form
$$l(\psi) = \log \prod_{i=1}^{n} \left[f(x_i)\right]^{\delta_i} \left[S(x_i)\right]^{1 - \delta_i} = \sum_{i=1}^{n} \delta_i \log f(x_i) + \sum_{i=1}^{n} (1 - \delta_i) \log S(x_i),$$
with $\delta_i = 1$ if the survival time is observed (uncensored) and $\delta_i = 0$ otherwise (censored). Furthermore, for $i = 1, 2, \ldots, n$, $f(x_i)$ and $S(x_i)$ are the pdf and survival function of the GDUSLN distribution taken at $x_i$, respectively.
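The right-censored log-likelihood can be coded directly from the pdf (2) and the survival function $S(x) = 1 - F(x)$, with the location $\mu_i = v_i^{T}\tau$ computed per observation. The sketch below is a minimal Python illustration; the function name, argument layout, and data values are hypothetical, and maximization over $(\tau, \alpha, \sigma)$ is left to whatever optimizer is at hand.

```python
import math
from statistics import NormalDist

_STD = NormalDist()
_LOG_E_MINUS_1 = math.log(math.e - 1.0)


def censored_loglik(tau, alpha, sigma, times, covariates, events):
    """Sum of delta_i * log f(x_i) + (1 - delta_i) * log S(x_i).

    times      : observed times x_i (positive)
    covariates : rows v_i, each the same length as tau
    events     : delta_i, 1 if uncensored and 0 if right-censored
    """
    total = 0.0
    for x, v, delta in zip(times, covariates, events):
        mu = sum(t * vj for t, vj in zip(tau, v))   # mu_i = v_i^T tau
        z = (math.log(x) - mu) / sigma
        big_phi = _STD.cdf(z)
        w = big_phi ** alpha
        if delta == 1:
            # log of the GDUSLN pdf (2) at x_i
            total += (math.log(alpha) - math.log(sigma) - math.log(x) - _LOG_E_MINUS_1
                      + math.log(_STD.pdf(z)) + (alpha - 1.0) * math.log(big_phi) + w)
        else:
            # log S(x_i) with S = (e - exp(Phi^alpha)) / (e - 1)
            total += math.log(math.e - math.exp(w)) - _LOG_E_MINUS_1
    return total


if __name__ == "__main__":
    # Made-up data: an intercept plus one covariate.
    times = [1.5, 2.0, 0.7, 3.2]
    covariates = [(1.0, 0.2), (1.0, 1.1), (1.0, -0.4), (1.0, 0.9)]
    events = [1, 0, 1, 1]
    print(censored_loglik((0.5, 0.1), 1.5, 0.8, times, covariates, events))
```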

8. Bayesian Regression Model

The Bayesian technique is shown to be particularly effective in analyzing survival models in many practical circumstances. Ergo, in this section, we will look at how the Bayesian approach fits the regression model based on the GDUSLN distribution when prior pieces of information about the parameters are taken into account. Accordingly, for the purpose of Bayesian analysis of this model, we implemented a simulation method.
Now, to perform a Bayesian analysis, one should adopt prior distributions for the parameters. Here, similar to Section 5.2, we utilized two different prior distributions, the HC and N priors. The pdf of the HC distribution with $a$ as the scale parameter is given in Equation (10). Now, we write the right-censored likelihood function as
$$L = \prod_{i=1}^{n} \left[f(x_i)\right]^{\delta_i} \left[S(x_i)\right]^{1 - \delta_i}, \tag{17}$$
with $\delta_i = 1$ if the survival time is observed (uncensored) and $\delta_i = 0$ otherwise (censored). Furthermore, for $i = 1, 2, \ldots, n$, $f(x_i)$ and $S(x_i)$ are the pdf and survival function of the GDUSLN distribution taken at $x_i$, respectively. We use the link function specified by
$$\mu = V \tau \tag{18}$$
as a linear combination of explanatory variables. Thus, we set the prior distributions of the parameters to be
$$\tau_j \sim N(0, 1000), \quad j = 1, 2, \ldots, J; \qquad \alpha, \sigma \sim HC(25). \tag{19}$$
Now, using (17)–(19), the joint posterior pdf is obtained as
$$\pi(\tau, \alpha, \sigma \mid x, V) \propto L(x \mid V, \tau, \alpha, \sigma) \times \pi(\tau) \times \pi(\alpha) \times \pi(\sigma). \tag{20}$$
From Equation (20), it is clear that the analytical solution is not possible to find out the Bayesian estimates. Thus, similar to Section 5.2, we use the method of simulation, namely, the Metropolis–Hastings algorithm of the MCMC method.

9. Performance of the Estimates Using Simulation Study

In this section, we conduct simulation experiments to assess the long-run performances of ML and Bayesian estimates of the GDUSLN distribution parameters for some finite sample sizes. We have generated samples of sizes n = 50 , 100 , 250 , 500 , 750 , and 1000 from the GDUSLN distribution using various values of parameters.

9.1. Simulation Study for the MLE

Here, the iteration is conducted 1001 times. Thus, we computed the average of the biases, mean squared errors (MSEs), coverage probabilities (CPs), and average lengths (ALs) of each parameter estimate for all replications in the respective sample sizes.
The average biases and MSEs of the simulated estimates are computed by the following formulae:
  • Average bias $= \dfrac{1}{1001} \sum_{i=1}^{1001} (\hat{\theta}_i - \theta)$, and
  • Average MSE $= \dfrac{1}{1001} \sum_{i=1}^{1001} (\hat{\theta}_i - \theta)^{2}$,
where $i$ indexes the iterations, $\theta \in \{\alpha, \mu, \sigma\}$, and $\hat{\theta}_i$ is the estimate of $\theta$ in the $i$th iteration. The results of each parameter set are reported in Table 1, Table 2, Table 3 and Table 4.
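The two summary measures above are simple averages over the replications. A minimal Python sketch is given below; the function names and the list of estimates are hypothetical stand-ins for the output of a simulation loop.

```python
def average_bias(estimates, true_value):
    """(1/R) * sum_i (theta_hat_i - theta) over R replications."""
    return sum(e - true_value for e in estimates) / len(estimates)


def average_mse(estimates, true_value):
    """(1/R) * sum_i (theta_hat_i - theta)^2 over R replications."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)


if __name__ == "__main__":
    alpha_hats = [0.9, 1.1, 1.2]  # made-up estimates of alpha = 1.0
    print(average_bias(alpha_hats, 1.0), average_mse(alpha_hats, 1.0))
```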
It can be observed that with the increase in sample size, the MSEs and the ALs corresponding to each estimate fall. Furthermore, the CPs of the confidence intervals for each parameter are fairly close to the 95 % nominal levels. This confirms the consistent performance of MLEs of the GDUSLN distribution.

9.2. Simulation Study for Bayesian Estimates

We consider the prior distributions for the GDUSLN parameters as given in Section 5.2. Hence, here we iterated each sample 10,001 times. For each parameter set of respective sample sizes, the posterior summary results such as mean, standard deviation (SD), Monte Carlo error (MCE), 95 % confidence interval (CI), and median are presented in Table 5, Table 6, Table 7 and Table 8.
It is observed that the SD and MCE decrease as the sample size increases, which indicates the consistency of the Bayesian estimates of the GDUSLN distribution.

10. Applications and Empirical Study

This section demonstrates the empirical importance of the GDUSLN distribution. We consider two real datasets from the area of biological science. One is the univariate cancer survival dataset, which is used to compare the data modeling ability of the GDUSLN distribution with that of some competing distributions, and the other is the heart transplant dataset, which is used for the regression study. We use the RStudio software for the numerical evaluation of these datasets.

10.1. Cancer Survival Data

First, we utilize the dataset from [15] as a biological dataset, which represents an uncensored univariate dataset comprised of the remission times (in months) of a random sample of 128 bladder cancer patients. The descriptive measures of the real dataset, which include the sample size ($n$), minimum (min), first quartile ($Q_1$), median (Md), third quartile ($Q_3$), maximum (max), and inter-quartile range (IQR), are given in Table 9.
We also investigate the empirical hazard rate function for the biology dataset using the idea of a total time on test (TTT) plot. The TTT plot is a graph being used to distinguish between several types of aging as displayed in the hazard rate shapes. The common shapes of the hazard rate possess constant, increasing, decreasing, bathtub, and upside-down bathtub shapes, and can be identified by using the TTT plot by the following methods:
  • A plot around the diagonal indicates a constant hazard rate, that is, the failure times can be considered exponentially distributed.
  • A concave plot (above the diagonal) indicates an increasing hazard rate function.
  • A convex plot (under the diagonal) indicates a decreasing hazard rate function.
  • A plot which first is convex, and then concave indicates a bathtub shaped hazard rate function.
  • A plot which first is concave, and then convex indicates an upside-down bathtub shaped hazard rate function.
For more details about the TTT plot, see [16]. The TTT plot is drawn by plotting
$$T\left(\frac{i}{n}\right) = \frac{\sum_{r=1}^{i} x_{r:n} + (n - i)\, x_{i:n}}{\sum_{r=1}^{n} x_{r:n}}$$
against $i/n$, where $i = 1, 2, \ldots, n$ and $x_{r:n}$, $r = 1, 2, \ldots, n$, are the order statistics of the sample.
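The coordinates of the TTT plot follow directly from this formula once the sample is sorted. The sketch below is a minimal Python illustration; the function name `ttt_points` and the example data are hypothetical.

```python
def ttt_points(data):
    """Return the pairs (i/n, T(i/n)) for i = 1, ..., n.

    T(i/n) = (sum_{r<=i} x_{r:n} + (n - i) * x_{i:n}) / sum_{r<=n} x_{r:n},
    where x_{1:n} <= ... <= x_{n:n} are the order statistics.
    """
    xs = sorted(data)
    n = len(xs)
    total = sum(xs)
    points = []
    running = 0.0
    for i, x in enumerate(xs, start=1):
        running += x                      # sum of the i smallest observations
        points.append((i / n, (running + (n - i) * x) / total))
    return points


if __name__ == "__main__":
    for u, t in ttt_points([3.0, 1.0, 2.0]):
        print(u, t)
```

Plotting the second coordinate against the first, together with the diagonal, gives the TTT plot whose position relative to the diagonal is interpreted as in the list above.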
Thus, the plot in Figure 4 indicates that this dataset represents an upside-down bathtub shaped hazard rate function. This case is covered by the characteristics of the GDUSLN distribution.
To show the potential advantage of the GDUSLN distribution, the following distributions are considered for comparison:
  • The two-parameter LN distribution.
  • The exponentiated LN (ELN) distribution or, alternatively, the log-power-normal distribution (see [17]) with pdf
    $$f(x) = \frac{\alpha}{x \sigma}\, \phi\left(\frac{\log x - \mu}{\sigma}\right) \left[\Phi\left(\frac{\log x - \mu}{\sigma}\right)\right]^{\alpha - 1}, \quad x > 0,\ \mu \in \mathbb{R},\ \alpha, \sigma > 0.$$
  • The generalized half-normal (GHN) distribution (see [18]) with pdf
    $$f(x) = \sqrt{\frac{2}{\pi}}\, \frac{\alpha}{x} \left(\frac{x}{\sigma}\right)^{\alpha} \exp\left\{-\frac{1}{2} \left(\frac{x}{\sigma}\right)^{2\alpha}\right\}, \quad x, \alpha, \sigma > 0.$$
  • The new generalized Lindley distribution (NGLD) (see [19]) with pdf
    $$f(x) = \frac{e^{-\mu x}}{1 + \mu} \left[\frac{\mu^{\alpha + 1} x^{\alpha - 1}}{\Gamma(\alpha)} + \frac{\mu^{\sigma} x^{\sigma - 1}}{\Gamma(\sigma)}\right], \quad x > 0,\ \alpha, \mu, \sigma > 0,$$
    where $\Gamma(p) = \int_{0}^{\infty} t^{p - 1} e^{-t}\, dt$.
  • The modified Weibull (MoW) distribution (see [20]) with pdf
    $$f(x) = \mu \sigma \left(\frac{x}{\alpha}\right)^{\mu - 1} \exp\left\{\left(\frac{x}{\alpha}\right)^{\mu} + \alpha \sigma \left(1 - e^{(x/\alpha)^{\mu}}\right)\right\}, \quad x > 0,\ \alpha, \mu, \sigma > 0.$$
  • The Weibull distribution with pdf
    $$f(x) = \frac{\alpha}{\sigma} \left(\frac{x}{\sigma}\right)^{\alpha - 1} e^{-(x/\sigma)^{\alpha}}, \quad x > 0,\ \alpha, \sigma > 0.$$
We compare the competing models with the proposed model using the following statistical tools: negative log-likelihood ($-\log L$), Kolmogorov–Smirnov (KS), Cramér–von Mises ($W^{*}$), and Anderson–Darling ($A^{*}$) statistics, and the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values. Table 10 and Table 11 display the corresponding MLEs and goodness-of-fit (GOF) statistics of the considered distributions for the bladder cancer dataset.
From these tables, we see that the GOF statistics values of the GDUSLN distribution are smaller than those of the other compared distributions. It can also be noted that the optimization algorithm possesses successful convergence as indicated in Section 5.1.
The empirical cdf and quantile-quantile (Q-Q) plots for the real dataset are given in Figure 5.
This figure shows some nice-shaped curves for those empirical and fitted functions. Thus, we conclude that the GDUSLN distribution is the most suitable distribution for this dataset compared to that of the other distributions.
Now, the Hessian matrix corresponding to the bladder cancer dataset is obtained as
$$H(\Theta) = \begin{pmatrix} 1918.1947 & 407.3235 & 825.5481 \\ 407.3235 & 126.7731 & 77.560 \\ 825.5481 & 77.560 & 620.8239 \end{pmatrix},$$
and the corresponding estimated variance-covariance matrix is
$$\Sigma = \begin{pmatrix} 0.0332 & 0.0863 & 0.0334 \\ 0.0863 & 0.2326 & 0.0856 \\ 0.0334 & 0.0856 & 0.0353 \end{pmatrix}.$$
It is observed that the determinant value of the observed information matrix ( | J ( Θ ^ ) | ) is non-zero, and hence satisfies the non-singularity condition of the information matrix. Now, Table 12 provides the 95 percent asymptotic confidence intervals for the GDUSLN parameters.
Next, we focus on estimating the parameters of the GDUSLN distribution using the Bayesian procedure based on the above discussed univariate bladder cancer survival dataset. In the context of Bayesian estimation, the analysis was performed using the Metropolis–Hastings algorithm of the MCMC method with 1000 iterations. For comparing Bayes estimates with the MLEs, both the estimates of the GDUSLN parameters for the real dataset are given in Table 13. The numerical computations on Bayesian estimation are done using RStudio software.

10.1.1. Results on Bootstrap Confidence Intervals

In this subsection, for the considered dataset, we utilize the computed MLEs to construct the 95 percent bootstrap confidence intervals for the parameters $\alpha$, $\mu$, and $\sigma$. Based on the GDUSLN distribution, we simulate 1001 samples of the same size as the real dataset, with the true values of the parameters chosen as the MLEs of the respective parameters. For each sample obtained, we calculate the MLEs $\hat{\alpha}_b^{*}$, $\hat{\mu}_b^{*}$ and $\hat{\sigma}_b^{*}$, for $b \in \{1, 2, \ldots, 1001\}$. Table 14 shows the median and 95 percent bootstrap confidence interval for the parameters $\alpha$, $\mu$ and $\sigma$ of the dataset.
It is also fascinating to look at the joint distribution of the bootstrapped values in a matrix of scatter plots to determine the potential structural correlation among the parameters. The matrix scatterplots of the bootstrapped values of the GDUSLN parameters, which portray the joint uncertainty distribution of the fitted parameters, are displayed in Figure 6.
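The parametric bootstrap procedure described in this subsection can be sketched as follows. Because simulating from and fitting the GDUSLN distribution requires its quantile function and a numerical optimizer, this sketch deliberately substitutes the plain log-normal (LN) model, whose MLEs have a closed form; it uses the LN estimates from Table 10 and the sample size n = 128 from Table 9. The function names, the seed, and the percentile rule for the interval are assumptions made for illustration.

```python
import math
import random
import statistics

def fit_lognormal(sample):
    """Closed-form MLEs of (mu, sigma) for the log-normal distribution."""
    logs = [math.log(x) for x in sample]
    return statistics.fmean(logs), statistics.pstdev(logs)  # divisor n

def parametric_bootstrap(mu_hat, sigma_hat, n, n_boot=1001, level=0.95, seed=1):
    """Simulate n_boot samples at the fitted values, refit, take percentiles."""
    rng = random.Random(seed)
    reps = []
    for _ in range(n_boot):
        sample = [rng.lognormvariate(mu_hat, sigma_hat) for _ in range(n)]
        reps.append(fit_lognormal(sample))
    tail = (1.0 - level) / 2.0
    lo_idx = round(tail * (n_boot - 1))
    hi_idx = round((1.0 - tail) * (n_boot - 1))
    summary = {}
    for j, name in enumerate(("mu", "sigma")):
        values = sorted(rep[j] for rep in reps)
        summary[name] = {
            "median": statistics.median(values),
            "ci": (values[lo_idx], values[hi_idx]),
        }
    return summary

# LN stand-in: MLEs from Table 10, sample size from Table 9.
result = parametric_bootstrap(mu_hat=1.7423, sigma_hat=1.0647, n=128)
for name, stats in result.items():
    print(name, round(stats["median"], 4), tuple(round(v, 4) for v in stats["ci"]))
```

With 1001 replicates, the 2.5% and 97.5% percentiles fall on the 26th and 976th ordered values, which may be one reason for choosing an odd number of bootstrap samples.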

10.1.2. Likelihood Ratio Test

We also used the likelihood ratio (LR) test to compare the GDUSLN distribution, which has an additional parameter α, with the LN distribution, based on the bladder cancer survival dataset discussed above. The LR statistic for testing the nested model $H_0$: LN against $H_A$: GDUSLN is
$$
\mathrm{LR} = -2 \log \left( \frac{\text{likelihood under the null hypothesis}}{\text{likelihood in the whole parameter space}} \right),
$$
which asymptotically follows a chi-square distribution with d degrees of freedom, d being the number of additional parameters in the GDUSLN model (here d = 1). Using this result and standard statistical tables, we can obtain critical values for the LR test statistic for the given bladder cancer dataset. Table 15 reports the LR statistic and the corresponding p-value.
Given the values of the test statistic and the associated p-value, we reject the null hypothesis for the bladder cancer dataset and conclude that the GDUSLN distribution provides a significantly better representation than the LN distribution.
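The LR statistic and its p-value can be recomputed from the fitted log-likelihoods. The short Python sketch below assumes that the first numeric column of Table 11 holds the negative maximized log-likelihood of each model, and it uses the fact that the upper tail of a chi-square distribution with one degree of freedom equals erfc(√(x/2)). The variable names are illustrative.

```python
import math

# Negative maximized log-likelihoods read from Table 11 (assumed meaning
# of the first numeric column).
neg_loglik_ln = 412.6565      # null model: LN
neg_loglik_gdusln = 409.0979  # alternative model: GDUSLN

# LR = -2 log(L0 / L1) = 2 * [(-log L0) - (-log L1)]
lr = 2.0 * (neg_loglik_ln - neg_loglik_gdusln)

# Chi-square upper tail with 1 degree of freedom: P(X > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(lr / 2.0))

print(f"LR = {lr:.4f}, p-value = {p_value:.5f}")
```

Under these assumptions, the result is close to the LR statistic and p-value reported in Table 15.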

10.2. Stanford Heart Transplant Data

In this application, we illustrate the usefulness of the GDUSLN regression model by applying it to a real dataset, the well-known Stanford heart transplant data. The dataset is given in [21] and is also available in the R package p3state.msm. The goal of this study is to investigate the survival times ($y_i$) of patients with the following covariates: $x_1$, year of acceptance to the program; $x_2$, age of the patient (in years); and $x_3$, previous surgery status (1 = yes, 0 = no). In this study, the transplant indicator is used as the censoring variable.

10.2.1. Results Using the GDUSLN Regression Model

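The fitted non-linear regression model is given by
$$
x_i = \exp(\tau_0 + \tau_1 v_1 + \tau_2 v_2 + \tau_3 v_3 + \sigma z_i),
$$
where the observed response variable $x_i$ follows the GDUSLN distribution. In this equation, $x_i$ denotes the survival time of the i-th patient (written $y_i$ above) and $v_1$, $v_2$, $v_3$ denote the three covariates (written $x_1$, $x_2$, $x_3$ above).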
In Table 16, we compare the performance of the GDUSLN regression model with that of the LN regression model. The table summarizes the fits to the real dataset: the estimates of all parameters, the negative log-likelihood ($-\ell(\psi)$), and the AIC value.
Since it has the smaller AIC, the GDUSLN regression model is preferred.
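The AIC comparison can be reproduced directly from the negative log-likelihoods in Table 16, using AIC = 2k + 2(−ℓ), where k is the number of estimated parameters. In the Python sketch below, the parameter counts (five for LN, six for GDUSLN) are read off the columns of Table 16, and the helper `aic` is an illustrative name.

```python
def aic(neg_loglik, n_params):
    """Akaike information criterion from a negative log-likelihood."""
    return 2.0 * neg_loglik + 2.0 * n_params

# (negative log-likelihood, number of estimated parameters) from Table 16:
# LN estimates tau_0..tau_3 and sigma; GDUSLN additionally estimates alpha.
models = {"LN": (487.873, 5), "GDUSLN": (485.526, 6)}

aic_values = {name: aic(nll, k) for name, (nll, k) in models.items()}
best = min(aic_values, key=aic_values.get)

for name, value in aic_values.items():
    print(f"{name}: AIC = {value:.3f}")
print("preferred model:", best)
```

The recomputed values match the AIC column of Table 16 up to rounding of the reported log-likelihoods.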

10.2.2. Results Using the GDUSLN Bayesian Regression

Table 17 summarizes the results of 1000 iterations of the Random Dive Metropolis–Hastings (RDMH) algorithm of the MCMC method applied to the censored dataset. It includes the posterior mean, SD, Monte Carlo standard error (MCSE), effective sample size due to autocorrelation (ESS), 95% CI, and posterior median.

11. Concluding Remarks

In this article, we proposed a new distribution, a transformed version of the log-normal distribution, intended mainly for the analysis of biological data. We explored the mathematical and statistical properties of the new model, which we call the generalized DUS transformed log-normal (GDUSLN) distribution. We derived explicit expressions for the hazard rate function and the quantile function. The hazard rate function exhibits all the common shapes, namely increasing, decreasing, bathtub, and upside-down bathtub, as well as an inverted N shape. The model parameters were estimated by the method of maximum likelihood and by Bayesian estimation, and the observed information matrix was presented. Further, we adopted the parametric bootstrap technique to obtain confidence intervals for the model parameters. More importantly, we introduced a parametric regression model and a Bayesian regression method based on the new distribution. Simulation studies were conducted to analyze the performance of the ML and Bayesian estimates of the GDUSLN parameters, and the results support their consistency. The usefulness of the new model was illustrated by two applications to real datasets from the field of biology, assessed with goodness-of-fit tests. In these applications, the new model fitted the data better than the competing models considered. We anticipate that the suggested model will find a wider range of applications in the modeling of positive real-world data, not only in biology but also in many other areas such as physics, astronomy, engineering, survival analysis, hydrology, and economics.

Author Contributions

Conceptualization, M.R.I. and R.M.; methodology, M.R.I., C.C., S.L.N., D.S.S. and R.M.; validation, M.R.I., C.C., S.L.N., D.S.S. and R.M.; software, S.L.N. and D.S.S.; investigation, M.R.I., C.C., S.L.N., D.S.S. and R.M.; data curation, S.L.N. and D.S.S.; writing—original draft preparation, S.L.N. and D.S.S.; writing—review and editing, M.R.I., C.C., S.L.N., D.S.S. and R.M.; visualization, M.R.I., C.C., S.L.N., D.S.S. and R.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

The authors would like to thank the Editor and the anonymous reviewers for their constructive comments, which greatly improved the present version of our article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Sinnott, E.W. The Relation of Gene to Character in Quantitative Inheritance. Proc. Natl. Acad. Sci. USA 1937, 23, 224–227.
  2. Kermack, K.A.; Haldane, J.B.S. Organic correlation and allometry. Biometrika 1950, 37, 30–41.
  3. Bernstein, L.; Weatherall, M. Statistics for Medical and Other Biological Students. Q. Rev. Biol. 1954, 29, 303.
  4. Beal, J. Biochemical complexity drives log-normal variation in genetic expression. Eng. Biol. 2017, 1, 55–60.
  5. Carvalho, J.; Piaggio, G.; Wojdyla, D.; Widmer, M.; Gülmezoglu, A. Distribution of postpartum blood loss: Modeling, estimation and application to clinical trials. Reprod. Health 2018, 15, 199.
  6. Aitchison, J.; Brown, J.A.C. The Lognormal Distribution with Special Reference to Its Uses in Economics; Cambridge University Press: Cambridge, UK, 1957.
  7. Jobe, J.; Crow, E.; Shimizu, K. Lognormal Distributions: Theory and Applications. Technometrics 1989, 31, 392.
  8. Pham, A.; Lai, C.D. On Recent Generalizations of the Weibull Distribution. Reliab. IEEE Trans. 2007, 56, 454–458.
  9. Dinesh, K.; Umesh, S.; Sanjay Kumar, S. A Method of Proposing New Distribution and its Application to Bladder Cancer Patients Data. J. Stat. Appl. Probab. Lett. 2015, 3, 235–245.
  10. Maurya, S.K.; Kaushik, A.; Singh, S.K.; Singh, U. A new class of distribution having decreasing, increasing, and bathtub-shaped failure rate. Commun. Stat. Theory Methods 2017, 46, 10359–10372.
  11. Irshad, M.R.; Maya, R.; Krishna, A. Exponentiated Power Muth Distribution and Associated Inference. J. Indian Soc. Probab. Stat. 2021, 1–38.
  12. MacDonald, I.L. Does Newton-Raphson really fail? Stat. Methods Med. Res. 2014, 23, 308–311.
  13. Gelman, A.; Hill, J. Data Analysis Using Regression and Multilevel/Hierarchical Models; Analytical Methods for Social Research; Cambridge University Press: Cambridge, UK, 2006.
  14. Wasserman, L. All of Nonparametric Statistics; Springer Texts in Statistics; Springer: New York, NY, USA, 2006.
  15. Lee, E.; Wang, J. Statistical Methods for Survival Data Analysis; Wiley Series in Probability and Statistics; Wiley: New York, NY, USA, 2003.
  16. Aarset, M.V. How to Identify a Bathtub Hazard Rate. IEEE Trans. Reliab. 1987, R-36, 106–108.
  17. Martínez-Flórez, G.; Bolfarine, H.; Gómez, H.W. The log-power-normal distribution with application to air pollution. Environmetrics 2014, 25, 44–56.
  18. Cooray, K.; Ananda, M.M.A. A Generalization of the Half-Normal Distribution with Applications to Lifetime Data. Commun. Stat. Theory Methods 2008, 37, 1323–1337.
  19. Elbatal, I.; Merovci, F.; Elgarhy, M. A new generalized Lindley distribution. Math. Theory Model. 2013, 3, 30–47.
  20. Xie, M.; Tang, Y.; Goh, T. A modified Weibull extension with bathtub-shaped failure rate function. Reliab. Eng. Syst. Saf. 2002, 76, 279–285.
  21. Crowley, J.; Hu, M. Covariance Analysis of Heart Transplant Survival Data. J. Am. Stat. Assoc. 1977, 72, 27–36.
Figure 1. Plots of the cdf of the GDUSLN distribution.
Figure 2. Plots of the pdf of the GDUSLN distribution.
Figure 3. Plots of the hazard rate function of the GDUSLN distribution.
Figure 4. The TTT plot of bladder cancer dataset.
Figure 5. Empirical plots on bladder cancer dataset.
Figure 6. Matrix scatter plot of the bootstrapped values of the GDUSLN parameters for the bladder cancer dataset.
Table 1. The MLE simulation results for (α = 0.01, μ = 0, σ = 1).
| Parameter | n | MLE | Bias | MSE | CP | AL |
|---|---|---|---|---|---|---|
| α | 50 | 0.0265 | 0.0165 | 0.0117 | 0.9860 | 0.1884 |
| α | 100 | 0.0171 | 0.0071 | 0.0029 | 0.9900 | 0.0703 |
| α | 250 | 0.0128 | 0.0028 | 0.000062 | 0.9980 | 0.0285 |
| α | 500 | 0.0118 | 0.0018 | 0.000025 | 0.9999 | 0.0170 |
| α | 750 | 0.0118 | 0.0018 | 0.000011 | 0.9999 | 0.0138 |
| α | 1000 | 0.0117 | 0.0017 | 8.722 × 10^−6 | 0.9999 | 0.0118 |
| μ | 50 | −0.2877 | −0.2877 | 2.0349 | 0.9999 | 6.5780 |
| μ | 100 | −0.1107 | −0.1107 | 0.8001 | 0.9990 | 3.9497 |
| μ | 250 | −0.0636 | −0.0636 | 0.1924 | 0.9980 | 2.2907 |
| μ | 500 | −0.0306 | −0.0306 | 0.0821 | 0.9910 | 1.5273 |
| μ | 750 | −0.0809 | −0.0809 | 0.0398 | 0.9880 | 1.2473 |
| μ | 1000 | −0.0774 | −0.0774 | 0.0310 | 0.9860 | 1.0726 |
| σ | 50 | 1.2084 | 0.2084 | 0.4160 | 0.9880 | 3.0793 |
| σ | 100 | 1.1361 | 0.1361 | 0.1808 | 0.9910 | 1.8328 |
| σ | 250 | 1.0964 | 0.0964 | 0.0519 | 0.9950 | 1.0693 |
| σ | 500 | 1.0685 | 0.0685 | 0.0238 | 0.9970 | 0.7011 |
| σ | 750 | 1.0737 | 0.0737 | 0.0160 | 0.9950 | 0.5817 |
| σ | 1000 | 1.0699 | 0.0699 | 0.0131 | 0.9950 | 0.4989 |
Table 2. The MLE simulation results for (α = 1.5, μ = 0, σ = 1).
| Parameter | n | MLE | Bias | MSE | CP | AL |
|---|---|---|---|---|---|---|
| α | 50 | 3.4450 | 1.9450 | 17.9693 | 0.8212 | 29.7384 |
| α | 100 | 2.8944 | 1.3944 | 11.9153 | 0.8352 | 17.3465 |
| α | 250 | 2.3789 | 0.8789 | 7.2803 | 0.8851 | 9.2240 |
| α | 500 | 1.8799 | 0.3799 | 2.2217 | 0.8981 | 4.5803 |
| α | 750 | 1.7522 | 0.2522 | 0.7747 | 0.9341 | 3.3028 |
| α | 1000 | 1.6936 | 0.1936 | 0.5628 | 0.9281 | 2.7298 |
| μ | 50 | −0.1516 | −0.1516 | 1.5050 | 0.9211 | 6.7687 |
| μ | 100 | −0.1322 | −0.1322 | 1.0738 | 0.9221 | 4.7298 |
| μ | 250 | −0.1360 | −0.1360 | 0.6004 | 0.9471 | 3.0166 |
| μ | 500 | −0.0557 | −0.0557 | 0.3029 | 0.9461 | 2.0178 |
| μ | 750 | −0.0638 | −0.0638 | 0.1646 | 0.9610 | 1.6335 |
| μ | 1000 | −0.0495 | −0.0495 | 0.1298 | 0.9491 | 1.4036 |
| σ | 50 | 0.9850 | −0.0151 | 0.1354 | 0.9351 | 1.9180 |
| σ | 100 | 0.9993 | −0.00076 | 0.0946 | 0.9261 | 1.3519 |
| σ | 250 | 1.0217 | 0.0217 | 0.0486 | 0.9590 | 0.8666 |
| σ | 500 | 1.0080 | 0.0080 | 0.0253 | 0.9421 | 0.5928 |
| σ | 750 | 1.0136 | 0.0136 | 0.0139 | 0.9640 | 0.4816 |
| σ | 1000 | 1.0110 | 0.0110 | 0.0112 | 0.9511 | 0.4152 |
Table 3. The MLE simulation results for (α = 3.5, μ = 0, σ = 1).
| Parameter | n | MLE | Bias | MSE | CP | AL |
|---|---|---|---|---|---|---|
| α | 50 | 3.3691 | −0.1309 | 7.3639 | 0.7602 | 26.7792 |
| α | 100 | 3.7652 | 0.2652 | 6.7212 | 0.8092 | 20.8407 |
| α | 250 | 4.0625 | 0.5625 | 5.5040 | 0.8681 | 14.4071 |
| α | 500 | 3.9641 | 0.4641 | 4.1423 | 0.8931 | 10.1917 |
| α | 750 | 4.0522 | 0.5523 | 3.3380 | 0.9271 | 8.6438 |
| α | 1000 | 3.8094 | 0.3094 | 2.5673 | 0.9261 | 6.9700 |
| μ | 50 | 0.3772 | 0.3772 | 0.9082 | 0.8841 | 5.7295 |
| μ | 100 | 0.1891 | 0.1891 | 0.5807 | 0.9081 | 4.2434 |
| μ | 250 | 0.0229 | 0.0229 | 0.3165 | 0.9431 | 2.8635 |
| μ | 500 | −0.00075 | −0.00075 | 0.2164 | 0.9461 | 2.0896 |
| μ | 750 | −0.0493 | −0.0493 | 0.1510 | 0.9650 | 1.7545 |
| μ | 1000 | −0.0105 | −0.0105 | 0.1240 | 0.9610 | 1.5016 |
| σ | 50 | 0.8607 | −0.1393 | 0.0962 | 0.9051 | 1.5967 |
| σ | 100 | 0.9266 | −0.0734 | 0.0551 | 0.9211 | 1.1520 |
| σ | 250 | 0.9831 | −0.0169 | 0.0250 | 0.9560 | 0.7596 |
| σ | 500 | 0.9943 | −0.0057 | 0.0160 | 0.9541 | 0.5523 |
| σ | 750 | 1.0085 | 0.0085 | 0.0106 | 0.9750 | 0.4605 |
| σ | 1000 | 0.9992 | −0.00078 | 0.0089 | 0.9630 | 0.3970 |
Table 4. The MLE simulation results for (α = 0.01, μ = 1.5, σ = 0.5).
| Parameter | n | MLE | Bias | MSE | CP | AL |
|---|---|---|---|---|---|---|
| α | 50 | 0.8120 | 0.8020 | 99.3306 | 0.9990 | 17.2775 |
| α | 100 | 0.0597 | 0.0497 | 0.0180 | 0.9960 | 0.3258 |
| α | 250 | 0.0278 | 0.01780 | 0.0009 | 0.9950 | 0.0855 |
| α | 500 | 0.0120 | 0.010 | 0.00023 | 0.9970 | 0.0393 |
| α | 750 | 0.0172 | 0.0072 | 0.00012 | 0.9990 | 0.0260 |
| α | 1000 | 0.0160 | 0.0060 | 7.792 × 10^−5 | 0.9970 | 0.0201 |
| μ | 50 | −0.5673 | −2.0673 | 10.6468 | 0.9950 | 9.7789 |
| μ | 100 | 0.6150 | −0.8850 | 1.6948 | 0.9920 | 4.0846 |
| μ | 250 | 1.0235 | −0.4765 | 0.4590 | 0.9690 | 1.9015 |
| μ | 500 | 1.1998 | −0.3002 | 0.1834 | 0.9421 | 1.1122 |
| μ | 750 | 1.2758 | −0.2242 | 0.1068 | 0.9091 | 0.8285 |
| μ | 1000 | 1.3111 | −0.1889 | 0.0784 | 0.8711 | 0.6806 |
| σ | 50 | 1.3170 | 0.8170 | 1.3831 | 0.9980 | 3.7459 |
| σ | 100 | 0.9005 | 0.4005 | 0.3222 | 0.9880 | 1.8729 |
| σ | 250 | 0.7345 | 0.2344 | 0.1043 | 0.9860 | 0.9233 |
| σ | 500 | 0.6570 | 0.1570 | 0.0474 | 0.9600 | 0.5459 |
| σ | 750 | 0.6234 | 0.1234 | 0.0293 | 0.9481 | 0.4065 |
| σ | 1000 | 0.6059 | 0.1059 | 0.0219 | 0.9031 | 0.3325 |
Table 5. Posterior summary results for (α = 0.01, μ = 0, σ = 1).
| Parameter | n | Mean | SD | MCE | 95% CI | Median |
|---|---|---|---|---|---|---|
| α | 50 | 0.1442 | 0.1824 | 0.0519 | (0.0042, 0.6596) | 0.0570 |
| α | 100 | 0.0334 | 0.1207 | 0.0132 | (0.0063, 0.0611) | 0.0167 |
| α | 250 | 0.0220 | 0.0212 | 0.0081 | (0.0141, 0.0898) | 0.0150 |
| α | 500 | 0.0185 | 0.0076 | 0.0024 | (0.0118, 0.0466) | 0.0172 |
| α | 750 | 0.0231 | 0.0052 | 0.0011 | (0.0208, 0.0262) | 0.0208 |
| α | 1000 | 0.0149 | 0.0009 | 0.00058 | (0.0135, 0.0156) | 0.0154 |
| μ | 50 | −3.4733 | 3.0723 | 0.8754 | (−9.6826, 0.8414) | −2.7271 |
| μ | 100 | −0.3452 | 1.2663 | 0.5454 | (−2.4443, 1.5253) | −0.1428 |
| μ | 250 | −0.4035 | 0.5696 | 0.2240 | (−2.3547, −0.0708) | −0.1831 |
| μ | 500 | −0.5347 | 0.4115 | 0.1992 | (−0.8684, 0.8708) | −0.6868 |
| μ | 750 | −0.7501 | 0.4026 | 0.1564 | (−1.2204, 0.1082) | −0.6910 |
| μ | 1000 | −0.7085 | 0.3481 | 0.1240 | (−0.8361, 0.6058) | −0.8361 |
| σ | 50 | 2.3915 | 1.2560 | 0.3398 | (0.6745, 5.0700) | 2.1073 |
| σ | 100 | 1.3575 | 0.8723 | 0.2097 | (0.7271, 2.4231) | 1.1131 |
| σ | 250 | 1.3106 | 0.5267 | 0.1905 | (1.1652, 2.6330) | 1.1652 |
| σ | 500 | 1.2938 | 0.2787 | 0.0835 | (1.0247, 2.0993) | 1.2751 |
| σ | 750 | 1.4881 | 0.2525 | 0.0783 | (1.4131, 1.7219) | 1.4131 |
| σ | 1000 | 1.1867 | 0.0506 | 0.03160 | (1.1108, 1.2723) | 1.2102 |
Table 6. Posterior summary results for (α = 1.5, μ = 0, σ = 1).
| Parameter | n | Mean | SD | MCE | 95% CI | Median |
|---|---|---|---|---|---|---|
| α | 50 | 4.1662 | 7.3244 | 2.0432 | (0.0093, 20.6094) | 0.9974 |
| α | 100 | 4.2131 | 5.2494 | 1.5542 | (0.0717, 22.7491) | 2.2271 |
| α | 250 | 2.2417 | 2.1410 | 0.6711 | (0.0846, 9.0499) | 1.6914 |
| α | 500 | 1.5378 | 1.6005 | 0.2416 | (0.4581, 4.3854) | 1.3377 |
| α | 750 | 1.6211 | 0.4696 | 0.1316 | (0.7095, 2.7769) | 1.5799 |
| α | 1000 | 1.4906 | 0.2316 | 0.0615 | (1.4512, 2.0980) | 1.4514 |
| μ | 50 | 0.0369 | 1.2995 | 0.3834 | (−2.5848, 2.0525) | 0.3404 |
| μ | 100 | −0.4092 | 1.0441 | 0.3193 | (−2.6677, 1.8552) | −0.1807 |
| μ | 250 | 0.1550 | 0.7931 | 0.2581 | (−1.3816, 1.7138) | 0.0570 |
| μ | 500 | 0.1253 | 0.4695 | 0.0839 | (−1.0430, 0.9718) | 0.1231 |
| μ | 750 | −0.1065 | 0.2889 | 0.0811 | (−0.7150, 0.5232) | −0.0650 |
| μ | 1000 | −0.0170 | 0.1420 | 0.0331 | (−0.4267, 0.0236) | 0.0226 |
| σ | 50 | 0.8448 | 0.3702 | 0.1073 | (0.1403, 1.5976) | 0.8392 |
| σ | 100 | 1.0815 | 0.2833 | 0.0866 | (0.3392, 1.5447) | 1.0771 |
| σ | 250 | 0.8506 | 0.2329 | 0.0747 | (0.3265, 1.2604) | 0.8995 |
| σ | 500 | 0.9565 | 0.1442 | 0.0291 | (0.6713, 1.2297) | 0.9641 |
| σ | 750 | 1.0589 | 0.0986 | 0.0277 | (0.8357, 1.2506) | 1.0656 |
| σ | 1000 | 1.0097 | 0.0521 | 0.0142 | (0.9907, 1.1366) | 0.9907 |
Table 7. Posterior summary results for (α = 3.5, μ = 0, σ = 1).
| Parameter | n | Mean | SD | MCE | 95% CI | Median |
|---|---|---|---|---|---|---|
| α | 50 | 8.4499 | 7.6768 | 2.1079 | (0.1380, 24.5375) | 6.2095 |
| α | 100 | 5.3500 | 9.4171 | 1.7008 | (0.3285, 13.5378) | 3.2753 |
| α | 250 | 2.3796 | 2.7053 | 0.8287 | (0.3081, 10.1427) | 1.3011 |
| α | 500 | 2.8784 | 2.2901 | 0.8092 | (1.0101, 10.4748) | 1.9153 |
| α | 750 | 4.4209 | 2.0585 | 0.6653 | (2.6587, 6.5986) | 4.0738 |
| α | 1000 | 2.8436 | 1.4592 | 0.3074 | (1.6108, 6.7540) | 2.8355 |
| μ | 50 | 0.2228 | 0.9573 | 0.2940 | (−1.0484, 2.0049) | −0.0019 |
| μ | 100 | 0.0332 | 0.7769 | 0.2336 | (−1.3759, 1.4283) | 0.0934 |
| μ | 250 | 0.5618 | 0.6370 | 0.1979 | (−0.7925, 1.4505) | 0.6533 |
| μ | 500 | 0.2950 | 0.4991 | 0.1552 | (−0.8614, 0.9619) | 0.4101 |
| μ | 750 | −0.1969 | 0.4249 | 0.1302 | (−0.7398, 0.2054) | −0.1912 |
| μ | 1000 | 0.2197 | 0.3472 | 0.1082 | (−0.6779, 0.5688) | 0.1770 |
| σ | 50 | 0.8080 | 0.2633 | 0.0797 | (0.2915, 1.1814) | 0.9090 |
| σ | 100 | 0.8718 | 0.2162 | 0.0649 | (0.4689, 1.1968) | 0.8866 |
| σ | 250 | 0.7443 | 0.1800 | 0.0582 | (0.4790, 1.0736) | 0.7109 |
| σ | 500 | 0.9343 | 0.1354 | 0.0390 | (0.7414, 1.2237) | 0.9326 |
| σ | 750 | 1.0692 | 0.1295 | 0.0371 | (0.9619, 1.2253) | 1.0566 |
| σ | 1000 | 0.9466 | 0.1004 | 0.0306 | (0.8628, 1.2225) | 0.9323 |
Table 8. Posterior summary results for (α = 0.01, μ = 1.5, σ = 0.5).
| Parameter | n | Mean | SD | MCE | 95% CI | Median |
|---|---|---|---|---|---|---|
| α | 50 | 0.1779 | 0.2908 | 0.0794 | (0.0041, 1.3728) | 0.0583 |
| α | 100 | 0.0734 | 0.0987 | 0.0260 | (0.0060, 0.3889) | 0.0330 |
| α | 250 | 0.0300 | 0.0488 | 0.0124 | (0.0075, 0.1611) | 0.0160 |
| α | 500 | 0.0160 | 0.0364 | 0.0062 | (0.0110, 0.0843) | 0.0110 |
| α | 750 | 0.0159 | 0.0090 | 0.0021 | (0.0061, 0.0240) | 0.0147 |
| α | 1000 | 0.0092 | 0.0069 | 0.0012 | (0.0068, 0.0202) | 0.0076 |
| μ | 50 | −0.1424 | 1.4196 | 0.3924 | (−4.0866, 1.5825) | 0.3360 |
| μ | 100 | 0.6450 | 0.9748 | 0.2626 | (−1.6869, 1.8166) | 0.9622 |
| μ | 250 | 0.9545 | 0.6301 | 0.1960 | (−0.9009, 1.4706) | 1.1956 |
| μ | 500 | 1.1691 | 0.3183 | 0.0822 | (−0.1112, 1.2343) | 1.2343 |
| μ | 750 | 1.3269 | 0.2115 | 0.0573 | (1.0271, 1.7025) | 1.3379 |
| μ | 1000 | 1.6181 | 0.1493 | 0.0226 | (1.2482, 1.7385) | 1.6251 |
| σ | 50 | 0.9612 | 0.5966 | 0.1678 | (0.2695, 2.6994) | 0.8186 |
| σ | 100 | 0.9495 | 0.5012 | 0.1325 | (0.3651, 2.0675) | 0.8613 |
| σ | 250 | 0.6918 | 0.3148 | 0.0984 | (0.4297, 1.5184) | 0.6134 |
| σ | 500 | 0.5719 | 0.1891 | 0.0508 | (0.5299, 1.4506) | 0.5300 |
| σ | 750 | 0.5760 | 0.1087 | 0.0281 | (0.3929, 0.7253) | 0.5727 |
| σ | 1000 | 0.4504 | 0.1012 | 0.0194 | (0.4045, 0.7358) | 0.4185 |
Table 9. Descriptive statistics of the real dataset.
| Statistic | n | min | Q1 | Md | Q3 | max | IQR |
|---|---|---|---|---|---|---|---|
| Value | 128 | 0.08 | 3.348 | 6.280 | 11.678 | 79.05 | 8.330 |
Table 10. Bladder cancer dataset: MLEs of the parameters.
| Distribution | MLE |
|---|---|
| GDUSLN(α, μ, σ) | α̂ = 0.2330, μ̂ = 2.5675, σ̂ = 0.6660 |
| LN(μ, σ) | μ̂ = 1.7423, σ̂ = 1.0647 |
| ELN(α, μ, σ) | α̂ = 0.1514, μ̂ = 3.0502, σ̂ = 0.5401 |
| GHN(μ, σ) | μ̂ = 0.7593, σ̂ = 11.4510 |
| NGLD(α, μ, σ) | α̂ = 1.1848, μ̂ = 0.1287, σ̂ = 1.1851 |
| MoW(α, μ, σ) | α̂ = 4.565 × 10^6, μ̂ = 0.1378, σ̂ = 123.976 |
| Weibull(α, σ) | α̂ = 1.0546, σ̂ = 9.4371 |
Table 11. Bladder cancer dataset: GOF statistics results.
| Distribution | −log L | AIC | BIC | KS | W* | A* |
|---|---|---|---|---|---|---|
| GDUSLN | 409.0979 | 824.1958 | 832.7519 | 0.0551 | 0.0646 | 0.4318 |
| LN | 412.6565 | 829.3131 | 835.0171 | 0.0644 | 0.1313 | 0.8708 |
| ELN | 410.0441 | 826.0883 | 834.6444 | 0.0562 | 0.0846 | 0.5590 |
| GHN | 418.7864 | 841.5727 | 847.2768 | 0.1018 | 0.3815 | 2.4201 |
| NGLD | 411.0846 | 828.1691 | 836.7252 | 0.0751 | 0.1415 | 0.8233 |
| MoW | 419.3804 | 844.7608 | 853.3169 | 0.0949 | 0.3632 | 2.3184 |
| Weibull | 411.8936 | 827.7873 | 833.4913 | 0.0731 | 0.1670 | 1.0441 |
Table 12. The 95% asymptotic confidence intervals of the GDUSLN parameters based on the bladder cancer dataset.
| Parameter | Lower | Upper |
|---|---|---|
| α | −0.1241 | 0.5901 |
| μ | 1.6222 | 3.5128 |
| σ | 0.2978 | 1.0341 |
Table 13. MLEs and Bayes estimates of the GDUSLN parameters for the bladder cancer dataset.
| Parameter | ML | Bayes |
|---|---|---|
| α | 0.2330 | 0.2058 |
| μ | 2.5675 | 2.6519 |
| σ | 0.6660 | 0.6395 |
Table 14. The median and 95% bootstrap confidence interval for the GDUSLN parameters for the bladder cancer dataset.
| Parameter | Median | Bootstrap CI |
|---|---|---|
| α | 0.2599 | (0.0336, 2.2893) |
| μ | 2.4878 | (0.6703, 3.3436) |
| σ | 0.6813 | (0.3100, 1.2527) |
Table 15. Likelihood ratio statistic and its p-value for the bladder cancer dataset.
| Comparison | LR | p-Value |
|---|---|---|
| GDUSLN versus LN | 7.1173 | 0.00763 |
Table 16. Regression results for the Stanford heart transplant dataset.
| Model | τ0 | τ1 | τ2 | τ3 | α | σ | −ℓ(ψ) | AIC |
|---|---|---|---|---|---|---|---|---|
| LN | 8.058 | −0.024 | −0.022 | 1.131 | - | 1.317 | 487.873 | 985.747 |
| GDUSLN | 10.039 | −0.016 | −0.032 | 0.499 | 0.0104 | 0.207 | 485.526 | 983.051 |
Table 17. GDUSLN Bayesian regression results for the Stanford heart transplant dataset.
| Parameter | Mean | SD | MCSE | ESS | 95% CI | Median |
|---|---|---|---|---|---|---|
| τ0 | 12.319 | 0.125 | 0.059 | 6.657 | (11.865, 12.419) | 12.312 |
| τ1 | −0.064 | 0.009 | 0.007 | 1.296 | (−0.078, −0.052) | −0.067 |
| τ2 | −0.018 | 0.0122 | 0.009 | 1.785 | (−0.037, 0.002) | −0.016 |
| τ3 | 0.750 | 0.309 | 0.061 | 32.293 | (0.161, 1.309) | 0.740 |
| α | 0.069 | 0.052 | 0.028 | 6.064 | (0.022, 0.258) | 0.055 |
| σ | 0.460 | 0.133 | 0.076 | 5.967 | (0.308, 0.795) | 0.421 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Irshad, M.R.; Chesneau, C.; Nitin, S.L.; Shibu, D.S.; Maya, R. The Generalized DUS Transformed Log-Normal Distribution and Its Applications to Cancer and Heart Transplant Datasets. Mathematics 2021, 9, 3113. https://doi.org/10.3390/math9233113
