Article

A Weibull-Beta Prime Distribution to Model COVID-19 Data with the Presence of Covariates and Censored Data

by Elisângela C. Biazatti 1, Gauss M. Cordeiro 2,*, Gabriela M. Rodrigues 3, Edwin M. M. Ortega 3 and Luís H. de Santana 4
1 Department of Mathematics and Statistics, Federal University of Rondônia, Ji-Paraná 76900, Brazil
2 Departamento de Estatística, Universidade Federal de Pernambuco, Cidade Universitária, Recife 50670, Brazil
3 Departamento de Ciências Exatas, Universidade de São Paulo, ESALQ/USP, Piracicaba 13418, Brazil
4 Departamento de Tecnologia, Universidade Estadual de Maringá, Umuarama 87506, Brazil
* Author to whom correspondence should be addressed.
Stats 2022, 5(4), 1159-1173; https://doi.org/10.3390/stats5040069
Submission received: 31 October 2022 / Revised: 12 November 2022 / Accepted: 14 November 2022 / Published: 17 November 2022
(This article belongs to the Section Regression Models)

Abstract

Motivated by the recent popularization of the beta prime distribution, a more flexible generalization is presented to fit symmetrical or asymmetrical and bimodal data, and a non-monotonic failure rate. Thus, the Weibull-beta prime distribution is defined, and some of its structural properties are obtained. The parameters are estimated by maximum likelihood, and a new regression model is proposed. Some simulations reveal that the estimators are consistent, and applications to censored COVID-19 data show the adequacy of the models.

1. Introduction

The beta prime (BP) distribution has become popular for analyzing lifetime and monotonic failure rate phenomena. For modeling monotonic failure rates, the Weibull, log-logistic, and log-normal distributions can also be good choices, but they do not model bathtub-shaped, unimodal, and bimodal failure rates that are common in survival analysis. Because of this, several models have been proposed in recent years.
In this context, the Weibull-G (W-G) family [1] proved itself to be a good competitor to the Beta-G (B-G) [2] and Kumaraswamy-G (Kw-G) [3] classes. In this family, as in the B-G and Kw-G classes, a > 0 and b > 0 are two parameters added to those of the G distribution. It is emphasized that the cumulative distribution function (cdf) of the beta distribution involves the incomplete beta function, whereas the Kumaraswamy cdf has a closed form. Since the B-G and Kw-G classes have been highly cited according to Google Scholar, the W-G family also deserves to be further explored and disseminated.
Recently, Ref. [4] defined a new extension of the W-G family, also a competitor of the B-G and Kw-G classes. Ref. [5] proposed a bivariate W-G family. The estimation of the parameters of the Weibull Generalized Exponential distribution based on the adaptive progressive type II (APTII) censored sample was explored by [6].
Ref. [7] addressed the estimation of the BP distribution and discussed some properties. A generalized BP model defined by [8,9,10] introduced regression models based on the BP distribution. Other recent works studied this distribution [11,12]. Through the McDonald’s inverted beta (McIB) distribution [13], we can obtain other generalizations of the BP distribution, for example, the Kumaraswamy Beta Prime and Beta Beta Prime models.
In this context, our main objective is to introduce the Weibull-beta prime (WBP) distribution. We illustrate the applicability of the new distribution to three real COVID-19 data sets. Currently, the USA has the highest number of COVID-19 cases worldwide. Brazil is the second country with most deaths (688,316 total deaths) [14], and several factors demand analysis of this number, including the continental dimension of the country, the proportion of elderly people, greater social vulnerability, and also the high rate of chronic diseases. In this way, we first verify the flexibility of the new distribution through graphical analyses and statistical tests using data on the number of new daily deaths due to COVID-19 in the US. Second, we provide an application to the times to death by this coronavirus in a Brazilian capital. In addition, a third application for regression modeling is done, in which we investigate the influence of covariates on the time to death from COVID-19 in the city of Campinas, Brazil. For these studies, we aim to contribute to the literature of new distributions and survival analysis, as well as direct efforts to estimate the impact caused by the disease.
The BP random variable W has cumulative distribution function (cdf)
$$G(x;\alpha,\beta) = I_z(\alpha,\beta), \quad x \ge 0, \tag{1}$$
where $z = z(x) = x/(1+x)$, $\alpha > 0$ and $\beta > 0$ are shape parameters, $I_z(\alpha,\beta) = B(\alpha,\beta)^{-1}\int_0^z t^{\alpha-1}(1-t)^{\beta-1}\,dt$ and $B(\alpha,\beta) = \int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,dt$ (for $z \in [0,1]$) are the incomplete beta and beta functions, respectively.
The probability density function (pdf) of W has the form
$$g(x;\alpha,\beta) = \frac{x^{\alpha-1}(1+x)^{-(\alpha+\beta)}}{B(\alpha,\beta)}, \quad x \ge 0, \tag{2}$$
whose sth ordinary moment (for $s < \beta$) becomes
$$E(W^s) = \frac{B(\alpha+s,\beta-s)}{B(\alpha,\beta)}. \tag{3}$$
Some other properties of W were tackled by [7]. The arguments in the functions are omitted from now on.
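The moment formula (3) is easy to check numerically. The following is a minimal sketch in Python (standard library only); the function names `log_beta` and `bp_moment` are illustrative and do not come from the article. It evaluates the ratio of beta functions on the log scale and compares the first moment with the closed form $\alpha/(\beta-1)$, which follows from (3) with $s = 1$ when $\beta > 1$.

```python
import math


def log_beta(p, q):
    """Logarithm of the beta function B(p, q), computed via log-gamma."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)


def bp_moment(s, alpha, beta):
    """s-th ordinary moment of the beta prime distribution, Equation (3)."""
    if not (s < beta and alpha + s > 0):
        raise ValueError("the moment requires s < beta and alpha + s > 0")
    return math.exp(log_beta(alpha + s, beta - s) - log_beta(alpha, beta))


# Illustrative shape parameters (not taken from the article's applications).
alpha, beta = 2.5, 3.0
mean = bp_moment(1, alpha, beta)      # closed form: alpha / (beta - 1)
second = bp_moment(2, alpha, beta)    # closed form: alpha (alpha + 1) / ((beta - 1)(beta - 2))
variance = second - mean ** 2
```

Working on the log scale avoids overflow of the gamma function for large shape parameters.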
This article is organized as follows. Section 2 defines the Weibull-beta prime (WBP) model with four positive parameters. Section 3 provides some of its properties. Section 4 addresses the estimation and a simulation study. Section 5 develops a WBP regression model. Applications to three COVID-19 data sets in Section 6 confirm the potentiality of the new models. Some conclusions are found in Section 7.

2. WBP Distribution

By substituting (1) and (2) in the W-G family [1], the WBP pdf follows as (for $x \ge 0$)
$$f(x) = \frac{a\,b\,x^{\alpha-1}(1+x)^{-(\alpha+\beta)}\,I_z(\alpha,\beta)^{b-1}}{B(\alpha,\beta)\,[1-I_z(\alpha,\beta)]^{b+1}}\,\exp\left\{-a\left[\frac{I_z(\alpha,\beta)}{1-I_z(\alpha,\beta)}\right]^{b}\right\}, \tag{4}$$
and the corresponding hazard rate function (hrf) becomes
$$h(x) = \frac{a\,b\,x^{\alpha-1}(1+x)^{-(\alpha+\beta)}\,I_z(\alpha,\beta)^{b-1}}{B(\alpha,\beta)\,[1-I_z(\alpha,\beta)]^{b+1}}. \tag{5}$$
Henceforth, let $X \sim \text{WBP}(a,b,\alpha,\beta)$ have pdf (4). Figure 1 and Figure 2 report plots of the pdf and hrf of X, respectively. Figure 1a shows that the WBP distribution can model data with bimodality. The hrf in Figure 2 can have four main shapes.
The main motivation to introduce the WBP distribution is due to the wide use of the BP distribution and the fact that the current generalization provides better fits to complex real data.
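To make (4) and (5) concrete, the sketch below evaluates the WBP cdf, pdf, and hrf in Python. It assumes NumPy and SciPy are available (the article itself works in R), and the function names are illustrative. The cdf is taken as one minus the survival function $\exp\{-a[G/(1-G)]^b\}$ of the W-G family, with $G$ the BP cdf (1).

```python
import numpy as np
from scipy.special import betainc, betaln


def _baseline(x, alpha, beta):
    """BP cdf G(x) = I_{x/(1+x)}(alpha, beta) and the odds G / (1 - G)."""
    x = np.asarray(x, dtype=float)
    G = betainc(alpha, beta, x / (1.0 + x))   # regularized incomplete beta
    return G, G / (1.0 - G)


def wbp_cdf(x, a, b, alpha, beta):
    """WBP cdf: 1 - exp{-a [G / (1 - G)]^b}."""
    _, odds = _baseline(x, alpha, beta)
    return 1.0 - np.exp(-a * odds ** b)


def wbp_hrf(x, a, b, alpha, beta):
    """WBP hazard rate function, Equation (5)."""
    x = np.asarray(x, dtype=float)
    G, _ = _baseline(x, alpha, beta)
    # log of the BP pdf (2), kept on the log scale for numerical stability
    log_g = ((alpha - 1.0) * np.log(x) - (alpha + beta) * np.log1p(x)
             - betaln(alpha, beta))
    return a * b * np.exp(log_g) * G ** (b - 1.0) / (1.0 - G) ** (b + 1.0)


def wbp_pdf(x, a, b, alpha, beta):
    """WBP pdf, Equation (4): hazard rate times survival function."""
    _, odds = _baseline(x, alpha, beta)
    return wbp_hrf(x, a, b, alpha, beta) * np.exp(-a * odds ** b)


# Illustrative evaluation on a small grid (parameter values are arbitrary).
grid = np.array([0.3, 1.0, 2.5])
density = wbp_pdf(grid, 0.75, 1.5, 2.5, 2.0)
```

A convenient sanity check, derived here rather than stated in the article: for $\alpha = \beta = 1$ the baseline odds $G(x)/[1-G(x)]$ equal $x$, so the three functions reduce to the Weibull forms $1-\exp(-a x^b)$, $a b x^{b-1}\exp(-a x^b)$, and $a b x^{b-1}$.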

3. Properties

3.1. Quantile Function

By inverting the W-G family cdf, the quantile function (qf) of X reduces to
$$x = Q(u) = F^{-1}(u) = G^{-1}\left(\frac{[-a^{-1}\log(1-u)]^{1/b}}{1+[-a^{-1}\log(1-u)]^{1/b}}\right), \tag{6}$$
where $G^{-1}(u)$ follows by inverting (1):
$$G^{-1}(u) = \frac{I_u^{-1}(\alpha,\beta)}{1-I_u^{-1}(\alpha,\beta)}.$$
Here, $I_u^{-1}(\alpha,\beta)$ is the inverse incomplete beta function, which can be calculated from InverseBetaRegularized[u,a,b] (in MATHEMATICA) and admits the power series
$$I_u^{-1}(\alpha,\beta) = w + \frac{\beta-1}{\alpha+1}\,w^2 + \frac{(\beta-1)(\alpha^2+3\beta\alpha-\alpha+5\beta-4)}{2(\alpha+1)^2(\alpha+2)}\,w^3 + \cdots,$$
where $w = [\alpha\,u\,B(\alpha,\beta)]^{1/\alpha}$.
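The quantile function (6) also gives a direct way to simulate WBP variates by the inverse transformation method. The sketch below is one possible Python implementation, assuming NumPy and SciPy; it uses `scipy.special.betaincinv` in place of the MATHEMATICA function mentioned above, and the names `wbp_quantile`, `wbp_cdf`, and `wbp_rvs` are illustrative.

```python
import numpy as np
from scipy.special import betainc, betaincinv


def wbp_quantile(u, a, b, alpha, beta):
    """Quantile function (6) of the WBP distribution for 0 <= u < 1."""
    u = np.asarray(u, dtype=float)
    t = (-np.log1p(-u) / a) ** (1.0 / b)   # odds G / (1 - G) at the u-quantile
    p = t / (1.0 + t)                      # baseline probability G(x)
    z = betaincinv(alpha, beta, p)         # inverse regularized incomplete beta
    return z / (1.0 - z)                   # G^{-1}(p) for the BP baseline


def wbp_cdf(x, a, b, alpha, beta):
    """WBP cdf, used here only to verify the inversion."""
    x = np.asarray(x, dtype=float)
    G = betainc(alpha, beta, x / (1.0 + x))
    return 1.0 - np.exp(-a * (G / (1.0 - G)) ** b)


def wbp_rvs(n, a, b, alpha, beta, seed=None):
    """Draw n WBP variates by applying the quantile function to uniforms."""
    rng = np.random.default_rng(seed)
    return wbp_quantile(rng.uniform(size=n), a, b, alpha, beta)


# Round trip: F(Q(u)) should return u (parameter values are illustrative).
u = np.linspace(0.05, 0.95, 7)
x = wbp_quantile(u, 0.75, 1.5, 2.5, 2.0)
round_trip = wbp_cdf(x, 0.75, 1.5, 2.5, 2.0)
```

Using the library inverse rather than the truncated power series avoids having to decide how many series terms are needed for a given accuracy.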
Plots of the Bowley skewness (B) [15] and Moors kurtosis (M) [16] of X based on octiles are given below.
For any fixed value of b, Figure 3a shows that the skewness decays when the parameter a increases, with a more pronounced curvature for $b = 3.5$. For $a = 0.09$ and $a = 0.1$, Figure 3b shows that the skewness is initially nearly constant as b grows and then decays. For $a = 0.2$ and $a = 0.4$, it decreases almost immediately.
The behavior of the kurtosis is analogous, as shown in Figure 4a,b. In Figure 4a, for any fixed value of b, the kurtosis decreases and then asymptotically approaches a constant when a increases. For $b = 0.5$, this behavior is slower. In Figure 4b, the kurtosis decreases and becomes asymptotically constant when b grows. For $a = 0.2$ and $a = 0.4$, this happens quickly.
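The two quantile-based measures can be computed from any quantile function. The sketch below uses the standard definitions of the Bowley skewness (quartiles) and the Moors kurtosis (octiles); these formulas are not written out in the article, so they are stated here as an assumption about what [15] and [16] refer to. The code is plain Python and the function names are illustrative.

```python
import math


def bowley_skewness(Q):
    """Bowley skewness: [Q(3/4) + Q(1/4) - 2 Q(1/2)] / [Q(3/4) - Q(1/4)]."""
    return (Q(0.75) + Q(0.25) - 2.0 * Q(0.5)) / (Q(0.75) - Q(0.25))


def moors_kurtosis(Q):
    """Moors kurtosis: [Q(7/8) - Q(5/8) + Q(3/8) - Q(1/8)] / [Q(6/8) - Q(2/8)]."""
    return (Q(7 / 8) - Q(5 / 8) + Q(3 / 8) - Q(1 / 8)) / (Q(6 / 8) - Q(2 / 8))


def exponential_quantile(u):
    """Quantile function of the unit exponential, used as a simple test case."""
    return -math.log1p(-u)


# Any callable quantile function can be passed, e.g. the WBP qf in (6).
skew_exp = bowley_skewness(exponential_quantile)
kurt_exp = moors_kurtosis(exponential_quantile)
```

Because both measures depend only on quantiles, they exist even when the moments of X do not, which is relevant here since the BP moments require $s < \beta$.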

3.2. Linear Representation

Proposition 1.
The WBP pdf (4) has the linear representation
$$f(x) = \sum_{i,m=0}^{\infty} B_{i,m}\, g(x;\alpha_{i,m},\beta),$$
where the $B_{i,m}$'s are real numbers and $\alpha_{i,m} = \alpha_{i,m}(\alpha) = (i+1)\alpha + m$.
Proof of Proposition 1.
The density of X (except for typos) was determined by [1]
$$f(x) = \sum_{j,k=0}^{\infty} \omega_{j,k}\, h_{(k+1)b+j}(x), \tag{8}$$
where $h_p(x) = p\,g(x)\,G(x)^{p-1}$ (for $p > 0$) and
$$\omega_{j,k} = \omega_{j,k}(a,b) = \frac{(-1)^{j+k}\, b\, a^{k+1}}{[(k+1)b+j]\, k!}\binom{-[(k+1)b+1]}{j}.$$
In particular, for the BP baseline, we can expand
$$I_z(\alpha,\beta)^{(k+1)b+j-1} = \sum_{i=0}^{\infty} s_{i,j,k}\, I_z(\alpha,\beta)^{i},$$
where $s_{i,j,k} = s_{i,j,k}(b) = \sum_{l=i}^{\infty} (-1)^{i+l}\binom{(k+1)b+j-1}{l}\binom{l}{i}$, and then from (8)
$$f(x) = \sum_{i=0}^{\infty} A_i\, x^{\alpha-1}(1+x)^{-(\alpha+\beta)}\, I_z(\alpha,\beta)^{i}, \tag{10}$$
where $A_i = A_i(a,b) = B(\alpha,\beta)^{-1}\sum_{j,k=0}^{\infty} [(k+1)b+j]\,\omega_{j,k}\, s_{i,j,k}$.
The power series holds
$$I_z(\alpha,\beta) = \frac{z^{\alpha}}{B(\alpha,\beta)}\sum_{m=0}^{\infty} q_m z^m, \quad |z| < 1,$$
where $q_m = q_m(\alpha,\beta) = (1-\beta)_m/[m!\,(\alpha+m)]$ and $(p)_m = p(p+1)\cdots(p+m-1)$ is the rising factorial. For a natural number $i \ge 1$, the Identity 0.314 in [17] gives
$$\left(\sum_{m=0}^{\infty} q_m z^m\right)^{i} = \sum_{m=0}^{\infty} e_m^{(i)} z^m,$$
where $e_0^{(i)} = q_0^{\,i}$, and
$$e_m^{(i)} = \frac{1}{m\,q_0}\sum_{l=1}^{m} [(i+1)l - m]\, q_l\, e_{m-l}^{(i)}, \quad m \ge 1,\ i \ge 1.$$
Hence,
$$I_z(\alpha,\beta)^{i} = \frac{z^{i\alpha}}{B(\alpha,\beta)^{i}}\sum_{m=0}^{\infty} e_m^{(i)} z^m, \quad |z| < 1.$$
Letting $z = z(x) = x/(1+x)$,
$$I_z(\alpha,\beta)^{i} = \frac{1}{B(\alpha,\beta)^{i}}\sum_{m=0}^{\infty} e_m^{(i)}\,\frac{x^{m+i\alpha}}{(1+x)^{m+i\alpha}}, \quad x > 0. \tag{11}$$
Furthermore, for $i = 0$, let $e_0^{(0)} = 1$, and $e_m^{(0)} = 0$ for $m \ge 1$. Inserting (11) in Equation (10), and under the previous conditions, gives
$$f(x) = \sum_{i,m=0}^{\infty} B_{i,m}\, g(x;\alpha_{i,m},\beta), \tag{12}$$
where $\alpha_{i,m} = \alpha_{i,m}(\alpha) = (i+1)\alpha + m$, and
$$B_{i,m} = B_{i,m}(a,b,\alpha,\beta) = \frac{A_i(a,b)\, e_m^{(i)}\, B(\alpha_{i,m},\beta)}{B(\alpha,\beta)^{i}},$$
which completes the proof.    ☐
Equation (12) confirms that the WBP density is a linear combination of BP densities, which is useful for finding properties of X. In fact, this representation is important since complete and incomplete moments, generating function, mean deviations, and reliability are well-known results for the BP distribution.

3.3. Moments

We obtain $\mu_s = E(X^s)$. For $s < \beta$, we can write from (12) and (3)
$$\mu_s = \sum_{i,m=0}^{\infty} B_{i,m}\,\frac{B(\alpha_{i,m}+s,\beta-s)}{B(\alpha_{i,m},\beta)}.$$
The sth incomplete moment of X (for $s < \beta$) follows from (12) as
$$J_s(w) = \int_0^w x^s f(x)\,dx = \int_0^w x^s \sum_{i,m=0}^{\infty} B_{i,m}\, g(x;\alpha_{i,m},\beta)\,dx = \sum_{i,m=0}^{\infty} B_{i,m}\,\frac{B(\alpha_{i,m}+s,\beta-s)}{B(\alpha_{i,m},\beta)}\, I_{w/(1+w)}(\alpha_{i,m}+s,\beta-s).$$
The mean deviations and inequality measures are calculated from the first incomplete moment.

4. Estimation and Simulations

Let $x_1,\ldots,x_n$ be a sample from (4). The log-likelihood function for $\tau = (a,b,\alpha,\beta)^\top$ is
$$l_n(\tau) = n\log\left[\frac{a\,b}{B(\alpha,\beta)}\right] + (\alpha-1)\sum_{i=1}^{n}\log x_i - (\alpha+\beta)\sum_{i=1}^{n}\log(1+x_i) + (b-1)\sum_{i=1}^{n}\log I_{z_i}(\alpha,\beta) - (b+1)\sum_{i=1}^{n}\log[1-I_{z_i}(\alpha,\beta)] - a\sum_{i=1}^{n}\left[\frac{I_{z_i}(\alpha,\beta)}{1-I_{z_i}(\alpha,\beta)}\right]^{b}, \tag{14}$$
where $z_i = x_i/(1+x_i)$.
The maximum likelihood estimates (MLEs) can be found via the AdequacyModel library [18] in R software by choosing a maximization method among those available.
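As an illustration of how (14) can be maximized, the sketch below codes the negative log-likelihood in Python and minimizes it with SciPy's BFGS routine. This is a swapped-in toolchain, not the authors' R/AdequacyModel code. The log-scale parameterization (to keep all four parameters positive), the starting values, the seed, and the penalty returned for non-finite evaluations are assumptions made for this example; the true values are those of Scenario 1 in the simulation study.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import betainc, betaincinv, betaln


def wbp_negloglik(log_params, x):
    """Negative of the log-likelihood (14); parameters enter on the log scale."""
    with np.errstate(all="ignore"):
        a, b, alpha, beta = np.exp(log_params)
        G = betainc(alpha, beta, x / (1.0 + x))
        ll = (x.size * (np.log(a) + np.log(b) - betaln(alpha, beta))
              + (alpha - 1.0) * np.log(x).sum()
              - (alpha + beta) * np.log1p(x).sum()
              + (b - 1.0) * np.log(G).sum()
              - (b + 1.0) * np.log1p(-G).sum()
              - a * ((G / (1.0 - G)) ** b).sum())
    # Large penalty if the optimizer wanders into a numerically invalid region.
    return -ll if np.isfinite(ll) else 1e10


def wbp_sample(n, a, b, alpha, beta, rng):
    """Inverse-transform sampling through the quantile function (6)."""
    u = rng.uniform(size=n)
    t = (-np.log1p(-u) / a) ** (1.0 / b)
    z = betaincinv(alpha, beta, t / (1.0 + t))
    return z / (1.0 - z)


rng = np.random.default_rng(2022)                 # arbitrary seed
x = wbp_sample(100, 0.75, 1.5, 2.5, 2.0, rng)     # Scenario 1 parameter values
start = np.zeros(4)                               # a = b = alpha = beta = 1 (assumed start)
fit = minimize(wbp_negloglik, start, args=(x,), method="BFGS")
mle = np.exp(fit.x)                               # back-transform to (a, b, alpha, beta)
```

With four shape-type parameters the likelihood surface can be flat, so in practice it is worth trying several starting points and comparing the attained values of `fit.fun`.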

Simulation Study

The simulation generates samples from the WBP model using Equation (6) and maximizes (14) with the BFGS algorithm in R for $n \in \{50, 75, 100\}$, with 10,000 replications under three scenarios: $a = 0.75$, $b = 1.5$, $\alpha = 2.5$ and $\beta = 2$ (Scenario 1); $a = 0.75$, $b = 1.2$, $\alpha = 1$ and $\beta = 1.5$ (Scenario 2); and $a = 0.75$, $b = 1.2$, $\alpha = 2$ and $\beta = 2.5$ (Scenario 3).
The findings in Table 1 reveal (for all scenarios) that the biases and mean squared errors (MSEs) of the estimates decrease when n grows. Note that $\hat{b}$ and $\hat{\alpha}$ underestimate b and $\alpha$ in all cases. All estimators improve when n increases.

5. WBP Regression Model

A WBP regression model is constructed for censored samples, quite common in areas such as econometrics, engineering, and clinical trials. Generally, for censored samples, it is common to consider the systematic component for the shape parameter $\alpha$. Thus, we consider the systematic component $\alpha_i = \exp(v_i^\top\lambda)$, where $v_i = (v_{i1},\ldots,v_{ip})^\top$ is the vector of covariates and $\lambda = (\lambda_1,\ldots,\lambda_p)^\top$ is the vector of unknown parameters. Let $v = (v_1,\ldots,v_p)$. Note that future research may be developed using more systematic components.
The survival function of $X_i \mid v_i$ is
$$S(x \mid v_i) = \exp\left\{-a\left[\frac{I_z(\alpha_i,\beta)}{1-I_z(\alpha_i,\beta)}\right]^{b}\right\}. \tag{15}$$
Equation (15) defines the WBP regression model.
A special feature of survival data is the presence of censoring, which is the partial observation of the response. This refers to circumstances in which some subjects are free from the event of interest, for example, by being withdrawn early from the study or by the end of the experiment. Then, it is important to add this information to statistical modeling.
Let $(x_1,v_1),\ldots,(x_n,v_n)$ be n independent observations, where $x_i$ denotes the observed lifetime or censoring time of the ith observation. Assume that the lifetimes and censoring times are independent, i.e., the censoring is non-informative, and let F and C denote the sets of observations with lifetimes and censoring times, respectively. The log-likelihood function for the vector of parameters $\tau = (a,b,\beta,\lambda^\top)^\top$ from model (15) is
$$l(\tau) = r\log(a\,b) + \sum_{i\in F}(\alpha_i-1)\log(x_i) - \sum_{i\in F}(\alpha_i+\beta)\log(1+x_i) - \sum_{i\in F}\log[B(\alpha_i,\beta)] + (b-1)\sum_{i\in F}\log[I_{z_i}(\alpha_i,\beta)] - (b+1)\sum_{i\in F}\log[1-I_{z_i}(\alpha_i,\beta)] - a\sum_{i\in F}\left[\frac{I_{z_i}(\alpha_i,\beta)}{1-I_{z_i}(\alpha_i,\beta)}\right]^{b} - a\sum_{i\in C}\left[\frac{I_{z_i}(\alpha_i,\beta)}{1-I_{z_i}(\alpha_i,\beta)}\right]^{b}, \tag{16}$$
where r is the number of failures. The estimate $\hat{\tau}$ is found by maximizing Equation (16).
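The structure of (16) is easier to see in code: each failure contributes the log hazard plus the log survival, and each censored observation contributes only the log survival from (15). The sketch below is a minimal Python version, assuming NumPy and SciPy; the function name, the argument order, and the use of a design matrix `V` with rows $v_i^\top$ are choices made for this example only.

```python
import numpy as np
from scipy.special import betainc, betaln


def wbp_reg_loglik(a, b, beta, lam, x, delta, V):
    """Censored log-likelihood (16).

    delta[i] = 1 marks an observed lifetime (set F), 0 a censoring time (set C);
    V is the n x p design matrix, so alpha_i = exp(v_i' lam).
    """
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta, dtype=float)
    alpha = np.exp(np.asarray(V, dtype=float) @ np.asarray(lam, dtype=float))
    G = betainc(alpha, beta, x / (1.0 + x))
    odds_b = (G / (1.0 - G)) ** b                  # -a * odds_b is log S(x | v_i)
    log_hazard = (np.log(a * b)
                  + (alpha - 1.0) * np.log(x)
                  - (alpha + beta) * np.log1p(x)
                  - betaln(alpha, beta)
                  + (b - 1.0) * np.log(G)
                  - (b + 1.0) * np.log1p(-G))
    # Failures: log hazard + log survival; censored cases: log survival only.
    return np.sum(delta * log_hazard) - a * np.sum(odds_b)


# Tiny illustrative data set (values are made up for the example).
x = np.array([0.4, 1.2, 3.0, 0.8])
delta = np.array([1, 0, 1, 1])
V = np.column_stack([np.ones(4), [0.1, 0.5, 0.9, 0.3]])
value = wbp_reg_loglik(1.1, 0.6, 0.5, [0.2, 0.4], x, delta, V)
```

Passing the negative of this function to a numerical optimizer (after a positivity-preserving transformation of a, b, and $\beta$) would give the estimate $\hat{\tau}$.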

5.1. Diagnostic and Residual Analysis

The assessment of robustness aspects of the estimates in regression models has been an important concern of various researchers in recent decades. The deletion measures examine the impact on the estimates after dropping individual observations, and they are the most employed technique to detect influential observations; see, for example, Ref. [19].
A global influence measure considered by [20] is a generalization of the Cook distance defined by a standardized norm of $\hat{\theta}_{(i)} - \hat{\theta}$, namely
$$GD_i(\theta) = (\hat{\theta}_{(i)} - \hat{\theta})^\top\, \ddot{L}(\theta)\, (\hat{\theta}_{(i)} - \hat{\theta}), \tag{17}$$
where $\ddot{L}(\theta)$ is the observed information matrix.
Another influence measure is the likelihood distance given by
$$LD_i(\theta) = 2\left\{l(\hat{\theta}) - l(\hat{\theta}_{(i)})\right\}, \tag{18}$$
where $l(\hat{\theta})$ is the maximized log-likelihood function for the full sample and $l(\hat{\theta}_{(i)})$ is the maximized log-likelihood function for the sample excluding the ith observation.
The quantile residuals (qrs) have the form
$$qr_i = \Phi^{-1}\left(1 - \exp\left\{-\hat{a}\left[\frac{I_{z_i}(\hat{\alpha}_i,\hat{\beta})}{1-I_{z_i}(\hat{\alpha}_i,\hat{\beta})}\right]^{\hat{b}}\right\}\right), \tag{19}$$
where $\Phi^{-1}(\cdot)$ is the inverse of the standard normal cdf.
Various plots of these residuals can be adopted to assess the regression assumptions and detect influential observations.
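In (19), the argument of $\Phi^{-1}$ is one minus the fitted survival function (15) evaluated at $x_i$. A minimal Python sketch of that last step, using only the standard library, is given below; it assumes the fitted survival probabilities have already been computed, and the function name is illustrative.

```python
from statistics import NormalDist


def quantile_residuals(survival_probs):
    """Map fitted survival probabilities S(x_i | v_i) to qr_i = Phi^{-1}(1 - S_i)."""
    std_normal = NormalDist()          # standard normal: mean 0, standard deviation 1
    return [std_normal.inv_cdf(1.0 - s) for s in survival_probs]


# Hypothetical fitted survival probabilities, strictly between 0 and 1.
fitted_survival = [0.5, 0.025, 0.975]
residuals = quantile_residuals(fitted_survival)
```

`inv_cdf` raises an error for probabilities equal to 0 or 1, so fitted survival values on the boundary would need to be handled separately.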

5.2. Simulation Study

A simulation study examines the accuracy of the MLEs in the WBP regression model for $n = 100$, 250, and 500 and censoring percentages of 0%, 10%, and 30%. Here, 1000 replicates of each sample are generated using the inverse transformation method. The censoring times $c_1,\ldots,c_n$ are obtained from a Uniform$(0,\gamma)$ distribution, where $\gamma$ controls the censoring percentage. The systematic component for the parameter $\alpha_i$ (for $i = 1,\ldots,n$) is
$$\log(\alpha_i) = \lambda_0 + \lambda_1 v_{i1}, \tag{20}$$
where $\lambda_0 = 1$, $\lambda_1 = 1.5$, $\sigma = 0.3$, $a = 1.1$, and $b = 0.6$.
The simulation process follows as (for $i = 1,\ldots,n$):
(i) Generate $v_{i1} \sim \text{Uniform}(0,1)$, and calculate $\alpha_i$ from (20);
(ii) Determine the lifetimes $x_i^*$ from the WBP$(a,b,\alpha_i,\beta)$ model using Equation (6);
(iii) Generate $c_i \sim \text{Uniform}(0,\gamma)$ and obtain $x_i = \min(x_i^*, c_i)$;
(iv) Set the censoring indicator: if $x_i^* < c_i$, then $\delta_i = 1$; otherwise, $\delta_i = 0$.
The values in Table 2 reveal that the average estimates converge to the true parameters, and the MSEs and biases decrease when n grows. Furthermore, the biases and MSEs of the estimates become larger when the censoring percentage increases. Hence, we conclude that the estimators are consistent.
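Steps (i) to (iv) of the simulation process can be sketched in Python as follows, assuming NumPy and SciPy. The values of a, b, $\lambda_0$, and $\lambda_1$ are those listed above; the value used for $\beta$ and the choice of $\gamma$ are assumptions for this example (the text lists $\sigma = 0.3$ and no explicit $\beta$, and does not report $\gamma$), and the function name is illustrative.

```python
import numpy as np
from scipy.special import betaincinv


def simulate_censored_wbp(n, a, b, beta, lam0, lam1, gamma, seed=None):
    """Generate one censored sample following steps (i)-(iv)."""
    rng = np.random.default_rng(seed)
    v = rng.uniform(0.0, 1.0, size=n)              # step (i): covariate
    alpha = np.exp(lam0 + lam1 * v)                # systematic component (20)
    u = rng.uniform(size=n)                        # step (ii): inverse transform via (6)
    t = (-np.log1p(-u) / a) ** (1.0 / b)
    z = betaincinv(alpha, beta, t / (1.0 + t))
    with np.errstate(divide="ignore"):
        x_star = z / (1.0 - z)                     # latent lifetimes
    c = rng.uniform(0.0, gamma, size=n)            # step (iii): censoring times
    x = np.minimum(x_star, c)
    delta = (x_star < c).astype(int)               # step (iv): 1 = failure, 0 = censored
    return x, delta, v


# beta = 0.3 and gamma = 50 are assumed values, not taken from the article.
x, delta, v = simulate_censored_wbp(200, 1.1, 0.6, 0.3, 1.0, 1.5, 50.0, seed=42)
censoring_rate = 1.0 - delta.mean()
```

In a full study, $\gamma$ would be tuned until `censoring_rate` matches the target percentage (0%, 10%, or 30%), and the sample would then be passed to the maximization of (16).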

6. Applications

First, the fits of the WBP, BP, Beta Beta Prime (BBP), and Kumaraswamy Beta Prime (KwBP) distributions are compared. The BBP and KwBP are special models of the McDonald inverted beta (McIB) [13].
For all fitted models, we calculate the MLEs and their standard errors (SEs). The well-known statistics AIC, CAIC, and BIC are also calculated to compare the WBP distribution with its nested BP model. The Cramér–von Mises ($W^*$), Anderson–Darling ($A^*$), and Kolmogorov–Smirnov (K-S) statistics (with the K-S p-value) compare the WBP model with the other distributions using the AdequacyModel [18], MASS, and GenSA libraries of the R software. The maximization is performed using the SANN method.

6.1. Application 1: COVID-19 Data in the US

The first data set refers to 95 daily new deaths due to COVID-19 in the US (from 2 April 2021 to 31 July 2021) extracted from the link: https://www.worldometers.info/coronavirus/country/us/. This data set is used since the US is currently the country with the highest number of deaths from COVID-19. In the period, we find an average of 499.56 new deaths daily, and a standard deviation of 222.69, which can be explained by the evident variation in the number of daily deaths. In fact, the minimum number of daily deaths is 158 deaths, and the maximum is 985. In addition, we obtain skewness = 0.44 and kurtosis = 2.06.
Table 3 reports the MLEs and their SEs (in parentheses). The statistics (and the p-values of K-S) are reported in Table 4. The WBP distribution is better than the KwBP, BBP, and BP models.
The generalized likelihood ratio (GLR) test [21] assesses if there is any significant difference in the fits of the distributions. The WBP model outperforms the KwBP (GLR = 4.18) and BBP (GLR = 4.99) distributions for a significance level of 5%.
Figure 5a displays the histogram and the estimated WBP, KwBP, and BBP densities. Figure 5b reports the empirical and estimated cumulative distributions. The WBP distribution yields the best fit for a significance level of 5%.

6.2. Application 2: COVID-19 Data in Florianópolis, Brazil

According to the Votorantim Institute’s COVID-19 Municipal Vulnerability Index (MVI), Florianópolis is the least vulnerable capital to COVID-19 in Brazil [22]. In this context, the second application refers to 116 times (in days) of COVID-19 patients from the date of hospitalization until death in the city of Florianópolis, registered from January to March 2022 in the Ministry of Health platform at https://dados.gov.br/dataset/bd-srag-2021 (accessed on 26 May 2022). The average number of days from hospitalization to death is approximately 9.71 for patients in the analyzed period. The standard deviation is 7.67, which can be explained by the variation in these times. In fact, the minimum time from hospitalization to death is only one day and the maximum is 29 days. Furthermore, the skewness is 0.81 and the kurtosis is 2.75.
The MLEs, SEs, and the previous statistics (with the p-values of K-S) for the fitted distributions to these data are reported in Table 5 and Table 6. The numbers in the second table support that the WBP distribution is the best model.
The Vuong test [21] indicates that the new distribution is more adequate than the KwBP (GLR = 8.08) and BBP (GLR = 5.77) distributions for a 5% level of significance. A comparison of the WBP distribution with its BP sub-model gives LR = 31.21 (p-value = $1.668 \times 10^{-7}$). Thus, the WBP distribution is the best one to describe the current data.
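The reported p-value of the LR test can be reproduced with a one-line calculation. The sketch below assumes that the statistic is referred to a chi-square distribution with two degrees of freedom, one for each of the extra parameters a and b; the article does not state the degrees of freedom, so this is an assumption, although it agrees with the reported value up to the rounding of LR. For two degrees of freedom the upper tail has the closed form $\exp(-w/2)$.

```python
import math


def lr_pvalue_two_df(lr_statistic):
    """Upper-tail probability of a chi-square with 2 df: P(W > w) = exp(-w / 2)."""
    return math.exp(-lr_statistic / 2.0)


# LR statistic reported for WBP versus its BP sub-model (Florianopolis data).
p_value = lr_pvalue_two_df(31.21)   # about 1.67e-07
```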
The histogram of the data and some estimated densities are reported in Figure 6a. Figure 6b displays the empirical and estimated cumulative distributions. They show that the WBP is the best model for these data.

6.3. Application 3: COVID-19 Data in Campinas, Brazil

Some regression models are fitted to 655 survival times of coronavirus patients hospitalized (in April 2021) in the city of Campinas (state of São Paulo) obtained from https://opendatasus.saude.gov.br/en/dataset/srag-2021-e-2022 (accessed on 1 September 2022). This city has the third largest municipal population in the state, around 1,213,792 people in 2020 according to the Brazilian Institute of Geography and Statistics (IBGE) [23], thus justifying its choice for the application. The censoring percentage (67.8%) refers to deaths from other causes or the end of the observation time. The survival time is the period of time (in days) from the first symptom to death from COVID-19.
The variables are (for $i = 1,\ldots,655$):
  • $x_i$: observed time (in days);
  • $cens_i$: censoring indicator (0 = censoring, 1 = observed lifetime);
  • $v_{i1}$: age (in years);
  • $v_{i2}$: chronic cardiovascular pathology (1 = yes, 0 = no or not informed).
Other studies have analyzed the influence of covariates on the time to death from COVID-19. Ref. [24] analyzed coronavirus data in Curitiba, (Brazil) and verified the influence of the sex and age on the times (in days) elapsed from the date of hospitalization to the death. Ref. [25] investigated risk factors associated with these deaths in the Mexican population using survival analysis and concluded that the risk of death was higher for men, older individuals, chronic kidney disease patients, and people admitted to public health services.
First, the analysis is done by modeling only the response variable by fitting the WBP, KwBP, BBP, and BP distributions. The results of these preliminary analyses are reported in Figure 7, where the WBP distribution is better than the others.
Next, we consider the following systematic components:
$M_0$: $\log(\alpha_i) = \lambda_0$,
$M_1$: $\log(\alpha_i) = \lambda_0 + \lambda_1 v_{i1}$,
$M_2$: $\log(\alpha_i) = \lambda_0 + \lambda_2 v_{i2}$,
$M_3$: $\log(\alpha_i) = \lambda_0 + \lambda_1 v_{i1} + \lambda_2 v_{i2}$.
Table 7 gives the selection criteria values, and the WBP regression model has the lowest values for all systematic components. Note that this model with the structure $M_3$ is superior to the other models.
The WBP, BBP, KwBP, and BP regression models with the structure $M_3$ are evaluated using the quantile–quantile (QQ) and worm plots of the qrs in Figure 8 and Figure 9, respectively. The WBP regression model $M_3$ is better than the others, in agreement with the results in Table 7.
The findings for the final WBP regression model $M_3$ are given in Table 8, where the two covariates are significant.
Figure 10 displays the index plots of the case deletion measures $GD_i(\theta)$ and $LD_i(\theta)$. From Figure 10a, the 323rd, 409th, and 584th cases are possible influential observations referring to the following patients:
  • 323rd: A 42-year-old patient with a failure time of one day who does not have cardiovascular disease;
  • 409th: A 64-year-old patient with a failure time of one day who has cardiovascular disease;
  • 584th: A 57-year-old patient with a failure time of one day who has cardiovascular disease.
We examine the quality of fit of the WBP regression model $M_3$. The qrs are randomly scattered around zero, as shown in Figure 11a. The QQ plot of these residuals with a simulated envelope [26] is displayed in Figure 11b. We can accept that there is evidence of a good fit of the WBP regression model.
Some interpretations of the final WBP regression model:
  • The survival time tends to decrease when the patient gets older;
  • There is a difference for the survival times between patients with chronic cardiovascular disease and those that do not present this condition.

7. Conclusions

We proposed a four-parameter Weibull beta prime (WBP) distribution. The estimation was conducted by the maximum likelihood method, and a simulation study showed the consistency of the estimators. We constructed a WBP regression model for censored data and proved the importance of the new models using three COVID-19 data sets. They were compared with some known competing models, and they were more suitable to fit all data sets. The regression model with censored data from COVID-19 patients showed that advanced age and cardiovascular disease are significant factors for the survival time. We concluded that the proposed models can be interesting alternatives for symmetric and asymmetric data, with bimodal shapes, censored or uncensored. Finally, future extensions of the article include, for example, other systematic components, thus defining heteroscedastic regression models based on the WBP distribution. In addition, generalizations of the new regression model for multivariate configurations and linear mixed effects models can be investigated.

Author Contributions

Conceptualization, E.C.B. and G.M.C.; methodology, E.C.B., G.M.C. and L.H.d.S.; software, E.C.B., E.M.M.O. and G.M.R.; validation, E.C.B., G.M.C., E.M.M.O. and G.M.R.; formal analysis, E.C.B., E.M.M.O. and G.M.R.; investigation, E.C.B., E.M.M.O. and G.M.R.; data curation, E.C.B., E.M.M.O. and G.M.R.; writing—original draft preparation, G.M.C., L.H.d.S. and E.M.M.O.; writing—review and editing, G.M.C., E.C.B. and E.M.M.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Stated in the text.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bourguignon, M.; Silva, R.B.; Cordeiro, G.M. The Weibull-G family of probability distributions. J. Data Sci. 2014, 12, 53–68.
  2. Eugene, N.; Lee, C.; Famoye, F. Beta-normal distribution and its applications. Commun. Stat.-Theory Methods 2002, 31, 497–512.
  3. Cordeiro, G.M.; de Castro, M. A new family of generalized distributions. J. Stat. Comput. Simul. 2011, 81, 883–898.
  4. Afify, A.Z.; Al-Mofleh, H.; Aljohani, H.M.; Cordeiro, G.M. The Marshall-Olkin-Weibull-H family: Estimation, simulations, and applications to COVID-19 data. J. King Saud Univ.-Sci. 2022, 34, 102115.
  5. El-Sherpieny, E.-S.A.; Almetwally, E.M.; Muhammed, H.Z. Bivariate Weibull-G Family Based on Copula Function: Properties, Bayesian and non-Bayesian Estimation and Applications. Stat. Optim. Inf. Comput. 2022, 10, 678–709.
  6. Almongy, H.M.; Almetwally, E.M.; Alharbi, R.; Alnagar, D.; Hafez, E.H.; Mohie El-Din, M.M. The Weibull Generalized Exponential Distribution with Censored Sample: Estimation and Application on Real Data. Complexity 2021, 2021, 6653534.
  7. McDonald, J.B.; Richards, D.O. Model selection: Some generalized distributions. Commun. Stat.-Theory Methods 1987, 16, 1049–1074.
  8. Bourguignon, M.; Santos-Neto, M.; de Castro, M. A new regression model for positive random variables with skewed and long tail. Metron 2021, 79, 33–55.
  9. McDonald, J.B.; Butler, R.J. Regression models for positive random variables. J. Econom. 1990, 43, 227–251.
  10. McDonald, J.B.; Xu, Y.J. A generalization of the beta distribution with applications. J. Econom. 1995, 66, 133–152.
  11. Leão, J.; Bourguignon, M.; Saulo, H.; Santos-Neto, M.; Calsavara, V. The Negative Binomial Beta Prime Regression Model with Cure Rate: Application with a Melanoma Dataset. J. Stat. Theory Pract. 2021, 15, 1–21.
  12. Medeiros, F.M.C.; Araújo, M.C.; Bourguignon, M. Improved Estimators in Beta Prime Regression Models. 2020, pp. 1–18. Available online: https://arxiv.org/pdf/2008.11750v1.pdf (accessed on 18 October 2021).
  13. Cordeiro, G.M.; Lemonte, A.J. The McDonald inverted beta distribution. J. Frankl. Inst. 2012, 349, 1174–1197.
  14. Worldometer. COVID-19 CORONAVIRUS PANDEMIC. 2022. Available online: https://www.worldometers.info/coronavirus/ (accessed on 4 November 2022).
  15. Kenney, J.F.; Keeping, E.S. Mathematics of Statistics. D. Nostrand Co. 1961, 1, 429.
  16. Moors, J. A Quantile Alternative for Kurtosis. J. R. Stat. Soc. Ser. 1988, 37, 25–32.
  17. Gradshteyn, I.S.; Ryzhik, I.M. Table of Integrals, Series, and Products; Academic Press: Cambridge, MA, USA, 2000; Volume 1221.
  18. Marinho, P.R.D.; Silva, R.B.; Bourguignon, M.; Cordeiro, G.M.; Nadarajah, S. AdequacyModel: An R package for probability distributions and general purpose optimization. PLoS ONE 2019, 14, e0221487.
  19. Cook, R.D.; Weisberg, S. Residuals and Influence in Regression; Chapman and Hall: New York, NY, USA, 1982.
  20. Xie, F.C.; Wei, B.C. Diagnostics analysis in censored generalized Poisson regression model. J. Stat. Simul. 2007, 77, 695–708.
  21. Vuong, Q.H. Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses. Econom. J. Econom. Soc. 1989, 57, 307–333.
  22. Votorantim Institute. Browse the IVM Indicators. 2022. Available online: https://institutovotorantim.org.br/ivm/ (accessed on 4 November 2022).
  23. Brazilian Institute of Geography and Statistics. Available online: ibge.gov.br (accessed on 7 November 2022).
  24. Biazatti, E.C.; Cordeiro, G.M.; de Lima, M.C.S. The Dual-Dagum Family of Distributions: Properties, Regression and Applications to COVID-19 Data. Model Assist. Stat. Appl. 2022, 17, 199–210.
  25. Salinas-Escudero, G.; Carrillo-Vega, M.F.; Granados-García, V.; Martínez-Valverde, S.; Toledano-Toledano, F.; Garduño-Espinosa, J. A survival analysis of COVID-19 in the Mexican population. BMC Public Health 2020, 20, 1616.
  26. Atkinson, A.C. Plots, Transformations, and Regression: An Introduction to Graphical Methods of Diagnostic Regression Analysis; Clarendon Press Oxford: Oxford, UK, 1985; Volume 282.
Figure 1. Density functions: (a) WBP( a , b , α , β ) and (b) WBP( a , b , 5 , 2.5 ).
Figure 1. Density functions: (a) WBP( a , b , α , β ) and (b) WBP( a , b , 5 , 2.5 ).
Stats 05 00069 g001
Figure 2. Hazard rates: (a) WBP( a , b , 5.5 , 3 ) and (b) WBP( a , b , 2 , 4 ).
Figure 2. Hazard rates: (a) WBP( a , b , 5.5 , 3 ) and (b) WBP( a , b , 2 , 4 ).
Stats 05 00069 g002
Figure 3. Plots of B for the WBP( a , b , 2.5 , 3 ) distribution: (a) for b fixed and (b) for a fixed.
Figure 3. Plots of B for the WBP( a , b , 2.5 , 3 ) distribution: (a) for b fixed and (b) for a fixed.
Stats 05 00069 g003
Figure 4. Plots of M for the WBP( a , b , 2.5 , 3 ) distribution: (a) for b fixed and (b) for a fixed.
Figure 4. Plots of M for the WBP( a , b , 2.5 , 3 ) distribution: (a) for b fixed and (b) for a fixed.
Stats 05 00069 g004
Figure 5. (a) Best estimated densities for COVID-19 data in US; (b) empirical and estimated cumulative distributions.
Figure 5. (a) Best estimated densities for COVID-19 data in US; (b) empirical and estimated cumulative distributions.
Stats 05 00069 g005
Figure 6. (a) Best estimated densities for COVID-19 data in Florianópolis; (b) empirical and corresponding estimated cumulative distributions.
Figure 6. (a) Best estimated densities for COVID-19 data in Florianópolis; (b) empirical and corresponding estimated cumulative distributions.
Stats 05 00069 g006
Figure 7. Empirical and estimated survival functions for COVID-19 data in Campinas.
Figure 7. Empirical and estimated survival functions for COVID-19 data in Campinas.
Stats 05 00069 g007
Figure 8. QQ plots of the qrs for COVID-19 data in Campinas from the regression models: (a) WBP; (b) BBP; (c) KwBP; (d) BP.
Figure 9. Worm plots of the qrs for COVID-19 data in Campinas from the regression models: (a) WBP; (b) BBP; (c) KwBP; (d) BP.
Figure 10. Index plots for: (a) GD_i(θ) and (b) LD_i(θ).
Figure 11. Plots of the qrs for COVID-19 data in Campinas. (a) Index plot; (b) QQ plot with envelope.
Table 1. Simulation findings for the MLEs of the WBP distribution.

| Scenario | n | Measure | â | b̂ | α̂ | β̂ |
|---|---|---|---|---|---|---|
| Scenario 1 | 50 | Average | 0.87959 | 1.22248 | 2.33968 | 2.05307 |
| | | Bias | 0.12959 | −0.27752 | −0.16032 | 0.05307 |
| | | MSE | 0.05690 | 0.10848 | 0.06048 | 0.02369 |
| | 75 | Average | 0.86856 | 1.22294 | 2.33987 | 2.04091 |
| | | Bias | 0.11856 | −0.27706 | −0.16013 | 0.04091 |
| | | MSE | 0.04698 | 0.10396 | 0.05023 | 0.01599 |
| | 100 | Average | 0.86195 | 1.22530 | 2.34140 | 2.03094 |
| | | Bias | 0.11195 | −0.27469 | −0.15859 | 0.03094 |
| | | MSE | 0.04223 | 0.10096 | 0.04198 | 0.01082 |
| Scenario 2 | 50 | Average | 0.87959 | 0.99547 | 0.93431 | 1.54015 |
| | | Bias | 0.12959 | −0.20453 | −0.06569 | 0.04015 |
| | | MSE | 0.05690 | 0.05554 | 0.02394 | 0.03631 |
| | 75 | Average | 0.86856 | 0.99776 | 0.94906 | 1.53002 |
| | | Bias | 0.11856 | −0.20224 | −0.05094 | 0.03002 |
| | | MSE | 0.04698 | 0.04799 | 0.01850 | 0.03098 |
| | 100 | Average | 0.86195 | 1.00016 | 0.96222 | 1.52613 |
| | | Bias | 0.11195 | −0.19984 | −0.03778 | 0.02613 |
| | | MSE | 0.04223 | 0.04374 | 0.01424 | 0.02899 |
| Scenario 3 | 50 | Average | 0.87959 | 0.99547 | 1.84264 | 2.57223 |
| | | Bias | 0.12959 | −0.20453 | −0.15736 | 0.07223 |
| | | MSE | 0.05690 | 0.05554 | 0.05855 | 0.05216 |
| | 75 | Average | 0.86856 | 0.99776 | 1.84443 | 2.56393 |
| | | Bias | 0.11856 | −0.20224 | −0.15557 | 0.06393 |
| | | MSE | 0.04698 | 0.04799 | 0.05382 | 0.04080 |
| | 100 | Average | 0.86195 | 1.00016 | 1.84516 | 2.55868 |
| | | Bias | 0.11195 | −0.19984 | −0.15484 | 0.05868 |
| | | MSE | 0.04223 | 0.04374 | 0.05201 | 0.03498 |
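The three measures reported in Table 1 can be recomputed from the Monte Carlo replicates of any single estimator. The sketch below is illustrative only: the function name `summarize_estimates` and the replicate values are hypothetical, and it assumes the usual definitions (bias as average minus true value, MSE as the mean squared deviation from the true value). Under those definitions, Average minus Bias recovers the true parameter; for example, 0.87959 − 0.12959 = 0.75 for a in Scenario 1.

```python
import numpy as np


def summarize_estimates(estimates, true_value):
    """Return (average, bias, MSE) for Monte Carlo estimates of one parameter.

    Assumes the usual definitions: bias = average - true value, and
    MSE = mean squared deviation of the estimates from the true value.
    """
    estimates = np.asarray(estimates, dtype=float)
    average = estimates.mean()
    bias = average - true_value
    mse = np.mean((estimates - true_value) ** 2)
    return average, bias, mse


# Hypothetical replicates of an estimator whose true value is 0.75.
replicates = [0.80, 0.90, 1.00]
average, bias, mse = summarize_estimates(replicates, true_value=0.75)
print(f"average={average:.5f} bias={bias:.5f} mse={mse:.5f}")
```

Because MSE equals the variance of the estimates plus the squared bias, an MSE that shrinks as n grows while the bias stays nearly constant indicates that most of the improvement comes from reduced variance.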
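Table 2. Simulations from the WBP regression model.

| % | τ | Average (n = 100) | Bias (n = 100) | MSE (n = 100) | Average (n = 250) | Bias (n = 250) | MSE (n = 250) | Average (n = 500) | Bias (n = 500) | MSE (n = 500) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0% | λ_0 | 1.3214 | 0.3214 | 0.6110 | 1.1415 | 0.1415 | 0.2501 | 1.0519 | 0.0519 | 0.1395 |
| | λ_1 | 1.5157 | 0.0157 | 0.3445 | 1.4956 | −0.0044 | 0.1245 | 1.5054 | 0.0054 | 0.0584 |
| | σ | 0.4885 | 0.1885 | 0.1458 | 0.3724 | 0.0724 | 0.0383 | 0.3313 | 0.0313 | 0.0173 |
| | a | 1.1365 | 0.0365 | 0.6262 | 1.0916 | −0.0084 | 0.2320 | 1.0843 | −0.0157 | 0.1364 |
| | b | 0.5892 | −0.0108 | 0.0593 | 0.6333 | 0.0333 | 0.0343 | 0.6564 | 0.0564 | 0.0251 |
| 10% | λ_0 | 1.3284 | 0.3284 | 0.6531 | 1.1450 | 0.1450 | 0.2611 | 1.0533 | 0.0533 | 0.1464 |
| | λ_1 | 1.5191 | 0.0191 | 0.3682 | 1.4960 | −0.0040 | 0.1317 | 1.5055 | 0.0055 | 0.0604 |
| | σ | 0.5021 | 0.2021 | 0.1647 | 0.3755 | 0.0755 | 0.0414 | 0.3340 | 0.0340 | 0.0190 |
| | a | 1.1358 | 0.0358 | 0.6650 | 1.1093 | 0.0093 | 0.2959 | 1.0861 | −0.0139 | 0.1462 |
| | b | 0.5884 | −0.0116 | 0.0636 | 0.6341 | 0.0341 | 0.0371 | 0.6567 | 0.0567 | 0.0266 |
| 30% | λ_0 | 1.3866 | 0.3866 | 0.7464 | 1.1747 | 0.1747 | 0.2983 | 1.0727 | 0.0727 | 0.1707 |
| | λ_1 | 1.5168 | 0.0168 | 0.3956 | 1.5005 | 0.0005 | 0.1372 | 1.5088 | 0.0088 | 0.0660 |
| | σ | 0.5621 | 0.2621 | 0.2482 | 0.3955 | 0.0955 | 0.0546 | 0.3467 | 0.0467 | 0.0254 |
| | a | 1.1062 | 0.0062 | 0.5549 | 1.1055 | 0.0055 | 0.3272 | 1.0832 | −0.0168 | 0.1625 |
| | b | 0.5737 | −0.0263 | 0.0738 | 0.6238 | 0.0238 | 0.0400 | 0.6501 | 0.0501 | 0.0286 |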
Table 3. Findings for COVID-19 data in US.

| Distribution | â | b̂ | α̂ | β̂ |
|---|---|---|---|---|
| WBP | 1.2429 | 4.5036 | 33.4668 | 0.2694 |
| | (0.3271) | (0.3768) | (1.5 × 10⁻⁵) | (0.0115) |
| KwBP | 25.7127 | 78.0954 | 8.8654 | 0.47724 |
| | (0.0551) | (0.0266) | (0.0549) | (0.0056) |
| BBP | 46.0854 | 32.1934 | 14.8327 | 0.2898 |
| | (0.0087) | (0.0098) | (0.8696) | (0.0035) |
| BP | - | - | 10.0000 | 0.2753 |
| | - | - | (2.1758) | (0.0313) |
Table 4. Adequacy measures for COVID-19 data in US.

| Distribution | W* | A* | K-S | p-Value | AIC | CAIC | BIC |
|---|---|---|---|---|---|---|---|
| WBP | 0.1061 | 0.7499 | 0.2517 | 1.2 × 10⁻⁵ | 1350.97 | 1351.41 | 1361.19 |
| KwBP | 0.1104 | 0.8457 | 0.3425 | 4.2 × 10⁻¹⁰ | 1394.51 | 1394.96 | 1404.73 |
| BBP | 0.1142 | 0.8814 | 0.3504 | 1.5 × 10⁻¹⁰ | 1424.63 | 1425.07 | 1434.84 |
| BP | 0.1166 | 0.9023 | 0.4934 | < 2.2 × 10⁻¹⁶ | 1595.83 | 1595.96 | 1600.94 |
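The AIC, CAIC, and BIC columns of Table 4 are penalized versions of the maximized log-likelihood, with smaller values indicating a better trade-off between fit and complexity. The sketch below is not the authors' code: it assumes the common definitions AIC = −2ℓ + 2k and BIC = −2ℓ + k log(n), and takes CAIC to be the small-sample corrected AIC, AIC + 2k(k + 1)/(n − k − 1). This section does not state which CAIC definition is used, and the log-likelihood, parameter count, and sample size in the example are hypothetical.

```python
import math


def information_criteria(loglik, k, n):
    """Return (AIC, CAIC, BIC) from a maximized log-likelihood.

    loglik: maximized log-likelihood of the fitted model
    k:      number of estimated parameters
    n:      sample size
    CAIC here is the small-sample corrected AIC (an assumption;
    other definitions of CAIC exist in the literature).
    """
    aic = -2.0 * loglik + 2.0 * k
    bic = -2.0 * loglik + k * math.log(n)
    caic = aic + 2.0 * k * (k + 1) / (n - k - 1)
    return aic, caic, bic


# Hypothetical four-parameter fit on 50 observations.
aic, caic, bic = information_criteria(loglik=-100.0, k=4, n=50)
print(f"AIC={aic:.2f} CAIC={caic:.2f} BIC={bic:.2f}")
```

Under these definitions, the gaps between the three columns depend only on k and n, not on the log-likelihood, so models with the same number of parameters fitted to the same data are ranked identically by all three criteria.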
Table 5. Findings for COVID-19 data in Florianópolis.

| Distribution | â | b̂ | α̂ | β̂ |
|---|---|---|---|---|
| WBP | 0.3543 | 0.1876 | 38.2987 | 10.0908 |
| | (0.0550) | (0.0161) | (0.3971) | (0.4519) |
| KwBP | 2.2611 | 0.0648 | 10.3668 | 13.5489 |
| | (0.0004) | (0.0060) | (0.0002) | (0.0001) |
| BBP | 0.0619 | 38.7466 | 88.5759 | 0.5290 |
| | (0.0058) | (0.0009) | (0.0006) | (0.0110) |
| BP | - | - | 2.1732 | 0.7719 |
| | - | - | (0.2970) | (0.0881) |
Table 6. Adequacy measures for COVID-19 data in Florianópolis.

| Distribution | W* | A* | K-S | p-Value | AIC | CAIC | BIC |
|---|---|---|---|---|---|---|---|
| WBP | 0.4177 | 2.9113 | 0.2102 | 7.1 × 10⁻⁵ | 800.02 | 800.38 | 811.03 |
| KwBP | 0.5118 | 3.4879 | 0.3246 | 4.9 × 10⁻¹¹ | 833.40 | 833.76 | 844.42 |
| BBP | 0.5653 | 3.8228 | 0.2874 | 9.5 × 10⁻⁹ | 824.16 | 824.53 | 835.18 |
| BP | 0.5025 | 3.4383 | 0.2409 | 2.9 × 10⁻⁶ | 827.23 | 827.34 | 832.74 |
Table 7. Adequacy measures from regression models for COVID-19 data in Campinas.

| Model | | AIC | BIC | CAIC | Model | | AIC | BIC | CAIC |
|---|---|---|---|---|---|---|---|---|---|
| M_1 | WBP | 2093.696 | 2111.635 | 2115.635 | M_3 | WBP | 2071.653 | 2094.076 | 2099.076 |
| | BBP | 2160.907 | 2178.845 | 2182.845 | | BBP | 2140.201 | 2162.624 | 2167.624 |
| | KwBP | 2111.858 | 2129.796 | 2133.796 | | KwBP | 2090.371 | 2112.794 | 2117.794 |
| | BP | 2148.946 | 2157.915 | 2159.915 | | BP | 2127.432 | 2140.885 | 2143.885 |
| M_1 | WBP | 2046.338 | 2068.762 | 2073.762 | M_3 | WBP | 2041.642 | 2068.550 | 2074.550 |
| | BBP | 2128.641 | 2151.064 | 2156.064 | | BBP | 2122.585 | 2149.493 | 2155.493 |
| | KwBP | 2071.496 | 2093.919 | 2098.919 | | KwBP | 2065.202 | 2092.110 | 2098.110 |
| | BP | 2115.254 | 2128.708 | 2131.708 | | BP | 2109.588 | 2127.527 | 2131.527 |
Table 8. Estimation results from the WBP regression model for COVID-19 data in Campinas.

| Parameter | MLE | SE | p-Value |
|---|---|---|---|
| λ_0 | 0.2154 | 0.0497 | <0.001 |
| λ_1 | −0.0099 | 0.0010 | <0.001 |
| λ_2 | −0.1257 | 0.0338 | <0.001 |
| log(β) | −1.5956 | 0.0066 | <0.001 |
| log(a) | −2.1310 | 0.0485 | <0.001 |
| log(b) | 1.6418 | 0.0272 | <0.001 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Biazatti, E.C.; Cordeiro, G.M.; Rodrigues, G.M.; Ortega, E.M.M.; de Santana, L.H. A Weibull-Beta Prime Distribution to Model COVID-19 Data with the Presence of Covariates and Censored Data. Stats 2022, 5, 1159-1173. https://doi.org/10.3390/stats5040069