On the Analysis of New COVID-19 Cases in Pakistan Using an Exponentiated Version of the M Family of Distributions

Abstract: This paper develops the exponentiated M family of continuous distributions, aiming to provide new statistical models for data fitting purposes. It stands out from the other families, as it depends on two baseline distributions, with the use of ratio and power transforms in the definition of the main cumulative distribution function. Thanks to the joint action of the possibly different baseline distributions, flexible statistical models can be created, motivating a complete study in this regard. Thus, we discuss the theoretical properties of the new family, with emphasis on those of potential interest in probability and statistics. Then, a new three-parameter lifetime distribution is derived, with the choices of the inverse exponential and exponential distributions as baselines. After pointing out the great flexibility of the related model, we apply it to analyze an actual dataset of current interest: the daily COVID-19 cases observed in Pakistan from 21 March to 29 May 2020 (inclusive). As notable results, we demonstrate that the proposed model fits these data better than 15 top ranked models in the literature, including the inverse exponential and exponential models, several modern extensions of them depending on more parameters, and the "unexponentiated" version of the proposed model as well. As future perspectives, the proposed model can be of interest to analyze data on COVID-19 cases in other countries, for possible comparison studies.


Introduction
The modeling and analysis of real-life data are essential to understand important features of random phenomena and to draw suitable conclusions. In particular, this requires the choice of statistical models based on probability distributions, whose adequacy with respect to the observations strongly influences the relevance of the outputs. The analysis of recent data in applied sciences (environmental sciences, engineering, finance, etc.) has shown the limitations of the classical distributions, whose lack of flexibility can hide important details. To overcome these limitations, new distributions, often grouped into specific families of distributions, have been created. A short list of the notable families is the following: the skew-normal family (see [1]), Marshall-Olkin-G family (see [2]), exponentiated-G family (see [3]), beta-G family (see [4]), order statistics-G family (see [5]), sinh-arcsinh-G family (see [6]), transmuted-G family (see [7]), gamma-G family (see [8]), Kumaraswamy-G family (see [9]), Topp-Leone-G family (see [10]), and ratio-exponentiated-G family (see [11]). The general motivation behind them is to extend the modeling properties of a classical baseline distribution by adding one or more tuning parameters through the use of various flexible transformations (power, beta, gamma, ratio, etc.).
Among all the proposed families, the M family of continuous distributions introduced by [12] stands out from the others due to its original construction; it is defined by a cumulative distribution function (cdf) based on a ratio involving two baseline cdfs, with possibly different characteristics. More specifically, the corresponding cdf is defined by:
$$F(x; \xi_1, \xi_2) = \frac{F_1(x; \xi_1) + F_2(x; \xi_2)}{1 + F_1(x; \xi_1)}, \quad x \in \mathbb{R}, \qquad (1)$$
where $F_1(x; \xi_1)$ and $F_2(x; \xi_2)$ are two cdfs of continuous distributions with sets of parameters represented by $\xi_1$ and $\xi_2$, respectively. These two baseline cdfs can be chosen independently of each other, without a particular condition. However, for practical purposes, in order to avoid over-parametrization, it is recommended not to have too many parameters involved; one can reduce $\xi_1$ and $\xi_2$ to a unique parameter, or take $\xi_1$ and $\xi_2$ as two different parameters, or $\xi_1$ can be chosen as a subset of the parameters of $\xi_2$, or vice versa. Clearly, the M family contains a plethora of ratio distributions and models, since a multitude of choices for $F_1(x; \xi_1)$ and $F_2(x; \xi_2)$ is possible. However, to the best of our knowledge, this versatile aspect has not been fully explored yet. Indeed, in the former work of [12], $F(x; \xi_1, \xi_2)$ was presented as (1), with the proof that it satisfied the properties of a valid cdf. Then, as a direct application, a new two-parameter lifetime distribution was defined by the cdf (1) under the following simple configuration: $F_1(x; \xi_1) = F_2(x; \xi_2)$, and $F_1(x; \xi_1)$ was chosen as the cdf of the Weibull distribution, i.e., $F_1(x; \xi_1) = F_1(x; a, b) = 1 - e^{-(x/b)^a}$, $a, b, x > 0$. Thus, the corresponding cdf is given by:
$$F(x; a, b) = \frac{2\left[1 - e^{-(x/b)^a}\right]}{2 - e^{-(x/b)^a}}, \quad x > 0. \qquad (2)$$
As a main application, it was shown that the related model had a better fit than the exponentiated exponential, Weibull, and gamma models for the failure times of the air conditioning system data from [13]. This nice result validated the entry of the M family on the short list.
However, for the special configuration $F_1(x; \xi_1) = F_2(x; \xi_2)$, the M family loses its intrinsic originality for the following reasons: (i) it does not mix different features for the baseline cdfs; (ii) it is included in the well-known Marshall-Olkin family since we can express $F(x; \xi_1, \xi_2)$ as:
$$F(x; \xi_1, \xi_2) = \frac{F_1(x; \xi_1)}{1 - (1 - \theta)\left[1 - F_1(x; \xi_1)\right]},$$
with $\theta = 1/2$. That is, the general form of $F(x; \xi_1, \xi_2)$ is exploited at its minimum; the M family has not revealed all of its potential. Based on the previous setting, the contributions of the paper can be summarized as follows: (i) We introduce a simple and natural extension of the M family by the use of the power transform, called the EM family. (ii) We provide some mathematical results on this family, which are also new and applicable to the former M family. (iii) We consider $F_1(x; \xi_1)$ and $F_2(x; \xi_2)$ of different natures, i.e., inverse exponential and exponential, respectively, to create a new promising (three-parameter lifetime) distribution, which demonstrates a high modeling ability for data fitting; versatile shapes are observed for the main functions. (iv) We investigate the estimation of the model parameters by a top ranked method in terms of efficiency: the maximum likelihood method. (v) We apply this model to an actual dataset of COVID-19 cases observed in Pakistan during the year 2020. As a main result, for these data of particular interest, the proposed model shows an excellent fitting behavior, better than that of 15 other top ranked models in the literature, attesting to the importance of these findings.
The remainder of the paper is organized as follows. In Section 2, the EM family is introduced, and some of its mathematical results are proven. A special distribution of interest is presented in Section 3, with discussions. The estimation of the related model parameters is studied in Section 4. The application to a COVID-19 dataset is presented in Section 5. The conclusion is given in Section 6.

The EM Family
Here, the EM family is defined, with some of its important mathematical properties.

Definition
The EM family is the exponentiated version of the M family, that is, a mix between the exponentiated-G and M families developed by [3,12], respectively. It is defined by the following cdf:
$$F(x; \xi_1, \xi_2, \gamma) = \left[\frac{F_1(x; \xi_1) + F_2(x; \xi_2)}{1 + F_1(x; \xi_1)}\right]^{\gamma}, \quad x \in \mathbb{R}, \qquad (3)$$
where $\gamma > 0$ is a shape parameter and $F_1(x; \xi_1)$ and $F_2(x; \xi_2)$ are two baseline cdfs of continuous distributions with sets of parameters represented by $\xi_1$ and $\xi_2$, respectively. These two baseline cdfs can be chosen independently of each other. Naturally, by taking $\gamma = 1$, we recover the cdf of the former M family. The role of $\gamma$ is to make the rigid ratio cdf given by (1) more flexible, aiming to improve several of its characteristics (skewness, kurtosis, heaviness of the tails, properties of the modes, etc.). Furthermore, since the ratio in (3) takes values in $[0, 1]$, the following stochastic ordering result holds: for any $x \in \mathbb{R}$, $F(x; \xi_1, \xi_2, \gamma) \le F(x; \xi_1, \xi_2)$ for $\gamma \ge 1$, and $F(x; \xi_1, \xi_2, \gamma) \ge F(x; \xi_1, \xi_2)$ for $\gamma \in (0, 1]$, showing different perspectives of modeling for the EM family in comparison to the former M family. Among the notable studies employing the exponentiated technique, we may refer the reader to [14][15][16].
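As an illustration of the definition, the following minimal Python sketch evaluates an EM-type cdf from two baseline cdfs. It assumes that the ratio in the cdf has the form $(F_1 + F_2)/(1 + F_1)$, which is our reading of the construction; the function names (`em_cdf`, `inv_exp_cdf`, `exp_cdf`) and the parameter values are hypothetical and chosen only for the example.

```python
import math


def em_cdf(x, F1, F2, gamma):
    """EM-type cdf, ASSUMING the ratio form [(F1 + F2) / (1 + F1)] ** gamma."""
    a, b = F1(x), F2(x)
    return ((a + b) / (1.0 + a)) ** gamma


# Hypothetical baselines, used only for illustration.
def inv_exp_cdf(x, alpha=1.0):
    """Inverse exponential cdf exp(-alpha / x) for x > 0, and 0 otherwise."""
    return math.exp(-alpha / x) if x > 0 else 0.0


def exp_cdf(x, beta=1.0):
    """Exponential cdf 1 - exp(-beta * x) for x > 0, and 0 otherwise."""
    return 1.0 - math.exp(-beta * x) if x > 0 else 0.0


for x in (0.5, 1.0, 2.0, 5.0):
    m = em_cdf(x, inv_exp_cdf, exp_cdf, gamma=1.0)  # gamma = 1: M family
    e = em_cdf(x, inv_exp_cdf, exp_cdf, gamma=2.0)  # exponentiated version
    print(f"x={x:4.1f}  M cdf={m:.4f}  EM cdf (gamma=2)={e:.4f}")
```

With $\gamma \ge 1$, the printed EM values should never exceed the M values, in line with the ordering between the two families.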
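In addition to the cdf, the probability density function (pdf) of a continuous distribution plays a fundamental role in probability and statistics. The pdf of the EM family is given by differentiating $F(x; \xi_1, \xi_2, \gamma)$ with respect to $x$, almost everywhere. After some developments, it is obtained as:
$$f(x; \xi_1, \xi_2, \gamma) = \gamma\, \frac{\left\{f_1(x; \xi_1)\left[1 - F_2(x; \xi_2)\right] + f_2(x; \xi_2)\left[1 + F_1(x; \xi_1)\right]\right\}\left[F_1(x; \xi_1) + F_2(x; \xi_2)\right]^{\gamma - 1}}{\left[1 + F_1(x; \xi_1)\right]^{\gamma + 1}}, \quad x \in \mathbb{R}, \qquad (4)$$
where $f_1(x; \xi_1)$ and $f_2(x; \xi_2)$ are the pdfs corresponding to $F_1(x; \xi_1)$ and $F_2(x; \xi_2)$, respectively. That is, for a random variable $X$ defined on a generic probability space, say $(\Omega, \mathcal{A}, P)$, having the pdf of the EM family and any (measurable) set $A \subseteq \mathbb{R}$, we have:
$$P(X \in A) = \int_A f(x; \xi_1, \xi_2, \gamma)\,dx.$$
The pdf is also central in the transfer theorem, which ensures that, for any function of $X$, say $T(X)$, the expectation of $T(X)$ is given by:
$$E[T(X)] = \int_{-\infty}^{+\infty} T(x) f(x; \xi_1, \xi_2, \gamma)\,dx, \qquad (5)$$
provided that it exists. From this formula, several types of moments, coefficients, probabilistic functions, and entropy can be defined (see [17]). Let us mention that, thanks to their integral expressions, $P(X \in A)$ and $E[T(X)]$ can be determined numerically with the help of any mathematical software.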
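The relation between the cdf and the pdf can be checked numerically. The sketch below assumes the ratio form $[(F_1 + F_2)/(1 + F_1)]^{\gamma}$ for the cdf and the pdf obtained from it by the quotient and chain rules; the baselines and the parameter values (`ALPHA`, `BETA`, `GAMMA`) are hypothetical. It compares the analytical pdf with a central finite difference of the cdf.

```python
import math

# Hypothetical parameter values, for illustration only.
ALPHA, BETA, GAMMA = 1.0, 0.5, 2.0


def F1(x):  # inverse exponential cdf (x > 0)
    return math.exp(-ALPHA / x)


def f1(x):  # inverse exponential pdf (x > 0)
    return ALPHA / x**2 * math.exp(-ALPHA / x)


def F2(x):  # exponential cdf (x > 0)
    return 1.0 - math.exp(-BETA * x)


def f2(x):  # exponential pdf (x > 0)
    return BETA * math.exp(-BETA * x)


def em_cdf(x):
    """Assumed EM cdf: [(F1 + F2) / (1 + F1)] ** gamma."""
    return ((F1(x) + F2(x)) / (1.0 + F1(x))) ** GAMMA


def em_pdf(x):
    """Derivative of the assumed EM cdf with respect to x."""
    num = f1(x) * (1.0 - F2(x)) + f2(x) * (1.0 + F1(x))
    return GAMMA * num * (F1(x) + F2(x)) ** (GAMMA - 1.0) / (1.0 + F1(x)) ** (GAMMA + 1.0)


def central_diff(g, x, h=1e-5):
    """Second-order central finite difference approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2.0 * h)


for x in (0.3, 1.0, 4.0):
    print(x, em_pdf(x), central_diff(em_cdf, x))
```

The two printed columns should agree to several decimals, which gives a quick sanity check on any hand-derived pdf.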

Reliability Functions
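The following functions of the EM family are central in various probability and statistics areas, with emphasis on reliability analysis. First of all, the survival function (sf) is specified as:
$$S(x; \xi_1, \xi_2, \gamma) = 1 - F(x; \xi_1, \xi_2, \gamma).$$
Furthermore, the hazard rate function (hrf), reversed hazard rate function (rhrf), and cumulative hazard rate function (chrf) are given by, respectively,
$$h(x; \xi_1, \xi_2, \gamma) = \frac{f(x; \xi_1, \xi_2, \gamma)}{S(x; \xi_1, \xi_2, \gamma)}, \qquad r(x; \xi_1, \xi_2, \gamma) = \frac{f(x; \xi_1, \xi_2, \gamma)}{F(x; \xi_1, \xi_2, \gamma)},$$
and:
$$H(x; \xi_1, \xi_2, \gamma) = -\log\left[S(x; \xi_1, \xi_2, \gamma)\right].$$
All the details on these functions, along with their applications in concrete settings, can be found in [18]. In the next part of the study, emphasis will be put on the pdf and hrf, due to their strong meaning in the fitting of data.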
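The four reliability functions follow from the cdf and pdf through their standard definitions, so they can be coded once and applied to any distribution. The sketch below uses these standard definitions; the helper names and the exponential test distribution (rate `BETA`) are hypothetical and serve only as a sanity check, since the exponential hrf is known to be constant.

```python
import math


def survival(F, x):
    """Survival function: S(x) = 1 - F(x)."""
    return 1.0 - F(x)


def hazard(F, f, x):
    """Hazard rate function: h(x) = f(x) / S(x)."""
    return f(x) / (1.0 - F(x))


def reversed_hazard(F, f, x):
    """Reversed hazard rate function: r(x) = f(x) / F(x)."""
    return f(x) / F(x)


def cumulative_hazard(F, x):
    """Cumulative hazard rate function: H(x) = -log S(x)."""
    return -math.log(1.0 - F(x))


# Sanity check on a hypothetical exponential distribution with rate BETA.
BETA = 0.5
exp_cdf = lambda x: 1.0 - math.exp(-BETA * x)
exp_pdf = lambda x: BETA * math.exp(-BETA * x)

for x in (0.5, 2.0, 6.0):
    print(x, hazard(exp_cdf, exp_pdf, x), cumulative_hazard(exp_cdf, x))
```

For the exponential case, the hazard rate stays at `BETA` and the cumulative hazard grows as `BETA * x`; passing the cdf and pdf of an EM distribution instead gives the corresponding EM reliability functions.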

Properties
Stochastic ordering results: Now, we aim to compare the EM family with other existing families of distributions in the (usual) stochastic ordering sense (see [19]). That is, for two random variables X and Y for which at least one has the cdf of the EM family, we formalize the fact that X is less likely than Y to take any value lower than x, i.e., P(X ≤ x) ≤ P(Y ≤ x). The main results are presented in the following proposition. Proposition 1. The following inequalities hold.
From the statistical point of view, Proposition 1 reveals a distributional hierarchy within the EM models and among other well-identified models of the literature. Thus, in comparison to these models, the EMIEE models offer new alternatives, depending on the distribution of the data.
Some series expansions: A series expansion of the cdf of the EM family in terms of simpler cdfs (defined as the product of exponentiated baseline cdfs) is described in the following proposition.

Proposition 2.
The cdf of the EM family can be expressed as: where:

Proof. The key of the proof is to notice that $F(x; \xi_1, \xi_2, \gamma)$ can be written as: , the general and standard binomial formulae give: Then, we can remark that $G_{\ell,m}(x; \xi_1, \xi_2)$ is a valid cdf, corresponding to the one of the random variable max(X are m independent and identically distributed (iid) random variables having the cdf $F_1(x; \xi_1)$ and X

From Proposition 2, we can derive a useful sum expression of the pdf of the EM family, as presented below.

Corollary 1. Let us consider the notations of Proposition 2. The pdf of the EM family can be expressed as:

Owing to Corollary 1 and (5), one can provide the following sum expression for $E[T(X)]$: where, by denoting $Q_1(u; \xi_1)$ and $Q_2(u; \xi_2)$ the inverse functions of $F_1(x; \xi_1)$ and $F_2(x; \xi_2)$, respectively, $I_{k,\ell,m}$ is given by: The involved integrals can have closed-forms, depending on the complexity of Then, one can admit the following useful approximation: where $K$ and $M$ denote large integers, such that the residual term of the approximation is negligible. Hence, we approximate the complicated integral quantity $E[T(X)]$ by a finite sum of computable coefficients, which can be more efficient than computing the integral directly.

On a Special EM Distribution
The EM family contains a myriad of new ratio distributions. Here, we focus on a new promising one, exploiting the mix of baseline cdfs of different natures.

Definition and Shapes' Analysis
Here, we introduce the EM inverse exponential exponential (EMIEE) distribution with parameters $\alpha > 0$, $\beta > 0$, and $\gamma > 0$, defined by the cdf given by (3), under the following configuration: $F_1(x; \alpha) = e^{-\alpha/x}$, $x > 0$, corresponding to the cdf of the inverse exponential distribution with parameter $\alpha$ (see [20]), and $F_2(x; \beta) = 1 - e^{-\beta x}$, $x > 0$, corresponding to the cdf of the exponential distribution with parameter $\beta$.
The choice of these functions is motivated by the following arguments: (i) $F_1(x; \alpha)$ and $F_2(x; \beta)$ are simple, both depending on only one parameter; (ii) the inverse exponential and exponential distributions are complementary, showing different characteristics on the tails, with polynomial versus exponential decay: as $x \to +\infty$, $1 - F_1(x; \alpha) \sim \alpha/x$, whereas $1 - F_2(x; \beta) = e^{-\beta x}$. We thus aim to mix the features of these two distributions following the scheme of the EM family. That is, the cdf of the EMIEE distribution is the following:
$$F(x; \alpha, \beta, \gamma) = \left[\frac{1 + e^{-\alpha/x} - e^{-\beta x}}{1 + e^{-\alpha/x}}\right]^{\gamma}, \quad x > 0, \qquad (7)$$
and $F(x; \alpha, \beta, \gamma) = 0$ for $x \le 0$. It represents a three-parameter lifetime distribution, with remarkably flexible properties. This aspect is developed in the next part of the study. At first glance, note that, if $\alpha \to +\infty$, then $F(x; \alpha, \beta, \gamma)$ becomes the cdf of the exponentiated exponential distribution with parameters $\beta$ and $\gamma$ (see [3]), and if $\gamma = 1$, we obtain the "unexponentiated" version of the distribution, naturally called the MIEE distribution.
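By differentiating with respect to $x$, the pdf of the EMIEE distribution is given by:
$$f(x; \alpha, \beta, \gamma) = \gamma\, e^{-\beta x}\left[\frac{\alpha}{x^2} e^{-\alpha/x} + \beta\left(1 + e^{-\alpha/x}\right)\right] \frac{\left(1 + e^{-\alpha/x} - e^{-\beta x}\right)^{\gamma - 1}}{\left(1 + e^{-\alpha/x}\right)^{\gamma + 1}}, \quad x > 0. \qquad (8)$$
All the functions of Section 2.2 can be expressed in a similar manner. Here, we only mention the hrf, which remains of great interest for such a lifetime distribution (see [18]). It is expressed as:
$$h(x; \alpha, \beta, \gamma) = \frac{\gamma\, e^{-\beta x}\left[\frac{\alpha}{x^2} e^{-\alpha/x} + \beta\left(1 + e^{-\alpha/x}\right)\right]\left(1 + e^{-\alpha/x} - e^{-\beta x}\right)^{\gamma - 1}}{\left(1 + e^{-\alpha/x}\right)\left[\left(1 + e^{-\alpha/x}\right)^{\gamma} - \left(1 + e^{-\alpha/x} - e^{-\beta x}\right)^{\gamma}\right]}, \quad x > 0. \qquad (9)$$
The rest of the study is devoted to some properties of the EMIEE distribution, beginning with the shape properties of $f(x; \alpha, \beta, \gamma)$ and $h(x; \alpha, \beta, \gamma)$.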
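For numerical work, the main functions of the EMIEE distribution can be coded in a few lines. The sketch below assumes the cdf $[(1 + e^{-\alpha/x} - e^{-\beta x})/(1 + e^{-\alpha/x})]^{\gamma}$ for $x > 0$, obtained by plugging the inverse exponential and exponential cdfs into the assumed ratio form of the EM family; the function names and the parameter values are hypothetical. It also probes two limiting behaviors numerically: the exponentiated exponential cdf for very large $\alpha$, and the value of the hrf for large $x$.

```python
import math


def emiee_cdf(x, alpha, beta, gamma):
    """Assumed EMIEE cdf for x > 0 (0 otherwise)."""
    if x <= 0:
        return 0.0
    u = math.exp(-alpha / x)   # inverse exponential cdf
    v = math.exp(-beta * x)    # exponential survival function
    return ((1.0 + u - v) / (1.0 + u)) ** gamma


def emiee_pdf(x, alpha, beta, gamma):
    """Derivative of the assumed EMIEE cdf for x > 0 (0 otherwise)."""
    if x <= 0:
        return 0.0
    u = math.exp(-alpha / x)
    v = math.exp(-beta * x)
    bracket = alpha / x**2 * u + beta * (1.0 + u)
    return gamma * v * bracket * (1.0 + u - v) ** (gamma - 1.0) / (1.0 + u) ** (gamma + 1.0)


def emiee_hrf(x, alpha, beta, gamma):
    """Hazard rate function f / (1 - F)."""
    return emiee_pdf(x, alpha, beta, gamma) / (1.0 - emiee_cdf(x, alpha, beta, gamma))


# Very large alpha: compare with the exponentiated exponential cdf.
print(emiee_cdf(1.0, 1e6, 0.5, 2.0), (1.0 - math.exp(-0.5)) ** 2.0)

# Large x: the hrf is expected to approach beta (here 0.5).
print(emiee_hrf(30.0, 1.0, 0.5, 2.0))
```

Direct evaluation of $1 - F$ loses precision when $x$ is very large, so the hrf should be evaluated at moderate arguments or through a dedicated expression of the sf.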
The mode(s) analysis of the EMIEE distribution provides important information on the "tops of the bell shapes" of the related model. Mathematically, the mode(s) can be obtained by solving the following equation: $\partial f(x; \alpha, \beta, \gamma)/\partial x = 0$, which is equivalent to $\partial \log[f(x; \alpha, \beta, \gamma)]/\partial x = 0$. Several solutions are possible, depending on the values of $\alpha$, $\beta$, and $\gamma$. After investigations, the EMIEE distribution is revealed to be unimodal or bimodal (including a "limiting mode" at zero in this last case). An analytical expression for a mode seems, however, not possible. The asymptotes of $f(x; \alpha, \beta, \gamma)$ are studied below. After some developments, we get, as $x \to 0$, $f(x; \alpha, \beta, \gamma) \sim \gamma \beta^{\gamma} x^{\gamma - 1}$, so that the limit of $f(x; \alpha, \beta, \gamma)$ at zero is $+\infty$ if $\gamma < 1$, $\beta$ if $\gamma = 1$, and $0$ if $\gamma > 1$. This illustrates the importance of the power parameter $\gamma$ in these asymptotes. Furthermore, we have, as $x \to +\infty$, $f(x; \alpha, \beta, \gamma) \sim (\gamma \beta / 2)\, e^{-\beta x} \to 0$. One can remark that the parameter $\alpha$ plays no role in these asymptotes. However, the fine variations of $f(x; \alpha, \beta, \gamma)$ are complicated to handle analytically, due to the high level of complexity of the involved equations. For this reason, we propose a graphical approach in Figure 1. From Figure 1, we see that $f(x; \alpha, \beta, \gamma)$ has very versatile shape properties. In particular, one or two modes, reversed J shapes, several kinds of bathtub shapes, N shapes, abrupt spikes, plate shapes, and remarkable heaviness on the tails are observed, reaching some extreme situations in terms of modeling.
Let us now focus on the shape properties of $h(x; \alpha, \beta, \gamma)$. First of all, the critical points of $h(x; \alpha, \beta, \gamma)$ can be obtained by solving the following equation: $\partial h(x; \alpha, \beta, \gamma)/\partial x = 0$, which is equivalent to $\partial \log[h(x; \alpha, \beta, \gamma)]/\partial x = 0$. The complexity of this equation is an obstacle to providing exact analytical solutions; the number of solutions depends on the values of $\alpha$, $\beta$, and $\gamma$, and no closed form is available for them, motivating the use of a graphical approach, as proposed later.
The asymptotes of $h(x; \alpha, \beta, \gamma)$ are studied below. We have, as $x \to 0$, $h(x; \alpha, \beta, \gamma) \sim \gamma \beta^{\gamma} x^{\gamma - 1}$, and, as $x \to +\infty$, $h(x; \alpha, \beta, \gamma) \to \beta$. As for $f(x; \alpha, \beta, \gamma)$, the parameter $\alpha$ plays no role in the asymptotes of $h(x; \alpha, \beta, \gamma)$. Furthermore, the finer shape properties of $h(x; \alpha, \beta, \gamma)$ are hard to present analytically. We thus complete our analysis by a graphical approach; some plots of $h(x; \alpha, \beta, \gamma)$ are sketched in Figure 2. From Figure 2, we see that the hrf can be increasing or decreasing, or can have reversed J shapes, constant shapes, and N shapes. This wide panel of shapes indicates the great flexibility of the related distribution. In this regard, we may refer the reader to [21].

On Different Measures
The raw moments of the EMIEE distribution can be determined and computed by using (5) or (6), along with important probability and statistical measures. As an example, for a random variable $X$ following the EMIEE distribution with parameters $\alpha$, $\beta$, and $\gamma$, the $k$th raw moment of $X$ is given by $\mu_k = E(X^k)$, corresponding to (5) with $T(x) = x^k$. Furthermore, from the raw moments, the following measures can be specified:
• the mean of $X$ defined by $\mu_1$, remaining the central parameter of the distribution,
• the variance of $X$ given as $\mathrm{Var} = \mu_2 - (\mu_1)^2$, providing a dispersion parameter,
• the standard deviation of $X$ defined as $\sigma = \mathrm{Var}^{1/2}$, corresponding to a dispersion parameter with the same unit as the mean,
• the skewness of $X$ given by $\mathrm{SK} = [\mu_3 - 3\mu_1\mu_2 + 2(\mu_1)^3]/\sigma^3$, measuring the lack of symmetry of the tails of the EMIEE distribution (about $\mu_1$),
• the kurtosis of $X$ specified by $\mathrm{KU} = [\mu_4 - 4\mu_1\mu_3 + 6(\mu_1)^2\mu_2 - 3(\mu_1)^4]/\sigma^4$, measuring how heavily the tails of the EMIEE distribution differ from those of a normal distribution,
• the coefficient of variation of $X$ defined as $\mathrm{CV} = \sigma/\mu_1$, providing a dispersion parameter that can serve as a benchmark for comparison.
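The measures listed above can be approximated by numerical integration of $x^k f(x)$. The sketch below is only an illustration: it assumes the EMIEE pdf derived from the cdf $[(1 + e^{-\alpha/x} - e^{-\beta x})/(1 + e^{-\alpha/x})]^{\gamma}$, uses a basic composite Simpson rule on a truncated interval, and relies on hypothetical names (`simpson`, `raw_moment`, `summary`) and a hypothetical truncation point `upper`, which must be enlarged when $\beta$ is small.

```python
import math


def emiee_pdf(x, alpha, beta, gamma):
    """Assumed EMIEE pdf for x > 0 (0 otherwise)."""
    if x <= 0:
        return 0.0
    u = math.exp(-alpha / x)
    v = math.exp(-beta * x)
    bracket = alpha / x**2 * u + beta * (1.0 + u)
    return gamma * v * bracket * (1.0 + u - v) ** (gamma - 1.0) / (1.0 + u) ** (gamma + 1.0)


def simpson(g, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0


def raw_moment(k, alpha, beta, gamma, upper=200.0):
    """k-th raw moment, with the integral truncated at the (hypothetical) bound `upper`."""
    return simpson(lambda x: x**k * emiee_pdf(x, alpha, beta, gamma), 0.0, upper)


def summary(alpha, beta, gamma):
    """Mean, variance, skewness, kurtosis, and coefficient of variation from raw moments."""
    m1, m2, m3, m4 = (raw_moment(k, alpha, beta, gamma) for k in (1, 2, 3, 4))
    var = m2 - m1**2
    sd = math.sqrt(var)
    return {
        "mean": m1,
        "Var": var,
        "SK": (m3 - 3 * m1 * m2 + 2 * m1**3) / sd**3,
        "KU": (m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4) / sd**4,
        "CV": sd / m1,
    }


print(summary(0.5, 0.5, 2.0))
```

A useful check is the case of a very large $\alpha$ with $\gamma = 1$, where the distribution reduces to the exponential one and the summary should be close to mean $1/\beta$, SK $= 2$, KU $= 9$, and CV $= 1$.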
We refer the reader to the book of [17] for further details on these measures. A numerical treatment of these measures is proposed in Tables 1-3, for some selected values of the parameters. Tables 1-3 show that the considered measures can take a wide range of values, with increasing or decreasing tendencies depending on how the values of the parameters vary. In particular, in Table 1, at $\alpha = 0.5$ and $\beta = 0.5$, when the value of $\gamma$ increases, the values of SK, KU, and CV decrease, but the value of Var increases. Furthermore, from Table 2, at $\gamma = 2.0$ and $\beta = 0.5$, when the value of $\alpha$ increases, the values of SK and KU increase, but the values of Var and CV decrease. Table 3 indicates that, at $\gamma = 2.0$ and $\alpha = 0.5$, when the value of $\beta$ increases, the values of Var and CV decrease. The versatility of these measures is an additional quality of the EMIEE distribution.

Table 1. Numerical values of some moments, variance, skewness (SK), kurtosis (KU), and the coefficient of variation (CV) of the EMIEE distribution for some selected values of γ and at α = 0.5 and β = 0.5.
Measure γ = 1.5 γ = 2.0 γ = 2.5 γ = 3.0 γ = 3.5 γ = 4.0 γ = 4.

We complete this part by discussing the entropy of the EMIEE distribution through the Rényi entropy defined by
$$I_{\delta} = \frac{1}{1 - \delta} \log\left[\int_0^{+\infty} f(x; \alpha, \beta, \gamma)^{\delta}\,dx\right],$$
with $\delta > 0$, $\delta \neq 1$, and $\delta(1 - \gamma) < 1$ (this inequality is an additional condition to ensure the existence of $I_{\delta}$). Therefore, one can express the main term via (5) or (6) by taking $T(x) = [f(x; \alpha, \beta, \gamma)]^{\delta - 1}$. A numerical study of the Rényi entropy is performed in Table 4. Table 4 shows the versatile nature of the Rényi entropy of $X$; it can be positive or negative, with varying values. In some sense, this shows the flexibility of the amount of randomness of the EMIEE distribution. Further details about the Rényi entropy, and the general concept of entropy, can be found in [22].

Application to a COVID-19 Dataset
Here, we propose a concrete application with an actual dataset to assess the interest in the EMIEE model. The considered data, called the COVID-19 dataset, is presented below.
COVID-19, which can be renamed as "the flu of 2020", is due to Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). Sadly, it spread quickly at the beginning of the year 2020, claiming thousands of victims and obliging governments to take exceptional measures to protect their people. The situation of this pandemic as of 29 May 2020 can be found in [25][26][27]. Naturally, the overall comprehension of COVID-19 is a challenge for all scientists, but necessary for the sake of future generations. In this section, we modestly contribute to the subject by applying the EMIEE model to fit data of daily new COVID-19 confirmed cases in Pakistan from 21 March to 29 May 2020 (inclusive), showing that it is very efficient in this regard. We thus assume that the new COVID-19 (confirmed) cases in Pakistan can be modeled by a continuous variable (since a discrete variable with a wide range of values can be considered as such) and provide a new statistical model that can be relevant for the following points: (i) provide a precise estimation of some measures of interest related to COVID-19 cases in Pakistan (mean of cases, probability to have a certain number of cases, and so on), (ii) compare the distribution of the number of COVID-19 cases in Pakistan with those in other countries, (iii) propose an efficient strategy for fitting data on COVID-19 cases in other countries, (iv) in a more challenging way, model the distribution of the number of cases for any pandemic with similar features and under a similar environment (with comparable populations, comparable climate, sanitary system, etc.).
Aiming to identify the possible shapes of the unknown hrf behind these data, we plot the total time on test (TTT) plot in Figure 3 (see [28] for further details on the use of TTT plots in data analysis). In Figure 3, since the red line is convex, then concave, the unknown hrf probably presents a bathtub shape. Since the hrf of the EMIEE distribution is flexible enough to accommodate such a shape, the EMIEE distribution is appropriate to fit the data. Now, we aim to compare the fit of the EMIEE model with that of 15 top ranked models in the literature: (i) the Weibull-exponential (WE) model by [29], (ii) the Lomax-exponential (LE) model by [30], (iii) the gamma-exponentiated exponential (GaE) model by [31], (iv) the beta Weibull (BW) model by [32], (v) the Kumaraswamy exponential (KE) model by [33], (vi) the Burr X-exponential (BXE) model by [34], (vii) the exponentiated exponential (EE) model by [35], (viii) the CS transformation of exponential (CE) model by [36], (ix) the standard exponential (E) model (see [37], among others), (x) the alpha-power inverse Weibull (AIW) model by [38], (xi) the Gompertz inverse exponential (GomIE) model by [39], (xii) the Weibull-inverse exponential (WIE) model by [40], (xiii) the inverse Weibull-inverse exponential (IWIE) model by [41], (xiv) the inverse exponential (IE) model by [20], and last, but not least, (xv) the "unexponentiated" version of the proposed EMIEE model, i.e., the MIEE model. We refer to the above references for the precise definitions of the related cdfs and pdfs, along with the Greek alphabet letters used for the parameters. Then, the model parameters were estimated by the maximum likelihood method (with the BFGS algorithm). The R software was used in this regard. The maximum likelihood estimates (MLEs) and standard errors (SEs) of all the model parameters are provided in Table 7.

Table 7. MLEs and standard errors (SEs) (in parentheses) of the model parameters for the COVID-19 dataset: Weibull-exponential (WE) model, the Lomax-exponential (LE) model, the gamma-exponentiated exponential (GaE) model, the beta Weibull (BW) model, the Kumaraswamy exponential (KE) model, the Burr X-exponential (BXE) model, the exponentiated exponential (EE) model, the CS transformation of exponential (CE) model, the standard exponential (E) model, the alpha-power inverse Weibull (AIW) model, the Gompertz inverse exponential (GomIE) model, the Weibull-inverse exponential (WIE) model, the inverse Weibull-inverse exponential (IWIE) model, the inverse exponential (IE) model.

Therefore, based on (8), the corresponding estimated pdf is given by:
$$\hat{f}(x) = f(x; \hat{\alpha}, \hat{\beta}, \hat{\gamma}), \quad x > 0, \qquad (10)$$
where $\hat{\alpha}$, $\hat{\beta}$, and $\hat{\gamma}$ denote the MLEs of the EMIEE model parameters.

Thus, $\hat{f}(x)$ is an estimate of the unobservable underlying pdf of the number of COVID-19 cases in Pakistan. By the use of this function, one can estimate the quantities of interest. Some basic ones are presented below. By denoting $X$ the random variable modeling the daily COVID-19 confirmed cases in Pakistan during the epidemic, the probability that $X$ belongs to a chosen interval, say $[a, b]$, can be estimated by $\hat{p}_{a,b} = \int_a^b \hat{f}(x)\,dx$. For instance, the probability that the COVID-19 cases in Pakistan are less than a certain value $c$ is estimated by $\hat{p}_{0,c}$. More generally, the mean of a certain transformation of $X$, say $T(X)$, can be estimated by
$$\hat{\mu}_* = \int_0^{+\infty} T(x) \hat{f}(x)\,dx.$$
For instance, the average number of COVID-19 cases in Pakistan can be approximated by $\hat{\mu}_*$ with $T(x) = x$, and so on.
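The estimated quantities above can be sketched in code once parameter estimates are available. The example below assumes the EMIEE cdf $[(1 + e^{-\alpha/x} - e^{-\beta x})/(1 + e^{-\alpha/x})]^{\gamma}$ and its pdf; the parameter values in `params` and the sample in `toy_data` are hypothetical placeholders (they are neither the MLEs of Table 7 nor the COVID-19 data). It computes the interval probability as a cdf difference, which equals the integral of the pdf over $[a, b]$, and the negative log-likelihood that a numerical optimizer would minimize.

```python
import math


def emiee_cdf(x, alpha, beta, gamma):
    """Assumed EMIEE cdf for x > 0 (0 otherwise)."""
    if x <= 0:
        return 0.0
    u = math.exp(-alpha / x)
    v = math.exp(-beta * x)
    return ((1.0 + u - v) / (1.0 + u)) ** gamma


def emiee_pdf(x, alpha, beta, gamma):
    """Assumed EMIEE pdf for x > 0 (0 otherwise)."""
    if x <= 0:
        return 0.0
    u = math.exp(-alpha / x)
    v = math.exp(-beta * x)
    bracket = alpha / x**2 * u + beta * (1.0 + u)
    return gamma * v * bracket * (1.0 + u - v) ** (gamma - 1.0) / (1.0 + u) ** (gamma + 1.0)


def prob_interval(a, b, params):
    """P(a <= X <= b) = F(b) - F(a), i.e., the integral of the pdf over [a, b]."""
    return emiee_cdf(b, *params) - emiee_cdf(a, *params)


def neg_log_lik(params, data):
    """Negative log-likelihood; returns +inf outside the parameter space."""
    if min(params) <= 0:
        return math.inf
    return -sum(math.log(emiee_pdf(x, *params)) for x in data)


params = (1.0, 0.5, 2.0)               # hypothetical (alpha, beta, gamma)
toy_data = [0.4, 0.9, 1.3, 2.1, 3.5]   # hypothetical positive observations

print(prob_interval(0.0, 2.0, params))
print(neg_log_lik(params, toy_data))
```

In practice, `neg_log_lik` would be passed to a numerical optimizer, for instance `scipy.optimize.minimize` with `method="BFGS"` in Python or `optim` in R, possibly after rescaling the data to keep the exponentials well conditioned.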
As planned, a comparison of the models in terms of fitting was performed. We decided which was the best model by determining the values of the following statistical measures: minus complete log-likelihood function ($-\hat{\ell}$), Akaike information criterion (AIC), Bayesian information criterion (BIC), Cramér-von Mises (W) criterion, and Anderson-Darling (A) criterion. Furthermore, we considered the value of the Kolmogorov-Smirnov (KS) statistic and its p-value. The best model was the one having the smallest $-\hat{\ell}$, AIC, BIC, W, A, and KS and the largest KS p-value. For the considered data, the obtained values are shown in Table 8. From Table 8, we see that the EMIEE model was the best among all the considered models, with the following numerical criteria: $-\hat{\ell} = 221.3346$, AIC = 448.6692, BIC = 455.4147, W = 0.1228, A = 0.8148, KS = 0.0991, and KS p-value = 0.5679. One can notice that the EMIEE model outperformed the baseline E and IE models, and also the MIEE model derived from the former M family, validating the use of the exponentiated transform for fitting purposes. Figure 4 shows the estimated pdf as described in (10) over the histogram of the data. Figure 5 presents the estimated cdf, i.e., based on (7), $\hat{F}(x) = F(x; \hat{\alpha}, \hat{\beta}, \hat{\gamma})$, over the empirical cdf of the data.
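The information criteria reported for the EMIEE model can be recomputed from the value of $-\hat{\ell}$ with the usual formulas AIC $= -2\hat{\ell} + 2k$ and BIC $= -2\hat{\ell} + k \log n$. In the sketch below, $k = 3$ is the number of EMIEE parameters, and $n = 70$ is an assumption: it is the number of days from 21 March to 29 May 2020 inclusive (11 + 30 + 29), taking one observation per day.

```python
import math


def aic(neg_loglik, k):
    """Akaike information criterion from minus the maximized log-likelihood."""
    return 2.0 * neg_loglik + 2.0 * k


def bic(neg_loglik, k, n):
    """Bayesian information criterion from minus the maximized log-likelihood."""
    return 2.0 * neg_loglik + k * math.log(n)


neg_loglik = 221.3346  # value reported for the EMIEE model
k = 3                  # alpha, beta, gamma
n = 70                 # ASSUMED sample size: one observation per day, 21 March to 29 May

print(round(aic(neg_loglik, k), 4), round(bic(neg_loglik, k, n), 4))
```

Under these assumptions, the recomputed values agree with the reported AIC and BIC to four decimals, which supports the reading $n = 70$.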
The probability-probability (P-P) plot in Figure 6 shows how closely the estimated and empirical cdfs agree. In all the graphics, we see that the red curves fit the black data objects closely, supporting the relevance of the EMIEE model in the analysis of the COVID-19 dataset. We end this application by displaying the estimated hrf of the EMIEE model in Figure 7.
We see that the estimated hrf has a bathtub shape, which is consistent with what was inferred from Figure 3.
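For readers who wish to reproduce a diagnostic of the kind shown in Figure 3, the scaled TTT transform can be computed directly from a sample. The sketch below uses the standard definition, $T(r/n) = [\sum_{i=1}^{r} x_{(i)} + (n - r) x_{(r)}] / \sum_{i=1}^{n} x_{(i)}$ for the ordered observations $x_{(1)} \le \dots \le x_{(n)}$; the function name `scaled_ttt` and the small sample are hypothetical and are not the COVID-19 data.

```python
def scaled_ttt(data):
    """Return the points (r/n, T(r/n)) of the scaled TTT transform, starting at (0, 0)."""
    xs = sorted(data)
    n = len(xs)
    total = sum(xs)
    points = [(0.0, 0.0)]
    running = 0.0
    for r, x in enumerate(xs, start=1):
        running += x  # sum of the r smallest observations
        points.append((r / n, (running + (n - r) * x) / total))
    return points


# Hypothetical sample, for illustration only.
for u, t in scaled_ttt([1.0, 2.0, 3.0]):
    print(f"{u:.3f}  {t:.3f}")
```

Plotting these points against the diagonal gives the TTT plot: a curve that is first convex and then concave points to a bathtub-shaped hrf, whereas a curve close to the diagonal points to a constant hrf.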

Conclusions
In this paper, we derived a natural extension of the M family, called the exponentiated M (EM) family. We investigated its main mathematical properties and discussed its ability in terms of statistical modeling. Light was shed on a new promising distribution of the EM family, based on the inverse exponential and exponential distributions, called the EM inverse exponential exponential (EMIEE) distribution. We investigated the estimation of the EMIEE model parameters by a well-established method: the maximum likelihood method. We applied the model to analyze new COVID-19 cases in Pakistan from 21 March to 29 May 2020 (inclusive), with fair comparisons with 15 other solid models. The fitting results were quite favorable to the EMIEE model. Hence, the EMIEE model could be used for similar analyses in other countries, allowing comparisons in this regard and, consequently, a better understanding of the COVID-19 pandemic.