A Skew Logistic Distribution for Modelling COVID-19 Waves and Its Evaluation Using the Empirical Survival Jensen–Shannon Divergence

A novel yet simple extension of the symmetric logistic distribution is proposed by introducing a skewness parameter. It is shown how the three parameters of the ensuing skew logistic distribution may be estimated using maximum likelihood. The skew logistic distribution is then extended to the skew bi-logistic distribution to allow the modelling of multiple waves in epidemic time series data. The proposed skew-logistic model is validated on COVID-19 data from the UK, and is evaluated for goodness-of-fit against the logistic and normal distributions using the recently formulated empirical survival Jensen–Shannon divergence (ESJS) and the Kolmogorov–Smirnov two-sample test statistic (KS2). We employ 95% bootstrap confidence intervals to assess the improvement in goodness-of-fit of the skew logistic distribution over the other distributions. The obtained confidence intervals for the ESJS are narrower than those for the KS2 on using this dataset, implying that the ESJS is more powerful than the KS2.


Introduction
In exponential growth, the population grows at a rate proportional to its current size. This is unrealistic, since in reality, growth will not exceed some maximum, called its carrying capacity. The logistic equation [1] (Chapter 6) deals with this problem by ensuring that the growth rate of the population decreases once the population reaches its carrying capacity [2]. Statistical modelling of the logistic equation's growth and decay is accomplished with the logistic distribution [3] and [4] (Chapter 22), noting that the tails of the logistic distribution are heavier than those of the ubiquitous normal distribution. The normal and logistic distributions are both symmetric, however, real data often exhibits skewness [5], which has given rise to extensions of the normal distribution to accommodate for skewness, as in the skew normal [6] and epsilon skew normal [7] distributions. Subsequently, skew logistic distributions were also devised, as in [8,9].
Epidemics, such as COVID-19, are traditionally modelled by compartmental models such as the SIR (Susceptible-Infected-Removed) model and its extension, the SEIR (Susceptible-Exposed-Infected-Removed) model, which estimate the trajectory of an epidemic [10]. These models typically rely on assumptions on how the disease is transmitted and progresses [11], and are routinely used to understand the consequences of policies such as mask wearing and social distancing [12]. Time series models [13], on the other hand, employ historical data to make forecasts about the future, are generally simpler than compartmental models, and are able to make forecasts on, for example, number of cases, hospitalisations and deaths. The SIR model can be interpreted as a logistic growth model [14,15]. However, as the data is inherently skewed, a skewed logistic statistical model would be a natural choice, although, as such, it does not rely on biological assumptions in its forecasts [16].
Herein, we present a novel yet simple (one may argue the simplest), three parameter skewed extension to the logistic distribution to allow for asymmetry; c.f. [16]. Nevertheless, if instead of our extension we deploy one of the other skew logistic distributions (such as the one described in [8]) the results would no doubt be comparable to the results we obtain herein; however, we pursue our simpler extension, detailing its statistical properties.
In the context of analysing epidemics, the logistic distribution is normally preferred, as it is a natural distribution to use in modelling population growth and decay. However, we still briefly mention a comparison of the results we obtain in modelling COVID-19 waves with the skew logistic distribution, to one which, instead, employs a skew normal distribution (more specifically we choose the, flexible, epsilon skew normal distribution [7]). The result of this comparison implies that utilising the epsilon skew normal distribution leads, overall, to results which are comparable to those when utilising the skew logistic distribution. However, in practice, it is still preferable to make use of the skew logistic distribution as it is the natural model to deploy in this context [17], since, on the whole, it is more consistent with the data as its tails are heavier than those of a skew normal distribution.
Epidemics are said to come in "waves". The precise definition of a wave is somewhat elusive [18], but it is generally accepted that, assuming we have a time series of the number of, say, daily hospitalisations, a wave will span over a period from one valley (minima) in the time series to another valley, with a peak (maxima) in between them. There is no strict requirement that waves do not overlap, although, for simplicity we will not consider any overlap as such; see [18], for an attempt to give an operational definition of the concept of epidemic wave. In order to combine waves, we make use of the concept of bi-logistic growth [19,20], or more generally, multi-logistic growth, which allows us to sum two or more instances of logistic growth when the time series spans over more than a single wave.
To fit the skew logistic distribution to the time series data we employ maximum likelihood, and to evaluate the goodness-of-fit we make use of the recently formulated empirical survival Jensen-Shannon divergence (E SJS) [21,22] and the well-established Kolmogorov-Smirnov two-sample test statistic (KS2) [23] (Section 6.3). The E SJS is an information-theoretic goodness-of-fit measure of a fitted parametric continuous distribution, which overcomes the inadequacy of the coefficient of determination, R 2 , as a goodness-of-fit measure for nonlinear models [24]. The KS2 statistic also satisfies this criteria regarding R 2 ; however, we observe that the 95% bootstrap confidence intervals [25] we obtain for the E SJS are narrower than those for the KS2, suggesting that the E SJS is more powerful [26] than the KS2. Another well-known limitation of the KS2 statistic is that it is less sensitive to discrepancies at the tails of the distribution than the E SJS statistic is, in the sense that as opposed to E SJS it is "local", i.e., its value is determined by a single point [27].
The rest of the paper is organised as follows. In Section 2, we introduce a skew logistic distribution, which is a simple extension of the standard, symmetric, logistic distribution obtained by adding to it a single skew parameter and derive some of its properties. In Section 3, we formulate the solution to the maximum likelihood estimation of the parameters of the skew logistic distribution. In Section 4, we make use of an extension of the skew logistic distribution to the bi-skew logistic distribution to model a time series of COVID-19 data items having more than a single wave. In Section 5, we provide analysis of daily COVID-19 deaths in the UK from 30 January 2020 to 30 July 2021, assuming the skew logistic distribution as an underlying model of the data. The evaluation of goodness-of-fit of the skew logistic distribution to the data makes use of the recently formulated E SJS, and compares the results to those when employing the KS2 instead. We observe that the same technique, which we applied to the analysis of COVID-19 deaths, can be used to model new cases and hospitalisations. Finally, in Section 6, we present our concluding remarks. It is worth noting that in the more general setting of information modelling, being able to detect epidemic waves may help supply chains in planning increased resistance to such adverse events [28]. We note that all computations were carried out using the Matlab software package.

A Skew Logistic Distribution
Here, we introduce a novel skew logistic distribution, which extends, in straightforward manner, the standard two parameter logistic distribution [3] and [4] (Chapter 22) by adding to it a skew parameter. The rationale for introducing the distribution is that, apart from its simple formulation, we believe that the maximum likelihood solution presented below is also simpler than those derived for other skew logistic distributions, such as the ones investigated in [8,9]. This point provides further justification for our skew logistic distribution when introducing the bi-skew logistic distribution in Section 4. Now, let µ be a location parameter, s be a scale parameter and λ be a skew parameter, where s > 0 and 0 < λ < 2. Then, the probability density function of the skew logistic distribution at a value x of the random variable X, denoted as f (x; λ, µ, s), is given by: noting that for clarity we write x − µ above as a shorthand for x − µ , and κ λ is a normalisation constant, which depends on λ.
When λ = 1, the skew logistic distribution reduces to the standard logistic distribution as in [3] and [4] (Chapter 22), which is symmetric. On the other hand, when 0 < λ < 1, the skew logistic distribution is positively skewed, and when 1 < λ < 2, it is negatively skewed. So, when λ = 1, κ λ = 1, and, for example, when λ = 0.5 or 1.5, κ λ = 2/π. For simplicity, from now on, unless necessary, we will omit to mention the constant κ λ as it will not effect any of the results.
The skewness of a random variable X [4,5], is defined as: and thus, assuming for simplicity of exposition (due the linearity of expectations [5]) that µ = 0 and s = 1, the skewness of the skew logistic distribution, denoted by γ(λ), is given by: First, we will show that letting λ 1 = λ, with 0 < λ 1 < 1, we have γ(λ 1 ) > 0, that is f (x; λ 1 , 0, 1) is positively skewed. We can split the integral in (2) into two integrals for the negative part from −∞ to 0 and the positive part from 0 to ∞, noting that when x = 0, the expression to the right of the integral is equal to 0. Then, on setting y = −x for the negative part, and y = x for the positive part, the result follows, as by algebraic manipulation it can be shown that: implying that γ(λ 1 ) > 0 as required. Second, in a similar fashion to above, on letting λ 2 = λ 1 + 1 = λ, with 1 < λ 2 < 2, it follows that γ(λ 2 ) < 0, that is f (x; λ 2 , 0, 1) is negatively skewed. In particular, by algebraic manipulation we have that: implying that γ(λ 2 ) < 0 as required.
The cumulative distribution function of the skew logistic distribution at a value x of the random variable X is obtained by integrating f (x; λ, µ, s), to obtain F(x; µ, s, λ), which is given by: is the Gauss hypergemoetric function [29] (Chapter 15); we assume a, b and c are positive real numbers, and that z is a real number extended outside the unit disk by analytic continuation [30].

Maximum Likelihood Estimation for the Skew Logistic Distribution
We now formulate the maximum likelihood estimation [31] of the parameters µ, s and λ of the skew logistic distribution. Let {x 1 , x 2 , . . . , x n } be a random sample of n values from the density function of the skew logistic distribution in (1). Then, the log likelihood function of its three parameters is given by: In order to solve the log likelihood function, we first partially differentiate ln L(λ, µ, s) as follows: It is therefore implied that the maximum likelihood estimators are the solutions to the following three equations: which can be solved numerically. We observe that the equation for µ in (10) does not contribute to solving the maximum likelihood, since the location parameter µ is equal to the mean only when λ = 1. We thus look at an alternative equation for µ, which involves the mode of the skew logistic distribution.
To derive the mode of the skew logistic distribution we solve the equation, to obtain: Thus, motivated by (12) we replace the equation for µ in (10) with: where m is the mode of the random sample.

The Bi-Skew Logistic Distribution for Modelling Epidemic Waves
We start by defining the bi-skew logistic distribution, which will enable us to model more than one wave of infections at a time. We then discuss how we partition the data into single waves, in a way that we can apply the maximum likelihood from the previous section to the data in a consistent manner.
We present the bi-skew logistic distribution, which is described by the sum, of two skew logistic distributions. It is given in full as: which characterises two distinct phases of logistic growth (c.f. [19,32]). We note that (14) can be readily extended to the general case of the sum of multiple skew logistic distributions; however, for simplicity, we only present the formula for the bi-skew logistic case. Thus, while the (single) skew logistic distribution can only model one wave of infected cases (or deaths, or hospitalisations), the bi-skew logistic distribution can model two waves of infections, and in the general cases, any number of waves.
In the presence of two waves, the maximum likelihood solution to (14), would give us access to the necessary model parameters, and solving the general case in the presence of multiple waves, when the sum in (14) may have two or more skew logistic distributions, is evidently even more challenging. Thus, we simplify the solution for the multiple wave case, and concentrate on an approximation assuming a sequential time series when one wave strictly follows the next. More specifically, we assume that each wave is modelled by a single skewed logistic distribution describing the growth phase until a peak is reached, followed by a decline phase; see [33] who consider epidemic waves in the context of the standard logistic distribution. Thus, a wave is represented by a temporal pattern of growth and decline, and the time series as a whole describes several waves as they evolve.
To provide further clarification of the model, we mention that the skew-bi logistic distribution is not a mixture model per se, in which case there is a mixture weight for each distribution in the sum, as in, say, a Gaussian mixture [34] (Chapter 9). In the bi-skew logistic distribution case we do not have mixture weights, rather, we have two phases in our context waves, which are sequential in nature, possibly with some overlap, as can be seen in Figure 1 (c.f. [19,32]). Strictly speaking, the bi-skew logistic distribution can be viewed as a mixture model where the mixture weights are each 0.5 and a scaling factor of 2 is applied. Thus, as an approximation, we add a preprocessing step where we segment the time series into distinct waves, resulting in a considerable reduction to the complexity of the maximum likelihood estimation. We do, however, remark that the maximum likelihood estimation for the bi-skew logistic distribution is much simpler than that of a corresponding mixture model, due to the absence of mixture weights. In particular, although we could, in principle, make use of the EM (expectation-maximisation) algorithm [34] (Chapter 9) and [35] to approximate the maximum likelihood estimates of the parameters, this would not be strictly necessary in the bi-skew logistic case, cf. [36]. The only caveat, which holds independently of whether the EM algorithm is deployed or not, is the additional number of parameters present in the equations being solved. We leave this investigation as future work, and focus on our approximation, which does not require the solution to the maximum likelihood of (14); the details of the preprocessing heuristic we apply are given in the following section.

Data Analysis of COVID-19 Deaths in the UK
Here, we provide a full analysis of COVID-19 deaths in the UK from 30 January 2020 to 30 July 2021, employing the E SJS goodness-of-fit statistic and comparing it to the KS2 statistic. The daily UK COVID-19 data we used was obtained from [37].
As a proof of concept of the modelling capability of the skew logistic distribution, we now provide a detailed analysis of the time series of COVID-19 deaths in the UK from 30 January 2020 to 30 July 2021.
To separate the waves, we first smoothed the raw data using a moving average with a centred sliding window of 7 days. We then applied a simple heuristic, where we identified all the minima in the time series and defined a wave as a consecutive portion of the time, of at least 72 days, with the endpoints of each wave being local minima apart from the first wave, which starts from day 0. The resulting four waves in the time series are shown in Figure 1; see last column of Table 1 for the endpoints of the four waves. It would be worthwhile, as future work, to investigate other heuristics, which may, for example, allow overlap between the waves to obtain more accurate start and end points and to distribute the number of cases between the waves when there is overlap between them.
In Table 1, we show the parameters resulting from maximum likelihood fits of the skew logistic distribution to the four waves. Figure 2 shows histograms of the four COVID-19 waves, each overlaid with the curve of the maximum likelihood fit of the skew logistic distribution to the data. Pearson's moment and median skewness coefficients [38] for the four waves are recorded in Table 2. It can be seen that the correlation between these and 1 − λ is close to 1, as we would expect.   We now turn to the evaluation of goodness-of-fit using the E SJS (empirical survival Jensen-Shannon divergence) [21,22], which generalises the Jensen-Shannon divergence [39] to survival functions, and the well-known KS2 (Kolmogorov-Smirnov two-sample test statistic) [23] (Section 6.3). We will also employ 95% bootstrap confidence intervals [25] to measure the improvement in the E SJS and KS2, goodness-of-fit measures, of the skewlogistic over the logistic and normal distributions, respectively. For completeness, we formally define the E SJS and KS2.
To set the scene, we assume a time series [40], x = {x 1 , x 2 , . . . , x n }, where x t , for t = 1, 2, . . . , n is a value indexed by time, t, in our case modelling the number of daily COVID-19 deaths. We are, in particular, interested in the marginal distribution of x, which we suppose comes from an underlying parametric continuous distribution D.
The empirical survival function of a value z for the time series x, denoted by S(x)[z], is given by: where I is the indicator function. In the following, we will let P(z) = S(x)[z] stand for the empirical survival function S(x)[z], where the time series x is assumed to be understood from context. We will generally be interested in the empirical survival function P, which we suppose arises from the survival function P of the parametric continuous distribution D, mentioned above.
The empirical survival Jensen-Shannon divergence (E SJS) between two empirical survival functions, Q 1 and Q 2 arising from the survival functions Q 1 and Q 2 , is given by: where: We note that the E SJS is bounded and can thus be normalised, so it is natural to assume its values are between 0 and 1; in particular, when Q 1 = Q 2 its value is zero. Moreover, its square root is a metric [41], cf. [21].
The Kolmogorov-Smirnov two-sample test statistic between Q 1 and Q 2 as above, is given by: where max is the maximum function, and |v| is the absolute value of a number v. We note that KS2 is bounded between 0 and 1, and is also a metric. For a parametric continuous distribution D, we let φ = φ(D, P) be the parameters that are obtained from fitting D to the empirical survival function, P, using maximum likelihood estimation. In addition, we let P φ = S φ (x) be the survival function of x, for D with parameters φ. Thus, the empirical survival Jensen-Shannon divergence and the Kolmogorov-Smirnov two-sample test statistic, between P and P φ , are given by E SJS( P, P φ ) and KS2( P, P φ ), respectively, where P and P φ are omitted below as they will be understood from context. These values provide us with two measures of goodness-of-fit for how well D with parameters φ is fitted to x [22].
We are now ready to present the results of the evaluation. In Table 3, we show the E SJS values for the four waves and the said improvements, while in Table 4, we show the corresponding KS2 values and improvements. In all cases, the skew logistic is a preferred model over both the logistic and normal distributions, justifying the addition of a skewness parameter as can be see in Figure 2. Moreover, in all but one case the logistic distribution was preferred over the normal distribution-wave 3, where the KS2 statistic of the normal distribution was smaller than that of the logistic distribution. We observe that, for the second wave, the E SJS and KS2 values for the skew logistic and logistic distribution were the closest, since, as can be seen from Table 1, the second wave was more or less symmetric, in which case the skew logistic distribution reduces to the logistic distribution. In Tables 5 and 6, we present the bootstrap 95% confidence intervals of the E SJS and KS2 improvements, respectively, using the percentile method, while in Tables 7 and 8, we provide the 95% confidence intervals of the E SJS and KS2 improvements, respectively, using the bias-corrected and accelerated (BCa) method [25], which adjusts the confidence intervals for bias and skewness in the empirical bootstrap distribution. In all cases, the mean of the bootstrap samples is above zero with a very tight standard deviation. As noted above, the second wave is more or less symmetric, so we expect that the standard logistic distribution will provide a fit to the data, which is as good as the skew logistic fit. It is thus not surprising that in this case the improvement percentages are, generally, not significant. In addition, the improvements for the third wave are also, generally, not significant, which may be due to the starting point of the third wave, given our heuristic, being close to its peak; see Figure 1. We observe that, for this dataset, it is not clear whether deploying the BCa method yields a significant advantage over simply deploying the percentile method. In Table 9, we show the mean and standard deviation statistics of the confidence interval widths, of the metrics we used to compare the distributions, implying that, in general, the E SJS goodness-of-fit measure is more powerful than the KS2 goodness-of-fit measure. This is based on the known result that statistical tests using measures resulting in smaller confidence intervals are normally considered to be more powerful, implying that a smaller sample size may be deployed [42]. Table 5. Results from the percentile method for the confidence interval of the difference of the E SJS between the logistic (Logit) and skew logistic (SL), and between the normal (Norm) and skew logistic (SL) distributions, respectively; Diff, LB, UB, CI, Mean and STD stand for difference, lower bound, upper bound, confidence interval, mean of samples and standard deviation of samples, respectively. As mentioned in the introduction, we obtained comparable results to the above when modelling epidemic waves with the epsilon skew normal distribution [7] as opposed to using the skew logistic distribution; see also [43] for a comparison of a skew logistic and skew normal distribution in the context of insurance loss data, showing that the skew logistic performed better than the skew normal distribution for fitting the datasets tested. Further to the note in the introduction that the skew logistic distribution is a more natural one to deploy in this case due to its heavier tails, we observe that in an epidemic scenario, the number of cases counted can only be non-negative, while the epsilon skew normal also supports negative values.  Table 8. Results from the BCa method for the confidence interval of the difference of the KS2 between the logistic (Logit) and skew logistic (SL), and between the normal (Norm) and skew logistic (SL) distributions, respectively; Diff, LB, UB, CI, Mean and STD stand for difference, lower bound, upper bound, confidence interval, mean of samples and standard deviation of samples, respectively.

Concluding Remarks
We have proposed the skew-logistic and bi-logistic distributions as models for single and multiple epidemic waves, respectively. The model is a simple extension of the symmetric logistic distribution, which can readily be deployed in the presence of skewed data that exhibits growth and decay. We provided validation for the proposed model using the E SJS as a goodness-of-fit statistic, showing that it is a good fit to COVID-19 data in UK and more powerful than the alternative KS2 statistic. As future work, we could use the model to compare the progression of multiple waves across different countries, extending the work of [16].